GreatReads - Blog Aggregator · Phoenix Framework

0 views

Lonami 2 years ago

In defense of stale-bots

You are a young developer. You're only getting started, and learning new things as you go, but you love the idea of telling computers what to do. Automating mundane tasks. It's really exciting to have this much freedom and control over something you own. Your only limit is your imagination! Time goes by. You've come across GitHub quite a few times by now. A lot of code searches lead to GitHub projects created by other people. Other hobbyists, like you, doing what they love: coding . Until now, you've been keeping your code to yourself. It's not really nice code. The code structure is rather poor. It's difficult to add new features. It has bugs. Boy does it have bugs. But it works for you. It makes your day-by-day a little nicer. "What if I just shared this with others?" You've taken a lot of, er, inspiration, from open source projects over the past year. A lot of them are very useful! But not quite what you needed. On the other hand, your project solves your problems perfectly. By publishing your work, you would give back to that community. Share improvements. And others with a similar problem to yours would really value it! So you do it. You wake up the next day, and the project seems to have piqued the interest of many. That's… really awesome! It's a nice feeling. Not long after, pull requests start coming in. New feature additions, bug fixing, improving performance… And so do the issues. You knew the project wasn't perfect. The code had certainly grown a lot, and not in a very nice way. But it worked. It had the features you needed. It was good enough. You begin nicely answering questions in the issue tracker. Accepting most new features. Fixing bugs you never would've encountered yourself. You start a new job as a junior programming. How exciting! And there you meet your significant-other. A wonderful person in all aspects. You spend a lot of time with them and learning in your job. Time goes by. You're starting to grow a bit tired of this GitHub thing. The project has attracted a lot of attention. Many new issues are being created every day. Some extremely high quality! But plenty downright insulting you. You begin to close issues you know you won't work on. This project wasn't your job. It never was. It was a hobby. A passion project. Which worked perfectly fine for you. The constant notifications still bother you. You've disabled them. But the ever-growing issue count is always in the back of your mind. Thinking you need to bring it down. People want to use your work! You want to have a functional project! But you no longer have time. You come across this "stale bot". It can automate the process of closing down issues! Being the automation-lover you are, you begin using it in a blink. Then, people can continue benefiting from your work, and you can stop worrying about having to deal with all the bug reports and reviews. People are now angry their issue is closed without explanation. They're opening duplicates. Can't they see, this was just a hobby? This isn't your job! Why won't they realize this is your project? You don't have the time to fix all the issues. The stale bot is simply automating the process. This isn't for you. The situation is frustrating and annoying to deal with. You take the project down. After all, it doesn't need to be online for you to make full use of it. You did get feedback. You learnt a few things. The code is more performant, even if a bit harder to understand at places. There are new features, which… you don't really have a need for, but are cool, I guess. But you're glad you don't need to deal with the noise any longer. It is frustrating when you spend a lot of time on something for it to be ignored. Well-detailed bug reports, with stack traces, bisections pointing to the problem. Large patches for a new feature or a bug fix. There's progress, even if a little slow. Why can't the author have the decency of at least reading through? It is probably not their job. Why did they post their work online if they were not willing to build a commuity? They may have had a different goal in mind. They could simply want to share the work. Not deal with users, some of which have strong opinions and reflect it with equally strong language. There was a lot of high quality issues! And a large amount of angry comments. Not everyone has such thick skin. It gets to you. Your brain remembers bad experiences. Couldn't they just lock the issues? Make it clear it's a won't-fix. Maybe. The thought may not have crossed their mind. They may have weighed the pros and cons, and perhaps they wanted discussion to still happen, and take on the more interesting suggestions. They may only read through sparingly, picking up on a few. The rest would be closed automatically. Why did not they make their stance clear? Make it clear you're not accepting contributions. Why should they? They're just sharing their work. They'll work on things they find appealing. They don't want to deal with codes of conduct, educating users, writing documentation, or fixing obscure bugs. They just wanted to share their work. They're willing to accept patches. Why not leave the issues open so others can pick them up? Perhaps they just want an easy way to reduce clutter in the issues section. To keep only the ones the maintainer is actually interested in. The amount of issues wouldn't become a problem if they were triaged properly. Adding a tag is very easy! But it is still a non-zero amount of effort. You still need to think about it. Read through. Actively do it. And they may simply have no desire to do so. The maintainer is just lazy. Malpractice. But it's their issue tracker for their project. Wanting to keep a small amount of issues open is not necessarily evil. They may simply want to share code, not spend time moderating and triaging. The issue tracker is not only for the maintainer to use, it's for the users too! That is not an universal truth. The maintainer gets the final say on how their project is run. If they want public issues and sparingly work on some on their free time, they can do so. If the project is large enough, unofficial communities can form elsewhere. The place to share knowledge does not necessarily need to be "official". The maintainer does not understand their responsibilities. They should hand over their hat if they're not willing to deal with issues properly. The maintainer is simply sharing their work. There's no responsibility to keep issues open. This might be an unspoken "rule" of open source, but it's actually just a strong convention. The maintainer decides what "properly" means for their project. The existence of a stale bot should be clue enough on the maintainer's stance. The maintainer can keep sharing their work their way while forks exist. Closing issues as stale does not mean the issue is gone. They still exist. Certainly. But when browsing through the issues, there is no denying that there is a lot less clutter. And if your goal is to take a look every now and then, it works great. Stale bots waste everyone's time. It's not always obvious that they are being used, and those interested need to actively keep the issues open to prevent automatic closing. That's a choice the users make. The maintainer's stance is clear: the stale bot runs on the issue board. It is not the maintainer's problem that some users insist, even after it's clear that they maintainer doesn't care. The lack of discoverability on whether stale bots are being used or not is a fair point. However, if the repository lacks contribution guidelines, you might want to spare a minute to check for the existence of a "stale" label, or do a quick search to figure out if issues are closed as "stale". By doing this, you don't need to commit to creating well-detailed reports if you're not willing to put up with the fact that it might just get closed without the author reading it. Stale bots create fatigue for everyone, as duplicate issues are created and those watching get pinged. This is rude on part of the maintainer. Yes, this is annoying for the users. But not necessarily intentional malice from the maintainer. It is simply how they chose to automate a process. Open source has a few unspoken rules. Perhaps our junior developer was unaware of them. It is possible to educate them. Explain why you're against stale bots, and politely ask them to clarify what their stance is. Hobby projects are not an excuse to mistreat the community. Maintainers may not owe us support, but they owe us respect. The maintainer may not have wanted to have a community in the first place. Social rules may say this behaviour is impolite. This is a fair point. Ask the maintainer to make it clear what their stance is. This question is uncomfortable, so it may happen that the topic is avoided. Whatever the maintainer's answer is, remember they have the final say on how they run their project. You should respect this choice, even if you strongly dislike it and it makes you uncomfortable. Their project might not be for you. As unsatisfying as it is, it's a choice the maintainer can make. They may be aware of this problem and actively try to document how contributions are handled. But they don't have to. GitHub enables the issue tracker by default. The maintainer may want to keep it for their own use. They may not want to build a community. Users have their own expectations. There's an issue tracker, so I can report bugs, right? Well, yes. You can report issues. But that doesn't mean the maintainer has to read them. And this does not make them a bad person. Forking is always an option. Become a maintainer who is more engaged in building a community yourself. Triage the issues. This is not for everyone. Stale bots are simply automating a task. One solution of many. And like every solution, it comes with trade-offs. Yes, it feels more "humane" for the maintainer to close the report as "won't-fix" with an explanation. Not everyone has time for that, or the temperament to do so. The maintainer does not necessarily feel "shame" for having a lot of issues open. They might simply want to keep a small amount of them open. It's true. The maintainer may not care about your issues. But those are your issues, not theirs . They're free to spend their free time however they wish. And if they wish, they can choose to ignore and close said issues, while keeping a public project page up. If the maintainer's stance is not clear, take the opportunity to educate them on why it's important to make it clear. There are unspoken rules they may not be aware of. If your expectations don't match those of the maintainer, don't get angry at the maintainer. You can consider their use of stale bots as their stance on the matter. You can disagree. Take a step back. Read the contribution rules and don't engage if you disagree. Save yourself a bad time. To make it clear: this post's protagonist is a hobbyst. Of course, everyone's situation is different, and stale bots used by companies are a whole other matter. But don't go making other people's lives miserable. There's enough of that already. Some just want to code, not build a community. Harassing maintainers is never okay. Personally, I haven't used stale bots before, and don't plan to in the near future. I haven't burnt out on answering issues just yet. (Plus, I find the whole process extremely dull, even if they're easy to "set up".) When I was getting started in the open source world, I was thrilled to get bug reports and be able to answer. And not just bugs, but questions too. Code reviews, it was great! I did eventually get tired. Some of my past answers were starting to be clearly impolite. I've been a dick to people. Sorry about that. I've now realized angry answers help nobody, and try to at least provide some context, before closing issues myself if it's not actionable. The quality of my answer still depends on the quality of the question. But I really try to remain polite. I think this is the best a maintainer can do. But I acknowledge it takes a lot of time, temper, dedication. And so, I can understand those who choose to use stale bots instead. They may not be aware of all the trade-offs and implications, but they can be educated. Respect their choice, and keep it cool.

Career

Open Source

0 views

Lonami 4 years ago

Writing our own Cheat Engine: Multilevel pointers

Or: Dissecting Cheat Engine's Pointermaps This is part 8 on the Writing our own Cheat Engine series: In part 7 we learnt how to allocate memory in the remote process, and how we can use that memory to inject our own code for the remote process to execute. Although we didn't bother with an assembler, it shows just how strong this technique can really be. With it we've completed the Read, Write and eXecute trio! Now it's time to find how we can make our work persist. Having to manually find where some value lives is boring. If the game is able to find the player's health for its calculations, no matter how many times we restart it, then why can't we? This entry will review how Cheat Engine's pointermaps work, analyze them, and in the end we will approach the problem in our own way. Because this post is quite lengthy, here's a table of contents: This step will explain how to use multi-level pointers. In step 6 you had a simple level-1 pointer, with the first address found already being the real base address. This step however is a level-4 pointer. It has a pointer to a pointer to a pointer to a pointer to a pointer to the health. You basicly do the same as in step 6. Find out what accesses the value, look at the instruction and what probably is the base pointer value, and what is the offset, and already fill that in or write it down. But in this case the address you'll find will also be a pointer. You just have to find out the pointer to that pointer exactly the same way as you did with the value. Find out what accesses that address you found, look at the assembler instruction, note the probable instruction and offset, and use that, and continue till you can't get any further (usually when the base address is a static address, shown up as green). Click Change Value to let the tutorial access the health. If you think you've found the pointer path click Change Register. The pointers and value will then change and you'll have 3 seconds to freeze the address to 5000. Extra: This problem can also be solved using a auto assembler script, or using the pointer scanner. Extra2: In some situations it is recommended to change ce's codefinder settings to Access violations when encountering instructions like mov eax,[eax] since debugregisters show it AFTER it was changed, making it hard to find out the the value of the pointer. Extra3: If you're still reading. You might notice that when looking at the assembler instructions that the pointer is being read and filled out in the same codeblock (same routine, if you know assembler, look up till the start of the routine). This doesn't always happen, but can be really useful in finding a pointer when debugging is troublesome. If you say "pointer" enough, you'll end up having semantic satiation . My goal by the end of this post is that you actually get to experience that phenomenon. Anyway, no real program would actually have pointers pointing to pointers which themselves point to a different point (and, you've guessed it, this point points to another pointer pointing to yet another pointer), right? That would be silly. Why would I have a value behind, say, 5 references? I'm not writing Rust code like . But I am sure you are much more likely to be doing something like . And there's a lot of references there: Each of those is a different structure, with many fields each (for example, the areas also contain enemies and items dropped in different vectors). When there's more than one field, the pointer often points (sorry) to the beginning of the structure, and you need to add some offset to reach the desired field. If you have a reference to some but access the field, you actually need to read from . This is why the tutorial step suggests to "look at the instruction", because it very likely encodes the offset somewhere (if not directly, nearby). Looking at instructions to determine offsets works because normally people want their games to be fast, so they make good use of the available CPU instructions. Obfuscating hot code could slow a game way too much (but it may still be done to some degree!). To complicate things further, the same reference to one thing may be stored in multiple locations, making it possible to find your goal address through many different paths. In Rust, this happens when you have a shared pointer, such as or (or if you go the route and have the same value scattered around). The tutorial suggests to complete this step in the same way we did back in step 6. Add a watchpoint, find out what code is accessing this address, look around the disassembly, and write down your findings. Although this technique definitely is a valid way to approach the problem, it is quite tedious and error-prone. It would be hard to fully automate this, because who knows what shenanigans the code could be doing to calculate the right pointer and offset! Sure, Cheat Engine's tutorial is not going to purposedly obfuscate the instructions manipulating our target address. But other programs may be dynamically reading the offset from somewhere. This technique is also pretty intrusive, because it requires us to attach ourselves as the debugger of the victim program. I hardly have any experience writing debuggers, leave alone writing them in a way that makes them hard to detect! I'm sure it's a very interesting topic, but it's not the current topic at hand, so we'll leave it be. If you know of good resources for this, let me know so I can link them here. Furthermore, we've already gone this route before, so it would be silly to repeat that here, just to end up with a longer version of it. You may have noticed the "extra" information the tutorial step provides: Extra: This problem can also be solved using a auto assembler script, or using the pointer scanner. We've already done the "auto assembler script" part before (in part 7). I'm not sure how you would approach this problem with that technique. Maybe one could dig until the base pointer, and replace whatever read is happening there with a hardcoded value so that the game thinks that's what it actually read? I'm not sure if it would be possible to solve with injected code without following the entire pointer chain. Or maybe you could instead patch the write to use a fixed value. But anyway, we're not doing that, no manual work will happen on this one. No, we're interested in the pointer scanner ↪1 . Once you find a value in Cheat Engine, you have the option to "Generate pointermap". This will prompt you to select a file where the generated pointermap will be stored, in format (along with its ). If you're scanning a lot of memory, you will get to see a progress window (otherwise, it will be pretty much instant), along with some statistics: My guess for "unique pointervalues" is the set of pointers found so far, and the queues may be used by the way the scan is done, presumably hinting at an implementation detail. The rest of information is pretty much self-explanatory (lowest known path probably is the shortest "pointer path" found so far). When I talk about "pointer paths", I'm referring to a known, static base address that won't change, with a list of offsets that, when followed, arrive at some desired value in memory (for example, your character's health). In essence, it's a path made out of pointers, with a new pointer to follow at each step. The solution found with Cheat Engine for this tutorial step makes for a good example: Let's get back to talking about Cheat Engine's scan. After generating the pointermap, the idea is to force the game to change the pointer path (for example, by closing and re-opening the game again) and find your target value once again. For the tutorial, we can just change the pointer. After we find the value again, we do a "Pointer scan for this address". The "Pointerscanner options" has a checkbox to "Compare results with other saved pointermap(s)". Running this seems to generate a second pointermap, and after some magic, both are compared and the one true pointer path is found ↪2 . There's a bunch of files generated: Now, there's this one option under "advanced" known as "Compress pointerscan file". The long description reads (emphasis mine): Compresses the generated .PTR files slightly , so they take less space on the disk and less time writing to disk. Most of the time the bottleneck of a pointerscan is disk writing, so it is recommended to use this option. Slightly, huh. Well, for the tutorial, which is using (according to the task manager) 2'364 K, running the scan with the compression disabled generates roughly 5 gigabytes across the nine . That's… not too shabby for a "slight" compression. Let's guess what those files are storing. The screen with the results does say it found uh, well you know, the usual, 122'808'639 pointer paths. This is the result of scanning for an address. That's (very) roughly 40 bytes per path, and assuming 8 bytes for each address/offset, equates to 5 hops. I guess the math kind of checks out? On the other hand, "generate pointermap" just spits out the at roughly 60KB. So these two options are definitely doing something very, very different. And I have no idea what either of these are doing. Let's dive into Cheat Engine's "advanced options" for the pointer scan to try and gain some insight. I will be listing all the settings available in the scan form and adding a bit on whether I think they're useful to us or not. The Pointerscanner scanoptions window has plenty of options that are extremely valuable to gain insight of what's going on behind the scenes without having to dig into the code. At the very top we have three modes: The third option is what we use during the first step, and the first option for the second step. When using either the first or second mode, you can also check Use saved pointermap which you can use if you have created a pointermap on a system that runs the game, but you wish to do the scan on another system (or multiple systems). With the first or second mode, you can also Compare results with other saved pointermap(s) which, when ticked, lets you add other pointermaps which will be used to verify that the pointers it finds are correct. You do have to fill in the correct address for each pointermap provided, and one should expect at least the size of the game itself in memory for every pointermap used. We know this step is key, but we don't know how that comparison could be possibly done. The checkbox Include system modules I presume also scans in system modules and not just game's own modules, which is useful if you suspect the value lives elsewhere. Not helpful for us right now, but good to know this is a possibility. Apparently, Cheat Engine can improve pointerscan with gathered heap data. The heap is used to figure out the offset sizes, instead of blindly guessing them. This should greatly improve speed and a lot less useless results and give perfect pointers, but if the game allocates gigantic chunks of heap memory, and then divides it up itself, this will give wrong results. If you only allow static and heap addresses in the path, when the address searched isn't a heap address, the scan will return 0 results. I do not really know how Cheat Engine gathers heap data here to improve the pointerscan, but since this mode is unchecked by default, we should be fine without it. By default, the pointer path may only be inside the region 0000000000000000-7FFFFFFFFFFFFFFF. There's a fancier option to limit scan to specified region file, which presumably enables a more complex, discontinuous region. Or you can filter pointers so that they end with specific offsets ↪15 . Or you can indicate that the base address must be in specific range, which will only mark the given range as valid base address (this reduces the number of results, and internally makes use of the "Only find paths with a static address" feature by marking the provided range as static only, so it must be enabled). Pointers with read-only nodes are excluded by default, so the pointerscan will throw away memory that is readonly. When it looks for paths, it won't encounter paths that pass through read only memory blocks. This is often faster and yields less useless results, but if the game decides to mark a pointer as readonly Cheat Engine won't find it. Only paths with a static address are "found". The pointerscan will only store a path when it starts with a static address (or easily looked up address). It may miss pointers that are accessed through special paths like thread local storage (but even then they'd be useless for Cheat Engine as they will change). When it's disabled, it finds every single pointer path. Now, this bit is interesting, because the checkbox talks about "find", but the description talks about "store", so we can guess there's no trick to only "finding" correct ones. It's going to find a lot of things, and many of them will be discarded. It also mentions thread-local storage and how we probably shouldn't worry about it. Cheat Engine won't stop traversing a path when a static has been found by default. When the pointerscanner goes through the list of pointervalues with a specific value, this will stop exploring other paths as soon as it encounters a static pointer to that value. By enabling this option, some valid results could be missed. This talks about "pointervalues with a specific value", which is a bit too obscure for me to try and make any sense out of it. Addresses must be 32-bit alligned. Only pointers that are stored in an address dividable by 4 are looked at. When disabled, it won't bother. It enables fast scans, but "on some horrible designed games that you shouldn't even play it won't find the paths". Values in memory are often aligned, so reducing the search space by 75% ↪16 is a no-brainer. Cheat Engine can optionally verify that the first element of pointerstruct must point to module (e.g vtable). Object oriented programming languages tend to implement classobjects by having a pointer in the first element to something that describes the class. With this option enabled, Cheat Engine will check if it's a classobject by checking that rule. If not, it won't see it as a pointer. It should yield a tremendous speed increase and almost perfect pointers, but it doesn't work with runtime generated classes (Java, .NET). Optionally, it can also accept non-module addresses. I have no idea how this is achieved, but since it's disabled by default, we should be able to safely ignore it. By default, no looping pointers are allowed. This will filter out pointerpaths that ended up in a loop (for example, base->p1->p2->p3->p1->p4 since you could just as well do base->p1->p4 then, so throw this one away (base->p1->p4 will be found another way)). This gives less results so less diskspace used, but slightly slows down the scan as it needs to check for loops every single iteration. The thought of how much data the 5GB scan would generate without this option makes me shiver. Cheat Engine will allow stack addresses of the first thread(s) to be handled as static, which allows the stack of threads to be seen as static addresses by the pointerscan. The main thread is always a sure bet that it's the first one in the list. And often the second thread created is pretty stable as well. With more there's a bigger chance they get created and destroyed randomly. When a program enters a function and exits it, the stack pointer decreases and increases, and the data there gets written to. The farther the game is inside function calls, the more static the older data will be. With max stack offset you can set the max size that can be deemed as static enough (the max stackoffset to be deemed static enough is 4096 by default). It finds paths otherwise never found, but since there are more results, there's more diskspace. Cheat Engine by default will look at the stacks of two threads, from oldest to newest. It indicates "the total number of threads that should be allowed to be used as a stack lookup. Thread 1 is usually the main thread of the game, but if that one spawns another thread for game related events, you might want to have that secondary thread as well. More threads is not recommend as they may get created and destroyed on the fly, and are therefore useless as a lookup base, but it depends on the game". Unfortunately, this option is enabled by default, so it seems pretty important, and we might need to put some work into figuring out how "stacks" are found. However, this would mean that some "base" object (like a instance) is passed down by reference hundreds of calls, which seems pretty annoying just to have access to something that effectively acts like a global, so hopefully games don't make use of this. This can be taken a step further, and consider stack addresses as ONLY static address, if you wish to only find pointer paths with a stack address. It must be combined with "Only find paths with a static address" (default on) else this option will have no effect. You'll only get paths from the stack, but you don't get get paths from random DLL's or the executable. The pointerscan file is by default compressed. Cheat Engine Compresses the generated .PTR files slightly so they take less space on the disk and less time writing to disk. Most of the time the bottleneck of a pointerscan is disk writing, so it is recommended to use this option (which was not available in older versions). Only positive offsets are scanned by default, but Cheat Engine may optionally scan for negative offsets as well (although it can not be used in combination with compressed pointerscan files; this seems to hint that the compression assumes only positive values). On my machine, 9 threads are scanning by default with a maximum offset value of 4095 and a maximum level (depth) of 7. The maximum different offsets per node are 3. When the pointerscan looks through the list of pointers with a specific value, it goes through every single pointer that has that value. Every time increasing the offset slightly. With this feature enabled the pointerscan will only check the first few pointers with that value. This is extremely fast, and the results have the lowest pointer paths possible, but you'll miss a lot of pointers that might be valid too. I think this description is key, as it clearly says what the pointerscan does and maybe even how it works (although it sounds a bit inefficient, so Cheat Engine probably uses other tricks). Cheat Engine clearly knows this process is expensive, so it optionally allow scanners to connect at runtime. This opens a port that other systems running the pointerscanner can connect to and help out with the scan. Or it can connect to pointerscan node, which will send a broadcast message on the local network which will tell pointer scanner systems to join this scan if they are set to auto join (or "Setup specific IP's to notify" to notify systems of this scan that are outside of the local network). And that's all! In summary: After playing around a bit more with Cheat Engine's scans, I realized the 14 bytes of the is because the process literally finds a single path which it places there. Running the process with compression and no previous scan to compare it to spits out roughly 750MB (so the compression does go from 5GB to 750MB, that's a lot more reasonable). In any case, we're with the now. I really do wonder what could it possibly contain? I really doubt it's the pointer paths found, because then it would be huge. Perhaps it contains the memory regions? That would make some sense, since the sibling is a list of all the loaded modules. Maybe the contains the memory regions for all of those loaded modules. For the first time in this series, I really don't know how Cheat Engine could be working behind the scenes. Is it really evaluating millions of paths ? That's a lot of memory, no matter how you encode it! I'm really impressed at the processing speed if this is in fact the case. Let's see how a naive approach for that could look like ↪3 . We start off with a single address, the address of a particular value we care about in memory (for example, the player's health). This address is an 8-byte number (which for us is an ), so we can look for pointer-values (values in memory that look like a pointer to a certain address) that point to this address (or close enough). Let's call this . For every memory block, and for every pointer value in it, we check if the distance between the and the falls within an arbitrary range, for example: There's a lot of things to unpack in this small snippet: At the end of this, will have many memory addresses pointing to a different each. This points to minus some offset (which can be calculated at any time by substracting again). These are the pointer-values at depth 0 ↪9 . Each of these is in itself the next , and running the process again will produce pointer-values for depth 1. Yes, you've guessed it, this has exponential growth. No wonder Cheat Engine finds millions of paths. And don't forget to somehow save "this address came from this other address", so that you can follow the chain back after you're done! Note the importance of limiting the depth: not only this growth has to stop at some point, but also think about cyclic paths. The program would get stuck as soon as , looking for itself over and over again! Without actively looking for cycles ↪10 , and without limiting the depth, the process would never finish. After the full process completes (having executed multiple iterations of it at multiple depths), we would need to check every path to see if it works for us (that is, if it starts with a "static address"). You will have an obscene amount of paths, many of which won't actually work after restarting the program (it might have been luck that some unrelated component got allocated close to your original but now it's not anymore). So how do we clean this mess up? We run the process again! Preferably, after the memory has shuffled around enough (for example, again, restarting the program). Once we have the list of paths "before" and "after", we compare them all. The naive approach of checking, for every path in "before", if any of the paths in "after" is the same, would yield a sweet time complexity of , with millions of paths. This ain't gonna cut it. We must do better. I don't know if this is what Cheat Engine is doing (but if it is, I tip my hat to them), but since I can't think of an efficient way to do it, we'll be going a different route. By reading an entire block of memory at a time, we're actually doing pretty okay on that department. It would be very, very wasteful to issue millions of reads of 8 bytes, when we could instead run thousands of reads of several kilobytes (or more!). Of course, we still have to read millions of 8-bytes, but if they're in our memory and don't require a call to the Windows API, it's going to be orders of magnitude faster. We're only reading aligned pointers, cutting down the amount of reads and checks we perform down to . A lot of useless results are also discarded this way. We're only considering positive offsets, and we're limiting how "far" the can be from a possible before we stop considering said . After all, a structure longer than 4096 bytes should hopefully be uncommon. By doing this, we only keep "address-like" values, which have a very high chance of being an actual address, although they could very well not be! We may be finding arbitrary values and think they represent an address when they actually don't. We're limiting the maximum depth we're willing to go. This depth directly correlates to the maximum length a pointer path can have. If you're confident the path won't be longer than, say, 5 addressses, there's no need to dive any deeper, and you will save on a lot of processing this way. This code can be made parallel trivially (after making Rust compiler happy, anyway). There is a lot of values to scan for, so if we think of as a "queue of work", more than one thread can be popping from it and running the scan. This gives a nice boost on multi-core systems. It doesn't entirely scale linearly with the number of cores, but it's close enough to what you would expect. A pointer path will only be considered if it starts with a static address. This means the last address pushed must be static (the path will have been built backwards, because we started at the end, ). This should clean-up a lot of intermediate and uninteresting addresses. If the address isn't static, it's not really interesting to us. Remember, the reason we're doing all of this is so that we can reuse said address in the future, without the need to find manually. Comparing the pointer paths will result in paths that very likely will work in the future. Not only is this important to reduce the number of paths drastically, but it also provides better guarantees about what is a "good", reliable path to follow to find . Next up, let's talk about some of the more intrusive optimizations which I actually seeked to reach an acceptable runtime. This will be where I started to code this up. Add a braindump mess enough to find pointerpaths This is the commit message that made it possible to complete step 8 on the tutorial (the actual commit message has quite some more lines explaining the commit). Unlike previous entries of this series, I had a hard time making incremental progress. So let's dissect what was done instead. The approach used in this commit (although really messy), consists on taking two "snapshots" of the memory, and knowing where a desired value is located in both. By introducing the concept of "snapshots", we can "freeze" the process' memory at a given point in time, and scan it at our leisure, without having to worry about it changing. Not only this, but it also saves on a lot of calls to , so it's also more efficient. If memory is an issue, these structures could be saved to disk and streamed instead. I haven't measured how fast this is, but having our own copy of the process' memory lets us run the scan even after the process is closed (and by then we would reclaim some of that memory), so this approach is mostly benefits. Pretty straightforward. A consists of the process' memory along with some metadata for the blocks. This lets us know, given an index into , what its real address (or vice versa): Because we've sorted by , and we filled the in order, we can in both cases. translates into as follows: Because this time we already own the memory, we can return a slice and avoid allocations ↪11 . Now that we have two snapshots of the process' memory at different points in time (so the pointer-values to are different), we find in both snapshots (it should be a different pointer-value, unless it so happens to be in static memory already). Then, the pointer value of the address is searched in the second snapshot (within a certain range, it does not need to be exact). For every value found, a certain offset will have been used. Now, the pointer value minus this exact offset must be found exactly on the other snapshot (it does not matter which snapshot you start with ↪12 ). This was my "aha!" moment, and it's a key step, so let's make sure we understand why we're doing this. Rather than guessing candidate pointer-values which would have a given offset as a standalone step, we merge this with the comparison step, insanely reducing the amount of candidates. Before, any pointer-value close enough to had to be considered, and in a process with megabytes or gigabytes of memory, this is going to be a lot. However, by keeping only the pointer-values (which have a given offset) that also exist on the alternate snapshot with the exact value, we're tremendously reducing the number of false positives. is like , but better for our needs, because it automatically returns the pointer-values and its corresponding real address efficiently. The is a helper to avoid passing , and as parameters on every call. is a recursive method which is called with the in both the first and second snapshot, along with a depth. When this depth reaches 0, the method stops recursing. The method also stops when a base (static) address is found. The method starts by looking for all pointer-values in the second snapshot where for all . For every candidate with a given , it looks exactly for in the alternate snapshot (the first one). Once found, we have a candidate offset valid in both snapshots, and then we can recurse to find subsequent offsets on the real addresses of these pointer values themselves. The addresses of these pointer-values are our new in the next depth. Once returns from the top-most depth, we can convert post-process into something usable, with an algorithm akin to run-length encoding (the real-code abuses the vector's and to determine the and had inaccurate names, so I've rewritten that part for clarity): Note how this process can form a tree. Any given depth can have any amount of children. For example, if the address finding yields the following addresses (where the hundreds' also represent the depth): Then this represents the following call-stack tree: Once the many paths have been cleaned up into a separate vector each, we can turn these addresses into offsets: For the example above, the result would be: In order to reach , we have to read the from and add a given . This is given by . By iterating the list of addresses in reverse, we can neatly turn them into offsets substracting this . Now we're done! We can persist this list of and it will work at any point in the future to get back to our original . In pseudo-code: By the way, sometimes the scan will take horribly long and find thousands of path, and sometimes it will be blazingly fast. I don't know why this is the case, but if that happens, you can try restarting the tutorial. And do not forget to run on mode, or you will definitely be waiting a long, long time. The recursive implements a fairly elegant solution. Unfortunately, this is hard to parallelize, as it all runs on the same thread and there is no clean way to introduce threads here ↪13 . We will rewrite this version to use a queue instead, with the idea that multiple threads will be taking work from it. In order to do this, let's introduce two new concepts: The should be as small as possible, because there will be one for every address of the candidate pointer-values. Without doing anything fancy, we'll need an optional to build a "linked list" of the path, and the address of the pointer-value. With the field, we can trace all of the parent candidate nodes all the way back up to the root node. The will hold temporary values, until a thread picks it up and carries on, so there's no need to over-optimize this. For a thread to continue, it needs to know the pointer-value address and its parent (that is, the candidate node it will work on), along with the first and second goal address for a given depth. After the process completes (a base or static address is found), it's enough to remember the candidate node, as we'll later be able to follow the chain. Thus, the needs to hold the following values: This version probably uses more memory, as we need to remember all because any live may be referencing them, and a itself has parents. It should be possible to prune them if it gets too large, although a lot of indices would need to be adjusted, so for now, we don't worry about pruning that tree (which we store as a and the references to the parent are indirect through the use of indices). However, this version can use threads much more easily. It's enough to wrap all the inside a . And what's more, it is now trivial to perform the search breadth-first instead! With the recursive version, we were stuck performing a depth-first search, which is unfortunate, because the first valid paths which would be found would be the deepest. But now that we have our own work queue, if we keep it sorted by depth, we can easily switch to running breadth-first. Shorter paths feel better, because there's less hops to go through, and less things that could go wrong: Thanks to Rust's wrong decision of making be max-heaps ↪14 , and our use of a decreasing depth as we get deeper, the ordering just works out! Next up, threads should be introduced for the next big-boost in runtime performance. This isn't too tricky, but I would recommend you introduce by now and persist both and so that you can easily debug this. Running the program on Cheat Engine's tutorial gets boring fast. I'll leave both of these as an exercise to the reader. Just make sure the threads don't end prematurely, because even if there is no work now , it doesn't mean there won't be a few milliseconds later. Else you will be back at single-threaded execution! After adding threads, I kept poking around the program and seeing how seemingly-innocent changes made runtime performance a fair bit worse. Here's some of the insights I got: It turns out isn't called a lot while analyzing Cheat Engine's tutorial, so it better be fast. And one way to go fast is to do less! For every future node, we have to read and compare an entire snapshot of the process' memory against a value. For 8MiB worth of memory, that's over a million comparisons! Using threads can only scale as far as the amounnt of cores you have before degrading quickly. A lot of those comparisons won't be useful at all, and if the method runs a hundred times, there can easily be 6MiB that you could avoid scanning at all, a hundred times. What if, instead, we run some sort of "pre-scan" that tells us "don't bother looking around here, you will not find anything useful"? We totally can, and the good news is, it does improve the runtime quite a bit! In order to do this, we need another way of instructing the program where to look. We can do this by adding additional information to each block (either directly or indirectly) that tells us "which other blocks have pointer-values that point into us?": When running the scan (via ), instead of running over all addresses, we determine the block where the current falls in and scan only on the blocks indicated by . I did some math, and on the tutorial step, rather than scanning 95 blocks, we scan an average of 3.145 blocks (median 2, standard deviation 6.12), which greatly reduces the amount of work that needs to be done on a snapshot which is roughly 10 MiB. There's a chance that the block we're scanning just so happens to be very "busy" and have a lot of blocks pointing into it (which would make sense, as that's probably an indication that the interesting things occur there). However, it is definitely possible to improve on the heuristics, all with different trade-offs. The simplest heuristic is "assume every block can point to any other block" (which we were doing before). A slightly better one is "determine which blocks have a chance of pointing into other blocks". You could even narrow down the "scan area" within blocks to make them "smaller", for example, by finding the bounding addresses of "interest" and trimming the block size. You could sort the blocks differently, perhaps prioritizing when a block points into itself, or add additional exit conditions. But this is plenty fast, even more so if you use threads for as well! Another idea would be dropping some blocks entirely (although this is partially mitigated thanks to ). If a base block (i.e. it starts where a module does) doesn't belong to the program in question (for example, it belongs to a system DLL), we could drop it, and don't even consider it in . This is probably what Cheat Engine is doing with "Include system modules", although I haven't experimented much with that option. The downside is, if it just so happens the offsets follow a path through that block, it won't be found. But it shouldn't be a big deal when plenty of paths are found. In order to ignore system DLLs, it should be possible to find the module names and then where are they located (pretty much emulating the Dynamic-Link Library Search Order ). If it falls within system directories, then we would ignore it. If we want to reduce the search-space even more, we could specify a range of addresses. When any address falls outside this range, it is ignored. I believe Cheat Engine's default 0000000000000000-7FFFFFFFFFFFFFFF range is pretty much "scan all of it", as we're doing, but with more knowledge of the program at hand, you could definitely narrow this down. Because we're not directly working with offsets (they are calculated after, and not before finding a candidate), I'm not sure how we could accurately implement Cheat Engine's option for "maximum offsets per node". Perhaps by building a temporary , sorting them in descending order, and only considering the first few smallest ones? More testing would be necessary to see if this is worthwhile. Beyond this last optimization, I can't think of any other worthwhile implementing though. We should be getting pretty close to somewhere optimal. Anyway, let's finish this tutorial step, shall we?: After letting this post settle down on me, I realized we probably managed to re-invent the way Cheat Engine works, or at least most of it, something I'm quite proud of! If you want to have this idea "click" in your head by yourself, you can skip this section. But really, there's an awful lot of similarities, and even matching terminology to some extent. Recall back in the scan options section, the two primary modes were Scan for address and Generate pointermap . Scanning for an address with the setting "Compare results with other saved pointermap(s)" straight up sounds like the solution we came up with. We take two snapshots (the older one being equivalent to Cheat Engine's "saved pointermap") and perform a scan for the desired address, while comparing our intermediate results with the other pointermap to make sure it is still valid. Its job is to find all candidate paths, and if you were not comparing it to anything, obviously this would lead to a lot of false positives, which is why Cheat Engine advices against it. Remember when we talked about "the pointerscanner goes through the list of pointervalues with a specific value"? This sounds a lot like our queue, too. The scan settings even mention "Static and dynamic queue sizes", possibly hinting at this implementation detail (as opposed to using unbounded recursion). And what could a pointermap be other than… a mapping between pointers? This sounds like an awful lot to our "pre-scan" which scanned all the regions to find out "which regions could contain valid pointers into which other regions". That's a mapping of regions as determined by the pointers contained within them, and perhaps Cheat Engine only cares to store worthwhile snapshots of the memory and the corresponding regions. Maybe this is what Cheat Engine means by limiting the scan only to certain regions! And this my dear readers concludes my ambitions with the project! I think the program is pretty useful by now, even if it can only do a small fraction of what Cheat Engine can (I don't think I'm ready to write a form designer GUI yet… wait why was this part of Cheat Engine again?). Despite the length of this entry, we didn't even figure out how Cheat Engine's pointer scanner works. Maybe it really is finding millions of possible paths, perhaps storing the offsets in some compact way . Although we can't know for sure what Cheat Engine is doing behind the scenes without studying its source code, we came pretty darn close to it. Let's recap what we do have learnt: The code for this post is available over at my GitHub. You can run after cloning the repository to get the right version of the code. If you're feeling up for a challenge, try to find a different, faster way (as in, less computationally-expensive) in which you can complete this tutorial step. Although ways to cut down the amount of work that needs to be done are definitely welcome, I'm looking for an entirely different approach, which can, for the most part, side-step the "there's too much work" issue. In the next post, we'll tackle the ninth step of the tutorial: Shared code. I'm hoping it won't be too difficult, although there will be some learning that needs to be done. After that, I'll probably conclude the series. Maybe there could be some bonus episode in the future, or some other form of progress update. Until next time! 1 I spent a good chunk of time figuring out how to get this effect on the text (and borrowing code from several sites), but I'm extremely satisfied with the result. You do need a "modern" browser to see what I mean, though. I also lost it after the fact and had to redo it. Oh well. ↩ 2 Actually, over a couple hundred are often found. But there's a high chance most of them would work just fine. ↩ 3 I've gone through a lot of iterations for this post, with a fair amount of messy code, so this time I'll be explaining my thought process with new code rather than embedding what I've actually ended up writing. ↩ 4 Only if starts off as an aligned address, of course. But I think memory regions must start at multiples of the page size, which is a (relatively) large power of two, so it's safe to assume is divisible by 8. You could throw in an if you wanted to be extra sure. ↩ 5 Although, just like we assume is a multiple of 8, the probably is as well. ↩ 6 We could from to and dereference, but then we need to be careful about alignment, and seems to be plenty fast already. ↩ 7 Not that we actually care about debug builds, as they run several orders of magnitude slower. But still, has the right semantics here. ↩ 8 Most of the time pointers point to the beginning of some structure, not its end, so accessing this structure's fields is done by adding, and not substracting, an offset from the pointer-value. For example: ↩ 9 Or the top-depth, however you want to see it. I personally prefer starting at the highest depth so that when zero is reached, we know we're at the end. ↩ 10 Which really, I don't think is worth it at all. If Cheat Engine is finding millions of entire paths , what kind of magic is it using to find cycles at any two depths??? ↩ 11 Yes, could be changed to take in a buffer as input instead, so that it can be reused. Or it could even have an internal buffer. But we won't be using this method much anyway. ↩ 12 I prefer starting on the second snapshot because it feels more "fresh", as it's the latest one, although it doesn't really matter, because the path we're looking for must be valid in both anyway. ↩ 13 Maybe the recursive could run in a pool of threads? ↩ 14 Most heaps tend to be min-heaps, and it's not uncommon for the use of in Rust to need in order to get min-heap behaviour . There's been some discussion on internals about this, such as Why is std::collections::BinaryHeap a max-heap? and more recently Specializing BinaryHeap to MaxHeap and MinHeap where @matklad laments: ↩ I feel like our heap accumulated a bunch of problems (wrong default order, slow into-sorted, wrong into-iter, confusing naming, slow-perf due to being binary). 15 This sounds like it would be most useful when you've already put the work before, and is now time for a re-scan. In this scenario, you already know that there's probably some golden "offset" into the structure you care about. ↩ 16 87.5% for us, thanks to having 8-byte sized pointers! ↩ Part 1: Introduction (start here if you're new to the series!) Part 2: Exact Value scanning Part 3: Unknown initial value Part 4: Floating points Part 5: Code finder Part 6: Pointers Part 7: Code Injection Part 8: Multilevel pointers Multilevel pointers Pointers pointing points Pointer maps Single-threaded naive approach Speeding up the scan Working out a PoC Doing more for better runtime speed Doing less for better runtime speed Retrospective Unique pointervalues in target. Scan duration. Paths evaluated. Paths / seconds. Static and dynamic queue sizes. Results found. Time spent writing. Lowest known path. is a bunch of binary data that I have no idea what could contain. seems to contain , one per line, of the addresses you had "saved" when the first pointermap was made. This seems to be used when performing the pointer scan and comparing results (so that you can choose the address you want to compare it to). is 1201 bytes (such an strange size) and seems to contain a list of the modules loaded by the program , where is a number between 0 and 8, are mostly empty files (except for 4 which is 14 bytes). Scan for address Scan for addresses with value Generate pointermap Assume addresses are 32-bit aligned (maybe even 64-bit). Discard paths that don't end in a static address (bonus points if the top of the stack for the firsts two threads are also considered). Ignore read-only memory. Limit the number of offsets per pointer to something small like 3, and give up after reaching a depth greater than 7. Limit the offset range to . Use multiple threads. We're only interested in regions that are both readable and writable, pretty much like Cheat Engine is doing. If we can't read a memory region, we can just skip it. Our desired address is probably not there. There's a lot of regions anyway so this is probably a good thing as we can reduce the scanning time! achieves multiple things: It's the most concise way to read chunks of 8 bytes in size, the alternative being having a and then slicing on . It will look on aligned addresses ↪4 for free (the alternative being , which would also look for unaligned addresses). It makes sure the chunk is always 8 bytes in size, which is important ↪5 , because is also 8 bytes in size on 64-bit machines. Interpreting 8 bytes of memory as an can be safely (and efficiently!) ↪6 achieved through , which expects an . Thankfully, we can convert the 8-byte-long slice into an array pretty easily with . It's important to use , because the operator would panic on underflow on debug by default ↪7 . Since we're reading all of the memory in the program, there will be a lot of values, many of which would be less than , causing underflow. Note also how we could interpret the values as instead so that a negative offset could be used in the range. However, a negative offset is much less common ↪8 , so it's fine to stick with positive offsets. We have a candidate pointer-value, so we make sure to store its address. Using or not (by placing the inverted condition inside the loop with a ) can both help or hurt performance. Hoisting certain conditions, like , and duplicating the entire loop body rather than running it every time, can hurt performance. The moments when you should wake up threads matters (if your approach works in a way where this matters). Changing the order in which you compute certain values and then use them can matter. introduces a fair bit of overhead due to alignment concerns, and can easily be reduced from 24 bytes to 16 by getting rid of the and instead using a special value to signal "no-parent". Atomics are neat, but a bit annoying to use. Crates like make them easier to use while still not using locks if possible. You can beat Rust's functional-style iterators performance by writing your own custom iterator, but it isn't trivial to do so. Messing with larger (such as changing for ) or smaller (such as changing for ) types can hurt performance. We're experts in pointers by now! Seven layers of indirection? Easy peasy. There's a lot of configuration available for pointer scans: search depth, search breadth, search order, memory ranges, memory maps… One way to turn exponential problems into something more approachable is either finding an algorithm without the exponential growth, or trimming the amount of work to be done by a lot . And sometimes the former alternative is impossible.

Java

0 views

Lonami 5 years ago

Writing our own Cheat Engine: Code Injection

This is part 7 on the Writing our own Cheat Engine series: In part 6 we ended up spending most of the time in upgrading our breakpoint support to have a proper implementation, rather than using some hardcoded constants. We then made use of the new and improved breakpoint support to find what code accessed an specific memory address our very own debugger. To complete the tutorial, we read and understood the surrounding assembly around the code accessing our address and figured out what pointer to look for. In the end, we were left with a base address that we can rely on and follow to reach the target memory address, without having to scan for it every time. In this post, we will take a look at the different techniques Cheat Engine uses to patch instructions with as many other instructions as we need. Code injection is a technique where you inject a piece of code into the target process, and then reroute the execution of code to go through your own written code. In this tutorial you'll have a health value and a button that will decrease your health by 1 each time you click it. Your task is to use code injection to make the button increase your health by 2 each time it is clicked. Start with finding the address and then find what writes to it. Then when you've found the code that decreases it browse to that address in the disassembler, and open the auto assembler window (ctrl+a). There click on template and then code injection, and give it the address that decreases health (if it isn't already filled in correctly). That will generate a basic auto assembler injection framework you can use for your code. Notice the alloc, that will allocate a block of memory for your code cave, in the past, in the pre windows 2000 systems, people had to find code caves in the memory (regions of memory unused by the game), but that's luckily a thing of the past since windows 2000, and will these days cause errors when trying to be used, due to SP2 of XP and the NX bit of new CPU's Also notice the line and and the text "Place your code here". As you guessed it, write your code here that will increase the health with 2. An usefull assembler instruction in this case is the "ADD instruction". Here are a few examples: In this case, you'll have to use the same thing between the brackets as the original code has that decreases your health Notice: It is recommended to delete the line that decreases your health from the original code section, else you'll have to increase your health with 3 (you increase with 3, the original code decreases with 1, so the end result is increase with 2), which might become confusing. But it's all up to you and your programming. Notice 2: In some games the original code can exist out of multiple instructions, and sometimes, not always, it might happen that a code at another place jumps into your jump instruction end will then cause unknown behavior. If that happens, you should usually look near that instruction and see the jumps and fix it, or perhaps even choose to use a different address to do the code injection from. As long as you're able to figure out the address to change from inside your injected code. The Instruction Set Architecture (ISA) a typical desktop computer is able to interpret uses a variable-length encoding for the instructions (do correct me if this is phrased incorrectly; it's not my area of expertise). That means we can't go and blindly replace a instruction with the code we need. We need to be careful, and still hope that no code dynamically jumps to this very specific location. Otherwise we may end up executing Unintended Instructions ! The way Cheat Engine gets around this is by replacing the instruction with a jump. After the offending code is found, you can use a "template" that prompts "On what address do you want the jump?". After accepting the "code inject template", a window with the following code shows: It seems Cheat Engine has its own mini-language that extends assembly using Intel-syntax. It has which do… well, stuff. seems to allocate bytes at some address and assign to it. is where the jump to the newly-allocated memory will be inserted. seems to be used to define a label. Unlike your usual assembler, it appears we need to define the labels beforehand. A label may also be an address directly, in this case, . Cheat Engine will overwrite code from this address onwards. Executing Cheat Engine's assembler will greet you with the following message, provided everything went okay: Information The code injection was successfull newmem=FFFF0000 Go to FFFF0000? If we navigate to the address, we find the following: So, before this address we don't know what's in there. At the address, our newly inserted code is present, and after the code, a lot of zero values (which happen to be interpreted as ). After the allocated region (in our case, 2048 bytes), more unknown memory follows. The old code was replaced with the jump: Note how the instruction ( , 7 bytes) was replaced with both a ( , 5 bytes) and a ( , 2 bytes), both occupying 7 bytes. Because the size was respected, any old jumps will still fall in the same locations. But we were lucky to be working with 7 whole bytes to ourselves. What happens if we try to do the same on, say, a which is only 1 byte long? Interesting! A single byte is obviously not enough, so Cheat Engine goes ahead and replaces two instructions with the jump, even though we only intended to replace one. Note the old code at , it contains the and the next instruction (this was just before the code we are meant to replace, so I picked it as the example). Cheat Engine is obviously careful to both pick as many instructions as it needs to fit a , and the template pads the with as many bytes as it needs to respect the old size. If you attempt to assemble a longer instruction to replace a smaller one inline (as opposed to use the assembler with templates), Cheat Engine will warn you: Confirmation The generated code is 6 byte(s) long, but the selected opcode is 1 byte(s) long! Do you want to replace the incomplete opcode(s) with NOP's? Yes No Cancel Selecting "No" will leave the incomplete bytes as they were before (in the case you replace a long instruction with a short one), which is very likely to leave garbage instructions behind and mess up with even more instructions. When we initialize a new via , Rust will allocate enough space for 2048 items in a memory region that will belong to us. But we need this memory to belong to a different process, so that the remote process is the one with full Read, Write and eXecute access. There's quite a few ways to allocate memory: , , , … just to name a few! A process may even embed its own allocator which works on top of any of these. Each of these functions has its own purpose, with different tradeoffs, but the comparison on allocation methods notes: Starting with 32-bit Windows, and are implemented as wrapper functions that call using a handle to the process's default heap. Cool! That's two down. seems to be useful in COM-based applications, which we don't care about, and : […] allows you to specify additional options for memory allocation. However, its allocations use a page granularity, so using can result in higher memory usage. …which we don't care about, either. Since requires "A handle to the heap from which the memory will be allocated", and as far as I can tell, there is no easy way to do this for a different process, we'll turn our attention back to . The documentation reads: To allocate memory in the address space of another process, use the function. There's our function! But before we can use it, we should figure out how the memory allocated by Cheat Engine looks like. I'll be using this code: The results: So far, so good. This matches the address Cheat Engine was telling us about. It appears region 0x7ffe3000 was split to accomodate for region 0xffff0000, and the remaining had to become region 0xffff1000. The protection level for the region we care about is 40, which, according to the documentation is . It "Enables execute, read-only, or read/write access to the committed region of pages". Let's implement that in : will also zero-initialize the remote memory, although we don't care much about that. To us, all the memory is initialized, because we work through which is the one responsible for filling our buffers. The only fun remark is that we also saw zero-bytes when we did the process with Cheat Engine, and not random garbage, so that may be an indicator that we're on the right track. We also provide , so that the user can free memory if they want to. Otherwise, they're causing a memory leak in a remote process. Before we go and allocate memory, we need to determine where it should be allocated. Remember the instruction Cheat Engine added?: It's 5 bytes long, and the "address" is 4 bytes long. However, memory addresses are 8 bytes long! And also, the argument ( ) to the jump ( ) is backwards. Our machines are little endian, so the actual value is instead. I wonder what happens if… Aha! So the argument to the jump location is actually encoded relative to the current instruction pointer after reading the instruction (that's the plus five). In this case, all we need to do is find a memory region which is not yet reserved and is close enough to the offending instruction, so that we can make sure the relative offset will fit in 4 bytes: Sure enough, there's still free regions available to us. Because is sorted by , we can look for the first free region after the address we want to patch: There we go! 0x2f3b02 bytes away of 0x10002d4fe, we have a free memory region at 0x100321000 where we can allocate memory to. Alas, trying to allocate memory here fails: Well, to be fair, that's not the region Cheat Engine is finding. Here's what the memory looks like around the region Cheat Engine does use before injecting the code: And here is the after: Notice how the region it picked was 0x7ffe_6000, not 0x1_0000_0000. The offending instruction is at 0x1_0002_d4fe. So the jumps can go backwards just fine. But this doesn't really explain why the allocation at 0x1_0032_1000 failed, because it has the same state ( ) and protection level ( ) as the page at 0x7ffe_6000. I can't really explain why this is the case, but I can change the code to pick a free memory region before and not after the offending instruction: Perhaps the two regions aren't so different after all? At least we're picking the same region as Cheat Engine now. But why is the allocation failing? I'll be honest, I have no idea. We do have the required permission. I do not think the error is caused by enclaves (and I don't even know what those are): If the address in within an enclave that you initialized, then the allocation operation fails with the error. It also does not seem to be an issue with reserve and commit: Attempting to commit a specific address range by specifying without and a non- fails unless the entire range has already been reserved. The resulting error code is . We are using both and , and our is not null. Let's try reserving a memory region, but this time, from the end of the region (instead of from the beginning): Hey, that's… the same value Cheat Engine writes to! At the very last ↪1 , we can allocate memory where we can inject our assembled code. Now, we could go as far as getting our hands on some assembler, such as NASM , and invoke it on the input the user wishes to replace. Then we could read the output bytes of the assembled file, and write it to the desired memory location. However… that's just a lot of tedious work that won't teach us much (the Rust documentation already does an excellent job at teaching us how to work with files and invoke an external process). So I am going to cheat and hardcode the right bytes to complete this step of the tutorial. Here's what Cheat Engine says the area we're going to patch with the jump looks like: Here's the after: Let's finish up this tutorial step. Don't worry though, the addresses will still be correctly calculated. It's just the opcodes for the ADD instruction and NOP, mostly: So there we have it! The code calculates the correct relative address to jump to, depending on wherever the breakpoint was hit and wherever we ended up allocating memory. It also places in the ADD instruction, and this is enough to complete this tutorial step! We have seen one way to inject more than enough code for most needs (just allocate as much as you need!), through the use of watchpoints to figure out where the offending code we want to patch is. But this is not the only way! There are things known as "Windows hooks" which allow us to inject entire DLLs (Dynamic Loaded Libraries). We could also try mapping an existing program into the address space of the victim thread. Or we could create a remote thread which loads the library. Here's the more detailed Three Ways to Inject Your Code into Another Process article. When writing this post, I discovered other things, such as what the was and if I needed it, why was failing or why could it be failing , what the error code meant , among a couple other things. So there is definitely a lot to learn about this topic ↪2 . This post was a bit bittersweet for me! One takeaway definitely is the need to be a bit more creative when it comes down to studying how a different program works, but after all, if Cheat Engine can do it, so can we. There are still some unknowns left, and some shortcuts which we could've avoided, but regardless, we've seen how we can make it work. Making it ergonomic or more customizable comes later. Really, sometimes you just need to embrace the grind and get a first working version out. Don't obsess with making it perfect or cleaner at first, it's such a waste of time (if you are going to clean it up in the end, plan ahead, estimate how long it would take, and put aside your changes until the cleaning is done). The code for this post is available over at my GitHub. You can run after cloning the repository to get the right version of the code. Again, only the code necessary to complete the step is included at the tag. In the next post, we'll tackle the eighth step of the tutorial: Multilevel pointers. This step is what actually got me inspired into starting this entire series, which is why you may have felt this entry a bit more rushed. It is fairly more complicated than part 6 with a single pointer, because there's some ingenious work that needs to be done in order to efficiently, and automatically, solve it. I didn't manage to figure it out before starting the series, but maybe we're prepared now? The next post will also be the second-to-last entry in this series (the last step looks pretty tough as well!). After that, there are bonus levels of an actual graphical game, but as far as I can tell, it's there to gain a bit more experience with something more "serious", which I will probably leave as an exercise to the reader. 1 That "little" hiccup of me trying to figure out how Cheat Engine was finding that precise working location is what put an end to my one-blog-per-week streak. Ah well, sometimes taking a break from something and coming back to it later on just makes the problem obvious (or in this case, a new simple idea which happened to work). ↩ 2 I'm still not sure why we could not allocate near the first bytes of the free region, but we could do so just fine near the end. ↩ Part 1: Introduction (start here if you're new to the series!) Part 2: Exact Value scanning Part 3: Unknown initial value Part 4: Floating points Part 5: Code finder Part 6: Pointers Part 7: Code Injection Part 8: Multilevel pointers "ADD [00901234],9" to increase the address at 00901234 with 9 "ADD [ESP+4],9" to increase the address pointed to by ESP+4 with 9

Security

0 views

Lonami 5 years ago

Writing our own Cheat Engine: Pointers

This is part 6 on the Writing our own Cheat Engine series: In part 5 we wrote our very own debugger. We learnt that Cheat Engine is using hardware breakpoints to watch memory change, and how to do the same ourselves. We also learnt that hardware points are not the only way to achieve the effect of watchpoints, although they certainly are the fastest and cleanest approach. In this post, we will be reusing some of that knowledge to find out a closely related value, the pointer that points to the real value ↪1 . As a quick reminder, a pointer is nothing but an ↪2 representing the address of another portion of memory, in this case, the actual value we will be scanning for. A pointer is a value that, well, points elsewhere. In Rust we normally use reference instead, which are safer (typed and their lifetime is tracked) than pointers, but in the end we can achieve the same with both. Why care about pointers? It turns out that things, such as your current health in-game, are very unlikely to end up in the same memory position when you restart the game (or even change to another level, or even during gameplay). So, if you perform a scan and find that the address where your health is stored is , you might be tempted to save it and reuse it next time you launch the game. Now you don't need to scan for it again! Alas, as soon as you restart the game, the health is now stored at . Not all hope is lost! The game must somehow have a way to reliably find this value, and the way it's done is with pointers. There will always be some base address that holds a pointer, and the game code knows where to find this pointer. If we are also able to find the pointer at said base address, and follow it ourselves ("dereferencing" it), we can perform the same steps the game is doing, and reliably find the health no matter how much we restart the game ↪3 . In the previous step I explained how to use the Code finder to handle changing locations. But that method alone makes it difficult to find the address to set the values you want. That's why there are pointers: At the bottom you'll find 2 buttons. One will change the value, and the other changes the value AND the location of the value. For this step you don't really need to know assembler, but it helps a lot if you do. First find the address of the value. When you've found it use the function to find out what accesses this address. Change the value again, and a item will show in the list. Double click that item. (or select and click on more info) and a new window will open with detailed information on what happened when the instruction ran. If the assembler instruction doesn't have anything between a '[' and ']' then use another item in the list. If it does it will say what it think will be the value of the pointer you need. Go back to the main cheat engine window (you can keep this extra info window open if you want, but if you close it, remember what is between the [ and ]) and do a 4 byte scan in hexadecimal for the value the extra info told you. When done scanning it may return 1 or a few hundred addresses. Most of the time the address you need will be the smallest one. Now click on manually add and select the pointer checkbox. The window will change and allow you to type in the address of a pointer and a offset. Fill in as address the address you just found. If the assembler instruction has a calculation (e.g: [esi+12]) at the end then type the value in that's at the end. else leave it 0. If it was a more complicated instruction look at the calculation. Example of a more complicated instruction: [EAX*2+EDX+00000310] eax=4C and edx=00801234. In this case EDX would be the value the pointer has, and EAX*2+00000310 the offset, so the offset you'd fill in would be 2*4C+00000310=3A8. (this is all in hex, use calc.exe from windows in scientific mode to calculate). Back to the tutorial, click OK and the address will be added, If all went right the address will show P->xxxxxxx, with xxxxxxx being the address of the value you found. If thats not right, you've done something wrong. Now, change the value using the pointer you added in 5000 and freeze it. Then click Change pointer, and if all went right the next button will become visible. extra : And you could also use the pointer scanner to find the pointer to this address. Last time we managed to learn how hardware breakpoints were being set by observing Cheat Engine's behaviour. I think it's now time to handle this properly instead. We'll check out the CPU Registers x86 page on OSDev to learn about it: Each debug register DR0 through DR3 has two corresponding bits in DR7, starting from the lowest-order bit, to indicate whether the corresponding register is a L ocal or G lobal breakpoint. So it looks like this: Cheat Engine was using local breakpoints, because the zeroth bit was set. Probably because we don't want these breakpoints to infect other programs! Because we were using only one breakpoint, only the lowermost bit was being set. The local 1st, 2nd and 3rd bits were unset. Now, each debug register DR0 through DR4 has four additional bits in DR7, two for the C ondition and another two for the S ize: The two bits of the condition mean the following: When we were using Cheat Engine to add write watchpoints, the bits 17 and 16 were indeed set to , and the bits 19 and 18 were set to . Hm, but 11 2 = 3 10 , and yet, we were watching writes to 4 bytes. So what's up with this? Is there a different mapping for the size which isn't documented at the time of writing? Seems we need to learn from Cheat Engine's behaviour one more time. For reference, this is what DR7 looked like when we added a single write watchpoint: And this is the code I will be using to check the breakpoints of different sizes: Let's compare this to watchpoints for sizes 1, 2, 4 and 8 bytes: I have no idea what's up with that stray tenth bit. Its use does not seem documented, and things worked fine without it, so we'll ignore it. The lowest bit is set to indicate we're using DR0, bits 17 and 16 represent the write watchpoint, and the size seems to be as follows: Doesn't make much sense if you ask me, but we'll roll with it. Just to confirm, this is what the "on-access" breakpoint looks like according to Cheat Engine: So it all checks out! The bit pattern is for read/write (technically, a write is also an access). Let's implement this! The first thing we need to do is represent the possible breakpoint conditions: And also the legal breakpoint sizes: We are using so that we can convert a given variant into the corresponding bit pattern. With the right types defined in order to set a breakpoint, we can start implementing the method that will set them (inside ): First, let's try finding an "open spot" where we could set our breakpoint. We will "slide" a the bitmask over the lowest eight bits, and if and only if both the local and global bits are unset, then we're free to set a breakpoint at this index ↪4 : Once an is found, we can set the address we want to watch in the corresponding register and update the debug control bits: Note that we're first creating a "clear mask". We switch on all the bits that we may use for this breakpoint, and then negate. Effectively, will make sure we don't leave any bit high on accident. We apply the mask before OR-ing the rest of bits to also clear any potential garbage on the size and condition bits. Next, we set the bit to enable the new local breakpoint, and also store the size and condition bits at the right location. With the context updated, we can set it back and return the . It stores the and the so that it can clean up on . We are technically relying on to run behaviour here, but the cleanup is done on a best-effort basis. If the user intentionally forgets the , maybe they want the to forever be set. This logic is begging for a testcase though; I'll split it into a new method and test that out: Very good! With proper breakpoint handling usable, we can continue. After scanning memory for the location we're looking for (say, our current health), we then add an access watchpoint, and wait for an exception to occur. As a reminder, here's the page with the Debugging Events : Now, inside the we will want to do a few things, namely printing out the instructions "around this location" and dumping the entire thread context on screen. To print the instructions, we need to import again, iterate over all memory regions to find the region where the exception happened, read the corresponding bytes, decode the instructions, and when we find the one with a corresponding instruction pointer, print "around it": The result is pretty fancy: Cool! So is holding an address, meaning it's a pointer, and the value it reads (dereferences) is stored back into (because it does not need anymore). Alas, the current thread context has the register state after the instruction was executed, and no longer contains the address at this point. However, notice how the previous instruction writes a fixed value to , and then that value is used to access memory, like so: The value at is the pointer! No offsets are used, because nothing is added to the pointer after it's read. This means that, if we simply scan for the address we were looking for, we should find out where the pointer is stored: And just like that: Notice how the pointer address found matches with the offset used by the instructions: Very interesting indeed. We were actually very lucky to have only found a single memory location containing the pointer value, . Cheat Engine somehow knows that this value is always stored at (or rather, at ), because the address shows green. How does it do this? Remember back in part 2 when we introduced the memory regions? They're making a comeback! A memory region contains both the current memory protection option and the protection level when the region was created. If we try printing out the protection levels for both the memory region containing the value, and the memory region containing the pointer, this is what we get (the addresses differ from the ones previously because I restarted the tutorial): Interesting! According to the page , the type for the first region is , and the type for the second region is which: Indicates that the memory pages within the region are mapped into the view of an image section. The protection also changes from to simply , but I don't think it's relevant. Neither the type seems to be much more relevant. In part 2 we also mentioned the concept of "base address", but decided against using it, because starting to look for regions at address zero seemed to work fine. However, it would make sense that fixed "addresses" start at some known "base". Let's try getting the base address for all loaded modules . Currently, we only get the address for the base module, in order to retrieve its name, but now we need them all: The first call is used to retrieve the correct , then we allocate just enough, and make the second call. The returned type are pretty much memory addresses, so let's see if we can find regions that contain them: Exciting stuff! It appears also does the trick ↪5 , so now we could build a and simply check if to determine whether is a base address or not ↪6 . So there we have it! The address holding the pointer value does fall within one of these "base regions". You can also get the name from one of these module addresses, and print it in the same way as Cheat Engine does it (such as ). So, there's no "automated" solution to all of this? That's the end? Well, yes, once you have a pointer you can dereference it once and then write to the given address to complete the tutorial step! I can understand how this would feel a bit underwhelming, but in all fairness, we were required to pretty-print assembly to guess what pointer address we could potentially need to look for. There is an stupidly large amount of instructions , and I'm sure a lot of them can access memory, so automating that would be rough. We were lucky that the instructions right before the one that hit the breakpoint were changing the memory address, but you could imagine this value coming from somewhere completely different. It could also be using a myriad of different techniques to apply the offset. I would argue manual intervention is a must here ↪7 . We have learnt how to pretty-print instructions, and had a very gentle introduction to figuring out what we may need to look for. The code to retrieve the loaded modules, and their corresponding regions, will come in handy later on. Having access to this information lets us know when to stop looking for additional pointers. As soon as a pointer is found within a memory region corresponding to a base module, we're done! Also, I know the title doesn't really much the contents of this entry (sorry about that), but I'm just following the convention of calling it whatever the Cheat Engine tutorial calls them. The code for this post is available over at my GitHub. You can run after cloning the repository to get the right version of the code, although you will have to to individual commits if you want to review, for example, how the instructions were printed out. Only the code necessary to complete the step is included at the tag. In the next post , we'll tackle the seventh step of the tutorial: Code Injection. This will be pretty similar to part 5, but instead of writing out a simple NOP instruction, we will have to get a bit more creative. 1 This will only be a gentle introduction to pointers. Part 8 of this series will have to rely on more advanced techniques. ↩ 2 Kind of. The size of a pointer isn't necessarily the size as , although is guaranteed to be able of representing every possible address. For our purposes, we can assume a pointer is as big as . ↩ 3 Game updates are likely to pull more code and shuffle stuff around. This is unfortunately a difficult problem to solve. But storing a pointer which is usable across restarts for as long as the game doesn't update is still a pretty darn big improvement over having to constantly scan for the locations we care about. Although if you're smart enough to look for certain unique patterns, even if the code is changed, finding those patterns will give you the new updated address, so it's not impossible . ↩ 4 is a pretty recent addition at the time of writing (1.50.0), so make sure you if it's erroring out! ↩ 5 I wasn't sure if there would be some metadata before the module base address but within the region, so I went with the range check. What is important however is using , not . They're different, and this did bite me. ↩ 6 As usual, I have no idea if this is how Cheat Engine is doing it, but it seems reasonable. ↩ 6 But nothing's stopping you from implementing some heuristics to get the job done for you. If you run some algorithm in your head to find what the pointer value could be, you can program it in Rust as well, although I don't think it's worth the effort. ↩ Part 1: Introduction (start here if you're new to the series!) Part 2: Exact Value scanning Part 3: Unknown initial value Part 4: Floating points Part 5: Code finder Part 6: Pointers Part 7: Code Injection Part 8: Multilevel pointers DR0, DR1, DR2 and DR3 can hold a memory address each. This address will be used by the breakpoint. DR4 is actually an obsolete synonym for DR6. DR5 is another obsolete synonym, this time for DR7. DR6 is debug status. The four lowest bits indicate which breakpoint was hit, and the four highest bits contain additional information. We should make sure to clear this ourselves when a breakpoint is hit. DR7 is debug control, which we need to study more carefully. execution breakpoint. write watchpoint. read/write watchpoint. unsupported I/O read/write. for a single byte. for two bytes (a "word"). for four bytes (a "double word"). for eight bytes (a "quadruple word").

0 views

Lonami 5 years ago

Writing our own Cheat Engine: Code finder

This is part 5 on the Writing our own Cheat Engine series: In part 4 we spent a good deal of time trying to make our scans generic, and now we have something that works ↪1 ! Now that the scanning is fairly powerful and all covered, the Cheat Engine tutorial shifts focus into slightly more advanced techniques that you will most certainly need in anything bigger than a toy program. It's time to write our very own debugger in Rust! Sometimes the location something is stored at changes when you restart the game, or even while you're playing… In that case you can use 2 things to still make a table that works. In this step I'll try to describe how to use the Code Finder function. The value down here will be at a different location each time you start the tutorial, so a normal entry in the address list wouldn't work. First try to find the address. (You've got to this point so I assume you know how to.) When you've found the address, right-click the address in Cheat Engine and choose "Find out what writes to this address". A window will pop up with an empty list. Then click on the Change value button in this tutorial, and go back to Cheat Engine. If everything went right there should be an address with assembler code there now. Click it and choose the replace option to replace it with code that does nothing. That will also add the code address to the code list in the advanced options window. (Which gets saved if you save your table.) Click on stop, so the game will start running normal again, and close to close the window. Now, click on Change value, and if everything went right the Next button should become enabled. Note: When you're freezing the address with a high enough speed it may happen that next becomes visible anyhow Although I have used debuggers before, I have never had a need to write one myself so it's time for some research. Searching on DuckDuckGo, I can find entire series to Writing a Debugger . We would be done by now if only that series wasn't written for Linux. The Windows documentation contains a section called Creating a Basic Debugger , but as far as I can tell, it only teaches you the functions needed to configure the debugging loop. Which mind you, we will need, but in due time. According to Writing your own windows debugger in C , the steps needed to write a debugger are: There are pages documenting all of the debug events that our debugger will be able to handle. Okay, nice! Software breakpoints seem to be done by writing out memory to the region where the program is reading instructions from. We know how to write memory, as that's what all the previous posts have been doing to complete the corresponding tutorial steps. After the breakpoint is executed, all we need to do is restore the original memory back so that the next time the program executes the code it sees no difference. But a software breakpoint will halt execution when the code executes the interrupt instruction. This step of the tutorial wants us to find what writes to a memory location . Where should we place the breakpoint to detect such location? Writing out the instruction to the memory we want to break in won't do; it's not an instruction, it's just data. The name may have given it away. If we're talking about software breakpoints, it makes sense that there would exist such a thing as hardware breakpoints . Because they're tied to the hardware, they're highly processor-specific, but luckily for us, the processor on your usual desktop computer probably has them! Even the cortex-m does. The wikipedia page also tells us the name of the thing we're looking for, watchpoints: Other kinds of conditions can also be used, such as the reading, writing, or modification of a specific location in an area of memory. This is often referred to as a conditional breakpoint, a data breakpoint, or a watchpoint. A breakpoint that triggers when a specific memory location is written to is exactly what we need, and x86 has debug registers D0 to D3 to track memory addresses . As far as I can tell, there is no API in specific to mess with the registers. But we don't need any of that! We can just go ahead and write some assembly by hand to access these registers. At the time of writing, inline assembly is unstable, so we need a nightly compiler. Run if you haven't yet, and execute the following code with : stands is the debug control register , and running this we get… …an exception! In all fairness, I have no idea what that code would have done. So maybe the is just trying to protect us. Can we read from the register instead, and see it's default value? Nope. Okay, it seems directly reading from or writing to the debug register is a ring-0 thing. Surely there's a way around this. But first we should figure out how to enumerate and pause all the threads. It seems there is no straightforward way to enumerate the threads. One has to create a "toolhelp" and poll the entries. I won't bore you with the details. Let's add to the crate features of and try it out: Annoyingly, invalid handles returned by , are (which is -1), not null. But that's not a big deal, we simply can't use here. The function ignores the process identifier when using , used to include all threads, and we need to compare the process identifier ourselves. In summary, we create a "toolhelp" (wrapped in a helper so that whatever happens, will clean it up), initialize a thread enntry (with everything but the structure size to zero) and call the first time, subsequent times. It seems to work all fine! According to this, the Cheat Engine tutorial is only using one thread. Good to know. Much like processes, threads need to be opened before we can use them, with : Just your usual RAII pattern. The thread is opened with permission to suspend and resume it. Let's try to pause the handles with to make sure that this thread is actually the one we're looking for: Both suspend and resume return the previous "suspend count". It's kind of like a barrier or semaphore where the thread only runs if the suspend count is zero. Trying it out: If you run this code with the process ID of the Cheat Engine tutorial, you will see that the tutorial window freezes for ten seconds! Because the main and only thread is paused, it cannot process any window events, so it becomes unresponsive. It is now "safe" to mess around with the thread context. I'm definitely not the first person to wonder How to set a hardware breakpoint? . This is great, because it means I don't need to ask that question myself. It appears we need to change the debug register via the thread context . One has to be careful to use the right context structure. Confusingly enough, is 32 bits, not 64. alone seems to be the right one: Trying it out: Looks about right! Hm, I wonder what happens if I use Cheat Engine to add the watchpoint on the memory location we care about? Look at that! The debug registers changed! DR0 contains the location we want to watch for writes, and the debug control register DR7 changed. Cheat Engine sets the same values on all threads (for some reason I now see more than one thread printed for the tutorial, not sure what's up with that; maybe the single-thread is the weird one out). Hmm, what happens if I watch for access instead of write? What if I set both? Most intriguing! This was done by telling Cheat Engine to find "what writes" to the address, then "what accesses" the address. I wonder if the order matters? "What accesses" and then "what writes" does change it. Very well! We're only concerned in a single breakpoint, so we won't worry about this, but it's good to know that we can inspect what Cheat Engine is doing. It's also interesting to see how Cheat Engine is using hardware breakpoints and not software breakpoints. For simplicity, our code is going to assume that we're the only ones messing around with the debug registers, and that there will only be a single debug register in use. Make sure to add to the permissions when opening the thread handle: If we do this (and temporarily get rid of the ), trying to change the value in the Cheat Engine tutorial will greet us with a warm message: Tutorial-x86_64 External exception 80000004. Press OK to ignore and risk data corruption. Press Abort to kill the program. There is no debugger attached yet that could possibly handle this exception, so the exception just propagates. Let's fix that. Now that we've succeeded on setting breakpoints, we can actually follow the steps described in Creating a Basic Debugger . It starts by saying that we should use to attach our processor, the debugger, to the process we want to debug, the debuggee. This function lives under the header, so add it to features: Once again, we create a wrapper with to stop debugging the process once the token is dropped. The call to in our method ensures that, if our process (the debugger) dies, the process we're debugging (the debuggee) stays alive. We don't want to be restarting the entire Cheat Engine tutorial every time our Rust code crashes! With the debugger attached, we can wait for debug events. We will put this method inside of , so that the only way you can call it is if you successfully attached to another process: wants a timeout in milliseconds, so our function lets the user pass the more Rusty type. will indicate "there is no timeout", i.e., it's infinite. If the duration is too large to fit in the ( fails), it will also be infinite. If we attach the debugger, set the hardware watchpoint, and modify the memory location from the tutorial, an event with will be returned! Now, back to the page with the Debugging Events … Gah! It only has the name of the constants, not the values. Well, good thing docs.rs has a source view! We can just check the values in the source code for : So, we've got a : Generated whenever a new process is created in a process being debugged or whenever the debugger begins debugging an already active process. The system generates this debugging event before the process begins to execute in user mode and before the system generates any other debugging events for the new process. It makes sense that this is our first event. By the way, if you were trying this out with a lying around in your code, you may have noticed that the window froze until the debugger terminated. That's because: When the system notifies the debugger of a debugging event, it also suspends all threads in the affected process. The threads do not resume execution until the debugger continues the debugging event by using . Let's call but also wait on more than one event and see what happens: That's a lot of . Pumping it up to one hundred and also showing the index we get the following: In order, we got: And, if after all this, you change the value in the Cheat Engine tutorial (thus triggering our watch point), we get ! Generated whenever an exception occurs in the process being debugged. Possible exceptions include attempting to access inaccessible memory, executing breakpoint instructions, attempting to divide by zero, or any other exception noted in Structured Exception Handling. If we print out all the fields in the structure: The , which is , corresponds with : A trace trap or other single-instruction mechanism signaled that one instruction has been executed. The is supposed to be "the address where the exception occurred". Very well! I have already completed this step of the tutorial, and I know the instruction is (or, as Cheat Engine shows, the bytes in hexadecimal). The opcode for the instruction is in hexadecimal, so if we replace two bytes at this address, we should be able to complete the tutorial. Note that we also need to flush the instruction cache, as noted in the Windows documentation: Debuggers frequently read the memory of the process being debugged and write the memory that contains instructions to the instruction cache. After the instructions are written, the debugger calls the function to execute the cached instructions. So we add a new method to : And write some quick and dirty code to get this done: Although it seems to work: It really doesn't: Tutorial-x86_64 Access violation. Press OK to ignore and risk data corruption. Press Abort to kill the program. Did we write memory somewhere we shouldn't? The documentation does mention "segment-relative" and "linear virtual addresses": returns the descriptor table entry for a specified selector and thread. Debuggers use the descriptor table entry to convert a segment-relative address to a linear virtual address. The and functions require linear virtual addresses. But nope! This isn't the problem. The problem is that the is after the execution happened, so it's already 2 bytes beyond where it should be. We were accidentally writing out the first half of the next instruction, which, yeah, could not end good. So does it work if I do this instead?: This totally does work. Step 5: complete 🎉 You may not be satisfied at all with our solution. Not only are we hardcoding some magic constants to set hardware watchpoints, we're also relying on knowledge specific to the Cheat Engine tutorial (insofar that we're replacing two bytes worth of instruction with NOPs). Properly supporting more than one hardware breakpoint, along with supporting different types of breakpoints, is definitely doable. The meaning of the bits for the debug registers is well defined, and you can definitely study that to come up with something more sophisticated and support multiple different breakpoints. But for now, that's out of the scope of this series. The tutorial only wants us to use an on-write watchpoint, and our solution is fine and portable for that use case. However, relying on the size of the instructions is pretty bad. The instructions x86 executes are of variable length, so we can't possibly just look back until we find the previous instruction, or even naively determine its length. A lot of unrelated sequences of bytes are very likely instructions themselves. We need a disassembler. No, we're not writing our own ↪4 . Searching on crates.io for "disassembler" yields a few results, and the first one I've found is iced-x86 . I like the name, it has a decent amount of GitHub stars, and it was last updated less than a month ago. I don't know about you, but I think we've just hit a jackpot! It's quite heavy though, so I will add it behind a feature gate, and users that want it may opt into it: You can make use of it with . I don't want to turn this blog post into a tutorial for , but in essence, we need to make use of its . Here's the plan: Pretty straightforward! We can set the "instruction pointer" of the decoder so that it matches with the address we're reading from. The method comes in really handy. Overall, it's a bit inefficient, because we could reuse the regions retrieved previously, but other than that, there is not much room for improvement. With this, we are no longer hardcoding the instruction size or guessing which instruction is doing what. You may wonder, what if the region does not start with valid executable code? It could be possible that the instructions are in some memory region with garbage except for a very specific location with real code. I don't know how Cheat Engine handles this, but I think it's reasonable to assume that the region starts with valid code. As far as I can tell (after having asked a bit around), the encoding is usually self synchronizing (similar to UTF-8), so eventually we should end up with correct instructions. But someone can still intentionally write real code between garbage data which we would then disassemble incorrectly. This is a problem on all variable-length ISAs. Half a solution is to start at the entry point , decode all instructions, and follow the jumps. The other half would be correctly identifying jumps created just to trip a disassembler up, and jumps pointing to dynamically-calculated addresses! That was quite a deep dive! We have learnt about the existence of the various breakpoint types (software, hardware, and even behaviour, such as watchpoints), how to debug a separate process, and how to correctly update the code other process is running on-the-fly. The code for this post is available over at my GitHub. You can run after cloning the repository to get the right version of the code. Although we've only talked about setting breakpoints, there are of course ways of detecting them . There's entire guides about it . Again, we currently hardcode the fact we want to add a single watchpoint using the first debug register. A proper solution here would be to actually calculate the needs that need to be set, as well as keeping track of how many breakpoints have been added so far. Hardware breakpoints are also limited, since they're simply a bunch of registers, and our machine does not have infinite registers. How are other debuggers like able to create a seemingly unlimited amount of breakpoints? Well, the GDB wiki actually has a page on Internals Watchpoints , and it's really interesting! essentially single-steps through the entire program and tests the expressions after every instruction: Software watchpoints are very slow, since GDB needs to single-step the program being debugged and test the value of the watched expression(s) after each instruction. However, that's not the only way. One could change the protection level of the region of interest (for example, remove the write permission), and when the program tries to write there, it will fail! In any case, the GDB wiki is actually a pretty nice resource. It also has a section on Breakpoint Handling , which contains some additional insight. With regards to code improvements, could definitely be both nicer and safer to use, with a custom , so the user does not need to rely on magic constants or having to resort to access to get the right variant. In the next post , we'll tackle the sixth step of the tutorial: Pointers. It reuses the debugging techniques presented here to backtrack where the pointer for our desired value is coming from, so here we will need to actually understand what the instructions are doing, not just patching them out! 1 I'm not super happy about the design of it all, but we won't actually need anything beyond scanning for integers for the rest of the steps so it doesn't really matter. ↩ 2 There seems to be a way to pause the entire process in one go, with the undocumented function! ↩ 3 It really is called that. The naming went from "IP" (instruction pointer, 16 bits), to "EIP" (extended instruction pointer, 32 bits) and currently "RIP" (64 bits). The naming convention for upgraded registers is the same (RAX, RBX, RCX, and so on). The OS Dev wiki is a great resource for this kind of stuff. ↩ 4 Well, we don't need an entire disassembler. Knowing the length of each instruction is enough, but that on its own is also a lot of work. ↩ Part 1: Introduction (start here if you're new to the series!) Part 2: Exact Value scanning Part 3: Unknown initial value Part 4: Floating points Part 5: Code finder Part 6: Pointers Part 7: Code Injection Part 8: Multilevel pointers . It makes sense that we need to pause all the threads ↪2 before messing around with the code the program is executing, or things are very prone to go wrong. . This function retrieves the appropriate context of the specified thread and is highly processor specific. It basically takes a snapshot of all the registers. Think of registers like extremely fast, but also extremely limited, memory the processor uses. . Essentially writes out the 0xCC opcode , in assembly, also known as software breakpoint. It's written wherever the Register Instruction Pointer (RIP ↪3 ) currently points to, so in essence, when the thread resumes, it will immediately trigger the breakpoint . . Presumably continues debugging. Find the memory region corresponding to the address we want to patch. Read the entire region. Decode the read bytes until the instruction pointer reaches our address. Because we just parsed the previous instruction, we know its length, and can be replaced with NOPs.

0 views

Lonami 5 years ago

Writing our own Cheat Engine: Floating points

This is part 4 on the Writing our own Cheat Engine series: In part 3 we did a fair amount of plumbing in order to support scan modes beyond the trivial "exact value scan". As a result, we have abstracted away the , and types as a separate each. Scanning for changed memory regions in an opened process can now be achieved with three lines of code: How's that for programmability? No need to fire up Cheat Engine's GUI anymore! The in the example above remembers all the found within the range specified by . Up until now, we have only worked with , so that's the type the scans expect and what they work with. Now it's time to introduce support for different types, like , , or even more atypical ones, like arbitrary sequences of bytes (think of strings) or even numbers in big-endian. Tighten your belt, because this post is quite the ride. Let's get right into it! In the previous tutorial we used bytes to scan, but some games store information in so called 'floating point' notations. (probably to prevent simple memory scanners from finding it the easy way). A floating point is a value with some digits behind the point. (like 5.12 or 11321.1) Below you see your health and ammo. Both are stored as Floating point notations, but health is stored as a float and ammo is stored as a double. Click on hit me to lose some health, and on shoot to decrease your ammo with 0.5 You have to set BOTH values to 5000 or higher to proceed. Exact value scan will work fine here, but you may want to experiment with other types too. Hint: It is recommended to disable "Fast Scan" for type double The enumeration holds scanned values, and is currently hardcoded to store . The type also holds a value, the value we want to scan for. Changing it to support other types is trivial: is the raw memory, and can be interpreted from any sequence of bytes thanks to our friend . This change alone is enough to store an arbitrary ! So we're done now? Not really, no. First of all, we need to update all the places where or are used. Our first stop is the scanned , which holds the found : Then, we need to update everywhere is used, and on and on… All in all this process is just repeating , letting the compiler vent on you, and taking good care of it by fixing the errors. It's quite reassuring to know you will not miss a single place. Thank you, compiler! But wait, how could scanning for a decreased value work for any ? The type is not , we should add some trait bounds. And also, what happens if the type is not ? It could implement ↪1 , and we will be transmuting from raw bytes, which would trigger the implementation when we're done with the value! Not memory safe at all! And how could we possibly cast raw memory to the type without knowing its siz– oh nevermind, is already by default . But anyway, we need the other bounds. In order to not repeat ourselves, we will implement a new , let's say , which requires all other bounds: And fix our definitions: Every type which is , and can be scanned over ↪2 , because we where the bounds are met. Unfortunately, we cannot require or because the floating point types do not implement it. Also known as reinterpreting a bunch of bytes as something else, or perhaps it stands for "summoning the demon": is incredibly unsafe. There are a vast number of ways to cause undefined behavior with this function. should be the absolute last resort. Types like define methods such as and which convert raw bytes from and into its native representation. This is all really nice, but unfortunately, there's no standard trait in the Rust's standard library to "interpret a type as the byte sequence of its native representation". , however, does exist, and similar to any other function, it's safe to call as long as we respect its invariants . What are these invariants ↪3 ? Both types must have the same size Okay, we can just assert that the window length matches the type's length. What else? Neither the original, nor the result, may be an invalid value . What's an invalid value? Okay, that's actually an awful lot. Types like implement all the trait bounds we defined, and it would be insta-UB to ever try to cast them from arbitrary bytes. The same goes for , and all are out of our control, too. At least we're safe on the "memory is initialized" front. Dang it, I really wanted to use ! But if we were to use it for arbitrary types, it would trigger undefined behaviour sooner than later. We have several options here: We will go with the first option ↪5 , because I really want to use , and I want users to be able to implement the trait on their own types. In any case, we need to change our to something more specific, in order to prevent it from automatically implementing the trait for types for which their memory representation has invalid values. So we get rid of this: And replace it with this: Making a small macro for things like these is super useful. You could of course write for all ten as well, but that introduces even more to read. Last but not least, let's replace the hardcoded and with . All the need to be replaced with because the size may no longer be . All the need to be replaced with . We explicitly write out to make sure the compiler doesn't accidentally infer something we didn't intend. And… it doesn't work at all. We're working with byte slices of arbitrary length. We cannot transmute a type, which is 16 bytes (8 for the pointer and 8 for the length), to our . My plan to use transmute can't possibly work here. Sigh. Okay, we can't transmute, because we don't have a sized value, we only have a slice of bytes pointing somewhere else. What we could do is reinterpret the pointer to those bytes as a different type, and then dereference it! This is still a form of "transmutation", just without using . Woop! You can compile this and test it out on the step 2 and 3 of the tutorial, using , and it will still work! Something troubles me, though. Can you see what it is? When we talked about invalid values, it had a note about unaligned references: a reference/ that is dangling, unaligned, or points to an invalid value. Our is essentially a reference to . The only difference is we're working at the pointer level, but they're pretty much references. Let's see what the documentation for has to say as well, since we're dereferencing pointers: when a raw pointer is dereferenced (using the operator), it must be non-null and aligned. It must be aligned. The only reason why our data is aligned is because we are also performing a "fast scan", so we only look at aligned locations. This is a time bomb waiting to blow up. Is there any other way to from a pointer which is safer? must be properly aligned. Use if this is not the case. Bingo! Both and , unlike dereferencing the pointer, will perform a copy, but if it can make the code less prone to blowing up, I'll take it ↪6 . Let's change the code one more time: I prefer to avoid type annotations in variables where possible, which is why I use the turbofish so often. You can get rid of the cast and use a type annotation instead, but make sure the type is known, otherwise it will think it's because is a . Now, this is all cool and good. You can replace with for and you'll be able to get halfway done with the step 4 of Cheat Engine's tutorial. Unfortunately, as it is, this code is not enough to complete step 4 with exact scans ↪7 . You see, comparing floating point values is not as simple as checking for bitwise equality. We were actually really lucky that the part works! But the values in the part are not as precise as our inputs, so our exact scan fails. Using a fixed type parameter is pretty limiting as well. On the one hand, it is nice that, if you scan for , the compiler statically guarantees that subsequent scans will also happen on and thus be compatible. On the other, this requires us to know the type at compile time, which for an interactive program, is not possible. While we could create different methods for each supported type and, at runtime, decide to which we should jump, I am not satisfied with that solution. It also means we can't switch from scanning an to an , for whatever reason. So we need to work around this once more. What does our scanning function need, really? It needs a way to compare two chunks of memory as being equal or not (as we have seen, this isn't trivial with types such as floating point numbers) and, for other types of scans, it needs to be able to produce an ordering, or calculate a difference. Instead of having a our trait require the bounds and , we can define our own methods to compare with . It still should be , so we can pass it around without worrying about lifetimes: This can be trivially implemented for all integer types: The funny is because I decided to call the method , so I have to disambiguate between it and . We can go ahead and update the code using to use these new functions instead. Now, you may have noticed I only implemented it for the integer types. That's because floats need some extra care. Unfortunately, floating point types do not have any form of "precision" embedded in them, so we can't accurately say "compare these floats to the precision level the user specified". What we can do, however, is drop a few bits from the mantissa, so "relatively close" quantities are considered equal. It's definitely not as good as comparing floats to the user's precision, but it will get the job done. I'm going to arbitrarily say that we are okay comparing with "half" the precision. We can achieve that by masking half of the bits from the mantissa to zero: You may be wondering what's up with that weird . Let's visualize it with a . This type has 16 bits, 1 for sign, 5 for exponent, and 10 for the mantissa: If we substitute the constant with the numeric value and operate: So effectively, half of the mantisssa bit will be masked to 0. For the example, this makes us lose 5 bits of precision. Comparing two floating point values with their last five bits truncated is equivalent to checking if they are "roughly equal"! When Cheat Engine scans for floating point values, several additional settings show, and one such option is "truncated". I do not know if it behaves like this, but it might. Let's try this out: Huzzah! The makes sure that a normal comparision would fail, and then we that our custom one passes the test. When the user performs an exact scan, the code will be more tolerant to the user's less precise inputs, which overall should result in a nicer experience. The second problem we need to solve is the possibility of the size not being known at compile time ↪8 . While we can go as far as scanning over strings of a known length, this is rather limiting, because we need to know the length at compile time ↪9 . Heap allocated objects are another problem, because we don't want to compare the memory representation of the stack object, but likely the memory where they point to (such as ). Instead of using , we can add a new method to our , , which will tell us the size required of the memory view we're comparing against: It is to implement, because we are relying on the returned value to be truthful and unchanging. It should be safe to call, because it cannot have any invariants. Unfortunately, signaling "unsafe to implement" is done by marking the entire trait as , since "unsafe to call" is reserved for , and even though the rest of methods are not necessarily unsafe to implement, they're treated as such. At the moment, cannot be made into a trait object because it is not object safe . This is caused by the requirement on all object, which in turn needs the types to be because returns . Because of this, the size must be known. However, we can move the requirement to the methods that need it! This way, can remain object safe, enabling us to do the following: Any type which can be interpreted as a reference to is also a scannable! This enables us to perform scans over , where the type is known at runtime! Or rather, it would, if implemented , which it can't ↪10 because that's what prompted this entire issue. Dang it! I can't catch a breath today! Okay, let's step back. Why did we need our scannables to be clone in the first place? When we perform exact scans, we store the original value in the region, which we don't own, so we clone it. But what if we did own the value? Instead of taking the by reference, which holds , we could take it by value. If we get rid of all the bounds and update to take , along with updating all the things that take a to take them by value as well, it should all work out. But it does not. If we take by value, with it not being , we simply can't use it to scan over multiple regions. After the first region, we have lost the . Let's take a second step back. We are scanning memory, and we want to compare memory, but we want to treat the memory with different semantics (for example, if we treat it as , we want to check for rough equality). Instead of storing the value itself, we could store its memory representation , and when we compare memory representations, we can do so under certain semantics. First off, let's revert getting rid of all . Wherever we stored a , we will now store a . We will still use a type parameter to represent the "implementations of ". For this to work, our definitions need to use somewhere, or else the compiler refuses to compile the code with error E0392 . For this, I will stick a in the variant. It's a bit pointless to include it in all variants, and seems the most appropriated: This keeps in line with : Our will no longer work on and . Instead, it will work on two . We will also need a way to interpret a as , which we can achieve with a new method, . This method interprets the raw memory representation of as its raw bytes. It also lets us get rid of , because we can simply do . It's still to implement, because it should return the same length every time: But now we can't use it in trait object, so the following no longer works: Ugh! Well, to be fair, we no longer have a "scannable" at this point. It's more like a scan mode that tells us how memory should be compared according to a certain type. Let's split the trait into two: one for the scan mode, and other for "things which are scannable": Note that we have an associated which contains the corresponding . If we used a trait bound such as , we'd be back to square one: it would inherit the method definitions that don't use and thus cannot be used as trait objects. With these changes, it is possible to implement for any : We do have to adjust a few places of the code to account for both and , but all in all, it's pretty straightforward. Things like don't need to store the anymore, just a . It also doesn't need the , because it's not going to be scanning anything on its own. This applies transitively to which was holding a . does need to be updated to store the size of the region we are scanning for, however, because we need that information when running a subsequent scan. For all that don't have a explicit thing to scan for (like ), the also needs to be stored in them. Despite all our efforts, we're still unable to return an chosen at runtime. As far as I can tell, there's simply no way to specify that type. We want to return a type which is scannable, which has itself (which is also a ) as the corresponding mode. Even if we just tried to return the mode, we simply can't, because it's not object-safe. Is this the end of the road? We need a way to pass an arbitrary scan mode to our . This scan mode should go in tandem with types, because it would be unsafe otherwise. We've seen that using a type just doesn't cut it. What else can we do? Using an enumeration is a no-go, because I want users to be able to extend it further. I also would like to avoid having to update the and all the matches every time I come up with a different type combination. And it could get pretty complicated if I ever built something dynamically, such as letting the user combine different scans in one pass. So what if we make return a value that implements the functions we need? It's definitely… non-conventional. But hey, now we're left with the trait, which is object-safe, and does not have any type parameters! It is a bit weird, but defining local functions and using those in the returned value is a nice way to keep things properly scoped: Our needs to store the type, and not just the memory, once again. For variants that don't need any value, they can store the and size instead. Does this solution work? Yes! It's possible to return a from a function, and underneath, it may be using any type which is . Is this the best solution? Well, that's hard to say. This is one of the possible solutions. We have been going around in circles for quite some time now, so I'll leave it there. It's a solution, which may not be pretty, but it works. With these changes, the code is capable of completing all of the steps in the Cheat Engine tutorial up until point! If there's one lesson to learn from this post, it's that there is often no single correct solution to a problem. We could have approached the scan types in many, many ways (and we tried quite a few!), but in the end, choosing one option or the other comes down to your (sometimes self-imposed) requirements. You may obtain the code for this post over at my GitHub. You can run after cloning the repository to get the right version of the code. The code has gone through a lot of iterations, and I'd still like to polish it a bit more, so it might slightly differ from the code presented in this entry. If you feel adventurous, Cheat Engine has different options for scanning floating point types: "rounded (default)", "rounded (extreme)", and truncated. Optionally, it can scan for "simple values only". You could go ahead and toy around with these! We didn't touch on types with different lengths, such as strings. You could support UTF-8, UTF-16, or arbitrary byte sequences. This post also didn't cover scanning for multiple things at once, known as "groupscan commands", although from what I can tell, these are just a nice way to scan for arbitrary byte sequences. We also didn't look into supporting different the same scan with different alignments. All these things may be worth exploring depending on your requirements. You could even get rid of such genericity and go with something way simpler. Supporting , and is enough to complete the Cheat Engine tutorial. But I wanted something more powerful, although my solution currently can't scan for a sequence such as "exact type, unknown, exact matching the unknown". So yeah. In the next post , we'll tackle the fifth step of the tutorial: Code finder. Cheat Engine attaches its debugger to the process for this one, and then replaces the instruction that performs the write with a different no-op so that nothing is written anymore. This will be quite the challenge! 1 and are exclusive . See also E0184 . ↩ 2 If you added more scan types that require additional bounds, make sure to add them too. For example, the "decreased by" scan requires the type to . ↩ 3 This is a good time to remind you to read the documentation. It is of special importance when dealing with methods; I recommend reading it a couple times. ↩ 4 Even with this option, it would not be a bad idea to make the trait . ↩ 5 Not for long. As we will find out later, this approach has its limitations. ↩ 6 We can still perform the pointer dereference when we know it's aligned. This would likely be an optimization, although it would definitely complicate the code more. ↩ 7 It would work if you scanned for unknown values and then checked for decreased values repeatedly. But we can't just leave exact scan broken! ↩ 8 Unfortunately, this makes some optimizations harder or even impossible to perform. Providing specialized functions for types where the size is known at compile time could be worth doing. Programming is all tradeoffs. ↩ 9 Rust 1.51 , which was not out at the time of writing, would make it a lot easier to allow scanning for fixed-length sequences of bytes, thanks to const generics. ↩ 10 Workarounds do exist, such as dtolnay's . But I would rather not go that route. ↩ Part 1: Introduction (start here if you're new to the series!) Part 2: Exact Value scanning Part 3: Unknown initial value Part 4: Floating points Part 5: Code finder Part 6: Pointers Part 7: Code Injection Part 8: Multilevel pointers a that isn't 0 or 1 an with an invalid discriminant a null pointer a outside the ranges [0x0, 0xD7FF] and [0xE000, 0x10FFFF] a (all values are invalid for this type) an integer ( / ), floating point value ( ), or raw pointer read from uninitialized memory, or uninitialized memory in a . a reference/ that is dangling, unaligned, or points to an invalid value. a wide reference, , or raw pointer that has invalid metadata: metadata is invalid if it is not a pointer to a vtable for that matches the actual dynamic trait the pointer or reference points to slice metadata is invalid if the length is not a valid (i.e., it must not be read from uninitialized memory) a type with custom invalid values that is one of those values, such as a that is null. (Requesting custom invalid values is an unstable feature, but some stable libstd types, like , make use of it.) Make it an . Implementors will be responsible for ensuring that the type they're implementing it for can be safely transmuted from and into. Seal the and implement it only for types we know are safe ↪4 , like . Add methods to the definition that do the conversion of the type into its native representation.

0 views

Lonami 5 years ago

Writing our own Cheat Engine: Unknown initial value

This is part 3 on the Writing our own Cheat Engine series: In part 2 we left off with a bit of a cliff-hanger. Our little program is now able to scan for an exact value, remember the couple hundred addresses pointing to said value, and perform subsequent scans to narrow the list of addresses down until we're left with a handful of them. However, it is not always the case that you have an exact value to work with. The best you can do in these cases is guess what the software might be storing. For example, it could be a floating point for your current movement speed in a game, or an integer for your current health. The problem with this is that there are far too many possible locations storing our desired value. If you count misaligned locations, this means there is a different location to address every single byte in memory. A program with one megabyte of memory already has a million of addresses. Clearly, we need to do better than performing one million memory reads ↪1 . This post will shift focus a bit from using to possible techniques to perform the various scans. Ok, seeing that you've figured out how to find a value using exact value let's move on to the next step. First things first though. Since you are doing a new scan, you have to click on New Scan first, to start a new scan. (You may think this is straighforward, but you'd be surprised how many people get stuck on that step) I won't be explaining this step again, so keep this in mind Now that you've started a new scan, let's continue In the previous test we knew the initial value so we could do a exact value, but now we have a status bar where we don't know the starting value. We only know that the value is between 0 and 500. And each time you click 'hit me' you lose some health. The amount you lose each time is shown above the status bar. Again there are several different ways to find the value. (like doing a decreased value by... scan), but I'll only explain the easiest. "Unknown initial value", and decreased value. Because you don't know the value it is right now, a exact value wont do any good, so choose as scantype 'Unknown initial value', again, the value type is 4-bytes. (most windows apps use 4-bytes)click first scan and wait till it's done. When it is done click 'hit me'. You'll lose some of your health. (the amount you lost shows for a few seconds and then disappears, but you don't need that) Now go to Cheat Engine, and choose 'Decreased Value' and click 'Next Scan' When that scan is done, click hit me again, and repeat the above till you only find a few. We know the value is between 0 and 500, so pick the one that is most likely the address we need, and add it to the list. Now change the health to 5000, to proceed to the next step. The key thing to notice here is that, when we read memory from another process, we do so over entire regions . A memory region is represented by a starting offset, a size, and a bunch of other things like protection level. When running the first scan for an unknown value, all we need to remember is the starting offset and size for every single region. All the candidate locations that could point to our value fall within this range, so it is enough for us to store the range definition, and not every location within it. To gain a better understanding of what this means, let's come up with a more specific scenario. With our current approach of doing things, we store an address ( ) for every location pointing to our desired value. In the case of unknown values, all locations are equally valid, since we don't know what value they should point to yet, and any value they point to is good. With this representation, we would end up with a very large vector: This representation is dense. Every single number in the range is present. So why bother storing the values individually when the range is enough?: Much better! With two , one for the starting location and another for the end, we can indicate that we care about all the locations falling in that range. In fact, some accessible memory regions immediately follow eachother, so we could even compact this further and merge regions which are together. But due to their potential differences with regards to protection levels, we will not attempt to merge regions. We don't want to get rid of the old way of storing locations, because once we start narrowing them down, we will want to go back to storing just a few candidates. To keep things tidy, let's introduce a new representing either possibility: Let's also introduce another to perform the different scan types. For the time being, we will only worry about looking for in memory: When scanning for exact values, it's not necessary to store the value found. We already know they're all the same, for example, value . However, if the value is unknown, we do need to store it so that we can compare it in a subsequent scan to see if the value is the same or it changed. This means the value can be "any within" the read memory chunk: For every region in memory, there will be some candidate locations and a value (or value range) we need to compare against in subsequent scans: With all the data structures needed setup, we can finally refactor our old scanning code into a new method capable of dealing with all these cases. For brevity, I will omit the exact scan, as it remains mostly unchanged: Time to try it out! If we consider misaligned locations, there is a lot of potential addresses where we could look for. Running the same scan on Cheat Engine yields addresses, which is pretty close. It's probably skipping some additional regions that we are considering. Emulating Cheat Engine to perfection is not a concern for us at the moment, so I'm not going to investigate what regions it actually uses. Now that we have performed the initial scan and have stored all the and , we can re-implement the "next scan" step to handle any variant of our enum. This enables us to mix-and-match any mode in any order. For example, one could perform an exact scan, then one for decreased values, or start with unknown scan and scan for unchanged values. The tutorial suggests using "decreased value" scan, so let's start with that: Other scanning modes, such as decreased by a known amount rather than any decrease, increased, unchanged, changed and so on, are not very different from the "decreased" scan, so I won't bore you with the details. I will use a different method to perform a "rescan", since the first one is a bit more special in that it doesn't start with any previous values: If you've skimmed over that, I do not blame you. Here's the summary: for every existing region, when executing the scan mode "decreased", if the previous locations were dense, read the entire memory region. On success, if the previous values were a chunk of memory, iterate over the current and old memory at the same time, and for every aligned , if the new value is less, store it. It's also making me ill. Before I leave a mess on the floor, does it work? Okay, great, let's clean up this mess… Does it also make you uncomfortable to be writing something that you know will end up huge unless you begin refactoring other parts right now? I definitely feel that way. But I think it's good discipline to push through with something that works first, even if it's nasty, before going on a tangent. Now that we have the basic implementation working, let's take on this monster before it eats us alive. First things first, that method is inside an block. The deepest nesting level is 13. I almost have to turn around my chair to read the entire thing out! Second, we're nesting four matches. Three of them we care about: scan, candidate location, and value. If each of these has , and variants respectively, writing each of these by hand will require different implementations! Cheat Engine offers 10 different scans, I can think of at least 3 different ways to store candidate locations, and another 3 ways to store the values found. That's different combinations. I am not willing to write out all these ↪2 , so we need to start introducing some abstractions. Just imagine what a monster function you would end with! The horror! Third, why is the scan being executed in the process? This is something that should be done in the instead! Let's begin the cleanup: I already feel ten times better. Now, this method will unconditionally read the entire memory region, even if the scan or the previous candidate locations don't need it ↪3 . In the worst case with a single discrete candidate location, we will be reading a very large chunk of memory when we could have read just the 4 bytes needed for the . On the bright side, if there are more locations in this memory region, we will get read of them at the same time ↪4 . So even if we're moving more memory around all the time, it isn't too bad. Great! If reading memory succeeds, we want to rerun the scan: The rerun will live inside : An exact scan doesn't care about any previous values, so it behaves like a first scan. The first scan is done by the function (it contains the implementation factored out of the method), which only needs the region information and the current memory chunk we just read. The unknown scan leaves the region unchanged: any value stored is still valid, because it is unknown what we're looking for. The decreased scan will have to iterate over all the candidate locations, and compare them with the current memory chunk. But this time, we'll abstract this iteration too: For a dense candidate location, we iterate over all the 4-aligned addresses (fast scan for values), and yield . This way, the can do anything it wants with the old and new values, and if it finds a match, it can use the address. The method will deal with all the variants: This way, can easily use any value type. With this, we have all covered: in , in , and in . Now we can add as many variants as we want, and we will only need to update a single arm for each of them. Let's implement and try it out: Hmm… before we went down from to locations, and now we went down to . Where did we go wrong? After spending several hours on this, I can tell you where we went wrong. is always accessing the memory range , and not the right address. Here's the fix: Let's take a look at other possible types. Cheat Engine supports the following initial scan types: "Bigger than" and "Smaller than" can both be represented by "Value between", so it's pretty much just three. For subsequent scans, in addition to the scan types described above, we find: Not only does Cheat Engine provide all of these scans, but all of them can also be negated. For example, "find values that were not increased by 7". One could imagine to also support things like "increased value by range". For the increased and decreased scans, Cheat Engine also supports "at least xx%", so that if the value changed within the specified percentage interval, it will be considered. What about ? I can't tell you how Cheat Engine stores these, but I can tell you that can still be quite inefficient. Imagine you've started with a scan for unknown values and then ran a scan for unchanged valueus. Most values in memory will have been unchanged, but with our current implementation, we are now storing an entire address for each of these. One option would be to introduce , which would be a middle ground. You could implement it like and include a vector of booleans telling you which values to consider, or go smaller and use a bitstring or bit vector. You could use a sparse vector data structure. is very much like , except that it stores a value to compare against and not an address. Here we can either have an exact value, or an older copy of the memory. Again, keeping a copy of the entire memory chunk when all we need is a handful of values is inefficient. You could keep a mapping from addresses to values if you don't have too many. Or you could shrink and fragment the copied memory in a more optimal way. There's a lot of room for improvement! What if, despite all of the efforts above, we still don't have enough RAM to store all this information? The Cheat Engine Tutorial doesn't use a lot of memory, but as soon as you try scanning bigger programs, like games, you may find yourself needing several gigabytes worth of memory to remember all the found values in order to compare them in subsequent scans. You may even need to consider dumping all the regions to a file and read from it to run the comparisons. For example, running a scan for "unknown value" in Cheat Engine brings its memory up by the same amount of memory used by the process scanned (which makes sense), but as soon as I ran a scan for "unchanged value" over the misaligned values, Cheat Engine's disk usage skyrocketed to 1GB/s (!) for several seconds on my SSD. After it finished, memory usage went down to normal. It was very likely writing out all candidate locations to disk. There is a lot of things to learn from Cheat Engine just by observing its behaviour, and we're only scratching its surface. In the next post , we'll tackle the fourth step of the tutorial: Floating points. So far, we have only been working with for simplicity. We will need to update our code to be able to account for different data types, which will make it easy to support other types like , , or even strings, represented as an arbitrary sequence of bytes. As usual, you can obtain the code for this post over at my GitHub. You can run after cloning the repository to get the right version of the code. This version is a bit cleaner than the one presented in the blog, and contains some of the things described in the Going beyond section. Until next time! 1 Well, technically, we will perform a million memory reads ↪5 . The issue here is the million calls to , not reading memory per se. ↩ 2 Not currently. After a basic implementation works, writing each implementation by hand and fine-tuning them by treating each of them as a special case could yield significant speed improvements. So although it would be a lot of work, this option shouldn't be ruled out completely. ↩ 3 You could ask the candidate locations where one should read, which would still keep the code reasonably simple. ↩ 4 You could also optimize for this case by determining both the smallest and largest address, and reading enough to cover them both. Or apply additional heuristics to only do so if the ratio of the size you're reading compared to the size you need isn't too large and abort the joint read otherwise. There is a lot of room for optimization here. ↩ 5 (A footnote in a footnote?) The machine registers, memory cache and compiler will all help lower this cost, so the generated executable might not actually need that many reads from RAM. But that's getting way too deep into the details now. ↩ Part 1: Introduction (start here if you're new to the series!) Part 2: Exact Value scanning Part 3: Unknown initial value Part 4: Floating points Part 5: Code finder Part 6: Pointers Part 7: Code Injection Part 8: Multilevel pointers Exact Value Bigger than… Smaller than… Value between… Unknown initial value Increased value Increased value by… Decreased value Decreased value by… Changed value Unchanged value

0 views

Lonami 5 years ago

Writing our own Cheat Engine: Exact Value scanning

This is part 2 on the Writing our own Cheat Engine series: In the introduction, we spent a good deal of time enumerating all running processes just so we could find out the pid we cared about. With the pid now in our hands, we can do pretty much anything to its corresponding process. It's now time to read the process' memory and write to it. If our process was a single-player game, this would enable us to do things like setting a very high value on the player's current health pool, making us invincible. This technique will often not work for multi-player games, because the server likely knows your true current health (the most you could probably do is make the client render an incorrect value). However, if the server is crappy and it trusts the client, then you're still free to mess around with your current health. Even if we don't want to write to the process' memory, reading is still very useful. Maybe you could enhance your experience by making a custom overlay that displays useful information, or something that makes noise if it detects the life is too low, or even simulating a keyboard event to automatically recover some mana when you're running low. Be warned about anti-cheat systems. Anything beyond a basic game is likely to have some protection measures in place, making the analysis more difficult (perhaps the values are scrambled in memory), or even pinging the server if it detects something fishy. I am not responsible for any bans! Use your brain before messing with online games, and don't ruin the fun for everyone else. If you get caught for cheating, I don't want to know about it. Now that all script kiddies have left the room, let's proceed with the post. Now that you have opened the tutorial with Cheat Engine let's get on with the next step. You can see at the bottom of this window is the text Health: xxx. Each time you click 'Hit me' your health gets decreased. To get to the next step you have to find this value and change it to 1000 To find the value there are different ways, but I'll tell you about the easiest, 'Exact Value': First make sure value type is set to at least 2-bytes or 4-bytes. 1-byte will also work, but you'll run into an easy to fix problem when you've found the address and want to change it. The 8-byte may perhaps works if the bytes after the address are 0, but I wouldn't take the bet. Single, double, and the other scans just don't work, because they store the value in a different way. When the value type is set correctly, make sure the scantype is set to 'Exact Value'. Then fill in the number your health is in the value box. And click 'First Scan'. After a while (if you have a extremely slow pc) the scan is done and the results are shown in the list on the left If you find more than 1 address and you don't know for sure which address it is, click 'Hit me', fill in the new health value into the value box, and click 'Next Scan'. Repeat this until you're sure you've found it. (that includes that there's only 1 address in the list.....) Now double click the address in the list on the left. This makes the address pop-up in the list at the bottom, showing you the current value. Double click the value, (or select it and press enter), and change the value to 1000. If everything went ok the next button should become enabled, and you're ready for the next step. Note: If you did anything wrong while scanning, click "New Scan" and repeat the scanning again. Also, try playing around with the value and click 'hit me' The Cheat Engine tutorial talks about "value types" and "scan types" like "exact value". The value types will help us narrow down what we're looking for. For example, the integer type is represented in memory as 32 bits, or 4 bytes. However, is also represented by 4 bytes, and so is . Or perhaps the 4 bytes represent RGBA values of a color! So any 4 bytes in memory can be interpreted in many ways, and it's up to us to decide which way we interpret the bytes in. When programming, numbers which are 32-bit wide are common, as they're a good (and fast) size to work with. Scanning for this type is often a good bet. For positive numbers, is represented the same as in memory, so even if the value turns out to not be signed, the scan is likely to work. Focusing on will save us from scanning for or even other types, like interpreting 8 bytes for , , or less bytes like . The scan types will help us narrow down how we're looking for a value. Scanning for an exact value means what you think it does: interpret all 4 bytes in the process' memory as our value type, and check if they exactly match our value. This will often yield a lot of candidates, but it will be enough to get us started. Variations of the exact scan include checking for all values below a threshold, above, in between, or even just… unknown. What's the point of scanning for unknown values if everything in memory is unknown? Sometimes you don't have a concrete value. Maybe your health pool is a bar and it nevers tell you how much health you actually have, just a visual indicator of your percentage left, even if the health is not stored as a percentage. As we will find later on, scanning for unknown values is more useful than it might appear at first. We can access the memory of our own program by guessing random pointers and trying to read from them. But Windows isolates the memory of each program, so no pointer we could ever guess will let us read from the memory of another process. Luckily for us, searching for "read process memory winapi" leads us to the function. Spot on. Much like trying to dereference a pointer pointing to released memory or even null, reading from an arbitrary address can fail for the same reasons (and more). We will want to signal this with . It's funny to note that, even though we're doing something that seems wildly unsafe (reading arbitrary memory, even if the other process is mutating it at the same time), the function is perfectly safe. If we cannot read something, it will return , but if it succeeds, it has taken a snapshot of the memory of the process, and the returned value will be correctly initialized. The function will be defined inside our , since it conveniently holds an open handle to the process in question. It takes , because we do not need to mutate anything in the instance. After adding the feature to , we can perform the call: Great! But the address space is somewhat large. 64 bits large. Eighteen quintillion, four hundred and forty-six quadrillion, seven hundred and forty-four trillion, seventy-three billion, seven hundred and nine million, five hundred and fifty-one thousand, six hundred and sixteen ↪1 large. You gave up reading that, didn't you? Anyway, 18'446'744'073'709'551'616 is a big number. I am not willing to wait for the program to scan over so many values. I don't even have 16 exbibytes of RAM installed on my laptop yet ↪2 ! What's up with that? The program does not actually have all that memory allocated (surprise!). Random-guessing an address is extremely likely to point out to invalid memory. Reading from the start of the address space all the way to the end would not be any better. And we need to do better. We need to query for the memory regions allocated to the program. For this purpose we can use . Retrieves information about a range of pages within the virtual address space of a specified process. We have enumerated things before, and this function is not all that different. We start with a base address of zero ↪3 ( ), and ask the function to tell us what's in there. Let's try it out, with the crate feature in : That's annoying. It seems not everything has an , and you're supposed to send a PR if you want it to have debug, even if the feature is set. I'm surprised they don't auto-generate all of this and have to rely on manually adding as needed? Oh well, let's get rid of the feature and print it out ourselves: Hopefully we don't need to do this often: Awesome! There is a region at , and the of zero indicates that "the caller does not have access" when the region was created. However, is , and that is the current protection level. A value of one indicates : Disables all access to the committed region of pages. An attempt to read from, write to, or execute the committed region results in an access violation. Now that we know that the first region starts at 0 and has a size of 64 KiB, we can simply query for the page at to fetch the next region. Essentially, we want to loop until it fails, after which we'll know there are no more pages ↪4 : The size of the region beginning at the base address in which all pages have identical attributes, in bytes. …which also hints that the value we want is "base address", not the "allocation base". With these two values, we can essentially iterate over all the page ranges: That's a lot of pages! Let's try to narrow the amount of pages down. How many pages aren't ? Still a fair bit! Most likely, there are just a few interleaved pages, and the rest are allocated each with different protection levels. How much memory do we need to scan through? Wait, what? What do you mean over 4 GiB? The Task Manager claims that the Cheat Engine Tutorial is only using 2.1 MB worth of RAM! Perhaps we can narrow down the protection levels a bit more. If you look at the scan options in Cheat Engine, you will notice the "Memory Scan Options" groupbox. By default, it only scans for memory that is writable, and doesn't care if it's executable or not: Each memory protection level has its own bit, so we can OR them all together to have a single mask. When ANDing this mask with the protection level, if any bit is set, it will be non-zero, meaning we want to keep this region. Don't ask me why there isn't a specific bit for "write", "read", "execute", and there are only bits for combinations. I guess this way Windows forbids certain combinations. Hey, that's close to the value shown by the Task Manager! A handfull of megabytes is a lot more manageable than 4 entire gigabytes. Okay, we have all the memory regions from which the program can read, write, or execute. Now we also can read the memory in these regions: All that's left is for us to scan for a target value. To do this, we want to iterate over all the of size equal to the size of our scan type. We convert the 32-bit exact target value to its memory representation as a byte array in native byte order . This way we can compare the target bytes with the window bytes. Another option is to interpret the window bytes as an with , but gives us slices of type , and wants an array, so it's a bit more annoying to convert. This is enough to find the value in the process' memory! The tutorial starts out with health "100", which is what I scanned. Apparently, there are nearly a hundred of -valued integers stored in the memory of the tutorial. Attentive readers will notice that some values are located at an offset modulo 4. In Cheat Engine, this is known as "Fast Scan", which is enabled by default with an alignment of 4. Most of the time, values are aligned in memory, and this alignment often corresponds with the size of the type itself. For 4-byte integers, it's common that they're 4-byte aligned. We can perform a fast scan ourselves with ↪5 : As a bonus, over half the addresses are gone, so we have less results to worry about ↪6 . The first scan gave us way too many results. We have no way to tell which is the correct one, as they all have the same value. What we need to do is a second scan at the locations we just found . This way, we can get a second reading, and compare it against a new value. If it's the same, we're on good track, and if not, we can discard a location. Repeating this process lets us cut the hundreds of potential addresses to just a handful of them. For example, let's say we're scanning our current health of in a game. This gives us over a hundred addresses that point to the value of . If we go in-game and get hit ↪7 by some enemy and get our health down to, say, (we have a lot of defense), we can then read the memory at the hundred memory locations we found before. If this second reading is not , we know the address does not actually point to our health pool and it just happened to also contain a on the first scan. This address can be removed from the list of potential addresses pointing to our health. Let's do that: We create a vector to store all the locations the first scan finds, and then retain those that match a second target value. You may have noticed that we perform a memory read, and thus a call to the Windows API, for every single address. With a hundred locations to read from, this is not a big deal, but oftentimes you will have tens of thousands of addresses. For the time being, we will not worry about this inefficiency, but we will get back to it once it matters: Sweet! In a real-world scenario, you will likely need to perform these additional scans a couple of times, and even then, there may be more than one value left no matter what. For good measure, we'll wrap our in a loop ↪8 : Now that we have very likely locations pointing to our current health in memory, all that's left is writing our new desired value to gain infinite health ↪9 . Much like how we're able to read memory with , we can write to it with . Its usage is straightforward: Similar to how writing to a file can return short, writing to a memory location could also return short. Here we mimic the API for writing files and return the number of bytes written. The documentation indicates that we could actually ignore the amount written by passing as the last parameter, but it does no harm to retrieve the written count as well. And just like that: …oh noes. Oh yeah. The documentation, which I totally didn't forget to read, mentions: The handle must have and access to the process. We currently open our process with and , which is enough for reading, but not for writing. Let's adjust to accomodate for our new requirements: Isn't that active Next button just beautiful? This post somehow ended up being longer than part one, but look at what we've achieved! We completed a step of the Cheat Engine Tutorial without using Cheat Engine . Just pure Rust. Figuring out how a program works and reimplementing it yourself is a great way to learn what it's doing behind the scenes. And now that this code is yours, you can extend it as much as you like, without being constrained by Cheat Engine's UI. You can automate it as much as you want. And we're not even done. The current tutorial has nine steps, and three additional graphical levels. In the next post , we'll tackle the third step of the tutorial: Unknown initial value. This will pose a challenge, because with just 2 MiB of memory, storing all the 4-byte aligned locations would require 524288 addresses ( , 8 bytes). This adds up to twice as much memory as the original program (4 MiB), but that's not our main concern, having to perform over five hundred thousand API calls is! Remember that you can obtain the code for this post over at my GitHub. You can run after cloning the repository to get the right version of the code. 1 I did in fact use an online tool to spell it out for me. ↩ 2 16 GiB is good enough for my needs. I don't think I'll ever upgrade to 16 EiB. ↩ 3 Every address we query should have a corresponding region, even if it's not allocated or we do not have access. This is why we can query for the memory address zero to get its corresponding region. ↩ 4 Another option is to to determine the and and only work within bounds. ↩ 5 Memory regions are page-aligned, which is a large power of two. Our alignment of 4 is much lower than this, so we're guaranteed to start off at an aligned address. ↩ 6 If it turns out that the value was actually misaligned, we will miss it. You will notice this if, after going through the whole process, there are no results. It could mean that either the value type is wrong, or the value type is misaligned. In the worst case, the value is not stored directly but is rather computed with something like , or XORed with some magic value, or a myriad other things. ↩ 7 You could do this without getting hit, and just keep on repeating the scan for the same value over and over again. This does work, but the results are suboptimal, because there are also many other values that didn't change. Scanning for a changed value is a better option. ↩ 8 You could actually just go ahead and try to modify the memory at the hundred addresses you just found, although don't be surprised if the program starts to misbehave! ↩ 9 Okay, we cannot fit infinity in an . However, we can fit sufficiently large numbers. Like , which is enough to complete the tutorial. ↩ Part 1: Introduction (start here if you're new to the series!) Part 2: Exact Value scanning Part 3: Unknown initial value Part 4: Floating points Part 5: Code finder Part 6: Pointers Part 7: Code Injection Part 8: Multilevel pointers

Security

0 views

Lonami 5 years ago

Writing our own Cheat Engine: Introduction

This is part 1 on the Writing our own Cheat Engine series: Cheat Engine is a tool designed to modify single player games and contains other useful tools within itself that enable its users to debug games or other applications. It comes with a memory scanner, (dis)assembler, inspection tools and a handful other things. In this series, we will be writing our own tiny Cheat Engine capable of solving all steps of the tutorial, and diving into how it all works underneath. Needless to say, we're doing this for private and educational purposes only. One has to make sure to not violate the EULA or ToS of the specific application we're attaching to. This series, much like cheatengine.org, does not condone the illegal use of the code shared. Cheat Engine is a tool for Windows, so we will be developing for Windows as well. However, you can also read memory from Linux-like systems . GameConqueror is a popular alternative to Cheat Engine on Linux systems, so if you feel adventurous, you could definitely follow along too! The techniques shown in this series apply regardless of how we read memory from a process. You will learn a fair bit about doing FFI in Rust too. We will be developing the application in Rust, because it enables us to interface with the Windows API easily, is memory safe (as long as we're careful with !), and is speedy (we will need this for later steps in the Cheat Engine tutorial). You could use any language of your choice though. For example, Python also makes it relatively easy to use the Windows API . You don't need to be a Rust expert to follow along, but this series assumes some familiarity with C-family languages. Slightly advanced concepts like the use of or the type will be briefly explained. What a is or what does will not be explained. Cheat Engine's source code is mostly written in Pascal and C. And it's a lot of code, with a very flat project structure, and files ranging in the thousand lines of code each. It's daunting ↪1 . It's a mature project, with a lot of knowledge encoded in the code base, and a lot of features like distributed scanning or an entire disassembler. Unfortunately, there's not a lot of comments. For these reasons, I'll do some guesswork when possible as to how it's working underneath, rather than actually digging into what Cheat Engine is actually doing. With that out of the way, let's get started! This tutorial will teach you the basics of cheating in video games. It will also show you foundational aspects of using Cheat Engine (or CE for short). Follow the steps below to get started. Congratulations! If you did everything correctly, the process window should be gone with Cheat Engine now attached to the > tutorial (you will see the process name towards the top-center of CE). Click the "Next" button below to continue, or fill in the password and click the "OK" button to proceed to that step.) If you're having problems, simply head over to forum.cheatengine.org, then click on "Tutorials" to view beginner-friendly > guides! Our first step is attaching to the process we want to work with. But we need a way to find that process in the first place! Having to open the task manager, look for the process we care about, noting down the process ID (PID), and slapping it in the source code is not satisfying at all. Instead, let's enumerate all the processes from within the program, and let the user select one by typing its name. From a quick DuckDuckGo search , we find an official tutorial for Enumerating All Processes , which leads to the call. Cool! Let's slap in the crate on , because I don't want to write all the definitions by myself: Because is in (you can see this in the online page of its documentation), we know we'll need the crate feature. Another option is to search for it in the documentation and noting down the parent module where its stored. The documentation for the method has the following remark: It is a good idea to use a large array, because it is hard to predict how many processes there will be at the time you call EnumProcesses . Sidenote: reading the documentation for the methods we'll use from the Windows API is extremely important. There's a lot of gotchas involved, so we need to make sure we're extra careful. 1024 is a pretty big number, so let's go with that: We allocate enough space ↪2 for 1024 in a vector ↪3 , and pass a mutable pointer to the contents to . Note that the size of the array is in bytes , not items, so we need to multiply the capacity by the size of . The API likes to use for sizes, unlike Rust which uses , so we need a cast. Last, we need another mutable variable where the amount of bytes written is stored, . If the function fails, the return value is zero. To get extended error information, call . That's precisely what we do. If it returns false (zero), we return the last OS error. Rust provides us with , which essentially makes that same call but returns a proper instance. Cool! To determine how many processes were enumerated, divide the lpcbNeeded value by . Easy enough: Rust doesn't know that the memory for items were initialized by the call, but we do, so we make use of the call to indicate this. The Rust documentation even includes a FFI similar to our code! Let's give it a ride: It works! But currently we only have a bunch of process identifiers, with no way of knowing which process they refer to. To obtain process handles for the processes whose identifiers you have just obtained, call the function. The documentation for also contains the following: When you are finished with the handle, be sure to close it using the function. This sounds to me like the perfect time to introduce a custom with an ! We're using to cleanup resources, not behaviour, so it's fine. Using to cleanup behaviour is a bad idea . But anyway, let's get back to the code: For , we'll want to use (and we also need to add the feature to the dependency in ). It returns a , which is a nullable mutable pointer to . If it's null, the call failed, and if it's non-null, it succeeded and we have a valid handle. This is why we use Rust's : will return if the pointer is non-null. We map the non-null pointer to a instance with . converts the to a with the error builder function we provide if it was . The first parameter is a bitflag of permissions we want to have. For now, we can leave it as zero (all bits unset, no specific permissions granted). The second one is whether we want to inherit the handle, which we don't, and the third one is the process identifier. Let's close the resource handle on (after adding to the crate features): can actually fail (for example, on double-close), but given our invariants, it won't. You could add an to panic if this is not the case. We can now open processes, and they will be automatically closed on . Does any of this work though? …nope. Maybe the documentation for says something? The access to the process object. This access right is checked against the security descriptor for the process. This parameter can be one or more of the process access rights. One or more, but we're setting zero permissions. I told you, reading the documentation is important ↪4 ! The Process Security and Access Rights page lists all possible values we could use. seems to be appropriated: Required to retrieve certain information about a process, such as its token, exit code, and priority class Does this fix it? Nice . It does solve it. But why did we only open 69 processes out of 188? Does it help if we run our code as administrator? Let's search for in the Windows menu and right click to Run as administrator, then into our project and try again: We're able to open a few more, so it does help. In general, we'll want to run as administrator, so normal programs can't sniff on what we're doing, and so that we have permission to do more things. We're not done enumerating things just yet. To get the "name" of a process, we need to enumerate the modules that it has loaded, and only then can we get the module base name. The first module is the program itself, so we don't need to enumerate all modules, just the one is enough. For this we want and . I'm using the ASCII variant of because I'm too lazy to deal with UTF-16 of the (wide, unicode) variants. takes a pointer to an array of . We could use a of capacity one to hold the single module, but in memory, a pointer a single item can be seen as a pointer to an array of items. helps us reserve enough memory for the one item we need. With the module handle, we can retrieve its base name: Similar to how we did with , we create a buffer that will hold the ASCII string of the module's base name ↪5 . The call wants us to pass a pointer to a mutable buffer of , but Rust's wants a , so instead we declare a buffer of and the pointer in the call. You could also do this with , and Rust would infer the right type, but is neat. We the creation of the UTF-8 string because the buffer should contain only ASCII characters (which are also valid UTF-8). We could use the variant to create the string, but what if somehow it contains non-ASCII characters? The less , the better. Let's see it in action: That's not good. What's up with that? Maybe… The handle must have the and access rights. …I should've read the documentation. Okay, fine: Hooray 🎉! There's some processes we can't open, but that's because they're system processes. Security works! That was a fairly long post when all we did was print a bunch of pids and their corresponding name. But in all fairness, we also laid out a good foundation for what's coming next. You can obtain the code for this post over at my GitHub. At the end of every post, the last commit will be tagged, so you can to see the final code for any blog post. In the next post , we'll tackle the second step of the tutorial: Exact Value scanning. 1 You could say I simply love reinventing the wheel, which I do, but in this case, the codebase contains far more features than we're interested in. The (apparent) lack of structure and documentation regarding the code, along with the unfortunate lack of license for the source code, make it a no-go. There's a license, but I think that's for the distributed program itself. ↩ 2 If it turns out that there are more than 1024 processes, our code will be unaware of those extra processes. The documentation suggests to perform the call again with a larger buffer if , but given I have under 200 processes on my system, it seems unlikely we'll reach this limit. If you're worried about hitting this limit, simply use a larger limit or retry with a larger vector. ↩ 3 C code would likely use here, but Rust's handles the allocation for us, making the code both simpler and more idiomatic. In general, if you see calls to when porting some code to Rust, you can probably replace it with a . ↩ 4 This will be a recurring theme. ↩ 5 …and similar to , if the name doesn't fit in our buffer, the result will be truncated. ↩ Part 1: Introduction Part 2: Exact Value scanning Part 3: Unknown initial value Part 4: Floating points Part 5: Code finder Part 6: Pointers Part 7: Code Injection Part 8: Multilevel pointers Open Cheat Engine if it currently isn't running. Click on the "Open Process" icon (it's the top-left icon with the computer on it, below "File".). With the Process List window now open, look for this tutorial's process in the list. It will look something like > "00001F98-Tutorial-x86_64.exe" or "0000047C-Tutorial-i386.exe". (The first 8 numbers/letters will probably be different.) Once you've found the process, click on it to select it, then click the "Open" button. (Don't worry about all the > other buttons right now. You can learn about them later if you're interested.)

0 views

Lonami 6 years ago

Data Mining, Warehousing and Information Retrieval

During university, there were a few subjects where I had to write blog posts for (either as evaluable tasks or just for fun). I thought it was really fun and I wanted to preserve that work here, with the hopes it's interesting to someone. The posts series were auto-generated from the original HTML files and manually anonymized later. Data Mining and Data Warehousing Information Retrieval and Web Search

Database

HTML

Data

0 views

Lonami 6 years ago

My new computer

This post will be mostly me ranting about setting up a new laptop, but I also just want to share my upgrade. If you're considering installing Arch Linux with dual-boot for Windows, maybe this post will help. Or perhaps you will learn something new to troubleshoot systems in the future. Let's begin! Last Sunday, I ordered a Asus Rog Strix G531GT-BQ165 for 900€ (on a 20% discount) with the following specifications: I was mostly interested in a general upgrade (better processor, disk, more RAM), although the graphics card is a really nice addition which will allow me to take some time off on more games. After using it for a bit, I really love the feel of the keyboard, and I love the lack of numpad! (No sarcasm, I really don't like numpads.) This is an upgrade from my previous laptop (Asus X554LA-XX822T), which I won in a competition before entering university in a programming challenge. It has served me really well for the past five years, and had the following specifications: Prior to this one, I had a Lenovo (also won in the same competition of the previous year), and prior to that (just for the sake of history), it was HP Pavilion, AMD A4-3300M processor, which unfortunately ended with heating problems. But that's very old now. The laptop arrived 2 days ago at roughly 19:00, which I put charged for 3 hours as the book said. The day after, nightmares began! Trying to boot it the first two times was fun, as it comes with a somewhat loud sound on boot. I don't know why they would do this, and I immediately turned it off in the BIOS. I spent all of yesterday trying to setup Windows and Arch Linux (and didn't even finish, it took me this morning too and even now it's only half functional). I absolutely hate the amount of partitions the Windows installer creates on a clean disk. So instead, I first went with Arch Linux, and followed the installation guide on the Arch wiki . Pre-installation, setting up the wireless network, creating the partitions and formatting them went all good. I decided to avoid GRUB at first and go with rEFInd, but alas I missed a big warning on the wiki and after reboot (I would later find out) it was not mounting root properly, so all I had was whatever was in the Initramfs. Reboot didn't work, so I had to hold the power button. Anyway, once the partitions were created, I went to install Windows (there was a lot of back and forth burning different images on the USB, which was a bit annoying because it wasn't the fastest thing in the world). This was pretty painless, and the process was standard: select advanced to let me choose the right partition, pick the one, say "no" to everything in the services setup, and done. But this was the first Windows I tried. It was an old revision, and the drivers were causing issues when running (something weird about their , manually installing the driver files seemed to work?). The Nvidia drivers didn't want to be installed on such an old revision, after updating everything I could via Windows updates. So back I went to burning a newer Windows and going through the same process again… Once Windows was ready and I verified that I could boot to it correctly, it was time to have a second go at Arch Linux. And I went through the setup at least three times, getting it wrong every single time, formatting root every single time, redownloading the packages every single pain. If only had I known earlier what the issue was! Why bother with Arch? I was pretty happy with Linux Mint, and I lowkey wanted to try NixOS, but I had used Arch before and it's a really nice distro overall (up-to-date, has AUR, quite minimal, imperative), except for trying to install rEFInd while chrooted… In the end I managed to get something half-working, I still need to properly configure WiFi and pulseaudio in my system but hey it works. I like to be able to dual-boot Windows and Linux because Linux is amazing for productivity, but unfortunately, some games only work fine on Windows. Might as well have both systems and use one for gaming, while the other is my daily driver. This is the process I followed to install Arch Linux in the end, along with a brief explanation on what I think the things are doing and why we are doing them. I think the wiki could do a better job at this, but I also know it's hard to get it right for everyone. Something I do dislike is the link colour, after opening a link it becomes gray and it's a lot easier to miss the fact that it is a link in the first place, which was tough when re-reading it because some links actually matter a lot. Furthermore, important information may just be a single line, also easy to skim over. Anyway, on to the installation process… The first thing we want to do is configure our keyboard layout or else the keys won't correspond to what we expect: Because we're on a recent system, we want to verify that UEFI works correctly. If we see files listed, then it works fine: The next thing we want to do is configure the WiFi, because I don't have any ethernet cable nearby. To do this, we check what network interfaces our laptop has (we're looking for the one prefixed with "w", presumably for wireless, such as "wlan0" or "wlo1"), we set it up, scan for available wireless network, and finally connect. In my case, the network has WPA security so we rely on to connect, passing the SSID (network name) and password: After that's done, pinging an IP address like "1.1.1.1" should Just Work™, but to be able to resolve hostnames, we need to also setup a nameserver. I'm using Cloudflare's, but you could use any other: If the ping works, then network works! If you still have issues, you may need to manually configure a static IP address and add a route with the address of your, well, router. This basically shows if we have any address, adds a static address (so people know who we are), shows what route we have, and adds a default one (so our packets know where to go): Now that we have network available, we can enable NTP to synchronize our system time (this may be required for network operations where certificates have a validity period, not sure; in any case nobody wants a wrong system time): After that, we can manage our disk and partitions using . We want to define partitions to tell the system where it should live. To determine the disk name, we first list them, and then edit it. is really nice and reminds you at every step that help can be accessed with "m", which you should constantly use to guide you through. The partitions I made are the following: I like to have and separate because I can reinstall root without losing anything from home (projects, music, photos, screenshots, videos…). After the partitions are made, we format them in FAT32 and EXT4 which are good defaults for EFI, root and home. They need to have a format, or else they won't be usable: Because the laptop was new, there was no risk to lose anything, but if you're doing a install on a previous system, be very careful with the partition names. Make sure they match with the ones in . Now that we have usable partitions, we need to mount them or they won't be accessible. We can do this with : Remember to use the correct partitions while mounting. We mount everything so that the system knows which partitions we care about, which we will let know about later on. Next step is to setup the basic Arch Linux system on root, which can be done with . What follows the directory is a list of packages, and you may choose any you wish (at least add , and ). These can be installed later, but I'd recommend having them from the beginning, just in case: Because my system has an intel CPU, I also installed . Next up is generating the file, which we tell to use UUIDs to be on the safe side through . This file is important, because without it the system won't know what partitions exist and will happily only boot with the initramfs, without anything of what we just installed at root. Not knowing this made me restart the entire installation process a few times. After that's done, we can change our root into our mount point and finish up configuration. We setup our timezone (so DST can be handled correctly if needed), synchronize the hardware clock (to persist the current time to the BIOS), uncomment our locales (exit by pressing ESC, then type and press enter), generate locale files (which some applications need), configure language and keymap, update the hostname of our laptop and what indicate what means… Really, we could've done all of this later, and the same goes for setting root's password with or creating users (some of the groups you probably want are and ). The important part here is installing GRUB (which also needed the package): If we want GRUB to find our Windows install, we also need the and packages that we installed earlier with , and with those we need to mount the Windows partition somewhere. It doesn't matter where. With that done, we can generate the GRUB configuration file which lists all the boot options: (In my case, I installed Windows before completing the Arch install, which created an additional partition in between). With GRUB ready, we can exit the chroot and reboot the system, and if all went well, you should be greeted with a choice of operating system to use: If for some reason you need to find what mountpoints were active prior to rebooting (to them for example), you can use . Before GRUB I tried rEFInd, which as I explained had issues with for missing a warning. Then I tried systemd-boot, which did not pick up Arch at first. That's where the several reinstalls come from, I didn't want to work with a half-worked system so I mostly redid the entire process quite a few times. I had a external disk formatted with NTFS. Of course, after moving every file I cared about from my previous Linux install caused all the permissions to reset. All my repositories, dirty with file permission changes! This is going to take a while to fix, or maybe I should just . Here is a lovely command to sort them out on a per-repository basis: I never realized how much I had stored over the years, but it really was a lot. While moving things to the external disk, I tried to do some cleanup, such as removing some build artifacts which needlessly occupy space, or completely skipping all the binary application files. If I need those I will install them anyway. The process was mostly focused on finding all the projects and program data that I did care about, or even some game saves. Nothing too difficult, but definitely time consuming. Now that our system is ready, install to grab a copy of the speed. It should help speed up the download of whatever packages you want to install, since it will help us rank the mirrors by download speed . Making a copy of the file is important, otherwise whenever you try to install something it will fail saying it can't find anything. This will take a while, but it should be well worth it. We're using to see the progress as it goes. Some other packages I installed after I had a working system in no particular order: After that, I configured my Super L key to launch so that it opens the application menu, pretty much the same as it would on Windows, moved the panels around and configured them to my needs, and it feels like home once more. I made some mistakes while configuring systemd-networkd and accidentally added a service that was incorrect, which caused boot to wait for it to timeout before completing. My boot time was taking 90 seconds longer because of this! The solution was to remove said service , so this is something to look out for. In order to find what was taking long, I had to edit the kernel parameters to remove the option. I prefer seeing the output on what my computer is doing anyway, because it gives me a sense of progress and most importantly is of great value when things go wrong. Another interesting option is , which makes a disk lazily-mounted. If you have a slow disk, this could help speed things up. If you see a service taking long, you can also use to see what takes the longest, and is also helpful to find what services are active. My was spitting out a bunch of warnings: …ANSI encoding? Immediately I added the following to and : For some reason, I also had to edit 's preferences in advanced to change the default character encoding to UTF-8. This also solved my issues with pasting things into the terminal, and also proper rendering! I guess pastes were not working because it had some characters that could not be encoded. To have working notifications, I added the following to after : I'm pretty sure there's a better way to do this, or maybe it's not even necessary, but this works for me. Some of the other things I had left to do was setting up to speed up Rust builds: Once I had ready, installed and with it to perform screenshots. I also disabled the security delay when downloading files in Firefox because it's just annoying, in setting to , and added the Kill sticky headers to my bookmarks (you may prefer the updated version ). The comes with a utility to trim the SSD weekly , which I want enabled via (you may also want to it if you don't reboot often). For more SSD tips, check How to optimize your Solid State Drive . If the sound is funky prior to reboot, try and , or delete . I haven't been able to get the brightness keys to work yet, but it's not a big deal, because scrolling on the power manager plugin of Xfce does work (and also works, or writing directly to ). On the Windows side, I disabled the annoying Windows defender by running ( Ctrl+R ) and editing: I also updated the file (located at ) with the hope that it will stop some of the telemetry. Last, to have consistent time on Windows and Linux, I changed the following registry key for a with value : (The key might not exist, but you can create it if that's the case). All this time, my laptop had the keyboard lights on, which have been quite annoying. Apparently, they also can cause massive FPS drops . I headed over to Asus Rog downloads , selected Aura Sync… …great! I'll just find the Aura site somewhere else… Oh come on. After waiting for the next day, I headed over, downloaded their software, tried to install it and it was an awful experience. It felt like I was purposedly installing malware. It spammed and flashed a lot of 's on screen as if it was a virus. It was stuck at 100% doing that and then, Windows blue-screened with . Amazing. How do you screw up this bad? Well, at least rebooting worked. I tried to uninstall Aura, but of course that failed . Using the troubleshooter to uninstall programs helped me remove most of the crap that was installed. After searching around how to disable the lights (because my BIOS did not have this setting ), I stumbled upon "Armoury Crate" . Okay, fine, I will install that. The experience wasn't much better. It did the same thing with a lot of consoles flashing on screen. And of course, it resulted in another blue-screen, this time . To finish up, the BSOD kept happening as I rebooted the system. Time to reinstall Windows once more. After booting and crashing a few more times I could get into secure mode and perform the reinstall from there, which saved me from burning the again. Asus software might be good, but the software is utter crap. After trying out rogauracore (which didn't list my model), it worked! I could disable the stupid lights from Linux, and OpenRGB also works on Windows which may be worth checking out too. Because helped me and they linked to hw-probe , I decided to run it on my system , with the hopes it is useful for other people. I hope the installation journey is at least useful to someone, or that you enjoyed reading about it all. If not, sorry! Intel® Core i7-9750H (6 cores, 12MB cache, 2.6GHz up to 4.5GHz, 64-bit) 16GB RAM (8GB*2) DDR4 2666MHz 512GB SSD M.2 PCIe® NVMe Display 15.6" (1920x1080/16:9) 60Hz Graphics NVIDIA® GeForce® GTX1650 4GB GDDR5 VRAM LAN 10/100/1000 Wi-Fi 5 (802.11ac) 2x2 RangeBoost Bluetooth 5.0 48Wh battery with 3 cells 3 x USB 3.1 (GEN1) Intel® Core™ i5-5200U 4GB RAM DDR3L 1600MHz (which I upgraded to have 8GB) Display 15.6" (1366x768/16:9) Intel® HD Graphics 4400 LAN 10/100/1000 Wifi 802.11 bgn Bluetooth 4.0 Battery 2 cells 1 x USB 2.0 2 x USB 3.0 A 100MB one for the EFI system. A 32GB one for Linux' root partition. A 200GB one for Linux' home partition. The rest was unallocated for Windows because I did this first. and . I just love the simplicity of XFCE. , a really nice start menu. and , to quickly adjust the audio with my mouse. , a GUI alternative I generally prefer to . and to get nice integration with XFCE4 and audio mixing. , which comes with fonts too. A really good web browser. , to commit crimes code. , a wonderful editor which I used to write this blog entry. , so much nicer to write a simple commit message. and , my favourite language to toy around ideas or use as a calculator. , for my needs on sharing memes. and , a simple terminal music player and media player. , to connect into any VPS I have access to. , necessary to build most projects I'll find myself working with (or even compiling some projects Rust which I installed via ). , , , and , to be able to play more audio files. , to make random drawings. , to convert media or record screen. , to automatically copy screenshots to my clipboard. , needed by Thunar to handle mounting and having a trash (perma-deletion by default can be nasty sometimes). , , and , if you don't want missing gliphs everywhere. and , for notifications. , to be able to . Make sure to . (with , , and ) to uncompress stuff. to read files. is always nice to tinker around with SQLite databases. if you want to run Java applications. is nice with a SSD to view your disk statistics. Computer Configuration > Administrative Templates > Windows Components > Windows Defender » Turn off Windows Defender » Enable User Configuration > Administrative Templates > Start Menu and Taskbar » Remove Notifications and Action Center » Enable

Java

Hardware

Open Source

0 views

Lonami 6 years ago

Tips for Outpost

Outpost is a fun little game by Open Mid Interactive that has popped in recently in my recommended section of Steam, and I decided to give it a try. It's a fun tower-defense game with progression, different graphics and random world generation which makes it quite fun for a few hours. In this post I want to talk about some tips I found useful to get past night 50. At first, you may be inclined to design a checkerboard pattern like the following, where "C" is the Crystal shrine, "S" is a stone launcher and "B" is a booster: Indeed, this pattern will apply 4 boosts to every turret, but unfortunately, the other 4 slots of the booster are wasted! This is because boosters are able to power 8 different towers, and you really want to maximize that. Here's a better design: The shrine's tower does get boosted, but it's still not really worth it to boost it. This pattern works good, and it's really easy to tile: just repeat the same 3x3 pattern. Nonetheless, we can do better. What if we applied multiple boosters to the same tower while still applying all 8 boosts? That's what peak performance looks like. You can actually apply multiple boosters to the same tower, and it works great. Now, is it really worth it building anywhere except around the shrine? Not really. You never know where a boss will come from, so all sides need a lot of defense if you want to stand a chance. The addition of traps in 1.6 is amazing. You want to build these outside your strong "core", mostly to slow the enemies down so your turrets have more time to finish them off. Don't waste boosters on the traps, and build them at a reasonable distance from the center (the sixth tile is a good spot): If you gather enough materials, you can build more trap and cannon layers outside, roughly at enough distance to slow them for enough duration until they reach the next layer of traps, and so on. Probably a single gap of "cannon, booster, cannon" is enough between trap layers, just not in the center where you need a lot of fire power. Talents are the way progression works in the game. Generally, after a run, you will have enough experience to upgrade nearly all talents of roughly the same tier. However, some are worth upgrading more than others (which provide basically no value). The best ones to upgrade are: Some decent ones: Some okay ones: Some crap ones: Always build the highest tier, there's no point in anything lower than that. You will need to deal a lot of damage in a small area, which means space is a premium. If you're very early in the game, I recommend alternating both the flag and torch in a checkerboard pattern where the boosters should go in the pattern above. This way your towers will get extra speed and extra range, which works great. When you're in mid-game (stone launchers, gears and campfires), I do not recommend using campfires. The issue is their range boost is way too long, and the turrets will miss quite a few shots. It's better to put all your power into fire speed for increased DPS, at least near the center. If you manage to build too far out and some of the turrets hardly ever shoot, you may put campfires there. In end-game, of course alternate both of the highest tier upgrades. They are really good, and provide the best benefit / cost ratio. It is very important to use all your energy every day! Otherwise it will go to waste, and you will need a lot of materials. As of 1.6, you can mine two things at once if they're close enough! I don't know if this is intended or a bug, but it sure is great. Once you're in mid-game, your stone-based fort should stand pretty well against the nights on its own. After playing for a while you will notice, if your base can defend a boss, then it will have no issue carrying you through the nights until the next boss. You can (and should!) spend the nights gathering materials, but only when you're confident that the night won't run out. Before the boss hits (every fifth night), come back to your base and use all of your materials. This is the next fort upgrade that will carry it the five next nights. You may also speed up time during night, but make sure you use all your energy before hand. And also take care, in the current version of the game speeding up time only speeds up monster movement, not the fire rate or projectile speed of your turrets! This means they will miss more shots and can be pretty dangerous. If you're speeding up time, consider speeding it up for a little bit, then go back to normal until things are more calm, and repeat. If you're in the end-game, try to rush for chests. They provide a huge amount of materials which is really helpful to upgrade all your tools early so you can make sure to get the most out of every rock left in the map. In the end-game, after all stone has been collected, you don't really need to use all of your energy anymore. Just enough to have enough wood to build with the remaining stone. This will also be nice with the bow upgrades, which admitedly can get quite powerful, but it's best to have a strong fort first. In my opinion, winter is just the best of the seasons. You don't really need that much energy (it gets tiresome), or extra tree drops, or luck. Slower movement means your turrets will be able to shoot enemies for longer, dealing more damage over time, giving them more chance to take enemies out before they reach the shrine. Feel free to re-roll the map a few times (play and exit, or even restart the game) until you get winter if you want to go for The Play. In my opinion, you really should rush for the best pickaxe you can afford. Stone is a limited resource that doesn't regrow like trees, so once you run out, it's over. Better to make the best use out of it with a good pickaxe! You may also upgrade your greaves, we all known faster movement is a really nice quality of life improvement. Of course, you will eventually upgrade your axe to chop wood (otherwise it's wasted energy, really), but it's not as much of a priority as the pickaxe. Now, the bow is completely useless. Don't bother with it. Your energy is better spent gathering materials to build permanent turrets that deal constant damage while you're away, and the damage adds up with every extra turret you build. With regards to items you carry (like sword, or helmet), look for these (from best to worst): Less minion life, nothing to say. You will need it near end-game. The chance to not consume energy is better the more energy you have. With a 25% chance not to consume energy, you can think of it as 1 extra energy for every 4 energy you have on average. Turret damage is a tough one, it's amazing mid-game (it basically doubles your damage) but falls short once you unlock the cannon where you may prefer other items. Definitely recommended if you're getting started. You may even try to roll it on low tiers by dying on the second night, because it's that good. Extra energy is really good, because it means you can get more materials before it gets too rough. Make sure you have built at least two beds in the first night! This extra energy will pay of for the many nights to come. The problem with free wood or stone per day is that you have, often, five times as much energy per day. By this I mean you can get easily 5 stone every day, which means 5 extra stone, whereas the other would provide just 1 per night. On a good run, you will get around 50 free stone or 250 extra stone. It's a clear winner. In end-game, more quality of life are revealing chests so that you can rush them early, if you like to hunt for them try to make better use of the slot. I hope you enjoy the game as much as I do! Movement is sometimes janky and there's the occassional lag spikes, but despite this it should provide at least a few good hours of gameplay. Beware however a good run can take up to an hour! Starting supplies. Amazing to get good tools early. Shrine shield. Very useful to hold against tough bosses. Better buildings (cannon, boosters, bed and traps). They're a must to deal the most damage. Better pickaxe. Stone is limited, so better make good use of it. Better chests. They provide an insane amount of resources early. Winter slow. Turrets will have more time to deal damage, it's perfect. More time. Useful if you're running out, although generally you enter nights early after having a good core anyway. More rocks. Similar to a better pickaxe, more stone is always better. In-shrine turret. It's okay to get past the first night without building but not much beyond that. Better axe and greaves. Great to save some energy and really nice quality of life to move around. Tree growth. Normally there's enough trees for this not to be an issue but it can save some time gathering wood. Wisps. They're half-decent since they can provide materials once you max out or max out expensive gear. Extra XP while playing. Generally not needed due to the way XP scales per night, but can be a good boost. Runestones. Not as reliable as chests but some can grant more energy per day. Boosts for other seasons. I mean, winter is already the best, no use there. Bow. The bow is very useless at the moment, it's not worth your experience. More energy per bush. Not really worth hunting for bushes since you will have enough energy to do well. Less minion life. Chance to not consume energy. +1 turret damage. Extra energy. +1 drop from trees or stones. +1 free wood or stone per day.

0 views

Lonami 7 years ago

Python ctypes and Windows

Python 's is quite a nice library to easily load and invoke C methods available in already-compiled files without any additional dependencies. And I love depending on as little as possible. In this blog post, we will walk through my endeavors to use with the Windows API , and do some cool stuff with it. We will assume some knowledge of C/++ and Python, since we will need to read and write a bit of both. Please note that this post is only an introduction to , and if you need more information you should consult the Python's documentation for . While the post focuses on Windows' API, the code here probably applies to unix-based systems with little modifications. First of all, let's learn how to load a library. Let's say we want to load : Yes, it's that simple. When you access an attribute of , said library will load. Since Windows is case-insensitive, we will use lowercase consistently. Calling a function is just as simple. Let's say you want to call , which is defined as follows: Okay, it returns a and takes two inputs, and . So we can call it like so: Try it! Your cursor will move! We can go a bit more crazy and make it form a spiral: Ah, it's always so pleasant to do random stuff when programming. Sure makes it more fun. was really simple. It took two parameters and they both were integers. Let's go with something harder. Let's go with ! Emulating input will be a fun exercise: Okay, , what are you? Microsoft likes to prefix types with what they are. In this case, stands for "Long Pointer" (I guess?), so is just a Long Pointer to : Alright, that's new. We have a and , two different concepts. We can define both with : Structures are classes that subclass , and you define their fields in the class-level variable, which is a list of tuples . The C structure had a . is a , and is a name like any other, which is why we did . But what about the union? It's anonymous, and we can't make anonymous unions ( citation needed ) with . We will give it a concrete name and a type. Before defining the union, we need to define its inner structures, , and . We won't be using them all, but since they count towards the final struct size (C will choose the largest structure as the final size), we need them, or Windows' API will get confused and refuse to work (personal experience): Some things to note: Now that we have all the types we need defined, we can use them: Run it! It will press and release the keys to type the word ! stands for "virtual key". Letters correspond with their upper-case ASCII value, which is what we did above. You can find all the available keys in the page with all the Virtual Key Codes . What happens if a method wants something by reference? That is, a pointer to your thing? For example, : It wants a Long Pointer to . We can do just that with : Now you can track the mouse position! Make sure to the program when you're tired of it. What happens if a method wants a dynamically-sized input? In that case, you can create an in-memory of with . It will return a character array of that size, which you can pass as a pointer directly (without ). To access the buffer's contents, you can use either or : When the method fills in the data, you can your buffer back into a pointer of a concrete type: And you can de-reference pointers with : Arrays are defined as . Your linter may not like that, and if you don't know the size beforehand, consider creating a 0-sized array. For example: If there's a better way to initialize arrays, please let me know. Under Windows, the module has a submodule. This one contains definitions like which may be useful and can be imported as: Some functions (I'm looking at you, ) ask us to pass a callback. In this case, it wants a : The naive approach won't work: Instead, you must wrap your function as a C definition like so: You may have noticed this is what decorators do, wrap the function. So… …will also work. And it is a lot fancier. With the knowledge above and some experimentation, you should be able to call and do (almost) anything you want. That was pretty much all I needed on my project anyway :) We have been letting Python convert Python values into C values, but you can do so explicitly too. For example, you can use to make sure to pass that as a . And if you have a , you can convert or cast it to its Python as . The same applies for integers, longs, floats, doubles… pretty much anything, char pointers (strings) included. If you can't find something in their online documentation, you can always for it in the directory. Note that the 's that you define can have more methods of your own. For example, you can write them a to easily view its fields, or define a to re-interpret some data in a meaningful way. For enumerations, you can pass just the right integer number, make a constant for it, or if you prefer, use a . For example, would be: And you should be able to pass as the parameter now. If you see a function definition like , that's C's way of saying it takes no parameters, so just call it as . Make sure to pass all parameters, even if they seem optional they probably still want a at least, and of course, read the documentation well. Some methods have certain pre-conditions. Have fun hacking! Pointers are defined as . The field names can be anything you want. You can make them more "pythonic" if you want (such as changing for just ), but I chose to stick with the original naming. The union is very similar, but it uses instead of . We gave a name to the anonymous union, , and used it inside with also a made-up name, .

API

0 views

Lonami 7 years ago

Shattered Pixel Dungeon

Shattered Pixel Dungeon is the classic roguelike RPG game with randomly-generated dungeons. As a new player, it was a bit frustrating to be constantly killed on the first levels of the dungeon, but with some practice it's easy to reach high levels if you can kill the first boss. The game comes with its own tips, but here's a short and straight-forward summary: There is a boss every 5 levels. If you followed the basic tips, you will sooner or later make use of two scrolls of upgrade in a single run. This will unlock the mage class, which is ridiculously powerful. He starts with a ranged-weapon, a magic missile wand, which is really helpful to keep enemies at a distance. Normally, you want to use this at first to surprise attack them soon, and if you are low on charges, you may go melee on normal enemies if you are confident. This game is all about luck and patience! Some runs will be better than others, and you should thank and pray the RNG gods for them. If you don't, they will only give you cursed items and not a single scroll to clean them. So, good luck and enjoy playing! Don't rush into enemies . Abuse doors and small corridors to kill them one by one. You can use the clock on the bottom left to wait a turn without moving. Explore each level at full . You will find goodies and gain XP while doing so. Upon finding a special room (e.g. has a chest but is protected by piranhas), drink all potions that you found in that level until there's one that helps you (e.g. be invisible so piranhas leave you alone). There is guaranteed to be a helpful one per level with special rooms. Drink potions as early as possible . Harmful potions do less damage on early levels (and if you die, you lose less). This will keep them identified early for the rest of the game. Read scrolls as early as possible as well. This will keep them identified. It may be worth to wait until you have an item which may be cursed and until the level is clear, because some scrolls clean curses and others alert enemies. Food and health are resources that you have to manage , not keep them always at full. Even if you are starving and taking damage, you may not need to eat just yet , since food is scarce. Eat when you are low on health or in possible danger. Piranhas . Seriously, just leave them alone if you are melee. They're free food if you're playing ranged, though. Prefer armor over weapons . And make sure to identify or clean it from curses before wearing anything! Find a dew vial early . It's often a better idea to store dew (health) for later than to use it as soon as possible. Level 5 boss . Try to stay on water, but don't let it stay on water since it will heal. Be careful when he starts enraging. Level 10 boss . Ranged weapons are good against it. Level 15 boss . I somehow managed to tank it with a health potion. Level 20 boss . I didn't get this far just yet. You are advised to use scrolls of magic mapping in the last levels to skip straight to the boss, since there's nothing else of value. Level 25 boss . The final boss. Good job if you made it this far!

0 views

Lonami 7 years ago

Installing NixOS, Take 2

This is my second take at installing NixOS, after a while being frustrated with Arch Linux and the fact that a few kernel upgrades ago, the system crashed randomly from time to time. did not have any helpful hints and I thought reinstalling could be worthwhile anyway. This time, I started with more knowledge! The first step is heading to the NixOS website and downloading their minimal installation CD for 64 bits. I didn't go with their graphical live CD, because their installation manual is a wonderful resource that guides you nicely. Once you have downloaded their , you should probably verify it's and make sure that it matches. The easiest thing to do in my opinion is using an USB to burn the image in it. Plug it in and check its device name with . In my case, it was , so I went ahead with it and ran . Make sure to run once that's done. If either or seem "stuck" in the end, they are just flushing the changes to disk to make sure all is good. This is normal, and depends on your drives. Now, reboot your computer with the USB plugged in and make sure to boot into it. You should be welcome with a pretty screen. Just select the first option and wait until it logs you in as root. Once you're there you probably want to or whatever your keyboard layout is, or you will have a hard time with passwords, since the characters are all over the place. In a clean disk, you would normally create the partitions now. In my case, I already had the partitions made (100MB for the EFI system, where lives, 40GB for the root partition with my old Linux installation, and 700G for ), so I didn't need to do anything here. The manual showcases , but I personally use , which has very helpful help I check every time I use it. Important : The in is probably different in your system! Make sure you use to see the correct letters and numbers! With the partitions ready in my UEFI system, I formatted both and just to be safe with and (remember that these are the letters and numbers used in my partition scheme). Don't worry about the warning in the second command regarding lowercase letters and Windows. It's not really an issue. Now, since we gave each partition a label, we can easily mount them through and, in UEFI systems, be sure to and . I didn't bother setting up swap, since I have 8GB of RAM in my laptop and that's really enough for my use case. With that done, we will now ask the configuration wizard to do some work for us (in particular, generate a template) with . This generates a very well documented file that we should edit right now (and this is important!) with whatever editor you prefer. I used , but you can change it for if you prefer. On to the configuration file, we need to enable a few things, so and start scrolling down. We want to make sure to uncomment: (Fun fact, I overlooked the configuration file until I wrote this and hadn't noticed sound/pulseaudio was there. It wasn't hard to find online how to enable it though!) Now, let's modify . But if you have in a separate partition like me, you should run to figure out its UUID. To avoid typing it out myself, I just ran so that I could easily move it around with : Note that, obviously, you should put your own partition's UUID there. Modifying the configuration is where I think the current NixOS' manual should have made more emphasis, at this step of the installation. They do detail it below, but that was already too late in my first attempt. Anyway, you can boot from the USB and run as many times as you need until you get it working! But before installing, we need to configure the network since there are plenty of things to download. If you want to work from WiFi, you should first figure out the name of your network card with . In my case it's called . So with that knowledge we can run . Be sure to replace both and with the name of your network and password key, respectively. If they have spaces, surround them in quotes. Another funny pitfall was typing in the command above twice (instead of ). That sure spit out a few funny errors! Once you have ran that, wait a few seconds and to make sure that you can reach the internet. If you do, and let's install NixOS! Well, that was pretty painless. You can now and enjoy your new, functional system. The process of installing NixOS was really painless once you have made sense out of what things mean. I was far more pleased this time than in my previous attempt, despite the four attempts I needed to have it up and running. However not all is so good. I'm not sure where I went wrong, but the first time I tried with instead of , all I was welcome with was a white, small terminal in the top left corner. I even generated a configuration file with to make sure it could detect my Mod1/Mod4 keys (which, it did), but even after rebooting, my commands weren't responding. For example, I couldn't manage to open another terminal with . I'm not even sure that I was in … In my very first attempt, I pressed as suggested in the welcome message. This took me an offline copy of the manual, which is really nicely done. Funny enough, though, I couldn't exit . Both and to quit and take me back wouldn't work. Somehow, it kept throwing me back into , so I had to forcibly shutdown. In my second attempt, I also forgot to configure network, so I had no way to download without having itself to connect my laptop to the network! So, it was important to do that through the USB before installing it (which comes with the program preinstalled), just by making sure to add it in the configuration file. Some other notes, if you can't reach the internet, don't add any DNS in . This should be done declaratively in . In the end, I spent the entire afternoon playing around with it, taking breaks and what-not. I still haven't figured out why was printing the literal escape character when going from normal to insert mode in the (and other actions also made it print this "garbage" to the console), why sometimes the network can reach the internet (and only some sites!) and sometimes not, and how to setup dualboot. But despite all of this, I think it was a worth installing it again. One sure sees things from a different perspective, and gets the chance to write another blog post! If there's something I overlooked or that could be done better, or maybe you can explain it differently, please be sure to contact me to let me know! Well, that was surprisingly fast feedback. Thank you very much @bb010g for it! As they rightfully pointed out, one can avoid adding manually to if you mount it before generating the configuration files. However, the installation process doesn't need mounted, so I didn't do it. The second weird issue with is actually a funny one. switches to another TTY ! That's why quitting the program wouldn't do anything. You'd still be in a different TTY! Normally, this is , so I hadn't even thought that this is what could be happening. Anyway, the solution is not quitting the program, but rather going back to the main TTY with . You can switch back and forth all you need to consult the manual. More suggestions are having manage the graphical sessions, since it should be easier to deal with than the alternatives. Despite having followed the guide and having read it over and over several times, it seems like my thoughts in this blog post may be a bit messy. So I recommend you also reading through the guide to have two versions of all this, just in case. Regarding network issues, they use so that may be worth checking out. Regarding terminal issues with printing the literal escape character, I was told off for not having checked what my was. I hadn't really looked into it much myself, just complained about it here, so sorry for being annoying about that. A quick search in the repository lets us find neovim/default.nix , with version 0.3.1. Looking at Neovim's main repository we can see that this is a bit outdated, but that is fine. If only I had bothered to look at Neovim's wiki , (which they found through Neovim's GitHub issues ) I would've seen that some terminals just don't support the program properly. The solution is, of course, to use a different terminal emulator with better support or to disable the in Neovim's config. This is a pretty good life lesson. 30 seconds of searching, maybe two minutes and a half for also checking XFCE issues, are often more than enough to troubleshoot your issues. The internet is a big place and more people have surely came across the problem before, so make sure to look online first. In my defense I'll say that it didn't bother me so much so I didn't bother looking for that soon either.

Linux

Open Source

0 views

Lonami 7 years ago

Breaking Risk of Rain

Risk of Rain is a fun little game you can spend a lot of hours on. It's incredibly challenging for new players, and fun once you have learnt the basics. This blog will go through what I've learnt and how to play the game correctly. If you're new to the game, you may find it frustrating. You must learn very well to dodge. Your first character will be Commando . He's actually a very nice character. Use your third skill (dodge) to move faster, pass through large groups of enemies, and negate fall damage. If there are a lot of monsters, remember to leave from there! It's really important for survival. Most enemies don't do body damage . Not even the body of the Magma Worm or the Wandering Vagrant (just dodge the head and projectiles respectively). The first thing you must do is always rush for the teleporter . Completing the levels quick will make the game easier. But make sure to take note of where the chests are ! When you have time (even when the countdown finishes), go back for them and buy as many as you can. Generally, prefer chests over shrines since they may eat all your money. Completing the game on Drizzle is really easy if you follow these tips. Before breaking the game, you must obtain several artifacts . We are interested in particular in the following: With those, the game becomes trivial. Playing as Huntress is excellent since she can move at high speed while killing everything on screen. The rest is easy! With the command artifact you want the following items. If you want to be safer: If you don't have enough and want more fun, get one of these: If, again, you want more fun, get one of these: For more fun: You can now beat the game in Monsoon solo with any character. Have fun! And be careful with the sadly common crashes. Sacrifice . You really need this one, and may be a bit hard to get. With it, you will be able to farm the first level for 30 minutes and kill the final boss in 30 seconds. Command . You need this unless you want to grind for hours to get enough of the items you really need for the rest of the game. Getting this one is easy. Glass . Your life will be very small (at the beginning…), but you will be able to one-shot everything easily. Kin (optional). It makes it easier to obtain a lot of boxes if you restart the first level until you get lemurians or jellyfish as the monster, since they're cheap to spawn. Soldier's Syringe . Stack 13 of these and you will triple your attack speed. You can get started with 4 or so. Paul's Goat Hoof . Stack +30 of these and your movement speed will be insane. You can get a very good speed with 8 or so. Crowbar . Stack +20 to guarantee you can one-shot bosses. Hermit's Scarf . Stack 6 of these to dodge 1/3 of the attacks. Monster Tooth . Stack 9 of these to recover 50 life on kill. This is plenty, since you will be killing a lot . Gasoline . Burn the ground on kill, and more will die! Headstompers . They make a pleasing sound on fall, and hurt. Lens-Maker's Glasses . Stack 14 and you will always deal a critical strike for double the damage. Infusion . You only really need one of this. Your life will skyrocket after a while, since this gives you 1HP per kill. Hopoo Feather . Stack +10 of these. You will pretty much be able to fly with so many jumps. Guardian's Heart . Not really necessary, but useful for early and late game, since it will absorb infinite damage the first hit. Ukelele . Spazz your enemies! Will-o'-the-wisp . Explode your enemies! Chargefield Generator . It should cover your entire screen after a bit, hurting all enemies without moving a finger. Golden Gun . You will be rich, so this gives you +40% damage. Predatory Instincts . If you got 14 glasses, you will always be doing critical strikes, and this will give even more attack speed. 56 Leaf Clover . More drops, in case you didn't have enough. Ceremonial Dagger . Stack +3 , then killing one thing kills another thing and makes a chain reaction. Alien Head . Stack 3 , and you will be able to use your abilities more often. Brilliant Behemoth . Boom boom.

0 views

Lonami 8 years ago

WorldEdit Commands

WorldEdit is an extremely powerful tool for modifying entire worlds within Minecraft , which can be used as either a mod for your single-player worlds or as a plugin for your Bukkit servers. This command guide was written for Minecraft 1.12.1, version 6.1.7.3 , but should work for newer versions too. All WorldEdit commands can be used with a double slash ( ) so they don't conlict with built-in commands. This means you can get a list of all commands with . Let's explore different categories! In order to edit a world properly you need to learn how to move in said world properly. There are several straightforward commands that let you move: Knowing your world properly is as important as knowing how to move within it, and will also let you change the information in said world if you need to. You can act over all blocks in a radius around you with quite a few commands. Some won't actually act over the entire range you specify, so 100 is often a good number. You can fill pools with or caves with , both of which act below your feet. If the water or lava is buggy use or respectively. Some creeper removed the snow or the grass? Fear not, you can use or . You can empty a pool completely with , remove the snow with , and remove fire with . You can remove blocks above and below you in some area with the and . You probably want to set a limit though, or you could fall off the world with for radius and depth. You can also remove near blocks with . Making a cylinder (or circle) can be done with through , a third argument for the height. The radius can be comma-separated to make a ellipses instead, such as . Spheres are done with . This will build one right at your center, so you can raise it to be on your feet with . Similar to cylinders, you can comma separate the radius . Pyramids can be done with . All these commands can be prefixed with "h" to make them hollow. For instance, . Operating over an entire region is really important, and the first thing you need to work comfortably with them is a tool to make selections. The default wooden-axe tool can be obtained with , but you must be near the blocks to select. You can use a different tool, like a golden axe, to use as your "far wand" (wand usable over distance). Once you have one in your hand type to use it as your "far wand". You can select the two corners of your region with left and right click. If you have selected the wrong tool, use to clear it. If there are no blocks but you want to use your current position as a corner, use or 2. If you made a region too small, you can enlarge it with , or for the entire vertical range, etc., or make it smaller with etc., or it to contract in both directions. You can use short-names for the cardinal directions (NSEW). Finally, if you want to move your selection, you can it to wherever you need. You can get the of the selection or even in some area. If you want to count all blocks, get their distribution . With a region selected, you can it to be any block! For instance, you can use to clear it entirely. You can use more than one block evenly by separting them with a comma , or with a custom chance . You can use instead if you don't want to override all blocks in your selection. You can make an hollow set with , and if you just want the walls, use . If someone destroyed your wonderful snow landscape, fear not, you can use over it (although for this you actually have and its opposite ). If you set some rough area, you can always it, even more than one time with . You can get your dirt and stone back with and put some plants with or , both of which support a density or even the type for the trees. If you already have the dirt use instead. If you want some pumpkins, with . You can repeat an entire selection many times by stacking them with . This is extremely useful to make things like corridors or elevators. For instance, you can make a small section of the corridor, select it entirely, and then repeat it 10 times with . Or you can make the elevator and then . If you need to also copy the air use . Finally, if you don't need to repeat it and simply move it just a bit towards the right direction, you can use . The default direction is "me" (towards where you are facing) but you can set one with for example. You can not only select cuboids. You can also select different shapes, or even just points: Brushes are a way to paint in 3D without first bothering about making a selection, and there are spherical and cylinder brushes with e.g. , or the shorter form . For cylinder, one must use instead . There also exists a brush to smooth the terrain which can be enabled on the current item with , which can be used with right-click like any other brush. Finally, you can copy and cut things around like you would do with normal text with and . The copy is issued from wherever you issue the command, so when you use , remember that if you were 4 blocks apart when copying, it will be 4 blocks apart when pasting. The contents of the clipboard can be flipped to wherever you are looking via , and can be rotated via the command (in degrees). To remove the copy use . goes up one floor. goes down one floor. let's you pass through walls. to go wherever you are looking. shows all known biomes. shows the current biome. lets you change the biome. is the default. expands the default. first point with left click and right click to add new points. first point to select the center and right click to select the different radius. first point to select the center and one more right click for the radius. for cylinders, first click being the center. for convex shapes. This one is extremely useful for .

0 views

Lonami 8 years ago

An Introduction to Asyncio

After seeing some friends struggle with I decided that it could be a good idea to write a blog post using my own words to explain how I understand the world of asynchronous IO. I will focus on Python's module but this post should apply to any other language easily. So what is and what makes it good? Why don't we just use the old and known threads to run several parts of the code concurrently, at the same time? The first reason is that makes your code easier to reason about, as opposed to using threads, because the amount of ways in which your code can run grows exponentially. Let's see that with an example. Imagine you have this code: And you start two threads to run the method at the same time. What is the order in which the lines of code get executed? The answer is that you can't know! The first thread can run the entire method before the second thread even starts. Or it could be the first thread that runs after the second thread. Perhaps both run the "line 1", and then the line 2. Maybe the first thread runs lines 1 and 2, and then the second thread only runs the line 1 before the first thread finishes. As you can see, any combination of the order in which the lines run is possible. If the lines modify some global shared state, that will get messy quickly. Second, in Python, threads won't make your code faster most of the time. It will only increase the concurrency of your program (which is okay if it makes many blocking calls), allowing you to run several things at the same time. If you have a lot of CPU work to do though, threads aren't a real advantage. Indeed, your code will probably run slower under the most common Python implementation, CPython, which makes use of a Global Interpreter Lock (GIL) that only lets a thread run at once. The operations won't run in parallel! Before we go any further, let's first stop to talk about input and output, commonly known as "IO". There are two main ways to perform IO operations, such as reading or writing from a file or a network socket. The first one is known as "blocking IO". What this means is that, when you try performing IO, the current application thread is going to block until the Operative System can tell you it's done. Normally, this is not a problem, since disks are pretty fast anyway, but it can soon become a performance bottleneck. And network IO will be much slower than disk IO! Blocking IO offers timeouts, so that you can get control back in your code if the operation doesn't finish. Imagine that the remote host doesn't want to reply, your code would be stuck for as long as the connection remains alive! But wait, what if we make the timeout small? Very, very small? If we do that, we will never block waiting for an answer. That's how asynchronous IO works, and it's the opposite of blocking IO (you can also call it non-blocking IO if you want to). How does non-blocking IO work if the IO device needs a while to answer with the data? In that case, the operative system responds with "not ready", and your application gets control back so it can do other stuff while the IO device completes your request. It works a bit like this: In reality, you can tell the OS to notify you when the data is ready, as opposed to polling (constantly asking the OS whether the data is ready yet or not), which is more efficient. But either way, that's the difference between blocking and non-blocking IO, and what matters is that your application gets to run more without ever needing to wait for data to arrive, because the data will be there immediately when you ask, and if it's not yet, your app can do more things meanwhile. Now we've seen what blocking and non-blocking IO is, and how threads make your code harder to reason about, but they give concurrency (yet not more speed). Is there any other way to achieve this concurrency that doesn't involve threads? Yes! The answer is . So how does help? First we need to understand a very crucial concept before we can dive any deeper, and I'm talking about the event loop . What is it and why do we need it? You can think of the event loop as a loop that will be responsible for calling your functions: That's silly you may think. Now not only we run our code but we also have to run some "event loop". It doesn't sound beneficial at all. What are these events? Well, they are the IO events we talked about before! 's event loop is responsible for handling those IO events, such as file is ready, data arrived, flushing is done, and so on. As we saw before, we can make these events non-blocking by setting their timeout to 0. Let's say you want to read from 10 files at the same time. You will ask the OS to read data from 10 files, and at first none of the reads will be ready. But the event loop will be constantly asking the OS to know which are done, and when they are done, you will get your data. This has some nice advantages. It means that, instead of waiting for a network request to send you a response or some file, instead of blocking there, the event loop can decide to run other code meanwhile. Whenever the contents are ready, they can be read, and your code can continue. Waiting for the contents to be received is done with the keyword, and it tells the loop that it can run other code meanwhile: Start reading the code of the event loop and follow the arrows. You can see that, in the beginning, there are no events yet, so the loop calls one of your functions. The code runs until it has to for some IO operation to complete, such as sending a request over the network. The method is "paused" until an event occurs (for example, an "event" occurs when the request has been sent completely). While the first method is busy, the event loop can enter the second method, and run its code until the first . But it can happen that the event of the second query occurs before the request on the first method, so the event loop can re-enter the second method because it has already sent the query, but the first method isn't done sending the request yet. Then, the second method 's for an answer, and an event occurs telling the event loop that the request from the first method was sent. The code can be resumed again, until it has to for a response, and so on. Here's an explanation with pseudo-code for this process if you prefer: This is what the event loop will do on the above pseudo-code: You may be wondering "okay, but threads work for me, so why should I change?". There are some important things to note here. The first is that we only need one thread to be running! The event loop decides when and which methods should run. This results in less pressure for the operating system. The second is that we know when it may run other methods. Those are the keywords! Whenever there is one of those, we know that the loop is able to run other things until the resource (again, like network) becomes ready (when a event occurs telling us it's ready to be used without blocking or it has completed). So far, we already have two advantages. We are only using a single thread so the cost for switching between methods is low, and we can easily reason about where our program may interleave operations. Another advantage is that, with the event loop, you can easily schedule when a piece of code should run, such as using the method , without the need for spawning another thread at all. To tell the to run the two methods shown above, we can use , which is a way of saying "I want the future of my method to be ensured". That is, you want to run your method in the future, whenever the loop is free to do so. This method returns a object, so if your method returns a value, you can this future to retrieve its result. What is a ? This object represents the value of something that will be there in the future, but might not be there yet. Just like you can your own functions, you can these 's. The functions are also called "coroutines", and Python does some magic behind the scenes to turn them into such. The coroutines can be 'ed, and this is what you normally do. That's all about ! Let's wrap up with some example code. We will create a server that replies with the text a client sends, but reversed. First, we will show what you could write with normal synchronous code, and then we will port it. Here is the synchronous version : From what we've seen, this code will block on all the lines with a comment above them saying that they will block. This means that for running more than one client or server, or both in the same file, you will need threads. But we can do better, we can rewrite it into ! The first step is to mark all your initions that may block with . This marks them as coroutines, which can be ed on. Second, since we're using low-level sockets, we need to make use of the methods that provides directly. If this was a third-party library, this would be just like using their initions. Here is the asynchronous version : That's it! You can place these two files separately and run, first the server, then the client. You should see output in the client. The big difference here is that you can easily modify the code to run more than one server or clients at the same time. Whenever you the event loop will run other of your code. It seems to "block" on the parts, but remember it's actually jumping to run more code, and the event loop will get back to you whenever it can. In short, you need an to things, and you run them with the event loop instead of calling them directly. So this… …becomes this: This is pretty much how most of your scripts will start, running the main method until its completion. Let's have some fun with a real library. We'll be using Telethon to broadcast a message to our three best friends, all at the same time, thanks to the magic of . We'll dive right into the code, and then I'll explain our new friend : Wait… how did that send a message to all three of my friends? The magic is done here: This list comprehension creates another list with three coroutines, the three . Then we just pass that list to : This method, by default, waits for the list of coroutines to run until they've all finished. You can read more on the Python documentation . Truly a good function to know about! Now whenever you have some important news for your friends, you can simply to tell all your friends about your new car! All you need to remember is that you need to on coroutines, and you will be good. will warn you when you forget to do so. If you want to understand how works under the hood, I recommend you to watch this hour-long talk Get to grips with asyncio in Python 3 by Robert Smallshire. In the video, they will explain the differences between concurrency and parallelism, along with others concepts, and how to implement your own "scheduler" from scratch. Input / Output A Toy Example A Real Example Extra Material