Jay Taylor's notes

Analysis of large binaries and games in Ghidra-SRE | Hacker News

Original source (news.ycombinator.com)
Tags: reverse-engineering binaries ghidra news.ycombinator.com
Clipped on: 2021-07-28

I recently started analysing a game with Ghidra. I found the plugin mentioned was actually fairly useless in my case (but this was version 9, maybe things have changed). Instead I used OOAnalyzer [1] and its associated Ghidra plugin, which is great for bootstrapping a class hierarchy, something that is otherwise extremely tedious to do. It took 24 hours and ~100GB of memory, so I ran it on AWS, splitting the analysis into parts according to [2]. I'd also recommend looking into scripting; it can save you a lot of time in repetitive scenarios.

[1] https://github.com/cmu-sei/pharos/blob/master/tools/ooanalyz...
[2] https://github.com/cmu-sei/pharos/blob/master/share/prolog/o...

I've also found Ghidra nearly useless for most things. IDA seems to blow it out of the water.

I've also had significantly more success with OOAnalyzer but as you say, it's dog slow and just consumes an unfathomable amount of memory. I had a few binaries it just completely choked on regardless of part size.

Finally, while Ghidra is pretty cool in theory, in practice it's quite brittle and rough around the edges. I've had projects get corrupted, and analyses that hang indefinitely with no diagnostic information explaining why. And it performs about the same on a 4-core host as on a 64-core host, which deeply saddens me, as almost nothing is multithreaded.

That’s another reason I looked into scripting: fear of losing many hours of work through corruption. My last workflow meant writing scripts to define everything so I _always_ started from a blank slate. Naturally this rots over time as the API changes, but I feel much more confident I won’t lose work completely. Not a glowing review of Ghidra though :)
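The "always start from a blank slate" workflow described above can be sketched roughly as follows. This is only an illustration, not the commenter's actual scripts: the addresses, the names, and the `RecordingBackend` class are all hypothetical, and inside Ghidra the two backend methods would wrap whatever scripting API calls the installed version provides. Keeping the annotations as plain data and the tool-specific calls in one small class is one way to limit how much rots when the API changes.

```python
# Hypothetical annotations: these addresses and names are invented for illustration.
ANNOTATIONS = [
    {"kind": "function", "address": 0x401000, "name": "Game_Update"},
    {"kind": "function", "address": 0x401200, "name": "Game_Draw"},
    {"kind": "label", "address": 0x5A2F10, "name": "g_player_health"},
]


class RecordingBackend:
    """Stand-in for the disassembler (assumption, not a real tool API).

    A real script would call the tool's scripting API inside these
    two methods instead of writing to dictionaries.
    """

    def __init__(self):
        self.functions = {}
        self.labels = {}

    def name_function(self, address, name):
        self.functions[address] = name

    def name_label(self, address, name):
        self.labels[address] = name


def apply_annotations(annotations, backend):
    """Replay every annotation against a fresh project; return how many were applied."""
    applied = 0
    for item in annotations:
        if item["kind"] == "function":
            backend.name_function(item["address"], item["name"])
        elif item["kind"] == "label":
            backend.name_label(item["address"], item["name"])
        else:
            raise ValueError("unknown annotation kind: %r" % item["kind"])
        applied += 1
    return applied


backend = RecordingBackend()
applied_count = apply_annotations(ANNOTATIONS, backend)
```

Because the list is ordinary text under version control, a corrupted project only costs the time needed to re-import the binary and replay the list.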

Kind of expected a writeup of findings from an example, but it just looks like a how-to article.

And the biggest takeaway to me was that the software is still heavily bugged, especially the UI.

Ghidra is really neat, so a couple times I've tried to do reversing of small games through it to practice.

Turns out stuff is really hard, despite Ghidra doing a bunch of work to try and make things super easy. Even for really small old indie Windows games, I think you really have to have a good idea of what kind of tools the developers were using to get anywhere. That or get really lucky in a string search :)

For games, I find that it is much easier to dynamically analyze them while they are running to get an idea of where you should be looking. Use something like Cheat Engine to get the address of health, and use the built-in debugger to find what writes to that health address. Then you can get the addresses of the instructions that write to the health and go back to something like Ghidra and start reverse engineering from there. But for small indie games, the disassembler in Cheat Engine is actually quite good, depending on what you are trying to do.
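The "get the address of health" step is essentially a repeated value scan: search memory for the current value, change the value in-game, then keep only the addresses that now hold the new value. Below is a minimal sketch of that narrowing idea on fake memory snapshots. It does not use Cheat Engine or read a real process; the base address, the snapshot layout, and the assumption that health is a 4-byte-aligned little-endian int32 are all made up for illustration.

```python
import struct

BASE = 0x00400000  # hypothetical base address of the scanned region


def scan_for_value(memory, value, base=BASE):
    """Return every 4-byte-aligned address holding `value` as a little-endian int32."""
    needle = struct.pack("<i", value)
    hits = set()
    for offset in range(0, len(memory) - 3, 4):
        if memory[offset:offset + 4] == needle:
            hits.add(base + offset)
    return hits


def narrow(candidates, memory, value, base=BASE):
    """Keep only the earlier candidates that hold `value` in a newer snapshot."""
    return candidates & scan_for_value(memory, value, base)


# Fake snapshots: [max_health, health, ammo, lives] as four int32 values.
before_hit = struct.pack("<4i", 100, 100, 30, 7)
after_hit = struct.pack("<4i", 100, 75, 30, 7)

first_pass = scan_for_value(before_hit, 100)      # two addresses hold 100
health_addrs = narrow(first_pass, after_hit, 75)  # only one changed to 75
```

The first scan is ambiguous because the maximum health also equals 100; the second scan after taking damage leaves a single candidate, which is the address you would then watch for writes in a debugger.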

It helps to understand how C/C++ memory management works and how things like arrays and pointers normally get compiled down by the compiler. Having a general idea of how indie games are typically made (a loop which reads input -> updates game state -> draws to the screen) helps as well. In the case of finding what writes to the HP, you'll end up in a function that most likely updates game state, so you can then just walk up the call chain to get to the main loop.
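As a toy illustration of that loop shape and of "walking up" from the health write, here is a small sketch. Every function name is hypothetical, and `inspect.stack` merely stands in for the call stack a debugger would show when a write breakpoint on the health address fires.

```python
import inspect

call_chains = []  # one recorded caller chain per health write


def write_health(state, new_value):
    # Stand-in for the instruction that writes HP. Record who called us,
    # which is what a "find what writes to this address" breakpoint reveals.
    chain = [frame.function for frame in inspect.stack(0)[1:]]
    call_chains.append(chain)
    state["health"] = new_value


def apply_damage(state, amount):
    write_health(state, state["health"] - amount)


def read_input(frame_no):
    # Pretend the player gets hit on the second frame only.
    return "hit" if frame_no == 1 else "idle"


def update_state(state, event):
    if event == "hit":
        apply_damage(state, 25)


def draw(state):
    return "HP: %d" % state["health"]


def main_loop(frames):
    state = {"health": 100}
    rendered = []
    for frame_no in range(frames):
        event = read_input(frame_no)     # reads input
        update_state(state, event)       # updates game state
        rendered.append(draw(state))     # draws to the screen
    return state, rendered


final_state, screens = main_loop(3)
```

After the run, the start of `call_chains[0]` reads `apply_damage`, `update_state`, `main_loop`: the same path you would follow upward in a disassembler, from the write to the state update to the main loop.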

I spent a lot of my youth reverse engineering a small indie game. It got to the point where, as a community, we had reverse engineered most of the in-game objects into C header files. Then, with the help of those header files, we made custom AI possible. We used a DLL to detour the normal AI function to our own function, and in that function we loaded AngelScript files that had access to the original game objects so they could control them. Ghidra didn't exist back then, IDA was too expensive, and the Cheat Engine disassembler was bad, so we reverse engineered everything in OllyDbg.

This was of course for a tiny indie game with a custom engine. I imagine nowadays, with everything being made in Unity/Unreal etc., it's much harder to reverse engineer. Also x86 calling conventions were nicer than x64 imo ;).

It's more like you need to have cut your teeth before on SW development and RE since you need a good gut feeling of what bundle of assembly lines/C code correspond to what kind of functionality the developer had in mind.

Trying to RE production SW without any prior development experience is brutally hard.

Even then, I would add. Let's say you have the code to something. Just figuring out what goes where and what does what takes time. Then each dev/compiler has its own 'style'. It is a process that takes time. Sometimes you can get lucky and pull on a thread and the whole sweater comes apart. But finding that one thread to tug on... Then, on top of that, if they turned on optimizations, the code may look 'odd', since the optimizers in many compilers can do some very strange things.

I am trying to RE a 16-bit Windows game. Ghidra gets lost a lot on the exe to system dll paths for me. For 32-bit it seems fine though.

>> you need to have cut your teeth before on SW development

I remember this back when people wanted to learn cracks/keygens: If it was anything more than replacing a value with a hex editor, it was too complicated.

I started reverse engineering a game about 3 months ago with no prior experience. It is insanely difficult. I like to think that I have very good pattern recognition and investigative skills, I find most problems surmountable given enough time. But I'm barely making any progress and am tearing my hair out just to find tiny breadcrumbs of clues. Maybe this is one of those things where picking up a few books is imperative.

In software development, we share massive amounts of information, and there's always a premade tool out there that does what you need and will work the first time. This isn't my experience in the reverse engineering world. Information is sparse, seems to be kept private, and there's not always a tool that does what you want. Even if there is, good luck getting it to work.

I wanted to do the same a few years ago. I didn't even reach that point, as I was interested in doing 90s games. Pretty much every game I tried was compiled with Watcom, which Ghidra doesn't know, so it cannot apply any tricks when analyzing. Watcom even has its own calling convention, which Ghidra doesn't understand, so the pseudo-C code for most functions was almost useless. I had already given up at that point.

There are tons of docs on the subject of RCE. What about tuts4you or the Woodmann board?

The best way to do this is to code yourself some game and view its assembly in your preferred IDE while stepping through (step-by-step/trace into) to familiarize yourself with how its instructions are executed at the CPU level. Then do the same in Ghidra. If you keep doing this while coding the game to completion, after a while you'll be able to pick up different patterns in assembly fairly easily, which will make you more productive at reverse engineering.

I did this with a Tetris clone, then a Minesweeper clone, and then a multi-threaded "Shoot the ducks" clone, each duck being a thread. It will probably take you several months, but at the end you'll really be able to do a lot of RE.

A Copilot-like tool hooked in to guess the meaning of reverse engineered code would be awesome ;)

call it what-it-do

This is an amazing idea!

I've always wanted to learn how they encrypted Valkyrie Profile 2 for the PS2. I could never reverse it to get the audio of the English voice actors.

And there is a dub patch to apply the Japanese audio to it too, or was.

Now to find a copy...

Could anyone recommend some quality resources on RE?
