Decompilation notes

Benedikt Freisen edited this page Nov 24, 2023 · 3 revisions

This page is a collection of notes and ideas regarding the topic of static code analysis and decompilation, primarily with Ghidra in mind.

Getting started

The code base has grown quite a bit, and most of it is PC-specific, so the first step is to reduce it to the parts that are relevant for analysis and to undo certain optimizations:

  • Removal of unneeded video code
  • Removal of unneeded audio code
  • Expansion of the tcall tail call macro to call + ret rather than jmp
  • Addition of a wrapper function for map access, perhaps abstracted to a macro
  • More comprehensive documentation of calling conventions, either beforehand or during static analysis
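The tail call item could be handled by changing the macro definition itself. As an alternative, here is a minimal sketch of a source-level rewrite. It assumes that each tail call is written as `tcall <target>` on a line of its own; that syntax and the function name `expand_tail_calls` are assumptions for illustration, not taken from the code base.

```python
import re

# Assumption: a tail call looks like "<indent>tcall <target> [; comment]".
TCALL = re.compile(r"^(\s*)tcall\s+(\S+)(.*)$")

def expand_tail_calls(lines):
    """Replace each 'tcall target' line with 'call target' followed by 'ret'."""
    out = []
    for line in lines:
        m = TCALL.match(line)
        if m is None:
            out.append(line)  # not a tail call: keep the line unchanged
            continue
        indent, target, rest = m.groups()
        out.append(f"{indent}call {target}{rest}")  # keep any trailing comment
        out.append(f"{indent}ret")
    return out

if __name__ == "__main__":
    sample = ["    tcall draw_map ; tail call", "    mov ax, bx"]
    print("\n".join(expand_tail_calls(sample)))
```

The rewritten code uses a little more stack than the jmp form, but every function then ends in an explicit ret, which should make function boundaries easier for automatic analysis to detect.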

All this should ideally happen in the static-analysis branch created for this purpose.

Decompilation using Ghidra

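Ghidra is a powerful disassembler with a built-in decompiler, but it works on binaries and cannot decompile assembler source code directly. The closest we can get is symbolic disassembly, i.e. analysing the assembled binary with the symbol names from the build imported.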

The following steps would be needed:

  • Prepare the YASM-generated symbol map for use in Ghidra (see below)
  • Use the script ImportSymbolsScript.py from Ghidra's script manager to import the symbols
  • Manually add all the data types and correct errors caused by automatic code detection
  • Find the best decompiler settings
  • Manually add appropriate calling conventions for all the functions
  • Export the resulting C code and post-process it with a bunch of regular expressions
  • Manually turn the result into a proper (and properly documented) C port of the engine, ideally while viewing C and assembler code side by side
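The regular expression post-processing step might start out like the following sketch. It only replaces Ghidra's sized placeholder types (`undefined1`, `undefined2`, ...) with fixed-width types from `<stdint.h>`; the choice of unsigned types and the function name `postprocess` are assumptions, and a real clean-up pass would need many more rules.

```python
import re

# Assumed mapping from Ghidra's sized placeholder types to <stdint.h> types.
TYPE_MAP = {
    "undefined1": "uint8_t",
    "undefined2": "uint16_t",
    "undefined4": "uint32_t",
    "undefined8": "uint64_t",
}

# Match the placeholder names only as whole words.
TYPE_RE = re.compile(r"\b(" + "|".join(map(re.escape, TYPE_MAP)) + r")\b")

def postprocess(c_source):
    """Return c_source with Ghidra placeholder types replaced."""
    return TYPE_RE.sub(lambda m: TYPE_MAP[m.group(1)], c_source)

if __name__ == "__main__":
    print(postprocess("undefined4 FUN_00001000(undefined2 param_1)"))
```

Keeping the rules in a table makes it easy to add further substitutions as recurring patterns show up in the exported code.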

This chain of sed commands can turn YASM's symbol map output into something that Ghidra's importer script understands:

sed -nE '/Real +Virtual +Name/,$p' build/map.txt | sed -nE '2,$ s/([0-9A-F]+) +[0-9A-F]+ +([^ ]+)/\2 \1/ p' | sed -E 's/^([A-Z0-9_]+[ .])/\L\1/' | sed -E 's/(\.[A-Z0-9_]+ )/\L\1/' > px3_ghidra_sym.txt

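The first sed invocation discards everything before the symbol table header, and the second drops the header line and rewrites each entry as the symbol name followed by its real address. The two rightmost sed invocations harmonize upper-case identifiers to lower case. Note that the \L case conversion is a GNU sed extension.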
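For systems without GNU sed, the same conversion can be sketched in Python. The regular expressions below are carried over from the sed commands; the function name `map_to_ghidra` and the sample lines in the usage example are made up for illustration and are not real map output.

```python
import re

HEADER = re.compile(r"Real +Virtual +Name")
ENTRY = re.compile(r"([0-9A-F]+) +[0-9A-F]+ +([^ ]+)")
GLOBAL_PART = re.compile(r"^([A-Z0-9_]+[ .])")
LOCAL_PART = re.compile(r"(\.[A-Z0-9_]+ )")

def lower(match):
    return match.group(1).lower()

def map_to_ghidra(lines):
    """Convert YASM map lines to 'name address' lines."""
    out = []
    in_table = False
    for line in lines:
        line = line.rstrip("\n")
        if not in_table:
            # Skip everything up to and including the table header.
            in_table = HEADER.search(line) is not None
            continue
        # Swap to "name real_address"; the virtual address is dropped.
        swapped, count = ENTRY.subn(r"\2 \1", line, count=1)
        if count == 0:
            continue  # not a symbol entry
        # Lower-case all-caps global and local label parts.
        swapped = GLOBAL_PART.sub(lower, swapped, count=1)
        swapped = LOCAL_PART.sub(lower, swapped, count=1)
        out.append(swapped)
    return out

if __name__ == "__main__":
    sample = [
        "- preamble",
        "Real      Virtual   Name",
        "00000100  00000100  MAIN_LOOP",
        "00000120  00000120  MAIN_LOOP.NEXT_ITEM",
    ]
    print("\n".join(map_to_ghidra(sample)))
```

To process the real file, pass the lines of build/map.txt to `map_to_ghidra` and write the result to px3_ghidra_sym.txt.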

Actual porting

As of November 2023, the game logic has been ported to compilable C code and, using stubs for everything else, an executable can be created.

To be continued...
