Aletheia is a binary analysis and reverse engineering tool built entirely in Rust. At its core sits a decompiler that transforms raw machine code into readable pseudo-C through a five-phase pipeline: loading, disassembly and lifting, SSA conversion, type recovery, and structured emission. This article walks through each phase in detail, with particular attention to the algorithms and design decisions that make the pipeline work on real-world binaries without crashing, hanging, or producing garbage output.
The entire codebase enforces #![deny(unsafe_code)] at the workspace level. Every data structure, every graph algorithm, every fixed-point loop is written in safe Rust. This is not an academic exercise in purity — it is a pragmatic choice. Decompilers process adversarial input by definition. A use-after-free in your decompiler while analyzing a use-after-free in a target binary is not a good look.
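As a minimal illustration of that posture — the attribute below is the standard Rust lint; workspace-wide enforcement would typically go through Cargo's `[workspace.lints]` table:

```rust
// Crate-level lint: any `unsafe` block in this crate is a compile error.
#![deny(unsafe_code)]

fn main() {
    // Uncommenting the line below fails compilation under the lint:
    // unsafe { std::hint::unreachable_unchecked() };
    println!("no unsafe code reaches the binary");
}
```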
The IR Design
Before covering the pipeline, it is worth understanding the intermediate representation everything operates on. Aletheia's IR is a flat, typed, register-transfer language with explicit flag modeling. It draws heavily on Van Emmerik's 2007 thesis on SSA-based decompilation and on the Varnode concept from Ghidra's p-code, but departs from p-code in several ways — most notably in how flags are handled.
The fundamental unit of storage in the IR is a location descriptor that encodes an address space, an offset within that space, and a size in bytes. The IR distinguishes five address spaces: registers, memory (RAM), constants (immediates), temporaries (lifter scratch space), and stack-relative slots. Each IR operation carries the address of the original machine instruction for source mapping, an opcode, an optional output location, and a list of input locations.
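A minimal Rust sketch of such a descriptor and operation — the names and exact shapes here are illustrative, not Aletheia's actual definitions:

```rust
/// The five address spaces the IR distinguishes (illustrative names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Space {
    Register,
    Ram,
    Constant,
    Temp,
    Stack,
}

/// A location descriptor: address space + offset + size in bytes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Loc {
    space: Space,
    offset: u64,
    size: u8,
}

/// A few of the opcodes, enough for the sketch to stand alone.
#[derive(Clone, Copy, Debug)]
enum Opcode {
    Copy,
    Load,
    Store,
    IntAdd,
}

/// One IR operation, tagged with the machine address it was lifted from.
struct Op {
    addr: u64,           // original instruction address, for source mapping
    opcode: Opcode,
    output: Option<Loc>, // most operations define a single location
    inputs: Vec<Loc>,
}
```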
The IR defines 42 opcodes across the categories below (SIMD and x87 FPU instructions are additionally lifted into existing arithmetic/extension primitives):
| Category | Opcodes | Count |
|---|---|---|
| Data movement | Copy, Load, Store | 3 |
| Arithmetic | IntAdd, IntSub, IntMul, IntDiv, IntSdiv, IntRem, IntSrem | 7 |
| Logical / bitwise | IntAnd, IntOr, IntXor, IntNot, IntLeft, IntRight, IntSright, IntNegate | 8 |
| Comparison | IntEqual, IntNotEqual, IntLess, IntLessEqual, IntSless, IntSlessEqual | 6 |
| Control flow | Branch, Cbranch, Call, Return, BranchInd, CallInd | 6 |
| Multi-register | Piece, Subpiece | 2 |
| Extension | IntZext, IntSext | 2 |
| Floating-point | FloatAdd, FloatSub, FloatMul, FloatDiv, FloatNeg, FloatSqrt, FloatConvert | 7 |
| Marker | Unimplemented | 1 |
A critical design choice: flags are modeled as explicit IR operations on dedicated flag registers. An x86 add eax, ebx does not implicitly set ZF, SF, CF, and OF — the lifter emits separate IntEqual, IntSless, and carry-detection operations targeting dedicated flag locations. This eliminates the flag-liveness ambiguity that plagues decompilers with implicit flag semantics. When SSA conversion renames variables, flags participate in the same rename pass as general-purpose registers, and dead flag computations are trivially eliminated during dead store removal.
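To make this concrete, here is roughly how `add eax, ebx` might expand with explicit flag updates — a hypothetical lifter fragment with simplified stand-in types, not Aletheia's actual code. CF uses the unsigned-wraparound identity (the wrapped sum is below either addend exactly when a carry occurred); OF is omitted for brevity.

```rust
#[derive(Clone, Copy)]
struct Loc { offset: u64, size: u8 } // address space elided for brevity
enum Opcode { IntAdd, IntEqual, IntSless, IntLess }
struct Op { opcode: Opcode, output: Loc, inputs: Vec<Loc> }

/// Sketch: lift `add eax, ebx` into explicit IR, flags included.
fn lift_add_eax_ebx(eax: Loc, ebx: Loc, zf: Loc, sf: Loc, cf: Loc) -> Vec<Op> {
    let zero = Loc { offset: 0, size: 4 }; // constant-space zero
    vec![
        // eax = eax + ebx
        Op { opcode: Opcode::IntAdd, output: eax, inputs: vec![eax, ebx] },
        // ZF = (eax == 0)
        Op { opcode: Opcode::IntEqual, output: zf, inputs: vec![eax, zero] },
        // SF = (eax <s 0)
        Op { opcode: Opcode::IntSless, output: sf, inputs: vec![eax, zero] },
        // CF = (eax <u ebx): the wrapped sum is smaller than an addend
        Op { opcode: Opcode::IntLess, output: cf, inputs: vec![eax, ebx] },
    ]
}
```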
Phase 1: Loading
The loader parses PE, ELF, and Mach-O binaries using the goblin crate. It extracts sections, symbols (if present), imports, exports, and relocation entries into a unified program representation that downstream phases consume without knowing the original format.
Two details matter here. First, a hard 512 MB cap on input size prevents memory exhaustion from crafted inputs. Second, the entire parse is wrapped in std::panic::catch_unwind. Goblin is battle-tested, but decompilers see adversarial binaries — files specifically crafted to break analysis tools. A panic in the parser should produce a clean error, not a crashed process. This pattern recurs throughout the pipeline: every boundary where external data enters gets defensive wrapping.
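The shape of that pattern, sketched with placeholder types (`parse_object` stands in for the real goblin-based parse; error handling simplified):

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

const MAX_INPUT_BYTES: usize = 512 * 1024 * 1024; // hard cap on input size

struct Program; // stand-in for the unified program representation

fn parse_object(_bytes: &[u8]) -> Result<Program, String> {
    // The real loader parses PE/ELF/Mach-O via goblin here and collects
    // sections, symbols, imports, exports, and relocations.
    Ok(Program)
}

/// A panic anywhere inside the parse becomes a clean `Err`, not a crash.
fn load(bytes: &[u8]) -> Result<Program, String> {
    if bytes.len() > MAX_INPUT_BYTES {
        return Err("input exceeds 512 MB cap".into());
    }
    catch_unwind(AssertUnwindSafe(|| parse_object(bytes)))
        .unwrap_or_else(|_| Err("parser panicked on malformed input".into()))
}
```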
Phase 2: Disassembly and Lifting
The architecture layer defines a trait with two key methods: one decodes raw bytes into machine instructions, and the other translates each instruction into a sequence of IR operations.
Four architectures are supported:
| Architecture | Decoder | Binding |
|---|---|---|
| x86-64 | iced-x86 | Native Rust |
| x86-32 | iced-x86 | Native Rust |
| AArch64 | capstone | C FFI |
| ARM32 | capstone | C FFI |
Architecture dispatch happens once at the start of analysis, matching on the binary's architecture and returning a trait object. Everything above the lifter — SSA conversion, type recovery, structuring — operates on IR operations and blocks with no knowledge of the source architecture. This is the payoff of a well-designed IR: a substantial body of architecture-specific lifting code feeds into a single architecture-independent pipeline.
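A sketch of that trait boundary and dispatch — all names here are assumptions; the real impls would wrap iced-x86 and capstone as in the table above:

```rust
struct Instruction { addr: u64, len: usize } // placeholder decoded form
struct IrOp; // placeholder IR operation

/// The architecture abstraction: decode bytes, then lift to IR.
trait Arch {
    fn decode(&self, bytes: &[u8], addr: u64) -> Option<Instruction>;
    fn lift(&self, insn: &Instruction) -> Vec<IrOp>;
}

struct X86_64Lifter; // would wrap iced-x86 in the real pipeline

impl Arch for X86_64Lifter {
    fn decode(&self, _bytes: &[u8], addr: u64) -> Option<Instruction> {
        Some(Instruction { addr, len: 1 }) // placeholder decode
    }
    fn lift(&self, _insn: &Instruction) -> Vec<IrOp> {
        Vec::new() // placeholder lift
    }
}

enum ArchKind { X86_64, X86_32, AArch64, Arm32 }

/// Dispatch once; everything downstream sees only `dyn Arch`.
fn select_arch(kind: ArchKind) -> Box<dyn Arch> {
    match kind {
        ArchKind::X86_64 | ArchKind::X86_32 => Box::new(X86_64Lifter),
        // AArch64 / ARM32 impls (capstone-backed) elided from this sketch.
        _ => unimplemented!("remaining lifters elided"),
    }
}
```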
The x86-64 Lifter
The x86-64 lifter is the most complex, organized into tiered instruction categories. A lifting context tracks temporary variable allocation and x87 FPU stack state for legacy floating-point code. Each x86 instruction expands to between 1 and roughly 20 IR operations, depending on flag behavior and addressing modes. A simple mov rax, rbx is a single copy; an imul with flags generates arithmetic operations, zero-extension, and multiple flag computations.
Phase 3: SSA Conversion
SSA conversion takes a control-flow graph and its IR blocks, and produces a representation where every variable has a unique definition point. The conversion follows Cytron et al. (1991) and proceeds in three sub-phases: dominator tree construction, phi insertion, and variable renaming.
Cooper-Harvey-Kennedy Dominator Tree
The dominator tree is computed using the Cooper-Harvey-Kennedy iterative algorithm. CHK was chosen over Lengauer-Tarjan for its simplicity and the fact that its performance is effectively linear on the reducible CFGs that compilers produce — the theoretical O(n^2) worst case does not manifest in practice.
The algorithm begins with a depth-first traversal to assign postorder numbers. This DFS is implemented with an explicit stack rather than recursion:
// Iterative DFS for postorder numbering (pseudocode)
function compute_postorder(cfg, entry):
stack = [(entry, 0)] // (node, successor_index)
visited = {}
postorder = []
while stack is not empty:
(node, succ_idx) = stack.top()
visited.add(node)
if succ_idx < len(cfg.successors[node]):
next = cfg.successors[node][succ_idx]
stack.top().succ_idx += 1
if next not in visited:
stack.push((next, 0))
else:
postorder.append(stack.pop().node)
    // postorder[i] has postorder number i; invert for O(1) lookup
    postorder_number = { postorder[i]: i for i in 0..len(postorder) }
    return postorder, postorder_number
This is a recurring pattern in Aletheia: every graph traversal uses an explicit stack. Recursive DFS on a function with 50,000 basic blocks will overflow the default 8 MB thread stack. This is not hypothetical — obfuscated binaries routinely produce deeply nested CFGs. Iterative traversal with a heap-allocated Vec has no depth limit.
With postorder numbers assigned, the algorithm iterates in reverse postorder. For each block, it computes the immediate dominator by intersecting the dominator chains of all predecessors:
// CHK intersect operation (pseudocode)
function dom_intersect(idom, postorder_number, a, b):
while a != b:
while postorder_number[a] < postorder_number[b]:
a = idom[a]
while postorder_number[b] < postorder_number[a]:
b = idom[b]
return a
This walks both fingers up the dominator tree until they meet, using postorder numbers to determine which finger is "lower." The outer loop repeats over all blocks until no immediate dominator entry changes — the fixed point. The resulting dominator tree stores the immediate dominator of each block plus precomputed children lists for efficient subtree traversal.
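For concreteness, here is a compact Rust rendering of that outer loop — block IDs are `usize` indices, `rpo` is reverse postorder with the entry first, and `po_num` maps a block to its postorder number (all names assumed):

```rust
/// Cooper-Harvey-Kennedy: iterate to a fixed point over reverse postorder.
fn compute_idoms(rpo: &[usize], preds: &[Vec<usize>], po_num: &[usize]) -> Vec<Option<usize>> {
    let mut idom: Vec<Option<usize>> = vec![None; preds.len()];
    idom[rpo[0]] = Some(rpo[0]); // the entry dominates itself
    let mut changed = true;
    while changed {
        changed = false;
        for &b in &rpo[1..] {
            // Start from any predecessor that already has an idom.
            let mut new_idom = match preds[b].iter().find(|&&p| idom[p].is_some()) {
                Some(&p) => p,
                None => continue, // no processed predecessor yet
            };
            for &p in &preds[b] {
                if p != new_idom && idom[p].is_some() {
                    new_idom = intersect(&idom, po_num, p, new_idom);
                }
            }
            if idom[b] != Some(new_idom) {
                idom[b] = Some(new_idom);
                changed = true;
            }
        }
    }
    idom
}

/// Walk both fingers up the tree until they meet (the pseudocode above).
fn intersect(idom: &[Option<usize>], po_num: &[usize], mut a: usize, mut b: usize) -> usize {
    while a != b {
        while po_num[a] < po_num[b] { a = idom[a].unwrap(); }
        while po_num[b] < po_num[a] { b = idom[b].unwrap(); }
    }
    a
}
```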
The same module also computes post-dominators by reversing the CFG edges and introducing a virtual exit node that collects all return blocks. Post-dominators drive if-then-else merge point detection during the structuring phase.
Phi Insertion: Cytron's Algorithm
Phi functions are inserted using the iterated dominance frontier, following Cytron et al. The implementation proceeds in three phases:
1. Variable collection. The first pass scans all blocks for SSA-trackable locations. A location is trackable if it lives in the register, temporary, or stack address space. Constants are immutable by definition, and memory-space locations are not renamed — they go through load/store operations instead. This distinction matters: SSA only tracks register-like storage, while memory remains in its pre-SSA form and is handled by alias analysis during type recovery.
2. Phi placement. The algorithm computes dominance frontiers from the dominator tree and uses them to determine where phi functions are needed. For each variable, a worklist is seeded with every block that contains a definition. The worklist propagates through the dominance frontier:
// Cytron's iterated dominance frontier phi insertion (pseudocode)
function insert_phi_functions(variables, def_sites, dom_frontiers, cfg):
phis = {}
for var in variables:
worklist = list(def_sites[var])
placed = {}
while worklist is not empty:
block = worklist.pop()
for df_block in dom_frontiers[block]:
if df_block not in placed:
placed.add(df_block)
// Insert phi for var at df_block
pred_count = len(cfg.predecessors[df_block])
phis[df_block].append(new_phi(var, pred_count))
// If df_block didn't already define var,
// it now does (via the phi), so propagate
if df_block not in def_sites[var]:
worklist.push(df_block)
return phis
The placed set prevents duplicate phi insertion. The key insight of the iterated dominance frontier is that a phi function itself constitutes a definition, which may trigger further phi placement in its own dominance frontier — hence the worklist loop.
3. Variable renaming. The final phase walks the dominator tree, assigning version numbers to each variable. This is where the textbook algorithm typically uses recursion — descend into dominated children, then unwind the version stacks on return. Aletheia replaces this with an iterative dominator tree walk using an explicit stack and a cleanup-frame pattern:
// Iterative SSA rename with cleanup frames (pseudocode)
function rename_variables(ssa_func, dom_tree):
stacks = {} // variable -> stack of version numbers
counters = {} // variable -> next version counter
work = [Process(entry)]
while work is not empty:
action = work.pop()
if action is Process(block):
pushed = []
// (a) Fresh versions for phi targets
for phi in ssa_func.blocks[block].phis:
ver = next_version(counters, phi.var)
stacks[phi.var].push(ver)
pushed.append((phi.var, ver))
phi.target_version = ver
// (b) Rename op inputs from stack tops, define outputs
for op in ssa_func.blocks[block].ops:
for input in op.inputs:
if input is a variable:
input.version = stacks[input.base].top()
if op has output:
ver = next_version(counters, op.output.base)
stacks[op.output.base].push(ver)
pushed.append((op.output.base, ver))
op.output.version = ver
// (c) Fill phi sources for successors
for succ in cfg.successors[block]:
pred_idx = index_of(block, cfg.predecessors[succ])
for phi in ssa_func.blocks[succ].phis:
phi.sources[pred_idx] = stacks[phi.var].top()
// (d) Push cleanup frame, then children (reverse order)
work.push(Cleanup(block, pushed))
for child in reversed(dom_tree.children[block]):
work.push(Process(child))
else if action is Cleanup(block, pushed):
// Unwind version stacks
for (var, _) in pushed:
stacks[var].pop()
The cleanup frame is pushed before the children, so it executes after all children have been processed — mimicking the unwinding that happens naturally in a recursive implementation. This pattern appears in every tree-walking algorithm in the codebase.
The resulting SSA function is an ordered map of blocks where every variable reference carries a unique version number paired with its base location. Phi functions carry a versioned target and a list of versioned sources, one per predecessor edge.
Phase 4: Type Recovery
Type inference is arguably the hardest problem in decompilation. Machine code has no types — everything is bytes. The type recovery system operates in three layers: constraint generation, lattice solving, and type materialization.
Constraint Generation
The constraint generator walks every SSA operation and emits typed constraints. Type variables are either tied to a specific SSA variable or are freshly introduced when an operation implies the existence of a type not directly observable — most commonly, the pointee type of a pointer.
The constraint vocabulary:
| Constraint | Meaning |
|---|---|
| `Equal(a, b)` | Variables `a` and `b` must have the same type |
| `HasSize(a, n)` | Variable `a` has width `n` bytes |
| `IsPointer(a, pointee)` | `a` is a pointer to type variable `pointee` |
| `IsInteger(a)` | `a` is an integer type |
| `IsSigned(a)` / `IsUnsigned(a)` | Signedness evidence for `a` |
| `IsFloat(a)` | `a` is a floating-point type |
| `IsBool(a)` | `a` is boolean |
| `PointerArith(ptr, offset, result)` | `result = ptr + offset` (for struct field recovery) |
| `PhiJoin(target, sources...)` | Phi target unifies with all source types |
| `IsUnionMember(a, union_ty)` | `a` is a member of a union type |
| `IsEnumValue(a, enum_ty)` | `a` is a value of an enum type |
The constraint generation rules are operation-specific and encode domain knowledge about what operations imply about their operands' types. Some representative rules:
- Load: If `output = Load(addr)`, emit `IsPointer(addr, fresh_pointee)` and `Equal(output, fresh_pointee)`. The address must be a pointer, and whatever it points to has the same type as the loaded value.
- IntAdd: Check for pointer arithmetic patterns first. If one operand is already known to be a pointer, emit `PointerArith(ptr, offset, result)`. Otherwise, emit `Equal(a, b)` and `Equal(a, result)` as a fallback — both operands and the result share a type.
- IntSdiv / IntSright: Emit `IsSigned(a)` and `IsSigned(b)`. Signed division and signed right-shift are the strongest evidence of signedness in machine code.
- IntLess vs. IntSless: `IntLess` emits `IsUnsigned` on both operands; `IntSless` emits `IsSigned`. The IR distinguishes signed and unsigned comparisons at lift time, so the type system inherits that distinction.
- Comparisons (all): Emit `IsBool(output)`. Comparison results are always boolean.
A subtle but important detail: constraints are deduplicated via a HashSet before solving. Without this, a loop body that executes the same operation type on the same variables would generate thousands of duplicate constraints, causing quadratic blowup in the solver.
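The dedup itself is a one-liner over any hashable constraint type — a sketch with a toy constraint enum:

```rust
use std::collections::HashSet;

// Toy constraint type; the real vocabulary is the table above.
#[derive(Clone, PartialEq, Eq, Hash)]
enum Constraint {
    Equal(u32, u32),
    IsSigned(u32),
    HasSize(u32, u8),
}

/// Keep the first occurrence of each constraint; O(1) per check.
fn dedup(generated: Vec<Constraint>) -> Vec<Constraint> {
    let mut seen = HashSet::new();
    generated.into_iter().filter(|c| seen.insert(c.clone())).collect()
}
```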
Lattice Solving
The solver uses a union-find data structure for type equivalence classes. When an Equal(a, b) constraint is processed, the equivalence classes of a and b are merged. Each equivalence class accumulates hints: size evidence, signedness evidence, kind evidence (pointer, integer, float, bool). Hints are combined using a lattice join:
// Type hint lattice join (pseudocode)
function lattice_join(existing, new):
size:
if one is unknown, take the other
if both agree, keep the value
on conflict, keep the first (conservative)
signedness:
if one is unknown, take the other
if both agree, keep the value
on conflict, signed wins (conservative — produces correct casts)
kind:
merge pointer/integer/float/bool evidence
pointee:
merge pointee equivalence classes
When signedness evidence conflicts (the same variable is used in both signed and unsigned contexts), the solver conservatively picks Signed. This produces correct casts in the output — a variable declared as int32_t that is later used in an unsigned comparison will get an explicit (uint32_t) cast, which is exactly what a human reverse engineer would want to see.
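A minimal union-find with hint merging, restricted to signedness for brevity — a sketch of the mechanism, not the actual solver:

```rust
#[derive(Clone, Copy, PartialEq)]
enum Sign { Unknown, Signed, Unsigned }

struct Classes {
    parent: Vec<usize>, // union-find forest
    sign: Vec<Sign>,    // per-class signedness hint
}

impl Classes {
    fn new(n: usize) -> Self {
        Classes { parent: (0..n).collect(), sign: vec![Sign::Unknown; n] }
    }

    /// Find with path halving.
    fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            self.parent[x] = self.parent[self.parent[x]];
            x = self.parent[x];
        }
        x
    }

    /// Process Equal(a, b): merge classes and lattice-join their hints.
    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb { return; }
        self.parent[rb] = ra;
        self.sign[ra] = match (self.sign[ra], self.sign[rb]) {
            (Sign::Unknown, s) | (s, Sign::Unknown) => s,
            (s1, s2) if s1 == s2 => s1,
            _ => Sign::Signed, // conflicting evidence: signed wins
        };
    }
}
```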
Pointer pointee types are propagated through a fixed-point iteration: if class A is known to be a pointer to class B, and class B acquires new type information (e.g., it is unified with a class known to be int32_t), then A's pointee type is updated and the change may propagate further.
The approach draws on several prior works: Noonan et al. (2016) "Retypd" for the constraint-based formulation, Lee et al. (2011) "TIE" for the lattice structure, and Mycroft (1999) for the foundational insight that type inference on machine code is a constraint satisfaction problem.
Recovered Types
After solving, each equivalence class materializes into a concrete type. The type system supports the standard C-like hierarchy: booleans, sized integers with signedness, floating-point types, pointers (with recursive pointee types), arrays, structs, unions, enums, and function pointers.
Struct types are recovered from pointer arithmetic constraints: if a pointer is consistently accessed at offsets 0, 8, and 16, and those accesses have types int32_t, char*, and uint64_t, the system infers a struct with three fields at those offsets. A deduplication pass ensures that two functions accessing the same layout produce a single named struct definition in the output.
Inter-Procedural Type Propagation
Local type inference within a single function can only go so far. A function that receives a pointer parameter and passes it straight to another function provides no local type evidence — but the callee might use it in a signed division, and the caller might pass a known int32_t*. Inter-procedural propagation bridges these gaps.
The inter-procedural analysis implements bidirectional propagation:
- Callee to caller: If the callee's analysis determines that parameter k has type T, then at every call site, the k-th argument variable inherits type T.
- Caller to callee: If the caller passes a variable of known type T as argument k, the callee's parameter k inherits type T.
The algorithm first runs SSA conversion and local type inference for all functions. It then sorts the call graph topologically (callees before callers for the first pass) and enters a fixed-point loop, capped at 10 iterations:
// Bidirectional inter-procedural type propagation (pseudocode)
function propagate_types(functions, call_graph):
topo_order = topological_sort(call_graph)
for round in 1..max_rounds:
changed = false
for func_id in topo_order:
for call_site in functions[func_id].call_sites:
callee = functions[call_site.target]
// Callee param types -> caller arg types
for (i, param_type) in callee.param_types:
if param_type is known:
if refine(functions[func_id], call_site.args[i], param_type):
changed = true
// Caller arg types -> callee param types
for (i, arg) in call_site.args:
arg_type = lookup_type(functions[func_id], arg)
if arg_type is known:
if refine_param(callee, i, arg_type):
changed = true
if not changed: break
The 10-round cap prevents infinite loops in the presence of recursive call graphs. In practice, convergence happens in 2-3 rounds for typical binaries. An important invariant: types recovered from debug information (DWARF, PDB) are never overwritten by inference — they serve as ground truth seeds that anchor the propagation.
Parameter register mapping is architecture-specific. The module knows that x86-64 System V passes the first six integer/pointer arguments in RDI, RSI, RDX, RCX, R8, R9; that AArch64 AAPCS uses X0-X7; and that ARM32 AAPCS uses R0-R3. These mappings tie SSA variables at function entry back to parameter positions.
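As a lookup table, roughly (the enum and function names are illustrative):

```rust
enum CallConv { X86_64SysV, AArch64Aapcs, Arm32Aapcs }

/// Integer/pointer parameter registers, in argument order.
fn int_param_regs(cc: CallConv) -> &'static [&'static str] {
    match cc {
        CallConv::X86_64SysV => &["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
        CallConv::AArch64Aapcs => &["x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7"],
        CallConv::Arm32Aapcs => &["r0", "r1", "r2", "r3"],
    }
}
```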
Phase 5: Structuring and Emission
The final phase transforms the SSA function into readable pseudo-C. This is a two-stage process: control-flow structuring and code emission.
Control-Flow Structuring
The structuring pass recovers high-level control flow from the CFG using the dominator and post-dominator trees. It identifies four patterns:
- If/else: A block with two successors where both successors' paths reconverge at the post-dominator of the conditional block. The post-dominator becomes the merge point.
- While loops: A back-edge in the dominator tree (a block that branches to one of its dominators) with the condition tested at the loop header.
- Do-while loops: Same back-edge pattern, but the condition is at the latch block (the source of the back-edge) rather than the header.
- Switch/case: A block with more than two successors (typically from an indirect branch through a jump table), structured as a switch statement with fall-through analysis.
Code Emission
The emitter walks the structured tree and produces pseudo-C text. Before emission, several optimization passes clean up the output:
- Dead variable elimination: SSA variables that are defined but never used (and whose defining operation has no side effects) are removed. This eliminates the flag computations that were explicitly modeled in the IR but are never consumed by a branch.
- Callee-save detection: Register saves and restores at function prologue/epilogue (e.g., `push rbx`/`pop rbx`) are identified and removed from the output. These are compiler artifacts, not program logic.
- Dead store elimination: Stores to stack slots that are never subsequently loaded are removed.
- Constant folding: Operations on two constants are evaluated at decompilation time.
- Variable renaming: SSA names like `var_RAX_3` are replaced with meaningful names where possible — loop counters become `i`, pointer dereference targets become `ptr`, and so on.
- String resolution: Constants that correspond to addresses of null-terminated strings in the binary's read-only data section are replaced with string literals.
- Struct definitions: Structs inferred during type recovery are emitted as named type definitions at the top of the output.
Compiler Idiom Recovery
A dedicated compiler idiom recovery pass recognizes compiler-generated patterns that would otherwise decompile into incomprehensible arithmetic. The most important case: magic constant division. When a compiler transforms x / 7 into a multiply-and-shift sequence using the magic constant 0x2492492492492493, the decompiler recognizes the pattern and recovers the original division.
These patterns are drawn from Warren's Hacker's Delight and cover the idioms generated by GCC, Clang, and MSVC. Without idiom recovery, a simple n / 10 would appear as a 4-instruction sequence involving a 64-bit multiply by 0xCCCCCCCCCCCCCCCD, a right shift, and a subtraction — technically correct but completely unreadable.
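One way to invert the unsigned form of the pattern: given a matched `mulhi(x, m) >> s` sequence, recover the divisor from the magic and verify by round-trip. This is a sketch for the unsigned case only; signed variants carry extra correction terms.

```rust
/// Recover `d` from an unsigned `mulhi(x, m) >> s` division idiom,
/// where the compiler chose m = ceil(2^(64+s) / d).
fn recover_divisor(magic: u64, shift: u32) -> Option<u64> {
    debug_assert!(shift < 64);
    let m = magic as u128;
    let pow: u128 = 1u128 << (64 + shift);
    let d = (pow + m - 1) / m;          // candidate divisor
    let expect = (pow + d - 1) / d;     // magic that d would produce
    (expect == m && d > 1 && d <= u64::MAX as u128).then_some(d as u64)
}

// Example: recover_divisor(0xCCCC_CCCC_CCCC_CCCD, 3) == Some(10)
```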
Value Range Analysis
Alongside type recovery, Aletheia performs value range analysis using abstract interpretation with an interval lattice. The lattice elements are:
// Interval lattice for value range analysis
Bottom -- Unreachable / no information
Range(lo, hi) -- Concrete interval [lo, hi]
Top -- Any value possible
The analysis propagates intervals forward through the SSA graph. At merge points (phi functions), intervals are joined by taking the union: [0, 10] join [5, 20] = [0, 20]. At loop headers, intervals are widened after 3 iterations to ensure termination — if a variable's range keeps growing, it is widened to Top rather than iterating indefinitely. The entire analysis is capped at 20 fixed-point passes.
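The two operators, sketched over a plain enum (`i128` bounds sidestep overflow at the extremes; the 3-iteration widening trigger lives in the fixed-point driver, not shown here):

```rust
#[derive(Clone, Copy, PartialEq)]
enum Interval {
    Bottom,            // unreachable / no information
    Range(i128, i128), // concrete interval [lo, hi]
    Top,               // any value possible
}

/// Join at phi nodes: the union of the incoming intervals.
fn join(a: Interval, b: Interval) -> Interval {
    use Interval::*;
    match (a, b) {
        (Bottom, x) | (x, Bottom) => x,
        (Range(al, ah), Range(bl, bh)) => Range(al.min(bl), ah.max(bh)),
        _ => Top,
    }
}

/// Widening at loop headers: a still-growing range jumps straight to Top.
fn widen(old: Interval, new: Interval) -> Interval {
    use Interval::*;
    match (old, new) {
        (Range(ol, oh), Range(nl, nh)) if nl < ol || nh > oh => Top,
        _ => new,
    }
}
```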
Value ranges feed into several downstream consumers: they help determine array bounds for Array type recovery, they inform switch statement completeness analysis, and they enable more aggressive constant folding when a variable's range collapses to a single value.
Design Decisions and Their Rationale
Several cross-cutting design decisions shape the entire codebase. They are worth collecting in one place because they reflect lessons learned from building a decompiler that has to work on real binaries, not just textbook examples.
Iterative everything. Every graph algorithm — DFS, dominator tree walks, rename passes, fixed-point loops — uses an explicit Vec-based stack. The alternative (recursive functions) is cleaner to read but imposes a hard depth limit equal to the thread stack size divided by the frame size. On a binary with a 100,000-node CFG, that limit will be hit. The explicit-stack pattern costs about 20% more code but eliminates an entire class of crashes.
Explicit flags. Modeling CPU flags as separate IR operations on dedicated flag registers adds complexity to the lifter (each x86 instruction must emit its flag effects explicitly) but simplifies everything downstream. SSA conversion treats flags identically to general-purpose registers. Dead code elimination removes unused flag computations automatically. And there is never a question about whether a flag is live across a basic block boundary — if it is, there will be an SSA variable for it.
PointerArith + Equal dual constraint. When an IntAdd might be pointer arithmetic or might be integer addition, the constraint generator emits both PointerArith(a, b, result) and Equal(a, result). If the solver later determines that a is a pointer, the PointerArith constraint activates and drives struct field recovery. If a turns out to be an integer, the Equal constraint correctly unifies the types. This dual-emission strategy avoids premature commitment in the presence of ambiguity.
Constraint deduplication. All generated constraints pass through a HashSet before entering the solver. In a hot loop that executes the same operation on the same variables 10,000 times (common in unrolled loops), this prevents 10,000 redundant constraint insertions. The alternative — deduplication inside the solver — would require checking each constraint against all existing constraints, which is more expensive than a hash lookup at generation time.
Safe Rust throughout. The workspace-level #![deny(unsafe_code)] is not aspirational; it is enforced in CI. The decompiler processes binaries that may be deliberately crafted to exploit analysis tools. An out-of-bounds index in the CFG adjacency list, a dangling reference in the union-find, or a buffer overflow in the string resolver would be ironic at best and a security vulnerability at worst. Safe Rust eliminates these categories of bugs at compile time, at the cost of occasional verbosity in low-level data structure manipulation. That trade-off is unambiguously worth it for a security tool.
Conclusion
Aletheia's decompiler pipeline is a substantial body of Rust spanning both an architecture-independent core and four architecture-specific lifters. It implements textbook algorithms (Cooper-Harvey-Kennedy, Cytron's phi insertion, union-find type inference) with engineering adaptations for adversarial inputs: explicit stacks instead of recursion, panic-catching at trust boundaries, constraint deduplication, and fixed-point caps.
The key insight, if there is one, is that decompilation is a series of well-understood compiler problems run in reverse — dominator trees, SSA, type inference, control-flow structuring — and the challenge is less in the algorithms themselves than in making them robust against the pathological inputs that real-world binaries present. Obfuscated code produces CFGs with 100,000 nodes and deeply nested irreducible regions. Stripped binaries provide zero type information. Adversarial inputs trigger edge cases in every data structure. Building in Rust with safe-by-default semantics, explicit resource management, and iterative graph algorithms turns these from crash-inducing surprises into handled cases.
The full system — loading through pseudo-C emission — runs as part of Aletheia, which exposes the decompiler alongside 140 MCP tools for binary analysis, concolic execution, and hybrid fuzzing. If you work on binary analysis tooling or have binaries you would like to throw at it, join the waitlist.