Binary analysis engine

Ghidra for
AI agents.

Load PE, ELF, or Mach-O binaries. Aletheia disassembles four architectures, lifts to a 43-opcode SSA-form IR, decompiles to typed C, and runs taint, concolic, and hybrid fuzzing workflows through 140 MCP tools.

140 MCP tools across 25 modules — structured JSON for every capability
100% detection on CTF binaries — 0 false positives on production code
SSA-form decompilation with Cooper-Harvey-Kennedy algorithm — v1.1
How It Works

From raw binary to structured intelligence

Traditional RE tools give you a GUI. Aletheia gives your agent 140 JSON APIs to reason with.

Step 01

Load

Parse ELF, PE, and Mach-O binaries. Extract sections, symbols, DWARF debug info, and relocations automatically.

Step 02

Disassemble

Decode x86-64, x86-32, ARM64, and ARM32 instructions, discover functions via recursive descent, and build control flow graphs with petgraph.

Step 03

Lift & Decompile

Lift to custom IR with 43 opcodes and explicit flag modelling (EFLAGS, NZCV). Construct SSA form via Cytron phi insertion. Decompile to typed C with struct reconstruction.

Step 04

Detect & Verify

Automated vulnerability scanning across 14 CWE classes with CVSS scoring. Concolic falsification with machine-checkable proofs. Hybrid fuzzing with LLM-guided search. MITRE ATT&CK-mapped evasion detection. SARIF 2.1.0 export.

Inside the Engine

From asm to typed C, through SSA

Aletheia lifts machine instructions into a 43-opcode IR (AletheiaIR), converts to SSA so every definition has a unique name, and inserts a φ-node at every control-flow merge. The phi-node is what lets type recovery and the decompiler emit clean, typed C — not because the math is fancy, but because it has a precise data-flow story to tell.

FIG. 01 AletheiaIR · SSA construction Cytron 1991 · CHK dominance frontier
AletheiaIR SSA construction On the left, a column of x86-64 instructions: cmp, jle, mov, jmp. On the right, the same control flow rendered as a four-block CFG in SSA form. RAX is versioned across blocks: RAX_1 in the then block, RAX_2 in the else block, and a phi-node at the merge block selects RAX_3 from the two predecessor versions. [A] x86-64 · iced-x86 decode_func 0x401230 push rbp 0x401231 mov rbp, rsp 0x401234 cmp edi, 0 0x401237 jle .else 0x401239 mov eax, 1 0x40123e jmp .merge .else: 0x401240 mov eax, 2 .merge: 0x401245 pop rbp 0x401246 ret 11 instr · 4 bb arch: x86_64 bytes: 0x17 lift 43 ops [B] AletheiaIR · SSA phi insertion at dominance frontier true false bb_0 · entry cmp_eq ZF, RDI_0, 0 cbranch ZF, .else bb_1 · then RAX_1 ← Copy 1 IntZext EAX, 32 Branch .merge bb_2 · else RAX_2 ← Copy 2 IntZext EAX, 32 Branch .merge bb_3 · merge φ RAX_3 ← (RAX_1, RAX_2) Return RAX_3 1 phi_function inserted target: RAX_3 · sources: 2 2 IrOpcode IntEqual, Cbranch, Copy DIFF · UNICORN ORACLE every lifted instruction validated against ground truth
The Hybrid Loop

Falsification with a witness

When fuzzer coverage plateaus, the HybridScheduler dispatches the concolic engine to the frontier branch, asks Z3 for an input that takes the negated path (QF_ABV), and feeds the satisfying model back into the corpus as a high-priority seed. Every finding ships with a concrete input vector that reaches the sink — not "we matched a pattern", but "here is the bytes, run them yourself."

FIG. 02 Hybrid loop · fuzzer ↔ concolic concolic + coverage-guided
Hybrid concolic + fuzzer loop Pane A shows an AFL-style fuzzer with a 64KB CoverageBitmap grid, an edges-over-time chart that climbs and then flatlines, and a "PLATEAU" alert when 50 consecutive inputs add no edges. Pane B shows the concolic engine recording four PathConstraints along the executed path; the frontier constraint is highlighted and tagged "negate". Pane C shows the Z3 solver receiving the negated assertion, returning SAT, and emitting a Witness with concrete bytes. A feedback curve flows from Pane C back to Pane A, labelled mutator.inject_witness(). [A] Fuzzer · AFL-style coverage-guided edges_hit / time t CoverageBitmap · 65,536 edges PLATEAU DETECTED 50 inputs · Δedges = 0 → dispatch concolic execs/s: 12,400 edges_hit: 1,847 corpus: 318 seeds [B] Concolic engine SymbolicShadow PathConstraint stack · 4 frames 0x401234 input[0] < 100 [taken] cmp_eq · taken=true 0x401240 input[1] > 0 [taken] cmp_sgt · taken=true 0x40125c size = input[0] * 256 [taken] int_mult · taken=true 0x401270 size < buf_capacity NEGATE ← frontier · flip + send to Z3 solver.assert( ¬ c4 ) SymExpr → QF_ABV bitvector ModelGap: 0 unrolled loops: 4 syscall stubs: 7 [C] Z3 solver QF_ABV · bitvectors > solver.push() > solver.assert(¬c4) > solver.check() SAT solver_time_ms: 18 model: arg0 = 0xAA 0xBB 0x00 0x00 arg1 = 0x01 size = 0xFFFE Witness violation_address: 0x401270 source_location: buffer.c:42 cwe: CWE-120 · CVSS 9.8 inputs: 3 vars · 7 bytes Witness → corpus seed high-priority injection · 30% mix-ratio · coverage resumes climbing FUZZ → PLATEAU → NEGATE → SAT → WITNESS → SEED 10–100× throughput · concolic + coverage-guided
Anatomy of a Finding

What an Aletheia finding actually looks like

Most binary scanners give you "potential buffer overflow at 0x401270 — please verify." Aletheia gives you the bytes that crash it, the path through the decompiled C, and a machine-checkable proof. Below is the shape of a real finding from a stripped binary.

FINDING #b9c8d2e7
VALIDATED CWE-120 CVSS 9.8 / Critical

Stack buffer overflow in decode_packet via unchecked length multiplication

A stripped 2.4 MB ELF binary contained a function that copied attacker-controlled bytes into a 1024-byte stack buffer using a length derived from input * 2. No bound check existed between the multiplication and the memcpy. Aletheia falsified the bound assumption, produced a witness, and replayed it to a confirmed crash — without ever running the binary on the host.

Discovery

While exploring decode_packet at 0x401230, the concolic engine recorded four path constraints. The fourth was treated as the frontier and negated:

// PathConstraint stack · 4 frames
0x401234   input_size < 1024              [taken]
0x401240   input_size > 0                 [taken]
0x40125c   copied_size = input_size * 2   [taken]
0x401270   copied_size < buf_capacity     [NEGATED]   ← frontier

Witness

Z3 (QF_ABV) returned SAT in 18 ms with a satisfying assignment. These bytes are the proof:

// Z3 model · satisfies all prior constraints + (size >= buf_capacity)
arg0  =  0xAA 0xBB 0x00 0x00
arg1  =  0x01
size  =  0xFFFE                            // 65,534 bytes into a 1,024-byte buffer

Decompiled

Type recovery from the SSA form named the buffer, recovered the integer width, and surfaced the missing bound check:

void decode_packet(int input_size, char *src) {
  char buf[1024];                         // stack-allocated, fixed
  int  copied_size = input_size * 2;
  if (input_size > 0) {
    memcpy(buf, src, copied_size);         // ← sink, no bound check
  }
}

Replay

The witness was injected into the fuzz harness and replayed concretely (not symbolically). The crash matches the predicted violation address:

$ ./harness < witness.bin
[harness] reading 7 bytes
[harness] decode_packet(input_size=0xFFFE, src=…)
[harness] *** stack smashing detected ***
Segmentation fault · violation at 0x401270
Aborted (core dumped)

Aletheia stops here with a Witness in hand: a concrete input vector, the violation address, the source location (buffer.c:42 from DWARF), and a CWE classification. The fuzzer accepts the witness as a high-priority corpus seed and resumes coverage.

Recommendation

Bound the copy length explicitly against sizeof(buf) before the memcpy:

if (copied_size >= sizeof(buf)) return -1;
memcpy(buf, src, copied_size);

The integer overflow on input_size * 2 also warrants a separate bound on the multiplication itself — Aletheia raised that as a secondary CWE-190 finding (not shown).

Generated 2026-05-03 · SARIF 2.1.0 export · vuln-report.sarif Machine-checkable · replay deterministic · no host-side execution
Honest Comparison

Aletheia vs. the tools you already know

Ghidra is powerful. IDA is the standard. Binary Ninja innovates. Aletheia is built for a different user: agents.

Side-by-side comparison of Aletheia against Ghidra, IDA Pro, and Binary Ninja across reasoning, detection, output, and foundation capabilities.
Capability Aletheia Ghidra IDA Pro Binary Ninja
Reasoning & analysis
SSA-form IR Full SSA, Cooper-Harvey-Kennedy P-Code (not SSA) Microcode (limited SSA) BNIL (SSA variants)
Data flow analysis SSA def-use, taint, slicing P-Code emulation Microcode analysis MLIL SSA slicing
Multi-architecture x86-64, x86-32, ARM64 & ARM32 Many architectures Many architectures Many architectures
Detection & verification
Evasion / anti-analysis detection Built-in, MITRE ATT&CK mapped Plugins only FLARE plugins Plugins only
Crypto signature identification Built-in, structured results FindCrypt plugin FindCrypt plugin Sigkit
Full-text search across binary Tantivy full-text engine Basic string search String / byte search Cross-references
Output & integration
AI agent integration (MCP) 140 MCP tools, structured JSON No No No
Structured API output Native JSON, agent-optimised Ghidra Script (Java/Python) IDAPython Python API
Headless / CLI mode Native CLI + MCP server analyzeHeadless Batch mode Headless
Foundation
Memory safety Pure Rust, zero unsafe Java / JVM C++ (manual) C++ core
Price Free tier + Pro Free (NSA) $1,879+ / yr $299+

Ghidra and IDA are battle-tested with decades of plugin ecosystem. Binary Ninja's IL design is excellent. Aletheia's advantage is agent ergonomics and structured output — it's complementary for AI-driven workflows.

What It Finds

Real vulnerabilities in real binaries

Clean-room implementation in pure Rust. Every algorithm from published literature. Validated against CTF suites and production binaries. Now at v1.1.

140
MCP Tools
25 modules, full agent integration
14
CWE Classes
Concolic falsification with CVSS scoring
100%
CTF Detection
0 false positives on production binaries
43
IR Opcodes
Custom IR with explicit flag modelling
4
Architectures
x86-64, x86-32, ARM64 & ARM32 — ~693 mnemonics
3
Binary Formats
PE, ELF, and Mach-O
Capabilities

What Aletheia actually does

Every capability exposed as a structured MCP tool. Built for agents that reason about binaries.

Core Architecture

SSA-Form Decompilation Pipeline

A full decompilation pipeline from raw bytes to typed C. Instructions are lifted to a custom intermediate representation with 43 opcodes and explicit flag modelling (EFLAGS for x86, NZCV for ARM64). SSA construction uses the Cooper-Harvey-Kennedy algorithm with Cytron-style phi insertion — the same foundations as production compilers.

43 IR Opcodes
SSA Form IR
4 Architectures

Constraint-based type recovery with union-find lattice solving recovers integers, pointers, arrays, structs, unions, enums, and bitfields. Inter-procedural propagation refines types bidirectionally across the call graph. Control flow structuring detects loops, if/else chains, switch statements, and break/continue. Every pipeline stage is independently accessible through MCP tools.

AI-Native

140 MCP Tools

Every capability — loading, disassembly, decompilation, search, detection, data flow tracing, type recovery, concolic analysis, hybrid fuzzing, vulnerability scanning, binary diffing, library signatures, attack-chain investigation, CyberGym benchmarking, and macOS security analysis — is exposed as a structured MCP tool across 25 modules. Connect Aletheia to Claude and the agent orchestrates entire analysis workflows autonomously.

stdio + SSE transport • Risk-tiered permissions • Knowledge graph • Audit logging
Threat Intelligence

Evasion & Anti-Analysis Detection

Built-in detection for anti-debugging, anti-VM, packing, obfuscation, and other evasion techniques. Every finding is mapped to MITRE ATT&CK technique IDs for structured threat intelligence integration.

MITRE ATT&CK mapping • Anti-debug • Anti-VM • Packing detection • Obfuscation scoring
Cryptographic Analysis

Crypto Signature Detection

Identifies cryptographic constants, S-boxes, and algorithm signatures embedded in binaries. Detects AES, DES, SHA-256, MD5 constants, and custom implementations through byte-pattern and structural analysis.

AES / DES • SHA-256 / MD5 • S-box detection • Byte-pattern matching
Search Engine

Tantivy Full-Text Search

Blazing-fast full-text search across all analysis artifacts — strings, function names, cross-references, and disassembly output. Powered by Tantivy, the Rust search engine library. Query results are structured JSON.

Full-text indexing • Bidirectional xrefs • petgraph CFG • Sub-millisecond queries
Vulnerability Research

Data Flow & Taint Analysis

Trace how data moves through a binary. Backward slicing computes the minimal instruction set influencing a target variable. Taint tracking follows untrusted input from sources like recv and read to dangerous sinks like system and exec. Both work across function boundaries.

Backward slicing • Forward slicing • Taint propagation • Cross-function • Xref tracking
Concolic Falsification

Security Variant Detection

Combines concrete emulation with sparse symbolic shadow to find machine-checkable security property violations. Detects 14 CWE classes: buffer overflows, command injection, format strings, use-after-free, integer overflow, path traversal, null dereference, double-free, divide-by-zero, out-of-bounds write, out-of-bounds read, dangerous function use, sizeof-pointer bugs, and hardcoded credentials. Validated at 100% detection on CTF binaries with 0 false positives on production binaries.

Z3 SMT solver • Parallel constraint solving • UNSAT caching • Attacker model inference • Sandboxed Unicorn emulation
Fuzzing Engine

Hybrid Coverage-Guided Fuzzing

AFL-compatible coverage-guided fuzzing integrated with concolic exploration. Corpus management, validation campaigns, and A/B comparison framework for exploration strategies. Scheduler automatically balances fuzzing with symbolic execution.

64KB edge bitmap • AFL-compatible • Corpus management • Concolic integration • A/B comparison
Pricing

Start free. Scale when you're ready.

No credit card required. Upgrade as your analysis workload grows.

Free
$0/mo
  • 3 binaries per month
  • Core disassembly
  • x86-64 architecture
  • Basic decompilation
  • Community support
Join Waitlist
Enterprise
Custom
  • Everything in Pro
  • On-prem deployment
  • Custom detection rules
  • CI/CD integration
  • Dedicated support & SLAs
Contact Us

IDA Pro starts at $1,879/yr. Binary Ninja at $299+. Aletheia Pro gives you full agent integration and SSA decompilation at a fraction of the cost.

FAQ

Frequently asked questions

What binary formats does Aletheia support?
ELF, PE, and Mach-O. Four architectures: x86-64, x86-32, ARM64, and ARM32. Format detection is automatic — load any supported binary and Aletheia identifies the format and architecture.
How does this differ from using Ghidra with a Python script?
Aletheia is purpose-built for agent consumption. Every capability returns structured JSON. No GUI, no Java runtime, no script glue. 140 MCP tools across 25 modules, with risk-tiered permissions and full audit logging.
Can I use Aletheia for malware analysis?
Yes. Built-in evasion detection identifies anti-debugging, anti-VM, packing, and obfuscation techniques with MITRE ATT&CK mapping. Unicorn-based sandboxed emulation lets you safely test hypotheses about binary behaviour.
What decompilation quality can I expect?
Aletheia uses SSA-form IR with the Cooper-Harvey-Kennedy dominance algorithm and Cytron-style phi insertion. Output is typed C with constraint-based type recovery, struct/union/enum reconstruction, and control flow structuring. Quality is actively improving with each release — currently at v1.1.
How do the MCP tools scale without overwhelming the AI's context window?
Both Aletheia and Arbiter integrate with Forgemax, our open source MCP gateway. Traditional MCP dumps every tool schema into the LLM's context window — at 140 tools, that's ~28,000 tokens of overhead before the agent even starts working. Forgemax collapses all tools into just two: search and execute. The agent discovers capabilities dynamically and writes JavaScript to chain tool calls inside a sandboxed V8 isolate. Context overhead drops to ~1,100 tokens regardless of tool count — a 95% reduction. The sandbox provides full security isolation with AST validation and opaque credential bindings.
Can I use Aletheia without AI agents?
Yes. Aletheia has a full CLI with subcommands for loading, analyzing, disassembling, listing functions, extracting strings, and generating reports. The MCP server is an additional interface — not a requirement.
How does Aletheia handle stripped binaries?
Aletheia uses recursive descent analysis to discover functions from entry points and call targets, assigning confidence scores to each detection. FLIRT-style library function signatures match common library functions by byte patterns even without symbol information. When DWARF debug info is present, it's used to enrich results.
Does Aletheia support taint analysis?
Yes. Aletheia traces data flow from untrusted sources (recv, read, fgets) to dangerous sinks (system, exec, memcpy) across function boundaries. Backward slicing computes the minimal instruction set influencing any target variable. Both are exposed as MCP tools.
Early Access

Aletheia is currently in closed development.

Join the waitlist to get early access when we open the beta. No spam. One email when it's ready.

Questions? Interested in collaborating?

[email protected]