Actively finding real vulnerabilities

The lab bench for AI security agents.

Arbiter turns captured web traffic into verified exploit evidence and bounty-ready reports. Aletheia gives agents structured binary analysis: lifting, SSA decompilation, taint, concolic proofs, and hybrid fuzzing. Both are Rust MCP servers built for the work agents can actually finish responsibly.

2 Products
407 MCP Tools
85/85 Firing Range
100% Rust
The Platform

Two tools. One mission.

Attack the web and reverse the binary. AI agents get structured, programmatic access to both.

Arbiter

Web security for autonomous agents
For penetration testers and bug bounty hunters

Import traffic from any source. Arbiter builds a state graph, infers authorization and ordering constraints, then searches for violations across 52 vulnerability classes. Every finding is verified in a real browser and exported with evidence. Not a scanner — a reasoning engine.
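
The constraint-inference idea above can be illustrated with a toy sketch. This is not Arbiter's implementation: the traffic tuples, role names, paths, and the "same first path segment" grouping rule are all assumptions made for illustration. It infers which roles were allowed or denied per endpoint from observed traffic, then proposes experiments where a role denied on one endpoint was never tested on a sibling endpoint.

```python
from collections import defaultdict

# Hypothetical captured traffic: (role, method, path, status).
# The schema and values are illustrative, not Arbiter's real format.
OBSERVED = [
    ("admin", "GET", "/admin/users", 200),
    ("user", "GET", "/admin/users", 403),
    ("admin", "DELETE", "/admin/users/7", 200),
    ("user", "GET", "/profile", 200),
    ("admin", "GET", "/profile", 200),
]

def infer_constraints(observations):
    """Record which roles were seen succeeding or being denied per endpoint."""
    allowed, denied = defaultdict(set), defaultdict(set)
    for role, method, path, status in observations:
        if 200 <= status < 300:
            allowed[(method, path)].add(role)
        elif status in (401, 403):
            denied[(method, path)].add(role)
    return allowed, denied

def candidate_probes(allowed, denied):
    """Propose (role, method, path) replays worth testing.

    A role denied somewhere under a path prefix, but never observed on a
    sibling endpoint under the same prefix, is a candidate experiment.
    """
    probes = set()
    for (method, path), ok_roles in allowed.items():
        for (_, denied_path), denied_roles in denied.items():
            # Assumed grouping rule: endpoints share the first path segment.
            if path.split("/")[1] != denied_path.split("/")[1]:
                continue
            for role in denied_roles:
                seen_denied = role in denied.get((method, path), set())
                if role not in ok_roles and not seen_denied:
                    probes.add((role, method, path))
    return sorted(probes)

def is_violation(status):
    """A probe that succeeds where denial was expected is a violation."""
    return 200 <= status < 300

allowed, denied = infer_constraints(OBSERVED)
probes = candidate_probes(allowed, denied)
print(probes)
```

A candidate only becomes a finding once the replay actually succeeds, which is the step the text above describes as browser verification.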

Race Conditions, Browser Verification, HTTP Smuggling, WAF Bypass, Bug Bounty Reports, Cache Poisoning, AI Security, Static Analysis
267 MCP Tools
52 Vuln Classes
85/85 Firing Range
100% Browser Verified

Aletheia

Binary analysis for agents that need more than strings
For reverse engineers and malware analysts

Load PE, ELF, or Mach-O binaries. Aletheia disassembles four architectures, lifts to a 43-opcode IR, constructs SSA form, and decompiles to typed C. Taint, concolic, and hybrid fuzzing workflows detect 14 CWE classes with concrete witnesses, CVSS scoring, and SARIF output.
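
To make the taint step concrete, here is a minimal sketch of forward taint propagation over a straight-line, SSA-style instruction list. The instruction tuples, the `read_input` source, and the `memcpy` sink are hypothetical stand-ins chosen for illustration. They are not Aletheia's 43-opcode IR or its tool interface.

```python
# Hypothetical SSA-like instructions: (dest, op, operands).
# Each value is defined once, and definitions precede uses.
PROGRAM = [
    ("v1", "call", ["read_input"]),     # taint source
    ("v2", "const", [64]),
    ("v3", "add", ["v1", "v2"]),        # derived from tainted v1
    ("v4", "const", [16]),
    (None, "call", ["memcpy", "v4"]),   # sink with a constant length
    (None, "call", ["memcpy", "v3"]),   # sink with a tainted length
]
SOURCES = {"read_input"}  # assumed source functions
SINKS = {"memcpy"}        # assumed sink functions

def tainted_sinks(program):
    """Return indices of sink calls that receive a tainted argument."""
    tainted = set()
    hits = []
    for index, (dest, op, operands) in enumerate(program):
        if op == "call":
            callee, args = operands[0], operands[1:]
            if callee in SOURCES and dest is not None:
                tainted.add(dest)
            if callee in SINKS and any(arg in tainted for arg in args):
                hits.append(index)
        elif dest is not None and any(o in tainted for o in operands):
            # Any tainted operand taints the result.
            tainted.add(dest)
    return hits

print(tainted_sinks(PROGRAM))
```

Because SSA gives every value a single definition, one forward pass is enough for straight-line code. Real binaries need control-flow handling (phi nodes, loops), which this sketch leaves out.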

SSA Decompilation, Concolic Falsification, Hybrid Fuzzing, Evasion Detection, Taint Analysis, Crypto Signatures, MITRE ATT&CK, Vulnerability Scanning
140 MCP Tools
14 CWE Classes
100% CTF Detection
4 Architectures
For the AI Security Era

Built for the post-Mythos security workflow

The next generation of models won’t be limited by whether they can imagine a vulnerability. They’ll be limited by whether they can inspect the right state, run the right experiment, verify the result, preserve evidence, and stay in scope — without burning context on plumbing. That’s the layer Arbiter Security builds.

Structured Tools

MCP APIs instead of GUI scraping. Typed inputs, JSON outputs, composable workflows. Fewer tool calls per finding, less context spent parsing — agents finish faster, with cleaner audit trails.
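
As a rough illustration of what typed inputs and JSON outputs look like on the wire, the sketch below builds an MCP `tools/call` request and unpacks a text result. The JSON-RPC envelope follows the Model Context Protocol. The tool name `import_traffic`, its `path` argument, and the sample response are hypothetical and are not taken from Arbiter's or Aletheia's actual tool list.

```python
import json

def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

def extract_text(response):
    """Return the text items from a tools/call response, raising on errors."""
    if "error" in response:
        message = response["error"].get("message", "unknown JSON-RPC error")
        raise RuntimeError(message)
    result = response["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error")
    return [
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]

# Hypothetical tool name and argument, for illustration only.
request = build_tool_call(1, "import_traffic", {"path": "capture.har"})
wire = json.dumps(request)

# A made-up response in the shape MCP servers return for tool calls.
sample_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "{\"requests\": 42}"}],
        "isError": False,
    },
}
texts = extract_text(sample_response)
print(wire)
print(texts)
```

The agent never parses screen output. It sends one structured request and reads one structured result.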

Verification First

Browser proof, concolic witnesses, SARIF, screenshots, traces, reproducible reports. No “potential” or “likely” — every finding ships with evidence a human can replay.

Responsible Control

Scope boundaries, audit logs, structured outputs ready for disclosure pipelines. The same instrumentation that helps an agent find a bug also helps you prove it stayed inside the lines.

The thesis: the bottleneck is no longer reasoning — it’s instrumentation, verification, and control.

Real Results

Proof, not promises.

Arbiter has been used to discover and responsibly disclose real vulnerabilities in production open source projects.

Disclosure • 2026

Anthropic Open Source

Vulnerability discovered in Anthropic's open source tooling — the company behind Claude. Responsibly disclosed and acknowledged by their security team.

Responsibly Disclosed

Disclosure • 2026

Cloudflare Open Source

Security issue identified in Cloudflare's open source infrastructure tooling, used by millions of websites globally. Ethically reported via their responsible disclosure program.

Responsibly Disclosed

Benchmark

Google Firing Range

100% detection rate across all 85 test endpoints in Google's XSS Firing Range, a widely used open-source test bed for web security scanners.

85/85 Verified

Philosophy

Why we build this way

Existing security tools weren't built for AI agents. They have GUIs, not APIs. Heuristics, not reasoning. Pattern matching, not constraint inference. We started from scratch.

Agent-First

Human GUIs are bottlenecks. Every capability is exposed as a structured MCP tool with JSON input and output. AI agents can orchestrate entire security assessments autonomously.

Rust, From Scratch

No wrappers. No FFI to legacy C code. Both tools are built in pure Rust, with memory safety, no garbage-collector overhead, and test-driven development. Every component is tested before work on the next begins.

Deterministic Output

Security tools must be reproducible. Same input, same output. Structured JSON responses, explicit error handling, and full audit trails — the kind of reliability agents can depend on.
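
One way an agent or a CI job could check the "same input, same output" property is to fingerprint each structured response and compare runs. This is a minimal sketch under stated assumptions: the result dictionaries are invented examples, and the canonicalization used here (sorted keys, compact separators) is one simple choice, not a documented Arbiter or Aletheia behavior.

```python
import hashlib
import json

def fingerprint(result):
    """Hash a JSON-serializable result in a key-order-independent way."""
    canonical = json.dumps(
        result,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no incidental whitespace
        ensure_ascii=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two invented runs that carry the same data in a different key order.
run_a = {"finding": "CWE-79", "endpoint": "/search", "verified": True}
run_b = {"verified": True, "endpoint": "/search", "finding": "CWE-79"}
# A run whose content actually differs.
run_c = {"finding": "CWE-79", "endpoint": "/search", "verified": False}

print(fingerprint(run_a) == fingerprint(run_b))
print(fingerprint(run_a) == fingerprint(run_c))
```

Storing the fingerprint next to each finding also gives the audit trail a cheap way to show that a replayed result matches the original.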

Early Access

Both tools are in closed development.

Join the waitlist to get early access. Tell us which product you're interested in. No spam. One email when the beta opens.

Questions? Interested in collaborating?

[email protected]