The Problem: Evasion Hides in Plain Sight

Malware authors have a structural advantage over analysts. The binary they ship is designed to resist the exact tools analysts use: debuggers, virtual machines, sandboxes, and disassemblers. These anti-analysis techniques are not exotic. They are well-documented, widely deployed, and — critically — detectable, if you know where to look and can tolerate operating without symbols.

The challenge is that most binaries encountered in real triage workflows are stripped. There are no function names, no debug info, no DWARF sections. The analyst gets a flat executable with an import table, a pile of sections, and raw bytes. Traditional signature-based tools can identify known packers (UPX, Themida, VMProtect), but they miss the broader class of evasion behaviors: timing checks via rdtsc, debugger detection via IsDebuggerPresent, VM fingerprinting through firmware table queries, and sandbox evasion through process enumeration.

The question is not whether the binary is packed. The question is: what is this binary trying to avoid, and how confident are we in that assessment?

Aletheia addresses this by implementing a four-tier detection system that operates entirely on artifacts available in stripped binaries — import tables, embedded strings, section entropy, and raw instruction mnemonics. Every detection is automatically mapped to a MITRE ATT&CK technique ID with a calibrated confidence score, producing structured output that integrates directly into threat intelligence workflows.

Detection Architecture

The evasion detection system runs as part of Aletheia's multi-pass analysis pipeline. It processes a binary through four sequential tiers — import scanning, string matching, entropy analysis, and instruction-level checks — and produces a list of evasion indicators. Each indicator records the category of evasion detected (e.g., AntiDebug, AntiVM, Packing), where in the binary it was found, which ATT&CK technique it maps to, and a calibrated confidence score.

Eleven categories cover the evasion behaviors most commonly encountered in real-world malware: anti-debug, anti-VM, anti-sandbox, packing, self-modifying code, process injection, timestomping, privilege escalation, persistence, API obfuscation, and TLS callbacks. Each category can be triggered by multiple detection tiers. For example, an anti-debug finding might come from an import (IsDebuggerPresent), a string (ollydbg), or an instruction (int 2dh). Evidence traces let analysts follow any finding back to the exact artifact that triggered it.
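The sketch below shows one way the indicator record described above could be modeled in Python. It is illustrative only: the class names, the field names, and the four-level ordinal confidence scale, which mirrors the labels used in this article, are assumptions and not Aletheia's actual types.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Category(Enum):
    """The eleven evasion categories named in the text (identifiers are hypothetical)."""
    ANTI_DEBUG = "AntiDebug"
    ANTI_VM = "AntiVM"
    ANTI_SANDBOX = "AntiSandbox"
    PACKING = "Packing"
    SELF_MODIFYING_CODE = "SelfModifyingCode"
    PROCESS_INJECTION = "ProcessInjection"
    TIMESTOMPING = "Timestomping"
    PRIVILEGE_ESCALATION = "PrivilegeEscalation"
    PERSISTENCE = "Persistence"
    API_OBFUSCATION = "ApiObfuscation"
    TLS_CALLBACKS = "TlsCallbacks"


class Tier(IntEnum):
    IMPORT = 1
    STRING = 2
    ENTROPY = 3
    INSTRUCTION = 4


class Confidence(IntEnum):
    """Hypothetical ordinal scale mirroring the article's confidence labels."""
    MODERATE = 1
    MODERATE_TO_HIGH = 2
    HIGH = 3
    VERY_HIGH = 4


@dataclass(frozen=True)
class EvasionIndicator:
    category: Category
    tier: Tier              # which detection tier produced the finding
    location: str           # where it was found: DLL, section name, or address
    evidence: str           # the exact artifact that triggered the finding
    attack_id: str          # MITRE ATT&CK technique ID, e.g. "T1622"
    confidence: Confidence


# One category, three independent detection tiers.
findings = [
    EvasionIndicator(Category.ANTI_DEBUG, Tier.IMPORT, "kernel32.dll",
                     "IsDebuggerPresent", "T1622", Confidence.HIGH),
    EvasionIndicator(Category.ANTI_DEBUG, Tier.STRING, ".rdata",
                     "ollydbg", "T1622", Confidence.MODERATE_TO_HIGH),
    EvasionIndicator(Category.ANTI_DEBUG, Tier.INSTRUCTION, "0x401a2c",
                     "int 2dh", "T1622", Confidence.HIGH),
]
```

Freezing the dataclass keeps each indicator immutable once emitted, which suits evidence that later stages should only read.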

Tier 1: Import-Level Detection

The import table is the highest-signal, lowest-noise source for evasion detection. Even stripped binaries retain their import address table — the loader needs it to resolve function addresses at runtime. The detector classifies imports against five categorized tables, each containing Windows API functions that are strong indicators of specific evasion behaviors.

Anti-Debug Imports

Import ATT&CK ID Confidence Rationale
IsDebuggerPresent T1622 High Sole purpose is debugger detection
CheckRemoteDebuggerPresent T1622 High Detects debuggers attached from another process
NtQueryInformationProcess T1622 High ProcessDebugPort / ProcessDebugObjectHandle queries
OutputDebugStringA T1622 High Last-error value (GetLastError) differs under a debugger (classic check)
OutputDebugStringW T1622 High Wide-character variant of the same technique

These imports receive high confidence because, in a sample submitted for triage, their most likely purpose is debugger detection. IsDebuggerPresent exists for exactly one reason: software that imports it is either a debugger itself or is checking for one. In the context of a triage sample, the latter interpretation is overwhelmingly more likely.

NtQueryInformationProcess is the most versatile of the five. Beyond ProcessDebugPort, it can query ProcessDebugObjectHandle and ProcessDebugFlags — three distinct anti-debug checks through a single import. The confidence does not reach the highest tier because this function has legitimate uses (querying process information for non-debug purposes), but in combination with the other anti-debug imports, the aggregate signal is strong.

Anti-VM Imports

Import ATT&CK ID Confidence Rationale
GetSystemFirmwareTable T1497.001 High SMBIOS/ACPI tables reveal hypervisor presence
EnumDeviceDrivers T1497.001 High Driver enumeration detects VM guest additions

GetSystemFirmwareTable is the canonical VM detection API. By querying the SMBIOS firmware table, malware can read the system manufacturer string — "VMware, Inc.", "QEMU", "Microsoft Corporation" (for Hyper-V) — and decide whether to execute its payload or exit silently. EnumDeviceDrivers serves a similar purpose: VM guest additions install kernel drivers with recognizable names (vmci.sys, VBoxGuest.sys), and enumerating loaded drivers reveals them.

Process Injection Imports

Import ATT&CK ID Sub-technique Confidence
CreateRemoteThread T1055.001 DLL Injection High
NtMapViewOfSection T1055.012 Process Hollowing High
WriteProcessMemory T1055 Generic Injection High
VirtualAllocEx T1055 Generic Injection High
QueueUserAPC T1055.004 APC Injection High
NtUnmapViewOfSection T1055.012 Process Hollowing High
SetWindowsHookExA T1055 Generic Injection High

Process injection imports are mapped to specific ATT&CK sub-techniques where possible. CreateRemoteThread maps directly to T1055.001 (DLL Injection) because it is the standard mechanism for forcing another process to load a DLL. NtMapViewOfSection and NtUnmapViewOfSection together indicate T1055.012 (Process Hollowing) — the pattern of unmapping a legitimate process's image and replacing it with malicious code. QueueUserAPC maps to T1055.004 (Asynchronous Procedure Call Injection), a technique that abuses the APC queue to execute code in the context of another thread.
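Here is a minimal sketch of the Tier 1 lookup, assuming the import names have already been parsed out of the import table. The dictionary transcribes the anti-debug, anti-VM, and process injection tables above. The function name and the tuple layout are hypothetical.

```python
# Transcription of the anti-debug, anti-VM, and process injection tables:
# import name -> (category, ATT&CK ID, confidence label).
IMPORT_TABLE = {
    "IsDebuggerPresent": ("AntiDebug", "T1622", "High"),
    "CheckRemoteDebuggerPresent": ("AntiDebug", "T1622", "High"),
    "NtQueryInformationProcess": ("AntiDebug", "T1622", "High"),
    "OutputDebugStringA": ("AntiDebug", "T1622", "High"),
    "OutputDebugStringW": ("AntiDebug", "T1622", "High"),
    "GetSystemFirmwareTable": ("AntiVM", "T1497.001", "High"),
    "EnumDeviceDrivers": ("AntiVM", "T1497.001", "High"),
    "CreateRemoteThread": ("ProcessInjection", "T1055.001", "High"),
    "NtMapViewOfSection": ("ProcessInjection", "T1055.012", "High"),
    "NtUnmapViewOfSection": ("ProcessInjection", "T1055.012", "High"),
    "WriteProcessMemory": ("ProcessInjection", "T1055", "High"),
    "VirtualAllocEx": ("ProcessInjection", "T1055", "High"),
    "QueueUserAPC": ("ProcessInjection", "T1055.004", "High"),
    "SetWindowsHookExA": ("ProcessInjection", "T1055", "High"),
}


def scan_imports(imports):
    """Return (import, category, attack_id, confidence) for each recognized import."""
    findings = []
    for name in imports:
        hit = IMPORT_TABLE.get(name)  # exact-match lookup; no name normalization
        if hit is not None:
            findings.append((name, *hit))
    return findings


print(scan_imports(["GetModuleHandleW", "IsDebuggerPresent", "QueueUserAPC"]))
```

Matching here is exact. Real import tables can carry name variants (for example, EnumDeviceDrivers is also exported from kernel32.dll as K32EnumDeviceDrivers), so a production lookup would likely normalize names before matching.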

Persistence and API Obfuscation Imports

Category Imports ATT&CK ID Confidence
Persistence (Registry) RegSetValueExA, RegSetValueExW T1547.001 High
Persistence (Service) CreateServiceA, CreateServiceW T1543.003 High
API Obfuscation GetProcAddress, LoadLibraryA, LoadLibraryW, LdrGetProcedureAddress T1027.007 Moderate

The API obfuscation category deserves special attention because it uses a co-occurrence requirement: a single GetProcAddress import does not trigger a finding. Legitimate software routinely calls GetProcAddress for plugin systems, optional feature detection, and backwards compatibility. The detection fires only when two or more of the four API resolution imports are present simultaneously. This indicates that the binary resolves APIs at runtime instead of declaring them in its import table, which is a hallmark of import table obfuscation.

The moderate confidence assigned to this category reflects the inherent ambiguity: even with co-occurrence, dynamic API resolution is a legitimate pattern in some software (plugin hosts, compatibility shims). The confidence is sufficient to surface the finding but low enough that it will not dominate a triage report in the absence of corroborating evidence from other tiers.
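The co-occurrence rule fits in a few lines. This is a sketch under the rule stated above (two or more of the four resolution imports). The function name, the dictionary shape, and the `min_cooccurring` parameter are hypothetical.

```python
# The four API resolution imports listed in the table above.
RESOLUTION_IMPORTS = {
    "GetProcAddress", "LoadLibraryA", "LoadLibraryW", "LdrGetProcedureAddress",
}


def detect_api_obfuscation(imports, min_cooccurring=2):
    """Emit a T1027.007 finding only when enough resolution imports co-occur."""
    present = sorted(RESOLUTION_IMPORTS.intersection(imports))
    if len(present) < min_cooccurring:
        return None  # a lone GetProcAddress is treated as benign
    return {
        "category": "ApiObfuscation",
        "attack_id": "T1027.007",
        "confidence": "Moderate",
        "evidence": present,
    }
```

Exposing the threshold as a parameter makes it straightforward to tune the rule against a corpus of known-benign binaries without changing the detection logic.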

Tier 2: String-Level Detection

Tier 2 scans recovered strings for 11 patterns that indicate environment awareness — the binary contains references to specific analysis tools, virtual machine vendors, or persistence mechanisms. Each match produces a finding at moderate-to-high confidence.

Pattern Category What It Indicates
VMware AntiVM Checks for VMware hypervisor
VBoxGuest AntiVM Checks for VirtualBox guest additions
QEMU AntiVM Checks for QEMU emulator
Hyper-V AntiVM Checks for Microsoft Hyper-V
SbieDll AntiSandbox Sandboxie DLL detection
wireshark AntiSandbox Network analysis tool detection
procmon AntiSandbox Process Monitor detection
dbghelp.dll AntiDebug Debug helper library presence check
ollydbg AntiDebug OllyDbg debugger detection
x64dbg AntiDebug x64dbg debugger detection
SOFTWARE\Microsoft\Windows\CurrentVersion\Run Persistence Registry Run key for auto-start

String-level detection sits at a lower confidence than import-level detection because strings are inherently noisier. A binary might contain the string "VMware" as part of a legitimate compatibility check, in an error message, or as a vendor name in embedded metadata. The string's mere presence is suggestive but not conclusive. However, when a Tier 2 string finding aligns with a Tier 1 import finding (e.g., the string "VMware" alongside the import GetSystemFirmwareTable), the combined evidence is substantially stronger than either signal alone.

The persistence pattern (SOFTWARE\Microsoft\Windows\CurrentVersion\Run) is particularly valuable because it is both highly specific and difficult to encounter accidentally. This registry path is the standard location for auto-start entries, and a binary containing this exact string is almost certainly reading or writing Run key values. Combined with RegSetValueExA from Tier 1, this provides a high-confidence T1547.001 finding.
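Here is a minimal sketch of Tier 2 matching and the cross-tier corroboration described above. Case-insensitive substring matching is an assumption on my part: registry paths on Windows are case-insensitive, and tool names appear in mixed case. The function names are hypothetical.

```python
# Patterns from the Tier 2 table: (pattern, category).
STRING_PATTERNS = [
    ("VMware", "AntiVM"),
    ("VBoxGuest", "AntiVM"),
    ("QEMU", "AntiVM"),
    ("Hyper-V", "AntiVM"),
    ("SbieDll", "AntiSandbox"),
    ("wireshark", "AntiSandbox"),
    ("procmon", "AntiSandbox"),
    ("dbghelp.dll", "AntiDebug"),
    ("ollydbg", "AntiDebug"),
    ("x64dbg", "AntiDebug"),
    (r"SOFTWARE\Microsoft\Windows\CurrentVersion\Run", "Persistence"),
]


def scan_strings(strings):
    """Return (pattern, category, matched_string) for each case-insensitive hit."""
    findings = []
    for s in strings:
        lowered = s.lower()
        for pattern, category in STRING_PATTERNS:
            if pattern.lower() in lowered:
                findings.append((pattern, category, s))
    return findings


def corroborated_categories(string_findings, import_categories):
    """Categories supported by both a Tier 2 string and a Tier 1 import."""
    return {category for _, category, _ in string_findings} & set(import_categories)
```

For example, the recovered string "VMware, Inc." together with an AntiVM import finding from GetSystemFirmwareTable yields {"AntiVM"} as a corroborated category.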

Tier 3: Section Entropy Analysis

Tier 3 applies information theory to detect packing, encryption, and compression. The detector computes Shannon entropy for each section in the binary using a standard byte-level frequency distribution — counting the occurrence of each possible byte value (0x00 through 0xFF) and computing the information-theoretic entropy of the resulting distribution.
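The computation just described is the standard Shannon formula, H = -Σ p_i log2 p_i, summed over the byte values that actually occur. Here is a minimal Python version. The function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value 0x00-0xFF present
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(shannon_entropy(b"\x00" * 1024))     # a single repeated value: 0.0
print(shannon_entropy(bytes(range(256))))  # every value exactly once: 8.0
```

Byte values that never occur contribute nothing to the sum (their p_i is zero), so iterating only over observed counts is sufficient.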

Shannon entropy measures the average information content per byte, expressed in bits. The theoretical maximum is 8.0 bits/byte, reached when every byte value is equally likely, which makes the data indistinguishable from random. The commonly used rule-of-thumb ranges are:

Entropy Range Typical Content Interpretation
0.0 – 1.0 Zero-filled sections, padding Uninitialized data
4.0 – 5.5 ASCII text, string tables Human-readable content
5.5 – 6.5 Compiled native code Normal executable sections
6.5 – 7.0 Compressed resources, optimized code Borderline — may be benign
7.0 – 7.5 Packed or compressed code Likely packing (moderate-to-high confidence)
7.5 – 8.0 Encrypted or strongly compressed code Almost certainly packed (very high confidence)

The two thresholds grade the strength of the packing signal. At 7.0, the section is likely packed: normal compiled code almost never reaches this entropy level, even with aggressive compiler optimizations. At 7.5, the section is almost certainly encrypted or compressed with a strong algorithm. General-purpose compressors such as zlib and LZMA typically produce output between 7.2 and 7.6, depending on the compressibility of the original content. Both thresholds map to ATT&CK T1027.002 (Software Packing). The detector iterates over each section, computes its entropy, and assigns the appropriate confidence level based on which threshold is exceeded.
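Here is a sketch of the per-section threshold check. It takes an already-computed entropy value and uses strict "exceeds" comparisons, matching the >7.0 and >7.5 wording used in this article. The function name and the return tuple are hypothetical.

```python
def classify_section_entropy(entropy: float):
    """Map one section's entropy to a packing finding, or None if below 7.0."""
    if entropy > 7.5:
        return ("Packing", "T1027.002", "Very High")
    if entropy > 7.0:
        return ("Packing", "T1027.002", "Moderate-to-High")
    return None


for value in (6.1, 7.2, 7.8):
    print(value, classify_section_entropy(value))
```

One caveat applies to tiny sections. An n-byte sample has at most n distinct byte values, so its measured entropy cannot exceed log2(n). A section smaller than 128 bytes therefore can never cross 7.0, however random its contents.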

Entropy analysis is particularly valuable for stripped binaries. A packed, stripped binary offers almost no surface for import-level or string-level detection. The packer's stub typically imports only a handful of functions (often just VirtualAlloc, VirtualProtect, and a few others), and the original binary's strings are encrypted within the packed payload. Entropy is the one signal that reliably survives packing. Encryption hides the contents but not the statistical properties of the ciphertext, unless the packer deliberately pads or re-encodes its payload to lower its entropy.

Tier 4: Instruction-Level Detection

The final tier operates on disassembled instruction mnemonics. These detections target specific x86/x64 instructions that serve as anti-analysis primitives at the hardware level.

Instruction Category ATT&CK ID Confidence Rationale
int3 (INT 3, 0xCC) AntiDebug T1622 Moderate Breakpoint instruction; generates EXCEPTION_BREAKPOINT. Under a debugger, the debugger catches it. Without one, the process's own exception handler runs.
rdtsc AntiDebug T1497.003 Moderate Read timestamp counter. Two consecutive rdtsc calls with a large delta indicate single-stepping or breakpoint-induced delays.
cpuid AntiVM T1497.001 Moderate Leaf 0x40000000 returns hypervisor vendor string. Leaf 0x1 ECX bit 31 is the hypervisor present bit.
int 2dh AntiDebug T1622 High Kernel-mode debug service interrupt. Almost exclusively used for anti-debug: raises EXCEPTION_BREAKPOINT but is handled differently under a debugger vs. normal execution.

The confidence scores for instruction-level detections are deliberately conservative. int3, rdtsc, and cpuid all have legitimate uses. int3 appears in assertion handlers and sanitizer instrumentation. In addition, MSVC fills the alignment padding between functions with 0xCC bytes, which a linear-sweep disassembler decodes as int3. rdtsc is used for high-precision timing in performance-sensitive code. cpuid is called by runtime libraries to detect CPU features (SSE, AVX, AES-NI). At moderate confidence, these findings are surfaced but will not, on their own, dominate a triage assessment.

The exception is int 2dh, which receives high confidence. This instruction has virtually no legitimate use in user-mode software. It is a debug service interrupt that behaves differently depending on whether a debugger is attached. With a debugger attached, the interrupt is intercepted and the next byte is skipped. Without one, the exception handler runs normally. Malware uses this asymmetry to detect debugging environments. Its presence in a binary is one of the strongest single-instruction indicators of anti-analysis intent.
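Here is a minimal sketch of the Tier 4 scan. It assumes the disassembler yields (address, mnemonic, operand string) tuples, with int 2dh appearing as mnemonic "int" and operand "0x2d", which is how disassemblers such as Capstone render it. The int3-after-ret/int3 padding filter is a hypothetical heuristic I added to reflect the MSVC padding caveat above; it is not described as part of Aletheia.

```python
def scan_instructions(instructions):
    """instructions: iterable of (address, mnemonic, operand_string) tuples."""
    findings = []
    prev = None
    for address, mnemonic, op_str in instructions:
        mnemonic = mnemonic.lower()
        op = op_str.strip().lower()
        # 0xCC decodes as "int3"; the long form CD 03 decodes as "int 3".
        is_int3 = mnemonic == "int3" or (mnemonic == "int" and op in ("3", "0x3"))
        if mnemonic == "int" and op in ("0x2d", "2dh"):
            findings.append((address, "int 2dh", "AntiDebug", "T1622", "High"))
        elif is_int3:
            # Hypothetical heuristic: int3 right after ret or int3 is likely padding.
            if prev not in ("ret", "int3"):
                findings.append((address, "int3", "AntiDebug", "T1622", "Moderate"))
        elif mnemonic == "rdtsc":
            findings.append((address, "rdtsc", "AntiDebug", "T1497.003", "Moderate"))
        elif mnemonic == "cpuid":
            findings.append((address, "cpuid", "AntiVM", "T1497.001", "Moderate"))
        prev = "int3" if is_int3 else mnemonic
    return findings
```

For example, the listing rdtsc, int 0x2d, ret, int3, int3, cpuid, int3 produces four findings. The two int3 bytes after the ret are skipped as padding, while the final int3, which follows cpuid, is reported.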

ATT&CK Technique Mapping

Every detection across all four tiers is mapped to a specific MITRE ATT&CK technique ID. The mapping covers 11 techniques and sub-techniques:

Technique ID Sub-technique Description Detection Source
T1622 Debugger Evasion Import + String + Instruction
T1497.001 System Checks VM Detection (firmware, CPUID) Import + String + Instruction
T1497.003 Time-Based Evasion Timing checks (RDTSC) Instruction
T1055 Process Injection (generic) Import
T1055.001 DLL Injection CreateRemoteThread-based injection Import
T1055.004 APC Injection QueueUserAPC-based injection Import
T1055.012 Process Hollowing NtMapViewOfSection-based hollowing Import
T1547.001 Registry Run Keys Persistence via auto-start registry Import + String
T1543.003 Windows Service Service-based persistence Import
T1027.002 Software Packing High-entropy sections Entropy
T1027.007 Dynamic API Resolution GetProcAddress + LoadLibrary co-occurrence Import (co-occurrence)

Evidence Deduplication

The ATT&CK export deduplicates findings by technique ID, collects all evidence for each technique, and selects the highest confidence among all detections mapped to that technique. A single binary might trigger T1622 (Debugger Evasion) through three different sources: an IsDebuggerPresent import, an ollydbg string, and an int 2dh instruction. Rather than reporting three separate T1622 findings, the export collapses them into a single mapping with all three pieces of evidence and the highest confidence among them. Threat intelligence platforms consuming STIX output expect one entry per technique, not one entry per detection.
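Here is a sketch of the deduplication rule described above: group by technique ID, keep every piece of evidence, and retain the highest confidence. The label ranking and the dictionary shape are assumptions for illustration.

```python
# Hypothetical ordering of the article's confidence labels.
CONFIDENCE_RANK = {"Moderate": 1, "Moderate-to-High": 2, "High": 3, "Very High": 4}


def deduplicate(findings):
    """Collapse (attack_id, confidence, evidence) tuples to one entry per technique."""
    merged = {}
    for attack_id, confidence, evidence in findings:
        entry = merged.setdefault(
            attack_id, {"technique": attack_id, "confidence": confidence, "evidence": []}
        )
        entry["evidence"].append(evidence)
        if CONFIDENCE_RANK[confidence] > CONFIDENCE_RANK[entry["confidence"]]:
            entry["confidence"] = confidence  # keep the highest confidence seen
    return list(merged.values())


mappings = deduplicate([
    ("T1622", "High", "import: IsDebuggerPresent"),
    ("T1622", "Moderate-to-High", "string: ollydbg"),
    ("T1622", "High", "instruction: int 2dh"),
    ("T1027.002", "Very High", "section .text: entropy 7.8"),
])
```

The three T1622 detections collapse into a single entry with three pieces of evidence at High confidence, next to one T1027.002 entry.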

Why Stripped Binaries Are Not a Problem

A common concern in binary analysis is that stripped binaries — those without symbol tables, debug info, or DWARF sections — are resistant to automated analysis. For evasion detection, this concern is unfounded. Each of the four detection tiers operates on artifacts that survive stripping:

  • Import tables are present even in stripped binaries. The Windows PE loader requires them to resolve external function addresses at runtime. Stripping removes internal symbols (function names defined by the binary itself), not external symbols (functions imported from DLLs). The import address table is structural, not debug information.
  • Embedded strings are recovered by scanning raw section data for sequences of printable characters, regardless of whether symbol information associates them with named variables. The string "VMware" exists in the binary's data section whether or not the binary has symbols.
  • Section metadata and entropy are structural properties of the PE format. Section names, virtual addresses, raw sizes, and characteristics flags are part of the section header table, which is required for the loader to map the binary into memory. Entropy is computed from the section's raw content — it is a statistical property of the bytes, not a property of the debug information.
  • Instruction mnemonics are produced by the disassembler from raw machine code bytes. The disassembler does not need symbol information to decode 0xCC as int3 or 0F 31 as rdtsc. These are fixed encodings defined by the x86 ISA.
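The string-recovery step in the second bullet can be sketched in the spirit of the Unix strings utility. The four-character minimum matches that tool's default. The UTF-16LE pass is included because Windows binaries commonly store wide strings. Both function names are hypothetical.

```python
import re

MIN_LEN = 4  # minimum run length, as in the default of the classic `strings` tool

# Printable ASCII runs, and UTF-16LE runs (printable ASCII byte followed by 0x00).
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
WIDE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def extract_ascii_strings(data: bytes):
    return [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]


def extract_wide_strings(data: bytes):
    return [m.group().decode("utf-16-le") for m in WIDE_RUN.finditer(data)]


blob = b"\x00\x01VMware\x00\xff" + "QEMU".encode("utf-16-le") + b"\xff"
print(extract_ascii_strings(blob), extract_wide_strings(blob))
```

Neither function consults symbols. Both scan raw bytes, which is why string recovery works on stripped binaries.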

In fact, entropy analysis becomes more valuable for packed stripped binaries. A packed binary with symbols would be unusual (why pack a debug build?), but a packed stripped binary is the standard deployment format for production malware. The packed payload has high entropy, the packing stub has low entropy, and the contrast between them is a strong structural signal that no amount of symbol stripping can eliminate.

Pipeline Integration

Evasion detection does not exist in isolation. It integrates into Aletheia's analysis pipeline at five levels, each serving a different consumer:

Level 1: Full Pipeline

The full pipeline executes all analysis passes in order. Evasion detection runs after disassembly and control flow recovery but before higher-level analyses like vulnerability detection and decompilation. This ordering is intentional: evasion indicators inform later passes. A function flagged as containing anti-debug checks can be prioritized differently during decompilation, and packing detection influences whether the decompiler should even attempt to process certain sections.

Level 2: Composite MCP Tools

Composite MCP tools expose evasion results as part of broader analysis calls. Binary identification runs evasion detection alongside library signature matching and session information extraction in a single MCP call, returning a unified triage summary. Vulnerability investigation includes evasion context alongside the vulnerability itself — if a binary contains anti-debug checks, that context is relevant to understanding whether the vulnerability is in benign software or malware.

Level 3: Export Formats

Three export capabilities consume evasion indicators:

  • Report generation produces markdown or JSON reports that include an evasion summary section with all detected techniques, confidence scores, and ATT&CK mappings.
  • IOC export generates indicators of compromise in JSON, STIX 2.1, or CSV format. Evasion techniques become STIX attack-pattern objects linked to the analyzed binary via relationship objects.
  • ATT&CK mapping export produces a dedicated ATT&CK mapping with deduplicated techniques, evidence chains, and confidence scores — suitable for direct import into ATT&CK Navigator or threat intelligence platforms.
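As an illustration of the Navigator-bound export, here is a sketch that turns deduplicated mappings into a minimal ATT&CK Navigator layer. The field names (name, domain, versions, techniques, techniqueID, score, comment) follow the Navigator layer format as I understand it. Verify them against the layer format specification before relying on them. The numeric score mapping and the function name are hypothetical.

```python
import json

# Hypothetical mapping from the article's confidence labels to Navigator scores.
SCORE = {"Moderate": 1, "Moderate-to-High": 2, "High": 3, "Very High": 4}


def to_navigator_layer(mappings, name="Evasion findings"):
    """Build a minimal Navigator layer from {"technique", "confidence", "evidence"} dicts."""
    return {
        "name": name,
        "domain": "enterprise-attack",
        "versions": {"layer": "4.5"},
        "techniques": [
            {
                "techniqueID": m["technique"],
                "score": SCORE[m["confidence"]],
                "comment": "; ".join(m["evidence"]),  # evidence chain as free text
            }
            for m in mappings
        ],
    }


layer = to_navigator_layer([
    {"technique": "T1622", "confidence": "High",
     "evidence": ["import: IsDebuggerPresent", "instruction: int 2dh"]},
])
print(json.dumps(layer, indent=2))
```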

Level 4: Malware Triage Prompt

Aletheia includes a structured malware triage workflow designed for use with AI agents. Evasion detection is Phase 2 of this workflow — after initial identification (Phase 1) but before deep behavioral analysis (Phase 3). The prompt instructs the agent to assess section entropy and packing status before attempting deeper analysis, because a packed binary needs to be unpacked before decompilation will produce meaningful results. The evasion assessment in Phase 2 directly influences whether Phase 3 should proceed with static analysis or fall back to dynamic unpacking.

Level 5: Agent Annotation

Aletheia's MCP tools allow agents to annotate specific functions with ATT&CK technique IDs. This creates a bidirectional link between the evasion detection system and the agent's own analysis. If the agent identifies additional anti-analysis behavior through manual inspection (e.g., a custom VM detection routine that does not use recognized imports), it can tag the relevant function with the appropriate ATT&CK ID, and subsequent export operations will include the agent-augmented annotations alongside the automated detections.

Confidence Score Rationale

The confidence scoring system is not arbitrary. It reflects a deliberate hierarchy based on the specificity and reliability of each detection source:

Tier Base Confidence Rationale
Tier 1: Import High Imports are explicit declarations of capability. A binary that imports IsDebuggerPresent is declaring its intent to check for a debugger. False positives are possible but uncommon.
Tier 2: String Moderate-to-High Strings are suggestive but may be coincidental. A binary containing "VMware" might be checking for VM presence or might be part of VMware's own tooling.
Tier 3: Entropy (>7.0) Moderate-to-High High entropy is a strong structural signal, but some legitimate software (media codecs, cryptographic libraries) contains high-entropy data sections.
Tier 3: Entropy (>7.5) Very High Entropy above 7.5 in an executable section is almost never legitimate. The data is either encrypted or compressed with a strong algorithm.
Tier 4: Instruction Moderate Individual instructions have legitimate uses. rdtsc is used for performance measurement. cpuid is used for feature detection. Context determines intent, and a single instruction lacks context.
Tier 4: int 2dh High Near-zero legitimate use in user-mode code. This instruction exists almost exclusively as an anti-debug primitive.
Tier 1: API Obfuscation Moderate Co-occurrence requirement reduces false positives, but dynamic API resolution is a legitimate pattern in plugin architectures.

When a binary triggers findings across multiple tiers for the same ATT&CK technique, the export takes the maximum confidence. But the real value is in the breadth of evidence. A T1622 finding backed by three independent detection tiers (import, string, instruction) is far more persuasive in a triage report than a T1622 finding from a single int3 instruction. This holds even though the exported confidence reflects only the highest individual score, not the number of corroborating sources.

CWE Mapping: The Complementary System

While ATT&CK mapping classifies adversary behavior, Aletheia's adjacent CWE mapping system classifies software weaknesses. The vulnerability scanner covers 14 vulnerability classes, each mapped to a CWE identifier and a base CVSS v3 score. Twelve of those classes and their CWE identifiers are:

Vulnerability Class CWE ID Description
Buffer Overflow CWE-120 Buffer Copy without Checking Size of Input
Command Injection CWE-78 Improper Neutralization of Special Elements used in an OS Command
Format String CWE-134 Use of Externally-Controlled Format String
Use After Free CWE-416 Use After Free
Integer Overflow CWE-190 Integer Overflow or Wraparound
Path Traversal CWE-22 Improper Limitation of a Pathname to a Restricted Directory
Null Dereference CWE-476 NULL Pointer Dereference
Double Free CWE-415 Double Free
Divide By Zero CWE-369 Divide By Zero
Out-of-bounds Write CWE-787 Out-of-bounds Write
Dangerous Function CWE-676 Use of Potentially Dangerous Function
Sizeof Pointer CWE-467 Use of sizeof() on a Pointer Type

The two systems — ATT&CK for adversary behavior and CWE for software weaknesses — serve different audiences and answer different questions. ATT&CK mapping tells the threat intelligence analyst what the malware is doing: evading debuggers, injecting into processes, establishing persistence. CWE mapping tells the vulnerability researcher what exploitable weaknesses exist in the binary: buffer overflows, use-after-free conditions, format string vulnerabilities.

In a malware triage workflow, both are valuable. Knowing that a sample exhibits T1055.012 (Process Hollowing) and T1622 (Debugger Evasion) characterizes its behavior. Knowing that it contains CWE-416 (Use After Free) in its unpacking stub characterizes its quality — and suggests that the malware author may have introduced exploitable bugs in their own code, which is relevant for both attribution and potential disruption.

Practical Impact

The evasion detection system processes a typical PE binary in under 100 milliseconds. Import scanning is O(n) over the import table. String scanning is O(n) over recovered strings. Entropy computation is O(n) over section bytes. Instruction scanning is O(n) over disassembled mnemonics. There are no quadratic or exponential operations, no iterative solving, and no external dependencies.

This performance characteristic means evasion detection can run on every binary in a batch — not just the ones an analyst selects for manual review. In a SOC workflow processing hundreds of samples per day, automatic ATT&CK mapping allows analysts to filter, sort, and prioritize based on evasion sophistication before committing time to deep analysis. A sample exhibiting T1027.002 (packing at very high confidence), T1622 (debugger evasion), and T1497.001 (VM detection) is more likely to be worth investigating than a sample with no evasion indicators.
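One possible batch prioritization is sketched below. It ranks samples by the number of distinct techniques detected, then by their highest confidence. This ordering is a hypothetical heuristic that illustrates the filtering described above; it is not a documented Aletheia feature.

```python
# Hypothetical ordering of the article's confidence labels.
CONFIDENCE_RANK = {"Moderate": 1, "Moderate-to-High": 2, "High": 3, "Very High": 4}


def prioritize(samples):
    """samples: dict of sample name -> list of (attack_id, confidence_label)."""
    def key(item):
        _, findings = item
        techniques = {attack_id for attack_id, _ in findings}
        top = max((CONFIDENCE_RANK[c] for _, c in findings), default=0)
        return (len(techniques), top)  # breadth first, then strongest signal

    return [name for name, _ in sorted(samples.items(), key=key, reverse=True)]


queue = prioritize({
    "benign.exe": [],
    "dropper.exe": [("T1027.002", "Very High"), ("T1622", "High"), ("T1497.001", "High")],
    "tool.exe": [("T1027.007", "Moderate")],
})
print(queue)
```

The dropper, with three techniques including very-high-confidence packing, lands at the top of the queue. The sample with no indicators lands at the bottom.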

The structured output — JSON, STIX, CSV, or ATT&CK Navigator layers — integrates directly into existing threat intelligence infrastructure. There is no manual mapping step. The analyst receives a binary, runs Aletheia's identification tool, and gets a complete evasion assessment with ATT&CK technique IDs, confidence scores, and exportable evidence in the format their downstream tools expect.