vm-and-bytecode-reverse
SKILL: VM & Bytecode Reverse Engineering — Expert Analysis Playbook
AI LOAD INSTRUCTION: Expert techniques for reversing custom virtual machines and bytecode interpreters. Covers dispatcher identification, opcode mapping, custom ISA reconstruction, disassembler/decompiler writing, maze challenges, and real-world VM protector analysis. Base models often fail to recognize the fetch-decode-execute pattern or attempt to analyze VM bytecode as native code.
0. RELATED ROUTING
- code-obfuscation-deobfuscation when the VM is a commercial protector (VMProtect/Themida)
- symbolic-execution-tools when using angr to solve VM-based challenges
- anti-debugging-techniques when the VM includes anti-debug checks
Quick identification
| Binary Pattern | Likely VM Type | Start With |
|---|---|---|
| Large switch statement in a loop | Switch-based dispatcher | Map each case to an operation |
| Indirect jump via table | Table-based dispatcher | Dump jump table, analyze handlers |
| Nested if-else chain on byte value | If-chain dispatcher | Same as switch, just different syntax |
| Stack push/pop dominant operations | Stack-based VM | Identify push, pop, arithmetic ops |
| Operands used as register-array indices | Register-based VM | Map register indices to operations |
| 2D grid + direction input | Maze challenge | Extract grid, apply BFS/DFS |
1. CUSTOM VM IDENTIFICATION
1.1 Structural Indicators
VM Architecture Components:
┌─────────────────────────────────┐
│ Bytecode Program (data section)│
├─────────────────────────────────┤
│ Program Counter (pc/ip) │
│ Register File / Stack │
│ Memory / Data Area │
├─────────────────────────────────┤
│ Dispatcher Loop │
│ ├─ Fetch: opcode = code[pc] │
│ ├─ Decode: lookup handler │
│ └─ Execute: run handler │
└─────────────────────────────────┘
1.2 IDA/Ghidra Signatures
Switch dispatcher (most common in CTF):
```c
while (running) {
    unsigned char op = bytecode[pc++];
    switch (op) {
        case 0x00: /* nop */      break;
        case 0x01: /* push imm */ stack[sp++] = bytecode[pc++]; break;
        case 0x02: /* add */      stack[sp-2] += stack[sp-1]; sp--; break;
        // ...
        case 0xFF: /* halt */     running = 0; break;
    }
}
```
Table dispatcher (more optimized):
```c
typedef void (*handler_t)(vm_ctx_t*);
handler_t handlers[256] = { handle_nop, handle_push, handle_add, ... };
while (running) {
    handlers[bytecode[pc++]](&ctx);
}
```
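For a table-based dispatcher, dumping the handler table is usually the first concrete step. The following is a minimal sketch, assuming a little-endian binary whose table is a run of consecutive fixed-size pointers; `parse_handler_table`, `group_by_handler`, and the sample addresses are hypothetical and only illustrate the idea.

```python
import struct
from collections import defaultdict

def parse_handler_table(raw, entries=256, ptr_size=8):
    """Decode `entries` little-endian pointers from the raw table bytes."""
    fmt = "<" + ("Q" if ptr_size == 8 else "I") * entries
    return list(struct.unpack_from(fmt, raw, 0))

def group_by_handler(table):
    """Map handler address -> list of opcodes that dispatch to it."""
    groups = defaultdict(list)
    for opcode, addr in enumerate(table):
        groups[addr].append(opcode)
    return dict(groups)

# Synthetic 4-entry table (assumption): opcodes 0 and 3 share one handler,
# as often happens with a shared "invalid opcode" handler.
raw = struct.pack("<4Q", 0x401000, 0x401020, 0x401040, 0x401000)
table = parse_handler_table(raw, entries=4)
groups = group_by_handler(table)
for addr, opcodes in sorted(groups.items()):
    print(f"{addr:#x}: opcodes {[hex(o) for o in opcodes]}")
```

Grouping by address shows how many distinct handlers actually need to be reversed, which is often far fewer than the table size suggests.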
2. ANALYSIS METHODOLOGY
Step 1: Find the Dispatcher
Look for:
- Large switch statement (many cases) in a loop
- Array of function pointers indexed by a byte from a data buffer
- Single function with high cyclomatic complexity
- Cross-references to a data buffer read byte-by-byte
Step 2: Map Opcodes to Operations
For each case/handler, determine:
| Property | How to Identify |
|---|---|
| Opcode value | Case number or table index |
| Operation type | Register/stack modifications |
| Operand count | How many bytes consumed after opcode |
| Operand type | Immediate value, register index, or memory address |
| Side effects | Output, memory write, flag modification |
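The properties in the table above can be recorded in a small data structure as each handler is reversed. This is a sketch under stated assumptions: `OpSpec`, `ISA`, `unmapped_opcodes`, and the opcode values are hypothetical names and numbers chosen for illustration, not taken from any particular challenge.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OpSpec:
    opcode: int                 # case number or table index
    mnemonic: str               # name chosen by the analyst
    operand_bytes: int          # bytes consumed after the opcode
    operand_kind: str = "none"  # "imm", "reg", "addr", or "none"
    side_effects: str = ""      # output, memory write, flag modification

# Partial notes after reversing a few handlers (hypothetical values).
ISA = {spec.opcode: spec for spec in [
    OpSpec(0x00, "nop", 0),
    OpSpec(0x01, "push", 1, "imm", "stack write"),
    OpSpec(0x07, "jmp", 2, "addr", "pc write"),
    OpSpec(0xFF, "halt", 0, side_effects="stops VM"),
]}

def unmapped_opcodes(bytecode, isa):
    """Walk the bytecode linearly and report (offset, opcode) pairs not yet in the notes."""
    pc, missing = 0, []
    while pc < len(bytecode):
        op = bytecode[pc]
        spec = isa.get(op)
        if spec is None:
            missing.append((pc, op))
            pc += 1              # operand size unknown: resync at the next byte
        else:
            pc += 1 + spec.operand_bytes
    return missing

sample = bytes([0x01, 0x41, 0x05, 0x07, 0x00, 0x00, 0xFF])
print(unmapped_opcodes(sample, ISA))   # [(2, 5)]
```

Running the check against the real bytecode after each mapped handler gives a quick list of which opcodes still need attention.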
Step 3: Extract Bytecode Program
```python
# Typical extraction from binary.
# bytecode_offset and bytecode_length are placeholders: replace them with the
# file offset and size of the bytecode blob found in the disassembler.
bytecode_offset = 0x0
bytecode_length = 0x0

with open('challenge', 'rb') as f:
    f.seek(bytecode_offset)
    bytecode = f.read(bytecode_length)

# Or from IDA (runs only inside IDAPython):
# bytecode = idc.get_bytes(bytecode_addr, bytecode_len)
```
Step 4: Write Custom Disassembler
```python
OPCODES = {
    0x00: ("nop",   0),  # (mnemonic, operand_bytes)
    0x01: ("push",  1),  # push immediate byte
    0x02: ("pop",   0),
    0x03: ("add",   0),
    0x04: ("sub",   0),
    0x05: ("xor",   0),
    0x06: ("cmp",   0),
    0x07: ("jmp",   2),  # jump to 16-bit address
    0x08: ("je",    2),
    0x09: ("jne",   2),
    0x0A: ("mov",   2),  # mov reg, imm
    0x0B: ("load",  1),  # load from memory[operand]
    0x0C: ("store", 1),  # store to memory[operand]
    0x0D: ("print", 0),
    0x0E: ("read",  0),  # read input
    0xFF: ("halt",  0),
}

def disassemble(bytecode):
    """Linear sweep: print one line per decoded instruction."""
    pc = 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if op not in OPCODES:
            print(f"  {pc:04x}: UNKNOWN {op:#04x}")
            pc += 1
            continue
        mnemonic, operand_size = OPCODES[op]
        operands = bytecode[pc+1:pc+1+operand_size]
        operand_str = ' '.join(f'{b:#04x}' for b in operands)
        print(f"  {pc:04x}: {mnemonic:8s} {operand_str}")
        pc += 1 + operand_size

disassemble(bytecode)
```
Step 5: Analyze Disassembled Program
With the custom disassembly, apply standard reverse engineering:
- Identify input reading (read opcode)
- Trace data flow from input to comparison
- Determine success/failure conditions
- Extract the check logic (often XOR/ADD transformations of input compared against constants)
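When the check turns out to be a per-byte XOR/ADD transformation compared against constants, the valid input can be recovered by undoing the steps in reverse order. A minimal sketch, assuming the VM computes `((byte ^ KEY) + ADD) & 0xFF`; `KEY`, `ADD`, and both function names are hypothetical stand-ins for values read from the disassembly.

```python
KEY = 0x5A   # hypothetical XOR constant recovered from the disassembly
ADD = 0x13   # hypothetical additive constant

def vm_transform(data):
    """Forward transform as the VM program would apply it to each input byte."""
    return bytes(((b ^ KEY) + ADD) & 0xFF for b in data)

def invert_transform(expected):
    """Undo the steps in reverse order: subtract first, then XOR."""
    return bytes(((e - ADD) & 0xFF) ^ KEY for e in expected)

# `expected` stands in for the comparison constants dumped from the bytecode.
expected = vm_transform(b"flag{vm}")
recovered = invert_transform(expected)
print(recovered)   # b'flag{vm}'
```

Re-running the forward transform on the recovered bytes and comparing with the constants is a cheap sanity check before submitting the input to the real binary.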
3. COMMON VM PATTERNS IN CTF
3.1 Stack-Based VM
Operations work on a stack (like JVM or Python bytecode).
| Opcode | Operation | Stack Effect |
|---|---|---|
| PUSH imm | Push immediate value | [...] → [..., imm] |
| POP | Discard top | [..., a] → [...] |
| ADD | Add top two | [..., a, b] → [..., a+b] |
| SUB | Subtract | [..., a, b] → [..., a-b] |
| MUL | Multiply | [..., a, b] → [..., a*b] |
| XOR | Bitwise XOR | [..., a, b] → [..., a^b] |
| CMP | Compare | [..., a, b] → [..., (a==b)] |
| JMP addr | Unconditional jump | no change |
| JZ addr | Jump if top is zero | [..., a] → [...] |
| PRINT | Output top as char | [..., a] → [...] |
| READ | Read char to stack | [...] → [..., input] |
| HALT | Stop execution | - |
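Re-implementing the stack semantics in a few lines makes it easy to test hypotheses about a bytecode program. The sketch below follows the stack effects in the table; the opcode numbering, the one-byte jump targets, and the 8-bit wraparound are assumptions made for illustration.

```python
# Hypothetical opcode numbering; a real VM will use its own values.
PUSH, POP, ADD, SUB, MUL, XOR, CMP, JMP, JZ, PRINT, READ, HALT = range(12)

BINOPS = {
    ADD: lambda a, b: (a + b) & 0xFF,
    SUB: lambda a, b: (a - b) & 0xFF,
    MUL: lambda a, b: (a * b) & 0xFF,
    XOR: lambda a, b: a ^ b,
    CMP: lambda a, b: int(a == b),
}

def run_stack_vm(code, stdin=b""):
    """Interpret `code` and return the bytes written by PRINT."""
    stack, out, pc = [], bytearray(), 0
    inp = iter(stdin)
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op == PUSH:
            stack.append(code[pc])
            pc += 1
        elif op == POP:
            stack.pop()
        elif op in BINOPS:
            b = stack.pop()          # top of stack is the right-hand operand
            a = stack.pop()
            stack.append(BINOPS[op](a, b))
        elif op == JMP:
            pc = code[pc]
        elif op == JZ:
            target = code[pc]
            pc += 1
            if stack.pop() == 0:
                pc = target
        elif op == PRINT:
            out.append(stack.pop())
        elif op == READ:
            stack.append(next(inp, 0))   # 0 once input is exhausted
        elif op == HALT:
            break
        else:
            raise ValueError(f"unknown opcode {op:#04x} at {pc - 1:#06x}")
    return bytes(out)

# Read one char, flip its case bit, print it.
program = bytes([READ, PUSH, 0x20, XOR, PRINT, HALT])
print(run_stack_vm(program, b"a"))   # b'A'
```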
3.2 Register-Based VM
Operations use register indices (like x86, ARM).
| Opcode | Operation |
|---|---|
| MOV r, imm | reg[r] = imm16 |
| MOV r1, r2 | reg[r1] = reg[r2] |
| ADD r1, r2 | reg[r1] += reg[r2] |
| SUB r1, r2 | reg[r1] -= reg[r2] |
| XOR r1, r2 | reg[r1] ^= reg[r2] |
| CMP r1, r2 | flags = compare(reg[r1], reg[r2]) |
| JMP addr | pc = addr |
| JE addr | if equal: pc = addr |
| LOAD r, [addr] | reg[r] = mem[addr] |
| STORE [addr], r | mem[addr] = reg[r] |
| SYSCALL | I/O operation based on reg[0] |
| HALT | stop |
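A register-based VM can be modelled the same way once the instruction encoding is known. The sketch below covers a subset of the table; the opcode numbering, the byte layout of each instruction, the one-byte jump targets, and the single equality flag are all assumptions for illustration.

```python
# Hypothetical opcode numbering; a real VM will use its own values.
MOVI, MOVR, ADD, SUB, XOR, CMP, JMP, JE, HALT = range(9)

def run_register_vm(code, nregs=4):
    """Execute `code` and return the final register file."""
    reg = [0] * nregs
    equal = False          # the only flag modelled: result of the last CMP
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == MOVI:                            # op, r, imm_lo, imm_hi
            reg[code[pc + 1]] = code[pc + 2] | (code[pc + 3] << 8)
            pc += 4
        elif op in (MOVR, ADD, SUB, XOR, CMP):    # op, r1, r2
            r1, r2 = code[pc + 1], code[pc + 2]
            if op == MOVR:
                reg[r1] = reg[r2]
            elif op == ADD:
                reg[r1] = (reg[r1] + reg[r2]) & 0xFFFF
            elif op == SUB:
                reg[r1] = (reg[r1] - reg[r2]) & 0xFFFF
            elif op == XOR:
                reg[r1] ^= reg[r2]
            else:                                 # CMP only updates the flag
                equal = reg[r1] == reg[r2]
            pc += 3
        elif op in (JMP, JE):                     # op, addr (one byte in this sketch)
            taken = op == JMP or equal
            pc = code[pc + 1] if taken else pc + 2
        elif op == HALT:
            break
        else:
            raise ValueError(f"unknown opcode {op:#04x} at {pc:#06x}")
    return reg

program = bytes([
    MOVI, 0, 0x34, 0x12,   # r0 = 0x1234
    MOVI, 1, 0xFF, 0x00,   # r1 = 0x00FF
    XOR, 0, 1,             # r0 ^= r1
    CMP, 0, 1,             # equal flag = (r0 == r1)
    JE, 19,                # not taken, because r0 != r1
    ADD, 0, 1,             # r0 += r1
    HALT,
])
print([hex(r) for r in run_register_vm(program)])
```

Unlike the stack model, operands here name registers explicitly, so the first thing to recover from a real handler is how many operand bytes it consumes and which of them are register indices.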
3.3 Brainfuck-like / Esoteric VMs
| BF Command | VM Equivalent | Description |
|---|---|---|
| `>` | INC ptr | Move data pointer right |
| `<` | DEC ptr | Move data pointer left |
| `+` | INC [ptr] | Increment byte at pointer |
| `-` | DEC [ptr] | Decrement byte at pointer |
| `.` | OUTPUT [ptr] | Output byte at pointer |
| `,` | INPUT [ptr] | Input byte to pointer |
| `[` | JZ forward | Jump past the matching `]` if the byte at the pointer is zero |
| `]` | JNZ back | Jump back to the matching `[` if the byte at the pointer is nonzero |
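The bracket pair is the only part that needs real work when lifting a Brainfuck-like program: each opening bracket must be matched to its closing bracket so both jumps get concrete targets. A minimal sketch, assuming standard Brainfuck symbols; `lift_bf` and its tuple output format are hypothetical.

```python
BF_TO_VM = {
    ">": "INC ptr", "<": "DEC ptr",
    "+": "INC [ptr]", "-": "DEC [ptr]",
    ".": "OUTPUT [ptr]", ",": "INPUT [ptr]",
}

def lift_bf(src):
    """Return a list of (mnemonic, target); target is None except for jumps."""
    program, open_stack = [], []
    for ch in src:
        if ch in BF_TO_VM:
            program.append((BF_TO_VM[ch], None))
        elif ch == "[":
            open_stack.append(len(program))
            program.append(("JZ forward", None))        # patched at the matching ]
        elif ch == "]":
            if not open_stack:
                raise ValueError("unmatched ]")
            start = open_stack.pop()
            program.append(("JNZ back", start + 1))     # loop body starts after the JZ
            program[start] = ("JZ forward", len(program))  # first instruction past the ]
        # any other character is a comment in Brainfuck and is skipped
    if open_stack:
        raise ValueError("unmatched [")
    return program

for idx, (mnemonic, target) in enumerate(lift_bf("+[-].")):
    suffix = "" if target is None else f" -> {target}"
    print(f"{idx:02d}: {mnemonic}{suffix}")
```

A custom esoteric VM usually differs only in which byte values stand for the eight commands, so the same matching logic applies after a one-line change to the lookup table.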
4. MAZE CHALLENGES
4.1 Identification
- Binary reads directional input (WASD, arrow keys, UDLR)
- 2D array in data section (walls, paths, start, end)
- Position tracking with x,y coordinates
- Win condition at specific coordinates
4.2 Map Extraction
```python
# Extract maze grid from binary data section
MAZE_ADDR = 0x601060
WIDTH = 20
HEIGHT = 15

# From binary dump: `bytecode` holds the dumped bytes and `base_addr` is the
# virtual address of its first byte; both must be set before running this.
maze = []
for row in range(HEIGHT):
    line = ""
    for col in range(WIDTH):
        cell = bytecode[MAZE_ADDR + row * WIDTH + col - base_addr]
        if cell == 0: line += "."    # path
        elif cell == 1: line += "#"  # wall
        elif cell == 2: line += "S"  # start
        elif cell == 3: line += "E"  # end
        else: line += "?"
    maze.append(line)
    print(line)
```
4.3 Automated Solving
```python
from collections import deque

def solve_maze(maze, start, end):
    """BFS solver returns direction string."""
    rows, cols = len(maze), len(maze[0])
    directions = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}
    queue = deque([(start, "")])
    visited = {start}
    while queue:
        (r, c), path = queue.popleft()
        if (r, c) == end:
            return path
        for name, (dr, dc) in directions.items():
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and
                    maze[nr][nc] != '#' and (nr, nc) not in visited):
                visited.add((nr, nc))
                queue.append(((nr, nc), path + name))
    return None

# Find start and end positions
for r, row in enumerate(maze):
    for c, cell in enumerate(row):
        if cell == 'S': start = (r, c)
        if cell == 'E': end = (r, c)

solution = solve_maze(maze, start, end)
print(f"Path: {solution}")
```
4.4 Direction Encoding
Different challenges encode directions differently:
| Encoding | Up | Down | Left | Right |
|---|---|---|---|---|
| WASD | W | S | A | D |
| UDLR | U | D | L | R |
| Arrow keys | ↑ (0x48) | ↓ (0x50) | ← (0x4B) | → (0x4D) |
| Numbers | 1 | 2 | 3 | 4 |
| Hex opcodes | 0x01 | 0x02 | 0x03 | 0x04 |
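A path found as a U/D/L/R string (the format returned by `solve_maze` in 4.3) still has to be converted into whatever the binary reads. The sketch below covers the encodings in the table; the dictionary and function names are hypothetical, and the arrow-key values are the scan codes listed above.

```python
# Text encodings: one output character per step.
ENCODINGS = {
    "WASD":    {"U": "W", "D": "S", "L": "A", "R": "D"},
    "UDLR":    {"U": "U", "D": "D", "L": "L", "R": "R"},
    "NUMBERS": {"U": "1", "D": "2", "L": "3", "R": "4"},
}

# Byte encodings: one output byte per step.
ARROW_SCANCODES = {"U": 0x48, "D": 0x50, "L": 0x4B, "R": 0x4D}
HEX_OPCODES = {"U": 0x01, "D": 0x02, "L": 0x03, "R": 0x04}

def encode_path(path, scheme):
    """Translate a UDLR path into the text form the binary expects."""
    return "".join(ENCODINGS[scheme][step] for step in path)

def encode_path_bytes(path, table):
    """Translate a UDLR path into raw bytes using a step -> byte table."""
    return bytes(table[step] for step in path)

print(encode_path("RRDL", "WASD"))                  # DDSA
print(encode_path_bytes("UD", HEX_OPCODES).hex())   # 0102
```

Some binaries use lowercase letters or expect a trailing newline, so the output is worth checking against how the input loop actually compares characters.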
5. REAL-WORLD VM PROTECTORS
5.1 VMProtect Analysis Approach
1. Find VM entry: search for pushad/pushfd sequence
2. Identify VM context structure (registers, flags, bytecode pointer)
3. Locate handler table (often obfuscated with opaque predicates)
4. For each handler:
a. Remove junk code / opaque predicates
b. Identify the core operation
c. Document handler semantics
5. Trace bytecode execution (instruction-level trace)
6. Reconstruct original code from trace
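Steps 4 to 6 can be supported by a small script that rewrites a recorded handler trace using the semantics documented so far. This is only a sketch: the trace format (a list of handler entry addresses), the addresses, and the handler names such as `vPushImm` are hypothetical and do not reflect VMProtect's real handler set.

```python
from collections import Counter

# Hypothetical notes from step 4: handler entry address -> documented mnemonic.
HANDLER_NAMES = {
    0x401000: "vPushImm",
    0x401040: "vAdd",
    0x401080: "vPopReg",
}

def annotate_trace(trace, names):
    """Return (listing, unknown): a readable listing plus hit counts of undocumented handlers."""
    listing = [names.get(addr, f"unknown_{addr:x}") for addr in trace]
    unknown = Counter(addr for addr in trace if addr not in names)
    return listing, unknown

# Stand-in for handler addresses recorded by an instruction-level tracer.
trace = [0x401000, 0x401000, 0x401040, 0x4010C0, 0x401080]
listing, unknown = annotate_trace(trace, HANDLER_NAMES)
print(" ; ".join(listing))
print([(hex(addr), hits) for addr, hits in unknown.most_common()])
```

Sorting the undocumented handlers by hit count suggests which one to reverse next, since frequently executed handlers unlock the largest share of the trace.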
5.2 Tigress Obfuscator
Academic VM obfuscator with configurable protection layers.
| Feature | Approach |
|---|---|
| Single-dispatch VM | Standard handler extraction |
| Split handlers | Handlers spread across multiple functions |
| Nested VMs | Outer VM handler invokes inner VM |
| Encrypted bytecode | Dynamic decryption before each fetch |
| Polymorphic handlers | Different code for same operation on each build |
5.3 Common VM Protector Patterns
| Protector | Dispatcher Style | Difficulty |
|---|---|---|
| VMProtect | Table + opaque predicates | High |
| Themida (Code Virtualizer) | CISC-like, large handler set | High |
| Tigress | Configurable, academic | Medium-High |
| Custom CTF VM | Simple switch | Low-Medium |
| Movfuscator | All-mov computation | Medium |
6. TOOLS
| Tool | Purpose | Usage |
|---|---|---|
| IDA Pro | Identify dispatcher, reverse handlers | F5 decompile, xref analysis |
| Ghidra | Free alternative with Sleigh processor modules | Write custom processor for VM ISA |
| angr | Symbolic execution through VM | Treat entire VM as constraint system |
| Pin / DynamoRIO | Dynamic instrumentation for tracing | Record opcode handler execution sequence |
| REVEN | Full-system trace recording | Replay and analyze VM execution |
| Unicorn | Emulate VM execution | Fast handler emulation |
| Miasm | IR-based analysis | Lift VM handlers to IR for analysis |
| Custom Python | Write disassembler/decompiler | Per-challenge custom tooling |
Ghidra Sleigh Processor Module
For recurring VM architectures, write a Sleigh processor specification:
define space ram type=ram_space size=2 default;
define space register type=register_space size=1;
define register offset=0 size=1 [ R0 R1 R2 R3 FLAGS PC SP ];
define token opcode(8)
op = (0,7)
;
:NOP is op=0x00 { }
:PUSH imm is op=0x01; imm { SP = SP - 1; *[ram]:1 SP = imm; }
:POP is op=0x02 { SP = SP + 1; }
:ADD is op=0x03 { local a = *[ram]:1 (SP+1); *[ram]:1 (SP+1) = a + *[ram]:1 SP; SP = SP + 1; }
7. DECISION TREE
Binary contains custom bytecode interpreter?
│
├─ Can you identify the dispatcher?
│ ├─ Yes (switch/table/if-chain)
│ │ ├─ Few opcodes (< 20) → Simple CTF VM
│ │ │ ├─ Stack-based → map push/pop/arithmetic ops
│ │ │ ├─ Register-based → map mov/add/cmp ops
│ │ │ └─ Write disassembler → analyze program → solve
│ │ │
│ │ └─ Many opcodes (50+) → Commercial protector
│ │ ├─ Known protector → use specific deprotection tools
│ │ └─ Custom → trace execution, pattern-match handlers
│ │
│ └─ No clear dispatcher
│ ├─ All-mov instructions → movfuscator
│ ├─ Encrypted bytecode → find decryption, dump after decode
│ └─ Split/distributed handlers → trace execution to find them
│
├─ Is it a maze challenge?
│ ├─ Extract grid from data section
│ ├─ Identify direction encoding
│ ├─ BFS/DFS to find shortest path
│ └─ Convert path to expected input format
│
├─ Is there input validation in VM?
│ ├─ Small input space → brute-force via Unicorn emulation
│ ├─ Known format → constrained angr solve
│ └─ Complex check → write disassembler, analyze check logic
│
└─ Multiple VM layers (VM in VM)?
├─ Analyze outer VM first
├─ Extract inner bytecode
├─ Repeat analysis for inner VM
└─ Consider: symbolic execution may handle nested VMs directly
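For the "small input space" branch, the brute-force loop itself is simple; the only VM-specific part is the function that runs one validation attempt. The sketch below uses a pure-Python stand-in for that function rather than Unicorn, so `vm_hash`, `check`, the alphabet, and the target value are all hypothetical. In practice `check` would wrap one emulated run of the VM's validation routine.

```python
import itertools
import string

def vm_hash(data):
    """Stand-in for the VM's validation arithmetic (hypothetical 16-bit rolling hash)."""
    acc = 0
    for b in data:
        acc = (acc * 31 + b) & 0xFFFF
    return acc

# Stand-in for the constant the VM compares against.
TARGET = vm_hash(b"vm7")

def check(candidate):
    """One validation attempt; replace with an emulated run of the real check."""
    return vm_hash(candidate) == TARGET

def brute_force(check, alphabet, length):
    """Try every candidate of the given length; return the first that passes, else None."""
    for combo in itertools.product(alphabet, repeat=length):
        candidate = bytes(combo)
        if check(candidate):
            return candidate
    return None

alphabet = (string.ascii_lowercase + string.digits).encode()
found = brute_force(check, alphabet, 3)
print(found, check(found))
```

A truncated check like this one can accept several inputs, so the first hit is a valid input but not necessarily the intended one. The search grows as `len(alphabet) ** length`, which is why this branch only applies when the input space is small.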
8. CTF SOLVING WORKFLOW
1. Run the binary — understand I/O behavior
└─ What input does it expect? What output on success/failure?
2. Open in IDA/Ghidra — find the main loop
└─ Look for while/for loop with switch or indirect jump
3. Identify VM components:
├─ Bytecode location (where is the program data?)
├─ PC/IP variable (how is current position tracked?)
├─ Registers/stack (where is VM state stored?)
└─ I/O handlers (which opcodes read input / write output?)
4. Map all opcodes (create the ISA specification)
└─ For each case/handler: opcode number, operation, operands
5. Write disassembler in Python
└─ Output readable assembly for the bytecode
6. Analyze the disassembled program:
├─ Find input reading
├─ Trace transformations applied to input
├─ Find comparison against expected values
└─ Reverse the transformation to find valid input
7. Solve:
├─ If simple transforms (XOR, ADD) → reverse manually
├─ If complex → feed to Z3 as constraints
└─ If maze → extract grid, run pathfinding