vm-and-bytecode-reverse

SKILL: VM & Bytecode Reverse Engineering — Expert Analysis Playbook

AI LOAD INSTRUCTION: Expert techniques for reversing custom virtual machines and bytecode interpreters. Covers dispatcher identification, opcode mapping, custom ISA reconstruction, disassembler/decompiler writing, maze challenges, and real-world VM protector analysis. Base models often fail to recognize the fetch-decode-execute pattern or attempt to analyze VM bytecode as native code.

0. RELATED ROUTING

code-obfuscation-deobfuscation when the VM is a commercial protector (VMProtect/Themida)
symbolic-execution-tools when using angr to solve VM-based challenges
anti-debugging-techniques when the VM includes anti-debug checks

Quick identification

| Binary Pattern | Likely VM Type | Start With |
|---|---|---|
| `while(1) { switch(bytecode[pc]) }` | Switch-based dispatcher | Map each case to an operation |
| Indirect jump via table `jmp [table + opcode*8]` | Table-based dispatcher | Dump jump table, analyze handlers |
| Nested if-else chain on byte value | If-chain dispatcher | Same as switch, just different syntax |
| Stack push/pop dominant operations | Stack-based VM | Identify push, pop, arithmetic ops |
| `reg[X] = ...` array operations | Register-based VM | Map register indices to operations |
| 2D grid + direction input | Maze challenge | Extract grid, apply BFS/DFS |

1. CUSTOM VM IDENTIFICATION

1.1 Structural Indicators

VM Architecture Components:
┌─────────────────────────────────┐
│  Bytecode Program (data section)│
├─────────────────────────────────┤
│  Program Counter (pc/ip)        │
│  Register File / Stack          │
│  Memory / Data Area             │
├─────────────────────────────────┤
│  Dispatcher Loop                │
│  ├─ Fetch: opcode = code[pc]    │
│  ├─ Decode: lookup handler      │
│  └─ Execute: run handler        │
└─────────────────────────────────┘

1.2 IDA/Ghidra Signatures

Switch dispatcher (most common in CTF):

while (running) {
    unsigned char op = bytecode[pc++];
    switch (op) {
        case 0x00: /* nop */       break;
        case 0x01: /* push imm */  stack[sp++] = bytecode[pc++]; break;
        case 0x02: /* add */       stack[sp-2] += stack[sp-1]; sp--; break;
        // ...
        case 0xFF: /* halt */      running = 0; break;
    }
}

Table dispatcher (more optimized):

typedef void (*handler_t)(vm_ctx_t*);
handler_t handlers[256] = { handle_nop, handle_push, handle_add, ... };

while (running) {
    handlers[bytecode[pc++]](&ctx);
}
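Dumping a table-based dispatcher usually comes down to reading consecutive pointers from the data section. The sketch below assumes a little-endian table of 8-byte (or 4-byte) pointers; `dump_jump_table`, the synthetic `blob`, and the handler addresses are hypothetical names and values chosen for illustration.

```python
import struct

def dump_jump_table(blob, table_offset, entries, ptr_size=8):
    """Return handler addresses read from a little-endian pointer table."""
    code = "Q" if ptr_size == 8 else "I"      # 64-bit or 32-bit pointers
    raw = blob[table_offset:table_offset + ptr_size * entries]
    return list(struct.unpack("<" + code * entries, raw))

# Synthetic blob standing in for a dumped data section (made-up addresses).
blob = b"\x00" * 16 + struct.pack("<3Q", 0x401000, 0x401040, 0x401090)
handlers = dump_jump_table(blob, table_offset=16, entries=3)

for opcode, addr in enumerate(handlers):
    print(f"opcode {opcode:#04x} -> handler at {addr:#x}")
```

The table index is the opcode value, so the printed list doubles as a starting point for the opcode map.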

2. ANALYSIS METHODOLOGY

Step 1: Find the Dispatcher

Look for:

Large switch statement (many cases) in a loop
Array of function pointers indexed by a byte from a data buffer
Single function with high cyclomatic complexity
Cross-references to a data buffer read byte-by-byte

Step 2: Map Opcodes to Operations

For each case/handler, determine:

| Property | How to Identify |
|---|---|
| Opcode value | Case number or table index |
| Operation type | Register/stack modifications |
| Operand count | How many bytes consumed after opcode |
| Operand type | Immediate value, register index, or memory address |
| Side effects | Output, memory write, flag modification |
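Operand counts can also be cross-checked dynamically. The sketch below assumes you have logged the VM program counter at each dispatcher iteration (for example with a breakpoint on the fetch); `infer_lengths`, the sample bytecode, and the trace are hypothetical. It only trusts forward steps, so taken forward jumps will still show up as bogus lengths and need manual filtering.

```python
from collections import defaultdict

def infer_lengths(bytecode, pc_trace):
    """Map each opcode to the instruction lengths observed in a PC trace."""
    lengths = defaultdict(set)
    for cur, nxt in zip(pc_trace, pc_trace[1:]):
        if nxt > cur:                      # skip backward jumps
            lengths[bytecode[cur]].add(nxt - cur)
    return {op: sorted(sizes) for op, sizes in lengths.items()}

# Sample: push 5; push 7; add; halt (opcode values are made up).
bytecode = bytes([0x01, 0x05, 0x01, 0x07, 0x03, 0xFF])
pc_trace = [0, 2, 4, 5]
observed = infer_lengths(bytecode, pc_trace)

for op, sizes in observed.items():
    print(f"opcode {op:#04x}: observed lengths {sizes}")
```

An opcode that shows a single length is probably fixed-width; several lengths suggest a branch or a variable-length encoding.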

Step 3: Extract Bytecode Program

python

# Typical extraction from the binary on disk.
# bytecode_offset and bytecode_length are placeholders: take the file offset
# and size of the bytecode buffer from your disassembler.
with open('challenge', 'rb') as f:
    f.seek(bytecode_offset)
    bytecode = f.read(bytecode_length)

# Or from IDA (run inside IDAPython):
import idc
bytecode = idc.get_bytes(bytecode_addr, bytecode_len)

Step 4: Write Custom Disassembler

python

OPCODES = {
    0x00: ("nop",  0),    # (mnemonic, operand_bytes)
    0x01: ("push", 1),    # push immediate byte
    0x02: ("pop",  0),
    0x03: ("add",  0),
    0x04: ("sub",  0),
    0x05: ("xor",  0),
    0x06: ("cmp",  0),
    0x07: ("jmp",  2),    # jump to 16-bit address
    0x08: ("je",   2),
    0x09: ("jne",  2),
    0x0A: ("mov",  2),    # mov reg, imm
    0x0B: ("load", 1),    # load from memory[operand]
    0x0C: ("store",1),    # store to memory[operand]
    0x0D: ("print",0),
    0x0E: ("read", 0),    # read input
    0xFF: ("halt", 0),
}

def disassemble(bytecode):
    pc = 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if op not in OPCODES:
            print(f"  {pc:04x}: UNKNOWN {op:#04x}")
            pc += 1
            continue

        mnemonic, operand_size = OPCODES[op]
        operands = bytecode[pc+1:pc+1+operand_size]
        operand_str = ' '.join(f'{b:#04x}' for b in operands)
        print(f"  {pc:04x}: {mnemonic:8s} {operand_str}")
        pc += 1 + operand_size

disassemble(bytecode)

Step 5: Analyze Disassembled Program

With the custom disassembly, apply standard reverse engineering:

Identify input reading (read opcode)
Trace data flow from input to comparison
Determine success/failure conditions
Extract the check logic (often XOR/ADD transformations of input compared against constants)
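As a worked example of the last item, suppose the disassembly shows each input byte being XORed with a key, incremented by a constant, and compared against a stored table. The key `0x5A`, the delta `3`, and the sample flag below are invented for illustration; the point is that each step is undone in reverse order.

```python
KEY, DELTA = 0x5A, 3   # hypothetical constants read from the disassembly

def transform(data):
    """Forward check as the VM would apply it: (byte ^ KEY) + DELTA mod 256."""
    return bytes(((b ^ KEY) + DELTA) & 0xFF for b in data)

def invert(expected):
    """Undo the steps in reverse order: subtract first, then XOR."""
    return bytes(((c - DELTA) & 0xFF) ^ KEY for c in expected)

# Stands in for the constant table dumped from the binary.
expected = transform(b"flag{vm}")
recovered = invert(expected)
print(recovered)
```

XOR is its own inverse and addition modulo 256 is undone by subtraction, so no solver is needed for checks of this shape.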

3. COMMON VM PATTERNS IN CTF

3.1 Stack-Based VM

Operations work on a stack (like JVM or Python bytecode).

| Opcode | Operation | Stack Effect |
|---|---|---|
| PUSH imm | Push immediate value | [...] → [..., imm] |
| POP | Discard top | [..., a] → [...] |
| ADD | Add top two | [..., a, b] → [..., a+b] |
| SUB | Subtract | [..., a, b] → [..., a-b] |
| MUL | Multiply | [..., a, b] → [..., a*b] |
| XOR | Bitwise XOR | [..., a, b] → [..., a^b] |
| CMP | Compare | [..., a, b] → [..., (a==b)] |
| JMP addr | Unconditional jump | no change |
| JZ addr | Jump if top is zero | [..., a] → [...] |
| PRINT | Output top as char | [..., a] → [...] |
| READ | Read char to stack | [...] → [..., input] |
| HALT | Stop execution | - |
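Re-implementing the VM in a few lines of Python is often the quickest way to test hypotheses about the bytecode. The sketch below follows the stack effects in the table above; the opcode numbers, the 8-bit cell width, and the single-byte jump targets are assumptions that must be replaced with what the target binary actually uses.

```python
# Opcode numbers are assumptions for illustration only.
PUSH, POP, ADD, SUB, MUL, XOR, CMP, JMP, JZ, PRINT, READ, HALT = range(12)

BINARY_OPS = {
    ADD: lambda a, b: a + b,
    SUB: lambda a, b: a - b,
    MUL: lambda a, b: a * b,
    XOR: lambda a, b: a ^ b,
    CMP: lambda a, b: int(a == b),
}

def run_stack_vm(code, user_input=b""):
    """Execute bytecode and return the bytes emitted by PRINT."""
    pc, stack, out = 0, [], bytearray()
    pending = list(user_input)
    while True:
        op = code[pc]                      # fetch
        pc += 1
        if op == PUSH:
            stack.append(code[pc])
            pc += 1
        elif op == POP:
            stack.pop()
        elif op in BINARY_OPS:
            b = stack.pop()                # top of stack is the right operand
            a = stack.pop()
            stack.append(BINARY_OPS[op](a, b) & 0xFF)
        elif op == JMP:
            pc = code[pc]
        elif op == JZ:
            target = code[pc]
            pc += 1
            if stack.pop() == 0:
                pc = target
        elif op == PRINT:
            out.append(stack.pop())
        elif op == READ:
            stack.append(pending.pop(0) if pending else 0)
        elif op == HALT:
            return bytes(out)
        else:
            raise ValueError(f"unknown opcode {op:#04x} at offset {pc - 1}")

# Read one character, flip its case bit, print it.
program = bytes([READ, PUSH, 0x20, XOR, PRINT, HALT])
output = run_stack_vm(program, b"a")
print(output)
```

Adding a `print` of `pc`, `op`, and `stack` at the top of the loop turns this into a tracer.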

3.2 Register-Based VM

Instructions name their operands by register index, as in x86 or ARM.

| Opcode | Format | Operation |
|---|---|---|
| MOV r, imm | `0x01 RR II II` | reg[R] = imm16 |
| MOV r1, r2 | `0x02 R1 R2` | reg[R1] = reg[R2] |
| ADD r1, r2 | `0x03 R1 R2` | reg[R1] += reg[R2] |
| SUB r1, r2 | `0x04 R1 R2` | reg[R1] -= reg[R2] |
| XOR r1, r2 | `0x05 R1 R2` | reg[R1] ^= reg[R2] |
| CMP r1, r2 | `0x06 R1 R2` | flags = compare(r1, r2) |
| JMP addr | `0x07 AA AA` | pc = addr |
| JE addr | `0x08 AA AA` | if equal: pc = addr |
| LOAD r, [addr] | `0x09 RR AA` | reg[R] = mem[addr] |
| STORE [addr], r | `0x0A AA RR` | mem[addr] = reg[R] |
| SYSCALL | `0x0B` | I/O operation based on reg[0] |
| HALT | `0xFF` | stop |
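The formats in the table above translate directly into a layout-driven disassembler. In this sketch each opcode carries a layout string (`r` = register byte, `b` = 8-bit memory address, `w` = 16-bit word); little-endian word order is an assumption, since the table does not state it, and `FORMATS` and `disassemble_reg_vm` are illustrative names.

```python
FORMATS = {
    0x01: ("mov", "rw"),     # reg, imm16
    0x02: ("mov", "rr"),
    0x03: ("add", "rr"),
    0x04: ("sub", "rr"),
    0x05: ("xor", "rr"),
    0x06: ("cmp", "rr"),
    0x07: ("jmp", "w"),
    0x08: ("je", "w"),
    0x09: ("load", "rb"),    # reg, [addr8]
    0x0A: ("store", "br"),   # [addr8], reg
    0x0B: ("syscall", ""),
    0xFF: ("halt", ""),
}

def disassemble_reg_vm(code):
    """Return one text line per decoded instruction."""
    pc, lines = 0, []
    while pc < len(code):
        start = pc
        mnemonic, layout = FORMATS.get(code[pc], (f"db {code[pc]:#04x}", ""))
        pc += 1
        operands = []
        for kind in layout:
            if kind == "r":
                operands.append(f"r{code[pc]}")
                pc += 1
            elif kind == "b":
                operands.append(f"[{code[pc]:#04x}]")
                pc += 1
            else:  # "w": 16-bit word, assumed little-endian
                word = int.from_bytes(code[pc:pc + 2], "little")
                operands.append(f"{word:#06x}")
                pc += 2
        lines.append(f"{start:04x}: {mnemonic} {', '.join(operands)}".rstrip())
    return lines

sample = bytes([0x01, 0x00, 0x34, 0x12, 0x03, 0x00, 0x01, 0xFF])
listing = disassemble_reg_vm(sample)
print("\n".join(listing))
```

Unknown bytes are emitted as `db` lines so a wrong opcode map shows up as visible noise rather than a silent desync.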

3.3 Brainfuck-like / Esoteric VMs

| BF Command | VM Equivalent | Description |
|---|---|---|
| `>` | INC ptr | Move data pointer right |
| `<` | DEC ptr | Move data pointer left |
| `+` | INC [ptr] | Increment byte at pointer |
| `-` | DEC [ptr] | Decrement byte at pointer |
| `.` | OUTPUT [ptr] | Output byte at pointer |
| `,` | INPUT [ptr] | Input byte to pointer |
| `[` | JZ forward | Jump past `]` if byte is zero |
| `]` | JNZ back | Jump back to `[` if byte is nonzero |
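Once a custom opcode set has been mapped onto these eight commands, a small interpreter is enough to run or trace the program. The sketch below works on the standard symbols, so a challenge's opcode bytes would first be translated to them. The 30000-cell tape, 8-bit wrap-around, and reading 0 at end of input are common conventions assumed here, not guarantees about any given target.

```python
def run_bf(program, user_input=b"", tape_len=30000):
    """Interpret a Brainfuck program and return its output bytes."""
    # Pre-compute matching bracket positions in one pass.
    jump, open_stack = {}, []
    for i, ch in enumerate(program):
        if ch == "[":
            open_stack.append(i)
        elif ch == "]":
            j = open_stack.pop()
            jump[i], jump[j] = j, i

    tape, ptr, pc, out = bytearray(tape_len), 0, 0, bytearray()
    pending = iter(user_input)
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) & 0xFF
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) & 0xFF
        elif ch == ".":
            out.append(tape[ptr])
        elif ch == ",":
            tape[ptr] = next(pending, 0)
        elif ch == "[" and tape[ptr] == 0:
            pc = jump[pc]          # land on the matching ']'
        elif ch == "]" and tape[ptr] != 0:
            pc = jump[pc]          # land on the matching '['
        pc += 1                    # step past the current or matched bracket
    return bytes(out)

# 8 * 8 + 1 = 65, the ASCII code for 'A'.
result = run_bf("++++++++[>++++++++<-]>+.")
print(result)
```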

4. MAZE CHALLENGES

4.1 Identification

Binary reads directional input (WASD, arrow keys, UDLR)
2D array in data section (walls, paths, start, end)
Position tracking with x,y coordinates
Win condition at specific coordinates

4.2 Map Extraction

python

# Extract maze grid from binary data section.
# MAZE_ADDR, WIDTH and HEIGHT are example values; read the real ones from the binary.
MAZE_ADDR = 0x601060
WIDTH = 20
HEIGHT = 15

# From binary dump: `bytecode` holds the dumped section and `base_addr` is the
# virtual address of its first byte (both supplied by the analyst).
maze = []
for row in range(HEIGHT):
    line = ""
    for col in range(WIDTH):
        cell = bytecode[MAZE_ADDR + row * WIDTH + col - base_addr]
        if cell == 0:
            line += "."   # path
        elif cell == 1:
            line += "#"   # wall
        elif cell == 2:
            line += "S"   # start
        elif cell == 3:
            line += "E"   # end
        else:
            line += "?"
    maze.append(line)
    print(line)

4.3 Automated Solving

python

from collections import deque

def solve_maze(maze, start, end):
    """BFS solver returns direction string."""
    rows, cols = len(maze), len(maze[0])
    directions = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}
    queue = deque([(start, "")])
    visited = {start}

    while queue:
        (r, c), path = queue.popleft()
        if (r, c) == end:
            return path

        for name, (dr, dc) in directions.items():
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and
                maze[nr][nc] != '#' and (nr, nc) not in visited):
                visited.add((nr, nc))
                queue.append(((nr, nc), path + name))

    return None

# Find start and end positions
for r, row in enumerate(maze):
    for c, cell in enumerate(row):
        if cell == 'S':
            start = (r, c)
        if cell == 'E':
            end = (r, c)

solution = solve_maze(maze, start, end)
print(f"Path: {solution}")

4.4 Direction Encoding

Different challenges encode directions differently:

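| Encoding | Up | Down | Left | Right |
|---|---|---|---|---|
| WASD | W | S | A | D |
| UDLR | U | D | L | R |
| Arrow keys (PC scan codes) | ↑ (0x48) | ↓ (0x50) | ← (0x4B) | → (0x4D) |
| Numbers | 1 | 2 | 3 | 4 |
| Hex opcodes | 0x01 | 0x02 | 0x03 | 0x04 |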
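A solver that emits a `UDLR` path needs one final translation into whatever the binary expects. The sketch below covers two of the encodings from the table; `ENCODINGS` and `encode_path` are illustrative names, and the mapping for a real challenge should be confirmed against its input handler.

```python
ENCODINGS = {
    "WASD":    {"U": "W", "D": "S", "L": "A", "R": "D"},
    "numbers": {"U": "1", "D": "2", "L": "3", "R": "4"},
}

def encode_path(path, scheme):
    """Translate a UDLR path string into the target encoding, step by step."""
    table = ENCODINGS[scheme]
    return "".join(table[step] for step in path)

wasd = encode_path("RRDL", "WASD")
digits = encode_path("RRDL", "numbers")
print(wasd, digits)
```

Translating one step at a time avoids the trap of chained `str.replace` calls, where `D` (down) becomes `S` while `R` becomes `D` and the two substitutions interfere.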

5. REAL-WORLD VM PROTECTORS

5.1 VMProtect Analysis Approach

1. Find VM entry: search for pushad/pushfd sequence
2. Identify VM context structure (registers, flags, bytecode pointer)
3. Locate handler table (often obfuscated with opaque predicates)
4. For each handler:
   a. Remove junk code / opaque predicates
   b. Identify the core operation
   c. Document handler semantics
5. Trace bytecode execution (instruction-level trace)
6. Reconstruct original code from trace
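Steps 5 and 6 usually start by collapsing a raw instruction trace into a sequence of handler invocations. The sketch below assumes the trace is a list of executed addresses (for example from Pin or DynamoRIO) and that each handler is always entered at its first instruction; the addresses and the names `vm_push` and `vm_add` are invented. A handler that loops back to its own entry would be over-counted under this assumption.

```python
from collections import Counter

def handler_sequence(trace, handler_names):
    """Keep only the addresses that are handler entry points, in order."""
    return [handler_names[addr] for addr in trace if addr in handler_names]

# Hypothetical handler entry points documented in step 4.
handler_names = {0x401000: "vm_push", 0x402000: "vm_add"}

# Hypothetical instruction-level trace.
trace = [0x401000, 0x401004, 0x402000, 0x402003,
         0x401000, 0x401004, 0x401000]

sequence = handler_sequence(trace, handler_names)
counts = Counter(sequence)
print(sequence)
print(counts.most_common())
```

The resulting sequence is effectively the executed virtual instruction stream, and the counts show which handlers are worth documenting first.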

5.2 Tigress Obfuscator

Academic VM obfuscator with configurable protection layers.

| Feature | Approach |
|---|---|
| Single-dispatch VM | Standard handler extraction |
| Split handlers | Handlers spread across multiple functions |
| Nested VMs | Outer VM handler invokes inner VM |
| Encrypted bytecode | Dynamic decryption before each fetch |
| Polymorphic handlers | Different code for same operation on each build |

5.3 Common VM Protector Patterns

| Protector | Dispatcher Style | Difficulty |
|---|---|---|
| VMProtect | Table + opaque predicates | High |
| Themida (Code Virtualizer) | CISC-like, large handler set | High |
| Tigress | Configurable, academic | Medium-High |
| Custom CTF VM | Simple switch | Low-Medium |
| Movfuscator | All-mov computation | Medium |

6. TOOLS

| Tool | Purpose | Usage |
|---|---|---|
| IDA Pro | Identify dispatcher, reverse handlers | F5 decompile, xref analysis |
| Ghidra | Free alternative with Sleigh processor modules | Write custom processor for VM ISA |
| angr | Symbolic execution through VM | Treat entire VM as constraint system |
| Pin / DynamoRIO | Dynamic instrumentation for tracing | Record opcode handler execution sequence |
| REVEN | Full-system trace recording | Replay and analyze VM execution |
| Unicorn | Emulate VM execution | Fast handler emulation |
| Miasm | IR-based analysis | Lift VM handlers to IR for analysis |
| Custom Python | Write disassembler/decompiler | Per-challenge custom tooling |

Ghidra Sleigh Processor Module

For recurring VM architectures, write a Sleigh processor specification:

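define endian=little;
define alignment=1;

define space ram      type=ram_space      size=2  default;
define space register type=register_space size=1;

# 8-bit general registers; SP and PC are 16-bit so they can address ram
define register offset=0x00 size=1 [ R0 R1 R2 R3 FLAGS ];
define register offset=0x10 size=2 [ PC SP ];

define token opcode(8)
    op = (0,7)
;

# 8-bit immediate operand that follows the opcode byte
define token data8(8)
    imm = (0,7)
;

:NOP      is op=0x00 { }
:PUSH imm is op=0x01; imm { SP = SP - 1; *[ram]:1 SP = imm; }
:POP      is op=0x02 { SP = SP + 1; }
:ADD      is op=0x03 { local a:1 = *[ram]:1 (SP+1); *[ram]:1 (SP+1) = a + *[ram]:1 SP; SP = SP + 1; }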

7. DECISION TREE

Binary contains custom bytecode interpreter?
│
├─ Can you identify the dispatcher?
│  ├─ Yes (switch/table/if-chain)
│  │  ├─ Few opcodes (< 20) → Simple CTF VM
│  │  │  ├─ Stack-based → map push/pop/arithmetic ops
│  │  │  ├─ Register-based → map mov/add/cmp ops
│  │  │  └─ Write disassembler → analyze program → solve
│  │  │
│  │  └─ Many opcodes (50+) → Commercial protector
│  │     ├─ Known protector → use specific deprotection tools
│  │     └─ Custom → trace execution, pattern-match handlers
│  │
│  └─ No clear dispatcher
│     ├─ All-mov instructions → movfuscator
│     ├─ Encrypted bytecode → find decryption, dump after decode
│     └─ Split/distributed handlers → trace execution to find them
│
├─ Is it a maze challenge?
│  ├─ Extract grid from data section
│  ├─ Identify direction encoding
│  ├─ BFS/DFS to find shortest path
│  └─ Convert path to expected input format
│
├─ Is there input validation in VM?
│  ├─ Small input space → brute-force via Unicorn emulation
│  ├─ Known format → constrained angr solve
│  └─ Complex check → write disassembler, analyze check logic
│
└─ Multiple VM layers (VM in VM)?
   ├─ Analyze outer VM first
   ├─ Extract inner bytecode
   ├─ Repeat analysis for inner VM
   └─ Consider: symbolic execution may handle nested VMs directly

8. CTF SOLVING WORKFLOW

1. Run the binary — understand I/O behavior
   └─ What input does it expect? What output on success/failure?

2. Open in IDA/Ghidra — find the main loop
   └─ Look for while/for loop with switch or indirect jump

3. Identify VM components:
   ├─ Bytecode location (where is the program data?)
   ├─ PC/IP variable (how is current position tracked?)
   ├─ Registers/stack (where is VM state stored?)
   └─ I/O handlers (which opcodes read input / write output?)

4. Map all opcodes (create the ISA specification)
   └─ For each case/handler: opcode number, operation, operands

5. Write disassembler in Python
   └─ Output readable assembly for the bytecode

6. Analyze the disassembled program:
   ├─ Find input reading
   ├─ Trace transformations applied to input
   ├─ Find comparison against expected values
   └─ Reverse the transformation to find valid input

7. Solve:
   ├─ If simple transforms (XOR, ADD) → reverse manually
   ├─ If complex → feed to Z3 as constraints
   └─ If maze → extract grid, run pathfinding
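Between manual inversion and a full constraint solver there is a cheap middle option: when the check treats each input byte independently, every position can be brute-forced over its 256 possible values. The sketch below assumes that independence; `check_byte` is a hypothetical stand-in for the transform recovered in step 6 (or for a call into an emulated handler).

```python
def check_byte(index, value):
    """Hypothetical per-byte transform standing in for the VM's check."""
    return (((value * 7) + index) & 0xFF) ^ 0x33

def recover_input(expected):
    """Try all 256 values per position and keep the unique match."""
    recovered = bytearray()
    for index, target in enumerate(expected):
        matches = [v for v in range(256) if check_byte(index, v) == target]
        if len(matches) != 1:
            raise ValueError(f"byte {index}: {len(matches)} candidates")
        recovered.append(matches[0])
    return bytes(recovered)

# Stands in for the comparison constants dumped from the bytecode.
expected = bytes(check_byte(i, b) for i, b in enumerate(b"CTF{vm}"))
recovered = recover_input(expected)
print(recovered)
```

If a position yields zero or several candidates, the bytes are probably not independent, and a solver such as Z3 becomes the better tool.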
