vm-and-bytecode-reverse

SKILL: VM & Bytecode Reverse Engineering — Expert Analysis Playbook

AI LOAD INSTRUCTION: Expert techniques for reversing custom virtual machines and bytecode interpreters. Covers dispatcher identification, opcode mapping, custom ISA reconstruction, disassembler/decompiler writing, maze challenges, and real-world VM protector analysis. Base models often fail to recognize the fetch-decode-execute pattern or attempt to analyze VM bytecode as native code.

0. RELATED ROUTING

  • code-obfuscation-deobfuscation when the VM is a commercial protector (VMProtect/Themida)
  • symbolic-execution-tools when using angr to solve VM-based challenges
  • anti-debugging-techniques when the VM includes anti-debug checks

Quick identification

| Binary Pattern | Likely VM Type | Start With |
|---|---|---|
| `while(1) { switch(bytecode[pc]) }` | Switch-based dispatcher | Map each case to an operation |
| Indirect jump via table: `jmp [table + opcode*8]` | Table-based dispatcher | Dump jump table, analyze handlers |
| Nested if-else chain on byte value | If-chain dispatcher | Same as switch, just different syntax |
| Stack push/pop dominant operations | Stack-based VM | Identify push, pop, arithmetic ops |
| `reg[X] = ...` array operations | Register-based VM | Map register indices to operations |
| 2D grid + direction input | Maze challenge | Extract grid, apply BFS/DFS |

1. CUSTOM VM IDENTIFICATION

1.1 Structural Indicators

VM Architecture Components:
┌─────────────────────────────────┐
│  Bytecode Program (data section)│
├─────────────────────────────────┤
│  Program Counter (pc/ip)        │
│  Register File / Stack          │
│  Memory / Data Area             │
├─────────────────────────────────┤
│  Dispatcher Loop                │
│  ├─ Fetch: opcode = code[pc]    │
│  ├─ Decode: lookup handler      │
│  └─ Execute: run handler        │
└─────────────────────────────────┘

1.2 IDA/Ghidra Signatures

Switch dispatcher (most common in CTF):

```c
while (running) {
    unsigned char op = bytecode[pc++];
    switch (op) {
        case 0x00: /* nop */       break;
        case 0x01: /* push imm */  stack[sp++] = bytecode[pc++]; break;
        case 0x02: /* add */       stack[sp-2] += stack[sp-1]; sp--; break;
        // ...
        case 0xFF: /* halt */      running = 0; break;
    }
}
```

Table dispatcher (more optimized):

```c
typedef void (*handler_t)(vm_ctx_t*);
handler_t handlers[256] = { handle_nop, handle_push, handle_add, ... };

while (running) {
    handlers[bytecode[pc++]](&ctx);
}
```
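
For a table-based dispatcher, the handler table can also be dumped straight from the file rather than read by hand. The following is a minimal sketch, assuming 64-bit little-endian absolute pointers stored contiguously (the layout implied by `jmp [table + opcode*8]`); `dump_handler_table`, `group_by_handler`, the offset, and the demo bytes are all hypothetical names and values for illustration.

```python
import struct
from collections import defaultdict

def dump_handler_table(blob, table_offset, count=256, ptr_size=8):
    """Read `count` little-endian pointers starting at `table_offset`.

    Assumption: entries are raw absolute addresses. Position-independent
    binaries may store relative offsets instead and need an extra fix-up.
    """
    fmt = "<" + ("Q" if ptr_size == 8 else "I") * count
    return list(struct.unpack_from(fmt, blob, table_offset))

def group_by_handler(table):
    """Map handler address -> list of opcodes that dispatch to it.

    Many opcodes sharing one address usually indicates a default or
    invalid-opcode handler.
    """
    groups = defaultdict(list)
    for opcode, addr in enumerate(table):
        groups[addr].append(opcode)
    return dict(groups)

# Synthetic demo: 4 entries, opcodes 2 and 3 share one handler.
demo = b"\x00" * 16 + struct.pack("<4Q", 0x401000, 0x401020, 0x401F00, 0x401F00)
table = dump_handler_table(demo, table_offset=16, count=4)
for addr, opcodes in group_by_handler(table).items():
    print(f"{addr:#x}: opcodes {[hex(o) for o in opcodes]}")
```

Grouping by address first keeps the manual work proportional to the number of distinct handlers rather than to the 256 table slots.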

2. ANALYSIS METHODOLOGY

Step 1: Find the Dispatcher

Look for:
  • Large switch statement (many cases) in a loop
  • Array of function pointers indexed by a byte from a data buffer
  • Single function with high cyclomatic complexity
  • Cross-references to a data buffer read byte-by-byte

Step 2: Map Opcodes to Operations

For each case/handler, determine:

| Property | How to Identify |
|---|---|
| Opcode value | Case number or table index |
| Operation type | Register/stack modifications |
| Operand count | How many bytes consumed after opcode |
| Operand type | Immediate value, register index, or memory address |
| Side effects | Output, memory write, flag modification |

Step 3: Extract Bytecode Program

```python
# Typical extraction from binary
import struct

with open('challenge', 'rb') as f:
    f.seek(bytecode_offset)
    bytecode = f.read(bytecode_length)

# Or from IDA:
bytecode = idc.get_bytes(bytecode_addr, bytecode_len)
```

Step 4: Write Custom Disassembler

```python
OPCODES = {
    0x00: ("nop",  0),    # (mnemonic, operand_bytes)
    0x01: ("push", 1),    # push immediate byte
    0x02: ("pop",  0),
    0x03: ("add",  0),
    0x04: ("sub",  0),
    0x05: ("xor",  0),
    0x06: ("cmp",  0),
    0x07: ("jmp",  2),    # jump to 16-bit address
    0x08: ("je",   2),
    0x09: ("jne",  2),
    0x0A: ("mov",  2),    # mov reg, imm
    0x0B: ("load", 1),    # load from memory[operand]
    0x0C: ("store",1),    # store to memory[operand]
    0x0D: ("print",0),
    0x0E: ("read", 0),    # read input
    0xFF: ("halt", 0),
}

def disassemble(bytecode):
    pc = 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if op not in OPCODES:
            print(f"  {pc:04x}: UNKNOWN {op:#04x}")
            pc += 1
            continue

        mnemonic, operand_size = OPCODES[op]
        operands = bytecode[pc+1:pc+1+operand_size]
        operand_str = ' '.join(f'{b:#04x}' for b in operands)
        print(f"  {pc:04x}: {mnemonic:8s} {operand_str}")
        pc += 1 + operand_size

disassemble(bytecode)
```

Step 5: Analyze Disassembled Program

With the custom disassembly, apply standard reverse engineering:
  • Identify input reading (read opcode)
  • Trace data flow from input to comparison
  • Determine success/failure conditions
  • Extract the check logic (often XOR/ADD transformations of input compared against constants)
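
When the check turns out to be a per-byte XOR/ADD transformation compared against constants, it can be undone by applying the inverse operations in reverse order. The sketch below assumes a hypothetical check of the form `((input[i] ^ KEY[i % len(KEY)]) + ADD) & 0xFF == EXPECTED[i]`; the constants and the names `transform` and `invert` are invented for illustration and would come from the disassembled program in practice.

```python
def transform(data, key, add):
    """Forward check as a VM might compute it: (byte XOR key) + add, mod 256."""
    return bytes(((b ^ key[i % len(key)]) + add) & 0xFF
                 for i, b in enumerate(data))

def invert(expected, key, add):
    """Undo the steps in reverse order: subtract first, then XOR."""
    return bytes(((e - add) & 0xFF) ^ key[i % len(key)]
                 for i, e in enumerate(expected))

KEY = b"\x13\x37"   # hypothetical key bytes recovered from the bytecode
ADD = 5             # hypothetical additive constant
# Stand-in for the constants the VM compares against.
EXPECTED = transform(b"flag{vm}", KEY, ADD)

recovered = invert(EXPECTED, KEY, ADD)
print(recovered)
```

Re-running `transform` on the recovered bytes and comparing with the constants is a cheap sanity check that the opcode semantics were mapped correctly.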

3. COMMON VM PATTERNS IN CTF

3.1 Stack-Based VM

Operations work on a stack (like JVM or Python bytecode).

| Opcode | Operation | Stack Effect |
|---|---|---|
| PUSH imm | Push immediate value | [...] → [..., imm] |
| POP | Discard top | [..., a] → [...] |
| ADD | Add top two | [..., a, b] → [..., a+b] |
| SUB | Subtract | [..., a, b] → [..., a-b] |
| MUL | Multiply | [..., a, b] → [..., a*b] |
| XOR | Bitwise XOR | [..., a, b] → [..., a^b] |
| CMP | Compare | [..., a, b] → [..., (a==b)] |
| JMP addr | Unconditional jump | no change |
| JZ addr | Jump if top is zero | [..., a] → [...] |
| PRINT | Output top as char | [..., a] → [...] |
| READ | Read char to stack | [...] → [..., input] |
| HALT | Stop execution | - |

3.2 Register-Based VM

Operations use register indices (like x86, ARM).

| Opcode | Format | Operation |
|---|---|---|
| MOV r, imm | `0x01 RR II II` | `reg[R] = imm16` |
| MOV r1, r2 | `0x02 R1 R2` | `reg[R1] = reg[R2]` |
| ADD r1, r2 | `0x03 R1 R2` | `reg[R1] += reg[R2]` |
| SUB r1, r2 | `0x04 R1 R2` | `reg[R1] -= reg[R2]` |
| XOR r1, r2 | `0x05 R1 R2` | `reg[R1] ^= reg[R2]` |
| CMP r1, r2 | `0x06 R1 R2` | `flags = compare(r1, r2)` |
| JMP addr | `0x07 AA AA` | `pc = addr` |
| JE addr | `0x08 AA AA` | `if equal: pc = addr` |
| LOAD r, [addr] | `0x09 RR AA` | `reg[R] = mem[addr]` |
| STORE [addr], r | `0x0A AA RR` | `mem[addr] = reg[R]` |
| SYSCALL | `0x0B` | I/O operation based on `reg[0]` |
| HALT | `0xFF` | stop |
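
Because register-based encodings mix operand kinds, a format-driven disassembler tends to be easier to maintain than one keyed on byte counts alone. The following is a sketch for the example encoding in the table above; little-endian 16-bit values and the operand-kind labels (`r`, `v16`, `a8`) are assumptions made for illustration.

```python
# Operand kinds: "r" = register index byte, "v16" = 16-bit immediate or
# jump target, "a8" = 8-bit memory address (as in the LOAD/STORE rows).
FORMATS = {
    0x01: ("mov",     ["r", "v16"]),
    0x02: ("mov",     ["r", "r"]),
    0x03: ("add",     ["r", "r"]),
    0x04: ("sub",     ["r", "r"]),
    0x05: ("xor",     ["r", "r"]),
    0x06: ("cmp",     ["r", "r"]),
    0x07: ("jmp",     ["v16"]),
    0x08: ("je",      ["v16"]),
    0x09: ("load",    ["r", "a8"]),
    0x0A: ("store",   ["a8", "r"]),
    0x0B: ("syscall", []),
    0xFF: ("halt",    []),
}

def disassemble(code):
    """Return a list of 'offset: mnemonic operands' strings for `code` (bytes)."""
    pc, lines = 0, []
    while pc < len(code):
        start, op = pc, code[pc]
        pc += 1
        if op not in FORMATS:
            lines.append(f"{start:04x}: .byte {op:#04x}")   # unknown opcode
            continue
        mnemonic, kinds = FORMATS[op]
        args = []
        for kind in kinds:
            if kind == "r":
                args.append(f"r{code[pc]}"); pc += 1
            elif kind == "a8":
                args.append(f"[{code[pc]:#04x}]"); pc += 1
            else:  # 16-bit value, little-endian by assumption
                args.append(f"{int.from_bytes(code[pc:pc + 2], 'little'):#06x}")
                pc += 2
        lines.append(f"{start:04x}: {mnemonic} {', '.join(args)}".rstrip())
    return lines

demo = bytes([0x01, 0x00, 0x34, 0x12,   # mov r0, 0x1234
              0x02, 0x01, 0x00,         # mov r1, r0
              0x05, 0x00, 0x01,         # xor r0, r1
              0xFF])                    # halt
print("\n".join(disassemble(demo)))
```

If the jump targets in the output do not land on instruction boundaries, the endianness or an operand width in `FORMATS` is probably wrong.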

3.3 Brainfuck-like / Esoteric VMs

| BF Command | VM Equivalent | Description |
|---|---|---|
| `>` | INC ptr | Move data pointer right |
| `<` | DEC ptr | Move data pointer left |
| `+` | INC [ptr] | Increment byte at pointer |
| `-` | DEC [ptr] | Decrement byte at pointer |
| `.` | OUTPUT [ptr] | Output byte at pointer |
| `,` | INPUT [ptr] | Input byte to pointer |
| `[` | JZ forward | Jump past `]` if byte is zero |
| `]` | JNZ back | Jump back to `[` if byte is nonzero |
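
Once a VM is recognized as BF-like, resolving the bracket pairs up front turns the program into a flat listing with explicit jump targets. The following is a small sketch of that translation using the mnemonics from the table; `bf_to_vm` is a hypothetical helper, and targets are instruction indices in the filtered program rather than source offsets.

```python
NAMES = {
    ">": "INC ptr", "<": "DEC ptr", "+": "INC [ptr]", "-": "DEC [ptr]",
    ".": "OUTPUT [ptr]", ",": "INPUT [ptr]", "[": "JZ", "]": "JNZ",
}

def bf_to_vm(src):
    """Translate BF source into (mnemonic, target) pairs with resolved jumps.

    `target` is None for non-jump instructions.
    """
    ops = [c for c in src if c in NAMES]       # other characters are comments
    program = [[NAMES[c], None] for c in ops]
    open_brackets = []
    for i, c in enumerate(ops):
        if c == "[":
            open_brackets.append(i)
        elif c == "]":
            if not open_brackets:
                raise ValueError(f"unmatched ']' at instruction {i}")
            j = open_brackets.pop()
            program[j][1] = i + 1   # JZ: continue just past the matching ]
            program[i][1] = j       # JNZ: jump back to the matching [
    if open_brackets:
        raise ValueError(f"unmatched '[' at instruction {open_brackets[-1]}")
    return [tuple(p) for p in program]

for index, (mnemonic, target) in enumerate(bf_to_vm("+[-].")):
    suffix = "" if target is None else f" -> {target}"
    print(f"{index:03d}: {mnemonic}{suffix}")
```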

4. MAZE CHALLENGES

4.1 Identification

  • Binary reads directional input (WASD, arrow keys, UDLR)
  • 2D array in data section (walls, paths, start, end)
  • Position tracking with x,y coordinates
  • Win condition at specific coordinates

4.2 Map Extraction

```python
# Extract maze grid from binary data section
MAZE_ADDR = 0x601060
WIDTH = 20
HEIGHT = 15

# From binary dump:
maze = []
for row in range(HEIGHT):
    line = ""
    for col in range(WIDTH):
        cell = bytecode[MAZE_ADDR + row * WIDTH + col - base_addr]
        if cell == 0:
            line += "."    # path
        elif cell == 1:
            line += "#"    # wall
        elif cell == 2:
            line += "S"    # start
        elif cell == 3:
            line += "E"    # end
        else:
            line += "?"
    maze.append(line)
    print(line)
```

4.3 Automated Solving

```python
from collections import deque

def solve_maze(maze, start, end):
    """BFS solver returns direction string."""
    rows, cols = len(maze), len(maze[0])
    directions = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}
    queue = deque([(start, "")])
    visited = {start}

    while queue:
        (r, c), path = queue.popleft()
        if (r, c) == end:
            return path

        for name, (dr, dc) in directions.items():
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and
                maze[nr][nc] != '#' and (nr, nc) not in visited):
                visited.add((nr, nc))
                queue.append(((nr, nc), path + name))

    return None

# Find start and end positions
for r, row in enumerate(maze):
    for c, cell in enumerate(row):
        if cell == 'S':
            start = (r, c)
        if cell == 'E':
            end = (r, c)

solution = solve_maze(maze, start, end)
print(f"Path: {solution}")
```

4.4 Direction Encoding

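Different challenges encode directions differently:

| Encoding | Up | Down | Left | Right |
|---|---|---|---|---|
| WASD | W | S | A | D |
| UDLR | U | D | L | R |
| Arrow keys | ↑ (0x48) | ↓ (0x50) | ← (0x4B) | → (0x4D) |
| Numbers | 1 | 2 | 3 | 4 |
| Hex opcodes | 0x01 | 0x02 | 0x03 | 0x04 |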
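
A solver that emits a `UDLR` path still has to be re-encoded into whatever the binary actually reads. The following is a minimal sketch of that conversion for the encodings in the table; the dictionary keys and `encode_path` are hypothetical names, and some binaries may expect extra framing (for example a prefix byte before each arrow-key code) that is not modeled here.

```python
# Each table maps the solver's U/D/L/R steps to the byte the target expects.
ENCODINGS = {
    "wasd":    dict(zip("UDLR", b"WSAD")),
    "udlr":    dict(zip("UDLR", b"UDLR")),
    "arrows":  dict(zip("UDLR", b"\x48\x50\x4b\x4d")),
    "numbers": dict(zip("UDLR", b"1234")),
    "hex":     dict(zip("UDLR", b"\x01\x02\x03\x04")),
}

def encode_path(path, encoding):
    """Convert a path such as 'RRDL' into the bytes to feed the binary."""
    table = ENCODINGS[encoding]
    return bytes(table[step] for step in path)

print(encode_path("RRDL", "wasd"))     # keyboard-style input
print(encode_path("RRDL", "hex").hex())  # raw opcode-style input
```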

5. REAL-WORLD VM PROTECTORS

5.1 VMProtect Analysis Approach

1. Find VM entry: search for pushad/pushfd sequence
2. Identify VM context structure (registers, flags, bytecode pointer)
3. Locate handler table (often obfuscated with opaque predicates)
4. For each handler:
   a. Remove junk code / opaque predicates
   b. Identify the core operation
   c. Document handler semantics
5. Trace bytecode execution (instruction-level trace)
6. Reconstruct original code from trace
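
Steps 5 and 6 usually start from a raw address trace. The following is a rough sketch of post-processing such a trace, assuming it is already available as a list of executed native addresses (for example exported from an instrumentation tool) and that handler entry addresses were identified in step 3; the function names, addresses, and handler labels are invented for illustration. The "most-executed address" heuristic only applies when there is a central dispatcher, which is not guaranteed since some protectors chain directly from handler to handler.

```python
from collections import Counter

def guess_dispatcher(trace):
    """Heuristic: with a central dispatcher, its address is hit most often."""
    return Counter(trace).most_common(1)[0][0]

def handler_sequence(trace, handler_entries):
    """Reduce a native trace to the ordered list of VM handlers entered."""
    entries = set(handler_entries)
    return [addr for addr in trace if addr in entries]

def summarize(sequence, names):
    """Return (handler name, hit count) pairs, most frequent first."""
    return [(names.get(addr, f"h_{addr:x}"), n)
            for addr, n in Counter(sequence).most_common()]

# Synthetic trace: 0x1000 plays the dispatcher, 0x2xxx are handler bodies.
trace = [0x1000, 0x2000, 0x2004, 0x1000, 0x2100,
         0x1000, 0x2000, 0x2004, 0x1000, 0x2200]
names = {0x2000: "vPUSH", 0x2100: "vADD", 0x2200: "vRET"}  # hypothetical labels

sequence = handler_sequence(trace, names)
print(hex(guess_dispatcher(trace)))
print([names[a] for a in sequence])
print(summarize(sequence, names))
```

The handler sequence is effectively the executed virtual instruction stream, which is the input for reconstructing the original logic.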

5.2 Tigress Obfuscator

Academic VM obfuscator with configurable protection layers.

| Feature | Approach |
|---|---|
| Single-dispatch VM | Standard handler extraction |
| Split handlers | Handlers spread across multiple functions |
| Nested VMs | Outer VM handler invokes inner VM |
| Encrypted bytecode | Dynamic decryption before each fetch |
| Polymorphic handlers | Different code for same operation on each build |

5.3 Common VM Protector Patterns

| Protector | Dispatcher Style | Difficulty |
|---|---|---|
| VMProtect | Table + opaque predicates | High |
| Themida (Code Virtualizer) | CISC-like, large handler set | High |
| Tigress | Configurable, academic | Medium-High |
| Custom CTF VM | Simple switch | Low-Medium |
| Movfuscator | All-mov computation | Medium |

6. TOOLS

| Tool | Purpose | Usage |
|---|---|---|
| IDA Pro | Identify dispatcher, reverse handlers | F5 decompile, xref analysis |
| Ghidra | Free alternative with Sleigh processor modules | Write custom processor for VM ISA |
| angr | Symbolic execution through VM | Treat entire VM as constraint system |
| Pin / DynamoRIO | Dynamic instrumentation for tracing | Record opcode handler execution sequence |
| REVEN | Full-system trace recording | Replay and analyze VM execution |
| Unicorn | Emulate VM execution | Fast handler emulation |
| Miasm | IR-based analysis | Lift VM handlers to IR for analysis |
| Custom Python | Write disassembler/decompiler | Per-challenge custom tooling |

Ghidra Sleigh Processor Module

For recurring VM architectures, write a Sleigh processor specification:

```
define space ram      type=ram_space      size=2  default;
define space register type=register_space  size=1;

define register offset=0 size=1 [ R0 R1 R2 R3 FLAGS PC SP ];

define token opcode(8)
    op = (0,7)
;

:NOP    is op=0x00 { }
:PUSH   imm is op=0x01; imm { SP = SP - 1; *[ram]:1 SP = imm; }
:POP    is op=0x02 { SP = SP + 1; }
:ADD    is op=0x03 { local a = *[ram]:1 (SP+1); *[ram]:1 (SP+1) = a + *[ram]:1 SP; SP = SP + 1; }
```

7. DECISION TREE

Binary contains custom bytecode interpreter?
├─ Can you identify the dispatcher?
│  ├─ Yes (switch/table/if-chain)
│  │  ├─ Few opcodes (< 20) → Simple CTF VM
│  │  │  ├─ Stack-based → map push/pop/arithmetic ops
│  │  │  ├─ Register-based → map mov/add/cmp ops
│  │  │  └─ Write disassembler → analyze program → solve
│  │  │
│  │  └─ Many opcodes (50+) → Commercial protector
│  │     ├─ Known protector → use specific deprotection tools
│  │     └─ Custom → trace execution, pattern-match handlers
│  │
│  └─ No clear dispatcher
│     ├─ All-mov instructions → movfuscator
│     ├─ Encrypted bytecode → find decryption, dump after decode
│     └─ Split/distributed handlers → trace execution to find them
├─ Is it a maze challenge?
│  ├─ Extract grid from data section
│  ├─ Identify direction encoding
│  ├─ BFS/DFS to find shortest path
│  └─ Convert path to expected input format
├─ Is there input validation in VM?
│  ├─ Small input space → brute-force via Unicorn emulation
│  ├─ Known format → constrained angr solve
│  └─ Complex check → write disassembler, analyze check logic
└─ Multiple VM layers (VM in VM)?
   ├─ Analyze outer VM first
   ├─ Extract inner bytecode
   ├─ Repeat analysis for inner VM
   └─ Consider: symbolic execution may handle nested VMs directly

8. CTF SOLVING WORKFLOW

1. Run the binary — understand I/O behavior
   └─ What input does it expect? What output on success/failure?

2. Open in IDA/Ghidra — find the main loop
   └─ Look for while/for loop with switch or indirect jump

3. Identify VM components:
   ├─ Bytecode location (where is the program data?)
   ├─ PC/IP variable (how is current position tracked?)
   ├─ Registers/stack (where is VM state stored?)
   └─ I/O handlers (which opcodes read input / write output?)

4. Map all opcodes (create the ISA specification)
   └─ For each case/handler: opcode number, operation, operands

5. Write disassembler in Python
   └─ Output readable assembly for the bytecode

6. Analyze the disassembled program:
   ├─ Find input reading
   ├─ Trace transformations applied to input
   ├─ Find comparison against expected values
   └─ Reverse the transformation to find valid input

7. Solve:
   ├─ If simple transforms (XOR, ADD) → reverse manually
   ├─ If complex → feed to Z3 as constraints
   └─ If maze → extract grid, run pathfinding