re-source

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese
For structure recovery patterns, see:
references/struct-recovery-patterns.md
This skill teaches a methodology for recovering source-level understanding from compiled binaries using idasql. It orchestrates the other skills into a systematic workflow.

关于结构体恢复模式,请参阅:
references/struct-recovery-patterns.md
本技能介绍一种使用idasql从编译后的二进制文件中恢复源码级理解的方法论。它将其他技能编排成一套系统化的工作流程。

Core Workflow: Recursive Re-Sourcing

核心工作流程:递归逆向溯源

1. Start at a Function

1. 从某个函数开始

Pick a function — an entry point, a function referenced by an interesting string, or a callee of a known function.
sql
-- Decompile the target function
SELECT decompile(0x401000);

-- Or by name
SELECT decompile('DriverEntry');
选择一个函数——可以是入口点、由感兴趣的字符串引用的函数,或是已知函数的被调用函数。
sql
-- 反编译目标函数
SELECT decompile(0x401000);

-- 或按名称反编译
SELECT decompile('DriverEntry');

2. Annotate the Function

2. 注释函数

Use the
annotations
skill to edit the decompilation into something readable:
sql
-- Rename local variables to meaningful names
UPDATE ctree_lvars SET name = 'driver_object' WHERE func_addr = 0x401000 AND idx = 0;
UPDATE ctree_lvars SET name = 'registry_path' WHERE func_addr = 0x401000 AND idx = 1;

-- Apply types to arguments/locals
UPDATE ctree_lvars SET type = 'PDRIVER_OBJECT'
WHERE func_addr = 0x401000 AND idx = 0;

-- Inspect pseudocode anchors before writing comments
SELECT line_num, ea, line, comment
FROM pseudocode
WHERE func_addr = 0x401000
ORDER BY line_num;

-- Add inline comments explaining logic
-- Example below uses a previously resolved writable anchor, not the function entry row.
UPDATE pseudocode SET comment = 'initialize dispatch table'
WHERE func_addr = 0x401000 AND ea = 0x401020;

-- Verify changes
SELECT decompile(0x401000, 1);
使用
annotations
技能将反编译代码编辑为更易读的形式:
sql
-- 将局部变量重命名为有意义的名称
UPDATE ctree_lvars SET name = 'driver_object' WHERE func_addr = 0x401000 AND idx = 0;
UPDATE ctree_lvars SET name = 'registry_path' WHERE func_addr = 0x401000 AND idx = 1;

-- 为参数/局部变量设置类型
UPDATE ctree_lvars SET type = 'PDRIVER_OBJECT'
WHERE func_addr = 0x401000 AND idx = 0;

-- 添加注释前先查看伪代码锚点
SELECT line_num, ea, line, comment
FROM pseudocode
WHERE func_addr = 0x401000
ORDER BY line_num;

-- 添加解释逻辑的行内注释
-- 下面的示例使用之前解析的可写锚点,而非函数入口行。
UPDATE pseudocode SET comment = 'initialize dispatch table'
WHERE func_addr = 0x401000 AND ea = 0x401020;

-- 验证修改
SELECT decompile(0x401000, 1);

3. Set a Function Comment

3. 设置函数注释

Write a concise summary describing what the function does. This makes the function indexable for later queries. For exact trigger semantics (
function summary
/
func-summary
/ singular
add function comment
), follow the
annotations
skill's Function Summary contract.
sql
SELECT address, name, comment, rpt_comment
FROM funcs
WHERE address = 0x401000;

UPDATE funcs
SET rpt_comment = 'DriverEntry: initializes driver dispatch routines and device object'
WHERE address = 0x401000;
编写简洁的摘要描述函数功能,以便后续查询时能索引到该函数。 关于确切的触发语义(
function summary
/
func-summary
/ 单数形式
add function comment
),请遵循
annotations
技能的函数摘要约定。
sql
SELECT address, name, comment, rpt_comment
FROM funcs
WHERE address = 0x401000;

UPDATE funcs
SET rpt_comment = 'DriverEntry: initializes driver dispatch routines and device object'
WHERE address = 0x401000;

4. Recurse into Callees

4. 递归处理被调用函数

Follow calls inside the function. Annotate each callee the same way, building understanding bottom-up.
sql
-- List callees to visit
SELECT callee_name, printf('0x%X', callee_addr) as addr
FROM disasm_calls WHERE func_addr = 0x401000;

-- Or map the full call subtree at once (BFS with depth tracking)
SELECT func_name, depth FROM call_graph
WHERE start = 0x401000 AND direction = 'down' AND max_depth = 5;

-- Decompile each callee
SELECT decompile(0x401050);

-- Annotate and recurse...
跟踪函数内部的调用,以相同方式注释每个被调用函数,自底向上构建理解。
sql
-- 列出需要处理的被调用函数
SELECT callee_name, printf('0x%X', callee_addr) as addr
FROM disasm_calls WHERE func_addr = 0x401000;

-- 或一次性映射完整的调用子树(带深度追踪的广度优先搜索)
SELECT func_name, depth FROM call_graph
WHERE start = 0x401000 AND direction = 'down' AND max_depth = 5;

-- 反编译每个被调用函数
SELECT decompile(0x401050);

-- 注释并递归处理...

5. Recurse into Callers

5. 递归处理调用函数

Follow callers to build the bigger picture: how is this function used?
sql
-- Who calls this function?
SELECT caller_name, printf('0x%X', caller_addr) as addr
FROM callers WHERE func_addr = 0x401000;

-- Or map ALL transitive callers at once
SELECT func_name, depth FROM call_graph
WHERE start = 0x401000 AND direction = 'up' AND max_depth = 10;

-- Find the shortest path from an entry point to this function
SELECT step, func_name FROM shortest_path
WHERE from_addr = (SELECT address FROM funcs WHERE name = 'main')
  AND to_addr = 0x401000 AND max_depth = 20;

-- Decompile callers to see usage context
SELECT decompile(0x400F00);
跟踪调用函数以构建全局视图:该函数是如何被使用的?
sql
-- 哪些函数调用了此函数?
SELECT caller_name, printf('0x%X', caller_addr) as addr
FROM callers WHERE func_addr = 0x401000;

-- 或一次性映射所有间接调用函数
SELECT func_name, depth FROM call_graph
WHERE start = 0x401000 AND direction = 'up' AND max_depth = 10;

-- 找到从入口点到该函数的最短路径
SELECT step, func_name FROM shortest_path
WHERE from_addr = (SELECT address FROM funcs WHERE name = 'main')
  AND to_addr = 0x401000 AND max_depth = 20;

-- 反编译调用函数查看使用上下文
SELECT decompile(0x400F00);

6. Structure Recovery

6. 结构体恢复

The hardest part. Decompiled code often shows casts like
*(DWORD *)(a1 + 0x10)
— these are structure field accesses.
这是最困难的部分。反编译代码常出现类似
*(DWORD *)(a1 + 0x10)
的强制转换——这些都是结构体字段访问。

Step-by-step Process

分步流程

a) Identify offset patterns in a single function:
sql
-- Look at the decompiled code for cast patterns
SELECT decompile(0x401000);

-- Query ctree for pointer arithmetic (field accesses)
SELECT ea, op_name, num_value
FROM ctree WHERE func_addr = 0x401000
  AND op_name IN ('cot_add', 'cot_idx')
  AND num_value IS NOT NULL;
b) Cross-function correlation — find more fields:
sql
-- Find all callers that pass the same struct pointer
SELECT DISTINCT (SELECT name FROM funcs WHERE dc.func_addr >= address AND dc.func_addr < end_ea LIMIT 1) as caller
FROM disasm_calls dc
WHERE dc.callee_addr = 0x401000;

-- Decompile each caller and look for more offset accesses
-- Each caller may reveal different fields of the same struct
c) Callee correlation — let callees reveal field types:
sql
-- What does the function call with the struct pointer?
SELECT COALESCE(call_obj_name, call_helper_name) AS callee_name,
       arg_idx, arg_op, arg_num_value
FROM ctree_call_args
WHERE func_addr = 0x401000
  AND arg_var_name = 'a1';
-- If callee expects HANDLE, the field at that offset is a HANDLE
d) Build the struct incrementally:
sql
-- Create the struct
INSERT INTO types (name, kind) VALUES ('MY_CONTEXT', 'struct');

-- Get the ordinal
SELECT ordinal FROM types WHERE name = 'MY_CONTEXT';

-- Add fields as you discover them
INSERT INTO types_members (type_ordinal, member_name, member_type)
VALUES (42, 'handle', 'HANDLE');
INSERT INTO types_members (type_ordinal, member_name, member_type)
VALUES (42, 'buffer_ptr', 'void *');
INSERT INTO types_members (type_ordinal, member_name, member_type)
VALUES (42, 'buffer_size', 'unsigned int');
e) Apply the recovered struct:
sql
-- Apply to function prototype
UPDATE funcs SET prototype = 'int __fastcall process_context(MY_CONTEXT *ctx);'
WHERE address = 0x401000;

-- Or apply to a local variable
UPDATE ctree_lvars SET type = 'MY_CONTEXT *'
WHERE func_addr = 0x401000 AND idx = 0;

-- Re-decompile to verify clean rendering
SELECT decompile(0x401000, 1);
a) 在单个函数中识别偏移模式:
sql
-- 查看反编译代码中的强制转换模式
SELECT decompile(0x401000);

-- 查询ctree中的指针运算(字段访问)
SELECT ea, op_name, num_value
FROM ctree WHERE func_addr = 0x401000
  AND op_name IN ('cot_add', 'cot_idx')
  AND num_value IS NOT NULL;
b) 跨函数关联——发现更多字段:
sql
-- 找到所有传递相同结构体指针的调用函数
SELECT DISTINCT (SELECT name FROM funcs WHERE dc.func_addr >= address AND dc.func_addr < end_ea LIMIT 1) as caller
FROM disasm_calls dc
WHERE dc.callee_addr = 0x401000;

-- 反编译每个调用函数,查找更多偏移访问
-- 每个调用函数可能揭示同一结构体的不同字段
c) 被调用函数关联——通过被调用函数揭示字段类型:
sql
-- 函数使用结构体指针调用了什么?
SELECT COALESCE(call_obj_name, call_helper_name) AS callee_name,
       arg_idx, arg_op, arg_num_value
FROM ctree_call_args
WHERE func_addr = 0x401000
  AND arg_var_name = 'a1';
-- 如果被调用函数期望HANDLE类型,则该偏移处的字段为HANDLE类型
d) 逐步构建结构体:
sql
-- 创建结构体
INSERT INTO types (name, kind) VALUES ('MY_CONTEXT', 'struct');

-- 获取序号
SELECT ordinal FROM types WHERE name = 'MY_CONTEXT';

-- 发现字段后添加到结构体
INSERT INTO types_members (type_ordinal, member_name, member_type)
VALUES (42, 'handle', 'HANDLE');
INSERT INTO types_members (type_ordinal, member_name, member_type)
VALUES (42, 'buffer_ptr', 'void *');
INSERT INTO types_members (type_ordinal, member_name, member_type)
VALUES (42, 'buffer_size', 'unsigned int');
e) 应用恢复后的结构体:
sql
-- 应用到函数原型
UPDATE funcs SET prototype = 'int __fastcall process_context(MY_CONTEXT *ctx);'
WHERE address = 0x401000;

-- 或应用到局部变量
UPDATE ctree_lvars SET type = 'MY_CONTEXT *'
WHERE func_addr = 0x401000 AND idx = 0;

-- 重新反编译以验证显示效果
SELECT decompile(0x401000, 1);

7. Track Progress

7. 追踪进度

Use
netnode_kv
to persist progress across sessions:
sql
-- Mark a function as fully annotated
INSERT INTO netnode_kv(key, value)
VALUES('re_source:0x401000', '{"status":"done","summary":"DriverEntry init"}');

-- Check progress
SELECT key, value FROM netnode_kv WHERE key LIKE 're_source:%';

使用
netnode_kv
在会话间保存进度:
sql
-- 将函数标记为已完全注释
INSERT INTO netnode_kv(key, value)
VALUES('re_source:0x401000', '{"status":"done","summary":"DriverEntry init"}');

-- 查看进度
SELECT key, value FROM netnode_kv WHERE key LIKE 're_source:%';

Advanced Re-Sourcing Patterns (CTEs)

高级逆向溯源模式(CTE)

Transitive caller discovery

间接调用函数发现

Find all functions that transitively pass a struct through a chain of calls — who ultimately provides the data?
Prefer
call_graph
for simple traversal:
SELECT func_name, depth FROM call_graph WHERE start = 0x401000 AND direction = 'up' AND max_depth = 5
replaces the CTE below. Use the CTE only when you need to JOIN caller context (e.g. offset accesses) at each step.
sql
-- Recursive CTE: walk callers up to 5 levels
WITH RECURSIVE caller_chain AS (
    -- Base: direct callers of the struct-consuming function
    SELECT c.caller_func_addr AS func_addr,
           c.caller_name AS func_name,
           1 AS depth
    FROM callers c
    WHERE c.func_addr = 0x401000

    UNION ALL

    -- Recurse: callers of callers
    SELECT c.caller_func_addr,
           c.caller_name,
           cc.depth + 1
    FROM caller_chain cc
    JOIN callers c ON c.func_addr = cc.func_addr
    WHERE cc.depth < 5
)
SELECT DISTINCT func_name, printf('0x%X', func_addr) AS addr, MIN(depth) AS min_depth
FROM caller_chain
GROUP BY func_addr
ORDER BY min_depth;
找到所有通过调用链间接传递结构体的函数——最终是谁提供了这些数据?
简单遍历优先使用
call_graph
SELECT func_name, depth FROM call_graph WHERE start = 0x401000 AND direction = 'up' AND max_depth = 5
可替代下面的CTE。仅当需要在每一步关联调用函数上下文(如偏移访问)时才使用CTE。
sql
-- 递归CTE:向上遍历最多5层的调用函数
WITH RECURSIVE caller_chain AS (
    -- 基础:直接调用结构体处理函数的函数
    SELECT c.caller_func_addr AS func_addr,
           c.caller_name AS func_name,
           1 AS depth
    FROM callers c
    WHERE c.func_addr = 0x401000

    UNION ALL

    -- 递归:调用函数的调用函数
    SELECT c.caller_func_addr,
           c.caller_name,
           cc.depth + 1
    FROM caller_chain cc
    JOIN callers c ON c.func_addr = cc.func_addr
    WHERE cc.depth < 5
)
SELECT DISTINCT func_name, printf('0x%X', func_addr) AS addr, MIN(depth) AS min_depth
FROM caller_chain
GROUP BY func_addr
ORDER BY min_depth;

Aggregate offset accesses across all callers

聚合所有调用函数的偏移访问

Build a comprehensive struct field map by collecting offset patterns from every function that touches the struct:
sql
-- Collect field offset accesses from all functions that call process_context
WITH callers_of AS (
    SELECT DISTINCT func_addr
    FROM disasm_calls
    WHERE callee_addr = 0x401000
),
offset_accesses AS (
    SELECT func_addr,
           (SELECT name FROM funcs WHERE func_addr >= address AND func_addr < end_ea LIMIT 1) AS func_name,
           num_value AS field_offset,
           op_name
    FROM ctree
    WHERE func_addr IN (SELECT func_addr FROM callers_of)
      AND op_name IN ('cot_add', 'cot_idx')
      AND num_value IS NOT NULL
      AND num_value BETWEEN 0 AND 0x1000
)
SELECT field_offset,
       printf('0x%X', field_offset) AS hex_offset,
       COUNT(DISTINCT func_addr) AS seen_in_funcs,
       GROUP_CONCAT(DISTINCT func_name) AS functions
FROM offset_accesses
GROUP BY field_offset
ORDER BY field_offset;
通过收集所有接触该结构体的函数中的偏移模式,构建全面的结构体字段映射:
sql
-- 收集所有调用process_context的函数中的字段偏移访问
WITH callers_of AS (
    SELECT DISTINCT func_addr
    FROM disasm_calls
    WHERE callee_addr = 0x401000
),
offset_accesses AS (
    SELECT func_addr,
           (SELECT name FROM funcs WHERE func_addr >= address AND func_addr < end_ea LIMIT 1) AS func_name,
           num_value AS field_offset,
           op_name
    FROM ctree
    WHERE func_addr IN (SELECT func_addr FROM callers_of)
      AND op_name IN ('cot_add', 'cot_idx')
      AND num_value IS NOT NULL
      AND num_value BETWEEN 0 AND 0x1000
)
SELECT field_offset,
       printf('0x%X', field_offset) AS hex_offset,
       COUNT(DISTINCT func_addr) AS seen_in_funcs,
       GROUP_CONCAT(DISTINCT func_name) AS functions
FROM offset_accesses
GROUP BY field_offset
ORDER BY field_offset;

Find struct-heavy functions (candidates for structure recovery)

查找结构体密集型函数(结构体恢复候选)

Functions with the most
cot_add
offset patterns are likely manipulating structs through raw pointer arithmetic:
sql
-- Functions with most pointer arithmetic (struct field access candidates)
WITH offset_funcs AS (
    SELECT func_addr,
           COUNT(*) AS offset_accesses,
           COUNT(DISTINCT num_value) AS unique_offsets
    FROM ctree
    WHERE func_addr IN (SELECT address FROM funcs ORDER BY size DESC LIMIT 100)
      AND op_name = 'cot_add'
      AND num_value IS NOT NULL
      AND num_value BETWEEN 1 AND 0x1000
    GROUP BY func_addr
)
SELECT (SELECT name FROM funcs WHERE func_addr >= address AND func_addr < end_ea LIMIT 1) AS name,
       printf('0x%X', func_addr) AS addr,
       offset_accesses,
       unique_offsets
FROM offset_funcs
ORDER BY unique_offsets DESC
LIMIT 15;
包含最多
cot_add
偏移模式的函数,很可能通过原始指针运算操作结构体:
sql
-- 包含最多指针运算的函数(结构体字段访问候选)
WITH offset_funcs AS (
    SELECT func_addr,
           COUNT(*) AS offset_accesses,
           COUNT(DISTINCT num_value) AS unique_offsets
    FROM ctree
    WHERE func_addr IN (SELECT address FROM funcs ORDER BY size DESC LIMIT 100)
      AND op_name = 'cot_add'
      AND num_value IS NOT NULL
      AND num_value BETWEEN 1 AND 0x1000
    GROUP BY func_addr
)
SELECT (SELECT name FROM funcs WHERE func_addr >= address AND func_addr < end_ea LIMIT 1) AS name,
       printf('0x%X', func_addr) AS addr,
       offset_accesses,
       unique_offsets
FROM offset_funcs
ORDER BY unique_offsets DESC
LIMIT 15;

Cross-reference struct field offsets with known type sizes

将结构体字段偏移与已知类型大小交叉引用

Match observed offsets against field sizes of existing types to guess field types:
sql
-- Compare observed offsets with known struct sizes
WITH observed AS (
    SELECT DISTINCT num_value AS offset
    FROM ctree
    WHERE func_addr = 0x401000
      AND op_name = 'cot_add'
      AND num_value IS NOT NULL
      AND num_value BETWEEN 0 AND 0x200
),
candidate_types AS (
    SELECT t.name AS type_name, t.size AS type_size
    FROM types t
    WHERE t.is_struct = 1 AND t.size > 0
)
SELECT o.offset, printf('0x%X', o.offset) AS hex_offset,
       ct.type_name, ct.type_size
FROM observed o
LEFT JOIN candidate_types ct ON ct.type_size = o.offset
ORDER BY o.offset;

将观测到的偏移与现有类型的字段大小匹配,推测字段类型:
sql
-- 比较观测到的偏移与已知结构体大小
WITH observed AS (
    SELECT DISTINCT num_value AS offset
    FROM ctree
    WHERE func_addr = 0x401000
      AND op_name = 'cot_add'
      AND num_value IS NOT NULL
      AND num_value BETWEEN 0 AND 0x200
),
candidate_types AS (
    SELECT t.name AS type_name, t.size AS type_size
    FROM types t
    WHERE t.is_struct = 1 AND t.size > 0
)
SELECT o.offset, printf('0x%X', o.offset) AS hex_offset,
       ct.type_name, ct.type_size
FROM observed o
LEFT JOIN candidate_types ct ON ct.type_size = o.offset
ORDER BY o.offset;

Key Principles

核心原则

  1. Bottom-up understanding: Start with leaf callees, annotate them, then work up to callers. Each annotated callee makes the caller easier to understand.
  2. Cross-function struct correlation: A single function rarely reveals the full struct layout. Look at multiple callers/callees of the same function to discover different fields.
  3. Iterative refinement: Apply what you know, re-decompile, see if the output improves. Add more fields/types as you discover them.
  4. Verify every edit: Follow the Mandatory Mutation Loop (read → edit → refresh → verify) from the
    annotations
    skill.
  5. Save periodically: Use
    SELECT save_database()
    to persist your work.

  1. 自底向上理解:从叶子被调用函数开始,注释它们,再向上处理调用函数。每个已注释的被调用函数都会让调用函数的理解变得更容易。
  2. 跨函数结构体关联:单个函数很少能揭示完整的结构体布局。查看同一函数的多个调用/被调用函数,以发现不同字段。
  3. 迭代优化:应用已知信息,重新反编译,查看输出是否改善。发现新字段/类型时及时添加。
  4. 验证每一处修改:遵循
    annotations
    技能中的强制变更循环(读取 → 编辑 → 刷新 → 验证)。
  5. 定期保存:使用
    SELECT save_database()
    保存工作成果。

Related Skills

相关技能

  • annotations
    — The editing/annotation workflow: how to rename, retype, comment
  • decompiler
    — Deep decompiler reference: ctree, types, parse_decls, union selection
  • types
    — Type system mechanics: struct/union/enum creation and manipulation
  • xrefs
    — Caller/callee traversal,
    call_graph
    /
    shortest_path
    tables,
    string_refs
    view
  • disassembly
    cfg_edges
    for control flow understanding during struct recovery
  • storage
    — netnode_kv for tracking progress across sessions
  • annotations
    — 编辑/注释工作流程:重命名、重新定义类型、添加注释的方法
  • decompiler
    — 反编译器深度参考:ctree、类型、parse_decls、联合体选择
  • types
    — 类型系统机制:结构体/联合体/枚举的创建与操作
  • xrefs
    — 调用/被调用函数遍历、
    call_graph
    /
    shortest_path
    表、
    string_refs
    视图
  • disassembly
    — 结构体恢复过程中用于理解控制流的
    cfg_edges
  • storage
    — 用于跨会话追踪进度的netnode_kv