re-source
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseFor structure recovery patterns, see:
references/struct-recovery-patterns.mdThis skill teaches a methodology for recovering source-level understanding from compiled binaries using idasql. It orchestrates the other skills into a systematic workflow.
关于结构体恢复模式,请参阅:
references/struct-recovery-patterns.md本技能介绍一种使用idasql从编译后的二进制文件中恢复源码级理解的方法论。它将其他技能编排成一套系统化的工作流程。
Core Workflow: Recursive Re-Sourcing
核心工作流程:递归逆向溯源
1. Start at a Function
1. 从某个函数开始
Pick a function — an entry point, a function referenced by an interesting string, or a callee of a known function.
sql
-- Decompile the target function
SELECT decompile(0x401000);
-- Or by name
SELECT decompile('DriverEntry');选择一个函数——可以是入口点、由感兴趣的字符串引用的函数,或是已知函数的被调用函数。
sql
-- 反编译目标函数
SELECT decompile(0x401000);
-- 或按名称反编译
SELECT decompile('DriverEntry');2. Annotate the Function
2. 注释函数
Use the skill to edit the decompilation into something readable:
annotationssql
-- Rename local variables to meaningful names
UPDATE ctree_lvars SET name = 'driver_object' WHERE func_addr = 0x401000 AND idx = 0;
UPDATE ctree_lvars SET name = 'registry_path' WHERE func_addr = 0x401000 AND idx = 1;
-- Apply types to arguments/locals
UPDATE ctree_lvars SET type = 'PDRIVER_OBJECT'
WHERE func_addr = 0x401000 AND idx = 0;
-- Inspect pseudocode anchors before writing comments
SELECT line_num, ea, line, comment
FROM pseudocode
WHERE func_addr = 0x401000
ORDER BY line_num;
-- Add inline comments explaining logic
-- Example below uses a previously resolved writable anchor, not the function entry row.
UPDATE pseudocode SET comment = 'initialize dispatch table'
WHERE func_addr = 0x401000 AND ea = 0x401020;
-- Verify changes
SELECT decompile(0x401000, 1);使用技能将反编译代码编辑为更易读的形式:
annotationssql
-- 将局部变量重命名为有意义的名称
UPDATE ctree_lvars SET name = 'driver_object' WHERE func_addr = 0x401000 AND idx = 0;
UPDATE ctree_lvars SET name = 'registry_path' WHERE func_addr = 0x401000 AND idx = 1;
-- 为参数/局部变量设置类型
UPDATE ctree_lvars SET type = 'PDRIVER_OBJECT'
WHERE func_addr = 0x401000 AND idx = 0;
-- 添加注释前先查看伪代码锚点
SELECT line_num, ea, line, comment
FROM pseudocode
WHERE func_addr = 0x401000
ORDER BY line_num;
-- 添加解释逻辑的行内注释
-- 下面的示例使用之前解析的可写锚点,而非函数入口行。
UPDATE pseudocode SET comment = 'initialize dispatch table'
WHERE func_addr = 0x401000 AND ea = 0x401020;
-- 验证修改
SELECT decompile(0x401000, 1);3. Set a Function Comment
3. 设置函数注释
Write a concise summary describing what the function does. This makes the function indexable for later queries.
For exact trigger semantics ( / / singular ), follow the skill's Function Summary contract.
function summaryfunc-summaryadd function commentannotationssql
SELECT address, name, comment, rpt_comment
FROM funcs
WHERE address = 0x401000;
UPDATE funcs
SET rpt_comment = 'DriverEntry: initializes driver dispatch routines and device object'
WHERE address = 0x401000;编写简洁的摘要描述函数功能,以便后续查询时能索引到该函数。
关于确切的触发语义( / / 单数形式),请遵循技能的函数摘要约定。
function summaryfunc-summaryadd function commentannotationssql
SELECT address, name, comment, rpt_comment
FROM funcs
WHERE address = 0x401000;
UPDATE funcs
SET rpt_comment = 'DriverEntry: initializes driver dispatch routines and device object'
WHERE address = 0x401000;4. Recurse into Callees
4. 递归处理被调用函数
Follow calls inside the function. Annotate each callee the same way, building understanding bottom-up.
sql
-- List callees to visit
SELECT callee_name, printf('0x%X', callee_addr) as addr
FROM disasm_calls WHERE func_addr = 0x401000;
-- Or map the full call subtree at once (BFS with depth tracking)
SELECT func_name, depth FROM call_graph
WHERE start = 0x401000 AND direction = 'down' AND max_depth = 5;
-- Decompile each callee
SELECT decompile(0x401050);
-- Annotate and recurse...跟踪函数内部的调用,以相同方式注释每个被调用函数,自底向上构建理解。
sql
-- 列出需要处理的被调用函数
SELECT callee_name, printf('0x%X', callee_addr) as addr
FROM disasm_calls WHERE func_addr = 0x401000;
-- 或一次性映射完整的调用子树(带深度追踪的广度优先搜索)
SELECT func_name, depth FROM call_graph
WHERE start = 0x401000 AND direction = 'down' AND max_depth = 5;
-- 反编译每个被调用函数
SELECT decompile(0x401050);
-- 注释并递归处理...5. Recurse into Callers
5. 递归处理调用函数
Follow callers to build the bigger picture: how is this function used?
sql
-- Who calls this function?
SELECT caller_name, printf('0x%X', caller_addr) as addr
FROM callers WHERE func_addr = 0x401000;
-- Or map ALL transitive callers at once
SELECT func_name, depth FROM call_graph
WHERE start = 0x401000 AND direction = 'up' AND max_depth = 10;
-- Find the shortest path from an entry point to this function
SELECT step, func_name FROM shortest_path
WHERE from_addr = (SELECT address FROM funcs WHERE name = 'main')
AND to_addr = 0x401000 AND max_depth = 20;
-- Decompile callers to see usage context
SELECT decompile(0x400F00);跟踪调用函数以构建全局视图:该函数是如何被使用的?
sql
-- 哪些函数调用了此函数?
SELECT caller_name, printf('0x%X', caller_addr) as addr
FROM callers WHERE func_addr = 0x401000;
-- 或一次性映射所有间接调用函数
SELECT func_name, depth FROM call_graph
WHERE start = 0x401000 AND direction = 'up' AND max_depth = 10;
-- 找到从入口点到该函数的最短路径
SELECT step, func_name FROM shortest_path
WHERE from_addr = (SELECT address FROM funcs WHERE name = 'main')
AND to_addr = 0x401000 AND max_depth = 20;
-- 反编译调用函数查看使用上下文
SELECT decompile(0x400F00);6. Structure Recovery
6. 结构体恢复
The hardest part. Decompiled code often shows casts like — these are structure field accesses.
*(DWORD *)(a1 + 0x10)这是最困难的部分。反编译代码常出现类似的强制转换——这些都是结构体字段访问。
*(DWORD *)(a1 + 0x10)Step-by-step Process
分步流程
a) Identify offset patterns in a single function:
sql
-- Look at the decompiled code for cast patterns
SELECT decompile(0x401000);
-- Query ctree for pointer arithmetic (field accesses)
SELECT ea, op_name, num_value
FROM ctree WHERE func_addr = 0x401000
AND op_name IN ('cot_add', 'cot_idx')
AND num_value IS NOT NULL;b) Cross-function correlation — find more fields:
sql
-- Find all callers that pass the same struct pointer
SELECT DISTINCT (SELECT name FROM funcs WHERE dc.func_addr >= address AND dc.func_addr < end_ea LIMIT 1) as caller
FROM disasm_calls dc
WHERE dc.callee_addr = 0x401000;
-- Decompile each caller and look for more offset accesses
-- Each caller may reveal different fields of the same structc) Callee correlation — let callees reveal field types:
sql
-- What does the function call with the struct pointer?
SELECT COALESCE(call_obj_name, call_helper_name) AS callee_name,
arg_idx, arg_op, arg_num_value
FROM ctree_call_args
WHERE func_addr = 0x401000
AND arg_var_name = 'a1';
-- If callee expects HANDLE, the field at that offset is a HANDLEd) Build the struct incrementally:
sql
-- Create the struct
INSERT INTO types (name, kind) VALUES ('MY_CONTEXT', 'struct');
-- Get the ordinal
SELECT ordinal FROM types WHERE name = 'MY_CONTEXT';
-- Add fields as you discover them
INSERT INTO types_members (type_ordinal, member_name, member_type)
VALUES (42, 'handle', 'HANDLE');
INSERT INTO types_members (type_ordinal, member_name, member_type)
VALUES (42, 'buffer_ptr', 'void *');
INSERT INTO types_members (type_ordinal, member_name, member_type)
VALUES (42, 'buffer_size', 'unsigned int');e) Apply the recovered struct:
sql
-- Apply to function prototype
UPDATE funcs SET prototype = 'int __fastcall process_context(MY_CONTEXT *ctx);'
WHERE address = 0x401000;
-- Or apply to a local variable
UPDATE ctree_lvars SET type = 'MY_CONTEXT *'
WHERE func_addr = 0x401000 AND idx = 0;
-- Re-decompile to verify clean rendering
SELECT decompile(0x401000, 1);a) 在单个函数中识别偏移模式:
sql
-- 查看反编译代码中的强制转换模式
SELECT decompile(0x401000);
-- 查询ctree中的指针运算(字段访问)
SELECT ea, op_name, num_value
FROM ctree WHERE func_addr = 0x401000
AND op_name IN ('cot_add', 'cot_idx')
AND num_value IS NOT NULL;b) 跨函数关联——发现更多字段:
sql
-- 找到所有传递相同结构体指针的调用函数
SELECT DISTINCT (SELECT name FROM funcs WHERE dc.func_addr >= address AND dc.func_addr < end_ea LIMIT 1) as caller
FROM disasm_calls dc
WHERE dc.callee_addr = 0x401000;
-- 反编译每个调用函数,查找更多偏移访问
-- 每个调用函数可能揭示同一结构体的不同字段c) 被调用函数关联——通过被调用函数揭示字段类型:
sql
-- 函数使用结构体指针调用了什么?
SELECT COALESCE(call_obj_name, call_helper_name) AS callee_name,
arg_idx, arg_op, arg_num_value
FROM ctree_call_args
WHERE func_addr = 0x401000
AND arg_var_name = 'a1';
-- 如果被调用函数期望HANDLE类型,则该偏移处的字段为HANDLE类型d) 逐步构建结构体:
sql
-- 创建结构体
INSERT INTO types (name, kind) VALUES ('MY_CONTEXT', 'struct');
-- 获取序号
SELECT ordinal FROM types WHERE name = 'MY_CONTEXT';
-- 发现字段后添加到结构体
INSERT INTO types_members (type_ordinal, member_name, member_type)
VALUES (42, 'handle', 'HANDLE');
INSERT INTO types_members (type_ordinal, member_name, member_type)
VALUES (42, 'buffer_ptr', 'void *');
INSERT INTO types_members (type_ordinal, member_name, member_type)
VALUES (42, 'buffer_size', 'unsigned int');e) 应用恢复后的结构体:
sql
-- 应用到函数原型
UPDATE funcs SET prototype = 'int __fastcall process_context(MY_CONTEXT *ctx);'
WHERE address = 0x401000;
-- 或应用到局部变量
UPDATE ctree_lvars SET type = 'MY_CONTEXT *'
WHERE func_addr = 0x401000 AND idx = 0;
-- 重新反编译以验证显示效果
SELECT decompile(0x401000, 1);7. Track Progress
7. 追踪进度
Use to persist progress across sessions:
netnode_kvsql
-- Mark a function as fully annotated
INSERT INTO netnode_kv(key, value)
VALUES('re_source:0x401000', '{"status":"done","summary":"DriverEntry init"}');
-- Check progress
SELECT key, value FROM netnode_kv WHERE key LIKE 're_source:%';使用在会话间保存进度:
netnode_kvsql
-- 将函数标记为已完全注释
INSERT INTO netnode_kv(key, value)
VALUES('re_source:0x401000', '{"status":"done","summary":"DriverEntry init"}');
-- 查看进度
SELECT key, value FROM netnode_kv WHERE key LIKE 're_source:%';Advanced Re-Sourcing Patterns (CTEs)
高级逆向溯源模式(CTE)
Transitive caller discovery
间接调用函数发现
Find all functions that transitively pass a struct through a chain of calls — who ultimately provides the data?
Preferfor simple traversal:call_graphreplaces the CTE below. Use the CTE only when you need to JOIN caller context (e.g. offset accesses) at each step.SELECT func_name, depth FROM call_graph WHERE start = 0x401000 AND direction = 'up' AND max_depth = 5
sql
-- Recursive CTE: walk callers up to 5 levels
WITH RECURSIVE caller_chain AS (
-- Base: direct callers of the struct-consuming function
SELECT c.caller_func_addr AS func_addr,
c.caller_name AS func_name,
1 AS depth
FROM callers c
WHERE c.func_addr = 0x401000
UNION ALL
-- Recurse: callers of callers
SELECT c.caller_func_addr,
c.caller_name,
cc.depth + 1
FROM caller_chain cc
JOIN callers c ON c.func_addr = cc.func_addr
WHERE cc.depth < 5
)
SELECT DISTINCT func_name, printf('0x%X', func_addr) AS addr, MIN(depth) AS min_depth
FROM caller_chain
GROUP BY func_addr
ORDER BY min_depth;找到所有通过调用链间接传递结构体的函数——最终是谁提供了这些数据?
简单遍历优先使用:call_graph可替代下面的CTE。仅当需要在每一步关联调用函数上下文(如偏移访问)时才使用CTE。SELECT func_name, depth FROM call_graph WHERE start = 0x401000 AND direction = 'up' AND max_depth = 5
sql
-- 递归CTE:向上遍历最多5层的调用函数
WITH RECURSIVE caller_chain AS (
-- 基础:直接调用结构体处理函数的函数
SELECT c.caller_func_addr AS func_addr,
c.caller_name AS func_name,
1 AS depth
FROM callers c
WHERE c.func_addr = 0x401000
UNION ALL
-- 递归:调用函数的调用函数
SELECT c.caller_func_addr,
c.caller_name,
cc.depth + 1
FROM caller_chain cc
JOIN callers c ON c.func_addr = cc.func_addr
WHERE cc.depth < 5
)
SELECT DISTINCT func_name, printf('0x%X', func_addr) AS addr, MIN(depth) AS min_depth
FROM caller_chain
GROUP BY func_addr
ORDER BY min_depth;Aggregate offset accesses across all callers
聚合所有调用函数的偏移访问
Build a comprehensive struct field map by collecting offset patterns from every function that touches the struct:
sql
-- Collect field offset accesses from all functions that call process_context
WITH callers_of AS (
SELECT DISTINCT func_addr
FROM disasm_calls
WHERE callee_addr = 0x401000
),
offset_accesses AS (
SELECT func_addr,
(SELECT name FROM funcs WHERE func_addr >= address AND func_addr < end_ea LIMIT 1) AS func_name,
num_value AS field_offset,
op_name
FROM ctree
WHERE func_addr IN (SELECT func_addr FROM callers_of)
AND op_name IN ('cot_add', 'cot_idx')
AND num_value IS NOT NULL
AND num_value BETWEEN 0 AND 0x1000
)
SELECT field_offset,
printf('0x%X', field_offset) AS hex_offset,
COUNT(DISTINCT func_addr) AS seen_in_funcs,
GROUP_CONCAT(DISTINCT func_name) AS functions
FROM offset_accesses
GROUP BY field_offset
ORDER BY field_offset;通过收集所有接触该结构体的函数中的偏移模式,构建全面的结构体字段映射:
sql
-- 收集所有调用process_context的函数中的字段偏移访问
WITH callers_of AS (
SELECT DISTINCT func_addr
FROM disasm_calls
WHERE callee_addr = 0x401000
),
offset_accesses AS (
SELECT func_addr,
(SELECT name FROM funcs WHERE func_addr >= address AND func_addr < end_ea LIMIT 1) AS func_name,
num_value AS field_offset,
op_name
FROM ctree
WHERE func_addr IN (SELECT func_addr FROM callers_of)
AND op_name IN ('cot_add', 'cot_idx')
AND num_value IS NOT NULL
AND num_value BETWEEN 0 AND 0x1000
)
SELECT field_offset,
printf('0x%X', field_offset) AS hex_offset,
COUNT(DISTINCT func_addr) AS seen_in_funcs,
GROUP_CONCAT(DISTINCT func_name) AS functions
FROM offset_accesses
GROUP BY field_offset
ORDER BY field_offset;Find struct-heavy functions (candidates for structure recovery)
查找结构体密集型函数(结构体恢复候选)
Functions with the most offset patterns are likely manipulating structs through raw pointer arithmetic:
cot_addsql
-- Functions with most pointer arithmetic (struct field access candidates)
WITH offset_funcs AS (
SELECT func_addr,
COUNT(*) AS offset_accesses,
COUNT(DISTINCT num_value) AS unique_offsets
FROM ctree
WHERE func_addr IN (SELECT address FROM funcs ORDER BY size DESC LIMIT 100)
AND op_name = 'cot_add'
AND num_value IS NOT NULL
AND num_value BETWEEN 1 AND 0x1000
GROUP BY func_addr
)
SELECT (SELECT name FROM funcs WHERE func_addr >= address AND func_addr < end_ea LIMIT 1) AS name,
printf('0x%X', func_addr) AS addr,
offset_accesses,
unique_offsets
FROM offset_funcs
ORDER BY unique_offsets DESC
LIMIT 15;包含最多偏移模式的函数,很可能通过原始指针运算操作结构体:
cot_addsql
-- 包含最多指针运算的函数(结构体字段访问候选)
WITH offset_funcs AS (
SELECT func_addr,
COUNT(*) AS offset_accesses,
COUNT(DISTINCT num_value) AS unique_offsets
FROM ctree
WHERE func_addr IN (SELECT address FROM funcs ORDER BY size DESC LIMIT 100)
AND op_name = 'cot_add'
AND num_value IS NOT NULL
AND num_value BETWEEN 1 AND 0x1000
GROUP BY func_addr
)
SELECT (SELECT name FROM funcs WHERE func_addr >= address AND func_addr < end_ea LIMIT 1) AS name,
printf('0x%X', func_addr) AS addr,
offset_accesses,
unique_offsets
FROM offset_funcs
ORDER BY unique_offsets DESC
LIMIT 15;Cross-reference struct field offsets with known type sizes
将结构体字段偏移与已知类型大小交叉引用
Match observed offsets against field sizes of existing types to guess field types:
sql
-- Compare observed offsets with known struct sizes
WITH observed AS (
SELECT DISTINCT num_value AS offset
FROM ctree
WHERE func_addr = 0x401000
AND op_name = 'cot_add'
AND num_value IS NOT NULL
AND num_value BETWEEN 0 AND 0x200
),
candidate_types AS (
SELECT t.name AS type_name, t.size AS type_size
FROM types t
WHERE t.is_struct = 1 AND t.size > 0
)
SELECT o.offset, printf('0x%X', o.offset) AS hex_offset,
ct.type_name, ct.type_size
FROM observed o
LEFT JOIN candidate_types ct ON ct.type_size = o.offset
ORDER BY o.offset;将观测到的偏移与现有类型的字段大小匹配,推测字段类型:
sql
-- 比较观测到的偏移与已知结构体大小
WITH observed AS (
SELECT DISTINCT num_value AS offset
FROM ctree
WHERE func_addr = 0x401000
AND op_name = 'cot_add'
AND num_value IS NOT NULL
AND num_value BETWEEN 0 AND 0x200
),
candidate_types AS (
SELECT t.name AS type_name, t.size AS type_size
FROM types t
WHERE t.is_struct = 1 AND t.size > 0
)
SELECT o.offset, printf('0x%X', o.offset) AS hex_offset,
ct.type_name, ct.type_size
FROM observed o
LEFT JOIN candidate_types ct ON ct.type_size = o.offset
ORDER BY o.offset;Key Principles
核心原则
-
Bottom-up understanding: Start with leaf callees, annotate them, then work up to callers. Each annotated callee makes the caller easier to understand.
-
Cross-function struct correlation: A single function rarely reveals the full struct layout. Look at multiple callers/callees of the same function to discover different fields.
-
Iterative refinement: Apply what you know, re-decompile, see if the output improves. Add more fields/types as you discover them.
-
Verify every edit: Follow the Mandatory Mutation Loop (read → edit → refresh → verify) from theskill.
annotations -
Save periodically: Useto persist your work.
SELECT save_database()
-
自底向上理解:从叶子被调用函数开始,注释它们,再向上处理调用函数。每个已注释的被调用函数都会让调用函数的理解变得更容易。
-
跨函数结构体关联:单个函数很少能揭示完整的结构体布局。查看同一函数的多个调用/被调用函数,以发现不同字段。
-
迭代优化:应用已知信息,重新反编译,查看输出是否改善。发现新字段/类型时及时添加。
-
验证每一处修改:遵循技能中的强制变更循环(读取 → 编辑 → 刷新 → 验证)。
annotations -
定期保存:使用保存工作成果。
SELECT save_database()
Related Skills
相关技能
- — The editing/annotation workflow: how to rename, retype, comment
annotations - — Deep decompiler reference: ctree, types, parse_decls, union selection
decompiler - — Type system mechanics: struct/union/enum creation and manipulation
types - — Caller/callee traversal,
xrefs/call_graphtables,shortest_pathviewstring_refs - —
disassemblyfor control flow understanding during struct recoverycfg_edges - — netnode_kv for tracking progress across sessions
storage
- — 编辑/注释工作流程:重命名、重新定义类型、添加注释的方法
annotations - — 反编译器深度参考:ctree、类型、parse_decls、联合体选择
decompiler - — 类型系统机制:结构体/联合体/枚举的创建与操作
types - — 调用/被调用函数遍历、
xrefs/call_graph表、shortest_path视图string_refs - — 结构体恢复过程中用于理解控制流的
disassemblycfg_edges - — 用于跨会话追踪进度的netnode_kv
storage