data

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Trigger Intents

触发场景

Use this skill when user asks to:

search strings/bytes/patterns quickly
map string evidence to code usage
investigate raw data-level indicators (IOCs, constants, signatures)

Route to:

```
xrefs
```
for relationship expansion from matched addresses
```
analysis
```
for triage synthesis from data signals
```
debugger
```
when findings should drive patch/breakpoint actions

当用户提出以下需求时使用本技能：

快速搜索字符串/字节/模式
将字符串线索映射到代码使用场景
调查原始数据级别的指示器（IOC、常量、签名）

流转至：

```
xrefs
```
：从匹配地址扩展关系分析
```
analysis
```
：基于数据信号进行分类综合分析
```
debugger
```
：根据分析结果执行补丁/断点操作

Do This First (Warm-Start Sequence)

先执行以下步骤（预热流程）

sql

-- 1) Validate string availability
SELECT COUNT(*) AS strings FROM strings;

-- 2) Sample high-value strings
SELECT content, printf('0x%X', address) AS addr
FROM strings
WHERE length >= 8
ORDER BY length DESC
LIMIT 40;

-- 3) If expected strings are missing, rebuild once
SELECT rebuild_strings();

Interpretation guidance:

Strings are often quickest behavioral clues; pivot to
```
xrefs
```
immediately for execution context.
For opcode/pattern hunts, prefer the
```
byte_search
```
table over full instruction scans.

sql

-- 1) 验证字符串可用性
SELECT COUNT(*) AS strings FROM strings;

-- 2) 采样高价值字符串
SELECT content, printf('0x%X', address) AS addr
FROM strings
WHERE length >= 8
ORDER BY length DESC
LIMIT 40;

-- 3) 若预期字符串缺失，重建一次字符串表
SELECT rebuild_strings();

解读指引：

字符串通常是最快捷的行为线索，应立即转向
```
xrefs
```
获取执行上下文。
对于操作码/模式搜索，优先使用
```
byte_search
```
表而非全指令扫描。

Failure and Recovery

故障排查与恢复

No strings or unexpectedly low count:
- Run
```
rebuild_strings()
```
  and validate with
```
COUNT(*) FROM strings
```
  .
Too many false positives:
- Increase specificity (
```
LIKE
```
  , regex-like pattern narrowing, module/function join filters).
Byte pattern search too broad:
- Restrict by range or join matched byte addresses to
```
funcs
```
  .
Need named functions, labels, structs, or members instead of string contents:
- Use
```
grep
```
  , not
```
strings
```
  or
```
byte_search
```
  .

无字符串或数量异常偏低：
- 执行
```
rebuild_strings()
```
  ，并通过
```
COUNT(*) FROM strings
```
  验证结果。
误报过多：
- 提高查询特异性（使用
```
LIKE
```
  、类正则模式缩小范围、模块/函数关联过滤）。
字节模式搜索范围过广：
- 通过地址范围限制，或将匹配的字节地址与
```
funcs
```
  表关联。
需要查找命名函数、标签、结构体或成员而非字符串内容：
- 使用
```
grep
```
  ，而非
```
strings
```
  或
```
byte_search
```
  。

Handoff Patterns

流转模式

```
data
```
->
```
xrefs
```
to convert data hits into code paths/functions.
```
data
```
->
```
analysis
```
for risk scoring and campaign-level insight.
```
data
```
->
```
debugger
```
for actionable patch/watchpoint setup.

```
data
```
->
```
xrefs
```
：将数据命中结果转换为代码路径/函数。
```
data
```
->
```
analysis
```
：进行风险评分和战役级洞察分析。
```
data
```
->
```
debugger
```
：设置可执行的补丁/监视点。

strings

String literals found in the binary. IDA maintains a cached string list that can be configured.

Column	Type	Description
`address`	INT	String address
`length`	INT	String length
`type`	INT	String type (raw encoding bits)
`type_name`	TEXT	Type name: ascii, utf16, utf32
`width`	INT	Char width (0=1-byte, 1=2-byte, 2=4-byte)
`width_name`	TEXT	Width name: 1-byte, 2-byte, 4-byte
`layout`	INT	String layout (0=null-terminated, 1-3=pascal)
`layout_name`	TEXT	Layout name: termchr, pascal1, pascal2, pascal4
`encoding`	INT	Encoding index (0=default)
`content`	TEXT	String content (the actual text — not `value` or `text` )

String Type Encoding: IDA stores string type as a 32-bit value:

Bits 0-1: Width (0=1B/ASCII, 1=2B/UTF-16, 2=4B/UTF-32)
Bits 2-7: Layout (0=TERMCHR, 1=PASCAL1, 2=PASCAL2, 3=PASCAL4)
Bits 8-15: term1 (first termination character)
Bits 16-23: term2 (second termination character)
Bits 24-31: encoding index

sql

-- Find error messages
SELECT content, printf('0x%X', address) as addr FROM strings WHERE content LIKE '%error%';

-- ASCII strings only
SELECT * FROM strings WHERE type_name = 'ascii';

-- UTF-16 strings (common in Windows)
SELECT * FROM strings WHERE type_name = 'utf16';

-- Count strings by type
SELECT type_name, layout_name, COUNT(*) as count
FROM strings GROUP BY type_name, layout_name ORDER BY count DESC;

Important: For new analysis (exe/dll), strings are auto-built. For existing databases (i64/idb), strings are already saved. If you see 0 strings unexpectedly, run

SELECT rebuild_strings()

once to rebuild the list. See String List Surfaces section below.

二进制文件中的字符串字面量。IDA会维护一个可配置的缓存字符串列表。

列名	类型	描述
`address`	INT	字符串地址
`length`	INT	字符串长度
`type`	INT	字符串类型（原始编码位）
`type_name`	TEXT	类型名称：ascii, utf16, utf32
`width`	INT	字符宽度（0=1字节, 1=2字节, 2=4字节）
`width_name`	TEXT	宽度名称：1-byte, 2-byte, 4-byte
`layout`	INT	字符串布局（0=空终止, 1-3=pascal格式）
`layout_name`	TEXT	布局名称：termchr, pascal1, pascal2, pascal4
`encoding`	INT	编码索引（0=默认）
`content`	TEXT	字符串内容（实际文本 — 不是 `value` 或 `text` ）

字符串类型编码： IDA将字符串类型存储为32位值：

位0-1：宽度（0=1字节/ASCII, 1=2字节/UTF-16, 2=4字节/UTF-32）
位2-7：布局（0=TERMCHR, 1=PASCAL1, 2=PASCAL2, 3=PASCAL4）
位8-15：term1（第一个终止字符）
位16-23：term2（第二个终止字符）
位24-31：编码索引

sql

-- 查找错误信息
SELECT content, printf('0x%X', address) as addr FROM strings WHERE content LIKE '%error%';

-- 仅查找ASCII字符串
SELECT * FROM strings WHERE type_name = 'ascii';

-- UTF-16字符串（Windows系统中常见）
SELECT * FROM strings WHERE type_name = 'utf16';

-- 按类型统计字符串数量
SELECT type_name, layout_name, COUNT(*) as count
FROM strings GROUP BY type_name, layout_name ORDER BY count DESC;

重要提示： 对于新分析的文件（exe/dll），字符串会自动生成。对于现有数据库（i64/idb），字符串已保存。若意外出现0条字符串，执行一次

SELECT rebuild_strings()

重建列表。详见下方“字符串列表界面”章节。

String References (explicit join pattern)

字符串引用（显式关联模式）

Use

strings + xrefs + funcs

directly. This is the canonical pattern.

sql

-- Find call sites/functions referencing error-like strings
SELECT
    s.content as string_value,
    printf('0x%X', x.from_ea) as ref_addr,
    (SELECT name FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) as func_name
FROM strings s
JOIN xrefs x ON x.to_ea = s.address
WHERE s.content LIKE '%error%' OR s.content LIKE '%fail%'
ORDER BY func_name, ref_addr;

-- Functions with most string references
SELECT
    (SELECT name FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) as func_name,
    COUNT(*) as string_refs
FROM strings s
JOIN xrefs x ON x.to_ea = s.address
GROUP BY func_name
ORDER BY string_refs DESC
LIMIT 10;

直接使用

strings + xrefs + funcs

组合，这是标准模式。

sql

-- 查找引用错误类字符串的调用位置/函数
SELECT
    s.content as string_value,
    printf('0x%X', x.from_ea) as ref_addr,
    (SELECT name FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) as func_name
FROM strings s
JOIN xrefs x ON x.to_ea = s.address
WHERE s.content LIKE '%error%' OR s.content LIKE '%fail%'
ORDER BY func_name, ref_addr;

-- 引用字符串最多的函数
SELECT
    (SELECT name FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) as func_name,
    COUNT(*) as string_refs
FROM strings s
JOIN xrefs x ON x.to_ea = s.address
GROUP BY func_name
ORDER BY string_refs DESC
LIMIT 10;

SQL Functions — Byte Access (Read-Only)

SQL函数 — 字节访问（只读）

Function	Description
`bytes(addr, n)`	Read `n` raw bytes as hex string
`bytes_raw(addr, n)`	Read `n` bytes as BLOB

For row-shaped byte workflows,

bytes

is a pure mapped-byte table:

sql

SELECT ea, value
FROM bytes
WHERE ea >= 0x401000 AND ea < 0x401010
ORDER BY ea;

Use

heads

when you need IDA item size/type metadata.

函数	描述
`bytes(addr, n)`	读取 `n` 个原始字节并以十六进制字符串返回
`bytes_raw(addr, n)`	读取 `n` 个字节并以BLOB格式返回

对于行式字节处理流程，

bytes

是纯映射字节表：

sql

SELECT ea, value
FROM bytes
WHERE ea >= 0x401000 AND ea < 0x401010
ORDER BY ea;

当需要IDA项的大小/类型元数据时，使用

heads

。

Binary Search Table

二进制搜索表

Use

byte_search

for raw bytes/opcodes. It requires

WHERE pattern = ...

;

matched_hex

is an output column, not the search input.

Column	Type	Description
`address`	INT	Match address
`matched_hex`	TEXT	Matched bytes rendered as hex text
`matched_bytes`	BLOB	Matched bytes as raw bytes
`size`	INT	Match size in bytes
`pattern`	HIDDEN TEXT	Required IDA byte pattern input
`start_ea`	HIDDEN INT	Optional inclusive lower bound
`end_ea`	HIDDEN INT	Optional exclusive upper bound
`max_results`	HIDDEN INT	Optional generator cap

Pattern syntax (IDA native):

```
"48 8B 05"
```
- Exact bytes (hex, space-separated)
```
"48 ? 05"
```
or
```
"48 ?? 05"
```
-
```
?
```
= any byte wildcard (whole byte only)
```
"(01 02 03)"
```
- Alternatives (match any of these bytes)

Note: Nibble wildcards and regex are not supported in byte patterns.

Example:

sql

-- Find all matches for a pattern
SELECT address, matched_hex, size
FROM byte_search
WHERE pattern = '48 8B ? 00'
LIMIT 10;

-- First match only
SELECT printf('0x%llX', address) AS addr
FROM byte_search
WHERE pattern = 'CC CC CC'
ORDER BY address
LIMIT 1;

-- Search with alternatives
SELECT address, matched_hex
FROM byte_search
WHERE pattern = 'E8 (01 02 03 04)'
LIMIT 20;

Optimization Pattern: Find functions using specific instruction

To answer "How many functions use RDTSC instruction?" efficiently:

sql

-- Count unique functions containing RDTSC (opcode: 0F 31)
SELECT COUNT(DISTINCT f.address) as count
FROM byte_search b
JOIN funcs f ON b.address >= f.address AND b.address < f.end_ea
WHERE b.pattern = '0F 31';

-- List those functions with names
SELECT DISTINCT
    f.address as func_ea,
    f.name as func_name
FROM byte_search b
JOIN funcs f ON b.address >= f.address AND b.address < f.end_ea
WHERE b.pattern = '0F 31';

This is much faster than scanning all disassembly lines because:

```
byte_search
```
uses IDA's native binary search
the containment join uses the compact
```
funcs
```
table instead of scanning every instruction

使用

byte_search

查找原始字节/操作码。必须使用

WHERE pattern = ...

；

matched_hex

是输出列，而非搜索输入。

列名	类型	描述
`address`	INT	匹配地址
`matched_hex`	TEXT	匹配字节的十六进制文本形式
`matched_bytes`	BLOB	匹配字节的原始格式
`size`	INT	匹配结果的字节大小
`pattern`	HIDDEN TEXT	必填的IDA字节模式输入
`start_ea`	HIDDEN INT	可选的包含性下界
`end_ea`	HIDDEN INT	可选的排他性上界
`max_results`	HIDDEN INT	可选的结果数量上限

模式语法（IDA原生）：

```
"48 8B 05"
```
- 精确字节（十六进制，空格分隔）
```
"48 ? 05"
```
或
```
"48 ?? 05"
```
-
```
?
```
= 任意字节通配符（仅支持整字节）
```
"(01 02 03)"
```
- 备选模式（匹配其中任意一组字节）

注意： 字节模式不支持半字节通配符和正则表达式。

示例：

sql

-- 查找某模式的所有匹配结果
SELECT address, matched_hex, size
FROM byte_search
WHERE pattern = '48 8B ? 00'
LIMIT 10;

-- 仅查找第一个匹配结果
SELECT printf('0x%llX', address) AS addr
FROM byte_search
WHERE pattern = 'CC CC CC'
ORDER BY address
LIMIT 1;

-- 使用备选模式搜索
SELECT address, matched_hex
FROM byte_search
WHERE pattern = 'E8 (01 02 03 04)'
LIMIT 20;

优化模式：查找使用特定指令的函数

高效回答“有多少函数使用RDTSC指令？”：

sql

-- 统计包含RDTSC（操作码：0F 31）的唯一函数数量
SELECT COUNT(DISTINCT f.address) as count
FROM byte_search b
JOIN funcs f ON b.address >= f.address AND b.address < f.end_ea
WHERE b.pattern = '0F 31';

-- 列出这些函数及其名称
SELECT DISTINCT
    f.address as func_ea,
    f.name as func_name
FROM byte_search b
JOIN funcs f ON b.address >= f.address AND b.address < f.end_ea
WHERE b.pattern = '0F 31';

这种方式比扫描所有反汇编行快得多，原因如下：

```
byte_search
```
使用IDA原生二进制搜索
包含关联使用紧凑的
```
funcs
```
表，而非扫描每条指令

Choose the Right Search Surface

选择合适的搜索界面

Use
```
grep
```
for named entities such as functions, labels, structs, enums, and members.
Use
```
strings
```
when the user is searching literal string contents inside the binary.
Use
```
byte_search
```
when the target is a raw byte or opcode pattern.

使用
```
grep
```
查找命名实体，如函数、标签、结构体、枚举及成员。
使用
```
strings
```
查找二进制文件中的字面字符串内容。
使用
```
byte_search
```
查找原始字节或操作码模式。

SQL Surfaces — String List

SQL界面 — 字符串列表

IDA maintains a cached list of strings. Use

rebuild_strings()

to detect and cache strings,

COUNT(*) FROM strings

for the current count, and

strings

for row-level analysis.

Surface	Description
`rebuild_strings()`	Rebuild with ASCII + UTF-16, minlen 5 (default)
`rebuild_strings(minlen)`	Rebuild with custom minimum length
`rebuild_strings(minlen, types)`	Rebuild with custom length and type mask
`SELECT COUNT(*) FROM strings`	Current string-list count (optimized without row materialization)

Type mask values:

```
1
```
= ASCII only (STRTYPE_C)
```
2
```
= UTF-16 only (STRTYPE_C_16)
```
4
```
= UTF-32 only (STRTYPE_C_32)
```
3
```
= ASCII + UTF-16 (default)
```
7
```
= All types

sql

-- Check current string count
SELECT COUNT(*) AS strings FROM strings;

-- Rebuild with defaults (ASCII + UTF-16, minlen 5)
SELECT rebuild_strings();

-- Rebuild with shorter minimum length
SELECT rebuild_strings(4);

-- Rebuild with specific types
SELECT rebuild_strings(5, 1);   -- ASCII only
SELECT rebuild_strings(5, 7);   -- All types (ASCII + UTF-16 + UTF-32)

-- Typical workflow: rebuild then query
SELECT rebuild_strings();
SELECT * FROM strings WHERE content LIKE '%error%';

IMPORTANT - Agent Behavior for String Queries: When the user asks about strings (e.g., "show me the strings", "what strings are in this binary"):

First run
```
SELECT rebuild_strings()
```
to ensure strings are detected
Then query the
```
strings
```
table

The

rebuild_strings()

function configures IDA's string detection with sensible defaults (ASCII + UTF-16, minimum length 5) and rebuilds the string list. This ensures the user gets results even if the database had no prior string analysis.

IDA维护一个缓存的字符串列表。使用

rebuild_strings()

检测并缓存字符串，

COUNT(*) FROM strings

查看当前数量，

strings

进行行级分析。

界面	描述
`rebuild_strings()`	重建字符串列表（默认包含ASCII + UTF-16，最小长度5）
`rebuild_strings(minlen)`	使用自定义最小长度重建字符串列表
`rebuild_strings(minlen, types)`	使用自定义长度和类型掩码重建字符串列表
`SELECT COUNT(*) FROM strings`	当前字符串列表数量（优化后无需行实例化）

类型掩码值：

```
1
```
= 仅ASCII（STRTYPE_C）
```
2
```
= 仅UTF-16（STRTYPE_C_16）
```
4
```
= 仅UTF-32（STRTYPE_C_32）
```
3
```
= ASCII + UTF-16（默认）
```
7
```
= 所有类型

sql

-- 查看当前字符串数量
SELECT COUNT(*) AS strings FROM strings;

-- 使用默认参数重建（ASCII + UTF-16，最小长度5）
SELECT rebuild_strings();

-- 使用更短的最小长度重建
SELECT rebuild_strings(4);

-- 使用特定类型重建
SELECT rebuild_strings(5, 1);   -- 仅ASCII
SELECT rebuild_strings(5, 7);   -- 所有类型（ASCII + UTF-16 + UTF-32）

-- 典型流程：重建后查询
SELECT rebuild_strings();
SELECT * FROM strings WHERE content LIKE '%error%';

重要提示 - 字符串查询的Agent行为： 当用户询问字符串相关问题时（例如“显示所有字符串”、“这个二进制文件中有哪些字符串”）：

首先执行
```
SELECT rebuild_strings()
```
确保字符串被检测到
然后查询
```
strings
```
表

rebuild_strings()

函数会使用合理默认值（ASCII + UTF-16，最小长度5）配置IDA的字符串检测，并重建字符串列表。即使数据库之前未进行字符串分析，也能确保用户获得结果。

Performance Rules

性能规则

Table/Function	Architecture	Notes
`COUNT(*) FROM strings`	Cached table count path	O(1) current string-list count
`strings`	Cached	Rebuilt on demand via `rebuild_strings()` ; fast once cached
`byte_search`	Native binary search table	Much faster than iterating instructions table
`bytes()`	Direct read	O(1) per address, no table overhead

Key rules:

Always call
```
rebuild_strings()
```
before the first string query on a new database or after making code/data changes that may create new strings.
```
byte_search
```
uses IDA's native binary search engine; for "find functions containing opcode X", join matches to
```
funcs
```
by containment instead of scanning the
```
instructions
```
table.
```
strings
```
table is a snapshot of the cached string list. If IDA's analysis creates new strings after your initial query, call
```
rebuild_strings()
```
again.
For cross-referencing strings with functions, the
```
strings + xrefs + funcs
```
JOIN pattern is canonical — IDA's xref index makes the JOIN fast.

表/函数	架构	说明
`COUNT(*) FROM strings`	缓存表计数路径	O(1)时间复杂度获取当前字符串列表数量
`strings`	缓存型	通过 `rebuild_strings()` 按需重建；缓存后查询速度快
`byte_search`	原生二进制搜索表	比遍历指令表快得多
`bytes()`	直接读取	每个地址O(1)时间复杂度，无表开销

核心规则：

在新数据库首次查询字符串前，或在修改代码/数据可能产生新字符串后，务必调用
```
rebuild_strings()
```
。
```
byte_search
```
使用IDA原生二进制搜索引擎；对于“查找包含操作码X的函数”，将匹配结果与
```
funcs
```
表进行包含关联，而非扫描
```
instructions
```
表。
```
strings
```
表是缓存字符串列表的快照。若IDA分析在初始查询后生成了新字符串，需再次调用
```
rebuild_strings()
```
。
交叉引用字符串与函数时，
```
strings + xrefs + funcs
```
的关联模式是标准方式 — IDA的xref索引使关联操作速度快。

Advanced Data Patterns (CTEs)

高级数据模式（CTE）

Security-relevant string triage

安全相关字符串分类

Categorize strings by security relevance for rapid threat assessment:

sql

-- Categorize strings by security relevance
SELECT rebuild_strings();

WITH categorized AS (
    SELECT address, content,
        CASE
            WHEN content LIKE '%password%' OR content LIKE '%passwd%' OR content LIKE '%secret%'
                THEN 'credential'
            WHEN content LIKE '%http://%' OR content LIKE '%https://%' OR content LIKE '%ftp://%'
                THEN 'url'
            WHEN content LIKE '%error%' OR content LIKE '%fail%' OR content LIKE '%exception%'
                THEN 'error'
            WHEN content LIKE '%debug%' OR content LIKE '%trace%' OR content LIKE '%assert%'
                THEN 'debug'
            WHEN content LIKE '%.exe%' OR content LIKE '%.dll%' OR content LIKE '%.sys%'
                THEN 'file_path'
            WHEN content LIKE '%HKEY_%' OR content LIKE '%SOFTWARE\\%'
                THEN 'registry'
            ELSE 'other'
        END AS category
    FROM strings
    WHERE length >= 5
)
SELECT category, COUNT(*) AS count,
       GROUP_CONCAT(SUBSTR(content, 1, 60), ' | ') AS samples
FROM categorized
WHERE category != 'other'
GROUP BY category
ORDER BY count DESC;

按安全相关性对字符串分类，快速进行威胁评估：

sql

-- 按安全相关性分类字符串
SELECT rebuild_strings();

WITH categorized AS (
    SELECT address, content,
        CASE
            WHEN content LIKE '%password%' OR content LIKE '%passwd%' OR content LIKE '%secret%'
                THEN 'credential'
            WHEN content LIKE '%http://%' OR content LIKE '%https://%' OR content LIKE '%ftp://%'
                THEN 'url'
            WHEN content LIKE '%error%' OR content LIKE '%fail%' OR content LIKE '%exception%'
                THEN 'error'
            WHEN content LIKE '%debug%' OR content LIKE '%trace%' OR content LIKE '%assert%'
                THEN 'debug'
            WHEN content LIKE '%.exe%' OR content LIKE '%.dll%' OR content LIKE '%.sys%'
                THEN 'file_path'
            WHEN content LIKE '%HKEY_%' OR content LIKE '%SOFTWARE\\%'
                THEN 'registry'
            ELSE 'other'
        END AS category
    FROM strings
    WHERE length >= 5
)
SELECT category, COUNT(*) AS count,
       GROUP_CONCAT(SUBSTR(content, 1, 60), ' | ') AS samples
FROM categorized
WHERE category != 'other'
GROUP BY category
ORDER BY count DESC;

Combine string references with function size for suspicion scoring

结合字符串引用与函数大小进行可疑度评分

Functions referencing security-relevant strings AND having significant size are high-priority targets:

sql

-- Suspicious functions: reference interesting strings AND are non-trivial
WITH interesting_strings AS (
    SELECT address, content FROM strings
    WHERE content LIKE '%password%' OR content LIKE '%encrypt%'
       OR content LIKE '%decrypt%' OR content LIKE '%http%'
       OR content LIKE '%socket%' OR content LIKE '%connect%'
),
string_funcs AS (
    SELECT DISTINCT (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) AS func_addr,
           s.content AS matched_string
    FROM interesting_strings s
    JOIN xrefs x ON x.to_ea = s.address
    WHERE (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) IS NOT NULL
)
SELECT (SELECT name FROM funcs WHERE sf.func_addr >= address AND sf.func_addr < end_ea LIMIT 1) AS func_name,
       printf('0x%X', sf.func_addr) AS addr,
       f.size AS func_size,
       GROUP_CONCAT(sf.matched_string, ' | ') AS strings_referenced
FROM string_funcs sf
JOIN funcs f ON f.address = sf.func_addr
GROUP BY sf.func_addr
ORDER BY f.size DESC
LIMIT 20;

引用安全相关字符串且体积较大的函数是高优先级分析目标：

sql

-- 可疑函数：引用敏感字符串且体积较大
WITH interesting_strings AS (
    SELECT address, content FROM strings
    WHERE content LIKE '%password%' OR content LIKE '%encrypt%'
       OR content LIKE '%decrypt%' OR content LIKE '%http%'
       OR content LIKE '%socket%' OR content LIKE '%connect%'
),
string_funcs AS (
    SELECT DISTINCT (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) AS func_addr,
           s.content AS matched_string
    FROM interesting_strings s
    JOIN xrefs x ON x.to_ea = s.address
    WHERE (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) IS NOT NULL
)
SELECT (SELECT name FROM funcs WHERE sf.func_addr >= address AND sf.func_addr < end_ea LIMIT 1) AS func_name,
       printf('0x%X', sf.func_addr) AS addr,
       f.size AS func_size,
       GROUP_CONCAT(sf.matched_string, ' | ') AS strings_referenced
FROM string_funcs sf
JOIN funcs f ON f.address = sf.func_addr
GROUP BY sf.func_addr
ORDER BY f.size DESC
LIMIT 20;

Find functions referencing both crypto-related and network-related strings

查找同时引用加密相关和网络相关字符串的函数

Cross-category correlation for identifying data exfiltration or C2 communication:

sql

-- Functions touching both crypto and network strings
WITH crypto_refs AS (
    SELECT DISTINCT (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) AS func_addr
    FROM strings s JOIN xrefs x ON x.to_ea = s.address
    WHERE s.content LIKE '%crypt%' OR s.content LIKE '%aes%'
       OR s.content LIKE '%cipher%' OR s.content LIKE '%hash%'
),
network_refs AS (
    SELECT DISTINCT (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) AS func_addr
    FROM strings s JOIN xrefs x ON x.to_ea = s.address
    WHERE s.content LIKE '%socket%' OR s.content LIKE '%connect%'
       OR s.content LIKE '%send%' OR s.content LIKE '%recv%'
       OR s.content LIKE '%http%'
)
SELECT (SELECT name FROM funcs WHERE c.func_addr >= address AND c.func_addr < end_ea LIMIT 1) AS func_name,
       printf('0x%X', c.func_addr) AS addr
FROM crypto_refs c
JOIN network_refs n ON n.func_addr = c.func_addr;

跨类别关联以识别数据泄露或C2通信行为：

sql

-- 同时涉及加密和网络字符串的函数
WITH crypto_refs AS (
    SELECT DISTINCT (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) AS func_addr
    FROM strings s JOIN xrefs x ON x.to_ea = s.address
    WHERE s.content LIKE '%crypt%' OR s.content LIKE '%aes%'
       OR s.content LIKE '%cipher%' OR s.content LIKE '%hash%'
),
network_refs AS (
    SELECT DISTINCT (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) AS func_addr
    FROM strings s JOIN xrefs x ON x.to_ea = s.address
    WHERE s.content LIKE '%socket%' OR s.content LIKE '%connect%'
       OR s.content LIKE '%send%' OR s.content LIKE '%recv%'
       OR s.content LIKE '%http%'
)
SELECT (SELECT name FROM funcs WHERE c.func_addr >= address AND c.func_addr < end_ea LIMIT 1) AS func_name,
       printf('0x%X', c.func_addr) AS addr
FROM crypto_refs c
JOIN network_refs n ON n.func_addr = c.func_addr;