data

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese


Trigger Intents

触发场景

Use this skill when user asks to:
  • search strings/bytes/patterns quickly
  • map string evidence to code usage
  • investigate raw data-level indicators (IOCs, constants, signatures)
Route to:
  • xrefs
    for relationship expansion from matched addresses
  • analysis
    for triage synthesis from data signals
  • debugger
    when findings should drive patch/breakpoint actions

当用户提出以下需求时使用本技能:
  • 快速搜索字符串/字节/模式
  • 将字符串线索映射到代码使用场景
  • 调查原始数据级别的指示器(IOC、常量、签名)
流转至:
  • xrefs
    :从匹配地址扩展关系分析
  • analysis
    :基于数据信号进行分类综合分析
  • debugger
    :根据分析结果执行补丁/断点操作

Do This First (Warm-Start Sequence)

先执行以下步骤(预热流程)

sql
-- 1) Validate string availability
SELECT COUNT(*) AS strings FROM strings;

-- 2) Sample high-value strings
SELECT content, printf('0x%X', address) AS addr
FROM strings
WHERE length >= 8
ORDER BY length DESC
LIMIT 40;

-- 3) If expected strings are missing, rebuild once
SELECT rebuild_strings();
Interpretation guidance:
  • Strings are often quickest behavioral clues; pivot to
    xrefs
    immediately for execution context.
  • For opcode/pattern hunts, prefer the
    byte_search
    table over full instruction scans.

sql
-- 1) 验证字符串可用性
SELECT COUNT(*) AS strings FROM strings;

-- 2) 采样高价值字符串
SELECT content, printf('0x%X', address) AS addr
FROM strings
WHERE length >= 8
ORDER BY length DESC
LIMIT 40;

-- 3) 若预期字符串缺失,重建一次字符串表
SELECT rebuild_strings();
解读指引:
  • 字符串通常是最快捷的行为线索,应立即转向
    xrefs
    获取执行上下文。
  • 对于操作码/模式搜索,优先使用
    byte_search
    表而非全指令扫描。

Failure and Recovery

故障排查与恢复

  • No strings or unexpectedly low count:
    • Run
      rebuild_strings()
      and validate with
      COUNT(*) FROM strings
      .
  • Too many false positives:
    • Increase specificity (
      LIKE
      , regex-like pattern narrowing, module/function join filters).
  • Byte pattern search too broad:
    • Restrict by range or join matched byte addresses to
      funcs
      .
  • Need named functions, labels, structs, or members instead of string contents:
    • Use
      grep
      , not
      strings
      or
      byte_search
      .

  • 无字符串或数量异常偏低:
    • 执行
      rebuild_strings()
      ,并通过
      COUNT(*) FROM strings
      验证结果。
  • 误报过多:
    • 提高查询特异性(使用
      LIKE
      、类正则模式缩小范围、模块/函数关联过滤)。
  • 字节模式搜索范围过广:
    • 通过地址范围限制,或将匹配的字节地址与
      funcs
      表关联。
  • 需要查找命名函数、标签、结构体或成员而非字符串内容:
    • 使用
      grep
      ,而非
      strings
      byte_search

Handoff Patterns

流转模式

  1. data
    ->
    xrefs
    to convert data hits into code paths/functions.
  2. data
    ->
    analysis
    for risk scoring and campaign-level insight.
  3. data
    ->
    debugger
    for actionable patch/watchpoint setup.

  1. data
    ->
    xrefs
    :将数据命中结果转换为代码路径/函数。
  2. data
    ->
    analysis
    :进行风险评分和战役级洞察分析。
  3. data
    ->
    debugger
    :设置可执行的补丁/监视点。

strings

strings

String literals found in the binary. IDA maintains a cached string list that can be configured.
ColumnTypeDescription
address
INTString address
length
INTString length
type
INTString type (raw encoding bits)
type_name
TEXTType name: ascii, utf16, utf32
width
INTChar width (0=1-byte, 1=2-byte, 2=4-byte)
width_name
TEXTWidth name: 1-byte, 2-byte, 4-byte
layout
INTString layout (0=null-terminated, 1-3=pascal)
layout_name
TEXTLayout name: termchr, pascal1, pascal2, pascal4
encoding
INTEncoding index (0=default)
content
TEXTString content (the actual text — not
value
or
text
)
String Type Encoding: IDA stores string type as a 32-bit value:
  • Bits 0-1: Width (0=1B/ASCII, 1=2B/UTF-16, 2=4B/UTF-32)
  • Bits 2-7: Layout (0=TERMCHR, 1=PASCAL1, 2=PASCAL2, 3=PASCAL4)
  • Bits 8-15: term1 (first termination character)
  • Bits 16-23: term2 (second termination character)
  • Bits 24-31: encoding index
sql
-- Find error messages
SELECT content, printf('0x%X', address) as addr FROM strings WHERE content LIKE '%error%';

-- ASCII strings only
SELECT * FROM strings WHERE type_name = 'ascii';

-- UTF-16 strings (common in Windows)
SELECT * FROM strings WHERE type_name = 'utf16';

-- Count strings by type
SELECT type_name, layout_name, COUNT(*) as count
FROM strings GROUP BY type_name, layout_name ORDER BY count DESC;
Important: For new analysis (exe/dll), strings are auto-built. For existing databases (i64/idb), strings are already saved. If you see 0 strings unexpectedly, run
SELECT rebuild_strings()
once to rebuild the list. See String List Surfaces section below.

二进制文件中的字符串字面量。IDA会维护一个可配置的缓存字符串列表。
列名类型描述
address
INT字符串地址
length
INT字符串长度
type
INT字符串类型(原始编码位)
type_name
TEXT类型名称:ascii, utf16, utf32
width
INT字符宽度(0=1字节, 1=2字节, 2=4字节)
width_name
TEXT宽度名称:1-byte, 2-byte, 4-byte
layout
INT字符串布局(0=空终止, 1-3=pascal格式)
layout_name
TEXT布局名称:termchr, pascal1, pascal2, pascal4
encoding
INT编码索引(0=默认)
content
TEXT字符串内容(实际文本 — 不是
value
text
字符串类型编码: IDA将字符串类型存储为32位值:
  • 位0-1:宽度(0=1字节/ASCII, 1=2字节/UTF-16, 2=4字节/UTF-32)
  • 位2-7:布局(0=TERMCHR, 1=PASCAL1, 2=PASCAL2, 3=PASCAL4)
  • 位8-15:term1(第一个终止字符)
  • 位16-23:term2(第二个终止字符)
  • 位24-31:编码索引
sql
-- 查找错误信息
SELECT content, printf('0x%X', address) as addr FROM strings WHERE content LIKE '%error%';

-- 仅查找ASCII字符串
SELECT * FROM strings WHERE type_name = 'ascii';

-- UTF-16字符串(Windows系统中常见)
SELECT * FROM strings WHERE type_name = 'utf16';

-- 按类型统计字符串数量
SELECT type_name, layout_name, COUNT(*) as count
FROM strings GROUP BY type_name, layout_name ORDER BY count DESC;
重要提示: 对于新分析的文件(exe/dll),字符串会自动生成。对于现有数据库(i64/idb),字符串已保存。若意外出现0条字符串,执行一次
SELECT rebuild_strings()
重建列表。详见下方“字符串列表界面”章节。

String References (explicit join pattern)

字符串引用(显式关联模式)

Use
strings + xrefs + funcs
directly. This is the canonical pattern.
sql
-- Find call sites/functions referencing error-like strings
SELECT
    s.content as string_value,
    printf('0x%X', x.from_ea) as ref_addr,
    (SELECT name FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) as func_name
FROM strings s
JOIN xrefs x ON x.to_ea = s.address
WHERE s.content LIKE '%error%' OR s.content LIKE '%fail%'
ORDER BY func_name, ref_addr;

-- Functions with most string references
SELECT
    (SELECT name FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) as func_name,
    COUNT(*) as string_refs
FROM strings s
JOIN xrefs x ON x.to_ea = s.address
GROUP BY func_name
ORDER BY string_refs DESC
LIMIT 10;

直接使用
strings + xrefs + funcs
组合,这是标准模式。
sql
-- 查找引用错误类字符串的调用位置/函数
SELECT
    s.content as string_value,
    printf('0x%X', x.from_ea) as ref_addr,
    (SELECT name FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) as func_name
FROM strings s
JOIN xrefs x ON x.to_ea = s.address
WHERE s.content LIKE '%error%' OR s.content LIKE '%fail%'
ORDER BY func_name, ref_addr;

-- 引用字符串最多的函数
SELECT
    (SELECT name FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) as func_name,
    COUNT(*) as string_refs
FROM strings s
JOIN xrefs x ON x.to_ea = s.address
GROUP BY func_name
ORDER BY string_refs DESC
LIMIT 10;

SQL Functions — Byte Access (Read-Only)

SQL函数 — 字节访问(只读)

FunctionDescription
bytes(addr, n)
Read
n
raw bytes as hex string
bytes_raw(addr, n)
Read
n
bytes as BLOB
For row-shaped byte workflows,
bytes
is a pure mapped-byte table:
sql
SELECT ea, value
FROM bytes
WHERE ea >= 0x401000 AND ea < 0x401010
ORDER BY ea;
Use
heads
when you need IDA item size/type metadata.

函数描述
bytes(addr, n)
读取
n
个原始字节并以十六进制字符串返回
bytes_raw(addr, n)
读取
n
个字节并以BLOB格式返回
对于行式字节处理流程,
bytes
是纯映射字节表:
sql
SELECT ea, value
FROM bytes
WHERE ea >= 0x401000 AND ea < 0x401010
ORDER BY ea;
当需要IDA项的大小/类型元数据时,使用
heads

Binary Search Table

二进制搜索表

Use
byte_search
for raw bytes/opcodes. It requires
WHERE pattern = ...
;
matched_hex
is an output column, not the search input.
ColumnTypeDescription
address
INTMatch address
matched_hex
TEXTMatched bytes rendered as hex text
matched_bytes
BLOBMatched bytes as raw bytes
size
INTMatch size in bytes
pattern
HIDDEN TEXTRequired IDA byte pattern input
start_ea
HIDDEN INTOptional inclusive lower bound
end_ea
HIDDEN INTOptional exclusive upper bound
max_results
HIDDEN INTOptional generator cap
Pattern syntax (IDA native):
  • "48 8B 05"
    - Exact bytes (hex, space-separated)
  • "48 ? 05"
    or
    "48 ?? 05"
    -
    ?
    = any byte wildcard (whole byte only)
  • "(01 02 03)"
    - Alternatives (match any of these bytes)
Note: Nibble wildcards and regex are not supported in byte patterns.
Example:
sql
-- Find all matches for a pattern
SELECT address, matched_hex, size
FROM byte_search
WHERE pattern = '48 8B ? 00'
LIMIT 10;

-- First match only
SELECT printf('0x%llX', address) AS addr
FROM byte_search
WHERE pattern = 'CC CC CC'
ORDER BY address
LIMIT 1;

-- Search with alternatives
SELECT address, matched_hex
FROM byte_search
WHERE pattern = 'E8 (01 02 03 04)'
LIMIT 20;
Optimization Pattern: Find functions using specific instruction
To answer "How many functions use RDTSC instruction?" efficiently:
sql
-- Count unique functions containing RDTSC (opcode: 0F 31)
SELECT COUNT(DISTINCT f.address) as count
FROM byte_search b
JOIN funcs f ON b.address >= f.address AND b.address < f.end_ea
WHERE b.pattern = '0F 31';

-- List those functions with names
SELECT DISTINCT
    f.address as func_ea,
    f.name as func_name
FROM byte_search b
JOIN funcs f ON b.address >= f.address AND b.address < f.end_ea
WHERE b.pattern = '0F 31';
This is much faster than scanning all disassembly lines because:
  • byte_search
    uses IDA's native binary search
  • the containment join uses the compact
    funcs
    table instead of scanning every instruction

使用
byte_search
查找原始字节/操作码。必须使用
WHERE pattern = ...
matched_hex
是输出列,而非搜索输入。
列名类型描述
address
INT匹配地址
matched_hex
TEXT匹配字节的十六进制文本形式
matched_bytes
BLOB匹配字节的原始格式
size
INT匹配结果的字节大小
pattern
HIDDEN TEXT必填的IDA字节模式输入
start_ea
HIDDEN INT可选的包含性下界
end_ea
HIDDEN INT可选的排他性上界
max_results
HIDDEN INT可选的结果数量上限
模式语法(IDA原生):
  • "48 8B 05"
    - 精确字节(十六进制,空格分隔)
  • "48 ? 05"
    "48 ?? 05"
    -
    ?
    = 任意字节通配符(仅支持整字节)
  • "(01 02 03)"
    - 备选模式(匹配其中任意一组字节)
注意: 字节模式不支持半字节通配符和正则表达式。
示例:
sql
-- 查找某模式的所有匹配结果
SELECT address, matched_hex, size
FROM byte_search
WHERE pattern = '48 8B ? 00'
LIMIT 10;

-- 仅查找第一个匹配结果
SELECT printf('0x%llX', address) AS addr
FROM byte_search
WHERE pattern = 'CC CC CC'
ORDER BY address
LIMIT 1;

-- 使用备选模式搜索
SELECT address, matched_hex
FROM byte_search
WHERE pattern = 'E8 (01 02 03 04)'
LIMIT 20;
优化模式:查找使用特定指令的函数
高效回答“有多少函数使用RDTSC指令?”:
sql
-- 统计包含RDTSC(操作码:0F 31)的唯一函数数量
SELECT COUNT(DISTINCT f.address) as count
FROM byte_search b
JOIN funcs f ON b.address >= f.address AND b.address < f.end_ea
WHERE b.pattern = '0F 31';

-- 列出这些函数及其名称
SELECT DISTINCT
    f.address as func_ea,
    f.name as func_name
FROM byte_search b
JOIN funcs f ON b.address >= f.address AND b.address < f.end_ea
WHERE b.pattern = '0F 31';
这种方式比扫描所有反汇编行快得多,原因如下:
  • byte_search
    使用IDA原生二进制搜索
  • 包含关联使用紧凑的
    funcs
    表,而非扫描每条指令

Choose the Right Search Surface

选择合适的搜索界面

  • Use
    grep
    for named entities such as functions, labels, structs, enums, and members.
  • Use
    strings
    when the user is searching literal string contents inside the binary.
  • Use
    byte_search
    when the target is a raw byte or opcode pattern.

  • 使用
    grep
    查找命名实体,如函数、标签、结构体、枚举及成员。
  • 使用
    strings
    查找二进制文件中的字面字符串内容。
  • 使用
    byte_search
    查找原始字节或操作码模式。

SQL Surfaces — String List

SQL界面 — 字符串列表

IDA maintains a cached list of strings. Use
rebuild_strings()
to detect and cache strings,
COUNT(*) FROM strings
for the current count, and
strings
for row-level analysis.
SurfaceDescription
rebuild_strings()
Rebuild with ASCII + UTF-16, minlen 5 (default)
rebuild_strings(minlen)
Rebuild with custom minimum length
rebuild_strings(minlen, types)
Rebuild with custom length and type mask
SELECT COUNT(*) FROM strings
Current string-list count (optimized without row materialization)
Type mask values:
  • 1
    = ASCII only (STRTYPE_C)
  • 2
    = UTF-16 only (STRTYPE_C_16)
  • 4
    = UTF-32 only (STRTYPE_C_32)
  • 3
    = ASCII + UTF-16 (default)
  • 7
    = All types
sql
-- Check current string count
SELECT COUNT(*) AS strings FROM strings;

-- Rebuild with defaults (ASCII + UTF-16, minlen 5)
SELECT rebuild_strings();

-- Rebuild with shorter minimum length
SELECT rebuild_strings(4);

-- Rebuild with specific types
SELECT rebuild_strings(5, 1);   -- ASCII only
SELECT rebuild_strings(5, 7);   -- All types (ASCII + UTF-16 + UTF-32)

-- Typical workflow: rebuild then query
SELECT rebuild_strings();
SELECT * FROM strings WHERE content LIKE '%error%';
IMPORTANT - Agent Behavior for String Queries: When the user asks about strings (e.g., "show me the strings", "what strings are in this binary"):
  1. First run
    SELECT rebuild_strings()
    to ensure strings are detected
  2. Then query the
    strings
    table
The
rebuild_strings()
function configures IDA's string detection with sensible defaults (ASCII + UTF-16, minimum length 5) and rebuilds the string list. This ensures the user gets results even if the database had no prior string analysis.

IDA维护一个缓存的字符串列表。使用
rebuild_strings()
检测并缓存字符串,
COUNT(*) FROM strings
查看当前数量,
strings
进行行级分析。
界面描述
rebuild_strings()
重建字符串列表(默认包含ASCII + UTF-16,最小长度5)
rebuild_strings(minlen)
使用自定义最小长度重建字符串列表
rebuild_strings(minlen, types)
使用自定义长度和类型掩码重建字符串列表
SELECT COUNT(*) FROM strings
当前字符串列表数量(优化后无需行实例化)
类型掩码值:
  • 1
    = 仅ASCII(STRTYPE_C)
  • 2
    = 仅UTF-16(STRTYPE_C_16)
  • 4
    = 仅UTF-32(STRTYPE_C_32)
  • 3
    = ASCII + UTF-16(默认)
  • 7
    = 所有类型
sql
-- 查看当前字符串数量
SELECT COUNT(*) AS strings FROM strings;

-- 使用默认参数重建(ASCII + UTF-16,最小长度5)
SELECT rebuild_strings();

-- 使用更短的最小长度重建
SELECT rebuild_strings(4);

-- 使用特定类型重建
SELECT rebuild_strings(5, 1);   -- 仅ASCII
SELECT rebuild_strings(5, 7);   -- 所有类型(ASCII + UTF-16 + UTF-32)

-- 典型流程:重建后查询
SELECT rebuild_strings();
SELECT * FROM strings WHERE content LIKE '%error%';
重要提示 - 字符串查询的Agent行为: 当用户询问字符串相关问题时(例如“显示所有字符串”、“这个二进制文件中有哪些字符串”):
  1. 首先执行
    SELECT rebuild_strings()
    确保字符串被检测到
  2. 然后查询
    strings
rebuild_strings()
函数会使用合理默认值(ASCII + UTF-16,最小长度5)配置IDA的字符串检测,并重建字符串列表。即使数据库之前未进行字符串分析,也能确保用户获得结果。

Performance Rules

性能规则

Table/FunctionArchitectureNotes
COUNT(*) FROM strings
Cached table count pathO(1) current string-list count
strings
CachedRebuilt on demand via
rebuild_strings()
; fast once cached
byte_search
Native binary search tableMuch faster than iterating instructions table
bytes()
Direct readO(1) per address, no table overhead
Key rules:
  • Always call
    rebuild_strings()
    before the first string query on a new database or after making code/data changes that may create new strings.
  • byte_search
    uses IDA's native binary search engine; for "find functions containing opcode X", join matches to
    funcs
    by containment instead of scanning the
    instructions
    table.
  • strings
    table is a snapshot of the cached string list. If IDA's analysis creates new strings after your initial query, call
    rebuild_strings()
    again.
  • For cross-referencing strings with functions, the
    strings + xrefs + funcs
    JOIN pattern is canonical — IDA's xref index makes the JOIN fast.

表/函数架构说明
COUNT(*) FROM strings
缓存表计数路径O(1)时间复杂度获取当前字符串列表数量
strings
缓存型通过
rebuild_strings()
按需重建;缓存后查询速度快
byte_search
原生二进制搜索表比遍历指令表快得多
bytes()
直接读取每个地址O(1)时间复杂度,无表开销
核心规则:
  • 在新数据库首次查询字符串前,或在修改代码/数据可能产生新字符串后,务必调用
    rebuild_strings()
  • byte_search
    使用IDA原生二进制搜索引擎;对于“查找包含操作码X的函数”,将匹配结果与
    funcs
    表进行包含关联,而非扫描
    instructions
    表。
  • strings
    表是缓存字符串列表的快照。若IDA分析在初始查询后生成了新字符串,需再次调用
    rebuild_strings()
  • 交叉引用字符串与函数时,
    strings + xrefs + funcs
    的关联模式是标准方式 — IDA的xref索引使关联操作速度快。

Advanced Data Patterns (CTEs)

高级数据模式(CTE)

Security-relevant string triage

安全相关字符串分类

Categorize strings by security relevance for rapid threat assessment:
sql
-- Categorize strings by security relevance
SELECT rebuild_strings();

WITH categorized AS (
    SELECT address, content,
        CASE
            WHEN content LIKE '%password%' OR content LIKE '%passwd%' OR content LIKE '%secret%'
                THEN 'credential'
            WHEN content LIKE '%http://%' OR content LIKE '%https://%' OR content LIKE '%ftp://%'
                THEN 'url'
            WHEN content LIKE '%error%' OR content LIKE '%fail%' OR content LIKE '%exception%'
                THEN 'error'
            WHEN content LIKE '%debug%' OR content LIKE '%trace%' OR content LIKE '%assert%'
                THEN 'debug'
            WHEN content LIKE '%.exe%' OR content LIKE '%.dll%' OR content LIKE '%.sys%'
                THEN 'file_path'
            WHEN content LIKE '%HKEY_%' OR content LIKE '%SOFTWARE\\%'
                THEN 'registry'
            ELSE 'other'
        END AS category
    FROM strings
    WHERE length >= 5
)
SELECT category, COUNT(*) AS count,
       GROUP_CONCAT(SUBSTR(content, 1, 60), ' | ') AS samples
FROM categorized
WHERE category != 'other'
GROUP BY category
ORDER BY count DESC;
按安全相关性对字符串分类,快速进行威胁评估:
sql
-- 按安全相关性分类字符串
SELECT rebuild_strings();

WITH categorized AS (
    SELECT address, content,
        CASE
            WHEN content LIKE '%password%' OR content LIKE '%passwd%' OR content LIKE '%secret%'
                THEN 'credential'
            WHEN content LIKE '%http://%' OR content LIKE '%https://%' OR content LIKE '%ftp://%'
                THEN 'url'
            WHEN content LIKE '%error%' OR content LIKE '%fail%' OR content LIKE '%exception%'
                THEN 'error'
            WHEN content LIKE '%debug%' OR content LIKE '%trace%' OR content LIKE '%assert%'
                THEN 'debug'
            WHEN content LIKE '%.exe%' OR content LIKE '%.dll%' OR content LIKE '%.sys%'
                THEN 'file_path'
            WHEN content LIKE '%HKEY_%' OR content LIKE '%SOFTWARE\\%'
                THEN 'registry'
            ELSE 'other'
        END AS category
    FROM strings
    WHERE length >= 5
)
SELECT category, COUNT(*) AS count,
       GROUP_CONCAT(SUBSTR(content, 1, 60), ' | ') AS samples
FROM categorized
WHERE category != 'other'
GROUP BY category
ORDER BY count DESC;

Combine string references with function size for suspicion scoring

结合字符串引用与函数大小进行可疑度评分

Functions referencing security-relevant strings AND having significant size are high-priority targets:
sql
-- Suspicious functions: reference interesting strings AND are non-trivial
WITH interesting_strings AS (
    SELECT address, content FROM strings
    WHERE content LIKE '%password%' OR content LIKE '%encrypt%'
       OR content LIKE '%decrypt%' OR content LIKE '%http%'
       OR content LIKE '%socket%' OR content LIKE '%connect%'
),
string_funcs AS (
    SELECT DISTINCT (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) AS func_addr,
           s.content AS matched_string
    FROM interesting_strings s
    JOIN xrefs x ON x.to_ea = s.address
    WHERE (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) IS NOT NULL
)
SELECT (SELECT name FROM funcs WHERE sf.func_addr >= address AND sf.func_addr < end_ea LIMIT 1) AS func_name,
       printf('0x%X', sf.func_addr) AS addr,
       f.size AS func_size,
       GROUP_CONCAT(sf.matched_string, ' | ') AS strings_referenced
FROM string_funcs sf
JOIN funcs f ON f.address = sf.func_addr
GROUP BY sf.func_addr
ORDER BY f.size DESC
LIMIT 20;
引用安全相关字符串且体积较大的函数是高优先级分析目标:
sql
-- 可疑函数:引用敏感字符串且体积较大
WITH interesting_strings AS (
    SELECT address, content FROM strings
    WHERE content LIKE '%password%' OR content LIKE '%encrypt%'
       OR content LIKE '%decrypt%' OR content LIKE '%http%'
       OR content LIKE '%socket%' OR content LIKE '%connect%'
),
string_funcs AS (
    SELECT DISTINCT (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) AS func_addr,
           s.content AS matched_string
    FROM interesting_strings s
    JOIN xrefs x ON x.to_ea = s.address
    WHERE (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) IS NOT NULL
)
SELECT (SELECT name FROM funcs WHERE sf.func_addr >= address AND sf.func_addr < end_ea LIMIT 1) AS func_name,
       printf('0x%X', sf.func_addr) AS addr,
       f.size AS func_size,
       GROUP_CONCAT(sf.matched_string, ' | ') AS strings_referenced
FROM string_funcs sf
JOIN funcs f ON f.address = sf.func_addr
GROUP BY sf.func_addr
ORDER BY f.size DESC
LIMIT 20;

Find functions referencing both crypto-related and network-related strings

查找同时引用加密相关和网络相关字符串的函数

Cross-category correlation for identifying data exfiltration or C2 communication:
sql
-- Functions touching both crypto and network strings
WITH crypto_refs AS (
    SELECT DISTINCT (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) AS func_addr
    FROM strings s JOIN xrefs x ON x.to_ea = s.address
    WHERE s.content LIKE '%crypt%' OR s.content LIKE '%aes%'
       OR s.content LIKE '%cipher%' OR s.content LIKE '%hash%'
),
network_refs AS (
    SELECT DISTINCT (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) AS func_addr
    FROM strings s JOIN xrefs x ON x.to_ea = s.address
    WHERE s.content LIKE '%socket%' OR s.content LIKE '%connect%'
       OR s.content LIKE '%send%' OR s.content LIKE '%recv%'
       OR s.content LIKE '%http%'
)
SELECT (SELECT name FROM funcs WHERE c.func_addr >= address AND c.func_addr < end_ea LIMIT 1) AS func_name,
       printf('0x%X', c.func_addr) AS addr
FROM crypto_refs c
JOIN network_refs n ON n.func_addr = c.func_addr;
跨类别关联以识别数据泄露或C2通信行为:
sql
-- 同时涉及加密和网络字符串的函数
WITH crypto_refs AS (
    SELECT DISTINCT (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) AS func_addr
    FROM strings s JOIN xrefs x ON x.to_ea = s.address
    WHERE s.content LIKE '%crypt%' OR s.content LIKE '%aes%'
       OR s.content LIKE '%cipher%' OR s.content LIKE '%hash%'
),
network_refs AS (
    SELECT DISTINCT (SELECT address FROM funcs WHERE x.from_ea >= address AND x.from_ea < end_ea LIMIT 1) AS func_addr
    FROM strings s JOIN xrefs x ON x.to_ea = s.address
    WHERE s.content LIKE '%socket%' OR s.content LIKE '%connect%'
       OR s.content LIKE '%send%' OR s.content LIKE '%recv%'
       OR s.content LIKE '%http%'
)
SELECT (SELECT name FROM funcs WHERE c.func_addr >= address AND c.func_addr < end_ea LIMIT 1) AS func_name,
       printf('0x%X', c.func_addr) AS addr
FROM crypto_refs c
JOIN network_refs n ON n.func_addr = c.func_addr;