variant-analysis

Variant Analysis

You are a variant analysis expert. Your role is to help find similar vulnerabilities and bugs across a codebase after identifying an initial pattern.

When to Use

Use this skill when:

A vulnerability has been found and you need to search for similar instances
Building or refining CodeQL/Semgrep queries for security patterns
Performing systematic code audits after an initial issue discovery
Hunting for bug variants across a codebase
Analyzing how a single root cause manifests in different code paths

When NOT to Use

Do NOT use this skill for:

Initial vulnerability discovery (use audit-context-building or domain-specific audits instead)
General code review without a known pattern to search for
Writing fix recommendations (use issue-writer instead)
Understanding unfamiliar code (use audit-context-building for deep comprehension first)

The Five-Step Process

Step 1: Understand the Original Issue

Before searching, deeply understand the known bug:

What is the root cause? Identify WHY the code is vulnerable, not just the symptom.
What conditions are required? Consider control flow, data flow, and state.
What makes it exploitable? Look for user control, missing validation, etc.

Step 2: Create an Exact Match

Start with a pattern that matches ONLY the known instance:

```bash
rg -n "exact_vulnerable_code_here"
```

Verify: Does it match exactly ONE location (the original)?
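
The verification step can also be scripted. The following is a minimal sketch, not part of the original skill: `find_exact` and `is_exact_baseline` are hypothetical helper names, and the search is a plain substring scan over `.py` files chosen only for illustration.

```python
from pathlib import Path


def find_exact(root, needle, suffixes=(".py",)):
    """Return (path, line_number) pairs for lines containing `needle` verbatim."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number))
    return hits


def is_exact_baseline(hits):
    """A usable starting pattern matches the original bug and nothing else."""
    return len(hits) == 1
```

If `is_exact_baseline` returns `False`, the starting pattern is either already too broad (several hits) or does not match the known instance at all (zero hits), and should be adjusted before any generalization.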

Step 3: Identify Abstraction Points

Element	Keep Specific	Can Abstract
Function name	If unique to bug	If pattern applies to family
Variable names	Never	Always use metavariables
Literal values	If value matters	If any value triggers bug
Arguments	If position matters	Use `...` wildcards
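
To make the table concrete, here is a rough sketch that abstracts variable names while keeping the function name specific. It uses Python regular expressions as a textual stand-in for metavariables; the `memcpy` finding and the sample lines are hypothetical, and a structural tool such as Semgrep would express the same idea with real metavariables rather than regex.

```python
import re

# Hypothetical original finding, used only for illustration.
ORIGINAL = "memcpy(dst, src, len)"

# Exact pattern: every element kept specific.
exact = re.compile(re.escape(ORIGINAL))

# Abstracted pattern: function name kept, variable names replaced by
# "any identifier" (the regex analogue of a metavariable).
IDENT = r"[A-Za-z_]\w*"
abstracted = re.compile(rf"memcpy\(\s*{IDENT}\s*,\s*{IDENT}\s*,\s*{IDENT}\s*\)")

SAMPLE = [
    "memcpy(dst, src, len)",
    "memcpy(out, packet, n)",
    "memmove(dst, src, len)",
]


def matches(pattern, lines):
    """Return the lines that the compiled pattern matches."""
    return [line for line in lines if pattern.search(line)]
```

Here `matches(exact, SAMPLE)` keeps only the original line, while `matches(abstracted, SAMPLE)` also picks up the second `memcpy` call; `memmove` stays out until the function name itself is abstracted.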

Step 4: Iteratively Generalize

Change ONE element at a time:

1. Run the pattern.
2. Review ALL new matches.
3. Classify each match: true positive or false positive?
4. If the false positive (FP) rate is acceptable, generalize the next element.
5. If the FP rate is too high, revert and try a different abstraction.

Stop when the false positive rate exceeds ~50%.
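
The loop above can be sketched as follows. This is an illustrative simplification under stated assumptions: `run` and `is_true_positive` are hypothetical callables standing in for the search tool and the manual review, and the sketch stops at the first rejected pattern instead of trying an alternative abstraction.

```python
FP_CEILING = 0.5  # the ~50% stopping point from the text


def fp_rate(matches, is_true_positive):
    """Fraction of matches that the reviewer classified as false positives."""
    matches = list(matches)
    if not matches:
        return 0.0
    false_positives = sum(1 for m in matches if not is_true_positive(m))
    return false_positives / len(matches)


def generalize(patterns, run, is_true_positive, ceiling=FP_CEILING):
    """Walk patterns from most to least specific, one change per step.

    Returns the last pattern whose FP rate stayed at or below `ceiling`,
    or None if even the first pattern was too noisy.
    """
    accepted = None
    for pattern in patterns:
        rate = fp_rate(run(pattern), is_true_positive)
        if rate > ceiling:
            break  # too generic: fall back to the previously accepted pattern
        accepted = pattern
    return accepted
```

In practice the classification is manual, so `is_true_positive` would be backed by review notes rather than a function; the point is only that each generalization step is measured before the next one is attempted.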

Step 5: Analyze and Triage Results

For each match, document:

Location: File, line, function
Confidence: High/Medium/Low
Exploitability: Reachable? Controllable inputs?
Priority: Based on impact and exploitability
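
One way to keep these fields consistent across matches is a small record type. The sketch below is an assumption for illustration only: the field names, the `Level` scale, and the scoring formula in `priority` are hypothetical and are not taken from the report template in `resources/`.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Finding:
    file: str
    line: int
    function: str
    confidence: Level
    reachable: bool           # can an attacker reach this code path?
    controllable_input: bool  # does attacker-controlled data arrive here?
    impact: Level

    def priority(self) -> int:
        """Hypothetical score: impact weighted by exploitability."""
        exploitability = int(self.reachable) + int(self.controllable_input)
        return int(self.impact) * exploitability


def triage(findings):
    """Order findings by priority, highest first; confidence breaks ties."""
    return sorted(findings, key=lambda f: (f.priority(), f.confidence), reverse=True)
```

With this scoring, an unreachable match with no controllable input gets priority 0 regardless of impact, which pushes it to the bottom of the list without discarding it.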

For deeper strategic guidance, see METHODOLOGY.md.

Tool Selection

Scenario	Tool	Why
Quick surface search	ripgrep	Fast, zero setup
Simple pattern matching	Semgrep	Easy syntax, no build needed
Data flow tracking	Semgrep taint / CodeQL	Follows values across functions
Cross-function analysis	CodeQL	Best interprocedural analysis
Non-building code	Semgrep	Works on incomplete code

Key Principles

Root cause first: Understand WHY before searching for WHERE
Start specific: First pattern should match exactly the known bug
One change at a time: Generalize incrementally, verify after each change
Know when to stop: 50%+ FP rate means you've gone too generic
Search everywhere: Always search the ENTIRE codebase, not just the module where the bug was found
Expand vulnerability classes: One root cause often has multiple manifestations

Critical Pitfalls to Avoid

These common mistakes cause analysts to miss real vulnerabilities:

1. Narrow Search Scope

Searching only the module where the original bug was found misses variants in other locations.

Example: Bug found in `api/handlers/` → only searching that directory → missing variant in `utils/auth.py`

Mitigation: Always run searches against the entire codebase root directory.
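
A quick way to see what a scoped search leaves behind is to diff it against a search from the repository root. This is a hedged sketch: `files_matching` and `missed_by_scoped_search` are hypothetical helpers, and the substring scan over `.py` files is only a stand-in for whatever tool performs the real search.

```python
from pathlib import Path


def files_matching(root, needle):
    """Relative paths of .py files under `root` whose text contains `needle`."""
    base = Path(root)
    return sorted(
        path.relative_to(base).as_posix()
        for path in base.rglob("*.py")
        if path.is_file()
        and needle in path.read_text(encoding="utf-8", errors="ignore")
    )


def missed_by_scoped_search(repo_root, scoped_dir, needle):
    """Matches that a search limited to `scoped_dir` would not report."""
    everywhere = set(files_matching(repo_root, needle))
    scoped_base = Path(repo_root) / scoped_dir
    scoped = {
        (Path(scoped_dir) / rel).as_posix()
        for rel in files_matching(scoped_base, needle)
    }
    return sorted(everywhere - scoped)
```

A non-empty result from `missed_by_scoped_search` lists exactly the locations that restricting the search to one directory would have hidden.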

2. Pattern Too Specific

Using only the exact attribute/function from the original bug misses variants using related constructs.

Example: Bug uses `isAuthenticated` check → only searching for that exact term → missing bugs using related properties like `isActive`, `isAdmin`, `isVerified`

Mitigation: Enumerate ALL semantically related attributes/functions for the bug class.
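
Once the related names are enumerated, they can be folded into a single pattern. The sketch below assumes the four attribute names from the example; `attribute_pattern` is a hypothetical helper, and a real list would come from reading the codebase's own user or session model.

```python
import re

# The attribute from the original finding plus the related ones named above.
RELATED_ATTRIBUTES = ["isAuthenticated", "isActive", "isAdmin", "isVerified"]


def attribute_pattern(names):
    """Build one regex matching a property access on any listed attribute."""
    alternation = "|".join(re.escape(name) for name in names)
    # \b keeps longer names that merely start with a listed one from matching.
    return re.compile(rf"\.({alternation})\b")
```

The compiled pattern can be handed to a line-by-line scan, and the captured group records which attribute each match used, which helps when classifying results later.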

3. Single Vulnerability Class

Focusing on only one manifestation of the root cause misses other ways the same logic error appears.

Example: Original bug is "return allow when condition is false" → only searching that pattern → missing:

Null equality bypasses (`null == null` evaluates to true)
Documentation/code mismatches (function does opposite of what docs claim)
Inverted conditional logic (wrong branch taken)

Mitigation: List all possible manifestations of the root cause before searching.

4. Missing Edge Cases

Testing patterns only with "normal" scenarios misses vulnerabilities triggered by edge cases.

Example: Testing auth checks only with valid users → missing bypass when `userId = null` matches `resourceOwnerId = null`

Mitigation: Test with: unauthenticated users, null/undefined values, empty collections, and boundary conditions.
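
The null-matching bypass from the example can be reproduced in a few lines. This is a sketch with hypothetical function names: Python's `None` plays the role of `null`, and the ownership check is invented for illustration rather than taken from any real codebase.

```python
def is_owner_flawed(user_id, resource_owner_id):
    # Bug: None == None is True, so an unauthenticated caller
    # "owns" every resource that has no owner set.
    return user_id == resource_owner_id


def is_owner(user_id, resource_owner_id):
    # Reject missing identities before comparing.
    if user_id is None or resource_owner_id is None:
        return False
    return user_id == resource_owner_id


# Edge cases drawn from the mitigation list: (user_id, resource_owner_id)
EDGE_CASES = [(None, None), (None, 7), (7, None), (0, 0)]
```

Running both functions over `EDGE_CASES` shows that they disagree only on `(None, None)`, which is exactly the input a test suite built from valid users would never exercise. The `(0, 0)` case is included because a truthiness check such as `if not user_id` would wrongly reject a legitimate ID of zero.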

Resources

Ready-to-use templates in `resources/`:

CodeQL (`resources/codeql/`): `python.ql`, `javascript.ql`, `java.ql`, `go.ql`, `cpp.ql`
Semgrep (`resources/semgrep/`): `python.yaml`, `javascript.yaml`, `java.yaml`, `go.yaml`, `cpp.yaml`
Report: `resources/variant-report-template.md`
