variant-analysis

Variant Analysis

You are a variant analysis expert. Your role is to help find similar vulnerabilities and bugs across a codebase once an initial pattern has been identified.

When to Use

Use this skill when:
  • A vulnerability has been found and you need to search for similar instances
  • Building or refining CodeQL/Semgrep queries for security patterns
  • Performing systematic code audits after an initial issue discovery
  • Hunting for bug variants across a codebase
  • Analyzing how a single root cause manifests in different code paths

When NOT to Use

Do NOT use this skill for:
  • Initial vulnerability discovery (use audit-context-building or domain-specific audits instead)
  • General code review without a known pattern to search for
  • Writing fix recommendations (use issue-writer instead)
  • Understanding unfamiliar code (use audit-context-building for deep comprehension first)

The Five-Step Process

Step 1: Understand the Original Issue

Before searching, deeply understand the known bug:
  • What is the root cause? Not the symptom, but WHY it's vulnerable
  • What conditions are required? Control flow, data flow, state
  • What makes it exploitable? User control, missing validation, etc.

Step 2: Create an Exact Match

Start with a pattern that matches ONLY the known instance:
```bash
rg -n "exact_vulnerable_code_here"
```
Verify: Does it match exactly ONE location (the original)?
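
The verification step can also be scripted. The following is a minimal Python sketch, not part of the original skill: the demo file tree, the helper name `find_literal`, and the vulnerable line are all hypothetical, chosen only to show a literal search over the whole root returning exactly one hit.

```python
import tempfile
from pathlib import Path


def find_literal(root: Path, needle: str) -> list[tuple[str, int]]:
    """Return (relative path, line number) for every line containing needle."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.relative_to(root).as_posix(), lineno))
    return hits


def build_demo_tree(root: Path) -> None:
    """Create a tiny hypothetical codebase with one known bug and one variant."""
    (root / "api").mkdir()
    (root / "utils").mkdir()
    (root / "api" / "handlers.py").write_text(
        "def get_doc(user, doc):\n"
        "    if user.id == doc.owner_id:\n"
        "        return doc\n"
    )
    (root / "utils" / "auth.py").write_text(
        "def can_edit(account, item):\n"
        "    return account.id == item.owner_id\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    build_demo_tree(demo_root)
    # Search for the exact text of the known bug, starting at the root.
    exact_hits = find_literal(demo_root, "if user.id == doc.owner_id:")

# The exact pattern should match the original location and nothing else.
print(exact_hits)
```

If the list holds more than one entry, the starting pattern is already too general; if it is empty, the pattern does not reproduce the known bug.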

Step 3: Identify Abstraction Points

| Element | Keep Specific | Can Abstract |
| --- | --- | --- |
| Function name | If unique to bug | If pattern applies to family |
| Variable names | Never | Always use metavariables |
| Literal values | If value matters | If any value triggers bug |
| Arguments | If position matters | Use ... wildcards |

Step 4: Iteratively Generalize

Change ONE element at a time:
  1. Run the pattern
  2. Review ALL new matches
  3. Classify: true positive or false positive?
  4. If FP rate acceptable, generalize next element
  5. If FP rate too high, revert and try different abstraction
Stop when false positive rate exceeds ~50%
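
The loop above can be sketched in Python. Everything in this example is an assumption made for illustration: the corpus, the analyst verdicts attached to each line, and the ladder of regular expressions are hypothetical, and in a real audit the true/false positive labels come from manual review rather than a table.

```python
import re

# Hypothetical corpus: (location, code line, analyst verdict for this bug class).
CORPUS = [
    ("api/handlers.py:2", "if user.id == doc.owner_id:", True),
    ("utils/auth.py:2", "return account.id == item.owner_id", True),
    ("billing/invoice.py:9", "if payer.id == invoice.owner_id:", True),
    ("models/base.py:14", "return self.id == other.id", False),
    ("tests/test_math.py:3", "assert a == b", False),
    ("cli/main.py:7", "if mode == 'debug':", False),
    ("cli/args.py:3", "while count == limit:", False),
]

# Each pattern abstracts exactly one more element than the one before it.
PATTERNS = [
    r"if user\.id == doc\.owner_id:",  # exact match of the known bug
    r"user\.id == doc\.owner_id",      # drop the surrounding `if`
    r"\w+\.id == \w+\.owner_id",       # variable names become wildcards
    r"\w+\.id == \w+\.\w+",            # right-hand attribute becomes a wildcard
    r"\S+ == \S+",                     # any equality comparison
]

MAX_FP_RATE = 0.5  # stop once the false positive rate exceeds roughly 50%


def evaluate(pattern: str) -> tuple[list[str], float]:
    """Run one pattern and return (matched locations, false positive rate)."""
    matches = [(loc, is_tp) for loc, code, is_tp in CORPUS if re.search(pattern, code)]
    false_positives = sum(1 for _, is_tp in matches if not is_tp)
    fp_rate = false_positives / len(matches) if matches else 0.0
    return [loc for loc, _ in matches], fp_rate


def generalize(patterns: list[str]):
    """Walk the ladder and keep the last pattern with an acceptable FP rate."""
    accepted = None
    for pattern in patterns:
        locations, fp_rate = evaluate(pattern)
        if fp_rate > MAX_FP_RATE:
            break  # too generic: fall back to the previous pattern
        accepted = (pattern, locations, fp_rate)
    return accepted


accepted = generalize(PATTERNS)
print(accepted)
```

This sketch simply stops at the first pattern that is too noisy. The process described above goes one step further: after reverting, try a different abstraction of the same element before giving up.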

Step 5: Analyze and Triage Results

For each match, document:
  • Location: File, line, function
  • Confidence: High/Medium/Low
  • Exploitability: Reachable? Controllable inputs?
  • Priority: Based on impact and exploitability
For deeper strategic guidance, see METHODOLOGY.md.
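
One possible way to record these fields is a small data class. This is a sketch under stated assumptions: the field names, the `impact` rating, and the scoring formula in `priority` are hypothetical and are not taken from METHODOLOGY.md or the report template.

```python
from dataclasses import dataclass

# Hypothetical numeric weights for the High/Medium/Low ratings.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    function: str
    confidence: str                  # "high", "medium", or "low"
    reachable: bool                  # can an attacker reach this code path?
    attacker_controlled_input: bool  # does attacker data influence the check?
    impact: str                      # "high", "medium", or "low" (assumed field)

    def priority(self) -> int:
        """Assumed scoring: impact weight times a 0-2 exploitability score."""
        exploitability = int(self.reachable) + int(self.attacker_controlled_input)
        return LEVELS[self.impact] * exploitability


findings = [
    Finding("models/base.py", 14, "__eq__", "low", False, False, "high"),
    Finding("utils/auth.py", 2, "can_edit", "high", True, True, "high"),
    Finding("billing/invoice.py", 9, "pay", "medium", True, False, "medium"),
]

# Highest priority first, so reachable and controllable matches are reviewed early.
ranked = sorted(findings, key=lambda f: f.priority(), reverse=True)
for f in ranked:
    print(f"{f.file}:{f.line} {f.function} priority={f.priority()}")
```

Under this scoring an unreachable match gets priority 0 regardless of impact. That is a deliberate simplification; a team may prefer to keep high-impact matches visible even when reachability is still unknown.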

Tool Selection

| Scenario | Tool | Why |
| --- | --- | --- |
| Quick surface search | ripgrep | Fast, zero setup |
| Simple pattern matching | Semgrep | Easy syntax, no build needed |
| Data flow tracking | Semgrep taint / CodeQL | Follows values across functions |
| Cross-function analysis | CodeQL | Best interprocedural analysis |
| Non-building code | Semgrep | Works on incomplete code |

Key Principles

  1. Root cause first: Understand WHY before searching for WHERE
  2. Start specific: First pattern should match exactly the known bug
  3. One change at a time: Generalize incrementally, verify after each change
  4. Know when to stop: 50%+ FP rate means you've gone too generic
  5. Search everywhere: Always search the ENTIRE codebase, not just the module where the bug was found
  6. Expand vulnerability classes: One root cause often has multiple manifestations

Critical Pitfalls to Avoid

These common mistakes cause analysts to miss real vulnerabilities:

1. Narrow Search Scope

Searching only the module where the original bug was found misses variants in other locations.
Example: Bug found in api/handlers/ → only searching that directory → missing variant in utils/auth.py
Mitigation: Always run searches against the entire codebase root directory.

2. Pattern Too Specific

Using only the exact attribute/function from the original bug misses variants using related constructs.
Example: Bug uses isAuthenticated check → only searching for that exact term → missing bugs using related properties like isActive, isAdmin, isVerified
Mitigation: Enumerate ALL semantically related attributes/functions for the bug class.
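
As an illustration of this mitigation, the sketch below builds one search pattern from a list of related attribute names. The sample lines and the helper `build_pattern` are hypothetical; the attribute names are the ones from the example above, and a real audit would extend the list for its own bug class.

```python
import re

# Attributes named in the example above; extend per bug class.
RELATED_ATTRIBUTES = ["isAuthenticated", "isActive", "isAdmin", "isVerified"]


def build_pattern(names: list[str]) -> re.Pattern:
    """Compile a whole-word alternation over the given identifier names."""
    alternation = "|".join(re.escape(name) for name in names)
    return re.compile(rf"\b(?:{alternation})\b")


narrow = build_pattern(["isAuthenticated"])
broad = build_pattern(RELATED_ATTRIBUTES)

# Hypothetical lines standing in for search results across a codebase.
SAMPLE_LINES = [
    "if (user.isAuthenticated) { allow(); }",
    "if (user.isAdmin || !strict) { allow(); }",
    "return account.isVerified;",
    "const isActiveTab = tab === current;",  # different identifier, not a hit
]

narrow_hits = [line for line in SAMPLE_LINES if narrow.search(line)]
broad_hits = [line for line in SAMPLE_LINES if broad.search(line)]

print(len(narrow_hits), len(broad_hits))
```

The word boundaries keep longer identifiers such as isActiveTab out of the results, which helps hold the false positive rate down as the list of names grows.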

3. Single Vulnerability Class

Focusing on only one manifestation of the root cause misses other ways the same logic error appears.
Example: Original bug is "return allow when condition is false" → only searching that pattern → missing:
  • Null equality bypasses (null == null evaluates to true)
  • Documentation/code mismatches (function does opposite of what docs claim)
  • Inverted conditional logic (wrong branch taken)
Mitigation: List all possible manifestations of the root cause before searching.

4. Missing Edge Cases

Testing patterns only with "normal" scenarios misses vulnerabilities triggered by edge cases.
Example: Testing auth checks only with valid users → missing bypass when userId = null matches resourceOwnerId = null
Mitigation: Test with: unauthenticated users, null/undefined values, empty collections, and boundary conditions.
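
The null-matching bypass is easy to reproduce. The following Python sketch uses None in place of null; both functions and the list of edge cases are hypothetical and only illustrate why candidate matches should be exercised with missing values as well as valid users.

```python
from typing import Optional


def is_owner_naive(user_id: Optional[int], resource_owner_id: Optional[int]) -> bool:
    """Vulnerable check: two missing IDs compare equal, so access is granted."""
    return user_id == resource_owner_id


def is_owner_strict(user_id: Optional[int], resource_owner_id: Optional[int]) -> bool:
    """Hardened check: an unauthenticated caller (None) never owns anything."""
    return user_id is not None and user_id == resource_owner_id


# (user_id, resource_owner_id) pairs covering normal and edge-case inputs.
EDGE_CASES = [
    (1, 1),        # valid owner
    (1, 2),        # valid user, someone else's resource
    (None, 1),     # unauthenticated user, owned resource
    (1, None),     # authenticated user, orphaned resource
    (None, None),  # unauthenticated user, orphaned resource
]

# Inputs the naive check allows but the strict check rejects.
bypasses = [
    case for case in EDGE_CASES
    if is_owner_naive(*case) and not is_owner_strict(*case)
]

print(bypasses)
```

Testing with only the first two cases would show both checks behaving identically; the difference appears only once the missing-value cases are included.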

Resources

Ready-to-use templates in resources/:
CodeQL (resources/codeql/):
  • python.ql, javascript.ql, java.ql, go.ql, cpp.ql
Semgrep (resources/semgrep/):
  • python.yaml, javascript.yaml, java.yaml, go.yaml, cpp.yaml
Report: resources/variant-report-template.md