variant-analysis
Variant Analysis
You are a variant analysis expert. Your role is to help find similar vulnerabilities and bugs across a codebase after identifying an initial pattern.
When to Use
Use this skill when:
- A vulnerability has been found and you need to search for similar instances
- Building or refining CodeQL/Semgrep queries for security patterns
- Performing systematic code audits after an initial issue discovery
- Hunting for bug variants across a codebase
- Analyzing how a single root cause manifests in different code paths
When NOT to Use
Do NOT use this skill for:
- Initial vulnerability discovery (use audit-context-building or domain-specific audits instead)
- General code review without a known pattern to search for
- Writing fix recommendations (use issue-writer instead)
- Understanding unfamiliar code (use audit-context-building for deep comprehension first)
The Five-Step Process
Step 1: Understand the Original Issue
Before searching, deeply understand the known bug:
- What is the root cause? Not the symptom, but WHY it's vulnerable
- What conditions are required? Control flow, data flow, state
- What makes it exploitable? User control, missing validation, etc.
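One way to keep the Step 1 answers explicit is to write them down before drafting any pattern. The `RootCause` class below is a hypothetical helper and is not part of any tool. The example bug description is invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class RootCause:
    summary: str                                              # WHY it is vulnerable, not the symptom
    conditions: list[str] = field(default_factory=list)       # control flow, data flow, state
    exploit_factors: list[str] = field(default_factory=list)  # user control, missing validation, ...

    def is_complete(self) -> bool:
        # Start searching only once all three Step 1 questions have an answer.
        return bool(self.summary and self.conditions and self.exploit_factors)

# Hypothetical root cause for a hypothetical bug.
bug = RootCause(
    summary="ownership check result is ignored, so the handler allows the request when it is false",
    conditions=["handler is reachable without prior authorization middleware"],
    exploit_factors=["attacker chooses the resource id", "no server-side ownership validation"],
)
print(bug.is_complete())  # True
```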
Step 2: Create an Exact Match
Start with a pattern that matches ONLY the known instance:
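```bash
# -F treats the text as a literal string, so ( ) . * in code are not regex metacharacters
rg -n -F "exact_vulnerable_code_here"
```
Verify: Does it match exactly ONE location (the original)?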
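If you want to script the Step 2 check, or ripgrep is unavailable, you can approximate the same exact-match test in plain Python. `find_literal` and the two-file `codebase` are hypothetical. The substring test corresponds to running `rg` with `-F` (fixed strings).

```python
def find_literal(files: dict[str, str], needle: str) -> list[tuple[str, int]]:
    """Return (path, line number) for each line that contains `needle` verbatim."""
    hits = []
    for path, source in sorted(files.items()):
        for lineno, line in enumerate(source.splitlines(), start=1):
            if needle in line:  # literal substring, like `rg -F`
                hits.append((path, lineno))
    return hits

# Hypothetical two-file codebase: only views.py contains the original bug.
codebase = {
    "views.py": "def show(req):\n    if req.user.isAuthenticated or True:\n        return allow()\n",
    "admin.py": "def panel(req):\n    if req.user.isAuthenticated:\n        return allow()\n",
}

hits = find_literal(codebase, "req.user.isAuthenticated or True")
print(hits)  # [('views.py', 2)]
print("exact match OK" if len(hits) == 1 else "pattern is not exact yet")
```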
Step 3: Identify Abstraction Points
| Element | Keep Specific | Can Abstract |
|---|---|---|
| Function name | If unique to bug | If pattern applies to family |
| Variable names | Never | Always use metavariables |
| Literal values | If value matters | If any value triggers bug |
| Arguments | If position matters | Use |
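The sketch below illustrates the "Variable names" row. It keeps the attribute and the literal from a hypothetical exact pattern and replaces only the object name with a regex wildcard. This is a rough analogue. Semgrep metavariables such as `$OBJ` match syntax-tree nodes, while `[\w.]+` here matches raw text.

```python
import re

# Exact pattern from Step 2 (hypothetical bug text), escaped so it is matched literally.
exact = re.escape("if req.user.isAuthenticated or True:")

# Abstract only the variable name: any dotted identifier in place of `req.user`.
abstracted = r"if [\w.]+\.isAuthenticated or True:"

samples = [
    "if req.user.isAuthenticated or True:",
    "if session.account.isAuthenticated or True:",
    "if req.user.isAuthenticated:",
]
exact_hits = [s for s in samples if re.search(exact, s)]
abstract_hits = [s for s in samples if re.search(abstracted, s)]
print(len(exact_hits), len(abstract_hits))  # 1 2
```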
Step 4: Iteratively Generalize
Change ONE element at a time:
- Run the pattern
- Review ALL new matches
- Classify: true positive or false positive?
- If the FP rate is acceptable, generalize the next element
- If the FP rate is too high, revert and try a different abstraction
Stop when the false positive rate exceeds ~50%.
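The Step 4 loop can be sketched as follows. The candidate patterns, the corpus, and the `is_true_positive` reviewer are all hypothetical stand-ins. In practice, classification is manual review. The 0.5 threshold mirrors the ~50% stopping rule. For simplicity, the sketch stops at the first pattern that is too noisy instead of backtracking to try a different abstraction.

```python
import re
from typing import Callable

def generalize(patterns: list[str], corpus: list[str],
               is_true_positive: Callable[[str], bool],
               max_fp_rate: float = 0.5) -> str:
    """Walk from the most specific pattern to the most general one and return
    the last pattern whose false-positive rate is still acceptable.

    Each pattern should differ from the previous one by a single abstraction.
    """
    accepted = patterns[0]
    for pattern in patterns:
        matches = [line for line in corpus if re.search(pattern, line)]
        false_positives = sum(not is_true_positive(m) for m in matches)
        fp_rate = false_positives / len(matches) if matches else 0.0
        print(f"{pattern!r}: {len(matches)} matches, FP rate {fp_rate:.0%}")
        if fp_rate > max_fp_rate:
            break  # too generic: keep the previous pattern
        accepted = pattern
    return accepted

candidates = [
    r"user\.isAuthenticated or True",  # exact match from Step 2
    r"\w+\.isAuthenticated or True",   # abstract the variable name
    r"\w+\.is\w+ or True",             # abstract the attribute name
    r"or True",                        # drop the attribute entirely
]
corpus = [
    "if user.isAuthenticated or True:",
    "if acct.isAuthenticated or True:",
    "if acct.isAdmin or True:",
    "debug = verbose or True",
    "log_all = quiet or True",
    "retry = cached or True",
    "x = y or True",
]
# Stand-in for manual review: only `or True` inside an if-condition is a bug here.
best = generalize(candidates, corpus, lambda line: line.startswith("if "))
print("keep:", best)
```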
Step 5: Analyze and Triage Results
For each match, document:
- Location: File, line, function
- Confidence: High/Medium/Low
- Exploitability: Reachable? Controllable inputs?
- Priority: Based on impact and exploitability
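Here is a minimal way to record the Step 5 fields and rank the matches. The `Finding` fields follow the list above. The 1 to 3 `impact` scale, the scoring formula, and the file names are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass

CONFIDENCE_RANK = {"High": 3, "Medium": 2, "Low": 1}

@dataclass
class Finding:
    file: str
    line: int
    function: str
    confidence: str     # "High", "Medium" or "Low"
    reachable: bool     # can attacker-supplied requests reach this code?
    controllable: bool  # does the attacker control the relevant inputs?
    impact: int         # assumed scale: 1 (low) to 3 (critical)

    def priority(self) -> int:
        # Illustrative score: impact weighted by how exploitable the match is.
        exploitability = int(self.reachable) + int(self.controllable)
        return self.impact * (1 + exploitability)

findings = [  # hypothetical matches
    Finding("scripts/migrate.py", 8, "run", "Low", False, False, 2),
    Finding("utils/auth.py", 17, "check_owner", "Medium", True, False, 3),
    Finding("api/handlers/orders.py", 42, "get_order", "High", True, True, 3),
]

# Highest priority first; confidence breaks ties.
ranked = sorted(findings,
                key=lambda f: (f.priority(), CONFIDENCE_RANK[f.confidence]),
                reverse=True)
for f in ranked:
    print(f"{f.file}:{f.line} {f.function} priority={f.priority()} confidence={f.confidence}")
```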
For deeper strategic guidance, see METHODOLOGY.md.
Tool Selection
| Scenario | Tool | Why |
|---|---|---|
| Quick surface search | ripgrep | Fast, zero setup |
| Simple pattern matching | Semgrep | Easy syntax, no build needed |
| Data flow tracking | Semgrep taint / CodeQL | Follows values across functions |
| Cross-function analysis | CodeQL | Best interprocedural analysis |
| Code that does not build | Semgrep | Works on incomplete or non-compiling code |
Key Principles
- Root cause first: Understand WHY before searching for WHERE
- Start specific: First pattern should match exactly the known bug
- One change at a time: Generalize incrementally, verify after each change
- Know when to stop: 50%+ FP rate means you've gone too generic
- Search everywhere: Always search the ENTIRE codebase, not just the module where the bug was found
- Expand vulnerability classes: One root cause often has multiple manifestations
Critical Pitfalls to Avoid
These common mistakes cause analysts to miss real vulnerabilities:
1. Narrow Search Scope
Searching only the module where the original bug was found misses variants in other locations.
Example: Bug found in `api/handlers/` → only searching that directory → missing variant in `utils/auth.py`.
Mitigation: Always run searches against the entire codebase root directory.
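The difference in scope is easy to demonstrate on a throwaway directory tree. The sketch uses a hypothetical layout with `api/handlers/` and `utils/auth.py`. The file contents and the `search` helper are also hypothetical.

```python
import tempfile
from pathlib import Path

def search(root: Path, needle: str) -> list[str]:
    """Paths (relative to `root`) of .py files under `root` that contain `needle`."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.py")
        if needle in p.read_text(encoding="utf-8")
    )

with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "api" / "handlers").mkdir(parents=True)
    (repo / "utils").mkdir()
    (repo / "api" / "handlers" / "orders.py").write_text("if check_owner(u, o) or True:\n", encoding="utf-8")
    (repo / "utils" / "auth.py").write_text("return check_owner(u, o) or True\n", encoding="utf-8")

    narrow = search(repo / "api" / "handlers", "or True")  # only the bug's module
    broad = search(repo, "or True")                         # entire codebase root

print(narrow)  # ['orders.py']
print(broad)   # ['api/handlers/orders.py', 'utils/auth.py']
```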
2. Pattern Too Specific
Using only the exact attribute/function from the original bug misses variants using related constructs.
Example: Bug uses an `isAuthenticated` check → only searching for that exact term → missing bugs using related properties like `isActive`, `isAdmin`, `isVerified`.
Mitigation: Enumerate ALL semantically related attributes/functions for the bug class.
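Once you have enumerated the related attributes, you can fold them into a single search pattern. The attribute list comes from the example above. The sample lines are hypothetical. The `\b` word boundary stops `isAuthenticated` from matching inside a longer name such as `isAuthenticatedSoon`.

```python
import re

# Semantically related attributes for this bug class (extend for your codebase).
AUTH_ATTRS = ["isAuthenticated", "isActive", "isAdmin", "isVerified"]

narrow = re.compile(r"\.isAuthenticated\b")
broad = re.compile(r"\.(?:" + "|".join(map(re.escape, AUTH_ATTRS)) + r")\b")

lines = [
    "if user.isAuthenticated or True:",
    "if user.isAdmin or True:",
    "if account.isVerified or True:",
    "if user.isActive:",
    "if user.isAuthenticatedSoon:",
]
narrow_hits = [line for line in lines if narrow.search(line)]
broad_hits = [line for line in lines if broad.search(line)]
print(len(narrow_hits), len(broad_hits))  # 1 4
```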
3. Single Vulnerability Class
Focusing on only one manifestation of the root cause misses other ways the same logic error appears.
Example: Original bug is "return allow when condition is false" → only searching that pattern → missing:
- Null equality bypasses (`null == null` evaluates to true)
- Documentation/code mismatches (function does the opposite of what the docs claim)
- Inverted conditional logic (wrong branch taken)
Mitigation: List all possible manifestations of the root cause before searching.
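Listing the manifestations up front makes it easy to search for each one separately. The sample code and the regexes below are toy, hypothetical approximations for Python-like code. Documentation/code mismatches are left out because finding them requires reading the docs, not matching text.

```python
import re

# Toy sample containing one instance of each greppable manifestation (hypothetical code).
SAMPLE = """\
def can_edit(user, doc):
    if not user.is_owner(doc):
        return True

def can_view(user_id, owner_id):
    return user_id == owner_id

def can_delete(user, doc):
    if user.id != doc.owner_id:
        return True
"""

# One rough regex per manifestation of the root cause "wrong allow decision".
MANIFESTATIONS = {
    "allow when condition is false": r"if not [^\n]+:\n\s+return True",
    "null equality bypass": r"return \w+ == \w+",
    "inverted conditional": r"if [^\n]+ != [^\n]+:\n\s+return True",
}

found = {name: re.findall(pattern, SAMPLE) for name, pattern in MANIFESTATIONS.items()}
for name, hits in found.items():
    print(f"{name}: {len(hits)} match(es)")
```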
4. Missing Edge Cases
Testing patterns only with "normal" scenarios misses vulnerabilities triggered by edge cases.
Example: Testing auth checks only with valid users → missing the bypass when `userId = null` matches `resourceOwnerId = null`.
Mitigation: Test with unauthenticated users, null/undefined values, empty collections, and boundary conditions.
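You can reproduce the null edge case in a few lines. `can_access` is a hypothetical ownership check, with Python's `None` standing in for `null`. Running it against edge cases from the mitigation list exposes the bypass. `can_access_fixed` shows one way to reject missing ids before comparing them.

```python
from typing import Optional

def can_access(user_id: Optional[int], resource_owner_id: Optional[int]) -> bool:
    # Vulnerable: None == None is True, so an unauthenticated caller "owns"
    # any resource whose owner id is missing.
    return user_id == resource_owner_id

def can_access_fixed(user_id: Optional[int], resource_owner_id: Optional[int]) -> bool:
    # Reject missing ids before comparing them.
    if user_id is None or resource_owner_id is None:
        return False
    return user_id == resource_owner_id

# (description, user_id, resource_owner_id, expected result)
EDGE_CASES = [
    ("owner", 7, 7, True),
    ("different user", 7, 8, False),
    ("unauthenticated, ownerless resource", None, None, False),
    ("unauthenticated, owned resource", None, 7, False),
]

failures = [
    (check.__name__, name)
    for name, uid, oid, expected in EDGE_CASES
    for check in (can_access, can_access_fixed)
    if check(uid, oid) != expected
]
print(failures)  # [('can_access', 'unauthenticated, ownerless resource')]
```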
Resources
Ready-to-use templates in `resources/`:
CodeQL (`resources/codeql/`):
- `python.ql`, `javascript.ql`, `java.ql`, `go.ql`, `cpp.ql`
Semgrep (`resources/semgrep/`):
- `python.yaml`, `javascript.yaml`, `java.yaml`, `go.yaml`, `cpp.yaml`
Report: `resources/variant-report-template.md`