semgrep-rule-creator
Semgrep Rule Creator
Create production-quality Semgrep rules with proper testing and validation.
When to Use
Ideal scenarios:
- Writing Semgrep rules for specific bug patterns
- Writing rules to detect security vulnerabilities in your codebase
- Writing taint mode rules for data flow vulnerabilities
- Writing rules to enforce coding standards
When NOT to Use
Do NOT use this skill for:
- Running existing Semgrep rulesets
- General static analysis without custom rules (use the `static-analysis` skill)
Rationalizations to Reject
When writing Semgrep rules, reject these common shortcuts:
- "The pattern looks complete" → Still run `semgrep --test --config <rule-id>.yaml <rule-id>.<ext>` to verify. Untested rules have hidden false positives/negatives.
- "It matches the vulnerable case" → Matching vulnerabilities is half the job. Verify safe cases don't match (false positives break trust).
- "Taint mode is overkill for this" → If data flows from user input to a dangerous sink, taint mode gives better precision than pattern matching.
- "One test is enough" → Include edge cases: different coding styles, sanitized inputs, safe alternatives, and boundary conditions.
- "I'll optimize the patterns first" → Write correct patterns first, optimize after all tests pass. Premature optimization causes regressions.
- "The AST dump is too complex" → The AST reveals exactly how Semgrep sees code. Skipping it leads to patterns that miss syntactic variations.
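You can partly check the "safe cases" and "one test is not enough" points before running Semgrep at all. The following is a minimal sketch and is not part of Semgrep. The hypothetical `check_annotations` helper counts `ruleid:` and `ok:` annotation comments for one rule id and reports when either kind is missing. It assumes `#` or `//` line comments with a single rule id per annotation. It does not replace `semgrep --test`, which is what actually verifies that the annotated lines match.

```python
import re

# Hypothetical helper, not a Semgrep API.
# Matches "# ruleid: <id>" / "// ok: <id>"; comma-separated id lists are not handled.
# "todoruleid:" / "todook:" deliberately do not match.
ANNOTATION = re.compile(r"^\s*(?:#|//)\s*(ruleid|ok):\s*([\w.-]+)\s*$")


def check_annotations(test_source: str, rule_id: str) -> list[str]:
    """Return a list of problems; an empty list means both case types are present."""
    counts = {"ruleid": 0, "ok": 0}
    for line in test_source.splitlines():
        match = ANNOTATION.match(line)
        if match and match.group(2) == rule_id:
            counts[match.group(1)] += 1
    problems = []
    if counts["ruleid"] == 0:
        problems.append("no 'ruleid:' annotation: vulnerable cases are untested")
    if counts["ok"] == 0:
        problems.append("no 'ok:' annotation: false positives would go unnoticed")
    return problems


if __name__ == "__main__":
    only_vulnerable = "# ruleid: my-rule\ndangerous(user_input)\n"
    print(check_annotations(only_vulnerable, "my-rule"))
```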
Anti-Patterns
**Too broad** - matches everything, useless for detection:
```yaml
# BAD: Matches any function call
pattern: $FUNC(...)

# GOOD: Specific dangerous function
pattern: eval(...)
```
**Missing safe cases in tests** - leads to undetected false positives:
```python
# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)

# GOOD: Include safe cases to verify no false positives
# ruleid: my-rule
dangerous(user_input)

# ok: my-rule
dangerous(sanitize(user_input))

# ok: my-rule
dangerous("hardcoded_safe_value")
```
**Overly specific patterns** - misses variations:
```yaml
# BAD: Only matches exact format
pattern: os.system("rm " + $VAR)

# GOOD: Matches all os.system calls with taint tracking
mode: taint
pattern-sinks:
  - pattern: os.system(...)
```
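A syntax tree shows why the overly specific pattern misses variations. Semgrep uses its own AST, which you inspect in Step 3 of the workflow. Python's standard `ast` module serves as an analogy for the same effect. The three calls below build the same `rm` command, but the first argument has a different node type each time. A pattern written for string concatenation therefore cannot match the f-string or `str.format` forms. `first_arg_node_type` is a hypothetical helper.

```python
import ast

# Three ways to build the same shell command from a variable.
VARIANTS = [
    'os.system("rm " + path)',          # string concatenation
    'os.system(f"rm {path}")',          # f-string
    'os.system("rm {}".format(path))',  # str.format call
]


def first_arg_node_type(source: str) -> str:
    """Return the AST node type name of the first argument of the call in `source`."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Call) or not node.args:
        raise ValueError(f"expected a call with arguments: {source!r}")
    return type(node.args[0]).__name__


if __name__ == "__main__":
    for src in VARIANTS:
        print(f"{src:34} -> {first_arg_node_type(src)}")
```

A taint-mode sink such as `os.system(...)` does not care how its argument was assembled. That is why the GOOD variant switches to taint tracking.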
Strictness Level
This workflow is strict - do not skip steps:
- Read documentation first: See the Documentation section before writing Semgrep rules
- Test-first is mandatory: Never write a rule without tests
- 100% test pass is required: "Most tests pass" is not acceptable
- Optimization comes last: Only simplify patterns after all tests pass
- Avoid generic patterns: Rules must be specific, not match broad patterns
- Prioritize taint mode: For data flow vulnerabilities
- One YAML file - one Semgrep rule: Each YAML file must contain only one Semgrep rule; don't combine multiple rules in a single file
- No generic rules: When targeting a specific language for Semgrep rules, avoid generic pattern matching (`languages: generic`)
- Forbidden `todoruleid` and `todook` test annotations: `todoruleid: <rule-id>` and `todook: <rule-id>` annotations in test files for future rule improvements are forbidden
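Several of these requirements can be linted mechanically before review. The sketch below is a rough, line-based check that does not parse YAML. It flags three problems: a rule file with anything other than exactly one `- id:` entry, a `languages: generic` line, and any `todoruleid:` or `todook:` annotation in the test file. The function name `lint_strictness` and its regex heuristics are assumptions for illustration. A real YAML parser would be more robust.

```python
import re

# Heuristics that assume conventional Semgrep YAML formatting.
RULE_ID_LINE = re.compile(r"^\s*-\s+id:\s*\S+")
# Catches "languages: generic" and "languages: [generic]"; multi-line lists are missed.
GENERIC_LANG = re.compile(r"^\s*languages:\s*\[?\s*generic\b")
TODO_ANNOTATION = re.compile(r"\b(todoruleid|todook):")


def lint_strictness(rule_yaml: str, test_source: str) -> list[str]:
    """Return violations of the one-rule, no-generic and no-todo policies."""
    problems = []
    rule_lines = rule_yaml.splitlines()
    rule_count = sum(1 for line in rule_lines if RULE_ID_LINE.match(line))
    if rule_count != 1:
        problems.append(f"expected exactly 1 rule per YAML file, found {rule_count}")
    if any(GENERIC_LANG.match(line) for line in rule_lines):
        problems.append("languages: generic is forbidden for language-specific rules")
    for lineno, line in enumerate(test_source.splitlines(), start=1):
        if TODO_ANNOTATION.search(line):
            problems.append(f"line {lineno}: todoruleid/todook annotations are forbidden")
    return problems


if __name__ == "__main__":
    rule = "rules:\n  - id: insecure-eval\n    languages: [python]\n"
    print(lint_strictness(rule, "# todook: insecure-eval\neval('1')\n"))
```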
Overview
This skill guides creation of Semgrep rules that detect security vulnerabilities and code patterns. Rules are created iteratively: analyze the problem, write tests first, analyze AST structure, write the rule, iterate until all tests pass, optimize the rule.
Approach selection:
- Taint mode (prioritize): Data flow issues where untrusted input reaches dangerous sinks
- Pattern matching: Simple syntactic patterns without data flow requirements
Why prioritize taint mode? Pattern matching finds syntax but misses context. The pattern `eval($X)` matches both `eval(user_input)` (vulnerable) and `eval("safe_literal")` (safe). Taint mode tracks data flow, so it only alerts when untrusted data actually reaches the sink. This dramatically reduces false positives for injection vulnerabilities.
Iterating between approaches: It's okay to experiment. If you start with taint mode and it's not working well (e.g., taint doesn't propagate as expected, or there are too many false positives/negatives), switch to pattern matching. Conversely, if pattern matching produces too many false positives on safe cases, try taint mode instead. The goal is a working rule, not rigid adherence to one approach.
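The toy below makes the contrast concrete using Python's standard `ast` module. It is deliberately simplified and is not how Semgrep implements taint analysis. "Pattern matching" flags every `eval(...)` call. The toy taint pass treats `request.args.get(...)` as a source, propagates taint through plain top-level assignments, and flags `eval` only when its argument is tainted. All function names are hypothetical.

```python
import ast

# Toy input: one tainted flow into eval (line 4) and one constant argument (line 5).
SOURCE_CODE = '''
code = request.args.get("code")
alias = code
eval(alias)
eval("print('safe')")
'''


def is_source(node: ast.AST) -> bool:
    """True for calls shaped like request.args.get(...)."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "get"
        and isinstance(node.func.value, ast.Attribute)
        and node.func.value.attr == "args"
        and isinstance(node.func.value.value, ast.Name)
        and node.func.value.value.id == "request"
    )


def is_eval_call(node: ast.AST) -> bool:
    return isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == "eval"


def pattern_matches(tree: ast.Module) -> list[int]:
    """Syntactic matching, like eval($X): every eval call, whatever its argument."""
    return sorted(n.lineno for n in ast.walk(tree) if is_eval_call(n))


def taint_matches(tree: ast.Module) -> list[int]:
    """Toy taint tracking over top-level statements, in source order."""
    tainted: set[str] = set()

    def is_tainted(expr: ast.AST) -> bool:
        return is_source(expr) or (isinstance(expr, ast.Name) and expr.id in tainted)

    findings: list[int] = []
    for stmt in tree.body:
        if isinstance(stmt, ast.Assign):
            names = {t.id for t in stmt.targets if isinstance(t, ast.Name)}
            if is_tainted(stmt.value):
                tainted |= names  # taint propagates to the assigned names
            else:
                tainted -= names  # overwritten with untainted data
        elif isinstance(stmt, ast.Expr) and is_eval_call(stmt.value):
            call = stmt.value
            if call.args and is_tainted(call.args[0]):
                findings.append(stmt.lineno)
    return findings


if __name__ == "__main__":
    module = ast.parse(SOURCE_CODE)
    print("pattern eval($X) flags lines:", pattern_matches(module))
    print("toy taint tracking flags lines:", taint_matches(module))
```

Semgrep's taint mode also supports sanitizers (`pattern-sanitizers`) and propagates taint through expressions such as string concatenation. This toy models neither.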
Output structure - exactly 2 files in a directory named after the rule-id:
<rule-id>/
├── <rule-id>.yaml # Semgrep rule
└── <rule-id>.<ext> # Test file with ruleid/ok annotations
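The required layout is easy to verify before running tests. This is a minimal sketch that assumes the directory name equals the rule id and that "exactly 2 files" means the directory holds no other entries. `check_layout` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def check_layout(rule_dir: Path) -> list[str]:
    """Check for exactly <rule-id>.yaml plus one <rule-id>.<ext> test file."""
    rule_id = rule_dir.name  # assumes the directory is named after the rule id
    entries = sorted(p.name for p in rule_dir.iterdir())
    problems: list[str] = []
    if len(entries) != 2:
        problems.append(f"expected exactly 2 files, found {len(entries)}: {entries}")
    if f"{rule_id}.yaml" not in entries:
        problems.append(f"missing rule file {rule_id}.yaml")
    tests = [
        name for name in entries
        if name.startswith(f"{rule_id}.") and name != f"{rule_id}.yaml"
    ]
    if len(tests) != 1:
        problems.append(f"expected one test file {rule_id}.<ext>, found {tests}")
    return problems


if __name__ == "__main__":
    # Demo on a throwaway directory that follows the required layout.
    with tempfile.TemporaryDirectory() as tmp:
        rule_dir = Path(tmp) / "insecure-eval"
        rule_dir.mkdir()
        (rule_dir / "insecure-eval.yaml").write_text("rules: []\n")
        (rule_dir / "insecure-eval.py").write_text("# ok: insecure-eval\n")
        print(check_layout(rule_dir) or "layout OK")
```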
Quick Start
```yaml
rules:
  - id: insecure-eval
    languages: [python]
    severity: HIGH
    message: User input passed to eval() allows code execution
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: eval(...)
```
Test file (`insecure-eval.py`):
```python
# ruleid: insecure-eval
eval(request.args.get('code'))

# ok: insecure-eval
eval("print('safe')")
```
Run tests (from rule directory): `semgrep --test --config <rule-id>.yaml <rule-id>.<ext>`
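During Step 5 of the workflow it can help to script this command. The sketch below builds the same argument list and runs it with `subprocess` from inside the rule directory. It assumes `semgrep` is on `PATH` and that `semgrep --test` exits with a non-zero status when a test fails. Confirm that behavior for your Semgrep version. `build_test_command` and `run_rule_tests` are hypothetical names.

```python
import shutil
import subprocess
from pathlib import Path


def build_test_command(rule_id: str, ext: str) -> list[str]:
    """Mirror `semgrep --test --config <rule-id>.yaml <rule-id>.<ext>`."""
    return ["semgrep", "--test", "--config", f"{rule_id}.yaml", f"{rule_id}.{ext}"]


def run_rule_tests(rule_dir: Path, ext: str) -> bool:
    """Run the tests from inside the rule directory; True means the command exited 0."""
    if shutil.which("semgrep") is None:
        raise RuntimeError("semgrep is not installed or not on PATH")
    cmd = build_test_command(rule_dir.name, ext)
    result = subprocess.run(cmd, cwd=rule_dir, capture_output=True, text=True)
    print(result.stdout, result.stderr, sep="\n")
    # Assumption: a non-zero exit code signals at least one failing test.
    return result.returncode == 0


if __name__ == "__main__":
    print(build_test_command("insecure-eval", "py"))
```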
Quick Reference
- For commands, pattern operators, and taint mode syntax, see quick-reference.md.
- For detailed workflow and examples, you MUST see workflow.md
Workflow
Copy this checklist and track progress:
Semgrep Rule Progress:
- [ ] Step 1: Analyze the Problem
- [ ] Step 2: Write Tests First
- [ ] Step 3: Analyze AST structure
- [ ] Step 4: Write the rule
- [ ] Step 5: Iterate until all tests pass (semgrep --test)
- [ ] Step 6: Optimize the rule (remove redundancies, re-test)
- [ ] Step 7: Final Run
Documentation
REQUIRED: Before writing any rule, use WebFetch to read all of these 4 links with Semgrep documentation: