semgrep-rule-creator

Semgrep Rule Creator

Create production-quality Semgrep rules with proper testing and validation.

When to Use

Ideal scenarios:

Writing Semgrep rules for specific bug patterns
Writing rules to detect security vulnerabilities in your codebase
Writing taint mode rules for data flow vulnerabilities
Writing rules to enforce coding standards

When NOT to Use

Do NOT use this skill for:

Running existing Semgrep rulesets
General static analysis without custom rules (use `static-analysis` skill)

Rationalizations to Reject

When writing Semgrep rules, reject these common shortcuts:

"The pattern looks complete" → Still run `semgrep --test --config <rule-id>.yaml <rule-id>.<ext>` to verify. Untested rules have hidden false positives/negatives.
"It matches the vulnerable case" → Matching vulnerabilities is half the job. Verify safe cases don't match (false positives break trust).
"Taint mode is overkill for this" → If data flows from user input to a dangerous sink, taint mode gives better precision than pattern matching.
"One test is enough" → Include edge cases: different coding styles, sanitized inputs, safe alternatives, and boundary conditions.
"I'll optimize the patterns first" → Write correct patterns first, optimize after all tests pass. Premature optimization causes regressions.
"The AST dump is too complex" → The AST reveals exactly how Semgrep sees code. Skipping it leads to patterns that miss syntactic variations.

Anti-Patterns

**Too broad** - matches everything, useless for detection:
```yaml
# BAD: Matches any function call
pattern: $FUNC(...)

# GOOD: Specific dangerous function
pattern: eval(...)
```

**Missing safe cases in tests** - leads to undetected false positives:
```python
# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)

# GOOD: Include safe cases to verify no false positives
# ruleid: my-rule
dangerous(user_input)

# ok: my-rule
dangerous(sanitize(user_input))

# ok: my-rule
dangerous("hardcoded_safe_value")
```

**Overly specific patterns** - misses variations:
```yaml
# BAD: Only matches exact format
pattern: os.system("rm " + $VAR)

# GOOD: Matches all os.system calls with taint tracking
mode: taint
pattern-sinks:
  - pattern: os.system(...)
```

Strictness Level

This workflow is strict - do not skip steps:

Read documentation first: See Documentation before writing Semgrep rules
Test-first is mandatory: Never write a rule without tests
100% test pass is required: "Most tests pass" is not acceptable
Optimization comes last: Only simplify patterns after all tests pass
Avoid generic patterns: Rules must be specific, not match broad patterns
Prioritize taint mode: For data flow vulnerabilities
One YAML file - one Semgrep rule: Each YAML file must contain only one Semgrep rule; don't combine multiple rules in a single file
No generic rules: When writing Semgrep rules for a specific language, avoid generic pattern matching (`languages: generic`)
Forbidden `todook` and `todoruleid` test annotations: `todoruleid: <rule-id>` and `todook: <rule-id>` annotations in test files for future rule improvements are forbidden
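
The annotation rules above lend themselves to a quick automated check. The sketch below is a hypothetical helper (`lint_test_file` is not part of Semgrep or of this skill) that scans a test file's text for the forbidden `todoruleid`/`todook` annotations and confirms that both a `ruleid` and an `ok` case exist for the rule. It assumes annotations are written as `<kind>: <rule-id>` inside comments.

```python
import re

# Annotation kinds that may appear in a Semgrep test file; the todo* variants
# are the ones this workflow forbids.
ANNOTATION = re.compile(r"\b(todoruleid|todook|ruleid|ok):\s*([\w.-]+)")


def lint_test_file(text: str, rule_id: str) -> list[str]:
    """Return a list of problems found in a rule's test file (hypothetical helper)."""
    kinds = [kind for kind, rid in ANNOTATION.findall(text) if rid == rule_id]
    problems = []
    if "todoruleid" in kinds or "todook" in kinds:
        problems.append("todoruleid/todook annotations are forbidden")
    if "ruleid" not in kinds:
        problems.append("no ruleid annotation: the vulnerable case is untested")
    if "ok" not in kinds:
        problems.append("no ok annotation: safe cases are untested")
    return problems


sample = """\
# ruleid: my-rule
dangerous(user_input)
# ok: my-rule
dangerous(sanitize(user_input))
"""
print(lint_test_file(sample, "my-rule"))  # prints []
```

A check like this only covers the annotation rules; it does not replace running `semgrep --test`.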

Overview

This skill guides creation of Semgrep rules that detect security vulnerabilities and code patterns. Rules are created iteratively: analyze the problem, write tests first, analyze AST structure, write the rule, iterate until all tests pass, optimize the rule.

Approach selection:

Taint mode (prioritize): Data flow issues where untrusted input reaches dangerous sinks
Pattern matching: Simple syntactic patterns without data flow requirements

Why prioritize taint mode? Pattern matching finds syntax but misses context. A pattern `eval($X)` matches both `eval(user_input)` (vulnerable) and `eval("safe_literal")` (safe). Taint mode tracks data flow, so it only alerts when untrusted data actually reaches the sink, dramatically reducing false positives for injection vulnerabilities.
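
The contrast can be illustrated with a toy model. The sketch below is not how Semgrep works internally; it uses Python's built-in `ast` module to compare a purely syntactic match on `eval(...)` with a deliberately naive taint check. The source name `request.args.get` is borrowed from the Quick Start rule, and the tracker is an assumption-laden simplification that only follows direct top-level assignments.

```python
import ast

SOURCE_CALL = "request.args.get"  # assumed untrusted source, for illustration only


def is_source(node) -> bool:
    """True if `node` is a call such as request.args.get(...)."""
    return isinstance(node, ast.Call) and ast.unparse(node.func) == SOURCE_CALL


def find_eval_calls(code: str) -> tuple[list[int], list[int]]:
    """Return (pattern_hits, taint_hits) as line numbers of eval(...) calls."""
    tainted: set[str] = set()
    pattern_hits, taint_hits = [], []
    # Visit top-level statements in order so assignments are seen before uses.
    for stmt in ast.parse(code).body:
        if isinstance(stmt, ast.Assign) and is_source(stmt.value):
            tainted.update(t.id for t in stmt.targets if isinstance(t, ast.Name))
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and ast.unparse(node.func) == "eval":
                pattern_hits.append(node.lineno)  # syntactic match: every eval call
                arg = node.args[0] if node.args else None
                if is_source(arg) or (isinstance(arg, ast.Name) and arg.id in tainted):
                    taint_hits.append(node.lineno)  # only when tainted data arrives
    return pattern_hits, taint_hits


code = '''\
user_input = request.args.get("code")
eval(user_input)
eval("safe_literal")
'''
print(find_eval_calls(code))  # ([2, 3], [2])
```

The syntactic check reports both calls, while the toy taint check reports only the call that receives data from the source.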

Iterating between approaches: It's okay to experiment. If you start with taint mode and it's not working well (e.g., taint doesn't propagate as expected, too many false positives/negatives), switch to pattern matching. Conversely, if pattern matching produces too many false positives on safe cases, try taint mode instead. The goal is a working rule—not rigid adherence to one approach.

Output structure - exactly 2 files in a directory named after the rule-id:

<rule-id>/
├── <rule-id>.yaml     # Semgrep rule
└── <rule-id>.<ext>    # Test file with ruleid/ok annotations
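
The layout above can be created with a few lines of Python. This is a minimal sketch, not part of the skill: `scaffold_rule` is a hypothetical helper that creates the directory and the two empty files, leaving their content to be written in the later workflow steps.

```python
import tempfile
from pathlib import Path


def scaffold_rule(base_dir: str, rule_id: str, ext: str) -> list[Path]:
    """Create <rule-id>/<rule-id>.yaml and <rule-id>/<rule-id>.<ext> (hypothetical helper)."""
    rule_dir = Path(base_dir) / rule_id
    rule_dir.mkdir(parents=True, exist_ok=True)
    files = [rule_dir / f"{rule_id}.yaml", rule_dir / f"{rule_id}.{ext}"]
    for path in files:
        path.touch(exist_ok=True)  # existing content is left untouched
    return files


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_rule(tmp, "insecure-eval", "py")
    print([p.name for p in created])  # ['insecure-eval.yaml', 'insecure-eval.py']
```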

Quick Start

```yaml
rules:
  - id: insecure-eval
    languages: [python]
    severity: HIGH
    message: User input passed to eval() allows code execution
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: eval(...)
```

Test file (`insecure-eval.py`):

```python
# ruleid: insecure-eval
eval(request.args.get('code'))

# ok: insecure-eval
eval("print('safe')")
```

Run tests (from rule directory): `semgrep --test --config <rule-id>.yaml <rule-id>.<ext>`
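
When the test run needs to be scripted, the command can be wrapped as below. This is a sketch under stated assumptions: `build_test_command` and `run_rule_tests` are hypothetical helper names, `semgrep` is assumed to be on `PATH`, and a non-zero exit code is assumed to mean the tests did not all pass.

```python
import subprocess
from pathlib import Path


def build_test_command(rule_id: str, ext: str) -> list[str]:
    """Build the `semgrep --test` invocation for one rule and its test file."""
    return ["semgrep", "--test", "--config", f"{rule_id}.yaml", f"{rule_id}.{ext}"]


def run_rule_tests(rule_dir: Path, ext: str) -> bool:
    """Run the tests from inside the rule directory (hypothetical helper)."""
    cmd = build_test_command(rule_dir.name, ext)
    # cwd=rule_dir mirrors running the command from the rule directory.
    result = subprocess.run(cmd, cwd=rule_dir, capture_output=True, text=True)
    # Assumption: semgrep exits with 0 only when every test passes.
    return result.returncode == 0


print(" ".join(build_test_command("insecure-eval", "py")))
# semgrep --test --config insecure-eval.yaml insecure-eval.py
```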

Quick Reference

For commands, pattern operators, and taint mode syntax, see quick-reference.md.
For detailed workflow and examples, you MUST see workflow.md

Workflow

Copy this checklist and track progress:

Semgrep Rule Progress:
- [ ] Step 1: Analyze the Problem
- [ ] Step 2: Write Tests First
- [ ] Step 3: Analyze AST structure
- [ ] Step 4: Write the rule
- [ ] Step 5: Iterate until all tests pass (semgrep --test)
- [ ] Step 6: Optimize the rule (remove redundancies, re-test)
- [ ] Step 7: Final Run
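
To show why Step 3 matters, the sketch below uses Python's built-in `ast` module as a stand-in for an AST dump. This is an assumption for illustration only: Semgrep builds its own AST, which differs from what `ast.dump` prints. The idea still carries over: layout differences disappear after parsing, while a variation such as a qualified call appears as a different node that a narrow pattern could miss.

```python
import ast


def call_shape(source: str) -> str:
    """Return a structural dump of the single expression in `source`."""
    tree = ast.parse(source, mode="eval")
    # ast.dump omits line and column attributes by default, so only structure remains.
    return ast.dump(tree.body)


# Formatting differences vanish once the code is parsed.
same_a = call_shape("eval(user_input)")
same_b = call_shape("eval(  user_input  )")

# A qualified call is a different node type (Attribute instead of Name).
qualified = call_shape("builtins.eval(user_input)")

print(same_a == same_b)     # True
print(same_a == qualified)  # False
```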

Documentation

REQUIRED: Before writing any rule, use WebFetch to read all of these 4 links with Semgrep documentation:
