semgrep-rule-creator


Semgrep Rule Creator

Create production-quality Semgrep rules with proper testing and validation.

When to Use

Ideal scenarios:
  • Writing Semgrep rules for specific bug patterns
  • Writing rules to detect security vulnerabilities in your codebase
  • Writing taint mode rules for data flow vulnerabilities
  • Writing rules to enforce coding standards

When NOT to Use

Do NOT use this skill for:
  • Running existing Semgrep rulesets
  • General static analysis without custom rules (use the `static-analysis` skill)

Rationalizations to Reject

When writing Semgrep rules, reject these common shortcuts:
  • "The pattern looks complete" → Still run `semgrep --test --config <rule-id>.yaml <rule-id>.<ext>` to verify. Untested rules have hidden false positives/negatives.
  • "It matches the vulnerable case" → Matching vulnerabilities is half the job. Verify safe cases don't match (false positives break trust).
  • "Taint mode is overkill for this" → If data flows from user input to a dangerous sink, taint mode gives better precision than pattern matching.
  • "One test is enough" → Include edge cases: different coding styles, sanitized inputs, safe alternatives, and boundary conditions.
  • "I'll optimize the patterns first" → Write correct patterns first, optimize after all tests pass. Premature optimization causes regressions.
  • "The AST dump is too complex" → The AST reveals exactly how Semgrep sees code. Skipping it leads to patterns that miss syntactic variations.
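To make the last point concrete, here is a small sketch using Python's built-in `ast` module. This is an assumption for illustration only: Semgrep has its own AST with its own node types, and the helper name `first_arg_node_type` is hypothetical. The sketch shows that two spellings of the same shell command parse to different node types, which is why a pattern written against one form can miss the other.

```python
import ast


def first_arg_node_type(call_source: str) -> str:
    """Return the AST node type name of the first argument of a call expression.

    Uses Python's own `ast` module, not Semgrep's AST (illustrative only).
    """
    tree = ast.parse(call_source, mode="eval")
    call = tree.body
    if not isinstance(call, ast.Call) or not call.args:
        raise ValueError("expected a call expression with at least one argument")
    return type(call.args[0]).__name__


# Two spellings of the same command, both built from a variable named `path`.
concat_form = 'os.system("rm " + path)'
fstring_form = 'os.system(f"rm {path}")'

print(first_arg_node_type(concat_form))   # BinOp
print(first_arg_node_type(fstring_form))  # JoinedStr
```

A pattern that only describes string concatenation would correspond to the first shape and say nothing about the second, so inspecting the tree before writing the pattern helps surface such variations.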

Anti-Patterns

**Too broad** - matches everything, useless for detection:
```yaml
# BAD: Matches any function call
pattern: $FUNC(...)

# GOOD: Specific dangerous function
pattern: eval(...)
```

**Missing safe cases in tests** - leads to undetected false positives:
```python
# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)

# GOOD: Include safe cases to verify no false positives
# ruleid: my-rule
dangerous(user_input)

# ok: my-rule
dangerous(sanitize(user_input))

# ok: my-rule
dangerous("hardcoded_safe_value")
```

**Overly specific patterns** - misses variations:
```yaml
# BAD: Only matches exact format
pattern: os.system("rm " + $VAR)

# GOOD: Matches all os.system calls with taint tracking
mode: taint
pattern-sinks:
  - pattern: os.system(...)
```
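One way to catch the "missing safe cases" anti-pattern mechanically is to count the annotations in a test file before running Semgrep. The following is a minimal sketch, assuming annotations are written as `#` comments (as in Python test files); the helper names `count_annotations` and `has_safe_and_vulnerable_cases` are hypothetical and not part of Semgrep.

```python
import re
from collections import Counter

# Matches comment annotations such as "# ruleid: my-rule" or "# ok: my-rule".
# Assumes "#" comment syntax; other languages would need a different prefix.
ANNOTATION_RE = re.compile(r"#\s*(ruleid|ok):\s*([\w.-]+)")


def count_annotations(test_source: str, rule_id: str) -> Counter:
    """Count `ruleid` and `ok` annotations that refer to the given rule id."""
    counts = Counter()
    for line in test_source.splitlines():
        match = ANNOTATION_RE.search(line)
        if match and match.group(2) == rule_id:
            counts[match.group(1)] += 1
    return counts


def has_safe_and_vulnerable_cases(test_source: str, rule_id: str) -> bool:
    """True when the test file has at least one `ruleid` and one `ok` case."""
    counts = count_annotations(test_source, rule_id)
    return counts["ruleid"] > 0 and counts["ok"] > 0


bad_test = """\
# ruleid: my-rule
dangerous(user_input)
"""

good_test = bad_test + """\
# ok: my-rule
dangerous(sanitize(user_input))
# ok: my-rule
dangerous("hardcoded_safe_value")
"""

print(has_safe_and_vulnerable_cases(bad_test, "my-rule"))   # False
print(has_safe_and_vulnerable_cases(good_test, "my-rule"))  # True
```

This only checks that both kinds of annotation are present; it does not replace running `semgrep --test`, which is what actually verifies the rule against them.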

Strictness Level

This workflow is strict - do not skip steps:
  • Read documentation first: See Documentation before writing Semgrep rules
  • Test-first is mandatory: Never write a rule without tests
  • 100% test pass is required: "Most tests pass" is not acceptable
  • Optimization comes last: Only simplify patterns after all tests pass
  • Avoid generic patterns: Rules must be specific, not match broad patterns
  • Prioritize taint mode: For data flow vulnerabilities
  • One YAML file - one Semgrep rule: Each YAML file must contain only one Semgrep rule; don't combine multiple rules in a single file
  • No generic rules: When targeting a specific language, avoid generic pattern matching (`languages: generic`)
  • Forbidden `todook` and `todoruleid` test annotations: `todoruleid: <rule-id>` and `todook: <rule-id>` annotations in test files for future rule improvements are forbidden
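Some of these requirements can be checked automatically before a rule is considered done. Below is a rough, text-based sketch covering three of them (one rule per file, no `languages: generic`, no `todook`/`todoruleid` annotations). The function name `lint_rule_files` is hypothetical, and the regular expressions are heuristics that assume the inline formatting used in this document; a real YAML parser would be more robust.

```python
import re
from typing import List

# Heuristics only: they assume "- id: ..." list entries and an inline
# "languages: [...]" value, as in the examples in this document.
RULE_ID_RE = re.compile(r"^\s*-\s+id:\s*\S+", re.MULTILINE)
GENERIC_LANG_RE = re.compile(r"^\s*languages:.*\bgeneric\b", re.MULTILINE)
FORBIDDEN_ANNOTATION_RE = re.compile(r"\b(todook|todoruleid):")


def lint_rule_files(rule_yaml: str, test_source: str) -> List[str]:
    """Return a list of strictness violations (empty when none are found)."""
    problems = []
    rule_count = len(RULE_ID_RE.findall(rule_yaml))
    if rule_count != 1:
        problems.append(f"expected exactly 1 rule per YAML file, found {rule_count}")
    if GENERIC_LANG_RE.search(rule_yaml):
        problems.append("languages: generic is not allowed")
    if FORBIDDEN_ANNOTATION_RE.search(test_source):
        problems.append("todook/todoruleid annotations are forbidden")
    return problems


rule = "rules:\n  - id: insecure-eval\n    languages: [python]\n"
test = "# ruleid: insecure-eval\neval(request.args.get('code'))\n"
print(lint_rule_files(rule, test))  # []
```

An empty list means none of the three checks fired; it says nothing about whether the tests themselves pass.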

Overview

This skill guides creation of Semgrep rules that detect security vulnerabilities and code patterns. Rules are created iteratively: analyze the problem, write tests first, analyze AST structure, write the rule, iterate until all tests pass, optimize the rule.
Approach selection:
  • Taint mode (prioritize): Data flow issues where untrusted input reaches dangerous sinks
  • Pattern matching: Simple syntactic patterns without data flow requirements
Why prioritize taint mode? Pattern matching finds syntax but misses context. The pattern `eval($X)` matches both `eval(user_input)` (vulnerable) and `eval("safe_literal")` (safe). Taint mode tracks data flow, so it alerts only when untrusted data actually reaches the sink, which dramatically reduces false positives for injection vulnerabilities.
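The over-matching described above can be reproduced with a toy matcher built on Python's `ast` module. This is only an illustration under stated assumptions: it is not how Semgrep works internally, the function names are hypothetical, and the "non-literal argument" filter is a crude stand-in for context, not real taint tracking (which follows data from declared sources to sinks).

```python
import ast


def eval_calls(source: str):
    """Yield every call to a bare `eval` name in the given Python source."""
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
        ):
            yield node


def syntactic_matches(source: str) -> int:
    """Count all eval(...) calls, in the spirit of a purely syntactic pattern."""
    return sum(1 for _ in eval_calls(source))


def non_literal_matches(source: str) -> int:
    """Count eval(...) calls whose first argument is not a literal constant."""
    return sum(
        1
        for call in eval_calls(source)
        if call.args and not isinstance(call.args[0], ast.Constant)
    )


sample = 'eval(user_input)\neval("safe_literal")\n'
print(syntactic_matches(sample))    # 2
print(non_literal_matches(sample))  # 1
```

The purely syntactic count flags both calls, including the safe literal, while even a crude context check removes the false positive.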
Iterating between approaches: It's okay to experiment. If you start with taint mode and it's not working well (e.g., taint doesn't propagate as expected, too many false positives/negatives), switch to pattern matching. Conversely, if pattern matching produces too many false positives on safe cases, try taint mode instead. The goal is a working rule—not rigid adherence to one approach.
Output structure - exactly 2 files in a directory named after the rule-id:
<rule-id>/
├── <rule-id>.yaml     # Semgrep rule
└── <rule-id>.<ext>    # Test file with ruleid/ok annotations
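The layout above can be scaffolded with a few lines of Python. This is a minimal sketch; the function name `scaffold_rule_dir` and the choice to create empty files are assumptions, not part of the skill itself.

```python
from pathlib import Path
from typing import Tuple


def scaffold_rule_dir(base_dir: str, rule_id: str, ext: str) -> Tuple[Path, Path]:
    """Create <rule-id>/ containing an empty <rule-id>.yaml and <rule-id>.<ext>."""
    rule_dir = Path(base_dir) / rule_id
    rule_dir.mkdir(parents=True, exist_ok=True)
    rule_file = rule_dir / f"{rule_id}.yaml"
    test_file = rule_dir / f"{rule_id}.{ext}"
    rule_file.touch()  # Semgrep rule goes here
    test_file.touch()  # test file with ruleid/ok annotations goes here
    return rule_file, test_file
```

For example, `scaffold_rule_dir(".", "insecure-eval", "py")` would create `insecure-eval/insecure-eval.yaml` and `insecure-eval/insecure-eval.py` in the current directory.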

Quick Start

```yaml
rules:
  - id: insecure-eval
    languages: [python]
    severity: HIGH
    message: User input passed to eval() allows code execution
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: eval(...)
```

Test file (`insecure-eval.py`):

```python
# ruleid: insecure-eval
eval(request.args.get('code'))

# ok: insecure-eval
eval("print('safe')")
```

Run tests (from rule directory): `semgrep --test --config <rule-id>.yaml <rule-id>.<ext>`
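If the test run needs to be scripted, the command can be wrapped in Python. This is a sketch under assumptions: the helper names are hypothetical, it requires `semgrep` to be on the `PATH`, and it assumes that a non-zero exit code signals failing tests.

```python
import subprocess
from typing import List


def build_test_command(rule_id: str, ext: str) -> List[str]:
    """Build the `semgrep --test` command for a rule and its test file."""
    return ["semgrep", "--test", "--config", f"{rule_id}.yaml", f"{rule_id}.{ext}"]


def run_rule_tests(rule_dir: str, rule_id: str, ext: str) -> bool:
    """Run the tests from inside the rule directory.

    Returns True when the process exits with code 0 (assumed to mean all
    tests passed).
    """
    result = subprocess.run(build_test_command(rule_id, ext), cwd=rule_dir)
    return result.returncode == 0


print(" ".join(build_test_command("insecure-eval", "py")))
# semgrep --test --config insecure-eval.yaml insecure-eval.py
```

Running with `cwd=rule_dir` mirrors the instruction to execute the command from the rule directory.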

Quick Reference

  • For commands, pattern operators, and taint mode syntax, see quick-reference.md.
  • For the detailed workflow and examples, you MUST read workflow.md.

Workflow

Copy this checklist and track progress:
Semgrep Rule Progress:
- [ ] Step 1: Analyze the Problem
- [ ] Step 2: Write Tests First
- [ ] Step 3: Analyze AST structure
- [ ] Step 4: Write the rule
- [ ] Step 5: Iterate until all tests pass (semgrep --test)
- [ ] Step 6: Optimize the rule (remove redundancies, re-test)
- [ ] Step 7: Final Run

Documentation

REQUIRED: Before writing any rule, use WebFetch to read all 5 of these Semgrep documentation links:
  1. Rule Syntax
  2. Pattern Syntax
  3. ToB Testing Handbook - Semgrep
  4. Constant propagation
  5. Writing Rules Index