semgrep

Semgrep Static Analysis

When to Use Semgrep

Ideal scenarios:

- Quick security scans (minutes, not hours)
- Pattern-based bug detection
- Enforcing coding standards and best practices
- Finding known vulnerability patterns
- Single-file analysis without complex data flow
- First-pass analysis before deeper tools

Consider CodeQL instead when:

- You need interprocedural taint tracking across files
- Complex data flow analysis is required
- You are analyzing custom proprietary frameworks

When NOT to Use

Do NOT use this skill for:

- Complex interprocedural data flow analysis (use CodeQL instead)
- Binary analysis or compiled code without source
- Custom deep semantic analysis requiring AST/CFG traversal
- Tracking taint across many function boundaries

Installation

bash

# pip
python3 -m pip install semgrep

# Homebrew
brew install semgrep

# Docker
docker run --rm -v "${PWD}:/src" returntocorp/semgrep semgrep --config auto /src

# Update
pip install --upgrade semgrep

Core Workflow

1. Quick Scan

bash

semgrep --config auto .                       # Auto-select registry rules for the detected languages
semgrep --config p/default --metrics=off .    # Disable telemetry for proprietary code (`--config auto` needs metrics on, so name a ruleset)

2. Use Rulesets

bash

semgrep --config p/<RULESET> .             # Single ruleset
semgrep --config p/security-audit --config p/trailofbits .  # Multiple

Ruleset	Description
`p/default`	General security and code quality
`p/security-audit`	Comprehensive security rules
`p/owasp-top-ten`	OWASP Top 10 vulnerabilities
`p/cwe-top-25`	CWE Top 25 vulnerabilities
`p/r2c-security-audit`	r2c security audit rules
`p/trailofbits`	Trail of Bits security rules
`p/python`	Python-specific
`p/javascript`	JavaScript-specific
`p/golang`	Go-specific
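
When the same ruleset combination is run from scripts, it can help to assemble the command line in one place. The sketch below uses only flags that appear in this document (`--config`, `--metrics=off`); the helper name `build_semgrep_command` is hypothetical and not part of Semgrep.

```python
from typing import List, Sequence


def build_semgrep_command(rulesets: Sequence[str], target: str = ".",
                          metrics_off: bool = True) -> List[str]:
    """Build a semgrep argv with one --config flag per ruleset."""
    if not rulesets:
        raise ValueError("at least one ruleset is required")
    cmd = ["semgrep"]
    for ruleset in rulesets:
        cmd += ["--config", ruleset]
    if metrics_off:
        cmd.append("--metrics=off")
    cmd.append(target)
    return cmd


if __name__ == "__main__":
    argv = build_semgrep_command(["p/security-audit", "p/trailofbits"])
    print(" ".join(argv))
    # To run the scan, pass argv to subprocess.run (requires semgrep on PATH).
```

Returning a list rather than a shell string avoids quoting problems if a target path contains spaces.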

3. Output Formats

bash

semgrep --config p/security-audit --sarif -o results.sarif .   # SARIF
semgrep --config p/security-audit --json -o results.json .     # JSON
semgrep --config p/security-audit --dataflow-traces .          # Show data flow
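
The JSON report is convenient for post-processing. As a minimal sketch, the snippet below counts findings per rule; it assumes the report has a top-level `results` list whose entries carry a `check_id`, and the sample data is made up and heavily trimmed. The function name `count_findings_by_rule` is hypothetical.

```python
import json
from collections import Counter
from typing import Dict


def count_findings_by_rule(report: dict) -> Dict[str, int]:
    """Count findings per rule ID in a parsed Semgrep JSON report."""
    return dict(Counter(finding["check_id"] for finding in report.get("results", [])))


# Trimmed, made-up sample in the shape of `semgrep --json` output.
SAMPLE_REPORT = json.loads("""
{
  "results": [
    {"check_id": "rules.sql-injection", "path": "app.py",
     "start": {"line": 12},
     "extra": {"severity": "ERROR", "message": "Potential SQL injection"}},
    {"check_id": "rules.sql-injection", "path": "db.py",
     "start": {"line": 40},
     "extra": {"severity": "ERROR", "message": "Potential SQL injection"}},
    {"check_id": "rules.hardcoded-password", "path": "settings.py",
     "start": {"line": 3},
     "extra": {"severity": "ERROR", "message": "Hardcoded password detected"}}
  ],
  "errors": []
}
""")

if __name__ == "__main__":
    for rule_id, count in sorted(count_findings_by_rule(SAMPLE_REPORT).items()):
        print(f"{count:3d}  {rule_id}")
```

In practice the dictionary would come from `json.load` on the `results.json` file written by the command above.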

4. Scan Specific Paths

bash

semgrep --config p/python app.py           # Single file
semgrep --config p/javascript src/         # Directory
semgrep --config auto --include='**/test/**' .  # Scan only paths matching the glob

Writing Custom Rules

Basic Structure

yaml

rules:
  - id: hardcoded-password
    languages: [python]
    message: "Hardcoded password detected: $PASSWORD"
    severity: ERROR
    pattern: password = "$PASSWORD"

Pattern Syntax

Syntax	Description	Example
`...`	Match anything	`func(...)`
`$VAR`	Capture metavariable	`$FUNC($INPUT)`
`<... ...>`	Deep expression match	`<... user_input ...>`

Pattern Operators

Operator	Description
`pattern`	Match exact pattern
`patterns`	All must match (AND)
`pattern-either`	Any matches (OR)
`pattern-not`	Exclude matches
`pattern-inside`	Match only inside context
`pattern-not-inside`	Match only outside context
`pattern-regex`	Regex matching
`metavariable-regex`	Regex on captured value
`metavariable-comparison`	Compare values

Combining Patterns

yaml

rules:
  - id: sql-injection
    languages: [python]
    message: "Potential SQL injection"
    severity: ERROR
    patterns:
      - pattern-either:
          - pattern: cursor.execute($QUERY)
          - pattern: db.execute($QUERY)
      - pattern-not: cursor.execute("...", (...))
      - metavariable-regex:
          metavariable: $QUERY
          regex: .*\+.*|.*\.format\(.*|.*%.*
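
To get a feel for what the `metavariable-regex` above accepts, the same expression can be tried against candidate source text. This is only a sketch: it uses Python's `re` module as a stand-in and assumes the regex is applied, anchored at the start, to the source text bound to `$QUERY`. The candidate strings are hypothetical.

```python
import re

# Same expression as the rule's metavariable-regex.
QUERY_REGEX = re.compile(r".*\+.*|.*\.format\(.*|.*%.*")

# Source text that $QUERY might bind to, with the expected verdict.
CANDIDATES = {
    '"SELECT * FROM users WHERE id = " + user_id': True,          # concatenation
    '"SELECT * FROM users WHERE id = {}".format(user_id)': True,  # str.format
    '"SELECT * FROM users WHERE id = %s" % user_id': True,        # % formatting
    'query': False,                                                # bare variable
    'f"SELECT * FROM users WHERE id = {user_id}"': False,         # f-string
    '"SELECT * FROM users WHERE name LIKE \'a%\'"': True,          # constant query
}


def is_flagged(query_text: str) -> bool:
    """Return True if the regex matches from the start of the text."""
    return QUERY_REGEX.match(query_text) is not None


if __name__ == "__main__":
    for text, expected in CANDIDATES.items():
        print(f"{is_flagged(text)!s:5}  {text}")
```

Under these assumptions the regex misses f-strings and queries built in an earlier statement, and it flags a constant query that merely contains `%`. Gaps like these are what rule tests and taint mode are meant to cover.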

Taint Mode (Data Flow)

Simple pattern matching finds obvious cases:

python

# Pattern `os.system(user_input)` catches this:
os.system(user_input)  # Found

But misses indirect flows:

python

# Same pattern misses this:
cmd = user_input
processed = cmd.strip()
os.system(processed)  # Missed - no direct match

A metavariable pattern such as `os.system($CMD)` would match both calls, but only because it matches every call; it says nothing about where the argument came from.

Taint mode tracks data through assignments and transformations:
- **Source**: Where untrusted data enters (`user_input`)
- **Propagators**: How it flows (`cmd = ...`, `processed = ...`)
- **Sanitizers**: What makes it safe (`shlex.quote()`)
- **Sink**: Where it becomes dangerous (`os.system()`)

yaml

rules:
  - id: command-injection
    languages: [python]
    message: "User input flows to command execution"
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
      - pattern: request.json
    pattern-sinks:
      - pattern: os.system($SINK)
      - pattern: subprocess.call($SINK, shell=True)
      - pattern: subprocess.run($SINK, shell=True, ...)
    pattern-sanitizers:
      - pattern: shlex.quote(...)
      - pattern: int(...)
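
The source, propagator, sanitizer, and sink roles can be illustrated with a toy model. The sketch below is not Semgrep's engine: it handles only straight-line code, and the statement encoding and the name `find_tainted_sinks` are invented for this illustration.

```python
from typing import List, Set, Tuple

# A statement is (kind, target, inputs):
#   ("source",   "cmd",       [])            target receives untrusted data
#   ("assign",   "processed", ["cmd"])       target is derived from inputs
#   ("sanitize", "quoted",    ["cmd"])       target is a sanitized copy
#   ("sink",     "",          ["processed"]) inputs reach a dangerous call
Statement = Tuple[str, str, List[str]]


def find_tainted_sinks(program: List[Statement]) -> List[int]:
    """Return the indexes of sink statements that receive tainted data."""
    tainted: Set[str] = set()
    hits: List[int] = []
    for index, (kind, target, inputs) in enumerate(program):
        if kind == "source":
            tainted.add(target)
        elif kind == "assign":
            # Propagation: the target is tainted if any input is tainted.
            if any(name in tainted for name in inputs):
                tainted.add(target)
            else:
                tainted.discard(target)
        elif kind == "sanitize":
            # A sanitizer's result is treated as clean.
            tainted.discard(target)
        elif kind == "sink":
            if any(name in tainted for name in inputs):
                hits.append(index)
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    return hits


UNSAFE = [
    ("source", "cmd", []),              # cmd = request.args.get(...)
    ("assign", "processed", ["cmd"]),   # processed = cmd.strip()
    ("sink", "", ["processed"]),        # os.system(processed)
]

SAFE = [
    ("source", "cmd", []),              # cmd = request.args.get(...)
    ("sanitize", "quoted", ["cmd"]),    # quoted = shlex.quote(cmd)
    ("sink", "", ["quoted"]),           # os.system(quoted)
]

if __name__ == "__main__":
    print(find_tainted_sinks(UNSAFE))
    print(find_tainted_sinks(SAFE))
```

The first program reports its sink because taint survives the `strip()` assignment; the second reports nothing because the value passes through a sanitizer before reaching the sink.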

Full Rule with Metadata

yaml

rules:
  - id: flask-sql-injection
    languages: [python]
    message: "SQL injection: user input flows to query without parameterization"
    severity: ERROR
    metadata:
      cwe: "CWE-89: SQL Injection"
      owasp: "A03:2021 - Injection"
      confidence: HIGH
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
      - pattern: request.json
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
      - pattern: db.execute($QUERY)
    pattern-sanitizers:
      - pattern: int(...)
    fix: cursor.execute($QUERY, (params,))
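
To make the difference between the flagged and the parameterized form concrete, here is a small runnable comparison. It is a sketch using the standard-library `sqlite3` module with an in-memory table; the function names are hypothetical, and a plain string argument stands in for `request.args.get("id")`.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])
    return conn


def lookup_unsafe(conn: sqlite3.Connection, user_id: str) -> list:
    """Concatenate input into the query (single-argument execute, the rule's sink shape)."""
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM users WHERE id = " + user_id)
    return cursor.fetchall()


def lookup_safe(conn: sqlite3.Connection, user_id: str) -> list:
    """Pass input as a bound parameter so it is never parsed as SQL."""
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM users WHERE id = ?", (user_id,))
    return cursor.fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "1 OR 1=1"  # stands in for request.args.get("id")
    print(lookup_unsafe(conn, payload))  # the injected OR clause matches every row
    print(lookup_safe(conn, payload))    # the payload is compared as a plain value
```

The two-argument call in `lookup_safe` does not match the sink pattern `cursor.execute($QUERY)`, which is why the parameterized form is not reported.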

Testing Rules

Test File Format

python

# test_rule.py
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: flask-sql-injection
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: flask-sql-injection
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))

bash

semgrep --test rules/
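
Each annotation comment refers to the line directly below it. The following sketch shows how the expected line numbers could be derived from a test file; it is a simplified illustration rather than Semgrep's own test runner, and `expected_lines` is a hypothetical helper that only understands `ruleid` and `ok`.

```python
import re
from typing import Dict, List

ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.-]+)")


def expected_lines(test_source: str) -> Dict[str, List[int]]:
    """Map 'ruleid'/'ok' to the 1-based line numbers the annotations refer to."""
    expected: Dict[str, List[int]] = {"ruleid": [], "ok": []}
    for number, line in enumerate(test_source.splitlines(), start=1):
        match = ANNOTATION.search(line)
        if match:
            # The annotation applies to the line directly below the comment.
            expected[match.group(1)].append(number + 1)
    return expected


TEST_FILE = '''\
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: flask-sql-injection
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: flask-sql-injection
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))
'''

if __name__ == "__main__":
    print(expected_lines(TEST_FILE))
```

A rule passes its tests when it reports every `ruleid` line and none of the `ok` lines.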

CI/CD Integration (GitHub Actions)

yaml

name: Semgrep

on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 0 1 * *'  # Monthly

jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Required for diff-aware scanning

      - name: Run Semgrep
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            semgrep ci --baseline-commit ${{ github.event.pull_request.base.sha }}
          else
            semgrep ci
          fi
        env:
          SEMGREP_RULES: >-
            p/security-audit
            p/owasp-top-ten
            p/trailofbits

Configuration

.semgrepignore

tests/fixtures/
**/testdata/
generated/
vendor/
node_modules/

Suppress False Positives

python

password = get_from_vault()  # nosemgrep: hardcoded-password
dangerous_but_safe()  # nosemgrep

Performance

bash

semgrep --config rules/ --time .    # Check rule performance
ulimit -n 4096                       # Increase file descriptors for large codebases

Path Filtering in Rules

yaml

rules:
  - id: my-rule
    paths:
      include: [src/]
      exclude: [src/generated/]

Third-Party Rules

bash

pip install semgrep-rules-manager
semgrep-rules-manager --dir ~/semgrep-rules download
semgrep -f ~/semgrep-rules .

Rationalizations to Reject

Shortcut	Why It's Wrong
"Semgrep found nothing, code is clean"	Semgrep is pattern-based; it can't track complex data flow across functions
"I wrote a rule, so we're covered"	Rules need testing with `semgrep --test`; false negatives are silent
"Taint mode catches injection"	Only if you defined all sources, sinks, AND sanitizers correctly
"Pro rules are comprehensive"	Pro rules are good but not exhaustive; supplement with custom rules for your codebase
"Too many findings = noisy tool"	A high finding count often means real problems; tune rules, don't disable them

Resources

Registry: https://semgrep.dev/explore
Playground: https://semgrep.dev/playground
Docs: https://semgrep.dev/docs/
Trail of Bits Rules: https://github.com/trailofbits/semgrep-rules
Blog: https://semgrep.dev/blog/
