semgrep
Semgrep Static Analysis
When to Use Semgrep
Ideal scenarios:
- Quick security scans (minutes, not hours)
- Pattern-based bug detection
- Enforcing coding standards and best practices
- Finding known vulnerability patterns
- Single-file analysis without complex data flow
- First-pass analysis before deeper tools
Consider CodeQL instead when:
- You need interprocedural taint tracking across files
- You need complex data flow analysis
- You are analyzing custom proprietary frameworks
When NOT to Use
Do NOT use this skill for:
- Complex interprocedural data flow analysis (use CodeQL instead)
- Binary analysis or compiled code without source
- Custom deep semantic analysis requiring AST/CFG traversal
- Tracking taint across many function boundaries
Installation
```bash
# pip
python3 -m pip install semgrep

# Homebrew
brew install semgrep

# Docker
docker run --rm -v "${PWD}:/src" returntocorp/semgrep semgrep --config auto /src

# Update
pip install --upgrade semgrep
```
Core Workflow
1. Quick Scan
```bash
semgrep --config auto .                              # Auto-detect rules
semgrep --config p/security-audit --metrics=off .    # Disable telemetry for proprietary code (auto config needs metrics on, so name a ruleset)
```
2. Use Rulesets
```bash
semgrep --config p/<RULESET> .                                # Single ruleset
semgrep --config p/security-audit --config p/trailofbits .    # Multiple rulesets
```
| Ruleset | Description |
|---|---|
| `p/default` | General security and code quality |
| `p/security-audit` | Comprehensive security rules |
| `p/owasp-top-ten` | OWASP Top 10 vulnerabilities |
| `p/cwe-top-25` | CWE Top 25 vulnerabilities |
| `p/r2c-security-audit` | r2c security audit rules |
| `p/trailofbits` | Trail of Bits security rules |
| `p/python` | Python-specific |
| `p/javascript` | JavaScript-specific |
| `p/golang` | Go-specific |
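When scans are launched from a script, stacking rulesets comes down to repeating the `--config` flag. The following is a minimal sketch, assuming `semgrep` is on `PATH`; `build_scan_command` is a hypothetical helper name, and it uses only the `--config` and `--metrics` flags shown above.

```python
import shlex


def build_scan_command(rulesets, target=".", metrics_off=False):
    """Build a semgrep argv list with one --config flag per ruleset.

    `build_scan_command` is a hypothetical helper, not part of Semgrep.
    """
    if not rulesets:
        raise ValueError("at least one ruleset is required")
    cmd = ["semgrep"]
    for ruleset in rulesets:
        cmd += ["--config", ruleset]
    if metrics_off:
        cmd.append("--metrics=off")
    cmd.append(target)
    return cmd


# Print a copy-pasteable shell line; pass the list to subprocess.run to execute it.
print(shlex.join(build_scan_command(["p/security-audit", "p/trailofbits"], "src/")))
```

Building an argument list rather than a shell string avoids quoting problems when a target path contains spaces.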
3. Output Formats
```bash
semgrep --config p/security-audit --sarif -o results.sarif .    # SARIF
semgrep --config p/security-audit --json -o results.json .      # JSON
semgrep --config p/security-audit --dataflow-traces .           # Show data flow
```
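The JSON output is convenient for post-processing. The sketch below counts findings per severity and rule. It assumes the report has a top-level `results` list whose entries carry `check_id` and `extra.severity`, which matches Semgrep's JSON output as far as I know, but check it against the version you run. The function names and the sample data are illustrative.

```python
import json
from collections import Counter


def summarize_findings(report):
    """Count findings per (severity, rule id) in a parsed Semgrep JSON report."""
    counts = Counter()
    for result in report.get("results", []):
        severity = result.get("extra", {}).get("severity", "UNKNOWN")
        counts[(severity, result.get("check_id", "unknown-rule"))] += 1
    return counts


def load_report(path="results.json"):
    """Read the file written by `semgrep --json -o results.json`."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hand-written sample shaped like the assumed report layout (illustrative data only).
sample = {
    "results": [
        {"check_id": "hardcoded-password", "path": "app.py",
         "start": {"line": 3}, "extra": {"severity": "ERROR"}},
        {"check_id": "hardcoded-password", "path": "db.py",
         "start": {"line": 9}, "extra": {"severity": "ERROR"}},
    ]
}
for (severity, rule_id), count in sorted(summarize_findings(sample).items()):
    print(f"{severity:8} {rule_id}: {count}")
```

For a real run, replace `sample` with `load_report()` after writing `results.json` with the JSON command above.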
4. Scan Specific Paths
```bash
semgrep --config p/python app.py                  # Single file
semgrep --config p/javascript src/                # Directory
semgrep --config auto --include='**/test/**' .    # Scan only paths matching the glob (tests are excluded by default)
```
Writing Custom Rules
Basic Structure
```yaml
rules:
  - id: hardcoded-password
    languages: [python]
    message: "Hardcoded password detected: $PASSWORD"
    severity: ERROR
    pattern: password = "$PASSWORD"
```
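A quick structural check can catch an incomplete rule before it reaches Semgrep. This is a minimal sketch that works on a rule already parsed into a Python dict (for example with a YAML loader). `missing_rule_fields` is a hypothetical helper, and the key sets are my reading of the structure shown above rather than Semgrep's full schema.

```python
# Keys every rule in this document carries; an assumption, not the full schema.
REQUIRED_KEYS = {"id", "languages", "message", "severity"}
PATTERN_KEYS = {"pattern", "patterns", "pattern-either", "pattern-regex"}


def missing_rule_fields(rule):
    """Return a list of problems found in one parsed rule mapping."""
    present = set(rule)
    problems = sorted(f"missing key: {key}" for key in REQUIRED_KEYS - present)
    if rule.get("mode") == "taint":
        # Taint rules describe sources and sinks instead of a single pattern.
        for key in ("pattern-sources", "pattern-sinks"):
            if key not in present:
                problems.append(f"missing key: {key}")
    elif not PATTERN_KEYS & present:
        problems.append("missing a pattern operator")
    return problems


rule = {
    "id": "hardcoded-password",
    "languages": ["python"],
    "message": "Hardcoded password detected: $PASSWORD",
    "severity": "ERROR",
    "pattern": 'password = "$PASSWORD"',
}
print(missing_rule_fields(rule))
```

An empty list means the rule has the basic shape; it says nothing about whether the pattern matches what you intend.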
Pattern Syntax
| Syntax | Description | Example |
|---|---|---|
| `...` | Match anything | `func(...)` |
| `$VAR` | Capture metavariable | `$FUNC($INPUT)` |
| `<... ...>` | Deep expression match | `<... user_input ...>` |
Pattern Operators
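| Operator | Description |
|---|---|
| `pattern` | Match exact pattern |
| `patterns` | All must match (AND) |
| `pattern-either` | Any matches (OR) |
| `pattern-not` | Exclude matches |
| `pattern-inside` | Match only inside context |
| `pattern-not-inside` | Match only outside context |
| `pattern-regex` | Regex matching |
| `metavariable-regex` | Regex on captured value |
| `metavariable-comparison` | Compare values |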
Combining Patterns
```yaml
rules:
  - id: sql-injection
    languages: [python]
    message: "Potential SQL injection"
    severity: ERROR
    patterns:
      - pattern-either:
          - pattern: cursor.execute($QUERY)
          - pattern: db.execute($QUERY)
      # Skip calls that already pass parameters separately
      - pattern-not: cursor.execute("...", (...))
      - metavariable-regex:
          metavariable: $QUERY
          regex: '.*\+.*|.*\.format\(.*|.*%.*'
```
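It helps to try the `metavariable-regex` expression on a few candidate `$QUERY` values before relying on it. The sketch below uses Python's `re` module as a stand-in for Semgrep's regex engine and `re.match` to mimic matching from the start of the captured text. Both are assumptions for illustration, and the sample strings are invented.

```python
import re

# Same expression as the rule's metavariable-regex.
DYNAMIC_QUERY = re.compile(r".*\+.*|.*\.format\(.*|.*%.*")


def looks_dynamic(query_source):
    """Return True when the source text of the query expression matches."""
    return DYNAMIC_QUERY.match(query_source) is not None


samples = [
    '"SELECT * FROM users WHERE id = " + user_id',          # concatenation
    '"SELECT * FROM users WHERE id = {}".format(user_id)',  # str.format
    '"SELECT * FROM users WHERE name = \'%s\'" % name',     # % formatting
    '"SELECT * FROM users"',                                # constant query
    '"SELECT * FROM users WHERE name LIKE \'a%\'"',         # constant, but contains %
]
for text in samples:
    print(looks_dynamic(text), text)
```

The last sample is a constant query that still matches because of the bare `%` branch, which is the kind of false positive worth finding before the rule ships.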
Taint Mode (Data Flow)
Simple pattern matching finds obvious cases:
```python
# Pattern `os.system(user_input)` catches this:
os.system(user_input)  # Found
```
But it misses indirect flows:
```python
# Same pattern misses this:
cmd = user_input
processed = cmd.strip()
os.system(processed)  # Missed - no direct match
```
Taint mode tracks data through assignments and transformations:
- **Source**: Where untrusted data enters (`user_input`)
- **Propagators**: How it flows (`cmd = ...`, `processed = ...`)
- **Sanitizers**: What makes it safe (`shlex.quote()`)
- **Sink**: Where it becomes dangerous (`os.system()`)
```yaml
rules:
  - id: command-injection
    languages: [python]
    message: "User input flows to command execution"
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
      - pattern: request.json
    pattern-sinks:
      - pattern: os.system($SINK)
      - pattern: subprocess.call($SINK, shell=True)
      - pattern: subprocess.run($SINK, shell=True, ...)
    pattern-sanitizers:
      - pattern: shlex.quote(...)
      - pattern: int(...)
```
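The source, propagator, sanitizer, and sink roles can be illustrated with a toy tracker over straight-line code. This is a conceptual sketch only, not how Semgrep implements taint mode. The statement encoding and the name `find_tainted_sinks` are invented for the example.

```python
def find_tainted_sinks(statements, sources, sanitizers, sinks):
    """Return 1-based indexes of sink calls that receive tainted data.

    Each statement is (target, function, argument), read as
    `target = function(argument)`. Use target=None for a bare call and
    function=None for a plain assignment.
    """
    tainted = set()
    findings = []
    for index, (target, function, argument) in enumerate(statements, start=1):
        if function in sinks:
            if argument in tainted:
                findings.append(index)
            continue
        if function in sources:
            result_tainted = True                 # source: untrusted data enters
        elif function in sanitizers:
            result_tainted = False                # sanitizer: output is treated as safe
        else:
            result_tainted = argument in tainted  # propagator: taint follows the data
        if target is not None:
            if result_tainted:
                tainted.add(target)
            else:
                tainted.discard(target)
    return findings


program = [
    ("user_input", "request.args.get", "id"),  # 1: source
    ("cmd", None, "user_input"),               # 2: assignment propagates taint
    ("processed", "str.strip", "cmd"),         # 3: transformation propagates taint
    (None, "os.system", "processed"),          # 4: tainted data reaches the sink
    ("quoted", "shlex.quote", "cmd"),          # 5: sanitizer
    (None, "os.system", "quoted"),             # 6: sanitized data reaches the sink
]
print(find_tainted_sinks(
    program,
    sources={"request.args.get"},
    sanitizers={"shlex.quote"},
    sinks={"os.system"},
))
```

Statement 4 is reported even though no single line pairs the source with the sink, which is the case plain pattern matching misses.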
Full Rule with Metadata
```yaml
rules:
  - id: flask-sql-injection
    languages: [python]
    message: "SQL injection: user input flows to query without parameterization"
    severity: ERROR
    metadata:
      cwe: "CWE-89: SQL Injection"
      owasp: "A03:2021 - Injection"
      confidence: HIGH
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
      - pattern: request.json
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
      - pattern: db.execute($QUERY)
    pattern-sanitizers:
      - pattern: int(...)
    # Autofix template; review the result by hand, since `params` must exist at the call site
    fix: cursor.execute($QUERY, (params,))
```
Testing Rules
Test File Format
```python
# test_rule.py
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: flask-sql-injection
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: flask-sql-injection
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))
```
```bash
semgrep --test rules/
```
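The annotations in the test file state which lines a rule must and must not flag. The sketch below lists those expectations for review. It assumes each `# ruleid:` or `# ok:` comment applies to the line directly below it, as in the file above. `expected_results` is a hypothetical helper and handles only `#`-style comments.

```python
import re

ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.-]+)")


def expected_results(source):
    """Map each annotated rule id to the line numbers expected to match or not."""
    expectations = {}
    for number, line in enumerate(source.splitlines(), start=1):
        found = ANNOTATION.search(line)
        if found:
            kind, rule_id = found.groups()
            entry = expectations.setdefault(rule_id, {"ruleid": [], "ok": []})
            entry[kind].append(number + 1)  # the annotation covers the following line
    return expectations


SAMPLE = "\n".join([
    "def test_vulnerable():",
    '    user_input = request.args.get("id")',
    "    # ruleid: flask-sql-injection",
    '    cursor.execute("SELECT * FROM users WHERE id = " + user_input)',
    "",
    "def test_safe():",
    '    user_input = request.args.get("id")',
    "    # ok: flask-sql-injection",
    '    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))',
])
print(expected_results(SAMPLE))
```

A rule with `ruleid` cases but no `ok` cases (or the reverse) is a hint that the test file covers only one side of the rule's behavior.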
CI/CD Integration (GitHub Actions)
```yaml
name: Semgrep
on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 0 1 * *'  # Monthly
jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Required for diff-aware scanning
      - name: Run Semgrep
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            semgrep ci --baseline-commit ${{ github.event.pull_request.base.sha }}
          else
            semgrep ci
          fi
        env:
          SEMGREP_RULES: >-
            p/security-audit
            p/owasp-top-ten
            p/trailofbits
```
Configuration
.semgrepignore
```
tests/fixtures/
**/testdata/
generated/
vendor/
node_modules/
```
Suppress False Positives
```python
password = get_from_vault()  # nosemgrep: hardcoded-password
dangerous_but_safe()  # nosemgrep
```
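A bare `nosemgrep` silences every rule on its line, so it can be worth auditing for suppressions that name no rule id. The following is a minimal sketch for `#`-style comments only. `blanket_suppressions` is a hypothetical helper, not a Semgrep feature.

```python
import re

SUPPRESSION = re.compile(r"#\s*nosemgrep\b(?P<rest>.*)$")


def blanket_suppressions(source):
    """Return line numbers whose nosemgrep comment names no rule id."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        found = SUPPRESSION.search(line)
        # Whatever follows `nosemgrep`, minus the colon and spaces, should be a rule id.
        if found and not found.group("rest").strip(": \t"):
            flagged.append(number)
    return flagged


SAMPLE = "\n".join([
    "password = get_from_vault()  # nosemgrep: hardcoded-password",
    "dangerous_but_safe()  # nosemgrep",
])
print(blanket_suppressions(SAMPLE))
```

Lines reported here are candidates for narrowing to a specific rule id, as in the first line of the example above.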
Performance
```bash
semgrep --config rules/ --time .    # Check rule performance
ulimit -n 4096                      # Increase file descriptors for large codebases
```
Path Filtering in Rules
```yaml
rules:
  - id: my-rule
    paths:
      include: [src/]
      exclude: [src/generated/]
```
Third-Party Rules
```bash
pip install semgrep-rules-manager
semgrep-rules-manager --dir ~/semgrep-rules download
semgrep -f ~/semgrep-rules .
```
Rationalizations to Reject
| Shortcut | Why It's Wrong |
|---|---|
| "Semgrep found nothing, code is clean" | Semgrep is pattern-based; it can't track complex data flow across functions |
| "I wrote a rule, so we're covered" | Rules need testing with `semgrep --test` |
| "Taint mode catches injection" | Only if you defined all sources, sinks, AND sanitizers correctly |
| "Pro rules are comprehensive" | Pro rules are good but not exhaustive; supplement with custom rules for your codebase |
| "Too many findings = noisy tool" | High finding count often means real problems; tune rules, don't disable them |
Resources
资源
- Registry: https://semgrep.dev/explore
- Playground: https://semgrep.dev/playground
- Docs: https://semgrep.dev/docs/
- Trail of Bits Rules: https://github.com/trailofbits/semgrep-rules
- Blog: https://semgrep.dev/blog/