Opengrep Static Analysis
Opengrep is a community-maintained, open-source static analysis tool forked from Semgrep. It uses the same rule syntax and command-line interface, so existing Semgrep rules and knowledge carry over.
Two Use Cases
1. Semantic Code Search (grep alternative)
When exploring a codebase, grep finds text patterns but misses structural patterns. Opengrep understands code structure:
| Task | Grep | Opengrep |
|---|---|---|
| Find text "execute" | Fast, works | Overkill |
| Find real calls to a function | May match comments, strings | Matches only actual calls |
| Find functions that call a given function | Difficult | Combine a call pattern with `pattern-inside` |
| Find unparameterized SQL queries | Nearly impossible | Taint mode |
Use Opengrep when:
- You need to find function/method calls with specific arguments
- Grep returns too many false positives (matches in comments, strings, similar names)
- You need to find patterns inside specific contexts (inside loops, inside try blocks)
- The pattern has structural meaning, not just text
Stick with grep when:
- Simple text search
- Speed is critical
- The pattern is a literal string or simple regex
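As a stdlib-only illustration of the text-versus-structure distinction (this uses Python's `ast` module, not Opengrep), the sketch below compares a grep-style substring search with a search over the parsed syntax tree. The sample source and the helper names `grep_hits` and `call_hits` are made up for the example.

```python
import ast

SOURCE = '''
# TODO: stop calling execute() here
cursor.execute("SELECT 1")
log("about to execute the query")
def execute_later(): pass
'''


def grep_hits(source: str, word: str) -> list[int]:
    """Lines containing the word anywhere (what `grep -n word` would report)."""
    return [n for n, line in enumerate(source.splitlines(), start=1) if word in line]


def call_hits(source: str, name: str) -> list[int]:
    """Lines with a real call to `name(...)` or `<obj>.name(...)`, found via the AST."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            called = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
            if called == name:
                hits.append(node.lineno)
    return sorted(hits)


print("grep:", grep_hits(SOURCE, "execute"))  # comment, call, string literal, and def name
print("ast: ", call_hits(SOURCE, "execute"))  # only the actual call
```

The substring search reports the comment, the call, the string literal, and the `def` name; the AST search reports only the call, which is the same kind of precision a pattern such as `$OBJ.execute(...)` gives in Opengrep.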
2. Security Scanning
Run rulesets to detect vulnerabilities and insecure patterns.
Installation
Linux / macOS
```bash
curl -fsSL https://raw.githubusercontent.com/opengrep/opengrep/main/install.sh | bash
```
Windows (PowerShell)
```powershell
irm https://raw.githubusercontent.com/opengrep/opengrep/main/install.ps1 | iex
```
Manual Install
Download binaries from the releases page.
Verify
```bash
opengrep --version
```
Self-contained binaries are available for macOS, Linux, and Windows; no Python installation is required.
Run `opengrep scan --help` to list all available options and flags.
Code Search Patterns
Quick One-Liners
```bash
# Find all calls to a function
opengrep scan -e 'dangerous_function(...)' -l python .

# Find .execute(...) calls on any object (the receiver is captured as $OBJ)
opengrep scan -e '$OBJ.execute(...)' -l python .

# Find any variable assigned from os.environ.get(...)
opengrep scan -e '$VAR = os.environ.get(...)' -l python .

# Find function definitions
opengrep scan -e 'def $FUNC(...): ...' -l python .
```
Performance Note
Opengrep parses each file into an AST (abstract syntax tree). For a quick text search, grep is 10-100x faster, so use Opengrep only when the structural match is worth the overhead.
Security Scanning
Quick Scan
```bash
# Scan with a ruleset
opengrep scan --config p/security-audit .

# Multiple rulesets
opengrep scan --config p/security-audit --config p/owasp-top-ten .

# Scan specific paths
opengrep scan --config p/python src/
```
Output Formats
```bash
# SARIF for tooling
opengrep scan --config p/default --sarif -o results.sarif .

# JSON for automation
opengrep scan --config p/default --json -o results.json .

# Show data flow traces
opengrep scan --dataflow-traces -f rule.yaml .

# Include enclosing context (function/class) in JSON output (experimental)
opengrep scan --output-enclosing-context --json -f rule.yaml . --experimental
```
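If you post-process `results.sarif` yourself, a few lines of standard-library Python are enough. This is a sketch that assumes the file follows the standard SARIF 2.1.0 layout (`runs[].results[]`, each result carrying `ruleId` and `locations[].physicalLocation`); `load_sarif`, `findings_per_rule`, and `result_locations` are hypothetical helper names, not part of Opengrep.

```python
import json
from collections import Counter


def load_sarif(path: str) -> dict:
    """Read a SARIF file such as the one written by `--sarif -o results.sarif`."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def findings_per_rule(sarif: dict) -> Counter:
    """Count results by ruleId across all runs."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            counts[result.get("ruleId", "<unknown>")] += 1
    return counts


def result_locations(sarif: dict) -> list[tuple[str, str, int]]:
    """List (ruleId, file URI, start line) for every location of every result."""
    rows = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            for loc in result.get("locations", []):
                physical = loc.get("physicalLocation", {})
                uri = physical.get("artifactLocation", {}).get("uri", "")
                line = physical.get("region", {}).get("startLine", 0)
                rows.append((result.get("ruleId", "<unknown>"), uri, line))
    return rows
```

Typical use: `findings_per_rule(load_sarif("results.sarif")).most_common(10)` lists the noisiest rules first, which helps decide where to tune or suppress.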
Filtering
```bash
# By severity
opengrep scan --config p/default --severity ERROR .

# By path
opengrep scan --config p/default --include='src/**' --exclude='**/test/**' .

# Apply exclusions to explicitly passed file targets (not just directory scan roots)
opengrep scan --force-exclude --exclude='**/vendor/**' -f rule.yaml vendor/lib.py
```
Intrafile Cross-Function Tainting
Opengrep supports tracking taint across functions within a single file:
```bash
opengrep scan --taint-intrafile -f taint-rule.yaml .
```
This also enables higher-order function support and is similar to Semgrep Pro's `--pro-intrafile`.
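The kind of flow this mode targets looks like the hypothetical target file below: the source (`request.args.get(...)`) and the sink (`cursor.execute(...)`) sit in different functions and are connected only through a return value. It assumes a taint rule like the `sql-injection` example under Taint Mode; the annotations describe the result expected with `--taint-intrafile` enabled, and the exact behavior with and without the flag is not verified here.

```python
# Hypothetical target file: taint enters in one function and reaches a sink in another.


def read_user_id(request):
    # Source: untrusted query-string value (matches request.args.get(...))
    return request.args.get("id")


def find_user(request, cursor):
    user_id = read_user_id(request)  # taint crosses the function boundary via the return value
    # ruleid: sql-injection
    cursor.execute("SELECT * FROM users WHERE id = " + user_id)


def find_user_safely(request, cursor):
    user_id = read_user_id(request)
    # ok: sql-injection
    cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))  # query text is a constant
```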
Writing Custom Rules
Basic Rule Structure
```yaml
rules:
  - id: hardcoded-secret
    languages: [python]
    message: "Hardcoded secret detected in $VAR"
    severity: ERROR
    patterns:
      - pattern: $VAR = "$VALUE"
      - metavariable-regex:
          metavariable: $VALUE
          regex: ^sk_live_
```
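A matching test target, using the `# ruleid:` / `# ok:` annotations described under Testing Rules, might look like this sketch. All values are fake placeholders and the variable names are hypothetical.

```python
# Hypothetical test target for the hardcoded-secret rule (all values are fake).
import os

# ruleid: hardcoded-secret
PAYMENT_KEY = "sk_live_EXAMPLE_NOT_A_REAL_KEY"

# ok: hardcoded-secret
TEST_KEY = "sk_test_EXAMPLE"  # test-mode prefix, so ^sk_live_ does not match

# ok: hardcoded-secret
RUNTIME_KEY = os.environ.get("PAYMENT_KEY", "")  # not a string literal, so $VAR = "$VALUE" does not match
```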
Rule ID Uniqueness
Important: Rule IDs must be unique across all rules in a configuration. If multiple rules share the same ID, only one of them is used, because rules are deduplicated by ID when they are loaded.
This is particularly important when writing rules for multiple languages. You cannot reuse the same rule ID even if the rules target different languages:
```yaml
# WRONG - both rules have id: taint, so only one will be active
rules:
  - id: taint
    languages: [python]
    pattern: dangerous_call(...)
    # ...
  - id: taint
    languages: [rust]
    pattern: unsafe_fn(...)
    # ...
```
```yaml
# CORRECT - unique IDs for each rule
rules:
  - id: taint-python-dangerous-call
    languages: [python]
    pattern: dangerous_call(...)
    # ...
  - id: taint-rust-unsafe-fn
    languages: [rust]
    pattern: unsafe_fn(...)
    # ...
```
Use descriptive IDs that include the language or context to avoid collisions.
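Because a duplicate ID causes one rule to be dropped at load time, it can be worth checking for collisions before running a scan. The hypothetical helper below works on an already-parsed rule file (a dict shaped like `{"rules": [...]}`); producing that dict from YAML needs a parser such as PyYAML, which is assumed and not shown.

```python
from collections import Counter


def duplicate_rule_ids(config: dict) -> list[str]:
    """Return rule IDs that occur more than once in a parsed rule file."""
    ids = [rule.get("id") for rule in config.get("rules", [])]
    counts = Counter(rule_id for rule_id in ids if rule_id is not None)
    return sorted(rule_id for rule_id, n in counts.items() if n > 1)


# The WRONG and CORRECT examples above as parsed dicts (trimmed to the fields shown).
wrong = {
    "rules": [
        {"id": "taint", "languages": ["python"], "pattern": "dangerous_call(...)"},
        {"id": "taint", "languages": ["rust"], "pattern": "unsafe_fn(...)"},
    ]
}
correct = {
    "rules": [
        {"id": "taint-python-dangerous-call", "languages": ["python"], "pattern": "dangerous_call(...)"},
        {"id": "taint-rust-unsafe-fn", "languages": ["rust"], "pattern": "unsafe_fn(...)"},
    ]
}
```

Running a check like this in CI next to `opengrep validate` flags collisions that would otherwise cause a rule to be dropped.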
Pattern Syntax
| Syntax | Meaning |
|---|---|
| `...` | Match zero or more arguments/statements |
| `$X` | Metavariable (captures any expression) |
| `$...ARGS` | Ellipsis metavariable (captures zero or more) |
| `<... $X ...>` | Deep expression match (nested) |
Pattern Operators
| Operator | Purpose |
|---|---|
| `pattern` | Match a single code pattern |
| `patterns` | All must match (AND) |
| `pattern-either` | Any can match (OR) |
| `pattern-not` | Exclude matches |
| `pattern-inside` | Must be inside context |
| `pattern-not-inside` | Must not be inside context |
| `pattern-regex` | Regex matching |
| `metavariable-regex` | Filter captured values by regex |
| `metavariable-comparison` | Compare captured values |
Combining Patterns
```yaml
rules:
  - id: dangerous-deserialization
    languages: [python]
    message: "Unsafe pickle load on potentially untrusted data"
    severity: ERROR
    patterns:
      - pattern-either:
          - pattern: pickle.load(...)
          - pattern: pickle.loads(...)
      - pattern-not-inside: |
          def $FUNC(...):
              ...
```
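Note that, as written, `pattern-not-inside` with `def $FUNC(...): ...` excludes every call inside any function body, so the rule only reports module-level calls. A hypothetical test target makes that visible (the payload is built locally, so the code itself is safe to run):

```python
# Hypothetical test target for dangerous-deserialization.
import io
import pickle

PAYLOAD = pickle.dumps({"user": "alice"})

# ruleid: dangerous-deserialization
SETTINGS = pickle.loads(PAYLOAD)  # module level, not inside any def: reported


def load_settings(data: bytes):
    # ok: dangerous-deserialization
    return pickle.load(io.BytesIO(data))  # inside a def: excluded by pattern-not-inside
```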
Taint Mode
For tracking data flow from sources to sinks:
```yaml
rules:
  - id: sql-injection
    languages: [python]
    message: "User input flows to SQL query without parameterization"
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
    pattern-sinks:
      - patterns:
          - pattern: cursor.execute($QUERY, ...)
          - focus-metavariable: $QUERY
    pattern-sanitizers:
      - pattern: int(...)
      - pattern: escape(...)
```
Key taint concepts:
- Sources: where untrusted data enters
- Sinks: where data becomes dangerous
- Sanitizers: what makes data safe
YAML Pitfalls
Patterns are YAML string values. Any pattern containing `: ` (a colon followed by a space) is misinterpreted as a YAML mapping and causes a validation error. Always quote such patterns:
```yaml
# BROKEN -- YAML parser sees "shell: true" as a nested mapping
- pattern: spawn($CMD, { ..., shell: true, ... })

# CORRECT -- quoted string, parsed as a single pattern value
- pattern: "spawn($CMD, { ..., shell: true, ... })"
```
Common triggers: `shell: true`, `mode: 0o777`, `redirect: "follow"`, `error: $ERR`. When in doubt, quote the pattern.
For patterns containing both double quotes and colons, escape the inner quotes:
```yaml
- pattern: "fetch($URL, { ..., redirect: \"follow\", ... })"
```
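A rough pre-check can catch most unquoted colon-space patterns before `opengrep validate` does. The helper below is a hypothetical line-based heuristic, not a YAML parser: it only inspects keys starting with `pattern` that have an inline value, skips quoted and block (`|`, `>`) scalars, and can misfire on trailing comments that contain `: `.

```python
import re

# Hypothetical heuristic: a "pattern..." key (pattern, pattern-not, pattern-inside, ...)
# followed by an inline value on the same line.
PATTERN_LINE = re.compile(r"^\s*-?\s*pattern[\w-]*:\s+(?P<value>.+)$")


def unquoted_colon_patterns(rule_text: str) -> list[int]:
    """Return 1-based line numbers whose inline pattern value contains an unquoted ': '."""
    flagged = []
    for lineno, line in enumerate(rule_text.splitlines(), start=1):
        match = PATTERN_LINE.match(line)
        if not match:
            continue
        value = match.group("value").strip()
        if not value or value[0] in "\"'|>":
            continue  # quoted or block scalar: YAML keeps the whole value as one string
        if ": " in value:
            flagged.append(lineno)
    return flagged
```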
Scanning Non-Code Files (generic language)
For XML configs, YAML pipelines, Dockerfiles, and other non-code files, structural patterns may not work. Use `languages: [generic]` with `pattern-regex`:
```yaml
rules:
  - id: cleartext-traffic
    message: "Cleartext traffic is permitted"
    patterns:
      - pattern-regex: 'cleartextTrafficPermitted\s*=\s*"true"'
    languages: [generic]
    severity: WARNING
    paths:
      include:
        - "app/src/"
```
The `paths` key restricts which files a rule applies to:
```yaml
paths:
  include:
    - ".github/workflows/"  # Only scan GHA workflow files
  exclude:
    - "test/"  # Skip test directories
```
Note: `languages: [dockerfile]` exists, but `pattern-not-inside` does not work reliably across multiple Dockerfile directives. Prefer `generic` + regex for Dockerfile security checks that span multiple lines.
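Before wiring a regex into a `generic` rule, it can help to try it on a sample file. The sketch below runs the `cleartext-traffic` regex with Python's `re` module; it assumes this simple expression behaves the same in Opengrep's own regex engine, and the sample XML and the `regex_hits` helper are made up for illustration.

```python
import re

# Same expression as the pattern-regex in the cleartext-traffic rule.
CLEARTEXT = re.compile(r'cleartextTrafficPermitted\s*=\s*"true"')

SAMPLE_XML = """<network-security-config>
  <base-config cleartextTrafficPermitted="true" />
  <domain-config cleartextTrafficPermitted = "false" />
</network-security-config>
"""


def regex_hits(pattern: re.Pattern, text: str) -> list[int]:
    """1-based line numbers where the regex match starts."""
    return [text.count("\n", 0, m.start()) + 1 for m in pattern.finditer(text)]


print(regex_hits(CLEARTEXT, SAMPLE_XML))
```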
metavariable-pattern
Use `metavariable-pattern` to constrain what a metavariable can match. It must be a list item inside `patterns:`, alongside the pattern that captures the metavariable:
```yaml
rules:
  - id: dynamic-property-assignment
    message: "Property assigned with a non-literal key"
    patterns:
      - pattern: $OBJ[$KEY] = $VALUE
      - metavariable-pattern:
          metavariable: $KEY
          patterns:
            - pattern-not: "..."  # Exclude literal string keys
    languages: [typescript, javascript]
    severity: WARNING
```
This is distinct from `metavariable-regex`, which filters by regex rather than by pattern.
Pattern Parsing Limitations
Some language constructs cannot be matched as standalone fragments:
- `catch` blocks: `catch ($ERR) { ... }` alone is invalid; you must include the `try`: `try { ... } catch ($ERR) { ... }`. Even then, complex ellipsis inside the catch body may fail to parse for TypeScript.
  - Workaround: Match the dangerous expression directly instead of wrapping it in a try/catch context.
Rule Options (Opengrep-specific)
```yaml
rules:
  - id: expensive-rule
    options:
      timeout: 10               # Per-rule timeout (requires --allow-rule-timeout-control)
      dynamic_timeout: true     # Scale timeout with file size
      max_match_per_file: 100   # Limit matches per file
    # ... rest of rule
```
Use `--allow-rule-timeout-control` to enable per-rule timeouts.
rules:
- id: expensive-rule
options:
timeout: 10 # Per-rule timeout (requires --allow-rule-timeout-control)
dynamic_timeout: true # Scale timeout with file size
max_match_per_file: 100 # Limit matches per file
# ... rest of rule需使用标志来启用规则级超时设置。
--allow-rule-timeout-controlTesting Rules
测试规则
Test File Annotations
测试文件注释标记
Place `# ruleid: <rule-id>` on the line immediately before each expected finding (for taint rules, before the sink). Use `# ok: <rule-id>` to mark lines that must not be reported:
```python
# test_rule.py
def vulnerable():
    user_id = request.args.get("id")
    # ruleid: sql-injection
    cursor.execute("SELECT * FROM users WHERE id = " + user_id)


def safe():
    user_id = int(request.args.get("id"))  # int(...) is a sanitizer in the rule
    # ok: sql-injection
    cursor.execute("SELECT * FROM users WHERE id = " + str(user_id))
```
Running Tests
```bash
# Test a rule (supports multiple target files)
opengrep test --config rule.yaml test_file.py test_file2.py

# Validate rule syntax (positional argument, not --config)
opengrep validate rule.yaml

# Debug taint flow
opengrep scan --dataflow-traces -f rule.yaml test_file.py
```
Configuration
.semgrepignore
Opengrep uses `.semgrepignore` for compatibility. Set a custom filename with `--semgrepignore-filename`:
```
# Ignore directories
vendor/
node_modules/
**/testdata/

# Ignore patterns
*.min.js
*.generated.go
```
Inline Suppressions
The default annotations `nosemgrep`, `nosem`, and `noopengrep` all work:
```python
password = get_from_vault()  # nosemgrep: hardcoded-password
password = get_from_vault()  # nosem: hardcoded-password
password = get_from_vault()  # noopengrep: hardcoded-password
```
Extend them with additional patterns using `--opengrep-ignore-pattern`:
```bash
# Add nosec as an additional suppression annotation
opengrep scan --opengrep-ignore-pattern='nosec' -f rule.yaml .
```
Rule Metadata
```yaml
rules:
  - id: command-injection
    metadata:
      category: security
      subcategory:
        - vuln  # vuln = confirmed vulnerability, audit = needs review
      cwe:
        - "CWE-78: Improper Neutralization of Special Elements used in an OS Command"
      owasp:
        - "A03:2021 - Injection"
      confidence: HIGH  # HIGH, MEDIUM, or LOW
      references:
        - https://owasp.org/Top10/A03_2021-Injection/
    # ... rest of rule
```
The `category`, `subcategory`, and `confidence` fields are used by tooling to classify and filter findings. Use `subcategory: [vuln]` for high-confidence vulnerabilities and `subcategory: [audit]` for patterns that need manual review.
Use `--inline-metavariables` to include metavariable values in metadata output.
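As an example of the kind of tooling mentioned above, the sketch below keeps only findings whose rule metadata says `subcategory: [vuln]` and `confidence: HIGH`. It assumes the `--json` output keeps Semgrep's layout, where each entry in `results` carries its rule's metadata under `extra.metadata`; check that against your own output before relying on it. `load_report` and `high_confidence_vulns` are hypothetical names.

```python
import json


def load_report(path: str) -> dict:
    """Read the file written by `opengrep scan --json -o results.json`."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def high_confidence_vulns(report: dict) -> list[dict]:
    """Keep results whose rule metadata marks them as high-confidence vulnerabilities."""
    selected = []
    for result in report.get("results", []):
        meta = result.get("extra", {}).get("metadata", {})
        if "vuln" in meta.get("subcategory", []) and meta.get("confidence") == "HIGH":
            selected.append(result)
    return selected
```

Findings tagged `audit` fall through this filter, which matches their intended role as candidates for manual review rather than blocking issues.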
Common Rulesets
| Ruleset | Focus |
|---|---|
| `p/security-audit` | Comprehensive security rules |
| `p/owasp-top-ten` | OWASP Top 10 vulnerabilities |
| `p/cwe-top-25` | CWE Top 25 vulnerabilities |
| `p/python`, `p/javascript`, etc. | Language-specific |
Note: Ruleset availability may differ from the Semgrep registry.
Differences from Semgrep
Opengrep is forked from Semgrep 1.100.0. Key differences:
- Semgrep Pro features, open: intrafile cross-function tainting (`--taint-intrafile`), higher-order function support, and inter-method taint flow, all available without a commercial license
- Additional languages: Visual Basic (not available in Semgrep CE or Pro), Apex, Elixir (not in Semgrep CE), and improved Clojure with taint support
- Windows: native support
- Per-rule timeouts: `timeout` and `dynamic_timeout` rule options
- Match limits: `max_match_per_file` rule option and `--max-match-per-file` CLI flag
- Context output: `--output-enclosing-context` shows function/class context
- Custom ignore patterns: `--opengrep-ignore-pattern` extends the default suppressions

For the full changelog, see: https://github.com/opengrep/opengrep/blob/main/OPENGREP.md
Existing Semgrep rules and documentation generally apply.
Resources
- Opengrep: https://github.com/opengrep/opengrep
- Semgrep rule syntax (compatible): https://semgrep.dev/docs/writing-rules/rule-syntax