Opengrep Static Analysis

Opengrep is a community-maintained, open-source static analysis tool forked from Semgrep. It uses the same rule syntax and command-line interface, so existing Semgrep rules and knowledge carry over.

Two Use Cases

1. Semantic Code Search (grep alternative)

When exploring a codebase, grep finds text patterns but misses structural patterns. Opengrep understands code structure:

Task	Grep	Opengrep
Find text "execute"	Fast, works	Overkill
Find `cursor.execute(...)` calls	May match comments, strings	Matches only actual calls
Find functions that call `os.system`	Difficult	`pattern-inside` + `pattern`
Find unparameterized SQL queries	Nearly impossible	Taint mode

Use Opengrep when:

You need to find function/method calls with specific arguments
Grep returns too many false positives (matches in comments, strings, similar names)
You need to find patterns inside specific contexts (inside loops, inside try blocks)
The pattern has structural meaning, not just text

Stick with grep when:

Simple text search
Speed is critical
The pattern is a literal string or simple regex
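
To make the difference concrete, here is a minimal sketch that uses Python's standard `ast` module as a stand-in for structural matching. It is not how Opengrep is implemented; the sample source and the helper names `text_matches` and `call_matches` are made up for illustration.

```python
import ast
import re

# Hypothetical sample: a comment, a string, and one real call.
SOURCE = '''\
# cursor.execute("not a call, just a comment")
note = "cursor.execute(sql) appears in a string"
cursor.execute("SELECT 1")
'''


def text_matches(source: str) -> list[int]:
    """Line numbers where the text 'cursor.execute(' appears (grep-style)."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if re.search(r"cursor\.execute\(", line)
    ]


def call_matches(source: str) -> list[int]:
    """Line numbers of real `cursor.execute(...)` call nodes in the syntax tree."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "cursor"
        ):
            hits.append(node.lineno)
    return sorted(hits)


print(text_matches(SOURCE))  # [1, 2, 3]
print(call_matches(SOURCE))  # [3]
```

The text search reports the comment and the string as well as the call, while the tree walk reports only the call. That gap is what a pattern such as `cursor.execute(...)` closes.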

2. Security Scanning

Run rulesets to detect vulnerabilities and insecure patterns.

Installation

Linux / macOS

bash

curl -fsSL https://raw.githubusercontent.com/opengrep/opengrep/main/install.sh | bash

Windows (PowerShell)

powershell

irm https://raw.githubusercontent.com/opengrep/opengrep/main/install.ps1 | iex

Manual Install

Download binaries from the releases page.

Verify

bash

opengrep --version

Self-contained binaries are available for macOS, Linux, and Windows. No Python required.

Run `opengrep scan --help` to discover all available options and flags.

Code Search Patterns

Quick One-Liners

bash

# Find all calls to a function
opengrep scan -e 'dangerous_function(...)' -l python .

# Find method calls on specific objects
opengrep scan -e '$OBJ.execute(...)' -l python .

# Find assignments to a variable name
opengrep scan -e '$VAR = os.environ.get(...)' -l python .

# Find function definitions
opengrep scan -e 'def $FUNC(...): ...' -l python .

Performance Note

Opengrep parses the entire file into an AST. For a quick text search, grep is 10-100x faster. Use Opengrep when the structural match is worth the overhead.

Security Scanning

Quick Scan

bash

# Scan with a ruleset
opengrep scan --config p/security-audit .

# Multiple rulesets
opengrep scan --config p/security-audit --config p/owasp-top-ten .

# Scan specific paths
opengrep scan --config p/python src/

Output Formats

bash

# SARIF for tooling
opengrep scan --config p/default --sarif -o results.sarif .

# JSON for automation
opengrep scan --config p/default --json -o results.json .

# Show data flow traces
opengrep scan --dataflow-traces -f rule.yaml .

# Include enclosing context (function/class) in JSON output (experimental)
opengrep scan --output-enclosing-context --json -f rule.yaml . --experimental
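
A small sketch of consuming the JSON report in automation. It assumes the Semgrep-compatible layout (a top-level `results` list whose entries carry `check_id`, `path`, and `extra.severity`); check the fields against a real `results.json` before relying on them. The `summarize` helper and the sample data are hypothetical.

```python
import json
from collections import Counter


def summarize(report: dict) -> Counter:
    """Count findings per (severity, rule id) pair."""
    counts = Counter()
    for finding in report.get("results", []):
        severity = finding.get("extra", {}).get("severity", "UNKNOWN")
        counts[(severity, finding.get("check_id", "?"))] += 1
    return counts


# Hypothetical sample shaped like the assumed output. A real script would
# load the file written by `--json -o results.json` instead.
SAMPLE = json.loads("""
{"results": [
  {"check_id": "sql-injection", "path": "app.py",
   "start": {"line": 12}, "extra": {"severity": "ERROR"}},
  {"check_id": "sql-injection", "path": "db.py",
   "start": {"line": 40}, "extra": {"severity": "ERROR"}},
  {"check_id": "hardcoded-secret", "path": "cfg.py",
   "start": {"line": 3}, "extra": {"severity": "WARNING"}}
]}
""")

for (severity, rule_id), n in sorted(summarize(SAMPLE).items()):
    print(f"{severity:8} {rule_id}: {n}")
```

Using `.get` with defaults keeps the script from crashing if a field is absent, which is a reasonable precaution while the exact schema is still an assumption.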

Filtering

bash

# By severity
opengrep scan --config p/default --severity ERROR .

# By path
opengrep scan --config p/default --include='src/**' --exclude='**/test/**' .

# Apply exclusions to explicitly passed file targets (not just directory scan roots)
opengrep scan --force-exclude --exclude='**/vendor/**' -f rule.yaml vendor/lib.py

Intrafile Cross-Function Tainting

Opengrep supports tracking taint across functions within a file:

bash

opengrep scan --taint-intrafile -f taint-rule.yaml .

This enables higher-order function support and is similar to Semgrep Pro's `--pro-intrafile`.

Writing Custom Rules

Basic Rule Structure

yaml

rules:
  - id: hardcoded-secret
    languages: [python]
    message: "Hardcoded secret detected in $VAR"
    severity: ERROR
    patterns:
      - pattern: $VAR = "$VALUE"
      - metavariable-regex:
          metavariable: $VALUE
          regex: ^sk_live_

Rule ID Uniqueness

Important: Rule IDs must be unique across all rules in a configuration. If multiple rules share the same ID, only one will be used due to deduplication during rule loading.

This is particularly important when writing rules for multiple languages. You cannot reuse the same rule ID even if the rules target different languages:

yaml

# WRONG - both rules have id: taint, only one will be active
rules:
  - id: taint
    languages: [python]
    pattern: dangerous_call(...)
    # ...
  - id: taint
    languages: [rust]
    pattern: unsafe_fn(...)
    # ...

yaml

# CORRECT - unique IDs for each rule
rules:
  - id: taint-python-dangerous-call
    languages: [python]
    pattern: dangerous_call(...)
    # ...
  - id: taint-rust-unsafe-fn
    languages: [rust]
    pattern: unsafe_fn(...)
    # ...

Use descriptive IDs that include the language or context to avoid collisions.
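
Because duplicates are dropped during rule loading rather than reported, a pre-flight check can help. The sketch below assumes the rules have already been loaded into Python dictionaries (for example with a YAML parser); `duplicate_rule_ids` is a hypothetical helper, not part of Opengrep.

```python
from collections import Counter


def duplicate_rule_ids(rules: list[dict]) -> list[str]:
    """Return rule IDs that occur more than once, in sorted order."""
    counts = Counter(rule["id"] for rule in rules)
    return sorted(rule_id for rule_id, n in counts.items() if n > 1)


# Rules as they might look after loading a YAML config into dictionaries.
wrong = [
    {"id": "taint", "languages": ["python"], "pattern": "dangerous_call(...)"},
    {"id": "taint", "languages": ["rust"], "pattern": "unsafe_fn(...)"},
]
correct = [
    {"id": "taint-python-dangerous-call", "languages": ["python"]},
    {"id": "taint-rust-unsafe-fn", "languages": ["rust"]},
]

print(duplicate_rule_ids(wrong))    # ['taint']
print(duplicate_rule_ids(correct))  # []
```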

Pattern Syntax

Syntax	Meaning
`...`	Match zero or more arguments/statements
`$VAR`	Metavariable (captures any expression)
`$...ARGS`	Ellipsis metavariable (captures zero or more)
`<... $X ...>`	Deep expression match (nested)

Pattern Operators

Operator	Purpose
`pattern`	Match exact pattern
`patterns`	All must match (AND)
`pattern-either`	Any can match (OR)
`pattern-not`	Exclude matches
`pattern-inside`	Must be inside context
`pattern-not-inside`	Must not be inside context
`pattern-regex`	Regex matching
`metavariable-regex`	Filter captured values by regex
`metavariable-comparison`	Compare captured values

Combining Patterns

yaml

rules:
  - id: dangerous-deserialization
    languages: [python]
    message: "Unsafe pickle load on potentially untrusted data"
    severity: ERROR
    patterns:
      - pattern-either:
          - pattern: pickle.load(...)
          - pattern: pickle.loads(...)
      - pattern-not-inside: |
          def $FUNC(...):
            ...

Taint Mode

For tracking data flow from sources to sinks:

yaml

rules:
  - id: sql-injection
    languages: [python]
    message: "User input flows to SQL query without parameterization"
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
    pattern-sinks:
      - pattern: cursor.execute($QUERY, ...)
        focus-metavariable: $QUERY
    pattern-sanitizers:
      - pattern: int(...)
      - pattern: escape(...)

Key taint concepts:

Sources: Where untrusted data enters
Sinks: Where data becomes dangerous
Sanitizers: What makes data safe
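
The three concepts can be illustrated with a toy propagation model. This is a deliberately simplified sketch over straight-line code, not Opengrep's taint engine; the tuple representation and the `concat` function name are assumptions chosen for the example.

```python
SOURCES = {"request.args.get"}
SANITIZERS = {"int", "escape"}
SINKS = {"cursor.execute"}


def find_tainted_sinks(program):
    """Return indices of sink calls that receive a tainted variable.

    `program` is a list of (target, function, argument_names) tuples that
    stands in for statements of the form `target = function(*arguments)`.
    """
    tainted = set()
    findings = []
    for index, (target, func, args) in enumerate(program):
        arg_tainted = any(arg in tainted for arg in args)
        if func in SINKS and arg_tainted:
            findings.append(index)
        if func in SOURCES:
            result_tainted = True          # untrusted data enters here
        elif func in SANITIZERS:
            result_tainted = False         # sanitizer output is treated as safe
        else:
            result_tainted = arg_tainted   # other calls pass taint through
        if target is not None:
            if result_tainted:
                tainted.add(target)
            else:
                tainted.discard(target)
    return findings


vulnerable = [
    ("user_id", "request.args.get", []),
    ("query", "concat", ["user_id"]),
    (None, "cursor.execute", ["query"]),
]
safe = [
    ("raw", "request.args.get", []),
    ("user_id", "int", ["raw"]),
    ("query", "concat", ["user_id"]),
    (None, "cursor.execute", ["query"]),
]

print(find_tainted_sinks(vulnerable))  # [2]
print(find_tainted_sinks(safe))        # []
```

In the second program the value passes through `int` before reaching the query, so the sink is not reported. This mirrors the role of `pattern-sanitizers` in the rule.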

YAML Pitfalls

Patterns are YAML string values. Any pattern containing `: ` (colon-space) will be misinterpreted as a YAML mapping and cause a validation error. Always quote such patterns:

yaml

# BROKEN -- YAML parser sees "shell: true" as a nested mapping
- pattern: spawn($CMD, { ..., shell: true, ... })

# CORRECT -- quoted string, parsed as a single pattern value
- pattern: "spawn($CMD, { ..., shell: true, ... })"

Common triggers: `shell: true`, `mode: 0o777`, `redirect: "follow"`, `error: $ERR`. When in doubt, quote the pattern.

For patterns containing both double quotes and colons, use escaped inner quotes:

yaml

- pattern: "fetch($URL, { ..., redirect: \"follow\", ... })"
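
One way to avoid the pitfall when generating rules from a script is to quote programmatically. The sketch below relies on the fact that a JSON string is also a valid YAML double-quoted scalar; `needs_quoting` and `yaml_pattern_line` are hypothetical helpers and cover only the colon-space, trailing-colon, and ` #` cases, not every YAML special character.

```python
import json


def needs_quoting(pattern: str) -> bool:
    """True if a plain (unquoted) YAML scalar would likely be misparsed."""
    return ": " in pattern or pattern.endswith(":") or " #" in pattern


def yaml_pattern_line(pattern: str) -> str:
    """Render a `- pattern:` list item, double-quoting the value when needed."""
    if needs_quoting(pattern):
        # json.dumps escapes inner double quotes and backslashes.
        value = json.dumps(pattern, ensure_ascii=False)
    else:
        value = pattern
    return f"- pattern: {value}"


print(yaml_pattern_line("pickle.load(...)"))
print(yaml_pattern_line("spawn($CMD, { ..., shell: true, ... })"))
print(yaml_pattern_line('fetch($URL, { ..., redirect: "follow", ... })'))
```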

Scanning Non-Code Files (generic language)

For XML configs, YAML pipelines, Dockerfiles, and other non-code files, structural patterns may not work. Use `languages: [generic]` with `pattern-regex`:

yaml

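rules:
  - id: cleartext-traffic
    message: "Cleartext traffic is explicitly permitted"   # illustrative wording; a message is required for the rule to validate
    patterns:
      - pattern-regex: 'cleartextTrafficPermitted\s*=\s*"true"'
    languages: [generic]
    severity: WARNING
    paths:
      include:
        - "app/src/"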

The `paths` key restricts which files a rule applies to:

yaml

paths:
  include:
    - ".github/workflows/"    # Only scan GHA workflow files
  exclude:
    - "test/"                 # Skip test directories

Note: `languages: [dockerfile]` exists, but `pattern-not-inside` does not work reliably across multiple Dockerfile directives. Prefer `generic` + regex for Dockerfile security checks that span multiple lines.
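
Before putting a regex into a `generic` rule, it can be useful to try it on a sample file. The sketch below runs the `cleartextTrafficPermitted` expression from the rule above with Python's `re` module, assuming that Python and Opengrep's regex engine agree for this simple pattern; the config snippet is invented for the example.

```python
import re

# Same expression as the rule's pattern-regex, written as a Python raw string.
CLEARTEXT = re.compile(r'cleartextTrafficPermitted\s*=\s*"true"')

# Hypothetical Android network security config used as sample input.
SAMPLE_CONFIG = '''\
<network-security-config>
    <base-config cleartextTrafficPermitted = "true" />
    <domain-config cleartextTrafficPermitted="false" />
</network-security-config>
'''


def matching_lines(text: str) -> list[int]:
    """Return 1-based line numbers where the regex matches."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if CLEARTEXT.search(line)
    ]


print(matching_lines(SAMPLE_CONFIG))  # [2]
```

The `\s*` on both sides of `=` is what lets the rule tolerate the extra spaces on line 2.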

metavariable-pattern

Use `metavariable-pattern` to constrain what a metavariable can match. It must be a list item inside `patterns:`, alongside the pattern that captures the metavariable:

yaml

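rules:
  - id: dynamic-property-assignment
    message: "Property assigned through a non-literal key"   # illustrative wording; a message is required for the rule to validate
    patterns:
      - pattern: $OBJ[$KEY] = $VALUE
      - metavariable-pattern:
          metavariable: $KEY
          patterns:
            # Exclude literal string keys. The outer single quotes keep the
            # inner double quotes in the pattern; a bare "..." would be read
            # by YAML as the ellipsis pattern and exclude every match.
            - pattern-not: '"..."'
    languages: [typescript, javascript]
    severity: WARNING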

This is distinct from `metavariable-regex`, which filters by regex rather than by pattern.

Pattern Parsing Limitations

Some language constructs cannot be matched as standalone fragments:

`catch` blocks: `catch ($ERR) { ... }` alone is invalid. You must include the `try`: `try { ... } catch ($ERR) { ... }`. Even then, complex ellipsis inside the catch body may fail to parse for TypeScript.
Workaround: Match the dangerous expression directly instead of wrapping in try/catch context.

Rule Options (Opengrep-specific)

yaml

rules:
  - id: expensive-rule
    options:
      timeout: 10              # Per-rule timeout (requires --allow-rule-timeout-control)
      dynamic_timeout: true    # Scale timeout with file size
      max_match_per_file: 100  # Limit matches per file
    # ... rest of rule

Use `--allow-rule-timeout-control` to enable per-rule timeouts.

Testing Rules

Test File Annotations

Place `# ruleid:` on the line immediately before the expected finding (for taint rules, before the sink):

python

# test_rule.py

def vulnerable():
    user_id = request.args.get("id")
    # ruleid: sql-injection
    cursor.execute("SELECT * FROM users WHERE id = " + user_id)

def safe():
    user_id = int(request.args.get("id"))
    # ok: sql-injection
    cursor.execute("SELECT * FROM users WHERE id = " + user_id)
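
The annotation convention can be sketched as a small parser that reports which line each annotation refers to. This is an approximation for illustration, not the logic `opengrep test` uses; `expected_findings` and its regex are assumptions.

```python
import re

# Matches "# ruleid: some-rule" and "# ok: some-rule" comments.
ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.-]+)")


def expected_findings(source: str) -> dict[str, list[tuple[int, str]]]:
    """Map each annotation kind to (line number of the annotated code, rule id)."""
    expected = {"ruleid": [], "ok": []}
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = ANNOTATION.search(line)
        if match:
            kind, rule_id = match.groups()
            # The annotation applies to the line immediately after the comment.
            expected[kind].append((lineno + 1, rule_id))
    return expected


TEST_FILE = '''\
def vulnerable():
    user_id = request.args.get("id")
    # ruleid: sql-injection
    cursor.execute("SELECT * FROM users WHERE id = " + user_id)

def safe():
    user_id = int(request.args.get("id"))
    # ok: sql-injection
    cursor.execute("SELECT * FROM users WHERE id = " + user_id)
'''

print(expected_findings(TEST_FILE))
```

For the sample file, the `ruleid` annotation points at line 4 and the `ok` annotation at line 9, both of which are the `cursor.execute` sink lines.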

Running Tests

bash

# Test a rule (supports multiple target files)
opengrep test --config rule.yaml test_file.py test_file2.py

# Validate rule syntax (positional argument, not --config)
opengrep validate rule.yaml

# Debug taint flow
opengrep scan --dataflow-traces -f rule.yaml test_file.py

Configuration

.semgrepignore

Opengrep uses `.semgrepignore` for compatibility; a custom filename can be set via `--semgrepignore-filename`. Example ignore file:

# Ignore directories
vendor/
node_modules/
**/testdata/

# Ignore patterns
*.min.js
*.generated.go

Inline Suppressions

Default annotations: `nosemgrep`, `nosem`, `noopengrep` (all work):

python

password = get_from_vault()  # nosemgrep: hardcoded-password
password = get_from_vault()  # nosem: hardcoded-password
password = get_from_vault()  # noopengrep: hardcoded-password

Extend with additional patterns using `--opengrep-ignore-pattern`:

bash

# Add nosec as an additional suppression annotation
opengrep scan --opengrep-ignore-pattern='nosec' -f rule.yaml .

Rule Metadata

yaml

rules:
  - id: command-injection
    metadata:
      category: security
      subcategory:
        - vuln              # vuln = confirmed vulnerability, audit = needs review
      cwe:
        - "CWE-78: Improper Neutralization of Special Elements used in an OS Command"
      owasp:
        - "A03:2021 - Injection"
      confidence: HIGH      # HIGH, MEDIUM, or LOW
      references:
        - https://owasp.org/Top10/A03_2021-Injection/
    # ... rest of rule

The `category`, `subcategory`, and `confidence` fields are used by tooling to classify and filter findings. Use `subcategory: [vuln]` for high-confidence vulnerabilities and `subcategory: [audit]` for patterns that need manual review.

Use `--inline-metavariables` to include metavariable values in metadata output.
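
As a sketch of how tooling might use these fields, the snippet below filters findings by `subcategory`. It assumes the JSON output echoes rule metadata under `extra.metadata`, as in Semgrep-compatible reports; the helper name and sample data are hypothetical.

```python
def findings_with_subcategory(results: list[dict], subcategory: str) -> list[dict]:
    """Keep findings whose rule metadata lists the given subcategory."""
    kept = []
    for finding in results:
        metadata = finding.get("extra", {}).get("metadata", {})
        if subcategory in metadata.get("subcategory", []):
            kept.append(finding)
    return kept


# Hypothetical findings shaped like the assumed JSON output.
RESULTS = [
    {"check_id": "command-injection",
     "extra": {"metadata": {"subcategory": ["vuln"], "confidence": "HIGH"}}},
    {"check_id": "subprocess-usage",
     "extra": {"metadata": {"subcategory": ["audit"], "confidence": "LOW"}}},
    {"check_id": "no-metadata-rule", "extra": {}},
]

vulns = findings_with_subcategory(RESULTS, "vuln")
print([f["check_id"] for f in vulns])  # ['command-injection']
```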

Common Rulesets

Ruleset	Focus
`p/security-audit`	Comprehensive security rules
`p/owasp-top-ten`	OWASP Top 10 vulnerabilities
`p/cwe-top-25`	CWE Top 25 vulnerabilities
`p/python` / `p/javascript` / `p/golang`	Language-specific

Note: Ruleset availability may differ from Semgrep registry.

Differences from Semgrep

Opengrep is forked from Semgrep 1.100.0. Key differences:

Semgrep Pro features, open: Intrafile cross-function tainting (`--taint-intrafile`), higher-order function support, and inter-method taint flow, all available without a commercial license
Additional languages: Visual Basic (not available in Semgrep CE or Pro), Apex, Elixir (not in Semgrep CE), improved Clojure with taint support
Windows: Native support
Per-rule timeouts: `timeout` and `dynamic_timeout` rule options
Match limits: `max_match_per_file` rule option and `--max-match-per-file` CLI flag
Context output: `--output-enclosing-context` shows function/class context
Custom ignore patterns: `--opengrep-ignore-pattern` extends default suppressions

For full changelog, see: https://github.com/opengrep/opengrep/blob/main/OPENGREP.md

Existing Semgrep rules and documentation generally apply.

Resources

Opengrep: https://github.com/opengrep/opengrep
Semgrep rule syntax (compatible): https://semgrep.dev/docs/writing-rules/rule-syntax
