cpg-analysis
CPG Analysis Skill
Purpose: Deep code analysis beyond AST. Use Joern for full Code
Property Graph (control flow, data flow, program dependencies) and CodeQL
for interprocedural taint analysis and vulnerability detection.
These are opt-in tools. They require Docker/JVM (Joern) or CodeQL CLI.
Use codebase-memory-mcp (Tier 1, always-on) for everyday navigation.
Use these for deep analysis when Tier 1 is not enough.
┌────────────────────────────────────────────────────────────────┐
│ CODE PROPERTY GRAPH = AST + CFG + CDG + DDG + PDG              │
│ ────────────────────────────────────────────────────────────── │
│ AST = Abstract Syntax Tree (structure)                         │
│ CFG = Control Flow Graph (execution paths)                     │
│ CDG = Control Dependency Graph (conditional dependencies)      │
│ DDG = Data Dependency Graph (data flow between statements)     │
│ PDG = Program Dependency Graph (CDG + DDG combined)            │
│                                                                │
│ Tier 2 (Joern): Full CPG with 40+ query tools                  │
│ Tier 3 (CodeQL): Interprocedural taint + security queries      │
└────────────────────────────────────────────────────────────────┘
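To make the DDG layer concrete, the following minimal Python sketch derives def-use (data dependency) edges for straight-line code with the standard `ast` module. It is a toy illustration, not how Joern builds its graph: the function name `data_dependencies` and the sample snippet are made up for this example, and branches, loops, and aliasing are ignored.

```python
import ast


def data_dependencies(source: str) -> list[tuple[int, int, str]]:
    """Return (def_line, use_line, variable) edges for straight-line code.

    Toy model: each statement depends on the most recent assignment of
    every variable it reads.
    """
    last_def: dict[str, int] = {}
    edges: list[tuple[int, int, str]] = []
    for stmt in ast.parse(source).body:
        reads = sorted(
            {n.id for n in ast.walk(stmt)
             if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}
        )
        for name in reads:
            if name in last_def:
                edges.append((last_def[name], stmt.lineno, name))
        # Record definitions after reads so `x = x + 1` links to the prior x.
        for n in ast.walk(stmt):
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store):
                last_def[n.id] = stmt.lineno
    return edges


SAMPLE = "a = 1\nb = a + 2\nc = a * b\nprint(c)\n"
print(data_dependencies(SAMPLE))
# [(1, 2, 'a'), (1, 3, 'a'), (2, 3, 'b'), (3, 4, 'c')]
```

A real CPG attaches edges like these to AST nodes and combines them with control flow, which is what makes queries such as "which statements can influence this value" possible.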
Tier Selection Guide
Simple symbol lookup, dependency trace, blast radius?
→ Tier 1: codebase-memory-mcp (always on, sub-ms)
Control flow paths, data flow, dead code, complex refactoring?
→ Tier 2: Joern CPG (on-demand, seconds)
Security audit, taint analysis, vulnerability detection?
→ Tier 3: CodeQL (on-demand, seconds to minutes)
Full security review before release?
→ All three tiers in sequence
Tier 2: Joern CPG (CodeBadger MCP)
When to Use Joern
| Scenario | Why Joern | Tier 1 Can't Do This |
|---|---|---|
| Trace data flow through functions | Full DDG traversal | Tier 1 has no data flow |
| Understanding control flow paths | CFG analysis with branch conditions | Tier 1 has no CFG |
| Finding dead/unreachable code | PDG reachability analysis | Tier 1 only detects unused exports |
| Complex refactoring impact | Cross-function dependency chains | Tier 1 limited to call graph |
| Auditing third-party library usage | Deep call chain traversal | Tier 1 stops at import boundary |
| Understanding exception flow | CFG includes throw/catch paths | Tier 1 ignores exceptions |
Key MCP Tools (Joern/CodeBadger)
| Tool | Purpose | Example Query |
|---|---|---|
| | Build CPG for project | First-time setup or after major changes |
| | Check CPG build status | Verify CPG is ready before querying |
| run_cpgql_query | Run arbitrary CPGQL queries | |
| | Query language reference | When unsure about query syntax |
| get_cfg | Control flow graph for a method | Understand execution paths in a function |
| | List all methods in project | Overview of available functions |
| | Get source code of a method | Read specific function source |
| | List calls from/to a method | Caller/callee analysis |
| | Full call graph visualization | Understand call chains |
| | Type/class definitions | Understand type hierarchy |
Supported Languages (Joern)
Java, Scala, C/C++, Python, JavaScript, TypeScript, PHP, Ruby, Go,
Kotlin, Swift, Lua
Not supported: Rust (use CodeQL for Rust)
MCP Configuration (Joern)
```json
{
  "mcpServers": {
    "codebadger": {
      "url": "http://localhost:4242/mcp",
      "type": "http"
    }
  }
}
```
Prerequisites
- Docker (for Joern backend)
- Python 3.10+ (for MCP server)
- Install: ~/.claude/install-graph-tools.sh --joern
Common CPGQL Queries
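```scala
// Find all methods that handle user input (parameter name matches the regex)
cpg.method.where(_.parameter.name(".*input.*|.*request.*")).name.l

// Trace data flow from the parameters (source) to the return value (sink);
// reachableBy is called on the sink and takes the source as its argument
cpg.method("processPayment").methodReturn.reachableBy(cpg.method("processPayment").parameter).l

// Approximate cyclomatic complexity: more than 10 control structures
// (filter takes a Boolean predicate; where expects a traversal)
cpg.method.filter(_.controlStructure.size > 10).name.l

// Dead code candidates: methods with no incoming calls
cpg.method.whereNot(_.callIn).nameNot("main").name.l

// Exception flow: callers of throwing methods that contain no try block
// (assumes the language frontend models throw and try as control structures)
cpg.method.where(_.ast.isControlStructure.isThrow).callIn.method.whereNot(_.ast.isControlStructure.isTry).name.dedup.l
```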
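The "no callers" query flags only methods with zero incoming call edges. A method that is called solely by other dead methods still has a caller, so it does not show up. The sketch below illustrates the difference on a hypothetical call graph stored as a plain Python dict; it does not use Joern, and every function and method name in it is invented for illustration.

```python
from collections import deque


def no_callers(call_graph: dict[str, set[str]], entry: str) -> set[str]:
    """Methods with zero incoming call edges, excluding the entry point."""
    called = set().union(*call_graph.values())
    return {m for m in call_graph if m not in called and m != entry}


def unreachable(call_graph: dict[str, set[str]], entry: str) -> set[str]:
    """Methods that cannot be reached from the entry point at all."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        for callee in call_graph.get(queue.popleft(), set()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return set(call_graph) - seen


GRAPH = {
    "main": {"load", "process"},
    "load": set(),
    "process": {"validate"},
    "validate": set(),
    "legacy_export": {"legacy_format"},  # nothing calls legacy_export
    "legacy_format": set(),              # only called by dead code
}
print(sorted(no_callers(GRAPH, "main")))   # ['legacy_export']
print(sorted(unreachable(GRAPH, "main")))  # ['legacy_export', 'legacy_format']
```

In practice this suggests re-running the dead-code query after each round of removals, since deleting one dead method can expose the next.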
Tier 3: CodeQL
When to Use CodeQL
| Scenario | Why CodeQL | Other Tiers Can't Do This |
|---|---|---|
| Security audit before release | Interprocedural taint analysis | Joern has basic taint tracking; CodeQL goes deeper |
| Reviewing auth/payment code | Data flow from source to sink | Cross-function, cross-file taint |
| PR security review | Targeted vulnerability scan | Pre-built OWASP query packs |
| Compliance checking | CWE/OWASP pattern matching | Curated security query suites |
| Rust security analysis | Full Rust support | Joern doesn't support Rust |
Key MCP Tools (CodeQL)
| Tool | Purpose |
|---|---|
| run_query | Execute a CodeQL query against the database |
| | Locate symbol definitions |
| | Find all references to a symbol |
| | Parse BQRS (Binary Query Result Sets) |
Supported Languages (CodeQL)
C/C++, C#, Go, Java, Kotlin, JavaScript, TypeScript, Python, Ruby,
Swift, Rust
MCP Configuration (CodeQL)
```json
{
  "mcpServers": {
    "codeql": {
      "command": "codeql-mcp",
      "args": ["--database", ".code-graph/codeql-db"]
    }
  }
}
```
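The Joern and CodeQL snippets in this document each define their own `mcpServers` object, so enabling both tiers presumably means combining the entries into one object. The following is a minimal sketch of that merge. Where the merged file lives depends on the MCP client and is not specified here, so the sketch only prints the result; the helper name `merge_mcp_configs` is made up for this example.

```python
import json

# The two server entries exactly as given in this document.
JOERN = {"mcpServers": {"codebadger": {"url": "http://localhost:4242/mcp",
                                       "type": "http"}}}
CODEQL = {"mcpServers": {"codeql": {"command": "codeql-mcp",
                                    "args": ["--database", ".code-graph/codeql-db"]}}}


def merge_mcp_configs(*configs: dict) -> dict:
    """Combine several configs into one, rejecting duplicate server names."""
    merged: dict = {"mcpServers": {}}
    for config in configs:
        for name, server in config.get("mcpServers", {}).items():
            if name in merged["mcpServers"]:
                raise ValueError(f"duplicate MCP server name: {name}")
            merged["mcpServers"][name] = server
    return merged


print(json.dumps(merge_mcp_configs(JOERN, CODEQL), indent=2))
```

Rejecting duplicate names instead of silently overwriting them keeps an existing server entry from being replaced by accident.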
Prerequisites
- CodeQL CLI (on macOS): brew install codeql
- Install: ~/.claude/install-graph-tools.sh --codeql
Common CodeQL Patterns
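```ql
/**
 * @name SQL injection: user input flows to SQL query
 * @description Assumes the module-based data flow API of recent CodeQL Python libraries.
 * @kind path-problem
 */
import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.Concepts

module SqlInjectionConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }
  predicate isSink(DataFlow::Node sink) { sink = any(SqlExecution e).getSql() }
}
module SqlInjectionFlow = TaintTracking::Global<SqlInjectionConfig>;
import SqlInjectionFlow::PathGraph

from SqlInjectionFlow::PathNode source, SqlInjectionFlow::PathNode sink
where SqlInjectionFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "SQL injection from $@.", source.getNode(), "user input"

// Unvalidated redirect: a separate query file with the same structure, using the sink
//   sink = any(Http::Server::HttpRedirectResponse r).getRedirectLocation()
```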
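Conceptually, a taint query asks whether any path leads from a source to a sink without passing through a sanitizer. The sketch below models that question on a hypothetical data-flow graph held in a Python dict. It is not CodeQL and says nothing about how CodeQL computes flow; the node names and the `find_taint_paths` helper are assumptions made for illustration.

```python
def find_taint_paths(flows, sources, sinks, sanitizers=frozenset()):
    """Return every source-to-sink path that avoids sanitizer nodes.

    flows maps a node to the list of nodes its value flows into.
    """
    paths = []

    def walk(node, path):
        if node in sanitizers:
            return  # taint is treated as removed at this node
        path = path + [node]
        if node in sinks:
            paths.append(path)
            return
        for nxt in flows.get(node, ()):
            if nxt not in path:  # do not loop around cycles
                walk(nxt, path)

    for src in sources:
        walk(src, [])
    return paths


# Hypothetical flow graph: user input reaches the database two ways.
FLOWS = {
    "request.args": ["user_id"],
    "user_id": ["raw_query", "int(user_id)"],
    "int(user_id)": ["safe_query"],
    "raw_query": ["cursor.execute"],
    "safe_query": ["cursor.execute"],
}
for path in find_taint_paths(FLOWS, ["request.args"], {"cursor.execute"},
                             sanitizers={"int(user_id)"}):
    print(" -> ".join(path))
# request.args -> user_id -> raw_query -> cursor.execute
```

Only the unsanitized path is reported; the route through `int(user_id)` is cut at the sanitizer, which mirrors why a path-problem query returns the full chain of nodes and not just the sink.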
Combined Workflow: Deep Analysis
When performing security review or complex refactoring, use all tiers:
1. SCOPE → Tier 1: detect_changes / get_architecture
Identify files and modules in scope
2. STRUCTURE → Tier 1: search_graph / trace_call_path
Map the call graph and dependencies
3. FLOW → Tier 2: get_cfg / run_cpgql_query
Analyze control flow and data flow paths
4. SECURITY → Tier 3: run_query with taint analysis
Check for vulnerabilities in data paths
5. REPORT → Combine findings from all tiers
Prioritize: Critical > High > Medium > Low
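The REPORT step can be sketched as a small merge-and-sort over findings collected from the three tiers. The `Finding` record, its fields, and the sample findings below are hypothetical; none of the tools described here is known to emit this exact shape.

```python
from dataclasses import dataclass

# Lower rank sorts first: Critical > High > Medium > Low.
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


@dataclass(frozen=True)
class Finding:
    tier: int       # which tier reported it (1, 2, or 3)
    severity: str   # a key of SEVERITY_RANK
    location: str   # for example "file:line"
    message: str


def prioritize(findings):
    """Drop exact duplicates, then order by severity, location, and tier."""
    return sorted(
        set(findings),
        key=lambda f: (SEVERITY_RANK[f.severity], f.location, f.tier, f.message),
    )


# Hypothetical findings, for illustration only.
FINDINGS = [
    Finding(2, "Medium", "billing/refund.py:40", "unreachable branch after early return"),
    Finding(3, "Critical", "api/orders.py:88", "request parameter reaches SQL execution"),
    Finding(1, "Low", "utils/format.py:12", "unused export"),
    Finding(3, "Critical", "api/orders.py:88", "request parameter reaches SQL execution"),
    Finding(3, "High", "api/login.py:23", "redirect target taken from user input"),
]
for f in prioritize(FINDINGS):
    print(f"[{f.severity}] Tier {f.tier} {f.location}: {f.message}")
```

Keeping the tier on each finding makes it easy to see which analysis produced a result when the report is reviewed.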
Anti-Patterns
| Anti-Pattern | Do This Instead |
|---|---|
| Using Joern/CodeQL for simple symbol lookup | Use Tier 1 |
| Running full CPG build on every commit | Build CPG on-demand; use Tier 1 for continuous monitoring |
| Querying Joern without checking CPG status | Always verify CPG is built and current before querying |
| Running CodeQL without a specific security question | Have a hypothesis first; CodeQL queries are expensive |
| Ignoring Tier 1 blast radius before deep analysis | Always scope with Tier 1 first, then go deep on flagged areas |
| Using CodeQL for non-security structural queries | Use Joern CPGQL for structural/flow queries; CodeQL for security |
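One way to approximate "built and current" locally is to compare file modification times, as in the sketch below. This is a heuristic under stated assumptions, not a replacement for the server's own status check: the `cpg_is_current` helper, the idea that the CPG is a single file on disk, and the default suffix list are all hypothetical.

```python
from pathlib import Path


def cpg_is_current(cpg_path: Path, source_root: Path, suffixes=(".py",)) -> bool:
    """True if the CPG file exists and is at least as new as every source file."""
    if not cpg_path.is_file():
        return False
    cpg_mtime = cpg_path.stat().st_mtime
    return all(
        src.stat().st_mtime <= cpg_mtime
        for src in source_root.rglob("*")
        if src.is_file() and src.suffix in suffixes and src != cpg_path
    )
```

A caller would rebuild the CPG when the function returns False and only then run queries. Modification times can mislead after a fresh checkout or a branch switch, so treat a True result as a hint, not a guarantee.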