cpg-analysis

CPG Analysis Skill

Purpose: Deep code analysis beyond AST. Use Joern for full Code Property Graph (control flow, data flow, program dependencies) and CodeQL for interprocedural taint analysis and vulnerability detection.

These are opt-in tools. Joern requires Docker/JVM and CodeQL requires the CodeQL CLI. Use codebase-memory-mcp (Tier 1, always on) for everyday navigation, and reach for these tools only when Tier 1 is not enough.

┌────────────────────────────────────────────────────────────────┐
│  CODE PROPERTY GRAPH = AST + CFG + PDG (CDG + DDG)             │
│  ─────────────────────────────────────────────────────────────│
│  AST  = Abstract Syntax Tree (structure)                       │
│  CFG  = Control Flow Graph (execution paths)                   │
│  CDG  = Control Dependency Graph (conditional dependencies)    │
│  DDG  = Data Dependency Graph (data flow between statements)   │
│  PDG  = Program Dependency Graph (CDG + DDG combined)          │
│                                                                │
│  Tier 2 (Joern): Full CPG with 40+ query tools                │
│  Tier 3 (CodeQL): Interprocedural taint + security queries     │
└────────────────────────────────────────────────────────────────┘
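To make the layering concrete, here is a minimal sketch in Python that merges hand-written edge sets for a four-statement toy program into one labeled graph. The toy program, the edge sets, and the `build_cpg` helper are illustrative assumptions; Joern's real schema has many more node and edge types.

```python
from collections import defaultdict

# Toy program (hypothetical), with statement numbers used as node ids:
#   1: x = read()
#   2: if x > 0:
#   3:     y = x + 1
#   4: print("done")
AST = {("fn", 1), ("fn", 2), (2, 3), ("fn", 4)}  # syntactic nesting
CFG = {(1, 2), (2, 3), (2, 4), (3, 4)}            # possible execution order
CDG = {(2, 3)}                                    # 3 runs only if branch 2 is taken
DDG = {(1, 2), (1, 3)}                            # x is defined at 1, used at 2 and 3


def build_cpg(**layers):
    """Merge edge sets into one graph whose edges carry their layer labels."""
    cpg = defaultdict(set)
    for label, edges in layers.items():
        for edge in edges:
            cpg[edge].add(label)
    return dict(cpg)


PDG = CDG | DDG  # program dependence = control dependence + data dependence
cpg = build_cpg(AST=AST, CFG=CFG, PDG=PDG)

# The same pair of nodes can be connected in several layers at once.
print(sorted(cpg[(2, 3)]))  # ['AST', 'CFG', 'PDG']
```

Keeping every layer on one node set is what lets a single query mix syntax, execution order, and dependencies.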

Tier Selection Guide

Simple symbol lookup, dependency trace, blast radius?
  → Tier 1: codebase-memory-mcp (always on, sub-ms)

Control flow paths, data flow, dead code, complex refactoring?
  → Tier 2: Joern CPG (on-demand, seconds)

Security audit, taint analysis, vulnerability detection?
  → Tier 3: CodeQL (on-demand, seconds to minutes)

Full security review before release?
  → All three tiers in sequence
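The guide above can be read as an ordered rule list. The sketch below encodes it as a keyword lookup in Python; the keyword lists and the `select_tiers` function are hypothetical and only illustrate the decision order, with the cheap always-on tier as the fallback.

```python
# Rules are checked top to bottom; the first match wins.
# The keywords are illustrative assumptions, not part of any tool.
TIER_RULES = [
    (("full security review",), ["Tier 1", "Tier 2", "Tier 3"]),
    (("security audit", "taint", "vulnerability"), ["Tier 3"]),
    (("control flow", "data flow", "dead code", "refactor"), ["Tier 2"]),
    (("symbol", "dependency", "blast radius"), ["Tier 1"]),
]


def select_tiers(task: str) -> list[str]:
    """Return the tiers to use for a free-text task description."""
    text = task.lower()
    for keywords, tiers in TIER_RULES:
        if any(keyword in text for keyword in keywords):
            return tiers
    return ["Tier 1"]  # default to the always-on tier


print(select_tiers("Find dead code in the billing module"))
print(select_tiers("Full security review before release"))
```

The most expensive rule is listed first so that a release review is not misrouted to a single tier by a narrower keyword.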

Tier 2: Joern CPG (CodeBadger MCP)

When to Use Joern

Scenario	Why Joern	Tier 1 Can't Do This
Tracing data flow through functions	Full DDG traversal	Tier 1 has no data flow
Understanding control flow paths	CFG analysis with branch conditions	Tier 1 has no CFG
Finding dead/unreachable code	PDG reachability analysis	Tier 1 only detects unused exports
Complex refactoring impact	Cross-function dependency chains	Tier 1 limited to call graph
Auditing third-party library usage	Deep call chain traversal	Tier 1 stops at import boundary
Understanding exception flow	CFG includes throw/catch paths	Tier 1 ignores exceptions

Key MCP Tools (Joern/CodeBadger)

Tool	Purpose	When to Use / Example Query
`generate_cpg`	Build CPG for project	First-time setup or after major changes
`get_cpg_status`	Check CPG build status	Verify CPG is ready before querying
`run_cpgql_query`	Run arbitrary CPGQL queries	`cpg.method("login").callOut.code.l`
`get_cpgql_syntax_help`	Query language reference	When unsure about query syntax
`get_cfg`	Control flow graph for a method	Understand execution paths in a function
`list_methods`	List all methods in project	Overview of available functions
`get_method_source`	Get source code of a method	Read specific function source
`list_calls`	List calls from/to a method	Caller/callee analysis
`get_call_graph`	Full call graph visualization	Understand call chains
`get_type_definition`	Type/class definitions	Understand type hierarchy
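A typical sequence with these tools is: check status, build if needed, wait, then query. The Python sketch below shows that order. `call_tool` stands in for whatever MCP client is in use, and the argument names (`project`, `query`) and the `"status": "ready"` response field are assumptions, not the documented CodeBadger schema.

```python
import time
from typing import Any, Callable

# Hypothetical client signature: call_tool(tool_name, arguments) -> response dict
ToolCall = Callable[[str, dict], dict]


def query_when_ready(call_tool: ToolCall, project: str, query: str,
                     poll_seconds: float = 2.0, max_polls: int = 30) -> Any:
    """Build the CPG if needed, wait until it is ready, then run a CPGQL query."""
    status = call_tool("get_cpg_status", {"project": project})
    if status.get("status") != "ready":
        # No usable CPG yet: trigger a build before polling.
        call_tool("generate_cpg", {"project": project})
    for _ in range(max_polls):
        status = call_tool("get_cpg_status", {"project": project})
        if status.get("status") == "ready":
            return call_tool("run_cpgql_query", {"project": project, "query": query})
        time.sleep(poll_seconds)
    raise TimeoutError(f"CPG for {project!r} not ready after {max_polls} polls")
```

Passing the client in as a function keeps the sketch independent of any particular MCP library and makes the polling logic easy to test with a fake.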

Supported Languages (Joern)

Java, Scala, C/C++, Python, JavaScript, TypeScript, PHP, Ruby, Go, Kotlin, Swift, Lua

Not supported: Rust (use CodeQL for Rust)

MCP Configuration (Joern)

```json
{
  "mcpServers": {
    "codebadger": {
      "url": "http://localhost:4242/mcp",
      "type": "http"
    }
  }
}
```

Prerequisites

Docker (for Joern backend)
Python 3.10+ (for MCP server)

Install:

~/.claude/install-graph-tools.sh --joern

Common CPGQL Queries

```scala
// Find all methods that handle user input (matched by parameter name)
cpg.method.where(_.parameter.name(".*input.*|.*request.*")).name.l

// Trace data flow from parameter to return. The form is sink.reachableBy(source).
def source = cpg.method("processPayment").parameter
def sink = cpg.method("processPayment").methodReturn
sink.reachableBy(source).l

// Find methods with many control structures (a rough proxy for cyclomatic complexity).
// `filter` takes a Boolean predicate; `where` expects a traversal.
cpg.method.filter(_.controlStructure.size > 10).name.l

// Dead code candidates: internal methods with no callers
cpg.method.internal.whereNot(_.callIn).nameNot("main").name.l

// Exception flow: methods that can throw, called from methods that contain no try block
cpg.method.where(_.controlStructure.isThrow).callIn.method.whereNot(_.controlStructure.isTry).dedup.name.l
```
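The "no callers" query is a heuristic, and it helps to see what it computes. The Python sketch below runs the same check on a hand-written call graph; the method names and the `methods_without_callers` helper are hypothetical. Methods reached only through reflection, callbacks, or framework entry points would show up as false positives in both the sketch and the real query.

```python
def methods_without_callers(methods, call_edges, entry_points=("main",)):
    """Return methods that never appear as a callee, excluding known entry points."""
    called = {callee for _caller, callee in call_edges}
    return sorted(
        name for name in methods
        if name not in called and name not in entry_points
    )


# Hypothetical call graph: (caller, callee) pairs.
methods = ["main", "process_payment", "validate_card", "legacy_export"]
call_edges = [
    ("main", "process_payment"),
    ("process_payment", "validate_card"),
]

print(methods_without_callers(methods, call_edges))  # ['legacy_export']
```

Treat the result as a list of candidates to review, not a list of code that is safe to delete.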

Tier 3: CodeQL

When to Use CodeQL

Scenario	Why CodeQL	Other Tiers Can't Do This
Security audit before release	Interprocedural taint analysis	Joern has basic taint tracking; CodeQL goes deeper
Reviewing auth/payment code	Data flow from source to sink	Cross-function, cross-file taint
PR security review	Targeted vulnerability scan	Pre-built OWASP query packs
Compliance checking	CWE/OWASP pattern matching	Curated security query suites
Rust security analysis	Full Rust support	Joern doesn't support Rust

Key MCP Tools (CodeQL)

Tool	Purpose
`run_query`	Execute a CodeQL query against the database
`find_definitions`	Locate symbol definitions
`find_references`	Find all references to a symbol
`get_results`	Parse BQRS (Binary Query Result Sets)

Supported Languages (CodeQL)

C/C++, C#, Go, Java, Kotlin, JavaScript, TypeScript, Python, Ruby, Swift, Rust

MCP Configuration (CodeQL)

```json
{
  "mcpServers": {
    "codeql": {
      "command": "codeql-mcp",
      "args": ["--database", ".code-graph/codeql-db"]
    }
  }
}
```
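When both Tier 2 and Tier 3 are enabled, the two fragments need to end up in one configuration. The Python sketch below merges them, assuming the MCP client reads a single top-level `mcpServers` object; `merge_mcp_configs` is a hypothetical helper, not part of either tool.

```python
import json


def merge_mcp_configs(*fragments: dict) -> dict:
    """Combine several {"mcpServers": {...}} fragments, rejecting name clashes."""
    merged = {"mcpServers": {}}
    for fragment in fragments:
        for name, server in fragment.get("mcpServers", {}).items():
            if name in merged["mcpServers"]:
                raise ValueError(f"duplicate MCP server name: {name}")
            merged["mcpServers"][name] = server
    return merged


joern_fragment = {"mcpServers": {
    "codebadger": {"url": "http://localhost:4242/mcp", "type": "http"},
}}
codeql_fragment = {"mcpServers": {
    "codeql": {"command": "codeql-mcp",
               "args": ["--database", ".code-graph/codeql-db"]},
}}

print(json.dumps(merge_mcp_configs(joern_fragment, codeql_fragment), indent=2))
```

Failing on a duplicate name is deliberate: silently overwriting a server entry would make one tier disappear without any error.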

Prerequisites

CodeQL CLI (`brew install codeql` on macOS)

Install:

~/.claude/install-graph-tools.sh --codeql

Common CodeQL Patterns

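```ql
/**
 * SQL injection: user input flows to SQL query
 * @kind path-problem
 */
import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.Concepts

// Sources are remote user input; sinks are the SQL text passed to an execution call.
module SqlInjectionConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }
  predicate isSink(DataFlow::Node sink) { sink = any(SqlExecution e).getSql() }
}
module SqlInjectionFlow = TaintTracking::Global<SqlInjectionConfig>;
import SqlInjectionFlow::PathGraph

from SqlInjectionFlow::PathNode source, SqlInjectionFlow::PathNode sink
where SqlInjectionFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "SQL injection from $@.", source.getNode(), "user input"

// Unvalidated redirect: same shape in its own query file, with the sink swapped
// (assuming the Concepts library's redirect-response class):
//   predicate isSink(DataFlow::Node sink) {
//     sink = any(Http::Server::HttpRedirectResponse r).getRedirectLocation()
//   }
```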
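Both patterns ask the same question: is there a data-flow path from a source to a sink? The Python sketch below shows that idea as a breadth-first search over a hand-written flow graph, with sanitizer nodes blocking propagation. The node names and the `find_taint_paths` helper are hypothetical; CodeQL's real analysis is interprocedural and far more precise than plain graph reachability.

```python
from collections import deque


def find_taint_paths(flow_edges, sources, sinks, sanitizers=frozenset()):
    """Return one shortest path from each source to each reachable sink.

    Paths never pass through a sanitizer node.
    """
    graph = {}
    for src, dst in flow_edges:
        graph.setdefault(src, []).append(dst)

    paths = []
    for source in sources:
        queue = deque([[source]])
        seen = {source}
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                paths.append(path)
                continue  # do not propagate past a sink
            for nxt in graph.get(node, []):
                if nxt in seen or nxt in sanitizers:
                    continue
                seen.add(nxt)
                queue.append(path + [nxt])
    return paths


# Hypothetical data-flow edges: (from, to).
flow_edges = [
    ("request.args['id']", "user_id"),
    ("user_id", "query_string"),
    ("query_string", "cursor.execute"),
    ("request.args['name']", "escape(name)"),
    ("escape(name)", "cursor.execute"),
]
paths = find_taint_paths(
    flow_edges,
    sources=["request.args['id']", "request.args['name']"],
    sinks={"cursor.execute"},
    sanitizers={"escape(name)"},
)
for path in paths:
    print(" -> ".join(path))
```

Only the `id` parameter is reported, because the `name` parameter reaches the sink solely through the sanitizer.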

Combined Workflow: Deep Analysis

When performing security review or complex refactoring, use all tiers:

1. SCOPE       → Tier 1: detect_changes / get_architecture
                 Identify files and modules in scope

2. STRUCTURE   → Tier 1: search_graph / trace_call_path
                 Map the call graph and dependencies

3. FLOW        → Tier 2: get_cfg / run_cpgql_query
                 Analyze control flow and data flow paths

4. SECURITY    → Tier 3: run_query with taint analysis
                 Check for vulnerabilities in data paths

5. REPORT      → Combine findings from all tiers
                 Prioritize: Critical > High > Medium > Low
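The REPORT step can be sketched as a merge and sort. The Python example below combines findings from several tiers and orders them Critical > High > Medium > Low; the `Finding` fields and the sample findings are hypothetical, since the tools do not share a common result format.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


@dataclass(frozen=True)
class Finding:
    tier: int       # which tier produced the finding (1, 2, or 3)
    severity: str   # one of the SEVERITY_ORDER keys
    location: str   # file and line, e.g. "auth.py:12"
    message: str


def build_report(*tier_findings):
    """Merge per-tier finding lists and sort by severity, then by location."""
    merged = [finding for findings in tier_findings for finding in findings]
    return sorted(merged, key=lambda f: (SEVERITY_ORDER[f.severity], f.location))


tier2 = [Finding(2, "Medium", "billing.py:40", "unreachable branch")]
tier3 = [
    Finding(3, "Low", "util.py:3", "weak hash for cache key"),
    Finding(3, "Critical", "auth.py:12", "SQL injection from user input"),
]

for finding in build_report(tier2, tier3):
    print(f"[{finding.severity}] Tier {finding.tier} {finding.location}: {finding.message}")
```

Sorting by location as a secondary key keeps findings in the same file next to each other within a severity level.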

Anti-Patterns

Anti-Pattern	Do This Instead
Using Joern/CodeQL for simple symbol lookup	Use Tier 1 `search_graph` (sub-ms vs seconds)
Running full CPG build on every commit	Build CPG on-demand; use Tier 1 for continuous monitoring
Querying Joern without checking `get_cpg_status`	Always verify CPG is built and current before querying
Running CodeQL without a specific security question	Have a hypothesis first; CodeQL queries are expensive
Ignoring Tier 1 blast radius before deep analysis	Always scope with Tier 1 first, then go deep on flagged areas
Using CodeQL for non-security structural queries	Use Joern CPGQL for structural/flow queries; CodeQL for security
