CPG Analysis Skill

Purpose: Deep code analysis beyond the AST. Use Joern to build and query a full Code Property Graph (control flow, data flow, program dependencies), and use CodeQL for interprocedural taint analysis and vulnerability detection.
These tools are opt-in: Joern needs Docker or a JVM, and CodeQL needs the CodeQL CLI. For everyday navigation, use codebase-memory-mcp (Tier 1, always on). Reach for Joern and CodeQL only when Tier 1 cannot answer the question.
┌────────────────────────────────────────────────────────────────┐
│  CODE PROPERTY GRAPH = AST + CFG + CDG + DDG + PDG             │
│  ─────────────────────────────────────────────────────────────│
│  AST  = Abstract Syntax Tree (structure)                       │
│  CFG  = Control Flow Graph (execution paths)                   │
│  CDG  = Control Dependency Graph (conditional dependencies)    │
│  DDG  = Data Dependency Graph (data flow between statements)   │
│  PDG  = Program Dependency Graph (CDG + DDG combined)          │
│                                                                │
│  Tier 2 (Joern): Full CPG with 40+ query tools                 │
│  Tier 3 (CodeQL): Interprocedural taint + security queries     │
└────────────────────────────────────────────────────────────────┘
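As a rough mental model of the layers above, a CPG is one set of program nodes that carries several kinds of labelled edges. The PDG layer is simply the union of the CDG and DDG edges. This is not Joern's actual schema. The types `EdgeKind`, `Edge`, and `Cpg` and the toy program are illustrative assumptions.

```scala
// Illustrative model only: Joern's real schema is far richer than this.
sealed trait EdgeKind
case object AstEdge extends EdgeKind // syntactic parent -> child
case object CfgEdge extends EdgeKind // statement -> statement that may execute next
case object CdgEdge extends EdgeKind // branch condition -> statement it controls
case object DdgEdge extends EdgeKind // definition -> use of the same variable

final case class Edge(from: String, to: String, kind: EdgeKind)

final case class Cpg(edges: Set[Edge]) {
  def layer(kind: EdgeKind): Set[Edge] = edges.filter(_.kind == kind)
  // PDG = CDG + DDG: one node depends on another through control or through data.
  def pdg: Set[Edge] = layer(CdgEdge) ++ layer(DdgEdge)
}

object CpgExample {
  // Toy program:  x = read(); if (x > 0) { y = x }
  val cpg: Cpg = Cpg(Set(
    Edge("if", "x > 0", AstEdge),
    Edge("if", "y = x", AstEdge),
    Edge("x = read()", "x > 0", CfgEdge),
    Edge("x > 0", "y = x", CfgEdge),
    Edge("x > 0", "y = x", CdgEdge),
    Edge("x = read()", "x > 0", DdgEdge),
    Edge("x = read()", "y = x", DdgEdge)
  ))
}
```

The same pair of nodes (`x > 0`, `y = x`) can be linked by both a CFG edge and a CDG edge. That is why the layers are kept separate rather than merged into one untyped graph.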

Tier Selection Guide

Simple symbol lookup, dependency trace, blast radius?
  → Tier 1: codebase-memory-mcp (always on, sub-ms)

Control flow paths, data flow, dead code, complex refactoring?
  → Tier 2: Joern CPG (on-demand, seconds)

Security audit, taint analysis, vulnerability detection?
  → Tier 3: CodeQL (on-demand, seconds to minutes)

Full security review before release?
  → All three tiers in sequence
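The selection guide can be read as a small decision function. The sketch below encodes it in Scala. The `Question` categories and tier labels are hypothetical names of our own, not part of any MCP server's API.

```scala
// Hypothetical question categories mirroring the guide above.
sealed trait Question
case object SymbolLookup extends Question
case object DependencyTrace extends Question
case object BlastRadius extends Question
case object ControlFlowPaths extends Question
case object DataFlowTrace extends Question
case object DeadCode extends Question
case object ComplexRefactoring extends Question
case object SecurityAudit extends Question
case object TaintAnalysis extends Question
case object VulnerabilityDetection extends Question
case object ReleaseSecurityReview extends Question

object TierGuide {
  val Tier1 = "Tier 1: codebase-memory-mcp"
  val Tier2 = "Tier 2: Joern CPG"
  val Tier3 = "Tier 3: CodeQL"

  /** Tiers to use for a question, in the order they should run. */
  def select(q: Question): List[String] = q match {
    case SymbolLookup | DependencyTrace | BlastRadius                   => List(Tier1)
    case ControlFlowPaths | DataFlowTrace | DeadCode | ComplexRefactoring => List(Tier2)
    case SecurityAudit | TaintAnalysis | VulnerabilityDetection         => List(Tier3)
    case ReleaseSecurityReview                                          => List(Tier1, Tier2, Tier3)
  }
}
```

Because `Question` is sealed, the compiler warns if a new category is added without deciding which tier handles it.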

Tier 2: Joern CPG (CodeBadger MCP)

When to Use Joern

| Scenario | Why Joern | Tier 1 Limitation |
|---|---|---|
| Trace data flow through functions | Full DDG traversal | Tier 1 has no data-flow analysis |
| Understand control-flow paths | CFG analysis with branch conditions | Tier 1 has no CFG |
| Find dead or unreachable code | PDG reachability analysis | Tier 1 only detects unused exports |
| Assess the impact of a complex refactoring | Cross-function dependency chains | Tier 1 is limited to the call graph |
| Audit third-party library usage | Deep call-chain traversal | Tier 1 stops at the import boundary |
| Understand exception flow | CFG includes throw/catch paths | Tier 1 ignores exceptions |
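The first row is the key gap. A call graph can say that a handler calls `db.execute`, but only a data-dependency graph says whether the request value actually reaches the query. The toy below walks DDG edges backwards from a sink, which is roughly the idea behind Joern's `sink.reachableBy(source)`. The statements are invented. Real Joern flows also handle calls, fields, and aliasing, and this sketch ignores all of them.

```scala
import scala.annotation.tailrec

object DataFlowToy {
  // Toy DDG (invented statements): node -> nodes whose values it reads.
  val readsFrom: Map[String, List[String]] = Map(
    "param:req"       -> Nil,
    "id = req.id"     -> List("param:req"),
    "sql = base + id" -> List("id = req.id"),
    "db.execute(sql)" -> List("sql = base + id"),
    "limit = 10"      -> Nil,
    "log(limit)"      -> List("limit = 10")
  )

  /** Every node whose data can flow into `sink` (backward closure over DDG edges). */
  def backwardSlice(sink: String): Set[String] = {
    @tailrec
    def loop(frontier: List[String], seen: Set[String]): Set[String] =
      frontier match {
        case Nil => seen
        case node :: rest =>
          val preds = readsFrom.getOrElse(node, Nil).filterNot(seen)
          loop(preds ::: rest, seen ++ preds)
      }
    loop(List(sink), Set.empty)
  }

  /** The subset of `sources` that reaches `sink`, in the spirit of sink.reachableBy(sources). */
  def reachableBy(sink: String, sources: Set[String]): Set[String] =
    backwardSlice(sink).intersect(sources)
}
```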

Key MCP Tools (Joern/CodeBadger)

| Tool | Purpose | Example / When to Use |
|---|---|---|
| `generate_cpg` | Build the CPG for a project | First-time setup, or after major changes |
| `get_cpg_status` | Check CPG build status | Verify the CPG is ready before querying |
| `run_cpgql_query` | Run arbitrary CPGQL queries | `cpg.method("login").callOut.code.l` |
| `get_cpgql_syntax_help` | Query-language reference | When unsure about query syntax |
| `get_cfg` | Control-flow graph for a method | Understand execution paths in a function |
| `list_methods` | List all methods in the project | Overview of available functions |
| `get_method_source` | Get the source code of a method | Read a specific function's source |
| `list_calls` | List calls from/to a method | Caller/callee analysis |
| `get_call_graph` | Full call-graph visualization | Understand call chains |
| `get_type_definition` | Type/class definitions | Understand the type hierarchy |
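These tools are meant to be used in sequence: build with `generate_cpg`, confirm readiness with `get_cpg_status`, then call `run_cpgql_query`. The sketch below shows that guard in plain Scala. Several pieces are hypothetical stand-ins: `CpgClient`, its method names, the status strings `"missing"`, `"building"`, and `"ready"`, and the polling defaults. CodeBadger's actual request and response formats are not described here.

```scala
// Hypothetical client; the real MCP tools exchange JSON over HTTP.
trait CpgClient {
  def generateCpg(project: String): Unit
  def cpgStatus(project: String): String // assumed values: "missing", "building", "ready"
  def runCpgqlQuery(project: String, query: String): String
}

object CpgWorkflow {
  /** Builds the CPG if it is missing, waits until it is ready, then runs the query. */
  def queryWhenReady(
      client: CpgClient,
      project: String,
      query: String,
      maxPolls: Int = 30,
      pollMillis: Long = 2000L
  ): Either[String, String] = {
    if (client.cpgStatus(project) == "missing") client.generateCpg(project)
    var polls = 0
    while (client.cpgStatus(project) != "ready" && polls < maxPolls) {
      Thread.sleep(pollMillis)
      polls += 1
    }
    if (client.cpgStatus(project) == "ready") Right(client.runCpgqlQuery(project, query))
    else Left(s"CPG for $project not ready after $polls polls; refusing to query")
  }
}
```

Returning `Left` instead of running the query avoids querying a CPG that is missing or still being built.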

Supported Languages (Joern)

Java, Scala, C/C++, Python, JavaScript, TypeScript, PHP, Ruby, Go, Kotlin, Swift, Lua
Not supported: Rust (use CodeQL for Rust)

MCP Configuration (Joern)

```json
{
  "mcpServers": {
    "codebadger": {
      "url": "http://localhost:4242/mcp",
      "type": "http"
    }
  }
}
```

Prerequisites

  • Docker (for the Joern backend)
  • Python 3.10+ (for the MCP server)
  • Install: `~/.claude/install-graph-tools.sh --joern`

Common CPGQL Queries

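```scala
// Find all methods with a parameter named like user input
cpg.method.where(_.parameter.name(".*input.*|.*request.*")).name.l

// Trace data flow from parameters to the return value:
// reachableBy is called on the sink and returns the sources that reach it
cpg.method("processPayment").methodReturn.reachableBy(cpg.method("processPayment").parameter).l

// Approximate cyclomatic complexity: methods with more than 10 control structures
// (filter takes a Boolean predicate; where expects a sub-traversal)
cpg.method.filter(_.controlStructure.size > 10).name.l

// Dead code (heuristic): internal methods with no callers, excluding main
cpg.method.internal.filter(_.callIn.isEmpty).filter(_.name != "main").name.l

// Exception flow (heuristic): callers of methods containing a throw, where the
// calling method has no try block at all (frontends that model throw as a call won't match)
def throwing = cpg.method.filter(_.controlStructure.controlStructureType("THROW").nonEmpty)
throwing.callIn.method.filter(_.controlStructure.controlStructureType("TRY").isEmpty).name.dedup.l
```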
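The "no callers" dead-code query has two blind spots. It misses code that is reachable only from other dead code, and it flags callbacks, reflection targets, and framework entry points as dead. Checking reachability from a set of entry points fixes the first blind spot. The toy call graph below uses invented method names and plain Scala rather than CPGQL to show the difference.

```scala
import scala.annotation.tailrec
import scala.collection.immutable.Queue

object DeadCodeToy {
  // caller -> callees (invented example)
  val calls: Map[String, Set[String]] = Map(
    "main"      -> Set("handle"),
    "handle"    -> Set("validate"),
    "validate"  -> Set.empty,
    "oldExport" -> Set("oldHelper"), // nothing calls oldExport
    "oldHelper" -> Set.empty
  )

  /** Same heuristic as the CPGQL query: methods with zero callers. */
  def noCallers(entryPoints: Set[String]): Set[String] =
    calls.keySet -- calls.values.flatten.toSet -- entryPoints

  /** Stricter check: methods not reachable from any entry point (breadth-first search). */
  def unreachable(entryPoints: Set[String]): Set[String] = {
    @tailrec
    def bfs(queue: Queue[String], seen: Set[String]): Set[String] =
      queue.dequeueOption match {
        case None => seen
        case Some((method, rest)) =>
          val next = calls.getOrElse(method, Set.empty[String]) -- seen
          bfs(rest.enqueueAll(next), seen ++ next)
      }
    calls.keySet -- bfs(Queue.from(entryPoints), entryPoints)
  }
}
```

Here `oldHelper` has a caller, so the zero-callers heuristic keeps it. It is still dead because its only caller is dead.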

Tier 3: CodeQL

When to Use CodeQL

| Scenario | Why CodeQL | Advantage over Other Tiers |
|---|---|---|
| Security audit before release | Interprocedural taint analysis | Joern has basic taint tracking; CodeQL's is deeper |
| Reviewing auth/payment code | Data flow from source to sink | Cross-function, cross-file taint tracking |
| PR security review | Targeted vulnerability scan | Pre-built OWASP query packs |
| Compliance checking | CWE/OWASP pattern matching | Curated security query suites |
| Rust security analysis | Full Rust support | Joern doesn't support Rust |
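To make "interprocedural taint from source to sink" concrete, the toy below propagates taint along value-flow edges and stops at a sanitizer. The edges include one from an argument to a parameter and one from a return value back to its call site. The program, node names, and sanitizer are invented. CodeQL expresses the same ingredients (sources, sinks, barriers) as a data-flow configuration over its own database.

```scala
object TaintToy {
  // Value-flow edges; "handler.*" and "buildQuery.*" mimic two functions (invented).
  val flows: Map[String, List[String]] = Map(
    "handler.userId"   -> List("buildQuery.id"),   // argument -> parameter (interprocedural)
    "buildQuery.id"    -> List("buildQuery.sql"),
    "buildQuery.sql"   -> List("handler.query"),   // return value -> call result
    "handler.query"    -> List("db.execute#0"),
    "handler.name"     -> List("escape#0"),
    "escape#0"         -> List("handler.safeName"),
    "handler.safeName" -> List("db.execute#1")
  )
  val sources: Set[String]    = Set("handler.userId", "handler.name") // e.g. request parameters
  val sanitizers: Set[String] = Set("escape#0")                       // taint does not pass through
  val sinks: Set[String]      = Set("db.execute#0", "db.execute#1")

  /** (source, sink) pairs connected by a path that avoids every sanitizer. */
  def findings: Set[(String, String)] =
    for {
      src  <- sources
      sink <- reach(src).intersect(sinks)
    } yield (src, sink)

  private def reach(start: String): Set[String] = {
    def go(todo: List[String], seen: Set[String]): Set[String] = todo match {
      case Nil => seen
      case node :: rest =>
        val next = flows.getOrElse(node, Nil).filterNot(m => seen(m) || sanitizers(m))
        go(next ::: rest, seen ++ next)
    }
    go(List(start), Set(start))
  }
}
```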

Key MCP Tools (CodeQL)

| Tool | Purpose |
|---|---|
| `run_query` | Execute a CodeQL query against the database |
| `find_definitions` | Locate symbol definitions |
| `find_references` | Find all references to a symbol |
| `get_results` | Parse BQRS (Binary Query Result Set) files |

Supported Languages (CodeQL)

C/C++, C#, Go, Java, Kotlin, JavaScript, TypeScript, Python, Ruby, Swift, Rust
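Combining the Joern and CodeQL support lists gives a quick way to pick a backend for a given language. The sets below are copied from the lists in this document, with C/C++ split into two entries. They should be re-checked against each tool's current release. The function names are hypothetical.

```scala
object Backends {
  // Copied from the support lists in this document; verify against current releases.
  val joern: Set[String] = Set("java", "scala", "c", "c++", "python", "javascript",
    "typescript", "php", "ruby", "go", "kotlin", "swift", "lua")
  val codeql: Set[String] = Set("c", "c++", "c#", "go", "java", "kotlin", "javascript",
    "typescript", "python", "ruby", "swift", "rust")

  /** Flow/structure questions: prefer Joern (Tier 2), fall back to CodeQL (e.g. Rust). */
  def forFlowAnalysis(lang: String): Option[String] = {
    val l = lang.toLowerCase
    if (joern(l)) Some("Joern (Tier 2)")
    else if (codeql(l)) Some("CodeQL (Tier 3)")
    else None
  }

  /** Security questions: CodeQL, when it supports the language. */
  def forSecurityAnalysis(lang: String): Option[String] =
    if (codeql(lang.toLowerCase)) Some("CodeQL (Tier 3)") else None
}
```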

MCP Configuration (CodeQL)

```json
{
  "mcpServers": {
    "codeql": {
      "command": "codeql-mcp",
      "args": ["--database", ".code-graph/codeql-db"]
    }
  }
}
```

Prerequisites

  • CodeQL CLI (on macOS: `brew install codeql`)
  • Install: `~/.claude/install-graph-tools.sh --codeql`

Common CodeQL Patterns

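```ql
/**
 * @name SQL injection from user input (Python)
 * @kind path-problem
 * @id py/example/sql-injection
 */
import python
import semmle.python.security.dataflow.SqlInjectionQuery
import SqlInjectionFlow::PathGraph

// Remote user input (the library's predefined sources) flowing into SQL execution sinks
from SqlInjectionFlow::PathNode source, SqlInjectionFlow::PathNode sink
where SqlInjectionFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "SQL injection from $@.", source.getNode(), "user input"

// Unvalidated redirect: put this in its own .ql file, with metadata like the above
// (@kind path-problem and a distinct @id)
import python
import semmle.python.security.dataflow.UrlRedirectQuery
import UrlRedirectFlow::PathGraph

from UrlRedirectFlow::PathNode source, UrlRedirectFlow::PathNode sink
where UrlRedirectFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Unvalidated redirect from $@.", source.getNode(), "user input"
```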

Combined Workflow: Deep Analysis

When performing a security review or a complex refactoring, use all three tiers in order:
1. SCOPE       → Tier 1: detect_changes / get_architecture
                 Identify files and modules in scope

2. STRUCTURE   → Tier 1: search_graph / trace_call_path
                 Map the call graph and dependencies

3. FLOW        → Tier 2: get_cfg / run_cpgql_query
                 Analyze control flow and data flow paths

4. SECURITY    → Tier 3: run_query with taint analysis
                 Check for vulnerabilities in data paths

5. REPORT      → Combine findings from all tiers
                 Prioritize: Critical > High > Medium > Low
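Step 5 boils down to merging per-tier findings and ordering them by severity. The sketch below assumes a hypothetical normalized `Finding` record. None of the three tools emits this shape directly, so each tier's output would need to be mapped into it first.

```scala
sealed abstract class Severity(val rank: Int)
case object Critical extends Severity(0)
case object High     extends Severity(1)
case object Medium   extends Severity(2)
case object Low      extends Severity(3)

// Hypothetical normalized finding; each tier's raw output would be mapped into this.
final case class Finding(tier: Int, location: String, message: String, severity: Severity)

object Report {
  /** Merge per-tier findings, drop exact duplicates, order Critical > High > Medium > Low. */
  def prioritize(perTier: Seq[Seq[Finding]]): Seq[Finding] =
    perTier.flatten.distinct.sortBy(f => (f.severity.rank, f.location))
}
```

Sorting on `(rank, location)` keeps findings of equal severity grouped by file. That tends to make the final report easier to act on.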

Anti-Patterns

| Anti-Pattern | Do This Instead |
|---|---|
| Using Joern/CodeQL for simple symbol lookup | Use Tier 1 `search_graph` (sub-ms vs. seconds) |
| Running a full CPG build on every commit | Build the CPG on demand; use Tier 1 for continuous monitoring |
| Querying Joern without checking `get_cpg_status` | Always verify the CPG is built and current before querying |
| Running CodeQL without a specific security question | Form a hypothesis first; CodeQL queries are expensive |
| Skipping Tier 1 blast-radius analysis before deep analysis | Always scope with Tier 1 first, then go deep on flagged areas |
| Using CodeQL for non-security structural queries | Use Joern CPGQL for structural/flow queries; reserve CodeQL for security |