# Audit Augmentation

Projects findings from external tools (SARIF) and human auditors (weAudit) onto Trailmark code graphs as annotations and subgraphs.

## When to Use

- Importing Semgrep, CodeQL, or other SARIF-producing tool results into a graph
- Importing weAudit audit annotations into a graph
- Cross-referencing static analysis findings with blast radius or taint data
- Querying which functions have high-severity findings
- Visualizing audit coverage alongside code structure

## When NOT to Use

- Running static analysis tools (use semgrep/codeql directly, then import)
- Building the code graph itself (use the `trailmark` skill)
- Generating diagrams (use the `diagramming-code` skill after augmenting)

## Rationalizations to Reject

| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "The user only asked about SARIF, skip pre-analysis" | Without pre-analysis, you can't cross-reference findings with blast radius or taint | Always run `engine.preanalysis()` before augmenting |
| "Unmatched findings don't matter" | Unmatched findings may indicate parsing gaps or out-of-scope files | Report unmatched count and investigate if high |
| "One severity subgraph is enough" | Different severities need different triage workflows | Query all severity subgraphs, not just `error` |
| "SARIF results speak for themselves" | Findings without graph context lack blast radius and taint reachability | Cross-reference with pre-analysis subgraphs |
| "weAudit and SARIF overlap, pick one" | Human auditors and tools find different things | Import both when available |
| "Tool isn't installed, I'll do it manually" | Manual analysis misses what tooling catches | Install trailmark first |

## Installation

MANDATORY: If `uv run trailmark` fails, install trailmark first:

```bash
uv pip install trailmark
```

## Quick Start

### CLI

```bash
# Augment with SARIF
uv run trailmark augment {targetDir} --sarif results.sarif

# Augment with weAudit
uv run trailmark augment {targetDir} --weaudit .vscode/alice.weaudit

# Both at once, output JSON
uv run trailmark augment {targetDir} \
  --sarif results.sarif \
  --weaudit .vscode/alice.weaudit \
  --json
```

### Programmatic API

```python
from trailmark.query.api import QueryEngine

engine = QueryEngine.from_directory("{targetDir}", language="python")

# Run pre-analysis first for cross-referencing
engine.preanalysis()

# Augment with SARIF
result = engine.augment_sarif("results.sarif")
# result: {matched_findings: 12, unmatched_findings: 3, subgraphs_created: [...]}

# Augment with weAudit
result = engine.augment_weaudit(".vscode/alice.weaudit")

# Query findings
engine.findings()                       # All findings
engine.subgraph("sarif:error")          # High-severity SARIF
engine.subgraph("weaudit:high")         # High-severity weAudit
engine.subgraph("sarif:semgrep")        # By tool name
engine.annotations_of("function_name")  # Per-node lookup
```

## Workflow

Augmentation Progress:
- [ ] Step 1: Build graph and run pre-analysis
- [ ] Step 2: Locate SARIF/weAudit files
- [ ] Step 3: Run augmentation
- [ ] Step 4: Inspect results and subgraphs
- [ ] Step 5: Cross-reference with pre-analysis

**Step 1:** Build the graph and run pre-analysis for blast radius and taint context:

```python
engine = QueryEngine.from_directory("{targetDir}", language="{lang}")
engine.preanalysis()
```

**Step 2:** Locate input files:

- SARIF: Usually output by tools like `semgrep --sarif -o results.sarif` or `codeql database analyze --format=sarif-latest`
- weAudit: Stored in `.vscode/<username>.weaudit` within the workspace

**Step 3:** Run augmentation via `engine.augment_sarif()` or `engine.augment_weaudit()`. Check `unmatched_findings` in the result — these are findings whose file/line locations didn't overlap any parsed code unit.

**Step 4:** Query findings and subgraphs. Use `engine.findings()` to list all annotated nodes. Use `engine.subgraph_names()` to see available subgraphs.

**Step 5:** Cross-reference with pre-analysis data to prioritize:

- Findings on tainted nodes: overlap `sarif:error` with the `tainted` subgraph
- Findings on high blast radius nodes: overlap with `high_blast_radius`
- Findings on privilege boundaries: overlap with `privilege_boundary`
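Step 5 can be sketched as plain set intersections. The sketch below assumes each subgraph can be reduced to a set of node names; the `subgraphs` dict is a stand-in for real query results, whose actual shape may differ:

```python
# Hypothetical sketch of Step 5: rank SARIF error findings by how many
# pre-analysis subgraphs also contain the node. The dict below is sample
# data standing in for real engine.subgraph() results.
subgraphs = {
    "sarif:error": {"parse_input", "render_page", "load_config"},
    "tainted": {"parse_input", "render_page"},
    "high_blast_radius": {"load_config", "parse_input"},
    "privilege_boundary": {"render_page"},
}

def priority(node: str) -> int:
    """Count how many pre-analysis subgraphs also contain this node."""
    contexts = ("tainted", "high_blast_radius", "privilege_boundary")
    return sum(node in subgraphs[ctx] for ctx in contexts)

# Highest-priority findings first: error-level nodes that are also
# tainted, high blast radius, or on a privilege boundary.
triage = sorted(subgraphs["sarif:error"], key=priority, reverse=True)
```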

## Annotation Format

Findings are stored as standard Trailmark annotations:

- Kind: `finding` (tool-generated) or `audit_note` (human notes)
- Source: `sarif:<tool_name>` or `weaudit:<author>`
- Description: Compact single-line: `[SEVERITY] rule-id: message (tool)`
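The compact description line can be rendered with a one-line formatter. This is an illustrative sketch of the format above, not Trailmark's actual implementation:

```python
def finding_description(severity: str, rule_id: str, message: str, tool: str) -> str:
    """Render a finding as the compact single-line annotation description."""
    return f"[{severity.upper()}] {rule_id}: {message} ({tool})"

finding_description("error", "python.lang.security.eval", "eval() on user input", "semgrep")
# '[ERROR] python.lang.security.eval: eval() on user input (semgrep)'
```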

## Subgraphs Created

| Subgraph | Contents |
|---|---|
| `sarif:error` | Nodes with SARIF error-level findings |
| `sarif:warning` | Nodes with SARIF warning-level findings |
| `sarif:note` | Nodes with SARIF note-level findings |
| `sarif:<tool>` | Nodes flagged by a specific tool |
| `weaudit:high` | Nodes with high-severity weAudit findings |
| `weaudit:medium` | Nodes with medium-severity weAudit findings |
| `weaudit:low` | Nodes with low-severity weAudit findings |
| `weaudit:findings` | All weAudit findings (entryType=0) |
| `weaudit:notes` | All weAudit notes (entryType=1) |
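Conceptually, each finding contributes its node to one severity subgraph and one tool (or entry-type) subgraph. A minimal sketch of that bucketing, using the names from the table above (the argument shapes are assumptions, not Trailmark's API):

```python
# Hypothetical sketch: which subgraphs a single finding lands in.
def sarif_subgraphs(level: str, tool: str) -> list[str]:
    """Subgraph names a SARIF finding contributes its node to."""
    return [f"sarif:{level}", f"sarif:{tool}"]

def weaudit_subgraphs(severity: str, entry_type: int) -> list[str]:
    """Subgraph names a weAudit entry contributes its node to."""
    kind = "findings" if entry_type == 0 else "notes"  # entryType=0 finding, 1 note
    return [f"weaudit:{severity}", f"weaudit:{kind}"]
```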

## How Matching Works

Findings are matched to graph nodes by file path and line range overlap:

1. The finding's file path is normalized relative to the graph's `root_path`
2. Nodes whose `location.file_path` matches AND whose line range overlaps are selected
3. The tightest match (smallest span) is preferred
4. If a finding's location doesn't overlap any node, it counts as unmatched

SARIF paths may be relative, absolute, or `file://` URIs — all are handled. weAudit uses 0-indexed lines which are converted to 1-indexed automatically.
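The matching rules above can be sketched as follows. This is an illustrative reimplementation under assumed data shapes (tuples of path and 1-indexed inclusive line ranges), not Trailmark's actual code:

```python
from pathlib import PurePosixPath

def normalize(path: str, root: str) -> str:
    """Strip a file:// prefix and make the path relative to the graph root."""
    if path.startswith("file://"):
        path = path[len("file://"):]
    p = PurePosixPath(path)
    try:
        return str(p.relative_to(root))
    except ValueError:
        return str(p)  # already relative, or outside the root

def match_finding(finding, nodes, root):
    """Return the tightest node overlapping the finding, or None if unmatched.

    finding and each node are (file_path, start_line, end_line) tuples
    with 1-indexed, inclusive line ranges.
    """
    f_path = normalize(finding[0], root)
    f_start, f_end = finding[1], finding[2]
    candidates = [
        n for n in nodes
        if n[0] == f_path and n[1] <= f_end and f_start <= n[2]  # ranges overlap
    ]
    # Prefer the tightest match: the node spanning the fewest lines.
    return min(candidates, key=lambda n: n[2] - n[1], default=None)
```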

## Supporting Documentation

- references/formats.md — SARIF 2.1.0 and weAudit file format field reference