genotoxic

Genotoxic

Combines mutation testing and necessist (test statement removal) with code graph analysis to triage findings into actionable categories: false positives, missing unit tests, and fuzzing targets.

When to Use

  • After mutation testing reveals survived mutants that need triage
  • Identifying where unit tests would have the highest impact
  • Finding functions that need fuzz harnesses instead of unit tests
  • Prioritizing test improvements using data flow context
  • Filtering out harmless mutants from actionable ones
  • Finding unnecessary test statements that indicate weak assertions (necessist)

When NOT to Use

  • Codebase has no existing test suite (write tests first)
  • Pure documentation or configuration changes
  • Single-file scripts with trivial logic

Prerequisites

  • trailmark installed: if `uv run trailmark` fails, run `uv pip install trailmark`. DO NOT fall back to "manual verification" or "manual analysis" as a substitute for running trailmark. Install it first. If installation fails, report the error instead of switching to manual analysis.
  • A mutation testing framework for the target language: if the framework command fails (not found, not installed), install it using the instructions in references/mutation-frameworks.md. DO NOT fall back to "manual mutation analysis" or skip mutation testing. Install the framework first. If installation fails, report the error instead of switching to manual mutation analysis.
  • necessist (optional, recommended): if the target language is supported (Go, Rust, Solidity/Foundry, TypeScript/Hardhat, TypeScript/Vitest, Rust/Anchor), install with `cargo install necessist`. See references/mutation-frameworks.md for details.
  • An existing test suite that passes
  • macOS environment: Run `ulimit -n 1024` before any `mull-runner` invocation. macOS Tahoe (26+) sets unlimited file descriptors by default, which crashes Mull's subprocess spawning. See references/mutation-frameworks.md for details.

Rationalizations to Reject

| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "All survived mutants need tests" | Many are harmless or equivalent | Triage before writing tests |
| "Mutation testing is too noisy" | Noise means you're not triaging | Use graph data to filter |
| "Unit tests cover everything" | Complex data flows need fuzzing | Check entrypoint reachability |
| "Dead code mutants don't matter" | Dead code should be removed | Flag for cleanup |
| "Low complexity = low risk" | Boundary bugs hide in simple code | Check mutant location |
| "Tool isn't installed, I'll do it manually" | Manual analysis misses what tooling catches | Install the tool first |
| "Necessist isn't mutation testing, skip it" | Necessist finds what mutation testing misses: weak tests | Run both when the language supports it |

Quick Start

1. Build the code graph

uv run trailmark analyze --summary {targetDir}

2. Run mutation testing (language-dependent)

Python:

uv run mutmut run --paths-to-mutate {targetDir}/src
uv run mutmut results

2b. Run necessist (if language supported)

necessist

3. Analyze results with this skill's workflow (Phase 3)

---

Workflow Overview

Phase 1: Graph Build      → Parse codebase with trailmark
Phase 2: Mutation Run     → Execute mutation testing framework
Phase 2b: Necessist Run   → Remove test statements (optional, parallel)
Phase 3: Triage           → Classify findings using graph data
Output: Categorized Report
  ├── Corroborated         (both tools flag same function — highest value)
  ├── False Positives      (harmless, skip)
  ├── Missing Tests        (write unit tests)
  └── Fuzzing Targets      (set up fuzz harnesses)

Decision Tree

├─ Need to set up mutation testing for a language?
│  └─ Read: references/mutation-frameworks.md
├─ Need to set up necessist or find weak test statements?
│  └─ Read: references/mutation-frameworks.md (Necessist section)
├─ Need to understand the triage criteria in depth?
│  └─ Read: references/triage-methodology.md
├─ Need to understand how graph data informs triage?
│  └─ Read: references/graph-analysis.md
└─ Already have results + graph? Use Phase 3 below.

Phase 1: Build Code Graph and Run Pre-Analysis

Parse the target codebase with trailmark and run pre-analysis before mutation testing. Pre-analysis computes blast radius, entry points, privilege boundaries, and taint propagation, which Phase 3 uses for triage.

uv run trailmark analyze --summary {targetDir}

Use the `QueryEngine` API to build the graph and run pre-analysis:
  1. `QueryEngine.from_directory("{targetDir}", language="{lang}")`
  2. Call `engine.preanalysis()` (mandatory before triage)
  3. Export with `engine.to_json()` for cross-referencing with mutation results
See references/graph-analysis.md for the full API: node mapping, reachability queries, blast radius, and pre-analysis subgraph lookups.

Phase 2: Run Mutation Testing

Select and run the appropriate framework. See references/mutation-frameworks.md for language-specific setup.
Capture survived mutants. Each framework reports differently, but extract these fields per mutant:

| Field | Description |
|---|---|
| File path | Source file containing the mutant |
| Line number | Line where mutation was applied |
| Mutation type | What was changed (operator, value, etc.) |
| Status | survived, killed, timeout, error |

Filter to survived mutants only for Phase 3.
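
One way to hold the per-mutant fields and apply the survived-only filter is sketched below. This is a minimal illustration and not the parser of any particular framework: the `Mutant` and `Status` names, the field names, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """The four statuses listed in the field table (names are hypothetical)."""
    SURVIVED = "survived"
    KILLED = "killed"
    TIMEOUT = "timeout"
    ERROR = "error"


@dataclass(frozen=True)
class Mutant:
    file_path: str      # source file containing the mutant
    line: int           # line where the mutation was applied
    mutation_type: str  # what was changed (operator, value, etc.)
    status: Status


def survived_only(mutants):
    """Keep only the mutants that the test suite failed to kill."""
    return [m for m in mutants if m.status is Status.SURVIVED]


# Hypothetical sample rows, for illustration only.
sample = [
    Mutant("src/parser.py", 42, "operator: < to <=", Status.SURVIVED),
    Mutant("src/parser.py", 57, "value: 0 to 1", Status.KILLED),
    Mutant("src/utils.py", 10, "operator: + to -", Status.TIMEOUT),
]
survivors = survived_only(sample)
```

Normalizing every framework's output into one record type keeps Phase 3 independent of which mutation tool produced the data.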

Phase 2b: Run Necessist (Optional)

If the target language is supported (Go, Rust, Solidity/Foundry, TypeScript/Hardhat, TypeScript/Vitest, Rust/Anchor), run necessist to find unnecessary test statements. This runs independently of Phase 2 and can execute in parallel.

Auto-detect framework

necessist

Or target specific test files

necessist tests/test_parser.rs

Export results

necessist --dump

Filter to findings where the test **passed after removal**. See
[references/mutation-frameworks.md](references/mutation-frameworks.md)
for framework-specific configuration and the normalized record format.

Map each removal to a production function using the algorithm in
[references/graph-analysis.md](references/graph-analysis.md).
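
The "passed after removal" filter can be sketched as follows. The real normalized record format lives in references/mutation-frameworks.md; the `Removal` type, its field names, and the outcome strings used here are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Removal:
    test_file: str  # test file the statement was removed from
    line: int       # line of the removed statement
    statement: str  # text of the removed statement
    outcome: str    # assumed values: "passed", "failed", "timed-out", "nonbuildable"


def passed_after_removal(removals):
    """Keep removals where the test still passed without the statement."""
    return [r for r in removals if r.outcome == "passed"]


# Hypothetical sample records, for illustration only.
sample_removals = [
    Removal("tests/test_parser.rs", 12, "parser.reset();", "passed"),
    Removal("tests/test_parser.rs", 18, "assert!(ok);", "failed"),
    Removal("tests/test_parser.rs", 25, "let cfg = load();", "nonbuildable"),
]
findings = passed_after_removal(sample_removals)
```

Only the first record is a finding: the test did not notice that the statement was gone, which suggests either a redundant statement or a missing assertion.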

---

Phase 3: Triage Findings

For each survived mutant and each necessist removal, determine its triage bucket using graph data. Necessist removals must first be mapped to a production function (see references/graph-analysis.md).

Quick Classification (Mutation Testing)

| Signal | Bucket | Reasoning |
|---|---|---|
| No callers in graph | False Positive | Dead code, mutant is unreachable |
| Only test callers | False Positive | Test infrastructure, not production |
| Logging/display string | False Positive | Cosmetic, no behavioral impact |
| Equivalent mutant | False Positive | Behavior unchanged despite mutation |
| Simple function, low CC, no entrypoint path | Missing Tests | Unit test is straightforward |
| Error handling path | Missing Tests | Should have negative test cases |
| Boundary condition (off-by-one) | Missing Tests | Property-based test candidate |
| Pure function, deterministic | Missing Tests | Easy to test, high value |
| High CC (>10), entrypoint reachable | Fuzzing Target | Complex + exposed = fuzz it |
| Parser/validator/deserializer | Fuzzing Target | Structured input handling |
| Many callers (>10) + moderate CC | Fuzzing Target | High blast radius |
| Binary/wire protocol handling | Fuzzing Target | Fuzzers excel at format testing |

Quick Classification (Necessist)

| Signal | Bucket | Reasoning |
|---|---|---|
| Redundant setup or debug call | False Positive | Statement genuinely unnecessary |
| Cannot map to production function | False Positive | No graph context for triage |
| Call removed, no assertion checks its effect | Missing Tests | Test has weak assertions |
| Assertion removed, test still passes | Missing Tests | Redundant or insufficient coverage |
| Maps to high-CC entrypoint-reachable function | Fuzzing Target | Complex + exposed + weak test |

When both mutation testing and necessist flag the same production function, mark the finding as corroborated: these are the highest-confidence findings.
For detailed criteria, see references/triage-methodology.md.
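
The corroboration rule amounts to grouping findings by production function and checking which tools contributed. A minimal sketch follows, assuming each finding has already been reduced to a `(function_name, detail)` pair; the function name `merge_findings` and the sample data are hypothetical.

```python
from collections import defaultdict


def merge_findings(mutation_findings, necessist_findings):
    """Group findings by production function and label their source.

    Each finding is a (function_name, detail) tuple. Returns a dict mapping
    function name to {"source": ..., "details": [...]}, where source is
    "mutation", "necessist", or "corroborated".
    """
    by_function = defaultdict(lambda: {"tools": set(), "details": []})
    for tool, findings in (("mutation", mutation_findings),
                           ("necessist", necessist_findings)):
        for function_name, detail in findings:
            by_function[function_name]["tools"].add(tool)
            by_function[function_name]["details"].append(detail)

    merged = {}
    for function_name, entry in by_function.items():
        tools = entry["tools"]
        # Flagged by both tools: corroborated. Otherwise keep the single tool name.
        source = "corroborated" if len(tools) == 2 else next(iter(tools))
        merged[function_name] = {"source": source, "details": entry["details"]}
    return merged


# Hypothetical findings, for illustration only.
merged = merge_findings(
    [("parse_header", "survived: < to <="), ("format_log", "survived: string")],
    [("parse_header", "call removed, test passed"), ("load_config", "assert removed")],
)
```

Here `parse_header` is flagged by both tools and is labeled corroborated, while the other two functions keep the name of the single tool that reported them.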

Graph Queries for Triage

For each mutant, map it to its containing graph node and use pre-analysis subgraphs (tainted, high_blast_radius, privilege_boundary) from Phase 1 to classify it. The classification logic checks: no callers → false positive, privilege boundary → fuzzing, high CC + tainted → fuzzing, high blast radius → fuzzing, otherwise → missing tests.
See references/graph-analysis.md for the `batch_triage` implementation and node mapping functions.
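
The order of checks described above can be written as a small decision function. This is a sketch of that ordering only and is not the `batch_triage` implementation from the reference file. The `NodeFacts` type, its field names, and the bucket labels are hypothetical, and the threshold of 10 is borrowed from the "High CC (>10)" row of the Quick Classification table.

```python
from dataclasses import dataclass

HIGH_CC = 10  # assumed threshold, taken from the "High CC (>10)" table row


@dataclass(frozen=True)
class NodeFacts:
    """Facts about the graph node that contains a mutant (hypothetical shape)."""
    caller_count: int
    cyclomatic_complexity: int
    tainted: bool             # node is in the tainted subgraph
    high_blast_radius: bool   # node is in the high_blast_radius subgraph
    privilege_boundary: bool  # node is in the privilege_boundary subgraph


def classify(node):
    """Apply the checks in the order the text gives them; first match wins."""
    if node.caller_count == 0:
        return "false_positive"
    if node.privilege_boundary:
        return "fuzzing_target"
    if node.cyclomatic_complexity > HIGH_CC and node.tainted:
        return "fuzzing_target"
    if node.high_blast_radius:
        return "fuzzing_target"
    return "missing_tests"


# Hypothetical node: called from three places, simple, not in any subgraph.
example = classify(NodeFacts(3, 4, False, False, False))
```

Because the first matching rule wins, an unreachable function is reported as a false positive even if it is also complex or tainted.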

Output Format

Generate a markdown report:

Genotoxic Triage Report

Summary

  • Total survived mutants: N
  • Total necessist removals: N
  • Corroborated findings: N
  • False positives: N (N%)
  • Missing test coverage: N (N%)
  • Fuzzing targets: N (N%)

Corroborated Findings

| File | Line | Function | Mutation Signal | Necessist Signal | Action |
|---|---|---|---|---|---|

False Positives

| File | Line | Mutation | Reason | Source |
|---|---|---|---|---|

Missing Test Coverage

| File | Line | Function | CC | Callers | Suggested Test | Source |
|---|---|---|---|---|---|---|

Fuzzing Targets

| File | Line | Function | CC | Entrypoint Path | Blast Radius | Source |
|---|---|---|---|---|---|---|

The `Source` column is `mutation`, `necessist`, or `corroborated`.

Write the report to `GENOTOXIC_REPORT.md` in the working directory.
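
The Summary block of the report can be produced from the triage results with a few lines of code. The sketch below assumes each triaged finding carries one bucket label and that percentages are taken over all triaged findings. Both are assumptions, as are the function name and the label strings.

```python
from collections import Counter


def render_summary(survived, removals, corroborated, buckets):
    """Render the Summary bullet list of the report.

    buckets holds one label per triaged finding: "false_positive",
    "missing_tests", or "fuzzing_target" (hypothetical label names).
    """
    counts = Counter(buckets)
    total = len(buckets)

    def pct(n):
        # Guard against an empty report to avoid dividing by zero.
        return round(100 * n / total) if total else 0

    fp = counts["false_positive"]
    mt = counts["missing_tests"]
    ft = counts["fuzzing_target"]
    lines = [
        f"- Total survived mutants: {survived}",
        f"- Total necessist removals: {removals}",
        f"- Corroborated findings: {corroborated}",
        f"- False positives: {fp} ({pct(fp)}%)",
        f"- Missing test coverage: {mt} ({pct(mt)}%)",
        f"- Fuzzing targets: {ft} ({pct(ft)}%)",
    ]
    return "\n".join(lines)


# Hypothetical counts, for illustration only.
labels = ["false_positive"] * 2 + ["missing_tests"] * 5 + ["fuzzing_target"] * 3
summary = render_summary(8, 3, 1, labels)
```

Deriving the counts from the per-finding labels, and not from separately tracked totals, keeps the summary consistent with the tables beneath it.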

---

Quality Checklist

Before delivering:
  • Trailmark graph built for target language
  • Mutation framework ran to completion
  • Necessist ran (if language supported) or noted as not applicable
  • All survived mutants triaged (none unclassified)
  • All necessist removals triaged (if applicable)
  • Corroborated findings identified (if both tools ran)
  • False positives have clear justifications
  • Missing test items include suggested test type
  • Fuzzing targets include entrypoint paths and blast radius
  • Report file written to `GENOTOXIC_REPORT.md`
  • User notified with summary statistics

Integration

trailmark skill:
  • Phase 1: Build code graph, query complexity and entrypoints
  • Phase 3: Caller analysis, reachability, blast radius
property-based-testing skill:
  • Missing test coverage items involving boundary conditions
  • Roundtrip/idempotence properties for serialization mutants
testing-handbook-skills (fuzzing):
  • Fuzzing target items: use `harness-writing`, `cargo-fuzz`, `atheris`

Supporting Documentation

  • references/mutation-frameworks.md - Language-specific framework setup, output parsing, and necessist configuration
  • references/triage-methodology.md - Detailed triage criteria, edge cases, and worked examples for both mutation testing and necessist
  • references/graph-analysis.md - Graph query patterns, test-to-production mapping, and result merging

First-time users: Start with Phase 1 (graph build), then run mutations, then use the Quick Classification table in Phase 3.
Experienced users: Jump to Phase 3 and use the Decision Tree to load specific reference material.