Genotoxic
Combines mutation testing and necessist (test statement removal) with
code graph analysis to triage findings into actionable categories:
false positives, missing unit tests, and fuzzing targets.
When to Use
- After mutation testing reveals survived mutants that need triage
- Identifying where unit tests would have the highest impact
- Finding functions that need fuzz harnesses instead of unit tests
- Prioritizing test improvements using data flow context
- Filtering out harmless mutants from actionable ones
- Finding unnecessary test statements that indicate weak assertions (necessist)
When NOT to Use
- Codebase has no existing test suite (write tests first)
- Pure documentation or configuration changes
- Single-file scripts with trivial logic
Prerequisites
- trailmark installed — if fails, run:
DO NOT fall back to "manual verification" or "manual analysis"
as a substitute for running trailmark. Install it first. If installation
fails, report the error instead of switching to manual analysis.
- A mutation testing framework for the target language — if the framework
command fails (not found, not installed), install it using the instructions
in references/mutation-frameworks.md.
DO NOT fall back to "manual mutation analysis" or skip mutation testing.
Install the framework first. If installation fails, report the error
instead of switching to manual mutation analysis.
- necessist (optional, recommended) — if the target language is
supported (Go, Rust, Solidity/Foundry, TypeScript/Hardhat,
TypeScript/Vitest, Rust/Anchor), install with .
See references/mutation-frameworks.md
for details.
- An existing test suite that passes
- macOS environment: Run before any
invocation. macOS Tahoe (26+) sets unlimited file descriptors by
default, which crashes Mull's subprocess spawning. See
references/mutation-frameworks.md
for details.
Rationalizations to Reject
| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "All survived mutants need tests" | Many are harmless or equivalent | Triage before writing tests |
| "Mutation testing is too noisy" | Noise means you're not triaging | Use graph data to filter |
| "Unit tests cover everything" | Complex data flows need fuzzing | Check entrypoint reachability |
| "Dead code mutants don't matter" | Dead code should be removed | Flag for cleanup |
| "Low complexity = low risk" | Boundary bugs hide in simple code | Check mutant location |
| "Tool isn't installed, I'll do it manually" | Manual analysis misses what tooling catches | Install the tool first |
| "Necessist isn't mutation testing, skip it" | Necessist finds what mutation testing misses: weak tests | Run both when the language supports it |
Quick Start
```bash
# 1. Build the code graph
uv run trailmark analyze --summary {targetDir}
# 2. Run mutation testing (language-dependent)
# Python:
uv run mutmut run --paths-to-mutate {targetDir}/src
uv run mutmut results
# 2b. Run necessist (if language supported)
necessist
# 3. Analyze results with this skill's workflow (Phase 3)
```
Workflow Overview
```text
Phase 1: Graph Build → Parse codebase with trailmark
    ↓
Phase 2: Mutation Run → Execute mutation testing framework
Phase 2b: Necessist Run → Remove test statements (optional, parallel)
    ↓
Phase 3: Triage → Classify findings using graph data
    ↓
Output: Categorized Report
    ├── Corroborated (both tools flag same function — highest value)
    ├── False Positives (harmless, skip)
    ├── Missing Tests (write unit tests)
    └── Fuzzing Targets (set up fuzz harnesses)
```
Decision Tree
```text
├─ Need to set up mutation testing for a language?
│  └─ Read: references/mutation-frameworks.md
│
├─ Need to set up necessist or find weak test statements?
│  └─ Read: references/mutation-frameworks.md (Necessist section)
│
├─ Need to understand the triage criteria in depth?
│  └─ Read: references/triage-methodology.md
│
├─ Need to understand how graph data informs triage?
│  └─ Read: references/graph-analysis.md
│
└─ Already have results + graph? Use Phase 3 below.
```
Phase 1: Build Code Graph and Run Pre-Analysis
Parse the target codebase with trailmark and run pre-analysis before
mutation testing. Pre-analysis computes blast radius, entry points, privilege
boundaries, and taint propagation, which Phase 3 uses for triage.
```bash
uv run trailmark analyze --summary {targetDir}
```
Use the
API to build the graph and run pre-analysis:
```python
QueryEngine.from_directory("{targetDir}", language="{lang}")
```
- Call — mandatory before triage
- Export with for cross-referencing with mutation results
See references/graph-analysis.md for the
full API: node mapping, reachability queries, blast radius, and
pre-analysis subgraph lookups.
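To make two of the pre-analysis terms concrete, here is a minimal sketch of blast radius (transitive callers) and entry-point reachability over a plain Python dict call graph. It does not use trailmark: the graph shape and the helpers `callers_of`, `blast_radius`, and `reachable_from` are hypothetical, and trailmark's own definitions may differ.

```python
from collections import deque

# Hypothetical call graph: each function maps to the set of functions it calls.
CallGraph = dict[str, set[str]]


def callers_of(graph: CallGraph) -> dict[str, set[str]]:
    """Invert the callee edges so we can walk from a function to its callers."""
    callers: dict[str, set[str]] = {name: set() for name in graph}
    for caller, callees in graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    return callers


def blast_radius(graph: CallGraph, target: str) -> set[str]:
    """Every function that transitively calls `target`, excluding `target` itself."""
    callers = callers_of(graph)
    seen: set[str] = set()
    queue = deque([target])
    while queue:
        for caller in callers.get(queue.popleft(), set()):
            if caller != target and caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


def reachable_from(graph: CallGraph, entrypoints: set[str], target: str) -> bool:
    """True if any entry point reaches `target` by following call edges forward."""
    seen = set(entrypoints)
    queue = deque(entrypoints)
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for callee in graph.get(node, set()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


graph: CallGraph = {
    "main": {"parse_request"},
    "parse_request": {"parse_header", "log_line"},
    "parse_header": {"read_u16"},
    "read_u16": set(),
    "log_line": set(),
    "unused_helper": {"read_u16"},
}
print(len(blast_radius(graph, "read_u16")))              # 4
print(reachable_from(graph, {"main"}, "unused_helper"))  # False
```

A mutant in `read_u16` touches four transitive callers, while `unused_helper` is unreachable from `main`, which is the kind of signal the triage tables in Phase 3 rely on.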
Phase 2: Run Mutation Testing
Select and run the appropriate framework. See
references/mutation-frameworks.md for
language-specific setup.
Capture survived mutants. Each framework reports differently, but
extract these fields per mutant:
| Field | Description |
|---|---|
| File path | Source file containing the mutant |
| Line number | Line where mutation was applied |
| Mutation type | What was changed (operator, value, etc.) |
| Status | survived, killed, timeout, error |
Filter to survived mutants only for Phase 3.
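One way to hold these four fields in code is a small normalized record, sketched below under the assumption that each framework's output has already been parsed into a dict of strings. The row keys (`file`, `line`, `mutation`, `status`) and the helpers `normalize` and `survived_only` are hypothetical, not any framework's actual output format.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SURVIVED = "survived"
    KILLED = "killed"
    TIMEOUT = "timeout"
    ERROR = "error"


@dataclass(frozen=True)
class Mutant:
    """One mutant, normalized to the four fields in the table above."""
    file_path: str
    line: int
    mutation_type: str
    status: Status


def normalize(row: dict[str, str]) -> Mutant:
    """Build a Mutant from a hypothetical per-framework row of strings."""
    return Mutant(
        file_path=row["file"],
        line=int(row["line"]),
        mutation_type=row["mutation"],
        status=Status(row["status"].strip().lower()),
    )


def survived_only(mutants: list[Mutant]) -> list[Mutant]:
    """Keep only the mutants Phase 3 needs to triage."""
    return [m for m in mutants if m.status is Status.SURVIVED]


rows = [
    {"file": "src/parser.py", "line": "42", "mutation": "< -> <=", "status": "Survived"},
    {"file": "src/parser.py", "line": "57", "mutation": "+ -> -", "status": "killed"},
]
for m in survived_only([normalize(r) for r in rows]):
    print(m.file_path, m.line, m.mutation_type)  # src/parser.py 42 < -> <=
```

Normalizing status strings to an enum up front means an unexpected framework status raises a `ValueError` instead of silently dropping mutants.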
Phase 2b: Run Necessist (Optional)
If the target language is supported (Go, Rust, Solidity/Foundry,
TypeScript/Hardhat, TypeScript/Vitest, Rust/Anchor), run necessist to
find unnecessary test statements. This runs independently of Phase 2 and
can execute in parallel.
```bash
# Auto-detect framework
necessist
# Or target specific test files
necessist tests/test_parser.rs
# Export results
necessist --dump
```
Filter to findings where the test passed after removal. See
references/mutation-frameworks.md
for framework-specific configuration and the normalized record format.
Map each removal to a production function using the algorithm in
references/graph-analysis.md.
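The mapping algorithm itself lives in references/graph-analysis.md and is not reproduced here. As a deliberately naive stand-in, the sketch below keeps removals whose test still passed and name-matches call sites in the removed statement against a set of production function names. The `Removal` fields, the `"passed"` outcome string, and the helpers `passed_removals` and `naive_map` are assumptions, not necessist's `--dump` format.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches an identifier immediately followed by "(", i.e. a call site.
CALL_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*\(")


@dataclass(frozen=True)
class Removal:
    """Hypothetical normalized necessist record (field names are illustrative)."""
    test_file: str
    line: int
    removed_text: str
    outcome: str  # assumed values such as "passed" or "failed"


def passed_removals(removals: list[Removal]) -> list[Removal]:
    """Keep removals after which the test still passed."""
    return [r for r in removals if r.outcome == "passed"]


def naive_map(removal: Removal, production: set[str]) -> Optional[str]:
    """Return the first called name that is a known production function, else None."""
    for name in CALL_RE.findall(removal.removed_text):
        if name in production:
            return name
    return None


removals = [
    Removal("tests/parser.rs", 12, "let h = parse_header(&buf);", "passed"),
    Removal("tests/parser.rs", 13, "debug_dump(&h);", "passed"),
    Removal("tests/parser.rs", 14, "assert_eq!(h.len, 3);", "failed"),
]
production = {"parse_header", "read_u16"}
mapped = {r.line: naive_map(r, production) for r in passed_removals(removals)}
print(mapped)  # {12: 'parse_header', 13: None}
```

A `None` result corresponds to the "Cannot map to production function" row of the necessist classification table.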
Phase 3: Triage Findings
For each survived mutant and each necessist removal, determine its
triage bucket using graph data. Necessist removals must first be mapped
to a production function (see
references/graph-analysis.md).
Quick Classification (Mutation Testing)
| Signal | Bucket | Reasoning |
|---|---|---|
| No callers in graph | False Positive | Dead code, mutant is unreachable |
| Only test callers | False Positive | Test infrastructure, not production |
| Logging/display string | False Positive | Cosmetic, no behavioral impact |
| Equivalent mutant | False Positive | Behavior unchanged despite mutation |
| Simple function, low CC, no entrypoint path | Missing Tests | Unit test is straightforward |
| Error handling path | Missing Tests | Should have negative test cases |
| Boundary condition (off-by-one) | Missing Tests | Property-based test candidate |
| Pure function, deterministic | Missing Tests | Easy to test, high value |
| High CC (>10), entrypoint reachable | Fuzzing Target | Complex + exposed = fuzz it |
| Parser/validator/deserializer | Fuzzing Target | Structured input handling |
| Many callers (>10) + moderate CC | Fuzzing Target | High blast radius |
| Binary/wire protocol handling | Fuzzing Target | Fuzzers excel at format testing |
Quick Classification (Necessist)
| Signal | Bucket | Reasoning |
|---|---|---|
| Redundant setup or debug call | False Positive | Statement genuinely unnecessary |
| Cannot map to production function | False Positive | No graph context for triage |
| Call removed, no assertion checks its effect | Missing Tests | Test has weak assertions |
| Assertion removed, test still passes | Missing Tests | Redundant or insufficient coverage |
| Maps to high-CC entrypoint-reachable function | Fuzzing Target | Complex + exposed + weak test |
When both mutation testing and necessist flag the same production
function, mark the finding as corroborated: it is the highest-confidence finding.
For detailed criteria, see
references/triage-methodology.md.
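Assuming every survived mutant and every mapped necessist removal has already been reduced to the name of its production function, corroboration is a set intersection. This is a minimal sketch; `corroborated` is a hypothetical helper, and ordering by combined hit count is an illustrative choice rather than something the triage criteria specify.

```python
from collections import Counter


def corroborated(mutation_hits: list[str], necessist_hits: list[str]) -> list[tuple[str, int, int]]:
    """Functions flagged by both tools as (name, mutant count, removal count), most-flagged first."""
    m, n = Counter(mutation_hits), Counter(necessist_hits)
    both = set(m) & set(n)
    return sorted(((fn, m[fn], n[fn]) for fn in both), key=lambda t: (-(t[1] + t[2]), t[0]))


mutation_hits = ["parse_header", "parse_header", "log_line", "checksum"]
necessist_hits = ["parse_header", "checksum", "checksum", "checksum"]
print(corroborated(mutation_hits, necessist_hits))
# [('checksum', 1, 3), ('parse_header', 2, 1)]
```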
Graph Queries for Triage
For each mutant, map it to its containing graph node and use pre-analysis
subgraphs (tainted, high_blast_radius, privilege_boundary) from Phase 1
to classify it. The classification logic checks: no callers → false
positive, privilege boundary → fuzzing, high CC + tainted → fuzzing,
high blast radius → fuzzing, otherwise → missing tests.
See
references/graph-analysis.md for
the
implementation and node mapping functions.
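The ordered checks described above can be sketched as a plain function over precomputed facts. This is not the implementation in references/graph-analysis.md: `NodeFacts`, its fields, and `classify` are hypothetical, the subgraphs are modeled as sets of function names, and exempting entry points from the "no callers" rule is an added assumption (entry points are live even though nothing calls them).

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    FALSE_POSITIVE = "False Positive"
    MISSING_TESTS = "Missing Tests"
    FUZZING_TARGET = "Fuzzing Target"


@dataclass(frozen=True)
class NodeFacts:
    """Hypothetical per-function facts pulled from the Phase 1 graph."""
    name: str
    production_callers: int  # callers that are not tests
    cyclomatic_complexity: int
    is_entrypoint: bool = False


HIGH_CC = 10  # matches the "High CC (>10)" row of the Quick Classification table


def classify(node: NodeFacts, tainted: set[str], high_blast_radius: set[str],
             privilege_boundary: set[str]) -> Bucket:
    # No production callers: dead code or test-only infrastructure.
    # Assumption: entry points are treated as live despite having no callers.
    if node.production_callers == 0 and not node.is_entrypoint:
        return Bucket.FALSE_POSITIVE
    if node.name in privilege_boundary:
        return Bucket.FUZZING_TARGET
    if node.cyclomatic_complexity > HIGH_CC and node.name in tainted:
        return Bucket.FUZZING_TARGET
    if node.name in high_blast_radius:
        return Bucket.FUZZING_TARGET
    return Bucket.MISSING_TESTS


tainted = {"parse_header"}
print(classify(NodeFacts("parse_header", 2, 14), tainted, set(), set()))  # Bucket.FUZZING_TARGET
print(classify(NodeFacts("clamp", 1, 2), tainted, set(), set()))         # Bucket.MISSING_TESTS
```

Checking in this order means a cheap "no callers" test filters dead code before any fuzzing decision is made.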
Output Format
Generate a markdown report:
```markdown
# Genotoxic Triage Report
## Summary
- Total survived mutants: N
- Total necessist removals: N
- Corroborated findings: N
- False positives: N (N%)
- Missing test coverage: N (N%)
- Fuzzing targets: N (N%)
## Corroborated Findings
|------|------|----------|----------------|------------------|--------|
## False Positives
|------|------|----------|--------|--------|
## Missing Test Coverage
|------|------|----------|----|---------|----------------|--------|
## Fuzzing Targets
| File | Line | Function | CC | Entrypoint Path | Blast Radius | Source |
|------|------|----------|----|-----------------|--------------|--------|
```
Write the report to
in the working directory.
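The Summary section of the report can be rendered from counts alone. The sketch below assumes the percentages are taken over the sum of the three bucket counts (the template does not state the denominator), and `summary_section` and its bucket keys are hypothetical.

```python
def summary_section(survived: int, removals: int, corroborated: int,
                    buckets: dict[str, int]) -> str:
    """Render the report's Summary section; percentages are of all bucketed findings."""
    total = sum(buckets.values())

    def pct(n: int) -> str:
        # Guard against an empty run so we never divide by zero.
        return f"{round(100 * n / total)}%" if total else "0%"

    lines = [
        "## Summary",
        f"- Total survived mutants: {survived}",
        f"- Total necessist removals: {removals}",
        f"- Corroborated findings: {corroborated}",
    ]
    for label, key in (("False positives", "false_positive"),
                       ("Missing test coverage", "missing_tests"),
                       ("Fuzzing targets", "fuzzing_target")):
        n = buckets.get(key, 0)
        lines.append(f"- {label}: {n} ({pct(n)})")
    return "\n".join(lines)


print(summary_section(8, 2, 1, {"false_positive": 5, "missing_tests": 3, "fuzzing_target": 2}))
```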
Quality Checklist
Before delivering:
Integration
trailmark skill:
- Phase 1: Build code graph, query complexity and entrypoints
- Phase 3: Caller analysis, reachability, blast radius
property-based-testing skill:
- Missing test coverage items involving boundary conditions
- Roundtrip/idempotence properties for serialization mutants
testing-handbook-skills (fuzzing):
- Fuzzing target items: use , ,
Supporting Documentation
- references/mutation-frameworks.md -
Language-specific framework setup, output parsing, and necessist configuration
- references/triage-methodology.md -
Detailed triage criteria, edge cases, and worked examples for both
mutation testing and necessist
- references/graph-analysis.md -
Graph query patterns, test-to-production mapping, and result merging
First-time users: Start with Phase 1 (graph build), then run mutations,
then use the Quick Classification table in Phase 3.
Experienced users: Jump to Phase 3 and use the Decision Tree to load
specific reference material.