vector-forge

Vector Forge

Uses mutation testing to systematically identify gaps in test vector coverage, then generates new test vectors that close those gaps. Measures effectiveness by comparing mutation kill rates before and after.

When to Use

  • Generating test vectors for cryptographic algorithms or protocols
  • Evaluating how well existing test vectors cover an implementation
  • Finding implementation code paths that no test vector exercises
  • Creating Wycheproof-style cross-implementation test vectors
  • Measuring the concrete coverage value of a test vector suite

When NOT to Use

  • No implementations exist yet (need code to mutate)
  • Single trivial implementation with no edge cases
  • Testing application logic rather than algorithm implementations
  • The algorithm has no public test vectors to compare against

Prerequisites

  • trailmark installed. If `uv run trailmark` fails, run:
        uv pip install trailmark
  • At least one implementation of the target algorithm in a language with mutation testing support
  • A test harness that consumes test vectors and exercises the implementation
  • A mutation testing framework for the target language

Rationalizations to Reject

| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "We have enough test vectors" | Mutation testing proves otherwise | Run the baseline first |
| "The implementation's own tests are sufficient" | Own tests often share blind spots with the impl | Cross-impl vectors catch different bugs |
| "FFI crates can be mutation tested at the binding layer" | Mutations to wrappers don't affect the underlying impl | Mutate the actual implementation language |
| "Timeouts mean the mutation was caught" | Timeouts are ambiguous: the mutant could be killed or alive | Resolve timeouts before drawing conclusions |
| "All mutants are equivalent" | Most aren't; verify by reading the mutation | Classify each escaped mutant individually |
| "Checking valid vectors is enough" | Permissive mutations survive without negative assertions | Assert rejection for every invalid vector |
| "Manual analysis is fine" | Manual analysis misses what tooling catches | Install and run the tools |

Workflow Overview

Phase 1: Discovery       → Find implementations to test
Phase 2: Harness         → Write/adapt test vector harness for each impl
Phase 3: Baseline        → Run mutation testing with existing vectors
Phase 4: Escape Analysis → Classify escaped mutants by code path
Phase 5: Vector Gen      → Create test vectors targeting escapes
Phase 6: Validation      → Re-run mutation testing, compare before/after
Output: Coverage Report + New Test Vectors

Phase 1: Discovery

Find implementations of the target algorithm. Look for:
  1. Pure implementations in high-level languages (Go, Rust, Python) — these are the best mutation testing targets
  2. FFI wrapper crates — identify these early so you don't waste time mutating wrapper glue code
  3. Reference implementations — useful for cross-verification but may not be the best mutation targets
For each implementation, note:
  • Language and mutation testing framework
  • Whether it's pure code or FFI wrappers
  • Existing test suite size and coverage
  • Which API surface the test vectors will exercise

Implementation Type Classification

| Type | Mutation Value | Example |
|---|---|---|
| Pure implementation | High | zkcrypto/bls12_381 (Rust), gnark-crypto (Go) |
| FFI bindings to C/asm | Low at binding layer | blst Rust crate |
| C/C++ implementation | High (use Mull) | blst C library |
| Generated code | Medium (mutations may be equivalent) | gnark-crypto generated field arithmetic |

Key insight: If an implementation delegates to another language via FFI, you must mutate the underlying implementation, not the bindings. For C/C++ underneath Rust/Go/Python, use Mull or similar.

Phase 2: Harness

For each implementation, create a test harness that:
  1. Reads test vectors from JSON files (Wycheproof format recommended)
  2. Exercises the implementation's API for each vector
  3. Asserts both acceptance and rejection:
    • Valid vectors: deserialization succeeds, output matches expected
    • Invalid vectors: deserialization fails or verification rejects
  4. Adds roundtrip assertions for valid deserialization vectors:
    serialize(deserialize(bytes)) == bytes
  5. Reports pass/fail per vector with test IDs
Critical: A harness that only checks valid vectors will miss all permissive mutations (e.g., `&` → `|` in validation). See references/lessons-learned.md §7.
The harness must be runnable by the mutation testing framework. For most frameworks this means:
  • Go: A `_test.go` file in the same package as the implementation
  • Rust: An integration test in `tests/` or inline `#[test]` functions
  • Python: A pytest test file
  • C/C++: A test binary linked against the implementation
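The harness requirements above can be sketched in Python as follows. This is a minimal illustration, not a real harness: `deserialize` and `serialize` implement a hypothetical toy encoding (one flag byte `0x01` followed by exactly 4 payload bytes) standing in for the implementation under test, and the `encoded` and `value` fields are assumed names, since real Wycheproof files use algorithm-specific fields.

```python
import json

# Hypothetical toy format standing in for a real implementation:
# one flag byte (must be 0x01) followed by exactly 4 payload bytes.
def deserialize(data: bytes) -> int:
    if len(data) != 5:
        raise ValueError("wrong length")
    if data[0] != 0x01:
        raise ValueError("invalid flag")
    return int.from_bytes(data[1:], "big")

def serialize(value: int) -> bytes:
    return b"\x01" + value.to_bytes(4, "big")

# Wycheproof-style vectors, inlined here; normally read from a JSON file.
VECTORS = json.loads("""
{
  "algorithm": "TOY",
  "testGroups": [{"tests": [
    {"tcId": 1, "comment": "valid", "encoded": "0100000007",
     "value": 7, "result": "valid", "flags": []},
    {"tcId": 2, "comment": "wrong flag", "encoded": "0200000007",
     "result": "invalid", "flags": ["InvalidFlag"]},
    {"tcId": 3, "comment": "truncated", "encoded": "01000000",
     "result": "invalid", "flags": ["WrongLength"]}
  ]}]
}
""")

def run_vectors(doc: dict) -> dict:
    """Return {tcId: passed} for every vector in the document."""
    report = {}
    for group in doc["testGroups"]:
        for tc in group["tests"]:
            raw = bytes.fromhex(tc["encoded"])
            try:
                value = deserialize(raw)
                accepted = True
            except ValueError:
                accepted = False
            if tc["result"] == "valid":
                # Acceptance, expected output, and roundtrip.
                ok = accepted and value == tc["value"] and serialize(value) == raw
            else:
                # Rejection is asserted explicitly, not assumed.
                ok = not accepted
            report[tc["tcId"]] = ok
    return report

if __name__ == "__main__":
    for tc_id, ok in run_vectors(VECTORS).items():
        print(f"tcId={tc_id} {'PASS' if ok else 'FAIL'}")
```

Because invalid vectors pass only when the input is rejected, a mutant that loosens the flag or length check makes vector 2 or 3 fail instead of surviving silently.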

Harness Placement

The harness must live inside the implementation's package so the mutation framework can see it. This usually means:

    # Go: add test file to the package being mutated
    cp wycheproof_test.go /path/to/impl/package/

    # Rust: add integration test
    cp wycheproof.rs /path/to/crate/tests/

    # Python: add test to the test directory
    cp test_wycheproof.py /path/to/package/tests/

Handling Existing Vectors

If the implementation already has test vectors:
  1. Run mutation testing with ONLY the existing vectors (baseline)
  2. Run mutation testing with ONLY your new vectors
  3. Run mutation testing with BOTH combined
  4. The delta between (1) and (3) shows the new vectors' value

Phase 3: Baseline

Run mutation testing with existing test vectors only.

Framework Selection

See references/mutation-frameworks.md for language-specific setup.

| Language | Framework | Command |
|---|---|---|
| Go | gremlins | `gremlins unleash ./path/to/package` |
| Rust | cargo-mutants | `cargo mutants -j N --timeout T` |
| Python | mutmut | `mutmut run --paths-to-mutate src/` |
| C/C++ | Mull | `mull-runner -test-framework=GoogleTest binary` |

Parallelism

Always use parallel execution for large codebases, and tune timeouts and fail-fast behavior where the framework exposes them:
  • `cargo mutants -j 8` (Rust, 8 parallel workers)
  • `gremlins unleash --timeout-coefficient 3` (Go, increase timeouts)
  • `mutmut run --runner "pytest -x -q"` (Python, fail-fast)

Recording Baseline Results

Capture these metrics per implementation:

| Metric | Description |
|---|---|
| Total mutants | Number of mutations generated |
| Killed | Mutants caught by tests |
| Survived/Lived | Mutants NOT caught (these are the targets) |
| Not covered | Code paths no test reaches at all |
| Timed out | Ambiguous; resolve before comparing |
| Efficacy % | Killed / (Killed + Survived) |
| Coverage % | (Total - Not covered) / Total |

Save the full mutation log for Phase 4 analysis.
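As a worked example of the two formulas, the sketch below computes Efficacy % and Coverage % from raw counts. The counts are invented, and treating the total as the sum of the four categories is an assumption; some frameworks report further categories (for example, mutants that fail to build).

```python
from dataclasses import dataclass

@dataclass
class MutationRun:
    killed: int
    survived: int
    not_covered: int
    timed_out: int = 0

    @property
    def total(self) -> int:
        # Assumption: these four categories account for every mutant.
        return self.killed + self.survived + self.not_covered + self.timed_out

    @property
    def efficacy_pct(self) -> float:
        # Killed / (Killed + Survived)
        denom = self.killed + self.survived
        return 100.0 * self.killed / denom if denom else 0.0

    @property
    def coverage_pct(self) -> float:
        # (Total - Not covered) / Total
        return 100.0 * (self.total - self.not_covered) / self.total if self.total else 0.0

# Invented counts, for illustration only.
baseline = MutationRun(killed=60, survived=20, not_covered=15, timed_out=5)
print(f"{baseline.efficacy_pct:.1f}% efficacy, {baseline.coverage_pct:.1f}% coverage")
```

With these counts, efficacy is 60 / 80 = 75% and coverage is 85 / 100 = 85%. The 5 timed-out mutants count toward neither numerator, which is why they need resolving before two runs are compared.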

Phase 4: Escape Analysis (Graph-Informed Triage)

Classify each escaped (survived + not covered) mutant using the Trailmark call graph for reachability and blast radius analysis.
This phase MUST use the genotoxic skill's triage methodology. The call graph transforms mutation results from a flat list of survived mutants into an actionable, prioritized set of vector targets.

Step 1: Build the Call Graph

Build a Trailmark code graph for each implementation before triaging mutations:

    # Go
    uv run trailmark analyze --language go --summary {targetDir}

    # Rust
    uv run trailmark analyze --language rust --summary {targetDir}

The graph provides:
  • Caller chains: trace from public API entry points to mutated functions to determine reachability
  • Cyclomatic complexity: prioritize high-CC functions
  • Blast radius: functions with many callers have wider impact if their mutations survive

Step 2: Filter to Relevant Code

Mutation frameworks test the entire package. Filter results to only the files/functions that test vectors should exercise:

    # Go (gremlins)
    grep -E "(LIVED|NOT COVERED)" baseline.log \
      | grep -E " at (relevant|files)" \
      | sort

    # Rust (cargo-mutants)
    grep "src/relevant" mutants.out/missed.txt
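When the filtered list needs further processing, the same filter can be written in Python. This is a sketch under assumptions: the sample log lines are invented, and the layout (a status keyword followed by ` at path:line:col`) is inferred from the grep patterns above and not taken from the gremlins documentation.

```python
import re

# Invented sample lines; the real gremlins log layout may differ.
SAMPLE_LOG = """\
KILLED CONDITIONALS_BOUNDARY at g1.go:68:12
LIVED ARITHMETIC_BASE at g1.go:70:9
NOT COVERED CONDITIONALS_NEGATION at hash_to_curve/g2.go:242:5
LIVED INCREMENT_DECREMENT at internal/debug.go:14:3
"""

def escaped_in(log: str, relevant_paths: tuple) -> list:
    """Keep LIVED / NOT COVERED lines whose location is under a relevant path."""
    status = re.compile(r"\b(LIVED|NOT COVERED)\b")
    location = re.compile(r" at (\S+?):\d+")
    kept = []
    for line in log.splitlines():
        m = location.search(line)
        if status.search(line) and m and m.group(1).startswith(relevant_paths):
            kept.append(line)
    return sorted(kept)

for line in escaped_in(SAMPLE_LOG, ("g1.go", "hash_to_curve/")):
    print(line)
```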

Step 3: Graph-Informed Classification

For each escaped mutant, map it to its containing function in the call graph and apply the genotoxic triage criteria:

| Graph Signal | Classification | Action |
|---|---|---|
| No callers in graph | False Positive | Dead code, skip |
| Only test callers | False Positive | Test infrastructure |
| Logging/display/formatting | False Positive | Cosmetic |
| Cross-package callers but NOT COVERED | Cross-Package Gap | See Step 4 |
| Reachable from public API, low CC | Missing Vector | Design targeted vector |
| Reachable from public API, high CC (>10) | Fuzzing Target | Both vector + fuzz harness |
| Validation/error-handling path | Negative Vector | Craft invalid input that triggers path |
| Optimization path (GLV, SIMD, batch) | Edge-Case Vector | Input that triggers optimization threshold |
| `\|` → `^` after left shift (e.g. `(t<<1) \| carry`) | Equivalent Mutant | Skip: bit 0 always 0, OR=XOR |
| `ct_eq` `&` → `\|` on Montgomery limbs | API-Unreachable | Needs library-internal tests, not vectors |
| Equivalent mutation (behavior unchanged) | False Positive | Skip |
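Part of this table can be expressed as a small decision function. The sketch below is illustrative only: the `Mutant` fields are hypothetical stand-ins for whatever the call graph export provides, the order in which the rules are checked is an assumption (the table does not specify precedence), and mapping every remaining non-API-reachable mutant to "API-Unreachable" is a simplification. The second function brute-forces the equivalent-mutant row: after a left shift, bit 0 is zero, so OR and XOR with a 0/1 carry give the same result.

```python
from dataclasses import dataclass

@dataclass
class Mutant:
    # All fields are hypothetical; populate them from the call graph.
    function: str
    callers: int                  # number of callers in the graph
    only_test_callers: bool
    reachable_from_api: bool
    cyclomatic_complexity: int
    is_validation_path: bool = False

def classify(m: Mutant) -> str:
    # Rule order is an assumption, not part of the triage table.
    if m.callers == 0:
        return "False Positive (dead code)"
    if m.only_test_callers:
        return "False Positive (test infrastructure)"
    if m.is_validation_path:
        return "Negative Vector"
    if m.reachable_from_api and m.cyclomatic_complexity > 10:
        return "Fuzzing Target"
    if m.reachable_from_api:
        return "Missing Vector"
    return "API-Unreachable"

def or_xor_equivalent_after_shift(bits: int = 8) -> bool:
    """Check (t << 1) | carry == (t << 1) ^ carry for every t and carry in {0, 1}."""
    return all(((t << 1) | c) == ((t << 1) ^ c)
               for t in range(1 << bits) for c in (0, 1))

print(classify(Mutant("from_compressed", 4, False, True, 6, is_validation_path=True)))
print(or_xor_equivalent_after_shift())
```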

Step 4: Identify Cross-Package Test Gaps

Critical pitfall: Mutation frameworks often only run tests within the same package as the mutation. For Go (gremlins) and Rust (cargo-mutants), this means:
  • A mutation in `hash_to_curve/g2.go` only runs tests in the `hash_to_curve` package, NOT tests in the parent `bls12381` package that imports it
  • Functions that are fully exercised by cross-package tests will appear as NOT COVERED; these are false positives
  • To confirm: check if the mutated function is called from a test in a different package that wouldn't be run
To resolve cross-package gaps:
  1. Add a thin test in the sub-package that calls through the same code path as the cross-package test
  2. Or run gremlins with `--test-pkg ./...` (if supported)
  3. Or document as a framework limitation in the report

Step 5: Prioritize by Security Impact

Using the call graph, rank surviving mutants by impact:

| Priority | Criteria | Example |
|---|---|---|
| P0 (Critical) | Mutant weakens validation/equality/authentication | `ct_eq`: `&` → `\|` makes equality permissive |
| P1 (High) | Mutant in deserialization flag parsing | `from_compressed`: `&` → `\|` accepts invalid flags |
| P2 (Medium) | Mutant in field arithmetic internals | `Fp::square`: `\|` → `^` corrupts computation |
| P3 (Low) | Mutant in optimization path | `phi` endomorphism: only affects performance path |
| Skip | Formatting, display, equivalent mutation | `Debug::fmt` return value replacement |

Step 6: Group by Vector Strategy

Group escaped mutants by the code path they represent and the type of test vector needed:
Deserialization flag validation (P1):
  - g1.rs:339,363-365,384 — from_compressed_unchecked flags
  → Need: valid-point-wrong-flag vectors

Field arithmetic (P2):
  - fp.rs:371-376,406,635-643 — subtract_p, neg, square
  → Need: field arithmetic KATs with edge-case values

Optimization thresholds (P3):
  - g1.go:68, g2.go:75 — GLV vs windowed multiplication
  → Need: scalar multiplication with large scalars

Cross-package (framework limitation):
  - hash_to_curve/g2.go:242-278 — isogeny, sgn0
  → Document as false positive or add sub-package test
Each group becomes a target for new test vectors in Phase 5.

Phase 5: Vector Generation

For each escaped code path group, design test vectors that force execution through that path.

Vector Design Patterns

| Code Path Type | Vector Strategy |
|---|---|
| Point deserialization | Malformed points: wrong length, invalid field elements, off-curve, wrong subgroup, identity point |
| Signature verification | Valid sig + all single-bit corruptions of sig, pk, msg |
| Hash-to-curve | Known answer tests (KATs) with edge-case inputs: empty, single byte, max length |
| Aggregate operations | 1 signer, many signers, duplicate signers, mixed valid/invalid |
| Error handling | Every error path should have a vector that triggers it |
| Arithmetic edge cases | Zero, one, field modulus - 1, points at infinity |
| Serialization flags | Every valid flag combination + every invalid flag combination |
| Roundtrip integrity | For every valid deser vector, assert `serialize(deserialize(b)) == b` |
| Carry/reduction faults | Reimplement at reduced limb widths, inject faults, extract distinguishing inputs |

Single-Fault Negative Vectors

Each negative vector should have exactly one defect with everything else valid — this isolates which validation check is being tested. See references/vector-patterns.md for per-flag construction examples.
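One way to produce single-fault vectors mechanically is to flip one bit at a time in an otherwise valid value. The sketch below assumes a hypothetical `sig` field and a made-up `SingleBitFlip` flag name; the two-byte input is a placeholder, not a real signature, and every generated vector would still need cross-verification before being labeled invalid.

```python
def single_bit_flips(data: bytes):
    """Yield (bit_index, corrupted) for every single-bit corruption of data."""
    for i in range(len(data) * 8):
        corrupted = bytearray(data)
        corrupted[i // 8] ^= 1 << (i % 8)
        yield i, bytes(corrupted)

def negative_vectors(sig: bytes, start_id: int = 1) -> list:
    """Build one candidate invalid vector per flipped bit; nothing else changes."""
    return [
        {"tcId": start_id + i, "comment": f"sig bit {i} flipped",
         "sig": corrupted.hex(), "result": "invalid", "flags": ["SingleBitFlip"]}
        for i, corrupted in single_bit_flips(sig)
    ]

# Placeholder bytes standing in for a valid signature.
vectors = negative_vectors(bytes.fromhex("a5c3"))
print(len(vectors), vectors[0]["sig"])
```

Each vector differs from the valid input in exactly one bit, so a failure points at a single validation check.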

Fault Simulation (Limb-Width Reimplementation)

When mutation testing only applies local operator swaps, deeper architectural bugs (carry propagation, reduction overflow) go untested. To close this gap, reimplement the target algorithm at reduced limb widths (8, 16, 25, 32 bits) and deliberately inject faults — then generate vectors that catch them.
See references/fault-simulation.md for the full methodology: limb-width selection, fault injection catalog, vector extraction, and validation workflow.
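To make the idea concrete, here is a minimal sketch that uses multi-limb addition at an 8-bit limb width as a stand-in for a real field operation. It injects one fault (the carry into a chosen limb is dropped) and searches a handful of edge-case limb values for inputs where the faulty and correct adders disagree. The function names, the candidate set, and the choice of addition are all illustrative assumptions; the referenced methodology document remains the authority.

```python
from itertools import product

LIMB_BITS = 8
MASK = (1 << LIMB_BITS) - 1

def add_limbs(a, b, drop_carry_into=None):
    """Add two little-endian limb lists modulo 2**(LIMB_BITS * len(a)).

    If drop_carry_into is set, the carry entering that limb index is
    discarded, simulating a carry-propagation bug.
    """
    out, carry = [], 0
    for i, (x, y) in enumerate(zip(a, b)):
        if i == drop_carry_into:
            carry = 0  # injected fault
        s = x + y + carry
        out.append(s & MASK)
        carry = s >> LIMB_BITS
    return out

def distinguishing_inputs(n_limbs=2, fault_limb=1, candidates=(0, 1, MASK - 1, MASK)):
    """Search edge-case limb values for inputs where the faulty adder differs."""
    hits = []
    for a in product(candidates, repeat=n_limbs):
        for b in product(candidates, repeat=n_limbs):
            if add_limbs(list(a), list(b)) != add_limbs(list(a), list(b), fault_limb):
                hits.append((a, b))
    return hits

hits = distinguishing_inputs()
print(len(hits), hits[0])
```

Every hit has a low-limb sum that overflows the limb width, which is the kind of input a test vector must contain to catch a dropped carry. The same inputs, scaled to the production limb width, become candidate vectors.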

Cross-Implementation Verification

Every new test vector MUST be verified against at least two independent implementations before being added to the suite:
  1. Generate the vector using implementation A
  2. Verify with implementation B (different codebase, ideally different language)
  3. If B disagrees, investigate — one implementation has a bug
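The generate-then-verify loop can be sketched with HMAC-SHA256 as a stand-in primitive. Implementation A is Python's `hmac` module; implementation B writes the construction out by hand from the RFC 2104 definition. Both lean on `hashlib`, so they are not truly independent codebases; in practice B should be a separate library, ideally in another language. The vector field names (`key`, `msg`, `tag`) are assumptions.

```python
import hashlib
import hmac

def impl_a(key: bytes, msg: bytes) -> bytes:
    """Implementation A: the standard-library hmac module."""
    return hmac.new(key, msg, hashlib.sha256).digest()

def impl_b(key: bytes, msg: bytes) -> bytes:
    """Implementation B: HMAC-SHA256 written out from the RFC 2104 definition."""
    block = 64  # SHA-256 block size in bytes
    if len(key) > block:
        key = hashlib.sha256(key).digest()
    key = key.ljust(block, b"\x00")
    ipad = bytes(k ^ 0x36 for k in key)
    opad = bytes(k ^ 0x5C for k in key)
    inner = hashlib.sha256(ipad + msg).digest()
    return hashlib.sha256(opad + inner).digest()

def cross_verified_vector(tc_id: int, key: bytes, msg: bytes) -> dict:
    tag = impl_a(key, msg)  # 1. generate with A
    if not hmac.compare_digest(tag, impl_b(key, msg)):  # 2. verify with B
        # 3. a disagreement means one implementation has a bug
        raise RuntimeError(f"tcId {tc_id}: implementations disagree, investigate")
    return {"tcId": tc_id, "key": key.hex(), "msg": msg.hex(),
            "tag": tag.hex(), "result": "valid", "flags": []}

print(cross_verified_vector(1, b"Jefe", b"what do ya want for nothing?")["tag"])
```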

Vector Format

Use Wycheproof JSON format (`algorithm`, `testGroups[].tests[]` with `tcId`, `comment`, `result`, `flags`). See references/vector-patterns.md for the full schema.
JSON encoding: Wycheproof canonicalizes vectors with `reformat_json.py`, which unescapes HTML entities. Generate vectors with literal characters, not HTML-escaped sequences:
  • Go: Use `json.NewEncoder` + `enc.SetEscapeHTML(false)`, never `json.Marshal` / `json.MarshalIndent`, which silently escape `>` → `\u003e`, `<` → `\u003c`, `&` → `\u0026`
  • Python: `json.dumps` is safe by default
  • Node.js: `JSON.stringify` is safe by default
See references/lessons-learned.md §14 for details.
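A short Python check of the encoding rule: the sketch below builds a skeleton file with the fields named above and confirms that `json.dumps` leaves `<`, `>`, and `&` as literal characters. The algorithm name, comment text, and flag name are placeholders.

```python
import json

vector_file = {
    "algorithm": "EXAMPLE",  # placeholder algorithm name
    "testGroups": [{
        "tests": [{
            "tcId": 1,
            "comment": "length < 32 & flag > 0",
            "result": "invalid",
            "flags": ["WrongLength"],
        }],
    }],
}

encoded = json.dumps(vector_file, indent=2)
print(encoded)
```

The comment string appears in the output exactly as written, with no `\u003c`, `\u003e`, or `\u0026` sequences, so the file should pass through canonicalization unchanged in that respect.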

Phase 6: Validation

Re-run mutation testing with the new test vectors included.
Tip: Use per-file mutation testing for fast iteration during vector development (see references/lessons-learned.md §12). Only run full-crate tests for the final comparison.

Before/After Comparison

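| Metric | Baseline | With New Vectors | Delta |
|---|---|---|---|
| Killed | X | Y | Y - X |
| Survived | A | B | A - B (survivors should decrease) |
| Not Covered | C | D | C - D (uncovered should decrease) |
| Efficacy % | E% | F% | F - E |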

Success Criteria

Vectors have both retroactive value (killing mutants in existing code) and proactive value (catching bugs in future implementations). Generate both kinds — boundary-condition vectors may not improve kill rates in mature libraries but will catch bugs in new implementations. See references/lessons-learned.md §13.
Retroactive (measurable): previously survived/uncovered mutants become killed, no regressions.
If kill rates don't change: the implementation's own tests likely already cover those paths. The vectors still add cross-implementation verification value. Document which case applies.
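The retroactive check (escaped mutants become killed, nothing regresses) can be sketched as a comparison of two per-mutant status maps. The map format, the status strings, and the mutant IDs below are assumptions made for illustration; real frameworks each have their own report format that would need parsing first.

```python
def compare_runs(before: dict, after: dict) -> dict:
    """Compare two {mutant_id: status} maps from the baseline and the re-run."""
    escaped = {"survived", "not_covered"}
    newly_killed = sorted(m for m, s in before.items()
                          if s in escaped and after.get(m) == "killed")
    regressions = sorted(m for m, s in before.items()
                         if s == "killed" and after.get(m) != "killed")
    return {"newly_killed": newly_killed, "regressions": regressions}

# Invented mutant IDs and statuses, for illustration only.
before = {"fp.rs:371": "survived", "g1.rs:339": "not_covered", "g1.rs:384": "killed"}
after = {"fp.rs:371": "killed", "g1.rs:339": "killed", "g1.rs:384": "killed"}
print(compare_runs(before, after))
```

An empty `regressions` list together with a non-empty `newly_killed` list is the measurable success case; if both are empty, the kill rate did not change and the report should say which explanation applies.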

Output Format

Write `VECTOR_FORGE_REPORT.md` covering: target algorithm, implementations tested, baseline results, escape analysis, new vectors generated, after results, before/after delta, and conclusions. See references/report-template.md for the full template.

Quality Checklist

Before delivering:
  • At least one pure implementation mutation-tested (not just FFI wrappers)
  • Baseline run completed with existing vectors
  • Trailmark call graph built for each implementation
  • All escaped mutants triaged using graph-informed classification
  • Cross-package false positives identified and documented
  • Security-critical mutations (ct_eq, validation, auth) prioritized as P0/P1
  • Fault simulation and mutation-derived vectors cross-verified against 2+ implementations
  • After run completed with new vectors included
  • Before/after delta computed and explained
  • Report written to `VECTOR_FORGE_REPORT.md`
  • New test vectors saved in standard format (Wycheproof JSON)

Integration

| Skill | Relationship |
|---|---|
| genotoxic (required for Phase 4) | Provides graph-informed triage; the call graph cuts actionable mutants by 30-50% |
| mutation-testing (mewt/muton) | Use for Solidity; Vector Forge is language-agnostic |
| property-based-testing | Better than hand-crafted vectors for bitwise mutations in field arithmetic |
| testing-handbook-skills (fuzzing) | Functions with CC > 10 and surviving mutants need both vectors and fuzz harnesses |

Supporting Documentation

  • references/mutation-frameworks.md - Language-specific mutation testing framework setup
  • references/vector-patterns.md - Common test vector patterns for cryptographic primitives
  • references/fault-simulation.md - Limb-width reimplementation for carry, reduction, and overflow faults
  • references/report-template.md - Full markdown template for the Vector Forge report
  • references/lessons-learned.md - BLS12-381 case study: FFI kill rates, timeout masking, cross-package false positives, bitwise mutation gaps, and security-critical priorities