vector-forge
Vector Forge
Uses mutation testing to systematically identify gaps in test vector
coverage, then generates new test vectors that close those gaps.
Measures effectiveness by comparing mutation kill rates before and after.
When to Use
- Generating test vectors for cryptographic algorithms or protocols
- Evaluating how well existing test vectors cover an implementation
- Finding implementation code paths that no test vector exercises
- Creating Wycheproof-style cross-implementation test vectors
- Measuring the concrete coverage value of a test vector suite
When NOT to Use
- No implementations exist yet (need code to mutate)
- Single trivial implementation with no edge cases
- Testing application logic rather than algorithm implementations
- The algorithm has no public test vectors to compare against
Prerequisites
- trailmark installed. If `uv run trailmark` fails, run `uv pip install trailmark`
- At least one implementation of the target algorithm in a language with mutation testing support
- A test harness that consumes test vectors and exercises the implementation
- A mutation testing framework for the target language
Rationalizations to Reject
| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "We have enough test vectors" | Mutation testing proves otherwise | Run the baseline first |
| "The implementation's own tests are sufficient" | Own tests often share blind spots with the impl | Cross-impl vectors catch different bugs |
| "FFI crates can be mutation tested at the binding layer" | Mutations to wrappers don't affect the underlying impl | Mutate the actual implementation language |
| "Timeouts mean the mutation was caught" | Timeouts are ambiguous — could be killed or alive | Resolve timeouts before drawing conclusions |
| "All mutants are equivalent" | Most aren't — verify by reading the mutation | Classify each escaped mutant individually |
| "Checking valid vectors is enough" | Permissive mutations survive without negative assertions | Assert rejection for every invalid vector |
| "Manual analysis is fine" | Manual analysis misses what tooling catches | Install and run the tools |
Workflow Overview
Phase 1: Discovery → Find implementations to test
↓
Phase 2: Harness → Write/adapt test vector harness for each impl
↓
Phase 3: Baseline → Run mutation testing with existing vectors
↓
Phase 4: Escape Analysis → Classify escaped mutants by code path
↓
Phase 5: Vector Gen → Create test vectors targeting escapes
↓
Phase 6: Validation → Re-run mutation testing, compare before/after
↓
Output: Coverage Report + New Test Vectors
Phase 1: Discovery
Find implementations of the target algorithm. Look for:
- Pure implementations in high-level languages (Go, Rust, Python) — these are the best mutation testing targets
- FFI wrapper crates — identify these early so you don't waste time mutating wrapper glue code
- Reference implementations — useful for cross-verification but may not be the best mutation targets
For each implementation, note:
- Language and mutation testing framework
- Whether it's pure code or FFI wrappers
- Existing test suite size and coverage
- Which API surface the test vectors will exercise
Implementation Type Classification
| Type | Mutation Value | Example |
|---|---|---|
| Pure implementation | High | zkcrypto/bls12_381 (Rust), gnark-crypto (Go) |
| FFI bindings to C/asm | Low at binding layer | blst Rust crate |
| C/C++ implementation | High (use Mull) | blst C library |
| Generated code | Medium (mutations may be equivalent) | gnark-crypto generated field arithmetic |
Key insight: If an implementation delegates to another language
via FFI, you must mutate the underlying implementation, not the
bindings. For C/C++ underneath Rust/Go/Python, use Mull or similar.
Phase 2: Harness
For each implementation, create a test harness that:
- Reads test vectors from JSON files (Wycheproof format recommended)
- Exercises the implementation's API for each vector
- Asserts both acceptance and rejection:
  - Valid vectors: deserialization succeeds, output matches expected
  - Invalid vectors: deserialization fails or verification rejects
- Adds roundtrip assertions for valid deserialization vectors: `serialize(deserialize(bytes)) == bytes`
- Reports pass/fail per vector with test IDs
Critical: A harness that only checks valid vectors will miss all permissive mutations (e.g., `&` → `|` in validation). See references/lessons-learned.md §7.
The harness must be runnable by the mutation testing framework. For most frameworks this means:
- Go: A `_test.go` file in the same package as the implementation
- Rust: An integration test in `tests/` or inline `#[test]` functions
- Python: A pytest test file
- C/C++: A test binary linked against the implementation
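The harness requirements above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference harness: `run_suite`, `check_vector`, and the `input` field name are hypothetical, the implementation under test is assumed to signal rejection by raising `ValueError`, and only `valid` and `invalid` results are handled.

```python
def check_vector(test, deserialize, serialize):
    """Return True when the implementation behaves as the vector expects."""
    raw = bytes.fromhex(test["input"])  # "input" is an assumed field name
    try:
        obj = deserialize(raw)
    except ValueError:
        # Rejection is the expected outcome only for invalid vectors.
        return test["result"] == "invalid"
    if test["result"] == "invalid":
        return False  # accepted an input that should have been rejected
    # Roundtrip assertion for valid vectors.
    return serialize(obj) == raw


def run_suite(suite, deserialize, serialize):
    """Check every vector in a Wycheproof-shaped dict; return {tcId: passed}."""
    results = {}
    for group in suite["testGroups"]:
        for test in group["tests"]:
            results[test["tcId"]] = check_vector(test, deserialize, serialize)
    return results
```

In a pytest file, each entry of the returned mapping would typically become one parametrized test case so that failures are reported per test ID.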
Harness Placement
The harness must live inside the implementation's package so the
mutation framework can see it. This usually means:
```bash
# Go: add test file to the package being mutated
cp wycheproof_test.go /path/to/impl/package/

# Rust: add integration test
cp wycheproof.rs /path/to/crate/tests/

# Python: add test to the test directory
cp test_wycheproof.py /path/to/package/tests/
```
Handling Existing Vectors
If the implementation already has test vectors:
1. Run mutation testing with ONLY the existing vectors (baseline)
2. Run mutation testing with ONLY your new vectors
3. Run mutation testing with BOTH combined
4. The delta between (1) and (3) shows the new vectors' value
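One way to compute that delta is to compare the sets of killed mutant IDs from the three runs. The sketch below assumes each run has already been reduced to a collection of stable mutant identifiers (for example `file:line:operator` strings); the function name and the identifier format are illustrative, not part of any framework's output.

```python
def vector_value(existing_kills, new_kills, combined_kills):
    """Compare killed-mutant ID sets from the three runs described above."""
    existing = set(existing_kills)
    new = set(new_kills)
    combined = set(combined_kills)
    return {
        # Delta between run (1) and run (3): what the new vectors add.
        "gained": sorted(combined - existing),
        # Mutants both suites kill independently.
        "redundant": sorted(new & existing),
        # Should be empty; anything here is a regression worth investigating.
        "lost": sorted(existing - combined),
    }
```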
Phase 3: Baseline
Run mutation testing with existing test vectors only.
Framework Selection
See references/mutation-frameworks.md
for language-specific setup.
| Language | Framework | Command |
|---|---|---|
| Go | gremlins | `gremlins unleash` |
| Rust | cargo-mutants | `cargo mutants` |
| Python | mutmut | `mutmut run` |
| C/C++ | Mull | see references/mutation-frameworks.md |
Parallelism
Always use parallel execution for large codebases:
- `cargo mutants -j 8` (Rust, 8 parallel workers)
- `gremlins unleash --timeout-coefficient 3` (Go, increase timeouts)
- `mutmut run --runner "pytest -x -q"` (Python, fail-fast)
Recording Baseline Results
Capture these metrics per implementation:
| Metric | Description |
|---|---|
| Total mutants | Number of mutations generated |
| Killed | Mutants caught by tests |
| Survived/Lived | Mutants NOT caught (these are the targets) |
| Not covered | Code paths no test reaches at all |
| Timed out | Ambiguous — resolve before comparing |
| Efficacy % | Killed / (Killed + Survived) |
| Coverage % | (Total - Not covered) / Total |
Save the full mutation log for Phase 4 analysis.
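The two derived metrics in the table follow directly from the raw counts. A small sketch, with a hypothetical function name and made-up example counts, assuming timeouts are reported separately and excluded from efficacy until resolved:

```python
def baseline_metrics(total, killed, survived, not_covered, timed_out=0):
    """Compute Efficacy % and Coverage % as defined in the metrics table."""
    decided = killed + survived
    return {
        # Efficacy % = Killed / (Killed + Survived)
        "efficacy_pct": round(100.0 * killed / decided, 2) if decided else 0.0,
        # Coverage % = (Total - Not covered) / Total
        "coverage_pct": round(100.0 * (total - not_covered) / total, 2) if total else 0.0,
        # Timeouts are ambiguous, so they are surfaced rather than counted.
        "unresolved_timeouts": timed_out,
    }
```

For example, 200 mutants with 120 killed, 40 survived, 30 not covered, and 10 timed out give 75% efficacy and 85% coverage, with 10 timeouts still to resolve.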
Phase 4: Escape Analysis (Graph-Informed Triage)
Classify each escaped (survived + not covered) mutant using the
Trailmark call graph for reachability and blast radius analysis.
This phase MUST use the genotoxic skill's triage methodology.
The call graph transforms mutation results from a flat list of
survived mutants into an actionable, prioritized set of vector
targets.
Step 1: Build the Call Graph
Build a Trailmark code graph for each implementation before
triaging mutations:
```bash
# Go
uv run trailmark analyze --language go --summary {targetDir}

# Rust
uv run trailmark analyze --language rust --summary {targetDir}
```
The graph provides:
- **Caller chains**: trace from public API entry points to mutated functions to determine reachability
- **Cyclomatic complexity**: prioritize high-CC functions
- **Blast radius**: functions with many callers have wider impact if their mutations survive
Step 2: Filter to Relevant Code
Mutation frameworks test the entire package. Filter results to
only the files/functions that test vectors should exercise:
```bash
# Go (gremlins)
grep -E "(LIVED|NOT COVERED)" baseline.log \
  | grep -E " at (relevant|files)" \
  | sort

# Rust (cargo-mutants)
cat mutants.out/missed.txt | grep "src/relevant"
```
Step 3: Graph-Informed Classification
For each escaped mutant, map it to its containing function in the
call graph and apply the genotoxic triage criteria:
| Graph Signal | Classification | Action |
|---|---|---|
| No callers in graph | False Positive | Dead code, skip |
| Only test callers | False Positive | Test infrastructure |
| Logging/display/formatting | False Positive | Cosmetic |
| Cross-package callers but NOT COVERED | Cross-Package Gap | See below |
| Reachable from public API, low CC | Missing Vector | Design targeted vector |
| Reachable from public API, high CC (>10) | Fuzzing Target | Both vector + fuzz harness |
| Validation/error-handling path | Negative Vector | Craft invalid input that triggers path |
| Optimization path (GLV, SIMD, batch) | Edge-Case Vector | Input that triggers optimization threshold |
| OR → XOR after a left shift | Equivalent Mutant | Skip (bit 0 is always 0, so OR = XOR) |
| ct_eq on Montgomery limbs | API-Unreachable | Needs library-internal tests, not vectors |
| Equivalent mutation (behavior unchanged) | False Positive | Skip |
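The table can be read as an ordered decision procedure. The sketch below is one possible encoding, not the genotoxic skill's actual logic: the `MutantSignals` fields are hypothetical inputs you would fill in from the call graph, the order in which rules are checked is an assumption, and so is the choice to treat "has callers but is not reachable from the public API" as API-Unreachable.

```python
from dataclasses import dataclass


@dataclass
class MutantSignals:
    callers: int = 0                       # callers found in the call graph
    only_test_callers: bool = False
    cosmetic: bool = False                 # logging / display / formatting
    equivalent: bool = False               # behavior unchanged
    cross_package_uncovered: bool = False  # cross-package callers, NOT COVERED
    validation_path: bool = False
    optimization_path: bool = False
    public_reachable: bool = False
    cyclomatic_complexity: int = 1


def classify(m: MutantSignals) -> str:
    """Map graph signals to a classification from the table (assumed order)."""
    if m.callers == 0 or m.only_test_callers or m.cosmetic or m.equivalent:
        return "False Positive"
    if m.cross_package_uncovered:
        return "Cross-Package Gap"
    if m.validation_path:
        return "Negative Vector"
    if m.optimization_path:
        return "Edge-Case Vector"
    if m.public_reachable:
        return "Fuzzing Target" if m.cyclomatic_complexity > 10 else "Missing Vector"
    return "API-Unreachable"
```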
Step 4: Identify Cross-Package Test Gaps
Critical pitfall: Mutation frameworks often only run tests
within the same package as the mutation. For Go (gremlins) and
Rust (cargo-mutants), this means:
- A mutation in `hash_to_curve/g2.go` only runs tests in the `hash_to_curve` package, NOT tests in the parent `bls12381` package that imports it
- Functions that are fully exercised by cross-package tests will appear as NOT COVERED; these are false positives
- To confirm: check if the mutated function is called from a test in a different package that wouldn't be run
To resolve cross-package gaps:
- Add a thin test in the sub-package that calls through the same code path as the cross-package test
- Or run gremlins with `--test-pkg ./...` (if supported)
- Or document as a framework limitation in the report
Step 5: Prioritize by Security Impact
Using the call graph, rank surviving mutants by impact:
| Priority | Criteria | Example |
|---|---|---|
| P0 — Critical | Mutant weakens validation/equality/authentication | |
| P1 — High | Mutant in deserialization flag parsing | |
| P2 — Medium | Mutant in field arithmetic internals | |
| P3 — Low | Mutant in optimization path | |
| Skip | Formatting, display, equivalent mutation | |
Step 6: Group by Vector Strategy
Group escaped mutants by the code path they represent and the
type of test vector needed:
Deserialization flag validation (P1):
- g1.rs:339,363-365,384 — from_compressed_unchecked flags
→ Need: valid-point-wrong-flag vectors
Field arithmetic (P2):
- fp.rs:371-376,406,635-643 — subtract_p, neg, square
→ Need: field arithmetic KATs with edge-case values
Optimization thresholds (P3):
- g1.go:68, g2.go:75 — GLV vs windowed multiplication
→ Need: scalar multiplication with large scalars
Cross-package (framework limitation):
- hash_to_curve/g2.go:242-278 — isogeny, sgn0
→ Document as false positive or add sub-package test
Each group becomes a target for new test vectors in Phase 5.
Phase 5: Vector Generation
For each escaped code path group, design test vectors that
force execution through that path.
Vector Design Patterns
| Code Path Type | Vector Strategy |
|---|---|
| Point deserialization | Malformed points: wrong length, invalid field elements, off-curve, wrong subgroup, identity point |
| Signature verification | Valid sig + all single-bit corruptions of sig, pk, msg |
| Hash-to-curve | Known answer tests (KATs) with edge-case inputs: empty, single byte, max length |
| Aggregate operations | 1 signer, many signers, duplicate signers, mixed valid/invalid |
| Error handling | Every error path should have a vector that triggers it |
| Arithmetic edge cases | Zero, one, field modulus - 1, points at infinity |
| Serialization flags | Every valid flag combination + every invalid flag combination |
| Roundtrip integrity | For every valid deser vector, assert `serialize(deserialize(bytes)) == bytes` |
| Carry/reduction faults | Reimplement at reduced limb widths, inject faults, extract distinguishing inputs |
Single-Fault Negative Vectors
Each negative vector should have exactly one defect with
everything else valid — this isolates which validation check is
being tested. See references/vector-patterns.md
for per-flag construction examples.
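For the signature-verification pattern in the table above (single-bit corruptions of sig, pk, msg), single-fault vectors can be produced mechanically. The sketch below is illustrative only: the function names and the hex-encoded field layout are assumptions, and the `invalid` label is a candidate that still has to be confirmed by cross-implementation verification.

```python
def single_bit_flips(data: bytes):
    """Yield (bit_index, corrupted_bytes) for every single-bit flip of data."""
    for bit in range(len(data) * 8):
        corrupted = bytearray(data)
        corrupted[bit // 8] ^= 1 << (bit % 8)
        yield bit, bytes(corrupted)


def single_fault_vectors(valid: dict, first_tc_id: int = 1):
    """Corrupt one hex field at a time; every other field keeps its valid value."""
    tc_id = first_tc_id
    for field, value in valid.items():
        for bit, corrupted in single_bit_flips(bytes.fromhex(value)):
            test = dict(valid)
            test[field] = corrupted.hex()
            test.update(
                tcId=tc_id,
                comment=f"{field}: bit {bit} flipped",
                result="invalid",  # candidate label, confirm before publishing
            )
            yield test
            tc_id += 1
```

Because each generated vector differs from the valid one in exactly one bit of one field, a failure points at a single validation check.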
Fault Simulation (Limb-Width Reimplementation)
When mutation testing only applies local operator swaps, deeper
architectural bugs (carry propagation, reduction overflow) go
untested. To close this gap, reimplement the target algorithm
at reduced limb widths (8, 16, 25, 32 bits) and deliberately
inject faults — then generate vectors that catch them.
See references/fault-simulation.md
for the full methodology: limb-width selection, fault injection
catalog, vector extraction, and validation workflow.
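As a toy illustration of the idea, the sketch below reimplements multi-limb addition at an 8-bit limb width, injects a dropped-carry fault, and searches edge-case operands for inputs where the faulty and correct results differ. The choice of addition as the target, the specific fault, and all names are assumptions made for the example; the referenced methodology document remains the authority.

```python
import itertools

LIMB_BITS = 8
MASK = (1 << LIMB_BITS) - 1


def add_limbs(a, b, drop_carry_at=None):
    """Add two little-endian limb lists modulo 2^(LIMB_BITS * len(a)).

    If drop_carry_at is set, the carry out of that limb is discarded,
    which simulates a carry-propagation bug.
    """
    out, carry = [], 0
    for i, (x, y) in enumerate(zip(a, b)):
        s = x + y + carry
        out.append(s & MASK)
        carry = s >> LIMB_BITS
        if i == drop_carry_at:
            carry = 0  # injected fault
    return out


def distinguishing_inputs(candidates, fault_limb):
    """Yield operand pairs on which the faulty adder disagrees with the correct one."""
    for a, b in itertools.product(candidates, repeat=2):
        if add_limbs(a, b) != add_limbs(a, b, drop_carry_at=fault_limb):
            yield a, b


EDGE_CASES = [[0x00, 0x00], [0x01, 0x00], [0xFF, 0x00], [0xFF, 0xFF]]
```

Each pair yielded by `distinguishing_inputs(EDGE_CASES, 0)` is a candidate test vector: a correct implementation and one with the injected fault produce different outputs on it. A fault on the last limb yields nothing here because the final carry is discarded anyway, which mirrors an equivalent mutant.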
Cross-Implementation Verification
Every new test vector MUST be verified against at least two
independent implementations before being added to the suite:
- Generate the vector using implementation A
- Verify with implementation B (different codebase, ideally different language)
- If B disagrees, investigate — one implementation has a bug
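The verification steps above amount to running each vector through two implementations and flagging any disagreement. A minimal sketch, assuming each implementation is wrapped as a Python callable that returns its output and raises `ValueError` on rejection (both the wrapper convention and the names are hypothetical):

```python
def outcome(impl, vector):
    """Normalize an implementation's behavior to (status, output)."""
    try:
        return ("accepted", impl(vector))
    except ValueError:
        return ("rejected", None)


def cross_verify(vectors, impl_a, impl_b):
    """Split vectors into those both implementations agree on and the rest."""
    agreed, disputed = [], []
    for vector in vectors:
        a, b = outcome(impl_a, vector), outcome(impl_b, vector)
        if a == b:
            agreed.append(vector)
        else:
            # One implementation has a bug; keep both outcomes for triage.
            disputed.append((vector, a, b))
    return agreed, disputed
```

Only vectors in `agreed` are candidates for the suite; entries in `disputed` need investigation before either outcome is recorded as expected.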
Vector Format
Use Wycheproof JSON format (`algorithm`, `testGroups[].tests[]` with `tcId`, `comment`, `result`, `flags`). See references/vector-patterns.md for the full schema.
JSON encoding: Wycheproof canonicalizes vectors with `reformat_json.py`, which unescapes HTML entities. Generate vectors with literal characters, not HTML-escaped sequences:
- Go: Use `json.NewEncoder` + `enc.SetEscapeHTML(false)`; never `json.Marshal`/`json.MarshalIndent`, which silently escape `>` → `\u003e`, `<` → `\u003c`, `&` → `\u0026`
- Python: `json.dumps` is safe by default
- Node.js: `JSON.stringify` is safe by default
See references/lessons-learned.md §14 for details.
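A short Python sketch of emitting a file with the fields named above. It covers only that minimal subset (real Wycheproof files carry additional header fields), and the function name and the example test entry are made up for illustration.

```python
import json


def to_wycheproof(algorithm, tests):
    """Serialize tests into a minimal Wycheproof-shaped JSON document."""
    doc = {"algorithm": algorithm, "testGroups": [{"tests": tests}]}
    # json.dumps leaves <, >, and & as literal characters by default.
    return json.dumps(doc, indent=2)


example = to_wycheproof(
    "EXAMPLE-ALG",  # placeholder algorithm name
    [{"tcId": 1, "comment": "x < p & y > 0", "result": "valid", "flags": []}],
)
```

The comment string survives with its literal `<`, `&`, and `>` characters, so the output is unchanged by a canonicalization pass that unescapes HTML entities.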
Phase 6: Validation
Re-run mutation testing with the new test vectors included.
Tip: Use per-file mutation testing for fast iteration during
vector development (see references/lessons-learned.md §12).
Only run full-crate tests for the final comparison.
Before/After Comparison
| Metric | Baseline | With New Vectors | Delta |
|---|---|---|---|
| Killed | X | Y | Y - X |
| Survived | A | B | A - B (should decrease) |
| Not Covered | C | D | C - D (should decrease) |
| Efficacy % | E% | F% | F - E |
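The Delta column can be filled in with a few subtractions. The sketch below assumes each run is summarized as a dict with `killed`, `survived`, `not_covered`, and `efficacy_pct` keys; those key names and the function name are illustrative.

```python
def compare_runs(baseline, after):
    """Compute the Delta column of the before/after table."""
    return {
        "killed": after["killed"] - baseline["killed"],                 # Y - X
        "survived": baseline["survived"] - after["survived"],           # A - B
        "not_covered": baseline["not_covered"] - after["not_covered"],  # C - D
        "efficacy_points": round(after["efficacy_pct"] - baseline["efficacy_pct"], 2),  # F - E
    }
```

Positive values in every field mean the new vectors helped; a negative `killed` delta signals a regression.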
Success Criteria
Vectors have both retroactive value (killing mutants in
existing code) and proactive value (catching bugs in future
implementations). Generate both kinds — boundary-condition vectors
may not improve kill rates in mature libraries but will catch bugs
in new implementations. See
references/lessons-learned.md §13.
Retroactive (measurable): previously survived/uncovered mutants
become killed, no regressions.
If kill rates don't change: the implementation's own tests
likely already cover those paths. The vectors still add
cross-implementation verification value. Document which case
applies.
Output Format
Write `VECTOR_FORGE_REPORT.md` covering: target algorithm, implementations tested, baseline results, escape analysis, new vectors generated, after results, before/after delta, and conclusions. See references/report-template.md for the full template.
Quality Checklist
Before delivering:
- At least one pure implementation mutation-tested (not just FFI wrappers)
- Baseline run completed with existing vectors
- Trailmark call graph built for each implementation
- All escaped mutants triaged using graph-informed classification
- Cross-package false positives identified and documented
- Security-critical mutations (ct_eq, validation, auth) prioritized as P0/P1
- Fault simulation and mutation-derived vectors cross-verified against 2+ implementations
- After run completed with new vectors included
- Before/after delta computed and explained
- Report written to `VECTOR_FORGE_REPORT.md`
- New test vectors saved in standard format (Wycheproof JSON)
Integration
| Skill | Relationship |
|---|---|
| genotoxic (required for Phase 4) | Provides graph-informed triage — call graph cuts actionable mutants by 30-50% |
| mutation-testing (mewt/muton) | Use for Solidity; Vector Forge is language-agnostic |
| property-based-testing | Better than hand-crafted vectors for bitwise mutations in field arithmetic |
| testing-handbook-skills (fuzzing) | Functions with CC > 10 and surviving mutants need both vectors and fuzz harnesses |
Supporting Documentation
- references/mutation-frameworks.md - Language-specific mutation testing framework setup
- references/vector-patterns.md - Common test vector patterns for cryptographic primitives
- references/fault-simulation.md - Limb-width reimplementation for carry, reduction, and overflow faults
- references/report-template.md - Full markdown template for the Vector Forge report
- references/lessons-learned.md - BLS12-381 case study: FFI kill rates, timeout masking, cross-package false positives, bitwise mutation gaps, and security-critical priorities