vector-forge

Vector Forge

Uses mutation testing to systematically identify gaps in test vector coverage, then generates new test vectors that close those gaps. Measures effectiveness by comparing mutation kill rates before and after.

When to Use

Generating test vectors for cryptographic algorithms or protocols
Evaluating how well existing test vectors cover an implementation
Finding implementation code paths that no test vector exercises
Creating Wycheproof-style cross-implementation test vectors
Measuring the concrete coverage value of a test vector suite

When NOT to Use

No implementations exist yet (need code to mutate)
Single trivial implementation with no edge cases
Testing application logic rather than algorithm implementations
The algorithm has no public test vectors to compare against

Prerequisites

trailmark installed. If `uv run trailmark` fails, run:
```bash
uv pip install trailmark
```
At least one implementation of the target algorithm in a language with mutation testing support
A test harness that consumes test vectors and exercises the implementation
A mutation testing framework for the target language

Rationalizations to Reject

Rationalization	Why It's Wrong	Required Action
"We have enough test vectors"	Mutation testing proves otherwise	Run the baseline first
"The implementation's own tests are sufficient"	Own tests often share blind spots with the impl	Cross-impl vectors catch different bugs
"FFI crates can be mutation tested at the binding layer"	Mutations to wrappers don't affect the underlying impl	Mutate the actual implementation language
"Timeouts mean the mutation was caught"	Timeouts are ambiguous — could be killed or alive	Resolve timeouts before drawing conclusions
"All mutants are equivalent"	Most aren't — verify by reading the mutation	Classify each escaped mutant individually
"Checking valid vectors is enough"	Permissive mutations survive without negative assertions	Assert rejection for every invalid vector
"Manual analysis is fine"	Manual analysis misses what tooling catches	Install and run the tools

Workflow Overview

Phase 1: Discovery       → Find implementations to test
      ↓
Phase 2: Harness         → Write/adapt test vector harness for each impl
      ↓
Phase 3: Baseline        → Run mutation testing with existing vectors
      ↓
Phase 4: Escape Analysis → Classify escaped mutants by code path
      ↓
Phase 5: Vector Gen      → Create test vectors targeting escapes
      ↓
Phase 6: Validation      → Re-run mutation testing, compare before/after
      ↓
Output: Coverage Report + New Test Vectors

Phase 1: Discovery

Find implementations of the target algorithm. Look for:

Pure implementations in high-level languages (Go, Rust, Python) — these are the best mutation testing targets
FFI wrapper crates — identify these early so you don't waste time mutating wrapper glue code
Reference implementations — useful for cross-verification but may not be the best mutation targets

For each implementation, note:

Language and mutation testing framework
Whether it's pure code or FFI wrappers
Existing test suite size and coverage
Which API surface the test vectors will exercise

Implementation Type Classification

Type	Mutation Value	Example
Pure implementation	High	zkcrypto/bls12_381 (Rust), gnark-crypto (Go)
FFI bindings to C/asm	Low at binding layer	blst Rust crate
C/C++ implementation	High (use Mull)	blst C library
Generated code	Medium (mutations may be equivalent)	gnark-crypto generated field arithmetic

Key insight: If an implementation delegates to another language via FFI, you must mutate the underlying implementation, not the bindings. For C/C++ underneath Rust/Go/Python, use Mull or similar.

Phase 2: Harness

For each implementation, create a test harness that:

Reads test vectors from JSON files (Wycheproof format recommended)
Exercises the implementation's API for each vector
Asserts both acceptance and rejection:
- Valid vectors: deserialization succeeds, output matches expected
- Invalid vectors: deserialization fails or verification rejects
Adds roundtrip assertions for valid deserialization vectors:
```
serialize(deserialize(bytes)) == bytes
```
Reports pass/fail per vector with test IDs

Critical: A harness that only checks valid vectors will miss all permissive mutations (e.g., an operator swap that loosens a check in validation). See references/lessons-learned.md §7.

The harness must be runnable by the mutation testing framework. For most frameworks this means:

Go: A `_test.go` file in the same package as the implementation
Rust: An integration test in `tests/` or inline `#[test]` functions
Python: A pytest test file
C/C++: A test binary linked against the implementation
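
The harness requirements above can be sketched in pytest style. This is a minimal illustration, not a real harness: the toy codec (`deserialize`/`serialize` for a 2-byte integer below a made-up `MODULUS`), the `encoded` and `value` field names, and the inline `VECTORS` document (standing in for a JSON file on disk) are all assumptions introduced for the example.

```python
import json

# Hypothetical toy target: a 2-byte big-endian integer that must be < MODULUS.
MODULUS = 65521

def deserialize(data: bytes) -> int:
    if len(data) != 2:
        raise ValueError("wrong length")
    value = int.from_bytes(data, "big")
    if value >= MODULUS:
        raise ValueError("value out of range")
    return value

def serialize(value: int) -> bytes:
    return value.to_bytes(2, "big")

# Stand-in for a Wycheproof-style JSON file (normally read from disk).
VECTORS = json.loads("""
{"algorithm": "toy-codec", "testGroups": [{"tests": [
  {"tcId": 1, "comment": "zero", "encoded": "0000", "value": 0, "result": "valid"},
  {"tcId": 2, "comment": "modulus - 1", "encoded": "fff0", "value": 65520, "result": "valid"},
  {"tcId": 3, "comment": "equals modulus", "encoded": "fff1", "result": "invalid"},
  {"tcId": 4, "comment": "truncated", "encoded": "00", "result": "invalid"}
]}]}
""")

def run_vector(test: dict) -> bool:
    """Return True when the implementation behaves as the vector expects."""
    data = bytes.fromhex(test["encoded"])
    try:
        value = deserialize(data)
    except ValueError:
        return test["result"] == "invalid"  # rejection must be expected
    if test["result"] == "invalid":
        return False  # the implementation accepted an invalid vector
    # Valid vector: check the decoded value and the roundtrip.
    return value == test["value"] and serialize(value) == data

def run_all(doc: dict) -> dict:
    """Map each tcId to pass/fail so failures can be reported per vector."""
    return {t["tcId"]: run_vector(t)
            for group in doc["testGroups"] for t in group["tests"]}

def test_vectors():
    failed = [tc for tc, ok in run_all(VECTORS).items() if not ok]
    assert not failed, f"failing tcIds: {failed}"
```

Because invalid vectors fail the test when they are accepted, a mutant that loosens the range or length check is killed rather than silently surviving.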

Harness Placement

The harness must live inside the implementation's package so the mutation framework can see it. This usually means:

```bash
# Go: add test file to the package being mutated
cp wycheproof_test.go /path/to/impl/package/

# Rust: add integration test
cp wycheproof.rs /path/to/crate/tests/

# Python: add test to the test directory
cp test_wycheproof.py /path/to/package/tests/
```

Handling Existing Vectors

If the implementation already has test vectors:

1. Run mutation testing with ONLY the existing vectors (baseline)
2. Run mutation testing with ONLY your new vectors
3. Run mutation testing with BOTH combined

The delta between (1) and (3) shows the new vectors' value.

Phase 3: Baseline

Run mutation testing with existing test vectors only.

Framework Selection

See references/mutation-frameworks.md for language-specific setup.

Language	Framework	Command
Go	gremlins	`gremlins unleash ./path/to/package`
Rust	cargo-mutants	`cargo mutants -j N --timeout T`
Python	mutmut	`mutmut run --paths-to-mutate src/`
C/C++	Mull	`mull-runner -test-framework=GoogleTest binary`

Parallelism

Always use parallel execution for large codebases, and tune timeouts and the test runner so slow mutants do not stall the run:

`cargo mutants -j 8` (Rust, 8 parallel workers)
`gremlins unleash --timeout-coefficient 3` (Go, increase timeouts)
`mutmut run --runner "pytest -x -q"` (Python, fail-fast)

Recording Baseline Results

Capture these metrics per implementation:

Metric	Description
Total mutants	Number of mutations generated
Killed	Mutants caught by tests
Survived/Lived	Mutants NOT caught (these are the targets)
Not covered	Code paths no test reaches at all
Timed out	Ambiguous — resolve before comparing
Efficacy %	Killed / (Killed + Survived)
Coverage %	(Total - Not covered) / Total

Save the full mutation log for Phase 4 analysis.
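
The two derived metrics in the table can be computed as in the sketch below. The `MutationCounts` class and the sample numbers are illustrative assumptions, not output from any real run or tool.

```python
from dataclasses import dataclass

@dataclass
class MutationCounts:
    """Hypothetical container for one implementation's raw mutation counts."""
    total: int
    killed: int
    survived: int
    not_covered: int
    timed_out: int

    def efficacy_pct(self) -> float:
        # Efficacy % = Killed / (Killed + Survived); timeouts are excluded
        # because they are ambiguous until resolved.
        judged = self.killed + self.survived
        return 100.0 * self.killed / judged if judged else 0.0

    def coverage_pct(self) -> float:
        # Coverage % = (Total - Not covered) / Total
        if not self.total:
            return 0.0
        return 100.0 * (self.total - self.not_covered) / self.total

# Made-up numbers, for illustration only.
baseline = MutationCounts(total=200, killed=120, survived=40,
                          not_covered=30, timed_out=10)
print(f"efficacy {baseline.efficacy_pct():.1f}%  "
      f"coverage {baseline.coverage_pct():.1f}%")
```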

Phase 4: Escape Analysis (Graph-Informed Triage)

Classify each escaped (survived + not covered) mutant using the Trailmark call graph for reachability and blast radius analysis.

This phase MUST use the genotoxic skill's triage methodology. The call graph transforms mutation results from a flat list of survived mutants into an actionable, prioritized set of vector targets.

Step 1: Build the Call Graph

Build a Trailmark code graph for each implementation before triaging mutations:

```bash
# Go
uv run trailmark analyze --language go --summary {targetDir}

# Rust
uv run trailmark analyze --language rust --summary {targetDir}
```

The graph provides:
- **Caller chains**: trace from public API entry points to mutated functions to determine reachability
- **Cyclomatic complexity**: prioritize high-CC functions
- **Blast radius**: functions with many callers have wider impact if their mutations survive

Step 2: Filter to Relevant Code

Mutation frameworks test the entire package. Filter results to only the files/functions that test vectors should exercise:

```bash
# Go (gremlins)
grep -E "(LIVED|NOT COVERED)" baseline.log \
  | grep -E " at (relevant|files)" \
  | sort

# Rust (cargo-mutants)
grep "src/relevant" mutants.out/missed.txt
```

Step 3: Graph-Informed Classification

For each escaped mutant, map it to its containing function in the call graph and apply the genotoxic triage criteria:

Graph Signal	Classification	Action
No callers in graph	False Positive	Dead code, skip
Only test callers	False Positive	Test infrastructure
Logging/display/formatting	False Positive	Cosmetic
Cross-package callers but NOT COVERED	Cross-Package Gap	See below
Reachable from public API, low CC	Missing Vector	Design targeted vector
Reachable from public API, high CC (>10)	Fuzzing Target	Both vector + fuzz harness
Validation/error-handling path	Negative Vector	Craft invalid input that triggers path
Optimization path (GLV, SIMD, batch)	Edge-Case Vector	Input that triggers optimization threshold
`|` → `^` after left shift (e.g. `(t<<1) | carry`)	Equivalent Mutant	Skip: bit 0 is always 0 after the shift, so OR and XOR agree
ct_eq `&` → `|` on Montgomery limbs	API-Unreachable	Needs library-internal tests, not vectors
Equivalent mutation (behavior unchanged)	False Positive	Skip
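
A few rows of the table can be expressed as a small decision function. This is a simplified sketch covering only some of the signals; the `MutantContext` fields and the order in which rules are checked are assumptions made for illustration, not the genotoxic skill's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class MutantContext:
    """Hypothetical summary of what the call graph says about one mutant."""
    callers: int                 # callers of the containing function
    test_only_callers: bool      # every caller is test code
    cosmetic: bool               # logging/display/formatting code
    reachable_from_api: bool     # a public entry point reaches the function
    cyclomatic_complexity: int
    validation_path: bool = False

def classify(m: MutantContext) -> str:
    # Dead code, test infrastructure, and cosmetic code are false positives.
    if m.callers == 0 or m.test_only_callers or m.cosmetic:
        return "False Positive"
    # Validation and error-handling paths need crafted invalid input.
    if m.validation_path:
        return "Negative Vector"
    if m.reachable_from_api:
        # High-CC functions get both a vector and a fuzz harness.
        if m.cyclomatic_complexity > 10:
            return "Fuzzing Target"
        return "Missing Vector"
    return "Needs manual review"

print(classify(MutantContext(callers=4, test_only_callers=False,
                             cosmetic=False, reachable_from_api=True,
                             cyclomatic_complexity=14)))
```

Equivalent mutants and cross-package gaps are deliberately left out here, since both require reading the mutation or the test layout rather than graph signals alone.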

Step 4: Identify Cross-Package Test Gaps

Critical pitfall: Mutation frameworks often only run tests within the same package as the mutation. For Go (gremlins) and Rust (cargo-mutants), this means:

A mutation in `hash_to_curve/g2.go` only runs tests in the `hash_to_curve` package, NOT tests in the parent `bls12381` package that imports it
Functions that are fully exercised by cross-package tests will appear as NOT COVERED — these are false positives
To confirm: check if the mutated function is called from a test in a different package that wouldn't be run

To resolve cross-package gaps:

Add a thin test in the sub-package that calls through the same code path as the cross-package test
Or run gremlins with `--test-pkg ./...` (if supported)
Or document as a framework limitation in the report

Step 5: Prioritize by Security Impact

Using the call graph, rank surviving mutants by impact:

Priority	Criteria	Example
P0 — Critical	Mutant weakens validation/equality/authentication	`ct_eq`: `&` → `|` makes equality permissive
P1 — High	Mutant in deserialization flag parsing	`from_compressed`: `&` → `|` accepts invalid flags
P2 — Medium	Mutant in field arithmetic internals	`Fp::square`: `|` → `^` corrupts computation
P3 — Low	Mutant in optimization path	`phi` endomorphism: only affects performance path
Skip	Formatting, display, equivalent mutation	`Debug::fmt` return value replacement

Step 6: Group by Vector Strategy

Group escaped mutants by the code path they represent and the type of test vector needed:

Deserialization flag validation (P1):
  - g1.rs:339,363-365,384 — from_compressed_unchecked flags
  → Need: valid-point-wrong-flag vectors

Field arithmetic (P2):
  - fp.rs:371-376,406,635-643 — subtract_p, neg, square
  → Need: field arithmetic KATs with edge-case values

Optimization thresholds (P3):
  - g1.go:68, g2.go:75 — GLV vs windowed multiplication
  → Need: scalar multiplication with large scalars

Cross-package (framework limitation):
  - hash_to_curve/g2.go:242-278 — isogeny, sgn0
  → Document as false positive or add sub-package test

Each group becomes a target for new test vectors in Phase 5.

Phase 5: Vector Generation

For each escaped code path group, design test vectors that force execution through that path.

Vector Design Patterns

Code Path Type	Vector Strategy
Point deserialization	Malformed points: wrong length, invalid field elements, off-curve, wrong subgroup, identity point
Signature verification	Valid sig + all single-bit corruptions of sig, pk, msg
Hash-to-curve	Known answer tests (KATs) with edge-case inputs: empty, single byte, max length
Aggregate operations	1 signer, many signers, duplicate signers, mixed valid/invalid
Error handling	Every error path should have a vector that triggers it
Arithmetic edge cases	Zero, one, field modulus - 1, points at infinity
Serialization flags	Every valid flag combination + every invalid flag combination
Roundtrip integrity	For every valid deser vector, assert `serialize(deserialize(b)) == b`
Carry/reduction faults	Reimplement at reduced limb widths, inject faults, extract distinguishing inputs
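
As one example, the "all single-bit corruptions" strategy from the signature verification row can be generated mechanically. The sketch below is generic byte manipulation; the two-byte `sig` value is a placeholder, not a real signature.

```python
def single_bit_corruptions(data: bytes):
    """Yield every copy of `data` that differs from it in exactly one bit."""
    for byte_index in range(len(data)):
        for bit in range(8):
            corrupted = bytearray(data)
            corrupted[byte_index] ^= 1 << bit  # flip one bit
            yield bytes(corrupted)

# Placeholder input; a real run would use a valid signature, key, or message.
sig = bytes.fromhex("a1b2")
variants = list(single_bit_corruptions(sig))
print(len(variants), "corrupted variants, each expected to be rejected")
```

Each variant becomes an invalid vector, and the same generator applies unchanged to the public key and the message.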

Single-Fault Negative Vectors

Each negative vector should have exactly one defect with everything else valid — this isolates which validation check is being tested. See references/vector-patterns.md for per-flag construction examples.

Fault Simulation (Limb-Width Reimplementation)

When mutation testing only applies local operator swaps, deeper architectural bugs (carry propagation, reduction overflow) go untested. To close this gap, reimplement the target algorithm at reduced limb widths (8, 16, 25, 32 bits) and deliberately inject faults — then generate vectors that catch them.

See references/fault-simulation.md for the full methodology: limb-width selection, fault injection catalog, vector extraction, and validation workflow.
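
To make the idea concrete, here is a toy sketch under stated assumptions: plain addition modulo 2^16 stands in for the target algorithm, it is reimplemented with two 8-bit limbs, and the injected fault drops the carry out of one limb. The function names and the candidate value list are illustrative, not taken from references/fault-simulation.md.

```python
from typing import Optional

LIMB_BITS = 8
LIMBS = 2
MASK = (1 << LIMB_BITS) - 1

def to_limbs(x: int) -> list:
    """Split x into little-endian 8-bit limbs."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(LIMBS)]

def from_limbs(limbs: list) -> int:
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))

def add_limbs(a: int, b: int, drop_carry_at: Optional[int] = None) -> int:
    """Schoolbook addition mod 2**16; optionally drop the carry out of one limb."""
    out, carry = [], 0
    for i, (x, y) in enumerate(zip(to_limbs(a), to_limbs(b))):
        s = x + y + carry
        out.append(s & MASK)
        # Injected fault: forget the carry leaving limb `drop_carry_at`.
        carry = 0 if i == drop_carry_at else s >> LIMB_BITS
    return from_limbs(out)

def distinguishing_inputs(candidates, fault_limb: int) -> list:
    """Keep the inputs on which the faulty and correct versions disagree."""
    return [(a, b) for a, b in candidates
            if add_limbs(a, b) != add_limbs(a, b, drop_carry_at=fault_limb)]

edge_values = [0, 1, 0x00FF, 0x0100, 0xFFFF]
pairs = [(a, b) for a in edge_values for b in edge_values]
hits = distinguishing_inputs(pairs, fault_limb=0)
print(f"{len(hits)} of {len(pairs)} candidate pairs expose the dropped carry")
```

The surviving pairs are the useful vectors: each one forces a carry across the limb boundary, which a local operator-swap mutation would never be required to exercise.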

Cross-Implementation Verification

Every new test vector MUST be verified against at least two independent implementations before being added to the suite:

Generate the vector using implementation A
Verify with implementation B (different codebase, ideally different language)
If B disagrees, investigate — one implementation has a bug
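
The generate-then-verify loop might look like the following sketch. Modular exponentiation is used purely as a stand-in operation, with Python's built-in `pow` as implementation A and a hand-written square-and-multiply as implementation B; `cross_verify` and the sample inputs are hypothetical.

```python
def modexp_builtin(base: int, exp: int, mod: int) -> int:
    """Implementation A: Python's built-in three-argument pow."""
    return pow(base, exp, mod)

def modexp_square_multiply(base: int, exp: int, mod: int) -> int:
    """Implementation B: independent right-to-left square-and-multiply."""
    result = 1 % mod
    base %= mod
    while exp:
        if exp & 1:
            result = result * base % mod
        base = base * base % mod
        exp >>= 1
    return result

def cross_verify(inputs, impl_a, impl_b):
    """Generate expected outputs with A and keep only those B agrees with."""
    accepted, disputed = [], []
    for args in inputs:
        expected = impl_a(*args)
        if impl_b(*args) == expected:
            accepted.append({"input": args, "expected": expected})
        else:
            disputed.append(args)  # investigate: one implementation has a bug
    return accepted, disputed

inputs = [(2, 10, 1000), (0, 0, 7), (5, 0, 1), (3, 65537, 2**61 - 1)]
accepted, disputed = cross_verify(inputs, modexp_builtin, modexp_square_multiply)
print(f"{len(accepted)} vectors accepted, {len(disputed)} disputed")
```

Disputed inputs are kept rather than discarded, since a disagreement is itself a finding worth reporting.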

Vector Format

Use Wycheproof JSON format (`algorithm`, `testGroups[].tests[]` with `tcId`, `comment`, `result`, `flags`). See references/vector-patterns.md for the full schema.

JSON encoding: Wycheproof canonicalizes vectors with `reformat_json.py`, which unescapes HTML entities. Generate vectors with literal characters, not HTML-escaped sequences:

Go: Use `json.NewEncoder` with `enc.SetEscapeHTML(false)`; never `json.Marshal` or `json.MarshalIndent`, which silently escape `>` → `\u003e`, `<` → `\u003c`, `&` → `\u0026`
Python: `json.dumps` is safe by default
Node.js: `JSON.stringify` is safe by default

See references/lessons-learned.md §14 for details.
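
A short Python sketch of emitting a vector file with literal characters follows. Only the field names listed above come from this document; the algorithm name, comment text, and flag value are made up, and a real Wycheproof file carries additional fields described in references/vector-patterns.md.

```python
import json

# Illustrative document using only the fields named in this section.
doc = {
    "algorithm": "toy-codec",
    "testGroups": [{
        "tests": [{
            "tcId": 1,
            "comment": "length < 2 & flag > 0",
            "result": "invalid",
            "flags": ["Truncated"],
        }],
    }],
}

# json.dumps leaves <, > and & as literal characters by default.
encoded = json.dumps(doc, indent=2)
print(encoded)
```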

Phase 6: Validation

Re-run mutation testing with the new test vectors included.

Tip: Use per-file mutation testing for fast iteration during vector development (see references/lessons-learned.md §12). Only run full-crate tests for the final comparison.

Before/After Comparison

Metric	Baseline	With New Vectors	Delta
Killed	X	Y	Y - X
Survived	A	B	A - B (should decrease)
Not Covered	C	D	C - D (should decrease)
Efficacy %	E%	F%	F - E
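
The delta column can be filled in with a few lines of arithmetic. In this sketch the `compare` helper, the dictionary keys, and all counts are assumptions for illustration; the sign conventions follow the table, so a positive number always means improvement.

```python
def efficacy_pct(counts: dict) -> float:
    """Killed / (Killed + Survived), as a percentage."""
    judged = counts["killed"] + counts["survived"]
    return 100.0 * counts["killed"] / judged if judged else 0.0

def compare(baseline: dict, after: dict) -> dict:
    """Return deltas oriented so that positive values mean improvement."""
    return {
        "killed": after["killed"] - baseline["killed"],                 # Y - X
        "survived": baseline["survived"] - after["survived"],           # A - B
        "not_covered": baseline["not_covered"] - after["not_covered"],  # C - D
        "efficacy_pct": efficacy_pct(after) - efficacy_pct(baseline),   # F - E
    }

# Made-up counts, for illustration only.
baseline = {"killed": 120, "survived": 40, "not_covered": 30}
after = {"killed": 150, "survived": 25, "not_covered": 15}
delta = compare(baseline, after)
print(delta)
```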

Success Criteria

Vectors have both retroactive value (killing mutants in existing code) and proactive value (catching bugs in future implementations). Generate both kinds — boundary-condition vectors may not improve kill rates in mature libraries but will catch bugs in new implementations. See references/lessons-learned.md §13.

Retroactive (measurable): previously survived/uncovered mutants become killed, no regressions.

If kill rates don't change: the implementation's own tests likely already cover those paths. The vectors still add cross-implementation verification value. Document which case applies.

Output Format

Write `VECTOR_FORGE_REPORT.md` covering: target algorithm, implementations tested, baseline results, escape analysis, new vectors generated, after results, before/after delta, and conclusions. See references/report-template.md for the full template.

Quality Checklist

Before delivering:

At least one pure implementation mutation-tested (not just FFI wrappers)
Baseline run completed with existing vectors
Trailmark call graph built for each implementation
All escaped mutants triaged using graph-informed classification
Cross-package false positives identified and documented
Security-critical mutations (ct_eq, validation, auth) prioritized as P0/P1
Fault simulation and mutation-derived vectors cross-verified against 2+ implementations
After run completed with new vectors included
Before/after delta computed and explained
Report written to `VECTOR_FORGE_REPORT.md`
New test vectors saved in standard format (Wycheproof JSON)

Integration

Skill	Relationship
genotoxic (required for Phase 4)	Provides graph-informed triage — call graph cuts actionable mutants by 30-50%
mutation-testing (mewt/muton)	Use for Solidity; Vector Forge is language-agnostic
property-based-testing	Better than hand-crafted vectors for bitwise mutations in field arithmetic
testing-handbook-skills (fuzzing)	Functions with CC > 10 and surviving mutants need both vectors and fuzz harnesses

Supporting Documentation

references/mutation-frameworks.md - Language-specific mutation testing framework setup
references/vector-patterns.md - Common test vector patterns for cryptographic primitives
references/fault-simulation.md - Limb-width reimplementation for carry, reduction, and overflow faults
references/report-template.md - Full markdown template for the Vector Forge report
references/lessons-learned.md - BLS12-381 case study: FFI kill rates, timeout masking, cross-package false positives, bitwise mutation gaps, and security-critical priorities
