grace-verification
Design verification that autonomous agents can trust: deterministic where possible, observable and traceable where equality checks alone are not enough.
Prerequisites
- The target module or workflow must already have a contract
- Read the relevant `MODULE_CONTRACT`, function contracts, and semantic blocks first
- If no contract exists yet, route through `$grace-plan` or `$grace-generate` before building verification
Goal
Verification in GRACE is not just "did the final value match?"
It must answer:
- did the system produce the correct result?
- did it follow an acceptable execution path?
- can another agent debug the failure from the evidence left behind?
Use contracts for expected behavior, semantic blocks for traceability, and tests/logs for evidence.
Process
Step 1: Derive Verification Targets from Contracts
Read the module and function contracts. Extract:
- success scenarios
- failure scenarios
- critical invariants
- side effects
- forbidden behaviors
Turn these into a verification matrix before writing or revising tests.
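As a minimal sketch (Python, chosen only for illustration), the five extracted categories can be flattened into one matrix row per verification target. The dict shape of the contract, the `MatrixRow` type, and the order-creation scenarios are all hypothetical stand-ins for whatever form the real `MODULE_CONTRACT` and function contracts take.

```python
from dataclasses import dataclass

# The five target categories from Step 1, in a fixed order so the matrix is stable.
CATEGORIES = (
    "success_scenarios",
    "failure_scenarios",
    "critical_invariants",
    "side_effects",
    "forbidden_behaviors",
)


@dataclass(frozen=True)
class MatrixRow:
    category: str
    target: str
    evidence: str = "undecided"  # chosen later, when the matrix is built out in Step 3


def build_matrix(contract: dict[str, list[str]]) -> list[MatrixRow]:
    """Flatten a (hypothetical) contract dict into one row per verification target."""
    return [
        MatrixRow(category, target)
        for category in CATEGORIES
        for target in contract.get(category, [])
    ]


def missing_categories(contract: dict[str, list[str]]) -> list[str]:
    """Categories with no targets: each is either a genuine gap or a contract to revisit."""
    return [c for c in CATEGORIES if not contract.get(c)]


# Hypothetical contract for an order-creation function.
example_contract = {
    "success_scenarios": ["valid order is persisted and its id returned"],
    "failure_scenarios": ["duplicate order id is rejected without writing"],
    "side_effects": ["exactly one OrderCreated event is emitted"],
}
matrix = build_matrix(example_contract)
```

Flagging empty categories is a deliberate choice: a contract that lists no invariants or forbidden behaviors may simply be incomplete, and it is cheaper to notice that before tests are written.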
Step 2: Design Observability
For each critical path, define the minimum telemetry needed to debug and verify it.
At a minimum:
- important logs must reference `[ModuleName][functionName][BLOCK_NAME]`
- each critical branch should be visible in the trace
- side effects should be logged at a high-signal level
- secrets, credentials, and sensitive payloads must be redacted or omitted
Prefer stable structured logs or stable key fields over prose-heavy log lines.
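One way to meet these requirements is a small helper that prefixes every record with its block marker, writes the remaining fields as sorted JSON keys, and redacts known-sensitive keys before anything is emitted. This sketch uses only the standard `logging` and `json` modules. The helper name `log_block`, the field names, and the `SENSITIVE_KEYS` set are assumptions to adapt to the project's own conventions.

```python
import json
import logging

# Assumed set of keys whose values must never reach the logs.
SENSITIVE_KEYS = {"password", "token", "api_key", "secret", "authorization"}


def redact(fields: dict) -> dict:
    """Replace values of sensitive top-level keys (shallow: nested dicts are not scanned)."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in fields.items()
    }


def log_block(logger: logging.Logger, module: str, function: str, block: str,
              event: str, **fields) -> dict:
    """Emit one structured line tagged with [ModuleName][functionName][BLOCK_NAME]."""
    marker = f"[{module}][{function}][{block}]"
    # marker and event are written last so caller fields cannot overwrite them.
    record = {**redact(fields), "marker": marker, "event": event}
    # Marker first for grep-ability, then stable sorted JSON keys for assertions.
    logger.info("%s %s", marker, json.dumps(record, sort_keys=True, default=str))
    return record


logger = logging.getLogger("orders")
record = log_block(logger, "OrderService", "createOrder", "PERSIST_ORDER",
                   "side_effect", order_id="o-123", api_key="sk-live-abc")
```

Tests can then assert on `record["event"]` or `record["order_id"]` instead of on message wording, and the redaction runs before the line is formatted, so the secret never reaches a handler.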
Step 3: Build the Verification Matrix
For each scenario, decide which evidence type to use:
- Deterministic assertions for stable outputs, return values, state transitions, and exact invariants
- Trace assertions for required execution paths, branch decisions, retries, and failure handling
- Integration or smoke checks for end-to-end viability
- Semantic evaluation of traces only when domain correctness cannot be expressed reliably with exact asserts alone
If an exact assert works, use it. Do not replace strong deterministic checks with fuzzy evaluation.
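The priority rule can be written down so the matrix records it explicitly. In this hedged sketch, the three boolean flags are hypothetical per-scenario judgments made while filling in the matrix. The only point it encodes is that semantic evaluation is added when, and only when, no exact assertion is available.

```python
from enum import Enum


class Evidence(str, Enum):
    DETERMINISTIC = "deterministic assertion"
    TRACE = "trace assertion"
    INTEGRATION = "integration/smoke check"
    SEMANTIC = "semantic trace evaluation"


def choose_evidence(*, exact_assert_possible: bool, path_matters: bool,
                    crosses_modules: bool) -> list[Evidence]:
    """Pick evidence types for one scenario, strongest first."""
    chosen = []
    if exact_assert_possible:
        chosen.append(Evidence.DETERMINISTIC)
    if path_matters:  # branch decisions, retries, failure handling
        chosen.append(Evidence.TRACE)
    if crosses_modules:  # end-to-end viability
        chosen.append(Evidence.INTEGRATION)
    if not exact_assert_possible:
        # Fallback only: never added alongside a working exact assert.
        chosen.append(Evidence.SEMANTIC)
    return chosen
```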
Step 4: Implement AI-Friendly Tests
Write tests and harnesses that:
- execute the scenario
- collect the relevant trace, logs, or telemetry
- verify both:
  - outcome correctness
  - trajectory correctness
Typical trace checks:
- required block markers appeared
- forbidden block markers did not appear
- events occurred in the expected order
- retries stayed within allowed bounds
- failure mode matched the contract
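A harness can collect block markers through a logging handler and then check the trajectory separately from the outcome. The sketch below assumes log messages contain markers in the `[ModuleName][functionName][BLOCK_NAME]` shape. `TraceCollector`, `check_trace`, and the `[Orders][create][...]` markers are hypothetical names, and "failure mode matched the contract" is approximated by requiring a specific terminal block.

```python
import logging
import re
from collections import Counter

# Matches [ModuleName][functionName][BLOCK_NAME] anywhere in a log message.
MARKER_RE = re.compile(r"\[[^\]]+\]\[[^\]]+\]\[[^\]]+\]")


class TraceCollector(logging.Handler):
    """Logging handler that records the block marker of every matching record, in order."""

    def __init__(self):
        super().__init__()
        self.markers = []

    def emit(self, record):
        match = MARKER_RE.search(record.getMessage())
        if match:
            self.markers.append(match.group(0))


def check_trace(markers, *, required=(), forbidden=(), ordered=(),
                retry_marker=None, max_retries=0, terminal=None):
    """Return trajectory violations as strings; an empty list means the trace passed."""
    violations = []
    seen = set(markers)
    violations += [f"missing required block {m}" for m in required if m not in seen]
    violations += [f"forbidden block appeared: {m}" for m in forbidden if m in seen]
    # `ordered` must appear as a subsequence: each any() resumes where the last stopped.
    remaining = iter(markers)
    for m in ordered:
        if not any(x == m for x in remaining):
            violations.append(f"expected order broken at {m}")
            break
    if retry_marker is not None:
        retries = Counter(markers)[retry_marker]
        if retries > max_retries:
            violations.append(f"{retries} retries exceed the bound of {max_retries}")
    if terminal is not None and (not markers or markers[-1] != terminal):
        violations.append(f"trace did not end in {terminal}")
    return violations
```

Returning a list of violations, rather than failing on the first one, gives the failure packet in Step 7 everything that went wrong in a single run. Outcome checks on return values and persisted state remain ordinary deterministic asserts alongside these.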
Step 5: Use Semantic Verification Carefully
When strict equality is too weak or too brittle, use bounded semantic checks.
Allowed pattern:
- provide the evaluator with:
  - the contract
  - the scenario description
  - the observed trace or structured logs
  - an explicit rubric
- ask whether the evidence satisfies the contract and why
Disallowed pattern:
- asking a model to "judge if this feels correct"
- using raw hidden reasoning as evidence
- relying on unconstrained free-form log dumps without a rubric
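The sketch below deliberately stops short of calling any model. It assembles the four required inputs into one request, refuses to build a request without a rubric or trace evidence, and rejects evaluator output that is not a structured verdict with a reason. `EvalRequest`, `parse_verdict`, the prompt wording, and the JSON verdict shape are assumptions, not part of GRACE.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRequest:
    """Everything the allowed pattern requires; construction fails if a piece is missing."""
    contract: str
    scenario: str
    trace: list   # structured log records, e.g. dicts with "marker" and "event" keys
    rubric: list  # explicit, checkable criteria

    def __post_init__(self):
        if not self.rubric:
            raise ValueError("semantic evaluation requires an explicit rubric")
        if not self.trace:
            raise ValueError("semantic evaluation requires observed trace evidence")

    def to_prompt(self) -> str:
        criteria = "\n".join(f"{i}. {c}" for i, c in enumerate(self.rubric, start=1))
        return (
            f"CONTRACT:\n{self.contract}\n\n"
            f"SCENARIO:\n{self.scenario}\n\n"
            f"OBSERVED TRACE:\n{json.dumps(self.trace, indent=2, sort_keys=True)}\n\n"
            f"RUBRIC:\n{criteria}\n\n"
            "Does the evidence satisfy the contract under this rubric? Reply with JSON only: "
            '{"satisfied": true or false, "reason": "...", "failed_criteria": [numbers]}'
        )


def parse_verdict(raw: str) -> dict:
    """Accept only a structured verdict with a boolean result and a non-empty reason."""
    verdict = json.loads(raw)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(verdict, dict) or not isinstance(verdict.get("satisfied"), bool):
        raise ValueError("verdict must contain a boolean 'satisfied'")
    reason = verdict.get("reason")
    if not isinstance(reason, str) or not reason.strip():
        raise ValueError("verdict must explain why")
    return verdict
```

Requiring a reason means the verdict itself becomes evidence that another agent can audit against the rubric. A bare "looks fine" is rejected.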
Step 6: Apply Verification Levels
Match the verification depth to the execution stage.
- Module level: worker-local typecheck, lint, unit tests, deterministic assertions, and local trace checks
- Wave level: integration checks only for the merged surfaces touched in the wave
- Phase level: full suite, broad traceability checks, and final confidence checks before marking the phase done
Do not require full-repository verification after every clean module if the wave and phase gates already cover that risk.
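The level split can be expressed as a small planning function. The check names restate the list above. The integration map (integration check name mapped to the set of modules it exercises) is a hypothetical input that lets a wave gate run only the integration checks whose surfaces were touched.

```python
MODULE_CHECKS = ["typecheck", "lint", "unit tests",
                 "deterministic assertions", "local trace checks"]
PHASE_CHECKS = ["full suite", "broad traceability checks", "final confidence checks"]


def wave_integration_targets(touched_modules, integration_map):
    """Integration checks whose module set intersects the modules merged in this wave."""
    touched = set(touched_modules)
    return sorted(name for name, modules in integration_map.items() if modules & touched)


def plan_checks(level, *, touched_modules=(), integration_map=None):
    """Return the checks to run at 'module', 'wave', or 'phase' level."""
    if level == "module":
        return list(MODULE_CHECKS)
    if level == "wave":
        targets = wave_integration_targets(touched_modules, integration_map or {})
        return [f"integration:{name}" for name in targets]
    if level == "phase":
        return list(PHASE_CHECKS)
    raise ValueError(f"unknown verification level: {level!r}")


# Hypothetical map: integration check name -> set of modules it exercises.
integration_map = {
    "checkout_flow": {"orders", "payments"},
    "search_flow": {"catalog"},
    "refund_flow": {"payments"},
}
```

With this map, a wave that touches only `catalog` runs `search_flow` and nothing else, and the full suite waits for the phase gate.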
Step 7: Failure Triage
When verification fails, produce a concise failure packet:
- contract or scenario that failed
- expected evidence
- observed evidence
- first divergent module/function/block
- suggested next action
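A packet with these five fields can be a plain serializable record, so it travels intact between agents. This sketch assumes evidence is represented as ordered lists of block markers; real packets may also carry return values or state snapshots. `FailurePacket`, `find_first_divergence`, and `make_packet` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


def find_first_divergence(expected, observed) -> Optional[str]:
    """First block where the traces differ: the expected block at the first mismatch,
    the first missing expected block, or the first unexpected extra observed block."""
    for exp, obs in zip(expected, observed):
        if exp != obs:
            return exp
    if len(observed) < len(expected):
        return expected[len(observed)]
    if len(observed) > len(expected):
        return observed[len(expected)]
    return None


@dataclass
class FailurePacket:
    failed: str                      # contract clause or scenario that failed
    expected: list                   # expected evidence (block markers here)
    observed: list                   # observed evidence
    first_divergence: Optional[str]  # first divergent module/function/block
    next_action: str                 # suggested next step for the fixing agent

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


def make_packet(failed, expected, observed, next_action) -> FailurePacket:
    return FailurePacket(failed, list(expected), list(observed),
                         find_first_divergence(expected, observed), next_action)
```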
Use this packet to drive `$grace-fix` or to hand off the issue to another agent without losing context.
Verification Rules
- Deterministic assertions first, semantic trace evaluation second
- Logs are evidence, not decoration
- Every important log should map back to a semantic block
- Do not log chain-of-thought or hidden reasoning
- Do not assert on unstable wording if stable fields are available
- Prefer high-signal traces over verbose noise
- If verification is weak, improve observability before adding more agents
- Prefer module-level checks during worker execution and reserve broader suites for wave or phase gates
Deliverables
When using this skill, produce:
- a verification matrix
- the telemetry/logging requirements
- the tests or harness changes needed
- the recommended verification level split across module, wave, and phase
- a brief assessment of whether the module is safe for autonomous or multi-agent execution
When to Use It
- Before enabling autonomous execution for a module
- When multi-agent workflows need trustworthy checks
- When tests are too brittle or too shallow
- When bugs recur and logs are not actionable
- When business logic is hard to verify with plain equality asserts alone