verification-before-completion
Overview
The verification-before-completion skill is the terminal checkpoint for every task in the toolkit. It enforces a strict 5-step protocol: identify the verification commands, run them fresh, read their full output, confirm the results match the completion claim, and only then claim completion with evidence. Without this skill, agents make unverified claims that lead to broken code in production; with it, every completion claim is backed by evidence.
Iron Law
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.
You cannot say "it works," "it's done," "the bug is fixed," or "the feature is complete" without running verification commands and reading their output in this session. Cached results, previous runs, and assumptions do not count.
Phase 1: Identify Verification Commands
Before running anything, explicitly list what needs to pass for this specific task:
| Verification Type | Example Commands | When Required |
|---|---|---|
| Unit tests | | Always |
| Integration tests | | When applicable |
| Type checking | | When project uses type checking |
| Linting | | Always |
| Build | | Always |
| Format check | | When project uses formatters |
Action: List which of these apply to the current project. Not all projects have all types. Be explicit about what you will run and why.
STOP: Do NOT proceed to running commands until you have identified ALL applicable verification types.
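The example-command column above is empty in this copy, so the following is only a minimal sketch of how Phase 1 could be made explicit in code. The commands (`pytest`, `mypy`, `ruff`, `python -m build`, `black --check`) are assumptions for a hypothetical Python project, and `Check`, `CATALOG`, and `select_checks` are illustrative names, not part of the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str            # verification type from the Phase 1 table
    command: list[str]   # ASSUMED example command; substitute your project's own
    always: bool         # True when the table marks the type as "Always"


# Hypothetical catalog for a Python project (assumption, not from the source).
CATALOG = [
    Check("Unit tests", ["pytest"], always=True),
    Check("Integration tests", ["pytest", "tests/integration"], always=False),
    Check("Type checking", ["mypy", "."], always=False),
    Check("Linting", ["ruff", "check", "."], always=True),
    Check("Build", ["python", "-m", "build"], always=True),
    Check("Format check", ["black", "--check", "."], always=False),
]


def select_checks(project_uses: set[str]) -> list[Check]:
    """Return every 'Always' check plus the optional ones this project uses."""
    return [c for c in CATALOG if c.always or c.name in project_uses]


# Being explicit about what will run and why, before running anything.
for check in select_checks({"Type checking"}):
    reason = "always required" if check.always else "project uses it"
    print(f"{check.name}: {' '.join(check.command)} ({reason})")
```

Writing the list down first makes it harder to quietly drop a verification type later in the protocol.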
Phase 2: Run Commands Fresh
Execute every verification command identified in Phase 1.
| Rule | Rationale |
|---|---|
| Run AFTER the latest code change | Pre-change results are stale |
| Run the FULL suite, not a subset | Subset runs miss regressions |
| Do NOT rely on cached results | Code changed since last run |
| Do NOT skip commands because "they passed earlier" | Earlier is not now |
| If a command takes >5 minutes, note it and explain what was run instead | A reduced run must be disclosed, not presented as full verification |
STOP: All commands must complete before proceeding to Phase 3.
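One way to follow these rules is to run each command in a new process and keep its exit code, full output, and start time together. The sketch below assumes the commands are plain argument lists; `RunResult`, `run_fresh`, and `run_all` are hypothetical helper names, and the two demo commands merely stand in for real verification commands.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class RunResult:
    command: list[str]
    exit_code: int
    output: str          # stdout and stderr combined, in full
    started_at: float    # epoch seconds; lets you compare against the last code change
    duration_s: float


def run_fresh(command: list[str]) -> RunResult:
    """Run one command in a new process and capture everything it prints."""
    started = time.time()
    proc = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so nothing is lost
        text=True,
    )
    return RunResult(command, proc.returncode, proc.stdout, started, time.time() - started)


def run_all(commands: list[list[str]]) -> list[RunResult]:
    """Run every command to completion; a failure does not stop the others."""
    return [run_fresh(c) for c in commands]


# Stand-in commands for the demo: one succeeds, one exits with code 3.
demo = run_all([
    [sys.executable, "-c", "print('ok')"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
])
for r in demo:
    print(r.command[-1], "->", r.exit_code)
```

Nothing here reads from a cache or a previous run, which is the point: every result comes from a process started after the latest change.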
Phase 3: Read Full Output
Read the entire output of each verification command. Pay attention to:
| Output Element | What to Look For | Why It Matters |
|---|---|---|
| Exit code | Non-zero = failure | Even if output looks ok, non-zero means something failed |
| Test count | Expected number of tests ran? Any skipped? | Skipped tests = untested code |
| Warning messages | New warnings not present before | Warnings become errors; they indicate degradation |
| Deprecation notices | Deprecated API usage | Future breakage risk |
| Performance indicators | Unusually slow tests | May indicate performance regression |
| Error messages | Any error, even if tests "pass" | Some frameworks report errors alongside passing tests |
STOP: Do NOT proceed if you have not read the full output of every command.
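A script cannot replace reading the output, but it can flag lines that deserve attention. The following is a rough sketch that scans captured output for the elements in the table above; the keyword matching and the `N skipped` pattern are assumptions loosely modeled on pytest-style summaries, and `Findings` and `inspect_output` are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Findings:
    failed: bool                      # non-zero exit code
    skipped: int = 0                  # total from "N skipped" summaries
    warnings: list[str] = field(default_factory=list)
    deprecations: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def inspect_output(exit_code: int, output: str) -> Findings:
    """Flag lines worth a closer look; it supplements reading, never replaces it."""
    findings = Findings(failed=exit_code != 0)
    for line in output.splitlines():
        lower = line.lower()
        match = re.search(r"(\d+) skipped", lower)
        if match:
            findings.skipped += int(match.group(1))
        if "deprecat" in lower:
            findings.deprecations.append(line)
        elif "warning" in lower:
            findings.warnings.append(line)
        if "error" in lower:
            findings.errors.append(line)
    return findings


sample = (
    "tests/test_a.py ....s\n"
    "DeprecationWarning: foo() is deprecated\n"
    "4 passed, 1 skipped, 1 warning in 0.12s"
)
print(inspect_output(0, sample))
```

Note that the sample exits with code 0 and still produces findings, which is why the exit code alone is not enough.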
Phase 4: Verify Output Matches Claim
Ask yourself these questions. ALL must be "yes" to proceed:
| Question | If "No" or "Unsure" |
|---|---|
| Do ALL tests pass (not just the ones I wrote)? | Fix failing tests before claiming done |
| Does the build succeed without errors? | Fix build errors |
| Does the type checker find no errors? | Fix type errors |
| Does the linter pass? | Fix lint errors |
| Is the output free of NEW warnings that were not there before? | Investigate and fix or explicitly justify |
| Did the full suite run (not just a subset)? | Run the full suite |
| Is the test count what I expected? | Investigate skipped or missing tests |
STOP: If ANY answer is "no" or "unsure", go back to fix and restart from Phase 1. Do NOT proceed to Phase 5.
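The gate can be expressed as a small function in which anything other than an explicit "yes" blocks progress. This is a sketch only; the question keys and the `blocking_questions` helper are hypothetical names chosen for illustration, each phrased so that "yes" is the passing answer.

```python
# Hypothetical keys, one per Phase 4 question; "yes" always means the check is satisfied.
PHASE4_QUESTIONS = [
    "all_tests_pass",
    "build_succeeds",
    "type_check_clean",
    "lint_passes",
    "no_new_warnings",
    "full_suite_ran",
    "test_count_expected",
]


def blocking_questions(answers: dict[str, str]) -> list[str]:
    """Return the questions that block Phase 5; an empty list means proceed.

    A missing answer is treated as "unsure", and "unsure" blocks just like "no".
    """
    return [
        q for q in PHASE4_QUESTIONS
        if answers.get(q, "unsure").strip().lower() != "yes"
    ]


answers = {q: "yes" for q in PHASE4_QUESTIONS}
answers["test_count_expected"] = "unsure"
blockers = blocking_questions(answers)
if blockers:
    print("Go back, fix, and restart from Phase 1:", blockers)
```

Treating a missing answer as "unsure" keeps the default on the safe side: a question nobody asked cannot count as passed.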
Phase 5: Claim Completion with Evidence
Only now may you say the task is complete. Include this evidence:
VERIFICATION EVIDENCE
=====================
Task: [what was being done]
Date: [timestamp]
Commands Run:
[x] Tests: [command] -> [result: X passed, Y failed, Z skipped]
[x] Build: [command] -> [result: success/failure]
[x] Lint: [command] -> [result: X errors, Y warnings]
[x] Type-check: [command] -> [result: X errors]
[x] Format: [command] -> [result: clean/X files to format]
All Green? [x] YES [ ] NO
New warnings introduced? [ ] YES [x] NO
Completion claim: [specific claim, e.g., "Feature X is implemented and all tests pass"]
STOP: This is the end of the verification protocol. Only claims with this evidence are valid.
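If the evidence block is generated rather than typed by hand, it cannot drift from what was actually run. The sketch below fills in the template from recorded results and refuses to render when any check failed; `CheckResult` and `render_evidence` are hypothetical names, and the UTC ISO timestamp format is an assumption.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CheckResult(NamedTuple):
    label: str     # e.g. "Tests", "Build", "Lint"
    command: str   # the exact command that was run
    summary: str   # e.g. "12 passed, 0 failed, 0 skipped"
    ok: bool


def _box(checked: bool) -> str:
    return "[x]" if checked else "[ ]"


def render_evidence(task: str, results: list[CheckResult],
                    new_warnings: bool, claim: str) -> str:
    """Render the Phase 5 evidence block; raise if any check did not pass."""
    if not results or not all(r.ok for r in results):
        raise ValueError("Phase 5 requires every check to pass; fix and re-verify")
    lines = [
        "VERIFICATION EVIDENCE",
        "=====================",
        f"Task: {task}",
        f"Date: {datetime.now(timezone.utc).isoformat(timespec='seconds')}",
        "Commands Run:",
    ]
    lines += [f"[x] {r.label}: {r.command} -> {r.summary}" for r in results]
    lines.append(f"All Green? {_box(True)} YES {_box(False)} NO")
    lines.append(
        f"New warnings introduced? {_box(new_warnings)} YES {_box(not new_warnings)} NO"
    )
    lines.append(f"Completion claim: {claim}")
    return "\n".join(lines)


print(render_evidence(
    "Fix login bug",
    [CheckResult("Tests", "pytest", "12 passed, 0 failed, 0 skipped", True)],
    new_warnings=False,
    claim="Login bug is fixed and all tests pass",
))
```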
Decision Table: What Counts as "Fresh" Evidence
| Counts as Fresh | Does NOT Count as Fresh |
|---|---|
| Ran command after the latest code change | Ran before the latest code change |
| Full test suite executed | Subset of tests executed |
| Output read and analyzed | Output skimmed or ignored |
| All verification types run | Only tests run (no lint, no build) |
| Command run in current session | Recalled from memory of a previous session |
| Actual command output available | "I remember it passed" |
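The table above amounts to a conjunction: evidence is fresh only if every left-hand condition holds. A minimal sketch, assuming timestamps are epoch seconds and using hypothetical names (`Evidence`, `staleness_reasons`):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    ran_at: float                # when the command was run (epoch seconds)
    last_code_change_at: float   # when the code last changed (epoch seconds)
    full_suite: bool
    output_read: bool
    all_types_run: bool          # tests AND lint, build, etc., as applicable
    current_session: bool
    output_available: bool       # actual output, not a recollection


def staleness_reasons(e: Evidence) -> list[str]:
    """Return why the evidence is NOT fresh; an empty list means it is fresh."""
    reasons = []
    if e.ran_at <= e.last_code_change_at:
        reasons.append("ran before the latest code change")
    if not e.full_suite:
        reasons.append("only a subset of tests executed")
    if not e.output_read:
        reasons.append("output skimmed or ignored")
    if not e.all_types_run:
        reasons.append("not all verification types run")
    if not e.current_session:
        reasons.append("recalled from a previous session")
    if not e.output_available:
        reasons.append("no actual command output available")
    return reasons


stale = Evidence(ran_at=100.0, last_code_change_at=160.0, full_suite=True,
                 output_read=True, all_types_run=True, current_session=True,
                 output_available=True)
print(staleness_reasons(stale))
```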
Decision Table: Edge Cases
| Situation | Protocol |
|---|---|
| Full suite takes >5 minutes | Run related tests + smoke suite. Note that full suite was not run. Recommend CI run before merge. |
| No automated tests exist | Note as significant risk. Perform manual verification with documented steps. Recommend adding tests as follow-up. At minimum, verify code compiles/runs. |
| Tests are flaky | Re-run failing test in isolation. If it passes alone, note flakiness. Verify your changes did not introduce it. Do NOT use flakiness as excuse to skip. |
| Only config change | Config changes are the #1 cause of outages. Full verification required. |
| Single line change | One-line changes cause production outages. Full verification required. |
| Refactoring only | Existing tests must still pass. Run full suite. |
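For the flaky-test row, the re-run in isolation can be scripted so the outcome is recorded rather than remembered. This is a sketch under assumptions: the table only says to re-run the failing test alone, so repeating it three times is an added choice, and `triage_failure` and its labels are hypothetical.

```python
from typing import Callable


def triage_failure(run_in_isolation: Callable[[], bool], attempts: int = 3) -> str:
    """Re-run one failing test alone and classify what happened.

    `run_in_isolation` is a caller-supplied function returning True on pass.
    None of the labels permits skipping the test; they only guide the follow-up.
    """
    outcomes = [run_in_isolation() for _ in range(attempts)]
    if all(outcomes):
        # Fails in the suite but passes alone: note flakiness, suspect shared state.
        return "passes-in-isolation"
    if any(outcomes):
        # Sometimes passes alone: flaky on its own; check your change did not cause it.
        return "intermittent"
    # Never passes alone: treat it as a real failure and fix it.
    return "consistent-failure"


# Demo with a stand-in test that always passes when run alone.
print(triage_failure(lambda: True))
```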
Common Failure Patterns
| Pattern | What Happens | Why It Is Dangerous |
|---|---|---|
| Tests pass but lint fails | Code works but has quality issues | Lint failures often indicate real problems (unused vars, unreachable code) |
| Tests pass but build fails | Test environment differs from build | Production deployments will fail |
| Tests pass but type-check fails | Runtime works but types are wrong | Bugs hiding behind |
| Tests pass in isolation but fail together | Shared state between tests | Flaky CI, unreliable test suite |
| Manual testing passes but automated fails | Manual test missed edge cases | The automated test is right |
| Tests pass but new warnings appeared | Something degraded | Warnings become errors over time |
| Subset of tests pass | Only ran related tests | Regression in unrelated area possible |
| Tests pass but coverage decreased | New code is not tested | Untested code is unverified code |
| Old test run used as evidence | Results are stale | Code changed since that run |
| "It compiled, so it works" | Compilation is necessary but not sufficient | Compiled code can still be wrong |
Anti-Patterns / Common Mistakes
| What NOT to Do | Why It Fails | What to Do Instead |
|---|---|---|
| Claim done without running tests | Unverified code breaks in production | Run all verification commands fresh |
| Use "it passed earlier" as evidence | Code changed since then | Run fresh after every change |
| Skip lint because "it is just warnings" | Warnings indicate real problems | Fix warnings or explicitly justify each one |
| Run only the tests you wrote | Misses regressions in other areas | Run the full suite |
| Read test output partially | Missed failures hidden in output | Read every line of output |
| Use manual testing as sole evidence | Manual testing is incomplete and unrepeatable | Run automated verification |
| Verify once, then make "small" additional changes | Those changes are unverified | Re-verify after every change |
| Suppress warnings without comment | Hides real issues | If truly false positive, add suppression comment explaining why |
Anti-Rationalization Guards
| Excuse | Reality |
|---|---|
| "I only changed one line" | One-line changes cause production outages. Verify. |
| "The tests passed 5 minutes ago" | You made changes since then. Run them again. |
| "I have tested this pattern before" | This is a different instance. Verify this specific one. |
| "The change is obviously correct" | Obvious changes fail more often because they are not verified. |
| "Running tests takes too long" | Not running tests takes longer when the bug reaches production. |
| "I will verify after I submit" | You will not. And if verification fails, you will undo and redo. |
| "It is just a config change" | Config changes are the #1 cause of outages. Verify. |
| "The linter warnings are false positives" | Review each one. Suppress with comment if truly false. |
| "The type errors are in unrelated code" | They might interact. Run the full check. |
| "I tested it manually" | Manual testing is incomplete and unrepeatable. Run automated verification. |
Do NOT claim completion without Phase 5 evidence. There are zero exceptions.
Integration Points
| Skill | When Verification Is Required |
|---|---|
| | After completing RED-GREEN-REFACTOR cycle for a feature |
| | After applying a bug fix |
| | After each task and after each batch |
| | After implementer delivers (via |
| | Before approving any code review |
| | Before marking any task as complete |
| | Before setting EXIT_SIGNAL to true |
| | Before merge or PR creation |
Integration Flow
[Do the work using other skills]
|
v
[Think you are done?]
|
v
[Invoke verification-before-completion]
|
+-- All checks pass -> Claim completion with Phase 5 evidence
|
+-- Any check fails -> Fix and re-verify (do NOT claim completion)
Enforcement by Other Skills
This skill is invoked by ALL other skills at completion time. It is not optional. It is a terminal checkpoint — called at the end of work, never at the beginning.
Skill Type
RIGID — The 5-step protocol is a HARD-GATE. Every step must be executed in order. No step may be skipped. No completion claim is valid without Phase 5 evidence. Do not relax these requirements for any reason.