verification-before-completion

Overview

The verification-before-completion skill is the terminal checkpoint for every task in the toolkit. It enforces a strict 5-step protocol that requires running fresh verification commands, reading their full output, and confirming that the results match the completion claim. Without this skill, agents can make unverified claims that let broken code reach production. With it, every completion claim is backed by evidence.

Iron Law

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.

You cannot say "it works," "it's done," "the bug is fixed," or "the feature is complete" without running verification commands and reading their output in this session. Cached results, previous runs, and assumptions do not count.

Phase 1: Identify Verification Commands

Before running anything, explicitly list what needs to pass for this specific task:

Verification Type	Example Commands	When Required
Unit tests	`npm test`, `pytest`, `go test ./...`, `cargo test`, `php artisan test`	Always
Integration tests	`npm run test:integration`, `pytest tests/integration/`	When applicable
Type checking	`tsc --noEmit`, `mypy .`, `pyright`, `phpstan analyse`	When project uses type checking
Linting	`eslint .`, `ruff check .`, `go vet ./...`, `php-cs-fixer fix --dry-run`	Always
Build	`npm run build`, `cargo build`, `go build ./...`	Always
Format check	`prettier --check .`, `black --check .`, `gofmt -l .`, `pint --test`	When project uses formatters

Action: List which of these apply to the current project. Not all projects have all types. Be explicit about what you will run and why.

STOP: Do NOT proceed to running commands until you have identified ALL applicable verification types.
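
The listing step can be sketched in Python. This is a minimal illustration, not part of the skill: the `MARKERS` mapping and the `identify_commands` helper are hypothetical, the marker files are an assumed heuristic, and the commands are copied from the table above. A real project may use different scripts.

```python
from pathlib import Path

# Hypothetical mapping: marker file in the project root -> commands to run.
# The marker files are an assumption; the commands come from the table above.
MARKERS = {
    "package.json": {"tests": "npm test", "build": "npm run build", "lint": "eslint ."},
    "pyproject.toml": {"tests": "pytest", "lint": "ruff check ."},
    "go.mod": {"tests": "go test ./...", "build": "go build ./..."},
    "Cargo.toml": {"tests": "cargo test", "build": "cargo build"},
}


def identify_commands(project_root):
    """Return {verification type: [commands]} for the marker files that exist."""
    root = Path(project_root)
    plan = {}
    for marker, commands in MARKERS.items():
        if (root / marker).exists():
            for kind, command in commands.items():
                plan.setdefault(kind, []).append(command)
    return plan
```

An empty result does not mean "nothing to verify". It means the heuristic found nothing, and the applicable commands still have to be listed by hand.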

Phase 2: Run Commands Fresh

Execute every verification command identified in Phase 1.

Rule	Rationale
Run AFTER the latest code change	Pre-change results are stale
Run the FULL suite, not a subset	Subset runs miss regressions
Do NOT rely on cached results	Code changed since last run
Do NOT skip commands because "they passed earlier"	Earlier is not now
If the full suite takes >5 minutes and you run a reduced set instead, note it	The evidence must state what was actually run

STOP: All commands must complete before proceeding to Phase 3.
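
A fresh run can be sketched as follows. The `RunResult` type and the `run_fresh` / `run_all` helpers are hypothetical names for illustration. The sketch assumes each command is given as an argument list and uses only the standard `subprocess.run` call.

```python
import subprocess
import time
from dataclasses import dataclass


@dataclass
class RunResult:
    command: list
    exit_code: int
    output: str         # stdout and stderr combined, complete
    finished_at: float  # epoch seconds, to compare against the last code change


def run_fresh(command):
    """Run one command now and capture its exit code and full output."""
    proc = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep warnings and errors in the same stream
        text=True,
    )
    return RunResult(command, proc.returncode, proc.stdout, time.time())


def run_all(commands):
    # Run every command even if an earlier one fails, so nothing is skipped.
    return [run_fresh(command) for command in commands]
```

For example, `run_all([["pytest"], ["ruff", "check", "."]])` would return one `RunResult` per command. Nothing is cached between calls.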

Phase 3: Read Full Output

Read the entire output of each verification command. Pay attention to:

Output Element	What to Look For	Why It Matters
Exit code	Non-zero = failure	Even if the output looks fine, a non-zero exit code means something failed
Test count	Expected number of tests ran? Any skipped?	Skipped tests = untested code
Warning messages	New warnings not present before	Warnings become errors; they indicate degradation
Deprecation notices	Deprecated API usage	Future breakage risk
Performance indicators	Unusually slow tests	May indicate performance regression
Error messages	Any error, even if tests "pass"	Some frameworks report errors alongside passing tests

STOP: Do NOT proceed if you have not read the full output of every command.
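
A small scanner can help point at lines that deserve attention. It is an aid to reading the output, not a replacement for it. The `PATTERNS` table and the `flag_output` helper are hypothetical, and the regular expressions are assumptions: real tools word these lines differently.

```python
import re

# Assumed patterns for the output elements in the table above.
PATTERNS = {
    "warning": re.compile(r"\bwarning\b", re.IGNORECASE),
    "deprecation": re.compile(r"deprecat", re.IGNORECASE),
    "skipped": re.compile(r"\bskipped\b", re.IGNORECASE),
    "error": re.compile(r"\berror\b", re.IGNORECASE),
}


def flag_output(exit_code, output):
    """Return (label, detail) pairs for the exit code and every suspicious line."""
    findings = []
    if exit_code != 0:
        findings.append(("exit_code", f"non-zero exit code {exit_code}"))
    for line in output.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((label, line.strip()))
    return findings
```

An empty list only means none of the assumed patterns matched. Test counts and timing still need a human (or agent) read.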

Phase 4: Verify Output Matches Claim

Ask yourself these questions. ALL must be "yes" to proceed:

Question	If "No" or "Unsure"
Do ALL tests pass (not just the ones I wrote)?	Fix failing tests before claiming done
Does the build succeed without errors?	Fix build errors
Does the type checker find no errors?	Fix type errors
Does the linter pass?	Fix lint errors
Is the output free of NEW warnings that were not there before?	Investigate and fix or explicitly justify
Did the full suite run (not just a subset)?	Run the full suite
Is the test count what I expected?	Investigate skipped or missing tests

STOP: If ANY answer is "no" or "unsure", fix the problem and restart from Phase 1. Do NOT proceed to Phase 5.
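
The gate can be expressed as a single function. This is a sketch under assumptions: the `may_claim_completion` helper and the answer keys are hypothetical, and each key is phrased so that "yes" is the passing answer.

```python
def may_claim_completion(answers):
    """Return (ok, blockers). ok is True only if every answer is exactly 'yes'."""
    # "no", "unsure", and anything else all block completion.
    blockers = [question for question, answer in answers.items() if answer != "yes"]
    return (not blockers, blockers)


# Usage: one entry per question, answered from the output read in Phase 3.
answers = {
    "all_tests_pass": "yes",
    "build_succeeds": "yes",
    "no_new_warnings": "unsure",
}
ok, blockers = may_claim_completion(answers)
```

Here `ok` is `False` and `blockers` is `["no_new_warnings"]`, so the protocol goes back to fixing rather than on to Phase 5.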

Phase 5: Claim Completion with Evidence

Only now may you say the task is complete. Include this evidence:

VERIFICATION EVIDENCE
=====================
Task: [what was being done]
Date: [timestamp]

Commands Run:
  [x] Tests:      [command] -> [result: X passed, Y failed, Z skipped]
  [x] Build:      [command] -> [result: success/failure]
  [x] Lint:       [command] -> [result: X errors, Y warnings]
  [x] Type-check: [command] -> [result: X errors]
  [x] Format:     [command] -> [result: clean/X files to format]

All Green?  [x] YES  [ ] NO
New warnings introduced?  [ ] YES  [x] NO

Completion claim: [specific claim, e.g., "Feature X is implemented and all tests pass"]

STOP: This is the end of the verification protocol. Only claims with this evidence are valid.
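
The evidence block can be rendered from recorded results. This is a sketch only: `format_evidence` is a hypothetical helper, and the `(label, command, result, passed)` tuple shape is an assumption. It refuses to produce a claim when there is no evidence, a check failed, or new warnings appeared.

```python
from datetime import datetime, timezone


def format_evidence(task, results, claim, new_warnings=False):
    """Render the evidence block from (label, command, result, passed) tuples."""
    all_green = all(passed for _label, _command, _result, passed in results)
    if not results or not all_green or new_warnings:
        raise ValueError("not all green; fix and re-verify before claiming completion")
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [
        "VERIFICATION EVIDENCE",
        "=====================",
        f"Task: {task}",
        f"Date: {stamp}",
        "",
        "Commands Run:",
    ]
    for label, command, result, _passed in results:
        name = f"{label}:"
        lines.append(f"  [x] {name:<11} {command} -> {result}")
    lines += [
        "",
        "All Green?  [x] YES  [ ] NO",
        "New warnings introduced?  [ ] YES  [x] NO",
        "",
        f"Completion claim: {claim}",
    ]
    return "\n".join(lines)
```

Raising instead of printing a "NO" report is a design choice for this sketch: a failed check sends the work back to Phase 1, so there is no completion claim to render.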

Decision Table: What Counts as "Fresh" Evidence

Counts as Fresh	Does NOT Count as Fresh
Ran command after the latest code change	Ran before the latest code change
Full test suite executed	Subset of tests executed
Output read and analyzed	Output skimmed or ignored
All verification types run	Only tests run (no lint, no build)
Command run in current session	Recalled from memory of a previous session
Actual command output available	"I remember it passed"
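
The per-command rows of this table can be checked mechanically. This is a sketch under assumptions: the `Evidence` record and the `is_fresh` helper are hypothetical, and the "all verification types run" row is not covered because it applies to the whole set of runs rather than to a single one.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    ran_at: float      # when the command finished (epoch seconds)
    session_id: str    # session in which the command ran
    full_suite: bool   # the whole suite ran, not a subset
    output_read: bool  # the output was read and analyzed
    has_output: bool   # the actual command output is available


def is_fresh(evidence, last_change_at, current_session):
    """True only if every per-command freshness condition holds."""
    return (
        evidence.ran_at > last_change_at
        and evidence.session_id == current_session
        and evidence.full_suite
        and evidence.output_read
        and evidence.has_output
    )
```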

Decision Table: Edge Cases

Situation	Protocol
Full suite takes >5 minutes	Run related tests + smoke suite. Note that full suite was not run. Recommend CI run before merge.
No automated tests exist	Note as significant risk. Perform manual verification with documented steps. Recommend adding tests as follow-up. At minimum, verify code compiles/runs.
Tests are flaky	Re-run the failing test in isolation. If it passes alone, note the flakiness. Verify your changes did not introduce it. Do NOT use flakiness as an excuse to skip verification.
Only config change	Config changes are the #1 cause of outages. Full verification required.
Single line change	One-line changes cause production outages. Full verification required.
Refactoring only	Existing tests must still pass. Run full suite.

Common Failure Patterns

Pattern	What Happens	Why It Is Dangerous
Tests pass but lint fails	Code works but has quality issues	Lint failures often indicate real problems (unused vars, unreachable code)
Tests pass but build fails	Test environment differs from build	Production deployments will fail
Tests pass but type-check fails	Runtime works but types are wrong	Bugs hiding behind `any` types, wrong interfaces
Tests pass in isolation but fail together	Shared state between tests	Flaky CI, unreliable test suite
Manual testing passes but automated fails	Manual test missed edge cases	The automated test is right
Tests pass but new warnings appeared	Something degraded	Warnings become errors over time
Subset of tests pass	Only ran related tests	Regression in unrelated area possible
Tests pass but coverage decreased	New code is not tested	Untested code is unverified code
Old test run used as evidence	Results are stale	Code changed since that run
"It compiled, so it works"	Compilation is necessary but not sufficient	Compiled code can still be wrong

Anti-Patterns / Common Mistakes

What NOT to Do	Why It Fails	What to Do Instead
Claim done without running tests	Unverified code breaks in production	Run all verification commands fresh
Use "it passed earlier" as evidence	Code changed since then	Run fresh after every change
Skip lint because "it is just warnings"	Warnings indicate real problems	Fix warnings or explicitly justify each one
Run only the tests you wrote	Misses regressions in other areas	Run the full suite
Read test output partially	Failures hidden in the unread output are missed	Read every line of output
Use manual testing as sole evidence	Manual testing is incomplete and unrepeatable	Run automated verification
Verify once, then make "small" additional changes	Those changes are unverified	Re-verify after every change
Suppress warnings without comment	Hides real issues	If it is truly a false positive, add a suppression comment explaining why

Anti-Rationalization Guards

Excuse	Reality
"I only changed one line"	One-line changes cause production outages. Verify.
"The tests passed 5 minutes ago"	You made changes since then. Run them again.
"I have tested this pattern before"	This is a different instance. Verify this specific one.
"The change is obviously correct"	Obvious changes fail more often because they are not verified.
"Running tests takes too long"	Not running tests takes longer when the bug reaches production.
"I will verify after I submit"	You will not. And if verification fails, you will undo and redo.
"It is just a config change"	Config changes are the #1 cause of outages. Verify.
"The linter warnings are false positives"	Review each one. Suppress with comment if truly false.
"The type errors are in unrelated code"	They might interact. Run the full check.
"I tested it manually"	Manual testing is incomplete and unrepeatable. Run automated verification.

Do NOT claim completion without Phase 5 evidence. There are zero exceptions.

Integration Points

Skill	When Verification Is Required
`test-driven-development`	After completing RED-GREEN-REFACTOR cycle for a feature
`systematic-debugging`	After applying a bug fix
`executing-plans`	After each task and after each batch
`subagent-driven-development`	After implementer delivers (via `Agent` tool), after reviewers approve
`code-review`	Before approving any code review
`resilient-execution`	Before marking any task as complete
`autonomous-loop`	Before setting EXIT_SIGNAL to true
`finishing-a-development-branch`	Before merge or PR creation

Integration Flow

[Do the work using other skills]
    |
    v
[Think you are done?]
    |
    v
[Invoke verification-before-completion]
    |
    +-- All checks pass -> Claim completion with Phase 5 evidence
    |
    +-- Any check fails -> Fix and re-verify (do NOT claim completion)
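
The fix-and-re-verify branch of this flow amounts to a loop. In this sketch, `complete_with_verification` is a hypothetical helper, `verify` and `fix` are caller-supplied callables, and the `max_rounds` limit is an assumption added so the loop terminates.

```python
def complete_with_verification(verify, fix, max_rounds=3):
    """Verify, fix, and re-verify; return the round in which all checks passed.

    verify() returns a list of failures (empty means all checks pass).
    fix(failures) attempts a repair. Completion is never reported unless the
    most recent verify() call came back clean.
    """
    for round_number in range(1, max_rounds + 1):
        failures = verify()
        if not failures:
            return round_number  # safe to claim completion with Phase 5 evidence
        fix(failures)
    raise RuntimeError("checks still failing; do NOT claim completion")
```

Every fix is followed by a new `verify()` call, so a "small" change made after a passing run can never reach the completion claim unverified.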

Enforcement by Other Skills

This skill is invoked by ALL other skills at completion time. It is not optional. It is a terminal checkpoint — called at the end of work, never at the beginning.

Skill Type

RIGID — The 5-step protocol is a HARD-GATE. Every step must be executed in order. No step may be skipped. No completion claim is valid without Phase 5 evidence. Do not relax these requirements for any reason.
