verification-before-completion

Overview

The verification-before-completion skill is the terminal checkpoint for every task in the toolkit. It enforces a strict 5-step protocol that requires running fresh verification commands, reading their full output, and confirming results match the completion claim. Without this skill, agents make unverified claims that lead to broken code in production — with it, every completion claim is backed by evidence.

Iron Law

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.
You cannot say "it works," "it's done," "the bug is fixed," or "the feature is complete" without running verification commands and reading their output in this session. Cached results, previous runs, and assumptions do not count.

Phase 1: Identify Verification Commands

Before running anything, explicitly list what needs to pass for this specific task:
| Verification Type | Example Commands | When Required |
| --- | --- | --- |
| Unit tests | `npm test`, `pytest`, `go test ./...`, `cargo test`, `php artisan test` | Always |
| Integration tests | `npm run test:integration`, `pytest tests/integration/` | When applicable |
| Type checking | `tsc --noEmit`, `mypy .`, `pyright`, `phpstan analyse` | When project uses type checking |
| Linting | `eslint .`, `ruff check .`, `golint ./...`, `php-cs-fixer fix --dry-run` | Always |
| Build | `npm run build`, `cargo build`, `go build ./...` | Always |
| Format check | `prettier --check .`, `black --check .`, `gofmt -l .`, `pint --test` | When project uses formatters |
Action: List which of these apply to the current project. Not all projects have all types. Be explicit about what you will run and why.
STOP: Do NOT proceed to running commands until you have identified ALL applicable verification types.
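
One way to make the Phase 1 listing explicit is to derive it from the files present in the project. The sketch below is illustrative only: the `MARKERS` mapping and the `identify_commands` helper are hypothetical names, and the pairing of marker files with commands is an assumption that a real project would adjust.

```python
from pathlib import Path

# Hypothetical mapping from marker files to commands from the Phase 1 table.
# Assumption: each marker file implies the listed toolchain is in use.
MARKERS = {
    "package.json": {"unit tests": "npm test", "build": "npm run build"},
    "Cargo.toml": {"unit tests": "cargo test", "build": "cargo build"},
    "go.mod": {"unit tests": "go test ./...", "build": "go build ./..."},
    "pyproject.toml": {"unit tests": "pytest"},
}


def identify_commands(project_root):
    """Return {verification type: [commands]} for marker files found in project_root."""
    root = Path(project_root)
    plan = {}
    for marker, commands in MARKERS.items():
        if (root / marker).is_file():
            for kind, command in commands.items():
                plan.setdefault(kind, []).append(command)
    return plan
```

The returned plan is only a starting point. Type checking, linting, and format checks still need to be added by hand for whatever tools the project actually uses.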

Phase 2: Run Commands Fresh

Execute every verification command identified in Phase 1.
| Rule | Rationale |
| --- | --- |
| Run AFTER the latest code change | Pre-change results are stale |
| Run the FULL suite, not a subset | Subset runs miss regressions |
| Do NOT rely on cached results | Code changed since last run |
| Do NOT skip commands because "they passed earlier" | Earlier is not now |
| If a command takes >5 minutes, note it | Explain what was run instead |
STOP: All commands must complete before proceeding to Phase 3.
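
A minimal sketch of a fresh run, assuming the commands are given as argument lists. `RunResult`, `run_fresh`, and `run_all` are hypothetical names introduced for illustration. Each command is executed now, and its exit code, combined output, and finish time are recorded so that later phases work from real output rather than memory.

```python
import subprocess
import time
from dataclasses import dataclass


@dataclass
class RunResult:
    command: list
    exit_code: int
    output: str
    finished_at: float  # seconds since the epoch, recorded when the run ended


def run_fresh(command, timeout=None):
    """Run one command now and capture its exit code and combined stdout/stderr."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep warnings and errors in the same stream
        text=True,
        timeout=timeout,
    )
    return RunResult(command, completed.returncode, completed.stdout, time.time())


def run_all(commands):
    """Run every command, even after a failure, so all output exists for Phase 3."""
    return [run_fresh(command) for command in commands]
```

Continuing after a failure is a deliberate choice in this sketch: stopping at the first failing command would leave later verification types unrun.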

Phase 3: Read Full Output

Read the entire output of each verification command. Pay attention to:
| Output Element | What to Look For | Why It Matters |
| --- | --- | --- |
| Exit code | Non-zero = failure | Even if output looks ok, non-zero means something failed |
| Test count | Expected number of tests ran? Any skipped? | Skipped tests = untested code |
| Warning messages | New warnings not present before | Warnings become errors; they indicate degradation |
| Deprecation notices | Deprecated API usage | Future breakage risk |
| Performance indicators | Unusually slow tests | May indicate performance regression |
| Error messages | Any error, even if tests "pass" | Some frameworks report errors alongside passing tests |
STOP: Do NOT proceed if you have not read the full output of every command.
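
As an aid to reading, and not a substitute for it, a small helper can flag the lines that deserve a second look. The keyword patterns and the `review_output` name below are assumptions made for this sketch. Real tools phrase warnings and skips differently, so a clean result here does not prove the output is clean.

```python
import re

# Hypothetical keyword patterns; adjust them to the tools the project uses.
SIGNALS = {
    "warning": re.compile(r"warning", re.IGNORECASE),
    "deprecation": re.compile(r"deprecat", re.IGNORECASE),
    "error": re.compile(r"\berror\b", re.IGNORECASE),
    "skipped": re.compile(r"\bskipped\b", re.IGNORECASE),
}


def review_output(exit_code, output):
    """Collect output lines matching each signal, plus whether the exit code was non-zero."""
    findings = {name: [] for name in SIGNALS}
    for line in output.splitlines():
        for name, pattern in SIGNALS.items():
            if pattern.search(line):
                findings[name].append(line.strip())
    findings["nonzero_exit"] = exit_code != 0
    return findings
```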

Phase 4: Verify Output Matches Claim

Ask yourself these questions. ALL must be "yes" to proceed:
| Question | If "No" or "Unsure" |
| --- | --- |
| Do ALL tests pass (not just the ones I wrote)? | Fix failing tests before claiming done |
| Does the build succeed without errors? | Fix build errors |
| Does the type checker find no errors? | Fix type errors |
| Does the linter pass? | Fix lint errors |
| Is the output free of NEW warnings that were not there before? | Investigate and fix or explicitly justify |
| Did the full suite run (not just a subset)? | Run the full suite |
| Is the test count what I expected? | Investigate skipped or missing tests |
STOP: If ANY answer is "no" or "unsure", go back to fix and restart from Phase 1. Do NOT proceed to Phase 5.
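
The gate can be sketched as a function that accepts only an explicit "yes" for every question. `may_claim_completion` is a hypothetical helper, not part of the skill. Anything other than "yes", including "unsure" or an empty answer set, blocks the claim.

```python
def may_claim_completion(answers):
    """Return (ok, blockers) for a {question: answer} mapping.

    ok is True only when at least one question was answered and every
    answer is "yes" (case-insensitive). blockers lists the questions
    that were answered with anything else.
    """
    if not answers:
        return (False, ["no questions were answered"])
    blockers = [q for q, a in answers.items() if a.strip().lower() != "yes"]
    return (not blockers, blockers)
```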

Phase 5: Claim Completion with Evidence

Only now may you say the task is complete. Include this evidence:
VERIFICATION EVIDENCE
=====================
Task: [what was being done]
Date: [timestamp]

Commands Run:
  [x] Tests:      [command] -> [result: X passed, Y failed, Z skipped]
  [x] Build:      [command] -> [result: success/failure]
  [x] Lint:       [command] -> [result: X errors, Y warnings]
  [x] Type-check: [command] -> [result: X errors]
  [x] Format:     [command] -> [result: clean/X files to format]

All Green?  [x] YES  [ ] NO
New warnings introduced?  [ ] YES  [x] NO

Completion claim: [specific claim, e.g., "Feature X is implemented and all tests pass"]
STOP: This is the end of the verification protocol. Only claims with this evidence are valid.
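
The template can be filled in programmatically. The sketch below assumes results arrive as `(label, command, result_text, passed)` tuples; `render_evidence` and its parameters are hypothetical. It refuses to render anything when a check failed, so an evidence block cannot be produced for a task that is not green.

```python
from datetime import datetime, timezone


def render_evidence(task, results, claim, new_warnings=False):
    """Fill in the evidence template from (label, command, result_text, passed) tuples.

    Raises ValueError instead of rendering when nothing was run or a check failed.
    """
    if not results:
        raise ValueError("no verification commands were run")
    failed = [label for label, _, _, passed in results if not passed]
    if failed:
        raise ValueError("cannot claim completion; failing checks: " + ", ".join(failed))

    lines = [
        "VERIFICATION EVIDENCE",
        "=====================",
        f"Task: {task}",
        f"Date: {datetime.now(timezone.utc).isoformat(timespec='seconds')}",
        "",
        "Commands Run:",
    ]
    for label, command, result_text, _ in results:
        tag = label + ":"
        lines.append(f"  [x] {tag:<12}{command} -> {result_text}")
    warnings_line = "[x] YES  [ ] NO" if new_warnings else "[ ] YES  [x] NO"
    lines += [
        "",
        "All Green?  [x] YES  [ ] NO",
        f"New warnings introduced?  {warnings_line}",
        "",
        f"Completion claim: {claim}",
    ]
    return "\n".join(lines)
```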

Decision Table: What Counts as "Fresh" Evidence

| Counts as Fresh | Does NOT Count as Fresh |
| --- | --- |
| Ran command after the latest code change | Ran before the latest code change |
| Full test suite executed | Subset of tests executed |
| Output read and analyzed | Output skimmed or ignored |
| All verification types run | Only tests run (no lint, no build) |
| Command run in current session | Recalled from memory of a previous session |
| Actual command output available | "I remember it passed" |
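
The first row of the table can be approximated mechanically by comparing file modification times with the time the verification run started. This is a rough sketch under the assumption that modification times reflect the latest code change. `is_fresh` is a hypothetical helper, and it covers only the timing row, not the other rows.

```python
import os


def is_fresh(run_started_at, changed_files):
    """Return True when no listed file was modified after the run started.

    run_started_at is a timestamp in seconds since the epoch.
    """
    return all(os.path.getmtime(path) <= run_started_at for path in changed_files)
```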

Decision Table: Edge Cases

| Situation | Protocol |
| --- | --- |
| Full suite takes >5 minutes | Run related tests + smoke suite. Note that full suite was not run. Recommend CI run before merge. |
| No automated tests exist | Note as significant risk. Perform manual verification with documented steps. Recommend adding tests as follow-up. At minimum, verify code compiles/runs. |
| Tests are flaky | Re-run failing test in isolation. If it passes alone, note flakiness. Verify your changes did not introduce it. Do NOT use flakiness as excuse to skip. |
| Only config change | Config changes are #1 cause of outages. Full verification required. |
| Single line change | One-line changes cause production outages. Full verification required. |
| Refactoring only | Existing tests must still pass. Run full suite. |
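
For the flaky-test row, re-running in isolation could look like the sketch below. `rerun_in_isolation` and its labels are hypothetical, and `run_test` stands for any callable that runs the single failing test and returns whether it passed. The label only describes what was observed. It does not excuse the failure, and the cause still has to be investigated.

```python
def rerun_in_isolation(run_test, attempts=3):
    """Re-run one failing test alone several times and label the pattern seen."""
    outcomes = [bool(run_test()) for _ in range(attempts)]
    if all(outcomes):
        return "passes-in-isolation"  # note flakiness; check for shared state
    if not any(outcomes):
        return "consistent-failure"  # a real failure; fix before claiming done
    return "intermittent"  # flaky even alone; confirm the change did not cause it
```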

Common Failure Patterns

| Pattern | What Happens | Why It Is Dangerous |
| --- | --- | --- |
| Tests pass but lint fails | Code works but has quality issues | Lint failures often indicate real problems (unused vars, unreachable code) |
| Tests pass but build fails | Test environment differs from build | Production deployments will fail |
| Tests pass but type-check fails | Runtime works but types are wrong | Bugs hiding behind `any` types, wrong interfaces |
| Tests pass in isolation but fail together | Shared state between tests | Flaky CI, unreliable test suite |
| Manual testing passes but automated fails | Manual test missed edge cases | The automated test is right |
| Tests pass but new warnings appeared | Something degraded | Warnings become errors over time |
| Subset of tests pass | Only ran related tests | Regression in unrelated area possible |
| Tests pass but coverage decreased | New code is not tested | Untested code is unverified code |
| Old test run used as evidence | Results are stale | Code changed since that run |
| "It compiled, so it works" | Compilation is necessary but not sufficient | Compiled code can still be wrong |

Anti-Patterns / Common Mistakes

| What NOT to Do | Why It Fails | What to Do Instead |
| --- | --- | --- |
| Claim done without running tests | Unverified code breaks in production | Run all verification commands fresh |
| Use "it passed earlier" as evidence | Code changed since then | Run fresh after every change |
| Skip lint because "it is just warnings" | Warnings indicate real problems | Fix warnings or explicitly justify each one |
| Run only the tests you wrote | Misses regressions in other areas | Run the full suite |
| Read test output partially | Missed failures hidden in output | Read every line of output |
| Use manual testing as sole evidence | Manual testing is incomplete and unrepeatable | Run automated verification |
| Verify once, then make "small" additional changes | Those changes are unverified | Re-verify after every change |
| Suppress warnings without comment | Hides real issues | If truly false positive, add suppression comment explaining why |

Anti-Rationalization Guards

| Excuse | Reality |
| --- | --- |
| "I only changed one line" | One-line changes cause production outages. Verify. |
| "The tests passed 5 minutes ago" | You made changes since then. Run them again. |
| "I have tested this pattern before" | This is a different instance. Verify this specific one. |
| "The change is obviously correct" | Obvious changes fail more often because they are not verified. |
| "Running tests takes too long" | Not running tests takes longer when the bug reaches production. |
| "I will verify after I submit" | You will not. And if verification fails, you will undo and redo. |
| "It is just a config change" | Config changes are the #1 cause of outages. Verify. |
| "The linter warnings are false positives" | Review each one. Suppress with comment if truly false. |
| "The type errors are in unrelated code" | They might interact. Run the full check. |
| "I tested it manually" | Manual testing is incomplete and unrepeatable. Run automated verification. |
Do NOT claim completion without Phase 5 evidence. There are zero exceptions.

Integration Points

| Skill | When Verification Is Required |
| --- | --- |
| test-driven-development | After completing RED-GREEN-REFACTOR cycle for a feature |
| systematic-debugging | After applying a bug fix |
| executing-plans | After each task and after each batch |
| subagent-driven-development | After implementer delivers (via `Agent` tool), after reviewers approve |
| code-review | Before approving any code review |
| resilient-execution | Before marking any task as complete |
| autonomous-loop | Before setting EXIT_SIGNAL to true |
| finishing-a-development-branch | Before merge or PR creation |

Integration Flow

[Do the work using other skills]
    |
    v
[Think you are done?]
    |
    v
[Invoke verification-before-completion]
    |
    +-- All checks pass -> Claim completion with Phase 5 evidence
    |
    +-- Any check fails -> Fix and re-verify (do NOT claim completion)
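
The flow above can be read as a verify, fix, re-verify loop. In the sketch below, `verify` and `fix` are hypothetical callables supplied by the caller: `verify()` returns a list of failures (empty when all checks pass) and `fix(failures)` attempts a repair. Completion is reported only when a verification run that follows the latest fix comes back clean.

```python
def verify_until_green(verify, fix, max_rounds=3):
    """Return True only when a fresh verify() reports no failures.

    A fix is never trusted on its own: every fix is followed by another
    verify() in the next round, and running out of rounds returns False.
    """
    for _ in range(max_rounds):
        failures = verify()
        if not failures:
            return True  # all checks pass: claim completion with Phase 5 evidence
        fix(failures)  # any check fails: fix, then loop back and re-verify
    return False  # do NOT claim completion
```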

Enforcement by Other Skills

This skill is invoked by ALL other skills at completion time. It is not optional. It is a terminal checkpoint — called at the end of work, never at the beginning.

Skill Type

RIGID — The 5-step protocol is a HARD-GATE. Every step must be executed in order. No step may be skipped. No completion claim is valid without Phase 5 evidence. Do not relax these requirements for any reason.