verification-before-completion
Verification Before Completion
You are verifying that a task is genuinely complete before it's marked as done. The goal is to prevent the common failure mode where an agent declares success while the problem still exists.
When to Activate
- At every task checkpoint during plan execution
- Before marking a Linear issue as done
- After fixing a bug (verify the fix, not just that the code compiles)
- Before shipping — final pre-review verification
Preconditions
This skill is invoked directly by `executing-plans` at each task checkpoint. No independent precondition checks are needed.
Activation
After being invoked, print the activation banner (see `_shared/observability.md`):
---
**Verification Before Completion** activated
Trigger: Task checkpoint reached
Produces: 4-level verification report
---
Verification Protocol
Level 1: Build Verification
Narrate:
Level 1/4: Build verification...
Minimum bar: the project compiles and runs.
- Build succeeds: Run the build command and check for zero errors
- No type errors: Run the typecheck (`tsc --noEmit` or equivalent)
- Lint passes: Run the linter with zero errors (warnings OK)
If Level 1 fails, the task is NOT complete. Stop and fix.
Narrate:
Level 1/4: Build verification... [PASS/FAIL]
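A minimal sketch of the Level 1 gate, assuming the project exposes build, typecheck, and lint as shell commands. The command strings (`npm run build`, `npx tsc --noEmit`, `npx eslint .`) and the `CommandRunner` type are assumptions for illustration. In Node, a real runner might wrap `child_process.spawnSync`. Only exit codes are checked, which is how "warnings OK" falls out naturally for a linter that exits 0 on warnings.

```typescript
// Result of running one shell command; only the exit code is inspected here.
interface CommandResult {
  exitCode: number;
  output: string;
}

// Hypothetical runner abstraction; a Node version might wrap child_process.spawnSync.
type CommandRunner = (command: string) => CommandResult;

interface CheckOutcome {
  name: string;
  command: string;
  passed: boolean;
}

// Assumed commands for a TypeScript project; substitute the project's own scripts.
const LEVEL1_CHECKS: ReadonlyArray<{ name: string; command: string }> = [
  { name: "build", command: "npm run build" },
  { name: "typecheck", command: "npx tsc --noEmit" },
  // ESLint exits 0 when it reports only warnings, so warnings do not fail this check.
  { name: "lint", command: "npx eslint ." },
];

// Runs every check instead of stopping at the first failure, so the report lists all problems.
function verifyBuild(run: CommandRunner): { status: "PASS" | "FAIL"; checks: CheckOutcome[] } {
  const checks: CheckOutcome[] = LEVEL1_CHECKS.map(({ name, command }) => ({
    name,
    command,
    passed: run(command).exitCode === 0,
  }));
  return { status: checks.every((c) => c.passed) ? "PASS" : "FAIL", checks };
}

// Usage with a fake runner in which only the typecheck fails.
const result = verifyBuild((cmd) => ({ exitCode: cmd.indexOf("tsc") >= 0 ? 2 : 0, output: "" }));
console.log(`Level 1/4: Build verification... ${result.status}`);
```

Because the runner is injected, the same function can be exercised with a fake in tests and with a real process runner during an actual checkpoint.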
Level 2: Test Verification
Narrate:
Level 2/4: Test verification...
Tests prove the behavior works:
- All tests pass: Run the full test suite, not just the new tests
- New tests exist: The task's changes should have corresponding tests
- Tests are meaningful: The tests actually verify the behavior, not just that the code runs
- No skipped tests: `describe.skip` or `test.todo` in the current task's tests means the task is not done
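The "no skipped tests" rule can be checked mechanically. The following is a rough sketch, assuming Jest/Vitest-style test APIs (`describe.skip`, `it.skip`, `test.skip`, `test.todo`, and the `x`-prefixed aliases). It scans source text line by line with a regex, so it can report false positives inside strings or comments. `findSkippedTests` is a hypothetical helper, not part of any test runner.

```typescript
// Matches describe/it/test .skip(...) or .todo(...), and the xdescribe/xit/xtest aliases.
const SKIP_PATTERN = /\b(?:describe|it|test)\.(?:skip|todo)\s*\(|\bx(?:describe|it|test)\s*\(/;

interface SkippedTest {
  file: string;
  line: number; // 1-based line number
  text: string;
}

// Hypothetical helper: returns every line in the given files that skips or stubs out a test.
function findSkippedTests(files: Record<string, string>): SkippedTest[] {
  const found: SkippedTest[] = [];
  for (const file of Object.keys(files)) {
    files[file].split("\n").forEach((text, index) => {
      if (SKIP_PATTERN.test(text)) {
        found.push({ file, line: index + 1, text: text.trim() });
      }
    });
  }
  return found;
}

const hits = findSkippedTests({
  "auth.test.ts": [
    'describe("login", () => {',
    '  test("accepts valid credentials", () => {});',
    '  test.todo("rate limits after 5 failures");',
    "});",
  ].join("\n"),
});
// Any hit in the current task's tests means Level 2 fails.
console.log(hits.length === 0 ? "PASS" : "FAIL", hits);
```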
Verify tests are genuine by checking:
- Does the test fail if you revert the implementation change?
- Does the test cover the acceptance criteria from the issue?
- Does the test cover edge cases mentioned in the plan?
Narrate:
Level 2/4: Test verification... [PASS/FAIL]
Level 3: Acceptance Criteria
Narrate:
Level 3/4: Acceptance criteria...
The issue's requirements are met:
- Read the acceptance criteria from the Linear issue
- Check each criterion individually:
  Acceptance Criteria:
  - [x] Users can log in with email/password: VERIFIED (test: auth.test.ts:24)
  - [x] Invalid credentials show error message: VERIFIED (test: auth.test.ts:38)
  - [ ] Rate limiting after 5 failed attempts: NOT VERIFIED (no test, no implementation found)
- If any criterion is unmet: The task is NOT complete
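The per-criterion check can be kept as data so that a single unverified item blocks completion. This is a sketch with a hypothetical `Criterion` shape. The evidence field is a free-form pointer such as a test location, and the rendered output mirrors the checklist format used in this section.

```typescript
interface Criterion {
  description: string;
  // Pointer to the proof, e.g. "test: auth.test.ts:24"; absent or blank means not verified.
  evidence?: string;
}

const isVerified = (c: Criterion): boolean => c.evidence !== undefined && c.evidence.trim() !== "";

// Hypothetical helper: renders the checklist and passes only if every criterion has evidence.
function checkAcceptanceCriteria(criteria: Criterion[]): { status: "PASS" | "FAIL"; report: string } {
  const lines = criteria.map((c) =>
    isVerified(c)
      ? `- [x] ${c.description}: VERIFIED (${c.evidence})`
      : `- [ ] ${c.description}: NOT VERIFIED`,
  );
  const met = criteria.filter(isVerified).length;
  const header = `Acceptance Criteria (${met}/${criteria.length} met):`;
  return { status: met === criteria.length ? "PASS" : "FAIL", report: [header].concat(lines).join("\n") };
}

// Example mirroring the login checklist: the rate-limiting criterion has no evidence.
const acceptance = checkAcceptanceCriteria([
  { description: "Users can log in with email/password", evidence: "test: auth.test.ts:24" },
  { description: "Invalid credentials show error message", evidence: "test: auth.test.ts:38" },
  { description: "Rate limiting after 5 failed attempts" },
]);
console.log(acceptance.report);
console.log(`Level 3/4: Acceptance criteria... ${acceptance.status}`);
```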
Narrate:
Level 3/4: Acceptance criteria... [PASS/FAIL]
Level 4: Integration Verification
Narrate:
Level 4/4: Integration verification...
The changes work in context:
- No regression: Pre-existing tests still pass
- No side effects: Changes don't break unrelated features
- API contracts: If you changed an interface, all consumers are updated
- Data consistency: If you changed a schema, migrations exist and work
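One way to make "no regression" concrete is to record which tests passed before the change and compare that against the run after it. The sketch below assumes results are available as a map from a stable test ID to an outcome, for example parsed from the test runner's JSON output. The `TestOutcome` and `TestRun` types are hypothetical.

```typescript
type TestOutcome = "passed" | "failed" | "skipped";

// Maps a stable test ID (e.g. "file > describe > test") to its outcome in one run.
type TestRun = Record<string, TestOutcome>;

// Tests that passed on the baseline but do not pass after the change
// (now failing, newly skipped, or missing entirely).
function findRegressions(before: TestRun, after: TestRun): string[] {
  return Object.keys(before)
    .filter((id) => before[id] === "passed" && after[id] !== "passed")
    .sort();
}

const regressions = findRegressions(
  { "auth > login": "passed", "cart > total": "passed", "search > empty": "skipped" },
  { "auth > login": "passed", "cart > total": "failed", "search > empty": "passed" },
);
console.log(regressions.length === 0 ? "No regressions" : `Regressions: ${regressions.join(", ")}`);
```

Treating a missing test as a regression is deliberate. A deleted or renamed test is exactly the kind of silent change this level is meant to surface.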
Verification Strategies
For Bug Fixes
- Confirm the original reproduction steps no longer trigger the bug
- Confirm the regression test fails on the old code and passes on the new
- Check for related bugs that might have the same root cause
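The "fails on old code, passes on new code" check amounts to running one regression test against both versions. In this sketch the two versions are plain functions passed in directly, which is a simplification. In a real repository the old version would come from the pre-fix commit, for example via a separate worktree. `parsePriceOld` and `parsePriceNew` are a hypothetical bug and fix.

```typescript
// A regression test, expressed as a predicate over the implementation it exercises.
type RegressionTest<F> = (impl: F) => boolean;

// A thrown exception counts as a failing test.
function passes<F>(test: RegressionTest<F>, impl: F): boolean {
  try {
    return test(impl);
  } catch {
    return false;
  }
}

// The fix is confirmed only if the test fails on the old code AND passes on the new code.
function confirmsFix<F>(test: RegressionTest<F>, oldImpl: F, newImpl: F): boolean {
  return !passes(test, oldImpl) && passes(test, newImpl);
}

// Hypothetical bug: parseFloat stops at the thousands separator, so "1,299.00" parses as 1.
const parsePriceOld = (s: string): number => parseFloat(s);
const parsePriceNew = (s: string): number => parseFloat(s.replace(/,/g, ""));
const thousandsSeparatorTest: RegressionTest<(s: string) => number> = (parse) => parse("1,299.00") === 1299;

console.log(confirmsFix(thousandsSeparatorTest, parsePriceOld, parsePriceNew)); // true
```

A test that also passes on the old code proves nothing about the fix. That is why `confirmsFix` requires both halves.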
For New Features
- Walk through the user flow end-to-end (mentally or via tests)
- Check empty states, error states, edge cases
- Verify the feature is discoverable and documented
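Empty states, error states, and edge cases are easiest to check exhaustively as a table of inputs and expected results. This sketch runs against a hypothetical `summarizeCart` feature. The function and its messages are invented for illustration; only the table-driven pattern is the point.

```typescript
// Hypothetical feature under test: summarizes cart item quantities for display.
function summarizeCart(quantities: number[]): string {
  if (quantities.some((q) => q < 0 || Math.floor(q) !== q)) {
    throw new RangeError("quantities must be non-negative integers");
  }
  const total = quantities.reduce((sum, q) => sum + q, 0);
  if (total === 0) return "Your cart is empty";
  return total === 1 ? "1 item" : `${total} items`;
}

interface EdgeCase {
  name: string;
  input: number[];
  expected?: string; // expected output when the call should succeed
  throws?: boolean; // true when the call should raise an error
}

const cases: EdgeCase[] = [
  { name: "empty state", input: [], expected: "Your cart is empty" },
  { name: "all zero quantities", input: [0, 0], expected: "Your cart is empty" },
  { name: "singular", input: [1], expected: "1 item" },
  { name: "plural", input: [2, 3], expected: "5 items" },
  { name: "error state: negative quantity", input: [-1], throws: true },
];

// A case fails if it returns the wrong value, returns when it should throw, or throws unexpectedly.
const failures = cases.filter((c) => {
  try {
    const actual = summarizeCart(c.input);
    return c.throws === true || actual !== c.expected;
  } catch {
    return c.throws !== true;
  }
});
console.log(failures.length === 0 ? "all cases pass" : failures.map((f) => f.name));
```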
For Refactors
- Behavior is identical before and after (tests prove this)
- Performance is not degraded (if relevant)
- The refactored code is actually cleaner (not just different)
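Behavioral equivalence after a refactor can be spot-checked by running the old and new implementations side by side on the same inputs (differential testing). This sketch uses hypothetical `slugifyOld` and `slugifyNew` functions and a fixed input list. A property-based testing library could generate the inputs instead.

```typescript
// Returns the inputs on which the pre- and post-refactor implementations disagree.
function diffImplementations<I, O>(
  oldImpl: (input: I) => O,
  newImpl: (input: I) => O,
  inputs: I[],
  equal: (a: O, b: O) => boolean = (a, b) => a === b,
): I[] {
  return inputs.filter((input) => !equal(oldImpl(input), newImpl(input)));
}

// Hypothetical pre-refactor implementation: manual character loop.
function slugifyOld(title: string): string {
  const words: string[] = [];
  let current = "";
  const lower = title.toLowerCase();
  for (let i = 0; i < lower.length; i++) {
    const ch = lower.charAt(i);
    if ((ch >= "a" && ch <= "z") || (ch >= "0" && ch <= "9")) {
      current += ch;
    } else if (current !== "") {
      words.push(current);
      current = "";
    }
  }
  if (current !== "") words.push(current);
  return words.join("-");
}

// Hypothetical refactored implementation: regex-based, meant to behave identically.
function slugifyNew(title: string): string {
  const words = title.toLowerCase().match(/[a-z0-9]+/g);
  return words === null ? "" : words.join("-");
}

const mismatches = diffImplementations(slugifyOld, slugifyNew, [
  "Hello World",
  "  Leading and trailing  ",
  "C++ & Rust: 2024!",
  "",
  "---",
]);
console.log(mismatches.length === 0 ? "behavior identical on all inputs" : mismatches);
```

The custom `equal` parameter matters when outputs are objects or arrays, where `===` compares references rather than contents.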
Narrate:
Level 4/4: Integration verification... [PASS/FAIL]
Failure Handling
When verification fails:
- Document what failed:
  VERIFICATION FAILED
  Level: [1/2/3/4]
  Criterion: [what was being checked]
  Expected: [what should happen]
  Actual: [what happened instead]
- Don't retry blindly: analyze why it failed first. Log the decision:
  Decision: [Retry approach]
  Reason: [root cause analysis]
  Alternatives: [other fix strategies considered]
- Fix the root cause, then re-verify from Level 1
- Max 3 retries: after 3 failures, use error recovery (see `_shared/observability.md`). Call AskUserQuestion with options: "Retry from Level 1 with different approach / Skip this verification level / Stop and report for manual review."
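The retry policy can be sketched as a bounded loop that always restarts from Level 1 and escalates after the third failure. The `RunAllLevels` and `FixRootCause` callbacks and the `ESCALATE` result are hypothetical stand-ins. The actual escalation is the AskUserQuestion prompt in the last step of this list.

```typescript
interface VerificationFailure {
  level: 1 | 2 | 3 | 4;
  criterion: string;
  expected: string;
  actual: string;
}

// Hypothetical callbacks: run Levels 1-4 in order (undefined means everything passed),
// and apply a fix chosen after root-cause analysis of the latest failure.
type RunAllLevels = () => VerificationFailure | undefined;
type FixRootCause = (failure: VerificationFailure) => void;

type RetryOutcome =
  | { outcome: "PASS"; attempts: number }
  | { outcome: "ESCALATE"; failures: VerificationFailure[] };

const MAX_FAILURES = 3;

function verifyWithRetries(runAllLevels: RunAllLevels, fixRootCause: FixRootCause): RetryOutcome {
  const failures: VerificationFailure[] = [];
  while (failures.length < MAX_FAILURES) {
    // Every attempt restarts from Level 1, not from the level that failed.
    const failure = runAllLevels();
    if (failure === undefined) return { outcome: "PASS", attempts: failures.length + 1 };
    failures.push(failure);
    console.log(
      `VERIFICATION FAILED Level: ${failure.level} Criterion: ${failure.criterion} ` +
        `Expected: ${failure.expected} Actual: ${failure.actual}`,
    );
    // Only attempt a fix if another verification run will follow.
    if (failures.length < MAX_FAILURES) fixRootCause(failure);
  }
  // After 3 failures, stop looping and hand off to error recovery (the AskUserQuestion step).
  return { outcome: "ESCALATE", failures };
}

// Usage: a simulated task that fails Level 2 twice, then passes on the third attempt.
let calls = 0;
const retry = verifyWithRetries(
  () =>
    ++calls < 3
      ? { level: 2, criterion: "auth tests", expected: "0 failing", actual: "1 failing" }
      : undefined,
  () => {},
);
console.log(retry.outcome); // PASS
```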
Completion Report
Only after all relevant levels pass:
Verification: PASS
Build: Clean
Tests: [N] passing, 0 failing, [N] new
Acceptance Criteria: [N/N] met
Integration: No regressions
Task is genuinely complete.
Or if issues remain:
Verification: BLOCKED
Passing: Levels 1-2
Failing: Level 3 — acceptance criterion [X] unmet
Details: [what's missing and why]
Recommendation: [what needs to happen next]
Task is NOT complete.
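Both report templates can be produced from the same per-level results. The following is a sketch with a hypothetical `LevelResult` shape. The PASS form is emitted only when no evaluated level failed; otherwise the BLOCKED form lists which levels passed and which failed.

```typescript
interface LevelResult {
  level: 1 | 2 | 3 | 4;
  passed: boolean;
  summary: string; // e.g. "Build: Clean", or a description of what failed
}

function formatLevels(levels: number[]): string {
  return `${levels.length === 1 ? "Level" : "Levels"} ${levels.join(", ")}`;
}

// Builds the PASS report only when every relevant level passed; otherwise a BLOCKED report.
function formatCompletionReport(results: LevelResult[], details: string, recommendation: string): string {
  const failing = results.filter((r) => !r.passed);
  if (failing.length === 0) {
    return ["Verification: PASS"]
      .concat(results.map((r) => r.summary), ["Task is genuinely complete."])
      .join("\n");
  }
  const passing = results.filter((r) => r.passed).map((r) => r.level);
  return [
    "Verification: BLOCKED",
    `Passing: ${passing.length > 0 ? formatLevels(passing) : "none"}`,
    `Failing: ${failing.map((r) => `Level ${r.level}: ${r.summary}`).join("; ")}`,
    `Details: ${details}`,
    `Recommendation: ${recommendation}`,
    "Task is NOT complete.",
  ].join("\n");
}

// Illustrative input only: Level 3 fails on an unmet acceptance criterion.
const report = formatCompletionReport(
  [
    { level: 1, passed: true, summary: "Build: Clean" },
    { level: 2, passed: true, summary: "Tests: 12 passing, 0 failing, 2 new" },
    { level: 3, passed: false, summary: 'acceptance criterion "Rate limiting after 5 failed attempts" unmet' },
  ],
  "No rate limiting found in the login handler",
  "Implement rate limiting, then re-verify from Level 1",
);
console.log(report);
```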
Rules
- Never mark a task as done without running verification
- Never trust "it should work" — run the actual commands and check the actual output
- Tests passing is necessary but not sufficient — check acceptance criteria too
- If there's no test suite, flag it as a risk but don't block on it
- Verification should be fast — if it takes more than 2 minutes, the project's test/build setup needs improvement (flag this)
- Be honest about failures — a clearly reported failure is more valuable than a false success
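Two of these rules flag a problem without blocking completion: a missing test suite and a slow verification run. A sketch of collecting those as warnings, assuming the caller measures elapsed time around the whole verification. The `collectRisks` helper and its messages are hypothetical; only the 2-minute threshold comes from the rules above.

```typescript
interface VerificationRun {
  hasTestSuite: boolean;
  elapsedMs: number;
}

const SLOW_VERIFICATION_MS = 2 * 60 * 1000; // the 2-minute threshold from the rules

// Risks are reported alongside the result; none of them changes PASS/BLOCKED on its own.
function collectRisks(run: VerificationRun): string[] {
  const risks: string[] = [];
  if (!run.hasTestSuite) {
    risks.push("RISK: no test suite found");
  }
  if (run.elapsedMs > SLOW_VERIFICATION_MS) {
    const seconds = Math.round(run.elapsedMs / 1000);
    risks.push(`RISK: verification took ${seconds}s (> 120s); test/build setup needs improvement`);
  }
  return risks;
}

// Measuring elapsed time around the whole verification run.
const startedAt = Date.now();
// ... run Levels 1-4 here ...
const risks = collectRisks({ hasTestSuite: false, elapsedMs: Date.now() - startedAt });
console.log(risks);
```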