verification-before-completion

Verification Before Completion

You are verifying that a task is genuinely complete before it's marked as done. The goal is to prevent the common failure mode where an agent declares success while the problem still exists.

When to Activate

  • At every task checkpoint during plan execution
  • Before marking a Linear issue as done
  • After fixing a bug (verify the fix, not just that the code compiles)
  • Before shipping — final pre-review verification

Preconditions

This skill is invoked directly by `executing-plans` at each task checkpoint. No independent precondition checks are needed.

Activation

After being invoked, print the activation banner (see `_shared/observability.md`):
---
**Verification Before Completion** activated
Trigger: Task checkpoint reached
Produces: 4-level verification report
---

Verification Protocol

Level 1: Build Verification

Narrate:
Level 1/4: Build verification...
Minimum bar — the project compiles and runs:
  1. Build succeeds: Run the build command, check for zero errors
  2. No type errors: Run typecheck (`tsc --noEmit` or equivalent)
  3. Lint passes: Run the linter with zero errors (warnings OK)
If Level 1 fails, the task is NOT complete. Stop and fix.
Narrate:
Level 1/4: Build verification... [PASS/FAIL]
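Once the three commands have run, Level 1 reduces to a simple gate. The sketch below assumes the agent has already captured each command's exit code and error/warning counts into a hypothetical `CheckResult` record. How those numbers are parsed depends on the project's tooling and is not shown. A check that was never run counts as a failure.

```typescript
// Hypothetical shape of one check's outcome (not any real tool's output format).
interface CheckResult {
  name: "build" | "typecheck" | "lint";
  exitCode: number;
  errors: number;
  warnings: number;
}

// Level 1 passes only if build, typecheck, and lint all ran, exited 0,
// and reported zero errors. Warnings are allowed and do not affect the result.
function verifyBuildLevel(results: CheckResult[]): { pass: boolean; line: string } {
  const required: CheckResult["name"][] = ["build", "typecheck", "lint"];
  // A check that was never run is a failure: "it should work" is not evidence.
  const missing = required.filter((name) => !results.some((r) => r.name === name));
  const failed = results.filter((r) => r.exitCode !== 0 || r.errors > 0);
  const pass = missing.length === 0 && failed.length === 0;
  return { pass, line: `Level 1/4: Build verification... [${pass ? "PASS" : "FAIL"}]` };
}

const level1 = verifyBuildLevel([
  { name: "build", exitCode: 0, errors: 0, warnings: 0 },
  { name: "typecheck", exitCode: 0, errors: 0, warnings: 0 },
  { name: "lint", exitCode: 0, errors: 0, warnings: 3 },
]);
console.log(level1.line); // Level 1/4: Build verification... [PASS]
```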

Level 2: Test Verification

Narrate:
Level 2/4: Test verification...
Tests prove the behavior works:
  1. All tests pass: Run the full test suite, not just the new tests
  2. New tests exist: The task's changes should have corresponding tests
  3. Tests are meaningful: The tests actually verify the behavior, not just that the code runs
  4. No skipped tests: if any of the current task's tests are marked `describe.skip` or `test.todo`, the task is not done
Verify tests are genuine by checking:
  • Does the test fail if you revert the implementation change?
  • Does the test cover the acceptance criteria from the issue?
  • Does the test cover edge cases mentioned in the plan?
Narrate:
Level 2/4: Test verification... [PASS/FAIL]
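The skipped-test check can be automated as a quick text scan over the test files the task touched. The sketch below is a heuristic regex over source text, not a parser. It assumes Jest/Vitest-style markers (`.skip`, `.todo`, and the `x`-prefixed aliases); other frameworks would need their own patterns.

```typescript
// Heuristic text scan (not a parser) for Jest/Vitest-style skip and todo markers:
// describe.skip(, it.skip(, test.skip(, describe.todo(, it.todo(, test.todo(,
// and the x-prefixed aliases xdescribe(, xit(, xtest(.
const SKIP_MARKER = /\b(?:(?:describe|it|test)\.(?:skip|todo)|x(?:describe|it|test))\s*\(/;

// Returns the 1-based line numbers in a test file's source that contain a marker.
function findSkippedTests(source: string): number[] {
  const hits: number[] = [];
  source.split("\n").forEach((line, index) => {
    if (SKIP_MARKER.test(line)) hits.push(index + 1);
  });
  return hits;
}

const authTests = [
  'describe("login", () => {',
  '  it("accepts valid credentials", () => {});',
  '  test.todo("rate limits after 5 failed attempts");',
  "});",
].join("\n");
const skipped = findSkippedTests(authTests); // [3]
console.log(skipped);
```

Any hit in the current task's files means Level 2 fails. Hits in unrelated files are worth mentioning but belong to other work.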

Level 3: Acceptance Criteria

Narrate:
Level 3/4: Acceptance criteria...
The issue's requirements are met:
  1. Read the acceptance criteria from the Linear issue
  2. Check each criterion individually:
    Acceptance Criteria:
    - [x] Users can log in with email/password — VERIFIED (test: auth.test.ts:24)
    - [x] Invalid credentials show error message — VERIFIED (test: auth.test.ts:38)
    - [ ] Rate limiting after 5 failed attempts — NOT VERIFIED (no test, no implementation found)
  3. If any criterion is unmet: The task is NOT complete
Narrate:
Level 3/4: Acceptance criteria... [PASS/FAIL]
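The per-criterion checklist follows a simple rule: a criterion counts as verified only when concrete evidence (a test location, a command output) is attached. The sketch below renders the checklist in the format shown above, with a plain hyphen in place of the dash. `Criterion` is a hypothetical record; reading the criteria from the Linear issue is assumed to happen elsewhere.

```typescript
// Hypothetical record for one acceptance criterion read from the issue.
interface Criterion {
  text: string;
  evidence?: string; // e.g. "test: auth.test.ts:24"; absent means no proof was found
}

// Builds the checklist; Level 3 passes only if every criterion has evidence.
function verifyAcceptance(criteria: Criterion[]): { pass: boolean; report: string } {
  const lines = criteria.map((c) =>
    c.evidence ? `- [x] ${c.text} - VERIFIED (${c.evidence})` : `- [ ] ${c.text} - NOT VERIFIED`
  );
  const pass = criteria.every((c) => Boolean(c.evidence));
  const narration = `Level 3/4: Acceptance criteria... [${pass ? "PASS" : "FAIL"}]`;
  return { pass, report: ["Acceptance Criteria:", ...lines, narration].join("\n") };
}

const level3 = verifyAcceptance([
  { text: "Users can log in with email/password", evidence: "test: auth.test.ts:24" },
  { text: "Invalid credentials show error message", evidence: "test: auth.test.ts:38" },
  { text: "Rate limiting after 5 failed attempts" },
]);
console.log(level3.report);
```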

Level 4: Integration Verification

Narrate:
Level 4/4: Integration verification...
The changes work in context:
  1. No regression: Pre-existing tests still pass
  2. No side effects: Changes don't break unrelated features
  3. API contracts: If you changed an interface, all consumers are updated
  4. Data consistency: If you changed a schema, migrations exist and work
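The "no regression" check in point 1 compares per-test outcomes from a run before the change with a run after it. The sketch below assumes both runs have been collected into a hypothetical map from a stable test ID to its outcome. Points 2 to 4 (side effects, API consumers, migrations) depend on the project and are not sketched here.

```typescript
type TestOutcome = "pass" | "fail" | "skip";

// Hypothetical per-test outcomes keyed by a stable test ID (e.g. "file > suite > name"),
// captured once before the change and once after it.
type SuiteRun = Record<string, TestOutcome>;

// A regression is any pre-existing test that passed before the change but does not pass
// after it. A test missing from the "after" run (deleted or renamed) is also reported,
// because silently losing coverage should be explained rather than ignored.
function findRegressions(before: SuiteRun, after: SuiteRun): string[] {
  return Object.keys(before)
    .filter((id) => before[id] === "pass" && after[id] !== "pass")
    .sort();
}

const regressions = findRegressions(
  { "auth > logs in": "pass", "cart > totals": "pass", "search > empty query": "fail" },
  { "auth > logs in": "pass", "cart > totals": "fail", "search > empty query": "pass" }
); // ["cart > totals"]
console.log(regressions);
```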

Verification Strategies

For Bug Fixes

  1. Confirm the original reproduction steps no longer trigger the bug
  2. Confirm the regression test fails on the old code and passes on the new
  3. Check for related bugs that might have the same root cause
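Step 2 is the one most often skipped: a regression test proves the fix only if it distinguishes the old code from the new. The sketch below makes that check explicit with a hypothetical off-by-one bug in a page-count helper (`pageCountOld`/`pageCountNew` are illustrative names). In a real project, the "old" run usually means temporarily reverting the fix or checking out the previous commit.

```typescript
type PageCount = (total: number, perPage: number) => number;

// Hypothetical bug: the old implementation dropped the final partial page.
const pageCountOld: PageCount = (total, perPage) => Math.floor(total / perPage);
// The fix rounds up, so 11 items at 10 per page yield 2 pages.
const pageCountNew: PageCount = (total, perPage) => Math.ceil(total / perPage);

// Regression test written from the original reproduction steps (11 items, 10 per page).
const regressionTest = (impl: PageCount): boolean => impl(11, 10) === 2;

// A regression test only proves the fix if it fails on the old code and passes on the new.
function provesFix(
  test: (impl: PageCount) => boolean,
  oldImpl: PageCount,
  newImpl: PageCount
): boolean {
  return !test(oldImpl) && test(newImpl);
}

console.log(provesFix(regressionTest, pageCountOld, pageCountNew)); // true
```

A test such as `impl(20, 10) === 2` passes on both versions, so `provesFix` returns false for it. It may be a fine test, but it is not evidence for this fix.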

For New Features

  1. Walk through the user flow end-to-end (mentally or via tests)
  2. Check empty states, error states, edge cases
  3. Verify the feature is discoverable and documented
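One way to make step 2 concrete is a table-driven check in which every case is labeled with the state it covers, so a missing empty or error case is visible at a glance. The feature below (`formatResultCount`, a search-results label) is purely hypothetical and only illustrates the pattern.

```typescript
// Hypothetical feature under verification: the label shown above search results.
function formatResultCount(count: number): string {
  if (!isFinite(count) || Math.floor(count) !== count || count < 0) {
    throw new RangeError(`invalid result count: ${count}`);
  }
  if (count === 0) return "No results";
  return count === 1 ? "1 result" : `${count} results`;
}

// Each case names the state it covers, so gaps (e.g. no error case) are easy to spot.
const cases: { state: string; input: number; expected: string }[] = [
  { state: "empty", input: 0, expected: "No results" },
  { state: "edge: singular", input: 1, expected: "1 result" },
  { state: "normal", input: 42, expected: "42 results" },
  { state: "error: negative", input: -1, expected: "throws" },
  { state: "error: fractional", input: 2.5, expected: "throws" },
];

function runCases(): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    let actual: string;
    try {
      actual = formatResultCount(c.input);
    } catch {
      actual = "throws";
    }
    if (actual !== c.expected) failures.push(`${c.state}: expected "${c.expected}", got "${actual}"`);
  }
  return failures;
}

console.log(runCases()); // []
```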

For Refactors

  1. Behavior is identical before and after (tests prove this)
  2. Performance is not degraded (if relevant)
  3. The refactored code is actually cleaner (not just different)
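When existing tests are thin, "behavior is identical before and after" can be backed by a differential check. This runs the old and new implementations on many generated inputs and reports the first disagreement. The sketch assumes both versions can be called side by side; `sumEvensOld`/`sumEvensNew` are hypothetical stand-ins. A seeded LCG keeps the generated inputs reproducible.

```typescript
// Hypothetical original implementation...
function sumEvensOld(xs: number[]): number {
  let total = 0;
  for (let i = 0; i < xs.length; i++) {
    if (xs[i] % 2 === 0) total += xs[i];
  }
  return total;
}

// ...and the refactored version, which must behave identically.
function sumEvensNew(xs: number[]): number {
  return xs.filter((x) => x % 2 === 0).reduce((acc, x) => acc + x, 0);
}

// Seeded linear congruential generator (Numerical Recipes constants). Intermediate
// products stay below 2^53, so the floating-point arithmetic is exact.
function makeRng(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Returns the first generated input on which the two versions disagree, or null.
function findDivergence(trials: number, seed: number): number[] | null {
  const rng = makeRng(seed);
  for (let t = 0; t < trials; t++) {
    const length = Math.floor(rng() * 20); // 0..19, so empty arrays are covered
    const xs: number[] = [];
    for (let i = 0; i < length; i++) xs.push(Math.floor(rng() * 201) - 100); // -100..100
    if (sumEvensOld(xs) !== sumEvensNew(xs)) return xs;
  }
  return null;
}

console.log(findDivergence(1000, 42)); // null
```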
Once the Level 4 checks and the applicable strategy above are complete, narrate:
Level 4/4: Integration verification... [PASS/FAIL]

Failure Handling

When verification fails:
  1. Document what failed:
    VERIFICATION FAILED
    Level: [1/2/3/4]
    Criterion: [what was being checked]
    Expected: [what should happen]
    Actual: [what happened instead]
  2. Don't retry blindly — Analyze why it failed first. Log the decision:
    Decision: [Retry approach] Reason: [root cause analysis] Alternatives: [other fix strategies considered]
  3. Fix the root cause, then re-verify from Level 1
  4. Max 3 retries: after 3 failures, use error recovery (see `_shared/observability.md`). Call AskUserQuestion with options: "Retry from Level 1 with different approach / Skip this verification level / Stop and report for manual review."
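The loop above can be sketched as follows. Each attempt re-verifies from Level 1, and every failure is documented before any fix is attempted. After the failure limit, the loop escalates instead of retrying. The callbacks are hypothetical stand-ins for running the four levels and for the analyze-then-fix step. The rule is ambiguous ("max 3 retries" vs. "after 3 failures"), so this sketch assumes it means three failed verification runs in total.

```typescript
type Level = 1 | 2 | 3 | 4;

// Hypothetical result of one full verification pass; failedLevel is absent on success.
interface RunResult {
  failedLevel?: Level;
  criterion?: string;
  expected?: string;
  actual?: string;
}

type VerifyOutcome =
  | { status: "PASS"; attempts: number }
  | { status: "ESCALATE"; attempts: number; lastFailure: RunResult };

// Assumption: "after 3 failures" means three failed verification runs in total.
const MAX_FAILURES = 3;

function formatFailure(r: RunResult): string {
  return [
    "VERIFICATION FAILED",
    `Level: ${r.failedLevel}`,
    `Criterion: ${r.criterion ?? "[unknown]"}`,
    `Expected: ${r.expected ?? "[unknown]"}`,
    `Actual: ${r.actual ?? "[unknown]"}`,
  ].join("\n");
}

function verifyWithRetries(
  runFromLevel1: () => RunResult,
  fixRootCause: (failure: RunResult) => void,
  log: (message: string) => void
): VerifyOutcome {
  let failures = 0;
  for (;;) {
    const result = runFromLevel1();
    if (result.failedLevel === undefined) return { status: "PASS", attempts: failures + 1 };
    failures++;
    log(formatFailure(result)); // document the failure before changing anything
    if (failures >= MAX_FAILURES) {
      // Hand off to error recovery (AskUserQuestion) instead of retrying again.
      return { status: "ESCALATE", attempts: failures, lastFailure: result };
    }
    fixRootCause(result); // analyze, log the decision, then fix the root cause
  }
}

const outcome = verifyWithRetries(
  () => ({ failedLevel: 3, criterion: "Rate limiting after 5 failed attempts", actual: "no limit found" }),
  () => {},
  (message) => console.log(message)
);
console.log(outcome.status); // ESCALATE
```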

Completion Report

Only after all relevant levels pass, produce:

Verification: PASS
Build: Clean
Tests: [N] passing, 0 failing, [N] new
Acceptance Criteria: [N/N] met
Integration: No regressions
Task is genuinely complete.

Or, if issues remain:

Verification: BLOCKED
Passing: Levels 1-2
Failing: Level 3 (acceptance criterion [X] unmet)
Details: [what's missing and why]
Recommendation: [what needs to happen next]
Task is NOT complete.
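Both report shapes can be generated from the same per-level summaries, so a BLOCKED result can never be mistaken for a PASS. The sketch assumes a hypothetical `LevelSummary` record per relevant level; the field names are illustrative, not a fixed schema.

```typescript
// Hypothetical per-level summary; include only the levels relevant to the task.
interface LevelSummary {
  level: 1 | 2 | 3 | 4;
  name: string; // "Build", "Tests", "Acceptance Criteria", "Integration"
  pass: boolean;
  detail: string; // e.g. "Clean" or "acceptance criterion [X] unmet"
}

function completionReport(levels: LevelSummary[], details = "", recommendation = ""): string {
  const failing = levels.filter((l) => !l.pass);
  if (failing.length === 0) {
    const body = levels.map((l) => `${l.name}: ${l.detail}`);
    return ["Verification: PASS", ...body, "Task is genuinely complete."].join("\n");
  }
  const passing = levels.filter((l) => l.pass).map((l) => l.level);
  return [
    "Verification: BLOCKED",
    `Passing: ${passing.length > 0 ? "Levels " + passing.join(", ") : "none"}`,
    ...failing.map((l) => `Failing: Level ${l.level} (${l.detail})`),
    `Details: ${details}`,
    `Recommendation: ${recommendation}`,
    "Task is NOT complete.",
  ].join("\n");
}

const blocked = completionReport(
  [
    { level: 1, name: "Build", pass: true, detail: "Clean" },
    { level: 2, name: "Tests", pass: true, detail: "12 passing, 0 failing, 2 new" },
    { level: 3, name: "Acceptance Criteria", pass: false, detail: "acceptance criterion [rate limiting] unmet" },
  ],
  "no rate-limit logic or test found",
  "implement rate limiting with a test, then re-verify from Level 1"
);
console.log(blocked);
```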

Rules

  • Never mark a task as done without running verification
  • Never trust "it should work" — run the actual commands and check the actual output
  • Tests passing is necessary but not sufficient — check acceptance criteria too
  • If there's no test suite, flag it as a risk but don't block on it
  • Verification should be fast — if it takes more than 2 minutes, the project's test/build setup needs improvement (flag this)
  • Be honest about failures — a clearly reported failure is more valuable than a false success