verify
Verify implementation against change artifacts using four dimensions. Uses independent subagents to eliminate context bias.
<decision_boundary>
Use for:
- Validating implementation completeness against spec artifacts before archive
- Independent verification via subagents to catch context bias
- Surfacing living-doc drift (Layer 1/2/3) as advisory findings
NOT for:
- Creating or modifying spec artifacts (use `/beat:design`)
- Writing tasks (use `/beat:plan`)
- Running implementation (use `/beat:apply`)
- Archiving the change (use `/beat:archive`)
Trigger examples:
- "Verify the change" / "Check implementation against spec" / "Run verification"
- Should NOT trigger: "design a feature" / "implement the change" / "archive it"
</decision_boundary>
<HARD-GATE>
You MUST dispatch independent subagents for verification — NEVER verify implementation yourself
in the main session. The main session has context bias from the conversation history.
Dispatch the verification subagent AND code-reviewer in parallel — they are independent checks.
If a subagent fails, proceed with findings from the other. If BOTH fail, report the failure —
do NOT fall back to self-verification.
</HARD-GATE>
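The control flow this gate describes (parallel dispatch, proceed on a single failure, hard stop when both fail) can be illustrated with a minimal Python sketch. This is only an analogy: real dispatch happens through Agent tool calls, and `dispatch_in_parallel` and the callables passed to it are hypothetical names introduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def dispatch_in_parallel(checks: Dict[str, Callable[[], str]]) -> Dict[str, str]:
    """Run every check concurrently and return the reports that succeeded.

    Hypothetical helper: each callable stands in for one independent subagent.
    """
    reports: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        # Submit everything first so the checks run side by side.
        futures = {name: pool.submit(fn) for name, fn in checks.items()}
        for name, future in futures.items():
            try:
                reports[name] = future.result()
            except Exception:
                # A failed subagent contributes no report; keep the others.
                continue
    if not reports:
        # Both failed: surface the failure instead of verifying in-session.
        raise RuntimeError("all subagents failed; report the failure, do not self-verify")
    return reports
```

If one callable raises, the returned dictionary simply holds the other report; if every callable raises, the caller gets an error rather than a silently self-produced result.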
Rationalization Prevention
| Thought | Reality |
|---|---|
| "The change is small, I can verify it myself" | Self-verification creates confirmation bias. You saw the implementation — you can't objectively verify it. |
| "I already reviewed the code during apply" | That's exactly why you need an independent verifier. Familiarity breeds blind spots. |
| "Running two subagents is overkill for this" | Code quality and spec alignment are independent dimensions. A single agent conflates them. |
| "I'll just run the tests, that's verification enough" | Tests verify behavior but not spec alignment, design adherence, or code quality. |
| "I'll dispatch them sequentially to save context" | They're independent — parallel dispatch is faster and prevents one report from biasing the other. |
Red Flags — STOP if you catch yourself:
- Verifying any dimension yourself instead of dispatching a subagent
- Dispatching subagents sequentially instead of in parallel
- Skipping code-reviewer because "the code is simple"
- Claiming verification passed without reading the subagent reports
- Editing code during verification (verify reads, doesn't write)
- Falling back to self-verification because a subagent failed
Process Flow
```dot
digraph verify {
    "Select change" [shape=box];
    "Read artifacts +\ntesting context" [shape=box];
    "Parallel dispatch" [shape=box, style=bold];
    "Verification\nsubagent" [shape=box];
    "Code-reviewer\nsubagent" [shape=box];
    "tests available?" [shape=diamond];
    "Run automated tests" [shape=box];
    "Present combined report" [shape=doublecircle];
    "Select change" -> "Read artifacts +\ntesting context";
    "Read artifacts +\ntesting context" -> "Parallel dispatch";
    "Parallel dispatch" -> "Verification\nsubagent";
    "Parallel dispatch" -> "Code-reviewer\nsubagent";
    "Verification\nsubagent" -> "tests available?";
    "Code-reviewer\nsubagent" -> "tests available?";
    "tests available?" -> "Run automated tests" [label="yes"];
    "tests available?" -> "Present combined report" [label="no"];
    "Run automated tests" -> "Present combined report";
}
```

Input: Optionally specify a change name. If omitted, infer from context or prompt.
Steps

1. Select the change

   If no name provided:
   - Look for `beat/changes/` directories (excluding `archive/`)
   - If only one exists, use it
   - If multiple exist, use AskUserQuestion tool to let user select

2. Read all artifacts and determine testing context

   Read from `beat/changes/<name>/`:
   - `status.yaml` (schema: `references/status-schema.md`)
   - `features/*.feature` (all Gherkin files, if gherkin status is `done`)
   - `proposal.md` (if exists)
   - `design.md` (if exists)
   - `tasks.md` (if exists)

   Read `beat/config.yaml` (if exists, schema: `references/config-schema.md`).

   Determine drive mode:
   - If `gherkin` status is `done` → Gherkin-driven verification
   - If `gherkin` status is `skipped` → Proposal-driven verification

   Determine testing context (three-layer priority: tag > source > config):
   - Config layer: Is `testing.required` set to `false`? If yes, skip test existence checks globally.
   - Source layer: Does `status.yaml` contain `source: distill`? If yes, Dimension 1 switches to accuracy mode (see below).
   - Tag layer: Every scenario in a .feature file is expected to have a corresponding test (in TDD mode).

3. Dispatch verification subagent AND code-reviewer in parallel

   Launch BOTH agents simultaneously using a single message with two Agent tool calls:

   Agent A: Verification subagent (subagent_type: `Explore`). Read `verification-subagent-prompt.md` for the complete subagent prompt.

   Provide ONLY:
   - All artifact contents (features, proposal, design, tasks)
   - Testing context (drive mode, testing config, source flag, tag counts)
   - Do NOT pass conversation history or session context.

   Agent B: Code quality review (subagent_type: `superpowers:code-reviewer`).

   Provide:
   - The change name and description (from proposal or status.yaml)
   - List of files created/modified during apply
   - The planning document (tasks.md or proposal.md) as the "original plan"

   This reviews: code quality, architecture, naming, error handling, test quality, security, and plan alignment.

   Fallback: If one agent fails, proceed with the other's findings. If BOTH fail, report failure; do NOT self-verify.

4. Run automated tests if available

   Detect and run the project's test suite:
   - Behavior tests: run using `testing.behavior` framework (or auto-detect)
   - E2E tests: run using `testing.e2e` framework (or auto-detect). If `gherkin.modified` exists in status.yaml, combine BDD feature paths: `beat/features/` + `beat/changes/<name>/features/`
   - Report behavior and e2e results separately

5. Present combined verification report

   Combine both subagent reports:
   - Dimensions 1-3 from verification subagent (spec alignment)
   - Dimension 4 from code-reviewer (code quality)
   - Dimension 5 from verification subagent (living docs sync: Layer 1/2/3, advisory only)
   - Step 4 test results (if available)
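
The drive-mode and testing-context rules from step 2 can be sketched roughly as follows. This is an illustration under assumptions: `VerifyContext` and `determine_context` are hypothetical names, the inputs are passed as plain values rather than parsed from `status.yaml` or `beat/config.yaml` (whose exact layout is defined by the schema files), and the tag layer is left out because its override rules are not spelled out here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerifyContext:
    drive_mode: str             # "gherkin" or "proposal"
    dimension1_mode: str        # "coverage" or "accuracy"
    check_test_existence: bool  # False when testing.required is false


def determine_context(gherkin_status: str,
                      source: Optional[str] = None,
                      testing_required: Optional[bool] = None) -> VerifyContext:
    """Hypothetical helper mirroring the step 2 rules."""
    if gherkin_status == "done":
        drive_mode = "gherkin"
    elif gherkin_status == "skipped":
        drive_mode = "proposal"
    else:
        raise ValueError(f"unexpected gherkin status: {gherkin_status!r}")

    # Source layer: `source: distill` switches Dimension 1 to accuracy mode.
    dimension1_mode = "accuracy" if source == "distill" else "coverage"

    # Config layer: only an explicit `false` disables test existence checks.
    check_test_existence = testing_required is not False

    return VerifyContext(drive_mode, dimension1_mode, check_test_existence)
```

For example, `determine_context("done", source="distill", testing_required=False)` yields a Gherkin-driven context in accuracy mode with test existence checks turned off.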
Issue Classification
- CRITICAL: Must fix (missing scenario test [in coverage mode], inaccurate scenario [in accuracy mode], unimplemented goal, design violation, security vulnerability)
- WARNING: Should fix (partial coverage, possible divergence, non-executable test, Gherkin quality issues, code quality concerns, living-doc drift — Layer 1/2/3 sync gaps)
- SUGGESTION: Nice to fix (pattern inconsistency, minor improvement, missing test in distill mode, module without README)
Dimension 5 is advisory — its findings classify as WARNING or SUGGESTION only, never CRITICAL. The user decides whether to act before archiving; living-doc drift never blocks the archive.
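
The advisory cap on Dimension 5 can be expressed as a small rule. The sketch below is an assumption-laden illustration: `Severity`, `effective_severity`, and `has_must_fix` are hypothetical names, and findings are modeled as simple `(dimension, severity)` pairs.

```python
from enum import IntEnum
from typing import Iterable, Tuple


class Severity(IntEnum):
    SUGGESTION = 1
    WARNING = 2
    CRITICAL = 3


# Dimensions whose findings are advisory only (Dimension 5: living docs sync).
ADVISORY_DIMENSIONS = {5}


def effective_severity(dimension: int, proposed: Severity) -> Severity:
    """Cap advisory dimensions at WARNING; leave other dimensions unchanged."""
    if dimension in ADVISORY_DIMENSIONS:
        return min(proposed, Severity.WARNING)
    return proposed


def has_must_fix(findings: Iterable[Tuple[int, Severity]]) -> bool:
    """True if any finding is still CRITICAL after the advisory cap."""
    return any(
        effective_severity(dim, sev) is Severity.CRITICAL
        for dim, sev in findings
    )
```

With this rule, a living-doc finding reported as CRITICAL is downgraded to WARNING, so it can never be the reason a change is flagged as must-fix.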
Graceful Degradation
- Gherkin skipped: skip Dimension 1, strengthen Dimension 2 (proposal alignment)
- Only features exist: verify Gherkin coverage only
- Features + proposal: verify coverage + alignment
- Features + proposal + design: verify all four dimensions
- Always note which checks were skipped and why
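
As a rough sketch of the degradation rules above, the artifact-driven checks could be planned like this. The function name `plan_checks` and the check labels are hypothetical, the mapping of "design adherence" to a specific dimension number is an assumption, and the code-quality review is omitted because it is handled by the separate code-reviewer subagent.

```python
from typing import Dict, List, Tuple


def plan_checks(gherkin_done: bool,
                has_proposal: bool,
                has_design: bool) -> Tuple[List[str], Dict[str, str]]:
    """Return (checks to run, skipped checks mapped to the reason)."""
    run: List[str] = []
    skipped: Dict[str, str] = {}

    if gherkin_done:
        run.append("gherkin coverage")
    else:
        skipped["gherkin coverage"] = "gherkin status is skipped"

    if has_proposal:
        run.append("proposal alignment")
    else:
        skipped["proposal alignment"] = "proposal.md not found"

    if has_design:
        run.append("design adherence")
    else:
        skipped["design adherence"] = "design.md not found"

    return run, skipped
```

Returning the skipped checks together with a reason keeps the final report aligned with the last rule: every omitted check is noted along with why it was omitted.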