verify

Verify implementation against change artifacts using four dimensions. Uses independent subagents to eliminate context bias.
<decision_boundary>
Use for:
  • Validating implementation completeness against spec artifacts before archive
  • Independent verification via subagents to catch context bias
  • Surfacing living-doc drift (Layer 1/2/3) as advisory findings
NOT for:
  • Creating or modifying spec artifacts (use /beat:design)
  • Writing tasks (use /beat:plan)
  • Running implementation (use /beat:apply)
  • Archiving the change (use /beat:archive)
Trigger examples:
  • "Verify the change" / "Check implementation against spec" / "Run verification"
  • Should NOT trigger: "design a feature" / "implement the change" / "archive it"
</decision_boundary>
<HARD-GATE> You MUST dispatch independent subagents for verification — NEVER verify implementation yourself in the main session. The main session has context bias from the conversation history.
Dispatch the verification subagent AND code-reviewer in parallel — they are independent checks.
If a subagent fails, proceed with findings from the other. If BOTH fail, report the failure — do NOT fall back to self-verification. </HARD-GATE>
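
The parallel-dispatch and fallback rule can be sketched in Python as follows. This is only an illustration of the control flow: `dispatch_in_parallel`, `VerificationFailed`, and the two callables are hypothetical names, and the real dispatch happens through Agent tool calls rather than Python coroutines.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional


class VerificationFailed(RuntimeError):
    """Raised when both independent checks fail; the caller must not self-verify."""


async def dispatch_in_parallel(
    verifier: Callable[[], Awaitable[str]],
    reviewer: Callable[[], Awaitable[str]],
) -> Dict[str, Optional[str]]:
    # Start both checks together so neither report can bias the other.
    results = await asyncio.gather(verifier(), reviewer(), return_exceptions=True)

    reports: Dict[str, Optional[str]] = {}
    for name, result in zip(("verification", "code_review"), results):
        # A failed check is recorded as None instead of aborting the other one.
        reports[name] = None if isinstance(result, BaseException) else result

    # If both failed there is nothing to report on: surface the failure.
    if all(report is None for report in reports.values()):
        raise VerificationFailed("both subagents failed; report this, do not self-verify")
    return reports
```

With one failing check the function still returns the surviving report; only a double failure raises.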

Rationalization Prevention

| Thought | Reality |
| --- | --- |
| "The change is small, I can verify it myself" | Self-verification creates confirmation bias. You saw the implementation, so you can't objectively verify it. |
| "I already reviewed the code during apply" | That's exactly why you need an independent verifier. Familiarity breeds blind spots. |
| "Running two subagents is overkill for this" | Code quality and spec alignment are independent dimensions. A single agent conflates them. |
| "I'll just run the tests, that's verification enough" | Tests verify behavior but not spec alignment, design adherence, or code quality. |
| "I'll dispatch them sequentially to save context" | They're independent; parallel dispatch is faster and prevents one report from biasing the other. |

Red Flags — STOP if you catch yourself:

  • Verifying any dimension yourself instead of dispatching a subagent
  • Dispatching subagents sequentially instead of in parallel
  • Skipping code-reviewer because "the code is simple"
  • Claiming verification passed without reading the subagent reports
  • Editing code during verification (verify reads, doesn't write)
  • Falling back to self-verification because a subagent failed

Process Flow

```dot
digraph verify {
    "Select change" [shape=box];
    "Read artifacts +\ntesting context" [shape=box];
    "Parallel dispatch" [shape=box, style=bold];
    "Verification\nsubagent" [shape=box];
    "Code-reviewer\nsubagent" [shape=box];
    "tests available?" [shape=diamond];
    "Run automated tests" [shape=box];
    "Present combined report" [shape=doublecircle];

    "Select change" -> "Read artifacts +\ntesting context";
    "Read artifacts +\ntesting context" -> "Parallel dispatch";
    "Parallel dispatch" -> "Verification\nsubagent";
    "Parallel dispatch" -> "Code-reviewer\nsubagent";
    "Verification\nsubagent" -> "tests available?";
    "Code-reviewer\nsubagent" -> "tests available?";
    "tests available?" -> "Run automated tests" [label="yes"];
    "tests available?" -> "Present combined report" [label="no"];
    "Run automated tests" -> "Present combined report";
}
```
Input: Optionally specify a change name. If omitted, infer from context or prompt.
Steps
  1. Select the change
    If no name provided:
    • Look for directories under beat/changes/ (excluding archive/)
    • If only one exists, use it
    • If multiple exist, use AskUserQuestion tool to let user select
  2. Read all artifacts and determine testing context
    Read from beat/changes/<name>/:
    • status.yaml (schema: references/status-schema.md)
    • features/*.feature (all Gherkin files, if gherkin status is done)
    • proposal.md (if exists)
    • design.md (if exists)
    • tasks.md (if exists)
    Read beat/config.yaml (if exists, schema: references/config-schema.md).
    Determine drive mode:
    • If gherkin status is done: Gherkin-driven verification
    • If gherkin status is skipped: Proposal-driven verification
    Determine testing context (three-layer priority: tag > source > config):
    • Config layer: Is testing.required set to false? If yes, skip test existence checks globally.
    • Source layer: Does status.yaml contain source: distill? If yes, Dimension 1 switches to accuracy mode (see below).
    • Tag layer: Every scenario in a .feature file is expected to have a corresponding test (in TDD mode).
  3. Dispatch verification subagent AND code-reviewer in parallel
    Launch BOTH agents simultaneously using a single message with two Agent tool calls:
    Agent A (verification subagent, subagent_type: Explore): Read verification-subagent-prompt.md for the complete subagent prompt.
    Provide ONLY:
    • All artifact contents (features, proposal, design, tasks)
    • Testing context (drive mode, testing config, source flag, tag counts)
    • Do NOT pass conversation history or session context.
    Agent B (code quality review, subagent_type: superpowers:code-reviewer):
    Provide:
    • The change name and description (from proposal or status.yaml)
    • List of files created/modified during apply
    • The planning document (tasks.md or proposal.md) as the "original plan"
    This reviews: code quality, architecture, naming, error handling, test quality, security, and plan alignment.
    Fallback: If one agent fails, proceed with the other's findings. If BOTH fail, report failure — do NOT self-verify.
  4. Run automated tests if available
    Detect and run the project's test suite:
    • Behavior tests: run using the testing.behavior framework (or auto-detect)
    • E2E tests: run using the testing.e2e framework (or auto-detect). If gherkin.modified exists in status.yaml, combine BDD feature paths: beat/features/ + beat/changes/<name>/features/
    • Report behavior and e2e results separately
  5. Present combined verification report
    Combine both subagent reports:
    • Dimensions 1-3 from verification subagent (spec alignment)
    • Dimension 4 from code-reviewer (code quality)
    • Dimension 5 from verification subagent (living docs sync — Layer 1/2/3, advisory only)
    • Step 4 test results (if available)
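
The drive-mode and testing-context decisions from step 2 could be sketched roughly as below. The sketch assumes status.yaml and beat/config.yaml have already been parsed into dictionaries, and the nested key layout (`status["gherkin"]["status"]`, `config["testing"]["required"]`) is a guess at the schemas, not taken from references/status-schema.md. The function name and the returned keys are hypothetical, and the per-scenario tag layer is not modelled.

```python
from typing import Any, Dict


def resolve_testing_context(status: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, Any]:
    """Derive drive mode and testing flags from already-parsed YAML dicts."""
    # Assumed layout: status["gherkin"]["status"] holds "done" or "skipped".
    gherkin_status = (status.get("gherkin") or {}).get("status")
    if gherkin_status == "done":
        drive_mode = "gherkin"
    elif gherkin_status == "skipped":
        drive_mode = "proposal"
    else:
        # Assumption: any other value is treated as an error in this sketch.
        raise ValueError(f"unexpected gherkin status: {gherkin_status!r}")

    # Config layer: only an explicit testing.required = false disables the checks.
    testing_required = (config.get("testing") or {}).get("required", True)
    check_test_existence = testing_required is not False

    # Source layer: "source: distill" switches Dimension 1 to accuracy mode.
    dimension1_mode = "accuracy" if status.get("source") == "distill" else "coverage"

    return {
        "drive_mode": drive_mode,
        "check_test_existence": check_test_existence,
        "dimension1_mode": dimension1_mode,
    }
```

The resulting dictionary is the kind of compact testing context that can be handed to the verification subagent in place of conversation history.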
Issue Classification
  • CRITICAL: Must fix (missing scenario test [in coverage mode], inaccurate scenario [in accuracy mode], unimplemented goal, design violation, security vulnerability)
  • WARNING: Should fix (partial coverage, possible divergence, non-executable test, Gherkin quality issues, code quality concerns, living-doc drift — Layer 1/2/3 sync gaps)
  • SUGGESTION: Nice to fix (pattern inconsistency, minor improvement, missing test in distill mode, module without README)
Dimension 5 is advisory — its findings classify as WARNING or SUGGESTION only, never CRITICAL. The user decides whether to act before archiving; living-doc drift never blocks the archive.
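
One way to enforce the advisory rule when merging reports is to cap Dimension 5 findings at WARNING. The sketch below is an assumption about how that could be done. `Severity`, `Finding`, `normalize`, and `has_must_fix` are hypothetical names, and downgrading a stray CRITICAL to WARNING is one possible policy, not something the text above prescribes.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    SUGGESTION = 1  # nice to fix
    WARNING = 2     # should fix
    CRITICAL = 3    # must fix


@dataclass(frozen=True)
class Finding:
    dimension: int
    severity: Severity
    message: str


def normalize(finding: Finding) -> Finding:
    # Dimension 5 (living docs sync) is advisory, so it is capped at WARNING.
    if finding.dimension == 5 and finding.severity == Severity.CRITICAL:
        return Finding(finding.dimension, Severity.WARNING, finding.message)
    return finding


def has_must_fix(findings: Iterable[Finding]) -> bool:
    # True only if a non-advisory dimension reported a CRITICAL issue.
    return any(normalize(f).severity == Severity.CRITICAL for f in findings)
```

For example, a report containing only a Dimension 5 drift finding never counts as must-fix, whatever severity the subagent assigned to it.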
Graceful Degradation
  • Gherkin skipped: skip Dimension 1, strengthen Dimension 2 (proposal alignment)
  • Only features exist: verify Gherkin coverage only
  • Features + proposal: verify coverage + alignment
  • Features + proposal + design: verify all four dimensions
  • Always note which checks were skipped and why
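
The degradation rules above amount to picking checks from whichever artifacts exist and recording a reason for every skipped check. A minimal sketch, covering only the spec-alignment dimensions (code quality comes from the code-reviewer regardless): `plan_checks` and the check labels are hypothetical, and "design adherence" is an assumed name for the third dimension.

```python
from typing import Dict, List, Tuple


def plan_checks(
    gherkin_done: bool, has_proposal: bool, has_design: bool
) -> Tuple[List[str], Dict[str, str]]:
    """Return the checks to run and a reason for each check that is skipped."""
    run: List[str] = []
    skipped: Dict[str, str] = {}

    # Dimension 1 depends on Gherkin features being present and done.
    if gherkin_done:
        run.append("gherkin coverage")
    else:
        skipped["gherkin coverage"] = "gherkin status is skipped"

    # Dimension 2 needs the proposal; it carries more weight when Gherkin is skipped.
    if has_proposal:
        run.append("proposal alignment")
    else:
        skipped["proposal alignment"] = "proposal.md not found"

    # Assumed third dimension: adherence to design.md.
    if has_design:
        run.append("design adherence")
    else:
        skipped["design adherence"] = "design.md not found"

    return run, skipped
```

The `skipped` mapping is what feeds the "which checks were skipped and why" note in the final report.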