autistic-code-review

Autistic Code Review

Goal

Audit an implementation end-to-end, with or without a formal plan, and produce a defensible review with evidence from code, diffs, tests, and manual UI verification.

When to use

Use this skill when the user asks for a broad post-implementation review such as:

comparing implementation to an attached plan or handoff
reviewing uncommitted or committed changes for regressions and bugs
manually verifying front-end behavior with Playwright and/or agent-browser
assessing strategic implementation quality, not only local correctness
identifying test coverage gaps, adding tests, and running suites across application and database layers

Entry criteria

Check these preconditions before deep review:

repo scope is clear (`cwd`, target project, and base branch/range known)
change scope is available (`git status`/`git diff` or explicit commit range)
runnable environment exists for intended checks (tests/build/dev server as needed)
UI verification prerequisites are known (auth path, test user/role, seed state)
DB review prerequisites are known when relevant (local DB state, migration order, reset/test commands)
test command set is known (`npm test`/`vitest`, `supabase test db`, and any targeted commands)

If any criterion fails, continue with available lanes and clearly report blocked coverage.
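The fallback rule above can be sketched as a lookup from unmet criteria to the review lanes they block. This is a minimal illustration, not part of the skill: the criterion keys, the lane names, and the `LANES_BLOCKED_BY` mapping are all hypothetical and would need to match the actual project.

```python
# Hypothetical lane names and criterion-to-lane mapping (assumptions for this sketch).
ALL_LANES = ["code", "ui", "db", "tests"]
LANES_BLOCKED_BY = {
    "runnable_environment": ["ui", "tests"],
    "ui_prerequisites": ["ui"],
    "db_prerequisites": ["db"],
    "test_commands": ["tests"],
}


def plan_lanes(criteria: dict) -> tuple:
    """Return (available lanes, {blocked lane: [unmet criteria]})."""
    blocked = {}
    for name, satisfied in criteria.items():
        if satisfied:
            continue
        # Record which unmet criterion blocks which lane, for the report.
        for lane in LANES_BLOCKED_BY.get(name, []):
            blocked.setdefault(lane, []).append(name)
    available = [lane for lane in ALL_LANES if lane not in blocked]
    return available, blocked
```

The second return value is what would feed the "blocked coverage" part of the report.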

Inputs

Gather the following before review:

Intention source (preferred in this order):

`.plan.md` file path, or
pasted implementation/handoff text in the prompt, or
no-plan mode (derive expected behavior from changed files, tests, docs, and commit/diff context)

Change scope:

uncommitted (`git status`, `git diff`), or
committed range (`git diff <base>...HEAD`)

UI scope:

routes/pages to verify, pulled from plan, tests, docs, and changed files

Test scope:

app-layer test framework/commands
DB-layer test framework/commands (for example pgTAP via `supabase test db`)

If any item is missing and blocks execution, ask one short question. Otherwise, state assumptions and proceed.
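As a rough illustration of the change-scope input, the helper below picks the git commands listed above depending on whether a base ref is supplied. The function name `change_scope_commands` is hypothetical; only the git invocations come from the text.

```python
from typing import List, Optional


def change_scope_commands(base: Optional[str] = None) -> List[List[str]]:
    """Return the git commands that expose the change scope.

    No base ref  -> uncommitted work: `git status` and `git diff`.
    With base ref -> committed range: `git diff <base>...HEAD`.
    """
    if base is None:
        return [["git", "status"], ["git", "diff"]]
    return [["git", "diff", f"{base}...HEAD"]]
```

Each returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` from inside the repository to collect the evidence.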

Review modes

Select one mode explicitly at the start of the review (`self-review` is a modifier that can combine with any of the other three):

`plan` mode

Use when a `.plan.md` is available.
Evaluate strict plan-to-implementation alignment.

`handoff` mode

Use when only prompt/handoff intent is available.
Evaluate claim-to-implementation alignment.

`no-plan` mode

Use when no plan/handoff is provided.
Skip strict alignment claims and focus on correctness, regressions, UX behavior, coverage, and strategy quality.

`self-review` mode

Use when the same agent that implemented changes performs the review.
Treat prior assumptions as untrusted and require diff/test/UI evidence for every claim.
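The selection rules above could be expressed roughly as follows. This is a sketch under assumptions: the function and parameter names are invented for illustration, and it treats self-review as a flag returned alongside the mode rather than as a fourth exclusive value.

```python
from typing import Optional, Tuple


def select_mode(
    plan_path: Optional[str],
    handoff_text: Optional[str],
    reviewer_is_implementer: bool,
) -> Tuple[str, bool]:
    """Return (mode, self_review) following the order plan > handoff > no-plan."""
    if plan_path:
        mode = "plan"
    elif handoff_text and handoff_text.strip():
        mode = "handoff"
    else:
        mode = "no-plan"
    # self-review is orthogonal: it tightens evidence thresholds in any mode.
    return mode, reviewer_is_implementer
```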

Parallel subagents

Run parallel subagents with explicit, non-overlapping responsibilities:

`plan-alignment-reviewer`

Build an intention-to-evidence matrix from plan/handoff claims.
Verify each claim against actual file diffs.
Flag missing, partial, or extra implementation.

`ui-verification-reviewer`

Perform manual UI checks using Playwright or agent-browser.
Validate key user paths and permissions/role gating.
Record pass/fail with exact route and observed behavior.

`technical-risk-reviewer`

Perform code review on changed files.
Prioritize bugs, regressions, data/permission risks, and design-level defects.
Include file references and concrete failure modes.

`strategic-reviewer`

Evaluate architecture and implementation strategy.
Identify coupling, migration safety gaps, maintainability risks, and scalability concerns.
Suggest alternatives only when they materially reduce risk.

`test-coverage-reviewer`

Determine test coverage for changed behavior across app and DB layers.
Identify missing tests and high-risk untested paths.
Suggest and/or create targeted tests to close gaps.
Run relevant suites and report results with command evidence.
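One way to picture the parallel dispatch is a thread pool with one task per lane. This is only a sketch: `run_subagent` is a hypothetical placeholder for whatever agent runtime is actually in use, and here it just echoes which context keys it received.

```python
from concurrent.futures import ThreadPoolExecutor

SUBAGENTS = [
    "plan-alignment-reviewer",
    "ui-verification-reviewer",
    "technical-risk-reviewer",
    "strategic-reviewer",
    "test-coverage-reviewer",
]


def run_subagent(name: str, context: dict) -> dict:
    """Placeholder for a real agent call (assumption for this sketch)."""
    return {"subagent": name, "context_keys": sorted(context)}


def dispatch(contexts: dict) -> dict:
    """Run all five lanes concurrently, each with only its own context."""
    with ThreadPoolExecutor(max_workers=len(SUBAGENTS)) as pool:
        futures = {
            name: pool.submit(run_subagent, name, contexts.get(name, {}))
            for name in SUBAGENTS
        }
        # result() blocks until that lane finishes; order follows SUBAGENTS.
        return {name: future.result() for name, future in futures.items()}
```

Keying the contexts by subagent name keeps responsibilities non-overlapping: a lane never sees another lane's inputs.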

Subagent output contract

Require each subagent to return this exact structure:

`findings`: severity-ranked items with file references when applicable
`evidence`: concrete observations (diff snippet summary, command result, UI observation)
`confidence`: `high | medium | low` per finding
`unverified_assumptions`: assumptions that could change conclusions
`blocked_items`: what could not be validated and why

Reject subagent output that is opinion-only or lacks evidence.
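A contract check along these lines could decide whether to accept an output or request a retry. It is a minimal sketch that assumes each field arrives as a list and that `confidence` holds one entry per finding; the source does not prescribe a wire format, so both are assumptions.

```python
REQUIRED_KEYS = (
    "findings",
    "evidence",
    "confidence",
    "unverified_assumptions",
    "blocked_items",
)
CONFIDENCE_LEVELS = {"high", "medium", "low"}


def contract_violations(output: dict) -> list:
    """Return a list of problems; an empty list means the output is acceptable."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in output]
    if problems:
        return problems
    if output["findings"] and not output["evidence"]:
        problems.append("findings present but evidence is empty")
    if len(output["confidence"]) != len(output["findings"]):
        problems.append("confidence must have one entry per finding")
    for level in output["confidence"]:
        if level not in CONFIDENCE_LEVELS:
            problems.append(f"invalid confidence: {level}")
    return problems
```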

UI coverage matrix

Build and execute a minimal matrix:

persona/role x route/page x key action x expected result
include at least one happy path and one negative/permission-boundary path per protected area
include a navigation/gating check (route guard, menu visibility, or access denial behavior)
record each matrix row as `pass`, `fail`, or `blocked`

When blocked, capture exact blocker and the attempted step.
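The matrix rules above can be checked mechanically once the rows are written down. The sketch below is illustrative only: the `MatrixRow` fields, the `kind` labels (`happy`, `negative`, `gating`), and the idea of matching a protected area by route prefix are assumptions, not part of the skill.

```python
from dataclasses import dataclass

STATUSES = {"pass", "fail", "blocked"}


@dataclass
class MatrixRow:
    role: str
    route: str
    action: str
    expected: str
    kind: str    # "happy", "negative", or "gating" (labels assumed here)
    status: str  # "pass", "fail", or "blocked"
    note: str = ""  # observed result, or exact blocker and attempted step

    def __post_init__(self):
        if self.status not in STATUSES:
            raise ValueError(f"invalid status: {self.status}")


def coverage_gaps(rows: list, protected_areas: list) -> list:
    """List the minimum-coverage rules the matrix does not yet satisfy."""
    gaps = []
    for area in protected_areas:
        kinds = {row.kind for row in rows if row.route.startswith(area)}
        for needed in ("happy", "negative"):
            if needed not in kinds:
                gaps.append(f"{area}: no {needed} path")
    if not any(row.kind == "gating" for row in rows):
        gaps.append("no navigation/gating check")
    return gaps
```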

Test coverage matrix

Build and execute a minimal matrix:

changed component/module/function/table/RPC x existing tests x gap x action
app layer: unit/integration tests for changed behavior and boundary cases
DB layer: pgTAP (or equivalent) coverage for changed tables, policies, functions, and permissions
include at least one negative path for each changed permission-sensitive behavior

Action values:

`covered` (existing tests already sufficient)
`add-tests` (write targeted tests)
`deferred` (cannot safely add in scope; justify)

When `add-tests` is chosen, create focused tests and run affected suites.

Workflow

Establish scope and evidence

Determine review mode (`plan`, `handoff`, or `no-plan`) and whether review is `self-review`.
Read plan/handoff text when provided.
Enumerate changed files and classify by area (DB/schema, server, client, tests/docs).
Derive expected outcomes from the best available intention source for the selected mode.

Validate entry criteria and set timebox

Confirm entry criteria; note any missing prerequisites.
Set a review timebox and prioritize critical paths first (permissions, data integrity, primary UI flows, high-risk untested changes).

Dispatch the five subagents in parallel

Provide each subagent only the context needed for its lane.
Require each subagent to return contract-compliant output.

Run UI verification explicitly

Start from user-visible flows (routes, nav, forms, role-conditional UI).
Verify both happy path and at least one negative/permission boundary path.
When blocked (auth, env, seed data), report blocker and partial coverage.

Run DB/migration checklist when schema or SQL changed

check RLS/policy behavior against intended access model
check migration safety (ordering, idempotency where relevant, rollback feasibility)
check grants/privileges drift and RPC exposure changes
check seed/test/type-generation consistency with schema changes

Close test coverage gaps

map changed behaviors to existing tests (app + DB)
create targeted tests for high-risk uncovered behavior where feasible
run relevant app-layer and DB-layer suites
capture exact commands and pass/fail output summary

Consolidate findings

de-duplicate overlaps across subagents
convert raw notes into severity-ranked findings
separate confirmed defects from open questions

Deliver review result

findings first (highest severity first)
then alignment/reconstruction matrix, UI status, coverage status, technical analysis, strategic analysis, artifacts, and verdict
if timebox expires or blockers remain, provide partial verdict with explicit coverage gaps
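The consolidation step (de-duplicate, then rank by severity) might look like the sketch below. It assumes each finding is a dict with `severity`, `file`, and `title` keys and that two findings with the same file and title are duplicates; both are illustrative assumptions. Severity labels follow the P0 to P3 scale used by this skill.

```python
SEVERITY_ORDER = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}


def consolidate(findings: list) -> list:
    """De-duplicate by (file, title), keep the most severe copy, sort by severity."""
    best = {}
    for finding in findings:
        key = (finding["file"], finding["title"])
        current = best.get(key)
        if current is None or (
            SEVERITY_ORDER[finding["severity"]] < SEVERITY_ORDER[current["severity"]]
        ):
            best[key] = finding
    # sorted() is stable, so findings of equal severity keep first-seen order.
    return sorted(best.values(), key=lambda f: SEVERITY_ORDER[f["severity"]])
```

Keeping the most severe copy when lanes disagree errs on the side of not burying a critical defect.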

Severity model

Use this priority scale:

`P0`: release-blocking correctness or security issue
`P1`: high-risk bug/regression likely to affect production behavior
`P2`: meaningful correctness/maintainability/test gap
`P3`: minor issue or improvement opportunity

Sign-off gates

Apply these gates before issuing the final verdict:

do not return `aligned` if any open `P0` or `P1` exists
do not return `aligned` when critical UI flows are `blocked` without mitigation evidence
do not return `aligned` when DB/migration changes were made but DB checklist was skipped
do not return `aligned` when high-risk changed behavior has unresolved coverage gaps or failing tests
in `no-plan` mode, return `no-plan reviewed` (never strict `aligned`)
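The gates above amount to a list of reasons that rule out an `aligned` verdict. The sketch below encodes them one to one; the `ReviewState` fields are hypothetical names for facts the reviewer would have gathered, and the function deliberately does not choose between the remaining verdicts, since the gates do not specify that.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewState:
    mode: str  # "plan", "handoff", or "no-plan"
    open_severities: list = field(default_factory=list)  # e.g. ["P1", "P3"]
    critical_ui_blocked_unmitigated: bool = False
    db_changed: bool = False
    db_checklist_run: bool = False
    high_risk_gaps_or_failing_tests: bool = False


def aligned_blockers(state: ReviewState) -> list:
    """Return the gates that forbid `aligned`; an empty list means it is allowed."""
    reasons = []
    if any(sev in ("P0", "P1") for sev in state.open_severities):
        reasons.append("open P0/P1 finding")
    if state.critical_ui_blocked_unmitigated:
        reasons.append("critical UI flow blocked without mitigation evidence")
    if state.db_changed and not state.db_checklist_run:
        reasons.append("DB/migration changes without DB checklist")
    if state.high_risk_gaps_or_failing_tests:
        reasons.append("unresolved high-risk coverage gaps or failing tests")
    if state.mode == "no-plan":
        reasons.append("no-plan mode: verdict must be 'no-plan reviewed'")
    return reasons
```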

Output template

markdown

Review target: `<plan path or prompt summary>`
Review mode: `<plan | handoff | no-plan>` (+ `self-review` when applicable)
Change scope: `<uncommitted | commit range>`

Findings:
1. [P1] <title> — `<file:line>`
   Evidence: <what was observed>
   Impact: <user/system impact>
   Recommendation: <concrete fix>
1. [P2] <title> — `<file:line>`
   Evidence: <what was observed>
   Impact: <user/system impact>
   Recommendation: <concrete fix>

Plan alignment matrix (for `plan`/`handoff` modes):
1. `<planned item>` -> `<implemented evidence>` -> `<aligned | partial | missing | extra>`
1. `<planned item>` -> `<implemented evidence>` -> `<aligned | partial | missing | extra>`

Intent reconstruction matrix (for `no-plan` mode):
1. `<inferred expected behavior>` -> `<implemented evidence>` -> `<confirmed | partial | contradicted>`
1. `<inferred expected behavior>` -> `<implemented evidence>` -> `<confirmed | partial | contradicted>`

UI verification:
1. `<route + area + action>` -> `<pass/fail/blocked>` -> `<observed result>`
1. `<route + area + action>` -> `<pass/fail/blocked>` -> `<observed result>`
Blockers: <none or list>

Test coverage:
1. `<changed behavior>` -> `<existing coverage>` -> `<gap>` -> `<covered | add-tests | deferred>`
1. `<changed behavior>` -> `<existing coverage>` -> `<gap>` -> `<covered | add-tests | deferred>`
Test execution:
- `<command>` -> `<pass/fail>` -> `<key result>`
- `<command>` -> `<pass/fail>` -> `<key result>`

Technical analysis:
- `<top technical risk or confirmation>`
- `<top technical risk or confirmation>`

Strategic analysis:
- `<strategy strength/weakness>`
- `<strategy strength/weakness>`

Review artifacts:
- `<commands run and key outcomes>`
- `<ui evidence: screenshots/log notes or blocker proof>`
- `<coverage summary: tested vs blocked vs deferred>`

Verdict: `<aligned | partially aligned | not aligned | no-plan reviewed>`
Recommended next steps:
1. <step>
1. <step>

Guardrails

do not mark `aligned` unless plan claims are evidenced in diffs/tests/UI checks
in `no-plan` mode, do not claim strict alignment; use verdict `no-plan reviewed`
do not bury critical defects under summary text; findings must appear first
if UI cannot be fully executed, provide exact blocker and what was still validated
if tests cannot be executed, list exact missing prerequisites and impacted confidence
prefer concrete, falsifiable statements over broad judgments
in `self-review` mode, call out reviewer/implementer overlap and keep evidence thresholds strict
enforce subagent output contract; request retries for incomplete outputs
if review is partial due to blockers/timebox, say so explicitly in verdict context

Subagent prompt pack

Use these prompts as-is, replacing placeholders.

Parent orchestration prompt

text

Run autistic-code-review.

Context:
- Review target: <plan path OR handoff summary OR "none">
- Review mode: <plan | handoff | no-plan>
- Self-review: <yes | no>
- Change scope: <uncommitted | commit range>
- Repo/project path: <path>
- UI routes in scope: <route list>
- Test commands in scope: <app commands + DB commands>
- Timebox: <minutes>

Execution requirements:
1) Spawn five parallel subagents:
   - plan-alignment-reviewer
   - ui-verification-reviewer
   - technical-risk-reviewer
   - strategic-reviewer
   - test-coverage-reviewer
2) Enforce this output contract for every subagent:
   - findings
   - evidence
   - confidence
   - unverified_assumptions
   - blocked_items
3) Reject and retry any subagent output that lacks evidence.
4) Require the test-coverage-reviewer to suggest/create tests for uncovered high-risk changes and run relevant suites.
5) Consolidate results into one findings-first report with severity ordering.
6) Apply sign-off gates from the skill and produce a final verdict.

Prompt:

plan-alignment-reviewer

text

You are the plan-alignment-reviewer.

Inputs:
- Review mode: <plan | handoff | no-plan>
- Intention source: <plan path or handoff text; can be empty in no-plan mode>
- Change scope: <uncommitted | commit range>
- Changed file list/diff summary: <insert>

Tasks:
1) Build an intention-to-evidence matrix from intention claims and actual diffs.
2) For each claim, classify as aligned, partial, missing, or extra.
3) In no-plan mode, produce an intent reconstruction matrix:
   - inferred expected behavior -> implemented evidence -> confirmed/partial/contradicted
4) Flag any claimed work not evidenced in code/tests/docs.

Return exactly:
- findings: severity-ranked issues with file refs
- evidence: specific diff/test/doc observations
- confidence: high/medium/low per finding
- unverified_assumptions: assumptions and why
- blocked_items: what prevented validation

Prompt:

ui-verification-reviewer

text

You are the ui-verification-reviewer.

Inputs:
- UI scope routes/pages: <insert>
- Personas/roles: <insert>
- Environment/access constraints: <insert>
- Change scope summary: <insert>

Tasks:
1) Use Playwright and/or agent-browser to manually verify UI behavior.
2) Build and execute a coverage matrix:
   - role x route/page x key action x expected result
3) Include at least:
   - one happy path per protected area
   - one negative/permission-boundary path per protected area
   - one gating/navigation check (route guard/menu visibility/access denial)
4) Record each row as pass/fail/blocked with observed result.
5) Capture evidence artifacts (screenshots/log notes) for failures or blockers.

Return exactly:
- findings: severity-ranked UI defects/regressions
- evidence: route-level observations and artifact references
- confidence: high/medium/low per finding
- unverified_assumptions: missing env/auth/data assumptions
- blocked_items: exact blocker + attempted step

Prompt:

technical-risk-reviewer

text

You are the technical-risk-reviewer.

Inputs:
- Changed files and diff: <insert>
- Related tests/docs/commands run: <insert>
- Review mode and constraints: <insert>

Tasks:
1) Perform a code review focused on:
   - correctness bugs
   - behavioral regressions
   - data integrity and permission risks
   - missing or weak tests
2) If SQL/schema changed, run DB/migration checklist:
   - RLS/policy behavior vs intended access model
   - migration safety, ordering, rollback feasibility
   - grants/privileges/RPC exposure drift
   - seed/test/type-generation consistency with schema changes
3) Prioritize findings by P0-P3 and include file references.

Return exactly:
- findings: severity-ranked technical issues with file refs
- evidence: concrete code/diff/test command observations
- confidence: high/medium/low per finding
- unverified_assumptions: what is assumed but unproven
- blocked_items: checks that could not be completed

Prompt:

strategic-reviewer

text

You are the strategic-reviewer.

Inputs:
- Implementation summary: <insert>
- Changed areas by layer (db/server/client/tests/docs): <insert>
- Review mode: <insert>

Tasks:
1) Evaluate implementation strategy quality:
   - architecture cohesion and coupling
   - migration/cutover safety and operability
   - maintainability and future change cost
   - scalability and team workflow implications
2) Identify strategic weaknesses and practical alternatives.
3) Recommend only changes that materially reduce risk or complexity.

Return exactly:
- findings: severity-ranked strategic risks/anti-patterns
- evidence: concrete repo or diff observations
- confidence: high/medium/low per finding
- unverified_assumptions: strategic assumptions needing confirmation
- blocked_items: missing context that limits confidence

Prompt:

test-coverage-reviewer

text

You are the test-coverage-reviewer.

Inputs:
- Changed files and diff: <insert>
- Existing tests in scope: <insert>
- Test commands:
  - app layer: <insert>
  - DB layer (pgTAP or equivalent): <insert>
- Review mode and constraints: <insert>

Tasks:
1) Build a coverage matrix:
   - changed behavior -> existing tests -> gap -> action
2) Identify high-risk untested behavior in app and DB layers.
3) Suggest and create targeted tests to close feasible gaps.
   - app layer: unit/integration tests for changed behavior and boundaries
   - DB layer: pgTAP tests for changed tables/functions/policies/permissions
4) Run relevant test suites after test additions/updates.
5) Report pass/fail and any remaining uncovered high-risk behavior.

Return exactly:
- findings: severity-ranked coverage and test-quality issues
- evidence: coverage matrix + test diffs + command results
- confidence: high/medium/low per finding
- unverified_assumptions: assumptions about environment/data/setup
- blocked_items: tests not run or not creatable and why

Consolidation prompt (optional)

text

Consolidate five subagent outputs into one final review.

Rules:
1) Findings first, highest severity first, deduplicated across lanes.
2) Keep only evidence-backed findings.
3) Include mode-appropriate matrix:
   - plan/handoff -> plan alignment matrix
   - no-plan -> intent reconstruction matrix
4) Include UI verification status, blockers, and coverage summary.
5) Include test coverage matrix, tests added/suggested, and execution results.
6) Apply sign-off gates before verdict.
7) Verdict allowed values:
   - aligned
   - partially aligned
   - not aligned
   - no-plan reviewed
