Autistic Code Review

Goal

Audit an implementation end-to-end, with or without a formal plan, and produce a defensible review with evidence from code, diffs, tests, and manual UI verification.

When to use

Use this skill when the user asks for a broad post-implementation review such as:

comparing implementation to an attached plan or handoff
reviewing uncommitted or committed changes for regressions and bugs
manually verifying front-end behavior with Playwright and/or agent-browser
assessing strategic implementation quality, not only local correctness
identifying test coverage gaps, adding tests, and running suites across application and database layers

Entry criteria

Check these preconditions before deep review:

repo scope is clear (`cwd`, target project, and base branch/range known)
change scope is available (`git status`/`git diff` or explicit commit range)
runnable environment exists for intended checks (tests/build/dev server as needed)
UI verification prerequisites are known (auth path, test user/role, seed state)
DB review prerequisites are known when relevant (local DB state, migration order, reset/test commands)
test command set is known (`npm test`/`vitest`, `supabase test db`, and any targeted commands)

If any criterion fails, continue with available lanes and clearly report blocked coverage.
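A minimal sketch of the "continue with available lanes" rule, assuming each criterion is tagged with the lanes that depend on it. The `Criterion` type, the `plan_lanes` helper, and the lane names (`code`, `ui`, `db`, `tests`) are hypothetical and not defined by this skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    satisfied: bool
    lanes: tuple[str, ...]  # hypothetical lane names that depend on this criterion


def plan_lanes(criteria):
    """Split lanes into runnable ones and blocked ones (with the failed criteria)."""
    all_lanes = sorted({lane for c in criteria for lane in c.lanes})
    blocked = {}
    for c in criteria:
        if not c.satisfied:
            for lane in c.lanes:
                blocked.setdefault(lane, []).append(c.name)
    runnable = [lane for lane in all_lanes if lane not in blocked]
    return runnable, blocked


criteria = [
    Criterion("repo scope", True, ("code", "tests")),
    Criterion("ui prerequisites", False, ("ui",)),
    Criterion("db prerequisites", True, ("db",)),
]
runnable, blocked = plan_lanes(criteria)
```

Here `blocked` carries the reason for each lane that cannot run, which is what the final report needs to state blocked coverage explicitly.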

Inputs

Gather the following before review:

Intention source (preferred in this order):

`.plan.md` file path, or
pasted implementation/handoff text in the prompt, or
no-plan mode (derive expected behavior from changed files, tests, docs, and commit/diff context)

Change scope:

uncommitted (`git status`, `git diff`), or
committed range (`git diff <base>...HEAD`)

UI scope:

routes/pages to verify, pulled from plan, tests, docs, and changed files

Test scope:

app-layer test framework/commands
DB-layer test framework/commands (for example pgTAP via `supabase test db`)

If any item is missing and blocks execution, ask one short question. Otherwise, state assumptions and proceed.

Review modes

Select one mode explicitly at the start of the review:

`plan` mode

Use when a `.plan.md` is available.
Evaluate strict plan-to-implementation alignment.

`handoff` mode

Use when only prompt/handoff intent is available.
Evaluate claim-to-implementation alignment.

`no-plan` mode

Use when no plan/handoff is provided.
Skip strict alignment claims and focus on correctness, regressions, UX behavior, coverage, and strategy quality.

`self-review` mode

Use when the same agent that implemented changes performs the review.
Treat prior assumptions as untrusted and require diff/test/UI evidence for every claim.
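Mode selection follows the intention-source preference order, with `self-review` as a flag layered on top of the chosen mode. The sketch below assumes a hypothetical `select_mode` helper; the parameter names are illustrative only.

```python
from typing import Optional


def select_mode(
    plan_path: Optional[str],
    handoff_text: Optional[str],
    reviewer_is_implementer: bool,
) -> tuple[str, bool]:
    """Return (mode, self_review) using the plan > handoff > no-plan preference."""
    if plan_path:
        mode = "plan"
    elif handoff_text and handoff_text.strip():
        mode = "handoff"
    else:
        mode = "no-plan"
    # self-review is tracked separately: it tightens evidence rules in any mode.
    return mode, reviewer_is_implementer
```

For example, `select_mode(None, "Implemented role gating for /admin", True)` yields `("handoff", True)`.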

Parallel subagents

Run parallel subagents with explicit, non-overlapping responsibilities:

`plan-alignment-reviewer`

Build an intention-to-evidence matrix from plan/handoff claims.
Verify each claim against actual file diffs.
Flag missing, partial, or extra implementation.

`ui-verification-reviewer`

Perform manual UI checks using Playwright or agent-browser.
Validate key user paths and permissions/role gating.
Record pass/fail with exact route and observed behavior.

`technical-risk-reviewer`

Perform code review on changed files.
Prioritize bugs, regressions, data/permission risks, and design-level defects.
Include file references and concrete failure modes.

`strategic-reviewer`

Evaluate architecture and implementation strategy.
Identify coupling, migration safety gaps, maintainability risks, and scalability concerns.
Suggest alternatives only when they materially reduce risk.

`test-coverage-reviewer`

Determine test coverage for changed behavior across app and DB layers.
Identify missing tests and high-risk untested paths.
Suggest and/or create targeted tests to close gaps.
Run relevant suites and report results with command evidence.

Subagent output contract

Require each subagent to return this exact structure:

`findings`: severity-ranked items with file references when applicable
`evidence`: concrete observations (diff snippet summary, command result, UI observation)
`confidence`: `high | medium | low` per finding
`unverified_assumptions`: assumptions that could change conclusions
`blocked_items`: what could not be validated and why

Reject subagent output that is opinion-only or lacks evidence.
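One way to enforce the contract mechanically is sketched below. It assumes subagent output arrives as a dict whose five keys hold lists, with `confidence` parallel to `findings`; that shape and the `contract_violations` name are assumptions, since the skill only fixes the field names.

```python
REQUIRED_KEYS = (
    "findings",
    "evidence",
    "confidence",
    "unverified_assumptions",
    "blocked_items",
)
CONFIDENCE_LEVELS = {"high", "medium", "low"}


def contract_violations(output: dict) -> list[str]:
    """Return reasons to reject a subagent output; an empty list means accept."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in output]
    if problems:
        return problems
    findings = output["findings"]
    confidence = output["confidence"]
    if findings and not output["evidence"]:
        problems.append("findings present but evidence is empty")
    if len(confidence) != len(findings):
        problems.append("confidence must have one entry per finding")
    for level in confidence:
        if level not in CONFIDENCE_LEVELS:
            problems.append(f"invalid confidence: {level!r}")
    return problems
```

A non-empty result is the signal to request a retry from that subagent.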

UI coverage matrix

Build and execute a minimal matrix:

persona/role x route/page x key action x expected result
include at least one happy path and one negative/permission-boundary path per protected area
include a navigation/gating check (route guard, menu visibility, or access denial behavior)
record each matrix row as `pass`, `fail`, or `blocked`

When blocked, capture exact blocker and the attempted step.
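The minimum requirements above can be checked before the matrix is reported. This is a sketch under assumptions: the `UiRow` fields, the `kind` labels (`happy`, `negative`, `gating`), and the `matrix_gaps` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

VALID_STATUSES = {"pass", "fail", "blocked"}


@dataclass
class UiRow:
    role: str
    route: str
    action: str
    expected: str
    kind: str    # hypothetical label: "happy", "negative", or "gating"
    area: str    # protected area the row belongs to
    status: str  # "pass", "fail", or "blocked"
    note: str = ""  # observed result, or blocker plus attempted step


def matrix_gaps(rows, protected_areas):
    """List what the matrix is still missing relative to the minimum rules."""
    gaps = []
    for area in protected_areas:
        kinds = {row.kind for row in rows if row.area == area}
        for needed in ("happy", "negative"):
            if needed not in kinds:
                gaps.append(f"{area}: no {needed} path")
    if not any(row.kind == "gating" for row in rows):
        gaps.append("no navigation/gating check")
    for row in rows:
        if row.status not in VALID_STATUSES:
            gaps.append(f"{row.route}: invalid status {row.status!r}")
        elif row.status == "blocked" and not row.note:
            gaps.append(f"{row.route}: blocked without blocker and attempted step")
    return gaps
```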

Test coverage matrix

Build and execute a minimal matrix:

changed component/module/function/table/DB function/RPC x existing tests x gap x action
app layer: unit/integration tests for changed behavior and boundary cases
DB layer: pgTAP (or equivalent) coverage for changed tables, policies, functions, and permissions
include at least one negative path for each changed permission-sensitive behavior

Action values:

`covered` (existing tests already sufficient)
`add-tests` (write targeted tests)
`deferred` (cannot safely add in scope; justify)

When `add-tests` is chosen, create focused tests and run affected suites.
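The three action values can be derived per matrix row roughly as follows. This is a sketch, assuming two yes/no judgments per row; the `choose_action` helper and its parameters are hypothetical.

```python
def choose_action(
    has_sufficient_tests: bool,
    can_add_in_scope: bool,
    justification: str = "",
) -> str:
    """Map a coverage-matrix row to covered, add-tests, or deferred."""
    if has_sufficient_tests:
        return "covered"
    if can_add_in_scope:
        return "add-tests"
    # deferred is only acceptable with a stated reason.
    if not justification.strip():
        raise ValueError("deferred requires a justification")
    return "deferred"
```

Raising on an unjustified `deferred` keeps the "justify" requirement from being silently skipped.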

Workflow

Establish scope and evidence

Determine review mode (`plan`, `handoff`, or `no-plan`) and whether review is `self-review`.
Read plan/handoff text when provided.
Enumerate changed files and classify by area (DB/schema, server, client, tests/docs).
Derive expected outcomes from the best available intention source for the selected mode.

Validate entry criteria and set timebox

Confirm entry criteria; note any missing prerequisites.
Set a review timebox and prioritize critical paths first (permissions, data integrity, primary UI flows, high-risk untested changes).

Dispatch the five subagents in parallel

Provide each subagent only the context needed for its lane.
Require each subagent to return contract-compliant output.

Run UI verification explicitly

Start from user-visible flows (routes, nav, forms, role-conditional UI).
Verify both happy path and at least one negative/permission boundary path.
When blocked (auth, env, seed data), report blocker and partial coverage.

Run DB/migration checklist when schema or SQL changed

check RLS/policy behavior against intended access model
check migration safety (ordering, idempotency where relevant, rollback feasibility)
check grants/privileges drift and RPC exposure changes
check seed/test/type-generation consistency with schema changes

Close test coverage gaps

map changed behaviors to existing tests (app + DB)
create targeted tests for high-risk uncovered behavior where feasible
run relevant app-layer and DB-layer suites
capture exact commands and pass/fail output summary

Consolidate findings

de-duplicate overlaps across subagents
convert raw notes into severity-ranked findings
separate confirmed defects from open questions

Deliver review result

findings first (highest severity first)
then alignment/reconstruction matrix, UI status, coverage status, technical analysis, strategic analysis, artifacts, and verdict
if timebox expires or blockers remain, provide partial verdict with explicit coverage gaps
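The parallel dispatch step in the workflow above can be sketched with `asyncio`. The `run_subagent` coroutine is a hypothetical stand-in for whatever mechanism actually launches a subagent; only the five lane names come from this skill.

```python
import asyncio

LANES = (
    "plan-alignment-reviewer",
    "ui-verification-reviewer",
    "technical-risk-reviewer",
    "strategic-reviewer",
    "test-coverage-reviewer",
)


async def run_subagent(name: str, context: dict) -> dict:
    """Hypothetical stub: a real version would invoke the subagent with its prompt."""
    await asyncio.sleep(0)  # yield control, as a real I/O-bound call would
    return {
        "lane": name,
        "findings": [],
        "evidence": [f"context keys: {sorted(context)}"],
        "confidence": [],
        "unverified_assumptions": [],
        "blocked_items": [],
    }


async def dispatch(contexts: dict) -> list:
    """Run all five lanes concurrently, giving each only its own context."""
    tasks = [run_subagent(name, contexts.get(name, {})) for name in LANES]
    return await asyncio.gather(*tasks)  # results come back in LANES order


def run_review(contexts: dict) -> list:
    return asyncio.run(dispatch(contexts))
```

Keying `contexts` by lane name mirrors the rule that each subagent receives only the context needed for its lane.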

Severity model

Use this priority scale:

`P0`: release-blocking correctness or security issue
`P1`: high-risk bug/regression likely to affect production behavior
`P2`: meaningful correctness/maintainability/test gap
`P3`: minor issue or improvement opportunity
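Consolidation needs findings de-duplicated and ordered by this scale. A minimal sketch follows, assuming findings are dicts with `severity`, `title`, and `file` keys and that two findings are duplicates when file and normalized title match; both the shape and the duplicate rule are assumptions.

```python
SEVERITY_ORDER = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}


def consolidate(findings):
    """De-duplicate findings across lanes, keep the most severe copy, sort P0 first."""
    best = {}
    for finding in findings:
        key = (finding["file"], finding["title"].strip().lower())
        current = best.get(key)
        if current is None or (
            SEVERITY_ORDER[finding["severity"]] < SEVERITY_ORDER[current["severity"]]
        ):
            best[key] = finding
    return sorted(best.values(), key=lambda f: SEVERITY_ORDER[f["severity"]])
```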

Sign-off gates

Apply these gates before issuing the final verdict:

do not return `aligned` if any open `P0` or `P1` exists
do not return `aligned` when critical UI flows are `blocked` without mitigation evidence
do not return `aligned` when DB/migration changes were made but DB checklist was skipped
do not return `aligned` when high-risk changed behavior has unresolved coverage gaps or failing tests
in `no-plan` mode, return `no-plan reviewed` (never strict `aligned`)
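The gates reduce to a list of reasons that rule out `aligned`. The sketch below assumes the review state has already been summarized into a few flags; the `aligned_blockers` helper and its parameter names are hypothetical.

```python
def aligned_blockers(
    mode: str,
    open_severities: list,
    critical_ui_blocked: bool = False,  # blocked without mitigation evidence
    db_changed: bool = False,
    db_checklist_done: bool = False,
    high_risk_gaps: bool = False,  # unresolved coverage gaps or failing tests
) -> list:
    """Return every reason the verdict may not be 'aligned'; empty means allowed."""
    reasons = []
    if mode == "no-plan":
        reasons.append("no-plan mode: verdict is 'no-plan reviewed'")
    if any(severity in ("P0", "P1") for severity in open_severities):
        reasons.append("open P0/P1 finding")
    if critical_ui_blocked:
        reasons.append("critical UI flow blocked without mitigation evidence")
    if db_changed and not db_checklist_done:
        reasons.append("DB/migration changes without DB checklist")
    if high_risk_gaps:
        reasons.append("high-risk behavior with coverage gaps or failing tests")
    return reasons
```

Which non-`aligned` verdict to return instead is left to the reviewer; the gates only say when `aligned` is off the table.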

Output template

Review target: `<plan path or prompt summary>`
Review mode: `<plan | handoff | no-plan>` (+ `self-review` when applicable)
Change scope: `<uncommitted | commit range>`

Findings:
1. [P1] <title> — `<file:line>`
   Evidence: <what was observed>
   Impact: <user/system impact>
   Recommendation: <concrete fix>
1. [P2] <title> — `<file:line>`
   Evidence: <what was observed>
   Impact: <user/system impact>
   Recommendation: <concrete fix>

Plan alignment matrix (for `plan`/`handoff` modes):
1. `<planned item>` -> `<implemented evidence>` -> `<aligned | partial | missing | extra>`
1. `<planned item>` -> `<implemented evidence>` -> `<aligned | partial | missing | extra>`

Intent reconstruction matrix (for `no-plan` mode):
1. `<inferred expected behavior>` -> `<implemented evidence>` -> `<confirmed | partial | contradicted>`
1. `<inferred expected behavior>` -> `<implemented evidence>` -> `<confirmed | partial | contradicted>`

UI verification:
1. `<route + area + action>` -> `<pass/fail/blocked>` -> `<observed result>`
1. `<route + area + action>` -> `<pass/fail/blocked>` -> `<observed result>`
Blockers: <none or list>

Test coverage:
1. `<changed behavior>` -> `<existing coverage>` -> `<gap>` -> `<covered | add-tests | deferred>`
1. `<changed behavior>` -> `<existing coverage>` -> `<gap>` -> `<covered | add-tests | deferred>`
Test execution:
- `<command>` -> `<pass/fail>` -> `<key result>`
- `<command>` -> `<pass/fail>` -> `<key result>`

Technical analysis:
- `<top technical risk or confirmation>`
- `<top technical risk or confirmation>`

Strategic analysis:
- `<strategy strength/weakness>`
- `<strategy strength/weakness>`

Review artifacts:
- `<commands run and key outcomes>`
- `<ui evidence: screenshots/log notes or blocker proof>`
- `<coverage summary: tested vs blocked vs deferred>`

Verdict: `<aligned | partially aligned | not aligned | no-plan reviewed>`
Recommended next steps:
1. <step>
1. <step>

Guardrails

do not mark `aligned` unless plan claims are evidenced in diffs/tests/UI checks
in `no-plan` mode, do not claim strict alignment; use verdict `no-plan reviewed`
do not bury critical defects under summary text; findings must appear first
if UI cannot be fully executed, provide exact blocker and what was still validated
if tests cannot be executed, list exact missing prerequisites and impacted confidence
prefer concrete, falsifiable statements over broad judgments
in `self-review` mode, call out reviewer/implementer overlap and keep evidence thresholds strict
enforce subagent output contract; request retries for incomplete outputs
if review is partial due to blockers/timebox, say so explicitly in verdict context

Subagent prompt pack

Use these prompts as-is, replacing placeholders.

Parent orchestration prompt

Run autistic-code-review.

Context:
- Review target: <plan path OR handoff summary OR "none">
- Review mode: <plan | handoff | no-plan>
- Self-review: <yes | no>
- Change scope: <uncommitted | commit range>
- Repo/project path: <path>
- UI routes in scope: <route list>
- Test commands in scope: <app commands + DB commands>
- Timebox: <minutes>

Execution requirements:
1) Spawn five parallel subagents:
   - plan-alignment-reviewer
   - ui-verification-reviewer
   - technical-risk-reviewer
   - strategic-reviewer
   - test-coverage-reviewer
2) Enforce this output contract for every subagent:
   - findings
   - evidence
   - confidence
   - unverified_assumptions
   - blocked_items
3) Reject and retry any subagent output that lacks evidence.
4) Require the test-coverage-reviewer to suggest/create tests for uncovered high-risk changes and run relevant suites.
5) Consolidate results into one findings-first report with severity ordering.
6) Apply sign-off gates from the skill and produce a final verdict.

Prompt: `plan-alignment-reviewer`

You are the plan-alignment-reviewer.

Inputs:
- Review mode: <plan | handoff | no-plan>
- Intention source: <plan path or handoff text; can be empty in no-plan mode>
- Change scope: <uncommitted | commit range>
- Changed file list/diff summary: <insert>

Tasks:
1) Build an intention-to-evidence matrix from intention claims and actual diffs.
2) For each claim, classify as aligned, partial, missing, or extra.
3) In no-plan mode, produce an intent reconstruction matrix:
   - inferred expected behavior -> implemented evidence -> confirmed/partial/contradicted
4) Flag any claimed work not evidenced in code/tests/docs.

Return exactly:
- findings: severity-ranked issues with file refs
- evidence: specific diff/test/doc observations
- confidence: high/medium/low per finding
- unverified_assumptions: assumptions and why
- blocked_items: what prevented validation

Prompt: `ui-verification-reviewer`

You are the ui-verification-reviewer.

Inputs:
- UI scope routes/pages: <insert>
- Personas/roles: <insert>
- Environment/access constraints: <insert>
- Change scope summary: <insert>

Tasks:
1) Use Playwright and/or agent-browser to manually verify UI behavior.
2) Build and execute a coverage matrix:
   - role x route/page x key action x expected result
3) Include at least:
   - one happy path per protected area
   - one negative/permission-boundary path per protected area
   - one gating/navigation check (route guard/menu visibility/access denial)
4) Record each row as pass/fail/blocked with observed result.
5) Capture evidence artifacts (screenshots/log notes) for failures or blockers.

Return exactly:
- findings: severity-ranked UI defects/regressions
- evidence: route-level observations and artifact references
- confidence: high/medium/low per finding
- unverified_assumptions: missing env/auth/data assumptions
- blocked_items: exact blocker + attempted step

Prompt: `technical-risk-reviewer`

You are the technical-risk-reviewer.

Inputs:
- Changed files and diff: <insert>
- Related tests/docs/commands run: <insert>
- Review mode and constraints: <insert>

Tasks:
1) Perform a code review focused on:
   - correctness bugs
   - behavioral regressions
   - data integrity and permission risks
   - missing or weak tests
2) If SQL/schema changed, run DB/migration checklist:
   - RLS/policy behavior vs intended access model
   - migration safety, ordering, rollback feasibility
   - grants/privileges/RPC exposure drift
   - seed/test/type-generation consistency
3) Prioritize findings by P0-P3 and include file references.

Return exactly:
- findings: severity-ranked technical issues with file refs
- evidence: concrete code/diff/test command observations
- confidence: high/medium/low per finding
- unverified_assumptions: what is assumed but unproven
- blocked_items: checks that could not be completed

Prompt: `strategic-reviewer`

You are the strategic-reviewer.

Inputs:
- Implementation summary: <insert>
- Changed areas by layer (db/server/client/tests/docs): <insert>
- Review mode: <insert>

Tasks:
1) Evaluate implementation strategy quality:
   - architecture cohesion and coupling
   - migration/cutover safety and operability
   - maintainability and future change cost
   - scalability and team workflow implications
2) Identify strategic weaknesses and practical alternatives.
3) Recommend only changes that materially reduce risk or complexity.

Return exactly:
- findings: severity-ranked strategic risks/anti-patterns
- evidence: concrete repo or diff observations
- confidence: high/medium/low per finding
- unverified_assumptions: strategic assumptions needing confirmation
- blocked_items: missing context that limits confidence

Prompt: `test-coverage-reviewer`

You are the test-coverage-reviewer.

Inputs:
- Changed files and diff: <insert>
- Existing tests in scope: <insert>
- Test commands:
  - app layer: <insert>
  - DB layer (pgTAP or equivalent): <insert>
- Review mode and constraints: <insert>

Tasks:
1) Build a coverage matrix:
   - changed behavior -> existing tests -> gap -> action
2) Identify high-risk untested behavior in app and DB layers.
3) Suggest and create targeted tests to close feasible gaps.
   - app layer: unit/integration tests for changed behavior and boundaries
   - DB layer: pgTAP tests for changed tables/functions/policies/permissions
4) Run relevant test suites after test additions/updates.
5) Report pass/fail and any remaining uncovered high-risk behavior.

Return exactly:
- findings: severity-ranked coverage and test-quality issues
- evidence: coverage matrix + test diffs + command results
- confidence: high/medium/low per finding
- unverified_assumptions: assumptions about environment/data/setup
- blocked_items: tests not run or not creatable and why

Consolidation prompt (optional)

Consolidate five subagent outputs into one final review.

Rules:
1) Findings first, highest severity first, deduplicated across lanes.
2) Keep only evidence-backed findings.
3) Include mode-appropriate matrix:
   - plan/handoff -> plan alignment matrix
   - no-plan -> intent reconstruction matrix
4) Include UI verification status, blockers, and coverage summary.
5) Include test coverage matrix, tests added/suggested, and execution results.
6) Apply sign-off gates before verdict.
7) Verdict allowed values:
   - aligned
   - partially aligned
   - not aligned
   - no-plan reviewed
