# Honest Review
Research-driven code review. Every finding validated with evidence.
## Dispatch
| $ARGUMENTS | Mode |
|---|---|
| Empty + changes in session (git diff) | Session review of changed files |
| Empty + no changes (first message) | Full codebase audit |
| File or directory path | Scoped review of that path |
| "audit" | Force full codebase audit |
| PR number/URL | Review PR changes (gh pr diff) |
| Git range (HEAD~3..HEAD) | Review changes in that range |
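As a rough sketch, the diff sources behind three of these modes (123 is a placeholder PR number; the range mirrors the table's example):

```sh
# Session review: staged + unstaged changes relative to HEAD
git diff HEAD

# PR review (123 is a placeholder PR number)
gh pr diff 123

# Git-range review
git diff HEAD~3..HEAD
```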
## Review Levels (Both Modes)
Every review covers three abstraction levels, each examining both
defects and unnecessary complexity:
- Surface (lines, expressions, functions): correctness, error handling, security, readability.
  Simplify: dead code, complex conditionals to early returns, hand-rolled code to stdlib.
- Structural (modules, classes, boundaries): test coverage, coupling, interface contracts, cognitive complexity.
  Simplify: 1:1 wrappers, single-use abstractions, pass-through plumbing.
- Algorithmic (algorithms, data structures, system design): complexity class, N+1 queries, resource leaks, concurrency.
  Simplify: O(n^2) to O(n), wrong data structure, unnecessary serialization.
Context-dependent: add security checks for auth/payment/user-data code.
Add observability checks for services/APIs.
Full checklists: read references/checklists.md
## Research Validation
THIS IS THE CORE DIFFERENTIATOR. Do not report findings based solely
on LLM knowledge. For every non-trivial finding, validate with research:
Two-phase review per scope:
- Flag phase: Analyze code, generate hypotheses ("this API may be
deprecated", "this SQL pattern may be injectable", "this dependency
has a known CVE")
- Validate phase: For each flag, spawn research subagent(s) to confirm (see the sketch after this list):
  - Context7: look up current library docs for API correctness
  - WebSearch: check current best practices and security advisories
  - WebFetch: query package registries (npm, PyPI, crates.io)
  - gh: check open issues and security advisories for dependencies
- Only report findings with evidence. Cite sources.
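For instance, a research subagent's registry and advisory checks might look like this sketch (the package names and OWNER/REPO are placeholders, not part of this command):

```sh
# Latest published version of an npm package (npm registry JSON API)
curl -s https://registry.npmjs.org/express/latest | jq -r .version

# Current release metadata for a Python dependency (PyPI JSON API)
curl -s https://pypi.org/pypi/requests/json | jq -r .info.version

# Published security advisories for a repository (GitHub CLI)
gh api /repos/OWNER/REPO/security-advisories
```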
Research playbook: read references/research-playbook.md
## Mode 1: Session Review

### Step 1: Identify Changes

Run git diff --name-only HEAD to capture both staged and unstaged
changes, then collect the full diffs for context. Identify the original
task intent from session history.
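A minimal sketch of those commands (the untracked-files check is an added assumption, since the diffs above will not show new files):

```sh
# Names of files with staged or unstaged changes vs HEAD
git diff --name-only HEAD

# Full diffs for review context
git diff HEAD

# Untracked files the diffs above miss
git status --porcelain=v1 | grep '^??'
```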
### Step 2: Scale and Launch
| Scope | Strategy |
|---|---|
| 1-2 files | Inline review at all 3 levels. Spawn research subagents for flagged findings. |
| 3-5 files | Spawn 3 parallel reviewer subagents (Surface/Structural/Algorithmic). Each flags then researches within their level. |
| 6+ files or 3+ modules | Spawn a team. See below. |
Team structure for large session reviews (6+ files):

```
[Lead: reconcile findings, produce final report]
|-- Surface Reviewer
|     Wave 1: subagents analyzing files (1 per file)
|     Wave 2: subagents researching flagged findings
|-- Structural Reviewer
|     Wave 1: subagents analyzing module boundaries
|     Wave 2: subagents researching flagged findings
|-- Algorithmic Reviewer
|     Wave 1: subagents analyzing performance/complexity
|     Wave 2: subagents researching flagged findings
|-- Verification Runner
      Wave 1: subagents running build, lint, tests
      Wave 2: subagents spot-checking behavior
```
Each teammate operates independently. Each runs internal waves of
massively parallelized subagents. No overlapping file ownership.
### Step 3: Reconcile (5 Steps)
- Question: For each finding, ask: (a) Is this actually broken, or just
  unfamiliar? (b) Is there research evidence? (c) Would fixing it
  genuinely improve the code? Discard unvalidated findings.
- Deduplicate: when the same issue is flagged at different levels, keep only the deepest root cause.
- Resolve conflicts: when levels disagree, choose the option that yields the most net simplification.
- Elevate: promote patterns repeated across files to their structural or algorithmic root cause.
- Prioritize: P0/S0 (must fix), P1/S1 (should fix), P2/S2 (report but do not implement).
Severity calibration: P0 = will cause production incident. Not "ugly code."
### Step 4: Present and Execute
Present all P0/P1/S0/S1 findings with evidence and citations.
Ask: "Implement fixes? [all / select / skip]"
If approved: parallel subagents by file (no overlapping ownership).
Then verify: build/lint, tests, behavior spot-check.
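A hedged sketch of that verification pass (the commands assume an npm-style project; substitute the real toolchain):

```sh
# Fail fast if any gate breaks
npm run build && npm run lint && npm test

# Spot-check: confirm only the approved fixes landed
git diff --stat HEAD
```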
Output format: read references/output-formats.md
## Mode 2: Full Codebase Audit

### Step 1: Discover
Explore: language(s), framework(s), build system, directory structure,
entry points, dependency manifest, approximate size.
For 500+ files: prioritize recently modified, entry points, public API,
high-complexity areas. State scope in report.
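A few illustrative discovery probes (the manifest names and the three-month window are assumptions to adapt per project):

```sh
# Which dependency manifests exist?
ls package.json pyproject.toml Cargo.toml go.mod 2>/dev/null

# Approximate size of the codebase
git ls-files | wc -l

# Recently modified hot spots, for prioritizing large audits
git log --since="3 months ago" --pretty=format: --name-only \
  | sort | uniq -c | sort -rn | head -20
```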
### Step 2: Design and Launch Team
Spawn a team with domain-based ownership. Each teammate runs all 3
review levels + research validation on their owned files.
```
[Lead: cross-domain analysis, reconciliation, final report]
|-- Domain A Reviewer (e.g., Backend)
|     Wave 1: parallel subagents scanning all owned files
|     Wave 2: parallel subagents deep-diving flagged files
|     Wave 3: parallel subagents researching flagged assumptions
|-- Domain B Reviewer (e.g., Frontend)
|     [same wave pattern]
|-- Domain C Reviewer (e.g., Tests/Infra)
|     [same wave pattern]
|-- Dependency and Security Researcher
      Wave 1: subagents auditing each dependency (version, CVEs, license)
      Wave 2: subagents checking security patterns against current docs
      Wave 3: subagents verifying API usage against library docs (Context7)
```
Adapt team composition to project type.
Team archetypes + scaling: read references/team-templates.md
### Step 3: Teammate Instructions
Each teammate receives: role, owned files, project context, all 3 review
levels, instruction to run two-phase (flag then research-validate), and
findings format. Full template: references/team-templates.md
### Step 4: Cross-Domain Analysis (Lead)
While teammates review, lead spawns parallel subagents for:
- Architecture: module boundaries, dependency graph
- Data flow: trace key paths end-to-end
- Error propagation: consistency across system
- Shared patterns: duplication vs. necessary abstraction
### Step 5: Reconcile Across Domains
Same 5-step reconciliation. Cross-domain deduplication and elevation.
### Step 6: Report
Output format: references/output-formats.md
Required sections: Critical, Significant, Cross-Domain, Health Summary,
Top 3 Recommendations. All findings include evidence + citations.
### Step 7: Execute (If Approved)
Ask: "Implement fixes? [all / select / skip]"
If approved: parallel subagents by file (no overlapping ownership).
Then verify: build/lint, tests, behavior spot-check.
## Healthy Codebase
If no P0/P1 or S0 findings: state this explicitly. Acknowledge health.
Do not inflate minor issues. A short report is a good report.
## Reference Files
| File | When to Read |
|---|---|
| references/checklists.md | During analysis or building teammate prompts |
| references/research-playbook.md | When setting up research validation subagents |
| references/output-formats.md | When producing final output |
| references/team-templates.md | When designing teams (Mode 2 or large Mode 1) |
## Critical Rules
- Every non-trivial finding must have research evidence or be discarded
- Do not police style — follow the codebase's conventions
- Do not report phantom bugs requiring impossible conditions
- More than 15 findings means re-prioritize — 5 validated findings beat 50 speculative
- Never skip reconciliation
- Always present before implementing (approval gate)
- Always verify after implementing (build, tests, behavior)
- Never assign overlapping file ownership