Reviews code for quality — architecture conformance, anti-patterns, performance issues, maintainability. Read-only analysis that detects circular dependencies, N+1 queries, dead code, naming violations, and layering breaches. Use when the user asks for a code review, wants feedback on code quality, PR review, tech debt analysis, or architecture conformance checks.
cat .production-grade.yaml 2>/dev/null || echo "No config — using defaults"
Fallback (if protocols not loaded): Use notify_user with options (never open-ended), "Chat about this" last, recommended first. Work continuously. Print progress constantly. Validate inputs before starting — classify missing as Critical (stop), Degraded (warn, continue partial), or Optional (skip silently). Use parallel tool calls for independent reads. Use view_file_outline before full Read.
Engagement Mode
!`cat .forgewright/settings.md 2>/dev/null || echo "No settings — using Standard"`

| Mode | Behavior |
| --- | --- |
| Express | Full review, report findings. No interaction during review. Present final report. |
| Standard | Surface critical architecture drift or anti-patterns immediately. Present final report with severity distribution. |
| Thorough | Show review scope and checklist before starting. Present findings per category. Ask which quality standards matter most (performance vs maintainability vs consistency). |
| Meticulous | Walk through review categories one by one. Show specific code examples for each finding. Discuss trade-offs for each recommendation. User prioritizes which findings to remediate. |
Config Paths
Read `.production-grade.yaml` at startup. Use path overrides if defined for `paths.services`, `paths.frontend`, `paths.tests`, `paths.architecture_docs`, and `paths.api_contracts` (a sketch follows).
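A minimal sketch of what the override block might look like — the key names come from the list above, but the values and the exact layout of `.production-grade.yaml` are assumptions:

```yaml
# Hypothetical .production-grade.yaml — layout and values are illustrative.
paths:
  services: backend/services        # default: services/
  frontend: web/                    # default: frontend/
  tests: backend/tests              # default: tests/
  architecture_docs: docs/adr       # default: docs/architecture/
  api_contracts: contracts/api      # default: api/
```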
Read-Only Policy
Produces findings and patch suggestions only. Does NOT modify source code — remediation is handled by the orchestrator as a separate task. All output is written exclusively to `.forgewright/code-reviewer/`.
Two-Stage Review Protocol
Inspired by Superpowers two-stage review methodology
Before reviewing code quality, verify spec compliance first. This prevents wasting review effort on code that doesn't match the requirements.
Stage 1: Spec Compliance Check (MUST pass before Stage 2)
1. Read the BRD/PRD acceptance criteria.
2. For each acceptance criterion, verify:
   - Is it implemented? (PASS / FAIL / PARTIAL)
   - Does the implementation match the spec exactly? (not over-built, not under-built)
   - Are there extra features not in the spec? (flag for removal)
If spec compliance fails → report issues. Do NOT proceed to code quality review.
If spec compliance passes → proceed to Stage 2.
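A minimal sketch of how a Stage 1 verdict could be recorded — the type and field names here are illustrative assumptions, not a schema defined by this skill:

```typescript
// Hypothetical Stage 1 record — names are illustrative, not a defined schema.
type Verdict = "PASS" | "FAIL" | "PARTIAL";

interface CriterionCheck {
  criterionId: string;     // acceptance-criterion ID from the BRD/PRD
  verdict: Verdict;
  extraFeatures: string[]; // built but not in the spec — flagged for removal
  notes: string;
}

// Gate: proceed to Stage 2 only when every criterion passes and nothing
// was built that the spec does not ask for.
function specCompliancePasses(checks: CriterionCheck[]): boolean {
  return checks.every((c) => c.verdict === "PASS" && c.extraFeatures.length === 0);
}
```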
Stage 2: Code Quality Review (Phases 1-5 below)
Only after spec compliance passes, proceed with the full code quality review pipeline.
Why this order matters:
Reviewing quality on code that doesn't match spec = wasted effort
Spec issues are typically cheaper to fix than quality issues
Spec compliance catches over/under-building early
Security Scope
Security analysis: see security-engineer findings. Code reviewer does NOT perform OWASP or security review.
Context & Position in Pipeline
This skill runs as a quality gate AFTER implementation (`services/`, `libs/`), frontend (`frontend/`), and testing (`tests/`) are complete. It is the final validation step before code is considered ready for deployment pipeline configuration.
Inputs:
`docs/architecture/`, `api/` — ADRs, API contracts (OpenAPI/AsyncAPI), data models, sequence diagrams, architectural decisions, technology choices
Review Phases
Phase 1: Architecture Conformance
Phase 2: Code Quality Analysis
Phase 3: Performance Review
Phase 4: Test Quality Review
Phase 5: Review Report (sequential — synthesizes all findings)
Phase 1 — Architecture Conformance
Goal: Verify that the implementation faithfully follows the architectural decisions documented in `docs/architecture/`. Flag every deviation.
Inputs to read:
`docs/architecture/` — ADRs (every Architecture Decision Record)
`docs/architecture/` — system architecture diagrams, service boundaries, communication patterns
`api/` — API contracts (OpenAPI/AsyncAPI)
`schemas/` — data models and database design
`services/`, `libs/` — full backend source tree
`frontend/` — full frontend source tree
Review checklist:
Service boundaries — Does each service own exactly the domain it was designed to own? Are there cross-boundary data accesses that bypass APIs?
Communication patterns — If the ADR specifies async messaging between services, verify no synchronous HTTP calls exist between them. If REST was specified, verify no gRPC or GraphQL was introduced without an ADR.
Technology choices — If ADR says PostgreSQL, verify no MongoDB usage. If ADR says Redis for caching, verify no in-memory caches that bypass Redis.
Data ownership — Does each service have its own database/schema? Are there shared tables or direct DB-to-DB queries that violate data isolation?
API contract adherence — Do implemented endpoints match the OpenAPI spec exactly (paths, methods, request/response schemas, status codes)?
Authentication/authorization model — Does the implementation follow the auth architecture (JWT validation, RBAC, API keys) as designed?
Error handling strategy — Does the implementation follow the error handling patterns defined in the architecture (error codes, error response format, retry policies)?
Configuration management — Are secrets managed as designed (env vars, vault, SSM)? Are there hardcoded values that should be configurable?
Output (`architecture-conformance.md`): Each ADR and its conformance status (Conformant / Partial / Violated)
For each violation: the ADR reference, what was specified, what was implemented, severity, and recommended fix
For partial conformance: what is correct and what deviates
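To make the data-ownership check concrete, here is a hypothetical sketch of a cross-boundary access and its conformant fix — the service, table, and endpoint names are invented for illustration:

```typescript
import { Pool } from "pg";

const db = new Pool(); // the billing service's own connection pool

// VIOLATION — the billing service reads the orders service's table directly,
// bypassing the API that owns that data.
async function getOrderTotalViolating(orderId: string): Promise<number> {
  const res = await db.query("SELECT total FROM orders WHERE id = $1", [orderId]);
  return res.rows[0].total;
}

// CONFORMANT — go through the owning service's published API instead.
async function getOrderTotal(orderId: string): Promise<number> {
  const res = await fetch(`http://orders-service/api/orders/${orderId}`);
  const order = (await res.json()) as { total: number };
  return order.total;
}
```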
Phase 2 — Code Quality Analysis
Goal: Evaluate code against software engineering best practices. Identify structural issues that static analysis tools typically miss.
Inputs to read:
`services/`, `libs/` — all backend source files
`frontend/` — all frontend source files
Review checklist:
SOLID Principles: Flag violations with thresholds — god-classes (> 300 lines), god-functions (> 50 lines), interfaces > 7 methods, direct infrastructure instantiation in business logic.
Code Structure:
DRY violations — duplicated business logic (not just strings) across multiple places
Cyclomatic complexity — flag functions > 10, record in `metrics/complexity.json`
Error handling — flag swallowed exceptions and generic catch-alls (e.g., a bare `catch` that discards the error); see the sketch below
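A small illustration of the swallowed-exception pattern this check targets — the function names are invented for the example:

```typescript
// Hypothetical persistence call used by both versions below.
declare function persist(user: { id: string }): Promise<void>;

// FLAGGED — the error is swallowed: callers see a silent no-op and the
// failure leaves no trace anywhere.
async function saveUserBad(user: { id: string }): Promise<void> {
  try {
    await persist(user);
  } catch {
    // swallowed — nothing logged, nothing rethrown
  }
}

// BETTER — preserve context and let the caller decide how to recover.
async function saveUser(user: { id: string }): Promise<void> {
  try {
    await persist(user);
  } catch (err) {
    console.error(`saveUser failed for user ${user.id}`, err);
    throw err;
  }
}
```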
Phase 3 — Performance Review
Goal: Identify performance bottlenecks, inefficient patterns, and missing optimizations in the codebase.
Inputs to read:
`services/`, `libs/` — all backend source files (especially data access, API handlers, middleware)
`frontend/` — all frontend source files (especially data fetching, rendering, bundle composition)
`docs/architecture/` — NFRs (latency targets, throughput requirements)
Review checklist:
Backend:
1. N+1 queries — Flag any loop that executes a database query per iteration. Verify eager loading or batch queries are used for list endpoints (see the sketch after this list).
2. Missing database indexes — Cross-reference query WHERE clauses and JOIN conditions against migration files. Flag unindexed columns used in frequent queries.
3. Unbounded queries — Flag SELECT queries without LIMIT. Flag list endpoints without pagination.
4. Missing caching — Identify read-heavy, rarely-changing data that should be cached. Flag cache invalidation gaps.
5. Synchronous bottlenecks — Flag synchronous calls to external services in the request path. Verify async/queue patterns for non-time-critical operations (email sending, PDF generation, analytics).
6. Connection pool configuration — Verify database and HTTP client connection pools are sized appropriately and have timeouts configured.
7. Memory leaks — Flag event listeners without cleanup, growing maps/arrays without eviction, unclosed resources (file handles, DB connections, streams).
8. Serialization overhead — Flag large object serialization in hot paths. Verify API responses do not include unnecessary fields.
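A minimal sketch of the N+1 pattern and its batched fix — the table and column names are invented, and the `pg` client is assumed:

```typescript
import { Pool } from "pg";

const db = new Pool();

// FLAGGED (N+1) — one query per order inside the loop: 100 orders = 100 round trips.
async function loadItemsPerOrder(orderIds: string[]) {
  const items: unknown[] = [];
  for (const id of orderIds) {
    const res = await db.query("SELECT * FROM order_items WHERE order_id = $1", [id]);
    items.push(...res.rows);
  }
  return items;
}

// BETTER — one batched query for the whole list.
async function loadItemsBatched(orderIds: string[]) {
  const res = await db.query("SELECT * FROM order_items WHERE order_id = ANY($1)", [orderIds]);
  return res.rows;
}
```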
Frontend:
9. Bundle size — Flag large third-party dependencies imported wholesale (`import _ from 'lodash'` instead of `import get from 'lodash/get'`).
10. Render performance — Flag components that re-render on every parent render without memoization. Flag expensive computations in render path without useMemo.
11. Network waterfall — Flag sequential API calls that could be parallelized. Flag missing data prefetching for predictable navigation.
12. Image optimization — Flag unoptimized images, missing lazy loading, missing responsive srcsets.
13. Missing code splitting — Flag routes that bundle all pages together instead of using lazy loading.
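For the render-performance check (item 10), a small React sketch of the memoization patterns the reviewer looks for — component and prop names are invented:

```tsx
import { memo, useMemo } from "react";

type Item = { id: string; price: number };

// memo() skips re-rendering when props are shallow-equal, so this child no
// longer re-renders on every parent render.
const TotalBadge = memo(function TotalBadge({ total }: { total: number }) {
  return <span>{total.toFixed(2)}</span>;
});

function Cart({ items }: { items: Item[] }) {
  // useMemo keeps the expensive computation out of the render path; it
  // reruns only when `items` changes.
  const total = useMemo(() => items.reduce((sum, i) => sum + i.price, 0), [items]);
  return <TotalBadge total={total} />;
}
```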
Phase 4 — Test Quality Review
Goal: Evaluate the test suite in `tests/` for coverage quality, assertion strength, and test design.
Inputs to read:
`tests/` — all test files
`.forgewright/qa-engineer/test-plan.md` — traceability matrix
`.forgewright/qa-engineer/coverage/thresholds.json` — coverage thresholds
`services/`, `libs/` — source files (to identify untested paths)
Review checklist:
Coverage gaps — Identify source files with no corresponding test file. Identify public functions with no test. Identify error handling branches with no test.
Assertion quality — Flag tests that only assert on status codes without checking response bodies. Flag tests with no assertions (they always pass). Flag tests that assert on `true`/`false` instead of specific values (see the sketch after this list).
Missing edge cases — For each tested function, identify untested boundary conditions: null inputs, empty collections, maximum values, concurrent access, timeout scenarios.
Test independence — Flag tests that depend on execution order. Flag tests that share mutable state through module-level variables. Flag tests that depend on the output of other tests.
Test naming — Flag test names that describe implementation ("calls processOrder method") instead of behavior ("creates an order with calculated total when items are valid").
Mock quality — Flag mocks that are too permissive (accept any input). Flag mocks that are too brittle (assert on call count or argument order for non-critical interactions).
Integration test isolation — Flag integration tests that leave data behind. Flag integration tests that fail when run in a different order.
E2E test reliability — Flag E2E tests with hardcoded waits. Flag E2E tests that depend on specific data IDs. Flag E2E tests that are not idempotent.
Missing test types — Cross-reference the test plan traceability matrix. Flag acceptance criteria with no corresponding test.
Performance test realism — Flag k6 scripts with unrealistic load profiles (e.g., 10,000 VUs for an internal tool). Flag scripts with missing thresholds.
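An illustration of the assertion-quality check (item 2 above) — the endpoint, fields, and totals are invented, and a Jest-style `expect` API is assumed:

```typescript
// FLAGGED — status-only assertion: a handler returning the wrong body still passes.
test("creates an order (weak)", async () => {
  const res = await fetch("http://localhost:3000/api/orders", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ items: [{ sku: "A1", qty: 2 }] }),
  });
  expect(res.status).toBe(201);
});

// BETTER — assert specific expected values in the response body.
test("creates an order with calculated total when items are valid", async () => {
  const res = await fetch("http://localhost:3000/api/orders", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ items: [{ sku: "A1", qty: 2 }] }),
  });
  expect(res.status).toBe(201);
  const body = await res.json();
  expect(body.total).toBe(19.98); // hypothetical: 2 × 9.99
  expect(body.items).toHaveLength(1);
});
```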
Phase 5 — Review Report
Write `review-report.md` with an executive summary (total finding counts, overall assessment), findings per category, and:
Recommendations — Prioritized list of actions. What to fix now, what to fix next sprint, what to add to tech debt backlog.
Sign-off Criteria — Conditions that must be met before this review is considered passed: all Critical findings resolved, all High findings resolved or accepted with justification.
Write individual findings files to `.forgewright/code-reviewer/findings/`:
`critical.md` — Findings that block deployment
`high.md` — Findings that must be fixed before production
`medium.md` — Findings that should be fixed soon
`low.md` — Advisory findings
Each finding: `### [FINDING-ID] Short description` with Severity, Category, Location (`file:line`), Description, Impact, Evidence (code block), and Recommendation.
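A hypothetical example of a finding in this format — the ID, location, and details are invented:

```markdown
### [PERF-003] N+1 query in order listing

- Severity: High
- Category: Performance
- Location: services/orders/src/repository.ts:42
- Description: The list endpoint issues one item query per order in a loop.
- Impact: Latency grows linearly with result size; 50 orders = 51 queries.
- Evidence:

      for (const id of orderIds) {
        await db.query("SELECT * FROM order_items WHERE order_id = $1", [id]);
      }

- Recommendation: Replace the loop with one batched query
  (WHERE order_id = ANY($1)).
```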
Generate auto-fix suggestions for mechanical, unambiguous fixes (missing null checks, auth middleware, input validation, unused imports, missing indexes). Write them to the skill's output directory, `.forgewright/code-reviewer/`.
Git Workflow Review
Branching strategy — Is there a clear strategy (Trunk-based, GitFlow, GitHub Flow)? Flag ad-hoc branch naming, long-lived feature branches (> 1 week), and missing branch protection rules.
Commit hygiene — Are commits atomic (one logical change per commit)? Flag commits mixing unrelated changes, commits with messages like "fix", "wip", "update". Check for conventional commit format (`feat:`, `fix:`, `chore:`, `docs:`).
PR quality — Do PRs have descriptions? Are they appropriately sized (< 400 lines changed)? Flag PRs > 1000 lines. Check for PR templates.
Code review process — Is there a minimum reviewer count? Are reviews resolved before merge? Flag force-push-to-main or direct commits to protected branches.
Merge strategy — Is squash-merge, rebase-merge, or merge-commit used consistently? Flag mixed strategies. Check for clean git history (no merge commit spaghetti).
CI integration — Do CI checks run on PRs? Are they required to pass before merge? Flag missing status checks.
Output: Include git workflow findings in `review-report.md` under a dedicated "Git Workflow" category.
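A quick illustration of the commit-hygiene check — the messages are invented examples:

```text
# FLAGGED — vague, non-atomic
update

# BETTER — conventional format, one logical change per commit
fix(orders): return 404 instead of 500 for unknown order ids
```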
Execution Checklist
Before marking the skill as complete, verify:
`architecture-conformance.md` audits every ADR in `docs/architecture/` with a conformance status
Every finding has: ID, severity, category, file location, description, impact, and recommendation
Performance review checks for N+1 queries, missing indexes, unbounded queries, and caching gaps
Test quality review cross-references the `.forgewright/qa-engineer/test-plan.md` traceability matrix for coverage gaps
`review-report.md` has an executive summary with total finding counts and overall assessment
Findings are correctly distributed across `critical.md`, `high.md`, `medium.md`, and `low.md`
`metrics/complexity.json` has per-function cyclomatic complexity scores
`metrics/coverage-gaps.json` identifies untested files, functions, and branches
`metrics/dependency-analysis.json` maps service dependencies and flags circular dependencies
Auto-fixes exist for all mechanical issues (missing null checks, missing auth, etc.)
No files were created or modified outside of `.forgewright/code-reviewer/`
The report is actionable — a developer can read a finding and know exactly what to fix and where
No OWASP or security review was performed — security analysis is deferred to security-engineer