# Project Orchestrator

## Overview
Universal project lifecycle skill. Classifies your project type, builds a phase plan, then walks through each phase sequentially — invoking existing skills where they exist and running inline design phases where they don't.
The rule: No project uses all phases. The router selects 4–14 phases based on what you're actually building.
Announce at start: "I'm using the orchestrate skill to guide this project through its lifecycle."
## When to Use
- Starting a new project from scratch (greenfield)
- Adding a major feature that changes architecture, data flow, or integrations
- Unsure which skills to invoke and in what order
- Starting work on a project type you haven't classified before
**When NOT to use:**
- Small bug fixes, typos, minor UI tweaks — just do the work
- Pure research or exploration — use Explore agent directly
- Single-file changes with clear requirements — use TDD directly
- You already know exactly which single skill applies (e.g., just need )
## How It Works
- Classify — Ask what you're building, determine project type
- Route — Select the phases that apply
- Execute — Walk through each phase sequentially
- Handoff — Each phase produces a doc artifact; later phases build on earlier ones
All artifacts are saved to docs/plans/. If resuming mid-project, check which docs already exist to determine the current phase.
## Phase 0: Project Classification
Ask ONE question to classify the project:
"What are you building?"
| Type | Indicators | Phase Count |
|---|---|---|
| macOS App | Desktop UI, SwiftUI, AppKit, menu bar app | 4 phases |
| iOS Mobile App | iPhone/iPad, SwiftUI, UIKit, App Store | 5–9 phases |
| Web Frontend | React, Vue, static site, no backend | 5 phases |
| Full-Stack Web | Frontend + database + API + auth | 10 phases |
| Voice Agent | LiveKit, telephony, STT/TTS, conversational AI | 11 phases |
| Edge/IoT + ML | Hardware devices, computer vision, ML pipeline, fleet management | 13 phases |
Sub-classification questions (if needed):
- Mobile/Web: "Does it have a backend?" — if yes, add full-stack phases
- Any type: "Does it integrate with external services?" — if yes, add resilience phase
- Any type: "Will this be deployed to cloud infrastructure you manage?" — if yes, add infrastructure phase
- Any type: "Is this a new project or adding to an existing system?" — if existing, add system assessment phase
## Route Table
| Phase | macOS | iOS | Web FE | Full-Stack | Voice | Edge/IoT+ML |
|---|---|---|---|---|---|---|
| 0.5 System Assessment | o | o | o | o | o | o |
| 1. Brainstorm | x | x | x | x | x | x |
| 2. Domain Model | | o | | x | x | x |
| 3. System Design + Security | | o | | x | x | x |
| 4. Resilience | | o | | x | x | x |
| 5. ML Pipeline | | | | | | x |
| 6. Edge Architecture | | | | | | x |
| 7. API Specification | | o | | x | x | x |
| 8. Voice Prompt Design | | | | | x | |
| 9. Infrastructure | | | | o | o | x |
| 10. Writing Plans | x | x | x | x | x | x |
| 11. Implementation | x | x | x | x | x | x |
| 12. Security Validation | | o | | x | x | x |
| 13. Observability | | | | x | x | x |
| 14. ML Validation | | | | | | x |
| 15. Polish & Review | x | x | x | x | x | x |
x = always applies | o = conditional (based on sub-classification) | blank = skip
## Compile Your Phase Plan
After classification, explicitly list the active phases for this project before proceeding:
- Review the route table for your project type
- For each "o" phase, check the sub-classification answers to determine if it's active
- Write out the numbered list of active phases (e.g., "Active phases: 0.5, 1, 2, 3, 4, 7, 10, 11, 12, 13, 15")
- Present the phase plan to the user for confirmation before starting
This prevents accidentally skipping or running wrong phases.
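The routing logic above can be sketched as a small lookup: the route table becomes a dictionary, and conditional ("o") phases activate off the sub-classification answers. A minimal illustration — only two project types shown, phase lists taken from the route table, answer keys hypothetical:

```python
# Route table as data. "always" = the "x" phases for that project type;
# "conditional" maps an "o" phase to the sub-classification answer that
# activates it. Only two types shown; extend per the full route table.
ROUTE = {
    "full-stack": {
        "always": [1, 2, 3, 4, 7, 10, 11, 12, 13, 15],
        "conditional": {0.5: "existing_system", 9: "managed_infra"},
    },
    "web-frontend": {
        "always": [1, 10, 11, 15],
        "conditional": {0.5: "existing_system"},
    },
}

def compile_phase_plan(project_type, answers):
    """Return the sorted list of active phases for this project."""
    route = ROUTE[project_type]
    phases = set(route["always"])
    for phase, question in route["conditional"].items():
        if answers.get(question):  # conditional phase is active
            phases.add(phase)
    return sorted(phases)
```

Example: `compile_phase_plan("full-stack", {"managed_infra": True})` adds Phase 9 to the full-stack baseline; present the resulting list to the user before starting.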
## Phase 0.5: Existing System Assessment
Applies to: All project types, only when adding to an existing project (skip for greenfield)
Output: System assessment section prepended to design doc
### Purpose
Before brainstorming new features, understand what already exists. Designing without mapping the current system produces plans that conflict with existing architecture, duplicate existing capabilities, or ignore existing tech debt.
### Process
1. Map the current architecture:
   - Read README, docs/, and any existing design docs
   - Identify the tech stack, key dependencies, and deployment model
   - Map the data model (schemas, migrations, key entities)
   - Identify the main entry points and request flows
2. Identify constraints and boundaries:
   - What patterns and conventions does the codebase follow?
   - What are the existing API contracts that must not break?
   - What tech debt or known issues exist? (check issues, TODOs, CHANGELOG)
   - What dependencies are pinned or constrained?
3. Assess the test and CI situation:
   - What test coverage exists? What's tested vs untested?
   - What CI/CD pipeline exists? What checks run on PR?
   - How are deployments done today?
4. Summarize the integration surface:
   - What external services are already integrated?
   - What internal APIs exist that the new feature could reuse?
   - Where are the seams — natural places to extend without rewriting?
### Deliverable

```markdown
## Existing System Assessment

### Architecture Summary
- Stack: [Languages, frameworks, infrastructure]
- Key Components: [List with one-line descriptions]
- Data Model: [Key entities and relationships]

### Constraints
- Must Not Break: [Existing APIs, contracts, behaviors]
- Tech Debt: [Known issues that affect the new work]
- Conventions: [Patterns the new code must follow]

### Integration Surface
- Reusable: [Existing APIs/components the new feature can leverage]
- Seams: [Natural extension points]

### Test & CI Status
- Coverage: [What's tested, what's not]
- Pipeline: [What runs on PR/merge/deploy]
```
## Phase 1: Brainstorming
Applies to: All project types
Invoke:
Output: Design doc at docs/plans/YYYY-MM-DD-<topic>-design.md
If Phase 0.5 produced an assessment, feed it into brainstorming as context so the design builds on what exists rather than conflicting with it.
Do not proceed until the design doc is approved and committed.
Next: Proceed to Phase 2 (Domain Modeling) if active, otherwise skip to next active phase.
## Phase 2: Domain Modeling
Applies to: Full-Stack, Voice, Edge/IoT+ML, Mobile (with backend)
Source: Domain-Driven Design (Eric Evans)
Output: Domain model section appended to design doc
### Questions to Ask
Work through these one at a time:
1. Bounded Contexts: What are the distinct areas of the business domain?
   - Each context has its own ubiquitous language, models, and rules
   - Example (AiSyst): Ordering, Menu Management, Voice Interaction, Billing
   - Example (RCM): Detection, Review, Training, Fleet Management, Telemetry
2. Aggregates: Within each context, what are the consistency boundaries?
   - An aggregate is a cluster of entities that must be consistent together
   - What invariants must hold within each aggregate?
   - Example: An "Order" aggregate — items can't be empty, total must match items, status transitions are valid
3. Domain Events: What important things happen that other contexts care about?
   - Events cross context boundaries; commands stay within them
   - Example: "DetectionCreated" -> triggers Review context; "ReviewCompleted" -> triggers Training context
4. Context Map: How do bounded contexts communicate?
   - Shared kernel, customer-supplier, conformist, anti-corruption layer?
   - Where are the translation layers needed?
### Deliverable

```markdown
## Domain Model

### Bounded Contexts
- **[Context Name]**: [Purpose, key entities, invariants]

### Aggregates
- **[Aggregate Name]**: [Root entity, child entities, invariants]

### Domain Events
- [EventName]: [Source context] -> [Target context(s)]

### Context Map
[How contexts relate and communicate]
```
## Phase 3: System Design + Security-by-Design
Applies to: Full-Stack, Voice, Edge/IoT+ML, Mobile (with backend)
Invoke:
Output: System design doc at docs/plans/YYYY-MM-DD-<topic>-system-design.md
### Security-by-Design Injection Points
IMPORTANT: Before invoking the DDIA skill, write down the injection points below as a checklist. At each DDIA phase transition, check the list before proceeding. Do NOT rely on memory — the DDIA skill's own flow will consume your attention. While it runs, inject these additional questions at three phases:
At DDIA Phase 2 (Storage & Data Model):
- What access control model per table/collection? (RLS, RBAC, ABAC)
- Which fields contain PII? Encryption at rest strategy?
- What are the access patterns per role? (admin sees all, user sees own)
- Audit logging: which mutations need an audit trail?
At DDIA Phase 3 (Data Flow & Integration):
- What auth mechanism at each boundary? (JWT, API key, mTLS, webhook signature)
- How are secrets managed? (Environment vars, Vault, Secrets Manager)
- Transport security per channel? (TLS, mTLS for service-to-service)
- Which data crosses trust boundaries? What validation is needed at each?
At DDIA Phase 5 (Correctness & Cross-Cutting):
- What is the threat model? (STRIDE per component)
- Input validation strategy per boundary? (Zod schemas, parameterized queries)
- Rate limiting per endpoint tier? (public vs authenticated vs internal)
- What happens if a credential is compromised? Rotation and revocation plan?
### Accessibility-by-Design Injection Point (Web, Mobile, Desktop)
Inject at DDIA Phase 8 (Frontend & Derived Views) for any project with a UI:
- What WCAG level are you targeting? (A, AA, AAA — AA is the standard for most products)
- Color contrast: do all text/background combinations meet the target ratio? (4.5:1 for AA normal text, 3:1 for large text)
- Keyboard navigation: can every interactive element be reached and operated without a mouse?
- Screen reader strategy: what semantic HTML / ARIA roles are needed? What's the heading hierarchy?
- Motion: do animations respect the user's reduced-motion preference? Are there alternatives for motion-dependent interactions?
- Touch targets: are all interactive elements at least 44x44pt (iOS) / 48x48dp (Android)?
These questions are proactive — catching contrast issues and keyboard traps during design costs minutes; fixing them after implementation costs hours.
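Contrast checking in particular is mechanical enough to automate during design. A minimal sketch of the WCAG 2.x contrast-ratio computation — the formula behind the 4.5:1 (AA normal text) and 3:1 (large text) thresholds above; colors are 0–255 sRGB tuples:

```python
# WCAG 2.x relative luminance and contrast ratio.
def _channel(c8):
    # Linearize one sRGB channel (0-255) per the WCAG formula.
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(rgb):
    r, g, b = (_channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    # Ratio of lighter to darker luminance, each offset by 0.05.
    l1, l2 = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# Black on white is the maximum possible contrast.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # → 21.0
```

Running every text/background pair in the design palette through this during Phase 3 is how you catch failures like #777-on-white (≈4.48:1, just under AA) before anything is built.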
### Deliverable
The standard DDIA design summary doc, with security and accessibility decisions integrated into each relevant phase (not as a separate section).
## Phase 4: Resilience Patterns
Applies to: Full-Stack, Voice, Edge/IoT+ML, Mobile (with external services)
Source: Release It! (Michael Nygard)
Output: Resilience section appended to system design doc
### Questions to Ask
For each external dependency (API, database, message queue, third-party service):
1. Failure Mode: What happens when this dependency is unavailable?
   - Timeout? Error response? Silent data loss?
   - How long can you tolerate the outage?
2. Circuit Breaker: Should you fail fast after N failures?
   - What's the threshold? (e.g., 5 failures in 30 seconds)
   - What's the half-open recovery strategy?
3. Timeout Budget: What's the maximum wait time?
   - For voice agents: total turn budget (STT + LLM + TTS must complete before silence)
   - For web: p95 response time target per endpoint
4. Retry Policy: Is the operation safe to retry?
   - Idempotent? -> Retry with exponential backoff
   - Non-idempotent? -> Fail and surface to user
   - Maximum retries before circuit opens?
5. Bulkhead: Does failure in one integration affect others?
   - Separate thread pools / connection pools per dependency?
   - Can a slow Stripe response block voice ordering?
6. Graceful Degradation: What's the reduced-functionality mode?
   - POS down -> queue orders for later sync?
   - GPS unavailable -> save detection without coordinates?
   - Cache down -> serve from DB (slower but functional)?
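The circuit-breaker and retry questions above combine into one small pattern. A sketch under assumed thresholds (failure count opens the circuit, a cooldown allows a half-open probe; all numbers are illustrative, tune per dependency):

```python
import random
import time

class CircuitBreaker:
    """Fail fast after `threshold` consecutive failures; probe after cooldown."""

    def __init__(self, threshold=5, reset_after=30.0):
        self.threshold, self.reset_after = threshold, reset_after
        self.failures, self.opened_at = 0, None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: let a probe through once the cooldown has elapsed.
        return time.monotonic() - self.opened_at >= self.reset_after

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

def call_with_retry(fn, breaker, max_retries=3, base_delay=0.1):
    """Retry an IDEMPOTENT call with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        if not breaker.allow():
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
            breaker.record(ok=True)
            return result
        except Exception:
            breaker.record(ok=False)
            if attempt == max_retries:
                raise
            time.sleep(base_delay * 2 ** attempt * (1 + random.random()))
```

Non-idempotent operations skip `call_with_retry` entirely and surface the failure to the user, per question 4.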
### Deliverable

```markdown
## Resilience Patterns

| Dependency | Failure Mode | Circuit Breaker | Timeout | Retry | Degraded Mode |
|---|---|---|---|---|---|
| [Service] | [What breaks] | [Threshold] | [ms] | [Policy] | [Fallback] |
```
## Phase 5: ML Pipeline Design
Applies to: Edge/IoT+ML
Source: Designing Machine Learning Systems (Chip Huyen)
Output: ML pipeline doc at docs/plans/YYYY-MM-DD-<topic>-ml-pipeline.md
### Questions to Ask
Data Pipeline:
1. What is the training data source? (labeled images, sensor data, logs)
2. How is data labeled? (manual, semi-automated, active learning)
3. What is the labeling quality control process?
4. Data versioning strategy? (DVC, S3 versioning, git-lfs)
5. Class imbalance — what's the distribution? Augmentation strategy?
6. Train/val/test split strategy? (random, temporal, geographic)
Model Lifecycle:
7. Model architecture selection criteria? (accuracy vs latency vs size)
8. Experiment tracking? (MLflow, W&B, spreadsheet)
9. Model versioning scheme? (dev/staging/prod, semver)
10. Export format for deployment? (ONNX, TensorRT, CoreML, .pt)
11. Model size budget? (edge device storage + memory constraints)
Deployment & Serving:
12. How does a new model reach production? (OTA, manual flash, staged rollout)
13. Canary deployment? (% of fleet on new model before full rollout)
14. Rollback strategy? (automatic on metric degradation, manual)
15. A/B testing — how do you compare model versions in production?
Monitoring & Retraining:
16. What metrics define model health? (precision, recall, F1, latency)
17. How is drift detected? (data drift, concept drift, prediction drift)
18. What triggers retraining? (metric threshold, scheduled, manual)
19. Human-in-the-loop feedback loop — how long from detection to retraining?
20. Cold start — what happens when the model encounters a new environment?
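One concrete answer to the drift-detection question (17) is the Population Stability Index (PSI) between the training distribution and a window of production data, computed per feature or over prediction scores. The 0.1 / 0.25 thresholds below are common rules of thumb, not universal constants:

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned distributions.

    expected_counts: per-bin counts from the training/reference data.
    actual_counts:   per-bin counts from recent production data.
    Rule of thumb: < 0.1 stable, 0.1-0.25 watch, > 0.25 significant drift.
    """
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        p = max(e / e_total, eps)  # clamp to avoid log(0)
        q = max(a / a_total, eps)
        score += (q - p) * math.log(q / p)
    return score
```

A scheduled job computing PSI over the last N days of detections, alerting past the drift threshold, is one way to wire question 17 into question 18's retraining trigger.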
### Deliverable

```markdown
## ML Pipeline Design

### Data Pipeline
- Source: [Where training data comes from]
- Labeling: [Process, QC, tooling]
- Versioning: [Strategy]
- Splits: [Train/val/test ratios and strategy]

### Model Lifecycle
- Architecture: [Model, why chosen]
- Experiment Tracking: [Tool/process]
- Versioning: [Scheme]
- Export: [Format, size budget]

### Deployment
- Delivery: [OTA/manual, canary %]
- Rollback: [Trigger and process]

### Monitoring
- Health Metrics: [What to track]
- Drift Detection: [Method and thresholds]
- Retraining Trigger: [Conditions]
- Feedback Loop Latency: [Time from detection to retrained model deployed]
```
## Phase 6: Edge Architecture Design
Applies to: Edge/IoT+ML
Source: IoT architecture patterns, Release It! edge extensions
Output: Edge architecture section appended to system design doc
### Questions to Ask
Device Constraints:
1. What hardware? (CPU, GPU, RAM, storage, connectivity)
2. Power source? (battery, vehicle power, mains)
3. Physical environment? (temperature range, vibration, dust, moisture)
4. What sensors? (cameras, GPS, accelerometer, etc.)
Offline-First Design:
5. Expected connectivity patterns? (always-on, intermittent, shift-based)
6. Maximum offline duration to survive? (hours, days)
7. Local queue strategy? (SQLite, file queue, memory buffer)
8. Queue overflow policy? (oldest-first eviction, priority-based, compress)
9. Sync strategy on reconnect? (batch upload, priority queue, bandwidth-aware)
Resource Budgeting:
10. CPU/GPU budget split? (inference %, upload %, logging %, OS overhead %)
11. Memory budget? (model size + working memory + queue + buffers)
12. Storage budget? (model files + offline queue + logs + OS)
13. Bandwidth budget? (payload size x frequency x fleet size = daily data volume)
14. Frame rate vs accuracy trade-off? (every frame, 1/sec, triggered)
Fleet Management:
15. How many devices? Current and projected?
16. Device provisioning workflow? (certificate issuance, registration, initial config)
17. OTA update strategy? (Greengrass, custom, staged rollout %)
18. Health monitoring? (heartbeat interval, metrics reported, alerting thresholds)
19. Decommissioning? (certificate revocation, data cleanup)
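The local-queue questions (7–9) can be grounded with a sketch: a durable SQLite-backed queue with oldest-first eviction and batched drain on reconnect. The size, batch number, and in-memory path are placeholders — a real device would use a file path and tune both to its storage budget:

```python
import json
import sqlite3

class OfflineQueue:
    """Durable local queue: oldest-first eviction, batch drain on reconnect."""

    def __init__(self, path=":memory:", max_rows=10_000):
        self.db = sqlite3.connect(path)
        self.max_rows = max_rows
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS q (id INTEGER PRIMARY KEY, payload TEXT)"
        )

    def put(self, record):
        self.db.execute("INSERT INTO q (payload) VALUES (?)", (json.dumps(record),))
        # Oldest-first eviction keeps the queue inside its storage budget.
        self.db.execute(
            "DELETE FROM q WHERE id IN (SELECT id FROM q ORDER BY id "
            "LIMIT max(0, (SELECT COUNT(*) FROM q) - ?))",
            (self.max_rows,),
        )
        self.db.commit()

    def drain(self, batch=100):
        """Return up to `batch` records and delete them (sync on reconnect)."""
        rows = self.db.execute(
            "SELECT id, payload FROM q ORDER BY id LIMIT ?", (batch,)
        ).fetchall()
        self.db.executemany("DELETE FROM q WHERE id = ?", [(r[0],) for r in rows])
        self.db.commit()
        return [json.loads(r[1]) for r in rows]
```

Priority-based eviction or compression (question 8) would replace the `DELETE ... ORDER BY id` clause; the oldest-first policy here is just the simplest option.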
### Deliverable

```markdown
## Edge Architecture

### Device Profile
- Hardware: [Specs]
- Constraints: [Power, connectivity, environment]
- Sensors: [List with interfaces]

### Offline Strategy
- Queue: [Technology, max size, overflow policy]
- Sync: [Strategy, priority]
- Max Offline Duration: [Hours/days]

### Resource Budget
| Resource | Budget | Allocation |
|---|---|---|
| CPU/GPU | 100% | Inference %, Upload %, Other % |
| RAM | [Size] | Model %, Queue %, OS % |
| Storage | [Size] | Models %, Queue %, Logs % |
| Bandwidth | [Daily] | Detections %, Telemetry %, Updates % |

### Fleet Management
- Fleet Size: [Current -> Projected]
- Provisioning: [Workflow]
- OTA Updates: [Strategy, rollout %]
- Health Monitoring: [Metrics, intervals, alerts]
```
## Phase 7: API Specification
Applies to: Full-Stack, Voice, Edge/IoT+ML, Mobile (with backend)
Output: API spec appended to system design doc or separate doc
### Process
For each system boundary identified in Phase 3 (Data Flow):
1. List all endpoints/contracts:
   - REST endpoints (method, path)
   - Event schemas (SQS messages, EventBridge events)
   - Webhook contracts (incoming from third parties)
   - Device protocols (MQTT topics, IoT shadow schemas)
   - Tool interfaces (voice agent tools, function calling schemas)
2. For each endpoint, define:
   - Auth requirement (JWT, API key, service role, webhook signature, mTLS)
   - Request schema (with types, required/optional, validation rules)
   - Response schema (success + error shapes)
   - Rate limiting tier (public, authenticated, internal, service-to-service)
   - Idempotency (safe to retry? idempotency key?)
3. Error format standard:
   - Agree on ONE error shape across all APIs
   - Include: status code, error code, human message, details object
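The single error shape might be pinned down as a tiny shared helper, so every service emits the same envelope. Field names here mirror the { status, code, message, details } convention; the helper itself is an illustrative sketch:

```python
from dataclasses import asdict, dataclass, field

@dataclass
class ApiError:
    """The one error envelope every API returns."""
    status: int            # HTTP status code
    code: str              # stable machine-readable code, e.g. "validation_failed"
    message: str           # human-readable summary
    details: dict = field(default_factory=dict)  # per-field or contextual info

def error_response(status, code, message, **details):
    """Build the serializable error body for any endpoint."""
    return asdict(ApiError(status, code, message, details))
```

Usage: `error_response(429, "rate_limited", "Too many requests", retry_after=30)` — clients then branch on `code` and never parse `message`.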
### Deliverable

```markdown
## API Specification

### Error Format
{ status, code, message, details }

### Endpoints
#### [Boundary Name]
| Method | Path | Auth | Rate Limit | Idempotent |
|---|---|---|---|---|
| POST | /api/example | JWT | 100/min | Yes (key) |

**Request:** { ... }
**Response:** { ... }
**Errors:** 400 (validation), 401 (auth), 429 (rate limit)
```
## Phase 8: Voice Agent Prompt Design
Applies to: Voice Agent
Invoke:
Output: Voice prompt doc
## Phase 9: Infrastructure Design
Applies to: Edge/IoT+ML, and any project with self-managed cloud infrastructure
Source: Infrastructure as Code (Kief Morris)
Output: Infrastructure section appended to system design doc
### Questions to Ask
1. IaC Tool & Module Structure:
   - What IaC tool? (Terraform, Pulumi, CDK, CloudFormation)
   - Module boundaries — which resources belong together?
   - Shared vs environment-specific modules?
2. State Management:
   - Remote state backend? (S3, Terraform Cloud, Azure Blob)
   - State locking mechanism?
   - State file per environment or per module?
3. Environment Strategy:
   - How many environments? (dev, staging, prod)
   - How do changes promote? (manual apply, CI/CD pipeline, GitOps)
   - Blast radius of a bad apply — what's the worst case?
4. CI/CD Pipeline:
   - Plan on PR, apply on merge?
   - Who approves infrastructure changes?
   - Rollback strategy for infrastructure?
5. IaC Testing:
   - Static analysis? (tfsec, Checkov, OPA)
   - Plan validation? (terraform plan diff review)
   - Integration tests? (test environment that mirrors prod)
6. Secrets Management:
   - Where do secrets live? (Vault, Secrets Manager, SSM Parameter Store)
   - Rotation schedule?
   - Emergency revocation process?
7. Cost Estimation:
   - What's the compute cost per unit of work? (per API call, per inference, per voice minute)
   - What are the third-party API costs at projected volume? (Twilio per-minute, Deepgram per-hour, Stripe per-transaction, S3 per-GB)
   - What's the storage growth projection? (GB/month now, in 6 months, in 2 years)
   - What's the monthly burn at current scale? At 10x scale?
   - Where are the cost cliffs? (Aurora serverless scaling tiers, Lambda invocation thresholds, data transfer costs)
   - Is there a cost ceiling / budget constraint?
   - What's the cost-per-user or cost-per-unit-of-value? (Does the unit economics work?)
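The cost arithmetic is simple enough to keep as a small living script next to the design doc. Every unit price and volume below is made up — pull real numbers from the vendors' pricing pages — and note the deliberately linear model ignores cost cliffs (tier boundaries), which you flag separately:

```python
# Hypothetical line items: name -> (unit cost in USD, units per month).
LINE_ITEMS = {
    "telephony_minutes":   (0.0085, 50_000),
    "stt_hours":           (0.46, 800),
    "llm_tokens_millions": (3.00, 120),
    "storage_gb":          (0.023, 500),
}

def monthly_cost(items, scale=1.0):
    """Linear burn estimate; `scale` answers the 'at 10x?' question."""
    return sum(unit * volume * scale for unit, volume in items.values())

base = monthly_cost(LINE_ITEMS)
print(f"monthly: ${base:,.2f}, at 10x: ${monthly_cost(LINE_ITEMS, 10):,.2f}")
```

Dividing the result by projected active users gives the cost-per-user figure the unit-economics question asks for.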
### Deliverable

```markdown
## Infrastructure Design

### IaC Structure
- Tool: [Terraform/Pulumi/etc.]
- Modules: [List with responsibilities]
- State: [Backend, locking, per-environment strategy]

### Environment Promotion
- Environments: [List]
- Promotion Flow: [PR -> plan -> review -> apply]
- Rollback: [Strategy]

### Secrets
- Store: [Tool]
- Rotation: [Schedule]
- Emergency Revocation: [Process]

### Cost Estimation
| Category | Unit Cost | Volume | Monthly Cost | Cost at 10x |
|---|---|---|---|---|
| [Compute] | [$/unit] | [units/month] | [$] | [$] |
| [Storage] | [$/GB] | [GB] | [$] | [$] |
| [Third-party API] | [$/call] | [calls/month] | [$] | [$] |
| **Total** | | | **[$]** | **[$]** |

Cost Ceiling: [Budget constraint if any]
Cost-per-User: [$/user/month at projected scale]
```
## Phase 10: Implementation Planning
Applies to: All project types
Invoke:
Input: All design docs produced in prior phases
### Testing Strategy Addition
When creating the implementation plan, ensure each task specifies which level of the testing pyramid it targets:
| Level | What It Tests | When to Use |
|---|---|---|
| Unit | Single function/component in isolation | Every task (TDD) |
| Integration | Two+ components together, real dependencies | API endpoints, DB queries, service integrations |
| Contract | API shape matches between producer/consumer | Cross-service boundaries, webhook contracts, device protocols |
| End-to-End | Full user flow through the system | Critical paths only (login, core transaction, detection pipeline) |
| Load | Performance under expected/peak traffic | After core features are built |
Source: Growing Object-Oriented Software, Guided by Tests (Freeman & Pryce)
Output: Implementation plan at docs/plans/YYYY-MM-DD-<topic>-plan.md
## Phase 11: Implementation
Applies to: All project types
Choose execution approach:
- Subagent-driven (current session): Invoke /subagent-driven-development
- Parallel session (separate): Invoke in a new session
## Phase 12: Security Validation
Applies to: Full-Stack, Voice, Edge/IoT+ML, Mobile (with backend)
Invoke: and/or
Run BEFORE deployment. Verify that security-by-design decisions from Phase 3 were actually implemented.
For Edge/IoT projects, additionally verify:
- Device certificates: valid, unique per device, rotation scheduled
- MQTT topic security: devices can only publish to their own topics
- Firmware integrity: signed updates, verified on device
- Physical security: what credentials are on the device if someone steals it?
Output: Security audit report
## Phase 13: Observability Design
Applies to: Full-Stack, Voice, Edge/IoT+ML
Source: Observability Engineering (Charity Majors)
Output: Observability section appended to system design doc
### Questions to Ask
1. Structured Logging:
   - What logging format? (JSON, structured key-value)
   - What fields on every log line? (timestamp, service, request_id, user_id, trace_id)
   - Correlation IDs — how do you trace a request across services?
2. Distributed Tracing:
   - What spans exist? (one per service hop in the request path)
   - What tool? (X-Ray, Jaeger, OpenTelemetry)
   - What sampling rate? (100% in staging, 10% in prod, 100% for errors)
3. Metrics:
   - RED metrics per service: Rate, Errors, Duration
   - Business metrics: orders/hour, detections/day, review latency
   - Infrastructure metrics: CPU, memory, queue depth, cache hit rate
   - Use percentiles (p50, p95, p99), not averages
4. Alerting:
   - What's worth waking someone up for? (data loss, service down, security breach)
   - What can wait until morning? (elevated error rate, slow responses, queue backlog)
   - Alert fatigue prevention — fewer, better alerts
5. Dashboards:
   - One dashboard per bounded context
   - Top-level "system health" dashboard
   - On-call runbook linked from each alert
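A minimal sketch of the structured-logging rules above — JSON lines, a fixed set of required fields, and a request_id that propagates through every call for cross-service correlation. Field names are illustrative:

```python
import json
import time
import uuid

# Every log line must carry at least these fields.
REQUIRED = ("ts", "service", "level", "request_id", "message")

def make_logger(service, request_id=None):
    """Bind a service name and correlation id; reuse an inbound request_id
    from upstream so the trace spans service boundaries."""
    rid = request_id or str(uuid.uuid4())

    def log(level, message, **fields):
        line = {"ts": time.time(), "service": service, "level": level,
                "request_id": rid, "message": message, **fields}
        print(json.dumps(line))  # one JSON object per line, ship to collector
        return line

    return log

log = make_logger("orders-api")
log("info", "order created", order_id="o-123", duration_ms=42)
```

The same `request_id` gets forwarded in an HTTP header (or message attribute) so the next service's logger is constructed with it rather than minting a fresh one.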
### Deliverable

```markdown
## Observability

### Logging
- Format: [JSON/structured]
- Required Fields: [timestamp, service, request_id, trace_id, ...]
- Correlation: [How trace IDs propagate]

### Tracing
- Tool: [X-Ray/Jaeger/OTEL]
- Spans: [List of spans in critical path]
- Sampling: [Rate per environment]

### Metrics
| Metric | Type | Alert Threshold |
|---|---|---|
| [name] | [RED/business/infra] | [threshold] |

### Alerting
| Alert | Severity | Runbook |
|---|---|---|
| [What] | [Page/Warning/Info] | [Runbook link] |
```
## Phase 14: ML Validation
Applies to: Edge/IoT+ML
Source: Designing Machine Learning Systems (Chip Huyen), Reliable Machine Learning (Cathy Chen et al.)
Output: ML validation report
### Validation Checklist
Run after implementation, before production deployment:
1. Model Performance:
   - Precision, recall, F1 on held-out test set
   - Performance per class (not just aggregate)
   - Performance on edge cases (night, rain, dust, unusual angles)
   - Latency on target hardware (not just dev machine)
2. Data Quality:
   - Label consistency audit (sample and re-label, measure agreement)
   - Data leakage check (training data contaminating test set)
   - Distribution shift check (training data vs production data)
3. Robustness:
   - Adversarial inputs (unusual lighting, occlusion, camera artifacts)
   - Out-of-distribution detection (does the model know when it doesn't know?)
   - Confidence calibration (does 90% confidence mean 90% accuracy?)
4. Fairness & Bias:
   - Performance across operating conditions (time of day, weather, road type)
   - False positive/negative rates across conditions
   - Are some environments systematically underrepresented?
5. Operational Readiness:
   - Model loads correctly on target hardware
   - Inference fits within resource budget (Phase 6)
   - Offline queue handles expected volume
   - Monitoring pipeline captures metrics correctly
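The calibration check ("does 90% confidence mean 90% accuracy?") has a standard metric: Expected Calibration Error, which bins predictions by confidence and compares each bin's average confidence to its empirical accuracy. A minimal sketch:

```python
def ece(confidences, correct, n_bins=10):
    """Expected Calibration Error over equal-width confidence bins.

    confidences: predicted confidence per sample, in [0, 1].
    correct:     whether each prediction was right (bools).
    Returns 0 for a perfectly calibrated model; larger = worse.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 -> last bin
        bins[idx].append((conf, ok))
    total, score = len(confidences), 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        score += (len(b) / total) * abs(avg_conf - accuracy)
    return score
```

Toy check: ten predictions at 90% confidence with 9 correct yields an ECE of ~0 (well calibrated); the same confidences with only 5 correct yields ~0.4 (badly overconfident).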
### Deliverable

```markdown
## ML Validation Report

### Performance
| Metric | Test Set | Worst Class | Edge Cases | Target HW |
|---|---|---|---|---|
| Precision | | | | |
| Recall | | | | |
| F1 | | | | |
| Latency (ms) | | | | |

### Data Quality
- Label Agreement: [%]
- Leakage Check: [Pass/Fail]
- Distribution Shift: [Within/Outside tolerance]

### Robustness
- Adversarial: [Results]
- OOD Detection: [Method, threshold]
- Calibration: [ECE score]

### Go/No-Go Decision
[Ready / Needs retraining / Needs more data]
```
## Phase 15: Polish & Review
Applies to: All project types
Route to the appropriate review skill(s):
| Project Type | Review Skills |
|---|---|
| macOS App | + (review mode) |
| iOS Mobile App | (review mode) + |
| Web Frontend | + |
| Full-Stack Web | + |
| Voice Agent | (conversational flow review) |
| Edge/IoT + ML | (webapp) + (webapp) |
After the review skills complete, run a final review pass over the full codebase.
## Resumption Protocol
If starting a new session mid-project:
- Check for existing artifacts
- Read each doc to understand decisions already made
- Determine which phase produced the last artifact
- Resume from the next phase
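The artifact check can be mechanical. A sketch that infers the latest completed doc-producing phase from filenames — the suffixes assume this skill's naming conventions, only the phases with distinct file outputs are listed, and code/test state (Phase 11 onward) still needs a manual check:

```python
# Suffix of the artifact filename -> phase it marks as completed.
# Assumed from the docs/plans/YYYY-MM-DD-<topic>-*.md convention.
ARTIFACT_PHASE = [
    ("-plan.md", 10),          # implementation plan -> Phase 10 done
    ("-ml-pipeline.md", 5),    # ML pipeline doc     -> Phase 5 done
    ("-system-design.md", 3),  # system design doc   -> Phase 3 done
    ("-design.md", 1),         # brainstorm design   -> Phase 1 done
]

def last_completed_phase(artifact_names):
    """artifact_names: filenames found in docs/plans/ (e.g. via Path.glob)."""
    completed = [phase for suffix, phase in ARTIFACT_PHASE
                 if any(name.endswith(suffix) for name in artifact_names)]
    return max(completed, default=None)
```

Resume from the first active phase after the returned number; `None` means start at Phase 0.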
Artifact -> Phase mapping:
| Artifact | Phase Completed |
|---|---|
| System Assessment section | Phase 0.5 (Existing System Assessment) |
| Design doc | Phase 1 (Brainstorming) |
| Domain Model section in design doc | Phase 2 |
| System design doc | Phase 3 (DDIA) |
| Resilience section in system design | Phase 4 |
| ML pipeline doc | Phase 5 |
| Edge Architecture section | Phase 6 |
| API Specification section/doc | Phase 7 |
| Voice prompt doc | Phase 8 |
| Infrastructure section | Phase 9 |
| Implementation plan | Phase 10 (Writing Plans) |
| Code exists + tests pass | Phase 11 (Implementation) |
| Security audit report | Phase 12 |
| Observability section | Phase 13 |
| ML validation report | Phase 14 |
| Review findings addressed | Phase 15 |
## Anti-Patterns

| Mistake | Fix |
|---|---|
| Skipping to implementation | Always start at Phase 0, even if "you know what you're building" |
| Running all 15 phases for a simple macOS app | Trust the router — it selects only applicable phases |
| Treating security as Phase 12 only | Security-by-design is in Phase 3; Phase 12 validates it was implemented |
| Designing the ML pipeline after building the API | Phases are sequential — ML decisions affect API shape |
| Writing plans without a domain model | Plans based on a vague domain produce vague tasks |
| Skipping resilience for "internal" services | Internal services fail too — especially at 3am |
| Averaging latency instead of using percentiles | p50 hides tail latency; use p95/p99 |
| Adding features to an existing system without mapping it first | Run Phase 0.5 — understand what exists before designing what's new |
| Treating accessibility as a Phase 15 afterthought | Accessibility-by-design in Phase 3 catches issues that are expensive to retrofit |
| Ignoring cloud costs until the bill arrives | Cost estimation in Phase 9 prevents surprises — unit economics matter |
## Book References

| Phase | Book | Author |
|---|---|---|
| 2. Domain Modeling | Domain-Driven Design | Eric Evans |
| 3. System Design | Designing Data-Intensive Applications | Martin Kleppmann |
| 4. Resilience | Release It! | Michael Nygard |
| 5. ML Pipeline | Designing Machine Learning Systems | Chip Huyen |
| 9. Infrastructure | Infrastructure as Code | Kief Morris |
| 10. Testing Strategy | Growing Object-Oriented Software, Guided by Tests | Freeman & Pryce |
| 13. Observability | Observability Engineering | Charity Majors |
| 14. ML Validation | Reliable Machine Learning | Cathy Chen et al. |
| 15. UI Polish | Refactoring UI | Wathan & Schoger |
| 15. UX Review | Don't Make Me Think | Steve Krug |