# Project Orchestrator

## Overview
Universal project lifecycle skill. Classifies your project type, builds a phase plan, then walks through each phase sequentially — invoking existing skills where they exist and running inline design phases where they don't.
The rule: No project uses all phases. The router selects 4–14 phases based on what you're actually building.
Announce at start: "I'm using the orchestrate skill to guide this project through its lifecycle."
## When to Use
- Starting a new project from scratch (greenfield)
- Adding a major feature that changes architecture, data flow, or integrations
- Unsure which skills to invoke and in what order
- Starting work on a project type you haven't classified before
**When NOT to use:**
- Small bug fixes, typos, minor UI tweaks — just do the work
- Pure research or exploration — use Explore agent directly
- Single-file changes with clear requirements — use TDD directly
- You already know exactly which single skill applies (e.g., just need )
## How It Works
- Classify — Ask what you're building, determine project type
- Route — Select the phases that apply
- Execute — Walk through each phase sequentially
- Handoff — Each phase produces a doc artifact; later phases build on earlier ones
All artifacts are saved to docs/plans/. If resuming mid-project, check which docs already exist to determine the current phase.
## Phase 0: Project Classification
Ask ONE question to classify the project:
"What are you building?"
| Type | Indicators | Phase Count |
|---|---|---|
| macOS App | Desktop UI, SwiftUI, AppKit, menu bar app | 4 phases |
| iOS Mobile App | iPhone/iPad, SwiftUI, UIKit, App Store | 5–9 phases |
| Web Frontend | React, Vue, static site, no backend | 5 phases |
| Full-Stack Web | Frontend + database + API + auth | 10 phases |
| Voice Agent | LiveKit, telephony, STT/TTS, conversational AI | 11 phases |
| Edge/IoT + ML | Hardware devices, computer vision, ML pipeline, fleet management | 13 phases |
Sub-classification questions (if needed):
- Mobile/Web: "Does it have a backend?" — if yes, add full-stack phases
- Any type: "Does it integrate with external services?" — if yes, add resilience phase
- Any type: "Will this be deployed to cloud infrastructure you manage?" — if yes, add infrastructure phase
- Any type: "Is this a new project or adding to an existing system?" — if existing, add system assessment phase
## Route Table
| Phase | macOS | iOS | Web FE | Full-Stack | Voice | Edge/IoT+ML |
|---|---|---|---|---|---|---|
| 0.5 System Assessment | o | o | o | o | o | o |
| 1. Brainstorm | x | x | x | x | x | x |
| 2. Domain Model | | o | | x | x | x |
| 3. System Design + Security | | o | | x | x | x |
| 4. Resilience | | o | | x | x | x |
| 5. ML Pipeline | | | | | | x |
| 6. Edge Architecture | | | | | | x |
| 7. API Specification | | o | | x | x | x |
| 8. Voice Prompt Design | | | | | x | |
| 9. Infrastructure | | | | o | o | x |
| 10. Writing Plans | x | x | x | x | x | x |
| 11. Implementation | x | x | x | x | x | x |
| 12. Security Validation | | o | | x | x | x |
| 13. Observability | | | | x | x | x |
| 14. ML Validation | | | | | | x |
| 15. Polish & Review | x | x | x | x | x | x |
x = always applies | o = conditional (based on sub-classification) | blank = skip
## Compile Your Phase Plan
After classification, explicitly list the active phases for this project before proceeding:
- Review the route table for your project type
- For each "o" phase, check the sub-classification answers to determine if it's active
- Write out the numbered list of active phases (e.g., "Active phases: 0.5, 1, 2, 3, 4, 7, 10, 11, 12, 13, 15")
- Present the phase plan to the user for confirmation before starting
This prevents accidentally skipping or running wrong phases.
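The routing logic above can be sketched as a small lookup: the route table becomes a dictionary, and conditional ("o") phases activate off the sub-classification answers. A minimal illustration — only two project types shown, phase lists taken from the route table, answer keys hypothetical:

```python
# Route table as data. "always" = the "x" phases for that project type;
# "conditional" maps an "o" phase to the sub-classification answer that
# activates it. Only two types shown; extend per the full route table.
ROUTE = {
    "full-stack": {
        "always": [1, 2, 3, 4, 7, 10, 11, 12, 13, 15],
        "conditional": {0.5: "existing_system", 9: "managed_infra"},
    },
    "web-frontend": {
        "always": [1, 10, 11, 15],
        "conditional": {0.5: "existing_system"},
    },
}

def compile_phase_plan(project_type, answers):
    """Return the sorted list of active phases for this project."""
    route = ROUTE[project_type]
    phases = set(route["always"])
    for phase, question in route["conditional"].items():
        if answers.get(question):  # conditional phase is active
            phases.add(phase)
    return sorted(phases)
```

Example: `compile_phase_plan("full-stack", {"managed_infra": True})` adds Phase 9 to the full-stack baseline; present the resulting list to the user before starting.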
## Phase 0.5: Existing System Assessment
Applies to: All project types, only when adding to an existing project (skip for greenfield)
Output: System assessment section prepended to design doc
### Purpose
Before brainstorming new features, understand what already exists. Designing without mapping the current system produces plans that conflict with existing architecture, duplicate existing capabilities, or ignore existing tech debt.
### Process
1. Map the current architecture:
   - Read README, docs/, and any existing design docs
   - Identify the tech stack, key dependencies, and deployment model
   - Map the data model (schemas, migrations, key entities)
   - Identify the main entry points and request flows
2. Identify constraints and boundaries:
   - What patterns and conventions does the codebase follow?
   - What are the existing API contracts that must not break?
   - What tech debt or known issues exist? (check issues, TODOs, CHANGELOG)
   - What dependencies are pinned or constrained?
3. Assess the test and CI situation:
   - What test coverage exists? What's tested vs untested?
   - What CI/CD pipeline exists? What checks run on PR?
   - How are deployments done today?
4. Summarize the integration surface:
   - What external services are already integrated?
   - What internal APIs exist that the new feature could reuse?
   - Where are the seams — natural places to extend without rewriting?
### Deliverable

```markdown
## Existing System Assessment

### Architecture Summary
- Stack: [Languages, frameworks, infrastructure]
- Key Components: [List with one-line descriptions]
- Data Model: [Key entities and relationships]

### Constraints
- Must Not Break: [Existing APIs, contracts, behaviors]
- Tech Debt: [Known issues that affect the new work]
- Conventions: [Patterns the new code must follow]

### Integration Surface
- Reusable: [Existing APIs/components the new feature can leverage]
- Seams: [Natural extension points]

### Test & CI Status
- Coverage: [What's tested, what's not]
- Pipeline: [What runs on PR/merge/deploy]
```
## Phase 1: Brainstorming
Applies to: All project types
Invoke:
Output: Design doc at docs/plans/YYYY-MM-DD-<topic>-design.md
If Phase 0.5 produced an assessment, feed it into brainstorming as context so the design builds on what exists rather than conflicting with it.
Do not proceed until the design doc is approved and committed.
Next: Proceed to Phase 2 (Domain Modeling) if active, otherwise skip to next active phase.
## Phase 2: Domain Modeling
Applies to: Full-Stack, Voice, Edge/IoT+ML, Mobile (with backend)
Source: Domain-Driven Design (Eric Evans)
Output: Domain model section appended to design doc
### Questions to Ask
Work through these one at a time:
1. Bounded Contexts: What are the distinct areas of the business domain?
   - Each context has its own ubiquitous language, models, and rules
   - Example (AiSyst): Ordering, Menu Management, Voice Interaction, Billing
   - Example (RCM): Detection, Review, Training, Fleet Management, Telemetry
2. Aggregates: Within each context, what are the consistency boundaries?
   - An aggregate is a cluster of entities that must be consistent together
   - What invariants must hold within each aggregate?
   - Example: An "Order" aggregate — items can't be empty, total must match items, status transitions are valid
3. Domain Events: What important things happen that other contexts care about?
   - Events cross context boundaries; commands stay within them
   - Example: "DetectionCreated" -> triggers Review context; "ReviewCompleted" -> triggers Training context
4. Context Map: How do bounded contexts communicate?
   - Shared kernel, customer-supplier, conformist, anti-corruption layer?
   - Where are the translation layers needed?
### Deliverable

```markdown
## Domain Model

### Bounded Contexts
- **[Context Name]**: [Purpose, key entities, invariants]

### Aggregates
- **[Aggregate Name]**: [Root entity, child entities, invariants]

### Domain Events
- [EventName]: [Source context] -> [Target context(s)]

### Context Map
[How contexts relate and communicate]
```
## Phase 3: System Design + Security-by-Design
Applies to: Full-Stack, Voice, Edge/IoT+ML, Mobile (with backend)
Invoke:
Output: System design doc at docs/plans/YYYY-MM-DD-<topic>-system-design.md
### Security-by-Design Injection Points
IMPORTANT: Before invoking the DDIA skill, write down the injection points below as a checklist. At each DDIA phase transition, check the list before proceeding. Do NOT rely on memory — the DDIA skill's own flow will consume your attention. While it runs, inject these additional questions at three phases:
At DDIA Phase 2 (Storage & Data Model):
- What access control model per table/collection? (RLS, RBAC, ABAC)
- Which fields contain PII? Encryption at rest strategy?
- What are the access patterns per role? (admin sees all, user sees own)
- Audit logging: which mutations need an audit trail?
At DDIA Phase 3 (Data Flow & Integration):
- What auth mechanism at each boundary? (JWT, API key, mTLS, webhook signature)
- How are secrets managed? (Environment vars, Vault, Secrets Manager)
- Transport security per channel? (TLS, mTLS for service-to-service)
- Which data crosses trust boundaries? What validation is needed at each?
At DDIA Phase 5 (Correctness & Cross-Cutting):
- What is the threat model? (STRIDE per component)
- Input validation strategy per boundary? (Zod schemas, parameterized queries)
- Rate limiting per endpoint tier? (public vs authenticated vs internal)
- What happens if a credential is compromised? Rotation and revocation plan?
### Accessibility-by-Design Injection Point (Web, Mobile, Desktop)
Inject at DDIA Phase 8 (Frontend & Derived Views) for any project with a UI:
- What WCAG level are you targeting? (A, AA, AAA — AA is the standard for most products)
- Color contrast: do all text/background combinations meet the target ratio? (4.5:1 for AA normal text, 3:1 for large text)
- Keyboard navigation: can every interactive element be reached and operated without a mouse?
- Screen reader strategy: what semantic HTML / ARIA roles are needed? What's the heading hierarchy?
- Motion: do animations respect the user's reduced-motion preference? Are there alternatives for motion-dependent interactions?
- Touch targets: are all interactive elements at least 44x44pt (iOS) / 48x48dp (Android)?
These questions are proactive — catching contrast issues and keyboard traps during design costs minutes; fixing them after implementation costs hours.
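Contrast checking in particular is mechanical enough to automate during design. A minimal sketch of the WCAG 2.x contrast-ratio computation — the formula behind the 4.5:1 (AA normal text) and 3:1 (large text) thresholds above; colors are 0–255 sRGB tuples:

```python
# WCAG 2.x relative luminance and contrast ratio.
def _channel(c8):
    # Linearize one sRGB channel (0-255) per the WCAG formula.
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(rgb):
    r, g, b = (_channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    # Ratio of lighter to darker luminance, each offset by 0.05.
    l1, l2 = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# Black on white is the maximum possible contrast.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # → 21.0
```

Running every text/background pair in the design palette through this during Phase 3 is how you catch failures like #777-on-white (≈4.48:1, just under AA) before anything is built.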
### Deliverable
The standard DDIA design summary doc, with security and accessibility decisions integrated into each relevant phase (not as a separate section).
## Phase 4: Resilience Patterns
Applies to: Full-Stack, Voice, Edge/IoT+ML, Mobile (with external services)
Source: Release It! (Michael Nygard)
Output: Resilience section appended to system design doc
### Questions to Ask
For each external dependency (API, database, message queue, third-party service):
1. Failure Mode: What happens when this dependency is unavailable?
   - Timeout? Error response? Silent data loss?
   - How long can you tolerate the outage?
2. Circuit Breaker: Should you fail fast after N failures?
   - What's the threshold? (e.g., 5 failures in 30 seconds)
   - What's the half-open recovery strategy?
3. Timeout Budget: What's the maximum wait time?
   - For voice agents: total turn budget (STT + LLM + TTS must complete before silence)
   - For web: p95 response time target per endpoint
4. Retry Policy: Is the operation safe to retry?
   - Idempotent? -> Retry with exponential backoff
   - Non-idempotent? -> Fail and surface to user
   - Maximum retries before circuit opens?
5. Bulkhead: Does failure in one integration affect others?
   - Separate thread pools / connection pools per dependency?
   - Can a slow Stripe response block voice ordering?
6. Graceful Degradation: What's the reduced-functionality mode?
   - POS down -> queue orders for later sync?
   - GPS unavailable -> save detection without coordinates?
   - Cache down -> serve from DB (slower but functional)?
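The circuit-breaker and retry questions above combine into one small pattern. A sketch under assumed thresholds (failure count opens the circuit, a cooldown allows a half-open probe; all numbers are illustrative, tune per dependency):

```python
import random
import time

class CircuitBreaker:
    """Fail fast after `threshold` consecutive failures; probe after cooldown."""

    def __init__(self, threshold=5, reset_after=30.0):
        self.threshold, self.reset_after = threshold, reset_after
        self.failures, self.opened_at = 0, None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: let a probe through once the cooldown has elapsed.
        return time.monotonic() - self.opened_at >= self.reset_after

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

def call_with_retry(fn, breaker, max_retries=3, base_delay=0.1):
    """Retry an IDEMPOTENT call with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        if not breaker.allow():
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
            breaker.record(ok=True)
            return result
        except Exception:
            breaker.record(ok=False)
            if attempt == max_retries:
                raise
            time.sleep(base_delay * 2 ** attempt * (1 + random.random()))
```

Non-idempotent operations skip `call_with_retry` entirely and surface the failure to the user, per question 4.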
### Deliverable

```markdown
## Resilience Patterns

| Dependency | Failure Mode | Circuit Breaker | Timeout | Retry | Degraded Mode |
|---|---|---|---|---|---|
| [Service] | [What breaks] | [Threshold] | [ms] | [Policy] | [Fallback] |
```
## Phase 5: ML Pipeline Design
Applies to: Edge/IoT+ML
Source: Designing Machine Learning Systems (Chip Huyen)
Output: ML pipeline doc at docs/plans/YYYY-MM-DD-<topic>-ml-pipeline.md
### Questions to Ask
Data Pipeline:
1. What is the training data source? (labeled images, sensor data, logs)
2. How is data labeled? (manual, semi-automated, active learning)
3. What is the labeling quality control process?
4. Data versioning strategy? (DVC, S3 versioning, git-lfs)
5. Class imbalance — what's the distribution? Augmentation strategy?
6. Train/val/test split strategy? (random, temporal, geographic)
Model Lifecycle:
7. Model architecture selection criteria? (accuracy vs latency vs size)
8. Experiment tracking? (MLflow, W&B, spreadsheet)
9. Model versioning scheme? (dev/staging/prod, semver)
10. Export format for deployment? (ONNX, TensorRT, CoreML, .pt)
11. Model size budget? (edge device storage + memory constraints)
Deployment & Serving:
12. How does a new model reach production? (OTA, manual flash, staged rollout)
13. Canary deployment? (% of fleet on new model before full rollout)
14. Rollback strategy? (automatic on metric degradation, manual)
15. A/B testing — how do you compare model versions in production?
Monitoring & Retraining:
16. What metrics define model health? (precision, recall, F1, latency)
17. How is drift detected? (data drift, concept drift, prediction drift)
18. What triggers retraining? (metric threshold, scheduled, manual)
19. Human-in-the-loop feedback loop — how long from detection to retraining?
20. Cold start — what happens when the model encounters a new environment?
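One concrete answer to the drift-detection question (17) is the Population Stability Index (PSI) between the training distribution and a window of production data, computed per feature or over prediction scores. The 0.1 / 0.25 thresholds below are common rules of thumb, not universal constants:

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned distributions.

    expected_counts: per-bin counts from the training/reference data.
    actual_counts:   per-bin counts from recent production data.
    Rule of thumb: < 0.1 stable, 0.1-0.25 watch, > 0.25 significant drift.
    """
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        p = max(e / e_total, eps)  # clamp to avoid log(0)
        q = max(a / a_total, eps)
        score += (q - p) * math.log(q / p)
    return score
```

A scheduled job computing PSI over the last N days of detections, alerting past the drift threshold, is one way to wire question 17 into question 18's retraining trigger.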
### Deliverable

```markdown
## ML Pipeline Design

### Data Pipeline
- Source: [Where training data comes from]
- Labeling: [Process, QC, tooling]
- Versioning: [Strategy]
- Splits: [Train/val/test ratios and strategy]

### Model Lifecycle
- Architecture: [Model, why chosen]
- Experiment Tracking: [Tool/process]
- Versioning: [Scheme]
- Export: [Format, size budget]

### Deployment
- Delivery: [OTA/manual, canary %]
- Rollback: [Trigger and process]

### Monitoring
- Health Metrics: [What to track]
- Drift Detection: [Method and thresholds]
- Retraining Trigger: [Conditions]
- Feedback Loop Latency: [Time from detection to retrained model deployed]
```
## Phase 6: Edge Architecture Design
Applies to: Edge/IoT+ML
Source: IoT architecture patterns, Release It! edge extensions
Output: Edge architecture section appended to system design doc
### Questions to Ask
Device Constraints:
1. What hardware? (CPU, GPU, RAM, storage, connectivity)
2. Power source? (battery, vehicle power, mains)
3. Physical environment? (temperature range, vibration, dust, moisture)
4. What sensors? (cameras, GPS, accelerometer, etc.)
Offline-First Design:
5. Expected connectivity patterns? (always-on, intermittent, shift-based)
6. Maximum offline duration to survive? (hours, days)
7. Local queue strategy? (SQLite, file queue, memory buffer)
8. Queue overflow policy? (oldest-first eviction, priority-based, compress)
9. Sync strategy on reconnect? (batch upload, priority queue, bandwidth-aware)
Resource Budgeting:
10. CPU/GPU budget split? (inference %, upload %, logging %, OS overhead %)
11. Memory budget? (model size + working memory + queue + buffers)
12. Storage budget? (model files + offline queue + logs + OS)
13. Bandwidth budget? (payload size x frequency x fleet size = daily data volume)
14. Frame rate vs accuracy trade-off? (every frame, 1/sec, triggered)
Fleet Management:
15. How many devices? Current and projected?
16. Device provisioning workflow? (certificate issuance, registration, initial config)
17. OTA update strategy? (Greengrass, custom, staged rollout %)
18. Health monitoring? (heartbeat interval, metrics reported, alerting thresholds)
19. Decommissioning? (certificate revocation, data cleanup)
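The local-queue questions (7–9) can be grounded with a sketch: a durable SQLite-backed queue with oldest-first eviction and batched drain on reconnect. The size, batch number, and in-memory path are placeholders — a real device would use a file path and tune both to its storage budget:

```python
import json
import sqlite3

class OfflineQueue:
    """Durable local queue: oldest-first eviction, batch drain on reconnect."""

    def __init__(self, path=":memory:", max_rows=10_000):
        self.db = sqlite3.connect(path)
        self.max_rows = max_rows
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS q (id INTEGER PRIMARY KEY, payload TEXT)"
        )

    def put(self, record):
        self.db.execute("INSERT INTO q (payload) VALUES (?)", (json.dumps(record),))
        # Oldest-first eviction keeps the queue inside its storage budget.
        self.db.execute(
            "DELETE FROM q WHERE id IN (SELECT id FROM q ORDER BY id "
            "LIMIT max(0, (SELECT COUNT(*) FROM q) - ?))",
            (self.max_rows,),
        )
        self.db.commit()

    def drain(self, batch=100):
        """Return up to `batch` records and delete them (sync on reconnect)."""
        rows = self.db.execute(
            "SELECT id, payload FROM q ORDER BY id LIMIT ?", (batch,)
        ).fetchall()
        self.db.executemany("DELETE FROM q WHERE id = ?", [(r[0],) for r in rows])
        self.db.commit()
        return [json.loads(r[1]) for r in rows]
```

Priority-based eviction or compression (question 8) would replace the `DELETE ... ORDER BY id` clause; the oldest-first policy here is just the simplest option.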
### Deliverable

```markdown
## Edge Architecture

### Device Profile
- Hardware: [Specs]
- Constraints: [Power, connectivity, environment]
- Sensors: [List with interfaces]

### Offline Strategy
- Queue: [Technology, max size, overflow policy]
- Sync: [Strategy, priority]
- Max Offline Duration: [Hours/days]

### Resource Budget
| Resource | Budget | Allocation |
|---|---|---|
| CPU/GPU | 100% | Inference %, Upload %, Other % |
| RAM | [Size] | Model %, Queue %, OS % |
| Storage | [Size] | Models %, Queue %, Logs % |
| Bandwidth | [Daily] | Detections %, Telemetry %, Updates % |

### Fleet Management
- Fleet Size: [Current -> Projected]
- Provisioning: [Workflow]
- OTA Updates: [Strategy, rollout %]
- Health Monitoring: [Metrics, intervals, alerts]
```
## Phase 7: API Specification
Applies to: Full-Stack, Voice, Edge/IoT+ML, Mobile (with backend)
Output: API spec appended to system design doc or separate doc
### Process
For each system boundary identified in Phase 3 (Data Flow):
1. List all endpoints/contracts:
   - REST endpoints (method, path)
   - Event schemas (SQS messages, EventBridge events)
   - Webhook contracts (incoming from third parties)
   - Device protocols (MQTT topics, IoT shadow schemas)
   - Tool interfaces (voice agent tools, function calling schemas)
2. For each endpoint, define:
   - Auth requirement (JWT, API key, service role, webhook signature, mTLS)
   - Request schema (with types, required/optional, validation rules)
   - Response schema (success + error shapes)
   - Rate limiting tier (public, authenticated, internal, service-to-service)
   - Idempotency (safe to retry? idempotency key?)
3. Error format standard:
   - Agree on ONE error shape across all APIs
   - Include: status code, error code, human message, details object
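The single error shape might be pinned down as a tiny shared helper, so every service emits the same envelope. Field names here mirror the { status, code, message, details } convention; the helper itself is an illustrative sketch:

```python
from dataclasses import asdict, dataclass, field

@dataclass
class ApiError:
    """The one error envelope every API returns."""
    status: int            # HTTP status code
    code: str              # stable machine-readable code, e.g. "validation_failed"
    message: str           # human-readable summary
    details: dict = field(default_factory=dict)  # per-field or contextual info

def error_response(status, code, message, **details):
    """Build the serializable error body for any endpoint."""
    return asdict(ApiError(status, code, message, details))
```

Usage: `error_response(429, "rate_limited", "Too many requests", retry_after=30)` — clients then branch on `code` and never parse `message`.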
### Deliverable

```markdown
## API Specification

### Error Format
{ status, code, message, details }

### Endpoints
#### [Boundary Name]
| Method | Path | Auth | Rate Limit | Idempotent |
|---|---|---|---|---|
| POST | /api/example | JWT | 100/min | Yes (key) |

**Request:** { ... }
**Response:** { ... }
**Errors:** 400 (validation), 401 (auth), 429 (rate limit)
```
## Phase 8: Voice Agent Prompt Design
Applies to: Voice Agent
Invoke:
Output: Voice prompt doc
## Phase 9: Infrastructure Design
Applies to: Edge/IoT+ML, and any project with self-managed cloud infrastructure
Source: Infrastructure as Code (Kief Morris)
Output: Infrastructure section appended to system design doc
### Questions to Ask
1. IaC Tool & Module Structure:
   - What IaC tool? (Terraform, Pulumi, CDK, CloudFormation)
   - Module boundaries — which resources belong together?
   - Shared vs environment-specific modules?
2. State Management:
   - Remote state backend? (S3, Terraform Cloud, Azure Blob)
   - State locking mechanism?
   - State file per environment or per module?
3. Environment Strategy:
   - How many environments? (dev, staging, prod)
   - How do changes promote? (manual apply, CI/CD pipeline, GitOps)
   - Blast radius of a bad apply — what's the worst case?
4. CI/CD Pipeline:
   - Plan on PR, apply on merge?
   - Who approves infrastructure changes?
   - Rollback strategy for infrastructure?
5. IaC Testing:
   - Static analysis? (tfsec, Checkov, OPA)
   - Plan validation? (terraform plan diff review)
   - Integration tests? (test environment that mirrors prod)
6. Secrets Management:
   - Where do secrets live? (Vault, Secrets Manager, SSM Parameter Store)
   - Rotation schedule?
   - Emergency revocation process?
7. Cost Estimation:
   - What's the compute cost per unit of work? (per API call, per inference, per voice minute)
   - What are the third-party API costs at projected volume? (Twilio per-minute, Deepgram per-hour, Stripe per-transaction, S3 per-GB)
   - What's the storage growth projection? (GB/month now, in 6 months, in 2 years)
   - What's the monthly burn at current scale? At 10x scale?
   - Where are the cost cliffs? (Aurora serverless scaling tiers, Lambda invocation thresholds, data transfer costs)
   - Is there a cost ceiling / budget constraint?
   - What's the cost-per-user or cost-per-unit-of-value? (Does the unit economics work?)
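The cost arithmetic is simple enough to keep as a small living script next to the design doc. Every unit price and volume below is made up — pull real numbers from the vendors' pricing pages — and note the deliberately linear model ignores cost cliffs (tier boundaries), which you flag separately:

```python
# Hypothetical line items: name -> (unit cost in USD, units per month).
LINE_ITEMS = {
    "telephony_minutes":   (0.0085, 50_000),
    "stt_hours":           (0.46, 800),
    "llm_tokens_millions": (3.00, 120),
    "storage_gb":          (0.023, 500),
}

def monthly_cost(items, scale=1.0):
    """Linear burn estimate; `scale` answers the 'at 10x?' question."""
    return sum(unit * volume * scale for unit, volume in items.values())

base = monthly_cost(LINE_ITEMS)
print(f"monthly: ${base:,.2f}, at 10x: ${monthly_cost(LINE_ITEMS, 10):,.2f}")
```

Dividing the result by projected active users gives the cost-per-user figure the unit-economics question asks for.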
### Deliverable

```markdown
## Infrastructure Design

### IaC Structure
- Tool: [Terraform/Pulumi/etc.]
- Modules: [List with responsibilities]
- State: [Backend, locking, per-environment strategy]

### Environment Promotion
- Environments: [List]
- Promotion Flow: [PR -> plan -> review -> apply]
- Rollback: [Strategy]

### Secrets
- Store: [Tool]
- Rotation: [Schedule]
- Emergency Revocation: [Process]

### Cost Estimation
| Category | Unit Cost | Volume | Monthly Cost | Cost at 10x |
|---|---|---|---|---|
| [Compute] | [$/unit] | [units/month] | [$] | [$] |
| [Storage] | [$/GB] | [GB] | [$] | [$] |
| [Third-party API] | [$/call] | [calls/month] | [$] | [$] |
| **Total** | | | **[$]** | **[$]** |

Cost Ceiling: [Budget constraint if any]
Cost-per-User: [$/user/month at projected scale]
```
## Phase 10: Implementation Planning
Applies to: All project types
Invoke:
Input: All design docs produced in prior phases
### Testing Strategy Addition
When creating the implementation plan, ensure each task specifies which level of the testing pyramid it targets:
| Level | What It Tests | When to Use |
|---|---|---|
| Unit | Single function/component in isolation | Every task (TDD) |
| Integration | Two+ components together, real dependencies | API endpoints, DB queries, service integrations |
| Contract | API shape matches between producer/consumer | Cross-service boundaries, webhook contracts, device protocols |
| End-to-End | Full user flow through the system | Critical paths only (login, core transaction, detection pipeline) |
| Load | Performance under expected/peak traffic | After core features are built |
Source: Growing Object-Oriented Software, Guided by Tests (Freeman & Pryce)
Output: Implementation plan at docs/plans/YYYY-MM-DD-<topic>-plan.md
## Phase 11: Implementation
Applies to: All project types
Choose execution approach:
- Subagent-driven (current session): Invoke /subagent-driven-development
- Parallel session (separate): Invoke in a new session
## Phase 12: Security Validation
Applies to: Full-Stack, Voice, Edge/IoT+ML, Mobile (with backend)
Invoke: and/or
Run BEFORE deployment. Verify that security-by-design decisions from Phase 3 were actually implemented.
For Edge/IoT projects, additionally verify:
- Device certificates: valid, unique per device, rotation scheduled
- MQTT topic security: devices can only publish to their own topics
- Firmware integrity: signed updates, verified on device
- Physical security: what credentials are on the device if someone steals it?
Output: Security audit report
## Phase 13: Observability Design
Applies to: Full-Stack, Voice, Edge/IoT+ML
Source: Observability Engineering (Charity Majors)
Output: Observability section appended to system design doc
### Questions to Ask
1. Structured Logging:
   - What logging format? (JSON, structured key-value)
   - What fields on every log line? (timestamp, service, request_id, user_id, trace_id)
   - Correlation IDs — how do you trace a request across services?
2. Distributed Tracing:
   - What spans exist? (one per service hop in the request path)
   - What tool? (X-Ray, Jaeger, OpenTelemetry)
   - What sampling rate? (100% in staging, 10% in prod, 100% for errors)
3. Metrics:
   - RED metrics per service: Rate, Errors, Duration
   - Business metrics: orders/hour, detections/day, review latency
   - Infrastructure metrics: CPU, memory, queue depth, cache hit rate
   - Use percentiles (p50, p95, p99), not averages
4. Alerting:
   - What's worth waking someone up for? (data loss, service down, security breach)
   - What can wait until morning? (elevated error rate, slow responses, queue backlog)
   - Alert fatigue prevention — fewer, better alerts
5. Dashboards:
   - One dashboard per bounded context
   - Top-level "system health" dashboard
   - On-call runbook linked from each alert
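A minimal sketch of the structured-logging rules above — JSON lines, a fixed set of required fields, and a request_id that propagates through every call for cross-service correlation. Field names are illustrative:

```python
import json
import time
import uuid

# Every log line must carry at least these fields.
REQUIRED = ("ts", "service", "level", "request_id", "message")

def make_logger(service, request_id=None):
    """Bind a service name and correlation id; reuse an inbound request_id
    from upstream so the trace spans service boundaries."""
    rid = request_id or str(uuid.uuid4())

    def log(level, message, **fields):
        line = {"ts": time.time(), "service": service, "level": level,
                "request_id": rid, "message": message, **fields}
        print(json.dumps(line))  # one JSON object per line, ship to collector
        return line

    return log

log = make_logger("orders-api")
log("info", "order created", order_id="o-123", duration_ms=42)
```

The same `request_id` gets forwarded in an HTTP header (or message attribute) so the next service's logger is constructed with it rather than minting a fresh one.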
### Deliverable

```markdown
## Observability

### Logging
- Format: [JSON/structured]
- Required Fields: [timestamp, service, request_id, trace_id, ...]
- Correlation: [How trace IDs propagate]

### Tracing
- Tool: [X-Ray/Jaeger/OTEL]
- Spans: [List of spans in critical path]
- Sampling: [Rate per environment]

### Metrics
| Metric | Type | Alert Threshold |
|---|---|---|
| [name] | [RED/business/infra] | [threshold] |

### Alerting
| Alert | Severity | Runbook |
|---|---|---|
| [What] | [Page/Warning/Info] | [Runbook link] |
```
## Phase 14: ML Validation
Applies to: Edge/IoT+ML
Source: Designing Machine Learning Systems (Chip Huyen), Reliable Machine Learning (Cathy Chen et al.)
Output: ML validation report
### Validation Checklist
Run after implementation, before production deployment:
1. Model Performance:
   - Precision, recall, F1 on held-out test set
   - Performance per class (not just aggregate)
   - Performance on edge cases (night, rain, dust, unusual angles)
   - Latency on target hardware (not just dev machine)
2. Data Quality:
   - Label consistency audit (sample and re-label, measure agreement)
   - Data leakage check (training data contaminating test set)
   - Distribution shift check (training data vs production data)
3. Robustness:
   - Adversarial inputs (unusual lighting, occlusion, camera artifacts)
   - Out-of-distribution detection (does the model know when it doesn't know?)
   - Confidence calibration (does 90% confidence mean 90% accuracy?)
4. Fairness & Bias:
   - Performance across operating conditions (time of day, weather, road type)
   - False positive/negative rates across conditions
   - Are some environments systematically underrepresented?
5. Operational Readiness:
   - Model loads correctly on target hardware
   - Inference fits within resource budget (Phase 6)
   - Offline queue handles expected volume
   - Monitoring pipeline captures metrics correctly
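The calibration check ("does 90% confidence mean 90% accuracy?") has a standard metric: Expected Calibration Error, which bins predictions by confidence and compares each bin's average confidence to its empirical accuracy. A minimal sketch:

```python
def ece(confidences, correct, n_bins=10):
    """Expected Calibration Error over equal-width confidence bins.

    confidences: predicted confidence per sample, in [0, 1].
    correct:     whether each prediction was right (bools).
    Returns 0 for a perfectly calibrated model; larger = worse.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 -> last bin
        bins[idx].append((conf, ok))
    total, score = len(confidences), 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        score += (len(b) / total) * abs(avg_conf - accuracy)
    return score
```

Toy check: ten predictions at 90% confidence with 9 correct yields an ECE of ~0 (well calibrated); the same confidences with only 5 correct yields ~0.4 (badly overconfident).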
### Deliverable

```markdown
## ML Validation Report

### Performance
| Metric | Test Set | Worst Class | Edge Cases | Target HW |
|---|---|---|---|---|
| Precision | | | | |
| Recall | | | | |
| F1 | | | | |
| Latency (ms) | | | | |

### Data Quality
- Label Agreement: [%]
- Leakage Check: [Pass/Fail]
- Distribution Shift: [Within/Outside tolerance]

### Robustness
- Adversarial: [Results]
- OOD Detection: [Method, threshold]
- Calibration: [ECE score]

### Go/No-Go Decision
[Ready / Needs retraining / Needs more data]
```
## Phase 15: Polish & Review
Applies to: All project types
Route to the appropriate review skill(s):
| Project Type | Review Skills |
|---|---|
| macOS App | + (review mode) |
| iOS Mobile App | (review mode) + |
| Web Frontend | + |
| Full-Stack Web | + |
| Voice Agent | (conversational flow review) |
| Edge/IoT + ML | (webapp) + (webapp) |
After the review skills complete, run a final review pass over the full codebase.
## Resumption Protocol
If starting a new session mid-project:
- Check for existing artifacts
- Read each doc to understand decisions already made
- Determine which phase produced the last artifact
- Resume from the next phase
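The artifact check can be mechanical. A sketch that infers the latest completed doc-producing phase from filenames — the suffixes assume this skill's naming conventions, only the phases with distinct file outputs are listed, and code/test state (Phase 11 onward) still needs a manual check:

```python
# Suffix of the artifact filename -> phase it marks as completed.
# Assumed from the docs/plans/YYYY-MM-DD-<topic>-*.md convention.
ARTIFACT_PHASE = [
    ("-plan.md", 10),          # implementation plan -> Phase 10 done
    ("-ml-pipeline.md", 5),    # ML pipeline doc     -> Phase 5 done
    ("-system-design.md", 3),  # system design doc   -> Phase 3 done
    ("-design.md", 1),         # brainstorm design   -> Phase 1 done
]

def last_completed_phase(artifact_names):
    """artifact_names: filenames found in docs/plans/ (e.g. via Path.glob)."""
    completed = [phase for suffix, phase in ARTIFACT_PHASE
                 if any(name.endswith(suffix) for name in artifact_names)]
    return max(completed, default=None)
```

Resume from the first active phase after the returned number; `None` means start at Phase 0.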
Artifact -> Phase mapping:
| Artifact | Phase Completed |
|---|---|
| System Assessment section | Phase 0.5 (Existing System Assessment) |
| Design doc | Phase 1 (Brainstorming) |
| Domain Model section in design doc | Phase 2 |
| System design doc | Phase 3 (DDIA) |
| Resilience section in system design | Phase 4 |
| ML pipeline doc | Phase 5 |
| Edge Architecture section | Phase 6 |
| API Specification section/doc | Phase 7 |
| Voice prompt doc | Phase 8 |
| Infrastructure section | Phase 9 |
| Implementation plan | Phase 10 (Writing Plans) |
| Code exists + tests pass | Phase 11 (Implementation) |
| Security audit report | Phase 12 |
| Observability section | Phase 13 |
| ML validation report | Phase 14 |
| Review findings addressed | Phase 15 |
## Anti-Patterns

| Mistake | Fix |
|---|---|
| Skipping to implementation | Always start at Phase 0, even if "you know what you're building" |
| Running all 15 phases for a simple macOS app | Trust the router — it selects only applicable phases |
| Treating security as Phase 12 only | Security-by-design is in Phase 3; Phase 12 validates it was implemented |
| Designing the ML pipeline after building the API | Phases are sequential — ML decisions affect API shape |
| Writing plans without a domain model | Plans based on a vague domain produce vague tasks |
| Skipping resilience for "internal" services | Internal services fail too — especially at 3am |
| Averaging latency instead of using percentiles | p50 hides tail latency; use p95/p99 |
| Adding features to an existing system without mapping it first | Run Phase 0.5 — understand what exists before designing what's new |
| Treating accessibility as a Phase 15 afterthought | Accessibility-by-design in Phase 3 catches issues that are expensive to retrofit |
| Ignoring cloud costs until the bill arrives | Cost estimation in Phase 9 prevents surprises — unit economics matter |
## Book References

| Phase | Book | Author |
|---|---|---|
| 2. Domain Modeling | Domain-Driven Design | Eric Evans |
| 3. System Design | Designing Data-Intensive Applications | Martin Kleppmann |
| 4. Resilience | Release It! | Michael Nygard |
| 5. ML Pipeline | Designing Machine Learning Systems | Chip Huyen |
| 9. Infrastructure | Infrastructure as Code | Kief Morris |
| 10. Testing Strategy | Growing Object-Oriented Software, Guided by Tests | Freeman & Pryce |
| 13. Observability | Observability Engineering | Charity Majors |
| 14. ML Validation | Reliable Machine Learning | Cathy Chen et al. |
| 15. UI Polish | Refactoring UI | Wathan & Schoger |
| 15. UX Review | Don't Make Me Think | Steve Krug |