
Project Orchestrator

Overview

Universal project lifecycle skill. Classifies your project type, builds a phase plan, then walks through each phase sequentially — invoking existing skills where they exist and running inline design phases where they don't.
The rule: No project uses all phases. The router selects 4–14 phases based on what you're actually building.
Announce at start: "I'm using the orchestrate skill to guide this project through its lifecycle."

When to Use

  • Starting a new project from scratch (greenfield)
  • Adding a major feature that changes architecture, data flow, or integrations
  • Unsure which skills to invoke and in what order
  • Starting work on a project type you haven't classified before
When NOT to use:
  • Small bug fixes, typos, minor UI tweaks — just do the work
  • Pure research or exploration — use Explore agent directly
  • Single-file changes with clear requirements — use TDD directly
  • You already know exactly which single skill applies (e.g., just need /security-audit)

How It Works

  1. Classify — Ask what you're building, determine project type
  2. Route — Select the phases that apply
  3. Execute — Walk through each phase sequentially
  4. Handoff — Each phase produces a doc artifact; later phases build on earlier ones
All artifacts are saved to docs/plans/. If resuming mid-project, check which docs already exist to determine the current phase.


Phase 0: Project Classification

Ask ONE question to classify the project:
"What are you building?"
| Type | Indicators | Phase Count |
|------|------------|-------------|
| macOS App | Desktop UI, SwiftUI, AppKit, menu bar app | 4 phases |
| iOS Mobile App | iPhone/iPad, SwiftUI, UIKit, App Store | 5–9 phases |
| Web Frontend | React, Vue, static site, no backend | 5 phases |
| Full-Stack Web | Frontend + database + API + auth | 10 phases |
| Voice Agent | LiveKit, telephony, STT/TTS, conversational AI | 11 phases |
| Edge/IoT + ML | Hardware devices, computer vision, ML pipeline, fleet management | 13 phases |
Sub-classification questions (if needed):
  • Mobile/Web: "Does it have a backend?" — if yes, add full-stack phases
  • Any type: "Does it integrate with external services?" — if yes, add resilience phase
  • Any type: "Will this be deployed to cloud infrastructure you manage?" — if yes, add infrastructure phase
  • Any type: "Is this a new project or adding to an existing system?" — if existing, add system assessment phase

Route Table

| Phase | macOS | iOS | Web FE | Full-Stack | Voice | Edge/IoT+ML |
|-------|-------|-----|--------|------------|-------|-------------|
| 0.5 System Assessment | o | o | o | o | o | o |
| 1. Brainstorm | x | x | x | x | x | x |
| 2. Domain Model |  | o |  | x | x | x |
| 3. System Design + Security |  | o |  | x | x | x |
| 4. Resilience |  | o |  | x | x | x |
| 5. ML Pipeline |  |  |  |  |  | x |
| 6. Edge Architecture |  |  |  |  |  | x |
| 7. API Specification |  | o |  | x | x | x |
| 8. Voice Prompt Design |  |  |  |  | x |  |
| 9. Infrastructure |  |  |  | o | o | x |
| 10. Writing Plans | x | x | x | x | x | x |
| 11. Implementation | x | x | x | x | x | x |
| 12. Security Validation |  | o |  | x | x | x |
| 13. Observability |  |  |  | x | x | x |
| 14. ML Validation |  |  |  |  |  | x |
| 15. Polish & Review | x | x | x | x | x | x |

x = always applies | o = conditional (based on sub-classification) | blank = skip

Compile Your Phase Plan

After classification, explicitly list the active phases for this project before proceeding:
  1. Review the route table for your project type
  2. For each "o" phase, check the sub-classification answers to determine if it's active
  3. Write out the numbered list of active phases (e.g., "Active phases: 0.5, 1, 2, 3, 4, 7, 10, 11, 12, 13, 15")
  4. Present the phase plan to the user for confirmation before starting
This prevents accidentally skipping or running wrong phases.

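The compilation steps above can be sketched as a small routing lookup. This is a minimal illustration, not part of the skill itself: the dictionary structure and the two project types shown are assumptions, while the phase numbers and conditions come from the route table.

```python
# Minimal sketch of phase-plan compilation from the route table.
# "always" phases carry an x in the route table; "conditional" phases
# carry an o and activate only when the sub-classification answer is true.

ROUTE = {
    "full-stack": {
        "always": [1, 2, 3, 4, 7, 10, 11, 12, 13, 15],
        "conditional": {0.5: "existing_system", 9: "self_managed_infra"},
    },
    "web-frontend": {
        "always": [1, 10, 11, 15],
        "conditional": {0.5: "existing_system"},
    },
}

def compile_phase_plan(project_type, answers):
    """Return the sorted list of active phases for a classified project."""
    route = ROUTE[project_type]
    phases = list(route["always"])
    for phase, condition in route["conditional"].items():
        if answers.get(condition):
            phases.append(phase)
    return sorted(phases)

# Example: full-stack project that extends an existing system.
plan = compile_phase_plan("full-stack", {"existing_system": True})
```

The resulting list is exactly what step 3 asks you to write out and present for confirmation.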

Phase 0.5: Existing System Assessment

Applies to: All project types, only when adding to an existing project (skip for greenfield)
Output: System assessment section prepended to design doc

Purpose

Before brainstorming new features, understand what already exists. Designing without mapping the current system produces plans that conflict with existing architecture, duplicate existing capabilities, or ignore existing tech debt.

Process

  1. Map the current architecture:
    • Read README, docs/, and any existing design docs
    • Identify the tech stack, key dependencies, and deployment model
    • Map the data model (schemas, migrations, key entities)
    • Identify the main entry points and request flows
  2. Identify constraints and boundaries:
    • What patterns and conventions does the codebase follow?
    • What are the existing API contracts that must not break?
    • What tech debt or known issues exist? (check issues, TODOs, CHANGELOG)
    • What dependencies are pinned or constrained?
  3. Assess the test and CI situation:
    • What test coverage exists? What's tested vs untested?
    • What CI/CD pipeline exists? What checks run on PR?
    • How are deployments done today?
  4. Summarize the integration surface:
    • What external services are already integrated?
    • What internal APIs exist that the new feature could reuse?
    • Where are the seams — natural places to extend without rewriting?

Deliverable


Existing System Assessment

Architecture Summary

  • Stack: [Languages, frameworks, infrastructure]
  • Key Components: [List with one-line descriptions]
  • Data Model: [Key entities and relationships]

Constraints

  • Must Not Break: [Existing APIs, contracts, behaviors]
  • Tech Debt: [Known issues that affect the new work]
  • Conventions: [Patterns the new code must follow]

Integration Surface

  • Reusable: [Existing APIs/components the new feature can leverage]
  • Seams: [Natural extension points]

Test & CI Status

  • Coverage: [What's tested, what's not]
  • Pipeline: [What runs on PR/merge/deploy]

---

Phase 1: Brainstorming

Applies to: All project types
Invoke: /brainstorming
Output: Design doc at docs/plans/YYYY-MM-DD-<topic>-design.md
If Phase 0.5 produced an assessment, feed it into brainstorming as context so the design builds on what exists rather than conflicting with it.
Do not proceed until the design doc is approved and committed.
Next: Proceed to Phase 2 (Domain Modeling) if active, otherwise skip to next active phase.


Phase 2: Domain Modeling

Applies to: Full-Stack, Voice, Edge/IoT+ML, Mobile (with backend)
Source: Domain-Driven Design (Eric Evans)
Output: Domain model section appended to design doc

Questions to Ask

Work through these one at a time:
  1. Bounded Contexts: What are the distinct areas of the business domain?
    • Each context has its own ubiquitous language, models, and rules
    • Example (AiSyst): Ordering, Menu Management, Voice Interaction, Billing
    • Example (RCM): Detection, Review, Training, Fleet Management, Telemetry
  2. Aggregates: Within each context, what are the consistency boundaries?
    • An aggregate is a cluster of entities that must be consistent together
    • What invariants must hold within each aggregate?
    • Example: An "Order" aggregate — items can't be empty, total must match items, status transitions are valid
  3. Domain Events: What important things happen that other contexts care about?
    • Events cross context boundaries; commands stay within them
    • Example: "DetectionCreated" -> triggers Review context; "ReviewCompleted" -> triggers Training context
  4. Context Map: How do bounded contexts communicate?
    • Shared kernel, customer-supplier, conformist, anti-corruption layer?
    • Where are the translation layers needed?
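To make the aggregate and event ideas concrete, here is a minimal Python sketch. The Order invariants and the DetectionCreated event mirror the examples above; the class shapes themselves are illustrative, not a prescribed implementation.

```python
from dataclasses import dataclass, field

@dataclass
class OrderItem:
    name: str
    price_cents: int

@dataclass
class Order:
    """Aggregate root: enforces its own invariants on every mutation."""
    items: list = field(default_factory=list)
    total_cents: int = 0

    def add_item(self, item: OrderItem):
        self.items.append(item)
        self.total_cents += item.price_cents
        self._check_invariants()

    def _check_invariants(self):
        # Invariants from the example: items can't be empty,
        # total must match the sum of the items.
        assert self.items, "order must contain at least one item"
        assert self.total_cents == sum(i.price_cents for i in self.items)

@dataclass
class DetectionCreated:
    """Domain event: crosses from the Detection context to the Review context."""
    detection_id: str
    source_context: str = "Detection"
    target_context: str = "Review"

order = Order()
order.add_item(OrderItem("latte", 450))
```

Commands like add_item stay inside the aggregate; only events like DetectionCreated cross context boundaries.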

Deliverable


Domain Model

Bounded Contexts

  • [Context Name]: [Purpose, key entities, invariants]

Aggregates

  • [Aggregate Name]: [Root entity, child entities, invariants]

Domain Events

  • [EventName]: [Source context] -> [Target context(s)]

Context Map

[How contexts relate and communicate]

---

Phase 3: System Design + Security-by-Design

Applies to: Full-Stack, Voice, Edge/IoT+ML, Mobile (with backend)
Invoke: /ddia-design
Output: System design doc at docs/plans/YYYY-MM-DD-<topic>-system-design.md

Security-by-Design Injection Points

IMPORTANT: Before invoking /ddia-design, write down the injection points below as a checklist. At each DDIA phase transition, check the list before proceeding. Do NOT rely on memory — the DDIA skill's own flow will consume your attention.
While running /ddia-design, inject these additional questions at three phases:
At DDIA Phase 2 (Storage & Data Model):
  • What access control model per table/collection? (RLS, RBAC, ABAC)
  • Which fields contain PII? Encryption at rest strategy?
  • What are the access patterns per role? (admin sees all, user sees own)
  • Audit logging: which mutations need an audit trail?
At DDIA Phase 3 (Data Flow & Integration):
  • What auth mechanism at each boundary? (JWT, API key, mTLS, webhook signature)
  • How are secrets managed? (Environment vars, Vault, Secrets Manager)
  • Transport security per channel? (TLS, mTLS for service-to-service)
  • Which data crosses trust boundaries? What validation is needed at each?
At DDIA Phase 5 (Correctness & Cross-Cutting):
  • What is the threat model? (STRIDE per component)
  • Input validation strategy per boundary? (Zod schemas, parameterized queries)
  • Rate limiting per endpoint tier? (public vs authenticated vs internal)
  • What happens if a credential is compromised? Rotation and revocation plan?
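As one concrete illustration of two Phase 5 injection points (input validation at a boundary, parameterized queries), a hedged Python sketch; the orders table and field rules are hypothetical:

```python
import sqlite3

def validate_order_input(payload: dict) -> dict:
    """Boundary validation: reject anything that doesn't match the expected shape."""
    if not isinstance(payload.get("customer_id"), int):
        raise ValueError("customer_id must be an integer")
    if not isinstance(payload.get("note"), str) or len(payload["note"]) > 500:
        raise ValueError("note must be a string of at most 500 chars")
    return {"customer_id": payload["customer_id"], "note": payload["note"]}

def insert_order(conn, payload):
    data = validate_order_input(payload)
    # Parameterized query: user input is bound, never concatenated into SQL.
    conn.execute(
        "INSERT INTO orders (customer_id, note) VALUES (?, ?)",
        (data["customer_id"], data["note"]),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, note TEXT)")
insert_order(conn, {"customer_id": 1, "note": "no onions"})
```

A note containing SQL metacharacters is stored as plain text rather than executed, which is the point of binding parameters.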

Accessibility-by-Design Injection Point (Web, Mobile, Desktop)

Inject at DDIA Phase 8 (Frontend & Derived Views) for any project with a UI:
  • What WCAG level are you targeting? (A, AA, AAA — AA is the standard for most products)
  • Color contrast: do all text/background combinations meet the target ratio? (4.5:1 for AA normal text, 3:1 for large text)
  • Keyboard navigation: can every interactive element be reached and operated without a mouse?
  • Screen reader strategy: what semantic HTML / ARIA roles are needed? What's the heading hierarchy?
  • Motion: do animations respect prefers-reduced-motion? Are there alternatives for motion-dependent interactions?
  • Touch targets: are all interactive elements at least 44x44pt (iOS) / 48x48dp (Android)?
These questions are proactive — catching contrast issues and keyboard traps during design costs minutes; fixing them after implementation costs hours.
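The contrast check can be automated during design review. This sketch implements the WCAG 2.x relative-luminance and contrast-ratio formulas; the helper names are our own:

```python
def _linearize(channel: float) -> float:
    # sRGB linearization per the WCAG 2.x definition of relative luminance.
    return channel / 12.92 if channel <= 0.03928 else ((channel + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb) -> float:
    """rgb is an (R, G, B) tuple with 0-255 components."""
    r, g, b = (_linearize(c / 255) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg) -> float:
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

def meets_aa(fg, bg, large_text=False) -> bool:
    # 4.5:1 for normal text, 3:1 for large text, as stated above.
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

Black on white yields the maximum ratio of 21:1; running every palette pair through meets_aa catches failures before any UI code exists.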

Deliverable

The standard DDIA design summary doc, with security and accessibility decisions integrated into each relevant phase (not as a separate section).


Phase 4: Resilience Patterns

第4阶段:容灾模式

Applies to: Full-Stack, Voice, Edge/IoT+ML, Mobile (with external services)
Source: Release It! (Michael Nygard)
Output: Resilience section appended to system design doc

Questions to Ask

For each external dependency (API, database, message queue, third-party service):
  1. Failure Mode: What happens when this dependency is unavailable?
    • Timeout? Error response? Silent data loss?
    • How long can you tolerate the outage?
  2. Circuit Breaker: Should you fail fast after N failures?
    • What's the threshold? (e.g., 5 failures in 30 seconds)
    • What's the half-open recovery strategy?
  3. Timeout Budget: What's the maximum wait time?
    • For voice agents: total turn budget (STT + LLM + TTS must complete before silence)
    • For web: p95 response time target per endpoint
  4. Retry Policy: Is the operation safe to retry?
    • Idempotent? -> Retry with exponential backoff
    • Non-idempotent? -> Fail and surface to user
    • Maximum retries before circuit opens?
  5. Bulkhead: Does failure in one integration affect others?
    • Separate thread pools / connection pools per dependency?
    • Can a slow Stripe response block voice ordering?
  6. Graceful Degradation: What's the reduced-functionality mode?
    • POS down -> queue orders for later sync?
    • GPS unavailable -> save detection without coordinates?
    • Cache down -> serve from DB (slower but functional)?
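Questions 2-4 can be answered in code with a circuit breaker and a backoff retry. A minimal single-threaded sketch, assuming consecutive-failure counting; production code would add jitter, metrics, and thread safety:

```python
import time

class CircuitBreaker:
    """Fail fast after `threshold` consecutive failures; probe again after `reset_after` seconds."""
    def __init__(self, threshold=5, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one probe through
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success closes the circuit
        return result

def retry_with_backoff(fn, retries=3, base_delay=0.1, sleep=time.sleep):
    """Retry an IDEMPOTENT operation with exponential backoff (0.1s, 0.2s, ...)."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Retry only idempotent calls, as question 4 says; non-idempotent operations should fail once and surface to the user.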

Deliverable


Resilience Patterns

| Dependency | Failure Mode | Circuit Breaker | Timeout | Retry | Degradation |
|------------|--------------|-----------------|---------|-------|-------------|
| [Service] | [What breaks] | [Threshold] | [ms] | [Policy] | [Fallback] |

---

Phase 5: ML Pipeline Design

第5阶段:ML Pipeline设计

Applies to: Edge/IoT+ML
Source: Designing Machine Learning Systems (Chip Huyen)
Output: ML pipeline doc at docs/plans/YYYY-MM-DD-<topic>-ml-pipeline.md

Questions to Ask

Data Pipeline:
  1. What is the training data source? (labeled images, sensor data, logs)
  2. How is data labeled? (manual, semi-automated, active learning)
  3. What is the labeling quality control process?
  4. Data versioning strategy? (DVC, S3 versioning, git-lfs)
  5. Class imbalance — what's the distribution? Augmentation strategy?
  6. Train/val/test split strategy? (random, temporal, geographic)
Model Lifecycle:
  7. Model architecture selection criteria? (accuracy vs latency vs size)
  8. Experiment tracking? (MLflow, W&B, spreadsheet)
  9. Model versioning scheme? (dev/staging/prod, semver)
  10. Export format for deployment? (ONNX, TensorRT, CoreML, .pt)
  11. Model size budget? (edge device storage + memory constraints)
Deployment & Serving:
  12. How does a new model reach production? (OTA, manual flash, staged rollout)
  13. Canary deployment? (% of fleet on new model before full rollout)
  14. Rollback strategy? (automatic on metric degradation, manual)
  15. A/B testing — how do you compare model versions in production?
Monitoring & Retraining:
  16. What metrics define model health? (precision, recall, F1, latency)
  17. How is drift detected? (data drift, concept drift, prediction drift)
  18. What triggers retraining? (metric threshold, scheduled, manual)
  19. Human-in-the-loop feedback loop — how long from detection to retraining?
  20. Cold start — what happens when the model encounters a new environment?
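Drift detection (question 17) can start simple: compare the training-time distribution of a feature or prediction score against the live distribution. A sketch using the population stability index; the thresholds in the comment are a common rule of thumb, not a standard:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two histograms over the same bins.
    Rule of thumb often used: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 significant.
    """
    e_total, a_total = sum(expected), sum(actual)
    score = 0.0
    for e, a in zip(expected, actual):
        p = max(e / e_total, eps)  # training-time bin proportion
        q = max(a / a_total, eps)  # live bin proportion
        score += (q - p) * math.log(q / p)
    return score
```

Running this per feature (or over prediction-score bins) on a schedule is one way to implement the retraining trigger in question 18.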

Deliverable


ML Pipeline Design

Data Pipeline

  • Source: [Where training data comes from]
  • Labeling: [Process, QC, tooling]
  • Versioning: [Strategy]
  • Splits: [Train/val/test ratios and strategy]

Model Lifecycle

  • Architecture: [Model, why chosen]
  • Experiment Tracking: [Tool/process]
  • Versioning: [Scheme]
  • Export: [Format, size budget]

Deployment

  • Delivery: [OTA/manual, canary %]
  • Rollback: [Trigger and process]

Monitoring

  • Health Metrics: [What to track]
  • Drift Detection: [Method and thresholds]
  • Retraining Trigger: [Conditions]
  • Feedback Loop Latency: [Time from detection to retrained model deployed]

---

Phase 6: Edge Architecture Design

Applies to: Edge/IoT+ML
Source: IoT architecture patterns, Release It! edge extensions
Output: Edge architecture section appended to system design doc

Questions to Ask

Device Constraints:
  1. What hardware? (CPU, GPU, RAM, storage, connectivity)
  2. Power source? (battery, vehicle power, mains)
  3. Physical environment? (temperature range, vibration, dust, moisture)
  4. What sensors? (cameras, GPS, accelerometer, etc.)
Offline-First Design:
  5. Expected connectivity patterns? (always-on, intermittent, shift-based)
  6. Maximum offline duration to survive? (hours, days)
  7. Local queue strategy? (SQLite, file queue, memory buffer)
  8. Queue overflow policy? (oldest-first eviction, priority-based, compress)
  9. Sync strategy on reconnect? (batch upload, priority queue, bandwidth-aware)
Resource Budgeting:
  10. CPU/GPU budget split? (inference %, upload %, logging %, OS overhead %)
  11. Memory budget? (model size + working memory + queue + buffers)
  12. Storage budget? (model files + offline queue + logs + OS)
  13. Bandwidth budget? (payload size x frequency x fleet size = daily data volume)
  14. Frame rate vs accuracy trade-off? (every frame, 1/sec, triggered)
Fleet Management:
  15. How many devices? Current and projected?
  16. Device provisioning workflow? (certificate issuance, registration, initial config)
  17. OTA update strategy? (Greengrass, custom, staged rollout %)
  18. Health monitoring? (heartbeat interval, metrics reported, alerting thresholds)
  19. Decommissioning? (certificate revocation, data cleanup)
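The local queue and overflow questions (7-8) can be sketched with SQLite and an oldest-first eviction policy. The schema, batch size, and queue budget are illustrative assumptions:

```python
import json
import sqlite3

class OfflineQueue:
    """Durable local queue for detections while the device is offline.
    Overflow policy here is oldest-first eviction; priority-based is another option.
    """
    def __init__(self, path=":memory:", max_items=1000):
        self.conn = sqlite3.connect(path)
        self.max_items = max_items
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS queue "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
        )

    def enqueue(self, record: dict):
        self.conn.execute("INSERT INTO queue (payload) VALUES (?)",
                          (json.dumps(record),))
        # Oldest-first eviction once the queue exceeds its storage budget.
        self.conn.execute(
            "DELETE FROM queue WHERE id NOT IN "
            "(SELECT id FROM queue ORDER BY id DESC LIMIT ?)",
            (self.max_items,),
        )

    def drain(self, batch_size=100):
        """On reconnect: fetch a batch for upload; delete only after the upload succeeds."""
        rows = self.conn.execute(
            "SELECT id, payload FROM queue ORDER BY id LIMIT ?", (batch_size,)
        ).fetchall()
        return [(row_id, json.loads(payload)) for row_id, payload in rows]

    def ack(self, ids):
        self.conn.executemany("DELETE FROM queue WHERE id = ?", [(i,) for i in ids])
```

Separating drain from ack means a failed upload leaves records safely queued, which is the sync-on-reconnect behavior question 9 asks about.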

Deliverable


Edge Architecture

Device Profile

  • Hardware: [Specs]
  • Constraints: [Power, connectivity, environment]
  • Sensors: [List with interfaces]

Offline Strategy

  • Queue: [Technology, max size, overflow policy]
  • Sync: [Strategy, priority]
  • Max Offline Duration: [Hours/days]

Resource Budget

| Resource | Budget | Allocation |
|----------|--------|------------|
| CPU/GPU | 100% | Inference %, Upload %, Other % |
| RAM | [Size] | Model %, Queue %, OS % |
| Storage | [Size] | Models %, Queue %, Logs % |
| Bandwidth | [Daily] | Detections %, Telemetry %, Updates % |

Fleet Management

  • Fleet Size: [Current -> Projected]
  • Provisioning: [Workflow]
  • OTA Updates: [Strategy, rollout %]
  • Health Monitoring: [Metrics, intervals, alerts]

---

Phase 7: API Specification

Applies to: Full-Stack, Voice, Edge/IoT+ML, Mobile (with backend)
Output: API spec appended to system design doc or separate doc

Process

For each system boundary identified in Phase 3 (Data Flow):
  1. List all endpoints/contracts:
    • REST endpoints (method, path)
    • Event schemas (SQS messages, EventBridge events)
    • Webhook contracts (incoming from third parties)
    • Device protocols (MQTT topics, IoT shadow schemas)
    • Tool interfaces (voice agent tools, function calling schemas)
  2. For each endpoint, define:
    • Auth requirement (JWT, API key, service role, webhook signature, mTLS)
    • Request schema (with types, required/optional, validation rules)
    • Response schema (success + error shapes)
    • Rate limiting tier (public, authenticated, internal, service-to-service)
    • Idempotency (safe to retry? idempotency key?)
  3. Error format standard:
    • Agree on ONE error shape across all APIs
    • Include: status code, error code, human message, details object
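The agreed error shape from step 3 might be produced by one shared helper so that no endpoint invents its own format. The field names follow the { status, code, message, details } shape in the deliverable; the helper itself is illustrative:

```python
import json

def error_response(status, code, message, details=None):
    """Serialize every API error into the one agreed shape."""
    return json.dumps({
        "status": status,          # HTTP status code
        "code": code,              # stable, machine-readable error code
        "message": message,        # human-readable explanation
        "details": details or {},  # e.g. field-level validation errors
    })

body = error_response(400, "VALIDATION_FAILED", "Request body failed validation",
                      {"customer_id": "must be an integer"})
```

Clients then branch on the stable code field rather than parsing the human-readable message.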

Deliverable


API Specification

Error Format

{ status, code, message, details }

Endpoints

[Boundary Name]

| Method | Path | Auth | Rate Limit | Idempotent |
|--------|------|------|------------|------------|
| POST | /api/example | JWT | 100/min | Yes (key) |

Request: { ... }
Response: { ... }
Errors: 400 (validation), 401 (auth), 429 (rate limit)

---

Phase 8: Voice Agent Prompt Design

第8阶段:Voice Agent Prompt设计

Applies to: Voice Agent
Invoke: /voice-agent-prompt
Output: Voice prompt doc


Phase 9: Infrastructure Design

第9阶段:基础设施设计

Applies to: Edge/IoT+ML, and any project with self-managed cloud infrastructure
Source: Infrastructure as Code (Kief Morris)
Output: Infrastructure section appended to system design doc

Questions to Ask

  1. IaC Tool & Module Structure:
    • What IaC tool? (Terraform, Pulumi, CDK, CloudFormation)
    • Module boundaries — which resources belong together?
    • Shared vs environment-specific modules?
  2. State Management:
    • Remote state backend? (S3, Terraform Cloud, Azure Blob)
    • State locking mechanism?
    • State file per environment or per module?
  3. Environment Strategy:
    • How many environments? (dev, staging, prod)
    • How do changes promote? (manual apply, CI/CD pipeline, GitOps)
    • Blast radius of a bad apply — what's the worst case?
  4. CI/CD Pipeline:
    • Plan on PR, apply on merge?
    • Who approves infrastructure changes?
    • Rollback strategy for infrastructure?
  5. IaC Testing:
    • Static analysis? (tfsec, Checkov, OPA)
    • Plan validation? (terraform plan diff review)
    • Integration tests? (test environment that mirrors prod)
  6. Secrets Management:
    • Where do secrets live? (Vault, Secrets Manager, SSM Parameter Store)
    • Rotation schedule?
    • Emergency revocation process?
  7. Cost Estimation:
    • What's the compute cost per unit of work? (per API call, per inference, per voice minute)
    • What are the third-party API costs at projected volume? (Twilio per-minute, Deepgram per-hour, Stripe per-transaction, S3 per-GB)
    • What's the storage growth projection? (GB/month now, in 6 months, in 2 years)
    • What's the monthly burn at current scale? At 10x scale?
    • Where are the cost cliffs? (Aurora serverless scaling tiers, Lambda invocation thresholds, data transfer costs)
    • Is there a cost ceiling / budget constraint?
    • What's the cost-per-user or cost-per-unit-of-value? (Does the unit economics work?)
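The cost questions reduce to unit-economics arithmetic. A sketch with hypothetical rates and volumes (none of these prices are quoted from any provider); note that a flat 10x multiplier ignores the cost cliffs the last question asks about:

```python
def monthly_cost(unit_cost, monthly_volume):
    """Cost of one line item per month."""
    return unit_cost * monthly_volume

# Hypothetical line items for a voice agent at projected volume.
# (rate per unit, units per month) -- every number here is an assumption.
line_items = {
    "telephony_minutes": (0.014, 50_000),
    "stt_hours": (0.75, 800),
    "storage_gb_month": (0.023, 400),
}

total = sum(monthly_cost(rate, vol) for rate, vol in line_items.values())

# First approximation only: scaling tiers and volume discounts make
# real cost curves non-linear (the "cost cliffs" above).
total_at_10x = sum(monthly_cost(rate, vol * 10) for rate, vol in line_items.values())

cost_per_user = total / 1_000  # assuming 1,000 active users
```

Filling this in per resource yields the Cost Estimation table in the deliverable and answers the unit-economics question directly.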

Deliverable


Infrastructure Design

IaC Structure

  • Tool: [Terraform/Pulumi/etc.]
  • Modules: [List with responsibilities]
  • State: [Backend, locking, per-environment strategy]

Environment Promotion

  • Environments: [List]
  • Promotion Flow: [PR -> plan -> review -> apply]
  • Rollback: [Strategy]

Secrets

  • Store: [Tool]
  • Rotation: [Schedule]
  • Emergency Revocation: [Process]

Cost Estimation

| Resource | Unit Cost | Current Volume | Monthly Cost | At 10x |
|----------|-----------|----------------|--------------|--------|
| [Compute] | [$/unit] | [units/month] | [$] | [$] |
| [Storage] | [$/GB] | [GB] | [$] | [$] |
| [Third-party API] | [$/call] | [calls/month] | [$] | [$] |
| Total |  |  | [$] | [$] |

Cost Ceiling: [Budget constraint if any]
Cost-per-User: [$/user/month at projected scale]

---

Phase 10: Implementation Planning

Applies to: All project types
Invoke: /writing-plans
Input: All design docs produced in prior phases

Testing Strategy Addition

When creating the implementation plan, ensure each task specifies which level of the testing pyramid it targets:
| Level | What It Tests | When to Use |
|-------|---------------|-------------|
| Unit | Single function/component in isolation | Every task (TDD) |
| Integration | Two+ components together, real dependencies | API endpoints, DB queries, service integrations |
| Contract | API shape matches between producer/consumer | Cross-service boundaries, webhook contracts, device protocols |
| End-to-End | Full user flow through the system | Critical paths only (login, core transaction, detection pipeline) |
| Load | Performance under expected/peak traffic | After core features are built |
Source: Growing Object-Oriented Software, Guided by Tests (Freeman & Pryce)
Output: Implementation plan at docs/plans/YYYY-MM-DD-<topic>-plan.md


Phase 11: Implementation

第11阶段:开发实现

Applies to: All project types
Choose execution approach:
  • Subagent-driven (current session): Invoke /subagent-driven-development
  • Parallel session (separate): Invoke /executing-plans in a new session


Phase 12: Security Validation

第12阶段:安全验证

Applies to: Full-Stack, Voice, Edge/IoT+ML, Mobile (with backend)
Invoke: /security-audit and/or /web-app-security-audit
Run BEFORE deployment. Verify that the security-by-design decisions from Phase 3 were actually implemented.
For Edge/IoT projects, additionally verify:
  • Device certificates: valid, unique per device, rotation scheduled
  • MQTT topic security: devices can only publish to their own topics
  • Firmware integrity: signed updates, verified on device
  • Physical security: what credentials are on the device if someone steals it?
Output: Security audit report

适用范围: 全栈、语音、边缘/IoT+ML、带后端的移动端项目 调用:
/security-audit
和/或
/web-app-security-audit
部署前运行。验证第3阶段的安全左移决策是否都已实际实现。
边缘/IoT项目需要额外验证:
  • 设备证书:有效、每设备唯一、轮换计划已制定
  • MQTT主题安全:设备只能发布到自己的主题
  • 固件完整性:更新已签名,设备端会验证
  • 物理安全:如果有人偷走设备,设备上会有什么凭证?
输出: 安全审计报告
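The MQTT topic check above can be made mechanical. A hedged sketch of a broker-side authorization rule; the `devices/<device_id>/...` topic scheme is an illustrative convention, not a standard (AWS IoT expresses the same idea in policies with variables such as `${iot:ClientId}`):

```python
# Broker-side check: a device may only publish under its own topic prefix.
# The "devices/<device_id>/..." scheme is an assumed convention.
def can_publish(device_id: str, topic: str) -> bool:
    parts = topic.split("/")
    return len(parts) >= 3 and parts[0] == "devices" and parts[1] == device_id

assert can_publish("cam-17", "devices/cam-17/telemetry")
assert not can_publish("cam-17", "devices/cam-99/telemetry")  # cross-device publish denied
assert not can_publish("cam-17", "devices/cam-17")            # bare prefix denied
```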

Phase 13: Observability Design

第13阶段:可观测性设计

Applies to: Full-Stack, Voice, Edge/IoT+ML Source: Observability Engineering (Charity Majors) Output: Observability section appended to system design doc
适用范围: 全栈、语音、边缘/IoT+ML 来源: Observability Engineering (Charity Majors) 输出: 可观测性章节,追加到系统设计文档末尾

Questions to Ask

待确认问题

  1. Structured Logging:
    • What logging format? (JSON, structured key-value)
    • What fields on every log line? (timestamp, service, request_id, user_id, trace_id)
    • Correlation IDs — how do you trace a request across services?
  2. Distributed Tracing:
    • What spans exist? (one per service hop in the request path)
    • What tool? (X-Ray, Jaeger, OpenTelemetry)
    • What sampling rate? (100% in staging, 10% in prod, 100% for errors)
  3. Metrics:
    • RED metrics per service: Rate, Errors, Duration
    • Business metrics: orders/hour, detections/day, review latency
    • Infrastructure metrics: CPU, memory, queue depth, cache hit rate
    • Use percentiles (p50, p95, p99), not averages
  4. Alerting:
    • What's worth waking someone up for? (data loss, service down, security breach)
    • What can wait until morning? (elevated error rate, slow responses, queue backlog)
    • Alert fatigue prevention — fewer, better alerts
  5. Dashboards:
    • One dashboard per bounded context
    • Top-level "system health" dashboard
    • On-call runbook linked from each alert
  1. 结构化日志:
    • 使用什么日志格式?(JSON、结构化键值对)
    • 每条日志必须包含哪些字段?(时间戳、服务名、request_id、user_id、trace_id)
    • 关联ID——如何跨服务追踪请求?
  2. 分布式追踪:
    • 有哪些span?(请求路径上每个服务跳转对应一个span)
    • 使用什么工具?(X-Ray、Jaeger、OpenTelemetry)
    • 采样率是多少?(staging环境100%、生产环境10%、错误100%采样)
  3. 指标:
    • 每个服务的RED指标:Rate(请求量)、Errors(错误量)、Duration(延迟)
    • 业务指标:每小时订单量、每日检测量、审核延迟
    • 基础设施指标:CPU、内存、队列深度、缓存命中率
    • 使用百分位数(p50、p95、p99),不要用平均值
  4. 告警:
    • 哪些问题值得把人叫醒处理?(数据丢失、服务宕机、安全漏洞)
    • 哪些问题可以等到上班再处理?(错误率升高、响应变慢、队列积压)
    • 避免告警疲劳——少而精的告警
  5. 看板:
    • 每个限界上下文对应一个看板
    • 顶层"系统健康"看板
    • 每个告警都关联值班 runbook
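As a concrete illustration of the first question, a minimal structured-logging sketch: every line is JSON carrying a `request_id`, so one query can follow a request across services. The field set mirrors the checklist above; the service name is a placeholder:

```python
# Emit one JSON object per log line with a propagated request_id.
# Field names (service, request_id) follow the checklist above and are
# assumptions, not a fixed standard.
import json
import logging
import time
import uuid

class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "service": "orders",  # placeholder service name
            "level": record.levelname,
            "request_id": getattr(record, "request_id", None),
            "message": record.getMessage(),
        })

logger = logging.getLogger("orders")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# The request_id is generated once at the edge and passed along via
# headers (e.g. X-Request-Id) so every service logs the same value.
request_id = str(uuid.uuid4())
logger.info("order created", extra={"request_id": request_id})
```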

Deliverable

交付物


Observability

可观测性

Logging

日志

  • Format: [JSON/structured]
  • Required Fields: [timestamp, service, request_id, trace_id, ...]
  • Correlation: [How trace IDs propagate]
  • 格式: [JSON/结构化]
  • 必选字段: [时间戳、服务名、request_id、trace_id、...]
  • 关联机制: [trace ID 如何传播]

Tracing

追踪

  • Tool: [X-Ray/Jaeger/OTEL]
  • Spans: [List of spans in critical path]
  • Sampling: [Rate per environment]
  • 工具: [X-Ray/Jaeger/OTEL]
  • Span列表: [核心路径的span列表]
  • 采样率: [每个环境的采样率]

Metrics

指标

| Metric | Type | Alert Threshold |
|---|---|---|
| [name] | [RED/business/infra] | [threshold] |

| 指标名称 | 类型 | 告警阈值 |
|---|---|---|
| [名称] | [RED/业务/基础设施] | [阈值] |

Alerting

告警

| Alert | Severity | Action |
|---|---|---|
| [What] | [Page/Warning/Info] | [Runbook link] |

---
| 告警内容 | 严重级别 | 处理动作 |
|---|---|---|
| [内容] | [电话告警/警告/信息] | [Runbook链接] |

---

Phase 14: ML Validation

第14阶段:ML验证

Applies to: Edge/IoT+ML Source: Designing Machine Learning Systems (Chip Huyen), Reliable Machine Learning (Cathy Chen et al.) Output: ML validation report
适用范围: 边缘/IoT+ML 来源: Designing Machine Learning Systems (Chip Huyen)、Reliable Machine Learning (Cathy Chen et al.) 输出: ML验证报告

Validation Checklist

验证检查清单

Run after implementation, before production deployment:
  1. Model Performance:
    • Precision, recall, F1 on held-out test set
    • Performance per class (not just aggregate)
    • Performance on edge cases (night, rain, dust, unusual angles)
    • Latency on target hardware (not just dev machine)
  2. Data Quality:
    • Label consistency audit (sample and re-label, measure agreement)
    • Data leakage check (training data contaminating test set)
    • Distribution shift check (training data vs production data)
  3. Robustness:
    • Adversarial inputs (unusual lighting, occlusion, camera artifacts)
    • Out-of-distribution detection (does the model know when it doesn't know?)
    • Confidence calibration (does 90% confidence mean 90% accuracy?)
  4. Fairness & Bias:
    • Performance across operating conditions (time of day, weather, road type)
    • False positive/negative rates across conditions
    • Are some environments systematically underrepresented?
  5. Operational Readiness:
    • Model loads correctly on target hardware
    • Inference fits within resource budget (Phase 6)
    • Offline queue handles expected volume
    • Monitoring pipeline captures metrics correctly
实现完成后、生产部署前运行:
  1. 模型性能:
    • 留出测试集上的精确率、召回率、F1
    • 每个类别的性能(不只是整体)
    • 边缘场景下的性能(夜晚、雨天、灰尘、非常规角度)
    • 目标硬件上的延迟(不只是开发机上)
  2. 数据质量:
    • 标注一致性审计(抽样重新标注,计算一致性)
    • 数据泄露检查(训练数据是否污染测试集)
    • 分布偏移检查(训练数据 vs 生产数据)
  3. 鲁棒性:
    • 对抗输入(非常规光照、遮挡、摄像头伪影)
    • 分布外检测(模型是否知道自己何时"不知道"?)
    • 置信度校准(90%置信度是否对应90%准确率?)
  4. 公平性与偏见:
    • 不同运行条件下的性能(一天中的不同时间、天气、道路类型)
    • 不同条件下的假阳/假阴率
    • 是否有一些环境被系统性地少采样?
  5. 上线就绪:
    • 模型可以在目标硬件上正常加载
    • 推理符合资源预算(第6阶段)
    • 离线队列可以处理预期流量
    • 监控pipeline可以正确采集指标
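Confidence calibration from item 3 has a standard summary number, Expected Calibration Error (ECE). A sketch under the usual equal-width binning formulation; the bin count is a free choice, and the toy data is illustrative:

```python
# ECE: bin predictions by confidence, compare each bin's average
# confidence to its accuracy, and weight the gap by bin size.
def expected_calibration_error(confidences, correct, n_bins=10):
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(avg_conf - accuracy)
    return ece

# Perfectly calibrated toy set: 80% confidence, 8 of 10 correct -> ECE ~ 0
conf = [0.8] * 10
hits = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
print(round(expected_calibration_error(conf, hits), 3))  # 0.0
```

A model that reports 90% confidence but is right only half the time would score an ECE around 0.4 on the same check, which is the "does 90% mean 90%?" question made quantitative.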

Deliverable

交付物


ML Validation Report

ML验证报告

Performance

性能

| Metric | Overall | Class A | Class B | Edge Cases |
|---|---|---|---|---|
| Precision | | | | |
| Recall | | | | |
| F1 | | | | |
| Latency (ms) | | | | |

| 指标 | 整体 | 类别A | 类别B | 边缘场景 |
|---|---|---|---|---|
| 精确率 | | | | |
| 召回率 | | | | |
| F1 | | | | |
| 延迟 (ms) | | | | |

Data Quality

数据质量

  • Label Agreement: [%]
  • Leakage Check: [Pass/Fail]
  • Distribution Shift: [Within/Outside tolerance]
  • 标注一致性: [%]
  • 泄露检查: [通过/失败]
  • 分布偏移: [在容忍范围内/超出容忍范围]

Robustness

鲁棒性

  • Adversarial: [Results]
  • OOD Detection: [Method, threshold]
  • Calibration: [ECE score]
  • 对抗输入测试: [结果]
  • 分布外检测: [方法、阈值]
  • 校准: [ECE分数]

Go/No-Go Decision

上线决策

[Ready / Needs retraining / Needs more data]

---
[就绪 / 需要重训练 / 需要更多数据]

---

Phase 15: Polish & Review

第15阶段:优化与评审

Applies to: All project types
Route to the appropriate review skill(s):
| Project Type | Review Skills |
|---|---|
| macOS App | /design-code-review + /apple-craftsman (review mode) |
| iOS Mobile App | /mobile-ios-design (review mode) + /ux-usability-review |
| Web Frontend | /ui-polish-review + /ux-usability-review |
| Full-Stack Web | /ui-polish-review + /ux-usability-review |
| Voice Agent | /ux-usability-review (conversational flow review) |
| Edge/IoT + ML | /ui-polish-review (webapp) + /ux-usability-review (webapp) |

After review skills complete, invoke /code-simplifier on the full codebase.

适用范围: 所有项目类型
路由到对应的评审skill:
| 项目类型 | 评审Skill |
|---|---|
| macOS App | /design-code-review + /apple-craftsman(评审模式) |
| iOS Mobile App | /mobile-ios-design(评审模式) + /ux-usability-review |
| Web Frontend | /ui-polish-review + /ux-usability-review |
| Full-Stack Web | /ui-polish-review + /ux-usability-review |
| Voice Agent | /ux-usability-review(对话流程评审) |
| Edge/IoT + ML | /ui-polish-review(网页端) + /ux-usability-review(网页端) |

评审skill执行完成后,对整个代码库调用 /code-simplifier。

Resumption Protocol

恢复开发协议

If starting a new session mid-project:
  1. Check
    docs/plans/
    for existing artifacts
  2. Read each doc to understand decisions already made
  3. Determine which phase produced the last artifact
  4. Resume from the next phase
Artifact -> Phase mapping:
Artifact -> Phase mapping:

| Artifact | Phase Completed |
|---|---|
| System Assessment section | Phase 0.5 (Existing System Assessment) |
| `*-design.md` | Phase 1 (Brainstorming) |
| Domain Model section in design doc | Phase 2 |
| `*-system-design.md` | Phase 3 (DDIA) |
| Resilience section in system design | Phase 4 |
| `*-ml-pipeline.md` | Phase 5 |
| Edge Architecture section | Phase 6 |
| API Specification section/doc | Phase 7 |
| Voice prompt doc | Phase 8 |
| Infrastructure section | Phase 9 |
| `*-plan.md` | Phase 10 (Writing Plans) |
| Code exists + tests pass | Phase 11 (Implementation) |
| Security audit report | Phase 12 |
| Observability section | Phase 13 |
| ML validation report | Phase 14 |
| Review findings addressed | Phase 15 |

如果项目中途启动新会话:
  1. 检查
    docs/plans/
    目录下的现有产物
  2. 阅读每个文档了解已经做出的决策
  3. 确定最后一个产物对应的完成阶段
  4. 从下一个阶段恢复开发
产物 -> 阶段映射:
| 产物 | 完成阶段 |
|---|---|
| 系统评估章节 | 第0.5阶段(现有系统评估) |
| `*-design.md` | 第1阶段(头脑风暴) |
| 设计文档中的领域模型章节 | 第2阶段 |
| `*-system-design.md` | 第3阶段(DDIA) |
| 系统设计中的容灾章节 | 第4阶段 |
| `*-ml-pipeline.md` | 第5阶段 |
| 边缘架构章节 | 第6阶段 |
| API规范章节/文档 | 第7阶段 |
| 语音prompt文档 | 第8阶段 |
| 基础设施章节 | 第9阶段 |
| `*-plan.md` | 第10阶段(开发计划编写) |
| 代码存在 + 测试通过 | 第11阶段(开发实现) |
| 安全审计报告 | 第12阶段 |
| 可观测性章节 | 第13阶段 |
| ML验证报告 | 第14阶段 |
| 评审问题已解决 | 第15阶段 |

Anti-Patterns

反模式

| Mistake | Fix |
|---|---|
| Skipping to implementation | Always start at Phase 0, even if "you know what you're building" |
| Running all 15 phases for a simple macOS app | Trust the router — it selects only applicable phases |
| Treating security as Phase 12 only | Security-by-design is in Phase 3; Phase 12 validates it was implemented |
| Designing the ML pipeline after building the API | Phases are sequential — ML decisions affect API shape |
| Writing plans without a domain model | Plans based on a vague domain produce vague tasks |
| Skipping resilience for "internal" services | Internal services fail too — especially at 3am |
| Averaging latency instead of using percentiles | p50 hides tail latency; use p95/p99 |
| Adding features to an existing system without mapping it first | Run Phase 0.5 — understand what exists before designing what's new |
| Treating accessibility as a Phase 15 afterthought | Accessibility-by-design in Phase 3 catches issues that are expensive to retrofit |
| Ignoring cloud costs until the bill arrives | Cost estimation in Phase 9 prevents surprises — unit economics matter |

| 错误 | 修复方案 |
|---|---|
| 直接跳到实现阶段 | 始终从第0阶段开始,哪怕你"知道要做什么" |
| 给简单的macOS应用运行全部15个阶段 | 信任路由器——它只会选择适用的阶段 |
| 只把安全当作第12阶段的工作 | 安全左移在第3阶段就需要融入;第12阶段只是验证是否实现 |
| 写完API再设计ML pipeline | 阶段是顺序执行的——ML决策会影响API格式 |
| 没有领域模型就写计划 | 基于模糊领域制定的计划会产生模糊的任务 |
| "内部服务"跳过容灾设计 | 内部服务也会出故障——尤其是凌晨3点的时候 |
| 用平均延迟而不是百分位数 | p50会掩盖长尾延迟;用p95/p99 |
| 给现有系统加功能前不先梳理系统 | 运行第0.5阶段——设计新内容前先了解现有内容 |
| 把无障碍当作第15阶段的事后补充 | 第3阶段的无障碍设计左移可以避免后期高额的改造成本 |
| 账单出来才关心云成本 | 第9阶段的成本估算可以避免意外——单位经济很重要 |

Book References

参考书籍

| Phase | Book | Author |
|---|---|---|
| 2. Domain Modeling | Domain-Driven Design | Eric Evans |
| 3. System Design | Designing Data-Intensive Applications | Martin Kleppmann |
| 4. Resilience | Release It! | Michael Nygard |
| 5. ML Pipeline | Designing Machine Learning Systems | Chip Huyen |
| 9. Infrastructure | Infrastructure as Code | Kief Morris |
| 10. Testing Strategy | Growing Object-Oriented Software, Guided by Tests | Freeman & Pryce |
| 13. Observability | Observability Engineering | Charity Majors |
| 14. ML Validation | Reliable Machine Learning | Cathy Chen et al. |
| 15. UI Polish | Refactoring UI | Wathan & Schoger |
| 15. UX Review | Don't Make Me Think | Steve Krug |
| 阶段 | 书籍 | 作者 |
|---|---|---|
| 2. 领域建模 | Domain-Driven Design | Eric Evans |
| 3. 系统设计 | Designing Data-Intensive Applications | Martin Kleppmann |
| 4. 容灾设计 | Release It! | Michael Nygard |
| 5. ML Pipeline | Designing Machine Learning Systems | Chip Huyen |
| 9. 基础设施 | Infrastructure as Code | Kief Morris |
| 10. 测试策略 | Growing Object-Oriented Software, Guided by Tests | Freeman & Pryce |
| 13. 可观测性 | Observability Engineering | Charity Majors |
| 14. ML验证 | Reliable Machine Learning | Cathy Chen et al. |
| 15. UI优化 | Refactoring UI | Wathan & Schoger |
| 15. UX评审 | Don't Make Me Think | Steve Krug |