spec-driven-tdd
Spec-Driven Test-Driven Development
Transform requests into verified implementations through a structured pipeline
that adapts to the project context.
Workflow Router
Determine the workflow type before starting. This drives which phases apply and
which templates to use.
Does a codebase already exist?
|
NO --> GREENFIELD (new project from scratch)
|      Phases: Clarify -> Specify -> Test Plan -> TDD -> Verify
|
YES --> What type of change?
        |
        +-> Bug report / error / regression
        |     --> BUGFIX workflow
        |         Phases: Explore -> Bugfix Spec -> TDD -> Verify
        |
        +-> Add new capability not in codebase
        |     --> Does test infrastructure exist?
        |         |
        |         YES --> ENHANCEMENT workflow
        |         |       Phases: Explore -> Clarify -> Specify (delta) -> Test Plan -> TDD -> Verify
        |         |
        |         NO --> ENHANCEMENT (NO TESTS) workflow
        |                Phases: Explore -> Clarify -> Specify (delta) -> Bootstrap Tests -> Test Plan -> TDD -> Verify
        |
        +-> Change behavior without adding capability
        |     --> Does test infrastructure exist?
        |         |
        |         YES --> REFACTOR workflow
        |         |       Phases: Explore -> Clarify -> Specify (delta) -> Test Plan -> TDD -> Verify
        |         |
        |         NO --> REFACTOR (NO TESTS) workflow
        |                Phases: Explore -> Clarify -> Specify (delta) -> Bootstrap Tests -> Test Plan -> TDD -> Verify
        |
        +-> Simple, well-understood change (one-sentence diff)
              --> DIRECT (skip this skill)
Signal detection:
| Signal in request | Likely type |
|---|---|
| Error message, stack trace, "doesn't work", "broken" | Bugfix |
| "Add", "create", "new", "build", "implement" | Enhancement |
| "Improve", "enhance", "extend", "also support" | Enhancement |
| "Clean up", "modernize", "consolidate", "simplify" | Refactor |
| "Slow", "optimize", "performance" | Refactor (optimization) |
| No existing code, "start fresh", "new project" | Greenfield |
| "Add tests", "no tests", "untested" | Enhancement/Refactor (no tests) |
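The routing tree and signal table above can be sketched as two small functions. This is a minimal illustration, not part of the skill itself: the names `guess_change_type` and `route_workflow` are hypothetical, and the keyword matching is a naive substring check that will misfire on words such as "address".

```python
from typing import List, Optional, Tuple

# Phase lists copied from the routing tree; all names here are illustrative.
DELTA_PHASES = ["Explore", "Clarify", "Specify (delta)", "Test Plan", "TDD", "Verify"]
PHASES = {
    "GREENFIELD": ["Clarify", "Specify", "Test Plan", "TDD", "Verify"],
    "BUGFIX": ["Explore", "Bugfix Spec", "TDD", "Verify"],
    "ENHANCEMENT": DELTA_PHASES,
    "REFACTOR": DELTA_PHASES,
}

# Ordered (keywords, change type) pairs taken from the signal table.
# Bugfix signals are checked first because they are the most specific.
SIGNALS = [
    (("stack trace", "doesn't work", "broken", "error"), "BUGFIX"),
    (("clean up", "modernize", "consolidate", "simplify",
      "slow", "optimize", "performance"), "REFACTOR"),
    (("add", "create", "new", "build", "implement",
      "improve", "enhance", "extend", "also support"), "ENHANCEMENT"),
]


def guess_change_type(request: str) -> Optional[str]:
    """Return the likely change type, or None when no signal matches."""
    text = request.lower()
    for keywords, change_type in SIGNALS:
        if any(keyword in text for keyword in keywords):
            return change_type
    return None


def route_workflow(codebase_exists: bool, change_type: str,
                   has_tests: bool) -> Tuple[str, List[str]]:
    """Map the three routing questions to a workflow name and its phases."""
    if not codebase_exists:
        return "GREENFIELD", list(PHASES["GREENFIELD"])
    if change_type == "DIRECT":
        return "DIRECT", []  # one-sentence diff: skip the pipeline
    phases = list(PHASES[change_type])
    if change_type in ("ENHANCEMENT", "REFACTOR") and not has_tests:
        # Bootstrap Tests sits between Specify (delta) and Test Plan.
        phases.insert(phases.index("Test Plan"), "Bootstrap Tests")
        return f"{change_type} (NO TESTS)", phases
    return change_type, phases
```

A caller would typically confirm the guessed type with the user before routing, since the signals only indicate a likely type.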
Test infrastructure detection:
Existing test infrastructure = ANY of these found:
- Test files (*test*, *spec*, __tests__/)
- Test runner config (jest.config.*, vitest.config.*, pytest.ini, conftest.py, .mocharc.*)
- Test script in manifest ("test" script in package.json, test target in Makefile)
- CI config that runs tests
No test infrastructure = NONE of the above found.
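The detection rule above could be automated roughly as follows. This is a sketch under stated assumptions: the function names and the `SKIP_DIRS` set are hypothetical, the glob patterns are the ones listed above (so a directory named `latest` would count as a false positive), and the CI check is omitted because CI layouts vary too much to guess.

```python
import json
import re
from pathlib import Path
from typing import Iterable, List

SKIP_DIRS = {".git", "node_modules", ".venv"}  # assumption: directories to ignore
TEST_FILE_GLOBS = ("*test*", "*spec*", "__tests__")
RUNNER_CONFIG_GLOBS = ("jest.config.*", "vitest.config.*", "pytest.ini",
                       "conftest.py", ".mocharc.*")


def _matches(root: Path, patterns: Iterable[str]) -> bool:
    """True if any path under root matches a pattern, ignoring SKIP_DIRS."""
    for pattern in patterns:
        for path in root.rglob(pattern):
            if not SKIP_DIRS.intersection(path.relative_to(root).parts):
                return True
    return False


def _has_manifest_test_script(root: Path) -> bool:
    """Look for a "test" script in package.json or a test target in Makefile."""
    package_json = root / "package.json"
    if package_json.is_file():
        try:
            manifest = json.loads(package_json.read_text())
        except json.JSONDecodeError:
            manifest = {}
        if isinstance(manifest, dict) and "test" in manifest.get("scripts", {}):
            return True
    makefile = root / "Makefile"
    if makefile.is_file():
        return re.search(r"^test\s*:", makefile.read_text(), re.MULTILINE) is not None
    return False


def detect_test_signals(root) -> List[str]:
    """Return the signals found; an empty list means no test infrastructure."""
    root = Path(root)
    signals = []
    if _matches(root, TEST_FILE_GLOBS):
        signals.append("test files")
    if _matches(root, RUNNER_CONFIG_GLOBS):
        signals.append("runner config")
    if _has_manifest_test_script(root):
        signals.append("manifest test script")
    return signals
```

Because the rule is "ANY of these found", a non-empty result is enough to treat the project as having test infrastructure.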
Phase 0: Explore Existing Codebase (Brownfield Only)
Skip this phase for greenfield projects.
Before asking the user anything or writing any spec, understand what already
exists. Explore in read-only mode.
What to discover
- Project structure: Directory layout, where source and tests live
- Tech stack: Framework, language, versions, build system, package manager
- Architectural patterns: MVC, layered, event-driven, microservices, etc.
- Naming conventions: File naming, function naming, variable casing
- Existing patterns: Find the closest analog to the requested change — how is similar functionality already implemented?
- Data models: Key entities, relationships, schemas
- Test infrastructure: Test framework, test runner command, fixture patterns, approximate coverage, where test files live
- Existing spec artifacts: Check for prior requirements.md, design.md, tasks.md, CLAUDE.md, or similar documentation
- Affected area: Which files and modules will this change touch?
How to explore
- Read the project's CLAUDE.md or README first (if they exist)
- Use glob/grep to find relevant source files and test files
- Read 2-3 existing files in the affected area to absorb patterns
- If prior spec files exist, read them to understand accumulated requirements
Test infrastructure assessment
Determine the test state by scanning for:
| Indicator | What it tells you |
|---|---|
| Test files exist in affected area | Tests cover the code you'll change |
| Test files exist elsewhere but not here | Project has tests, but not for this area |
| Test framework in dependencies | Framework chosen but maybe unused |
| Test runner config exists | Infrastructure is set up |
| "test" script in manifest | Runner command is established |
| CI config runs tests | Tests are part of the workflow |
| No test files, no config, no scripts | No test infrastructure at all |
Classify the result:
| Test State | Definition | Workflow Impact |
|---|---|---|
| Tests exist | Test files + runner + config all present | Standard brownfield |
| Partial infra | Framework in deps or config exists, but no/few test files | Bootstrap: write tests, skip framework selection |
| No tests | No test files, no config, no runner | Bootstrap: full test infrastructure setup |
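The classification table can be expressed as a small function. This is only a sketch: the `Indicators` dataclass and its field names are hypothetical, and combinations the table does not name (for example, test files and a runner command but no config file) are assumed to fall under "partial infra".

```python
from dataclasses import dataclass


@dataclass
class Indicators:
    """Findings from the indicator scan (field names are illustrative)."""
    has_test_files: bool
    has_framework_dependency: bool
    has_runner_config: bool
    has_runner_command: bool


def classify_test_state(ind: Indicators) -> str:
    """Return 'tests exist', 'partial infra', or 'no tests'."""
    if ind.has_test_files and ind.has_runner_command and ind.has_runner_config:
        return "tests exist"  # standard brownfield workflow
    nothing_found = not (ind.has_test_files or ind.has_framework_dependency
                         or ind.has_runner_config or ind.has_runner_command)
    if nothing_found:
        return "no tests"  # bootstrap: full test infrastructure setup
    # Assumption: every other combination counts as partial infrastructure.
    return "partial infra"
```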
Output: Codebase Context Summary
Produce a brief mental model (do not write a file unless the user requests it):
Stack: [framework, language, test runner]
Patterns: [architecture, naming, file organization]
Affected area: [files/modules this change touches]
Existing tests: [relevant test files, approximate count, runner command]
OR: No test infrastructure found
Test state: [tests exist | partial infra | no tests]
Prior specs: [any existing requirements.md/design.md or none]
Closest analog: [existing feature most similar to requested change]
This context informs every subsequent phase.
Phase 1: Clarify Requirements
Evaluate the request's clarity before starting any work.
Greenfield
Ask about the blank slate: inputs/outputs, data formats, error handling,
edge cases, performance constraints, tech stack preferences.
Brownfield (tests exist)
Use codebase context from Phase 0. Do NOT ask questions the codebase
already answers. Focus questions on:
- What should the NEW behavior be? (the codebase shows current behavior)
- Where does new behavior differ from existing patterns?
- What existing behavior must NOT change?
- Are there constraints the existing architecture imposes?
Brownfield (no tests)
In addition to the standard brownfield questions, clarify:
- Test framework preference: Does the user have a preferred test framework? If not, recommend one based on the stack (see Bootstrap Tests phase).
- Test scope: Should tests cover ONLY the new/changed code, or also establish baseline coverage for existing code in the affected area?
- Test location: Co-located with source (src/foo.test.ts) or separate directory (tests/)? Infer from project conventions if possible.
Ambiguity signals (if ANY present, ask before proceeding):
- Multiple valid interpretations producing different implementations
- Missing I/O specifications, pre/post conditions, or data formats
- Unspecified error handling, edge cases, or boundary conditions
- Unclear performance, security, or environmental constraints
- Change touches shared/critical code paths
When to proceed without asking:
- Request is well-constrained with one obvious implementation
- Ambiguity is purely cosmetic (naming, formatting)
- Clarification loop has reached 3 rounds (proceed with stated assumptions)
Interview approach:
- Ask up to 3 focused questions per round
- Each question should target a decision that changes the implementation
- State the default assumption alongside each question
- After each round, summarize resolved items and remaining unknowns
Phase 2: Write Specification
Use the templates in references/spec-template.md.
Which template depends on the workflow type:
| Workflow | Spec Template | Key Difference |
|---|---|---|
| Greenfield | Feature (requirements.md + design.md) | Written from scratch |
| Enhancement | Enhancement (requirements.md + design-delta.md) | Extends existing specs, documents only what changes |
| Enhancement (no tests) | Enhancement (requirements.md + design-delta.md) | Same as above, plus test infrastructure decisions |
| Refactor | Refactor (refactor.md) | Documents behavior preservation + structural changes |
| Refactor (no tests) | Refactor (refactor.md) | Same as above, plus test infrastructure decisions |
| Bugfix | Bugfix (bugfix.md) | Current/Expected/Unchanged behavior |
EARS notation (all workflows)
| Pattern | Syntax | Use Case |
|---|---|---|
| Ubiquitous | THE SYSTEM SHALL [behavior] | Always-on requirements |
| Event-Driven | WHEN [event] THE SYSTEM SHALL [behavior] | Triggered by events |
| State-Driven | WHILE [state] THE SYSTEM SHALL [behavior] | During a state |
| Unwanted | IF [condition] THEN THE SYSTEM SHALL [behavior] | Exception handling |
| Complex | WHILE [state] WHEN [event] THE SYSTEM SHALL [behavior] | Combined |
Every requirement must be atomic, testable, and have a unique ID.
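A quick way to check that a requirement follows one of the EARS patterns above is a set of regular expressions. This is a minimal sketch: `classify_ears` is a hypothetical helper, it expects the keywords in upper case exactly as in the table, and it checks shape only, not whether the requirement is atomic or testable.

```python
import re
from typing import Optional

# Most specific patterns first, so "WHILE ... WHEN ..." is not
# mistaken for a plain State-Driven requirement.
EARS_PATTERNS = [
    ("Complex", re.compile(r"^WHILE .+ WHEN .+ THE SYSTEM SHALL .+")),
    ("Event-Driven", re.compile(r"^WHEN .+ THE SYSTEM SHALL .+")),
    ("State-Driven", re.compile(r"^WHILE .+ THE SYSTEM SHALL .+")),
    ("Unwanted", re.compile(r"^IF .+ THEN THE SYSTEM SHALL .+")),
    ("Ubiquitous", re.compile(r"^THE SYSTEM SHALL .+")),
]


def classify_ears(requirement: str) -> Optional[str]:
    """Return the EARS pattern name, or None if the text fits no pattern."""
    text = requirement.strip()
    for name, pattern in EARS_PATTERNS:
        if pattern.match(text):
            return name
    return None
```

A `None` result is a prompt to rewrite the requirement, for example turning "The system should be fast" into a measurable SHALL statement.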
Requirement ID namespacing (brownfield)
If prior specs exist with REQ-001 through REQ-N, continue numbering from REQ-(N+1).
If no prior specs exist, start fresh. Use prefixes to distinguish:
- REQ-E-nnn for enhancement requirements
- REQ-R-nnn for refactor requirements
- REQ-BUG-nnn for bugfix requirements
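The numbering rule can be sketched as a helper that continues from the highest existing ID for a given prefix. `next_requirement_id` is a hypothetical name, and the three-digit zero padding is an assumption based on the `REQ-001` and `nnn` examples.

```python
import re
from typing import Iterable


def next_requirement_id(existing_ids: Iterable[str], prefix: str = "REQ") -> str:
    """Continue numbering after the highest existing ID with this prefix."""
    pattern = re.compile(rf"^{re.escape(prefix)}-(\d+)$")
    numbers = []
    for req_id in existing_ids:
        match = pattern.match(req_id)
        if match:
            numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1  # start at 1 when nothing exists
    return f"{prefix}-{next_number:03d}"
```

For example, with prior specs containing `REQ-001` and `REQ-002`, the helper yields `REQ-003`; with no prior enhancement requirements, `next_requirement_id([], "REQ-E")` yields `REQ-E-001`.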
Unchanged Behavior section (critical for brownfield)
Every brownfield spec MUST include an Unchanged Behavior section:
- INV-001: WHEN [condition] THE SYSTEM SHALL CONTINUE TO [existing behavior]
This is the primary regression prevention mechanism.
When no tests exist, invariants are your ONLY regression protection. Be
more thorough than usual: list every behavior in the affected area that must
survive the change. Read the source code and its callers to identify these.
Greenfield: requirements.md + design.md
Write both from scratch per the Feature template. Full architecture, data models,
API contracts, error handling, testing strategy.
Enhancement: requirements.md + design-delta.md
requirements.md: Add only NEW requirements. Reference existing spec if present.
Include Unchanged Behavior invariants for every existing behavior the change
could affect.
design-delta.md: Document only what changes:
- Modified components (what exists → what changes)
- New components (what's added)
- Integration points (how new connects to existing)
- Files to modify vs. files to create
Do NOT rewrite the entire architecture. Reference existing design.
No-tests addition to design-delta.md: Include a Test Infrastructure section:
Test Infrastructure (New)
- Framework: [chosen framework and rationale]
- Runner command: [command to run tests]
- Config file: [path to test config]
- Test location: [co-located | separate directory]
- Naming convention: [pattern, e.g., *.test.ts, test_*.py]
Bugfix: bugfix.md
Three-section format:
- Current Behavior (defect): WHEN [x] THEN the system [incorrect behavior]
- Expected Behavior (correct): WHEN [x] THEN the system SHALL [correct behavior]
- Unchanged Behavior (regression): WHEN [y] THE SYSTEM SHALL CONTINUE TO [existing behavior]
Derive three tests: reproduce bug, validate fix, confirm no regressions.
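As an illustration of the three derived tests, consider a hypothetical defect where a quantity parser rejects input with surrounding whitespace. Both `parse_quantity_buggy` and `parse_quantity` are invented for this sketch; in a real bugfix there is only one function, and the first test is written against it before the fix.

```python
def parse_quantity_buggy(text: str):
    """Hypothetical current code: rejects input with surrounding whitespace."""
    return int(text) if text.isdigit() else None


def parse_quantity(text: str):
    """Hypothetical fixed code: trims whitespace before validating."""
    text = text.strip()
    return int(text) if text.isdigit() else None


def test_reproduces_bug():
    # Current Behavior: WHEN input has surrounding spaces THEN the system returns None.
    assert parse_quantity_buggy(" 3 ") is None


def test_validates_fix():
    # Expected Behavior: WHEN input has surrounding spaces THEN the system SHALL parse it.
    assert parse_quantity(" 3 ") == 3


def test_unchanged_behavior():
    # Unchanged Behavior: plain digits still parse, non-numeric input is still rejected.
    assert parse_quantity("7") == 7
    assert parse_quantity("abc") is None
```

In practice the reproduction test and the fix-validation test are often the same test observed at two points in time: failing against the defect, then passing after the fix.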
Phase 3: Bootstrap Test Infrastructure (No-Tests Workflows Only)
Skip this phase if test infrastructure already exists.
Before deriving a test plan, establish the test infrastructure. You cannot
write tests without a framework, runner, and conventions.
Step 1: Select test framework
If the user specified a preference in Phase 1, use it. Otherwise, select
based on the stack:
| Stack | Recommended Framework | Rationale |
|---|---|---|
| TypeScript / ESM | vitest | Native ESM, fast, compatible with Jest API |
| TypeScript / CJS | jest + ts-jest | Mature ecosystem, wide adoption |
| JavaScript / Node | jest or vitest | Either works; match project's module system |
| JavaScript / Browser | vitest or jest + jsdom | DOM testing support |
| Python 3 | pytest | De facto standard, fixtures, parametrize |
| Python (Django) | pytest-django | Django integration with pytest |
| Go | testing (stdlib) | Built-in, no external dependency needed |
| Rust | cargo test (built-in) | Built-in test framework |
| Java / Spring | JUnit 5 + Mockito | Industry standard |
| Ruby / Rails | RSpec or Minitest | RSpec for BDD style; Minitest for minimal |
| C# / .NET | xUnit or NUnit | xUnit for modern .NET |
| Elixir | ExUnit (built-in) | Built-in test framework |
Principle: Choose the most conventional option. The goal is not the "best"
framework but the one the team (or a future developer) will expect.
Step 2: Install and configure
1. Add the framework as a dev dependency:
       npm install --save-dev vitest   # Node/TS
       pip install pytest              # Python
       # Go and Rust have built-in testing
2. Create test config (if the framework needs one):
   - vitest.config.ts, jest.config.ts, pytest.ini, etc.
   - Match the project's existing config patterns (TypeScript for TS projects, YAML if project uses YAML configs, etc.)
3. Add test runner script to the project manifest:
       // package.json
       { "scripts": { "test": "vitest run" } }
       # pyproject.toml
       [tool.pytest.ini_options]
       testpaths = ["tests"]
4. Create test directory (if using separate directory convention):
       mkdir tests/   # or __tests__/ or test/
5. Verify the framework works by writing a trivial smoke test:
       // tests/smoke.test.ts
       import { describe, it, expect } from 'vitest'
       describe('test infrastructure', () => {
         it('works', () => {
           expect(1 + 1).toBe(2)
         })
       })
   Run it. If it passes, infrastructure is ready. Delete or keep the smoke test as appropriate.
Step 3: Establish test conventions
Derive conventions from the project's CODE conventions:
| Code Convention | Test Convention |
|---|---|
| Files in | Tests in |
| Functions use camelCase | Test names use camelCase: |
| Functions use snake_case | Test names use snake_case: |
| Modules organized by feature | Test directories mirror source directories |
| ES modules (import/export) | Tests use same import style |
| CommonJS (require) | Tests use same require style |
Document the conventions in design-delta.md (or design.md for greenfield)
so future tests stay consistent.
Step 4: Write characterization tests (critical)
Before changing any existing code, write characterization tests that capture
the current behavior of the code you're about to modify. These tests do not
assert what the code should do; they document what the code ACTUALLY does,
so you can detect when your changes break something.
See references/test-plan.md for the full
characterization test method.
What to characterize
Focus on the affected area identified in Phase 0:
- Public functions/methods you'll call or modify: test their current inputs → outputs
- API endpoints you'll change: test their current request → response
- Critical code paths through the module: test the happy path and major error paths
- Integration points between the module and its callers: test that callers get what they expect
How many characterization tests
- Enhancement: Characterize the specific functions/endpoints you'll modify or call. Not the entire codebase — just the affected surface.
- Refactor: More extensive. Characterize ALL externally observable behavior of the code being refactored. This is your safety net.
- Bugfix: Characterize the behavior around the bug. The reproduction test IS a characterization test (it documents the current broken behavior).
Characterization test naming
Use a distinct naming pattern so these are recognizable:
test_CHAR_[function]_[scenario]_[current_behavior]
Examples:
def test_CHAR_create_user_with_valid_data_returns_user_object(): ...
def test_CHAR_create_user_with_duplicate_email_raises_conflict(): ...
def test_CHAR_get_user_nonexistent_returns_none(): ...
Run characterization tests
All characterization tests MUST pass against the UNCHANGED code. If a
characterization test fails, your test is wrong — fix the test, not the code.
These characterization tests become your regression suite. They replace
the "existing tests" that the standard brownfield workflow relies on.
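A small example may make the idea concrete. Here `legacy_slugify` stands in for existing code that has no tests; it is hypothetical, as are the test names. Note that the second test pins a quirk (a trailing hyphen) because that is what the code does today, not because it is desirable.

```python
def legacy_slugify(title: str) -> str:
    """Hypothetical existing code under characterization (left unchanged)."""
    return title.lower().replace(" ", "-")


def test_CHAR_legacy_slugify_with_spaces_returns_hyphenated_lowercase():
    assert legacy_slugify("Hello World") == "hello-world"


def test_CHAR_legacy_slugify_with_trailing_space_keeps_trailing_hyphen():
    # Surprising, but observed: the trailing space becomes a hyphen.
    # The test records it so a later change cannot alter it unnoticed.
    assert legacy_slugify("Hello ") == "hello-"


def test_CHAR_legacy_slugify_with_empty_string_returns_empty_string():
    assert legacy_slugify("") == ""
```

If the spec later decides the trailing hyphen is a defect, that becomes an explicit requirement and the characterization test is updated deliberately, rather than being broken by accident.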
Deliverable: Test infrastructure ready
After this phase:
- Framework installed and configured
- Runner command works
- Test conventions documented
- Characterization tests pass against unchanged code
- You can now proceed to Phase 4 (Test Plan) with confidence
Phase 4: Derive Test Plan
From the spec, derive tests and write tasks.md. See
references/test-plan.md for templates.
Brownfield: Discover existing tests first
Before writing any new tests:
- Find existing test files in the affected area (glob for *test*, *spec*)
- Read relevant existing tests to understand patterns (framework, assertions, fixtures, naming conventions)
- Check for existing coverage of the behaviors you're about to change
- Match existing patterns in all new tests (same framework, same style, same file location conventions)
Brownfield (no tests): Use characterization tests as baseline
If you bootstrapped tests in Phase 3:
- Your characterization tests ARE the existing tests. They serve the same role as pre-existing tests in the standard brownfield workflow.
- Match the conventions you established in Phase 3 for all new tests.
- Do not add more characterization tests at this point — Phase 3 covered the affected area. Focus on test derivation from the spec.
- Map invariants to characterization tests: Each INV-* should correspond to at least one characterization test that already passes.
Test derivation rules
- Each EARS requirement → at least one acceptance test (Given/When/Then)
- Each data model/interface → unit tests for validation, transformation, edge cases
- Each integration point → integration test using real services where possible
- Each invariant → regression test confirming unchanged behavior
  - If tests already exist: existing tests cover this
  - If no tests existed: characterization tests from Phase 3 cover this
- Identify properties (invariants for all inputs) → property-based tests
Traceability
Maintain a mapping in tasks.md:
| Req ID | Test Case IDs | Status |
|-----------|-------------------|-------------|
| REQ-E-001 | TC-E-001, TC-E-002| Not Started |
| INV-001   | TC-REG-001        | Not Started |
Brownfield traceability rules:
- Every NEW requirement must have >= 1 test
- Every INVARIANT must have >= 1 regression test
- Existing tests from prior iterations are NOT orphans — only flag tests from the current iteration that don't map to a requirement
- The traceability matrix covers only the current iteration's scope
No-tests traceability addition:
- Characterization tests (CHAR-*) map to INV-* invariants
- Include them in the matrix with their CHAR prefix:
| Req ID | Test Case IDs | Status |
|-----------|----------------------|-------------|
| REQ-E-001 | TC-E-001, TC-E-002 | Not Started |
| INV-001 | CHAR-create-user-001 | Passing (baseline) |
| INV-002   | CHAR-get-user-001    | Passing (baseline) |
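The traceability rules lend themselves to a mechanical check. The sketch below assumes the matrix has already been parsed into a dictionary from requirement ID to test case IDs; `find_traceability_gaps` is a hypothetical helper, not something the templates provide.

```python
from typing import Dict, List, Tuple


def find_traceability_gaps(
    requirement_ids: List[str],
    matrix: Dict[str, List[str]],
) -> Tuple[List[str], List[str]]:
    """Return (requirements with no test, matrix rows not in the spec)."""
    # Every new requirement and every invariant needs at least one test.
    untested = [req for req in requirement_ids if not matrix.get(req)]
    # Rows in the matrix that no longer correspond to a spec entry.
    unknown = [req for req in matrix if req not in requirement_ids]
    return untested, unknown


if __name__ == "__main__":
    spec_ids = ["REQ-E-001", "INV-001", "INV-002"]
    matrix = {
        "REQ-E-001": ["TC-E-001", "TC-E-002"],
        "INV-001": ["CHAR-create-user-001"],
    }
    print(find_traceability_gaps(spec_ids, matrix))
```

Only the current iteration's IDs should be passed in, since the matrix covers only the current iteration's scope.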
tasks.md
Break implementation into discrete, sequenced tasks. Each task:
- Maps to one or more requirements
- Has clear acceptance criteria
- Follows dependency order
- Includes "Write tests" as the FIRST subtask (TDD)
- Brownfield: Specifies which files are MODIFIED vs. CREATED
No-tests task ordering:
For brownfield-no-tests workflows, tasks.md should include the bootstrap
work as Task 0:
Task 0: Bootstrap test infrastructure
- Status: [ ] Not Started
- Requirements: (infrastructure — no REQ mapping)
- Subtasks:
  - Install [framework], create config
  - Add test runner script
  - Write characterization tests for affected area
  - Verify all characterization tests pass
- Acceptance: [test command] runs and all characterization tests pass
Principle: **no big jumps in complexity**.
Phase 5: TDD Implementation Loop
Execute tasks from tasks.md using strict TDD. This is NON-NEGOTIABLE.
For each task:
1. RED - Write failing test(s) for the task's requirements
2. RUN - Execute test, confirm it FAILS (if it passes, test is wrong)
3. GREEN - Write MINIMAL code to make the test pass
4. RUN - Execute ALL tests (new + existing), confirm ALL pass
5. REFACTOR - Clean up, ensure no test breakage
6. COMMIT - Mark task complete in tasks.md
Running tests (brownfield, tests exist)
- Inner loop: Run only the new/affected tests during Red-Green iterations (for speed)
- Task boundary: Run the FULL test suite after completing each task (for regression safety)
- Final verification: Run full suite + linters + type checks at the end
Running tests (brownfield, no tests — after bootstrap)
- Inner loop: Run new tests + characterization tests during Red-Green iterations. The characterization tests are your regression guardrail.
- Task boundary: Run ALL tests (characterization + new) after each task.
- Final verification: Run all tests + linters + type checks at the end.
Critical: If a characterization test fails during implementation, you broke
existing behavior. This is the same signal as "existing test breaks" in the
standard brownfield workflow. Fix your new code first.
Rules
- Never write implementation before its test.
- Never alter the spec to satisfy a test. Spec-derived tests are authoritative.
- Minimal code only. Add nothing beyond what makes the current test pass.
- All tests green before moving to next task.
- Use real dependencies where feasible. Mocks only for external services outside your control.
- Decompose classes by method dependency. Generate in dependency order, test each method individually.
- Bounded repair. 3 fix attempts max, then reassess or ask user.
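The bounded-repair rule can be pictured as a loop with a hard cap. This is a sketch only: `run_tests` and `attempt_fix` are hypothetical callables supplied by the caller, and what "reassess or ask user" means in practice is left to the code that receives `False`.

```python
from typing import Callable


def bounded_repair(
    run_tests: Callable[[], bool],
    attempt_fix: Callable[[int], None],
    max_attempts: int = 3,
) -> bool:
    """Try to get the tests green in at most max_attempts fix attempts."""
    if run_tests():
        return True  # already green, nothing to repair
    for attempt in range(1, max_attempts + 1):
        attempt_fix(attempt)  # caller changes the code, never the spec
        if run_tests():
            return True
    # Budget exhausted: stop patching and reassess or ask the user.
    return False
```

Capping the loop keeps a failing task from turning into an open-ended sequence of speculative patches.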
Brownfield-specific rules (all brownfield workflows)
- Match existing patterns. New code must follow the conventions discovered in Phase 0 (naming, file structure, import style, error handling).
- Refactor only what you wrote. Do NOT refactor existing code unless the task explicitly requires it. Existing code is assumed correct until proven otherwise.
- Read before calling. Before calling existing functions, read their actual signatures. Do not assume existing interfaces — verify them.
- If an existing test breaks, your new code caused a regression. Fix your new code first (existing passing tests are authoritative). Only modify an existing test if the spec explicitly changes that behavior.
- If a characterization test breaks (no-tests workflow), the same rule applies: your new code caused a regression. Fix your new code. The characterization test documents real behavior that something depends on.
Hallucination prevention
- Verify external APIs/libraries exist and check current interfaces
- Chain-of-thought: reason step-by-step before coding
- Run static analysis after generation
- Use execution traceback (not just re-reading) to fix failures
- Brownfield: Read existing code before calling it; verify signatures
Phase 6: Verification
After all tasks complete, verify the full delivery.
Checklist
- Spec compliance: Every new requirement has at least one passing test
- All tests pass: Full test suite green (new AND existing)
- Traceability complete: Updated matrix with final status
- No untested new requirements: Every REQ-* from this iteration is covered
- Unchanged behaviors verified: All INV-* regression tests pass
- Static analysis clean: No linter errors, type errors, security warnings
- Pattern compliance (brownfield): New code follows existing conventions
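The coverage items in this checklist (every REQ-* and INV-* has at least one passing test) can be checked mechanically once the traceability matrix is available as data. The sketch below assumes a hypothetical in-memory shape for the matrix, mapping each ID to a list of `(test name, passed)` pairs. The IDs and test names are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical matrix shape: requirement ID -> [(test name, passed?)]
Matrix = Dict[str, List[Tuple[str, bool]]]


def uncovered(matrix: Matrix) -> List[str]:
    """Return IDs that have no passing test, sorted for stable reporting."""
    return sorted(
        req_id
        for req_id, tests in matrix.items()
        if not any(passed for _, passed in tests)
    )


matrix: Matrix = {
    "REQ-001": [("test_login_ok", True)],
    "REQ-002": [("test_logout", False)],  # has a test, but it fails
    "INV-001": [],                        # no test mapped at all
}
print(uncovered(matrix))  # → ['INV-001', 'REQ-002']
```

An empty result means the spec-compliance and no-untested-requirements items hold for the IDs listed in the matrix. It does not show that the matrix itself is complete.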
Additional no-tests verification
- Test infrastructure works: [test command] runs cleanly from project root
- Characterization tests still pass: All CHAR-* tests green, confirming no behavioral regressions in the affected area
- Test conventions documented: Future developers can find and follow the test patterns (in design-delta.md or equivalent)
- Runner script exists: Test command is in manifest (package.json scripts, Makefile, etc.) — not just a manual invocation
Deliverables
Greenfield:
- requirements.md: Full specification
- design.md: Full technical design
- tasks.md: Task list with traceability matrix
- Test suite: All passing
- Implementation code
Brownfield enhancement/refactor (tests exist):
- requirements.md: New/changed requirements only (or appended to existing)
- design-delta.md: What changed in the design
- tasks.md: Task list with traceability matrix for this iteration
- New/modified tests
- Implementation changes
- Change summary: Files modified, files created, behaviors added/changed
Brownfield enhancement/refactor (no tests):
- requirements.md: New/changed requirements only (or appended to existing)
- design-delta.md: What changed in the design, INCLUDING test infrastructure decisions (framework, conventions, directory structure)
- tasks.md: Task list with traceability matrix (includes CHAR-* mappings)
- Test infrastructure: config, runner script, directory structure
- Characterization tests for affected area
- New tests derived from spec
- Implementation changes
- Change summary: Files modified, files created, behaviors added/changed, test infrastructure established
Bugfix:
- bugfix.md: Bug analysis with Current/Expected/Unchanged
- Tests: reproduction, fix validation, regression
- Fix implementation
- Change summary
Failure Recovery
Test fails
|
+-> Is it a NEW test that fails?
|   +-> Code bug: fix implementation
|   +-> Test wrong: does it match spec?
|       +-> Yes: fix code (spec is authoritative)
|       +-> No: fix test (or revisit spec with user)
|
+-> Is it an EXISTING test that fails? (standard brownfield)
|   +-> Your new code caused a regression
|   +-> Fix your new code (existing tests are authoritative)
|   +-> Do NOT modify the existing test unless the spec
|       explicitly changes that behavior
|   +-> If the existing test seems wrong, confirm with user
|       before changing it
|
+-> Is it a CHARACTERIZATION test that fails? (no-tests brownfield)
|   +-> Your new code caused a behavioral regression
|   +-> Fix your new code (characterization tests document real behavior)
|   +-> Do NOT modify the characterization test unless the spec
|       explicitly changes that behavior (listed in Modified Behavior
|       with Was/Now)
|   +-> If the characterization test documents behavior the spec
|       INTENDS to change, update the test to match the new spec
|
+-> Test infrastructure won't set up? (no-tests bootstrap)
    +-> Check framework compatibility with project's Node/Python/etc. version
    +-> Check for conflicting config (e.g., module type mismatches)
    +-> Try the next framework in the recommendation table
    +-> If stuck after 2 frameworks, ask user for guidance

Never silently change the spec. Confirm with user first.
If stuck after 3 attempts, ask user for guidance.
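The test-failure branches of the decision tree can be read as a small lookup. The following sketch is one possible encoding: `FailureKind` and `triage` are hypothetical names, and the returned strings only paraphrase the tree. The infrastructure-setup branch is left out because it is a sequence of checks, not a decision.

```python
from enum import Enum


class FailureKind(Enum):
    NEW = "new test"
    EXISTING = "existing test"
    CHARACTERIZATION = "characterization test"


def triage(kind: FailureKind, matches_spec: bool = True,
           spec_changes_behavior: bool = False) -> str:
    """Map a failing test to the next action described in the tree."""
    if kind is FailureKind.NEW:
        # The spec is authoritative: a test that matches it stays as written.
        if matches_spec:
            return "fix implementation"
        return "fix test or revisit spec with user"
    # Existing and characterization tests are authoritative unless the spec
    # explicitly lists the behavior as changed.
    if spec_changes_behavior:
        return "update test to match spec"
    return "fix new code"


print(triage(FailureKind.NEW, matches_spec=False))
print(triage(FailureKind.EXISTING))
print(triage(FailureKind.CHARACTERIZATION, spec_changes_behavior=True))
```

In every branch the spec decides. Tests and code are adjusted to match it, and the spec itself changes only after confirming with the user.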
Reference Files
- Spec templates: See references/spec-template.md for all templates: Feature, Enhancement, Refactor, Bugfix, and Codebase Context
- Test plan guide: See references/test-plan.md for test derivation, existing test discovery, characterization tests, traceability, and Given/When/Then
- Edge case catalog: See references/edge-cases.md for edge case categories to check during specification