spec-driven-tdd

Spec-Driven Test-Driven Development

Transform requests into verified implementations through a structured pipeline that adapts to the project context.

Workflow Router

Determine the workflow type before starting. This drives which phases apply and which templates to use.

Does a codebase already exist?
  |
  NO --> GREENFIELD (new project from scratch)
  |       Phases: Clarify -> Specify -> Test Plan -> TDD -> Verify
  |
  YES --> What type of change?
           |
           +-> Bug report / error / regression
           |     --> BUGFIX workflow
           |     Phases: Explore -> Bugfix Spec -> TDD -> Verify
           |
           +-> Add new capability not in codebase
           |     --> Does test infrastructure exist?
           |          |
           |          YES --> ENHANCEMENT workflow
           |          |     Phases: Explore -> Clarify -> Specify (delta) -> Test Plan -> TDD -> Verify
           |          |
           |          NO  --> ENHANCEMENT (NO TESTS) workflow
           |                Phases: Explore -> Clarify -> Specify (delta) -> Bootstrap Tests -> Test Plan -> TDD -> Verify
           |
           +-> Change structure or performance without adding capability
           |     --> Does test infrastructure exist?
           |          |
           |          YES --> REFACTOR workflow
           |          |     Phases: Explore -> Clarify -> Specify (delta) -> Test Plan -> TDD -> Verify
           |          |
           |          NO  --> REFACTOR (NO TESTS) workflow
           |                Phases: Explore -> Clarify -> Specify (delta) -> Bootstrap Tests -> Test Plan -> TDD -> Verify
           |
           +-> Simple, well-understood change (one-sentence diff)
                 --> DIRECT (skip this skill)
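
The router can be read as a small decision function. The sketch below is one possible encoding, assuming the change type has already been classified into four hypothetical labels (`bug`, `new_capability`, `restructure`, `trivial`). The phase lists are copied from the diagram; the function itself is illustrative and not part of the skill.

```python
from typing import List, Optional, Tuple

# Phase lists copied from the workflow router diagram.
PHASES = {
    "GREENFIELD": ["Clarify", "Specify", "Test Plan", "TDD", "Verify"],
    "BUGFIX": ["Explore", "Bugfix Spec", "TDD", "Verify"],
    "ENHANCEMENT": ["Explore", "Clarify", "Specify (delta)", "Test Plan", "TDD", "Verify"],
    "REFACTOR": ["Explore", "Clarify", "Specify (delta)", "Test Plan", "TDD", "Verify"],
}


def route(has_codebase: bool, change_type: Optional[str],
          has_test_infra: bool) -> Tuple[str, List[str]]:
    """Return (workflow name, ordered phases).

    change_type is a hypothetical label: 'bug', 'new_capability',
    'restructure', or 'trivial'. It is ignored for greenfield projects.
    """
    if not has_codebase:
        return "GREENFIELD", list(PHASES["GREENFIELD"])
    if change_type == "bug":
        return "BUGFIX", list(PHASES["BUGFIX"])
    if change_type == "trivial":
        return "DIRECT", []  # skip the skill entirely
    if change_type == "new_capability":
        base = "ENHANCEMENT"
    elif change_type == "restructure":
        base = "REFACTOR"
    else:
        raise ValueError(f"unknown change type: {change_type!r}")

    phases = list(PHASES[base])  # copy so the shared table is never mutated
    if has_test_infra:
        return base, phases
    # Without test infrastructure, Bootstrap Tests runs just before Test Plan.
    phases.insert(phases.index("Test Plan"), "Bootstrap Tests")
    return f"{base} (NO TESTS)", phases
```

Copying each phase list before inserting keeps `PHASES` unchanged between calls.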

Signal detection:

| Signal in request | Likely type |
|---|---|
| Error message, stack trace, "doesn't work", "broken" | Bugfix |
| "Add", "create", "new", "build", "implement" | Enhancement |
| "Improve", "enhance", "extend", "also support" | Enhancement |
| "Clean up", "modernize", "consolidate", "simplify" | Refactor |
| "Slow", "optimize", "performance" | Refactor (optimization) |
| No existing code, "start fresh", "new project" | Greenfield |
| "Add tests", "no tests", "untested" | Enhancement/Refactor (no tests) |

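Signal detection is a heuristic, and one request can carry several signals. A minimal keyword matcher might look like the sketch below. It assumes first-match-wins ordering, which is an assumption: more specific phrases such as "add tests" and "new project" are checked before the generic words they contain.

```python
import re
from typing import Optional

# (label, keywords) pairs from the signal table. Order matters: the first
# matching label wins, so specific phrases precede the generic words they contain.
SIGNALS = [
    ("Enhancement/Refactor (no tests)", ["add tests", "no tests", "untested"]),
    ("Bugfix", ["error", "stack trace", "doesn't work", "broken"]),
    ("Greenfield", ["start fresh", "new project"]),
    ("Refactor (optimization)", ["slow", "optimize", "performance"]),
    ("Refactor", ["clean up", "modernize", "consolidate", "simplify"]),
    ("Enhancement", ["add", "create", "new", "build", "implement",
                     "improve", "enhance", "extend", "also support"]),
]


def likely_type(request: str) -> Optional[str]:
    """Return the first matching workflow label, or None if no signal fires."""
    text = request.lower()
    for label, keywords in SIGNALS:
        for keyword in keywords:
            # Word boundaries keep "add" from matching "address".
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                return label
    return None
```

A `None` result means the request carries no clear signal, which is itself a reason to ask a clarifying question.
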
Test infrastructure detection:

Existing test infrastructure = ANY of these found:

Test files (`*test*`, `*spec*`, `__tests__/`)
Test runner config (`jest.config.*`, `vitest.config.*`, `pytest.ini`, `conftest.py`, `.mocharc.*`)
Test script in manifest (`"test"` script in package.json, test target in Makefile)
CI config that runs tests

No test infrastructure = NONE of the above found.
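
The detection rule can be approximated with a filesystem scan. This rough sketch uses only the Python standard library. It mirrors the patterns listed above, treats a `test:` line in a Makefile as a test target, and leaves CI detection out because CI layouts vary too much to guess. The list of skipped vendored directories is an assumption.

```python
import json
from pathlib import Path

CONFIG_GLOBS = ["jest.config.*", "vitest.config.*", "pytest.ini", "conftest.py", ".mocharc.*"]
SKIP_DIRS = {"node_modules", ".git", ".venv", "venv"}  # assumed vendored/tooling dirs


def has_test_infrastructure(root: str) -> bool:
    """True if ANY test-infrastructure signal is found under root (CI not checked)."""
    base = Path(root)

    # 1. Test runner config at the project root.
    if any(any(base.glob(pattern)) for pattern in CONFIG_GLOBS):
        return True

    # 2. A "test" script in package.json.
    manifest = base / "package.json"
    if manifest.is_file():
        try:
            scripts = json.loads(manifest.read_text()).get("scripts", {})
        except (json.JSONDecodeError, AttributeError):
            scripts = {}
        if "test" in scripts:
            return True

    # 3. A test target in a Makefile (crude check for a "test:" rule).
    makefile = base / "Makefile"
    if makefile.is_file():
        if any(line.startswith("test:") for line in makefile.read_text().splitlines()):
            return True

    # 4. Test files (*test*, *spec*) or __tests__/ directories outside vendored dirs.
    for path in base.rglob("*"):
        if SKIP_DIRS.intersection(path.relative_to(base).parts):
            continue
        if path.is_dir() and path.name == "__tests__":
            return True
        if path.is_file() and ("test" in path.name or "spec" in path.name):
            return True
    return False
```

Like the `*test*` glob it mirrors, the substring check is loose (it also matches names such as `latest.md`), so a positive result is a prompt to look closer, not proof.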

Phase 0: Explore Existing Codebase (Brownfield Only)

Skip this phase for greenfield projects.

Before asking the user anything or writing any spec, understand what already exists. Explore in read-only mode.

What to discover

Project structure: Directory layout, where source and tests live
Tech stack: Framework, language, versions, build system, package manager
Architectural patterns: MVC, layered, event-driven, microservices, etc.
Naming conventions: File naming, function naming, variable casing
Existing patterns: Find the closest analog to the requested change — how is similar functionality already implemented?
Data models: Key entities, relationships, schemas
Test infrastructure: Test framework, test runner command, fixture patterns, approximate coverage, where test files live
Existing spec artifacts: Check for prior requirements.md, design.md, tasks.md, CLAUDE.md, or similar documentation
Affected area: Which files and modules will this change touch?

How to explore

Read the project's CLAUDE.md or README first (if they exist)
Use glob/grep to find relevant source files and test files
Read 2-3 existing files in the affected area to absorb patterns
If prior spec files exist, read them to understand accumulated requirements

Test infrastructure assessment

Determine the test state by scanning for:

| Indicator | What it tells you |
|---|---|
| Test files exist in affected area | Tests cover the code you'll change |
| Test files exist elsewhere but not here | Project has tests, but not for this area |
| Test framework in dependencies | Framework chosen but maybe unused |
| Test runner config exists | Infrastructure is set up |
| `"test"` script in manifest | Runner command is established |
| CI config runs tests | Tests are part of the workflow |
| No test files, no config, no scripts | No test infrastructure at all |

Classify the result:

| Test State | Definition | Workflow Impact |
|---|---|---|
| Tests exist | Test files + runner + config all present | Standard brownfield |
| Partial infra | Framework in deps or config exists, but no/few test files | Bootstrap: write tests, skip framework selection |
| No tests | No test files, no config, no runner | Bootstrap: full test infrastructure setup |
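
The classification can be written down as a small function. In this sketch the threshold for "few" test files, and treating a `test` script as the "runner", are assumptions, since the table does not define them precisely.

```python
FEW_TEST_FILES = 3  # hypothetical cut-off: this many or fewer counts as "few"

IMPACT = {
    "tests exist": "Standard brownfield",
    "partial infra": "Bootstrap: write tests, skip framework selection",
    "no tests": "Bootstrap: full test infrastructure setup",
}


def classify_test_state(test_file_count: int, has_runner_config: bool,
                        framework_in_deps: bool, has_test_script: bool) -> str:
    """Map scan results onto the three test states from the table."""
    has_infra = has_runner_config or framework_in_deps
    if test_file_count > FEW_TEST_FILES and has_infra and has_test_script:
        return "tests exist"
    if has_infra or has_test_script or test_file_count > 0:
        return "partial infra"
    return "no tests"
```

Looking the result up in `IMPACT` gives the workflow consequence directly.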

Output: Codebase Context Summary

Produce a brief mental model (do not write a file unless the user requests it):

Stack: [framework, language, test runner]
Patterns: [architecture, naming, file organization]
Affected area: [files/modules this change touches]
Existing tests: [relevant test files, approximate count, runner command]
  OR: No test infrastructure found
Test state: [tests exist | partial infra | no tests]
Prior specs: [any existing requirements.md/design.md or none]
Closest analog: [existing feature most similar to requested change]

This context informs every subsequent phase.

Phase 1: Clarify Requirements

Evaluate the request's clarity before starting any work.

Greenfield

Ask about the blank slate: inputs/outputs, data formats, error handling, edge cases, performance constraints, tech stack preferences.

Brownfield (tests exist)

Use codebase context from Phase 0. Do NOT ask questions the codebase already answers. Focus questions on:

What should the NEW behavior be? (the codebase shows current behavior)
Where does new behavior differ from existing patterns?
What existing behavior must NOT change?
Are there constraints the existing architecture imposes?

Brownfield (no tests)

In addition to the standard brownfield questions, clarify:

Test framework preference: Does the user have a preferred test framework? If not, recommend one based on the stack (see Bootstrap Tests phase).
Test scope: Should tests cover ONLY the new/changed code, or also establish baseline coverage for existing code in the affected area?
Test location: Co-located with source (`src/foo.test.ts`) or separate directory (`tests/`)? Infer from project conventions if possible.

Ambiguity signals (if ANY present, ask before proceeding):

Multiple valid interpretations producing different implementations
Missing I/O specifications, pre/post conditions, or data formats
Unspecified error handling, edge cases, or boundary conditions
Unclear performance, security, or environmental constraints
Change touches shared/critical code paths

When to proceed without asking:

Request is well-constrained with one obvious implementation
Ambiguity is purely cosmetic (naming, formatting)
Clarification loop has reached 3 rounds (proceed with stated assumptions)

Interview approach:

Ask up to 3 focused questions per round
Each question should target a decision that changes the implementation
State the default assumption alongside each question
After each round, summarize resolved items and remaining unknowns
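
The interview limits (at most three questions per round, at most three rounds, and only questions that change the implementation) can be expressed as a filter. The `Question` type and its fields are hypothetical; this sketches the stopping rule, not a prescribed data model.

```python
from dataclasses import dataclass
from typing import List

MAX_ROUNDS = 3
MAX_QUESTIONS_PER_ROUND = 3


@dataclass
class Question:
    text: str
    default_assumption: str       # stated alongside the question
    changes_implementation: bool  # False for purely cosmetic ambiguity


def next_round(open_questions: List[Question], rounds_done: int) -> List[Question]:
    """Questions to ask now; an empty list means proceed on stated assumptions."""
    if rounds_done >= MAX_ROUNDS:
        return []
    decisive = [q for q in open_questions if q.changes_implementation]
    return decisive[:MAX_QUESTIONS_PER_ROUND]
```

Cosmetic questions never reach the user; their default assumptions simply stand.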

Phase 2: Write Specification

Use the templates in references/spec-template.md. The workflow type determines which template applies:

| Workflow | Spec Template | Key Difference |
|---|---|---|
| Greenfield | Feature (requirements.md + design.md) | Written from scratch |
| Enhancement | Enhancement (requirements.md + design-delta.md) | Extends existing specs, documents only what changes |
| Enhancement (no tests) | Enhancement (requirements.md + design-delta.md) | Same as above, plus test infrastructure decisions |
| Refactor | Refactor (refactor.md) | Documents behavior preservation + structural changes |
| Refactor (no tests) | Refactor (refactor.md) | Same as above, plus test infrastructure decisions |
| Bugfix | Bugfix (bugfix.md) | Current/Expected/Unchanged behavior |

EARS notation (all workflows)

| Pattern | Syntax | Use Case |
|---|---|---|
| Ubiquitous | THE SYSTEM SHALL [behavior] | Always-on requirements |
| Event-Driven | WHEN [event] THE SYSTEM SHALL [behavior] | Triggered by events |
| State-Driven | WHILE [state] THE SYSTEM SHALL [behavior] | During a state |
| Unwanted | IF [condition] THEN THE SYSTEM SHALL [behavior] | Exception handling |
| Complex | WHILE [state] WHEN [event] THE SYSTEM SHALL [behavior] | Combined |

Every requirement must be atomic, testable, and have a unique ID.
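
Because every EARS pattern is a fixed template, requirement text can be generated and checked mechanically. The helpers below are a hypothetical illustration (`ears` and `check_unique_ids` are not part of any EARS tooling). They render the five patterns from the table and reject duplicate IDs.

```python
from typing import List, Optional


def ears(req_id: str, behavior: str, *, event: Optional[str] = None,
         state: Optional[str] = None, condition: Optional[str] = None) -> str:
    """Render one requirement in EARS form.

    No keyword -> Ubiquitous; event -> Event-Driven; state -> State-Driven;
    state + event -> Complex; condition alone -> Unwanted.
    """
    if condition is not None:
        if event is not None or state is not None:
            raise ValueError("the Unwanted pattern takes only a condition")
        return f"{req_id}: IF {condition} THEN THE SYSTEM SHALL {behavior}"
    clauses = []
    if state is not None:
        clauses.append(f"WHILE {state}")
    if event is not None:
        clauses.append(f"WHEN {event}")
    clauses.append(f"THE SYSTEM SHALL {behavior}")
    return f"{req_id}: " + " ".join(clauses)


def check_unique_ids(requirements: List[str]) -> None:
    """Raise if two requirements share an ID (the text before the first colon)."""
    ids = [r.split(":", 1)[0] for r in requirements]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        raise ValueError(f"duplicate requirement IDs: {duplicates}")
```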

Requirement ID namespacing (brownfield)

If prior specs exist with REQ-001 through REQ-N, continue numbering from REQ-(N+1). If no prior specs exist, start fresh. Use prefixes to distinguish:

`REQ-E-nnn` for enhancement requirements
`REQ-R-nnn` for refactor requirements
`REQ-BUG-nnn` for bugfix requirements
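
One reading of the numbering rule is that each prefix keeps its own sequence, continuing from the highest number already used with that prefix. The sketch below assumes that reading, plus zero padding to three digits to match the `nnn` placeholder; both are assumptions rather than stated rules.

```python
import re
from typing import Iterable


def next_req_id(existing_ids: Iterable[str], prefix: str) -> str:
    """Next ID for prefix 'E', 'R', or 'BUG', e.g. 'REQ-E-004'."""
    pattern = re.compile(rf"REQ-{re.escape(prefix)}-(\d+)")
    numbers = [int(m.group(1)) for i in existing_ids if (m := pattern.fullmatch(i))]
    return f"REQ-{prefix}-{max(numbers, default=0) + 1:03d}"
```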

Unchanged Behavior section (critical for brownfield)

Every brownfield spec MUST include an Unchanged Behavior section:

- INV-001: WHEN [condition] THE SYSTEM SHALL CONTINUE TO [existing behavior]

This is the primary regression prevention mechanism.

When no tests exist, invariants are your ONLY regression protection. Be more thorough than usual: list every behavior in the affected area that must survive the change. Read the source code and its callers to identify these.

Greenfield: requirements.md + design.md

Write both from scratch per the Feature template. Full architecture, data models, API contracts, error handling, testing strategy.

Enhancement: requirements.md + design-delta.md

requirements.md: Add only NEW requirements. Reference existing spec if present. Include Unchanged Behavior invariants for every existing behavior the change could affect.

design-delta.md: Document only what changes:

Modified components (what exists → what changes)
New components (what's added)
Integration points (how new connects to existing)
Files to modify vs. files to create

Do NOT rewrite the entire architecture. Reference existing design.

No-tests addition to design-delta.md: Include a Test Infrastructure section:

```markdown
## Test Infrastructure (New)

- Framework: [chosen framework and rationale]
- Runner command: [command to run tests]
- Config file: [path to test config]
- Test location: [co-located | separate directory]
- Naming convention: [pattern, e.g., *.test.ts, test_*.py]
```

Bugfix: bugfix.md

Three-section format:

Current Behavior (defect): WHEN [x] THEN the system [incorrect behavior]
Expected Behavior (correct): WHEN [x] THEN the system SHALL [correct behavior]
Unchanged Behavior (regression): WHEN [y] THE SYSTEM SHALL CONTINUE TO [existing behavior]

Derive three tests: reproduce bug, validate fix, confirm no regressions.
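
To make the three derived tests concrete, here is a hypothetical defect: an `average_price` helper that raised `ZeroDivisionError` for an empty cart. Both the function and the pre-fix copy kept next to it are invented for illustration; in a real fix the reproduction test runs against the unfixed code and no copy is kept. The tests use plain `assert`, so they run under pytest or when called directly.

```python
from typing import List


def average_price_before_fix(prices: List[float]) -> float:
    # Hypothetical defective version, kept only to show the reproduction test.
    return sum(prices) / len(prices)  # raises ZeroDivisionError when prices == []


def average_price(prices: List[float]) -> float:
    # Fixed version: guard the empty case explicitly.
    if not prices:
        return 0.0
    return sum(prices) / len(prices)


def test_reproduces_bug_empty_cart_raises():
    # Current Behavior (defect): WHEN the cart is empty THEN the system raises.
    try:
        average_price_before_fix([])
    except ZeroDivisionError:
        return
    raise AssertionError("defect did not reproduce")


def test_fix_empty_cart_returns_zero():
    # Expected Behavior: WHEN the cart is empty THEN the system SHALL return 0.0.
    assert average_price([]) == 0.0


def test_unchanged_nonempty_cart_returns_mean():
    # Unchanged Behavior: WHEN the cart has items THE SYSTEM SHALL CONTINUE TO return the mean.
    assert average_price([2.0, 4.0]) == 3.0
```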

Phase 3: Bootstrap Test Infrastructure (No-Tests Workflows Only)

Skip this phase if test infrastructure already exists.

Before deriving a test plan, establish the test infrastructure. You cannot write tests without a framework, runner, and conventions.

Step 1: Select test framework

If the user specified a preference in Phase 1, use it. Otherwise, select based on the stack:

| Stack | Recommended Framework | Rationale |
|---|---|---|
| TypeScript / ESM | vitest | Native ESM, fast, Jest-compatible API |
| TypeScript / CJS | jest + ts-jest | Mature ecosystem, wide adoption |
| JavaScript / Node | jest or vitest | Either works; match the project's module system |
| JavaScript / Browser | vitest or jest + jsdom | DOM testing support |
| Python 3 | pytest | De facto standard, fixtures, parametrize |
| Python (Django) | pytest-django | Django integration with pytest |
| Go | testing (stdlib) | Built-in, no external dependency needed |
| Rust | cargo test (built-in) | Built-in test framework |
| Java / Spring | JUnit 5 + Mockito | Industry standard |
| Ruby / Rails | RSpec or Minitest | RSpec for BDD style; Minitest for a minimal setup |
| C# / .NET | xUnit or NUnit | xUnit for modern .NET |
| Elixir | ExUnit (built-in) | Built-in test framework |

Principle: Choose the most conventional option. The goal is not the "best" framework but the one the team (or a future developer) will expect.

Step 2: Install and configure

Add the framework as a dev dependency:

```shell
npm install --save-dev vitest          # Node/TS
pip install pytest                     # Python
# Go and Rust have built-in testing
```

Create a test config (if the framework needs one):
- `vitest.config.ts`, `jest.config.ts`, `pytest.ini`, etc.
- Match the project's existing config patterns (TypeScript for TS projects, YAML if the project uses YAML configs, etc.)

Add a test runner script or config to the project manifest. For Node projects, in `package.json`:

```json
{ "scripts": { "test": "vitest run" } }
```

For Python projects, in `pyproject.toml`:

```toml
[tool.pytest.ini_options]
testpaths = ["tests"]
```

Create the test directory (if using the separate-directory convention):
```shell
mkdir tests/          # or __tests__/ or test/
```

Verify that the framework works by writing a trivial smoke test:

```typescript
// tests/smoke.test.ts
import { describe, it, expect } from 'vitest'

describe('test infrastructure', () => {
  it('works', () => {
    expect(1 + 1).toBe(2)
  })
})
```

Run it. If it passes, the infrastructure is ready. Keep or delete the smoke test as appropriate.

Step 3: Establish test conventions

Derive conventions from the project's CODE conventions:

| Code Convention | Test Convention |
|---|---|
| Files in `src/module/foo.ts` | Tests in `src/module/foo.test.ts` (co-located) or `tests/module/foo.test.ts` (separate) |
| Functions use camelCase | Test names use camelCase: `it('createsUserWithEmail')` |
| Functions use snake_case | Test names use snake_case: `test_creates_user_with_email` |
| Modules organized by feature | Test directories mirror source directories |
| ES modules (import/export) | Tests use the same import style |
| CommonJS (require) | Tests use the same require style |

Document the conventions in design-delta.md (or design.md for greenfield) so future tests stay consistent.
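
The path and naming rows of the conventions table are mechanical enough to sketch. The helpers below are hypothetical and assume a JS/TS-style `.test` suffix and a top-level `src/` directory; a Python project would keep the `test_*.py` naming it already uses.

```python
import re
from pathlib import PurePosixPath


def derive_test_path(source: str, layout: str = "co-located") -> str:
    """Map a source file to its test file, e.g. src/module/foo.ts -> src/module/foo.test.ts."""
    src = PurePosixPath(source)
    test_name = f"{src.stem}.test{src.suffix}"
    if layout == "co-located":
        return str(src.with_name(test_name))
    if layout == "separate":
        # Mirror the source tree under tests/, dropping a leading src/.
        parts = src.parts[1:] if src.parts[:1] == ("src",) else src.parts
        return str(PurePosixPath("tests", *parts).with_name(test_name))
    raise ValueError(f"unknown layout: {layout!r}")


def derive_test_name(description: str, casing: str) -> str:
    """Turn 'creates user with email' into a test name in the project's casing."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    if casing == "snake_case":
        return "test_" + "_".join(words)
    if casing == "camelCase":
        return words[0] + "".join(w.capitalize() for w in words[1:])
    raise ValueError(f"unknown casing: {casing!r}")
```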

Step 4: Write characterization tests (critical)

Before changing any existing code, write characterization tests that capture the current behavior of the code you're about to modify. These tests do not describe the behavior you want; they document what the code ACTUALLY does, so you can detect when your changes break something.

See references/test-plan.md for the full characterization test method.

What to characterize

Focus on the affected area identified in Phase 0:

Public functions/methods you'll call or modify: test their current inputs → outputs
API endpoints you'll change: test their current request → response
Critical code paths through the module: test the happy path and major error paths
Integration points between the module and its callers: test that callers get what they expect

How many characterization tests

Enhancement: Characterize the specific functions/endpoints you'll modify or call. Not the entire codebase — just the affected surface.
Refactor: More extensive. Characterize ALL externally observable behavior of the code being refactored. This is your safety net.
Bugfix: Characterize the behavior around the bug. The reproduction test IS a characterization test (it documents the current broken behavior).

Characterization test naming

Use a distinct naming pattern so these are recognizable:

test_CHAR_[function]_[scenario]_[current_behavior]

Examples:

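```python
# Signatures only; each body asserts the behavior observed on the unchanged code.
def test_CHAR_create_user_with_valid_data_returns_user_object(): ...
def test_CHAR_create_user_with_duplicate_email_raises_conflict(): ...
def test_CHAR_get_user_nonexistent_returns_none(): ...
```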

Run characterization tests

All characterization tests MUST pass against the UNCHANGED code. If a characterization test fails, your test is wrong — fix the test, not the code.

These characterization tests become your regression suite. They replace the "existing tests" that the standard brownfield workflow relies on.
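
A sketch of the method on a hypothetical legacy helper, `format_username`: call the unchanged code, observe what it returns, and write that observed value into the assertion, even when the behavior looks odd. The function and its quirks are invented for illustration.

```python
from typing import Optional


def format_username(raw: Optional[str]) -> str:
    # Hypothetical existing code under characterization (not changed yet).
    if raw is None:
        return ""
    return raw.strip()[:20]


# Each expected value is what the unchanged code actually returns,
# not what it "should" return.
def test_CHAR_format_username_padded_input_returns_trimmed():
    assert format_username("  Alice ") == "Alice"


def test_CHAR_format_username_uppercase_input_keeps_case():
    # Surprising but real: no lowercasing happens, and callers may rely on it.
    assert format_username("BOB") == "BOB"


def test_CHAR_format_username_none_returns_empty_string():
    assert format_username(None) == ""


def test_CHAR_format_username_long_input_truncates_to_20_chars():
    assert format_username("x" * 25) == "x" * 20
```

If a later change makes the uppercase test fail, that failure is exactly the regression signal the characterization suite exists to raise.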

Deliverable: Test infrastructure ready

After this phase:

Framework installed and configured
Runner command works
Test conventions documented
Characterization tests pass against unchanged code
You can now proceed to Phase 4 (Test Plan) with confidence

Phase 4: Derive Test Plan

From the spec, derive tests and write tasks.md. See references/test-plan.md for templates.

Brownfield: Discover existing tests first

Before writing any new tests:

Find existing test files in the affected area (glob for `*test*`, `*spec*`)
Read relevant existing tests to understand patterns (framework, assertions, fixtures, naming conventions)
Check for existing coverage of the behaviors you're about to change
Match existing patterns in all new tests (same framework, same style, same file location conventions)

Brownfield (no tests): Use characterization tests as baseline

If you bootstrapped tests in Phase 3:

Your characterization tests ARE the existing tests. They serve the same role as pre-existing tests in the standard brownfield workflow.
Match the conventions you established in Phase 3 for all new tests.
Do not add more characterization tests at this point — Phase 3 covered the affected area. Focus on test derivation from the spec.
Map invariants to characterization tests: Each INV-* should correspond to at least one characterization test that already passes.

Test derivation rules

Each EARS requirement → at least one acceptance test (Given/When/Then)
Each data model/interface → unit tests for validation, transformation, edge cases
Each integration point → integration test using real services where possible
Each invariant → regression test confirming unchanged behavior
- If tests already exist: existing tests cover this
- If no tests existed: characterization tests from Phase 3 cover this
Identify properties (invariants for all inputs) → property-based tests

Traceability

Maintain a mapping in tasks.md:

| Req ID    | Test Case IDs     | Status      |
|-----------|-------------------|-------------|
| REQ-E-001 | TC-E-001, TC-E-002| Not Started |
| INV-001   | TC-REG-001        | Not Started |

Brownfield traceability rules:

Every NEW requirement must have >= 1 test
Every INVARIANT must have >= 1 regression test
Existing tests from prior iterations are NOT orphans — only flag tests from the current iteration that don't map to a requirement
The traceability matrix covers only the current iteration's scope

No-tests traceability addition:

Characterization tests (CHAR-*) map to INV-* invariants
Include them in the matrix with their CHAR prefix:

| Req ID    | Test Case IDs        | Status      |
|-----------|----------------------|-------------|
| REQ-E-001 | TC-E-001, TC-E-002   | Not Started |
| INV-001   | CHAR-create-user-001 | Passing (baseline) |
| INV-002   | CHAR-get-user-001    | Passing (baseline) |
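
The brownfield traceability rules can be checked mechanically. In this sketch the matrix is a plain dict from requirement or invariant ID to test IDs, and only tests written in the current iteration are passed in, so tests from prior iterations are never reported as orphans. The function name and data shapes are assumptions.

```python
from typing import Dict, List, Set


def check_traceability(matrix: Dict[str, List[str]], current_tests: Set[str]) -> List[str]:
    """Return a list of rule violations; an empty list means the matrix is complete."""
    problems = []
    # Every REQ-* needs >= 1 test; every INV-* needs >= 1 regression (or CHAR) test.
    for req_id, tests in matrix.items():
        if not tests:
            kind = "regression test" if req_id.startswith("INV-") else "test"
            problems.append(f"{req_id} has no {kind}")
    # Only tests from the current iteration can be orphans.
    mapped = {t for tests in matrix.values() for t in tests}
    for test_id in sorted(current_tests - mapped):
        problems.append(f"orphan test in current iteration: {test_id}")
    return problems
```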

tasks.md

Break implementation into discrete, sequenced tasks. Each task:

Maps to one or more requirements
Has clear acceptance criteria
Follows dependency order
Includes "Write tests" as the FIRST subtask (TDD)
Brownfield: Specifies which files are MODIFIED vs. CREATED

No-tests task ordering: For brownfield-no-tests workflows, tasks.md should include the bootstrap work as Task 0:

```markdown
### Task 0: Bootstrap test infrastructure

- Status: [ ] Not Started
- Requirements: (infrastructure; no REQ mapping)
- Subtasks:
  1. Install [framework], create config
  2. Add test runner script
  3. Write characterization tests for affected area
  4. Verify all characterization tests pass
- Acceptance: `[test command]` runs and all characterization tests pass
```

Principle: no big jumps in complexity.

Phase 5: TDD Implementation Loop

Execute tasks from tasks.md using strict TDD. This is NON-NEGOTIABLE.

For each task:

1. RED    - Write failing test(s) for the task's requirements
2. RUN    - Execute test, confirm it FAILS (if it passes, test is wrong)
3. GREEN  - Write MINIMAL code to make the test pass
4. RUN    - Execute ALL tests (new + existing), confirm ALL pass
5. REFACTOR - Clean up, ensure no test breakage
6. COMMIT - Mark task complete in tasks.md

Running tests (brownfield, tests exist)

Inner loop: Run only the new/affected tests during Red-Green iterations (for speed)
Task boundary: Run the FULL test suite after completing each task (for regression safety)
Final verification: Run full suite + linters + type checks at the end

Running tests (brownfield, no tests — after bootstrap)

Inner loop: Run new tests + characterization tests during Red-Green iterations. The characterization tests are your regression guardrail.
Task boundary: Run ALL tests (characterization + new) after each task.
Final verification: Run all tests + linters + type checks at the end.

Critical: If a characterization test fails during implementation, you broke existing behavior. This is the same signal as "existing test breaks" in the standard brownfield workflow. Fix your new code first.
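
For a Python project, the three test tiers might translate into commands like the ones below. This is a sketch: `pytest` as the runner, and `ruff` and `mypy` as the linter and type checker, are assumed tool choices, and the file paths in the test example are hypothetical.

```python
import subprocess
from typing import List, Sequence


def commands_for(stage: str, new_tests: Sequence[str] = (),
                 char_tests: Sequence[str] = ()) -> List[List[str]]:
    """Commands for one test tier: 'inner', 'task', or 'final'."""
    if stage == "inner":
        # Fast loop: new tests plus the characterization guardrail (if any).
        return [["pytest", *new_tests, *char_tests]]
    if stage == "task":
        # Task boundary: the full suite.
        return [["pytest"]]
    if stage == "final":
        # Final verification: full suite, then lint and type checks.
        return [["pytest"], ["ruff", "check", "."], ["mypy", "."]]
    raise ValueError(f"unknown stage: {stage!r}")


def run_all(commands: List[List[str]]) -> bool:
    """Run commands in order; all() stops at the first failing command."""
    return all(subprocess.run(cmd).returncode == 0 for cmd in commands)
```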

Rules

Never write implementation before its test.
Never alter the spec to satisfy a test. Spec-derived tests are authoritative.
Minimal code only. Add nothing beyond what makes the current test pass.
All tests green before moving to next task.
Use real dependencies where feasible. Mocks only for external services outside your control.
Decompose classes by method dependency. Generate in dependency order, test each method individually.
Bounded repair. 3 fix attempts max, then reassess or ask user.
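
The per-task loop and the bounded-repair rule can be sketched as a driver around three callbacks. The callbacks are hypothetical hooks (run the new tests, run the whole suite, apply one implementation attempt). The point of the sketch is the ordering: a test that passes before any code exists is rejected, and repair stops after three attempts.

```python
from typing import Callable

MAX_FIX_ATTEMPTS = 3


def tdd_task(run_new_tests: Callable[[], bool],
             run_all_tests: Callable[[], bool],
             implement: Callable[[int], None]) -> int:
    """Drive one task through RED/GREEN; return the attempt that went green."""
    # RED: the new test must fail before any implementation exists.
    if run_new_tests():
        raise RuntimeError("new test passed before implementation; the test is wrong")
    for attempt in range(1, MAX_FIX_ATTEMPTS + 1):
        implement(attempt)      # GREEN: minimal code for this attempt
        if run_all_tests():     # new + existing (or characterization) tests
            return attempt      # next: REFACTOR, then mark the task complete
    raise RuntimeError(f"still failing after {MAX_FIX_ATTEMPTS} attempts; reassess or ask the user")
```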

Brownfield-specific rules (all brownfield workflows)

Match existing patterns. New code must follow the conventions discovered in Phase 0 (naming, file structure, import style, error handling).
Refactor only what you wrote. Do NOT refactor existing code unless the task explicitly requires it. Existing code is assumed correct until proven otherwise.
Read before calling. Before calling existing functions, read their actual signatures. Do not assume existing interfaces — verify them.
If an existing test breaks, your new code caused a regression. Fix your new code first (existing passing tests are authoritative). Only modify an existing test if the spec explicitly changes that behavior.
If a characterization test breaks (no-tests workflow), the same rule applies: your new code caused a regression. Fix your new code. The characterization test documents real behavior that something depends on.

Hallucination prevention

Verify external APIs/libraries exist and check current interfaces
Chain-of-thought: reason step-by-step before coding
Run static analysis after generation
Use the execution traceback (not just re-reading the code) to fix failures
Brownfield: Read existing code before calling it; verify signatures
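
For Python dependencies, you can check cheaply that a library is installed and exposes a given attribute before code relies on it. The following is a minimal sketch using `importlib.util.find_spec`, `importlib.import_module`, and `hasattr`. `api_exists` is a hypothetical helper name. It confirms presence only, not call signatures:

```python
import importlib
import importlib.util


def api_exists(module_name: str, attr_path: str) -> bool:
    """Return True if module_name is installed and exposes attr_path.

    attr_path may be dotted, e.g. "path.join" on the os module.
    """
    if importlib.util.find_spec(module_name) is None:
        return False  # library not installed
    obj = importlib.import_module(module_name)
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False  # plausible-sounding but nonexistent API
        obj = getattr(obj, part)
    return True


print(api_exists("json", "dumps"))         # True
print(api_exists("json", "dump_pretty"))   # False: not a real json function
print(api_exists("no_such_pkg_xyz", "x"))  # False: not installed
```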

Phase 6: Verification

After all tasks complete, verify the full delivery.
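
The final check runs tests, linters, and type checks together and reports every result, not just the first failure. The following minimal sketch runs each command with `subprocess.run` and records exit codes. The commands in `FINAL_CHECKS` (pytest, ruff, mypy) are hypothetical placeholders. Substitute the project's own tools:

```python
import subprocess


def run_checks(commands: list[list[str]]) -> dict[str, int]:
    """Run every check command and record its exit code.

    All commands run even if an earlier one fails, so the report is complete.
    A missing tool is recorded as exit code 127 (failure) instead of crashing.
    """
    results: dict[str, int] = {}
    for cmd in commands:
        try:
            code = subprocess.run(cmd, capture_output=True, text=True).returncode
        except FileNotFoundError:
            code = 127
        results[" ".join(cmd)] = code
    return results


def all_passed(results: dict[str, int]) -> bool:
    """Delivery is verified only if every check exited with status 0."""
    return all(code == 0 for code in results.values())


# Hypothetical check list; run from the project root.
FINAL_CHECKS = [["pytest", "-q"], ["ruff", "check", "."], ["mypy", "."]]
```

Calling `all_passed(run_checks(FINAL_CHECKS))` from the project root returns a single pass/fail verdict. The per-command dictionary shows which check to look at.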

Checklist

Additional no-tests verification

Test infrastructure works: `[test command]` runs cleanly from project root
Characterization tests still pass: All CHAR-* tests green, confirming no behavioral regressions in the affected area
Test conventions documented: Future developers can find and follow the test patterns (in design-delta.md or equivalent)
Runner script exists: The test command is defined in the project manifest (package.json scripts, Makefile, etc.), not just run by manual invocation
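
For a Node.js project, you can check the runner-script item mechanically. The following minimal sketch assumes the manifest is `package.json`. Other ecosystems, such as a Makefile `test` target or `pyproject.toml`, would need their own check. It also rejects the placeholder script that `npm init` writes by default (`echo "Error: no test specified" && exit 1`). `has_test_script` is a hypothetical helper name:

```python
import json
from pathlib import Path


def has_test_script(project_root: str) -> bool:
    """Return True if package.json defines a real (non-placeholder) test script."""
    manifest = Path(project_root) / "package.json"
    if not manifest.is_file():
        return False
    scripts = json.loads(manifest.read_text(encoding="utf-8")).get("scripts", {})
    script = scripts.get("test", "").strip()
    # npm init's default script only prints an error and exits 1.
    return bool(script) and "no test specified" not in script
```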

Deliverables

Greenfield:

`requirements.md`: Full specification
`design.md`: Full technical design
`tasks.md`: Task list with traceability matrix
Test suite — All passing
Implementation code

Brownfield enhancement/refactor (tests exist):

`requirements.md`: New/changed requirements only (or appended to existing)
`design-delta.md`: What changed in the design
`tasks.md`: Task list with traceability matrix for this iteration
New/modified tests
Implementation changes
Change summary: Files modified, files created, behaviors added/changed

Brownfield enhancement/refactor (no tests):

`requirements.md`: New/changed requirements only (or appended to existing)
`design-delta.md`: What changed in the design, INCLUDING test infrastructure decisions (framework, conventions, directory structure)
`tasks.md`: Task list with traceability matrix (includes CHAR-* mappings)
Test infrastructure: config, runner script, directory structure
Characterization tests for affected area
New tests derived from spec
Implementation changes
Change summary: Files modified, files created, behaviors added/changed, test infrastructure established

Bugfix:

`bugfix.md`: Bug analysis with Current/Expected/Unchanged
Tests: reproduction, fix validation, regression
Fix implementation
Change summary
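
Whatever format `tasks.md` uses, the traceability matrix should answer two questions: does every requirement have a test, and does every test trace to a requirement? The following minimal sketch works on an in-memory mapping from test IDs to requirement IDs. The `REQ-*`/`T-*` IDs are hypothetical, and parsing the matrix out of `tasks.md` is left out:

```python
def check_traceability(requirements: set[str], matrix: dict[str, set[str]]) -> dict[str, set[str]]:
    """Report gaps in a requirement-to-test traceability matrix.

    matrix maps each test ID to the set of requirement IDs it verifies.
    """
    covered = set().union(*matrix.values())
    return {
        # Requirements no test verifies: write tests before implementing.
        "untested_requirements": requirements - covered,
        # IDs that tests reference but the spec lacks: typo or stale spec.
        "unknown_requirements": covered - requirements,
        # Tests that trace to nothing: map them or question why they exist.
        "untraced_tests": {test for test, reqs in matrix.items() if not reqs},
    }


# Hypothetical IDs for illustration.
report = check_traceability(
    {"REQ-1", "REQ-2", "REQ-3"},
    {"T-1": {"REQ-1"}, "T-2": {"REQ-1", "REQ-2"}, "T-3": set()},
)
print(report)
```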

Failure Recovery

Test fails
  |
  +-> Is it a NEW test that fails?
  |     +-> Code bug: fix implementation
  |     +-> Test wrong: does it match spec?
  |          +-> Yes: fix code (spec is authoritative)
  |          +-> No: fix test (or revisit spec with user)
  |
  +-> Is it an EXISTING test that fails? (standard brownfield)
  |     +-> Your new code caused a regression
  |     +-> Fix your new code (existing tests are authoritative)
  |     +-> Do NOT modify the existing test unless the spec
  |         explicitly changes that behavior
  |     +-> If the existing test seems wrong, confirm with user
  |         before changing it
  |
  +-> Is it a CHARACTERIZATION test that fails? (no-tests brownfield)
  |     +-> Your new code caused a behavioral regression
  |     +-> Fix your new code (characterization tests document real behavior)
  |     +-> Do NOT modify the characterization test unless the spec
  |         explicitly changes that behavior (listed in Modified Behavior
  |         with Was/Now)
  |     +-> If the characterization test documents behavior the spec
  |         INTENDS to change, update the test to match the new spec
  |
  +-> Test infrastructure won't set up? (no-tests bootstrap)
        +-> Check framework compatibility with project's Node/Python/etc. version
        +-> Check for conflicting config (e.g., module type mismatches)
        +-> Try the next framework in the recommendation table
        +-> If stuck after 2 frameworks, ask user for guidance
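
The three test-failure branches of the tree can be written as a small decision function, which makes the precedence explicit: the spec outranks new tests, and existing or characterization tests outrank new code. The following sketch covers only those three branches, not infrastructure setup. `FailingTest` and `triage` are hypothetical names:

```python
from enum import Enum


class FailingTest(Enum):
    NEW = "new"
    EXISTING = "existing"
    CHARACTERIZATION = "characterization"


def triage(kind: FailingTest, *, test_matches_spec: bool = True,
           spec_changes_behavior: bool = False,
           test_seems_wrong: bool = False) -> str:
    """Return the recovery action for a failing test (infrastructure excluded)."""
    if kind is FailingTest.NEW:
        # The spec is authoritative: a spec-conforming test means the code is wrong.
        if test_matches_spec:
            return "fix code"
        return "fix test (or revisit spec with user)"
    # Existing and characterization tests document behavior something relies on.
    if spec_changes_behavior:
        # Only when the spec explicitly lists the behavior as changed (Was/Now).
        return "update test to match new spec"
    if kind is FailingTest.EXISTING and test_seems_wrong:
        return "confirm with user before changing the test"
    return "fix new code"
```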

Never silently change the spec. Confirm with the user first. If still stuck after 3 fix attempts, ask the user for guidance.
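
The three-attempt limit is a bounded loop that escalates instead of retrying forever. The following minimal sketch uses hypothetical callbacks: `run_tests` stands in for running the suite, and `attempt_fix` stands in for applying one fix based on the traceback:

```python
from typing import Callable


def bounded_repair(run_tests: Callable[[], bool],
                   attempt_fix: Callable[[int], None],
                   max_attempts: int = 3) -> str:
    """Apply up to max_attempts fixes, re-running the tests after each one.

    run_tests returns True when the suite is green; attempt_fix receives
    the attempt number (1-based) so each fix can be logged.
    """
    if run_tests():
        return "green"
    for attempt in range(1, max_attempts + 1):
        attempt_fix(attempt)
        if run_tests():
            return f"green after {attempt} fix attempt(s)"
    # Budget exhausted: stop and escalate rather than keep guessing.
    return "stuck: reassess or ask the user for guidance"
```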

Reference Files

Spec templates: See references/spec-template.md for all templates: Feature, Enhancement, Refactor, Bugfix, and Codebase Context
Test plan guide: See references/test-plan.md for test derivation, existing test discovery, characterization tests, traceability, and Given/When/Then
Edge case catalog: See references/edge-cases.md for edge case categories to check during specification