Test-Driven Development

Write test first. Commit it red. Write minimal code to pass. Commit green. Refactor.

Core principle: tests verify behavior through public interfaces, not implementation. Code can change entirely; tests shouldn't. A test that breaks when you rename an internal function — with no behavior change — was testing implementation. Delete it.

No watched failure = no proof the test tests the right thing. Commit history is the evidence; hooks check its structure.

See tests.md for examples and mocking.md for mocking guidelines.
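As a rough illustration of the core principle, the sketch below uses a hypothetical cart module (`LineItem`, `lineSubtotal`, and `cartTotal` are invented names, not part of this skill). The check only calls the public function, so renaming or inlining the internal helper would not break it.

```typescript
// Hypothetical module under test. Only `cartTotal` is treated as public.
interface LineItem {
  priceCents: number;
  quantity: number;
}

// Internal helper: free to be renamed, inlined, or replaced at any time.
function lineSubtotal(item: LineItem): number {
  return item.priceCents * item.quantity;
}

// Public interface: the only entry point a test should call.
function cartTotal(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + lineSubtotal(item), 0);
}

// Behavioral check: asserts on the observable result of the public function.
// It never references `lineSubtotal`, so an internal rename cannot break it.
function cartTotalSumsLineItems(): boolean {
  const items: LineItem[] = [
    { priceCents: 250, quantity: 2 },
    { priceCents: 100, quantity: 3 },
  ];
  return cartTotal(items) === 800;
}
```

A test that imported `lineSubtotal` directly would be the kind the text above says to delete.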

Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST COMMITTED FIRST.

Wrote code before the test? Delete it. Implement fresh from tests. Delete means delete. Exceptions (throwaway prototypes, generated code, config) need human sign-off. Thinking "skip TDD just this once"? That is rationalization.

1. Plan (before any code)

Use the project's domain glossary so test names and interface vocabulary match the codebase; respect ADRs in the area you touch.

List the behaviors to test, not implementation steps. Prioritize critical paths and complex logic — you can't test everything.
Design interfaces for testability; identify opportunities for deep modules (small interface, deep implementation).
Confirm the public interface and the priority behaviors with the user, then proceed.

2. Vertical slices, not horizontal

DO NOT write all tests, then all implementation. That is horizontal slicing and it produces crap tests: written in bulk, they test imagined behavior and the shape of things (signatures, data structures), become insensitive to real changes, and commit you to a test structure before you understand the code.

Work in vertical slices — one test → one implementation → repeat. Each test responds to what the last cycle taught you.

WRONG (horizontal):  RED: t1 t2 t3 t4   GREEN: i1 i2 i3 i4
RIGHT (vertical):    t1→i1  t2→i2  t3→i3 ...

The first slice is a tracer bullet: it proves the path works end to end.

3. Commit protocol

One behavior = one RED commit + one GREEN commit. Multiple cycles per branch is fine; prefer one cycle in flight in Go/Rust repos (see markers).

RED commit

Write one failing test. One behavior, clear name, real code — no mocks unless unavoidable.
Run it unmarked; watch it fail for the right reason (feature missing — not a typo or import error). Passes immediately? It tests existing behavior — fix the test.
Add the marker (below), suite green, commit. Tests only, prefix `test(red):`.

Language	Marker	Strict?
Python (pytest)	`@pytest.mark.xfail(strict=True)`	yes — XPASS fails suite
TS/JS (vitest)	`test.fails(...)` / `it.fails(...)`	yes
TS/JS (jest)	`it.failing(...)`	yes
Go	`//go:build red` + `TestRed` prefix + `red-tests` job	aggregate only
Rust	`#[ignore = "red"]` + `red_` prefix + `red-tests` job	aggregate only
Other	find a strict expected-failure mechanism; none exists → commit unmarked, tell the human the repo lacks red enforcement

Go/Rust build-tag/ignore markers exclude tests from the normal suite, so a `red-tests` CI job must run only the marked tests and expect failure (no-op when none exist). Its exit code is aggregate: two red tests in flight, one wrongly passing → the job still passes. Strict markers catch this per-test; the job doesn't.
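The difference between an aggregate exit code and a strict per-test marker can be modelled in a few lines. This is a simplified sketch, not the actual `red-tests` job: `RedTestResult`, `aggregateRedJobPasses`, and `unexpectedPasses` are hypothetical names, and the model assumes the job passes whenever the test runner reports at least one failure among the marked tests.

```typescript
type Outcome = "fail" | "pass";

interface RedTestResult {
  name: string;
  outcome: Outcome;
}

// Aggregate model: the job only sees the runner's overall exit code.
// It passes when at least one marked test failed, and is a no-op (pass)
// when no marked tests exist.
function aggregateRedJobPasses(results: RedTestResult[]): boolean {
  if (results.length === 0) return true;
  return results.some((r) => r.outcome === "fail");
}

// Strict per-test model: every marked test must fail on its own.
// Returns the names of marked tests that passed unexpectedly.
function unexpectedPasses(results: RedTestResult[]): string[] {
  return results.filter((r) => r.outcome === "pass").map((r) => r.name);
}

// Two red tests in flight, one wrongly passing.
const inFlight: RedTestResult[] = [
  { name: "TestRedRetries", outcome: "fail" },
  { name: "TestRedTimeout", outcome: "pass" },
];
```

With `inFlight`, the aggregate check still passes, while the strict check reports `TestRedTimeout`. That gap is one reason to prefer a single cycle in flight in Go/Rust repos.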

GREEN commit

Simplest code that passes. YAGNI — no speculative generality.
Remove this cycle's red markers. No other test changes in this commit.
Full suite green, output pristine. Fails? Fix the code, not the test. Prefix `feat:` / `fix:`.

REFACTOR (after green only)

Remove duplication, improve names, extract helpers, deepen modules. Tests stay green, no new behavior, separate commit. Never refactor while red. Then start the next cycle.

After all tests pass, look for refactor candidates:

Enforcement — and its limits

`lint-red.sh` (ships with skill; wire into prek + CI): `staged` and `commit` modes check that `test(red):` commits touch only test files and add a marker, and that other commits add none; `merge` mode rejects any markers in tree; if Go/Rust markers are present, a `red-tests` job must exist.

Hooks verify commit structure and marker hygiene. They do not verify you ran the unmarked test and watched it fail for the right reason — that stays on you. Strict markers partially compensate (XPASS catches tests of already-existing behavior). "Lint passed" ≠ "TDD verified."
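To make the structural checks concrete, here is a minimal sketch of the commit-level rule in TypeScript. It is not the contents of `lint-red.sh`, which is not shown here; `CommitInfo`, `isTestFile`, `checkCommit`, and the test-file naming pattern are all assumptions made for illustration.

```typescript
// Hypothetical summary of one commit, as a hook might extract it from git.
interface CommitInfo {
  message: string;
  changedFiles: string[];
  addsRedMarker: boolean; // true if the diff adds an expected-failure marker
}

// Assumed naming convention for test files; a real repo may differ.
function isTestFile(path: string): boolean {
  return /(\.test\.|\.spec\.|_test\.|^tests?\/|\/tests?\/)/.test(path);
}

// Returns a list of structural problems; an empty list means the commit is OK.
function checkCommit(commit: CommitInfo): string[] {
  const problems: string[] = [];
  const isRed = commit.message.startsWith("test(red):");

  if (isRed) {
    const nonTest = commit.changedFiles.filter((f) => !isTestFile(f));
    if (nonTest.length > 0) {
      problems.push(`red commit touches non-test files: ${nonTest.join(", ")}`);
    }
    if (!commit.addsRedMarker) {
      problems.push("red commit adds no marker");
    }
  } else if (commit.addsRedMarker) {
    problems.push("non-red commit adds a red marker");
  }
  return problems;
}
```

Nothing in this check can tell whether the unmarked test was run and seen to fail for the right reason, which is the limit described above.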

Good tests

Quality	Rule
Behavioral	Exercises a real path through the public API; survives refactors
Minimal	One thing. "and" in the name? Split it.
Clear	Name states the behavior, not `test1`
Honest	Tests the code, never the mock

// Good — tests real behavior (vitest; jest: it.failing)
test.fails('retries failed operations 3 times', async () => {
  let attempts = 0;
  const op = () => { attempts++; if (attempts < 3) throw new Error('fail'); return 'ok'; };
  expect(await retryOperation(op)).toBe('ok');
  expect(attempts).toBe(3);
});
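For the test above, a GREEN commit might look like the sketch below. The signature of `retryOperation` is inferred from how the test calls it and is otherwise an assumption. In keeping with YAGNI, there is no backoff, no options object, and no configurable attempt count, because no test asks for them yet.

```typescript
// Assumed from the test: three attempts in total.
const MAX_ATTEMPTS = 3;

// Calls `op` until it succeeds or MAX_ATTEMPTS is reached.
// Accepts sync or async operations, since the test passes a sync function.
async function retryOperation<T>(op: () => T | Promise<T>): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

The same commit would change `test.fails(...)` back to `test(...)` and touch no other test.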

Red flags — STOP, delete, restart

Unmarked test passes before implementation exists · can't explain why the test failed · weakening an assertion in GREEN to make it pass · testing the mock · "just this once" · "I'm being pragmatic, TDD is dogmatic."

Read testing-anti-patterns.md to avoid common pitfalls.

Rationalization table

Excuse	Reality
"Too simple to test"	Simple code breaks. The test takes 30s.
"I'll test after"	Tests-after are biased by the implementation: "what does this do?" not "what should this?"
"Deleting hours of code is wasteful"	Sunk cost. Unverified code is debt.
"Test is hard to write"	Hard to test = hard to use. Listen to it; simplify the interface.
"Must mock everything"	Too coupled. Inject dependencies.
"Lint passed, TDD done"	Hooks check structure, not that you watched the failure.
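The "Inject dependencies" row can be illustrated with a small sketch. `Clock` and `isExpired` are hypothetical names. The point is that code which receives its collaborator as a parameter can be tested with a plain function instead of a mocking framework.

```typescript
// A clock is just a function returning milliseconds since the epoch.
type Clock = () => number;

// The clock is injected, with the real one as the default.
// Production callers can omit it; tests pass a fixed clock.
function isExpired(expiresAtMs: number, now: Clock = Date.now): boolean {
  return now() >= expiresAtMs;
}

// In a test, a fixed clock replaces any need to patch global time.
const fixedClock: Clock = () => 1_000;
const expiredAtDeadline = isExpired(1_000, fixedClock);
const expiredBeforeDeadline = isExpired(1_001, fixedClock);
```

Here `expiredAtDeadline` is `true` and `expiredBeforeDeadline` is `false`, and the test still exercises the real `isExpired` rather than a mock of it.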

When stuck / debugging

Don't know how to test → write the wished-for API and assertion first. Found a bug → write a failing test reproducing it, then run the full protocol. Never fix a bug without a test.
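A bug fix under this protocol might look like the following sketch. The bug itself is invented for illustration: a hypothetical `parsePort` that rejected input with trailing whitespace. The reproduction goes into the RED commit first, and the fix follows in the GREEN commit.

```typescript
// Step 1 (RED commit): reproduce the bug through the public function.
// Before the fix, this returned false because "8080 " was rejected.
function reproducesWhitespaceBug(): boolean {
  try {
    return parsePort("8080 ") === 8080;
  } catch {
    return false;
  }
}

// Step 2 (GREEN commit): the smallest change that makes the reproduction pass.
function parsePort(input: string): number {
  const trimmed = input.trim(); // the fix: tolerate surrounding whitespace
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`invalid port: ${input}`);
  }
  return Number(trimmed);
}
```

The reproduction stays in the suite afterwards as a regression test.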
