Writing Good Tests
Philosophy
"Write tests. Not too many. Mostly integration." — Kent C. Dodds
Tests verify real behavior, not implementation details. The goal is confidence that your code works, not coverage numbers.
Core principles:
- Test behavior, not implementation — refactoring shouldn't break tests
- Integration tests provide better confidence-to-cost ratio than unit tests
- Wait for actual conditions, not arbitrary timeouts
- Mock strategically — real dependencies when feasible, mocks for external systems
- Don't pollute production code with test-only methods
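To make the first principle concrete, here is a minimal sketch. The `Cart` class and its methods are hypothetical, invented only for illustration. The check goes through the public API, so swapping the internal array for a `Map` would not break it.

```typescript
// Hypothetical example class; not part of any real codebase.
class Cart {
  private items: { sku: string; priceCents: number; qty: number }[] = [];

  add(sku: string, priceCents: number, qty = 1): void {
    const existing = this.items.find(i => i.sku === sku);
    if (existing) {
      existing.qty += qty;
    } else {
      this.items.push({ sku, priceCents, qty });
    }
  }

  totalCents(): number {
    return this.items.reduce((sum, i) => sum + i.priceCents * i.qty, 0);
  }
}

// Behavior-focused check: asserts on the observable total,
// not on how the cart stores its items internally.
function checkCartTotal(): boolean {
  const cart = new Cart();
  cart.add('book', 1200);
  cart.add('book', 1200);
  cart.add('pen', 150, 3);
  return cart.totalCents() === 2850;
}
```

A test that instead inspected the private `items` array would fail on a pure refactor even though the cart still behaves correctly.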
Test Structure
Use Arrange-Act-Assert (or Given-When-Then):
```typescript
test('user can cancel reservation', async () => {
  // Arrange
  const reservation = await createReservation({ userId: 'user-1', roomId: 'room-1' });

  // Act
  const result = await cancelReservation(reservation.id);

  // Assert
  expect(result.status).toBe('cancelled');
  expect(await getReservation(reservation.id)).toBeNull();
});
```
One action per test. Multiple assertions are fine if they verify the same behavior.
Condition-Based Waiting
Flaky tests often guess at timing. This creates race conditions where tests pass locally but fail in CI.
Wait for conditions, not time:
```typescript
// BAD: Guessing at timing
await new Promise(r => setTimeout(r, 50));
const result = getResult();

// GOOD: Waiting for condition
await waitFor(() => getResult() !== undefined, 'result to be available');
const result = getResult();
```
Generic Polling Function
```typescript
// Polls a synchronous condition until it returns a truthy value.
// Falsy results (undefined, null, false) mean "not ready yet".
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition();
    if (result) return result;
    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }
    await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
  }
}
```
Quick Patterns
| Scenario | Pattern |
|---|---|
| Wait for event | |
| Wait for state | |
| Wait for count | |
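As a rough illustration of the three scenarios in the table, the sketch below waits for a state, a count, and an event. It repeats a compact copy of the polling helper so it runs on its own. The `events`, `machine`, `items`, and `startWork` names are hypothetical stand-ins for a real system under test.

```typescript
// Compact copy of the polling helper so this sketch is self-contained.
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition();
    if (result) return result;
    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }
    await new Promise(r => setTimeout(r, 10));
  }
}

// Hypothetical system under test: it changes state asynchronously.
type AppEvent = { type: string };
const events: AppEvent[] = [];
const machine = { state: 'starting' };
const items: number[] = [];

function startWork(): void {
  setTimeout(() => { machine.state = 'ready'; }, 20);
  setTimeout(() => { items.push(1, 2, 3); }, 30);
  setTimeout(() => { events.push({ type: 'DONE' }); }, 40);
}

async function demo(): Promise<AppEvent> {
  startWork();
  // Wait for state
  await waitFor(() => machine.state === 'ready', 'machine to be ready');
  // Wait for count
  await waitFor(() => items.length >= 3, 'at least 3 items');
  // Wait for event: the matching event itself is returned
  return waitFor(() => events.find(e => e.type === 'DONE'), 'DONE event');
}
```

Because the helper returns the truthy value, the event case hands back the matching event without a second lookup.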
When Arbitrary Timeout IS Correct
Only when testing actual timing behavior (debounce, throttle, intervals):
```typescript
// Testing tool that ticks every 100ms
await waitForEvent(manager, 'TOOL_STARTED'); // First: wait for condition
await new Promise(r => setTimeout(r, 200)); // Then: wait for 2 ticks
// Comment explains WHY: 200ms = 2 ticks at 100ms intervals
```
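The same reasoning applies to debounce. In the rough sketch below, the `debounce` helper is hypothetical and written inline only to keep the example self-contained. The fixed wait is acceptable because the delay itself is the behavior under test, and the comment states where the number comes from.

```typescript
// Hypothetical debounce helper, included only so the sketch runs on its own.
function debounce(fn: () => void, delayMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(fn, delayMs);
  };
}

const sleep = (ms: number) => new Promise<void>(r => setTimeout(r, ms));

async function checkDebounce(): Promise<number> {
  let calls = 0;
  const debounced = debounce(() => { calls++; }, 50);

  // Three rapid calls inside one 50ms window should collapse into one.
  debounced();
  debounced();
  debounced();

  // Justified fixed wait: 150ms = 50ms debounce delay + 100ms margin.
  await sleep(150);
  return calls;
}
```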
Mocking Strategy
"You don't hate mocks; you hate side-effects." — J.B. Rainsberger
Mocks reveal where side-effects complicate your code. Use them strategically, not reflexively.
Don't Mock What You Don't Own
Create thin wrappers around third-party libraries. Mock YOUR wrapper, not the library.
```typescript
// BAD: Mock the HTTP client directly
const mockClient = vi.mocked(httpx.Client);

// GOOD: Create your own wrapper
class RegistryClient {
  constructor(private client: HttpClient) {}
  async getRepos() {
    return this.client.get('https://registry.example.com/v2/_catalog');
  }
}

// Mock your wrapper
vi.mock('./registry-client');
```
This simplifies tests AND improves your design.
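One way to get the same benefit without module-level mocking is to inject a hand-written stub of the wrapper. This is a minimal sketch under stated assumptions: `HttpClient`, `RepoSource`, `countRepos`, and the `repositories` field of the response are all hypothetical names chosen for illustration.

```typescript
// HttpClient stands in for whatever third-party client the project uses.
interface HttpClient {
  get(url: string): Promise<unknown>;
}

// Thin wrapper: the only place that knows the URL and response shape.
class RegistryClient {
  constructor(private client: HttpClient) {}

  async getRepos(): Promise<string[]> {
    const body = await this.client.get('https://registry.example.com/v2/_catalog');
    // Assumed response shape: { repositories: string[] }
    const repos = (body as { repositories?: unknown }).repositories;
    return Array.isArray(repos) ? repos.map(String) : [];
  }
}

// Application code depends on the wrapper's narrow interface.
type RepoSource = Pick<RegistryClient, 'getRepos'>;

async function countRepos(source: RepoSource): Promise<number> {
  return (await source.getRepos()).length;
}

// In a test, a stub of YOUR wrapper replaces the network entirely.
const stubSource: RepoSource = {
  getRepos: async () => ['alpine', 'nginx'],
};
```

The stub only has to satisfy one method you designed, which is far smaller than the surface of a full HTTP library.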
Managed vs Unmanaged Dependencies
| Dependency Type | Example | Strategy |
|---|---|---|
| Managed (you control it) | Your database, your file system | Use REAL instances |
| Unmanaged (external) | Third-party APIs, SMTP, message bus | Use MOCKS |
Communications with managed dependencies are implementation details — you can refactor them freely. Communications with unmanaged dependencies are observable behavior — mocking protects against external changes.
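The split can be sketched roughly as follows. Every name here (`UserStore`, `EmailGateway`, `registerUser`) is hypothetical. The `Map`-backed store stands in for a real database only to keep the sketch self-contained; in a real suite the managed side would be an actual database instance.

```typescript
// Unmanaged dependency: an external email service behind our own interface.
interface EmailGateway {
  send(to: string, subject: string): Promise<void>;
}

// Managed dependency: storage the application owns (Map used as a stand-in).
class UserStore {
  private users = new Map<string, { email: string }>();
  async save(id: string, email: string): Promise<void> {
    this.users.set(id, { email });
  }
  async find(id: string): Promise<{ email: string } | undefined> {
    return this.users.get(id);
  }
}

async function registerUser(
  store: UserStore,
  mail: EmailGateway,
  id: string,
  email: string
): Promise<void> {
  await store.save(id, email);
  await mail.send(email, 'Welcome');
}

// Recording mock for the unmanaged side only.
class RecordingEmailGateway implements EmailGateway {
  sent: { to: string; subject: string }[] = [];
  async send(to: string, subject: string): Promise<void> {
    this.sent.push({ to, subject });
  }
}

async function checkRegistration(): Promise<boolean> {
  const store = new UserStore();             // real (managed) dependency
  const mail = new RecordingEmailGateway();  // mock (unmanaged) dependency
  await registerUser(store, mail, 'user-1', 'alice@example.com');

  // Assert on final state in the managed store, and on the outgoing
  // message, which is behavior visible outside the system.
  const saved = await store.find('user-1');
  return saved?.email === 'alice@example.com'
    && mail.sent.length === 1
    && mail.sent[0]?.to === 'alice@example.com';
}
```

Note that the check never asserts how `registerUser` talked to the store, only what ended up in it.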
Anti-Pattern: Testing Mock Behavior
```typescript
// BAD: Testing that the mock exists
test('renders sidebar', () => {
  render(<Page />);
  expect(screen.getByTestId('sidebar-mock')).toBeInTheDocument();
});

// GOOD: Test real behavior
test('renders sidebar', () => {
  render(<Page />);
  expect(screen.getByRole('navigation')).toBeInTheDocument();
});
```
Gate: Before asserting on any mock element, ask: "Am I testing real behavior or mock existence?"
Anti-Pattern: Mocking Without Understanding
```typescript
// BAD: Mock breaks test logic
test('detects duplicate server', async () => {
  // Mock prevents config write that test depends on!
  vi.mock('ToolCatalog', () => ({
    discoverAndCacheTools: vi.fn().mockResolvedValue(undefined)
  }));
  await addServer(config);
  await addServer(config); // Should throw - but won't!
});

// GOOD: Mock at correct level
test('detects duplicate server', async () => {
  vi.mock('MCPServerManager'); // Just mock slow server startup
  await addServer(config); // Config written
  await addServer(config); // Duplicate detected
});
```
Gate: Before mocking, ask: "What side effects does this have? Does my test depend on them?"
Anti-Pattern: Incomplete Mocks
Mock the COMPLETE data structure as it exists in reality:
```typescript
// BAD: Partial mock
const mockResponse = {
  status: 'success',
  data: { userId: '123' }
  // Missing: metadata that downstream code uses
};

// GOOD: Mirror real API
const mockResponse = {
  status: 'success',
  data: { userId: '123', name: 'Alice' },
  metadata: { requestId: 'req-789', timestamp: 1234567890 }
};
```
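One way to keep mocks complete is to let the type checker enforce the shape. The sketch below assumes a hypothetical `ApiResponse` interface that mirrors the example response; `makeApiResponse` and `describeResponse` are likewise invented for illustration.

```typescript
// Hypothetical response shape, mirroring the example above.
interface ApiResponse {
  status: 'success' | 'error';
  data: { userId: string; name: string };
  metadata: { requestId: string; timestamp: number };
}

// Factory returning a complete, realistic response; tests override only
// the top-level fields they care about.
function makeApiResponse(overrides: Partial<ApiResponse> = {}): ApiResponse {
  return {
    status: 'success',
    data: { userId: '123', name: 'Alice' },
    metadata: { requestId: 'req-789', timestamp: 1234567890 },
    ...overrides,
  };
}

// Downstream code that would crash on a partial mock without metadata.
function describeResponse(res: ApiResponse): string {
  return `${res.data.name} (${res.metadata.requestId})`;
}
```

With the return type declared as `ApiResponse`, a factory that omitted `metadata` would fail to compile instead of failing silently at runtime.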
When Mocks Become Too Complex
Warning signs:
- Mock setup longer than test logic
- Mocking everything to make test pass
- Test breaks when mock changes
"As the number of mocks grows, the probability of testing the mock instead of the desired code goes up." — Codurance
Consider integration tests with real components — often simpler than elaborate mocks.
Anti-Pattern: Test-Only Methods in Production
```typescript
// BAD: destroy() only used in tests
class Session {
  async destroy() { /* cleanup */ }
}

// GOOD: Test utilities handle cleanup
// test-utils/session-helpers.ts
export async function cleanupSession(session: Session) {
  const workspace = session.getWorkspaceInfo();
  if (workspace) {
    await workspaceManager.destroyWorkspace(workspace.id);
  }
}
```
Gate: Before adding any method to a production class, ask: "Is this only used by tests?" If yes, put it in test utilities.
Test Isolation
Tests should not depend on execution order. But isolation doesn't mean cleaning up everything.
What to Clean Up
Long-lived resources MUST be cleaned up:
- Virtual machines, containers
- Kubernetes jobs, pods, deployments
- Cloud resources (instances, buckets)
- Background processes, daemons
Prefer product tools for cleanup when possible:
```typescript
afterAll(async () => {
  // Use the product's own cleanup mechanisms
  await deployment.delete();
  await job.terminate();
});
```
Side-channel cleanup when product tools aren't available:
```typescript
afterAll(async () => {
  // Direct cleanup when product doesn't provide it.
  // exec is assumed to be a promise-returning wrapper
  // (for example, a promisified child_process.exec).
  await exec('kubectl delete job test-job-123');
});
```
What's OK to Leave
Database artifacts are fine to leave around. Trying to clean up test data perfectly is a fool's errand and makes multi-step integration tests nearly impossible.
- Test records in databases
- Log entries
- Cached data that expires
The database should handle its own lifecycle. Tests that require pristine state should create unique identifiers, not depend on cleanup.
Preventing Order Dependencies
```typescript
// Use unique identifiers instead of depending on clean state
const testId = `test-${Date.now()}-${Math.random()}`;
const user = await createUser({ email: `${testId}@test.com` });
```
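If many tests need such identifiers, the pattern can be pulled into a small helper. This is a sketch, not a prescribed utility: `uniqueTestId` and `uniqueEmail` are hypothetical names, and the per-process counter is an added assumption that guards against two ids being generated in the same millisecond.

```typescript
// Per-process counter plus timestamp and random suffix, so ids differ
// even when generated within the same millisecond.
let testIdCounter = 0;

function uniqueTestId(prefix = 'test'): string {
  testIdCounter += 1;
  const random = Math.random().toString(36).slice(2, 10);
  return `${prefix}-${Date.now()}-${testIdCounter}-${random}`;
}

function uniqueEmail(): string {
  return `${uniqueTestId('user')}@test.com`;
}
```

Each test then creates its own records under a fresh id and never needs to assume the database is empty.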
Quick Reference
| Problem | Fix |
|---|---|
| Arbitrary setTimeout in tests | Use condition-based waiting |
| Assert on mock elements | Test real component or unmock |
| Mock third-party directly | Create wrapper, mock wrapper |
| Test-only methods in production | Move to test utilities |
| Mock without understanding | Understand dependencies first |
| Incomplete mocks | Mirror real API completely |
| Over-complex mocks | Consider integration tests |
| Long-lived resources left running | Clean up VMs, k8s jobs, cloud resources |
Red Flags
Stop and reconsider when you see:
- Arbitrary setTimeout/sleep without justification
- Assertions on mock elements or test IDs
- Methods only called in test files
- Mock setup is >50% of test code
- "Mocking just to be safe"
- Test depends on another test running first
- Long-lived resources not cleaned up
TDD Connection
TDD prevents most testing anti-patterns:
- Write test first → forces thinking about what you're testing
- Watch it fail → confirms test tests real behavior, not mocks
- Minimal implementation → no test-only methods creep in
- Real dependencies first → you see what test needs before mocking
Property-Based Testing
For certain patterns, property-based testing provides stronger coverage than example-based tests. See the property-based-testing skill for a complete reference.
When to Use PBT
| Pattern | Example | Why PBT |
|---|---|---|
| Serialization pairs | | Roundtrip property catches edge cases |
| Normalizers | | Idempotence property ensures stability |
| Validators | | Valid-after-normalize property |
| Pure functions | Business logic, calculations | Multiple properties verify contract |
| Sorting/ordering | | Ordering + idempotence properties |
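To show what the roundtrip and idempotence properties look like in code, here is a hand-rolled sketch. It is not a substitute for a real property-based testing library: `forAll` and `lcg` are minimal stand-ins written for this example, and `toHex`, `fromHex`, and `normalizeSpaces` are hypothetical functions under test.

```typescript
// Tiny deterministic PRNG (linear congruential) so failures are reproducible.
function lcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 4294967296; // in [0, 1)
  };
}

// Hand-rolled stand-in for a PBT library's runner.
function forAll<T>(
  gen: (rand: () => number) => T,
  property: (input: T) => boolean,
  runs = 200,
  seed = 42
): void {
  const rand = lcg(seed);
  for (let i = 0; i < runs; i++) {
    const input = gen(rand);
    if (!property(input)) {
      throw new Error(`Property failed for input: ${JSON.stringify(input)}`);
    }
  }
}

// Hypothetical serialization pair.
function toHex(bytes: number[]): string {
  return bytes.map(b => b.toString(16).padStart(2, '0')).join('');
}
function fromHex(hex: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < hex.length; i += 2) out.push(parseInt(hex.slice(i, i + 2), 16));
  return out;
}

// Hypothetical normalizer.
function normalizeSpaces(s: string): string {
  return s.trim().replace(/\s+/g, ' ');
}

const genBytes = (rand: () => number): number[] =>
  Array.from({ length: Math.floor(rand() * 20) }, () => Math.floor(rand() * 256));
const genText = (rand: () => number): string =>
  Array.from({ length: Math.floor(rand() * 20) }, () => ' \tab'.charAt(Math.floor(rand() * 4))).join('');

// Roundtrip: decoding an encoded value gives back the original.
function checkRoundtrip(): void {
  forAll(genBytes, xs => fromHex(toHex(xs)).join(',') === xs.join(','));
}

// Idempotence: normalizing twice equals normalizing once.
function checkIdempotence(): void {
  forAll(genText, s => normalizeSpaces(normalizeSpaces(s)) === normalizeSpaces(s));
}
```

A real library adds shrinking and richer generators; the point here is only the shape of the two properties.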
When NOT to Use PBT
- Simple CRUD without transformation
- UI/presentation logic
- Integration tests requiring external setup
- When specific examples suffice and edge cases are well-understood
- Prototyping with fluid requirements
PBT Quality Gates
Before committing property-based tests:
- Not tautological: Assertion doesn't compare same expression (sorted(xs) == sorted(xs) tests nothing)
- Strong property: Not just "no crash"; aim for roundtrip, idempotence, or invariants
- Not vacuous: assume() calls don't filter out most inputs
- Edge cases explicit: Include @example([]), @example([1]) decorators
- No reimplementation: Don't restate function logic in assertion (assert add(a,b) == a+b)
- Realistic constraints: Strategy matches real-world input constraints
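The gates above are phrased in terms of Python-style decorators. As a rough TypeScript analogue, the sketch below checks a hypothetical `sortNumbers` function with a strong property (ordered, same elements, idempotent) and runs an explicit edge-case list before the random inputs. The inline generator is a simple stand-in for a library strategy.

```typescript
// Hypothetical function under test.
function sortNumbers(xs: number[]): number[] {
  return [...xs].sort((a, b) => a - b);
}

function isOrdered(xs: number[]): boolean {
  for (let i = 1; i < xs.length; i++) {
    if (xs[i - 1] > xs[i]) return false;
  }
  return true;
}

// Compares element counts instead of re-sorting, to avoid
// restating the function's own logic in the assertion.
function sameMultiset(a: number[], b: number[]): boolean {
  if (a.length !== b.length) return false;
  const counts = new Map<number, number>();
  for (const x of a) counts.set(x, (counts.get(x) ?? 0) + 1);
  for (const x of b) {
    const c = counts.get(x);
    if (!c) return false;
    counts.set(x, c - 1);
  }
  return true;
}

// Tautological (tests nothing): sortNumbers(xs) equals sortNumbers(xs).
// Strong: output is ordered, keeps the same elements, and is idempotent.
function sortProperty(xs: number[]): boolean {
  const out = sortNumbers(xs);
  return isOrdered(out)
    && sameMultiset(out, xs)
    && sortNumbers(out).join(',') === out.join(',');
}

function runSortChecks(randomRuns = 100): number {
  // Explicit edge cases run first, like @example decorators.
  const edgeCases: number[][] = [[], [1], [2, 2], [3, 1, 2]];
  let state = 12345;
  const rand = () => {
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 4294967296;
  };
  const randomCases = Array.from({ length: randomRuns }, () =>
    Array.from({ length: Math.floor(rand() * 10) }, () => Math.floor(rand() * 101) - 50));
  const all = [...edgeCases, ...randomCases];
  for (const xs of all) {
    if (!sortProperty(xs)) throw new Error(`Property failed for ${JSON.stringify(xs)}`);
  }
  return all.length;
}
```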