writing-good-tests

Writing Good Tests

Philosophy

"Write tests. Not too many. Mostly integration." — Kent C. Dodds

Tests verify real behavior, not implementation details. The goal is confidence that your code works, not coverage numbers.
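A small sketch of the distinction, using a hypothetical `Cart` class invented for this illustration. The first check reaches into private storage and would break if the array were swapped for a `Map`; the second relies only on what callers can observe.

```typescript
// Hypothetical example class; all names here are illustrative assumptions.
class Cart {
  private items: { sku: string; price: number; qty: number }[] = [];

  add(sku: string, price: number, qty = 1): void {
    const existing = this.items.find(i => i.sku === sku);
    if (existing) existing.qty += qty;
    else this.items.push({ sku, price, qty });
  }

  total(): number {
    return this.items.reduce((sum, i) => sum + i.price * i.qty, 0);
  }
}

const cart = new Cart();
cart.add('apple', 2, 3);
cart.add('apple', 2, 1);
cart.add('pear', 5);

// Implementation-coupled: inspects private storage, so a refactor breaks it.
const internalLength = (cart as any).items.length;

// Behavior-focused: asserts only on the public contract.
const observedTotal = cart.total();
```

Only the second value would make a durable assertion; the first is shown to make the coupling visible.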

Core principles:

Test behavior, not implementation — refactoring shouldn't break tests
Integration tests provide a better confidence-to-cost ratio than unit tests
Wait for actual conditions, not arbitrary timeouts
Mock strategically — real dependencies when feasible, mocks for external systems
Don't pollute production code with test-only methods

Test Structure

Use Arrange-Act-Assert (or Given-When-Then):

typescript

test('user can cancel reservation', async () => {
  // Arrange
  const reservation = await createReservation({ userId: 'user-1', roomId: 'room-1' });

  // Act
  const result = await cancelReservation(reservation.id);

  // Assert
  expect(result.status).toBe('cancelled');
  expect(await getReservation(reservation.id)).toBeNull();
});

One action per test. Multiple assertions are fine if they verify the same behavior.

Condition-Based Waiting

Flaky tests often guess at timing. This creates race conditions where tests pass locally but fail in CI.

Wait for conditions, not time:

typescript

// BAD: Guessing at timing
await new Promise(r => setTimeout(r, 50));
const result = getResult();

// GOOD: Waiting for condition
await waitFor(() => getResult() !== undefined, 'result to be available');
const result = getResult();

Generic Polling Function

typescript

async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();

  while (true) {
    const result = condition();
    if (result) return result;

    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }

    await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
  }
}
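
The helper can be exercised end to end against a simulated background job. The `startJob` function and its 30 ms delay are made up for this demonstration, and `waitFor` is repeated in compact form only so the sketch runs on its own.

```typescript
// Compact copy of the helper above so this sketch is self-contained.
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition();
    if (result) return result;
    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }
    await new Promise(r => setTimeout(r, 10));
  }
}

type JobEvent = { type: 'STARTED' | 'DONE'; payload?: string };

// Hypothetical job that emits DONE after a short delay the test should not guess at.
function startJob(events: JobEvent[]): void {
  events.push({ type: 'STARTED' });
  setTimeout(() => events.push({ type: 'DONE', payload: 'ok' }), 30);
}

async function demo(): Promise<{ payload?: string; timeoutMessage: string }> {
  const events: JobEvent[] = [];
  startJob(events);

  // Resolves as soon as the DONE event appears, however long that takes.
  const done = await waitFor(() => events.find(e => e.type === 'DONE'), 'DONE event');

  // A condition that never holds fails with a descriptive message.
  let timeoutMessage = '';
  try {
    await waitFor(() => events.length > 10, 'more than 10 events', 50);
  } catch (err) {
    timeoutMessage = (err as Error).message;
  }
  return { payload: done.payload, timeoutMessage };
}
```

The `description` argument is what makes a CI failure readable: the error names the condition that never held rather than reporting a bare timeout.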

Quick Patterns

Scenario	Pattern
Wait for event	`waitFor(() => events.find(e => e.type === 'DONE'), 'DONE event')`
Wait for state	`waitFor(() => machine.state === 'ready', 'machine to be ready')`
Wait for count	`waitFor(() => items.length >= 5, 'at least 5 items')`

When Arbitrary Timeout IS Correct

Only when testing actual timing behavior (debounce, throttle, intervals):

typescript

// Testing tool that ticks every 100ms
await waitForEvent(manager, 'TOOL_STARTED'); // First: wait for condition
await new Promise(r => setTimeout(r, 200));   // Then: wait for 2 ticks
// Comment explains WHY: 200ms = 2 ticks at 100ms intervals
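
Debounce is another case where the fixed wait is the behavior under test rather than a guess. The `debounce` function below is a minimal illustrative implementation (an assumption, not taken from any library), and the 40 ms window is arbitrary.

```typescript
// Minimal illustrative debounce; not taken from any library.
function debounce(fn: () => void, delayMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(fn, delayMs);
  };
}

async function debounceFiresOnce(): Promise<{ before: number; after: number }> {
  let calls = 0;
  const save = debounce(() => { calls += 1; }, 40);

  // Three rapid calls fall inside a single 40ms debounce window.
  save();
  save();
  save();
  const before = calls; // nothing has fired yet

  // Justified fixed wait: 120ms = 3x the 40ms window, leaving slack for slow CI.
  await new Promise(r => setTimeout(r, 120));
  return { before, after: calls };
}
```

The comment on the wait carries the justification: it states which timing constant the number derives from, so a reader can tell it is not arbitrary.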

Mocking Strategy

"You don't hate mocks; you hate side-effects." — J.B. Rainsberger

Mocks reveal where side-effects complicate your code. Use them strategically, not reflexively.

Don't Mock What You Don't Own

Create thin wrappers around third-party libraries. Mock YOUR wrapper, not the library.

typescript

// BAD: Mock the third-party HTTP client directly
// (axios is used here only as a stand-in for whichever HTTP library you depend on)
vi.mock('axios');

// GOOD: Create your own wrapper
class RegistryClient {
  constructor(private client: HttpClient) {}
  async getRepos() {
    return this.client.get('https://registry.example.com/v2/_catalog');
  }
}

// Mock your wrapper
vi.mock('./registry-client');

This simplifies tests AND improves your design.
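
One way this can look without any mocking library is to depend on a small interface for the wrapper and hand the code under test a fake. Everything here (`RepoSource`, `FakeRegistry`, `countPublicRepos`, the `internal/` prefix convention) is a hypothetical illustration, not part of the original example.

```typescript
// The narrow interface your own code depends on (hypothetical).
interface RepoSource {
  getRepos(): Promise<string[]>;
}

// In production a wrapper such as RegistryClient would implement this; tests use a fake.
class FakeRegistry implements RepoSource {
  constructor(private repos: string[]) {}
  async getRepos(): Promise<string[]> {
    return this.repos;
  }
}

// Code under test: knows nothing about HTTP, only about RepoSource.
async function countPublicRepos(source: RepoSource): Promise<number> {
  const repos = await source.getRepos();
  return repos.filter(repoName => !repoName.startsWith('internal/')).length;
}

const publicCount = countPublicRepos(
  new FakeRegistry(['internal/billing', 'web', 'docs'])
);
```

Because the fake implements an interface you own, a change in the third-party client's API only affects the wrapper, not every test.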

Managed vs Unmanaged Dependencies

Dependency Type	Example	Strategy
Managed (you control it)	Your database, your file system	Use REAL instances
Unmanaged (external)	Third-party APIs, SMTP, message bus	Use MOCKS

Communications with managed dependencies are implementation details — you can refactor them freely. Communications with unmanaged dependencies are observable behavior — mocking protects against external changes.
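
A sketch of how the two strategies can combine in one test. The managed store is a real instance (in-memory here, to keep the sketch runnable) whose calls are never asserted on. The unmanaged mail gateway is replaced by a recording fake, and its outgoing message is asserted as observable behavior. All names (`UserStore`, `MailGateway`, `registerUser`) are assumptions made for illustration.

```typescript
// Managed dependency: stands in for your own database; tests use the real thing.
class UserStore {
  private users = new Map<string, { email: string }>();
  save(id: string, email: string): void { this.users.set(id, { email }); }
  find(id: string): { email: string } | undefined { return this.users.get(id); }
}

// Unmanaged dependency: an external mail service, replaced by a fake in tests.
interface MailGateway {
  send(to: string, subject: string): void;
}

class RecordingMailGateway implements MailGateway {
  sent: { to: string; subject: string }[] = [];
  send(to: string, subject: string): void { this.sent.push({ to, subject }); }
}

function registerUser(store: UserStore, mail: MailGateway, id: string, email: string): void {
  store.save(id, email);
  mail.send(email, 'Welcome');
}

const store = new UserStore();
const mail = new RecordingMailGateway();
registerUser(store, mail, 'user-1', 'alice@example.com');

// Managed side: check the resulting state, not how the store was called.
const savedEmail = store.find('user-1')?.email;
// Unmanaged side: the outgoing message is part of the observable contract.
const outgoing = mail.sent;
```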

Anti-Pattern: Testing Mock Behavior

typescript

// BAD: Testing that the mock exists
test('renders sidebar', () => {
  render(<Page />);
  expect(screen.getByTestId('sidebar-mock')).toBeInTheDocument();
});

// GOOD: Test real behavior
test('renders sidebar', () => {
  render(<Page />);
  expect(screen.getByRole('navigation')).toBeInTheDocument();
});

Gate: Before asserting on any mock element, ask: "Am I testing real behavior or mock existence?"

Anti-Pattern: Mocking Without Understanding

typescript

// BAD: Mock breaks test logic
test('detects duplicate server', async () => {
  // Mock prevents config write that test depends on!
  vi.mock('ToolCatalog', () => ({
    discoverAndCacheTools: vi.fn().mockResolvedValue(undefined)
  }));
  await addServer(config);
  await addServer(config);  // Should throw - but won't!
});

// GOOD: Mock at correct level
test('detects duplicate server', async () => {
  vi.mock('MCPServerManager'); // Just mock slow server startup
  await addServer(config);  // Config written
  await addServer(config);  // Duplicate detected
});

Gate: Before mocking, ask: "What side effects does this have? Does my test depend on them?"
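
The failure mode is easier to see in a stripped-down model. In this hypothetical sketch, `discoverTools` writes the server config as a side effect, and `addServer` relies on that write to detect duplicates. Stubbing `discoverTools` silently removes the behavior under test. None of these names come from a real API.

```typescript
type Deps = {
  // Side effect: records the server in the config, then returns its tools.
  discoverTools: (serverName: string, config: Set<string>) => string[];
};

const realDeps: Deps = {
  discoverTools: (serverName, config) => {
    config.add(serverName); // the write that duplicate detection depends on
    return ['tool-a'];
  },
};

// Over-eager stub: returns plausible data but drops the config write.
const stubbedDeps: Deps = {
  discoverTools: () => [],
};

function addServer(serverName: string, config: Set<string>, deps: Deps): void {
  if (config.has(serverName)) throw new Error(`duplicate server: ${serverName}`);
  deps.discoverTools(serverName, config);
}

// Adds the same server twice and reports whether the second call threw.
function secondAddThrows(deps: Deps): boolean {
  const config = new Set<string>();
  addServer('alpha', config, deps);
  try {
    addServer('alpha', config, deps);
    return false; // duplicate went unnoticed
  } catch {
    return true;
  }
}

const withRealDeps = secondAddThrows(realDeps);       // duplicate detected
const withStubbedDeps = secondAddThrows(stubbedDeps); // duplicate missed
```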

Anti-Pattern: Incomplete Mocks

Mock the COMPLETE data structure as it exists in reality:

typescript

// BAD: Partial mock
const mockResponse = {
  status: 'success',
  data: { userId: '123' }
  // Missing: metadata that downstream code uses
};

// GOOD: Mirror real API
const mockResponse = {
  status: 'success',
  data: { userId: '123', name: 'Alice' },
  metadata: { requestId: 'req-789', timestamp: 1234567890 }
};
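
One way to keep fixtures complete is to type them against the real response shape and build them through a factory, so the compiler flags a missing field. The `ApiResponse` shape mirrors the example above; the `makeResponse` and `describeResponse` helpers are hypothetical additions.

```typescript
interface ApiResponse {
  status: 'success' | 'error';
  data: { userId: string; name: string };
  metadata: { requestId: string; timestamp: number };
}

// Complete default fixture; tests override only the fields they care about.
function makeResponse(overrides: Partial<ApiResponse> = {}): ApiResponse {
  return {
    status: 'success',
    data: { userId: '123', name: 'Alice' },
    metadata: { requestId: 'req-789', timestamp: 1234567890 },
    ...overrides,
  };
}

// Downstream code that would crash on a partial mock lacking `metadata`.
function describeResponse(res: ApiResponse): string {
  return `${res.data.userId}@${res.metadata.requestId}`;
}

const summary = describeResponse(makeResponse({ status: 'error' }));
```

Note that the override is shallow: replacing `data` or `metadata` requires supplying the whole nested object, which the type also enforces.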

When Mocks Become Too Complex

Warning signs:

Mock setup longer than test logic
Mocking everything to make test pass
Test breaks when mock changes

"As the number of mocks grows, the probability of testing the mock instead of the desired code goes up." — Codurance

Consider integration tests with real components — often simpler than elaborate mocks.

Anti-Pattern: Test-Only Methods in Production

typescript

// BAD: destroy() only used in tests
class Session {
  async destroy() { /* cleanup */ }
}

// GOOD: Test utilities handle cleanup
// test-utils/session-helpers.ts
export async function cleanupSession(session: Session) {
  const workspace = session.getWorkspaceInfo();
  if (workspace) {
    await workspaceManager.destroyWorkspace(workspace.id);
  }
}

Gate: Before adding any method to a production class, ask: "Is this only used by tests?" If yes, put it in test utilities.

Test Isolation

Tests should not depend on execution order. But isolation doesn't mean cleaning up everything.

What to Clean Up

Long-lived resources MUST be cleaned up:

Virtual machines, containers
Kubernetes jobs, pods, deployments
Cloud resources (instances, buckets)
Background processes, daemons

Prefer product tools for cleanup when possible:

typescript

afterAll(async () => {
  // Use the product's own cleanup mechanisms
  await deployment.delete();
  await job.terminate();
});

Side-channel cleanup when product tools aren't available:

typescript

afterAll(async () => {
  // Direct cleanup when product doesn't provide it.
  // Assumes exec returns a promise, e.g. promisify(exec) from node:child_process.
  await exec('kubectl delete job test-job-123');
});
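
When a test creates several long-lived resources, one option is to register a cleanup as each resource is created and run them all in reverse order, continuing past individual failures so one stuck resource does not leak the rest. The `CleanupStack` class is a hypothetical helper, and the callbacks stand in for real delete calls.

```typescript
type Cleanup = () => Promise<void>;

class CleanupStack {
  private cleanups: { label: string; run: Cleanup }[] = [];

  register(label: string, run: Cleanup): void {
    this.cleanups.push({ label, run });
  }

  // Runs cleanups newest-first; collects failures instead of stopping at the first.
  async runAll(): Promise<string[]> {
    const failures: string[] = [];
    while (this.cleanups.length > 0) {
      const { label, run } = this.cleanups.pop()!;
      try {
        await run();
      } catch {
        failures.push(label);
      }
    }
    return failures;
  }
}

async function cleanupDemo(): Promise<{ order: string[]; failures: string[] }> {
  const order: string[] = [];
  const stack = new CleanupStack();
  stack.register('deployment', async () => { order.push('deployment'); });
  stack.register('job', async () => { throw new Error('job already gone'); });
  stack.register('pod', async () => { order.push('pod'); });

  // In a real suite this call would live in afterAll.
  const failures = await stack.runAll();
  return { order, failures };
}
```

Returning the failed labels lets the suite report leaked resources by name instead of hiding them behind the first thrown error.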

What's OK to Leave

Database artifacts are fine to leave around. Trying to clean up test data perfectly is a fool's errand and makes multi-step integration tests nearly impossible.

Test records in databases
Log entries
Cached data that expires

The database should handle its own lifecycle. Tests that require pristine state should create unique identifiers, not depend on cleanup.

Preventing Order Dependencies

typescript

// Use unique identifiers instead of depending on clean state
const testId = `test-${Date.now()}-${Math.random()}`;
const user = await createUser({ email: `${testId}@test.com` });
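
If many identifiers are generated within the same millisecond, a per-process counter can be appended as an extra guard. The `uniqueTestId` helper below is a hypothetical variation on the snippet above, not a required pattern.

```typescript
let testIdCounter = 0;

// Combines time, a process-local counter, and randomness into one identifier.
function uniqueTestId(prefix = 'test'): string {
  testIdCounter += 1;
  const random = Math.random().toString(36).slice(2, 8);
  return `${prefix}-${Date.now()}-${testIdCounter}-${random}`;
}

// Identifiers generated back to back stay distinct because the counter always advances.
const generatedIds = Array.from({ length: 1000 }, () => uniqueTestId());
const distinctCount = new Set(generatedIds).size;
const sampleEmail = `${uniqueTestId()}@test.com`;
```

The counter only guarantees uniqueness within one process; the timestamp and random suffix are what keep parallel workers from colliding.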

Quick Reference

Problem	Fix
Arbitrary setTimeout in tests	Use condition-based waiting
Assert on mock elements	Test real component or unmock
Mock third-party directly	Create wrapper, mock wrapper
Test-only methods in production	Move to test utilities
Mock without understanding	Understand dependencies first
Incomplete mocks	Mirror real API completely
Over-complex mocks	Consider integration tests
Long-lived resources left running	Clean up VMs, k8s jobs, cloud resources

Red Flags

Stop and reconsider when you see:

Arbitrary `setTimeout`/`sleep` without justification
Assertions on mock elements or test IDs
Methods only called in test files
Mock setup is >50% of test code
"Mocking just to be safe"
Test depends on another test running first
Long-lived resources not cleaned up

TDD Connection

TDD prevents most testing anti-patterns:

Write test first → forces thinking about what you're testing
Watch it fail → confirms the test exercises real behavior, not mocks
Minimal implementation → no test-only methods creep in
Real dependencies first → you see what the test needs before mocking

Property-Based Testing

For certain patterns, property-based testing provides stronger coverage than example-based tests. See the property-based-testing skill for a complete reference.

When to Use PBT

Pattern	Example	Why PBT
Serialization pairs	`encode`/`decode`, `toJSON`/`fromJSON`	Roundtrip property catches edge cases
Normalizers	`sanitize`, `canonicalize`, `format`	Idempotence property ensures stability
Validators	`is_valid`, `validate`	Valid-after-normalize property
Pure functions	Business logic, calculations	Multiple properties verify contract
Sorting/ordering	`sort`, `rank`, `compare`	Ordering + idempotence properties
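
To make the roundtrip and idempotence rows concrete, here is a hand-rolled sketch that needs no dependencies. In practice a library such as fast-check would generate and shrink inputs. The small seeded generator, the `encode`/`decode` pair, and `canonicalize` are all illustrative assumptions.

```typescript
// Tiny linear congruential generator so runs are reproducible (illustrative only).
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Deliberately includes the separator, the escape character, and whitespace.
const ALPHABET = ['a', 'B', ' ', '\t', ',', '\\', 'é', '0'];

function randomString(rng: () => number, maxLen = 8): string {
  const len = Math.floor(rng() * (maxLen + 1));
  let out = '';
  for (let i = 0; i < len; i++) out += ALPHABET[Math.floor(rng() * ALPHABET.length)];
  return out;
}

// Hypothetical serialization pair: comma-separated fields with backslash escaping.
function encode(fields: string[]): string {
  return fields.map(f => f.replace(/\\/g, '\\\\').replace(/,/g, '\\,')).join(',');
}

function decode(text: string): string[] {
  const fields: string[] = [];
  let current = '';
  for (let i = 0; i < text.length; i++) {
    if (text[i] === '\\' && i + 1 < text.length) current += text[++i];
    else if (text[i] === ',') { fields.push(current); current = ''; }
    else current += text[i];
  }
  fields.push(current);
  return fields;
}

// Hypothetical normalizer: trims, collapses whitespace, lowercases.
function canonicalize(s: string): string {
  return s.trim().replace(/\s+/g, ' ').toLowerCase();
}

// Checks both properties against 500 generated inputs; returns the first counterexample.
function findCounterexample(): string | null {
  const rng = makeRng(42);
  for (let run = 0; run < 500; run++) {
    // Realistic constraint (assumed): records always have at least one field.
    const fields = Array.from({ length: 1 + Math.floor(rng() * 4) }, () => randomString(rng));
    if (JSON.stringify(decode(encode(fields))) !== JSON.stringify(fields)) {
      return `roundtrip failed for ${JSON.stringify(fields)}`;
    }
    const once = canonicalize(fields[0]);
    if (canonicalize(once) !== once) return `not idempotent for ${JSON.stringify(fields[0])}`;
  }
  return null;
}

const counterexample = findCounterexample();
```

The at-least-one-field constraint matters: with this format an empty list encodes to `''`, which decodes to `['']`, exactly the kind of edge case a roundtrip property surfaces.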

When NOT to Use PBT

Simple CRUD without transformation
UI/presentation logic
Integration tests requiring external setup
When specific examples suffice and edge cases are well-understood
Prototyping with fluid requirements

PBT Quality Gates

Before committing property-based tests:

Not tautological: Assertion doesn't compare same expression (`sorted(xs) == sorted(xs)` tests nothing)
Strong property: Not just "no crash" - aim for roundtrip, idempotence, or invariants
Not vacuous: `assume()` calls don't filter out most inputs
Edge cases explicit: Include `@example([])`, `@example([1])` decorators
No reimplementation: Don't restate function logic in assertion (`assert add(a,b) == a+b`)
Realistic constraints: Strategy matches real-world input constraints
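
The gates carry over to other ecosystems; the `@example` and `assume()` names above come from Python's Hypothesis. As a rough TypeScript analogue, this sketch checks a hypothetical `sortAscending` with properties that avoid tautology and reimplementation, and pins the empty and single-element cases explicitly. The input generation is a simple deterministic formula standing in for a real generator.

```typescript
// Function under test (hypothetical): returns a sorted copy.
function sortAscending(xs: number[]): number[] {
  return [...xs].sort((a, b) => a - b);
}

// Property 1: every adjacent pair is in order.
function isOrdered(xs: number[]): boolean {
  return xs.every((x, i) => i === 0 || xs[i - 1] <= x);
}

// Property 2: output is a permutation of the input (same element counts).
function sameElements(a: number[], b: number[]): boolean {
  if (a.length !== b.length) return false;
  const counts = new Map<number, number>();
  for (const x of a) counts.set(x, (counts.get(x) ?? 0) + 1);
  for (const x of b) {
    const remaining = counts.get(x) ?? 0;
    if (remaining === 0) return false;
    counts.set(x, remaining - 1);
  }
  return true;
}

// Property 3 (idempotence) plus the two above, none of which restate the sort itself.
function holdsFor(xs: number[]): boolean {
  const sorted = sortAscending(xs);
  const idempotent = JSON.stringify(sortAscending(sorted)) === JSON.stringify(sorted);
  return isOrdered(sorted) && sameElements(xs, sorted) && idempotent;
}

// Explicit edge cases first, then deterministically generated inputs in a bounded range.
const explicitCases: number[][] = [[], [1], [2, 1], [3, 3, 3], [-1, 0, -1]];
const generatedCases: number[][] = Array.from({ length: 200 }, (_, run) =>
  Array.from({ length: run % 12 }, (_, i) => ((run * 31 + i * 17) % 23) - 11)
);
const allHold = [...explicitCases, ...generatedCases].every(xs => holdsFor(xs));
```

A tautological version would compare `sortAscending(xs)` with itself and pass for any implementation; the ordered-plus-permutation pair is what actually pins down sorting.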
