writing-good-tests

Writing Good Tests

Philosophy

"Write tests. Not too many. Mostly integration." - Guillermo Rauch, popularized by Kent C. Dodds
Tests verify real behavior, not implementation details. The goal is confidence that your code works, not coverage numbers.
Core principles:
  1. Test behavior, not implementation — refactoring shouldn't break tests
  2. Integration tests provide a better confidence-to-cost ratio than unit tests
  3. Wait for actual conditions, not arbitrary timeouts
  4. Mock strategically — real dependencies when feasible, mocks for external systems
  5. Don't pollute production code with test-only methods

Test Structure

Use Arrange-Act-Assert (or Given-When-Then):
typescript
test('user can cancel reservation', async () => {
  // Arrange
  const reservation = await createReservation({ userId: 'user-1', roomId: 'room-1' });

  // Act
  const result = await cancelReservation(reservation.id);

  // Assert
  expect(result.status).toBe('cancelled');
  expect(await getReservation(reservation.id)).toBeNull();
});
One action per test. Multiple assertions are fine if they verify the same behavior.

Condition-Based Waiting

Flaky tests often guess at timing. This creates race conditions where tests pass locally but fail in CI.
Wait for conditions, not time:
typescript
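// BAD: Guessing at timing
await new Promise(r => setTimeout(r, 50));
const guessed = getResult(); // may still be undefined on a slow machine

// GOOD: Waiting for condition (waitFor is defined in the next section)
await waitFor(() => getResult() !== undefined, 'result to be available');
const result = getResult();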

Generic Polling Function

typescript
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();

  while (true) {
    const result = condition();
    if (result) return result;

    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }

    await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
  }
}
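
Many real conditions are asynchronous (a database lookup, an HTTP status check). The following is a minimal sketch of a variant that also accepts promise-returning conditions; the name `waitForAsync` and the `intervalMs` parameter are assumptions for illustration, not part of the function above.

```typescript
// Hypothetical variant of waitFor: the condition may return a value or a promise.
async function waitForAsync<T>(
  condition: () => T | undefined | null | false | Promise<T | undefined | null | false>,
  description: string,
  timeoutMs = 5000,
  intervalMs = 10
): Promise<T> {
  const deadline = Date.now() + timeoutMs;

  while (true) {
    // `await` handles both plain values and promises
    const result = await condition();
    if (result) return result;

    if (Date.now() > deadline) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }

    // Pause between polls so the event loop can make progress
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

As with the synchronous version, a falsy-but-valid value (`0`, `''`) never satisfies the condition, so wrap such checks in an explicit comparison.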

Quick Patterns

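| Scenario | Pattern |
| --- | --- |
| Wait for event | `waitFor(() => events.find(e => e.type === 'DONE'), 'DONE event')` |
| Wait for state | `waitFor(() => machine.state === 'ready', 'machine to be ready')` |
| Wait for count | `waitFor(() => items.length >= 5, 'at least 5 items')` |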

When Arbitrary Timeout IS Correct

Only when testing actual timing behavior (debounce, throttle, intervals):
typescript
// Testing tool that ticks every 100ms
await waitForEvent(manager, 'TOOL_STARTED'); // First: wait for condition
await new Promise(r => setTimeout(r, 200));   // Then: wait for 2 ticks
// Comment explains WHY: 200ms = 2 ticks at 100ms intervals
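
The snippet above calls `waitForEvent`, which this document does not define. One possible sketch, assuming `manager` is a Node.js `EventEmitter` (an assumption; the real manager may expose a different API):

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper: resolves with the event payload the first time `eventName`
// is emitted, or rejects if it does not arrive within `timeoutMs`.
function waitForEvent<T = unknown>(
  emitter: EventEmitter,
  eventName: string,
  timeoutMs = 5000
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const onEvent = (payload: T) => {
      clearTimeout(timer);
      resolve(payload);
    };
    const timer = setTimeout(() => {
      // Remove the listener so a timed-out wait leaves nothing behind
      emitter.off(eventName, onEvent);
      reject(new Error(`Timeout waiting for event ${eventName} after ${timeoutMs}ms`));
    }, timeoutMs);
    emitter.once(eventName, onEvent);
  });
}
```

An event emitted before the helper is called is missed, so start waiting before triggering the action that emits it.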

Mocking Strategy

"You don't hate mocks; you hate side-effects." — J.B. Rainsberger
Mocks reveal where side-effects complicate your code. Use them strategically, not reflexively.

Don't Mock What You Don't Own

Create thin wrappers around third-party libraries. Mock YOUR wrapper, not the library.
typescript
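// BAD: Mock the third-party HTTP client directly
// (axios stands in here for any HTTP library you don't own)
vi.mock('axios');

// GOOD: Create your own wrapper
// Minimal interface describing what the wrapper needs from an HTTP client
interface HttpClient {
  get(url: string): Promise<unknown>;
}

class RegistryClient {
  constructor(private client: HttpClient) {}
  async getRepos() {
    return this.client.get('https://registry.example.com/v2/_catalog');
  }
}

// Mock your wrapper
vi.mock('./registry-client');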
This simplifies tests AND improves your design.
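
Once the wrapper exists, a hand-written fake is often enough and no mocking framework is needed. A minimal sketch, assuming a hypothetical `RegistryGateway` interface and a hypothetical `findStaleRepos` function as the code under test:

```typescript
// Hypothetical port describing only what the application needs from the registry.
interface RegistryGateway {
  getRepos(): Promise<string[]>;
}

// Code under test depends on the port, not on any HTTP library.
async function findStaleRepos(
  gateway: RegistryGateway,
  active: Set<string>
): Promise<string[]> {
  const repos = await gateway.getRepos();
  return repos.filter((name) => !active.has(name));
}

// Hand-written fake for tests: no mocking framework and no network access.
class FakeRegistryGateway implements RegistryGateway {
  private readonly repos: string[];

  constructor(repos: string[]) {
    this.repos = repos;
  }

  async getRepos(): Promise<string[]> {
    return [...this.repos];
  }
}
```

A test constructs `new FakeRegistryGateway(['api', 'old'])`, passes it to `findStaleRepos`, and asserts on the returned list. The real HTTP-backed implementation is covered separately by a small number of integration tests.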

Managed vs Unmanaged Dependencies

| Dependency Type | Example | Strategy |
| --- | --- | --- |
| Managed (you control it) | Your database, your file system | Use REAL instances |
| Unmanaged (external) | Third-party APIs, SMTP, message bus | Use MOCKS |
Communications with managed dependencies are implementation details — you can refactor them freely. Communications with unmanaged dependencies are observable behavior — mocking protects against external changes.
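
To illustrate the split, here is a small sketch in which the file system (managed) is exercised for real in a temporary directory while mail delivery (unmanaged) is replaced by a recording fake. `Mailer` and `publishReport` are hypothetical names invented for this example.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Unmanaged dependency: an external mail service, represented by a hypothetical port.
interface Mailer {
  send(to: string, subject: string): void;
}

// Code under test: writes a report to disk (managed) and notifies by mail (unmanaged).
function publishReport(dir: string, name: string, body: string, mailer: Mailer): string {
  const file = path.join(dir, `${name}.txt`);
  fs.writeFileSync(file, body, "utf8");
  mailer.send("team@example.com", `Report ${name} published`);
  return file;
}

// In a test: a real temp directory for the file system, a recording fake for mail.
const sent: Array<{ to: string; subject: string }> = [];
const fakeMailer: Mailer = {
  send: (to, subject) => {
    sent.push({ to, subject });
  },
};

const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), "report-test-"));
const reportFile = publishReport(tempDir, "weekly", "all green", fakeMailer);
```

The test then reads `reportFile` back from disk to check the final state, and inspects `sent` to check the outgoing message, which is the observable contract with the external system.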

Anti-Pattern: Testing Mock Behavior

typescript
// BAD: Testing that the mock exists
test('renders sidebar', () => {
  render(<Page />);
  expect(screen.getByTestId('sidebar-mock')).toBeInTheDocument();
});

// GOOD: Test real behavior
test('renders sidebar', () => {
  render(<Page />);
  expect(screen.getByRole('navigation')).toBeInTheDocument();
});
Gate: Before asserting on any mock element, ask: "Am I testing real behavior or mock existence?"

Anti-Pattern: Mocking Without Understanding

typescript
// BAD: Mock breaks test logic
test('detects duplicate server', async () => {
  // Mock prevents config write that test depends on!
  vi.mock('ToolCatalog', () => ({
    discoverAndCacheTools: vi.fn().mockResolvedValue(undefined)
  }));
  await addServer(config);
  await addServer(config);  // Should throw - but won't!
});

// GOOD: Mock at correct level
test('detects duplicate server', async () => {
  vi.mock('MCPServerManager'); // Just mock slow server startup
  await addServer(config);  // Config written
  await expect(addServer(config)).rejects.toThrow();  // Duplicate detected
});
Gate: Before mocking, ask: "What side effects does this have? Does my test depend on them?"

Anti-Pattern: Incomplete Mocks

Mock the COMPLETE data structure as it exists in reality:
typescript
// BAD: Partial mock
const mockResponse = {
  status: 'success',
  data: { userId: '123' }
  // Missing: metadata that downstream code uses
};

// GOOD: Mirror real API
const mockResponse = {
  status: 'success',
  data: { userId: '123', name: 'Alice' },
  metadata: { requestId: 'req-789', timestamp: 1234567890 }
};
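
One way to keep mocks complete is to build them from a typed factory, so the compiler rejects a missing field. A minimal sketch; the `ApiResponse` shape is inferred from the example above and `makeApiResponse` is a hypothetical helper.

```typescript
// Hypothetical response shape, inferred from the example above.
interface ApiResponse {
  status: "success" | "error";
  data: { userId: string; name: string };
  metadata: { requestId: string; timestamp: number };
}

// Returns a complete, realistic response; tests override only the fields they care about.
function makeApiResponse(overrides: Partial<ApiResponse> = {}): ApiResponse {
  return {
    status: "success",
    data: { userId: "123", name: "Alice" },
    metadata: { requestId: "req-789", timestamp: 1234567890 },
    ...overrides,
  };
}

// Only the status differs; data and metadata are still present.
const failed = makeApiResponse({ status: "error" });
```

The merge is shallow, so overriding `data` or `metadata` requires passing the whole nested object, which the type checker enforces.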

When Mocks Become Too Complex

Warning signs:
  • Mock setup longer than test logic
  • Mocking everything to make test pass
  • Test breaks when mock changes
"As the number of mocks grows, the probability of testing the mock instead of the desired code goes up." — Codurance
Consider integration tests with real components — often simpler than elaborate mocks.

Anti-Pattern: Test-Only Methods in Production

typescript
// BAD: destroy() only used in tests
class Session {
  async destroy() { /* cleanup */ }
}

// GOOD: Test utilities handle cleanup
// test-utils/session-helpers.ts
export async function cleanupSession(session: Session) {
  const workspace = session.getWorkspaceInfo();
  if (workspace) {
    await workspaceManager.destroyWorkspace(workspace.id);
  }
}
Gate: Before adding any method to a production class, ask: "Is this only used by tests?" If yes, put it in test utilities.

Test Isolation

Tests should not depend on execution order. But isolation doesn't mean cleaning up everything.

What to Clean Up

Long-lived resources MUST be cleaned up:
  • Virtual machines, containers
  • Kubernetes jobs, pods, deployments
  • Cloud resources (instances, buckets)
  • Background processes, daemons
Prefer product tools for cleanup when possible:
typescript
afterAll(async () => {
  // Use the product's own cleanup mechanisms
  await deployment.delete();
  await job.terminate();
});
Side-channel cleanup when product tools aren't available:
typescript
afterAll(async () => {
  // Direct cleanup when product doesn't provide it
  await exec('kubectl delete job test-job-123');
});
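
When a test creates several long-lived resources, registering each cleanup at creation time helps ensure nothing is forgotten if a later step fails. A minimal sketch of such a registry; `CleanupStack` is a hypothetical helper, not something this document's product provides.

```typescript
type Cleanup = () => Promise<void> | void;

// Hypothetical helper: collects cleanup steps as resources are created and runs
// them in reverse order, continuing even when one step fails.
class CleanupStack {
  private readonly steps: Array<{ label: string; run: Cleanup }> = [];

  add(label: string, run: Cleanup): void {
    this.steps.push({ label, run });
  }

  async runAll(): Promise<void> {
    const failures: string[] = [];
    while (this.steps.length > 0) {
      const step = this.steps.pop()!;
      try {
        await step.run();
      } catch (error) {
        // Record the failure and keep going so later resources are still released
        failures.push(`${step.label}: ${String(error)}`);
      }
    }
    if (failures.length > 0) {
      throw new Error(`Cleanup failed for: ${failures.join("; ")}`);
    }
  }
}
```

In a suite this would typically be wired up as `afterAll(() => cleanups.runAll())`, with each `cleanups.add(...)` call placed right after the resource is created.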

What's OK to Leave

Database artifacts are fine to leave around. Trying to clean up test data perfectly is a fool's errand and makes multi-step integration tests nearly impossible.
  • Test records in databases
  • Log entries
  • Cached data that expires
The database should handle its own lifecycle. Tests that require pristine state should create unique identifiers, not depend on cleanup.

Preventing Order Dependencies

typescript
// Use unique identifiers instead of depending on clean state
const testId = `test-${Date.now()}-${Math.random()}`;
const user = await createUser({ email: `${testId}@test.com` });
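
The `Date.now()` plus `Math.random()` approach works, but it puts a decimal point in the identifier and can in principle collide across parallel workers. An alternative sketch using Node's built-in `randomUUID`; the helper name `uniqueTestId` is an assumption.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: a collision-resistant identifier that is safe to embed
// in emails, resource names, and database keys.
function uniqueTestId(prefix = "test"): string {
  return `${prefix}-${randomUUID()}`;
}

const email = `${uniqueTestId("user")}@test.com`;
```

The prefix also makes leftover test records easy to recognize later.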

Quick Reference

| Problem | Fix |
| --- | --- |
| Arbitrary `setTimeout` in tests | Use condition-based waiting |
| Assert on mock elements | Test real component or unmock |
| Mock third-party directly | Create wrapper, mock wrapper |
| Test-only methods in production | Move to test utilities |
| Mock without understanding | Understand dependencies first |
| Incomplete mocks | Mirror real API completely |
| Over-complex mocks | Consider integration tests |
| Long-lived resources left running | Clean up VMs, k8s jobs, cloud resources |

Red Flags

Stop and reconsider when you see:
  • Arbitrary `setTimeout`/`sleep` without justification
  • Assertions on mock elements or test IDs
  • Methods only called in test files
  • Mock setup is >50% of test code
  • "Mocking just to be safe"
  • Test depends on another test running first
  • Long-lived resources not cleaned up

TDD Connection

TDD prevents most testing anti-patterns:
  • Write test first → forces thinking about what you're testing
  • Watch it fail → confirms the test exercises real behavior, not mocks
  • Minimal implementation → no test-only methods creep in
  • Real dependencies first → you see what the test needs before mocking

Property-Based Testing

For certain patterns, property-based testing provides stronger coverage than example-based tests. See the `property-based-testing` skill for a complete reference.

When to Use PBT

| Pattern | Example | Why PBT |
| --- | --- | --- |
| Serialization pairs | `encode`/`decode`, `toJSON`/`fromJSON` | Roundtrip property catches edge cases |
| Normalizers | `sanitize`, `canonicalize`, `format` | Idempotence property ensures stability |
| Validators | `is_valid`, `validate` | Valid-after-normalize property |
| Pure functions | Business logic, calculations | Multiple properties verify contract |
| Sorting/ordering | `sort`, `rank`, `compare` | Ordering + idempotence properties |
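
As an illustration of the roundtrip row, here is a hand-rolled sketch that needs no PBT library. The `encode`/`decode` pair, the generator, and the seeded PRNG are all hypothetical stand-ins; a real library such as fast-check would add shrinking and richer generators.

```typescript
type Pair = [string, string];

// Hypothetical serialization pair under test.
function encode(pairs: Pair[]): string {
  return pairs
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
}

function decode(text: string): Pair[] {
  if (text === "") return [];
  return text.split("&").map((part): Pair => {
    const eq = part.indexOf("=");
    return [decodeURIComponent(part.slice(0, eq)), decodeURIComponent(part.slice(eq + 1))];
  });
}

// Small seeded PRNG (mulberry32-style) so a failure can be replayed from its seed.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generates short strings biased toward characters that tend to break encoders.
function randomString(rand: () => number): string {
  const alphabet = "ab &=%+/\u00e9";
  const length = Math.floor(rand() * 4);
  let out = "";
  for (let i = 0; i < length; i++) {
    out += alphabet.charAt(Math.floor(rand() * alphabet.length));
  }
  return out;
}

// Roundtrip property: decode(encode(x)) equals x for every generated x.
function checkRoundtrip(runs: number, seed: number): void {
  const rand = seededRandom(seed);
  for (let run = 0; run < runs; run++) {
    const pairs: Pair[] = [];
    const count = Math.floor(rand() * 4);
    for (let i = 0; i < count; i++) {
      pairs.push([randomString(rand), randomString(rand)]);
    }
    const decoded = decode(encode(pairs));
    if (JSON.stringify(decoded) !== JSON.stringify(pairs)) {
      throw new Error(`Roundtrip failed (seed ${seed}, run ${run}): ${JSON.stringify(pairs)}`);
    }
  }
}
```

Calling `checkRoundtrip(500, 42)` inside a test exercises empty lists, empty strings, and delimiter characters that example-based tests often miss.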

When NOT to Use PBT

  • Simple CRUD without transformation
  • UI/presentation logic
  • Integration tests requiring external setup
  • When specific examples suffice and edge cases are well-understood
  • Prototyping with fluid requirements

PBT Quality Gates

Before committing property-based tests:
  • Not tautological: Assertion doesn't compare same expression (`sorted(xs) == sorted(xs)` tests nothing)
  • Strong property: Not just "no crash" - aim for roundtrip, idempotence, or invariants
  • Not vacuous: `assume()` calls don't filter out most inputs
  • Edge cases explicit: Include `@example([])`, `@example([1])` decorators
  • No reimplementation: Don't restate function logic in assertion (`assert add(a,b) == a+b`)
  • Realistic constraints: Strategy matches real-world input constraints
  • Realistic constraints: Strategy matches real-world input constraints
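
The gates above can be made concrete with a sort example. This is a library-free sketch in TypeScript (the decorators mentioned above belong to a Python PBT library); `sortNumbers` is a hypothetical function under test, and the explicit case list plays the role of `@example`.

```typescript
// Function under test (hypothetical): numeric sort that does not mutate its input.
function sortNumbers(xs: number[]): number[] {
  return [...xs].sort((a, b) => a - b);
}

// Strong property 1: every element is >= its predecessor.
function isOrdered(xs: number[]): boolean {
  return xs.every((x, i) => i === 0 || xs[i - 1] <= x);
}

// Strong property 2: output is a permutation of the input (same multiset).
function sameMultiset(a: number[], b: number[]): boolean {
  if (a.length !== b.length) return false;
  const counts = new Map<number, number>();
  for (const x of a) counts.set(x, (counts.get(x) ?? 0) + 1);
  for (const x of b) {
    const n = counts.get(x) ?? 0;
    if (n === 0) return false;
    counts.set(x, n - 1);
  }
  return true;
}

function checkSortProperties(runs: number): void {
  // Explicit edge cases first, then random inputs within realistic bounds.
  const cases: number[][] = [[], [1], [2, 2], [3, -1, 0]];
  for (let i = 0; i < runs; i++) {
    const length = Math.floor(Math.random() * 8);
    cases.push(Array.from({ length }, () => Math.floor(Math.random() * 21) - 10));
  }
  for (const xs of cases) {
    const sorted = sortNumbers(xs);
    // Tautological (avoid): comparing sortNumbers(xs) with sortNumbers(xs).
    if (!isOrdered(sorted)) throw new Error(`Not ordered: ${JSON.stringify(xs)}`);
    if (!sameMultiset(xs, sorted)) throw new Error(`Elements changed: ${JSON.stringify(xs)}`);
    // Idempotence: sorting a sorted list changes nothing.
    if (JSON.stringify(sortNumbers(sorted)) !== JSON.stringify(sorted)) {
      throw new Error(`Not idempotent: ${JSON.stringify(xs)}`);
    }
  }
}
```

Neither property restates the sorting algorithm: ordering and permutation together pin down the result without reimplementing it.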