test-strategy

When this skill is activated, always start your first response with the 🧢 emoji.


Test Strategy


A testing strategy answers three questions: what to test, at what level, and how much. Without a strategy, teams end up with either too many slow, brittle e2e tests or too few tests overall - both are expensive. This skill gives the judgment to design a test suite that provides high confidence, fast feedback, and low maintenance cost.


When to use this skill


Trigger this skill when the user:

Asks which type of test to write for a given scenario
Wants to design a testing strategy for a new service or feature
Needs to decide between unit, integration, and e2e tests
Asks about test coverage targets or metrics
Wants to implement contract testing between services
Is dealing with flaky tests and needs a remediation plan
Asks about TDD or BDD workflow

Do NOT trigger this skill for:

Writing the actual test code syntax for a specific framework (defer to framework docs)
Performance testing or load testing strategy (separate domain)


Key principles


Test behavior, not implementation - Tests should survive refactoring. If moving logic between private methods breaks your tests, the tests are testing the wrong thing. Test public contracts and observable outcomes.
The Testing Trophy over the pyramid - The classic pyramid (many unit, fewer integration, few e2e) was coined before modern tooling. The Trophy (Kent C. Dodds) weights integration tests most heavily: static analysis at the base, unit tests for isolated logic, integration tests for the bulk of coverage, and a few e2e tests for critical paths.
Fast feedback loops - A test suite that takes 30 minutes to run is a test suite that doesn't get run. Design for speed: unit tests in milliseconds, integration tests in seconds, e2e tests reserved for CI only.
Test at the right level - The cost of a test rises as you move up the stack (slower, more brittle, harder to debug). Test each concern at the lowest level that meaningfully exercises it.
Flaky tests are worse than no tests - A test that sometimes fails trains the team to ignore failures. A flaky test in CI delays every deploy. Fix or delete flaky tests immediately; never tolerate them.


Core concepts


Test types taxonomy


Type	What it tests	Speed	Cost	Use for
Static	Type errors, lint violations	Instant	Near-zero	Type safety, obvious mistakes
Unit	Single function/class in isolation	< 10ms	Low	Pure logic, edge cases, algorithms
Integration	Multiple modules together with real dependencies	100ms-2s	Medium	Service layer, DB queries, API handlers
E2E	Full user journey through deployed stack	5-60s	High	Critical user paths, smoke tests
Contract	API contract between producer and consumer	Seconds	Medium	Microservice boundaries


The Testing Trophy


        /\
       /e2e\           - Few: critical flows only
      /------\
     /  integ  \       - Most: service + DB + API
    /------------\
   /    unit      \    - Some: pure logic and edge cases
  /----------------\
 /     static       \  - Always: types, lint, format
/--------------------\

The key insight is that integration tests give the best ROI for most application code: they test real behavior through real dependencies without the brittleness of e2e tests.


Test doubles


Use the minimum isolation necessary for the test's purpose:

Double	When to use	Risk
Stub	Replace slow/unavailable dependency, return canned data	Low - no behavior coupling
Mock	Verify a side effect was triggered (email sent, event published)	Medium - couples to call signature
Spy	Observe calls without replacing behavior	Medium - couples to call count/args
Fake	Replace infrastructure with working in-memory version	Low - tests real behavior patterns

Prefer fakes for infrastructure (in-memory DB, in-memory queue). Mocks should be reserved for side effects you cannot otherwise observe.
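The difference between a fake and a mock can be sketched in plain JavaScript without any test framework. Every name below (`InMemoryUserRepo`, `createRecordingMock`, `registerUser`) is hypothetical and only illustrates the idea: state is checked through the fake, and the recorded calls are used only for the email side effect, which cannot be observed any other way.

```javascript
// Hypothetical names throughout; a sketch, not a library API.

// Fake: a working in-memory stand-in for a database-backed repository.
class InMemoryUserRepo {
  constructor() {
    this.users = new Map();
    this.nextId = 1;
  }
  create(fields) {
    const user = { id: this.nextId++, ...fields };
    this.users.set(user.id, user);
    return user;
  }
  findById(id) {
    return this.users.get(id) ?? null;
  }
}

// Mock: replaces the real email sender and records each call for later checks.
function createRecordingMock() {
  const mock = (...args) => { mock.calls.push(args); };
  mock.calls = [];
  return mock;
}

// Code under test: saves the user, then triggers a welcome email.
function registerUser({ repo, sendEmail }, fields) {
  const user = repo.create(fields);
  sendEmail(user.email, 'Welcome!');
  return user;
}

const repo = new InMemoryUserRepo();
const sendEmail = createRecordingMock();
const user = registerUser({ repo, sendEmail }, { name: 'Alice', email: 'alice@test.com' });

console.log(repo.findById(user.id).name); // Alice
console.log(sendEmail.calls.length);      // 1
```

Because the fake behaves like a real repository, the check on saved state survives a refactor of `registerUser`; only the email check is coupled to a call signature.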


Coverage metrics


Metric	What it measures	When to use
Line coverage	% of lines executed	Baseline floor, not a target
Branch coverage	% of conditional paths taken	Better for logic-heavy code
Mutation coverage	% of introduced bugs caught by tests	Gold standard for test quality

Line coverage above ~80% has diminishing returns and creates perverse incentives. Mutation coverage reveals whether tests actually assert meaningful things.


Common tasks


Choose the right test type - decision matrix


When deciding what level to test something at, apply this logic:

Is this pure logic with no external dependencies?
  YES → Unit test
  NO  → Does it require a real DB / HTTP call / file system?
          YES → Integration test (use real infrastructure or a fast fake)
          NO  → Does it span multiple services or require a browser?
                  YES → E2E test (sparingly)
                  NO  → Integration test

Additional rules:

Cross-service API boundaries → Contract test (Pact or similar)
Complex UI interaction that cannot be tested at component level → E2E
Algorithm with many edge cases → Unit test per edge case + one integration
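The decision tree and the first additional rule can be written down as a small function, which can help a team review the logic in one place. `chooseTestLevel` and its flag names are hypothetical; they simply mirror the questions above.

```javascript
// Hypothetical helper mirroring the decision matrix; flag names are assumptions.
function chooseTestLevel({
  pureLogic = false,
  needsRealIO = false,
  spansServicesOrBrowser = false,
  crossServiceApi = false,
} = {}) {
  if (crossServiceApi) return 'contract';   // additional rule: cross-service API boundary
  if (pureLogic) return 'unit';
  if (needsRealIO) return 'integration';    // real DB / HTTP call / file system
  return spansServicesOrBrowser ? 'e2e' : 'integration';
}

console.log(chooseTestLevel({ pureLogic: true }));              // unit
console.log(chooseTestLevel({ needsRealIO: true }));            // integration
console.log(chooseTestLevel({ spansServicesOrBrowser: true })); // e2e
console.log(chooseTestLevel({ crossServiceApi: true }));        // contract
```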


Design a test suite for a new service


Structure the test suite before writing the first line of code:

Map the test surface - Identify all external I/O: databases, queues, HTTP clients, file system. These are the integration seams.
Choose infrastructure strategy - Real DB with test containers, in-memory fake, or Docker Compose. Prefer real DBs for schema-heavy services.
Define the testing trophy for your context - Decide the ratio before you write tests. A typical distribution: 60% integration, 30% unit, 10% e2e.
Set up test data factories - Centralize how test objects are created. Factories prevent fragile fixtures and make tests self-documenting.
Wire CI from day one - Tests that only run locally drift. Run unit + integration in every PR, e2e in pre-merge or nightly.
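A test data factory, as described in the fourth step, can be as small as a function that returns valid defaults and accepts overrides. The sketch below assumes a hypothetical `buildUser` factory and made-up default fields.

```javascript
// Hypothetical factory; the field names and defaults are assumptions for illustration.
let sequence = 0;

function buildUser(overrides = {}) {
  sequence += 1;
  return {
    id: sequence,
    name: `User ${sequence}`,
    email: `user${sequence}@test.com`,
    role: 'member',
    active: true,
    ...overrides, // the test sets only the fields it cares about
  };
}

const admin = buildUser({ role: 'admin' });
const inactive = buildUser({ active: false });

console.log(admin.role, admin.email);      // admin user1@test.com
console.log(inactive.active, inactive.id); // false 2
```

Each test then states only what makes its scenario different, which is what keeps fixtures from becoming fragile.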


Write effective unit tests - patterns


Unit tests work best for:

Pure functions (same input always gives same output)
Complex conditional logic with many branches
Data transformations and parsing
Domain model invariants

Arrange-Act-Assert structure:

```javascript
test('applies 10% discount for orders of $100 or more', () => {
  // Arrange
  const order = buildOrder({ subtotal: 120 });

  // Act
  const discounted = applyLoyaltyDiscount(order);

  // Assert
  expect(discounted.total).toBe(108);
});
```

Parameterize boundary conditions:

```javascript
test.each([
  [99,  0],   // just below threshold - no discount
  [100, 10],  // exactly at threshold
  [200, 20],  // above threshold
])('order of $%i gets $%i discount', (subtotal, expectedDiscount) => {
  const order = buildOrder({ subtotal });
  expect(applyLoyaltyDiscount(order).discount).toBe(expectedDiscount);
});
```

See references/test-patterns.md for more patterns.
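The tests above call `buildOrder` and `applyLoyaltyDiscount` without showing them. One possible implementation that is consistent with those expectations is sketched below; the threshold, the rate, and the shape of the order object are assumptions inferred from the test cases, not a definitive version.

```javascript
// Assumed from the test cases: 10% off once the subtotal reaches $100.
const DISCOUNT_THRESHOLD = 100;
const DISCOUNT_RATE = 0.1;

// Minimal factory: a valid default order plus per-test overrides.
function buildOrder(overrides = {}) {
  return { subtotal: 0, ...overrides };
}

// Pure function: same input always gives the same output, so it suits unit tests.
function applyLoyaltyDiscount(order) {
  const qualifies = order.subtotal >= DISCOUNT_THRESHOLD;
  // Round to cents to avoid floating-point residue.
  const discount = qualifies ? Math.round(order.subtotal * DISCOUNT_RATE * 100) / 100 : 0;
  return { ...order, discount, total: order.subtotal - discount };
}

for (const subtotal of [99, 100, 120, 200]) {
  const { discount, total } = applyLoyaltyDiscount(buildOrder({ subtotal }));
  console.log(subtotal, discount, total);
}
```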


Write integration tests - database and API


For database integration tests:

```javascript
// Use real DB, roll back after each test
beforeEach(() => db.beginTransaction());
afterEach(() => db.rollbackTransaction());

test('saves user and returns with id', async () => {
  const user = await userRepo.create({ name: 'Alice', email: 'alice@test.com' });
  expect(user.id).toBeDefined();
  const found = await userRepo.findById(user.id);
  expect(found.name).toBe('Alice');
});
```

For HTTP API integration tests, test the full request cycle:

```javascript
test('POST /orders returns 201 with order id', async () => {
  const response = await request(app)
    .post('/orders')
    .send({ items: [{ productId: 'p1', qty: 2 }] });

  expect(response.status).toBe(201);
  expect(response.body.orderId).toBeDefined();
});
```

Test the unhappy paths equally: 400 for invalid input, 401 for missing auth, 404 for missing resource, 409 for conflicts.
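Unhappy paths lend themselves to a table of cases, one row per status code. The sketch below is framework-free so it runs on its own: `handleCreateOrder`, the request shape, and the idempotency-key conflict rule are all hypothetical stand-ins for a real route handler, and a real integration test would send these requests through the HTTP layer instead. The 404 case is left out because this hypothetical endpoint only creates resources.

```javascript
// Hypothetical handler standing in for a real POST /orders route.
function handleCreateOrder(request, existingOrderKeys) {
  if (!request.headers || !request.headers.authorization) {
    return { status: 401, body: { error: 'missing auth' } };
  }
  const items = request.body && request.body.items;
  if (!Array.isArray(items) || items.length === 0) {
    return { status: 400, body: { error: 'items required' } };
  }
  const key = request.body.idempotencyKey;
  if (key && existingOrderKeys.has(key)) {
    return { status: 409, body: { error: 'duplicate order' } };
  }
  if (key) existingOrderKeys.add(key);
  return { status: 201, body: { orderId: `order-${existingOrderKeys.size}` } };
}

const auth = { authorization: 'Bearer test-token' };
const items = [{ productId: 'p1', qty: 2 }];

// One row per path: [name, request, expected status].
const cases = [
  ['happy path', { headers: auth, body: { items, idempotencyKey: 'k1' } }, 201],
  ['missing auth', { headers: {}, body: { items } }, 401],
  ['invalid input', { headers: auth, body: { items: [] } }, 400],
  ['conflict', { headers: auth, body: { items, idempotencyKey: 'k1' } }, 409],
];

const seenKeys = new Set();
const results = cases.map(([name, request, expected]) => {
  const { status } = handleCreateOrder(request, seenKeys);
  return { name, status, ok: status === expected };
});
console.log(results);
```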


Implement contract testing between services


Contract testing decouples service teams without sacrificing confidence. The consumer defines what it expects; the provider proves it can deliver.

Pact workflow:

Consumer writes a pact test defining the expected request/response shape
Running the consumer test generates a pact file (JSON contract)
Provider runs a pact verification test against that contract
The consumer publishes the pact and the provider publishes its verification results to a Pact Broker - `can-i-deploy` gates deployment

Key rules:

The consumer owns the contract, not the provider
Contracts test shape and semantics, not business logic
Never test every field - only what the consumer actually uses
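The idea behind these rules can be illustrated without Pact itself. The sketch below is a hand-rolled illustration, not Pact's API: the contract format, `verifyContract`, and the service names are hypothetical. It shows a consumer listing only the fields it reads, and a provider response being checked against that list.

```javascript
// Not Pact: a hand-rolled illustration of consumer-driven contracts.
// The contract format and all names here are hypothetical.
const orderContract = {
  consumer: 'checkout-ui',
  provider: 'order-service',
  request: { method: 'GET', path: '/orders/42' },
  response: {
    status: 200,
    // Only the fields this consumer actually reads, with the types it relies on.
    body: { orderId: 'string', total: 'number' },
  },
};

// Provider-side verification: returns a list of mismatches (empty means compatible).
function verifyContract(contract, actualResponse) {
  const failures = [];
  if (actualResponse.status !== contract.response.status) {
    failures.push(`status: expected ${contract.response.status}, got ${actualResponse.status}`);
  }
  for (const [field, expectedType] of Object.entries(contract.response.body)) {
    const actualType = typeof actualResponse.body[field];
    if (actualType !== expectedType) {
      failures.push(`${field}: expected ${expectedType}, got ${actualType}`);
    }
  }
  return failures;
}

// Extra provider fields are fine; retyping a field the consumer uses is a break.
const compatible = { status: 200, body: { orderId: 'o-42', total: 108, createdAt: '2024-01-01' } };
const breaking = { status: 200, body: { orderId: 'o-42', total: '108.00' } };

console.log(verifyContract(orderContract, compatible)); // []
console.log(verifyContract(orderContract, breaking));   // [ 'total: expected number, got string' ]
```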


Measure and improve test quality - not just coverage


Line coverage is a floor, not a ceiling. Use these signals instead:

Mutation score - Run a mutation testing tool (Stryker, PITest). If removing a `> 0` check doesn't make any test fail, your tests aren't asserting enough.
Test failure rate - Track which tests fail in CI over time. A test that has never failed for a real bug may not be exercising real risk.
Test change frequency - Tests that change every time production code changes are testing implementation, not behavior.
Time to red - How quickly does the suite tell you when something breaks? Optimize for signal speed, not raw pass/fail.
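What a mutation tool automates can be shown by hand. In the sketch below, all names are hypothetical: a "mutant" removes the `> 0` check, and a suite kills the mutant only if it passes on the original and fails on the mutant.

```javascript
// Hand-rolled illustration of mutation testing; tools such as Stryker automate this.
const original = (quantity) => quantity > 0; // the `> 0` check
const mutant = (quantity) => true;           // mutation: check removed

// Each suite returns true when all of its checks pass for the given implementation.
const weakSuite = (isValidQuantity) => isValidQuantity(5) === true;
const strongSuite = (isValidQuantity) =>
  isValidQuantity(5) === true &&
  isValidQuantity(0) === false &&
  isValidQuantity(-1) === false;

// A mutant is killed when a suite that passes on the original fails on the mutant.
function kills(suite) {
  return suite(original) && !suite(mutant);
}

console.log(kills(weakSuite));   // false: the mutant survives
console.log(kills(strongSuite)); // true: the boundary checks kill it
```

Both suites execute every line of the original function, so line coverage cannot tell them apart; only the mutant does.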


Handle flaky tests systematically


Never re-run a flaky test and call it fixed. Follow this protocol:

Quarantine immediately - Move the flaky test to a separate suite that runs but doesn't block CI. Don't delete it - you'll lose the signal.
Diagnose the root cause - Common causes:
- Shared mutable state between tests (missing cleanup)
- Time-dependent assertions (`Date.now()`, `setTimeout`)
- Race conditions in async tests (missing `await`)
- External service calls that should be stubbed
- Test order dependency
Fix the root cause - If time-dependent: freeze time with a clock fake. If shared state: isolate in beforeEach/afterEach. If async: await properly.
Un-quarantine and monitor - After the fix, restore to main suite and watch for a week of clean runs before declaring victory.
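For the time-dependent case, freezing time usually means injecting the clock instead of calling `Date.now()` directly. A minimal sketch, assuming a hypothetical `isExpired` function and a hand-written `createFakeClock` helper (test frameworks ship their own fake timers, which are not shown here):

```javascript
// Hypothetical code under test: the clock is a parameter, defaulting to the real one.
function isExpired(session, clock = Date) {
  return clock.now() >= session.expiresAt;
}

// Hand-written clock fake: time moves only when the test says so.
function createFakeClock(startMs) {
  let current = startMs;
  return {
    now: () => current,
    advance: (ms) => { current += ms; },
  };
}

const clock = createFakeClock(1_000_000);
const session = { expiresAt: clock.now() + 60_000 };

console.log(isExpired(session, clock)); // false
clock.advance(60_000);
console.log(isExpired(session, clock)); // true
```

The result no longer depends on how fast the machine runs the test, which removes this class of flakiness at its source.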


Anti-patterns


Anti-pattern	Problem	What to do instead
Testing the framework	`expect(orm.save).toHaveBeenCalled()` tests that the ORM is wired, not that data was saved	Assert the actual state after the operation
Snapshot testing everything	Snapshot tests fail on any UI change, creating noise and review fatigue	Use snapshots only for serialized output you rarely change (e.g., generated JSON schema)
100% coverage target	Creates tests that execute code without asserting anything meaningful	Set mutation score targets instead; aim for critical-path coverage
Giant test setup	Hundreds of lines of arrange code obscures what's actually being tested	Use builder/factory patterns; set only the fields that matter to the specific test
Mocking what you don't own	Mocking third-party libraries breaks on upgrades and doesn't test actual integration	Write a thin adapter you own, then mock your adapter
Skipping the testing pyramid for greenfield	Starting with e2e tests "because they test everything" leads to slow, brittle suites	Build bottom-up: unit tests first, integration second, e2e last
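The "mocking what you don't own" row can be sketched as follows. Everything here is hypothetical: `vendorSdk` stands in for some third-party feature-flag client, and its `variation` method is an assumed shape, not a real library's API. The point is that application code and its tests depend only on the small adapter.

```javascript
// Thin adapter you own. `vendorSdk` and its `variation` method are assumptions
// standing in for a third-party client; no real library API is implied.
function createFeatureFlags(vendorSdk) {
  return {
    isEnabled(flagName, userId) {
      return vendorSdk.variation(flagName, { key: userId }, false) === true;
    },
  };
}

// Application code depends on the adapter, never on the vendor SDK.
function checkoutButtonLabel(flags, userId) {
  return flags.isEnabled('one-click-checkout', userId) ? 'Buy now' : 'Add to cart';
}

// In tests, replace the adapter you own with a trivial stub.
const stubFlags = (enabled) => ({ isEnabled: () => enabled });

console.log(checkoutButtonLabel(stubFlags(true), 'u1'));  // Buy now
console.log(checkoutButtonLabel(stubFlags(false), 'u1')); // Add to cart
```

When the vendor SDK changes, only `createFeatureFlags` and its own integration test need to change.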


References


For detailed content on specific topics, read the relevant file from references/:

`references/test-patterns.md` - Common testing patterns: builders, fakes, parameterized tests, and when to use each

Only load a references file if the current task requires deep detail on that topic.

