Testing

A disciplined approach to verifying that software behaves correctly, remains stable under change, and communicates intent to future developers. Good tests act as living documentation, a safety net for refactoring, and a design feedback mechanism.

This skill covers universal testing concepts that apply regardless of language, framework, or tooling.

When to Use

Designing a test strategy for a new project or feature
Deciding what level of testing (unit, integration, e2e) a piece of code needs
Evaluating whether existing tests are providing value or creating drag
Applying TDD to drive design decisions
Debugging a flaky or brittle test suite
Reviewing test code for quality and maintainability

Testing Pyramid

The testing pyramid describes the ideal distribution of tests across three levels: many tests at the base and progressively fewer toward the top.

        /  E2E  \           Few, slow, expensive
       /----------\
      / Integration \       Moderate number, moderate speed
     /----------------\
    /    Unit Tests     \   Many, fast, cheap
   /____________________\

Unit Tests (Base)

Test a single unit of behavior in isolation (a function, a method, a small class)
No I/O, no database, no network, no file system
Execute in milliseconds
Should form the majority of your test suite (roughly 70%)
Fast feedback loop enables rapid iteration

Integration Tests (Middle)

Test how multiple units collaborate, or how code interacts with external systems
May involve a real database, message queue, or HTTP endpoint
Execute in seconds
Verify that wiring, configuration, and contracts between components work
Roughly 20% of your test suite

End-to-End Tests (Top)

Test complete user journeys through the full system
Interact with the application as a user would
Slowest, most brittle, most expensive to maintain
Reserve for critical business paths only
Roughly 10% of your test suite

The Ice Cream Cone Antipattern

The inverted pyramid: many e2e tests, few unit tests. Symptoms:

Test suite takes hours to run
Tests break constantly due to UI changes or timing issues
Developers stop running tests locally
Feedback loop is too slow to support continuous delivery

Fix: Identify what each e2e test is actually verifying. Push that verification down to the lowest possible level. Most business logic can be tested at the unit level.

Test Design Principles

Arrange-Act-Assert (AAA)

Every test should follow three distinct phases:

Arrange — set up the preconditions and inputs
Act — execute the behavior under test
Assert — verify the expected outcome

Keep each phase clearly separated. If Arrange dominates the test, extract a builder or factory. If Act requires multiple steps, you may be testing too much at once.
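
The three phases can be sketched as follows. This is a minimal illustration, not a prescribed layout: the `Cart` class and its methods are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Cart:
    """Hypothetical shopping cart, used only for this illustration."""
    prices: list = field(default_factory=list)

    def add(self, price):
        self.prices.append(price)

    def total(self):
        return sum(self.prices)


def test_total_sums_all_item_prices():
    # Arrange: build the object under test and its inputs.
    cart = Cart()
    cart.add(40)
    cart.add(60)

    # Act: execute exactly one behavior.
    total = cart.total()

    # Assert: verify the expected outcome.
    assert total == 100
```

If the Arrange block grew much beyond these three lines, that would be the signal to move it into a builder or factory function.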

One Assertion per Concept

A test should verify one logical concept. This does not mean literally one `assert` call; asserting multiple properties of a single result is fine. What matters is that the test fails for exactly one reason.

// Good: one concept — "completed order has correct totals"
assert order.subtotal == 100
assert order.tax == 21
assert order.total == 121

// Bad: two unrelated concepts in one test
assert order.total == 121
assert emailService.wasCalled()

Test Naming

Test names should describe the behavior, not the implementation. A good test name answers: "What scenario is being tested, and what is the expected outcome?"

Patterns that work across languages:

```
should_return_zero_when_cart_is_empty
rejects_negative_quantities
applies_discount_for_premium_customers
```

Avoid names like `testCalculate`, `test1`, or `testGetterSetter`.

Test Independence and Isolation

Each test must be completely independent of every other test:

No shared mutable state between tests
No required execution order
Each test sets up its own preconditions and cleans up after itself
A single failing test should not cascade into other failures
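
One way to meet these rules is to have every test build its own data instead of reading a shared module-level object. The sketch below assumes a hypothetical `make_inventory` factory and `remove_item` function; neither comes from a real library.

```python
def make_inventory():
    """Hypothetical factory: returns a fresh, unshared dict on every call."""
    return {"apple": 3}


def remove_item(inventory, name):
    """Hypothetical function under test: take one unit out of stock."""
    inventory[name] -= 1
    return inventory[name]


def test_remove_item_decrements_stock():
    inventory = make_inventory()  # this test's own preconditions
    assert remove_item(inventory, "apple") == 2


def test_stock_starts_at_three():
    inventory = make_inventory()  # not affected by the mutation above
    assert inventory["apple"] == 3
```

Because neither test touches state owned by the other, they pass in any order and can run in parallel.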

Deterministic Tests

A test must produce the same result every time it runs, regardless of:

The current time or date
The order of test execution
The machine it runs on
Network availability
Other tests running in parallel

Non-deterministic tests (flaky tests) destroy trust in the test suite and are worse than no tests at all.
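
A common way to remove the dependency on the current time is to pass the clock value in as a parameter. The following is a minimal sketch under that assumption; `is_expired` and its `now` parameter are hypothetical names for illustration.

```python
from datetime import datetime, timezone


def is_expired(expires_at, now=None):
    """Hypothetical function: callers may supply `now` instead of
    letting the function read the system clock."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at


def test_token_is_expired_at_its_expiry_instant():
    expires_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
    fixed_now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    assert is_expired(expires_at, now=fixed_now)


def test_token_is_valid_one_second_before_expiry():
    expires_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
    fixed_now = datetime(2023, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
    assert not is_expired(expires_at, now=fixed_now)
```

Both tests give the same result on any machine and on any date, because the only time value involved is the one the test supplies.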

FIRST Principles

Principle	Meaning
Fast	Tests should run in seconds, not minutes. Slow tests don't get run.
Independent	No test relies on the output of another test.
Repeatable	Same result in any environment — local, CI, staging.
Self-validating	Pass or fail with no human interpretation required.
Timely	Written at the right time — ideally before or alongside the production code.

Test-Driven Development (TDD)

TDD is a design discipline where tests are written before production code, following a tight feedback loop.

Red-Green-Refactor Cycle

Red — Write a failing test that describes the desired behavior
Green — Write the simplest production code that makes the test pass
Refactor — Improve the code structure while keeping all tests green

Rules:

Never write production code without a failing test
Write only enough of a test to fail (a compilation failure counts)
Write only enough production code to pass the current failing test
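
The cycle can be sketched on a small example. The listing shows the end state after two passes through the loop, with comments marking the order in which each piece would have been written; `cart_total` is a hypothetical function chosen for illustration.

```python
# Pass 1, Red: written first. It fails because cart_total does not exist yet.
def test_should_return_zero_when_cart_is_empty():
    assert cart_total([]) == 0


# Pass 1, Green: the simplest code that passes would be `return 0`.

# Pass 2, Red: a second test that `return 0` cannot satisfy.
def test_sums_the_prices_of_all_items():
    assert cart_total([5, 7]) == 12


# Pass 2, Green and Refactor: the implementation is generalized just
# enough to pass both tests, then tidied while both stay green.
def cart_total(prices):
    return sum(prices)
```

Each new test forces only a small step in the production code, which is what keeps the feedback loop tight.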

Two Schools of TDD

Aspect	Chicago (Classical)	London (Mockist)
Verification	State-based	Interaction-based
Direction	Inside-out	Outside-in
Collaborators	Real objects	Mocks/stubs
Strength	Refactoring-resilient tests	Drives interface design
Risk	Complex setup for deep graphs	Tests coupled to implementation

See TDD Schools reference for detailed comparison and guidance.
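
The difference in verification style can be illustrated with the same class tested both ways. This is a sketch, not a recommendation of either school: `Account` and `InMemoryAuditLog` are hypothetical, and `Mock` comes from Python's standard `unittest.mock` module.

```python
from unittest.mock import Mock


class InMemoryAuditLog:
    """Hypothetical real collaborator, simple enough to use directly."""
    def __init__(self):
        self.entries = []

    def record(self, action, amount):
        self.entries.append((action, amount))


class Account:
    """Hypothetical class under test."""
    def __init__(self, balance, audit_log):
        self.balance = balance
        self._audit_log = audit_log

    def withdraw(self, amount):
        self.balance -= amount
        self._audit_log.record("withdraw", amount)


def test_withdraw_reduces_balance_classical():
    # Chicago style: real collaborator, assert on resulting state.
    account = Account(100, InMemoryAuditLog())
    account.withdraw(30)
    assert account.balance == 70


def test_withdraw_records_audit_entry_mockist():
    # London style: mocked collaborator, assert on the interaction.
    audit_log = Mock()
    account = Account(100, audit_log)
    account.withdraw(30)
    audit_log.record.assert_called_once_with("withdraw", 30)
```

The first test survives a rename of `record`; the second does not, which is the coupling risk listed in the table.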

When TDD Helps Most

Business logic with clear rules and edge cases
Algorithm design
API contract definition
Bug reproduction and fixing (write the failing test first)

When TDD May Not Apply

Exploratory prototyping (write tests after you understand the shape)
UI layout and styling
One-off scripts

Test Doubles

Test doubles replace real dependencies during testing. Each type serves a different purpose.

Double	Purpose	Verifies?
Dummy	Fill parameter lists. Never actually used.	No
Stub	Provide canned responses to method calls.	No
Spy	Record interactions for later assertion.	Yes (after the fact)
Mock	Pre-programmed with expectations. Fails if not called correctly.	Yes (inline)
Fake	Simplified working implementation (e.g., in-memory repository).	No

See Test Doubles reference for detailed guidance on when to use each type.
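
Three of the doubles in the table can be hand-written in a few lines each. The sketch below is illustrative only: every class and function name is hypothetical, and dummies and mocks are left out for brevity.

```python
class StubRateProvider:
    """Stub: returns a canned exchange rate, verifies nothing."""
    def rate(self, currency):
        return 2


class SpyNotifier:
    """Spy: records calls so a test can assert on them afterwards."""
    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)


class FakeRepository:
    """Fake: a simplified but working in-memory implementation."""
    def __init__(self):
        self._rows = {}

    def save(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows[key]


def convert_and_store(amount, currency, rates, repo, notifier):
    """Hypothetical code under test."""
    converted = amount * rates.rate(currency)
    repo.save(currency, converted)
    notifier.send(f"stored {converted}")
    return converted


def test_stores_the_converted_amount():
    repo = FakeRepository()
    convert_and_store(10, "EUR", StubRateProvider(), repo, SpyNotifier())
    assert repo.get("EUR") == 20


def test_notifies_after_storing():
    notifier = SpyNotifier()
    convert_and_store(10, "EUR", StubRateProvider(), FakeRepository(), notifier)
    assert notifier.sent == ["stored 20"]
```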

Key Principle: Mock at Boundaries

Use test doubles at architectural boundaries (ports, external services), not between internal collaborators. Mocking internal classes couples your tests to implementation details and makes refactoring painful.
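
As a rough sketch of this principle, the test below replaces only the external payment service and keeps the internal collaborator real. All names (`Checkout`, `PriceCalculator`, `FakePaymentGateway`) are hypothetical.

```python
class PriceCalculator:
    """Internal collaborator: cheap and deterministic, so use the real one."""
    def total(self, prices):
        return sum(prices)


class FakePaymentGateway:
    """Double for an external service at the architectural boundary."""
    def __init__(self):
        self.charged = []

    def charge(self, amount):
        self.charged.append(amount)
        return True


class Checkout:
    """Hypothetical class under test."""
    def __init__(self, calculator, gateway):
        self._calculator = calculator
        self._gateway = gateway

    def pay(self, prices):
        amount = self._calculator.total(prices)
        return self._gateway.charge(amount)


def test_checkout_charges_the_order_total():
    gateway = FakePaymentGateway()
    checkout = Checkout(PriceCalculator(), gateway)

    checkout.pay([40, 60])

    assert gateway.charged == [100]
```

If `PriceCalculator` were later inlined into `Checkout`, only the constructor call in the test would change; the assertion would stay as written.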

What to Test / What Not to Test

High Value — Always Test

Business rules and domain logic
Edge cases, boundary conditions, error paths
State transitions and workflows
Input validation and sanitization
Security-critical paths (authentication, authorization)
Data transformations and calculations

Low Value — Usually Skip

Trivial getters/setters with no logic
Framework-generated code (ORM mappings, routing config)
Third-party library internals (test your integration, not their code)
Private methods (test through the public API)
Logging and telemetry (unless business-critical)

Testing Implementation vs Behavior

Test behavior, not implementation. A good test describes what the system does, not how it does it internally.

Signs you are testing implementation:

Test breaks when you refactor without changing behavior
Test asserts the order of internal method calls
Test verifies private state rather than public output
Renaming an internal class breaks tests for unrelated features

Signs you are testing behavior:

Test describes a user-meaningful scenario
Test remains green after internal refactoring
Test asserts on outputs, side effects, or state changes visible through the public API
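
The contrast can be shown with two tests of the same class. In this sketch, `WordCounter` is a hypothetical class; the first test illustrates what to avoid and the second what to prefer.

```python
class WordCounter:
    """Hypothetical class under test."""
    def __init__(self):
        self._counts = {}

    def add(self, word):
        self._counts[word] = self._counts.get(word, 0) + 1

    def count(self, word):
        return self._counts.get(word, 0)


def test_add_stores_word_in_private_dict():
    # Implementation-coupled: reads private state, so it would break if
    # _counts were replaced by another data structure.
    counter = WordCounter()
    counter.add("a")
    assert counter._counts == {"a": 1}


def test_counts_each_added_word():
    # Behavior-focused: only uses the public API, so it stays green
    # through internal refactoring.
    counter = WordCounter()
    counter.add("a")
    counter.add("a")
    assert counter.count("a") == 2
```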

Testing Strategies by Layer

Different architectural layers call for different testing approaches. See Testing Strategies reference for detailed guidance.

Layer	Primary Test Type	Key Technique
Domain/Business Logic	Unit tests	State-based verification, no I/O
Application Services	Unit + Integration	Test doubles for infrastructure ports
Data Access	Integration	Real database (test containers, in-memory)
API Endpoints	Integration + Contract	Request/response validation
UI Components	Component tests	Interaction simulation
Full System	E2E (selective)	Critical paths only

Common Antipatterns

Antipattern	Symptoms	Fix
Brittle tests	Tests break on every refactor even when behavior is unchanged	Test behavior through public API, not internal structure
Testing implementation	Asserting on method call order, private state, internal wiring	Assert on outputs and observable side effects
Slow test suite	Test suite takes 10+ minutes; developers skip running tests	Push tests down the pyramid; use test doubles for I/O
Flaky tests	Tests pass/fail randomly without code changes	Remove time dependencies, shared state, and ordering assumptions
Excessive mocking	More mock setup than actual test logic; tests are unreadable	Use real collaborators where possible; mock only at boundaries
Test data coupling	Tests share fixtures and break when shared data changes	Each test creates its own data; use builders/factories
Missing error paths	Only happy path tested; failures discovered in production	Explicitly test error cases, edge cases, and boundary conditions
Commented-out tests	Failing tests are disabled rather than fixed or deleted	Fix the test, or delete it if the behavior changed intentionally
Giant test methods	Tests are 50+ lines with multiple acts and asserts	Split into focused tests; extract setup into helpers
No assertion	Test executes code but never asserts anything	Every test must have at least one meaningful assertion
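
The builders/factories fix for test data coupling can be sketched as a small factory with defaults, where each test overrides only the field it cares about. The `Order` type, the `order_total` function, the `an_order` helper, and the 10% premium discount are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical domain object."""
    customer: str
    quantity: int
    unit_price: int
    premium: bool


def order_total(order):
    """Hypothetical code under test: 10% off for premium customers."""
    if order.quantity < 0:
        raise ValueError("quantity must not be negative")
    total = order.quantity * order.unit_price
    if order.premium:
        total = total * 90 // 100
    return total


def an_order(**overrides):
    """Test data factory: valid defaults, overridden per test."""
    defaults = dict(customer="any", quantity=1, unit_price=100, premium=False)
    defaults.update(overrides)
    return Order(**defaults)


def test_applies_discount_for_premium_customers():
    assert order_total(an_order(premium=True)) == 90


def test_rejects_negative_quantities():
    try:
        order_total(an_order(quantity=-1))
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative quantity")
```

Because each test creates its own `Order`, changing the defaults for one scenario cannot silently break another test.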
