Unit Testing Best Practices

Comprehensive guidance on writing reliable, maintainable, and effective unit tests. Covers core principles, structural patterns, isolation strategies, and common pitfalls.

What Is a Unit Test

A unit test exercises the smallest individual component of code—a function, method, or class—in complete isolation, verifying that its actual behavior matches its expected behavior.

Key properties of a good unit test: readable, isolated, reliable, simple, fast, and timely.

Unit tests serve a dual purpose: they validate behavior and act as executable documentation that never goes out of sync with the code.

Core Principles

Isolation Is Non-Negotiable

Unit tests must run without connecting to external systems: databases, file systems, network APIs, or third-party services. Isolation ensures:

Tests run fast (milliseconds, not seconds)
Test failures point directly to the unit under test, not to infrastructure
Tests remain deterministic across environments and runs

Replace external dependencies with test doubles (stubs, mocks, spies) to maintain isolation.
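
As a minimal sketch of this idea, the Python example below replaces a network-backed dependency with a hand-written stub. `ExchangeRateSource`, `PriceConverter`, and `StubRateSource` are hypothetical names invented for illustration; they do not come from any real library.

```python
from typing import Protocol


class ExchangeRateSource(Protocol):
    """Hypothetical dependency that would normally call a network API."""

    def rate(self, from_currency: str, to_currency: str) -> float: ...


class PriceConverter:
    """Unit under test: receives its dependency instead of creating it."""

    def __init__(self, rates: ExchangeRateSource) -> None:
        self._rates = rates

    def convert(self, amount: float, from_currency: str, to_currency: str) -> float:
        return round(amount * self._rates.rate(from_currency, to_currency), 2)


class StubRateSource:
    """Stub: returns a canned rate and never touches the network."""

    def __init__(self, fixed_rate: float) -> None:
        self._fixed_rate = fixed_rate

    def rate(self, from_currency: str, to_currency: str) -> float:
        return self._fixed_rate


def test_convert_known_rate_returns_converted_amount() -> None:
    # Arrange: the stub pins the rate, so no external system is involved.
    converter = PriceConverter(StubRateSource(fixed_rate=2.0))

    # Act
    result = converter.convert(10.0, "USD", "EUR")

    # Assert
    assert result == 20.0
```

Because the rate is fixed by the test itself, a failure here can only point at `PriceConverter`, not at infrastructure.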

Determinism: Tests Must Be Predictable

A test is deterministic when it always produces the same result given unchanged production code. Non-deterministic tests—those that sometimes pass and sometimes fail without code changes—destroy developer trust.

Sources of non-determinism to eliminate:

Dependency on current time, date, or locale
Reliance on shared mutable state between tests
Calls to real external services
Dependence on test execution order
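
One common way to remove the dependency on the current time is to pass the clock in as a parameter. The sketch below assumes a hypothetical `greeting` function; the fixed timestamp is arbitrary and chosen only for the example.

```python
from datetime import datetime, timezone
from typing import Callable


def greeting(now: Callable[[], datetime]) -> str:
    """Return a greeting based on the hour reported by the injected clock."""
    return "Good morning" if now().hour < 12 else "Good afternoon"


# Arbitrary fixed instant used as a frozen clock in the test.
FIXED_MORNING = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)


def test_greeting_before_noon_returns_good_morning() -> None:
    # The test supplies the clock, so the outcome is identical on every run.
    result = greeting(now=lambda: FIXED_MORNING)

    assert result == "Good morning"
```

Production code would pass a real clock (for example `lambda: datetime.now(timezone.utc)`), while tests pass a frozen one.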

One Concern Per Test

Each test verifies a single end result from a single unit of work. End results are:

A return value
A change to system state
A call to a third-party dependency

When a test asserts on multiple unrelated outcomes, it becomes harder to diagnose failures and indicates the test is covering more than one concern.

Tests Are First-Class Code

Test code has the same quality requirements as production code: readability, maintainability, and correctness. A buggy test suite is worse than no tests at all—it provides false confidence.

Structural Patterns

The AAA Pattern (Arrange-Act-Assert)

Every test method follows three distinct phases:

Arrange: Create and configure all objects and preconditions needed for the test
Act: Call the method or trigger the behavior under test; capture the actual result
Assert: Compare the actual result against the expected result

Clearly delimit these three phases (via whitespace or comments) to improve readability at a glance.
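
The three phases might look like this in a small Python test. `ShoppingCart` is a hypothetical class written only for this illustration, and prices are kept in integer cents to avoid floating-point noise.

```python
class ShoppingCart:
    """Hypothetical unit under test; prices are stored in cents."""

    def __init__(self) -> None:
        self._prices_in_cents = []

    def add(self, price_in_cents: int) -> None:
        self._prices_in_cents.append(price_in_cents)

    def total(self) -> int:
        return sum(self._prices_in_cents)


def test_total_two_items_returns_sum_of_prices() -> None:
    # Arrange
    cart = ShoppingCart()
    cart.add(350)
    cart.add(125)

    # Act
    total = cart.total()

    # Assert
    assert total == 475
```

The comments are optional; blank lines between the phases are often enough once a team is used to the pattern.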

One Act Per Test Method

Avoid multiple Act steps in a single test. When a test exercises two different behaviors, it becomes impossible to tell at a glance which behavior caused a failure. Create a separate test method for each behavior being verified.

Naming Convention: Method-State-Expected

Use a three-part naming pattern that makes the test self-documenting:

```
MethodName_StateUnderTest_ExpectedBehavior
```

Examples:

```
Add_TwoPositiveNumbers_ReturnsCorrectSum
ParseDate_InvalidFormat_ThrowsFormatException
GetUser_UserDoesNotExist_ReturnsNull
```

A good name communicates three things without reading the test body: what is being tested, under what conditions, and what result is expected. When a test fails, its name alone should indicate which scenario broke.

Avoid Magic Values

Hardcoded literal strings and numbers in tests obscure intent and make tests brittle. Use named constants or variables that communicate meaning:

Instead of `"123456789"` → `const INVALID_IDENTITY_NUMBER = "123456789"`
Instead of `42` → `const MAX_RETRY_ATTEMPTS = 42`

Named values also serve as documentation—they explain why that specific value is being used.
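
A short Python sketch of the same idea follows. The validation rule (exactly 10 digits) is an assumption made up for this example, not a real identity-number format.

```python
def is_valid_identity_number(value: str) -> bool:
    """Hypothetical rule for illustration: exactly 10 digits."""
    return len(value) == 10 and value.isdigit()


# Nine digits: one short of the hypothetical 10-digit rule.
INVALID_IDENTITY_NUMBER = "123456789"


def test_is_valid_identity_number_too_short_returns_false() -> None:
    result = is_valid_identity_number(INVALID_IDENTITY_NUMBER)

    assert result is False
```

The constant name tells the reader that the value is meant to be rejected, and the comment next to it records why.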

Use Helper Methods for Shared Setup

When multiple tests require the same object configuration, extract a factory or setup helper method rather than duplicating the construction logic inline. Benefits:

Changes to the object's constructor require updates in one place only
Test bodies remain focused on behavior, not setup
Reduces cognitive overhead when reading tests
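
A factory helper of this kind could be sketched as below. `User`, `can_log_in`, and `make_user` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    email: str
    is_active: bool


def can_log_in(user: User) -> bool:
    """Hypothetical behavior under test."""
    return user.is_active


def make_user(**overrides) -> User:
    """Test helper: builds a valid User; each test overrides only what it cares about."""
    defaults = {"name": "Test User", "email": "test@example.com", "is_active": True}
    defaults.update(overrides)
    return User(**defaults)


def test_can_log_in_inactive_user_returns_false() -> None:
    # Arrange: only the field relevant to this scenario is spelled out.
    user = make_user(is_active=False)

    # Act
    result = can_log_in(user)

    # Assert
    assert result is False
```

If `User` later gains a required field, only `make_user` needs to change.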

Isolation Strategies

Test Doubles: The Right Tool for Each Job

The term "mock" is often used loosely, but test doubles come in distinct types with different purposes. Choosing the right type prevents over-specification and brittle tests.

| Double Type | Purpose |
| --- | --- |
| Stub | Returns predefined data for a dependency; used to control the test environment |
| Mock | Records calls and verifies that expected interactions occurred; used for behavioral verification |
| Spy | Like a mock but wraps the real object; allows partial verification without full replacement |
| Fake | A lightweight working implementation (e.g., an in-memory repository) used when stubs are too simple |
| Dummy | A placeholder passed to satisfy a parameter; never actually used in the test |

Use stubs and fakes when verifying return values or state changes. Use mocks and spies when verifying that a specific interaction with a dependency occurred.
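
The difference between stub-style and mock-style usage can be sketched with Python's standard `unittest.mock.Mock`. `SignupService` and its collaborators are hypothetical; the same `Mock` class plays a different role in each test depending on what the test asserts.

```python
from unittest.mock import Mock


class SignupService:
    """Hypothetical unit under test with two injected dependencies."""

    def __init__(self, user_repository, mailer) -> None:
        self._users = user_repository
        self._mailer = mailer

    def register(self, email: str) -> bool:
        if self._users.exists(email):
            return False
        self._users.add(email)
        self._mailer.send_welcome(email)
        return True


def test_register_existing_email_returns_false() -> None:
    # Stub usage: a canned answer steers the code path; the assertion is on the return value.
    user_repository = Mock()
    user_repository.exists.return_value = True
    service = SignupService(user_repository, mailer=Mock())  # the mailer acts as a dummy here

    result = service.register("taken@example.com")

    assert result is False


def test_register_new_email_sends_welcome_message() -> None:
    # Mock usage: the assertion is on the interaction with the dependency.
    user_repository = Mock()
    user_repository.exists.return_value = False
    mailer = Mock()
    service = SignupService(user_repository, mailer)

    service.register("new@example.com")

    mailer.send_welcome.assert_called_once_with("new@example.com")
```

Asserting on interactions only where the interaction is the end result keeps the first test free to survive internal refactoring.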

For a detailed comparison of test double types and when to use each, see references/test-doubles.md.

Avoid Testing Through Implementation Details

Tests that couple to internal implementation details—private methods, specific internal state, the exact sequence of internal calls—become brittle. When the implementation changes but the behavior stays the same, those tests break unnecessarily.

Test through the public interface. Verify observable outcomes: return values, state changes visible through public accessors, and calls to external dependencies.

Test Quality Properties

Speed

Fast tests get run frequently; slow tests get run infrequently or skipped
A common threshold: any test exceeding 75–100ms is considered slow
Ensure speed by keeping tests simple, replacing external dependencies with test doubles, and avoiding interdependencies between tests

Simplicity and Low Cyclomatic Complexity

Keep test logic free of conditional branches (`if`, `for`, `while`, `switch`). Test methods that contain branching logic are themselves complex enough to contain bugs. If multiple input scenarios need verification, use parameterized tests instead of loops within a single test.
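
As a rough illustration using only Python's standard library, the sketch below drives one assertion from a table of cases with `unittest`'s `subTest`. The standard library has no parametrize decorator, so a single data-iteration line remains; the test body itself contains no branching, and each case is reported separately on failure. Projects that use pytest would typically use `pytest.mark.parametrize` instead, which removes the loop entirely. `is_leap_year` is a hypothetical function under test.

```python
import unittest


def is_leap_year(year: int) -> bool:
    """Hypothetical function under test (Gregorian leap-year rule)."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


class IsLeapYearTests(unittest.TestCase):
    # Each row is (input year, independently known expected result).
    CASES = [
        (2024, True),
        (2023, False),
        (1900, False),
        (2000, True),
    ]

    def test_is_leap_year_known_years_returns_expected(self) -> None:
        for year, expected in self.CASES:
            # subTest reports every failing row individually instead of stopping at the first.
            with self.subTest(year=year):
                self.assertEqual(is_leap_year(year), expected)
```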

No Duplication of Implementation Logic

Tests that replicate the production algorithm inside the test body provide no real safety net. If the algorithm is wrong, the mirrored test logic will be wrong in the same way, and the test will still pass. Tests must encode the expected outcome as a fixed, independently derived value—not compute it using the same logic.
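
The contrast can be made concrete with a small, hypothetical `apply_discount` function. The first test mirrors the production formula and would keep passing even if the formula were wrong; the second pins the expected value, worked out by hand (15% of 20.00 is 3.00, leaving 17.00).

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical production function: price after a percentage discount."""
    return price_cents - (price_cents * percent) // 100


PRICE_CENTS = 2000
DISCOUNT_PERCENT = 15
EXPECTED_DISCOUNTED_CENTS = 1700  # derived by hand, not by the formula


def test_apply_discount_mirrors_implementation_antipattern() -> None:
    # Anti-pattern: the expectation repeats the production algorithm.
    expected = PRICE_CENTS - (PRICE_CENTS * DISCOUNT_PERCENT) // 100

    assert apply_discount(PRICE_CENTS, DISCOUNT_PERCENT) == expected


def test_apply_discount_fifteen_percent_returns_fixed_value() -> None:
    # Preferred: the expectation is a fixed, independently derived value.
    result = apply_discount(PRICE_CENTS, DISCOUNT_PERCENT)

    assert result == EXPECTED_DISCOUNTED_CENTS
```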

Comprehensive Coverage

Cover both positive and negative paths:

Positive cases: Valid inputs producing expected results
Negative cases: Invalid, unexpected, or boundary inputs
Edge cases: Empty values, nulls, maximum/minimum values, boundary conditions

Target 70–80% code coverage as a practical baseline. Coverage is a useful indicator but not a goal in itself—100% coverage with low-quality tests is worse than 75% coverage with high-quality tests.

Environment Restoration (Teardown)

After each test, restore the environment to a clean state. Leftover state from one test can cause unpredictable failures in subsequent tests. Common teardown actions:

Delete temporary files
Reset global or shared state
Close database connections or file handles
Release resources acquired during the test
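
Resetting shared state might look like the following sketch, which uses `unittest`'s `setUp` and `tearDown` hooks. `FEATURE_FLAGS` and `checkout_label` are hypothetical stand-ins for module-level state in production code.

```python
import unittest

# Hypothetical module-level state that production code reads.
FEATURE_FLAGS = {"new_checkout": False}


def checkout_label() -> str:
    return "Pay now" if FEATURE_FLAGS["new_checkout"] else "Checkout"


class CheckoutLabelTests(unittest.TestCase):
    def setUp(self) -> None:
        # Snapshot the shared state before each test.
        self._original_flags = dict(FEATURE_FLAGS)

    def tearDown(self) -> None:
        # Restore the snapshot after each test, whether it passed or failed.
        FEATURE_FLAGS.clear()
        FEATURE_FLAGS.update(self._original_flags)

    def test_checkout_label_flag_enabled_returns_pay_now(self) -> None:
        FEATURE_FLAGS["new_checkout"] = True

        self.assertEqual(checkout_label(), "Pay now")

    def test_checkout_label_flag_default_returns_checkout(self) -> None:
        self.assertEqual(checkout_label(), "Checkout")
```

Without the teardown, the second test would pass or fail depending on whether the first one had already run.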

Integration With the Development Process

Run Tests as Part of CI/CD

Unit tests run automatically on every code change through a CI/CD pipeline. A failing test marks the build as broken and prevents broken code from reaching downstream environments. Running tests locally is necessary but not sufficient—the pipeline provides the authoritative safety net.

Test-Driven Development (TDD)

TDD inverts the usual workflow: write a failing test first, then write the minimal production code to make it pass, then refactor. Benefits:

Forces the developer to define expected behavior before implementation
Naturally produces testable code (if code is hard to test, TDD surfaces that immediately)
Results in a test suite that documents intent, not just behavior

Testable Code Architecture

If adding unit tests to a piece of code is difficult, that difficulty signals a design problem. Common architectural enablers of testability:

Dependency injection: Dependencies are provided externally rather than constructed internally, making them replaceable with test doubles
Single responsibility: Small, focused units are easier to test in isolation than large units with many concerns
Pure functions: Functions with no side effects and no external dependencies are trivially testable
Avoiding global state: Global mutable state creates hidden dependencies between tests

Difficulty writing unit tests is a signal to refactor the production code, not to skip testing.
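
One possible refactoring in this spirit separates pure logic from I/O, so the logic can be tested without touching the file system. The sketch assumes a simple `key = value` line format; `parse_config_line` and `load_config` are hypothetical names, and the type hints assume Python 3.9 or later.

```python
def parse_config_line(line: str) -> tuple[str, str]:
    """Pure function: same input, same output, no I/O."""
    key, _, value = line.partition("=")
    return key.strip(), value.strip()


def load_config(path: str) -> dict[str, str]:
    """Thin I/O shell around the pure parser (not exercised by the unit test)."""
    with open(path, encoding="utf-8") as handle:
        return dict(parse_config_line(line) for line in handle if line.strip())


def test_parse_config_line_padded_key_and_value_returns_trimmed_pair() -> None:
    result = parse_config_line(" timeout = 30 \n")

    assert result == ("timeout", "30")
```

The parsing rules, where bugs are most likely, are covered by a fast unit test; the file-reading shell is small enough to leave to integration tests.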

Common Pitfalls

| Pitfall | Why It Hurts | Remedy |
| --- | --- | --- |
| Complex logic in tests | Tests become buggy and untrustworthy | Keep cyclomatic complexity near 1 |
| Multiple acts in one test | Failures are ambiguous | One act per test method |
| Testing implementation details | Tests break on refactoring | Test through the public interface |
| Non-deterministic tests | Developers lose trust in the suite | Eliminate time, randomness, and shared state dependencies |
| Magic literals | Intent is obscured | Use named constants |
| Mirroring implementation logic | Tests can't catch bugs in the logic | Use independent, fixed expected values |
| Slow tests | Tests are run infrequently | Mock external dependencies; keep tests simple |
| Missing teardown | Tests pollute each other's environments | Always restore state after each test |

Quick Reference

Core rules:

One concern, one test
One act per test method
Arrange → Act → Assert
No external dependencies—use test doubles
Name tests: `Method_State_Expected`
No logic in tests (no `if`/`for`/`while`)
Use named constants, not magic values
Restore state after each test

Test coverage targets: 70–80% is a practical baseline

Test speed threshold: Tests taking >75–100ms warrant review

Additional Resources

For deeper detail on specific topics:

references/test-doubles.md - Detailed breakdown of stub, mock, spy, fake, and dummy differences with decision guidance
references/test-patterns.md - AAA pattern, BDD Given/When/Then style, naming conventions, and parameterized testing strategies
