Testing

A disciplined approach to verifying that software behaves correctly, remains stable under change, and communicates intent to future developers. Good tests act as living documentation, a safety net for refactoring, and a design feedback mechanism.
This skill covers universal testing concepts that apply regardless of language, framework, or tooling.

When to Use

  • Designing a test strategy for a new project or feature
  • Deciding what level of testing (unit, integration, e2e) a piece of code needs
  • Evaluating whether existing tests are providing value or creating drag
  • Applying TDD to drive design decisions
  • Debugging a flaky or brittle test suite
  • Reviewing test code for quality and maintainability

Testing Pyramid

The testing pyramid describes the ideal distribution of tests across three levels. More tests at the base, fewer at the top.
        /   E2E   \       Few, slow, expensive
       /-----------\
      / Integration \     Moderate number, moderate speed
     /---------------\
    /   Unit Tests    \   Many, fast, cheap
   /___________________\

Unit Tests (Base)

  • Test a single unit of behavior in isolation (a function, a method, a small class)
  • No I/O, no database, no network, no file system
  • Execute in milliseconds
  • Should form the majority of your test suite (roughly 70%)
  • Fast feedback loop enables rapid iteration

Integration Tests (Middle)

  • Test how multiple units collaborate, or how code interacts with external systems
  • May involve a real database, message queue, or HTTP endpoint
  • Execute in seconds
  • Verify that wiring, configuration, and contracts between components work
  • Roughly 20% of your test suite

End-to-End Tests (Top)

  • Test complete user journeys through the full system
  • Interact with the application as a user would
  • Slowest, most brittle, most expensive to maintain
  • Reserve for critical business paths only
  • Roughly 10% of your test suite

The Ice Cream Cone Antipattern

The inverted pyramid: many e2e tests, few unit tests. Symptoms:
  • Test suite takes hours to run
  • Tests break constantly due to UI changes or timing issues
  • Developers stop running tests locally
  • Feedback loop is too slow to support continuous delivery
Fix: Identify what each e2e test is actually verifying. Push that verification down to the lowest possible level. Most business logic can be tested at the unit level.

Test Design Principles

Arrange-Act-Assert (AAA)

Every test should follow three distinct phases:
  1. Arrange — set up the preconditions and inputs
  2. Act — execute the behavior under test
  3. Assert — verify the expected outcome
Keep each phase clearly separated. If Arrange dominates the test, extract a builder or factory. If Act requires multiple steps, you may be testing too much at once.
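A minimal Python sketch of the three phases, written as a plain pytest-style test function. ShoppingCart and its methods are hypothetical and defined inline only so the example runs on its own; the comments mark where each phase begins.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ShoppingCart:
    """Hypothetical cart, defined here only so the test can run."""

    # sku -> (unit price in cents, quantity)
    items: dict[str, tuple[int, int]] = field(default_factory=dict)

    def add(self, sku: str, unit_price_cents: int, quantity: int = 1) -> None:
        price, current_qty = self.items.get(sku, (unit_price_cents, 0))
        self.items[sku] = (price, current_qty + quantity)

    def total_cents(self) -> int:
        return sum(price * qty for price, qty in self.items.values())


def test_total_is_price_times_quantity_summed_over_items():
    # Arrange: set up the preconditions and inputs
    cart = ShoppingCart()
    cart.add("apple", unit_price_cents=50, quantity=3)
    cart.add("bread", unit_price_cents=250)

    # Act: execute the single behavior under test
    total = cart.total_cents()

    # Assert: verify the expected outcome
    assert total == 400
```

A runner such as pytest collects functions whose names start with test_; calling the function directly also works, since a failing assert raises AssertionError.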

One Assertion per Concept

A test should verify one logical concept. This does not mean literally one assert call: asserting multiple properties of a single result is fine. What matters is that the test fails for exactly one reason.
// Good: one concept — "completed order has correct totals"
assert order.subtotal == 100
assert order.tax == 21
assert order.total == 121

// Bad: two unrelated concepts in one test
assert order.total == 121
assert emailService.wasCalled()
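The same idea as runnable Python, assuming a hypothetical complete_order function that applies a 21% tax rate (chosen to match the 100 / 21 / 121 figures above) and then notifies an email service. The two unrelated concepts live in two separate tests, so each test can fail for only one reason.

```python
from dataclasses import dataclass
from unittest.mock import Mock

TAX_RATE_PERCENT = 21  # assumption chosen to match the 100 / 21 / 121 figures


@dataclass
class Order:
    subtotal: int
    tax: int
    total: int


def complete_order(line_totals, email_service):
    """Hypothetical use case: price the order, then send a confirmation."""
    subtotal = sum(line_totals)
    tax = subtotal * TAX_RATE_PERCENT // 100
    order = Order(subtotal=subtotal, tax=tax, total=subtotal + tax)
    email_service.send_confirmation(order)
    return order


def test_completed_order_has_correct_totals():
    # Several asserts, one concept: the totals of a completed order
    order = complete_order([60, 40], email_service=Mock())
    assert order.subtotal == 100
    assert order.tax == 21
    assert order.total == 121


def test_completing_an_order_sends_a_confirmation_email():
    # The unrelated concept gets its own test
    email_service = Mock()
    order = complete_order([60, 40], email_service)
    email_service.send_confirmation.assert_called_once_with(order)
```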

Test Naming

Test names should describe the behavior, not the implementation. A good test name answers: "What scenario is being tested, and what is the expected outcome?"
Patterns that work across languages:
  • should_return_zero_when_cart_is_empty
  • rejects_negative_quantities
  • applies_discount_for_premium_customers
Avoid names like testCalculate, test1, or testGetterSetter.

Test Independence and Isolation

Each test must be completely independent of every other test:
  • No shared mutable state between tests
  • No required execution order
  • Each test sets up its own preconditions and cleans up after itself
  • A single failing test should not cascade into other failures
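One way to get a fresh fixture for every test with the standard library's unittest: setUp runs before each test method, so no state survives from one test to the next. InMemoryInbox is a hypothetical class used only for illustration.

```python
import unittest


class InMemoryInbox:
    """Hypothetical collaborator with mutable state."""

    def __init__(self):
        self.messages = []

    def receive(self, text):
        self.messages.append(text)


class InboxTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test starts from an empty inbox
        self.inbox = InMemoryInbox()

    def test_new_inbox_is_empty(self):
        self.assertEqual(self.inbox.messages, [])

    def test_receive_appends_message(self):
        self.inbox.receive("hello")
        self.assertEqual(self.inbox.messages, ["hello"])
```

If the inbox were a module-level object shared by both tests, test_new_inbox_is_empty would pass or fail depending on which test ran first, which is exactly the ordering dependency listed above. Run the class with python -m unittest.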

Deterministic Tests

A test must produce the same result every time it runs, regardless of:
  • The current time or date
  • The order of test execution
  • The machine it runs on
  • Network availability
  • Other tests running in parallel
Non-deterministic tests (flaky tests) destroy trust in the test suite and are worse than no tests at all.
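A common technique for removing a time dependency is to inject the clock rather than reading it inside the code under test. The opening-hours rule and the function names below are hypothetical; the point is that each test pins the time, so it returns the same result on any day and any machine.

```python
from datetime import datetime, timezone
from typing import Callable


def is_store_open(now: Callable[[], datetime]) -> bool:
    """Hypothetical rule: open from 09:00 (inclusive) to 17:00 (exclusive) UTC.

    The clock is passed in instead of calling datetime.now() here,
    so tests can control the current time.
    """
    current = now()
    return 9 <= current.hour < 17


def fixed_clock(hour: int) -> Callable[[], datetime]:
    # A clock that always reports the same instant on a fixed date
    instant = datetime(2024, 1, 15, hour, 0, tzinfo=timezone.utc)
    return lambda: instant


def test_store_is_open_at_noon():
    assert is_store_open(fixed_clock(12)) is True


def test_store_is_closed_at_the_closing_hour():
    assert is_store_open(fixed_clock(17)) is False


# Production code would pass the real clock, for example:
# is_store_open(lambda: datetime.now(timezone.utc))
```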

FIRST Principles

| Principle | Meaning |
|---|---|
| Fast | Tests should run in seconds, not minutes. Slow tests don't get run. |
| Independent | No test relies on the output of another test. |
| Repeatable | Same result in any environment: local, CI, staging. |
| Self-validating | Pass or fail with no human interpretation required. |
| Timely | Written at the right time, ideally before or alongside the production code. |

Test-Driven Development (TDD)

TDD is a design discipline where tests are written before production code, following a tight feedback loop.

Red-Green-Refactor Cycle

  1. Red — Write a failing test that describes the desired behavior
  2. Green — Write the simplest production code that makes the test pass
  3. Refactor — Improve the code structure while keeping all tests green
Rules:
  • Never write production code without a failing test
  • Write only enough of a test to make it fail (a compilation failure counts)
  • Write only enough production code to make the current failing test pass
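A compressed sketch of two Red-Green-Refactor cycles for a hypothetical premium-discount rule. A single file can only show the end state, so the comments describe the order in which the pieces were written.

```python
# Cycle 1, Red: this test was written first. It failed with a NameError
# because discounted_price did not exist yet, which counts as a failure.
def test_premium_customers_get_ten_percent_off():
    assert discounted_price(200, is_premium=True) == 180


# Cycle 2, Red: after the simplest Green step (which could even have been
# `return 180`), a second test forces real logic to appear.
def test_regular_customers_pay_full_price():
    assert discounted_price(200, is_premium=False) == 200


# Green, then Refactor: a general implementation, with the magic number
# pulled into a named constant while both tests stayed green.
PREMIUM_DISCOUNT_PERCENT = 10


def discounted_price(price_cents: int, is_premium: bool) -> int:
    if not is_premium:
        return price_cents
    return price_cents * (100 - PREMIUM_DISCOUNT_PERCENT) // 100
```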

Two Schools of TDD

| Aspect | Chicago (Classical) | London (Mockist) |
|---|---|---|
| Verification | State-based | Interaction-based |
| Direction | Inside-out | Outside-in |
| Collaborators | Real objects | Mocks/stubs |
| Strength | Refactoring-resilient tests | Drives interface design |
| Risk | Complex setup for deep object graphs | Tests coupled to implementation |
See the TDD Schools reference for a detailed comparison and guidance.
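To make the contrast concrete, here is the same hypothetical Checkout class tested both ways. The Chicago-style test uses the real TaxCalculator and checks the output; the London-style test replaces the calculator with unittest.mock.Mock and also checks the interaction.

```python
from unittest.mock import Mock


class TaxCalculator:
    """Hypothetical collaborator: 21% tax, rounded down to whole cents."""

    def tax_for(self, amount_cents: int) -> int:
        return amount_cents * 21 // 100


class Checkout:
    def __init__(self, tax_calculator):
        self.tax_calculator = tax_calculator

    def total(self, amount_cents: int) -> int:
        return amount_cents + self.tax_calculator.tax_for(amount_cents)


def test_total_includes_tax_chicago_style():
    # Classical: real collaborator, verify the resulting output
    checkout = Checkout(TaxCalculator())
    assert checkout.total(100) == 121


def test_total_asks_calculator_for_tax_london_style():
    # Mockist: replace the collaborator and verify the interaction as well
    calculator = Mock()
    calculator.tax_for.return_value = 21
    checkout = Checkout(calculator)

    assert checkout.total(100) == 121
    calculator.tax_for.assert_called_once_with(100)
```

The London-style test would fail if Checkout stopped calling tax_for (for example, after inlining the calculation) even though the total stayed correct; that is the coupling risk listed in the table.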

When TDD Helps Most

  • Business logic with clear rules and edge cases
  • Algorithm design
  • API contract definition
  • Bug reproduction and fixing (write the failing test first)

When TDD May Not Apply

  • Exploratory prototyping (write tests after you understand the shape)
  • UI layout and styling
  • One-off scripts

Test Doubles

Test doubles replace real dependencies during testing. Each type serves a different purpose.
| Double | Purpose | Verifies? |
|---|---|---|
| Dummy | Fills parameter lists; never actually used. | No |
| Stub | Provides canned responses to method calls. | No |
| Spy | Records interactions for later assertion. | Yes (after the fact) |
| Mock | Pre-programmed with expectations; fails if not called correctly. | Yes (inline) |
| Fake | Simplified working implementation (e.g., in-memory repository). | No |
See the Test Doubles reference for detailed guidance on when to use each type.
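A sketch of the doubles in plain Python around a hypothetical register use case; all class and function names are illustrative. Note that the standard library's unittest.mock.Mock records calls and checks expectations after the act step, so in this table's terms it behaves more like a spy; strict, expect-first mocks come from other mocking libraries.

```python
from unittest.mock import Mock


class InMemoryUserRepository:
    """Fake: a simplified but working implementation with no real database."""

    def __init__(self):
        self._users = {}

    def save(self, user_id, email):
        self._users[user_id] = email

    def find(self, user_id):
        return self._users.get(user_id)


class StubClock:
    """Stub: always returns the same canned answer."""

    def today(self):
        return "2024-01-15"


class SpyMailer:
    """Spy: records every call so the test can assert on it afterwards."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


def register(user_id, email, repo, clock, mailer, audit_log):
    """Hypothetical use case. audit_log is never touched on this path."""
    repo.save(user_id, email)
    mailer.send(email, f"Welcome! Registered on {clock.today()}")


def test_register_stores_user_and_sends_welcome_mail():
    repo = InMemoryUserRepository()   # fake
    mailer = SpyMailer()              # spy
    unused_audit_log = object()       # dummy: only fills the parameter list

    register(1, "ada@example.com", repo, StubClock(), mailer, unused_audit_log)

    assert repo.find(1) == "ada@example.com"
    assert mailer.sent == [("ada@example.com", "Welcome! Registered on 2024-01-15")]


def test_register_with_a_mock_mailer():
    mailer = Mock()  # records calls; the expectation is checked afterwards
    register(2, "grace@example.com", InMemoryUserRepository(), StubClock(), mailer, None)
    mailer.send.assert_called_once_with(
        "grace@example.com", "Welcome! Registered on 2024-01-15"
    )
```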

Key Principle: Mock at Boundaries

Use test doubles at architectural boundaries (ports, external services), not between internal collaborators. Mocking internal classes couples your tests to implementation details and makes refactoring painful.
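A minimal sketch of the principle, with hypothetical names: PaymentGateway is a port to an external system, so the tests give it a hand-written double; PriceCalculator is an internal collaborator, so the tests use the real class. Moving work between OrderService and PriceCalculator would not break these tests.

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """Port: the architectural boundary to an external payment provider."""

    def charge(self, amount_cents: int) -> bool: ...


class PriceCalculator:
    """Internal collaborator: used for real in tests, never mocked."""

    def total(self, unit_price_cents: int, quantity: int) -> int:
        return unit_price_cents * quantity


class OrderService:
    def __init__(self, gateway: PaymentGateway, calculator: PriceCalculator):
        self.gateway = gateway
        self.calculator = calculator

    def place_order(self, unit_price_cents: int, quantity: int) -> str:
        amount = self.calculator.total(unit_price_cents, quantity)
        return "confirmed" if self.gateway.charge(amount) else "payment_failed"


class RecordingGateway:
    """Double at the boundary: canned approval plus a record of charges."""

    def __init__(self, approve: bool = True):
        self.approve = approve
        self.charged = []

    def charge(self, amount_cents: int) -> bool:
        self.charged.append(amount_cents)
        return self.approve


def test_order_is_confirmed_and_charged_the_computed_total():
    gateway = RecordingGateway(approve=True)
    service = OrderService(gateway, PriceCalculator())  # real internal collaborator

    assert service.place_order(unit_price_cents=250, quantity=4) == "confirmed"
    assert gateway.charged == [1000]


def test_declined_payment_fails_the_order():
    service = OrderService(RecordingGateway(approve=False), PriceCalculator())
    assert service.place_order(unit_price_cents=250, quantity=1) == "payment_failed"
```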

What to Test / What Not to Test

High Value — Always Test

  • Business rules and domain logic
  • Edge cases, boundary conditions, error paths
  • State transitions and workflows
  • Input validation and sanitization
  • Security-critical paths (authentication, authorization)
  • Data transformations and calculations
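Boundary conditions and error paths are often cheapest to cover with a table of cases. This sketch uses unittest's subTest and assertRaises against a hypothetical shipping-fee rule; the 5000-cent threshold and the 499-cent fee are invented for illustration.

```python
import unittest


def shipping_fee_cents(order_total_cents: int) -> int:
    """Hypothetical rule: free shipping from 5000 cents, otherwise 499."""
    if order_total_cents < 0:
        raise ValueError("order total cannot be negative")
    return 0 if order_total_cents >= 5000 else 499


class ShippingFeeTest(unittest.TestCase):
    def test_fee_on_each_side_of_the_free_shipping_threshold(self):
        # Boundary values: zero, just below, exactly at, and just above the threshold
        cases = [(0, 499), (4999, 499), (5000, 0), (5001, 0)]
        for total, expected in cases:
            with self.subTest(total=total):
                self.assertEqual(shipping_fee_cents(total), expected)

    def test_rejects_negative_totals(self):
        # Error path: invalid input is rejected instead of silently priced
        with self.assertRaises(ValueError):
            shipping_fee_cents(-1)
```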

Low Value — Usually Skip

  • Trivial getters/setters with no logic
  • Framework-generated code (ORM mappings, routing config)
  • Third-party library internals (test your integration, not their code)
  • Private methods (test through the public API)
  • Logging and telemetry (unless business-critical)

Testing Implementation vs Behavior

Test behavior, not implementation. A good test describes what the system does, not how it does it internally.
Signs you are testing implementation:
  • Test breaks when you refactor without changing behavior
  • Test asserts the order of internal method calls
  • Test verifies private state rather than public output
  • Renaming an internal class breaks tests for unrelated features
Signs you are testing behavior:
  • Test describes a user-meaningful scenario
  • Test remains green after internal refactoring
  • Test asserts on outputs, side effects, or state changes visible through the public API
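A sketch contrasting the two styles on a hypothetical PasswordPolicy class. The first test patches a private helper and asserts that it was called, so renaming or inlining that helper breaks it; the other two exercise only the public is_acceptable method and survive internal refactoring.

```python
from unittest.mock import patch


class PasswordPolicy:
    """Hypothetical class whose internals may be refactored freely."""

    MIN_LENGTH = 12

    def is_acceptable(self, password: str) -> bool:
        return self._long_enough(password) and self._has_digit(password)

    def _long_enough(self, password: str) -> bool:
        return len(password) >= self.MIN_LENGTH

    def _has_digit(self, password: str) -> bool:
        return any(ch.isdigit() for ch in password)


# Implementation-coupled: breaks if _has_digit is renamed, merged, or no
# longer called, even though the observable behavior is unchanged.
def test_brittle_checks_private_helper_is_called():
    policy = PasswordPolicy()
    with patch.object(PasswordPolicy, "_has_digit", return_value=True) as helper:
        policy.is_acceptable("long-enough-password")
    helper.assert_called_once()


# Behavior-focused: only the public API is used.
def test_rejects_long_password_without_a_digit():
    assert PasswordPolicy().is_acceptable("long-enough-password") is False


def test_accepts_long_password_with_a_digit():
    assert PasswordPolicy().is_acceptable("long-enough-passw0rd") is True
```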

Testing Strategies by Layer

Different architectural layers call for different testing approaches. See the Testing Strategies reference for detailed guidance.
| Layer | Primary Test Type | Key Technique |
|---|---|---|
| Domain/Business Logic | Unit tests | State-based verification, no I/O |
| Application Services | Unit + Integration | Test doubles for infrastructure ports |
| Data Access | Integration | Real database (test containers, in-memory) |
| API Endpoints | Integration + Contract | Request/response validation |
| UI Components | Component tests | Interaction simulation |
| Full System | E2E (selective) | Critical paths only |
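For the Data Access row, one lightweight option in Python is an in-memory SQLite database from the standard sqlite3 module, so the repository under test executes real SQL. SqliteProductRepository is a hypothetical class. In-memory SQLite can behave differently from a production database engine, which is one reason teams often prefer test containers running the real engine.

```python
from __future__ import annotations

import sqlite3


class SqliteProductRepository:
    """Hypothetical data-access class under test; it runs real SQL."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS products "
            "(sku TEXT PRIMARY KEY, price_cents INTEGER NOT NULL)"
        )

    def add(self, sku: str, price_cents: int) -> None:
        self.conn.execute(
            "INSERT INTO products (sku, price_cents) VALUES (?, ?)", (sku, price_cents)
        )

    def cheaper_than(self, limit_cents: int) -> list[str]:
        rows = self.conn.execute(
            "SELECT sku FROM products WHERE price_cents < ? ORDER BY sku", (limit_cents,)
        )
        return [sku for (sku,) in rows]


def test_cheaper_than_filters_and_sorts_with_real_sql():
    # A fresh in-memory database per test keeps the test isolated and fast
    conn = sqlite3.connect(":memory:")
    try:
        repo = SqliteProductRepository(conn)
        repo.add("pen", 150)
        repo.add("book", 1200)
        repo.add("eraser", 90)

        assert repo.cheaper_than(1000) == ["eraser", "pen"]
    finally:
        conn.close()
```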

Common Antipatterns

| Antipattern | Symptoms | Fix |
|---|---|---|
| Brittle tests | Tests break on every refactor even when behavior is unchanged | Test behavior through the public API, not internal structure |
| Testing implementation | Asserting on method call order, private state, internal wiring | Assert on outputs and observable side effects |
| Slow test suite | Suite takes 10+ minutes; developers skip running tests | Push tests down the pyramid; use test doubles for I/O |
| Flaky tests | Tests pass or fail randomly without code changes | Remove time dependencies, shared state, and ordering assumptions |
| Excessive mocking | More mock setup than actual test logic; tests are unreadable | Use real collaborators where possible; mock only at boundaries |
| Test data coupling | Tests share fixtures and break when shared data changes | Each test creates its own data; use builders/factories |
| Missing error paths | Only the happy path is tested; failures are discovered in production | Explicitly test error cases, edge cases, and boundary conditions |
| Commented-out tests | Failing tests are disabled rather than fixed or deleted | Fix the test, or delete it if the behavior changed intentionally |
| Giant test methods | Tests are 50+ lines with multiple acts and asserts | Split into focused tests; extract setup into helpers |
| No assertion | Test executes code but never asserts anything | Every test must have at least one meaningful assertion |
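For the test data coupling fix, a test-data builder gives every test its own object with sensible defaults, and each test spells out only the detail it depends on. Customer, CustomerBuilder, and discount_percent are hypothetical names for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Customer:
    name: str
    country: str
    is_premium: bool


class CustomerBuilder:
    """Hypothetical test-data builder: defaults first, override what matters."""

    def __init__(self) -> None:
        self._customer = Customer(name="Test Customer", country="NL", is_premium=False)

    def premium(self) -> CustomerBuilder:
        self._customer = replace(self._customer, is_premium=True)
        return self

    def in_country(self, country: str) -> CustomerBuilder:
        self._customer = replace(self._customer, country=country)
        return self

    def build(self) -> Customer:
        return self._customer


def discount_percent(customer: Customer) -> int:
    """Hypothetical rule under test."""
    return 10 if customer.is_premium else 0


def test_premium_customer_gets_ten_percent():
    # Each test builds its own data; only the relevant detail is spelled out
    customer = CustomerBuilder().premium().build()
    assert discount_percent(customer) == 10


def test_regular_customer_gets_no_discount():
    customer = CustomerBuilder().in_country("DE").build()
    assert discount_percent(customer) == 0
```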

Quality Checklist

Use this checklist when writing or reviewing tests:
  • Behavior-focused: tests describe what the system does, not how
  • Independent: no test depends on another test's execution or state
  • Deterministic: same result every time, on every machine
  • Fast: unit tests in milliseconds, full suite in under 5 minutes
  • Readable: a new team member can understand the test without reading the implementation
  • Arranged clearly: AAA structure with obvious separation of phases
  • Named descriptively: test name explains the scenario and expected outcome
  • Error paths covered: not just happy path — edge cases and failures are tested
  • Minimal setup: no unnecessary dependencies or fixtures; builders/factories where needed
  • No flakiness: no time-dependent, order-dependent, or environment-dependent tests
  • Appropriate level: tested at the lowest pyramid level that provides confidence
  • Doubles at boundaries: mocks/stubs used at architectural ports, not internal classes