Testing
A disciplined approach to verifying that software behaves correctly, remains stable under change, and communicates intent to future developers. Good tests act as living documentation, a safety net for refactoring, and a design feedback mechanism.
This skill covers universal testing concepts that apply regardless of language, framework, or tooling.
When to Use
- Designing a test strategy for a new project or feature
- Deciding what level of testing (unit, integration, e2e) a piece of code needs
- Evaluating whether existing tests are providing value or creating drag
- Applying TDD to drive design decisions
- Debugging a flaky or brittle test suite
- Reviewing test code for quality and maintainability
Testing Pyramid
The testing pyramid describes the ideal distribution of tests across three levels: many tests at the base, fewer toward the top.
             /   E2E    \            Few, slow, expensive
            /------------\
           / Integration  \          Moderate number, moderate speed
          /----------------\
         /    Unit Tests    \        Many, fast, cheap
        /____________________\
Unit Tests (Base)
- Test a single unit of behavior in isolation (a function, a method, a small class)
- No I/O, no database, no network, no file system
- Execute in milliseconds
- Should form the majority of your test suite (roughly 70%)
- Fast feedback loop enables rapid iteration
Integration Tests (Middle)
- Test how multiple units collaborate, or how code interacts with external systems
- May involve a real database, message queue, or HTTP endpoint
- Execute in seconds
- Verify that wiring, configuration, and contracts between components work
- Roughly 20% of your test suite
End-to-End Tests (Top)
- Test complete user journeys through the full system
- Interact with the application as a user would
- Slowest, most brittle, most expensive to maintain
- Reserve for critical business paths only
- Roughly 10% of your test suite
The Ice Cream Cone Antipattern
The inverted pyramid: many e2e tests, few unit tests. Symptoms:
- Test suite takes hours to run
- Tests break constantly due to UI changes or timing issues
- Developers stop running tests locally
- Feedback loop is too slow to support continuous delivery
Fix: Identify what each e2e test is actually verifying. Push that verification down to the lowest possible level. Most business logic can be tested at the unit level.
Test Design Principles
Arrange-Act-Assert (AAA)
Every test should follow three distinct phases:
- Arrange — set up the preconditions and inputs
- Act — execute the behavior under test
- Assert — verify the expected outcome
Keep each phase clearly separated. If Arrange dominates the test, extract a builder or factory. If Act requires multiple steps, you may be testing too much at once.
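The three phases can be sketched in Python as follows. This is a minimal illustration, not a prescribed layout: the `Cart` class and its methods are hypothetical, invented only to give the test something to exercise.

```python
from dataclasses import dataclass, field


@dataclass
class Cart:
    """Hypothetical class under test; not taken from any real library."""
    prices: list[float] = field(default_factory=list)

    def add(self, price: float) -> None:
        self.prices.append(price)

    def total(self) -> float:
        return sum(self.prices)


def test_total_sums_all_item_prices():
    # Arrange: set up the preconditions and inputs
    cart = Cart()
    cart.add(40.0)
    cart.add(60.0)

    # Act: execute the single behavior under test
    result = cart.total()

    # Assert: verify the expected outcome
    assert result == 100.0
```

Blank lines between the phases are usually enough to keep them visually distinct; the comments are there only for illustration.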
One Assertion per Concept
A test should verify one logical concept. This does not mean literally one `assert` call; asserting multiple properties of a single result is fine. What matters is that the test fails for exactly one reason.
    // Good: one concept, "completed order has correct totals"
    assert order.subtotal == 100
    assert order.tax == 21
    assert order.total == 121
    // Bad: two unrelated concepts in one test
    assert order.total == 121
    assert emailService.wasCalled()
Test Naming
Test names should describe the behavior, not the implementation. A good test name answers: "What scenario is being tested, and what is the expected outcome?"
Patterns that work across languages:
- `should_return_zero_when_cart_is_empty`
- `rejects_negative_quantities`
- `applies_discount_for_premium_customers`
Avoid names like `testCalculate`, `test1`, or `testGetterSetter`.
Test Independence and Isolation
Each test must be completely independent of every other test:
- No shared mutable state between tests
- No required execution order
- Each test sets up its own preconditions and cleans up after itself
- A single failing test should not cascade into other failures
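One way to satisfy these rules is to build fresh state inside every test instead of sharing a module-level object. The sketch below assumes a hypothetical `InMemoryInventory` class and a `make_inventory` helper, both invented for this illustration.

```python
class InMemoryInventory:
    """Hypothetical collaborator used only for this illustration."""

    def __init__(self) -> None:
        self._stock: dict[str, int] = {}

    def add(self, sku: str, quantity: int) -> None:
        self._stock[sku] = self._stock.get(sku, 0) + quantity

    def remove(self, sku: str, quantity: int) -> None:
        if self._stock.get(sku, 0) < quantity:
            raise ValueError("insufficient stock")
        self._stock[sku] -= quantity

    def count(self, sku: str) -> int:
        return self._stock.get(sku, 0)


def make_inventory() -> InMemoryInventory:
    # A fresh instance per test: nothing leaks from one test into the next.
    inventory = InMemoryInventory()
    inventory.add("widget", 5)
    return inventory


def test_remove_reduces_stock():
    inventory = make_inventory()
    inventory.remove("widget", 2)
    assert inventory.count("widget") == 3


def test_remove_rejects_more_than_available():
    inventory = make_inventory()
    try:
        inventory.remove("widget", 6)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    # The failed removal must leave the stock untouched.
    assert inventory.count("widget") == 5
```

Because each test calls `make_inventory()` itself, the two tests pass in either order and when run on their own.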
Deterministic Tests
A test must produce the same result every time it runs, regardless of:
- The current time or date
- The order of test execution
- The machine it runs on
- Network availability
- Other tests running in parallel
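Non-deterministic (flaky) tests destroy trust in the test suite. Once failures are routinely dismissed as noise, real regressions get dismissed too, which can make a flaky suite worse than having no tests at all.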
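A common way to remove the dependency on the current time is to pass the clock in as a parameter. The sketch below is one possible shape, assuming a hypothetical `is_expired` function and `fixed_clock` helper; neither comes from a real library.

```python
from datetime import datetime, timezone
from typing import Callable

Clock = Callable[[], datetime]


def is_expired(expires_at: datetime, now: Clock) -> bool:
    """Hypothetical function: the clock is injected rather than read globally."""
    return now() >= expires_at


def fixed_clock(instant: datetime) -> Clock:
    # Test helper: a clock that always returns the same instant.
    return lambda: instant


EXPIRY = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)


def test_token_is_expired_at_its_expiry_instant():
    assert is_expired(EXPIRY, fixed_clock(EXPIRY))


def test_token_is_valid_one_second_before_expiry():
    one_second_before = datetime(2024, 1, 1, 11, 59, 59, tzinfo=timezone.utc)
    assert not is_expired(EXPIRY, fixed_clock(one_second_before))
```

Production code would pass a real clock such as `lambda: datetime.now(timezone.utc)`, while the tests stay pinned to fixed instants and give the same result on any day and any machine.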
FIRST Principles
| Principle | Meaning |
|---|---|
| Fast | Tests should run in seconds, not minutes. Slow tests don't get run. |
| Independent | No test relies on the output of another test. |
| Repeatable | Same result in any environment — local, CI, staging. |
| Self-validating | Pass or fail with no human interpretation required. |
| Timely | Written at the right time — ideally before or alongside the production code. |
Test-Driven Development (TDD)
TDD is a design discipline where tests are written before production code, following a tight feedback loop.
Red-Green-Refactor Cycle
- Red — Write a failing test that describes the desired behavior
- Green — Write the simplest production code that makes the test pass
- Refactor — Improve the code structure while keeping all tests green
Rules:
- Never write production code without a failing test
- Write only enough of a test to fail (a compilation failure counts)
- Write only enough production code to pass the current failing test
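The end state of one pass through the cycle might look like the sketch below. The `shipping_cost` rule, its threshold, and its flat rate are hypothetical values chosen for illustration; the comments mark which step produced each part.

```python
# Red: these tests are written first and fail until shipping_cost exists
# and implements the rule they describe.
def test_orders_of_50_or_more_ship_free():
    assert shipping_cost(order_total=50) == 0


def test_orders_under_50_pay_flat_rate():
    assert shipping_cost(order_total=49) == 5


# Refactor: the literals 50 and 5 from the first passing version are
# given names, with both tests kept green throughout.
FREE_SHIPPING_THRESHOLD = 50
FLAT_RATE = 5


# Green: the simplest production code that makes both tests pass.
def shipping_cost(order_total: float) -> float:
    return 0 if order_total >= FREE_SHIPPING_THRESHOLD else FLAT_RATE
```

In practice each test would be added and made green one at a time; the listing compresses two short cycles into a single snapshot.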
Two Schools of TDD
| Aspect | Chicago (Classical) | London (Mockist) |
|---|---|---|
| Verification | State-based | Interaction-based |
| Direction | Inside-out | Outside-in |
| Collaborators | Real objects | Mocks/stubs |
| Strength | Refactoring-resilient tests | Drives interface design |
| Risk | Complex setup for deep graphs | Tests coupled to implementation |
See TDD Schools reference for detailed comparison and guidance.
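To make the difference in the Verification and Collaborators rows concrete, the sketch below tests the same behavior both ways. The `Account` class and `transfer` function are hypothetical; `Mock` and `assert_called_once_with` are from Python's standard `unittest.mock` module.

```python
from unittest.mock import Mock


class Account:
    """Hypothetical domain object used only for this illustration."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        self.balance -= amount

    def deposit(self, amount: int) -> None:
        self.balance += amount


def transfer(source, target, amount: int) -> None:
    source.withdraw(amount)
    target.deposit(amount)


def test_transfer_moves_money_classical():
    # Chicago style: real collaborators, assert on the resulting state.
    source, target = Account(100), Account(0)
    transfer(source, target, 30)
    assert source.balance == 70
    assert target.balance == 30


def test_transfer_moves_money_mockist():
    # London style: mocked collaborators, assert on the interactions.
    source, target = Mock(spec=Account), Mock(spec=Account)
    transfer(source, target, 30)
    source.withdraw.assert_called_once_with(30)
    target.deposit.assert_called_once_with(30)
```

If `transfer` were later rewritten to adjust balances without calling `withdraw` and `deposit`, the classical test would stay green while the mockist test would fail, which is the coupling risk named in the table.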
When TDD Helps Most
- Business logic with clear rules and edge cases
- Algorithm design
- API contract definition
- Bug reproduction and fixing (write the failing test first)
When TDD May Not Apply
- Exploratory prototyping (write tests after you understand the shape)
- UI layout and styling
- One-off scripts
Test Doubles
Test doubles replace real dependencies during testing. Each type serves a different purpose.
| Double | Purpose | Verifies? |
|---|---|---|
| Dummy | Fill parameter lists. Never actually used. | No |
| Stub | Provide canned responses to method calls. | No |
| Spy | Record interactions for later assertion. | Yes (after the fact) |
| Mock | Pre-programmed with expectations. Fails if not called correctly. | Yes (inline) |
| Fake | Simplified working implementation (e.g., in-memory repository). | No |
See Test Doubles reference for detailed guidance on when to use each type.
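The five kinds of double from the table can be sketched with hand-written classes. Everything here (`send_quote`, `StubRates`, `SpyMailer`, `FakeUserStore`) is hypothetical and exists only to illustrate the roles. Note that Python's `unittest.mock.Mock` records calls and is checked after the act, so in practice it sits closer to a spy than to a strictly pre-programmed mock.

```python
from unittest.mock import Mock


class StubRates:
    """Stub: returns a canned answer and verifies nothing."""

    def rate_for(self, currency: str) -> float:
        return 2.0


class SpyMailer:
    """Spy: records calls so the test can assert on them afterwards."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))


class FakeUserStore:
    """Fake: a working but simplified implementation backed by a dict."""

    def __init__(self) -> None:
        self._emails: dict[int, str] = {}

    def save(self, user_id: int, email: str) -> None:
        self._emails[user_id] = email

    def email_of(self, user_id: int) -> str:
        return self._emails[user_id]


def send_quote(user_id, amount, currency, users, rates, mailer, audit_log) -> None:
    """Hypothetical function under test; audit_log is accepted but unused here."""
    converted = amount * rates.rate_for(currency)
    mailer.send(users.email_of(user_id), f"Your quote: {converted:.2f}")


def test_send_quote_with_dummy_stub_spy_and_fake():
    users = FakeUserStore()
    users.save(1, "ada@example.com")
    mailer = SpyMailer()
    dummy_audit_log = None  # Dummy: fills the parameter list, never used

    send_quote(1, 10.0, "EUR", users, StubRates(), mailer, dummy_audit_log)

    assert mailer.sent == [("ada@example.com", "Your quote: 20.00")]


def test_send_quote_with_mock():
    users = FakeUserStore()
    users.save(1, "ada@example.com")
    mailer = Mock()  # The expectation is checked on the double itself

    send_quote(1, 10.0, "EUR", users, StubRates(), mailer, None)

    mailer.send.assert_called_once_with("ada@example.com", "Your quote: 20.00")
```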
Key Principle: Mock at Boundaries
Use test doubles at architectural boundaries (ports, external services), not between internal collaborators. Mocking internal classes couples your tests to implementation details and makes refactoring painful.
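As a rough illustration of where the line falls, the sketch below replaces only the port to an external payment provider and uses the internal collaborator for real. `PaymentGateway`, `PriceCalculator`, `Checkout`, and `RecordingGateway` are all hypothetical names; `Protocol` is from Python's standard `typing` module.

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """Port to an external service: the boundary worth replacing in tests."""

    def charge(self, amount_cents: int) -> bool: ...


class PriceCalculator:
    """Internal collaborator: exercised for real, never mocked."""

    def total_cents(self, unit_cents: int, quantity: int) -> int:
        return unit_cents * quantity


class Checkout:
    def __init__(self, gateway: PaymentGateway) -> None:
        self._gateway = gateway
        self._calculator = PriceCalculator()

    def pay(self, unit_cents: int, quantity: int) -> bool:
        total = self._calculator.total_cents(unit_cents, quantity)
        return self._gateway.charge(total)


class RecordingGateway:
    """Test double standing in for the external payment provider."""

    def __init__(self) -> None:
        self.charged: list[int] = []

    def charge(self, amount_cents: int) -> bool:
        self.charged.append(amount_cents)
        return True


def test_pay_charges_the_computed_total():
    gateway = RecordingGateway()

    succeeded = Checkout(gateway).pay(unit_cents=250, quantity=3)

    assert succeeded is True
    assert gateway.charged == [750]
```

Because `PriceCalculator` is not mocked, it can be inlined, renamed, or split without touching this test, as long as the amount sent to the gateway stays the same.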
What to Test / What Not to Test
High Value — Always Test
- Business rules and domain logic
- Edge cases, boundary conditions, error paths
- State transitions and workflows
- Input validation and sanitization
- Security-critical paths (authentication, authorization)
- Data transformations and calculations
Low Value — Usually Skip
- Trivial getters/setters with no logic
- Framework-generated code (ORM mappings, routing config)
- Third-party library internals (test your integration, not their code)
- Private methods (test through the public API)
- Logging and telemetry (unless business-critical)
Testing Implementation vs Behavior
Test behavior, not implementation. A good test describes what the system does, not how it does it internally.
Signs you are testing implementation:
- Test breaks when you refactor without changing behavior
- Test asserts the order of internal method calls
- Test verifies private state rather than public output
- Renaming an internal class breaks tests for unrelated features
Signs you are testing behavior:
- Test describes a user-meaningful scenario
- Test remains green after internal refactoring
- Test asserts on outputs, side effects, or state changes visible through the public API
Testing Strategies by Layer
Different architectural layers call for different testing approaches. See Testing Strategies reference for detailed guidance.
| Layer | Primary Test Type | Key Technique |
|---|---|---|
| Domain/Business Logic | Unit tests | State-based verification, no I/O |
| Application Services | Unit + Integration | Test doubles for infrastructure ports |
| Data Access | Integration | Real database (test containers, in-memory) |
| API Endpoints | Integration + Contract | Request/response validation |
| UI Components | Component tests | Interaction simulation |
| Full System | E2E (selective) | Critical paths only |
Common Antipatterns
| Antipattern | Symptoms | Fix |
|---|---|---|
| Brittle tests | Tests break on every refactor even when behavior is unchanged | Test behavior through public API, not internal structure |
| Testing implementation | Asserting on method call order, private state, internal wiring | Assert on outputs and observable side effects |
| Slow test suite | Test suite takes 10+ minutes; developers skip running tests | Push tests down the pyramid; use test doubles for I/O |
| Flaky tests | Tests pass/fail randomly without code changes | Remove time dependencies, shared state, and ordering assumptions |
| Excessive mocking | More mock setup than actual test logic; tests are unreadable | Use real collaborators where possible; mock only at boundaries |
| Test data coupling | Tests share fixtures and break when shared data changes | Each test creates its own data; use builders/factories |
| Missing error paths | Only happy path tested; failures discovered in production | Explicitly test error cases, edge cases, and boundary conditions |
| Commented-out tests | Failing tests are disabled rather than fixed or deleted | Fix the test, or delete it if the behavior changed intentionally |
| Giant test methods | Tests are 50+ lines with multiple acts and asserts | Split into focused tests; extract setup into helpers |
| No assertion | Test executes code but never asserts anything | Every test must have at least one meaningful assertion |
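The builders/factories fix for test data coupling can be sketched as a small function with sensible defaults that each test overrides as needed. The `Order` type, the `an_order` builder, and the `discount` rule are hypothetical; `dataclasses.replace` is from the Python standard library.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Order:
    customer_tier: str
    subtotal: int
    country: str


def an_order(**overrides) -> Order:
    """Hypothetical builder: sensible defaults, overridden per test."""
    defaults = Order(customer_tier="standard", subtotal=100, country="NL")
    return replace(defaults, **overrides)


def discount(order: Order) -> int:
    """Hypothetical rule under test: premium customers get 10% off."""
    return order.subtotal // 10 if order.customer_tier == "premium" else 0


def test_applies_discount_for_premium_customers():
    # Only the field that matters to this scenario is spelled out.
    order = an_order(customer_tier="premium")
    assert discount(order) == 10


def test_no_discount_for_standard_customers():
    assert discount(an_order()) == 0
```

Each test creates its own `Order`, so changing one test's data cannot break another, and adding a field to `Order` only requires a new default in the builder.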
Quality Checklist
Use this checklist when writing or reviewing tests:
- Behavior-focused: tests describe what the system does, not how
- Independent: no test depends on another test's execution or state
- Deterministic: same result every time, on every machine
- Fast: unit tests in milliseconds, full suite in under 5 minutes
- Readable: a new team member can understand the test without reading the implementation
- Arranged clearly: AAA structure with obvious separation of phases
- Named descriptively: test name explains the scenario and expected outcome
- Error paths covered: not just happy path — edge cases and failures are tested
- Minimal setup: no unnecessary dependencies or fixtures; builders/factories where needed
- No flakiness: no time-dependent, order-dependent, or environment-dependent tests
- Appropriate level: tested at the lowest pyramid level that provides confidence
- Doubles at boundaries: mocks/stubs used at architectural ports, not internal classes