Testing

This skill provides guidance on testing philosophy and practices, emphasizing tests as specifications and API design through TDD.

Core Philosophy

Tests as Executable Specifications

Tests are not just verification tools — they are executable specifications that document how the system should behave. A well-written test suite serves as living documentation.

Tests as API Consumers

Tests are the first users of your code's APIs. This is why TDD is valuable: you design the API by thinking about the consumer first, before thinking about implementation.
When writing tests:
  • Consider what interface would be most convenient for the caller
  • Let the test drive the API design
  • If the test is awkward to write, the API is awkward to use
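As a rough sketch of letting the caller's view drive the design, the snippet below was written call site first, implementation second. The names `createCart`, `add`, and `total` are hypothetical, and Node's built-in `node:assert` stands in for a test framework so the sketch runs without dependencies.

```javascript
const { strictEqual } = require("node:assert/strict");

// Written second: the smallest implementation that supports the call site below.
// createCart, add, and total are hypothetical names chosen for illustration.
function createCart() {
  const lines = [];
  return {
    // Returning the cart allows chained calls, which is what the test asked for.
    add(name, priceCents, quantity = 1) {
      lines.push({ name, priceCents, quantity });
      return this;
    },
    total() {
      return lines.reduce((sum, line) => sum + line.priceCents * line.quantity, 0);
    },
  };
}

// Written first: the call site the test author wished existed.
const cart = createCart().add("pen", 150, 2).add("notebook", 400);
strictEqual(cart.total(), 700);
```

If the call site had been awkward to write (for example, requiring three setup objects before adding an item), that awkwardness would be a reason to revisit the interface before implementing it.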

Test-Driven Development (TDD)

Red-Green-Refactor

The TDD cycle consists of three phases:
  1. Red: Write a failing test for the next piece of functionality
  2. Green: Write the minimum code necessary to make the test pass
  3. Refactor: Improve the code while keeping tests green
Each cycle should be short — ideally minutes, not hours. Small steps reduce risk and provide frequent feedback.
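One pass through the cycle might look like the following sketch. `slugify` is a hypothetical function used only for illustration, and Node's built-in `node:assert` is assumed in place of a test framework.

```javascript
const { strictEqual } = require("node:assert/strict");

// 1. Red: this assertion is written first and fails while slugify does not exist.
//    (Function declarations are hoisted, so the assertion can sit above the
//    implementation here, mirroring the order in which the code was written.)
strictEqual(slugify("Hello World"), "hello-world");

// A second, small cycle: the next failing case, added only after the first passes.
strictEqual(slugify("  Hello   World "), "hello-world");

// 2. Green: the minimum code that makes both assertions pass.
// 3. Refactor: with the assertions green, the body can be reshaped freely;
//    here the steps were collapsed into one chained expression.
function slugify(title) {
  return title.trim().toLowerCase().split(/\s+/).join("-");
}
```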

The Value of TDD

  • Forces thinking about the API before implementation
  • Produces code with high test coverage by default
  • Encourages simpler designs (testable code tends to be well-designed)
  • Provides immediate feedback on whether code works
  • Creates executable documentation of intended behavior

Flexible TDD

Strict TDD (one test at a time, red-green-refactor) is the ideal for learning and for complex logic. However, flexibility is acceptable:
Writing all tests first is appropriate when:
  • Tests need human review/approval before implementation
  • The behavior is well-understood and stable
  • The specification needs to be documented before implementation begins
Writing tests after is acceptable when:
  • Exploring or prototyping (but add tests before committing)
  • The design is genuinely uncertain
  • Spiking to learn about a problem
The goal is well-tested code with tests that serve as specifications. The path matters less than the destination, but TDD often produces better results.

Speed Matters

Tests should be fast. Slow tests discourage running them frequently, which defeats their purpose.
  • Target sub-second feedback for unit tests
  • Keep the full suite under a few minutes when possible
  • Identify and isolate slow tests

Database Access

Avoid hitting the database in tests except when:
  • Testing database-specific functionality (queries, constraints, transactions)
  • Integration tests that specifically verify database behavior
Do not hit the database just to:
  • Populate models or data structures
  • Create test fixtures when in-memory objects would suffice
  • Test business logic that happens to use database-backed models
Use factories or builders that create in-memory objects when database persistence isn't the thing being tested.
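A minimal sketch of such a builder, assuming a plain-object order model: `buildOrder` echoes the helper name used in the JavaScript example later in this guide, but its body, the field names, and `orderTotalCents` are assumptions made for illustration.

```javascript
const { strictEqual } = require("node:assert/strict");

// Hypothetical builder: returns a plain in-memory order with sensible defaults.
// Nothing is written to or read from a database.
function buildOrder(overrides = {}) {
  return {
    id: null, // never persisted, so there is no database id
    items: [{ sku: "PEN", priceCents: 150, quantity: 2 }],
    discountPercent: 0,
    ...overrides,
  };
}

// Business logic under test: a pure function over the order's data.
function orderTotalCents(order) {
  const subtotal = order.items.reduce(
    (sum, item) => sum + item.priceCents * item.quantity,
    0
  );
  return Math.round(subtotal * (1 - order.discountPercent / 100));
}

// Each test overrides only the field it cares about.
strictEqual(orderTotalCents(buildOrder()), 300);
strictEqual(orderTotalCents(buildOrder({ discountPercent: 10 })), 270);
```

Because the discount logic never touches persistence, these checks run in memory and stay fast.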

Test Structure

One Thing Per Test

Each test should verify one behavior. This doesn't always mean one assertion: verifying one behavior sometimes takes several assertions, and when setup is slow, grouping them in one test avoids repeating that setup. Either way, the test should have a single reason to fail.

AAA Pattern

Structure tests using Arrange-Act-Assert:
  1. Arrange: Set up the preconditions
  2. Act: Execute the behavior being tested
  3. Assert: Verify the expected outcome
Keep each section clearly delineated. If any section is complex, consider extracting helper methods.
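The three sections can be marked with comments, as in this sketch. `applyCoupon` and the object shapes are hypothetical, and the test is a plain function using Node's built-in `node:assert` rather than a framework's `it` block.

```javascript
const { deepStrictEqual } = require("node:assert/strict");

// Hypothetical unit under test: returns a new cart with the discount applied.
function applyCoupon(cart, coupon) {
  const totalCents = Math.round((cart.totalCents * (100 - coupon.percentOff)) / 100);
  return { ...cart, totalCents };
}

function testCouponReducesTotal() {
  // Arrange: set up the preconditions.
  const cart = { totalCents: 1000 };
  const coupon = { percentOff: 25 };

  // Act: execute the single behavior being tested.
  const discounted = applyCoupon(cart, coupon);

  // Assert: verify the expected outcome.
  deepStrictEqual(discounted, { totalCents: 750 });
}

testCouponReducesTotal();
```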

Given-When-Then

The BDD mindset aligns with AAA:
  • Given (Arrange): The initial context
  • When (Act): The event or action
  • Then (Assert): The expected outcome
This framing helps focus on behavior from the user's perspective.

Mocking and Test Doubles

Prefer Real Objects

Avoid mocking when possible. Build small, simple components with immutable data to reduce the need for mocks.

When Mocking is Necessary

If mocking is unavoidable:
  • Mock roles, not objects — mock interfaces/behaviors, not concrete implementations
  • Prefer fakes over mocks — fakes (simplified implementations) are often clearer than mock expectations
  • Keep mock setups simple; complex mocking often signals design problems
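To illustrate the fake-over-mock preference, here is a sketch of an in-memory fake standing in for a "user repository" role. The role, its `save`/`findByEmail` methods, and `registerUser` are all assumptions made for this example.

```javascript
const { strictEqual, throws } = require("node:assert/strict");

// Fake: a simplified but working implementation of the hypothetical
// repository role, backed by a Map instead of a database.
class InMemoryUserRepository {
  constructor() {
    this.usersByEmail = new Map();
  }
  save(user) {
    this.usersByEmail.set(user.email, user);
  }
  findByEmail(email) {
    return this.usersByEmail.get(email) ?? null;
  }
}

// Code under test depends only on the role (save/findByEmail),
// not on any concrete storage.
function registerUser(repository, email) {
  if (repository.findByEmail(email)) {
    throw new Error(`email already registered: ${email}`);
  }
  const user = { email };
  repository.save(user);
  return user;
}

const repository = new InMemoryUserRepository();
registerUser(repository, "ada@example.com");

// Assertions are about observable state, not about which calls were made.
strictEqual(repository.findByEmail("ada@example.com").email, "ada@example.com");
throws(() => registerUser(repository, "ada@example.com"), /already registered/);
```

The test states no expectations about call order or argument lists, so it keeps passing if `registerUser` is restructured internally.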

Signs of Excessive Mocking

  • Tests that are mostly mock setup
  • Mocks returning mocks
  • Tests that break when implementation details change
  • Difficulty understanding what's actually being tested
Treat these as signals to refactor the production code.

Custom Matchers

Use custom matchers (RSpec matchers, Jest matchers, etc.) to make assertions readable and intention-revealing.
Good:

```ruby
expect(order).to be_fulfilled
expect(user).to have_permission(:admin)
```

Less clear:

```ruby
expect(order.status).to eq("fulfilled")
expect(user.permissions).to include("admin")
```

Custom matchers:
  • Make tests read like specifications
  • Provide better failure messages
  • Encapsulate complex assertions
  • Can be reused across tests
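As a sketch of what sits behind such a matcher, the function below follows the `{ pass, message }` result shape that Jest custom matchers return. The matcher name `toBeFulfilled` and the order fields are hypothetical. It is called directly here so the example runs without Jest installed; in a Jest project it would typically be registered with `expect.extend({ toBeFulfilled })`.

```javascript
// Hypothetical matcher body: decides pass/fail and builds a failure message
// that names the order and the status actually seen.
function toBeFulfilled(received) {
  const pass = received.status === "fulfilled";
  return {
    pass,
    message: () =>
      pass
        ? `expected order ${received.id} not to be fulfilled`
        : `expected order ${received.id} to be fulfilled, but status was "${received.status}"`,
  };
}

// Calling the matcher by hand to show the failure message it produces.
const result = toBeFulfilled({ id: 42, status: "pending" });
console.log(result.pass);
console.log(result.message());
```

The failure message carries the domain detail (which order, which status) that a bare equality check on `order.status` would leave the reader to reconstruct.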

Language-Specific Guidelines

Ruby (RSpec)

  • Use RSpec as the primary testing framework
  • Prefer `describe` for classes/methods, `context` for states/conditions
  • Use `let` for lazy-evaluated test data
  • Use `subject` for the thing being tested
  • Prefer `expect` syntax over `should`
  • Use `before` sparingly; prefer explicit setup in each test when clarity matters
  • Create custom matchers for domain-specific assertions
  • Use `shared_examples` for common behavior across contexts
  • Use FactoryBot for test data, but prefer `build` over `create` when persistence isn't needed

```ruby
RSpec.describe Order do
  describe "#fulfill" do
    context "when all items are in stock" do
      it "marks the order as fulfilled" do
        order = build(:order, :with_available_items)

        order.fulfill

        expect(order).to be_fulfilled
      end
    end
  end
end
```

JavaScript (Jest/Vitest)

  • Use descriptive test names that read as specifications
  • Use `describe` blocks to group related tests
  • Prefer explicit assertions over snapshot tests (unless testing UI output)
  • Use `beforeEach` for common setup
  • Mock external dependencies, not internal modules

```javascript
describe("Order", () => {
  describe("fulfill", () => {
    it("marks the order as fulfilled when all items are in stock", () => {
      const order = buildOrder({ items: availableItems });

      order.fulfill();

      expect(order.isFulfilled()).toBe(true);
    });
  });
});
```

Bash (BATS or similar)

  • Test scripts by testing their behavior, not their output format
  • Use temporary directories for file-based tests
  • Clean up test artifacts in teardown
  • Test error conditions and exit codes
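BATS syntax is not shown here. The sketch below illustrates the same ideas (temporary directory, guaranteed cleanup, behavior and exit-code checks) from a Node script instead. The `backup.sh` script and the helper names are hypothetical, and the example assumes a POSIX `sh` is available on the PATH.

```javascript
const { strictEqual } = require("node:assert/strict");
const { spawnSync } = require("node:child_process");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical script under test, written to disk so the sketch is self-contained.
const SCRIPT = [
  "#!/bin/sh",
  '[ -f "$1" ] || { echo "missing input" >&2; exit 2; }',
  'cp "$1" "$1.bak"',
].join("\n") + "\n";

// Creates a fresh temporary directory and removes it afterwards,
// even when the callback throws (the teardown step).
function withTempDir(run) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "script-test-"));
  try {
    return run(dir);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

// Writes the script into dir and runs it against dir/inputName.
function runBackup(dir, inputName) {
  const script = path.join(dir, "backup.sh");
  fs.writeFileSync(script, SCRIPT);
  return spawnSync("sh", [script, path.join(dir, inputName)], { encoding: "utf8" });
}

// Behavior: a backup copy with the same content exists afterwards.
withTempDir((dir) => {
  fs.writeFileSync(path.join(dir, "notes.txt"), "hello");
  const result = runBackup(dir, "notes.txt");
  strictEqual(result.status, 0);
  strictEqual(fs.readFileSync(path.join(dir, "notes.txt.bak"), "utf8"), "hello");
});

// Error condition: a missing input is reported through the exit code.
withTempDir((dir) => {
  const result = runBackup(dir, "absent.txt");
  strictEqual(result.status, 2);
});
```

Neither check inspects the exact wording printed by the script, so the error message can change without breaking the tests.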

Test Smells

Watch for these warning signs:
  • Slow tests: Usually means too much real I/O or database access
  • Flaky tests: Often timing issues or shared state
  • Fragile tests: Break when the implementation changes even though the behavior has not
  • Mystery guests: Test data coming from somewhere non-obvious
  • Eager tests: Testing too many things at once
  • Obscure tests: Hard to understand what's being tested
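As one example of removing a timing-related source of flakiness, the sketch below passes the current time in as a parameter instead of reading the clock inside the function. `isExpired` and the token shape are hypothetical.

```javascript
const { strictEqual } = require("node:assert/strict");

// Taking "now" as a parameter (defaulting to the real clock) lets tests
// pin the time, so the result never depends on when the test happens to run.
function isExpired(token, now = Date.now()) {
  return now >= token.expiresAt;
}

const token = { expiresAt: 1_000 };

// Both sides of the boundary are checked with fixed timestamps.
strictEqual(isExpired(token, 999), false);
strictEqual(isExpired(token, 1_000), true);
```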