writing-tests


Writing Tests


Core Principle


Tests prove behavior works. A test that can't fail is worthless. A test that tests mocks instead of real code is theater.

Writing Good Tests


One behavior per test


Each test should verify exactly one thing. If the test name needs "and" in it, split it into two tests.
Good:  "creates user with valid email"
Good:  "rejects user with duplicate email"
Bad:   "creates user and sends welcome email and updates counter"
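A minimal pytest-style sketch of the split (the `UserService` class and its API are hypothetical, for illustration only):

```python
# Hypothetical user service, used only to illustrate one-behavior-per-test.
class DuplicateEmailError(Exception):
    pass

class UserService:
    def __init__(self):
        self._emails = set()

    def create_user(self, email):
        if email in self._emails:
            raise DuplicateEmailError(email)
        self._emails.add(email)
        return {"email": email}

# Good: each test verifies exactly one behavior and can fail independently.
def test_creates_user_with_valid_email():
    service = UserService()
    user = service.create_user("a@example.com")
    assert user["email"] == "a@example.com"

def test_rejects_user_with_duplicate_email():
    service = UserService()
    service.create_user("a@example.com")
    try:
        service.create_user("a@example.com")
        raise AssertionError("expected DuplicateEmailError")
    except DuplicateEmailError:
        pass
```

If welcome emails and counters also matter, each gets its own test with its own name, so a failure points at exactly one broken behavior.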

Derive test cases from three sources


Build test coverage from three independent sources and verify every item maps to at least one test:
  1. User requirements -- what was requested (spec, issue, conversation)
  2. Features implemented -- what the code actually does (scan the diff)
  3. Claims in the response -- what you're about to tell the user works
Anything that appears in any source but has no corresponding test is a coverage gap. This catches the common failure mode where implemented features work but aren't tested, or where claimed behavior isn't verified.
For each source, enumerate user journeys: "As a [role], I want to [action], so that [benefit]." Generate test cases from each journey -- this ensures tests cover user-visible behavior, not implementation details.

DAMP over DRY in tests


Each test should be independently readable without chasing shared setup through helper functions. Duplication in tests is acceptable -- even desirable -- when it makes the test's intent obvious at a glance. Extract shared setup only when it genuinely reduces noise without hiding what the test does.
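A sketch of what DAMP looks like in practice (the discount function and its rule are hypothetical): each test states its own inputs instead of hiding them in a shared builder.

```python
# Hypothetical rule: code "SAVE10" takes 10% off; any other code is a no-op.
def apply_discount(total, code):
    return round(total * 0.9, 2) if code == "SAVE10" else total

# DAMP: the setup is duplicated on purpose, so each test reads
# top-to-bottom without chasing a shared make_order() helper.
def test_valid_code_applies_ten_percent_discount():
    assert apply_discount(100.00, "SAVE10") == 90.00

def test_unknown_code_leaves_total_unchanged():
    assert apply_discount(100.00, "BOGUS") == 100.00
```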

Test pyramid


For API/web projects, aim for ~80% unit, ~15% integration, ~5% E2E. Adjust ratios based on project risk profile -- data pipelines may need heavier integration coverage, CLI tools may need minimal E2E.
  • Unit tests (~80%): fast, isolated, test one behavior per test. Run in milliseconds. No database, no network, no filesystem. These form the foundation -- cheap to write, cheap to run, fast feedback.
  • Integration tests (~15%): verify component boundaries -- API endpoints hitting a real test database, service layers wired to real dependencies, queue producers and consumers working together. Slower than unit tests but catch wiring bugs that mocks hide.
  • E2E tests (~5%): validate critical user paths end-to-end through the real system. Expensive to write, slow to run, brittle to maintain. Limit to high-value flows (signup, checkout, core workflow). Every E2E test must justify its maintenance cost.

Name tests by expected behavior


The test name should describe what happens, not what's being called.
Good:  "returns 404 when user does not exist"
Bad:   "test getUserById"
Good:  "sends notification after order is placed"
Bad:   "test processOrder"

Use real objects when practical


Mocks should be a last resort, not a first choice. Every mock is an assumption about behavior that may drift from reality.
Use real objects for:
  • Database queries (use a test DB)
  • Internal services and classes
  • File system operations (use temp dirs)
  • Business logic and transformations
Use mocks/fakes for:
  • External HTTP APIs
  • Payment gateways
  • Email/SMS delivery
  • Third-party SDKs with rate limits
Exception: framework-provided test doubles. When a framework offers dedicated faking mechanisms (Laravel's Queue::fake() and Event::fake(); React test providers and vi.mock for API layers), use them -- they are the idiomatic approach and are maintained alongside the framework. The principle is: avoid hand-rolled mocks that drift, not framework-blessed test utilities.
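A sketch of mocking only at the system boundary, using Python's `unittest.mock.patch` (the function names are hypothetical): the external HTTP call is faked, while the internal conversion logic runs for real.

```python
from unittest.mock import patch

def fetch_exchange_rate(currency):
    # In reality this would hit an external HTTP API -- the system boundary.
    raise RuntimeError("network call not available in tests")

def price_in_usd(amount, currency):
    # Internal business logic under test: exercised for real, never mocked.
    return round(amount * fetch_exchange_rate(currency), 2)

def test_price_in_usd_converts_using_current_rate():
    # Mock ONLY the boundary; the real conversion code path still runs.
    with patch(__name__ + ".fetch_exchange_rate", return_value=1.25):
        assert price_in_usd(10.00, "GBP") == 12.50
```

If `price_in_usd` had its rounding or multiplication broken, this test would still catch it -- a test that mocked `price_in_usd` itself would not.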

Tests expose bugs, not the reverse


If a test uncovers broken or buggy behavior, fix the source code -- never adjust the test to match incorrect behavior. A test that passes against a bug is worse than no test at all.

Assert on outcomes, not implementation


Good:  assert user exists in database after create
Bad:   assert repository.save() was called once
Good:  assert response body contains expected fields
Bad:   assert serializer.serialize() was called with user
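A sketch of the difference, with an in-memory repository standing in for a test database (all names are hypothetical):

```python
# In-memory repository standing in for a real test database.
class InMemoryUserRepo:
    def __init__(self):
        self.rows = {}

    def save(self, user):
        self.rows[user["email"]] = user

def create_user(repo, email):
    user = {"email": email}
    repo.save(user)
    return user

def test_user_exists_after_create():
    repo = InMemoryUserRepo()
    create_user(repo, "a@example.com")
    # Good: assert the observable outcome -- the user is persisted.
    assert "a@example.com" in repo.rows
    # Bad (avoided here): asserting that repo.save() was called exactly
    # once, which keeps passing even if save() stops persisting anything.
```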

Test edge cases


For every feature, consider:
  • Empty input / null / undefined
  • Boundary values (0, 1, max, max+1)
  • Invalid types (string where number expected)
  • Concurrent access (if applicable)
  • Error paths (network failure, timeout, permission denied)
  • Unicode and special characters in string inputs
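The checklist above can be sketched as concrete assertions, using a hypothetical `normalize_name()` helper:

```python
def normalize_name(value):
    # Hypothetical helper: trims whitespace, maps None to "".
    if value is None:
        return ""
    if not isinstance(value, str):
        raise TypeError(f"expected str, got {type(value).__name__}")
    return value.strip()

def test_normalize_name_edge_cases():
    assert normalize_name(None) == ""             # null input
    assert normalize_name("") == ""               # empty input
    assert normalize_name("  José  ") == "José"   # unicode + whitespace
    try:
        normalize_name(42)                        # invalid type
        raise AssertionError("expected TypeError")
    except TypeError:
        pass
```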

Red-Green-Refactor (When It Applies)


Tests written first answer "what should this do?" Tests written after answer "what does this do?" The distinction matters: tests written after implementation are biased toward verifying what you built, not what's required.
For bug fixes, writing the failing test first is genuinely valuable -- it proves the bug exists and proves the fix works. For new features, the order is less critical than the quality.

Bug fixes: prove-it pattern


The failing test is proof the bug exists. The passing test is proof the fix works. Without both halves, there is no proof -- just coincidence.
  1. Write a test that reproduces the bug
  2. Run it and watch it fail -- confirm it fails for the right reason. A test that fails due to a typo or import error hasn't captured the bug. The failure message should describe the buggy behavior.
  3. Apply the fix
  4. Run it and watch it pass -- confirm the fix addresses the specific failure AND other tests still pass. A fix that breaks something else isn't a fix.
  5. If the test passes immediately without a fix, the test is verifying existing behavior, not the bug. Go back to step 1.
This is non-negotiable for bugs -- a fix without a regression test is a fix that will break again. The two-run sequence (fail then pass) is the proof. Skipping the first run means the test might pass for reasons unrelated to the fix.
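The sequence can be sketched as a regression test (the `paginate()` helper and its bug are hypothetical): the same test fails against the buggy code and passes against the fix.

```python
def paginate(items, page_size):
    # Fixed implementation. The buggy version computed the page count as
    # len(items) // page_size, silently dropping the final partial page.
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]

def test_partial_last_page_is_kept():
    # Step 2: against the buggy code this failed with
    #   [[1, 2], [3, 4]] != [[1, 2], [3, 4], [5]]
    # -- a failure message that describes the bug (item 5 dropped).
    assert paginate([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]
```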

New features: test alongside


Write tests as you build, not after. "I'll add tests later" means "I won't add tests."
The goal: by the time the feature is done, tests exist and pass. Whether you wrote the test 5 minutes before or 5 minutes after the code matters less than whether the test exists and is good.
Minimum viability during green phase: When making a test pass, write the simplest code that satisfies it. Not the abstraction you think is "right," not the feature you imagine you'll need next. The simplest thing. Refactor only after the test is green.

Anti-Patterns


Testing mock behavior instead of real behavior


Symptom: Test passes but production breaks. Tests assert that mocks were called correctly, not that the actual system works.
Fix: Replace mocks with real objects for internal code. Only mock at system boundaries (external APIs, email, payment).

Test-only methods in production code


Symptom: Methods like reset(), clearState(), or setTestMode() that exist only because tests need them.
Fix: If tests need to reset state, the code has a design problem. Refactor to make state explicit and injectable.
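One way the refactor can look (a hypothetical `Counter` example): instead of a module-level singleton plus a test-only `reset()`, the state lives in an instance that each test constructs fresh and can inject.

```python
# Before (anti-pattern): a module-global counter with a reset() method
# that exists only so tests can clean up after each other.
#
# After: state is explicit and injectable -- no test-only API needed.
class Counter:
    def __init__(self, start=0):
        self.value = start

    def increment(self):
        self.value += 1
        return self.value

def test_increment_starts_from_injected_state():
    counter = Counter(start=41)   # fresh instance per test; no reset()
    assert counter.increment() == 42
```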

Snapshot tests as the only test


Symptom: All tests are snapshots that get bulk-updated whenever anything changes.
Fix: Snapshots catch unintended changes but don't verify correctness. Add behavioral assertions alongside snapshots.

Testing the framework


Symptom: Tests verify that the ORM saves records, the router routes requests, or the framework does what its docs say.
Fix: Trust the framework. Test YOUR logic -- the business rules, transformations, and decisions your code makes.

Incomplete mocks


Symptom: Mock only includes the fields the test author knows about. Downstream code consumes other fields and gets undefined.
Fix: Mock the COMPLETE data structure as it exists in reality, not just the fields the immediate test uses. Before creating a mock response, check what fields the real API/type contains -- include ALL fields the system might consume downstream. Use real objects or factory-generated fixtures with all fields populated. If you must mock, generate from the real type/schema.
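A sketch of a fixture factory that always builds the complete structure (the user payload shape is hypothetical):

```python
def make_user_fixture(**overrides):
    # Every field the real API response contains -- not just the fields
    # the immediate test happens to read.
    fixture = {
        "id": 1,
        "email": "a@example.com",
        "name": "Ada",
        "created_at": "2024-01-01T00:00:00Z",
        "roles": ["member"],
        "is_active": True,
    }
    fixture.update(overrides)
    return fixture

def display_label(user):
    # Downstream consumer reading fields a hand-trimmed mock would omit.
    suffix = "" if user["is_active"] else " (inactive)"
    return f'{user["name"]} <{user["email"]}>{suffix}'

def test_label_for_inactive_user():
    user = make_user_fixture(is_active=False)
    assert display_label(user) == "Ada <a@example.com> (inactive)"
```

A narrow mock containing only `name` and `email` would have let `display_label` crash in production on the missing `is_active` field while every test passed.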

Mocking without understanding

盲目使用模拟对象

Before mocking any method, ask: (1) What side effects does the real method have? (2) Does this test depend on any of those side effects? (3) Mock at the lowest level that removes the slow/external part -- not higher.

When Stuck


  • Don't know how to test? Write the assertion first (the desired outcome), then build the test around it.
  • Test too complicated? Simplify the interface being tested.
  • Must mock everything? The code is too coupled -- use dependency injection.
  • Test setup too large? Extract helpers that reduce noise without hiding test intent (see DAMP). Still complex? Simplify the design.

Rationalization Table


When you catch yourself thinking these things, stop:
  • "This is too simple to need tests" -- Simple code still breaks. Tests document expected behavior.
  • "I manually tested it" -- Manual testing is ephemeral: it can't be re-run, and it proves nothing to the next person.
  • "Tests will slow me down" -- Debugging without tests slows you down more. Tests catch bugs at write time instead of in production.
  • "I'll add tests later" -- Later never comes. The context you have now is gone later.
  • "The tests would just test the framework" -- Then you're not testing your logic. Find the logic and test that.
  • "It's just a refactor, behavior didn't change" -- Run the existing tests. If they pass, you're done. If none exist, this is exactly when to add them.
  • "100% coverage is overkill" -- Nobody said 100%. But 0% is negligence. Test the important paths.
  • "Mocks are faster" -- Mocks are faster to run and slower to maintain. They test assumptions, not behavior.
  • "I already wrote the implementation" -- Sunk cost. Tests written after pass immediately and prove nothing about the original bug.
  • "The test is too hard to write" -- Hard-to-test code signals a design problem. Simplify the interface, not the test.
  • "I need to understand the code first" -- Write the test to express what you expect. The test IS your understanding, made executable.
  • "This is a prototype / throwaway" -- Prototypes become production code. Every time. The test costs 5 minutes now vs. hours of debugging later.
  • "The deadline is too tight for tests" -- The deadline is too tight to debug without tests. Tests catch bugs at write time, not in production under deadline pressure.

Verify


Before considering tests complete:
  • Every new public function/endpoint has at least one test
  • Each test has a descriptive name stating expected behavior
  • Tests use real objects where possible (mocks only at system boundaries)
  • Edge cases covered (empty, null, boundary, error paths)
  • Tests assert on outcomes, not implementation details
  • Tests are independent -- no shared mutable state between tests. If tests pass individually but fail together, use bisection to find the polluter (run one-by-one in isolation until the offending test is found)
  • Tests run fast enough to run frequently (< 30 seconds for unit suite)
  • Bug fix tests reproduce the original bug

Integration


This skill is referenced by:
  • workflows:work -- when adding tests for new functionality (Phase 2)
  • debugging -- when creating failing tests to reproduce bugs
  • verification-before-completion -- tests as primary verification evidence

Tech-Specific Skills


This skill provides generic test discipline. For framework-specific patterns, conventions, and tooling:
  • Laravel/PHP: php-laravel (PHPUnit, factories, feature/unit split, facade faking, data providers)
  • React/TypeScript: react-frontend (Vitest, RTL, component/hook patterns, Playwright E2E, mocking patterns)
Both skills are complementary -- this skill covers principles (why and what to test), tech-specific skills cover implementation (how to test in that framework). When both are active, framework-specific guidance takes precedence for tooling and conventions.