test-guard

Test Guard

You are reviewing generated or changed test code before it ships. Enforce the rules below after the first test-writing pass and before the tests are presented, committed, or merged. Be a sharp reviewer, not a pedantic one: flag what wastes maintenance effort or hides real bugs, ignore cosmetic preferences.

These rules exist because coding agents over-generate tests. The common failure modes: mock-heavy unit tests that assert implementation details, near-duplicate test bodies that differ by one value, and tests that re-verify the framework instead of the project's logic. Each looks productive in a diff and costs maintenance forever.

When this skill activates

A coding agent has just written new test functions or test files, in any language
You are editing existing tests
You are reviewing a diff that contains test changes
The user asks you to write, add, or review tests

Adapt to the project first

These rules are universal, but their application is not. Before reviewing:

- Check the project's own agent instructions (CLAUDE.md, AGENTS.md) and testing docs. Project-specific testing rules win over this skill when they conflict.
- Identify the test stack, then read the matching reference for concrete patterns:
  - Python / pytest → references/pytest.md
  - PHP / PHPUnit / Pest / WordPress → references/phpunit.md
  - JavaScript / TypeScript / Jest / Vitest → references/jest.md
- If the project calls LLM APIs, uses agent frameworks, or wires up observability/telemetry, also read references/llm-app-testing.md, which adds three rules specific to LLM applications.
- Map the project's system boundaries: network calls, databases, filesystem, clock and randomness, third-party SDKs, LLM APIs. Existing fixtures and test helpers usually reveal where the project already draws these lines.

What to do

Read the test code: the diff, the new file, or the section being modified.
Check each test against the rules below.
Report violations concisely: rule number, location, why it violates, suggested fix.
If the user explicitly invokes this skill before test writing, apply the rules as you write — don't write violations and then flag them.

When writing new tests, ask for each test: "What specific bug does this catch that no other test in this suite catches?" If you can't answer clearly, don't write it.

The Nine Rules

Rule 1: Test behavior, not implementation

Test what code does from the caller's perspective. Assert return values and observable side effects. Never assert that an internal helper was called with specific arguments — that test breaks on every refactor while catching nothing.

Violation pattern: asserting a mock of an internal function was called, where that function is not a system boundary. Fix: assert the return value or the state change the caller observes.

Rule 2: Every mock must be justified

Mock only at system boundaries: network and HTTP calls, LLM APIs, databases, filesystem I/O on external files, clock and randomness, third-party SDKs. Never mock internal classes or helper functions to isolate a "unit" — the seams you create hide the integration bugs worth catching.

When you mock a boundary, assert what the caller does with the response, not that the mock received specific arguments.
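
As a rough illustration, assume a hypothetical `WeatherService` that receives its HTTP function by injection (the class, the `http_get` parameter, and the example URL are all made up for this sketch). The mock replaces only the network call, and the assertion targets what the service does with the response.

```python
from unittest.mock import Mock


class WeatherService:
    """Hypothetical service; `http_get` is the injected network boundary."""

    def __init__(self, http_get):
        self._http_get = http_get

    def summary(self, city: str) -> str:
        # Placeholder URL on a reserved TLD; nothing is actually fetched in tests.
        payload = self._http_get(f"https://api.example.invalid/weather?city={city}")
        temp = payload.get("temp_c")
        if temp is None:
            return f"{city}: temperature unavailable"
        return f"{city}: {round(temp)}°C"


def test_missing_temperature_reports_unavailable():
    # The mock stands in for the network only; the service logic is real.
    http_get = Mock(return_value={"humidity": 40})
    service = WeatherService(http_get)
    # Assert on the caller-visible result, not on http_get's call arguments.
    assert service.summary("Oslo") == "Oslo: temperature unavailable"
```

The test says nothing about which URL was requested, so changing the endpoint or query format does not break it unless the observable summary changes.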

Rule 3: One scenario per test, data-driven for variants

If two or more tests share identical setup and differ only in input/output values, merge them into one data-driven test (`@pytest.mark.parametrize`, PHPUnit `#[DataProvider]`, Jest `test.each`).

When separate tests ARE correct: different setup, different assertions, different mock configurations, or genuinely different scenarios that happen to exercise the same function.
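
A minimal sketch of the merge, written with the standard library's `unittest.subTest` so it runs without extra dependencies; in a pytest project the same table of cases would more likely feed `@pytest.mark.parametrize`. The function `normalize_tags` is a hypothetical example, not taken from any real codebase.

```python
import unittest


def normalize_tags(tags):
    """Hypothetical function under test: accept a string or a list, return a clean list."""
    if isinstance(tags, str):
        tags = [tags]
    return [t.strip().lower() for t in tags if t.strip()]


class NormalizeTagsTest(unittest.TestCase):
    # One test body; each row replaces what would otherwise be a copy-pasted test.
    CASES = [
        ("single string", "Python", ["python"]),
        ("list with padding", [" Go ", "Rust"], ["go", "rust"]),
        ("blank entries dropped", ["", "  ", "c"], ["c"]),
    ]

    def test_tags_normalize_to_clean_lowercase_list(self):
        for label, raw, expected in self.CASES:
            # subTest reports each failing row separately instead of stopping at the first.
            with self.subTest(label):
                self.assertEqual(normalize_tags(raw), expected)
```

Adding a new variant is then a one-line change to `CASES` rather than another near-duplicate test function.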

Rule 4: Every test must justify its existence

Ask: "What bug does this catch that no other test catches?" Delete tests that only catch typos, verify default values of data classes, or test trivial pass-through logic.

Common unjustified tests: constructors setting attributes, a function rejecting input the type system already forbids, string formatting of log messages, a constant equaling its literal value.

Rule 5: Name tests for the scenario

Pattern: `test_<scenario>_<expected_outcome>`. The name should read like a requirement, not echo the function signature.

| Bad | Good |
|---|---|
| `test_parse_response_missing_field` | `test_malformed_response_falls_back_to_default` |
| `test_get_language_no_class` | `test_element_without_class_returns_empty_language` |
| `test_add_tags_single_string` | `test_single_tag_normalizes_to_list` |

Rule 6: Production regression tests are sacred

Tests that reproduce a real production bug are always justified. Reference the incident (date, issue ID, or short description) in the name or a comment, and never delete them. They are exempt from Rule 4 — their justification is the incident.
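
One possible shape for such a test is sketched below. The function `order_total`, the bug it describes, and the identifier `ISSUE-0000` are all placeholders invented for illustration; a real test would cite the project's actual incident.

```python
from decimal import Decimal


def order_total(prices):
    """Hypothetical function under test: sum line prices as exact decimals."""
    return sum((Decimal(p) for p in prices), Decimal("0"))


def test_three_ten_cent_items_total_exactly_thirty_cents():
    # Regression for placeholder incident ISSUE-0000 (hypothetical ID):
    # summing prices as floats gave 0.30000000000000004 and failed an
    # equality check downstream. Do not delete this test.
    assert order_total(["0.10", "0.10", "0.10"]) == Decimal("0.30")
```

The comment carries the justification, so a later reviewer applying Rule 4 can see why the test exists without searching the issue tracker.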

Rule 7: No tests for framework guarantees

Don't test that the validation library validates, the ORM commits, the router returns 404, or the test framework's fixtures work. Test your logic that sits on top of the framework.

Violation pattern: a test that would still pass if you deleted all the project's custom code and kept only framework defaults.

Rule 8: State and value objects are real, never mocked

Never mock a data model, DTO, entity, or state object. Construct a real instance. Mocking state hides field-name typos and validation errors — exactly the bugs worth catching. If constructing the real object is painful, that is design feedback, not a reason to mock; add a small builder or factory helper.
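
The factory-helper idea might look like the sketch below. `Order`, `make_order`, and `order_total_cents` are hypothetical names chosen for the example, and a standard-library dataclass stands in for whatever model type the project really uses.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Order:
    """Hypothetical value object with validation in __post_init__."""

    order_id: str
    quantity: int
    unit_price_cents: int

    def __post_init__(self):
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")


def make_order(**overrides) -> Order:
    """Small test factory: valid defaults, override only what the test cares about."""
    defaults = Order(order_id="o-1", quantity=1, unit_price_cents=500)
    # replace() builds a new instance through __init__, so validation still runs
    # and a misspelled field name raises TypeError instead of passing silently.
    return replace(defaults, **overrides)


def order_total_cents(order: Order) -> int:
    """Hypothetical logic under test."""
    return order.quantity * order.unit_price_cents


def test_quantity_multiplies_unit_price():
    assert order_total_cents(make_order(quantity=3)) == 1500
```

Because the object is real, `make_order(qty=3)` fails with a `TypeError` and `make_order(quantity=0)` fails with a `ValueError`; a mock would have accepted both.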

Rule 9: Infrastructure under test gets real infrastructure

When database queries, schema behavior, or persistence logic is the subject of the test, run against a real test database with real migrations applied via fixtures. Mocking the session there tests nothing. Mocking the database is fine when persistence is only a side effect of the behavior under test.
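
A small sketch of the first case, using an in-memory SQLite database from the standard library as the real test database. The schema, `add_user`, and `make_test_db` are assumptions made for this example; in a real project the helper would apply the project's own migrations instead of an inline schema.

```python
import sqlite3

# Inline stand-in for the project's real migrations (assumption for this sketch).
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
"""


def add_user(conn: sqlite3.Connection, email: str) -> int:
    """Hypothetical persistence function under test: stores emails lowercased."""
    cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email.lower(),))
    conn.commit()
    return cur.lastrowid


def make_test_db() -> sqlite3.Connection:
    # A real database engine with the real schema, created fresh for each test.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def test_email_differing_only_by_case_is_rejected_as_duplicate():
    conn = make_test_db()
    add_user(conn, "Ada@example.com")
    try:
        add_user(conn, "ada@EXAMPLE.com")
    except sqlite3.IntegrityError:
        pass  # The real UNIQUE constraint fired; a mocked session could not show this.
    else:
        raise AssertionError("expected the UNIQUE constraint to reject the duplicate")
    finally:
        conn.close()
```

The behavior being checked lives partly in the schema, which is why a mocked connection would let this test pass regardless of whether the constraint exists.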

Reporting format

When flagging violations, use this format:

**Rule N violation** in `tests/path/file.ext::<test_name>`
- What: <one sentence describing the violation>
- Fix: <one sentence describing what to do instead>

Group violations by file. If a file has no violations, don't mention it.

Severity guide

Not all violations are equal. Use judgment:

Must fix: Rules 1, 2, 8 — these hide real bugs or make tests brittle
Should fix: Rules 3, 4, 5, 7 — these cause bloat and maintenance drag
Sacred: Rule 6 — never delete, always allow
Worth noting: Rule 9 — test architecture; flag it, but don't block small changes on it

References

references/pytest.md — Python/pytest patterns: parametrize, fixtures, mock boundaries, real Pydantic instances
references/phpunit.md — PHP/PHPUnit/Pest patterns, including WordPress and WooCommerce test boundaries
references/jest.md — Jest/Vitest patterns: test.each, module mocks, msw, snapshot discipline
references/llm-app-testing.md — three extra rules for LLM applications: prompt contracts, observability wiring, agent-flow transitions

What this skill does NOT do

It does not run tests. Use the project's test runner for that.
It does not enforce code style — that's the linter's job.
It does not decide what to test — only how to test it.
It does not flag pre-existing violations in files you're not touching, unless asked to audit.
