test-designer
Test Designer
Independent test-design orchestrator. Encodes Independent Evaluation: the agent writing the tests must not be the agent implementing the feature, and must not inherit the implementation's assumptions.
When to Use
- TDD red phase for a complex / non-trivial feature (multi-file, multi-branch logic, new subsystem)
- Requirement is ambiguous enough that the implementer's tests would likely rationalize the implementation instead of catching bugs
- User explicitly asks for "independent test design", "fresh-eyes tests", or runs /test-designer
Don't use for:
- Trivial changes (one-line fix, rename) — just write the test inline
- Bug reproduction tests — write directly from the bug report
- Non-code changes (pure docs, pure config, pure prompt)
The Iron Law
The agent designing the tests must not carry the implementation's context. If you (the main Agent) are about to implement the feature, you are disqualified from designing its tests. Dispatch.
Violating this = tests that pass because they mirror the buggy implementation.
Steps
Step 1: Assemble the dispatch package
Collect only these inputs — nothing else:
- Requirement description — "what to do" and acceptance criteria (not "how to do")
- Relevant code file paths — read-only access to the code the feature will touch or integrate with
- Edge case prompts — categories the dispatched agent should enumerate:
  - Boundary inputs (empty, max, min, off-by-one)
  - Concurrency / ordering (if applicable)
  - Resource lifecycle (cleanup on error, partial failure)
  - Invariants (data consistency, idempotency)
  - Adversarial inputs (malformed, oversized, mis-encoded)
Explicitly exclude:
- The implementation plan or design you've been developing
- Hints about which approach you've chosen
- Code excerpts from a work-in-progress branch
- Your own guesses about "the right way to test this"
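The include/exclude rule above can be enforced mechanically with an allow-list. The following is a minimal sketch, not a prescribed schema: `DispatchPackage`, `assembleDispatchPackage`, and every field name are hypothetical names chosen for illustration.

```typescript
// Hypothetical names throughout; the skill does not prescribe a schema.
type EdgeCaseCategory =
  | "boundary"
  | "concurrency"
  | "lifecycle"
  | "invariants"
  | "adversarial";

interface DispatchPackage {
  requirement: string; // "what to do" plus acceptance criteria
  codePaths: string[]; // read-only paths the feature touches or integrates with
  edgeCasePrompts: EdgeCaseCategory[]; // categories the dispatched agent should enumerate
}

const ALL_CATEGORIES: EdgeCaseCategory[] = [
  "boundary", "concurrency", "lifecycle", "invariants", "adversarial",
];

// Only these keys may cross the boundary to the dispatched agent.
const ALLOWED_KEYS = ["requirement", "codePaths", "edgeCasePrompts"];

function assembleDispatchPackage(input: Record<string, unknown>): DispatchPackage {
  // Reject anything outside the allow-list (plans, approach hints, WIP code, guesses).
  const extra = Object.keys(input).filter((key) => ALLOWED_KEYS.indexOf(key) === -1);
  if (extra.length > 0) {
    throw new Error("Not allowed in a dispatch package: " + extra.join(", "));
  }
  const requirement = input["requirement"];
  const codePaths = input["codePaths"];
  const edgeCasePrompts = input["edgeCasePrompts"];
  if (typeof requirement !== "string" || requirement.trim() === "") {
    throw new Error("A requirement description is required");
  }
  if (!Array.isArray(codePaths) || codePaths.length === 0) {
    throw new Error("At least one code path is required");
  }
  return {
    requirement: requirement,
    codePaths: codePaths.map((p) => String(p)),
    // Default to every category when the caller does not narrow the list.
    edgeCasePrompts: Array.isArray(edgeCasePrompts)
      ? (edgeCasePrompts as EdgeCaseCategory[])
      : ALL_CATEGORIES.slice(),
  };
}
```

An allow-list is used rather than a block-list because a block-list only catches the leaks someone thought to name; anything not explicitly permitted is refused here.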
Step 2: Choose the executor
| Task shape | Executor | Reason |
|---|---|---|
| Complex, architectural implications | Independent Agent (e.g., | True zero-context isolation; can use strongest model at highest effort |
| Medium complexity, current conversation clean | In-conversation subagent | Cheaper; still acceptable if main Agent hasn't yet proposed an implementation |
| Trivial | Don't dispatch | Write tests inline |
Default to Independent Agent when the main Agent has already discussed or sketched implementation. Subagent isolation within the same conversation doesn't undo prior context pollution.
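The table and the default rule can be read as a small decision function. This is a sketch under assumptions: `TaskShape`, `Executor`, and `chooseExecutor` are hypothetical names, and the boolean flag stands in for "the main Agent has already discussed or sketched implementation".

```typescript
// Hypothetical types mirroring the three rows of the table.
type TaskShape = "complex" | "medium" | "trivial";
type Executor = "independent-agent" | "in-conversation-subagent" | "inline";

function chooseExecutor(
  shape: TaskShape,
  implementationAlreadyDiscussed: boolean
): Executor {
  // Trivial changes are never dispatched; the tests are written inline.
  if (shape === "trivial") {
    return "inline";
  }
  // A same-conversation subagent cannot undo prior context pollution,
  // so any earlier implementation discussion forces full isolation.
  if (implementationAlreadyDiscussed) {
    return "independent-agent";
  }
  return shape === "complex" ? "independent-agent" : "in-conversation-subagent";
}
```

For instance, `chooseExecutor("medium", true)` yields `"independent-agent"`, while `chooseExecutor("medium", false)` yields `"in-conversation-subagent"`.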
Step 3: Dispatch with the strongest model and highest effort
Test design is a correctness-critical reasoning task, not a rote mechanical one. Use:
- Model: strongest reasoning model the runtime offers — inherit if the main Agent is already on that tier; otherwise override. Don't hardcode a specific brand name
- Effort: xhigh (the maximum level the runtime supports). Escalation ladder: low → medium → high → xhigh
- Tools: Read / Grep / Glob on code paths; Write on test files only
- Permission: read-only on non-test files; writable on test files
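The effort and permission settings above can be sketched as two small helpers. This assumes the ladder runs low, medium, high, xhigh and that the runtime can report which levels it supports; `maxSupportedEffort`, `canWrite`, and the path convention are hypothetical, not a real runtime API.

```typescript
// Assumed ladder, lowest to highest.
const EFFORT_LADDER = ["low", "medium", "high", "xhigh"];

// Pick the highest rung of the ladder that the runtime reports as supported.
function maxSupportedEffort(supported: string[]): string {
  for (let i = EFFORT_LADDER.length - 1; i >= 0; i--) {
    const level = EFFORT_LADDER[i];
    if (level !== undefined && supported.indexOf(level) !== -1) {
      return level;
    }
  }
  throw new Error("Runtime supports no known effort level");
}

// Writes are allowed only under the test directory; everything else is read-only.
function canWrite(path: string, testDir: string): boolean {
  const dir = testDir.charAt(testDir.length - 1) === "/" ? testDir : testDir + "/";
  // Refuse ".." segments so a test path cannot escape into source files.
  if (path.split("/").indexOf("..") !== -1) {
    return false;
  }
  return path.indexOf(dir) === 0;
}
```

With these definitions, `canWrite("tests/resolver.test.ts", "tests")` is allowed and `canWrite("src/plugins.ts", "tests")` is refused.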
Example dispatch prompt skeleton:
You are designing failing tests for a feature. You will NOT see or write the
implementation. Your job is to produce executable tests that fail today and
pass only when the feature is correctly implemented.
Requirement:
<paste requirement description + acceptance criteria>
Code paths (read-only, for understanding context):
<list of file paths>
Existing test framework and conventions:
<infer from repo or specify>
Produce:
1. A test plan — enumerate the behaviors being tested (happy path + edge
cases), grouped by category (boundary / concurrency / lifecycle /
invariants / adversarial).
2. Executable test files that fail against the current code (or against
an empty implementation).
3. For each test, one-line rationale explaining the bug it would catch.
Constraints:
- Do NOT propose an implementation.
- Do NOT edit files outside the test directory.
- Cover edge cases explicitly; don't only test the happy path.
- Use the project's existing test framework and style.
Step 4: Validate the returned tests
Before handing the tests to the implementation phase:
- Run the tests — they should FAIL (red). Tests that pass on empty/wrong implementations are useless.
- Scan the rationale — does each test catch a distinct failure mode? Drop duplicates.
- Check coverage — are all edge case categories represented? Request additions if not.
- Confirm the test framework matches — ensure the dispatched agent used the right runner / assertion lib / fixtures.
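The first three checks above lend themselves to automation once the test run results are in hand. Below is a minimal sketch assuming each returned test has been summarized into a record; `ReturnedTest`, `validateReturnedTests`, and the category strings are hypothetical. The framework-match check is left to a human or to the project's own linting.

```typescript
// Hypothetical summary of one returned test after running it on current code.
interface ReturnedTest {
  name: string;
  category: string;
  rationale: string;
  passedOnCurrentCode: boolean;
}

interface ValidationReport {
  notRed: string[]; // tests that passed before any implementation exists
  duplicateRationales: string[]; // tests whose rationale repeats an earlier one
  missingCategories: string[]; // required categories with no test at all
}

function validateReturnedTests(
  tests: ReturnedTest[],
  requiredCategories: string[]
): ValidationReport {
  // Check 1: every test must fail (red) against the current code.
  const notRed = tests.filter((t) => t.passedOnCurrentCode).map((t) => t.name);

  // Check 2: identical rationales suggest two tests catch the same failure mode.
  const seen: string[] = [];
  const duplicateRationales: string[] = [];
  for (const t of tests) {
    const key = t.rationale.trim().toLowerCase();
    if (seen.indexOf(key) !== -1) {
      duplicateRationales.push(t.name);
    } else {
      seen.push(key);
    }
  }

  // Check 3: every required edge case category needs at least one test.
  const missingCategories = requiredCategories.filter(
    (c) => !tests.some((t) => t.category === c)
  );

  return { notRed, duplicateRationales, missingCategories };
}
```

The required categories are a parameter rather than a constant because concurrency applies only to some features. Exact-match comparison of rationales is a crude proxy for "distinct failure mode"; it flags obvious duplicates and nothing subtler.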
Step 5: Hand off to implementation
With the validated failing tests in place, implementation proceeds per the test-driven-development skill: write minimal code to make them pass (green), then run regression tests.
Output Format (from the dispatched agent)
Require the agent to return:
A test plan (bullet list, grouped by category) followed by the test files. Each test must include a one-line rationale comment. No implementation code. No commentary on how to implement. If assumptions about the code are needed, list them explicitly at the top of the test file.
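The "one-line rationale comment" requirement can be spot-checked with a simple line scan. This sketch assumes a convention the skill does not mandate: each `test(` or `it(` call is immediately preceded by a line starting with `// Rationale:`. The function name and the comment prefix are both hypothetical.

```typescript
// Returns the opening line of every test that lacks a rationale comment
// directly above it. Assumes the "// Rationale:" prefix convention.
function testsMissingRationale(source: string): string[] {
  const lines = source
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "");
  const missing: string[] = [];
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (line === undefined) {
      continue;
    }
    const isTest = line.indexOf("test(") === 0 || line.indexOf("it(") === 0;
    if (!isTest) {
      continue;
    }
    const previous = i > 0 ? lines[i - 1] : undefined;
    if (previous === undefined || previous.indexOf("// Rationale:") !== 0) {
      missing.push(line);
    }
  }
  return missing;
}
```

A line scan like this misses tests declared through other helpers (for example parameterized tests), so it is a screening aid rather than a guarantee.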
Anti-patterns
- ❌ Main Agent writes the tests after sketching the implementation — tests will mirror the implementation's assumptions
- ❌ Dispatching with medium effort / weaker model to save cost — test design quality compounds across the whole feature's lifetime
- ❌ Passing the work-in-progress branch contents to the dispatched agent — defeats Independent Evaluation
- ❌ Accepting tests that pass against an empty implementation — those tests don't constrain anything
- ❌ Skipping Step 4 validation — unvalidated tests get merged as fake green
Relationship to other skills
- brainstorming → clarifies the requirement (upstream of test-designer)
- test-driven-development → governs the red-green-refactor loop (downstream; consumes the failing tests)
- systematic-debugging → kicks in if tests unexpectedly fail after implementation (downstream)
- verification-before-completion → runs the tests at the "done" gate (downstream)
Example invocation
User: Starting work on the new plugin dependency resolver. Design tests first.
Assistant:
1. Requirement: "Resolver takes a plugin manifest and returns install order
respecting deps and detecting cycles. Must handle: transitive deps,
diamond deps, self-references, missing deps, cycles."
2. Code paths: src/plugins.ts, .claude/plugins.json schema, tests/ dir
3. Dispatch to an independent-agent skill (fresh session) at `xhigh` effort,
read-only on src/, writable on tests/
4. Agent returns: test plan (5 categories, 18 tests), tests/resolver.test.ts
with failing assertions + per-test rationale comments
5. Main Agent runs tests → all red → validates rationale → hands off
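To make "all red" concrete, here is a sketch of what a few of the returned resolver tests might look like when run against an empty implementation. Everything in it is an assumption for illustration: the `Manifest` and `ResolveResult` shapes, the `resolveInstallOrder` name, and the hand-rolled runner, which stands in for the project's real test framework so the block stays self-contained.

```typescript
// Assumptions (listed up front, as the output format requires):
// - A manifest maps each plugin name to its direct dependencies.
// - The resolver returns a tagged result instead of throwing.
type Manifest = Record<string, string[]>;
type ResolveResult =
  | { status: "ok"; order: string[] }
  | { status: "error"; reason: "cycle" | "missing-dep" | "self-reference" };

// Empty implementation: the state of the code before the feature exists.
function resolveInstallOrder(_manifest: Manifest): ResolveResult {
  throw new Error("not implemented");
}

interface DesignedTest {
  name: string;
  rationale: string;
  check: () => boolean;
}

// True when `a` is present and installed before `b`.
function installedBefore(order: string[], a: string, b: string): boolean {
  return order.indexOf(a) !== -1 && order.indexOf(a) < order.indexOf(b);
}

const designedTests: DesignedTest[] = [
  {
    name: "transitive deps",
    rationale: "Catches resolvers that only order direct dependencies",
    check: () => {
      const r = resolveInstallOrder({ a: ["b"], b: ["c"], c: [] });
      return r.status === "ok" && installedBefore(r.order, "c", "b") && installedBefore(r.order, "b", "a");
    },
  },
  {
    name: "diamond deps",
    rationale: "Catches a shared dependency being installed twice",
    check: () => {
      const r = resolveInstallOrder({ a: ["b", "c"], b: ["d"], c: ["d"], d: [] });
      return r.status === "ok" && r.order.filter((p) => p === "d").length === 1;
    },
  },
  {
    name: "cycle",
    rationale: "Catches infinite recursion or silent acceptance of a cycle",
    check: () => {
      const r = resolveInstallOrder({ a: ["b"], b: ["a"] });
      return r.status === "error" && r.reason === "cycle";
    },
  },
  {
    name: "missing dep",
    rationale: "Catches install orders that reference unknown plugins",
    check: () => {
      const r = resolveInstallOrder({ a: ["ghost"] });
      return r.status === "error" && r.reason === "missing-dep";
    },
  },
];

// Minimal runner: a thrown error counts as a failure, never as a pass.
function runDesignedTests(tests: DesignedTest[]): { red: string[]; green: string[] } {
  const red: string[] = [];
  const green: string[] = [];
  for (const t of tests) {
    let passed = false;
    try {
      passed = t.check();
    } catch (_err) {
      passed = false;
    }
    (passed ? green : red).push(t.name);
  }
  return { red, green };
}
```

Note that the error-case tests assert a specific `reason` rather than "any failure". A test that merely expects the resolver to throw would pass against the empty implementation, which is exactly the fake-green outcome the validation step is meant to reject.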