Testing Agents With Subagents


Operator Context


This skill operates as an operator for agent testing workflows, configuring Claude's behavior for systematic agent validation. It applies TDD methodology to agent development — RED (observe failures), GREEN (fix agent definition), REFACTOR (edge cases and robustness) — with subagent dispatch as the execution mechanism.

Hardcoded Behaviors (Always Apply)


  • CLAUDE.md Compliance: Read and follow repository CLAUDE.md files before testing
  • Over-Engineering Prevention: Only test what's directly needed. No elaborate test harnesses or infrastructure. Keep test cases focused and minimal.
  • Verbatim Output Capture: Document exact agent outputs. NEVER summarize or paraphrase.
  • Isolated Execution: Each test runs in a fresh subagent to avoid context pollution
  • Evidence-Based Claims: Every claim about agent behavior MUST be backed by actual test execution
  • No Self-Exemption: You cannot decide an agent doesn't need testing. Human partner must confirm exemptions.

Default Behaviors (ON unless disabled)


  • Multi-Case Testing: Run at least 3 test cases per agent (success, failure, edge case)
  • Output Schema Validation: Verify agent output matches expected structure and required sections
  • Consistency Testing: Run same input 2+ times to verify deterministic behavior
  • Regression Testing: After fixes, re-run ALL previous test cases before declaring green
  • Temporary File Cleanup: Remove test files and artifacts at completion. Keep only files needed for documentation.
  • Document Findings: Log all observations, hypotheses, and test results in structured format
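The Output Schema Validation behavior can be sketched as a simple check. This is a minimal, hypothetical illustration, not part of the skill; the function name and section headings are assumptions:

```python
def validate_output_schema(output: str, required_sections: list[str]) -> list[str]:
    """Return the required section headings missing from an agent's output."""
    return [s for s in required_sections if s not in output]

# Hypothetical reviewer-agent output with one required section missing
output = "## Summary\nLooks fine.\n## Findings\nNone."
missing = validate_output_schema(output, ["## Summary", "## Findings", "## Verdict"])
```

An empty `missing` list satisfies the schema check; any entries are structural failures to log.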

Optional Behaviors (OFF unless enabled)


  • A/B Testing: Compare agent variants using agent-comparison skill
  • Performance Benchmarking: Measure response time and token usage
  • Stress Testing: Test with large inputs, many iterations, concurrent requests
  • Eval Harness Integration: Use `evals/harness.py skill-test` for YAML-based automated testing

What This Skill CAN Do


  • Systematically validate agents through RED-GREEN-REFACTOR test cycles
  • Dispatch subagents with controlled inputs and capture verbatim outputs
  • Distinguish between output structure issues and behavioral correctness issues
  • Verify fixes don't introduce regressions across the full test suite
  • Test routing logic, skill invocation, and multi-agent workflows

What This Skill CANNOT Do


  • Deploy agents without completing all three test phases
  • Substitute reading agent prompts for executing actual test runs
  • Make claims about agent behavior without evidence from subagent dispatch
  • Evaluate agent quality structurally (use agent-evaluation instead)
  • Skip the RED phase even when "the fix is obvious"


Instructions


Phase 0: PREPARE — Understand the Agent


Goal: Read the agent definition and understand what it claims to do before writing tests.

**Step 1: Read the agent file**

```bash
# Read agent definition
cat agents/{agent-name}.md

# Read any referenced skills
cat skills/{skill-name}/SKILL.md
```

**Step 2: Identify testable claims**

Extract concrete, testable behaviors from the agent definition:
- What inputs does it accept?
- What output structure does it produce?
- What routing triggers should activate it?
- What error conditions does it handle?
- What skills does it invoke?

**Step 3: Determine minimum test count**

| Agent Type | Minimum Tests | Required Coverage |
|------------|---------------|-------------------|
| Reviewer agents | 6 | 2 real issues, 2 clean, 1 edge, 1 ambiguous |
| Implementation agents | 5 | 2 typical, 1 complex, 1 minimal, 1 error |
| Analysis agents | 4 | 2 standard, 1 edge, 1 malformed |
| Routing/orchestration | 4 | 2 correct route, 1 ambiguous, 1 invalid |

No gate — this phase is preparation. Move directly to Phase 1.
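The minimum-test-count table can be expressed as a quick lookup when drafting a test plan. A sketch only; the dictionary keys are shorthand for the agent types above:

```python
# Minimum tests per agent type, taken from the table above
MIN_TESTS = {
    "reviewer": 6,
    "implementation": 5,
    "analysis": 4,
    "routing": 4,
}

def enough_tests(agent_type: str, planned: int) -> bool:
    """Check whether a draft test plan meets the minimum for its agent type."""
    return planned >= MIN_TESTS[agent_type]
```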

Phase 1: RED — Observe Current Behavior


Goal: Run the agent with test inputs and document exact current behavior before any changes.

**Step 1: Define test plan**

```markdown
# Test Plan: {agent-name}

**Agent Purpose**: {what the agent does}
**Agent File**: agents/{agent-name}.md
**Date**: {date}

## Test Cases

| ID | Input | Expected Output | Validates |
|----|-------|-----------------|-----------|
| T1 | {input} | {expected} | Happy path |
| T2 | {input} | {expected} | Error handling |
| T3 | {input} | {expected} | Edge case |
```

Write the test plan to a file before executing. This creates a reproducible baseline.

**Step 2: Dispatch subagent with test inputs**

Use the Task tool to dispatch the agent:
```python
Task(
    prompt="""
    [Test input for the agent]

    Context: [Any required context]

    {Include the actual problem/request the agent should handle}
    """,
    subagent_type="{agent-name}"
)
```

**Step 3: Capture results verbatim**

```markdown
## Test T1: Happy Path

**Input**: {exact input provided}
**Expected Output**: {what you expected}
**Actual Output**: {verbatim output from agent — do not summarize}
**Result**: PASS / FAIL
**Failure Reason**: {if FAIL, exactly what was wrong}
```

**Step 4: Identify failure patterns**
- Which test categories fail (happy path, error, edge)?
- Are failures structural (missing sections) or behavioral (wrong answers)?
- Do failures correlate with input characteristics?

**Gate**: All test cases executed. Exact outputs captured verbatim. Failures documented with specific issues identified. Proceed only when gate passes.
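One way to keep RED-phase captures structured and verbatim is a small record type. This is an illustrative sketch, not part of the skill; the field names and example values are assumptions:

```python
from dataclasses import dataclass

@dataclass
class TestResult:
    """One RED-phase observation; `actual` holds the subagent output verbatim."""
    test_id: str
    input: str
    expected: str
    actual: str
    passed: bool
    failure_reason: str = ""

# Hypothetical failing capture for test T1
r = TestResult("T1", "{exact input}", "{expected output}",
               "{verbatim agent output}", False, "Missing Verdict section")
```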

Phase 2: GREEN — Fix Agent Definition


Goal: Update the agent definition until all test cases pass. One fix at a time.

**Step 1: Prioritize failures**

Triage failures by severity:

| Severity | Description | Priority |
|----------|-------------|----------|
| Critical | Agent produces wrong answers or harmful output | Fix first |
| High | Agent missing required output sections | Fix second |
| Medium | Agent formatting or structure issues | Fix third |
| Low | Agent phrasing or style inconsistencies | Fix last |

**Step 2: Diagnose root cause**

| Failure Type | Fix Approach |
|--------------|--------------|
| Missing output section | Add explicit instruction to include the section |
| Wrong format | Add output schema with examples |
| Missing context handling | Add instructions for handling missing info |
| Incorrect classification | Add calibration examples |
| Hallucinated content | Add constraint to only use provided info |
| Agent asks questions instead of answering | Provide required context in prompt or add default handling |
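The triage order above can be applied mechanically when several tests fail at once. A minimal sketch; the failure descriptions are hypothetical:

```python
# Severity ranking from the triage table
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def triage(failures: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (severity, description) pairs so critical failures are fixed first."""
    return sorted(failures, key=lambda f: SEVERITY_ORDER[f[0]])

fails = [("low", "odd phrasing"), ("critical", "wrong answer"), ("high", "missing section")]
ordered = triage(fails)
```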
**Step 3: Make one fix at a time**

Change one thing in the agent definition. Re-run ALL test cases. Document which tests now pass/fail.

Never make multiple fixes simultaneously — you cannot determine which change was effective. This is the same principle as debugging: one variable at a time.

**Step 4: Iterate until green**

Repeat Step 3 until all test cases pass. If a fix causes a previously passing test to fail, revert and try a different approach.

Track fix iterations:

```markdown
# Fix Log

| Iteration | Change Made | Tests Passed | Tests Failed | Action |
|-----------|-------------|--------------|--------------|--------|
| 1 | Added output schema | T1, T2 | T3 | Continue |
| 2 | Added error handling instruction | T1, T2, T3 | | Green |
```

**Gate**: All test cases pass. No regressions from previously passing tests. Can explain what each fix changed and why. Proceed only when gate passes.
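The no-regressions part of this gate reduces to a set comparison between runs. A sketch with hypothetical test IDs:

```python
def regressions(previously_passing: set[str], now_passing: set[str]) -> set[str]:
    """Tests that passed before a fix but fail after it; any hit means revert."""
    return previously_passing - now_passing

# New passes are fine; losing a previously green test is a regression
assert regressions({"T1", "T2"}, {"T1", "T2", "T3"}) == set()
assert regressions({"T1", "T2"}, {"T1", "T3"}) == {"T2"}
```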

Phase 3: REFACTOR — Edge Cases and Robustness


Goal: Verify the agent handles boundary conditions and produces consistent outputs.

**Step 1: Add edge case tests**

| Category | Test Cases |
|----------|------------|
| Empty Input | Empty string, whitespace only, no context |
| Large Input | Very long content, deeply nested structures |
| Unusual Input | Malformed data, unexpected formats |
| Ambiguous Input | Cases where correct behavior is unclear |
**Step 2: Run consistency tests**

Run the same input 3 times. Outputs should be consistent:
  • Same structure
  • Same key findings (for analysis agents)
  • Acceptable variation in phrasing only

If inconsistent: add more explicit instructions to the agent definition. Re-test.
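The "same structure, phrasing may vary" rule can be checked by comparing a structural fingerprint across runs. A minimal sketch assuming markdown-style outputs; the helper names are illustrative:

```python
def headings(output: str) -> list[str]:
    """Extract markdown headings as a structural fingerprint of one run."""
    return [line for line in output.splitlines() if line.startswith("#")]

def consistent_structure(runs: list[str]) -> bool:
    """True when every run produced the same section structure."""
    return len({tuple(headings(r)) for r in runs}) == 1

runs = ["# Report\n## Findings\nIssue A", "# Report\n## Findings\nIssue A, reworded"]
```

Here `consistent_structure(runs)` holds because only the body phrasing differs, which this skill treats as acceptable variation.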
**Step 3: Run regression suite**

Re-run ALL test cases (original + edge cases) to confirm nothing broke during refactoring.

**Step 4: Document final test report**

```markdown
# Test Report: {agent-name}

| Metric | Result |
|--------|--------|
| Test Cases Run | N |
| Passed | N |
| Failed | N |
| Pass Rate | N% |

## Verdict

READY FOR DEPLOYMENT / NEEDS FIXES / REQUIRES REVIEW
```

**Gate**: Edge cases handled. Consistency verified. Full suite green. Test report documented. Testing is complete.
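The report metrics and verdict can be derived mechanically from the suite results. A sketch; the 100% bar mirrors the gates in this skill, while the review threshold is an illustrative assumption:

```python
def verdict(passed: int, run: int) -> str:
    """Map a pass rate to the test report verdict."""
    rate = passed / run
    if rate == 1.0:
        return "READY FOR DEPLOYMENT"
    if rate >= 0.8:  # assumed cutoff, not specified by the skill
        return "REQUIRES REVIEW"
    return "NEEDS FIXES"
```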

---

Examples


Example 1: Testing a New Reviewer Agent


User says: "Test the new reviewer-security agent"
Actions:
  1. Define 6 test cases: 2 real issues, 2 clean code, 1 edge case, 1 ambiguous (RED)
  2. Dispatch subagent for each, capture verbatim outputs (RED)
  3. Fix agent definition for any failures, re-run all tests (GREEN)
  4. Add edge cases (empty input, malformed code), verify consistency (REFACTOR)
Result: Agent passes all tests, report documents pass rate and verdict

Example 2: Testing After Agent Modification


User says: "I updated the golang-general-engineer, make sure it still works"
Actions:
  1. Run existing test cases against modified agent (RED)
  2. Compare outputs to previous baseline (RED)
  3. Fix any regressions introduced by the modification (GREEN)
  4. Test edge cases to verify robustness not degraded (REFACTOR)
Result: Agent modification validated, no regressions confirmed

Example 3: Testing Routing Logic


User says: "Verify the /do router sends Go requests to the right agent"
Actions:
  1. Define test cases: "Review this Go code", "Fix this .go file", "Write a goroutine" (RED)
  2. Dispatch each through router, verify correct agent handles it (RED)
  3. Fix routing triggers if wrong agent selected (GREEN)
  4. Test ambiguous inputs like "Review this code" with mixed-language context (REFACTOR)
Result: Routing validated for all trigger phrases, ambiguous cases documented


Error Handling


Error: "Agent type not found"


Cause: Agent not registered or name misspelled
Solution:
  1. Verify agent file exists: `ls agents/{agent-name}.md`
  2. Check YAML frontmatter has correct `name` field
  3. Restart Claude Code to pick up new agents

Error: "Inconsistent outputs across runs"


Cause: Agent produces different results for same input
Solution:
  1. Document the inconsistency — this is a valid finding
  2. Add more explicit instructions to agent definition
  3. Re-test consistency after fix
  4. Determine if variation is acceptable (phrasing) or problematic (structure/findings)

Error: "Subagent timeout"


Cause: Agent taking too long to respond
Solution:
  1. Simplify test input to reduce processing
  2. Check agent isn't in an infinite loop or excessive tool use
  3. Increase timeout if agent legitimately needs more time

Error: "Agent asks questions instead of answering"


Cause: Agent needs clarification that test input did not provide
Solution:
  1. This may be correct behavior — agent properly requesting context
  2. Update test input to provide the required context
  3. Or update agent definition to handle ambiguity with defaults
  4. Document whether questioning behavior is acceptable for this agent type


Anti-Patterns


Anti-Pattern 1: Testing Without Capturing Exact Output


**What it looks like**: "Tested the agent, it looks good."
**Why wrong**: No evidence of what was tested. Cannot reproduce or verify results. Subjective assessment instead of objective evidence.
**Do instead**: Capture verbatim output for every test case. Document input, expected, actual, and result.

Anti-Pattern 2: Testing Only Happy Path


**What it looks like**: "Tested with one example, it worked."
**Why wrong**: Agents fail on edge cases most often. One test proves almost nothing. False confidence in agent quality.
**Do instead**: Minimum 3-6 test cases per agent covering success, failure, edge, and ambiguous inputs.

Anti-Pattern 3: Skipping Re-test After Fixes


**What it looks like**: "Fixed the issue, should work now."
**Why wrong**: Fix might have broken other tests. No verification fix actually works. Regression bugs slip through.
**Do instead**: Re-run ALL test cases after any change. Only mark green when full suite passes.

Anti-Pattern 4: Reading Prompts Instead of Running Agents


**What it looks like**: "Checked that agent prompt has the right sections."
**Why wrong**: Reading a prompt is not executing an agent. Prompt structure does not guarantee behavior. Must verify actual output.
**Do instead**: Test what the agent DOES, not what the prompt SAYS. Execute with real inputs via Task tool.

Anti-Pattern 5: Self-Exempting from Testing


**What it looks like**: "This agent is simple, doesn't need testing." or "Simple change, no need to re-test."
**Why wrong**: Simple agents can still fail. Small changes can break behavior. You cannot self-determine exemptions from testing.
**Do instead**: Get human partner confirmation for exemptions. When in doubt, test. Document why testing was skipped if approved.


References


This skill uses these shared patterns:
  • Anti-Rationalization — Prevents shortcut rationalizations
  • Anti-Rationalization: Testing — Testing-specific rationalization blocks
  • Verification Checklist — Pre-completion checks

Domain-Specific Anti-Rationalization


| Rationalization | Why It's Wrong | Required Action |
|-----------------|----------------|-----------------|
| "Agent prompt looks correct" | Reading prompt ≠ executing agent | Dispatch subagent and capture output |
| "Tested manually in conversation" | Not reproducible, no baseline | Use Task tool for formal dispatch |
| "Only a small change" | Small changes can break agent behavior | Run full test suite |
| "Will monitor in production" | Production monitoring ≠ pre-deployment testing | Complete RED-GREEN-REFACTOR first |
| "Based on working template" | Template correctness ≠ instance correctness | Test this specific agent |

Integration


  • agent-comparison: A/B test agent variants
  • agent-evaluation: Structural quality checks
  • test-driven-development: TDD principles applied to agents

Reference Files


  • ${CLAUDE_SKILL_DIR}/references/testing-patterns.md: Dispatch patterns, test scenarios, eval harness integration