subagent-testing
Subagent Testing - TDD for Skills
Test skills with fresh subagent instances to prevent priming bias and validate effectiveness.
Overview
Fresh instances prevent priming: each test runs in a new Claude conversation, so what gets measured is the skill's impact rather than the effect of conversation history.
Why Fresh Instances Matter
The Priming Problem
Running tests in the same conversation creates bias:
- Prior context influences responses
- Skill effects get mixed with conversation history
- Can't isolate skill's true impact
Fresh Instance Benefits
- Isolation: Each test starts clean
- Reproducibility: Consistent baseline state
- Measurement: Clear before/after comparison
- Validation: Proves skill effectiveness, not priming
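The isolation idea above can be sketched in a few lines. This is a minimal illustration, not the skill's actual harness: `FreshSession`, `run_isolated`, and `count_model` are hypothetical names, and `model` is a stand-in callable where a real setup would call a fresh subagent or conversation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical model interface: takes the messages of ONE conversation, returns a reply.
Model = Callable[[List[str]], str]


@dataclass
class FreshSession:
    """One isolated conversation; nothing is shared between sessions."""
    skill_text: Optional[str] = None  # None means "run without the skill"

    def ask(self, prompt: str, model: Model) -> str:
        # The model sees only the optional skill text and this single prompt.
        messages = ([self.skill_text] if self.skill_text else []) + [prompt]
        return model(messages)


def run_isolated(prompts: List[str], model: Model,
                 skill_text: Optional[str] = None) -> List[str]:
    """Run every prompt in its own session so earlier prompts cannot prime later ones."""
    return [FreshSession(skill_text).ask(p, model) for p in prompts]


def count_model(messages: List[str]) -> str:
    # Stand-in for a real model call: reports how much context it was given.
    return f"saw {len(messages)} message(s)"
```

With the stand-in model, every prompt sees exactly one message without the skill and two with it, regardless of how many prompts ran before it. That constant context size is the property a fresh instance is meant to guarantee.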
Testing Methodology
Three-phase TDD-style approach:
Phase 1: Baseline Testing (RED)
Test without the skill to establish baseline behavior.
Phase 2: With-Skill Testing (GREEN)
Test with the skill loaded to measure improvements.
Phase 3: Rationalization Testing (REFACTOR)
Test the skill's anti-rationalization guardrails.
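One way to make the three phases concrete is to expand them into a flat run plan. The sketch below assumes that the REFACTOR phase uses a separate set of "pressure" prompts that invite the model to rationalize its way around the skill; that reading, along with the names `Phase` and `plan_runs`, is an assumption for illustration and not part of the original skill.

```python
from enum import Enum
from typing import Dict, List


class Phase(Enum):
    RED = "baseline"              # no skill loaded
    GREEN = "with-skill"          # skill loaded, same prompts as RED
    REFACTOR = "rationalization"  # skill loaded, prompts that invite shortcuts


def plan_runs(scenarios: List[str], pressure_prompts: List[str]) -> List[Dict]:
    """Expand the three phases into one entry per fresh-instance run."""
    runs = []
    for phase in Phase:
        # RED and GREEN share identical prompts; REFACTOR uses the pressure prompts.
        prompts = pressure_prompts if phase is Phase.REFACTOR else scenarios
        for prompt in prompts:
            runs.append({
                "phase": phase.name,
                "load_skill": phase is not Phase.RED,
                "prompt": prompt,
            })
    return runs
```

Each entry in the plan corresponds to one fresh instance, which keeps the phases from leaking context into each other.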
Quick Start
1. Create baseline tests (without skill)
   - Use 5 diverse scenarios
   - Document full responses
2. Create with-skill tests (fresh instances)
   - Load skill explicitly
   - Use identical prompts
   - Compare to baseline
3. Create rationalization tests
   - Test anti-rationalization patterns
   - Verify guardrails work
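The "use identical prompts" and "compare to baseline" steps can be enforced mechanically. The following is a small sketch under the assumption that each phase's documented responses are stored as a mapping from prompt to full response; `pair_by_prompt` is a hypothetical helper, not something the skill provides.

```python
from typing import Dict, List, Tuple


def pair_by_prompt(baseline: Dict[str, str],
                   with_skill: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Pair baseline and with-skill responses, refusing to compare mismatched prompts."""
    if set(baseline) != set(with_skill):
        mismatched = sorted(set(baseline) ^ set(with_skill))
        raise ValueError(f"prompts differ between phases: {mismatched}")
    # Sorted order keeps the comparison report stable across runs.
    return [(p, baseline[p], with_skill[p]) for p in sorted(baseline)]
```

Failing loudly on a prompt mismatch is a deliberate choice here: a comparison across different prompts would measure the prompt change, not the skill.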
Detailed Testing Guide
For complete testing patterns, examples, and templates:
- Testing Patterns - Full TDD methodology
- Test Examples - Baseline, with-skill, rationalization tests
- Analysis Templates - Scoring and comparison frameworks
Success Criteria
成功标准
- Baseline: Document 5+ diverse baseline scenarios
- Improvement: ≥50% improvement in skill-related metrics
- Consistency: Results reproducible across fresh instances
- Rationalization Defense: Guardrails prevent ≥80% of rationalization attempts
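The thresholds above translate directly into a pass/fail check. The sketch below rests on several assumptions the text does not spell out: scores are single numbers where higher is better, "≥50% improvement" means relative improvement over the baseline score, and "consistency" means the with-skill scores from repeated fresh instances stay within a chosen tolerance (0.1 here, picked arbitrarily). `SkillTestResults` and `check_success` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SkillTestResults:
    baseline_scenarios: int     # number of documented baseline scenarios
    baseline_score: float       # metric without the skill (higher is better)
    skill_score: float          # same metric with the skill loaded
    repeat_scores: List[float]  # with-skill score from each fresh-instance repeat
    attempts: int               # rationalization attempts made
    blocked: int                # attempts the guardrails stopped


def check_success(r: SkillTestResults, tolerance: float = 0.1) -> Dict[str, bool]:
    """Evaluate the four success criteria; tolerance is an assumed consistency bound."""
    if r.baseline_score > 0:
        improvement = (r.skill_score - r.baseline_score) / r.baseline_score
    else:
        # Relative improvement is undefined at a zero baseline; treat any gain as passing.
        improvement = float("inf") if r.skill_score > 0 else 0.0
    spread = (max(r.repeat_scores) - min(r.repeat_scores)
              if r.repeat_scores else float("inf"))
    block_rate = r.blocked / r.attempts if r.attempts else 0.0
    return {
        "baseline": r.baseline_scenarios >= 5,
        "improvement": improvement >= 0.5,
        "consistency": spread <= tolerance,
        "rationalization_defense": block_rate >= 0.8,
    }
```

For example, a baseline score of 0.4 rising to 0.7 is a 75% relative improvement and passes, while 0.5 rising to 0.6 is only 20% and does not.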
See Also
- skill-authoring: Creating effective skills
- bulletproof-skill: Anti-rationalization patterns
- test-skill: Automated skill testing command