subagent-testing

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Subagent Testing - TDD for Skills

Subagent测试 - 面向Skill的TDD方法

Test skills with fresh subagent instances to prevent priming bias and validate effectiveness.
通过全新的Subagent实例测试Skill,防止启动偏差并验证其有效性。

Table of Contents

目录

Overview

概述

Fresh instances prevent priming: Each test uses a new Claude conversation to verify the skill's impact is measured, not conversation history effects.
全新实例防止启动偏差:每个测试都使用全新的Claude对话,以此确保衡量的是Skill的影响,而非对话历史带来的效果。

Why Fresh Instances Matter

为什么全新实例至关重要

The Priming Problem

启动偏差问题

Running tests in the same conversation creates bias:
  • Prior context influences responses
  • Skill effects get mixed with conversation history
  • Can't isolate skill's true impact
在同一场对话中运行测试会产生偏差:
  • 先前的上下文会影响响应
  • Skill的效果会与对话历史混淆
  • 无法隔离Skill的真实影响

Fresh Instance Benefits

全新实例的优势

  • Isolation: Each test starts clean
  • Reproducibility: Consistent baseline state
  • Measurement: Clear before/after comparison
  • Validation: Proves skill effectiveness, not priming
  • 隔离性:每个测试都从干净状态开始
  • 可复现性:一致的基准状态
  • 可衡量性:清晰的前后对比
  • 有效性验证:证明Skill的有效性,而非启动偏差的影响

Testing Methodology

测试方法论

Three-phase TDD-style approach:
三阶段TDD风格的方法:

Phase 1: Baseline Testing (RED)

阶段1:基准测试(RED)

Test without skill to establish baseline behavior.
在不加载Skill的情况下进行测试,建立基准行为。

Phase 2: With-Skill Testing (GREEN)

阶段2:带Skill测试(GREEN)

Test with skill loaded to measure improvements.
加载Skill后进行测试,衡量改进效果。

Phase 3: Rationalization Testing (REFACTOR)

阶段3:合理化测试(REFACTOR)

Test skill's anti-rationalization guardrails.
测试Skill的反合理化防护机制。

Quick Start

快速开始

bash
undefined
bash
undefined

1. Create baseline tests (without skill)

1. 创建基准测试(不加载Skill)

Use 5 diverse scenarios

使用5种不同场景

Document full responses

记录完整响应

2. Create with-skill tests (fresh instances)

2. 创建带Skill的测试(全新实例)

Load skill explicitly

显式加载Skill

Use identical prompts

使用完全相同的提示词

Compare to baseline

与基准测试结果对比

3. Create rationalization tests

3. 创建合理化测试

Test anti-rationalization patterns

测试反合理化模式

Verify guardrails work

验证防护机制是否生效

undefined
undefined

Detailed Testing Guide

详细测试指南

For complete testing patterns, examples, and templates:
  • Testing Patterns - Full TDD methodology
  • Test Examples - Baseline, with-skill, rationalization tests
  • Analysis Templates - Scoring and comparison frameworks
如需完整的测试模式、示例和模板:
  • 测试模式 - 完整的TDD方法论
  • 测试示例 - 基准测试、带Skill测试、合理化测试示例
  • 分析模板 - 评分与对比框架

Success Criteria

成功标准

  • Baseline: Document 5+ diverse baseline scenarios
  • Improvement: ≥50% improvement in skill-related metrics
  • Consistency: Results reproducible across fresh instances
  • Rationalization Defense: Guardrails prevent ≥80% of rationalization attempts
  • 基准测试:记录5种及以上不同的基准场景
  • 改进效果:Skill相关指标提升≥50%
  • 一致性:在全新实例中可复现测试结果
  • 反合理化防护:防护机制可阻止≥80%的合理化尝试

See Also

相关链接

  • skill-authoring: Creating effective skills
  • bulletproof-skill: Anti-rationalization patterns
  • test-skill: Automated skill testing command
  • skill-authoring:创建高效Skill
  • bulletproof-skill:反合理化模式
  • test-skill:自动化Skill测试命令