Writing Skills

Overview

Writing skills IS Test-Driven Development applied to process documentation.

Personal skills live in agent-specific directories (`~/.claude/skills` for Claude Code, `~/.codex/skills` for Codex).

You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).

Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.

REQUIRED BACKGROUND: You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation.

Official guidance: For Anthropic's official skill authoring best practices, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement the TDD-focused approach in this skill.

What is a Skill?

A skill is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches.

Skills are: Reusable techniques, patterns, tools, reference guides

Skills are NOT: Narratives about how you solved a problem once

TDD Mapping for Skills

| TDD Concept | Skill Creation |
|-------------|----------------|
| Test case | Pressure scenario with subagent |
| Production code | Skill document (SKILL.md) |
| Test fails (RED) | Agent violates rule without skill (baseline) |
| Test passes (GREEN) | Agent complies with skill present |
| Refactor | Close loopholes while maintaining compliance |
| Write test first | Run baseline scenario BEFORE writing skill |
| Watch it fail | Document exact rationalizations agent uses |
| Minimal code | Write skill addressing those specific violations |
| Watch it pass | Verify agent now complies |
| Refactor cycle | Find new rationalizations → plug → re-verify |

The entire skill creation process follows RED-GREEN-REFACTOR.

When to Create a Skill

Create when:

Technique wasn't intuitively obvious to you
You'd reference this again across projects
Pattern applies broadly (not project-specific)
Others would benefit

Don't create for:

One-off solutions
Standard practices well-documented elsewhere
Project-specific conventions (put in CLAUDE.md)

Skill Types

Technique

Concrete method with steps to follow (condition-based-waiting, root-cause-tracing)

Pattern

Way of thinking about problems (flatten-with-flags, test-invariants)

Reference

API docs, syntax guides, tool documentation (office docs)

Directory Structure

skills/
  skill-name/
    SKILL.md              # Main reference (required)
    supporting-file.*     # Only if needed

Flat namespace - all skills in one searchable namespace

Separate files for:

Heavy reference (100+ lines) - API docs, comprehensive syntax
Reusable tools - Scripts, utilities, templates

Keep inline:

Principles and concepts
Code patterns (< 50 lines)
Everything else
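
One way to read the split above is as a small decision rule. The Python sketch below is only an illustration: the function name `placement` and the `kind` labels are assumptions made for this example, not part of any skills tooling, and the guidance does not say what to do with a code pattern of 50 lines or more, so the sketch flags that case instead of deciding it.

```python
def placement(kind: str, line_count: int) -> str:
    """Suggest where a piece of skill content should live.

    `kind` is a hypothetical label: "reference", "tool", "pattern",
    or anything else. Thresholds follow the guidance above: heavy
    reference is 100+ lines, inline code patterns stay under 50 lines.
    """
    if kind == "tool":
        return "separate file"  # reusable scripts, utilities, templates
    if kind == "reference" and line_count >= 100:
        return "separate file"  # heavy reference such as API docs
    if kind == "pattern" and line_count >= 50:
        return "judgment call"  # the guidance does not cover this range
    return "inline"  # principles, concepts, short patterns, everything else
```

For instance, a 600-line API reference would come back as `"separate file"`, while a 20-line code pattern stays `"inline"`.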

SKILL.md Structure

Frontmatter (YAML):

Only two fields supported: `name` and `description`
Max 1024 characters total
`name`: Use letters, numbers, and hyphens only (no parentheses, special chars)
`description`: Third-person, describes ONLY when to use (NOT what it does)
- Start with "Use when..." to focus on triggering conditions
- Include specific symptoms, situations, and contexts
- NEVER summarize the skill's process or workflow (see CSO section for why)
- Keep under 500 characters if possible
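
The frontmatter rules above are mechanical enough to check automatically. The following is a minimal sketch in Python, not an official validator: `check_frontmatter` and its messages are hypothetical, the first-person check is a rough heuristic, and it assumes the 1024-character limit is measured over the two field values combined.

```python
import re

NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")  # letters, numbers, hyphens only
MAX_FRONTMATTER_CHARS = 1024
SOFT_DESCRIPTION_LIMIT = 500


def check_frontmatter(name: str, description: str) -> list[str]:
    """Return a list of problems; an empty list means the fields look fine."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name: use letters, numbers, and hyphens only")
    if not description.startswith("Use when"):
        problems.append('description: start with "Use when..."')
    # Heuristic: first-person words suggest the text is not in third person.
    if re.search(r"\b(I|[Mm]y|[Ww]e)\b", description):
        problems.append("description: write in third person")
    # Assumption: the 1024-character limit covers both field values together.
    if len(name) + len(description) > MAX_FRONTMATTER_CHARS:
        problems.append("frontmatter: exceeds 1024 characters total")
    if len(description) > SOFT_DESCRIPTION_LIMIT:
        problems.append("description: consider trimming below 500 characters")
    return problems
```

A check like this cannot tell whether a description summarizes workflow; that still needs a human (or agent) read.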

```markdown
---
name: Skill-Name-With-Hyphens
description: Use when [specific triggering conditions and symptoms]
---

# Skill Name

## Overview
What is this? Core principle in 1-2 sentences.

## When to Use
[Small inline flowchart IF decision non-obvious]

Bullet list with SYMPTOMS and use cases
When NOT to use

## Core Pattern (for techniques/patterns)
Before/after code comparison

## Quick Reference
Table or bullets for scanning common operations

## Implementation
Inline code for simple patterns
Link to file for heavy reference or reusable tools

## Common Mistakes
What goes wrong + fixes

## Real-World Impact (optional)
Concrete results
```

Claude Search Optimization (CSO)

Critical for discovery: Future Claude needs to FIND your skill

1. Rich Description Field

Purpose: Claude reads description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?"

Format: Start with "Use when..." to focus on triggering conditions

CRITICAL: Description = When to Use, NOT What the Skill Does

The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description.

Why this matters: Testing revealed that when a description summarizes the skill's workflow, Claude may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused Claude to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality).

When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process.

The trap: Descriptions that summarize workflow create a shortcut Claude will take. The skill body becomes documentation Claude skips.

```yaml
# ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill
description: Use when executing plans - dispatches subagent per task with code review between tasks

# ❌ BAD: Too much process detail
description: Use for TDD - write test first, watch it fail, write minimal code, refactor

# ✅ GOOD: Just triggering conditions, no workflow summary
description: Use when executing implementation plans with independent tasks in the current session

# ✅ GOOD: Triggering conditions only
description: Use when implementing any feature or bugfix, before writing implementation code
```

**Content:**
- Use concrete triggers, symptoms, and situations that signal this skill applies
- Describe the *problem* (race conditions, inconsistent behavior) not *language-specific symptoms* (setTimeout, sleep)
- Keep triggers technology-agnostic unless the skill itself is technology-specific
- If skill is technology-specific, make that explicit in the trigger
- Write in third person (injected into system prompt)
- **NEVER summarize the skill's process or workflow**

```yaml
# ❌ BAD: Too abstract, vague, doesn't include when to use
description: For async testing

# ❌ BAD: First person
description: I can help you with async tests when they're flaky

# ❌ BAD: Mentions technology but skill isn't specific to it
description: Use when tests use setTimeout/sleep and are flaky

# ✅ GOOD: Starts with "Use when", describes problem, no workflow
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently

# ✅ GOOD: Technology-specific skill with explicit trigger
description: Use when using React Router and handling authentication redirects
```

2. Keyword Coverage

Use words Claude would search for:

Error messages: "Hook timed out", "ENOTEMPTY", "race condition"
Symptoms: "flaky", "hanging", "zombie", "pollution"
Synonyms: "timeout/hang/freeze", "cleanup/teardown/afterEach"
Tools: Actual commands, library names, file types
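
A quick way to audit keyword coverage is to list the terms a future reader would likely search for and check which ones the skill never mentions. The sketch below assumes plain case-insensitive substring matching; the function name `missing_keywords` and the sample skill text are made up for illustration.

```python
def missing_keywords(skill_text: str, expected_terms: list[str]) -> list[str]:
    """Return the expected search terms that never appear in the skill text."""
    haystack = skill_text.lower()
    return [term for term in expected_terms if term.lower() not in haystack]


# Illustrative skill text and search terms, not taken from a real skill.
skill_text = """
Use when tests are flaky, hanging, or fail with "Hook timed out".
Covers cleanup in afterEach and teardown steps.
"""
terms = ["flaky", "hanging", "Hook timed out", "ENOTEMPTY", "teardown"]
print(missing_keywords(skill_text, terms))  # → ['ENOTEMPTY']
```

Any term that comes back is a candidate to work into the description or overview, provided it genuinely applies to the skill.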

3. Descriptive Naming

Use active voice, verb-first:

- ✅ `creating-skills` not `skill-creation`
- ✅ `condition-based-waiting` not `async-test-helpers`

4. Token Efficiency (Critical)

Problem: getting-started and frequently-referenced skills load into EVERY conversation. Every token counts.

Target word counts:

getting-started workflows: <150 words each
Frequently-loaded skills: <200 words total
Other skills: <500 words (still be concise)
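
The targets above can be checked with a few lines of Python. This is a sketch under assumptions: the category names in `WORD_BUDGETS` are labels invented for this example, and words are counted by splitting on whitespace, which should agree with `wc -w` for ordinary text.

```python
WORD_BUDGETS = {
    "getting-started": 150,    # per workflow
    "frequently-loaded": 200,  # total
    "other": 500,
}


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def within_budget(text: str, category: str) -> bool:
    """True when the text is under the target for its (hypothetical) category."""
    return word_count(text) < WORD_BUDGETS[category]
```

Reading a SKILL.md file and passing its contents to `within_budget` gives a pass/fail signal that could run alongside other checks.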

Techniques:

Move details to tool help:

```bash
# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# ✅ GOOD: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.
```

**Use cross-references:**
```markdown
# ❌ BAD: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]

# ✅ GOOD: Reference other skill
Always use subagents (50-100x context savings).
REQUIRED: Use [other-skill-name] for workflow.
```

**Compress examples:**
```markdown
# ❌ BAD: Verbose example (42 words)
your human partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]

# ✅ GOOD: Minimal example (20 words)
Partner: "How did we handle auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]
```

**Eliminate redundancy:**
- Don't repeat what's in cross-referenced skills
- Don't explain what's obvious from command
- Don't include multiple examples of same pattern

**Verification:**
```bash
wc -w skills/path/SKILL.md
# getting-started workflows: aim for <150 each
# Other frequently-loaded: aim for <200 total
```

**Name by what you DO or core insight:**
- ✅ `condition-based-waiting` > `async-test-helpers`
- ✅ `using-skills` not `skill-usage`
- ✅ `flatten-with-flags` > `data-structure-refactoring`
- ✅ `root-cause-tracing` > `debugging-techniques`

**Gerunds (-ing) work well for processes:**
- `creating-skills`, `testing-skills`, `debugging-with-logs`
- Active, describes the action you're taking

5. Cross-Referencing Other Skills

When writing documentation that references other skills:

Use skill name only, with explicit requirement markers:

- ✅ Good: `**REQUIRED SUB-SKILL:** Use superpowers:test-driven-development`
- ✅ Good: `**REQUIRED BACKGROUND:** You MUST understand superpowers:systematic-debugging`
- ❌ Bad: `See skills/testing/test-driven-development` (unclear if required)
- ❌ Bad: `@skills/testing/test-driven-development/SKILL.md` (force-loads, burns context)

**Why no @ links:** `@` syntax force-loads files immediately, consuming 200k+ context before you need them.
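
To catch force-loading references before they ship, a skill can be scanned for path-like `@` tokens. The regular expression below is a heuristic sketch, not a parser for any real link syntax: it assumes a reference looks like `@` followed by a token containing a `/` or `.`, and `find_at_links` is a hypothetical helper name.

```python
import re

# An "@" at the start of the text or after whitespace, followed by a
# path-like token. Requiring a "/" or "." inside the token keeps ordinary
# @mentions and email addresses from matching.
AT_LINK = re.compile(r"(?:^|(?<=\s))@([\w./-]*[/.][\w./-]+)")


def find_at_links(skill_text: str) -> list[str]:
    """Return path-like @ references found in a skill document."""
    return AT_LINK.findall(skill_text)
```

Each hit is a candidate to rewrite as a plain skill name with an explicit requirement marker.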

Flowchart Usage

```dot
digraph when_flowchart {
    "Need to show information?" [shape=diamond];
    "Decision where I might go wrong?" [shape=diamond];
    "Use markdown" [shape=box];
    "Small inline flowchart" [shape=box];

    "Need to show information?" -> "Decision where I might go wrong?" [label="yes"];
    "Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"];
    "Decision where I might go wrong?" -> "Use markdown" [label="no"];
}
```

Use flowcharts ONLY for:

Non-obvious decision points
Process loops where you might stop too early
"When to use A vs B" decisions

Never use flowcharts for:

Reference material → Tables, lists
Code examples → Markdown blocks
Linear instructions → Numbered lists
Labels without semantic meaning (step1, helper2)
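
Labels of the `step1` / `helper2` kind are easy to detect mechanically. The sketch below assumes a generic label is one of a few filler words followed by digits; the word list and the helper name `generic_labels` are illustrative choices, and extracting labels from a `.dot` file is left out.

```python
import re

# Matches labels such as step1, helper2, pattern4: a filler word plus digits.
GENERIC_LABEL = re.compile(r"^(step|helper|pattern)_?\d+$", re.IGNORECASE)


def generic_labels(labels: list[str]) -> list[str]:
    """Return the labels that carry no semantic meaning."""
    return [label for label in labels if GENERIC_LABEL.match(label)]
```

Labels phrased as questions or actions, such as "Need to show information?", pass through untouched.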

See @graphviz-conventions.dot for graphviz style rules.

Visualizing for your human partner: Use `render-graphs.js` in this directory to render a skill's flowcharts to SVG:

```bash
./render-graphs.js ../some-skill           # Each diagram separately
./render-graphs.js ../some-skill --combine # All diagrams in one SVG
```

Code Examples

One excellent example beats many mediocre ones

Choose most relevant language:

Testing techniques → TypeScript/JavaScript
System debugging → Shell/Python
Data processing → Python

Good example:

Complete and runnable
Well-commented explaining WHY
From real scenario
Shows pattern clearly
Ready to adapt (not generic template)

Don't:

Implement in 5+ languages
Create fill-in-the-blank templates
Write contrived examples

You're good at porting - one great example is enough.

File Organization

Self-Contained Skill

defense-in-depth/
  SKILL.md    # Everything inline

When: All content fits, no heavy reference needed

Skill with Reusable Tool

condition-based-waiting/
  SKILL.md    # Overview + patterns
  example.ts  # Working helpers to adapt

When: Tool is reusable code, not just narrative

Skill with Heavy Reference

pptx/
  SKILL.md       # Overview + workflows
  pptxgenjs.md   # 600 lines API reference
  ooxml.md       # 500 lines XML structure
  scripts/       # Executable tools

When: Reference material too large for inline

The Iron Law (Same as TDD)

NO SKILL WITHOUT A FAILING TEST FIRST

This applies to NEW skills AND EDITS to existing skills.

Write skill before testing? Delete it. Start over. Edit skill without testing? Same violation.

No exceptions:

Not for "simple additions"
Not for "just adding a section"
Not for "documentation updates"
Don't keep untested changes as "reference"
Don't "adapt" while running tests
Delete means delete

REQUIRED BACKGROUND: The superpowers:test-driven-development skill explains why this matters. Same principles apply to documentation.

Testing All Skill Types

Different skill types need different test approaches:

Discipline-Enforcing Skills (rules/requirements)

Examples: TDD, verification-before-completion, designing-before-coding

Test with:

Academic questions: Do they understand the rules?
Pressure scenarios: Do they comply under stress?
Multiple pressures combined: time + sunk cost + exhaustion
Identify rationalizations and add explicit counters

Success criteria: Agent follows rule under maximum pressure

Technique Skills (how-to guides)

Examples: condition-based-waiting, root-cause-tracing, defensive-programming

Test with:

Application scenarios: Can they apply the technique correctly?
Variation scenarios: Do they handle edge cases?
Missing information tests: Do instructions have gaps?

Success criteria: Agent successfully applies technique to new scenario

Pattern Skills (mental models)

Examples: reducing-complexity, information-hiding concepts

Test with:

Recognition scenarios: Do they recognize when pattern applies?
Application scenarios: Can they use the mental model?
Counter-examples: Do they know when NOT to apply?

Success criteria: Agent correctly identifies when/how to apply pattern

Reference Skills (documentation/APIs)

Examples: API documentation, command references, library guides

Test with:

Retrieval scenarios: Can they find the right information?
Application scenarios: Can they use what they found correctly?
Gap testing: Are common use cases covered?

Success criteria: Agent finds and correctly applies reference information

Common Rationalizations for Skipping Testing

| Excuse | Reality |
|--------|---------|
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
| "It's just a reference" | References can have gaps, unclear sections. Test retrieval. |
| "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
| "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
| "No time to test" | Deploying untested skill wastes more time fixing it later. |

All of these mean: Test before deploying. No exceptions.

Bulletproofing Skills Against Rationalization

Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.

Psychology note: Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.

Close Every Loophole Explicitly

Don't just state the rule - forbid specific workarounds:

<Bad>
```markdown
Write code before test? Delete it.
```
</Bad>

<Good>
```markdown
Write code before test? Delete it. Start over.

**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete
```
</Good>

Address "Spirit vs Letter" Arguments

Add foundational principle early:

```markdown
**Violating the letter of the rules is violating the spirit of the rules.**
```

This cuts off entire class of "I'm following the spirit" rationalizations.

Build Rationalization Table

Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table:

```markdown
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
```
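
If rationalizations are collected as data during baseline testing, the table can be generated rather than hand-edited. A minimal sketch, assuming the excuses are kept as (excuse, reality) pairs; `rationalization_table` is a hypothetical helper, and it does not escape `|` characters inside cells.

```python
def rationalization_table(rows: list[tuple[str, str]]) -> str:
    """Render (excuse, reality) pairs as a Markdown table."""
    lines = ["| Excuse | Reality |", "|--------|---------|"]
    for excuse, reality in rows:
        # Excuses are quoted because they are recorded verbatim from the agent.
        lines.append(f'| "{excuse}" | {reality} |')
    return "\n".join(lines)


rows = [
    ("Too simple to test", "Simple code breaks. Test takes 30 seconds."),
    ("I'll test after", "Tests passing immediately prove nothing."),
]
print(rationalization_table(rows))
```

Keeping the pairs as data also makes it simple to append a new row each time a refactor round surfaces another excuse.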

Create Red Flags List

Make it easy for agents to self-check when rationalizing:

```markdown
## Red Flags - STOP and Start Over

- Code before test
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "This is different because..."

**All of these mean: Delete code. Start over with TDD.**
```

Update CSO for Violation Symptoms

Add to description: symptoms of when you're ABOUT to violate the rule:

```yaml
description: Use when implementing any feature or bugfix, before writing implementation code
```

RED-GREEN-REFACTOR for Skills

Follow the TDD cycle:

RED: Write Failing Test (Baseline)

Run pressure scenario with subagent WITHOUT the skill. Document exact behavior:

What choices did they make?
What rationalizations did they use (verbatim)?
Which pressures triggered violations?

This is "watch the test fail" - you must see what agents naturally do before writing the skill.

GREEN: Write Minimal Skill

Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.

Run same scenarios WITH skill. Agent should now comply.

REFACTOR: Close Loopholes

Agent found new rationalization? Add explicit counter. Re-test until bulletproof.

Testing methodology: See @testing-skills-with-subagents.md for the complete testing methodology:

How to write pressure scenarios
Pressure types (time, sunk cost, authority, exhaustion)
Plugging holes systematically
Meta-testing techniques
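
The cycle above can be summarized as a loop. The sketch below is an outline under stated assumptions, not a real testing harness: `run_scenario` and `write_counter` are hypothetical callables that a person (or a subagent dispatcher) would supply, and in practice writing each counter is manual work rather than a function call.

```python
from typing import Callable

# Hypothetical signature: run one pressure scenario against an agent that has
# been given `skill_text` (empty string = no skill) and return the
# rationalizations it used, verbatim. An empty list means it complied.
RunScenario = Callable[[str, str], list[str]]


def refine_skill(
    scenarios: list[str],
    run_scenario: RunScenario,
    write_counter: Callable[[str], str],
    max_rounds: int = 5,
) -> str:
    """Drive a skill through RED, GREEN, and REFACTOR; return the skill text."""
    # RED: baseline without the skill. No failure means nothing to teach.
    baseline = [r for s in scenarios for r in run_scenario(s, "")]
    if not baseline:
        raise ValueError("no baseline failure observed; do not write the skill")

    # GREEN: minimal skill that counters only the observed rationalizations.
    skill_text = "\n".join(write_counter(r) for r in baseline)

    # REFACTOR: re-run with the skill, plug each new rationalization, repeat.
    for _ in range(max_rounds):
        new = [r for s in scenarios for r in run_scenario(s, skill_text)]
        if not new:
            return skill_text
        skill_text += "\n" + "\n".join(write_counter(r) for r in new)
    raise RuntimeError("agent still rationalizing after max_rounds")
```

The early `ValueError` mirrors the Iron Law: without an observed baseline failure, the loop refuses to produce a skill at all.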

Anti-Patterns

❌ Narrative Example

"In session 2025-10-03, we found empty projectDir caused..."
Why bad: Too specific, not reusable

❌ Multi-Language Dilution

example-js.js, example-py.py, example-go.go
Why bad: Mediocre quality, maintenance burden

❌ Code in Flowcharts

```dot
step1 [label="import fs"];
step2 [label="read file"];
```

Why bad: Can't copy-paste, hard to read

❌ Generic Labels

helper1, helper2, step3, pattern4
Why bad: Labels should have semantic meaning

STOP: Before Moving to Next Skill

After writing ANY skill, you MUST STOP and complete the deployment process.

Do NOT:

Create multiple skills in batch without testing each
Move to next skill before current one is verified
Skip testing because "batching is more efficient"

The deployment checklist below is MANDATORY for EACH skill.

Deploying untested skills = deploying untested code. It's a violation of quality standards.

Skill Creation Checklist (TDD Adapted)

Discovery Workflow

How future Claude finds your skill:

Encounters problem ("tests are flaky")
Finds SKILL (description matches)
Scans overview (is this relevant?)
Reads patterns (quick reference table)
Loads example (only when implementing)

Optimize for this flow - put searchable terms early and often.

The Bottom Line

Creating skills IS TDD for process documentation.

Same Iron Law: No skill without failing test first. Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes). Same benefits: Better quality, fewer surprises, bulletproof results.

If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.
