writing-skills
Writing Skills
Overview
Writing skills IS Test-Driven Development applied to process documentation.
Personal skills live in agent-specific directories (`~/.claude/skills` for Claude Code, `~/.codex/skills` for Codex).
You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).
Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.
REQUIRED BACKGROUND: You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation.
Official guidance: For Anthropic's official skill authoring best practices, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement the TDD-focused approach in this skill.
What is a Skill?
A skill is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches.
Skills are: Reusable techniques, patterns, tools, reference guides
Skills are NOT: Narratives about how you solved a problem once
TDD Mapping for Skills
| TDD Concept | Skill Creation |
|---|---|
| Test case | Pressure scenario with subagent |
| Production code | Skill document (SKILL.md) |
| Test fails (RED) | Agent violates rule without skill (baseline) |
| Test passes (GREEN) | Agent complies with skill present |
| Refactor | Close loopholes while maintaining compliance |
| Write test first | Run baseline scenario BEFORE writing skill |
| Watch it fail | Document exact rationalizations agent uses |
| Minimal code | Write skill addressing those specific violations |
| Watch it pass | Verify agent now complies |
| Refactor cycle | Find new rationalizations → plug → re-verify |
The entire skill creation process follows RED-GREEN-REFACTOR.
When to Create a Skill
Create when:
- Technique wasn't intuitively obvious to you
- You'd reference this again across projects
- Pattern applies broadly (not project-specific)
- Others would benefit
Don't create for:
- One-off solutions
- Standard practices well-documented elsewhere
- Project-specific conventions (put in CLAUDE.md)
- Mechanical constraints (if it's enforceable with regex/validation, automate it—save documentation for judgment calls)
Skill Types
Technique
Concrete method with steps to follow (condition-based-waiting, root-cause-tracing)
Pattern
Way of thinking about problems (flatten-with-flags, test-invariants)
Reference
API docs, syntax guides, tool documentation (office docs)
Directory Structure
```
skills/
  skill-name/
    SKILL.md              # Main reference (required)
    supporting-file.*     # Only if needed
```
Flat namespace - all skills in one searchable namespace
Separate files for:
- Heavy reference (100+ lines) - API docs, comprehensive syntax
- Reusable tools - Scripts, utilities, templates
Keep inline:
- Principles and concepts
- Code patterns (< 50 lines)
- Everything else
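The layout rule (one directory per skill, each with a required `SKILL.md`) can be checked mechanically. The following is a minimal sketch, assuming a local skills root directory. The helper name `find_skills_missing_main_file` is hypothetical and not part of any tool mentioned in this document.

```python
from pathlib import Path


def find_skills_missing_main_file(skills_root: Path) -> list[str]:
    """Return names of skill directories that lack the required SKILL.md."""
    missing = []
    for entry in sorted(skills_root.iterdir()):
        # Flat namespace: every direct child directory is treated as one skill.
        if entry.is_dir() and not (entry / "SKILL.md").is_file():
            missing.append(entry.name)
    return missing
```

Calling it with `Path.home() / ".claude" / "skills"` would list any personal skill directory that has supporting files but no main reference.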
SKILL.md Structure
Frontmatter (YAML):
- Only two fields supported: `name` and `description`
- Max 1024 characters total
- `name`: Use letters, numbers, and hyphens only (no parentheses, special chars)
- `description`: Third-person, describes ONLY when to use (NOT what it does)
  - Start with "Use when..." to focus on triggering conditions
  - Include specific symptoms, situations, and contexts
  - NEVER summarize the skill's process or workflow (see CSO section for why)
  - Keep under 500 characters if possible
```markdown
---
name: Skill-Name-With-Hyphens
description: Use when [specific triggering conditions and symptoms]
---

# Skill Name

## Overview
What is this? Core principle in 1-2 sentences.

## When to Use
[Small inline flowchart IF decision non-obvious]

Bullet list with SYMPTOMS and use cases
When NOT to use

## Core Pattern (for techniques/patterns)
Before/after code comparison

## Quick Reference
Table or bullets for scanning common operations

## Implementation
Inline code for simple patterns
Link to file for heavy reference or reusable tools

## Common Mistakes
What goes wrong + fixes

## Real-World Impact (optional)
Concrete results
```
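The frontmatter rules are mechanical enough to lint before a skill is tested with subagents. The sketch below is one possible check, not an official validator. It assumes single-line `key: value` fields, and it assumes the 1024-character limit applies to the text between the two `---` lines. `check_frontmatter` is a hypothetical name.

```python
import re

NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")
MAX_FRONTMATTER_CHARS = 1024  # assumption: counted over the text between the --- lines


def check_frontmatter(skill_text: str) -> list[str]:
    """Return a list of problems found in a SKILL.md frontmatter block."""
    lines = skill_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing opening --- line"]

    body = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        body.append(line)
    else:
        return ["missing closing --- line"]

    fields = {}
    for line in body:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            return [f"not a key: value line: {line!r}"]
        fields[key.strip()] = value.strip()

    problems = []
    extra = sorted(set(fields) - {"name", "description"})
    if extra:
        problems.append(f"unsupported fields: {extra}")
    for required in ("name", "description"):
        if required not in fields:
            problems.append(f"missing field: {required}")
    if "name" in fields and not NAME_PATTERN.fullmatch(fields["name"]):
        problems.append("name may contain only letters, numbers, and hyphens")
    if len("\n".join(body)) > MAX_FRONTMATTER_CHARS:
        problems.append("frontmatter exceeds 1024 characters")
    return problems
```

An empty list means the structural rules hold. It says nothing about whether the description is a good trigger, which still needs judgment and testing.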
Claude Search Optimization (CSO)
Critical for discovery: Future Claude needs to FIND your skill
1. Rich Description Field
Purpose: Claude reads description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?"
Format: Start with "Use when..." to focus on triggering conditions
CRITICAL: Description = When to Use, NOT What the Skill Does
The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description.
Why this matters: Testing revealed that when a description summarizes the skill's workflow, Claude may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused Claude to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality).
When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process.
The trap: Descriptions that summarize workflow create a shortcut Claude will take. The skill body becomes documentation Claude skips.
```yaml
# ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill
description: Use when executing plans - dispatches subagent per task with code review between tasks

# ❌ BAD: Too much process detail
description: Use for TDD - write test first, watch it fail, write minimal code, refactor

# ✅ GOOD: Just triggering conditions, no workflow summary
description: Use when executing implementation plans with independent tasks in the current session

# ✅ GOOD: Triggering conditions only
description: Use when implementing any feature or bugfix, before writing implementation code
```
Content:
- Use concrete triggers, symptoms, and situations that signal this skill applies
- Describe the *problem* (race conditions, inconsistent behavior) not *language-specific symptoms* (setTimeout, sleep)
- Keep triggers technology-agnostic unless the skill itself is technology-specific
- If skill is technology-specific, make that explicit in the trigger
- Write in third person (injected into system prompt)
- NEVER summarize the skill's process or workflow
```yaml
# ❌ BAD: Too abstract, vague, doesn't include when to use
description: For async testing

# ❌ BAD: First person
description: I can help you with async tests when they're flaky

# ❌ BAD: Mentions technology but skill isn't specific to it
description: Use when tests use setTimeout/sleep and are flaky

# ✅ GOOD: Starts with "Use when", describes problem, no workflow
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently

# ✅ GOOD: Technology-specific skill with explicit trigger
description: Use when using React Router and handling authentication redirects
```
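Two of the description rules, the "Use when" opening and the third-person voice, can be approximated with a small lint. This is a rough heuristic sketch with a hypothetical `lint_description` helper. It cannot detect a workflow summary or an overly abstract trigger, so those still need a human or subagent review.

```python
import re

# Standalone first-person words; a heuristic, not a grammar check.
FIRST_PERSON = re.compile(r"\b(I|[Mm]y|[Mm]e|[Ww]e|[Oo]ur)\b")


def lint_description(description: str) -> list[str]:
    """Flag descriptions that break the mechanical wording rules."""
    problems = []
    if not description.startswith("Use when"):
        problems.append('does not start with "Use when"')
    if FIRST_PERSON.search(description):
        problems.append("uses first person")
    return problems
```

Run against the examples in this section, the abstract and first-person descriptions are flagged, while "Use when tests use setTimeout/sleep and are flaky" passes. That is the limit of the heuristic: the technology-agnostic rule is not something a regex can judge.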
2. Keyword Coverage
Use words Claude would search for:
- Error messages: "Hook timed out", "ENOTEMPTY", "race condition"
- Symptoms: "flaky", "hanging", "zombie", "pollution"
- Synonyms: "timeout/hang/freeze", "cleanup/teardown/afterEach"
- Tools: Actual commands, library names, file types
3. Descriptive Naming
Use active voice, verb-first:
- ✅ `creating-skills` not `skill-creation`
- ✅ `condition-based-waiting` not `async-test-helpers`
4. Token Efficiency (Critical)
Problem: getting-started and frequently-referenced skills load into EVERY conversation. Every token counts.
Target word counts:
- getting-started workflows: <150 words each
- Frequently-loaded skills: <200 words total
- Other skills: <500 words (still be concise)
Techniques:
Move details to tool help:
```bash
# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# ✅ GOOD: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.
```
Use cross-references:
```markdown
# ❌ BAD: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]

# ✅ GOOD: Reference other skill
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.
```
Compress examples:
```markdown
# ❌ BAD: Verbose example (42 words)
your human partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]

# ✅ GOOD: Minimal example (20 words)
Partner: "How did we handle auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]
```
Eliminate redundancy:
- Don't repeat what's in cross-referenced skills
- Don't explain what's obvious from command
- Don't include multiple examples of same pattern
Verification:
```bash
wc -w skills/path/SKILL.md
# getting-started workflows: aim for <150 each
# Other frequently-loaded: aim for <200 total
```
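The word-count targets can also be checked in a script instead of by eye. The sketch below counts whitespace-separated words, which should closely match `wc -w` for ordinary text. The budget table mirrors the targets listed in this section, and the category keys and function names are hypothetical.

```python
# Targets from this section; the category keys are illustrative labels.
WORD_BUDGETS = {
    "getting-started": 150,
    "frequently-loaded": 200,
    "other": 500,
}


def word_count(text: str) -> int:
    """Count whitespace-separated words, as `wc -w` does for ordinary text."""
    return len(text.split())


def over_budget(text: str, category: str) -> bool:
    """True when the text does not stay under the budget for its category."""
    return word_count(text) >= WORD_BUDGETS[category]
```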
Name by what you DO or core insight:
- ✅ `condition-based-waiting` > `async-test-helpers`
- ✅ `using-skills` not `skill-usage`
- ✅ `flatten-with-flags` > `data-structure-refactoring`
- ✅ `root-cause-tracing` > `debugging-techniques`
Gerunds (-ing) work well for processes:
- `creating-skills`, `testing-skills`, `debugging-with-logs`
- Active, describes the action you're taking
5. Cross-Referencing Other Skills
When writing documentation that references other skills:
Use skill name only, with explicit requirement markers:
- ✅ Good: `**REQUIRED SUB-SKILL:** Use superpowers:test-driven-development`
- ✅ Good: `**REQUIRED BACKGROUND:** You MUST understand superpowers:systematic-debugging`
- ❌ Bad: `See skills/testing/test-driven-development` (unclear if required)
- ❌ Bad: `@skills/testing/test-driven-development/SKILL.md` (force-loads, burns context)
Why no @ links: `@` syntax force-loads files immediately, consuming 200k+ context before you need them.
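A quick scan can catch `@` references before they reach a deployed skill. This is a minimal sketch under the assumption that a force-loading link looks like an `@` at the start of a line or after whitespace, followed by a path-like token. The exact syntax the agent recognizes may differ, and `find_force_load_links` is a hypothetical name.

```python
import re

# "@" not preceded by a non-space character, then a path-like token.
AT_LINK = re.compile(r"(?<!\S)@[\w./-]+")


def find_force_load_links(skill_text: str) -> list[str]:
    """Return every @-style file reference found in the skill text."""
    return AT_LINK.findall(skill_text)
```

The lookbehind keeps email-style text such as `a@b.com` from being reported, so the scan flags only tokens that begin with `@`.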
Flowchart Usage
```dot
digraph when_flowchart {
    "Need to show information?" [shape=diamond];
    "Decision where I might go wrong?" [shape=diamond];
    "Use markdown" [shape=box];
    "Small inline flowchart" [shape=box];
    "Need to show information?" -> "Decision where I might go wrong?" [label="yes"];
    "Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"];
    "Decision where I might go wrong?" -> "Use markdown" [label="no"];
}
```
Use flowcharts ONLY for:
- Non-obvious decision points
- Process loops where you might stop too early
- "When to use A vs B" decisions
Never use flowcharts for:
- Reference material → Tables, lists
- Code examples → Markdown blocks
- Linear instructions → Numbered lists
- Labels without semantic meaning (step1, helper2)
See @graphviz-conventions.dot for graphviz style rules.
Visualizing for your human partner: Use `render-graphs.js` in this directory to render a skill's flowcharts to SVG:
```bash
./render-graphs.js ../some-skill           # Each diagram separately
./render-graphs.js ../some-skill --combine # All diagrams in one SVG
```
Code Examples
One excellent example beats many mediocre ones
Choose most relevant language:
- Testing techniques → TypeScript/JavaScript
- System debugging → Shell/Python
- Data processing → Python
Good example:
- Complete and runnable
- Well-commented explaining WHY
- From real scenario
- Shows pattern clearly
- Ready to adapt (not generic template)
Don't:
- Implement in 5+ languages
- Create fill-in-the-blank templates
- Write contrived examples
You're good at porting - one great example is enough.
File Organization
Self-Contained Skill
```
defense-in-depth/
  SKILL.md    # Everything inline
```
When: All content fits, no heavy reference needed
Skill with Reusable Tool
```
condition-based-waiting/
  SKILL.md    # Overview + patterns
  example.ts  # Working helpers to adapt
```
When: Tool is reusable code, not just narrative
Skill with Heavy Reference
```
pptx/
  SKILL.md       # Overview + workflows
  pptxgenjs.md   # 600 lines API reference
  ooxml.md       # 500 lines XML structure
  scripts/       # Executable tools
```
When: Reference material too large for inline
The Iron Law (Same as TDD)
```
NO SKILL WITHOUT A FAILING TEST FIRST
```
This applies to NEW skills AND EDITS to existing skills.
Write skill before testing? Delete it. Start over.
Edit skill without testing? Same violation.
No exceptions:
- Not for "simple additions"
- Not for "just adding a section"
- Not for "documentation updates"
- Don't keep untested changes as "reference"
- Don't "adapt" while running tests
- Delete means delete
REQUIRED BACKGROUND: The superpowers:test-driven-development skill explains why this matters. Same principles apply to documentation.
Testing All Skill Types
Different skill types need different test approaches:
Discipline-Enforcing Skills (rules/requirements)
Examples: TDD, verification-before-completion, designing-before-coding
Test with:
- Academic questions: Do they understand the rules?
- Pressure scenarios: Do they comply under stress?
- Multiple pressures combined: time + sunk cost + exhaustion
- Identify rationalizations and add explicit counters
Success criteria: Agent follows rule under maximum pressure
Technique Skills (how-to guides)
Examples: condition-based-waiting, root-cause-tracing, defensive-programming
Test with:
- Application scenarios: Can they apply the technique correctly?
- Variation scenarios: Do they handle edge cases?
- Missing information tests: Do instructions have gaps?
Success criteria: Agent successfully applies technique to new scenario
Pattern Skills (mental models)
Examples: reducing-complexity, information-hiding concepts
Test with:
- Recognition scenarios: Do they recognize when pattern applies?
- Application scenarios: Can they use the mental model?
- Counter-examples: Do they know when NOT to apply?
Success criteria: Agent correctly identifies when/how to apply pattern
Reference Skills (documentation/APIs)
Examples: API documentation, command references, library guides
Test with:
- Retrieval scenarios: Can they find the right information?
- Application scenarios: Can they use what they found correctly?
- Gap testing: Are common use cases covered?
Success criteria: Agent finds and correctly applies reference information
Common Rationalizations for Skipping Testing
| Excuse | Reality |
|---|---|
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
| "It's just a reference" | References can have gaps, unclear sections. Test retrieval. |
| "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
| "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
| "No time to test" | Deploying untested skill wastes more time fixing it later. |
All of these mean: Test before deploying. No exceptions.
Bulletproofing Skills Against Rationalization
Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.
Psychology note: Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.
Close Every Loophole Explicitly
Don't just state the rule - forbid specific workarounds:
<Bad>
```markdown
Write code before test? Delete it.
```
</Bad>
<Good>
```markdown
Write code before test? Delete it. Start over.
No exceptions:
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete
```
</Good>
Address "Spirit vs Letter" Arguments
Add foundational principle early:
```markdown
**Violating the letter of the rules is violating the spirit of the rules.**
```
This cuts off an entire class of "I'm following the spirit" rationalizations.
Build Rationalization Table
Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table:
```markdown
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
```
Create Red Flags List
Make it easy for agents to self-check when rationalizing:
```markdown
## Red Flags - STOP and Start Over

- Code before test
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "This is different because..."

All of these mean: Delete code. Start over with TDD.
```
Update CSO for Violation Symptoms
Add to description: symptoms of when you're ABOUT to violate the rule:
```yaml
description: Use when implementing any feature or bugfix, before writing implementation code
```
RED-GREEN-REFACTOR for Skills
Follow the TDD cycle:
RED: Write Failing Test (Baseline)
Run pressure scenario with subagent WITHOUT the skill. Document exact behavior:
- What choices did they make?
- What rationalizations did they use (verbatim)?
- Which pressures triggered violations?
This is "watch the test fail" - you must see what agents naturally do before writing the skill.
GREEN: Write Minimal Skill
Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.
Run same scenarios WITH skill. Agent should now comply.
REFACTOR: Close Loopholes
Agent found new rationalization? Add explicit counter. Re-test until bulletproof.
Testing methodology: See @testing-skills-with-subagents.md for the complete testing methodology:
- How to write pressure scenarios
- Pressure types (time, sunk cost, authority, exhaustion)
- Plugging holes systematically
- Meta-testing techniques
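The RED, GREEN, and REFACTOR phases described above can be read as a small decision procedure over recorded scenario runs. The sketch below is illustrative only. `ScenarioRun` and `next_step` are hypothetical names, and how a run is judged as compliant is left to whoever reviews the subagent transcript.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScenarioRun:
    """Outcome of one pressure scenario run with a subagent."""
    scenario: str
    complied: bool
    rationalizations: list[str] = field(default_factory=list)  # verbatim quotes


def next_step(baseline: ScenarioRun, with_skill: Optional[ScenarioRun]) -> str:
    """Map a baseline run and an optional with-skill run to the current phase."""
    if baseline.complied:
        # Without an observed failure the RED step has not happened yet.
        return "No failing baseline yet: RED is not satisfied"
    if with_skill is None:
        return "RED confirmed: write a minimal skill for the recorded rationalizations"
    if not with_skill.complied:
        return "REFACTOR: add explicit counters for the new rationalizations and re-test"
    return "GREEN: agent complies with the skill present"
```

Keeping the verbatim rationalizations on each run means the same records can later feed the rationalization table and the red flags list.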
Anti-Patterns
❌ Narrative Example
"In session 2025-10-03, we found empty projectDir caused..."
Why bad: Too specific, not reusable
❌ Multi-Language Dilution
example-js.js, example-py.py, example-go.go
Why bad: Mediocre quality, maintenance burden
❌ Code in Flowcharts
```dot
step1 [label="import fs"];
step2 [label="read file"];
```
Why bad: Can't copy-paste, hard to read
❌ Generic Labels
helper1, helper2, step3, pattern4
Why bad: Labels should have semantic meaning
STOP: Before Moving to Next Skill
After writing ANY skill, you MUST STOP and complete the deployment process.
Do NOT:
- Create multiple skills in batch without testing each
- Move to next skill before current one is verified
- Skip testing because "batching is more efficient"
The deployment checklist below is MANDATORY for EACH skill.
Deploying untested skills = deploying untested code. It's a violation of quality standards.
Skill Creation Checklist (TDD Adapted)
IMPORTANT: Use TodoWrite to create todos for EACH checklist item below.
RED Phase - Write Failing Test:
- Create pressure scenarios (3+ combined pressures for discipline skills)
- Run scenarios WITHOUT skill - document baseline behavior verbatim
- Identify patterns in rationalizations/failures
GREEN Phase - Write Minimal Skill:
- Name uses only letters, numbers, hyphens (no parentheses/special chars)
- YAML frontmatter with only name and description (max 1024 chars)
- Description starts with "Use when..." and includes specific triggers/symptoms
- Description written in third person
- Keywords throughout for search (errors, symptoms, tools)
- Clear overview with core principle
- Address specific baseline failures identified in RED
- Code inline OR link to separate file
- One excellent example (not multi-language)
- Run scenarios WITH skill - verify agents now comply
REFACTOR Phase - Close Loopholes:
- Identify NEW rationalizations from testing
- Add explicit counters (if discipline skill)
- Build rationalization table from all test iterations
- Create red flags list
- Re-test until bulletproof
Quality Checks:
- Small flowchart only if decision non-obvious
- Quick reference table
- Common mistakes section
- No narrative storytelling
- Supporting files only for tools or heavy reference
Deployment:
- Commit skill to git and push to your fork (if configured)
- Consider contributing back via PR (if broadly useful)
Discovery Workflow
How future Claude finds your skill:
- Encounters problem ("tests are flaky")
- Finds SKILL (description matches)
- Scans overview (is this relevant?)
- Reads patterns (quick reference table)
- Loads example (only when implementing)
Optimize for this flow - put searchable terms early and often.
The Bottom Line
Creating skills IS TDD for process documentation.
Same Iron Law: No skill without failing test first.
Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes).
Same benefits: Better quality, fewer surprises, bulletproof results.
If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.