nbl.writing-skills
Writing Skills
编写技能
Overview
概述
Writing skills IS Test-Driven Development applied to process documentation.
Personal skills live in agent-specific directories (`~/.claude/skills` for Claude Code, `~/.agents/skills/` for Codex).
You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).
Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.
REQUIRED BACKGROUND: You MUST understand nbl.test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation.
Official guidance: For Anthropic's official skill authoring best practices, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement the TDD-focused approach in this skill.
编写技能本质是将测试驱动开发(TDD)应用于流程文档编写。
个人技能存储在Agent专属目录中(Claude Code对应路径为 `~/.claude/skills`,Codex对应路径为 `~/.agents/skills/`)
你需要先编写测试用例(使用子Agent模拟高压场景),观察测试失败(基准行为),然后编写技能(文档),再观察测试通过(Agent遵守规则),最后重构(填补漏洞)。
核心原则: 如果你没有见证过Agent在无该技能时表现失败,你就无法确认该技能是否传递了正确的内容。
必备背景: 使用本技能前你必须理解 nbl.test-driven-development,该技能定义了基础的「红-绿-重构」循环,本技能是将TDD适配到文档编写场景的延伸。
官方指引: 如需查看Anthropic官方的技能编写最佳实践,请参考 anthropic-best-practices.md,本文档提供了额外的模式和指引,作为本技能聚焦TDD方法的补充。
What is a Skill?
什么是Skill?
A skill is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches.
Skills are: Reusable techniques, patterns, tools, reference guides
Skills are NOT: Narratives about how you solved a problem once
Skill是经过验证的技术、模式或工具的参考指南,能够帮助后续的Claude实例找到并应用有效的解决方案。
Skill的属性: 可复用的技术、模式、工具、参考指南
Skill不属于: 单次解决某个问题的过程记录
TDD Mapping for Skills
Skill编写的TDD映射关系
| TDD Concept | Skill Creation |
|---|---|
| Test case | Pressure scenario with subagent |
| Production code | Skill document (SKILL.md) |
| Test fails (RED) | Agent violates rule without skill (baseline) |
| Test passes (GREEN) | Agent complies with skill present |
| Refactor | Close loopholes while maintaining compliance |
| Write test first | Run baseline scenario BEFORE writing skill |
| Watch it fail | Document exact rationalizations agent uses |
| Minimal code | Write skill addressing those specific violations |
| Watch it pass | Verify agent now complies |
| Refactor cycle | Find new rationalizations → plug → re-verify |
The entire skill creation process follows RED-GREEN-REFACTOR.
| TDD概念 | 技能创建对应环节 |
|---|---|
| 测试用例 | 带子Agent的高压场景 |
| 生产代码 | 技能文档(SKILL.md) |
| 测试失败(红) | 无技能加持时Agent违反规则(基准行为) |
| 测试通过(绿) | 加载技能后Agent遵守规则 |
| 重构 | 在保持规则合规性的前提下填补漏洞 |
| 先写测试 | 编写技能前先运行基准场景 |
| 观察失败 | 记录Agent给出的所有合理化借口原文 |
| 最小化代码 | 针对已发现的违规点编写对应技能内容 |
| 观察通过 | 验证Agent现在可以遵守规则 |
| 重构循环 | 发现新的合理化借口 → 填补漏洞 → 重新验证 |
整个技能创建流程严格遵循「红-绿-重构」循环。
When to Create a Skill
什么时候需要创建Skill
Create when:
- Technique wasn't intuitively obvious to you
- You'd reference this again across projects
- Pattern applies broadly (not project-specific)
- Others would benefit
Don't create for:
- One-off solutions
- Standard practices well-documented elsewhere
- Project-specific conventions (put in CLAUDE.md)
- Mechanical constraints (if it's enforceable with regex/validation, automate it—save documentation for judgment calls)
适合创建的场景:
- 你觉得相关技术不是直觉就能想到的
- 你会在多个项目中重复参考该内容
- 相关模式适用范围广(不是特定项目专属)
- 其他开发者也能从中受益
不适合创建的场景:
- 一次性解决方案
- 其他地方已有完善文档的标准实践
- 特定项目的约定(放在CLAUDE.md中即可)
- 可以通过正则/校验自动化强制执行的机械约束(文档只需要记录需要判断的场景)
Skill Types
Skill类型
Technique
技术类
Concrete method with steps to follow (condition-based-waiting, root-cause-tracing)
有明确步骤可遵循的具体方法(condition-based-waiting、root-cause-tracing)
Pattern
模式类
Way of thinking about problems (flatten-with-flags, test-invariants)
思考问题的方式(flatten-with-flags、test-invariants)
Reference
参考类
API docs, syntax guides, tool documentation (office docs)
API文档、语法指南、工具文档(office文档)
Directory Structure
目录结构
skills/
  skill-name/
    SKILL.md # Main reference (required)
    supporting-file.* # Only if needed
Flat namespace - all skills in one searchable namespace
Separate files for:
- Heavy reference (100+ lines) - API docs, comprehensive syntax
- Reusable tools - Scripts, utilities, templates
Keep inline:
- Principles and concepts
- Code patterns (< 50 lines)
- Everything else
skills/
  skill-name/
    SKILL.md # 主参考文档(必填)
    supporting-file.* # 必要时才添加
扁平命名空间 - 所有技能都放在同一个可搜索的命名空间下
适合拆分单独文件的场景:
- 厚重参考内容(超过100行)- API文档、完整语法说明
- 可复用工具 - 脚本、工具函数、模板
适合放在内联的内容:
- 原则和概念
- 代码模式(少于50行)
- 其他所有内容
SKILL.md Structure
SKILL.md结构
Frontmatter (YAML):
- Only two fields supported: `name` and `description`
- Max 1024 characters total
- `name`: Use letters, numbers, and hyphens only (no parentheses, special chars)
- `description`: Third-person, describes ONLY when to use (NOT what it does)
  - Start with "Use when..." to focus on triggering conditions
  - Include specific symptoms, situations, and contexts
  - NEVER summarize the skill's process or workflow (see CSO section for why)
  - Keep under 500 characters if possible
markdown
---
name: Skill-Name-With-Hyphens
description: Use when [specific triggering conditions and symptoms]
---
前置元数据(YAML):
- 仅支持两个字段:`name` 和 `description`
- 总长度最大1024字符
- `name`:仅允许使用字母、数字和连字符(不允许括号、特殊字符)
- `description`:第三人称,仅描述使用场景(不要描述具体功能)
  - 以"Use when..."开头,聚焦触发条件
  - 包含具体的症状、场景和上下文
  - 绝对不要总结技能的流程或工作流(原因见CSO章节)
  - 尽量控制在500字符以内
markdown
---
name: Skill-Name-With-Hyphens
description: Use when [specific triggering conditions and symptoms]
---
Skill Name
技能名称
Overview
概述
What is this? Core principle in 1-2 sentences.
这是什么?用1-2句话说明核心原则。
When to Use
适用场景
[Small inline flowchart IF decision non-obvious]
Bullet list with SYMPTOMS and use cases
When NOT to use
[如果决策逻辑不直观,可放小型内联流程图]
带症状和使用场景的无序列表
不适用的场景
Core Pattern (for techniques/patterns)
核心模式(技术/模式类技能需要)
Before/after code comparison
代码对比的前后效果
Quick Reference
快速参考
Table or bullets for scanning common operations
表格或列表形式,方便快速查阅常用操作
Implementation
实现方式
Inline code for simple patterns
Link to file for heavy reference or reusable tools
简单模式放内联代码
厚重参考或可复用工具链接到单独文件
Common Mistakes
常见错误
What goes wrong + fixes
可能出现的问题 + 修复方案
Real-World Impact (optional)
实际业务影响(可选)
Concrete results
具体的效果数据
Claude Search Optimization (CSO)
Claude搜索优化(CSO)
Critical for discovery: Future Claude needs to FIND your skill
对技能可被发现至关重要: 后续的Claude需要能够找到你编写的技能
1. Rich Description Field
1. 完善描述字段
Purpose: Claude reads description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?"
Format: Start with "Use when..." to focus on triggering conditions
CRITICAL: Description = When to Use, NOT What the Skill Does
The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description.
Why this matters: Testing revealed that when a description summarizes the skill's workflow, Claude may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused Claude to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality).
When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process.
The trap: Descriptions that summarize workflow create a shortcut Claude will take. The skill body becomes documentation Claude skips.
```yaml
# ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill
description: Use when executing plans - dispatches subagent per task with code review between tasks

# ❌ BAD: Too much process detail
description: Use for TDD - write test first, watch it fail, write minimal code, refactor

# ✅ GOOD: Just triggering conditions, no workflow summary
description: Use when executing implementation plans with independent tasks in the current session

# ✅ GOOD: Triggering conditions only
description: Use when implementing any feature or bugfix, before writing implementation code
```
**Content:**
- Use concrete triggers, symptoms, and situations that signal this skill applies
- Describe the *problem* (race conditions, inconsistent behavior) not *language-specific symptoms* (setTimeout, sleep)
- Keep triggers technology-agnostic unless the skill itself is technology-specific
- If skill is technology-specific, make that explicit in the trigger
- Write in third person (injected into system prompt)
- **NEVER summarize the skill's process or workflow**
```yaml
# ❌ BAD: Too abstract, vague, doesn't include when to use
description: For async testing

# ❌ BAD: First person
description: I can help you with async tests when they're flaky

# ❌ BAD: Mentions technology but skill isn't specific to it
description: Use when tests use setTimeout/sleep and are flaky

# ✅ GOOD: Starts with "Use when", describes problem, no workflow
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently

# ✅ GOOD: Technology-specific skill with explicit trigger
description: Use when using React Router and handling authentication redirects
```
作用: Claude会读取描述字段来判断当前任务需要加载哪些技能,要让描述能够回答:「我现在需要读这个技能吗?」
格式: 以"Use when..."开头,聚焦触发条件
关键规则:描述=适用场景,不是技能功能说明
描述应该只说明触发条件,不要在描述中总结技能的流程或工作流。
为什么重要: 测试显示,如果描述中总结了技能的工作流,Claude可能会直接按照描述执行,而不读取完整的技能内容。比如描述写「任务间代码评审」会导致Claude只做1次评审,哪怕技能里的流程图明确要求2次评审(先规范合规评审,再代码质量评审)。
当描述修改为仅保留「Use when executing implementation plans with independent tasks」(无工作流总结)时,Claude会正确读取流程图,并遵循两阶段评审流程。
陷阱: 总结工作流的描述会让Claude走捷径,跳过技能主体内容。
```yaml
# ❌ 错误:总结了工作流 - Claude可能直接按描述执行而不读取技能内容
description: Use when executing plans - dispatches subagent per task with code review between tasks

# ❌ 错误:包含太多流程细节
description: Use for TDD - write test first, watch it fail, write minimal code, refactor

# ✅ 正确:仅说明触发条件,无工作流总结
description: Use when executing implementation plans with independent tasks in the current session

# ✅ 正确:仅说明触发条件
description: Use when implementing any feature or bugfix, before writing implementation code
```
**内容要求:**
- 使用具体的触发条件、症状和场景来表明该技能适用
- 描述*问题*(竞争条件、行为不一致),不是*特定语言的症状*(setTimeout、sleep)
- 除非技能本身是特定技术栈专属,否则触发条件要保持技术无关
- 如果是技术栈专属技能,要在触发条件中明确说明
- 使用第三人称编写(会被注入到系统提示词中)
- **绝对不要总结技能的流程或工作流**
```yaml
# ❌ 错误:太抽象、模糊,未说明适用场景
description: For async testing

# ❌ 错误:第一人称
description: I can help you with async tests when they're flaky

# ❌ 错误:提到了特定技术,但技能并不是该技术专属
description: Use when tests use setTimeout/sleep and are flaky

# ✅ 正确:以"Use when"开头,描述问题,无工作流内容
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently

# ✅ 正确:技术专属技能,明确触发条件
description: Use when using React Router and handling authentication redirects
```
2. Keyword Coverage
2. 关键词覆盖
Use words Claude would search for:
- Error messages: "Hook timed out", "ENOTEMPTY", "race condition"
- Symptoms: "flaky", "hanging", "zombie", "pollution"
- Synonyms: "timeout/hang/freeze", "cleanup/teardown/afterEach"
- Tools: Actual commands, library names, file types
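The keyword list above lends itself to a quick mechanical check. The following is a minimal sketch, not part of the original skill: the function name `missing_keywords` is hypothetical, and it assumes a plain case-insensitive substring match is close enough to how a search for these terms would behave.

```python
from typing import List


def missing_keywords(skill_text: str, keywords: List[str]) -> List[str]:
    """Return the keywords that never appear in the skill text.

    Matching is a case-insensitive substring test (an assumption made
    for this sketch, not a description of how Claude searches).
    """
    lowered = skill_text.lower()
    return [kw for kw in keywords if kw.lower() not in lowered]
```

Running it against a draft with the error messages, symptoms, and synonyms you expect a future reader to search for shows which terms still need a natural place in the text.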
使用Claude可能会搜索的词汇:
- 错误信息:"Hook timed out"、"ENOTEMPTY"、"race condition"
- 症状:"flaky"、"hanging"、"zombie"、"pollution"
- 同义词:"timeout/hang/freeze"、"cleanup/teardown/afterEach"
- 工具:实际命令、库名、文件类型
3. Descriptive Naming
3. 描述性命名
Use active voice, verb-first:
- ✅ `creating-skills` not `skill-creation`
- ✅ `condition-based-waiting` not `async-test-helpers`
使用主动语态,动词开头:
- ✅ `creating-skills` 优于 `skill-creation`
- ✅ `condition-based-waiting` 优于 `async-test-helpers`
4. Token Efficiency (Critical)
4. Token效率(非常重要)
Problem: getting-started and frequently-referenced skills load into EVERY conversation. Every token counts.
Target word counts:
- getting-started workflows: <150 words each
- Frequently-loaded skills: <200 words total
- Other skills: <500 words (still be concise)
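The word-count targets above can be checked in a few lines. This is a sketch under stated assumptions: the budget keys (`getting-started`, `frequently-loaded`, `other`) are labels invented here for the three categories in the list, and words are counted by splitting on whitespace, which is how `wc -w` counts them.

```python
# Budgets taken from the list above, read as "fewer than N words".
# The dictionary keys are hypothetical labels for the three categories.
WORD_BUDGETS = {
    "getting-started": 150,
    "frequently-loaded": 200,
    "other": 500,
}


def count_words(text: str) -> int:
    """Count whitespace-separated words, as `wc -w` does."""
    return len(text.split())


def within_budget(text: str, kind: str) -> bool:
    """Return True when the text stays strictly under the budget for its kind."""
    return count_words(text) < WORD_BUDGETS[kind]
```

A check like this only flags length; deciding what to cut still takes the judgment described in the techniques that follow.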
Techniques:
Move details to tool help:
```bash
# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# ✅ GOOD: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.
```
**Use cross-references:**
```markdown
# ❌ BAD: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]

# ✅ GOOD: Reference other skill
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.
```
**Compress examples:**
```markdown
# ❌ BAD: Verbose example (42 words)
your human partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]

# ✅ GOOD: Minimal example (20 words)
Partner: "How did we handle auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]
```
**Eliminate redundancy:**
- Don't repeat what's in cross-referenced skills
- Don't explain what's obvious from command
- Don't include multiple examples of same pattern
**Verification:**
```bash
wc -w skills/path/SKILL.md
# getting-started workflows: aim for <150 each
# Other frequently-loaded: aim for <200 total
```
**Name by what you DO or core insight:**
- ✅ `condition-based-waiting` > `async-test-helpers`
- ✅ `using-skills` not `skill-usage`
- ✅ `flatten-with-flags` > `data-structure-refactoring`
- ✅ `root-cause-tracing` > `debugging-techniques`
**Gerunds (-ing) work well for processes:**
- `creating-skills`, `testing-skills`, `debugging-with-logs`
- Active, describes the action you're taking
问题: 入门指南和高频引用的技能会被加载到每一次对话中,每一个token都很宝贵。
目标字数:
- 入门工作流:每个小于150字
- 高频加载技能:总字数小于200字
- 其他技能:小于500字(保持简洁)
优化技巧:
把细节移到工具帮助中:
```bash
# ❌ 错误:在SKILL.md中说明所有参数
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# ✅ 正确:引用--help
search-conversations supports multiple modes and filters. Run --help for details.
```
**使用交叉引用:**
```markdown
# ❌ 错误:重复工作流细节
When searching, dispatch subagent with template...
[20行重复说明]

# ✅ 正确:引用其他技能
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.
```
**精简示例:**
```markdown
# ❌ 错误:冗长示例(42词)
your human partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]

# ✅ 正确:极简示例(20词)
Partner: "How did we handle auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]
```
**消除冗余:**
- 不要重复交叉引用的技能中已经有的内容
- 不要解释从命令就能明显看出的功能
- 不要为同一个模式提供多个示例
**验证方式:**
```bash
wc -w skills/path/SKILL.md
# 入门工作流:目标小于150字
# 其他高频加载技能:目标小于200字
```
**按操作内容或核心洞察命名:**
- ✅ `condition-based-waiting` > `async-test-helpers`
- ✅ `using-skills` 优于 `skill-usage`
- ✅ `flatten-with-flags` > `data-structure-refactoring`
- ✅ `root-cause-tracing` > `debugging-techniques`
**动名词(-ing)很适合流程类技能:**
- `creating-skills`、`testing-skills`、`debugging-with-logs`
- 主动语态,描述你正在执行的动作
5. Cross-Referencing Other Skills
5. 其他技能的交叉引用
When writing documentation that references other skills:
Use skill name only, with explicit requirement markers:
- ✅ Good: `**REQUIRED SUB-SKILL:** Use nbl.test-driven-development`
- ✅ Good: `**REQUIRED BACKGROUND:** You MUST understand nbl.systematic-debugging`
- ❌ Bad: `See skills/testing/test-driven-development` (unclear if required)
- ❌ Bad: `@skills/testing/test-driven-development/SKILL.md` (force-loads, burns context)
Why no @ links: `@` syntax force-loads files immediately, consuming 200k+ context before you need them.
编写文档需要引用其他技能时:
仅使用技能名称,加上明确的要求标记:
- ✅ 正确:`**必填子技能:** 使用 nbl.test-driven-development`
- ✅ 正确:`**必备背景:** 你必须理解 nbl.systematic-debugging`
- ❌ 错误:`See skills/testing/test-driven-development`(不清楚是否必填)
- ❌ 错误:`@skills/testing/test-driven-development/SKILL.md`(强制加载,提前消耗上下文)
为什么不要用@链接: `@` 语法会立即强制加载文件,在你需要之前就消耗200k+的上下文。
Flowchart Usage
流程图使用规范
dot
digraph when_flowchart {
"Need to show information?" [shape=diamond];
"Decision where I might go wrong?" [shape=diamond];
"Use markdown" [shape=box];
"Small inline flowchart" [shape=box];
"Need to show information?" -> "Decision where I might go wrong?" [label="yes"];
"Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"];
"Decision where I might go wrong?" -> "Use markdown" [label="no"];
}
Use flowcharts ONLY for:
- Non-obvious decision points
- Process loops where you might stop too early
- "When to use A vs B" decisions
Never use flowcharts for:
- Reference material → Tables, lists
- Code examples → Markdown blocks
- Linear instructions → Numbered lists
- Labels without semantic meaning (step1, helper2)
See @graphviz-conventions.dot for graphviz style rules.
Visualizing for your human partner: Use `render-graphs.js` in this directory to render a skill's flowcharts to SVG:
bash
./render-graphs.js ../some-skill # Each diagram separately
./render-graphs.js ../some-skill --combine # All diagrams in one SVG
dot
digraph when_flowchart {
"Need to show information?" [shape=diamond];
"Decision where I might go wrong?" [shape=diamond];
"Use markdown" [shape=box];
"Small inline flowchart" [shape=box];
"Need to show information?" -> "Decision where I might go wrong?" [label="yes"];
"Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"];
"Decision where I might go wrong?" -> "Use markdown" [label="no"];
}
仅以下场景适合使用流程图:
- 不直观的决策点
- 可能提前终止的流程循环
- 「什么时候用A什么时候用B」的决策场景
以下场景不要使用流程图:
- 参考材料 → 用表格、列表
- 代码示例 → 用Markdown代码块
- 线性指令 → 用有序列表
- 无语义含义的标签(step1、helper2)
Graphviz样式规则请参考 @graphviz-conventions.dot。
为人类协作方可视化: 使用当前目录下的 `render-graphs.js` 将技能中的流程图渲染为SVG:
bash
./render-graphs.js ../some-skill # 每个图表单独导出
./render-graphs.js ../some-skill --combine # 所有图表合并为一个SVG
Code Examples
代码示例规范
One excellent example beats many mediocre ones
Choose most relevant language:
- Testing techniques → TypeScript/JavaScript
- System debugging → Shell/Python
- Data processing → Python
Good example:
- Complete and runnable
- Well-commented explaining WHY
- From real scenario
- Shows pattern clearly
- Ready to adapt (not generic template)
Don't:
- Implement in 5+ languages
- Create fill-in-the-blank templates
- Write contrived examples
You're good at porting - one great example is enough.
一个优秀的示例好过一堆平庸的示例
选择最相关的语言:
- 测试技术 → TypeScript/JavaScript
- 系统调试 → Shell/Python
- 数据处理 → Python
优秀示例的特点:
- 完整可运行
- 注释完善,说明为什么这么写
- 来自真实场景
- 清晰展示模式
- 可直接适配使用(不是通用模板)
不要做的事:
- 用5种以上语言实现
- 创建填空模板
- 写虚构的示例
你很擅长代码移植,一个优秀的示例就足够了。
File Organization
文件组织方式
Self-Contained Skill
自包含技能
defense-in-depth/
  SKILL.md # Everything inline
When: All content fits, no heavy reference needed
defense-in-depth/
  SKILL.md # 所有内容都内联
适用场景:所有内容都能放下,不需要厚重参考材料
Skill with Reusable Tool
带可复用工具的技能
condition-based-waiting/
  SKILL.md # Overview + patterns
  example.ts # Working helpers to adapt
When: Tool is reusable code, not just narrative
condition-based-waiting/
  SKILL.md # 概述 + 模式
  example.ts # 可适配的可用工具函数
适用场景:工具是可复用代码,不只是说明性内容
Skill with Heavy Reference
带厚重参考内容的技能
pptx/
  SKILL.md # Overview + workflows
  pptxgenjs.md # 600 lines API reference
  ooxml.md # 500 lines XML structure
  scripts/ # Executable tools
When: Reference material too large for inline
pptx/
  SKILL.md # 概述 + 工作流
  pptxgenjs.md # 600行API参考
  ooxml.md # 500行XML结构说明
  scripts/ # 可执行工具
适用场景:参考内容太多,不适合内联
The Iron Law (Same as TDD)
铁律(和TDD一致)
NO SKILL WITHOUT A FAILING TEST FIRST
This applies to NEW skills AND EDITS to existing skills.
Write skill before testing? Delete it. Start over.
Edit skill without testing? Same violation.
No exceptions:
- Not for "simple additions"
- Not for "just adding a section"
- Not for "documentation updates"
- Don't keep untested changes as "reference"
- Don't "adapt" while running tests
- Delete means delete
REQUIRED BACKGROUND: The nbl.test-driven-development skill explains why this matters. Same principles apply to documentation.
没有先跑通失败测试,就不要写技能
这适用于新技能创建和现有技能的编辑。
测试前就写技能?删掉,重来。
没测试就编辑技能?同样违规。
没有例外:
- 哪怕是「简单补充」也不行
- 哪怕是「只是加个章节」也不行
- 哪怕是「文档更新」也不行
- 不要保留未测试的改动当「参考」
- 不要在跑测试的时候「适配」内容
- 删除就是彻底删除
必备背景: nbl.test-driven-development 技能解释了为什么这很重要,同样的原则适用于文档编写。
Testing All Skill Types
所有技能类型的测试
Different skill types need different test approaches:
不同类型的技能需要不同的测试方法:
Discipline-Enforcing Skills (rules/requirements)
规则约束类技能(规则/要求)
Examples: TDD, verification-before-completion, designing-before-coding
Test with:
- Academic questions: Do they understand the rules?
- Pressure scenarios: Do they comply under stress?
- Multiple pressures combined: time + sunk cost + exhaustion
- Identify rationalizations and add explicit counters
Success criteria: Agent follows rule under maximum pressure
示例: TDD、完成前验证、编码前设计
测试方法:
- 理论问题:是否理解规则?
- 高压场景:压力下是否还能遵守规则?
- 多重压力叠加:时间压力 + 沉没成本 + 疲惫
- 识别所有合理化借口,添加明确的应对条款
成功标准: Agent在最大压力下仍能遵守规则
Technique Skills (how-to guides)
技术类技能(操作指南)
Examples: condition-based-waiting, root-cause-tracing, defensive-programming
Test with:
- Application scenarios: Can they apply the technique correctly?
- Variation scenarios: Do they handle edge cases?
- Missing information tests: Do instructions have gaps?
Success criteria: Agent successfully applies technique to new scenario
示例: condition-based-waiting、root-cause-tracing、defensive-programming
测试方法:
- 应用场景:是否能正确应用该技术?
- 变体场景:是否能处理边界 case?
- 信息缺失测试:说明是否有遗漏?
成功标准: Agent能成功将技术应用到新场景
Pattern Skills (mental models)
模式类技能(思维模型)
Examples: reducing-complexity, information-hiding concepts
Test with:
- Recognition scenarios: Do they recognize when pattern applies?
- Application scenarios: Can they use the mental model?
- Counter-examples: Do they know when NOT to apply?
Success criteria: Agent correctly identifies when/how to apply pattern
示例: reducing-complexity、信息隐藏概念
测试方法:
- 识别场景:是否能识别出适用该模式的场景?
- 应用场景:是否能使用该思维模型?
- 反例测试:是否知道什么时候不该用?
成功标准: Agent能正确识别什么时候/如何应用该模式
Reference Skills (documentation/APIs)
参考类技能(文档/API)
Examples: API documentation, command references, library guides
Test with:
- Retrieval scenarios: Can they find the right information?
- Application scenarios: Can they use what they found correctly?
- Gap testing: Are common use cases covered?
Success criteria: Agent finds and correctly applies reference information
示例: API文档、命令参考、库指南
测试方法:
- 检索场景:是否能找到正确的信息?
- 应用场景:是否能正确使用找到的信息?
- 缺口测试:是否覆盖了常用场景?
成功标准: Agent能找到并正确应用参考信息
Common Rationalizations for Skipping Testing
跳过测试的常见借口
| Excuse | Reality |
|---|---|
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
| "It's just a reference" | References can have gaps, unclear sections. Test retrieval. |
| "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
| "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
| "No time to test" | Deploying untested skill wastes more time fixing it later. |
All of these mean: Test before deploying. No exceptions.
| 借口 | 实际情况 |
|---|---|
| "技能内容明显很清晰" | 你觉得清晰≠其他Agent觉得清晰,测试一下。 |
| "只是个参考文档而已" | 参考文档也可能有缺口、表述不清的地方,测试检索能力。 |
| "测试太小题大做了" | 未测试的技能一定有问题,15分钟测试能节省数小时后续时间。 |
| "出问题了我再测试" | 出问题=Agent已经用不了这个技能了,部署前就测试。 |
| "测试太麻烦了" | 测试比生产环境排查坏技能的问题要简单得多。 |
| "我有信心没问题" | 过度自信一定会出问题,还是要测试。 |
| "理论评审就够了" | 读≠会用,测试应用场景。 |
| "没时间测试" | 部署未测试的技能后续修复会浪费更多时间。 |
所有这些借口都指向同一个结论:部署前必须测试,没有例外。
Bulletproofing Skills Against Rationalization
让技能免疫合理化借口
Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.
Psychology note: Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.
约束类技能(比如TDD)需要能抵御合理化借口,Agent很聪明,压力下会找各种漏洞。
心理学提示: 理解说服技术的生效原理能帮你系统性地应用它们,相关研究基础请参考 persuasion-principles.md(Cialdini, 2021; Meincke et al., 2025),涵盖权威、承诺、稀缺、社会认同和统一性原则。
Close Every Loophole Explicitly
明确封堵所有漏洞
Don't just state the rule - forbid specific workarounds:
<Bad>
```markdown
Write code before test? Delete it.
```
</Bad>
<Good>
```markdown
Write code before test? Delete it. Start over.
No exceptions:
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete
```
</Good>
不要只说规则,还要明确禁止具体的变通方法:
<错误示例>
```markdown
测试前写代码?删掉。
```
</错误示例>
<正确示例>
```markdown
测试前写代码?删掉,重来。
**没有例外:**
- 不要保留当「参考」
- 不要写测试的时候「适配」这些代码
- 不要看这些代码
- 删除就是彻底删除
```
</正确示例>
Address "Spirit vs Letter" Arguments
应对「精神vs字面」争议
Add foundational principle early:
markdown
**Violating the letter of the rules is violating the spirit of the rules.**
This cuts off an entire class of "I'm following the spirit" rationalizations.
提前添加基础原则:
markdown
**违反规则的字面要求就是违反规则的精神要求。**
这能直接切断所有「我是遵循规则精神」的合理化借口。
Build Rationalization Table
建立合理化借口对照表
Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table:
markdown
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
从基准测试中收集所有合理化借口(见下方测试章节),Agent找的所有借口都要放进表格:
markdown
| 借口 | 实际情况 |
|--------|---------|
| "太简单了不需要测试" | 简单代码也会出问题,测试只需要30秒。 |
| "我之后再测试" | 一写完就直接通过的测试什么都证明不了。 |
| "后写测试也能达到一样的效果" | 后写测试=「这段代码做了什么?」,先写测试=「这段代码应该做什么?」 |
Create Red Flags List
创建红色预警列表
Make it easy for agents to self-check when rationalizing:
markdown
Red Flags - STOP and Start Over
- Code before test
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "This is different because..."
All of these mean: Delete code. Start over with TDD.
方便Agent自我检查是否在找借口:
markdown
红色预警 - 停止操作并重来
- 测试前写代码
- "我已经手动测试过了"
- "后写测试也能达到一样的目的"
- "重要的是精神不是形式"
- "这次不一样因为..."
出现以上任何情况: 删除代码,用TDD流程重来。
Update CSO for Violation Symptoms
为违规症状更新CSO
Add to description: symptoms of when you're ABOUT to violate the rule:
yaml
description: use when implementing any feature or bugfix, before writing implementation code
在描述中添加你即将违反规则的症状:
yaml
description: use when implementing any feature or bugfix, before writing implementation code
RED-GREEN-REFACTOR for Skills
技能编写的「红-绿-重构」流程
Follow the TDD cycle:
遵循TDD循环:
RED: Write Failing Test (Baseline)
红:编写失败测试(基准)
Run pressure scenario with subagent WITHOUT the skill. Document exact behavior:
- What choices did they make?
- What rationalizations did they use (verbatim)?
- Which pressures triggered violations?
This is "watch the test fail" - you must see what agents naturally do before writing the skill.
在不加载技能的情况下用子Agent运行高压场景,记录准确行为:
- 他们做了什么选择?
- 他们给出了什么合理化借口(原文记录)?
- 什么压力触发了违规?
这就是「观察测试失败」——编写技能前你必须知道Agent的天然行为是什么。
GREEN: Write Minimal Skill
绿:编写最小化技能
Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.
Run same scenarios WITH skill. Agent should now comply.
针对已发现的合理化借口编写技能内容,不要为假设的场景添加额外内容。
加载技能后运行相同的场景,Agent应该能遵守规则。
REFACTOR: Close Loopholes
重构:填补漏洞
Agent found new rationalization? Add explicit counter. Re-test until bulletproof.
Testing methodology: See @testing-skills-with-subagents.md for the complete testing methodology:
- How to write pressure scenarios
- Pressure types (time, sunk cost, authority, exhaustion)
- Plugging holes systematically
- Meta-testing techniques
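The RED and GREEN steps described above can be sketched as a small harness. Everything named here is hypothetical: `AgentRunner` stands in for whatever dispatches a subagent, `violated` is a caller-supplied predicate that decides whether a reply breaks the rule, and the skill is passed as plain text. It is an illustration of the cycle's shape under those assumptions, not the methodology in testing-skills-with-subagents.md.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical signature: (scenario prompt, optional skill text) -> agent reply.
# A real harness would dispatch a subagent here.
AgentRunner = Callable[[str, Optional[str]], str]


@dataclass
class ScenarioResult:
    complied: bool
    rationalization: str = ""  # verbatim reply, kept only when the rule was violated


def run_scenario(agent: AgentRunner, scenario: str, skill: Optional[str],
                 violated: Callable[[str], bool]) -> ScenarioResult:
    """Run one pressure scenario and record the reply verbatim on violation."""
    reply = agent(scenario, skill)
    if violated(reply):
        return ScenarioResult(complied=False, rationalization=reply)
    return ScenarioResult(complied=True)


def red_green(agent: AgentRunner, scenario: str, skill: str,
              violated: Callable[[str], bool]) -> Tuple[ScenarioResult, ScenarioResult]:
    """RED: the baseline without the skill must fail. GREEN: rerun with the skill."""
    baseline = run_scenario(agent, scenario, None, violated)
    if baseline.complied:
        # No observed failure means the scenario does not exercise the skill.
        raise ValueError("baseline complied; strengthen the pressure scenario first")
    with_skill = run_scenario(agent, scenario, skill, violated)
    return baseline, with_skill
```

The REFACTOR step then amounts to repeating `red_green` with new scenarios after each counter is added, keeping every recorded `rationalization` for the table.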
Agent找到了新的合理化借口?添加明确的应对条款,重新测试直到无懈可击。
测试方法: 完整测试方法请参考 @testing-skills-with-subagents.md:
- 如何编写高压场景
- 压力类型(时间、沉没成本、权威、疲惫)
- 系统性填补漏洞
- 元测试技术
Anti-Patterns
反模式
❌ Narrative Example
❌ 叙事式示例
"In session 2025-10-03, we found empty projectDir caused..."
Why bad: Too specific, not reusable
"在2025-10-03的会话中,我们发现空的projectDir导致了..."
为什么不好: 太具体,不可复用
❌ Multi-Language Dilution
❌ 多语言稀释
example-js.js, example-py.py, example-go.go
Why bad: Mediocre quality, maintenance burden
示例同时提供js、py、go等多个版本
为什么不好: 质量平庸,维护成本高
❌ Code in Flowcharts
❌ 流程图里放代码
dot
step1 [label="import fs"];
step2 [label="read file"];
Why bad: Can't copy-paste, hard to read
dot
step1 [label="import fs"];
step2 [label="read file"];
为什么不好: 不能复制粘贴,难阅读
❌ Generic Labels
❌ 通用标签
helper1, helper2, step3, pattern4
Why bad: Labels should have semantic meaning
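Generic labels of this kind are mechanical enough to catch with a regular expression. A minimal sketch, assuming that "generic" means one of the stems `step`, `helper`, or `pattern` followed by digits, as in the examples above; the stem list and the function name `find_generic_labels` are choices made for this illustration.

```python
import re
from typing import List

# Assumed definition of "generic": a stem from the examples above plus digits.
GENERIC_LABEL = re.compile(r"\b(?:step|helper|pattern)\d+\b", re.IGNORECASE)


def find_generic_labels(dot_source: str) -> List[str]:
    """Return the distinct generic identifiers found in DOT source, sorted."""
    return sorted({match.group(0) for match in GENERIC_LABEL.finditer(dot_source)})
```

Any non-empty result is a prompt to rename the node after what it decides or does.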
helper1、helper2、step3、pattern4
为什么不好: 标签应该有语义含义
STOP: Before Moving to Next Skill
停止:进入下一个技能前
After writing ANY skill, you MUST STOP and complete the deployment process.
Do NOT:
- Create multiple skills in batch without testing each
- Move to next skill before current one is verified
- Skip testing because "batching is more efficient"
The deployment checklist below is MANDATORY for EACH skill.
Deploying untested skills = deploying untested code. It's a violation of quality standards.
编写完任何技能后,你必须停止并完成部署流程。
不要做:
- 批量创建多个技能,每个都不测试
- 当前技能还没验证就进入下一个
- 觉得「批量更高效」就跳过测试
下面的部署检查清单对每个技能都是强制要求的。
部署未测试的技能=部署未测试的代码,违反质量标准。
Skill Creation Checklist (TDD Adapted)
技能创建检查清单(TDD适配版)
IMPORTANT: Use TodoWrite to create todos for EACH checklist item below.
RED Phase - Write Failing Test:
- Create pressure scenarios (3+ combined pressures for discipline skills)
- Run scenarios WITHOUT skill - document baseline behavior verbatim
- Identify patterns in rationalizations/failures
GREEN Phase - Write Minimal Skill:
- Name uses only letters, numbers, hyphens (no parentheses/special chars)
- YAML frontmatter with only name and description (max 1024 chars)
- Description starts with "Use when..." and includes specific triggers/symptoms
- Description written in third person
- Keywords throughout for search (errors, symptoms, tools)
- Clear overview with core principle
- Address specific baseline failures identified in RED
- Code inline OR link to separate file
- One excellent example (not multi-language)
- Run scenarios WITH skill - verify agents now comply
REFACTOR Phase - Close Loopholes:
- Identify NEW rationalizations from testing
- Add explicit counters (if discipline skill)
- Build rationalization table from all test iterations
- Create red flags list
- Re-test until bulletproof
Quality Checks:
- Small flowchart only if decision non-obvious
- Quick reference table
- Common mistakes section
- No narrative storytelling
- Supporting files only for tools or heavy reference
Deployment:
- Commit skill to git and push to your fork (if configured)
- Consider contributing back via PR (if broadly useful)
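The mechanical frontmatter items in the GREEN phase of this checklist can be verified automatically. The sketch below makes several assumptions that the source does not settle: frontmatter is treated as flat `key: value` lines between two `---` markers (no YAML library, no multi-line values), the 1024-character limit is measured as the combined length of the `name` and `description` values, and the function names are hypothetical.

```python
import re
from typing import Dict, List

NAME_PATTERN = re.compile(r"^[A-Za-z0-9-]+$")  # letters, numbers, hyphens only
MAX_CHARS = 1024  # assumed to apply to name + description values combined


def parse_frontmatter(skill_md: str) -> Dict[str, str]:
    """Parse flat `key: value` pairs between the first two `---` lines."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("missing closing '---'")
    fields: Dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        fields[key.strip()] = value.strip()
    return fields


def check_frontmatter(skill_md: str) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    fields = parse_frontmatter(skill_md)
    problems: List[str] = []
    extra = sorted(set(fields) - {"name", "description"})
    if extra:
        problems.append(f"unsupported fields: {extra}")
    name = fields.get("name", "")
    description = fields.get("description", "")
    if not NAME_PATTERN.match(name):
        problems.append("name must use only letters, numbers, and hyphens")
    if not description.startswith("Use when"):
        problems.append('description should start with "Use when"')
    if len(name) + len(description) > MAX_CHARS:
        problems.append(f"name and description exceed {MAX_CHARS} characters")
    return problems
```

Checks like these cover only the mechanical items; whether the description names the right triggers, and whether agents comply under pressure, still has to be established by running the scenarios.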
重要: 对下面的每个检查项,用TodoWrite创建待办项。
红阶段 - 编写失败测试:
- 创建高压场景(约束类技能需要3种以上叠加压力)
- 不加载技能运行场景 - 原文记录基准行为
- 识别合理化借口/失败的模式
绿阶段 - 编写最小化技能:
- 名称仅使用字母、数字、连字符(无括号/特殊字符)
- YAML前置元数据仅包含name和description(最大1024字符)
- 描述以"Use when..."开头,包含具体触发条件/症状
- 描述用第三人称编写
- 全文覆盖搜索关键词(错误、症状、工具)
- 有清晰的概述和核心原则
- 针对红阶段发现的基准失败点做了对应说明
- 代码内联或链接到单独文件
- 提供一个优秀的示例(不要多语言版本)
- 加载技能后运行场景 - 验证Agent现在遵守规则
重构阶段 - 填补漏洞:
- 识别测试中出现的新合理化借口
- 添加明确的应对条款(约束类技能需要)
- 基于所有测试迭代建立合理化借口对照表
- 创建红色预警列表
- 重新测试直到无懈可击
质量检查:
- 仅当决策不直观时才添加小型流程图
- 有快速参考表格
- 有常见错误章节
- 无叙事性故事内容
- 仅工具或厚重参考内容才拆分单独文件
部署:
- 将技能提交到git并推送到你的fork(如果已配置)
- 如果适用范围广,考虑提PR贡献回上游
Discovery Workflow
发现流程
How future Claude finds your skill:
- Encounters problem ("tests are flaky")
- Finds SKILL (description matches)
- Scans overview (is this relevant?)
- Reads patterns (quick reference table)
- Loads example (only when implementing)
Optimize for this flow - put searchable terms early and often.
后续的Claude如何找到你的技能:
- 遇到问题(比如「测试不稳定」)
- 找到技能(描述匹配)
- 浏览概述(判断是否相关)
- 查看模式(快速参考表格)
- 加载示例(仅需要实现时)
针对这个流程做优化 —— 尽可能早、尽可能多地放置可搜索的术语。
The Bottom Line
总结
Creating skills IS TDD for process documentation.
Same Iron Law: No skill without failing test first.
Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes).
Same benefits: Better quality, fewer surprises, bulletproof results.
If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.
创建技能本质就是流程文档的TDD。
同样的铁律:没有先跑通失败测试就不要写技能。
同样的循环:红(基准)→ 绿(写技能)→ 重构(补漏洞)。
同样的收益:质量更高、意外更少、结果可靠。
如果你写代码遵循TDD,那写技能也应该遵循TDD,这是应用在文档上的同一套规范。