skill-sharpen

Skill Sharpen

Born from real-world production usage across multiple projects. Every diagnostic category, every proposal flow, and every guardrail exists because it solved a real problem in a real skill.
Kaizen (改善) for AI agent skills. Observe how a skill performed, find what went wrong or could be better, and propose concrete changes to its SKILL.md.
  • Gathers evidence from three sources: conversation friction, file diffs, and your feedback
  • Diagnoses root causes and proposes improvements — you decide each one
  • Tracks recurrence in LESSONS.md with automatic importance escalation
  • Works with Claude Code and any SKILL.md-based agent framework

Process

1. Resolve Target Skill

Determine which skill to sharpen:
  • Explicit (`/skill-sharpen <name>`): Search for `<name>` across skill directories: local project skills, installed skills, and plugin skills. Match by directory name.
  • Auto-detect (`/skill-sharpen` with no args): Scan conversation history for the most recently loaded skill (look for SKILL.md content or `/skill-name` invocations). Ask the user to confirm: "Detected `<name>`. Is that the one?"
If the skill is not found, list the paths searched and ask the user for a correction or an explicit path.
Once resolved, read the target skill's `SKILL.md` and `LESSONS.md` (if it exists). Keep both in context; they inform what to propose and what to skip.
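The explicit lookup can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the search roots are supplied by the caller because the text does not name concrete directories, and both the `resolve_skill` helper and the requirement that a matching directory contain a `SKILL.md` are assumptions.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_skill(name: str, roots: Iterable[Path]) -> tuple[Optional[Path], list[Path]]:
    """Return (skill_dir, searched_paths); skill_dir is None when nothing matches."""
    searched: list[Path] = []
    for root in roots:
        candidate = Path(root) / name
        searched.append(candidate)
        # Match by directory name; assumed: a real skill directory holds a SKILL.md.
        if (candidate / "SKILL.md").is_file():
            return candidate, searched
    return None, searched
```

Returning the searched paths alongside the result makes it easy to list them for the user when the skill is not found.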

2. Determine Execution Mode

Ask the user or detect from arguments:
| Mode | Trigger | Behavior |
| --- | --- | --- |
| Interactive | Default (no flag) | Analyze sources → diagnose root cause → propose one by one → user decides each |
| Observe-only | `--observe` or "just log" | Analyze sources → diagnose → write all to LESSONS.md → done |
| Watch | `--watch <skill>` or "run X and observe" | Execute the target skill first, then analyze the results (interactive or with `--observe`) |
| Review | `--review` or "review lessons" | Skip source analysis → walk through existing LESSONS.md entries |
| Audit | `--audit` or "audit the skill" | Skip sources → full static diagnostic of the SKILL.md → propose fixes |
If mode is Review, jump directly to Step 6: Review Mode.
Watch mode also detects natural-language requests, including non-English ones: "ejecutá /create-plan y después observemos" (Spanish for "run /create-plan and then let's observe") triggers watch + interactive. The skill being watched becomes the target for analysis.
Accumulation workflow: Use `--observe` (or `--watch --observe`) repeatedly across sessions to accumulate lessons in LESSONS.md. Each run adds new findings or increments Hits on existing ones. When ready to process, run `--review` to walk through everything and decide what to apply.
Session 1: /skill-sharpen --watch create-plan --observe  → runs skill, logs findings
Session 2: /skill-sharpen --observe                      → logs more findings, Hits grow
Session 3: /skill-sharpen --review                       → process all accumulated lessons
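Flag-based mode detection could look something like the sketch below. The `detect_mode` helper, its return shape, and the precedence between flags (review and audit win over watch and observe) are assumptions made for illustration; natural-language triggers such as "just log" are not handled here.

```python
def detect_mode(args: list[str]) -> dict:
    """Map command-line style arguments to an execution mode (flags only)."""
    flags = {a for a in args if a.startswith("--")}
    # Assumed precedence: review and audit skip source analysis, so check them first.
    if "--review" in flags:
        return {"mode": "review", "observe": False}
    if "--audit" in flags:
        return {"mode": "audit", "observe": False}
    observe = "--observe" in flags
    if "--watch" in flags:
        # Watch can run interactively or combined with --observe.
        return {"mode": "watch", "observe": observe}
    if observe:
        return {"mode": "observe-only", "observe": True}
    return {"mode": "interactive", "observe": False}
```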

3. Gather Evidence

Collect information from three sources. Work with whatever is available — not all sources will have signal every time.
Source A — Conversation history
Scan the conversation for friction signals:
  • Errors or exceptions during skill execution
  • User corrections ("no, not that", "I meant...", "undo that")
  • Retries or repeated attempts at the same step
  • Manual interventions the user had to make
  • Confusion about what the skill was supposed to do
  • Steps the skill skipped or did in the wrong order
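As a rough illustration of one friction signal from the list above, the sketch below flags user messages that contain a correction phrase. The message shape (`role` and `content` keys) and the `find_user_corrections` helper are hypothetical; the three phrases are the ones quoted in the list, and a real scan would likely need a broader, fuzzier set.

```python
import re

# The three quoted corrections from the list above; extend as needed.
CORRECTION_PATTERNS = [r"\bno, not that\b", r"\bI meant\b", r"\bundo that\b"]


def find_user_corrections(messages: list[dict]) -> list[int]:
    """Return indexes of user messages that contain a correction phrase."""
    hits = []
    for i, msg in enumerate(messages):
        if msg.get("role") != "user":
            continue
        text = msg.get("content", "")
        if any(re.search(p, text, re.IGNORECASE) for p in CORRECTION_PATTERNS):
            hits.append(i)
    return hits
```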
Source B — File diffs
Check `git diff` or recently modified files for:
  • Files the skill created or modified — do they match what was expected?
  • Changes the user had to make after the skill ran (post-corrections)
  • Incomplete implementations (TODOs, placeholders, missing pieces)
  • Patterns that deviate from what the SKILL.md prescribed
Source C — User feedback
Ask the user directly:
"What worked well? What didn't? Anything specific you want the skill to do differently?"
This is especially valuable when conversation context is compressed or when the issues are subtle (preferences, style, approach). Keep it open-ended — one question, then follow up if needed.

4. Analyze and Generate Proposals

Cross-reference the evidence against the target skill's SKILL.md to identify:
| Category | What to look for |
| --- | --- |
| Missing instructions | Steps the skill should have taken but didn't because the SKILL.md didn't mention them |
| Ambiguous instructions | Places where the SKILL.md was vague and the skill made a wrong choice |
| Wrong defaults | Default behaviors that consistently need overriding |
| Missing guardrails | Errors that a "don't" rule would have prevented |
| Outdated content | References to APIs, tools, or patterns that have changed |
| Missing examples | Cases where an example would have prevented misinterpretation |
| Structural issues | Ordering problems, missing sections, or buried important info |
For each finding, diagnose the root cause in the SKILL.md. Don't just describe what went wrong — explain why it happened by tracing it back to a specific instruction, gap, or ambiguity. Use these diagnostic categories:
| Diagnostic | What it means |
| --- | --- |
| Coherence | Sections don't align: the process says one thing, the guardrails another |
| Coupling | Content that doesn't belong in this skill: leaks from another domain, out-of-scope instructions, or mixed responsibilities that caused the agent to act outside its purpose. If it's not cohesive with the skill's core goal, it shouldn't be there |
| Ambiguity | Instruction open to interpretation: "if needed", "as appropriate" without criteria |
| Contradiction | Two rules directly conflict |
| Specificity gap | No concrete rule exists for this case; the agent had to guess |
| Missing instruction | The SKILL.md simply doesn't cover this scenario |
| Redundancy | Same instruction repeated in different sections or worded differently; causes confusion about which one to follow and wastes context window |
| Error inducer | A specific instruction actively promotes the wrong behavior |
Each proposal must include a short root cause line. Format:
Finding: [what happened]
Root cause: [diagnostic] — [which line/section caused it and why]
Proposed change: [concrete fix]
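One way to keep every proposal in that three-line shape is a small record type, sketched below. The `Proposal` class and its field names are hypothetical; only the rendered labels and the dash separator come from the format above.

```python
from dataclasses import dataclass

SEP = " \u2014 "  # the dash separator used in the "Root cause" line above


@dataclass
class Proposal:
    finding: str      # what happened
    diagnostic: str   # one of the diagnostic categories listed above
    cause: str        # which line/section caused it and why
    change: str       # the concrete fix

    def render(self) -> str:
        return (
            f"Finding: {self.finding}\n"
            f"Root cause: {self.diagnostic}{SEP}{self.cause}\n"
            f"Proposed change: {self.change}"
        )
```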
`--audit` mode: When invoked with `--audit`, run a full static diagnostic of the SKILL.md without requiring execution evidence. Validate against these baseline rules (from Agent Skills spec + Anthropic best practices):
  • Frontmatter must have `name` and `description` (required)
  • Description max 1024 characters, third person, with specific trigger phrases
  • Body should be under 500 lines; use `references/` for overflow
  • Name: lowercase, hyphens only, 1-64 characters
  • Progressive disclosure: metadata (~100 tokens) → body (<5k tokens) → resources (as needed)
  • Check for: dead content, scope creep, trigger quality, token efficiency, completeness
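The mechanical part of these baseline rules can be sketched as a small checker. This is an illustration under assumptions: `audit_skill` is a hypothetical helper that receives already-parsed frontmatter (so no YAML library is needed), and the name pattern also accepts digits, which the rule above does not state explicitly. The judgment-based checks (third person, trigger quality, scope creep) are left out because they cannot be reduced to a regex.

```python
import re

# Assumed: lowercase letters, digits, and hyphens, 1-64 characters.
NAME_RE = re.compile(r"[a-z0-9-]{1,64}")


def audit_skill(frontmatter: dict, body: str) -> list[str]:
    """Return a list of baseline-rule violations (empty list means none found)."""
    problems = []
    name = frontmatter.get("name", "")
    description = frontmatter.get("description", "")
    if not name:
        problems.append("frontmatter: missing name")
    elif not NAME_RE.fullmatch(name):
        problems.append("name: must be 1-64 chars, lowercase and hyphens")
    if not description:
        problems.append("frontmatter: missing description")
    elif len(description) > 1024:
        problems.append("description: longer than 1024 characters")
    # "Under 500 lines" means exactly 500 is already too long.
    if len(body.splitlines()) >= 500:
        problems.append("body: 500 lines or more, move overflow to references/")
    return problems
```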
Enrich with context7 (optional): If the `context7` MCP server is available, query the latest Agent Skills specification and Anthropic skill authoring best practices to ensure rules reflect the most current standards. If not available, use the baseline rules above; they cover the stable core.
For each finding, formulate a proposal: a concrete, actionable change to the SKILL.md.
Assign importance based on impact:
| Importance | Criteria |
| --- | --- |
| high | Breaks output, causes errors, or requires user intervention every time |
| medium | Suboptimal results, friction exists but workaround is possible |
| low | Style, preferences, minor improvements |
Recurrence escalation: Before generating a new proposal, check LESSONS.md for an existing entry describing the same pattern. If found, increment its `Hits` count instead of creating a duplicate. When hits reach 3+, escalate importance: `low` → `medium`, `medium` → `high`. `high` stays `high`.
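The recurrence rule can be sketched as a pure function over an entry. The dictionary shape and the `record_hit` helper are hypothetical. The text says "3+" without stating whether every later hit escalates again, so this sketch assumes a single escalation at the moment the count first reaches 3.

```python
ESCALATE = {"low": "medium", "medium": "high", "high": "high"}


def record_hit(entry: dict) -> dict:
    """Increment hits on an existing entry and escalate once it reaches 3."""
    updated = dict(entry, hits=entry["hits"] + 1)
    # Assumption: escalate only when hits first reaches 3, so later hits
    # do not keep bumping the level.
    if updated["hits"] == 3:
        updated["importance"] = ESCALATE[updated["importance"]]
    return updated
```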

5. Present Proposals (Interactive Mode)

Present proposals one at a time, ordered by importance (high → medium → low).
For each proposal, show:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  PROPOSAL [N/total] — [importance]
  Source: [conversation | diff | user]

  Finding: [what was observed]
  Root cause: [diagnostic] — [which line/section and why]
  Hits: [N — omit if first occurrence]

  Proposed change:
  [concrete description of what to add/modify/remove in SKILL.md]

  Preview:
  [show the actual diff or new text that would be added]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  (a)ccept  (p)ostpone  (r)eject  (d)on't  (s)kip all
Handle the user's decision:
  • Accept: Show the exact edit to be made. Apply it only after the user confirms. Edit the target SKILL.md directly.
  • Postpone: Append to the target skill's LESSONS.md (create if it doesn't exist).
  • Reject: Discard and move to the next proposal.
  • Don't: The user is saying "this is wrong, the skill should NEVER do this". Confirm the negative rule with the user, then add it to the SKILL.md as a "Do NOT..." instruction.
  • Skip all: Write remaining proposals to LESSONS.md and end.
After all proposals are processed, show a summary:
Done. [N] accepted, [N] postponed, [N] rejected, [N] don'ts added.
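The ordering and the closing summary can be sketched as two small helpers. The names `order_proposals` and `summary`, the dictionary shape, and the decision labels (`accept`, `postpone`, `reject`, `dont`) are assumptions for illustration; only the summary wording comes from the text above.

```python
from collections import Counter

ORDER = {"high": 0, "medium": 1, "low": 2}


def order_proposals(proposals: list[dict]) -> list[dict]:
    """Sort high, then medium, then low; ties keep their original order."""
    return sorted(proposals, key=lambda p: ORDER[p["importance"]])


def summary(decisions: list[str]) -> str:
    """Build the closing line from a list of per-proposal decisions."""
    c = Counter(decisions)
    return (
        f"Done. {c['accept']} accepted, {c['postpone']} postponed, "
        f"{c['reject']} rejected, {c['dont']} don'ts added."
    )
```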

6. Review Mode

Walk through existing LESSONS.md entries one by one. For each entry, present it in the same format as Step 5 (but source and finding come from the LESSONS.md entry).
The user can:
  • Accept → apply to SKILL.md, remove from LESSONS.md
  • Reject → remove from LESSONS.md (optionally convert to "don't")
  • Keep → leave in LESSONS.md for later
After processing all entries, show the summary. If all entries were processed (none kept), delete the LESSONS.md file.

LESSONS.md Format

The file lives alongside the target skill's SKILL.md. Format:

Lessons — {skill-name}

1 — high | Hits: 1

  • Date: 2026-03-28
  • Source: conversation
  • Diagnostic: ambiguity — line 45 says "if needed" without criteria
  • Proposal: Replace "if needed" with explicit condition: "when scope is api or both"

2 — medium | Hits: 3

  • Date: 2026-03-27
  • Source: diff
  • Diagnostic: missing instruction
  • Proposal: Add validation step before Phase 3 for skill-scoped plans

**Fields:**
- **Heading**: entry number + importance + hits count
- **Date**: when first generated (YYYY-MM-DD), updated to latest occurrence on hit
- **Source**: `conversation`, `diff`, or `user`
- **Diagnostic**: root cause category + short explanation
- **Proposal**: concise description of finding + proposed change

**Rules:**
- Never create an empty LESSONS.md — only create it when there's at least one entry
- When the same pattern is detected again, increment `Hits` in the heading instead of
  adding a duplicate. Update `Date` to the latest occurrence
- When hits reach 3+, escalate importance: `low` → `medium`, `medium` → `high`
- When accepting or rejecting an entry, remove the entire block
- When all entries are removed, delete the file
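The removal rules can be sketched as a pure function over the file's text. This is a hedged illustration: `remove_entry` is a hypothetical helper, and the heading pattern is an assumption based on the example entries (the leading `#` markers are treated as optional because the exact heading level is not shown). A `None` result signals that the caller should delete the file rather than write an empty one.

```python
import re
from typing import Optional

# Assumption: a heading reads "<number> <dash> <importance> | Hits: <count>",
# where <dash> is the dash character (U+2014) used in the example entries.
ENTRY_HEADING = re.compile(r"^#*\s*(\d+) \u2014 (low|medium|high) \| Hits: \d+\s*$")


def remove_entry(text: str, number: int) -> Optional[str]:
    """Drop entry `number` and its whole block; return None when no entries remain."""
    kept, skipping, remaining = [], False, 0
    for line in text.splitlines():
        m = ENTRY_HEADING.match(line)
        if m:
            # A new heading ends the previous block and decides the next one.
            skipping = int(m.group(1)) == number
            if not skipping:
                remaining += 1
        if not skipping:
            kept.append(line)
    return "\n".join(kept).rstrip() + "\n" if remaining else None
```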

Guardrails

  • Never edit without confirmation. Always show the proposed diff and wait for explicit user approval before modifying any SKILL.md. This is non-negotiable — no exceptions, not even in observe-only mode (which writes to LESSONS.md, never to SKILL.md). Always ask the user what they want to do. The user owns the skill.
  • Never expose secrets. When analyzing conversation history, diffs, or files, redact any sensitive content before displaying it in proposals, previews, or LESSONS.md entries. This includes: API keys, tokens, passwords, connection strings, private URLs, and any value that looks like a credential (e.g., `sk-...`, `ghp_...`, `Bearer ...`). Replace with `[REDACTED]` in all output. Never write secrets to LESSONS.md.
  • Read before proposing. Always read the target SKILL.md and LESSONS.md before generating proposals. Avoids duplicates, contradictions, and already-addressed issues.
  • Work with partial context. If the conversation was long and context is compressed, work with what's available. State what you can see and what might be missing. Never invent evidence or assume what happened.
  • One proposal at a time. Don't dump all proposals at once. Present, decide, move on.
  • Respect the SKILL.md structure. When inserting new content, match the existing style, indentation, and organizational pattern of the target SKILL.md.
  • Don'ts need double confirmation. Adding a negative rule to a SKILL.md is impactful. Always confirm: "Add this as a 'don't' rule to the SKILL.md?"
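The redaction guardrail could be approximated with a few regular expressions, as in the sketch below. The patterns are illustrative guesses at the credential shapes named above (`sk-...`, `ghp_...`, `Bearer ...`); real key formats and lengths vary, so this should be treated as a starting point, and it does not cover passwords, connection strings, or private URLs.

```python
import re

# Illustrative patterns only; the minimum lengths are assumptions.
SECRET_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"),
    re.compile(r"\bghp_[A-Za-z0-9]{8,}"),
    re.compile(r"\bBearer\s+[A-Za-z0-9._~+/=-]{8,}"),
]


def redact(text: str) -> str:
    """Replace anything that looks like one of the known credential shapes."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Running `redact` on every string before it reaches a proposal, a preview, or LESSONS.md keeps the check in one place.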

Made by Crystian