repair-skill

Skill Repair

Audit and improve an existing skill against a gold standard. Unlike create-skill (which generates from scratch), this skill diagnoses violations and identifies gaps — what is broken, what is missing, and what would raise quality. The output is a structured improvement plan covering all dimensions.

Phase 1: Load the Skill

Read `$ARGUMENTS` as the path to a skill directory or SKILL.md file.
  • If a directory: read `SKILL.md`, then list and note which of `references/`, `examples/`, `scripts/`, and `assets/` exist and which are referenced from SKILL.md
  • If a file: read it directly, then discover sibling resource directories
If the path is missing or ambiguous, use AskUserQuestion to resolve before proceeding.
Load all three reference files before Phase 2:
  1. ${CLAUDE_PLUGIN_ROOT}/skills/repair-skill/references/skill-anatomy.md
    — gold standard for correct anatomy, three-level loading model, directory type definitions, degrees of freedom, naming conventions, body conventions. Required for Dimensions 5, 6, and 7.
  2. ${CLAUDE_PLUGIN_ROOT}/skills/repair-skill/references/frontmatter-options.md
    — complete frontmatter field catalog, valid values, tool list, tool selection framework. Required for Dimensions 1 and 2.
  3. ${CLAUDE_PLUGIN_ROOT}/skills/repair-skill/references/audit-calibration.md
    — known false-positive patterns that look like violations but are not. Prevents over-flagging on D2 (allowed-tools absent), D4 (Task/Skill prose), and D5 (orientation vs routing).
Proceed to Phase 2 when: SKILL.md is read, sibling directories are cataloged, and all three reference files are loaded.
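The catalog step of Phase 1 is mechanical enough to sketch. The Python below is a minimal, hypothetical helper (`catalog_skill` and its return shape are illustrative, not part of this skill): it resolves the path, reads SKILL.md, and records which resource directories exist and whether SKILL.md mentions them. The "referenced" check is a naive substring match on `name/`, an assumption that will miss pointers written some other way.

```python
from pathlib import Path

RESOURCE_DIRS = ("references", "examples", "scripts", "assets")


def catalog_skill(path: str) -> dict:
    """Resolve a skill directory or SKILL.md path and catalog its resource dirs."""
    p = Path(path)
    if p.is_dir():
        skill_md = p / "SKILL.md"
    elif p.is_file():
        skill_md = p  # a file path is read directly
    else:
        # Missing or ambiguous input: the skill resolves this with AskUserQuestion.
        raise ValueError(f"missing or ambiguous skill path: {path!r}")
    if not skill_md.is_file():
        raise ValueError(f"no SKILL.md found in {p}")

    body = skill_md.read_text(encoding="utf-8")
    root = skill_md.parent  # sibling resource directories live next to SKILL.md
    dirs = {
        name: {
            "exists": (root / name).is_dir(),
            # Naive heuristic: SKILL.md mentions "name/" somewhere in its text.
            "referenced": f"{name}/" in body,
        }
        for name in RESOURCE_DIRS
    }
    return {"skill_md": skill_md, "body": body, "dirs": dirs}
```

A directory that exists but is not referenced is exactly the case Dimension 7 later flags as invisible to Claude.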

Phase 2: Audit

Run each dimension independently. For each finding, record: the dimension code, what is wrong or missing, which principle it violates or which gold standard it falls short of, and the specific change required. Proceed to Phase 3 when all 7 dimensions are evaluated.
Finding types:
  • Violation — something present that contradicts a rule
  • Gap — something absent that would improve the skill against the gold standard
  • Improvement — something that works but could be meaningfully tightened
Severity:
  • critical — breaks triggering or wastes significant context on every invocation
  • major — degrades generalization, reliability, or workflow correctness
  • minor — polish; the skill works but isn't as good as it could be
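One way to keep findings uniform across all seven dimensions is to give each one a fixed shape that mirrors the fields listed above. A sketch in Python, assuming dataclasses and enums as the representation; the class names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    VIOLATION = "violation"      # present and contradicts a rule
    GAP = "gap"                  # absent; would raise the skill toward the gold standard
    IMPROVEMENT = "improvement"  # works, but could be meaningfully tightened


class Severity(Enum):
    CRITICAL = "critical"  # breaks triggering or wastes context on every invocation
    MAJOR = "major"        # degrades generalization, reliability, or workflow correctness
    MINOR = "minor"        # polish


@dataclass(frozen=True)
class Finding:
    dimension: str   # "D1" .. "D7"
    kind: Kind
    severity: Severity
    problem: str     # what is wrong or missing
    principle: str   # rule violated or gold standard not met
    fix: str         # the specific change required

    def __post_init__(self):
        # Reject codes outside D1..D7 so every finding traces to a dimension.
        if self.dimension not in {f"D{i}" for i in range(1, 8)}:
            raise ValueError(f"unknown dimension code: {self.dimension}")
```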

Dimension 1 — Frontmatter Quality

The description is the only part of a skill that is always in context. Every token here costs budget across every session. Audit for violations and gaps:
Violations:
  • Person and framing: Is the description third-person ("This skill should be used when...")? First-person or imperative framing reads as an instruction to execute, not a triggering condition to evaluate. Critical if wrong.
  • Scalar type: Does the description use `>` (folded scalar)? The `|` literal scalar preserves newlines and can produce unexpected whitespace when parsed. Minor.
  • Trigger phrase authenticity: Are quoted phrases verbatim user speech — the exact tokens a user would type? Paraphrases ("hook creation tasks") have lower routing match rates than natural language ("create a hook"). Major if paraphrased.
  • Token density: Does the description restate the skill name, explain what skills are, or include meta-commentary? Every such token is budget waste. Minor per instance, major if systemic.
Gaps:
  • Trigger phrase coverage: Are there 3–5+ varied phrases? Single-phrase descriptions miss synonym space. Does coverage include the naive phrasing a user would use who has never heard of this skill? Major if sparse.
  • Missing `argument-hint`: Does the skill read `$ARGUMENTS` or `$1`/`$2` without an `argument-hint` field? The hint is shown in autocomplete, so its absence means users don't know what to pass. Minor.
  • Name validity: Is the skill name lowercase, hyphens only, max 64 chars? Verb-led for commands? Namespaced when it aids routing clarity? These constraints ensure filesystem compatibility, command-line ergonomics, and unambiguous routing. Minor if wrong.
  • Trigger accuracy: Mentally generate 3 prompts that should trigger this skill and 3 that should NOT (from adjacent domains). Does the description cover the should-triggers and exclude the shouldn't-triggers? Sparse coverage or broad false-trigger surface is a routing quality gap. Major if coverage is sparse.
  • Token budget: Is the description over 100 tokens? Per-session cost scales with description length across all installed skills. Over 150 tokens is a violation (major); 100–150 is a gap (minor) — tighten by prioritizing trigger phrases over prose.
  • Negative triggers absent: For skills in crowded domains (multiple skills with overlapping concerns), does the description include explicit "Not for X" exclusions? Negative triggers sharpen the routing decision boundary. Minor.
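Two of these checks are purely mechanical: name validity and the token-budget thresholds. The Python sketch below encodes them. The regex is an assumption built from the stated rules (lowercase, hyphen-separated, at most 64 characters, no uppercase or underscores as listed in Dimension 7); it accepts digits, which the text neither allows nor forbids. Token counting is left to the caller because it depends on the tokenizer.

```python
import re
from typing import List, Optional

# Assumed pattern: lowercase words (letters/digits) joined by single hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def name_problems(name: str) -> List[str]:
    """List naming problems; an empty list means the name passes."""
    problems = []
    if len(name) > 64:
        problems.append("over 64 characters")
    if any(c.isupper() for c in name):
        problems.append("uppercase")
    if "_" in name:
        problems.append("underscores")
    # Normalize the two issues already reported, then check the remaining shape.
    if not NAME_PATTERN.fullmatch(name.lower().replace("_", "-")):
        problems.append("not hyphen-separated lowercase words")
    return problems


def description_budget_severity(token_count: int) -> Optional[str]:
    """Map a description's token count to the token-budget finding severity."""
    if token_count > 150:
        return "major"  # violation: over 150 tokens
    if token_count > 100:
        return "minor"  # gap: 100-150 tokens
    return None         # within budget
```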

Dimension 2 — Execution Modifiers

Modifiers left at their defaults are not errors — omitting them is correct when defaults apply. Audit for mismatches (violations) and missing configuration (gaps).
Refer to `frontmatter-options.md` for the complete field catalog, model selection table, and tool selection framework.
Violations:
  • Does the skill have unrestricted `Bash` when a scoped pattern (`Bash(git:*)`) would work?
  • Does the skill have tools in `allowed-tools` it never uses? Dead entries add noise.
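Both violations can be pre-screened mechanically before the judgment call. This hypothetical Python helper flags a bare `Bash` entry and any `allowed-tools` entry whose tool name never appears in the body. Both rules are heuristics (a mention in prose is treated as use), so treat the output as candidates to review, not as findings.

```python
import re


def audit_allowed_tools(allowed: list, body: str) -> list:
    """Return candidate violations for a skill's allowed-tools entries."""
    candidates = []
    for entry in allowed:
        tool = entry.split("(", 1)[0]  # "Bash(git:*)" -> "Bash"
        if entry == "Bash":
            candidates.append(
                "Bash: unrestricted; check whether a scoped pattern such as Bash(git:*) would work"
            )
        # Whole-word match so "Read" is not counted as used because of "Ready".
        if not re.search(rf"\b{re.escape(tool)}\b", body):
            candidates.append(f"{entry}: allowed but never mentioned in the body")
    return candidates
```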
Gaps:
  • Does the skill invoke other skills or spawn agents without `Skill` or `Task` in `allowed-tools`?
  • Does the skill require user decisions mid-workflow but lack `AskUserQuestion` in `allowed-tools`?
  • Does the skill read a file path from `$1` but use a `Read` tool call instead of `@$1` inline injection? That wastes a tool round-trip. Minor.
  • Could real-time data (git status, env vars, file tree) be injected using dynamic content syntax (bang + backtick-wrapped command) instead of a tool call? Major when the skill's workflow begins with infallible probes (git branch, file tree, env vars) that never need error handling; Minor for commands that may fail or need exit-code branching.
    Before (wastes tool round-trips):
    1. Run `git log --oneline -5` using Bash
    2. Run `git diff --name-only` using Bash
    3. Analyze the results...
    After (injected at invocation, zero tool calls), replace those Bash calls with:
    • Recent commits: !`git log --oneline -5`
    • Changed files: !`git diff --name-only`
    Then continue with prose like "Analyze the results..."
    Note: In this skill's own SKILL.md source, the backticks in these two lines are escaped with backslashes so the documentation is not executed at invocation. In a real skill, write !`cmd` without the backslashes.
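To see why injection saves round-trips, it helps to model what happens at invocation: every !`cmd` span is replaced with the command's output before the model reads the body. The Python below is a rough emulation under that assumption, not Claude Code's implementation, whose error handling and permission checks may differ. This sketch discards stderr and the exit code, which illustrates why commands that may fail or need exit-code branching are a weaker fit for injection.

```python
import re
import subprocess

# Matches the dynamic content syntax: a bang followed by a backtick-wrapped command.
BANG_COMMAND = re.compile(r"!`([^`]+)`")


def run_shell(cmd: str) -> str:
    """Run cmd in a shell and return stripped stdout; stderr and exit code are discarded."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout.strip()


def expand_dynamic_content(body: str, run=run_shell) -> str:
    """Replace each !`cmd` span with its output, as if done at invocation time."""
    return BANG_COMMAND.sub(lambda m: run(m.group(1)), body)
```

Passing a different `run` function (for example a dictionary lookup) makes the expansion testable without touching a real repository.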

Dimension 3 — Intensional vs Extensional Instruction

A rule stated with its reasoning generalizes to every input. An example that implies a rule requires the reading model to reverse-engineer the rule — two reasoning hops instead of one, covering only the shape of that example.
Violations:
  • Does it show a good/bad contrast and leave the principle implicit? The principle should be stated first; the contrast confirms it, not carries it.
  • Is a "Common Mistakes" or "Bad/Good examples" section doing the work that a single principle sentence could do more efficiently? Major.
  • Would removing the examples leave the rule intact and still actionable? If yes, the examples are redundant. If no, the rule hasn't been stated yet — state it.
Gaps:
  • Are there instruction blocks that tell Claude what to do but not why? Adding the reasoning makes the instruction generalize to edge cases not covered by the current examples. Major per uncovered block.

Dimension 4 — Agentic vs Deterministic Split

Load ${CLAUDE_PLUGIN_ROOT}/skills/create-skill/references/script-patterns.md before auditing this dimension. It contains the five signal patterns for recognizing a script candidate, CLI design conventions, common archetypes (init, validate, transform, package, query), and the delegation pattern for using `create-cli` to design the interface.
Skills mix LLM-guided reasoning (agentic) and script execution (deterministic). The split should be deliberate; see the Degrees of Freedom table in `skill-anatomy.md`.
Violations:
  • Code blocks that are repeated or identical across invocations: these are deterministic operations being regenerated each time, and they belong in `scripts/`. Inlining costs context tokens on every run; scripts execute without being loaded.
  • Prose that describes a deterministic sequence: if the steps are always the same regardless of input, a script is more reliable than asking the model to reproduce them.
  • Scripts that exist but aren't referenced in SKILL.md: Claude won't use them. A script with no reference in SKILL.md specifying when and how to invoke it is invisible to the skill workflow. Major.
  • Vague script references: "run the validation script if needed" is not actionable. References must state the trigger condition, the exact invocation, and how to interpret the output. Minor.
Gaps (apply the five signal patterns from `script-patterns.md` to each workflow step):
  • Signal 1 (Repeated Generation): Does any step produce the same structure with different parameters across invocations? → Parameterized script candidate. Major.
  • Signal 2 (Unclear Tool Choice): Does any step require combining multiple standard tools in a fragile sequence to accomplish something naturally expressible as a single function? → Script the procedure. Major.
  • Signal 3 (Rigid Contract): Does any step have an input/output shape clear enough to write `--help` text for right now? → CLI candidate; delegate design to `create-cli`.
  • Signal 4 (Dual-Use Potential): Would any step be useful to run independently from the terminal, outside the skill workflow? → Design it as a proper CLI from the start.
  • Signal 5 (Consistency Critical): Does any step need to produce identical output for identical inputs (reproducible, not merely similar)? → Script, not LLM generation.
  • Judgment steps with no criteria: "analyze the situation" is agentic but unanchored. Agentic steps need explicit criteria for what to consider and what constitutes a good outcome. Major per uncovered step.
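As an illustration of Signals 3 and 5, here is a hypothetical sketch of the `scripts/validate-input.py` named in the Phase 3 sample report. Its checks, status names, output format, and exit codes are all assumptions; in practice the interface would be designed through `create-cli`. The point is the shape: a fixed contract that `--help` can describe, identical output for identical input, and an exit code whose meaning SKILL.md can state exactly.

```python
import argparse
import json
import sys
from pathlib import Path


def validate(paths: list) -> list:
    """Check each path in order; identical input always yields identical output."""
    results = []
    for raw in paths:
        p = Path(raw)
        if not p.exists():
            status = "missing"
        elif not p.is_file():
            status = "not-a-file"
        else:
            status = "ok"
        results.append({"path": raw, "status": status})
    return results


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(
        prog="validate-input.py",
        description="Validate that each input path exists and is a regular file.",
    )
    parser.add_argument("paths", nargs="+", help="file paths to check")
    parser.add_argument("--json", action="store_true", help="emit one JSON object per line")
    args = parser.parse_args(argv)

    results = validate(args.paths)
    for r in results:
        print(json.dumps(r, sort_keys=True) if args.json else f"{r['status']}\t{r['path']}")
    # Exit code carries the verdict: 0 if every path is ok, 1 otherwise.
    return 0 if all(r["status"] == "ok" for r in results) else 1


if __name__ == "__main__":
    sys.exit(main())
```

A matching SKILL.md reference would then name the trigger (for example "after Phase 2"), the exact invocation, and the rule "exit code 1 means at least one path failed", which is what the vague-reference violation above asks for.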

Dimension 5 — Verbosity and Context Efficiency

Every token in SKILL.md is loaded into context when the skill triggers. Audit for tokens that consume budget without improving outcomes, and for content that belongs in `references/` instead.
Refer to the size invariants table in `skill-anatomy.md` to calibrate severity.
Violations:
  • Prose that restates the section header: "## Validation" followed by "In this section we will validate..." is pure redundancy. Minor per instance.
  • Hedging language: "you might want to consider", "it could be useful to", "generally speaking". Replace with direct imperatives or remove. Minor per instance.
  • Code blocks illustrating a principle stateable in one sentence: a good/bad YAML contrast often collapses to one intensional rule. Major if the pattern is frequent.
  • Code blocks collapsed to prose that lose variable bindings: a code block that assigns workflow variables (`BASE=...`, `BRANCH=...`) used by later steps serves two purposes, illustrating the operation AND establishing state. Collapsing it to prose without preserving the bindings leaves downstream `$VAR` references unbound. When collapsing, add a "derive working variables" preamble that explicitly binds each variable in prose. Major per lost binding.
  • Repeated guidance across sections: the same rule in a "Best Practices" section and a "Common Mistakes" section. Consolidate to one location. Minor.
  • "When to Use This Skill" section in the body: the body loads only after triggering, so routing guidance here is never read by the routing decision. Dead tokens every invocation. Major.
  • Headers deeper than H3: these signal content that belongs in `references/`. Minor.
  • SKILL.md over ~500 lines: requires `references/` deferral. Major.
  • Extraneous documentation files (`README.md`, `CHANGELOG.md`, `INSTALLATION.md`) in the skill directory: never loaded into context, they add noise to the package. Minor per file.
Gaps:
  • Would a `references/` file reduce SKILL.md size? Identify sections only needed for specific sub-tasks and flag them as deferral candidates. Major if SKILL.md > 300 lines.
  • Would a `references/` file for domain-specific data help? Lookup tables, option catalogs, and field definitions are reference data, not instructions. Major.
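Several of these violations are detectable without judgment. A minimal Python sketch under simplifying assumptions: it scans SKILL.md text line by line, uses the three hedging phrases quoted above as a sample list, and does not skip fenced code blocks, so headers inside code samples would be false positives. Redundant prose, repeated guidance, and deferral candidates still need a reader.

```python
import re

# Sample phrases from the audit text; a real list would be longer.
HEDGES = ("you might want to consider", "it could be useful to", "generally speaking")
WHEN_TO_USE = re.compile(r"#{1,6}\s+when to use this skill\b", re.IGNORECASE)


def lint_verbosity(body: str) -> list:
    """Return (severity, message) pairs for mechanically detectable issues."""
    lines = body.splitlines()
    findings = []
    if len(lines) > 500:
        findings.append(("major", f"{len(lines)} lines (over ~500): defer content to references/"))
    elif len(lines) > 300:
        findings.append(("major", f"{len(lines)} lines (over 300): flag deferral candidates"))
    for n, line in enumerate(lines, start=1):
        if re.match(r"#{4,}\s", line):
            findings.append(("minor", f"line {n}: header deeper than H3"))
        if WHEN_TO_USE.match(line):
            findings.append(("major", f"line {n}: 'When to Use This Skill' section in body"))
        lowered = line.lower()
        for hedge in HEDGES:
            if hedge in lowered:
                findings.append(("minor", f"line {n}: hedging language '{hedge}'"))
    return findings
```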

Dimension 6 — Workflow Clarity

A skill's process should be sequential, complete, and have explicit exit conditions at each phase. Audit for broken workflow and for missing structure that would help.
Violations:
  • Is the process structured as numbered phases with clear names? Without explicit phases the model can't track progress or know which step it's in. Major if unstructured.
  • Does each phase have an explicit exit condition? Without one, the model doesn't know when to stop iterating on a phase and may loop or skip prematurely. Major if missing.
  • Are there half-thought steps — phases that describe intent without specifying what to do or how to evaluate the result? Major per uncovered phase.
  • Does the skill handle missing, ambiguous, or malformed input?
Gaps:
  • Variable continuity: Does every `$VAR` referenced in a step have an explicit binding in an earlier step or a pre-flight/preamble section? Scan all `$VARNAME` tokens in the skill body and trace each back to its origin. An unbound variable is a workflow break: the agent either halts on an invalid command or silently substitutes an empty string. Major per unbound variable.
  • Is there a delivery phase that tells Claude what to produce and in what format? Many skills describe the process clearly but leave the output format implicit. Major if absent.
  • Would a validation checklist at the end of the workflow catch errors that prose instructions miss? Minor.
  • Would an `examples/` directory help users understand what the expected output looks like? Minor.
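The variable-continuity scan lends itself to a small script. This Python sketch assumes shell-style uppercase variable names, treats `NAME=` on a line as a binding, and treats `$ARGUMENTS` and `${CLAUDE_PLUGIN_ROOT}` as supplied by the runtime; all three are assumptions. A prose binding that never writes `NAME=` would be reported as unbound, so the sketch errs toward flagging.

```python
import re

# Variables assumed to be supplied by the runtime rather than bound in the skill.
BUILTINS = {"ARGUMENTS", "CLAUDE_PLUGIN_ROOT"}

ASSIGN = re.compile(r"\b([A-Z_][A-Z0-9_]*)=")    # e.g. BASE=main
REF = re.compile(r"\$\{?([A-Z_][A-Z0-9_]*)\}?")  # e.g. $BASE or ${BASE}; skips $1


def unbound_variables(body: str) -> list:
    """Return (line_number, name) for each reference with no earlier binding."""
    bound = set(BUILTINS)
    problems = []
    for n, line in enumerate(body.splitlines(), start=1):
        # Check references before counting this line's own assignments.
        for name in REF.findall(line):
            if name not in bound:
                problems.append((n, name))
        bound.update(ASSIGN.findall(line))
    return problems
```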

Dimension 7 — Anatomy Completeness

Refer to `skill-anatomy.md` for the gold standard directory anatomy and the Gap Analysis Checklist. This dimension asks: does the skill's structure match its complexity tier, and what is absent that would raise it?
Apply the Gap Analysis Checklist from `skill-anatomy.md` directly. For each "yes" answer, record a gap at the appropriate severity.
Violations:
  • Does the skill have a `scripts/` directory with scripts not referenced in SKILL.md? Major: reference them or delete them.
  • Does the skill have a `references/` directory with files not pointed to from SKILL.md? Major: reference them or delete them.
  • Does the naming violate conventions (uppercase, underscores, over 64 chars)? Minor.
Gaps (ask for each absent directory):
  • Missing `scripts/`: Is there a deterministic operation that would be more reliable scripted? Does the same code block appear, or would it appear, in multiple invocations?
  • Missing `references/`: Does SKILL.md exceed 300 lines? Are there sections only needed for specific sub-tasks? Is there domain-specific reference data?
  • Missing `examples/`: Does the skill produce output users adapt? Are there ambiguous instructions a working example would clarify better than prose?
  • Missing resource pointers in SKILL.md: Are there directories present but not referenced, and therefore invisible to Claude unless it happens to look?
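The two violations above reduce to a file-level cross-check. A hypothetical Python sketch: it lists every file under `scripts/` and `references/` whose path relative to the skill directory never appears verbatim in SKILL.md. Verbatim matching is an assumption; a pointer phrased some other way would be reported as missing.

```python
from pathlib import Path


def unreferenced_resources(skill_dir: str, dirs=("scripts", "references")) -> list:
    """Return resource files under dirs that SKILL.md never mentions by relative path."""
    root = Path(skill_dir)
    body = (root / "SKILL.md").read_text(encoding="utf-8")
    unreferenced = []
    for name in dirs:
        base = root / name
        if not base.is_dir():
            continue  # an absent directory is a gap question, not a violation
        for path in sorted(base.rglob("*")):
            if path.is_file():
                rel = path.relative_to(root).as_posix()  # e.g. "scripts/validate-input.py"
                if rel not in body:
                    unreferenced.append(rel)
    return unreferenced
```

Because a full `${CLAUDE_PLUGIN_ROOT}/skills/<name>/references/foo.md` pointer contains `references/foo.md`, plugin-rooted pointers still count as references.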

Phase 3: Improvement Report

Present findings as a structured report. Split violations from gaps — a violation is something wrong, a gap is something missing that would improve the skill.
SKILL IMPROVEMENT REPORT: <skill-name>
Current tier: [simple / standard / complex] — [lines] lines, [directories present]

VIOLATIONS
──────────
CRITICAL
  [D1] Description uses first-person — routing model reads as instruction, not trigger.
       Fix: rewrite as "This skill should be used when the user asks to..."

MAJOR
  [D3] Body teaches frontmatter quality by bad/good contrast; principle never stated.
       Fix: state the rule ("quoted phrases must be verbatim user speech because routing
       matches on literal tokens") then keep the contrast as confirmation.
  [D5] "When to Use This Skill" section in body — dead tokens every invocation.
       Fix: move routing guidance to frontmatter description, delete body section.

MINOR
  [D1] Description uses | scalar instead of >.
       Fix: change to >.

GAPS (what would improve this skill)
─────────────────────────────────────
MAJOR
  [D7] SKILL.md is 420 lines with no references/ directory. Three sections (option catalog,
       field definitions, examples table) are only needed for specific sub-tasks.
       Improvement: extract to references/; add load pointer in SKILL.md for each.
  [D4] File-path validation logic is inlined but must produce consistent output.
       Improvement: move to scripts/validate-input.py; reference from Phase 2.

MINOR
  [D2] Skill reads $1 as a file path but uses Read tool — @$1 injection would save a
       tool round-trip.
       Improvement: replace Read call with @$1 inline injection.
  [D7] No examples/ directory; skill produces config output users adapt.
       Improvement: add examples/ with one representative output file.
Group violations by severity, then gaps by severity. For each: dimension code, what is wrong or missing, the principle or gold standard it falls short of, the exact fix.
Ask: "Apply all critical and major items? Or select specific ones?"
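The grouping rule for the report (violations before gaps, each grouped by severity, with the fix on its own line) can be sketched as a formatter. This Python version is hypothetical: the `Finding` tuple is a simplified stand-in, the "Current tier" line is omitted, and findings of kind "improvement" are not rendered because the template above has only the two sections.

```python
from collections import namedtuple

Finding = namedtuple("Finding", "kind severity dimension problem fix")

SEVERITIES = ("critical", "major", "minor")
SECTIONS = (
    ("violation", "VIOLATIONS", "Fix"),
    ("gap", "GAPS (what would improve this skill)", "Improvement"),
)


def render_report(skill_name: str, findings: list) -> str:
    """Group findings as violations then gaps, each ordered by severity."""
    out = [f"SKILL IMPROVEMENT REPORT: {skill_name}", ""]
    for kind, title, label in SECTIONS:
        out += [title, "─" * len(title)]
        for severity in SEVERITIES:
            items = [f for f in findings if f.kind == kind and f.severity == severity]
            if not items:
                continue  # omit empty severity groups
            out.append(severity.upper())
            for f in items:
                out.append(f"  [{f.dimension}] {f.problem}")
                out.append(f"       {label}: {f.fix}")
            out.append("")
    return "\n".join(out)
```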

Phase 4: Apply Improvements

Apply confirmed items in order: critical violations → major violations → major gaps → minor violations → minor gaps.
For each item:
  • State what is being changed or added and why (principle reference, not just "you asked")
  • Make the edit or create the file
  • Confirm the change is consistent with surrounding content
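The application order is a fixed ranking, so it can be expressed as a stable sort. A Python sketch, assuming each confirmed item is a hypothetical (kind, severity, label) tuple; pairs outside the stated order, such as improvements, are placed last as an assumption.

```python
# Order from Phase 4; any (kind, severity) pair not listed sorts after these.
APPLY_ORDER = [
    ("violation", "critical"),
    ("violation", "major"),
    ("gap", "major"),
    ("violation", "minor"),
    ("gap", "minor"),
]


def apply_order(items: list) -> list:
    """Sort (kind, severity, label) items into Phase 4 application order."""
    rank = {key: i for i, key in enumerate(APPLY_ORDER)}
    # sorted() is stable, so items with equal rank keep their report order.
    return sorted(items, key=lambda it: rank.get((it[0], it[1]), len(APPLY_ORDER)))
```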

Explain Your Choices

After applying improvements, briefly explain:
  • What was changed and why, with the principle referenced: "Rewrote description as third-person because first-person framing is parsed as an instruction to execute, not a triggering condition to evaluate"
  • What was added and why: "Created references/options.md and deferred the option catalog because SKILL.md was 420 lines and the catalog is only needed for the configuration sub-task"
  • What was left unchanged and why: "`hooks` left unset; no lifecycle validation needed"
  • What remains for the user to address: "The examples/ gap requires domain knowledge to fill; a placeholder directory was created"
Phase 4 is complete when all confirmed items are applied, the explanation is delivered, and the validation checklist passes.

Validation

After applying all improvements, load ${CLAUDE_PLUGIN_ROOT}/skills/repair-skill/references/quality-checklist.md and run the quality standards check followed by the item-by-item validation checklist. Report any failing items before delivering final results.