Skill Judge
Evaluate Agent Skills against official specifications and patterns derived from 17+ official examples.
Core Philosophy
What is a Skill?
A Skill is NOT a tutorial. A Skill is a knowledge externalization mechanism.
Traditional AI knowledge is locked in model parameters. To teach new capabilities:
Traditional: Collect data → GPU cluster → Train → Deploy new version
Cost: $10,000 - $1,000,000+
Timeline: Weeks to months
Skills change this:
Skill: Edit SKILL.md → Save → Takes effect on next invocation
Cost: $0
Timeline: Instant
This is the paradigm shift from "training AI" to "educating AI": like a hot-swappable LoRA adapter that requires no training. You edit a Markdown file in natural language, and the model's behavior changes.
The Core Formula
Good Skill = Expert-only Knowledge − What Claude Already Knows
A Skill's value is measured by its knowledge delta — the gap between what it provides and what the model already knows.
- Expert-only knowledge: Decision trees, trade-offs, edge cases, anti-patterns, domain-specific thinking frameworks — things that take years of experience to accumulate
- What Claude already knows: Basic concepts, standard library usage, common programming patterns, general best practices
When a Skill explains "what is PDF" or "how to write a for-loop", it restates knowledge Claude already has. This wastes tokens: the context window is a shared resource, split among system prompts, conversation history, other Skills, and user requests.
Tool vs Skill
| Concept | Essence | Function | Example |
|---|---|---|---|
| Tool | What model CAN do | Execute actions | bash, read_file, write_file, WebSearch |
| Skill | What model KNOWS how to do | Guide decisions | PDF processing, MCP building, frontend design |
Tools define capability boundaries — without bash tool, model can't execute commands.
Skills inject knowledge — without frontend-design Skill, model produces generic UI.
The equation:
General Agent + Excellent Skill = Domain Expert Agent
The same Claude model, loaded with different Skills, becomes a different expert.
Three Types of Knowledge in Skills
When evaluating, categorize each section:
| Type | Definition | Treatment |
|---|---|---|
| Expert | Claude genuinely doesn't know this | Must keep — this is the Skill's value |
| Activation | Claude knows but may not think of | Keep if brief — serves as reminder |
| Redundant | Claude definitely knows this | Should delete — wastes tokens |
The art of Skill design is maximizing Expert content, using Activation sparingly, and eliminating Redundant ruthlessly.
Evaluation Dimensions (120 points total)
D1: Knowledge Delta (20 points) — THE CORE DIMENSION
The most important dimension. Does the Skill add genuine expert knowledge?
| Score | Criteria |
|---|---|
| 0-5 | Explains basics Claude knows (what is X, how to write code, standard library tutorials) |
| 6-10 | Mixed: some expert knowledge diluted by obvious content |
| 11-15 | Mostly expert knowledge with minimal redundancy |
| 16-20 | Pure knowledge delta — every paragraph earns its tokens |
Red flags (instant score ≤5):
- "What is [basic concept]" sections
- Step-by-step tutorials for standard operations
- Explaining how to use common libraries
- Generic best practices ("write clean code", "handle errors")
- Definitions of industry-standard terms
Green flags (indicators of high knowledge delta):
- Decision trees for non-obvious choices ("when X fails, try Y because Z")
- Trade-offs only an expert would know ("A is faster but B handles edge case C")
- Edge cases from real-world experience
- "NEVER do X because [non-obvious reason]"
- Domain-specific thinking frameworks
Evaluation questions:
- For each section, ask: "Does Claude already know this?"
- If explaining something, ask: "Is this explaining TO Claude or FOR Claude?"
- Count paragraphs that are Expert vs Activation vs Redundant
D2: Mindset + Appropriate Procedures (15 points)
Does the Skill transfer expert thinking patterns along with necessary domain-specific procedures?
The difference between experts and novices isn't "knowing how to operate" — it's "how to think about the problem." But thinking patterns alone aren't enough when Claude lacks domain-specific procedural knowledge.
Key distinction:
| Type | Example | Value |
|---|---|---|
| Thinking patterns | "Before designing, ask: What makes this memorable?" | High — shapes decision-making |
| Domain-specific procedures | "OOXML workflow: unpack → edit XML → validate → pack" | High — Claude may not know this |
| Generic procedures | "Step 1: Open file, Step 2: Edit, Step 3: Save" | Low — Claude already knows |
| Score | Criteria |
|---|---|
| 0-3 | Only generic procedures Claude already knows |
| 4-7 | Has domain procedures but lacks thinking frameworks |
| 8-11 | Good balance: thinking patterns + domain-specific workflows |
| 12-15 | Expert-level: shapes thinking AND provides procedures Claude wouldn't know |
What counts as valuable procedures:
- Workflows Claude hasn't been trained on (new tools, proprietary systems)
- Correct ordering that's non-obvious (e.g., "validate BEFORE packing, not after")
- Critical steps that are easy to miss (e.g., "MUST recalculate formulas after editing")
- Domain-specific sequences (e.g., MCP server's 4-phase development process)
What counts as redundant procedures:
- Generic file operations (open, read, write, save)
- Standard programming patterns (loops, conditionals, error handling)
- Common library usage that's well-documented
Expert thinking patterns look like:
```markdown
Before [action], ask yourself:
- **Purpose**: What problem does this solve? Who uses it?
- **Constraints**: What are the hidden requirements?
- **Differentiation**: What makes this solution memorable?
```
Valuable domain procedures look like:
```markdown
Redlining Workflow (Claude wouldn't know this sequence)
1. Convert to markdown: `pandoc --track-changes=all`
2. Map text to XML: grep for text in document.xml
3. Implement changes in batches of 3-10
4. Pack and verify: check ALL changes were applied
```
**Redundant generic procedures look like**:
```markdown
Step 1: Open the file
Step 2: Find the section
Step 3: Make the change
Step 4: Save and test
```
The test:
- Does it tell Claude WHAT to think about? (thinking patterns)
- Does it tell Claude HOW to do things it wouldn't know? (domain procedures)
A good Skill provides both when needed.
D3: Anti-Pattern Quality (15 points)
Does the Skill have effective NEVER lists?
Why this matters: Half of expert knowledge is knowing what NOT to do. A senior designer sees purple gradient on white background and instinctively cringes — "too AI-generated." This intuition for "what absolutely not to do" comes from stepping on countless landmines.
Claude hasn't stepped on these landmines. It doesn't know Inter font is overused, doesn't know purple gradients are the signature of AI-generated content. Good Skills must explicitly state these "absolute don'ts."
| Score | Criteria |
|---|---|
| 0-3 | No anti-patterns mentioned |
| 4-7 | Generic warnings ("avoid errors", "be careful", "consider edge cases") |
| 8-11 | Specific NEVER list with some reasoning |
| 12-15 | Expert-grade anti-patterns with WHY — things only experience teaches |
Expert anti-patterns (specific + reason):
```markdown
NEVER use generic AI-generated aesthetics like:
- Overused font families (Inter, Roboto, Arial)
- Cliched color schemes (particularly purple gradients on white backgrounds)
- Predictable layouts and component patterns
- Default border-radius on everything
```
Weak anti-patterns (vague, no reasoning):
```markdown
Avoid making mistakes.
Be careful with edge cases.
Don't write bad code.
```
The test: Would an expert read the anti-pattern list and say "yes, I learned this the hard way"? Or would they say "this is obvious to everyone"?
D4: Specification Compliance — Especially Description (15 points)
Does the Skill follow official format requirements? Special focus on description quality.
| Score | Criteria |
|---|---|
| 0-5 | Missing frontmatter or invalid format |
| 6-10 | Has frontmatter but description is vague or incomplete |
| 11-13 | Valid frontmatter, description has WHAT but weak on WHEN |
| 14-15 | Perfect: comprehensive description with WHAT, WHEN, and trigger keywords |
Frontmatter requirements:
- `name`: lowercase, alphanumeric + hyphens only, ≤64 characters
- `description`: THE MOST CRITICAL FIELD; determines whether the skill gets used at all
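The `name` rule above is mechanical enough to check in code. The following is a minimal sketch, assuming "alphanumeric" means ASCII letters and digits; the function name `is_valid_skill_name` is hypothetical and not part of any official tooling.

```python
import re

# Assumption: ASCII-only reading of "lowercase, alphanumeric + hyphens only".
NAME_PATTERN = re.compile(r"[a-z0-9-]+")
MAX_NAME_LENGTH = 64  # "≤64 characters" from the frontmatter requirements


def is_valid_skill_name(name: str) -> bool:
    """Return True if the name is non-empty, at most 64 characters,
    and contains only lowercase letters, digits, and hyphens."""
    return len(name) <= MAX_NAME_LENGTH and NAME_PATTERN.fullmatch(name) is not None
```

For instance, `is_valid_skill_name("skill-judge")` passes, while a name containing uppercase letters or underscores does not.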
Why description is THE MOST IMPORTANT field:
┌─────────────────────────────────────────────────────────────────────┐
│ SKILL ACTIVATION FLOW │
│ │
│ User Request → Agent sees ALL skill descriptions → Decides which │
│ (only descriptions, not bodies!) to activate │
│ │
│ If description doesn't match → Skill NEVER gets loaded │
│ If description is vague → Skill might not trigger when it should │
│ If description lacks keywords → Skill is invisible to the Agent │
└─────────────────────────────────────────────────────────────────────┘
The brutal truth: A Skill with perfect content but a poor description is useless, because it will never be activated. The description is the only chance to tell the Agent "use me in these situations."
Description must answer THREE questions:
- WHAT: What does this Skill do? (functionality)
- WHEN: In what situations should it be used? (trigger scenarios)
- KEYWORDS: What terms should trigger this Skill? (searchable terms)
Excellent description (all three elements):
```yaml
description: "Comprehensive document creation, editing, and analysis with support
  for tracked changes, comments, formatting preservation, and text extraction.
  When Claude needs to work with professional documents (.docx files) for:
  (1) Creating new documents, (2) Modifying or editing content,
  (3) Working with tracked changes, (4) Adding comments, or any other document tasks"
```
Analysis:
- WHAT: creation, editing, analysis, tracked changes, comments
- WHEN: "When Claude needs to work with... for: (1)... (2)... (3)..."
- KEYWORDS: .docx files, tracked changes, professional documents
Poor description (missing elements):
```yaml
description: "Handles document-related functions"
```
Problems:
- WHAT: vague ("document-related functions": what specifically?)
- WHEN: missing (when should Agent use this?)
- KEYWORDS: missing (no ".docx", no specific scenarios)
Another poor example:
```yaml
description: "A helpful skill for various tasks"
```
This is useless: the Agent has no idea when to activate it.
Description quality checklist:
- Lists specific capabilities (not just "helps with X")
- Includes explicit trigger scenarios ("Use when...", "When user asks for...")
- Contains searchable keywords (file extensions, domain terms, action verbs)
- Specific enough that Agent knows EXACTLY when to use it
- Includes scenarios where this skill MUST be used (not just "can be used")
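A rough first pass over the WHAT / WHEN / KEYWORDS questions can be automated before a human reads the description. The sketch below is a heuristic only: the trigger phrases, the word-count proxy for WHAT, and the file-extension test for KEYWORDS are all assumptions made for illustration, and the function name `check_description` is hypothetical.

```python
import re

# Hypothetical trigger phrases; extend these for your own Skills.
TRIGGER_PHRASES = ("use when", "when claude needs", "when user asks", "when the user")

# Matches tokens that look like file extensions, e.g. ".docx" or ".pdf".
EXTENSION_PATTERN = re.compile(r"\.[a-z][a-z0-9]{1,4}\b")


def check_description(description: str) -> dict[str, bool]:
    """Return crude pass/fail flags for the three description questions."""
    text = description.lower()
    return {
        # WHAT: very short descriptions rarely list specific capabilities.
        "has_what": len(text.split()) >= 8,
        # WHEN: look for an explicit trigger phrase.
        "has_when": any(phrase in text for phrase in TRIGGER_PHRASES),
        # KEYWORDS: a file extension is one easy-to-detect searchable term.
        "has_keywords": EXTENSION_PATTERN.search(text) is not None,
    }
```

A description that passes all three flags can still be poor; the flags are meant to catch obvious failures such as "A helpful skill for various tasks", not to replace the checklist.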
D5: Progressive Disclosure (15 points)
Does the Skill implement proper content layering?
Skill loading has three layers:
Layer 1: Metadata (always in memory)
Only name + description
~100 tokens per skill
Layer 2: SKILL.md Body (loaded after triggering)
Detailed guidelines, code examples, decision trees
Ideal: < 500 lines
Layer 3: Resources (loaded on demand)
scripts/, references/, assets/
No limit
| Score | Criteria |
|---|---|
| 0-5 | Everything dumped in SKILL.md (>500 lines, no structure) |
| 6-10 | Has references but unclear when to load them |
| 11-13 | Good layering with MANDATORY triggers present |
| 14-15 | Perfect: decision trees + explicit triggers + "Do NOT Load" guidance |
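The layering checks can be partly mechanized. The sketch below assumes a Skill directory containing `SKILL.md` and an optional `references/` folder of `.md` files; the function name `summarize_skill_layout` and the returned keys are hypothetical. It flags reference files that SKILL.md never names, which cannot have a loading trigger.

```python
from pathlib import Path

BODY_LINE_BUDGET = 500  # "Ideal: < 500 lines" for the SKILL.md body


def summarize_skill_layout(skill_dir: Path) -> dict:
    """Collect basic layering facts about a Skill directory."""
    body = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    line_count = len(body.splitlines())

    refs_dir = skill_dir / "references"
    references = sorted(p.name for p in refs_dir.glob("*.md")) if refs_dir.is_dir() else []

    # A reference that SKILL.md never mentions by file name has no trigger at all.
    unmentioned = [name for name in references if name not in body]

    return {
        "line_count": line_count,
        "within_budget": line_count < BODY_LINE_BUDGET,
        "references": references,
        "unmentioned_references": unmentioned,
    }
```

A file that is mentioned may still lack a usable trigger, so judging trigger quality remains a manual step.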
For Skills WITH references directory, check Loading Trigger Quality:
| Trigger Quality | Characteristics |
|---|---|
| Poor | References listed at end, no loading guidance |
| Mediocre | Some triggers but not embedded in workflow |
| Good | MANDATORY triggers in workflow steps |
| Excellent | Scenario detection + conditional triggers + "Do NOT Load" |
The loading problem:
Loading too little ◄─────────────────────────────────► Loading too much
- References sit unused - Wastes context space
- Agent doesn't know when to load - Irrelevant info dilutes key content
- Knowledge is there but never accessed - Unnecessary token overhead
Good loading trigger (embedded in workflow):
```markdown
Creating New Document
MANDATORY - READ ENTIRE FILE: Before proceeding, you MUST read
`docx-js.md` (~500 lines) completely from start to finish.
NEVER set any range limits when reading this file.
Do NOT load `ooxml.md` or `redlining.md` for this task.
```
**Bad loading trigger** (just listed):
```markdown
References
- docx-js.md - for creating documents
- ooxml.md - for editing
- redlining.md - for tracking changes
```
**For simple Skills** (no references, <100 lines): Score based on conciseness and self-containment.
---
D6: Freedom Calibration (15 points)
Is the level of specificity appropriate for the task's fragility?
Different tasks need different levels of constraint. This is about matching freedom to fragility.
| Score | Criteria |
|---|---|
| 0-5 | Severely mismatched (rigid scripts for creative tasks, vague for fragile ops) |
| 6-10 | Partially appropriate, some mismatches |
| 11-13 | Good calibration for most scenarios |
| 14-15 | Perfect freedom calibration throughout |
The freedom spectrum:
| Task Type | Should Have | Why | Example Skill |
|---|---|---|---|
| Creative/Design | High freedom | Multiple valid approaches, differentiation is value | frontend-design |
| Code review | Medium freedom | Principles exist but judgment required | code-review |
| File format operations | Low freedom | One wrong byte corrupts file, consistency critical | docx, xlsx, pdf |
High freedom (text-based instructions):
```markdown
Commit to a BOLD aesthetic direction. Pick an extreme: brutally minimal,
maximalist chaos, retro-futuristic, organic natural...
```
Medium freedom (pseudocode or parameterized):
```markdown
Review priority:
1. Security vulnerabilities (must fix)
2. Logic errors (must fix)
3. Performance issues (should fix)
4. Maintainability (optional)
```
Low freedom (specific scripts, exact steps):
```markdown
**MANDATORY**: Use exact script in `scripts/create-doc.py`
Parameters: --title "X" --author "Y"
Do NOT modify the script.
```
The test: Ask "if Agent makes a mistake, what's the consequence?"
- High consequence → Low freedom
- Low consequence → High freedom
D7: Pattern Recognition (10 points)
Does the Skill follow an established official pattern?
Through analyzing 17 official Skills, we identified 5 main design patterns:
| Pattern | ~Lines | Key Characteristics | Example | When to Use |
|---|---|---|---|---|
| Mindset | ~50 | Thinking > technique, strong NEVER list, high freedom | frontend-design | Creative tasks requiring taste |
| Navigation | ~30 | Minimal SKILL.md, routes to sub-files | internal-comms | Multiple distinct scenarios |
| Philosophy | ~150 | Two-step: Philosophy → Express, emphasizes craft | canvas-design | Art/creation requiring originality |
| Process | ~200 | Phased workflow, checkpoints, medium freedom | mcp-builder | Complex multi-step projects |
| Tool | ~300 | Decision trees, code examples, low freedom | docx, pdf, xlsx | Precise operations on specific formats |
| Score | Criteria |
|---|---|
| 0-3 | No recognizable pattern, chaotic structure |
| 4-6 | Partially follows a pattern with significant deviations |
| 7-8 | Clear pattern with minor deviations |
| 9-10 | Masterful application of appropriate pattern |
Pattern selection guide:
| Your Task Characteristics | Recommended Pattern |
|---|---|
| Needs taste and creativity | Mindset (~50 lines) |
| Needs originality and craft quality | Philosophy (~150 lines) |
| Has multiple distinct sub-scenarios | Navigation (~30 lines) |
| Complex multi-step project | Process (~200 lines) |
| Precise operations on specific format | Tool (~300 lines) |
D8: Practical Usability (15 points)
Can an Agent actually use this Skill effectively?
| Score | Criteria |
|---|---|
| 0-5 | Confusing, incomplete, contradictory, or untested guidance |
| 6-10 | Usable but with noticeable gaps |
| 11-13 | Clear guidance for common cases |
| 14-15 | Comprehensive coverage including edge cases and error handling |
Check for:
- Decision trees: For multi-path scenarios, is there clear guidance on which path to take?
- Code examples: Do they actually work? Or are they pseudocode that breaks?
- Error handling: What if the main approach fails? Are fallbacks provided?
- Edge cases: Are unusual but realistic scenarios covered?
- Actionability: Can Agent immediately act, or needs to figure things out?
Good usability (decision tree + fallback):
```markdown
| Task | Primary Tool | Fallback | When to Use Fallback |
|------|-------------|----------|----------------------|
| Read text | pdftotext | PyMuPDF | Need layout info |
| Extract tables | camelot-py | tabula-py | camelot fails |

**Common issues**:
- Scanned PDF: pdftotext returns blank → Use OCR first
- Encrypted PDF: Permission error → Use PyMuPDF with password
```
Poor usability (vague):
```markdown
Use appropriate tools for PDF processing.
Handle errors properly.
Consider edge cases.
```
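The primary/fallback idea in the good example can be sketched as a small helper. This is an illustration under stated assumptions, not the API of any PDF library: `first_usable_result` is a hypothetical name, and each extractor is a plain callable standing in for a real tool such as pdftotext or an OCR step. Blank output is treated as a failure, which matches the scanned-PDF case above.

```python
from typing import Callable, Sequence


def first_usable_result(
    extractors: Sequence[tuple[str, Callable[[], str]]],
) -> tuple[str, str]:
    """Try each (name, extractor) in order and return the first non-blank result.

    An extractor fails if it raises or returns only whitespace.
    """
    errors: list[str] = []
    for name, extract in extractors:
        try:
            text = extract()
        except Exception as exc:  # e.g. a permission error on an encrypted file
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: returned blank output")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))
```

Collecting the reason for each failure keeps the final error actionable: the Agent can see that the primary tool returned blank output rather than only that "extraction failed".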
NEVER Do When Evaluating
- NEVER give high scores just because it "looks professional" or is well-formatted
- NEVER ignore token waste — every redundant paragraph should result in deduction
- NEVER let length impress you — a 43-line Skill can outperform a 500-line Skill
- NEVER skip mentally testing the decision trees — do they actually lead to correct choices?
- NEVER forgive explaining basics with "but it provides helpful context"
- NEVER overlook missing anti-patterns — if there's no NEVER list, that's a significant gap
- NEVER assume all procedures are valuable — distinguish domain-specific from generic
- NEVER undervalue the description field — poor description = skill never gets used
- NEVER put "when to use" info only in the body — Agent only sees description before loading
Evaluation Protocol
Step 1: First Pass — Knowledge Delta Scan
Read SKILL.md completely and for each section ask:
"Does Claude already know this?"
Mark each section as:
- [E] Expert: Claude genuinely doesn't know this — value-add
- [A] Activation: Claude knows but brief reminder is useful — acceptable
- [R] Redundant: Claude definitely knows this — should be deleted
Calculate rough ratio: E:A:R
- Good Skill: >70% Expert, <20% Activation, <10% Redundant
- Mediocre Skill: 40-70% Expert, high Activation
- Bad Skill: <40% Expert, high Redundant
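The ratio bands above can be written down as a small function. This is a minimal sketch: the inputs are counts of sections marked [E], [A], and [R], and the function name `classify_knowledge_ratio` is hypothetical. The source does not say how to label a Skill with more than 70% Expert content that misses the Activation or Redundant limits, so treating it as "Mediocre" is an assumption.

```python
def classify_knowledge_ratio(expert: int, activation: int, redundant: int) -> str:
    """Map section counts marked [E], [A], [R] to Good / Mediocre / Bad."""
    if min(expert, activation, redundant) < 0:
        raise ValueError("counts must be non-negative")
    total = expert + activation + redundant
    if total == 0:
        raise ValueError("at least one section must be marked")

    expert_pct = 100 * expert / total
    activation_pct = 100 * activation / total
    redundant_pct = 100 * redundant / total

    if expert_pct > 70 and activation_pct < 20 and redundant_pct < 10:
        return "Good"
    # Assumption: anything with at least 40% Expert that is not "Good" is "Mediocre".
    if expert_pct >= 40:
        return "Mediocre"
    return "Bad"
```

The result is only a rough signal; the marking of each section as E, A, or R is still the evaluator's judgment.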
Step 2: Structure Analysis
[ ] Check frontmatter validity
[ ] Count total lines in SKILL.md
[ ] List all reference files and their sizes
[ ] Identify which pattern the Skill follows
[ ] Check for loading triggers (if references exist)
Step 3: Score Each Dimension
For each of the 8 dimensions:
- Find specific evidence (quote relevant lines)
- Assign score with one-line justification
- Note specific improvements if score < max
Step 4: Calculate Total & Grade
Total = D1 + D2 + D3 + D4 + D5 + D6 + D7 + D8
Max = 120 points
Grade Scale (percentage-based):
| Grade | Percentage | Meaning |
|---|---|---|
| A | 90%+ (108+) | Excellent — production-ready expert Skill |
| B | 80-89% (96-107) | Good — minor improvements needed |
| C | 70-79% (84-95) | Adequate — clear improvement path |
| D | 60-69% (72-83) | Below Average — significant issues |
| F | <60% (<72) | Poor — needs fundamental redesign |
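The total and grade can be computed as follows. This is a sketch using the per-dimension maxima and the point thresholds from the grade scale; the names `MAX_SCORES` and `grade_skill` are hypothetical.

```python
# Per-dimension maxima from the evaluation dimensions (sum is 120).
MAX_SCORES = {
    "D1": 20, "D2": 15, "D3": 15, "D4": 15,
    "D5": 15, "D6": 15, "D7": 10, "D8": 15,
}

# Minimum total points for each grade, taken from the grade scale.
GRADE_THRESHOLDS = [(108, "A"), (96, "B"), (84, "C"), (72, "D")]


def grade_skill(scores: dict[str, int]) -> tuple[int, str]:
    """Validate the eight dimension scores and return (total, letter grade)."""
    if set(scores) != set(MAX_SCORES):
        raise ValueError("scores must cover exactly D1 through D8")
    for dim, value in scores.items():
        if not 0 <= value <= MAX_SCORES[dim]:
            raise ValueError(f"{dim} must be between 0 and {MAX_SCORES[dim]}")

    total = sum(scores.values())
    for minimum, letter in GRADE_THRESHOLDS:
        if total >= minimum:
            return total, letter
    return total, "F"
```

Comparing against point thresholds rather than percentages avoids rounding questions at the grade boundaries.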
Step 5: Generate Report
```markdown
Skill Evaluation Report: [Skill Name]

Summary
- Total Score: X/120 (X%)
- Grade: [A/B/C/D/F]
- Pattern: [Mindset/Navigation/Philosophy/Process/Tool]
- Knowledge Ratio: E:A:R = X:Y:Z
- Verdict: [One sentence assessment]

Dimension Scores
| Dimension | Score | Max | Notes |
|---|---|---|---|
| D1: Knowledge Delta | X | 20 | |
| D2: Mindset + Appropriate Procedures | X | 15 | |
| D3: Anti-Pattern Quality | X | 15 | |
| D4: Specification Compliance | X | 15 | |
| D5: Progressive Disclosure | X | 15 | |
| D6: Freedom Calibration | X | 15 | |
| D7: Pattern Recognition | X | 10 | |
| D8: Practical Usability | X | 15 | |

Critical Issues
[List must-fix problems that significantly impact the Skill's effectiveness]

Top 3 Improvements
1. [Highest impact improvement with specific guidance]
2. [Second priority improvement]
3. [Third priority improvement]

Detailed Analysis
[For each dimension scoring below 80%, provide:
- What's missing or problematic
- Specific examples from the Skill
- Concrete suggestions for improvement]
```
---
Common Failure Patterns
Pattern 1: The Tutorial
Symptom: Explains what PDF is, how Python works, basic library usage
Root cause: Author assumes Skill should "teach" the model
Fix: Claude already knows this. Delete all basic explanations.
Focus on expert decisions, trade-offs, and anti-patterns.
Pattern 2: The Dump
Symptom: SKILL.md is 800+ lines with everything included
Root cause: No progressive disclosure design
Fix: Core routing and decision trees in SKILL.md (<300 lines ideal)
Detailed content in references/, loaded on-demand
Pattern 3: The Orphan References
Symptom: References directory exists but files are never loaded
Root cause: No explicit loading triggers
Fix: Add "MANDATORY - READ ENTIRE FILE" at workflow decision points
Add "Do NOT Load" to prevent over-loading
Pattern 4: The Checkbox Procedure
Symptom: Step 1, Step 2, Step 3... mechanical procedures
Root cause: Author thinks in procedures, not thinking frameworks
Fix: Transform into "Before doing X, ask yourself..."
Focus on decision principles, not operation sequences
Pattern 5: The Vague Warning
Symptom: "Be careful", "avoid errors", "consider edge cases"
Root cause: Author knows things can go wrong but hasn't articulated specifics
Fix: Specific NEVER list with concrete examples and non-obvious reasons
"NEVER use X because [specific problem that takes experience to learn]"
Pattern 6: The Invisible Skill
Symptom: Great content but skill rarely gets activated
Root cause: Description is vague, missing keywords, or lacks trigger scenarios
Fix: Description must answer WHAT, WHEN, and include KEYWORDS
"Use when..." + specific scenarios + searchable terms
Example fix:
BAD: "Helps with document tasks"
GOOD: "Create, edit, and analyze .docx files. Use when working with
Word documents, tracked changes, or professional document formatting."
Pattern 7: The Wrong Location
Symptom: "When to use this Skill" section in body, not in description
Root cause: Misunderstanding of three-layer loading
Fix: Move all triggering information to description field
Body is only loaded AFTER triggering decision is made
Pattern 8: The Over-Engineered
Symptom: README.md, CHANGELOG.md, INSTALLATION_GUIDE.md, CONTRIBUTING.md
Root cause: Treating Skill like a software project
Fix: Delete all auxiliary files. Only include what the Agent needs for the task.
No documentation about the Skill itself.
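Auxiliary files of this kind are easy to detect from a file listing. A minimal sketch, assuming the four file names from the symptom above are the ones to flag (the set, the function name, and the sample paths are illustrative, not an official list):

```python
from pathlib import PurePosixPath

# File names taken from the symptom above, compared case-insensitively.
AUXILIARY = {"readme.md", "changelog.md", "installation_guide.md", "contributing.md"}


def auxiliary_files(paths: list[str]) -> list[str]:
    """Return the paths whose file name is a project-style auxiliary document."""
    return sorted(p for p in paths if PurePosixPath(p).name.lower() in AUXILIARY)


# Hypothetical skill directory listing.
listing = ["SKILL.md", "README.md", "references/ooxml.md", "docs/CHANGELOG.md"]
print(auxiliary_files(listing))  # ['README.md', 'docs/CHANGELOG.md']
```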
Pattern 9: The Freedom Mismatch
Symptom: Rigid scripts for creative tasks, vague guidance for fragile operations
Root cause: Not considering task fragility
Fix: High freedom for creative (principles, not steps)
Low freedom for fragile (exact scripts, no parameters)
Quick Reference Checklist
┌─────────────────────────────────────────────────────────────────────────┐
│ SKILL EVALUATION QUICK CHECK │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ KNOWLEDGE DELTA (most important): │
│ [ ] No "What is X" explanations for basic concepts │
│ [ ] No step-by-step tutorials for standard operations │
│ [ ] Has decision trees for non-obvious choices │
│ [ ] Has trade-offs only experts would know │
│ [ ] Has edge cases from real-world experience │
│ │
│ MINDSET + PROCEDURES: │
│ [ ] Transfers thinking patterns (how to think about problems) │
│ [ ] Has "Before doing X, ask yourself..." frameworks │
│ [ ] Includes domain-specific procedures Claude wouldn't know │
│ [ ] Distinguishes valuable procedures from generic ones │
│ │
│ ANTI-PATTERNS: │
│ [ ] Has explicit NEVER list │
│ [ ] Anti-patterns are specific, not vague │
│ [ ] Includes WHY (non-obvious reasons) │
│ │
│ SPECIFICATION (description is critical!): │
│ [ ] Valid YAML frontmatter │
│ [ ] name: lowercase, ≤64 chars │
│ [ ] description answers: WHAT does it do? │
│ [ ] description answers: WHEN should it be used? │
│ [ ] description contains trigger KEYWORDS │
│ [ ] description is specific enough for Agent to know when to use │
│ │
│ STRUCTURE: │
│ [ ] SKILL.md < 500 lines (ideal < 300) │
│ [ ] Heavy content in references/ │
│ [ ] Loading triggers embedded in workflow │
│ [ ] Has "Do NOT Load" for preventing over-loading │
│ │
│ FREEDOM: │
│ [ ] Creative tasks → High freedom (principles) │
│ [ ] Fragile operations → Low freedom (exact scripts) │
│ │
│ USABILITY: │
│ [ ] Decision trees for multi-path scenarios │
│ [ ] Working code examples │
│ [ ] Error handling and fallbacks │
│ [ ] Edge cases covered │
│ │
└─────────────────────────────────────────────────────────────────────────┘
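The mechanical items under SPECIFICATION and STRUCTURE in the checklist can be checked by a short script; the judgment items (knowledge delta, mindset, anti-patterns) cannot. The sketch below is a rough illustration under stated assumptions: it reads the frontmatter with a naive "key: value" line split rather than a real YAML parser, and the function name and messages are made up for this example.

```python
def check_skill_md(text: str) -> list[str]:
    """Check the mechanical checklist items for a SKILL.md given as text."""
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]
    # Frontmatter must open on the first line and close with a later '---'.
    if not lines or stripped[0] != "---" or "---" not in stripped[1:]:
        return ["missing YAML frontmatter"]
    end = 1 + stripped[1:].index("---")

    # Naive parse: top-level 'key: value' lines only (assumption, not full YAML).
    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and not line.startswith((" ", "\t")):
            fields[key.strip()] = value.strip()

    problems = []
    name = fields.get("name", "")
    if not name or name != name.lower() or len(name) > 64:
        problems.append("name must be present, lowercase, and at most 64 chars")
    if not fields.get("description"):
        problems.append("description is missing")
    if len(lines) >= 500:
        problems.append("SKILL.md has 500 or more lines")
    return problems


sample = (
    "---\n"
    "name: skill-judge\n"
    "description: Evaluate Agent Skills. Use when reviewing SKILL.md files.\n"
    "---\n"
    "# Body\n"
)
print(check_skill_md(sample))  # []
```

Passing these checks says nothing about whether the description is specific enough or whether the body carries expert knowledge; those items still need a human or model reviewer.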
The Meta-Question
When evaluating any Skill, always return to this fundamental question:
"Would an expert in this domain, looking at this Skill, say: 'Yes, this captures knowledge that took me years to learn'?"
If the answer is yes → the Skill has genuine value.
If the answer is no → it's compressing what Claude already knows.
The best Skills are compressed expert brains — they take a designer's 10 years of aesthetic accumulation and compress it into 43 lines, or a document expert's operational experience into a 200-line decision tree.
What gets compressed must be things Claude doesn't have. Otherwise, it's garbage compression.
Self-Evaluation Note
This Skill (skill-judge) should itself pass evaluation:
- Knowledge Delta: Provides specific evaluation criteria Claude wouldn't generate on its own
- Mindset: Shapes how to think about Skill quality, not just checklist items
- Anti-Patterns: "NEVER Do When Evaluating" section with specific don'ts
- Specification: Valid frontmatter with comprehensive description
- Progressive Disclosure: Self-contained, no external references needed
- Freedom: Medium freedom appropriate for evaluation task
- Pattern: Follows Tool pattern with decision frameworks
- Usability: Clear protocol, report template, quick reference
Evaluate this Skill against itself as a calibration exercise.