skill-validator

Skill Validator

Validate any skill against production-level quality criteria.

Before Implementation

| Source | Gather |
|--------|--------|
| Skill Directory | SKILL.md, references/, scripts/, assets/ |
| Skill Type | Builder, Guide, Automation, Analyzer, or Validator |
| Conversation | Validation purpose (audit, improvement, review) |

What This Skill Does NOT Do

  • Test skills in production environments
  • Automatically fix identified issues
  • Validate skill runtime behavior (only structure/content)
  • Replace human judgment on domain accuracy

Validation Workflow

Phase 1: Gather Context

  1. Read the skill's SKILL.md completely
  2. Identify skill type from frontmatter description:
    • Builder skill (creates artifacts)
    • Guide skill (provides instructions)
    • Automation skill (executes workflows)
    • Analyzer skill (extracts insights)
    • Validator skill (enforces quality)
    • Hybrid skill (combination of above)
  3. Read all reference files in the references/ directory
  4. Check for assets/ and scripts/ directories
  5. Note frontmatter fields (name, description, allowed-tools, model)

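Step 5 of Phase 1 can be sketched in a few lines. This is only an illustration, assuming the frontmatter is a leading block of flat `key: value` lines delimited by `---`; the helper names `read_frontmatter` and `note_fields` are hypothetical, and nested or multi-line YAML values would need a real YAML parser.

```python
from __future__ import annotations

# Fields named in step 5 of Phase 1.
FRONTMATTER_FIELDS = ("name", "description", "allowed-tools", "model")


def read_frontmatter(text: str) -> dict[str, str]:
    """Return top-level key/value pairs from a leading '---' block.

    Hypothetical helper: handles only flat 'key: value' lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter reached
            return fields
        if line.startswith((" ", "\t")) or ":" not in line:
            continue  # skip nested, continuation, or blank lines
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat frontmatter as missing


def note_fields(skill_md: str) -> dict[str, str | None]:
    """Map each field of interest to its value, or None when absent."""
    found = read_frontmatter(skill_md)
    return {field: found.get(field) for field in FRONTMATTER_FIELDS}
```

Fields that come back as `None` are simply absent; whether that matters depends on the criterion being scored (only name and description are required by the Structure & Anatomy category).
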
Phase 2: Apply Criteria

Evaluate the skill against 9 criteria categories. Score each criterion from 0 to 3:
  • 0: Missing/Absent
  • 1: Present but inadequate
  • 2: Adequate implementation
  • 3: Excellent implementation

Criteria Categories

1. Structure & Anatomy (Weight: 12%)

| Criterion | What to Check |
|-----------|---------------|
| SKILL.md exists | Root file present |
| Line count | <500 lines (context is precious) |
| Frontmatter complete | name and description present in YAML |
| Name constraints | 1-64 chars; lowercase alphanumeric + hyphens; no consecutive hyphens; can't start/end with hyphen; must match directory name |
| Description format | [What] + [When] format; ≤1024 chars |
| Description style | Third-person: "This skill should be used when..." |
| No extraneous files | No README.md, CHANGELOG.md, LICENSE in skill dir |
| Progressive disclosure | Details in references/, not bloated SKILL.md |
| Asset organization | Templates in assets/, scripts in scripts/ |
| Large file guidance | If references >10k words, grep patterns in SKILL.md |

Fail condition: Missing SKILL.md or >800 lines = automatic fail

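The name constraints and line-count limits in this category are mechanical enough to check in code. The sketch below is one possible reading of them, not part of the skill itself: the function names and the "over limit" label for files between 500 and 800 lines are assumptions made for illustration.

```python
import re

# Lowercase alphanumeric runs joined by single hyphens: this rules out
# consecutive hyphens and a leading or trailing hyphen in one pattern.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def name_problems(name: str, directory_name: str) -> list[str]:
    """List every name constraint that the given skill name violates."""
    problems = []
    if not 1 <= len(name) <= 64:
        problems.append("length must be 1-64 characters")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("use lowercase alphanumerics separated by single hyphens")
    if name != directory_name:
        problems.append("name must match directory name")
    return problems


def line_count_status(line_count: int) -> str:
    """Classify a SKILL.md line count against the 500 and 800 line limits."""
    if line_count > 800:
        return "automatic fail"
    if line_count >= 500:
        return "over limit"  # hypothetical label: misses the <500 target
    return "ok"
```

For example, `name_problems("skill-validator", "skill-validator")` returns an empty list, while a name such as `Skill--Validator` is flagged for both its characters and the directory mismatch.
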
2. Content Quality (Weight: 15%)

| Criterion | What to Check |
|-----------|---------------|
| Conciseness | No verbose explanations; context is a public good |
| Imperative form | Instructions use "Do X" not "You should do X" |
| Appropriate freedom | Constraints where needed, flexibility where safe |
| Scope clarity | Clear what skill does AND does not do |
| No hallucination risk | No instructions that encourage making up info |
| Output specification | Clear expected outputs defined |

3. User Interaction (Weight: 12%)

| Criterion | What to Check |
|-----------|---------------|
| Clarification triggers | Asks questions before acting on ambiguity |
| Required vs optional | Distinguishes must-know from nice-to-know |
| Graceful handling | What to do when user doesn't answer |
| No over-asking | Doesn't ask obvious or inferrable questions |
| Question pacing | Avoids too many questions in single message |
| Context awareness | Uses available context before asking |

Key pattern to look for:

    Required Clarifications
    1. Question about X
    2. Question about Y

    Optional Clarifications
    1. Question about Z (if relevant)

    Note: Avoid asking too many questions in a single message.

4. Documentation & References (Weight: 10%)

| Criterion | What to Check |
|-----------|---------------|
| Source URLs | Official documentation links provided |
| Reference files | Complex details in references/, not main file |
| Fetch guidance | Instructions to fetch docs for unlisted patterns |
| Version awareness | Notes about checking for latest patterns |
| Example coverage | Good/bad examples for key patterns |

Key pattern to look for:

    | Resource | URL | Use For |
    |----------|-----|---------|
    | Official Docs | https://... | Complex cases |

5. Domain Standards (Weight: 10%)

| Criterion | What to Check |
|-----------|---------------|
| Best practices | Follows domain conventions (e.g., WCAG, OWASP) |
| Enforcement mechanism | Checklists, validation steps, must-verify items |
| Anti-patterns | Lists what NOT to do |
| Quality gates | Output checklist before delivery |

Key pattern to look for:

    Must Follow
    • Requirement 1
    • Requirement 2

    Must Avoid
    • Antipattern 1
    • Antipattern 2

6. Technical Robustness (Weight: 8%)

| Criterion | What to Check |
|-----------|---------------|
| Error handling | Guidance for failure scenarios |
| Security considerations | Input validation, secrets handling if relevant |
| Dependencies | External tools/APIs documented |
| Edge cases | Common edge cases addressed |
| Testability | Can outputs be verified? |

7. Maintainability (Weight: 8%)

| Criterion | What to Check |
|-----------|---------------|
| Modularity | References are self-contained topics |
| Update path | Easy to update when standards change |
| No hardcoded values | Uses placeholders/variables where appropriate |
| Clear organization | Logical section ordering |

8. Zero-Shot Implementation (Weight: 12%)

Skills should enable single-interaction implementation with embedded expertise.

| Criterion | What to Check |
|-----------|---------------|
| Before Implementation section | Context gathering guidance present |
| Codebase context | Guidance to scan existing structure/patterns |
| Conversation context | Uses discussed requirements/decisions |
| Embedded expertise | Domain knowledge in references/, not runtime discovery |
| User-only questions | Only asks for USER requirements, not domain knowledge |

Key pattern to look for:

    Before Implementation

    Gather context to ensure successful implementation:

    | Source | Gather |
    |--------|--------|
    | Codebase | Existing structure, patterns, conventions |
    | Conversation | User's specific requirements |
    | Skill References | Domain patterns from references/ |
    | User Guidelines | Project-specific conventions |

**Red flag**: Skill instructs to "research" or "discover" domain knowledge at runtime instead of embedding it.

9. Reusability (Weight: 13%)

Skills should handle variations, not single requirements.

| Criterion | What to Check |
|-----------|---------------|
| Handles variations | Not hardcoded to single use case |
| Variable elements | Clarifications capture what VARIES |
| Constant patterns | Domain best practices encoded as constants |
| Not requirement-specific | Avoids hardcoded data, tools, configs |
| Abstraction level | Appropriate generalization for domain |

Good example:

    "Create visualizations - adaptable to data shape, chart type, library"

Bad example (too specific):

    "Create bar chart with sales data using Recharts"

Key check: Does the skill work for multiple use cases within its domain?

Type-Specific Validation

After scoring general criteria, verify type-specific requirements:

| Type | Must Have |
|------|-----------|
| Builder | Clarifications, Output Spec, Domain Standards, Output Checklist |
| Guide | Workflow Steps, Examples (Good/Bad), Official Docs links |
| Automation | Scripts in scripts/, Dependencies, Error Handling, I/O Spec |
| Analyzer | Analysis Scope, Evaluation Criteria, Output Format, Synthesis |
| Validator | Quality Criteria, Scoring Rubric, Thresholds, Remediation |

Scoring: Deduct 10 points if the type-specific requirements for the identified type are missing.

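The type-specific check can be sketched as a lookup against the table above. This is an illustration under two assumptions: the deduction is a flat 10 points whether one or several requirements are missing (the text does not say otherwise), and the reviewer has already judged which requirements are present. The function name is hypothetical, and Hybrid skills are not covered because the table lists no requirements for them.

```python
# Requirements copied from the Type / Must Have table.
TYPE_REQUIREMENTS = {
    "Builder": ["Clarifications", "Output Spec", "Domain Standards", "Output Checklist"],
    "Guide": ["Workflow Steps", "Examples (Good/Bad)", "Official Docs links"],
    "Automation": ["Scripts in scripts/", "Dependencies", "Error Handling", "I/O Spec"],
    "Analyzer": ["Analysis Scope", "Evaluation Criteria", "Output Format", "Synthesis"],
    "Validator": ["Quality Criteria", "Scoring Rubric", "Thresholds", "Remediation"],
}

TYPE_DEDUCTION = 10  # points, per the scoring rule above


def type_specific_deduction(skill_type: str, present: set[str]) -> tuple[int, list[str]]:
    """Return (points to deduct, missing requirements) for one skill type.

    Assumes a flat deduction regardless of how many items are missing.
    """
    required = TYPE_REQUIREMENTS[skill_type]  # KeyError for unlisted types
    missing = [item for item in required if item not in present]
    return (TYPE_DEDUCTION if missing else 0), missing
```

Returning the missing items alongside the deduction makes it straightforward to list them under Critical Issues or Improvement Recommendations in the report.
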
Scoring Guide

Category Scores

Calculate each category score:
Category Score = (Sum of criterion scores) / (Max possible) * 100

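As a worked instance of the category formula: each criterion is scored 0-3, so the maximum possible for a category is three times its number of criteria. The sketch below assumes that reading; the function name is hypothetical.

```python
MAX_CRITERION_SCORE = 3  # each criterion is scored 0-3


def category_score(criterion_scores: list[int]) -> float:
    """Category Score = (sum of criterion scores) / (max possible) * 100."""
    if not criterion_scores:
        raise ValueError("a category needs at least one criterion score")
    if any(not 0 <= score <= MAX_CRITERION_SCORE for score in criterion_scores):
        raise ValueError("criterion scores must be between 0 and 3")
    max_possible = MAX_CRITERION_SCORE * len(criterion_scores)
    return sum(criterion_scores) / max_possible * 100
```

For a four-criterion category such as Maintainability scored 3, 2, 2, 1, the sum is 8 out of a possible 12, giving a category score of roughly 66.7.
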
Overall Score

Overall = Σ(Category Score × Weight)

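The weighted sum can be sketched with the nine weights listed in the category headings, which add up to 100%. Subtracting the type-specific deduction from the weighted total follows the report layout, where the deduction appears as its own row; clamping the result at zero is an assumption, as is the function name.

```python
# Weights in percent, taken from the nine category headings.
WEIGHTS_PERCENT = {
    "Structure & Anatomy": 12,
    "Content Quality": 15,
    "User Interaction": 12,
    "Documentation & References": 10,
    "Domain Standards": 10,
    "Technical Robustness": 8,
    "Maintainability": 8,
    "Zero-Shot Implementation": 12,
    "Reusability": 13,
}
assert sum(WEIGHTS_PERCENT.values()) == 100  # sanity check on the weights


def overall_score(category_scores: dict[str, float], deduction: float = 0) -> float:
    """Overall = sum(category score x weight), minus any type-specific deduction."""
    missing = set(WEIGHTS_PERCENT) - set(category_scores)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    weighted = sum(
        category_scores[name] * weight for name, weight in WEIGHTS_PERCENT.items()
    ) / 100
    return max(0.0, weighted - deduction)  # clamping at 0 is an assumption
```

A skill that scores 100 in every category but lacks its type-specific requirements would land at 90 under this reading.
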
Rating Thresholds

| Score | Rating | Meaning |
|-------|--------|---------|
| 90-100 | Production | Ready for wide use |
| 75-89 | Good | Minor improvements needed |
| 60-74 | Adequate | Functional but needs work |
| 40-59 | Developing | Significant gaps |
| 0-39 | Incomplete | Major rework required |

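Mapping a score to its rating is a lookup on the lower bound of each band. The sketch below assumes fractional scores fall into the band whose lower bound they meet (so 89.5 is Good, not Production), which the table does not state explicitly; the function name is hypothetical.

```python
# (lower bound, rating) pairs from the thresholds table, highest first.
RATING_BANDS = [
    (90, "Production"),
    (75, "Good"),
    (60, "Adequate"),
    (40, "Developing"),
    (0, "Incomplete"),
]


def rating_for(score: float) -> str:
    """Return the rating whose band contains the given overall score."""
    if not 0 <= score <= 100:
        raise ValueError("overall score must be between 0 and 100")
    for lower_bound, rating in RATING_BANDS:
        if score >= lower_bound:
            return rating
    raise AssertionError("unreachable: the last band starts at 0")
```
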
Output Format

Generate validation report:

    Skill Validation Report: [skill-name]

    Rating: [Production/Good/Adequate/Developing/Incomplete]
    Overall Score: [X]/100

    Summary

    [2-3 sentence assessment]

    Category Scores

    | Category | Score | Weight | Weighted |
    |----------|-------|--------|----------|
    | Structure & Anatomy | X/100 | 12% | X |
    | Content Quality | X/100 | 15% | X |
    | User Interaction | X/100 | 12% | X |
    | Documentation | X/100 | 10% | X |
    | Domain Standards | X/100 | 10% | X |
    | Technical Robustness | X/100 | 8% | X |
    | Maintainability | X/100 | 8% | X |
    | Zero-Shot Implementation | X/100 | 12% | X |
    | Reusability | X/100 | 13% | X |
    | Type-Specific Deduction | -X | - | -X |

    Critical Issues (if any)

    • [Issue requiring immediate fix]

    Improvement Recommendations

    1. High Priority: [Specific action]
    2. Medium Priority: [Specific action]
    3. Low Priority: [Specific action]

    Strengths

    • [What skill does well]

---

Quick Validation Checklist

For rapid assessment, check these critical items:

Structure & Frontmatter

  • SKILL.md <500 lines
  • Frontmatter: name (≤64 chars, lowercase, hyphens) + description (≤1024 chars)
  • Description uses third-person style ("This skill should be used when...")
  • No README.md/CHANGELOG.md in skill directory

Content & Interaction

  • Has clarification questions (Required vs Optional)
  • Has output specification
  • Has official documentation links

Zero-Shot & Reusability

  • Has "Before Implementation" section (context gathering)
  • Domain expertise embedded in references/ (not runtime discovery)
  • Handles variations (not requirement-specific)

Type-Specific (check based on skill type)

  • Builder: Clarifications + Output Spec + Standards + Checklist
  • Guide: Workflow + Examples + Docs
  • Automation: Scripts + Dependencies + Error Handling
  • Analyzer: Scope + Criteria + Output Format
  • Validator: Criteria + Scoring + Thresholds + Remediation

If 10+ checked: Likely Production (90+)
If 7-9 checked: Likely Good (75-89)
If 5-6 checked: Likely Adequate (60-74)
If <5 checked: Needs significant work

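The quick-check thresholds translate directly into a small lookup. This is a minimal sketch with a hypothetical function name; it takes the number of checked items as given and does not decide how the items themselves are assessed.

```python
def quick_estimate(checked: int) -> str:
    """Map the number of checked quick-validation items to a likely rating."""
    if checked < 0:
        raise ValueError("checked count cannot be negative")
    if checked >= 10:
        return "Likely Production (90+)"
    if checked >= 7:
        return "Likely Good (75-89)"
    if checked >= 5:
        return "Likely Adequate (60-74)"
    return "Needs significant work"
```

The estimate is only a triage signal; a full Phase 2 scoring pass is still needed to produce the report.
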
Reference Files

| File | When to Read |
|------|--------------|
| references/detailed-criteria.md | Deep evaluation of specific criterion |
| references/scoring-examples.md | Example validations for calibration |
| references/improvement-patterns.md | Common fixes for common issues |

Usage Examples

Validate a skill

Validate the chatgpt-widget-creator skill against production criteria

Quick audit

Quick validation check on mcp-builder skill

Focused review

Check if skill-creator skill has proper user interaction patterns