skill-validator

Skill Validator

Validate any skill against production-level quality criteria.

Before Implementation

| Source | Gather |
|--------|--------|
| Skill Directory | SKILL.md, references/, scripts/, assets/ |
| Skill Type | Builder, Guide, Automation, Analyzer, or Validator |
| Conversation | Validation purpose (audit, improvement, review) |

What This Skill Does NOT Do

Test skills in production environments
Automatically fix identified issues
Validate skill runtime behavior (only structure/content)
Replace human judgment on domain accuracy

Validation Workflow

Phase 1: Gather Context

Read the skill's SKILL.md completely
Identify skill type from frontmatter description:
- Builder skill (creates artifacts)
- Guide skill (provides instructions)
- Automation skill (executes workflows)
- Analyzer skill (extracts insights)
- Validator skill (enforces quality)
- Hybrid skill (combination of above)
Read all reference files in `references/` directory
Check for assets/scripts directories
Note frontmatter fields (`name`, `description`, `allowed-tools`, `model`)
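
The frontmatter step above can be sketched in code. This is a minimal illustration, not the skill's actual tooling: it assumes the frontmatter is a leading block delimited by `---` lines containing flat `key: value` pairs, and the helper name `read_frontmatter` and the sample text are hypothetical. A real YAML parser would be needed for nested or multi-line values.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Return flat `key: value` pairs from a leading '---' delimited block.

    Assumption: frontmatter holds simple one-line fields only.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Skip comments and indented (nested) lines; keep flat pairs only.
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip("\"'")
    return {}  # no closing delimiter: treat as missing frontmatter


# Hypothetical sample file used only to exercise the helper.
sample = """---
name: skill-validator
description: Validate any skill against production-level quality criteria.
---
# Skill Validator
"""

fields = read_frontmatter(sample)
missing = [key for key in ("name", "description") if key not in fields]
```

Here `missing` lists the required fields that are absent, which feeds the "Frontmatter complete" criterion.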

Phase 2: Apply Criteria

Evaluate the skill against 9 criteria categories. Score each criterion from 0 to 3:

0: Missing/Absent
1: Present but inadequate
2: Adequate implementation
3: Excellent implementation

Criteria Categories

1. Structure & Anatomy (Weight: 12%)

| Criterion | What to Check |
|-----------|---------------|
| SKILL.md exists | Root file present |
| Line count | <500 lines (context is precious) |
| Frontmatter complete | `name` and `description` present in YAML |
| Name constraints | 1-64 chars; lowercase alphanumeric + hyphens; no consecutive hyphens; can't start/end with hyphen; must match directory name |
| Description format | [What] + [When] format; ≤1024 chars |
| Description style | Third-person: "This skill should be used when..." |
| No extraneous files | No README.md, CHANGELOG.md, LICENSE in skill dir |
| Progressive disclosure | Details in `references/`, not bloated SKILL.md |
| Asset organization | Templates in `assets/`, scripts in `scripts/` |
| Large file guidance | If references >10k words, grep patterns in SKILL.md |

Fail condition: Missing SKILL.md or >800 lines = automatic fail
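
Several of the structural checks above are mechanical enough to express in code. The following is a rough sketch under stated assumptions: the function names, the `"over-target"` label, and the choice to report problems as strings are all hypothetical, and only the name rules, the 500/800 line thresholds, and the extraneous file list come from the table and fail condition.

```python
import re

# Lowercase alphanumeric groups joined by single hyphens. One pattern rules out
# leading hyphens, trailing hyphens, and consecutive hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
EXTRANEOUS_FILES = {"README.md", "CHANGELOG.md", "LICENSE"}


def check_name(name: str, directory_name: str) -> list[str]:
    """Return a list of name-constraint violations (empty if the name is valid)."""
    problems = []
    if not 1 <= len(name) <= 64:
        problems.append("name must be 1-64 characters")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must be lowercase alphanumeric with single interior hyphens")
    if name != directory_name:
        problems.append("name must match directory name")
    return problems


def check_line_count(line_count: int) -> str:
    """Classify SKILL.md length: under 500 is the target, over 800 is an automatic fail."""
    if line_count > 800:
        return "fail"
    if line_count >= 500:
        return "over-target"  # hypothetical label for the 500-800 range
    return "ok"


def find_extraneous(filenames: list[str]) -> list[str]:
    """Return the files in the skill directory that should not be there."""
    return sorted(EXTRANEOUS_FILES.intersection(filenames))
```

For example, `check_name("skill-validator", "skill-validator")` returns an empty list, while a name such as `Skill--Validator-` is rejected by the pattern.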

2. Content Quality (Weight: 15%)

| Criterion | What to Check |
|-----------|---------------|
| Conciseness | No verbose explanations; context is a public good |
| Imperative form | Instructions use "Do X" not "You should do X" |
| Appropriate freedom | Constraints where needed, flexibility where safe |
| Scope clarity | Clear what skill does AND does not do |
| No hallucination risk | No instructions that encourage making up info |
| Output specification | Clear expected outputs defined |

3. User Interaction (Weight: 12%)

| Criterion | What to Check |
|-----------|---------------|
| Clarification triggers | Asks questions before acting on ambiguity |
| Required vs optional | Distinguishes must-know from nice-to-know |
| Graceful handling | What to do when user doesn't answer |
| No over-asking | Doesn't ask obvious or inferrable questions |
| Question pacing | Avoids too many questions in single message |
| Context awareness | Uses available context before asking |

Key pattern to look for:

markdown

Required Clarifications

Question about X
Question about Y

Optional Clarifications

Question about Z (if relevant)

Note: Avoid asking too many questions in a single message.

4. Documentation & References (Weight: 10%)

| Criterion | What to Check |
|-----------|---------------|
| Source URLs | Official documentation links provided |
| Reference files | Complex details in `references/`, not main file |
| Fetch guidance | Instructions to fetch docs for unlisted patterns |
| Version awareness | Notes about checking for latest patterns |
| Example coverage | Good/bad examples for key patterns |

Key pattern to look for:

markdown

| Resource | URL | Use For |
|----------|-----|---------|
| Official Docs | https://... | Complex cases |

5. Domain Standards (Weight: 10%)

| Criterion | What to Check |
|-----------|---------------|
| Best practices | Follows domain conventions (e.g., WCAG, OWASP) |
| Enforcement mechanism | Checklists, validation steps, must-verify items |
| Anti-patterns | Lists what NOT to do |
| Quality gates | Output checklist before delivery |

Key pattern to look for:

markdown

Must Follow

Requirement 1
Requirement 2

Must Avoid

Antipattern 1
Antipattern 2

6. Technical Robustness (Weight: 8%)

| Criterion | What to Check |
|-----------|---------------|
| Error handling | Guidance for failure scenarios |
| Security considerations | Input validation, secrets handling if relevant |
| Dependencies | External tools/APIs documented |
| Edge cases | Common edge cases addressed |
| Testability | Can outputs be verified? |

7. Maintainability (Weight: 8%)

| Criterion | What to Check |
|-----------|---------------|
| Modularity | References are self-contained topics |
| Update path | Easy to update when standards change |
| No hardcoded values | Uses placeholders/variables where appropriate |
| Clear organization | Logical section ordering |

8. Zero-Shot Implementation (Weight: 12%)

Skills should enable single-interaction implementation with embedded expertise.

| Criterion | What to Check |
|-----------|---------------|
| Before Implementation section | Context gathering guidance present |
| Codebase context | Guidance to scan existing structure/patterns |
| Conversation context | Uses discussed requirements/decisions |
| Embedded expertise | Domain knowledge in `references/`, not runtime discovery |
| User-only questions | Only asks for USER requirements, not domain knowledge |

Key pattern to look for:

markdown

Before Implementation

Gather context to ensure successful implementation:

| Source | Gather |
|--------|--------|
| Codebase | Existing structure, patterns, conventions |
| Conversation | User's specific requirements |
| Skill References | Domain patterns from `references/` |
| User Guidelines | Project-specific conventions |

**Red flag**: The skill says to "research" or "discover" domain knowledge at runtime instead of embedding it.

9. Reusability (Weight: 13%)

Skills should handle variations, not single requirements.

| Criterion | What to Check |
|-----------|---------------|
| Handles variations | Not hardcoded to single use case |
| Variable elements | Clarifications capture what VARIES |
| Constant patterns | Domain best practices encoded as constants |
| Not requirement-specific | Avoids hardcoded data, tools, configs |
| Abstraction level | Appropriate generalization for domain |

Good example:

markdown

"Create visualizations - adaptable to data shape, chart type, library"

Bad example (too specific):

markdown

"Create bar chart with sales data using Recharts"

Key check: Does the skill work for multiple use cases within its domain?

Type-Specific Validation

After scoring general criteria, verify type-specific requirements:

| Type | Must Have |
|------|-----------|
| Builder | Clarifications, Output Spec, Domain Standards, Output Checklist |
| Guide | Workflow Steps, Examples (Good/Bad), Official Docs links |
| Automation | Scripts in `scripts/`, Dependencies, Error Handling, I/O Spec |
| Analyzer | Analysis Scope, Evaluation Criteria, Output Format, Synthesis |
| Validator | Quality Criteria, Scoring Rubric, Thresholds, Remediation |

Scoring: Deduct 10 points if any type-specific requirement is missing for the identified type.
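
One way to read the type-specific rule is as a lookup followed by a flat deduction. The sketch below assumes that reading: the deduction is 10 points once, however many items are missing, and Hybrid skills are not handled. The function name and the use of plain strings to represent requirements are hypothetical.

```python
# Must-have items per skill type, copied from the table above.
TYPE_REQUIREMENTS = {
    "Builder": ["Clarifications", "Output Spec", "Domain Standards", "Output Checklist"],
    "Guide": ["Workflow Steps", "Examples (Good/Bad)", "Official Docs links"],
    "Automation": ["Scripts in scripts/", "Dependencies", "Error Handling", "I/O Spec"],
    "Analyzer": ["Analysis Scope", "Evaluation Criteria", "Output Format", "Synthesis"],
    "Validator": ["Quality Criteria", "Scoring Rubric", "Thresholds", "Remediation"],
}
TYPE_DEDUCTION = 10  # assumption: applied once, not per missing item


def type_specific_deduction(skill_type: str, present: set[str]) -> tuple[int, list[str]]:
    """Return (points to deduct, missing requirements) for the identified type."""
    required = TYPE_REQUIREMENTS[skill_type]
    missing = [item for item in required if item not in present]
    return (TYPE_DEDUCTION if missing else 0), missing


deduction, missing = type_specific_deduction(
    "Guide", {"Workflow Steps", "Official Docs links"}
)
```

Returning the missing items alongside the deduction keeps the result usable for the report's recommendations.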

Scoring Guide

Category Scores

Calculate each category score:

Category Score = (Sum of criterion scores) / (Max possible) * 100

Overall Score

Overall = Σ(Category Score × Weight)
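
The two formulas can be worked through in a few lines. This is an illustrative sketch: the weights are taken from the category headings, the maximum of 3 per criterion comes from the 0-3 scale, and subtracting the type-specific deduction from the weighted total is an assumption based on the report table. Function names and the sample scores are hypothetical.

```python
# Category weights in percent, as listed in the criteria headings.
WEIGHTS = {
    "Structure & Anatomy": 12,
    "Content Quality": 15,
    "User Interaction": 12,
    "Documentation & References": 10,
    "Domain Standards": 10,
    "Technical Robustness": 8,
    "Maintainability": 8,
    "Zero-Shot Implementation": 12,
    "Reusability": 13,
}
MAX_CRITERION_SCORE = 3


def category_score(criterion_scores: list[int]) -> float:
    """(Sum of criterion scores) / (Max possible) * 100."""
    if not criterion_scores:
        raise ValueError("a category needs at least one criterion score")
    max_possible = MAX_CRITERION_SCORE * len(criterion_scores)
    return sum(criterion_scores) / max_possible * 100


def overall_score(category_scores: dict[str, float], deduction: float = 0) -> float:
    """Weighted sum of category scores, minus any type-specific deduction (assumed)."""
    weighted = sum(category_scores[name] * weight for name, weight in WEIGHTS.items())
    return weighted / 100 - deduction


# Six hypothetical criterion scores: 14 of 18 possible points.
content_quality = category_score([3, 2, 2, 3, 1, 3])
print(round(content_quality, 1))  # 77.8
```

Keeping the weights as integer percentages avoids small floating-point drift when the category scores are round numbers.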

Rating Thresholds

| Score | Rating | Meaning |
|-------|--------|---------|
| 90-100 | Production | Ready for wide use |
| 75-89 | Good | Minor improvements needed |
| 60-74 | Adequate | Functional but needs work |
| 40-59 | Developing | Significant gaps |
| 0-39 | Incomplete | Major rework required |
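
The threshold table maps directly onto a lookup by lower bound. In this sketch the function name is hypothetical, and treating a fractional score such as 89.5 as belonging to the lower band ("Good") is an assumption, since the table only lists integer ranges.

```python
# (lower bound, rating) pairs from the table, highest band first.
RATING_BANDS = [
    (90, "Production"),
    (75, "Good"),
    (60, "Adequate"),
    (40, "Developing"),
    (0, "Incomplete"),
]


def rating_for(score: float) -> str:
    """Return the rating label for an overall score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, label in RATING_BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0")
```

For instance, `rating_for(82)` returns `"Good"` and `rating_for(39)` returns `"Incomplete"`.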

Output Format

Generate validation report:

markdown

Skill Validation Report: [skill-name]

Rating: [Production/Good/Adequate/Developing/Incomplete] Overall Score: [X]/100

Summary

[2-3 sentence assessment]

Category Scores

| Category | Score | Weight | Weighted |
|----------|-------|--------|----------|
| Structure & Anatomy | X/100 | 12% | X |
| Content Quality | X/100 | 15% | X |
| User Interaction | X/100 | 12% | X |
| Documentation | X/100 | 10% | X |
| Domain Standards | X/100 | 10% | X |
| Technical Robustness | X/100 | 8% | X |
| Maintainability | X/100 | 8% | X |
| Zero-Shot Implementation | X/100 | 12% | X |
| Reusability | X/100 | 13% | X |
| Type-Specific Deduction | -X | - | -X |

Critical Issues (if any)

[Issue requiring immediate fix]

Improvement Recommendations

High Priority: [Specific action]
Medium Priority: [Specific action]
Low Priority: [Specific action]

Strengths

[What skill does well]

---

Quick Validation Checklist

For rapid assessment, check these critical items:

Structure & Frontmatter

SKILL.md <500 lines
Frontmatter: name (≤64 chars, lowercase, hyphens) + description (≤1024 chars)
Description uses third-person style ("This skill should be used when...")
No README.md/CHANGELOG.md in skill directory

Content & Interaction

Has clarification questions (Required vs Optional)
Has output specification
Has official documentation links

Zero-Shot & Reusability

Has "Before Implementation" section (context gathering)
Domain expertise embedded in `references/` (not runtime discovery)
Handles variations (not requirement-specific)

Type-Specific (check based on skill type)

Reference Files

| File | When to Read |
|------|--------------|
| `references/detailed-criteria.md` | Deep evaluation of specific criterion |
| `references/scoring-examples.md` | Example validations for calibration |
| `references/improvement-patterns.md` | Common fixes for common issues |

Usage Examples

Validate a skill

Validate the chatgpt-widget-creator skill against production criteria

Quick audit

Quick validation check on mcp-builder skill

Focused review

Check if skill-creator skill has proper user interaction patterns
