review-multi
Overview
review-multi provides a systematic framework for conducting comprehensive, multi-dimensional reviews of Claude Code skills. It evaluates skills across 5 independent dimensions, combining automated validation with manual assessment to deliver objective quality scores and actionable improvement recommendations.
Purpose: Systematic skill quality assurance through multi-dimensional assessment
The 5 Review Dimensions:
- Structure Review - YAML frontmatter, file organization, naming conventions, progressive disclosure
- Content Review - Section completeness, clarity, examples, documentation quality
- Quality Review - Pattern compliance, best practices, anti-pattern detection, code quality
- Usability Review - Ease of use, learnability, real-world effectiveness, user satisfaction
- Integration Review - Dependency documentation, data flow, component integration, composition
Automation Levels:
- Structure: 95% automated (validate-structure.py)
- Content: 40% automated, 60% manual assessment
- Quality: 50% automated, 50% manual assessment
- Usability: 10% automated, 90% manual testing
- Integration: 30% automated, 70% manual review
Scoring System:
- Scale: 1-5 per dimension (Excellent/Good/Acceptable/Needs Work/Poor)
- Overall Score: Weighted average across dimensions
- Grade: A/B/C/D/F mapping
- Production Readiness: ≥4.5 ready, 4.0-4.4 ready with improvements, 3.5-3.9 needs work, <3.5 not ready
Value Proposition:
- Objective: Evidence-based scoring using detailed rubrics (not subjective opinion)
- Comprehensive: 5 dimensions cover all quality aspects
- Efficient: Automation handles 10-95% of checks depending on dimension
- Actionable: Specific, prioritized improvement recommendations
- Consistent: Standardized checklists ensure repeatable results
- Flexible: 3 review modes (Comprehensive, Fast Check, Custom)
Key Benefits:
- Catch 70% of issues with fast automated checks
- Reduce common quality issues by 30% using checklists
- Ensure production readiness before deployment
- Identify improvement opportunities systematically
- Track quality improvements over time
- Establish quality standards across skill ecosystem
When to Use
Use review-multi when:
- Pre-Production Validation - Review new skills before deploying to production to catch issues early and ensure quality standards
- Quality Assurance - Conduct systematic QA on skills to validate they meet ecosystem standards and user needs
- Identifying Improvements - Discover specific, actionable improvements for existing skills through multi-dimensional assessment
- Continuous Improvement - Regular reviews throughout the development lifecycle, not just at the end, to maintain quality
- Production Readiness Assessment - Determine if a skill is ready for production use with objective scoring and grade mapping
- Skill Ecosystem Standards - Ensure consistency and quality across multiple skills using a standardized review framework
- Post-Update Validation - Review skills after major updates to ensure changes don't introduce issues or degrade quality
- Learning and Improvement - Use review findings to learn patterns, improve future skills, and refine development practices
- Team Calibration - Standardize quality assessment across multiple reviewers with objective rubrics
Don't Use When:
- Quick syntax checks (use validate-structure.py directly)
- In-progress drafts (wait until reasonably complete)
- Experimental prototypes (not production-bound)
Prerequisites
Required:
- Skill to review (in .claude/skills/[skill-name]/ format)
- Time allocation based on review mode:
- Fast Check: 5-10 minutes
- Single Operation: 15-60 minutes (varies by dimension)
- Comprehensive Review: 1.5-2.5 hours
Optional:
- Python 3.7+ (for automation scripts in Structure and Quality reviews)
- PyYAML library (for YAML frontmatter validation)
- Access to skill-under-review documentation
- Familiarity with Claude Code skill patterns (see development-workflow/references/common-patterns.md)
Skills (no required dependencies, complementary):
- development-workflow: Use review-multi after skill development
- skill-updater: Apply review-multi recommendations
- testing-validator: Combine with review-multi for full QA
Scoring System
The review-multi scoring system provides objective, consistent quality assessment across all skill dimensions.
Per-Dimension Scoring (1-5 Scale)
Each dimension is scored independently using a 1-5 integer scale:
5 - Excellent (Exceeds Standards)
- All criteria met perfectly
- Goes beyond minimum requirements
- Exemplary quality that sets the bar
- No issues or concerns identified
- Can serve as example for others
4 - Good (Meets Standards)
- Meets all critical criteria
- 1-2 minor, non-critical issues
- Production-ready quality
- Standard expected level
- Small improvements possible
3 - Acceptable (Minor Improvements Needed)
- Meets most criteria
- 3-4 issues, some may be critical
- Usable but not optimal
- Several improvements recommended
- Can proceed with noted concerns
2 - Needs Work (Notable Issues)
- Missing several criteria
- 5-6 issues, multiple critical
- Not production-ready
- Significant improvements required
- Rework needed before deployment
1 - Poor (Significant Problems)
- Fails most criteria
- 7+ issues, fundamentally flawed
- Major quality concerns
- Extensive rework required
- Not viable in current state
Overall Score Calculation
The overall score is a weighted average of the 5 dimension scores:
Overall = (Structure × 0.20) + (Content × 0.25) + (Quality × 0.25) + (Usability × 0.15) + (Integration × 0.15)
Weight Rationale:
- Content & Quality (25% each): Core skill value - what it does and how well
- Structure (20%): Important foundation - organization and compliance
- Usability & Integration (15% each): Supporting factors - user experience and composition
Example Calculations:
- Scores (5, 4, 4, 3, 4) → Overall = (5×0.20 + 4×0.25 + 4×0.25 + 3×0.15 + 4×0.15) = 4.05 → Grade B
- Scores (4, 5, 5, 4, 4) → Overall = (4×0.20 + 5×0.25 + 5×0.25 + 4×0.15 + 4×0.15) = 4.50 → Grade A
- Scores (3, 3, 2, 3, 3) → Overall = (3×0.20 + 3×0.25 + 2×0.25 + 3×0.15 + 3×0.15) = 2.75 → Grade C
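The weighted average and grade mapping can be sketched in a few lines of Python. The weights and grade bands below are taken directly from this document; the function names themselves are illustrative, not part of the skill's actual scripts:

```python
# Sketch of the overall-score calculation and A-F grade mapping.
# Weights and thresholds come from the review-multi scoring system.

WEIGHTS = {
    "structure": 0.20,
    "content": 0.25,
    "quality": 0.25,
    "usability": 0.15,
    "integration": 0.15,
}

# (minimum score, letter grade), checked from highest band down
GRADES = [(4.5, "A"), (3.5, "B"), (2.5, "C"), (1.5, "D"), (1.0, "F")]

def overall_score(scores: dict) -> float:
    """Weighted average of the five 1-5 dimension scores."""
    return round(sum(scores[dim] * w for dim, w in WEIGHTS.items()), 2)

def grade(score: float) -> str:
    """Map an overall score onto the letter-grade bands."""
    for threshold, letter in GRADES:
        if score >= threshold:
            return letter
    return "F"

scores = {"structure": 5, "content": 4, "quality": 4, "usability": 3, "integration": 4}
print(overall_score(scores), grade(overall_score(scores)))  # 4.05 B
```

Note that a single weak dimension drags the overall score down proportionally to its weight, which is why a 2 in Quality (25% weight) hurts more than a 2 in Integration (15%).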
Grade Mapping
Overall scores map to letter grades:
- A (4.5-5.0): Excellent - Production ready, high quality
- B (3.5-4.4): Good - Ready with minor improvements
- C (2.5-3.4): Acceptable - Needs improvements before production
- D (1.5-2.4): Poor - Requires significant rework
- F (1.0-1.4): Failing - Major issues, not viable
Production Readiness Assessment
Based on overall score:
- ≥4.5 (Grade A): ✅ Production Ready - High quality, deploy with confidence
- 4.0-4.4 (Grade B+): ✅ Ready with Minor Improvements - Can deploy, address improvements in next iteration
- 3.5-3.9 (Grade B-): ⚠️ Needs Improvements - Address issues before production deployment
- <3.5 (Grade C-F): ❌ Not Ready - Significant rework required before deployment
Decision Framework:
- A Grade: Ship it - exemplary quality
- B Grade (4.0+): Ship it - standard quality, note improvements for future
- B- Grade (3.5-3.9): Hold - fix identified issues first
- C-F Grade: Don't ship - substantial work needed
Operations
Operation 1: Structure Review
Purpose: Validate file organization, naming conventions, YAML frontmatter compliance, and progressive disclosure
When to Use This Operation:
- Always run first (fast automated check catches 70% of issues)
- Before comprehensive review (quick validation of basics)
- During development (continuous structure validation)
- Quick quality checks (5-10 minute validation)
Automation Level: 95% automated via scripts/validate-structure.py
Process:
- Run Structure Validation Script
  - Command: python3 scripts/validate-structure.py /path/to/skill [--json] [--verbose]
  - The script checks YAML, file structure, naming, and progressive disclosure
- Review YAML Frontmatter
  - Verify name field in kebab-case format
  - Check description has 5+ trigger keywords naturally embedded
  - Validate YAML syntax is correct
- Verify File Structure
  - Confirm SKILL.md exists
  - Check references/ and scripts/ organization (if present)
  - Verify README.md exists
- Check Naming Conventions
  - SKILL.md and README.md uppercase
  - references/ files: lowercase-hyphen-case
  - scripts/ files: lowercase-hyphen-case with extension
- Validate Progressive Disclosure
  - SKILL.md <1,500 lines (warn if >1,200)
  - references/ files 300-800 lines each
  - No monolithic files
Validation Checklist:
- YAML frontmatter present and valid syntax
- name field in kebab-case format (e.g., skill-name)
- description includes 5+ trigger keywords (naturally embedded)
- SKILL.md file exists
- File naming follows conventions (SKILL.md uppercase, references lowercase-hyphen)
- Directory structure correct (references/, scripts/ if present)
- SKILL.md size appropriate (<1,500 lines, ideally <1,200)
- References organized by topic (if present)
- No monolithic files (progressive disclosure maintained)
- README.md present
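A few of the automated items on this checklist can be illustrated with a short Python sketch. This is not the actual validate-structure.py, only a minimal illustration of the checklist logic, and the severity labels mirror the critical/warning levels used in this skill's reports:

```python
# Minimal sketch of some automated structure checks (NOT validate-structure.py).
import re
from pathlib import Path

KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def check_structure(skill_dir: str) -> list:
    """Return a list of (severity, message) issues for a skill directory."""
    root = Path(skill_dir)
    issues = []
    skill_md = root / "SKILL.md"
    if not skill_md.exists():
        issues.append(("critical", "SKILL.md missing"))
    else:
        lines = skill_md.read_text().count("\n") + 1
        if lines > 1500:
            issues.append(("critical", f"SKILL.md too long ({lines} lines)"))
        elif lines > 1200:
            issues.append(("warning", f"SKILL.md approaching limit ({lines} lines)"))
    if not (root / "README.md").exists():
        issues.append(("warning", "README.md missing"))
    if not KEBAB_CASE.match(root.name):
        issues.append(("critical", f"skill name '{root.name}' is not kebab-case"))
    return issues
```

An empty result list would correspond to all automated checks passing; anything at critical severity maps to a failing checklist item.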
Scoring Criteria:
- 5 - Excellent: All 10 checks pass, perfect compliance, exemplary structure
- 4 - Good: 8-9 checks pass, 1-2 minor non-critical issues (e.g., README missing but optional)
- 3 - Acceptable: 6-7 checks pass, 3-4 issues including some critical (e.g., YAML invalid but fixable)
- 2 - Needs Work: 4-5 checks pass, 5-6 issues with multiple critical (e.g., no SKILL.md, bad naming)
- 1 - Poor: ≤3 checks pass, 7+ issues, fundamentally flawed structure
Outputs:
- Structure score (1-5)
- Pass/fail status for each checklist item
- List of issues found with severity (critical/warning/info)
- Specific improvement recommendations with fix guidance
- JSON report (if using script with --json flag)
Time Estimate: 5-10 minutes (mostly automated)
Example:
$ python3 scripts/validate-structure.py .claude/skills/todo-management
Structure Validation Report
===========================
Skill: todo-management
Date: 2025-11-06
✅ YAML Frontmatter: PASS
- Name format: valid (kebab-case)
- Trigger keywords: 8 found (target: 5+)
✅ File Structure: PASS
- SKILL.md: exists
- README.md: exists
- references/: 3 files found
- scripts/: 1 file found
✅ Naming Conventions: PASS
- All files follow conventions
⚠️ Progressive Disclosure: WARNING
- SKILL.md: 569 lines (good)
- state-management-guide.md: 501 lines (good)
- BUT: No Quick Reference section detected
Overall Structure Score: 4/5 (Good)
Issues: 1 warning (missing Quick Reference)
Recommendation: Add Quick Reference section to SKILL.md
Operation 2: Content Review
Purpose: Assess section completeness, content clarity, example quality, and documentation comprehensiveness
When to Use This Operation:
- Evaluate documentation quality
- Assess completeness of skill content
- Review example quality and quantity
- Validate information architecture
- Check clarity and organization
Automation Level: 40% automated (section detection, example counting), 60% manual assessment
Process:
- Check Section Completeness (automated + manual)
  - Verify 5 core sections present: Overview, When to Use, Main Content (workflow/operations), Best Practices, Quick Reference
  - Check optional sections: Prerequisites, Common Mistakes, Troubleshooting
  - Assess if all necessary sections included
- Assess Content Clarity (manual)
  - Is content understandable?
  - Is organization logical?
  - Are explanations clear without being verbose?
  - Is technical level appropriate for audience?
- Evaluate Example Quality (automated count + manual quality)
  - Count code/command examples (target: 5+)
  - Check if examples are concrete (not abstract placeholders)
  - Verify examples are executable/copy-pasteable
  - Assess if examples help understanding
- Review Documentation Completeness (manual)
  - Is all necessary information present?
  - Are there unexplained gaps?
  - Is sufficient detail provided?
  - Are edge cases covered?
- Check Explanation Depth (manual)
  - Not too brief (insufficient detail)?
  - Not too verbose (unnecessary length)?
  - Balanced depth for complexity?
Validation Checklist:
- Overview/Introduction section present
- When to Use section present with 5+ scenarios
- Main content (workflow steps OR operations OR reference material) complete
- Best Practices section present
- Quick Reference section present
- 5+ code/command examples included
- Examples are concrete (not abstract placeholders like "YOUR_VALUE_HERE")
- Content clarity: readable and well-structured
- Sufficient detail: not too brief
- Not too verbose: concise without unnecessary length
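The automated 40% of this review (section detection and example counting) can be sketched as below. The core-section names come from this document; the heading regex and the fenced-block heuristic for counting examples are assumptions for illustration:

```python
# Sketch of automated content checks: core-section detection and example counting.
import re

CORE_SECTIONS = ["Overview", "When to Use", "Best Practices", "Quick Reference"]

def content_checks(markdown: str) -> dict:
    """Run the automatable portion of the content review on SKILL.md text."""
    missing = [s for s in CORE_SECTIONS
               if not re.search(rf"^#+\s*{re.escape(s)}", markdown, re.MULTILINE)]
    # Count fenced code blocks as a proxy for code/command examples.
    examples = len(re.findall(r"^```", markdown, re.MULTILINE)) // 2
    return {
        "missing_sections": missing,
        "example_count": examples,
        "enough_examples": examples >= 5,   # target from the checklist above
    }
```

The remaining 60% (clarity, completeness, explanation depth) still requires manual judgment; the sketch only flags what is mechanically detectable.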
Scoring Criteria:
- 5 - Excellent: All 10 checks pass, exceptional clarity, great examples, comprehensive documentation
- 4 - Good: 8-9 checks pass, good content with minor gaps or clarity issues
- 3 - Acceptable: 6-7 checks pass, some sections weak or missing, acceptable clarity
- 2 - Needs Work: 4-5 checks pass, multiple sections incomplete/unclear, poor examples
- 1 - Poor: ≤3 checks pass, major gaps, confusing content, few/no examples
Outputs:
- Content score (1-5)
- Section-by-section assessment (present/missing/weak)
- Example quality rating and count
- Specific content improvement recommendations
- Clarity issues identified with examples
Time Estimate: 15-30 minutes (requires manual review)
Example:
Content Review: prompt-builder
==============================
Section Completeness: 9/10 ✅
✅ Overview: Present, clear explanation of purpose
✅ When to Use: 7 scenarios listed
✅ Main Content: 5-step workflow, well-organized
✅ Best Practices: 6 practices documented
✅ Quick Reference: Present
⚠️ Common Mistakes: Not present (optional but valuable)
Example Quality: 8/10 ✅
- Count: 12 examples (exceeds target of 5+)
- Concrete: Yes, all examples executable
- Helpful: Yes, demonstrate key concepts
- Minor: Could use 1-2 edge case examples
Content Clarity: 9/10 ✅
- Well-organized logical flow
- Clear explanations without verbosity
- Technical level appropriate
- Minor: Step 3 could be clearer (add diagram)
Documentation Completeness: 8/10 ✅
- All workflow steps documented
- Validation criteria clear
- Minor gaps: Error handling not covered
Content Score: 4/5 (Good)
Primary Recommendation: Add Common Mistakes section
Secondary: Add error handling guidance to Step 3
Operation 3: Quality Review
Purpose: Evaluate pattern compliance, best practices adherence, anti-pattern detection, and code/script quality
When to Use This Operation:
- Validate standards compliance
- Check pattern implementation
- Detect anti-patterns
- Assess code quality (if scripts present)
- Ensure best practices followed
Automation Level: 50% automated (pattern detection, anti-pattern checking), 50% manual assessment
Process:
- Detect Architecture Pattern (automated + manual)
  - Identify pattern type: workflow/task/reference/capabilities
  - Verify pattern correctly implemented
  - Check pattern consistency throughout skill
- Validate Documentation Patterns (automated + manual)
  - Verify 5 core sections present
  - Check consistent structure across steps/operations
  - Validate section formatting
- Check Best Practices (manual)
  - Validation checklists present and specific?
  - Examples throughout documentation?
  - Quick Reference available?
  - Error cases considered?
- Detect Anti-Patterns (automated + manual)
  - Keyword stuffing (trigger keywords unnatural)?
  - Monolithic SKILL.md (>1,500 lines, no progressive disclosure)?
  - Inconsistent structure (each section different format)?
  - Vague validation ("everything works")?
  - Missing examples (too abstract)?
  - Placeholders in production ("YOUR_VALUE_HERE")?
  - Ignoring error cases (only happy path)?
  - Over-engineering simple skills?
  - Unclear dependencies?
  - No Quick Reference?
- Assess Code Quality (manual, if scripts present)
  - Scripts well-documented (docstrings)?
  - Error handling present?
  - CLI interfaces clear?
  - Code style consistent?
Validation Checklist:
- Architecture pattern correctly implemented (workflow/task/reference/capabilities)
- Consistent structure across steps/operations (same format throughout)
- Validation checklists present and specific (measurable, not vague)
- Best practices section actionable (specific guidance)
- No keyword stuffing (trigger keywords natural, contextual)
- No monolithic SKILL.md (progressive disclosure used if >1,000 lines)
- Examples are complete (no "YOUR_VALUE_HERE" placeholders in production)
- Error cases considered (not just happy path documented)
- Dependencies documented (if skill requires other skills)
- Scripts well-documented (if present: docstrings, error handling, CLI help)
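The automated half of anti-pattern detection can be sketched as simple text scans. The placeholder and vague-validation phrase lists below are illustrative assumptions, not this skill's actual detection rules:

```python
# Sketch of automated anti-pattern scanning over SKILL.md text.
import re

PLACEHOLDER_PATTERNS = ["YOUR_VALUE_HERE", "TODO", "FIXME"]  # assumed list
VAGUE_VALIDATION = ["everything works", "should be fine"]    # assumed list

def detect_anti_patterns(text: str) -> list:
    """Return human-readable descriptions of detected anti-patterns."""
    found = []
    for pat in PLACEHOLDER_PATTERNS:
        if pat in text:
            found.append(f"placeholder left in examples: {pat}")
    for phrase in VAGUE_VALIDATION:
        if re.search(re.escape(phrase), text, re.IGNORECASE):
            found.append(f"vague validation criterion: '{phrase}'")
    if text.count("\n") + 1 > 1500:
        found.append("monolithic SKILL.md (>1,500 lines)")
    return found
```

Anti-patterns that depend on judgment (over-engineering, unnatural keyword stuffing, happy-path-only documentation) remain in the manual half of this operation.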
Scoring Criteria:
- 5 - Excellent: All 10 checks pass, exemplary quality, no anti-patterns, exceeds standards
- 4 - Good: 8-9 checks pass, high quality, meets all standards, minor deviations
- 3 - Acceptable: 6-7 checks pass, acceptable quality, some standard violations, 2-3 anti-patterns
- 2 - Needs Work: 4-5 checks pass, quality issues, multiple standard violations, 4-5 anti-patterns
- 1 - Poor: ≤3 checks pass, poor quality, significant problems, 6+ anti-patterns detected
Outputs:
- Quality score (1-5)
- Pattern compliance assessment (pattern detected, compliance level)
- Anti-patterns detected (list with severity)
- Best practices gaps identified
- Code quality assessment (if scripts present)
- Prioritized improvement recommendations
Time Estimate: 20-40 minutes (mixed automated + manual)
Example:
Quality Review: workflow-skill-creator
======================================
Pattern Compliance: ✅
- Pattern Detected: Workflow-based
- Implementation: Correct (5 sequential steps with dependencies)
- Consistency: High (all steps follow same structure)
Documentation Patterns: ✅
- 5 Core Sections: All present
- Structure: Consistent across all 5 steps
- Formatting: Proper heading levels
Best Practices Adherence: 8/10 ✅
✅ Validation checklists: Present and specific
✅ Examples throughout: 6 examples included
✅ Quick Reference: Present
⚠️ Error handling: Limited (only happy path in examples)
Anti-Pattern Detection: 1 detected ⚠️
✅ No keyword stuffing (15 natural keywords)
✅ No monolithic file (1,465 lines but has references/)
✅ Consistent structure
✅ Specific validation criteria
✅ Examples complete (no placeholders)
⚠️ Error cases: Only happy path documented
✅ Dependencies: Clearly documented
✅ Not over-engineered
Code Quality: N/A (no scripts)
Quality Score: 4/5 (Good)
Primary Issue: Limited error handling documentation
Recommendation: Add error case examples and recovery guidance
Operation 4: Usability Review
Purpose: Evaluate ease of use, learnability, real-world effectiveness, and user satisfaction through scenario testing
When to Use This Operation:
- Test real-world usage
- Assess user experience
- Evaluate learnability
- Measure effectiveness
- Validate skill achieves stated purpose
Automation Level: 10% automated (basic checks), 90% manual testing
Process:
- Test in Real-World Scenario
  - Select appropriate use case from "When to Use" section
  - Actually use the skill to complete a task
  - Document experience: smooth or friction?
  - Note any confusion or difficulty
- Assess Navigation/Findability
  - Can you find needed information easily?
  - Is information architecture logical?
  - Are sections well-organized?
  - Is Quick Reference helpful?
- Evaluate Clarity
  - Are instructions clear and actionable?
  - Are steps easy to follow?
  - Do examples help understanding?
  - Is technical terminology explained?
- Measure Effectiveness
  - Does skill achieve stated purpose?
  - Does it deliver promised value?
  - Are outputs useful and complete?
  - Would you use it again?
- Assess Learning Curve
  - How long to understand skill?
  - How long to use effectively?
  - Is learning curve reasonable for complexity?
  - Are first-time users supported well?
Validation Checklist:
- Skill tested in real-world scenario (actual usage, not just reading)
- Users can find information easily (navigation clear, sections logical)
- Instructions are clear and actionable (can follow without confusion)
- Examples help understanding (concrete, demonstrate key concepts)
- Skill achieves stated purpose (delivers promised value)
- Learning curve reasonable (appropriate for skill complexity)
- Error messages helpful (if applicable: clear, actionable guidance)
- Overall user satisfaction high (would use again, recommend to others)
Scoring Criteria:
- 5 - Excellent: All 8 checks pass, excellent usability, easy to learn, highly effective, very satisfying
- 4 - Good: 6-7 checks pass, good usability, minor friction points, generally effective
- 3 - Acceptable: 4-5 checks pass, acceptable usability, some confusion/difficulty, moderately effective
- 2 - Needs Work: 2-3 checks pass, usability issues, frustrating or confusing, limited effectiveness
- 1 - Poor: ≤1 check passes, poor usability, hard to use, ineffective, unsatisfying
Outputs:
- Usability score (1-5)
- Scenario test results (success/partial/failure)
- User experience assessment (smooth/acceptable/frustrating)
- Specific usability improvements identified
- Learning curve assessment
- Effectiveness rating
Time Estimate: 30-60 minutes (requires actual testing)
Example:
Usability Review: skill-researcher
==================================
Real-World Scenario Test: ✅
- Scenario: Research GitHub API integration patterns
- Result: SUCCESS - Found 5 relevant sources, synthesized findings
- Experience: Smooth, operations clearly explained
- Time: 45 minutes (expected 60 min range)
Navigation/Findability: 9/10 ✅
- Information easy to find
- 5 operations clearly separated
- Quick Reference table very helpful
- Minor: Could use table of contents for long doc
Instruction Clarity: 9/10 ✅
- Steps clear and actionable
- Process well-explained
- Examples demonstrate concepts
- Minor: Web search query formulation could be clearer
Effectiveness: 10/10 ✅
- Achieved purpose: Found patterns and synthesized
- Delivered value: Comprehensive research in 45 min
- Would use again: Yes, very helpful
Learning Curve: 8/10 ✅
- Time to understand: 10 minutes
- Time to use effectively: 15 minutes
- Reasonable for complexity
- First-time user: Some concepts need explanation (credibility scoring)
Error Handling: N/A (no errors encountered)
User Satisfaction: 9/10 ✅
- Would use again: Yes
- Would recommend: Yes
- Overall experience: Very positive
Usability Score: 5/5 (Excellent)
Minor Improvement: Add brief explanation of credibility scoring concept
Operation 5: Integration Review
Purpose: Assess dependency documentation, data flow clarity, component integration, and composition patterns
When to Use This Operation:
- Review workflow skills (that compose other skills)
- Validate dependency documentation
- Check integration clarity
- Assess composition patterns
- Verify that cross-references are valid
Automation Level: 30% automated (dependency checking, cross-reference validation), 70% manual assessment
Process:
1. Review Dependency Documentation (manual)
   - Are required skills documented?
   - Are optional/complementary skills mentioned?
   - Is the YAML `dependencies` field used (if applicable)?
   - Are dependency versions noted (if relevant)?
2. Assess Data Flow Clarity (manual, for workflow skills)
   - Is data flow between skills explained?
   - Are inputs/outputs documented for each step?
   - Do users understand how data moves?
   - Are there diagrams or flowcharts (if helpful)?
3. Evaluate Component Integration (manual)
   - How do component skills work together?
   - Are integration points clear?
   - Are there integration examples?
   - Is the composition pattern documented?
4. Verify Cross-References (automated + manual)
   - Do internal links work (references to references/, scripts/)?
   - Are external skill references correct?
   - Are complementary skills mentioned?
5. Check Composition Patterns (manual, for workflow skills)
   - Is the composition pattern identified (sequential/parallel/conditional/etc.)?
   - Is the pattern correctly implemented?
   - Are orchestration details provided?
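The automated slice of cross-reference verification (internal link checking) can be sketched as a short script. This is a hypothetical helper, not part of the skill's shipped tooling; it assumes internal references appear as plain `references/...` or `scripts/...` paths inside SKILL.md.

```python
import re
from pathlib import Path

def broken_internal_refs(skill_dir: str) -> list:
    """Return internal references in SKILL.md whose targets do not exist.

    Looks for plain paths into references/ or scripts/ and checks
    each one against the filesystem.
    """
    root = Path(skill_dir)
    text = (root / "SKILL.md").read_text(encoding="utf-8")
    # Paths like references/guide.md or scripts/validate-structure.py;
    # the final \w keeps sentence-ending punctuation out of the match.
    pattern = r"\b(?:references|scripts)/[\w./-]*\w"
    return sorted(p for p in set(re.findall(pattern, text))
                  if not (root / p).exists())
```

An empty result means the automated portion passes; the remaining 70% (whether the *right* skills are referenced) stays manual.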
Validation Checklist:
- Dependencies documented (if skill requires other skills)
- YAML `dependencies` field correct (if used)
- Data flow explained (for workflow skills: inputs/outputs clear)
- Integration points clear (how component skills connect)
- Component skills referenced correctly (names accurate, paths valid)
- Cross-references valid (internal links work, external references correct)
- Integration examples provided (if applicable: how to use together)
- Composition pattern documented (if workflow: sequential/parallel/etc.)
- Complementary skills mentioned (optional but valuable related skills)
Scoring Criteria:
- 5 - Excellent: All 9 checks pass (applicable ones), perfect integration documentation
- 4 - Good: 7-8 checks pass, good integration, minor gaps in documentation
- 3 - Acceptable: 5-6 checks pass, some integration unclear, missing details
- 2 - Needs Work: 3-4 checks pass, integration issues, poorly documented dependencies/flow
- 1 - Poor: ≤2 checks pass, poor integration, confusing or missing dependency documentation
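The band boundaries above reduce to a simple mapping. A minimal sketch of checks-passed to score, assuming all nine checks apply:

```python
def integration_score(checks_passed: int) -> int:
    """Map the number of passing integration checks (0-9) to a 1-5 score.

    Bands follow the rubric: 9 -> 5, 7-8 -> 4, 5-6 -> 3, 3-4 -> 2, <=2 -> 1.
    """
    if checks_passed >= 9:
        return 5
    if checks_passed >= 7:
        return 4
    if checks_passed >= 5:
        return 3
    if checks_passed >= 3:
        return 2
    return 1
```

When some checks are not applicable (e.g., data flow for a non-workflow skill), score against the applicable subset rather than the full nine.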
Outputs:
- Integration score (1-5)
- Dependency validation results (required/optional/complementary documented)
- Data flow clarity assessment (for workflow skills)
- Integration clarity rating
- Cross-reference validation results
- Improvement recommendations
Time Estimate: 15-25 minutes (mostly manual)
Example:
Integration Review: development-workflow
========================================
Dependency Documentation: 10/10 ✅
- Required Skills: None (workflow is standalone)
- Component Skills: 5 clearly documented (skill-researcher, planning-architect, task-development, prompt-builder, todo-management)
- Optional Skills: 3 complementary skills mentioned (review-multi, skill-updater, testing-validator)
- YAML Field: Not used (not required, skills referenced in content)
Data Flow Clarity: 10/10 ✅ (Workflow Skill)
- Data flow diagram present (skill → output → next skill)
- Inputs/outputs for each step documented
- Users understand how artifacts flow
- Example: skill-researcher → research-synthesis.md → planning-architect
↓
skill-architecture-plan.md → task-development
Component Integration: 10/10 ✅
- Integration method documented for each step (Guided Execution)
- Integration examples provided
- Clear explanation of how skills work together
- Process for using each component skill detailed
Cross-Reference Validation: ✅
- Internal links valid (references/ files exist and reachable)
- External skill references correct (all 5 component skills exist)
- Complementary skills mentioned appropriately
Composition Pattern: 10/10 ✅ (Workflow Skill)
- Pattern: Sequential Pipeline (with one optional step)
- Correctly implemented (Step 1 → 2 → [3 optional] → 4 → 5)
- Orchestration details provided
- Clear flow diagram
Integration Score: 5/5 (Excellent)
Notes: Exemplary integration documentation for workflow skill
Review Modes
Comprehensive Review Mode
全面评审模式
Purpose: Complete multi-dimensional assessment across all 5 dimensions with aggregate scoring
When to Use:
- Pre-production validation (ensure skill ready for deployment)
- Major skill updates (validate changes don't degrade quality)
- Quality certification (establish baseline quality score)
- Periodic quality audits (track quality over time)
Process:
1. Run All 5 Operations Sequentially
   - Operation 1: Structure Review (5-10 min, automated)
   - Operation 2: Content Review (15-30 min, manual)
   - Operation 3: Quality Review (20-40 min, mixed)
   - Operation 4: Usability Review (30-60 min, manual)
   - Operation 5: Integration Review (15-25 min, manual)
2. Aggregate Scores
   - Record score (1-5) for each dimension
   - Calculate the weighted overall score using the formula
   - Map the overall score to a grade (A/B/C/D/F)
3. Assess Production Readiness
   - ≥4.5: Production Ready
   - 4.0-4.4: Ready with minor improvements
   - 3.5-3.9: Needs improvements before production
   - <3.5: Not ready, significant rework required
4. Compile Improvement Recommendations
   - Aggregate issues from all dimensions
   - Prioritize: Critical → High → Medium → Low
   - Provide specific, actionable fixes
5. Generate Comprehensive Report
   - Executive summary (overall score, grade, readiness)
   - Per-dimension scores and findings
   - Prioritized improvement list
   - Detailed rationale for scores
Output:
- Overall score (1.0-5.0 with one decimal)
- Grade (A/B/C/D/F)
- Production readiness assessment
- Per-dimension scores (Structure, Content, Quality, Usability, Integration)
- Comprehensive improvement recommendations (prioritized)
- Detailed review report
Time Estimate: 1.5-2.5 hours total
Example Output:
Comprehensive Review Report: skill-researcher
=============================================
OVERALL SCORE: 4.6/5.0 - GRADE A
STATUS: ✅ PRODUCTION READY
Dimension Scores:
- Structure: 5/5 (Excellent) - Perfect file organization
- Content: 5/5 (Excellent) - Comprehensive, clear documentation
- Quality: 4/5 (Good) - High quality, minor error handling gaps
- Usability: 5/5 (Excellent) - Easy to use, highly effective
- Integration: 4/5 (Good) - Well-documented dependencies
Production Readiness: READY - High quality, deploy with confidence
Recommendations (Priority Order):
1. [Medium] Add error handling examples for web search failures
2. [Low] Consider adding table of contents for long SKILL.md
Strengths:
- Excellent structure and organization
- Comprehensive coverage of 5 research operations
- Strong usability with clear instructions
- Good examples throughout
Overall: Exemplary skill, production-ready quality
Fast Check Mode
Purpose: Quick automated validation for rapid quality feedback during development
When to Use:
- During development (continuous validation)
- Quick quality checks (before detailed review)
- Pre-commit validation (catch issues early)
- Rapid iteration (fast feedback loop)
Process:
1. Run Automated Structure Validation
   ```bash
   python3 scripts/validate-structure.py /path/to/skill
   ```
2. Check Critical Issues
   - YAML frontmatter valid?
   - Required files present?
   - Naming conventions followed?
   - File sizes appropriate?
3. Generate Pass/Fail Report
   - PASS: Critical checks passed, proceed to development
   - FAIL: Critical issues found, fix before continuing
4. Provide Quick Fixes (if available)
   - Specific commands to fix issues
   - Examples of correct format
   - References to documentation
Output:
- Pass/Fail status
- Critical issues list (if failed)
- Quick fixes or guidance
- Score estimate (if passed)
Time Estimate: 5-10 minutes
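The fast-check gate can be wrapped for use in scripts or pre-commit hooks. A minimal sketch, assuming validate-structure.py follows the usual CLI convention of exiting 0 on pass and non-zero when critical issues are found:

```python
import subprocess
import sys

def fast_check(skill_path: str,
               validator: str = "scripts/validate-structure.py") -> bool:
    """Run the structure validator and return True on PASS.

    On failure, the validator's report is printed so the critical
    issues and quick fixes remain visible to the caller.
    """
    result = subprocess.run([sys.executable, validator, skill_path],
                            capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stdout or result.stderr)
        return False
    return True
```

Calling this from a pre-commit hook gives the "catch issues early" feedback loop described above without any manual steps.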
Example Output:
```bash
$ python3 scripts/validate-structure.py .claude/skills/my-skill
Fast Check Report
=================
Skill: my-skill
❌ FAIL - Critical Issues Found
Critical Issues:
1. YAML frontmatter: Invalid syntax (line 3: unexpected character)
2. Naming convention: File "MyGuide.md" should be "my-guide.md"
Quick Fixes:
1. Fix YAML: Remove trailing comma on line 3
2. Rename file: mv references/MyGuide.md references/my-guide.md
```
Run full validation after fixes: python3 scripts/validate-structure.py .claude/skills/my-skill
Custom Review
Purpose: Flexible review focusing on specific dimensions or concerns
When to Use:
- Targeted improvements (focus on specific dimension)
- Time constraints (can't do comprehensive review)
- Specific concerns (e.g., only check usability)
- Iterative improvements (focus on one dimension at a time)
Options:
- Select Dimensions: Choose 1-5 operations to run
- Adjust Thoroughness: Quick/Standard/Thorough per dimension
- Focus Areas: Specify particular concerns (e.g., "check examples quality")
Process:
1. Define Custom Review Scope
   - Which dimensions to review?
   - How thorough for each?
   - Any specific focus areas?
2. Run Selected Operations
   - Execute chosen operations
   - Apply thoroughness level
3. Generate Targeted Report
   - Scores for selected dimensions only
   - Focused findings
   - Specific recommendations
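The scope definition above can be captured in a small data structure. A hypothetical sketch (the names are illustrative, not part of the skill's tooling):

```python
from dataclasses import dataclass

DIMENSIONS = ("structure", "content", "quality", "usability", "integration")
LEVELS = ("quick", "standard", "thorough")

@dataclass
class ReviewScope:
    """Scope of a custom review: which operations, how thorough, what focus."""
    dimensions: tuple = DIMENSIONS
    thoroughness: str = "standard"
    focus: str = ""  # e.g. "example quality and completeness"

    def validate(self) -> "ReviewScope":
        """Reject unknown dimensions or thoroughness levels up front."""
        unknown = [d for d in self.dimensions if d not in DIMENSIONS]
        if unknown or self.thoroughness not in LEVELS:
            raise ValueError(f"bad scope: {unknown or self.thoroughness}")
        return self
```

Each of the example scenarios below is then just a different `ReviewScope` instance.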
Example Scenarios:
Scenario 1: Content-Focused Review
Custom Review: Content + Examples
- Operations: Content Review only
- Thoroughness: Thorough
- Focus: Example quality and completeness
- Time: 30 minutes
Scenario 2: Quick Quality Check
Custom Review: Structure + Quality (Fast)
- Operations: Structure + Quality
- Thoroughness: Quick
- Focus: Pattern compliance, anti-patterns
- Time: 15-20 minutes
Scenario 3: Workflow Integration Review
Custom Review: Integration Deep Dive
- Operations: Integration Review only
- Thoroughness: Thorough
- Focus: Data flow, composition patterns
- Time: 30 minutes
Best Practices
1. Self-Review First
Practice: Run Fast Check mode before requesting comprehensive review
Rationale: Automated checks catch 70% of structural issues in 5-10 minutes, allowing manual review to focus on higher-value assessment
Application: Always run validate-structure.py before detailed review
2. Use Checklists Systematically
Practice: Follow validation checklists item-by-item for each operation
Rationale: Research shows teams using checklists reduce common issues by 30% and ensure consistent results
Application: Print or display checklist, mark each item explicitly
3. Test in Real Scenarios
Practice: Conduct usability review with actual usage, not just documentation reading
Rationale: Real-world testing reveals hidden usability issues that documentation review misses
Application: For Usability Review, actually use the skill to complete a realistic task
4. Focus on Automation
Practice: Let scripts handle routine checks, focus manual effort on judgment-requiring assessment
Rationale: Automation provides 70% reduction in manual review time for routine checks
Application: Use scripts for Structure and partial Quality checks, manual for Content/Usability
5. Provide Actionable Feedback
Practice: Make improvement recommendations specific, prioritized, and actionable
Rationale: Vague feedback ("improve quality") is less valuable than specific guidance ("add error handling examples to Step 3")
Application: For each issue, specify: What, Why, How (to fix), Priority
6. Review Regularly
Practice: Conduct reviews throughout development lifecycle, not just at end
Rationale: Early reviews catch issues before they compound; rapid feedback maintains momentum (37% productivity increase)
Application: Fast Check during development, Comprehensive Review before production
7. Track Improvements
Practice: Document before/after scores to measure improvement over time
Rationale: Tracking demonstrates progress, identifies patterns, validates improvements
Application: Save review reports, compare scores across iterations
8. Iterate Based on Findings
Practice: Use review findings to improve future skills, not just current skill
Rationale: Learnings compound; patterns identified in reviews improve entire skill ecosystem
Application: Document common issues, create guidelines, update templates
Common Mistakes
Mistake 1: Skipping Structure Review
Symptom: Spending time on detailed review only to discover fundamental structural issues
Cause: Assumption that structure is correct, eagerness to assess content
Fix: Always run Structure Review (Fast Check) first - takes 5-10 minutes, catches 70% of issues
Prevention: Make Fast Check mandatory first step in any review process
Mistake 2: Subjective Scoring
Symptom: Inconsistent scores, debate over ratings, difficulty justifying scores
Cause: Using personal opinion instead of rubric criteria
Fix: Use references/scoring-rubric.md - score based on specific criteria, not feeling
Prevention: Print rubric, refer to criteria for each score, document evidence
Mistake 3: Ignoring Usability
Symptom: Skill looks good on paper but difficult to use in practice
Cause: Skipping Usability Review (90% manual, time-consuming)
Fix: Actually test skill in real scenario - reveals hidden issues
Prevention: Allocate 30-60 minutes for usability testing, cannot skip for production
Mistake 4: No Prioritization
Symptom: Long list of improvements, unclear what to fix first, overwhelmed
Cause: Treating all issues equally without assessing impact
Fix: Prioritize issues: Critical (must fix) → High → Medium → Low (nice to have)
Prevention: Tag each issue with priority level during review
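Once each issue carries a priority tag, ordering the fix list is mechanical. A minimal sketch:

```python
PRIORITY = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

def prioritize(issues):
    """Sort (priority, description) pairs Critical -> High -> Medium -> Low.

    Unknown or missing priority labels sort last rather than being dropped.
    """
    return sorted(issues, key=lambda issue: PRIORITY.get(issue[0], len(PRIORITY)))
```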
Mistake 5: Batch Reviews
Symptom: Discovering major issues late in development, costly rework
Cause: Waiting until end to review, accumulating issues
Fix: Review early and often - Fast Check during development, iterations
Prevention: Continuous validation, rapid feedback, catch issues when small
Mistake 6: Ignoring Patterns
Symptom: Repeating same issues across multiple skills
Cause: Treating each review in isolation, not learning from patterns
Fix: Track common issues, create guidelines, update development process
Prevention: Document patterns, share learnings, improve templates
Quick Reference
The 5 Operations
| Operation | Focus | Automation | Time | Key Output |
|---|---|---|---|---|
| Structure | YAML, files, naming, organization | 95% | 5-10m | Structure score, compliance report |
| Content | Completeness, clarity, examples | 40% | 15-30m | Content score, section assessment |
| Quality | Patterns, best practices, anti-patterns | 50% | 20-40m | Quality score, pattern compliance |
| Usability | Ease of use, effectiveness | 10% | 30-60m | Usability score, scenario test results |
| Integration | Dependencies, data flow, composition | 30% | 15-25m | Integration score, dependency validation |
Scoring Scale
| Score | Level | Meaning | Action |
|---|---|---|---|
| 5 | Excellent | Exceeds standards | Exemplary - use as example |
| 4 | Good | Meets standards | Production ready - standard quality |
| 3 | Acceptable | Minor improvements | Usable - note improvements |
| 2 | Needs Work | Notable issues | Not ready - significant improvements |
| 1 | Poor | Significant problems | Not viable - extensive rework |
Production Readiness
| Overall Score | Grade | Status | Decision |
|---|---|---|---|
| 4.5-5.0 | A | ✅ Production Ready | Ship it - high quality |
| 4.0-4.4 | B+ | ✅ Ready (minor improvements) | Ship - note improvements for next iteration |
| 3.5-3.9 | B- | ⚠️ Needs Improvements | Hold - fix issues first |
| 2.5-3.4 | C | ❌ Not Ready | Don't ship - substantial work needed |
| 1.5-2.4 | D | ❌ Not Ready | Don't ship - significant rework |
| 1.0-1.4 | F | ❌ Not Ready | Don't ship - major issues |
Review Modes
| Mode | Time | Use Case | Coverage |
|---|---|---|---|
| Fast Check | 5-10m | During development, quick validation | Structure only (automated) |
| Custom | Variable | Targeted review, specific concerns | Selected dimensions |
| Comprehensive | 1.5-2.5h | Pre-production, full assessment | All 5 dimensions + report |
Common Commands
```bash
# Fast structure validation
python3 scripts/validate-structure.py /path/to/skill

# Verbose output
python3 scripts/validate-structure.py /path/to/skill --verbose

# JSON output
python3 scripts/validate-structure.py /path/to/skill --json

# Pattern compliance check
python3 scripts/check-patterns.py /path/to/skill

# Generate review report
python3 scripts/generate-review-report.py review_data.json --output report.md

# Run comprehensive review
python3 scripts/review-runner.py /path/to/skill --mode comprehensive
```
Weighted Average Formula
Overall = (Structure × 0.20) + (Content × 0.25) + (Quality × 0.25) + (Usability × 0.15) + (Integration × 0.15)
Weight Rationale:
- Content & Quality (25% each): Core value
- Structure (20%): Foundation
- Usability & Integration (15% each): Supporting
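A minimal sketch of the formula and the readiness bands it feeds (weights as stated; rounding to one decimal matches the reported overall score):

```python
WEIGHTS = {"structure": 0.20, "content": 0.25, "quality": 0.25,
           "usability": 0.15, "integration": 0.15}

def overall_score(scores: dict) -> float:
    """Weighted average of the five per-dimension scores (each 1-5)."""
    return round(sum(scores[dim] * w for dim, w in WEIGHTS.items()), 1)

def readiness(overall: float) -> str:
    """Map an overall score to the production-readiness bands."""
    if overall >= 4.5:
        return "Production Ready"
    if overall >= 4.0:
        return "Ready with minor improvements"
    if overall >= 3.5:
        return "Needs improvements before production"
    return "Not ready"
```

For the skill-researcher example scores (5, 5, 4, 5, 4) this yields 4.6, consistent with its Grade A report.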
For More Information
- Structure details: references/structure-review-guide.md
- Content details: references/content-review-guide.md
- Quality details: references/quality-review-guide.md
- Usability details: references/usability-review-guide.md
- Integration details: references/integration-review-guide.md
- Complete scoring rubrics: references/scoring-rubric.md
- Report templates: references/review-report-template.md
For detailed guidance on each dimension, see reference files. For automation tools, see scripts/.