review-multi


Overview


review-multi provides a systematic framework for conducting comprehensive, multi-dimensional reviews of Claude Code skills. It evaluates skills across 5 independent dimensions, combining automated validation with manual assessment to deliver objective quality scores and actionable improvement recommendations.
Purpose: Systematic skill quality assurance through multi-dimensional assessment
The 5 Review Dimensions:
  1. Structure Review - YAML frontmatter, file organization, naming conventions, progressive disclosure
  2. Content Review - Section completeness, clarity, examples, documentation quality
  3. Quality Review - Pattern compliance, best practices, anti-pattern detection, code quality
  4. Usability Review - Ease of use, learnability, real-world effectiveness, user satisfaction
  5. Integration Review - Dependency documentation, data flow, component integration, composition
Automation Levels:
  • Structure: 95% automated (validate-structure.py)
  • Content: 40% automated, 60% manual assessment
  • Quality: 50% automated, 50% manual assessment
  • Usability: 10% automated, 90% manual testing
  • Integration: 30% automated, 70% manual review
Scoring System:
  • Scale: 1-5 per dimension (Excellent/Good/Acceptable/Needs Work/Poor)
  • Overall Score: Weighted average across dimensions
  • Grade: A/B/C/D/F mapping
  • Production Readiness: ≥4.5 ready, 4.0-4.4 ready with improvements, 3.5-3.9 needs work, <3.5 not ready
Value Proposition:
  • Objective: Evidence-based scoring using detailed rubrics (not subjective opinion)
  • Comprehensive: 5 dimensions cover all quality aspects
  • Efficient: Automation handles 10-95% of checks depending on dimension
  • Actionable: Specific, prioritized improvement recommendations
  • Consistent: Standardized checklists ensure repeatable results
  • Flexible: 3 review modes (Comprehensive, Fast Check, Custom)
Key Benefits:
  • Catch 70% of issues with fast automated checks
  • Reduce common quality issues by 30% using checklists
  • Ensure production readiness before deployment
  • Identify improvement opportunities systematically
  • Track quality improvements over time
  • Establish quality standards across skill ecosystem

When to Use


Use review-multi when:
  1. Pre-Production Validation - Review new skills before deploying to production to catch issues early and ensure quality standards
  2. Quality Assurance - Conduct systematic QA on skills to validate they meet ecosystem standards and user needs
  3. Identifying Improvements - Discover specific, actionable improvements for existing skills through multi-dimensional assessment
  4. Continuous Improvement - Regular reviews throughout development lifecycle, not just at end, to maintain quality
  5. Production Readiness Assessment - Determine if skill is ready for production use with objective scoring and grade mapping
  6. Skill Ecosystem Standards - Ensure consistency and quality across multiple skills using standardized review framework
  7. Post-Update Validation - Review skills after major updates to ensure changes don't introduce issues or degrade quality
  8. Learning and Improvement - Use review findings to learn patterns, improve future skills, and refine development practices
  9. Team Calibration - Standardize quality assessment across multiple reviewers with objective rubrics
Don't Use When:
  • Quick syntax checks (use validate-structure.py directly)
  • In-progress drafts (wait until reasonably complete)
  • Experimental prototypes (not production-bound)

Prerequisites


Required:
  • Skill to review (in .claude/skills/[skill-name]/ format)
  • Time allocation based on review mode:
    • Fast Check: 5-10 minutes
    • Single Operation: 15-60 minutes (varies by dimension)
    • Comprehensive Review: 1.5-2.5 hours
Optional:
  • Python 3.7+ (for automation scripts in Structure and Quality reviews)
  • PyYAML library (for YAML frontmatter validation)
  • Access to skill-under-review documentation
  • Familiarity with Claude Code skill patterns (see development-workflow/references/common-patterns.md)
Skills (no required dependencies, complementary):
  • development-workflow: Use review-multi after skill development
  • skill-updater: Apply review-multi recommendations
  • testing-validator: Combine with review-multi for full QA
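The PyYAML frontmatter validation mentioned above can be approximated without third-party dependencies; a minimal stdlib-only sketch (the sample document and the flat key parsing are illustrative assumptions — real validation should parse the block with PyYAML's yaml.safe_load):

```python
import re

def parse_frontmatter(text):
    """Extract the YAML frontmatter block delimited by '---' lines.

    Stdlib-only sketch: handles flat `key: value` pairs only.
    Production validation should use PyYAML's yaml.safe_load.
    """
    match = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not match:
        return None  # no frontmatter block found
    fields = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

# Illustrative sample document (not a real skill file)
doc = "---\nname: todo-management\ndescription: Manage todo lists\n---\n# Skill\n"
frontmatter = parse_frontmatter(doc)
```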

Scoring System


The review-multi scoring system provides objective, consistent quality assessment across all skill dimensions.

Per-Dimension Scoring (1-5 Scale)


Each dimension is scored independently using a 1-5 integer scale:
5 - Excellent (Exceeds Standards)
  • All criteria met perfectly
  • Goes beyond minimum requirements
  • Exemplary quality that sets the bar
  • No issues or concerns identified
  • Can serve as example for others
4 - Good (Meets Standards)
  • Meets all critical criteria
  • 1-2 minor, non-critical issues
  • Production-ready quality
  • Standard expected level
  • Small improvements possible
3 - Acceptable (Minor Improvements Needed)
  • Meets most criteria
  • 3-4 issues, some may be critical
  • Usable but not optimal
  • Several improvements recommended
  • Can proceed with noted concerns
2 - Needs Work (Notable Issues)
  • Missing several criteria
  • 5-6 issues, multiple critical
  • Not production-ready
  • Significant improvements required
  • Rework needed before deployment
1 - Poor (Significant Problems)
  • Fails most criteria
  • 7+ issues, fundamentally flawed
  • Major quality concerns
  • Extensive rework required
  • Not viable in current state
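The issue-count thresholds running through the rubric above lend themselves to a simple mapping; a minimal sketch (the function name is illustrative, and it deliberately ignores issue severity, which the rubric also weighs):

```python
def score_from_issue_count(issues_found):
    """Map an issue count to the 1-5 rubric thresholds above."""
    if issues_found == 0:
        return 5  # Excellent: no issues or concerns
    if issues_found <= 2:
        return 4  # Good: 1-2 minor issues
    if issues_found <= 4:
        return 3  # Acceptable: 3-4 issues
    if issues_found <= 6:
        return 2  # Needs Work: 5-6 issues
    return 1      # Poor: 7+ issues
```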

Overall Score Calculation


The overall score is a weighted average of the 5 dimension scores:
Overall = (Structure × 0.20) + (Content × 0.25) + (Quality × 0.25) +
          (Usability × 0.15) + (Integration × 0.15)
Weight Rationale:
  • Content & Quality (25% each): Core skill value - what it does and how well
  • Structure (20%): Important foundation - organization and compliance
  • Usability & Integration (15% each): Supporting factors - user experience and composition
Example Calculations:
  • Scores (5, 4, 4, 3, 4) → Overall = (5×0.20 + 4×0.25 + 4×0.25 + 3×0.15 + 4×0.15) = 4.05 → Grade B
  • Scores (4, 5, 5, 4, 4) → Overall = (4×0.20 + 5×0.25 + 5×0.25 + 4×0.15 + 4×0.15) = 4.50 → Grade A
  • Scores (3, 3, 2, 3, 3) → Overall = (3×0.20 + 3×0.25 + 2×0.25 + 3×0.15 + 3×0.15) = 2.75 → Grade C
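The weighted average and the letter-grade mapping can be sketched directly; a minimal sketch (names are illustrative):

```python
WEIGHTS = {"structure": 0.20, "content": 0.25, "quality": 0.25,
           "usability": 0.15, "integration": 0.15}

def overall_score(dimension_scores):
    """Weighted average of the five 1-5 dimension scores."""
    return round(sum(dimension_scores[d] * w for d, w in WEIGHTS.items()), 2)

def grade(score):
    """Map an overall score to the A-F letter grades."""
    if score >= 4.5:
        return "A"
    if score >= 3.5:
        return "B"
    if score >= 2.5:
        return "C"
    if score >= 1.5:
        return "D"
    return "F"

scores = {"structure": 5, "content": 4, "quality": 4,
          "usability": 3, "integration": 4}
```

With these scores the sketch yields an overall of 4.05, grade B.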

Grade Mapping


Overall scores map to letter grades:
  • A (4.5-5.0): Excellent - Production ready, high quality
  • B (3.5-4.4): Good - Ready with minor improvements
  • C (2.5-3.4): Acceptable - Needs improvements before production
  • D (1.5-2.4): Poor - Requires significant rework
  • F (1.0-1.4): Failing - Major issues, not viable

Production Readiness Assessment


Based on overall score:
  • ≥4.5 (Grade A): ✅ Production Ready - High quality, deploy with confidence
  • 4.0-4.4 (Grade B+): ✅ Ready with Minor Improvements - Can deploy, address improvements in next iteration
  • 3.5-3.9 (Grade B-): ⚠️ Needs Improvements - Address issues before production deployment
  • <3.5 (Grade C-F): ❌ Not Ready - Significant rework required before deployment
Decision Framework:
  • A Grade: Ship it - exemplary quality
  • B Grade (4.0+): Ship it - standard quality, note improvements for future
  • B- Grade (3.5-3.9): Hold - fix identified issues first
  • C-F Grade: Don't ship - substantial work needed
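The readiness bands above can be expressed the same way; a minimal sketch (the function name and status strings are illustrative):

```python
def production_readiness(overall):
    """Map an overall score to the production-readiness bands above."""
    if overall >= 4.5:
        return "Production Ready"            # Grade A: ship it
    if overall >= 4.0:
        return "Ready with Minor Improvements"  # Grade B+: ship, improve later
    if overall >= 3.5:
        return "Needs Improvements"          # Grade B-: hold, fix issues first
    return "Not Ready"                       # Grade C-F: don't ship
```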

Operations


Operation 1: Structure Review


Purpose: Validate file organization, naming conventions, YAML frontmatter compliance, and progressive disclosure
When to Use This Operation:
  • Always run first (fast automated check catches 70% of issues)
  • Before comprehensive review (quick validation of basics)
  • During development (continuous structure validation)
  • Quick quality checks (5-10 minute validation)
Automation Level: 95% automated via scripts/validate-structure.py
Process:
  1. Run Structure Validation Script
    python3 scripts/validate-structure.py /path/to/skill [--json] [--verbose]
    Script checks YAML, file structure, naming, progressive disclosure
  2. Review YAML Frontmatter
    • Verify name field in kebab-case format
    • Check description has 5+ trigger keywords naturally embedded
    • Validate YAML syntax is correct
  3. Verify File Structure
    • Confirm SKILL.md exists
    • Check references/ and scripts/ organization (if present)
    • Verify README.md exists
  4. Check Naming Conventions
    • SKILL.md and README.md uppercase
    • references/ files: lowercase-hyphen-case
    • scripts/ files: lowercase-hyphen-case with extension
  5. Validate Progressive Disclosure
    • SKILL.md <1,500 lines (warn if >1,200)
    • references/ files 300-800 lines each
    • No monolithic files
Validation Checklist:
  • YAML frontmatter present and valid syntax
  • name field in kebab-case format (e.g., skill-name)
  • description includes 5+ trigger keywords (naturally embedded)
  • SKILL.md file exists
  • File naming follows conventions (SKILL.md uppercase, references lowercase-hyphen)
  • Directory structure correct (references/, scripts/ if present)
  • SKILL.md size appropriate (<1,500 lines, ideally <1,200)
  • References organized by topic (if present)
  • No monolithic files (progressive disclosure maintained)
  • README.md present
Scoring Criteria:
  • 5 - Excellent: All 10 checks pass, perfect compliance, exemplary structure
  • 4 - Good: 8-9 checks pass, 1-2 minor non-critical issues (e.g., README missing but optional)
  • 3 - Acceptable: 6-7 checks pass, 3-4 issues including some critical (e.g., YAML invalid but fixable)
  • 2 - Needs Work: 4-5 checks pass, 5-6 issues with multiple critical (e.g., no SKILL.md, bad naming)
  • 1 - Poor: ≤3 checks pass, 7+ issues, fundamentally flawed structure
Outputs:
  • Structure score (1-5)
  • Pass/fail status for each checklist item
  • List of issues found with severity (critical/warning/info)
  • Specific improvement recommendations with fix guidance
  • JSON report (if using script with --json flag)
Time Estimate: 5-10 minutes (mostly automated)
Example:
$ python3 scripts/validate-structure.py .claude/skills/todo-management

Structure Validation Report
===========================
Skill: todo-management
Date: 2025-11-06

✅ YAML Frontmatter: PASS
   - Name format: valid (kebab-case)
   - Trigger keywords: 8 found (target: 5+)

✅ File Structure: PASS
   - SKILL.md: exists
   - README.md: exists
   - references/: 3 files found
   - scripts/: 1 file found

✅ Naming Conventions: PASS
   - All files follow conventions

⚠️  Progressive Disclosure: WARNING
   - SKILL.md: 569 lines (good)
   - state-management-guide.md: 501 lines (good)
   - BUT: No Quick Reference section detected

Overall Structure Score: 4/5 (Good)
Issues: 1 warning (missing Quick Reference)
Recommendation: Add Quick Reference section to SKILL.md


Operation 2: Content Review


Purpose: Assess section completeness, content clarity, example quality, and documentation comprehensiveness
When to Use This Operation:
  • Evaluate documentation quality
  • Assess completeness of skill content
  • Review example quality and quantity
  • Validate information architecture
  • Check clarity and organization
Automation Level: 40% automated (section detection, example counting), 60% manual assessment
Process:
  1. Check Section Completeness (automated + manual)
    • Verify 5 core sections present: Overview, When to Use, Main Content (workflow/operations), Best Practices, Quick Reference
    • Check optional sections: Prerequisites, Common Mistakes, Troubleshooting
    • Assess if all necessary sections included
  2. Assess Content Clarity (manual)
    • Is content understandable?
    • Is organization logical?
    • Are explanations clear without being verbose?
    • Is technical level appropriate for audience?
  3. Evaluate Example Quality (automated count + manual quality)
    • Count code/command examples (target: 5+)
    • Check if examples are concrete (not abstract placeholders)
    • Verify examples are executable/copy-pasteable
    • Assess if examples help understanding
  4. Review Documentation Completeness (manual)
    • Is all necessary information present?
    • Are there unexplained gaps?
    • Is sufficient detail provided?
    • Are edge cases covered?
  5. Check Explanation Depth (manual)
    • Not too brief (insufficient detail)?
    • Not too verbose (unnecessary length)?
    • Balanced depth for complexity?
Validation Checklist:
  • Overview/Introduction section present
  • When to Use section present with 5+ scenarios
  • Main content (workflow steps OR operations OR reference material) complete
  • Best Practices section present
  • Quick Reference section present
  • 5+ code/command examples included
  • Examples are concrete (not abstract placeholders like "YOUR_VALUE_HERE")
  • Content clarity: readable and well-structured
  • Sufficient detail: not too brief
  • Not too verbose: concise without unnecessary length
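The automated portion of this operation (section detection, example counting) can be sketched as follows; the heading and fence conventions are assumptions about the skill's markdown layout:

```python
CORE_SECTIONS = ["Overview", "When to Use", "Best Practices", "Quick Reference"]
FENCE = "`" * 3  # markdown code-fence marker, built to avoid nesting issues

def content_metrics(markdown):
    """Count fenced code examples and detect missing core sections."""
    lines = markdown.splitlines()
    # Each example opens and closes with a fence line
    examples = sum(1 for line in lines if line.startswith(FENCE)) // 2
    headings = [line.lstrip("# ").strip() for line in lines
                if line.startswith("#")]
    missing = [s for s in CORE_SECTIONS
               if not any(s.lower() in h.lower() for h in headings)]
    return {"examples": examples,
            "missing_sections": missing,
            "examples_ok": examples >= 5}  # target: 5+ examples
```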
Scoring Criteria:
  • 5 - Excellent: All 10 checks pass, exceptional clarity, great examples, comprehensive documentation
  • 4 - Good: 8-9 checks pass, good content with minor gaps or clarity issues
  • 3 - Acceptable: 6-7 checks pass, some sections weak or missing, acceptable clarity
  • 2 - Needs Work: 4-5 checks pass, multiple sections incomplete/unclear, poor examples
  • 1 - Poor: ≤3 checks pass, major gaps, confusing content, few/no examples
Outputs:
  • Content score (1-5)
  • Section-by-section assessment (present/missing/weak)
  • Example quality rating and count
  • Specific content improvement recommendations
  • Clarity issues identified with examples
Time Estimate: 15-30 minutes (requires manual review)
Example:
Content Review: prompt-builder
==============================

Section Completeness: 9/10 ✅
✅ Overview: Present, clear explanation of purpose
✅ When to Use: 7 scenarios listed
✅ Main Content: 5-step workflow, well-organized
✅ Best Practices: 6 practices documented
✅ Quick Reference: Present
⚠️  Common Mistakes: Not present (optional but valuable)

Example Quality: 8/10 ✅
- Count: 12 examples (exceeds target of 5+)
- Concrete: Yes, all examples executable
- Helpful: Yes, demonstrate key concepts
- Minor: Could use 1-2 edge case examples

Content Clarity: 9/10 ✅
- Well-organized logical flow
- Clear explanations without verbosity
- Technical level appropriate
- Minor: Step 3 could be clearer (add diagram)

Documentation Completeness: 8/10 ✅
- All workflow steps documented
- Validation criteria clear
- Minor gaps: Error handling not covered

Content Score: 4/5 (Good)
Primary Recommendation: Add Common Mistakes section
Secondary: Add error handling guidance to Step 3


Operation 3: Quality Review


Purpose: Evaluate pattern compliance, best practices adherence, anti-pattern detection, and code/script quality
When to Use This Operation:
  • Validate standards compliance
  • Check pattern implementation
  • Detect anti-patterns
  • Assess code quality (if scripts present)
  • Ensure best practices followed
Automation Level: 50% automated (pattern detection, anti-pattern checking), 50% manual assessment
Process:
  1. Detect Architecture Pattern (automated + manual)
    • Identify pattern type: workflow/task/reference/capabilities
    • Verify pattern correctly implemented
    • Check pattern consistency throughout skill
  2. Validate Documentation Patterns (automated + manual)
    • Verify 5 core sections present
    • Check consistent structure across steps/operations
    • Validate section formatting
  3. Check Best Practices (manual)
    • Validation checklists present and specific?
    • Examples throughout documentation?
    • Quick Reference available?
    • Error cases considered?
  4. Detect Anti-Patterns (automated + manual)
    • Keyword stuffing (trigger keywords unnatural)?
    • Monolithic SKILL.md (>1,500 lines, no progressive disclosure)?
    • Inconsistent structure (each section different format)?
    • Vague validation ("everything works")?
    • Missing examples (too abstract)?
    • Placeholders in production ("YOUR_VALUE_HERE")?
    • Ignoring error cases (only happy path)?
    • Over-engineering simple skills?
    • Unclear dependencies?
    • No Quick Reference?
  5. Assess Code Quality (manual, if scripts present)
    • Scripts well-documented (docstrings)?
    • Error handling present?
    • CLI interfaces clear?
    • Code style consistent?
Validation Checklist:
  • Architecture pattern correctly implemented (workflow/task/reference/capabilities)
  • Consistent structure across steps/operations (same format throughout)
  • Validation checklists present and specific (measurable, not vague)
  • Best practices section actionable (specific guidance)
  • No keyword stuffing (trigger keywords natural, contextual)
  • No monolithic SKILL.md (progressive disclosure used if >1,000 lines)
  • Examples are complete (no "YOUR_VALUE_HERE" placeholders in production)
  • Error cases considered (not just happy path documented)
  • Dependencies documented (if skill requires other skills)
  • Scripts well-documented (if present: docstrings, error handling, CLI help)
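The automatable subset of these anti-patterns reduces to simple text checks; a minimal sketch (the placeholder string and line threshold come from the checklist above; the finding messages are illustrative):

```python
def detect_anti_patterns(skill_md_text):
    """Flag the anti-patterns that simple text checks can catch."""
    findings = []
    if "YOUR_VALUE_HERE" in skill_md_text:
        findings.append("placeholders left in production examples")
    total_lines = skill_md_text.count("\n") + 1
    if total_lines > 1500:
        findings.append("monolithic SKILL.md without progressive disclosure")
    if "Quick Reference" not in skill_md_text:
        findings.append("no Quick Reference section")
    return findings
```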
Scoring Criteria:
  • 5 - Excellent: All 10 checks pass, exemplary quality, no anti-patterns, exceeds standards
  • 4 - Good: 8-9 checks pass, high quality, meets all standards, minor deviations
  • 3 - Acceptable: 6-7 checks pass, acceptable quality, some standard violations, 2-3 anti-patterns
  • 2 - Needs Work: 4-5 checks pass, quality issues, multiple standard violations, 4-5 anti-patterns
  • 1 - Poor: ≤3 checks pass, poor quality, significant problems, 6+ anti-patterns detected
Outputs:
  • Quality score (1-5)
  • Pattern compliance assessment (pattern detected, compliance level)
  • Anti-patterns detected (list with severity)
  • Best practices gaps identified
  • Code quality assessment (if scripts present)
  • Prioritized improvement recommendations
Time Estimate: 20-40 minutes (mixed automated + manual)
Example:
Quality Review: workflow-skill-creator
======================================

Pattern Compliance: ✅
- Pattern Detected: Workflow-based
- Implementation: Correct (5 sequential steps with dependencies)
- Consistency: High (all steps follow same structure)

Documentation Patterns: ✅
- 5 Core Sections: All present
- Structure: Consistent across all 5 steps
- Formatting: Proper heading levels

Best Practices Adherence: 8/10 ✅
✅ Validation checklists: Present and specific
✅ Examples throughout: 6 examples included
✅ Quick Reference: Present
⚠️ Error handling: Limited (only happy path in examples)

Anti-Pattern Detection: 1 detected ⚠️
✅ No keyword stuffing (15 natural keywords)
✅ No monolithic file (1,465 lines but has references/)
✅ Consistent structure
✅ Specific validation criteria
✅ Examples complete (no placeholders)
⚠️ Error cases: Only happy path documented
✅ Dependencies: Clearly documented
✅ Not over-engineered

Code Quality: N/A (no scripts)

Quality Score: 4/5 (Good)
Primary Issue: Limited error handling documentation
Recommendation: Add error case examples and recovery guidance


Operation 4: Usability Review


Purpose: Evaluate ease of use, learnability, real-world effectiveness, and user satisfaction through scenario testing
When to Use This Operation:
  • Test real-world usage
  • Assess user experience
  • Evaluate learnability
  • Measure effectiveness
  • Validate skill achieves stated purpose
Automation Level: 10% automated (basic checks), 90% manual testing
Process:
  1. Test in Real-World Scenario
    • Select appropriate use case from "When to Use" section
    • Actually use the skill to complete task
    • Document experience: smooth or friction?
    • Note any confusion or difficulty
  2. Assess Navigation/Findability
    • Can you find needed information easily?
    • Is information architecture logical?
    • Are sections well-organized?
    • Is Quick Reference helpful?
  3. Evaluate Clarity
    • Are instructions clear and actionable?
    • Are steps easy to follow?
    • Do examples help understanding?
    • Is technical terminology explained?
  4. Measure Effectiveness
    • Does skill achieve stated purpose?
    • Does it deliver promised value?
    • Are outputs useful and complete?
    • Would you use it again?
  5. Assess Learning Curve
    • How long to understand skill?
    • How long to use effectively?
    • Is learning curve reasonable for complexity?
    • Are first-time users supported well?
Validation Checklist:
  • Skill tested in real-world scenario (actual usage, not just reading)
  • Users can find information easily (navigation clear, sections logical)
  • Instructions are clear and actionable (can follow without confusion)
  • Examples help understanding (concrete, demonstrate key concepts)
  • Skill achieves stated purpose (delivers promised value)
  • Learning curve reasonable (appropriate for skill complexity)
  • Error messages helpful (if applicable: clear, actionable guidance)
  • Overall user satisfaction high (would use again, recommend to others)
Scoring Criteria:
  • 5 - Excellent: All 8 checks pass, excellent usability, easy to learn, highly effective, very satisfying
  • 4 - Good: 6-7 checks pass, good usability, minor friction points, generally effective
  • 3 - Acceptable: 4-5 checks pass, acceptable usability, some confusion/difficulty, moderately effective
  • 2 - Needs Work: 2-3 checks pass, usability issues, frustrating or confusing, limited effectiveness
  • 1 - Poor: ≤1 check passes, poor usability, hard to use, ineffective, unsatisfying
Outputs:
  • Usability score (1-5)
  • Scenario test results (success/partial/failure)
  • User experience assessment (smooth/acceptable/frustrating)
  • Specific usability improvements identified
  • Learning curve assessment
  • Effectiveness rating
Time Estimate: 30-60 minutes (requires actual testing)
Example:
Usability Review: skill-researcher
==================================

Real-World Scenario Test: ✅
- Scenario: Research GitHub API integration patterns
- Result: SUCCESS - Found 5 relevant sources, synthesized findings
- Experience: Smooth, operations clearly explained
- Time: 45 minutes (expected 60 min range)

Navigation/Findability: 9/10 ✅
- Information easy to find
- 5 operations clearly separated
- Quick Reference table very helpful
- Minor: Could use table of contents for long doc

Instruction Clarity: 9/10 ✅
- Steps clear and actionable
- Process well-explained
- Examples demonstrate concepts
- Minor: Web search query formulation could be clearer

Effectiveness: 10/10 ✅
- Achieved purpose: Found patterns and synthesized
- Delivered value: Comprehensive research in 45 min
- Would use again: Yes, very helpful

Learning Curve: 8/10 ✅
- Time to understand: 10 minutes
- Time to use effectively: 15 minutes
- Reasonable for complexity
- First-time user: Some concepts need explanation (credibility scoring)

Error Handling: N/A (no errors encountered)

User Satisfaction: 9/10 ✅
- Would use again: Yes
- Would recommend: Yes
- Overall experience: Very positive

Usability Score: 5/5 (Excellent)
Minor Improvement: Add brief explanation of credibility scoring concept


Operation 5: Integration Review

操作5:集成评审

Purpose: Assess dependency documentation, data flow clarity, component integration, and composition patterns
When to Use This Operation:
  • Review workflow skills (that compose other skills)
  • Validate dependency documentation
  • Check integration clarity
  • Assess composition patterns
  • Verify that cross-references are valid
Automation Level: 30% automated (dependency checking, cross-reference validation), 70% manual assessment
Process:
  1. Review Dependency Documentation (manual)
    • Are required skills documented?
    • Are optional/complementary skills mentioned?
    • Is the YAML `dependencies` field used (if applicable)?
    • Are dependency versions noted (if relevant)?
  2. Assess Data Flow Clarity (manual, for workflow skills)
    • Is data flow between skills explained?
    • Are inputs/outputs documented for each step?
    • Do users understand how data moves?
    • Are there diagrams or flowcharts (if helpful)?
  3. Evaluate Component Integration (manual)
    • How do component skills work together?
    • Are integration points clear?
    • Are there integration examples?
    • Is composition pattern documented?
  4. Verify Cross-References (automated + manual)
    • Do internal links work (references to references/, scripts/)?
    • Are external skill references correct?
    • Are complementary skills mentioned?
  5. Check Composition Patterns (manual, for workflow skills)
    • Is composition pattern identified (sequential/parallel/conditional/etc.)?
    • Is pattern correctly implemented?
    • Are orchestration details provided?
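The automated share of this operation (step 4's cross-reference check) can be sketched as a file-existence scan over internal links. A hypothetical helper, not the skill's actual tooling:

```python
import re
from pathlib import Path

def check_cross_references(skill_dir: str) -> list[str]:
    """Scan SKILL.md for internal references to references/ and scripts/
    and return any referenced paths that do not exist on disk."""
    root = Path(skill_dir)
    text = (root / "SKILL.md").read_text(encoding="utf-8")
    # Match relative paths such as references/guide.md or scripts/tool.py
    refs = re.findall(r"\b(?:references|scripts)/[\w-]+(?:\.\w+)?", text)
    return [ref for ref in sorted(set(refs)) if not (root / ref).exists()]
```

An empty result means all internal links resolve; anything returned is a broken cross-reference to fix before scoring this checklist item.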
Validation Checklist:
  • Dependencies documented (if skill requires other skills)
  • YAML `dependencies` field correct (if used)
  • Data flow explained (for workflow skills: inputs/outputs clear)
  • Integration points clear (how component skills connect)
  • Component skills referenced correctly (names accurate, paths valid)
  • Cross-references valid (internal links work, external references correct)
  • Integration examples provided (if applicable: how to use together)
  • Composition pattern documented (if workflow: sequential/parallel/etc.)
  • Complementary skills mentioned (optional but valuable related skills)
Scoring Criteria:
  • 5 - Excellent: All 9 checks pass (applicable ones), perfect integration documentation
  • 4 - Good: 7-8 checks pass, good integration, minor gaps in documentation
  • 3 - Acceptable: 5-6 checks pass, some integration unclear, missing details
  • 2 - Needs Work: 3-4 checks pass, integration issues, poorly documented dependencies/flow
  • 1 - Poor: ≤2 checks pass, poor integration, confusing or missing dependency documentation
Outputs:
  • Integration score (1-5)
  • Dependency validation results (required/optional/complementary documented)
  • Data flow clarity assessment (for workflow skills)
  • Integration clarity rating
  • Cross-reference validation results
  • Improvement recommendations
Time Estimate: 15-25 minutes (mostly manual)
Example:
Integration Review: development-workflow
========================================

Dependency Documentation: 10/10 ✅
- Required Skills: None (workflow is standalone)
- Component Skills: 5 clearly documented (skill-researcher, planning-architect, task-development, prompt-builder, todo-management)
- Optional Skills: 3 complementary skills mentioned (review-multi, skill-updater, testing-validator)
- YAML Field: Not used (not required, skills referenced in content)

Data Flow Clarity: 10/10 ✅ (Workflow Skill)
- Data flow diagram present (skill → output → next skill)
- Inputs/outputs for each step documented
- Users understand how artifacts flow
- Example:
  skill-researcher → research-synthesis.md → planning-architect
    ↓
  skill-architecture-plan.md → task-development

Component Integration: 10/10 ✅
- Integration method documented for each step (Guided Execution)
- Integration examples provided
- Clear explanation of how skills work together
- Process for using each component skill detailed

Cross-Reference Validation: ✅
- Internal links valid (references/ files exist and reachable)
- External skill references correct (all 5 component skills exist)
- Complementary skills mentioned appropriately

Composition Pattern: 10/10 ✅ (Workflow Skill)
- Pattern: Sequential Pipeline (with one optional step)
- Correctly implemented (Step 1 → 2 → [3 optional] → 4 → 5)
- Orchestration details provided
- Clear flow diagram

Integration Score: 5/5 (Excellent)
Notes: Exemplary integration documentation for workflow skill

目标:评估依赖项文档、数据流清晰度、组件集成以及组合模式
适用场景
  • 评审工作流技能(组合其他技能的技能)
  • 验证依赖项文档
  • 检查集成清晰度
  • 评估组合模式
  • 验证交叉引用有效性
自动化程度:30% 自动化(依赖项检查、交叉引用验证),70% 人工评估
流程
  1. 评审依赖项文档(人工)
    • 是否记录了所需技能?
    • 是否提及可选/互补技能?
    • 是否使用 YAML `dependencies` 字段(若适用)?
    • 是否记录了依赖项版本(若相关)?
  2. 评估数据流清晰度(人工,针对工作流技能)
    • 是否解释了技能间的数据流?
    • 是否记录了每个步骤的输入/输出?
    • 用户是否理解数据如何流转?
    • 是否提供了图表或流程图(若有帮助)?
  3. 评估组件集成(人工)
    • 组件技能如何协同工作?
    • 集成点是否清晰?
    • 是否提供集成示例?
    • 是否记录了组合模式?
  4. 验证交叉引用(自动化+人工)
    • 内部链接是否有效(指向references/、scripts/的链接)?
    • 外部技能引用是否正确?
    • 是否提及互补技能?
  5. 检查组合模式(人工,针对工作流技能)
    • 是否识别组合模式(顺序/并行/条件等)?
    • 模式是否正确实现?
    • 是否提供编排细节?
验证检查清单
  • 依赖项已记录(若技能依赖其他技能)
  • YAML `dependencies` 字段正确(若使用)
  • 数据流已解释(针对工作流技能:输入/输出清晰)
  • 集成点清晰(组件技能如何连接)
  • 组件技能引用正确(名称准确,路径有效)
  • 交叉引用有效(内部链接可访问,外部引用正确)
  • 提供集成示例(若适用:如何协同使用)
  • 组合模式已记录(针对工作流技能:顺序/并行等)
  • 提及互补技能(可选但有价值)
评分准则
  • 5分 - 优秀:所有9项适用检查通过,集成文档完美
  • 4分 - 良好:7-8项检查通过,集成良好,文档存在微小空白
  • 3分 - 合格:5-6项检查通过,部分集成不清晰,存在信息缺失
  • 2分 - 需改进:3-4项检查通过,存在集成问题,依赖项/流文档不完善
  • 1分 - 较差:≤2项检查通过,集成差,依赖项文档混乱或缺失
输出结果
  • 集成评分(1-5分)
  • 依赖项验证结果(所需/可选/互补技能已记录)
  • 数据流清晰度评估(针对工作流技能)
  • 集成清晰度评级
  • 交叉引用验证结果
  • 改进建议
时间预估:15-25分钟(主要为人工)
示例
集成评审: development-workflow
========================================

依赖项文档: 10/10 ✅
- 所需技能: 无(工作流为独立技能)
- 组件技能: 5个已明确记录(skill-researcher、planning-architect、task-development、prompt-builder、todo-management)
- 可选技能: 3个互补技能已提及(review-multi、skill-updater、testing-validator)
- YAML字段: 未使用(非必须,技能在内容中引用)

数据流清晰度: 10/10 ✅(工作流技能)
- 提供数据流图(技能 → 输出 → 下一个技能)
- 每个步骤的输入/输出已记录
- 用户理解工件如何流转
- 示例:
  skill-researcher → research-synthesis.md → planning-architect
    ↓
  skill-architecture-plan.md → task-development

组件集成: 10/10 ✅
- 每个步骤的集成方法已记录(引导式执行)
- 提供集成示例
- 清晰解释技能如何协同工作
- 详细说明每个组件技能的使用流程

交叉引用验证: ✅
- 内部链接有效(references/目录下文件存在且可访问)
- 外部技能引用正确(所有5个组件技能存在)
- 互补技能提及恰当

组合模式: 10/10 ✅(工作流技能)
- 模式: 顺序流水线(含1个可选步骤)
- 实现正确(步骤1 → 2 → [3可选] → 4 → 5)
- 提供编排细节
- 清晰的流程图

集成评分: 5/5(优秀)
备注: 工作流技能的集成文档堪称典范

Review Modes

评审模式

Comprehensive Review Mode

全面评审模式

Purpose: Complete multi-dimensional assessment across all 5 dimensions with aggregate scoring
When to Use:
  • Pre-production validation (ensure skill ready for deployment)
  • Major skill updates (validate changes don't degrade quality)
  • Quality certification (establish baseline quality score)
  • Periodic quality audits (track quality over time)
Process:
  1. Run All 5 Operations Sequentially
    • Operation 1: Structure Review (5-10 min, automated)
    • Operation 2: Content Review (15-30 min, manual)
    • Operation 3: Quality Review (20-40 min, mixed)
    • Operation 4: Usability Review (30-60 min, manual)
    • Operation 5: Integration Review (15-25 min, manual)
  2. Aggregate Scores
    • Record score (1-5) for each dimension
    • Calculate weighted overall score using formula
    • Map overall score to grade (A/B/C/D/F)
  3. Assess Production Readiness
    • ≥4.5: Production Ready
    • 4.0-4.4: Ready with minor improvements
    • 3.5-3.9: Needs improvements before production
    • <3.5: Not ready, significant rework required
  4. Compile Improvement Recommendations
    • Aggregate issues from all dimensions
    • Prioritize: Critical → High → Medium → Low
    • Provide specific, actionable fixes
  5. Generate Comprehensive Report
    • Executive summary (overall score, grade, readiness)
    • Per-dimension scores and findings
    • Prioritized improvement list
    • Detailed rationale for scores
Output:
  • Overall score (1.0-5.0 with one decimal)
  • Grade (A/B/C/D/F)
  • Production readiness assessment
  • Per-dimension scores (Structure, Content, Quality, Usability, Integration)
  • Comprehensive improvement recommendations (prioritized)
  • Detailed review report
Time Estimate: 1.5-2.5 hours total
Example Output:
Comprehensive Review Report: skill-researcher
=============================================

OVERALL SCORE: 4.6/5.0 - GRADE A
STATUS: ✅ PRODUCTION READY

Dimension Scores:
- Structure:   5/5 (Excellent) - Perfect file organization
- Content:     5/5 (Excellent) - Comprehensive, clear documentation
- Quality:     4/5 (Good) - High quality, minor error handling gaps
- Usability:   5/5 (Excellent) - Easy to use, highly effective
- Integration: 4/5 (Good) - Well-documented dependencies

Production Readiness: READY - High quality, deploy with confidence

Recommendations (Priority Order):
1. [Medium] Add error handling examples for web search failures
2. [Low] Consider adding table of contents for long SKILL.md

Strengths:
- Excellent structure and organization
- Comprehensive coverage of 5 research operations
- Strong usability with clear instructions
- Good examples throughout

Overall: Exemplary skill, production-ready quality

目标:针对所有5个维度的完整多维度评估,包含综合评分
适用场景
  • 上线前验证(确保技能可部署)
  • 技能重大更新(验证变更未降低质量)
  • 质量认证(建立基准质量评分)
  • 定期质量审计(跟踪质量长期变化)
流程
  1. 依次运行所有5个操作
    • 操作1:结构评审(5-10分钟,自动化)
    • 操作2:内容评审(15-30分钟,人工)
    • 操作3:质量评审(20-40分钟,混合)
    • 操作4:可用性评审(30-60分钟,人工)
    • 操作5:集成评审(15-25分钟,人工)
  2. 汇总评分
    • 记录每个维度的评分(1-5分)
    • 使用公式计算加权综合评分
    • 将综合评分映射到等级(A/B/C/D/F)
  3. 评估生产就绪性
    • ≥4.5分:生产就绪
    • 4.0-4.4分:小幅改进后可上线
    • 3.5-3.9分:需改进后方可上线
    • <3.5分:暂不就绪,需要显著重构
  4. 整理改进建议
    • 汇总所有维度的问题
    • 按优先级排序:关键 → 高 → 中 → 低
    • 提供具体、可落地的修复方案
  5. 生成全面评审报告
    • 执行摘要(综合评分、等级、就绪性)
    • 各维度评分及发现
    • 分优先级的改进列表
    • 评分的详细理由
输出
  • 综合评分(1.0-5.0,保留一位小数)
  • 等级(A/B/C/D/F)
  • 生产就绪性评估
  • 各维度评分(结构、内容、质量、可用性、集成)
  • 全面的改进建议(分优先级)
  • 详细评审报告
时间预估:总计1.5-2.5小时
示例输出
全面评审报告: skill-researcher
=============================================

综合评分: 4.6/5.0 - 等级 A
状态: ✅ 生产就绪

各维度评分:
- 结构:   5/5(优秀) - 文件组织完美
- 内容:     5/5(优秀) - 文档全面、清晰
- 质量:     4/5(良好) - 高质量,错误处理存在微小空白
- 可用性:   5/5(优秀) - 易用性强,高效
- 集成: 4/5(良好) - 依赖项文档完善

生产就绪性: 可上线 - 高质量,可放心部署

建议(按优先级):
1. [中] 添加网页搜索失败的错误处理示例
2. [低] 考虑为长SKILL.md添加目录

优势:
- 结构和组织优秀
- 全面覆盖5个研究操作
- 可用性强,说明清晰
- 文档中包含优质示例

整体评价: 堪称典范的技能,具备生产就绪质量

Fast Check Mode

快速检查模式

Purpose: Quick automated validation for rapid quality feedback during development
When to Use:
  • During development (continuous validation)
  • Quick quality checks (before detailed review)
  • Pre-commit validation (catch issues early)
  • Rapid iteration (fast feedback loop)
Process:
  1. Run Automated Structure Validation
    ```bash
    python3 scripts/validate-structure.py /path/to/skill
    ```
  2. Check Critical Issues
    • YAML frontmatter valid?
    • Required files present?
    • Naming conventions followed?
    • File sizes appropriate?
  3. Generate Pass/Fail Report
    • PASS: Critical checks passed, proceed to development
    • FAIL: Critical issues found, fix before continuing
  4. Provide Quick Fixes (if available)
    • Specific commands to fix issues
    • Examples of correct format
    • References to documentation
Output:
  • Pass/Fail status
  • Critical issues list (if failed)
  • Quick fixes or guidance
  • Score estimate (if passed)
Time Estimate: 5-10 minutes
Example Output:
```bash
$ python3 scripts/validate-structure.py .claude/skills/my-skill

Fast Check Report
=================
Skill: my-skill

❌ FAIL - Critical Issues Found

Critical Issues:
1. YAML frontmatter: Invalid syntax (line 3: unexpected character)
2. Naming convention: File "MyGuide.md" should be "my-guide.md"

Quick Fixes:
1. Fix YAML: Remove trailing comma on line 3
2. Rename file: mv references/MyGuide.md references/my-guide.md

Run full validation after fixes: python3 scripts/validate-structure.py .claude/skills/my-skill
```
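A stripped-down version of the Fast Check flow might look like the sketch below (illustrative only; `validate-structure.py` performs many more checks than these):

```python
import re
from pathlib import Path

def fast_check(skill_dir: str) -> list[str]:
    """Run a few critical structure checks; an empty list means PASS."""
    root = Path(skill_dir)
    issues = []
    skill_md = root / "SKILL.md"
    if not skill_md.exists():
        issues.append("Required file missing: SKILL.md")
    elif not skill_md.read_text(encoding="utf-8").startswith("---"):
        issues.append("YAML frontmatter missing at top of SKILL.md")
    # Naming convention: kebab-case for markdown files under references/
    for path in sorted(root.glob("references/*.md")):
        if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*\.md", path.name):
            issues.append(f'Naming convention: "{path.name}" should be kebab-case')
    return issues
```

Calling `fast_check(".claude/skills/my-skill")` would return the critical-issues list shown in the report above (here only the frontmatter and naming checks are modeled).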

目标:快速自动化验证,为开发过程提供快速质量反馈
适用场景
  • 开发过程中(持续验证)
  • 快速质量检查(详细评审前)
  • 提交前验证(提前发现问题)
  • 快速迭代(快速反馈循环)
流程
  1. 运行自动化结构验证
    ```bash
    python3 scripts/validate-structure.py /path/to/skill
    ```
  2. 检查关键问题
    • YAML frontmatter是否有效?
    • 必要文件是否存在?
    • 是否遵循命名规范?
    • 文件大小是否合适?
  3. 生成通过/失败报告
    • 通过:关键检查通过,可继续开发
    • 失败:发现关键问题,修复后再继续
  4. 提供快速修复方案(若可用)
    • 修复问题的具体命令
    • 正确格式示例
    • 文档参考
输出
  • 通过/失败状态
  • 关键问题列表(若失败)
  • 快速修复方案或指导
  • 评分预估(若通过)
时间预估:5-10分钟
示例输出
```bash
$ python3 scripts/validate-structure.py .claude/skills/my-skill

快速检查报告
=================
技能: my-skill

❌ 失败 - 发现关键问题

关键问题:
1. YAML frontmatter: 语法无效(第3行:意外字符)
2. 命名规范: 文件"MyGuide.md"应命名为"my-guide.md"

快速修复方案:
1. 修复YAML: 删除第3行的尾随逗号
2. 重命名文件: mv references/MyGuide.md references/my-guide.md

修复后重新运行完整验证: python3 scripts/validate-structure.py .claude/skills/my-skill
```

Custom Review

自定义评审

Purpose: Flexible review focusing on specific dimensions or concerns
When to Use:
  • Targeted improvements (focus on specific dimension)
  • Time constraints (can't do comprehensive review)
  • Specific concerns (e.g., only check usability)
  • Iterative improvements (focus on one dimension at a time)
Options:
  1. Select Dimensions: Choose 1-5 operations to run
  2. Adjust Thoroughness: Quick/Standard/Thorough per dimension
  3. Focus Areas: Specify particular concerns (e.g., "check examples quality")
Process:
  1. Define Custom Review Scope
    • Which dimensions to review?
    • How thorough for each?
    • Any specific focus areas?
  2. Run Selected Operations
    • Execute chosen operations
    • Apply thoroughness level
  3. Generate Targeted Report
    • Scores for selected dimensions only
    • Focused findings
    • Specific recommendations
Example Scenarios:
Scenario 1: Content-Focused Review
Custom Review: Content + Examples
- Operations: Content Review only
- Thoroughness: Thorough
- Focus: Example quality and completeness
- Time: 30 minutes
Scenario 2: Quick Quality Check
Custom Review: Structure + Quality (Fast)
- Operations: Structure + Quality
- Thoroughness: Quick
- Focus: Pattern compliance, anti-patterns
- Time: 15-20 minutes
Scenario 3: Workflow Integration Review
Custom Review: Integration Deep Dive
- Operations: Integration Review only
- Thoroughness: Thorough
- Focus: Data flow, composition patterns
- Time: 30 minutes
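The scope definition in step 1 can be represented as a small structure; a hypothetical sketch (the names and data format are illustrative, not prescribed by the skill):

```python
# The five dimensions and their automation levels, per the skill's overview.
DIMENSIONS = {
    "structure": 0.95, "content": 0.40, "quality": 0.50,
    "usability": 0.10, "integration": 0.30,
}

def custom_review_scope(dimensions, thoroughness="standard", focus=None):
    """Build a custom review plan from the selected dimensions.
    thoroughness: "quick", "standard", or "thorough" (applied to all)."""
    unknown = set(dimensions) - DIMENSIONS.keys()
    if unknown:
        raise ValueError(f"Unknown dimensions: {sorted(unknown)}")
    if thoroughness not in ("quick", "standard", "thorough"):
        raise ValueError(f"Unknown thoroughness: {thoroughness}")
    return {
        "operations": list(dimensions),
        "thoroughness": thoroughness,
        "focus": list(focus or []),
    }

# Scenario 1 above: content-focused, thorough, example quality
plan = custom_review_scope(["content"], "thorough", ["example quality"])
```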

目标:灵活评审,聚焦特定维度或关注点
适用场景
  • 针对性改进(聚焦特定维度)
  • 时间有限(无法进行全面评审)
  • 特定关注点(如仅检查可用性)
  • 迭代改进(一次聚焦一个维度)
选项:
  1. 选择维度:选择1-5个操作执行
  2. 调整细致度:每个维度可选择快速/标准/细致
  3. 聚焦领域:指定特定关注点(如"检查示例质量")
流程
  1. 定义自定义评审范围
    • 评审哪些维度?
    • 每个维度的细致度?
    • 有哪些特定关注点?
  2. 运行选定操作
    • 执行选定的操作
    • 应用指定的细致度
  3. 生成针对性报告
    • 仅包含选定维度的评分
    • 聚焦的发现
    • 具体改进建议
示例场景:
场景1:内容聚焦评审
自定义评审: 内容 + 示例
- 操作: 仅内容评审
- 细致度: 细致
- 聚焦: 示例质量与完整性
- 时间: 30分钟
场景2:快速质量检查
自定义评审: 结构 + 质量(快速)
- 操作: 结构 + 质量
- 细致度: 快速
- 聚焦: 模式合规性、反模式
- 时间: 15-20分钟
场景3:工作流集成深度评审
自定义评审: 集成深度分析
- 操作: 仅集成评审
- 细致度: 细致
- 聚焦: 数据流、组合模式
- 时间: 30分钟

Best Practices

最佳实践

1. Self-Review First

1. 先进行自我评审

Practice: Run Fast Check mode before requesting comprehensive review
Rationale: Automated checks catch 70% of structural issues in 5-10 minutes, allowing manual review to focus on higher-value assessment
Application: Always run `validate-structure.py` before detailed review
实践:请求全面评审前先运行快速检查模式
理由:自动化检查可在5-10分钟内发现70%的结构问题,让人工评审聚焦于更高价值的评估工作
应用:详细评审前始终运行 `validate-structure.py`

2. Use Checklists Systematically

2. 系统使用检查清单

Practice: Follow validation checklists item-by-item for each operation
Rationale: Research shows teams using checklists reduce common issues by 30% and ensure consistent results
Application: Print or display checklist, mark each item explicitly
实践:每个操作逐项遵循验证检查清单
理由:研究表明,使用检查清单的团队可减少30%的常见问题,并确保结果一致
应用:打印或显示检查清单,逐项明确标记

3. Test in Real Scenarios

3. 实际场景测试

Practice: Conduct usability review with actual usage, not just documentation reading
Rationale: Real-world testing reveals hidden usability issues that documentation review misses
Application: For Usability Review, actually use the skill to complete a realistic task
实践:通过实际使用进行可用性评审,而非仅阅读文档
理由:实际场景测试可发现文档评审无法发现的隐藏可用性问题
应用:可用性评审时,实际使用技能完成真实任务

4. Focus on Automation

4. 聚焦自动化

Practice: Let scripts handle routine checks, focus manual effort on judgment-requiring assessment
Rationale: Automation provides 70% reduction in manual review time for routine checks
Application: Use scripts for Structure and partial Quality checks, manual for Content/Usability
实践:让脚本处理常规检查,人工精力聚焦于需要判断的评估工作
理由:自动化可减少70%的常规检查人工时间
应用:使用脚本进行结构和部分质量检查,人工处理内容/可用性评审

5. Provide Actionable Feedback

5. 提供可落地的反馈

Practice: Make improvement recommendations specific, prioritized, and actionable
Rationale: Vague feedback ("improve quality") is less valuable than specific guidance ("add error handling examples to Step 3")
Application: For each issue, specify: What, Why, How (to fix), Priority
实践:改进建议需具体、分优先级且可落地
理由:模糊反馈(如"提升质量")远不如具体指导(如"在步骤3中添加错误处理示例")有价值
应用:每个问题需明确:问题是什么、为什么需要修复、如何修复、优先级

6. Review Regularly

6. 定期评审

Practice: Conduct reviews throughout development lifecycle, not just at end
Rationale: Early reviews catch issues before they compound; rapid feedback maintains momentum (37% productivity increase)
Application: Fast Check during development, Comprehensive Review before production
实践:在开发全周期内定期评审,而非仅在收尾阶段
理由:早期评审可在问题复杂化前发现问题;快速反馈可保持开发节奏(提升37%的生产力)
应用:开发过程中使用快速检查,上线前使用全面评审

7. Track Improvements

7. 跟踪改进

Practice: Document before/after scores to measure improvement over time
Rationale: Tracking demonstrates progress, identifies patterns, validates improvements
Application: Save review reports, compare scores across iterations
实践:记录评审前后的评分,跟踪长期改进
理由:跟踪可展示进展、识别模式、验证改进效果
应用:保存评审报告,对比不同迭代的评分

8. Iterate Based on Findings

8. 基于发现迭代优化

Practice: Use review findings to improve future skills, not just current skill
Rationale: Learnings compound; patterns identified in reviews improve entire skill ecosystem
Application: Document common issues, create guidelines, update templates

实践:利用评审发现改进未来技能,而非仅当前技能
理由:经验可积累;评审中识别的模式可提升整个技能生态的质量
应用:记录常见问题,创建指南,更新模板

Common Mistakes

常见错误

Mistake 1: Skipping Structure Review

错误1:跳过结构评审

Symptom: Spending time on detailed review only to discover fundamental structural issues
Cause: Assumption that structure is correct, eagerness to assess content
Fix: Always run Structure Review (Fast Check) first - takes 5-10 minutes, catches 70% of issues
Prevention: Make Fast Check mandatory first step in any review process
症状:花费时间进行详细评审后才发现基础结构问题
原因:假设结构正确,急于评估内容
修复:始终先运行结构评审(快速检查)- 仅需5-10分钟,可发现70%的问题
预防:将快速检查作为任何评审流程的强制首个步骤

Mistake 2: Subjective Scoring

错误2:主观评分

Symptom: Inconsistent scores, debate over ratings, difficulty justifying scores
Cause: Using personal opinion instead of rubric criteria
Fix: Use `references/scoring-rubric.md` - score based on specific criteria, not feeling
Prevention: Print rubric, refer to criteria for each score, document evidence
症状:评分不一致,对评级存在争议,难以证明评分合理性
原因:使用个人判断而非准则评分
修复:使用 `references/scoring-rubric.md` - 基于具体准则评分,而非感觉
预防:打印评分准则,评分时参考准则,记录评分依据

Mistake 3: Ignoring Usability

错误3:忽略可用性评审

Symptom: Skill looks good on paper but difficult to use in practice
Cause: Skipping Usability Review (90% manual, time-consuming)
Fix: Actually test skill in real scenario - reveals hidden issues
Prevention: Allocate 30-60 minutes for usability testing; it cannot be skipped for production skills
症状:技能在文档中看起来不错,但实际使用困难
原因:跳过可用性评审(90%人工,耗时)
修复:实际场景测试技能 - 发现隐藏问题
预防:为可用性测试分配30-60分钟,生产就绪技能不可跳过此步骤

Mistake 4: No Prioritization

错误4:未分优先级

Symptom: Long list of improvements, unclear what to fix first, overwhelmed
Cause: Treating all issues equally without assessing impact
Fix: Prioritize issues: Critical (must fix) → High → Medium → Low (nice to have)
Prevention: Tag each issue with priority level during review
症状:改进列表过长,不清楚先修复什么,不知所措
原因:同等对待所有问题,未评估影响
修复:按优先级排序问题:关键(必须修复)→ 高 → 中 → 低(可选)
预防:评审时为每个问题标记优先级

Mistake 5: Batch Reviews

错误5:批量评审

Symptom: Discovering major issues late in development, costly rework
Cause: Waiting until end to review, accumulating issues
Fix: Review early and often - run Fast Check during development and between iterations
Prevention: Continuous validation, rapid feedback, catch issues when small
症状:开发后期才发现重大问题,修复成本高
原因:等到开发结束才评审,问题积累
修复:尽早并定期评审 - 开发过程中使用快速检查,迭代改进
预防:持续验证,快速反馈,问题小时就解决

Mistake 6: Ignoring Patterns

错误6:忽略模式

Symptom: Repeating same issues across multiple skills
Cause: Treating each review in isolation, not learning from patterns
Fix: Track common issues, create guidelines, update development process
Prevention: Document patterns, share learnings, improve templates

症状:多个技能重复出现相同问题
原因:孤立处理每个评审,未从模式中学习
修复:跟踪常见问题,创建指南,更新开发流程
预防:记录模式,分享经验,改进模板

Quick Reference

快速参考

The 5 Operations

5大操作

| Operation | Focus | Automation | Time | Key Output |
|---|---|---|---|---|
| Structure | YAML, files, naming, organization | 95% | 5-10m | Structure score, compliance report |
| Content | Completeness, clarity, examples | 40% | 15-30m | Content score, section assessment |
| Quality | Patterns, best practices, anti-patterns | 50% | 20-40m | Quality score, pattern compliance |
| Usability | Ease of use, effectiveness | 10% | 30-60m | Usability score, scenario test results |
| Integration | Dependencies, data flow, composition | 30% | 15-25m | Integration score, dependency validation |
| 操作 | 聚焦领域 | 自动化程度 | 时间 | 核心输出 |
|---|---|---|---|---|
| 结构评审 | YAML、文件、命名、组织 | 95% | 5-10m | 结构评分、合规报告 |
| 内容评审 | 完整性、清晰度、示例 | 40% | 15-30m | 内容评分、章节评估 |
| 质量评审 | 模式、最佳实践、反模式 | 50% | 20-40m | 质量评分、模式合规性 |
| 可用性评审 | 易用性、有效性 | 10% | 30-60m | 可用性评分、场景测试结果 |
| 集成评审 | 依赖项、数据流、组合 | 30% | 15-25m | 集成评分、依赖项验证 |

Scoring Scale

评分等级

| Score | Level | Meaning | Action |
|---|---|---|---|
| 5 | Excellent | Exceeds standards | Exemplary - use as example |
| 4 | Good | Meets standards | Production ready - standard quality |
| 3 | Acceptable | Minor improvements | Usable - note improvements |
| 2 | Needs Work | Notable issues | Not ready - significant improvements |
| 1 | Poor | Significant problems | Not viable - extensive rework |
| 评分 | 等级 | 含义 | 行动 |
|---|---|---|---|
| 5 | 优秀 | 超出标准 | 堪称典范 - 作为示例 |
| 4 | 良好 | 符合标准 | 生产就绪 - 标准质量 |
| 3 | 合格 | 需小幅改进 | 可使用 - 记录改进点 |
| 2 | 需改进 | 存在明显问题 | 暂不就绪 - 需要显著改进 |
| 1 | 较差 | 存在严重问题 | 不可用 - 需要全面重构 |

Production Readiness

生产就绪性

| Overall Score | Grade | Status | Decision |
|---|---|---|---|
| 4.5-5.0 | A | ✅ Production Ready | Ship it - high quality |
| 4.0-4.4 | B+ | ✅ Ready (minor improvements) | Ship - note improvements for next iteration |
| 3.5-3.9 | B- | ⚠️ Needs Improvements | Hold - fix issues first |
| 2.5-3.4 | C | ❌ Not Ready | Don't ship - substantial work needed |
| 1.5-2.4 | D | ❌ Not Ready | Don't ship - significant rework |
| 1.0-1.4 | F | ❌ Not Ready | Don't ship - major issues |
| 综合评分 | 等级 | 状态 | 决策 |
|---|---|---|---|
| 4.5-5.0 | A | ✅ 生产就绪 | 发布 - 高质量 |
| 4.0-4.4 | B+ | ✅ 小幅改进后可上线 | 发布 - 记录改进点用于下一迭代 |
| 3.5-3.9 | B- | ⚠️ 需改进 | 暂缓 - 先修复问题 |
| 2.5-3.4 | C | ❌ 暂不就绪 | 不发布 - 需要大量优化 |
| 1.5-2.4 | D | ❌ 暂不就绪 | 不发布 - 需要显著重构 |
| 1.0-1.4 | F | ❌ 暂不就绪 | 不发布 - 存在重大问题 |

Review Modes

评审模式

| Mode | Time | Use Case | Coverage |
|---|---|---|---|
| Fast Check | 5-10m | During development, quick validation | Structure only (automated) |
| Custom | Variable | Targeted review, specific concerns | Selected dimensions |
| Comprehensive | 1.5-2.5h | Pre-production, full assessment | All 5 dimensions + report |
| 模式 | 时间 | 适用场景 | 覆盖范围 |
|---|---|---|---|
| 快速检查 | 5-10m | 开发过程中、快速验证 | 仅结构(自动化) |
| 自定义评审 | 可变 | 针对性评审、特定关注点 | 选定维度 |
| 全面评审 | 1.5-2.5h | 上线前、全面评估 | 所有5个维度 + 报告 |

Common Commands

常用命令


Fast structure validation

快速结构验证

```bash
python3 scripts/validate-structure.py /path/to/skill
```

Verbose output

详细输出

```bash
python3 scripts/validate-structure.py /path/to/skill --verbose
```

JSON output

JSON格式输出

```bash
python3 scripts/validate-structure.py /path/to/skill --json
```

Pattern compliance check

模式合规性检查

```bash
python3 scripts/check-patterns.py /path/to/skill
```

Generate review report

生成评审报告

```bash
python3 scripts/generate-review-report.py review_data.json --output report.md
```

Run comprehensive review

运行全面评审

```bash
python3 scripts/review-runner.py /path/to/skill --mode comprehensive
```

Weighted Average Formula

加权平均公式

Overall = (Structure × 0.20) + (Content × 0.25) + (Quality × 0.25) +
          (Usability × 0.15) + (Integration × 0.15)
Weight Rationale:
  • Content & Quality (25% each): Core value
  • Structure (20%): Foundation
  • Usability & Integration (15% each): Supporting
综合评分 = (结构 × 0.20) + (内容 × 0.25) + (质量 × 0.25) +
          (可用性 × 0.15) + (集成 × 0.15)
权重依据:
  • 内容与质量(各25%):核心价值
  • 结构(20%):基础
  • 可用性与集成(各15%):支撑因素
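The formula and the production-readiness bands above combine into a small helper (a sketch; weights exactly as listed):

```python
WEIGHTS = {
    "structure": 0.20, "content": 0.25, "quality": 0.25,
    "usability": 0.15, "integration": 0.15,
}

def overall_score(scores: dict) -> float:
    """Weighted average of the five 1-5 dimension scores, one decimal."""
    return round(sum(scores[d] * w for d, w in WEIGHTS.items()), 1)

def readiness(overall: float) -> str:
    """Map an overall score to the production-readiness bands."""
    if overall >= 4.5:
        return "Production Ready"
    if overall >= 4.0:
        return "Ready with minor improvements"
    if overall >= 3.5:
        return "Needs improvements before production"
    return "Not ready"

# The skill-researcher example from the Comprehensive Review section:
scores = {"structure": 5, "content": 5, "quality": 4,
          "usability": 5, "integration": 4}
print(overall_score(scores))            # → 4.6
print(readiness(overall_score(scores))) # → Production Ready
```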

For More Information

更多信息

  • Structure details: `references/structure-review-guide.md`
  • Content details: `references/content-review-guide.md`
  • Quality details: `references/quality-review-guide.md`
  • Usability details: `references/usability-review-guide.md`
  • Integration details: `references/integration-review-guide.md`
  • Complete scoring rubrics: `references/scoring-rubric.md`
  • Report templates: `references/review-report-template.md`

For detailed guidance on each dimension, see reference files. For automation tools, see scripts/.
  • 结构细节:`references/structure-review-guide.md`
  • 内容细节:`references/content-review-guide.md`
  • 质量细节:`references/quality-review-guide.md`
  • 可用性细节:`references/usability-review-guide.md`
  • 集成细节:`references/integration-review-guide.md`
  • 完整评分准则:`references/scoring-rubric.md`
  • 报告模板:`references/review-report-template.md`

各维度详细指导请参考参考文件。自动化工具请查看scripts/目录。