# Audit Agents/Skills/Commands (Advanced Skill)
Comprehensive quality audit system for Claude Code agents, skills, and commands. Provides quantitative scoring, comparative analysis, and production readiness grading based on industry best practices.
## Purpose
Problem: Manual validation of agents/skills is error-prone and inconsistent. According to the LangChain Agent Report 2026, 29.5% of organizations deploy agents without systematic evaluation, and "agent bugs" rank as the top challenge (cited by 18% of teams).
Solution: Automated quality scoring across 16 weighted criteria with production readiness thresholds (80% = Grade B minimum for production deployment).
Key Features:
- Quantitative scoring (32 points for agents/skills, 20 for commands)
- Weighted criteria (Identity 3x, Prompt 2x, Validation 1x, Design 2x)
- Production readiness grading (A-F scale with 80% threshold)
- Comparative analysis vs reference templates
- JSON/Markdown dual output for programmatic integration
- Fix suggestions for failing criteria
## Modes
| Mode | Usage | Output |
|---|---|---|
| Quick Audit | Top-5 critical criteria only | Fast pass/fail (3-5 min for 20 files) |
| Full Audit | All 16 criteria per file | Detailed scores + recommendations (10-15 min) |
| Comparative | Full + benchmark vs templates | Analysis + gap identification (15-20 min) |
Default: Full Audit (recommended for first run)
## Methodology

### Why These Criteria?
The 16-criteria framework is derived from:
- Claude Code Best Practices (Ultimate Guide line 4921: Agent Validation Checklist)
- Industry Data (LangChain Agent Report 2026: evaluation gaps)
- Production Failures (Community feedback on hardcoded paths, missing error handling)
- Composition Patterns (Skills should reference other skills, agents should be modular)
### Scoring Philosophy
Weight Rationale:
- Identity (3x): If users can't find/invoke the agent, quality is irrelevant (discoverability > quality)
- Prompt (2x): Determines reliability and accuracy of outputs
- Validation (1x): Improves robustness but is secondary to core functionality
- Design (2x): Impacts long-term maintainability and scalability
Grade Standards:
- A (90-100%): Production-ready, minimal risk
- B (80-89%): Good, meets production threshold
- C (70-79%): Needs improvement before production
- D (60-69%): Significant gaps, not production-ready
- F (<60%): Critical issues, requires major refactoring
Industry Alignment: The 80% threshold aligns with software engineering best practices for production deployment (e.g., code coverage >80%, security scan pass rates).
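As a sketch, the grade bands and the production gate above reduce to a small lookup (function names are illustrative, not the skill's actual implementation):

```python
def assign_grade(score_pct: float) -> str:
    """Map a 0-100 score onto the A-F scale described above."""
    bands = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]
    for threshold, grade in bands:
        if score_pct >= threshold:
            return grade
    return "F"

def is_production_ready(score_pct: float) -> bool:
    # 80% = Grade B minimum for production deployment
    return score_pct >= 80.0
```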
## Workflow

### Phase 1: Discovery
- Scan directories:
  - `.claude/agents/`
  - `.claude/skills/`
  - `.claude/commands/`
  - `examples/agents/` (if exists)
  - `examples/skills/` (if exists)
  - `examples/commands/` (if exists)
- Classify files by type (agent/skill/command)
- Load reference templates (for Comparative mode):
  - `guide/examples/agents/` (benchmark files)
  - `guide/examples/skills/` (benchmark files)
  - `guide/examples/commands/` (benchmark files)
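The classification step can be sketched as a directory-based lookup (a minimal illustration; the skill's real detection logic may differ):

```python
from pathlib import Path

def classify(path: str):
    """Classify a file as agent/skill/command by its parent directory."""
    parts = Path(path).parts
    for kind in ("agents", "skills", "commands"):
        if kind in parts:
            return kind[:-1]  # singular: agent / skill / command
    return None
```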
### Phase 2: Scoring Engine
Load scoring criteria from `scoring/criteria.yaml`:

```yaml
agents:
  max_points: 32
  categories:
    identity:
      weight: 3
      criteria:
        - id: A1.1
          name: "Clear name"
          points: 3
          detection: "frontmatter.name exists and is descriptive"
        # ... (16 total criteria)
```

For each file:
- Parse frontmatter (YAML)
- Extract content sections
- Run detection patterns (regex, keyword search)
- Calculate score: `(points / max_points) × 100`
- Assign grade (A-F)
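The per-file calculation can be sketched as follows (the `(points_available, passed)` shape is an assumption for illustration; the real criteria live in `scoring/criteria.yaml`):

```python
def score_file(results, max_points):
    """Compute the percentage score for one file.

    results: list of (points_available, passed) tuples, one per criterion.
    """
    obtained = sum(points for points, passed in results if passed)
    return round(obtained / max_points * 100, 1)
```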
### Phase 3: Comparative Analysis (Comparative Mode Only)
For each project file:
- Find closest matching template (by description similarity)
- Compare scores per criterion
- Identify gaps: `template_score - project_score`
- Flag significant gaps (>10 points difference)

Example:

```
Project file: .claude/agents/debugging-specialist.md (Score: 78%, Grade C)
Closest template: examples/agents/debugging-specialist.md (Score: 94%, Grade A)

Gaps:
- Anti-hallucination measures: -2 points (template has, project missing)
- Edge cases documented: -1 point (template has 5 examples, project has 1)
- Integration documented: -1 point (template references 3 skills, project none)

Total gap: 16 points (explains C vs A difference)
```
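The gap check in this example reduces to a simple comparison (a sketch; `compare_to_template` is a hypothetical helper name):

```python
def compare_to_template(project_score, template_score, threshold=10):
    """Return the score gap vs the template and whether it is significant.

    A gap above `threshold` points is flagged, per the >10-point rule above.
    """
    gap = template_score - project_score
    return gap, gap > threshold
```

For the example above, `compare_to_template(78, 94)` yields a 16-point gap, which is flagged as significant.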
### Phase 4: Report Generation
Markdown Report (`audit-report.md`):
- Summary table (overall + by type)
- Individual scores with top issues
- Detailed breakdown per file (collapsible)
- Prioritized recommendations

JSON Output (`audit-report.json`):

```json
{
  "metadata": {
    "project_path": "/path/to/project",
    "audit_date": "2026-02-07",
    "mode": "full",
    "version": "1.0.0"
  },
  "summary": {
    "overall_score": 82.5,
    "overall_grade": "B",
    "total_files": 15,
    "production_ready_count": 10,
    "production_ready_percentage": 66.7
  },
  "by_type": {
    "agents": { "count": 5, "avg_score": 85.2, "grade": "B" },
    "skills": { "count": 8, "avg_score": 78.9, "grade": "C" },
    "commands": { "count": 2, "avg_score": 92.0, "grade": "A" }
  },
  "files": [
    {
      "path": ".claude/agents/debugging-specialist.md",
      "type": "agent",
      "score": 78.1,
      "grade": "C",
      "points_obtained": 25,
      "points_max": 32,
      "failed_criteria": [
        {
          "id": "A2.4",
          "name": "Anti-hallucination measures",
          "points_lost": 2,
          "recommendation": "Add section on source verification"
        }
      ]
    }
  ],
  "top_issues": [
    {
      "issue": "Missing error handling",
      "affected_files": 8,
      "impact": "Runtime failures unhandled",
      "priority": "high"
    }
  ]
}
```
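Because the JSON schema is stable, downstream tooling can gate on it directly; a sketch using the field names above (`production_gate` is a hypothetical helper, not part of the skill):

```python
import json

def production_gate(report_path, threshold=80.0):
    """Return (passed, failing_files) from an audit-report.json."""
    with open(report_path) as f:
        report = json.load(f)
    failing = [entry["path"] for entry in report["files"]
               if entry["score"] < threshold]
    passed = report["summary"]["overall_score"] >= threshold and not failing
    return passed, failing
```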
### Phase 5: Fix Suggestions (Optional)
For each failing criterion, generate an actionable fix:

```markdown
File: .claude/agents/debugging-specialist.md
Issue: Missing anti-hallucination measures (2 points lost)
Fix:
Add this section after "Methodology":

## Source Verification

- Always cite sources for technical claims
- Use phrases: "According to [documentation]...", "Based on [tool output]..."
- If uncertain, state: "I don't have verified information on..."
- Never invent: statistics, version numbers, API signatures, stack traces
```

Detection: Grep for keywords: "verify", "cite", "source", "evidence"
---

## Scoring Criteria
See `scoring/criteria.yaml` for complete definitions. Summary:

### Agents (32 points max)
| Category | Weight | Criteria Count | Max Points |
|---|---|---|---|
| Identity | 3x | 4 | 12 |
| Prompt Quality | 2x | 4 | 8 |
| Validation | 1x | 4 | 4 |
| Design | 2x | 4 | 8 |
Key Criteria:
- Clear name (3 pts): Not generic like "agent1"
- Description with triggers (3 pts): Contains "when"/"use"
- Role defined (2 pts): "You are..." statement
- 3+ examples (1 pt): Usage scenarios documented
- Single responsibility (2 pts): Focused, not "general purpose"
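The "Clear name" check can be approximated with a pattern test (a sketch; the generic-name list is illustrative, and the real rules live in `scoring/criteria.yaml`):

```python
import re

def has_clear_name(frontmatter):
    """Reject missing names and generic ones like 'agent1' or 'skill2'."""
    name = (frontmatter or {}).get("name", "")
    if not name:
        return False
    # Generic patterns considered non-descriptive (illustrative list)
    return not re.fullmatch(r"(agent|skill|command|test|temp)\d*", name.lower())
```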
### Skills (32 points max)
| Category | Weight | Criteria Count | Max Points |
|---|---|---|---|
| Structure | 3x | 4 | 12 |
| Content | 2x | 4 | 8 |
| Technical | 1x | 4 | 4 |
| Design | 2x | 4 | 8 |
Key Criteria:
- Valid SKILL.md (3 pts): Proper naming
- Name valid (3 pts): Lowercase, 1-64 chars, no spaces
- Methodology described (2 pts): Workflow section exists
- No hardcoded paths (1 pt): No `/Users/`, `/home/`
- Clear triggers (2 pts): "When to use" section
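The hardcoded-path criterion can be sketched as a grep-style scan (the pattern list is illustrative, not exhaustive):

```python
import re

# Match user-specific absolute paths such as /Users/alice or /home/bob
HARDCODED_PATH = re.compile(r"/(Users|home)/\w+")

def has_hardcoded_paths(content):
    return bool(HARDCODED_PATH.search(content))
```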
### Commands (20 points max)
| Category | Weight | Criteria Count | Max Points |
|---|---|---|---|
| Structure | 3x | 4 | 12 |
| Quality | 2x | 4 | 8 |
Key Criteria:
- Valid frontmatter (3 pts): name + description
- Argument hint (3 pts): If uses `$ARGUMENTS`
- Step-by-step workflow (3 pts): Numbered sections
- Error handling (2 pts): Mentions failure modes
## Detection Patterns

### Frontmatter Parsing
```python
import yaml
import re

def parse_frontmatter(content):
    match = re.search(r'^---\n(.*?)\n---', content, re.DOTALL)
    if match:
        return yaml.safe_load(match.group(1))
    return None
```

### Keyword Detection
```python
def has_keywords(text, keywords):
    text_lower = text.lower()
    return any(kw in text_lower for kw in keywords)

# Example
has_trigger = has_keywords(description, ['when', 'use', 'trigger'])
has_error_handling = has_keywords(content, ['error', 'failure', 'fallback'])
```

### Overlap Detection (Duplication Check)
```python
def jaccard_similarity(text1, text2):
    words1 = set(text1.lower().split())
    words2 = set(text2.lower().split())
    intersection = words1 & words2
    union = words1 | words2
    return len(intersection) / len(union) if union else 0

# Flag if similarity > 0.5 (50% keyword overlap)
if jaccard_similarity(desc1, desc2) > 0.5:
    issues.append("High overlap with another file")
```

### Token Counting (Approximate)
```python
def estimate_tokens(text):
    # Rough estimate: 1 token ≈ 0.75 words (i.e., tokens ≈ words × 1.3)
    word_count = len(text.split())
    return int(word_count * 1.3)

# Check budget
tokens = estimate_tokens(file_content)
if tokens > 5000:
    issues.append("File too large (>5K tokens)")
```

---

## Industry Context
Source: LangChain Agent Report 2026 (public report, page 14-22)
Key Findings:
- **29.5%** of organizations deploy agents without systematic evaluation
- **18%** cite "agent bugs" as their primary challenge
- Only **12%** use automated quality checks (88% manual or none)
- **43%** report difficulty maintaining agent quality over time
- Top issues: Hallucinations (31%), poor error handling (28%), unclear triggers (22%)
Implications:
- Automation gap: Most teams rely on manual checklists (error-prone at scale)
- Quality debt: Agents deployed without validation accumulate technical debt
- Maintenance burden: 43% struggle with quality over time (no tracking system)
This skill addresses:
- Automation: Replaces manual checklists with quantitative scoring
- Tracking: JSON output enables trend analysis over time
- Standards: 80% threshold provides clear production gate
## Output Examples

### Quick Audit (Top-5 Criteria)
```markdown
# Quick Audit: Agents/Skills/Commands

Files: 15 (5 agents, 8 skills, 2 commands)
Critical Issues: 3 files fail top-5 criteria

## Top-5 Criteria (Pass/Fail)

| File | Valid Name | Has Triggers | Error Handling | No Hardcoded Paths | Examples |
|---|---|---|---|---|---|
| agent1.md | ✅ | ✅ | ❌ | ✅ | ❌ |
| skill2/ | ✅ | ❌ | ✅ | ❌ | ✅ |

## Action Required

- Add error handling: 5 files
- Remove hardcoded paths: 3 files
- Add usage examples: 4 files
```

### Full Audit
See Phase 4: Report Generation above for full structure.
### Comparative (Full + Benchmarks)
```markdown
# Comparative Audit

## Project vs Templates

| File | Project Score | Template Score | Gap | Top Missing |
|---|---|---|---|---|
| debugging-specialist.md | 78% (C) | 94% (A) | -16 pts | Anti-hallucination, edge cases |
| testing-expert/ | 85% (B) | 91% (A) | -6 pts | Integration docs |

## Recommendations

Focus on these gaps to reach template quality:
- Anti-hallucination measures (8 files): Add source verification sections
- Edge case documentation (5 files): Add failure scenario examples
- Integration documentation (4 files): List compatible agents/skills
```

---

## Usage
### Basic (Full Audit)
```bash
# In Claude Code
Use skill: audit-agents-skills

# Specify path
Use skill: audit-agents-skills for ~/projects/my-app
```

### With Options

```bash
# Quick audit (fast)
Use skill: audit-agents-skills with mode=quick

# Comparative (benchmark analysis)
Use skill: audit-agents-skills with mode=comparative

# Generate fixes
Use skill: audit-agents-skills with fixes=true

# Custom output path
Use skill: audit-agents-skills with output=~/Desktop/audit.json
```

### JSON Output Only

```bash
# For programmatic integration
Use skill: audit-agents-skills with format=json output=audit.json
```

---

## Integration with CI/CD
### Pre-commit Hook
```bash
#!/bin/bash
# .git/hooks/pre-commit
# Run quick audit on changed agent/skill/command files

changed_files=$(git diff --cached --name-only | grep -E "^\.claude/(agents|skills|commands)/")
if [ -n "$changed_files" ]; then
  echo "Running quick audit on changed files..."
  # Run audit (requires Claude Code CLI wrapper)
  # Exit with 1 if any file scores <80%
fi
```

### GitHub Actions
```yaml
name: Audit Agents/Skills
on: [pull_request]
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run quality audit
        run: |
          # Run audit skill
          # Parse JSON output
          # Fail if overall_score < 80
```

## Comparison: Command vs Skill
| Aspect | Command ( | Skill (this file) |
|---|---|---|
| Scope | Current project only | Multi-project, comparative |
| Output | Markdown report | Markdown + JSON |
| Speed | Fast (5-10 min) | Slower (10-20 min with comparative) |
| Depth | Standard 16 criteria | Same + benchmark analysis |
| Fix suggestions | Via | Built-in with recommendations |
| Programmatic | Terminal output | JSON for CI/CD integration |
| Best for | Quick checks, dev workflow | Deep audits, quality tracking |
Recommendation: Use command for daily checks, skill for release gates and quality tracking.
## Maintenance

### Updating Criteria
Edit `scoring/criteria.yaml`:

```yaml
agents:
  categories:
    identity:
      criteria:
        - id: A1.5  # New criterion
          name: "API versioning specified"
          points: 3
          detection: "mentions API version or compatibility"
```

Version bump: Increment `version` in frontmatter when criteria change.

### Adding File Types
To support new file types (e.g., "workflows"):

- Add to `scoring/criteria.yaml`:

  ```yaml
  workflows:
    max_points: 24
    categories: [...]
  ```

- Update detection logic (file path patterns)
- Update report templates
## Related

- Command version: `.claude/commands/audit-agents-skills.md`
- Agent Validation Checklist: guide line 4921 (manual 16 criteria)
- Skill Validation: guide line 5491 (spec documentation)
- Reference templates: `examples/agents/`, `examples/skills/`, `examples/commands/`
## Changelog
v1.0.0 (2026-02-07):
- Initial release
- 16-criteria framework (agents/skills/commands)
- 3 audit modes (quick/full/comparative)
- JSON + Markdown output
- Fix suggestions
- Industry context (LangChain 2026 report)
Skill ready for use: `audit-agents-skills`