Assess
Comprehensive assessment skill for answering "is this good?" with structured evaluation, scoring, and actionable recommendations.
Quick Start
```bash
/assess backend/app/services/auth.py
/assess our caching strategy
/assess the current database schema
/assess frontend/src/components/Dashboard
```

STEP 0: Verify User Intent with AskUserQuestion
BEFORE creating tasks, clarify assessment dimensions:
```python
AskUserQuestion(
    questions=[{
        "question": "What dimensions to assess?",
        "header": "Dimensions",
        "options": [
            {"label": "Full assessment (Recommended)", "description": "All dimensions: quality, maintainability, security, performance"},
            {"label": "Code quality only", "description": "Readability, complexity, best practices"},
            {"label": "Security focus", "description": "Vulnerabilities, attack surface, compliance"},
            {"label": "Quick score", "description": "Just give me a 0-10 score with brief notes"}
        ],
        "multiSelect": False
    }]
)
```

Based on the answer, adjust the workflow:
- Full assessment: All 7 phases, parallel agents
- Code quality only: Skip the security and performance dimensions
- Security focus: Prioritize the security-auditor agent
- Quick score: Single pass, brief output
STEP 0b: Select Orchestration Mode
Choose Agent Teams (mesh: assessors cross-validate scores) or the Task tool (star: all assessors report to the lead):
- `ORCHESTKIT_PREFER_TEAMS=1` → Agent Teams mode
- Agent Teams unavailable → Task tool mode (default)
- Otherwise: Full assessment with 6 dimension agents → recommend Agent Teams; Quick score or single-dimension → Task tool
| Aspect | Task Tool | Agent Teams |
|---|---|---|
| Score calibration | Lead normalizes independently | Assessors discuss disagreements |
| Cross-dimension findings | Lead correlates after completion | Security assessor alerts performance assessor of overlap |
| Cost | ~200K tokens | ~500K tokens |
| Best for | Quick scores, single dimension | Full multi-dimensional assessment |
Fallback: If Agent Teams encounters issues, fall back to the Task tool for the remaining assessment.
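The mode-selection rules above amount to a small decision procedure. The following is a minimal Python sketch. The `choose_mode` name, its parameters, and the returned strings are hypothetical. It checks availability before the environment variable, which is an assumption: Agent Teams cannot be used when unavailable, whatever the preference.

```python
import os
from collections.abc import Mapping


def choose_mode(full_assessment: bool, teams_available: bool,
                env: Mapping[str, str] = os.environ) -> str:
    """Return "agent-teams" or "task-tool" following the STEP 0b rules (sketch)."""
    if not teams_available:
        return "task-tool"  # Task tool is the default when Agent Teams is unavailable
    if env.get("ORCHESTKIT_PREFER_TEAMS") == "1":
        return "agent-teams"  # explicit preference via environment variable
    if full_assessment:
        return "agent-teams"  # multi-dimension runs benefit from cross-validation
    return "task-tool"  # quick score or single-dimension assessment


print(choose_mode(full_assessment=True, teams_available=True, env={}))  # agent-teams
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.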
Task Management (CC 2.1.16)
```python
# Create main assessment task
TaskCreate(
    subject="Assess: {target}",
    description="Comprehensive evaluation with quality scores and recommendations",
    activeForm="Assessing {target}"
)

# Create subtasks for the 7-phase process.
# activeForm is spelled out per phase: appending "ing" to the subject
# would produce "Understand targeting", "Rate qualitying", and so on.
phases = [
    ("Understand target", "Understanding target"),
    ("Rate quality", "Rating quality"),
    ("List pros/cons", "Listing pros/cons"),
    ("Compare alternatives", "Comparing alternatives"),
    ("Generate suggestions", "Generating suggestions"),
    ("Estimate effort", "Estimating effort"),
    ("Compile report", "Compiling report"),
]
for subject, active_form in phases:
    TaskCreate(subject=subject, activeForm=active_form)
```

---
What This Skill Answers
| Question | How It's Answered |
|---|---|
| "Is this good?" | Quality score 0-10 with reasoning |
| "What are the trade-offs?" | Structured pros/cons list |
| "Should we change this?" | Improvement suggestions with effort |
| "What are the alternatives?" | Comparison with scores |
| "Where should we focus?" | Prioritized recommendations |
Workflow Overview
| Phase | Activities | Output |
|---|---|---|
| 1. Target Understanding | Read code/design, identify scope | Context summary |
| 2. Quality Rating | 6-dimension scoring (0-10) | Scores with reasoning |
| 3. Pros/Cons Analysis | Strengths and weaknesses | Balanced evaluation |
| 4. Alternative Comparison | Score alternatives | Comparison matrix |
| 5. Improvement Suggestions | Actionable recommendations | Prioritized list |
| 6. Effort Estimation | Time and complexity estimates | Effort breakdown |
| 7. Assessment Report | Compile findings | Final report |
Phase 1: Target Understanding
Identify what's being assessed (code, design, approach, decision, pattern) and gather context:
```python
# PARALLEL - Gather context
Read(file_path="$ARGUMENTS")  # If $ARGUMENTS is a file path
Grep(pattern="$ARGUMENTS", output_mode="files_with_matches")
mcp__memory__search_nodes(query="$ARGUMENTS")  # Past decisions
```

---
Phase 2: Quality Rating (6 Dimensions)
Rate each dimension 0-10, then combine the ratings into a weighted composite score. See Scoring Rubric for details.
| Dimension | Weight | What It Measures |
|---|---|---|
| Correctness | 0.20 | Does it work correctly? |
| Maintainability | 0.20 | Easy to understand/modify? |
| Performance | 0.15 | Efficient, no bottlenecks? |
| Security | 0.15 | Follows best practices? |
| Scalability | 0.15 | Handles growth? |
| Testability | 0.15 | Easy to test? |
Composite Score: Weighted average of all dimensions.
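The composite is a plain weighted average using the weights in the table above, which sum to 1.0. The following is a minimal Python sketch that assumes every dimension has already been scored 0-10. The `WEIGHTS` keys and the `composite_score` name are illustrative and are not part of the skill.

```python
# Weights from the Phase 2 dimension table; they sum to 1.0.
WEIGHTS = {
    "correctness": 0.20,
    "maintainability": 0.20,
    "performance": 0.15,
    "security": 0.15,
    "scalability": 0.15,
    "testability": 0.15,
}


def composite_score(scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (hypothetical helper).

    Expects one 0-10 score for every dimension in WEIGHTS.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} score {value} is outside 0-10")
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    # Round to one decimal to match the "[N.N]/10" report format.
    return round(total, 1)


example = {
    "correctness": 8, "maintainability": 7, "performance": 6,
    "security": 9, "scalability": 6, "testability": 5,
}
print(composite_score(example))  # 6.9
```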
Launch 6 parallel agents (one per dimension) with `run_in_background=True`.

Phase 2: Agent Teams Alternative
In Agent Teams mode, form an assessment team where dimension assessors cross-validate scores and discuss disagreements:
```python
TeamCreate(team_name="assess-{target-slug}", description="Assess {target}")

Task(subagent_type="code-quality-reviewer", name="correctness-assessor",
     team_name="assess-{target-slug}",
     prompt="""Assess CORRECTNESS (0-10) and MAINTAINABILITY (0-10) for: {target}
When you find issues that affect security, message security-assessor.
When you find issues that affect performance, message perf-assessor.
Share your scores with all teammates for calibration. If scores diverge
significantly (>2 points), discuss the disagreement.""")

Task(subagent_type="security-auditor", name="security-assessor",
     team_name="assess-{target-slug}",
     prompt="""Assess SECURITY (0-10) for: {target}
When correctness-assessor flags security-relevant patterns, investigate deeper.
When you find performance-impacting security measures, message perf-assessor.
Share your score and flag any cross-dimension trade-offs.""")

Task(subagent_type="performance-engineer", name="perf-assessor",
     team_name="assess-{target-slug}",
     prompt="""Assess PERFORMANCE (0-10) and SCALABILITY (0-10) for: {target}
When security-assessor flags performance trade-offs, evaluate the impact.
When you find testability issues (hard-to-benchmark code), message test-assessor.
Share your scores with reasoning for the composite calculation.""")

Task(subagent_type="test-generator", name="test-assessor",
     team_name="assess-{target-slug}",
     prompt="""Assess TESTABILITY (0-10) for: {target}
Evaluate test coverage, test quality, and ease of testing.
When other assessors flag dimension-specific concerns, verify test coverage
for those areas. Share your score and any coverage gaps found.""")
```

Team teardown after report compilation:

```python
SendMessage(type="shutdown_request", recipient="correctness-assessor", content="Assessment complete")
SendMessage(type="shutdown_request", recipient="security-assessor", content="Assessment complete")
SendMessage(type="shutdown_request", recipient="perf-assessor", content="Assessment complete")
SendMessage(type="shutdown_request", recipient="test-assessor", content="Assessment complete")
TeamDelete()
```

Fallback: If team formation fails, use the standard Phase 2 Task spawns above.
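The lead can check the calibration rule in the assessor prompts ("if scores diverge significantly (>2 points)") mechanically. The following is a minimal sketch. It assumes the lead collects each assessor's score for one dimension into a dict. `find_divergent_pairs` and the sample data are hypothetical.

```python
from itertools import combinations

# Threshold from the assessor prompts: a gap of more than 2 points is significant.
DIVERGENCE_THRESHOLD = 2.0


def find_divergent_pairs(scores_by_assessor: dict[str, float],
                         threshold: float = DIVERGENCE_THRESHOLD) -> list[tuple[str, str, float]]:
    """Return (assessor_a, assessor_b, gap) for each pair whose scores differ by more than threshold."""
    divergent = []
    # Sort by assessor name so the output order is deterministic.
    for (a, score_a), (b, score_b) in combinations(sorted(scores_by_assessor.items()), 2):
        gap = abs(score_a - score_b)
        if gap > threshold:  # strictly greater, matching ">2 points"
            divergent.append((a, b, gap))
    return divergent


# Hypothetical: two assessors both scored SECURITY.
security_scores = {"security-assessor": 4.0, "correctness-assessor": 7.0}
print(find_divergent_pairs(security_scores))
```

Scores exactly 2 points apart are not flagged, because the prompts ask for discussion only when the gap exceeds 2.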
Phase 3: Pros/Cons Analysis
```markdown
Pros (Strengths)

| # | Strength | Impact | Evidence |
|---|---|---|---|
| 1 | [strength] | High/Med/Low | [example] |

Cons (Weaknesses)

| # | Weakness | Severity | Evidence |
|---|---|---|---|
| 1 | [weakness] | High/Med/Low | [example] |

Net Assessment: [Strengths outweigh / Balanced / Weaknesses dominate]
Recommended action: [Keep as-is / Improve / Reconsider / Rewrite]
```

---
Phase 4: Alternative Comparison
See Alternative Analysis for the full comparison template.
| Criteria | Current | Alternative A | Alternative B |
|---|---|---|---|
| Composite | [N.N] | [N.N] | [N.N] |
| Migration Effort | N/A | [1-5] | [1-5] |
Phase 5: Improvement Suggestions
See Improvement Prioritization for effort/impact guidelines.
| Suggestion | Effort (1-5) | Impact (1-5) | Priority (I/E) |
|---|---|---|---|
| [action] | [N] | [N] | [ratio] |
Quick Wins = Effort <= 2 AND Impact >= 4. Always highlight these first.
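The Priority column is the impact/effort ratio, and the quick-win rule is a simple filter. The following is a minimal Python sketch that assumes suggestions are kept as small records. The `Suggestion` class, `prioritize`, and the sample actions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    action: str
    effort: int  # 1-5, higher means more work
    impact: int  # 1-5, higher means more benefit

    def __post_init__(self) -> None:
        for name in ("effort", "impact"):
            if not 1 <= getattr(self, name) <= 5:
                raise ValueError(f"{name} must be between 1 and 5")

    @property
    def priority(self) -> float:
        # Priority (I/E) column: impact divided by effort.
        return self.impact / self.effort

    @property
    def is_quick_win(self) -> bool:
        # Quick Wins = Effort <= 2 AND Impact >= 4.
        return self.effort <= 2 and self.impact >= 4


def prioritize(suggestions: list[Suggestion]) -> list[Suggestion]:
    """Quick wins first, then the rest by descending I/E ratio."""
    return sorted(suggestions, key=lambda s: (not s.is_quick_win, -s.priority))


items = [
    Suggestion("Split auth service into modules", effort=4, impact=4),
    Suggestion("Add input validation", effort=1, impact=5),
    Suggestion("Add missing unit tests", effort=2, impact=3),
]
for s in prioritize(items):
    print(f"{s.action}: I/E={s.priority:.2f} quick_win={s.is_quick_win}")
```

Sorting quick wins ahead of everything else follows the "always highlight these first" rule, even when a non-quick-win item has a higher ratio.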
Phase 6: Effort Estimation
| Timeframe | Tasks | Total |
|---|---|---|
| Quick wins (< 1hr) | [list] | X min |
| Short-term (< 1 day) | [list] | X hrs |
| Medium-term (1-3 days) | [list] | X days |
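Sorting estimated tasks into these timeframes is a threshold lookup. The following is a minimal Python sketch with three assumptions:

- Estimates are in hours.
- "Day" means an 8-hour working day. The skill does not state this.
- Tasks over 3 days fall outside the table, so they go under a hypothetical "Long-term" label.

`timeframe` and `totals` are illustrative names.

```python
# Assumption: "day" in the table means an 8-hour working day.
HOURS_PER_DAY = 8


def timeframe(hours: float) -> str:
    """Map an effort estimate in hours to a Phase 6 timeframe row."""
    if hours <= 0:
        raise ValueError("estimate must be positive")
    if hours < 1:
        return "Quick wins (< 1hr)"
    if hours < HOURS_PER_DAY:
        return "Short-term (< 1 day)"
    if hours <= 3 * HOURS_PER_DAY:
        return "Medium-term (1-3 days)"
    return "Long-term (> 3 days)"  # hypothetical bucket, not in the table


def totals(estimates: dict[str, float]) -> dict[str, float]:
    """Sum estimated hours per timeframe, for filling the Total column."""
    result: dict[str, float] = {}
    for hours in estimates.values():
        bucket = timeframe(hours)
        result[bucket] = result.get(bucket, 0) + hours
    return result


print(totals({"Add input validation": 0.5, "Refactor token refresh": 6, "Split auth module": 16}))
```

The sketch sums everything in hours. Converting each total to the table's minutes, hours, or days is left to the report.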
Phase 7: Assessment Report
See Scoring Rubric for the full report template.
```markdown
Assessment Report: $ARGUMENTS

Overall Score: [N.N]/10 (Grade: [A+/A/B/C/D/F])
Verdict: [EXCELLENT | GOOD | ADEQUATE | NEEDS WORK | CRITICAL]

Answer: Is This Good?
[YES / MOSTLY / SOMEWHAT / NO]
[Reasoning]
```

---
Grade Interpretation
| Score | Grade | Verdict |
|---|---|---|
| 9.0-10.0 | A+ | EXCELLENT |
| 8.0-8.9 | A | GOOD |
| 7.0-7.9 | B | GOOD |
| 6.0-6.9 | C | ADEQUATE |
| 5.0-5.9 | D | NEEDS WORK |
| 0.0-4.9 | F | CRITICAL |
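The grade table works as a descending threshold lookup on the composite score. The following is a minimal Python sketch, where `GRADE_BANDS` and `grade_for` are hypothetical names. Comparing against each band's lower bound means a score that falls between printed rows, such as 8.95, still gets a grade (A).

```python
# (minimum score, grade, verdict), checked from highest to lowest.
GRADE_BANDS = [
    (9.0, "A+", "EXCELLENT"),
    (8.0, "A", "GOOD"),
    (7.0, "B", "GOOD"),
    (6.0, "C", "ADEQUATE"),
    (5.0, "D", "NEEDS WORK"),
    (0.0, "F", "CRITICAL"),
]


def grade_for(score: float) -> tuple[str, str]:
    """Return (grade, verdict) for a composite score on the 0-10 scale."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score {score} is outside 0-10")
    for minimum, grade, verdict in GRADE_BANDS:
        if score >= minimum:
            return grade, verdict
    raise AssertionError("unreachable: the 0.0 band matches every valid score")


print(grade_for(6.9))  # ('C', 'ADEQUATE')
```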
Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| 6 dimensions | Comprehensive coverage | Covers all quality aspects without being overwhelming |
| 0-10 scale | Industry standard | Easy to understand and compare |
| Parallel assessment | 6 agents | Fast, thorough evaluation |
| Effort/Impact scoring | 1-5 scale | Simple prioritization math |
Related Skills
- `assess-complexity` - Task complexity assessment
- `verify` - Post-implementation verification
- `code-review-playbook` - Code review patterns
- `quality-gates` - Quality gate patterns

Version: 1.0.0 (January 2026)