# assess


Comprehensive assessment skill for answering "is this good?" with structured evaluation, scoring, and actionable recommendations.

## Quick Start


```bash
/assess backend/app/services/auth.py
/assess our caching strategy
/assess the current database schema
/assess frontend/src/components/Dashboard
```


## STEP 0: Verify User Intent with AskUserQuestion


BEFORE creating tasks, clarify assessment dimensions:

```python
AskUserQuestion(
  questions=[{
    "question": "What dimensions to assess?",
    "header": "Dimensions",
    "options": [
      {"label": "Full assessment (Recommended)", "description": "All dimensions: quality, maintainability, security, performance"},
      {"label": "Code quality only", "description": "Readability, complexity, best practices"},
      {"label": "Security focus", "description": "Vulnerabilities, attack surface, compliance"},
      {"label": "Quick score", "description": "Just give me a 0-10 score with brief notes"}
    ],
    "multiSelect": False
  }]
)
```
Based on the answer, adjust the workflow:

- **Full assessment:** All 7 phases, parallel agents
- **Code quality only:** Skip the security and performance phases
- **Security focus:** Prioritize the security-auditor agent
- **Quick score:** Single pass, brief output
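The answer-to-workflow mapping above can be sketched as a simple lookup (the configuration keys are illustrative, not a defined schema):

```python
# Map each AskUserQuestion option label to workflow adjustments (illustrative).
WORKFLOWS = {
    "Full assessment (Recommended)": {"phases": "all", "parallel_agents": 6},
    "Code quality only":             {"skip_phases": ["security", "performance"]},
    "Security focus":                {"priority_agent": "security-auditor"},
    "Quick score":                   {"single_pass": True, "brief_output": True},
}

def workflow_for(answer: str) -> dict:
    # Fall back to the full assessment for unrecognized answers.
    return WORKFLOWS.get(answer, WORKFLOWS["Full assessment (Recommended)"])
```

Defaulting to the full assessment keeps an unexpected answer from silently skipping dimensions.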


## STEP 0b: Select Orchestration Mode


Choose Agent Teams (mesh: assessors cross-validate scores) or the Task tool (star: all assessors report to the lead):

1. `ORCHESTKIT_PREFER_TEAMS=1` → Agent Teams mode
2. Agent Teams unavailable → Task tool mode (default)
3. Otherwise: full assessment with 6 dimension agents → recommend Agent Teams; quick score or single dimension → Task tool

| Aspect | Task Tool | Agent Teams |
|---|---|---|
| Score calibration | Lead normalizes independently | Assessors discuss disagreements |
| Cross-dimension findings | Lead correlates after completion | Security assessor alerts performance assessor of overlap |
| Cost | ~200K tokens | ~500K tokens |
| Best for | Quick scores, single dimension | Full multi-dimensional assessment |

**Fallback:** If Agent Teams encounters issues, fall back to the Task tool for the remaining assessment.
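The decision order above can be sketched as a small helper; only the `ORCHESTKIT_PREFER_TEAMS` variable comes from this skill, while the `teams_available` flag stands in for a hypothetical availability check:

```python
import os

def select_mode(full_assessment: bool, teams_available: bool) -> str:
    """Pick an orchestration mode following the decision order above (sketch)."""
    if not teams_available:
        return "task-tool"  # default when Agent Teams is unavailable
    if os.environ.get("ORCHESTKIT_PREFER_TEAMS") == "1":
        return "agent-teams"
    # Full 6-dimension assessments benefit from cross-validation; quick scores
    # and single dimensions stay cheaper with the star topology.
    return "agent-teams" if full_assessment else "task-tool"
```

A quick-score run with teams available but no preference flag would then resolve to the Task tool.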


## Task Management (CC 2.1.16)


Create the main assessment task:

```python
TaskCreate(
    subject="Assess: {target}",
    description="Comprehensive evaluation with quality scores and recommendations",
    activeForm="Assessing {target}",
)
```

Create subtasks for the 7-phase process:

```python
for phase in ["Understand target", "Rate quality", "List pros/cons",
              "Compare alternatives", "Generate suggestions",
              "Estimate effort", "Compile report"]:
    TaskCreate(subject=phase, activeForm=f"Working on: {phase}")
```

---

## What This Skill Answers


| Question | How It's Answered |
|---|---|
| "Is this good?" | Quality score 0-10 with reasoning |
| "What are the trade-offs?" | Structured pros/cons list |
| "Should we change this?" | Improvement suggestions with effort |
| "What are the alternatives?" | Comparison with scores |
| "Where should we focus?" | Prioritized recommendations |


## Workflow Overview


| Phase | Activities | Output |
|---|---|---|
| 1. Target Understanding | Read code/design, identify scope | Context summary |
| 2. Quality Rating | 6-dimension scoring (0-10) | Scores with reasoning |
| 3. Pros/Cons Analysis | Strengths and weaknesses | Balanced evaluation |
| 4. Alternative Comparison | Score alternatives | Comparison matrix |
| 5. Improvement Suggestions | Actionable recommendations | Prioritized list |
| 6. Effort Estimation | Time and complexity estimates | Effort breakdown |
| 7. Assessment Report | Compile findings | Final report |


## Phase 1: Target Understanding


Identify what's being assessed (code, design, approach, decision, pattern) and gather context:

```python
# PARALLEL - gather context
Read(file_path="$ARGUMENTS")                                  # if a file path was given
Grep(pattern="$ARGUMENTS", output_mode="files_with_matches")  # related files
mcp__memory__search_nodes(query="$ARGUMENTS")                 # past decisions
```

---

## Phase 2: Quality Rating (6 Dimensions)


Rate each dimension 0-10 and compute a weighted composite score. See the Scoring Rubric for details.

| Dimension | Weight | What It Measures |
|---|---|---|
| Correctness | 0.20 | Does it work correctly? |
| Maintainability | 0.20 | Easy to understand/modify? |
| Performance | 0.15 | Efficient, no bottlenecks? |
| Security | 0.15 | Follows best practices? |
| Scalability | 0.15 | Handles growth? |
| Testability | 0.15 | Easy to test? |

**Composite Score:** Weighted average of all dimensions.

Launch 6 parallel agents (one per dimension) with `run_in_background=True`.
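Because the weights sum to 1.0, the composite is a plain weighted sum of the per-dimension scores; a minimal sketch:

```python
# Dimension weights from the table above (sum to 1.0).
WEIGHTS = {
    "correctness": 0.20, "maintainability": 0.20, "performance": 0.15,
    "security": 0.15, "scalability": 0.15, "testability": 0.15,
}

def composite_score(scores: dict) -> float:
    """Weighted average of per-dimension 0-10 scores, rounded to one decimal."""
    return round(sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS), 1)

scores = {"correctness": 8, "maintainability": 7, "performance": 6,
          "security": 9, "scalability": 7, "testability": 5}
composite = composite_score(scores)  # weighted sum is 7.05 before rounding
```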

## Phase 2 Alternative: Agent Teams


In Agent Teams mode, form an assessment team where dimension assessors cross-validate scores and discuss disagreements:

```python
TeamCreate(team_name="assess-{target-slug}", description="Assess {target}")

Task(subagent_type="code-quality-reviewer", name="correctness-assessor",
     team_name="assess-{target-slug}",
     prompt="""Assess CORRECTNESS (0-10) and MAINTAINABILITY (0-10) for: {target}
     When you find issues that affect security, message security-assessor.
     When you find issues that affect performance, message perf-assessor.
     Share your scores with all teammates for calibration; if scores diverge
     significantly (>2 points), discuss the disagreement.""")

Task(subagent_type="security-auditor", name="security-assessor",
     team_name="assess-{target-slug}",
     prompt="""Assess SECURITY (0-10) for: {target}
     When correctness-assessor flags security-relevant patterns, investigate deeper.
     When you find performance-impacting security measures, message perf-assessor.
     Share your score and flag any cross-dimension trade-offs.""")

Task(subagent_type="performance-engineer", name="perf-assessor",
     team_name="assess-{target-slug}",
     prompt="""Assess PERFORMANCE (0-10) and SCALABILITY (0-10) for: {target}
     When security-assessor flags performance trade-offs, evaluate the impact.
     When you find testability issues (hard-to-benchmark code), message test-assessor.
     Share your scores with reasoning for the composite calculation.""")

Task(subagent_type="test-generator", name="test-assessor",
     team_name="assess-{target-slug}",
     prompt="""Assess TESTABILITY (0-10) for: {target}
     Evaluate test coverage, test quality, and ease of testing.
     When other assessors flag dimension-specific concerns, verify test coverage
     for those areas. Share your score and any coverage gaps found.""")
```

Team teardown after report compilation:

```python
SendMessage(type="shutdown_request", recipient="correctness-assessor", content="Assessment complete")
SendMessage(type="shutdown_request", recipient="security-assessor", content="Assessment complete")
SendMessage(type="shutdown_request", recipient="perf-assessor", content="Assessment complete")
SendMessage(type="shutdown_request", recipient="test-assessor", content="Assessment complete")
TeamDelete()
```

**Fallback:** If team formation fails, use the standard Phase 2 Task spawns above.


## Phase 3: Pros/Cons Analysis

```markdown
### Pros (Strengths)

| # | Strength | Impact | Evidence |
|---|----------|--------|----------|
| 1 | [strength] | High/Med/Low | [example] |

### Cons (Weaknesses)

| # | Weakness | Severity | Evidence |
|---|----------|----------|----------|
| 1 | [weakness] | High/Med/Low | [example] |

**Net Assessment:** [Strengths outweigh / Balanced / Weaknesses dominate]
**Recommended action:** [Keep as-is / Improve / Reconsider / Rewrite]
```

---

## Phase 4: Alternative Comparison


See Alternative Analysis for the full comparison template.

| Criteria | Current | Alternative A | Alternative B |
|----------|---------|---------------|---------------|
| Composite | [N.N] | [N.N] | [N.N] |
| Migration Effort | N/A | [1-5] | [1-5] |

## Phase 5: Improvement Suggestions


See Improvement Prioritization for effort/impact guidelines.

| Suggestion | Effort (1-5) | Impact (1-5) | Priority (I/E) |
|------------|--------------|--------------|----------------|
| [action] | [N] | [N] | [ratio] |

**Quick Wins** = Effort <= 2 AND Impact >= 4. Always highlight these first.
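The Priority (I/E) ratio and the quick-win rule can be sketched as follows (the `Suggestion` type is an illustrative stand-in for a table row, not part of the skill):

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    action: str
    effort: int  # 1-5 (implementation cost)
    impact: int  # 1-5 (expected benefit)

    @property
    def priority(self) -> float:
        """Priority (I/E): impact divided by effort; higher ranks first."""
        return self.impact / self.effort

def quick_wins(suggestions: list) -> list:
    # Quick Wins = Effort <= 2 AND Impact >= 4; always surfaced first.
    return [s for s in suggestions if s.effort <= 2 and s.impact >= 4]

items = [
    Suggestion("Add input validation", effort=1, impact=5),
    Suggestion("Refactor the service layer", effort=4, impact=4),
]
wins = quick_wins(items)  # only "Add input validation" qualifies
```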

## Phase 6: Effort Estimation


| Timeframe | Tasks | Total |
|-----------|-------|-------|
| Quick wins (< 1 hr) | [list] | X min |
| Short-term (< 1 day) | [list] | X hrs |
| Medium-term (1-3 days) | [list] | X days |

## Phase 7: Assessment Report


See the Scoring Rubric for the full report template.

```markdown
# Assessment Report: $ARGUMENTS

**Overall Score:** [N.N]/10 (Grade: [A+/A/B/C/D/F])
**Verdict:** [EXCELLENT | GOOD | ADEQUATE | NEEDS WORK | CRITICAL]

## Answer: Is This Good?

[YES / MOSTLY / SOMEWHAT / NO]
[Reasoning]
```

---

## Grade Interpretation


| Score | Grade | Verdict |
|-------|-------|---------|
| 9.0-10.0 | A+ | EXCELLENT |
| 8.0-8.9 | A | GOOD |
| 7.0-7.9 | B | GOOD |
| 6.0-6.9 | C | ADEQUATE |
| 5.0-5.9 | D | NEEDS WORK |
| 0.0-4.9 | F | CRITICAL |
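The score-to-grade mapping above is a simple threshold lookup; a minimal sketch:

```python
# (minimum score, grade, verdict) rows from the Grade Interpretation table.
GRADE_BANDS = [
    (9.0, "A+", "EXCELLENT"),
    (8.0, "A", "GOOD"),
    (7.0, "B", "GOOD"),
    (6.0, "C", "ADEQUATE"),
    (5.0, "D", "NEEDS WORK"),
    (0.0, "F", "CRITICAL"),
]

def grade(score: float):
    """Return (grade, verdict) for a 0-10 composite score."""
    for floor, letter, verdict in GRADE_BANDS:
        if score >= floor:
            return letter, verdict
    return "F", "CRITICAL"  # scores below 0 clamp to the lowest band
```

For example, a composite of 7.4 maps to grade B with verdict GOOD.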


## Key Decisions


| Decision | Choice | Rationale |
|----------|--------|-----------|
| 6 dimensions | Comprehensive coverage | All quality aspects without overwhelming |
| 0-10 scale | Industry standard | Easy to understand and compare |
| Parallel assessment | 6 agents | Fast, thorough evaluation |
| Effort/Impact scoring | 1-5 scale | Simple prioritization math |


## Related Skills


- **assess-complexity** - Task complexity assessment
- **verify** - Post-implementation verification
- **code-review-playbook** - Code review patterns
- **quality-gates** - Quality gate patterns

**Version:** 1.0.0 (January 2026)