task-quality-kpi
Task Quality KPI Framework
Overview
The Task Quality KPI Framework provides objective, quantitative metrics for evaluating task implementation quality.
Key Architecture: KPIs are auto-generated by a hook. You read the results; you never run the scripts yourself.
┌─────────────────────────────────────────────────────────────┐
│ HOOK (auto-executes) │
│ Trigger: PostToolUse on TASK-*.md │
│ Script: task-kpi-analyzer.py │
│ Output: TASK-XXX--kpi.json │
├─────────────────────────────────────────────────────────────┤
│ SKILL / AGENT (reads output) │
│ Input: TASK-XXX--kpi.json │
│ Action: Make evaluation decisions │
└─────────────────────────────────────────────────────────────┘
Why This Architecture?
| Problem | Solution |
|---|---|
| Skills can't execute scripts | Hook auto-runs on file save |
| Subjective review_status | Quantitative 0-10 scores |
| "Looks good to me" | Evidence-based evaluation |
| Binary pass/fail | Graduated quality levels |
KPI File Location
After any task file modification, find KPI data at:
docs/specs/[ID]/tasks/TASK-XXX--kpi.json
KPI Categories
┌─────────────────────────────────────────────────────────────┐
│ OVERALL SCORE (0-10) │
├─────────────────────────────────────────────────────────────┤
│ Spec Compliance (30%) │
│ ├── Acceptance Criteria Met (0-10) │
│ ├── Requirements Coverage (0-10) │
│ └── No Scope Creep (0-10) │
├─────────────────────────────────────────────────────────────┤
│ Code Quality (25%) │
│ ├── Static Analysis (0-10) │
│ ├── Complexity (0-10) │
│ └── Patterns Alignment (0-10) │
├─────────────────────────────────────────────────────────────┤
│ Test Coverage (25%) │
│ ├── Unit Tests Present (0-10) │
│ ├── Test/Code Ratio (0-10) │
│ └── Coverage Percentage (0-10) │
├─────────────────────────────────────────────────────────────┤
│ Contract Fulfillment (20%) │
│ ├── Provides Verified (0-10) │
│ └── Expects Satisfied (0-10) │
└─────────────────────────────────────────────────────────────┘
Category Weights
| Category | Weight | Why |
|---|---|---|
| Spec Compliance | 30% | Most important - did we build what was asked? |
| Code Quality | 25% | Technical excellence |
| Test Coverage | 25% | Verification and confidence |
| Contract Fulfillment | 20% | Integration with other tasks |
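To make the weighting concrete, here is a sketch of how the analyzer plausibly combines category scores; the hook's task-kpi-analyzer.py remains the source of truth.

```python
# Hypothetical reimplementation of the weighting table above (a sketch,
# not the analyzer's actual code).
WEIGHTS = {
    "Spec Compliance": 30,
    "Code Quality": 25,
    "Test Coverage": 25,
    "Contract Fulfillment": 20,
}

def overall_score(category_scores: dict) -> float:
    """Weighted sum of per-category 0-10 scores, rounded to one decimal."""
    return round(sum(category_scores[c] * w / 100 for c, w in WEIGHTS.items()), 1)

print(overall_score({"Spec Compliance": 8.5, "Code Quality": 8.0,
                     "Test Coverage": 8.0, "Contract Fulfillment": 8.2}))  # 8.2
```

Note that 8.5 × 30% contributes 2.55, which matches the weighted_score field that appears in generated KPI files.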
When to Use
- Reading KPI data for task quality evaluation
- Understanding quality metrics and scoring breakdown
- Deciding whether to iterate or approve based on quantitative data
- Integrating KPI checks into automated loops (agents_loop.py)
- Generating evidence-based evaluation reports
Instructions
1. Reading KPI Data (Primary Use)
DO NOT run scripts - read the auto-generated file:
```markdown
Read the KPI file:
docs/specs/001-feature/tasks/TASK-001--kpi.json
```
2. Understanding the Data
The KPI file contains:
```json
{
  "task_id": "TASK-001",
  "evaluated_at": "2026-01-15T10:30:00Z",
  "overall_score": 8.2,
  "passed_threshold": true,
  "threshold": 7.5,
  "kpi_scores": [
    {
      "category": "Spec Compliance",
      "weight": 30,
      "score": 8.5,
      "weighted_score": 2.55,
      "metrics": {
        "acceptance_criteria_met": 9.0,
        "requirements_coverage": 8.0,
        "no_scope_creep": 8.5
      },
      "evidence": [
        "Acceptance criteria: 9/10 checked",
        "Requirements coverage: 8/10"
      ]
    }
  ],
  "recommendations": [
    "Code Quality: Moderate improvements possible"
  ],
  "summary": "Score: 8.2/10 - PASSED"
}
```
3. Making Decisions
Use overall_score and passed_threshold:
IF passed_threshold == true:
→ Task meets quality standards
→ Approve and proceed
IF passed_threshold == false:
→ Task needs improvement
→ Check recommendations for specific targets
→ Create fix specification
Integration with Workflow
In Task Review (evaluator-agent)
```markdown
Review Process
- Read KPI file: TASK-XXX--kpi.json
- Extract overall_score and kpi_scores
- Read task file to validate
- Generate evaluation report
- Decision based on passed_threshold
```
In agents_loop
```python
# Check KPI file exists
kpi_path = spec_path / "tasks" / f"{task_id}--kpi.json"
if kpi_path.exists():
    kpi_data = json.loads(kpi_path.read_text())
    if kpi_data["passed_threshold"]:
        # Quality threshold met
        advance_state("update_done")
    else:
        # Need more work
        fix_targets = kpi_data["recommendations"]
        create_fix_task(fix_targets)
        advance_state("fix")
else:
    # KPI not generated yet - task may not be implemented
    log_warning("No KPI data found")
```
Multi-Iteration Loop
Instead of capping at a maximum of 3 retries, iterate until the quality threshold is met:
Iteration 1: Score 6.2 → FAILED → Fix: Improve test coverage
Iteration 2: Score 7.1 → FAILED → Fix: Refactor complex functions
Iteration 3: Score 7.8 → PASSED → Proceed
Each iteration updates the KPI file automatically on task save.
Threshold Guidelines
| Score | Quality Level | Action |
|---|---|---|
| 9.0-10.0 | Exceptional | Approve, document best practices |
| 8.0-8.9 | Good | Approve with minor notes |
| 7.0-7.9 | Acceptable | Approve (if threshold 7.5) |
| 6.0-6.9 | Below Standard | Request specific improvements |
| < 6.0 | Poor | Significant rework required |
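The banding above translates directly into code; quality_level() is a possible helper, not part of the framework:

```python
def quality_level(score: float) -> str:
    """Map a 0-10 overall score to the quality levels in the table above."""
    if score >= 9.0:
        return "Exceptional"
    if score >= 8.0:
        return "Good"
    if score >= 7.0:
        return "Acceptable"
    if score >= 6.0:
        return "Below Standard"
    return "Poor"

print(quality_level(8.2))  # Good
```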
Recommended Thresholds
| Project Type | Threshold | Rationale |
|---|---|---|
| Production MVP | 8.0 | High quality required |
| Internal Tool | 7.0 | Good enough |
| Prototype | 6.0 | Functional over perfect |
| Critical System | 8.5 | No compromises |
Metric Details
Spec Compliance Metrics
Acceptance Criteria Met
- Calculates: (checked_criteria / total_criteria) * 10
- Source: Task file checkbox count
- Example: 9/10 checked = 9.0
Requirements Coverage
- Calculates: Count of REQ-IDs this task covers
- Source: traceability-matrix.md
- Example: 4 requirements covered = 8.0
No Scope Creep
- Calculates: (implemented_files / expected_files) * 10
- Source: Task "Files to Create" vs actual files
- Penalizes: Missing files or unexpected additions
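For example, the checkbox-based metric can be approximated by counting markdown task checkboxes; this is a sketch, and the hook's actual parsing may differ:

```python
import re

def acceptance_criteria_score(task_markdown: str) -> float:
    """(checked_criteria / total_criteria) * 10 from markdown checkboxes."""
    checked = len(re.findall(r"- \[[xX]\]", task_markdown))
    unchecked = len(re.findall(r"- \[ \]", task_markdown))
    total = checked + unchecked
    return round(checked / total * 10, 1) if total else 0.0

task = "\n".join(["- [x] criterion"] * 9 + ["- [ ] criterion"])
print(acceptance_criteria_score(task))  # 9.0
```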
Code Quality Metrics
Static Analysis
- Java: Maven Checkstyle
- TypeScript: ESLint
- Python: ruff
- Score: 10 if passes, 5 if issues found
Complexity
- Calculates: Functions >50 lines
- Score: 10 - (long_functions_ratio * 5)
- Penalizes: Large, complex functions
Patterns Alignment
- Checks: Knowledge Graph patterns
- Source: knowledge-graph.json
- Validates: Implementation follows project patterns
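The complexity metric can be sketched with the standard ast module; this approximates the rule described above and is not the analyzer's actual implementation:

```python
import ast

def complexity_score(source: str) -> float:
    """10 - (long_functions_ratio * 5), where a 'long' function exceeds 50 lines."""
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    if not funcs:
        return 10.0
    long_funcs = sum(1 for f in funcs if (f.end_lineno - f.lineno + 1) > 50)
    return round(10 - (long_funcs / len(funcs)) * 5, 1)
```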
Test Coverage Metrics
Unit Tests Present
- Calculates: min(10, test_files * 5)
- 2 test files = maximum score
- Penalizes: Missing tests
Test/Code Ratio
- Calculates: (test_count / code_count) * 10
- 1:1 ratio = 10/10
- Ideal: At least 1 test file per code file
Coverage Percentage
- Source: Coverage reports (JaCoCo, lcov, etc.)
- Calculates: coverage_percent / 10
- 80% coverage = 8.0
Contract Fulfillment Metrics
Provides Verified
- Checks: Files exist and export expected symbols
- Source: Task frontmatter provides
- Validates: Contract satisfied
Expects Satisfied
- Checks: Dependencies provide required files/symbols
- Source: Task frontmatter expects
- Validates: Prerequisites met
When KPI File is Missing
If TASK-XXX--kpi.json doesn't exist:
- Task was never modified - Hook runs on file save
- Hook failed - Check Claude Code logs
- Task is new - Save the file first to trigger hook
DO NOT try to calculate KPIs manually. The hook runs automatically when:
- Task file is saved (Write tool)
- Task file is edited (Edit tool)
Best Practices
1. Always Check KPI File Exists
Before evaluating:
```markdown
Check if KPI file exists:
docs/specs/[ID]/tasks/TASK-XXX--kpi.json
If missing:
- Task may not be implemented yet
- Ask user to save the task file first
```
2. Trust the Metrics
The KPIs are objective. Only override with documented evidence:
- Critical security issue not in metrics
- Logic error not caught by static analysis
- Exceptional quality not measured
3. Iterate on Low KPIs
Target specific categories:
❌ "Fix code quality issues"
✅ "Improve Code Quality KPI from 5.2 to 7.0:
- Complexity: Refactor processData() (5→8)
- Patterns: Add error handling (6→8)"
4. Track KPI Trends
Monitor quality over time:
Sprint 1: Average KPI 6.8
Sprint 2: Average KPI 7.3 (+0.5)
Sprint 3: Average KPI 7.9 (+0.6)
Troubleshooting
KPI File Not Generated
Check:
- Hook enabled in hooks.json
- Task file name matches pattern TASK-*.md
- File was actually saved (not just viewed)
KPI Scores Seem Wrong
Validate:
- Check evidence field for data sources
- Verify files exist at expected paths
- Some metrics need build tools (Maven, npm)
Low Scores Despite Good Code
Possible causes:
- Missing test files
- No coverage report generated
- Acceptance criteria not checked
- Lint rules too strict
Fix the root cause, not just the score.
Examples
Example 1: Reading KPI Data
```markdown
Read the KPI file to evaluate task quality:
docs/specs/001-feature/tasks/TASK-042--kpi.json
Based on the data:
- Overall score: 6.8/10 (below threshold)
- Lowest KPI: Test Coverage (5.0/10)
- Recommendation: Add unit tests
Decision: REQUEST FIXES - target Test Coverage improvement
```
Example 2: Iteration Decision
```markdown
Iteration 1 KPI: Score 6.2 → FAILED
- Spec Compliance: 7.0 ✓
- Code Quality: 5.5 ✗
- Test Coverage: 6.0 ✗
Fix targets:
1. Refactor complex functions (Code Quality)
2. Add test coverage (Test Coverage)
Iteration 2 KPI: Score 7.8 → PASSED ✓
```
Example 3: agents_loop Integration
```python
# In agents_loop, after the implementation step
kpi_file = spec_dir / "tasks" / f"{task_id}--kpi.json"
if kpi_file.exists():
    kpi = json.loads(kpi_file.read_text())
    if kpi["passed_threshold"]:
        print(f"✅ Task passed quality check: {kpi['overall_score']}/10")
        advance_state("update_done")
    else:
        print(f"❌ Task failed quality check: {kpi['overall_score']}/10")
        print("Recommendations:")
        for rec in kpi["recommendations"]:
            print(f"  - {rec}")
        advance_state("fix")
```
References
- evaluator-agent.md - Agent that uses KPI data for evaluation
- hooks.json - Hook configuration for auto-generation
- task-kpi-analyzer.py - Hook script (do not execute directly)
- agents_loop.py - Orchestrator that reads KPI for decisions