# Task Quality KPI Framework

## Overview

The Task Quality KPI Framework provides objective, quantitative metrics for evaluating task implementation quality.
**Key architecture:** KPIs are auto-generated by a hook. You read the results; you do not run scripts.
```
┌──────────────────────────────────────────────┐
│ HOOK (auto-executes)                         │
│   Trigger: PostToolUse on TASK-*.md          │
│   Script:  task-kpi-analyzer.py              │
│   Output:  TASK-XXX--kpi.json                │
├──────────────────────────────────────────────┤
│ SKILL / AGENT (reads output)                 │
│   Input:   TASK-XXX--kpi.json                │
│   Action:  Make evaluation decisions         │
└──────────────────────────────────────────────┘
```
## Why This Architecture?

| Problem | Solution |
|---|---|
| Skills can't execute scripts | Hook auto-runs on file save |
| Subjective `review_status` | Quantitative 0-10 scores |
| "Looks good to me" | Evidence-based evaluation |
| Binary pass/fail | Graduated quality levels |
## KPI File Location

After any task file modification, find KPI data at:
`docs/specs/[ID]/tasks/TASK-XXX--kpi.json`
## KPI Categories

```
┌────────────────────────────────────────────┐
│ OVERALL SCORE (0-10)                       │
├────────────────────────────────────────────┤
│ Spec Compliance (30%)                      │
│ ├── Acceptance Criteria Met (0-10)         │
│ ├── Requirements Coverage (0-10)           │
│ └── No Scope Creep (0-10)                  │
├────────────────────────────────────────────┤
│ Code Quality (25%)                         │
│ ├── Static Analysis (0-10)                 │
│ ├── Complexity (0-10)                      │
│ └── Patterns Alignment (0-10)              │
├────────────────────────────────────────────┤
│ Test Coverage (25%)                        │
│ ├── Unit Tests Present (0-10)              │
│ ├── Test/Code Ratio (0-10)                 │
│ └── Coverage Percentage (0-10)             │
├────────────────────────────────────────────┤
│ Contract Fulfillment (20%)                 │
│ ├── Provides Verified (0-10)               │
│ └── Expects Satisfied (0-10)               │
└────────────────────────────────────────────┘
```
### Category Weights

| Category | Weight | Why |
|---|---|---|
| Spec Compliance | 30% | Most important: did we build what was asked? |
| Code Quality | 25% | Technical excellence |
| Test Coverage | 25% | Verification and confidence |
| Contract Fulfillment | 20% | Integration with other tasks |
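As a worked example of how the weights combine, the overall score is the weighted sum of the four category scores. A minimal sketch (the category scores below are illustrative, not taken from a real KPI file):

```python
# Worked example: combining category scores into the overall score.
# The scores are illustrative inputs, not output from a real run.
weights = {
    "Spec Compliance": 0.30,
    "Code Quality": 0.25,
    "Test Coverage": 0.25,
    "Contract Fulfillment": 0.20,
}
scores = {
    "Spec Compliance": 8.5,
    "Code Quality": 7.0,
    "Test Coverage": 8.0,
    "Contract Fulfillment": 9.0,
}

# Overall = sum of (weight * score) over all categories
overall = sum(weights[c] * scores[c] for c in weights)
print(round(overall, 2))  # 8.1
```

With these inputs the weighted contributions are 2.55 + 1.75 + 2.0 + 1.8 = 8.1, which is the kind of value you see in the `overall_score` field.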
## When to Use

- Reading KPI data for task quality evaluation
- Understanding quality metrics and the scoring breakdown
- Deciding whether to iterate or approve based on quantitative data
- Integrating KPI checks into automated loops
- Generating evidence-based evaluation reports
## Instructions

### 1. Reading KPI Data (Primary Use)

DO NOT run scripts. Read the auto-generated file:

```markdown
Read the KPI file:
docs/specs/001-feature/tasks/TASK-001--kpi.json
```
### 2. Understanding the Data

The KPI file contains:

```json
{
  "task_id": "TASK-001",
  "evaluated_at": "2026-01-15T10:30:00Z",
  "overall_score": 8.2,
  "passed_threshold": true,
  "threshold": 7.5,
  "kpi_scores": [
    {
      "category": "Spec Compliance",
      "weight": 30,
      "score": 8.5,
      "weighted_score": 2.55,
      "metrics": {
        "acceptance_criteria_met": 9.0,
        "requirements_coverage": 8.0,
        "no_scope_creep": 8.5
      },
      "evidence": [
        "Acceptance criteria: 9/10 checked",
        "Requirements coverage: 8/10"
      ]
    }
  ],
  "recommendations": [
    "Code Quality: Moderate improvements possible"
  ],
  "summary": "Score: 8.2/10 - PASSED"
}
```
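When the overall score is low, the natural next step is to find the weakest category. A minimal sketch, assuming the JSON shape shown above (the inline `kpi_json` string stands in for a real `TASK-XXX--kpi.json` file):

```python
import json

# Minimal sketch: locate the weakest category in a KPI file.
# kpi_json is a stand-in for the contents of TASK-XXX--kpi.json.
kpi_json = """
{
  "overall_score": 8.2,
  "passed_threshold": true,
  "kpi_scores": [
    {"category": "Spec Compliance", "score": 8.5},
    {"category": "Code Quality", "score": 6.9}
  ],
  "recommendations": ["Code Quality: Moderate improvements possible"]
}
"""
kpi = json.loads(kpi_json)

# The lowest-scoring category is the natural fix target
weakest = min(kpi["kpi_scores"], key=lambda s: s["score"])
print(weakest["category"], weakest["score"])  # Code Quality 6.9
```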
### 3. Making Decisions

```
IF passed_threshold == true:
  → Task meets quality standards
  → Approve and proceed

IF passed_threshold == false:
  → Task needs improvement
  → Check recommendations for specific targets
  → Create fix specification
```
## Integration with Workflow

### In Task Review (evaluator-agent)

```markdown
## Review Process
1. Read KPI file: TASK-XXX--kpi.json
2. Extract overall_score and kpi_scores
3. Read the task file to validate
4. Generate evaluation report
5. Decide based on passed_threshold
```
### In agents_loop

```python
# Check whether the hook has generated a KPI file
kpi_path = spec_path / "tasks" / f"{task_id}--kpi.json"
if kpi_path.exists():
    kpi_data = json.loads(kpi_path.read_text())
    if kpi_data["passed_threshold"]:
        # Quality threshold met
        advance_state("update_done")
    else:
        # Needs more work
        fix_targets = kpi_data["recommendations"]
        create_fix_task(fix_targets)
        advance_state("fix")
else:
    # KPI not generated yet - task may not be implemented
    log_warning("No KPI data found")
```
### Multi-Iteration Loop

Instead of a fixed maximum of 3 retries, iterate until the quality threshold is met:

```
Iteration 1: Score 6.2 → FAILED → Fix: Improve test coverage
Iteration 2: Score 7.1 → FAILED → Fix: Refactor complex functions
Iteration 3: Score 7.8 → PASSED → Proceed
```

Each iteration updates the KPI file automatically on task save.
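The loop above can be sketched in Python. This is an illustrative control-flow skeleton, not the orchestrator's actual code: `read_kpi` is a hypothetical helper, and the simulated scores stand in for the hook re-running on each task save.

```python
# Sketch of iterate-until-threshold control flow. read_kpi is a
# hypothetical helper; simulated_scores stands in for hook output.
THRESHOLD = 7.5
simulated_scores = iter([6.2, 7.1, 7.8])

def read_kpi() -> float:
    return next(simulated_scores)

iteration = 0
score = read_kpi()
while score < THRESHOLD:
    iteration += 1
    # In a real loop: create a fix task from the recommendations,
    # re-run implementation, save the task file to re-trigger the hook.
    score = read_kpi()

print(f"Passed after {iteration} fix iterations with score {score}")
```

A production version would also cap `iteration` to avoid looping forever on a task that cannot reach the threshold.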
## Threshold Guidelines

| Score | Quality Level | Action |
|---|---|---|
| 9.0-10.0 | Exceptional | Approve; document best practices |
| 8.0-8.9 | Good | Approve with minor notes |
| 7.0-7.9 | Acceptable | Approve (if threshold is 7.5) |
| 6.0-6.9 | Below Standard | Request specific improvements |
| < 6.0 | Poor | Significant rework required |
### Recommended Thresholds

| Project Type | Threshold | Rationale |
|---|---|---|
| Production MVP | 8.0 | High quality required |
| Internal Tool | 7.0 | Good enough |
| Prototype | 6.0 | Functional over perfect |
| Critical System | 8.5 | No compromises |
## Metric Details

### Spec Compliance Metrics

#### Acceptance Criteria Met
- Calculates: `(checked_criteria / total_criteria) * 10`
- Source: Task file checkbox count
- Example: 9/10 checked = 9.0

#### Requirements Coverage
- Calculates: Count of REQ-IDs this task covers
- Source:
- Example: 4 requirements covered = 8.0

#### No Scope Creep
- Calculates: `(implemented_files / expected_files) * 10`
- Source: Task "Files to Create" vs. actual files
- Penalizes: Missing files or unexpected additions
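The Acceptance Criteria Met metric can be sketched as a checkbox count over the task markdown. This is a sketch of the stated formula, not the actual `task-kpi-analyzer.py` implementation, and the sample criteria are invented:

```python
import re

# Sketch: score acceptance criteria from markdown checkboxes.
# Illustrates the stated formula, not the real analyzer's logic.
task_md = """
## Acceptance Criteria
- [x] Endpoint returns 200 for valid input
- [x] Invalid input returns 400
- [ ] Errors are logged
"""

checked = len(re.findall(r"- \[x\]", task_md, re.IGNORECASE))
total = len(re.findall(r"- \[[ xX]\]", task_md))

# (checked_criteria / total_criteria) * 10
score = (checked / total) * 10 if total else 0.0
print(f"{checked}/{total} checked -> {score:.1f}")  # 2/3 checked -> 6.7
```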
### Code Quality Metrics

#### Static Analysis
- Java: Maven Checkstyle
- TypeScript: ESLint
- Python: ruff
- Score: 10 if the checks pass, 5 if issues are found

#### Complexity
- Calculates: Ratio of functions longer than 50 lines
- Score: `10 - (long_functions_ratio * 5)`
- Penalizes: Large, complex functions

#### Patterns Alignment
- Checks: Knowledge Graph patterns
- Source:
- Validates: Implementation follows project patterns
### Test Coverage Metrics

#### Unit Tests Present
- Calculates: Number of test files (2 test files = maximum score)
- Penalizes: Missing tests

#### Test/Code Ratio
- Calculates: `(test_count / code_count) * 10`
- A 1:1 ratio = 10/10
- Ideal: At least 1 test file per code file

#### Coverage Percentage
- Source: Coverage reports (JaCoCo, lcov, etc.)
- Calculates: Coverage percent divided by 10 (80% coverage = 8.0)
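The three test-coverage formulas can be sketched together. The file counts and coverage figure below are assumed inputs, and the capping of ratios at the maximum score is an interpretation of "2 test files = maximum score" and "1:1 ratio = 10/10":

```python
# Sketch of the test-coverage formulas above, with assumed inputs.
code_files, test_files = 4, 3
coverage_percent = 80.0

# Unit tests present: 2+ test files = maximum score (capped at 10)
unit_tests_score = min(test_files / 2, 1.0) * 10

# Test/code ratio: a 1:1 ratio scores 10/10 (capped at 10)
ratio_score = min(test_files / code_files, 1.0) * 10

# Coverage percentage: 80% coverage = 8.0
coverage_score = coverage_percent / 10

print(unit_tests_score, ratio_score, coverage_score)  # 10.0 7.5 8.0
```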
### Contract Fulfillment Metrics

#### Provides Verified
- Checks: Files exist and export expected symbols
- Source: Task frontmatter
- Validates: Contract satisfied

#### Expects Satisfied
- Checks: Dependencies provide required files/symbols
- Source: Task frontmatter
- Validates: Prerequisites met
## When KPI File is Missing

Possible reasons:

- Task was never modified (the hook runs on file save)
- Hook failed (check the Claude Code logs)
- Task is new (save the file first to trigger the hook)

DO NOT try to calculate KPIs manually. The hook runs automatically when:

- The task file is saved (Write tool)
- The task file is edited (Edit tool)
## Best Practices

### 1. Always Check KPI File Exists

Before evaluating:

```markdown
Check if KPI file exists:
docs/specs/[ID]/tasks/TASK-XXX--kpi.json

If missing:
- Task may not be implemented yet
- Ask user to save the task file first
```
### 2. Trust the Metrics

The KPIs are objective. Only override them with documented evidence, such as:

- A critical security issue not captured by the metrics
- A logic error not caught by static analysis
- Exceptional quality that the metrics do not measure
### 3. Iterate on Low KPIs

Target specific categories:

```
❌ "Fix code quality issues"
✅ "Improve Code Quality KPI from 5.2 to 7.0:
    - Complexity: Refactor processData() (5→8)
    - Patterns: Add error handling (6→8)"
```
### 4. Track KPI Trends

Monitor quality over time:

```
Sprint 1: Average KPI 6.8
Sprint 2: Average KPI 7.3 (+0.5)
Sprint 3: Average KPI 7.9 (+0.6)
```
## Troubleshooting

### KPI File Not Generated

Check:

- Hook enabled in
- The task file name matches the pattern
- The file was actually saved (not just viewed)

### KPI Scores Seem Wrong

Validate:

- Check the `evidence` field for data sources
- Verify files exist at the expected paths
- Some metrics require build tools (Maven, npm)

### Low Scores Despite Good Code

Possible causes:

- Missing test files
- No coverage report generated
- Acceptance criteria not checked off
- Lint rules too strict

Fix the root cause, not just the score.
## Examples

### Example 1: Reading KPI Data

```markdown
Read the KPI file to evaluate task quality:
docs/specs/001-feature/tasks/TASK-042--kpi.json

Based on the data:
- Overall score: 6.8/10 (below threshold)
- Lowest KPI: Test Coverage (5.0/10)
- Recommendation: Add unit tests

Decision: REQUEST FIXES - target Test Coverage improvement
```
### Example 2: Iteration Decision

```markdown
Iteration 1 KPI: Score 6.2 → FAILED
- Spec Compliance: 7.0 ✓
- Code Quality: 5.5 ✗
- Test Coverage: 6.0 ✗

Fix targets:
1. Refactor complex functions (Code Quality)
2. Add test coverage (Test Coverage)

Iteration 2 KPI: Score 7.8 → PASSED ✓
```
### Example 3: agents_loop Integration

```python
# In agents_loop, after the implementation step
kpi_file = spec_dir / "tasks" / f"{task_id}--kpi.json"
if kpi_file.exists():
    kpi = json.loads(kpi_file.read_text())
    if kpi["passed_threshold"]:
        print(f"✅ Task passed quality check: {kpi['overall_score']}/10")
        advance_state("update_done")
    else:
        print(f"❌ Task failed quality check: {kpi['overall_score']}/10")
        print("Recommendations:")
        for rec in kpi["recommendations"]:
            print(f"  - {rec}")
        advance_state("fix")
```
## References

- Agent that uses KPI data for evaluation
- Hook configuration for auto-generation
- Hook script (do not execute directly)
- Orchestrator that reads KPI for decisions