# Evaluation Framework
## Overview
A generic framework for weighted scoring and threshold-based decision making. Provides reusable patterns for evaluating any artifact against configurable criteria with consistent scoring methodology.
This framework abstracts the common pattern of: define criteria → assign weights → score against criteria → apply thresholds → make decisions.
## When To Use
- Implementing quality gates or evaluation rubrics
- Building scoring systems for artifacts, proposals, or submissions
- Need consistent evaluation methodology across different domains
- Want threshold-based automated decision making
- Creating assessment tools with weighted criteria
## When NOT To Use
- Simple pass/fail without scoring needs
## Core Pattern
### 1. Define Criteria
```yaml
criteria:
  - name: criterion_name
    weight: 0.30  # 30% of total score
    description: What this measures
    scoring_guide:
      90-100: Exceptional
      70-89: Strong
      50-69: Acceptable
      30-49: Weak
      0-29: Poor
```
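A criteria definition like the one above can be mirrored in code. A minimal sketch, assuming a plain dataclass (the `Criterion` name and fields are illustrative, not part of the framework):

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    weight: float      # fraction of the total score; all weights sum to 1.0
    description: str

criteria = [
    Criterion("criterion_1", 0.30, "What this measures"),
    Criterion("criterion_2", 0.40, "Another aspect"),
    Criterion("criterion_3", 0.30, "A third aspect"),
]

# Catch a misconfigured rubric early: weights must sum to 1.0.
assert abs(sum(c.weight for c in criteria) - 1.0) < 1e-9
```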
### 2. Score Each Criterion
```python
scores = {
    "criterion_1": 85,  # out of 100
    "criterion_2": 92,
    "criterion_3": 78,
}
```
### 3. Calculate Weighted Total
```python
total = sum(score * weights[criterion] for criterion, score in scores.items())
```

Example: (85 × 0.30) + (92 × 0.40) + (78 × 0.30) = 85.7
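The worked example above can be checked directly (the criterion names are the placeholders from step 2):

```python
weights = {"criterion_1": 0.30, "criterion_2": 0.40, "criterion_3": 0.30}
scores = {"criterion_1": 85, "criterion_2": 92, "criterion_3": 78}

# Weighted total: (85 × 0.30) + (92 × 0.40) + (78 × 0.30)
total = sum(score * weights[criterion] for criterion, score in scores.items())
print(round(total, 1))  # 85.7
```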
### 4. Apply Decision Thresholds
```yaml
thresholds:
  80-100: Accept with priority
  60-79: Accept with conditions
  40-59: Review required
  20-39: Reject with feedback
  0-19: Reject
```
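A threshold table maps the numeric total to an action. One way to sketch the lookup in Python (the `decide` helper is hypothetical, not part of the framework). Note that closed integer bands like these leave gaps for fractional totals such as 79.5, so the sketch rounds the total first; half-open ranges are an alternative:

```python
def decide(total, thresholds):
    """Return the action for the band containing the rounded total."""
    for (low, high), action in thresholds.items():
        if low <= round(total) <= high:
            return action
    raise ValueError(f"total {total} falls outside every band")

thresholds = {
    (80, 100): "Accept with priority",
    (60, 79): "Accept with conditions",
    (40, 59): "Review required",
    (20, 39): "Reject with feedback",
    (0, 19): "Reject",
}

print(decide(85.7, thresholds))  # Accept with priority
```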
## Quick Start
### Define Your Evaluation
- Identify criteria: What aspects matter for your domain?
- Assign weights: Which criteria are most important? (sum to 1.0)
- Create scoring guides: What does each score range mean?
- Set thresholds: What total scores trigger which decisions?
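The four design steps can live in one configuration object. A hypothetical sketch for a generic two-criterion rubric (all names and numbers are illustrative):

```python
evaluation = {
    "criteria": {                        # step 1: identify criteria
        "clarity": {                     # step 2: assign weights
            "weight": 0.50,
            "guide": "90-100 exceptional, 0-29 poor",  # step 3: scoring guide
        },
        "accuracy": {
            "weight": 0.50,
            "guide": "90-100 exceptional, 0-29 poor",
        },
    },
    "thresholds": {                      # step 4: totals mapped to decisions
        (70, 100): "accept",
        (0, 69): "revise",
    },
}

weights = [c["weight"] for c in evaluation["criteria"].values()]
assert abs(sum(weights) - 1.0) < 1e-9  # sanity-check step 2
```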
### Example: Code Review Evaluation
```yaml
criteria:
  correctness: {weight: 0.40, description: Does the code work as intended?}
  maintainability: {weight: 0.25, description: Is it readable?}
  performance: {weight: 0.20, description: Does it meet performance needs?}
  testing: {weight: 0.15, description: Are the tests thorough?}
thresholds:
  85-100: Approve immediately
  70-84: Approve with minor feedback
  50-69: Request changes
  0-49: Reject, major issues
```
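Applying this rubric to a hypothetical pull request (the scores below are made up for illustration):

```python
weights = {"correctness": 0.40, "maintainability": 0.25,
           "performance": 0.20, "testing": 0.15}

# Illustrative review scores for one pull request.
scores = {"correctness": 90, "maintainability": 70,
          "performance": 80, "testing": 60}

total = sum(scores[criterion] * weight for criterion, weight in weights.items())
print(round(total, 1))  # 78.5 -> in the 70-84 band: approve with minor feedback
```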
## Evaluation Workflow
1. Review the artifact against each criterion
2. Assign a 0-100 score for each criterion
3. Calculate: total = Σ(score × weight)
4. Compare the total to the thresholds
5. Take the action for the matching threshold range
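The steps above can be sketched as one function; `evaluate` is a hypothetical helper, not part of the framework:

```python
def evaluate(scores, weights, thresholds):
    """Steps 2-5: weighted total, threshold comparison, resulting action."""
    total = sum(scores[criterion] * weight          # step 3
                for criterion, weight in weights.items())
    for (low, high), action in thresholds.items():  # step 4
        if low <= round(total) <= high:
            return total, action                    # step 5
    raise ValueError(f"no threshold band covers total {total}")

weights = {"quality": 0.5, "impact": 0.5}
thresholds = {(70, 100): "accept", (0, 69): "revise"}

total, action = evaluate({"quality": 80, "impact": 90}, weights, thresholds)
print(round(total), action)  # 85 accept
```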
## Common Use Cases
- **Quality Gates:** Code review, PR approval, release readiness
- **Content Evaluation:** Document quality, knowledge intake, skill assessment
- **Resource Allocation:** Backlog prioritization, investment decisions, triage
## Integration Pattern
In your skill's frontmatter:

```yaml
dependencies: [leyline:evaluation-framework]
```

Then customize the framework for your domain:

- Define domain-specific criteria
- Set appropriate weights for your context
- Establish meaningful thresholds
- Document what each score range means

## Detailed Resources
- Scoring Patterns: See `modules/scoring-patterns.md` for detailed methodology
- Decision Thresholds: See `modules/decision-thresholds.md` for threshold design
## Exit Criteria
- Criteria defined with clear descriptions
- Weights assigned and sum to 1.0
- Scoring guides documented for each criterion
- Thresholds mapped to specific actions
- Evaluation process documented and reproducible
## Troubleshooting

### Common Issues

- **Command not found:** Ensure all dependencies are installed and on the PATH
- **Permission errors:** Check file permissions and run with appropriate privileges
- **Unexpected behavior:** Enable verbose logging with the `--verbose` flag