# Evaluation Framework
## Overview
A generic framework for weighted scoring and threshold-based decision making. Provides reusable patterns for evaluating any artifact against configurable criteria with consistent scoring methodology.
This framework abstracts the common pattern of: define criteria → assign weights → score against criteria → apply thresholds → make decisions.
## When To Use
- Implementing quality gates or evaluation rubrics
- Building scoring systems for artifacts, proposals, or submissions
- Need consistent evaluation methodology across different domains
- Want threshold-based automated decision making
- Creating assessment tools with weighted criteria
## When NOT To Use
- Simple pass/fail without scoring needs
## Core Pattern
### 1. Define Criteria
```yaml
criteria:
  - name: criterion_name
    weight: 0.30  # 30% of total score
    description: What this measures
    scoring_guide:
      90-100: Exceptional
      70-89: Strong
      50-69: Acceptable
      30-49: Weak
      0-29: Poor
```
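A criteria definition like the one above can be mirrored in code. A minimal sketch, assuming a plain dataclass (the `Criterion` name and fields are illustrative, not part of the framework):

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    weight: float      # fraction of the total score; all weights sum to 1.0
    description: str

criteria = [
    Criterion("criterion_1", 0.30, "What this measures"),
    Criterion("criterion_2", 0.40, "Another aspect"),
    Criterion("criterion_3", 0.30, "A third aspect"),
]

# Catch a misconfigured rubric early: weights must sum to 1.0.
assert abs(sum(c.weight for c in criteria) - 1.0) < 1e-9
```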
### 2. Score Each Criterion
```python
scores = {
    "criterion_1": 85,  # out of 100
    "criterion_2": 92,
    "criterion_3": 78,
}
```
### 3. Calculate Weighted Total
```python
total = sum(score * weights[criterion] for criterion, score in scores.items())
```

Example: (85 × 0.30) + (92 × 0.40) + (78 × 0.30) = 85.7
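The worked example above can be checked directly (the criterion names are the placeholders from step 2):

```python
weights = {"criterion_1": 0.30, "criterion_2": 0.40, "criterion_3": 0.30}
scores = {"criterion_1": 85, "criterion_2": 92, "criterion_3": 78}

# Weighted total: (85 × 0.30) + (92 × 0.40) + (78 × 0.30)
total = sum(score * weights[criterion] for criterion, score in scores.items())
print(round(total, 1))  # 85.7
```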
### 4. Apply Decision Thresholds
```yaml
thresholds:
  80-100: Accept with priority
  60-79: Accept with conditions
  40-59: Review required
  20-39: Reject with feedback
  0-19: Reject
```
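A threshold table maps the numeric total to an action. One way to sketch the lookup in Python (the `decide` helper is hypothetical, not part of the framework). Note that closed integer bands like these leave gaps for fractional totals such as 79.5, so the sketch rounds the total first; half-open ranges are an alternative:

```python
def decide(total, thresholds):
    """Return the action for the band containing the rounded total."""
    for (low, high), action in thresholds.items():
        if low <= round(total) <= high:
            return action
    raise ValueError(f"total {total} falls outside every band")

thresholds = {
    (80, 100): "Accept with priority",
    (60, 79): "Accept with conditions",
    (40, 59): "Review required",
    (20, 39): "Reject with feedback",
    (0, 19): "Reject",
}

print(decide(85.7, thresholds))  # Accept with priority
```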
## Quick Start
### Define Your Evaluation
- Identify criteria: What aspects matter for your domain?
- Assign weights: Which criteria are most important? (sum to 1.0)
- Create scoring guides: What does each score range mean?
- Set thresholds: What total scores trigger which decisions?
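The four design steps can live in one configuration object. A hypothetical sketch for a generic two-criterion rubric (all names and numbers are illustrative):

```python
evaluation = {
    "criteria": {                        # step 1: identify criteria
        "clarity": {                     # step 2: assign weights
            "weight": 0.50,
            "guide": "90-100 exceptional, 0-29 poor",  # step 3: scoring guide
        },
        "accuracy": {
            "weight": 0.50,
            "guide": "90-100 exceptional, 0-29 poor",
        },
    },
    "thresholds": {                      # step 4: totals mapped to decisions
        (70, 100): "accept",
        (0, 69): "revise",
    },
}

weights = [c["weight"] for c in evaluation["criteria"].values()]
assert abs(sum(weights) - 1.0) < 1e-9  # sanity-check step 2
```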
### Example: Code Review Evaluation
```yaml
criteria:
  correctness: {weight: 0.40, description: Does the code work as intended?}
  maintainability: {weight: 0.25, description: Is it readable?}
  performance: {weight: 0.20, description: Does it meet performance needs?}
  testing: {weight: 0.15, description: Are the tests thorough?}
thresholds:
  85-100: Approve immediately
  70-84: Approve with minor feedback
  50-69: Request changes
  0-49: Reject, major issues
```
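Applying this rubric to a hypothetical pull request (the scores below are made up for illustration):

```python
weights = {"correctness": 0.40, "maintainability": 0.25,
           "performance": 0.20, "testing": 0.15}

# Illustrative review scores for one pull request.
scores = {"correctness": 90, "maintainability": 70,
          "performance": 80, "testing": 60}

total = sum(scores[criterion] * weight for criterion, weight in weights.items())
print(round(total, 1))  # 78.5 -> in the 70-84 band: approve with minor feedback
```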
## Evaluation Workflow
1. Review the artifact against each criterion
2. Assign a 0-100 score for each criterion
3. Calculate: total = Σ(score × weight)
4. Compare the total to the thresholds
5. Take the action for the matching threshold range
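The steps above can be sketched as one function; `evaluate` is a hypothetical helper, not part of the framework:

```python
def evaluate(scores, weights, thresholds):
    """Steps 2-5: weighted total, threshold comparison, resulting action."""
    total = sum(scores[criterion] * weight          # step 3
                for criterion, weight in weights.items())
    for (low, high), action in thresholds.items():  # step 4
        if low <= round(total) <= high:
            return total, action                    # step 5
    raise ValueError(f"no threshold band covers total {total}")

weights = {"quality": 0.5, "impact": 0.5}
thresholds = {(70, 100): "accept", (0, 69): "revise"}

total, action = evaluate({"quality": 80, "impact": 90}, weights, thresholds)
print(round(total), action)  # 85 accept
```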
## Common Use Cases
- **Quality Gates:** Code review, PR approval, release readiness
- **Content Evaluation:** Document quality, knowledge intake, skill assessment
- **Resource Allocation:** Backlog prioritization, investment decisions, triage
## Integration Pattern
In your skill's frontmatter:

```yaml
dependencies: [leyline:evaluation-framework]
```

Then customize the framework for your domain:

- Define domain-specific criteria
- Set appropriate weights for your context
- Establish meaningful thresholds
- Document what each score range means

## Detailed Resources
- Scoring Patterns: See `modules/scoring-patterns.md` for detailed methodology
- Decision Thresholds: See `modules/decision-thresholds.md` for threshold design
## Exit Criteria
- Criteria defined with clear descriptions
- Weights assigned and sum to 1.0
- Scoring guides documented for each criterion
- Thresholds mapped to specific actions
- Evaluation process documented and reproducible
## Troubleshooting

### Common Issues

- **Command not found:** Ensure all dependencies are installed and on the PATH
- **Permission errors:** Check file permissions and run with appropriate privileges
- **Unexpected behavior:** Enable verbose logging with the `--verbose` flag