eval-relevance
Eval Relevance
Use this skill to evaluate how relevant an assistant response is to the user’s request.
Inputs
Required:
- The assistant response text to evaluate.
- (Optional) The user’s original request for comparison.
Internal Rubric (1–5)
5 = Directly addresses the user’s request, stays fully on-topic, and prioritizes what the user actually asked
4 = Mostly relevant, minor digressions or small omissions
3 = Partially relevant, addresses the general topic but misses key parts of the request
2 = Weak relevance, significant digressions or failure to address the core request
1 = Not relevant, does not address the user’s request or answers a different question entirely
Workflow
- Compare the assistant response to the user’s request (if provided).
- Score relevance on a 1–5 integer scale using the rubric only.
- Write a concise rationale tied directly to the rubric criteria.
- Produce actionable suggestions that improve relevance.
Output Contract
Return JSON only. Do not include markdown, backticks, prose, or extra keys.
Use exactly this schema:
{
  "dimension": "relevance",
  "score": 1,
  "rationale": "...",
  "improvement_suggestions": [
    "..."
  ]
}
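A caller consuming this output may want to confirm that a reply really is bare JSON with exactly the four schema keys. The following is a minimal sketch in Python, not part of the skill itself; the names `parse_reply` and `EXPECTED_KEYS` are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical constant: the four keys named in the schema above.
EXPECTED_KEYS = {"dimension", "score", "rationale", "improvement_suggestions"}


def parse_reply(raw: str) -> dict:
    """Parse a raw reply and confirm it is a bare JSON object with exactly the schema keys."""
    try:
        # json.loads rejects markdown fences and trailing prose outright.
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not pure JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    keys = set(data)
    if keys != EXPECTED_KEYS:
        missing = sorted(EXPECTED_KEYS - keys)
        extra = sorted(keys - EXPECTED_KEYS)
        raise ValueError(f"key mismatch: missing={missing}, extra={extra}")
    return data
```

This sketch checks only the shape of the reply (JSON only, no extra or missing keys); constraints on the individual values are a separate concern.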
Hard Rules
- dimension must always equal "relevance".
- score must be an integer from 1 to 5.
- rationale must be concise (max 3 sentences).
- Do not include step-by-step reasoning.
- improvement_suggestions must be a non-empty array of concrete edits.
- Never output text outside the JSON object.
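Several of these rules can be checked mechanically on an already-parsed object. The sketch below is one possible approach in Python, under stated assumptions: `check_hard_rules` is a hypothetical helper, the sentence count is a rough punctuation-based heuristic, and requiring a non-empty rationale and string-valued suggestions are assumptions not stated in the rules. Whether the rationale contains step-by-step reasoning, or whether suggestions are "concrete", is not something this code tries to judge.

```python
import re
from typing import List


def check_hard_rules(data: dict) -> List[str]:
    """Return a list of rule violations; an empty list means no violation was detected."""
    problems = []

    if data.get("dimension") != "relevance":
        problems.append('dimension must equal "relevance"')

    score = data.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        problems.append("score must be an integer from 1 to 5")

    rationale = data.get("rationale")
    if not isinstance(rationale, str) or not rationale.strip():
        # Assumption: an empty rationale is treated as a violation.
        problems.append("rationale must be a non-empty string")
    else:
        # Heuristic: split after ., ! or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", rationale.strip()) if s]
        if len(sentences) > 3:
            problems.append("rationale must be at most 3 sentences")

    suggestions = data.get("improvement_suggestions")
    if (
        not isinstance(suggestions, list)
        or not suggestions
        # Assumption: each suggestion is a non-blank string.
        or not all(isinstance(s, str) and s.strip() for s in suggestions)
    ):
        problems.append("improvement_suggestions must be a non-empty array of strings")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every violation in a single pass.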