eval-relevance

Eval Relevance

Use this skill to evaluate how relevant an assistant response is to the user’s request.

Inputs

Require:
  • The assistant response text to evaluate.
  • (Optional) The user’s original request for comparison.

Internal Rubric (1–5)

5 = Directly addresses the user’s request, stays fully on-topic, and prioritizes what the user actually asked
4 = Mostly relevant, minor digressions or small omissions
3 = Partially relevant, addresses the general topic but misses key parts of the request
2 = Weak relevance, significant digressions or failure to address the core request
1 = Not relevant, does not address the user’s request or answers a different question entirely

Workflow

  1. Compare the assistant response to the user’s request (if provided).
  2. Score relevance on a 1–5 integer scale using the rubric only.
  3. Write a concise rationale tied directly to the rubric criteria.
  4. Produce actionable suggestions that improve relevance.

Output Contract

Return JSON only. Do not include markdown, backticks, prose, or extra keys.
Use exactly this schema:
{ "dimension": "relevance", "score": 1, "rationale": "...", "improvement_suggestions": [ "..." ] }
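As a minimal sketch of what a conforming result looks like, the schema can be assembled and serialized in Python. The helper name build_relevance_output and the sample rationale and suggestion are hypothetical, chosen only for illustration; only the four keys and the fixed "relevance" value come from the schema itself.

```python
import json


def build_relevance_output(score, rationale, suggestions):
    """Assemble the four-key object from the schema and serialize it.

    Hypothetical helper: the key names and the fixed "relevance" value
    come from the schema; everything else is illustrative.
    """
    payload = {
        "dimension": "relevance",
        "score": score,
        "rationale": rationale,
        "improvement_suggestions": list(suggestions),
    }
    # json.dumps emits a bare JSON object: no markdown, no backticks.
    return json.dumps(payload, ensure_ascii=False)


# Illustrative values only; they are not taken from a real evaluation.
example = build_relevance_output(
    3,
    "Covers the general topic but skips the pricing part of the request.",
    ["Answer the pricing question directly in the first paragraph."],
)
print(example)
```

Building the object from a fixed set of keys, rather than editing free-form text, is one simple way to avoid emitting extra keys by accident.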

Hard Rules

  • dimension must always equal "relevance".
  • score must be an integer from 1 to 5.
  • rationale must be concise (max 3 sentences).
  • Do not include step-by-step reasoning.
  • improvement_suggestions must be a non-empty array of concrete edits.
  • Never output text outside the JSON object.
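
The mechanically checkable rules above could be enforced with a small validator along these lines. This is a sketch under stated assumptions, not part of the skill: the function name validate_relevance_output is hypothetical, and the sentence count is a rough punctuation-based heuristic. Rules such as "concrete edits" and "no step-by-step reasoning" need human or model judgment and are not checked here.

```python
import json
import re

REQUIRED_KEYS = {"dimension", "score", "rationale", "improvement_suggestions"}


def validate_relevance_output(raw):
    """Return a list of rule violations; an empty list means the output passes.

    Hypothetical helper illustrating the hard rules; not an official checker.
    """
    try:
        # json.loads rejects backticks, prose, or any text around the object.
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    errors = []
    if set(data) != REQUIRED_KEYS:
        errors.append("keys must be exactly: " + ", ".join(sorted(REQUIRED_KEYS)))
    if data.get("dimension") != "relevance":
        errors.append('dimension must equal "relevance"')

    score = data.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        errors.append("score must be an integer from 1 to 5")

    rationale = data.get("rationale")
    if not isinstance(rationale, str) or not rationale.strip():
        errors.append("rationale must be a non-empty string")
    else:
        # Heuristic (assumption): split after ., ! or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", rationale.strip()) if s]
        if len(sentences) > 3:
            errors.append("rationale must be at most 3 sentences")

    suggestions = data.get("improvement_suggestions")
    if (
        not isinstance(suggestions, list)
        or not suggestions
        or not all(isinstance(s, str) and s.strip() for s in suggestions)
    ):
        errors.append("improvement_suggestions must be a non-empty array of strings")
    return errors
```

Returning a list of violations instead of raising on the first one makes it easier to report every broken rule in a single pass.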