Judge

Quality review and evaluation skill that verifies completed work against defined criteria. Part of the two-tier multi-agent architecture where Judge evaluates worker output.

Contract

Inputs:
  • Completed work output (files, changes, artifacts)
  • Original acceptance criteria or success criteria
  • Context about what was attempted
Outputs:
  • Pass/fail determination
  • List of issues found (if any)
  • Recommendations for fixes (if failing)
Success Criteria:
  • All acceptance criteria evaluated
  • Clear pass/fail determination provided
  • Actionable feedback given for any failures
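
The contract above could be modeled as a few plain data types. This is only a sketch: the class and field names (`JudgeInput`, `JudgeOutput`, `Issue`, `satisfies_contract`) are hypothetical and not defined by the skill itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    CRITICAL = "Critical"
    MAJOR = "Major"
    MINOR = "Minor"


@dataclass
class Issue:
    description: str
    severity: Severity
    location: str  # file and line where the issue was found
    fix: str       # recommended action


@dataclass
class JudgeInput:
    work_output: list[str]          # paths of files, changes, or artifacts
    acceptance_criteria: list[str]  # original acceptance or success criteria
    context: str                    # what was attempted


@dataclass
class JudgeOutput:
    passed: bool
    issues: list[Issue] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)

    def satisfies_contract(self) -> bool:
        """A failing result must list issues and recommended fixes."""
        return self.passed or bool(self.issues and self.recommendations)
```

Under this sketch, a failing result with no issues or recommendations would not count as a valid judge output, which mirrors the "actionable feedback given for any failures" criterion.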

When to Use

Invoke the judge skill:
  1. After atomic skill completion - Before marking work as done
  2. Before committing - Final quality gate
  3. After build/test phases - Verify implementation meets spec
  4. When reviewing generated code - Catch issues before integration

When NOT to Use

Skip the judge skill for:
  • Trivial changes - Single-line fixes, typo corrections
  • Mid-workflow - Don't interrupt atomic skills; judge at phase boundaries
  • Exploratory work - When user is iterating quickly and explicitly skipping review
  • User-requested skip - When user says "just do it" or "skip review"

Review Process

Step 1: Gather Context

Collect the materials needed for review:
  1. The output - What was produced (files, code, documents)
  2. The criteria - What was supposed to be achieved (acceptance criteria, spec)
  3. The scope - What was in/out of scope for this work

Step 2: Evaluate Against Criteria

For each acceptance criterion:
  1. Check if the criterion is met
  2. Note any partial completion
  3. Document evidence (file paths, line numbers, test results)
Use this evaluation format:

Review: [Work Description]

Criteria Evaluation

| Criterion | Status | Evidence |
|---|---|---|
| [Criterion 1] | PASS/FAIL/PARTIAL | [Evidence] |
| [Criterion 2] | PASS/FAIL/PARTIAL | [Evidence] |

Issues Found

  1. [Issue description]
    • Severity: Critical/Major/Minor
    • Location: [File/line]
    • Fix: [Recommended action]

Verdict

PASS / FAIL / PASS WITH NOTES
[Summary of decision]
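
As a rough illustration, the evaluation format above could be filled in programmatically. The names `CriterionResult` and `render_review` are hypothetical, and the pipe-delimited table layout is an assumption about how the criteria table is meant to look.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    criterion: str
    status: str    # "PASS", "FAIL", or "PARTIAL"
    evidence: str  # file path, line number, or test result


def render_review(description, results, issues, verdict, summary):
    """Build a plain-text review report in the evaluation format.

    `issues` is a list of (description, severity, location, fix) tuples.
    """
    lines = [f"Review: {description}", "", "Criteria Evaluation", "",
             "| Criterion | Status | Evidence |", "|---|---|---|"]
    for r in results:
        lines.append(f"| {r.criterion} | {r.status} | {r.evidence} |")
    lines += ["", "Issues Found", ""]
    for number, (text, severity, location, fix) in enumerate(issues, start=1):
        lines += [f"  {number}. {text}",
                  f"    • Severity: {severity}",
                  f"    • Location: {location}",
                  f"    • Fix: {fix}"]
    lines += ["", "Verdict", "", verdict, summary]
    return "\n".join(lines)
```

Keeping the evidence column mandatory in the data type is one way to make sure every status is backed by a file path, line number, or test result.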

Step 3: Apply Review Dimensions

Evaluate across these dimensions based on work type:

For Code Changes

| Dimension | Check |
|---|---|
| Correctness | Does it do what was specified? |
| Completeness | Are all criteria addressed? |
| Quality | No obvious bugs, edge cases handled? |
| Style | Follows project conventions? |
| Scope | No scope creep beyond criteria? |

For Document Generation

| Dimension | Check |
|---|---|
| Accuracy | Information is correct? |
| Completeness | All required sections present? |
| Format | Follows expected structure? |
| Clarity | Understandable to target audience? |

For Infrastructure Changes

| Dimension | Check |
|---|---|
| Functionality | Works as expected? |
| Security | No exposed secrets, proper permissions? |
| Idempotency | Can be run again safely? |
| Documentation | Changes documented? |

Severity Levels

| Level | Definition | Action |
|---|---|---|
| Critical | Blocks functionality, security issue, data loss risk | Must fix before proceeding |
| Major | Significant deviation from spec, poor UX | Should fix before commit |
| Minor | Style issues, minor improvements | Can note for future |

Verdicts

PASS

All criteria met, no critical/major issues. Work can proceed.

FAIL

Critical issues found OR acceptance criteria not met. Work must be revised.
Provide:
  • Specific issues with locations
  • Recommended fixes
  • Which criteria failed

PASS WITH NOTES

All criteria met, but minor issues noted. Work can proceed with awareness of noted items.
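
One possible way to derive a verdict from the criteria statuses and issue severities is sketched below. The function name `decide_verdict` is hypothetical. Two points are assumptions rather than rules stated above: a PARTIAL criterion is counted as not met, and a Major issue leads to FAIL (PASS requires no critical or major issues, and PASS WITH NOTES only covers minor ones).

```python
def decide_verdict(criteria_statuses, issue_severities):
    """Map criterion statuses and issue severities to a verdict string.

    criteria_statuses: iterable of "PASS", "FAIL", or "PARTIAL"
    issue_severities:  iterable of "Critical", "Major", or "Minor"
    """
    # Assumption: PARTIAL counts as "not met".
    all_met = all(status == "PASS" for status in criteria_statuses)
    severities = set(issue_severities)

    if not all_met or "Critical" in severities:
        return "FAIL"
    if "Major" in severities:
        # Assumption: the verdict definitions leave this case open.
        return "FAIL"
    if "Minor" in severities:
        return "PASS WITH NOTES"
    return "PASS"
```

A team that prefers to let Major issues through with notes could change only the second branch; the rest follows the verdict definitions directly.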

Integration with Orchestrators

When used in orchestrated workflows:
  1. Orchestrator invokes atomic skill - Work is produced
  2. Orchestrator invokes judge - Work is evaluated
  3. If PASS - Proceed to next phase
  4. If FAIL - Return to previous skill with feedback
This creates the planner/worker/judge pattern that scales.
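
The loop described above might look roughly like this. It is a sketch only: `run_phase`, the `worker` and `judge` callables, and the `max_attempts` limit are all hypothetical, and the source does not say how many retries an orchestrator should allow.

```python
from typing import Callable, Optional


def run_phase(
    worker: Callable[[Optional[str]], str],
    judge: Callable[[str], tuple[str, str]],
    max_attempts: int = 3,  # assumption: the skill defines no retry limit
) -> tuple[str, int]:
    """Run a worker, judge its output, and retry with feedback on FAIL.

    worker(feedback) returns the work output; feedback is None on the
    first attempt. judge(output) returns (verdict, feedback).
    Returns the accepted output and the number of attempts used.
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        output = worker(feedback)
        verdict, feedback = judge(output)
        # PASS and PASS WITH NOTES both let the work proceed.
        if verdict != "FAIL":
            return output, attempt
    raise RuntimeError(f"work still failing after {max_attempts} attempts")
```

Passing the judge's feedback back into the worker is what turns a FAIL into a revision step instead of a dead end.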

Quick Review Checklist

For rapid reviews, use this checklist:

Quick Review

  • All acceptance criteria addressed
  • No obvious bugs or errors
  • Follows project conventions
  • No scope creep
  • Ready to commit/proceed
Verdict: PASS / FAIL
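
The quick checklist can also be expressed as a simple all-or-nothing check. This is a minimal sketch, assuming that any unchecked item means FAIL; `QUICK_CHECKS` and `quick_review` are hypothetical names.

```python
QUICK_CHECKS = (
    "All acceptance criteria addressed",
    "No obvious bugs or errors",
    "Follows project conventions",
    "No scope creep",
    "Ready to commit/proceed",
)


def quick_review(checked):
    """Return (verdict, unchecked items) for a set of checked item names.

    Assumption: the verdict is PASS only when every item is checked.
    """
    missing = [item for item in QUICK_CHECKS if item not in checked]
    return ("PASS" if not missing else "FAIL"), missing
```

Returning the unchecked items alongside the verdict keeps even a rapid review actionable.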

References

See references/review-criteria.md for detailed review criteria by skill type.