Judge

Quality review and evaluation skill that verifies completed work against defined criteria. Part of the two-tier multi-agent architecture where Judge evaluates worker output.

Contract

Inputs:

Completed work output (files, changes, artifacts)
Original acceptance criteria or success criteria
Context about what was attempted

Outputs:

Pass/fail determination
List of issues found (if any)
Recommendations for fixes (if failing)

Success Criteria:

All acceptance criteria evaluated
Clear pass/fail determination provided
Actionable feedback given for any failures
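The contract above can be sketched as a pair of data structures plus a check of the judge's own success criteria. This is a minimal illustration, not the skill's actual interface: the class names, field names, and the `meets_contract` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class JudgeInput:
    """What the judge receives (field names are illustrative)."""
    work_output: list[str]          # files, changes, or artifacts produced
    acceptance_criteria: list[str]  # original acceptance or success criteria
    context: str = ""               # what was attempted


@dataclass
class JudgeOutput:
    """What the judge returns (field names are illustrative)."""
    passed: bool                                              # pass/fail determination
    issues: list[str] = field(default_factory=list)           # issues found, if any
    recommendations: list[str] = field(default_factory=list)  # fixes, if failing
    evaluated_criteria: list[str] = field(default_factory=list)


def meets_contract(inp: JudgeInput, out: JudgeOutput) -> bool:
    """Check the judge's own success criteria for a single review."""
    # Every acceptance criterion must have been evaluated.
    all_evaluated = set(inp.acceptance_criteria) <= set(out.evaluated_criteria)
    # A failing review must carry actionable feedback.
    feedback_ok = out.passed or bool(out.recommendations)
    return all_evaluated and feedback_ok
```

Because `passed` is a plain boolean, a clear pass/fail determination is always present; the helper only has to check criteria coverage and feedback.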

When to Use

Invoke the judge skill:

After atomic skill completion - Before marking work as done
Before committing - Final quality gate
After build/test phases - Verify implementation meets spec
When reviewing generated code - Catch issues before integration

When NOT to Use

Skip the judge skill for:

Trivial changes - Single-line fixes, typo corrections
Mid-workflow - Don't interrupt atomic skills; judge at phase boundaries
Exploratory work - When user is iterating quickly and explicitly skipping review
User-requested skip - When user says "just do it" or "skip review"

Review Process

Step 1: Gather Context

Collect the materials needed for review:

The output - What was produced (files, code, documents)
The criteria - What was supposed to be achieved (acceptance criteria, spec)
The scope - What was in/out of scope for this work

Step 2: Evaluate Against Criteria

For each acceptance criterion:

Check if the criterion is met
Note any partial completion
Document evidence (file paths, line numbers, test results)

Use this evaluation format:

Review: [Work Description]

Criteria Evaluation

Criterion	Status	Evidence
[Criterion 1]	PASS/FAIL/PARTIAL	[Evidence]
[Criterion 2]	PASS/FAIL/PARTIAL	[Evidence]

Issues Found

[Issue description]
- Severity: Critical/Major/Minor
- Location: [File/line]
- Fix: [Recommended action]

Verdict

PASS / FAIL / PASS WITH NOTES

[Summary of decision]
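As one way to fill in the evaluation format above programmatically, the following sketch renders a report from structured data. It assumes the report is emitted as Markdown text; the function name `render_review`, its parameters, the dictionary keys, and the "None" placeholder for an empty issue list are hypothetical choices made for illustration.

```python
def render_review(work, criteria, issues, verdict, summary):
    """Render a review report as Markdown text.

    criteria: list of (criterion, status, evidence) tuples
    issues:   list of dicts with keys description, severity, location, fix
    """
    lines = [
        f"## Review: {work}", "",
        "### Criteria Evaluation", "",
        "| Criterion | Status | Evidence |",
        "|---|---|---|",
    ]
    for criterion, status, evidence in criteria:
        lines.append(f"| {criterion} | {status} | {evidence} |")

    lines += ["", "### Issues Found", ""]
    for number, issue in enumerate(issues, start=1):
        lines.append(f"{number}. {issue['description']}")
        lines.append(f"   - Severity: {issue['severity']}")
        lines.append(f"   - Location: {issue['location']}")
        lines.append(f"   - Fix: {issue['fix']}")
    if not issues:
        lines.append("None")  # placeholder when nothing was found (an assumption)

    lines += ["", "### Verdict", "", verdict, "", summary]
    return "\n".join(lines)
```

A call such as `render_review("Add login", [("Form validates email", "PASS", "tests/test_form.py")], [], "PASS", "All criteria met.")` yields a report whose sections follow the same order as the template.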

Step 3: Apply Review Dimensions

Evaluate across these dimensions based on work type:

For Code Changes

Dimension	Check
Correctness	Does it do what was specified?
Completeness	Are all criteria addressed?
Quality	No obvious bugs, edge cases handled?
Style	Follows project conventions?
Scope	No scope creep beyond criteria?

For Document Generation

Dimension	Check
Accuracy	Information is correct?
Completeness	All required sections present?
Format	Follows expected structure?
Clarity	Understandable to target audience?

For Infrastructure Changes

Dimension	Check
Functionality	Works as expected?
Security	No exposed secrets, proper permissions?
Idempotency	Can be run again safely?
Documentation	Changes documented?

Severity Levels

Level	Definition	Action
Critical	Blocks functionality, security issue, data loss risk	Must fix before proceeding
Major	Significant deviation from spec, poor UX	Should fix before commit
Minor	Style issues, minor improvements	Can note for future

Verdicts

PASS

All criteria met, no critical/major issues. Work can proceed.

FAIL

Critical issues found OR acceptance criteria not met. Work must be revised.

Provide:

Specific issues with locations
Recommended fixes
Which criteria failed

PASS WITH NOTES

All criteria met, but minor issues noted. Work can proceed with awareness of noted items.
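The three verdicts and the severity levels can be combined into a single decision function. The sketch below is one possible reading, under two stated assumptions: a PARTIAL criterion counts as not met, and a Major issue blocks a pass even when all criteria are met (the text requires "no critical/major issues" for PASS but does not spell out this case). The names `Severity` and `decide_verdict` are hypothetical.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "Critical"
    MAJOR = "Major"
    MINOR = "Minor"


def decide_verdict(statuses, severities):
    """Map criterion statuses and issue severities to a verdict string.

    statuses:   iterable of "PASS", "FAIL", or "PARTIAL", one per criterion
    severities: iterable of Severity, one per issue found
    """
    statuses = list(statuses)
    severities = list(severities)

    # Assumption: PARTIAL is treated as "not met".
    all_met = all(status == "PASS" for status in statuses)
    # Assumption: Major issues block a pass, like Critical ones.
    blocking = any(s in (Severity.CRITICAL, Severity.MAJOR) for s in severities)

    if not all_met or blocking:
        return "FAIL"
    if severities:  # only Minor issues can remain at this point
        return "PASS WITH NOTES"
    return "PASS"
```

Treating Major issues as blocking is the stricter interpretation; a project that prefers to let them through with notes would only need to change the `blocking` line.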

Integration with Orchestrators

When used in orchestrated workflows:

Orchestrator invokes atomic skill - Work is produced
Orchestrator invokes judge - Work is evaluated
If PASS - Proceed to next phase
If FAIL - Return to previous skill with feedback

This creates the planner/worker/judge pattern that scales.
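The loop described above can be sketched as follows. This is an illustration only: `run_phase`, the `worker` and `judge` callables, and the `max_attempts` limit are hypothetical, and the source does not specify a retry limit or how feedback is passed back.

```python
from typing import Callable, Optional, Tuple


def run_phase(
    worker: Callable[[Optional[str]], str],
    judge: Callable[[str], Tuple[str, str]],
    max_attempts: int = 3,
) -> str:
    """Run one phase: produce work, judge it, retry with feedback on FAIL.

    worker(feedback) returns the work output; feedback is None on the first try.
    judge(output) returns (verdict, feedback).
    max_attempts is an assumption; the source does not define a retry limit.
    """
    feedback = None
    for _ in range(max_attempts):
        output = worker(feedback)           # orchestrator invokes the atomic skill
        verdict, feedback = judge(output)   # orchestrator invokes the judge
        if verdict != "FAIL":               # PASS or PASS WITH NOTES: proceed
            return output
    raise RuntimeError(
        f"work still failing after {max_attempts} attempts: {feedback}"
    )
```

Bounding the number of attempts keeps a persistently failing skill from looping forever; what happens after the limit is reached (escalate, abort, ask the user) is left to the orchestrator.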

Quick Review Checklist

For rapid reviews, use this checklist:

Quick Review

All acceptance criteria addressed
No obvious bugs or errors
Follows project conventions
No scope creep
Ready to commit/proceed

Verdict: PASS / FAIL

References

See references/review-criteria.md for detailed review criteria by skill type.
