Judge
Judge is a quality review and evaluation skill that verifies completed work against defined criteria. It is part of a two-tier multi-agent architecture in which the Judge evaluates worker output.
Contract
Inputs:
- Completed work output (files, changes, artifacts)
- Original acceptance criteria or success criteria
- Context about what was attempted
Outputs:
- Pass/fail determination
- List of issues found (if any)
- Recommendations for fixes (if failing)
Success Criteria:
- All acceptance criteria evaluated
- Clear pass/fail determination provided
- Actionable feedback given for any failures
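The contract above can be sketched as plain data types. This is a minimal illustration only: the names `JudgeInput`, `JudgeOutput`, `Determination`, and `satisfies_success_criteria` are hypothetical and do not come from the skill itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class Determination(Enum):
    PASS = "PASS"
    FAIL = "FAIL"


@dataclass
class JudgeInput:
    """The contract's Inputs (field names are illustrative assumptions)."""
    work_output: list[str]          # files, changes, or artifact paths
    acceptance_criteria: list[str]  # original acceptance or success criteria
    context: str                    # what was attempted


@dataclass
class JudgeOutput:
    """The contract's Outputs (field names are illustrative assumptions)."""
    determination: Determination
    issues: list[str] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)


def satisfies_success_criteria(inp: JudgeInput, out: JudgeOutput,
                               evaluated: set[str]) -> bool:
    """Check the three success criteria against a finished review."""
    # Every acceptance criterion must have been evaluated.
    all_evaluated = set(inp.acceptance_criteria) <= evaluated
    # A clear pass/fail determination must be present.
    has_determination = isinstance(out.determination, Determination)
    # A failing review must carry actionable feedback.
    feedback_ok = (out.determination is Determination.PASS
                   or bool(out.recommendations))
    return all_evaluated and has_determination and feedback_ok
```

A failing `JudgeOutput` with an empty `recommendations` list would not satisfy the contract under this sketch, which mirrors the "actionable feedback given for any failures" criterion.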
When to Use
Invoke the judge skill:
- After atomic skill completion - Before marking work as done
- Before committing - Final quality gate
- After build/test phases - Verify implementation meets spec
- When reviewing generated code - Catch issues before integration
When NOT to Use
Skip the judge skill for:
- Trivial changes - Single-line fixes, typo corrections
- Mid-workflow - Don't interrupt atomic skills; judge at phase boundaries
- Exploratory work - When user is iterating quickly and explicitly skipping review
- User-requested skip - When user says "just do it" or "skip review"
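The invoke and skip rules above could be folded into a single gating check. The sketch below is one possible reading, not the skill's actual logic; `ReviewRequest` and its fields are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReviewRequest:
    """Hypothetical description of the moment a review is being considered."""
    at_phase_boundary: bool    # False while an atomic skill is still running
    trivial_change: bool       # single-line fix or typo correction
    user_skipped_review: bool  # user said "just do it" / "skip review",
                               # or is iterating and explicitly skipping review


def should_invoke_judge(req: ReviewRequest) -> bool:
    """Return True only when none of the skip conditions apply."""
    if req.user_skipped_review:
        return False
    if req.trivial_change:
        return False
    # Never interrupt an atomic skill; judge only at phase boundaries.
    return req.at_phase_boundary
```

Checking the user's explicit skip first reflects the assumption that a user request overrides every other signal.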
Review Process
Step 1: Gather Context
Collect the materials needed for review:
- The output - What was produced (files, code, documents)
- The criteria - What was supposed to be achieved (acceptance criteria, spec)
- The scope - What was in/out of scope for this work
Step 2: Evaluate Against Criteria
For each acceptance criterion:
- Check if the criterion is met
- Note any partial completion
- Document evidence (file paths, line numbers, test results)
Use this evaluation format:
Review: [Work Description]
Criteria Evaluation
| Criterion | Status | Evidence |
|---|---|---|
| [Criterion 1] | PASS/FAIL/PARTIAL | [Evidence] |
| [Criterion 2] | PASS/FAIL/PARTIAL | [Evidence] |
Issues Found
- [Issue description]
  - Severity: Critical/Major/Minor
  - Location: [File/line]
  - Fix: [Recommended action]
Verdict
PASS / FAIL / PASS WITH NOTES
[Summary of decision]
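A review in the format above could be produced programmatically. The following is a rough sketch under assumptions: `CriterionResult`, `Issue`, and `render_review` are hypothetical names, and nesting the severity, location, and fix lines under each issue is one interpretation of the template.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    criterion: str
    status: str    # "PASS", "FAIL", or "PARTIAL"
    evidence: str  # file paths, line numbers, or test results


@dataclass
class Issue:
    description: str
    severity: str  # "Critical", "Major", or "Minor"
    location: str
    fix: str


def render_review(work: str, results: list[CriterionResult],
                  issues: list[Issue], verdict: str, summary: str) -> str:
    """Fill in the evaluation template and return it as one string."""
    lines = [f"Review: {work}", "", "Criteria Evaluation", "",
             "| Criterion | Status | Evidence |", "|---|---|---|"]
    for r in results:
        lines.append(f"| {r.criterion} | {r.status} | {r.evidence} |")
    lines += ["", "Issues Found", ""]
    for issue in issues:
        lines.append(f"- {issue.description}")
        lines.append(f"  - Severity: {issue.severity}")
        lines.append(f"  - Location: {issue.location}")
        lines.append(f"  - Fix: {issue.fix}")
    lines += ["", "Verdict", "", verdict, summary]
    return "\n".join(lines)
```

Keeping evidence as free text leaves room for paths, line numbers, or test output, as Step 2 asks.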
Step 3: Apply Review Dimensions
Evaluate across these dimensions based on work type:
For Code Changes
| Dimension | Check |
|---|---|
| Correctness | Does it do what was specified? |
| Completeness | Are all criteria addressed? |
| Quality | No obvious bugs, edge cases handled? |
| Style | Follows project conventions? |
| Scope | No scope creep beyond criteria? |
For Document Generation
| Dimension | Check |
|---|---|
| Accuracy | Information is correct? |
| Completeness | All required sections present? |
| Format | Follows expected structure? |
| Clarity | Understandable to target audience? |
For Infrastructure Changes
| Dimension | Check |
|---|---|
| Functionality | Works as expected? |
| Security | No exposed secrets, proper permissions? |
| Idempotency | Can be run again safely? |
| Documentation | Changes documented? |
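The three dimension tables above lend themselves to a simple lookup keyed by work type. This is an illustrative sketch; the keys `"code"`, `"document"`, and `"infrastructure"` and the helper `dimensions_for` are assumptions, not names defined by the skill.

```python
# Dimension -> check question, copied from the tables for each work type.
REVIEW_DIMENSIONS: dict[str, dict[str, str]] = {
    "code": {
        "Correctness": "Does it do what was specified?",
        "Completeness": "Are all criteria addressed?",
        "Quality": "No obvious bugs, edge cases handled?",
        "Style": "Follows project conventions?",
        "Scope": "No scope creep beyond criteria?",
    },
    "document": {
        "Accuracy": "Information is correct?",
        "Completeness": "All required sections present?",
        "Format": "Follows expected structure?",
        "Clarity": "Understandable to target audience?",
    },
    "infrastructure": {
        "Functionality": "Works as expected?",
        "Security": "No exposed secrets, proper permissions?",
        "Idempotency": "Can be run again safely?",
        "Documentation": "Changes documented?",
    },
}


def dimensions_for(work_type: str) -> dict[str, str]:
    """Return the checks for a work type, failing loudly on unknown types."""
    try:
        return REVIEW_DIMENSIONS[work_type]
    except KeyError:
        known = ", ".join(sorted(REVIEW_DIMENSIONS))
        raise ValueError(f"unknown work type {work_type!r}; expected one of: {known}") from None
```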
Severity Levels
| Level | Definition | Action |
|---|---|---|
| Critical | Blocks functionality, security issue, data loss risk | Must fix before proceeding |
| Major | Significant deviation from spec, poor UX | Should fix before commit |
| Minor | Style issues, minor improvements | Can note for future |
Verdicts
PASS
All criteria met, no critical/major issues. Work can proceed.
FAIL
Critical issues found OR acceptance criteria not met. Work must be revised.
Provide:
- Specific issues with locations
- Recommended fixes
- Which criteria failed
PASS WITH NOTES
All criteria met, but minor issues noted. Work can proceed with awareness of noted items.
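The verdict rules can be expressed as a small decision function. Treat this as a sketch under two stated assumptions: a PARTIAL criterion counts as not met, and a Major issue with all criteria met yields FAIL (the definitions above rule it out of PASS but do not place it explicitly). `Verdict` and `decide_verdict` are hypothetical names.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    PASS_WITH_NOTES = "PASS WITH NOTES"


def decide_verdict(criteria_statuses: list[str],
                   issue_severities: list[str]) -> Verdict:
    """Map criterion statuses and issue severities to a verdict."""
    # Assumption: anything other than "PASS" (including "PARTIAL") is unmet.
    all_met = all(status == "PASS" for status in criteria_statuses)
    severities = {s.lower() for s in issue_severities}

    if not all_met or "critical" in severities:
        return Verdict.FAIL
    if "major" in severities:
        # Assumption: major issues block a pass, since PASS requires
        # "no critical/major issues" and Major "should fix before commit".
        return Verdict.FAIL
    if "minor" in severities:
        return Verdict.PASS_WITH_NOTES
    return Verdict.PASS
```

For example, `decide_verdict(["PASS", "PASS"], ["Minor"])` returns `Verdict.PASS_WITH_NOTES`, while any unmet criterion returns `Verdict.FAIL` regardless of severities.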
Integration with Orchestrators
When used in orchestrated workflows:
- Orchestrator invokes atomic skill - Work is produced
- Orchestrator invokes judge - Work is evaluated
- If PASS or PASS WITH NOTES - Proceed to next phase
- If FAIL - Return to previous skill with feedback
This creates the planner/worker/judge pattern that scales.
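The orchestration loop above might look like the following in outline. This is a sketch, not the orchestrator's real interface: `run_phase`, the `worker` and `judge` callables, and the `max_attempts` cap are all assumptions added so the loop terminates. It also assumes PASS WITH NOTES proceeds, consistent with the verdict definitions.

```python
from typing import Callable, Optional


def run_phase(worker: Callable[[Optional[str]], str],
              judge: Callable[[str], tuple[str, str]],
              max_attempts: int = 3) -> str:
    """Run a worker, judge its output, and retry with feedback on FAIL.

    worker(feedback) returns the produced work; feedback is None on the
    first attempt. judge(work) returns (verdict, feedback).
    """
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        work = worker(feedback)           # orchestrator invokes atomic skill
        verdict, feedback = judge(work)   # orchestrator invokes judge
        if verdict in ("PASS", "PASS WITH NOTES"):
            return work                   # proceed to next phase
        # On FAIL, loop back to the same skill with the judge's feedback.
    raise RuntimeError(f"work still failing after {max_attempts} attempts: {feedback}")
```

Capping retries is a design choice of this sketch; without it, a worker that never satisfies the judge would loop forever.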
Quick Review Checklist
For rapid reviews, use this checklist:
Quick Review
- All acceptance criteria addressed
- No obvious bugs or errors
- Follows project conventions
- No scope creep
- Ready to commit/proceed
Verdict: PASS / FAIL
References
See references/review-criteria.md for detailed review criteria by skill type.