output-critic
Output Critic Protocol
Evaluate the output by type, score each criterion, make an accept/reject decision, and suggest concrete improvements. Goal: prevent weak output from reaching the next step.
Workflow
1. Detect output type
2. Apply type-specific criteria
3. Score each criterion
4. Calculate overall score
5. Make accept / conditional / reject decision
6. Suggest improvements
Acceptance Threshold
| Overall Score | Decision | Action |
|---|---|---|
| 8-10 | ACCEPT | Proceed |
| 6-7 | CONDITIONAL | Apply minor fixes, then proceed |
| 0-5 | REJECT | Apply improvements, re-evaluate |
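The threshold table can be sketched as a small lookup function. This is a minimal illustration, not part of the protocol itself: the function name `decide` is hypothetical, and because the table lists only whole-number bands, the sketch assumes a fractional score such as 7.5 falls into the lower band (CONDITIONAL).

```python
def decide(overall_score: float) -> tuple[str, str]:
    """Map an overall score (0-10) to a (decision, action) pair.

    Assumption: scores between the listed bands (e.g. 7.5 or 5.5)
    fall into the lower band, since the table only lists whole numbers.
    """
    if not 0 <= overall_score <= 10:
        raise ValueError("overall score must be between 0 and 10")
    if overall_score >= 8:
        return "ACCEPT", "Proceed"
    if overall_score >= 6:
        return "CONDITIONAL", "Apply minor fixes, then proceed"
    return "REJECT", "Apply improvements, re-evaluate"


decision, action = decide(7)
print(decision, "->", action)
```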
Type-Specific Criteria
Code
| Criterion | Weight | Question |
|---|---|---|
| Correctness | 30% | Produces expected output? Handles edge cases? |
| Readability | 20% | Meaningful names? Clean indentation? |
| Security | 20% | SQL injection? Hardcoded secrets? Unsafe input? |
| Performance | 15% | Unnecessary loops? N+1 queries? Memory leaks? |
| Testability | 15% | Functions independently testable? |
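The protocol does not spell out how the overall score is calculated. One plausible reading is a weighted average of the per-criterion scores, sketched below with the Code weights from the table above. The names `CODE_WEIGHTS` and `overall_score`, and the rounding to one decimal place, are assumptions made for illustration.

```python
# Weights for the Code criteria, in percent, copied from the table above.
CODE_WEIGHTS = {
    "Correctness": 30,
    "Readability": 20,
    "Security": 20,
    "Performance": 15,
    "Testability": 15,
}


def overall_score(scores: dict[str, float], weights: dict[str, int]) -> float:
    """Weighted average of 0-10 criterion scores (assumed formula)."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total = sum(weights[name] * scores[name] for name in weights)
    # Rounding to one decimal is an assumption; the protocol does not say.
    return round(total / sum(weights.values()), 1)


example = {
    "Correctness": 9,
    "Readability": 8,
    "Security": 6,
    "Performance": 7,
    "Testability": 7,
}
print(overall_score(example, CODE_WEIGHTS))  # 7.6
```

With these example scores the weighted sum is 760 out of a possible 1000, so the overall score is 7.6, which lands in the CONDITIONAL band.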
Report / Written Content
| Criterion | Weight | Question |
|---|---|---|
| Accuracy | 30% | Claims supported? Misleading statements? |
| Coverage | 25% | All requested topics addressed? Missing sections? |
| Clarity | 20% | Target audience can understand? Jargon explained? |
| Structure | 15% | Logical flow? Consistent headings? |
| Actionability | 10% | Reader knows what to do next? |
Plan / Task List
| Criterion | Weight | Question |
|---|---|---|
| Completeness | 30% | All necessary steps present? Critical step missing? |
| Atomicity | 25% | Each step does one thing? Overly broad steps? |
| Dependency accuracy | 20% | Order makes sense? Circular dependencies? |
| Verifiability | 15% | Each step has clear "done" criteria? |
| Realism | 10% | Steps are achievable? Overly optimistic estimates? |
Data / Table
| Criterion | Weight | Question |
|---|---|---|
| Accuracy | 35% | Numbers consistent? Calculations correct? |
| Completeness | 25% | Missing rows/columns? Nulls explained? |
| Format consistency | 20% | Units, date formats, currency consistent? |
| Readability | 20% | Meaningful headers? Proper sorting? |
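Steps 1 and 2 of the workflow (detect the type, then apply its criteria) amount to a lookup from output type to a weight table. The sketch below collects the four tables above into one structure and checks that each set of weights sums to 100%. The type keys (`"code"`, `"report"`, `"plan"`, `"data"`) and the names `CRITERIA` and `criteria_for` are hypothetical labels, not defined by the protocol.

```python
# Criterion weights in percent, copied from the four tables above.
CRITERIA = {
    "code": {"Correctness": 30, "Readability": 20, "Security": 20,
             "Performance": 15, "Testability": 15},
    "report": {"Accuracy": 30, "Coverage": 25, "Clarity": 20,
               "Structure": 15, "Actionability": 10},
    "plan": {"Completeness": 30, "Atomicity": 25, "Dependency accuracy": 20,
             "Verifiability": 15, "Realism": 10},
    "data": {"Accuracy": 35, "Completeness": 25, "Format consistency": 20,
             "Readability": 20},
}

# Sanity check: every type's weights must add up to 100%.
for output_type, weights in CRITERIA.items():
    assert sum(weights.values()) == 100, output_type


def criteria_for(output_type: str) -> dict[str, int]:
    """Return the weight table for a detected output type."""
    try:
        return CRITERIA[output_type]
    except KeyError:
        raise ValueError(f"unknown output type: {output_type!r}") from None


print(criteria_for("data"))
```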
Output Format
OUTPUT CRITIC
Type : [output type]
Decision : ACCEPT / CONDITIONAL / REJECT
Score : [X/10]
Criterion Scores
| Criterion | Score | Note |
|---|---|---|
| [Criterion 1] | X/10 | [short note] |
| [Criterion 2] | X/10 | [short note] |
| Overall | X/10 | |
Strengths
- [What was done well — specific]
Weaknesses
- [What is missing / wrong — specific]
Improvement Suggestions
- [Concrete action — what to do, where]
- [Concrete action]
Next Step
[Accept -> proceed | Conditional -> fix X | Reject -> apply suggestions, resubmit]
---
Re-Evaluation
When improved output is resubmitted:
RE-EVALUATION
Previous score: X/10
New score : Y/10
Change : +N points
Improved : [which criteria]
Still open : [remaining issues if any]
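As a rough illustration, the re-evaluation block can be filled in by comparing the two sets of scores. The function name `re_evaluation` is hypothetical, and the rule that a criterion is "still open" while it scores below 8 is an assumption borrowed from the ACCEPT band, not something the protocol states.

```python
def re_evaluation(prev_overall: float, new_overall: float,
                  prev_scores: dict[str, float],
                  new_scores: dict[str, float],
                  open_below: float = 8) -> str:
    """Render the RE-EVALUATION block from old and new criterion scores."""
    # A criterion counts as improved if its new score is strictly higher.
    improved = [c for c in new_scores if new_scores[c] > prev_scores.get(c, 0)]
    # Assumption: a criterion stays open while it scores below `open_below`.
    still_open = [c for c in new_scores if new_scores[c] < open_below]
    change = new_overall - prev_overall
    return "\n".join([
        "RE-EVALUATION",
        f"Previous score: {prev_overall:g}/10",
        f"New score : {new_overall:g}/10",
        f"Change : {change:+g} points",
        f"Improved : {', '.join(improved) or 'none'}",
        f"Still open : {', '.join(still_open) or 'none'}",
    ])


print(re_evaluation(
    5, 7,
    prev_scores={"Security": 3, "Readability": 8},
    new_scores={"Security": 7, "Readability": 8},
))
```

The `+` in the format spec keeps the sign visible, so a resubmission that scores lower than before would show a negative change instead of hiding it.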
When to Skip
- User said "quick and dirty, doesn't need to be perfect"
- Prototype / draft stage (user explicitly stated)
- Single-line simple output
Guardrails
- Never accept security issues — hardcoded secrets = automatic REJECT regardless of other scores.
- Be specific in suggestions — "improve code" is useless; "move API key to env var at line 12" is actionable.
- Cross-skill: works with task-decomposer (validates plan quality); output-critic is the quality gate before task completion.
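The first guardrail acts as an override on top of the score-based decision. A minimal sketch, assuming findings are reported as plain string tags; the function name `apply_guardrails` and the tag `"hardcoded_secret"` are hypothetical:

```python
def apply_guardrails(decision: str, findings: list[str]) -> str:
    """Override a score-based decision when a guardrail is violated.

    `findings` holds tags produced during review; the tag name
    "hardcoded_secret" is an assumption made for this sketch.
    """
    if "hardcoded_secret" in findings:
        # Automatic REJECT regardless of the other scores.
        return "REJECT"
    return decision


# A high-scoring output is still rejected if it contains a secret.
print(apply_guardrails("ACCEPT", ["hardcoded_secret"]))
print(apply_guardrails("ACCEPT", []))
```

Keeping the override separate from the score calculation means a strong score on the other criteria cannot average a secret away.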