agent-reviewer

Agent Reviewer Protocol

The task is done; now look back. What went well, what went wrong, and what should be different next time? Goal: never repeat the same mistake, and keep improving skills and processes.

Core principle: Retrospectives are painful but necessary. A good agent evaluates itself.

6 Review Dimensions

1. Goal Alignment

Did the result match the original intent?

Was the user's actual request met?
Did scope creep occur?
Over-delivery or under-delivery?

2. Efficiency

Did the task take longer than necessary?

Unnecessary tool calls?
Repeated operations?
Sequential steps that could have been parallel?
Token/resource waste?
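The parallelism question above can be checked mechanically when the task history records when each step ran and what it depended on. A minimal sketch, assuming a hypothetical `Step` record (`name`, `start`, `end`, `depends_on`) that the protocol itself does not define: any two steps with no dependency path between them whose execution windows did not overlap are flagged as candidates for parallel-planner.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Step:
    # Hypothetical record of one executed step in the task history.
    name: str
    start: float  # seconds since task start
    end: float
    depends_on: set[str] = field(default_factory=set)


def ancestors(name: str, steps: dict[str, Step]) -> set[str]:
    """All steps that `name` depends on, directly or transitively."""
    seen: set[str] = set()
    stack = list(steps[name].depends_on)
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue
        seen.add(dep)
        if dep in steps:  # tolerate dependencies outside the recorded history
            stack.extend(steps[dep].depends_on)
    return seen


def serialized_independent_pairs(history: list[Step]) -> list[tuple[str, str]]:
    """Pairs of independent steps whose execution windows did not overlap."""
    steps = {s.name: s for s in history}
    pairs = []
    for a, b in combinations(history, 2):
        dependent = a.name in ancestors(b.name, steps) or b.name in ancestors(a.name, steps)
        overlapped = a.start < b.end and b.start < a.end
        if not dependent and not overlapped:
            pairs.append((a.name, b.name))
    return pairs
```

With `read_config` (0 to 2 s), `fetch_docs` (2 to 5 s) and a `write_report` step that depends on both, only `("read_config", "fetch_docs")` is flagged. Shared resources such as rate limits can still force sequencing, so the output is a list of candidates, not verdicts.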

3. Decision Quality

Were decisions well-reasoned?

Were assumptions verified?
Were alternatives considered?
Did early decisions cause later problems?

4. Error Handling

How were errors addressed?

Detected quickly?
Right strategy applied?
Same error repeated?
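Spotting a repeated error is easier once messages are normalized so that incidental details do not hide the repeat. A sketch, assuming errors are available as plain strings and that digits (line numbers, IDs, durations) are noise; both the normalization rule and the helper names are illustrative, not part of the protocol.

```python
import re
from collections import Counter


def error_signature(message: str) -> str:
    """Normalize an error message so repeats of the same failure compare equal.

    Assumption: every run of digits is incidental and is replaced by "N".
    """
    return re.sub(r"\d+", "N", message.strip().lower())


def repeated_errors(messages: list[str], threshold: int = 2) -> dict[str, int]:
    """Signatures that occurred at least `threshold` times."""
    counts = Counter(error_signature(m) for m in messages)
    return {sig: n for sig, n in counts.items() if n >= threshold}
```

Two messages that differ only in a line number, such as `KeyError: 'user_id' at line 42` and `KeyError: 'user_id' at line 57`, collapse to one signature and are reported as the same error seen twice.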

5. Communication

How good was the interaction with the user?

Unnecessary confirmations requested?
Critical information missing at key points?
Too many or too few questions?

6. Reusability

Can lessons from this task transfer to the next?

General patterns discovered?
Which skills were missing or insufficient?
Which decisions should become standard?

Finding Severity

Severity	Meaning	Action
CRITICAL	Endangered the task or significantly reduced quality	Must fix
MODERATE	Created inefficiency but didn't break the result	Improve
POSITIVE	Something that went better than expected	Repeat, standardize
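Because each severity level maps to exactly one action, the levels fit naturally into an enum that can also be tallied for the `Findings` line of the report. A minimal sketch; `Severity` and `findings_line` are hypothetical names, not defined by the protocol.

```python
from collections import Counter
from enum import Enum


class Severity(Enum):
    # Each level's value is the action the severity table prescribes for it.
    CRITICAL = "Must fix"
    MODERATE = "Improve"
    POSITIVE = "Repeat, standardize"


def findings_line(findings: list[Severity]) -> str:
    """Render the 'Findings' header line of the retrospective."""
    counts = Counter(findings)  # missing levels count as 0
    return (
        f"Findings : {counts[Severity.CRITICAL]} critical | "
        f"{counts[Severity.MODERATE]} moderate | "
        f"{counts[Severity.POSITIVE]} positive"
    )
```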

Output Format

AGENT REVIEWER — Task Retrospective
Task     : [task name]
Score    : X/10
Findings : N critical | N moderate | N positive

Dimension Scores

Dimension	Score	Summary
Goal Alignment	X/10	...
Efficiency	X/10	...
Decision Quality	X/10	...
Error Handling	X/10	...
Communication	X/10	...
Reusability	X/10	...
Overall	X/10

Critical Findings

[If any — what happened, why critical, how to prevent]

Improvement Areas

[Inefficiencies, missed opportunities]

What Went Well

[Decisions and approaches worth repeating]

Action Items

For Next Task

[Concrete change — what to do]
[Concrete change]

Skill / Process Improvement

[Which skill should be updated / added]
[Which pattern should be standardized]

Lessons Learned

[Items a future agent instance should know — candidates for memory-ledger]

---
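The template does not say how the Overall score is derived from the six dimension scores. One simple reading, assumed here rather than taken from the protocol, is the unweighted mean rounded to one decimal; the sketch also rejects missing or out-of-range scores so an incomplete review cannot produce a number.

```python
from statistics import mean

DIMENSIONS = (
    "Goal Alignment", "Efficiency", "Decision Quality",
    "Error Handling", "Communication", "Reusability",
)


def overall_score(scores: dict[str, float]) -> float:
    """Overall score as the unweighted mean of the six dimension scores.

    Assumption: the protocol does not define this formula.
    """
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    for d in DIMENSIONS:
        if not 0 <= scores[d] <= 10:
            raise ValueError(f"{d}: score {scores[d]} is outside 0..10")
    return round(mean(scores[d] for d in DIMENSIONS), 1)
```

Scores of 9, 6, 8, 7, 8 and 5 give an Overall of 7.2. A weighted mean (for example, counting Goal Alignment double) would be equally defensible; the protocol leaves the choice open.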

Inefficiency Patterns — Auto-Detect

Scan the task history for these patterns:

Pattern	Symptom	Fix
Repeated tool call	Same file/API read 2+ times	Cache it
Unnecessary confirmation	Low-risk step triggered approval	Adjust checkpoint-guardian threshold
Late assumption discovery	"Actually it should be..." after error	Trigger assumption-checker earlier
Parallelizable steps run sequentially	Independent steps ran one after another	Use parallel-planner
Blind retry	Logic error treated as transient	Fix error-recovery categorization
Context loss	Information from a previous step forgotten	Keep memory-ledger updated
Over-decomposition	2-step task split into 8	Adjust task-decomposer granularity
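The first row of the table can be detected by replaying the task history. This sketch assumes a hypothetical `ToolCall(tool, target)` log and hypothetical tool names `read_file`, `http_get` and `write_file`. It counts a read as redundant only when the same target was already read and has not been written since, so a legitimate re-read after a change is not flagged.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    # Hypothetical history entry: which tool ran against which target.
    tool: str
    target: str  # file path or URL


READ_TOOLS = frozenset({"read_file", "http_get"})  # hypothetical names
WRITE_TOOLS = frozenset({"write_file"})            # hypothetical name


def redundant_reads(history: list[ToolCall]) -> dict[str, int]:
    """Count reads of a target that was already read and not written since."""
    fresh: set[str] = set()  # targets whose last read is still current
    redundant: Counter = Counter()
    for call in history:
        if call.tool in WRITE_TOOLS:
            fresh.discard(call.target)  # content changed, so the next read is needed
        elif call.tool in READ_TOOLS:
            if call.target in fresh:
                redundant[call.target] += 1
            else:
                fresh.add(call.target)
    return dict(redundant)
```

Every target in the result is a caching candidate, matching the table's Fix column.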

Skill Performance Evaluation

Evaluate skills used during the task:

Skills Used

Skill	Used?	Effective?	Notes
task-decomposer	Yes/No	Good/Fair/Poor	...
checkpoint-guardian	Yes/No	Good/Fair/Poor	...
assumption-checker	Yes/No	Good/Fair/Poor	...
tool-selector	Yes/No	Good/Fair/Poor	...
parallel-planner	Yes/No	Good/Fair/Poor	...
error-recovery	Yes/No	Good/Fair/Poor	...
memory-ledger	Yes/No	Good/Fair/Poor	...
output-critic	Yes/No	Good/Fair/Poor	...

Which skills were missing or never triggered, and why?

---
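The "Used?" and "Effective?" columns, and the question about untriggered skills, can be filled in from a record of which skills actually ran. A sketch, assuming the eight skills in the table are the full catalogue and that ratings use its Good/Fair/Poor scale; `skill_rows` and `untriggered_skills` are hypothetical helpers.

```python
KNOWN_SKILLS = (
    "task-decomposer", "checkpoint-guardian", "assumption-checker",
    "tool-selector", "parallel-planner", "error-recovery",
    "memory-ledger", "output-critic",
)
RATINGS = ("Good", "Fair", "Poor")


def skill_rows(ratings: dict[str, str]) -> list[tuple[str, str, str]]:
    """One (Skill, Used?, Effective?) row per known skill.

    `ratings` maps each skill that ran to its effectiveness rating;
    skills absent from it are reported as unused.
    """
    unknown = set(ratings) - set(KNOWN_SKILLS)
    if unknown:
        raise ValueError(f"unknown skills: {sorted(unknown)}")
    rows = []
    for skill in KNOWN_SKILLS:
        rating = ratings.get(skill)
        if rating is None:
            rows.append((skill, "No", "-"))
        elif rating in RATINGS:
            rows.append((skill, "Yes", rating))
        else:
            raise ValueError(f"{skill}: rating must be one of {RATINGS}")
    return rows


def untriggered_skills(ratings: dict[str, str]) -> list[str]:
    """Skills that never ran; each one needs a 'why' in the report."""
    return [skill for skill in KNOWN_SKILLS if skill not in ratings]
```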

When to Skip

Task was single-step or under 5 minutes
Prototype / experimental task
User said "no retrospective needed"
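The skip criteria form a simple disjunction: any one of them is enough to skip the review. A sketch, assuming a hypothetical `TaskSummary` record; how step count, duration, or the user's opt-out are captured is left open by the protocol.

```python
from dataclasses import dataclass


@dataclass
class TaskSummary:
    # Hypothetical record; the protocol does not say how these are measured.
    step_count: int
    duration_minutes: float
    is_prototype: bool = False
    user_declined_retro: bool = False


def should_skip_retrospective(task: TaskSummary) -> bool:
    """True if any of the 'When to Skip' conditions holds."""
    return (
        task.step_count <= 1
        or task.duration_minutes < 5
        or task.is_prototype
        or task.user_declined_retro
    )
```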

Guardrails

Be honest, not kind — the value is in finding problems, not hiding them.
Concrete suggestions only — "do better" is useless; "cache file reads to avoid 3 redundant calls" is actionable.
Cross-skill: this is the ecosystem's feedback loop — findings here should update other skills and processes.
