agent-reviewer
Agent Reviewer Protocol
Task is done — now look back. What went well, what went wrong, what should be different next time? Goal: never repeat the same mistake and continuously improve skills and processes.
Core principle: Retrospectives are painful but necessary. A good agent evaluates itself.
6 Review Dimensions
1. Goal Alignment
Did the result match the original intent?
- Was the user's actual request met?
- Did scope creep occur?
- Over-delivery or under-delivery?
2. Efficiency
Did the task take longer than necessary?
- Unnecessary tool calls?
- Repeated operations?
- Sequential steps that could have been parallel?
- Token/resource waste?
3. Decision Quality
Were decisions well-reasoned?
- Were assumptions verified?
- Were alternatives considered?
- Did early decisions cause later problems?
4. Error Handling
How were errors addressed?
- Detected quickly?
- Right strategy applied?
- Same error repeated?
5. Communication
How good was the interaction with the user?
- Unnecessary confirmations requested?
- Critical information missing at key points?
- Too many or too few questions?
6. Reusability
Can lessons from this task transfer to the next?
- General patterns discovered?
- Which skills were missing or insufficient?
- Which decisions should become standard?
Finding Severity
| Severity | Meaning | Action |
|---|---|---|
| CRITICAL | Endangered the task or significantly reduced quality | Must fix |
| MODERATE | Created inefficiency but didn't break the result | Improve |
| POSITIVE | Something that went better than expected | Repeat, standardize |
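The severity levels above map naturally onto a small enumeration. The sketch below is one possible encoding, not part of the protocol itself: `Severity`, `Finding`, and `summarize` are hypothetical names, the example findings are invented for illustration, and the summary string mirrors the `Findings : N critical | N moderate | N positive` line of the retrospective output.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """Severity levels from the table above; each value is the required action."""
    CRITICAL = "Must fix"
    MODERATE = "Improve"
    POSITIVE = "Repeat, standardize"


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for a single retrospective finding."""
    severity: Severity
    description: str


def summarize(findings: list[Finding]) -> str:
    """Build the 'N critical | N moderate | N positive' summary string."""
    counts = Counter(f.severity for f in findings)
    return " | ".join(
        f"{counts[level]} {level.name.lower()}" for level in Severity
    )


# Invented example findings, for illustration only.
findings = [
    Finding(Severity.CRITICAL, "Overwrote a config file without a backup"),
    Finding(Severity.MODERATE, "Read the same file three times"),
    Finding(Severity.POSITIVE, "Verified assumptions before editing"),
    Finding(Severity.MODERATE, "Ran independent steps sequentially"),
]
print(summarize(findings))  # 1 critical | 2 moderate | 1 positive
```

Keeping the action text as the enum value means a report generator can look up what to do with a finding without a second mapping.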
Output Format
AGENT REVIEWER — Task Retrospective
Task : [task name]
Score : X/10
Findings : N critical | N moderate | N positive
Dimension Scores
| Dimension | Score | Summary |
|---|---|---|
| Goal Alignment | X/10 | ... |
| Efficiency | X/10 | ... |
| Decision Quality | X/10 | ... |
| Error Handling | X/10 | ... |
| Communication | X/10 | ... |
| Reusability | X/10 | ... |
| Overall | X/10 | |
Critical Findings
[If any — what happened, why critical, how to prevent]
Improvement Areas
[Inefficiencies, missed opportunities]
What Went Well
[Decisions and approaches worth repeating]
Action Items
For Next Task
- [Concrete change — what to do]
- [Concrete change]
Skill / Process Improvement
- [Which skill should be updated / added]
- [Which pattern should be standardized]
Lessons Learned
[Items a future agent instance should know — candidates for memory-ledger]
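One way to fill in the Dimension Scores table of this template is sketched below. It is an illustration under stated assumptions rather than part of the protocol: the source does not say how the Overall score is derived, so the unweighted mean used here is an assumption, and `overall_score` and `render_scores` are hypothetical helper names.

```python
DIMENSIONS = [
    "Goal Alignment",
    "Efficiency",
    "Decision Quality",
    "Error Handling",
    "Communication",
    "Reusability",
]


def overall_score(scores: dict[str, int]) -> float:
    """Average the six dimension scores (assumption: unweighted mean)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= 10:
            raise ValueError(f"{name} score must be between 0 and 10")
    return round(sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS), 1)


def render_scores(scores: dict[str, int], summaries: dict[str, str]) -> str:
    """Render the Dimension Scores table in the template's pipe-table layout."""
    lines = ["| Dimension | Score | Summary |", "|---|---|---|"]
    for name in DIMENSIONS:
        lines.append(f"| {name} | {scores[name]}/10 | {summaries.get(name, '...')} |")
    lines.append(f"| Overall | {overall_score(scores)}/10 | |")
    return "\n".join(lines)


# Invented example scores, for illustration only.
example_scores = {
    "Goal Alignment": 8,
    "Efficiency": 6,
    "Decision Quality": 7,
    "Error Handling": 9,
    "Communication": 7,
    "Reusability": 5,
}
print(render_scores(example_scores, {"Efficiency": "Two redundant file reads"}))
```

A weighted mean or a "lowest score caps the overall" rule would be equally defensible; whichever rule is chosen should be applied the same way in every retrospective so scores stay comparable.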
---
Inefficiency Patterns: Auto-Detect
Scan the task history for these patterns:
| Pattern | Symptom | Fix |
|---|---|---|
| Repeated tool call | Same file/API read 2+ times | Cache it |
| Unnecessary confirmation | Low-risk step triggered approval | Adjust checkpoint-guardian threshold |
| Late assumption discovery | "Actually it should be..." after error | Trigger assumption-checker earlier |
| Sequential parallel steps | Independent steps ran sequentially | Use parallel-planner |
| Blind retry | Logic error treated as transient | Fix error-recovery categorization |
| Context loss | Previous step info forgotten | Keep memory-ledger updated |
| Over-decomposition | 2-step task split into 8 | Adjust task-decomposer granularity |
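Two of the patterns above, repeated tool calls and blind retries, can be detected mechanically once the task history is available as structured records. The sketch below assumes a hypothetical `ToolCall` record with `tool`, `target`, and `error` fields; the protocol does not define a history format, so these names and the `"transient"` / `"logic"` error labels are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical task-history entry; field names are illustrative."""
    tool: str                    # e.g. "read_file"
    target: str                  # file path or API endpoint
    error: Optional[str] = None  # None on success, else "transient" or "logic"


def repeated_calls(history: list[ToolCall]) -> dict[tuple[str, str], int]:
    """Return (tool, target) pairs that were called two or more times."""
    counts = Counter((call.tool, call.target) for call in history)
    return {key: n for key, n in counts.items() if n >= 2}


def blind_retries(history: list[ToolCall]) -> list[ToolCall]:
    """Return calls that repeat the immediately preceding call after a logic error."""
    flagged = []
    for prev, cur in zip(history, history[1:]):
        same_call = (prev.tool, prev.target) == (cur.tool, cur.target)
        if same_call and prev.error == "logic":
            flagged.append(cur)
    return flagged


# Invented example history, for illustration only.
history = [
    ToolCall("read_file", "config.yaml"),
    ToolCall("run_tests", "tests/"),
    ToolCall("read_file", "config.yaml"),
    ToolCall("call_api", "/v1/items", error="logic"),
    ToolCall("call_api", "/v1/items", error="logic"),
]
print(repeated_calls(history))
print(blind_retries(history))
```

Note that a blind retry also shows up as a repeated call, so a report built on these helpers may want to list it only under the more specific pattern.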
Skill Performance Evaluation
Evaluate skills used during the task:
Skills Used
| Skill | Used? | Effective? | Notes |
|---|---|---|---|
| task-decomposer | Yes/No | Good/Fair/Poor | ... |
| checkpoint-guardian | Yes/No | Good/Fair/Poor | ... |
| assumption-checker | Yes/No | Good/Fair/Poor | ... |
| tool-selector | Yes/No | Good/Fair/Poor | ... |
| parallel-planner | Yes/No | Good/Fair/Poor | ... |
| error-recovery | Yes/No | Good/Fair/Poor | ... |
| memory-ledger | Yes/No | Good/Fair/Poor | ... |
| output-critic | Yes/No | Good/Fair/Poor | ... |
Which skills were missing or not triggered, and why?
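The Skills Used table and the follow-up question about untriggered skills can be generated from a list of usage records. This is a minimal sketch under assumptions: `SkillUsage`, `render_skill_table`, and `untriggered` are hypothetical names, and the `n/a` placeholder for skills that never ran is a choice made here, not something the template specifies.

```python
from dataclasses import dataclass

# Skill names taken from the table above.
KNOWN_SKILLS = [
    "task-decomposer",
    "checkpoint-guardian",
    "assumption-checker",
    "tool-selector",
    "parallel-planner",
    "error-recovery",
    "memory-ledger",
    "output-critic",
]


@dataclass(frozen=True)
class SkillUsage:
    """Hypothetical record of one skill that ran during the task."""
    skill: str
    effectiveness: str  # "Good", "Fair", or "Poor"
    notes: str = "..."


def untriggered(usages: list[SkillUsage]) -> list[str]:
    """List known skills that never ran, in table order."""
    used = {u.skill for u in usages}
    return [name for name in KNOWN_SKILLS if name not in used]


def render_skill_table(usages: list[SkillUsage]) -> str:
    """Render the Skills Used table; unused skills get an 'n/a' rating."""
    by_name = {u.skill: u for u in usages}
    lines = ["| Skill | Used? | Effective? | Notes |", "|---|---|---|---|"]
    for name in KNOWN_SKILLS:
        usage = by_name.get(name)
        if usage is None:
            lines.append(f"| {name} | No | n/a | not triggered |")
        else:
            lines.append(f"| {name} | Yes | {usage.effectiveness} | {usage.notes} |")
    return "\n".join(lines)


# Invented example usage, for illustration only.
usages = [
    SkillUsage("task-decomposer", "Good"),
    SkillUsage("error-recovery", "Poor", "retried a logic error"),
]
print(render_skill_table(usages))
print("Untriggered:", ", ".join(untriggered(usages)))
```

The "why" half of the question still needs a human or agent judgment; the helper only surfaces which skills to ask it about.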
---
When to Skip
- Task was single-step or under 5 minutes
- Prototype / experimental task
- User said "no retrospective needed"
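The skip conditions above amount to a simple predicate. The sketch below assumes a hypothetical `TaskInfo` record; how step count, duration, and the user's opt-out are actually tracked is not specified by the protocol, so the field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInfo:
    """Hypothetical task metadata; field names are illustrative."""
    step_count: int
    duration_minutes: float
    is_prototype: bool = False
    user_declined_retrospective: bool = False


def should_skip_retrospective(task: TaskInfo) -> bool:
    """Return True when any skip condition from the list above applies."""
    return (
        task.step_count <= 1             # single-step task
        or task.duration_minutes < 5     # under 5 minutes
        or task.is_prototype             # prototype / experimental task
        or task.user_declined_retrospective
    )


print(should_skip_retrospective(TaskInfo(step_count=1, duration_minutes=12)))  # True
print(should_skip_retrospective(TaskInfo(step_count=6, duration_minutes=40)))  # False
```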
Guardrails
- Be honest, not kind — the value is in finding problems, not hiding them.
- Concrete suggestions only — "do better" is useless; "cache file reads to avoid 3 redundant calls" is actionable.
- Cross-skill: this is the ecosystem's feedback loop — findings here should update other skills and processes.