evaluate-findings
Evaluate Findings
Confidence-based framework for evaluating external feedback (code reviews, AI suggestions, PR comments). Spawn a Devil's Advocate subagent to critically challenge non-trivial claims using research tools. Triage and classify findings — do not apply fixes. Return results for the main agent to act on.
Process
For each finding:
- Read the referenced code at the mentioned location — include the full function or logical block, not just the flagged line
- Verify the claim against the actual code — does the issue genuinely exist?
- Assign confidence:
| Level | Criteria | Verdict |
|---|---|---|
| High (>80%) | Clear bug, typo, missing check, obvious improvement, style violation matching project conventions | Accept |
| Medium (50-80%) | Likely valid but involves judgment calls or unclear project intent | Accept with caveats |
| Low (<50%) | Subjective preference, requires domain knowledge, might break things, reviewer may be wrong | Skip |
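The thresholds in the table can be sketched as a small mapping function. This is only an illustration: representing confidence as a float between 0.0 and 1.0, and the names `Verdict` and `verdict_for`, are assumptions rather than part of the framework. The table leaves the 50% and 80% boundaries to the reader; the sketch treats both as Medium.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "Accept"
    ACCEPT_WITH_CAVEATS = "Accept with caveats"
    SKIP = "Skip"


def verdict_for(confidence: float) -> Verdict:
    """Map a confidence estimate in [0.0, 1.0] to a verdict from the table."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence > 0.80:      # High (>80%)
        return Verdict.ACCEPT
    if confidence >= 0.50:     # Medium (50-80%), boundaries included by assumption
        return Verdict.ACCEPT_WITH_CAVEATS
    return Verdict.SKIP        # Low (<50%)
```

In practice the confidence level is a judgment made after reading the code, so the number is a shorthand for that judgment, not a measured quantity.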
Devil's Advocate
After the initial assessment, spawn a subagent to critically challenge findings from a different angle using research tools.
Spawn Condition
Spawn when there are 3 or more findings scored Medium or higher that involve non-trivial claims — API behavior, correctness arguments, performance assertions, or anything not verifiable by reading the code alone.
Skip when all findings are clear-cut (typos, missing null checks, style issues) or total count is 1-2 trivial items.
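One way to read the spawn condition is as a simple count over the assessed findings. The sketch below assumes each finding carries a numeric confidence and a hypothetical `non_trivial` flag set during the initial assessment; neither field name comes from the framework itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    summary: str
    confidence: float   # 0.0-1.0, from the initial assessment
    non_trivial: bool   # hypothetical flag: claim cannot be verified by reading the code alone


SPAWN_THRESHOLD = 3     # "3 or more findings"
MEDIUM_FLOOR = 0.50     # lower bound of the Medium level


def should_spawn_devils_advocate(findings: List[Finding]) -> bool:
    """Return True when enough Medium-or-higher, non-trivial findings exist."""
    candidates = [
        f for f in findings
        if f.confidence >= MEDIUM_FLOOR and f.non_trivial
    ]
    return len(candidates) >= SPAWN_THRESHOLD
```

Clear-cut findings such as typos or missing null checks are excluded by the `non_trivial` flag, so a review made up only of those never triggers the subagent.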
Subagent Instructions
Launch a single subagent using the Agent tool (run it in the foreground, because results are needed before presenting). Provide:
- The challenge-worthy findings with file locations, claims, and initial verdicts
- Instructions to challenge each finding — try to prove it wrong, or confirm it with evidence
The subagent picks research tools based on claim type:
| Claim Type | Tool |
|---|---|
| API deprecated/removed/changed | Documentation MCP tools or WebSearch |
| Method doesn't exist / wrong signature | Documentation MCP tools, WebSearch fallback |
| Code causes specific bug or behavior | Bash (isolated read-only test snippet) |
| Best practice or ecosystem claim | WebSearch |
| Migration or changelog lookup | WebSearch → WebFetch |
Use whatever documentation tools are available — MCP servers, relevant skills, WebSearch/WebFetch as fallback. The specific tools vary by project setup.
Budget: max 2 research actions per finding. If the first action is conclusive, skip the second.
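The tool table and the research budget can be sketched together as a lookup plus a bounded loop. This is a hedged illustration only: `ClaimType`, `TOOLS_BY_CLAIM`, `investigate`, and the `run_tool` callback are hypothetical names, and the tool strings are labels copied from the table, not real API calls.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class ClaimType(Enum):
    API_CHANGED = "API deprecated/removed/changed"
    WRONG_SIGNATURE = "Method doesn't exist / wrong signature"
    SPECIFIC_BUG = "Code causes specific bug or behavior"
    BEST_PRACTICE = "Best practice or ecosystem claim"
    MIGRATION = "Migration or changelog lookup"


# Preferred tool first, fallback second, mirroring the table.
TOOLS_BY_CLAIM: Dict[ClaimType, List[str]] = {
    ClaimType.API_CHANGED: ["Documentation MCP tools", "WebSearch"],
    ClaimType.WRONG_SIGNATURE: ["Documentation MCP tools", "WebSearch"],
    ClaimType.SPECIFIC_BUG: ["Bash"],
    ClaimType.BEST_PRACTICE: ["WebSearch"],
    ClaimType.MIGRATION: ["WebSearch", "WebFetch"],
}

MAX_ACTIONS_PER_FINDING = 2


def investigate(
    claim_type: ClaimType,
    run_tool: Callable[[str], Optional[str]],
) -> Tuple[str, List[str]]:
    """Run tools until one is conclusive or the budget is spent.

    run_tool is a hypothetical callback returning "Confirmed", "Disputed",
    or None when the action was not conclusive.
    """
    used: List[str] = []
    for tool in TOOLS_BY_CLAIM[claim_type][:MAX_ACTIONS_PER_FINDING]:
        used.append(tool)
        result = run_tool(tool)
        if result is not None:
            return result, used   # conclusive: skip any remaining action
    return "Inconclusive", used
```

The early return is what implements "if the first action is conclusive, skip the second"; the slice keeps the loop within the two-action budget even if a claim type later gains more tools.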
Subagent Verdicts
The subagent returns per finding:
- Confirmed — found evidence supporting the claim (with source)
- Disputed — found counter-evidence (with source and explanation)
- Inconclusive — no definitive evidence either way
Reconciliation
Merge subagent results with the initial assessment:
- Confirmed: verdict stands, confidence may increase. Note the evidence source.
- Disputed: if originally Accepted → downgrade to Skip or flag with both perspectives. Never silently override — show the disagreement.
- Inconclusive: verdict stands, note the uncertainty.
Findings not investigated by the subagent keep their original assessment unchanged.
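The merge rules above could look roughly like the following. The `Assessment` type and `reconcile` function are hypothetical, and the sketch makes two assumptions the text leaves open: it treats "Accept with caveats" as an accepted verdict, and for disputed findings it picks the "downgrade to Skip" option while recording both perspectives in the notes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Assessment:
    issue: str
    verdict: str                      # "Accept", "Accept with caveats", or "Skip"
    notes: List[str] = field(default_factory=list)
    investigated: str = ""            # empty means not investigated


def reconcile(
    assessment: Assessment,
    subagent_verdict: Optional[str],
    evidence: str = "",
) -> Assessment:
    """Merge one subagent result into the initial assessment."""
    if subagent_verdict is None:
        return assessment             # not investigated: keep as-is
    if subagent_verdict == "Confirmed":
        assessment.investigated = f"Confirmed ({evidence})"
        assessment.notes.append(f"Evidence: {evidence}")
    elif subagent_verdict == "Disputed":
        assessment.investigated = f"Disputed: {evidence}"
        if assessment.verdict in ("Accept", "Accept with caveats"):
            # Never override silently: keep the original verdict visible.
            assessment.notes.append(
                f"Initial verdict was '{assessment.verdict}'; "
                f"subagent disputes it: {evidence}"
            )
            assessment.verdict = "Skip"
    elif subagent_verdict == "Inconclusive":
        assessment.investigated = "Inconclusive"
        assessment.notes.append("Subagent found no definitive evidence either way")
    else:
        raise ValueError(f"unknown subagent verdict: {subagent_verdict}")
    return assessment
```

Keeping the original verdict in `notes` is what lets the final report show the disagreement instead of only the downgraded result.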
Accepted Findings (high/medium confidence)
- Document what the issue is and where
- For medium confidence, note assumptions and risks
Skipped Findings (low confidence)
- Document why the suggestion was not accepted
- Note what additional context would be needed to reconsider
Presenting Results
Present a summary table.
When the Devil's Advocate subagent was not spawned:
| File | Issue | Confidence | Verdict |
|---|---|---|---|
When the subagent ran, add an Investigated column:
| File | Issue | Confidence | Verdict | Investigated |
|---|---|---|---|---|
Where Investigated shows:
- (empty) — not investigated by subagent
- Confirmed (source) — subagent found supporting evidence
- Disputed: [reason] — subagent found counter-evidence
- Inconclusive: subagent found no definitive evidence either way
For disputed findings, add a callout below the table showing both perspectives.
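As a rough sketch, the two table layouts can be produced by one helper that adds the Investigated column only when the subagent ran. The function name `render_summary` and the dictionary keys are assumptions made for this example.

```python
from typing import Dict, List


def render_summary(rows: List[Dict[str, str]], investigated: bool = False) -> str:
    """Render findings as a markdown table, with an optional Investigated column."""
    headers = ["File", "Issue", "Confidence", "Verdict"]
    keys = ["file", "issue", "confidence", "verdict"]
    if investigated:
        headers.append("Investigated")
        keys.append("investigated")

    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "---|" * len(headers),   # one separator cell per column
    ]
    for row in rows:
        # A missing key renders as an empty cell (for example, not investigated).
        cells = [row.get(key, "") for key in keys]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

A call such as `render_summary(rows, investigated=True)` leaves the last cell empty for findings the subagent did not look at, which matches the "(empty)" case described above.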
Rules
- If a finding references code that no longer exists or has since changed, skip it and note that the code has diverged.
- If two findings conflict with each other, skip both and document the conflict.
- For each finding, clarify whether the issue was introduced by the PR/changeset or is pre-existing. Present this distinction explicitly so the user can decide whether it belongs in this PR's scope.