evaluate-findings

Evaluate Findings

Confidence-based framework for evaluating external feedback (code reviews, AI suggestions, PR comments). Spawn a Devil's Advocate subagent to critically challenge non-trivial claims using research tools. Triage and classify findings — do not apply fixes. Return results for the main agent to act on.

Process

For each finding:
  1. Read the referenced code at the mentioned location — include the full function or logical block, not just the flagged line
  2. Verify the claim against the actual code — does the issue genuinely exist?
  3. Assign confidence:
| Level | Criteria | Verdict |
| --- | --- | --- |
| High (>80%) | Clear bug, typo, missing check, obvious improvement, style violation matching project conventions | Accept |
| Medium (50-80%) | Likely valid but involves judgment calls or unclear project intent | Accept with caveats |
| Low (<50%) | Subjective preference, requires domain knowledge, might break things, reviewer may be wrong | Skip |
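
The thresholds in the table can be sketched as a small lookup. This is only an illustration: the `classify` function and the `Assessment` type are hypothetical names, and representing confidence as a float in [0, 1] is an assumption, since the framework itself only gives percentage bands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    level: str
    verdict: str


def classify(confidence: float) -> Assessment:
    """Map a confidence estimate in [0, 1] to the level and verdict from the table."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > 0.80:  # High is strictly above 80%
        return Assessment("High", "Accept")
    if confidence >= 0.50:  # Medium covers 50% through 80% inclusive
        return Assessment("Medium", "Accept with caveats")
    return Assessment("Low", "Skip")  # anything below 50%
```

Note that exactly 80% falls in the Medium band here, because the table defines High as strictly greater than 80%.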

Devil's Advocate

After the initial assessment, spawn a subagent to critically challenge findings from a different angle using research tools.

Spawn Condition

Spawn when there are 3 or more findings scored Medium or higher that involve non-trivial claims — API behavior, correctness arguments, performance assertions, or anything not verifiable by reading the code alone.
Skip when all findings are clear-cut (typos, missing null checks, style issues) or total count is 1-2 trivial items.
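
As a rough sketch, the spawn condition reduces to counting challenge-worthy findings. The `Finding` fields and the `non_trivial` flag below are hypothetical; deciding whether a claim is non-trivial remains a judgment call that this code does not make for you. The sketch also assumes that fewer than three qualifying findings means no subagent, which is how the rule above reads.

```python
from dataclasses import dataclass
from typing import List

SPAWN_THRESHOLD = 3  # "3 or more findings" from the spawn condition


@dataclass
class Finding:
    summary: str
    level: str          # "High", "Medium", or "Low"
    non_trivial: bool   # hypothetical flag: claim cannot be verified by reading code alone


def challenge_worthy(findings: List[Finding]) -> List[Finding]:
    """Findings scored Medium or higher that involve non-trivial claims."""
    return [f for f in findings if f.level in ("High", "Medium") and f.non_trivial]


def should_spawn(findings: List[Finding]) -> bool:
    """Spawn the Devil's Advocate only when enough findings qualify."""
    return len(challenge_worthy(findings)) >= SPAWN_THRESHOLD
```

Only the findings returned by `challenge_worthy` would be handed to the subagent; clear-cut items keep their initial assessment.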

Subagent Instructions

Launch a single subagent using the Agent tool (in the foreground, since results are needed before presenting). Provide:
  1. The challenge-worthy findings with file locations, claims, and initial verdicts
  2. Instructions to challenge each finding — try to prove it wrong, or confirm it with evidence
The subagent picks research tools based on claim type:
| Claim Type | Tool |
| --- | --- |
| API deprecated/removed/changed | Documentation MCP tools or WebSearch |
| Method doesn't exist / wrong signature | Documentation MCP tools, WebSearch fallback |
| Code causes specific bug or behavior | Bash (isolated read-only test snippet) |
| Best practice or ecosystem claim | WebSearch |
| Migration or changelog lookup | WebSearch → WebFetch |
Use whatever documentation tools are available — MCP servers, relevant skills, WebSearch/WebFetch as fallback. The specific tools vary by project setup.
Budget: max 2 research actions per finding. If the first action is conclusive, skip the second.
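
One way to picture the tool selection and the two-action budget is the sketch below. The claim-type keys, the `investigate` function, and the `run_tool` callback are all hypothetical stand-ins; the real tools are invoked by the subagent, not through a Python API. Falling back to WebSearch for an unlisted claim type is also an assumption.

```python
from typing import Callable, List, Optional, Tuple

MAX_ACTIONS = 2  # budget: at most two research actions per finding

# Hypothetical keys for the claim types in the table; values are tool names in preferred order.
TOOLS_BY_CLAIM = {
    "api_changed": ["Documentation MCP tools", "WebSearch"],
    "missing_method": ["Documentation MCP tools", "WebSearch"],
    "specific_bug": ["Bash"],
    "best_practice": ["WebSearch"],
    "migration": ["WebSearch", "WebFetch"],
}


def investigate(
    claim_type: str, run_tool: Callable[[str], Optional[str]]
) -> Tuple[str, List[str]]:
    """Try tools in order and stop as soon as one gives a conclusive answer.

    run_tool(tool_name) returns "Confirmed", "Disputed", or None when the
    action was not conclusive.
    """
    used: List[str] = []
    tools = TOOLS_BY_CLAIM.get(claim_type, ["WebSearch"])  # assumed fallback
    for tool in tools[:MAX_ACTIONS]:
        used.append(tool)
        outcome = run_tool(tool)
        if outcome in ("Confirmed", "Disputed"):
            return outcome, used  # first action conclusive: skip the second
    return "Inconclusive", used
```

Returning the list of tools used makes it easy to check afterwards that no finding exceeded the budget.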

Subagent Verdicts

The subagent returns per finding:
  • Confirmed — found evidence supporting the claim (with source)
  • Disputed — found counter-evidence (with source and explanation)
  • Inconclusive — no definitive evidence either way

Reconciliation

Merge subagent results with the initial assessment:
  • Confirmed: verdict stands, confidence may increase. Note the evidence source.
  • Disputed: if originally Accepted → downgrade to Skip or flag with both perspectives. Never silently override — show the disagreement.
  • Inconclusive: verdict stands, note the uncertainty.
Findings not investigated by the subagent keep their original assessment unchanged.
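
The merge rules above might be sketched as follows. The `reconcile` function and `Reconciled` type are hypothetical. For disputed findings the text allows either a downgrade to Skip or a flag with both perspectives; this sketch assumes the downgrade and always sets a callout flag so the disagreement stays visible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reconciled:
    verdict: str
    note: str = ""
    needs_callout: bool = False  # True when both perspectives must be shown


def reconcile(initial_verdict: str, subagent_verdict: Optional[str], detail: str = "") -> Reconciled:
    """Merge one subagent verdict into the initial assessment of a finding."""
    if subagent_verdict is None:
        # Not investigated: the original assessment is kept unchanged.
        return Reconciled(initial_verdict)
    if subagent_verdict == "Confirmed":
        return Reconciled(initial_verdict, f"Confirmed ({detail})")
    if subagent_verdict == "Disputed":
        accepted = initial_verdict in ("Accept", "Accept with caveats")
        verdict = "Skip" if accepted else initial_verdict  # assumed: downgrade rather than flag only
        return Reconciled(verdict, f"Disputed: {detail}", needs_callout=True)
    if subagent_verdict == "Inconclusive":
        return Reconciled(initial_verdict, "Inconclusive: uncertainty noted")
    raise ValueError(f"unknown subagent verdict: {subagent_verdict!r}")
```

The `needs_callout` flag is what prevents a silent override: a disputed finding changes verdict only together with a visible record of the disagreement.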

Accepted Findings (high/medium confidence)

  1. Document what the issue is and where
  2. For medium confidence, note assumptions and risks

Skipped Findings (low confidence)

  1. Document why the suggestion was not accepted
  2. Note what additional context would be needed to reconsider

Presenting Results

Present a summary table.
When the Devil's Advocate subagent was not spawned:
| File | Issue | Confidence | Verdict |
| --- | --- | --- | --- |
When the subagent ran, add an Investigated column:
| File | Issue | Confidence | Verdict | Investigated |
| --- | --- | --- | --- | --- |
Where Investigated shows:
  • (empty) — not investigated by subagent
  • Confirmed (source) — subagent found supporting evidence
  • Disputed: [reason] — subagent found counter-evidence
For disputed findings, add a callout below the table showing both perspectives.
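
For illustration, the two table shapes could be produced by a helper like the one below. The `render_summary` function and the lowercase row keys are hypothetical, and a pipe-delimited layout is assumed for the output.

```python
from typing import Dict, List


def render_summary(rows: List[Dict[str, str]], subagent_ran: bool = False) -> str:
    """Render the summary table, adding the Investigated column only when the subagent ran."""
    headers = ["File", "Issue", "Confidence", "Verdict"]
    if subagent_ran:
        headers.append("Investigated")
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        # A missing "investigated" key yields an empty cell: not investigated by the subagent.
        cells = [row.get(header.lower(), "") for header in headers]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Hypothetical example rows, not taken from a real review.
rows = [
    {"file": "a.py", "issue": "missing null check", "confidence": "High", "verdict": "Accept"},
    {"file": "b.py", "issue": "API removed", "confidence": "Medium", "verdict": "Skip",
     "investigated": "Disputed: method still exists"},
]
print(render_summary(rows, subagent_ran=True))
```

The callout for disputed findings is deliberately left out of the helper, since it needs free-form text showing both perspectives.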

Rules

  • If a finding references code that no longer exists or has since changed, skip it and note that the code has diverged.
  • If two findings conflict with each other, skip both and document the conflict.
  • For each finding, clarify whether the issue was introduced by the PR/changeset or is pre-existing. Present this distinction explicitly so the user can decide whether it belongs in this PR's scope.
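
A minimal sketch of how these rules might be applied mechanically, assuming the divergence check, the conflict pairs, and the origin of each issue have already been determined by reading the code. All names here (`ReviewedFinding`, `apply_rules`, the `origin` values) are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class ReviewedFinding:
    id: str
    verdict: str
    origin: str                  # "introduced" by the PR/changeset, or "pre-existing"
    code_diverged: bool = False  # referenced code no longer exists or has changed
    note: str = ""


def apply_rules(
    findings: List[ReviewedFinding], conflicts: Iterable[Tuple[str, str]]
) -> List[ReviewedFinding]:
    """Skip findings on diverged code and both sides of every conflicting pair."""
    by_id = {f.id: f for f in findings}
    for f in findings:
        if f.code_diverged:
            f.verdict, f.note = "Skip", "code has diverged"
    for a, b in conflicts:
        for this_id, other_id in ((a, b), (b, a)):
            by_id[this_id].verdict = "Skip"
            by_id[this_id].note = f"conflicts with {other_id}"
    return findings


def scope_label(f: ReviewedFinding) -> str:
    """State explicitly whether the issue belongs to this PR or predates it."""
    return "introduced by this PR" if f.origin == "introduced" else "pre-existing"
```

Keeping `origin` as a required field means every finding carries the introduced-versus-pre-existing distinction through to the final summary.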