research-review


Research Review via Codex MCP (xhigh reasoning)

Get a multi-round critical review of research work from an external LLM with maximum reasoning depth.

Constants

  • REVIEWER_MODEL = gpt-5.4: the model used via Codex MCP. Must be an OpenAI model (e.g., gpt-5.4, o3, gpt-4o).

Context: $ARGUMENTS

Prerequisites

  • Codex MCP Server configured in Claude Code:
      claude mcp add codex -s user -- codex mcp-server
  • This gives Claude Code access to the mcp__codex__codex and mcp__codex__codex-reply tools.

Workflow

Step 1: Gather Research Context

Before calling the external reviewer, compile a comprehensive briefing:
  1. Read project narrative documents (e.g., STORY.md, README.md, paper drafts)
  2. Read any memory/notes files for key findings and experiment history
  3. Identify: core claims, methodology, key results, known weaknesses
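The briefing step above can be sketched in Python. This is a minimal sketch and not part of the skill itself: the helper name build_briefing, the section header layout, and the per-file character cap are assumptions made for illustration. Only the example file names (STORY.md, README.md) come from the list above.

```python
from pathlib import Path


def build_briefing(project_root, filenames=("STORY.md", "README.md"), max_chars=20000):
    """Concatenate the narrative documents that exist into one briefing string.

    Hypothetical helper: the default file list and the per-file cap are illustrative.
    """
    root = Path(project_root)
    sections = []
    for name in filenames:
        path = root / name
        if not path.is_file():
            continue  # skip documents this project does not have
        text = path.read_text(encoding="utf-8")[:max_chars]
        sections.append(f"=== {name} ===\n{text.strip()}")
    return "\n\n".join(sections)
```

The core claims, methodology, key results, and known weaknesses still have to be written by hand and added to the result; the helper only gathers the raw documents.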
在调用外部评审模型前,先整理一份全面的简报:
  1. 阅读项目叙事文档(例如STORY.md、README.md、论文草稿)
  2. 阅读所有记录关键发现和实验历史的记忆/笔记文件
  3. 明确:核心论点、研究方法、关键结果、已知缺陷

Step 2: Initial Review (Round 1)

Send a detailed prompt with xhigh reasoning:
mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    [Full research context + specific questions]
    Please act as a senior ML reviewer (NeurIPS/ICML level). Identify:
    1. Logical gaps or unjustified claims
    2. Missing experiments that would strengthen the story
    3. Narrative weaknesses
    4. Whether the contribution is sufficient for a top venue
    Please be brutally honest.
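As an illustration, the Round 1 arguments could be assembled programmatically before being handed to the tool. The sketch below only builds the argument dictionary and does not call any MCP client. The config and prompt keys mirror the call shown above; the function name build_round1_args and the "Specific questions" layout are assumptions.

```python
import json

REVIEW_INSTRUCTIONS = """Please act as a senior ML reviewer (NeurIPS/ICML level). Identify:
1. Logical gaps or unjustified claims
2. Missing experiments that would strengthen the story
3. Narrative weaknesses
4. Whether the contribution is sufficient for a top venue
Please be brutally honest."""


def build_round1_args(context, questions):
    """Return the argument dict for the mcp__codex__codex call (hypothetical helper)."""
    if not context.strip():
        raise ValueError("Round 1 needs the full research context")
    question_block = "\n".join(f"- {q}" for q in questions)
    prompt = (
        f"{context.strip()}\n\n"
        f"Specific questions:\n{question_block}\n\n"
        f"{REVIEW_INSTRUCTIONS}"
    )
    return {
        "config": {"model_reasoning_effort": "xhigh"},
        "prompt": prompt,
    }


if __name__ == "__main__":
    # Placeholder context and question, for demonstration only.
    args = build_round1_args("Our method ...", ["Is the ablation convincing?"])
    print(json.dumps(args, indent=2))
```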

Step 3: Iterative Dialogue (Rounds 2-N)

Use mcp__codex__codex-reply with the returned threadId to continue the conversation.
For each round:
  1. Respond to criticisms with evidence/counterarguments
  2. Ask targeted follow-ups on the most actionable points
  3. Request specific deliverables: experiment designs, paper outlines, claims matrices
Key follow-up patterns:
  • "If we reframe X as Y, does that change your assessment?"
  • "What's the minimum experiment to satisfy concern Z?"
  • "Please design the minimal additional experiment package (highest acceptance lift per GPU week)"
  • "Please write a mock NeurIPS/ICML review with scores"
  • "Give me a results-to-claims matrix for possible experimental outcomes"
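One way to keep the rounds organized is a small record that holds the threadId and builds each reply. This is a sketch under assumptions: the class name ReviewThread, its fields, and the prompt layout are hypothetical, and it assumes the reply tool accepts the thread ID under the key "threadId" alongside "prompt". Check the tool schema before relying on that.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewThread:
    """Tracks one review conversation. All names here are hypothetical."""
    thread_id: str
    rounds: list = field(default_factory=list)  # (round_number, prompt) pairs

    def next_reply_args(self, rebuttal, follow_ups, deliverable=None):
        """Build arguments for the next mcp__codex__codex-reply call."""
        parts = [rebuttal.strip()]
        parts.extend(f"Follow-up: {q}" for q in follow_ups)
        if deliverable:
            parts.append(f"Requested deliverable: {deliverable}")
        prompt = "\n\n".join(parts)
        # Round 1 was the initial review, so replies start at round 2.
        self.rounds.append((len(self.rounds) + 2, prompt))
        # Assumption: the reply tool takes the thread ID under "threadId".
        return {"threadId": self.thread_id, "prompt": prompt}
```

Each call covers the three per-round actions: the rebuttal, the targeted follow-ups, and an optional deliverable request.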

Step 4: Convergence

Stop iterating when:
  • Both sides agree on the core claims and their evidence requirements
  • A concrete experiment plan is established
  • The narrative structure is settled
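The stopping criteria can be written down as an explicit check. The sketch below assumes all three conditions must hold together, which the list above does not state outright, and it adds a round cap (max_rounds) as a safety limit that is not part of the original workflow. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReviewState:
    claims_agreed: bool = False        # core claims and their evidence requirements
    experiment_plan_set: bool = False  # a concrete experiment plan exists
    narrative_settled: bool = False    # the narrative structure is settled


def should_stop(state, round_number, max_rounds=6):
    """Stop when all three criteria hold, or when the assumed round cap is hit."""
    converged = (
        state.claims_agreed
        and state.experiment_plan_set
        and state.narrative_settled
    )
    return converged or round_number >= max_rounds
```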

Step 5: Document Everything

Save the full interaction and conclusions to a review document in the project root:
  • Round-by-round summary of criticisms and responses
  • Final consensus on claims, narrative, and experiments
  • Claims matrix (what claims are allowed under each possible outcome)
  • Prioritized TODO list with estimated compute costs
  • Paper outline if discussed
Update project memory/notes with key review conclusions.
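The review document described above could be rendered from structured notes, for example as follows. This is a minimal sketch: the function name render_review_doc, its parameters, and the plain-text layout are assumptions, and the caller is expected to pass the TODO items already sorted by priority.

```python
def render_review_doc(thread_id, rounds, consensus, claims_matrix, todos, outline=None):
    """Render a self-contained plain-text review document (hypothetical layout).

    rounds: list of (criticism, response) pairs, in order starting at round 1
    claims_matrix: dict mapping an outcome description to the claim it allows
    todos: list of (task, estimated_gpu_weeks) pairs, highest priority first
    """
    lines = ["Research Review", f"threadId: {thread_id}", "", "Round-by-round summary"]
    for i, (criticism, response) in enumerate(rounds, start=1):
        lines.append(f"  Round {i}: {criticism} -> {response}")
    lines += ["", "Final consensus", f"  {consensus}", "", "Claims matrix"]
    for outcome, claim in claims_matrix.items():
        lines.append(f"  If {outcome}: {claim}")
    lines += ["", "Prioritized TODO list"]
    for rank, (task, gpu_weeks) in enumerate(todos, start=1):
        lines.append(f"  {rank}. {task} (~{gpu_weeks} GPU weeks)")
    if outline:
        lines += ["", "Paper outline", f"  {outline}"]
    return "\n".join(lines) + "\n"
```

Writing the threadId into the document keeps the conversation resumable later, and the paper outline section is emitted only if one was discussed.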

Key Rules

  • ALWAYS use config: {"model_reasoning_effort": "xhigh"} for reviews
  • Send comprehensive context in Round 1 — the external model cannot read your files
  • Be honest about weaknesses — hiding them leads to worse feedback
  • Push back on criticisms you disagree with, but accept valid ones
  • Focus on ACTIONABLE feedback — "what experiment would fix this?"
  • Document the threadId for potential future resumption
  • The review document should be self-contained (readable without the conversation)

Prompt Templates

For initial review:

"I'm going to present a complete ML research project for your critical review. Please act as a senior ML reviewer (NeurIPS/ICML level)..."

For experiment design:

"Please design the minimal additional experiment package that gives the highest acceptance lift per GPU week. Our compute: [describe]. Be very specific about configurations."

For paper structure:

"Please turn this into a concrete paper outline with section-by-section claims and figure plan."

For claims matrix:

"Please give me a results-to-claims matrix: what claim is allowed under each possible outcome of experiments X and Y?"
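To make the requested matrix concrete, the sketch below enumerates every combination of outcomes for a set of experiments and leaves a slot for the claim each combination would allow. The function name claims_matrix_skeleton and the outcome labels in the usage example are hypothetical. The claims themselves come from the review and are not generated by this code.

```python
from itertools import product


def claims_matrix_skeleton(experiments):
    """Enumerate every combination of outcomes with an empty claim slot.

    experiments: dict mapping experiment name -> list of possible outcomes
    """
    names = list(experiments)
    matrix = {}
    for combo in product(*(experiments[n] for n in names)):
        # Key is a tuple of (experiment, outcome) pairs; the claim is filled in later.
        matrix[tuple(zip(names, combo))] = None
    return matrix


if __name__ == "__main__":
    skeleton = claims_matrix_skeleton(
        {"X": ["positive", "negative"], "Y": ["positive", "negative"]}
    )
    for outcome in skeleton:
        print(outcome)
```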

For mock review:

"Please write a mock NeurIPS review with: Summary, Strengths, Weaknesses, Questions for Authors, Score, Confidence, and What Would Move Toward Accept."