research-review

Research Review via Codex MCP (xhigh reasoning)

Get a multi-round critical review of research work from an external LLM with maximum reasoning depth.

Constants

REVIEWER_MODEL = `gpt-5.4`: the model used via Codex MCP. Must be an OpenAI model (e.g., `gpt-5.4`, `o3`, `gpt-4o`).

Context: $ARGUMENTS

Prerequisites

Codex MCP Server configured in Claude Code:

    claude mcp add codex -s user -- codex mcp-server

This gives Claude Code access to the `mcp__codex__codex` and `mcp__codex__codex-reply` tools.
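Before registering the server, it can help to confirm that both command-line tools are installed. The sketch below is a minimal pre-flight check, assuming the executables are named `claude` and `codex` as in the command above; the helper name `missing_cli_tools` is hypothetical. It only checks that the binaries are on PATH and does not verify that the MCP server is actually registered.

```python
import shutil


def missing_cli_tools(tools=("claude", "codex")):
    """Return the names from `tools` that cannot be found on PATH."""
    # shutil.which returns None when no executable with that name is found.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_cli_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Both CLIs found; the 'claude mcp add' command can be run.")
```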

Workflow

Step 1: Gather Research Context

Before calling the external reviewer, compile a comprehensive briefing:

Read project narrative documents (e.g., STORY.md, README.md, paper drafts)
Read any memory/notes files for key findings and experiment history
Identify: core claims, methodology, key results, known weaknesses
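The gathering step can be sketched as a small helper that concatenates whichever narrative files exist and appends placeholders for the four items to identify. This is only an illustration: `build_briefing`, the `NOTES.md` filename, and the `max_chars` cap are assumptions, not part of the workflow itself.

```python
from pathlib import Path

# STORY.md and README.md come from the examples in the text;
# NOTES.md is a hypothetical stand-in for a memory/notes file.
DEFAULT_SOURCES = ("STORY.md", "README.md", "NOTES.md")


def build_briefing(project_root, sources=DEFAULT_SOURCES, max_chars=20000):
    """Concatenate the narrative/notes files that exist into one briefing string."""
    root = Path(project_root)
    sections = []
    for name in sources:
        path = root / name
        if not path.is_file():
            continue  # skip files this project does not have
        text = path.read_text(encoding="utf-8").strip()
        if len(text) > max_chars:
            text = text[:max_chars] + "\n[truncated]"
        sections.append(f"## {name}\n{text}")
    # Placeholders for the items the step asks you to identify by hand.
    sections.append(
        "## Reviewer briefing\n"
        "Core claims: [fill in]\n"
        "Methodology: [fill in]\n"
        "Key results: [fill in]\n"
        "Known weaknesses: [fill in]"
    )
    return "\n\n".join(sections)
```

The placeholders are left unfilled on purpose: the claims and weaknesses have to come from reading the material, not from string concatenation.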

Step 2: Initial Review (Round 1)

Send a detailed prompt with xhigh reasoning:

mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    [Full research context + specific questions]
    Please act as a senior ML reviewer (NeurIPS/ICML level). Identify:
    1. Logical gaps or unjustified claims
    2. Missing experiments that would strengthen the story
    3. Narrative weaknesses
    4. Whether the contribution is sufficient for a top venue
    Please be brutally honest.
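As a rough illustration, the arguments for the Round 1 call could be assembled like this. The `config` and `prompt` keys mirror the call shown above; the function name `build_initial_review_args` and the "Specific questions" layout are assumptions made for this sketch.

```python
REASONING_CONFIG = {"model_reasoning_effort": "xhigh"}

REVIEW_INSTRUCTIONS = (
    "Please act as a senior ML reviewer (NeurIPS/ICML level). Identify:\n"
    "1. Logical gaps or unjustified claims\n"
    "2. Missing experiments that would strengthen the story\n"
    "3. Narrative weaknesses\n"
    "4. Whether the contribution is sufficient for a top venue\n"
    "Please be brutally honest."
)


def build_initial_review_args(context, questions):
    """Build the argument dict for the Round 1 review call."""
    if not context.strip():
        # The external model cannot read project files, so an empty
        # context would make the review meaningless.
        raise ValueError("Round 1 needs the full research context")
    question_block = "\n".join(f"- {q}" for q in questions)
    prompt = (
        f"{context.strip()}\n\n"
        f"Specific questions:\n{question_block}\n\n"
        f"{REVIEW_INSTRUCTIONS}"
    )
    # Copy the config so callers cannot mutate the shared constant.
    return {"config": dict(REASONING_CONFIG), "prompt": prompt}
```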

Step 3: Iterative Dialogue (Rounds 2-N)

Use `mcp__codex__codex-reply` with the returned `threadId` to continue the conversation:

For each round:

Respond to criticisms with evidence/counterarguments
Ask targeted follow-ups on the most actionable points
Request specific deliverables: experiment designs, paper outlines, claims matrices

Key follow-up patterns:

"If we reframe X as Y, does that change your assessment?"
"What's the minimum experiment to satisfy concern Z?"
"Please design the minimal additional experiment package (highest acceptance lift per GPU week)"
"Please write a mock NeurIPS/ICML review with scores"
"Give me a results-to-claims matrix for possible experimental outcomes"
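A follow-up round might be assembled along these lines. This is a sketch only: the text says the reply tool is used with the returned `threadId`, but the exact argument names (`threadId`, `prompt`) and the helper `build_reply_args` are assumptions here.

```python
# A few of the follow-up patterns above, with fill-in slots.
FOLLOW_UPS = {
    "reframe": "If we reframe {x} as {y}, does that change your assessment?",
    "minimum_experiment": "What's the minimum experiment to satisfy concern {z}?",
    "mock_review": "Please write a mock NeurIPS/ICML review with scores",
}


def build_reply_args(thread_id, responses, follow_up):
    """Build the argument dict for one follow-up round.

    `responses` is a list of (criticism, answer) pairs from the previous round.
    """
    if not thread_id:
        raise ValueError("a threadId from the initial call is required")
    parts = [f"Re: {criticism}\n{answer}" for criticism, answer in responses]
    parts.append(follow_up)
    return {"threadId": thread_id, "prompt": "\n\n".join(parts)}
```

Answering each criticism before asking the next question keeps the reviewer's assessment tied to the evidence provided in that round.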

Step 4: Convergence

Stop iterating when:

Both sides agree on the core claims and their evidence requirements
A concrete experiment plan is established
The narrative structure is settled
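The three stopping conditions can be tracked as a simple checklist. The sketch below is illustrative; the `ReviewState` class and its field names are hypothetical, and deciding whether each condition actually holds remains a judgment call.

```python
from dataclasses import dataclass


@dataclass
class ReviewState:
    claims_agreed: bool = False        # core claims and evidence requirements agreed
    experiment_plan_set: bool = False  # concrete experiment plan established
    narrative_settled: bool = False    # narrative structure settled

    def open_items(self):
        """Return the names of the conditions that are not yet met."""
        return [name for name, done in vars(self).items() if not done]

    def has_converged(self):
        """True only when all three stopping conditions hold."""
        return not self.open_items()
```

For example, `ReviewState(claims_agreed=True).open_items()` lists the two remaining conditions, which makes it easy to see what the next round should target.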

Step 5: Document Everything

Save the full interaction and conclusions to a review document in the project root:

Round-by-round summary of criticisms and responses
Final consensus on claims, narrative, and experiments
Claims matrix (what claims are allowed under each possible outcome)
Prioritized TODO list with estimated compute costs
Paper outline if discussed

Update project memory/notes with key review conclusions.
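One possible way to assemble the review document is sketched below. The section headings, the function name `render_review_document`, and the tuple layouts are assumptions chosen for illustration; the text does not prescribe a file format.

```python
def render_review_document(thread_id, rounds, consensus, claims_matrix, todos, outline=None):
    """Render a self-contained review document as Markdown text.

    rounds:        list of (criticisms, responses) strings, one pair per round
    claims_matrix: dict mapping an outcome description to the claim it allows
    todos:         list of (priority, task, gpu_weeks); lower priority number first
    """
    lines = ["# Research review", f"threadId: {thread_id}", "", "## Round-by-round summary"]
    for i, (criticisms, responses) in enumerate(rounds, start=1):
        lines.append(f"Round {i} criticisms: {criticisms}")
        lines.append(f"Round {i} responses: {responses}")
    lines += ["", "## Final consensus", consensus, "", "## Claims matrix"]
    for outcome, claim in claims_matrix.items():
        lines.append(f"- If {outcome}: {claim}")
    lines += ["", "## Prioritized TODO"]
    for priority, task, gpu_weeks in sorted(todos, key=lambda t: t[0]):
        lines.append(f"- P{priority}: {task} (~{gpu_weeks} GPU weeks)")
    if outline:  # the outline section is included only if it was discussed
        lines += ["", "## Paper outline", outline]
    return "\n".join(lines) + "\n"
```

Writing the result to a file in the project root (for example with `pathlib.Path.write_text`) gives a record that is readable without the original conversation.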

Key Rules

ALWAYS use `config: {"model_reasoning_effort": "xhigh"}` for reviews

Send comprehensive context in Round 1 — the external model cannot read your files
Be honest about weaknesses — hiding them leads to worse feedback
Push back on criticisms you disagree with, but accept valid ones
Focus on ACTIONABLE feedback — "what experiment would fix this?"
Record the threadId so the conversation can be resumed later
The review document should be self-contained (readable without the conversation)

Prompt Templates

For initial review:

"I'm going to present a complete ML research project for your critical review. Please act as a senior ML reviewer (NeurIPS/ICML level)..."

For experiment design:

"Please design the minimal additional experiment package that gives the highest acceptance lift per GPU week. Our compute: [describe]. Be very specific about configurations."

For paper structure:

"Please turn this into a concrete paper outline with section-by-section claims and figure plan."

For claims matrix:

"Please give me a results-to-claims matrix: what claim is allowed under each possible outcome of experiments X and Y?"
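To make the shape of a results-to-claims matrix concrete, the sketch below enumerates every combination of outcomes for two experiments and leaves the allowed claim as a placeholder. The outcome labels ("positive", "null") and the helper name `empty_claims_matrix` are hypothetical; the actual claims come from the review itself.

```python
from itertools import product


def empty_claims_matrix(outcomes_by_experiment):
    """Return a dict with one entry per combination of experiment outcomes.

    Keys are tuples of (experiment, outcome) pairs; values are placeholders
    to be replaced with the claim the reviewer agrees is allowed.
    """
    names = list(outcomes_by_experiment)
    matrix = {}
    for combo in product(*(outcomes_by_experiment[n] for n in names)):
        matrix[tuple(zip(names, combo))] = "[claim allowed: fill in from the review]"
    return matrix


# Two experiments with two outcomes each give four cells to fill in.
matrix = empty_claims_matrix({"X": ["positive", "null"], "Y": ["positive", "null"]})
for outcome in matrix:
    print(outcome)
```

Enumerating the cells up front makes it harder to forget the unfavorable outcomes when asking the reviewer which claims each one still supports.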

For mock review:

"Please write a mock NeurIPS review with: Summary, Strengths, Weaknesses, Questions for Authors, Score, Confidence, and What Would Move Toward Accept."
