research-review
Research Review via Codex MCP (xhigh reasoning)
Get a multi-round critical review of research work from an external LLM with maximum reasoning depth.
Constants
- REVIEWER_MODEL = gpt-5.4: the model used via Codex MCP. Must be an OpenAI model (e.g., gpt-5.4, o3, gpt-4o)
Context: $ARGUMENTS
Prerequisites
- Codex MCP Server configured in Claude Code:
    claude mcp add codex -s user -- codex mcp-server
- This gives Claude Code access to the mcp__codex__codex and mcp__codex__codex-reply tools
Workflow
Step 1: Gather Research Context
Before calling the external reviewer, compile a comprehensive briefing:
- Read project narrative documents (e.g., STORY.md, README.md, paper drafts)
- Read any memory/notes files for key findings and experiment history
- Identify: core claims, methodology, key results, known weaknesses
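The briefing step above can be sketched as a small helper that concatenates whichever narrative files exist and then states the four identified items explicitly. This is only an illustration: the function name `build_briefing`, the default file list, and the per-file character cap are assumptions, not part of the skill.

```python
from pathlib import Path

# Hypothetical defaults: the file names are the examples from the step above,
# and the cap is an assumed limit to keep the prompt manageable.
NARRATIVE_FILES = ("STORY.md", "README.md")
MAX_CHARS_PER_FILE = 20_000


def build_briefing(project_root, claims, methodology, results, weaknesses,
                   extra_files=()):
    """Assemble one plain-text briefing for the external reviewer."""
    root = Path(project_root)
    sections = []
    for name in (*NARRATIVE_FILES, *extra_files):
        path = root / name
        if path.is_file():  # skip documents this project does not have
            text = path.read_text(encoding="utf-8")[:MAX_CHARS_PER_FILE]
            sections.append(f"=== {name} ===\n{text.strip()}")
    # The four items the step asks you to identify, listed explicitly.
    for title, items in (
        ("Core claims", claims),
        ("Methodology", methodology),
        ("Key results", results),
        ("Known weaknesses", weaknesses),
    ):
        bullets = "\n".join(f"- {item}" for item in items)
        sections.append(f"=== {title} ===\n{bullets}")
    return "\n\n".join(sections)
```

Listing the known weaknesses as their own section keeps them visible to the reviewer instead of leaving them buried in the narrative files.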
Step 2: Initial Review (Round 1)
Send a detailed prompt with xhigh reasoning:
mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    [Full research context + specific questions]
    Please act as a senior ML reviewer (NeurIPS/ICML level). Identify:
    1. Logical gaps or unjustified claims
    2. Missing experiments that would strengthen the story
    3. Narrative weaknesses
    4. Whether the contribution is sufficient for a top venue
    Please be brutally honest.
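As a rough sketch, the round-1 arguments could be assembled programmatically like this. The `config` and `prompt` keys mirror the call shown above; the helper name `build_initial_review_args` and the "Specific questions" layout are assumptions made for illustration, and the exact tool schema should be checked against your installed Codex MCP server.

```python
import json

# Reviewer instructions copied from the round-1 prompt in this step.
REVIEW_INSTRUCTIONS = """Please act as a senior ML reviewer (NeurIPS/ICML level). Identify:
1. Logical gaps or unjustified claims
2. Missing experiments that would strengthen the story
3. Narrative weaknesses
4. Whether the contribution is sufficient for a top venue
Please be brutally honest."""


def build_initial_review_args(briefing, questions):
    """Return the arguments for a round-1 mcp__codex__codex call (hypothetical helper)."""
    if not briefing.strip():
        # The external model cannot read local files, so an empty briefing is useless.
        raise ValueError("briefing is empty; the reviewer cannot read your files")
    question_block = "\n".join(f"- {q}" for q in questions)
    prompt = (
        f"{briefing}\n\nSpecific questions:\n{question_block}\n\n"
        f"{REVIEW_INSTRUCTIONS}"
    )
    return {
        "config": {"model_reasoning_effort": "xhigh"},
        "prompt": prompt,
    }


args = build_initial_review_args("Core claim: ...", ["Is the baseline fair?"])
print(json.dumps(args["config"]))  # {"model_reasoning_effort": "xhigh"}
```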
Step 3: Iterative Dialogue (Rounds 2-N)
Use mcp__codex__codex-reply with the returned threadId to continue the conversation.
For each round:
- Respond to criticisms with evidence/counterarguments
- Ask targeted follow-ups on the most actionable points
- Request specific deliverables: experiment designs, paper outlines, claims matrices
Key follow-up patterns:
- "If we reframe X as Y, does that change your assessment?"
- "What's the minimum experiment to satisfy concern Z?"
- "Please design the minimal additional experiment package (highest acceptance lift per GPU week)"
- "Please write a mock NeurIPS/ICML review with scores"
- "Give me a results-to-claims matrix for possible experimental outcomes"
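One way to keep follow-up rounds organized is a small tracker that stores the thread ID and builds each reply. This is a minimal sketch: the `ReviewThread` class and its section headings are hypothetical, and the `threadId` and `prompt` keys are assumed from the description above rather than taken from a verified tool schema.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewThread:
    """Tracks one review conversation; thread_id comes from the round-1 response."""
    thread_id: str
    rounds: list = field(default_factory=list)  # prompts sent so far, in order

    def build_reply_args(self, rebuttals=(), follow_ups=(), deliverables=()):
        """Return arguments for the next mcp__codex__codex-reply call."""
        parts = []
        for heading, items in (
            ("Responses to your criticisms:", rebuttals),
            ("Follow-up questions:", follow_ups),
            ("Requested deliverables:", deliverables),
        ):
            if items:
                bullets = "\n".join(f"- {item}" for item in items)
                parts.append(f"{heading}\n{bullets}")
        if not parts:
            raise ValueError("a follow-up round needs at least one item")
        prompt = "\n\n".join(parts)
        self.rounds.append(prompt)  # keep a record for the review document
        return {"threadId": self.thread_id, "prompt": prompt}
```

A round might then be built with `ReviewThread("<threadId from round 1>").build_reply_args(follow_ups=["What's the minimum experiment to satisfy concern Z?"])`.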
Step 4: Convergence
Stop iterating when:
- Both sides agree on the core claims and their evidence requirements
- A concrete experiment plan is established
- The narrative structure is settled
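The stopping criteria above can be tracked explicitly so it is clear which one is still open after each round. The sketch below is illustrative only; `ConvergenceState` and its field names are made up for this example.

```python
from dataclasses import dataclass

# Field name -> human-readable criterion, mirroring the three bullets above.
CRITERIA = {
    "claims_agreed": "core claims and their evidence requirements agreed",
    "experiment_plan_set": "concrete experiment plan established",
    "narrative_settled": "narrative structure settled",
}


@dataclass
class ConvergenceState:
    claims_agreed: bool = False
    experiment_plan_set: bool = False
    narrative_settled: bool = False

    def unmet(self):
        """List the criteria that still block convergence."""
        return [label for name, label in CRITERIA.items()
                if not getattr(self, name)]

    def converged(self):
        """True only when all three criteria hold."""
        return not self.unmet()
```

The output of `unmet()` doubles as a list of topics for the next round's follow-up questions.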
Step 5: Document Everything
Save the full interaction and conclusions to a review document in the project root:
- Round-by-round summary of criticisms and responses
- Final consensus on claims, narrative, and experiments
- Claims matrix (what claims are allowed under each possible outcome)
- Prioritized TODO list with estimated compute costs
- Paper outline if discussed
Update project memory/notes with key review conclusions.
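A minimal sketch of writing such a review document follows. The section order tracks the list above; the function names, the `REVIEW.md` filename, and the GPU-hours unit for compute estimates are assumptions chosen for illustration.

```python
from pathlib import Path


def render_review_doc(thread_id, rounds, consensus, claims_matrix, todos,
                      outline=None):
    """Render a self-contained review record as plain text.

    rounds: list of (criticism, response) pairs, one per round
    claims_matrix: dict mapping an experimental outcome to the allowed claim
    todos: list of (task, estimated_gpu_hours), already in priority order
    """
    lines = ["Research Review Record", f"Thread ID: {thread_id}", "",
             "Round-by-round summary"]
    for i, (criticism, response) in enumerate(rounds, start=1):
        lines.append(f"  Round {i}")
        lines.append(f"    Criticism: {criticism}")
        lines.append(f"    Response: {response}")
    lines += ["", "Final consensus", f"  {consensus}", "", "Claims matrix"]
    for outcome, claim in claims_matrix.items():
        lines.append(f"  If {outcome}: {claim}")
    lines += ["", "Prioritized TODO list"]
    for rank, (task, gpu_hours) in enumerate(todos, start=1):
        lines.append(f"  {rank}. {task} (est. {gpu_hours} GPU hours)")
    if outline:  # only present if a paper outline was discussed
        lines += ["", "Paper outline"] + [f"  {section}" for section in outline]
    return "\n".join(lines) + "\n"


def save_review_doc(project_root, text, filename="REVIEW.md"):
    """Write the record to the project root and return its path."""
    path = Path(project_root) / filename
    path.write_text(text, encoding="utf-8")
    return path
```

Including the thread ID in the header keeps the option of resuming the conversation later without searching through logs.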
Key Rules
- ALWAYS use config: {"model_reasoning_effort": "xhigh"} for reviews
- Send comprehensive context in Round 1, because the external model cannot read your files
- Be honest about weaknesses — hiding them leads to worse feedback
- Push back on criticisms you disagree with, but accept valid ones
- Focus on ACTIONABLE feedback — "what experiment would fix this?"
- Document the threadId for potential future resumption
- The review document should be self-contained (readable without the conversation)
Prompt Templates
For initial review:
"I'm going to present a complete ML research project for your critical review. Please act as a senior ML reviewer (NeurIPS/ICML level)..."
For experiment design:
"Please design the minimal additional experiment package that gives the highest acceptance lift per GPU week. Our compute: [describe]. Be very specific about configurations."
For paper structure:
"Please turn this into a concrete paper outline with section-by-section claims and figure plan."
For claims matrix:
"Please give me a results-to-claims matrix: what claim is allowed under each possible outcome of experiments X and Y?"
For mock review:
"Please write a mock NeurIPS review with: Summary, Strengths, Weaknesses, Questions for Authors, Score, Confidence, and What Would Move Toward Accept."
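If these templates are reused across projects, the bracketed slots can be filled in code. Below is a small sketch using the standard library's `string.Template`; the registry name, the template keys, and the `$compute` and `$experiments` placeholders are assumptions that stand in for "[describe]" and "X and Y" in the templates above.

```python
from string import Template

# Two of the templates above, with placeholders for the project-specific parts.
PROMPT_TEMPLATES = {
    "experiment_design": Template(
        "Please design the minimal additional experiment package that gives "
        "the highest acceptance lift per GPU week. Our compute: $compute. "
        "Be very specific about configurations."
    ),
    "claims_matrix": Template(
        "Please give me a results-to-claims matrix: what claim is allowed "
        "under each possible outcome of experiments $experiments?"
    ),
}


def render_prompt(name, **fields):
    """Fill a template; raises KeyError if the name or a placeholder is missing."""
    return PROMPT_TEMPLATES[name].substitute(**fields)
```

Using `substitute` rather than `safe_substitute` means an unfilled placeholder fails loudly instead of being sent to the reviewer verbatim.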