auto-review-loop-minimax

Auto Review Loop (MiniMax Version): Autonomous Research Improvement

Autonomously iterate: review → implement fixes → re-review, until the external reviewer gives a positive assessment or MAX_ROUNDS is reached.

Context: $ARGUMENTS

Constants

  • MAX_ROUNDS = 4
  • POSITIVE_THRESHOLD: score >= 6/10, or verdict contains "accept", "sufficient", "ready for submission"
  • REVIEW_DOC: AUTO_REVIEW.md in project root (cumulative log)
  • REVIEWER_MODEL = MiniMax-M2.5 (model used via the MiniMax API)

API Configuration

This skill uses the MiniMax API for external review. Two methods are supported:

Method 1: MCP Tool (Primary)

If mcp__minimax-chat__minimax_chat is available, use it:
mcp__minimax-chat__minimax_chat:
  prompt: |
    [Review prompt content]
  model: "MiniMax-M2.5"
  system: "You are a senior machine learning researcher..."

Method 2: curl (Fallback)

If MCP is not available, use curl directly:
bash
curl -s "https://api.minimax.chat/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MINIMAX_API_KEY" \
  -d '{
    "model": "MiniMax-M2.5",
    "messages": [
      {"role": "system", "content": "You are a senior ML researcher..."},
      {"role": "user", "content": "[Review prompt]"}
    ],
    "max_tokens": 4096
  }'
API Key: Read from ~/.claude/settings.json under env.MINIMAX_API_KEY, or from the MINIMAX_API_KEY environment variable.
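The key lookup can be sketched in Python as follows. This is a minimal sketch, not part of the skill itself: the function name load_minimax_api_key is hypothetical, and trying the settings file before the environment variable is an assumption, since the text above does not state a precedence.

```python
import json
import os
from pathlib import Path


def load_minimax_api_key(settings_path=None, environ=None):
    """Return the MiniMax API key, or None if it cannot be found."""
    environ = os.environ if environ is None else environ
    path = Path(settings_path) if settings_path else Path.home() / ".claude" / "settings.json"
    try:
        settings = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or malformed settings file: fall through to the environment.
        settings = {}
    env_block = settings.get("env") if isinstance(settings, dict) else None
    key = env_block.get("MINIMAX_API_KEY") if isinstance(env_block, dict) else None
    # Assumption: the settings file wins when both sources define a key.
    return key or environ.get("MINIMAX_API_KEY")
```

Returning None rather than raising lets the caller decide whether a missing key is fatal or whether the MCP tool can be used instead.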
Why MiniMax instead of Codex MCP? Codex CLI uses OpenAI's Responses API (/v1/responses), which is not supported by third-party providers. See: https://github.com/openai/codex/discussions/7782

State Persistence (Compact Recovery)

Long-running loops may hit the context window limit, triggering automatic compaction. To survive this, persist state to REVIEW_STATE.json after each round:
json
{
  "round": 2,
  "status": "in_progress",
  "last_score": 5.0,
  "last_verdict": "not ready",
  "pending_experiments": ["screen_name_1"],
  "timestamp": "2026-03-13T21:00:00"
}
Write this file at the end of every Phase E (after documenting the round). Overwrite each time — only the latest state matters.
On completion (positive assessment or max rounds), set "status": "completed" so future invocations don't accidentally resume a finished loop.
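Writing the state file could look like the sketch below. The function name save_review_state and its parameters are hypothetical. The field names and the timestamp format follow the JSON example above.

```python
import json
from datetime import datetime
from pathlib import Path


def save_review_state(round_num, last_score, last_verdict, pending_experiments,
                      status="in_progress", path="REVIEW_STATE.json", now=None):
    """Overwrite the state file with the latest round's state and return it."""
    now = now or datetime.now()
    state = {
        "round": round_num,
        "status": status,
        "last_score": last_score,
        "last_verdict": last_verdict,
        "pending_experiments": list(pending_experiments),
        # Second-resolution ISO timestamp, e.g. 2026-03-13T21:00:00
        "timestamp": now.replace(microsecond=0).isoformat(),
    }
    # write_text truncates the file, so only the latest state is kept.
    Path(path).write_text(json.dumps(state, indent=2) + "\n", encoding="utf-8")
    return state
```

Calling it with status="completed" at the end of the loop marks the run as finished.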

Workflow

Initialization

  1. Check for REVIEW_STATE.json in project root:
    • If it does not exist: fresh start (normal case)
    • If it exists AND status is "completed": fresh start (previous loop finished normally)
    • If it exists AND status is "in_progress" AND timestamp is older than 24 hours: fresh start (stale state from a killed/abandoned run; delete the file and start over)
    • If it exists AND status is "in_progress" AND timestamp is within 24 hours: resume
      • Read the state file to recover round, last_score, pending_experiments
      • Read AUTO_REVIEW.md to restore full context of prior rounds
      • If pending_experiments is non-empty, check if they have completed (e.g., check screen sessions)
      • Resume from the next round (round = saved round + 1)
      • Log: "Recovered from context compaction. Resuming at Round N."
  2. Read project narrative documents, memory files, and any prior review documents
  3. Read recent experiment results (check output directories, logs)
  4. Identify current weaknesses and open TODOs from prior reviews
  5. Initialize round counter = 1 (unless recovered from state file)
  6. Create/update AUTO_REVIEW.md with header and timestamp
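The fresh-start versus resume decision in step 1 can be sketched as a small pure function. The name decide_start is hypothetical, and treating any status other than "in_progress" as a fresh start is an assumption. The caller is expected to load REVIEW_STATE.json itself and pass None when the file is missing.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=24)


def decide_start(state, now):
    """Return ("fresh", 1) or ("resume", next_round).

    `state` is the parsed REVIEW_STATE.json content, or None if the file is missing.
    """
    if state is None or state.get("status") != "in_progress":
        return ("fresh", 1)
    saved_at = datetime.fromisoformat(state["timestamp"])
    if now - saved_at > STALE_AFTER:
        # Stale state from a killed or abandoned run: caller deletes the file.
        return ("fresh", 1)
    # Recent in-progress state: continue with the round after the saved one.
    return ("resume", state["round"] + 1)
```

Keeping the clock as a parameter (now) makes the 24-hour rule easy to check without waiting a day.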

Loop (repeat up to MAX_ROUNDS)

Phase A: Review

Send comprehensive context to the external reviewer.
Check MCP availability first, then use appropriate method:
If MCP available (Primary):
Use mcp__minimax-chat__minimax_chat tool with:
- system: "You are a senior machine learning researcher serving as a reviewer for top-tier conferences like NeurIPS, ICML, and ICLR. Provide rigorous, constructive feedback."
- prompt: [Full review prompt with context]
- model: "MiniMax-M2.5"
If MCP NOT available (Fallback):
bash
curl -s "https://api.minimax.chat/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MINIMAX_API_KEY" \
  -d '{
    "model": "MiniMax-M2.5",
    "messages": [
      {
        "role": "system",
        "content": "You are a senior machine learning researcher serving as a reviewer for top-tier conferences like NeurIPS, ICML, and ICLR. Provide rigorous, constructive feedback."
      },
      {
        "role": "user",
        "content": "[Round N/MAX_ROUNDS of autonomous review loop]\n\n[Full research context: claims, methods, results, known weaknesses]\n[Changes since last round, if any]\n[For round 2+: Summary of previous review feedback and what was addressed]\n\nPlease act as a senior ML reviewer (NeurIPS/ICML level).\n\n1. Score this work 1-10 for a top venue\n2. List remaining critical weaknesses (ranked by severity)\n3. For each weakness, specify the MINIMUM fix (experiment, analysis, or reframing)\n4. State clearly: is this READY for submission? Yes/No/Almost\n\nBe brutally honest. If the work is ready, say so clearly."
      }
    ],
    "max_tokens": 4096
  }'
Note: Each round is a standalone API call. For round 2+, include the summary of previous reviews and changes in the prompt itself.
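When the review prompt contains real newlines and quotes, hand-escaping it inside a curl -d '...' string is error-prone. As a hedged alternative to the curl fallback, the same request can be built with Python's standard library, letting json.dumps handle the escaping. The function name build_review_request is hypothetical. The endpoint, headers, model, and max_tokens are copied from the curl command above. Sending the request and parsing the response are left out, since the response format is not specified here.

```python
import json
import urllib.request

API_URL = "https://api.minimax.chat/v1/chat/completions"
SYSTEM_PROMPT = (
    "You are a senior machine learning researcher serving as a reviewer for "
    "top-tier conferences like NeurIPS, ICML, and ICLR. Provide rigorous, "
    "constructive feedback."
)


def build_review_request(prompt, api_key, model="MiniMax-M2.5", max_tokens=4096):
    """Build (but do not send) the POST request for one standalone review round."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        # json.dumps escapes newlines and quotes inside the prompt.
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen when the call should actually be made.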

Phase B: Parse Assessment

CRITICAL: Save the FULL raw response from the external reviewer verbatim (store in a variable for Phase E). Do NOT discard or summarize — the raw text is the primary record.
Then extract structured fields:
  • Score (numeric 1-10)
  • Verdict ("ready" / "almost" / "not ready")
  • Action items (ranked list of fixes)
STOP CONDITION: If score >= 6 AND verdict is "ready" or "almost" (a "not ready" verdict never counts, even though it contains the word "ready") → stop loop, document final state.
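A minimal sketch of the extraction and the stop check follows. Both function names are hypothetical, and the "N/10" regular expression is only a heuristic for free-text reviews, not a guaranteed parser. The check follows the Phase B stop condition (score and verdict both required) and explicitly rejects "not ready", which a naive substring test for "ready" would accept.

```python
import re


def extract_score(text):
    """Return the first 'N/10' value found in the review text, or None."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*/\s*10", text)
    return float(match.group(1)) if match else None


def should_stop(score, verdict, threshold=6.0):
    """True only when score >= threshold AND the verdict is ready/almost."""
    v = verdict.strip().lower()
    # "not ready" contains "ready", so rule it out before the positive check.
    positive_verdict = "not ready" not in v and ("ready" in v or "almost" in v)
    return score is not None and score >= threshold and positive_verdict
```

A missing score (None) never stops the loop, so an unparseable review leads to another round rather than a false success.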

Phase C: Implement Fixes (if not stopping)

For each action item (highest priority first):
  1. Code changes: Write/modify experiment scripts, model code, analysis scripts
  2. Run experiments: Deploy to GPU server via SSH + screen/tmux
  3. Analysis: Run evaluation, collect results, update figures/tables
  4. Documentation: Update project notes and review document
Prioritization rules:
  • Skip fixes requiring excessive compute (flag for manual follow-up)
  • Skip fixes requiring external data/models not available
  • Prefer reframing/analysis over new experiments when both address the concern
  • Always implement metric additions (cheap, high impact)
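The prioritization rules can be sketched as a filter over the reviewer's action items. Everything here is illustrative: the ActionItem record, its fields, and plan_fixes are hypothetical names, and deciding whether a fix needs "excessive" compute remains a judgment call made before this step. The rule about preferring reframing over new experiments is not encoded.

```python
from dataclasses import dataclass


@dataclass
class ActionItem:
    # Hypothetical record for one reviewer-requested fix.
    title: str
    rank: int  # 1 = most severe, as ranked by the reviewer
    kind: str  # "metric", "reframing", "analysis", or "experiment"
    excessive_compute: bool = False
    needs_unavailable_resource: bool = False


def plan_fixes(items):
    """Split items into (to_implement, skipped), highest priority first."""
    to_implement, skipped = [], []
    for item in sorted(items, key=lambda i: i.rank):
        if item.kind == "metric":
            # Metric additions are always implemented (cheap, high impact).
            to_implement.append(item)
        elif item.excessive_compute or item.needs_unavailable_resource:
            # Skipped; compute-heavy items should be flagged for manual follow-up.
            skipped.append(item)
        else:
            to_implement.append(item)
    return to_implement, skipped
```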

Phase D: Wait for Results

If experiments were launched:
  • Monitor remote sessions for completion
  • Collect results from output files and logs

Phase E: Document Round

Append to AUTO_REVIEW.md:
markdown
## Round N (timestamp)

### Assessment (Summary)
- Score: X/10
- Verdict: [ready/almost/not ready]
- Key criticisms: [bullet list]

### Reviewer Raw Response

<details> <summary>Click to expand full reviewer response</summary>

[Paste the COMPLETE raw response from the external reviewer here, verbatim and unedited. This is the authoritative record. Do NOT truncate or paraphrase.]

</details>

### Actions Taken
- [what was implemented/changed]

### Results
- [experiment outcomes, if any]

### Status
- [continuing to round N+1 / stopping]

Write REVIEW_STATE.json with current round, score, verdict, and any pending experiments.

Increment round counter → back to Phase A.
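Rendering and appending a round entry could be sketched as below. The function names are hypothetical, and the heading levels (## for the round, ### for its sections) are an assumption about the template. The raw reviewer response is embedded without modification, and the file is opened in append mode so the log stays cumulative.

```python
def render_round_entry(round_num, timestamp, score, verdict, criticisms,
                       raw_response, actions, results, status):
    """Return one round's markdown; raw_response is embedded unmodified."""
    def bullets(items):
        return "\n".join(f"- {item}" for item in items) or "- (none)"

    parts = [
        f"## Round {round_num} ({timestamp})",
        "### Assessment (Summary)",
        f"- Score: {score}/10\n- Verdict: {verdict}\n- Key criticisms:\n"
        + ("\n".join(f"  - {c}" for c in criticisms) or "  - (none)"),
        "### Reviewer Raw Response",
        "<details> <summary>Click to expand full reviewer response</summary>",
        raw_response,
        "</details>",
        "### Actions Taken",
        bullets(actions),
        "### Results",
        bullets(results),
        "### Status",
        f"- {status}",
    ]
    return "\n\n".join(parts) + "\n"


def append_round(path, entry):
    """Append (never overwrite) so AUTO_REVIEW.md stays a cumulative log."""
    with open(path, "a", encoding="utf-8") as log:
        log.write("\n" + entry)
```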

Termination

When loop ends (positive assessment or max rounds):
  1. Update REVIEW_STATE.json with "status": "completed"
  2. Write final summary to AUTO_REVIEW.md
  3. Update project notes with conclusions
  4. If stopped at max rounds without positive assessment:
    • List remaining blockers
    • Estimate effort needed for each
    • Suggest whether to continue manually or pivot
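The overall control flow, from Phase A through termination, can be summarized in a short driver sketch. The callables review and implement_fixes are hypothetical stand-ins for Phases A/B and C/D. Documentation and state persistence (Phase E) are omitted for brevity.

```python
def run_review_loop(review, implement_fixes, max_rounds=4, start_round=1, threshold=6.0):
    """Run review -> fix rounds; return (history, positive_assessment)."""
    history = []
    for round_num in range(start_round, max_rounds + 1):
        # Phases A and B: get a parsed (score, verdict, action_items) triple.
        score, verdict, action_items = review(round_num, history)
        history.append((round_num, score, verdict))
        v = verdict.lower()
        if score >= threshold and "not ready" not in v and ("ready" in v or "almost" in v):
            return history, True  # positive assessment: stop early
        # Phases C and D: implement fixes before the next review.
        implement_fixes(action_items)
    return history, False  # hit max_rounds without a positive assessment
```

A False second value is the signal to list the remaining blockers and estimate the effort for each, as described above.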

Key Rules

  • Large file handling: If the Write tool fails due to file size, immediately retry using Bash heredocs to write in chunks (cat << 'EOF' > file for the first chunk, cat << 'EOF' >> file to append the rest). Do NOT ask the user for permission; just do it silently.
  • Be honest — include negative results and failed experiments
  • Do NOT hide weaknesses to game a positive score
  • Implement fixes BEFORE re-reviewing (don't just promise to fix)
  • If an experiment takes > 30 minutes, launch it and continue with other fixes while waiting
  • Document EVERYTHING — the review log should be self-contained
  • Update project notes after each round, not just at the end
  • For round 2+, always include previous review context in the prompt
  • Prefer MCP tool over curl when available (more reliable)

Prompt Template for Round 2+

MCP Method (Primary):
mcp__minimax-chat__minimax_chat:
  model: "MiniMax-M2.5"
  system: "You are a senior machine learning researcher serving as a reviewer for top-tier conferences like NeurIPS, ICML, and ICLR. Provide rigorous, constructive feedback."
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]

    ## Previous Review Summary (Round N-1)
    - Previous Score: X/10
    - Previous Verdict: [ready/almost/not ready]
    - Previous Key Weaknesses: [list]

    ## Changes Since Last Review
    1. [Action 1]: [result]
    2. [Action 2]: [result]
    3. [Action 3]: [result]

    ## Updated Results
    [paste updated metrics/tables]

    ## Current Research Context
    [brief summary of claims, methods, current state]

    Please re-score and re-assess:
    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix
    4. State clearly: is this READY for submission? Yes/No/Almost

    Be brutally honest. If the work is ready, say so clearly.
curl Fallback:
bash
curl -s "https://api.minimax.chat/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MINIMAX_API_KEY" \
  -d '{
    "model": "MiniMax-M2.5",
    "messages": [
      {
        "role": "system",
        "content": "You are a senior machine learning researcher serving as a reviewer for top-tier conferences like NeurIPS, ICML, and ICLR. Provide rigorous, constructive feedback."
      },
      {
        "role": "user",
        "content": "[Round N/MAX_ROUNDS of autonomous review loop]\n\n## Previous Review Summary (Round N-1)\n- Previous Score: X/10\n- Previous Verdict: [ready/almost/not ready]\n- Previous Key Weaknesses: [list]\n\n## Changes Since Last Review\n1. [Action 1]: [result]\n2. [Action 2]: [result]\n3. [Action 3]: [result]\n\n## Updated Results\n[paste updated metrics/tables]\n\n## Current Research Context\n[brief summary of claims, methods, current state]\n\nPlease re-score and re-assess:\n1. Score this work 1-10 for a top venue\n2. List remaining critical weaknesses (ranked by severity)\n3. For each weakness, specify the MINIMUM fix\n4. State clearly: is this READY for submission? Yes/No/Almost\n\nBe brutally honest. If the work is ready, say so clearly."
      }
    ],
    "max_tokens": 4096
  }'
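Filling the template programmatically avoids hand-editing the escaped \n sequences in the curl body. The sketch below is one possible way to do it. The function name build_followup_prompt and its parameters are hypothetical. The wording of each line is taken from the template above.

```python
def build_followup_prompt(round_num, max_rounds, prev_score, prev_verdict,
                          prev_weaknesses, changes, updated_results, context):
    """Fill the round 2+ template; `changes` is a list of (action, result) pairs."""
    lines = [
        f"[Round {round_num}/{max_rounds} of autonomous review loop]",
        "",
        f"## Previous Review Summary (Round {round_num - 1})",
        f"- Previous Score: {prev_score}/10",
        f"- Previous Verdict: {prev_verdict}",
        f"- Previous Key Weaknesses: {'; '.join(prev_weaknesses)}",
        "",
        "## Changes Since Last Review",
    ]
    # One numbered line per change, numbered from 1.
    lines += [f"{i}. {action}: {result}" for i, (action, result) in enumerate(changes, 1)]
    lines += [
        "",
        "## Updated Results",
        updated_results,
        "",
        "## Current Research Context",
        context,
        "",
        "Please re-score and re-assess:",
        "1. Score this work 1-10 for a top venue",
        "2. List remaining critical weaknesses (ranked by severity)",
        "3. For each weakness, specify the MINIMUM fix",
        "4. State clearly: is this READY for submission? Yes/No/Almost",
        "",
        "Be brutally honest. If the work is ready, say so clearly.",
    ]
    return "\n".join(lines)
```

The resulting string can be passed as the MCP tool's prompt argument, or serialized with a JSON encoder for the curl fallback.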