auto-review-loop-minimax
Auto Review Loop (MiniMax Version): Autonomous Research Improvement
Autonomously iterate: review → implement fixes → re-review, until the external reviewer gives a positive assessment or MAX_ROUNDS is reached.
Context: $ARGUMENTS
Constants
- MAX_ROUNDS = 4
- POSITIVE_THRESHOLD: score >= 6/10, or verdict contains "accept", "sufficient", "ready for submission"
- REVIEW_DOC: `AUTO_REVIEW.md` in project root (cumulative log)
- REVIEWER_MODEL = `MiniMax-M2.5` (model used via MiniMax API)
API Configuration
This skill uses MiniMax API for external review. Two methods are supported:
Method 1: MCP Tool (Primary)
If `mcp__minimax-chat__minimax_chat` is available, use it:

    mcp__minimax-chat__minimax_chat:
      prompt: |
        [Review prompt content]
      model: "MiniMax-M2.5"
      system: "You are a senior machine learning researcher..."

Method 2: curl (Fallback)
If MCP is not available, use curl directly:

    curl -s "https://api.minimax.chat/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $MINIMAX_API_KEY" \
      -d '{
        "model": "MiniMax-M2.5",
        "messages": [
          {"role": "system", "content": "You are a senior ML researcher..."},
          {"role": "user", "content": "[Review prompt]"}
        ],
        "max_tokens": 4096
      }'

API Key: Read from `~/.claude/settings.json` under `env.MINIMAX_API_KEY`, or from environment variable.
Why MiniMax instead of Codex MCP? Codex CLI uses OpenAI's Responses API (`/v1/responses`) which is not supported by third-party providers. See: https://github.com/openai/codex/discussions/7782
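The API key lookup described above could be sketched in Python as follows. This is a minimal sketch, not part of the skill: the function name `load_minimax_api_key` is hypothetical, and it assumes the settings file is a JSON object that nests the key as `{"env": {"MINIMAX_API_KEY": "..."}}`. The text does not say which source wins when both are set; this sketch checks the settings file first.

```python
import json
import os
from pathlib import Path
from typing import Optional


def load_minimax_api_key(settings_path: Optional[Path] = None) -> Optional[str]:
    """Return the MiniMax API key, or None if it cannot be found."""
    # Assumption: the key is stored as {"env": {"MINIMAX_API_KEY": "..."}}.
    path = settings_path or Path.home() / ".claude" / "settings.json"
    try:
        settings = json.loads(path.read_text(encoding="utf-8"))
        key = settings.get("env", {}).get("MINIMAX_API_KEY")
        if key:
            return key
    except (OSError, json.JSONDecodeError, AttributeError):
        pass  # missing, unreadable, or unexpectedly shaped file: fall through
    # Fall back to the environment variable used by the curl example.
    return os.environ.get("MINIMAX_API_KEY") or None
```

Returning `None` rather than raising leaves it to the caller to decide whether a missing key is fatal.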
State Persistence (Compact Recovery)
Long-running loops may hit the context window limit, triggering automatic compaction. To survive this, persist state to `REVIEW_STATE.json` after each round:

    {
      "round": 2,
      "status": "in_progress",
      "last_score": 5.0,
      "last_verdict": "not ready",
      "pending_experiments": ["screen_name_1"],
      "timestamp": "2026-03-13T21:00:00"
    }

Write this file at the end of every Phase E (after documenting the round). Overwrite each time; only the latest state matters.
On completion (positive assessment or max rounds), set `"status": "completed"` so future invocations don't accidentally resume a finished loop.
Workflow
Initialization
- Check for `REVIEW_STATE.json` in project root:
  - If it does not exist: fresh start (normal case)
  - If it exists AND `status` is `"completed"`: fresh start (previous loop finished normally)
  - If it exists AND `status` is `"in_progress"` AND `timestamp` is older than 24 hours: fresh start (stale state from a killed/abandoned run: delete the file and start over)
  - If it exists AND `status` is `"in_progress"` AND `timestamp` is within 24 hours: resume
    - Read the state file to recover `round`, `last_score`, `pending_experiments`
    - Read `AUTO_REVIEW.md` to restore full context of prior rounds
    - If `pending_experiments` is non-empty, check if they have completed (e.g., check screen sessions)
    - Resume from the next round (round = saved round + 1)
    - Log: "Recovered from context compaction. Resuming at Round N."
- Read project narrative documents, memory files, and any prior review documents
- Read recent experiment results (check output directories, logs)
- Identify current weaknesses and open TODOs from prior reviews
- Initialize round counter = 1 (unless recovered from state file)
- Create/update `AUTO_REVIEW.md` with header and timestamp
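The fresh-start versus resume decision in the first step can be sketched as a small function. This is an illustrative sketch under stated assumptions, not the skill's implementation: the name `decide_start` is hypothetical, the timestamp is assumed to be a naive ISO 8601 string as in the state file example, and any status other than `"in_progress"` is treated as a finished loop.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path

STALE_AFTER = timedelta(hours=24)


def decide_start(state_path, now=None):
    """Return ("fresh", 1) or ("resume", next_round) for the given state file."""
    path = Path(state_path)
    now = now or datetime.now()
    if not path.exists():
        return ("fresh", 1)  # normal case: no previous state
    state = json.loads(path.read_text(encoding="utf-8"))
    if state.get("status") != "in_progress":
        return ("fresh", 1)  # previous loop finished normally
    saved_at = datetime.fromisoformat(state["timestamp"])
    if now - saved_at > STALE_AFTER:
        path.unlink()  # stale state from a killed/abandoned run
        return ("fresh", 1)
    return ("resume", state["round"] + 1)  # round = saved round + 1
```

On `"resume"`, the caller would still need to reread `AUTO_REVIEW.md` and check `pending_experiments`, as the list above describes.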
Loop (repeat up to MAX_ROUNDS)
Phase A: Review
Send comprehensive context to the external reviewer.
Check MCP availability first, then use the appropriate method:
If MCP available (Primary):

    Use mcp__minimax-chat__minimax_chat tool with:
    - system: "You are a senior machine learning researcher serving as a reviewer for top-tier conferences like NeurIPS, ICML, and ICLR. Provide rigorous, constructive feedback."
    - prompt: [Full review prompt with context]
    - model: "MiniMax-M2.5"

If MCP NOT available (Fallback):

    curl -s "https://api.minimax.chat/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $MINIMAX_API_KEY" \
      -d '{
        "model": "MiniMax-M2.5",
        "messages": [
          {
            "role": "system",
            "content": "You are a senior machine learning researcher serving as a reviewer for top-tier conferences like NeurIPS, ICML, and ICLR. Provide rigorous, constructive feedback."
          },
          {
            "role": "user",
            "content": "[Round N/MAX_ROUNDS of autonomous review loop]\n\n[Full research context: claims, methods, results, known weaknesses]\n[Changes since last round, if any]\n[For round 2+: Summary of previous review feedback and what was addressed]\n\nPlease act as a senior ML reviewer (NeurIPS/ICML level).\n\n1. Score this work 1-10 for a top venue\n2. List remaining critical weaknesses (ranked by severity)\n3. For each weakness, specify the MINIMUM fix (experiment, analysis, or reframing)\n4. State clearly: is this READY for submission? Yes/No/Almost\n\nBe brutally honest. If the work is ready, say so clearly."
          }
        ],
        "max_tokens": 4096
      }'

Note: Each round is a standalone API call. For round 2+, include the summary of previous reviews and changes in the prompt itself.
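Because the real review prompt contains quotes and newlines, hand-escaping it inside a shell string is error-prone. One option is to build the request body programmatically and let a JSON serializer do the escaping. The sketch below is illustrative only: `build_review_payload` and its parameters are hypothetical names, while the model, system prompt, and `max_tokens` value are copied from the curl example.

```python
import json

REVIEWER_MODEL = "MiniMax-M2.5"
SYSTEM_PROMPT = (
    "You are a senior machine learning researcher serving as a reviewer for "
    "top-tier conferences like NeurIPS, ICML, and ICLR. "
    "Provide rigorous, constructive feedback."
)
REVIEW_REQUEST = (
    "Please act as a senior ML reviewer (NeurIPS/ICML level).\n\n"
    "1. Score this work 1-10 for a top venue\n"
    "2. List remaining critical weaknesses (ranked by severity)\n"
    "3. For each weakness, specify the MINIMUM fix "
    "(experiment, analysis, or reframing)\n"
    "4. State clearly: is this READY for submission? Yes/No/Almost\n\n"
    "Be brutally honest. If the work is ready, say so clearly."
)


def build_review_payload(round_num, max_rounds, context, changes=""):
    """Build the JSON body for one standalone review call."""
    parts = [f"[Round {round_num}/{max_rounds} of autonomous review loop]", context]
    if changes:
        parts.append(changes)  # round 2+: prior feedback and what was addressed
    parts.append(REVIEW_REQUEST)
    return {
        "model": REVIEWER_MODEL,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "\n\n".join(parts)},
        ],
        "max_tokens": 4096,
    }
```

The result of `json.dumps(build_review_payload(...))` can be written to a file and sent with `curl -d @payload.json`, which avoids shell quoting entirely.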
Phase B: Parse Assessment
CRITICAL: Save the FULL raw response from the external reviewer verbatim (store in a variable for Phase E). Do NOT discard or summarize — the raw text is the primary record.
Then extract structured fields:
- Score (numeric 1-10)
- Verdict ("ready" / "almost" / "not ready")
- Action items (ranked list of fixes)
STOP CONDITION: If score >= 6 AND verdict contains "ready" or "almost" → stop loop, document final state.
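Extracting the structured fields and applying the stop condition might look like the sketch below. It is a rough heuristic under stated assumptions, not a robust parser: the function names are hypothetical, the score is assumed to appear as `N/10` (or after the word "score"), and the verdict is guessed from keywords in the free text. The raw response should still be kept verbatim alongside whatever this returns.

```python
import re


def parse_assessment(raw_response):
    """Extract (score, verdict) from the reviewer's raw text; either may be None."""
    # Assumption: the score is written like "7/10" or "Score: 7".
    match = re.search(r"(\d+(?:\.\d+)?)\s*/\s*10", raw_response)
    if match is None:
        match = re.search(
            r"score\D{0,20}?(\d+(?:\.\d+)?)", raw_response, re.IGNORECASE
        )
    score = float(match.group(1)) if match else None

    text = raw_response.lower()
    # Check "not ready" before "ready", since the former contains the latter.
    if "not ready" in text:
        verdict = "not ready"
    elif "almost" in text:
        verdict = "almost"
    elif "ready" in text:
        verdict = "ready"
    else:
        verdict = None
    return score, verdict


def should_stop(score, verdict):
    """Phase B stop condition: score >= 6 AND verdict is "ready" or "almost"."""
    return score is not None and score >= 6 and verdict in ("ready", "almost")
```

For example, `parse_assessment("Score: 5/10. Verdict: not ready.")` yields a score of 5.0 with verdict `"not ready"`, so `should_stop` is false and the loop continues to Phase C.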
Phase C: Implement Fixes (if not stopping)
For each action item (highest priority first):
- Code changes: Write/modify experiment scripts, model code, analysis scripts
- Run experiments: Deploy to GPU server via SSH + screen/tmux
- Analysis: Run evaluation, collect results, update figures/tables
- Documentation: Update project notes and review document
Prioritization rules:
- Skip fixes requiring excessive compute (flag for manual follow-up)
- Skip fixes requiring external data/models not available
- Prefer reframing/analysis over new experiments when both address the concern
- Always implement metric additions (cheap, high impact)
Phase D: Wait for Results
If experiments were launched:
- Monitor remote sessions for completion
- Collect results from output files and logs
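If experiments run in GNU Screen sessions named after the entries in `pending_experiments`, completion can be checked by looking for those names in the output of `screen -ls`. The helper below is a hypothetical sketch that takes the command output as text, so it can be exercised without a server. It assumes each session line has the form `<pid>.<name>` followed by status fields, and that a session disappears when its experiment finishes.

```python
import re


def still_running(screen_ls_output, pending_experiments):
    """Return the pending session names that still appear in `screen -ls` output."""
    # Assumption: session lines look like "<tab>12345.name<tab>(Detached)".
    active = set(
        re.findall(r"^[ \t]*\d+\.(\S+)", screen_ls_output, re.MULTILINE)
    )
    return [name for name in pending_experiments if name in active]
```

On a remote host the text could come from something like `ssh <host> screen -ls`. The command may exit non-zero when no sessions exist, so the exit status alone should not be treated as a failure.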
Phase E: Document Round
Append to `AUTO_REVIEW.md`:

    ## Round N (timestamp)

    ### Assessment (Summary)
    - Score: X/10
    - Verdict: [ready/almost/not ready]
    - Key criticisms: [bullet list]

    ### Reviewer Raw Response

    <details>
    <summary>Click to expand full reviewer response</summary>

    [Paste the COMPLETE raw response from the external reviewer here, verbatim and unedited.
    This is the authoritative record. Do NOT truncate or paraphrase.]

    </details>

    ### Actions Taken
    - [what was implemented/changed]

    ### Results
    - [experiment outcomes, if any]

    ### Status
    - [continuing to round N+1 / stopping]

**Write `REVIEW_STATE.json`** with current round, score, verdict, and any pending experiments.
Increment round counter → back to Phase A.
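Writing the state file at the end of each round could be done as in the sketch below. The field names follow the `REVIEW_STATE.json` example; the function name `write_review_state` is hypothetical, and the write-to-temp-then-rename step is an added precaution of this sketch (not something the skill specifies) so that an interrupted run does not leave a half-written file.

```python
import json
import os
import tempfile
from datetime import datetime
from pathlib import Path


def write_review_state(project_root, round_num, status, last_score,
                       last_verdict, pending_experiments):
    """Overwrite REVIEW_STATE.json with the latest loop state and return it."""
    state = {
        "round": round_num,
        "status": status,  # "in_progress" or "completed"
        "last_score": last_score,
        "last_verdict": last_verdict,
        "pending_experiments": list(pending_experiments),
        "timestamp": datetime.now().isoformat(timespec="seconds"),
    }
    target = Path(project_root) / "REVIEW_STATE.json"
    # Write to a temp file in the same directory, then rename it over the
    # target, so readers never see a partially written state file.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, target)
    return state
```

Calling it again with `status="completed"` when the loop ends marks the run as finished, so a later invocation starts fresh.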
Termination
When loop ends (positive assessment or max rounds):
- Update `REVIEW_STATE.json` with `"status": "completed"`
- Write final summary to `AUTO_REVIEW.md`
- Update project notes with conclusions
- If stopped at max rounds without positive assessment:
  - List remaining blockers
  - Estimate effort needed for each
  - Suggest whether to continue manually or pivot
Key Rules
- Large file handling: If the Write tool fails due to file size, immediately retry using Bash (`cat << 'EOF' > file`) to write in chunks. Do NOT ask the user for permission; just do it silently.
- Be honest: include negative results and failed experiments
- Do NOT hide weaknesses to game a positive score
- Implement fixes BEFORE re-reviewing (don't just promise to fix)
- If an experiment takes > 30 minutes, launch it and continue with other fixes while waiting
- Document EVERYTHING: the review log should be self-contained
- Update project notes after each round, not just at the end
- For round 2+, always include previous review context in the prompt
- Prefer MCP tool over curl when available (more reliable)
Prompt Template for Round 2+
MCP Method (Primary):

    mcp__minimax-chat__minimax_chat:
      model: "MiniMax-M2.5"
      system: "You are a senior machine learning researcher serving as a reviewer for top-tier conferences like NeurIPS, ICML, and ICLR. Provide rigorous, constructive feedback."
      prompt: |
        [Round N/MAX_ROUNDS of autonomous review loop]

        ## Previous Review Summary (Round N-1)
        - Previous Score: X/10
        - Previous Verdict: [ready/almost/not ready]
        - Previous Key Weaknesses: [list]

        ## Changes Since Last Review
        1. [Action 1]: [result]
        2. [Action 2]: [result]
        3. [Action 3]: [result]

        ## Updated Results
        [paste updated metrics/tables]

        ## Current Research Context
        [brief summary of claims, methods, current state]

        Please re-score and re-assess:
        1. Score this work 1-10 for a top venue
        2. List remaining critical weaknesses (ranked by severity)
        3. For each weakness, specify the MINIMUM fix
        4. State clearly: is this READY for submission? Yes/No/Almost

        Be brutally honest. If the work is ready, say so clearly.

curl Fallback:

    curl -s "https://api.minimax.chat/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $MINIMAX_API_KEY" \
      -d '{
        "model": "MiniMax-M2.5",
        "messages": [
          {
            "role": "system",
            "content": "You are a senior machine learning researcher serving as a reviewer for top-tier conferences like NeurIPS, ICML, and ICLR. Provide rigorous, constructive feedback."
          },
          {
            "role": "user",
            "content": "[Round N/MAX_ROUNDS of autonomous review loop]\n\n## Previous Review Summary (Round N-1)\n- Previous Score: X/10\n- Previous Verdict: [ready/almost/not ready]\n- Previous Key Weaknesses: [list]\n\n## Changes Since Last Review\n1. [Action 1]: [result]\n2. [Action 2]: [result]\n3. [Action 3]: [result]\n\n## Updated Results\n[paste updated metrics/tables]\n\n## Current Research Context\n[brief summary of claims, methods, current state]\n\nPlease re-score and re-assess:\n1. Score this work 1-10 for a top venue\n2. List remaining critical weaknesses (ranked by severity)\n3. For each weakness, specify the MINIMUM fix\n4. State clearly: is this READY for submission? Yes/No/Almost\n\nBe brutally honest. If the work is ready, say so clearly."
          }
        ],
        "max_tokens": 4096
      }'
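Filling the round 2+ template from the previous round's record can be automated. The sketch below is one possible way to do it; `build_followup_prompt` and its parameter names are hypothetical, and `changes` is assumed to be a list of `(action, result)` pairs. The section headings and closing instructions are taken from the template above.

```python
def build_followup_prompt(round_num, max_rounds, prev_score, prev_verdict,
                          prev_weaknesses, changes, updated_results, context):
    """Fill the round 2+ template; `changes` is a list of (action, result) pairs."""
    change_lines = "\n".join(
        f"{i}. {action}: {result}"
        for i, (action, result) in enumerate(changes, start=1)
    )
    return "\n".join([
        f"[Round {round_num}/{max_rounds} of autonomous review loop]",
        "",
        f"## Previous Review Summary (Round {round_num - 1})",
        f"- Previous Score: {prev_score}/10",
        f"- Previous Verdict: {prev_verdict}",
        f"- Previous Key Weaknesses: {'; '.join(prev_weaknesses)}",
        "",
        "## Changes Since Last Review",
        change_lines,
        "",
        "## Updated Results",
        updated_results,
        "",
        "## Current Research Context",
        context,
        "",
        "Please re-score and re-assess:",
        "1. Score this work 1-10 for a top venue",
        "2. List remaining critical weaknesses (ranked by severity)",
        "3. For each weakness, specify the MINIMUM fix",
        "4. State clearly: is this READY for submission? Yes/No/Almost",
        "",
        "Be brutally honest. If the work is ready, say so clearly.",
    ])
```

The returned string can be passed as the `prompt` argument of the MCP tool or placed in the user message of the curl request body.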