auto-review-loop-llm

Auto Review Loop (Generic LLM): Autonomous Research Improvement

Autonomously iterate: review → implement fixes → re-review, until the external reviewer gives a positive assessment or MAX_ROUNDS is reached.

Context: $ARGUMENTS

Constants

  • MAX_ROUNDS = 4
  • POSITIVE_THRESHOLD: score >= 6/10, or verdict contains "accept", "sufficient", "ready for submission"
  • REVIEW_DOC: AUTO_REVIEW.md in project root (cumulative log)
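
The constants above can be sketched in Python as follows. This is an illustrative reading, not part of the skill itself: the names `POSITIVE_SCORE` and `is_positive` are hypothetical, and the negation guard is an added assumption, since a plain substring test would treat "insufficient" or "not ready for submission" as positive.

```python
import re

MAX_ROUNDS = 4
POSITIVE_SCORE = 6.0          # "score >= 6/10"
REVIEW_DOC = "AUTO_REVIEW.md"  # cumulative log in the project root

# Phrases from POSITIVE_THRESHOLD, matched on word boundaries so that
# "insufficient" and "unacceptable" do not count as positive.
_PHRASE = r"(?:accept\w*|sufficient|ready for submission)"
_POSITIVE = re.compile(rf"\b{_PHRASE}\b", re.IGNORECASE)
# Assumption: an explicit "not"/"never" right before a phrase negates it.
_NEGATED = re.compile(rf"\b(?:not|never)\s+(?:yet\s+)?{_PHRASE}\b", re.IGNORECASE)


def is_positive(score, verdict):
    """Return True if either branch of POSITIVE_THRESHOLD is met."""
    if score is not None and score >= POSITIVE_SCORE:
        return True
    if _NEGATED.search(verdict):
        return False
    return bool(_POSITIVE.search(verdict))
```

Note that this follows the "or" wording of POSITIVE_THRESHOLD literally, so a score of 6 or more counts as positive regardless of the verdict text.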

LLM Configuration

This skill uses any OpenAI-compatible API for external review via the llm-chat MCP server.

Configuration via MCP Server (Recommended)

Add to ~/.claude/settings.json:

```json
{
  "mcpServers": {
    "llm-chat": {
      "command": "/usr/bin/python3",
      "args": ["/Users/yourname/.claude/mcp-servers/llm-chat/server.py"],
      "env": {
        "LLM_API_KEY": "your-api-key",
        "LLM_BASE_URL": "https://api.deepseek.com/v1",
        "LLM_MODEL": "deepseek-chat"
      }
    }
  }
}
```
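
Before starting a loop it can help to confirm that the entry above is present and complete. The following is a minimal sketch, assuming the settings file has exactly the shape shown; `missing_llm_chat_env` and `load_settings` are hypothetical helper names, not part of the skill or of any MCP tooling.

```python
import json
from pathlib import Path

# The three environment variables used in the configuration above.
REQUIRED_ENV = ("LLM_API_KEY", "LLM_BASE_URL", "LLM_MODEL")


def missing_llm_chat_env(settings):
    """Return the required env keys that are absent or empty for llm-chat."""
    server = settings.get("mcpServers", {}).get("llm-chat", {})
    env = server.get("env", {})
    return [key for key in REQUIRED_ENV if not env.get(key)]


def load_settings(path=None):
    """Read the settings file; defaults to ~/.claude/settings.json."""
    path = Path(path) if path else Path.home() / ".claude" / "settings.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

An empty list from `missing_llm_chat_env(load_settings())` suggests the server entry is complete; it does not verify that the key or URL actually works.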

Supported Providers

| Provider | LLM_BASE_URL | LLM_MODEL |
|---|---|---|
| OpenAI | https://api.openai.com/v1 | gpt-4o, o3 |
| DeepSeek | https://api.deepseek.com/v1 | deepseek-chat, deepseek-reasoner |
| MiniMax | https://api.minimax.chat/v1 | MiniMax-M2.5 |
| Kimi (Moonshot) | https://api.moonshot.cn/v1 | moonshot-v1-8k, moonshot-v1-32k |
| ZhiPu (GLM) | https://open.bigmodel.cn/api/paas/v4 | glm-4, glm-4-plus |
| SiliconFlow | https://api.siliconflow.cn/v1 | Qwen/Qwen2.5-72B-Instruct |
| Alibaba Cloud Bailian (阿里云百炼) | https://dashscope.aliyuncs.com/compatible-mode/v1 | qwen-max |
| 01.AI (零一万物) | https://api.lingyiwanwu.com/v1 | yi-large |

API Call Method

Primary: MCP Tool

```
mcp__llm-chat__chat:
  prompt: |
    [Review prompt content]
  model: "deepseek-chat"
  system: "You are a senior ML reviewer..."
```

Fallback: curl

```bash
# The JSON body is single-quoted, so the quotes are closed around
# ${LLM_MODEL} to let the shell expand it.
curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d '{
    "model": "'"${LLM_MODEL}"'",
    "messages": [
      {"role": "system", "content": "You are a senior ML reviewer..."},
      {"role": "user", "content": "[review prompt]"}
    ],
    "max_tokens": 4096
  }'
```
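
For long review prompts, quoting JSON inside a shell string gets fragile. The same fallback request can be sketched with Python's standard library, which handles the escaping. The function names `build_chat_request` and `send_chat_request` are hypothetical; the response path `choices[0].message.content` assumes the provider follows the OpenAI chat completions response shape.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, system, prompt, max_tokens=4096):
    """Build the POST request for an OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the reviewer's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes an OpenAI-style response body.
    return body["choices"][0]["message"]["content"]
```

In practice the three connection values would come from the `LLM_BASE_URL`, `LLM_API_KEY`, and `LLM_MODEL` environment variables.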

State Persistence (Compact Recovery)

Persist state to REVIEW_STATE.json after each round:

```json
{
  "round": 2,
  "status": "in_progress",
  "last_score": 5.0,
  "last_verdict": "not ready",
  "pending_experiments": [],
  "timestamp": "2026-03-15T10:00:00"
}
```

Write this file at the end of every Phase E (after documenting the round).
On completion, set "status": "completed".
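
A minimal sketch of reading and writing this state file is shown below. The helper names `save_state` and `load_state` are hypothetical, and the write-to-temp-then-rename step is an added assumption, chosen so that an interrupted write does not leave a half-written file behind.

```python
import json
import os
import tempfile
from datetime import datetime

STATE_FILE = "REVIEW_STATE.json"


def save_state(round_number, status, last_score, last_verdict,
               pending_experiments=(), path=STATE_FILE):
    """Write the state atomically: dump to a temp file, then rename over the target."""
    state = {
        "round": round_number,
        "status": status,
        "last_score": last_score,
        "last_verdict": last_verdict,
        "pending_experiments": list(pending_experiments),
        "timestamp": datetime.now().isoformat(timespec="seconds"),
    }
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_path, path)
    return state


def load_state(path=STATE_FILE):
    """Return the saved state, or None if no state file exists yet."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return None
```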

Workflow

Initialization

  1. Check REVIEW_STATE.json for recovery
  2. Read project context and prior reviews
  3. Initialize round counter
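
One way to turn a recovered state into a starting round is sketched below. It assumes that, because the state file is written at the end of Phase E, the saved round has already finished and the loop should resume at the next one. `starting_round` is a hypothetical helper; it takes the parsed contents of REVIEW_STATE.json, or `None` when the file does not exist.

```python
def starting_round(state, max_rounds=4):
    """Return the round to run next, or None if no further round should run."""
    if state is None:
        return 1  # fresh start: no state file was found
    if state.get("status") == "completed":
        return None  # a previous run already finished
    next_round = int(state.get("round", 0)) + 1
    return next_round if next_round <= max_rounds else None
```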

Loop (up to MAX_ROUNDS)

Phase A: Review

If MCP available:

```
mcp__llm-chat__chat:
  system: "You are a senior ML reviewer (NeurIPS/ICML level)."
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]

    [Full research context: claims, methods, results, known weaknesses]
    [Changes since last round, if any]

    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix
    4. State clearly: is this READY for submission? Yes/No/Almost

    Be brutally honest. If the work is ready, say so clearly.
```

If MCP NOT available:

```bash
# The JSON body is single-quoted, so the quotes are closed around
# ${LLM_MODEL} to let the shell expand it.
curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d '{
    "model": "'"${LLM_MODEL}"'",
    "messages": [
      {"role": "system", "content": "You are a senior ML reviewer (NeurIPS/ICML level)."},
      {"role": "user", "content": "[Full review prompt]"}
    ],
    "max_tokens": 4096
  }'
```

Phase B: Parse Assessment

CRITICAL: Save the FULL raw response verbatim. Then extract:
  • Score (numeric 1-10)
  • Verdict ("ready" / "almost" / "not ready")
  • Action items (ranked list of fixes)
STOP: If score >= 6 AND the verdict is "ready" or "almost". Match the whole verdict rather than a substring, because "not ready" also contains "ready".
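
The extraction step can be sketched as a small heuristic parser. This is an assumption-heavy illustration: it expects the reviewer to write the score as "X/10" and to answer the readiness question with Yes/No/Almost or one of the three verdict phrases, and it leaves action items to be read from the raw text. `parse_assessment` and `should_stop` are hypothetical names.

```python
import re

_SCORE = re.compile(r"(\d+(?:\.\d+)?)\s*/\s*10\b")
# An answer such as "READY for submission? No" on a single line.
_ANSWER = re.compile(r"\bready\b[^\n]{0,40}?\b(yes|no|almost)\b", re.IGNORECASE)
_ANSWER_TO_VERDICT = {"yes": "ready", "no": "not ready", "almost": "almost"}


def parse_assessment(raw):
    """Keep the raw response verbatim and extract score and verdict."""
    score_match = _SCORE.search(raw)
    score = float(score_match.group(1)) if score_match else None

    answer = _ANSWER.search(raw)
    lowered = raw.lower()
    if answer:
        verdict = _ANSWER_TO_VERDICT[answer.group(1).lower()]
    elif "not ready" in lowered:  # checked first: it contains "ready"
        verdict = "not ready"
    elif "almost" in lowered:
        verdict = "almost"
    elif "ready" in lowered:
        verdict = "ready"
    else:
        verdict = None
    return {"raw": raw, "score": score, "verdict": verdict}


def should_stop(assessment):
    """Stop rule from Phase B: score >= 6 AND verdict is ready or almost."""
    score = assessment["score"]
    return (score is not None and score >= 6
            and assessment["verdict"] in ("ready", "almost"))
```

Free-form reviewer text can defeat any pattern like this, which is one reason the full raw response is saved first.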

Phase C: Implement Fixes

Priority: metric additions > reframing > new experiments

Phase D: Wait for Results

Monitor remote experiments

Phase E: Document Round

Append to AUTO_REVIEW.md:

Round N (timestamp)

Assessment (Summary)

  • Score: X/10
  • Verdict: [ready/almost/not ready]
  • Key criticisms: [bullet list]

Reviewer Raw Response

<details> <summary>Click to expand full reviewer response</summary>
[Paste the COMPLETE raw response here — verbatim, unedited.]
</details>

Actions Taken

  • [what was implemented/changed]

Results

  • [experiment outcomes, if any]

Status

  • [continuing to round N+1 / stopping]

**Write `REVIEW_STATE.json`** with current state.
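
The round entry above can be generated with a short helper. In this sketch the heading levels (`##` for the round, `###` for its sections) are an assumption, since the template's original markup is not shown here, and `format_round_entry` and `append_round` are hypothetical names.

```python
from datetime import datetime


def _bullets(items):
    """Render a list as markdown bullets, or a placeholder when empty."""
    return "\n".join(f"- {item}" for item in items) or "- (none)"


def format_round_entry(round_number, score, verdict, criticisms, raw_response,
                       actions, results, status, timestamp=None):
    """Build one round's section for AUTO_REVIEW.md."""
    timestamp = timestamp or datetime.now().isoformat(timespec="seconds")
    return "\n".join([
        f"## Round {round_number} ({timestamp})",
        "",
        "### Assessment (Summary)",
        f"- Score: {score}/10",
        f"- Verdict: {verdict}",
        "- Key criticisms:",
        "\n".join(f"  - {item}" for item in criticisms) or "  - (none)",
        "",
        "### Reviewer Raw Response",
        "<details> <summary>Click to expand full reviewer response</summary>",
        "",
        raw_response,  # verbatim, unedited
        "",
        "</details>",
        "",
        "### Actions Taken",
        _bullets(actions),
        "",
        "### Results",
        _bullets(results),
        "",
        "### Status",
        f"- {status}",
        "",
    ]) + "\n"


def append_round(entry, path="AUTO_REVIEW.md"):
    """Append to the cumulative log; mode 'a' never overwrites earlier rounds."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(entry)
```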

Termination

  1. Set REVIEW_STATE.json status to "completed"
  2. Write final summary
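
Taken together, the phases and the termination step form a bounded loop. The skeleton below is a sketch under stated assumptions: the three callables are hypothetical stand-ins for Phases A-B, C-D, and E, the stop rule is the Phase B one, and the stopping round is still documented before the loop exits (the source does not say whether it should be).

```python
def run_review_loop(review, implement_fixes, document_round, max_rounds=4):
    """Run review -> fix -> document until a positive assessment or max_rounds.

    review(round_number, history) -> {"score": float | None, "verdict": str | None}
    implement_fixes(assessment)   -> list of actions taken
    document_round(round_number, assessment, actions) -> None
    """
    history = []
    for round_number in range(1, max_rounds + 1):
        assessment = review(round_number, history)            # Phases A and B
        score = assessment["score"]
        positive = (score is not None and score >= 6
                    and assessment["verdict"] in ("ready", "almost"))
        # Fixes are implemented before the next review, never after a stop.
        actions = [] if positive else implement_fixes(assessment)  # Phases C and D
        document_round(round_number, assessment, actions)     # Phase E
        history.append(assessment)
        if positive:
            break
    return history
```

After the loop returns, the caller would set the state file's status to "completed" and write the final summary.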

Key Rules

  • Large file handling: If the Write tool fails due to file size, immediately retry using Bash to write in chunks (cat << 'EOF' > file for the first chunk, then >> file to append the rest). Do NOT ask the user for permission; just do it silently.
  • Be honest about weaknesses
  • Implement fixes BEFORE re-reviewing
  • Document everything
  • Include previous context in round 2+ prompts
  • Prefer MCP tool over curl when available

Prompt Template for Round 2+

```
mcp__llm-chat__chat:
  system: "You are a senior ML reviewer (NeurIPS/ICML level)."
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]

    ## Previous Review Summary (Round N-1)
    - Previous Score: X/10
    - Previous Verdict: [ready/almost/not ready]
    - Previous Key Weaknesses: [list]

    ## Changes Since Last Review
    1. [Action 1]: [result]
    2. [Action 2]: [result]

    ## Updated Results
    [paste updated metrics/tables]

    Please re-score and re-assess:
    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix
    4. State clearly: is this READY for submission? Yes/No/Almost

    Be brutally honest. If the work is ready, say so clearly.
```
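
Filling this template by hand each round is error-prone, so it can be assembled from the previous round's data. The sketch below mirrors the template text; `build_followup_prompt` and its parameter names are hypothetical, and `changes` is assumed to be a list of (action, result) pairs.

```python
def build_followup_prompt(round_number, max_rounds, prev_score, prev_verdict,
                          prev_weaknesses, changes, updated_results):
    """Assemble the round 2+ review prompt from the previous round's data."""
    change_lines = [
        f"{index}. {action}: {result}"
        for index, (action, result) in enumerate(changes, start=1)
    ]
    lines = [
        f"[Round {round_number}/{max_rounds} of autonomous review loop]",
        "",
        f"## Previous Review Summary (Round {round_number - 1})",
        f"- Previous Score: {prev_score}/10",
        f"- Previous Verdict: {prev_verdict}",
        "- Previous Key Weaknesses: " + "; ".join(prev_weaknesses),
        "",
        "## Changes Since Last Review",
        *change_lines,
        "",
        "## Updated Results",
        updated_results,
        "",
        "Please re-score and re-assess:",
        "1. Score this work 1-10 for a top venue",
        "2. List remaining critical weaknesses (ranked by severity)",
        "3. For each weakness, specify the MINIMUM fix",
        "4. State clearly: is this READY for submission? Yes/No/Almost",
        "",
        "Be brutally honest. If the work is ready, say so clearly.",
    ]
    return "\n".join(lines)
```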