auto-review-loop-llm
Auto Review Loop (Generic LLM): Autonomous Research Improvement
Autonomously iterate: review → implement fixes → re-review, until the external reviewer gives a positive assessment or MAX_ROUNDS is reached.
Context: $ARGUMENTS
Constants
- MAX_ROUNDS = 4
- POSITIVE_THRESHOLD: score >= 6/10, or verdict contains "accept", "sufficient", "ready for submission"
- REVIEW_DOC: AUTO_REVIEW.md in project root (cumulative log)
LLM Configuration
This skill uses any OpenAI-compatible API for external review via the llm-chat MCP server.
Configuration via MCP Server (Recommended)
Add to ~/.claude/settings.json:
{
  "mcpServers": {
    "llm-chat": {
      "command": "/usr/bin/python3",
      "args": ["/Users/yourname/.claude/mcp-servers/llm-chat/server.py"],
      "env": {
        "LLM_API_KEY": "your-api-key",
        "LLM_BASE_URL": "https://api.deepseek.com/v1",
        "LLM_MODEL": "deepseek-chat"
      }
    }
  }
}
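Before starting a loop it can help to confirm that the entry above is actually present and complete. The following is a minimal sketch, not part of the skill: `check_llm_chat_config` and `load_settings` are hypothetical names, and treating all three `env` keys as required is an assumption based on the example configuration.

```python
import json
from pathlib import Path

# Assumption: the three variables shown in the example config are all required.
REQUIRED_ENV = ("LLM_API_KEY", "LLM_BASE_URL", "LLM_MODEL")


def check_llm_chat_config(settings):
    """Return a list of problems found in a parsed settings.json dict (empty if none)."""
    server = settings.get("mcpServers", {}).get("llm-chat")
    if server is None:
        return ["mcpServers.llm-chat is missing"]
    problems = []
    if not server.get("command"):
        problems.append("command is missing")
    env = server.get("env", {})
    for key in REQUIRED_ENV:
        if not env.get(key):
            problems.append(f"env.{key} is missing or empty")
    return problems


def load_settings(path=Path.home() / ".claude" / "settings.json"):
    """Read and parse the settings file from its default location."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Calling `check_llm_chat_config(load_settings())` returns an empty list when the entry looks complete, and a list of human-readable problems otherwise.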
Supported Providers
| Provider | LLM_BASE_URL | LLM_MODEL |
|---|---|---|
| OpenAI | | |
| DeepSeek | | |
| MiniMax | | |
| Kimi (Moonshot) | | |
| ZhiPu (GLM) | | |
| SiliconFlow | | |
| 阿里云百炼 | | |
| 零一万物 | | |
API Call Method
Primary: MCP Tool
mcp__llm-chat__chat:
  prompt: |
    [Review prompt content]
  model: "deepseek-chat"
  system: "You are a senior ML reviewer..."
Fallback: curl
# The JSON body is single-quoted, so ${LLM_MODEL} is spliced in as '"..."' to let the shell expand it.
curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d '{
    "model": "'"${LLM_MODEL}"'",
    "messages": [
      {"role": "system", "content": "You are a senior ML reviewer..."},
      {"role": "user", "content": "[review prompt]"}
    ],
    "max_tokens": 4096
  }'
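Where a script is preferable to curl, the same fallback request can be sketched in Python with only the standard library. This is an illustration rather than part of the skill: `build_review_request` and `send_review_request` are hypothetical helpers, and the response parsing assumes the provider returns the usual OpenAI-style body (`choices[0].message.content`).

```python
import json
import urllib.request


def build_review_request(prompt, system, model, base_url, api_key):
    """Build (but do not send) a chat-completions request mirroring the curl fallback."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 4096,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_review_request(request, timeout=120):
    """Send the request and return the reviewer's text.

    Assumes an OpenAI-style response body; other shapes will raise KeyError.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Building the payload with `json.dumps` sidesteps the shell-quoting problem entirely, since the model name and prompt are serialized as data rather than interpolated into a string. The values for `model`, `base_url`, and `api_key` would typically come from `LLM_MODEL`, `LLM_BASE_URL`, and `LLM_API_KEY`.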
State Persistence (Compact Recovery)
Persist state to REVIEW_STATE.json after each round:
{
  "round": 2,
  "status": "in_progress",
  "last_score": 5.0,
  "last_verdict": "not ready",
  "pending_experiments": [],
  "timestamp": "2026-03-15T10:00:00"
}
Write this file at the end of every Phase E (after documenting the round).
On completion, set "status": "completed".
Workflow
Initialization
- Check REVIEW_STATE.json for recovery
- Read project context and prior reviews
- Initialize round counter
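The recovery check and the state file from the State Persistence section can be sketched together as follows. The helper names are hypothetical, and two choices here are assumptions rather than anything the skill specifies: the write goes through a temporary file so an interrupted write cannot leave a half-written state, and a saved `in_progress` state for round N is taken to mean that round N finished, so the loop resumes at N+1.

```python
import json
import os
import tempfile
from datetime import datetime
from pathlib import Path

STATE_FILE = Path("REVIEW_STATE.json")


def save_state(round_number, status, last_score, last_verdict,
               pending_experiments=(), path=STATE_FILE):
    """Write the round state, replacing any previous file in one step."""
    state = {
        "round": round_number,
        "status": status,
        "last_score": last_score,
        "last_verdict": last_verdict,
        "pending_experiments": list(pending_experiments),
        "timestamp": datetime.now().isoformat(timespec="seconds"),
    }
    path = Path(path)
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, path)
    return state


def load_state(path=STATE_FILE):
    """Return the saved state, or None when there is nothing to recover."""
    path = Path(path)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def next_round(state):
    """Assumed policy: resume after the last recorded round, otherwise start at 1."""
    if state is None or state.get("status") == "completed":
        return 1
    return state["round"] + 1
```

At initialization, `next_round(load_state())` gives the round counter's starting value; at the end of each Phase E, `save_state(...)` records the round just documented.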
Loop (up to MAX_ROUNDS)
Phase A: Review
If MCP available:
mcp__llm-chat__chat:
  system: "You are a senior ML reviewer (NeurIPS/ICML level)."
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]
    [Full research context: claims, methods, results, known weaknesses]
    [Changes since last round, if any]
    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix
    4. State clearly: is this READY for submission? Yes/No/Almost
    Be brutally honest. If the work is ready, say so clearly.
If MCP NOT available:
# The JSON body is single-quoted, so ${LLM_MODEL} is spliced in as '"..."' to let the shell expand it.
curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d '{
    "model": "'"${LLM_MODEL}"'",
    "messages": [
      {"role": "system", "content": "You are a senior ML reviewer (NeurIPS/ICML level)."},
      {"role": "user", "content": "[Full review prompt]"}
    ],
    "max_tokens": 4096
  }'
Phase B: Parse Assessment
CRITICAL: Save the FULL raw response verbatim. Then extract:
- Score (numeric 1-10)
- Verdict ("ready" / "almost" / "not ready")
- Action items (ranked list of fixes)
STOP: If score >= 6 AND verdict contains "ready" or "almost", stop the loop.
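The extraction and stop check above might be sketched as below. Both function names are hypothetical, and the regular expressions are assumptions about how a reviewer tends to phrase its answer (a score written as "N/10", a verdict using the words "ready", "almost", or "not ready"); a real reviewer response may need looser or stricter patterns. Note that "not ready" also contains "ready", so the negative phrasing is checked first.

```python
import re

POSITIVE_SCORE = 6.0


def parse_assessment(raw_response):
    """Extract a score and verdict from free-form reviewer text.

    The full raw response is kept alongside the extracted fields.
    """
    score = None
    match = re.search(r"(\d+(?:\.\d+)?)\s*/\s*10", raw_response)
    if match:
        score = float(match.group(1))

    text = raw_response.lower()
    # Check the negative phrasing first: "not ready" also contains "ready".
    if re.search(r"\bnot\s+ready\b", text):
        verdict = "not ready"
    elif "almost" in text:
        verdict = "almost"
    elif "ready" in text:
        verdict = "ready"
    else:
        verdict = "not ready"  # assumption: no explicit verdict counts as not ready
    return {"score": score, "verdict": verdict, "raw": raw_response}


def should_stop(assessment):
    """Phase B stop rule: score >= 6 AND verdict is "ready" or "almost"."""
    score = assessment["score"]
    return (
        score is not None
        and score >= POSITIVE_SCORE
        and assessment["verdict"] in ("ready", "almost")
    )
```

A missing score is treated as "do not stop", which errs on the side of another review round rather than a premature exit.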
Phase C: Implement Fixes
Priority: metric additions > reframing > new experiments
Phase D: Wait for Results
Monitor remote experiments
Phase E: Document Round
Append to AUTO_REVIEW.md:
Round N (timestamp)
Assessment (Summary)
- Score: X/10
- Verdict: [ready/almost/not ready]
- Key criticisms: [bullet list]
Reviewer Raw Response
<details>
<summary>Click to expand full reviewer response</summary>
[Paste the COMPLETE raw response here: verbatim, unedited.]
</details>
Actions Taken
- [what was implemented/changed]
Results
- [experiment outcomes, if any]
Status
- [continuing to round N+1 / stopping]
**Write `REVIEW_STATE.json`** with current state.
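Rendering and appending one round entry could look like the sketch below. The function names are hypothetical, and the heading levels (`##` for the round, `###` for its subsections) are an assumption, since the template above does not show them.

```python
from datetime import datetime
from pathlib import Path

REVIEW_DOC = Path("AUTO_REVIEW.md")


def format_round_entry(round_number, score, verdict, criticisms,
                       raw_response, actions, results, status, timestamp=None):
    """Render one round as a markdown section following the template above."""
    timestamp = timestamp or datetime.now().isoformat(timespec="seconds")

    def bullets(items, indent=""):
        return "\n".join(f"{indent}- {item}" for item in items) or f"{indent}- (none)"

    return "\n".join([
        f"## Round {round_number} ({timestamp})",
        "",
        "### Assessment (Summary)",
        f"- Score: {score}/10",
        f"- Verdict: {verdict}",
        "- Key criticisms:",
        bullets(criticisms, indent="  "),
        "",
        "### Reviewer Raw Response",
        "",
        "<details>",
        "<summary>Click to expand full reviewer response</summary>",
        "",
        raw_response,  # inserted verbatim, never summarized or trimmed
        "",
        "</details>",
        "",
        "### Actions Taken",
        bullets(actions),
        "",
        "### Results",
        bullets(results),
        "",
        "### Status",
        f"- {status}",
        "",
    ])


def append_round(entry, path=REVIEW_DOC):
    """Append to the cumulative log; earlier rounds are never rewritten."""
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(entry + "\n")
```

Opening the file in append mode keeps the log cumulative, which matches REVIEW_DOC's role as a running record across rounds.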
Termination
- Set REVIEW_STATE.json status to "completed"
- Write final summary
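The phases and the termination step can be tied together in a small driver. This is only a sketch of the control flow: `run_review_loop` and its three callables are hypothetical hooks, not functions the skill provides. It also assumes that no fixes are applied after the final round, since there would be no re-review to check them.

```python
MAX_ROUNDS = 4


def run_review_loop(review, implement_fixes, document_round,
                    start_round=1, max_rounds=MAX_ROUNDS):
    """Drive review -> fix -> re-review until a positive verdict or max_rounds.

    Hypothetical hooks supplied by the caller:
      review(round_number, previous) -> {"score": float or None, "verdict": str}
      implement_fixes(assessment)    -> list of actions taken
      document_round(round_number, assessment, actions, status) -> None
    """
    previous = None
    for round_number in range(start_round, max_rounds + 1):
        # Phase A + B: get the review and its parsed score/verdict.
        assessment = review(round_number, previous)
        positive = (
            assessment["score"] is not None
            and assessment["score"] >= 6
            and assessment["verdict"] in ("ready", "almost")
        )
        if positive or round_number == max_rounds:
            # Termination: record the last round and stop.
            document_round(round_number, assessment, [], "stopping")
            return {"round": round_number, "status": "completed", "positive": positive}
        # Phase C + D: apply fixes and wait for their results.
        actions = implement_fixes(assessment)
        # Phase E: document the round before re-reviewing.
        document_round(round_number, assessment, actions,
                       f"continuing to round {round_number + 1}")
        previous = assessment
    # Reached only when start_round is already past max_rounds.
    return {"round": start_round - 1, "status": "completed", "positive": False}
```

Passing the previous assessment into `review` is what lets round 2+ prompts include earlier context, and `start_round` is where a value recovered from REVIEW_STATE.json would be plugged in.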
Key Rules
- Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission; just do it silently.
- Be honest about weaknesses
- Implement fixes BEFORE re-reviewing
- Document everything
- Include previous context in round 2+ prompts
- Prefer MCP tool over curl when available
Prompt Template for Round 2+
mcp__llm-chat__chat:
  system: "You are a senior ML reviewer (NeurIPS/ICML level)."
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]
    ## Previous Review Summary (Round N-1)
    - Previous Score: X/10
    - Previous Verdict: [ready/almost/not ready]
    - Previous Key Weaknesses: [list]
    ## Changes Since Last Review
    1. [Action 1]: [result]
    2. [Action 2]: [result]
    ## Updated Results
    [paste updated metrics/tables]
    Please re-score and re-assess:
    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix
    4. State clearly: is this READY for submission? Yes/No/Almost
    Be brutally honest. If the work is ready, say so clearly.
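Filling the round 2+ template from the previous round's data might be done as in the sketch below. `build_followup_prompt` is a hypothetical helper; representing `changes` as a list of `(action, result)` pairs is an assumption made for illustration.

```python
def build_followup_prompt(round_number, max_rounds, previous_score,
                          previous_verdict, previous_weaknesses,
                          changes, updated_results):
    """Fill the round 2+ template; `changes` is a list of (action, result) pairs."""
    weaknesses = "; ".join(previous_weaknesses) or "none listed"
    change_lines = "\n".join(
        f"{index}. {action}: {result}"
        for index, (action, result) in enumerate(changes, start=1)
    )
    return f"""[Round {round_number}/{max_rounds} of autonomous review loop]
## Previous Review Summary (Round {round_number - 1})
- Previous Score: {previous_score}/10
- Previous Verdict: {previous_verdict}
- Previous Key Weaknesses: {weaknesses}
## Changes Since Last Review
{change_lines}
## Updated Results
{updated_results}
Please re-score and re-assess:
1. Score this work 1-10 for a top venue
2. List remaining critical weaknesses (ranked by severity)
3. For each weakness, specify the MINIMUM fix
4. State clearly: is this READY for submission? Yes/No/Almost
Be brutally honest. If the work is ready, say so clearly."""
```

The returned string is what would go into the `prompt` field of the MCP call, or into the user message of the curl fallback.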