auto-paper-improvement-loop
Auto Paper Improvement Loop: Review → Fix → Recompile
Autonomously improve the paper at: $ARGUMENTS
Context
This skill is designed to run after Workflow 3 (`/paper-plan` → `/paper-figure` → `/paper-write` → `/paper-compile`). It takes a compiled paper and iteratively improves it through external LLM review.
Unlike `/auto-review-loop` (which iterates on research: running experiments, collecting data, rewriting narrative), this skill iterates on paper writing quality: fixing theoretical inconsistencies, softening overclaims, adding missing content, and improving presentation.
Constants
- `MAX_ROUNDS = 2`: Two rounds of review→fix→recompile. Empirically, Round 1 catches structural issues (4→6/10) and Round 2 catches remaining presentation issues (6→7/10). Returns diminish beyond 2 rounds for writing-only improvements.
- `REVIEWER_MODEL = gpt-5.4`: Model used via Codex MCP for paper review.
- `REVIEW_LOG = PAPER_IMPROVEMENT_LOG.md`: Cumulative log of all rounds, stored in the paper directory.
- `HUMAN_CHECKPOINT = false`: When `true`, pause after each round's review and present score + weaknesses to the user. The user can approve fixes, provide custom modification instructions, skip specific fixes, or stop early. When `false` (default), runs fully autonomously.
💡 Override: `/auto-paper-improvement-loop "paper/" — human checkpoint: true`
Inputs
- Compiled paper: `paper/main.pdf` + LaTeX source files
- All section `.tex` files: concatenated for the review prompt
State Persistence (Compact Recovery)
If the context window fills up mid-loop, Claude Code auto-compacts. To recover, this skill writes `PAPER_IMPROVEMENT_STATE.json` after each round:
```json
{
  "current_round": 1,
  "threadId": "019ce736-...",
  "last_score": 6,
  "status": "in_progress",
  "timestamp": "2026-03-13T21:00:00"
}
```
On startup: if `PAPER_IMPROVEMENT_STATE.json` exists with `"status": "in_progress"` AND `timestamp` is within 24 hours, read it + `PAPER_IMPROVEMENT_LOG.md` to recover context, then resume from the next round. Otherwise (file absent, `"status": "completed"`, or older than 24 hours), start fresh.
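The startup decision could be sketched in Bash roughly as follows. This is only an illustration: `should_resume` is a hypothetical helper, and it uses the state file's modification time as a stand-in for the `timestamp` field (reasonable only because the file is overwritten after every round). It relies on `find -mmin`, which GNU and BSD `find` provide.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): succeed only when a saved run
# should be resumed. Assumption: file mtime approximates the "timestamp" field.
should_resume() {
  local state_file="$1"
  # No state file: start fresh.
  [ -f "$state_file" ] || return 1
  # Only an unfinished run is resumable.
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"in_progress"' "$state_file" || return 1
  # Resume only if the file was modified within the last 24 hours (1440 minutes).
  [ -n "$(find "$state_file" -mmin -1440 -print)" ]
}

if should_resume "PAPER_IMPROVEMENT_STATE.json"; then
  echo "resuming from saved state"
else
  echo "starting fresh"
fi
```

Parsing the `timestamp` value itself would be stricter, but date parsing differs between platforms, so the sketch avoids it.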
After each round: overwrite the state file. On completion: set `"status": "completed"`.
Workflow
Step 0: Preserve Original
```bash
cp paper/main.pdf paper/main_round0_original.pdf
```
Step 1: Collect Paper Text
Concatenate all section files into a single text block for the review prompt:
```bash
# Collect all sections in order (quote "$f" so paths with spaces survive)
for f in paper/sections/*.tex; do
  echo "% === $(basename "$f") ==="
  cat "$f"
done > /tmp/paper_full_text.txt
```
Step 2: Round 1 Review
Send the full paper text to GPT-5.4 xhigh:
```
mcp__codex__codex:
  model: gpt-5.4
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    You are reviewing a [VENUE] paper. Please provide a detailed, structured review.

    ## Full Paper Text:
    [paste concatenated sections]

    ## Review Instructions
    Please act as a senior ML reviewer ([VENUE] level). Provide:
    1. **Overall Score** (1-10, where 6 = weak accept, 7 = accept)
    2. **Summary** (2-3 sentences)
    3. **Strengths** (bullet list, ranked)
    4. **Weaknesses** (bullet list, ranked: CRITICAL > MAJOR > MINOR)
    5. **For each CRITICAL/MAJOR weakness**: A specific, actionable fix
    6. **Missing References** (if any)
    7. **Verdict**: Ready for submission? Yes / Almost / No

    Focus on: theoretical rigor, claims vs evidence alignment, writing clarity,
    self-containedness, notation consistency.
```
Save the threadId for Round 2.
Step 2b: Human Checkpoint (if enabled)
Skip if `HUMAN_CHECKPOINT = false`.
Present the review results and wait for user input:
```
📋 Round 1 review complete.

Score: X/10 — [verdict]

Key weaknesses (by severity):
1. [CRITICAL] ...
2. [MAJOR] ...
3. [MINOR] ...

Reply "go" to implement all fixes, give custom instructions, "skip 2" to skip specific fixes, or "stop" to end.
```
Parse the user response the same way as `/auto-review-loop`: approve / custom instructions / skip / stop.
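As a rough illustration of the four reply types, the classification could look like the Bash sketch below. The function name `parse_checkpoint_reply` and its output labels are hypothetical; how `/auto-review-loop` actually interprets replies is not specified here, and free-form instructions would in practice be read by the agent rather than pattern-matched.

```bash
#!/usr/bin/env bash
# Hypothetical classifier for a checkpoint reply (illustration only).
# Prints one of: "approve", "stop", "skip <ids>", "custom <text>".
parse_checkpoint_reply() {
  local reply="$1"
  case "$reply" in
    go)        echo "approve" ;;               # implement all fixes
    stop)      echo "stop" ;;                  # end the loop early
    "skip "*)  echo "skip ${reply#skip }" ;;   # skip the listed fix numbers
    *)         echo "custom $reply" ;;         # anything else: custom instructions
  esac
}

parse_checkpoint_reply "skip 2"
```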
Step 3: Implement Round 1 Fixes
Parse the review and implement fixes by severity:
Priority order:
- CRITICAL fixes (assumption mismatches, internal contradictions)
- MAJOR fixes (overclaims, missing content, notation issues)
- MINOR fixes (if time permits)
Common fix patterns:
| Issue | Fix Pattern |
|---|---|
| Assumption-model mismatch | Rewrite assumption to match the model, add formal proposition bridging the gap |
| Overclaims | Soften language: "validate" → "demonstrate practical relevance", "comparable" → "qualitatively competitive" |
| Missing metrics | Add quantitative table with honest parameter counts and caveats |
| Theorem not self-contained | Add "Interpretation" paragraph listing all dependencies |
| Notation confusion | Rename conflicting symbols globally, add Notation paragraph |
| Missing references | Add to |
| Theory-practice gap | Explicitly frame theory as idealized; add synthetic validation subsection |
Step 4: Recompile Round 1
```bash
cd paper && latexmk -C && latexmk -pdf -interaction=nonstopmode -halt-on-error main.tex
cp main.pdf main_round1.pdf
```
Verify: 0 undefined references, 0 undefined citations.
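One way to run that verification is to count the relevant warnings in the LaTeX log. The sketch below assumes the log sits at `paper/main.log` and that warnings follow the standard LaTeX wording (`Reference ... undefined`, `Citation ... undefined`); `check_undefined` is a hypothetical helper name. Warnings that the log wraps across lines may be missed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report undefined references/citations found in a LaTeX log.
# Returns 0 only when both counts are zero; returns 2 if the log is missing.
check_undefined() {
  local log="${1:-paper/main.log}"
  local refs cites
  if [ ! -f "$log" ]; then
    echo "log not found: $log" >&2
    return 2
  fi
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  refs=$(grep -Ec 'Warning: Reference .* undefined' "$log" || true)
  cites=$(grep -Ec 'Warning: Citation .* undefined' "$log" || true)
  echo "undefined references: ${refs:-0}, undefined citations: ${cites:-0}"
  [ "${refs:-0}" -eq 0 ] && [ "${cites:-0}" -eq 0 ]
}
```

Calling `check_undefined paper/main.log` after each recompile gives a single pass/fail exit status to gate the next round on.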
Step 5: Round 2 Review
Use `mcp__codex__codex-reply` with the saved threadId:
```
mcp__codex__codex-reply:
  threadId: [saved from Round 1]
  model: gpt-5.4
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    [Round 2 update]

    Since your last review, we have implemented:
    1. [Fix 1]: [description]
    2. [Fix 2]: [description]
    ...

    Please re-score and re-assess. Same format:
    Score, Summary, Strengths, Weaknesses, Actionable fixes, Verdict.
```
Step 5b: Human Checkpoint (if enabled)
Skip if `HUMAN_CHECKPOINT = false`. Same as Step 2b: present the Round 2 review and wait for user input.
Step 6: Implement Round 2 Fixes
Same process as Step 3. Typical Round 2 fixes:
- Add controlled synthetic experiments validating theory
- Further soften any remaining overclaims
- Formalize informal arguments (e.g., truncation → formal proposition)
- Strengthen limitations section
Step 7: Recompile Round 2
```bash
cd paper && latexmk -C && latexmk -pdf -interaction=nonstopmode -halt-on-error main.tex
cp main.pdf main_round2.pdf
```
Step 8: Format Check
After the final recompilation, run a format compliance check:
```bash
# 1. Page count vs venue limit
PAGES=$(pdfinfo paper/main.pdf | awk '/^Pages:/ {print $2}')
echo "Pages: $PAGES (limit: 9 main body for ICLR/NeurIPS)"

# 2. Overfull hbox warnings (content exceeding margins)
# grep -c already prints 0 when nothing matches, so "|| echo 0" would print it twice;
# default to 0 only when the log is missing.
OVERFULL=$(grep -c "Overfull" paper/main.log 2>/dev/null || true)
echo "Overfull hbox warnings: ${OVERFULL:-0}"
grep "Overfull" paper/main.log 2>/dev/null | head -10

# 3. Underfull hbox warnings (loose spacing)
UNDERFULL=$(grep -c "Underfull" paper/main.log 2>/dev/null || true)
echo "Underfull hbox warnings: ${UNDERFULL:-0}"

# 4. Bad boxes summary
BADNESS=$(grep -c "badness" paper/main.log 2>/dev/null || true)
echo "badness warnings: ${BADNESS:-0}"
```
**Auto-fix patterns:**
| Issue | Fix |
|-------|-----|
| Overfull hbox in equation | Wrap in `\resizebox` or split with the `split`/`aligned` environments |
| Overfull hbox in table | Reduce font (`\small`/`\footnotesize`) or use `\resizebox{\linewidth}{!}{...}` |
| Overfull hbox in text | Rephrase sentence or add `\allowbreak` / `\-` hints |
| Over page limit | Move content to appendix, compress tables, reduce figure sizes |
| Underfull hbox (loose) | Rephrase for better line filling or add `\looseness=-1` |
If any overfull hbox > 10pt is found, fix it and recompile before documenting.
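To find only the overfull boxes that exceed the 10pt threshold, the log lines can be filtered by their reported width. The following is a sketch under the assumption that the log uses the standard `Overfull \hbox (12.3pt too wide)` wording; `large_overfull` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "Overfull \hbox" log lines wider than a threshold.
# Usage: large_overfull <log-file> [threshold-in-pt]   (threshold defaults to 10)
large_overfull() {
  local log="$1" threshold="${2:-10}"
  awk -v t="$threshold" '
    /^Overfull \\hbox \(/ {
      # Locate "(<width>pt too wide)" and convert the leading number.
      if (match($0, /\([0-9.]+pt too wide\)/)) {
        width = substr($0, RSTART + 1, RLENGTH - 1) + 0
        if (width > t) print
      }
    }' "$log"
}
```

An empty result from `large_overfull paper/main.log` means no box crosses the threshold, so documentation can proceed.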
Step 9: Document Results
Create `PAPER_IMPROVEMENT_LOG.md` in the paper directory:
```markdown
# Paper Improvement Log

## Score Progression

| Round | Score | Verdict | Key Changes |
|---|---|---|---|
| Round 0 (original) | X/10 | No/Almost/Yes | Baseline |
| Round 1 | Y/10 | No/Almost/Yes | [summary of fixes] |
| Round 2 | Z/10 | No/Almost/Yes | [summary of fixes] |

## Round 1 Review & Fixes

<details>
<summary>GPT-5.4 xhigh Review (Round 1)</summary>

[Full raw review text, verbatim]

</details>

### Fixes Implemented
- [Fix description]
- [Fix description] ...

## Round 2 Review & Fixes

<details>
<summary>GPT-5.4 xhigh Review (Round 2)</summary>

[Full raw review text, verbatim]

</details>

### Fixes Implemented
- [Fix description]
- [Fix description] ...

## PDFs
- `main_round0_original.pdf`: Original generated paper
- `main_round1.pdf`: After Round 1 fixes
- `main_round2.pdf`: Final version after Round 2 fixes
```
Step 10: Summary
Report to user:
- Score progression table
- Number of CRITICAL/MAJOR/MINOR issues fixed per round
- Final page count
- Remaining issues (if any)
Feishu Notification (if configured)
After each round's review AND at final completion, check `~/.claude/feishu.json`:
- After each round: send `review_scored` with "Round N: X/10 — [key changes]"
- After final round: send `pipeline_done` with the score progression table + final page count
- If config absent or mode `"off"`: skip entirely (no-op)
Output
```
paper/
├── main_round0_original.pdf    # Original
├── main_round1.pdf             # After Round 1
├── main_round2.pdf             # After Round 2 (final)
├── main.pdf                    # = main_round2.pdf
└── PAPER_IMPROVEMENT_LOG.md    # Full review log with scores
```
Key Rules
- Large file handling: If the Write tool fails due to file size, immediately retry using Bash (`cat << 'EOF' > file`) to write in chunks. Do NOT ask the user for permission; just do it silently.
- Preserve all PDF versions: the user needs to compare progression.
- Save FULL raw review text: do not summarize or truncate GPT-5.4 responses.
- Use `mcp__codex__codex-reply` for Round 2 to maintain conversation context.
- Always recompile after fixes: verify 0 errors before proceeding.
- Do not fabricate experimental results: synthetic validation must describe methodology, not invent numbers.
- Respect the paper's claims: soften overclaims rather than adding unsupported new claims.
- Global consistency: when renaming notation or softening claims, check ALL files (abstract, intro, method, experiments, theory sections, conclusion, tables, figure captions).
Typical Score Progression
Based on end-to-end testing on a 9-page ICLR 2026 theory paper:
| Round | Score | Key Improvements |
|---|---|---|
| Round 0 | 4/10 (content) | Baseline: assumption-model mismatch, overclaims, notation issues |
| Round 1 | 6/10 (content) | Fixed assumptions, softened claims, added interpretation, renamed notation |
| Round 2 | 7/10 (content) | Added synthetic validation, formal truncation proposition, stronger limitations |
| Round 3 | 5→8.5/10 (format) | Removed hero fig, appendix, compressed conclusion, fixed overfull hbox |
+4.5 points across 3 rounds (2 content + 1 format) is typical for a well-structured but rough first draft. Final: 8 pages main body, 0 overfull hbox, ICLR-compliant.