auto-paper-improvement-loop

Auto Paper Improvement Loop: Review → Fix → Recompile

Autonomously improve the paper at: $ARGUMENTS

Context

This skill is designed to run after Workflow 3 (`/paper-plan`, `/paper-figure`, `/paper-write`, `/paper-compile`). It takes a compiled paper and iteratively improves it through external LLM review.
Unlike `/auto-review-loop` (which iterates on research: running experiments, collecting data, rewriting narrative), this skill iterates on paper writing quality: fixing theoretical inconsistencies, softening overclaims, adding missing content, and improving presentation.

Constants

  • `MAX_ROUNDS = 2`: Two rounds of review→fix→recompile. Empirically, Round 1 catches structural issues (4→6/10) and Round 2 catches remaining presentation issues (6→7/10). Returns diminish beyond 2 rounds for writing-only improvements.
  • `REVIEWER_MODEL = gpt-5.4`: Model used via Codex MCP for paper review.
  • `REVIEW_LOG = PAPER_IMPROVEMENT_LOG.md`: Cumulative log of all rounds, stored in the paper directory.
  • `HUMAN_CHECKPOINT = false`: When `true`, pause after each round's review and present score + weaknesses to the user. The user can approve fixes, provide custom modification instructions, skip specific fixes, or stop early. When `false` (default), runs fully autonomously.
💡 Override:
/auto-paper-improvement-loop "paper/" — human checkpoint: true

Inputs

  1. Compiled paper `paper/main.pdf` plus the LaTeX source files
  2. All section `.tex` files, concatenated for the review prompt

State Persistence (Compact Recovery)

If the context window fills up mid-loop, Claude Code auto-compacts. To recover, this skill writes `PAPER_IMPROVEMENT_STATE.json` after each round:
```json
{
  "current_round": 1,
  "threadId": "019ce736-...",
  "last_score": 6,
  "status": "in_progress",
  "timestamp": "2026-03-13T21:00:00"
}
```
On startup: if `PAPER_IMPROVEMENT_STATE.json` exists with `"status": "in_progress"` AND `timestamp` is within 24 hours, read it together with `PAPER_IMPROVEMENT_LOG.md` to recover context, then resume from the next round. Otherwise (file absent, `"status": "completed"`, or older than 24 hours), start fresh.
After each round: overwrite the state file. On completion: set `"status": "completed"`.
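The startup rule above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the helper names `load_state`, `should_resume`, and `next_round` are hypothetical, and the sketch assumes the timestamp is a naive local ISO-8601 string like the one in the example state file.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path

MAX_AGE = timedelta(hours=24)  # recovery window stated in the skill


def load_state(path):
    """Return the parsed state file, or None if it is absent or unreadable."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None


def should_resume(state, now):
    """True only for an in-progress state written within the last 24 hours."""
    if not state or state.get("status") != "in_progress":
        return False
    try:
        # Assumption: naive ISO-8601 timestamp, compared against a naive `now`.
        age = now - datetime.fromisoformat(state["timestamp"])
    except (KeyError, ValueError, TypeError):
        return False  # missing or malformed timestamp: start fresh
    return timedelta(0) <= age <= MAX_AGE


def next_round(state):
    """Round to run after recovery: the one following the last completed round."""
    return state["current_round"] + 1
```

A caller would run `should_resume(load_state(path), datetime.now())` and fall back to a fresh start whenever it returns `False`.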
Workflow

Step 0: Preserve Original

```bash
cp paper/main.pdf paper/main_round0_original.pdf
```

Step 1: Collect Paper Text

Concatenate all section files into a single text block for the review prompt:
```bash
# Collect all sections in order
for f in paper/sections/*.tex; do
  echo "% === $(basename "$f") ==="
  cat "$f"
done > /tmp/paper_full_text.txt
```

Step 2: Round 1 Review

Send the full paper text to GPT-5.4 xhigh:
```
mcp__codex__codex:
  model: gpt-5.4
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    You are reviewing a [VENUE] paper. Please provide a detailed, structured review.

    ## Full Paper Text:
    [paste concatenated sections]

    ## Review Instructions
    Please act as a senior ML reviewer ([VENUE] level). Provide:
    1. **Overall Score** (1-10, where 6 = weak accept, 7 = accept)
    2. **Summary** (2-3 sentences)
    3. **Strengths** (bullet list, ranked)
    4. **Weaknesses** (bullet list, ranked: CRITICAL > MAJOR > MINOR)
    5. **For each CRITICAL/MAJOR weakness**: A specific, actionable fix
    6. **Missing References** (if any)
    7. **Verdict**: Ready for submission? Yes / Almost / No

    Focus on: theoretical rigor, claims vs evidence alignment, writing clarity,
    self-containedness, notation consistency.
```
Save the threadId for Round 2.

Step 2b: Human Checkpoint (if enabled)

Skip if `HUMAN_CHECKPOINT = false`.
Present the review results and wait for user input:
📋 Round 1 review complete.

Score: X/10 — [verdict]
Key weaknesses (by severity):
1. [CRITICAL] ...
2. [MAJOR] ...
3. [MINOR] ...

Reply "go" to implement all fixes, give custom instructions, "skip 2" to skip specific fixes, or "stop" to end.
Parse the user response the same way as `/auto-review-loop`: approve / custom instructions / skip / stop.
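The four reply types could be distinguished roughly as sketched below. The actual parsing logic of `/auto-review-loop` is not shown in this document, so the function name `parse_checkpoint_reply`, the returned action labels, and the support for several indices in one `skip` reply are all assumptions made for illustration.

```python
import re


def parse_checkpoint_reply(reply):
    """Map a user reply to an (action, payload) pair.

    "go"        -> ("approve", None)
    "stop"      -> ("stop", None)
    "skip 2"    -> ("skip", [2])   (several indices are accepted, by assumption)
    other text  -> ("custom", <the text>), treated as custom instructions
    """
    text = reply.strip()
    lowered = text.lower()
    if lowered == "go":
        return ("approve", None)
    if lowered == "stop":
        return ("stop", None)
    match = re.fullmatch(r"skip((?:[\s,]+\d+)+)", lowered)
    if match:
        return ("skip", [int(n) for n in re.findall(r"\d+", match.group(1))])
    return ("custom", text)
```

Anything that is not one of the three keywords falls through to custom instructions, which keeps free-form replies from being misread as approval.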

Step 3: Implement Round 1 Fixes

Parse the review and implement fixes by severity:
Priority order:
  1. CRITICAL fixes (assumption mismatches, internal contradictions)
  2. MAJOR fixes (overclaims, missing content, notation issues)
  3. MINOR fixes (if time permits)
Common fix patterns:

| Issue | Fix Pattern |
|-------|-----|
| Assumption-model mismatch | Rewrite assumption to match the model, add formal proposition bridging the gap |
| Overclaims | Soften language: "validate" → "demonstrate practical relevance", "comparable" → "qualitatively competitive" |
| Missing metrics | Add quantitative table with honest parameter counts and caveats |
| Theorem not self-contained | Add "Interpretation" paragraph listing all dependencies |
| Notation confusion | Rename conflicting symbols globally, add Notation paragraph |
| Missing references | Add to `references.bib`, cite in appropriate locations |
| Theory-practice gap | Explicitly frame theory as idealized; add synthetic validation subsection |
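As a rough illustration of the priority order, the sketch below pulls severity-tagged lines out of a review and sorts them CRITICAL first. It assumes the reviewer marks each weakness with a bracketed tag such as `[CRITICAL]`, mirroring the checkpoint message format; real reviews may be formatted differently, and `extract_weaknesses` is a hypothetical helper name.

```python
import re

SEVERITY_ORDER = {"CRITICAL": 0, "MAJOR": 1, "MINOR": 2}
# Assumption: each weakness line carries a bracketed severity tag.
TAGGED_LINE = re.compile(r"\[(CRITICAL|MAJOR|MINOR)\]\s*(.+)")


def extract_weaknesses(review_text):
    """Return (severity, description) pairs ordered CRITICAL > MAJOR > MINOR."""
    found = []
    for line in review_text.splitlines():
        match = TAGGED_LINE.search(line)
        if match:
            found.append((match.group(1), match.group(2).strip()))
    # sorted() is stable, so the reviewer's ranking within a level is preserved.
    return sorted(found, key=lambda item: SEVERITY_ORDER[item[0]])
```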

Step 4: Recompile Round 1

```bash
cd paper && latexmk -C && latexmk -pdf -interaction=nonstopmode -halt-on-error main.tex
cp main.pdf main_round1.pdf
```
Verify: 0 undefined references, 0 undefined citations.
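One way to run this verification is to scan `main.log` for the standard LaTeX (and natbib) "undefined" warnings, as in the sketch below. The function name `undefined_labels` is hypothetical. Because TeX wraps log lines at a fixed width, a warning with a very long key may be split across lines and missed, so treat the result as a first check rather than a guarantee.

```python
import re

# Matches e.g. "LaTeX Warning: Reference `fig:x' on page 3 undefined on input line 42."
# and "Package natbib Warning: Citation `key' on page 1 undefined on input line 10."
UNDEFINED = re.compile(r"Warning: (Reference|Citation) `([^']+)' on page \d+ undefined")


def undefined_labels(log_text):
    """Collect undefined reference labels and citation keys from a LaTeX log."""
    refs, cites = set(), set()
    for kind, key in UNDEFINED.findall(log_text):
        (refs if kind == "Reference" else cites).add(key)
    return {"references": sorted(refs), "citations": sorted(cites)}
```

Both lists being empty corresponds to the "0 undefined references, 0 undefined citations" condition.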

Step 5: Round 2 Review

Use `mcp__codex__codex-reply` with the saved threadId:
```
mcp__codex__codex-reply:
  threadId: [saved from Round 1]
  model: gpt-5.4
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    [Round 2 update]

    Since your last review, we have implemented:
    1. [Fix 1]: [description]
    2. [Fix 2]: [description]
    ...

    Please re-score and re-assess. Same format:
    Score, Summary, Strengths, Weaknesses, Actionable fixes, Verdict.
```

Step 5b: Human Checkpoint (if enabled)

Skip if `HUMAN_CHECKPOINT = false`.
Same as Step 2b: present the Round 2 review and wait for user input.

Step 6: Implement Round 2 Fixes

Same process as Step 3. Typical Round 2 fixes:
  • Add controlled synthetic experiments validating theory
  • Further soften any remaining overclaims
  • Formalize informal arguments (e.g., truncation → formal proposition)
  • Strengthen limitations section

Step 7: Recompile Round 2

```bash
cd paper && latexmk -C && latexmk -pdf -interaction=nonstopmode -halt-on-error main.tex
cp main.pdf main_round2.pdf
```

Step 8: Format Check

After the final recompilation, run a format compliance check:
```bash
# 1. Page count vs venue limit
PAGES=$(pdfinfo paper/main.pdf | awk '/^Pages:/ {print $2}')
echo "Pages: $PAGES (limit: 9 main body for ICLR/NeurIPS)"

# 2. Overfull hbox warnings (content exceeding margins)
# grep -c already prints 0 when nothing matches; the default only covers a missing log
OVERFULL=$(grep -c "Overfull" paper/main.log 2>/dev/null)
echo "Overfull hbox warnings: ${OVERFULL:-0}"
grep "Overfull" paper/main.log 2>/dev/null | head -10

# 3. Underfull hbox warnings (loose spacing)
UNDERFULL=$(grep -c "Underfull" paper/main.log 2>/dev/null)
echo "Underfull hbox warnings: ${UNDERFULL:-0}"

# 4. Bad boxes summary
BADNESS=$(grep -c "badness" paper/main.log 2>/dev/null)
echo "Badness warnings: ${BADNESS:-0}"
```

**Auto-fix patterns:**

| Issue | Fix |
|-------|-----|
| Overfull hbox in equation | Wrap in `\resizebox` or break it across lines with the `split`/`aligned` environments |
| Overfull hbox in table | Reduce font (`\small`/`\footnotesize`) or use `\resizebox{\linewidth}{!}{...}` |
| Overfull hbox in text | Rephrase sentence or add `\allowbreak` / `\-` hints |
| Over page limit | Move content to appendix, compress tables, reduce figure sizes |
| Underfull hbox (loose) | Rephrase for better line filling or add `\looseness=-1` |

If any overfull hbox > 10pt is found, fix it and recompile before documenting.
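The 10pt threshold can be checked mechanically by reading the overflow width that TeX prints in each warning. The sketch below assumes the usual log line shape `Overfull \hbox (12.3pt too wide) ...`. The function name `large_overfull_boxes` and its `threshold_pt` parameter are illustrative, not part of the skill.

```python
import re

# TeX reports e.g. "Overfull \hbox (15.5pt too wide) in paragraph at lines 12--14"
OVERFULL = re.compile(
    r"^Overfull \\hbox \((\d+(?:\.\d+)?)pt too wide\)(.*)$", re.MULTILINE
)


def large_overfull_boxes(log_text, threshold_pt=10.0):
    """Return (width_pt, location) for overfull hboxes wider than the threshold,
    largest first."""
    hits = []
    for width, where in OVERFULL.findall(log_text):
        if float(width) > threshold_pt:
            hits.append((float(width), where.strip()))
    return sorted(hits, reverse=True)
```

A non-empty result means at least one box needs fixing and the paper should be recompiled before the results are documented.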

Step 9: Document Results

Create `PAPER_IMPROVEMENT_LOG.md` in the paper directory:
```markdown
# Paper Improvement Log

## Score Progression

| Round | Score | Verdict | Key Changes |
|-------|-------|---------|-------------|
| Round 0 (original) | X/10 | No/Almost/Yes | Baseline |
| Round 1 | Y/10 | No/Almost/Yes | [summary of fixes] |
| Round 2 | Z/10 | No/Almost/Yes | [summary of fixes] |

## Round 1 Review & Fixes

<details>
<summary>GPT-5.4 xhigh Review (Round 1)</summary>

[Full raw review text, verbatim]

</details>

### Fixes Implemented

1. [Fix description]
2. [Fix description]
...

## Round 2 Review & Fixes

<details>
<summary>GPT-5.4 xhigh Review (Round 2)</summary>

[Full raw review text, verbatim]

</details>

### Fixes Implemented

1. [Fix description]
2. [Fix description]
...

## PDFs

- `main_round0_original.pdf`: Original generated paper
- `main_round1.pdf`: After Round 1 fixes
- `main_round2.pdf`: Final version after Round 2 fixes
```
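The Score Progression table in the log could be rendered from per-round records with a small helper like the one below. This is only a sketch: `score_table` and the tuple layout `(label, score, verdict, key_changes)` are assumptions made for illustration, and the values passed in must come from the actual reviews.

```python
def score_table(rounds):
    """Render the Score Progression table as Markdown.

    `rounds` is a list of (label, score, verdict, key_changes) tuples,
    where score is a number on the 1-10 scale used by the reviewer.
    """
    lines = [
        "| Round | Score | Verdict | Key Changes |",
        "|-------|-------|---------|-------------|",
    ]
    for label, score, verdict, changes in rounds:
        # ":g" drops a trailing ".0" so 6.0 prints as 6 while 8.5 stays 8.5.
        lines.append(f"| {label} | {score:g}/10 | {verdict} | {changes} |")
    return "\n".join(lines)
```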

Step 10: Summary

Report to user:
  • Score progression table
  • Number of CRITICAL/MAJOR/MINOR issues fixed per round
  • Final page count
  • Remaining issues (if any)

Feishu Notification (if configured)

After each round's review AND at final completion, check `~/.claude/feishu.json`:
  • After each round: Send `review_scored`
    — "Round N: X/10 — [key changes]"
  • After final round: Send `pipeline_done`
    — score progression table + final page count
  • If config absent or mode `"off"`: skip entirely (no-op)

Output

```
paper/
├── main_round0_original.pdf    # Original
├── main_round1.pdf             # After Round 1
├── main_round2.pdf             # After Round 2 (final)
├── main.pdf                    # = main_round2.pdf
└── PAPER_IMPROVEMENT_LOG.md    # Full review log with scores
```

Key Rules

  • Large file handling: If the Write tool fails due to file size, immediately retry using Bash (`cat << 'EOF' > file`) to write in chunks. Do NOT ask the user for permission; just do it silently.
  • Preserve all PDF versions: the user needs to compare progression
  • Save FULL raw review text: do not summarize or truncate GPT-5.4 responses
  • Use `mcp__codex__codex-reply` for Round 2 to maintain conversation context
  • Always recompile after fixes: verify 0 errors before proceeding
  • Do not fabricate experimental results: synthetic validation must describe methodology, not invent numbers
  • Respect the paper's claims: soften overclaims rather than adding unsupported new claims
  • Global consistency: when renaming notation or softening claims, check ALL files (abstract, intro, method, experiments, theory sections, conclusion, tables, figure captions)

Typical Score Progression

Based on end-to-end testing on a 9-page ICLR 2026 theory paper:

| Round | Score | Key Improvements |
|-------|-------|-----|
| Round 0 | 4/10 (content) | Baseline: assumption-model mismatch, overclaims, notation issues |
| Round 1 | 6/10 (content) | Fixed assumptions, softened claims, added interpretation, renamed notation |
| Round 2 | 7/10 (content) | Added synthetic validation, formal truncation proposition, stronger limitations |
| Round 3 | 5→8.5/10 (format) | Removed hero fig, appendix, compressed conclusion, fixed overfull hbox |

A gain of +4.5 points across 3 rounds (2 content + 1 format) is typical for a well-structured but rough first draft. Final: 8 pages main body, 0 overfull hbox, ICLR-compliant.