research-pipeline


Full Research Pipeline: Idea → Experiments → Submission


End-to-end autonomous research workflow for: $ARGUMENTS

Constants


  • AUTO_PROCEED = true — When true, Gate 1 auto-selects the top-ranked idea (highest pilot signal + novelty confirmed) and continues to implementation. When false, always waits for explicit user confirmation before proceeding.
  • ARXIV_DOWNLOAD = false — When true, /research-lit downloads the top relevant arXiv PDFs during the literature survey. When false (default), only fetches metadata via the arXiv API. Passed through to /idea-discovery → /research-lit.
  • HUMAN_CHECKPOINT = false — When true, the auto-review loops (Stage 4) pause after each round's review to let you see the score and provide custom modification instructions before fixes are implemented. When false (default), loops run fully autonomously. Passed through to /auto-review-loop.
💡 Override via argument, e.g., /research-pipeline "topic" — AUTO_PROCEED: false, human checkpoint: true.
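The override syntax above could be parsed along these lines (a minimal sketch; the function name, regex, and defaults are assumptions for illustration, not the skill's actual parsing code):

```python
import re

def parse_overrides(arg_string):
    """Extract `NAME: true/false` overrides appended to the topic argument.

    Hypothetical helper: accepts both `AUTO_PROCEED: false` and the looser
    `human checkpoint: true` spelling from the example invocation.
    """
    flags = {"AUTO_PROCEED": True, "ARXIV_DOWNLOAD": False, "HUMAN_CHECKPOINT": False}
    for name, value in re.findall(r"([A-Za-z_ ]+):\s*(true|false)", arg_string, re.I):
        key = name.strip().upper().replace(" ", "_")
        if key in flags:
            flags[key] = value.lower() == "true"
    return flags
```

For example, the invocation shown above would flip two of the three defaults while leaving ARXIV_DOWNLOAD untouched.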

Overview


This skill chains the entire research lifecycle into a single pipeline:
/idea-discovery → implement → /run-experiment → /auto-review-loop → submission-ready
├── Workflow 1 ──┤            ├────────── Workflow 2 ──────────────┤
It orchestrates two major workflows plus the implementation bridge between them.

Pipeline


Stage 1: Idea Discovery (Workflow 1)


Invoke the idea discovery pipeline:
/idea-discovery "$ARGUMENTS"
This internally runs: /research-lit → /idea-creator → /novelty-check → /research-review
Output: IDEA_REPORT.md with ranked, validated, pilot-tested ideas.
🚦 Gate 1 — Human Checkpoint: After IDEA_REPORT.md is generated, pause and present the top ideas to the user:
📋 Idea Discovery complete. Top ideas:

1. [Idea 1 title] — Pilot: POSITIVE (+X%), Novelty: CONFIRMED
2. [Idea 2 title] — Pilot: WEAK POSITIVE (+Y%), Novelty: CONFIRMED
3. [Idea 3 title] — Pilot: NEGATIVE, eliminated

Recommended: Idea 1. Shall I proceed with implementation?
If AUTO_PROCEED=false: Wait for user confirmation before continuing. The user may:
  • Approve an idea → proceed to Stage 2.
  • Pick a different idea → proceed with their choice.
  • Request changes (e.g., "combine Idea 1 and 3", "focus more on X") → update the idea prompt with user feedback, re-run /idea-discovery with refined constraints, and present again.
  • Reject all ideas → collect feedback on what's missing, re-run Stage 1 with adjusted research direction. Repeat until the user commits to an idea.
  • Stop here → save current state to IDEA_REPORT.md for future reference.
If AUTO_PROCEED=true: Present the top ideas, wait 10 seconds for user input. If no response, auto-select the #1 ranked idea (highest pilot signal + novelty confirmed) and proceed to Stage 2. Log: "AUTO_PROCEED: selected Idea 1 — [title]".
⚠️ This gate waits for user confirmation when AUTO_PROCEED=false. When true, it auto-selects the top idea after presenting results. The rest of the pipeline (Stages 2-4) is expensive (GPU time + multiple review rounds), so set AUTO_PROCEED=false if you want to manually choose which idea to pursue.
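The Gate 1 decision rule above can be sketched as follows (an illustration of the control flow only; the function and argument names are hypothetical, and the real skill handles the 10-second wait interactively):

```python
def gate1_decision(ranked_ideas, user_choice=None, auto_proceed=True):
    """Decide which idea to pursue at Gate 1.

    ranked_ideas: idea titles, best first (as ranked in IDEA_REPORT.md).
    user_choice:  index the user picked, or None if no response arrived.
    auto_proceed: mirrors the AUTO_PROCEED constant.

    Returns the chosen idea title, or None if the pipeline must keep waiting.
    """
    if user_choice is not None:
        return ranked_ideas[user_choice]
    if auto_proceed:
        # No response within the wait window: take the top-ranked idea.
        print(f'AUTO_PROCEED: selected Idea 1 — {ranked_ideas[0]}')
        return ranked_ideas[0]
    # AUTO_PROCEED=false: never continue without explicit confirmation.
    return None
```

Note that with AUTO_PROCEED=false the function returns None rather than guessing, matching the rule that the gate blocks until the user commits.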
Stage 2: Implementation


Once the user confirms which idea to pursue:
  1. Read the idea details from IDEA_REPORT.md (hypothesis, experimental design, pilot code)
  2. Implement the full experiment:
    • Extend pilot code to full scale (multi-seed, full dataset, proper baselines)
    • Add proper evaluation metrics and logging (wandb if configured)
    • Write clean, reproducible experiment scripts
    • Follow existing codebase conventions
  3. Code review: Before deploying, do a self-review:
    • Are all hyperparameters configurable via argparse?
    • Is the random seed fixed and controllable?
    • Are results saved to JSON/CSV for later analysis?
    • Is there proper logging for debugging?
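A minimal script skeleton that satisfies the self-review checklist above might look like this (illustrative only; the flag names and the placeholder metric are assumptions, not the skill's actual code):

```python
import argparse
import json
import random

def build_parser():
    # All hyperparameters configurable via argparse (checklist item 1)
    p = argparse.ArgumentParser(description="Full-scale experiment runner")
    p.add_argument("--seed", type=int, default=0)
    p.add_argument("--lr", type=float, default=1e-3)
    p.add_argument("--epochs", type=int, default=10)
    p.add_argument("--out", default="results.json")
    return p

def run(argv=None):
    args = build_parser().parse_args(argv)
    random.seed(args.seed)  # fixed, controllable seed (checklist item 2)

    # Placeholder for the real train/eval loop; deterministic given the seed.
    metric = round(random.random(), 6)

    results = {"seed": args.seed, "lr": args.lr,
               "epochs": args.epochs, "metric": metric}
    with open(args.out, "w") as f:
        json.dump(results, f, indent=2)  # JSON output for later analysis (item 3)
    print(f"[exp] seed={args.seed} metric={metric}")  # basic logging (item 4)
    return results
```

Calling `run(["--seed", "1"])` twice yields identical results, which is the reproducibility property the checklist is after.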

Stage 3: Deploy Experiments (Workflow 2 — Part 1)


Deploy the full-scale experiments:
/run-experiment [experiment command]
What this does:
  • Check GPU availability on configured servers
  • Sync code to remote server
  • Launch experiments in screen sessions with proper CUDA_VISIBLE_DEVICES
  • Verify experiments started successfully
Monitor progress:
/monitor-experiment [server]
Wait for experiments to complete. Collect results.
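The screen-based launch described above can be sketched as a command builder (a hypothetical helper; the session naming and log path are assumptions about what /run-experiment might execute on the remote host):

```python
def build_launch_cmd(session, gpu_ids, experiment_cmd):
    """Compose a detached screen launch with CUDA_VISIBLE_DEVICES pinned.

    session: screen session name; gpu_ids: GPUs assigned to this run.
    Output and errors are teed to a per-session log for later monitoring.
    """
    gpus = ",".join(str(g) for g in gpu_ids)
    inner = f"CUDA_VISIBLE_DEVICES={gpus} {experiment_cmd} 2>&1 | tee {session}.log"
    return f"screen -dmS {session} bash -c '{inner}'"
```

`screen -dmS` starts the session detached, so the SSH connection can close while the experiment keeps running; /monitor-experiment can then reattach or tail the log.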

Stage 4: Auto Review Loop (Workflow 2 — Part 2)


Once initial results are in, start the autonomous improvement loop:
/auto-review-loop "$ARGUMENTS — [chosen idea title]"
What this does (up to 4 rounds):
  1. GPT-5.4 xhigh reviews the work (score, weaknesses, minimum fixes)
  2. Claude Code implements fixes (code changes, new experiments, reframing)
  3. Deploy fixes, collect new results
  4. Re-review → repeat until score ≥ 6/10 or 4 rounds reached
Output: AUTO_REVIEW.md with full review history and final assessment.
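The loop's stopping condition (score ≥ 6/10 or 4 rounds) can be sketched as follows (a control-flow illustration; `review_fn` and `fix_fn` stand in for the reviewer and fix-implementation steps, and are not real APIs):

```python
def auto_review_loop(review_fn, fix_fn, max_rounds=4, target_score=6):
    """Run review/fix rounds until the score reaches target or rounds run out.

    review_fn(round_no) -> (score, feedback); fix_fn(feedback) applies fixes.
    Returns the per-round (round_no, score) history for AUTO_REVIEW.md.
    """
    history = []
    for round_no in range(1, max_rounds + 1):
        score, feedback = review_fn(round_no)
        history.append((round_no, score))
        if score >= target_score:
            break  # good enough: stop early
        if round_no < max_rounds:
            fix_fn(feedback)  # implement fixes, redeploy, collect new results
    return history
```

The bounded range enforces the "do not loop forever" rule: if round 4 ends below the target, the loop exits and the remaining issues are reported.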

Stage 5: Final Summary


After the auto-review loop completes, write a final status report in Markdown:

Research Pipeline Report


Direction: $ARGUMENTS
Chosen Idea: [title]
Date: [start] → [end]
Pipeline: idea-discovery → implement → run-experiment → auto-review-loop

Journey Summary


  • Ideas generated: X → filtered to Y → piloted Z → chose 1
  • Implementation: [brief description of what was built]
  • Experiments: [number of GPU experiments, total compute time]
  • Review rounds: N/4, final score: X/10

Final Status


  • [ ] Ready for submission / [ ] Needs manual follow-up

Remaining TODOs (if any)


  • [items flagged by reviewer that weren't addressed]

Files Changed


  • [list of key files created/modified]

Key Rules


  • Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission — just do it silently.
  • Human checkpoint after Stage 1 is controlled by AUTO_PROCEED. When false, do not proceed without user confirmation. When true, auto-select the top idea after presenting results.
  • Stages 2-4 can run autonomously once the user confirms the idea. This is the "sleep and wake up to results" part.
  • If Stage 4 ends at round 4 without positive assessment, stop and report remaining issues. Do not loop forever.
  • Budget awareness: Track total GPU-hours across the pipeline. Flag if approaching user-defined limits.
  • Documentation: Every stage updates its own output file. The full history should be self-contained.
  • Fail gracefully: If any stage fails (no good ideas, experiments crash, review loop stuck), report clearly and suggest alternatives rather than forcing forward.

Typical Timeline


| Stage | Duration | Can sleep? |
|-------|----------|------------|
| 1. Idea Discovery | 30-60 min | Yes if AUTO_PROCEED=true |
| 2. Implementation | 15-60 min | Yes (autonomous after Gate 1) |
| 3. Deploy | 5 min + experiment time | Yes ✅ |
| 4. Auto Review | 1-4 hours (depends on experiments) | Yes ✅ |
Sweet spot: Run Stage 1-2 in the evening, launch Stage 3-4 before bed, wake up to a reviewed paper.