research-pipeline


Full Research Pipeline: Idea → Experiments → Submission


End-to-end autonomous research workflow for: $ARGUMENTS

Constants


  • AUTO_PROCEED = true — When true, Gate 1 auto-selects the top-ranked idea (highest pilot signal + novelty confirmed) and continues to implementation. When false, always waits for explicit user confirmation before proceeding.
  • ARXIV_DOWNLOAD = false — When true, /research-lit downloads the top relevant arXiv PDFs during the literature survey. When false (default), only fetches metadata via the arXiv API. Passed through to /idea-discovery → /research-lit.
  • HUMAN_CHECKPOINT = false — When true, the auto-review loops (Stage 4) pause after each round's review to let you see the score and provide custom modification instructions before fixes are implemented. When false (default), loops run fully autonomously. Passed through to /auto-review-loop.
💡 Override via argument, e.g., /research-pipeline "topic" — AUTO_PROCEED: false, human checkpoint: true.
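The override syntax above could be parsed along these lines (a minimal sketch; the function name, regex, and defaults are assumptions for illustration, not the skill's actual parsing code):

```python
import re

def parse_overrides(arg_string):
    """Extract `NAME: true/false` overrides appended to the topic argument.

    Hypothetical helper: accepts both `AUTO_PROCEED: false` and the looser
    `human checkpoint: true` spelling from the example invocation.
    """
    flags = {"AUTO_PROCEED": True, "ARXIV_DOWNLOAD": False, "HUMAN_CHECKPOINT": False}
    for name, value in re.findall(r"([A-Za-z_ ]+):\s*(true|false)", arg_string, re.I):
        key = name.strip().upper().replace(" ", "_")
        if key in flags:
            flags[key] = value.lower() == "true"
    return flags
```

For example, the invocation shown above would flip two of the three defaults while leaving ARXIV_DOWNLOAD untouched.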

Overview


This skill chains the entire research lifecycle into a single pipeline:
/idea-discovery → implement → /run-experiment → /auto-review-loop → submission-ready
├── Workflow 1 ──┤            ├────────── Workflow 2 ──────────────┤
It orchestrates two major workflows plus the implementation bridge between them.

Pipeline


Stage 1: Idea Discovery (Workflow 1)


Invoke the idea discovery pipeline:
/idea-discovery "$ARGUMENTS"
This internally runs: /research-lit → /idea-creator → /novelty-check → /research-review
Output: IDEA_REPORT.md with ranked, validated, pilot-tested ideas.
🚦 Gate 1 — Human Checkpoint: After IDEA_REPORT.md is generated, pause and present the top ideas to the user:
📋 Idea Discovery complete. Top ideas:

1. [Idea 1 title] — Pilot: POSITIVE (+X%), Novelty: CONFIRMED
2. [Idea 2 title] — Pilot: WEAK POSITIVE (+Y%), Novelty: CONFIRMED
3. [Idea 3 title] — Pilot: NEGATIVE, eliminated

Recommended: Idea 1. Shall I proceed with implementation?
If AUTO_PROCEED=false: Wait for user confirmation before continuing. The user may:
  • Approve an idea → proceed to Stage 2.
  • Pick a different idea → proceed with their choice.
  • Request changes (e.g., "combine Idea 1 and 3", "focus more on X") → update the idea prompt with user feedback, re-run /idea-discovery with refined constraints, and present again.
  • Reject all ideas → collect feedback on what's missing, re-run Stage 1 with adjusted research direction. Repeat until the user commits to an idea.
  • Stop here → save current state to IDEA_REPORT.md for future reference.
If AUTO_PROCEED=true: Present the top ideas, wait 10 seconds for user input. If no response, auto-select the #1 ranked idea (highest pilot signal + novelty confirmed) and proceed to Stage 2. Log: "AUTO_PROCEED: selected Idea 1 — [title]".
⚠️ This gate waits for user confirmation when AUTO_PROCEED=false. When true, it auto-selects the top idea after presenting results. The rest of the pipeline (Stages 2-4) is expensive (GPU time + multiple review rounds), so set AUTO_PROCEED=false if you want to manually choose which idea to pursue.
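The Gate 1 decision rule above can be sketched as follows (an illustration of the control flow only; the function and argument names are hypothetical, and the real skill handles the 10-second wait interactively):

```python
def gate1_decision(ranked_ideas, user_choice=None, auto_proceed=True):
    """Decide which idea to pursue at Gate 1.

    ranked_ideas: idea titles, best first (as ranked in IDEA_REPORT.md).
    user_choice:  index the user picked, or None if no response arrived.
    auto_proceed: mirrors the AUTO_PROCEED constant.

    Returns the chosen idea title, or None if the pipeline must keep waiting.
    """
    if user_choice is not None:
        return ranked_ideas[user_choice]
    if auto_proceed:
        # No response within the wait window: take the top-ranked idea.
        print(f'AUTO_PROCEED: selected Idea 1 — {ranked_ideas[0]}')
        return ranked_ideas[0]
    # AUTO_PROCEED=false: never continue without explicit confirmation.
    return None
```

Note that with AUTO_PROCEED=false the function returns None rather than guessing, matching the rule that the gate blocks until the user commits.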
Stage 2: Implementation


Once the user confirms which idea to pursue:
  1. Read the idea details from IDEA_REPORT.md (hypothesis, experimental design, pilot code)
  2. Implement the full experiment:
    • Extend pilot code to full scale (multi-seed, full dataset, proper baselines)
    • Add proper evaluation metrics and logging (wandb if configured)
    • Write clean, reproducible experiment scripts
    • Follow existing codebase conventions
  3. Code review: Before deploying, do a self-review:
    • Are all hyperparameters configurable via argparse?
    • Is the random seed fixed and controllable?
    • Are results saved to JSON/CSV for later analysis?
    • Is there proper logging for debugging?
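A minimal script skeleton that satisfies the self-review checklist above might look like this (illustrative only; the flag names and the placeholder metric are assumptions, not the skill's actual code):

```python
import argparse
import json
import random

def build_parser():
    # All hyperparameters configurable via argparse (checklist item 1)
    p = argparse.ArgumentParser(description="Full-scale experiment runner")
    p.add_argument("--seed", type=int, default=0)
    p.add_argument("--lr", type=float, default=1e-3)
    p.add_argument("--epochs", type=int, default=10)
    p.add_argument("--out", default="results.json")
    return p

def run(argv=None):
    args = build_parser().parse_args(argv)
    random.seed(args.seed)  # fixed, controllable seed (checklist item 2)

    # Placeholder for the real train/eval loop; deterministic given the seed.
    metric = round(random.random(), 6)

    results = {"seed": args.seed, "lr": args.lr,
               "epochs": args.epochs, "metric": metric}
    with open(args.out, "w") as f:
        json.dump(results, f, indent=2)  # JSON output for later analysis (item 3)
    print(f"[exp] seed={args.seed} metric={metric}")  # basic logging (item 4)
    return results
```

Calling `run(["--seed", "1"])` twice yields identical results, which is the reproducibility property the checklist is after.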

Stage 3: Deploy Experiments (Workflow 2 — Part 1)


Deploy the full-scale experiments:
/run-experiment [experiment command]
What this does:
  • Check GPU availability on configured servers
  • Sync code to remote server
  • Launch experiments in screen sessions with proper CUDA_VISIBLE_DEVICES
  • Verify experiments started successfully
Monitor progress:
/monitor-experiment [server]
Wait for experiments to complete. Collect results.
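The screen-based launch described above can be sketched as a command builder (a hypothetical helper; the session naming and log path are assumptions about what /run-experiment might execute on the remote host):

```python
def build_launch_cmd(session, gpu_ids, experiment_cmd):
    """Compose a detached screen launch with CUDA_VISIBLE_DEVICES pinned.

    session: screen session name; gpu_ids: GPUs assigned to this run.
    Output and errors are teed to a per-session log for later monitoring.
    """
    gpus = ",".join(str(g) for g in gpu_ids)
    inner = f"CUDA_VISIBLE_DEVICES={gpus} {experiment_cmd} 2>&1 | tee {session}.log"
    return f"screen -dmS {session} bash -c '{inner}'"
```

`screen -dmS` starts the session detached, so the SSH connection can close while the experiment keeps running; /monitor-experiment can then reattach or tail the log.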

Stage 4: Auto Review Loop (Workflow 2 — Part 2)


Once initial results are in, start the autonomous improvement loop:
/auto-review-loop "$ARGUMENTS — [chosen idea title]"
What this does (up to 4 rounds):
  1. GPT-5.4 xhigh reviews the work (score, weaknesses, minimum fixes)
  2. Claude Code implements fixes (code changes, new experiments, reframing)
  3. Deploy fixes, collect new results
  4. Re-review → repeat until score ≥ 6/10 or 4 rounds reached
Output: AUTO_REVIEW.md with full review history and final assessment.
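The loop's stopping condition (score ≥ 6/10 or 4 rounds) can be sketched as follows (a control-flow illustration; `review_fn` and `fix_fn` stand in for the reviewer and fix-implementation steps, and are not real APIs):

```python
def auto_review_loop(review_fn, fix_fn, max_rounds=4, target_score=6):
    """Run review/fix rounds until the score reaches target or rounds run out.

    review_fn(round_no) -> (score, feedback); fix_fn(feedback) applies fixes.
    Returns the per-round (round_no, score) history for AUTO_REVIEW.md.
    """
    history = []
    for round_no in range(1, max_rounds + 1):
        score, feedback = review_fn(round_no)
        history.append((round_no, score))
        if score >= target_score:
            break  # good enough: stop early
        if round_no < max_rounds:
            fix_fn(feedback)  # implement fixes, redeploy, collect new results
    return history
```

The bounded range enforces the "do not loop forever" rule: if round 4 ends below the target, the loop exits and the remaining issues are reported.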

Stage 5: Final Summary


After the auto-review loop completes, write a final status report in Markdown:

Research Pipeline Report


Direction: $ARGUMENTS
Chosen Idea: [title]
Date: [start] → [end]
Pipeline: idea-discovery → implement → run-experiment → auto-review-loop

Journey Summary


  • Ideas generated: X → filtered to Y → piloted Z → chose 1
  • Implementation: [brief description of what was built]
  • Experiments: [number of GPU experiments, total compute time]
  • Review rounds: N/4, final score: X/10

Final Status


  • [ ] Ready for submission / [ ] Needs manual follow-up

Remaining TODOs (if any)


  • [items flagged by reviewer that weren't addressed]

Files Changed


  • [list of key files created/modified]

Key Rules


  • Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission — just do it silently.
  • Human checkpoint after Stage 1 is controlled by AUTO_PROCEED. When false, do not proceed without user confirmation. When true, auto-select the top idea after presenting results.
  • Stages 2-4 can run autonomously once the user confirms the idea. This is the "sleep and wake up to results" part.
  • If Stage 4 ends at round 4 without positive assessment, stop and report remaining issues. Do not loop forever.
  • Budget awareness: Track total GPU-hours across the pipeline. Flag if approaching user-defined limits.
  • Documentation: Every stage updates its own output file. The full history should be self-contained.
  • Fail gracefully: If any stage fails (no good ideas, experiments crash, review loop stuck), report clearly and suggest alternatives rather than forcing forward.

Typical Timeline


| Stage | Duration | Can sleep? |
|-------|----------|------------|
| 1. Idea Discovery | 30-60 min | Yes if AUTO_PROCEED=true |
| 2. Implementation | 15-60 min | Yes (autonomous after Gate 1) |
| 3. Deploy | 5 min + experiment time | Yes ✅ |
| 4. Auto Review | 1-4 hours (depends on experiments) | Yes ✅ |
Sweet spot: Run Stage 1-2 in the evening, launch Stage 3-4 before bed, wake up to a reviewed paper.