idea-discovery


Workflow 1: Idea Discovery Pipeline


Orchestrate a complete idea discovery workflow for: $ARGUMENTS

Overview


This skill chains sub-skills into a single automated pipeline:
/research-lit → /idea-creator → /novelty-check → /research-review → /research-refine-pipeline
  (survey)      (brainstorm)    (verify novel)    (critical feedback)  (refine method + plan experiments)
Each phase builds on the previous one's output. The final deliverables are a validated IDEA_REPORT.md with ranked ideas, plus a refined proposal (refine-logs/FINAL_PROPOSAL.md) and an experiment plan (refine-logs/EXPERIMENT_PLAN.md) for the top idea.

Constants


  • PILOT_MAX_HOURS = 2 — Skip any pilot experiment estimated to take > 2 hours per GPU. Flag as "needs manual pilot" in the report.
  • PILOT_TIMEOUT_HOURS = 3 — Hard timeout: kill any running pilot that exceeds 3 hours. Collect partial results if available.
  • MAX_PILOT_IDEAS = 3 — Run pilots for at most 3 top ideas in parallel. Additional ideas are validated on paper only.
  • MAX_TOTAL_GPU_HOURS = 8 — Total GPU budget across all pilots. If exceeded, skip remaining pilots and note in report.
  • AUTO_PROCEED = true — If the user doesn't respond at a checkpoint, automatically proceed with the best option after presenting results. Set to false to always wait for explicit user confirmation.
  • REVIEWER_MODEL = gpt-5.4 — Model used via Codex MCP. Must be an OpenAI model (e.g., gpt-5.4, o3, gpt-4o). Passed to sub-skills.
  • ARXIV_DOWNLOAD = false — When true, /research-lit downloads the top relevant arXiv PDFs during Phase 1. When false (default), it only fetches metadata. Passed through to /research-lit.
💡 These are defaults. Override by telling the skill, e.g., /idea-discovery "topic" — pilot budget: 4h per idea, 20h total or /idea-discovery "topic" — arxiv download: true.
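The pilot budget rules above can be sketched as a small Bash gate. This is a minimal sketch, not part of the skill itself; the idea names and GPU-hour estimates are hypothetical, and the constants mirror the defaults listed above:

```shell
# Pilot budget gate: enforce PILOT_MAX_HOURS, MAX_PILOT_IDEAS, MAX_TOTAL_GPU_HOURS.
PILOT_MAX_HOURS=2
MAX_PILOT_IDEAS=3
MAX_TOTAL_GPU_HOURS=8

selected=()
budget=0
# Hypothetical ranked ideas as name:estimated-GPU-hours pairs (best first).
for entry in "ideaA:1" "ideaB:3" "ideaC:2" "ideaD:4" "ideaE:2" "ideaF:1"; do
  name=${entry%%:*}
  est=${entry##*:}
  if (( ${#selected[@]} >= MAX_PILOT_IDEAS )); then
    echo "$name: paper-only validation (MAX_PILOT_IDEAS reached)"
  elif (( est > PILOT_MAX_HOURS )); then
    echo "$name: needs manual pilot (est ${est}h > ${PILOT_MAX_HOURS}h per GPU)"
  elif (( budget + est > MAX_TOTAL_GPU_HOURS )); then
    echo "$name: skipped (total ${MAX_TOTAL_GPU_HOURS}h GPU budget exhausted)"
  else
    selected+=("$name")
    (( budget += est ))
    echo "$name: run pilot (budget used: ${budget}h)"
  fi
done
```

With these sample estimates, ideaA, ideaC, and ideaE get pilots (5h of the 8h budget), ideaB and ideaD are flagged for manual pilots, and ideaF falls back to paper-only validation.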
Pipeline


Phase 1: Literature Survey


Invoke /research-lit to map the research landscape:
/research-lit "$ARGUMENTS"
What this does:
  • Search arXiv, Google Scholar, Semantic Scholar for recent papers
  • Build a landscape map: sub-directions, approaches, open problems
  • Identify structural gaps and recurring limitations
  • Output a literature summary (saved to working notes)
🚦 Checkpoint: Present the landscape summary to the user. Ask:
📚 Literature survey complete. Here's what I found:
- [key findings, gaps, open problems]

Does this match your understanding? Should I adjust the scope before generating ideas?
(If no response, I'll proceed with the top-ranked direction.)
  • User approves (or no response + AUTO_PROCEED=true) → proceed to Phase 2 with best direction.
  • User requests changes (e.g., "focus more on X", "ignore Y", "too broad") → refine the search with updated queries, re-run /research-lit with adjusted scope, and present again. Repeat until the user is satisfied.
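The checkpoint-and-proceed behavior above can be sketched as a small Bash function. This is an illustrative sketch only; the 30-second wait and the message strings are assumptions, not prescribed by the skill:

```shell
# Checkpoint sketch: wait briefly for a reply; if none arrives and
# AUTO_PROCEED is true, continue with the top-ranked direction.
checkpoint() {
  local auto_proceed=$1 reply
  if read -t 30 -r reply && [ -n "$reply" ]; then
    echo "user feedback: $reply"
  elif [ "$auto_proceed" = true ]; then
    echo "no response; proceeding with the top-ranked direction"
  else
    echo "waiting for explicit confirmation"
  fi
}

# Example: the user replies with a scope adjustment.
checkpoint true <<< "focus more on X"   # prints: user feedback: focus more on X
```

With no input on stdin, the same call falls through to the AUTO_PROCEED branch, matching the "(If no response, I'll proceed...)" behavior described above.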

Phase 2: Idea Generation + Filtering + Pilots


Invoke /idea-creator with the landscape context:
/idea-creator "$ARGUMENTS"
What this does:
  • Brainstorm 8-12 concrete ideas via GPT-5.4 xhigh
  • Filter by feasibility, compute cost, quick novelty search
  • Deep validate top ideas (full novelty check + devil's advocate)
  • Run parallel pilot experiments on available GPUs (top 2-3 ideas)
  • Rank by empirical signal
  • Output IDEA_REPORT.md
🚦 Checkpoint: Present the ranked ideas from IDEA_REPORT.md to the user. Ask:
💡 Generated X ideas, filtered to Y, piloted Z. Top results:

1. [Idea 1] — Pilot: POSITIVE (+X%)
2. [Idea 2] — Pilot: WEAK POSITIVE (+Y%)
3. [Idea 3] — Pilot: NEGATIVE, eliminated

Which ideas should I validate further? Or should I regenerate with different constraints?
(If no response, I'll proceed with the top-ranked ideas.)
  • User picks ideas (or no response + AUTO_PROCEED=true) → proceed to Phase 3 with top-ranked ideas.
  • User unhappy with all ideas → collect feedback ("what's missing?", "what direction do you prefer?"), update the prompt with user's constraints, and re-run Phase 2 (idea generation). Repeat until the user selects at least 1 idea.
  • User wants to adjust scope → go back to Phase 1 with refined direction.
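The hard pilot timeout from the constants can be enforced with coreutils `timeout`. The sketch below scales the limit down to one second and uses a stand-in pilot script so the kill-and-collect-partials behavior is visible; in the real pipeline the limit would be "${PILOT_TIMEOUT_HOURS}h":

```shell
# Stand-in pilot: emits a partial result, then runs "too long".
cat << 'EOF' > pilot_demo.sh
echo "partial result: step 1 done"
sleep 5   # stands in for a long-running pilot
echo "final result"
EOF

# Kill the pilot when it exceeds the (scaled-down) timeout; exit code 124
# means "killed by timeout". Partial output survives in the log.
timeout 1 bash pilot_demo.sh > pilot_demo.log 2>&1
if [ "$?" -eq 124 ]; then
  echo "pilot timed out; collecting partial results:"
  cat pilot_demo.log
fi
```

The partial line written before the kill is exactly the "collect partial results if available" case described for PILOT_TIMEOUT_HOURS.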

Phase 3: Deep Novelty Verification


For each top idea (positive pilot signal), run a thorough novelty check:
/novelty-check "[top idea 1 description]"
/novelty-check "[top idea 2 description]"
What this does:
  • Multi-source literature search (arXiv, Scholar, Semantic Scholar)
  • Cross-verify with GPT-5.4 xhigh
  • Check for concurrent work (last 3-6 months)
  • Identify closest existing work and differentiation points
Update IDEA_REPORT.md with deep novelty results. Eliminate any idea that turns out to be already published.

Phase 4: External Critical Review


For the surviving top idea(s), get brutal feedback:
/research-review "[top idea with hypothesis + pilot results]"
What this does:
  • GPT-5.4 xhigh acts as a senior reviewer (NeurIPS/ICML level)
  • Scores the idea, identifies weaknesses, suggests minimum viable improvements
  • Provides concrete feedback on experimental design
Update IDEA_REPORT.md with reviewer feedback and the revised plan.

Phase 4.5: Method Refinement + Experiment Planning


After review, refine the top idea into a concrete proposal and plan experiments:
/research-refine-pipeline "[top idea description + pilot results + reviewer feedback]"
What this does:
  • Freeze a Problem Anchor to prevent scope drift
  • Iteratively refine the method via GPT-5.4 review (up to 5 rounds, until score ≥ 9)
  • Generate a claim-driven experiment roadmap with ablations, budgets, and run order
  • Output: refine-logs/FINAL_PROPOSAL.md, refine-logs/EXPERIMENT_PLAN.md, refine-logs/EXPERIMENT_TRACKER.md
🚦 Checkpoint: Present the refined proposal summary:
🔬 Method refined and experiment plan ready:
- Problem anchor: [anchored problem]
- Method thesis: [one sentence]
- Dominant contribution: [what's new]
- Must-run experiments: [N blocks]
- First 3 runs to launch: [list]

Proceed to implementation? Or adjust the proposal?
  • User approves (or AUTO_PROCEED=true) → proceed to Final Report.
  • User requests changes → pass feedback to /research-refine for another round.
  • Lite mode: If the reviewer score is < 6 or the pilot was weak, run /research-refine only (skip /experiment-plan) and note remaining risks in the report.

Phase 5: Final Report


Finalize IDEA_REPORT.md with all accumulated information:

Idea Discovery Report


Direction: $ARGUMENTS
Date: [today]
Pipeline: research-lit → idea-creator → novelty-check → research-review → research-refine-pipeline

Executive Summary


[2-3 sentences: best idea, key evidence, recommended next step]

Literature Landscape


[from Phase 1]

Ranked Ideas


[from Phase 2, updated with Phase 3-4 results]

🏆 Idea 1: [title] — RECOMMENDED


  • Pilot: POSITIVE (+X%)
  • Novelty: CONFIRMED (closest: [paper], differentiation: [what's different])
  • Reviewer score: X/10
  • Next step: implement full experiment → /auto-review-loop

Idea 2: [title] — BACKUP


...

Eliminated Ideas


[ideas killed at each phase, with reasons]

Refined Proposal


  • Proposal: refine-logs/FINAL_PROPOSAL.md
  • Experiment plan: refine-logs/EXPERIMENT_PLAN.md
  • Tracker: refine-logs/EXPERIMENT_TRACKER.md

Next Steps


  • /run-experiment to deploy experiments from the plan
  • /auto-review-loop to iterate until submission-ready
  • Or invoke /research-pipeline for the complete end-to-end flow

Key Rules


  • Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission — just do it silently.
  • Don't skip phases. Each phase filters and validates — skipping leads to wasted effort later.
  • Checkpoint between phases. Briefly summarize what was found before moving on.
  • Kill ideas early. It's better to kill 10 bad ideas in Phase 3 than to implement one and fail.
  • Empirical signal > theoretical appeal. An idea with a positive pilot outranks a "sounds great" idea without evidence.
  • Document everything. Dead ends are just as valuable as successes for future reference.
  • Be honest with the reviewer. Include negative results and failed pilots in the review prompt.
  • Feishu notifications are optional. If ~/.claude/feishu.json exists, send checkpoint at each phase transition and pipeline_done at the final report. If absent/off, skip silently.
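The large-file fallback in the first rule can be sketched as chunked heredoc writes: the first heredoc creates the file, later ones append. The report content here is placeholder text, not the real template:

```shell
# Chunked write sketch: `>` creates the file, `>>` appends further chunks.
cat << 'EOF' > IDEA_REPORT.md
# Idea Discovery Report
Direction: example topic
EOF

cat << 'EOF' >> IDEA_REPORT.md

## Executive Summary
Best idea, key evidence, recommended next step.
EOF

wc -l < IDEA_REPORT.md   # prints: 5
```

Quoting the delimiter ('EOF') keeps the chunk literal, so $ARGUMENTS-style placeholders in the report body are written as-is rather than expanded by the shell.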

Composing with Workflow 2


After this pipeline produces a validated top idea:
/idea-discovery "direction"         ← you are here (Workflow 1, includes method refinement + experiment planning)
/run-experiment                     ← deploy experiments from the plan
/auto-review-loop "top idea"        ← Workflow 2: iterate until submission-ready

Or use /research-pipeline for the full end-to-end flow.