idea-tournament

Idea Tournament

A structured framework for generating diverse research ideas through tree-based expansion, then selecting the strongest candidate via Elo-rated pairwise tournaments across four quality dimensions.

When to Use This Skill

  • User has a research direction from research-ideation and needs concrete, ranked ideas
  • User wants to systematically compare multiple research ideas before committing
  • User asks about idea ranking, competitive selection, or proposal generation
  • User wants to explore variations of a research concept and select the best one
  • User mentions "idea tournament", "rank ideas", "compare approaches", "research proposal", "which idea is best"

From Direction to Proposal

The gap between "I have a research direction" and "I have a concrete proposal" is where most researchers stall. They either commit to their first idea (missing better alternatives) or endlessly brainstorm without converging (analysis paralysis).
The tournament solves both problems. Phase 1 forces breadth — you generate up to N_I=21 candidates (the paper's maximum) by systematically varying technique, domain, and formulation. Phase 2 forces convergence — pairwise Elo comparisons identify the strongest idea without requiring you to hold all candidates in your head simultaneously.
Before starting:
  1. Load prior knowledge from Ideation Memory (M_I):
    • Refer to the evo-memory skill → Read M_I at /memory/ideation-memory.md
    • Select the top-2 entries (k_I=2) most relevant to the user's current goal by comparing each entry's Summary and Retrieval Tags against the goal
    • Feasible directions become tree seeds — incorporate them as Level 1 branches in Phase 1
    • Unsuccessful directions (fundamental failures only) are used during pruning — prune any tree branch that matches a fundamental failure pattern
    • If M_I doesn't exist yet (first cycle), skip this step
  2. Retrieve relevant literature L for the user goal G. The paper defines idea tree search as IdeaTreeSearch(G, L, K_I) — literature is a formal input alongside the user goal and retrieved memory. Use web search or provided papers to ground idea generation in existing work.
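The memory-selection step above can be sketched as follows. The text says only that each entry's Summary and Retrieval Tags are compared against the goal; the `MemoryEntry` fields and the word-overlap score used here are assumptions made for illustration, not the paper's retrieval method.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    """One parsed M_I entry; these field names are hypothetical."""
    summary: str
    tags: set[str] = field(default_factory=set)


def relevance(entry: MemoryEntry, goal: str) -> int:
    """Placeholder metric: count goal words found in the entry's tags or summary."""
    goal_words = set(goal.lower().split())
    entry_words = {t.lower() for t in entry.tags} | set(entry.summary.lower().split())
    return len(goal_words & entry_words)


def select_top_k(entries: list[MemoryEntry], goal: str, k: int = 2) -> list[MemoryEntry]:
    """Return the k entries most relevant to the goal (k_I=2 in the text)."""
    return sorted(entries, key=lambda e: relevance(e, goal), reverse=True)[:k]
```

In practice the overlap score could be swapped for any relevance judgment; only the "keep the top 2" step comes from the text.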

Phase 1: Tree-Structured Idea Generation

Expand a seed idea into a tree of candidates by varying one axis per level. The tree structure ensures diversity — each branch explores a fundamentally different variation rather than minor tweaks of the same concept.

The Three Axes

| Level | Axis | What Varies | Example |
|---|---|---|---|
| 0 | Seed | Starting research direction | "Efficient LLM inference" |
| 1 | Technique | The core technical approach | Pruning, quantization, distillation |
| 2 | Domain | The application context | Edge devices, multi-modal, long-context |
| 3 | Formulation | The problem framing | Latency-constrained, memory-constrained, accuracy-preserving |

Expansion Process

Level 0: Seed (1 node). Start with the research direction from research-ideation. This is your root node.
Level 1: Technique variants (3 nodes). Generate 3 fundamentally different technical approaches to the seed direction. These should be distinct paradigms, not variations of the same technique. Check that each one is genuinely different from the others.
Level 2: Domain adaptations (6-9 nodes). For each Level 1 node, generate 2-3 domain-specific adaptations. How does this technique apply differently in different contexts? What domain-specific constraints create new challenges?
Level 3: Formulation variants (up to N_I=21 total leaves). For each Level 2 node, refine into 1-3 specific problem formulations. A formulation pins down the exact problem statement: the inputs, outputs, constraints, and evaluation criteria. The paper sets N_I=21 as the maximum number of candidate ideas. If the tree produces fewer than 15 leaves, expand Level 2 or Level 3 further. If it produces more than 21, prune to stay within the N_I limit.
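The leaf-count rule above can be checked mechanically. The following is a minimal sketch, assuming the tree is held in memory as nested nodes; the `IdeaNode` class and the function names are hypothetical, and only the 15 and 21 thresholds come from the text.

```python
from dataclasses import dataclass, field

N_I = 21          # maximum number of candidate ideas (from the text)
MIN_LEAVES = 15   # below this, the text says to expand Level 2 or Level 3


@dataclass
class IdeaNode:
    description: str
    level: int  # 0=seed, 1=technique, 2=domain, 3=formulation
    children: list["IdeaNode"] = field(default_factory=list)

    def add(self, description: str) -> "IdeaNode":
        """Attach a child one level deeper and return it."""
        child = IdeaNode(description, self.level + 1)
        self.children.append(child)
        return child


def leaves(node: IdeaNode) -> list[IdeaNode]:
    """Collect every childless node under `node`, whatever its level."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def leaf_budget_action(root: IdeaNode) -> str:
    """Return 'expand', 'prune', or 'ok' according to the leaf count."""
    n = len(leaves(root))
    if n < MIN_LEAVES:
        return "expand"
    if n > N_I:
        return "prune"
    return "ok"
```

For example, 3 techniques with 2 domains each and 2 formulations per domain give 12 leaves, so the check returns "expand"; adding a third formulation to every domain node brings the tree to 18 leaves and the check returns "ok".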

Per-Node Cycle: Propose → Review → Refine

For each new node:
  1. Propose: Write a 2-3 sentence description of the idea
  2. Review: Evaluate critically — Is this genuinely different from sibling nodes? Is it at least plausible?
  3. Refine: Sharpen the description based on the review. Remove vague language. Make the novelty claim specific.

Pruning

After expanding each level, prune clearly infeasible branches. A branch is "clearly infeasible" if:
  • It requires resources fundamentally unavailable (e.g., proprietary datasets you can't access)
  • It contradicts well-established theoretical results
  • It duplicates an existing, well-established solution with no meaningful variation
  • It appears in evo-memory's unsuccessful directions as a fundamental failure (not implementation failure)
Important: Pruning removes only the obviously unworkable. Do NOT prune ideas that are risky, unconventional, or outside your current expertise — these are exactly the ideas tournaments are designed to evaluate fairly.
Save the complete tree to /idea-tree.md.
See references/tree-search-protocol.md for detailed expansion rules and diversity metrics.

Phase 2: Elo Tournament Ranking

Rank all leaf candidates through pairwise comparisons on four quality dimensions. Swiss-system pairing keeps the number of comparisons manageable while still producing reliable rankings.

The Four Dimensions

| Dimension | Weight | What It Measures |
|---|---|---|
| Novelty | 25% | How different is this from existing published work? |
| Feasibility | 25% | Can this be implemented and validated within reasonable time and resources? |
| Relevance | 25% | Does this address an important, open problem in the field? |
| Clarity | 25% | Is the idea well-defined enough to start working on immediately? |
All dimensions are weighted equally. Researchers tend to overweight novelty and underweight feasibility — equal weights correct this bias.
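As a minimal sketch of the equal weighting, the composite score is a weighted sum in which each dimension contributes 25%. The function and variable names here are illustrative, and the 1-10 scale is the one used in the per-match process.

```python
DIMENSIONS = ("novelty", "feasibility", "relevance", "clarity")
WEIGHTS = {d: 0.25 for d in DIMENSIONS}  # equal weights, as in the table


def composite(scores: dict[str, float]) -> float:
    """Weighted sum of the four dimension scores (each on a 1-10 scale)."""
    missing = set(DIMENSIONS) - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(WEIGHTS[d] * scores[d] for d in DIMENSIONS)


# A highly novel but hard-to-validate idea versus a solid, well-scoped one.
bold = composite({"novelty": 9, "feasibility": 4, "relevance": 7, "clarity": 6})
solid = composite({"novelty": 6, "feasibility": 8, "relevance": 7, "clarity": 7})
print(bold, solid)  # 6.5 7.0
```

With equal weights, the second idea wins this match even though it trails by three points on novelty, which is the bias correction the paragraph above describes.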

Tournament Mechanics

Starting Elo: 1500 for all candidates.
K-factor: 32 (standard for new players; large enough that a few matches significantly move ratings).
Swiss-system pairing (4-5 rounds):
  1. Round 1: Random pairing
  2. Subsequent rounds: Pair candidates with similar current Elo ratings, avoiding rematches
  3. 4-5 rounds is sufficient for 15-21 candidates to produce stable rankings
Per-match process:
  1. Present both candidates side by side with their full descriptions
  2. Score each on all 4 dimensions (1-10 scale)
  3. Compute composite scores (average of 4 dimensions)
  4. Determine the match winner (higher composite score)
  5. Update Elo ratings using the standard formula (see elo-ranking-guide.md for the formula, worked example, and convergence criteria)
Save rankings to /idea-rankings.md.
See references/elo-ranking-guide.md for the detailed rubric and convergence criteria.

Phase 3: Direction Summarization

Synthesize the top-3 ranked ideas into a "promising directions" summary. This serves two purposes: it preserves optionality (the best idea may combine elements from multiple candidates), and it feeds into evo-memory for future cycles.

Summarization Process

For each of the top-3 ideas:
  1. Extract the core research direction (abstract away from specific implementation details)
  2. Identify the key insight that makes this direction promising
  3. Note the primary risk or uncertainty
  4. Check against evo-memory: has this direction been explored before? What was learned?
Then synthesize across the top-3:
  • What common threads run through the top-3? These may suggest an even stronger combined direction.
  • What dimensions do the top ideas excel in? Are there patterns (e.g., all top ideas score high on feasibility but moderate on novelty)?
  • What's missing? Are there important aspects of the original seed that none of the top ideas address?
Save to /direction-summary.md.
After completion, trigger evo-memory IDE (Idea Direction Evolution) to update Ideation Memory with the promising directions identified.

Phase 4: Proposal Extension

Extend the tournament winner (rank #1) into a full research proposal with enough detail to begin implementation.

Proposal Structure

The paper defines proposal P as containing 5 sections: background, related work, method, experiment plan, and expected results. We extend this with a 6th practical section (risks and mitigations).
1. Background: Define the exact problem — inputs, outputs, constraints, and why existing solutions are insufficient. Be specific: "LLM inference on edge devices with <2GB memory while maintaining >90% of full-model accuracy" is a background statement; "make LLMs faster" is not. Include context and motivation.
2. Related Work: Position the idea within the existing literature. What has been tried? What are the gaps? This should draw on the literature L retrieved during Phase 1 tree generation.
3. Proposed Method: Describe the technical approach at a level of detail sufficient for implementation. Include the key insight that differentiates this from prior work. State assumptions explicitly. List 3 testable contributions.
4. Experiment Plan: Datasets, baselines, metrics, and ablation design. This should align with what experiment-pipeline Stage 4 will need. Include both quantitative metrics and qualitative evaluation where appropriate.
5. Expected Results: Quantitative targets (e.g., "15-20% latency reduction with <2% accuracy loss") and qualitative expectations. Being specific about expected results forces you to think about whether the idea is realistic.
6. Risks and Mitigations (practical extension): Technical risks that could prevent success, and fallback plans for each. A proposal without risks is either dishonest or insufficiently analyzed. This section is not in the paper but is valuable for practical research planning.
Save to /research-proposal.md.
See references/proposal-extension.md for detailed guidance on each section.

Counterintuitive Tournament Rules

Prioritize these rules during idea generation and ranking:
  1. Quantity before quality: Generate many candidates before evaluating any. Premature filtering kills diversity. You can't know which idea is strongest until you've seen the alternatives — and the best ideas often emerge from unexpected branches of the tree.
  2. Vary one axis per level: Changing multiple axes simultaneously produces ideas that are different but not meaningfully diverse. Each level of the tree should explore ONE dimension of variation, so you understand exactly what makes each branch unique.
  3. Feasibility is not optional: Brilliant but infeasible ideas waste entire research cycles. A novel idea that can't be validated within your constraints is not a contribution — it's a thought experiment. Weight feasibility equally with novelty.
  4. The tournament finds surprises: Structured pairwise comparison often reveals that your initial favorite isn't actually the strongest idea. Trust the rankings over your gut feeling. If the results surprise you, that means the tournament is working — it's surfacing information you wouldn't have found through intuition alone.
  5. Pruning is not selecting: Prune only clearly infeasible branches. The tournament handles quality ranking. If you aggressively prune before the tournament, you're substituting your initial intuition for systematic comparison — exactly the bias the tournament is designed to correct.
  6. Top-3, not top-1: Summarizing the top 3 directions (not just the winner) preserves optionality. The best final approach may combine elements from multiple top candidates. Committing to exactly one idea too early discards valuable signal.

Handoff to Planning

When the tournament is complete and the proposal is written, pass these artifacts to paper-planning:
| Artifact | Source Phase | Used By |
|---|---|---|
| Research proposal (5+1 sections) | Phase 4 | Story design, experiment planning |
| Idea tree (full structure) | Phase 1 | Related work positioning |
| Elo rankings with scores | Phase 2 | Justification for chosen direction |
| Direction summary (top-3) | Phase 3 | Fallback directions if primary fails |
| Tournament scorecards | Phase 2 | Understanding idea strengths/weaknesses |
Also pass results to evo-memory for evolution updates:
  • Trigger IDE (Idea Direction Evolution) with the top-3 directions from Phase 3

Skill Integration

Before Starting (load memory)

Refer to the evo-memory skill to read Ideation Memory: → Read M_I at /memory/ideation-memory.md

After Phase 3 (update memory)

Refer to the evo-memory skill and trigger IDE: → Run IDE protocol with /direction-summary.md

After Phase 4 (handoff to planning)

Refer to the paper-planning skill: → Pass /research-proposal.md

Reference Navigation

| Topic | Reference File | When to Use |
|---|---|---|
| Tree expansion rules and diversity | tree-search-protocol.md | Generating diverse idea candidates |
| Elo formula, rubric, and pairing | elo-ranking-guide.md | Running the tournament |
| Proposal section guidance | proposal-extension.md | Writing the research proposal |
| Idea candidate template | idea-candidate-template.md | Describing individual ideas |
| Ranking scorecard template | ranking-scorecard-template.md | Recording pairwise comparisons |
| Direction summary template | direction-summary-template.md | Synthesizing top-3 directions |