idea-discovery-robot
Robotics Idea Discovery Pipeline
Orchestrate a robotics-specific idea discovery workflow for: $ARGUMENTS
Overview
This skill chains four sub-skills into a single automated pipeline:
/research-lit → /idea-creator (robotics framing) → /novelty-check → /research-review
(survey) (filter + pilot plan) (verify novel) (critical feedback)
But every phase must be grounded in robotics-specific constraints:
- Embodiment: arm, mobile manipulator, drone, humanoid, quadruped, autonomous car, etc.
- Task family: grasping, insertion, locomotion, navigation, manipulation, rearrangement, multi-step planning
- Observation + action interface: RGB/RGB-D/tactile/language; torque/velocity/waypoints/end-effector actions
- Simulator / benchmark availability: simulation-first by default
- Real robot constraints: hardware availability, reset cost, safety, operator time
- Evaluation quality: success rate plus failure cases, safety violations, intervention count, latency, sample efficiency
- Sim2real story: whether the idea can stay in sim, needs offline logs, or truly requires hardware
The goal is not to produce flashy demos. The goal is to produce ideas that are:
- benchmarkable
- falsifiable
- feasible with available robotics infrastructure
- interesting even if the answer is negative
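The evaluation bar above (success rate plus failure accounting) can be made concrete. A minimal sketch, assuming episode logs are plain dicts — the field names (`success`, `safety_violations`, `interventions`, `latency_s`) are illustrative assumptions, not a standard log schema:

```python
# Sketch: aggregate robotics eval metrics beyond raw success rate.
# Field names are assumed, not a standard; adapt to your harness.

def summarize_episodes(episodes: list[dict]) -> dict:
    """Summarize episode logs into the metrics this pipeline requires."""
    n = len(episodes)
    if n == 0:
        return {}
    return {
        "success_rate": sum(e["success"] for e in episodes) / n,
        "safety_violation_rate": sum(e["safety_violations"] > 0 for e in episodes) / n,
        "interventions_per_episode": sum(e["interventions"] for e in episodes) / n,
        "mean_latency_s": sum(e["latency_s"] for e in episodes) / n,
        # keep failures explicit so they can be inspected, not just counted
        "failure_episodes": [i for i, e in enumerate(episodes) if not e["success"]],
    }
```

Reporting `failure_episodes` alongside the rates is what makes a negative result inspectable rather than just a low number.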
Constants
- MAX_PILOT_IDEAS = 3 — Validate at most 3 top ideas deeply
- PILOT_MODE = sim-first — Prefer simulation or offline-log pilots before any hardware execution
- REAL_ROBOT_PILOTS = explicit approval only — Never assume physical robot access or approval
- AUTO_PROCEED = true — If user does not respond at checkpoints, proceed with the best sim-first option
- REVIEWER_MODEL = gpt-5.4 — External reviewer model via Codex MCP
- TARGET_VENUES = CoRL, RSS, ICRA, IROS, RA-L — Default novelty and reviewer framing
Override inline, e.g. /idea-discovery-robot "bimanual manipulation" — only sim ideas, no real robot, or /idea-discovery-robot "drone navigation" — focus on CoRL/RSS, 2 pilot ideas max
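The constants above can be read as plain configuration plus one checkpoint rule. A minimal sketch — the `checkpoint_decision` helper is an illustrative assumption, not part of the skill's API:

```python
# Sketch: pipeline constants as config, mirroring the defaults above.
CONFIG = {
    "MAX_PILOT_IDEAS": 3,
    "PILOT_MODE": "sim-first",
    "REAL_ROBOT_PILOTS": "explicit approval only",
    "AUTO_PROCEED": True,
    "REVIEWER_MODEL": "gpt-5.4",
    "TARGET_VENUES": ["CoRL", "RSS", "ICRA", "IROS", "RA-L"],
}

def checkpoint_decision(user_reply=None):
    """What happens at a checkpoint: follow the user if they answered,
    otherwise auto-proceed with the sim-first option (or wait)."""
    if user_reply is not None:
        return "follow-user"
    return "proceed-sim-first" if CONFIG["AUTO_PROCEED"] else "wait"
```

The point of the helper is that silence never stalls the pipeline: with AUTO_PROCEED on, a missing reply resolves to the sim-first default.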
Execution Rule
Follow the phases in order. Do not stop after a checkpoint unless:
- the user explicitly says to stop, or
- the user asks to change scope and re-run an earlier phase
If AUTO_PROCEED=true and the user does not respond, continue immediately to the next phase using the strongest sim-first, benchmark-grounded option.
Phase 0: Frame the Robotics Problem
Before generating ideas, extract or infer this Robotics Problem Frame from $ARGUMENTS and local project context:
- Embodiment
- Task family
- Environment type: tabletop, warehouse, home, outdoor, aerial, driving, legged terrain
- Observation modalities
- Action interface / controller abstraction
- Learning regime: RL, imitation, behavior cloning, world model, planning, VLA/VLM, classical robotics, hybrid
- Available assets: simulator, benchmark suite, teleop data, offline logs, existing codebase, real hardware
- Compute budget
- Safety constraints
- Desired contribution type: method, benchmark, diagnosis, systems, sim2real, data curation
If some fields are missing, make explicit assumptions and default to:
- simulation-first
- public benchmark preferred
- no real robot execution
Write this frame into working notes before moving on. Every later decision should reference it.
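The frame above is a fixed schema with sim-first defaults, so it can be sketched as a small dataclass. The class name and field names are illustrative assumptions that mirror the list above, not a defined interface:

```python
# Sketch: the Robotics Problem Frame with sim-first defaults applied
# when fields are missing. Names are assumptions mirroring the list above.
from dataclasses import dataclass, field

@dataclass
class RoboticsProblemFrame:
    embodiment: str = "unspecified"
    task_family: str = "unspecified"
    environment: str = "unspecified"
    observations: list = field(default_factory=lambda: ["RGB"])
    action_interface: str = "end-effector delta pose"
    learning_regime: str = "unspecified"
    assets: list = field(default_factory=list)
    compute_budget: str = "unspecified"
    safety_constraints: str = "unspecified"
    contribution_type: str = "method"
    # defaults for missing execution fields, per the rules above
    simulation_first: bool = True
    prefer_public_benchmark: bool = True
    real_robot_execution: bool = False
```

Writing the frame down as a single object makes "every later decision should reference it" checkable: each phase can read the same fields instead of re-inferring them.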
Phase 1: Robotics Literature Survey
Invoke:
/research-lit "$ARGUMENTS — focus venues: CoRL, RSS, ICRA, IROS, RA-L, TRO, Science Robotics"
Then reorganize the findings using a robotics lens instead of a generic ML lens.
Build a Robotics Landscape Matrix
For each relevant paper, classify:
| Axis | Examples |
|---|---|
| Embodiment | single-arm, mobile manipulator, humanoid, drone, quadruped |
| Task | pick-place, insertion, navigation, locomotion, long-horizon rearrangement |
| Learning setup | RL, BC, IL, offline RL, world model, planning, diffusion policy |
| Observation | RGB, RGB-D, proprioception, tactile, language |
| Action abstraction | torque, joint velocity, end-effector delta pose, waypoint planner |
| Eval regime | pure sim, sim+real, real-only, offline benchmark |
| Benchmark | ManiSkill, RLBench, Isaac Lab, Habitat, Meta-World, CALVIN, LIBERO, custom |
| Metrics | success rate, collision rate, intervention count, path length, latency, energy |
| Main bottleneck | sample inefficiency, brittleness, reset cost, perception drift, sim2real gap |
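Once papers are classified along these axes, the matrix is just a grouping problem: counting how often each bottleneck recurs surfaces under-addressed cells. A minimal sketch — the paper entries and dict keys are invented examples, not survey data:

```python
# Sketch: group surveyed papers along the matrix axes to surface
# recurring bottlenecks. Entries below are invented placeholders.
from collections import Counter

def bottleneck_counts(papers):
    """Count how often each main bottleneck recurs across the survey."""
    return Counter(p["bottleneck"] for p in papers)

papers = [
    {"embodiment": "single-arm", "benchmark": "RLBench", "bottleneck": "sim2real gap"},
    {"embodiment": "quadruped", "benchmark": "Isaac Lab", "bottleneck": "brittleness"},
    {"embodiment": "single-arm", "benchmark": "ManiSkill", "bottleneck": "sim2real gap"},
]
```

A bottleneck that recurs across embodiments and benchmarks but is never fixed is exactly the kind of gap the checkpoint below should report.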
Search Priorities
When refining the survey, prioritize:
- recent work from CoRL, RSS, ICRA, IROS, RA-L
- recent arXiv papers from the last 6-12 months
- benchmark papers and follow-up reproductions
- negative-result or diagnosis papers if they reveal system bottlenecks
What to Look For
Do not stop at "who got the best success rate." Explicitly identify:
- recurring failure modes papers do not fix
- benchmarks that are saturated or misleading
- places where embodiment changes invalidate prior conclusions
- methods that only work with privileged observations
- ideas whose reported gains come from reset engineering, reward shaping, or hidden infrastructure
- task families where evaluation quality is weak even if performance numbers look high
Checkpoint: Present the landscape to the user in robotics terms:
🤖 Robotics survey complete. I grouped the field by embodiment, benchmark, action interface, and sim2real setup.
Main gaps:
1. [...]
2. [...]
3. [...]
Should I generate ideas under this framing, or should I narrow to a specific robot / benchmark / modality?
- User approves (or no response + AUTO_PROCEED=true) → proceed to Phase 2 with the best robotics frame.
- User requests changes (e.g. narrower embodiment, different benchmark family, no sim2real, no hardware) → refine the robotics frame, re-run Phase 1, and present again.
Phase 2: Robotics-Specific Idea Generation and Filtering
Generate ideas only after the robotics frame is explicit.
Invoke the existing idea generator, but pass the Robotics Problem Frame and landscape matrix into the prompt so it does not produce generic ML ideas:
/idea-creator "$ARGUMENTS — robotics frame: [paste Robotics Problem Frame] — focus venues: CoRL, RSS, ICRA, IROS, RA-L — benchmark-specific ideas only — sim-first pilots — no real-robot execution without explicit approval — require failure metrics and baseline clarity"
Then rewrite and filter the output using the robotics-specific rules below.
Each candidate idea must include:
- One-sentence summary
- Target embodiment
- Target benchmark / simulator / dataset
- Core bottleneck being addressed
- Minimum sim-first pilot
- Mandatory metrics
- Expected failure mode if the idea does not work
- Whether the idea truly needs real hardware
Good Robotics Idea Patterns
Prefer ideas that:
- expose a real bottleneck in perception-action coupling
- improve robustness under embodiment or environment shift
- reduce operator time, reset cost, or demonstration cost
- strengthen sim2real transfer with measurable mechanisms
- improve recovery, retry behavior, or failure detection
- create a better benchmark, diagnostic, or evaluation protocol
- test an assumption the community repeats but rarely measures
Weak Robotics Idea Patterns
Downrank ideas that are mostly:
- "apply a foundation model / VLM / diffusion model to robot X" with no new bottleneck analysis
- demo-driven but not benchmarkable
- dependent on inaccessible hardware, custom sensors, or massive private datasets
- impossible to evaluate without a months-long infrastructure build
- only interesting if everything works perfectly
Filtering Rules
For each idea, reject or heavily downrank if:
- no concrete simulator or benchmark is available
- no credible baseline exists
- no measurable metric beyond "looks better"
- real robot execution is required but hardware access is unclear
- the setup depends on privileged observations that make the claim weak
- the expected contribution disappears if evaluation is made fair
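These rejection rules are mechanical enough to express as a boolean filter over candidate-idea records. A minimal sketch, where the dict keys (`benchmark`, `baselines`, `metrics`, `needs_real_robot`, `hardware_access`, `privileged_obs_only`) are assumed names, not a defined schema:

```python
# Sketch: the rejection rules above as a hard filter over idea dicts.
# Keys are illustrative assumptions; "heavily downrank" is collapsed
# into reject here for simplicity.

def passes_filter(idea):
    if not idea.get("benchmark"):        # no concrete simulator or benchmark
        return False
    if not idea.get("baselines"):        # no credible baseline
        return False
    if not idea.get("metrics"):          # nothing measurable beyond "looks better"
        return False
    if idea.get("needs_real_robot") and not idea.get("hardware_access"):
        return False                     # hardware required but access unclear
    if idea.get("privileged_obs_only"):  # claim collapses under fair evaluation
        return False
    return True
```

In practice a scored ranking (downrank rather than reject) may fit better for borderline cases; the hard filter just makes the minimum bar explicit.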
Checkpoint: Present the ranked robotics ideas before novelty checking:
💡 Robotics ideas generated. Top candidates:
1. [Idea 1] — Embodiment: [...] — Benchmark: [...] — Pilot: sim/offline — Risk: LOW/MEDIUM/HIGH
2. [Idea 2] — Embodiment: [...] — Benchmark: [...] — Pilot: sim/offline — Risk: LOW/MEDIUM/HIGH
3. [Idea 3] — requires hardware / weak benchmark / high risk
Should I carry the top sim-first ideas into novelty checking and external review?
(If no response, I'll continue with the strongest benchmark-grounded ideas.)
- User picks ideas (or no response + AUTO_PROCEED=true) → proceed to Phase 3 with the top sim-first ideas, then continue to Phase 4 and Phase 5.
- User wants different constraints → update the robotics frame and re-run Phase 2.
- User wants narrower scope → go back to Phase 1 with a tighter embodiment / task / benchmark focus.
Phase 3: Feasibility and Pilot Design
For the top ideas, design a minimal validation package.
If the repository already contains a usable simulator, benchmark harness, or offline dataset pipeline, you may validate the top 1-3 ideas there. If not, do not force execution. Produce a concrete pilot plan instead.
By default, pilots should be one of:
- simulation pilot
- offline log / dataset pilot
- analysis-only pilot using existing benchmark outputs
Only propose a real-robot pilot if the user explicitly wants that.
For each surviving idea, specify:
- Embodiment:
- Benchmark / simulator:
- Baselines:
- Pilot type: sim / offline / real
- Compute estimate:
- Human/operator time:
- Success metrics:
- Failure metrics:
- Safety concerns:
- What result would count as positive signal:
- What negative result would still be publishable:
Real Robot Rule
Never auto-proceed to physical robot testing. If an idea needs hardware:
- mark it as needs physical validation
- design the sim or offline precursor first
- ask for explicit user confirmation before any real-robot step
If no cheap sim/offline pilot exists, keep the idea in the report but label it high execution risk.
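The gate above can be sketched as one helper: hardware-dependent ideas fall back to a sim precursor unless the user has explicitly approved real-robot work. The function and key names are illustrative assumptions:

```python
# Sketch: the real-robot gate. `user_approved_hardware` stands in for
# an explicit user confirmation; key names are assumptions.

def pilot_type(idea, user_approved_hardware=False):
    """Pick the pilot type for an idea, never defaulting to hardware."""
    if idea.get("needs_real_robot"):
        if not user_approved_hardware:
            # hardware needed but not approved: run the sim/offline precursor
            return "sim-precursor (needs physical validation)"
        return "real"
    return "sim" if idea.get("simulator") else "offline"
```

Note the asymmetry: `"real"` is only reachable through an explicit approval flag, matching the REAL_ROBOT_PILOTS constant.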
After Phase 3, continue to Phase 4 even if you only produced a pilot plan rather than running a pilot. Lack of immediate execution is not a reason to stop the workflow.
Phase 4: Deep Novelty Verification
For each top idea, run:
/novelty-check "[idea description with embodiment + task family + benchmark + sensor stack + controller/policy class + sim2real angle + target venues: CoRL/RSS/ICRA/IROS/RA-L]"
Robotics novelty checks must include:
- embodiment
- task family
- benchmark / simulator
- sensor stack
- controller / policy type
- sim2real or safety angle if relevant
Be especially skeptical of ideas that are just:
- old method + new benchmark
- VLA/VLM + standard manipulation benchmark
- sim2real claim without new transfer mechanism
If the method is not novel but the finding or evaluation protocol is, say that explicitly.
Phase 5: External Robotics Review
Invoke:
/research-review "[top idea with robotics framing, embodiment, benchmark, baselines, pilot plan, evaluation metrics, and sim2real/hardware risks — review as CoRL/RSS/ICRA reviewer]"
Frame the reviewer as a senior CoRL / RSS / ICRA reviewer. Ask them to focus on:
- whether the contribution is really new for robotics, not just ML
- the minimum benchmark package needed for credibility
- whether the sim2real story is justified
- missing baselines or failure analyses
- whether the idea survives realistic infrastructure constraints
Update the report with the reviewer's minimum viable evidence package.
Phase 6: Final Report
Write or update IDEA_REPORT.md with a robotics-specific structure so it stays compatible with downstream workflows.
Robotics Idea Discovery Report
Direction: $ARGUMENTS
Date: [today]
Pipeline: research-lit → idea-creator (robotics framing) → novelty-check → research-review
Robotics Problem Frame
- Embodiment:
- Task family:
- Observation / action interface:
- Available assets:
- Constraints:
Landscape Matrix
[grouped by embodiment, benchmark, and bottleneck]
Ranked Ideas
Idea 1: [title] — RECOMMENDED
- Embodiment:
- Benchmark / simulator:
- Bottleneck addressed:
- Pilot type: sim / offline / real
- Positive signal:
- Novelty:
- Reviewer score:
- Hardware risk:
- Next step:
Eliminated Ideas
- [idea] — killed because benchmark unclear / hardware inaccessible / novelty weak / no fair evaluation
Evidence Package for the Top Idea
- Required baselines:
- Required metrics:
- Required failure cases:
- Whether real robot evidence is mandatory:
Next Steps
- Implement sim-first pilot
- Run /novelty-check on the final idea wording
- Only after approval: consider hardware validation
Key Rules
- Simulation first. Hardware is never the default.
- Benchmark specificity is mandatory. No benchmark, no serious idea.
- Evaluation must include failures. Success rate alone is not enough.
- Embodiment matters. Do not assume a result on one robot transfers to another.
- Avoid foundation-model theater. Novel terminology is not novelty.
- Infrastructure realism matters. Operator time, reset burden, and safety count as research constraints.
- If the contribution is mainly diagnostic or evaluative, say so. That can still be publishable.
Composing with Later Work
After this workflow identifies a strong robotics idea:
/idea-discovery-robot "direction" ← you are here
implement sim-first pilot
/run-experiment ← if infrastructure exists
/auto-review-loop "top robotics idea"
If no simulator or benchmark is available yet, stop at the report and ask the user to choose whether to build infrastructure or pivot to a more executable idea.