experiment-craft

Experiment Craft

A systematic approach to running, debugging, and iterating on research experiments. The critical skill is not running more experiments — it's understanding WHY experiments fail.

When to Use This Skill

  • User's experiment is not working or producing unexpected results
  • User needs help diagnosing why a method fails on certain data
  • User wants to organize their experiment process with structured logging
  • User asks about debugging research code or iterating on approaches
  • User mentions "experiment debugging", "why doesn't this work", "experiment log", "results are wrong"
This skill is typically loaded from within experiment-pipeline when a stage attempt fails. After debugging, return to the pipeline's stage-gate structure to continue. Can also be used standalone for any experiment debugging.

The Debugging Mindset

Finding WHY experiments fail is the most critical research skill. Not analyzing results leads to two failure modes:
  1. Slow progress: Running random experiments without understanding failure causes
  2. Wasted time: Abandoning good approaches because the activation tricks they need (see Counterintuitive Experiment Rules) were missed
The goal is not to run more experiments. The goal is to run the RIGHT experiments — ones that isolate causes and test specific hypotheses.

5-Step Diagnostic Flow

When an experiment fails or produces unexpected results, follow these five steps:

Step 1: Collect Failure Cases

Gather concrete examples of bad results. Look at the actual outputs, not just aggregate metrics. What specifically went wrong? Are the failures systematic or random?
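As a minimal sketch of this step, the snippet below collects the individual failing examples together with their actual outputs, then breaks the failure rate down by a metadata field to hint at whether failures are systematic or random. The names `collect_failures`, `failure_rate_by_group`, the `length` field, and the toy model are all hypothetical and only serve the illustration.

```python
from collections import Counter


def collect_failures(examples, predict, is_correct):
    """Return the examples the method gets wrong, with the actual output attached."""
    failures = []
    for ex in examples:
        output = predict(ex["input"])
        if not is_correct(output, ex["target"]):
            failures.append({**ex, "output": output})
    return failures


def failure_rate_by_group(examples, failures, key):
    """Failure rate per value of a metadata field; a skewed rate suggests a systematic cause."""
    totals = Counter(ex[key] for ex in examples)
    failed = Counter(ex[key] for ex in failures)
    return {group: failed[group] / totals[group] for group in totals}


# Toy data (hypothetical): the task is to double the input.
examples = [
    {"input": 2, "target": 4, "length": "short"},
    {"input": 3, "target": 6, "length": "short"},
    {"input": 50, "target": 100, "length": "long"},
    {"input": 60, "target": 120, "length": "long"},
]


def toy_predict(x):
    # Hypothetical stand-in model: doubles the input but saturates at 99.
    return min(2 * x, 99)


failures = collect_failures(examples, toy_predict, lambda out, tgt: out == tgt)
rates = failure_rate_by_group(examples, failures, "length")
print(rates)
```

In this toy case every failure falls in the "long" group, which points to a systematic cause rather than random noise; an aggregate accuracy of 50% alone would not have shown that.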

Step 2: Find a Working Version

You need a baseline that works. Two ways to find one:
  • Simplify the task: Reduce data complexity, relax the task setting, add more supervision, use easier inputs
  • Remove your changes: Start from your full method and remove your algorithmic improvements one by one until you are back at the baseline method
If you can't find any working version, simplify further until something works. There is always a simple enough version that works.

Step 3: Bridge the Gap

Starting from the working version, incrementally add complexity until it breaks:
  • Add ONE factor at a time (more complex data, one algorithmic change, one constraint)
  • Find the single factor that causes failure
  • The more atomic the identified cause, the more useful the diagnosis
This step isolates the cause. Without it, you're guessing.
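One way to picture this step is the loop below, a sketch that assumes each experiment can be reduced to a config dictionary and a pass/fail check. `find_breaking_factor`, `run_experiment`, and the config keys are hypothetical names, not part of any pipeline described here.

```python
def find_breaking_factor(base_config, factors, run_experiment):
    """Add factors one at a time to a working config; return the first one that breaks it.

    `factors` is an ordered list of (key, new_value) pairs.
    `run_experiment(config)` returns True when the run works.
    """
    config = dict(base_config)
    if not run_experiment(config):
        raise ValueError("base_config must be a working version (see Step 2)")
    for key, value in factors:
        candidate = {**config, key: value}
        if not run_experiment(candidate):
            return key, candidate
        config = candidate  # the factor is harmless so far; keep it and continue
    return None, config


# Toy setup (hypothetical): the run only works with small batches.
working = {"data": "toy", "augment": False, "batch_size": 32}
factors = [("data", "full"), ("augment", True), ("batch_size", 256)]


def toy_run(config):
    return config["batch_size"] <= 64


culprit, breaking_config = find_breaking_factor(working, factors, toy_run)
print(culprit, breaking_config)
```

Because factors accumulate, the factor returned here may only fail in combination with the ones added before it. Re-running it alone on top of the working version is a cheap way to make the cause more atomic.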

Step 4: Hypothesize and Verify

Based on the isolated cause from Step 3:
  1. List possible explanations for why this factor causes failure
  2. Rank by likelihood (based on your understanding and literature)
  3. Design targeted experiments to verify or eliminate each hypothesis
  4. Confirm the actual cause experimentally — don't rely on intuition alone
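The four sub-steps above can be sketched as a small data structure: each hypothesis carries a subjective likelihood and a targeted test, and the tests run in ranked order. Everything here is illustrative; the `Hypothesis` class, the example descriptions, and the lambda "tests" (which stand in for real diagnostic experiments) are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hypothesis:
    description: str
    likelihood: float          # subjective estimate from your understanding and the literature
    test: Callable[[], bool]   # targeted experiment; True means the evidence supports it
    status: str = "untested"


def verify_in_order(hypotheses):
    """Run each hypothesis test, most likely first, and record the outcome."""
    ranked = sorted(hypotheses, key=lambda h: h.likelihood, reverse=True)
    for h in ranked:
        h.status = "confirmed" if h.test() else "eliminated"
    return ranked


# Placeholder hypotheses; each lambda stands in for a real diagnostic experiment.
hypotheses = [
    Hypothesis("Numerical overflow in mixed precision", 0.1, lambda: False),
    Hypothesis("Learning rate too high for the new loss term", 0.6, lambda: True),
    Hypothesis("Labels misaligned after augmentation", 0.3, lambda: False),
]

ranked = verify_in_order(hypotheses)
confirmed = [h.description for h in ranked if h.status == "confirmed"]
print(confirmed)
```

Recording the eliminated hypotheses alongside the confirmed one keeps the reasoning auditable when the result goes into the experiment log.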

Step 5: Propose and Implement a Fix

Based on the confirmed cause:
  • Search for techniques that address this specific cause (use your literature tree from the research-ideation skill)
  • Design a fix that targets the confirmed cause, not the surface symptom
  • Verify the fix works on the original failure cases
  • Check that the fix doesn't break previously working cases
See references/debugging-methodology.md for detailed branching logic and a cause taxonomy.
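The last two checks in the list above can be expressed as one small function. This is a sketch under the assumption that cases are dictionaries with `input` and `target` fields; `check_fix` and the toy predictors are hypothetical.

```python
def check_fix(predict, failure_cases, working_cases, is_correct):
    """Report which original failures remain and which previously working cases broke."""

    def failing(cases):
        return [c for c in cases if not is_correct(predict(c["input"]), c["target"])]

    return {
        "still_failing": failing(failure_cases),
        "regressions": failing(working_cases),
    }


# Toy cases (hypothetical): the task is to double the input.
failure_cases = [{"input": 50, "target": 100}, {"input": 60, "target": 120}]
working_cases = [{"input": 2, "target": 4}, {"input": 3, "target": 6}]


def exact(out, tgt):
    return out == tgt


# A partial fix raises the saturation limit; a full fix removes it.
partial_report = check_fix(lambda x: min(2 * x, 110), failure_cases, working_cases, exact)
full_report = check_fix(lambda x: 2 * x, failure_cases, working_cases, exact)
print(partial_report)
print(full_report)
```

A fix counts as verified only when both lists are empty, as in `full_report`; the partial fix still fails on one of the original cases.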

Counterintuitive Experiment Rules

Prioritize these rules during experimental work:
  1. Change only one variable at a time: If you change two things and it works, you don't know which one fixed it. If you change two things and it doesn't work, you don't know which one is wrong. Single-variable changes are slower per experiment but faster overall.
  2. Fast iteration requires effective experiments, not more experiments: Blind experimentation makes things worse. One well-designed diagnostic experiment is worth ten random trials.
  3. Some great techniques don't work alone: They need specific activation tricks, such as learning rate schedules, initialization schemes, or data preprocessing steps. Don't discard a technique after one failed attempt; first check whether the papers that use it rely on tricks their main text does not mention.
  4. Check related papers for their tricks: Papers solving similar technical challenges often have critical implementation details buried in supplementary material or code. These tricks can make the difference between a technique working or failing.
  5. "Once you've ruled out the impossible, whatever remains must be true": Systematic elimination beats intuition. When debugging, explicitly list ALL possible causes, then eliminate them one by one with targeted experiments.
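Rule 1 can be enforced mechanically. The sketch below generates experiment configs that each differ from a baseline in exactly one key, assuming experiments are described by flat dictionaries; the function names and hyperparameter values are hypothetical.

```python
def single_variable_variants(baseline, alternatives):
    """Build configs that each change exactly one key of the baseline."""
    variants = {}
    for key, values in alternatives.items():
        for value in values:
            if value == baseline[key]:
                continue  # identical to the baseline, nothing to test
            variants[f"{key}={value}"] = {**baseline, key: value}
    return variants


def changed_keys(baseline, config):
    """List the keys whose values differ from the baseline."""
    return [k for k in baseline if baseline[k] != config[k]]


# Toy baseline and alternatives (hypothetical values).
baseline = {"lr": 0.001, "init": "xavier", "warmup": 0}
alternatives = {"lr": [0.001, 0.0003], "warmup": [500]}

variants = single_variable_variants(baseline, alternatives)
for name, config in variants.items():
    print(name, changed_keys(baseline, config))
```

If a later result is surprising, `changed_keys` gives a quick check that the run really differed from the baseline in one variable only.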

Experiment Logging

Every experiment should be logged with five sections. Use the template at assets/experiment-log-template.md.
| Section | What to Record |
| --- | --- |
| Purpose | Why you're running this experiment; what you expect to learn |
| Setting | Data, algorithm changes, hyperparameters: everything needed to reproduce |
| Results | Quantitative metrics + qualitative observations + specific good/failure cases |
| Analysis | Do results match expectations? If not, hypothesized causes ranked by likelihood |
| Next Steps | What to do based on the analysis; YOU are the project leader |
The "Next Steps" section is the most important. Don't wait for someone to tell you what to do next. Analyze your results and propose the next experiment yourself. This is what distinguishes a researcher from a technician.
Cross-cycle learning: If using experiment-pipeline, your experiment logs feed into evo-memory's ESE (Experiment Strategy Evolution) mechanism. Tag reusable strategies with [Reusable] so ESE can extract them for future cycles.
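For illustration, a log entry with the five sections could be represented as below. This is only a sketch: the layout is not the one in assets/experiment-log-template.md, the `ExperimentLog` class and sample text are hypothetical, and the line filter is a plain-text assumption about `[Reusable]` tags, not how ESE actually reads them.

```python
from dataclasses import dataclass, fields


@dataclass
class ExperimentLog:
    purpose: str
    setting: str
    results: str
    analysis: str
    next_steps: str

    def to_text(self):
        """Render the five sections, refusing to emit an entry with an empty section."""
        lines = []
        for f in fields(self):
            value = getattr(self, f.name).strip()
            if not value:
                raise ValueError(f"section '{f.name}' is empty")
            lines.append(f"{f.name.replace('_', ' ').title()}: {value}")
        return "\n".join(lines)


def reusable_strategies(log_text):
    """Return the log lines tagged [Reusable] (assumed plain-text convention)."""
    return [line for line in log_text.splitlines() if "[Reusable]" in line]


# Placeholder entry; the content is invented for the example.
entry = ExperimentLog(
    purpose="Check whether large batches cause the divergence",
    setting="Same data and model as the working version; batch_size 32 vs 256",
    results="batch_size 32 converges; batch_size 256 diverges",
    analysis="Matches the hypothesis that the learning rate is too high for large batches",
    next_steps="[Reusable] Re-tune the learning rate whenever the batch size changes",
)

text = entry.to_text()
print(text)
print(reusable_strategies(text))
```

Rejecting empty sections is a deliberate choice in this sketch: it makes it impossible to file an entry without stating the next steps.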

Return to experiment-pipeline

After completing the 5-step diagnostic flow, return to experiment-pipeline with:
  • Confirmed cause of failure (from Step 4)
  • Proposed fix and its verification status (from Step 5)
  • Updated experiment log entry

Handoff to Paper Writing

When experiments succeed and you have a complete set of results, pass these artifacts to paper-writing:

| Artifact | Source | Used By |
| --- | --- | --- |
| Final experiment results (tables and figures) | Experiment logs | Experiments section |
| Ablation study results | Diagnostic experiments | Ablation tables |
| Failure case analysis | Step 1 + Step 3 | Limitations discussion |
| Key implementation details and tricks | Steps 3-5 | Method section / Supplementary |
| Baseline comparison results | Step 2 | Comparison tables |

Reference Navigation

| Topic | Reference File | When to Use |
| --- | --- | --- |
| Debugging methodology | debugging-methodology.md | Diagnosing why experiments fail |
| Experiment log template | experiment-log-template.md | Recording experiment details |