experiment-design-planner
Experiment Design Planner
Purpose
Help the user plan experiments that can actually answer a research question. This skill is based on the handbook's experiment design principles: start simple, begin with baselines, change one variable at a time, state hypotheses before running, and document negative results.
The output is an experiment plan that can be run, logged, and later explained in a paper or advisor meeting.
When to Use
- User wants to run a new experiment or ablation
- User has unclear or noisy experimental results
- User is preparing baselines and metrics
- User is changing several model or data choices at once
- User needs a reproducible experiment plan before using cluster time
Workflow
Stage 1: State the Research Question
Ask:
- What claim should this experiment support or refute?
- What is the smallest result that would be meaningful?
- What existing baseline should it beat, match, or clarify?
If the question is vague, rewrite it into a testable form.
Stage 2: Write Hypotheses Before Running
Capture:
- Primary hypothesis
- Alternative explanations
- Expected direction of change
- Expected metric movement
- Failure mode that would falsify the hypothesis
Do not let the user run first and rationalize later.
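One way to enforce this is to record the hypothesis as structured data and refuse to start a run while any field is empty. The sketch below is illustrative only: the class name, field names, and the `ready_to_run` helper are assumptions, not part of the skill.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Pre-run hypothesis record; all field names are hypothetical."""

    primary: str
    alternatives: tuple[str, ...]
    expected_direction: str       # e.g. "increase" or "decrease"
    expected_metric_change: str   # e.g. a range the user commits to in advance
    falsified_if: str             # failure mode that would falsify the hypothesis

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty or blank."""
        missing = []
        for name, value in vars(self).items():
            if isinstance(value, str):
                if not value.strip():
                    missing.append(name)
            elif not value:
                missing.append(name)
        return missing


def ready_to_run(h: Hypothesis) -> bool:
    """A run may start only when every hypothesis field is filled in."""
    return not h.missing_fields()
```

Calling `missing_fields()` before launching gives the user a concrete list of what still has to be written down.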
Stage 3: Define the Experimental Unit
Specify:
- Dataset and split
- Preprocessing
- Model or method
- Baselines
- Metrics
- Random seeds
- Compute budget
- Number of repeats
- Hardware/environment
If the user lacks a baseline, start there.
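The items above can be captured in a single frozen record so that every run is derived from the same fixed setup. This is a minimal sketch; the `ExperimentUnit` class, its field names, and the idea of expanding it into one run per (method, seed) pair are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ExperimentUnit:
    """Fixed description of one experiment; field names are hypothetical."""

    dataset: str
    split: str
    preprocessing: str
    methods: tuple[str, ...]   # list the baseline first, then the method under test
    metrics: tuple[str, ...]
    seeds: tuple[int, ...]     # number of repeats == len(seeds)
    compute_budget: str
    environment: str

    def runs(self) -> list[dict]:
        """Expand into one run per (method, seed); everything else stays fixed."""
        return [
            {"method": m, "seed": s, "dataset": self.dataset, "split": self.split}
            for m, s in product(self.methods, self.seeds)
        ]
```

Putting the baseline first in `methods` means the baseline runs appear at the top of the expanded list, which matches the "start there" rule.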
Stage 4: One-Variable Discipline
List variables:
- Independent variable: what changes
- Controlled variables: what must stay fixed
- Nuisance variables: what could confound results
If the plan changes multiple variables, split it into an ordered ablation table.
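As a sketch of how such a split might look, the helper below takes a baseline config and a set of proposed changes and produces one run per change, each differing from the baseline in exactly one key. It uses a one-factor-at-a-time design (not a cumulative one), and the function names and row layout are assumptions.

```python
from __future__ import annotations


def ablation_table(baseline: dict, changes: dict) -> list[dict]:
    """Return the baseline plus one run per change, in the order given.

    Each ablation row differs from the baseline in exactly one variable.
    """
    rows = [{"run": "baseline", "change": "none", "config": dict(baseline)}]
    for i, (key, value) in enumerate(changes.items(), start=1):
        if key not in baseline:
            raise KeyError(f"{key!r} is not a baseline variable")
        if value == baseline[key]:
            raise ValueError(f"{key!r} is unchanged from the baseline")
        config = dict(baseline)
        config[key] = value
        rows.append(
            {
                "run": f"ablation-{i}",
                "change": f"{key}: {baseline[key]} -> {value}",
                "config": config,
            }
        )
    return rows


def to_markdown(rows: list[dict]) -> str:
    """Render rows as a markdown run table with empty result columns."""
    lines = [
        "| Run | Change | Expected result | Status | Notes |",
        "|---|---|---|---|---|",
    ]
    for row in rows:
        lines.append(f"| {row['run']} | {row['change']} | | planned | |")
    return "\n".join(lines)
```

The order of `changes` sets the order of the table, so the user can put the variable they care about most first.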
Stage 5: Logging and Negative Results
Define the required log fields:
- Config path or commit hash
- Dataset version
- Seed
- Hyperparameters
- Metrics
- Runtime
- Failure notes
- Plot/table output path
Make negative results first-class. A failed run should still answer what was tried and what was learned.
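A simple way to keep failed runs on the same footing as successful ones is to write every run to the same append-only log and reject records that omit a required field. The sketch below assumes a JSON Lines file; the field names and the `append_run` helper are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path

# Hypothetical field names mirroring the list above.
REQUIRED_FIELDS = (
    "config",            # config path or commit hash
    "dataset_version",
    "seed",
    "hyperparameters",
    "metrics",
    "runtime_s",
    "failure_notes",     # empty string for a clean run, never omitted
    "output_path",       # plot/table output path
)


def append_run(log_path: Path, record: dict) -> None:
    """Append one run to a JSON Lines log, rejecting incomplete records."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing log fields: {missing}")
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

Because `failure_notes` is required, a crashed or diverged run is logged with the same schema as a successful one, with its metrics left empty and the notes filled in.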
Stage 6: Produce the Artifact
Save the plan to `~/phd-log/experiments/YYYY-MM-DD-[short-name].md` using this template:

Experiment Plan — [Short Name]

Research question
[Question]

Hypotheses
- Primary:
- Alternatives:
- Falsification condition:

Setup
- Dataset:
- Split:
- Baseline:
- Method:
- Metrics:
- Seeds / repeats:
- Compute:
- Environment:

Variables

| Type | Variable | Value(s) | Notes |
|---|---|---|---|
| Independent | | | |
| Controlled | | | |
| Nuisance | | | |

Run table

| Run | Change | Expected result | Status | Notes |
|---|---|---|---|---|

Logging checklist
- Config saved
- Code commit recorded
- Dataset version recorded
- Seed recorded
- Metrics saved
- Failure notes saved
- Plot/table path saved

Decision rule
If [condition], then [next step]. If not, [fallback].
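For reference, the file name pattern `YYYY-MM-DD-[short-name].md` can be built as in the sketch below. The `plan_path` helper and its slug rule (lowercase, non-alphanumerics collapsed to hyphens) are assumptions; the skill itself only specifies the directory and the pattern.

```python
from __future__ import annotations

import re
from datetime import date
from pathlib import Path


def plan_path(
    short_name: str,
    day: date,
    root: Path = Path("~/phd-log/experiments"),
) -> Path:
    """Build the path for an experiment plan file.

    The slug rule here is an illustrative choice, not something the skill mandates.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", short_name.lower()).strip("-")
    if not slug:
        raise ValueError("short_name must contain at least one letter or digit")
    return root.expanduser() / f"{day.isoformat()}-{slug}.md"
```

Passing the date explicitly, rather than calling `date.today()` inside the helper, keeps the result reproducible when the plan is regenerated later.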
Tone
Be concrete and conservative. The best experiment plan is usually smaller than the user's first instinct.
What Not to Do
- Do not accept experiments without a hypothesis.
- Do not let the user report results without a baseline to compare against.
- Do not bury changed variables in prose.
- Do not treat negative results as wasted time.