Experiment Design Planner
Purpose
Help the user plan experiments that can actually answer a research question. This skill is based on the handbook's experiment design principles: start simple, begin with baselines, change one variable at a time, state hypotheses before running, and document negative results.
The output is an experiment plan that can be run, logged, and later explained in a paper or advisor meeting.
When to Use
- User wants to run a new experiment or ablation
- User has unclear or noisy experimental results
- User is preparing baselines and metrics
- User is changing several model or data choices at once
- User needs a reproducible experiment plan before using cluster time
Workflow
Stage 1: State the Research Question
Ask:
- What claim should this experiment support or refute?
- What is the smallest result that would be meaningful?
- What existing baseline should it beat, match, or clarify?
If the question is vague, rewrite it into a testable form.
Stage 2: Write Hypotheses Before Running
Capture:
- Primary hypothesis
- Alternative explanations
- Expected direction of change
- Expected size of the metric movement
- Failure mode that would falsify the hypothesis
Do not let the user run first and rationalize later.
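One way to make the hypothesis hard to rationalize after the fact is to write it down as an immutable record before any run starts. The sketch below is only an illustration: the `Hypothesis` class, its field names, and the three-way `verdict` rule are assumptions made for this example, not part of the handbook.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)  # frozen: the record cannot be edited after it is written
class Hypothesis:
    primary: str
    alternatives: Tuple[str, ...]
    direction: str      # "increase" or "decrease" (hypothetical convention)
    min_effect: float   # smallest metric change that would be meaningful

    def __post_init__(self):
        if self.direction not in ("increase", "decrease"):
            raise ValueError("direction must be 'increase' or 'decrease'")
        if self.min_effect <= 0:
            raise ValueError("min_effect must be positive")


def verdict(h: Hypothesis, baseline_metric: float, observed_metric: float) -> str:
    """Compare an observed metric with the baseline under the pre-stated hypothesis."""
    delta = observed_metric - baseline_metric
    # Flip the sign so that a positive value always means "moved as predicted".
    signed = delta if h.direction == "increase" else -delta
    if signed >= h.min_effect:
        return "supported"
    if signed <= 0:
        return "falsified"
    return "inconclusive"
```

The "inconclusive" band between zero and `min_effect` is a design choice for this sketch; a real plan would normally also account for variation across seeds before calling a result supported.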
Stage 3: Define the Experimental Unit
Specify:
- Dataset and split
- Preprocessing
- Model or method
- Baselines
- Metrics
- Random seeds
- Compute budget
- Number of repeats
- Hardware/environment
If the user lacks a baseline, start there.
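The checklist above can be turned into a small completeness check that reports what is still unspecified, with baselines listed first. This is a minimal sketch: the dictionary keys and the `missing_fields` helper are hypothetical names chosen for illustration.

```python
# Field names are assumptions for this example; "baselines" comes first so
# that a missing baseline is the first gap reported.
REQUIRED_FIELDS = (
    "baselines",
    "dataset",
    "split",
    "preprocessing",
    "method",
    "metrics",
    "seeds",
    "compute_budget",
    "repeats",
    "environment",
)


def missing_fields(spec: dict) -> list:
    """Return the required fields that are absent or empty, in checklist order."""
    return [name for name in REQUIRED_FIELDS if not spec.get(name)]


if __name__ == "__main__":
    draft = {"dataset": "my-dataset", "split": "train/val/test", "seeds": [0, 1, 2]}
    for name in missing_fields(draft):
        print("still to specify:", name)
```

Empty values (an empty list, an empty string, or zero repeats) count as missing here, which is a deliberate simplification.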
Stage 4: One-Variable Discipline
List variables:
- Independent variable: what changes
- Controlled variables: what must stay fixed
- Nuisance variables: what could confound results
If the plan changes multiple variables, split it into an ordered ablation table.
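Splitting a multi-variable plan into an ordered ablation table can be sketched as follows: start from the baseline configuration and emit one run per proposed change, so every run differs from the baseline in exactly one key. The function name `one_variable_runs` and the row layout are assumptions for this example.

```python
def one_variable_runs(baseline: dict, changes: dict) -> list:
    """Build an ordered run list where each run changes exactly one variable.

    `baseline` maps variable names to their fixed values; `changes` maps a
    subset of those names to the new value to try.
    """
    runs = [{"run": "baseline", "changed": None, "config": dict(baseline)}]
    for key, new_value in changes.items():
        if key not in baseline:
            raise KeyError(f"{key!r} is not a variable in the baseline config")
        config = dict(baseline)   # copy so the baseline itself is never mutated
        config[key] = new_value   # the single independent variable for this run
        runs.append({"run": f"change-{key}", "changed": key, "config": config})
    return runs


if __name__ == "__main__":
    baseline = {"lr": 0.1, "depth": 4, "augment": False}
    for row in one_variable_runs(baseline, {"lr": 0.01, "depth": 8}):
        print(row["run"], row["config"])
```

Rows come out in the order the changes were listed, so the order of `changes` doubles as the planned run order.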
Stage 5: Logging and Negative Results
Define the required log fields:
- Config path or commit hash
- Dataset version
- Seed
- Hyperparameters
- Metrics
- Runtime
- Failure notes
- Plot/table output path
Make negative results first-class. A failed run should still record what was tried and what was learned.
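A simple way to keep failed runs on the record is to append every run, successful or not, to the same log with the same fields. The sketch below assumes a JSON Lines file and hypothetical field names (`config`, `runtime_s`, `output_path`, and so on) that mirror the list above; none of these names come from the handbook.

```python
import json

# Hypothetical field names mirroring the required log fields listed above.
LOG_FIELDS = (
    "config",            # config path or commit hash
    "dataset_version",
    "seed",
    "hyperparameters",
    "metrics",           # may be None when the run failed
    "runtime_s",
    "failure_notes",     # may be None when the run succeeded
    "output_path",       # plot/table output path
)


def log_run(log_file: str, **record) -> None:
    """Append one run record as a JSON line; refuse records with missing fields."""
    missing = [name for name in LOG_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing log fields: {missing}")
    with open(log_file, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

Because every field must be present even when its value is `None`, a diverged run gets the same kind of entry as a successful one, with the explanation in `failure_notes`.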
Stage 6: Produce the Artifact
Save the plan to ~/phd-log/experiments/YYYY-MM-DD-[short-name].md, using the following Markdown template:
# Experiment Plan — [Short Name]
## Research question
[Question]
## Hypotheses
- Primary:
- Alternatives:
- Falsification condition:
## Setup
- Dataset:
- Split:
- Baseline:
- Method:
- Metrics:
- Seeds / repeats:
- Compute:
- Environment:
## Variables
| Role | Variable | Value(s) | Notes |
|---|---|---|---|
| Independent | | | |
| Controlled | | | |
| Nuisance | | | |
## Run table
| Run | Changed variable | Seed | Metric | Notes |
|---|---|---|---|---|
## Logging checklist
- [ ] Config saved
- [ ] Code commit recorded
- [ ] Dataset version recorded
- [ ] Seed recorded
- [ ] Metrics saved
- [ ] Failure notes saved
- [ ] Plot/table path saved
## Decision rule
If [condition], then [next step]. If not, [fallback].
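The save location from Stage 6, ~/phd-log/experiments/YYYY-MM-DD-[short-name].md, can be built programmatically so the date and name stay consistent. This is a minimal sketch: the `plan_path` helper and its rule for turning a short name into a lowercase, hyphen-separated slug are assumptions for this example.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional


def plan_path(short_name: str, day: Optional[date] = None,
              root: str = "~/phd-log/experiments") -> Path:
    """Return the file path for an experiment plan, dated with `day` (default: today)."""
    day = day or date.today()
    # Assumed slug rule: lowercase, runs of other characters become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", short_name.lower()).strip("-")
    if not slug:
        raise ValueError("short_name must contain at least one letter or digit")
    return Path(root).expanduser() / f"{day.isoformat()}-{slug}.md"


if __name__ == "__main__":
    path = plan_path("LR warmup ablation")
    print(path)
    # To create the directory before writing the plan:
    # path.parent.mkdir(parents=True, exist_ok=True)
```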
Tone
Be concrete and conservative. The best experiment plan is usually smaller than the user's first instinct.
What Not to Do
- Do not accept experiments without a hypothesis.
- Do not let the user report results without a baseline to compare against.
- Do not bury changed variables in prose.
- Do not treat negative results as wasted time.