experiment-design-planner

Help a CS or AI PhD student design hypothesis-driven experiments with baselines, variables, metrics, controls, logging, and stop conditions. Use this skill whenever the user is about to run experiments, compare models, plan an ablation, debug inconclusive results, prepare an experiment section, or wants to avoid changing too many things at once.

Source: a-green-hand-jack/phd-skills

Added on 2026-04-25

NPX Install

npx skill4agent add a-green-hand-jack/phd-skills experiment-design-planner

SKILL.md Content

Experiment Design Planner

Purpose

Help the user plan experiments that can actually answer a research question. This skill is based on the handbook's experiment design principles: start simple, begin with baselines, change one variable at a time, state hypotheses before running, and document negative results.

The output is an experiment plan that can be run, logged, and later explained in a paper or advisor meeting.

When to Use

User wants to run a new experiment or ablation
User has unclear or noisy experimental results
User is preparing baselines and metrics
User is changing several model or data choices at once
User needs a reproducible experiment plan before using cluster time

Workflow

Stage 1: State the Research Question

Ask:

What claim should this experiment support or refute?
What is the smallest result that would be meaningful?
What existing baseline should it beat, match, or clarify?

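If the question is vague, rewrite it into a testable form: for example, "Does X help?" becomes "Does X improve metric M over baseline B by at least the smallest meaningful effect on split S?"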

Stage 2: Write Hypotheses Before Running

Capture:

Primary hypothesis
Alternative explanations
Expected direction of change
Expected metric movement
Failure mode that would falsify the hypothesis

Do not let the user run first and rationalize later.
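A hypothesis is easiest to hold to when it is written down as data before any run and paired with a decision rule fixed in advance. The sketch below assumes a single scalar metric and a user-chosen minimum effect; the names `Hypothesis` and `verdict`, and the three-way supported/falsified/inconclusive rule, are illustrative rather than part of the skill. It compares means only and ignores seed-to-seed noise, so a "supported" verdict is a prompt to inspect the spread across repeats, not proof.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after the hypothesis is recorded
class Hypothesis:
    claim: str
    metric: str
    expected_direction: int  # +1 if the metric should go up, -1 if it should go down
    min_effect: float        # smallest change, in metric units, that counts as support


def verdict(h: Hypothesis, baseline: list[float], method: list[float]) -> str:
    """Apply a pre-registered (illustrative) decision rule to per-seed scores."""
    # Signed change: positive when the metric moved in the predicted direction.
    delta = h.expected_direction * (mean(method) - mean(baseline))
    if delta >= h.min_effect:
        return "supported"
    if delta <= 0:
        return "falsified"     # no change, or movement in the wrong direction
    return "inconclusive"      # right direction, but smaller than min_effect


# Hypothetical example with made-up scores: recorded before running, evaluated afterwards.
h = Hypothesis(
    claim="augmentation improves accuracy",
    metric="top-1 accuracy (%)",
    expected_direction=+1,
    min_effect=1.0,
)
print(verdict(h, baseline=[70.0, 71.0], method=[72.5, 72.5]))  # supported
```

Fixing `min_effect` before the run is the point: it is the "smallest result that would be meaningful" from Stage 1, and it cannot be moved once the numbers are in.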

Stage 3: Define the Experimental Unit

Specify:

Dataset and split
Preprocessing
Model or method
Baselines
Metrics
Random seeds
Compute budget
Number of repeats
Hardware/environment

If the user lacks a baseline, start there.
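Pinning the experimental unit down as one record makes it hard to vary something silently between runs. The sketch below is one possible shape with hypothetical field names: baseline and method are expanded over the same seeds so the comparison is paired, a rough budget check runs before any cluster time is spent, and results are summarized as mean ± sample standard deviation across repeats. The per-run cost is an estimate the user supplies, not something the code measures.

```python
from dataclasses import dataclass
from itertools import product
from statistics import mean, stdev


@dataclass(frozen=True)
class ExperimentUnit:
    dataset: str
    split: str
    preprocessing: str
    baseline: str
    method: str
    metric: str
    seeds: tuple[int, ...]
    gpu_hours_budget: float
    environment: str  # e.g. hardware plus a pinned container or lockfile

    def runs(self) -> list[tuple[str, int]]:
        # Baseline and method share exactly the same seeds, so differences are paired.
        return list(product((self.baseline, self.method), self.seeds))

    def fits_budget(self, gpu_hours_per_run: float) -> bool:
        # User-supplied estimate; every (system, seed) pair counts as one run.
        return len(self.runs()) * gpu_hours_per_run <= self.gpu_hours_budget


def summarize(scores: list[float]) -> str:
    """Mean and sample standard deviation across repeats (needs at least two seeds)."""
    return f"{mean(scores):.2f} ± {stdev(scores):.2f} (n={len(scores)})"
```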

Stage 4: One-Variable Discipline

List variables:

Independent variable: what changes
Controlled variables: what must stay fixed
Nuisance variables: what could confound results

If the plan changes multiple variables, split it into an ordered ablation table.
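An ordered ablation table can be generated mechanically from the baseline config and the config the user actually wants to try. This sketch assumes both configs are flat dicts of hyperparameters (the keys and values in the example are hypothetical). Each intermediate row changes exactly one variable relative to the baseline, and the combined run comes last.

```python
_MISSING = object()  # sentinel so "key absent" differs from "value is None"


def changed_variables(base: dict, target: dict) -> list[str]:
    """Keys whose values differ: target's key order first, then keys only in base."""
    keys = dict.fromkeys([*target, *base])
    return [k for k in keys if base.get(k, _MISSING) != target.get(k, _MISSING)]


def ablation_table(base: dict, target: dict) -> list[tuple[str, dict]]:
    """Split a multi-variable change into one-variable runs against the baseline."""
    rows = [("baseline", dict(base))]
    changes = changed_variables(base, target)
    for key in changes:
        cfg = dict(base)
        if key in target:
            cfg[key] = target[key]
        else:
            cfg.pop(key)  # the target config drops this variable entirely
        rows.append((f"change {key}", cfg))
    if len(changes) > 1:
        # Only after each single change is measured: the full combination.
        rows.append(("all changes", dict(target)))
    return rows


base = {"lr": 1e-3, "augment": False, "depth": 18}
target = {"lr": 3e-4, "augment": True, "depth": 18}
for name, cfg in ablation_table(base, target):
    print(name, cfg)
```

The row order follows the target config's key order, so listing the change you most expect to matter first puts it first in the run table.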

Stage 5: Logging and Negative Results

Define the required log fields:

Config path or commit hash
Dataset version
Seed
Hyperparameters
Metrics
Runtime
Failure notes
Plot/table output path

Make negative results first-class. A failed run should still record what was tried and what was learned.
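One way to make these fields non-optional is to refuse to write a record that lacks them, and to log failed runs through the same path as successful ones. The sketch below assumes one JSON-lines file per project; the field names, the extra `status` field, and the placeholder values in the failed-run example are all hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical field names mirroring the list above; "status" is an added field.
REQUIRED_FIELDS = ("dataset_version", "seed", "hyperparameters", "metrics",
                   "runtime_s", "status", "failure_notes", "output_path")


def format_log_line(record: dict) -> str:
    """Validate one run record and return it as a single JSON line."""
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if "commit" not in record and "config_path" not in record:
        missing.append("commit or config_path")
    if missing:
        raise ValueError(f"run record is missing: {missing}")
    entry = {"logged_at": datetime.now(timezone.utc).isoformat(), **record}
    return json.dumps(entry, sort_keys=True)


def log_run(log_file: Path, record: dict) -> None:
    """Append one record to a JSON-lines file; failed runs go through the same path."""
    line = format_log_line(record)
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")


# Hypothetical failed run: it is still logged, with notes on what was learned.
failed = {
    "commit": "abc1234",  # placeholder, not a real hash
    "dataset_version": "v1",
    "seed": 0,
    "hyperparameters": {"lr": 3e-4},
    "metrics": {},  # empty because the run did not finish
    "runtime_s": 1800,
    "status": "failed",
    "failure_notes": "loss became NaN early; try a lower learning rate next",
    "output_path": None,
}
print(format_log_line(failed))
```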

Stage 6: Produce the Artifact

Save the plan to ~/phd-log/experiments/YYYY-MM-DD-[short-name].md using the template below.

```markdown
# Experiment Plan — [Short Name]

## Research question
[Question]

## Hypotheses
- Primary:
- Alternatives:
- Falsification condition:

## Setup
- Dataset:
- Split:
- Baseline:
- Method:
- Metrics:
- Seeds / repeats:
- Compute:
- Environment:

## Variables
| Type | Variable | Value(s) | Notes |
|---|---|---|---|
| Independent |  |  |  |
| Controlled |  |  |  |
| Nuisance |  |  |  |

## Run table
| Run | Change | Expected result | Status | Notes |
|---|---|---|---|---|

## Logging checklist
- [ ] Config saved
- [ ] Code commit recorded
- [ ] Dataset version recorded
- [ ] Seed recorded
- [ ] Metrics saved
- [ ] Failure notes saved
- [ ] Plot/table path saved

## Decision rule
If [condition], then [next step]. If not, [fallback].
```
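The ~/phd-log/experiments/YYYY-MM-DD-[short-name].md naming convention can be applied consistently with a small helper. This is a sketch under assumptions: the slug rule (lowercase, runs of non-alphanumerics collapsed to single hyphens) is an illustrative choice rather than part of the skill, and the hypothetical `save_plan` deliberately fails if a plan with the same date and name already exists, so an earlier plan is never silently overwritten.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

PLAN_DIR = Path.home() / "phd-log" / "experiments"


def plan_path(short_name: str, day: Optional[date] = None, root: Path = PLAN_DIR) -> Path:
    """Build <root>/YYYY-MM-DD-<slug>.md from a free-form short name."""
    day = day or date.today()
    # Lowercase, then collapse anything that is not a letter or digit into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", short_name.lower()).strip("-")
    if not slug:
        raise ValueError("short name must contain at least one letter or digit")
    return root / f"{day.isoformat()}-{slug}.md"


def save_plan(short_name: str, plan_markdown: str, root: Path = PLAN_DIR) -> Path:
    """Write a new plan file; mode 'x' raises FileExistsError instead of overwriting."""
    path = plan_path(short_name, root=root)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("x", encoding="utf-8") as fh:
        fh.write(plan_markdown)
    return path
```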

Tone

Be concrete and conservative. The best experiment plan is usually smaller than the user's first instinct.

What Not to Do

Do not accept experiments without a hypothesis.
Do not let the user run comparisons without a baseline.
Do not bury changed variables in prose.
Do not treat negative results as wasted time.
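The first three rules can be partly enforced by a quick check on a saved plan before any run is launched. The sketch below is a heuristic that assumes the plan keeps the template's field labels exactly (`- Primary:`, `- Baseline:`, and the `| Independent |` table row); the function name `lint_plan` is hypothetical, and it only checks that those fields are non-empty, not that their contents are sensible.

```python
import re


def lint_plan(text: str) -> list[str]:
    """Heuristic checks that a plan states a hypothesis, a baseline, and its changed variable."""
    problems = []
    # "- Primary:" must be followed by text on the same line.
    if not re.search(r"^- Primary:[ \t]*\S", text, re.MULTILINE):
        problems.append("no primary hypothesis")
    if not re.search(r"^- Baseline:[ \t]*\S", text, re.MULTILINE):
        problems.append("no baseline")
    # The Independent row of the Variables table must name a variable, not be left empty.
    if not re.search(r"^\| Independent \|[ \t]*[^|\s]", text, re.MULTILINE):
        problems.append("independent variable not listed in the Variables table")
    return problems
```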