Design and analyze A/B tests with proper statistical methodology including sample size calculation, randomization, frequentist and Bayesian approaches, and sequential testing. Use this skill when the user needs to set up an experiment, calculate required sample size, interpret test results, or decide between testing methodologies — even if they say 'should we A/B test this', 'how many users do we need', 'is the test result conclusive', or 'can we stop the test early'.
npx skill4agent add asgard-ai-platform/skills stat-ab-testing

IRON LAW: Calculate Sample Size BEFORE Running the Test

Running a test without knowing the required sample size leads to two
failures: stopping too early (false positives) or running too long (waste).
Required inputs: baseline conversion rate, minimum detectable effect (MDE),
significance level (α), power (1-β). Calculate BEFORE starting.

n per group ≈ (Z_{α/2} + Z_β)² × [p₁(1-p₁) + p₂(1-p₂)] / (p₁ - p₂)²

| Baseline Rate | MDE (relative) | N per Group |
|---|---|---|
| 5% | 10% (→5.5%) | ~31,000 |
| 5% | 20% (→6.0%) | ~8,200 |
| 10% | 10% (→11%) | ~15,000 |
| 10% | 20% (→12%) | ~4,000 |
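
The formula above can be checked numerically. The following is a minimal sketch, not a definitive implementation. The function name `sample_size_per_group` is hypothetical, and the defaults of α = 0.05 (two-sided) and 80% power are assumptions that the source does not state. Here p₁ is the baseline rate and p₂ is the rate implied by the relative MDE. Under these assumptions the 10% baseline rows come out at roughly 14,700 and 3,800 per group, in line with the table's ~15,000 and ~4,000.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided test of two proportions.

    alpha and power defaults are assumptions, not values from the source.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)             # rate implied by the relative MDE
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    z_beta = NormalDist().inv_cdf(power)           # Z_beta
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


# Same baseline / MDE combinations as the table rows.
for baseline, mde in [(0.05, 0.10), (0.05, 0.20), (0.10, 0.10), (0.10, 0.20)]:
    n = sample_size_per_group(baseline, mde)
    print(f"baseline={baseline:.0%} mde={mde:.0%} -> n per group = {n:,}")
```

Because the required n scales with 1 / (p₁ - p₂)², halving the MDE roughly quadruples the sample size, which is why small effects on low baseline rates are expensive to detect.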

| Approach | How It Works | Best When |
|---|---|---|
| Frequentist (fixed-horizon) | Set the sample size, run to completion, then analyze once | You can commit to the full run and want the standard, well-understood method |
| Bayesian | Update beliefs with data, compute the probability of improvement | You want probability statements ("90% chance B is better") |
| Sequential testing | Check results at planned intervals with adjusted thresholds | You need to stop early on a clear winner or limit downside risk |
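
To make the Bayesian row concrete, here is a minimal sketch of a "probability that B is better" calculation using a Beta-Binomial model. The uniform Beta(1, 1) prior, the function name `prob_b_beats_a`, and the conversion counts are all assumptions chosen for illustration; the source does not prescribe a particular prior or model.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each arm: Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical counts, for illustration only.
print(prob_b_beats_a(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000))
```

The result reads directly as a probability statement about the two arms, which is the main practical difference from a p-value. A decision threshold (for example, ship above 95%) still needs to be agreed before the test starts.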
# A/B Test Design: {Experiment Name}
## Hypothesis
- H₀: {no difference}
- H₁: {expected improvement}
- Primary metric: {metric}
- MDE: {X% relative}
## Sample Size
- Baseline rate: {X%}
- Required N per group: {N}
- Estimated duration: {days/weeks}
## Results (post-test)
| Metric | Control | Treatment | Diff | CI (95%) | p-value |
|--------|---------|-----------|------|----------|---------|
| {primary} | X% | X% | +X% | [X, X] | {value} |
## Decision
{Ship / Don't ship / Extend test}: {rationale}

- references/bayesian-ab.md
- references/bandits.md
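
The Results table in the template asks for a difference, a 95% CI, and a p-value. One common way to fill in that row for a conversion metric is a two-proportion z-test. The sketch below assumes a fixed-horizon frequentist analysis. The function name `two_proportion_summary` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_summary(conv_c, n_c, conv_t, n_t, confidence=0.95):
    """Difference, CI, and two-sided p-value for treatment minus control."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    diff = p_t - p_c

    # Pooled standard error for the test statistic (H0: no difference).
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    return {"control": p_c, "treatment": p_t, "diff": diff, "ci": ci, "p_value": p_value}


# Hypothetical counts, for illustration only.
result = two_proportion_summary(conv_c=500, n_c=10_000, conv_t=560, n_t=10_000)
print(result)
```

With these illustrative counts the interval spans zero and the p-value sits just above 0.05, the kind of result where "Extend test" is only defensible if the extension was planned in advance.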