Experiment Design

An experiment design document defines all parameters needed to run a rigorous A/B test or controlled experiment. It aligns the team on what is being tested, how success will be measured, and how long the test must run before conclusions are drawn. Good experiment design prevents common pitfalls: underpowered tests, unclear success criteria, and decisions based on noise rather than signal.

When to Use

Before launching an A/B test to validate a product change
When testing a hypothesis that requires quantitative validation
After solution design to validate assumptions before full rollout
When stakeholders want data-driven evidence for a decision
To establish a culture of experimentation and learning

Instructions

When asked to design an experiment, follow these steps:

1. Articulate the Hypothesis. Write a clear, testable hypothesis in the format: "We believe [change] for [users] will [outcome] as measured by [metric]." One hypothesis per experiment; if you're testing multiple things, run multiple experiments.
2. Define the Variants. Describe the control (current experience) and treatment (new experience) in sufficient detail. Include screenshots, mockups, or precise descriptions so anyone can understand what users will see.
3. Choose Primary and Secondary Metrics. Select one primary metric that will determine success or failure. Add 2-3 secondary metrics to understand the broader impact. Include guardrail metrics to catch unintended negative effects.
4. Calculate Sample Size. Determine how many users you need per variant to detect your minimum detectable effect (MDE) with statistical significance. Specify your significance level (typically 0.05) and power (typically 0.80).
5. Estimate Duration. Based on sample size and available traffic, calculate how long the experiment needs to run. Account for weekly patterns: avoid ending mid-week if behavior varies by day.
6. Define Targeting and Allocation. Specify which users are eligible for the experiment and how traffic is split between variants. Document any exclusions (e.g., employees, specific segments).
7. Set Success Criteria. Define upfront what constitutes a win, a loss, or an inconclusive result. This prevents post-hoc rationalization and moving goalposts.
8. Document Risks and Mitigations. Identify what could go wrong and how you'll detect/address it. Include monitoring plans and rollback criteria.
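
The sample size and duration steps can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it assumes the primary metric is a conversion rate, the MDE is an absolute difference, the split is equal across variants, and the test is a two-sided two-proportion z-test using the normal approximation. The function names and the choice to round the duration up to whole weeks are assumptions made for this example.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per variant (normal approximation).

    baseline_rate: expected conversion rate in control, e.g. 0.10
    mde_abs: minimum detectable effect as an absolute difference, e.g. 0.02
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (mde_abs ** 2)
    return math.ceil(n)


def duration_days(n_per_variant, num_variants, eligible_users_per_day):
    """Days required, rounded up to whole weeks (an assumption made here
    so that every day of the week is sampled equally often)."""
    raw_days = math.ceil(n_per_variant * num_variants / eligible_users_per_day)
    return math.ceil(raw_days / 7) * 7


if __name__ == "__main__":
    n = sample_size_per_variant(baseline_rate=0.10, mde_abs=0.02)
    days = duration_days(n, num_variants=2, eligible_users_per_day=1000)
    print(f"users per variant: {n}, planned duration: {days} days")
```

Whatever inputs you use (baseline rate, MDE, significance level, power, daily eligible traffic) belong in the design document as stated assumptions, so the calculation can be reproduced later.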

Output Format

Use the template in references/TEMPLATE.md to structure the output.

Quality Checklist

Before finalizing, verify:

Hypothesis is falsifiable and specific
Only one primary metric is defined
Sample size calculation is documented with assumptions
Duration accounts for traffic patterns and statistical requirements
Success criteria are defined before the experiment starts
Guardrail metrics protect against unintended harm
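
One way to make success criteria concrete before the experiment starts is to write the decision rule down as code. The sketch below is illustrative only: it assumes a conversion-rate primary metric and uses a Wald confidence interval on the difference between treatment and control. The function name `classify_result` and the `min_lift_abs` threshold are hypothetical, and your team may prefer a different test or interval.

```python
import math
from statistics import NormalDist


def classify_result(control_conv, control_n, treat_conv, treat_n,
                    alpha=0.05, min_lift_abs=0.0):
    """Return 'win', 'loss', or 'inconclusive' from pre-registered rules.

    win:  the whole confidence interval lies above min_lift_abs
    loss: the whole confidence interval lies below zero
    otherwise the result is inconclusive
    """
    p_c = control_conv / control_n
    p_t = treat_conv / treat_n
    diff = p_t - p_c
    # Standard error of the difference between two independent proportions
    se = math.sqrt(p_c * (1 - p_c) / control_n + p_t * (1 - p_t) / treat_n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = diff - z * se, diff + z * se
    if lower > min_lift_abs:
        return "win"
    if upper < 0:
        return "loss"
    return "inconclusive"


if __name__ == "__main__":
    print(classify_result(400, 4000, 520, 4000))
    print(classify_result(400, 4000, 410, 4000))
```

Fixing `alpha` and `min_lift_abs` in the design document, rather than after the data arrives, is what guards against moving goalposts.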

See references/EXAMPLE.md for a completed example.

measure-experiment-design