ab-test-setup
A/B Test Setup
1️⃣ Purpose & Scope
Ensure every A/B test is valid, rigorous, and safe before a single line of code is written.
- Prevents "peeking"
- Enforces statistical power
- Blocks invalid hypotheses
2️⃣ Pre-Requisites
You must have:
- A clear user problem
- Access to an analytics source
- A rough estimate of traffic volume
Hypothesis Quality Checklist
A valid hypothesis includes:
- Observation or evidence
- Single, specific change
- Directional expectation
- Defined audience
- Measurable success criteria
3️⃣ Hypothesis Lock (Hard Gate)
Before designing variants or metrics, you MUST:
- Present the final hypothesis
- Specify:
  - Target audience
  - Primary metric
  - Expected direction of effect
  - Minimum Detectable Effect (MDE)
Ask explicitly:
“Is this the final hypothesis we are committing to for this test?”
Do NOT proceed until confirmed.
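One way to make the lock concrete is to capture the required fields in an immutable record and list whatever is still missing. The sketch below is illustrative only: the class name `LockedHypothesis`, its field names, and the `"increase"`/`"decrease"` vocabulary are assumptions, not part of this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after the lock
class LockedHypothesis:
    """Hypothetical record of the fields the hypothesis lock asks for."""
    statement: str
    target_audience: str
    primary_metric: str
    expected_direction: str   # assumed vocabulary: "increase" or "decrease"
    mde_relative: float       # e.g. 0.10 for a 10% relative change
    confirmed: bool = False   # set only after the explicit confirmation question


def missing_lock_fields(h: LockedHypothesis) -> list[str]:
    """Return the names of fields that still block the hard gate."""
    problems = []
    if not h.statement.strip():
        problems.append("statement")
    if not h.target_audience.strip():
        problems.append("target_audience")
    if not h.primary_metric.strip():
        problems.append("primary_metric")
    if h.expected_direction not in ("increase", "decrease"):
        problems.append("expected_direction")
    if h.mde_relative <= 0:
        problems.append("mde_relative")
    if not h.confirmed:
        problems.append("confirmed")
    return problems
```

An empty list means the gate is passed; anything else names what still has to be specified or confirmed.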
4️⃣ Assumptions & Validity Check (Mandatory)
Explicitly list assumptions about:
- Traffic stability
- User independence
- Metric reliability
- Randomization quality
- External factors (seasonality, campaigns, releases)
If assumptions are weak or violated:
- Warn the user
- Recommend delaying or redesigning the test
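The randomization assumption can be checked numerically once assignment counts exist, for example from an A/A run or the first days of traffic. A common technique for this, not prescribed by this guide, is a sample ratio mismatch check: a chi-square goodness-of-fit test on the observed split. The function name and the 50/50 default below are assumptions.

```python
import math


def srm_p_value(control_n: int, variant_n: int,
                expected_control_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for the split.

    A very small p-value suggests the observed allocation differs from the
    intended one, i.e. the randomization assumption may be violated.
    """
    total = control_n + variant_n
    if total <= 0 or not 0 < expected_control_share < 1:
        raise ValueError("need positive counts and a share strictly between 0 and 1")
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (variant_n - expected_variant) ** 2 / expected_variant)
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(chi2 / 2.0))
```

A strict threshold such as 0.001 is often used for this check so that ordinary noise does not raise alarms; that threshold is a convention, not a rule from this document.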
5️⃣ Test Type Selection
Choose the simplest valid test:
- A/B Test – single change, two variants
- A/B/n Test – multiple variants, higher traffic required
- Multivariate Test (MVT) – interaction effects, very high traffic
- Split URL Test – major structural changes
Default to A/B unless there is a clear reason otherwise.
6️⃣ Metrics Definition
Primary Metric (Mandatory)
- Single metric used to evaluate success
- Directly tied to the hypothesis
- Pre-defined and frozen before launch
Secondary Metrics
- Provide context
- Explain why results occurred
- Must not override the primary metric
Guardrail Metrics
- Metrics that must not degrade
- Used to prevent harmful wins
- Trigger a test stop if they degrade significantly
7️⃣ Sample Size & Duration
Define upfront:
- Baseline rate
- MDE
- Significance level (typically α = 0.05, i.e. 95% confidence)
- Statistical power (typically 80%)
Estimate:
- Required sample size per variant
- Expected test duration
Do NOT proceed without a realistic sample size estimate.
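For a conversion-rate metric, the estimate can be sketched with the standard normal-approximation formula for comparing two proportions. This is a rough planning aid under stated assumptions, not the calculator this guide mandates: it assumes a two-sided test, equal allocation, a relative MDE, and steady daily traffic. The function and parameter names are illustrative.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, mde_relative: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per variant for a two-sided two-proportion test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)  # rate if the MDE is exactly realized
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie in (0, 1) and the MDE must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def estimated_duration_days(n_per_variant: int, variants: int,
                            daily_visitors: int) -> int:
    """Days needed to fill every variant, assuming all daily traffic is enrolled."""
    if daily_visitors <= 0:
        raise ValueError("daily_visitors must be positive")
    return math.ceil(n_per_variant * variants / daily_visitors)
```

With a 5% baseline and a 10% relative MDE, this approximation lands at roughly 31,000 visitors per variant, which shows why small effects on low baselines need long tests.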
8️⃣ Execution Readiness Gate (Hard Stop)
You may proceed to implementation only if all are true:
- Hypothesis is locked
- Primary metric is frozen
- Sample size is calculated
- Test duration is defined
- Guardrails are set
- Tracking is verified
If any item is missing, stop and resolve it.
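The gate is an all-or-nothing check over the six items above, which can be sketched as follows. The item keys and function names are hypothetical; only the list of conditions comes from this guide.

```python
# Keys mirror the six checklist items; the identifiers themselves are assumptions.
READINESS_ITEMS = (
    "hypothesis_locked",
    "primary_metric_frozen",
    "sample_size_calculated",
    "test_duration_defined",
    "guardrails_set",
    "tracking_verified",
)


def unresolved_items(status: dict[str, bool]) -> list[str]:
    """Return checklist items that are missing or not yet true."""
    return [item for item in READINESS_ITEMS if not status.get(item, False)]


def may_proceed(status: dict[str, bool]) -> bool:
    """True only when every readiness item is satisfied."""
    return not unresolved_items(status)
```

Treating an absent key the same as `False` keeps the gate conservative: an item nobody reported on counts as unresolved.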
Running the Test
During the Test
DO:
- Monitor technical health
- Document external factors
DO NOT:
- Stop early due to “good-looking” results
- Change variants mid-test
- Add new traffic sources
- Redefine success criteria
Analyzing Results
Analysis Discipline
When interpreting results:
- Do NOT generalize beyond the tested population
- Do NOT claim causality beyond the tested change
- Do NOT override guardrail failures
- Separate statistical significance from business judgment
Interpretation Outcomes
| Result | Action |
|---|---|
| Significant positive | Consider rollout |
| Significant negative | Reject variant, document learning |
| Inconclusive | Consider more traffic or bolder change |
| Guardrail failure | Do not ship, even if primary wins |
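
The table can be read as a small decision rule in which the guardrail check runs first, so a primary-metric win cannot override it. The sketch below assumes the analysis has already produced a p-value and a signed effect, where positive means the hypothesized direction; the function and parameter names are illustrative.

```python
def recommended_action(primary_p_value: float, observed_effect: float,
                       guardrail_failed: bool, alpha: float = 0.05) -> str:
    """Map a finished test's outcome to the action in the table above.

    observed_effect is assumed to be signed so that a positive value means
    the primary metric moved in the hypothesized direction.
    """
    # Guardrails take precedence over everything else.
    if guardrail_failed:
        return "Do not ship, even if primary wins"
    if primary_p_value < alpha and observed_effect > 0:
        return "Consider rollout"
    if primary_p_value < alpha and observed_effect < 0:
        return "Reject variant, document learning"
    return "Consider more traffic or bolder change"
```

The returned strings are recommendations only; the business judgment stays with the people who own the decision.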
Documentation & Learning
Test Record (Mandatory)
Document:
- Hypothesis
- Variants
- Metrics
- Planned vs. achieved sample size
- Results
- Decision
- Learnings
- Follow-up ideas
Store records in a shared, searchable location to avoid repeated failures.
Refusal Conditions (Safety)
Refuse to proceed if:
- Baseline rate is unknown and cannot be estimated
- Traffic is insufficient to detect the MDE
- Primary metric is undefined
- Multiple variables are changed without proper design
- Hypothesis cannot be clearly stated
Explain why and recommend next steps.
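These conditions can be collected into a single pre-flight check that returns every reason for refusal, so the explanation is complete rather than stopping at the first problem. Everything named below is hypothetical. In particular, `max_days` is an assumed cap on acceptable test length, since this guide does not define when traffic counts as insufficient.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanSummary:
    """Hypothetical summary of a proposed test; field names are assumptions."""
    hypothesis: str
    primary_metric: Optional[str]
    baseline_rate: Optional[float]   # None when unknown and not estimable
    required_per_variant: int        # from the sample size estimate
    variants: int
    daily_visitors: int
    max_days: int                    # assumed longest acceptable duration
    variables_changed: int = 1
    multivariable_design: bool = False  # True if a proper MVT design is used


def refusal_reasons(plan: PlanSummary) -> list[str]:
    """Return every refusal condition the plan triggers (empty list = none)."""
    reasons = []
    if plan.baseline_rate is None:
        reasons.append("Baseline rate is unknown and cannot be estimated")
    reachable = plan.daily_visitors * plan.max_days
    if plan.required_per_variant * plan.variants > reachable:
        reasons.append("Traffic is insufficient to detect the MDE")
    if not plan.primary_metric:
        reasons.append("Primary metric is undefined")
    if plan.variables_changed > 1 and not plan.multivariable_design:
        reasons.append("Multiple variables are changed without proper design")
    if not plan.hypothesis.strip():
        reasons.append("Hypothesis cannot be clearly stated")
    return reasons
```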
Key Principles (Non-Negotiable)
- One hypothesis per test
- One primary metric
- Commit before launch
- No peeking
- Learning over winning
- Statistical rigor first
Final Reminder
A/B testing is not about proving ideas right.
It is about learning the truth with confidence.
If you feel tempted to rush, simplify, or “just try it” —
that is the signal to slow down and re-check the design.