design-of-experiments

Design of Experiments

Purpose

Design of Experiments (DOE) helps you systematically discover how multiple factors affect an outcome while minimizing the number of experimental runs. Instead of testing one variable at a time (inefficient) or guessing randomly (unreliable), DOE uses structured experimental designs to:

Screen many factors to find the critical few
Optimize factor settings to maximize/minimize a response
Discover interactions where factors affect each other
Map response surfaces to understand the full factor space
Validate robustness against noise and environmental variation

When to Use

Use this skill when:

Limited experimental budget: You have constraints on time, cost, or resources for testing
Multiple factors: 3+ controllable variables that could affect the outcome
Interaction suspicion: Factors may interact (effect of A depends on level of B)
Optimization needed: Finding best settings, not just "better than baseline"
Screening required: Many candidate factors (10+), need to identify vital few
Response surface: Need to map curvature, find peaks/valleys, understand tradeoffs
Robust design: Must work well despite noise factors or environmental variation
Process improvement: Manufacturing, chemical processes, software performance tuning
Product development: Formulations, recipes, configurations with multiple parameters
A/B/n testing: Web/app features with multiple variants and combinations
Machine learning: Hyperparameter tuning for models with many parameters

Trigger phrases: "optimize", "tune parameters", "factorial test", "interaction effects", "response surface", "efficient experiments", "minimize runs", "robustness", "sensitivity analysis"

What Is It?

Design of Experiments is a statistical framework for planning, executing, and analyzing experiments where you deliberately vary multiple input factors to observe effects on output responses.

Quick example:

You're optimizing a web signup flow with 3 factors:

Factor A: Form layout (single-page vs multi-step)
Factor B: CTA button color (blue vs green)
Factor C: Social proof (testimonials vs user count)

Naive approach: Test one at a time = 6 runs (2 levels each × 3 factors)

But you miss interactions! Maybe blue works better for single-page, green for multi-step.

DOE approach: 2³ factorial design = 8 runs

Tests all combinations: (single/blue/testimonials), (single/blue/count), (single/green/testimonials), etc.
Reveals main effects AND interactions
Statistical power to detect differences, because every run contributes to every effect estimate

Result: You discover that layout and CTA color interact strongly: multi-step + green outperforms everything, with single-page + blue a close second. Social proof has minimal effect. You can now make a data-driven decision with confidence.
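The eight combinations in the 2³ design can be enumerated mechanically. The sketch below is a minimal illustration using the factor names from the signup-flow example; the dictionary keys and the `full_factorial` helper are hypothetical names chosen for this example.

```python
from itertools import product

# Factor names and levels come from the signup-flow example;
# the dictionary keys themselves are illustrative.
factors = {
    "layout": ["single-page", "multi-step"],
    "cta_color": ["blue", "green"],
    "social_proof": ["testimonials", "user count"],
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


runs = full_factorial(factors)
for number, run in enumerate(runs, start=1):
    print(number, run)
```

The list is in standard (systematic) order; it still needs to be randomized before execution.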

Workflow

Copy this checklist and track your progress:

Design of Experiments Progress:
- [ ] Step 1: Define objectives and constraints
- [ ] Step 2: Identify factors, levels, and responses
- [ ] Step 3: Choose experimental design
- [ ] Step 4: Plan execution details
- [ ] Step 5: Create experiment plan document
- [ ] Step 6: Validate quality

Step 1: Define objectives and constraints

Clarify the experiment goal (screening vs optimization), response metric(s), experimental budget (max runs), time/cost constraints, and success criteria. See Common Patterns for typical objectives.

Step 2: Identify factors, levels, and responses

List all candidate factors (controllable inputs), specify levels for each factor (low/high or discrete values), categorize factors (control vs noise), and define response variables (measurable outputs). For screening many factors (8+), see resources/methodology.md for Plackett-Burman and fractional factorial approaches.

Step 3: Choose experimental design

Based on objective and constraints:

For screening 5+ factors with limited runs → Use resources/methodology.md for fractional factorial or Plackett-Burman
For optimizing 2-5 factors → Use resources/template.md for full or fractional factorial
For response surface mapping → Use resources/methodology.md for central composite or Box-Behnken
For robust design against noise → Use resources/methodology.md for Taguchi inner-outer arrays (control factors × noise factors)

Step 4: Plan execution details

Specify randomization order (eliminate time trends), blocking strategy (control nuisance variables), replication plan (estimate error), sample size justification (power analysis), and measurement protocols. See Guardrails for critical requirements.
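Replication and randomization from Step 4 can be combined into one run sheet. The following is a minimal sketch assuming complete randomization with no blocks; the function name, the `replicate` and `run_order` keys, and the seed value are illustrative choices.

```python
import random


def randomized_run_sheet(design, replicates=2, seed=20240101):
    """Replicate each design point, then shuffle the execution order."""
    rng = random.Random(seed)  # fixed seed so the sheet can be regenerated
    sheet = [
        dict(point, replicate=r + 1)
        for point in design
        for r in range(replicates)
    ]
    rng.shuffle(sheet)
    # Number the runs in the shuffled order they should be executed.
    return [dict(run, run_order=i + 1) for i, run in enumerate(sheet)]


# A 2x2 design in coded units (-1 = low, +1 = high).
design = [{"A": a, "B": b} for a in (-1, 1) for b in (-1, 1)]
sheet = randomized_run_sheet(design, replicates=2)
for run in sheet:
    print(run)
```

If the runs span several days or batches, one option is to shuffle within each block rather than across the whole sheet.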

Step 5: Create experiment plan document

Create design-of-experiments.md with sections: objective, factors table, design matrix (run order with factor settings), response variables, execution protocol, and analysis plan. Use resources/template.md for structure.
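The design matrix section of the plan can be generated rather than typed by hand. This sketch renders a list of runs as a markdown table with an empty response column; the column layout is an assumption made here, and resources/template.md may prescribe a different one.

```python
def design_matrix_table(runs, factor_names, response_name="response"):
    """Render runs as a markdown table with a blank response column."""
    header = ["Run"] + list(factor_names) + [response_name]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(" --- " for _ in header) + "|",
    ]
    for number, run in enumerate(runs, start=1):
        cells = [str(number)] + [str(run[name]) for name in factor_names] + [""]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Runs are listed in the order given; randomize them first.
runs = [{"A": a, "B": b} for a in (-1, 1) for b in (-1, 1)]
table = design_matrix_table(runs, ["A", "B"])
print(table)
```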

Step 6: Validate quality

Self-assess using resources/evaluators/rubric_design_of_experiments.json. Check: objective clarity, factor completeness, design appropriateness, randomization plan, measurement protocol, statistical power, analysis plan, and deliverable quality. Minimum standard: Average score ≥ 3.5 before delivering.

Common Patterns

Pattern 1: Screening (many factors → vital few)

Context: 10-30 candidate factors, limited budget, want to identify 3-5 critical factors
Approach: Plackett-Burman or fractional factorial (Resolution III/IV)
Output: Pareto chart of effect sizes, shortlist for follow-up optimization
Example: Software performance tuning with 15 configuration parameters
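As a small-scale illustration of the screening pattern, the sketch below builds a 2^(7-4) Resolution III fractional factorial (7 two-level factors in 8 runs) and checks that its columns are balanced and orthogonal. The generators D=AB, E=AC, F=BC, G=ABC are one standard choice; the function names are illustrative. In a Resolution III design, main effects are aliased with two-factor interactions, so it suits a first-pass shortlist rather than a final conclusion.

```python
from itertools import product


def fractional_factorial_7_in_8():
    """2^(7-4) Resolution III design in coded units (-1 / +1)."""
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        # Generators: D = AB, E = AC, F = BC, G = ABC
        runs.append((a, b, c, a * b, a * c, b * c, a * b * c))
    return runs


def is_balanced(runs):
    """Each column has as many +1 as -1 settings."""
    return all(sum(column) == 0 for column in zip(*runs))


def is_orthogonal(runs):
    """Every pair of columns has a zero dot product."""
    columns = list(zip(*runs))
    return all(
        sum(x * y for x, y in zip(columns[i], columns[j])) == 0
        for i in range(len(columns))
        for j in range(i + 1, len(columns))
    )


design = fractional_factorial_7_in_8()
print(len(design), is_balanced(design), is_orthogonal(design))
```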

Pattern 2: Optimization (find best settings)

Context: 2-5 factors already identified as important, want to find optimal levels
Approach: Full factorial (2^k) or fractional factorial + steepest ascent
Output: Main effects plot, interaction plots, recommended settings
Example: Manufacturing process with temperature, pressure, time factors
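Main effects and interactions from a two-level factorial can be estimated with simple contrasts: the mean response at the high setting minus the mean at the low setting. The sketch below uses a made-up, noise-free response function (`hypothetical_response`) purely so the recovered effects can be checked by hand; it is not data from any real process.

```python
from itertools import product
from math import prod


def effect(design, responses, columns):
    """Contrast estimate for a main effect (one column) or an
    interaction (several columns) in a two-level coded design."""
    signs = [prod(run[c] for c in columns) for run in design]
    return sum(s * y for s, y in zip(signs, responses)) / (len(design) / 2)


def hypothetical_response(run):
    # Made-up model: each effect is twice its coefficient in coded units.
    return 10 + 2 * run["A"] - 1 * run["B"] + 0.5 * run["A"] * run["B"]


design = [{"A": a, "B": b} for a, b in product((-1, 1), repeat=2)]
responses = [hypothetical_response(run) for run in design]

effect_a = effect(design, responses, ["A"])
effect_b = effect(design, responses, ["B"])
effect_ab = effect(design, responses, ["A", "B"])
print(effect_a, effect_b, effect_ab)
```

With real, noisy data the same contrasts apply, but replication is needed to judge whether an estimated effect is distinguishable from error.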

Pattern 3: Response Surface (map the landscape)

Context: Need to understand curvature, find maximum/minimum, quantify tradeoffs
Approach: Central Composite Design (CCD) or Box-Behnken
Output: Response surface equation, contour plots, optimal region
Example: Chemical formulation with ingredient ratios
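A central composite design combines a two-level factorial, axial ("star") points, and replicated center points. The sketch below generates the points in coded units, assuming a full factorial core and the rotatable axial distance α = (2^k)^(1/4); the function name and the default of three center points are choices made for this example.

```python
from itertools import product


def central_composite(k, n_center=3):
    """Coded design points for a rotatable CCD with k factors."""
    alpha = (2 ** k) ** 0.25  # rotatable axial distance for a full factorial core
    factorial = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(tuple(point))
    center = [tuple([0.0] * k)] * n_center
    return factorial + axial + center


points = central_composite(2)
for point in points:
    print(point)
```

For k = 2 this gives 4 factorial, 4 axial, and 3 center runs. The axial points sit outside the ±1 range, so the physical factor limits need to allow for that.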

Pattern 4: Robust Design (work despite noise)

Context: Product/process must perform well despite uncontrollable variation
Approach: Taguchi inner-outer array (control × noise factors)
Output: Settings that minimize sensitivity to noise factors
Example: Consumer product that must work across temperature/humidity ranges
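The inner-outer array crosses every control-factor setting with every noise-factor setting, then summarizes each control setting by a signal-to-noise ratio. The sketch below assumes a "larger is better" response and uses the corresponding S/N formula, −10·log10(mean(1/y²)); other response types use different formulas. Function names are illustrative.

```python
from itertools import product
from math import log10


def crossed_array(inner, outer):
    """Pair every control setting (inner) with every noise setting (outer)."""
    return [(control, noise) for control in inner for noise in outer]


def sn_larger_is_better(values):
    """Taguchi S/N ratio in dB for a larger-is-better response."""
    return -10 * log10(sum(1 / (y * y) for y in values) / len(values))


inner = list(product((-1, 1), repeat=2))  # 2 control factors, 4 settings
outer = list(product((-1, 1), repeat=2))  # 2 noise factors, 4 conditions
plan = crossed_array(inner, outer)
print(len(plan))  # 4 x 4 crossed runs
```

Each control setting is measured under all noise conditions, and the setting with the highest S/N ratio is the least sensitive to noise under this criterion.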

Pattern 5: Sequential Experimentation (learn then refine)

Context: High uncertainty, want to learn iteratively with minimal waste
Approach: Screening → Steepest ascent → Response surface → Confirmation
Output: Progressively refined understanding and settings
Example: New product development with unknown factor relationships
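The steepest ascent step in this sequence moves the factors in proportion to the coefficients of a fitted first-order model. The sketch below assumes the coefficients are in coded units and scales the step so the factor with the largest coefficient moves by `step` per run; the function and parameter names are illustrative.

```python
def steepest_ascent_path(coefficients, n_steps=5, step=1.0):
    """Points along the steepest ascent direction, in coded units.

    coefficients: first-order model slopes, one per factor.
    step: move per run for the factor with the largest |slope|.
    """
    largest = max(abs(b) for b in coefficients)
    if largest == 0:
        raise ValueError("all coefficients are zero; no ascent direction")
    increments = [step * b / largest for b in coefficients]
    return [tuple(i * d for d in increments) for i in range(1, n_steps + 1)]


path = steepest_ascent_path([2.0, 1.0], n_steps=3)
print(path)
```

To minimize a response, negate the coefficients. Runs continue along the path until the response stops improving, and a new design is then centered near the best point.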

Guardrails

Critical requirements:

Randomize run order: Eliminates time-order bias and confounding with lurking variables. Use a random number generator, not "convenient" sequences.
Replicate center points: For designs with continuous factors, replicate center point runs (3-5 times) to estimate pure error and detect curvature.
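Avoid confounding critical interactions: In fractional factorials, don't confound important 2-way interactions with main effects. Resolution IV keeps main effects clear of 2-way interactions; choose Resolution V if the 2-way interactions themselves must be estimated separately.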
Check design balance: Ensure each level appears equally often and the factor columns are orthogonal (uncorrelated in the design matrix). An absolute correlation above 0.3 between factor columns reduces precision and interpretability.
Define response precisely: Use objective, quantitative, repeatable measurements. Avoid subjective scoring unless calibrated with multiple raters.
Justify sample size: Run power analysis to ensure the design can detect meaningful effect sizes with acceptable Type II error risk (β ≤ 0.20, i.e. power ≥ 0.80).
Document assumptions: State expected effect magnitudes, interaction assumptions, noise variance estimates. Design validity depends on these.
Plan for analysis before running: Specify statistical tests, significance level (α), effect size metrics before data collection. Prevents p-hacking.
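The sample size requirement can be roughed out before choosing a design. The sketch below assumes a balanced two-level design, a known noise standard deviation `sigma`, and a normal approximation for a two-sided test; under those assumptions the effect estimate has standard error 2σ/√N, which gives N ≥ 4σ²(z₁₋α/₂ + z₁₋β)²/δ². The function name is illustrative, and a t-based calculation gives slightly larger numbers for small N.

```python
from math import ceil
from statistics import NormalDist


def runs_for_two_level_effect(effect, sigma, alpha=0.05, beta=0.20):
    """Approximate total runs N needed to detect `effect`
    (high-minus-low difference) in a balanced two-level design."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(1 - beta)        # quantile for the target power
    n = 4 * (sigma * (z_alpha + z_beta) / effect) ** 2
    return ceil(n)


# An effect equal to one noise standard deviation.
n_small_effect = runs_for_two_level_effect(effect=1.0, sigma=1.0)
# An effect twice the noise standard deviation.
n_large_effect = runs_for_two_level_effect(effect=2.0, sigma=1.0)
print(n_small_effect, n_large_effect)
```

In practice the result is rounded up to a multiple of the base design size, for example by replicating an 8-run design.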

Common pitfalls:

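❌ One-factor-at-a-time (OFAT): Misses interactions, and needs more runs than a factorial design to estimate effects with the same precision
❌ Ignoring blocking: If runs span days/batches/operators, block accordingly; otherwise factor effects are confounded with day, batch, or operator differences.
❌ Too many levels: Use 2-3 levels initially. A full factorial needs L^k runs for k factors at L levels, so extra levels multiply the run count quickly (2^4 = 16 runs, but 4^4 = 256).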
❌ Unmeasured factors: If an important factor isn't controlled/measured, it becomes noise
❌ Changing protocols mid-experiment: Breaks design structure. If necessary, restart or analyze separately.

Quick Reference

Key resources:

resources/template.md: Quick-start templates for common designs (factorial, screening, response surface)
resources/methodology.md: Advanced techniques (optimal designs, Taguchi, mixture experiments, sequential strategies)
resources/evaluators/rubric_design_of_experiments.json: Quality criteria for experiment plans

Typical workflow time:

Simple factorial (2-4 factors): 15-30 minutes
Screening design (8+ factors): 30-45 minutes
Response surface design: 45-60 minutes
Robust design (Taguchi): 60-90 minutes

When to escalate:

User needs mixture experiments (factors must sum to 100%)
Split-plot designs required (hard-to-change factors)
Optimal designs for irregular constraints
Bayesian adaptive designs

Use resources/methodology.md for these advanced cases.

Inputs required:

Process/System: What you're experimenting on
Factors: List of controllable inputs with candidate levels
Responses: Measurable outputs (KPIs, metrics)
Constraints: Budget (max runs), time, resources
Objective: Screening, optimization, response surface, or robust design

Outputs produced:

design-of-experiments.md: Complete experiment plan with design matrix, randomization, protocols, and analysis approach
