algo-forecast-ensemble
Ensemble Forecasting
Overview
Ensemble forecasting combines predictions from multiple models to reduce variance and improve accuracy. A simple average of 3-5 diverse models often outperforms the best individual model. Methods: equal-weight average, inverse-error weighting, and stacking with a meta-learner. The "forecast combination puzzle" shows that simple averaging is hard to beat.
When to Use
Trigger conditions:
- Multiple forecasting models are available and perform similarly
- Reducing forecast risk is more important than maximum accuracy
- Building a production pipeline that's robust to model failure
When NOT to use:
- When one model clearly dominates all others (just use that model)
- When computational budget only allows one model
Algorithm
IRON LAW: Simple Average Often Beats Complex Combination
The "forecast combination puzzle" (Stock & Watson, 2004): equal-weight averaging of diverse models frequently outperforms sophisticated weighting schemes. This is because weight estimation introduces noise that offsets the theoretical gain. Start with simple average and only move to weighted combination if you have abundant validation data.
Phase 1: Input Validation
Generate forecasts from 3+ diverse models (e.g., ARIMA, ETS, Prophet, ML-based). Ensure models are truly diverse (different assumptions/approaches).
Gate: 3+ model forecasts available, models use different methodologies.
Phase 2: Core Algorithm
Simple average: ŷ_ensemble = (1/M) × Σ ŷ_m
Inverse-error weighting: w_m = (1/MSE_m) / Σ(1/MSE_j), ŷ_ensemble = Σ w_m × ŷ_m
Stacking: Train a meta-model (linear regression) that learns optimal weights from cross-validated individual model predictions.
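The three combination rules above can be sketched in a few lines. The function names are illustrative, not part of this skill, and the stacking version omits an intercept and non-negativity constraints for brevity:

```python
import numpy as np

def simple_average(forecasts):
    """Equal-weight ensemble: y_hat = (1/M) * sum of member forecasts."""
    return np.mean(np.asarray(forecasts, dtype=float), axis=0)

def inverse_error_weights(mses):
    """w_m = (1/MSE_m) / sum_j(1/MSE_j), from validation-set MSEs."""
    inv = 1.0 / np.asarray(mses, dtype=float)
    return inv / inv.sum()

def weighted_ensemble(forecasts, mses):
    """Inverse-error weighted combination: y_hat = sum_m w_m * y_m."""
    return inverse_error_weights(mses) @ np.asarray(forecasts, dtype=float)

def stacking_weights(cv_preds, cv_actual):
    """Stacking meta-learner: least-squares fit of actuals on the
    out-of-fold predictions matrix (n_obs x M)."""
    w, *_ = np.linalg.lstsq(np.asarray(cv_preds, dtype=float),
                            np.asarray(cv_actual, dtype=float), rcond=None)
    return w

# Three one-step forecasts (ARIMA, Prophet, ETS) with hypothetical MSEs
f = [1180.0, 1220.0, 1200.0]
print(simple_average(f))                         # 1200.0
print(weighted_ensemble(f, [30.0, 60.0, 40.0]))  # pulled toward the low-MSE model
```

In practice the stacking weights must come from cross-validated predictions, never in-sample fits, or the meta-learner will overweight overfit models.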
Phase 3: Verification
Compare the ensemble vs individual models on held-out data. The ensemble should have both lower average error AND lower maximum error (more robust).
Gate: Ensemble RMSE ≤ best individual model RMSE.
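The Phase 3 gate can be checked mechanically. A minimal sketch, assuming you keep hold-out actuals and per-model forecast arrays (all names illustrative):

```python
import numpy as np

def rmse(actual, forecast):
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((a - f) ** 2)))

def passes_gate(actual, model_forecasts, ensemble_forecast):
    """Gate: ensemble RMSE must not exceed the best individual RMSE."""
    best = min(rmse(actual, f) for f in model_forecasts.values())
    return rmse(actual, ensemble_forecast) <= best

actual = [100, 110, 120, 115]
models = {"a": [98, 112, 118, 117], "b": [105, 108, 125, 110]}
ensemble = np.mean(list(models.values()), axis=0)
print(passes_gate(actual, models, ensemble))  # True: averaging cancels opposing errors
```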
Phase 4: Output
Return ensemble forecast with component model contributions.
Output Format
```json
{
  "ensemble_forecast": [{"period": "2025-04", "forecast": 1200, "lower_95": 1050, "upper_95": 1350}],
  "model_forecasts": {"arima": 1180, "prophet": 1220, "ets": 1200},
  "weights": {"arima": 0.35, "prophet": 0.30, "ets": 0.35},
  "metadata": {"method": "inverse_error_weighted", "ensemble_rmse": 42, "best_individual_rmse": 48}
}
```
Examples
Sample I/O
Input: ARIMA forecast=1180, Prophet=1220, ETS=1200 for next month sales
Expected: Simple average = 1200. If ARIMA is historically the best model (lowest MSE), the weighted average shifts toward 1180.
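The sample above, checked numerically (the historical MSE values here are invented for illustration; only their ordering matters):

```python
forecasts = {"arima": 1180, "prophet": 1220, "ets": 1200}
simple = sum(forecasts.values()) / len(forecasts)
print(simple)  # 1200.0

# Hypothetical historical MSEs, with ARIMA the lowest (best)
mse = {"arima": 25.0, "prophet": 60.0, "ets": 45.0}
inv = {m: 1.0 / e for m, e in mse.items()}
total = sum(inv.values())
weighted = sum(forecasts[m] * inv[m] / total for m in forecasts)
print(weighted)  # below 1200: shifted toward ARIMA's 1180
```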
Edge Cases
| Input | Expected | Why |
|---|---|---|
| All models agree | Ensemble = individual | Consensus, high confidence |
| Models wildly disagree | Ensemble = compromise, wide CI | High uncertainty, flag for review |
| One model is outlier | Average dampens outlier | Ensemble robustness benefit |
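The "models wildly disagree" row can be operationalized as a review flag. A sketch where the 5% spread threshold is an arbitrary tunable, not part of this skill:

```python
def disagreement_flag(forecasts, rel_threshold=0.05):
    """Flag for human review when model spread exceeds a fraction of the mean."""
    spread = max(forecasts) - min(forecasts)
    mean = sum(forecasts) / len(forecasts)
    return spread / abs(mean) > rel_threshold

print(disagreement_flag([1180, 1220, 1200]))  # False: spread is ~3% of the mean
print(disagreement_flag([900, 1500, 1200]))   # True: spread is 50% of the mean
```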
Gotchas
- Diversity is key: Combining 5 ARIMA variants adds little. Combine fundamentally different approaches (statistical + ML + judgmental).
- Weight instability: Optimal weights estimated on past data may not be optimal in the future. Simple average avoids this instability.
- Correlation between errors: If model errors are correlated (they often are), ensemble improvement is limited. Seek models with uncorrelated errors.
- Confidence intervals: Combining point forecasts is easy. Combining prediction intervals properly requires knowledge of error correlation structure.
- Over-engineering risk: For stable, well-understood series, a single well-tuned model may outperform an ensemble. Ensembles shine for uncertain or volatile series.
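The error-correlation gotcha can be measured directly on validation residuals. A sketch with made-up residual series:

```python
import numpy as np

def error_correlation(residuals):
    """Pairwise Pearson correlation of model residuals (one row per model).
    Values near 1 mean the models fail together, limiting ensemble gains."""
    return np.corrcoef(np.asarray(residuals, dtype=float))

residuals = [
    [1.0, -2.0, 3.0, -1.0, 2.0],   # model A errors
    [1.1, -1.9, 2.8, -1.2, 2.1],   # model B: nearly identical to A
    [-2.0, 1.0, -1.0, 2.0, -1.5],  # model C: moves differently
]
corr = error_correlation(residuals)
print(corr[0, 1])  # near 1: A and B are redundant
print(corr[0, 2])  # negative: C adds genuine diversity
```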
References
- For a survey of forecast combination methods, see references/combination-survey.md
- For the stacking meta-learner implementation, see references/stacking.md