Adaptive Walk-Forward Epoch Selection (AWFES)
Machine-readable reference for adaptive epoch selection within Walk-Forward Optimization (WFO). Optimizes training epochs per-fold using Walk-Forward Efficiency (WFE) as the objective.
When to Use This Skill
Use this skill when:
- Selecting optimal training epochs for ML models in WFO
- Avoiding overfitting via Walk-Forward Efficiency metrics
- Implementing per-fold adaptive epoch selection
- Computing efficient frontiers for epoch-performance trade-offs
- Carrying epoch priors across WFO folds
```python
from adaptive_wfo_epoch import AWFESConfig, compute_efficient_frontier
```
Generate epoch candidates from search bounds and granularity
```python
config = AWFESConfig.from_search_space(
    min_epoch=100,
    max_epoch=2000,
    granularity=5,  # Number of frontier points
)
```
config.epoch_configs → [100, 211, 447, 946, 2000] (log-spaced)
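The spacing rule can be checked with a standalone sketch of the `_generate_epoch_configs` logic (restated here as a free function so the snippet runs on its own):

```python
import numpy as np

def generate_epoch_configs(min_epoch: int, max_epoch: int, granularity: int) -> list[int]:
    # epoch_i = min × (max/min)^(i/(n-1)), rounded to int and deduplicated
    if granularity < 2:
        return [min_epoch]
    log_epochs = np.linspace(np.log(min_epoch), np.log(max_epoch), granularity)
    return sorted(set(int(round(np.exp(e))) for e in log_epochs))

print(generate_epoch_configs(100, 2000, 5))  # [100, 211, 447, 946, 2000]
```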
Per-fold epoch sweep
```python
for fold in wfo_folds:
    epoch_metrics = []
    for epoch in config.epoch_configs:
        is_sharpe, oos_sharpe = train_and_evaluate(fold, epochs=epoch)
        wfe = config.compute_wfe(is_sharpe, oos_sharpe, n_samples=len(fold.train))
        epoch_metrics.append({"epoch": epoch, "wfe": wfe, "is_sharpe": is_sharpe})
    # Select from the efficient frontier (returns frontier epochs and the pick)
    frontier_epochs, selected_epoch = compute_efficient_frontier(epoch_metrics)
    # Carry forward to the next fold as a prior
    prior_epoch = selected_epoch
```
Per-fold adaptive epoch selection works as follows:
- Train models across a range of epochs (e.g., 400, 800, 1000, 2000)
- Compute WFE = OOS_Sharpe / IS_Sharpe for each epoch count
- Find the "efficient frontier" - epochs maximizing WFE vs training cost
- Select optimal epoch from frontier for OOS evaluation
- Carry forward as prior for next fold
What This Is NOT
- NOT early stopping: Early stopping monitors validation loss continuously; this evaluates discrete candidates post-hoc
- NOT Bayesian optimization: No surrogate model; direct evaluation of all candidates
- NOT nested cross-validation: Uses temporal WFO, not shuffled splits
| Concept | Citation | Key Insight |
|---|---|---|
| Walk-Forward Efficiency | Pardo (1992, 2008) | WFE = OOS_Return / IS_Return as robustness metric |
| Deflated Sharpe Ratio | Bailey & López de Prado (2014) | Adjusts for multiple testing |
| Pareto-Optimal HP Selection | Bischl et al. (2023) | Multi-objective hyperparameter optimization |
| Warm-Starting | Nomura & Ono (2021) | Transfer knowledge between optimization runs |
See references/academic-foundations.md for full literature review.
Core Formula: Walk-Forward Efficiency
```python
def compute_wfe(
    is_sharpe: float,
    oos_sharpe: float,
    n_samples: int | None = None,
) -> float | None:
    """Walk-Forward Efficiency - measures performance transfer.

    WFE = OOS_Sharpe / IS_Sharpe

    Interpretation (guidelines, not hard thresholds):
    - WFE ≥ 0.70: Excellent transfer (low overfitting)
    - WFE 0.50-0.70: Good transfer
    - WFE 0.30-0.50: Moderate transfer (investigate)
    - WFE < 0.30: Severe overfitting (likely reject)

    The IS_Sharpe minimum is derived from the signal-to-noise ratio,
    not a fixed magic number. See compute_is_sharpe_threshold().

    Reference: Pardo (2008), "The Evaluation and Optimization of Trading Strategies"
    """
    # Data-driven threshold: IS_Sharpe must exceed the 2σ noise floor
    min_is_sharpe = compute_is_sharpe_threshold(n_samples) if n_samples else 0.1
    if abs(is_sharpe) < min_is_sharpe:
        return None
    return oos_sharpe / is_sharpe
```
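A quick usage sketch of the formula, with both functions restated so the snippet is self-contained (the numbers are illustrative, not from the source):

```python
import numpy as np

def compute_is_sharpe_threshold(n_samples=None):
    # Data-driven noise floor: 2/√n, with a 0.1 fallback (see derivation below)
    if n_samples is None or n_samples < 10:
        return 0.1
    return 2.0 / np.sqrt(n_samples)

def compute_wfe(is_sharpe, oos_sharpe, n_samples=None):
    min_is_sharpe = compute_is_sharpe_threshold(n_samples) if n_samples else 0.1
    if abs(is_sharpe) < min_is_sharpe:
        return None
    return oos_sharpe / is_sharpe

print(compute_wfe(1.2, 0.84, n_samples=400))   # ≈ 0.70 → excellent transfer
print(compute_wfe(0.05, 0.90, n_samples=400))  # None → IS_Sharpe below the 2/√400 = 0.1 floor
```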
Principled Configuration Framework
All parameters in AWFES are derived from first principles or data characteristics, not arbitrary magic numbers.
AWFESConfig: Unified Configuration
```python
from dataclasses import dataclass, field
from typing import Literal

import numpy as np


@dataclass
class AWFESConfig:
    """AWFES configuration with principled parameter derivation.

    No magic numbers - all values are derived from the search space or data.
    """

    # Search space bounds (user-specified)
    min_epoch: int
    max_epoch: int
    granularity: int  # Number of frontier points

    # Derived automatically
    epoch_configs: list[int] = field(init=False)
    prior_variance: float = field(init=False)
    observation_variance: float = field(init=False)

    # Market context for annualization
    # crypto_session_filtered: use when data is filtered to London-NY weekday hours
    market_type: Literal["crypto_24_7", "crypto_session_filtered", "equity", "forex"] = "crypto_24_7"
    time_unit: Literal["bar", "daily", "weekly"] = "weekly"

    def __post_init__(self):
        # Generate epoch configs with log spacing (optimal for frontier discovery)
        self.epoch_configs = self._generate_epoch_configs()
        # Derive Bayesian variances from the search space
        self.prior_variance, self.observation_variance = self._derive_variances()

    def _generate_epoch_configs(self) -> list[int]:
        """Generate epoch candidates with log spacing.

        Log spacing is optimal for the efficient frontier because:
        1. Early epochs: small changes matter more (underfit → fit transition)
        2. Late epochs: diminishing returns (already near convergence)
        3. Uniform coverage of the WFE vs cost trade-off space

        Formula: epoch_i = min × (max/min)^(i/(n-1))
        """
        if self.granularity < 2:
            return [self.min_epoch]
        log_min = np.log(self.min_epoch)
        log_max = np.log(self.max_epoch)
        log_epochs = np.linspace(log_min, log_max, self.granularity)
        return sorted(set(int(round(np.exp(e))) for e in log_epochs))

    def _derive_variances(self) -> tuple[float, float]:
        """Derive Bayesian variances from the search space.

        Principle: the prior should span the search space with ~95% coverage.
        For a Normal distribution: 95% CI = mean ± 1.96σ.
        If we want 95% of the prior mass in [min_epoch, max_epoch]:
            range = max - min = 2 × 1.96 × σ = 3.92σ
            σ = range / 3.92
            σ² = (range / 3.92)²

        Observation variance: set to achieve a reasonable learning rate.
        Rule: observation_variance ≈ prior_variance / 4.
        Each observation then updates the posterior meaningfully
        but does not dominate the prior immediately.
        """
        epoch_range = self.max_epoch - self.min_epoch
        prior_std = epoch_range / 3.92  # 95% CI spans the search space
        prior_variance = prior_std ** 2
        # Observation variance: 1/4 of prior for balanced learning
        # This gives ~0.2 weight to each new observation initially
        observation_variance = prior_variance / 4
        return prior_variance, observation_variance

    @classmethod
    def from_search_space(
        cls,
        min_epoch: int,
        max_epoch: int,
        granularity: int = 5,
        market_type: str = "crypto_24_7",
    ) -> "AWFESConfig":
        """Create a config from search space bounds."""
        return cls(
            min_epoch=min_epoch,
            max_epoch=max_epoch,
            granularity=granularity,
            market_type=market_type,
        )

    def compute_wfe(
        self,
        is_sharpe: float,
        oos_sharpe: float,
        n_samples: int | None = None,
    ) -> float | None:
        """Compute WFE with a data-driven IS_Sharpe threshold."""
        min_is = compute_is_sharpe_threshold(n_samples) if n_samples else 0.1
        if abs(is_sharpe) < min_is:
            return None
        return oos_sharpe / is_sharpe

    def get_annualization_factor(self) -> float:
        """Get the annualization factor to scale Sharpe from time_unit to ANNUAL.

        IMPORTANT: this returns sqrt(periods_per_year) for scaling to ANNUAL Sharpe.
        For daily-to-weekly scaling, use get_daily_to_weekly_factor() instead.

        Principled derivation:
        - Sharpe scales with √(periods per year)
        - Crypto 24/7: 365 days/year, 52.14 weeks/year
        - Crypto session-filtered: 252 days/year (like equity)
        - Equity: 252 trading days/year, ~52 weeks/year
        - Forex: ~252 days/year (varies by pair)
        """
        PERIODS_PER_YEAR = {
            ("crypto_24_7", "daily"): 365,
            ("crypto_24_7", "weekly"): 52.14,
            ("crypto_24_7", "bar"): None,  # Cannot annualize bars directly
            ("crypto_session_filtered", "daily"): 252,  # London-NY weekdays only
            ("crypto_session_filtered", "weekly"): 52,
            ("equity", "daily"): 252,
            ("equity", "weekly"): 52,
            ("forex", "daily"): 252,
        }
        key = (self.market_type, self.time_unit)
        periods = PERIODS_PER_YEAR.get(key)
        if periods is None:
            raise ValueError(
                f"Cannot annualize {self.time_unit} for {self.market_type}. "
                "Use daily or weekly aggregation first."
            )
        return np.sqrt(periods)

    def get_daily_to_weekly_factor(self) -> float:
        """Get the factor to scale DAILY Sharpe to WEEKLY Sharpe.

        This is different from get_annualization_factor()!
        - Daily → Weekly: sqrt(days_per_week)
        - Daily → Annual: sqrt(days_per_year) (use get_annualization_factor)

        Market-specific:
        - Crypto 24/7: sqrt(7) ≈ 2.65 (7 trading days/week)
        - Crypto session-filtered: sqrt(5) ≈ 2.24 (weekdays only)
        - Equity: sqrt(5) ≈ 2.24 (5 trading days/week)
        """
        DAYS_PER_WEEK = {
            "crypto_24_7": 7,
            "crypto_session_filtered": 5,  # London-NY weekdays only
            "equity": 5,
            "forex": 5,
        }
        days = DAYS_PER_WEEK.get(self.market_type)
        if days is None:
            raise ValueError(f"Unknown market type: {self.market_type}")
        return np.sqrt(days)
```
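The variance derivation can be checked by hand for the quick-start search space (min_epoch=100, max_epoch=2000) — a minimal arithmetic sketch of `_derive_variances`:

```python
# 95% of a Normal prior's mass lies within ±1.96σ, so a prior spanning
# [min_epoch, max_epoch] implies range = 3.92σ
epoch_range = 2000 - 100
prior_std = epoch_range / 3.92          # 95% CI spans the search space
prior_variance = prior_std ** 2
observation_variance = prior_variance / 4

print(round(prior_std, 1))              # 484.7
print(observation_variance / prior_variance)  # 0.25 — balanced learning-rate ratio
```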
IS_Sharpe Threshold: Signal-to-Noise Derivation
```python
import numpy as np


def compute_is_sharpe_threshold(n_samples: int | None = None) -> float:
    """Compute the minimum IS_Sharpe threshold from the signal-to-noise ratio.

    Principle: IS_Sharpe must be statistically distinguishable from zero.
    Under the null hypothesis (no skill), Sharpe ~ N(0, 1/√n).
    To reject the null at α=0.05 (one-sided), we need Sharpe > 1.645/√n.
    For practical use, we apply a 2σ threshold (≈97.7% confidence):
        threshold = 2.0 / √n

    This adapts to sample size:
    - n=100: threshold ≈ 0.20
    - n=400: threshold ≈ 0.10
    - n=1600: threshold ≈ 0.05

    Fallback for unknown n: 0.1 (assumes n≈400, a typical fold size).

    Rationale for the 0.1 fallback:
    - 2/√400 = 0.1, so 0.1 assumes ~400 samples per fold
    - This is conservative: 400 samples is typical for weekly folds
    - If the actual n is smaller, the threshold is looser (accepts more noise)
    - If the actual n is larger, the threshold is tighter (fine, we're conservative)
    - 0.1 also corresponds to "not statistically distinguishable from zero
      at reasonable sample sizes" - a natural floor for the Sharpe SE
    """
    if n_samples is None or n_samples < 10:
        # Conservative fallback: 0.1 assumes ~400 samples (typical fold size)
        # Derivation: 2/√400 = 0.1; see the rationale above
        return 0.1
    return 2.0 / np.sqrt(n_samples)
```
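The docstring's sample-size table follows directly from 2/√n; a standalone restatement to verify:

```python
import numpy as np

def compute_is_sharpe_threshold(n_samples=None):
    # 2σ noise floor for the Sharpe ratio; 0.1 fallback assumes n ≈ 400
    if n_samples is None or n_samples < 10:
        return 0.1
    return 2.0 / np.sqrt(n_samples)

for n in (100, 400, 1600, None):
    print(n, compute_is_sharpe_threshold(n))  # 0.2, 0.1, 0.05, then the 0.1 fallback
```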
Guardrails (Principled Guidelines)
G1: WFE Thresholds
The traditional thresholds (0.30, 0.50, 0.70) are guidelines based on practitioner consensus, not derived from first principles. They represent:
| Threshold | Meaning | Statistical Basis |
|---|---|---|
| 0.30 | Hard reject | Retaining <30% of IS performance is almost certainly noise |
| 0.50 | Warning | At 50%, half the signal is lost - investigate |
| 0.70 | Target | Industry standard for "good" transfer |
```python
# These are GUIDELINES, not hard rules.
# Adjust based on your domain and risk tolerance.
WFE_THRESHOLDS = {
    "hard_reject": 0.30,  # Below this: almost certainly overfitting
    "warning": 0.50,      # Below this: significant signal loss
    "target": 0.70,       # Above this: good generalization
}


def classify_wfe(wfe: float | None) -> str:
    """Classify WFE against the guideline thresholds."""
    if wfe is None:
        return "INVALID"  # IS_Sharpe below the noise floor
    if wfe < WFE_THRESHOLDS["hard_reject"]:
        return "REJECT"
    if wfe < WFE_THRESHOLDS["warning"]:
        return "INVESTIGATE"
    if wfe < WFE_THRESHOLDS["target"]:
        return "ACCEPTABLE"
    return "EXCELLENT"
```
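Classification in action — a self-contained sketch reusing the thresholds above (the input values are illustrative):

```python
WFE_THRESHOLDS = {"hard_reject": 0.30, "warning": 0.50, "target": 0.70}

def classify_wfe(wfe):
    # Map a WFE value to its guideline band
    if wfe is None:
        return "INVALID"
    if wfe < WFE_THRESHOLDS["hard_reject"]:
        return "REJECT"
    if wfe < WFE_THRESHOLDS["warning"]:
        return "INVESTIGATE"
    if wfe < WFE_THRESHOLDS["target"]:
        return "ACCEPTABLE"
    return "EXCELLENT"

print(classify_wfe(0.75))  # EXCELLENT
print(classify_wfe(0.42))  # INVESTIGATE
print(classify_wfe(0.10))  # REJECT
print(classify_wfe(None))  # INVALID
```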
G2: IS_Sharpe Minimum (Data-Driven)
WRONG: fixed threshold regardless of sample size

```python
if is_sharpe < 1.0:
    wfe = None
```
CORRECT: the threshold adapts to sample size

```python
min_is_sharpe = compute_is_sharpe_threshold(n_samples)
if is_sharpe < min_is_sharpe:
    wfe = None  # Below the noise floor for this sample size
```

The threshold derives from the standard error of the Sharpe ratio: SE(SR) ≈ 1/√n.
**Note on the SE(Sharpe) approximation**: the formula `1/√n` is a first-order approximation valid when SR is small (close to 0). The full Lo (2002) formula is:
SE(SR) = √((1 + 0.5×SR²) / n)
For high-Sharpe strategies (SR > 1.0), the simplified formula underestimates SE by ~25-50%. Use the full formula when evaluating strategies with SR > 1.0.
G3: Stability Penalty for Epoch Changes (Adaptive)
The stability penalty prevents hyperparameter churn. Instead of fixed thresholds, use relative improvement based on WFE variance:
```python
import numpy as np


def compute_stability_threshold(wfe_history: list[float]) -> float:
    """Compute the stability threshold from observed WFE variance.

    Principle: require improvement that exceeds the noise level.
    If WFE has std=0.15 across folds, random fluctuation could be ±0.15.
    To distinguish signal from noise, require improvement > 1σ of WFE.

    Minimum: 5% (prevent switching on negligible improvements)
    Maximum: 20% (don't be overly conservative)
    """
    if len(wfe_history) < 3:
        return 0.10  # Default until enough history accumulates
    wfe_std = np.std(wfe_history)
    return max(0.05, min(0.20, wfe_std))


class AdaptiveStabilityPenalty:
    """Stability penalty that adapts to observed WFE variance."""

    def __init__(self):
        self.wfe_history: list[float] = []
        self.epoch_changes: list[int] = []

    def should_change_epoch(
        self,
        current_wfe: float,
        candidate_wfe: float,
        current_epoch: int,
        candidate_epoch: int,
    ) -> bool:
        """Decide whether to change epochs based on the adaptive threshold."""
        self.wfe_history.append(current_wfe)
        if current_epoch == candidate_epoch:
            return False  # Same epoch, no change needed
        threshold = compute_stability_threshold(self.wfe_history)
        improvement = (candidate_wfe - current_wfe) / max(abs(current_wfe), 0.01)
        if improvement > threshold:
            self.epoch_changes.append(len(self.wfe_history))
            return True
        return False  # Improvement not significant
```
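A small check of the clamping behaviour, restating `compute_stability_threshold` so it runs standalone (the histories are made-up examples):

```python
import numpy as np

def compute_stability_threshold(wfe_history):
    # 1σ of observed WFE, clamped to the [0.05, 0.20] band
    if len(wfe_history) < 3:
        return 0.10
    return max(0.05, min(0.20, float(np.std(wfe_history))))

print(compute_stability_threshold([0.6, 0.7]))                # 0.1 — default: too little history
print(compute_stability_threshold([0.60, 0.62, 0.61, 0.63]))  # 0.05 — floor: very stable WFE
print(compute_stability_threshold([0.2, 0.9, 0.1, 0.8]))      # 0.2 — cap: very noisy WFE
```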
G4: DSR Adjustment for Epoch Search (Principled)
```python
from math import log, pi, sqrt


def adjusted_dsr_for_epoch_search(
    sharpe: float,
    n_folds: int,
    n_epochs: int,
    sharpe_se: float | None = None,
    n_samples_per_fold: int | None = None,
) -> float:
    """Deflated Sharpe Ratio accounting for epoch-selection multiplicity.

    When selecting from K epochs, the expected maximum Sharpe under the null
    is inflated. This adjustment corrects for that selection bias.

    Principled SE estimation:
    - If n_samples is provided: SE(Sharpe) ≈ 1/√n
    - Otherwise: estimate from a typical fold size

    Reference: Bailey & López de Prado (2014); Gumbel distribution
    """
    n_trials = n_folds * n_epochs  # Total selection events
    if n_trials < 2:
        return sharpe  # No multiple-testing correction needed

    # Expected maximum under the null (Gumbel approximation):
    # E[max(Z_1, ..., Z_n)] ≈ √(2·ln(n)) - (γ + ln(π/2)) / √(2·ln(n))
    # where γ ≈ 0.5772 is the Euler-Mascheroni constant
    euler_gamma = 0.5772156649
    sqrt_2_log_n = sqrt(2 * log(n_trials))
    e_max_z = sqrt_2_log_n - (euler_gamma + log(pi / 2)) / sqrt_2_log_n

    # Estimate the Sharpe SE if not provided
    if sharpe_se is None:
        if n_samples_per_fold is not None:
            sharpe_se = 1.0 / sqrt(n_samples_per_fold)
        else:
            # Conservative default: assume ~300 samples per fold
            sharpe_se = 1.0 / sqrt(300)

    # Expected maximum Sharpe under the null
    e_max_sharpe = e_max_z * sharpe_se

    # Deflated Sharpe
    return max(0, sharpe - e_max_sharpe)
```
Example: for 5 epochs × 50 folds = 250 trials with 300 samples per fold, a Sharpe of 1.0 deflates to 0.83 after adjustment.
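The worked number can be reproduced directly from the Gumbel formula above:

```python
from math import log, pi, sqrt

n_trials = 5 * 50                              # 5 epochs × 50 folds
euler_gamma = 0.5772156649
s = sqrt(2 * log(n_trials))
e_max_z = s - (euler_gamma + log(pi / 2)) / s  # expected max Z under the null
deflated = max(0, 1.0 - e_max_z / sqrt(300))   # SE(Sharpe) ≈ 1/√300
print(round(deflated, 2))                      # 0.83
```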
WFE Aggregation Methods
WARNING: Cauchy Distribution Under Null
Under the null hypothesis (no predictive skill), WFE follows a Cauchy distribution, which has:
- No defined mean (undefined expectation)
- No defined variance (infinite)
- Heavy tails (extreme values are common)
This makes the arithmetic mean unreliable: a single extreme WFE can dominate the average. Always prefer median or pooled methods for robust WFE aggregation. See references/mathematical-formulation.md for the proof that WFE | H0 ~ Cauchy(0, sqrt(T_IS/T_OOS)).
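A quick simulation makes the point: the ratio of two independent standard normals is Cauchy-distributed, so under the null the sample median of WFE concentrates near zero while the sample mean wanders with the extremes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Ratio of independent standard normals ~ Cauchy(0, 1)
samples = rng.standard_normal(100_000) / rng.standard_normal(100_000)

print(abs(np.median(samples)) < 0.05)  # True — the median concentrates near 0
# The mean is dominated by a handful of extreme ratios and is unreliable;
# rerunning with a different seed can move it by orders of magnitude.
print(np.mean(samples))
```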
Method 1: Pooled WFE (recommended for precision weighting)
```python
def pooled_wfe(fold_results: list[dict]) -> float:
    """Weights each fold by its sample size (precision).

    Formula: Σ(T_OOS × SR_OOS) / Σ(T_IS × SR_IS)

    Advantage: more stable than the arithmetic mean; handles varying fold sizes.
    Use when: fold sizes vary significantly.
    """
    numerator = sum(r["n_oos"] * r["oos_sharpe"] for r in fold_results)
    denominator = sum(r["n_is"] * r["is_sharpe"] for r in fold_results)
    if denominator < 1e-10:
        return float("nan")
    return numerator / denominator
```
Method 2: Median WFE (Recommended for robustness)
```python
import numpy as np


def median_wfe(fold_results: list[dict]) -> float:
    """Robust to outliers; standard in robust statistics.

    Advantage: a single extreme fold doesn't dominate.
    Use when: outlier folds are suspected (regime changes, data issues).
    """
    wfes = [r["wfe"] for r in fold_results if r["wfe"] is not None]
    return float(np.median(wfes)) if wfes else float("nan")
```
Method 3: Weighted Arithmetic Mean
```python
def weighted_mean_wfe(fold_results: list[dict]) -> float:
    """Weights by inverse variance (efficiency weighting).

    Formula: Σ(w_i × WFE_i) / Σ(w_i)
    where w_i = 1 / Var(WFE_i) ≈ n_oos × n_is / (n_oos + n_is)

    Advantage: optimal when combining estimates of differing precision.
    Use when: all folds have similar characteristics.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for r in fold_results:
        if r["wfe"] is None:
            continue
        weight = r["n_oos"] * r["n_is"] / (r["n_oos"] + r["n_is"] + 1e-10)
        weighted_sum += weight * r["wfe"]
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else float("nan")
```
Aggregation Selection Guide
| Scenario | Recommended Method | Rationale |
|---|---|---|
| Variable fold sizes | Pooled WFE | Weights by precision |
| Suspected outliers | Median WFE | Robust to extremes |
| Homogeneous folds | Weighted mean | Optimal efficiency |
| Reporting | All three | Cross-check consistency |
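A side-by-side sketch on toy fold results (pooled and median restated so the snippet is self-contained; the fold numbers are invented to show the outlier effect):

```python
import numpy as np

def pooled_wfe(rs):
    num = sum(r["n_oos"] * r["oos_sharpe"] for r in rs)
    den = sum(r["n_is"] * r["is_sharpe"] for r in rs)
    return float("nan") if den < 1e-10 else num / den

def median_wfe(rs):
    wfes = [r["wfe"] for r in rs if r["wfe"] is not None]
    return float(np.median(wfes)) if wfes else float("nan")

folds = [
    {"n_is": 400, "n_oos": 100, "is_sharpe": 1.2, "oos_sharpe": 0.80, "wfe": 0.67},
    {"n_is": 400, "n_oos": 100, "is_sharpe": 1.0, "oos_sharpe": 0.60, "wfe": 0.60},
    {"n_is": 200, "n_oos": 50,  "is_sharpe": 0.9, "oos_sharpe": 5.00, "wfe": 5.56},  # outlier fold
]
print(round(pooled_wfe(folds), 3))  # 0.368 — the outlier still pulls the pooled estimate up
print(median_wfe(folds))            # 0.67  — the median ignores the outlier
```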
Efficient Frontier Algorithm
```python
import numpy as np


def compute_efficient_frontier(
    epoch_metrics: list[dict],
    wfe_weight: float = 1.0,
    time_weight: float = 0.1,
) -> tuple[list[int], int]:
    """Find the Pareto-optimal epochs and select the best one.

    An epoch is on the frontier if no other epoch dominates it
    (better WFE AND lower training time).

    Args:
        epoch_metrics: list of {epoch, wfe, training_time_sec}
        wfe_weight: weight for WFE in selection (higher = prefer generalization)
        time_weight: weight for training time (higher = prefer speed)

    Returns:
        (frontier_epochs, selected_epoch)
    """
    # Filter valid metrics
    valid = [
        (m["epoch"], m["wfe"], m.get("training_time_sec", m["epoch"]))
        for m in epoch_metrics
        if m["wfe"] is not None and np.isfinite(m["wfe"])
    ]
    if not valid:
        # Fallback: return the epoch with the best OOS Sharpe
        best_oos = max(epoch_metrics, key=lambda m: m.get("oos_sharpe", 0))
        return ([best_oos["epoch"]], best_oos["epoch"])

    # Pareto dominance check
    frontier = []
    for i, (epoch_i, wfe_i, time_i) in enumerate(valid):
        dominated = False
        for j, (epoch_j, wfe_j, time_j) in enumerate(valid):
            if i == j:
                continue
            # j dominates i if: better/equal WFE AND lower/equal time
            # (strict in at least one)
            if (wfe_j >= wfe_i and time_j <= time_i and
                    (wfe_j > wfe_i or time_j < time_i)):
                dominated = True
                break
        if not dominated:
            frontier.append((epoch_i, wfe_i, time_i))

    frontier_epochs = [e for e, _, _ in frontier]
    if len(frontier) == 1:
        return (frontier_epochs, frontier[0][0])

    # Weighted score selection
    wfes = np.array([w for _, w, _ in frontier])
    times = np.array([t for _, _, t in frontier])
    wfe_norm = (wfes - wfes.min()) / (wfes.max() - wfes.min() + 1e-10)
    time_norm = (times.max() - times) / (times.max() - times.min() + 1e-10)
    scores = wfe_weight * wfe_norm + time_weight * time_norm
    best_idx = np.argmax(scores)
    return (frontier_epochs, frontier[best_idx][0])
```
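The dominance rule in isolation, applied to hypothetical (epoch, wfe, training_time) tuples:

```python
# Hypothetical sweep results: (epoch, wfe, training_time_sec)
points = [(100, 0.40, 10), (447, 0.65, 45), (946, 0.66, 95), (2000, 0.60, 200)]

def dominates(b, a):
    # b dominates a: better/equal WFE AND lower/equal time, strict in at least one
    return b[1] >= a[1] and b[2] <= a[2] and (b[1] > a[1] or b[2] < a[2])

frontier = [p for p in points if not any(dominates(q, p) for q in points if q is not p)]
print([e for e, _, _ in frontier])  # [100, 447, 946] — 2000 epochs is dominated by 946
```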
Carry-Forward Mechanism
```python
class AdaptiveEpochSelector:
    """Maintains epoch-selection state across WFO folds with adaptive stability."""

    def __init__(self, epoch_configs: list[int]):
        self.epoch_configs = epoch_configs
        self.selection_history: list[dict] = []
        self.last_selected: int | None = None
        self.stability = AdaptiveStabilityPenalty()  # Adaptive, not fixed

    def select_epoch(self, epoch_metrics: list[dict]) -> int:
        """Select an epoch, applying the adaptive stability penalty to changes."""
        frontier_epochs, candidate = compute_efficient_frontier(epoch_metrics)

        # Apply the adaptive stability penalty if changing epochs
        if self.last_selected is not None and candidate != self.last_selected:
            candidate_wfe = next(
                m["wfe"] for m in epoch_metrics if m["epoch"] == candidate
            )
            last_wfe = next(
                (m["wfe"] for m in epoch_metrics if m["epoch"] == self.last_selected),
                0.0,
            )
            # Use the adaptive threshold derived from WFE variance
            if not self.stability.should_change_epoch(
                last_wfe, candidate_wfe, self.last_selected, candidate
            ):
                candidate = self.last_selected

        # Record and return
        self.selection_history.append({
            "epoch": candidate,
            "frontier": frontier_epochs,
            "changed": candidate != self.last_selected,
        })
        self.last_selected = candidate
        return candidate
```
"changed": candidate != self.last_selected,
})
self.last_selected = candidate
return candidate
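The selector above depends on an `AdaptiveStabilityPenalty` that is referenced but defined outside this excerpt. A minimal sketch consistent with the described behavior (a change threshold derived from the variance of observed WFE values) might look like the following; the `k_sigma` and `min_history` parameters and their defaults are our assumptions, not the canonical implementation.

```python
import math

class AdaptiveStabilityPenalty:
    """Gate epoch changes behind a WFE-improvement threshold derived
    from the observed WFE variance (hypothetical sketch, not the
    canonical class used by AdaptiveEpochSelector)."""

    def __init__(self, k_sigma: float = 0.5, min_history: int = 3):
        self.k_sigma = k_sigma          # required improvement in WFE std-devs
        self.min_history = min_history  # folds observed before threshold adapts
        self.wfe_history: list[float] = []

    def should_change_epoch(
        self, last_wfe: float, candidate_wfe: float,
        last_epoch: int, candidate_epoch: int,
    ) -> bool:
        self.wfe_history.append(candidate_wfe)
        if len(self.wfe_history) < self.min_history:
            # Too little data to estimate noise: require any improvement
            return candidate_wfe > last_wfe
        mean = sum(self.wfe_history) / len(self.wfe_history)
        var = sum((w - mean) ** 2 for w in self.wfe_history) / len(self.wfe_history)
        threshold = self.k_sigma * math.sqrt(var)
        # Change only if the WFE gain exceeds k_sigma noise std-devs
        return (candidate_wfe - last_wfe) > threshold
```

With this shape, noisy folds raise the WFE variance and therefore the bar for switching epochs, which is the stabilizing effect the selector relies on.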
| Anti-Pattern | Symptom | Fix | Severity |
|---|---|---|---|
| Expanding window (range bars) | Train size grows per fold | Use fixed sliding window | CRITICAL |
| Peak picking | Best epoch always at sweep boundary | Expand range, check for plateau | HIGH |
| Insufficient folds | effective_n < 30 | Increase folds or data span | HIGH |
| Ignoring temporal autocorr | Folds correlated | Use purged CV, gap between folds | HIGH |
| Overfitting to IS | IS >> OOS Sharpe | Reduce epochs, add regularization | HIGH |
| sqrt(252) for crypto | Inflated Sharpe | Use sqrt(365) or sqrt(7) weekly | MEDIUM |
| Single epoch selection | No uncertainty quantification | Report confidence interval | MEDIUM |
| Meta-overfitting | Epoch selection itself overfits | Limit to 3-4 candidates max | HIGH |
CRITICAL: Never use expanding window for range bar ML training. Expanding windows create fold non-equivalence, regime dilution, and systematically bias risk metrics. See references/anti-patterns.md for the full analysis (Section 7).
| 反模式 | 症状 | 修复方案 | 严重程度 |
|---|---|---|---|
| 扩展窗口(范围bar) | 训练集大小随折增长 | 使用固定滑动窗口 | 严重 |
| 峰值选取 | 最优Epoch始终在扫描边界 | 扩大范围,检查是否存在平台 | 高 |
| 折数量不足 | effective_n < 30 | 增加折数量或数据跨度 | 高 |
| 忽略时序自相关 | 折之间存在相关性 | 使用净化CV,在折之间添加间隔 | 高 |
| 过拟合到IS | IS >> OOS Sharpe | 减少Epoch,添加正则化 | 高 |
| 对加密货币使用sqrt(252) | Sharpe被高估 | 使用sqrt(365)或周度的sqrt(7) | 中 |
| 单个Epoch选择 | 无不确定性量化 | 报告置信区间 | 中 |
| 元过拟合 | Epoch选择本身过拟合 | 最多限制为3-4个候选Epoch | 高 |
严重警告:永远不要对范围bar ML训练使用扩展窗口。扩展窗口会导致折非等价、regime稀释,并系统性地偏置风险指标。完整分析请参见references/anti-patterns.md(第7节)。
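The sqrt(252) anti-pattern above comes down to the annualization factor matching the market's trading calendar. A minimal sketch, assuming daily return aggregation (the function name is ours, not part of the skill's API):

```python
import math

def annualization_factor(market_type: str) -> float:
    """Daily-to-annual Sharpe scaling: sqrt(trading days per year).
    Crypto trades 24/7 (365 days); equities have ~252 sessions."""
    days = {"crypto_24_7": 365, "equities": 252}[market_type]
    return math.sqrt(days)

# Applying the equity factor to a 24/7 market mis-scales the
# annualized Sharpe, per the anti-pattern table:
crypto = annualization_factor("crypto_24_7")   # sqrt(365) ≈ 19.10
equity = annualization_factor("equities")      # sqrt(252) ≈ 15.87
```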
See references/epoch-selection-decision-tree.md for the full practitioner decision tree.
Start
│
├─ IS_Sharpe > compute_is_sharpe_threshold(n)? ──NO──> Mark WFE invalid, use fallback
│ │ (threshold = 2/√n, adapts to sample size)
│ YES
│ │
├─ Compute WFE for each epoch
│ │
├─ Any WFE > 0.30? ──NO──> REJECT all epochs (severe overfit)
│ │ (guideline, not hard threshold)
│ YES
│ │
├─ Compute efficient frontier
│ │
├─ Apply AdaptiveStabilityPenalty
│ │ (threshold derived from WFE variance)
└─> Return selected epoch
完整从业者决策树请参见references/epoch-selection-decision-tree.md。
开始
│
├─ IS_Sharpe > compute_is_sharpe_threshold(n)? ──NO──> 标记WFE无效,使用回退方案
│ │ (阈值 = 2/√n,随样本量调整)
│ YES
│ │
├─ 为每个Epoch计算WFE
│ │
├─ 是否有WFE > 0.30? ──NO──> 拒绝所有Epoch(严重过拟合)
│ │ (指南,非硬性阈值)
│ YES
│ │
├─ 计算有效前沿
│ │
├─ 应用AdaptiveStabilityPenalty
│ │ (阈值从WFE方差推导)
└─> 返回选中的Epoch
Integration with rangebar-eval-metrics
与rangebar-eval-metrics集成
This skill extends rangebar-eval-metrics:
| Metric Source | Used For | Reference |
|---|---|---|
| sharpe_tw (IS and OOS) | WFE numerator (OOS) and denominator (IS) | range-bar-metrics.md |
| n_bars | Sample size for aggregation weights | metrics-schema.md |
| psr, dsr | Final acceptance criteria | sharpe-formulas.md |
| | Validate model isn't collapsed | ml-prediction-quality.md |
| | Model health check | ml-prediction-quality.md |
| Extended risk metrics | Deep risk analysis (optional) | risk-metrics.md |
本工具扩展了rangebar-eval-metrics:
| 指标来源 | 用途 | 参考链接 |
|---|---|---|
| sharpe_tw(IS与OOS) | WFE的分子(OOS)和分母(IS) | range-bar-metrics.md |
| n_bars | 聚合权重的样本大小 | metrics-schema.md |
| psr, dsr | 最终验收标准 | sharpe-formulas.md |
| | 验证模型未崩溃 | ml-prediction-quality.md |
| | 模型健康检查 | ml-prediction-quality.md |
| 扩展风险指标 | 深度风险分析(可选) | risk-metrics.md |
Recommended Workflow
推荐工作流
- Compute base metrics using rangebar-eval-metrics:compute_metrics.py
- Feed to AWFES for epoch selection with WFE as the primary signal
- Validate with PSR and DSR thresholds before deployment
- Monitor prediction-quality metrics (ml-prediction-quality.md) for model health
- 计算基础指标:使用rangebar-eval-metrics:compute_metrics.py
- 输入到AWFES:以WFE为主要信号进行Epoch选择
- 验证:部署前需满足PSR和DSR阈值
- 监控:通过预测质量指标(ml-prediction-quality.md)确保模型健康
OOS Application Phase
OOS应用阶段
After epoch selection via efficient frontier, apply the selected epochs to held-out test data for final OOS performance metrics. This phase produces "live trading" results that simulate deployment.
通过有效前沿选择Epoch后,将选中的Epoch应用于保留的测试数据以获取最终OOS性能指标。此阶段生成模拟部署的“实盘交易”结果。
Nested WFO Structure
嵌套WFO结构
AWFES uses Nested WFO with three data splits per fold:
AWFES: Nested WFO Data Split (per fold)
############# +----------+ +---------+ +----------+ #==========#
# Train 60% # --> | Gap 6% A | --> | Val 20% | --> | Gap 6% B | --> H Test 20% H
############# +----------+ +---------+ +----------+ #==========#
<details>
<summary>graph-easy source</summary>
graph { label: "AWFES: Nested WFO Data Split (per fold)"; flow: east; }
[ Train 60% ] { border: bold; }
[ Gap 6% A ]
[ Val 20% ]
[ Gap 6% B ]
[ Test 20% ] { border: double; }
[ Train 60% ] -> [ Gap 6% A ]
[ Gap 6% A ] -> [ Val 20% ]
[ Val 20% ] -> [ Gap 6% B ]
[ Gap 6% B ] -> [ Test 20% ]
</details>
AWFES使用嵌套WFO,每个折包含三个数据拆分:
AWFES: 嵌套WFO数据拆分(每个折)
############# +----------+ +---------+ +----------+ #==========#
# 训练集60% # --> | 间隔6% A | --> | 验证集20% | --> | 间隔6% B | --> H 测试集20% H
############# +----------+ +---------+ +----------+ #==========#
<details>
<summary>graph-easy源码</summary>
graph { label: "AWFES: 嵌套WFO数据拆分(每个折)"; flow: east; }
[ 训练集60% ] { border: bold; }
[ 间隔6% A ]
[ 验证集20% ]
[ 间隔6% B ]
[ 测试集20% ] { border: double; }
[ 训练集60% ] -> [ 间隔6% A ]
[ 间隔6% A ] -> [ 验证集20% ]
[ 验证集20% ] -> [ 间隔6% B ]
[ 间隔6% B ] -> [ 测试集20% ]
</details>
AWFES: Per-Fold Workflow
-----------------------
| Fold i Data |
-----------------------
|
v
+-----------------------+
| Split: Train/Val/Test |
+-----------------------+
|
v
+-----------------------+
| Epoch Sweep on Train |
+-----------------------+
|
v
+-----------------------+
| Compute WFE on Val |
+-----------------------+
|
| val optimal
v
#=======================#
H Bayesian Update H
#=======================#
|
| smoothed epoch
v
+-----------------------+
| Train Final Model |
+-----------------------+
|
v
#=======================#
H Evaluate on Test H
#=======================#
|
v
-----------------------
| Fold i Metrics |
-----------------------
<details>
<summary>graph-easy source</summary>
graph { label: "AWFES: Per-Fold Workflow"; flow: south; }
[ Fold i Data ] { shape: rounded; }
[ Split: Train/Val/Test ]
[ Epoch Sweep on Train ]
[ Compute WFE on Val ]
[ Bayesian Update ] { border: double; }
[ Train Final Model ]
[ Evaluate on Test ] { border: double; }
[ Fold i Metrics ] { shape: rounded; }
[ Fold i Data ] -> [ Split: Train/Val/Test ]
[ Split: Train/Val/Test ] -> [ Epoch Sweep on Train ]
[ Epoch Sweep on Train ] -> [ Compute WFE on Val ]
[ Compute WFE on Val ] -- val optimal --> [ Bayesian Update ]
[ Bayesian Update ] -- smoothed epoch --> [ Train Final Model ]
[ Train Final Model ] -> [ Evaluate on Test ]
[ Evaluate on Test ] -> [ Fold i Metrics ]
</details>
AWFES: 单折工作流
-----------------------
| 折i数据 |
-----------------------
|
v
+-----------------------+
| 拆分:训练/验证/测试 |
+-----------------------+
|
v
+-----------------------+
| 在训练集上进行Epoch扫描 |
+-----------------------+
|
v
+-----------------------+
| 在验证集上计算WFE |
+-----------------------+
|
| 验证集最优
v
#=======================#
H 贝叶斯更新 H
#=======================#
|
| 平滑后的Epoch
v
+-----------------------+
| 训练最终模型 |
+-----------------------+
|
v
#=======================#
H 在测试集上评估 H
#=======================#
|
v
-----------------------
| 折i指标 |
-----------------------
<details>
<summary>graph-easy源码</summary>
graph { label: "AWFES: 单折工作流"; flow: south; }
[ 折i数据 ] { shape: rounded; }
[ 拆分:训练/验证/测试 ]
[ 在训练集上进行Epoch扫描 ]
[ 在验证集上计算WFE ]
[ 贝叶斯更新 ] { border: double; }
[ 训练最终模型 ]
[ 在测试集上评估 ] { border: double; }
[ 折i指标 ] { shape: rounded; }
[ 折i数据 ] -> [ 拆分:训练/验证/测试 ]
[ 拆分:训练/验证/测试 ] -> [ 在训练集上进行Epoch扫描 ]
[ 在训练集上进行Epoch扫描 ] -> [ 在验证集上计算WFE ]
[ 在验证集上计算WFE ] -- 验证集最优 --> [ 贝叶斯更新 ]
[ 贝叶斯更新 ] -- 平滑后的Epoch --> [ 训练最终模型 ]
[ 训练最终模型 ] -> [ 在测试集上评估 ]
[ 在测试集上评估 ] -> [ 折i指标 ]
</details>
Bayesian Carry-Forward Across Folds
跨折贝叶斯传递
AWFES: Bayesian Carry-Forward Across Folds
------- init +--------+ posterior +--------+ posterior +--------+ +--------+ -----------
| Prior | ------> | Fold 1 | -----------> | Fold 2 | -----------> | Fold 3 | ..> | Fold N | --> | Aggregate |
------- +--------+ +--------+ +--------+ +--------+ -----------
<details>
<summary>graph-easy source</summary>
graph { label: "AWFES: Bayesian Carry-Forward Across Folds"; flow: east; }
[ Prior ] { shape: rounded; }
[ Fold 1 ]
[ Fold 2 ]
[ Fold 3 ]
[ Fold N ]
[ Aggregate ] { shape: rounded; }
[ Prior ] -- init --> [ Fold 1 ]
[ Fold 1 ] -- posterior --> [ Fold 2 ]
[ Fold 2 ] -- posterior --> [ Fold 3 ]
[ Fold 3 ] ..> [ Fold N ]
[ Fold N ] -> [ Aggregate ]
</details>
AWFES: 跨折贝叶斯传递
------- 初始化 +--------+ 后验 +--------+ 后验 +--------+ +--------+ -----------
| 先验 | ------> | 折1 | -----------> | 折2 | -----------> | 折3 | ..> | 折N | --> | 聚合 |
------- +--------+ +--------+ +--------+ +--------+ -----------
<details>
<summary>graph-easy源码</summary>
graph { label: "AWFES: 跨折贝叶斯传递"; flow: east; }
[ 先验 ] { shape: rounded; }
[ 折1 ]
[ 折2 ]
[ 折3 ]
[ 折N ]
[ 聚合 ] { shape: rounded; }
[ 先验 ] -- 初始化 --> [ 折1 ]
[ 折1 ] -- 后验 --> [ 折2 ]
[ 折2 ] -- 后验 --> [ 折3 ]
[ 折3 ] ..> [ 折N ]
[ 折N ] -> [ 聚合 ]
</details>
Bayesian Epoch Selection for OOS
用于OOS的贝叶斯Epoch选择
Instead of using the current fold's optimal epoch (look-ahead bias), use Bayesian-smoothed epoch from prior folds:
python
import numpy as np

class BayesianEpochSelector:
"""Bayesian updating of epoch selection across folds.
Also known as: BayesianEpochSmoother (alias in epoch-smoothing.md)
Variance parameters are DERIVED from search space, not hard-coded.
See AWFESConfig._derive_variances() for the principled derivation.
"""
def __init__(
self,
epoch_configs: list[int],
prior_mean: float | None = None,
prior_variance: float | None = None,
observation_variance: float | None = None,
):
self.epoch_configs = sorted(epoch_configs)
# PRINCIPLED DERIVATION: Variances from search space
# If not provided, derive from epoch range
epoch_range = max(epoch_configs) - min(epoch_configs)
# Prior spans search space with 95% coverage
# 95% CI = mean ± 1.96σ → range = 3.92σ → σ² = (range/3.92)²
default_prior_var = (epoch_range / 3.92) ** 2
# Observation variance: 1/4 of prior for balanced learning
default_obs_var = default_prior_var / 4
self.posterior_mean = prior_mean if prior_mean is not None else float(np.mean(epoch_configs))
self.posterior_variance = prior_variance if prior_variance is not None else default_prior_var
self.observation_variance = observation_variance if observation_variance is not None else default_obs_var
self.history: list[dict] = []
def update(self, observed_optimal_epoch: int, wfe: float) -> int:
"""Update posterior with new fold's optimal epoch.
Uses precision-weighted Bayesian update:
posterior_mean = (prior_precision * prior_mean + obs_precision * obs) /
(prior_precision + obs_precision)
Args:
observed_optimal_epoch: Optimal epoch from current fold's validation
wfe: Walk-Forward Efficiency (used to weight observation)
Returns:
Smoothed epoch selection for TEST evaluation
"""
# Weight observation by WFE (higher WFE = more reliable signal)
# Clamp WFE to [0.1, 2.0] to prevent extreme weights:
# - Lower bound 0.1: Prevents division issues and ensures minimum weight
# - Upper bound 2.0: WFE > 2 is suspicious (OOS > 2× IS suggests:
# a) Regime shift favoring OOS (lucky timing, not skill)
# b) IS severely overfit (artificially low denominator)
# c) Data anomaly or look-ahead bias
# Capping at 2.0 treats such observations with skepticism
wfe_clamped = max(0.1, min(wfe, 2.0))
effective_variance = self.observation_variance / wfe_clamped
prior_precision = 1.0 / self.posterior_variance
obs_precision = 1.0 / effective_variance
# Bayesian update
new_precision = prior_precision + obs_precision
new_mean = (
prior_precision * self.posterior_mean +
obs_precision * observed_optimal_epoch
) / new_precision
# Record before updating
self.history.append({
"observed_epoch": observed_optimal_epoch,
"wfe": wfe,
"prior_mean": self.posterior_mean,
"posterior_mean": new_mean,
"selected_epoch": self._snap_to_config(new_mean),
})
self.posterior_mean = new_mean
self.posterior_variance = 1.0 / new_precision
return self._snap_to_config(new_mean)
def _snap_to_config(self, continuous_epoch: float) -> int:
"""Snap continuous estimate to nearest valid epoch config."""
return min(self.epoch_configs, key=lambda e: abs(e - continuous_epoch))
def get_current_epoch(self) -> int:
"""Get current smoothed epoch without updating."""
return self._snap_to_config(self.posterior_mean)
不要使用当前折的最优Epoch(前瞻偏差),而是使用来自先验折的贝叶斯平滑Epoch:
python
import numpy as np

class BayesianEpochSelector:
"""跨折贝叶斯更新Epoch选择。
也称为:BayesianEpochSmoother(在epoch-smoothing.md中的别名)
方差参数从搜索空间推导,而非硬编码。
推导细节请参见AWFESConfig._derive_variances()。
"""
def __init__(
self,
epoch_configs: list[int],
prior_mean: float | None = None,
prior_variance: float | None = None,
observation_variance: float | None = None,
):
self.epoch_configs = sorted(epoch_configs)
# 原则性推导:从搜索空间得到方差
# 如果未提供,从Epoch范围推导
epoch_range = max(epoch_configs) - min(epoch_configs)
# 先验分布覆盖搜索空间的95%范围
# 95%置信区间 = 均值 ± 1.96σ → 范围 = 3.92σ → σ² = (range/3.92)²
default_prior_var = (epoch_range / 3.92) ** 2
# 观测方差:先验的1/4,实现平衡学习
default_obs_var = default_prior_var / 4
self.posterior_mean = prior_mean if prior_mean is not None else float(np.mean(epoch_configs))
self.posterior_variance = prior_variance if prior_variance is not None else default_prior_var
self.observation_variance = observation_variance if observation_variance is not None else default_obs_var
self.history: list[dict] = []
def update(self, observed_optimal_epoch: int, wfe: float) -> int:
"""使用当前折的验证最优Epoch更新后验分布。
使用精度加权贝叶斯更新:
posterior_mean = (prior_precision * prior_mean + obs_precision * obs) /
(prior_precision + obs_precision)
参数:
observed_optimal_epoch: 当前折验证集的最优Epoch
wfe: 滚动窗口效率(用于加权观测值)
返回:
用于TEST评估的平滑Epoch选择
"""
# 按WFE加权观测值(WFE越高,信号越可靠)
# 将WFE限制在[0.1, 2.0]以防止极端权重:
# - 下限0.1:防止除零问题并确保最小权重
# - 上限2.0:WFE > 2可疑(OOS > 2× IS表明:
# a) 有利于OOS的regime变化(幸运时机,而非技能)
# b) IS严重过拟合(人为降低分母)
# c) 数据异常或前瞻偏差
# 限制为2.0表示对这类观测值持怀疑态度
wfe_clamped = max(0.1, min(wfe, 2.0))
effective_variance = self.observation_variance / wfe_clamped
prior_precision = 1.0 / self.posterior_variance
obs_precision = 1.0 / effective_variance
# 贝叶斯更新
new_precision = prior_precision + obs_precision
new_mean = (
prior_precision * self.posterior_mean +
obs_precision * observed_optimal_epoch
) / new_precision
# 更新前记录
self.history.append({
"observed_epoch": observed_optimal_epoch,
"wfe": wfe,
"prior_mean": self.posterior_mean,
"posterior_mean": new_mean,
"selected_epoch": self._snap_to_config(new_mean),
})
self.posterior_mean = new_mean
self.posterior_variance = 1.0 / new_precision
return self._snap_to_config(new_mean)
def _snap_to_config(self, continuous_epoch: float) -> int:
"""将连续估计值对齐到最近的有效Epoch配置。"""
return min(self.epoch_configs, key=lambda e: abs(e - continuous_epoch))
def get_current_epoch(self) -> int:
"""获取当前平滑后的Epoch,无需更新。"""
return self._snap_to_config(self.posterior_mean)
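A quick trace of the precision-weighted update used by `BayesianEpochSelector`, with the update formula inlined rather than importing the class. The fold observations and WFE values are toy numbers for illustration.

```python
import numpy as np

# Toy walk across three folds using the precision-weighted update
# from BayesianEpochSelector.update (values illustrative).
epoch_configs = [100, 211, 447, 945, 2000]
epoch_range = max(epoch_configs) - min(epoch_configs)
post_mean = float(np.mean(epoch_configs))       # 740.6
post_var = (epoch_range / 3.92) ** 2            # prior spans 95% of search space
obs_var = post_var / 4

for observed, wfe in [(447, 0.9), (447, 1.1), (211, 0.4)]:
    eff_var = obs_var / max(0.1, min(wfe, 2.0))  # WFE-weighted observation variance
    prior_p, obs_p = 1.0 / post_var, 1.0 / eff_var
    post_mean = (prior_p * post_mean + obs_p * observed) / (prior_p + obs_p)
    post_var = 1.0 / (prior_p + obs_p)           # variance shrinks every fold

selected = min(epoch_configs, key=lambda e: abs(e - post_mean))
print(selected)  # 447 with these toy values
```

Note how the low-WFE third observation (wfe=0.4) gets a large effective variance and therefore only a small pull on the posterior.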
Application Workflow
应用工作流
python
from collections.abc import Callable

import numpy as np

def apply_awfes_to_test(
folds: list[Fold],
model_factory: Callable,
bayesian_selector: BayesianEpochSelector,
) -> list[dict]:
"""Apply AWFES with Bayesian smoothing to test data.
Workflow per fold:
1. Split into train/validation/test (60/20/20)
2. Sweep epochs on train, compute WFE on validation
3. Update Bayesian posterior with validation-optimal epoch
4. Train final model at Bayesian-selected epoch on train+validation
5. Evaluate on TEST (untouched data)
"""
results = []
for fold_idx, fold in enumerate(folds):
# Step 1: Split data
train, validation, test = fold.split_nested(
train_pct=0.60,
validation_pct=0.20,
test_pct=0.20,
embargo_pct=0.06, # 6% gap at each boundary
)
# Step 2: Epoch sweep on train → validate on validation
epoch_metrics = []
for epoch in bayesian_selector.epoch_configs:
model = model_factory()
model.fit(train.X, train.y, epochs=epoch)
is_sharpe = compute_sharpe(model.predict(train.X), train.y)
val_sharpe = compute_sharpe(model.predict(validation.X), validation.y)
# Use data-driven threshold instead of hardcoded 0.1
is_threshold = compute_is_sharpe_threshold(len(train.X))
wfe = val_sharpe / is_sharpe if is_sharpe > is_threshold else None
epoch_metrics.append({
"epoch": epoch,
"is_sharpe": is_sharpe,
"val_sharpe": val_sharpe,
"wfe": wfe,
})
# Step 3: Find validation-optimal and update Bayesian
val_optimal = max(
[m for m in epoch_metrics if m["wfe"] is not None],
key=lambda m: m["wfe"],
default={"epoch": bayesian_selector.epoch_configs[0], "wfe": 0.3}
)
selected_epoch = bayesian_selector.update(
val_optimal["epoch"],
val_optimal["wfe"],
)
# Step 4: Train final model on train+validation at selected epoch
combined_X = np.vstack([train.X, validation.X])
combined_y = np.hstack([train.y, validation.y])
final_model = model_factory()
final_model.fit(combined_X, combined_y, epochs=selected_epoch)
# Step 5: Evaluate on TEST (untouched)
test_predictions = final_model.predict(test.X)
test_metrics = compute_oos_metrics(test_predictions, test.y, test.timestamps)
results.append({
"fold_idx": fold_idx,
"validation_optimal_epoch": val_optimal["epoch"],
"bayesian_selected_epoch": selected_epoch,
"test_metrics": test_metrics,
"epoch_metrics": epoch_metrics,
})
return results
See references/oos-application.md for complete implementation.
python
from collections.abc import Callable

import numpy as np

def apply_awfes_to_test(
folds: list[Fold],
model_factory: Callable,
bayesian_selector: BayesianEpochSelector,
) -> list[dict]:
"""将AWFES与贝叶斯平滑应用于测试数据。
单折工作流:
1. 拆分为训练/验证/测试(60/20/20)
2. 在训练集上扫描Epoch,在验证集上计算WFE
3. 使用验证最优Epoch更新贝叶斯后验
4. 在训练+验证集上以贝叶斯选中的Epoch训练最终模型
5. 在TEST(未接触的数据)上评估
"""
results = []
for fold_idx, fold in enumerate(folds):
# 步骤1:拆分数据
train, validation, test = fold.split_nested(
train_pct=0.60,
validation_pct=0.20,
test_pct=0.20,
embargo_pct=0.06, # 每个边界6%的间隔
)
# 步骤2:在训练集上扫描Epoch → 在验证集上验证
epoch_metrics = []
for epoch in bayesian_selector.epoch_configs:
model = model_factory()
model.fit(train.X, train.y, epochs=epoch)
is_sharpe = compute_sharpe(model.predict(train.X), train.y)
val_sharpe = compute_sharpe(model.predict(validation.X), validation.y)
# 使用数据驱动的阈值而非硬编码的0.1
is_threshold = compute_is_sharpe_threshold(len(train.X))
wfe = val_sharpe / is_sharpe if is_sharpe > is_threshold else None
epoch_metrics.append({
"epoch": epoch,
"is_sharpe": is_sharpe,
"val_sharpe": val_sharpe,
"wfe": wfe,
})
# 步骤3:找到验证最优并更新贝叶斯
val_optimal = max(
[m for m in epoch_metrics if m["wfe"] is not None],
key=lambda m: m["wfe"],
default={"epoch": bayesian_selector.epoch_configs[0], "wfe": 0.3}
)
selected_epoch = bayesian_selector.update(
val_optimal["epoch"],
val_optimal["wfe"],
)
# 步骤4:在训练+验证集上以选中的Epoch训练最终模型
combined_X = np.vstack([train.X, validation.X])
combined_y = np.hstack([train.y, validation.y])
final_model = model_factory()
final_model.fit(combined_X, combined_y, epochs=selected_epoch)
# 步骤5:在TEST(未接触的数据)上评估
test_predictions = final_model.predict(test.X)
test_metrics = compute_oos_metrics(test_predictions, test.y, test.timestamps)
results.append({
"fold_idx": fold_idx,
"validation_optimal_epoch": val_optimal["epoch"],
"bayesian_selected_epoch": selected_epoch,
"test_metrics": test_metrics,
"epoch_metrics": epoch_metrics,
})
return results
完整实现请参见references/oos-application.md。
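The workflow above calls `fold.split_nested`, which is not defined in this excerpt, and the stated fractions (0.60/0.06/0.20/0.06/0.20) sum to 1.12. One reasonable reading, sketched below as an assumption rather than the canonical implementation, is to normalize the fractions so the five segments tile the fold exactly.

```python
def nested_split_bounds(n: int) -> dict[str, tuple[int, int]]:
    """Index bounds for Train/GapA/Val/GapB/Test over n bars.
    Hypothetical sketch of split_nested: fractions are normalized by
    their sum (1.12) so the segments partition the fold."""
    fracs = [("train", 0.60), ("gap_a", 0.06), ("val", 0.20),
             ("gap_b", 0.06), ("test", 0.20)]
    total = sum(f for _, f in fracs)  # 1.12
    bounds, start = {}, 0
    for i, (name, f) in enumerate(fracs):
        end = n if i == len(fracs) - 1 else start + round(n * f / total)
        bounds[name] = (start, end)
        start = end
    return bounds

b = nested_split_bounds(10_000)
# Bars in gap_a and gap_b are discarded, so neither training nor
# validation touches bars adjacent to the test window (embargo/purging).
```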
Epoch Smoothing Methods
Epoch平滑方法
Why Smooth Epoch Selections?
为什么要平滑Epoch选择?
Raw per-fold epoch selections are noisy due to:
- Limited validation data per fold
- Regime changes between folds
- Stochastic training dynamics
Smoothing reduces variance while preserving signal.
原始的单折Epoch选择存在噪声,原因包括:
- 每个折的验证数据有限
- 折之间的regime变化
- 训练的随机性
平滑可以减少方差同时保留信号。
| Method | Formula | Pros | Cons |
|---|---|---|---|
| Bayesian (Recommended) | Precision-weighted update | Principled, handles uncertainty | More complex |
| EMA | α·x + (1−α)·EMA | Simple, responsive | No uncertainty quantification |
| SMA | Mean of last N | Most stable | Slow to adapt |
| Median | Median of last N | Robust to outliers | Loses magnitude info |
| 方法 | 公式 | 优点 | 缺点 |
|---|---|---|---|
| 贝叶斯(推荐) | 精度加权更新 | 原则性,处理不确定性 | 更复杂 |
| EMA | α·x + (1−α)·EMA | 简单,响应快 | 无不确定性量化 |
| SMA | 最近N个的均值 | 最稳定 | 适应慢 |
| 中位数 | 最近N个的中位数 | 对异常值鲁棒 | 丢失幅度信息 |
Bayesian Updating (Primary Method)
贝叶斯更新(主要方法)
python
def bayesian_epoch_update(
prior_mean: float,
prior_variance: float,
observed_epoch: int,
observation_variance: float,
wfe_weight: float = 1.0,
) -> tuple[float, float]:
"""Single Bayesian update step.
Mathematical formulation:
- Prior: N(μ₀, σ₀²)
- Observation: N(x, σ_obs²/wfe) # WFE-weighted
- Posterior: N(μ₁, σ₁²)
Where:
μ₁ = (μ₀/σ₀² + x·wfe/σ_obs²) / (1/σ₀² + wfe/σ_obs²)
σ₁² = 1 / (1/σ₀² + wfe/σ_obs²)
"""
# Effective observation variance (lower WFE = less reliable)
eff_obs_var = observation_variance / max(wfe_weight, 0.1)
prior_precision = 1.0 / prior_variance
obs_precision = 1.0 / eff_obs_var
posterior_precision = prior_precision + obs_precision
posterior_mean = (
prior_precision * prior_mean + obs_precision * observed_epoch
) / posterior_precision
posterior_variance = 1.0 / posterior_precision
return posterior_mean, posterior_variance
python
def bayesian_epoch_update(
prior_mean: float,
prior_variance: float,
observed_epoch: int,
observation_variance: float,
wfe_weight: float = 1.0,
) -> tuple[float, float]:
"""单次贝叶斯更新步骤。
数学公式:
- 先验:N(μ₀, σ₀²)
- 观测:N(x, σ_obs²/wfe) # WFE加权
- 后验:N(μ₁, σ₁²)
其中:
μ₁ = (μ₀/σ₀² + x·wfe/σ_obs²) / (1/σ₀² + wfe/σ_obs²)
σ₁² = 1 / (1/σ₀² + wfe/σ_obs²)
"""
# 有效观测方差(WFE越低,可靠性越差)
eff_obs_var = observation_variance / max(wfe_weight, 0.1)
prior_precision = 1.0 / prior_variance
obs_precision = 1.0 / eff_obs_var
posterior_precision = prior_precision + obs_precision
posterior_mean = (
prior_precision * prior_mean + obs_precision * observed_epoch
) / posterior_precision
posterior_variance = 1.0 / posterior_precision
return posterior_mean, posterior_variance
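Plugging numbers into `bayesian_epoch_update` above: with an equally precise prior and observation (and `wfe_weight = 1.0`), the posterior mean lands at the midpoint and the variance halves.

```python
# Worked example of one precision-weighted update (wfe_weight = 1.0):
prior_mean, prior_var = 500.0, 10_000.0      # prior N(500, 100^2)
observed, obs_var = 300, 10_000.0            # equally precise observation

prior_p, obs_p = 1.0 / prior_var, 1.0 / obs_var
post_mean = (prior_p * prior_mean + obs_p * observed) / (prior_p + obs_p)
post_var = 1.0 / (prior_p + obs_p)

print(post_mean, post_var)  # ≈ 400.0, 5000.0 (midpoint; variance halves)
```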
Exponential Moving Average (Alternative)
指数移动平均(替代方法)
python
def ema_epoch_update(
current_ema: float,
observed_epoch: int,
alpha: float = 0.3,
) -> float:
"""EMA update: more weight on recent observations.
α = 0.3 means ~90% of signal from last 7 folds.
α = 0.5 means ~90% of signal from last 4 folds.
"""
return alpha * observed_epoch + (1 - alpha) * current_ema
python
def ema_epoch_update(
current_ema: float,
observed_epoch: int,
alpha: float = 0.3,
) -> float:
"""EMA更新:给最近观测值更高权重。
α = 0.3表示~90%的信号来自最近7个折。
α = 0.5表示~90%的信号来自最近4个折。
"""
return alpha * observed_epoch + (1 - alpha) * current_ema
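Reusing `ema_epoch_update` over a toy sequence of per-fold optima shows the smoothing effect: each noisy observation only pulls the running estimate 30% of the way.

```python
def ema_epoch_update(current_ema: float, observed_epoch: int, alpha: float = 0.3) -> float:
    """EMA update: more weight on recent observations."""
    return alpha * observed_epoch + (1 - alpha) * current_ema

# Noisy per-fold optima get pulled toward a stable running estimate:
ema = 500.0
for observed in [400, 600, 450]:
    ema = ema_epoch_update(ema, observed)
print(round(ema, 1))  # 491.3
```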
Initialization Strategies
初始化策略
| Strategy | When to Use | Implementation |
|---|---|---|
| Midpoint prior | No domain knowledge | mean(epoch_configs) |
| Literature prior | Published optimal exists | Known optimal ± uncertainty |
| Burn-in | Sufficient data | Use first N folds for initialization |
| 策略 | 使用场景 | 实现方式 |
|---|---|---|
| 中点先验 | 无领域知识 | mean(epoch_configs) |
| 文献先验 | 存在已发表的最优值 | 已知最优值 ± 不确定性 |
| 预热期 | 数据充足 | 使用前N个折进行初始化 |
RECOMMENDED: Use AWFESConfig for principled derivation
推荐:使用AWFESConfig进行原则性推导
config = AWFESConfig.from_search_space(
min_epoch=80,
max_epoch=400,
granularity=5,
)
config = AWFESConfig.from_search_space(
min_epoch=80,
max_epoch=400,
granularity=5,
)
prior_variance = ((400-80)/3.92)² ≈ 6,664 (derived automatically)
prior_variance = ((400-80)/3.92)² ≈ 6,664(自动推导)
observation_variance = prior_variance/4 ≈ 1,666 (derived automatically)
observation_variance = prior_variance/4 ≈ 1,666(自动推导)
Alternative strategies (if manual configuration needed):
替代策略(如需手动配置):
Strategy 1: Search-space derived (same as AWFESConfig)
策略1:从搜索空间推导(与AWFESConfig相同)
epoch_range = max(EPOCH_CONFIGS) - min(EPOCH_CONFIGS)
prior_mean = np.mean(EPOCH_CONFIGS)
prior_variance = (epoch_range / 3.92) ** 2 # 95% CI spans search space
epoch_range = max(EPOCH_CONFIGS) - min(EPOCH_CONFIGS)
prior_mean = np.mean(EPOCH_CONFIGS)
prior_variance = (epoch_range / 3.92) ** 2 # 95%置信区间覆盖搜索空间
Strategy 2: Burn-in (use first 5 folds)
策略2:预热期(使用前5个折)
burn_in_optima = [run_fold_sweep(fold) for fold in folds[:5]]
prior_mean = np.mean(burn_in_optima)
base_variance = (epoch_range / 3.92) ** 2 / 4 # Reduced after burn-in
prior_variance = max(np.var(burn_in_optima), base_variance)
See [references/epoch-smoothing.md](./references/epoch-smoothing.md) for extended analysis.
---
burn_in_optima = [run_fold_sweep(fold) for fold in folds[:5]]
prior_mean = np.mean(burn_in_optima)
base_variance = (epoch_range / 3.92) ** 2 / 4 # 预热期后减小
prior_variance = max(np.var(burn_in_optima), base_variance)
扩展分析请参见[references/epoch-smoothing.md](./references/epoch-smoothing.md)。
---
OOS Metrics Specification
OOS指标规范
Metric Tiers for Test Evaluation
测试评估的指标层级
Following rangebar-eval-metrics, compute these metrics on TEST data.
CRITICAL for Range Bars: Use time-weighted Sharpe (sharpe_tw) instead of simple bar Sharpe. See range-bar-metrics.md for the canonical implementation. The metrics below assume time-weighted computation for range bar data.
遵循rangebar-eval-metrics,在TEST数据上计算这些指标。
范围bar的关键注意事项:使用时间加权Sharpe(sharpe_tw)而非简单bar Sharpe。标准实现请参见range-bar-metrics.md。以下指标假设对范围bar数据使用时间加权计算。
Tier 1: Primary Metrics (Mandatory)
层级1:核心指标(必填)
| Metric | Formula | Threshold | Purpose |
|---|---|---|---|
| sharpe_tw | Time-weighted Sharpe (see range-bar-metrics.md) | > 0 | Core performance |
| hit_rate | mean(sign(pred) == sign(actual)) | > 0.50 | Directional accuracy |
| cumulative_pnl | sum(pnl) | > 0 | Total return |
| positive_sharpe_folds | n_folds(sharpe_tw > 0) / n_folds | > 0.55 | Consistency |
| Final WFE | test_sharpe_tw / validation_sharpe_tw | > 0.30 | Final transfer |
| 指标 | 公式 | 阈值 | 用途 |
|---|---|---|---|
| sharpe_tw | 时间加权Sharpe(参见range-bar-metrics.md) | > 0 | 核心性能 |
| hit_rate | mean(sign(pred) == sign(actual)) | > 0.50 | 方向准确率 |
| cumulative_pnl | sum(pnl) | > 0 | 总收益 |
| positive_sharpe_folds | n_folds(sharpe_tw > 0) / n_folds | > 0.55 | 一致性 |
| 最终WFE | test_sharpe_tw / validation_sharpe_tw | > 0.30 | 最终迁移能力 |
Tier 2: Risk Metrics
层级2:风险指标
| Metric | Formula | Threshold | Purpose |
|---|---|---|---|
| max_drawdown | max(peak - trough) / peak | < 0.30 | Worst loss |
| Calmar ratio | annual_return / max_drawdown | > 0.5 | Risk-adjusted |
| profit_factor | gross_profit / gross_loss | > 1.0 | Win/loss ratio |
| cvar_10pct | mean(worst 10% of pnl) | > -0.05 | Tail risk |
| 指标 | 公式 | 阈值 | 用途 |
|---|---|---|---|
| max_drawdown | max(peak - trough) / peak | < 0.30 | 最大回撤 |
| Calmar比率 | annual_return / max_drawdown | > 0.5 | 风险调整后收益 |
| profit_factor | gross_profit / gross_loss | > 1.0 | 盈亏比 |
| cvar_10pct | mean(最差10%的pnl) | > -0.05 | 尾部风险 |
Tier 3: Statistical Validation
层级3:统计验证
| Metric | Formula | Threshold | Purpose |
|---|---|---|---|
| psr | norm.cdf(sharpe_tw / sharpe_se) | > 0.85 | Statistical significance |
| dsr | sharpe - E[max_sharpe_null] | > 0.50 | Multiple testing adjusted |
| binomial_pvalue | binom.test(n_positive, n_total) | < 0.05 | Sign test |
| HAC p-value | HAC-adjusted t-test | < 0.05 | Autocorrelation robust |
| 指标 | 公式 | 阈值 | 用途 |
|---|---|---|---|
| psr | norm.cdf(sharpe_tw / sharpe_se) | > 0.85 | 统计显著性 |
| dsr | sharpe - E[max_sharpe_null] | > 0.50 | 多重检验调整 |
| binomial_pvalue | binom.test(n_positive, n_total) | < 0.05 | 符号检验 |
| HAC p值 | HAC调整t检验 | < 0.05 | 自相关鲁棒检验 |
Metric Computation Code
指标计算代码
python
import numpy as np
from scipy.stats import norm, binomtest # norm for PSR, binomtest for sign test
def compute_oos_metrics(
predictions: np.ndarray,
actuals: np.ndarray,
timestamps: np.ndarray,
duration_us: np.ndarray | None = None, # Required for range bars
market_type: str = "crypto_24_7", # For annualization factor
) -> dict[str, float]:
"""Compute full OOS metrics suite for test data.
Args:
predictions: Model predictions (signed magnitude)
actuals: Actual returns
timestamps: Bar timestamps for daily aggregation
duration_us: Bar durations in microseconds (REQUIRED for range bars)
Returns:
Dictionary with all tier metrics
IMPORTANT: For range bars, pass duration_us to compute sharpe_tw.
Simple bar_sharpe violates i.i.d. assumption - see range-bar-metrics.md.
"""
pnl = predictions * actuals
# Tier 1: Primary
# For range bars: Use time-weighted Sharpe (canonical)
if duration_us is not None:
from exp066e_tau_precision import compute_time_weighted_sharpe
sharpe_tw, weighted_std, total_days = compute_time_weighted_sharpe(
bar_pnl=pnl,
duration_us=duration_us,
annualize=True,
)
else:
# Fallback for time bars (all same duration)
daily_pnl = group_by_day(pnl, timestamps)
weekly_factor = get_daily_to_weekly_factor(market_type=market_type)
sharpe_tw = (
np.mean(daily_pnl) / np.std(daily_pnl) * weekly_factor
if np.std(daily_pnl) > 1e-10 else 0.0
)
hit_rate = np.mean(np.sign(predictions) == np.sign(actuals))
cumulative_pnl = np.sum(pnl)
# Tier 2: Risk
equity_curve = np.cumsum(pnl)
running_max = np.maximum.accumulate(equity_curve)
drawdowns = (running_max - equity_curve) / np.maximum(running_max, 1e-10)
max_drawdown = np.max(drawdowns)
gross_profit = np.sum(pnl[pnl > 0])
gross_loss = abs(np.sum(pnl[pnl < 0]))
profit_factor = gross_profit / gross_loss if gross_loss > 0 else float("inf")
# CVaR (10%)
sorted_pnl = np.sort(pnl)
cvar_cutoff = max(1, int(len(sorted_pnl) * 0.10))
cvar_10pct = np.mean(sorted_pnl[:cvar_cutoff])
# Tier 3: Statistical (use sharpe_tw for PSR)
sharpe_se = 1.0 / np.sqrt(len(pnl)) if len(pnl) > 0 else 1.0
psr = norm.cdf(sharpe_tw / sharpe_se) if sharpe_se > 0 else 0.5
n_positive = np.sum(pnl > 0)
n_total = len(pnl)
# Use binomtest (binom_test deprecated since scipy 1.10)
binomial_pvalue = binomtest(n_positive, n_total, 0.5, alternative="greater").pvalue
return {
# Tier 1 (use sharpe_tw for range bars)
"sharpe_tw": sharpe_tw,
"hit_rate": hit_rate,
"cumulative_pnl": cumulative_pnl,
"n_bars": len(pnl),
# Tier 2
"max_drawdown": max_drawdown,
"profit_factor": profit_factor,
"cvar_10pct": cvar_10pct,
# Tier 3
"psr": psr,
"binomial_pvalue": binomial_pvalue,
}
Aggregation Across Folds
python
def aggregate_test_metrics(fold_results: list[dict]) -> dict[str, float]:
"""Aggregate test metrics across all folds.
NOTE: For range bars, use sharpe_tw (time-weighted).
See range-bar-metrics.md for why simple bar_sharpe is invalid for range bars.
"""
metrics = [r["test_metrics"] for r in fold_results]
# Fraction of folds with positive Sharpe (use sharpe_tw for range bars)
sharpes = [m["sharpe_tw"] for m in metrics]
positive_sharpe_folds = np.mean([s > 0 for s in sharpes])
# Median for robustness
median_sharpe_tw = np.median(sharpes)
median_hit_rate = np.median([m["hit_rate"] for m in metrics])
# DSR for multiple testing (use time-weighted Sharpe)
n_trials = len(metrics)
dsr = compute_dsr(median_sharpe_tw, n_trials)
return {
"n_folds": len(metrics),
"positive_sharpe_folds": positive_sharpe_folds,
"median_sharpe_tw": median_sharpe_tw,
"mean_sharpe_tw": np.mean(sharpes),
"std_sharpe_tw": np.std(sharpes),
"median_hit_rate": median_hit_rate,
"dsr": dsr,
"total_pnl": sum(m["cumulative_pnl"] for m in metrics),
}
See references/oos-metrics.md for threshold justifications.
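compute_dsr above is assumed to be defined elsewhere in the codebase; a minimal self-contained sketch of a Deflated Sharpe Ratio in the spirit of Bailey & López de Prado (2014), simplified to ignore the skew/kurtosis adjustment (consistent with the PSR computation above, which also uses SE = 1/sqrt(n)), might look like:

```python
import numpy as np
from scipy.stats import norm

def compute_dsr(sharpe: float, n_trials: int, n_obs: int = 252) -> float:
    """Deflated Sharpe Ratio (simplified sketch).

    Deflates the observed Sharpe by the expected maximum Sharpe that
    n_trials independent trials would produce by chance alone, then maps
    the deflated value to a probability via the normal CDF.
    Simplification: assumes zero skew and normal kurtosis of returns.
    """
    if n_trials < 2:
        return float(norm.cdf(sharpe * np.sqrt(n_obs)))
    gamma = 0.5772156649  # Euler-Mascheroni constant
    # Expected maximum of n_trials standard normals (asymptotic approximation)
    e_max = ((1 - gamma) * norm.ppf(1 - 1 / n_trials)
             + gamma * norm.ppf(1 - 1 / (n_trials * np.e)))
    # Benchmark Sharpe explainable by selection pressure alone
    sr_benchmark = e_max / np.sqrt(n_obs)
    return float(norm.cdf((sharpe - sr_benchmark) * np.sqrt(n_obs)))
```

n_obs here is a hypothetical default for the number of observations behind the Sharpe estimate; the real implementation should pass the actual sample count per fold.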
Look-Ahead Bias Prevention
Using the same data for epoch selection AND final evaluation creates look-ahead bias:
❌ WRONG: Use fold's own optimal epoch for fold's OOS evaluation
- Epoch selection "sees" validation returns
- Then apply same epoch to OOS from same period
- Result: Overly optimistic performance
The Solution: Nested WFO + Bayesian Lag
✅ CORRECT: Bayesian-smoothed epoch from PRIOR folds for current TEST
- Epoch selection on train/validation (inner loop)
- Update Bayesian posterior with validation-optimal
- Apply Bayesian-selected epoch to TEST (outer loop)
- TEST data completely untouched during selection
v3 Temporal Ordering (CRITICAL - 2026 Fix)
The v3 implementation fixes a subtle but critical look-ahead bias bug in the original AWFES workflow. The key insight:
TEST must use prior_bayesian_epoch, NOT val_optimal_epoch.
The Bug (v2 and earlier)
v2 BUG: Bayesian update BEFORE test evaluation
for fold in folds:
epoch_metrics = sweep_epochs(fold.train, fold.validation)
val_optimal_epoch = select_optimal(epoch_metrics)
# WRONG: Update Bayesian with current fold's val_optimal
bayesian.update(val_optimal_epoch, wfe)
selected_epoch = bayesian.get_current_epoch() # CONTAMINATED!
# This selected_epoch is influenced by val_optimal from SAME fold
test_metrics = evaluate(selected_epoch, fold.test) # LOOK-AHEAD BIAS
v3 CORRECT: Get prior epoch BEFORE any work on current fold
for fold in folds:
# Step 1: FIRST - Get epoch from ONLY prior folds
prior_bayesian_epoch = bayesian.get_current_epoch() # BEFORE any fold work
# Step 2: Train and sweep to find this fold's optimal
epoch_metrics = sweep_epochs(fold.train, fold.validation)
val_optimal_epoch = select_optimal(epoch_metrics)
# Step 3: TEST uses prior_bayesian_epoch (NOT val_optimal!)
test_metrics = evaluate(prior_bayesian_epoch, fold.test) # UNBIASED
# Step 4: AFTER test - update Bayesian for FUTURE folds only
bayesian.update(val_optimal_epoch, wfe) # For fold+1, fold+2, ...
| Aspect | v2 (Buggy) | v3 (Fixed) |
|---|---|---|
| When Bayesian updated | Before test eval | After test eval |
| Test epoch source | Influenced by current fold | Only prior folds |
| Information flow | Future → Present | Past → Present only |
| Expected bias | Optimistic by ~10-20% | Unbiased |
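The bayesian object's interface (get_current_epoch, update) is not specified in this section; one minimal sketch, assuming a conjugate Gaussian prior over log(epoch) with WFE-scaled observation precision (the names, defaults, and update rule below are illustrative — see references/epoch-smoothing.md for the actual scheme), is:

```python
import math

class BayesianEpochPrior:
    """Hypothetical sketch of the `bayesian` object in the v3 loop.

    Maintains a Gaussian belief over log(epoch); higher-WFE observations
    carry more effective precision in the update.
    """

    def __init__(self, prior_log_epoch: float, prior_var: float = 1.0,
                 observation_var: float = 0.25):
        self.mean = prior_log_epoch
        self.var = prior_var
        self.obs_var = observation_var

    def get_current_epoch(self) -> int:
        # Read BEFORE any work on the current fold (v3 ordering)
        return max(1, round(math.exp(self.mean)))

    def update(self, val_optimal_epoch: int, wfe: float) -> None:
        # Conjugate Gaussian update in log space; WFE scales the
        # effective observation precision (higher WFE = more trust)
        obs = math.log(val_optimal_epoch)
        obs_var = self.obs_var / max(wfe, 1e-3)
        gain = self.var / (self.var + obs_var)
        self.mean += gain * (obs - self.mean)
        self.var *= (1 - gain)
```

Calling get_current_epoch before update within each fold reproduces the v3 ordering: the test epoch depends only on prior folds.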
Validation Checkpoint
MANDATORY: Log these values for audit trail
fold_log.info(
f"Fold {fold_idx}: "
f"prior_bayesian_epoch={prior_bayesian_epoch}, "
f"val_optimal_epoch={val_optimal_epoch}, "
f"test_uses={prior_bayesian_epoch}" # MUST equal prior_bayesian_epoch
)
See [references/look-ahead-bias.md](./references/look-ahead-bias.md) for detailed examples.
| Boundary | Embargo | Rationale |
|---|---|---|
| Train → Validation | 6% of fold | Prevent feature leakage |
| Validation → Test | 6% of fold | Prevent selection leakage |
| Fold → Fold | 1 hour (calendar) | Range bar duration |
python
def compute_embargo_indices(
n_total: int,
train_pct: float = 0.60,
val_pct: float = 0.20,
test_pct: float = 0.20,  # nominal; actual test segment is the remainder after both embargoes
embargo_pct: float = 0.06,
) -> dict[str, tuple[int, int]]:
"""Compute indices for nested split with embargoes.
Returns dict with (start, end) tuples for each segment.
"""
embargo_size = int(n_total * embargo_pct)
train_end = int(n_total * train_pct)
val_start = train_end + embargo_size
val_end = val_start + int(n_total * val_pct)
test_start = val_end + embargo_size
test_end = n_total
return {
"train": (0, train_end),
"embargo_1": (train_end, val_start),
"validation": (val_start, val_end),
"embargo_2": (val_end, test_start),
"test": (test_start, test_end),
}
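Under the defaults above, the five segments tile [0, n_total) contiguously with no overlap; note the realized test fraction is 8%, not the nominal 20%, because both embargoes are carved out ahead of it (0.60 + 0.06 + 0.20 + 0.06 = 0.92). A quick standalone check, with the same arithmetic repeated so it runs on its own:

```python
def compute_embargo_indices(n_total, train_pct=0.60, val_pct=0.20,
                            test_pct=0.20, embargo_pct=0.06):
    # Same arithmetic as the function above, repeated for a standalone check
    embargo_size = int(n_total * embargo_pct)
    train_end = int(n_total * train_pct)
    val_start = train_end + embargo_size
    val_end = val_start + int(n_total * val_pct)
    test_start = val_end + embargo_size
    return {
        "train": (0, train_end),
        "embargo_1": (train_end, val_start),
        "validation": (val_start, val_end),
        "embargo_2": (val_end, test_start),
        "test": (test_start, n_total),
    }

splits = compute_embargo_indices(10_000)
order = ("train", "embargo_1", "validation", "embargo_2", "test")
bounds = [splits[k] for k in order]  # consecutive segments must abut exactly
```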
Validation Checklist
Before running AWFES with OOS application, check for these anti-patterns:
| Anti-Pattern | Detection | Fix |
|---|---|---|
| Using current fold's epoch on current fold's OOS | selected_epoch == fold_optimal_epoch | Use Bayesian posterior |
| Validation overlaps test | Date ranges overlap | Add embargo |
| Features computed on full dataset | Scaler fit includes test | Per-split scaling |
| Fold shuffling | Folds not time-ordered | Enforce temporal order |
See references/look-ahead-bias.md for detailed examples.
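The fold-shuffling anti-pattern can be checked mechanically; a small helper, with fold windows represented here as hypothetical (start, end) index pairs:

```python
def folds_time_ordered(fold_ranges: list[tuple[int, int]]) -> bool:
    """True when consecutive fold windows advance monotonically in time.

    fold_ranges: (start, end) index pairs, one per fold, in the order the
    folds will be processed (a hypothetical representation of WFO folds).
    """
    return all(a[0] <= b[0] and a[1] <= b[1]
               for a, b in zip(fold_ranges, fold_ranges[1:]))
```

Overlap between consecutive windows is allowed (rolling folds overlap by design); only a backwards jump fails the check.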
| Topic | Reference File |
|---|---|
| Academic Literature | academic-foundations.md |
| Mathematical Formulation | mathematical-formulation.md |
| Decision Tree | epoch-selection-decision-tree.md |
| Anti-Patterns | anti-patterns.md |
| OOS Application | oos-application.md |
| Epoch Smoothing | epoch-smoothing.md |
| OOS Metrics | oos-metrics.md |
| Look-Ahead Bias | look-ahead-bias.md |
| Feature Sets | feature-sets.md |
| xLSTM Implementation | xlstm-implementation.md |
| Range Bar Metrics | range-bar-metrics.md |
- Bailey, D. H., & López de Prado, M. (2014). The deflated Sharpe ratio: Correcting for selection bias, backtest overfitting and non-normality. The Journal of Portfolio Management, 40(5), 94-107.
- Bischl, B., et al. (2023). Multi-Objective Hyperparameter Optimization in Machine Learning. ACM Transactions on Evolutionary Learning and Optimization.
- López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. Chapter 7.
- Nomura, M., & Ono, I. (2021). Warm Starting CMA-ES for Hyperparameter Optimization. AAAI Conference on Artificial Intelligence.
- Pardo, R. E. (2008). The Evaluation and Optimization of Trading Strategies, 2nd Edition. John Wiley & Sons.
| Issue | Cause | Solution |
|---|---|---|
| WFE is None | IS_Sharpe below noise floor | Check if IS_Sharpe > 2/sqrt(n_samples) |
| All epochs rejected | Severe overfitting | Reduce model complexity, add regularization |
| Bayesian posterior unstable | High WFE variance | Increase observation_variance or use median WFE |
| Epoch always at boundary | Search range too narrow | Expand min_epoch or max_epoch bounds |
| Look-ahead bias detected | Using val_optimal for test | Use prior_bayesian_epoch for test evaluation |
| DSR too aggressive | Too many epoch candidates | Limit to 3-5 epoch configs (meta-overfitting risk) |
| Cauchy mean issues | Arithmetic mean of WFE | Use median or pooled WFE for aggregation |
| Fold metrics inconsistent | Variable fold sizes | Use pooled WFE (precision-weighted) |