rangebar-eval-metrics
Range Bar Evaluation Metrics
Machine-readable reference + computation scripts for state-of-the-art metrics evaluating range bar (price-based sampling) data.
When to Use This Skill
Use this skill when:
- Evaluating ML model performance on range bar data
- Computing Sharpe ratios with non-IID bar sequences
- Running Walk-Forward Optimization metric analysis
- Calculating PSR, DSR, or MinTRL statistical tests
- Generating evaluation reports from fold results
Quick Start
```bash
# Compute metrics from predictions + actuals
python scripts/compute_metrics.py --predictions preds.npy --actuals actuals.npy --timestamps ts.npy
# Generate full evaluation report
python scripts/generate_report.py --results folds.jsonl --output report.md
```
Metric Tiers
| Tier | Purpose | Metrics | Compute |
|---|---|---|---|
| Primary (5) | Research decisions | weekly_sharpe, hit_rate, cumulative_pnl, n_bars, positive_sharpe_rate | Per-fold + aggregate |
| Secondary/Risk (5) | Additional context | max_drawdown, bar_sharpe, return_per_bar, profit_factor, cv_fold_returns | Per-fold |
| ML Quality (3) | Prediction health | ic, prediction_autocorr, is_collapsed | Per-fold |
| Diagnostic (5) | Final validation | psr, dsr, autocorr_lag1, effective_n, binomial_pvalue | Aggregate only |
| Extended Risk (5) | Deep risk analysis | var_95, cvar_95, omega_ratio, sortino_ratio, ulcer_index | Per-fold (optional) |
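This section does not define how binomial_pvalue is computed. As a rough sketch, assuming it is a one-sided binomial test of the number of folds with positive Sharpe against a 50% null, it could be obtained from `scipy.stats.binomtest`. The function name `positive_fold_pvalue` and its fold-level input are illustrative assumptions, not the skill's API.

```python
from scipy.stats import binomtest


def positive_fold_pvalue(fold_sharpes: list[float]) -> float:
    """One-sided p-value for 'more than half of the folds have Sharpe > 0'.

    Assumption: each fold counts as one Bernoulli trial with null p = 0.5.
    """
    n = len(fold_sharpes)
    if n == 0:
        return float("nan")  # nothing to test
    k = sum(1 for s in fold_sharpes if s > 0)  # folds with positive Sharpe
    return float(binomtest(k, n, p=0.5, alternative="greater").pvalue)


# Hypothetical fold results: 8 of 10 folds are positive.
sharpes = [0.4, 0.1, -0.2, 0.3, 0.5, 0.2, -0.1, 0.6, 0.2, 0.1]
p_value = positive_fold_pvalue(sharpes)
```

Under the same assumption, positive_sharpe_rate would simply be `k / n` for the same folds.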
Why Range Bars Need Special Treatment
Range bars violate standard IID assumptions:
- Variable duration: Bars form based on price movement, not time
- Autocorrelation: High-volatility periods cluster bars → temporal correlation
- Non-constant information: More bars during volatility = more information per day
Canonical solution: Daily aggregation via `_group_by_day()` before Sharpe calculation.
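The skill's own `_group_by_day()` is not reproduced in this section. A minimal sketch of the daily aggregation step, assuming timestamps are Unix seconds and days are UTC calendar days, might look like the following. The name `group_by_day_sketch` is hypothetical, and days without any bars are simply omitted here, which may differ from the real helper.

```python
import numpy as np


def group_by_day_sketch(pnl: np.ndarray, timestamps: np.ndarray) -> np.ndarray:
    """Sum per-bar PnL into one value per UTC calendar day (illustrative only)."""
    pnl = np.asarray(pnl, dtype=float)
    # Assumption: timestamps are Unix seconds, so floor division gives a day number.
    days = np.asarray(timestamps, dtype=np.int64) // 86_400
    if days.size == 0:
        return np.array([], dtype=float)
    # For every bar, find the index of its day among the sorted unique days.
    _, day_index = np.unique(days, return_inverse=True)
    # Weighted bincount adds up the PnL of all bars that share a day.
    return np.bincount(day_index, weights=pnl)


# Two bars on day 0, one on day 1, one on day 2.
bar_pnl = np.array([1.0, 2.0, 3.0, 4.0])
bar_ts = np.array([0, 3_600, 86_400, 172_805])
daily = group_by_day_sketch(bar_pnl, bar_ts)
```

However many bars a volatile day produces, it contributes one observation to the Sharpe calculation.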
References
Core Reference Files
| Topic | Reference File |
|---|---|
| Sharpe Ratio Calculations | sharpe-formulas.md |
| Risk Metrics (VaR, Omega, Ulcer) | risk-metrics.md |
| ML Prediction Quality (IC, Autocorr) | ml-prediction-quality.md |
| Crypto Market Considerations | crypto-markets.md |
| Temporal Aggregation Rules | temporal-aggregation.md |
| JSON Schema for Metrics | metrics-schema.md |
| Anti-Patterns (Transaction Costs) | anti-patterns.md |
| SOTA 2025-2026 (SHAP, BOCPD, etc.) | sota-2025-2026.md |
| Worked Examples (BTC, EUR/USD) | worked-examples.md |
| Structured Logging (NDJSON) | structured-logging.md |
Related Skills
| Skill | Relationship |
|---|---|
| adaptive-wfo-epoch | Uses |
Dependencies
```bash
pip install -r requirements.txt
# Or: pip install "numpy>=1.24" "pandas>=2.0" "scipy>=1.10"
```
Key Formulas
Daily-Aggregated Sharpe (Primary Metric)
```python
import numpy as np

def weekly_sharpe(pnl: np.ndarray, timestamps: np.ndarray) -> float:
    """Sharpe with daily aggregation for range bars."""
    # _group_by_day is the skill's daily-aggregation helper (not shown here)
    daily_pnl = _group_by_day(pnl, timestamps)  # Sum PnL per calendar day
    if len(daily_pnl) < 2 or np.std(daily_pnl) == 0:
        return 0.0
    daily_sharpe = np.mean(daily_pnl) / np.std(daily_pnl)
    # For crypto (7-day week): sqrt(7). For equities: sqrt(5)
    return daily_sharpe * np.sqrt(7)  # Crypto default
```
Information Coefficient (Prediction Quality)
```python
import numpy as np
from scipy.stats import spearmanr

def information_coefficient(predictions: np.ndarray, actuals: np.ndarray) -> float:
    """Spearman rank IC - captures rank-order alignment of predictions with actuals."""
    ic, _ = spearmanr(predictions, actuals)
    return ic  # Range: [-1, 1]. >0.02 acceptable, >0.05 good, >0.10 excellent
```
Probabilistic Sharpe Ratio (Statistical Validation)
```python
from scipy.stats import norm

def psr(sharpe: float, se: float, benchmark: float = 0.0) -> float:
    """P(true Sharpe > benchmark)."""
    return norm.cdf((sharpe - benchmark) / se)
```
Annualization Factors
| Market | Daily → Weekly | Daily → Annual | Rationale |
|---|---|---|---|
| Crypto (24/7) | sqrt(7) = 2.65 | sqrt(365) = 19.1 | 7 trading days/week |
| Equity | sqrt(5) = 2.24 | sqrt(252) = 15.9 | 5 trading days/week |
NEVER use sqrt(252) for crypto markets.
CRITICAL: Session Filter Changes Annualization
| View | Filter | Weekly factor (from days_per_week) | Rationale |
|---|---|---|---|
| Session-filtered (London-NY) | Weekdays 08:00-16:00 | sqrt(5) | Trading like equities |
| All-bars (unfiltered) | None | sqrt(7) | Full 24/7 crypto |
Using sqrt(7) for session-filtered data overstates Sharpe by ~18%!
See crypto-markets.md for detailed rationale.
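The ~18% figure follows directly from the ratio of the two weekly factors. The arithmetic can be checked with a short sketch; `weekly_factor` and the daily Sharpe value are illustrative, not part of the skill's scripts.

```python
import math


def weekly_factor(session_filtered: bool) -> float:
    """Daily -> weekly Sharpe scaling: sqrt(5) for session-filtered, sqrt(7) for all bars."""
    days_per_week = 5 if session_filtered else 7
    return math.sqrt(days_per_week)


daily_sharpe = 0.10  # hypothetical daily Sharpe of session-filtered PnL

correct = daily_sharpe * weekly_factor(session_filtered=True)   # sqrt(5) scaling
wrong = daily_sharpe * weekly_factor(session_filtered=False)    # sqrt(7) applied by mistake

# Relative overstatement is sqrt(7) / sqrt(5) - 1, independent of the Sharpe itself.
overstatement = wrong / correct - 1.0
print(f"{overstatement:.1%}")  # 18.3%
```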
Dual-View Metrics
For comprehensive analysis, compute metrics with BOTH views:
- Session-filtered (London 08:00 to NY 16:00): Primary strategy evaluation
- All-bars: Regime detection, data quality diagnostics
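One way to derive the session-filtered view is a boolean mask over bar timestamps. The sketch below assumes Unix-second timestamps and treats the 08:00-16:00 window as UTC hours on weekdays. The source does not state the timezone or how daylight saving time is handled, so `session_mask`, `start_hour` and `end_hour` are placeholders to adapt.

```python
import numpy as np


def session_mask(timestamps: np.ndarray, start_hour: int = 8, end_hour: int = 16) -> np.ndarray:
    """True for bars on Mon-Fri whose UTC hour lies in [start_hour, end_hour)."""
    ts = np.asarray(timestamps, dtype=np.int64)
    day_number = ts // 86_400
    # 1970-01-01 was a Thursday; shift so that Monday = 0 ... Sunday = 6.
    weekday = (day_number + 3) % 7
    hour = (ts % 86_400) // 3_600
    return (weekday < 5) & (hour >= start_hour) & (hour < end_hour)


# Thu 00:00, Thu 09:00, Thu 16:00, Sat 09:00 (all UTC, first week of 1970)
bar_ts = np.array([0, 9 * 3_600, 16 * 3_600, 2 * 86_400 + 9 * 3_600])
mask = session_mask(bar_ts)

# Both views from the same arrays: filtered for strategy evaluation, all bars for diagnostics.
session_ts = bar_ts[mask]
all_ts = bar_ts
```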
Academic References
| Concept | Citation |
|---|---|
| Deflated Sharpe Ratio | Bailey & López de Prado (2014) |
| Sharpe SE with Non-Normality | Mertens (2002) |
| Statistics of Sharpe Ratios | Lo (2002) |
| Omega Ratio | Keating & Shadwick (2002) |
| Ulcer Index | Peter Martin (1987) |
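The psr function under Key Formulas takes a standard error `se` but this section does not show how to obtain it. A minimal sketch of the non-normality-adjusted standard error in the form attributed to Mertens (2002) is given below, using n rather than n - 1 in the denominator and working with per-period (not annualized) Sharpe. The function name `sharpe_with_se` and the sample data are assumptions, and the skill's reference files may use a different variant.

```python
import numpy as np
from scipy.stats import kurtosis, norm, skew


def sharpe_with_se(returns: np.ndarray) -> tuple[float, float]:
    """Per-period Sharpe and its standard error adjusted for skewness and kurtosis."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    std = r.std(ddof=1) if n > 1 else 0.0
    if n < 2 or std < 1e-10:
        return 0.0, float("nan")  # Sharpe is undefined without variance
    sr = r.mean() / std
    g3 = skew(r)                    # sample skewness
    g4 = kurtosis(r, fisher=False)  # raw kurtosis (3.0 for a normal distribution)
    # Var(SR) ~ (1 + SR^2/2 - g3*SR + (g4 - 3)/4 * SR^2) / n, written compactly:
    variance = (1.0 - g3 * sr + (g4 - 1.0) / 4.0 * sr**2) / n
    return float(sr), float(np.sqrt(max(variance, 0.0)))


# Hypothetical daily PnL values.
daily_pnl = np.array([0.010, -0.005, 0.020, 0.000, 0.015, -0.010, 0.005, 0.010])
sr, se = sharpe_with_se(daily_pnl)
psr_value = float(norm.cdf((sr - 0.0) / se))  # same expression as psr() with benchmark 0
```

With zero skewness and kurtosis of 3, the variance reduces to the familiar (1 + SR²/2) / n for IID normal returns.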
Decision Framework
Go Criteria (Research)
```yaml
go_criteria:
  - positive_sharpe_rate > 0.55
  - mean_weekly_sharpe > 0
  - cv_fold_returns < 1.5
  - mean_hit_rate > 0.50
```
Publication Criteria
```yaml
publication_criteria:
  - binomial_pvalue < 0.05
  - psr > 0.85
  - dsr > 0.50  # If n_trials > 1
```
Scripts
| Script | Purpose |
|---|---|
| scripts/compute_metrics.py | Compute all metrics from predictions/actuals |
| scripts/generate_report.py | Generate Markdown report from fold results |
|  | Validate metrics JSON against schema |
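The go and publication thresholds listed under Decision Framework can also be applied programmatically to aggregate metrics. The sketch below assumes the aggregates arrive as a plain dict keyed by the metric names used in those criteria. The function names and the dict layout are hypothetical and not taken from the scripts above.

```python
def meets_go_criteria(metrics: dict) -> bool:
    """Research 'go' decision: all four thresholds must hold."""
    return (
        metrics["positive_sharpe_rate"] > 0.55
        and metrics["mean_weekly_sharpe"] > 0
        and metrics["cv_fold_returns"] < 1.5
        and metrics["mean_hit_rate"] > 0.50
    )


def meets_publication_criteria(metrics: dict, n_trials: int = 1) -> bool:
    """Publication decision; the DSR threshold only applies when n_trials > 1."""
    passed = metrics["binomial_pvalue"] < 0.05 and metrics["psr"] > 0.85
    if n_trials > 1:
        passed = passed and metrics["dsr"] > 0.50
    return passed


# Hypothetical aggregate metrics for one experiment.
aggregate = {
    "positive_sharpe_rate": 0.60,
    "mean_weekly_sharpe": 0.30,
    "cv_fold_returns": 1.20,
    "mean_hit_rate": 0.52,
    "binomial_pvalue": 0.03,
    "psr": 0.90,
    "dsr": 0.40,
}
go = meets_go_criteria(aggregate)
publishable_single = meets_publication_criteria(aggregate, n_trials=1)
publishable_multi = meets_publication_criteria(aggregate, n_trials=10)
```

In this example the single-trial check passes, while the multi-trial check fails because dsr is below 0.50.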
Remediations (2026-01-19 Multi-Agent Audit)
The following fixes were applied based on a 12-subagent adversarial audit:
| Issue | Root Cause | Fix | Source |
|---|---|---|---|
|  | Constant predictions | Model collapse detection + architecture fix | model-expert |
|  | Zero variance predictions | Return 1.0 for constant (semantically correct) | model-expert |
|  | Division by zero | Guard for std < 1e-10, return 1.0 | model-expert |
| Ulcer Index divide-by-zero | Peak equity = 0 | Guard with np.where(peak > 1e-10, ...) | risk-analyst |
| Omega/Profit Factor unreliable | Too few samples | min_days parameter (default: 5) | robustness-analyst |
| BiLSTM mean collapse | Architecture too small | hidden_size: 16→48, dropout: 0.5→0.3 | model-expert |
|  | Early return wrong value | Return NaN when no data to compute ratio | risk-analyst |
Model Collapse Detection
```python
import logging

import numpy as np

logger = logging.getLogger(__name__)

# ALWAYS check for model collapse after prediction
# (predictions is the model's output array for the fold)
pred_std = np.std(predictions)
if pred_std < 1e-6:
    logger.warning(
        f"Constant predictions detected (std={pred_std:.2e}). "
        "Model collapsed to mean - check architecture."
    )
```
Recommended BiLSTM Architecture
```python
# BEFORE (causes collapse on range bars)
HIDDEN_SIZE = 16
DROPOUT = 0.5

# AFTER (prevents collapse)
HIDDEN_SIZE = 48  # Triple capacity
DROPOUT = 0.3  # Less aggressive regularization
```
See reference docs for complete implementation details.
---
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| weekly_sharpe is 0 | Constant predictions | Check for model collapse, increase hidden_size |
| IC returns None | Zero variance in predictions | Model collapsed - check architecture |
| prediction_autocorr is NaN | Division by zero | Guard for std < 1e-10 in autocorr calculation |
| Ulcer Index divide error | Peak equity is zero | Add guard: np.where(peak > 1e-10, ...) |
| profit_factor = 1.0 | No bars processed | Return NaN when n_bars is 0 |
| Sharpe inflated 18% | Wrong annualization for data | Use sqrt(5) for session-filtered, sqrt(7) for 24/7 |
| PSR/DSR not computed | Missing scipy | Install scipy (see Dependencies) |
| Timestamps not parsed | Wrong format | Ensure Unix timestamps, not datetime strings |
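The Ulcer Index guard mentioned in the table can be illustrated as follows. This is a sketch under the assumption that the index is computed on an equity curve as the root mean square of percentage drawdowns from the running peak, following Martin's definition. The function name and the choice to treat a non-positive peak as zero drawdown are illustrative, not the skill's implementation.

```python
import numpy as np


def ulcer_index(equity: np.ndarray) -> float:
    """Root mean square of percentage drawdowns, guarded against a zero peak."""
    equity = np.asarray(equity, dtype=float)
    if equity.size == 0:
        return float("nan")  # no data to compute the index
    peak = np.maximum.accumulate(equity)  # running maximum of the curve
    valid = peak > 1e-10
    # Replace invalid peaks with 1.0 so the division itself never hits zero.
    safe_peak = np.where(valid, peak, 1.0)
    drawdown_pct = np.where(valid, (equity - peak) / safe_peak * 100.0, 0.0)
    return float(np.sqrt(np.mean(drawdown_pct**2)))


curve = np.array([100.0, 110.0, 99.0, 110.0])  # one 10% drawdown from the 110 peak
ui = ulcer_index(curve)                         # sqrt((0 + 0 + 100 + 0) / 4) = 5.0
flat_zero = ulcer_index(np.zeros(3))            # guard path: 0.0 instead of a division error
```

Swapping in a safe denominator before dividing matters because `np.where` evaluates both branches, so guarding only the output would still trigger the divide warning.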