rangebar-eval-metrics

Range Bar Evaluation Metrics

Machine-readable reference + computation scripts for state-of-the-art metrics evaluating range bar (price-based sampling) data.

When to Use This Skill

Use this skill when:
  • Evaluating ML model performance on range bar data
  • Computing Sharpe ratios with non-IID bar sequences
  • Running Walk-Forward Optimization metric analysis
  • Calculating PSR, DSR, or MinTRL statistical tests
  • Generating evaluation reports from fold results

Quick Start

```bash
# Compute metrics from predictions + actuals
python scripts/compute_metrics.py --predictions preds.npy --actuals actuals.npy --timestamps ts.npy

# Generate full evaluation report
python scripts/generate_report.py --results folds.jsonl --output report.md
```

Metric Tiers

| Tier | Purpose | Metrics | Compute |
|------|---------|---------|---------|
| Primary (5) | Research decisions | weekly_sharpe, hit_rate, cumulative_pnl, n_bars, positive_sharpe_rate | Per-fold + aggregate |
| Secondary/Risk (5) | Additional context | max_drawdown, bar_sharpe, return_per_bar, profit_factor, cv_fold_returns | Per-fold |
| ML Quality (3) | Prediction health | ic, prediction_autocorr, is_collapsed | Per-fold |
| Diagnostic (5) | Final validation | psr, dsr, autocorr_lag1, effective_n, binomial_pvalue | Aggregate only |
| Extended Risk (5) | Deep risk analysis | var_95, cvar_95, omega_ratio, sortino_ratio, ulcer_index | Per-fold (optional) |
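Several Secondary/Risk metrics can be sketched as below. This is a minimal illustration, not the skill's compute_metrics.py: the per-bar PnL convention (trade the sign of the prediction), the hit-rate definition (share of bars where prediction and actual agree in sign) and the helper names are assumptions. profit_factor returns NaN when there is no data, in line with the remediation table further down.

```python
import numpy as np


def bar_pnl(predictions: np.ndarray, actuals: np.ndarray) -> np.ndarray:
    """Hypothetical per-bar PnL: go long/short on the sign of the prediction."""
    return np.sign(predictions) * actuals


def hit_rate(predictions: np.ndarray, actuals: np.ndarray) -> float:
    """Assumed definition: fraction of bars where prediction and actual share a sign."""
    if len(predictions) == 0:
        return float("nan")
    return float(np.mean(np.sign(predictions) == np.sign(actuals)))


def max_drawdown(pnl: np.ndarray) -> float:
    """Largest peak-to-trough drop of cumulative PnL (positive number)."""
    if len(pnl) == 0:
        return float("nan")
    # Prepend 0 so a loss on the very first bar counts as a drawdown.
    equity = np.concatenate(([0.0], np.cumsum(pnl)))
    peak = np.maximum.accumulate(equity)
    return float(np.max(peak - equity))


def profit_factor(pnl: np.ndarray) -> float:
    """Gross profit / gross loss; NaN when there is nothing to compute."""
    if len(pnl) == 0:
        return float("nan")
    gross_profit = pnl[pnl > 0].sum()
    gross_loss = -pnl[pnl < 0].sum()
    if gross_loss == 0:
        return float("inf") if gross_profit > 0 else float("nan")
    return float(gross_profit / gross_loss)


def cv_fold_returns(fold_returns: np.ndarray) -> float:
    """Coefficient of variation of per-fold returns: sample std / |mean|."""
    if len(fold_returns) < 2:
        return float("nan")
    mean = np.mean(fold_returns)
    if mean == 0:
        return float("nan")
    return float(np.std(fold_returns, ddof=1) / abs(mean))
```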

Why Range Bars Need Special Treatment

Range bars violate standard IID assumptions:
  1. Variable duration: Bars form based on price movement, not time
  2. Autocorrelation: High-volatility periods cluster bars → temporal correlation
  3. Non-constant information: More bars during volatility = more information per day
Canonical solution: daily aggregation via _group_by_day() before the Sharpe calculation.

References

Core Reference Files

| Topic | Reference File |
|-------|----------------|
| Sharpe Ratio Calculations | sharpe-formulas.md |
| Risk Metrics (VaR, Omega, Ulcer) | risk-metrics.md |
| ML Prediction Quality (IC, Autocorr) | ml-prediction-quality.md |
| Crypto Market Considerations | crypto-markets.md |
| Temporal Aggregation Rules | temporal-aggregation.md |
| JSON Schema for Metrics | metrics-schema.md |
| Anti-Patterns (Transaction Costs) | anti-patterns.md |
| SOTA 2025-2026 (SHAP, BOCPD, etc.) | sota-2025-2026.md |
| Worked Examples (BTC, EUR/USD) | worked-examples.md |
| Structured Logging (NDJSON) | structured-logging.md |

Related Skills

| Skill | Relationship |
|-------|--------------|
| adaptive-wfo-epoch | Uses weekly_sharpe, psr, dsr for WFE calculation |

Dependencies

```bash
pip install -r requirements.txt

# Or (quote the specifiers so the shell does not treat ">" as a redirect):
pip install "numpy>=1.24" "pandas>=2.0" "scipy>=1.10"
```

Key Formulas

Daily-Aggregated Sharpe (Primary Metric)

```python
import numpy as np


def weekly_sharpe(pnl: np.ndarray, timestamps: np.ndarray) -> float:
    """Sharpe with daily aggregation for range bars."""
    daily_pnl = _group_by_day(pnl, timestamps)  # Skill helper: sum PnL per calendar day
    if len(daily_pnl) < 2 or np.std(daily_pnl) == 0:
        return 0.0
    daily_sharpe = np.mean(daily_pnl) / np.std(daily_pnl)
    # For crypto (7-day week): sqrt(7). For equities or session-filtered data: sqrt(5)
    return daily_sharpe * np.sqrt(7)  # Crypto default
```

Information Coefficient (Prediction Quality)

```python
import numpy as np
from scipy.stats import spearmanr


def information_coefficient(predictions: np.ndarray, actuals: np.ndarray) -> float:
    """Spearman rank IC - captures rank-order alignment, not just direction."""
    ic, _ = spearmanr(predictions, actuals)
    return float(ic)  # Range: [-1, 1]. >0.02 acceptable, >0.05 good, >0.10 excellent
```
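The other two ML Quality metrics can be sketched with the guards listed in the remediation table (std < 1e-10 returns 1.0 for prediction_autocorr) and the 1e-6 threshold used in the collapse check. Treating prediction_autocorr as the lag-1 autocorrelation of the prediction series is an assumption about the skill's implementation.

```python
import numpy as np


def prediction_autocorr(predictions: np.ndarray) -> float:
    """Lag-1 autocorrelation of the prediction series (assumed definition)."""
    if len(predictions) < 3:
        return float("nan")
    a, b = predictions[:-1], predictions[1:]
    # Remediation guard: a (near-)constant series is treated as perfectly
    # autocorrelated instead of dividing by a zero standard deviation.
    if np.std(a) < 1e-10 or np.std(b) < 1e-10:
        return 1.0
    return float(np.corrcoef(a, b)[0, 1])


def is_collapsed(predictions: np.ndarray, threshold: float = 1e-6) -> bool:
    """True when predictions are (near-)constant, i.e. the model collapsed to the mean."""
    return bool(np.std(predictions) < threshold)
```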

Probabilistic Sharpe Ratio (Statistical Validation)

```python
from scipy.stats import norm


def psr(sharpe: float, se: float, benchmark: float = 0.0) -> float:
    """P(true Sharpe > benchmark); sharpe, se and benchmark must share one frequency."""
    return float(norm.cdf((sharpe - benchmark) / se))
```
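psr needs a standard error on the same frequency as the Sharpe it tests. A common choice, following Mertens (2002) as used by Bailey & López de Prado, is SE = sqrt((1 - skew·SR + (kurt - 1)/4 · SR²) / (n - 1)) with non-excess (Pearson) kurtosis. The sketch below computes it on daily returns; whether the skill derives se this way is an assumption, and sharpe_and_se is a hypothetical name.

```python
import numpy as np
from scipy.stats import kurtosis, norm, skew


def sharpe_and_se(daily_returns: np.ndarray) -> tuple[float, float]:
    """Per-period (non-annualized) Sharpe and its Mertens (2002) standard error."""
    r = np.asarray(daily_returns, dtype=float)
    n = len(r)
    if n < 3:
        return float("nan"), float("nan")
    sd = np.std(r, ddof=1)
    if sd == 0:
        return float("nan"), float("nan")
    sr = np.mean(r) / sd
    g3 = skew(r)                    # sample skewness
    g4 = kurtosis(r, fisher=False)  # Pearson kurtosis: 3.0 for a normal distribution
    var = (1 - g3 * sr + (g4 - 1) / 4 * sr**2) / (n - 1)
    if var <= 0:
        return float(sr), float("nan")
    return float(sr), float(np.sqrt(var))


def psr(sharpe: float, se: float, benchmark: float = 0.0) -> float:
    """P(true Sharpe > benchmark), all inputs on the same frequency."""
    return float(norm.cdf((sharpe - benchmark) / se))
```

Multiplying sharpe, se and benchmark by the same annualization factor (for example sqrt(7)) leaves the z-score unchanged, so PSR can be computed at daily frequency even when weekly_sharpe is what gets reported.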

Annualization Factors

| Market | Daily → Weekly | Daily → Annual | Rationale |
|--------|----------------|----------------|-----------|
| Crypto (24/7) | sqrt(7) = 2.65 | sqrt(365) = 19.1 | 7 trading days/week |
| Equity | sqrt(5) = 2.24 | sqrt(252) = 15.9 | 5 trading days/week |

NEVER use sqrt(252) for crypto markets.

CRITICAL: Session Filter Changes Annualization

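| View | Filter | days_per_week (factor) | Rationale |
|------|--------|------------------------|-----------|
| Session-filtered (London-NY) | Weekdays 08:00-16:00 | 5 (sqrt(5)) | Trading like equities |
| All-bars (unfiltered) | None | 7 (sqrt(7)) | Full 24/7 crypto |

Using sqrt(7) for session-filtered data overstates Sharpe by ~18% (sqrt(7)/sqrt(5) ≈ 1.18)!
See crypto-markets.md for detailed rationale.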
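One way to make the annualization follow the data view is to choose days_per_week from whether the data were session-filtered. The function below is a hypothetical sketch (its name and the session_filtered flag are not part of the skill); it takes PnL that has already been aggregated per day and also reproduces the ~18% figure.

```python
import math

import numpy as np


def annualized_weekly_sharpe(daily_pnl: np.ndarray, session_filtered: bool) -> float:
    """Daily Sharpe scaled to weekly: sqrt(5) for session-filtered data, sqrt(7) for 24/7."""
    if len(daily_pnl) < 2 or np.std(daily_pnl) == 0:
        return 0.0
    days_per_week = 5 if session_filtered else 7
    return float(np.mean(daily_pnl) / np.std(daily_pnl) * math.sqrt(days_per_week))


# Overstatement from applying sqrt(7) to session-filtered data.
inflation = math.sqrt(7) / math.sqrt(5) - 1  # about 0.183, i.e. ~18%
```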

Dual-View Metrics

For comprehensive analysis, compute metrics with BOTH views:
  1. Session-filtered (London 08:00 to NY 16:00): Primary strategy evaluation
  2. All-bars: Regime detection, data quality diagnostics
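A minimal sketch of the session filter for the first view. It assumes "London 08:00 to NY 16:00" means 08:00 Europe/London local time through 16:00 America/New_York local time on weekdays, and that timestamps are Unix seconds. The function name is hypothetical, and the skill's actual window may be defined differently (for example, as fixed UTC hours).

```python
import numpy as np
import pandas as pd


def london_ny_session_mask(timestamps: np.ndarray) -> np.ndarray:
    """True for bars between 08:00 London and 16:00 New York local time on weekdays."""
    utc = pd.to_datetime(timestamps, unit="s", utc=True)
    london = utc.tz_convert("Europe/London")
    new_york = utc.tz_convert("America/New_York")
    # Minutes since local midnight in each city (handles DST via the tz database).
    london_minutes = london.hour * 60 + london.minute
    ny_minutes = new_york.hour * 60 + new_york.minute
    is_weekday = london.dayofweek < 5  # Monday=0 ... Friday=4
    return np.asarray(is_weekday & (london_minutes >= 8 * 60) & (ny_minutes < 16 * 60))
```

Applying the mask to pnl and timestamps before daily aggregation gives the session-filtered view; the all-bars view uses the unmasked arrays.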

Academic References

| Concept | Citation |
|---------|----------|
| Deflated Sharpe Ratio | Bailey & López de Prado (2014) |
| Sharpe SE with Non-Normality | Mertens (2002) |
| Statistics of Sharpe Ratios | Lo (2002) |
| Omega Ratio | Keating & Shadwick (2002) |
| Ulcer Index | Martin (1987) |

Decision Framework

Go Criteria (Research)

```yaml
go_criteria:
  - positive_sharpe_rate > 0.55
  - mean_weekly_sharpe > 0
  - cv_fold_returns < 1.5
  - mean_hit_rate > 0.50
```
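The go criteria can be applied to per-fold results as sketched below. It assumes positive_sharpe_rate is the fraction of folds with weekly_sharpe > 0 and cv_fold_returns is the sample std divided by the absolute mean of per-fold returns; go_decision and its inputs are hypothetical.

```python
import numpy as np


def go_decision(fold_sharpes, fold_returns, fold_hit_rates) -> dict:
    """Evaluate each research go criterion and the overall verdict."""
    sharpes = np.asarray(fold_sharpes, dtype=float)
    returns = np.asarray(fold_returns, dtype=float)
    hits = np.asarray(fold_hit_rates, dtype=float)
    mean_ret = returns.mean()
    # An all-zero mean return makes the CV undefined; treat it as failing.
    cv = np.std(returns, ddof=1) / abs(mean_ret) if mean_ret != 0 else float("inf")
    checks = {
        "positive_sharpe_rate > 0.55": bool(np.mean(sharpes > 0) > 0.55),
        "mean_weekly_sharpe > 0": bool(sharpes.mean() > 0),
        "cv_fold_returns < 1.5": bool(cv < 1.5),
        "mean_hit_rate > 0.50": bool(hits.mean() > 0.50),
    }
    checks["go"] = all(checks.values())
    return checks
```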

Publication Criteria

```yaml
publication_criteria:
  - binomial_pvalue < 0.05
  - psr > 0.85
  - dsr > 0.50 # If n_trials > 1
```
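For the publication checks, dsr and binomial_pvalue might be sketched as follows. This assumes the Bailey & López de Prado (2014) construction: the benchmark SR0 is the expected maximum Sharpe across n_trials unskilled trials given the variance of the trial Sharpes, and DSR is the PSR evaluated against SR0. binomial_pvalue is assumed to be a one-sided test that the hit rate exceeds 50%, using scipy.stats.binomtest. Function names and inputs are illustrative.

```python
import numpy as np
from scipy.stats import binomtest, norm


def expected_max_sharpe(n_trials: int, sharpe_variance: float) -> float:
    """SR0: expected maximum Sharpe of n_trials unskilled trials (Bailey & Lopez de Prado, 2014)."""
    if n_trials < 2:
        return 0.0  # A single trial needs no deflation: plain PSR against 0
    g = np.euler_gamma  # Euler-Mascheroni constant, about 0.5772
    return float(np.sqrt(sharpe_variance) * (
        (1 - g) * norm.ppf(1 - 1 / n_trials) + g * norm.ppf(1 - 1 / (n_trials * np.e))
    ))


def dsr(sharpe: float, se: float, n_trials: int, sharpe_variance: float) -> float:
    """Deflated Sharpe: PSR against the multiple-testing benchmark SR0 (same frequency)."""
    sr0 = expected_max_sharpe(n_trials, sharpe_variance)
    return float(norm.cdf((sharpe - sr0) / se))


def binomial_pvalue(n_hits: int, n_total: int) -> float:
    """One-sided p-value that the hit rate exceeds 50%."""
    return float(binomtest(n_hits, n_total, p=0.5, alternative="greater").pvalue)
```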

Scripts

| Script | Purpose |
|--------|---------|
| scripts/compute_metrics.py | Compute all metrics from predictions/actuals |
| scripts/generate_report.py | Generate Markdown report from fold results |
| scripts/validate_schema.py | Validate metrics JSON against schema |

Remediations (2026-01-19 Multi-Agent Audit)

The following fixes were applied based on a 12-subagent adversarial audit:

| Issue | Root Cause | Fix | Source |
|-------|------------|-----|--------|
| weekly_sharpe=0 | Constant predictions | Model collapse detection + architecture fix | model-expert |
| IC=None | Zero variance predictions | Return 1.0 for constant (semantically correct) | model-expert |
| prediction_autocorr=NaN | Division by zero | Guard for std < 1e-10, return 1.0 | model-expert |
| Ulcer Index divide-by-zero | Peak equity = 0 | Guard with np.where(peak > 1e-10, ...) | risk-analyst |
| Omega/Profit Factor unreliable | Too few samples | min_days parameter (default: 5) | robustness-analyst |
| BiLSTM mean collapse | Architecture too small | hidden_size: 16→48, dropout: 0.5→0.3 | model-expert |
| profit_factor=1.0 (n_bars=0) | Early return wrong value | Return NaN when no data to compute ratio | risk-analyst |
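The Ulcer Index and Omega guards above might look like this. It is a sketch under assumptions: ulcer_index takes an equity curve in whatever units the caller uses, and omega_ratio takes daily returns with a zero threshold by default. Note that np.where(peak > 1e-10, ...) still evaluates the division everywhere and emits a RuntimeWarning, so this version uses np.divide(..., where=...), which skips those entries.

```python
import numpy as np


def ulcer_index(equity: np.ndarray) -> float:
    """Root-mean-square percentage drawdown from the running peak (Martin, 1987)."""
    equity = np.asarray(equity, dtype=float)
    if len(equity) == 0:
        return float("nan")
    peak = np.maximum.accumulate(equity)
    # Only divide where peak > 1e-10; elsewhere the drawdown stays 0.
    drawdown_pct = 100.0 * np.divide(
        equity - peak, peak, out=np.zeros_like(equity), where=peak > 1e-10
    )
    return float(np.sqrt(np.mean(drawdown_pct**2)))


def omega_ratio(daily_returns: np.ndarray, threshold: float = 0.0, min_days: int = 5) -> float:
    """Sum of gains above threshold over sum of shortfalls below it; NaN if too few days."""
    r = np.asarray(daily_returns, dtype=float)
    if len(r) < min_days:
        return float("nan")
    gains = np.clip(r - threshold, 0, None).sum()
    losses = np.clip(threshold - r, 0, None).sum()
    if losses == 0:
        return float("inf") if gains > 0 else float("nan")
    return float(gains / losses)
```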

Model Collapse Detection

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)

# ALWAYS check for model collapse after prediction
pred_std = np.std(predictions)
if pred_std < 1e-6:
    logger.warning(
        f"Constant predictions detected (std={pred_std:.2e}). "
        "Model collapsed to mean - check architecture."
    )
```

Recommended BiLSTM Architecture

```python
# BEFORE (causes collapse on range bars)
HIDDEN_SIZE = 16
DROPOUT = 0.5

# AFTER (prevents collapse)
HIDDEN_SIZE = 48  # 3x the hidden units (16 -> 48)
DROPOUT = 0.3  # Less aggressive regularization
```

See reference docs for complete implementation details.

---

Troubleshooting

| Issue | Cause | Solution |
|-------|-------|----------|
| weekly_sharpe is 0 | Constant predictions | Check for model collapse, increase hidden_size |
| IC returns None | Zero variance in predictions | Model collapsed - check architecture |
| prediction_autocorr is NaN | Division by zero | Guard for std < 1e-10 in autocorr calculation |
| Ulcer Index divide error | Peak equity is zero | Add guard: np.where(peak > 1e-10, ...) |
| profit_factor = 1.0 | No bars processed | Return NaN when n_bars is 0 |
| Sharpe inflated ~18% | Wrong annualization for data | Use sqrt(5) for session-filtered, sqrt(7) for 24/7 |
| PSR/DSR not computed | Missing scipy | Install: pip install scipy |
| Timestamps not parsed | Wrong format | Ensure Unix timestamps, not datetime strings |
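If timestamps arrive as datetime strings, they can be converted before saving ts.npy. This sketch assumes the scripts expect integer Unix seconds (multiply by 1000 if milliseconds are expected instead), treats naive strings as UTC, and expects a consistent string format across the input; to_unix_seconds is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def to_unix_seconds(datetime_strings) -> np.ndarray:
    """Convert datetime strings to int64 Unix seconds (naive strings treated as UTC)."""
    dt = pd.to_datetime(datetime_strings, utc=True)
    epoch = pd.Timestamp("1970-01-01", tz="UTC")
    # Integer-divide the offset from the epoch by one second.
    return np.asarray((dt - epoch) // pd.Timedelta(seconds=1), dtype=np.int64)
```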