signal-classification

Signal Classification

Predict whether an asset's price will move up or down over a forward horizon using supervised machine learning classifiers. This skill covers the full pipeline: label creation, model training, walk-forward validation, feature importance analysis, and threshold optimization for trading applications.

Why Tree-Based Models Dominate Trading ML

XGBoost and LightGBM are the workhorses of quantitative trading ML for good reason:
  • Non-linear relationships: Financial features interact in complex, non-linear ways that trees capture naturally
  • Robust to feature scale: No need to normalize or standardize inputs — trees split on rank order
  • Built-in feature importance: Understand which features drive predictions without separate analysis
  • Fast training and inference: Train on thousands of samples in seconds, predict in microseconds
  • Handle missing values: Native support for NaN without imputation hacks
  • Regularization built in: max_depth, min_child_weight, and subsample all help limit overfitting
Linear models and deep learning have their place, but for tabular trading features with fewer than 100k samples, gradient-boosted trees typically match or outperform the alternatives.

Classification Types

Binary Classification

The simplest and most common setup. Predict whether forward returns exceed a threshold:
  • Up signal: forward return > +1%
  • Down signal: forward return < -1%
  • Neutral (excluded): -1% to +1% — drop these from training to create cleaner labels
python
import numpy as np

def create_binary_labels(
    prices: np.ndarray, horizon: int = 24, threshold: float = 0.01
) -> np.ndarray:
    """Create binary labels from forward returns.

    Args:
        prices: Array of prices.
        horizon: Forward return look-ahead in bars.
        threshold: Minimum return magnitude for a label.

    Returns:
        Array of labels: 1 (up), 0 (down), NaN (neutral).
    """
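    # Price `horizon` bars ahead divided by the current price, minus 1
    fwd_returns = np.roll(prices, -horizon) / prices - 1
    # np.roll wraps around, so the last `horizon` values are invalid; mask them
    fwd_returns[-horizon:] = np.nan
    # Comparisons with NaN are False, so masked bars fall through to NaN
    labels = np.where(fwd_returns > threshold, 1,
             np.where(fwd_returns < -threshold, 0, np.nan))
    return labels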
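Because neutral bars are marked NaN rather than removed, features and labels have to be filtered together before training. The following is a minimal sketch, assuming a feature matrix `X` with one row per bar; `drop_neutral` is a hypothetical helper name, not part of this skill's scripts.

```python
import numpy as np


def drop_neutral(X: np.ndarray, labels: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Keep only rows whose label is 0 or 1 (hypothetical helper)."""
    mask = ~np.isnan(labels)  # NaN marks neutral bars and the unlabeled tail
    return X[mask], labels[mask].astype(int)


# Toy data: 6 bars, 2 features per bar
X = np.arange(12, dtype=float).reshape(6, 2)
labels = np.array([1.0, np.nan, 0.0, 1.0, np.nan, np.nan])

X_clean, y_clean = drop_neutral(X, labels)
print(X_clean.shape, y_clean.tolist())  # (3, 2) [1, 0, 1]
```

After rows are dropped, array positions no longer correspond to consecutive bars, so it is safer to build train/test windows on the original bar index and apply the mask inside each window.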

Multi-Class Classification

Three classes for finer signal granularity:
| Class | Condition | Typical use |
|---|---|---|
| Strong Up | fwd_return > +2% | High confidence long |
| Mild Up | +0.5% to +2% | Moderate confidence |
| Down | fwd_return < -0.5% | Avoid / short |
Returns between -0.5% and +0.5% fall outside all three classes. Multi-class labeling reduces per-class sample size. Use it only with large datasets (1000+ samples per class).
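The three-class scheme can be sketched in the same style as the binary labeler. This is an illustration under assumptions: the function name, the integer encoding (0 = Down, 1 = Mild Up, 2 = Strong Up), and leaving the -0.5% to +0.5% band as NaN are choices made here, not something the source specifies.

```python
import numpy as np


def create_multiclass_labels(
    prices: np.ndarray,
    horizon: int = 24,
    strong_up: float = 0.02,
    mild_up: float = 0.005,
    down: float = -0.005,
) -> np.ndarray:
    """Label bars 2 (Strong Up), 1 (Mild Up), 0 (Down), or NaN (unlabeled)."""
    prices = np.asarray(prices, dtype=float)
    fwd = np.full(prices.shape, np.nan)
    # Forward return over `horizon` bars; the last `horizon` bars stay NaN
    fwd[:-horizon] = prices[horizon:] / prices[:-horizon] - 1

    labels = np.full(prices.shape, np.nan)
    labels[fwd < down] = 0
    labels[(fwd >= mild_up) & (fwd <= strong_up)] = 1
    labels[fwd > strong_up] = 2
    return labels


toy_prices = np.array([100.0, 100.0, 100.0, 103.0, 101.0, 99.0])
toy_labels = create_multiclass_labels(toy_prices, horizon=3)
print(toy_labels)  # first three bars: 2 (+3%), 1 (+1%), 0 (-1%); the rest NaN
```

Counting labels per class with `np.unique(toy_labels[~np.isnan(toy_labels)], return_counts=True)` is a quick way to check the 1000+ samples-per-class guideline before training.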

Probability Calibration

Raw model probabilities from XGBoost/LightGBM are often poorly calibrated. A predicted probability of 0.7 does not necessarily mean a 70% chance of being correct. Use calibration to fix this:
python
from sklearn.calibration import CalibratedClassifierCV

calibrated = CalibratedClassifierCV(base_model, cv=5, method="isotonic")
calibrated.fit(X_train, y_train)
probs = calibrated.predict_proba(X_test)[:, 1]
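Isotonic calibration is often a better fit than Platt scaling (method="sigmoid") for tree models, provided there are enough samples; on small datasets it tends to overfit. Note that cv=5 uses ordinary k-fold splits inside X_train, so it does not follow the walk-forward ordering.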
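One way to see whether calibration is needed is a reliability table: bin the predicted probabilities and compare each bin's average prediction with the observed frequency of "up" outcomes. The sketch below uses only NumPy; `reliability_table` is a hypothetical helper and the probabilities are invented toy values. scikit-learn's `calibration_curve` serves a similar purpose.

```python
import numpy as np


def reliability_table(probs, outcomes, n_bins=5):
    """Return (bin_lower_edge, mean_predicted, observed_rate, count) per non-empty bin."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each probability to a bin; clip so that p == 1.0 lands in the last bin
    bin_idx = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():
            rows.append((
                float(edges[b]),
                float(probs[in_bin].mean()),
                float(outcomes[in_bin].mean()),
                int(in_bin.sum()),
            ))
    return rows


# Toy example of an overconfident model: it says 0.9 but is right 60% of the time
probs = np.array([0.9] * 10 + [0.1] * 10)
outcomes = np.array([1] * 6 + [0] * 4 + [1] * 4 + [0] * 6)

rows = reliability_table(probs, outcomes)
for lower, predicted, observed, count in rows:
    print(f"bin >= {lower:.1f}: predicted {predicted:.2f}, observed {observed:.2f}, n={count}")
```

A well-calibrated model has predicted and observed values close to each other in every bin; large gaps like the ones in this toy data suggest calibration is worth applying.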

Walk-Forward Validation

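This is the single most important concept in trading ML. Standard k-fold cross-validation with shuffling trains on observations that come after the ones it is tested on, which creates lookahead bias. Walk-forward validation respects time ordering: every test window lies strictly after its training window.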

How It Works

Window 1: [===TRAIN===][GAP][=TEST=]
Window 2:    [===TRAIN===][GAP][=TEST=]
Window 3:       [===TRAIN===][GAP][=TEST=]
Window 4:          [===TRAIN===][GAP][=TEST=]
Each window:
  1. Train on past N bars
  2. Skip a gap (embargo) equal to the forward return horizon
  3. Predict on next M bars
  4. Record out-of-sample predictions
  5. Slide forward and repeat

Typical Parameters

| Parameter | Value | Rationale |
|---|---|---|
| Train window | 30 days (720 hourly bars) | Enough data to learn, recent enough to be relevant |
| Test window | 7 days (168 hourly bars) | Enough predictions for statistical significance |
| Step size | 1 day (24 hourly bars) | Overlap test windows for more data points |
| Gap (embargo) | Same as forward horizon | Prevents label leakage |

Walk-Forward Implementation

python
from typing import Iterator

import numpy as np

def walk_forward_splits(
    n_samples: int,
    train_size: int = 720,
    test_size: int = 168,
    step_size: int = 24,
    gap: int = 24,
) -> Iterator[tuple[np.ndarray, np.ndarray]]:
    """Generate walk-forward train/test index splits.

    Args:
        n_samples: Total number of samples.
        train_size: Number of training samples per window.
        test_size: Number of test samples per window.
        step_size: Step between successive windows.
        gap: Gap between train end and test start.

    Yields:
        Tuples of (train_indices, test_indices).
    """
    start = 0
    while start + train_size + gap + test_size <= n_samples:
        train_idx = np.arange(start, start + train_size)
        test_start = start + train_size + gap
        test_idx = np.arange(test_start, test_start + test_size)
        yield train_idx, test_idx
        start += step_size
See references/validation_methods.md for purged CV, CPCV, and evaluation metrics.
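To see why the gap should equal the forward horizon, the sketch below walks the same window arithmetic with the typical parameters and checks that no training label depends on a price inside the test window. It assumes labels built from a 24-bar forward return and a hypothetical dataset of 2000 hourly bars.

```python
import numpy as np

horizon = 24  # forward-return horizon in bars (assumed)
train_size, test_size, step_size, gap = 720, 168, 24, horizon
n_samples = 2000  # hypothetical dataset length

n_windows = 0
start = 0
while start + train_size + gap + test_size <= n_samples:
    train_end = start + train_size  # exclusive end of the training window
    test_start = train_end + gap
    # The label of the last training bar uses the price `horizon` bars later
    last_price_used_in_training = (train_end - 1) + horizon
    # With gap == horizon this is always strictly before the test window
    assert last_price_used_in_training < test_start
    n_windows += 1
    start += step_size

print(n_windows)  # 46
```

With gap set to 0 the assertion fails on the first window, which is exactly the label leakage the embargo prevents. Also note that a 24-bar step with a 168-bar test window means each bar is predicted in up to 7 windows, so overlapping predictions need to be averaged or de-duplicated before they are aggregated.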

Model Training Pipeline

Full Pipeline Overview

  1. Feature engineering: compute technical indicators, on-chain metrics, volume features (see the feature-engineering skill)
  2. Label creation: forward returns with threshold, drop neutral zone
  3. Walk-forward split: time-ordered train/test windows with gap
  4. Train model: XGBoost or LightGBM on each training window
  5. Predict on test: generate out-of-sample probability predictions
  6. Aggregate predictions: concatenate all out-of-sample results
  7. Evaluate: accuracy, precision, recall, F1, AUC, profit factor
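The evaluation step can be sketched with NumPy alone once the out-of-sample predictions are aggregated. This is an illustration under assumptions: `evaluate_signals` is a hypothetical helper, signals are treated as long-only entries, AUC is left out (scikit-learn's `roc_auc_score` covers it), and the inputs are toy values.

```python
import numpy as np


def evaluate_signals(y_true, probs, fwd_returns, threshold=0.5):
    """Classification metrics plus profit factor for long-only signals."""
    y_true = np.asarray(y_true).astype(bool)
    signals = np.asarray(probs, dtype=float) >= threshold
    fwd_returns = np.asarray(fwd_returns, dtype=float)

    tp = int(np.sum(signals & y_true))
    fp = int(np.sum(signals & ~y_true))
    fn = int(np.sum(~signals & y_true))
    tn = int(np.sum(~signals & ~y_true))

    nan = float("nan")
    traded = fwd_returns[signals]  # returns of the bars where a signal fired
    gross_profit = float(traded[traded > 0].sum())
    gross_loss = float(-traded[traded < 0].sum())
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        "recall": tp / (tp + fn) if (tp + fn) else nan,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else nan,
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else nan,
    }


metrics = evaluate_signals(
    y_true=[1, 1, 0, 0, 1, 0],
    probs=[0.9, 0.6, 0.7, 0.2, 0.4, 0.1],
    fwd_returns=[0.03, 0.02, -0.02, -0.01, 0.015, -0.03],
)
print(metrics)
```

In this toy case three signals fire, two of them correct, so precision and recall are both 2/3 and the profit factor is 0.05 / 0.02 = 2.5.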

Quick Training Example

python
from xgboost import XGBClassifier

model = XGBClassifier(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    eval_metric="logloss",
    # use_label_encoder is omitted: it is deprecated in recent XGBoost releases
    random_state=42,
)

model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    verbose=False,
)

probabilities = model.predict_proba(X_test)[:, 1]
See references/model_guide.md for parameter recommendations and tuning.

SHAP Feature Importance

SHAP (SHapley Additive exPlanations) attributes each prediction to individual features using Shapley values, and is widely treated as the gold standard for understanding model predictions.

Global Feature Importance

Which features matter most across all predictions:
python
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Summary plot (top 15 features)
shap.summary_plot(shap_values, X_test, max_display=15)

Local Explanations

Why a specific prediction was made:
python
# Explain a single prediction
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0])

Temporal Feature Importance

Track how feature importance drifts over walk-forward windows. If a feature's importance drops significantly, the market regime may have shifted.
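A minimal sketch of this tracking, assuming each walk-forward window yields a 2-D SHAP array of shape (n_samples, n_features) such as the one returned by the TreeExplainer above for a binary model. Both function names and the 0.5 drop ratio are hypothetical choices made for illustration.

```python
import numpy as np


def importance_by_window(shap_per_window):
    """Mean |SHAP| per feature for each window, normalized to sum to 1 per window."""
    rows = []
    for sv in shap_per_window:
        mean_abs = np.abs(np.asarray(sv, dtype=float)).mean(axis=0)
        rows.append(mean_abs / mean_abs.sum())
    return np.vstack(rows)  # shape: (n_windows, n_features)


def flag_drops(importance, drop_ratio=0.5):
    """Flag features whose latest share fell below drop_ratio times their earlier average."""
    baseline = importance[:-1].mean(axis=0)
    return importance[-1] < drop_ratio * baseline


# Toy SHAP values: 3 windows, 2 samples each, 2 features
windows = [
    np.array([[0.4, -0.1], [-0.4, 0.1]]),
    np.array([[0.3, 0.1], [-0.5, -0.1]]),
    np.array([[0.1, 0.3], [-0.1, -0.3]]),
]

importance = importance_by_window(windows)
flags = flag_drops(importance)
print(importance.round(2))
print(flags)  # feature 0 dropped from a 0.80 share to 0.25
```

Normalizing per window keeps the comparison about relative importance, so a change in overall prediction confidence between windows does not register as drift on its own.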

Threshold Optimization

The default 0.5 probability threshold is almost never optimal for trading.

Why Not 0.5?

  • Class imbalance: if 60% of labels are "up", a 0.5 threshold is too aggressive
  • Trading costs: marginal signals (0.51 probability) rarely cover transaction costs
  • Asymmetric payoffs: a false positive loses money while a skipped trade only forgoes a gain, so precision matters more than recall for trading
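The effect of trading costs on the required threshold can be worked out directly. Setting the expected net return per trade to zero, p(W - c) - (1 - p)(L + c) = 0, gives a break-even win probability of p = (L + c) / (W + L), where W is the average win, L the average loss (as a positive number), and c the round-trip cost. The sketch below assumes calibrated probabilities and fixed win and loss sizes, which is a simplification; the function name is hypothetical.

```python
def break_even_probability(avg_win: float, avg_loss: float, cost: float) -> float:
    """Win probability at which the expected net return per trade is zero.

    Solves p * (avg_win - cost) - (1 - p) * (avg_loss + cost) = 0 for p.
    """
    return (avg_loss + cost) / (avg_win + avg_loss)


# Symmetric 1% wins and losses, matching the +/-1% label threshold
p_no_cost = break_even_probability(avg_win=0.01, avg_loss=0.01, cost=0.0)
p_with_cost = break_even_probability(avg_win=0.01, avg_loss=0.01, cost=0.005)

print(p_no_cost)              # 0.5
print(round(p_with_cost, 4))  # 0.75
```

Under these assumptions, a 50 bps round-trip cost moves the break-even probability from 0.5 to about 0.75, which is why marginal signals near 0.5 rarely pay for themselves.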

Optimize for Profit Factor

python
import numpy as np


def optimize_threshold(
    probabilities: np.ndarray,
    returns: np.ndarray,
    thresholds: np.ndarray | None = None,
) -> tuple[float, float]:
    """Find threshold that maximizes profit factor.

    Args:
        probabilities: Model predicted probabilities.
        returns: Actual forward returns.
        thresholds: Thresholds to search over.

    Returns:
        Tuple of (best_threshold, best_profit_factor).
    """
    if thresholds is None:
        thresholds = np.arange(0.50, 0.85, 0.01)
    best_threshold, best_pf = 0.5, 0.0
    for t in thresholds:
        signals = probabilities >= t
        if signals.sum() < 10:
            continue
        signal_returns = returns[signals]
        wins = signal_returns[signal_returns > 0].sum()
        losses = abs(signal_returns[signal_returns < 0].sum())
        pf = wins / losses if losses > 0 else 0.0
        if pf > best_pf:
            best_pf = pf
            best_threshold = t
    return best_threshold, best_pf
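Typical finding: the optimal threshold often falls in the 0.60-0.75 range for crypto trading signals. Select it on out-of-sample walk-forward predictions; a threshold tuned on the same data it is evaluated on will look better than it is.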

Crypto-Specific Considerations

Short Training Windows

Crypto market regimes change fast. A model trained on 6 months of data may perform worse than one trained on 30 days. Use shorter training windows and retrain frequently.

Class Imbalance

Most time periods are "flat" (returns within the neutral zone). Strategies to handle this:
  • Drop neutral zone: only train on clear up/down labels
  • Reweight classes: set scale_pos_weight in XGBoost (this reweights the positive class rather than undersampling the majority)
  • SMOTE: synthetic minority oversampling (use cautiously — can introduce lookahead)
  • Adjust threshold: raise the probability threshold to compensate
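For the reweighting option, a common starting value for XGBoost's `scale_pos_weight` is the ratio of negative to positive samples in the training window. The helper below is a hypothetical sketch of that calculation on toy labels, not code from this skill's scripts.

```python
import numpy as np


def compute_scale_pos_weight(y) -> float:
    """Ratio of negative (0) to positive (1) labels in a training window."""
    y = np.asarray(y)
    n_pos = int(np.sum(y == 1))
    n_neg = int(np.sum(y == 0))
    if n_pos == 0:
        raise ValueError("no positive samples in the training window")
    return n_neg / n_pos


toy_y_train = np.array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0])
weight = compute_scale_pos_weight(toy_y_train)
print(weight)  # 4.0 (8 negatives / 2 positives)
```

The result would be passed as `XGBClassifier(scale_pos_weight=weight)`. Because the class balance shifts between walk-forward windows, it makes sense to recompute it from each training window rather than once for the whole dataset.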

Transaction Costs

A model with 55% accuracy sounds good, but after 0.5% round-trip costs (slippage + fees), many signals become unprofitable. Always evaluate signals net of costs:
python
net_return = gross_return - 0.005  # 50 bps round-trip
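Applied to a whole set of signal returns, the same subtraction shows how quickly a gross edge can disappear. The trade returns below are invented toy values for illustration, and `summarize` is a hypothetical helper.

```python
import numpy as np

COST = 0.005  # 50 bps round-trip, as above

# Toy gross returns for 10 signals (invented for illustration)
gross = np.array([0.012, -0.010, 0.008, 0.015, -0.011,
                  0.004, 0.009, -0.009, 0.011, 0.003])
net = gross - COST


def summarize(returns: np.ndarray) -> dict:
    """Share of profitable trades and average return per trade."""
    return {
        "hit_rate": float(np.mean(returns > 0)),
        "mean_return": float(returns.mean()),
    }


gross_stats = summarize(gross)
net_stats = summarize(net)
print(gross_stats)  # hit rate 0.7, positive mean return
print(net_stats)    # hit rate 0.5, negative mean return
```

Here a 70% gross hit rate falls to 50% net of costs and the average return per trade turns negative, because the two smallest winners no longer cover the round-trip cost.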

Feature Decay

Features lose predictive power over time as more participants discover and trade on them. Monitor rolling performance and retrain when metrics degrade.
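One simple way to monitor this is a rolling hit rate over recent out-of-sample predictions, with a retrain trigger when it falls below a baseline. The sketch below is an assumption-laden illustration: the function names, the 50-prediction window, and the 0.52 baseline are hypothetical and would need tuning per strategy.

```python
import numpy as np


def rolling_hit_rate(correct, window: int) -> np.ndarray:
    """Rolling mean of a 0/1 'prediction was correct' series (NaN until the window fills)."""
    correct = np.asarray(correct, dtype=float)
    out = np.full(correct.shape, np.nan)
    csum = np.concatenate(([0.0], np.cumsum(correct)))
    # Difference of cumulative sums gives the sum over each trailing window
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def needs_retrain(correct, window: int = 50, baseline: float = 0.52) -> bool:
    """True when the most recent rolling hit rate is below the baseline."""
    latest = rolling_hit_rate(correct, window)[-1]
    return bool(latest < baseline) if not np.isnan(latest) else False


rates = rolling_hit_rate([1, 1, 0, 0], window=2)
print(rates)  # [nan 1.  0.5 0. ]

# Six correct predictions followed by four misses
decayed = [1] * 6 + [0] * 4
print(needs_retrain(decayed, window=4, baseline=0.5))  # True
```

The same pattern works with rolling profit factor or net return in place of hit rate, which ties the retrain decision more directly to trading performance.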

Integration with Other Skills

| Skill | Integration |
|---|---|
| feature-engineering | Compute input features for the classifier |
| vectorbt | Backtest trading strategies from ML signals |
| regime-detection | Train separate models per regime, or use regime as a feature |
| position-sizing | Size positions based on classifier confidence |
| risk-management | Apply portfolio-level risk limits to ML-generated signals |

Files

References

  • references/model_guide.md: XGBoost and LightGBM parameter guide, tuning, and ensembling
  • references/validation_methods.md: Walk-forward, purged CV, CPCV, and evaluation metrics

Scripts

  • scripts/train_classifier.py: Train a signal classifier with walk-forward validation and feature importance
  • scripts/walk_forward_backtest.py: Backtest ML signals vs buy-and-hold with walk-forward validation

Dependencies

bash
# Core (required)
uv pip install pandas numpy scikit-learn

# Optional (recommended)
uv pip install xgboost lightgbm shap

Key Takeaways

  1. Walk-forward validation is non-negotiable: random CV will give you wildly inflated results
  2. Optimize threshold for profit factor, not accuracy: a high-precision, low-recall model can be more profitable than a high-accuracy one
  3. Short training windows for crypto: 30 days can beat 6 months when regimes shift quickly
  4. Monitor feature decay: retrain when rolling metrics drop below baseline
  5. Always evaluate net of costs: a 55% accurate model may be unprofitable after fees
  6. SHAP over raw feature importance: SHAP gives consistent, theoretically grounded explanations