backtest-expert

Backtest Expert

A systematic approach to backtesting trading strategies, based on a professional methodology that prioritizes robustness over optimistic results.

Core Philosophy

Goal: Find strategies that "break the least", not strategies that "profit the most" on paper.

Principle: Add friction, stress test assumptions, and see what survives. If a strategy holds up under pessimistic conditions, it's more likely to work in live trading.

When to Use This Skill

Use this skill when:

Developing or validating systematic trading strategies
Evaluating whether a trading idea is robust enough for live implementation
Troubleshooting why a backtest might be misleading
Learning proper backtesting methodology
Avoiding common pitfalls (curve-fitting, look-ahead bias, survivorship bias)
Assessing parameter sensitivity and regime dependence
Setting realistic expectations for slippage and execution costs

Backtesting Workflow

1. State the Hypothesis

Define the edge in one sentence.

Example: "Stocks that gap up >3% on earnings and pull back to the previous day's close within the first hour provide a mean-reversion opportunity."

If you can't articulate the edge clearly, don't proceed to testing.

2. Codify Rules with Zero Discretion

Define with complete specificity:

Entry: Exact conditions, timing, price type
Exit: Stop loss, profit target, time-based exit
Position sizing: Fixed dollar amount, % of portfolio, or volatility-adjusted
Filters: Market cap, volume, sector, volatility conditions
Universe: What instruments are eligible

Critical: No subjective judgment allowed. Every decision must be rule-based and unambiguous.
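One way to enforce zero discretion is to keep every rule in a single immutable object, so that nothing can be adjusted by hand mid-test. The sketch below is illustrative only: `StrategyRules`, `entry_signal`, and all field names are hypothetical, loosely modeled on the gap-and-pullback example from step 1.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class StrategyRules:
    """Hypothetical rule set: every decision is an explicit, fixed field."""
    # Entry: exact condition, timing, and price type
    min_gap_pct: float        # e.g. 3.0 means a gap up of more than 3%
    entry_deadline: time      # latest time an entry may trigger
    entry_price_type: str     # "limit" or "market"
    # Exit: stop loss, profit target, time-based exit
    stop_loss_pct: float
    profit_target_pct: float
    time_exit: time
    # Position sizing and filters
    risk_per_trade_pct: float
    min_market_cap: float
    min_avg_volume: int

    def __post_init__(self):
        # Reject ambiguous or nonsensical rule values up front
        if self.entry_price_type not in ("limit", "market"):
            raise ValueError("entry_price_type must be 'limit' or 'market'")
        if self.stop_loss_pct <= 0 or self.profit_target_pct <= 0:
            raise ValueError("stop loss and profit target must be positive")


def entry_signal(rules, gap_pct, now, touched_prev_close):
    """Return True only when every codified entry condition holds."""
    return (gap_pct > rules.min_gap_pct
            and now <= rules.entry_deadline
            and touched_prev_close)
```

Because the dataclass is frozen, changing a parameter means creating a new rule set, which keeps each backtest run tied to one unambiguous set of rules.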

3. Run Initial Backtest

Test over:

Minimum 5 years (preferably 10+)
Multiple market regimes (bull, bear, high/low volatility)
Realistic costs: Commissions + conservative slippage

Examine initial results for basic viability. If fundamentally broken, iterate on hypothesis.
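As a rough illustration of "realistic costs", the sketch below deducts commissions and slippage on both the entry and the exit of a long trade. The function name `net_pnl` and the default per-share cost figures are assumptions for illustration, not recommended values.

```python
def net_pnl(entry_price, exit_price, shares,
            commission_per_share=0.005, slippage_per_share=0.02):
    """Long-trade P&L after commissions and slippage on both sides.

    The default cost figures are placeholders; use estimates that are
    conservative for your own broker and instruments.
    """
    gross = (exit_price - entry_price) * shares
    # Costs are paid twice: once on entry, once on exit
    costs = 2 * shares * (commission_per_share + slippage_per_share)
    return gross - costs
```

A trade that looks profitable on gross P&L can turn negative once both legs carry costs, which is why the initial backtest should already include them.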

4. Stress Test the Strategy

This is where 80% of testing time should be spent.

Parameter sensitivity:

Test stop loss at 50%, 75%, 100%, 125%, 150% of baseline
Test profit target at 80%, 90%, 100%, 110%, 120% of baseline
Vary entry/exit timing by ±15-30 minutes
Look for "plateaus" of stable performance, not narrow spikes
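The stop-loss and profit-target variations listed above can be generated as a grid. This is a minimal sketch: `parameter_grid` and `sensitivity_table` are hypothetical helpers, and `run_backtest` stands in for whatever function your own platform provides to return a performance metric for a given stop and target.

```python
from itertools import product

# Multipliers of the baseline values, as listed in the text
STOP_MULTIPLIERS = (0.50, 0.75, 1.00, 1.25, 1.50)
TARGET_MULTIPLIERS = (0.80, 0.90, 1.00, 1.10, 1.20)


def parameter_grid(base_stop_pct, base_target_pct):
    """Yield every (stop, target) pair from the multipliers above."""
    for s, t in product(STOP_MULTIPLIERS, TARGET_MULTIPLIERS):
        yield round(base_stop_pct * s, 4), round(base_target_pct * t, 4)


def sensitivity_table(run_backtest, base_stop_pct, base_target_pct):
    """Map each parameter pair to the metric returned by run_backtest.

    run_backtest is a placeholder callable: (stop_pct, target_pct) -> metric.
    """
    return {params: run_backtest(*params)
            for params in parameter_grid(base_stop_pct, base_target_pct)}
```

With a baseline stop of 2% and target of 4%, this produces 25 combinations; the resulting table is what you inspect for plateaus rather than isolated spikes.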

Execution friction:

Increase slippage to 1.5-2x typical estimates
Model worst-case fills (buy at ask+1 tick, sell at bid-1 tick)
Add realistic order rejection scenarios
Test with pessimistic commission structures
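The worst-case fill rule and the slippage multiplier can be sketched as two small helpers. The names `worst_case_fill` and `stressed_slippage` and the default tick size of 0.01 are assumptions for illustration.

```python
def worst_case_fill(side, bid, ask, tick_size=0.01, extra_ticks=1):
    """Pessimistic fill: buy at ask + extra ticks, sell at bid - extra ticks."""
    if side == "buy":
        return ask + extra_ticks * tick_size
    if side == "sell":
        return bid - extra_ticks * tick_size
    raise ValueError("side must be 'buy' or 'sell'")


def stressed_slippage(typical_slippage, multiplier=2.0):
    """Scale a typical slippage estimate by a factor in the 1.5-2x range."""
    if not 1.5 <= multiplier <= 2.0:
        raise ValueError("multiplier outside the 1.5-2x stress range")
    return typical_slippage * multiplier
```

Running the same backtest with these fills instead of mid or last prices shows how much of the edge depends on favorable execution.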

Time robustness:

Analyze year-by-year performance
Require positive expectancy in majority of years
Ensure strategy doesn't rely on 1-2 exceptional periods
Test in different market regimes separately
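A simple way to check time robustness is to group trades by year and look at expectancy per year. The sketch below assumes trades are available as `(year, pnl)` pairs; the helper names and the "drop the two best years" check are one possible reading of the criteria above, not a fixed standard.

```python
from collections import defaultdict
from statistics import mean


def expectancy_by_year(trades):
    """trades: list of (year, pnl) pairs. Returns {year: mean P&L per trade}."""
    by_year = defaultdict(list)
    for year, pnl in trades:
        by_year[year].append(pnl)
    return {year: mean(pnls) for year, pnls in sorted(by_year.items())}


def time_robustness(trades):
    """Check the two time-robustness conditions on a list of (year, pnl)."""
    yearly = expectancy_by_year(trades)
    positive_years = sum(1 for e in yearly.values() if e > 0)
    totals = defaultdict(float)
    for year, pnl in trades:
        totals[year] += pnl
    best_two = sum(sorted(totals.values(), reverse=True)[:2])
    return {
        # Positive expectancy in a strict majority of years
        "majority_positive": positive_years > len(yearly) / 2,
        # Still profitable after dropping the two best years
        "survives_without_best_two": sum(totals.values()) - best_two > 0,
    }
```

A strategy that fails the second check earns most of its profit in one or two exceptional periods, which is the dependence the list above warns about.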

Sample size:

Absolute minimum: 30 trades
Preferred: 100+ trades
High confidence: 200+ trades
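The thresholds above translate directly into a small gate that can run before any result is interpreted. The function name and the tier labels are hypothetical; only the cut-offs of 30, 100, and 200 trades come from the list.

```python
def sample_size_tier(n_trades):
    """Map a trade count to the confidence tiers listed above."""
    if n_trades < 30:
        return "insufficient"      # below the absolute minimum
    if n_trades < 100:
        return "minimum"           # 30-99 trades
    if n_trades < 200:
        return "preferred"         # 100-199 trades
    return "high confidence"       # 200+ trades
```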

5. Out-of-Sample Validation

Walk-forward analysis:

Optimize on training period (e.g., Year 1-3)
Test on validation period (Year 4)
Roll forward and repeat
Compare in-sample vs out-of-sample performance

Warning signs:

Out-of-sample performance is less than 50% of in-sample performance
Need frequent parameter re-optimization
Parameters change dramatically between periods
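The rolling train/validate split can be sketched as a generator of year windows, plus a ratio for the first warning sign. `walk_forward_windows` and `oos_ratio` are hypothetical helpers; the 3-year training and 1-year validation defaults follow the example in the text.

```python
def walk_forward_windows(years, train_size=3, test_size=1):
    """Yield (train_years, test_years) pairs that roll forward in time."""
    last_start = len(years) - train_size - test_size
    for start in range(0, last_start + 1, test_size):
        train = years[start:start + train_size]
        test = years[start + train_size:start + train_size + test_size]
        yield train, test


def oos_ratio(in_sample_metric, out_of_sample_metric):
    """Out-of-sample performance as a fraction of in-sample performance."""
    if in_sample_metric <= 0:
        raise ValueError("in-sample metric must be positive to form a ratio")
    return out_of_sample_metric / in_sample_metric
```

For six years of data this yields three windows, each validating on the year that follows its training period. A ratio below 0.5 matches the first warning sign in the list.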

6. Evaluate Results

Questions to answer:

Does edge survive pessimistic assumptions?
Is performance stable across parameter variations?
Does strategy work in multiple market regimes?
Is sample size sufficient for statistical confidence?
Are results realistic, not "too good to be true"?

Decision criteria:

✅ Deploy: Survives all stress tests with acceptable performance
🔄 Refine: Core logic sound but needs parameter adjustment
❌ Abandon: Fails stress tests or relies on fragile assumptions

Key Testing Principles

Punish the Strategy

Add friction everywhere:

Commissions higher than reality
Slippage 1.5-2x typical
Worst-case fills
Order rejections
Partial fills

Rationale: Strategies that survive pessimistic assumptions often outperform in live trading.

Seek Plateaus, Not Peaks

Look for parameter ranges where performance is stable, not optimal values that create performance spikes.

Good: Strategy profitable with stop loss anywhere from 1.5% to 3.0%
Bad: Strategy only works with stop loss at exactly 2.13%

Stable performance indicates genuine edge; narrow optima suggest curve-fitting.
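One rough way to tell a plateau from a peak is to compare a parameter's metric with its neighbors. In the sketch below, `metrics` is assumed to be a list of performance values ordered by parameter value. The helper names, the neighborhood width, and the 0.7 threshold are all illustrative choices, not established cut-offs.

```python
def plateau_score(metrics, index, width=1):
    """Ratio of the worst metric in the neighborhood to the metric at index."""
    lo = max(0, index - width)
    hi = min(len(metrics), index + width + 1)
    center = metrics[index]
    if center <= 0:
        return 0.0  # an unprofitable setting is never a plateau
    return min(metrics[lo:hi]) / center


def looks_like_plateau(metrics, index, width=1, min_ratio=0.7):
    """True if neighbors keep at least min_ratio of the center's performance."""
    return plateau_score(metrics, index, width) >= min_ratio
```

A score near 1.0 means nearby parameter values perform about as well as the chosen one; a low or negative score means the result collapses as soon as the parameter moves.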

Test All Cases, Not Cherry-Picked Examples

Wrong approach: Study hand-picked "market leaders" that worked
Right approach: Test every stock that met criteria, including those that failed

Selective examples create survivorship bias and overestimate strategy quality.

Separate Idea Generation from Validation

Intuition: Useful for generating hypotheses
Validation: Must be purely data-driven

Never let attachment to an idea influence interpretation of test results.

Common Failure Patterns

Recognize these patterns early to save time:

Parameter sensitivity: Only works with exact parameter values
Regime-specific: Great in some years, terrible in others
Slippage sensitivity: Unprofitable when realistic costs added
Small sample: Too few trades for statistical confidence
Look-ahead bias: "Too good to be true" results
Over-optimization: Many parameters, poor out-of-sample results

See references/failed_tests.md for detailed examples and a diagnostic framework.

Available Reference Documentation

Methodology Reference

File: references/methodology.md

When to read: For detailed guidance on specific testing techniques.

Contents:

Stress testing methods
Parameter sensitivity analysis
Slippage and friction modeling
Sample size requirements
Market regime classification
Common biases and pitfalls (survivorship, look-ahead, curve-fitting, etc.)

Failed Tests Reference

File: references/failed_tests.md

When to read: When a strategy fails its tests, or when learning from past mistakes.

Contents:

Why failures are valuable
Common failure patterns with examples
Case study documentation framework
Red flags checklist for evaluating backtests

Critical Reminders

Time allocation: Spend 20% generating ideas, 80% trying to break them.

Context-free requirement: If a strategy requires "perfect context" to work, it's not robust enough for systematic trading.

Red flag: If backtest results look too good (>90% win rate, minimal drawdowns, perfect timing), audit carefully for look-ahead bias or data issues.

Tool limitations: Understand your backtesting platform's quirks (interpolation methods, handling of low liquidity, data alignment issues).

Statistical significance: Small edges require large sample sizes to prove. 5% edge per trade needs 100+ trades to distinguish from luck.
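How many trades a given edge needs depends on how noisy the per-trade returns are. The sketch below uses a one-sample t-statistic and treats trades as independent, which real trades often are not. The helper names, the target of t = 2, and the per-trade standard deviations in the usage note are assumptions for illustration.

```python
from math import ceil, sqrt
from statistics import mean, stdev


def t_statistic(returns):
    """t-statistic of the mean per-trade return against zero."""
    n = len(returns)
    return mean(returns) / (stdev(returns) / sqrt(n))


def trades_needed(edge, per_trade_std, t_target=2.0):
    """Trades needed for the mean edge to reach t_target standard errors.

    Solves edge / (per_trade_std / sqrt(n)) >= t_target for n.
    """
    return ceil((t_target * per_trade_std / edge) ** 2)
```

Under these assumptions, a 5% edge with a 25% per-trade standard deviation needs roughly 100 trades, in line with the figure above. If the standard deviation doubles to 50%, the requirement rises to roughly 400 trades.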

Discretionary vs Systematic Differences

This skill focuses on systematic/quantitative backtesting where:

All rules are codified in advance
No discretion or "feel" in execution
Testing happens on all historical examples, not cherry-picked cases
Context (news, macro) is deliberately stripped out

Discretionary traders study the market differently; this skill may not apply to setups requiring subjective judgment.
