backtest-expert

Backtest Expert

Systematic approach to backtesting trading strategies based on professional methodology that prioritizes robustness over optimistic results.

Core Philosophy

Goal: Find strategies that "break the least", not strategies that "profit the most" on paper.
Principle: Add friction, stress test assumptions, and see what survives. If a strategy holds up under pessimistic conditions, it's more likely to work in live trading.

When to Use This Skill

Use this skill when:
  • Developing or validating systematic trading strategies
  • Evaluating whether a trading idea is robust enough for live implementation
  • Troubleshooting why a backtest might be misleading
  • Learning proper backtesting methodology
  • Avoiding common pitfalls (curve-fitting, look-ahead bias, survivorship bias)
  • Assessing parameter sensitivity and regime dependence
  • Setting realistic expectations for slippage and execution costs

Backtesting Workflow

1. State the Hypothesis

Define the edge in one sentence.
Example: "Stocks that gap up >3% on earnings and pull back to the previous day's close within the first hour offer a mean-reversion opportunity."
If you can't articulate the edge clearly, don't proceed to testing.

2. Codify Rules with Zero Discretion

Define with complete specificity:
  • Entry: Exact conditions, timing, price type
  • Exit: Stop loss, profit target, time-based exit
  • Position sizing: Fixed dollar amount, % of portfolio, volatility-adjusted
  • Filters: Market cap, volume, sector, volatility conditions
  • Universe: What instruments are eligible
Critical: No subjective judgment allowed. Every decision must be rule-based and unambiguous.
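One way to enforce zero discretion is to freeze every rule into a single immutable object before any test runs. The sketch below is illustrative only: the class name `StrategyRules`, its fields, and the example values are assumptions, not part of the methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyRules:
    """Hypothetical container: every field is an explicit, testable rule."""
    # Entry
    min_gap_pct: float           # gap-up threshold, e.g. 3.0 means >3%
    entry_window_minutes: int    # how long after the open an entry is allowed
    entry_price_type: str        # "limit" or "market"
    # Exit
    stop_loss_pct: float
    profit_target_pct: float
    max_hold_minutes: int        # time-based exit
    # Position sizing
    risk_per_trade_pct: float    # % of portfolio risked per trade
    # Filters / universe
    min_market_cap_usd: float
    min_avg_daily_volume: int

    def position_size(self, equity: float, entry_price: float) -> int:
        """Shares such that hitting the stop loses risk_per_trade_pct of equity."""
        risk_dollars = equity * self.risk_per_trade_pct / 100
        loss_per_share = entry_price * self.stop_loss_pct / 100
        return int(risk_dollars // loss_per_share)


# Example values are made up for illustration.
rules = StrategyRules(
    min_gap_pct=3.0, entry_window_minutes=60, entry_price_type="limit",
    stop_loss_pct=2.0, profit_target_pct=4.0, max_hold_minutes=240,
    risk_per_trade_pct=1.0, min_market_cap_usd=1e9, min_avg_daily_volume=500_000,
)
print(rules.position_size(equity=100_000, entry_price=50.0))
```

Because the dataclass is frozen, a rule cannot be quietly changed halfway through a test run; any change means constructing a new, explicitly different rule set.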

3. Run Initial Backtest

Test over:
  • Minimum 5 years (preferably 10+)
  • Multiple market regimes (bull, bear, high/low volatility)
  • Realistic costs: Commissions + conservative slippage
Examine the initial results for basic viability. If the strategy is fundamentally broken, iterate on the hypothesis.
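As a rough illustration of "realistic costs", the sketch below charges commission and slippage on both legs of a long trade. The function name `net_trade_pnl` and the default per-share cost figures are assumptions chosen for the example; substitute your own broker's numbers.

```python
def net_trade_pnl(entry: float, exit_: float, shares: int,
                  commission_per_share: float = 0.005,
                  slippage_per_share: float = 0.02) -> float:
    """Long-trade P&L after paying commission and slippage on entry and exit.

    The default cost figures are illustrative assumptions, not recommendations.
    """
    gross = (exit_ - entry) * shares
    costs = 2 * shares * (commission_per_share + slippage_per_share)
    return gross - costs


# A $0.30 move on 1000 shares: roughly $300 gross, minus $50 of round-trip costs.
print(round(net_trade_pnl(entry=50.0, exit_=50.30, shares=1000), 2))
```

A flat trade loses money under this model, which is the point: a strategy whose edge per trade is smaller than its round-trip cost is not viable.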

4. Stress Test the Strategy

This is where 80% of testing time should be spent.
Parameter sensitivity:
  • Test stop loss at 50%, 75%, 100%, 125%, 150% of baseline
  • Test profit target at 80%, 90%, 100%, 110%, 120% of baseline
  • Vary entry/exit timing by ±15-30 minutes
  • Look for "plateaus" of stable performance, not narrow spikes
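The parameter variations listed above can be enumerated mechanically. This is a minimal sketch: `sensitivity_grid` and the dictionary keys are hypothetical names, and each resulting combination would be passed to whatever backtest function you already have.

```python
from itertools import product

# Multipliers and offsets taken from the list above.
STOP_MULTIPLIERS = (0.50, 0.75, 1.00, 1.25, 1.50)
TARGET_MULTIPLIERS = (0.80, 0.90, 1.00, 1.10, 1.20)
TIMING_OFFSETS_MIN = (-30, -15, 0, 15, 30)


def sensitivity_grid(base_stop_pct: float, base_target_pct: float) -> list[dict]:
    """Every combination of stop, target, and timing variation around a baseline."""
    return [
        {
            "stop_pct": round(base_stop_pct * s, 4),
            "target_pct": round(base_target_pct * t, 4),
            "timing_offset_min": offset,
        }
        for s, t, offset in product(
            STOP_MULTIPLIERS, TARGET_MULTIPLIERS, TIMING_OFFSETS_MIN
        )
    ]


grid = sensitivity_grid(base_stop_pct=2.0, base_target_pct=4.0)
print(len(grid))  # 5 * 5 * 5 combinations
```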
Execution friction:
  • Increase slippage to 1.5-2x typical estimates
  • Model worst-case fills (buy at ask+1 tick, sell at bid-1 tick)
  • Add realistic order rejection scenarios
  • Test with pessimistic commission structures
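A pessimistic fill model along these lines might look like the following. The helper names and the default tick size of 0.01 are assumptions for illustration; the fill rule itself (ask + 1 tick, bid - 1 tick) comes from the list above.

```python
def worst_case_fill(side: str, bid: float, ask: float, tick: float = 0.01) -> float:
    """Fill price one tick worse than the far side of the quoted spread."""
    if side == "buy":
        return ask + tick
    if side == "sell":
        return bid - tick
    raise ValueError(f"unknown side: {side!r}")


def stressed_slippage(typical_slippage: float, multiplier: float = 2.0) -> float:
    """Scale a typical slippage estimate by a stress multiplier (1.5 to 2.0 suggested)."""
    return typical_slippage * multiplier


print(round(worst_case_fill("buy", bid=10.00, ask=10.02), 2))
print(round(worst_case_fill("sell", bid=10.00, ask=10.02), 2))
```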
Time robustness:
  • Analyze year-by-year performance
  • Require positive expectancy in majority of years
  • Ensure strategy doesn't rely on 1-2 exceptional periods
  • Test in different market regimes separately
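The year-by-year checks can be sketched as below. The function name `yearly_report`, the input shape (a list of `(year, pnl)` pairs), and the sample trades are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def yearly_report(trades):
    """Summarize per-year expectancy and dependence on the best two years.

    trades: iterable of (year, pnl) pairs.
    """
    pnl_by_year = defaultdict(list)
    for year, pnl in trades:
        pnl_by_year[year].append(pnl)

    expectancy = {y: mean(v) for y, v in sorted(pnl_by_year.items())}
    totals = {y: sum(v) for y, v in pnl_by_year.items()}
    positive_years = sum(1 for e in expectancy.values() if e > 0)
    total_pnl = sum(totals.values())
    top_two = sum(sorted(totals.values(), reverse=True)[:2])

    return {
        "expectancy": expectancy,
        "majority_positive": positive_years > len(expectancy) / 2,
        # Share of total profit coming from the two best years.
        "top_two_share": top_two / total_pnl if total_pnl > 0 else None,
    }


# Fabricated sample trades, for demonstration only.
sample_trades = [
    (2019, 120.0), (2019, -40.0),
    (2020, 300.0), (2020, 100.0),
    (2021, -50.0), (2021, 20.0),
    (2022, 60.0), (2022, 10.0),
]
report = yearly_report(sample_trades)
print(report["majority_positive"], round(report["top_two_share"], 2))
```

In this sample most years are positive, yet the two best years account for over 90% of total profit, which is exactly the kind of dependence on exceptional periods the list warns about.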
Sample size:
  • Absolute minimum: 30 trades
  • Preferred: 100+ trades
  • High confidence: 200+ trades

5. Out-of-Sample Validation

Walk-forward analysis:
  1. Optimize on training period (e.g., Year 1-3)
  2. Test on validation period (Year 4)
  3. Roll forward and repeat
  4. Compare in-sample vs out-of-sample performance
Warning signs:
  • Out-of-sample performance below 50% of in-sample performance
  • Need frequent parameter re-optimization
  • Parameters change dramatically between periods
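The rolling train/validate split and the first warning sign can be sketched as follows. Function names, window lengths, and the example years are assumptions; the optimization and backtest steps themselves are left to your own tooling.

```python
def walk_forward_windows(years, train_len=3, test_len=1):
    """Yield (train_years, test_years) pairs, rolling forward by test_len each step."""
    start = 0
    while start + train_len + test_len <= len(years):
        train = years[start:start + train_len]
        test = years[start + train_len:start + train_len + test_len]
        yield train, test
        start += test_len


def oos_warning(in_sample_perf: float, out_of_sample_perf: float,
                threshold: float = 0.5) -> bool:
    """True when out-of-sample performance is below `threshold` of in-sample.

    Assumes in_sample_perf is positive and both use the same metric.
    """
    return out_of_sample_perf < threshold * in_sample_perf


windows = list(walk_forward_windows(list(range(2015, 2021))))
for train, test in windows:
    print(train, "->", test)
```

With six years of data and a 3-year training window, this produces three train/test pairs, each tested on a year the optimizer never saw.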

6. Evaluate Results

Questions to answer:
  • Does edge survive pessimistic assumptions?
  • Is performance stable across parameter variations?
  • Does strategy work in multiple market regimes?
  • Is sample size sufficient for statistical confidence?
  • Are results realistic, not "too good to be true"?
Decision criteria:
  • Deploy: Survives all stress tests with acceptable performance
  • Refine: Core logic sound but needs parameter adjustment
  • Abandon: Fails stress tests or relies on fragile assumptions

Key Testing Principles

Punish the Strategy

Add friction everywhere:
  • Commissions higher than reality
  • Slippage 1.5-2x typical
  • Worst-case fills
  • Order rejections
  • Partial fills
Rationale: Strategies that survive pessimistic assumptions often outperform in live trading.

Seek Plateaus, Not Peaks

Look for parameter ranges where performance is stable, not optimal values that create performance spikes.
Good: Strategy profitable with stop loss anywhere from 1.5% to 3.0%
Bad: Strategy only works with stop loss at exactly 2.13%
Stable performance indicates genuine edge; narrow optima suggest curve-fitting.
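One simple way to tell a plateau from a spike is to measure the longest run of adjacent tested parameter values that stay profitable. This is a sketch under that assumption; the function name and the profit figures are invented for the example, and real plateau analysis may also compare the size of the profits, not just their sign.

```python
def longest_profitable_run(results):
    """Longest run of consecutive profitable parameter values.

    results: list of (param_value, net_profit), sorted by param_value.
    Returns (first_value, last_value, count), or None if nothing is profitable.
    """
    best = None
    run_start = None
    count = 0
    for value, profit in results:
        if profit > 0:
            if count == 0:
                run_start = value
            count += 1
            if best is None or count > best[2]:
                best = (run_start, value, count)
        else:
            count = 0
    return best


# Fabricated sweep results: stop-loss % versus net profit.
plateau = [(1.0, -200), (1.5, 900), (2.0, 1100), (2.5, 1000), (3.0, 850), (3.5, -100)]
spike = [(2.0, -300), (2.1, -150), (2.13, 2400), (2.2, -250)]

print(longest_profitable_run(plateau))
print(longest_profitable_run(spike))
```

The first sweep stays profitable across four adjacent stop-loss settings; the second is profitable at a single value only, which is the curve-fitting signature described above.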

Test All Cases, Not Cherry-Picked Examples

Wrong approach: Study hand-picked "market leaders" that worked
Right approach: Test every stock that met criteria, including those that failed
Selective examples create survivorship bias and overestimate strategy quality.

Separate Idea Generation from Validation

Intuition: Useful for generating hypotheses
Validation: Must be purely data-driven
Never let attachment to an idea influence interpretation of test results.

Common Failure Patterns

Recognize these patterns early to save time:
  1. Parameter sensitivity: Only works with exact parameter values
  2. Regime-specific: Great in some years, terrible in others
  3. Slippage sensitivity: Unprofitable when realistic costs added
  4. Small sample: Too few trades for statistical confidence
  5. Look-ahead bias: "Too good to be true" results
  6. Over-optimization: Many parameters, poor out-of-sample results
See references/failed_tests.md for detailed examples and diagnostic framework.

Available Reference Documentation

Methodology Reference

File: references/methodology.md
When to read: For detailed guidance on specific testing techniques.
Contents:
  • Stress testing methods
  • Parameter sensitivity analysis
  • Slippage and friction modeling
  • Sample size requirements
  • Market regime classification
  • Common biases and pitfalls (survivorship, look-ahead, curve-fitting, etc.)

Failed Tests Reference

File: references/failed_tests.md
When to read: When a strategy fails tests, or when learning from past mistakes.
Contents:
  • Why failures are valuable
  • Common failure patterns with examples
  • Case study documentation framework
  • Red flags checklist for evaluating backtests

Critical Reminders

Time allocation: Spend 20% generating ideas, 80% trying to break them.
Context-free requirement: If a strategy requires "perfect context" to work, it's not robust enough for systematic trading.
Red flag: If backtest results look too good (>90% win rate, minimal drawdowns, perfect timing), audit carefully for look-ahead bias or data issues.
Tool limitations: Understand your backtesting platform's quirks (interpolation methods, handling of low liquidity, data alignment issues).
Statistical significance: Small edges require large sample sizes to prove. A 5% edge per trade needs 100+ trades to distinguish from luck.
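How many trades are enough depends on how noisy individual trades are. The rough estimate below assumes independent trades, an approximately normal sample mean, and reads "edge" as the mean return per trade; the function name and the standard-deviation figures are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def trades_needed(edge: float, trade_std: float, confidence: float = 0.95) -> int:
    """Approximate trade count for the mean return to differ from zero.

    Uses n >= (z * sigma / mu)^2 with a two-sided z value. Assumes independent
    trades and a roughly normal sampling distribution of the mean.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * trade_std / edge) ** 2)


# Edge and standard deviation are both in percent per trade (illustrative values).
print(trades_needed(edge=5.0, trade_std=25.0))
print(trades_needed(edge=5.0, trade_std=50.0))
```

Under these assumptions, a 5% edge with a 25% per-trade standard deviation needs roughly 100 trades, and doubling the noise roughly quadruples the requirement.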

Discretionary vs Systematic Differences

This skill focuses on systematic/quantitative backtesting where:
  • All rules are codified in advance
  • No discretion or "feel" in execution
  • Testing happens on all historical examples, not cherry-picked cases
  • Context (news, macro) is deliberately stripped out
Discretionary traders study differently, so this skill may not apply to setups requiring subjective judgment.