Advanced • Week 12 • Lesson 35 • Duration: 45 min

BTV Backtesting Validation Checklist

The 20 things to verify before trusting any backtest

Learning Objectives

  • Learn a systematic checklist for validating backtest results
  • Understand common backtesting errors and biases
  • Build habits that prevent false confidence in strategies

Explain Like I'm 5

Most backtests are wrong. Not because the code is buggy (though it might be), but because of subtle biases that inflate results. This lesson is a checklist of everything that can go wrong — and how to verify each one.

Think of It This Way

Like a pilot's preflight checklist. You don't skip items because "it was fine last time." Every flight, every backtest — you go through the entire list. This discipline prevents crashes.

1. The Validation Checklist

Data quality:
- □ No look-ahead bias in features (see the sketch after this list)
- □ Survivorship bias addressed (delisted assets included)
- □ Data gaps identified and handled
- □ Corporate actions/splits adjusted
- □ Bid-ask spread included (not just mid prices)

Execution realism:
- □ Slippage modeled (typically 0.5-2 pips for forex)
- □ Commission/swap costs included
- □ Fill assumptions realistic
- □ Position size based on available margin
- □ Gap risk modeled (weekend/holiday gaps)

Statistical validity:
- □ Walk-forward (not in-sample) results used
- □ Enough trades for significance (300+)
- □ PBO computed and acceptable
- □ Multiple time periods tested
- □ Performance consistent across regimes
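Many of these checks can be enforced mechanically. As a minimal sketch of the first item, assuming bar data in a pandas DataFrame (the column and feature names here are illustrative): any feature used to decide a trade at bar t must be computed only from bars up to t-1.

python
import pandas as pd

def make_lagged_features(bars: pd.DataFrame) -> pd.DataFrame:
    """Build features safe to use at each bar's open.

    Assumes `bars` has 'close' and 'volume' columns indexed by time.
    """
    feats = pd.DataFrame(index=bars.index)
    # WRONG (look-ahead): today's close is not known at today's open.
    # feats['momentum'] = bars['close'].pct_change(5)
    # RIGHT: shift by one bar so only completed bars feed the signal.
    feats['momentum'] = bars['close'].pct_change(5).shift(1)
    feats['vol_ratio'] = (bars['volume'] / bars['volume'].rolling(20).mean()).shift(1)
    return feats

The single .shift(1) is the entire difference between a realistic signal and a look-ahead-contaminated one.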

2. The Five Most Dangerous Biases

1. Look-ahead bias. Using information that wouldn't be available at trade time. This is the most common and most dangerous. Example: using today's close to make a decision at today's open.
2. Survivorship bias. Only testing on assets that currently exist. This ignores delisted or failed assets, inflating results.
3. Selection bias. Reporting only the best out of many tested strategies. PBO addresses this directly (the simulation below shows the effect).
4. Time-period bias. Cherry-picking the backtest period. Solution: test on the longest available history.
5. Transaction cost underestimation. Ignoring or underestimating spreads, slippage, and commissions. This can turn a profitable strategy into a losing one.

A well-built production engine addresses all five: strict lag enforcement in feature engineering (no look-ahead), testing on all available history (7+ years), low PBO (verified through CSCV), realistic spread and slippage modeling, and walk-forward results exclusively.
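Selection bias is easy to demonstrate with a minimal simulation, assuming strategies that are pure noise (all values here are illustrative): pick the best of 100 random strategies in-sample, and its out-of-sample edge evaporates.

python
import numpy as np

rng = np.random.default_rng(42)

# 100 strategies that are pure noise: daily returns ~ N(0, 1%).
n_strategies, n_days = 100, 500
returns = rng.normal(0.0, 0.01, size=(n_strategies, n_days))

# Split the history in half: "in-sample" and "out-of-sample".
ins, oos = returns[:, :250], returns[:, 250:]

# Select the strategy with the best in-sample mean return.
best = ins.mean(axis=1).argmax()

print(f"Best in-sample daily mean:   {ins[best].mean():+.4%}")
print(f"Same strategy out-of-sample: {oos[best].mean():+.4%}")
# The in-sample winner looks great by construction; out-of-sample
# it reverts toward zero, because the edge was never real.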

3The "Smells" of a Bad Backtest

After you've seen enough backtests, you develop a nose for problems. Here are the red flags (the sketch after this list computes two of them):

- Too-smooth equity curve. Real equity curves are jagged. If it looks like a straight line going up, something is wrong, probably look-ahead bias.
- No losing streaks. Real strategies regularly have 5-10 trade losing streaks. If the max losing streak is 3, be suspicious.
- Consistent profit across ALL conditions. Real strategies have weak periods. If the backtest shows profit in every single month, it's probably overfit.
- Suspiciously high win rate. Anything above 70% WR for a directional strategy should raise eyebrows. Most real strategies land between 50% and 65%.
- Performance too good for the timeframe. Making 100% annually on 15-minute bars with low drawdown would make you the best trader alive. Probably a bug.

When you see these smells, go back to the checklist. Something is wrong.
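Two of these smells are cheap to compute directly from the trade log. A minimal sketch, assuming a plain sequence of per-trade P&Ls (the thresholds mirror the red flags above):

python
import numpy as np

def smell_check(trade_pnls):
    """Flag statistical 'smells' in a sequence of per-trade P&Ls."""
    pnls = np.asarray(trade_pnls, dtype=float)
    wins = pnls > 0
    win_rate = wins.mean()

    # Longest run of consecutive losing trades.
    max_losing_streak, streak = 0, 0
    for won in wins:
        streak = 0 if won else streak + 1
        max_losing_streak = max(max_losing_streak, streak)

    if win_rate > 0.70:
        print(f"SMELL: win rate {win_rate:.0%} is suspiciously high")
    if max_losing_streak < 5:
        print(f"SMELL: max losing streak of {max_losing_streak} is suspiciously short")
    return win_rate, max_losing_streak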

4. The Realistic Backtest Setup

Here's what a properly configured backtest actually looks like:

Data: Tick-level or 1-minute bars, not daily OHLCV. You need granularity to model intrabar execution.

Execution model:
- Slippage: 0.5-2 pips per trade (depends on pair and time of day)
- Spread: use actual historical spreads, not fixed
- Fill delay: 50-200ms latency simulation
- Partial fills: not always getting your full size

Costs:
- Commission per lot
- Swap rates for overnight holds
- Financing costs

Position sizing: Based on actual available margin, not unlimited capital.

Data splits: Walk-forward with temporal ordering. Never shuffle time series data.

Once you set all this up properly, your backtest returns typically drop by 20-40% compared to the naive version. That's reality. The remaining edge is your real edge. (The cost model below shows how quickly these line items add up.)
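To see why returns drop that much, price out a single round trip. A minimal cost model, assuming a forex position with costs quoted in pips (all parameter values are illustrative):

python
def round_trip_cost_pips(spread=1.0, slippage=0.8, commission_pips=0.4,
                         swap_pips_per_night=0.2, nights_held=1):
    """Total cost of one round trip, in pips (illustrative defaults)."""
    # Spread is paid once per round trip; slippage hits entry and exit.
    return spread + 2 * slippage + commission_pips + swap_pips_per_night * nights_held

cost = round_trip_cost_pips()
print(f"Cost per round trip: {cost:.1f} pips")
# With these defaults, a strategy earning 8 pips gross per trade
# keeps only 4.8 pips net: a 40% haircut, consistent with the
# 20-40% drop described above.
print(f"On an 8-pip gross edge, net edge: {8 - cost:.1f} pips "
      f"({(8 - cost) / 8:.0%} of gross)")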

Key Formulas

Slippage-Adjusted Return

R_net = R_gross - s × N

where R_gross is the gross return, s is the slippage cost per trade, and N is the number of trades. High-frequency strategies are especially sensitive to slippage: because N is large, even small per-trade costs compound rapidly.
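A quick numeric sketch of the formula (all values here are illustrative):

python
gross_return_pips = 900.0   # R_gross: total gross P&L, in pips
slippage_per_trade = 0.8    # s: average slippage cost per trade, in pips
n_trades = 400              # N

net = gross_return_pips - slippage_per_trade * n_trades
print(f"Net after slippage: {net:.0f} pips")  # 900 - 320 = 580 pips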

Hands-On Code

Backtest Validation Checker

python
def validate_backtest(results, config):
    """Systematic backtest validation checklist.

    `results` and `config` are plain dicts. Each check is a tuple of
    (name, passed, detail) so the report below can print them uniformly.
    """
    checks = []

    # Statistical validity: enough trades, out-of-sample results, low PBO.
    n_trades = results.get('total_trades', 0)
    checks.append(('Sufficient trades (>300)', n_trades > 300, n_trades))

    checks.append(('Walk-forward used', results.get('walk_forward', False), ''))

    pbo = results.get('pbo', 1.0)  # default to the worst case if not computed
    checks.append(('PBO < 0.25', pbo < 0.25, f'{pbo:.3f}'))

    # Execution realism: costs must actually be modeled, not zeroed out.
    checks.append(('Slippage modeled', config.get('slippage', 0) > 0, ''))
    checks.append(('Commission included', config.get('commission', 0) > 0, ''))

    # Time-period coverage.
    years = results.get('years', 0)
    checks.append(('Backtest > 3 years', years > 3, f'{years:.1f} years'))

    print("=== BACKTEST VALIDATION ===")
    all_pass = True
    for name, passed, detail in checks:
        status = '[PASS]' if passed else '[FAIL]'
        print(f"  {status} {name}: {detail}")
        if not passed:
            all_pass = False

    verdict = "VALIDATED" if all_pass else "FAILED"
    print(f"\nVerdict: {verdict}")
    return all_pass
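A quick usage sketch, with hypothetical results and config dicts (the keys match what the checker reads; the values are made up):

python
results = {
    'total_trades': 512,
    'walk_forward': True,
    'pbo': 0.18,
    'years': 7.2,
}
config = {'slippage': 1.0, 'commission': 7.0}

validate_backtest(results, config)  # prints the report, returns True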

Systematic validation prevents deploying strategies based on flawed backtests. Every item on this checklist has caused someone to lose money when ignored.

Knowledge Check

Q1. Your backtest shows incredible results but slippage wasn't modeled. What happens when you add realistic slippage?

Assignment

Go through the full validation checklist for your strategy. Document each check's result. If any fail, fix them and re-run the backtest.