Probability of Backtest Overfitting (PBO)
Quantifying the chance that your backtest is lying to you
Learning Objectives
- Understand PBO and why it's critical for strategy validation
- Learn the Combinatorially Symmetric Cross-Validation (CSCV) method behind it
- Interpret PBO scores for deployment decisions
Explain Like I'm 5
PBO asks a pointed question: "If you tried many strategies and picked the best-performing one, what's the probability that the winner is overfit to the backtest?" It turns out that if you test 100 strategies and pick the winner, there's a very high chance the winner just got lucky. PBO quantifies exactly how high.
Think of It This Way
Imagine flipping 100 coins 20 times each. Some coin will get 15+ heads by pure luck. If you declare that coin "the winner" and bet on it, you'll be disappointed — it was lucky, not magic. PBO is the framework that tells you: given how many things you tried, how likely is it that your winner is genuinely good versus just lucky?
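The coin-flip thought experiment is easy to simulate (a quick sketch; the seed and counts here are arbitrary choices, not from the lesson):

```python
import numpy as np

# Flip 100 fair coins 20 times each and look only at the luckiest one.
rng = np.random.default_rng(42)
heads = rng.binomial(n=20, p=0.5, size=100)  # heads count per coin

best = heads.max()  # the "winning" coin's heads count
print(f"Best coin: {best}/20 heads")
print(f"Coins with >= 14 heads: {(heads >= 14).sum()}")
```

Even though every coin is fair, the best of 100 almost always looks impressive — exactly the selection effect PBO is designed to measure.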
1. The Multiple Testing Problem
2. How PBO Works
3. CSCV — The Combinatorial Method
   [Figure: PBO Distribution Across CSCV Splits]
4. Interpreting PBO Results
Key Formulas
PBO Estimate
PBO is estimated as the fraction of combinatorial splits in which the strategy that performed best in-sample (IS) ranks below the median out-of-sample (OOS). With S partitions split into equal halves, there are C(S, S/2) such splits. Lower PBO means less overfitting; target PBO < 0.15 for production deployment.
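For a sense of scale, the number of symmetric splits grows combinatorially with S. A quick check using only Python's standard library:

```python
from math import comb

# With S = 16 partitions split into equal IS/OOS halves,
# CSCV evaluates every way to choose the 8 in-sample partitions.
S = 16
n_splits = comb(S, S // 2)
print(n_splits)  # 12870
```

So even a modest S = 16 gives 12,870 distinct IS/OOS splits over which the winner's OOS rank is tallied.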
Hands-On Code
Simplified PBO Calculator
import numpy as np
from itertools import combinations

def compute_pbo(performance_matrix, n_partitions=16):
    """
    Simplified PBO computation.
    performance_matrix: shape (n_strategies, n_partitions)
    Each row = one strategy's performance on each partition.
    """
    S = n_partitions
    half = S // 2
    n_overfit = 0
    n_total = 0
    for train_idx in combinations(range(S), half):
        test_idx = [i for i in range(S) if i not in train_idx]
        # IS performance: mean across train partitions
        is_perf = performance_matrix[:, list(train_idx)].mean(axis=1)
        best_is = np.argmax(is_perf)
        # OOS performance of every strategy on the held-out partitions
        oos_perf = performance_matrix[:, test_idx].mean(axis=1)
        # Relative OOS rank of the IS winner (0 = worst, ~1 = best)
        rank = np.searchsorted(np.sort(oos_perf), oos_perf[best_is]) / len(oos_perf)
        if rank < 0.5:  # IS winner landed in the bottom half OOS
            n_overfit += 1
        n_total += 1
    pbo = n_overfit / n_total
    print(f"PBO = {pbo:.3f}")
    # 0.25 is a loose screen; target < 0.15 before production deployment
    print(f" {'[PASS] Low overfit risk' if pbo < 0.25 else '[WARN] Overfit concern'}")
    return pbo

PBO gives you a single number answering "how likely is it that my strategy is overfit?" Target PBO below 0.15 for strong evidence that the strategy captures real market patterns.
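A quick smoke test on synthetic data helps build intuition (a sketch; `compute_pbo` is repeated here without the prints so the snippet runs on its own). Pure-noise strategies should produce a high PBO, while a strategy with a genuine edge should drive it toward zero:

```python
import numpy as np
from itertools import combinations

def compute_pbo(perf, n_partitions=16):
    # Same CSCV logic as above, minus the printing.
    S, half = n_partitions, n_partitions // 2
    n_overfit = n_total = 0
    for train_idx in combinations(range(S), half):
        test_idx = [i for i in range(S) if i not in train_idx]
        best_is = np.argmax(perf[:, list(train_idx)].mean(axis=1))
        oos = perf[:, test_idx].mean(axis=1)
        rank = np.searchsorted(np.sort(oos), oos[best_is]) / len(oos)
        n_overfit += rank < 0.5
        n_total += 1
    return n_overfit / n_total

rng = np.random.default_rng(0)
noise = rng.normal(size=(20, 16))   # 20 pure-noise "strategies"
skilled = noise.copy()
skilled[0] += 2.0                   # one strategy with a real edge

pbo_noise = compute_pbo(noise)      # high: the IS winner is just lucky
pbo_skilled = compute_pbo(skilled)  # near 0: the IS winner keeps winning OOS
print(f"noise PBO = {pbo_noise:.3f}, skilled PBO = {pbo_skilled:.3f}")
```

The contrast is the whole point: selection among noise looks great in-sample but collapses out-of-sample, and PBO counts exactly how often that collapse happens across splits.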
Knowledge Check
Q1. You tested 100 strategy configurations and the best one has a 65% backtest win rate. What's the likely problem?
Assignment
Implement PBO calculation for your strategy. Test 3 different configurations and compute PBO for the "winner." Is it below 0.25?
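A hypothetical starting point for the assignment (`run_backtest` and its config fields are placeholders for your own backtester, which should return a 1-D array of per-period strategy returns): build the (n_strategies × 16) performance matrix that `compute_pbo` expects.

```python
import numpy as np

def run_backtest(config, n_periods=1600):
    # Placeholder: swap in your real backtester here.
    rng = np.random.default_rng(config["seed"])
    return rng.normal(loc=config["edge"], scale=0.01, size=n_periods)

configs = [
    {"seed": 1, "edge": 0.0000},
    {"seed": 2, "edge": 0.0001},
    {"seed": 3, "edge": 0.0002},
]

S = 16
rows = []
for cfg in configs:
    r = run_backtest(cfg)
    # Split each return series into S contiguous partitions;
    # each row holds one configuration's mean return per partition.
    rows.append([p.mean() for p in np.array_split(r, S)])
perf_matrix = np.array(rows)
print(perf_matrix.shape)  # (3, 16)
```

Feed `perf_matrix` to `compute_pbo` from the lesson; note that with only 3 configurations the OOS rank is coarse (steps of 1/3), so treat the resulting PBO as a rough screen.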