Model Selection & Validation
How to choose between models without fooling yourself
Learning Objectives
- Understand walk-forward validation and why k-fold fails for time series
- Learn proper model comparison methodology
- See the dangers of optimizing for the wrong metric
- Build a model selection workflow you can actually trust
Explain Like I'm 5
You've trained three models. They all look good on paper. How do you pick the best one? Not by looking at training accuracy — that's like judging a student by their open-book exam. You need a proper validation setup that simulates real deployment conditions. If the model can't perform on data it's never seen, it's useless.
Think of It This Way
Choosing a model is like hiring an employee. Their resume (training performance) tells you what they CAN do. The interview (validation) tests what they ACTUALLY do under unfamiliar conditions. Reference checks (out-of-sample testing) confirm they're not just good at interviews.
1. Why K-Fold Cross-Validation Doesn't Work for Trading
[Chart: Apparent Performance: K-Fold vs Walk-Forward]
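To make the leakage concrete, here is a minimal sketch (it assumes scikit-learn, which the rest of this lesson's code does not use) that counts how many folds end up training on rows dated after the start of their own test window:

import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)  # stand-in for 100 consecutive trading days

splitters = {
    "shuffled k-fold": KFold(n_splits=5, shuffle=True, random_state=0),
    "walk-forward": TimeSeriesSplit(n_splits=5),
}
for name, splitter in splitters.items():
    leaky_folds = 0
    for train_idx, test_idx in splitter.split(X):
        # A fold "leaks" if any training row comes after the earliest test row
        leaky_folds += int(train_idx.max() > test_idx.min())
    print(f"{name}: {leaky_folds}/5 folds train on the test window's future")

Shuffled k-fold mixes future rows into training on every fold, which is exactly why its apparent performance is inflated; the walk-forward splitter never does.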
2. Choosing the Right Metric
[Chart: Model A vs Model B — Same Accuracy, Very Different Value]
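The gap between the two models can be reproduced with a toy example (the trade returns below are invented purely for illustration): both models call the direction correctly on 7 of 10 trades, but only the one that is right on the large moves makes money.

import numpy as np

# Ten hypothetical trade returns (made up for this illustration)
returns = np.array([0.05, -0.02, 0.01, 0.03, -0.04, 0.02, -0.01, 0.06, -0.03, 0.01])
actual_dir = np.sign(returns)

# Both models predict direction (+1 long, -1 short) and are wrong exactly 3 times
pred_a = actual_dir.copy(); pred_a[[2, 6, 9]] *= -1  # wrong only on the three smallest moves
pred_b = actual_dir.copy(); pred_b[[0, 4, 7]] *= -1  # wrong on the three largest moves

for name, pred in [("Model A", pred_a), ("Model B", pred_b)]:
    pnl = pred * returns  # per-trade P&L from following the signal
    accuracy = (pred == actual_dir).mean()
    pf = pnl[pnl > 0].sum() / abs(pnl[pnl < 0].sum())
    print(f"{name}: accuracy={accuracy:.0%}, profit factor={pf:.2f}, total P&L={pnl.sum():+.2f}")

Accuracy cannot separate the two; profit factor can.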
3. The Model Selection Workflow
4. Occam's Razor — When Models Tie
[Chart: Model Complexity vs Robustness]
Key Formulas
Profit Factor
Gross profits divided by gross losses. PF > 1 = profitable. PF > 1.5 = solid. PF > 2.0 = excellent. This directly measures whether the model makes money, unlike accuracy.
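Written out, with gross profit and gross loss taken as sums of per-trade P&L:

\[
\mathrm{PF} = \frac{\sum_{\text{winning trades}} \text{P\&L}}{\left|\,\sum_{\text{losing trades}} \text{P\&L}\,\right|}
\]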
Sharpe Ratio (Annualized)
Risk-adjusted return. Mean excess return divided by standard deviation, annualized. Higher = better risk-adjusted performance. S > 2 is considered very good for trading strategies.
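In symbols, for daily returns (the factor \(\sqrt{252}\) assumes 252 trading days per year; adjust it to your data frequency):

\[
S = \frac{\bar{r} - r_f}{\sigma_r}\sqrt{252}
\]

where \(\bar{r}\) is the mean daily return, \(r_f\) the daily risk-free rate, and \(\sigma_r\) the standard deviation of daily returns.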
Hands-On Code
Walk-Forward Model Comparison
import numpy as np

def walk_forward_compare(models, features, labels, window_size=252, step_size=63):
    """Compare models using walk-forward validation.

    models: dict mapping a name to a zero-argument factory that returns an unfit model
    features, labels: arrays ordered by time (labels: 1 = up move, 0 = down move)
    window_size: minimum number of rows of history before the first test window
    step_size: length of each forward test window
    """
    n = len(features)
    results = {name: [] for name in models}

    for start in range(window_size, n - step_size, step_size):
        train_end = start
        test_end = start + step_size

        # Train only on data that comes strictly before the test window
        X_train = features[:train_end]
        y_train = labels[:train_end]
        X_test = features[train_end:test_end]
        y_test = labels[train_end:test_end]

        for name, model_fn in models.items():
            model = model_fn()  # fresh model each window: nothing leaks between windows
            model.fit(X_train, y_train)
            preds = model.predict(X_test)

            # Crude profit-factor proxy for 0/1 directional predictions:
            # "wins" counts long signals fired on up days,
            # "losses" counts long signals fired on down days.
            wins = preds[y_test == 1].sum()
            losses = abs(preds[y_test == 0].sum()) + 1e-10  # avoid division by zero
            pf = wins / losses
            results[name].append(pf)

    # Compare across windows: average level AND consistency
    for name, pfs in results.items():
        print(f"{name}: PF = {np.mean(pfs):.2f} ± {np.std(pfs):.2f}"
              f" (wins {sum(1 for p in pfs if p > 1)}/{len(pfs)} windows)")

    return results  # per-window profit factors, handy for significance tests

Each model is tested on multiple non-overlapping forward windows. Comparing average profit factor AND consistency (how many windows are profitable) gives a fair assessment.
Knowledge Check
Q1. Why does k-fold cross-validation overestimate trading model performance?
Assignment
Take two models (e.g., XGBoost and Random Forest) and compare them using walk-forward validation over at least 4 non-overlapping test windows. Report profit factor for each window and overall. Is the difference statistically significant? Would you trust the winner?
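For the significance question, one reasonable check with only a handful of windows is a paired Wilcoxon signed-rank test on the per-window profit factors (assumes SciPy; pfs_a and pfs_b are the two models' per-window profit factors, paired by window, e.g. two entries of the results dict returned above):

from scipy.stats import wilcoxon

stat, p_value = wilcoxon(pfs_a, pfs_b)
print(f"Wilcoxon signed-rank p-value: {p_value:.3f}")
# With ~4 windows the test has very little power: a large p-value means
# "no evidence of a difference", not proof that the models are equivalent.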