Supervised Learning for Trading
Classification and regression — the practical differences that matter
Learning Objectives
- Understand classification vs regression framing for trading problems
- Learn why labels matter more than models
- See common labeling schemes and their tradeoffs
- Know when to use each approach and what pitfalls to avoid
Explain Like I'm 5
Supervised learning means showing the model thousands of examples with known answers: "this pattern led to a winning trade, this one led to a loss." After enough examples, the model learns to recognize which patterns are likely to work going forward. The quality of those examples — the labels — matters more than which algorithm you use.
Think of It This Way
It's like training a new analyst. You show them 5,000 past trades: "this one worked because of X, this one failed because of Y." After enough examples, they develop intuition for what makes a good setup. The quality of your teaching materials (labels) determines how good they get. Garbage examples = garbage analyst.
1. Classification vs Regression: Which One?
Figure: Classification vs Regression Performance by Task
2. The Label Problem
3. Triple Barrier Labeling: Visualized
Figure: Label Distribution with Triple Barrier Method
4. The Lookback/Lookahead Problem
5. Sample Weighting: Not All Labels Are Equal
Figure: Model Accuracy: Uniform vs Weighted Samples
Key Formulas
Triple Barrier Label
Three possible labels: +1 if price hits upper barrier Δ+ first, -1 if lower barrier Δ- first, 0 if neither is hit within the time limit t_max. This handles the "trade went nowhere" case that binary labels miss.
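Written out, the rule described above can be expressed as follows. The notation is assumed here for illustration: p_0 is the entry price, p_t is the price t bars after entry, and τ+ and τ- are the first bars at which the upper and lower barriers are touched (taken as infinite if a barrier is never touched within t_max).

```latex
\[
\tau^{+} = \min\{\, t \in \{1,\dots,t_{\max}\} : p_{t} \ge p_{0} + \Delta^{+} \,\},
\qquad
\tau^{-} = \min\{\, t \in \{1,\dots,t_{\max}\} : p_{t} \le p_{0} - \Delta^{-} \,\}
\]
\[
y =
\begin{cases}
+1 & \text{if } \tau^{+} < \tau^{-} \\
-1 & \text{if } \tau^{-} < \tau^{+} \\
\phantom{+}0 & \text{if neither barrier is hit by } t_{\max}
\end{cases}
\]
```

In the Hands-On Code for this lesson, Δ+ and Δ- correspond to upper_mult and lower_mult times the barrier width at entry.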
Sample Uniqueness Weight
Weight each sample inversely by how much its label-defining period overlaps with other samples. Highly overlapping samples are down-weighted since they carry redundant information.
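One common way to turn this idea into numbers, assumed here since the lesson does not spell out the exact formula, is average uniqueness: count how many label windows cover each bar, then average 1/count over each sample's own window. The function name and the starts/ends arguments below are hypothetical, chosen only for this sketch.

```python
import numpy as np


def average_uniqueness_weights(starts, ends, n_bars):
    """Weight each sample by the average uniqueness of its label window.

    starts[i] and ends[i] are the first and last bar index (inclusive)
    of the period that defines label i. Illustrative assumption only.
    """
    starts = np.asarray(starts)
    ends = np.asarray(ends)

    # concurrency[t] = number of label windows that cover bar t
    concurrency = np.zeros(n_bars, dtype=int)
    for s, e in zip(starts, ends):
        concurrency[s:e + 1] += 1

    # Average of 1 / concurrency over each sample's own window.
    # Every bar inside a window has concurrency >= 1, so no division by zero.
    weights = np.array([
        (1.0 / concurrency[s:e + 1]).mean()
        for s, e in zip(starts, ends)
    ])
    return weights


# Two overlapping windows (bars 0-4 and 2-6) and one isolated window (10-12)
starts = [0, 2, 10]
ends = [4, 6, 12]
weights = average_uniqueness_weights(starts, ends, n_bars=13)
print(weights)
```

The two overlapping samples share three of their five bars, so each gets a weight of 0.7, while the isolated sample keeps a weight of 1.0. Weights of this kind can typically be passed to a model's sample-weight argument during fitting.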
Hands-On Code
Triple Barrier Labeling
import numpy as np


def triple_barrier_labels(prices, entry_idx, upper_mult=2.0,
                          lower_mult=1.0, max_bars=50, atr=None):
    """Label trades using the triple barrier method."""
    labels = []
    for idx in entry_idx:
        entry_price = prices[idx]
        # Barrier width: ATR at entry if provided, else 1% of the entry price
        bar_atr = atr[idx] if atr is not None else prices[idx] * 0.01
        upper = entry_price + upper_mult * bar_atr
        lower = entry_price - lower_mult * bar_atr
        label = 0  # default: timeout
        # Walk forward bar by bar until a barrier is hit or time runs out
        for t in range(1, max_bars + 1):
            if idx + t >= len(prices):
                break  # ran out of data; label stays 0
            price = prices[idx + t]
            if price >= upper:
                label = 1  # Hit take profit
                break
            elif price <= lower:
                label = -1  # Hit stop loss
                break
        labels.append(label)
    labels = np.array(labels)
    print(f"Win: {(labels==1).mean():.1%}, "
          f"Loss: {(labels==-1).mean():.1%}, "
          f"Timeout: {(labels==0).mean():.1%}")
    return labels

The triple barrier method assigns clean labels to historical data. Upper barrier = take profit, lower = stop loss, time limit = neither hit. Timeout labels (0) are crucial: forcing everything into win/loss distorts the training signal.
Knowledge Check
Q1. Why do labels matter more than model choice?
Assignment
Implement triple barrier labeling on historical OHLCV data. Compare the label distribution against simple binary labels (up/down after N bars). How many timeout samples does the triple barrier capture that binary labeling would have incorrectly classified?
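As a starting point for the comparison, the simple binary baseline (up/down after N bars) might be sketched as below. The function name, the horizon parameter, and the toy price series are assumptions made for illustration, not part of the lesson's code.

```python
import numpy as np


def binary_horizon_labels(prices, entry_idx, horizon=50):
    """Label each entry +1 if price is higher after `horizon` bars, else -1.

    Hypothetical baseline for comparison with triple barrier labels.
    """
    prices = np.asarray(prices, dtype=float)
    entry_idx = np.asarray(entry_idx)

    # Keep only entries that have a full horizon of future data
    valid = entry_idx[entry_idx + horizon < len(prices)]

    # Return from entry to exactly `horizon` bars later
    future_return = prices[valid + horizon] / prices[valid] - 1.0

    # Any positive return is "up", everything else is "down"
    labels = np.where(future_return > 0, 1, -1)
    return valid, labels, future_return


# Toy series: the first entry moves only 0.05% but is still labeled +1
prices = np.array([100.0, 101.0, 100.5, 100.05, 102.0, 99.0, 100.2])
valid, labels, rets = binary_horizon_labels(prices, entry_idx=[0, 1, 2, 5],
                                            horizon=3)
print(valid, labels, rets)
```

The first entry shows the weakness the assignment asks about: a near-zero move is forced into the "up" class, whereas a triple barrier scheme with a time limit would label it 0. Entry 5 is dropped because it lacks three bars of future data.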