Intermediate • Week 5 • Lesson 14 • Duration: 50 min

Supervised Learning for Trading

Classification and regression — the practical differences that matter

Learning Objectives

  • Understand classification vs regression framing for trading problems
  • Learn why labels matter more than models
  • See common labeling schemes and their tradeoffs
  • Know when to use each approach and what pitfalls to avoid

Explain Like I'm 5

Supervised learning means showing the model thousands of examples with known answers: "this pattern led to a winning trade, this one led to a loss." After enough examples, the model learns to recognize which patterns are likely to work going forward. The quality of those examples — the labels — matters more than which algorithm you use.

Think of It This Way

It's like training a new analyst. You show them 5,000 past trades: "this one worked because of X, this one failed because of Y." After enough examples, they develop intuition for what makes a good setup. The quality of your teaching materials (labels) determines how good they get. Garbage examples = garbage analyst.

1. Classification vs Regression — Which One?

Two ways to frame the prediction problem:

Classification: "Will this trade be a winner or a loser?" Binary output (1 or 0), possibly with a probability attached. Clean, simple, directly actionable. This is what most L1 signal models use.

Regression: "How much will this trade return?" Continuous output. More informative in theory — you know not just if a trade will work but how much. In practice, regression targets are noisier and harder to predict accurately.

For L1 signal detection, classification usually wins. You don't need to know the exact return — you just need to know whether it's worth entering. The probability output gives you a natural confidence score for position sizing. For L3 exit management, regression can make more sense: predicting expected remaining return or time to target makes exit decisions more granular than a binary hold/close.

Hastie, T., Tibshirani, R. & Friedman, J. (2009). "The Elements of Statistical Learning." Springer.
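A minimal sketch of the two framings side by side, assuming scikit-learn is available. The feature matrix X and the forward returns fwd_ret are hypothetical placeholders, not data from the lesson:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))           # hypothetical feature matrix
fwd_ret = rng.normal(0, 0.02, 1000)      # hypothetical forward returns

# Classification framing: predict win/loss, get a probability for sizing
clf = GradientBoostingClassifier().fit(X, (fwd_ret > 0).astype(int))
p_win = clf.predict_proba(X)[:, 1]       # natural confidence score

# Regression framing: predict the return itself (noisier target)
reg = GradientBoostingRegressor().fit(X, fwd_ret)
expected_ret = reg.predict(X)
```

The same features feed both models; only the target changes. That target choice, not the model class, is usually the bigger decision.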

[Chart: Classification vs Regression Performance by Task]

2. The Label Problem

Here's the uncomfortable truth: your labels matter more than your model. Put another way — a simple model with great labels will outperform a complex model with bad labels every time.

Common labeling approaches:

  • Fixed barrier. Did the trade hit +2R before hitting -1R? Binary and simple, but it ignores time and path — a trade that takes 100 bars to reach +2R is labeled the same as one that gets there in 5 bars.
  • Triple barrier. Three possible outcomes: hit profit target, hit stop-loss, or hit time limit. This is more realistic — trades that go nowhere and time out are labeled differently from clear wins and losses. López de Prado advocates this strongly.
  • Forward return. The label is the N-bar forward return. Continuous and flexible but noisy — small differences in N can change labels dramatically (see the sketch after this list).
  • Trend-based. Use a trend detection algorithm to label the start and end of trends. More aligned with what you actually want to capture, but it introduces label complexity.

In practice, the triple barrier method gives the best results for most trading applications. It naturally handles the "did nothing" case that other methods ignore.
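A minimal sketch of forward-return labeling, assuming a plain NumPy price array. The function name and the default horizon are illustrative, not from the lesson:

```python
import numpy as np

def forward_return_labels(prices, n=20):
    """N-bar forward return as a continuous label (illustrative helper).

    Labels are sensitive to n: shifting it slightly can flip the sign of
    many labels, which is the noise problem described above.
    """
    prices = np.asarray(prices, dtype=float)
    labels = np.full(len(prices), np.nan)
    labels[:-n] = prices[n:] / prices[:-n] - 1.0  # last n bars have no label
    return labels
```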

3. Triple Barrier Labeling — Visualized

The triple barrier creates three possible outcomes for each potential trade:

  1. Price hits the upper barrier first → Win (label = 1)
  2. Price hits the lower barrier first → Loss (label = -1)
  3. Time runs out without hitting either → Timeout (label = 0)

The timeout label is crucial. Many potential trades just don't go anywhere. Labeling them as wins or losses when they're really "meh" corrupts your training data. The triple barrier separates genuine opportunities from noise.

López de Prado, M. (2018). "Advances in Financial Machine Learning." Wiley — Chapter 3: Labeling.

[Chart: Label Distribution with Triple Barrier Method]

4. The Lookback/Lookahead Problem

This is the single most common mistake in ML for trading. And it's subtle.

Lookahead bias: your features contain information from the future. This is obviously wrong but sneaks in through:

  • Using the close price to calculate a feature, then predicting that same close price
  • Normalizing features using statistics computed over the full dataset (including future data)
  • Labels that depend on volatility computed from future data

Lookback bias: your model implicitly assumes you had information that wasn't available at the time of the trade. Common in:

  • Using an indicator that requires N bars to warm up without accounting for the warm-up period
  • Features that reference "today's session high" before the session has closed

The fix: purged cross-validation. When you split train/test, add a gap between them to prevent information leakage. If your features use 50 bars of history, the gap should be at least 50 bars (a minimal sketch follows). If you skip this step, your backtest will look amazing and your live performance will be mediocre at best. This is probably the #1 reason ML backtests fail in production.
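A minimal sketch of a purged chronological split, assuming samples are already in time order. The function name and defaults are illustrative:

```python
import numpy as np

def purged_train_test_split(n_samples, test_frac=0.2, gap=50):
    """Chronological split with a purge gap between train and test.

    The gap drops the last `gap` training samples so that features with
    up-to-`gap`-bar lookbacks cannot straddle the train/test boundary.
    """
    test_start = int(n_samples * (1 - test_frac))
    train_idx = np.arange(max(test_start - gap, 0))  # purged training set
    test_idx = np.arange(test_start, n_samples)
    return train_idx, test_idx

# Example: 10,000 samples, features use up to 50 bars of history
train_idx, test_idx = purged_train_test_split(10_000, test_frac=0.2, gap=50)
```

The same idea extends to k-fold schemes: purge a gap on both sides of each test fold rather than only at one boundary.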

5. Sample Weighting — Not All Labels Are Equal

Some training samples are more informative than others. Weighting them appropriately improves model quality.

  • By uniqueness. If a label was determined by a price path that overlaps heavily with other labels, it's not providing independent information. Down-weight overlapping samples.
  • By return magnitude. Clear wins (+3R) and clear losses (-1R) are more informative than scratches (+0.1R). Weight by the absolute return.
  • By regime. Recent data is more relevant than old data. Apply exponential decay weights so the model emphasizes recent patterns.
  • By rarity. If one class is rare (e.g., only 10% of samples are "strong buy"), up-weight rare samples to prevent the model from ignoring them.

This is one of those things that seems fiddly but reliably improves results — a well-weighted training set can improve out-of-sample performance by 2-5% in accuracy. A sketch combining the first three weightings follows.

López de Prado, M. (2018). "Advances in Financial Machine Learning." Wiley — Chapter 4: Sample Weights.
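A minimal sketch combining uniqueness, magnitude, and recency weights, loosely following AFML Chapter 4. All names and the half-life value are illustrative assumptions:

```python
import numpy as np

def sample_weights(entry_idx, exit_idx, n_bars, returns, half_life=500):
    """Uniqueness x |return| x exponential-decay recency weights (sketch)."""
    entry_idx, exit_idx = np.asarray(entry_idx), np.asarray(exit_idx)

    # Concurrency: how many labels span each bar
    concurrency = np.zeros(n_bars)
    for s, e in zip(entry_idx, exit_idx):
        concurrency[s:e + 1] += 1

    # Average uniqueness of each label over its own span
    uniqueness = np.array([np.mean(1.0 / concurrency[s:e + 1])
                           for s, e in zip(entry_idx, exit_idx)])

    magnitude = np.abs(returns)                         # clear outcomes count more
    recency = 0.5 ** ((n_bars - exit_idx) / half_life)  # exponential decay

    w = uniqueness * magnitude * recency
    return w / w.mean()                                 # normalize to mean 1
```

Most gradient-boosting and tree libraries accept these directly via a sample_weight argument to fit.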

[Chart: Model Accuracy: Uniform vs Weighted Samples]

Key Formulas

Triple Barrier Label

Three possible labels: +1 if price hits upper barrier Δ+ first, -1 if lower barrier Δ- first, 0 if neither is hit within the time limit t_max. This handles the "trade went nowhere" case that binary labels miss.
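In symbols (one common way to write it, with entry price $p_0$ and first-passage times $\tau^+$, $\tau^-$ for the two barriers):

$$
y_i = \begin{cases}
+1 & \text{if } \tau^+ < \tau^- \text{ and } \tau^+ \le t_{\max} \\
-1 & \text{if } \tau^- < \tau^+ \text{ and } \tau^- \le t_{\max} \\
0 & \text{otherwise}
\end{cases}
$$

where $\tau^+$ is the first bar at which price reaches $p_0 + \Delta^+$ and $\tau^-$ the first bar at which it reaches $p_0 - \Delta^-$.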

Sample Uniqueness Weight

Weight each sample inversely by how much its label-defining period overlaps with other samples. Highly overlapping samples are down-weighted since they carry redundant information.
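One common formulation, following AFML Chapter 4: with $c_t$ the number of labels whose evaluation window spans bar $t$, the average uniqueness of label $i$ over its window $[t_{i,0}, t_{i,1}]$ is

$$
\bar{u}_i = \frac{1}{t_{i,1} - t_{i,0} + 1} \sum_{t=t_{i,0}}^{t_{i,1}} \frac{1}{c_t},
\qquad w_i \propto \bar{u}_i
$$

A label whose window is shared with many concurrent labels gets a small $\bar{u}_i$ and thus a small weight.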

Hands-On Code

Triple Barrier Labeling

```python
import numpy as np

def triple_barrier_labels(prices, entry_idx, upper_mult=2.0,
                          lower_mult=1.0, max_bars=50, atr=None):
    """Label trades using the triple barrier method."""
    labels = []
    
    for idx in entry_idx:
        entry_price = prices[idx]
        bar_atr = atr[idx] if atr is not None else prices[idx] * 0.01
        
        upper = entry_price + upper_mult * bar_atr
        lower = entry_price - lower_mult * bar_atr
        
        label = 0  # default: timeout
        for t in range(1, max_bars + 1):
            if idx + t >= len(prices):
                break
            price = prices[idx + t]
            
            if price >= upper:
                label = 1   # Hit take profit
                break
            elif price <= lower:
                label = -1  # Hit stop loss
                break
        
        labels.append(label)
    
    labels = np.array(labels)
    print(f"Win: {(labels==1).mean():.1%}, "
          f"Loss: {(labels==-1).mean():.1%}, "
          f"Timeout: {(labels==0).mean():.1%}")
    return labels
```

The triple barrier method assigns clean labels to historical data. Upper barrier = take profit, lower = stop loss, time limit = neither hit. Timeout labels (0) are crucial — forcing everything into win/loss distorts the training signal.
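A quick usage sketch on a synthetic random-walk series. The entry points (every 10 bars) are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
prices = 100 * np.cumprod(1 + rng.normal(0, 0.01, 5000))  # synthetic walk
entry_idx = np.arange(0, 4900, 10)                        # hypothetical entries

labels = triple_barrier_labels(prices, entry_idx,
                               upper_mult=2.0, lower_mult=1.0, max_bars=50)
```

With atr=None the function falls back to 1% of the entry price as the barrier unit; in practice you would pass a real ATR series so barriers adapt to volatility.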

Knowledge Check

Q1. Why do labels matter more than model choice?

Assignment

Implement triple barrier labeling on historical OHLCV data. Compare the label distribution against simple binary labels (up/down after N bars). How many timeout samples does the triple barrier capture that binary labeling would have incorrectly classified?