Intermediate • Week 3 • Lesson 8 • Duration: 70 min

Feature Engineering for Trading

Turning raw price data into signals your ML models can learn from

Learning Objectives

  • Understand what features are and why they matter more than model choice
  • Learn the standard feature groups used in production systems
  • Avoid the classic traps: look-ahead bias, multicollinearity, target leakage
  • Build a feature pipeline with proper validation checks

Explain Like I'm 5

Feature engineering is detective work. The market gives you raw evidence (prices, volumes). Your job is to extract useful clues. "Was the market volatile today?" "Is momentum strong?" "Is the spread widening?" These clues — features — are what your ML model uses to make predictions.

Think of It This Way

Imagine you're a doctor diagnosing a patient. Raw data = "the patient exists." Features = "heart rate is 95, blood pressure is 140/90, temperature is 101°F, they're sweating." Features are far more useful than raw data. Same with markets — raw prices mean little. Features extracted from prices mean everything.

1. Features Matter More Than Models

Here's something every experienced quant agrees on: feature engineering matters more than model selection. It's not even close. A simple model with great features will beat a complex model with bad features. Every time. People build transformer architectures on raw prices and get 51% accuracy — basically a coin flip. Meanwhile, a well-tuned XGBoost with carefully engineered features can hit high-50s accuracy.

Think of it this way: give a talented detective the right clues and they'll solve the case. Give them garbage evidence and it doesn't matter how smart they are — they'll chase phantoms.

Where to spend your time: 70% on features, 20% on validation, 10% on model tuning. Most beginners do the exact opposite.

2. The Standard Feature Groups

Production systems typically organize features into five groups:

  • Price Action — derived from OHLC data. Percentage changes, range metrics, candlestick ratios, gap analysis, close position within range. Tells you what price is doing right now.
  • Momentum — rate of change. RSI, MACD, stochastics, ADX, directional movement. Tells you how fast price is moving and in which direction.
  • Volatility — market uncertainty. ATR, Bollinger Band width, realized vol, volatility ratios (current vs. historical). Tells you how nervous the market is.
  • Microstructure — market mechanics. Spread analysis, volume profiles, order flow proxies. Tells you about liquidity and market health.
  • Regime — market state. Hurst exponent, autocorrelation lags, ADX regime, volatility regime. Tells you what type of market you're in.

The regime features are where the real alpha tends to live. Most retail systems only use price action and momentum. Adding microstructure and regime features is what separates hobby projects from production-grade systems; a short sketch of two regime features appears after the chart note below.

[Chart: Feature Group Importance (Typical Production System)]
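
Regime features don't appear in the hands-on code at the end of this lesson, so here is a minimal sketch of two of them: a rolling Hurst exponent and the lag-1 autocorrelation of returns. The 100-bar window and 20-lag maximum are illustrative assumptions, not values from the lesson.

```python
import numpy as np
import pandas as pd

def hurst_exponent(series, max_lag=20):
    """Estimate the Hurst exponent from how the std of lagged differences scales with lag.
    H > 0.5 suggests trending, H < 0.5 mean reversion, H ~ 0.5 a random walk."""
    lags = range(2, max_lag)
    tau = [series.diff(lag).dropna().std() for lag in lags]
    # std grows roughly like lag**H, so the slope of the log-log fit estimates H
    return np.polyfit(np.log(list(lags)), np.log(tau), 1)[0]

def regime_features(df, window=100):
    """Rolling regime features over a trailing window (window length is illustrative)."""
    f = pd.DataFrame(index=df.index)
    f['hurst'] = df['close'].rolling(window).apply(hurst_exponent, raw=False)
    f['autocorr_1'] = (df['close'].pct_change()
                         .rolling(window)
                         .apply(lambda x: x.autocorr(lag=1), raw=False))
    return f.shift(1)  # same look-ahead protection as the main pipeline
```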

3. The Three Feature Engineering Traps

Three ways to destroy your strategy with bad features. I've seen every one of these in "profitable" strategies that died on contact with live markets:

  1. Look-ahead bias — using future information. Computing a 20-day moving average that includes today's and tomorrow's prices. Sounds obvious, but it's incredibly easy to do accidentally with pandas. Always use `.shift(1)` before feeding data to your model.
  2. Multicollinearity — features that are basically the same thing. If you include both SMA_10 and SMA_12, they're 99% correlated. The model splits importance between them, making everything less stable. Use correlation matrices and remove one feature from any pair with |corr| > 0.85.
  3. Target leakage — features derived from the target variable. If your target is "will price go up" and your feature includes any future price information, that's game over for your backtest credibility. You'd be surprised how many published strategies have this bug.

Build explicit checks for all three into your pipeline. Paranoia pays off here. A sketch of such checks appears after the chart note below.

[Chart: Feature Correlation Matrix — Finding Redundant Features]
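
Here is a sketch of those checks, assuming your features live in a single DataFrame. The 0.85 correlation threshold comes from the text; the 0.5 target-correlation cutoff in the leakage check is an illustrative assumption.

```python
import numpy as np
import pandas as pd

def drop_collinear(features, threshold=0.85):
    """Drop one feature from every pair whose absolute correlation exceeds the threshold."""
    corr = features.corr().abs()
    # keep only the upper triangle so each pair is examined once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop), to_drop

def flag_possible_leakage(features, target, max_corr=0.5):
    """Crude leakage check: features that correlate suspiciously well with the target
    deserve a manual audit. The 0.5 cutoff is an illustrative assumption."""
    flagged = {}
    for col in features.columns:
        c = features[col].corr(target)
        if pd.notna(c) and abs(c) > max_corr:
            flagged[col] = c
    return flagged
```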

4. Feature Normalization

This is a step beginners skip constantly, then wonder why their model is garbage. Different features have wildly different scales:

  • RSI: 0-100
  • ATR: 0.0010 (forex) to 50+ (indices)
  • Volume: thousands to millions

Without normalization, models weight high-magnitude features more heavily regardless of actual importance. Tree models (XGBoost) are somewhat robust to this, but neural networks will break completely. Common methods:

  • Z-score: (x - mean) / std. Works well for roughly normal features.
  • Min-Max: (x - min) / (max - min). Maps to [0, 1]. Good for bounded features.
  • Robust scaling: (x - median) / IQR. Resistant to outliers — this is what financial data needs.
  • Rank normalization: convert to percentile ranks. The most robust option.

Critical rule: compute normalization statistics from training data only. If you include test data, you've leaked information. Use sklearn's Pipeline, or fit on train and transform on test.
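
A minimal sketch of the critical rule, using scikit-learn's RobustScaler: statistics are computed on the training split only, then reused on the test split.

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler

def normalize_train_test(train_features, test_features):
    """Fit on train only (median/IQR), then transform both splits with those stats."""
    scaler = RobustScaler()
    train_scaled = pd.DataFrame(
        scaler.fit_transform(train_features),   # stats come from the training data only
        index=train_features.index, columns=train_features.columns)
    test_scaled = pd.DataFrame(
        scaler.transform(test_features),        # reuse train stats: no leakage
        index=test_features.index, columns=test_features.columns)
    return train_scaled, test_scaled
```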

5. Feature Selection — Less Is More

You've built 50 features. Now throw away 20 of them. More features doesn't mean a better model. Past a certain point, additional features add noise faster than signal. This is the curse of dimensionality, and it's very real. Methods, from simple to sophisticated:

  • Correlation filter: remove one feature from any pair with |corr| > 0.85.
  • Variance threshold: remove features with near-zero variance (they're effectively constant and useless).
  • Mutual information: measures non-linear dependency with the target. Better than correlation.
  • Recursive feature elimination: train the model, drop the worst feature, repeat. Slow but effective.
  • XGBoost importance: train, then check feature_importances_. The bottom 20% are candidates for removal.

The features that survive aggressive selection are your core set. The marginal ones that add 0.1% accuracy? Probably overfit. They'll hurt you live. Production systems typically end up with 30-40 features after starting with 100+. Quality over quantity.
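
A sketch combining two of these methods with scikit-learn: a variance threshold followed by a mutual-information ranking. The thresholds and the top_k=40 cutoff are illustrative assumptions, and the inputs are assumed to be free of NaNs.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold, mutual_info_classif

def select_features(features, target, var_threshold=1e-6, top_k=40):
    """Drop near-constant features, then keep the top_k by mutual information with the target."""
    # 1. remove features with near-zero variance
    vt = VarianceThreshold(threshold=var_threshold)
    vt.fit(features)
    reduced = features.loc[:, vt.get_support()]

    # 2. rank what's left by mutual information with a discrete (e.g. up/down) target
    mi = mutual_info_classif(reduced, target, random_state=0)
    ranked = pd.Series(mi, index=reduced.columns).sort_values(ascending=False)
    return ranked.head(top_k)
```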

Key Formulas

RSI (Relative Strength Index)

Momentum on a 0-100 scale. RSI > 70 = overbought, RSI < 30 = oversold. RSI(14) is a core feature in basically every quant system. Simple but effective.
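
The standard 14-period definition, which matches the RSI computation in the hands-on code below:

$$\mathrm{RS} = \frac{\text{average gain over 14 periods}}{\text{average loss over 14 periods}}, \qquad \mathrm{RSI} = 100 - \frac{100}{1 + \mathrm{RS}}$$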

Bollinger Band Width

Measures relative volatility. Narrow width = low volatility (squeeze, potential breakout). Wide = high volatility. A core volatility feature in most production systems.
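
With 20-period bands placed 2 standard deviations from the middle band, the standard definition reduces to the expression used in the hands-on code:

$$\text{BB Width} = \frac{\text{Upper} - \text{Lower}}{\text{Middle}} = \frac{(\mathrm{SMA}_{20} + 2\sigma_{20}) - (\mathrm{SMA}_{20} - 2\sigma_{20})}{\mathrm{SMA}_{20}} = \frac{4\sigma_{20}}{\mathrm{SMA}_{20}}$$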

Hands-On Code

Production Feature Engineering Pipeline

```python
import pandas as pd
import numpy as np

def engineer_features(df):
    """Build the core feature set for ML model."""
    f = pd.DataFrame(index=df.index)
    
    # Price action
    f['close_pct'] = df['close'].pct_change()
    f['hl_range'] = (df['high'] - df['low']) / df['close']
    f['body_ratio'] = abs(df['close'] - df['open']) / (df['high'] - df['low'] + 1e-8)
    
    # Momentum
    delta = df['close'].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    f['rsi_14'] = 100 - (100 / (1 + gain / (loss + 1e-8)))
    
    # Volatility
    tr = pd.concat([
        df['high'] - df['low'],
        abs(df['high'] - df['close'].shift(1)),
        abs(df['low'] - df['close'].shift(1))
    ], axis=1).max(axis=1)
    f['atr_14'] = tr.rolling(14).mean()
    
    sma20 = df['close'].rolling(20).mean()
    std20 = df['close'].rolling(20).std()
    f['bb_width'] = (4 * std20) / (sma20 + 1e-8)
    
    # CRITICAL: shift to prevent look-ahead bias
    f = f.shift(1)  # <-- THIS LINE MATTERS
    
    return f.dropna()

# Every feature is computed from PAST data only
# .shift(1) ensures no look-ahead contamination
```

Notice the .shift(1) at the end — that's the most important line. It ensures all features use data available before the prediction point. Forgetting this = look-ahead bias = fake results.
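
A minimal usage sketch; the CSV filename and the simple next-bar direction target are placeholders for illustration, not part of the lesson.

```python
import pandas as pd

# Placeholder data source: any DataFrame with open/high/low/close columns works
ohlc = pd.read_csv('eurusd_daily.csv', index_col='date', parse_dates=True)

features = engineer_features(ohlc)

# Features at index t are built from data through t-1 (thanks to .shift(1)),
# so the return over bar t is a legitimate forward-looking target
target = (ohlc['close'].pct_change() > 0).astype(int).loc[features.index]

print(features.tail())
print(features.corrwith(target).sort_values())  # quick sanity check, not a backtest
```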

Knowledge Check

Q1. Why is feature engineering considered more important than model selection?

Q2. What does .shift(1) prevent in a feature pipeline?

Assignment

Build a feature engineering pipeline for any forex pair. Include at least 10 features from 3 different groups (price action, momentum, volatility). Compute the correlation matrix and remove one feature from any pair with |correlation| > 0.85. How many survive?