Feature Engineering for Trading
Turning raw price data into signals your ML models can learn from
Learning Objectives
- Understand what features are and why they matter more than model choice
- Learn the standard feature groups used in production systems
- Avoid the classic traps: look-ahead bias, multicollinearity, and target leakage
- Build a feature pipeline with proper validation checks
Explain Like I'm 5
Feature engineering is detective work. The market gives you raw evidence (prices, volumes). Your job is to extract useful clues. "Was the market volatile today?" "Is momentum strong?" "Is the spread widening?" These clues — features — are what your ML model uses to make predictions.
Think of It This Way
Imagine you're a doctor diagnosing a patient. Raw data = "the patient exists." Features = "heart rate is 95, blood pressure is 140/90, temperature is 101°F, they're sweating." Features are far more useful than raw data. Same with markets — raw prices mean little. Features extracted from prices mean everything.
1. Features Matter More Than Models
2. The Standard Feature Groups
[Figure: Feature Group Importance (Typical Production System)]
3. The Three Feature Engineering Traps
[Figure: Feature Correlation Matrix — Finding Redundant Features]
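The third trap, target leakage, has a cheap smoke test: if a feature is almost perfectly correlated with the label you are predicting, it is probably built from the answer. A minimal sketch (the helper name flag_leaky_features and the 0.95 threshold are illustrative choices, not from the lesson):

    import pandas as pd

    def flag_leaky_features(features: pd.DataFrame, target: pd.Series,
                            threshold: float = 0.95) -> pd.Series:
        """Return features suspiciously correlated with the prediction target."""
        corr = features.corrwith(target).abs()
        return corr[corr > threshold].sort_values(ascending=False)

Anything this flags deserves a manual audit before it goes anywhere near a model.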
4. Feature Normalization
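A common way to normalize trading features without peeking ahead is a rolling z-score: scale each feature by the statistics of its own trailing window, never the full sample. A minimal sketch (the 252-bar window and the helper name rolling_zscore are illustrative assumptions):

    import pandas as pd

    def rolling_zscore(feature: pd.Series, window: int = 252) -> pd.Series:
        """Normalize a feature against its own trailing window.

        Rolling (not full-sample) statistics keep the normalization
        itself free of look-ahead bias.
        """
        mean = feature.rolling(window).mean()
        std = feature.rolling(window).std()
        return (feature - mean) / (std + 1e-8)

Full-sample z-scoring is the subtle failure mode here: the mean and standard deviation then include future data, which is look-ahead bias smuggled in through preprocessing.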
5. Feature Selection — Less Is More
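One simple, widely used selection step is correlation pruning: compute the absolute correlation matrix and drop one feature from every pair above a threshold. A sketch using the 0.85 cutoff from the assignment at the end of this lesson (the helper name drop_correlated is illustrative):

    import numpy as np
    import pandas as pd

    def drop_correlated(features: pd.DataFrame, threshold: float = 0.85) -> pd.DataFrame:
        """Greedily drop one feature from each pair with |correlation| > threshold."""
        corr = features.corr().abs()
        # Keep only the upper triangle so each pair is inspected once
        upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
        to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
        return features.drop(columns=to_drop)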
Key Formulas
RSI (Relative Strength Index)
Momentum on a 0-100 scale. RSI > 70 = overbought, RSI < 30 = oversold. RSI(14) is a core feature in basically every quant system. Simple but effective.
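Written out, using the simple-moving-average variant implemented in the pipeline below:

$$\mathrm{RS}_{14} = \frac{\text{mean gain over 14 bars}}{\text{mean loss over 14 bars}}, \qquad \mathrm{RSI}_{14} = 100 - \frac{100}{1 + \mathrm{RS}_{14}}$$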
Bollinger Band Width
Measures relative volatility. Narrow width = low volatility (squeeze, potential breakout). Wide = high volatility. A core volatility feature in most production systems.
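With the usual 2-standard-deviation bands around a 20-bar SMA, the width reduces to a simple ratio of volatility to price level, which is exactly what the code below computes:

$$\text{Width} = \frac{\text{Upper} - \text{Lower}}{\text{Middle}} = \frac{(\mathrm{SMA}_{20} + 2\sigma_{20}) - (\mathrm{SMA}_{20} - 2\sigma_{20})}{\mathrm{SMA}_{20}} = \frac{4\,\sigma_{20}}{\mathrm{SMA}_{20}}$$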
Hands-On Code
Production Feature Engineering Pipeline
import pandas as pd
import numpy as np

def engineer_features(df):
    """Build the core feature set for the ML model."""
    f = pd.DataFrame(index=df.index)

    # Price action
    f['close_pct'] = df['close'].pct_change()
    f['hl_range'] = (df['high'] - df['low']) / df['close']
    f['body_ratio'] = abs(df['close'] - df['open']) / (df['high'] - df['low'] + 1e-8)

    # Momentum: RSI(14), simple-moving-average variant
    delta = df['close'].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    f['rsi_14'] = 100 - (100 / (1 + gain / (loss + 1e-8)))

    # Volatility: ATR(14) from the true range, plus Bollinger Band width
    tr = pd.concat([
        df['high'] - df['low'],
        abs(df['high'] - df['close'].shift(1)),
        abs(df['low'] - df['close'].shift(1))
    ], axis=1).max(axis=1)
    f['atr_14'] = tr.rolling(14).mean()
    sma20 = df['close'].rolling(20).mean()
    std20 = df['close'].rolling(20).std()
    f['bb_width'] = (4 * std20) / (sma20 + 1e-8)

    # CRITICAL: shift to prevent look-ahead bias
    f = f.shift(1)  # <-- THIS LINE MATTERS
    return f.dropna()

# Every feature is computed from PAST data only.
# .shift(1) ensures no look-ahead contamination.

Notice the .shift(1) at the end — that's the most important line in the pipeline. It ensures all features use only data available before the prediction point. Forgetting it means look-ahead bias, which means fake backtest results.
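You can turn that guarantee into an automated check rather than a convention. A minimal sketch (check_no_lookahead and the bump-the-last-bar approach are illustrative, not part of the original pipeline): shock the final raw bar and assert that no surviving feature row changes.

    import pandas as pd

    def check_no_lookahead(build, df: pd.DataFrame):
        """Shock the last bar; if features are look-ahead free, prior rows are identical."""
        base = build(df)
        bumped = df.copy()
        bumped.iloc[-1, bumped.columns.get_loc('close')] *= 1.10  # perturb "future" data
        alt = build(bumped)
        common = base.index.intersection(alt.index)
        pd.testing.assert_frame_equal(base.loc[common], alt.loc[common])
        print(f"OK: {len(common)} feature rows are unaffected by the final bar")

    # Usage: check_no_lookahead(engineer_features, ohlc_df)

With the .shift(1) in place the assertion passes; remove it and the check fails immediately, because the last feature row now depends on the bar you just perturbed.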
Knowledge Check
Q1. Why is feature engineering considered more important than model selection?
Q2. What does .shift(1) prevent in a feature pipeline?
Assignment
Build a feature engineering pipeline for any forex pair. Include at least 10 features drawn from 3 different groups (price action, momentum, volatility). Compute the correlation matrix and remove any feature whose absolute correlation with another exceeds 0.85. How many survive?