Alternative Data for Trading
Beyond price and volume — what the non-traditional data landscape actually looks like
Learning Objectives
- Categorize the major types of alternative data sources
- Evaluate the cost-benefit tradeoff of alt data vs. price-derived features
- Identify freely available alternative data sources
- Build an evaluation framework for new data sources
Explain Like I'm 5
Alternative data is anything beyond traditional price, volume, and financial statements: satellite imagery, credit card transactions, social media sentiment. It is potentially useful but often expensive, and most of it is not worth the cost for smaller operations.
Think of It This Way
Traditional market data is like reading the scoreboard. Alternative data is like having a camera in the locker room — you see things before they show up in the score. But the camera costs a fortune and sometimes points the wrong way.
1. The Alternative Data Landscape
2. The Cost-Benefit Reality
3. The Case for Price-Only Features
4. Free Alternative Data Worth Using
5. Alt Data Evaluation Framework
Key Formulas
Expected Alpha from Alternative Data
$$\alpha_{alt} = IC_{alt} \cdot \sigma_r \cdot z_{alt}$$
Expected alpha contribution from an alternative data signal, where $IC_{alt}$ is the information coefficient, $\sigma_r$ is return volatility, and $z_{alt}$ is the standardized signal score.
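To make the formula concrete, here is a minimal sketch that assumes it takes the standard multiplicative form (expected alpha = IC × volatility × standardized score), which is consistent with the three symbols defined above. The function name `expected_alpha` and the input values are illustrative assumptions, not figures from the lesson.

```python
def expected_alpha(ic: float, volatility: float, z_score: float) -> float:
    """Expected alpha, assuming alpha = IC * volatility * z-score."""
    return ic * volatility * z_score


# Illustrative inputs (assumptions): a weak signal on an asset with 15% volatility
ic_alt = 0.03
sigma_r = 0.15

for z_alt in (-1.0, 0.0, 1.0, 2.0):
    alpha = expected_alpha(ic_alt, sigma_r, z_alt)
    print(f"z = {z_alt:+.1f} -> expected alpha = {alpha * 1e4:+.1f} bps")
```

Under this form, a neutral score (z = 0) implies no expected alpha, and the forecast scales linearly with how extreme the signal reading is. Setting z = 1 gives the "typical" alpha of IC × volatility, which is the quantity the evaluation pipeline in Hands-On Code estimates.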
Hands-On Code
Alt Data Signal Evaluation Pipeline
import numpy as np
from scipy.stats import spearmanr


def evaluate_alt_data_signal(
    signal_values: np.ndarray,
    forward_returns: np.ndarray,
    cost_per_year: float,
    asset_volatility: float = 0.15,
) -> dict:
    """
    Evaluate whether an alternative data signal is worth incorporating.
    """
    # 1. Compute IC (Spearman rank correlation of signal vs. forward returns)
    ic, p_value = spearmanr(signal_values, forward_returns)

    # 2. Estimate alpha contribution (same units as asset_volatility)
    alpha_estimate = abs(ic) * asset_volatility

    # 3. Break-even AUM: capital at which the estimated alpha pays for the data
    breakeven_aum = cost_per_year / alpha_estimate if alpha_estimate > 0 else float('inf')

    # 4. Rolling IC stability (window of at least 2 observations)
    window = max(2, min(60, len(signal_values) // 4))
    rolling_ics = []
    for i in range(window, len(signal_values)):
        chunk_ic, _ = spearmanr(
            signal_values[i - window:i],
            forward_returns[i - window:i],
        )
        rolling_ics.append(chunk_ic)
    rolling_ics = np.array(rolling_ics)

    return {
        'ic': round(ic, 4),
        'p_value': round(p_value, 4),
        # Fraction of rolling windows with a positive IC
        'ic_hit_rate': round(np.mean(rolling_ics > 0), 3),
        # Mean rolling IC divided by its standard deviation
        'ic_stability': round(np.mean(rolling_ics) / (np.std(rolling_ics) + 1e-8), 3),
        'estimated_alpha_bps': round(alpha_estimate * 10000, 1),
        'breakeven_aum_millions': round(breakeven_aum / 1e6, 1),
        # Use |IC| to match the alpha estimate: a reliably negative IC
        # is usable by flipping the sign of the signal
        'recommendation': 'investigate' if abs(ic) > 0.02 and p_value < 0.05 else 'skip',
    }

Evaluates whether an alternative data signal justifies its cost by computing IC, estimating alpha contribution, calculating break-even AUM, and assessing IC stability over time.
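As a quick sanity check of the first three steps (IC, alpha estimate, break-even AUM), the sketch below runs them on synthetic data where the true signal strength is known. Everything here is an assumption made for illustration: the correlation of 0.05, the daily return scale, the $50,000 annual data cost, and the 15% volatility are hypothetical, not figures from the lesson.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(42)
n_obs = 2000
true_corr = 0.05  # hypothetical strength of the signal

# Build a signal and forward returns with a known linear correlation
signal = rng.standard_normal(n_obs)
noise = rng.standard_normal(n_obs)
forward_returns = 0.01 * (true_corr * signal + np.sqrt(1 - true_corr**2) * noise)

# Step 1: the measured IC is a noisy estimate of true_corr
ic, p_value = spearmanr(signal, forward_returns)

# Steps 2 and 3: alpha estimate and break-even AUM for a hypothetical data cost
asset_volatility = 0.15
cost_per_year = 50_000.0
alpha_estimate = abs(ic) * asset_volatility
breakeven_aum = cost_per_year / alpha_estimate if alpha_estimate > 0 else float("inf")

print(f"measured IC: {ic:.4f} (p = {p_value:.4f})")
print(f"estimated alpha: {alpha_estimate * 1e4:.1f} bps")
print(f"break-even AUM: ${breakeven_aum / 1e6:.1f}M")
```

Changing the random seed shows how far the measured IC can drift from the true value even with 2,000 observations, which is why the pipeline also looks at rolling IC stability rather than relying on a single full-sample number.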
Knowledge Check
Q1. What is the primary reason price-derived features often outperform alternative data for liquid markets?
Q2. Before investing in a new alternative data source, the FIRST question to ask is:
Assignment
Using freely available data (VIX, VIX term structure, FRED macro indicators), construct three alternative features for an FX trading strategy. Compute the IC of each against forward 5-day returns for EUR/USD. Compare these ICs to simple price-derived features (20-day momentum, RSI, ATR ratio).
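One possible starting point for the price-derived baseline is sketched below. It runs on a synthetic random-walk price series so that it is self-contained; for the assignment you would substitute real EUR/USD daily bars. Several choices are assumptions rather than definitions from the lesson: RSI uses a simple 14-day moving average of gains and losses (not Wilder's smoothing), "ATR ratio" is taken to mean 5-day ATR divided by 20-day ATR, and all helper names are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def forward_return(close: pd.Series, horizon: int = 5) -> pd.Series:
    """Return from t to t + horizon, aligned to time t."""
    return close.shift(-horizon) / close - 1.0


def momentum(close: pd.Series, lookback: int = 20) -> pd.Series:
    """Trailing return over the lookback window."""
    return close / close.shift(lookback) - 1.0


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI using simple moving averages of gains and losses (an assumption)."""
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(period).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(period).mean()
    return 100 - 100 / (1 + avg_gain / avg_loss)


def atr(high: pd.Series, low: pd.Series, close: pd.Series, period: int) -> pd.Series:
    """Average true range as a simple moving average of the true range."""
    prev_close = close.shift(1)
    true_range = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1
    ).max(axis=1)
    return true_range.rolling(period).mean()


def information_coefficient(feature: pd.Series, fwd: pd.Series) -> float:
    """Spearman rank correlation on rows where both series are available."""
    pair = pd.concat([feature, fwd], axis=1).dropna()
    ic, _ = spearmanr(pair.iloc[:, 0], pair.iloc[:, 1])
    return float(ic)


# Synthetic stand-in for EUR/USD daily bars (replace with real data)
rng = np.random.default_rng(0)
n_days = 1500
close = pd.Series(1.10 * np.exp(np.cumsum(rng.normal(0, 0.005, n_days))))
spread = np.abs(rng.normal(0, 0.003, n_days))
high = close * (1 + spread)
low = close * (1 - spread)

fwd_5d = forward_return(close, horizon=5)
features = {
    "momentum_20d": momentum(close, 20),
    "rsi_14": rsi(close, 14),
    "atr_ratio_5_20": atr(high, low, close, 5) / atr(high, low, close, 20),
}
ics = {name: information_coefficient(f, fwd_5d) for name, f in features.items()}

for name, ic in ics.items():
    print(f"{name}: IC = {ic:+.4f}")
```

On a pure random walk these ICs should hover near zero, which gives a useful reference for how much noise to expect. The same `information_coefficient` helper can be applied to the VIX and FRED features once they are aligned to the same dates. Note that daily observations of 5-day forward returns overlap, so p-values computed as if the samples were independent will look more significant than they are.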