IV Expert • Week 15 • Lesson 45 • Duration: 40 min

Alternative Data for Trading

Beyond price and volume — what the non-traditional data landscape actually looks like

Learning Objectives

  • Categorize the major types of alternative data sources
  • Evaluate the cost-benefit tradeoff of alt data vs. price-derived features
  • Identify freely available alternative data sources
  • Build an evaluation framework for new data sources

Explain Like I'm 5

Alternative data is anything beyond traditional price, volume, and financial statements: satellite imagery, credit card transactions, social media sentiment. It is potentially useful and often expensive, and most paid datasets are not worth the cost for smaller operations.

Think of It This Way

Traditional market data is like reading the scoreboard. Alternative data is like having a camera in the locker room — you see things before they show up in the score. But the camera costs a fortune and sometimes points the wrong way.

1. The Alternative Data Landscape

Alternative data has exploded into a multi-billion-dollar industry. Here are the main categories:

1. Sentiment & News. Natural language processing applied to news articles, social media, earnings call transcripts, and analyst reports. The signal is real but rapidly decaying: by the time you process a news article, most of the price impact has already occurred.
2. Web & App Data. Web scraping for product pricing, job listings, app download rankings, website traffic estimates. Can be predictive for individual stocks but requires significant infrastructure to collect and maintain.
3. Geospatial & Satellite. Satellite imagery of retail parking lots, oil storage facilities, crop conditions. High cost ($100K-$500K/year), long lag times, and requires specialized ML for image processing.
4. Transaction Data. Credit card panels, point-of-sale data, bank transaction records. Among the most predictive alt data categories, but also the most expensive and privacy-sensitive.
5. Government & Regulatory. SEC filings, patent applications, FDA drug approvals, trade data, environmental permits. Mostly free but requires parsing infrastructure.

2. The Cost-Benefit Reality

Let's be honest about the economics. Most alternative data is not worth its price tag for operations managing less than $50-100 million.
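To see where a threshold like that comes from, here is a minimal break-even sketch. The function name and the inputs (a $250K/year dataset, 25 bps of added annual return) are hypothetical illustrations, not figures from any vendor.

```python
def breakeven_aum(cost_per_year: float, expected_alpha_bps: float) -> float:
    """AUM at which a dataset's expected alpha just covers its annual cost."""
    if expected_alpha_bps <= 0:
        return float("inf")  # no expected alpha: the data never pays for itself
    # Convert basis points to a fraction of capital, then solve cost = alpha * AUM
    return cost_per_year / (expected_alpha_bps / 10_000)


# Hypothetical inputs: $250K/year data cost, 25 bps expected annual alpha
aum = breakeven_aum(250_000, 25)
print(f"Break-even AUM: ${aum / 1e6:.0f}M")  # prints "Break-even AUM: $100M"
```

Below the break-even level, the data costs more than the return it is expected to add, and that is before counting engineering time.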

3. The Case for Price-Only Features

Here's a perspective that won't make you popular at alt data conferences: for many trading strategies, particularly in FX, futures, and liquid markets, price-derived features remain superior to most alternative data.

Why price data is underrated:

1. It's free and universal. No vendor risk, no data lag, no parsing errors.
2. It reflects all information. The efficient market hypothesis isn't perfectly true, but it's true enough that price action already incorporates most fundamental and alternative signals, just with some lag and noise.
3. It's the most liquid feature space. Any signal derived from price data can immediately be implemented at the same frequency and in the same markets where the data originates.
4. The feature engineering space is enormous. Returns at multiple horizons, volatility measures, correlation structures, microstructure features, regime indicators: you can extract hundreds of meaningful features from OHLCV data alone.

This doesn't mean alternative data is useless. It means you should exhaust price-derived features first, and only add alt data when you have a specific hypothesis about what additional information it provides. Harvey, Liu, & Zhu (2016) in "...and the Cross-Section of Expected Returns" document hundreds of factors, the vast majority derivable from price and fundamental data alone.
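As a rough illustration of how far OHLCV alone can go, the sketch below builds a handful of the feature types mentioned above with pandas. The column names (`open`, `high`, `low`, `close`, `volume`), the window lengths, and the synthetic price series are all assumptions chosen for the example, not a recommended feature set.

```python
import numpy as np
import pandas as pd


def price_features(ohlcv: pd.DataFrame) -> pd.DataFrame:
    """Build a few price-derived features from open/high/low/close/volume columns."""
    close = ohlcv["close"]
    log_ret = np.log(close).diff()
    feats = pd.DataFrame(index=ohlcv.index)

    # Returns at multiple horizons
    for h in (1, 5, 20):
        feats[f"ret_{h}d"] = close.pct_change(h)

    # Annualized realized volatility, plus a short/long vol ratio as a crude regime indicator
    feats["vol_20d"] = log_ret.rolling(20).std() * np.sqrt(252)
    feats["vol_ratio_5_60"] = log_ret.rolling(5).std() / log_ret.rolling(60).std()

    # Average true range as a fraction of price
    prev_close = close.shift(1)
    true_range = pd.concat(
        [
            ohlcv["high"] - ohlcv["low"],
            (ohlcv["high"] - prev_close).abs(),
            (ohlcv["low"] - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1)
    feats["atr_pct_14d"] = true_range.rolling(14).mean() / close

    # Volume relative to its own recent history
    vol = ohlcv["volume"]
    feats["volume_z_20d"] = (vol - vol.rolling(20).mean()) / vol.rolling(20).std()
    return feats


# Synthetic random-walk prices, for illustration only
rng = np.random.default_rng(0)
n = 300
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, n)))
ohlcv = pd.DataFrame(
    {
        "open": close,
        "high": close * 1.005,
        "low": close * 0.995,
        "close": close,
        "volume": rng.integers(1_000, 5_000, n).astype(float),
    },
    index=pd.bdate_range("2023-01-02", periods=n),
)
feats = price_features(ohlcv)
print(feats.dropna().tail(3))
```

Every feature here uses only data available at or before each row's date, so the frame can be lined up against forward returns without look-ahead.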

4. Free Alternative Data Worth Using

If you're going to incorporate alternative data, start with what's freely available:

| Source | Data | Update Frequency | Use Case |
|--------|------|-------------------|----------|
| VIX & VVIX | Implied vol, vol-of-vol | Real-time | Regime detection, risk scaling |
| COT Reports | Futures positioning | Weekly (Fri release) | Contrarian positioning signals |
| FRED | Macro indicators | Varies | Interest rate, inflation regime |
| EDGAR | SEC filings | Same day | Corporate event signals |
| Treasury.gov | Yield curve data | Daily | Macro regime, risk-on/risk-off |
| CFTC | Options/futures data | Weekly | Sentiment, positioning |

The VIX family alone (VIX, VIX term structure, VVIX, skew index) provides a remarkably rich set of regime and sentiment features at zero cost. If your models aren't already incorporating implied volatility surface data, that should be your first alt data integration.
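A minimal sketch of what VIX-based regime features might look like, assuming you already have daily closes for VIX and a 3-month VIX index (VIX3M) as pandas Series. The function name, the 252-day lookback, and the risk-scaling rule are illustrative assumptions, and the data below is synthetic rather than downloaded.

```python
import numpy as np
import pandas as pd


def vix_regime_features(vix: pd.Series, vix3m: pd.Series, lookback: int = 252) -> pd.DataFrame:
    """Derive simple regime and risk-scaling features from VIX and VIX3M closes."""
    feats = pd.DataFrame(index=vix.index)

    # Term structure: a ratio above 1 means short-dated vol exceeds 3-month vol (stress)
    feats["term_ratio"] = vix / vix3m
    feats["backwardation"] = (feats["term_ratio"] > 1.0).astype(int)

    # Where VIX sits relative to its own trailing history
    roll = vix.rolling(lookback)
    feats["vix_z"] = (vix - roll.mean()) / roll.std()

    # Hypothetical risk scaler: cut exposure when VIX is above its trailing median
    feats["risk_scale"] = (roll.median() / vix).clip(upper=1.0)
    return feats


# Synthetic series for illustration only
rng = np.random.default_rng(1)
idx = pd.bdate_range("2022-01-03", periods=400)
vix = pd.Series(18 + 4 * np.abs(rng.standard_normal(400)), index=idx)
vix3m = pd.Series(20 + rng.normal(0, 0.5, 400), index=idx)

regime = vix_regime_features(vix, vix3m)
print(regime.dropna().tail(3))
```

The rolling statistics only look backward, so each row is usable as a same-day feature once the closes are known.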

5. Alt Data Evaluation Framework

Before spending money or engineering time on a new data source, run it through this checklist:

1. Economic hypothesis. Can you articulate why this data should predict returns? "It feels like it should work" is not a hypothesis. "Credit card spending at retailers predicts earnings surprises with 3-week lead time" is.
2. Signal-to-noise ratio. Most alt data is extremely noisy. What's the expected IC? If it's below 0.02 and you can't increase breadth, the data probably isn't worth the effort.
3. Exclusivity window. How long until this data is widely available? Satellite data that was exclusive in 2018 is now offered by a dozen vendors.
4. Backtest feasibility. Can you get historical data for a meaningful backtest? Many alt data sources only have 2-3 years of history, which is insufficient for robust validation.
5. Legal and compliance. Is the data sourced ethically and legally? Web scraping can violate terms of service. Credit card data raises privacy concerns.
6. Implementation complexity. How much engineering work to ingest, clean, and integrate? A dataset that takes 6 months of engineering and provides 0.01 IC improvement isn't a good tradeoff.
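The IC-versus-breadth point in item 2 can be made concrete with the fundamental law of active management (Grinold), which approximates the information ratio as IC times the square root of breadth. The lesson does not cite this law explicitly, so treat it as an assumed framing; it also presumes the bets are independent, which real signals rarely are. The function names and the target IR of 0.5 are illustrative.

```python
import math


def information_ratio(ic: float, breadth: float) -> float:
    """Fundamental law approximation: IR ~ IC * sqrt(number of independent bets per year)."""
    return ic * math.sqrt(breadth)


def breadth_needed(ic: float, target_ir: float) -> float:
    """Independent bets per year required to reach a target IR at a given IC."""
    return (target_ir / ic) ** 2


# How many independent bets does each IC level need for an IR of 0.5?
for ic in (0.01, 0.02, 0.05):
    bets = breadth_needed(ic, target_ir=0.5)
    print(f"IC={ic:.2f}: about {bets:,.0f} independent bets per year")
```

Halving the IC quadruples the breadth you need, which is why a low-IC signal on a handful of instruments rarely justifies its cost.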

Key Formulas

Expected Alpha from Alternative Data

$$\alpha_{alt} = IC_{alt} \cdot \sigma_r \cdot z_{alt}$$

Expected alpha contribution from an alternative data signal, where IC_alt is the information coefficient, sigma_r is return volatility, and z_alt is the standardized signal score.
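A small worked example, reading the description above as the product of its three named terms (the Grinold-style rule alpha = IC × volatility × score; that product form is an assumption here). The raw signal values, the IC of 0.03, and the 15% volatility are hypothetical.

```python
import numpy as np


def expected_alpha(ic_alt: float, sigma_r: float, z_alt) -> np.ndarray:
    """Expected alpha per asset: IC times return volatility times standardized score."""
    return ic_alt * sigma_r * np.asarray(z_alt, dtype=float)


# Hypothetical raw signal across five assets, standardized to z-scores
raw_signal = np.array([1.2, -0.4, 0.3, 2.5, -1.1])
z = (raw_signal - raw_signal.mean()) / raw_signal.std()

alpha = expected_alpha(ic_alt=0.03, sigma_r=0.15, z_alt=z)
print(np.round(alpha * 1e4, 1))  # expected alpha in basis points per asset
```

With these inputs a one-standard-deviation score maps to 0.03 × 0.15 = 45 bps of expected annual alpha, and the cross-section of alphas sums to zero because the scores are demeaned.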

Hands-On Code

Alt Data Signal Evaluation Pipeline

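```python
import numpy as np
from scipy.stats import spearmanr

def evaluate_alt_data_signal(
    signal_values: np.ndarray,
    forward_returns: np.ndarray,
    cost_per_year: float,
    asset_volatility: float = 0.15
) -> dict:
    """
    Evaluate whether an alternative data signal is worth incorporating.

    signal_values and forward_returns must be aligned arrays of equal length.
    asset_volatility is annualized, so the alpha estimate is annual as well.
    """
    signal_values = np.asarray(signal_values, dtype=float)
    forward_returns = np.asarray(forward_returns, dtype=float)

    # 1. Compute IC (Spearman rank correlation between signal and forward returns)
    ic, p_value = spearmanr(signal_values, forward_returns)

    # 2. Estimate alpha contribution for a one-sigma signal (z = 1)
    alpha_estimate = abs(ic) * asset_volatility

    # 3. Break-even AUM: capital at which the estimated alpha covers the data cost
    breakeven_aum = cost_per_year / alpha_estimate if alpha_estimate > 0 else float('inf')

    # 4. Rolling IC stability
    window = min(60, len(signal_values) // 4)
    rolling_ics = []
    if window >= 3:  # a rank correlation on fewer points is meaningless
        # +1 so the most recent window is included
        for i in range(window, len(signal_values) + 1):
            chunk_ic, _ = spearmanr(
                signal_values[i-window:i],
                forward_returns[i-window:i]
            )
            rolling_ics.append(chunk_ic)

    # Orient rolling ICs by the sign of the full-sample IC, so a consistently
    # negative signal (tradable by flipping its sign) is not penalized.
    oriented = np.array(rolling_ics) * np.sign(ic)
    if oriented.size:
        hit_rate = round(float(np.mean(oriented > 0)), 3)
        stability = round(float(np.mean(oriented) / (np.std(oriented) + 1e-8)), 3)
    else:
        hit_rate = stability = float('nan')  # not enough data for rolling windows

    return {
        'ic': round(float(ic), 4),
        'p_value': round(float(p_value), 4),
        'ic_hit_rate': hit_rate,
        'ic_stability': stability,
        'estimated_alpha_bps': round(float(alpha_estimate) * 10000, 1),
        'breakeven_aum_millions': round(breakeven_aum / 1e6, 1),
        # abs(ic) matches the alpha estimate above, which also ignores the sign
        'recommendation': 'investigate' if abs(ic) > 0.02 and p_value < 0.05 else 'skip'
    }
```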

Evaluates whether an alternative data signal justifies its cost by computing IC, estimating alpha contribution, calculating break-even AUM, and assessing IC stability over time.

Knowledge Check

Q1. What is the primary reason price-derived features often outperform alternative data for liquid markets?

Q2. Before investing in a new alternative data source, the FIRST question to ask is:

Assignment

Using freely available data (VIX, VIX term structure, FRED macro indicators), construct three alternative features for an FX trading strategy. Compute the IC of each against forward 5-day returns for EUR/USD. Compare these ICs to simple price-derived features (20-day momentum, RSI, ATR ratio).