Cointegration
The real math behind pairs trading — correlation is not enough
Learning Objectives
- Understand why correlation and cointegration are fundamentally different
- Learn the Engle-Granger and Johansen cointegration tests
- Apply cointegration to validate trading pairs rigorously
Explain Like I'm 5
Correlation means two assets tend to move in the same direction from one period to the next. Cointegration means they move TOGETHER with a stable spread. The difference is massive: correlation can be sky-high while the spread drifts forever. Cointegration means the spread keeps getting pulled back toward its average for as long as the relationship lasts. For pairs trading, cointegration beats correlation. Always.
Think of It This Way
Think of a drunk person walking their dog. The person and dog are correlated (they go in roughly the same direction). But they're also cointegrated — the leash keeps them within a bounded distance. The "spread" (distance between them) mean-reverts because of the leash. In markets, economic relationships are the leash. This is arguably the best analogy in all of quantitative finance.
1. Correlation vs. Cointegration
Correlation vs Cointegration: The Critical Difference
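To make the difference concrete, here is a minimal simulation sketch on synthetic data (not market data). The series length, the 0.95 return correlation, and the AR(1) coefficient of 0.9 are illustrative assumptions, and `simulate_pairs` and `return_corr` are hypothetical helper names. Both pairs have highly correlated returns, but only the second has a spread that stays bounded.

```python
import numpy as np


def simulate_pairs(n=5000, rho=0.95, phi=0.9, seed=0):
    """Build one correlated-only pair and one cointegrated pair (synthetic)."""
    rng = np.random.default_rng(seed)
    e1 = rng.standard_normal(n)
    z = rng.standard_normal(n)

    # Pair 1: two random walks whose increments have correlation rho.
    # Nothing ties the levels together, so the spread is itself a random walk.
    e2 = rho * e1 + np.sqrt(1 - rho**2) * z
    x = np.cumsum(e1)
    y_corr = np.cumsum(e2)

    # Pair 2: y = x + stationary AR(1) noise (the "leash").
    shocks = np.sqrt(2 * (1 - rho)) * rng.standard_normal(n)
    u = np.zeros(n)
    for t in range(1, n):
        u[t] = phi * u[t - 1] + shocks[t]
    y_coint = x + u
    return x, y_corr, y_coint


def return_corr(a, b):
    """Correlation of one-period changes."""
    return np.corrcoef(np.diff(a), np.diff(b))[0, 1]


x, y_corr, y_coint = simulate_pairs()
for label, y in (("correlated only", y_corr), ("cointegrated", y_coint)):
    spread = x - y
    print(f"{label:>16}: return corr = {return_corr(x, y):.3f}, "
          f"spread std = {spread.std():.2f}")
```

The return correlations come out similar for both pairs, while the spread of the correlated-only pair wanders far more widely. That is the sense in which correlation alone says nothing about whether the spread reverts.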
2. Testing for Cointegration
3. Half-Life: The Speed of Mean Reversion
Mean Reversion Speed by Half-Life
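If a spread decays toward its mean geometrically, a deviation is expected to halve every half-life, so the share left after k periods is 0.5^(k / half-life). The short sketch below tabulates that for a few half-lives. The half-lives (5, 20, 50) and the 20-period horizon are arbitrary illustrative values, and `remaining_fraction` is a hypothetical helper.

```python
def remaining_fraction(periods, half_life):
    """Expected share of an initial deviation left after `periods`."""
    return 0.5 ** (periods / half_life)


for half_life in (5, 20, 50):
    left = remaining_fraction(20, half_life)
    print(f"half-life {half_life:>2}: {left:.1%} of the deviation "
          f"left after 20 periods")
```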
4. Common Cointegration Mistakes
Key Formulas
Engle-Granger Regression
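Step 1: regress X on Y by OLS: $X_t = \alpha + \beta Y_t + \varepsilon_t$. Step 2: test the residuals $\hat{\varepsilon}_t$ for stationarity with an ADF test. If the test rejects the unit root, the residuals are stationary and X and Y are cointegrated. Because the residuals come from an estimated regression, the test must use Engle-Granger critical values rather than the standard ADF table.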
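The two steps can be sketched by hand with NumPy to show what the test is doing. This is a simplified illustration, not a replacement for a library routine: it uses a plain Dickey-Fuller regression with no lagged differences, and the 5% critical value of about -3.34 (the asymptotic Engle-Granger value for two series with a constant) is hard-coded as an assumption. The function name and the synthetic data are hypothetical.

```python
import numpy as np

# Approximate asymptotic 5% Engle-Granger critical value for two series
# with a constant (assumption; library routines compute this properly).
EG_CRIT_5PCT = -3.34


def engle_granger_two_step(x, y):
    """Step 1: OLS of x on y. Step 2: Dickey-Fuller t-stat on the residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Step 1: x_t = alpha + beta * y_t + eps_t
    beta, alpha = np.polyfit(y, x, 1)
    resid = x - (alpha + beta * y)

    # Step 2: regress the residual change on the lagged residual (no intercept).
    lag = resid[:-1]
    diff = np.diff(resid)
    theta = (lag @ diff) / (lag @ lag)
    err = diff - theta * lag
    sigma2 = (err @ err) / (len(diff) - 1)
    t_stat = theta / np.sqrt(sigma2 / (lag @ lag))
    return alpha, beta, t_stat


# Synthetic pair: x = 2 + 1.5 * y + stationary AR(1) noise.
rng = np.random.default_rng(2)
n = 2000
y = np.cumsum(rng.standard_normal(n))
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()
x = 2.0 + 1.5 * y + u

alpha, beta, t_stat = engle_granger_two_step(x, y)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, t-stat = {t_stat:.2f}")
print("cointegrated at 5%" if t_stat < EG_CRIT_5PCT else "not cointegrated at 5%")
```

A t-statistic more negative than the critical value rejects the unit root in the residuals. The result depends on which series is the dependent variable, so it can be worth running the regression in both directions.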
Half-Life of Mean Reversion
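$\text{half-life} = -\ln(2) / \ln(1 + \theta)$, where $\theta$ is the coefficient from regressing $\Delta\varepsilon_t$ on $\varepsilon_{t-1}$. The formula only applies when $-1 < \theta < 0$, which is the mean-reverting case. A shorter half-life means faster mean reversion and a more tradeable pair.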
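As a quick check of the half-life formula, the sketch below simulates an AR(1) spread with a known persistence and recovers theta and the half-life from it. The persistence of 0.9, the sample length, and the helper name `estimate_half_life` are illustrative assumptions.

```python
import numpy as np


def estimate_half_life(spread):
    """Regress the spread change on its lagged level and convert to a half-life."""
    spread = np.asarray(spread, dtype=float)
    lag = spread[:-1]
    diff = np.diff(spread)
    theta, _intercept = np.polyfit(lag, diff, 1)
    if not -1 < theta < 0:
        return theta, None  # no mean reversion detected
    return theta, -np.log(2) / np.log(1 + theta)


# Synthetic spread: s_t = phi * s_{t-1} + noise, so the true theta is phi - 1.
rng = np.random.default_rng(3)
phi = 0.9
spread = np.zeros(20000)
for t in range(1, len(spread)):
    spread[t] = phi * spread[t - 1] + rng.standard_normal()

theta, half_life = estimate_half_life(spread)
true_half_life = -np.log(2) / np.log(phi)
print(f"theta = {theta:.4f} (true {phi - 1:.4f})")
print(f"half-life = {half_life:.2f} periods (true {true_half_life:.2f})")
```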
Hands-On Code
Cointegration Testing
import numpy as np
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.stattools import coint


def test_cointegration(price_a, price_b, names=('A', 'B')):
    """Test for cointegration between two price series."""
    # Work on log prices as plain arrays so pandas Series inputs also work.
    log_a = np.log(np.asarray(price_a, dtype=float))
    log_b = np.log(np.asarray(price_b, dtype=float))

    # Run the Engle-Granger test on the same series the spread is built from.
    score, p_value, _ = coint(log_a, log_b)
    is_cointegrated = p_value < 0.05

    print(f"=== COINTEGRATION: {names[0]} & {names[1]} ===")
    print(f"Engle-Granger test stat: {score:.4f}")
    print(f"p-value: {p_value:.4f}")
    print(f"  {'[PASS] COINTEGRATED' if is_cointegrated else '[FAIL] NOT cointegrated'}")

    if is_cointegrated:
        # Hedge ratio: slope from regressing log A on log B.
        model = LinearRegression()
        model.fit(log_b.reshape(-1, 1), log_a)
        beta = model.coef_[0]
        spread = log_a - beta * log_b
        print(f"  Hedge ratio: {beta:.4f}")

        # theta: slope from regressing the spread change on its lagged level.
        theta_model = LinearRegression()
        theta_model.fit(spread[:-1].reshape(-1, 1), np.diff(spread))
        theta = theta_model.coef_[0]

        # The half-life is only defined for a mean-reverting spread.
        if -1 < theta < 0:
            half_life = -np.log(2) / np.log(1 + theta)
            print(f"  Half-life: {half_life:.1f} periods")
            if half_life < 50:
                print("  [PASS] Tradeable half-life")
            else:
                print("  [WARN] Slow mean reversion")
        else:
            print("  [WARN] No valid half-life (theta outside (-1, 0))")

    return is_cointegrated, p_value

Tests for cointegration between two price series using the Engle-Granger method, then computes the hedge ratio and half-life to assess tradeability.
Knowledge Check
Q1. Two assets have a correlation of 0.95 but fail the cointegration test. Should you pairs trade them?
Assignment
Test cointegration for 5-10 instrument pairs in your trading universe. Compute half-lives. Identify the best 2-3 pairs for trading. Verify that cointegration holds in a walk-forward framework.