Beginner • Week 1 • Lesson 3 • Duration: 45 min

Statistical Foundations for Quant Finance

Telling real edge apart from lucky noise

Learning Objectives

  • Understand distributions, averages, and dispersion in a trading context
  • Learn what stationarity means and why it breaks so often in markets
  • Know how to use p-values and handle the multiple testing trap
  • Recognize fat tails and what they mean for your risk estimates

Explain Like I'm 5

Statistics answers one question: "Is this pattern real, or just luck?" If someone says they can predict coin flips, you'd want to see them do it a lot of times before believing them. Statistics gives you the tools to tell genuine skill from random chance — and in finance, that distinction is worth billions.

Think of It This Way

Statistics is the forensic lab of quant finance. Anyone can show you a backtest with a pretty equity curve. Statistical analysis examines the evidence to figure out whether the performance is real or manufactured — whether the strategy actually found patterns or just memorized noise.

1. How Financial Returns Actually Behave

Most introductory finance courses teach that returns follow a normal (bell curve) distribution. They don't. Real market returns break from normality in three important ways:

Fat Tails. Extreme events happen far more often than a bell curve predicts. The 2008 crash, the 2020 COVID selloff, the 2015 Swiss franc flash crash — under normal distribution assumptions, these should basically never happen. Mandelbrot documented this back in 1963 with cotton prices, and it has held up across every asset class since.

Negative Skew. Crashes tend to be faster and sharper than rallies. The left tail (big losses) is fatter than the right tail (big gains).

Volatility Clustering. Volatile days tend to cluster together. A big move today makes a big move tomorrow more likely. Engle (1982) first documented this as ARCH effects.

The practical takeaway: if you use normal distribution assumptions for risk management, you'll systematically underestimate how bad things can get.

Reference: Mandelbrot, B. (1963). "The Variation of Certain Speculative Prices." Journal of Business, 36(4), 394–419.

[Chart: Empirical vs Normal Distribution — Market Returns]
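To make the tail problem concrete, here is a minimal sketch comparing tail probabilities under a normal distribution and a Student-t with 3 degrees of freedom, a common fat-tailed stand-in for daily returns. The 1% daily volatility is an illustrative assumption, not an empirical estimate.

python
import numpy as np
from scipy import stats

sigma = 0.01      # assumed 1% daily volatility (illustrative)
move = 5 * sigma  # a "5-sigma" daily move

# Two-sided probability of a move this large under a normal distribution
p_normal = 2 * stats.norm.sf(move, scale=sigma)

# Same probability under a Student-t with 3 degrees of freedom,
# rescaled so its standard deviation also equals sigma
nu = 3
t_scale = sigma / np.sqrt(nu / (nu - 2))
p_t = 2 * stats.t.sf(move / t_scale, df=nu)

print(f"P(|move| >= 5 sigma), normal:    {p_normal:.2e}")
print(f"P(|move| >= 5 sigma), Student-t: {p_t:.2e}")
print(f"Fat-tailed model rates it ~{p_t / p_normal:,.0f}x more likely")

Under the normal model, a 5-sigma day is a once-in-several-millennia event; under the fat-tailed model it shows up roughly once a year. That gap is exactly the risk underestimation described above.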

2. Mean, Variance, and the Sharpe Ratio

Three numbers tell you most of what you need to know about a strategy's return profile:

Mean (μ): Average return per trade or per period. The first-order measure of profitability.

Variance (σ²) and Standard Deviation (σ): How widely returns scatter around the mean. Lower scatter with the same mean = more consistent.

The Sharpe Ratio combines both into one risk-adjusted number. Here's how to read it:

  • Below 0.5 — Marginal. Not enough compensation for the risk.
  • 0.5–1.0 — Decent.
  • 1.0–2.0 — Strong. This is where most good professional systems land.
  • Above 2.0 — Exceptional (or likely overfit).
  • Above 3.0 — Almost certainly overfit in backtesting.

Reference: Sharpe, W. (1966). "Mutual Fund Performance." Journal of Business, 39(1), 119–138.
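As a quick worked example, here is a sketch of the calculation on simulated daily returns. The 0.05% daily mean and 1% volatility are made-up inputs, and the risk-free rate is assumed to be zero.

python
import numpy as np

rng = np.random.default_rng(0)
daily = rng.normal(0.0005, 0.01, 252 * 3)  # three years of hypothetical daily returns

mean, std = daily.mean(), daily.std(ddof=1)
sharpe_daily = mean / std                    # excess return = raw return since r_f = 0
sharpe_annual = sharpe_daily * np.sqrt(252)  # annualize daily Sharpe with sqrt(252)

print(f"Daily Sharpe:      {sharpe_daily:.3f}")
print(f"Annualized Sharpe: {sharpe_annual:.2f}")

With these inputs the annualized Sharpe should land somewhere around 0.8 (sampling noise will move it a bit), which reads as "decent" on the scale above.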

3. Stationarity — The Assumption That Always Breaks

Almost every classical statistical method assumes your data is stationary — that the mean, variance, and correlation structure stay constant over time. Financial data breaks this assumption constantly. Markets shift between regimes: trending vs. mean-reverting, low vs. high volatility, correlated vs. decorrelated.

You can formally test for stationarity:

  • Augmented Dickey-Fuller (ADF) — tests for unit roots (non-stationarity)
  • KPSS — tests for stationarity around a trend
  • Phillips-Perron — like ADF, but robust to serial correlation without specifying lag structure

Practical fixes (see the sketch below):

  • Differencing — convert prices to returns
  • Regime detection — use metrics like the Hurst exponent to identify the current state
  • Rolling windows — compute stats over recent data only
  • Walk-forward optimization — retrain models on recent data periodically

Reference: Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press.
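Here is a minimal sketch of the ADF test using statsmodels (assumed to be installed), run on a simulated random-walk price series and on its first difference. The random walk is non-stationary by construction, so the test should fail to reject a unit root for prices and reject it for returns.

python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(7)
log_prices = np.cumsum(rng.normal(0, 0.01, 2000))  # random walk: non-stationary by design
returns = np.diff(log_prices)                      # differencing, the standard fix

for name, series in [("log prices", log_prices), ("returns", returns)]:
    adf_stat, p_value = adfuller(series)[:2]  # first two outputs: test statistic, p-value
    verdict = "stationary" if p_value < 0.05 else "non-stationary (unit root)"
    print(f"{name:<10}  ADF = {adf_stat:+.2f}, p = {p_value:.3f} -> {verdict}")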

4. Statistical Significance and the Multiple Testing Problem

When you evaluate a trading signal, you're really asking: "Could this result have come from pure chance?" The p-value answers this. A p-value of 0.05 means there's a 5% chance of seeing results at least this strong if the signal has zero actual edge. In quant finance, the standards are strict:

  • p < 0.05 — Interesting, but not enough to deploy
  • p < 0.01 — Worth serious investigation
  • p < 0.001 — Strong evidence of real edge

The Multiple Testing Problem: Test 100 random signals at p < 0.05, and roughly 5 will look "significant" purely by chance. This data mining bias has fooled a lot of people. Defenses against it (the simulation below shows the problem and the simplest fix):

  • Bonferroni correction — divide your significance threshold by the number of tests
  • False Discovery Rate (FDR) — the Benjamini-Hochberg procedure
  • Probability of Backtest Overfitting (PBO) — Bailey et al. (2014)
  • Walk-forward validation — train on the past, test on unseen future data

Reference: Bailey, D.H., et al. (2014). "The Probability of Backtest Overfitting." Journal of Computational Finance.

[Chart: Multiple Testing — False Positive Rate vs Number of Tests]
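The trap is easy to demonstrate. The sketch below tests 100 signals that are pure noise by construction, so any "significant" result is a false positive; a Bonferroni-corrected threshold removes essentially all of them.

python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_signals, n_obs, alpha = 100, 500, 0.05

# 100 "signals" with zero true edge: each is pure Gaussian noise
p_values = np.array([
    stats.ttest_1samp(rng.normal(0.0, 1.0, n_obs), 0.0).pvalue
    for _ in range(n_signals)
])

naive_hits = int((p_values < alpha).sum())             # expect ~5 false positives
bonf_hits = int((p_values < alpha / n_signals).sum())  # corrected threshold

print(f"'Significant' at p < {alpha}:            {naive_hits}")
print(f"Significant after Bonferroni correction: {bonf_hits}")

Expect roughly five naive false positives and zero survivors after correction. The same logic is why walk-forward validation on genuinely unseen data matters so much.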

Key Formulas

Sample Standard Deviation

Measures how much returns scatter around the mean. Lower σ with the same mean = better risk-adjusted performance. This is the building block for the Sharpe ratio and most risk metrics.
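In standard notation, with r̄ the mean of n returns:

$$ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(r_i - \bar{r}\right)^2} $$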

Sharpe Ratio

Mean excess return divided by standard deviation. Annualize by multiplying by √252 for daily data. The standard measure of risk-adjusted performance across the industry.
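In symbols, with r̄ the mean return, r_f the risk-free rate, and σ the standard deviation of returns:

$$ \text{Sharpe} = \frac{\bar{r} - r_f}{\sigma}, \qquad \text{Sharpe}_{\text{annual}} = \sqrt{252}\cdot\text{Sharpe}_{\text{daily}} $$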

Z-Score

How many standard deviations a value sits from the mean. Used in mean-reversion strategies, outlier detection, feature normalization for ML, and signal generation.
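In symbols:

$$ z = \frac{x - \mu}{\sigma} $$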

Hands-On Code

Return Distribution Analysis

python
import numpy as np
from scipy import stats

# Simulate trade returns in R-multiples (profit per unit of risk)
np.random.seed(42)
n_trades = 4500
wins = np.random.uniform(0.5, 3.5, int(n_trades * 0.592))
losses = np.random.uniform(-2.0, -0.1, n_trades - len(wins))
returns = np.concatenate([wins, losses])
np.random.shuffle(returns)

# Descriptive statistics
print("=== Return Distribution Analysis ===")
print(f"Mean return:  {returns.mean():+.3f}R")
print(f"Std dev:      {returns.std():.3f}R")
print(f"Sharpe proxy: {returns.mean()/returns.std():.3f}")
print(f"Skewness:     {stats.skew(returns):+.3f}")
print(f"Kurtosis:     {stats.kurtosis(returns):+.3f}")

# Statistical significance test
t_stat, p_value = stats.ttest_1samp(returns, 0)
print(f"\n=== Statistical Significance ===")
print(f"t-statistic:  {t_stat:.2f}")
print(f"p-value:      {p_value:.2e}")
print(f"Significant at 0.01? {'YES' if p_value < 0.01 else 'NO'}")

# Normality test (Jarque-Bera)
jb_stat, jb_p = stats.jarque_bera(returns)
print(f"\n=== Normality Test (Jarque-Bera) ===")
print(f"JB statistic: {jb_stat:.2f}")
print(f"Normal?       {'YES' if jb_p > 0.05 else 'NO — fat tails present'}")

Three fundamental checks for any strategy: descriptive stats (what does the distribution look like?), statistical significance (is the edge real?), and normality test (are your risk assumptions valid?). Run these before trusting any backtest.

Knowledge Check

Q1. Why are fat tails particularly dangerous for risk management?

Q2. A backtest produces a Sharpe ratio of 3.5. What should your first reaction be?

Q3. What is the multiple testing problem in quantitative research?

Assignment

Download daily returns for any major instrument. Compute: mean, standard deviation, skewness, kurtosis, and annualized Sharpe. Make a Q-Q plot comparing the empirical distribution to a normal. Run a Jarque-Bera test. Write 500 words analyzing your findings, focusing on tail behavior.