III Advanced • Week 29 • Lesson 80 • Duration: 30 min

PERF: Performance Optimization

Making your system fast enough without wasting time on unnecessary speed

Learning Objectives

  • Identify bottlenecks in trading systems
  • Optimize data processing and model inference
  • Balance speed vs accuracy

Explain Like I'm 5

Performance optimization is about making your system FAST ENOUGH without wasting time on unnecessary speed. For M15 trading, you have 15 minutes between bars — latency isn't critical. But for backtesting 7.5 years of data, speed matters significantly. Optimize what's slow, ignore what's already fast.

Think of It This Way

Performance optimization is like preparing for a marathon, not a sprint. You don't need to be the fastest — you need to be consistent and efficient. Find the bottleneck (shoes, hydration, or pacing?) and fix THAT. Don't buy fancy shoes if you're dehydrated.

1. Common Bottlenecks

1. Data loading: reading large CSV files repeatedly.
   -> Solution: cache in memory, use Parquet format (5-10x faster than CSV); see the caching sketch after this list.
2. Feature computation: computing 38 features for each bar.
   -> Solution: vectorize with numpy/pandas. Avoid Python loops.
   -> V7: feature computation takes ~50ms per bar (fine for M15)
3. Model inference: running ML prediction.
   -> XGBoost: ~1-5ms per prediction (very fast)
   -> LSTM: ~10-50ms per prediction (acceptable)
   -> Solution: batch predictions where possible
4. Backtesting loops: iterating over millions of bars.
   -> Solution: vectorized backtesting (process all bars at once)
   -> Avoid: iterating bar by bar in Python
5. Monte Carlo: running 20K simulations.
   -> Solution: numpy vectorization, parallel processing
   -> V7 MC validation: ~30 seconds with 20K sims (acceptable)
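For the data-loading bottleneck, here is a minimal caching sketch. The file names and the "time" column are hypothetical, and reading/writing Parquet assumes pyarrow or fastparquet is installed.

python
import os
import pandas as pd

def load_bars_cached(csv_path, cache_path):
    """Load OHLC bars, preferring a Parquet cache over the raw CSV."""
    if os.path.exists(cache_path):
        # Parquet is a binary columnar format: typically 5-10x faster to read than CSV
        return pd.read_parquet(cache_path)
    df = pd.read_csv(csv_path, parse_dates=["time"])  # "time" column name is an assumption
    df.to_parquet(cache_path)  # write the cache so the next run skips the CSV parse
    return df

# Hypothetical usage:
# bars = load_bars_cached("eurusd_m15.csv", "eurusd_m15.parquet")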

2. Practical Optimization Tips

Profile first, optimize second. Never guess where the bottleneck is. Use cProfile or line_profiler to find the slow parts:

python -m cProfile -s cumulative your_script.py

Top optimizations by impact:
1. Numpy vectorization vs Python loops: 10-1000x speedup
2. Parquet vs CSV file format: 5-10x speedup for I/O
3. Caching computed features: eliminates redundant work
4. Multiprocessing for MC: linear speedup with cores
5. Numba JIT compilation: 10-100x for numerical loops

Don't optimize:
- Code that runs once at startup (loading configs)
- Code that takes < 1ms per call
- Readability for marginal speed gains

For V7: the current performance is adequate for M15 trading. Backtest speed is the main optimization target, since faster backtesting enables more thorough validation.
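Beyond the command-line form above, cProfile can also be driven in-process, which lets you profile just the backtest call. This is a minimal sketch; compute_features and run_backtest are hypothetical stand-ins for your own code.

python
import cProfile
import pstats

def compute_features(n=200_000):
    # Placeholder workload standing in for feature computation
    return sum(i * i for i in range(n))

def run_backtest():
    # Hypothetical entry point; replace with your real backtest
    for _ in range(20):
        compute_features()

profiler = cProfile.Profile()
profiler.enable()
run_backtest()
profiler.disable()

# Print the 10 functions with the largest cumulative time
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)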

Hands-On Code

Vectorized vs Loop Performance

python
import numpy as np
import time

def compute_rsi_loop(prices, period=14):
    """RSI computed with Python loops (SLOW)."""
    rsi_values = []
    for i in range(period, len(prices)):
        gains, losses = [], []
        for j in range(i-period, i):
            change = prices[j+1] - prices[j]
            if change > 0:
                gains.append(change)
            else:
                losses.append(abs(change))
        # Average over the full window (not just the winning/losing bars)
        # so the result matches the vectorized version below
        avg_gain = sum(gains) / period
        avg_loss = sum(losses) / period
        if avg_loss == 0:
            avg_loss = 1e-10
        rs = avg_gain / avg_loss
        rsi_values.append(100 - 100/(1+rs))
    return rsi_values

def compute_rsi_vectorized(prices, period=14):
    """RSI computed with numpy vectorization (FAST)."""
    deltas = np.diff(prices)
    gains = np.where(deltas > 0, deltas, 0)
    losses = np.where(deltas < 0, -deltas, 0)
    
    avg_gain = np.convolve(gains, np.ones(period)/period, mode='valid')
    avg_loss = np.convolve(losses, np.ones(period)/period, mode='valid')
    avg_loss = np.where(avg_loss == 0, 1e-10, avg_loss)
    
    rs = avg_gain / avg_loss
    rsi = 100 - 100 / (1 + rs)
    return rsi

# Benchmark
prices = np.random.randn(10000).cumsum() + 100

t1 = time.time()
r1 = compute_rsi_loop(prices)
t2 = time.time()
r2 = compute_rsi_vectorized(prices)
t3 = time.time()

print(f"Loop:       {t2-t1:.3f}s")
print(f"Vectorized: {t3-t2:.6f}s")
print(f"Speedup:    {(t2-t1)/(t3-t2):.0f}x")

Numpy vectorization typically gives 100-1000x speedup over Python loops. This is the single most impactful optimization for data processing in Python.
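The same idea applies to the Monte Carlo step mentioned earlier: draw all simulations in one numpy call instead of looping. The sketch below is illustrative only; the return-resampling model and the function name are assumptions, not V7's actual MC validation.

python
import numpy as np

def mc_resample(trade_returns, n_sims=20_000, n_trades=500, seed=42):
    """Vectorized Monte Carlo: resample per-trade returns for all sims at once."""
    rng = np.random.default_rng(seed)
    # Shape (n_sims, n_trades): every simulation is drawn in a single call, no Python loop
    sampled = rng.choice(trade_returns, size=(n_sims, n_trades), replace=True)
    equity = np.cumprod(1.0 + sampled, axis=1)                 # compounded equity paths
    running_peak = np.maximum.accumulate(equity, axis=1)
    max_drawdown = 1.0 - (equity / running_peak).min(axis=1)   # worst peak-to-trough per sim
    return equity[:, -1], max_drawdown

# Hypothetical usage with synthetic per-trade returns:
# final_equity, drawdowns = mc_resample(np.random.normal(0.001, 0.01, 800))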

Knowledge Check

Q1. Your backtest takes 3 hours. Profiling shows 90% of time is in feature computation (Python loops). What should you do?

Assignment

Profile your backtest with cProfile. Identify the top 3 bottlenecks. Optimize the biggest one (likely feature computation or bar-by-bar loops). Measure the speedup.