Intermediate • Week 26 • Lesson 74 • Duration: 40 min

Database Design for Trading

If you can't query your own trades, you're flying blind

Learning Objectives

  • Design database schemas for trading data
  • Choose between SQL and time-series databases
  • Implement efficient data storage and retrieval

Explain Like I'm 5

You need to store: market data (potentially millions of bars), trade history (every single trade with entry/exit/PnL), model outputs (predictions, features), and configuration (parameters, thresholds). The database design determines how fast you can answer questions like "what was my win rate on FOREX trades last month?" Get it right and analysis is instant. Get it wrong and simple queries take hours. Most beginners dump everything to CSVs and then wonder why their analysis workflow is painful.

Think of It This Way

Database design is like organizing a library. You COULD throw all books in a massive pile, or organize by genre, author, and date with a search catalog. Same books, wildly different ability to find what you need. Now imagine trying to find one specific book in an unsorted pile of 500,000.

1. Database Options for Trading

There are several database options, each suited to different workloads:

SQLite/PostgreSQL (relational)
  • Best for: trade logs, configurations, results
  • Schema: tables with foreign keys, ACID compliance
  • Many systems start with JSON files (simple, portable)
  • Upgrade path: PostgreSQL for multi-user, concurrent access
  • SQLite is underrated; it handles millions of rows beautifully for a single user

InfluxDB/TimescaleDB (time-series)
  • Best for: OHLCV data, tick data, feature time series
  • Optimized for time-range queries ("give me EURUSD 15-min bars from Jan-Jun 2024")
  • Compression: 10-50x for time-series data
  • If you have years of tick data, this is the right choice

Redis (in-memory cache)
  • Best for: real-time data, current state, session data
  • Sub-millisecond reads (essentially instant)
  • Volatile (data lost on restart unless persisted)
  • Ideal for live trading's current price cache

File-based (CSV/JSON/Parquet)
  • JSON for configs and results
  • CSV/pickle for market data
  • Parquet for large datasets (5-10x faster than CSV)
  • Simple, no infrastructure needed
  • Works well for single-user, moderate data volume; it's where most people start, which is completely fine
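If you want to feel the file-format trade-off yourself, here is a minimal sketch comparing CSV and Parquet on synthetic bar data. It assumes pandas with pyarrow installed; the file names and column layout are illustrative only.

```python
import os
import numpy as np
import pandas as pd

# Synthetic 15-min bars for one instrument, roughly one year's worth.
n = 35_000
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=n, freq="15min"),
    "open": np.random.rand(n),
    "high": np.random.rand(n),
    "low": np.random.rand(n),
    "close": np.random.rand(n),
    "volume": np.random.randint(1, 1000, n),
})

df.to_csv("bars_m15.csv", index=False)          # plain text: large, slow to parse
df.to_parquet("bars_m15.parquet", index=False)  # columnar + compressed: smaller, faster to load

for path in ("bars_m15.csv", "bars_m15.parquet"):
    print(f"{path}: {os.path.getsize(path) / 1e6:.1f} MB")
```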

2. Schema Design

Key tables/collections for a trading system:

market_data: timestamp, symbol, timeframe, open, high, low, close, volume
  • Indexed by (symbol, timeframe, timestamp)
  • Partitioned by symbol or date for large datasets
  • This will be your biggest table by far

trades: trade_id, symbol, direction, entry_time, exit_time, entry_price, exit_price, sl, tp, pnl_r, pnl_usd, cluster, l1_score, l2_decision
  • Indexed by trade_id, entry_time
  • Tracks every trade for analysis
  • This is the table you'll query most often

features: timestamp, symbol, feature_name, value
  • OR wide format: timestamp, symbol, rsi_14, macd, atr_14, ...
  • Indexed by (symbol, timestamp)
  • Useful for debugging signals and detecting feature drift

model_predictions: timestamp, symbol, model, prediction, confidence
  • Tracks what the model predicted vs what actually happened
  • Essential for monitoring signal quality over time

risk_state: timestamp, balance, equity, drawdown_pct, dd_zone, risk_level
  • Current state of risk management
  • Critical for DD-triggered scaling and compliance tracking
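As a concrete starting point, here is a minimal sketch of the market_data table in SQLite with the composite index described above. Table, column, and index names are illustrative; adapt them to your own system.

```python
import sqlite3

conn = sqlite3.connect("market_data.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS market_data (
        timestamp TEXT NOT NULL,
        symbol    TEXT NOT NULL,
        timeframe TEXT NOT NULL,
        open   REAL,
        high   REAL,
        low    REAL,
        close  REAL,
        volume REAL
    );

    -- Composite index matching the dominant query pattern:
    -- one symbol, one timeframe, over a time range.
    CREATE INDEX IF NOT EXISTS idx_md_symbol_tf_ts
        ON market_data (symbol, timeframe, timestamp);
""")
conn.commit()

# Typical range query: EURUSD 15-min bars for a given window.
rows = conn.execute(
    "SELECT timestamp, open, high, low, close, volume "
    "FROM market_data "
    "WHERE symbol = ? AND timeframe = ? AND timestamp BETWEEN ? AND ? "
    "ORDER BY timestamp",
    ("EURUSD", "M15", "2024-01-01", "2024-06-30"),
).fetchall()
```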

3. Storage Size Reality Check

Let's do some actual math on data volume (a quick sanity-check script follows this list):

1-minute bars, 30 instruments, 5 years:
  • 252 trading days x 1,440 minutes x 30 instruments x 5 years
  • Approximately 54 million rows
  • At ~100 bytes per row = ~5.4 GB uncompressed
  • With time-series DB compression (10-50x): well under 1 GB
  • With Parquet: roughly 1-1.5 GB
  • With CSV: ~5 GB of plain text (painful to load and parse)

15-minute bars, 30 instruments, 5 years:
  • ~3.6 million rows; very manageable
  • Fits easily in SQLite or even memory

Tick data, 30 instruments, 1 year:
  • Could be 500M+ ticks depending on the instruments
  • You NEED a time-series DB for this
  • Or aggressive downsampling

The takeaway: for M15 trading with 20-30 instruments, SQLite handles it fine. For tick-level research, invest in a proper time-series database.
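The row counts are easy to verify yourself. A back-of-the-envelope sketch, where the ~100 bytes per row is a rough assumption rather than a measured figure:

```python
# Rough storage estimate for 1-minute OHLCV bars.
trading_days = 252
minutes_per_day = 1_440   # FX trades roughly around the clock on weekdays
instruments = 30
years = 5
bytes_per_row = 100       # rough allowance for timestamp + symbol + OHLCV

rows_1min = trading_days * minutes_per_day * instruments * years
rows_15min = rows_1min // 15

print(f"1-min rows:   {rows_1min:,}")        # ~54 million
print(f"15-min rows:  {rows_15min:,}")       # ~3.6 million
print(f"Uncompressed: {rows_1min * bytes_per_row / 1e9:.1f} GB")  # ~5.4 GB
```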

[Chart: Storage size by data type (30 instruments, 5 years)]

4. Query Patterns That Actually Matter

You'll run these queries constantly in production. Design your schema around them:

"What was my win rate last month?"
  -> SELECT COUNT(*), AVG(CASE WHEN pnl_r > 0 THEN 1.0 ELSE 0.0 END) FROM trades WHERE exit_time > '2026-01-01'

"Show me all losing trades on GOLD"
  -> SELECT * FROM trades WHERE symbol = 'XAUUSD' AND pnl_r < 0 ORDER BY pnl_r

"What did the model predict before this bad trade?"
  -> JOIN trades with model_predictions on timestamp and symbol

"Is my model drifting?"
  -> SELECT DATE(timestamp) AS day, AVG(prediction) FROM model_predictions GROUP BY day ORDER BY day
  -> Plot this over time and you'll see drift immediately

"What was the risk state when this drawdown happened?"
  -> SELECT * FROM risk_state WHERE drawdown_pct > 4 ORDER BY timestamp

If any of these queries take more than a second, your schema needs work. Indexes are your friend: add them on columns you frequently filter or sort by.
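One practical habit: whenever a query feels slow, ask SQLite how it plans to execute it. A minimal sketch, assuming the trades table built in the Hands-On Code section below already exists in trading.db:

```python
import sqlite3

conn = sqlite3.connect("trading.db")

win_rate_sql = """
    SELECT COUNT(*) AS n_trades,
           AVG(CASE WHEN pnl_r > 0 THEN 1.0 ELSE 0.0 END) AS win_rate
    FROM trades
    WHERE exit_time > '2026-01-01'
"""
print(conn.execute(win_rate_sql).fetchone())

# EXPLAIN QUERY PLAN reveals whether SQLite uses an index or scans the whole table.
for row in conn.execute("EXPLAIN QUERY PLAN " + win_rate_sql):
    print(row)  # "SCAN trades" here means: add an index on exit_time
```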

Hands-On Code

SQLite Trading Database

```python
import sqlite3
import json
from datetime import datetime

class TradingDB:
    """Simple SQLite database for trading system."""
    
    def __init__(self, db_path='trading.db'):
        self.conn = sqlite3.connect(db_path)
        self.create_tables()
    
    def create_tables(self):
        self.conn.executescript("""
            CREATE TABLE IF NOT EXISTS trades (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                symbol TEXT NOT NULL,
                direction TEXT NOT NULL,
                entry_time TEXT NOT NULL,
                exit_time TEXT,
                entry_price REAL NOT NULL,
                exit_price REAL,
                stop_loss REAL,
                take_profit REAL,
                pnl_r REAL,
                cluster TEXT,
                l1_confidence REAL,
                l2_decision TEXT,
                status TEXT DEFAULT 'OPEN'
            );
            
            CREATE TABLE IF NOT EXISTS risk_state (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp TEXT NOT NULL,
                balance REAL,
                drawdown_pct REAL,
                dd_zone TEXT,
                risk_level REAL,
                open_positions INTEGER
            );
            
            CREATE INDEX IF NOT EXISTS idx_trades_symbol ON trades(symbol);
            CREATE INDEX IF NOT EXISTS idx_trades_entry ON trades(entry_time);
        """)
    
    def log_trade(self, trade: dict):
        self.conn.execute(
            "INSERT INTO trades (symbol, direction, entry_time, entry_price, "
            "stop_loss, cluster, l1_confidence, l2_decision) "
            "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (trade['symbol'], trade['direction'], str(datetime.now()),
             trade['price'], trade['sl'], trade.get('cluster'),
             trade.get('confidence'), trade.get('l2_decision'))
        )
        self.conn.commit()
    
    def get_stats(self):
        cur = self.conn.execute(
            "SELECT COUNT(*), AVG(pnl_r), SUM(CASE WHEN pnl_r > 0 THEN 1 ELSE 0 END) * 100.0 / COUNT(*) "
            "FROM trades WHERE status = 'CLOSED'"
        )
        total, avg_r, win_rate = cur.fetchone()
        if not total:
            # Guard: with no closed trades, AVG() returns NULL and the f-string would fail
            print("No closed trades yet")
            return
        print(f"Trades: {total}, Avg R: {avg_r:.2f}, WR: {win_rate:.1f}%")
```

A proper database lets you query trade history instantly. Without one, you're parsing files manually. Once you have this set up, you'll wonder how you ever operated without it.
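A quick usage sketch of the class above; the trade values are made up:

```python
db = TradingDB("trading.db")
db.log_trade({
    "symbol": "EURUSD",
    "direction": "LONG",
    "price": 1.0850,
    "sl": 1.0820,
    "cluster": "trend",
    "confidence": 0.71,
    "l2_decision": "APPROVE",
})
db.get_stats()  # prints "No closed trades yet" until trades are marked CLOSED
```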

Knowledge Check

Q1. You need to store 5 years of 1-minute OHLCV data for 30 instruments. What's the best database choice?

Assignment

Design and implement a SQLite database for your trading system. Store at least 100 simulated trades with all relevant fields. Write queries to compute: win rate, average R, best/worst trade, performance by cluster.
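As a starting point for the last query, here is a sketch of a per-cluster breakdown. It assumes the trades schema from the Hands-On Code section; adjust column names to whatever your own schema uses.

```python
import sqlite3

conn = sqlite3.connect("trading.db")
rows = conn.execute("""
    SELECT cluster,
           COUNT(*)                                             AS n_trades,
           AVG(pnl_r)                                           AS avg_r,
           AVG(CASE WHEN pnl_r > 0 THEN 1.0 ELSE 0.0 END) * 100 AS win_rate_pct
    FROM trades
    WHERE status = 'CLOSED'
    GROUP BY cluster
    ORDER BY avg_r DESC
""").fetchall()

for cluster, n, avg_r, wr in rows:
    print(f"{cluster}: {n} trades, avg R {avg_r:.2f}, win rate {wr:.1f}%")
```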