Advanced (Level III) • Week 6 • Lesson 17 • Duration: 55 min

SEQ LSTMs & Sequence Models

When order matters — modeling trade evolution and time series

Learning Objectives

  • Understand why LSTMs were designed and what problem they solve
  • Learn how to frame trading problems as sequence tasks
  • See practical LSTM applications: exit management and regime detection
  • Know the common failure modes and how to avoid them

Explain Like I'm 5

Standard neural networks see each input independently — they have no memory. LSTMs (Long Short-Term Memory networks) were designed to remember. They process sequences step by step and decide what to remember, what to forget, and what to output at each step. For trade management, the sequence of how a trade evolves bar by bar is exactly this kind of problem.

Think of It This Way

Reading a single sentence vs. reading a whole paragraph. A standard network sees one sentence and responds. An LSTM reads the whole paragraph, carrying context forward. "The stock rallied hard" means something different after "The company reported weak earnings" vs. "The company crushed expectations." Context is everything.

1. The Vanishing Gradient Problem

Before LSTMs, people tried using simple recurrent neural networks (RNNs) for sequences. They didn't work well because of the vanishing gradient problem.

In a simple RNN, information is passed through many time steps, and at each step the gradient gets multiplied by the same recurrent weights. If those weights are smaller than 1 in magnitude, the gradient shrinks exponentially — after 20-30 steps it's effectively zero. The network "forgets" what happened at the beginning of the sequence. This matters for trading because a trade might last 30+ bars: a simple RNN would have forgotten the entry conditions by the time it needs to make an exit decision.

LSTMs solve this with a clever gating mechanism: a "cell state" highway that carries information across many time steps without degradation. Three gates (forget, input, output) control what information flows along this highway.

Hochreiter, S. & Schmidhuber, J. (1997). "Long Short-Term Memory." Neural Computation.
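To make the shrinkage concrete, here is a toy scalar sketch (not a real backpropagation computation, which involves full weight matrices and activation derivatives): repeatedly scaling a gradient by a recurrent weight below 1 drives it toward zero within a few dozen steps.

python
# Toy illustration of the vanishing gradient: repeatedly scaling by a
# recurrent weight < 1 shrinks the learning signal exponentially over time.
recurrent_weight = 0.9

for steps in (5, 10, 20, 30, 50):
    gradient_scale = recurrent_weight ** steps
    print(f"{steps:>2} steps: gradient scaled by ~{gradient_scale:.5f}")

# At 30 steps the scale is ~0.04; at 50 steps ~0.005. The entry bar's
# influence on the learning signal is effectively gone.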

Figure: Gradient Magnitude Over Time Steps (RNN vs LSTM)

2. LSTMs for Exit Management

This is where LSTMs genuinely earn their keep in a trading system. A trade is a sequence. At each bar after entry, you observe:

  • Current unrealized P&L (in R)
  • MFE and MAE so far
  • Current volatility vs entry volatility
  • Current regime vs entry regime
  • Bars held
  • Recent price momentum

The LSTM processes this sequence bar by bar, maintaining an internal state that captures the trade's "story." Has the trade been slowly grinding up? Did it spike and reverse? Has it been flat for 20 bars? At each time step, the LSTM outputs a score: the probability that now is a good time to exit. This naturally adapts to the trade's trajectory rather than using a fixed rule (see the sketch below for turning these per-bar features into a model input).

Why this beats fixed rules: a trade at +1.5R after 5 bars (trending nicely) should be treated differently from a trade at +1.5R after 50 bars (went nowhere for a long time then got lucky). The LSTM captures this distinction; a fixed exit rule doesn't.
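A minimal sketch of how one trade becomes a model input, assuming a hypothetical `bars` list of per-bar feature dicts; the keys mirror the feature list above and match the model in the Hands-On Code section.

python
import torch

def build_trade_sequence(bars):
    """Convert one trade's bar-by-bar history into an LSTM input tensor.

    `bars` is assumed to be a list of dicts, one per bar after entry,
    holding the per-bar features listed above (hypothetical keys).
    Returns a tensor of shape (1, seq_len, n_features).
    """
    feature_keys = [
        "current_r", "mfe_r", "mae_r", "bars_held",
        "atr_ratio", "regime_score", "momentum", "volume_ratio",
    ]
    rows = [[bar[key] for key in feature_keys] for bar in bars]
    return torch.tensor(rows, dtype=torch.float32).unsqueeze(0)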

Figure: LSTM Exit Score Over Trade Lifetime

3. GRUs — The Simpler Alternative

GRUs (Gated Recurrent Units) are a simplified version of LSTMs with two gates instead of three: they merge the forget and input gates and combine the cell state and hidden state. Practical differences:

  • GRUs train faster — fewer parameters
  • GRUs perform comparably on most tasks
  • LSTMs have a slight edge on very long sequences (100+ steps)
  • GRUs are slightly better on smaller datasets

For trading applications where sequences are typically 10-50 bars, GRUs often perform just as well as LSTMs with less computational cost. Try both and compare — the difference is usually within noise.

Cho, K. et al. (2014). "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation." EMNLP.
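As a sketch of how small the change is in PyTorch: swapping the recurrent core of the exit model shown later for a GRU only touches the constructor, since nn.GRU accepts the same arguments used there for nn.LSTM.

python
import torch.nn as nn

# nn.GRU takes the same constructor arguments used for nn.LSTM in the
# exit model below; its forward returns (output, h_n) instead of
# (output, (h_n, c_n)), and the rest of the model stays unchanged.
gru = nn.GRU(
    input_size=8, hidden_size=64,
    num_layers=2, dropout=0.3, batch_first=True,
)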

4. Common Failure Modes

LSTMs for trading fail in predictable ways. Here's what to watch for:

  • Overfitting to sequence patterns. LSTMs are powerful enough to memorize specific price patterns from training data. They'll "recognize" a pattern that appeared once and never appears again. Solution: aggressive dropout (0.3-0.5 on recurrent connections) and small hidden sizes (32-64 units).
  • Look-ahead bias in sequences. If you construct sequences using future data (e.g., normalized using full-sample statistics), your model cheats. All normalization must use only past data at each time step (see the sketch after this list).
  • Wrong framing. Not every problem benefits from sequential treatment. L1 signal detection is usually better as tabular classification. LSTMs add value when the time-ordering of features genuinely matters (exit management, regime evolution).
  • Too-long sequences. Feeding 200+ bars of history to an LSTM adds noise without signal. The model struggles to determine what's relevant. Shorter sequences (20-50 bars) are usually better for financial applications.
  • Not enough training data. LSTMs need thousands of sequences to train well. If you only have 500 complete trades, you don't have enough data for a reliable LSTM. Use simpler approaches.
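A minimal sketch of look-ahead-safe normalization, assuming a time-ordered NumPy array of one raw feature: each bar is z-scored using only the bars up to and including it (an expanding window), so no future statistics leak in.

python
import numpy as np

def causal_zscore(values, min_history=20):
    """Z-score each bar using only data available up to that bar."""
    values = np.asarray(values, dtype=float)
    out = np.zeros_like(values)
    for t in range(len(values)):
        history = values[: t + 1]           # expanding window: past and current bar only
        if len(history) < min_history:
            out[t] = 0.0                    # not enough history yet; leave neutral
            continue
        mu, sigma = history.mean(), history.std()
        out[t] = (values[t] - mu) / sigma if sigma > 0 else 0.0
    return out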

Figure: LSTM Performance by Sequence Length

Key Formulas

LSTM Forget Gate
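f_t = σ(W_f · [h_{t-1}, x_t] + b_f)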

Decides what to forget from the cell state. σ outputs values between 0 (forget everything) and 1 (keep everything). The network learns which past information is relevant for the current decision.

LSTM Cell State Update
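C_t = f_t × C_{t-1} + i_t × C̃_t,   where C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)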

The cell state is updated by forgetting some old information (f_t × C_{t-1}) and adding new information (i_t × C̃_t). This selective memory is what prevents the vanishing gradient problem.

Hands-On Code

LSTM Exit Model

python
import torch
import torch.nn as nn

class ExitLSTM(nn.Module):
    """LSTM for trade exit timing (L3)."""
    
    def __init__(self, input_dim=8, hidden_dim=64, dropout=0.3):
        super().__init__()
        self.lstm = nn.LSTM(
            input_dim, hidden_dim,
            num_layers=2,
            dropout=dropout,
            batch_first=True,
        )
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, 32),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(32, 1),
            nn.Sigmoid(),
        )
    
    def forward(self, sequence):
        """
        sequence shape: (batch, seq_len, features)
        Features per bar: current_r, mfe_r, mae_r, bars_held,
                          atr_ratio, regime_score, momentum, volume_ratio
        """
        lstm_out, _ = self.lstm(sequence)
        last_output = lstm_out[:, -1, :]  # Use final hidden state
        exit_prob = self.classifier(last_output)
        return exit_prob.squeeze(-1)

# Training note:
# Each training sample is a sequence of bars for one trade
# Label: 1 = should have exited at this bar, 0 = hold
# Use walk-forward: train on trades from months 1-12,
# validate on months 13-15, test on months 16-18

Two LSTM layers process the trade bar by bar. The final hidden state encodes the trade's full history, and the classifier head converts it to an exit probability. Dropout between the stacked LSTM layers and in the classifier head helps prevent overfitting to specific price patterns (note that PyTorch's built-in LSTM dropout is applied between layers, not to the recurrent connections themselves).
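A minimal usage sketch with synthetic tensors, continuing from the block above; the shapes, labels, and hyperparameters here are illustrative assumptions, not recommendations.

python
model = ExitLSTM()
sequences = torch.randn(16, 30, 8)            # 16 trades, 30 bars each, 8 features per bar
labels = torch.randint(0, 2, (16,)).float()   # 1 = should have exited at that bar, 0 = hold

criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

exit_probs = model(sequences)                 # shape: (16,)
loss = criterion(exit_probs, labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()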

Knowledge Check

Q1. Why are LSTMs better suited for exit management than signal detection?

Assignment

Frame a trade exit problem as a sequence task. Define the per-bar features (at least 6), create training sequences from historical trades, and build a simple LSTM. Compare its exit decisions to a fixed trailing stop on the same trades.