III Advanced • Week 6 • Lesson 16 • Duration: 55 min

Neural Networks for Finance

The honest case for and against deep learning in trading

Learning Objectives

  • Understand when neural networks actually help in trading applications
  • Learn the fundamental architecture: layers, activations, and backpropagation
  • See why deep learning often loses to tree-based models on financial data
  • Know which financial problems genuinely benefit from neural approaches

Explain Like I'm 5

A neural network is a function that learns its own rules. You give it thousands of examples and it figures out the pattern. For images and language, this is magical. For structured financial data — price, volume, indicators — it's often overkill. The question isn't "can neural nets do this?" It's "should they?"

Think of It This Way

Think of neural networks like hiring a genius who needs to learn everything from scratch versus hiring an experienced specialist. The genius (neural net) might eventually surpass the specialist (XGBoost) if given enough data and time. But for a specific, well-defined job with limited training examples, the specialist usually wins. Finance is that job.

1. The Honest Assessment

Let's start with the uncomfortable truth: for most tabular financial prediction tasks, neural networks underperform gradient-boosted trees. This isn't controversial in the ML research community. Grinsztajn et al. (2022) showed it empirically. Borisov et al. (2022) surveyed 60+ papers and found the same pattern. Why?

1. Financial datasets are small by deep learning standards. 10,000 labeled trades is a large trading dataset. It's a tiny deep learning dataset.
2. Features are structured. The data is already in rows and columns — there's no spatial or sequential structure that convolutions or attention mechanisms can exploit (unless you frame it that way, which we'll cover with LSTMs).
3. Signal-to-noise ratio is terrible. Markets are mostly noise. Neural nets are powerful enough to memorize that noise. Trees, with proper regularization, are less prone to this.
4. Interpretability matters. When a tree-based model makes a prediction, you can trace the decision path. Neural nets are opaque. In production, when something breaks, opacity is expensive.

Grinsztajn, L. et al. (2022). "Why do tree-based models still outperform deep learning on typical tabular data?" NeurIPS.

Model Family Performance on Financial Tabular Data

2. Where Neural Nets Actually Help

That said, there are specific financial tasks where neural networks genuinely outperform:

  • Sequential data. LSTMs and GRUs process time series natively. A trade's evolution over bars — the trajectory of unrealized P&L, volatility changes, regime shifts — is a sequence. Trees can't naturally handle sequences; neural nets can (a minimal sketch follows after this list).
  • Feature extraction from raw data. If you want to learn features directly from raw OHLCV without manually engineering RSI, MACD, etc., autoencoders and 1D CNNs can find patterns humans wouldn't construct.
  • Multi-modal inputs. Combining price data with text (news sentiment), order book snapshots, and technical indicators. Neural nets handle heterogeneous inputs more naturally than trees.
  • Reinforcement learning. For exit management and portfolio optimization, RL uses neural networks as function approximators. Trees can't play this role.

The pattern: neural nets win when there's rich structure to exploit (sequences, images, text) and enough data to learn from. They lose when the problem is a flat table with limited rows.
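To make the sequential case concrete, here is a minimal sketch of an LSTM that consumes a trade's bar-by-bar trajectory. The class name, feature set, and target (a probability that exiting now beats holding) are illustrative assumptions, not a prescribed design:

python
import torch
import torch.nn as nn

class TradeTrajectoryLSTM(nn.Module):
    """Sketch: LSTM over per-bar trade features (e.g. unrealized P&L,
    volatility, bars held). Illustrative only — dimensions and target
    are assumptions, not a recommended production architecture."""

    def __init__(self, n_features, hidden_dim=64, dropout=0.3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),  # e.g. P(exit now is better than holding)
        )

    def forward(self, x):
        # x: (batch, n_bars, n_features) — one row per bar since entry
        _, (h_n, _) = self.lstm(x)      # h_n: (num_layers, batch, hidden_dim)
        return self.head(h_n[-1]).squeeze(-1)

The point is structural: the model sees the whole sequence of bars for each open trade, which a flat tree-based model cannot do without manual feature flattening.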

3. Architecture Fundamentals

If you're going to use neural networks for trading, you need to understand the basics:

  • Input layer. Your features — one neuron per feature. Normalize everything to mean 0, std 1. Neural nets are sensitive to scale in a way trees aren't (a normalization sketch follows below).
  • Hidden layers. Each layer applies a linear transformation (weights × inputs + bias) followed by a non-linear activation function. ReLU is the default choice — simple, effective, fast.
  • Output layer. For classification: sigmoid (binary) or softmax (multi-class). For regression: linear activation. The output should match your task.
  • Dropout. Randomly zeroes out neurons during training. This is the single most important regularization technique for financial applications: it forces the network not to rely on any single feature or pathway. Use dropout of 0.3-0.5 on hidden layers.
  • Batch normalization. Normalizes layer outputs during training. Stabilizes learning and allows higher learning rates. Especially useful for deeper architectures.

For a starting point on financial tabular data: 2-3 hidden layers, 128-256 neurons each, ReLU, dropout 0.3, Adam optimizer, learning rate 1e-3.

Srivastava, N. et al. (2014). "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." JMLR.
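A minimal sketch of the input normalization step, assuming scikit-learn is available. The placeholder feature matrices and shapes are made up; the key point is fitting the scaler on the training window only:

python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(8000, 38))   # placeholder training features
X_val = rng.normal(size=(2000, 38))     # placeholder validation features

# Fit on the training window only, then reuse the same mean/std on
# validation — fitting on all rows leaks future statistics backward.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # mean 0, std 1 per feature
X_val_scaled = scaler.transform(X_val)          # reuse training statistics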

Effect of Dropout Rate on Overfitting

4. The Practical Recommendation

Here's the honest recommendation based on what actually works:

  • For L1 (signal detection): Use XGBoost or LightGBM. Tabular features, limited data, and a need for interpretability. Trees win.
  • For L2 (entry filtering): Rules + simple ML. The decision is mostly based on observable market conditions. Don't overcomplicate it.
  • For L3 (exit management): This is where neural networks earn their keep. LSTMs for modeling trade trajectories. RL agents for learning optimal exit policies. The sequential nature of trade management is a natural fit.
  • For regime detection: Autoencoders can learn latent regime representations that are richer than hand-crafted features. Worth exploring if you have enough data (a minimal sketch follows below).

Don't use neural networks because they're impressive. Use them because the problem demands sequential or unstructured data processing that trees can't handle. For everything else, trees.
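For the regime-detection idea, here is a minimal autoencoder sketch. The class name, layer sizes, and latent dimension are assumptions for illustration; the latent code would be trained with a reconstruction loss and then used as an extra feature:

python
import torch.nn as nn

class RegimeAutoencoder(nn.Module):
    """Sketch: compress a vector of market features into a small latent
    code that can act as a learned 'regime' representation. Sizes are
    placeholders, not tuned values."""

    def __init__(self, n_features, latent_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(),
            nn.Linear(32, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32), nn.ReLU(),
            nn.Linear(32, n_features),
        )

    def forward(self, x):
        z = self.encoder(x)           # latent regime representation
        return self.decoder(z), z     # reconstruction + latent code

Training would minimize the reconstruction error (e.g. mean squared error between input and output); the learned z is what you keep.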

Key Formulas

Neuron Activation

Each neuron computes a weighted sum of inputs, adds a bias, and passes the result through an activation function σ. For ReLU: σ(z) = max(0, z). For sigmoid: σ(z) = 1/(1+e^{-z}).
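Written out, with w the weight vector, x the input vector, and b the bias:

latex
z = \mathbf{w}^\top \mathbf{x} + b, \qquad a = \sigma(z)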

Binary Cross-Entropy Loss

Standard loss function for binary classification (win/loss prediction). Penalizes confident wrong predictions more heavily than uncertain ones. Lower is better.
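For N samples with true labels y_i ∈ {0, 1} and predicted probabilities ŷ_i:

latex
\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right]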

Hands-On Code

Simple MLP for Signal Classification

python
import torch
import torch.nn as nn

class SignalClassifier(nn.Module):
    """MLP for L1 signal detection (when you want to try neural)."""
    
    def __init__(self, n_features, hidden_dim=128, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.BatchNorm1d(hidden_dim // 2),
            nn.ReLU(),
            nn.Dropout(dropout),
            
            nn.Linear(hidden_dim // 2, 1),
            nn.Sigmoid(),
        )
    
    def forward(self, x):
        return self.net(x).squeeze(-1)

# Honest comparison workflow:
# 1. Train this MLP on your data
# 2. Train XGBoost on the same data with same splits
# 3. Compare validation AUC
# 4. XGBoost almost certainly wins
# 5. Use the MLP only if it consistently beats XGBoost
#    across multiple walk-forward windows

This is a fair starting point. BatchNorm and dropout provide regularization. But the real test is whether this beats XGBoost on your specific data. Usually it doesn't. Be empirical, not ideological.
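A minimal training-loop sketch for the SignalClassifier above, using the Adam / 1e-3 / binary cross-entropy setup recommended earlier. The random tensors and epoch count are placeholders standing in for a real walk-forward split:

python
import torch
import torch.nn as nn

# Placeholder data — in practice these come from your walk-forward split,
# with features already normalized to mean 0, std 1.
X_train = torch.randn(8000, 38)
y_train = (torch.rand(8000) > 0.5).float()

model = SignalClassifier(n_features=38)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()   # matches the Sigmoid output layer

model.train()
for epoch in range(20):
    optimizer.zero_grad()
    preds = model(X_train)
    loss = loss_fn(preds, y_train)
    loss.backward()
    optimizer.step()

In practice you would mini-batch the data, monitor validation AUC each epoch, and stop early — then run the same splits through XGBoost for the comparison described above.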

Knowledge Check

Q1. For a signal detection task with 10,000 labeled trades and 38 tabular features, which model is most likely to perform best?

Assignment

Train both an MLP and XGBoost on the same dataset with the same walk-forward split. Compare AUC, accuracy, and profit factor. Document which model wins and speculate on why. This exercise builds calibration about when neural nets help.