Level IV (Expert) • Week 12 • Lesson 36 • Duration: 50 min

Model Risk Framework

What can go wrong with your models and how to prepare for it

Learning Objectives

  • Understand model risk and its sources in trading systems
  • Learn how to build a model risk management framework
  • Implement monitoring and escalation procedures

Explain Like I'm 5

Model risk is the risk that your model is wrong and you lose money because of it. Every model has assumptions. When those assumptions break, the model breaks. A model risk framework asks "what could go wrong?" and prepares responses for each scenario. This is where amateurs get destroyed.

Think of It This Way

Think of a model risk framework like an emergency response plan. You can't prevent all emergencies, but you can prepare: "If the model stops working, do X. If accuracy drops below Y%, do Z." Having the plan BEFORE the emergency is what separates professionals from amateurs.

1. Sources of Model Risk

Model risk comes from five main sources:

1. Specification error — wrong model architecture for the problem. Example: using linear regression for a non-linear relationship.
2. Estimation error — right model but wrong parameters. Example: overfitting to noise in training data.
3. Implementation error — bugs in code. Example: an off-by-one error in a feature calculation.
4. Environmental change — model assumptions no longer hold. Example: trained on trending markets, deployed in ranging markets.
5. Data quality degradation — input data quality drops. Example: a data feed starts providing delayed quotes.

For most production trading systems, the biggest risk is #4. Models trained on 2024 data might not work in a radically different 2027 market. Regime detection and regular retraining provide some protection, but environmental change remains the primary threat.

Reference: Federal Reserve SR 11-7, "Guidance on Model Risk Management," Board of Governors of the Federal Reserve System.
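One simple way to watch for environmental change (source #4) is to test whether recent feature values still look like the training distribution. The sketch below uses a z-score of the recent mean against the training mean; the 3-standard-error threshold and the function name `drift_score` are illustrative assumptions, not a standard.

```python
import numpy as np

def drift_score(train_values, recent_values):
    """Z-score of the recent window's mean relative to the training
    distribution. Large values suggest the environment has shifted
    away from what the model was trained on."""
    mu, sigma = np.mean(train_values), np.std(train_values)
    if sigma == 0:
        return float("inf")
    std_err = sigma / np.sqrt(len(recent_values))
    return abs(np.mean(recent_values) - mu) / std_err

# Illustrative check: a feature whose recent window has shifted regime
rng_train = np.random.default_rng(0)
rng_recent = np.random.default_rng(1)
train = rng_train.normal(0.0, 1.0, 5000)
recent = rng_recent.normal(0.8, 1.0, 100)  # mean shifted by 0.8
print(drift_score(train, recent) > 3.0)    # drift flagged
```

This only catches shifts in the mean of one feature; in practice you would run a check like this per feature and combine it with the regime indicators from the monitoring framework below.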

2. Monitoring & Escalation

Continuous monitoring is non-negotiable. Here's a practical monitoring framework:

Daily monitoring:
- Win rate (rolling 50-trade window)
- Average R per trade
- Regime indicators (Hurst, ADX)
- Model confidence distribution

Weekly review:
- Realized vs. predicted performance comparison
- Feature importance stability
- Correlation changes
- Drawdown trajectory

Monthly review:
- Full walk-forward re-validation
- PBO re-computation
- Monte Carlo breach probability update
- Model retraining decision

Escalation triggers:
- Rolling WR < 52% for 50 trades → INVESTIGATE
- Rolling WR < 50% for 100 trades → HALT TRADING
- Max drawdown > 6% → automatic risk reduction
- Model confidence dropping consistently → schedule retrain
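The drawdown trigger can be automated. This is a minimal sketch of drawdown-triggered risk scaling, using the 6% and 8% limits from the escalation triggers; the linear taper between them is an illustrative choice, not the only option.

```python
def risk_multiplier(current_dd, soft_limit=0.06, hard_limit=0.08):
    """Scale per-trade risk down as drawdown approaches the halt level.
    Limits mirror the escalation triggers; the linear taper between
    soft and hard limits is one reasonable choice among several."""
    if current_dd >= hard_limit:
        return 0.0   # halt: no new risk
    if current_dd <= soft_limit:
        return 1.0   # normal risk
    return (hard_limit - current_dd) / (hard_limit - soft_limit)

print(risk_multiplier(0.04))  # 1.0 (normal)
print(risk_multiplier(0.09))  # 0.0 (halted)
```

A 7% drawdown lands halfway between the limits, so new positions would be sized at roughly half normal risk.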

3. The Model Risk Inventory

Every model in your system needs a risk inventory. Most people skip this, and it catches up with them. For each model, document:

Assumptions:
- What distribution do you assume returns follow?
- What's the expected regime (trending, ranging, volatile)?
- What's the minimum data quality required?

Failure modes:
- What happens if a feature is NaN?
- What happens if volatility spikes 5x?
- What happens if the market regime flips?

Mitigations for each failure:
- NaN → use last valid value or skip trade
- Vol spike → automatic risk reduction via drawdown-triggered scaling
- Regime flip → regime detection flags it, models adapt

The goal: no failure should be a total surprise. You should have a pre-planned response for every foreseeable failure mode.
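The inventory can live in code rather than a document, which keeps it versioned alongside the models. A minimal sketch, where the class name, field names, and the example entry are all illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class ModelRiskEntry:
    """One inventory entry: assumptions, plus a failure-mode ->
    mitigation mapping. Adapt the fields to your own system."""
    model_name: str
    assumptions: list = field(default_factory=list)
    failure_modes: dict = field(default_factory=dict)

# Hypothetical entry for a hypothetical model
entry = ModelRiskEntry(
    model_name="trend_classifier_v3",
    assumptions=["trending regime", "quote latency under 500 ms"],
    failure_modes={
        "feature NaN": "use last valid value or skip trade",
        "volatility spikes 5x": "drawdown-triggered risk reduction",
        "regime flips": "regime detector flags it; model adapts",
    },
)
for failure, mitigation in entry.failure_modes.items():
    print(f"{failure} -> {mitigation}")
```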

4. Model Governance — Who Decides What

Even if you're a solo trader, you need governance rules. Write them down. Future-you under stress will thank present-you.

Model changes require:
- Walk-forward validation on recent data
- Comparison against the current production model
- A Monte Carlo breach probability check
- Documentation of what changed and why

Emergency procedures:
- If rolling WR drops below 50% for 100 trades → halt trading
- If max DD exceeds 6% → automatic risk reduction
- If max DD exceeds 8% → halt and investigate
- If the data feed fails → stop opening new positions, manage existing ones

Change log requirement: every model change gets logged — date, what changed, why, validation results, who approved. Even if "who approved" is just you. The discipline matters.

This sounds like bureaucracy, but it's actually freedom. When something goes wrong at 3am, you don't have to think — you follow the playbook.
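The change log is easy to automate as an append-only file of JSON lines. A minimal sketch, assuming a local file path; the schema, filename, and example values are all illustrative, not a standard.

```python
import datetime
import json

def log_model_change(path, what, why, validation, approved_by="self"):
    """Append one change-log record as a JSON line and return it.
    Append-only keeps history tamper-evident by convention."""
    record = {
        "date": datetime.date.today().isoformat(),
        "what": what,
        "why": why,
        "validation": validation,
        "approved_by": approved_by,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Hypothetical entry
rec = log_model_change(
    "model_changes.jsonl",
    what="retrained on recent data",
    why="rolling WR drifted below 55%",
    validation="walk-forward OK; MC breach probability acceptable",
)
```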

Key Formulas

Model Confidence Score

MCS = (WR_actual − WR_random) / (WR_target − WR_random)

Measures how far above random the model is performing, relative to its target. MCS = 1 means hitting the target win rate; MCS = 0 means performing at random; MCS < 0 means worse than random. Example: with a target WR of 59%, random at 50%, and an observed WR of 56%, MCS = (0.56 − 0.50) / (0.59 − 0.50) ≈ 0.67.

Hands-On Code

Model Risk Monitor

python
import numpy as np

class ModelRiskMonitor:
    """Continuous model risk monitoring."""
    
    def __init__(self, target_wr=0.59, random_wr=0.50, halt_wr=0.50):
        self.target_wr = target_wr
        self.random_wr = random_wr
        self.halt_wr = halt_wr
        self.trade_results = []
    
    def record_trade(self, won: bool):
        self.trade_results.append(1 if won else 0)
    
    def status(self, window=50):
        if len(self.trade_results) < window:
            return "INSUFFICIENT_DATA"
        
        recent = self.trade_results[-window:]
        wr = np.mean(recent)
        mcs = (wr - self.random_wr) / (self.target_wr - self.random_wr)
        
        if wr < self.halt_wr:
            status = "HALT — below random"
        elif wr < 0.52:
            status = "INVESTIGATE — degraded"
        elif wr < self.target_wr:
            status = "OK — below target but acceptable"
        else:
            status = "EXCELLENT — at or above target"
        
        print(f"Rolling WR ({window} trades): {wr:.1%}")
        print(f"Model Confidence Score: {mcs:.2f}")
        print(f"Status: {status}")
        return status

Continuous monitoring catches model degradation before it becomes a serious loss. The Model Confidence Score gives a single, interpretable metric for model health.

Knowledge Check

Q1. Your model's rolling win rate has dropped from 59% to 52% over the last 100 trades. What do you do?

Assignment

Build a model risk monitoring system that tracks rolling win rate, model confidence score, and regime indicators. Set up alert thresholds and test them with simulated degradation scenarios.