III Advanced • Week 7 • Lesson 20 • Duration: 50 min

Feature Importance & Selection

Knowing what your model actually uses — and what to cut

Learning Objectives

  • Understand different feature importance methods and their tradeoffs
  • Learn SHAP values and how they explain individual predictions
  • See why feature pruning improves robustness
  • Build a feature selection pipeline you can reuse

Explain Like I'm 5

Your model uses 38 features. Which ones actually matter? Feature importance analysis answers this question. Some features carry real signal. Others are noise that the model latches onto during training but adds nothing out-of-sample. Finding and removing the noise features makes your model more robust and faster.

Think of It This Way

It's like cleaning out your toolbox. You have 38 tools but you regularly use 15. The other 23 are taking up space and making it harder to find what you need. Feature importance tells you which 15 are doing the work. Remove the rest and you have a leaner, faster, more reliable setup.

1. Three Approaches to Feature Importance

There are three ways to measure how important a feature is, and they sometimes disagree:

1. Split-based importance (built into trees): counts how often a feature is used in tree splits, weighted by the information gain each split provides. Fast and free — XGBoost computes this automatically. But biased toward high-cardinality features (features with many unique values get more split opportunities).

2. Permutation importance: shuffle one feature's values randomly. If accuracy drops significantly, that feature was important. Model-agnostic — works with any model type. More reliable than split-based but slower (requires re-evaluation for each feature).

3. SHAP (SHapley Additive exPlanations): computes each feature's contribution to each individual prediction using game theory. The gold standard for model interpretability. Shows not just IF a feature matters but HOW it affects each prediction (positive or negative direction).

For production model development, use split-based importance for quick screening and SHAP for deep analysis of your final model.

Lundberg, S.M. & Lee, S.I. (2017). "A Unified Approach to Interpreting Model Predictions." NeurIPS.

[Figure] Feature Importance: Split-Based vs Permutation vs SHAP
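To see why the split-based variants themselves can disagree, here is a minimal sketch assuming an already-trained xgboost Booster named model; the helper name is made up for illustration. 'weight' just counts splits (the high-cardinality bias mentioned above), while 'gain' weights each split by the information it contributed.

python
import xgboost as xgb

def compare_split_importance(model: xgb.Booster, top_n: int = 10) -> None:
    """Print xgboost's built-in split-based importance under two conventions."""
    for imp_type in ("weight", "gain"):
        scores = model.get_score(importance_type=imp_type)
        ranked = sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]
        print(f"--- importance_type='{imp_type}' ---")
        for feat, score in ranked:
            print(f"  {feat:20s} {score:12,.2f}")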

2. SHAP Values — How to Read Them

SHAP values tell you how each feature pushed a specific prediction away from the baseline.

Example prediction: the model predicts a 65% probability of a winning trade.

  • Base rate (average prediction): 52%
  • RSI_14 = 72 → SHAP = +4% (overbought supports trend continuation)
  • Hurst = 0.62 → SHAP = +5% (trending regime is favorable)
  • Spread_ratio = 1.8 → SHAP = -2% (wide spread hurts)
  • Volume_ratio = 0.6 → SHAP = -1% (low volume is bearish)
  • All other features sum → +7%
  • Total: 52% + 4% + 5% - 2% - 1% + 7% = 65% ✓

This decomposition is incredibly useful for debugging. When a model makes a surprising prediction, SHAP shows exactly which features are responsible and in which direction. For trading, SHAP reveals the "investment thesis" behind each prediction. If the model is buying because of a feature you don't trust, that's a red flag.

Shapley, L.S. (1953). "A Value for n-person Games." Contributions to the Theory of Games.
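To produce this kind of breakdown for your own model, here is a minimal sketch using the open-source shap package, assuming a trained xgboost model, a numpy validation matrix X_val, and the feature_names list from earlier; the variable names are illustrative placeholders.

python
import numpy as np
import shap  # pip install shap

# Exact TreeSHAP for tree ensembles (the Lundberg & Lee method cited above)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_val)          # shape: (n_samples, n_features)
base_value = float(np.ravel(explainer.expected_value)[0])

i = 0  # explain a single prediction
contribs = sorted(zip(feature_names, shap_values[i]), key=lambda kv: -abs(kv[1]))

print(f"baseline (average prediction): {base_value:+.4f}")
for feat, val in contribs[:5]:
    print(f"  {feat:20s} {val:+.4f}")
print(f"baseline + all contributions: {base_value + shap_values[i].sum():+.4f}")

# Note: for an xgboost model with a logistic objective, these contributions are
# additive in log-odds (margin) space by default, not in probability points as
# in the worked percentage example above.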

3. Feature Pruning — Less Is More

A counterintuitive finding in ML for finance: removing features often improves out-of-sample performance. Why? Because low-importance features are mostly noise. During training, the model finds spurious correlations in these features that don't persist. Removing them forces the model to rely on genuine signal.

The pruning workflow:

  1. Train the model with all features
  2. Compute permutation importance for each feature
  3. Remove features with importance below a threshold (or the bottom 20%)
  4. Retrain on the reduced feature set
  5. Compare walk-forward performance
  6. Repeat until performance stops improving

Typically, you can remove 30-50% of features without losing performance. Often you gain 1-3% in out-of-sample accuracy because the model has fewer opportunities to overfit.

The noise feature test: add 5 random columns. If ANY of them survive with non-zero importance, you need more aggressive regularization or feature pruning.

[Figure] Model Performance vs Number of Features
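A minimal sketch of that pruning loop follows. It assumes a train_and_score helper you supply (it trains on the given columns and returns a fitted model with an sklearn-style .predict plus a walk-forward score); the 20% cut per round and the stopping rule are illustrative choices, not prescribed values.

python
import numpy as np

def prune_features(X_train, y_train, X_val, y_val, feature_names,
                   train_and_score, drop_frac=0.20, max_rounds=5):
    """Iteratively drop the least important features while performance holds."""
    keep = list(range(len(feature_names)))
    model, best_score = train_and_score(X_train[:, keep], y_train,
                                        X_val[:, keep], y_val)

    for _ in range(max_rounds):
        # Permutation importance on the current feature set
        base_pred = model.predict(X_val[:, keep])
        imp = np.zeros(len(keep))
        for j in range(len(keep)):
            X_perm = X_val[:, keep].copy()
            X_perm[:, j] = np.random.permutation(X_perm[:, j])
            imp[j] = np.mean(np.abs(base_pred - model.predict(X_perm)))

        # Drop the bottom drop_frac of the current features, then retrain
        n_drop = max(1, int(drop_frac * len(keep)))
        order = np.argsort(imp)                      # ascending: least important first
        candidate = sorted(keep[k] for k in order[n_drop:])
        new_model, new_score = train_and_score(X_train[:, candidate], y_train,
                                               X_val[:, candidate], y_val)

        if new_score < best_score:                   # stop once pruning starts to hurt
            break
        keep, model, best_score = candidate, new_model, new_score

    return [feature_names[k] for k in keep], best_score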

4. Stability of Feature Importance Over Time

A feature that's important during training might not be important six months later. Markets change. Regime shifts happen. Features that captured a specific pattern in 2022 might be irrelevant in 2024. This is why you should compute feature importance across multiple walk-forward windows, not just one. A feature that's consistently in the top 10 across all windows is reliable. A feature that's #1 in one window and #30 in another is unstable — it's fitting to something temporary.

Stable features tend to be:
  - Volatility measures (ATR, realized vol)
  - Regime indicators (Hurst exponent, ADX)
  - Relative measures (RSI, z-scores)

Unstable features tend to be:
  - Absolute price levels (meaningless across time)
  - Calendar features (unless there's a genuine seasonal effect)
  - Exotic indicators without theoretical backing

Build your model on stable features. Avoid chasing temporary patterns.

[Figure] Feature Importance Rank Stability Across Walk-Forward Windows
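One way to quantify this, sketched below under the assumption that you have already run permutation importance once per walk-forward window (for example with the function in the Hands-On Code section) and collected the results as a list of dicts; the names and thresholds are illustrative.

python
import numpy as np

def rank_stability(importance_by_window, feature_names):
    """importance_by_window: list of {feature_name: importance} dicts, one per window."""
    ranks = {f: [] for f in feature_names}
    for window_imp in importance_by_window:
        ordered = sorted(window_imp, key=lambda f: -window_imp[f])
        for rank, feat in enumerate(ordered, 1):
            ranks[feat].append(rank)

    print(f"{'feature':20s} {'median rank':>12s} {'rank range':>12s}")
    for feat in sorted(feature_names, key=lambda f: np.median(ranks[f])):
        r = ranks[feat]
        print(f"{feat:20s} {np.median(r):12.1f} {max(r) - min(r):12d}")
    # A low median rank AND a narrow range marks a reliable feature; one that
    # swings from #1 to #30 is fitting something temporary.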

Key Formulas

SHAP Value

The Shapley value for feature j — its average marginal contribution across all possible feature coalitions. Computationally expensive to calculate exactly, but TreeSHAP provides an exact O(TLD²) algorithm for tree-based models.
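Written out, the Shapley value of feature j over the full feature set F, where v(S) is the model's expected output when only the features in coalition S are known:

\phi_j \;=\; \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!}\,\bigl[v(S \cup \{j\}) - v(S)\bigr]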

Permutation Importance

Importance of feature j = drop in model score when feature j is randomly shuffled. If shuffling a feature doesn't change the score, it wasn't useful. Simple, model-agnostic, and reliable.
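In symbols, using the convention followed by scikit-learn's permutation_importance, where s is the score on unshuffled data, s_{k,j} is the score after the k-th shuffle of feature j, and averaging over K shuffles reduces shuffle noise:

\mathrm{PI}_j \;=\; s \;-\; \frac{1}{K}\sum_{k=1}^{K} s_{k,j}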

Hands-On Code

Feature Importance Analysis with SHAP

python
import numpy as np
import xgboost as xgb

def feature_importance_analysis(model, X_val, feature_names):
    """Multi-method feature importance analysis.

    model: trained xgboost Booster; X_val: numpy array (n_samples, n_features).
    """
    # 1. Split-based importance (built-in, gain-weighted). Booster keys are
    #    'f0', 'f1', ... when the model was trained without explicit feature
    #    names, so map them back to readable names.
    name_map = {f"f{i}": name for i, name in enumerate(feature_names)}
    split_imp = {name_map.get(k, k): v
                 for k, v in model.get_score(importance_type='gain').items()}

    # 2. Permutation importance: shuffle one column at a time and measure how
    #    much the predictions move (mean |Δprediction| — a prediction-sensitivity
    #    proxy for the drop-in-score definition above).
    dval = xgb.DMatrix(X_val)
    base_pred = model.predict(dval)
    perm_imp = {}

    for i, name in enumerate(feature_names):
        X_perm = X_val.copy()
        X_perm[:, i] = np.random.permutation(X_perm[:, i])
        perm_pred = model.predict(xgb.DMatrix(X_perm))
        perm_imp[name] = np.mean(np.abs(base_pred - perm_pred))

    # 3. Rank features by permutation importance
    sorted_feats = sorted(perm_imp.items(), key=lambda x: -x[1])

    print("=== Permutation Importance Ranking ===")
    for rank, (feat, imp) in enumerate(sorted_feats, 1):
        bar = "█" * int(imp * 200)
        print(f"  {rank:2d}. {feat:20s} {imp:.4f} {bar}")

    print("\n=== Split-Based (gain) Top 5 ===")
    for feat, gain in sorted(split_imp.items(), key=lambda x: -x[1])[:5]:
        print(f"  {feat:20s} {gain:,.2f}")

    # Noise test: any feature deliberately named '*noise*' should carry ~zero weight
    noise_features = [f for f in feature_names if 'noise' in f.lower()]
    if noise_features:
        noise_imp = sum(perm_imp[f] for f in noise_features)
        total_imp = sum(perm_imp.values())
        print(f"\nNoise features: {noise_imp/total_imp:.1%} of total importance")
        if noise_imp / total_imp > 0.05:
            print("⚠️  WARNING: Model is fitting to noise features!")

    return {'split': split_imp, 'permutation': perm_imp}

Permutation importance is the more reliable of the two quick methods: slower than split-based importance, but far less biased toward high-cardinality features. The noise feature test is a built-in sanity check — if random features get importance, your model is overfitting.
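For reference, one way the function above might be wired into the noise test; the training parameters, column counts, and names like X_train, y_train, and feature_names are illustrative placeholders, not values from the lesson.

python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)

# Append 5 pure-noise columns so they appear in the importance ranking
X_train_aug = np.hstack([X_train, rng.standard_normal((X_train.shape[0], 5))])
X_val_aug   = np.hstack([X_val,   rng.standard_normal((X_val.shape[0], 5))])
names_aug   = list(feature_names) + [f"noise_{i}" for i in range(5)]

dtrain = xgb.DMatrix(X_train_aug, label=y_train)
model = xgb.train({"objective": "binary:logistic", "max_depth": 4, "eta": 0.05},
                  dtrain, num_boost_round=200)

results = feature_importance_analysis(model, X_val_aug, names_aug)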

Knowledge Check

Q1. You remove the bottom 40% of features by importance. Out-of-sample accuracy improves by 1.5%. Why?

Q2. A feature ranks #1 in one walk-forward window but #25 in the next. Should you keep it?

Assignment

Run feature importance analysis on your trained model using all three methods (split-based, permutation, SHAP). Do they agree on the top 5 features? Add 5 random noise columns, retrain, and check if any noise features appear in the top 20. If they do, increase regularization until they don't.