Feature Importance & Selection
Knowing what your model actually uses — and what to cut
Learning Objectives
- Understand different feature importance methods and their tradeoffs
- Learn SHAP values and how they explain individual predictions
- See why feature pruning improves robustness
- Build a feature selection pipeline you can reuse
Explain Like I'm 5
Your model uses 38 features. Which ones actually matter? Feature importance analysis answers this question. Some features carry real signal. Others are noise that the model latches onto during training but adds nothing out-of-sample. Finding and removing the noise features makes your model more robust and faster.
Think of It This Way
It's like cleaning out your toolbox. You have 38 tools but you regularly use 15. The other 23 are taking up space and making it harder to find what you need. Feature importance tells you which 15 are doing the work. Remove the rest and you have a leaner, faster, more reliable setup.
1. Three Approaches to Feature Importance
[Chart: Feature Importance — Split-Based vs Permutation vs SHAP]
2. SHAP Values — How to Read Them
3. Feature Pruning — Less Is More
[Chart: Model Performance vs Number of Features]
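The pruning loop behind a chart like this can be sketched as: rank features by permutation importance, retrain on progressively larger top-k subsets, and keep the smallest set whose validation score stays within tolerance of the best. This is a minimal sketch on synthetic data, assuming scikit-learn is available; the feature counts, tolerance, and model are illustrative, not prescriptive.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(600, 10))
# Only the first 3 features carry signal; the other 7 are pure noise.
y = X[:, 0] + 0.5 * X[:, 1] - 0.5 * X[:, 2] + 0.1 * rng.normal(size=600)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Rank features by permutation importance on the validation set
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
imp = permutation_importance(model, X_val, y_val, n_repeats=5, random_state=0)
order = np.argsort(-imp.importances_mean)          # best feature first

# Retrain on top-k subsets and record validation R^2
scores = {}
for k in range(1, X.shape[1] + 1):
    keep = order[:k]
    m = RandomForestRegressor(n_estimators=100, random_state=0)
    m.fit(X_tr[:, keep], y_tr)
    scores[k] = m.score(X_val[:, keep], y_val)

best = max(scores.values())
# Smallest feature set within 1% of the best validation score
k_star = min(k for k, s in scores.items() if s >= best - 0.01)
print("chosen k:", k_star, "validation R^2:", round(scores[k_star], 3))
```

With only three informative features, the curve typically plateaus early and `k_star` lands well below the full feature count — the "less is more" shape of the chart above.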
4. Stability of Feature Importance Over Time
[Chart: Feature Importance Rank Stability Across Walk-Forward Windows]
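One way to quantify the stability shown in this chart is the Spearman rank correlation of feature importances between consecutive walk-forward windows: stable rankings stay near 1.0, while rank churn pulls the correlation down. The sketch below uses hypothetical importance numbers and computes Spearman with plain numpy (Pearson correlation of the ranks) to avoid extra dependencies.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

# Hypothetical permutation importances for 6 features in 3 walk-forward windows
windows = np.array([
    [0.40, 0.25, 0.15, 0.10, 0.06, 0.04],
    [0.35, 0.30, 0.12, 0.11, 0.07, 0.05],   # same ranking as window 0
    [0.10, 0.28, 0.38, 0.09, 0.08, 0.07],   # top features reshuffled
])

# Correlate each consecutive pair of windows
for t in range(len(windows) - 1):
    rho = spearman(windows[t], windows[t + 1])
    print(f"window {t} -> {t + 1}: rho = {rho:.2f}")
```

A feature whose rank collapses between windows (like feature 0 in the last window above) is a candidate for the Q2 discussion below: high importance that doesn't persist is often regime noise rather than signal.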
Key Formulas
SHAP Value
The Shapley value for feature j — its average marginal contribution across all possible feature coalitions. Computationally expensive to calculate exactly, but TreeSHAP provides an exact O(TLD²) algorithm for tree-based models.
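The "average marginal contribution across all coalitions" definition can be computed exactly by brute force when the feature count is tiny. This numpy-only sketch does that for a toy linear model, where an "absent" feature is replaced by a background value (a common simplification; exact SHAP averages over a background dataset). It also checks the efficiency property: the Shapley values sum to f(x) minus the baseline prediction.

```python
import itertools
import math

import numpy as np

# Toy model: f(x) = 2*x0 + 1*x1 + 0*x2
weights = np.array([2.0, 1.0, 0.0])
background = np.array([0.0, 0.0, 0.0])   # baseline feature values
x = np.array([1.0, 1.0, 1.0])            # instance to explain

def f(features_present):
    """Model output with absent features set to the background value."""
    z = background.copy()
    for j in features_present:
        z[j] = x[j]
    return float(weights @ z)

n = len(x)
phi = np.zeros(n)
for j in range(n):
    others = [k for k in range(n) if k != j]
    for size in range(n):
        for S in itertools.combinations(others, size):
            # Shapley weight: |S|! * (n - |S| - 1)! / n!
            w = math.factorial(len(S)) * math.factorial(n - len(S) - 1) / math.factorial(n)
            phi[j] += w * (f(S + (j,)) - f(S))

print("phi =", phi)                                   # [2. 1. 0.]
print("sum check:", phi.sum(), "==", f(range(n)) - f(()))
```

For a linear model the Shapley value of feature j collapses to its weight times its deviation from background, which is why `phi` recovers `[2, 1, 0]` here. The exponential coalition loop is exactly why exact Shapley is intractable in general and why TreeSHAP's polynomial algorithm matters for tree models.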
Permutation Importance
Importance of feature j = drop in model score when feature j is randomly shuffled. If shuffling a feature doesn't change the score, it wasn't useful. Simple, model-agnostic, and reliable.
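The hands-on code below rolls permutation importance by hand for xgboost; for scikit-learn estimators the same idea ships as `sklearn.inspection.permutation_importance`. A minimal sketch on synthetic data, assuming scikit-learn is available — one signal feature, two noise features:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Only feature 0 carries signal; features 1 and 2 are pure noise.
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Shuffle each column n_repeats times and measure the average score drop
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, imp in zip(["signal", "noise_1", "noise_2"], result.importances_mean):
    print(f"{name:8s} {imp:.4f}")
```

Shuffling the signal column destroys the score; shuffling a noise column barely moves it — the same contrast the noise test in the hands-on code looks for.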
Hands-On Code
Multi-Method Feature Importance Analysis

```python
import numpy as np
import xgboost as xgb

def feature_importance_analysis(model, X_val, feature_names):
    """Multi-method feature importance analysis."""
    # 1. Split-based (built-in gain importance)
    split_imp = model.get_score(importance_type='gain')

    # 2. Permutation importance: shuffle each column, measure prediction change
    dval = xgb.DMatrix(X_val)
    base_pred = model.predict(dval)
    perm_imp = {}
    for i, name in enumerate(feature_names):
        X_perm = X_val.copy()
        X_perm[:, i] = np.random.permutation(X_perm[:, i])
        perm_pred = model.predict(xgb.DMatrix(X_perm))
        perm_imp[name] = np.mean(np.abs(base_pred - perm_pred))

    # 3. Rank features by permutation importance
    sorted_feats = sorted(perm_imp.items(), key=lambda x: -x[1])
    print("=== Feature Importance Ranking ===")
    for rank, (feat, imp) in enumerate(sorted_feats, 1):
        bar = "█" * int(imp * 200)
        print(f" {rank:2d}. {feat:20s} {imp:.4f} {bar}")

    # Noise test: deliberately-random features should get ~zero importance
    noise_features = [f for f in feature_names if 'noise' in f.lower()]
    if noise_features:
        noise_imp = sum(perm_imp[f] for f in noise_features)
        total_imp = sum(perm_imp.values())
        print(f"\nNoise features: {noise_imp/total_imp:.1%} of total importance")
        if noise_imp / total_imp > 0.05:
            print("⚠️ WARNING: Model is fitting to noise features!")

    return {"split": split_imp, "permutation": perm_imp}
```

Permutation importance is the most reliable of these methods. The noise-feature test is a built-in sanity check: if deliberately random features receive meaningful importance, your model is overfitting.
Knowledge Check
Q1. You remove the bottom 40% of features by importance. Out-of-sample accuracy improves by 1.5%. Why?
Q2. A feature ranks #1 in one walk-forward window but #25 in the next. Should you keep it?
Assignment
Run feature importance analysis on your trained model using all three methods (split-based, permutation, SHAP). Do they agree on the top 5 features? Add 5 random noise columns, retrain, and check if any noise features appear in the top 20. If they do, increase regularization until they don't.