StableGLM: Rashomon Sets for Generalized Linear Models
Problem
Interpretability methods typically report properties of a single fitted model. But in practice, many parameter vectors achieve nearly the same loss --- the Rashomon effect. If a feature appears important under one near-optimal model but irrelevant under another, the explanation is an artifact of model selection, not a property of the data.
StableGLM makes this concrete for generalized linear models by characterizing the full set of near-optimal models and computing interpretability metrics over that set.
Setup
For a GLM with convex loss $L(\theta)$, the ε-Rashomon set is

$$R(\epsilon) = \{\, \theta : L(\theta) \le L(\hat\theta) + \epsilon \,\},$$

where $\hat\theta$ is the loss minimizer. This is a convex sublevel set. Near the optimum, the Hessian $H = \nabla^2 L(\hat\theta)$ provides a local ellipsoidal approximation:

$$R(\epsilon) \approx \{\, \theta : (\theta - \hat\theta)^\top H (\theta - \hat\theta) \le 2\epsilon \,\}.$$

The ellipsoid is cheap to work with analytically. For an arbitrary linear functional $c^\top \theta$, the extrema over the ellipsoid have the closed form $c^\top \hat\theta \pm \sqrt{2\epsilon\, c^\top H^{-1} c}$. For exact (non-approximate) computations, we sample uniformly from $R(\epsilon)$ using hit-and-run with a membership oracle (a direct check of $L(\theta) \le L(\hat\theta) + \epsilon$).
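The closed form for linear functionals takes only a few lines. A minimal sketch (`ellipsoid_extrema` is an illustrative name, not necessarily the toolkit's API):

```python
import numpy as np

def ellipsoid_extrema(c, theta_hat, H, eps):
    """Min and max of c^T theta over the ellipsoidal approximation
    {theta : (theta - theta_hat)^T H (theta - theta_hat) <= 2*eps}."""
    v = np.linalg.solve(H, c)                 # H^{-1} c, without forming H^{-1}
    half_width = np.sqrt(2.0 * eps * (c @ v))
    center = c @ theta_hat
    return center - half_width, center + half_width
```

Solving `H v = c` instead of explicitly inverting `H` is numerically more stable, and a cached factorization of `H` makes repeated queries cheap.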
What the toolkit computes
Per-point prediction bands. For each data point, the range of predictions across all models in $R(\epsilon)$. Points with wide bands are ambiguous: the model's prediction depends on which near-optimal $\theta$ was chosen.
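Under the ellipsoidal approximation, the band on each linear predictor $x_i^\top \theta$ is the linear-functional closed form with $c = x_i$, vectorized over the rows of the design matrix. A sketch under that assumption (names are mine):

```python
import numpy as np

def prediction_bands(X, theta_hat, H, eps):
    """Per-point bands on the linear predictor x_i^T theta over the
    ellipsoidal approximation of R(eps); X has shape (n, d)."""
    V = np.linalg.solve(H, X.T)                        # H^{-1} X^T, shape (d, n)
    half = np.sqrt(2.0 * eps * np.einsum('ij,ji->i', X, V))  # x_i^T H^{-1} x_i
    center = X @ theta_hat
    return center - half, center + half
```

For a GLM the band on the mean response follows by applying the (monotone) inverse link to both endpoints.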
Variable Importance Clouds (VIC). The range of each coefficient across the Rashomon set, and Shapley-weighted variants that account for feature correlations.
Model Class Reliance (MCR). The range of permutation-based feature importance scores across the set, answering: could this feature be unimportant under some near-optimal model?
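Given a collection of sampled near-optimal coefficient vectors (e.g. hit-and-run draws from $R(\epsilon)$), MCR reduces to the min and max of per-model permutation importance. A sketch under that assumption; function names are illustrative, not the toolkit's API:

```python
import numpy as np

def permutation_importance(theta, X, y, loss, rng):
    """Loss increase when each feature column is shuffled, for one model."""
    base = loss(theta, X, y)
    imp = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        imp[j] = loss(theta, Xp, y) - base
    return imp

def model_class_reliance(thetas, X, y, loss, seed=0):
    """Per-feature [min, max] permutation importance across a set of
    near-optimal models (rows of the returned pair index features)."""
    rng = np.random.default_rng(seed)
    imps = np.array([permutation_importance(t, X, y, loss, rng) for t in thetas])
    return imps.min(axis=0), imps.max(axis=0)
```

If a feature's minimum importance across the set is near zero, some near-optimal model gets by without it.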
Predictive multiplicity metrics. Ambiguity (fraction of points whose predicted label changes across $R(\epsilon)$), discrepancy (maximum pairwise disagreement rate between models in the set), and Rashomon capacity (effective volume of the set).
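Given hard 0/1 predictions from a finite sample of models in $R(\epsilon)$ (rows index models, columns index points), ambiguity and discrepancy are short computations. A sketch; Rashomon capacity, which needs predicted probabilities rather than labels, is omitted here:

```python
import numpy as np

def ambiguity(labels):
    """Fraction of points whose predicted label is not unanimous.
    labels: int array of shape (n_models, n_points) with 0/1 entries."""
    return float(np.mean(labels.min(axis=0) != labels.max(axis=0)))

def discrepancy(labels):
    """Maximum pairwise disagreement rate between any two sampled models."""
    m = labels.shape[0]
    worst = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            worst = max(worst, float(np.mean(labels[i] != labels[j])))
    return worst
```

Both are estimates from the sampled models: they can only understate the true values over all of $R(\epsilon)$.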
Calibrating ε
The choice of $\epsilon$ determines the size of the set. We support three calibration modes: (1) percent loss slack ($\epsilon = \delta \cdot L(\hat\theta)$ for a user-chosen fraction $\delta$), (2) likelihood-ratio inversion ($\epsilon = \tfrac{1}{2}\chi^2_{p,1-\alpha}$ when the loss is a negative log-likelihood with $p$ parameters), and (3) a high-dimensional correction for the $p/n \to \gamma > 0$ regime.
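The first two modes are one-liners. A minimal sketch, assuming the standard slack and likelihood-ratio formulas (function and argument names are placeholders, not the toolkit's API; the high-dimensional mode is omitted):

```python
from scipy.stats import chi2

def calibrate_eps(mode, loss_hat=None, slack=0.05, alpha=0.05, p=None):
    """Illustrative epsilon calibration.
    'slack': eps = slack * loss_hat        (percent loss slack)
    'lr':    eps = 0.5 * chi2.ppf(1 - alpha, df=p)
             valid when the loss is a negative log-likelihood with p params."""
    if mode == "slack":
        return slack * loss_hat
    if mode == "lr":
        return 0.5 * chi2.ppf(1.0 - alpha, df=p)
    raise ValueError(f"unknown mode: {mode}")
```

The likelihood-ratio mode ties membership in $R(\epsilon)$ to an asymptotic $(1-\alpha)$ confidence region, so $\epsilon$ inherits a statistical interpretation rather than being an arbitrary tolerance.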
Takeaway
For any GLM fit on correlated or noisy features, single-model explanations are likely unstable. The Rashomon set makes this instability visible and quantifiable. The practical message: before trusting a feature importance ranking, check whether it survives across near-optimal models. If it doesn't, the ranking reflects optimization noise, not signal.