Module 4: SHAP Relativities

Part of Modern Insurance Pricing with Python and Databricks.


The problem this module solves

You have a CatBoost model that outperforms your production GLM. The loss ratio lift is real. The problem is that neither your pricing committee nor Radar can work with a black-box gradient boosting model. They need a factor table in the same format as exp(beta) from a GLM: one row per (feature, level) combination, with a relativity against a base level and confidence intervals.

This module teaches you to get that table from the GBM using SHAP values. The approach is mathematically sound, not a heuristic. The output is reviewable by a pricing actuary, submittable to the FCA, and importable into Radar or Emblem.


What you will build


Prerequisites


Contents

| File | Description | Estimated time |
| --- | --- | --- |
| 00-overview.md | Module overview, objectives, prerequisites | 10 min |
| 01-why-shap-relativities.md | The production problem and why SHAP solves it | 30 min |
| 02-setup.md | Installation, notebook setup, dataset | 20 min |
| 03-training-the-gbm.md | CatBoost frequency and severity training | 45 min |
| 04-extracting-relativities.md | SHAP extraction pipeline | 45 min |
| 05-regulatory-tables.md | Committee formatting, proxy discrimination, IBNR | 30 min |
| 06-radar-export.md | Radar/Emblem export, version control, drift monitoring | 30 min |
| 07-exercises.md | Five exercises with worked solutions | 45 min |

Key technical decisions

CatBoost. CatBoost handles categorical features natively (no ordinal encoding needed), has built-in SHAP support that is faster than the generic shap library, and its Poisson objective handles exposure via a proper log-offset rather than a sample weight. These are practical advantages, not preferences.

Exposure as offset, not weight. In a Poisson frequency model, exposure enters the log-linear predictor as an offset: log(exposure) is added to log(lambda). It is not a sample weight on the likelihood. Setting both baseline=log(exposure) and weight=exposure simultaneously double-counts exposure and produces wrong predictions. This is covered in section 3 and demonstrated in Exercise 1.

Polars for data manipulation. All DataFrame operations use Polars. Conversion to pandas happens only at the CatBoost Pool boundary. The shap-relativities library accepts Polars DataFrames natively.

SHAP on original features, band aggregation separately. Continuous features like driver age are passed to SHAP as continuous variables (what the model was trained on). Banding for the factor table is a post-hoc aggregation step on the SHAP values, not a re-specification of the model. Passing a banded feature to an explainer trained on the continuous feature produces wrong SHAP values.


The shap-relativities library

This module uses shap-relativities, an open-source Python library for extracting multiplicative rating relativities from GBMs. Install via:

```shell
uv pip install 'shap-relativities[catboost]==0.1.0'
```

Source: https://github.com/burningcost/shap-relativities

The library outputs Polars DataFrames with columns: feature, level, relativity, lower_ci, upper_ci, mean_shap, shap_std, n_obs, exposure_weight.


Part of the MVP bundle

This module is included in the £295 MVP bundle alongside Module 1 (Databricks for Pricing Teams), Module 2 (GLMs in Python), and Module 6 (Credibility and Bayesian Pricing). Individual module: £79.