Open source · UK insurance pricing

Your GBM outperforms.
Your GLM is still live.

14 Python libraries that bridge the gap: from walk-forward cross-validation to constrained rate optimisation. Written for teams that already know what a GLM is.

14 Libraries
600+ Tests
17 Articles
8 Course modules

The missing piece is not technical skill. It is tooling that bridges the GBM on the server and the GLM in production.

Most UK pricing teams have adopted GBMs but are still taking GLM outputs to production. The GBM sits on a server outperforming the production model, but the outputs are not in a form that a rating engine, regulator, or pricing committee can work with. The model never makes it to rates.

Each library here solves one specific problem in the pricing workflow. Actuarial tests are included. Outputs use the formats pricing teams already recognise: factor tables, Lorenz curves, A/E ratios, movement-capped rate changes.

sklearn-compatible where it matters. Documented by people who have sat in the same sign-off meetings you have.

Three lines to a factor table. Five to validated splits.

Real API calls from the libraries. Not wrappers around wrappers. Each one does the specific thing a pricing team needs.

from shap_relativities import SHAPRelativities

sr = SHAPRelativities(model, X_train)
factors = sr.fit_transform(X_test)

# Returns multiplicative factor tables in GLM format
# Same structure as exp(beta) from your Emblem model
factors.head()
#   vehicle_age  relativity  ci_lower  ci_upper
#             0       1.000     0.982     1.018
#             1       0.912     0.901     0.923

print(f"Reconstruction R² = {sr.reconstruction_r2:.4f}")
# Reconstruction R² = 0.9973
Factor tables, confidence intervals, exposure weighting, reconstruction validation. Output goes straight into a pricing committee pack.
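The rebasing idea behind a factor table is worth seeing on its own. Here is a toy sketch of the exposure-weighted one-way version, in plain numpy on synthetic data (all names and numbers invented for the illustration; shap-relativities itself works from SHAP attributions, not one-way means):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic portfolio: a vehicle_age band per policy, with a known
# multiplicative effect on predicted claim frequency
n = 10_000
vehicle_age = rng.integers(0, 4, size=n)
true_factor = np.array([1.00, 0.91, 0.85, 0.80])
exposure = rng.uniform(0.5, 1.0, size=n)
pred = 0.12 * true_factor[vehicle_age] * np.exp(rng.normal(0, 0.05, n))

# Exposure-weighted one-way relativities: mean prediction per level,
# rebased against the exposure-weighted portfolio mean
portfolio_mean = np.average(pred, weights=exposure)
relativity = np.array([
    np.average(pred[vehicle_age == k], weights=exposure[vehicle_age == k])
    for k in range(4)
]) / portfolio_mean

print(np.round(relativity, 3))
```

A one-way view like this double-counts correlated factors, which is exactly why the library derives its factors from per-row attributions instead.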
from insurance_cv import InsuranceTemporalCV
from sklearn.model_selection import cross_val_score

cv = InsuranceTemporalCV(
    n_splits=5,
    ibnr_buffer_months=6
)
scores = cross_val_score(
    model, X, y,
    cv=cv,
    scoring="poisson_deviance"
)

# Walk-forward splits - no future data leaks into training folds
# IBNR buffer prevents immature periods contaminating validation
print(f"CV deviance: {scores.mean():.4f} ± {scores.std():.4f}")
Walk-forward splits with configurable IBNR buffers. Temporally correct: no future data leaks into training folds. sklearn-compatible API.
from rate_optimiser import RateOptimiser

opt = RateOptimiser(
    current_rates,
    technical_rates,
    exposure
)
result = opt.optimise(
    max_movement=0.10,
    target_lr_improvement=0.03
)

# Efficient frontier as a linear programme
# Respects ±10% movement cap per segment
print(f"LR improvement: {result.lr_delta:.1%}")
# LR improvement: 2.8% (within movement constraints)
Formulates the efficient frontier as a linear programme. Respects movement caps per segment, targets aggregate loss ratio improvement.
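What "formulated as a linear programme" means in practice can be sketched with scipy.optimize.linprog. This is an illustrative toy, not the library's internals: here the objective is exposure-weighted distance to technical rates, under ±10% movement caps and a revenue-neutrality constraint.

```python
import numpy as np
from scipy.optimize import linprog

current = np.array([100.0, 120.0, 90.0, 150.0])    # current rate per segment
technical = np.array([115.0, 110.0, 95.0, 140.0])  # indicated technical rate
w = np.array([1000.0, 800.0, 1200.0, 500.0])       # exposure per segment

n = len(current)
# Variables x = [r_1..r_n, t_1..t_n]: new rates r, plus auxiliaries t with
# t_i >= |r_i - technical_i| (absolute value split into two inequalities).
# Minimise the exposure-weighted distance to technical rates.
c = np.concatenate([np.zeros(n), w])

A_ub = np.block([[np.eye(n), -np.eye(n)],    #  r - t <=  technical
                 [-np.eye(n), -np.eye(n)]])  # -r - t <= -technical
b_ub = np.concatenate([technical, -technical])

# Revenue neutrality: total exposure-weighted premium unchanged
A_eq = np.concatenate([w, np.zeros(n)])[None, :]
b_eq = [w @ current]

# ±10% movement cap on every segment
bounds = [(0.9 * r, 1.1 * r) for r in current] + [(0, None)] * n

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
new_rates = res.x[:n]
print(np.round(new_rates, 1))
```

Swapping the objective for a loss-ratio target, as rate-optimiser does, keeps the same constraint structure.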

Built for people who know the problem from the inside

These libraries assume you understand insurance pricing. They do not explain what a GLM is.

Pricing actuaries moving from Emblem or Radar to Python

You know the techniques. These libraries give you Python equivalents that produce outputs in the same formats you already use: factor tables, A/E ratios, Lorenz curves.

Data scientists joining an insurance pricing team

You have the ML skills but lack the actuarial context. These libraries encode that context: correct cross-validation for IBNR, credibility-weighted factors, fairness tests that map to FCA requirements.

Pricing managers evaluating modern tooling

You need to know what is production-ready and what is a research prototype. Each library here has actuarial tests, a clear scope, and outputs a pricing team lead can explain to a committee.

Academic researchers working on insurance pricing methods

We implement recent literature: Manna et al. (2025) on conformal prediction, BYM2 spatial models, variance-weighted non-conformity scores. Reproducible, documented, testable.
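For readers new to the technique, the core of split conformal prediction fits in a dozen lines of numpy. This is a bare sketch on synthetic severity data; the variance-weighted non-conformity scores and insurance-specific handling live in the library.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic calibration set: observed severities and imperfect predictions
y_cal = rng.gamma(2.0, 50.0, size=2000)
pred_cal = y_cal * rng.normal(1.0, 0.15, size=2000)

# Split conformal: non-conformity score = absolute residual on calibration data
scores = np.abs(y_cal - pred_cal)
alpha = 0.10
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample correction

# Intervals for new predictions are [pred - q, pred + q]; check coverage
y_test = rng.gamma(2.0, 50.0, size=2000)
pred_test = y_test * rng.normal(1.0, 0.15, size=2000)
coverage = np.mean(np.abs(y_test - pred_test) <= q)
print(f"target 90%, empirical coverage: {coverage:.3f}")
```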

The full pricing workflow, covered

Each library solves one well-defined problem. Actuarial tests included. sklearn-compatible where it matters.

📉 Data & Features: insurance-cv
🧠 Model Fitting: credibility, bayesian-pricing
🔍 Interpretation: shap-relativities, ins-interactions, insurance-distill
Validation: ins-conformal, ins-monitoring
Compliance: ins-fairness
📈 Rates & Commercial: rate-optimiser, ins-demand, experience-rating
Training course

Modern Insurance Pricing with Python and Databricks

Eight modules written for pricing actuaries and analysts at UK personal lines insurers. Every module covers a real pricing problem, not a generic data science tutorial adapted to insurance. You work through real Databricks notebooks, on synthetic data that behaves like the real thing.

See the full course →
  • 01 Databricks for pricing teams
  • 02 GLMs in Python: the bridge from Emblem
  • 03 GBMs for insurance pricing
  • 04 SHAP relativities
  • 05 Conformal prediction intervals
  • 06 Credibility and Bayesian pricing
  • 07 Constrained rate optimisation
  • 08 End-to-end pipeline capstone

Practitioner articles on insurance pricing

Your Rating Factor Might Be Confounded
When exp(beta) from a GLM is not what you think it is. How omitted variable bias and confounding distort rating factor estimates, and how Double Machine Learning produces cleaner causal estimates.
Read article →
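The partialling-out estimator at the heart of Double Machine Learning can be sketched with sklearn alone. Synthetic data, illustrative variable names; the article covers the full treatment.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n = 4000

# A confounder (think driver age) drives both the rating factor and claims
confounder = rng.normal(size=n)
factor = 0.8 * confounder + rng.normal(size=n)           # e.g. vehicle group
claims = 0.5 * factor + 1.5 * confounder + rng.normal(size=n)
X = confounder.reshape(-1, 1)

# Naive slope of claims on factor absorbs the confounding
naive = np.polyfit(factor, claims, 1)[0]

# Partialling-out DML: residualise treatment and outcome on the confounders
# with flexible, cross-fitted learners, then regress residual on residual
rf = lambda: RandomForestRegressor(n_estimators=100, min_samples_leaf=20, random_state=0)
m_hat = cross_val_predict(rf(), X, factor, cv=5)
g_hat = cross_val_predict(rf(), X, claims, cv=5)
theta = LinearRegression().fit((factor - m_hat).reshape(-1, 1), claims - g_hat).coef_[0]

print(f"naive slope: {naive:.2f}   DML estimate: {theta:.2f}   (true effect: 0.5)")
```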
Your Pricing Model Might Be Discriminating
How to detect and correct proxy discrimination in UK insurance pricing models. Using SHAP and the insurance-fairness library to identify protected characteristic leakage under FCA Consumer Duty.
Read article →
Your Pricing Model is Drifting (and You Probably Can't Tell)
PSI and aggregate A/E are not enough. A three-layer monitoring framework - feature drift, segmented calibration, and a formal Gini test - that tells you whether to recalibrate or refit.
Read article →
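PSI itself is only a few lines of numpy, which is part of the article's point. A minimal sketch on synthetic data:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a current sample."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf   # catch values outside the baseline range
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(3)
baseline = rng.normal(0.0, 1.0, 50_000)                  # feature at model build
psi_same = psi(baseline, rng.normal(0.0, 1.0, 50_000))
psi_drift = psi(baseline, rng.normal(0.5, 1.0, 50_000))  # 0.5 sd mean shift

# Common rule of thumb: PSI > 0.1 investigate, > 0.25 act
print(f"stable: {psi_same:.3f}   drifted: {psi_drift:.3f}")
```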
Your Demand Model Is Confounded
Naive price elasticity estimates from insurance quote data are biased - risk drives both premium and lapse. The insurance-demand library implements Double Machine Learning to fix this, covering the full conversion, retention, elasticity, and demand curve pipeline with FCA GIPP compliance built in.
Read article →
From CatBoost to Radar in 50 Lines of Python
An open-source Python library that distils GBM models into multiplicative GLM factor tables for Radar, Emblem, and other rating engines. The first open-source solution for the most common deployment problem in UK pricing.
Read article →
Your NCD Threshold Advice Is Wrong at 65%
A Python library for NCD/bonus-malus systems, experience modification factors, and schedule rating. Includes the non-obvious finding that optimal NCD claiming thresholds peak at 30% NCD, not 65%.
Read article →
Demand Modelling for Insurance Pricing
How to build a demand model for UK personal lines pricing: conversion, retention, price elasticity, and demand curves. Covers FCA GIPP requirements and the tools that make it tractable.
Read article →
How Much of Your GLM Coefficient Is Actually Causal?
GLM coefficients measure association, not causation. How Double Machine Learning isolates the causal effect of rating factors from confounding, and why this matters for FCA-compliant pricing.
Read article →
Why Your Cross-Validation is Lying to You
Standard k-fold cross-validation is wrong for insurance pricing models. How temporal leakage and IBNR contamination inflate CV scores, and how walk-forward validation fixes both problems.
Read article →
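The temporal-leakage half of that argument is easy to demonstrate with stock sklearn. Synthetic data with a strong drift term; walk-forward here is plain TimeSeriesSplit, without the IBNR buffer the article also covers.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(4)
n = 3000
t = np.arange(n) / n                        # policy rows in time order

# Claim frequency drifts over the period; drift dwarfs the rating signal
X = np.column_stack([rng.normal(size=n), t])
y = 0.3 * X[:, 0] + 3.0 * t + rng.normal(scale=0.5, size=n)

model = RandomForestRegressor(n_estimators=50, random_state=0)

# Shuffled k-fold lets every fold interpolate the drift from nearby future rows
kfold = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0)).mean()
# Walk-forward: each validation fold lies strictly after its training data
walk = cross_val_score(model, X, y, cv=TimeSeriesSplit(5)).mean()

print(f"shuffled k-fold R²: {kfold:.2f}   walk-forward R²: {walk:.2f}")
```

The shuffled score is flattering precisely because it never asks the model to extrapolate into an unseen period, which is the only thing a production model ever does.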
All articles →
Join waitlist →