Practitioner course · UK personal lines

Modern Insurance Pricing
with Python and Databricks

Eight modules. GLMs, GBMs, SHAP relativities, conformal intervals, credibility, and constrained rate optimisation. Every line of code written for insurance pricing specifically.

8 modules  ·  Databricks notebooks included  ·  Synthetic UK motor data throughout

Join the waitlist · View the modules

One-time payment, no subscription · All future updates included · Access to all Burning Cost tools

Generic Databricks tutorials will not teach you to price insurance

Most actuaries and analysts learn Databricks from the same place: tutorials aimed at software engineers doing retail churn or ad-click models. Those tutorials cover Delta Lake and MLflow in the abstract. They do not cover Poisson deviance as a loss function, IBNR buffers in cross-validation, or how to get SHAP relativities into a format that a pricing committee will actually accept.

You can piece it together. People do. But it costs six months of wasted effort, and you end up with notebooks that work but that nobody else on the team can maintain, because they were written by someone learning two things at once.

This course teaches Databricks for insurance pricing specifically. Every module starts with a real personal lines problem, uses realistic UK motor data, and ends with output that a pricing team can use.

Pricing teams making the move to Python

This is not an introductory course to Python or to insurance pricing. It assumes you price things for a living and want to do it properly on Databricks.

  • Pricing actuaries and analysts at UK personal lines insurers
  • Using Databricks now, or moving to it within the next twelve months
  • Already know GLMs: you understand what a log link is and why it is there
  • Can write basic Python: loops, functions, DataFrames
  • Tired of adapting generic tutorials to insurance problems and hoping for the best

Not every course is the right course

Better to be clear about this upfront than to have you work through Module 2 wondering why you do not have the background.

  • Complete beginners to Python (you need to know what a function is)
  • People new to insurance pricing who need the fundamentals first
  • Data scientists looking for a general ML course (this is for insurance)
  • Teams on a platform other than Databricks (all exercises use Databricks specifically)

Tangible outputs from every module

Module 01
A clean, reproducible pricing workspace on Databricks
Unity Catalog schema, Delta tables with synthetic motor data, MLflow experiment tracking, and an audit trail that ties model runs to data versions.
Module 02
A GLM frequency-severity model a traditional reviewer can follow
statsmodels GLM replicating Emblem's output, including offset terms, deviance statistics, one-way analysis, and a Radar-compatible factor table export.
Module 03
CatBoost models with proper insurance cross-validation
Poisson frequency and Gamma severity models with walk-forward CV, IBNR buffers, Optuna hyperparameter tuning, and a proper GBM-vs-GLM comparison.
Module 04
A SHAP relativity table formatted for a pricing committee
Multiplicative relativities extracted from a CatBoost GBM, in Excel format for review and CSV format for Radar, with proxy discrimination detection for Consumer Duty.
Module 05
Calibrated prediction intervals on individual risk estimates
Conformal prediction intervals with a finite-sample coverage guarantee, calibrated to your holdout data, roughly 30% narrower than the naive approach.
Module 06
Credibility-weighted relativities for thin-cell segments
Buhlmann-Straub credibility in Python, including NCD factor stabilisation and blending a new model with incumbent rates where the data is thin.
Module 07
A constrained rate optimisation with an FCA compliance check
Linear programming that hits a target loss ratio, respects movement caps, and has Consumer Duty PS21/5 constraints built in. Includes efficient frontier analysis and shadow price reporting.
Module 08
A complete motor pricing pipeline, ready to use as a template
From Delta ingestion to rate change recommendation, every component connected. Structured so you can swap in your own data and run it with minimal modification.

Eight modules. No filler.

01
Databricks for Pricing Teams
What Databricks actually is, not the marketing version. Set up a workspace for pricing, not for a generic data pipeline.
Available
  • Unity Catalog for pricing data: where to put tables, how to set retention for FCA audit
  • Cluster configuration that does not cost a fortune to leave running
  • Delta tables as a replacement for flat-file data passes between pricing and MI
  • MLflow experiment tracking from first principles: log parameters, metrics, and artefacts
  • An audit trail table that ties model runs to data versions
02
GLMs in Python: The Bridge from Emblem
How to replicate what Emblem does in Python, transparently. The same deviance statistics. The same factor tables. A workflow a traditional reviewer can follow.
Available
  • statsmodels GLMs with offset terms, variance functions, and IRLS - the same algorithm Emblem uses
  • One-way and two-way analysis, aliasing detection, and model comparison by deviance
  • The difference between statsmodels and sklearn's GLM implementation, and when it matters
  • Exporting factor tables to a format Radar can import directly
  • On clean data, output matches Emblem to four decimal places given identical encodings
03
GBMs for Insurance Pricing
CatBoost with Poisson, Gamma, and Tweedie objectives. Walk-forward cross-validation with IBNR buffers so you are not lying to yourself about out-of-sample performance.
Coming soon
  • CatBoost Poisson frequency and Gamma severity models with correct exposure handling
  • Why default hyperparameters from generic tutorials are wrong for insurance data
  • Walk-forward temporal cross-validation using insurance-cv, with IBNR buffer support
  • Optuna hyperparameter tuning and MLflow experiment tracking
  • Proper GBM-vs-GLM comparison: Gini, calibration curves, and double-lift charts
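The walk-forward idea can be sketched in a few lines of plain Python. This is a simplified stand-in to show the shape of the splits, not the insurance-cv API:

```python
def walk_forward_splits(months, n_folds=3, test_size=3, ibnr_buffer=6):
    """Yield (train_months, test_months) for walk-forward temporal CV.

    The most recent `ibnr_buffer` months before each test window are
    excluded from training, because their claims are not fully developed.
    """
    for fold in range(n_folds):
        test_end = len(months) - fold * test_size
        test_start = test_end - test_size
        train_end = test_start - ibnr_buffer   # gap covers undeveloped (IBNR) claims
        if train_end <= 0:
            break
        yield months[:train_end], months[test_start:test_end]

months = [f"{y}-{m:02d}" for y in (2022, 2023) for m in range(1, 13)]
splits = list(walk_forward_splits(months))
for train, test in splits:
    print(f"train to {train[-1]}, test {test[0]}..{test[-1]}")
```

Without the buffer, the most recent training months would contain artificially low claim counts, and the model would look better out of sample than it really is.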
04
SHAP Relativities
How to get a factor table out of a GBM. Mathematically sound, reviewable by a pricing actuary, submittable to the FCA, importable into Radar.
Available
  • SHAP values as a principled replacement for GLM relativities - not a heuristic
  • Aggregating raw SHAP values into multiplicative factor tables with confidence intervals
  • When SHAP relativities are honest and when they are misleading: interactions, correlated features
  • Proxy discrimination detection using SHAP for FCA Consumer Duty compliance
  • Excel and Radar-CSV exports using the open-source shap-relativities library
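The core move, exponentiating per-level mean SHAP values from a log-link model, can be sketched with numpy and pandas. The numbers below are invented, and this is an illustration of the idea rather than the shap-relativities API:

```python
import numpy as np
import pandas as pd

# Invented example: pretend `shap_vals` are SHAP values, on the log scale,
# for one rating factor from a log-link frequency GBM.
rng = np.random.default_rng(0)
levels = pd.Series(rng.choice(["A", "B", "C"], size=1000), name="level")
true_log_rel = {"A": 0.0, "B": 0.20, "C": 0.45}
shap_vals = levels.map(true_log_rel).to_numpy() + rng.normal(0, 0.02, 1000)

# Centre on the log scale so the average relativity is ~1, then exponentiate
# the per-level mean SHAP to get a multiplicative factor table.
centred = pd.Series(shap_vals - shap_vals.mean(), name="shap")
relativities = np.exp(centred.groupby(levels).mean())
print(relativities.round(3))
```

Because the model is additive on the log scale, sums of SHAP values become products of factors, which is exactly the structure a traditional rating table expects.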
05
Conformal Prediction Intervals
Prediction intervals with a finite-sample coverage guarantee that does not depend on distributional assumptions. Calibrated to your holdout data.
Coming soon
  • Why point estimates are not enough and why standard confidence intervals are the wrong answer
  • Conformal prediction: the theory and why the coverage guarantee is unconditional
  • Variance-weighted non-conformity scores for heteroscedastic insurance data (Manna et al., 2025)
  • Intervals roughly 30% narrower than the naive approach, with identical coverage
  • Using intervals to flag uncertain risks and set minimum premium floors
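The split conformal recipe itself is short. This sketch uses a constant point estimate and simulated severities purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustration only: a constant point estimate standing in for a fitted
# severity model, with simulated claim amounts.
y_cal = rng.gamma(2.0, 500.0, size=2000)    # calibration-set actual severities
pred_cal = np.full_like(y_cal, 1000.0)      # the "model's" point estimates

# Split conformal: take the adjusted (1 - alpha) quantile of absolute
# calibration residuals as the interval half-width. The adjustment
# ceil((n + 1)(1 - alpha)) / n is what yields the finite-sample guarantee.
alpha = 0.10
n = len(y_cal)
scores = np.abs(y_cal - pred_cal)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

pred_new = 1000.0
interval = (pred_new - q, pred_new + q)
print(f"90% interval half-width: {q:.0f}")
```

The course's variance-weighted scores replace the symmetric `scores` above with residuals scaled by a local spread estimate, which is where the narrower intervals come from.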
06
Credibility and Bayesian Pricing
You have 200 policies in a postcode area and 3 claims. Is 1.5% the true frequency, or noise? Buhlmann-Straub credibility and hierarchical Bayesian models for thin-cell segments.
Available
  • Buhlmann-Straub credibility in Python using the open-source credibility library
  • Its relationship to mixed models and partial pooling - when they agree and when they diverge
  • Practical applications: NCD factor stabilisation, blending a new model with incumbent rates
  • Full Bayesian hierarchical models via bayesian-pricing for segments where credibility is not enough
  • What credibility does not protect you from when exposure mix is shifting
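The blend itself is one formula. Here is a hand-rolled sketch tied to the 200-policy example above; the between-cell variance is an assumed number, and this is not the credibility library's API:

```python
# Illustrative Buhlmann-style blend for a thin postcode cell.
n_claims, exposure = 3, 200.0          # 3 claims on 200 policy-years
cell_freq = n_claims / exposure        # 1.5% observed frequency
portfolio_freq = 0.08                  # complement of credibility

# Credibility constant k = (within-cell variance) / (between-cell variance);
# with Poisson counts the within variance per unit exposure equals the mean.
between_var = 0.0004                   # assumed variance of true cell frequencies
k = portfolio_freq / between_var       # = 200 policy-years here
z = exposure / (exposure + k)          # credibility weight
blended = z * cell_freq + (1 - z) * portfolio_freq
print(f"Z = {z:.2f}, blended frequency = {blended:.3%}")
```

With these assumptions the cell gets 50% credibility, so the answer lands halfway between the noisy 1.5% and the portfolio's 8%, rather than at either extreme.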
07
Constrained Rate Optimisation
The module most courses do not have. Linear programming for rate changes that hit a target loss ratio, respect movement caps, and minimise cross-subsidy simultaneously.
Coming soon
  • The formal problem of finding which factors to move, by how much, subject to constraints
  • Linear programming formulation using rate-optimiser and scipy.optimize
  • The efficient frontier of achievable (loss ratio, volume) outcomes for a rate review cycle
  • Shadow price analysis: the marginal cost of tightening the LR target, quantified
  • FCA PS21/5 Consumer Duty compliance constraints built into the optimisation
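A toy version of the optimisation with scipy.optimize.linprog, using invented premiums, caps, and uplift target; the real formulation in rate-optimiser carries many more constraints:

```python
import numpy as np
from scipy.optimize import linprog

# Toy sketch: choose per-segment rate increases x_j that recover a target
# premium uplift while respecting per-segment movement caps.
premium = np.array([4.0e6, 2.5e6, 1.5e6])   # current premium per segment
caps = np.array([0.05, 0.10, 0.15])         # max allowed increase per segment
required_uplift = 0.42e6                    # premium needed to hit the LR target

res = linprog(
    c=np.ones(3),                           # minimise total rate movement
    A_ub=-premium.reshape(1, -1),           # -premium @ x <= -uplift
    b_ub=[-required_uplift],
    bounds=list(zip(np.zeros(3), caps)),
    method="highs",
)
x = res.x
# Shadow price of the uplift constraint: marginal movement per extra
# pound of uplift demanded (non-positive by scipy's sign convention).
shadow = res.ineqlin.marginals[0]
```

The solver loads the largest segments first, and the shadow price tells you exactly how much extra movement each additional pound of required uplift would cost.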
08
End-to-End Pipeline (Capstone)
Every component from Modules 1-7 connected into a working motor pricing pipeline. Not a demonstration: a template for a real project.
Coming soon
  • Raw data from Unity Catalog through to a rate change pack in a single reproducible pipeline
  • A feature transform layer defined as pure functions, versioned alongside the model
  • CatBoost frequency and severity models, SHAP relativities, conformal intervals, credibility blending, rate optimisation
  • Output tables written to Delta: relativities, rate change summary, efficient frontier data, model diagnostics
  • Designed to work with your own motor portfolio data with minimal modification

Built by practitioners who have done this at UK insurers

Open source
github.com/burningcost
14 open-source libraries, each solving one well-defined pricing problem. Over 600 tests. Every library used in the course was built by us.

Practitioner focus
UK personal lines
Written for people who already know what a GLM is. No generic data science padding. Every module covers a real pricing workflow problem.

Hands-on
Databricks notebooks
Executable Databricks notebooks with synthetic data that behaves like the real thing. Run the code, see the outputs, adapt to your own book.
The course teaches you to use these open-source libraries. We built all of them.
Temporal CV with IBNR buffers
SHAP values as multiplicative factor tables
Buhlmann-Straub credibility in Python
Distribution-free intervals for GBMs
Constrained rate change optimisation
Hierarchical Bayesian models for thin segments
GBM-to-GLM distillation for production

One product. One price. Everything included.

Buy once. Get the full course, every update, and access to all Burning Cost tools as they ship.

Payment processing is nearly ready. Join the waitlist and we will email you when it is live. Waitlist members get first access at the launch price.

Get notified when it launches

Leave your email and we will send you a link the day it goes live.

Common questions

Do I need a paid Databricks account?
No. All exercises are compatible with Databricks Free Edition. You do not need company approval or a budget to start.
How much Python do I need?
You should be able to read a function, understand a list comprehension, and follow a data pipeline. You do not need to be a software engineer. If you can write a basic script to load a CSV and filter rows, you have enough Python. The course introduces every library we use as we go.
Do I need to know GLMs before starting?
For most of the course, yes. You should know what a log link is, what the deviance statistic measures, and have built at least one frequency or severity model, even if it was in Emblem. Module 2 covers the Python implementation in detail but does not re-teach GLM theory from scratch.
What data does the course use?
Synthetic UK motor data throughout. 10,000 policies with realistic exposure, claim counts, development patterns, and rating factor structures. Not a Kaggle dataset. The data is generated to mirror the statistical properties of a real personal lines motor portfolio without using real customer data.
Why Polars instead of Pandas?
Polars is faster, has better null handling, and encourages a more explicit data pipeline style. On Databricks, it runs well alongside PySpark and does not require the Pandas-on-Spark shim. Pandas appears only at the boundary where statsmodels requires it. If you know Pandas, you will be able to follow Polars without difficulty.
Why CatBoost and not XGBoost or other GBM libraries?
CatBoost handles categorical features natively without ordinal encoding, which matters for insurance rating factors. It also has more stable default behaviour on small datasets. The Poisson, Gamma, and Tweedie objectives are well-tested and documented. That said, the concepts transfer directly to XGBoost or other gradient boosting libraries if your team is committed to a different tool.
What does "access to all Burning Cost tools" mean?
As we build new products — additional tools, dashboards, templates, or workflow utilities — course purchasers get access as part of the same one-time payment. We are building a suite of things for UK pricing teams. One payment gets you into all of it.
When will the remaining modules be published?
Modules 3, 5, 7, and 8 are in progress. We publish them as they are ready rather than waiting until the full course is complete. When you buy, you get immediate access to published modules and new ones as they land.
What if the content does not meet my expectations?
You can preview the tutorials for Modules 1, 2, 4, and 6 before buying. If you read those and feel the depth and style are not what you need, we would rather you saved the money. Email us and we will discuss it.

Stop spending months adapting the wrong tutorials

Eight modules written specifically for UK personal lines pricing teams. GLMs, GBMs, SHAP relativities, credibility, rate optimisation. The full workflow, done properly. One price, everything included.

Join the waitlist · Review the modules