Practitioner course · UK personal lines

Modern Insurance Pricing
with Python and Databricks

Eight modules. GLMs, GBMs, SHAP relativities, conformal intervals, credibility, and constrained rate optimisation. Every line of code written for insurance pricing specifically.

8 modules  ·  Databricks notebooks included  ·  Synthetic UK motor data throughout

Join the waitlist · View the modules

One-time payment, no subscription · All future updates included · Access to all Burning Cost tools

Generic Databricks tutorials will not teach you to price insurance

Most actuaries and analysts learn Databricks from the same place: tutorials aimed at software engineers doing retail churn or ad-click models. Those tutorials cover Delta Lake and MLflow in the abstract. They do not cover Poisson deviance as a loss function, IBNR buffers in cross-validation, or how to get SHAP relativities into a format that a pricing committee will actually accept.

You can piece it together. People do. But it costs six months of wasted effort, and you end up with notebooks that work but that nobody else on the team can maintain, because they were written by someone learning two things at once.

This course teaches Databricks for insurance pricing specifically. Every module starts with a real personal lines problem, uses realistic UK motor data, and ends with output that a pricing team can use.

Pricing teams making the move to Python

This is not an introductory course to Python or to insurance pricing. It assumes you price things for a living and want to do it properly on Databricks.

  • Pricing actuaries and analysts at UK personal lines insurers
  • Using Databricks now, or moving to it within the next twelve months
  • Already know GLMs: you understand what a log link is and why it is there
  • Can write basic Python: loops, functions, DataFrames
  • Tired of adapting generic tutorials to insurance problems and hoping for the best

Not every course is the right course

Better to be clear about this upfront than to have you work through Module 2 wondering why you do not have the background.

  • Complete beginners to Python (you need to know what a function is)
  • People new to insurance pricing who need the fundamentals first
  • Data scientists looking for a general ML course (this is for insurance)
  • Teams on a platform other than Databricks (all exercises use Databricks specifically)

Tangible outputs from every module

Module 01
A clean, reproducible pricing workspace on Databricks
Unity Catalog schema, Delta tables with synthetic motor data, MLflow experiment tracking, and an audit trail that ties model runs to data versions.
Module 02
A GLM frequency-severity model a traditional reviewer can follow
statsmodels GLM replicating Emblem's output, including offset terms, deviance statistics, one-way analysis, and a Radar-compatible factor table export.
Module 03
CatBoost models with proper insurance cross-validation
Poisson frequency and Gamma severity models with walk-forward CV, IBNR buffers, Optuna hyperparameter tuning, and a proper GBM-vs-GLM comparison.
Module 04
A SHAP relativity table formatted for a pricing committee
Multiplicative relativities extracted from a CatBoost GBM, in Excel format for review and CSV format for Radar, with proxy discrimination detection for Consumer Duty.
Module 05
Calibrated prediction intervals on individual risk estimates
Conformal prediction intervals with a finite-sample coverage guarantee, calibrated to your holdout data, roughly 30% narrower than the naive approach.
Module 06
Credibility-weighted relativities for thin-cell segments
Buhlmann-Straub credibility in Python, including NCD factor stabilisation and blending a new model with incumbent rates where the data is thin.
Module 07
A constrained rate optimisation with an FCA compliance check
Linear programming that hits a target loss ratio, respects movement caps, and has Consumer Duty PS21/5 constraints built in. Includes efficient frontier analysis and shadow price reporting.
Module 08
A complete motor pricing pipeline, ready to use as a template
From Delta ingestion to rate change recommendation, every component connected. Structured so you can swap in your own data and run it with minimal modification.

Eight modules. No filler.

01
Databricks for Pricing Teams
What Databricks actually is, not the marketing version. Set up a workspace for pricing, not for a generic data pipeline.
Available
  • Unity Catalog for pricing data: where to put tables, how to set retention for FCA audit
  • Cluster configuration that does not cost a fortune to leave running
  • Delta tables as a replacement for flat-file data passes between pricing and MI
  • MLflow experiment tracking from first principles: log parameters, metrics, and artefacts
  • An audit trail table that ties model runs to data versions
02
GLMs in Python: The Bridge from Emblem
How to replicate what Emblem does in Python, transparently. The same deviance statistics. The same factor tables. A workflow a traditional reviewer can follow.
Available
  • statsmodels GLMs with offset terms, variance functions, and IRLS - the same algorithm Emblem uses
  • One-way and two-way analysis, aliasing detection, and model comparison by deviance
  • The difference between statsmodels and sklearn's GLM implementation, and when it matters
  • Exporting factor tables to a format Radar can import directly
  • On clean data, output matches Emblem to four decimal places given identical encodings
03
GBMs for Insurance Pricing
CatBoost with Poisson, Gamma, and Tweedie objectives. Walk-forward cross-validation with IBNR buffers so you are not lying to yourself about out-of-sample performance.
Coming soon
  • CatBoost Poisson frequency and Gamma severity models with correct exposure handling
  • Why default hyperparameters from generic tutorials are wrong for insurance data
  • Walk-forward temporal cross-validation using insurance-cv, with IBNR buffer support
  • Optuna hyperparameter tuning and MLflow experiment tracking
  • Proper GBM-vs-GLM comparison: Gini, calibration curves, and double-lift charts
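The walk-forward idea can be sketched in a few lines of plain Python. This is a simplified stand-in to show the shape of the splits, not the insurance-cv API:

```python
def walk_forward_splits(months, n_folds=3, test_size=3, ibnr_buffer=6):
    """Yield (train_months, test_months) for walk-forward temporal CV.

    The most recent `ibnr_buffer` months before each test window are
    excluded from training, because their claims are not fully developed.
    """
    for fold in range(n_folds):
        test_end = len(months) - fold * test_size
        test_start = test_end - test_size
        train_end = test_start - ibnr_buffer   # gap covers undeveloped (IBNR) claims
        if train_end <= 0:
            break
        yield months[:train_end], months[test_start:test_end]

months = [f"{y}-{m:02d}" for y in (2022, 2023) for m in range(1, 13)]
splits = list(walk_forward_splits(months))
for train, test in splits:
    print(f"train to {train[-1]}, test {test[0]}..{test[-1]}")
```

Without the buffer, the most recent training months would contain artificially low claim counts, and the model would look better out of sample than it really is.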
04
SHAP Relativities
How to get a factor table out of a GBM. Mathematically sound, reviewable by a pricing actuary, submittable to the FCA, importable into Radar.
Available
  • SHAP values as a principled replacement for GLM relativities - not a heuristic
  • Aggregating raw SHAP values into multiplicative factor tables with confidence intervals
  • When SHAP relativities are honest and when they are misleading: interactions, correlated features
  • Proxy discrimination detection using SHAP for FCA Consumer Duty compliance
  • Excel and Radar-CSV exports using the open-source shap-relativities library
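The core move, exponentiating per-level mean SHAP values from a log-link model, can be sketched with numpy and pandas. The numbers below are invented, and this is an illustration of the idea rather than the shap-relativities API:

```python
import numpy as np
import pandas as pd

# Invented example: pretend `shap_vals` are SHAP values, on the log scale,
# for one rating factor from a log-link frequency GBM.
rng = np.random.default_rng(0)
levels = pd.Series(rng.choice(["A", "B", "C"], size=1000), name="level")
true_log_rel = {"A": 0.0, "B": 0.20, "C": 0.45}
shap_vals = levels.map(true_log_rel).to_numpy() + rng.normal(0, 0.02, 1000)

# Centre on the log scale so the average relativity is ~1, then exponentiate
# the per-level mean SHAP to get a multiplicative factor table.
centred = pd.Series(shap_vals - shap_vals.mean(), name="shap")
relativities = np.exp(centred.groupby(levels).mean())
print(relativities.round(3))
```

Because the model is additive on the log scale, sums of SHAP values become products of factors, which is exactly the structure a traditional rating table expects.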
05
Conformal Prediction Intervals
Prediction intervals with a finite-sample coverage guarantee that does not depend on distributional assumptions. Calibrated to your holdout data.
Coming soon
  • Why point estimates are not enough and why standard confidence intervals are the wrong answer
  • Conformal prediction: the theory and why the coverage guarantee is unconditional
  • Variance-weighted non-conformity scores for heteroscedastic insurance data (Manna et al., 2025)
  • Intervals roughly 30% narrower than the naive approach, with identical coverage
  • Using intervals to flag uncertain risks and set minimum premium floors
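The split conformal recipe itself is short. This sketch uses a constant point estimate and simulated severities purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustration only: a constant point estimate standing in for a fitted
# severity model, with simulated claim amounts.
y_cal = rng.gamma(2.0, 500.0, size=2000)    # calibration-set actual severities
pred_cal = np.full_like(y_cal, 1000.0)      # the "model's" point estimates

# Split conformal: take the adjusted (1 - alpha) quantile of absolute
# calibration residuals as the interval half-width. The adjustment
# ceil((n + 1)(1 - alpha)) / n is what yields the finite-sample guarantee.
alpha = 0.10
n = len(y_cal)
scores = np.abs(y_cal - pred_cal)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

pred_new = 1000.0
interval = (pred_new - q, pred_new + q)
print(f"90% interval half-width: {q:.0f}")
```

The course's variance-weighted scores replace the symmetric `scores` above with residuals scaled by a local spread estimate, which is where the narrower intervals come from.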
06
Credibility and Bayesian Pricing
You have 200 policies in a postcode area and 3 claims. Is 1.5% the true frequency, or noise? Buhlmann-Straub credibility and hierarchical Bayesian models for thin-cell segments.
Available
  • Buhlmann-Straub credibility in Python using the open-source credibility library
  • Its relationship to mixed models and partial pooling - when they agree and when they diverge
  • Practical applications: NCD factor stabilisation, blending a new model with incumbent rates
  • Full Bayesian hierarchical models via bayesian-pricing for segments where credibility is not enough
  • What credibility does not protect you from when exposure mix is shifting
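The blend itself is one formula. Here is a hand-rolled sketch tied to the 200-policy example above; the between-cell variance is an assumed number, and this is not the credibility library's API:

```python
# Illustrative Buhlmann-style blend for a thin postcode cell.
n_claims, exposure = 3, 200.0          # 3 claims on 200 policy-years
cell_freq = n_claims / exposure        # 1.5% observed frequency
portfolio_freq = 0.08                  # complement of credibility

# Credibility constant k = (within-cell variance) / (between-cell variance);
# with Poisson counts the within variance per unit exposure equals the mean.
between_var = 0.0004                   # assumed variance of true cell frequencies
k = portfolio_freq / between_var       # = 200 policy-years here
z = exposure / (exposure + k)          # credibility weight
blended = z * cell_freq + (1 - z) * portfolio_freq
print(f"Z = {z:.2f}, blended frequency = {blended:.3%}")
```

With these assumptions the cell gets 50% credibility, so the answer lands halfway between the noisy 1.5% and the portfolio's 8%, rather than at either extreme.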
07
Constrained Rate Optimisation
The module most courses do not have. Linear programming for rate changes that hit a target loss ratio, respect movement caps, and minimise cross-subsidy simultaneously.
Coming soon
  • The formal problem of finding which factors to move, by how much, subject to constraints
  • Linear programming formulation using rate-optimiser and scipy.optimize
  • The efficient frontier of achievable (loss ratio, volume) outcomes for a rate review cycle
  • Shadow price analysis: the marginal cost of tightening the LR target, quantified
  • FCA PS21/5 Consumer Duty compliance constraints built into the optimisation
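A toy version of the optimisation with scipy.optimize.linprog, using invented premiums, caps, and uplift target; the real formulation in rate-optimiser carries many more constraints:

```python
import numpy as np
from scipy.optimize import linprog

# Toy sketch: choose per-segment rate increases x_j that recover a target
# premium uplift while respecting per-segment movement caps.
premium = np.array([4.0e6, 2.5e6, 1.5e6])   # current premium per segment
caps = np.array([0.05, 0.10, 0.15])         # max allowed increase per segment
required_uplift = 0.42e6                    # premium needed to hit the LR target

res = linprog(
    c=np.ones(3),                           # minimise total rate movement
    A_ub=-premium.reshape(1, -1),           # -premium @ x <= -uplift
    b_ub=[-required_uplift],
    bounds=list(zip(np.zeros(3), caps)),
    method="highs",
)
x = res.x
# Shadow price of the uplift constraint: marginal movement per extra
# pound of uplift demanded (non-positive by scipy's sign convention).
shadow = res.ineqlin.marginals[0]
```

The solver loads the largest segments first, and the shadow price tells you exactly how much extra movement each additional pound of required uplift would cost.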
08
End-to-End Pipeline (Capstone)
Every component from Modules 1-7 connected into a working motor pricing pipeline. Not a demonstration: a template for a real project.
Coming soon
  • Raw data from Unity Catalog through to a rate change pack in a single reproducible pipeline
  • A feature transform layer defined as pure functions, versioned alongside the model
  • CatBoost frequency and severity models, SHAP relativities, conformal intervals, credibility blending, rate optimisation
  • Output tables written to Delta: relativities, rate change summary, efficient frontier data, model diagnostics
  • Designed to work with your own motor portfolio data with minimal modification

Built by practitioners who have done this at UK insurers

Open source
github.com/burningcost
14 open-source libraries, each solving one well-defined pricing problem. Over 600 tests. Every library used in the course was built by us.

Practitioner focus
UK personal lines
Written for people who already know what a GLM is. No generic data science padding. Every module covers a real pricing workflow problem.

Hands-on
Databricks notebooks
Executable Databricks notebooks with synthetic data that behaves like the real thing. Run the code, see the outputs, adapt to your own book.
The course teaches you to use these open-source libraries. We built all of them.
Temporal CV with IBNR buffers
SHAP values as multiplicative factor tables
Buhlmann-Straub credibility in Python
Distribution-free intervals for GBMs
Constrained rate change optimisation
Hierarchical Bayesian models for thin segments
GBM-to-GLM distillation for production

One product. One price. Everything included.

Buy once. Get the full course, every update, and access to all Burning Cost tools as they ship.

Payment processing is nearly ready. Join the waitlist and we will email you when it is live. Waitlist members get first access at the launch price.

Get notified when it launches

Leave your email and we will send you a link the day it goes live.

Common questions

Do I need a paid Databricks account?
No. All exercises are compatible with Databricks Free Edition. You do not need company approval or a budget to start.
How much Python do I need?
You should be able to read a function, understand a list comprehension, and follow a data pipeline. You do not need to be a software engineer. If you can write a basic script to load a CSV and filter rows, you have enough Python. The course introduces every library we use as we go.
Do I need to know GLMs before starting?
For most of the course, yes. You should know what a log link is, what the deviance statistic measures, and have built at least one frequency or severity model, even if it was in Emblem. Module 2 covers the Python implementation in detail but does not re-teach GLM theory from scratch.
What data does the course use?
Synthetic UK motor data throughout. 10,000 policies with realistic exposure, claim counts, development patterns, and rating factor structures. Not a Kaggle dataset. The data is generated to mirror the statistical properties of a real personal lines motor portfolio without using real customer data.
Why Polars instead of Pandas?
Polars is faster, has better null handling, and encourages a more explicit data pipeline style. On Databricks, it runs well alongside PySpark and does not require the Pandas-on-Spark shim. Pandas appears only at the boundary where statsmodels requires it. If you know Pandas, you will be able to follow Polars without difficulty.
Why CatBoost and not XGBoost or other GBM libraries?
CatBoost handles categorical features natively without ordinal encoding, which matters for insurance rating factors. It also has more stable default behaviour on small datasets. The Poisson, Gamma, and Tweedie objectives are well-tested and documented. That said, the concepts transfer directly to XGBoost or other gradient boosting libraries if your team is committed to a different tool.
What does "access to all Burning Cost tools" mean?
As we build new products — additional tools, dashboards, templates, or workflow utilities — course purchasers get access as part of the same one-time payment. We are building a suite of things for UK pricing teams. One payment gets you into all of it.
When will the remaining modules be published?
Modules 3, 5, 7, and 8 are in progress. We publish them as they are ready rather than waiting until the full course is complete. When you buy, you get immediate access to published modules and new ones as they land.
What if the content does not meet my expectations?
You can preview the tutorials for Modules 1, 2, 4, and 6 before buying. If you read those and feel the depth and style are not what you need, we would rather you saved the money. Email us and we will discuss it.

Stop spending months adapting the wrong tutorials

Eight modules written specifically for UK personal lines pricing teams. GLMs, GBMs, SHAP relativities, credibility, rate optimisation. The full workflow, done properly. One price, everything included.

Join the waitlist · Review the modules