Module 8: End-to-End Pricing Pipeline
Part of: Modern Insurance Pricing with Python and Databricks
The one that ties it together
Modules 1 through 7 each cover a distinct part of the pricing workflow. Delta tables. GLMs. Temporal cross-validation. SHAP relativities. Conformal uncertainty. Credibility blending. Rate optimisation. Each works in isolation. None of them, alone, is a pricing pipeline.
This module builds the pipeline.
We go from raw data in Unity Catalog to a rate change recommendation ready for a pricing committee. Every component from the earlier modules appears: the feature transform layer, the CatBoost frequency-severity models, walk-forward CV, SHAP-derived relativities, conformal intervals, Buhlmann-Straub credibility blending, and the constrained rate optimiser with FCA compliance built in.
The notebook you produce at the end of this module is not a demonstration. It is a working template for a real motor pricing project. We have structured it so that you can swap out the synthetic data for your own portfolio and run the pipeline with minimal modification.
What you will build
- A Unity Catalog schema with Delta tables for raw data, feature views, model outputs, and the rate change pack
- A reproducible feature transform layer: all transforms defined as pure functions, versioned alongside the model
- Walk-forward cross-validation using `insurance-cv` with IBNR buffers, to validate frequency and severity models on temporally correct splits
- Separate CatBoost Poisson (frequency) and CatBoost Gamma (severity) models, hyperparameter-tuned with Optuna and tracked in MLflow
- SHAP relativities extracted with `shap-relativities`, one table per rating factor
- Conformal prediction intervals on pure premium using `insurance-conformal`, calibrated on a held-out conformity set
- Credibility-blended relativities using `credibility` (Buhlmann-Straub), balancing model-derived relativities against incumbent rates
- A constrained rate optimisation using `rate-optimiser` with the FCA PS21/5 ENBP constraint and the efficient frontier
- A final output pack written to Delta tables: relativities table, rate change summary, frontier data, model diagnostics
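A minimal sketch of walk-forward splitting with an IBNR buffer, in plain Python. The function `walk_forward_splits` and its parameters are hypothetical and are not the `insurance-cv` API. It assumes each row carries a sortable accident-period index (e.g. a month number) and that an expanding training window is wanted. The buffer removes the periods immediately before each test window from training, mimicking the fact that, at fitting time, the most recent months' claims are not yet fully reported.

```python
def walk_forward_splits(periods, n_folds, test_size, ibnr_buffer):
    """Expanding-window walk-forward splits with an IBNR gap.

    periods: accident period for each row (any sortable type).
    Returns a list of (train_rows, test_rows) tuples, oldest fold first.
    """
    unique = sorted(set(periods))
    splits = []
    for k in range(n_folds, 0, -1):
        # Test window of test_size periods; the last fold ends at the latest period
        test_start = len(unique) - k * test_size
        # Exclude the ibnr_buffer periods just before the test window from training
        train_end = test_start - ibnr_buffer
        if train_end < 1:
            continue  # not enough history for this fold
        train_periods = set(unique[:train_end])
        test_periods = set(unique[test_start:test_start + test_size])
        train_rows = [i for i, p in enumerate(periods) if p in train_periods]
        test_rows = [i for i, p in enumerate(periods) if p in test_periods]
        splits.append((train_rows, test_rows))
    return splits


# Usage: 12 months of data, two rows per month, three 2-month test folds, 1-month buffer
periods = [m for m in range(1, 13) for _ in range(2)]
splits = walk_forward_splits(periods, n_folds=3, test_size=2, ibnr_buffer=1)
```

Every training row is strictly older than every test row, and the gap between them is at least the buffer, so no fold is validated on data that would have been unavailable, or immature, when the model was fitted.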
Prerequisites
- Completed Modules 1-7, or comfortable with: GLM/GBM pricing, Delta tables and Unity Catalog, SHAP for tree models, credibility theory, and basic constrained optimisation
- Comfortable with Python: you will be reading and adapting code, not writing it from scratch
- A Databricks workspace with Unity Catalog enabled. Databricks Free Edition works for everything here
If you have not done the earlier modules, the tutorial explains every component as it appears. You will not be lost. But the earlier modules give you the why behind each decision; this one concentrates on the how of connecting them.
Estimated time
6-8 hours for the full tutorial and exercises. Most of that is reading and understanding the pipeline structure. Once you have run the code once, the full pipeline takes around 12-15 minutes to execute end to end on a single-node cluster, with walk-forward CV the slowest component.
Files
| File | Purpose |
|---|---|
| `tutorial.md` | Main written tutorial - read this first |
| `notebook.py` | Databricks notebook - the complete pipeline in runnable form |
| `exercises.md` | Four exercises extending the pipeline |
Libraries
All from Burning Cost open-source repositories:
```bash
# Local development
uv add insurance-cv shap-relativities insurance-conformal credibility rate-optimiser catboost polars optuna mlflow
```
On Databricks, the first notebook cell:
```python
%pip install \
  "insurance-cv @ git+https://github.com/burningcost/insurance-cv.git" \
  "shap-relativities[catboost] @ git+https://github.com/burningcost/shap-relativities.git" \
  "insurance-conformal[catboost] @ git+https://github.com/burningcost/insurance-conformal.git" \
  "credibility @ git+https://github.com/burningcost/credibility.git" \
  "rate-optimiser @ git+https://github.com/burningcost/rate-optimiser.git" \
  catboost polars optuna mlflow --quiet
```
What you will be able to do after this module
- Build a complete motor pricing pipeline that a real pricing team could inherit and run
- Explain every decision in the pipeline: why walk-forward CV, why separate frequency and severity, why Buhlmann-Straub for credibility, why SLSQP for the optimisation
- Structure a Databricks notebook so that a colleague who did not write it can understand what each stage does and why
- Write a rate change pack to Delta tables with a full audit trail: what the model recommended, how confident we are, and what the optimiser decided
- Adapt the template for a different line of business or a different model type
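To make the "why SLSQP" question concrete, here is a toy sketch of the kind of problem the optimisation stage solves, using SciPy's SLSQP. It chooses a rate multiplier per renewal segment to maximise expected profit, subject to the ENBP constraint (renewal premium no higher than the equivalent new business price) and a minimum retained-volume constraint. All segment figures and the exponential retention model are hypothetical, and this is not the `rate-optimiser` API. SLSQP suits the problem because it is smooth, low-dimensional and has nonlinear inequality constraints.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical renewal segments (all figures illustrative)
policies = np.array([1000.0, 800.0, 500.0])
current_premium = np.array([400.0, 550.0, 900.0])
expected_loss = np.array([330.0, 500.0, 700.0])
enbp = np.array([430.0, 580.0, 920.0])        # equivalent new business price
base_retention = np.array([0.85, 0.80, 0.75])
elasticity = np.array([1.5, 2.0, 2.5])        # assumed retention sensitivity


def retention(m):
    # Toy demand model: retention decays exponentially with the rate change
    return base_retention * np.exp(-elasticity * (m - 1.0))


def expected_profit(m):
    return float(np.sum(policies * retention(m) * (m * current_premium - expected_loss)))


def optimise(min_retained_share):
    constraints = [
        # ENBP: renewal premium must not exceed the equivalent new business price
        {"type": "ineq", "fun": lambda m: enbp - m * current_premium},
        # Volume: expected retained policies at least a share of the current book
        {"type": "ineq",
         "fun": lambda m: np.sum(policies * retention(m)) - min_retained_share * policies.sum()},
    ]
    return minimize(
        lambda m: -expected_profit(m),            # SLSQP minimises, so negate profit
        x0=np.ones(len(policies)),                # start from "no rate change"
        method="SLSQP",
        bounds=[(0.90, 1.15)] * len(policies),    # per-segment rate change caps
        constraints=constraints,
    )


# Sweep the volume constraint to trace a simple profit-versus-retention frontier
frontier = []
for share in (0.70, 0.75, 0.80):
    res = optimise(share)
    frontier.append((share, -res.fun, res.x))
```

Each point on the frontier answers "how much profit do we give up to keep this much of the book?", which is the trade-off a pricing committee is actually choosing between.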
What makes this different from a demo
Most end-to-end pricing examples online cut corners. They use a flat DataFrame throughout with no data layer. They skip cross-validation. They have one model, not frequency and severity separately. They do not produce uncertainty intervals. They have no credibility step. And they hand-wave the rate change decision.
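Uncertainty intervals do not need to be complicated. Below is a minimal split-conformal sketch, assuming a held-out calibration set of observed and predicted pure premiums. It uses the relative absolute residual |y - ŷ| / ŷ as the nonconformity score, so interval width scales with the prediction; `insurance-conformal` may use a different score, and the function names here are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    # Small epsilon guards against float error pushing an exact integer up by one
    k = max(1, math.ceil((n + 1) * (1 - alpha) - 1e-9))
    if k > n:
        return math.inf  # calibration set too small for this alpha
    return sorted(scores)[k - 1]


def calibrate(y_cal, pred_cal, alpha=0.1):
    """Nonconformity score: absolute residual relative to the predicted pure premium."""
    scores = [abs(y - p) / p for y, p in zip(y_cal, pred_cal)]
    return conformal_quantile(scores, alpha)


def pure_premium_interval(pred, q):
    """Invert the relative score; a pure premium cannot be negative, so clip at zero."""
    return max(0.0, pred * (1 - q)), pred * (1 + q)


# Usage: calibrate on held-out data, then wrap any new prediction
q = calibrate([110.0, 90.0, 120.0, 80.0], [100.0] * 4, alpha=0.2)
low, high = pure_premium_interval(200.0, q)
```

If calibration and test points are exchangeable, intervals built this way contain the true value with probability at least 1 - alpha, marginally over policies. That guarantee holds whatever model produced the predictions, which is why the calibration set is held out rather than reused for training.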
This pipeline does not cut those corners. Each stage corresponds to a real decision a pricing team makes. The output is something you could present to an underwriting director and defend.
Pricing context
This module is the full pricing workflow, not just the modelling piece. The question it answers is: given the data we have, what rate action should we take, and how confident should we be in that recommendation? That requires modelling, uncertainty quantification, credibility judgement, and formal optimisation. All four are here.
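The credibility judgement can be made explicit with the Buhlmann-Straub estimator. The sketch below uses the standard non-parametric estimators for the expected process variance and the variance of hypothetical means. The function `buhlmann_straub` is hypothetical, not the `credibility` library's API. It assumes `x[i][j]` is an observed quantity (e.g. a frequency or relativity) for group i in period j with exposure `w[i][j]`, and that at least one group has more than one period. By default the complement is the exposure-weighted collective mean (a simplification of the textbook credibility-weighted mean); passing incumbent relativities as `complement` is one assumed way to express the model-versus-incumbent blend.

```python
def buhlmann_straub(x, w, complement=None):
    """Return (credibility factors Z_i, blended estimates) per group."""
    r = len(x)
    w_i = [sum(wi) for wi in w]
    xbar_i = [sum(wij * xij for wij, xij in zip(wi, xi)) / wt
              for xi, wi, wt in zip(x, w, w_i)]
    w_tot = sum(w_i)
    xbar = sum(a * b for a, b in zip(w_i, xbar_i)) / w_tot

    # Expected process variance: exposure-weighted within-group variation
    num = sum(wij * (xij - xb) ** 2
              for xi, wi, xb in zip(x, w, xbar_i)
              for xij, wij in zip(xi, wi))
    s2 = num / sum(len(xi) - 1 for xi in x)

    # Variance of hypothetical means: between-group variation net of noise
    c = w_tot - sum(a * a for a in w_i) / w_tot
    a_hat = (sum(wi * (xb - xbar) ** 2 for wi, xb in zip(w_i, xbar_i)) - (r - 1) * s2) / c

    if a_hat <= 0:
        z = [0.0] * r  # no detectable between-group signal: full weight on complement
    else:
        k = s2 / a_hat  # Buhlmann k = EPV / VHM
        z = [wi / (wi + k) for wi in w_i]

    comp = complement if complement is not None else [xbar] * r
    blended = [zi * xb + (1 - zi) * ci for zi, xb, ci in zip(z, xbar_i, comp)]
    return z, blended


# Usage: two groups, three periods each, equal exposure
z, blended = buhlmann_straub([[1.0, 1.2, 0.8], [2.0, 2.2, 1.8]],
                             [[10, 10, 10], [10, 10, 10]])
```

Groups with more exposure, or a portfolio with more genuine between-group variation relative to noise, earn a higher Z and so move further from the complement.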
Part of the MVP bundle
Module 8 is included in the £395 full course bundle. Individual module: £99.