Module 8: End-to-End Pricing Pipeline

Part of: Modern Insurance Pricing with Python and Databricks


The one that ties it together

Modules 1 through 7 each cover a distinct part of the pricing workflow. Delta tables. GLMs. Temporal cross-validation. SHAP relativities. Conformal uncertainty. Credibility blending. Rate optimisation. Each works in isolation. None of them, alone, is a pricing pipeline.

This module builds the pipeline.

We go from raw data in Unity Catalog to a rate change recommendation ready for a pricing committee. Every component from the earlier modules appears: the feature transform layer, the CatBoost frequency-severity models, walk-forward CV, SHAP-derived relativities, conformal intervals, Bühlmann-Straub credibility blending, and the constrained rate optimiser with FCA compliance built in.
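To make the shape of the pipeline concrete before you open the notebook, here is a minimal skeleton of how the stages chain together. The function names and numbers are illustrative stand-ins, not the course libraries' real APIs; only the structure (features → frequency × severity → credibility blend) mirrors the pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def load_features(n=1000):
    # Stand-in for the Unity Catalog feature transform layer
    return rng.uniform(0, 1, size=(n, 3))

def predict_frequency(X):
    # Stand-in for the CatBoost frequency model (claims per policy-year)
    return 0.05 + 0.1 * X[:, 0]

def predict_severity(X):
    # Stand-in for the CatBoost severity model (average cost per claim)
    return 2000 + 3000 * X[:, 1]

def credibility_blend(model_rate, observed_rate, n, k=500):
    # Buhlmann-Straub-style blend: Z = n / (n + k)
    z = n / (n + k)
    return z * observed_rate + (1 - z) * model_rate

X = load_features()
pure_premium = predict_frequency(X) * predict_severity(X)  # frequency x severity
blended = credibility_blend(pure_premium.mean(), observed_rate=450.0, n=1000)
```

The real pipeline replaces each stub with a fitted model and adds cross-validation, uncertainty, and optimisation around this core; the point here is only that each stage consumes the previous stage's output.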

The notebook you produce at the end of this module is not a demonstration. It is a working template for a real motor pricing project. We have structured it so that you can swap out the synthetic data for your own portfolio and run the pipeline with minimal modification.


What you will build


Prerequisites

If you have not done the earlier modules, the tutorial explains every component as it appears. You will not be lost. But the earlier modules give you the why behind each decision; this one concentrates on the how of connecting them.


Estimated time

6-8 hours for the full tutorial and exercises. Most of that is reading and understanding the pipeline structure. The code itself - once you have run it once - takes around 12-15 minutes to execute end to end on a single-node cluster. The walk-forward CV is the slowest component.


Files

File          Purpose
tutorial.md   Main written tutorial - read this first
notebook.py   Databricks notebook - the complete pipeline in runnable form
exercises.md  Four exercises extending the pipeline

Libraries

All from Burning Cost open-source repositories:

# Local development
uv add insurance-cv shap-relativities insurance-conformal credibility rate-optimiser catboost polars optuna mlflow

On Databricks, the first notebook cell:

%pip install \
  "insurance-cv @ git+https://github.com/burningcost/insurance-cv.git" \
  "shap-relativities[catboost] @ git+https://github.com/burningcost/shap-relativities.git" \
  "insurance-conformal[catboost] @ git+https://github.com/burningcost/insurance-conformal.git" \
  "credibility @ git+https://github.com/burningcost/credibility.git" \
  "rate-optimiser @ git+https://github.com/burningcost/rate-optimiser.git" \
  catboost polars optuna mlflow --quiet

What you will be able to do after this module


What makes this different from a demo

Most end-to-end pricing examples online cut corners. They use a flat DataFrame throughout with no data layer. They skip cross-validation. They fit a single model rather than separate frequency and severity models. They do not produce uncertainty intervals. They have no credibility step. And they hand-wave the rate change decision.

This pipeline does not cut those corners. Each stage corresponds to a real decision a pricing team makes. The output is something you could present to an underwriting director and defend.
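As a taste of the uncertainty-interval corner in particular, here is a minimal split-conformal sketch using numpy only. It is a toy illustration of the technique, not the insurance-conformal library's API, and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Split conformal: hold out a calibration set, score its residuals,
# and use a residual quantile as a distribution-free interval width.
y_cal = rng.gamma(2.0, 1500.0, size=200)   # calibration severities
pred_cal = np.full(200, y_cal.mean())      # toy point predictions
scores = np.abs(y_cal - pred_cal)          # nonconformity scores
q = np.quantile(scores, 0.9)               # roughly 90% coverage

new_pred = 3000.0
interval = (new_pred - q, new_pred + q)    # interval for a new prediction
```

The module's pipeline does the same thing with a proper calibration split and the finite-sample quantile correction; the two-line core of the idea is exactly what you see here.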


Pricing context

This module is the full pricing workflow, not just the modelling piece. The question it answers is: given the data we have, what rate action should we take, and how confident should we be in that recommendation? That requires modelling, uncertainty quantification, credibility judgement, and formal optimisation. All four are here.


Part of the MVP bundle

Module 8 is included in the £395 full course bundle. Individual module: £99.