Distribution-Free Finite-Sample Guarantees and Split Conformal Prediction

26 Oct 2022
Roel Hulsman
Figure 2
Abstract
Modern black-box predictive models are often accompanied by weak performance guarantees that hold only asymptotically in the size of the dataset or require strong parametric assumptions. In response, split conformal prediction offers a promising avenue to obtain finite-sample guarantees under minimal distribution-free assumptions. Although prediction set validity most often concerns marginal coverage, we explore the related but different guarantee of tolerance regions, reformulating known results in the language of nested prediction sets and expanding on the duality between marginal coverage and tolerance regions. Furthermore, we highlight the connection between split conformal prediction and classical tolerance predictors developed in the 1940s, as well as recent developments in distribution-free risk control. One result that transfers from classical tolerance predictors is that the coverage of a prediction set based on order statistics, conditional on the calibration set, is a random variable that stochastically dominates a Beta distribution. We demonstrate the empirical effectiveness of our findings on synthetic and real datasets using a popular split conformal prediction procedure called conformalized quantile regression (CQR).
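The order-statistic result mentioned in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the procedure from the thesis: it uses a hypothetical toy setting (responses drawn from a standard normal, a "model" that always predicts 0, and the absolute residual as the conformity score) instead of CQR, and the function names `conformal_threshold` and `simulate_conditional_coverage` are illustrative. With n calibration points and miscoverage level alpha, the threshold is the k-th smallest calibration score with k = ceil((n + 1)(1 - alpha)). For continuous scores, the coverage conditional on the calibration set is then known to follow a Beta(k, n + 1 - k) distribution, whose mean is k / (n + 1).

```python
import math

import numpy as np


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial set is valid.
        return math.inf
    return float(np.sort(cal_scores)[k - 1])


def simulate_conditional_coverage(n=100, alpha=0.1, trials=2000, seed=0):
    """Draw many calibration sets and compute the exact coverage of each.

    Assumed toy setting: Y ~ N(0, 1), prediction 0, score |Y - 0|.
    The prediction set is {y : |y| <= q}, so its coverage given the
    calibration set is P(|Y| <= q) = erf(q / sqrt(2)).
    """
    rng = np.random.default_rng(seed)
    coverages = np.empty(trials)
    for t in range(trials):
        cal_scores = np.abs(rng.standard_normal(n))
        q = conformal_threshold(cal_scores, alpha)
        coverages[t] = math.erf(q / math.sqrt(2))
    return coverages


n, alpha = 100, 0.1
coverages = simulate_conditional_coverage(n, alpha)
k = math.ceil((n + 1) * (1 - alpha))
beta_mean = k / (n + 1)  # mean of Beta(k, n + 1 - k)

print(f"target coverage 1 - alpha : {1 - alpha:.3f}")
print(f"Beta(k, n + 1 - k) mean   : {beta_mean:.3f}")
print(f"simulated mean coverage   : {coverages.mean():.3f}")
print(f"share of calibration sets with coverage below 1 - alpha: "
      f"{(coverages < 1 - alpha).mean():.3f}")
```

The last printed quantity shows why the distinction in the abstract matters: marginal coverage averages over calibration sets, so an individual calibration set can still yield coverage below 1 - alpha. A tolerance-region guarantee instead controls how often that happens.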
Publication
Master’s thesis, University of Oxford, Oxford, United Kingdom. Available at https://arxiv.org/pdf/2210.14735
Roel Hulsman
Authors
PhD Candidate in Causal Machine Learning

I am a second-year PhD candidate in causal machine learning at the Amsterdam Machine Learning Lab (AMLab), supervised by Sara Magliacane and Herke van Hoof. My PhD is funded by Adyen, a global financial technology company, where I spend a minor portion of my time. My research primarily focuses on causal methods for (nonstationary) time series, although I am broadly interested in the intersection of machine learning, statistics and econometrics, with a hint of philosophy.

I graduated with distinction from the University of Oxford with an MSc in Statistical Science. While at Oxford, I was fortunate to be supervised by Rob Cornish and Arnaud Doucet for my dissertation on the mathematical guarantees of conformal prediction. I also graduated from the University of Groningen with a BSc in Econometrics and Operations Research and a BA in Philosophy of a Specific Discipline (in my case the social sciences), both cum laude.

Before starting my PhD, I spent a short period at ASML as a data analyst for business intelligence, where I optimised business processes for the manufacturing of lithography systems. Afterwards, I moved to a role in AI for healthcare at the Joint Research Centre (JRC) in Italy, an independent research institute of the European Commission. There, I mainly worked on conformal risk control for pulmonary nodule detection and knowledge graph construction using Large Language Models (LLMs).