Conformal prediction

Conformal prediction (CP) is an algorithm for uncertainty quantification that produces statistically valid prediction regions (multidimensional prediction intervals) for any underlying point predictor (whether statistical, machine learning, or deep learning), assuming only exchangeability of the data. CP works by computing "nonconformity scores" on previously labeled data and using these to create prediction sets for a new (unlabeled) test data point. A transductive version of CP was first proposed in 1998 by Gammerman, Vovk, and Vapnik; since then, several variants of conformal prediction have been developed with different computational complexities, formal guarantees, and practical applications.

Conformal prediction requires a user-specified significance level at which the algorithm should produce its predictions. This significance level bounds the frequency of errors that the algorithm is allowed to make, where an error means that the true value falls outside the predicted set. For example, a significance level of 0.1 means that the algorithm may err on at most 10% of its predictions. To meet this requirement, the output is a set prediction rather than the point prediction produced by standard supervised machine learning models. For classification tasks, this means that a prediction is not a single class, for example 'cat', but a set such as {'cat', 'dog'}. Depending on how good the underlying model is (how well it can discern between cats, dogs and other animals) and on the specified significance level, these sets can be smaller or larger. For regression tasks, the output is a prediction interval: a smaller significance level (fewer allowed errors) produces wider, less specific intervals, while a larger significance level (more allowed errors) produces tighter ones.

Conformal prediction first arose in a collaboration between Gammerman, Vovk, and Vapnik in 1998. This initial version used what are now called e-values, whereas the version best known today uses p-values and was proposed a year later by Saunders et al. Vovk, Gammerman, and their students and collaborators, particularly Craig Saunders, Harris Papadopoulos, and Kostas Proedrou, continued to develop the ideas of conformal prediction; major developments include the proposal of inductive conformal prediction (a.k.a. split conformal prediction) in 2002. A book on the topic was written by Vovk, Gammerman, and Shafer in 2005, and a tutorial was published in 2008.

The data has to satisfy certain assumptions, chiefly exchangeability (a slightly weaker assumption than the IID assumption commonly made in machine learning). For conformal prediction, an n% prediction region is said to be valid if the truth is in the output n% of the time. The efficiency is the size of the output. For classification, this size is the number of classes in the prediction set; for regression, it is the interval width.
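
Validity and efficiency can both be measured empirically on a labeled test set. The following is a minimal sketch, not taken from any particular library: the function names `coverage` and `average_set_size` and the toy data are assumptions made for illustration.

```python
def coverage(prediction_sets, true_labels):
    """Fraction of test examples whose true label lies in the predicted set."""
    hits = sum(y in s for s, y in zip(prediction_sets, true_labels))
    return hits / len(true_labels)


def average_set_size(prediction_sets):
    """Mean number of labels per predicted set (smaller means more efficient)."""
    return sum(len(s) for s in prediction_sets) / len(prediction_sets)


# Toy data (hypothetical): four predicted sets and the corresponding truths.
prediction_sets = [{"cat"}, {"cat", "dog"}, {"dog"}, {"bird", "cat"}]
true_labels = ["cat", "dog", "cat", "bird"]

print(coverage(prediction_sets, true_labels))   # 0.75 (third set misses "cat")
print(average_set_size(prediction_sets))        # 1.5
```

A predictor run at significance level 0.1 would be expected to reach a coverage of about 0.9 on exchangeable data; among valid predictors, the one with the smaller average set size is the more efficient.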

In its purest form, conformal prediction is designed for an online (transductive) setting. That is, after a label is predicted, its true label becomes known before the next prediction is made. Thus, the underlying model can be re-trained using this new data point, and the next prediction will be made on a calibration set containing n + 1 data points, where the previous model had n data points.
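
The online protocol described above can be sketched as a loop that predicts, observes the true label, and then adds the example to the data used for the next prediction. This is an illustrative sketch under stated assumptions: the nonconformity measure (distance of a one-dimensional feature to the mean of the examples sharing its label) and the names `nonconformity`, `p_value`, and `run_online` are choices made here, not part of the original formulation.

```python
def nonconformity(bag, i):
    """Assumed measure: distance from example i to the mean of its own class."""
    x, y = bag[i]
    same_label = [xj for xj, yj in bag if yj == y]
    return abs(x - sum(same_label) / len(same_label))


def p_value(history, x_new, candidate):
    """Transductive p-value: score every example with the candidate included."""
    bag = history + [(x_new, candidate)]
    scores = [nonconformity(bag, i) for i in range(len(bag))]
    return sum(s >= scores[-1] for s in scores) / len(bag)


def run_online(stream, labels, significance):
    history, predictions, errors = [], [], 0
    for x, y_true in stream:
        # Predict using only the examples whose labels were already revealed.
        pred = {y for y in labels if p_value(history, x, y) >= significance}
        predictions.append(pred)
        errors += y_true not in pred
        # The true label is revealed; the data set grows from n to n + 1.
        history.append((x, y_true))
    return predictions, errors


# Hypothetical stream of (feature, label) pairs from two separated classes.
stream = [(0, "a"), (10, "b"), (2, "a"), (12, "b"), (1, "a"), (11, "b")]
predictions, errors = run_online(stream, ("a", "b"), significance=0.3)
print(predictions, errors)
```

With so few examples the first predictions contain both labels, because no p-value can yet fall below the significance level; the sets shrink to a single label once enough data has accumulated.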

The goal of standard classification algorithms is to classify a test object into one of several discrete classes. Conformal classifiers instead compute and output a p-value for each available class by ranking the nonconformity measure (α-value) of the test object against those of examples from the training data set. As in standard hypothesis testing, the p-value together with a threshold (referred to as the significance level in the CP field) is used to determine whether the label should be in the prediction set. For example, for a significance level of 0.1, all classes with a p-value of 0.1 or greater are added to the prediction set. Transductive algorithms compute the nonconformity scores using all available training data, while inductive algorithms compute them on a held-out subset of the training set (the calibration set).
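
As a worked illustration of the ranking step, the sketch below computes p-values in the inductive style, from nonconformity scores that are assumed to have been computed already. The scores, the class names, and the helper names `p_value` and `prediction_set` are hypothetical; the "+ 1" terms count the test object itself, as is common in the inductive formulation.

```python
def p_value(calibration_scores, test_score):
    """Share of scores (test object included) at least as nonconforming."""
    n_at_least = sum(s >= test_score for s in calibration_scores)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_scores, test_scores, significance):
    """Keep every label whose p-value is at or above the significance level."""
    return {
        label
        for label, score in test_scores.items()
        if p_value(calibration_scores, score) >= significance
    }


# Hypothetical α-values of nine calibration examples.
calibration_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
# Hypothetical α-values of the test object under each candidate label.
test_scores = {"cat": 0.25, "dog": 0.85, "bird": 0.95}

# p-values: cat 8/10, dog 2/10, bird 1/10
print(prediction_set(calibration_scores, test_scores, 0.15))  # cat and dog
print(prediction_set(calibration_scores, test_scores, 0.25))  # cat only
```

Raising the significance level from 0.15 to 0.25 removes 'dog' from the set: more errors are tolerated, so the prediction becomes more specific.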

Inductive conformal prediction (ICP) was first known as inductive confidence machines, but was later re-introduced under its current name. It has gained popularity in practical settings because the underlying model does not need to be retrained for every new test example. This makes it attractive for models that are expensive to train, such as neural networks.
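
For regression, the inductive approach can be sketched as follows: the model is fitted once, absolute residuals are computed on a held-out calibration set, and a quantile of those residuals gives the half-width of every interval. This is a minimal sketch under assumptions: `predict` is a hypothetical stand-in for an already-trained model, absolute residuals are one common choice of nonconformity score, and the data are invented.

```python
import math


def predict(x):
    # Hypothetical stand-in for an already-trained point predictor.
    return 2.0 * x


def calibrate(calibration_set):
    """Sorted absolute residuals of the fixed model on held-out data."""
    return sorted(abs(y - predict(x)) for x, y in calibration_set)


def interval(sorted_residuals, x_new, significance):
    """Prediction interval for x_new; the model is never retrained."""
    n = len(sorted_residuals)
    k = math.ceil((n + 1) * (1 - significance))
    if k > n:
        # Too few calibration points to support this significance level.
        return (-math.inf, math.inf)
    half_width = sorted_residuals[k - 1]
    center = predict(x_new)
    return (center - half_width, center + half_width)


# Hypothetical calibration data whose residuals are 1, 2, ..., 9.
offsets = [1, -2, 3, -4, 5, -6, 7, -8, 9]
calibration_set = [(x, 2 * x + d) for x, d in zip(range(1, 10), offsets)]
residuals = calibrate(calibration_set)

print(interval(residuals, 10, 0.45))  # (14.0, 26.0)
print(interval(residuals, 10, 0.25))  # (12.0, 28.0)
print(interval(residuals, 10, 0.05))  # (-inf, inf)
```

The smaller significance level yields the wider interval, and with only nine calibration points a level of 0.05 cannot be supported by any finite interval.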

In Mondrian inductive conformal prediction (MICP), the alpha values are class-dependent (Mondrian) and the underlying model does not follow the original online setting introduced in 2005.
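
One way to read "class-dependent" is that the p-value for a candidate label is ranked only against calibration examples carrying that same label. The sketch below illustrates this reading; the helper names `mondrian_p_value` and `mondrian_prediction_set`, the scores, and the class names are assumptions for illustration.

```python
def mondrian_p_value(calibration, candidate, test_score):
    """Rank the test score against calibration scores of the same class only."""
    same_class = [s for s, y in calibration if y == candidate]
    n_at_least = sum(s >= test_score for s in same_class)
    return (n_at_least + 1) / (len(same_class) + 1)


def mondrian_prediction_set(calibration, test_scores, significance):
    return {
        label
        for label, score in test_scores.items()
        if mondrian_p_value(calibration, label, score) >= significance
    }


# Hypothetical (α-value, label) pairs: four "common" and two "rare" examples.
calibration = [
    (0.1, "common"), (0.2, "common"), (0.3, "common"), (0.4, "common"),
    (0.7, "rare"), (0.9, "rare"),
]
# Hypothetical α-values of the test object under each candidate label.
test_scores = {"common": 0.35, "rare": 0.8}

# p-values: common 2/5 (one of four scores >= 0.35), rare 2/3 (one of two >= 0.8)
print(mondrian_prediction_set(calibration, test_scores, 0.3))  # both labels
print(mondrian_prediction_set(calibration, test_scores, 0.5))  # rare only
```

Because each class is calibrated separately, a rare class with typically high scores is not penalized by being compared against the scores of a more common class.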
