Empirical risk minimization

In statistical learning theory, the principle of empirical risk minimization defines a family of learning algorithms based on evaluating performance over a known and fixed dataset. The core idea is based on an application of the law of large numbers; more specifically, we cannot know exactly how well a predictive algorithm will work in practice (i.e. the "true risk") because we do not know the true distribution of the data, but we can instead estimate and optimize the performance of the algorithm on a known set of training data. The performance over the known set of training data is referred to as the "empirical risk".

The following situation is a general setting of many supervised learning problems. There are two spaces of objects $X$ and $Y$, and we would like to learn a function $h:X\to Y$ (often called a hypothesis) which outputs an object $y\in Y$ given $x\in X$. To do so, we have a training set of $n$ examples $(x_{1},y_{1}),\ldots ,(x_{n},y_{n})$, where $x_{i}\in X$ is an input and $y_{i}\in Y$ is the corresponding response that is desired from $h(x_{i})$.

To put it more formally, we assume that there is a joint probability distribution $P(x,y)$ over $X$ and $Y$, and that the training set consists of $n$ instances $(x_{1},y_{1}),\ldots ,(x_{n},y_{n})$ drawn i.i.d. from $P(x,y)$. The assumption of a joint probability distribution allows for the modelling of uncertainty in predictions (e.g. from noise in data) because $y$ is not a deterministic function of $x$, but rather a random variable with conditional distribution $P(y|x)$ for a fixed $x$.
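As a rough illustration of this setting, the sketch below draws an i.i.d. training set from a made-up joint distribution. The marginal of $x$ (uniform on $[0,1]$) and the conditional probabilities 0.9 and 0.2 are assumptions chosen for the example, not part of the theory; the point is only that the same region of inputs can produce different labels.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def sample_pair(rng):
    """Draw one (x, y) from a hypothetical joint distribution P(x, y)."""
    x = rng.uniform(0.0, 1.0)        # assumed marginal P(x): uniform on [0, 1]
    p_y1 = 0.9 if x > 0.5 else 0.2   # assumed conditional P(y = 1 | x)
    y = 1 if rng.random() < p_y1 else 0
    return x, y

# n = 1000 instances drawn independently from the same distribution
training_set = [sample_pair(rng) for _ in range(1000)]

# Labels observed for inputs above 0.5: mostly 1, but not always,
# because y is a random variable given x rather than a function of x.
labels_high = [y for x, y in training_set if x > 0.5]
frac_ones_high = sum(labels_high) / len(labels_high)
print(len(training_set), round(frac_ones_high, 2))
```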

It is also assumed that there is a non-negative real-valued loss function $L({\hat {y}},y)$ which measures how different the prediction ${\hat {y}}$ of a hypothesis is from the true outcome $y$. For classification tasks, these loss functions can be scoring rules. The risk associated with hypothesis $h$ is then defined as the expectation of the loss function:

$$R(h)=\mathbb {E} [L(h(x),y)]=\int L(h(x),y)\,dP(x,y).$$

A loss function commonly used in theory is the 0–1 loss function: $L({\hat {y}},y)={\begin{cases}1&{\text{ if }}\quad {\hat {y}}\neq y\\0&{\text{ if }}\quad {\hat {y}}=y\end{cases}}$.
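The 0–1 loss translates directly into code. The following is a minimal sketch; the function name `zero_one_loss` and the example labels are illustrative choices, not taken from any particular library.

```python
def zero_one_loss(y_hat, y):
    """Return 1 if the prediction differs from the true outcome, else 0."""
    return 0 if y_hat == y else 1

# Hypothetical predictions and true labels for four examples
predictions = ["cat", "dog", "dog", "cat"]
labels = ["cat", "dog", "cat", "cat"]

losses = [zero_one_loss(p, t) for p, t in zip(predictions, labels)]
print(losses)  # [0, 0, 1, 0]
```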

The ultimate goal of a learning algorithm is to find a hypothesis $h^{*}$ among a fixed class of functions ${\mathcal {H}}$ for which the risk $R(h)$ is minimal:

$$h^{*}={\underset {h\in {\mathcal {H}}}{\operatorname {arg\,min} }}\,R(h).$$

For classification problems, the Bayes classifier is defined to be the classifier minimizing the risk defined with the 0–1 loss function.
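When the joint distribution is known, which is never the case in practice, the Bayes classifier can be written down explicitly: under the 0–1 loss it predicts, for each $x$, the label with the largest conditional probability $P(y|x)$. The sketch below assumes a small made-up distribution over three inputs and two labels, and checks by brute force that no other classifier achieves lower risk.

```python
from itertools import product

# Hypothetical joint distribution P(x, y) on X = {0, 1, 2}, Y = {0, 1};
# the probability masses are invented for illustration and sum to 1.
P = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.05, (1, 1): 0.25,
    (2, 0): 0.12, (2, 1): 0.18,
}
X = [0, 1, 2]
Y = [0, 1]

def risk(h):
    """True risk under 0-1 loss: total probability mass where h[x] != y."""
    return sum(p for (x, y), p in P.items() if h[x] != y)

# For each x, pick the label with the largest joint mass P(x, y),
# which is the same label that maximizes P(y | x).
bayes = {x: max(Y, key=lambda y: P[(x, y)]) for x in X}
bayes_risk = risk(bayes)

# Brute force over all 2^3 classifiers from X to Y
all_risks = [risk(dict(zip(X, labels))) for labels in product(Y, repeat=len(X))]

print(bayes, round(bayes_risk, 2), round(min(all_risks), 2))
# {0: 0, 1: 1, 2: 1} 0.27 0.27
```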

In general, the risk $R(h)$ cannot be computed because the distribution $P(x,y)$ is unknown to the learning algorithm. However, given a sample of i.i.d. training data points, we can compute an estimate, called the empirical risk, by computing the average of the loss function over the training set; more formally, computing the expectation with respect to the empirical measure:

$$R_{\text{emp}}(h)={\frac {1}{n}}\sum _{i=1}^{n}L(h(x_{i}),y_{i}).$$
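The empirical risk is straightforward to compute, and minimizing it over a hypothesis class gives the procedure the article is named after. The sketch below assumes the 0–1 loss, a made-up training set of eight labelled scalars, and a hypothetical finite class of threshold classifiers; real hypothesis classes are usually far larger and are searched by optimization rather than enumeration.

```python
def zero_one_loss(y_hat, y):
    """0-1 loss: 1 for a wrong prediction, 0 for a correct one."""
    return 0 if y_hat == y else 1

def empirical_risk(h, data, loss=zero_one_loss):
    """Average loss of hypothesis h over the training pairs (x_i, y_i)."""
    return sum(loss(h(x), y) for x, y in data) / len(data)

# Hypothetical training set: scalar inputs with noisy binary labels
train = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 1),
         (2.5, 1), (3.0, 0), (3.5, 1), (4.0, 1)]

def make_threshold(t):
    """Hypothesis h_t(x) = 1 if x >= t, else 0."""
    return lambda x: 1 if x >= t else 0

# Assumed finite hypothesis class: one classifier per threshold
thresholds = [0.0, 1.0, 2.0, 3.0, 4.0]
risks = {t: empirical_risk(make_threshold(t), train) for t in thresholds}

# Empirical risk minimization: choose the hypothesis with the lowest average loss
best_t = min(risks, key=risks.get)
print(risks)   # {0.0: 0.5, 1.0: 0.375, 2.0: 0.125, 3.0: 0.375, 4.0: 0.375}
print(best_t)  # 2.0
```

The selected threshold still misclassifies one training point, since the labels are noisy and no threshold in the class separates them perfectly.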
