Double descent

Double descent in statistics and machine learning is the phenomenon where a model with a small number of parameters and a model with an extremely large number of parameters can both have a small test error, while a model whose number of parameters is about the same as the number of data points used to train it has a much greater test error than one with a much larger number of parameters. This phenomenon has been considered surprising, as it contradicts assumptions about overfitting in classical machine learning.

Early observations of what would later be called double descent in specific models date back to 1989.

The term "double descent" was coined by Belkin et al. in 2019, when the phenomenon gained popularity as a broader concept exhibited by many models. The latter development was prompted by a perceived contradiction between the conventional wisdom that too many parameters in a model result in significant overfitting error (an extrapolation of the bias–variance tradeoff), and the empirical observations in the 2010s that some modern machine learning techniques tend to perform better with larger models.

Double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise.
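The linear regression setting can be illustrated with a small simulation. The following is a minimal sketch, not a reproduction of any published experiment: the function name `min_norm_test_error`, the sample size, the noise level, and the choice of the minimum-norm least-squares estimator (via the pseudoinverse) are assumptions made for illustration. It measures the excess test error for several parameter counts p at a fixed number of training points n.

```python
import numpy as np

def min_norm_test_error(n, p, sigma=0.5, trials=200, seed=0):
    """Average excess test error of (minimum-norm) least squares.

    Hypothetical helper for illustration; all settings are assumptions.
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        beta = rng.standard_normal(p)
        beta /= np.linalg.norm(beta)               # fixed signal strength ||beta|| = 1
        X = rng.standard_normal((n, p))            # isotropic Gaussian covariates
        y = X @ beta + sigma * rng.standard_normal(n)  # isotropic Gaussian noise
        # Ordinary least squares when p <= n; minimum-norm interpolant when p > n.
        beta_hat = np.linalg.pinv(X) @ y
        # With isotropic covariates, the expected squared prediction error on a
        # fresh point is ||beta_hat - beta||^2 plus the irreducible noise sigma^2.
        errors.append(np.sum((beta_hat - beta) ** 2))
    return float(np.mean(errors))

n = 40
curve = {p: min_norm_test_error(n, p) for p in (10, 20, 40, 80, 160)}
for p, err in curve.items():
    print(f"p = {p:4d}  excess test error = {err:.3f}")
```

In this sketch the error should spike when p is close to n and fall again once p is well above n, which is the peak that separates the two regimes.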

A model of double descent at the thermodynamic limit has been analyzed using the replica trick, and the result has been confirmed numerically.

A number of works have suggested that double descent can be explained using the concept of effective dimension: While a network may have a large number of parameters, in practice only a subset of those parameters are relevant for generalization performance, as measured by the local Hessian curvature. This explanation is formalized through PAC-Bayes compression-based generalization bounds, which show that less complex models are expected to generalize better under a Solomonoff prior.
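One way to make the idea of effective dimension concrete is to count how many Hessian eigenvalues are large relative to a regularization scale. The sketch below assumes the definition N_eff(H, z) = Σ λ_i / (λ_i + z), which is one definition used in the literature and is not stated in the text above; the function name `effective_dimension`, the value of `z`, and the toy linear-regression Hessian are likewise illustrative assumptions.

```python
import numpy as np

def effective_dimension(hessian, z=1.0):
    """Assumed definition: sum_i lambda_i / (lambda_i + z) over Hessian eigenvalues."""
    eigvals = np.linalg.eigvalsh(hessian)      # eigenvalues of a symmetric matrix
    eigvals = np.clip(eigvals, 0.0, None)      # drop tiny negative round-off values
    return float(np.sum(eigvals / (eigvals + z)))

# Toy example: squared loss 0.5 * ||X w - y||^2 has Hessian X^T X.
# With 100 parameters but only 20 data points, the Hessian has rank at most 20.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))
H = X.T @ X
n_eff = effective_dimension(H, z=1.0)
print(f"parameters = {H.shape[0]}, effective dimension = {n_eff:.2f}")
```

Here the model has 100 parameters, yet the effective dimension cannot exceed 20, because directions with zero curvature contribute nothing to the sum.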

The scaling behavior of double descent has been found to follow a broken neural scaling law functional form.
