Neural scaling law

In machine learning, a neural scaling law is an empirical scaling law that describes how neural network performance changes as key factors are scaled up or down. These factors typically include the number of parameters, training dataset size, and training cost. Some models also exhibit performance gains by scaling inference through increased test-time compute (TTC), extending neural scaling laws beyond training to the deployment phase.

In general, a deep learning model can be characterized by four quantities: model size, training dataset size, training cost, and the post-training error rate (e.g., the test set error rate). Each of these can be expressed as a real number, usually written as $N$, $D$, $C$, and $L$ (respectively: parameter count, dataset size, computing cost, and loss).

A neural scaling law is a theoretical or empirical statistical law relating these quantities. Other quantities can also be studied, with scaling laws of their own.
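As an illustration of what an empirical law between two of these quantities can look like, the sketch below fits a power law of the form $L = a N^{-\alpha}$ by least squares in log-log space. The power-law form is a common choice rather than something stated above, and the function name `fit_power_law`, the coefficient names `a` and `alpha`, and the data points are all hypothetical; the data is synthetic and generated from a known power law only to show that the fit recovers it.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**(-alpha), done as a straight line in log-log space.

    Returns (a, alpha). Assumes all xs and ys are strictly positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Ordinary least-squares slope and intercept for log(y) = intercept + slope * log(x).
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic (not measured) data: loss generated from L = 5.0 * N**(-0.1).
param_counts = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * n ** -0.1 for n in param_counts]

a, alpha = fit_power_law(param_counts, losses)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

With real measurements the points would not lie exactly on a line in log-log space, and the fitted exponent would only be meaningful over the range of sizes that was actually measured.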

In most cases, the model's size is simply the number of parameters. However, one complication arises with sparse models, such as mixture-of-experts models. A sparse model uses only a fraction of its parameters for any given input during inference. In comparison, most other kinds of neural networks, such as dense transformer models, use all of their parameters on every inference.
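The gap between total and active parameters in a sparse model can be made concrete with a little arithmetic. The sketch below is a deliberate simplification: it assumes a model made of some shared parameters plus a pool of equally sized experts, of which a fixed number are used per token, and it ignores router parameters and per-layer structure. The function name, argument names, and numbers are hypothetical.

```python
def moe_param_counts(shared_params, params_per_expert, num_experts, experts_per_token):
    """Return (total, active) parameter counts for a simplified mixture-of-experts model.

    total:  every parameter stored in the model.
    active: parameters actually used for a single token, assuming exactly
            `experts_per_token` experts are selected.
    """
    total = shared_params + num_experts * params_per_expert
    active = shared_params + experts_per_token * params_per_expert
    return total, active

# Hypothetical configuration: 1B shared parameters, 8 experts of 0.5B each, 2 used per token.
total, active = moe_param_counts(
    shared_params=1_000_000_000,
    params_per_expert=500_000_000,
    num_experts=8,
    experts_per_token=2,
)
print(f"total = {total:,}, active = {active:,}, fraction = {active / total:.2f}")
```

Under these assumptions, which of the two counts is used as "model size" changes the value of $N$ by a factor of 2.5, so a scaling law fitted to one is not directly comparable to a law fitted to the other.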

The size of the training dataset is usually quantified by the number of data points within it. Larger training datasets are typically preferred, as they provide a richer and more diverse source of information from which the model can learn. This can lead to improved generalization performance when the model is applied to new, unseen data. However, increasing the size of the training dataset also increases the computational resources and time required for model training.

With the "pretrain, then finetune" method used for most large language models, there are two kinds of training dataset: the pretraining dataset and the finetuning dataset. Their sizes have different effects on model performance. Generally, the finetuning dataset is less than 1% of the size of the pretraining dataset.

In some cases, a small amount of high quality data suffices for finetuning, and more data does not necessarily improve performance.

Training cost is typically measured in terms of time (how long it takes to train the model) and computational resources (how much processing power and memory are required). The cost of training can be significantly reduced with efficient training algorithms, optimized software libraries, and parallel computing on specialized hardware such as GPUs or TPUs.
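To show how the two measures of training cost relate, the sketch below converts a compute budget into wall-clock time. It relies on a widely used rule of thumb for dense transformer models, $C \approx 6ND$ floating-point operations, which is an assumption brought in here rather than something stated above. The sustained throughput figure is also an assumed placeholder, not a measurement of any particular hardware, and the function names are hypothetical.

```python
def training_flops(num_params, num_tokens):
    """Approximate training compute using the rule of thumb C ~ 6 * N * D (dense transformers)."""
    return 6 * num_params * num_tokens

def training_days(flops, sustained_flops_per_second, num_devices=1):
    """Convert a compute budget into days of wall-clock time, assuming perfect scaling across devices."""
    seconds = flops / (sustained_flops_per_second * num_devices)
    return seconds / 86_400  # seconds per day

# Hypothetical run: 1B parameters trained on 20B tokens.
flops = training_flops(num_params=1_000_000_000, num_tokens=20_000_000_000)

# Assumed sustained throughput of 1e14 FLOP/s per device (illustrative only).
days_one_device = training_days(flops, sustained_flops_per_second=1e14)
days_eight_devices = training_days(flops, sustained_flops_per_second=1e14, num_devices=8)

print(f"compute = {flops:.2e} FLOPs")
print(f"1 device: {days_one_device:.1f} days, 8 devices: {days_eight_devices:.1f} days")
```

The compute figure stays the same in both cases; only the time changes. This is why scaling laws are usually stated in terms of compute rather than hours, since hours depend on the hardware and software used.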
