Gradient descent

Wikipedia

Gradient descent is a method for unconstrained mathematical optimization. It is a first-order iterative algorithm for minimizing a differentiable multivariate function.

The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent. Conversely, stepping in the direction of the gradient will lead to a trajectory that maximizes that function; the procedure is then known as gradient ascent. It is particularly useful in machine learning for minimizing the cost or loss function. Gradient descent should not be confused with local search algorithms, although both are iterative methods for optimization.
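As a small illustration of the descent and ascent directions described above, the following sketch takes one step against the gradient and one step along it from the same starting point. The test function `f`, its hand-derived gradient `grad_f`, the helper `step`, the starting point, and the step size are all assumptions chosen for the example, not part of the original text.

```python
# Hypothetical test function (an assumption for this sketch): f(x, y) = x^2 + 3*y^2.
def f(x, y):
    return x ** 2 + 3 * y ** 2


# Gradient of f, derived by hand: (df/dx, df/dy) = (2x, 6y).
def grad_f(x, y):
    return (2 * x, 6 * y)


def step(point, eta, direction):
    """Take one step of size eta from `point`.

    direction = -1 moves against the gradient (descent),
    direction = +1 moves along the gradient (ascent).
    """
    gx, gy = grad_f(*point)
    return (point[0] + direction * eta * gx, point[1] + direction * eta * gy)


start = (1.0, 1.0)
eta = 0.05  # illustrative step size, small enough for this function

down = step(start, eta, -1)  # one gradient descent step
up = step(start, eta, +1)    # one gradient ascent step

print("f at start:       ", f(*start))
print("f after descent:  ", f(*down))
print("f after ascent:   ", f(*up))
```

For this particular function and step size, the descent step lowers the function value and the ascent step raises it. With a step size that is too large, a step against the gradient can overshoot and increase the value instead.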

Gradient descent is generally attributed to Augustin-Louis Cauchy, who first suggested it in 1847. Jacques Hadamard independently proposed a similar method in 1907. Its convergence properties for non-linear optimization problems were first studied by Haskell Curry in 1944, with the method becoming increasingly well-studied and used in the following decades.

A simple extension of gradient descent, stochastic gradient descent, serves as the most basic algorithm used for training most deep networks today.
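To make the stochastic variant concrete, here is a minimal sketch, assuming a mean squared error loss on a one-dimensional linear model. Each update uses a gradient estimated from a small random batch of the data rather than from the full data set. The function name `sgd_linear_fit`, the synthetic noise-free data, and every hyperparameter value are hypothetical choices made for illustration.

```python
import random


def sgd_linear_fit(xs, ys, eta=0.1, batch_size=10, epochs=50, seed=0):
    """Fit y ~ w*x + b by minibatch stochastic gradient descent.

    The loss is the mean squared error over each minibatch; all default
    values are illustrative assumptions, not recommended settings.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the data in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                err = w * xs[i] + b - ys[i]
                # Gradient of mean((w*x + b - y)^2) over the batch.
                grad_w += 2.0 * err * xs[i] / len(batch)
                grad_b += 2.0 * err / len(batch)
            # Same update rule as gradient descent, with an estimated gradient.
            w -= eta * grad_w
            b -= eta * grad_b
    return w, b


# Synthetic, noise-free data from y = 3x + 1 (an assumption for this sketch).
data_rng = random.Random(1)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [3.0 * x + 1.0 for x in xs]

w, b = sgd_linear_fit(xs, ys)
print("estimated slope and intercept:", w, b)
```

Because the data here contain no noise, the estimates approach the generating slope and intercept closely. With noisy data and a constant step size, the iterates would typically fluctuate around the minimizer rather than settle on it.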

Gradient descent is based on the observation that if the multivariable function $f(\mathbf{x})$ is defined and differentiable in a neighborhood of a point $\mathbf{a}$, then $f(\mathbf{x})$ decreases fastest if one goes from $\mathbf{a}$ in the direction of the negative gradient of $f$ at $\mathbf{a}$, namely $-\nabla f(\mathbf{a})$. It follows that, if

$$\mathbf{a}_{n+1} = \mathbf{a}_{n} - \eta \nabla f(\mathbf{a}_{n})$$

for a small enough step size or learning rate $\eta \in \mathbb{R}_{+}$, then $f(\mathbf{a}_{n}) \geq f(\mathbf{a}_{n+1})$. In other words, the term $\eta \nabla f(\mathbf{a})$ is subtracted from $\mathbf{a}$ because we want to move against the gradient, toward the local minimum. With this observation in mind, one starts with a guess $\mathbf{x}_{0}$ for a local minimum of $f$, and considers the sequence $\mathbf{x}_{0}, \mathbf{x}_{1}, \mathbf{x}_{2}, \ldots$ such that

$$\mathbf{x}_{n+1} = \mathbf{x}_{n} - \eta_{n} \nabla f(\mathbf{x}_{n}), \quad n \geq 0.$$

We have a monotonic sequence

$$f(\mathbf{x}_{0}) \geq f(\mathbf{x}_{1}) \geq f(\mathbf{x}_{2}) \geq \cdots,$$

so, under suitable assumptions on $f$ and on the step sizes, the sequence $(\mathbf{x}_{n})$ converges to the desired local minimum. Note that the value of the step size is allowed to change at every iteration, which is why it is written $\eta_{n}$ in the update rule.
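The iteration described above, in which each new point is the previous point minus the step size times the gradient at that point, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `gradient_descent`, the `eta_schedule` callback (which returns the step size for iteration `n`, so it may change between iterations), and the quadratic test function are hypothetical and chosen only for the example.

```python
def gradient_descent(grad, x0, eta_schedule, num_steps):
    """Return the list of iterates x_0, x_1, ..., x_{num_steps}.

    grad          -- function mapping a point (tuple) to its gradient (tuple)
    x0            -- initial guess
    eta_schedule  -- function n -> step size used at iteration n
    num_steps     -- number of updates to perform
    """
    xs = [tuple(x0)]
    for n in range(num_steps):
        x = xs[-1]
        g = grad(x)
        eta = eta_schedule(n)
        # Update rule: x_{n+1} = x_n - eta_n * grad f(x_n)
        xs.append(tuple(xi - eta * gi for xi, gi in zip(x, g)))
    return xs


# Hypothetical test function with its minimum at (1, -2).
def f(x):
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2


# Gradient of f, derived by hand.
def grad_f(x):
    return (2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0))


iterates = gradient_descent(
    grad_f,
    x0=(0.0, 0.0),
    eta_schedule=lambda n: 0.1,  # constant step size, an illustrative choice
    num_steps=60,
)

values = [f(x) for x in iterates]
# Check that the function values never increase from one iterate to the next.
is_monotone = all(a >= b for a, b in zip(values, values[1:]))

print("final iterate:", iterates[-1])
print("function values non-increasing:", is_monotone)
```

The constant step size 0.1 is small enough for this quadratic that the function values decrease at every step. A decaying schedule such as `lambda n: 0.2 / (1 + n)` could be passed instead to see the effect of a step size that changes per iteration.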
