Backpropagation through time

Backpropagation through time (BPTT) is a gradient-based technique for training certain types of recurrent neural networks, such as Elman networks. The algorithm was independently derived by numerous researchers.

The training data for a recurrent neural network is an ordered sequence of $k$ input-output pairs, $\langle \mathbf{a}_{0},\mathbf{y}_{0}\rangle, \langle \mathbf{a}_{1},\mathbf{y}_{1}\rangle, \langle \mathbf{a}_{2},\mathbf{y}_{2}\rangle, \ldots, \langle \mathbf{a}_{k-1},\mathbf{y}_{k-1}\rangle$. An initial value must be specified for the hidden state $\mathbf{x}_{0}$, typically chosen to be a zero vector.

BPTT begins by unfolding a recurrent neural network in time. The unfolded network contains $k$ inputs and outputs, but every copy of the network shares the same parameters. Then, the backpropagation algorithm is used to find the gradient of the loss function with respect to all the network parameters.
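The unfolding step can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: a scalar hidden state, a recurrent layer `tanh(w*x + u*a)`, a linear output `v*x`, a squared-error cost averaged over the time steps, and the hypothetical names `forward`, `loss` and `bptt_gradient`. The point is only to show that each unfolded copy reuses the same parameters and that the backward pass adds up the gradient contributions of all copies.

```python
import math

def forward(params, inputs, x0=0.0):
    """Unfold the recurrent layer over the sequence; every step reuses params."""
    w, u, v = params
    states = [x0]                                  # states[t] is the hidden state before step t
    preds = []
    for a in inputs:
        x = math.tanh(w * states[-1] + u * a)      # recurrent layer (assumed form)
        states.append(x)
        preds.append(v * x)                        # output layer (assumed linear)
    return states, preds

def loss(params, inputs, targets):
    """Average of the per-step squared-error costs."""
    _, preds = forward(params, inputs)
    return sum(0.5 * (p - y) ** 2 for p, y in zip(preds, targets)) / len(inputs)

def bptt_gradient(params, inputs, targets):
    """Backpropagate through the unfolded network, summing over shared weights."""
    w, u, v = params
    k = len(inputs)
    states, preds = forward(params, inputs)
    gw = gu = gv = 0.0
    dx_next = 0.0                                  # gradient arriving from later time steps
    for t in reversed(range(k)):
        x_prev, x = states[t], states[t + 1]
        dp = (preds[t] - targets[t]) / k           # d(loss) / d(prediction at step t)
        gv += dp * x
        dx = dp * v + dx_next                      # total gradient w.r.t. this hidden state
        dz = dx * (1.0 - x * x)                    # back through tanh
        gw += dz * x_prev                          # contributions of each copy are summed
        gu += dz * inputs[t]
        dx_next = dz * w                           # hand the gradient to the previous copy
    return gw, gu, gv
```

A quick way to gain confidence in a sketch like this is to compare `bptt_gradient` against central finite differences of `loss` on a short sequence; the two should agree to within numerical error.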

Consider an example of a neural network that contains a recurrent layer $f$ and a feedforward layer $g$. There are different ways to define the training cost, but the aggregated cost is always the average of the costs of each of the time steps. The cost of each time step can be computed separately. The figure above shows how the cost at time $t+3$ can be computed, by unfolding the recurrent layer $f$ for three time steps and adding the feedforward layer $g$. Each instance of $f$ in the unfolded network shares the same parameters. Thus, the weight updates in each instance ($f_{1},f_{2},f_{3}$) are summed together.
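The summation over instances can be written out explicitly. The notation below is an assumption introduced for illustration and does not appear in the text above: $\theta$ denotes the shared parameters of $f$, the recurrence is taken to be $\mathbf{x}_{t+1} = f(\mathbf{x}_{t}, \mathbf{a}_{t}; \theta)$, $C_{t}$ is the cost of time step $t$ computed from $g(\mathbf{x}_{t+1})$ and $\mathbf{y}_{t}$, and $\partial^{+}$ denotes the immediate partial derivative with $\mathbf{x}_{s}$ held fixed.

$$
C = \frac{1}{k}\sum_{t=0}^{k-1} C_{t},
\qquad
\frac{\partial C}{\partial \theta}
= \frac{1}{k}\sum_{t=0}^{k-1}\sum_{s=0}^{t}
\frac{\partial C_{t}}{\partial \mathbf{x}_{t+1}}
\left(\prod_{j=s+1}^{t}\frac{\partial \mathbf{x}_{j+1}}{\partial \mathbf{x}_{j}}\right)
\frac{\partial^{+}\mathbf{x}_{s+1}}{\partial \theta}
$$

For vector-valued states the product is taken in decreasing order of $j$, from $j = t$ on the left to $j = s+1$ on the right, and it is the identity when $s = t$. Each term of the inner sum is the contribution of one unfolded instance of $f$, which is why the updates of the instances are added together.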

In the truncated version of BPTT, the training data contains $n$ input-output pairs, but the network is unfolded for only $k$ time steps at a time, so each gradient is propagated back through at most $k$ steps.
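A truncated procedure of this kind might look like the following Python sketch. It is an illustration under stated assumptions rather than a definitive implementation: the network is a scalar one with recurrent layer `tanh(w*x + u*a)` and linear output `v*x`, the window slides forward one step at a time, the error is measured on the last step of each window, and the names `truncated_bptt` and `sequence_loss` as well as the `lr`, `epochs` and `params` defaults are hypothetical.

```python
import math

def sequence_loss(params, inputs, targets):
    """Average squared-error cost over the whole sequence, starting from a zero state."""
    w, u, v = params
    x, total = 0.0, 0.0
    for a, y in zip(inputs, targets):
        x = math.tanh(w * x + u * a)
        total += 0.5 * (v * x - y) ** 2
    return total / len(inputs)

def truncated_bptt(inputs, targets, k, lr=0.1, epochs=200, params=(0.1, 0.1, 0.1)):
    """Train with windows of k unfolded steps over a sequence of n pairs."""
    w, u, v = params
    n = len(inputs)
    for _ in range(epochs):
        x = 0.0                                      # context: zero initial hidden state
        for t in range(n - k + 1):
            # Forward pass over k unfolded copies, starting from the current context.
            states = [x]
            for a in inputs[t:t + k]:
                states.append(math.tanh(w * states[-1] + u * a))
            p = v * states[-1]                       # prediction from the last copy
            e = p - targets[t + k - 1]               # error on the window's last step (assumed)
            # Backward pass through at most k steps, summing shared-weight gradients.
            gv = e * states[-1]
            gw = gu = 0.0
            dx = e * v
            for j in reversed(range(k)):
                dz = dx * (1.0 - states[j + 1] ** 2) # back through tanh
                gw += dz * states[j]
                gu += dz * inputs[t + j]
                dx = dz * w
            # Plain gradient-descent update of all weights.
            w, u, v = w - lr * gw, u - lr * gu, v - lr * gv
            x = math.tanh(w * x + u * inputs[t])     # advance the context by one step
    return w, u, v
```

For example, calling `truncated_bptt(inputs, targets, k=3)` on a short sequence and comparing `sequence_loss` before and after gives a rough check that the updates reduce the cost. The choice of `k` trades gradient accuracy for cost: dependencies longer than `k` steps reach the weights only through the carried context, not through the gradient.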

BPTT tends to be significantly faster for training recurrent neural networks than general-purpose optimization techniques such as evolutionary optimization.

BPTT has difficulty with local optima. With recurrent neural networks, local optima are a much more significant problem than with feed-forward neural networks. The recurrent feedback in such networks tends to create chaotic responses in the error surface which cause local optima to occur frequently, and in poor locations on the error surface.
