Automatic differentiation
In mathematics and computer algebra, automatic differentiation (auto-differentiation, autodiff, or AD), also called algorithmic differentiation, computational differentiation, or differentiation arithmetic, is a set of techniques for evaluating the partial derivatives of a function specified by a computer program. Automatic differentiation is a subtle and central tool for automating the simultaneous computation of the numerical values of arbitrarily complex functions and their derivatives; no symbolic representation of the derivative is needed, only the function rule or an algorithm thereof. Auto-differentiation is thus neither numeric nor symbolic, nor is it a combination of the two. It is also preferable to ordinary numerical methods: in contrast to the more traditional numerical methods based on finite differences, auto-differentiation is exact in theory, and compared with symbolic algorithms it is computationally inexpensive.
Automatic differentiation exploits the fact that every computer calculation, no matter how complicated, executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.) and elementary functions (exp, log, sin, cos, etc.). By applying the chain rule repeatedly to these operations, partial derivatives of arbitrary order can be computed automatically, accurately to working precision, and using at most a small constant factor more arithmetic operations than the original program.
Automatic differentiation is distinct from symbolic differentiation and numerical differentiation. Symbolic differentiation faces the difficulty of converting a computer program into a single mathematical expression and can lead to inefficient code. Numerical differentiation (the method of finite differences) introduces round-off errors in the discretization process and suffers from cancellation. Both of these classical methods have problems with calculating higher derivatives, where complexity and errors increase. Finally, both of these classical methods are slow at computing partial derivatives of a function with respect to many inputs, as is needed for gradient-based optimization algorithms. Automatic differentiation solves all of these problems.
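To make the cancellation issue concrete, the following minimal Python sketch (the function and step sizes are chosen arbitrarily for illustration) compares central finite differences against the known derivative of sin: the error first shrinks with the step size, then grows again once round-off dominates.

```python
import math

x0 = 1.0
exact = math.cos(x0)  # true derivative of sin at x0

# Central finite differences: the truncation error shrinks with h, but
# for very small h the subtraction sin(x0+h) - sin(x0-h) cancels almost
# all significant digits, so round-off error takes over.
for h in (1e-1, 1e-4, 1e-8, 1e-12):
    fd = (math.sin(x0 + h) - math.sin(x0 - h)) / (2 * h)
    print(f"h = {h:8.0e}   |error| = {abs(fd - exact):.3e}")
```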
Owing to its efficiency and accuracy in computing first- and higher-order derivatives, auto-differentiation is a celebrated technique with diverse applications in scientific computing and mathematics, and there are numerous computational implementations; among them are INTLAB, Sollya, and InCLosure. In practice, there are two types (modes) of algorithmic differentiation: a forward mode and a reverse mode. The two modes are complementary, and both have a wide variety of applications in, e.g., nonlinear optimization, sensitivity analysis, robotics, machine learning, computer graphics, and computer vision. Automatic differentiation is particularly important in the field of machine learning: for example, it allows one to implement backpropagation in a neural network without manually computing derivatives.
Fundamental to automatic differentiation is the decomposition of differentials provided by the chain rule of partial derivatives of composite functions. For the simple composition
$$y = f(g(h(x))) = f(g(h(w_0))) = f(g(w_1)) = f(w_2) = w_3$$
with intermediate values $w_0 = x$, $w_1 = h(w_0)$, $w_2 = g(w_1)$, and $w_3 = f(w_2) = y$, the chain rule gives
$$\frac{\partial y}{\partial x} = \frac{\partial y}{\partial w_2} \frac{\partial w_2}{\partial w_1} \frac{\partial w_1}{\partial x} = \frac{\partial f(w_2)}{\partial w_2} \frac{\partial g(w_1)}{\partial w_1} \frac{\partial h(w_0)}{\partial x}.$$
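As a worked instance of this decomposition (the concrete functions $h(x) = x^2$, $g(w) = \sin w$, $f(w) = e^w$ are hypothetical choices for illustration), the following sketch evaluates the intermediates $w_0$ through $w_3$ and multiplies the three local partials:

```python
import math

x = 1.5

# Forward sweep: evaluate the intermediate values w0..w3.
w0 = x
w1 = w0 ** 2        # h(w0) = w0^2
w2 = math.sin(w1)   # g(w1) = sin(w1)
w3 = math.exp(w2)   # f(w2) = exp(w2), i.e. y

# Chain rule: dy/dx = f'(w2) * g'(w1) * h'(w0)
dy_dx = math.exp(w2) * math.cos(w1) * (2 * w0)
print(f"y = {w3:.6f}, dy/dx = {dy_dx:.6f}")
```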
Usually, two distinct modes of automatic differentiation are presented: forward accumulation (also called forward mode or tangent mode) and reverse accumulation (also called reverse mode or adjoint mode).
Forward accumulation specifies that one traverses the chain rule from inside to outside (that is, first compute $\frac{\partial w_1}{\partial x}$, then $\frac{\partial w_2}{\partial w_1}$, and lastly $\frac{\partial y}{\partial w_2}$), while reverse accumulation traverses from outside to inside (first compute $\frac{\partial y}{\partial w_2}$, then $\frac{\partial w_2}{\partial w_1}$, and lastly $\frac{\partial w_1}{\partial x}$). More succinctly, forward accumulation computes the recursive relation
$$\frac{\partial w_i}{\partial x} = \frac{\partial w_i}{\partial w_{i-1}} \frac{\partial w_{i-1}}{\partial x} \quad \text{with } w_3 = y,$$
while reverse accumulation computes the recursive relation
$$\frac{\partial y}{\partial w_i} = \frac{\partial y}{\partial w_{i+1}} \frac{\partial w_{i+1}}{\partial w_i} \quad \text{with } w_0 = x.$$
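A minimal forward-mode sketch in Python using dual numbers (all class and function names are illustrative, and only +, *, and sin are overloaded): each value carries its derivative alongside, so the forward recursive relation above is applied at every elementary operation.

```python
import math

class Dual:
    """Carries a value together with its derivative (the propagated seed)."""
    def __init__(self, val, dot):
        self.val, self.dot = val, dot

    def __add__(self, other):
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # Product rule: (u * v)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def sin(u):
    # Chain rule for an elementary function: (sin u)' = cos(u) * u'
    return Dual(math.sin(u.val), math.cos(u.val) * u.dot)

# Differentiate y = x * sin(x) + x at x = 2 in a single forward pass.
x = Dual(2.0, 1.0)    # seed dx/dx = 1
y = x * sin(x) + x
print(y.val, y.dot)   # dy/dx = sin(x) + x*cos(x) + 1
```

For a function of several inputs, this pass is repeated once per input with a different seed each time, which is what the next paragraph means by a separate pass per independent variable.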
The value of the partial derivative, called the seed, is propagated forward or backward and is initially $\frac{\partial x}{\partial x} = 1$ or $\frac{\partial y}{\partial y} = 1$. Forward accumulation evaluates the function and calculates the derivative with respect to one independent variable in one pass. For each independent variable a separate pass is therefore necessary, in which the derivative with respect to that variable is set to one ($\frac{\partial x_i}{\partial x_i} = 1$) and the derivatives with respect to all others are set to zero ($\frac{\partial x_j}{\partial x_i} = 0$ for $j \neq i$). In contrast, reverse accumulation requires the evaluated partial functions for the partial derivatives. Reverse accumulation therefore evaluates the function first and then calculates the derivatives with respect to all independent variables in a single additional pass.
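A matching minimal reverse-mode sketch (names again illustrative): the forward evaluation records each node's parents and local partials, and a single backward pass seeded with $\frac{\partial y}{\partial y} = 1$ then accumulates the derivatives with respect to all inputs at once.

```python
import math

class Var:
    """Node recorded during the forward evaluation of the function."""
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents   # pairs (parent Var, local partial)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.val + other.val, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.val * other.val,
                   [(self, other.val), (other, self.val)])

def sin(u):
    return Var(math.sin(u.val), [(u, math.cos(u.val))])

def backward(y):
    # Order nodes topologically so each node's gradient is complete
    # before it is pushed back to that node's parents.
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)
    visit(y)

    y.grad = 1.0                 # seed: dy/dy = 1
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

# One forward evaluation, then one reverse pass gives all gradients.
x1, x2 = Var(2.0), Var(3.0)
y = x1 * x2 + sin(x1)
backward(y)
print(y.val, x1.grad, x2.grad)   # dy/dx1 = x2 + cos(x1), dy/dx2 = x1
```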