LogSumExp
from Wikipedia

The LogSumExp (LSE) (also called RealSoftMax[1] or multivariable softplus) function is a smooth maximum – a smooth approximation to the maximum function, mainly used by machine learning algorithms.[2] It is defined as the logarithm of the sum of the exponentials of the arguments:

$$\mathrm{LSE}(x_1, \dots, x_n) = \log\left(\exp(x_1) + \cdots + \exp(x_n)\right).$$

Properties


The LogSumExp function domain is $\mathbb{R}^n$, the real coordinate space, and its codomain is $\mathbb{R}$, the real line. It is an approximation to the maximum $\max_i x_i$ with the following bounds:

$$\max\{x_1, \dots, x_n\} \leq \mathrm{LSE}(x_1, \dots, x_n) \leq \max\{x_1, \dots, x_n\} + \log n.$$

The first inequality is strict unless $n = 1$. The second inequality is strict unless all arguments are equal. (Proof: Let $m = \max_i x_i$. Then $e^m \leq \sum_{i=1}^n e^{x_i} \leq n e^m$. Applying the logarithm to the inequality gives the result.)

In addition, we can scale the function to make the bounds tighter. Consider the function $\frac{1}{t}\mathrm{LSE}(tx_1, \dots, tx_n)$ for $t > 0$. Then

$$\max\{x_1, \dots, x_n\} \leq \frac{1}{t}\mathrm{LSE}(tx_1, \dots, tx_n) \leq \max\{x_1, \dots, x_n\} + \frac{\log n}{t}.$$

(Proof: Replace each $x_i$ with $t x_i$ for some $t > 0$ in the inequalities above, to give

$$\max\{tx_1, \dots, tx_n\} \leq \mathrm{LSE}(tx_1, \dots, tx_n) \leq \max\{tx_1, \dots, tx_n\} + \log n,$$

and, since $t > 0$, $\max\{tx_1, \dots, tx_n\} = t\max\{x_1, \dots, x_n\}$; finally, dividing by $t$ gives the result.)
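A minimal numerical sketch of this tightening, in Python with only the standard library (the helper names lse and lse_scaled are ours, not from the cited sources):

```python
import math

def lse(xs):
    """Plain log-sum-exp: log(sum_i exp(x_i))."""
    return math.log(sum(math.exp(x) for x in xs))

def lse_scaled(xs, t):
    """Scaled variant (1/t) * LSE(t*x), which approaches max(xs) as t grows."""
    return lse([t * x for x in xs]) / t

xs = [1.0, 2.0, 3.0]
for t in (1, 10, 100):
    # The gap to max(xs) is bounded above by log(n)/t, so it shrinks as t grows.
    print(t, lse_scaled(xs, t), max(xs) + math.log(len(xs)) / t)
```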

Also, if we multiply by a negative number instead, we of course find a comparison to the $\min$ function:

$$\min\{x_1, \dots, x_n\} - \frac{\log n}{t} \leq \frac{1}{-t}\mathrm{LSE}(-tx_1, \dots, -tx_n) \leq \min\{x_1, \dots, x_n\}.$$

The LogSumExp function is convex, and is strictly increasing everywhere in its domain.[3] It is not strictly convex, since it is affine (linear plus a constant) on the diagonal and parallel lines:[4]

$$\mathrm{LSE}(x_1 + c, \dots, x_n + c) = \mathrm{LSE}(x_1, \dots, x_n) + c.$$

Other than along this direction, it is strictly convex (the Hessian has rank $n - 1$), so for example restricting to a hyperplane that is transverse to the diagonal results in a strictly convex function. See "A strictly convex log-sum-exp type function", below.

Writing $\mathbf{x} = (x_1, \dots, x_n)$, the partial derivatives are

$$\frac{\partial}{\partial x_i}\mathrm{LSE}(\mathbf{x}) = \frac{\exp(x_i)}{\sum_{j=1}^n \exp(x_j)},$$

which means the gradient of LogSumExp is the softmax function.
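This identity is easy to check numerically; the following sketch (plain Python, helper names are ours) compares a central finite-difference estimate of the gradient with the softmax:

```python
import math

def lse(xs):
    """LogSumExp as defined above (naive form; fine for small inputs)."""
    return math.log(sum(math.exp(x) for x in xs))

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

xs = [0.5, -1.2, 3.0]
h = 1e-6
for i in range(len(xs)):
    xp, xm = xs[:], xs[:]
    xp[i] += h
    xm[i] -= h
    fd = (lse(xp) - lse(xm)) / (2 * h)                # central finite difference
    print(round(fd, 6), round(softmax(xs)[i], 6))     # the two columns should agree
```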

The convex conjugate of LogSumExp is the negative entropy.
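Concretely, for $\mathbf{y}$ in the probability simplex the conjugate value $\sup_{\mathbf{x}}\left(\mathbf{y}^\top\mathbf{x} - \mathrm{LSE}(\mathbf{x})\right)$ equals $\sum_i y_i \log y_i$, with the supremum attained at $x_i = \log y_i$ (up to an additive constant). A small Python spot check with illustrative values, not part of the cited sources:

```python
import math

y = [0.2, 0.5, 0.3]                      # illustrative point in the probability simplex
x = [math.log(yi) for yi in y]           # maximizer of y.x - LSE(x), up to a constant shift

lse = math.log(sum(math.exp(xi) for xi in x))           # = log(sum y_i) = 0
value = sum(yi * xi for yi, xi in zip(y, x)) - lse       # y.x - LSE(x)
neg_entropy = sum(yi * math.log(yi) for yi in y)         # sum_i y_i log y_i

print(value, neg_entropy)                # both ≈ -1.0297
```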

log-sum-exp trick for log-domain calculations


The LSE function is often encountered when the usual arithmetic computations are performed on a logarithmic scale, as in log probability.[5]

Similar to multiplication operations in linear scale becoming simple additions in log scale, an addition operation in linear scale becomes the LSE in log scale:

$$\mathrm{LSE}(\log x_1, \dots, \log x_n) = \log(x_1 + \cdots + x_n).$$
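For instance, a short Python sketch (values are illustrative) showing that adding two quantities stored as logs amounts to an LSE of the logs:

```python
import math

a, b = 0.002, 0.0005                     # linear-domain values (illustrative)
log_a, log_b = math.log(a), math.log(b)

direct = math.log(a + b)                                 # addition in linear scale
via_lse = math.log(math.exp(log_a) + math.exp(log_b))    # LSE of the log-scale values

print(direct, via_lse)                   # both ≈ log(0.0025) ≈ -5.9915
```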

A common purpose of using log-domain computations is to increase accuracy and avoid underflow and overflow problems when very small or very large numbers are represented directly (i.e. in a linear domain) using limited-precision floating point numbers.[6]

Unfortunately, the use of LSE directly in this case can again cause overflow/underflow problems. Therefore, the following equivalent must be used instead (especially when the accuracy of the above 'max' approximation is not sufficient):

$$\mathrm{LSE}(x_1, \dots, x_n) = x^* + \log\left(\sum_{i=1}^n \exp(x_i - x^*)\right),$$

where $x^* = \max\{x_1, \dots, x_n\}$.

Many math libraries, such as IT++, provide a default routine for LSE and use this formula internally.
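A minimal sketch of such a routine in Python; in practice one would typically call an existing implementation such as scipy.special.logsumexp rather than roll one's own:

```python
import math

def logsumexp(xs):
    """Numerically stable LSE: x* + log(sum_i exp(x_i - x*))."""
    x_star = max(xs)
    if math.isinf(x_star):               # all inputs -inf (or a +inf present)
        return x_star
    return x_star + math.log(sum(math.exp(x - x_star) for x in xs))

# exp(1000) overflows double precision, but the shifted form stays finite.
print(logsumexp([1000.0, 1000.0]))       # 1000 + log(2) ≈ 1000.693
```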

A strictly convex log-sum-exp type function


LSE is convex but not strictly convex. We can define a strictly convex log-sum-exp type function[7] by adding an extra argument set to zero:

$$\mathrm{LSE}_0^+(x_1, \dots, x_n) = \mathrm{LSE}(0, x_1, \dots, x_n).$$

This function is a proper Bregman generator (strictly convex and differentiable). It is encountered in machine learning, for example, as the cumulant of the multinomial/binomial family.
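In the single-argument case this construction reduces to the softplus $\log(1 + e^x)$, the cumulant (log-partition) function of the binomial/logistic case; a brief sketch (the helper name lse0 is ours):

```python
import math

def lse0(xs):
    """Strictly convex variant: LSE0+(x) = LSE(0, x_1, ..., x_n)."""
    zs = [0.0] + list(xs)
    m = max(zs)
    return m + math.log(sum(math.exp(z - m) for z in zs))

# With a single argument this is softplus, log(1 + e^x).
x = 2.0
print(lse0([x]), math.log1p(math.exp(x)))
```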

In tropical analysis, this is the sum in the log semiring.

from Grokipedia
The LogSumExp function, also known as log-sum-exp, computes the natural logarithm of the sum of the exponentials of the elements in an input vector, mathematically defined as $\log \sum_{i=1}^n \exp(x_i)$ for a vector $\mathbf{x} = (x_1, \dots, x_n)$, or more generally $\log \sum_{i=1}^n b_i \exp(x_i)$ when weighted by nonnegative factors $b_i$. This operation arises frequently in probabilistic modeling and optimization, where it serves as a stable alternative to direct summation in the exponential domain, particularly when dealing with log-probabilities to maintain numerical precision. A key challenge in evaluating LogSumExp directly is the risk of numerical overflow when the $x_i$ values are large (e.g., exceeding $\log r_{\max}$, where $r_{\max}$ is the largest representable floating-point number) or underflow when they are small, leading to catastrophic loss of accuracy in floating-point arithmetic such as IEEE 754 single (fp32) or double (fp64) precision. To address this, the log-sum-exp trick employs a shifted formulation: select $a = \max_i x_i$, then compute $a + \log \sum_{i=1}^n \exp(x_i - a)$, which bounds the exponentials between 0 and 1, preventing overflow while minimizing underflow errors. Error analysis shows that this shifted version achieves relative error bounds on the order of machine epsilon $\epsilon$, typically outperforming naive implementations, with forward error estimates like $\left(|y + n - x_{\min}|/|y|\right)\epsilon$, where $y$ is the result and $x_{\min} = \min_i x_i$.

In machine learning and statistics, LogSumExp is integral to algorithms involving softmax normalization, such as naive Bayes classifiers, where it computes partition functions or marginal likelihoods in log-space. It also appears in variational inference, expectation-maximization, and log-likelihood evaluations during model training, with implementations available in libraries such as SciPy (via scipy.special.logsumexp), ensuring compatibility across array backends including JAX, PyTorch, and CuPy. The function's convexity and its relation to the negative entropy as its convex conjugate further underscore its role in convex optimization problems.

Definition and Properties

Definition

The log-sum-exp function, commonly denoted as LSE or logsumexp, is a fundamental operation in machine learning and statistics that arises in contexts requiring the aggregation of exponential terms on a logarithmic scale. It takes a vector of real numbers as input and produces a scalar output by computing the natural logarithm of the sum of the exponentials of those inputs. Formally, for a vector $\mathbf{x} = (x_1, \dots, x_n)^\top \in \mathbb{R}^n$, the log-sum-exp function is defined as

$$\mathrm{LSE}(\mathbf{x}) = \log\left(\sum_{i=1}^n e^{x_i}\right),$$

where $\log$ denotes the natural logarithm, the domain is $\mathbb{R}^n$, and the codomain is $\mathbb{R}$. This function provides a smooth, differentiable approximation to the maximum function $\max_i x_i$, particularly effective when the input components exhibit large differences, in which case $\mathrm{LSE}(\mathbf{x})$ closely aligns with the largest $x_i$.

Basic properties

The log-sum-exp function, defined as $\mathrm{LSE}(x_1, \dots, x_n) = \log\left(\sum_{i=1}^n e^{x_i}\right)$, satisfies fundamental bounding inequalities that highlight its close relationship to the maximum function. Specifically,

$$\max_i x_i \leq \mathrm{LSE}(x_1, \dots, x_n) \leq \max_i x_i + \log n.$$

The lower bound follows from the fact that the sum of exponentials is at least the largest exponential term, so $\sum_{i=1}^n e^{x_i} \geq e^{\max_i x_i}$, and applying the logarithm preserves the inequality. The upper bound arises because each exponential is at most $e^{\max_i x_i}$, yielding $\sum_{i=1}^n e^{x_i} \leq n e^{\max_i x_i}$, again with the logarithm maintaining the relation. These bounds position the log-sum-exp function as a smooth, differentiable approximation to the maximum, with the additive $\log n$ term quantifying the maximum deviation for finite $n$.

The function exhibits monotonicity in each of its arguments: it is increasing with respect to any individual $x_j$, meaning that if $x_j$ increases while the other components remain fixed, then $\mathrm{LSE}(x_1, \dots, x_n)$ does not decrease (and strictly increases unless all other $x_i$ are $-\infty$). This property stems directly from the monotonicity of the exponential and logarithm functions: raising $x_j$ enlarges the corresponding exponential term in the sum, thereby enlarging the overall sum and its logarithm. Such behavior ensures that the log-sum-exp function preserves orderings in its inputs, making it suitable for applications requiring consistent scaling with input magnitudes.

A notable special case occurs when all inputs are equal, i.e., $x_1 = \dots = x_n = x$. In this scenario, $\mathrm{LSE}(x, \dots, x) = \log(n e^x) = x + \log n$, which achieves the upper bound of the general inequality. This equality illustrates how the function scales additively with the common input value plus a logarithmic factor dependent on the dimension $n$, emphasizing its sensitivity to the number of terms when inputs are balanced.
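These inequalities and the equal-input case are easy to confirm numerically; a brief sketch in Python (illustrative values only):

```python
import math

def lse(xs):
    m = max(xs)  # shift by the max for stability
    return m + math.log(sum(math.exp(x - m) for x in xs))

xs = [0.3, -2.0, 1.7, 1.1]
n = len(xs)
assert max(xs) <= lse(xs) <= max(xs) + math.log(n)   # the bounding inequalities

# Equal inputs attain the upper bound: LSE(x, ..., x) = x + log n.
x = 0.7
print(lse([x] * n), x + math.log(n))                 # both ≈ 2.0863
```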

Convexity and derivatives

The log-sum-exp function, defined as $f(\mathbf{x}) = \log \sum_{i=1}^n e^{x_i}$ for $\mathbf{x} \in \mathbb{R}^n$, is convex. This convexity follows from the positive semi-definiteness of its Hessian matrix. Let $\mathbf{z}$ be the vector with components $z_i = e^{x_i} / \sum_{j=1}^n e^{x_j}$, the softmax of $\mathbf{x}$. The Hessian is then $\nabla^2 f(\mathbf{x}) = \operatorname{diag}(\mathbf{z}) - \mathbf{z}\mathbf{z}^\top$, which is positive semi-definite because it represents the (scaled) covariance matrix of a multinomial distribution and has eigenvalues in $[0, 1]$. An alternative proof establishes convexity via the epigraph: the set $\{(\mathbf{x}, t) \in \mathbb{R}^n \times \mathbb{R} \mid t \geq f(\mathbf{x})\}$ is convex, as it is equivalent to $\sum_{i=1}^n e^{x_i - t} \leq 1$, and the left-hand side is jointly convex in $(\mathbf{x}, t)$ as a sum of convex exponential functions.

Although convex, the log-sum-exp function is not strictly convex. It exhibits affine behavior along the direction of the all-ones vector $\mathbf{1}$, since $f(\mathbf{x} + c\mathbf{1}) = f(\mathbf{x}) + c$ for any scalar $c$, meaning the function is linear (hence not strictly convex) along such translates. This non-strictness arises because the exponentials scale uniformly under uniform shifts of the inputs.

The first-order partial derivatives of the log-sum-exp function are given by $\frac{\partial f}{\partial x_k}(\mathbf{x}) = \frac{e^{x_k}}{\sum_{j=1}^n e^{x_j}}$ for each $k = 1, \dots, n$. Thus, the gradient $\nabla f(\mathbf{x})$ is precisely the softmax function evaluated at $\mathbf{x}$. The second-order properties, as encoded in the Hessian $\nabla^2 f(\mathbf{x}) = \operatorname{diag}(\mathbf{z}) - \mathbf{z}\mathbf{z}^\top$, further confirm convexity, with the zero eigenvalue corresponding to the direction $\mathbf{1}$ along which the function is affine.

The convex conjugate (Fenchel-Legendre transform) of the log-sum-exp function is the negative entropy function over the probability simplex. Specifically, $f^*(\mathbf{y}) = \sup_{\mathbf{x} \in \mathbb{R}^n}\left(\mathbf{y}^\top \mathbf{x} - f(\mathbf{x})\right) = \sum_{i=1}^n y_i \log y_i$ for $\mathbf{y}$ in the standard simplex $\Delta^{n-1} = \{\mathbf{y} \in \mathbb{R}^n_+ \mid \sum_i y_i = 1\}$, and $+\infty$ otherwise. This duality highlights the log-sum-exp function's role in optimization, where the conjugate's domain restriction to probability vectors underscores its ties to information-theoretic measures.
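A quick numerical check of these second-order facts (a sketch using NumPy, assumed available; the values are illustrative):

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0, 0.0])
z = np.exp(x - x.max())
z /= z.sum()                          # softmax of x

H = np.diag(z) - np.outer(z, z)       # Hessian of log-sum-exp at x

print(np.linalg.eigvalsh(H))          # eigenvalues are >= 0 (up to rounding): PSD
print(H @ np.ones_like(x))            # ≈ 0: the all-ones vector spans the null space
```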

Numerical Computation

The log-sum-exp trick

The log-sum-exp trick provides a reformulation of the log-sum-exp function that enhances numerical stability during computation. For inputs $x_1, \dots, x_n \in \mathbb{R}$, define $x^* = \max_i x_i$. The trick expresses the function as

$$\mathrm{LSE}(x_1, \dots, x_n) = x^* + \log\left(\sum_{i=1}^n \exp(x_i - x^*)\right).$$

This identity derives from the additive property of the logarithm and the scaling behavior of the exponential function. Starting from the original definition,

$$\log\left(\sum_{i=1}^n \exp(x_i)\right) = \log\left(\exp(x^*) \sum_{i=1}^n \exp(x_i - x^*)\right) = x^* + \log\left(\sum_{i=1}^n \exp(x_i - x^*)\right),$$

since $\log(\exp(a) \cdot b) = a + \log b$ for $b > 0$. The reformulation mitigates overflow risks inherent in direct evaluation of $\sum_i \exp(x_i)$ when some $x_i$ are large and positive, as subtracting $x^*$ ensures $x_i - x^* \leq 0$ for all $i$, bounding each $\exp(x_i - x^*) \leq 1$ and keeping intermediate values within floating-point representable ranges.

For illustration, consider $\mathbf{x} = [10, 20, 5]$, where $x^* = 20$. The trick yields

$$\mathrm{LSE}(\mathbf{x}) = 20 + \log\left(\exp(-10) + \exp(0) + \exp(-15)\right) = 20 + \log\left(\exp(-10) + 1 + \exp(-15)\right),$$

which computes exponentials of non-positive arguments, avoiding the need to evaluate $\exp(20)$ directly and thus preserving numerical stability.
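The same computation in Python, contrasted with inputs large enough that the naive form would overflow (a sketch; numbers are illustrative):

```python
import math

def logsumexp(xs):
    x_star = max(xs)
    return x_star + math.log(sum(math.exp(x - x_star) for x in xs))

# The worked example from the text.
print(logsumexp([10.0, 20.0, 5.0]))       # ≈ 20.0000454

# For much larger inputs the naive form overflows (exp(800) is not
# representable in double precision), while the shifted form is unaffected.
print(logsumexp([800.0, 810.0, 795.0]))   # ≈ 810.0000454
```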

Implementation and stability

The stable implementation of the log-sum-exp function relies on subtracting the maximum value from all inputs before exponentiation, to prevent overflow and underflow in floating-point arithmetic. The algorithm proceeds as follows: first, identify the maximum value $m = \max_i x_i$; then, compute the sum of the exponentials of the shifted inputs, $s = \sum_i \exp(x_i - m)$; finally, return $m + \log s$. For greater precision, especially when multiple inputs equal $m$ or when non-maximal terms contribute, compute $s$ as $k$ (the multiplicity of $m$) plus the sum of $\exp(x_i - m)$ over the $x_i < m$, then return $m + \log(k + \mathrm{small\_sum})$, using $\log k + \operatorname{log1p}(\mathrm{small\_sum}/k)$ if $\mathrm{small\_sum}$ is small relative to $k$, to minimize floating-point precision loss.

In floating-point arithmetic, direct computation of $\log \sum_i \exp(x_i)$ risks overflow when the $x_i$ are large positive values, as the exponentials can exceed the representable range, or underflow when all $x_i$ are large negative, leading to a sum of zero and $\log 0 = -\infty$. The shifting approach mitigates overflow by ensuring all $\exp(x_i - m) \leq 1$, while in underflow scenarios (where individual $\exp(x_i - m)$ for non-maximal terms may round to zero due to their negligible contribution) the sum remains dominated by the term(s) equal to 1, yielding an accurate approximation of the maximum value itself.

Error analysis shows that this shifted algorithm achieves forward stability, with the relative error satisfying

$$\left|\frac{y - \hat{y}}{y}\right| \leq \left(\frac{|y + n - x_{\min}|}{|y|}\right) u + O(u^2),$$

where $y$ is the exact result, $\hat{y}$ the computed result, $x_{\min} = \min_i x_i$, and $u$ the unit roundoff.
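A sketch of the refined variant in Python (the helper name and structure are ours, following the description above):

```python
import math

def logsumexp_precise(xs):
    """Shifted LSE that separates out the terms equal to the maximum."""
    m = max(xs)
    k = sum(1 for x in xs if x == m)                     # multiplicity of the maximum
    small_sum = sum(math.exp(x - m) for x in xs if x < m)
    # log(k + small_sum) evaluated as log k + log1p(small_sum / k)
    # to limit precision loss when small_sum is tiny relative to k.
    return m + math.log(k) + math.log1p(small_sum / k)

print(logsumexp_precise([0.0, 0.0, -40.0]))              # ≈ log 2 ≈ 0.693147
```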