
Pushforward measure

from Wikipedia

In measure theory, a pushforward measure (also known as push forward, push-forward or image measure) is obtained by transferring ("pushing forward") a measure from one measurable space to another using a measurable function.

Definition


Given measurable spaces $(X_1, \Sigma_1)$ and $(X_2, \Sigma_2)$, a measurable function $f \colon X_1 \to X_2$ and a measure $\mu \colon \Sigma_1 \to [0, +\infty]$, the pushforward of $\mu$ by $f$ is defined to be the measure $f_*\mu \colon \Sigma_2 \to [0, +\infty]$ given by

$(f_*\mu)(B) = \mu\bigl(f^{-1}(B)\bigr)$ for $B \in \Sigma_2.$

This definition applies mutatis mutandis for a signed or complex measure. The pushforward measure is also denoted $f_*\mu$, $f_\#\mu$, $f\mu$, or $\mu \circ f^{-1}$.
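For a finitely supported measure the definition can be realized directly. A minimal Python sketch (the `pushforward` helper and the example measure are our own, purely illustrative choices):

```python
from collections import Counter

def pushforward(mu, f):
    """Pushforward of a discrete measure under a map f.

    mu : dict mapping points of X to their masses
    f  : measurable map X -> Y (any Python function)
    Returns a dict on Y with (f_* mu)(y) = mu(f^{-1}({y})).
    """
    nu = Counter()
    for x, mass in mu.items():
        nu[f(x)] += mass        # all mass over the fibre of f(x) accumulates
    return dict(nu)

# A measure on {-2, -1, 0, 1, 2} pushed forward by x -> x**2:
mu = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}
nu = pushforward(mu, lambda x: x * x)
print(nu)   # masses of points with the same image add up, total mass unchanged
```

Note that the total mass is preserved, so a probability measure pushes forward to a probability measure.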

Properties


Change of variable formula


Theorem:[1] A measurable function $g$ on $X_2$ is integrable with respect to the pushforward measure $f_*\mu$ if and only if the composition $g \circ f$ is integrable with respect to the measure $\mu$. In that case, the integrals coincide, i.e.,

$\int_{X_2} g \, d(f_*\mu) = \int_{X_1} g \circ f \, d\mu.$

Note that in the previous formula $\int_{X_1} g \circ f \, d\mu = \int_{X_1} g(f(x)) \, d\mu(x)$.
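For a finitely supported measure the two integrals in the theorem are finite sums, and the identity can be checked directly. A small sketch with an arbitrary toy measure, map, and integrand:

```python
from collections import Counter

# Discrete measure mu on X = {0,1,2,3}, a map f, and an observable g on Y.
mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
f = lambda x: x % 2          # X -> Y = {0, 1}
g = lambda y: 3 * y + 1      # integrand on Y

# Pushforward f_* mu
fmu = Counter()
for x, m in mu.items():
    fmu[f(x)] += m

lhs = sum(g(y) * m for y, m in fmu.items())      # integral of g  d(f_* mu)
rhs = sum(g(f(x)) * m for x, m in mu.items())    # integral of g∘f  dmu
assert abs(lhs - rhs) < 1e-12
```

Both sides group the same terms, just in a different order, which is exactly why the theorem holds for simple functions.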

Functoriality


Pushforwards of measures make it possible to induce, from a measurable function $f \colon X \to Y$ between measurable spaces, a map $f_* \colon M(X) \to M(Y)$ between their spaces of measures. As with many induced mappings, this construction has the structure of a functor on the category of measurable spaces.

For the special case of probability measures, this property amounts to functoriality of the Giry monad.
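Functoriality means $(g \circ f)_*\mu = g_*(f_*\mu)$, which can be verified mechanically for discrete measures. A sketch with illustrative toy maps:

```python
from collections import Counter

def push(mu, f):
    """Pushforward of a discrete measure (as a Counter) under f."""
    nu = Counter()
    for x, m in mu.items():
        nu[f(x)] += m
    return nu

mu = Counter({0: 0.25, 1: 0.25, 2: 0.5})
f = lambda x: x + 1      # X -> Y
g = lambda y: y % 2      # Y -> Z

lhs = push(mu, lambda x: g(f(x)))   # (g ∘ f)_* mu
rhs = push(push(mu, f), g)          # g_* (f_* mu)
assert lhs == rhs                   # composition rule holds exactly
```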

Examples and applications

  • If $(\Omega, \mathcal{F}, P)$ is a probability space, $(E, \mathcal{E})$ is a measurable space, and $X \colon \Omega \to E$ is an $E$-valued random variable, then the probability distribution of $X$ is the pushforward measure $X_*P$ of $P$ by $X$ onto $(E, \mathcal{E})$.
  • A natural "Lebesgue measure" on the unit circle $S^1$ (here thought of as a subset of the complex plane $\mathbb{C}$) may be defined using a push-forward construction and Lebesgue measure $\lambda$ on the real line $\mathbb{R}$. Let $\lambda$ also denote the restriction of Lebesgue measure to the interval $[0, 2\pi)$ and let $f \colon [0, 2\pi) \to S^1$ be the natural bijection defined by $f(t) = \exp(it)$. The natural "Lebesgue measure" on $S^1$ is then the push-forward measure $f_*(\lambda)$. The measure $f_*(\lambda)$ might also be called "arc length measure" or "angle measure", since the $f_*(\lambda)$-measure of an arc in $S^1$ is precisely its arc length (or, equivalently, the angle that it subtends at the centre of the circle).
  • The previous example extends nicely to give a natural "Lebesgue measure" on the n-dimensional torus Tn. The previous example is a special case, since S1 = T1. This Lebesgue measure on Tn is, up to normalization, the Haar measure for the compact, connected Lie group Tn.
  • Gaussian measures on infinite-dimensional vector spaces are defined using the push-forward and the standard Gaussian measure on the real line: a Borel measure γ on a separable Banach space X is called Gaussian if the push-forward of γ by any non-zero linear functional in the continuous dual space to X is a Gaussian measure on R.
  • Consider a measurable function $f \colon X \to X$ and the composition of $f$ with itself $n$ times:

$f^{(n)} = \underbrace{f \circ f \circ \cdots \circ f}_{n \text{ times}} \colon X \to X.$

This iterated function forms a dynamical system. It is often of interest in the study of such systems to find a measure $\mu$ on $X$ that the map $f$ leaves unchanged, a so-called invariant measure, i.e. one for which $f_*(\mu) = \mu$.
  • One can also consider quasi-invariant measures for such a dynamical system: a measure $\mu$ on $X$ is called quasi-invariant under $f$ if the push-forward of $\mu$ by $f$ is merely equivalent to the original measure $\mu$, not necessarily equal to it. A pair of measures on the same space are equivalent if and only if each is absolutely continuous with respect to the other, so $\mu$ is quasi-invariant under $f$ if $f_*(\mu) \sim \mu$.
  • Many natural probability distributions, such as the chi distribution, can be obtained via this construction.
  • Random variables induce pushforward measures: a random variable maps a probability space into a codomain space and endows that space with the pushforward probability measure. Because a random variable is a (total) function, the inverse image of the whole codomain is the whole domain, whose measure is 1, so the pushforward measure of the whole codomain is 1. This means that random variables can be composed indefinitely: the composites remain random variables and endow their codomain spaces with probability measures.
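The circle construction above can be checked numerically: pushing Lebesgue measure on $[0, 2\pi)$ through $t \mapsto e^{it}$ should assign each arc its arc length. A Monte Carlo sketch (the arc endpoints and sample size are arbitrary choices):

```python
import cmath
import math
import random

random.seed(0)

# Push Lebesgue measure on [0, 2π) forward through t -> exp(it) and
# estimate the measure of the arc of angles between a and b.
a, b = 0.5, 2.0
n = 200_000
hits = 0
for _ in range(n):
    t = random.uniform(0.0, 2 * math.pi)   # sample the base measure
    z = cmath.exp(1j * t)                  # point on the unit circle S^1
    if a < cmath.phase(z) < b:             # lies on the open arc (a, b)
        hits += 1

arc_measure = (hits / n) * 2 * math.pi     # rescale: total mass is 2π
print(arc_measure)                         # should be close to b - a = 1.5
```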

A generalization


In general, any measurable function can be pushed forward. The push-forward then becomes a linear operator, known as the transfer operator or Frobenius–Perron operator. In finite spaces this operator typically satisfies the requirements of the Frobenius–Perron theorem, and the maximal eigenvalue of the operator corresponds to the invariant measure.
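On a finite space the transfer operator is just a column-stochastic matrix, and power iteration recovers the invariant measure as the eigenvector for the maximal eigenvalue 1. A sketch with an illustrative 3-state kernel (the matrix entries are arbitrary):

```python
# P[i][j] = mass moved from state j to state i; columns sum to 1,
# so P acts on measures the way a transfer operator does.
P = [
    [0.5, 0.2, 0.3],
    [0.3, 0.5, 0.3],
    [0.2, 0.3, 0.4],
]

mu = [1.0, 0.0, 0.0]            # any starting probability vector
for _ in range(200):            # power iteration toward the eigenvalue-1 vector
    mu = [sum(P[i][j] * mu[j] for j in range(3)) for i in range(3)]

# mu is now (approximately) invariant: P mu = mu, and total mass is preserved
Pmu = [sum(P[i][j] * mu[j] for j in range(3)) for i in range(3)]
assert all(abs(Pmu[i] - mu[i]) < 1e-10 for i in range(3))
assert abs(sum(mu) - 1.0) < 1e-10
```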

The adjoint to the push-forward is the pullback; as an operator on spaces of functions on measurable spaces, it is the composition operator or Koopman operator.

from Grokipedia
In measure theory, the pushforward measure (also known as the image measure) of a measure $\mu$ on a measurable space $(X, \mathcal{A})$ under a measurable map $f \colon X \to Y$ to another measurable space $(Y, \mathcal{B})$ is the measure $f_*\mu$ on $(Y, \mathcal{B})$ defined by $(f_*\mu)(B) = \mu(f^{-1}(B))$ for every $B \in \mathcal{B}$.[1] This construction transfers the "mass" of $\mu$ from $X$ to $Y$ via $f$, preserving the total measure; if $\mu$ is a probability measure, then $f_*\mu(Y) = \mu(X) = 1$.[2]

A key property of the pushforward measure is its compatibility with integration: for any measurable function $g \colon Y \to [0, \infty]$, the integral transforms as $\int_Y g \, d(f_*\mu) = \int_X (g \circ f) \, d\mu$.[3] This ensures that expectations and probabilities are preserved under the mapping, making pushforward measures essential for change-of-variables formulas in multiple integrals.[1] For instance, under an invertible linear transformation $T \colon \mathbb{R}^d \to \mathbb{R}^d$ with Lebesgue measure $m$, the pushforward satisfies $T_*m = \frac{1}{|\det T|}\, m$, which scales the measure by the reciprocal of the absolute value of the determinant.[4]

In probability theory, the pushforward measure $f_*\mathbb{P}$ induced by a random variable $f$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is precisely the distribution (or law) of $f$, describing the probabilities of events in the codomain.[2] This concept extends to more advanced applications, such as optimal transport, where pushforwards model the displacement of mass between measures, and differential geometry, where they facilitate the study of measures under diffeomorphisms via Jacobian adjustments.[3] Pushforward measures also play a role in product spaces and in Haar measures on groups, ensuring invariance under group actions.[1]

Fundamentals

Definition

In measure theory, consider two measurable spaces $(X, \Sigma_X)$ and $(Y, \Sigma_Y)$, where $\Sigma_X$ and $\Sigma_Y$ are $\sigma$-algebras on sets $X$ and $Y$, respectively. Let $f \colon X \to Y$ be a measurable function, meaning that $f^{-1}(B) \in \Sigma_X$ for every $B \in \Sigma_Y$, and let $\mu$ be a measure on the measurable space $(X, \Sigma_X)$.[1] The pushforward measure (also known as the image measure) $f_*\mu$ induced by $f$ is the measure on $(Y, \Sigma_Y)$ defined by

$(f_*\mu)(B) = \mu(f^{-1}(B))$

for every $B \in \Sigma_Y$.[1]

To verify that $f_*\mu$ is indeed a measure, first note non-negativity: for any $B \in \Sigma_Y$, $(f_*\mu)(B) = \mu(f^{-1}(B)) \geq 0$ since $\mu$ is non-negative. For $\sigma$-additivity, suppose $\{B_n\}_{n=1}^\infty$ is a countable collection of pairwise disjoint sets in $\Sigma_Y$. Then $f^{-1}\left(\bigcup_{n=1}^\infty B_n\right) = \bigcup_{n=1}^\infty f^{-1}(B_n)$, and the preimages are also pairwise disjoint and measurable, so

$(f_*\mu)\left(\bigcup_{n=1}^\infty B_n\right) = \mu\left(\bigcup_{n=1}^\infty f^{-1}(B_n)\right) = \sum_{n=1}^\infty \mu(f^{-1}(B_n)) = \sum_{n=1}^\infty (f_*\mu)(B_n),$

by the $\sigma$-additivity of $\mu$. Additionally, $(f_*\mu)(\emptyset) = \mu(f^{-1}(\emptyset)) = \mu(\emptyset) = 0$.[1]

The pushforward measure $f_*\mu$ inherits key finiteness properties from $\mu$. It is finite if $\mu(X) < \infty$, since $(f_*\mu)(Y) = \mu(f^{-1}(Y)) = \mu(X) < \infty$. If $\mu$ is a probability measure (i.e., $\mu(X) = 1$), then $f_*\mu$ is also a probability measure on $Y$.[1]

Notation

The standard notation for the pushforward of a measure $\mu$ on a measurable space $(X, \mathcal{A})$ under a measurable map $f \colon X \to Y$ is $f_*\mu$, defined such that $(f_*\mu)(B) = \mu(f^{-1}(B))$ for $B \in \mathcal{B}$, where $\mathcal{B}$ is the $\sigma$-algebra on $Y$.[5] An equivalent common notation is $f_\#\mu$ or $f\#\mu$, particularly prevalent in the probability and optimal transport literature.[6] Alternative notations include $\mu \circ f^{-1}$, which explicitly emphasizes the preimage operation, and the term "image measure" $\mu_f$ for the transferred measure.[7]

A key convention distinguishes pushforward from pullback notations: subscripts like $f_*$ or $f_\#$ indicate pushforward (forward along $f$), while superscripts such as $f^*$ denote pullback (backward along $f$). This subscript-superscript dichotomy is standard in analysis and geometry to avoid ambiguity with differential forms or densities. In probability theory, the pushforward $f_*\mu$, where $\mu$ is the law of a random variable $X$, is often called the law of $f(X)$, highlighting its role in describing the distribution of transformed random variables.[8] In real analysis, it is frequently referred to as the transferred measure or pushed-forward Lebesgue measure when $\mu$ is Lebesgue measure on $\mathbb{R}^n$ and $f$ is a diffeomorphism, underscoring its use in change-of-variables formulas.[5]

Properties

Change of Variable Formula

The change of variable formula, also known as the substitution rule for integrals with respect to pushforward measures, establishes a fundamental relationship between integration over a measure space and its image under a measurable map. Specifically, let $(X, \mathcal{A}, \mu)$ be a measure space, $(Y, \mathcal{B})$ a measurable space, and $f \colon X \to Y$ a measurable function defining the pushforward measure $\mu_f$ on $\mathcal{B}$ by $\mu_f(B) = \mu(f^{-1}(B))$ for $B \in \mathcal{B}$. For a non-negative measurable function $g \colon Y \to [0, \infty]$, the formula asserts that

$\int_Y g \, d\mu_f = \int_X (g \circ f) \, d\mu.$

This holds whenever $g \circ f$ is measurable, which follows from the measurability of $g$ and $f$.[9][10]

For the formula to apply in the sense of Lebesgue integration, additional conditions ensure integrability: $g$ must be $\mu_f$-integrable, meaning $\int_Y |g| \, d\mu_f < \infty$, which is equivalent to $\int_X |g \circ f| \, d\mu < \infty$ by the formula itself applied to $|g|$. Absolute integrability of $g \circ f$ with respect to $\mu$ thus guarantees the validity of the equality for signed or complex-valued functions, as detailed below. These conditions prevent issues with infinite integrals and ensure that the integrals are well-defined.[9][10]

A proof outline proceeds first for non-negative functions via approximation by simple functions. For a simple function $g = \sum_{i=1}^n c_i \chi_{E_i}$ with $c_i \geq 0$ and $E_i \in \mathcal{B}$, the integral $\int_Y g \, d\mu_f = \sum_{i=1}^n c_i \mu_f(E_i) = \sum_{i=1}^n c_i \mu(f^{-1}(E_i))$, while $\int_X (g \circ f) \, d\mu = \sum_{i=1}^n c_i \mu(f^{-1}(E_i))$, yielding equality by the definition of the pushforward. Any non-negative measurable $g$ can then be approximated from below by an increasing sequence of simple functions $g_n \uparrow g$, and the monotone convergence theorem implies $\int_Y g_n \, d\mu_f \uparrow \int_Y g \, d\mu_f$ and similarly $\int_X (g_n \circ f) \, d\mu \uparrow \int_X (g \circ f) \, d\mu$, establishing the result.[9][10]

The formula extends naturally to signed functions and signed measures on the domain. For a signed function $g = g^+ - g^-$ with $g^+, g^- \geq 0$ and $g$ $\mu_f$-integrable (i.e., $\int_Y |g| \, d\mu_f < \infty$), define $\int_Y g \, d\mu_f = \int_Y g^+ \, d\mu_f - \int_Y g^- \, d\mu_f$; the formula then yields $\int_Y g \, d\mu_f = \int_X (g \circ f) \, d\mu$ by linearity. Similarly, if $\sigma$ is a signed measure on $X$, decomposable as $\sigma = \sigma^+ - \sigma^-$ with positive measures $\sigma^+$, $\sigma^-$, define the pushforward $f_*\sigma = f_*\sigma^+ - f_*\sigma^-$. Then, for $g$ integrable with respect to $|f_*\sigma|$, $\int_Y g \, d(f_*\sigma) = \int_X (g \circ f) \, d\sigma$. For complex-valued $g = g_1 + i g_2$ with real and imaginary parts $\mu_f$-integrable, the equality follows by linearity over the reals. These extensions preserve the measurability and integrability conditions on $g \circ f$.[9][10]

In the context of Lebesgue integration on $\mathbb{R}^n$, the change of variable formula recovers the classical substitution rule as a special case. When $\mu$ is Lebesgue measure $\lambda^n$, $Y = X = \mathbb{R}^n$, and $f$ is a $C^1$-diffeomorphism, the pushforward $f_*\lambda^n$ has density $|\det Df(f^{-1}(y))|^{-1}$ with respect to $\lambda^n$, so for integrable $g \colon \mathbb{R}^n \to \mathbb{R}$,

$\int_{\mathbb{R}^n} g(y) \, d\lambda^n(y) = \int_{\mathbb{R}^n} g(f(x)) \, |\det Df(x)| \, d\lambda^n(x),$

aligning with the standard Jacobian formula in multivariable calculus.[10]
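The Jacobian formula above can be checked numerically in one dimension, where $|\det Df(x)| = |f'(x)|$. A sketch using midpoint Riemann sums for the diffeomorphism $f(x) = e^x$ from $[0, 1]$ onto $[1, e]$ (the integrand $g$ and the grid size are arbitrary choices):

```python
import math

# Check:  ∫_[1,e] g(y) dy  =  ∫_[0,1] g(f(x)) |f'(x)| dx  for f(x) = exp(x).
g = lambda y: y * y
n = 100_000

h1 = (math.e - 1.0) / n       # midpoint Riemann sum over [1, e]
lhs = sum(g(1.0 + (k + 0.5) * h1) for k in range(n)) * h1

h2 = 1.0 / n                  # midpoint Riemann sum over [0, 1]; f'(x) = exp(x)
rhs = sum(g(math.exp((k + 0.5) * h2)) * math.exp((k + 0.5) * h2)
          for k in range(n)) * h2

assert abs(lhs - rhs) < 1e-6
assert abs(lhs - (math.e ** 3 - 1.0) / 3.0) < 1e-6   # exact value of ∫ y² dy
```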

Functoriality

The pushforward operation on measures induces a covariant functor on the category of measure spaces. Specifically, given measurable spaces $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B})$, a measurable map $f \colon X \to Y$ defines the pushforward $f_*\mu$ on $(Y, \mathcal{B})$ by $f_*\mu(B) = \mu(f^{-1}(B))$ for $B \in \mathcal{B}$. This assignment extends functorially: if $g \colon Y \to Z$ is another measurable map to a measurable space $(Z, \mathcal{C})$, then the pushforward satisfies the composition rule

$(g \circ f)_*\mu = g_*(f_*\mu),$

meaning the pushforward of the composite map equals the composite of the pushforwards. This functoriality ensures that the pushforward preserves the structure of measurable maps, acting covariantly on the category where objects are measurable spaces equipped with measures and morphisms are measurable functions.[1]

The pushforward preserves certain measure-theoretic properties under appropriate conditions. For $\sigma$-finiteness: if $\mu$ is $\sigma$-finite on $X$, then $f_*\mu$ is $\sigma$-finite provided that the conditional measures on the fibers of $f$ have finite total mass almost everywhere with respect to the pushforward; otherwise it may fail, as for the projection $\phi(x, y) = x$ from the unit square $I^2$ to $I$ under a $\sigma$-finite measure whose density integrates to infinity over vertical fibers. Completeness is preserved if the domain measure space is complete and $f$ is such that null sets in the codomain correspond to null preimages, though this requires the codomain $\sigma$-algebra to be completed accordingly; in general, pushforwards of complete measures need not be complete without additional assumptions on $f$, such as injectivity. These preservations highlight the pushforward's role in maintaining structural integrity across measure spaces.[11]

In the subcategory of probability measures, the pushforward relates closely to the Giry monad, which equips the category of measurable spaces with a monad structure whose functor assigns to each space the space of probability measures on it. Here, a deterministic measurable map $f \colon X \to Y$ induces a Kleisli arrow of the Giry monad by pushing forward probability measures via $f_* \colon \mathcal{P}(X) \to \mathcal{P}(Y)$, where $\mathcal{P}$ denotes the space of probability measures; this corresponds to the Markov kernel that maps $x \in X$ to the Dirac measure $\delta_{f(x)}$ on $Y$. The monad's unit provides the Dirac embedding, and the multiplication averages measures over measures, making pushforwards the deterministic special case of probabilistic morphisms.[12]

Pushforwards can form algebraic structures when the underlying measurable maps do. If a set of measurable maps closed under composition forms a monoid (e.g., the iterates of a map in a dynamical system), the associated pushforward operators on the space of measures inherit a monoid structure via the functorial composition rule, acting as a representation of the original monoid on measures. This occurs, for instance, in the symmetries of probability distributions, where the set of measure-preserving transformations forms a monoid under composition, and pushforwards preserve this algebraic action. Such structures underpin applications in ergodic theory and stochastic processes.[13]

Examples

Probability Distributions

In probability theory, the distribution of a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$ is given by the pushforward measure $P \circ X^{-1}$, often denoted $X_*P$ and called the law of $X$.[14] This measure assigns to each measurable set $B$ in the codomain the probability $P(X^{-1}(B))$, capturing how the original probability $P$ is transferred through the mapping induced by $X$.[2]

A concrete example arises when $X$ follows a uniform distribution on $[0, 1]$, so $P$ is the Lebesgue measure restricted to this interval. Consider the transformation $Y = X^2$; the pushforward measure of $P$ under this map yields the distribution of $Y$, with probability density function $f_Y(y) = \frac{1}{2\sqrt{y}}$ for $y \in (0, 1]$.[15] This density reflects the piling up of probability near zero under the quadratic map, while values of $X$ near 1 are spread out and so contribute less densely to $Y$.

For transformations by monotone functions, the cumulative distribution function (CDF) of the resulting random variable connects directly to the pushforward. If $Y = g(X)$ where $g$ is strictly increasing and continuous, then the CDF of $Y$ is $F_Y(y) = P(g(X) \leq y) = P(X \leq g^{-1}(y)) = F_X(g^{-1}(y))$, illustrating how the pushforward preserves cumulative probabilities through the inverse mapping.[16]

The chi-squared distribution provides another illustration, arising as the pushforward under the sum-of-squares map of independent standard Gaussian random variables. Specifically, if $Z_1, \dots, Z_k$ are independent standard normals on $\mathbb{R}^k$ with product measure $P$, then the pushforward under the function $(z_1, \dots, z_k) \mapsto \sum_{i=1}^k z_i^2$ yields the chi-squared distribution with $k$ degrees of freedom.[15] For $k = 1$, this reduces to the square of a single standard normal, with density $f(y) = \frac{1}{\sqrt{2\pi y}} e^{-y/2}$ for $y > 0$.[17]
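The $Y = X^2$ example above can be checked empirically: the pushforward law has CDF $F_Y(y) = \sqrt{y}$, so the empirical CDF of squared uniform samples should match $\sqrt{y}$. A Monte Carlo sketch (sample size and test points are arbitrary choices):

```python
import random

random.seed(1)

# X uniform on [0, 1]; the law of Y = X**2 is the pushforward with
# CDF F_Y(y) = sqrt(y), equivalently density 1/(2 sqrt(y)) on (0, 1].
n = 100_000
samples = [random.random() ** 2 for _ in range(n)]

for y in (0.04, 0.25, 0.81):
    empirical = sum(s <= y for s in samples) / n   # empirical CDF at y
    assert abs(empirical - y ** 0.5) < 0.02        # matches sqrt(y)
```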

Geometric Constructions

One prominent geometric construction using pushforward measures induces the standard Lebesgue measure (or Hausdorff 1-measure) on the unit circle $S^1 \subset \mathbb{R}^2$. Consider the parametrization $f \colon [0, 2\pi) \to S^1$ defined by $f(t) = (\cos t, \sin t)$. The pushforward $f_*\lambda$, where $\lambda$ denotes the Lebesgue measure on $[0, 2\pi)$, coincides with the arc-length measure on $S^1$, which has total mass $2\pi$ and agrees with the 1-dimensional Hausdorff measure $\mathcal{H}^1$ on $S^1$.[18] This construction is absolutely continuous with respect to $\mathcal{H}^1$ on $S^1$, as the map $f$ is Lipschitz with constant speed $|f'(t)| = 1$.[19]

Another key example arises in Euclidean spaces, where Gaussian measures on $\mathbb{R}^n$ are pushed forward under linear transformations. Let $\gamma$ be the standard Gaussian measure on $\mathbb{R}^n$ with mean zero and identity covariance, and let $A \colon \mathbb{R}^n \to \mathbb{R}^m$ be a linear map represented by an $m \times n$ matrix. The pushforward $A_*\gamma$ is a Gaussian measure on $\mathbb{R}^m$ with mean zero and covariance matrix $AA^T$, concentrating its mass along the range of $A$, with quadratic form given by the inverse covariance.[20] If $A$ has full rank $m \leq n$, then $A_*\gamma$ is absolutely continuous with respect to Lebesgue measure $\lambda_m$ on $\mathbb{R}^m$, with density proportional to $\exp\left(-\frac{1}{2} x^T (AA^T)^{-1} x\right)$; otherwise, it is singular with respect to $\lambda_m$.[21]

Pushforward measures also facilitate the construction of Hausdorff measures on fractal sets via parametrizations from measures on suitable parameter domains. For self-similar fractals generated by an iterated function system (IFS), such as the middle-thirds Cantor set in $[0, 1]$, the Hausdorff measure $\mathcal{H}^d$ (where $d = \log 2 / \log 3 \approx 0.631$) on the attractor can be realized, up to normalization, as the pushforward of the infinite product Bernoulli measure with equal probabilities $1/2$ on the symbolic space $\{0, 1\}^{\mathbb{N}}$ under the coding map $\Psi \colon \{0, 1\}^{\mathbb{N}} \to [0, 1]$, $\Psi((i_k)) = \sum_{k=1}^\infty 2 i_k 3^{-k}$.[22] More generally, for the Sierpinski gasket in $\mathbb{R}^2$ (with $d = \log 3 / \log 2 \approx 1.585$), the symbolic space is $\{1, 2, 3\}^{\mathbb{N}}$ equipped with the infinite product Bernoulli measure with probabilities $1/3$ each, and the pushforward under the coding map yields a measure equivalent to $\mathcal{H}^d$ on the gasket, which is singular with respect to Lebesgue measure $\lambda_2$ on $\mathbb{R}^2$.[23] These constructions align the pushforward with the intrinsic dimension of the fractal, capturing its geometric scaling properties.
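The Cantor coding map can be simulated directly: pushing the fair-coin Bernoulli measure through $\Psi$ lands every sample in $[0, 1]$, and the two halves of the Cantor set each receive mass $1/2$ (the first bit decides whether $\Psi((i_k)) \le 1/3$ or $\ge 2/3$). A Monte Carlo sketch with truncated bit sequences:

```python
import random

random.seed(2)

def psi(bits):
    """Coding map Ψ((i_k)) = Σ 2 i_k 3^{-k}, truncated to finitely many bits."""
    return sum(2 * b * 3.0 ** -(k + 1) for k, b in enumerate(bits))

n, depth = 50_000, 40
left = 0
for _ in range(n):
    bits = [random.randint(0, 1) for _ in range(depth)]   # fair Bernoulli bits
    x = psi(bits)
    assert 0.0 <= x <= 1.0
    if x <= 1.0 / 3.0:       # first bit 0  <=>  x lies in the left third
        left += 1

assert abs(left / n - 0.5) < 0.02    # pushforward gives each half mass 1/2
```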

Applications

Dynamical Systems

In dynamical systems, the pushforward measure plays a central role in defining invariant measures for transformations. Given a measurable map $f \colon X \to X$ on a measure space $(X, \mathcal{A}, \mu)$, the measure $\mu$ is said to be $f$-invariant if $f_*\mu = \mu$, meaning that $\mu(f^{-1}(A)) = \mu(A)$ for every measurable set $A \in \mathcal{A}$. This condition ensures that the measure remains unchanged under the action of $f$, preserving the probabilistic or geometric structure of the space. Such invariant measures are fundamental for studying long-term behavior in systems governed by iterative application of $f$.

A classic example arises for rotations of the unit circle $\mathbb{T} = \mathbb{R}/\mathbb{Z}$, equipped with Lebesgue measure $\lambda$. For an irrational rotation $R_\alpha \colon x \mapsto x + \alpha \pmod{1}$, where $\alpha \in \mathbb{R} \setminus \mathbb{Q}$, the pushforward satisfies $R_{\alpha*}\lambda = \lambda$, making $\lambda$ invariant. This invariance reflects the uniform distribution preserved by the irrational rotation, leading to dense orbits and equidistribution properties essential in ergodic analysis.

More generally, quasi-invariant measures extend this framework to transformations that may distort volumes but preserve null sets. A measure $\mu$ is quasi-invariant under $f$ if $f_*\mu \sim \mu$, meaning $f_*\mu$ and $\mu$ are equivalent (they share the same null sets), and the Radon-Nikodym derivative $\frac{d(f_*\mu)}{d\mu}$ exists and is positive $\mu$-almost everywhere. This derivative quantifies the local stretching or contraction induced by $f$, allowing the study of non-volume-preserving dynamics, such as non-singular transformations.

The connection to ergodic theory highlights how pushforwards underpin the preservation of integrals. For an invariant measure $\mu$, the pushforward ensures $\int g \, d(f_*\mu) = \int (g \circ f) \, d\mu$ for integrable $g$, so that under ergodicity, where invariant sets have $\mu$-measure $0$ or $1$, time averages along orbits converge to space averages. This integral preservation facilitates the analysis of mixing and recurrence in such systems.

An illustrative example is the Bernoulli shift on the symbolic space $\{0, 1\}^{\mathbb{Z}}$, where the shift map $\sigma \colon (x_n)_{n \in \mathbb{Z}} \mapsto (x_{n+1})_{n \in \mathbb{Z}}$ acts by shifting sequences. The product measure $\mu = \left(\frac{1}{2}\delta_0 + \frac{1}{2}\delta_1\right)^{\mathbb{Z}}$ is invariant under $\sigma$, as $\sigma_*\mu = \mu$, and the system is ergodic, modeling independent coin flips whose uniform distribution is preserved across shifts. This construction exemplifies mixing and serves as a prototype for isomorphism classifications in ergodic theory.
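The invariance $R_{\alpha*}\lambda = \lambda$ manifests in orbits: by equidistribution, the fraction of an irrational-rotation orbit falling in an interval approaches the interval's Lebesgue measure. A sketch (the rotation number $\sqrt{2} - 1$ and the test interval are arbitrary choices):

```python
import math

# Orbit of an irrational rotation x -> x + α (mod 1) equidistributes with
# respect to Lebesgue measure, reflecting the invariance R_α* λ = λ.
alpha = math.sqrt(2) - 1          # irrational rotation number
a, b = 0.2, 0.7                   # test interval [a, b)
x, n, hits = 0.0, 100_000, 0
for _ in range(n):
    if a <= x < b:
        hits += 1
    x = (x + alpha) % 1.0

assert abs(hits / n - (b - a)) < 0.01   # time average ≈ λ([a, b)) = 0.5
```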

Statistics and Optimal Transport

In statistical inference, pushforward measures describe the induced distributions of transformed random variables, such as test statistics or estimators derived from observed data. For instance, under a null hypothesis in non-parametric testing, the distribution of a kernel-based test statistic is the pushforward of the empirical data measure under the statistic's mapping, enabling the computation of p-values and critical regions for detecting distributional differences. Similarly, the sampling distribution of an estimator $\hat{\theta}$ in a parametric model is the pushforward of the data-generating measure under the estimation map, which facilitates asymptotic analysis and confidence-interval construction in procedures like maximum likelihood estimation. This transformation perspective unifies various inference tasks by framing them as measure operations that preserve probabilistic structure while adapting to model assumptions.

In optimal transport theory, pushforward measures are central to the definition and computation of Wasserstein distances, which quantify dissimilarity between probability distributions. In the Monge formulation, the $p$-Wasserstein distance $W_p(\mu, \nu)$ between measures $\mu$ and $\nu$ on a metric space is given by

$W_p(\mu, \nu) = \left( \inf_{T_\#\mu = \nu} \int \|x - T(x)\|^p \, d\mu(x) \right)^{1/p},$

where the infimum is over measurable maps $T$ such that the pushforward $T_\#\mu = \nu$, and $\|\cdot\|$ denotes the ground metric; this formulation treats deterministic transport plans as pushforwards that minimize the expected transport cost. The distance has become a cornerstone for comparing empirical distributions in high-dimensional settings, with applications in domain adaptation and generative modeling, where aligning pushforwards ensures metric-aware matching.

Normalizing flows extend this idea to machine learning by using compositions of invertible neural network transformations to push forward a simple base measure, often a standard Gaussian, onto complex target distributions approximating real data. These flows enable exact likelihood computation via the change-of-variables formula while generating samples by applying the inverse map, making them powerful for density estimation and variational inference in tasks like anomaly detection and image synthesis.

Computationally, the Sinkhorn algorithm addresses the scalability of optimal transport by iteratively solving an entropy-regularized problem that approximates optimal couplings, from which near-optimal transport maps can be extracted to define pushforwards between marginals. By adding an entropy term to the transport cost, the algorithm yields smooth, differentiable solutions amenable to gradient-based optimization, with convergence scaling favorably for large problems in machine learning pipelines. The method is widely used for fast computation of Wasserstein barycenters and for unbalanced transport involving pushforwards.
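The Sinkhorn iterations can be sketched in a few lines for discrete measures; the marginal constraints are exactly pushforward constraints on the coupling. A minimal toy sketch (the point locations, weights, and regularization strength are arbitrary choices, not a production implementation):

```python
import math

# Entropic optimal transport between two discrete measures a (on xs) and
# b (on ys), via Sinkhorn iterations on the Gibbs kernel K = exp(-C/eps).
xs, a = [0.0, 1.0], [0.5, 0.5]       # toy source measure
ys, b = [0.0, 2.0], [0.5, 0.5]       # toy target measure
eps = 1.0                            # entropic regularization strength
C = [[(x - y) ** 2 for y in ys] for x in xs]          # squared-distance cost
K = [[math.exp(-c / eps) for c in row] for row in C]  # Gibbs kernel

u, v = [1.0, 1.0], [1.0, 1.0]
for _ in range(500):                 # alternating marginal-matching scalings
    u = [a[i] / sum(K[i][j] * v[j] for j in range(2)) for i in range(2)]
    v = [b[j] / sum(K[i][j] * u[i] for i in range(2)) for j in range(2)]

# Transport plan P = diag(u) K diag(v); its marginals match a and b,
# i.e. the coupling pushes forward onto the two prescribed measures.
P = [[u[i] * K[i][j] * v[j] for j in range(2)] for i in range(2)]
row = [sum(P[i][j] for j in range(2)) for i in range(2)]
col = [sum(P[i][j] for i in range(2)) for j in range(2)]
assert all(abs(row[i] - a[i]) < 1e-6 for i in range(2))
assert all(abs(col[j] - b[j]) < 1e-6 for j in range(2))
```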

Generalizations

Transfer Operators

In the context of dynamical systems, the pushforward measure induces a transfer operator on the space of probability densities, describing how densities evolve under a map $T \colon X \to X$. For a nonsingular map $T$ on an interval, the transfer operator $P_T$, also known as the Frobenius-Perron operator, acts on a density $\rho$ by

$(P_T \rho)(y) = \sum_{x \colon T(x) = y} \frac{\rho(x)}{|T'(x)|},$

where the sum is over all preimages of $y$ under $T$, assuming $T'$ exists and is nonzero at those points.[24] For invertible maps, this reduces to a single term reflecting the change-of-variables formula. The operator preserves the total mass of the density, so $\int (P_T \rho)(y) \, dy = 1$ if $\rho$ is a probability density.[24]

The Frobenius-Perron operator is the adjoint of the Koopman operator $U_T g = g \circ T$, which acts on observables (bounded functions) by composition with $T$. Specifically, for densities in $L^1$ and observables in $L^\infty$, the duality relation $\int (P_T \rho)\, g \, d\mu = \int \rho \,(U_T g) \, d\mu$ holds with respect to a reference measure $\mu$, such as Lebesgue measure. The pushforward measure $T_*\mu$ is the measure-theoretic counterpart of this operator: applying $P_T$ to the density of $\mu$ yields the density of $T_*\mu$.[25]

Spectral properties of the Frobenius-Perron operator reveal key features of the underlying dynamics; in particular, its fixed points $\rho$ satisfying $P_T \rho = \rho$ are precisely the densities of $T$-invariant probability measures absolutely continuous with respect to the reference measure. The leading eigenvalue is typically $1$, corresponding to these invariant densities, while subleading eigenvalues govern the decay of correlations and mixing rates.[26]

A concrete example is the logistic map $T_r(x) = r x (1 - x)$ on $[0, 1]$ with parameter $r > 1$, where the transfer operator $P_{T_r}$ explicitly sums contributions from the two preimages of each point $y$, weighted by the reciprocal of the derivative's magnitude at those preimages. For $r = 4$, the invariant density is explicitly $\rho(x) = \frac{1}{\pi\sqrt{x(1-x)}}$, a fixed point of $P_{T_4}$, illustrating ergodicity and mixing.[27]
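The logistic-map claim can be verified pointwise: for $T(x) = 4x(1-x)$, each $y \in (0, 1)$ has preimages $x_\pm = (1 \pm \sqrt{1-y})/2$, where $|T'(x_\pm)| = 4\sqrt{1-y}$, and the arcsine density is fixed by $P_{T_4}$ exactly. A short check:

```python
import math

def rho(x):
    """Invariant (arcsine) density of the r = 4 logistic map."""
    return 1.0 / (math.pi * math.sqrt(x * (1.0 - x)))

def transfer(y):
    """(P_T rho)(y): sum over the two preimages of y under T(x) = 4x(1-x)."""
    s = math.sqrt(1.0 - y)
    x1, x2 = (1.0 - s) / 2.0, (1.0 + s) / 2.0
    dT = 4.0 * s                   # |T'(x)| = |4 - 8x| at both preimages
    return rho(x1) / dT + rho(x2) / dT

for y in (0.1, 0.37, 0.5, 0.9):
    assert abs(transfer(y) - rho(y)) < 1e-12   # fixed point: P_T rho = rho
```

The identity is exact: both preimages satisfy $x_\pm(1 - x_\pm) = y/4$, so each contributes $\frac{2}{\pi\sqrt{y}} \cdot \frac{1}{4\sqrt{1-y}}$, and the two terms sum to $\rho(y)$.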

Disintegration and Extensions

The disintegration theorem provides a fundamental decomposition of a measure $\mu$ on a measurable space $(X, \Sigma)$ with respect to a measurable map $f \colon X \to Y$ to another measurable space $(Y, T)$, where the pushforward measure $\nu = f_*\mu$ on $Y$ serves as the base. Specifically, under suitable conditions, such as $\mu$ being $\sigma$-finite and the spaces being standard Borel or Polish, there exists a family of probability measures $\{\nu_y\}_{y \in Y}$ on $X$, unique up to $\nu$-almost-everywhere equality, such that each $\nu_y$ is concentrated on the fiber $f^{-1}(y)$ (i.e., $\nu_y(f^{-1}(y)) = 1$) and the original measure disintegrates as

$\mu(E) = \int_Y \nu_y(E \cap f^{-1}(y)) \, d\nu(y)$

for every $E \in \Sigma$.[28] This formulation extends the intuitive notion of conditional measures along the fibers of $f$, with the pushforward $\nu$ parameterizing the decomposition.[29]

Extensions of the disintegration theorem apply beyond probability measures to more general settings, including $\sigma$-finite or Radon measures on non-compact spaces. For instance, when $\mu$ is a totally finite Radon measure and $f$ is a Borel map between locally compact Hausdorff spaces, the family $\{\mu_y\}$ consists of Radon measures on the fibers, satisfying the integral decomposition without requiring normalization to probabilities.[28] In infinite-dimensional or non-locally-compact spaces, such as separable metric spaces with countably compact measures, the theorem holds provided the pushforward $\nu$ is analytic, ensuring the existence of measurable selections for the fiber measures.[28] These generalizations facilitate applications to infinite product measures and Gaussian processes on Hilbert spaces, where fibers may be uncountable.[28]

The disintegration theorem is intimately related to conditional expectations in $L^1(\mu)$-spaces and to martingale theory in stochastic processes. For an integrable function $g \colon X \to \mathbb{R}$, the function $h(x) = \int g \, d\nu_{f(x)}$ provides a version of the conditional expectation $\mathbb{E}[g \mid f]$, which is $\Sigma_f$-measurable (where $\Sigma_f = \{f^{-1}(F) \colon F \in T\}$) and satisfies $\int_A h \, d\mu = \int_A g \, d\mu$ for every $A \in \Sigma_f$.[28] In stochastic processes, this connection manifests in filtrations, where disintegrations yield conditional distributions given stopping times, underpinning martingale convergence and optional sampling theorems; for example, in Brownian-motion filtrations, the fiber measures $\nu_y$ represent conditional laws that preserve martingale properties almost surely.[29]

Pushforward measures also arise naturally in the theory of fiber bundles, where they describe integration over the fibers of a fibered measure space. For a fiber bundle $\pi \colon E \to B$ with a measure $\mu$ on the total space $E$, the disintegration theorem gives

$\mu(\pi^{-1}(F)) = \int_F \mu_b(\pi^{-1}(b)) \, d(\pi_*\mu)(b)$

for measurable $F \subseteq B$, where the $\mu_b$ are the fiber measures. In homogeneous spaces modeled as principal bundles $G \to G/H$ for locally compact groups, explicit formulas for pushforwards incorporate modular functions to account for the geometry: for a density $\phi$ on $G$, the pushforward satisfies relations like

$\int_{G/H} \left( \int_H \nu(gh) \, dh \right) d\bigl(p_*(\phi \, dg)\bigr) = \int_G \nu(g) \left( \int_H \phi(gh) \frac{\Delta_G(h)}{\Delta_H(h)} \, dh \right) dg,$

enabling computations in ergodic theory and representation theory.[30]
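For a discrete measure and a projection map, the disintegration is just conditioning, and the reassembly formula can be checked exactly. A sketch with an arbitrary toy measure on pairs:

```python
from collections import Counter, defaultdict

# Disintegration of a discrete measure mu on X = pairs (y, i) along the
# projection f(y, i) = y: base measure nu = f_* mu on Y, plus conditional
# probability measures nu_y on each fiber f^{-1}(y).
mu = {("a", 0): 0.1, ("a", 1): 0.3, ("b", 0): 0.2, ("b", 1): 0.4}
f = lambda p: p[0]

nu = Counter()
for p, m in mu.items():
    nu[f(p)] += m                 # pushforward: nu(y) = mu(fiber over y)

nu_y = defaultdict(dict)
for p, m in mu.items():
    nu_y[f(p)][p] = m / nu[f(p)]  # normalized fiber (conditional) measure

# Reassemble mu(E) = ∫_Y nu_y(E ∩ f^{-1}(y)) dnu(y) for a test set E:
E = {("a", 1), ("b", 0)}
reassembled = sum(nu[y] * sum(w for p, w in nu_y[y].items() if p in E)
                  for y in nu)
assert abs(reassembled - (0.3 + 0.2)) < 1e-12   # agrees with mu(E)
```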

