Entropy rate
from Wikipedia

In the mathematical theory of probability, the entropy rate or source information rate is a function assigning an entropy to a stochastic process.

For a strongly stationary process, the conditional entropy of the latest random variable eventually tends towards this rate value.

Definition


A process X with a countable index gives rise to the sequence of its joint entropies H_n(X) = H(X_1, X_2, \dots, X_n). If the limit exists, the entropy rate is defined as

H(X) := \lim_{n \to \infty} \frac{1}{n} H_n(X).

Note that given any sequence (a_n)_n with a_0 = 0 and letting \Delta a_k := a_k - a_{k-1}, by telescoping one has a_n = \sum_{k=1}^{n} \Delta a_k. The entropy rate thus computes the mean of the first n such entropy changes, with n going to infinity. The n-th entropy change \Delta H_n(X) = H_n(X) - H_{n-1}(X) is itself the conditional entropy H(X_n \mid X_{n-1}, \dots, X_1). The entropy rate is thus the average entropy of the distribution of the n-th variable once the previous ones are known. The behaviour of joint entropies from one index to the next is also an explicit subject of some characterizations of entropy.
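A brute-force computation makes the telescoping picture concrete. The minimal sketch below uses a hypothetical two-state Markov chain with assumed transition probabilities (not an example from the article), enumerates all length-n sequences to obtain the joint entropies H_n(X), and prints both the per-symbol average H_n(X)/n and the increment H_n(X) - H_{n-1}(X); the two columns approach the same limit, the entropy rate.

```python
# Brute-force illustration (assumed example chain, not from the article):
# enumerate all length-n sequences of a two-state Markov chain, compute the
# joint entropies H_n = H(X_1, ..., X_n), and watch both H_n / n and the
# increments H_n - H_{n-1} approach the entropy rate.
import itertools
import numpy as np

P = np.array([[0.9, 0.1],      # transition matrix (rows sum to 1); assumed values
              [0.4, 0.6]])
mu0 = np.array([0.5, 0.5])     # initial distribution (arbitrary choice)

def seq_prob(seq):
    """Probability of one state sequence under (mu0, P)."""
    p = mu0[seq[0]]
    for a, b in zip(seq[:-1], seq[1:]):
        p *= P[a, b]
    return p

def joint_entropy(n):
    """H(X_1, ..., X_n) in bits, by summing over all 2^n sequences."""
    probs = [seq_prob(s) for s in itertools.product(range(2), repeat=n)]
    return -sum(p * np.log2(p) for p in probs if p > 0)

H = [joint_entropy(n) for n in range(1, 13)]
for n in range(2, 13):
    # per-symbol average vs. n-th entropy increment
    print(n, H[n - 1] / n, H[n - 1] - H[n - 2])
```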

Discussion


While X may be understood as a sequence of random variables, the entropy rate represents the average entropy change per random variable, in the long term.

It can be thought of as a general property of stochastic sources - this is the subject of the asymptotic equipartition property.

For strongly stationary processes


A stochastic process also gives rise to a sequence of conditional entropies, comprising more and more random variables. For strongly stationary stochastic processes, the entropy rate equals the limit of that sequence:

H(X) = \lim_{n \to \infty} H(X_n \mid X_{n-1}, \dots, X_1).

The quantity given by the limit on the right is also denoted H'(X), notation that is motivated by the fact that it is again a rate associated with the process, in the above sense.

For Markov chains


Since a stochastic process defined by a Markov chain that is irreducible and aperiodic has a stationary distribution, the entropy rate is independent of the initial distribution.[1]

For example, consider a Markov chain Y_k defined on a countable number of states. Given its right stochastic transition matrix P_{ij} and an entropy

h_i := -\sum_j P_{ij} \log P_{ij}

associated with each state, one finds

H(Y) = \sum_i \mu_i h_i,

where \mu_i is the asymptotic distribution of the chain.
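The formula above lends itself to a direct computation. The following minimal Python sketch, using an assumed 3-state transition matrix rather than anything from the article, obtains \mu as the left eigenvector of P for eigenvalue 1 and then averages the per-state entropies.

```python
# Minimal sketch (assumed example numbers) of the closed form above:
# find the stationary distribution mu of a right stochastic matrix P, then
# combine the per-state entropies h_i weighted by mu.
import numpy as np

P = np.array([[0.7, 0.2, 0.1],    # assumed 3-state transition matrix
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
mu = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
mu = mu / mu.sum()

# Per-state entropies h_i = -sum_j P_ij log2 P_ij, in bits.
with np.errstate(divide="ignore", invalid="ignore"):
    state_entropies = -np.nansum(P * np.log2(P), axis=1)

entropy_rate = float(mu @ state_entropies)
print(mu, entropy_rate)
```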

In particular, it follows that the entropy rate of an i.i.d. stochastic process is the same as the entropy of any individual member in the process.

For hidden Markov models


The entropy rate of hidden Markov models (HMM) has no known closed-form solution. However, it has known upper and lower bounds. Let the underlying Markov chain X_{1:\infty} be stationary, and let Y_{1:\infty} be the observable states; then we have

H(Y_n \mid X_1, Y_{1:n-1}) \leq H(Y) \leq H(Y_n \mid Y_{1:n-1}),

and in the limit n \to \infty, both sides converge to the middle.[2]
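The sandwich above can be checked numerically for a small chain. The sketch below uses assumed 2-state transition and emission matrices (not from the article) and evaluates both conditional entropies by brute-force enumeration of observation sequences with the forward recursion; it illustrates the inequality and is not a practical estimator.

```python
# Brute-force check of the sandwich bounds for a small, assumed HMM
# (2 hidden states, 2 observation symbols).
# Upper bound:  H(Y_n | Y_{1:n-1})       = H(Y_{1:n}) - H(Y_{1:n-1})
# Lower bound:  H(Y_n | Y_{1:n-1}, X_1)  = H(Y_{1:n} | X_1) - H(Y_{1:n-1} | X_1)
import itertools
import numpy as np

T = np.array([[0.8, 0.2], [0.3, 0.7]])   # hidden-state transitions (assumed)
B = np.array([[0.9, 0.1], [0.2, 0.8]])   # emission probabilities P(y | x) (assumed)

# Stationary distribution of the hidden chain (used as initial distribution).
w, v = np.linalg.eig(T.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

def p_obs(y, x1=None):
    """P(y_1..y_n) via the forward recursion; optionally condition on X_1 = x1."""
    alpha = (pi if x1 is None else np.eye(2)[x1]) * B[:, y[0]]
    for sym in y[1:]:
        alpha = (alpha @ T) * B[:, sym]
    return alpha.sum()

def H_obs(n, x1=None):
    """Joint entropy of Y_1..Y_n in bits (conditioned on X_1 = x1 if given)."""
    probs = [p_obs(y, x1) for y in itertools.product(range(2), repeat=n)]
    return -sum(p * np.log2(p) for p in probs if p > 0)

for n in range(2, 11):
    upper = H_obs(n) - H_obs(n - 1)
    lower = sum(pi[x] * (H_obs(n, x) - H_obs(n - 1, x)) for x in range(2))
    print(n, lower, upper)   # both columns squeeze toward the entropy rate
```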

Applications


The entropy rate may be used to estimate the complexity of stochastic processes. It is used in diverse applications ranging from characterizing the complexity of languages and blind source separation to optimizing quantizers and data compression algorithms. For example, a maximum entropy rate criterion may be used for feature selection in machine learning.[3]

from Grokipedia
The entropy rate is a central concept in information theory that quantifies the average amount of uncertainty or information produced per symbol or per unit time by a stochastic process, serving as a measure of its intrinsic randomness and predictability. Formally, for a discrete-time stationary stochastic process \{X_i\} with finite alphabet, the entropy rate H(X) is defined as \lim_{n \to \infty} \frac{1}{n} H(X_1, \dots, X_n), where H denotes the joint Shannon entropy; this limit exists and equals \lim_{n \to \infty} H(X_n \mid X_1, \dots, X_{n-1}) for stationary processes. An analogous rate applies to continuous-valued processes, defined via joint densities as \lim_{n \to \infty} \frac{1}{n} h(X_1, \dots, X_n), capturing uncertainty for continuous-valued sources. Introduced by Claude Shannon in his foundational 1948 paper "A Mathematical Theory of Communication," the entropy rate extends the notion of entropy from single random variables to sequences, providing the theoretical foundation for efficient encoding of information sources. For independent and identically distributed (i.i.d.) processes, it simplifies to the entropy of a single symbol, H(X_i); for first-order Markov chains, it is H(X) = -\sum_i \pi_i \sum_j P_{ij} \log_2 P_{ij}, where \pi is the stationary distribution and P the transition matrix, highlighting the dependence structure.

In ergodic processes, the asymptotic equipartition property (AEP) ensures that typical sequences have probability approximately 2^{-n H(X)}, enabling near-optimal compression at this rate. The entropy rate plays a pivotal role in data compression, where it sets the minimal achievable rate for lossless encoding without redundancy, as per Shannon's source coding theorem. It also informs bounds in noisy communication systems and models predictability in diverse applications, including genomic sequences and neural signal analysis. For non-stationary or asymptotically mean stationary processes, extensions via ergodic decomposition preserve the rate as an integral over component measures, ensuring robustness in practical estimations.
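As a worked illustration with an assumed parameter (not taken from the article), consider an i.i.d. binary source with P(X_i = 1) = 0.25. Its entropy rate is the single-symbol entropy

H(X) = -0.25 \log_2 0.25 - 0.75 \log_2 0.75 = 0.25 \cdot 2 + 0.75 \cdot 0.415\ldots \approx 0.811 \text{ bits per symbol}.

By the AEP, a typical block of n = 1000 symbols then has probability on the order of 2^{-811}, so lossless coding needs only about 811 bits for it rather than the raw 1000.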

Fundamentals

Historical Development

The concept of entropy rate in information theory traces its conceptual roots to statistical mechanics in the late nineteenth century, where Ludwig Boltzmann introduced a measure of entropy as the logarithm of the number of microstates corresponding to a macrostate, quantifying disorder in physical systems, and J. Willard Gibbs later formalized the Gibbs entropy for statistical ensembles, providing a probabilistic framework that influenced subsequent developments in uncertainty measures. These thermodynamic notions served as analogies for information-theoretic entropy, marking a shift toward applying entropy-like quantities to communication and prediction in the twentieth century.

Claude Shannon formalized entropy in information theory as a measure of average uncertainty in a random variable in his 1948 paper "A Mathematical Theory of Communication," where he extended this to the entropy rate for stochastic processes, defining it as the limit of the entropy per symbol in increasingly long sequences to capture the average information production rate of a source. This extension built directly on Shannon's single-symbol entropy, adapting it to model the unpredictability in sequential data like communication channels.

In the 1950s and 1960s, the entropy rate concept evolved through connections to relative entropy and process predictability, notably advanced by Solomon Kullback and Richard Leibler, who introduced the Kullback-Leibler divergence in 1951 as a measure of information loss between probability distributions, enabling comparisons of predictability in stochastic processes. Key milestones included applications to Markov processes in the 1950s, with Brockway McMillan proving in 1953 that for stationary ergodic processes, the entropy rate equals the limit of the conditional entropies for blocks of increasing length, establishing the existence of the limit for such processes, and Leo Breiman establishing in 1957 the individual ergodic theorem for information, which guarantees almost sure convergence of the per-symbol entropy to the rate for ergodic sources.

By the 1970s, the entropy rate received further formalization within ergodic theory, particularly through Donald Ornstein's 1970 proof that Bernoulli shifts with equal entropy rates are isomorphic, solidifying the rate's role as a complete invariant for classifying such processes and bridging information theory with dynamical systems. This period emphasized the rate's invariance properties, advancing its application beyond communication to broader probabilistic modeling.

Definition and Properties

The entropy rate of a discrete-time stochastic process \{X_i\}_{i=1}^\infty with finite alphabet is formally defined as

H = \lim_{n \to \infty} \frac{1}{n} H(X_1, \dots, X_n),

where H(X_1, \dots, X_n) denotes the joint Shannon entropy of the first n random variables, provided the limit exists. This definition quantifies the average uncertainty or information content per symbol in the long run, building on Shannon entropy for finite sequences. The entropy rate is typically measured in bits per symbol when using the base-2 logarithm or in nats per symbol with the natural logarithm.

It possesses several key properties: non-negativity, ensuring H \geq 0, with equality holding only for deterministic processes where outcomes are certain; additivity for independent processes, such that if \{X_i\} and \{Y_i\} are independent, then the entropy rate of the joint process satisfies H_{XY} = H_X + H_Y; monotonicity with respect to process memory, where increased dependence (longer memory) yields a lower or equal entropy rate compared to less dependent processes; and invariance under bijective relabeling of the alphabet, preserving the rate regardless of symbol renaming. The rate relates directly to the predictability of the process: a lower value indicates greater predictability due to underlying statistical structure, while a zero rate corresponds to fully deterministic sequences with no residual uncertainty. For continuous-time or continuous-alphabet processes with probability density functions, the concept generalizes via the differential entropy rate, defined analogously as the limit of the average differential entropy per dimension or time unit.
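The additivity property can be verified numerically. The sketch below, with assumed example transition matrices, uses the standard closed form for Markov chains (discussed elsewhere in the article) together with the fact that the joint process of two independent chains is again Markov with transition matrix kron(P_X, P_Y).

```python
# Quick numerical check (assumed example chains) of additivity: for two
# independent Markov chains X and Y, the joint process Z_i = (X_i, Y_i) is
# Markov with transition kron(PX, PY), and its rate equals the sum of rates.
import numpy as np

def markov_entropy_rate(P):
    """Entropy rate (bits/symbol) of an irreducible chain with transition matrix P."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    pi /= pi.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        per_state = -np.nansum(P * np.log2(P), axis=1)
    return float(pi @ per_state)

PX = np.array([[0.9, 0.1], [0.2, 0.8]])   # assumed example values
PY = np.array([[0.5, 0.5], [0.3, 0.7]])

rate_sum = markov_entropy_rate(PX) + markov_entropy_rate(PY)
rate_joint = markov_entropy_rate(np.kron(PX, PY))
print(rate_sum, rate_joint)   # agree up to floating-point error
```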

Theoretical Foundations

Stationary Processes

A strongly stationary stochastic process, also known as a strict-sense stationary process, is one where the joint probability distributions of any finite collection of random variables remain invariant under time shifts. This means that for any integer k and any n, the joint distribution of (X_1, \dots, X_n) equals that of (X_{k+1}, \dots, X_{k+n}), ensuring that statistical properties like means, variances, and higher-order moments do not change over time. This time-invariance is a prerequisite for defining a consistent entropy rate, as it allows the entropy of finite-length blocks to be independent of their starting position in the sequence.

For such processes, the entropy rate H(X) is defined as the limit of the average conditional block entropy per symbol, H(X) = \lim_{n \to \infty} \frac{1}{n} H(X_{n+1}, \dots, X_{2n} \mid X_1, \dots, X_n), but is more commonly expressed using the one-step-ahead conditional entropy: H(X) = \lim_{n \to \infty} H(X_n \mid X_1, \dots, X_{n-1}). This limit equals the infimum over n of H(X_{n+1} \mid X_1, \dots, X_n), reflecting the minimal uncertainty added by each new symbol given the entire past. The existence of this limit is guaranteed by the subadditivity of joint entropy: the joint entropy H(X_1, \dots, X_n) satisfies H(X_1, \dots, X_{n+m}) \leq H(X_1, \dots, X_n) + H(X_{n+1}, \dots, X_{n+m}), which, combined with stationarity, ensures the sequence \frac{1}{n} H(X_1, \dots, X_n) converges. This formulation specializes the general entropy rate to cases where temporal homogeneity simplifies computations and interpretations.

In ergodic theory, an equivalent definition arises through the Kolmogorov-Sinai entropy, which quantifies the average information generated by iterating a measure-preserving transformation on a probability space. For an invariant measure \mu on a space M, a finite partition \alpha = \{C_1, \dots, C_r\}, and the shift map T: M \to M, the entropy with respect to \alpha is H(\mu, \alpha) = \lim_{n \to \infty} \frac{1}{n} H\left(\mu, \bigvee_{i=0}^{n-1} T^{-i} \alpha \right), where \bigvee denotes the join of partitions (refining them successively) and H(\mu, \cdot) is the Shannon entropy of the induced measure on the partition. The overall system entropy is then h(T) = \sup_{\alpha} H(\mu, \alpha), taken over all finite partitions. This metric invariant connects dynamical systems to stochastic processes by viewing the process as generated by symbolic dynamics from the partition, yielding the same rate as the probabilistic definition for stationary ergodic sources.

Illustrative examples highlight these concepts. For independent and identically distributed (IID) processes, stationarity holds trivially, and the entropy rate simplifies to the single-symbol entropy H(X) = H(X_1), as the conditional entropies H(X_n \mid X_1, \dots, X_{n-1}) = H(X_1) for all n > 1. In processes with finite memory, such as those depending only on the previous m symbols (e.g., m-th order Markov processes), the entropy rate equals H(X_{m+1} \mid X_1, \dots, X_m), stabilizing after the memory length and computable from the stationary distribution over states, of which there are at most the alphabet size raised to the power m. These cases demonstrate how stationarity enables exact expressions, contrasting with non-stationary processes where limits may not exist or require additional assumptions.
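To illustrate the finite-memory case, the following sketch (with assumed conditional probabilities, not taken from the article) treats an order-2 binary process, lifts it to a first-order chain on pairs of symbols, and computes the entropy rate as the stationary average of H(X_{m+1} \mid X_1, \dots, X_m).

```python
# Sketch (assumed parameters) of the m-th order Markov case with m = 2:
# lift the order-2 binary process to a first-order chain on pairs (x_{t-1}, x_t),
# find the stationary distribution of the lifted chain, and average the
# conditional entropies H(X_{t+1} | X_{t-1}, X_t) over it.
import numpy as np

# q[a, b] = P(X_{t+1} = 1 | X_{t-1} = a, X_t = b), assumed example values.
q = np.array([[0.1, 0.6],
              [0.3, 0.9]])

# Lifted chain on pair states indexed as 2*a + b; (a, b) -> (b, c) with
# probability q[a, b] if c = 1, else 1 - q[a, b].
P = np.zeros((4, 4))
for a in range(2):
    for b in range(2):
        for c in range(2):
            P[2 * a + b, 2 * b + c] = q[a, b] if c == 1 else 1 - q[a, b]

# Stationary distribution of the lifted chain.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

# Entropy rate = sum over pair states of pi(a, b) * H(X_{t+1} | (a, b)).
rate = sum(pi[2 * a + b] * h2(q[a, b]) for a in range(2) for b in range(2))
print(rate)
```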

Asymptotic Equipartition Property

The asymptotic equipartition property (AEP) characterizes the behavior of sequences generated by a stationary ergodic stochastic process with finite entropy rate H. For such a process \{X_i\}_{i=1}^\infty taking values in a finite alphabet, the \epsilon-typical set A_\epsilon^{(n)} consists of all length-n sequences x_1^n satisfying \left| -\frac{1}{n} \log_2 P(x_1^n) - H \right| < \epsilon.
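A Monte Carlo sketch, for an assumed i.i.d. Bernoulli source rather than a general ergodic process, shows the concentration behind the typical set: the per-symbol surprisal of sampled blocks clusters within \epsilon of the entropy rate.

```python
# Monte Carlo sketch of the AEP for an assumed i.i.d. Bernoulli(0.3) source:
# sample blocks of length n, compute the per-symbol surprisal -log2 P(x_1^n)/n,
# and count how often it falls within epsilon of the entropy rate H.
import numpy as np

rng = np.random.default_rng(0)
p, n, eps, trials = 0.3, 1000, 0.05, 5_000   # assumed example parameters

H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # entropy rate of the i.i.d. source

x = rng.random((trials, n)) < p                   # Bernoulli(p) sample blocks
ones = x.sum(axis=1)
# log2 P(x_1^n) = (#ones) log2 p + (#zeros) log2 (1 - p)
surprisal = -(ones * np.log2(p) + (n - ones) * np.log2(1 - p)) / n

print(H, surprisal.mean(), np.mean(np.abs(surprisal - H) < eps))
# The mean surprisal is close to H, and the typical-set fraction approaches 1.
```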