Typical set
from Wikipedia

In information theory, the typical set is a set of sequences whose probability is close to $2^{-nH(X)}$, where $n$ is the sequence length and $H(X)$ is the entropy of the source distribution. That this set has total probability close to one is a consequence of the asymptotic equipartition property (AEP), which is a kind of law of large numbers. The notion of typicality is only concerned with the probability of a sequence and not the actual sequence itself.

This has great use in compression theory as it provides a theoretical means for compressing data, allowing us to represent any sequence $X^n$ using roughly $nH(X)$ bits on average, and, hence, justifying the use of entropy as a measure of information from a source.

The AEP can also be proven for a large class of stationary ergodic processes, allowing the typical set to be defined in more general cases.

Additionally, the typical set concept is foundational in understanding the limits of data transmission and error correction in communication systems. The properties of typical sequences underpin Shannon's source coding theorem and channel coding theorem, which establish the achievability of near-optimal data compression and reliable transmission over noisy channels.

(Weakly) typical sequences (weak typicality, entropy typicality)


If a sequence $x_1, \dots, x_n$ is drawn from an independent identically-distributed (i.i.d.) random variable $X$ defined over a finite alphabet $\mathcal{X}$, then the typical set, $A_\epsilon^{(n)} \subseteq \mathcal{X}^n$, is defined as those sequences which satisfy

$$2^{-n(H(X)+\epsilon)} \leq p(x_1, x_2, \dots, x_n) \leq 2^{-n(H(X)-\epsilon)}$$

where

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)$$

is the information entropy of $X$. The probability above need only be within a factor of $2^{n\epsilon}$. Taking the logarithm on all sides and dividing by $-n$, this definition can be equivalently stated as

$$H(X) - \epsilon \leq -\frac{1}{n} \log_2 p(x_1, x_2, \dots, x_n) \leq H(X) + \epsilon.$$

For an i.i.d. sequence, since

$$p(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} p(x_i),$$

we further have

$$-\frac{1}{n} \log_2 p(x_1, x_2, \dots, x_n) = -\frac{1}{n} \sum_{i=1}^{n} \log_2 p(x_i).$$

By the law of large numbers, for sufficiently large $n$

$$-\frac{1}{n} \sum_{i=1}^{n} \log_2 p(x_i) \to H(X).$$
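
To make the definition concrete, here is a minimal Python sketch (not part of the original article) that brute-force enumerates the weakly ε-typical set of a small i.i.d. source; the distribution `p`, block length `n`, and tolerance `eps` are arbitrary illustrative choices.

```python
# Enumerate the weakly epsilon-typical set of a small i.i.d. source.
import itertools
import math

p = {"a": 0.5, "b": 0.25, "c": 0.25}   # assumed source distribution p(x)
n, eps = 8, 0.1

H = -sum(px * math.log2(px) for px in p.values())  # entropy H(X)

def is_weakly_typical(seq):
    # Typical iff |-(1/n) log2 p(seq) - H(X)| <= eps.
    log_prob = sum(math.log2(p[sym]) for sym in seq)
    return abs(-log_prob / len(seq) - H) <= eps

typical = [s for s in itertools.product(p, repeat=n) if is_weakly_typical(s)]
mass = sum(math.prod(p[sym] for sym in s) for s in typical)
print(f"H(X) = {H:.3f}")
print(f"|A| = {len(typical)} of {len(p)**n} sequences, total mass {mass:.3f}")
```

For this particular source the typical set turns out to be exactly the sequences containing four `a`s; at such a small `n` its total mass is still modest, and it approaches 1 only as `n` grows.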

Properties


An essential characteristic of the typical set is that, if one draws a large number $n$ of independent random samples from the distribution $X$, the resulting sequence $(x_1, x_2, \dots, x_n)$ is very likely to be a member of the typical set, even though the typical set comprises only a small fraction of all the possible sequences. Formally, given any $\epsilon > 0$, one can choose $n$ such that:

  1. The probability of a sequence drawn from $X$ being in $A_\epsilon^{(n)}$ is greater than $1 - \epsilon$, i.e.
     $$\Pr\left[x^{(n)} \in A_\epsilon^{(n)}\right] \geq 1 - \epsilon$$
  2. If the distribution over $\mathcal{X}$ is not uniform, then the fraction of sequences that are typical is
     $$\frac{\left|A_\epsilon^{(n)}\right|}{\left|\mathcal{X}\right|^n} \approx \frac{2^{nH(X)}}{2^{n \log_2 |\mathcal{X}|}} = 2^{-n(\log_2 |\mathcal{X}| - H(X))} \to 0$$
     as $n$ becomes very large, since $H(X) < \log_2 |\mathcal{X}|$, where $|\mathcal{X}|$ is the cardinality of $\mathcal{X}$.
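
Both properties are easy to observe numerically. The following hedged sketch, using an arbitrary biased coin ($p(1) = 0.9$) and illustrative values of `n`, `eps`, and `trials`, estimates $\Pr[x^{(n)} \in A_\epsilon^{(n)}]$ by Monte Carlo and prints the vanishing upper bound on the fraction of sequences that are typical.

```python
# Drawn sequences almost always land in the typical set, even though
# typical sequences are a vanishing fraction of all 2^n sequences.
import math
import random

p1, n, eps, trials = 0.9, 1000, 0.1, 2000
H = -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))

random.seed(0)
hits = 0
for _ in range(trials):
    ones = sum(random.random() < p1 for _ in range(n))
    log_prob = ones * math.log2(p1) + (n - ones) * math.log2(1 - p1)
    hits += abs(-log_prob / n - H) <= eps

print(f"estimated Pr(A) ~ {hits / trials:.4f}")   # close to 1
# |A| <= 2^{n(H+eps)}, so the typical fraction of all 2^n sequences
# is at most 2^{-n(1 - H - eps)}.
print(f"typical fraction <= 2^-{n * (1 - H - eps):.0f}")
```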

For a general stochastic process $\{X(t)\}$ with AEP, the (weakly) typical set can be defined similarly, with $p(x_1, x_2, \dots, x_n)$ replaced by $p(x_0^\tau)$ (i.e. the probability of the sample limited to the time interval $[0, \tau]$), $n$ being the number of degrees of freedom of the process in the time interval, and $H(X)$ being the entropy rate. If the process is continuous-valued, differential entropy is used instead.

Example


Counter-intuitively, the most likely sequence is often not a member of the typical set. For example, suppose that $X$ is an i.i.d. Bernoulli random variable with $p(0) = 0.1$ and $p(1) = 0.9$. In $n$ independent trials, since $p(1) > p(0)$, the most likely sequence of outcomes is the sequence of all 1's, $(1, 1, \dots, 1)$. Here the entropy of $X$ is $H(X) = 0.469$, while

$$-\frac{1}{n} \log_2 p\big(x^{(n)} = (1, 1, \dots, 1)\big) = -\frac{1}{n} \log_2 (0.9^n) = 0.152.$$

So this sequence is not in the typical set, because its average logarithmic probability cannot come arbitrarily close to the entropy of the random variable $X$ no matter how large we take the value of $n$.

For Bernoulli random variables, the typical set consists of sequences with average numbers of 0s and 1s in $n$ independent trials. This is easily demonstrated: if $p(1) = p$ and $p(0) = 1 - p$, then for $n$ trials with $m$ 1's, we have

$$-\frac{1}{n} \log_2 p\big(x^{(n)}\big) = -\frac{1}{n} \log_2 \left( p^m (1-p)^{n-m} \right) = -\frac{m}{n} \log_2 p - \frac{n-m}{n} \log_2 (1-p).$$

The average number of 1's in a sequence of Bernoulli trials is $m = np$. Thus, we have

$$-\frac{1}{n} \log_2 p\big(x^{(n)}\big) = -p \log_2 p - (1-p) \log_2 (1-p) = H(X).$$

For this example, if $n = 10$, then the typical set consists of all sequences that have a single 0 in the entire sequence. In the case $p(0) = p(1) = 0.5$, every possible binary sequence belongs to the typical set.
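
The $n = 10$ claim can be checked directly. In this short sketch (illustrative, not from the article), the count of 0s determines the average log-probability shared by every sequence with that composition; only the single-0 composition matches $H(X)$.

```python
import math

p0, p1, n = 0.1, 0.9, 10
H = -(p0 * math.log2(p0) + p1 * math.log2(p1))   # ~0.469 bits

for zeros in range(n + 1):
    # Average log-probability shared by every sequence with this many 0s.
    avg = -(zeros * math.log2(p0) + (n - zeros) * math.log2(p1)) / n
    mark = "  <- matches H(X)" if abs(avg - H) < 1e-9 else ""
    print(f"{zeros} zeros: -(1/n) log2 p = {avg:.3f}{mark}")
```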

Strongly typical sequences (strong typicality, letter typicality)


If a sequence $x_1, \dots, x_n$ is drawn from some specified joint distribution defined over a finite or an infinite alphabet $\mathcal{X}$, then the strongly typical set, $A_{\epsilon,\text{strong}}^{(n)} \subseteq \mathcal{X}^n$, is defined as the set of sequences which satisfy

$$\left| \frac{N(x_i)}{n} - p(x_i) \right| < \frac{\epsilon}{\|\mathcal{X}\|}$$

where $N(x_i)$ is the number of occurrences of a specific symbol $x_i$ in the sequence.

It can be shown that strongly typical sequences are also weakly typical (with a different constant ε), and hence the name. The two forms, however, are not equivalent. Strong typicality is often easier to work with in proving theorems for memoryless channels. However, as is apparent from the definition, this form of typicality is only defined for random variables having finite support.
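
As a concrete illustration, the following sketch (with an assumed three-symbol distribution and an arbitrary `eps`) tests the strong-typicality condition by comparing empirical symbol frequencies against $p(x)$.

```python
from collections import Counter

# Assumed source distribution and tolerance for the illustration.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
eps = 0.1

def is_strongly_typical(seq):
    # Every symbol's empirical frequency must be within eps/|alphabet|
    # of its true probability p(x).
    n = len(seq)
    counts = Counter(seq)
    return all(abs(counts[x] / n - px) < eps / len(p) for x, px in p.items())

print(is_strongly_typical("a" * 5 + "b" * 3 + "c" * 2))  # True: exact match
print(is_strongly_typical("a" * 8 + "b" + "c"))          # False: too many a's
```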

Jointly typical sequences


Two sequences $x^n$ and $y^n$ are jointly ε-typical if the pair $(x^n, y^n)$ is ε-typical with respect to the joint distribution $p(x^n, y^n) = \prod_{i=1}^{n} p(x_i, y_i)$ and both $x^n$ and $y^n$ are ε-typical with respect to their marginal distributions $p(x^n)$ and $p(y^n)$. The set of all such pairs of sequences is denoted by $A_\epsilon^{(n)}(X, Y)$. Jointly ε-typical $n$-tuple sequences are defined similarly.

Let $\tilde{X}^n$ and $\tilde{Y}^n$ be two independent sequences of random variables with the same marginal distributions $p(x^n)$ and $p(y^n)$. Then for any ε > 0, for sufficiently large $n$, jointly typical sequences satisfy the following properties:

  1. $\Pr\left[ (X^n, Y^n) \in A_\epsilon^{(n)}(X, Y) \right] \geq 1 - \epsilon$
  2. $\left| A_\epsilon^{(n)}(X, Y) \right| \leq 2^{n(H(X,Y) + \epsilon)}$
  3. $\left| A_\epsilon^{(n)}(X, Y) \right| \geq (1 - \epsilon)\, 2^{n(H(X,Y) - \epsilon)}$
  4. $\Pr\left[ (\tilde{X}^n, \tilde{Y}^n) \in A_\epsilon^{(n)}(X, Y) \right] \leq 2^{-n(I(X;Y) - 3\epsilon)}$
  5. $\Pr\left[ (\tilde{X}^n, \tilde{Y}^n) \in A_\epsilon^{(n)}(X, Y) \right] \geq (1 - \epsilon)\, 2^{-n(I(X;Y) + 3\epsilon)}$
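
A direct way to internalize the definition is to test it on small sequences. This sketch (the joint distribution and the sample pair are illustrative assumptions) checks ε-typicality of the pair under $p(x, y)$ and of each sequence under its marginal.

```python
import math

# Assumed joint distribution over binary X, Y, with uniform marginals.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px = {0: 0.5, 1: 0.5}
py = {0: 0.5, 1: 0.5}

def entropy(dist):
    return -sum(q * math.log2(q) for q in dist.values())

def eps_typical(seq, dist, eps):
    # Weak typicality of seq with respect to dist.
    log_p = sum(math.log2(dist[s]) for s in seq)
    return abs(-log_p / len(seq) - entropy(dist)) <= eps

def jointly_typical(xs, ys, eps=0.2):
    return (eps_typical(list(zip(xs, ys)), pxy, eps)
            and eps_typical(xs, px, eps)
            and eps_typical(ys, py, eps))

xs = (0, 0, 1, 1, 0, 1, 0, 1, 1, 0)
ys = (0, 0, 1, 1, 1, 1, 0, 0, 1, 0)
print(jointly_typical(xs, ys))   # True for this balanced pair
```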

Applications of typicality


Typical set encoding


In information theory, typical set encoding encodes only the sequences in the typical set of a stochastic source with fixed-length block codes. Since the size of the typical set is about $2^{nH(X)}$, only $nH(X)$ bits are required for the coding, while at the same time ensuring that the chance of encoding error is bounded by ε. Asymptotically, it is, by the AEP, lossless and achieves the minimum rate equal to the entropy rate of the source.
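
A toy version of the scheme can be written in a few lines. The sketch below (source parameters and block length are illustrative; real block lengths would be far larger) enumerates the typical set of a biased binary source, assigns each member a fixed-length index of about $nH(X)$ bits, and treats atypical inputs as encoding errors.

```python
import itertools
import math

p = {"0": 0.1, "1": 0.9}   # assumed biased binary source
n, eps = 10, 0.2
H = -sum(q * math.log2(q) for q in p.values())

def typical(seq):
    log_p = sum(math.log2(p[sym]) for sym in seq)
    return abs(-log_p / n - H) <= eps

# Enumerate the typical set once and index it (feasible only for tiny n).
A = sorted("".join(s) for s in itertools.product(p, repeat=n) if typical(s))
index = {seq: i for i, seq in enumerate(A)}
bits = max(1, math.ceil(math.log2(len(A))))

def encode(seq):
    if seq not in index:
        raise ValueError("atypical sequence: encoding error")
    return format(index[seq], f"0{bits}b")

def decode(code):
    return A[int(code, 2)]

print(f"nH(X) = {n * H:.1f}, code length = {bits} bits, |A| = {len(A)}")
print(decode(encode("1110111111")))
```

Here the typical set has 10 members (the sequences with a single 0), so 4 bits suffice per 10-symbol block, close to $nH(X) \approx 4.7$ bits.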

Typical set decoding


In information theory, typical set decoding is used in conjunction with random coding to estimate the transmitted message as the one with a codeword that is jointly ε-typical with the observation, i.e.

$$\hat{m} = m \quad \text{iff} \quad \exists!\, m \ \text{such that} \ \left( x_1^n(m), y_1^n \right) \in A_\epsilon^{(n)}(X, Y)$$

where $\hat{m}$, $x_1^n(m)$ and $y_1^n$ are the message estimate, the codeword of message $m$ and the observation respectively. $A_\epsilon^{(n)}(X, Y)$ is defined with respect to the joint distribution $p(x_1^n) p(y_1^n \mid x_1^n)$, where $p(y_1^n \mid x_1^n)$ is the transition probability that characterizes the channel statistics, and $p(x_1^n)$ is some input distribution used to generate the codewords in the random codebook.
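
The following hedged sketch illustrates the idea on a binary symmetric channel (the channel, crossover probability, and codebook size are illustrative assumptions, not the article's specific construction): the decoder scans a small random codebook and declares the unique message whose codeword is jointly typical with the observation.

```python
import math
import random

f = 0.1              # assumed BSC crossover probability
n, eps = 1000, 0.1
num_msgs = 4         # assumed (tiny) random-codebook size

# Joint distribution p(x)p(y|x) with uniform input: agreement pairs each
# carry probability (1 - f)/2, disagreement pairs f/2.
pxy = {(a, b): 0.5 * ((1 - f) if a == b else f)
       for a in (0, 1) for b in (0, 1)}
Hxy = -sum(q * math.log2(q) for q in pxy.values())

random.seed(1)
codebook = [[random.randint(0, 1) for _ in range(n)] for _ in range(num_msgs)]

def jointly_typical(x, y):
    # Checks the joint condition only; a full decoder would also test the
    # marginal typicality of x and y, as in the definition above.
    log_p = sum(math.log2(pxy[(a, b)]) for a, b in zip(x, y))
    return abs(-log_p / n - Hxy) <= eps

m = 2                                                  # transmitted message
y = [b ^ (random.random() < f) for b in codebook[m]]   # noisy observation

hits = [i for i in range(num_msgs) if jointly_typical(codebook[i], y)]
print("decoded:", hits[0] if len(hits) == 1 else "error")   # expect: 2
```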

Universal null-hypothesis testing


Universal channel code


from Grokipedia
In information theory, the typical set (or $\epsilon$-typical set) for a discrete random variable $X$ with finite alphabet $\mathcal{X}$ and length-$n$ sequences is defined as the set $A_\epsilon^{(n)}$ of all sequences $x^n \in \mathcal{X}^n$ such that $2^{-n(H(X)+\epsilon)} \leq P_{X^n}(x^n) \leq 2^{-n(H(X)-\epsilon)}$, where $H(X)$ denotes the entropy of $X$ and $\epsilon > 0$ is a small fixed constant. This set captures sequences whose empirical probabilities align closely with the source distribution, excluding atypically probable or improbable outcomes. A fundamental property of the typical set, arising from the asymptotic equipartition property (AEP), is that as $n$ grows large, it contains nearly all the probability mass of the source: $\Pr(X^n \in A_\epsilon^{(n)}) \geq 1 - \epsilon$ for sufficiently large $n$. Moreover, the cardinality of the typical set is exponentially bounded by the entropy: $2^{n(H(X)-\epsilon)} \leq |A_\epsilon^{(n)}| \leq 2^{n(H(X)+\epsilon)}$, implying that typical sequences each have probability roughly $2^{-nH(X)}$ and are asymptotically equiprobable. These characteristics make the typical set a central tool for proving achievability in coding theorems, as it identifies a compact subset of sequences that represent the source's typical behavior with high probability. The concept extends to jointly typical sets for pairs of random variables $(X, Y)$, defined similarly with respect to their joint entropy $H(X, Y)$, and plays a pivotal role in channel coding, rate-distortion theory, and network information theory by enabling bounds on error probabilities and compression rates. Introduced in the foundational works of Claude Shannon, the typical set underpins the intuitive notion that most observed data from a source conforms to its average statistical properties, facilitating efficient representation and transmission.

Fundamentals of Typicality

Asymptotic Equipartition Property

The asymptotic equipartition property (AEP), also known as the Shannon–McMillan–Breiman theorem, provides the probabilistic foundation for typical sets in information theory, characterizing the distribution of long sequences generated by a stationary ergodic source. For such a source with entropy rate $H(X)$, as the sequence length $n$ increases, the probability $P(X^n)$ of a sequence $X^n$ drawn from the source approaches $2^{-nH(X)}$ for most sequences, rendering them approximately equiprobable when measured on a logarithmic scale. This uniformity in log-probabilities implies that the vast majority of the probability mass concentrates on a subset of sequences whose individual probabilities are close to the exponential decay rate dictated by the entropy.

The AEP is formally stated as follows: for a stationary ergodic process $\{X_i\}$ with entropy rate $H(X)$,

$$\lim_{n \to \infty} \frac{1}{n} \log \frac{1}{P(X^n)} = H(X).$$

Equivalently, $-\frac{1}{n} \log P(X^n) \to H(X)$ with probability 1, ensuring that the normalized negative log-probability converges to the entropy rate for almost all sequences. This convergence arises from the ergodic theorem applied to the sequence of information densities $-\log p(X_i \mid X^{i-1})$, where $P(X^n) = \prod_{i=1}^{n} p(X_i \mid X^{i-1})$. The average $-\frac{1}{n} \log P(X^n) = \frac{1}{n} \sum_{i=1}^{n} -\log p(X_i \mid X^{i-1})$ thus converges to $H(X)$, the entropy rate of the process. For the special case of independent and identically distributed (i.i.d.) sources, the result follows directly from the law of large numbers applied to the i.i.d. random variables $-\log p(X_i)$.

The AEP was first introduced by Claude Shannon in 1948 as a key insight supporting his source coding theorem, demonstrating that typical sequences have probabilities near $2^{-nH}$ and enabling efficient encoding strategies. Shannon proved the property for i.i.d. processes, while extensions to stationary ergodic sources were established by Brockway McMillan in 1953 and Leo Breiman in 1957, solidifying its generality.
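
A short simulation (with an assumed three-symbol i.i.d. source) shows the convergence the AEP describes: the normalized negative log-probability of a drawn sequence settles onto $H(X)$ as $n$ grows.

```python
import math
import random

p = [0.7, 0.2, 0.1]                      # assumed i.i.d. source distribution
H = -sum(q * math.log2(q) for q in p)    # entropy, ~1.157 bits

random.seed(0)
for n in (10, 100, 1000, 10000):
    xs = random.choices(range(len(p)), weights=p, k=n)
    avg = -sum(math.log2(p[x]) for x in xs) / n
    print(f"n={n:>5}:  -(1/n) log2 P(X^n) = {avg:.4f}   (H = {H:.4f})")
```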

Definition of Weak Typicality

In information theory, weak typicality provides a foundational concept for understanding the behavior of sequences generated by a discrete memoryless source with distribution $P$ over a finite alphabet $\mathcal{X}$. A sequence $x^n = (x_1, x_2, \dots, x_n) \in \mathcal{X}^n$ is defined as weakly $\epsilon$-typical with respect to $P$ if it satisfies the condition

$$\left| -\frac{1}{n} \log P(x^n) - H(X) \right| \leq \epsilon.$$