Channel capacity
from Wikipedia

Channel capacity, in electrical engineering, computer science, and information theory, is the theoretical maximum rate at which information can be reliably transmitted over a communication channel.

Following the terms of the noisy-channel coding theorem, the channel capacity of a given channel is the highest information rate (in units of information per unit time) that can be achieved with arbitrarily small error probability.[1][2]

Information theory, developed by Claude E. Shannon in 1948, defines the notion of channel capacity and provides a mathematical model by which it may be computed. The key result states that the capacity of the channel, as defined above, is given by the maximum of the mutual information between the input and output of the channel, where the maximization is with respect to the input distribution.[3]

The notion of channel capacity has been central to the development of modern wireline and wireless communication systems, and the advent of novel error-correction coding mechanisms has made it possible to achieve performance very close to the limits promised by channel capacity.

Formal definition


The basic mathematical model for a communication system is the following:

$W \xrightarrow{\ \text{Encoder } f_n\ } X^n \xrightarrow{\ \text{Channel } p(y|x)\ } Y^n \xrightarrow{\ \text{Decoder } g_n\ } \hat{W}$

where:

  • $W$ is the message to be transmitted;
  • $X$ is the channel input symbol ($X^n$ is a sequence of $n$ symbols) taken in an alphabet $\mathcal{X}$;
  • $Y$ is the channel output symbol ($Y^n$ is a sequence of $n$ symbols) taken in an alphabet $\mathcal{Y}$;
  • $\hat{W}$ is the estimate of the transmitted message;
  • $f_n$ is the encoding function for a block of length $n$;
  • $p(y|x) = p_{Y \mid X}(y \mid x)$ is the noisy channel, which is modeled by a conditional probability distribution; and,
  • $g_n$ is the decoding function for a block of length $n$.

Let $X$ and $Y$ be modeled as random variables. Furthermore, let $p_{Y \mid X}(y \mid x)$ be the conditional probability distribution function of $Y$ given $X$, which is an inherent fixed property of the communication channel. Then the choice of the marginal distribution $p_X(x)$ completely determines the joint distribution $p_{X,Y}(x, y)$ due to the identity

$p_{X,Y}(x, y) = p_{Y \mid X}(y \mid x)\, p_X(x),$

which, in turn, induces a mutual information $I(X; Y)$. The channel capacity is defined as

$C = \sup_{p_X(x)} I(X; Y),$

where the supremum is taken over all possible choices of $p_X(x)$.
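For illustration, the definition can be evaluated numerically. The following minimal Python sketch assumes a hypothetical two-input, two-output channel matrix, computes $I(X;Y)$ for candidate input distributions, and takes the maximum as an estimate of $C$.

```python
import numpy as np

def mutual_information(p_x, P_y_given_x):
    """I(X;Y) in bits for input distribution p_x and channel matrix P_y_given_x[x, y]."""
    p_xy = p_x[:, None] * P_y_given_x          # joint p(x, y)
    p_y = p_xy.sum(axis=0)                      # marginal p(y)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y[None, :])[mask])))

# Hypothetical binary asymmetric channel (illustrative numbers): rows are inputs, columns outputs.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# C = sup over p_X of I(X;Y); for a binary input a 1-D grid search over p_X(0) suffices.
grid = np.linspace(1e-6, 1 - 1e-6, 10001)
capacity = max(mutual_information(np.array([a, 1 - a]), P) for a in grid)
print(f"capacity ≈ {capacity:.4f} bits per channel use")
```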

Additivity of channel capacity


Channel capacity is additive over independent channels.[4] It means that using two independent channels in a combined manner provides the same theoretical capacity as using them independently. More formally, let $p_1$ and $p_2$ be two independent channels modelled as above; $p_1$ having an input alphabet $\mathcal{X}_1$ and an output alphabet $\mathcal{Y}_1$, and idem for $p_2$ with $\mathcal{X}_2$ and $\mathcal{Y}_2$. We define the product channel $p_1 \times p_2$ as

$\forall (x_1, x_2) \in (\mathcal{X}_1, \mathcal{X}_2),\ (y_1, y_2) \in (\mathcal{Y}_1, \mathcal{Y}_2):\quad (p_1 \times p_2)\big((y_1, y_2) \mid (x_1, x_2)\big) = p_1(y_1 \mid x_1)\, p_2(y_2 \mid x_2).$

This theorem states:

$C(p_1 \times p_2) = C(p_1) + C(p_2).$

Proof

We first show that $C(p_1 \times p_2) \geq C(p_1) + C(p_2)$.

Let $X_1$ and $X_2$ be two independent random variables. Let $Y_1$ be a random variable corresponding to the output of $X_1$ through the channel $p_1$, and $Y_2$ for $X_2$ through $p_2$.

By definition $C(p_1 \times p_2) = \sup_{p_{X_1, X_2}} I(X_1, X_2 ; Y_1, Y_2)$.

Since $X_1$ and $X_2$ are independent, as well as $p_1$ and $p_2$, $(X_1, Y_1)$ is independent of $(X_2, Y_2)$. We can apply the following property of mutual information:

$I(X_1, X_2 ; Y_1, Y_2) = I(X_1 ; Y_1) + I(X_2 ; Y_2)$

For now we only need to find a distribution $p_{X_1, X_2}$ such that $I(X_1, X_2 ; Y_1, Y_2) \geq I(X_1 ; Y_1) + I(X_2 ; Y_2)$. In fact, $\pi_1$ and $\pi_2$, two probability distributions for $X_1$ and $X_2$ achieving $C(p_1)$ and $C(p_2)$, suffice:

$C(p_1 \times p_2) \geq I(X_1, X_2 ; Y_1, Y_2) = I(X_1 ; Y_1) + I(X_2 ; Y_2) = C(p_1) + C(p_2),$

i.e. $C(p_1 \times p_2) \geq C(p_1) + C(p_2)$.

Now let us show that $C(p_1 \times p_2) \leq C(p_1) + C(p_2)$.

Let $\pi_{12}$ be some distribution for the channel $p_1 \times p_2$ defining $(X_1, X_2)$ and the corresponding output $(Y_1, Y_2)$. Let $\mathcal{X}_1$ be the alphabet of $X_1$, $\mathcal{Y}_1$ the alphabet of $Y_1$, and analogously $\mathcal{X}_2$ and $\mathcal{Y}_2$.

By definition of mutual information, we have

$I(X_1, X_2 ; Y_1, Y_2) = H(Y_1, Y_2) - H(Y_1, Y_2 \mid X_1, X_2) \leq H(Y_1) + H(Y_2) - H(Y_1, Y_2 \mid X_1, X_2).$

Let us rewrite the last term of entropy. By definition of the product channel, $p_{Y_1, Y_2 \mid X_1, X_2}(y_1, y_2 \mid x_1, x_2) = p_{Y_1 \mid X_1}(y_1 \mid x_1)\, p_{Y_2 \mid X_2}(y_2 \mid x_2)$. For a given pair $(x_1, x_2)$, we can therefore rewrite $H(Y_1, Y_2 \mid X_1 = x_1, X_2 = x_2)$ as:

$H(Y_1, Y_2 \mid X_1 = x_1, X_2 = x_2) = H(Y_1 \mid X_1 = x_1) + H(Y_2 \mid X_2 = x_2).$

By summing this equality over all $(x_1, x_2)$ weighted by $\pi_{12}(x_1, x_2)$, we obtain $H(Y_1, Y_2 \mid X_1, X_2) = H(Y_1 \mid X_1) + H(Y_2 \mid X_2)$.

We can now give an upper bound over mutual information:

$I(X_1, X_2 ; Y_1, Y_2) \leq I(X_1 ; Y_1) + I(X_2 ; Y_2).$

This relation is preserved at the supremum. Therefore

$C(p_1 \times p_2) \leq C(p_1) + C(p_2).$

Combining the two inequalities we proved, we obtain the result of the theorem:

$C(p_1 \times p_2) = C(p_1) + C(p_2).$
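The factorization step used above can be checked numerically. The sketch below assumes two illustrative binary symmetric channels with independent uniform inputs and verifies that $I(X_1, X_2 ; Y_1, Y_2) = I(X_1 ; Y_1) + I(X_2 ; Y_2)$.

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits from a joint distribution p_xy[x, y]."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask])))

# Two assumed binary symmetric channels with crossover probabilities 0.1 and 0.2.
def bsc(eps):
    return np.array([[1 - eps, eps], [eps, 1 - eps]])

p1, p2 = bsc(0.1), bsc(0.2)
px1 = np.array([0.5, 0.5])      # independent uniform inputs
px2 = np.array([0.5, 0.5])

# Joint distributions for each channel used separately.
j1 = px1[:, None] * p1
j2 = px2[:, None] * p2

# Joint distribution over ((x1, x2), (y1, y2)) for the product channel.
joint = np.einsum('ab,cd->acbd', j1, j2).reshape(4, 4)

print(mutual_information(joint))                         # I(X1,X2; Y1,Y2)
print(mutual_information(j1) + mutual_information(j2))   # I(X1;Y1) + I(X2;Y2) -- should match
```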

Shannon capacity of a graph


If G is an undirected graph, it can be used to define a communications channel in which the symbols are the graph vertices, and two codewords may be confused with each other if their symbols in each position are equal or adjacent. The computational complexity of finding the Shannon capacity of such a channel remains open, but it can be upper bounded by another important graph invariant, the Lovász number.[5]
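As a small illustration of these definitions, the sketch below takes the 5-cycle $C_5$ as an assumed confusability graph and computes the independence numbers of $C_5$ and of its strong square, which yields the classical lower bound $\Theta(C_5) \ge \sqrt{5}$.

```python
from itertools import product

# Assumed confusability graph: the 5-cycle C5 (vertices 0..4, adjacent if they differ by 1 mod 5).
n = 5
def adjacent(u, v):
    return u != v and (u - v) % n in (1, n - 1)

def strong_product_adjacent(a, b):
    # In the strong product, distinct tuples are confusable if every
    # coordinate pair is equal or adjacent.
    return a != b and all(x == y or adjacent(x, y) for x, y in zip(a, b))

def independence_number(vertices, adj):
    """Simple branch-and-bound maximum independent set (fine for small graphs)."""
    best = 0
    def search(remaining, size):
        nonlocal best
        if not remaining:
            best = max(best, size)
            return
        if size + len(remaining) <= best:
            return  # cannot beat the current best
        v = remaining[0]
        # Branch 1: take v and drop its neighbours.
        search([u for u in remaining[1:] if not adj(v, u)], size + 1)
        # Branch 2: skip v.
        search(remaining[1:], size)
    search(list(vertices), 0)
    return best

v1 = [(i,) for i in range(n)]
v2 = list(product(range(n), repeat=2))
print(independence_number(v1, strong_product_adjacent))  # alpha(C5) = 2
print(independence_number(v2, strong_product_adjacent))  # alpha(C5 x C5, strong) = 5, so Theta(C5) >= sqrt(5)
```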

Noisy-channel coding theorem


The noisy-channel coding theorem states that for any error probability ε > 0 and for any transmission rate R less than the channel capacity C, there is an encoding and decoding scheme transmitting data at rate R whose error probability is less than ε, for a sufficiently large block length. Conversely, for any rate greater than the channel capacity, the probability of error at the receiver cannot be made arbitrarily small; for discrete memoryless channels the block error probability in fact tends to one as the block length goes to infinity (the strong converse).

Example application


An application of the channel capacity concept to an additive white Gaussian noise (AWGN) channel with B Hz bandwidth and signal-to-noise ratio S/N is the Shannon–Hartley theorem:

$C = B \log_2\!\left(1 + \frac{S}{N}\right).$

C is measured in bits per second if the logarithm is taken in base 2, or nats per second if the natural logarithm is used, assuming B is in hertz; the signal and noise powers S and N are expressed in a linear power unit (like watts or volts²). Since S/N figures are often cited in dB, a conversion may be needed. For example, a signal-to-noise ratio of 30 dB corresponds to a linear power ratio of $10^{30/10} = 10^3 = 1000$.
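A minimal sketch of this conversion and of the Shannon–Hartley formula, using assumed example figures of 3 kHz bandwidth and the 30 dB SNR mentioned above:

```python
import math

def shannon_hartley(bandwidth_hz, snr_db):
    """AWGN capacity C = B * log2(1 + S/N), with S/N given in dB."""
    snr_linear = 10 ** (snr_db / 10)        # e.g. 30 dB -> 1000
    return bandwidth_hz * math.log2(1 + snr_linear)

# Illustrative numbers (assumptions for the example): a 3 kHz telephone-like channel at 30 dB SNR.
print(f"{shannon_hartley(3000, 30):,.0f} bit/s")   # ≈ 29,902 bit/s
```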

Channel capacity estimation


To determine the channel capacity, it is necessary to find the capacity-achieving distribution $p_X(x)$ and evaluate the mutual information $I(X;Y)$. Research has mostly focused on studying additive noise channels under certain power constraints and noise distributions, as analytical methods are not feasible in the majority of other scenarios. Hence, alternative approaches such as investigations of the input support,[6] relaxations[7] and capacity bounds[8] have been proposed in the literature.

The capacity of a discrete memoryless channel can be computed using the Blahut-Arimoto algorithm.
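A compact sketch of the Blahut–Arimoto iteration for an arbitrary discrete memoryless channel is shown below; the binary symmetric channel used at the end is only an illustrative test case.

```python
import numpy as np

def blahut_arimoto(P, tol=1e-9, max_iter=10_000):
    """Capacity (bits per use) of a DMC with transition matrix P[x, y] via Blahut-Arimoto."""
    m = P.shape[0]
    r = np.full(m, 1.0 / m)                     # current input distribution
    for _ in range(max_iter):
        q = r[:, None] * P                      # unnormalised posterior q(x|y)
        q /= q.sum(axis=0, keepdims=True)
        # r_new(x) proportional to exp( sum_y P(y|x) log q(x|y) )
        log_r = np.sum(P * np.log(q + 1e-300), axis=1)
        r_new = np.exp(log_r - log_r.max())
        r_new /= r_new.sum()
        if np.max(np.abs(r_new - r)) < tol:
            r = r_new
            break
        r = r_new
    # Capacity = sum_x r(x) sum_y P(y|x) log2( P(y|x) / p(y) )
    p_y = r @ P
    with np.errstate(divide='ignore', invalid='ignore'):
        ratio = np.where(P > 0, P / p_y[None, :], 1.0)
    return float(np.sum(r[:, None] * P * np.log2(ratio)))

# Illustrative test case: binary symmetric channel with crossover 0.1; expected capacity ≈ 0.531.
P_bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
print(blahut_arimoto(P_bsc))
```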

Deep learning can be used to estimate the channel capacity. In fact, the channel capacity and the capacity-achieving distribution of any discrete-time continuous memoryless vector channel can be obtained using CORTICAL,[9] a cooperative framework inspired by generative adversarial networks. CORTICAL consists of two cooperative networks: a generator that learns to sample from the capacity-achieving input distribution, and a discriminator that learns to distinguish between paired and unpaired channel input–output samples and produces an estimate of the mutual information $I(X;Y)$.

Channel capacity in wireless communications


This section[10] focuses on the single-antenna, point-to-point scenario. For channel capacity in systems with multiple antennas, see the article on MIMO.

Bandlimited AWGN channel

Figure: AWGN channel capacity with the power-limited regime and bandwidth-limited regime indicated; B and C can be scaled proportionally for other parameter values.

If the average received power is $\bar{P}$ [W], the total bandwidth is $B$ in Hertz, and the noise power spectral density is $N_0$ [W/Hz], the AWGN channel capacity is

$C_{\text{AWGN}} = B \log_2\!\left(1 + \frac{\bar{P}}{N_0 B}\right)$ [bits/s],

where $\bar{P}/(N_0 B)$ is the received signal-to-noise ratio (SNR). This result is known as the Shannon–Hartley theorem.[11]

When the SNR is large (SNR ≫ 0 dB), the capacity is logarithmic in power and approximately linear in bandwidth. This is called the bandwidth-limited regime.

When the SNR is small (SNR ≪ 0 dB), the capacity is linear in power but insensitive to bandwidth. This is called the power-limited regime.

The bandwidth-limited regime and power-limited regime are illustrated in the figure.
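The two regimes can also be seen numerically. The sketch below uses assumed values for received power and noise spectral density; it shows the capacity growing roughly linearly with bandwidth at first and then saturating at the power-limited value $\bar{P}/(N_0 \ln 2)$.

```python
import numpy as np

# Illustrative parameters (assumptions, not from the article).
P = 1.0          # received power, W
N0 = 1e-3        # noise power spectral density, W/Hz

def awgn_capacity(B):
    return B * np.log2(1 + P / (N0 * B))     # bits/s

for B in [10, 100, 1_000, 10_000, 100_000, 1_000_000]:
    print(f"B = {B:>9} Hz  ->  C = {awgn_capacity(B):12.1f} bit/s")

# As B grows, the capacity saturates at the power-limited value P / (N0 * ln 2).
print("limit:", P / (N0 * np.log(2)), "bit/s")   # ≈ 1443 bit/s
```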

Frequency-selective AWGN channel


The capacity of the frequency-selective channel is given by so-called water-filling power allocation,

$C = \sum_{n} \log_2\!\left(1 + \frac{P_n^{*}\, |\bar{h}_n|^2}{N_0}\right),$

where $P_n^{*} = \max\!\left\{\frac{1}{\lambda} - \frac{N_0}{|\bar{h}_n|^2},\, 0\right\}$ and $|\bar{h}_n|^2$ is the gain of subchannel $n$, with $\lambda$ chosen to meet the power constraint.
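A sketch of the water-filling allocation, assuming illustrative subchannel gains and a total power budget, and using bisection on the water level $1/\lambda$:

```python
import numpy as np

def water_filling(gains, total_power, noise_psd=1.0):
    """
    Water-filling over parallel subchannels with power gains |h_n|^2.
    Returns (per-subchannel powers, total capacity in bits per channel use).
    """
    inv = noise_psd / np.asarray(gains, dtype=float)   # N0 / |h_n|^2, the "floor" heights
    # Bisect on the water level mu = 1/lambda so that the allocated power matches the budget.
    lo, hi = 0.0, inv.max() + total_power
    for _ in range(200):
        mu = 0.5 * (lo + hi)
        power = np.maximum(mu - inv, 0.0)
        if power.sum() > total_power:
            hi = mu
        else:
            lo = mu
    power = np.maximum(lo - inv, 0.0)
    capacity = np.sum(np.log2(1 + power * gains / noise_psd))
    return power, capacity

# Illustrative subchannel gains and power budget (assumptions for the example).
gains = np.array([2.0, 1.0, 0.5, 0.1])
power, C = water_filling(gains, total_power=4.0)
print("allocated powers:", np.round(power, 3))   # the strongest subchannels get the most power
print("capacity:", round(C, 3), "bits per use")
```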

Slow-fading channel


In a slow-fading channel, where the coherence time is greater than the latency requirement, there is no definite capacity as the maximum rate of reliable communication supported by the channel, $\log_2(1 + |h|^2\,\mathrm{SNR})$ [bits/s/Hz], depends on the random channel gain $|h|^2$, which is unknown to the transmitter. If the transmitter encodes data at rate $R$ [bits/s/Hz], there is a non-zero probability that the decoding error probability cannot be made arbitrarily small,

$p_{\mathrm{out}} = \mathbb{P}\!\left(\log_2(1 + |h|^2\,\mathrm{SNR}) < R\right),$

in which case the system is said to be in outage. With a non-zero probability that the channel is in deep fade, the capacity of the slow-fading channel in the strict sense is zero. However, it is possible to determine the largest value of $R$ such that the outage probability $p_{\mathrm{out}}$ is less than $\epsilon$. This value is known as the $\epsilon$-outage capacity.
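As an illustration, the $\epsilon$-outage capacity can be estimated by Monte Carlo. The sketch below assumes Rayleigh fading (so $|h|^2$ is exponentially distributed with unit mean), an average SNR of 10 dB and $\epsilon = 0.01$; all of these values are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumptions for the sketch: Rayleigh fading, average SNR of 10 dB, outage target eps = 0.01.
snr = 10 ** (10 / 10)
eps = 0.01

h2 = rng.exponential(scale=1.0, size=1_000_000)        # samples of |h|^2

# The eps-outage capacity is the largest R with P(log2(1 + |h|^2 snr) < R) <= eps,
# i.e. log2(1 + snr * (eps-quantile of |h|^2)).
c_outage = np.log2(1 + snr * np.quantile(h2, eps))
print(round(c_outage, 4), "bits/s/Hz")

# Closed form under this fading assumption: log2(1 + snr * (-ln(1 - eps))) -- should match.
print(round(np.log2(1 + snr * -np.log1p(-eps)), 4))
```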

Fast-fading channel


In a fast-fading channel, where the latency requirement is greater than the coherence time and the codeword length spans many coherence periods, one can average over many independent channel fades by coding over a large number of coherence time intervals. Thus, it is possible to achieve a reliable rate of communication of $\mathbb{E}\big[\log_2(1 + |h|^2\,\mathrm{SNR})\big]$ [bits/s/Hz], and it is meaningful to speak of this value as the capacity of the fast-fading channel.
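Under the same illustrative Rayleigh-fading assumption used above, the fast-fading (ergodic) capacity is simply the average of $\log_2(1 + |h|^2\,\mathrm{SNR})$ over the fading distribution, as in the sketch below.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same illustrative assumptions as before: Rayleigh fading, average SNR of 10 dB.
snr = 10 ** (10 / 10)
h2 = rng.exponential(scale=1.0, size=1_000_000)

ergodic_capacity = np.mean(np.log2(1 + snr * h2))       # E[log2(1 + |h|^2 SNR)]
awgn_capacity = np.log2(1 + snr)                        # no fading, same average SNR

print(round(ergodic_capacity, 3), "bits/s/Hz")          # ≈ 2.9, slightly below the AWGN value
print(round(awgn_capacity, 3), "bits/s/Hz")             # ≈ 3.46
```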

Feedback capacity


Feedback capacity is the greatest rate at which information can be reliably transmitted, per unit time, over a point-to-point communication channel in which the receiver feeds back the channel outputs to the transmitter. Information-theoretic analysis of communication systems that incorporate feedback is more complicated and challenging than without feedback. Possibly, this was the reason C.E. Shannon chose feedback as the subject of the first Shannon Lecture, delivered at the 1973 IEEE International Symposium on Information Theory in Ashkelon, Israel.

The feedback capacity is characterized by the maximum of the directed information between the channel inputs and the channel outputs, where the maximization is with respect to the causal conditioning of the input given the output. Directed information was coined by James Massey[12] in 1990, who showed that it is an upper bound on the feedback capacity. For memoryless channels, Shannon showed[13] that feedback does not increase the capacity, and the feedback capacity coincides with the channel capacity characterized by the mutual information between the input and the output. The feedback capacity is known as a closed-form expression only for several examples, such as the trapdoor channel[14] and the Ising channel.[15][16] For some other channels, it is characterized through constant-size optimization problems, such as the binary erasure channel with a no-consecutive-ones input constraint[17] and the NOST channel.[18]

The basic mathematical model for a communication system is the following:

Communication with feedback

Here is the formal definition of each element (where the only difference with respect to the nonfeedback capacity is the encoder definition):

  • $W$ is the message to be transmitted, taken in an alphabet $\mathcal{W}$;
  • $X_i$ is the channel input symbol at time $i$ ($X^n$ is a sequence of $n$ symbols) taken in an alphabet $\mathcal{X}$;
  • $Y_i$ is the channel output symbol at time $i$ ($Y^n$ is a sequence of $n$ symbols) taken in an alphabet $\mathcal{Y}$;
  • $\hat{W}$ is the estimate of the transmitted message;
  • $f_i : \mathcal{W} \times \mathcal{Y}^{i-1} \to \mathcal{X}$ is the encoding function at time $i$, for a block of length $n$;
  • $p(y_i \mid x^i, y^{i-1})$ is the noisy channel at time $i$, which is modeled by a conditional probability distribution; and,
  • $g_n : \mathcal{Y}^n \to \mathcal{W}$ is the decoding function for a block of length $n$.

That is, for each time $i$ there exists a feedback of the previous output $Y_{i-1}$ such that the encoder has access to all previous outputs $Y^{i-1}$. An $(n, 2^{nR})$ code is a pair of encoding and decoding mappings $(\{f_i\}_{i=1}^{n}, g_n)$, and $W$ is uniformly distributed. A rate $R$ is said to be achievable if there exists a sequence of codes $(n, 2^{nR})$ such that the average probability of error $P_e^{(n)} = \Pr(\hat{W} \neq W)$ tends to zero as $n \to \infty$.

The feedback capacity is denoted by $C_{\text{feedback}}$, and is defined as the supremum over all achievable rates.

Main results on feedback capacity


Let $X^n$ and $Y^n$ be modeled as random variables. The causal conditioning $P(y^n \| x^n) \triangleq \prod_{i=1}^{n} P(y_i \mid y^{i-1}, x^{i})$ describes the given channel. The choice of the causally conditional distribution $P(x^n \| y^{n-1}) \triangleq \prod_{i=1}^{n} P(x_i \mid x^{i-1}, y^{i-1})$ determines the joint distribution $P_{X^n, Y^n}$ due to the chain rule for causal conditioning[19]

$P(x^n, y^n) = P(y^n \| x^n)\, P(x^n \| y^{n-1}),$

which, in turn, induces a directed information $I(X^n \to Y^n)$.

The feedback capacity is given by

$C_{\text{feedback}} = \lim_{n \to \infty} \frac{1}{n} \sup_{P(x^n \| y^{n-1})} I(X^n \to Y^n),$

where the supremum is taken over all possible choices of $P(x^n \| y^{n-1})$.

Gaussian feedback capacity


When the Gaussian noise is colored, the channel has memory. Consider for instance the simple case of a first-order autoregressive noise process $z_i = \alpha z_{i-1} + w_i$, where $w_i$ is an i.i.d. process.

Solution techniques


The feedback capacity is difficult to solve in the general case. There are some techniques that are related to control theory and Markov decision processes if the channel is discrete.

from Grokipedia
In information theory, channel capacity represents the maximum rate at which information can be reliably communicated over a noisy channel using an optimal coding scheme, measured in bits per second. This concept, introduced by Claude E. Shannon in 1948, establishes a fundamental limit on the achievable data transmission rate without errors exceeding an arbitrarily small probability, even in the presence of noise or interference. It serves as the theoretical foundation for designing efficient communication systems, ensuring that rates below this capacity allow for error-free transmission through proper encoding, while rates above it inevitably lead to uncorrectable errors.

For discrete memoryless channels, channel capacity $C$ is formally defined as the maximum mutual information between the input $X$ and output $Y$, expressed as $C = \max_{p(x)} I(X; Y)$, where the maximization is over all possible input probability distributions $p(x)$. Shannon's noisy-channel coding theorem asserts that reliable communication is possible at rates up to $C$, with the equivocation (or uncertainty in the input given the output) bounded by $H(X) - C$ for source entropy $H(X) > C$. In continuous-time channels, such as those with bandwidth $W$, signal power $P$, and noise power $N$, the capacity simplifies to $C = W \log_2\!\left(1 + \frac{P}{N}\right)$, known as the Shannon–Hartley theorem, which highlights the trade-off between bandwidth, signal-to-noise ratio, and achievable rate.

Channel capacity has profound implications across telecommunications, data storage, and networking, guiding the development of error-correcting codes like turbo and LDPC codes that approach these limits in practice. Extensions to modern scenarios, including multiple-access and broadcast channels, further refine capacity bounds under constraints like power or interference, enabling advancements in wireless systems and 5G/6G technologies. Despite these theoretical ideals, real-world factors such as fading, multi-user interference, and computational complexity often require approximations, yet Shannon's framework remains the benchmark for reliability and efficiency.

Core Concepts

Formal Definition

In information theory, the channel capacity $C$ of a communication channel represents the supremum of the rates at which information can be transmitted reliably over that channel, in the limit of using arbitrarily long codewords. For a discrete memoryless channel (DMC) characterized by a transition probability distribution $p(y|x)$ from an input alphabet $\mathcal{X}$ to an output alphabet $\mathcal{Y}$, the capacity is formally defined as the maximum mutual information between the input $X$ and output $Y$:

$C = \max_{p(x)} I(X; Y) = \max_{p(x)} \left[ H(Y) - H(Y|X) \right],$

where the maximization is over all possible input distributions $p(x)$ on $\mathcal{X}$, $I(X; Y)$ is the mutual information, $H(Y)$ is the entropy of the output, and $H(Y|X)$ is the conditional entropy of the output given the input. This quantity is typically expressed in bits per channel use.

For continuous-time channels, such as those with bandwidth limitations and additive noise, the capacity is defined analogously as the maximum of the mutual information rate. In the seminal case of a bandlimited channel with bandwidth $W$ hertz, average signal power $P$, and additive white Gaussian noise with power spectral density $N$,

$C = W \log_2\!\left(1 + \frac{P}{NW}\right)$

bits per second, achieved by a Gaussian input distribution. This formula highlights how capacity scales with signal-to-noise ratio and bandwidth, establishing a fundamental limit independent of specific signaling schemes.

Noisy-Channel Coding Theorem

The noisy-channel coding theorem, a cornerstone of information theory, demonstrates the limits of reliable communication over noisy channels. Formulated by Claude E. Shannon in 1948, the theorem asserts that for a discrete memoryless channel with capacity $C$ bits per channel use, there exists an encoding and decoding scheme capable of transmitting information at any rate $R < C$ with an arbitrarily small probability of error as the code block length $n$ approaches infinity. Conversely, for any rate $R > C$, no such scheme exists, and the error probability remains bounded below by a positive constant. This result fundamentally separates the challenges of source compression and error correction, showing that noise does not preclude reliable communication provided the rate stays below capacity.

The channel capacity $C$ is rigorously defined as the supremum of the mutual information $I(X; Y)$ over all possible input distributions $p(x)$, expressed as

$C = \max_{p(x)} I(X; Y) = \max_{p(x)} \left[ H(Y) - H(Y \mid X) \right],$

where $X$ is the input symbol, $Y$ the output symbol, $H(Y)$ the entropy of the output, and $H(Y \mid X)$ the conditional entropy representing noise-induced uncertainty. Equivalently, $C = \max_{p(x)} \left[ H(X) - H(X \mid Y) \right]$, with $H(X \mid Y)$ denoting the equivocation, or residual uncertainty about the input given the output.

The theorem's achievability is established via a random coding argument: by generating $2^{nR}$ random codewords and selecting a maximum-likelihood decoder, the average error probability over the codebook ensemble decays exponentially with $n$ for $R < C$, using typical set decoding. The converse relies on Fano's inequality, which bounds the error probability from below for rates exceeding $C$, ensuring the capacity bound is tight.

In the continuous-time setting, particularly for bandlimited channels with additive white Gaussian noise, Shannon derived the capacity formula in 1948. For a channel of bandwidth $W$ hertz, with average transmitted power constrained to $P$ watts and noise power spectral density $N$ watts per hertz, the capacity becomes

$C = W \log_2\!\left(1 + \frac{P}{NW}\right)$

bits per second. This expression highlights the power-bandwidth trade-off: increasing signal power or bandwidth expands capacity, but noise fundamentally limits the rate, with Gaussian signaling achieving the maximum. The theorem implies that practical codes, such as convolutional and modern iterative codes, can approach this limit asymptotically, though finite-block-length effects introduce a small gap. These results underpin modern digital communication systems, from wireless networks to data storage, by quantifying the reliability achievable against noise.

Theoretical Properties

Additivity of Channel Capacity

In classical information theory, the additivity of channel capacity refers to the property that the capacity of a product of independent channels equals the sum of their individual capacities. For two discrete memoryless channels $\mathcal{N}_1: \mathcal{X}_1 \to \mathcal{Y}_1$ and $\mathcal{N}_2: \mathcal{X}_2 \to \mathcal{Y}_2$, the capacity $C(\mathcal{N}_1 \otimes \mathcal{N}_2)$ of the joint channel satisfies $C(\mathcal{N}_1 \otimes \mathcal{N}_2) = C(\mathcal{N}_1) + C(\mathcal{N}_2)$, where $C(\mathcal{N}) = \max_{p(x)} I(X; Y)$ is the maximum mutual information over input distributions. This holds because the mutual information for independent uses factorizes: if $X_1$ and $X_2$ are independent and the channels are memoryless, then $I(X_1, X_2; Y_1, Y_2) = I(X_1; Y_1) + I(X_2; Y_2)$.

The proof relies on the chain rule for entropy and the memoryless property of the channels. Specifically, for independent inputs $p(x_1, x_2) = p(x_1)\, p(x_2)$ and $p(y_1, y_2 | x_1, x_2) = p(y_1 | x_1)\, p(y_2 | x_2)$, the joint mutual information simplifies as follows:

$I(X_1, X_2; Y_1, Y_2) = H(Y_1, Y_2) - H(Y_1, Y_2 | X_1, X_2).$

The conditional entropy term factors, $H(Y_1, Y_2 | X_1, X_2) = H(Y_1 | X_1) + H(Y_2 | X_2)$, and the joint entropy $H(Y_1, Y_2) = H(Y_1) + H(Y_2)$ due to independence of the outputs. Thus, maximizing over product distributions yields the sum of individual maxima, eliminating the need for regularization in the capacity formula. This additivity extends to $n$ independent uses, where the capacity is $nC$, enabling the coding theorem to apply directly without additional complexity.

For continuous channels like the additive white Gaussian noise (AWGN) channel, additivity holds under power constraints with optimal allocation, such as water-filling across parallel Gaussian subchannels. The capacity becomes $\sum_{i=1}^{k} \frac{1}{2} \log_2\!\left(1 + \frac{P_i}{N_i}\right)$, where $P_i$ is the power allocated to the $i$-th subchannel with noise variance $N_i$, confirming the additive structure. However, additivity fails in settings with interference, such as multiple-access channels, where the capacity region involves joint mutual informations like $R_1 + R_2 \leq I(X_1, X_2; Y)$, exceeding simple sums due to correlated signals. This property, established in Shannon's foundational work and elaborated in subsequent analyses, simplifies capacity computations for memoryless systems and underpins reliable communication protocols over independent links.

Shannon Capacity of a Graph

The Shannon capacity of a graph arises in the study of zero-error communication over noisy channels, where the goal is to transmit information with no possibility of decoding error. In 1956, Claude Shannon introduced the zero-error capacity $C_0$ of a discrete memoryless channel (DMC) as the supremum of rates (in bits per channel use) at which reliable communication is possible with zero probability of error. For a DMC with input alphabet $X$, the zero-error capacity is modeled using the confusability graph $G$, where the vertices are $V(G) = X$ and there is an edge between distinct inputs $x, y \in X$ if $x$ and $y$ are confusable, meaning there exists an output that can arise from both (i.e., their output supports overlap). A valid codeword set for one channel use corresponds to an independent set in $G$, as no two codewords can be confusable. The size of the maximum independent set, denoted $\alpha(G)$, thus gives the maximum number of distinguishable messages in a single use.

For $n$ independent uses of the channel, the effective confusability graph becomes the strong product $G^{\boxtimes n}$, where two distinct $n$-tuples of inputs are adjacent if, in every position, their symbols are equal or adjacent. An independent set in $G^{\boxtimes n}$ represents a valid codebook for $n$ uses, and $\alpha(G^{\boxtimes n})$ is the maximum code size. The zero-error capacity is then $C_0 = \lim_{n \to \infty} \frac{1}{n} \log_2 \alpha(G^{\boxtimes n})$, and by Fekete's lemma (due to the supermultiplicativity of $\alpha$ under the strong product), this equals $\sup_n \frac{1}{n} \log_2 \alpha(G^{\boxtimes n}) = \log_2 \Theta(G)$, where the Shannon capacity of the graph is defined as

$\Theta(G) = \sup_{n \geq 1} \alpha(G^{\boxtimes n})^{1/n}.$

Thus, $\Theta(G)$ quantifies the asymptotic growth rate of the largest independent sets in the strong powers of $G$, normalized appropriately; it always satisfies $\alpha(G) \leq \Theta(G) \leq |V(G)|$. Computing $\Theta(G)$ is notoriously difficult, as it requires evaluating independence numbers of arbitrarily high powers, and the limit may not be achieved at finite $n$. A trivial lower bound is $\Theta(G) \geq \alpha(G)$, from single-use codes, but equality holds only for certain graphs, such as complete graphs ($\Theta(K_m) = 1$) or empty graphs ($\Theta(\overline{K_m}) = m$).

In 1979, László Lovász introduced the Lovász theta function $\vartheta(G)$ as a relaxation that provides a computable upper bound: $\Theta(G) \leq \vartheta(G)$. Defined via orthonormal representations, $\vartheta(G)$ is the minimum, over unit vectors $c$ and systems of unit vectors $\{u_i\}$ (one per vertex, with $u_i^\top u_j = 0$ whenever $i$ and $j$ are non-adjacent), of $\max_i 1/(c^\top u_i)^2$; dually, it can be written as the maximum of $\sum_{i,j} B_{ij}$ over positive semidefinite matrices $B$ with unit trace and $B_{ij} = 0$ for every edge $ij$ of $G$. This bound is tight for perfect graphs (where $\alpha(G) = \chi(\overline{G})$, the chromatic number of the complement), and $\vartheta$ is multiplicative: $\vartheta(G \boxtimes H) = \vartheta(G)\, \vartheta(H)$. A seminal example where $\alpha(G) < \Theta(G)$ is the 5-cycle graph $C_5$, with $\alpha(C_5) = 2$ but $\Theta(C_5) = \sqrt{5} \approx 2.236$, attained already at $n = 2$ (since $\alpha(C_5^{\boxtimes 2}) = 5$) and matched by the upper bound $\vartheta(C_5) = \sqrt{5}$.
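For a concrete computation, the dual semidefinite-programming form of $\vartheta(G)$ described above can be handed to an off-the-shelf solver; the sketch below assumes the cvxpy package (with an SDP-capable solver) is available and recovers $\vartheta(C_5) = \sqrt{5}$.

```python
import cvxpy as cp

# Dual SDP formulation of the Lovász theta function (cvxpy assumed installed):
#   theta(G) = max  sum_ij B_ij   s.t.  B PSD, trace(B) = 1, B_ij = 0 for every edge ij of G.
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]    # the 5-cycle C5

B = cp.Variable((n, n), symmetric=True)
constraints = [B >> 0, cp.trace(B) == 1]
constraints += [B[i, j] == 0 for i, j in edges]
problem = cp.Problem(cp.Maximize(cp.sum(B)), constraints)
problem.solve()

print(round(problem.value, 4))       # ≈ 2.2361 = sqrt(5), matching Theta(C5)
```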