Shannon coding
In the field of data compression, Shannon coding, named after its creator, Claude Shannon, is a lossless data compression technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured). It is suboptimal in the sense that it does not achieve the lowest possible expected codeword length, as Huffman coding does, and its expected length is never better than, though sometimes equal to, that of Shannon–Fano coding (Fano's method).
The method was the first of its type; it was used to prove Shannon's noiseless coding theorem in his 1948 article "A Mathematical Theory of Communication",[1] and it is therefore a centerpiece of the information age.
Shannon–Fano coding methods gave rise to the field of information theory, and without their contributions the world would not have their many successors, such as Huffman coding and arithmetic coding. Much of day-to-day life is significantly influenced by digital data, and this would not be possible without Shannon–Fano coding and the ongoing evolution of its methods.[2][page needed]
In Shannon coding, the symbols are arranged in order from most probable to least probable, and the $i$-th symbol $a_i$, with probability $p_i$, is assigned a codeword consisting of the first $l_i = \lceil -\log_2 p_i \rceil$ bits of the binary expansion of the cumulative probability $F_i = p_1 + p_2 + \cdots + p_{i-1}$ (with $F_1 = 0$). Here $\lceil \cdot \rceil$ denotes the ceiling function (which rounds up to the next integer value).
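A minimal Python sketch of this construction is shown below; the function name shannon_code and the use of floating-point arithmetic are illustrative choices rather than anything prescribed by the method. It reproduces the codewords of the example table in the next section.

```python
from math import ceil, log2

def shannon_code(probabilities):
    """Construct a Shannon code for probabilities given in decreasing order.

    The i-th codeword is the first ceil(-log2 p_i) bits of the binary
    expansion of the cumulative probability of all preceding symbols.
    """
    codewords = []
    cumulative = 0.0
    for p in probabilities:
        length = ceil(-log2(p))          # l_i = ceil(-log2 p_i)
        bits, frac = [], cumulative      # expand F_i in binary, bit by bit
        for _ in range(length):
            frac *= 2
            bit = int(frac)              # next binary digit of F_i
            bits.append(str(bit))
            frac -= bit
        codewords.append("".join(bits))
        cumulative += p
    return codewords

# Probabilities must be sorted in decreasing order.
print(shannon_code([0.36, 0.18, 0.18, 0.12, 0.09, 0.07]))
# ['00', '010', '100', '1011', '1101', '1110']
```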
Example
The table below shows an example of creating a code scheme for symbols $a_1$ to $a_6$. The value $l_i$ gives the number of bits used to represent the symbol $a_i$, the column $F_i$ gives the cumulative probability of the preceding symbols as defined above, and the last column is the codeword of each symbol.
| $i$ | $p_i$ | $l_i$ | $F_i$ (decimal) | $F_i$ in binary | Codeword for $a_i$ |
|---|---|---|---|---|---|
| 1 | 0.36 | 2 | 0.0 | 0.0000 | 00 |
| 2 | 0.18 | 3 | 0.36 | 0.0101... | 010 |
| 3 | 0.18 | 3 | 0.54 | 0.1000... | 100 |
| 4 | 0.12 | 4 | 0.72 | 0.1011... | 1011 |
| 5 | 0.09 | 4 | 0.84 | 0.1101... | 1101 |
| 6 | 0.07 | 4 | 0.93 | 0.1110... | 1110 |
References
[1] Shannon, Claude E. (July 1948). "A Mathematical Theory of Communication [reprint with corrections]". Bell System Technical Journal. 27 (3): 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x. hdl:11858/00-001M-0000-002C-4314-2.
[2] Ze-Nian Li; Mark S. Drew; Jiangchuan Liu (9 April 2014). Fundamentals of Multimedia. Springer Science & Business Media. ISBN 978-3-319-05290-8.
Shannon, Claude Elwood (2001). "A Mathematical Theory of Communication". ACM SIGMOBILE Mobile Computing and Communications Review. 5 (1): 3–55.
Shannon coding
Background
Prefix Codes
A prefix code, also known as a prefix-free code, is a variable-length coding scheme in which no codeword serves as a prefix of any other codeword. This structural property guarantees unique decodability, allowing a receiver to unambiguously reconstruct the original sequence of symbols from the concatenated codewords without requiring explicit delimiters or additional metadata.[6] The notion of prefix codes emerged in the context of efficient data representation for communication systems, as introduced by Claude Shannon in his seminal 1948 paper "A Mathematical Theory of Communication." In this work, Shannon illustrated coding examples that satisfy the prefix condition to enable reliable message recovery, laying the groundwork for modern source coding techniques.[3]

A fundamental bound on the lengths of codewords in a prefix code is provided by the Kraft inequality. For a code over an alphabet of size $D$ with codeword lengths $l_1, l_2, \ldots, l_n$, the inequality asserts that $\sum_{i=1}^{n} D^{-l_i} \le 1$. This condition is both necessary and sufficient for the existence of a prefix code with those lengths; it limits how short the codewords can be collectively, preventing overlaps that would violate the prefix property.[7]

Prefix codes enable instantaneous decoding, where each codeword can be recognized and decoded immediately upon receipt, without lookahead into subsequent bits. This efficiency arises from their tree-based representation: codewords correspond to paths from the root to leaves in a $D$-ary tree, with the prefix condition ensuring that no codeword path passes through another codeword's leaf. In the binary case ($D = 2$), this manifests as a binary tree structure.[8]
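As a concrete check of the Kraft inequality, the short sketch below (the helper name kraft_sum is an illustrative choice) evaluates the left-hand side for the codeword lengths of the example table earlier in this article.

```python
def kraft_sum(lengths, D=2):
    """Left-hand side of the Kraft inequality for the given codeword lengths."""
    return sum(D ** -l for l in lengths)

# Lengths of the Shannon code from the example table: 2, 3, 3, 4, 4, 4.
print(kraft_sum([2, 3, 3, 4, 4, 4]))   # 0.6875 <= 1, so a prefix code exists
```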
Source Coding Theorem

The source coding theorem, also known as the noiseless coding theorem, applies to a discrete memoryless source, in which symbols are generated independently and identically distributed according to a known probability distribution.[3][9] The entropy of such a source quantifies the fundamental limit on lossless compression and is defined as $H = -\sum_{i} p_i \log_2 p_i$, where $p_i$ is the probability of the $i$-th symbol. This quantity represents the average amount of uncertainty or information per source symbol, measured in bits, and serves as the theoretical minimum average code length required for unique decodability.[3][9]

The theorem states that for this source, there exists a prefix code with average codeword length $L$ satisfying $H \le L < H + 1$ bits per symbol. The lower bound establishes the necessity: no uniquely decodable code can achieve an average length below the entropy, as proved using the Kraft inequality, which bounds the total code space based on codeword lengths and source probabilities. The upper bound demonstrates sufficiency: a code can be constructed to approach within 1 bit of the entropy, with the proof relying on assigning code lengths proportional to $-\log_2 p_i$ and leveraging the asymptotic equipartition property for block coding, though for single symbols, methods like Shannon coding achieve the bound directly when probabilities are dyadic (i.e., of the form $2^{-k}$), allowing exact attainment of $L = H$.[3][9] These results imply that variable-length prefix codes, such as Shannon coding, can efficiently approximate the entropy bound, enabling near-optimal compression for sources with known statistics, particularly when probabilities align with binary fractions to minimize redundancy.[9]
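The bound $H \le L < H + 1$ can be checked numerically. The sketch below, with illustrative helper names entropy and shannon_average_length, computes both quantities for the distribution used in the earlier example table.

```python
from math import ceil, log2

def entropy(probabilities):
    """Entropy H of a discrete source, in bits per symbol."""
    return -sum(p * log2(p) for p in probabilities)

def shannon_average_length(probabilities):
    """Average length L of a Shannon code with l_i = ceil(-log2 p_i)."""
    return sum(p * ceil(-log2(p)) for p in probabilities)

probs = [0.36, 0.18, 0.18, 0.12, 0.09, 0.07]
H, L = entropy(probs), shannon_average_length(probs)
print(H, L)   # roughly 2.37 and 2.92, so H <= L < H + 1 holds
```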
Formal Definition

Code Construction
Shannon coding can be applied to any discrete probability distribution by assigning code lengths $l_i = \lceil \log_2(1/p_i) \rceil$. When probabilities are dyadic, meaning each $p_i = 2^{-k_i}$ for some integer $k_i$, the code achieves the entropy bound exactly, as the code lengths match the negative log-probabilities precisely. For non-dyadic probabilities, the ceiling function approximates the ideal lengths, ensuring the expected code length satisfies $H \le L < H + 1$, though grouping less probable symbols into blocks can further approach optimality.[10][9]

The construction algorithm proceeds as follows. First, compute the code length $l_i = \lceil \log_2(1/p_i) \rceil$ for each symbol $a_i$. Second, sort the symbols in decreasing order of probability. Then, assign codewords using the cumulative probability distribution: for the $i$-th symbol, let $F_i = \sum_{k=1}^{i-1} p_k$, and form the codeword as the first $l_i$ bits of the binary expansion of a number in the half-open interval $[F_i, F_i + p_i)$. This ensures distinct prefix-free codewords, with higher-probability symbols receiving shorter codes.[11][12]

In terms of the binary tree interpretation, each codeword corresponds to a unique path from the root to a leaf at depth $l_i$, where left branches are labeled 0 and right branches are labeled 1. The ordering by decreasing probability fills the tree from left to right, placing more probable symbols at shallower depths and ensuring that no codeword node is an ancestor of another.[9] The design inherently guarantees the prefix property via the Kraft inequality, as the lengths satisfy $\sum_i 2^{-l_i} \le 1$, preventing any codeword from being a prefix of another. This feature allows for instantaneous decodability during reception.[9][10] This method of code construction aligns with the source coding theorem by assigning lengths approximately equal to the self-information $-\log_2 p_i$, thereby approaching the entropy limit.[3]
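The prefix property guaranteed by this construction can be verified directly. The sketch below uses a hypothetical helper, is_prefix_free, that simply tests whether any codeword is a prefix of a different codeword; applied to codewords produced by the construction it should always return True.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of a different codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free(["00", "010", "100", "1011", "1101", "1110"]))  # True
```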
Length and Probability Assignment

In Shannon coding, the codeword length for each symbol with probability $p_i$ is assigned as $l_i = \lceil \log_2(1/p_i) \rceil$, which ensures that $-\log_2 p_i \le l_i < -\log_2 p_i + 1$.[9] This assignment approximates the ideal length $-\log_2 p_i$ by rounding up to the nearest integer, guaranteeing that the resulting lengths satisfy the Kraft inequality for prefix codes while minimizing inefficiency relative to the source entropy.[9]

The average codeword length is then given by $L = \sum_i p_i l_i$. For any discrete memoryless source, this satisfies $H \le L < H + 1$, where $H$ is the entropy of the source.[9] This bound demonstrates that Shannon coding achieves near-optimality, with the average length deviating from the entropy by less than 1 bit per symbol.[9] When the source probabilities are dyadic, meaning each $p_i = 2^{-k_i}$ for some integer $k_i$, the assigned lengths yield exact optimality such that $L = H$.[13] In non-dyadic cases, however, the ceiling operation introduces redundancy, causing $L$ to exceed $H$ by up to, but strictly less than, 1 bit.[9]

The redundancy of the code is quantified as $R = L - H$, with the exact value in dyadic scenarios being zero due to the precise match between probabilities and lengths.[13] For general distributions, this redundancy arises solely from the integer constraint on lengths and can be expressed as $R = \sum_i p_i \left( \lceil -\log_2 p_i \rceil + \log_2 p_i \right)$, highlighting the approximation's impact on efficiency.[9]
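The sketch below, with an illustrative helper named redundancy, computes $R = L - H$ for a dyadic and a non-dyadic distribution, showing zero redundancy in the dyadic case.

```python
from math import ceil, log2

def redundancy(probabilities):
    """Redundancy R = L - H of a Shannon code for the given distribution."""
    L = sum(p * ceil(-log2(p)) for p in probabilities)
    H = -sum(p * log2(p) for p in probabilities)
    return L - H

print(redundancy([0.5, 0.25, 0.25]))                      # 0.0: dyadic, optimal
print(redundancy([0.36, 0.18, 0.18, 0.12, 0.09, 0.07]))   # about 0.55 bits
```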
Properties

Instantaneous Decodability
Shannon codes are prefix codes, meaning no codeword is a prefix of another. This property is guaranteed because the assigned code lengths satisfy the Kraft inequality $\sum_i 2^{-l_i} \le 1$, allowing the construction of a prefix-free set of codewords using the cumulative probability intervals.[14]

The prefix-free property implies instantaneous decodability: the receiver can decode each symbol as soon as its complete codeword is received, without needing additional bits or lookahead. Decoding proceeds by reading bits from the stream and checking against the codeword set (e.g., via a binary tree or trie) until a full codeword matches, then outputting the corresponding symbol and continuing with the next bit. This enables efficient, real-time processing without buffering the entire message. This decodability corresponds to a binary tree in which each codeword defines a unique path to a leaf labeled with the symbol, and no leaf lies on the path to another leaf, ensuring unambiguous resolution upon reaching a leaf.[15]

The prefix-free condition also provides inherent error detection: if the received bit sequence does not match any valid codeword path, it indicates a transmission error, though Shannon codes do not include formal error correction mechanisms.
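A minimal decoder sketch is given below; the dictionary-based lookup and the names decode and codebook are illustrative choices, not a prescribed implementation. Because the code is prefix-free, a symbol is emitted the moment its codeword is completed.

```python
def decode(bits, codebook):
    """Instantaneously decode a bit string with a prefix code.

    codebook maps codewords to symbols; since no codeword is a prefix of
    another, a symbol can be output as soon as its codeword is matched.
    """
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in codebook:          # complete codeword recognized
            symbols.append(codebook[current])
            current = ""                 # start matching the next codeword
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return symbols

codebook = {"00": "a1", "010": "a2", "100": "a3",
            "1011": "a4", "1101": "a5", "1110": "a6"}
print(decode("010" + "00" + "1110", codebook))   # ['a2', 'a1', 'a6']
```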
Optimality Bounds

Shannon codes achieve the entropy bound precisely when the source probabilities are dyadic, i.e., each $p_i = 2^{-k_i}$ for integers $k_i$, ensuring that the ideal code lengths $-\log_2 p_i$ are integers. In such cases, the average code length $L$ equals the entropy $H$, making the code optimal among all uniquely decodable codes for binary alphabets.[14] For arbitrary probability distributions, Shannon codes assign lengths $l_i = \lceil -\log_2 p_i \rceil$, resulting in an average length satisfying $H \le L < H + 1$. This matches the upper bound established by the source coding theorem for the minimal achievable average length of binary prefix codes.[14] Consequently, Shannon codes are optimal among prefix codes for dyadic sources, as they coincide with Huffman codes under these conditions, but become suboptimal for non-dyadic probabilities due to the ceiling operation, which introduces unavoidable redundancy from rounding up non-integer lengths.[16]

The redundancy for Shannon codes satisfies $0 \le R < 1$, quantifying the inefficiency relative to the entropy lower bound. This redundancy is particularly pronounced for uniform distributions over symbol sets whose size is not a power of two, where all code lengths are equal and exceed the ideal by up to nearly 1 bit, or for highly non-dyadic distributions in which many probabilities have a rounding gap $\lceil -\log_2 p_i \rceil + \log_2 p_i$ close to 1, maximizing the cumulative rounding loss.[14]

Extensions of Shannon codes to non-binary alphabets involve generalizing the length assignment to $l_i = \lceil \log_D(1/p_i) \rceil$ for alphabet size $D$, preserving similar optimality properties for dyadic-like distributions in base $D$.[14]
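The generalized length assignment can be illustrated with a short sketch; the helper name dary_lengths is hypothetical, and the base-3 output is shown only as an example of a non-binary alphabet.

```python
from math import ceil, log

def dary_lengths(probabilities, D=2):
    """Shannon code lengths l_i = ceil(log_D(1/p_i)) for a D-ary alphabet."""
    return [ceil(log(1 / p, D)) for p in probabilities]

probs = [0.36, 0.18, 0.18, 0.12, 0.09, 0.07]
print(dary_lengths(probs, D=2))   # [2, 3, 3, 4, 4, 4]
print(dary_lengths(probs, D=3))   # [1, 2, 2, 2, 3, 3]
```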
Examples

Basic Encoding Example
To illustrate the basic encoding process in Shannon coding, consider a source with alphabet symbols A, B, and C having dyadic probabilities $p_A = 1/2$, $p_B = 1/4$, and $p_C = 1/4$. These probabilities, being exact negative powers of 2, enable a code where the expected length equals the entropy $H = 1.5$ bits per symbol.[9] The code lengths follow from the general construction, with $l_i = \lceil -\log_2 p_i \rceil$, giving $l_A = 1$, $l_B = 2$, and $l_C = 2$.[9] Codewords are assigned using the cumulative probabilities $F_A = 0$, $F_B = 0.5$, and $F_C = 0.75$, taking the first $l_i$ binary digits of each $F_i$'s fractional expansion: A maps to 0 (from 0.000...), B to 10 (from 0.1000...), and C to 11 (from 0.11000...). This yields the prefix code {0, 10, 11}.[9] To encode a sample message such as "ABAC", substitute and concatenate the codewords (the concatenated result and a small encoding sketch follow the list):
- A → 0
- B → 10
- A → 0
- C → 11
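Concatenating these codewords yields the encoded bit string 010011, six bits in total (1 + 2 + 1 + 2). A minimal encoding sketch under the same assumptions follows; the function name encode and the dictionary-based codebook are illustrative choices rather than part of the original method.

```python
def encode(message, codebook):
    """Concatenate the codeword of each symbol in the message."""
    return "".join(codebook[symbol] for symbol in message)

codebook = {"A": "0", "B": "10", "C": "11"}   # Shannon code for p = 1/2, 1/4, 1/4
print(encode("ABAC", codebook))               # 010011
```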
