Context-adaptive binary arithmetic coding
Context-adaptive binary arithmetic coding (CABAC) is a form of entropy encoding used in the H.264/MPEG-4 AVC[1][2] and High Efficiency Video Coding (HEVC) standards. It is a lossless compression technique, although the video coding standards in which it is used are typically for lossy compression applications. CABAC is notable for providing much better compression than most other entropy encoding algorithms used in video encoding, and it is one of the key elements that provides the H.264/AVC encoding scheme with better compression capability than its predecessors.[3]
In H.264/MPEG-4 AVC, CABAC is only supported in the Main and higher profiles (but not the extended profile) of the standard, as it requires a larger amount of processing to decode than the simpler scheme known as context-adaptive variable-length coding (CAVLC) that is used in the standard's Baseline profile. CABAC is also difficult to parallelize and vectorize, so other forms of parallelism (such as spatial region parallelism) may be coupled with its use. In HEVC, CABAC is used in all profiles of the standard.
Algorithm
CABAC is based on arithmetic coding, with a few innovations and changes to adapt it to the needs of video encoding standards:[4]
- It encodes binary symbols, which keeps the complexity low and allows probability modelling for more frequently used bits of any symbol.
- The probability models are selected adaptively based on local context, allowing better modelling of probabilities, because coding modes are usually locally well correlated.
- It uses a multiplication-free range division by the use of quantized probability ranges and probability states.
CABAC has multiple probability modes for different contexts. It first converts all non-binary symbols to binary. Then, for each bit, the coder selects which probability model to use, then uses information from nearby elements to optimize the probability estimate. Arithmetic coding is finally applied to compress the data.

The context modeling provides estimates of conditional probabilities of the coding symbols. Utilizing suitable context models, a given inter-symbol redundancy can be exploited by switching between different probability models according to already-coded symbols in the neighborhood of the current symbol to encode. The context modeling is responsible for most of CABAC's roughly 10% savings in bit rate over the CAVLC entropy coding method.
Coding a data symbol involves the following stages.
- Binarization: CABAC uses Binary Arithmetic Coding which means that only binary decisions (1 or 0) are encoded. A non-binary-valued symbol (e.g. a transform coefficient or motion vector) is "binarized" or converted into a binary code prior to arithmetic coding. This process is similar to the process of converting a data symbol into a variable length code but the binary code is further encoded (by the arithmetic coder) prior to transmission.
- The remaining stages are repeated for each bit (or "bin") of the binarized symbol.
- Context model selection: A "context model" is a probability model for one or more bins of the binarized symbol. This model may be chosen from a selection of available models depending on the statistics of recently coded data symbols. The context model stores the probability of each bin being "1" or "0".
- Arithmetic encoding: An arithmetic coder encodes each bin according to the selected probability model. Note that there are just two sub-ranges for each bin (corresponding to "0" and "1").
- Probability update: The selected context model is updated based on the actual coded value (e.g. if the bin value was "1", the frequency count of "1"s is increased).
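The sequencing of these stages can be sketched as follows; the unary binarizer, the simple frequency-count contexts, and the stub in place of the arithmetic coder are illustrative assumptions, not the standard's definitions.

```python
# Sketch of the per-symbol CABAC flow: binarize, pick a context per bin,
# code the bin, then update that context's statistics.

def binarize_unary(value):
    return [1] * value + [0]                      # e.g. 3 -> 1, 1, 1, 0

contexts = {i: {"ones": 1, "total": 2} for i in range(4)}   # small illustrative context set
coded = []                                                   # (bin, P(bin = 1)) pairs

def encode_symbol(value):
    for bin_idx, bin_val in enumerate(binarize_unary(value)):
        ctx = contexts[min(bin_idx, 3)]           # context chosen by bin position
        p_one = ctx["ones"] / ctx["total"]        # current estimate for this context
        coded.append((bin_val, p_one))            # stand-in for the arithmetic coder
        ctx["ones"] += bin_val                    # probability update from the coded value
        ctx["total"] += 1

for v in (2, 0, 3):
    encode_symbol(v)
print(coded)
```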
Example
1. Binarize the value MVDx, the motion vector difference in the x direction.

| MVDx | Binarization |
|---|---|
| 0 | 0 |
| 1 | 10 |
| 2 | 110 |
| 3 | 1110 |
| 4 | 11110 |
| 5 | 111110 |
| 6 | 1111110 |
| 7 | 11111110 |
| 8 | 111111110 |

The first bit of the binarized codeword is bin 1; the second bit is bin 2; and so on.

2. Choose a context model for each bin. One of 3 models is selected for bin 1, based on previous coded MVD values. The L1 norm of two previously-coded values, ek, is calculated:

| ek | Context model for bin 1 |
|---|---|
| 0 ≤ ek < 3 | Model 0 |
| 3 ≤ ek < 33 | Model 1 |
| 33 ≤ ek | Model 2 |

If ek is small, then there is a high probability that the current MVD will have a small magnitude; conversely, if ek is large then it is more likely that the current MVD will have a large magnitude. We select a probability table (context model) accordingly. The remaining bins are coded using one of 4 further context models:

| Bin | Context model |
|---|---|
| 1 | 0, 1 or 2 depending on ek |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 5 and higher | 6 |
3. Encode each bin. The selected context model supplies two probability estimates: the probability that the bin contains "1" and the probability that the bin contains "0". These estimates determine the two sub-ranges that the arithmetic coder uses to encode the bin.
4. Update the context models. For example, if context model 2 was selected for bin 1 and the value of bin 1 was "0", the frequency count of "0"s is incremented. This means that the next time this model is selected, the probability of a "0" will be slightly higher. When the total number of occurrences of a model exceeds a threshold value, the frequency counts for "0" and "1" will be scaled down, which in effect gives higher priority to recent observations.
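A minimal sketch of this worked example, with function and variable names chosen here for illustration (the arithmetic coder itself is out of scope):

```python
# Unary binarization of MVDx, selection of the bin-1 context model from ek (the L1 norm of
# two previously coded MVDs), and the fixed models for the remaining bins, as tabulated above.

def binarize_mvd(mvd):
    return [1] * mvd + [0]                      # matches the MVDx binarization table

def bin1_model(ek):
    if ek < 3:
        return 0
    return 1 if ek < 33 else 2

def context_model(bin_index, ek):               # bin_index counts from 1
    if bin_index == 1:
        return bin1_model(ek)
    return min(bin_index + 1, 6)                # bins 2, 3, 4 -> models 3, 4, 5; bin 5+ -> 6

mvd, ek = 3, 7                                  # example inputs
for i, b in enumerate(binarize_mvd(mvd), start=1):
    print(f"bin {i}: value {b}, context model {context_model(i, ek)}")
```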
The arithmetic decoding engine
The arithmetic decoder is described in some detail in the Standard. It has three distinct properties:
- Probability estimation is performed by a transition process between 64 separate probability states for "Least Probable Symbol" (LPS, the least probable of the two binary decisions "0" or "1").
- The range R representing the current state of the arithmetic coder is quantized to a small range of pre-set values before calculating the new range at each step, making it possible to calculate the new range using a look-up table (i.e. multiplication-free).
- A simplified encoding and decoding process is defined for data symbols with a near uniform probability distribution.
The definition of the decoding process is designed to facilitate low-complexity implementations of arithmetic encoding and decoding. Overall, CABAC provides improved coding efficiency compared with CAVLC-based coding, at the expense of greater computational complexity.
History
In 1986, IBM researchers Kottappuram M. A. Mohiuddin and Jorma Johannes Rissanen filed a patent for a multiplication-free binary arithmetic coding algorithm.[5][6] In 1988, an IBM research team including R.B. Arps, T.K. Truong, D.J. Lu, W. B. Pennebaker, L. Mitchell and G. G. Langdon presented an adaptive binary arithmetic coding (ABAC) algorithm called Q-Coder.[7][8]
The above patents and research papers, along with several others from IBM and Mitsubishi Electric, were later cited by the CCITT and the Joint Photographic Experts Group as the basis for the JPEG image compression format's adaptive binary arithmetic coding algorithm in 1992.[5] However, encoders and decoders of the JPEG file format, which has options for both Huffman encoding and arithmetic coding, typically support only the Huffman encoding option, originally because of patent concerns, although JPEG's arithmetic coding patents[9] have since expired due to the age of the JPEG standard.[10] The first reported use of adaptive binary arithmetic coding in motion video compression was in a proposal by IBM researchers to the MPEG group in 1989.[11][12] This proposal extended the use of arithmetic coding from intraframe JPEG to interframe video coding.
In 1999, Youngjun Yoo (Texas Instruments), Young Gap Kwon and Antonio Ortega (University of Southern California) presented a context-adaptive form of binary arithmetic coding.[13] The modern context-adaptive binary arithmetic coding (CABAC) algorithm was commercially introduced with the H.264/MPEG-4 AVC format in 2003.[14] The majority of patents for the AVC format are held by Panasonic, Godo Kaisha IP Bridge and LG Electronics.[15]
References
- ^ Richardson, Iain E. G., H.264 / MPEG-4 Part 10 White Paper, 17 October 2002.
- ^ Richardson, Iain E. G. (2003). H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia. Chichester: John Wiley & Sons Ltd.
- ^ Ze-Nian Li; Mark S. Drew; Jiangchuan Liu (9 April 2014). Fundamentals of Multimedia. Springer Science & Business Media. ISBN 978-3-319-05290-8.
- ^ Marpe, D., Schwarz, H., and Wiegand, T., Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard, IEEE Trans. Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 620–636, July, 2003.
- ^ a b "T.81 – DIGITAL COMPRESSION AND CODING OF CONTINUOUS-TONE STILL IMAGES – REQUIREMENTS AND GUIDELINES" (PDF). CCITT. September 1992. Retrieved 12 July 2019.
- ^ U.S. patent 4,652,856
- ^ Arps, R. B.; Truong, T. K.; Lu, D. J.; Pasco, R. C.; Friedman, T. D. (November 1988). "A multi-purpose VLSI chip for adaptive data compression of bilevel images". IBM Journal of Research and Development. 32 (6): 775–795. doi:10.1147/rd.326.0775. ISSN 0018-8646.
- ^ Pennebaker, W. B.; Mitchell, J. L.; Langdon, G. G.; Arps, R. B. (November 1988). "An overview of the basic principles of the Q-Coder adaptive binary arithmetic coder". IBM Journal of Research and Development. 32 (6): 717–726. doi:10.1147/rd.326.0717. ISSN 0018-8646.
- ^ "Recommendation T.81 (1992) Corrigendum 1 (01/04)". Recommendation T.81 (1992). International Telecommunication Union. 9 November 2004. Retrieved 3 February 2011.
- ^ JPEG Still Image Data Compression Standard, W. B. Pennebaker and J. L. Mitchell, Kluwer Academic Press, 1992. ISBN 0-442-01272-1
- ^ DCT Coding for Motion Video Storage using Adaptive Arithmetic Coding, C. A. Gonzales. L. Allman, T. McCarthy, P. Wendt and A. N. Akansu, Signal Processing: Image Communication, 2, 145, 1990.
- ^ Encoding of motion video sequences for the MPEG environment using arithmetic coding, E. Viscito and C. Gonzales, SPIE Visual Communications and Image Processing '90, October 2–4, 1990.
- ^ Ortega, A. (October 1999). "Embedded image-domain compression using context models". Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348). Vol. 1. pp. 477–481 vol.1. doi:10.1109/ICIP.1999.821655. ISBN 0-7803-5467-2. S2CID 27303716.
- ^ "Context-Based Adaptive Binary Arithmetic Coding (CABAC)". Fraunhofer Heinrich Hertz Institute. Retrieved 13 July 2019.
- ^ "AVC/H.264 – Patent List" (PDF). MPEG LA. Retrieved 6 July 2019.
Context-adaptive binary arithmetic coding
Fundamentals
Arithmetic Coding Basics
Arithmetic coding is a lossless data compression technique that encodes an entire message as a single real number within the unit interval [0,1), achieving near-optimal compression by assigning shorter codes to more probable symbols through fractional interval representation rather than fixed-length or variable-length codes for individual symbols. This method, developed in the late 1970s, maps the message to a subinterval whose length is proportional to the probability of the message, allowing the entropy of the source to be approached closely without the overhead of codeword boundaries.

The encoding process begins by initializing the current interval as [low, high) = [0, 1). For each symbol in the message, the interval is subdivided based on the symbol's probability distribution, and the subinterval corresponding to the emitted symbol becomes the new current interval. Specifically, if the cumulative probability of all symbols preceding the current symbol is c and the probability of the current symbol is p, the updates are low_new = low + (high − low) × c and high_new = low + (high − low) × (c + p). For example, with a two-symbol alphabet where P(A) = 0.8 and P(B) = 0.2 (so c(A) = 0 and c(B) = 0.8), encoding the message "ABA" proceeds as follows (a short code sketch after the list reproduces this trace):
- First A: low = 0 + (1-0) × 0 = 0, high = 0 + (1-0) × 0.8 = 0.8. New interval: [0, 0.8).
- Second B: low = 0 + (0.8-0) × 0.8 = 0.64, high = 0.64 + (0.8-0) × 0.2 = 0.8. New interval: [0.64, 0.8).
- Third A: low = 0.64 + (0.8-0.64) × 0 = 0.64, high = 0.64 + (0.8-0.64) × 0.8 = 0.768. Final interval: [0.64, 0.768).
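A short sketch reproducing this interval trace; floating point is used only for illustration, since practical coders use the integer arithmetic described later.

```python
# Interval subdivision for the message "ABA" with P(A) = 0.8, P(B) = 0.2,
# cumulative probabilities c(A) = 0.0 and c(B) = 0.8.

probs = {"A": (0.0, 0.8), "B": (0.8, 0.2)}      # symbol -> (cumulative probability, probability)

low, high = 0.0, 1.0
for s in "ABA":
    c, p = probs[s]
    rng = high - low
    low, high = low + rng * c, low + rng * (c + p)
    print(f"after {s}: [{low}, {high})")         # ends at [0.64, 0.768)
```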
Binary Arithmetic Coding
Binary arithmetic coding (BAC) adapts the principles of arithmetic coding to a binary alphabet consisting solely of symbols 0 and 1, minimizing the overhead inherent in general arithmetic coding for larger alphabets. This specialization is particularly advantageous for encoding binary data, such as the binarized representations used in many compression schemes, as it allows the coder to approach the theoretical entropy limit more closely than fixed-length or Huffman coding methods, which are constrained by integer bit lengths and cannot fully exploit fractional bit efficiencies for skewed probabilities. By processing one bit at a time, BAC achieves higher compression ratios for sources with uneven symbol probabilities, making it a foundational component in efficient entropy coding systems.[4]

The core mechanism of BAC involves iteratively subdividing a current coding interval [low, high) to represent the encoded bit sequence, where the range is defined as range = high - low. For a bit with fixed probability p of being 1 (and thus 1 - p for 0), the subdivision updates the interval based on the observed bit: if the bit is 0, the new low remains unchanged and high is set to low + range × (1 - p); if the bit is 1, low is updated to low + range × (1 - p) and high remains unchanged. These updates narrow the interval progressively, with the final interval used to generate the code stream by selecting a value within it and outputting sufficient bits to distinguish it from adjacent intervals.

Practical implementations of BAC employ multiplication-free integer arithmetic to enhance computational efficiency, representing low and range as fixed-point integers (typically 32 bits or less) and approximating multiplications via bit shifts and precomputed lookup tables. Renormalization occurs when range falls below a threshold (e.g., 0.25 in scaled units), shifting bits from low to the output stream and rescaling range by left-shifting (effectively multiplying by powers of 2) until it exceeds the threshold. This approach avoids expensive floating-point or multiplication operations, relying instead on simple additions, shifts, and table accesses for interval rescaling and subdivision, enabling high-throughput encoding in both hardware and software.[7][4]

A prominent variant, the MQ-coder, integrates state-based probability estimation into BAC using a finite-state machine of probability states, each linked to a quantized less probable symbol (LPS) probability value from a lookup table for rapid access during updates. While this section focuses on static probabilities, the MQ-coder's design supports adaptive extensions by transitioning states based on observed symbols, though here p remains fixed.[4]

To illustrate BAC with a static p = 0.3 for bit 1 (thus 1 - p = 0.7 for 0), consider encoding the binary string "1011" starting with low = 0 and high = 1 (range = 1); a code sketch after the list reproduces this trace:
- Bit 1 ('1'): low = 0 + 1 × 0.7 = 0.7, high = 1, range = 0.3
- Bit 2 ('0'): low = 0.7, high = 0.7 + 0.3 × 0.7 = 0.91, range = 0.21
- Bit 3 ('1'): low = 0.7 + 0.21 × 0.7 = 0.847, high = 0.91, range = 0.063
- Bit 4 ('1'): low = 0.847 + 0.063 × 0.7 = 0.8911, high = 0.91, range = 0.0189
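A sketch reproducing the trace above under the same static probability assumption:

```python
# Static binary arithmetic coding of "1011" with p = 0.3 for a '1' bin;
# the '0' sub-interval occupies the lower 70% of the current range.

p_one = 0.3
low, high = 0.0, 1.0
for bit in "1011":
    rng = high - low
    split = low + rng * (1.0 - p_one)            # boundary between the '0' and '1' parts
    if bit == "0":
        high = split
    else:
        low = split
    print(f"bit {bit}: low={low:.4f}, high={high:.4f}, range={high - low:.4f}")
# final interval [0.8911, 0.91); any number inside it identifies the string
```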
Context Adaptation Concepts
Context adaptation in binary arithmetic coding refers to the process of dynamically selecting or modifying probability estimates for binary symbols based on contextual information, such as previously encoded neighboring symbols or syntactic elements within the data stream. This approach enhances compression performance by tailoring probabilities to the local statistics of non-stationary sources, where symbol probabilities vary depending on surrounding data rather than remaining uniform.[4]

Context models serve as the core structure for this adaptation, typically implemented as finite-state machines or probability tables that associate each state with an estimate of the likelihood for a binary symbol being '0' or '1' given the current context. These models enable the coder to switch between different probability distributions as the context changes, capturing dependencies that fixed-probability schemes cannot. Adaptation mechanisms within these models often rely on frequency counting to tally symbol occurrences specific to each context, combined with techniques like exponential smoothing to update probabilities over time. For instance, the updated probability of a '1' can be computed recursively as p_new = α × p_old + (1 − α) × b, where b is the just-coded bit value and α (a factor close to 1) controls how quickly older observations are forgotten.
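A minimal sketch of such a recursive update; the smoothing factor and the two-context setup are chosen here only for illustration.

```python
# Per-context exponential smoothing of P(bin = 1): p <- alpha*p + (1 - alpha)*bin.

alpha = 0.95
contexts = {0: 0.5, 1: 0.5}                      # P(bin = 1) per context, initially unbiased

def code_bin(ctx, bin_val):
    p = contexts[ctx]                            # probability used for coding this bin
    contexts[ctx] = alpha * p + (1 - alpha) * bin_val
    return p

for b in (1, 1, 0, 1):
    code_bin(0, b)
print(contexts[0])                               # drifts toward the observed bias of context 0
```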
Encoding Process
Symbol Binarization
Symbol binarization is a preprocessing step in context-adaptive binary arithmetic coding (CABAC) that maps non-binary input symbols, such as integers representing transform coefficients or motion vector differences, into a sequence of binary decisions known as bins. This conversion is essential because CABAC operates exclusively on binary symbols, reducing the alphabet size to enable efficient arithmetic coding while allowing for prefix and suffix codes that optimize the representation of variable-length data.[9]

Common binarization techniques in CABAC include unary binarization, which is particularly effective for encoding small integer values like run-lengths, and exponential-Golomb (EG) codes for larger integers. Unary binarization represents an unsigned integer x as a sequence of x '1' bits followed by a terminating '0' bit, resulting in a bin string of length x + 1. For instance, the value 5 is binarized as the unary code "111110". Variants such as truncated unary (TU) limit the unary prefix to a maximum length S, omitting the terminator when x = S, while EG codes combine a unary prefix with a binary suffix carrying the remainder, where the order k sets the minimum suffix length. Signed variants, like signed exponential-Golomb (SEG) codes, append a sign bit to handle positive and negative values, such as in motion vector differences.[9]

In CABAC as specified in the H.264/AVC standard, binarization is tailored to specific syntax elements for optimal compression. Transform coefficients are binarized using a combination of flags and codes: the significance flag indicates whether a coefficient is nonzero, modeled with contexts based on position in the scanning order; the greater-than-1 flag signals if the absolute value exceeds 1, followed by unsigned exponential-Golomb (UEG0) for the absolute level, which adapts based on the number of trailing ones and previously coded levels. Motion vector differences employ an adaptive unary/exponential-Golomb scheme, such as a truncated unary prefix up to length 9 combined with an EG3 suffix, plus a sign bit for the component value. Each bin in these binarizations is assigned to one of several context models (with indices ranging from 0 to 398) to enable adaptive probability estimation during encoding.[9]

While binarization simplifies the input to a binary stream suitable for arithmetic coding, it inherently introduces multiple bins per symbol, each potentially using distinct contexts to capture local statistics and improve adaptation. This bin-level granularity then feeds the arithmetic coding stage, where the binarized sequence drives the coding-interval updates.[9]
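The binarization families described above can be sketched as follows; these follow the generic definitions rather than the exact per-syntax-element combinations of the standard.

```python
# Unary, truncated unary with cutoff S, and k-th order exponential-Golomb binarization
# (unary prefix of length q, then a (k + q)-bit suffix).

def unary(x):
    return [1] * x + [0]

def truncated_unary(x, s):
    return [1] * x if x == s else [1] * x + [0]   # terminator dropped at the cutoff

def exp_golomb_k(x, k):
    bits, q = [], 0
    while x >= (1 << k):                          # unary prefix: count power-of-two chunks
        bits.append(1)
        x -= 1 << k
        k += 1
        q += 1
    bits.append(0)                                # prefix terminator
    for i in reversed(range(k)):                  # suffix of k (= original k + q) bits
        bits.append((x >> i) & 1)
    return bits

print(unary(5))                 # [1, 1, 1, 1, 1, 0]
print(truncated_unary(3, 3))    # [1, 1, 1]
print(exp_golomb_k(12, 0))      # [1, 1, 1, 0, 1, 0, 1]
```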
Context Modeling
In context-adaptive binary arithmetic coding (CABAC), context model selection is performed using a predefined set of models, totaling 399 in the H.264/AVC standard, indexed by syntax element type, position within the element, and contextual information such as neighboring data.[4] These models are organized into tables that categorize syntax elements like macroblock types, motion vectors, and residual data, with context indices derived from limited templates, often involving up to two neighboring elements to capture spatial dependencies, such as for intra-prediction modes.[4] For instance, the context index increment for a macroblock skip flag is the sum of the skip flag values of the left (A) and top (B) neighboring macroblocks (each 0 or 1), yielding ctxIdxInc from 0 to 2, which selects context models such as 11 to 13 for P slices.[4]

During the encoding process, for each binary symbol (bin) produced by binarization, a specific context model $ m $ is selected, and its current most probable symbol (MPS) probability estimate $ p_m $ is used to perform the arithmetic encoding.[4] Following encoding, the model is adapted by updating its state based on the encoded symbol: if the symbol matches the MPS, the state index increments by 1 (up to a maximum of 62); if it is the least probable symbol (LPS), the state index is decremented according to a predefined table TransIdxLPS, and the MPS toggles when an LPS is coded in the equi-probable state (index 0).[4] This adaptation is governed by the state transition function $ \text{new_state} = f(\text{current_state}, \text{symbol}) $, implemented via two lookup tables (TransIdxMPS for MPS transitions and TransIdxLPS for LPS), operating over a compact set of 64 probability states (0 to 63) to balance adaptation speed and precision.[4] Although explicit frequency counters like count_0[m] or count_1[m] are not directly maintained, the state transitions implicitly track symbol frequencies to refine $ p_m $ over time.[4]

Context models are reset at slice boundaries to ensure independence between slices, initialized with probability values dependent on the slice quantization parameter (SliceQP) and slice type, using one of three predefined initialization tables (indices 0 to 2).[4] Most models are adaptive, updating dynamically during encoding, whereas a subset remains fixed for elements like slice headers to avoid overhead from unnecessary adaptation.[4] Spatial adaptation is achieved by deriving context indices from neighboring blocks; for example, in encoding a transform coefficient's significance flag (indicating whether the coefficient is nonzero), the context index (ranging from 0 to 3) is selected based on the significance of the left and above neighboring coefficients within the transform block.[4]
Arithmetic Encoding Steps
The arithmetic encoding steps in context-adaptive binary arithmetic coding (CABAC) form the core mechanism for compressing binary symbols (bins) by mapping them to subintervals of a unit interval based on their estimated probabilities, producing a compact bitstream through iterative range refinement and renormalization. This process operates on a current interval defined by a lower bound low (initially 0) and a range (initially 510 in the scaled integer representation to maintain precision; for table lookups the range is quantized to one of four values: 256, 384, 448, 510). For each bin provided from the binarization and context modeling stages, the interval is subdivided according to the bin's value relative to the most probable symbol (MPS) and the estimated probability p_LPS of the least probable symbol (LPS), retrieved from the selected context model. The subdivision ensures that the MPS branch occupies the larger portion of the interval (probability 1 − p_LPS), while the LPS branch occupies the smaller portion (p_LPS).[4]

The step-by-step encoding flow for a single bin proceeds as follows. First, the context model provides the current probability state index $\sigma$ (0 to 63), from which p_LPS is derived, and the MPS value (0 or 1). The bin value is compared to the MPS: if it matches, the MPS branch is selected; otherwise, the LPS branch is taken. The range is quantized to a 2-bit index q used to select the LPS sub-range r_LPS = TabRangeLPS[$\sigma$][q] from a precomputed lookup table, which approximates range × p_LPS using 8-bit fixed-point values for multiplication-free operation. The updates are then:
- If the bin equals MPS (MPS branch): range = range − r_LPS, with low unchanged.
- If the bin differs from MPS (LPS branch): low = low + (range − r_LPS), range = r_LPS.

After the interval update, the context state adapts to the observation: on an MPS the state index increments (saturating at 62), while on an LPS it transitions to transIdxLPS[$\sigma$] to reflect the observation. Additionally, if $\sigma$ = 0 (the equi-probable state), the MPS value toggles after an LPS occurrence. This process repeats for each bin in the sequence.[4]
CABAC employs 32-bit fixed-point integer arithmetic to implement these updates efficiently, avoiding floating-point multiplications by relying on the TabRangeLPS table (64 states × 4 range quanta = 256 entries), which stores precomputed scaled values so that TabRangeLPS[$\sigma$][q] ≈ range × p_LPS for the quantized range. The low and range are maintained as unsigned 32-bit integers, with 256 ≤ range < 512 typically holding after renormalization, providing sufficient precision for video coding without overflow in intermediate computations. Scaling factors are chosen such that renormalization restores range ≥ 256 after each update.[4]
Following the interval update, renormalization rescales the interval to prevent precision loss when range becomes small. This involves a loop that repeatedly left-shifts low and range by 1 bit until range ≥ 256, emitting an output bit (or recording an outstanding bit for later carry resolution) on each shift. As an illustration, a short trace of the engine proceeds as follows, writing r_LPS for the table value TabRangeLPS[$\sigma$][q] at each step; a code sketch after the list shows the same loop:
- Bin 1 (LPS): low = low + (range − r_LPS), range = r_LPS.
- Renormalize: three left-shifts are needed before range ≥ 256 again, with the output bits determined by the most significant bits of the shifted low and the outstanding-bits counter.
- Bin 0 (MPS): range = range − r_LPS, low unchanged.
- Bin 1 (LPS): low = low + (range − r_LPS), range = r_LPS.
- Renormalize: four further shifts, outputting additional bits from the most significant bits of low with carry handling.
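A simplified sketch of the encoding engine with renormalization; as noted in the comments, the explicit multiply stands in for the standard's TabRangeLPS lookup and the single 8-bit LPS probability stands in for the probability state machine.

```python
# Minimal CABAC-style binary arithmetic encoder (encodeBin + renormalization).
# Simplifications (assumptions, not the normative design): the LPS sub-range is computed
# with an explicit multiply instead of a TabRangeLPS lookup, and each context is reduced
# to a fixed 8-bit LPS probability.

class BinEncoder:
    def __init__(self):
        self.low = 0
        self.rng = 510          # initial range, as in the text above
        self.outstanding = 0    # bits whose value awaits carry resolution
        self.first_bit = True
        self.bits = []          # emitted bitstream

    def _put_bit(self, b):
        if self.first_bit:
            self.first_bit = False   # the very first bit is discarded, mirroring the standard
        else:
            self.bits.append(b)
        while self.outstanding > 0:
            self.bits.append(1 - b)  # resolve outstanding bits with the opposite value
            self.outstanding -= 1

    def _renorm(self):
        while self.rng < 0x100:
            if self.low < 0x100:
                self._put_bit(0)
            elif self.low >= 0x200:
                self.low -= 0x200
                self._put_bit(1)
            else:                    # undecided region: remember an outstanding bit
                self.low -= 0x100
                self.outstanding += 1
            self.low <<= 1
            self.rng <<= 1

    def encode_bin(self, bin_val, p_lps_q8, mps):
        """Encode one bin with LPS probability p_lps_q8/256 and most probable symbol mps."""
        r_lps = max(2, (self.rng * p_lps_q8) >> 8)   # stands in for TabRangeLPS[sigma][q]
        self.rng -= r_lps                            # MPS branch keeps the larger sub-range
        if bin_val != mps:                           # LPS branch
            self.low += self.rng
            self.rng = r_lps
        self._renorm()

enc = BinEncoder()
for b in (1, 0, 1, 1):
    enc.encode_bin(b, p_lps_q8=64, mps=0)            # p_LPS = 0.25, MPS = 0
print(enc.bits)
```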
Decoding Process
Arithmetic Decoding Steps
The arithmetic decoding process in context-adaptive binary arithmetic coding (CABAC) reverses the encoding operations to extract binary symbols (bins) from the compressed bitstream, using the same context models to determine interval subdivisions. This engine operates with fixed-point integer arithmetic for efficiency, maintaining three key state variables: the lower bound low of the current coding interval, the interval width range, and the code value c representing the bitstream position within the interval.[4]
Initialization occurs at the start of each slice or coding unit, setting low = 0 and range = 256 to establish an 8-bit precision interval spanning the full possible range (0 to 255). The code value c is initialized by reading the first 8 bits from the bitstream into the lower bits of c, ensuring 0 ≤ c < 256 to align within the initial interval. These initial values ensure the decoder begins with a normalized state, mirroring the encoder's setup and enabling precise interval tracking from the outset.[4]
To decode a single bin, the decoder first selects the appropriate context model to obtain the most probable symbol value valMPS (0 or 1) and the probability estimate $ p $ of the least probable symbol (LPS). The cumulative probability width for the MPS subinterval is computed as $ \text{cum_prob} = \text{range} \times (1 - p) $. The decoder then determines the offset offset = c - low and compares it against $ \text{cum_prob} $: if offset < cum_prob, the bin is decoded as valMPS (MPS), and the interval is updated by setting range = cum_prob with low unchanged; otherwise, the bin is decoded as 1 - valMPS (LPS), low += cum_prob, and range *= p. This decision effectively identifies which subinterval contains the code value, narrowing the interval while preserving the bitstream's information. In practice, multiplications are approximated using precomputed tables (e.g., TabRangeLPS) indexed by quantized range and probability state to avoid floating-point operations.[4][10]
Renormalization maintains numerical precision during decoding by rescaling the interval when range drops below 256, preventing underflow in the fixed-point representation. While range < 256, the decoder reads the next bits from the bitstream (typically byte-aligned for efficiency), then rescales both low and range by left-shifting them (multiplying by 2 per bit or 256 per byte), updates c accordingly, with carry handling if the shifted low would exceed the register size (e.g., outputting a bit '1' and adjusting low down). This process, repeated as needed, effectively appends new bits to c in an MSB-aligned manner, ensuring the interval remains normalized without losing precision. The renormalization step is symmetric to the encoding counterpart, where bits are output instead of input.[4]
A key equation in this rescaling is the update for the code value: $ c = ((c - \text{low}) \ll n) + \text{new_bits} + (\text{low} \ll n) $, where n is the number of bits read (e.g., 8 for a byte), applied after shifting low and range left by n bits (with carry handling if low's MSB is set). This formula can equivalently be expressed as $ c = (c \ll n) + \text{new_bits} $ in cases without carry propagation from low. It isolates the relative position before incorporating new bitstream data, avoiding overflow and maintaining decoder synchronization with the encoded stream.[4]
After decoding the last bin, final bits are flushed to resolve any pending renormalization or carry bits in low. The remaining bits in low are output directly if a carry propagates, followed by up to two disambiguating bits (typically '1's) to ensure the decoder consumes exactly the bitstream length without over-reading. This termination prevents emulation of start codes and guarantees exact decoding, using a fixed probability state (e.g., index 63 with $ p = 0.5 $) for the final interval. Context models are reused from the encoding process to match these probabilities without additional updates during flushing.[4]
For illustration, consider decoding the bin sequence "101" from a bitstream assuming a fixed context with valMPS = 0, LPS probability $ p = 0.2 $ for simplicity, starting with low = 0, range = 256, and c = 200 (within [0,256)). For the first bin: cum_prob = 256 × 0.8 ≈ 205; offset = 200 - 0 = 200 < 205, so bin = 0 (MPS), range = 205, low = 0. No renormalization. For the second bin: cum_prob = 205 × 0.8 ≈ 164; offset = 200 >= 164, so bin = 1 (LPS), low = 0 + 164 = 164, range = 205 × 0.2 ≈ 41, c remains 200 (unchanged). The new offset = 200 - 164 = 36 < 41, consistent with the interval. Renormalization is needed (41 < 256): shift low, range, and c (using the corrected formula, e.g., for n=8, $ c = ((200 - 164) \ll 8) + \text{new_bits} + (164 \ll 8) $), while handling carry if applicable. Subsequent steps refine the interval further to recover the sequence. This example demonstrates how bin decisions and state updates progressively narrow the interval to recover the original sequence, with c unchanged during subdivision and correct renormalization handling.[4]
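A sketch of this simplified decoding rule (not the normative multiplication-free engine), reproducing the worked values above:

```python
# One bin-decoding step: cum_prob = range*(1-p) splits the interval and the code value c
# selects the MPS or LPS branch. Values follow the worked example in the text.

def decode_bin(low, rng, c, p_lps, val_mps):
    cum_prob = int(round(rng * (1.0 - p_lps)))   # width of the MPS sub-interval
    if c - low < cum_prob:                       # code value falls in the MPS part
        return val_mps, low, cum_prob
    else:                                        # LPS part
        return 1 - val_mps, low + cum_prob, int(round(rng * p_lps))

low, rng, c = 0, 256, 200
for _ in range(2):
    bin_val, low, rng = decode_bin(low, rng, c, p_lps=0.2, val_mps=0)
    print(bin_val, low, rng)      # prints 0 0 205, then 1 164 41
```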
Context Modeling Reuse
In context-adaptive binary arithmetic coding (CABAC), synchronization between the encoder and decoder is achieved through the symmetric reuse of context models, ensuring that both sides maintain identical probability estimates without transmitting any side information for model states.[4] The encoder and decoder begin with the same initial context models and apply identical deterministic update rules after processing each binary symbol (bin), allowing the models to evolve in lockstep based solely on the sequence of encoded or decoded bins.[4] This reuse mechanism confines adaptation to the current slice, preventing desynchronization across independent processing units.[4]

During decoding, after a bin is recovered through arithmetic decoding—either as the most probable symbol (MPS) or least probable symbol (LPS)—the corresponding context model is updated using the same rules as in encoding to reflect the observed symbol.[4] Each context model is represented by a probability state index (ranging from 0 to 63, corresponding to 126 total states including MPS polarity) and the MPS value (0 or 1).[4] If the decoded bin matches the MPS, the state index increments by 1 (saturating at 62 for most probable cases); if it is the LPS, the state index transitions according to a predefined table (TransIdxLPS), which decrements the index based on the current probability estimate, with a special toggle of the MPS value at the equi-probable state (index 0).[4] These updates effectively track frequency-like adaptations, such as increasing the count for the observed symbol, but are implemented via efficient table lookups to minimize computation.[4]

Context models are initialized at the start of each slice from standard tables derived via linear regression on empirical data, parameterized by the slice quantization parameter (SliceQP) and a slice-type index (0-2) signaled in the slice header.[4] This initialization ensures both encoder and decoder load equivalent starting states, after which adaptive evolution mirrors the encoding path through bin-by-bin updates.[4] No explicit model states are transmitted, as desynchronization is avoided by the deterministic nature of the updates, which depend only on the bitstream symbols processed within the slice.[4] For edge cases, models are fully reset at slice boundaries to support independent decoding of slices, limiting backward adaptation to the duration of a single slice and enabling parallel processing.[4] Fixed contexts, such as those for certain syntax elements with uniform probabilities (e.g., equi-probable bins), remain unchanged throughout, bypassing adaptation to preserve efficiency.[4]

As an illustrative example, consider a context model starting at state index 20 with MPS = 0 (implying a low probability for bin '1'). If the first decoded bin is 0 (MPS), the state increments to 21; if the second decoded bin is 1 (now LPS at the updated state), the state transitions per TransIdxLPS (e.g., to 19), but in a sequence of two MPS '0' bins, it would increment to 22, mirroring the exact path taken during encoding without any additional data exchange.[4]
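A sketch of the symmetric state update applied on both sides after every bin; the small stand-in transition table is an assumption, since the standard defines TransIdxLPS as a fixed 64-entry table.

```python
# Context-state update shared by encoder and decoder.

def update_state(state, mps, bin_val, trans_idx_lps):
    if bin_val == mps:                      # MPS observed: move toward higher confidence
        return min(state + 1, 62), mps
    if state == 0:                          # LPS in the equi-probable state: MPS flips
        mps = 1 - mps
    return trans_idx_lps[state], mps        # LPS observed: fall back per the table

# Illustrative stand-in table; real TransIdxLPS values come from the standard.
toy_trans_idx_lps = [0] + [max(0, i - 3) for i in range(1, 63)] + [63]

state, mps = 20, 0
for b in (0, 0, 1):                         # two MPS '0's (state 21, 22), then one LPS
    state, mps = update_state(state, mps, b, toy_trans_idx_lps)
print(state, mps)                           # the LPS drops the state to 19 per the toy table
```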
Symbol Debinarization
In the decoding process of context-adaptive binary arithmetic coding (CABAC), symbol debinarization serves as the final step to reconstruct the original symbols from the sequence of binary decisions (bins) obtained through arithmetic decoding. This inverse mapping follows the identical binarization rules employed during encoding, ensuring that the bitstream's syntax structure guides the parsing of bins into symbols such as transform coefficients, motion vectors, or prediction modes. By accumulating bins according to predefined prefix codes or fixed-length representations, the decoder reverses the transformation while leveraging the same context models to interpret the bin stream accurately.[4]

The debinarization process begins by parsing the bin sequence in a manner that mirrors the forward binarization, consuming bins until a terminator is encountered or a fixed length is reached. For unary binarization, which represents non-negative integers as consecutive '1' bins followed by a '0' terminator, the inverse operation counts the number of leading '1's until the '0' to recover the value x as that count. Truncated unary codes, used for bounded values up to a cutoff S, omit the terminator for the maximum value, allowing the decoder to infer x = S directly from S consecutive '1's without a following '0'. Exponential-Golomb (EG) codes, common for sparse data like coefficient levels, divide the bin string into a unary prefix of length q (representing the quotient) and a binary suffix of k + q bits (the remainder); for a k-th order code the unsigned value is then computed as value = 2^k × (2^q − 1) + suffix, with an optional additional bin indicating the sign for signed symbols. These mappings ensure efficient recovery for varying symbol entropies, with the choice of code (e.g., EG0 for levels, EG3 for motion components) dictated by the syntax element.[4]

Context considerations during debinarization align precisely with those used in encoding, where each bin's probability estimate derives from an adaptive model indexed by previously decoded syntax elements. However, the actual reconstruction of the symbol relies on the decoder's knowledge of the video syntax, such as block type or neighboring coefficients, to select the appropriate binarization scheme and context indices (e.g., contexts 0-3 for unary prefix bins in transform levels). This synchronization minimizes mismatches, as the bins are decoded using the same context-adaptive probabilities that were updated during encoding. For instance, in decoding a transform coefficient absolute level using unary/EG0, the first few '1's might use context index 0 (for significant coefficients), while later bins shift to higher indices based on magnitude.[4]

A practical example illustrates this: consider the bin string "111110" for a unary-coded coefficient, where the first five bins are '1' (each decoded with progressively higher context indices, say 0 through 4, based on prior levels) followed by a '0' terminator. The decoder counts five '1's to reconstruct the value 5, updating the relevant contexts afterward to reflect the symbol's magnitude.
For an exponential-Golomb case, a prefix of three '1's followed by '0' (q = 3) and a 3-bit suffix "101" (5 in decimal) yields value = 2^0 × (2^3 − 1) + 5 = 12 for a zeroth-order (EG0) code, potentially followed by a sign bin '1' for a negative value.[4] Errors in decoding early bins can propagate through the debinarization, potentially corrupting the entire symbol (e.g., miscounting a unary prefix alters the value significantly); however, CABAC's design, with its context adaptation and separate models for different bin positions, limits the impact by localizing error effects rather than corrupting the whole stream.[4]
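A sketch of debinarization for the two code families discussed above, assuming a simple bin iterator in place of the arithmetic decoder:

```python
# Inverse unary (count leading '1's up to a '0') and k-th order exponential-Golomb
# (unary prefix q, then a (k + q)-bit suffix, value = 2**k * (2**q - 1) + suffix).

def decode_unary(bins):
    x = 0
    while next(bins) == 1:
        x += 1
    return x

def decode_egk(bins, k):
    q = 0
    while next(bins) == 1:      # unary prefix, terminated by '0'
        q += 1
    suffix = 0
    for _ in range(k + q):      # fixed-length suffix
        suffix = (suffix << 1) | next(bins)
    return (2 ** k) * (2 ** q - 1) + suffix

print(decode_unary(iter([1, 1, 1, 1, 1, 0])))          # -> 5
print(decode_egk(iter([1, 1, 1, 0, 1, 0, 1]), k=0))    # prefix q=3, suffix '101' -> 12
```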
Implementation Details
Probability Estimation
In context-adaptive binary arithmetic coding (CABAC), probability estimation for each binary symbol is managed through a finite-state machine comprising 128 distinct states, indexed by a 7-bit value ranging from 0 to 127, which allows for efficient adaptation without explicit frequency counting.[4] These states encode both the estimated probability of the least probable symbol (LPS) and the identity of the most probable symbol (MPS), with even indices (0 to 126 in steps of 2) corresponding to MPS = 0 and odd indices to MPS = 1, effectively yielding 64 unique LPS probability estimates that range from approximately 0.5 to 0.01875.[4] The LPS probabilities are predefined in a lookup table and approximated to support multiplication-free arithmetic operations, where the interval size for the LPS is determined by scaling the current range using quantized values from another table, TabRangeLPS[state][quarter_index], to maintain precision with 8-bit range normalization.[4] This table-driven representation enables fast adaptation without maintaining explicit symbol counts, in contrast to coders that rely on incremental frequency counting and periodic rescaling of those counts.[4]

The 64 LPS probabilities are derived recursively to mimic an exponential moving average with an adaptation factor α ≈ 0.9493, starting from p_LPS(0) = 0.5 and following p_LPS(i) = α · p_LPS(i-1) for decreasing probabilities as the state index increases toward higher confidence in the MPS.[4] Internally, these probabilities are scaled and quantized for fixed-point arithmetic; for instance, the most probable probability can be represented as mpp[s] ≈ Q[s]/512 in 9-bit precision, where Q[s] is the table entry for state s (0-63), and the least probable probability lpp[s] = 512 - mpp[s], though the coding process uses the complementary interval directly without computing explicit probabilities.[4]

At the beginning of each coding slice, probability estimation is initialized using standard-defined tables that assign an initial state index and MPS value to each context model, tailored to the slice type (I, P, or B) and quantization parameter (QP). The initial state is computed via a linear approximation formula, such as state ≈ clip((μ · QP + offset) / scale, 1, 126), where μ and offset parameters are predefined per context category, with separate tables (0-2) signaled in the slice header to reflect content-specific biases, such as assuming MPS = 0 for many intra-prediction contexts.[4] This QP-dependent initialization ensures quick convergence to typical probabilities for the video content, reducing early coding inefficiency.[4] Following the coding of each binary symbol, the probability state is updated adaptively to reflect the observed outcome.
If the symbol equals the current MPS, the probability sub-index increments by 1 (saturating at 62), shifting toward greater MPS likelihood; if the symbol is the LPS, the sub-index is updated according to the transition table transIdxLPS[current_index], which maps the 6-bit index (0-63) to a new value approximating the probability shift.[4] The update equations approximated by the table are p_LPS,new ≈ α · p_LPS for MPS occurrences and p_LPS,new ≈ α · p_LPS + (1 − α) for LPS occurrences, with the MPS toggled if an LPS is observed at the equi-probable point (sub-index 0).[4] One non-adaptive state (probability sub-index 63) is reserved for termination-related fixed coding and is never updated.[4] The following excerpt from the article's context probability table illustrates representative LPS interval sizes (in quarter-range units, approximate p_LPS = value/256) for selected state indices; a short sketch after the table illustrates the recursion behind these estimates:
| State Index | TabRangeLPS[0] (p_LPS ≈) |
|---|---|
| 0 | 128 (0.500) |
| 10 | 87 (0.340) |
| 20 | 59 (0.230) |
| 30 | 40 (0.156) |
| 40 | 27 (0.105) |
| 50 | 18 (0.070) |
| 60 | 12 (0.047) |
| 63 | 2 (0.008) |
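A short sketch of the exponential recursion behind these estimates; the printed values illustrate the recursion only and are not the standard's table entries.

```python
# LPS probability grid: p_LPS(0) = 0.5, p_LPS(i) = alpha * p_LPS(i-1).
# The normative design stores the resulting sub-range widths in a fixed TabRangeLPS table.

alpha = 0.9493        # adaptation factor quoted in the text above
p = 0.5
for i in range(64):
    if i % 10 == 0 or i == 63:
        print(f"state {i:2d}: p_LPS ~ {p:.4f}")   # ends near 0.019 at state 63
    p *= alpha
```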
Bypass and Termination Modes
In CABAC, the bypass mode provides a simplified encoding mechanism for binary symbols (bins) that are assumed to occur with equal probability, approximately 0.5, thereby bypassing the context modeling and adaptive probability estimation processes used in the regular mode.[4] This mode is particularly employed for data with low entropy or uniform distribution, such as slice headers, sign bits of transform coefficients, and certain suffix bits in exponential-Golomb codes for motion vector differences.[4] By avoiding the multiplication operations and probability updates inherent in regular arithmetic coding, the bypass mode significantly reduces computational complexity for these symbols while maintaining coding efficiency close to that of ideal equiprobable encoding.[4]

The encoding steps in bypass mode involve updating the arithmetic coder's interval without adaptive scaling. Specifically, the range is halved for each bin, and the low value is adjusted only if the bin value is 1 by adding the halved range. Renormalization occurs more frequently due to the fixed halving, allowing faster bit output through bit shifting rather than complex arithmetic. The process can be described as follows:
- If the bin value is 0: range = range / 2, low unchanged.
- If the bin value is 1: range = range / 2, low = low + range.
- Renormalization: While range < 256, output the least significant bit of low to the bitstream, shift low left by 1 (discarding overflow carries as additional bits if needed), and double range.[4]

For example, encoding the bypass bins '1', '1', '0', '0' in this simplified model starts with low = 0 and range = 512. After the first '1': range = 256, low = 256. After the second '1': range = 128, low = 384, followed by renormalization outputting bit 0, setting low = 768, range = 256. After the third '0': range = 128, low = 768, renormalizing to output bit 0, low = 1536, range = 256. After the fourth '0': range = 128, low = 1536, renormalizing to output bit 0, low = 3072, range = 256. The bitstream receives the bits 000 in this case, demonstrating how bypass approximates direct appending adjusted by the arithmetic interval management; a sketch of the equivalent normative formulation follows the example.[4]
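A sketch of bypass coding in the normative H.264-style formulation, in which low is doubled and range is left unchanged; this is an equivalent view of the halving-of-range description above, and the first-bit handling of the full engine is omitted.

```python
# Bypass bin encoding: double low, add the (unchanged) range for a '1', then resolve the
# leading bit (emit, or record an outstanding bit when the decision is still pending).

class BypassEncoder:
    def __init__(self):
        self.low, self.rng = 0, 510
        self.outstanding, self.bits = 0, []

    def _put_bit(self, b):
        self.bits.append(b)
        while self.outstanding:
            self.bits.append(1 - b)              # resolve pending bits with the opposite value
            self.outstanding -= 1

    def encode_bypass(self, bin_val):
        self.low <<= 1
        if bin_val:
            self.low += self.rng
        if self.low >= 0x400:                    # carry: emit 1 and wrap
            self._put_bit(1)
            self.low -= 0x400
        elif self.low < 0x200:                   # safely in the lower half: emit 0
            self._put_bit(0)
        else:                                    # undecided: remember an outstanding bit
            self.low -= 0x200
            self.outstanding += 1

enc = BypassEncoder()
for b in (1, 1, 0, 0):
    enc.encode_bypass(b)
print(enc.bits)                                  # pending bits are resolved at the next emit or flush
```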
The termination mode finalizes the bitstream at the end of a slice or coding unit, ensuring the decoder can unambiguously recover all encoded symbols without further input.[4] It is invoked after the last regular or bypass bin, followed by flushing the remaining interval using a procedure that outputs a definitive set of bits without encoding any additional symbol.[4] This produces the necessary bits to cover the current low and range, preventing any renormalization on the decoder side and guaranteeing exact bitstream length—typically involving outputting floor(log_2(range)) + 1 bits from the most significant bits of low, followed by zeros equal to the number of pending underflow bits E.[4]