Context-adaptive binary arithmetic coding

from Wikipedia

Context-adaptive binary arithmetic coding (CABAC) is a form of entropy encoding used in the H.264/MPEG-4 AVC[1][2] and High Efficiency Video Coding (HEVC) standards. It is a lossless compression technique, although the video coding standards in which it is used are typically for lossy compression applications. CABAC is notable for providing much better compression than most other entropy encoding algorithms used in video encoding, and it is one of the key elements that provides the H.264/AVC encoding scheme with better compression capability than its predecessors.[3]

In H.264/MPEG-4 AVC, CABAC is only supported in the Main and higher profiles (but not the extended profile) of the standard, as it requires a larger amount of processing to decode than the simpler scheme known as context-adaptive variable-length coding (CAVLC) that is used in the standard's Baseline profile. CABAC is also difficult to parallelize and vectorize, so other forms of parallelism (such as spatial region parallelism) may be coupled with its use. In HEVC, CABAC is used in all profiles of the standard.

Algorithm


CABAC is based on arithmetic coding, with a few innovations and changes to adapt it to the needs of video encoding standards:[4]

  • It encodes binary symbols, which keeps the complexity low and allows probability modelling for more frequently used bits of any symbol.
  • The probability models are selected adaptively based on local context, allowing better modelling of probabilities, because coding modes are usually locally well correlated.
  • It uses a multiplication-free range division by the use of quantized probability ranges and probability states.

CABAC has multiple probability modes for different contexts. It first converts all non-binary symbols to binary. Then, for each bit, the coder selects which probability model to use, then uses information from nearby elements to optimize the probability estimate. Arithmetic coding is finally applied to compress the data.


The context modeling provides estimates of conditional probabilities of the coding symbols. Utilizing suitable context models, a given inter-symbol redundancy can be exploited by switching between different probability models according to already-coded symbols in the neighborhood of the current symbol to encode. The context modeling is responsible for most of CABAC's roughly 10% savings in bit rate over the CAVLC entropy coding method.

Coding a data symbol involves the following stages.

  • Binarization: CABAC uses Binary Arithmetic Coding which means that only binary decisions (1 or 0) are encoded. A non-binary-valued symbol (e.g. a transform coefficient or motion vector) is "binarized" or converted into a binary code prior to arithmetic coding. This process is similar to the process of converting a data symbol into a variable length code but the binary code is further encoded (by the arithmetic coder) prior to transmission.
  • The remaining stages are repeated for each bit (or "bin") of the binarized symbol.
  • Context model selection: A "context model" is a probability model for one or more bins of the binarized symbol. This model may be chosen from a selection of available models depending on the statistics of recently coded data symbols. The context model stores the probability of each bin being "1" or "0".
  • Arithmetic encoding: An arithmetic coder encodes each bin according to the selected probability model. Note that there are just two sub-ranges for each bin (corresponding to "0" and "1").
  • Probability update: The selected context model is updated based on the actual coded value (e.g. if the bin value was "1", the frequency count of "1"s is increased).
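The four stages above can be tied together in a toy coder. The unary binarizer, the one-model-per-bin-position context choice, and the frequency-count probability estimates below are illustrative assumptions (the standard uses its own binarizations, context tables, and a 64-state probability machine); exact rational arithmetic stands in for the integer range coder.

```python
from fractions import Fraction

def binarize_unary(value):
    """Unary binarization: value -> value '1' bins followed by a '0' bin."""
    return [1] * value + [0]

class ContextModel:
    """Adaptive probability model backed by simple frequency counts."""
    def __init__(self):
        self.counts = [1, 1]  # Laplace-style initial counts for 0 and 1

    def p_one(self):
        return Fraction(self.counts[1], sum(self.counts))

    def update(self, bit):
        self.counts[bit] += 1

def encode_symbol(value, models):
    """Binarize one symbol, then context-code each bin; return the interval."""
    low, high = Fraction(0), Fraction(1)
    for bin_idx, bit in enumerate(binarize_unary(value)):
        ctx = models[min(bin_idx, len(models) - 1)]  # one model per bin position
        p1 = ctx.p_one()
        split = low + (high - low) * (1 - p1)  # '0' takes the lower sub-range
        if bit == 0:
            high = split
        else:
            low = split
        ctx.update(bit)                        # probability update (stage 4)
    return low, high

models = [ContextModel() for _ in range(3)]
low, high = encode_symbol(2, models)           # bins: 1, 1, 0
```

For the value 2 (bins 1, 1, 0) with fresh models, the interval narrows to [3/4, 7/8); any number in that interval identifies the symbol.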

Example

MVDx  Binarization
0     0
1     10
2     110
3     1110
4     11110
5     111110
6     1111110
7     11111110
8     111111110

1. Binarize the value MVDx, the motion vector difference in the x direction.

The first bit of the binarized codeword is bin 1; the second bit is bin 2; and so on.
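The mapping in the table can be generated directly; the function name below is illustrative:

```python
def binarize_mvd_unary(mvd):
    """Unary binarization from the table: mvd '1' bits, then a terminating '0'."""
    return "1" * mvd + "0"

# e.g. binarize_mvd_unary(5) gives "111110", matching row 5 of the table
```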

ek           Context model for bin 1
0 ≤ ek < 3   Model 0
3 ≤ ek < 33  Model 1
33 ≤ ek      Model 2

2. Choose a context model for each bin. One of three models is selected for bin 1, based on previously coded MVD values: the L1 norm ek of the two previously coded MVD values (those of the neighboring blocks immediately to the left and above) is calculated and mapped to a model by the table above.

Bin           Context model
1             0, 1, or 2 depending on ek
2             3
3             4
4             5
5 and higher  6

If ek is small, then there is a high probability that the current MVD will have a small magnitude; conversely, if ek is large then it is more likely that the current MVD will have a large magnitude. We select a probability table (context model) accordingly. The remaining bins are coded using one of four further context models, as listed in the table above.

3. Encode each bin. The selected context model supplies two probability estimates: the probability that the bin contains "1" and the probability that the bin contains "0". These estimates determine the two sub-ranges that the arithmetic coder uses to encode the bin.

4. Update the context models. For example, if context model 2 was selected for bin 1 and the value of bin 1 was "0", the frequency count of "0"s is incremented. This means that the next time this model is selected, the probability of a "0" will be slightly higher. When the total number of occurrences of a model exceeds a threshold value, the frequency counts for "0" and "1" will be scaled down, which in effect gives higher priority to recent observations.
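The count-and-rescale rule of step 4 can be sketched as follows; the initial counts, the threshold value, and the halving rule are illustrative assumptions, not the standard's actual state machine.

```python
class FrequencyModel:
    """Context model with frequency counts and periodic scaling-down."""
    def __init__(self, threshold=64):
        self.counts = [1, 1]        # occurrence counts for '0' and '1'
        self.threshold = threshold  # assumed rescaling trigger

    def probability(self, bit):
        return self.counts[bit] / sum(self.counts)

    def update(self, bit):
        self.counts[bit] += 1
        if sum(self.counts) > self.threshold:
            # Halve the counts (rounding up) so recent observations dominate
            self.counts = [(c + 1) // 2 for c in self.counts]
```

After a rescale, older observations carry half the weight of newer ones, which is how the model tracks recent statistics.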

The arithmetic decoding engine


The arithmetic decoder is described in some detail in the Standard. It has three distinct properties:

  1. Probability estimation is performed by a transition process between 64 separate probability states for "Least Probable Symbol" (LPS, the least probable of the two binary decisions "0" or "1").
  2. The range R representing the current state of the arithmetic coder is quantized to a small range of pre-set values before calculating the new range at each step, making it possible to calculate the new range using a look-up table (i.e. multiplication-free).
  3. A simplified encoding and decoding process is defined for data symbols with a near uniform probability distribution.

The definition of the decoding process is designed to facilitate low-complexity implementations of arithmetic encoding and decoding. Overall, CABAC provides improved coding efficiency compared with CAVLC-based coding, at the expense of greater computational complexity.

History


In 1986, IBM researchers Kottappuram M. A. Mohiuddin and Jorma Johannes Rissanen filed a patent for a multiplication-free binary arithmetic coding algorithm.[5][6] In 1988, an IBM research team including R.B. Arps, T.K. Truong, D.J. Lu, W. B. Pennebaker, L. Mitchell and G. G. Langdon presented an adaptive binary arithmetic coding (ABAC) algorithm called Q-Coder.[7][8]

The above patents and research papers, along with several others from IBM and Mitsubishi Electric, were later cited by the CCITT and the Joint Photographic Experts Group as the basis for the JPEG image compression format's adaptive binary arithmetic coding algorithm in 1992.[5] However, encoders and decoders of the JPEG file format, which has options for both Huffman encoding and arithmetic coding, typically support only the Huffman option; this was originally because of patent concerns, although JPEG's arithmetic coding patents[9] have since expired due to the age of the standard.[10] The first reported use of adaptive binary arithmetic coding in motion video compression was in a proposal by IBM researchers to the MPEG group in 1989.[11][12] This proposal extended the use of arithmetic coding from intraframe JPEG to interframe video coding.

In 1999, Youngjun Yoo (Texas Instruments), Young Gap Kwon and Antonio Ortega (University of Southern California) presented a context-adaptive form of binary arithmetic coding.[13] The modern context-adaptive binary arithmetic coding (CABAC) algorithm was commercially introduced with the H.264/MPEG-4 AVC format in 2003.[14] The majority of patents for the AVC format are held by Panasonic, Godo Kaisha IP Bridge and LG Electronics.[15]

from Grokipedia
Context-adaptive binary arithmetic coding (CABAC) is a lossless entropy coding technique that enhances compression efficiency in video coding by adaptively modeling the probabilities of binary symbols based on their context, thereby reducing statistical redundancies in the bitstream. Developed as a core component of modern video compression standards, CABAC transforms input syntax elements into binary representations, selects appropriate probability models derived from previously encoded data, and applies arithmetic coding to produce a compact output stream.[1]

Introduced in the H.264/AVC standard in 2003, CABAC was designed by researchers at Fraunhofer HHI to outperform traditional methods like context-adaptive variable-length coding (CAVLC) by providing 10-15% better compression for typical video content, particularly in complex scenes with high motion or texture.[2] It became a normative part of H.264/AVC's Main and higher profiles, where encoders can select it for superior rate-distortion performance.

Subsequently, CABAC was refined and adopted as the sole entropy coding method in the H.265/HEVC standard (2013), further optimizing its throughput and adaptability for higher resolutions like 4K video.
CABAC has been further refined and remains the entropy coding method in the H.266/VVC standard (2020), supporting even higher efficiencies for ultra-high-definition video.[1]

The encoding process in CABAC consists of three main stages: binarization, where non-binary syntax elements (e.g., motion vectors or transform coefficients) are mapped to sequences of binary decisions called bins using prefix codes like unary or exponential-Golomb; context modeling, which assigns one of up to 400 probability models to each bin based on spatial or temporal neighbors and updates these models adaptively using a finite-state machine; and binary arithmetic coding, which cumulatively refines an interval based on the bin's probability to generate fractional bit representations, followed by renormalization for output.[3] This multi-step approach allows CABAC to exploit both intra-symbol and inter-symbol correlations, making it particularly effective for structured data like quantized coefficients, where runs of zeros are efficiently modeled through a significance map using context-adaptive flags before arithmetic coding.[4]

Beyond video standards, CABAC's principles have been extended to other domains, such as deep neural network compression, where it encodes model weights by binarizing significance, sign, and magnitude levels with context adaptation to achieve near-optimal lossless rates.[5] Its hardware implementations, often optimized for parallel processing in modern codecs, balance high throughput (e.g., over 1 bin/cycle in HEVC) with low complexity, contributing to real-time decoding on resource-constrained devices.[1] Overall, CABAC remains a benchmark for adaptive entropy coding due to its proven gains in compression efficiency and robustness across diverse content types.

Fundamentals

Arithmetic Coding Basics

Arithmetic coding is a lossless data compression technique that encodes an entire message as a single real number within the unit interval [0,1), achieving near-optimal compression by assigning shorter codes to more probable symbols through fractional interval representation rather than fixed-length or variable-length codes for individual symbols. This method, developed in the late 1970s, maps the message to a subinterval whose length is proportional to the probability of the message, allowing the entropy of the source to be approached closely without the overhead of codeword boundaries. The encoding process begins by initializing the current interval as [low, high) = [0, 1). For each symbol in the message, the interval is subdivided based on the symbol's probability distribution, and the subinterval corresponding to the emitted symbol becomes the new current interval. Specifically, with range = high − low, if the cumulative probability of all symbols preceding the current symbol is C and the probability of the current symbol is p, the updates (applied in order, so that the second uses the updated low) are:
\text{low} \leftarrow \text{low} + \text{range} \times C
\text{high} \leftarrow \text{low} + \text{range} \times p
These steps narrow the interval successively, with the final interval's length equaling the probability of the entire message under the model; any value within this interval (typically the midpoint or low endpoint, quantized to sufficient precision) serves as the code.[6] The decoding process reverses this by starting with the received code value v in [0,1) and, for each step, identifying the symbol whose subinterval contains v, then rescaling v to the relative position within that subinterval:
v \leftarrow \frac{v - C}{p}
and repeating until the message length is reached.[6] To mitigate precision loss in finite-precision implementations and prevent underflow as the interval shrinks, renormalization is applied periodically. This involves scaling the interval by shifting out the shared leading bits of low and high when the range (high − low) falls below a threshold, such as 1/2^8 = 1/256 for 8-bit precision, effectively multiplying low, high, and v by 2^k, where k is the number of shifts needed to restore adequate range, while outputting or buffering the shifted bits as part of the code stream.[6] This adaptive rescaling ensures numerical stability without altering the relative interval proportions. For illustration, consider encoding the sequence "ABA" assuming a static model with P(A) = 0.8 and P(B) = 0.2, so the cumulative probability C is 0 for A and 0.8 for B. Initial interval: [0, 1).
  • First A: low = 0 + (1-0) × 0 = 0, high = 0 + (1-0) × 0.8 = 0.8. New interval: [0, 0.8).
  • Second B: low = 0 + (0.8-0) × 0.8 = 0.64, high = 0.64 + (0.8-0) × 0.2 = 0.8. New interval: [0.64, 0.8).
  • Third A: low = 0.64 + (0.8-0.64) × 0 = 0.64, high = 0.64 + (0.8-0.64) × 0.8 = 0.768. Final interval: [0.64, 0.768).
A code value such as 0.7 in this interval uniquely represents "ABA".[6] Binary arithmetic coding extends these principles to binary symbols for efficient bitstream integration.
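The trace above can be reproduced in a few lines (plain floats, so values match the hand calculation only up to rounding):

```python
# Symbol -> (cumulative probability C, probability p) for the static model
PROBS = {"A": (0.0, 0.8), "B": (0.8, 0.2)}

def encode(message):
    low, high = 0.0, 1.0
    for sym in message:
        c, p = PROBS[sym]
        rng = high - low
        low = low + rng * c       # lower edge of the symbol's sub-range
        high = low + rng * p      # upper edge, using the updated low
    return low, high

low, high = encode("ABA")         # interval approximately [0.64, 0.768)
```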

Binary Arithmetic Coding

Binary arithmetic coding (BAC) adapts the principles of arithmetic coding to a binary alphabet consisting solely of symbols 0 and 1, minimizing the overhead inherent in general arithmetic coding for larger alphabets. This specialization is particularly advantageous for encoding binary data, such as the binarized representations used in many compression schemes, as it allows the coder to approach the theoretical entropy limit more closely than fixed-length or Huffman coding methods, which are constrained by integer bit lengths and cannot fully exploit fractional bit efficiencies for skewed probabilities. By processing one bit at a time, BAC achieves higher compression ratios for sources with uneven symbol probabilities, making it a foundational component in efficient entropy coding systems.[4] The core mechanism of BAC involves iteratively subdividing a current coding interval [low, high) to represent the encoded bit sequence, where the range is defined as range = high - low. For a bit with fixed probability p of being 1 (and thus 1 - p for 0), the subdivision updates the interval based on the observed bit: if the bit is 0, the new low remains unchanged and high is set to low + range × (1 - p); if the bit is 1, low is updated to low + range × (1 - p) and high remains unchanged. These updates narrow the interval progressively, with the final interval used to generate the code stream by selecting a value within it and outputting sufficient bits to distinguish it from adjacent intervals. Practical implementations of BAC employ multiplication-free integer arithmetic to enhance computational efficiency, representing low and range as fixed-point integers (typically 32 bits or less) and approximating multiplications via bit shifts and precomputed lookup tables. 
Renormalization occurs when range falls below a threshold (e.g., 0.25 in scaled units), shifting bits from low to the output stream and rescaling range by left-shifting (effectively multiplying by powers of 2) until it exceeds the threshold. This approach avoids expensive floating-point or multiplication operations, relying instead on simple additions, shifts, and table accesses for interval rescaling and subdivision, enabling high-throughput encoding in both hardware and software.[7][4] A prominent variant, the MQ-coder, integrates state-based probability estimation into BAC using a finite-state machine with 47 probability states, each linked to a quantized less probable symbol (LPS) probability value from a lookup table for rapid access during updates. While this section focuses on static probabilities, the MQ-coder's design supports adaptive extensions by transitioning states based on observed symbols, though here p remains fixed.[4] To illustrate BAC with a static p = 0.3 for bit 1 (and thus 1 − p = 0.7 for 0), consider encoding the binary string "1011" starting with low = 0 and high = 1 (range = 1):
  • Bit 1 ('1'): low = 0 + 1 × 0.7 = 0.7, high = 1, range = 0.3
  • Bit 2 ('0'): low = 0.7, high = 0.7 + 0.3 × 0.7 = 0.91, range = 0.21
  • Bit 3 ('1'): low = 0.7 + 0.21 × 0.7 = 0.847, high = 0.91, range = 0.063
  • Bit 4 ('1'): low = 0.847 + 0.063 × 0.7 = 0.8911, high = 0.91, range = 0.0189
The final interval [0.8911, 0.91) can then be mapped to a binary code string (e.g., via renormalization steps outputting bits like 1111 in integer approximation), uniquely identifying the sequence.
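A minimal sketch of this fixed-probability coder, using the same convention that '0' takes the lower 1 − p share of the interval:

```python
def bac_encode(bits, p_one=0.3):
    """Static binary arithmetic coding over a list of 0/1 bits."""
    low, high = 0.0, 1.0
    for b in bits:
        rng = high - low
        split = low + rng * (1 - p_one)  # boundary between '0' and '1' ranges
        if b == 0:
            high = split                 # '0' keeps the lower sub-range
        else:
            low = split                  # '1' keeps the upper sub-range
    return low, high

low, high = bac_encode([1, 0, 1, 1])     # approximately [0.8911, 0.91)
```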

Context Adaptation Concepts

Context adaptation in binary arithmetic coding refers to the process of dynamically selecting or modifying probability estimates for binary symbols based on contextual information, such as previously encoded neighboring symbols or syntactic elements within the data stream. This approach enhances compression performance by tailoring probabilities to the local statistics of non-stationary sources, where symbol probabilities vary depending on surrounding data rather than remaining uniform.[4] Context models serve as the core structure for this adaptation, typically implemented as finite-state machines or probability tables that associate each state with an estimate of the likelihood for a binary symbol being '0' or '1' given the current context. These models enable the coder to switch between different probability distributions as the context changes, capturing dependencies that fixed-probability schemes cannot. Adaptation mechanisms within these models often rely on frequency counting to tally symbol occurrences specific to each context, combined with techniques like exponential smoothing to update probabilities over time. For instance, the updated probability can be computed recursively as
p_{t+1} = (1 - \alpha)\, p_t + \alpha \cdot o,
where p_t is the prior probability estimate, o is the observed binary outcome (0 or 1), and α (between 0 and 1) controls the adaptation rate, emphasizing recent observations while retaining historical influence.[4][8] Contexts in adaptive binary arithmetic coding are classified as stationary, which use fixed probability estimates unchanging throughout the process, or adaptive, which evolve by incorporating statistics from recently processed symbols to track shifting distributions. Emphasis is placed on Markovian contexts of order 1 to 3, where the probability of a symbol depends solely on the preceding 1 to 3 symbols, balancing modeling accuracy with computational efficiency and avoiding excessive memory demands. In practical implementations like the Context-based Adaptive Binary Arithmetic Coding (CABAC) used in the H.264/AVC video compression standard, up to 399 distinct context models are predefined for diverse syntax elements, such as motion vector differences and transform coefficient levels, allowing fine-grained adaptation to video-specific patterns.[4] The overall adaptation workflow can be conceptualized through a flowchart: it begins with context selection using available prior symbols or structural cues to identify the appropriate model; proceeds to probability lookup retrieving the current estimate from the model's state; continues with encoding the binary decision under that probability; and concludes with model update adjusting the state based on the actual symbol to refine future estimates. This iterative cycle ensures ongoing alignment between modeled probabilities and the evolving data source.[4]
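The smoothing update is a one-liner; the α used below is an assumed adaptation rate:

```python
def update_probability(p_t, outcome, alpha=0.05):
    """Exponential smoothing: p_{t+1} = (1 - alpha) * p_t + alpha * outcome."""
    return (1 - alpha) * p_t + alpha * outcome

# A run of '1' observations pulls the estimate up toward 1
p = 0.5
for _ in range(10):
    p = update_probability(p, 1)
```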

Encoding Process

Symbol Binarization

Symbol binarization is a preprocessing step in context-adaptive binary arithmetic coding (CABAC) that maps non-binary input symbols, such as integers representing transform coefficients or motion vector differences, into a sequence of binary decisions known as bins. This conversion is essential because CABAC operates exclusively on binary symbols, reducing the alphabet size to enable efficient arithmetic coding while allowing for prefix and suffix codes that optimize the representation of variable-length data.[9] Common binarization techniques in CABAC include unary binarization, which is particularly effective for encoding small integer values like run-lengths, and exponential-Golomb (EG) codes for larger integers. Unary binarization represents an unsigned integer x as a sequence of x '1' bits followed by a terminating '0' bit, giving a bin string of length x + 1. For instance, the value x = 5 is binarized as the unary code "111110". Variants such as truncated unary (TU) limit the unary prefix to a maximum length c, omitting the terminator when x = c, while EG codes combine a unary prefix with a binary suffix for the remainder x − 2^k, where k is the order. Context-adaptive signed variants, like signed exponential-Golomb (SEG) codes, append a sign bit to handle positive and negative values, such as in motion vector differences.[9] In CABAC as specified in the H.264/AVC standard, binarization is tailored to specific syntax elements for optimal compression. Transform coefficients are binarized using a combination of flags and codes: the significance flag indicates whether a coefficient is nonzero, modeled with contexts based on position in the scanning order; the greater-than-1 flag signals if the absolute value exceeds 1, followed by unsigned exponential-Golomb (UEG0) for the absolute level, which adapts based on the number of trailing ones and previously coded levels.
Motion vector differences employ an adaptive unary/exponential-Golomb scheme, such as a truncated unary prefix up to length 9 combined with an EG3 suffix, plus a sign bit for the component value. Each bin in these binarizations is assigned to one of several context models—ranging from 0 to 398 indices—to enable adaptive probability estimation during encoding.[9] While binarization simplifies the input to a binary stream suitable for arithmetic coding, it inherently introduces multiple bins per symbol, each potentially using distinct contexts to capture local statistics and improve adaptation. This bin-level granularity follows the arithmetic encoding step, where the binarized sequence is processed to update coding intervals.[9]
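The three binarizations can be sketched as follows, using the '1…10' prefix convention described above; the function names are illustrative, and the codes show the structural form rather than byte-exact spec tables.

```python
def unary(x):
    """Unary: x '1' bits followed by a terminating '0'."""
    return "1" * x + "0"

def truncated_unary(x, cmax):
    """Truncated unary: the terminator is dropped when x reaches the cap."""
    return "1" * x if x == cmax else "1" * x + "0"

def exp_golomb(x, k=0):
    """k-th order Exp-Golomb: unary prefix selects a bucket of size 2^(k+m),
    then k+m suffix bits give the offset within that bucket."""
    m = 0
    while x >= (1 << (k + m)):       # walk through buckets of doubling size
        x -= 1 << (k + m)
        m += 1
    prefix = "1" * m + "0"
    suffix = format(x, "b").zfill(k + m) if (k + m) else ""
    return prefix + suffix
```

For example, a TU prefix with cmax = 9 plus an EG3 suffix is the combination described above for motion vector differences.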

Context Modeling

In context-adaptive binary arithmetic coding (CABAC), context model selection is performed using a predefined set of models, totaling 399 in the H.264/AVC standard, indexed by syntax element type, position within the element, and contextual information such as neighboring data.[4] These models are organized into tables that categorize syntax elements like macroblock types, motion vectors, and residual data, with context indices derived from limited templates, often involving up to two neighboring elements to capture spatial dependencies, such as for intra-prediction modes.[4] For instance, the context index increment for a macroblock skip flag is the sum of the skip flag values of the left (A) and top (B) neighboring macroblocks (each 0 or 1), yielding ctxIdxInc from 0 to 2, which selects context models such as 11 to 13 for P slices.[4] During the encoding process, for each binary symbol (bin) produced by binarization, a specific context model m is selected, and its current most probable symbol (MPS) probability estimate p_m is used to perform the arithmetic encoding.[4] Following encoding, the model is adapted by updating its state based on the encoded symbol: if the symbol matches the MPS, the state index increments by 1 (up to a maximum of 62); if it is the least probable symbol (LPS), the state index is decremented according to a predefined table TransIdxLPS, and the MPS toggles when an LPS occurs in the equi-probable state (index 0).[4] This adaptation is governed by the state transition function new_state = f(current_state, symbol), implemented via two lookup tables (TransIdxMPS for MPS transitions and TransIdxLPS for LPS transitions), operating over a compact set of 64 probability states (0 to 63) to balance adaptation speed and precision.[4] Although explicit frequency counters like count_0[m] or count_1[m] are not directly maintained, the state transitions implicitly track symbol frequencies to refine p_m over time.[4] Context models are reset at slice boundaries to ensure independence between slices, initialized with probability values dependent on the slice quantization parameter (SliceQP) and slice type, using predefined tables for initialization variants 0 to 2.[4] Most models are adaptive, updating dynamically during encoding, whereas a subset remains fixed for elements like slice headers to avoid overhead from unnecessary adaptation.[4] Spatial adaptation is achieved by deriving context indices from neighboring blocks; for example, in encoding a transform coefficient's significance flag (indicating whether the coefficient is nonzero), the context index (ranging from 0 to 3) is selected based on the significance of the left and above neighboring coefficients within the transform block.[4]
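The 64-state adaptation can be sketched with a simplified transition rule; the real TransIdxMPS/TransIdxLPS tables are defined in the standard, so the fixed downward jump on an LPS here is an illustrative assumption.

```python
def update_state(state, mps, bit):
    """Return (new_state, new_mps) after coding one bin.

    state: probability state index, 0 (equi-probable) .. 62 (strong MPS)
    mps:   current most probable symbol value (0 or 1)
    bit:   the bin value actually coded
    """
    if bit == mps:
        return min(state + 1, 62), mps   # MPS observed: move toward 62
    if state == 0:
        return 0, 1 - mps                # LPS in the equi-probable state: flip MPS
    return max(state - 3, 0), mps        # LPS observed: fall back (approximation)
```

Repeated MPS observations climb toward state 62 (high confidence), while an LPS knocks the state back several steps, mirroring the faster downward transitions of the real table.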

Arithmetic Encoding Steps

The arithmetic encoding steps in context-adaptive binary arithmetic coding (CABAC) form the core mechanism for compressing binary symbols (bins) by mapping them to subintervals of a unit interval based on their estimated probabilities, producing a compact bitstream through iterative range refinement and renormalization. This process operates on a current interval defined by a lower bound L (initially 0) and a range R (initially 510 in the 9-bit integer representation; renormalization keeps R within [256, 510]). For each bin provided from the binarization and context modeling stages, the interval is subdivided according to the bin's value relative to the most probable symbol (MPS) and the estimated probability of the least probable symbol (LPS), p_LPS, retrieved from the selected context model. The subdivision ensures that the MPS branch occupies the larger portion of the interval (probability 1 − p_LPS), while the LPS branch occupies the smaller portion (p_LPS).[4] The step-by-step encoding flow for a single bin proceeds as follows. First, the context model provides the current probability state index σ (0 to 63), from which p_LPS is derived, and the MPS value (0 or 1). The bin value is compared to the MPS: if it matches, the MPS branch is selected; otherwise, the LPS branch is taken. The range is quantized to an index q = (R >> 6) & 3 to select from a precomputed lookup table TabRangeLPS[σ][q], which approximates r_LPS = R × p_LPS using 8-bit fixed-point values for multiplication-free operation. The updates are then:
  • If the bin equals MPS (MPS branch):
R' = R - r_{\text{LPS}}, \quad L' = L
  • If the bin differs from MPS (LPS branch):
R' = r_{\text{LPS}}, \quad L' = L + (R - r_{\text{LPS}})
These equations ensure the new interval [L', L' + R') precisely represents the conditional probability of the bin given the context. The probability state σ is then updated adaptively: for the MPS branch, σ increments toward 62 (indicating higher MPS probability); for the LPS branch, σ transitions via a lookup table transIdxLPS[σ] to reflect the observation. Additionally, if σ = 0 (the equi-probable state), the MPS value toggles after an LPS occurrence. This process repeats for each bin in the sequence.[4] CABAC employs 32-bit fixed-point integer arithmetic to implement these updates efficiently, avoiding floating-point multiplications by relying on the TabRangeLPS table (64 states × 4 range quanta = 256 entries), which stores precomputed scaled values ensuring r_LPS aligns with the desired p_LPS approximation α^σ, where α ≈ 0.949. The low L and range R are maintained as unsigned 32-bit integers, with L < 2^32 and R ≤ 2^16 typically after renormalization, providing sufficient precision for video coding without overflow in intermediate computations. Scaling factors are chosen such that renormalization maintains R ≥ 256 post-update.[4] Following the interval update, renormalization scales the interval to prevent precision loss when R becomes small. This involves a loop that repeatedly left-shifts R and L by 1 bit until R ≥ 256:
R \leftarrow 2R, \quad L \leftarrow 2L \bmod 2^{32}
Each shift outputs determined bits from the most significant bits (MSBs) of L, as they are now fixed in the bitstream. Specifically, if the MSB of L is 0 before shifting (indicating the interval lies below the midpoint), a '0' bit is output; if it is 1, a '1' bit is output and L is reduced by the modulus to keep it normalized. To handle cases where the interval straddles a power-of-2 boundary (potential underflow), an outstanding-bits counter E tracks deferred outputs: while the interval straddles the midpoint no bit can be decided yet, so E is incremented; once a '0' is eventually output it is followed by E '1's (and a '1' by E '0's), after which E is reset. This carry-over control ensures no information loss and resolves any remaining ambiguity at termination by flushing the outstanding and buffered bits. The renormalization loop can require up to 8 shifts per bin in efficient hardware/software implementations.[4] For illustration, consider encoding the bin string "101" assuming MPS = 0 for all contexts, initial L = 0 and R = 1024 (scaled for simplicity), and p_LPS values of 0.1, 0.2, 0.1 retrieved from the contexts. Renormalization is triggered whenever R falls below 256; tracing shows:
  • Bin '1' (LPS, p_LPS = 0.1): r_LPS ≈ 102, L' = 0 + 1024 − 102 = 922, R' = 102
  • Renormalize: shift 3 times (R = 816 > 256), outputting bits based on the MSBs of the shifted L and the E counter.
  • Bin '0' (MPS, p_LPS = 0.2): r_LPS ≈ 163, L'' = 922 (unchanged; shifts not shown), R'' = 816 − 163 = 653
  • Bin '1' (LPS, p_LPS = 0.1): r_LPS ≈ 65, L''' = 922 + 653 − 65 = 1510, R''' = 65
  • Renormalize: shift 4 times (R = 1040 > 256), outputting additional bits based on the MSBs and E handling.
The final subinterval [1510 mod 4096, 1510 + 1040) encodes the sequence, with the total output approximating −Σ log2 p(bin_i) bits (about 7 bits here versus 3 uncoded, reflecting that both LPS bins were unlikely under the model). This example demonstrates interval narrowing and bit generation, though actual outputs depend on the precise scaling and the handling of E.[4]
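The bullet trace can be checked numerically; an idealized r_LPS = floor(R × p) stands in for the TabRangeLPS lookup, and L is tracked without the renormalization shifts, as in the bullets.

```python
def lps_width(r, p):
    """Idealized r_LPS; the real coder uses the TabRangeLPS table instead."""
    return int(r * p)

L, R = 0, 1024
# Bin '1' is the LPS with p_LPS = 0.1
r = lps_width(R, 0.1); L, R = L + R - r, r   # L = 922, R = 102
R *= 8                                       # three renormalizing shifts
# Bin '0' is the MPS with p_LPS = 0.2
r = lps_width(R, 0.2); R = R - r             # R = 653
# Bin '1' is the LPS with p_LPS = 0.1
r = lps_width(R, 0.1); L, R = L + R - r, r   # L = 1510, R = 65
R *= 16                                      # four renormalizing shifts
```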

Decoding Process

Arithmetic Decoding Steps

The arithmetic decoding process in context-adaptive binary arithmetic coding (CABAC) reverses the encoding operations to extract binary symbols (bins) from the compressed bitstream, using the same context models to determine interval subdivisions. This engine operates with fixed-point integer arithmetic for efficiency, maintaining three key state variables: the lower bound low of the current coding interval, the interval width range, and the code value c representing the bitstream position within the interval.[4]

Initialization occurs at the start of each slice or coding unit, setting low = 0 and range = 256 to establish an 8-bit precision interval spanning the full possible range (0 to 255). The code value c is initialized by reading the first 8 bits from the bitstream into the lower bits of c, ensuring 0 ≤ c < 256 to align within the initial interval. These initial values ensure the decoder begins with a normalized state, mirroring the encoder's setup and enabling precise interval tracking from the outset.[4]

To decode a single bin, the decoder first selects the appropriate context model to obtain the most probable symbol value valMPS (0 or 1) and the probability estimate $ p $ of the least probable symbol (LPS). The cumulative probability width for the MPS subinterval is computed as $ \text{cum_prob} = \text{range} \times (1 - p) $. The decoder then determines the offset offset = c - low and compares it against $ \text{cum_prob} $: if offset < cum_prob, the bin is decoded as valMPS (MPS), and the interval is updated by setting range = cum_prob with low unchanged; otherwise, the bin is decoded as 1 - valMPS (LPS), low += cum_prob, and range *= p. This decision identifies which subinterval contains the code value, narrowing the interval while preserving the bitstream's information.
In practice, multiplications are approximated using precomputed tables (e.g., TabRangeLPS) indexed by quantized range and probability state to avoid floating-point operations.[4][10]

Renormalization maintains numerical precision during decoding by rescaling the interval when range drops below 256, preventing underflow in the fixed-point representation. While range < 256, the decoder reads the next bits from the bitstream (typically byte-aligned for efficiency), then rescales both low and range by left-shifting them (multiplying by 2 per bit or 256 per byte) and updates c accordingly, with carry handling if the shifted low would exceed the register size. This process, repeated as needed, effectively appends new bits to c in an MSB-aligned manner, ensuring the interval remains normalized without losing precision. The renormalization step is symmetric to the encoding counterpart, where bits are output instead of input.[4]

A key equation in this rescaling is the update for the code value: $ c = ((c - \text{low}) \ll n) + \text{new_bits} + (\text{low} \ll n) $, where n is the number of bits read (e.g., 8 for a byte), applied after shifting low and range left by n bits (with carry handling if low's MSB is set). This formula can equivalently be expressed as $ c = (c \ll n) + \text{new_bits} $ in cases without carry propagation from low. It isolates the relative position before incorporating new bitstream data, avoiding overflow and maintaining decoder synchronization with the encoded stream.[4]

After decoding the last bin, final bits are flushed to resolve any pending renormalization or carry bits in low. The remaining bits in low are output directly if a carry propagates, followed by up to two disambiguating bits (typically '1's) to ensure the decoder consumes exactly the bitstream length without over-reading.
This termination prevents emulation of start codes and guarantees exact decoding, using a fixed, non-adaptive probability state (index 63, reserved for termination with a small fixed LPS interval) for the final interval. Context models are reused from the encoding process to match these probabilities without additional updates during flushing.[4]

For illustration, consider decoding two bins from a bitstream assuming a fixed context with valMPS = 0, LPS probability $ p = 0.2 $ for simplicity, starting with low = 0, range = 256, and c = 200 (within [0, 256)). For the first bin: cum_prob = 256 × 0.8 ≈ 205; offset = 200 - 0 = 200 < 205, so bin = 0 (MPS), range = 205, low = 0. No renormalization. For the second bin: cum_prob = 205 × 0.8 ≈ 164; offset = 200 >= 164, so bin = 1 (LPS), low = 0 + 164 = 164, range = 205 × 0.2 ≈ 41, c remains 200 (unchanged). The new offset = 200 - 164 = 36 < 41, consistent with the interval. Renormalization is needed (41 < 256): shift low, range, and c (e.g., for n = 8, $ c = ((200 - 164) \ll 8) + \text{new_bits} + (164 \ll 8) $), while handling carry if applicable. Subsequent steps refine the interval further. This example demonstrates how bin decisions and state updates progressively narrow the interval to recover the original bin sequence (here "01..."), with c unchanged during subdivision and correct renormalization handling.[4]
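The single-bin decision just described can be sketched in a few lines. This is a simplified model with real-valued probabilities and hypothetical names; integer truncation gives cum_prob values of 204 and 163 where the worked example rounds to 205 and 164, so the traced state differs by one:

```python
# Simplified single-bin decoder mirroring the described comparison of the
# code value's offset against the MPS subinterval width (cum_prob).
# Real-valued probabilities and names are illustrative, not normative.

def decode_bin(state, p_lps, val_mps=0):
    """state = (low, rng, c); returns (bin, new_state); renormalization omitted."""
    low, rng, c = state
    cum_prob = int(rng * (1 - p_lps))   # MPS subinterval width
    offset = c - low
    if offset < cum_prob:               # code value falls in the MPS part
        return val_mps, (low, cum_prob, c)
    # LPS part: raise low by cum_prob, keep the remaining width
    return 1 - val_mps, (low + cum_prob, rng - cum_prob, c)

# Worked trace: low = 0, range = 256, c = 200, valMPS = 0, p_LPS = 0.2
state = (0, 256, 200)
b1, state = decode_bin(state, 0.2)   # cum_prob = 204; 200 < 204 -> MPS (0)
b2, state = decode_bin(state, 0.2)   # cum_prob = 163; 200 >= 163 -> LPS (1)
print(b1, b2, state)                 # state -> (163, 41, 200)
```

Note that the LPS update rng - cum_prob equals range × p up to truncation, matching the prose description.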

Context Modeling Reuse

In context-adaptive binary arithmetic coding (CABAC), synchronization between the encoder and decoder is achieved through the symmetric reuse of context models, ensuring that both sides maintain identical probability estimates without transmitting any side information for model states.[4] The encoder and decoder begin with the same initial context models and apply identical deterministic update rules after processing each binary symbol (bin), allowing the models to evolve in lockstep based solely on the sequence of encoded or decoded bins.[4] This reuse mechanism confines adaptation to the current slice, preventing desynchronization across independent processing units.[4]

During decoding, after a bin is recovered through arithmetic decoding—either as the most probable symbol (MPS) or least probable symbol (LPS)—the corresponding context model is updated using the same rules as in encoding to reflect the observed symbol.[4] Each context model is represented by a probability state index (ranging from 0 to 63, which combined with the MPS value yields the coder's full set of states) and the MPS value (0 or 1).[4] If the decoded bin matches the MPS, the state index increments by 1 (saturating at 62 for most probable cases); if it is the LPS, the state index transitions according to a predefined table (TransIdxLPS), which decrements the index based on the current probability estimate, with a special toggle of the MPS value when an LPS occurs in the equi-probable state (index 0, where p_LPS = 0.5).[4] These updates effectively track frequency-like adaptations, such as increasing the count for the observed symbol, but are implemented via efficient table lookups to minimize computation.[4]

Context models are initialized at the start of each slice from standard tables derived via linear regression on empirical data, parameterized by the slice quantization parameter (SliceQP) and a slice-type index (0-2) signaled in the slice header.[4] This initialization ensures both encoder and decoder load equivalent starting states, after which adaptive evolution mirrors the encoding path through bin-by-bin updates.[4] No explicit model states are transmitted, as desynchronization is avoided by the deterministic nature of the updates, which depend only on the bitstream symbols processed within the slice.[4] For edge cases, models are fully reset at slice boundaries to support independent decoding of slices, limiting backward adaptation to the duration of a single slice and enabling parallel processing.[4] Fixed contexts, such as those for certain syntax elements with uniform probabilities (e.g., equi-probable bins), remain unchanged throughout, bypassing adaptation to preserve efficiency.[4]

As an illustrative example, consider a context model starting at state index 20 with MPS = 0 (implying a low probability for bin '1'). If the first decoded bin is 0 (MPS), the state increments to 21; if the second decoded bin is 1 (now LPS at the updated state), the state transitions per TransIdxLPS (e.g., to 19), but in a sequence of two MPS '0' bins, it would increment to 22—mirroring the exact path taken during encoding without any additional data exchange.[4]
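The lockstep update rule can be sketched as below. The standard defines TransIdxLPS as a fixed 64-entry normative table; the fall-back function here is a simplified stand-in, and the function names are illustrative:

```python
# Symmetric context-model update applied identically by encoder and decoder.
# The stand-in transition below is illustrative, not the normative table.

def trans_idx_lps(idx):
    # hypothetical stand-in for TransIdxLPS: larger fall-back at high states
    return max(0, idx - (1 if idx < 32 else 2))

def update_context(idx, mps, decoded_bin):
    """Return the (state index, MPS) pair after observing one bin."""
    if decoded_bin == mps:
        return min(idx + 1, 62), mps    # MPS path: saturate below state 63
    if idx == 0:
        return 0, 1 - mps               # equi-probable state: toggle MPS
    return trans_idx_lps(idx), mps      # LPS path: fall back

# Mirror the example: start at state index 20 with MPS = 0
ctx = (20, 0)
ctx = update_context(*ctx, decoded_bin=0)   # MPS observed -> (21, 0)
ctx = update_context(*ctx, decoded_bin=1)   # LPS observed -> falls back
print(ctx)
```

Because both sides run exactly this function on the same bin sequence, no model state ever needs to be transmitted.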

Symbol Debinarization

In the decoding process of context-adaptive binary arithmetic coding (CABAC), symbol debinarization serves as the final step to reconstruct the original symbols from the sequence of binary decisions (bins) obtained through arithmetic decoding. This inverse mapping follows the identical binarization rules employed during encoding, ensuring that the bitstream's syntax structure guides the parsing of bins into symbols such as transform coefficients, motion vectors, or prediction modes. By accumulating bins according to predefined prefix codes or fixed-length representations, the decoder reverses the transformation while leveraging the same context models to interpret the bin stream accurately.[4]

The debinarization process begins by parsing the bin sequence in a manner that mirrors the forward binarization, consuming bins until a terminator is encountered or a fixed length is reached. For unary binarization, which represents a non-negative integer k as k consecutive '1' bins followed by a '0' terminator, the inverse operation counts the number of leading '1's until the '0' to recover k. Truncated unary codes, used for bounded values up to a cutoff C, omit the terminator for the maximum value, allowing the decoder to infer C directly from C '1's without a following '0'. Exponential-Golomb (EG) codes, common for sparse data like coefficient levels, divide the bin string into a unary prefix of length q (representing the quotient) and a binary suffix of q bits (the remainder); the unsigned value is then computed as 2^q + suffix − 1, with an optional additional bin indicating the sign for signed symbols.
These mappings ensure efficient recovery for varying symbol entropies, with the choice of code (e.g., EG0 for levels, EG3 for motion components) dictated by the syntax element.[4] Context considerations during debinarization align precisely with those used in encoding, where each bin's probability estimate derives from an adaptive model indexed by previously decoded syntax elements. However, the actual reconstruction of the symbol relies on the decoder's knowledge of the video syntax, such as block type or neighboring coefficients, to select the appropriate binarization scheme and context indices (e.g., contexts 0-3 for unary prefix bins in transform levels). This synchronization minimizes mismatches, as the bins are decoded using the same context-adaptive probabilities that were updated during encoding. For instance, in decoding a transform coefficient absolute level using unary/EG0, the first few '1's might use context index 0 (for significant coefficients), while later bins shift to higher indices based on magnitude.[4] A practical example illustrates this: consider the bin string "111110" for a unary-coded coefficient, where the first five bins are '1' (each decoded with progressively higher context indices, say 0 through 4, based on prior levels) followed by a '0' terminator. The decoder counts five '1's to reconstruct the value 5, updating the relevant contexts afterward to reflect the symbol's magnitude. 
For an exponential-Golomb case, a prefix of three '1's followed by '0' (q = 3) and a 3-bit suffix "101" (5 in decimal) yields value = 2^3 + 5 − 1 = 12, potentially followed by a sign bin '1' for negative.[4] Errors in decoding early bins can propagate through the debinarization, potentially corrupting the entire symbol (e.g., miscounting a unary prefix alters the value significantly); however, CABAC's design, with its context adaptation and separate models for different bin positions, limits the impact by localizing error effects to similar future symbols rather than the whole stream.[4]
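The inverse mappings above can be sketched directly. Plain Python iterators stand in for the context-coded bins pulled from the arithmetic decoder, and the function names are illustrative:

```python
# Inverse binarization sketches: unary, truncated unary, and k-th order
# exponential-Golomb (EGk) decoding from a stream of bins.

def read_unary(bins):
    """Count leading '1's until the '0' terminator."""
    k = 0
    while next(bins) == 1:
        k += 1
    return k

def read_truncated_unary(bins, cutoff):
    """Like unary, but the terminator is omitted at the cutoff."""
    k = 0
    while k < cutoff and next(bins) == 1:
        k += 1
    return k

def read_eg(bins, k=0):
    """EGk: unary prefix of length q, then a (q + k)-bit binary suffix."""
    q = 0
    while next(bins) == 1:
        q += 1
    suffix = 0
    for _ in range(q + k):
        suffix = (suffix << 1) | next(bins)
    return (1 << (q + k)) - (1 << k) + suffix   # 2^q + suffix - 1 for k = 0

print(read_unary(iter([1, 1, 1, 1, 1, 0])))        # "111110" -> 5
print(read_truncated_unary(iter([1, 1, 1]), 3))    # cutoff 3 -> 3
print(read_eg(iter([1, 1, 1, 0, 1, 0, 1])))        # q = 3, suffix 5 -> 12
```

The EG0 call reproduces the worked example: prefix "1110" and suffix "101" give 2^3 + 5 − 1 = 12.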

Implementation Details

Probability Estimation

In context-adaptive binary arithmetic coding (CABAC), probability estimation for each binary symbol is managed through a finite-state machine comprising 128 distinct states, indexed by a 7-bit value ranging from 0 to 127, which allows for efficient adaptation without explicit frequency counting.[4] These states encode both the estimated probability of the least probable symbol (LPS) and the identity of the most probable symbol (MPS), with even indices (0 to 126 in steps of 2) corresponding to MPS = 0 and odd indices to MPS = 1, effectively yielding 64 unique LPS probability estimates that range from approximately 0.5 to 0.01875.[4] The LPS probabilities are predefined in a lookup table and approximated to support multiplication-free arithmetic operations, where the interval size for the LPS is determined by scaling the current range using quantized values from another table, TabRangeLPS[state][quarter_index], to maintain precision with 8-bit range normalization.[4] This table-driven representation enables faster adaptation than the full-statistics approach of the MQ-coder, which relies on incremental frequency counts and periodic renormalization.[4]

The 64 LPS probabilities are derived recursively to mimic an exponential moving average with an adaptation factor α ≈ 0.9492, starting from p_LPS(0) = 0.5 and following p_LPS(i) = α · p_LPS(i-1), giving decreasing probabilities as the state index increases toward higher confidence in the MPS.[4] Internally, these probabilities are scaled and quantized for fixed-point arithmetic; for instance, the most probable probability can be represented as mpp[s] ≈ Q[s]/512 in 9-bit precision, where Q[s] is the table entry for state s (0-63), and the least probable probability lpp[s] = 512 - mpp[s], though the coding process uses the complementary interval directly without computing explicit probabilities.[4]

At the beginning of each coding slice, probability estimation is initialized using standard-defined tables that assign an initial state index and MPS value to each context model, tailored to the slice type (I, P, or B) and quantization parameter (QP). The initial state is computed via a linear approximation of the form state ≈ clip((μ · QP + offset) / scale, 1, 126), where the μ and offset parameters are predefined per context category, with separate tables (0-2) signaled in the slice header to reflect content-specific biases, such as assuming MPS = 0 for many intra-prediction contexts.[4] This QP-dependent initialization ensures quick convergence to typical probabilities for the video content, reducing early coding inefficiency.[4]

Following the coding of each binary symbol, the probability state is updated adaptively to reflect the observed outcome. If the symbol equals the current MPS, the probability state index increments by 1 (saturating at 62), shifting toward greater MPS likelihood; if the symbol is the LPS, the state index is decremented according to the transition table transIdxLPS[current_index], which maps the 6-bit index (0-63) to a new value approximating the probability shift.[4] The update equations approximated by the table are p_LPS,new ≈ α · p_LPS for MPS occurrences and p_LPS,new ≈ α · p_LPS + (1 − α) for LPS occurrences, with the MPS toggled if an LPS is observed at the equi-probable point (index 0, where p_LPS = 0.5).[4] One non-adaptive state (index 63 of the probability sub-index) with a small fixed LPS interval is reserved for termination-related fixed coding such as the end-of-slice flag.[4] The following excerpt from the standard's context probability table illustrates representative LPS interval sizes (in quarter-range units, approximate p_LPS = value/256) for selected state indices:
State index    TabRangeLPS[state][0]    Approximate p_LPS (value/256)
0              128                      0.500
10             87                       0.340
20             59                       0.230
30             40                       0.156
40             27                       0.105
50             18                       0.070
60             12                       0.047
63             2                        0.008
These values support the arithmetic engine by providing precomputed subinterval sizes, ensuring low-complexity implementation (note: approximation for low p_LPS is rough due to quantization; nominal estimates range to ≈0.01875).[4] The estimated probabilities from this process are applied within the selected context model during both encoding and decoding for consistent adaptation.[4]
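The exponential recursion for the 64 LPS probabilities can be reproduced numerically. The ×256 quantization shown here is a simplification for the lowest range class, so the resulting values only approximate the normative TabRangeLPS, which uses per-class range representatives:

```python
# Reproduce the 64-entry LPS probability ladder from the recursion
# p_LPS(i) = alpha * p_LPS(i-1), p_LPS(0) = 0.5, alpha = (0.01875/0.5)^(1/63).

ALPHA = (0.01875 / 0.5) ** (1 / 63)           # adaptation factor, ~0.9492

p_lps = [0.5 * ALPHA ** i for i in range(64)]
tab_range_lps0 = [round(p * 256) for p in p_lps]   # illustrative quantization

print(round(ALPHA, 4))                         # adaptation factor
print(tab_range_lps0[0], round(p_lps[63], 5))  # state 0 -> 128; state 63 -> 0.01875
```

The endpoints match the ladder's design: state 0 carries p_LPS = 0.5 (interval value 128) and the last state reaches the minimum estimate of about 0.01875.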

Bypass and Termination Modes

In CABAC, the bypass mode provides a simplified encoding mechanism for binary symbols (bins) that are assumed to occur with equal probability, approximately 0.5, thereby bypassing the context modeling and adaptive probability estimation processes used in the regular mode.[4] This mode is particularly employed for data with low entropy or uniform distribution, such as slice headers, sign bits of transform coefficients, and certain suffix bits in exponential-Golomb codes for motion vector differences.[4] By avoiding the multiplication operations and probability updates inherent in regular arithmetic coding, the bypass mode significantly reduces computational complexity for these symbols while maintaining coding efficiency close to that of ideal equiprobable encoding.[4] The encoding steps in bypass mode involve updating the arithmetic coder's interval without adaptive scaling. Specifically, the range is halved for each bin, and the low value is adjusted only if the bin value is 1 by adding the halved range. Renormalization occurs more frequently due to the fixed halving, allowing faster bit output through bit shifting rather than complex arithmetic. The process can be described as follows:
  • If the bin value is 0: range = range / 2, low unchanged.
  • If the bin value is 1: range = range / 2, low = low + range.
  • Renormalization: While range < 256, output the most significant bit of low to the bitstream (propagating any overflow carry as additional bits if needed), shift low left by 1, and double range.[4]
This implementation leverages integer shifts for efficiency, eliminating the need for table lookups or multiplications. For example, consider encoding a 4-bit header "1100" in bypass mode, using scaled-down values with low = 0 and range = 512, so that the most significant bit of low corresponds to the threshold 256. After the first '1': range = 256, low = 256 (no renormalization, since range is not below 256). After the second '1': range = 128, low = 384; renormalization outputs the bit 1 (low ≥ 256) and rescales to low = 256, range = 256. After the third bin '0': range = 128, low = 256; renormalization outputs 1, low = 0, range = 256. After the fourth bin '0': range = 128, low = 0; renormalization outputs 0, low = 0, range = 256. The bitstream thus receives the bits 110, with the trailing '0' produced on flushing, demonstrating how bypass coding of equiprobable bins approximates directly appending them, delayed by the arithmetic interval management (note: actual implementations operate at much higher precision, e.g., range initialized to 32768).[4]

The termination mode finalizes the bitstream at the end of a slice or coding unit, ensuring the decoder can unambiguously recover all encoded symbols without further input.[4] It is invoked after the last regular or bypass bin, followed by flushing the remaining interval using a procedure that outputs a definitive set of bits without encoding any additional symbol.[4] This produces the necessary bits to cover the current low and range, preventing any renormalization on the decoder side and guaranteeing exact bitstream length—typically involving outputting floor(log_2(range)) + 1 bits from the most significant bits of low, followed by zeros equal to the number of pending underflow bits E.[4]
Output bits: floor(log2(range)) + 1 bits from low, followed by E zeros.
This approach contrasts with ongoing regular encoding by providing a definitive endpoint, essential for independent slice decoding in video standards.[4]
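The bypass steps listed above can be sketched as follows. This scaled-down model (interval width 512, no outstanding-bit handling, minimal flush) is illustrative, not the full-precision coder; names are hypothetical:

```python
# Bypass-mode sketch following the halving model: range is halved per bin,
# a '1' takes the upper subinterval, and each renormalization shifts the
# MSB of low out to the bitstream.

def bypass_encode(bins, low=0, rng=512, half=256):
    out = []
    for b in bins:
        rng //= 2                   # equiprobable split (p = 0.5)
        if b == 1:
            low += rng              # '1' selects the upper half
        while rng < half:           # renormalize
            out.append(1 if low >= half else 0)
            low = (low - half if low >= half else low) << 1
            rng <<= 1
    out.append(1 if low >= half else 0)   # minimal flush for this sketch
    return out

print(bypass_encode([1, 1, 0, 0]))
```

On the header "1100" the sketch emits the bits 1, 1, 0 during renormalization and a final 0 on flushing, illustrating how bypass coding of equiprobable bins effectively passes them through.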

Precision and Range Handling

In CABAC, the current coding interval is represented using two key variables: the lower bound low and the interval width range, both implemented as 32-bit unsigned integers. This fixed-point representation implicitly scales the interval to the range [0, 2^{32} - 1], corresponding to the probability space [0, 1). The range is initialized to 2^{15} (0x8000) and maintained above a minimum threshold to ensure sufficient precision for subsequent operations.[11] Renormalization is performed to counteract the precision loss from repeated interval subdivisions, ensuring numerical stability. This process is triggered whenever range < 256, at which point range and low are left-shifted (doubled) in a loop until range ≥ 256, with bits from low output to the bitstream as needed. Unlike basic binary arithmetic coding, which typically employs single-byte (8-bit) shifts, CABAC uses double-byte (16-bit) shifts for renormalization when possible, reducing the average number of loop iterations by half and improving efficiency for real-time video encoding applications.[4][11] Underflow conditions, which arise from potential carry propagation when low + range approaches or exceeds 2^{32}, are managed using an extra bits counter E. Specifically, E is incremented each time low ≥ 2^{31} during interval updates. During renormalization, if E > 0, a '1' bit is output to the bitstream instead of the most significant bit of low, and E is decremented accordingly; otherwise, the bit from low is output normally. This mechanism prevents information loss and maintains the integrity of the encoded bitstream without requiring floating-point arithmetic.[4][11] For rescaling the interval after a symbol decision, CABAC avoids direct multiplication of range by the probability estimate p (or 1 - p) to minimize computational overhead. 
Instead, a two-dimensional lookup table rLPS[state][q] provides the rescaled range for the least probable symbol (LPS) case, where state indexes one of 64 quantized probability states and q is derived from range via bit-masking and shifting (q = (range >> 6) & 3, equivalently (range − 256) >> 6 for the renormalized range in [256, 511], quantizing range into 4 classes). In the most probable symbol (MPS) case, range is updated as range = range - rLPS. The low is adjusted accordingly, with renormalization applied immediately after to restore precision. This table-driven approach ensures hardware and software implementations remain efficient while preserving accuracy.[4][11] In practice, these mechanisms are integrated into the encoding and decoding loops, with low and range updates following the form:
range = (range >> s) × Q[state],  low = (low << s) + offset
where $ s $ is the cumulative shift count from renormalization, $ Q[\text{state}] $ is the precomputed quantized interval width from the table, and offset accounts for the subinterval selection. This integer-only operation supports seamless operation across bypass and regular modes.[4]
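The range quantization and table lookup can be sketched as below. The table row is a hypothetical example for a single probability state, not the normative TabRangeLPS values:

```python
# Range quantization and table-driven LPS interval lookup, replacing the
# multiplication range * p_LPS with an indexed table read.

def quantize_range(rng):
    """Map a renormalized range in [256, 511] to one of 4 classes."""
    q = (rng >> 6) & 3
    assert q == (rng - 256) >> 6    # equivalent form on [256, 511]
    return q

# hypothetical rLPS row for one probability state, one entry per range class
TAB_RANGE_LPS_STATE = [59, 74, 88, 103]

def lps_range(rng):
    return TAB_RANGE_LPS_STATE[quantize_range(rng)]

print(quantize_range(256), quantize_range(320), quantize_range(511))  # 0 1 3
print(lps_range(300))   # range class 0 -> 59
```

The MPS update then follows as range − lps_range(range), keeping the whole subdivision free of multiplications.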

Applications

Role in H.264/AVC

Context-adaptive binary arithmetic coding (CABAC) was standardized in 2003 as part of the H.264/AVC video coding standard, providing an optional entropy coding alternative to context-adaptive variable-length coding (CAVLC) for improved compression efficiency.[4] It is supported in the Main and High profiles of H.264/AVC, where encoders and decoders must implement it alongside CAVLC, whereas the Baseline profile exclusively uses CAVLC due to its lower complexity requirements.[12] This design allows CABAC to be selected for scenarios demanding higher efficiency, such as broadcast and storage applications.[2] In H.264/AVC, CABAC encodes a wide range of syntax elements, including macroblock types (mb_type), sub-macroblock types (sub_mb_type), and macroblock skip flags (mb_skip_flag), using tailored binarization trees and context models to capture local statistical dependencies.[4] For motion vectors, differences are binarized with unary-exponential Golomb codes of order 3 (UEG3), where the unary prefix (up to 9 bins) is coded in regular mode using four dedicated context models based on prediction errors from neighboring blocks, while the exponential-Golomb suffix and sign bit are processed in bypass mode to reduce latency.[4] Transform coefficients, a major portion of the bitstream, are handled through significance maps and level coding, with significant_coeff_flag (sig_flag) and last_significant_coeff_flag (last_sig_flag) using up to 61 context models each for position-dependent adaptation in 4x4 luma blocks, and coeff_abs_level_minus1 employing 49 context models for level magnitudes via truncated unary binarization.[4] A representative example of CABAC's application is the coding of quantized transform coefficients in a 4x4 block, where the significance map requires sig_flag and last_sig_flag for each potential non-zero position (up to 16 sig_flags and 15 last_sig_flags across the block), followed by level and sign coding for actual non-zero coefficients; this 
process generates approximately 15 bins per coefficient on average, combining map contributions and level details for efficient representation of sparse data.[4] CABAC delivers 9–14% bitrate savings over CAVLC at equivalent quality levels, especially in the 30–38 dB PSNR range, making it valuable for high-efficiency profiles.[4] However, its adaptive context updates and renormalization steps increase decoder complexity compared to CAVLC.[2] Additionally, CABAC's inherently serial operation hinders parallelism, complicating real-time implementations on multi-core processors or hardware accelerators.[12] The original H.264/AVC specification incorporated 399 probability context models in CABAC, which were progressively refined through later amendments to optimize adaptation and reduce overhead without altering core functionality.[4]

Role in HEVC/H.265

In the High Efficiency Video Coding (HEVC/H.265) standard finalized in 2013, context-adaptive binary arithmetic coding (CABAC) was retained as the sole entropy coding method, building on its role as an optional high-efficiency alternative in the predecessor H.264/AVC standard.[13] Key enhancements focused on improving throughput and compression for higher resolutions like 4K and 8K, including a reduction in the total number of contexts from over 400 in H.264/AVC to approximately 154, which lowers memory requirements by about three times while maintaining coding efficiency.[14][15] Additionally, support for better parallelism was introduced through tile and slice partitioning, enabling independent processing of picture regions to facilitate multi-core decoding without significant bitrate overhead.[16] Specific adaptations in HEVC CABAC include reduced contexts for transform coefficient coding across larger transform unit (TU) sizes ranging from 4×4 to 32×32, compared to the smaller blocks in H.264/AVC.[13] For partitioning, the quadtree-based coding unit (CU) split flags generate fewer context-coded bins overall, with optimizations to minimize dependencies.[15] Sample adaptive offset (SAO) parameters, which help reduce banding artifacts, are largely bypass-coded—only the merge flag and the first bin of the SAO type index use regular contexts—allowing faster processing of offset values and signs.[13] These changes contribute to HEVC CABAC achieving 20-30% better compression efficiency than H.264/AVC CABAC for ultra-high-definition (UHD) content, alongside optimizations like grouped renormalization steps that enable single instruction, multiple data (SIMD) acceleration for higher throughput.[13][15] A representative example of coefficient coding in HEVC CABAC is the significance map for a 16×16 TU, where non-zero coefficients are signaled using significant_coeff_flag bins scanned in a diagonal order to reduce inter-bin dependencies and support parallelism.[17] For 
larger TUs like 16×16, the map is divided into 4×4 subblocks flagged by coded_sub_block_flag (using four context patterns based on neighbors), with 5-9 contexts applied per bin plane for the flags, depending on position, scan type, and chroma/luma separation.[17][13] Further refinements include dedicated contexts for intra prediction mode signaling, where three most probable modes (MPMs) are derived from neighbors and coded with prev_intra_luma_pred_flag (one context) followed by remainder modes in bypass.[13] For motion compensation, advanced motion vector prediction (AMVP) and merge indices are binarized using truncated unary codes with limited candidates (up to five for merge, signaled via slice header), where the first bin is context-coded and subsequent bins are bypass-coded to balance efficiency and speed.[13] HEVC CABAC also supports 10-bit coding through extended delta QP ranges (-32 to 31) and larger coefficient representations, essential for high dynamic range content.[13] In the screen content coding extensions, CABAC integrates with palette mode by binarizing palette indices and escape values, enabling efficient representation of discrete color regions in applications like desktop sharing.[18]

Role in VVC/H.266

The Versatile Video Coding (VVC/H.266) standard, finalized in 2020, continues to use an enhanced version of CABAC as its entropy coding method, further optimizing for 8K and higher resolutions, immersive media, and screen content.[19] Key enhancements include multi-hypothesis probability estimation with two estimates per context for faster adaptation, advanced context modeling based on transform coefficient quantization states and local templates, and reduced context-coded bins for larger transform blocks. The total number of contexts is expanded to support these features, with specific allocations such as 60 for significance flags (36 luma, 24 chroma). These improvements contribute approximately 1.1-1.3% bitrate savings over HEVC CABAC, as part of VVC's overall 30-50% compression gains relative to HEVC.[20]

Performance Comparisons

Context-adaptive binary arithmetic coding (CABAC) offers significant bitrate reductions compared to the Context-Adaptive Variable-Length Coding (CAVLC) entropy coder in H.264/AVC, typically achieving 10-20% savings for standard- and high-definition TV signals at equivalent video quality.[1] This efficiency stems from CABAC's adaptive probability modeling and arithmetic coding, which better approximate the Shannon limit for video data statistics, though it incurs higher computational complexity—often 2-3 times that of CAVLC in software implementations due to renormalization and probability updates.[4] CAVLC, relying on variable-length codes derived from Huffman principles, remains preferable for low-complexity decoders in resource-constrained environments, as it avoids the serial dependencies in CABAC's range updates.[21] Relative to traditional Huffman coding, CABAC provides gains of roughly 0.1-0.5 dB in PSNR at equivalent bitrates in video compression, primarily through its context-adaptive binarization and elimination of fixed codeword tables, enabling finer adaptation to non-stationary symbol probabilities without the integer bit limitations of Huffman methods.[22] In H.264/AVC, these improvements translate to 9-14% bitrate savings over Huffman-based CAVLC across diverse content.[23] Tests by the Joint Collaborative Team on Video Coding (JCT-VC) for HEVC demonstrated approximately 50% bitrate reduction over H.264/AVC at matching quality, with CABAC contributing substantially through enhanced context models and bypass modes, though at 10-20 cycles per bin versus roughly 5 for CAVLC in optimized implementations.[24] CABAC's rapid adaptation to video-specific statistics excels in dynamic scenes, but its serial nature reduces error resilience compared to block-independent coders like Huffman, as bit errors propagate through the arithmetic range.[25] In the AV1 codec standardized by AOMedia in 2018, a multi-symbol arithmetic coder replaces CABAC's binary approach with simpler
probability models for broader symbol alphabets, improving parallelization while maintaining comparable efficiency.[26]

History

Origins and Early Development

The foundations of context-adaptive binary arithmetic coding (CABAC) lie in the development of arithmetic coding techniques during the late 1970s. Jorma Rissanen introduced key concepts in adaptive arithmetic coding, deriving it from enumerative coding theory and emphasizing universal modeling for unknown sources, as detailed in his 1976 paper on generalized Kraft inequality and arithmetic coding. Independently, Robert Pasco explored finite-precision implementations of arithmetic coding to address the finite-state source coding problem, providing early practical insights into adaptive probability estimation without multiplications. These works, further refined by Rissanen in 1979, laid the groundwork for adaptive arithmetic coders capable of handling dynamic symbol probabilities, influencing subsequent binary adaptations. A significant advancement came with the Q-Coder, an adaptive binary arithmetic coding system developed at IBM in the mid-1980s and published in 1988. This coder extended Rissanen and Pasco's ideas to binary alphabets, using interval renormalization for robust probability estimation and avoiding floating-point operations through integer arithmetic, which enabled efficient hardware implementation. Building on this, the MQ-Coder emerged as a refined binary arithmetic coder, incorporating state-based adaptation for improved efficiency in image compression standards. Originally proposed for bi-level image coding in contexts like JBIG, it introduced more precise probability updates via finite-state machines, as analyzed in early 1990s implementations that balanced speed and compression. These binary extensions prioritized low-complexity adaptation, setting the stage for video-specific enhancements. CABAC was proposed in 2001 by researchers at the Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute (HHI) in Berlin, as an entropy coding method for the emerging H.26L video coding standard (later H.264/AVC). 
The initial version, presented as ITU-T VCEG document L-13 in January 2001, combined binary arithmetic coding with context modeling and adaptive binarization tailored to video syntax elements.[27] Key contributors included Detlev Marpe, Heiko Schwarz, and Thomas Wiegand, who refined the design through 2001–2002 iterations to optimize for low-delay encoding in real-time video applications.[1] Their seminal 2003 paper formalized CABAC, highlighting its multiplication-free arithmetic engine inspired by the Q-Coder and MQ-Coder but enhanced with video-oriented context adaptation to reduce redundancy beyond prior methods like JPEG 2000's MQ-Coder.[28] Early experiments integrated CABAC into H.26L test models, demonstrating its superiority over prototype context-adaptive variable-length coding (CAVLC) alternatives. Tests on typical video sequences at 30–38 dB PSNR showed CABAC achieving 9–14% bit-rate reductions compared to CAVLC, attributed to its finer-grained probability modeling and binary decomposition of non-binary symbols.[4] This performance edge, verified in pre-standard evaluations, underscored CABAC's potential for high-efficiency video compression while maintaining compatibility with low-complexity requirements for delay-sensitive applications.[13]
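The combination of binarization and per-bin context modeling that the HHI design formalized can be illustrated with a small sketch. Everything here is a simplification for exposition: unary binarization is only one of CABAC's binarization schemes, the counter-based probability estimate stands in for CABAC's 64-state finite-state machine, and selecting a context purely by bin index stands in for the real coder's neighbour-based context selection.

```python
def unary_binarize(value):
    """Unary binarization: n -> n ones followed by a terminating zero."""
    return [1] * value + [0]

class ContextModel:
    """Toy adaptive probability estimate for one context (counts of 0s/1s).

    Real CABAC tracks a probability state index in a finite-state machine
    rather than explicit counts.
    """
    def __init__(self):
        self.counts = [1, 1]           # Laplace-smoothed counts of 0 and 1

    def p_one(self):
        return self.counts[1] / sum(self.counts)

    def update(self, bit):
        self.counts[bit] += 1

contexts = {}                          # one model per bin index (simplified)

def code_symbol(value):
    """Binarize a non-binary symbol and adapt one context model per bin."""
    bins = unary_binarize(value)
    for i, b in enumerate(bins):
        ctx = contexts.setdefault(i, ContextModel())
        # An arithmetic coding engine would use ctx.p_one() here to size
        # the sub-interval before updating the model.
        ctx.update(b)
    return bins
```

After coding a few symbols, each context's estimate drifts toward the statistics of the bins it actually sees, which is the adaptation that gives CABAC its edge over static variable-length codes.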

Standardization milestones

Context-adaptive binary arithmetic coding (CABAC) achieved its first major standardization milestone with its inclusion in the H.264/AVC video coding standard, approved as ITU-T Recommendation H.264 and ISO/IEC International Standard 14496-10 in May 2003.[29] CABAC was specified as one of two entropy coding options, available in the Main profile, with the final draft international standard incorporating it released in July 2004.[12]

Subsequent refinements to H.264/AVC enhanced CABAC's capabilities through key amendments. The Fidelity Range Extensions (FRExt) amendment, finalized in July 2004, introduced support for 10-bit and 12-bit sample depths, along with lossless coding modes that leveraged CABAC for improved efficiency in high-fidelity applications.[12] The Scalable Video Coding (SVC) extension, completed in November 2007, adapted CABAC contexts to handle layered bitstreams, enabling scalability in the temporal, spatial, and quality dimensions while maintaining backward compatibility with baseline H.264/AVC decoders.[30]

CABAC's role expanded significantly in the High Efficiency Video Coding (HEVC) standard, designated ITU-T H.265 and ISO/IEC 23008-2, which was approved in April 2013.[31] Unlike in H.264/AVC, where it was optional, CABAC became the mandatory entropy coding method in HEVC, featuring optimizations such as a reduced number of context models from 399 to 153 to improve throughput without sacrificing compression performance.[31] Later developments further advanced CABAC in subsequent standards.
The Versatile Video Coding (VVC) standard, ITU-T H.266 and ISO/IEC 23090-3, approved in July 2020, optimized CABAC for higher efficiency, including refined binarization and context modeling for matrix-based intra prediction bins to support enhanced intra-frame coding.[19]

In parallel, China's Audio Video Coding Standard (AVS) series adopted similar context-adaptive binary arithmetic coding techniques: AVS2 (2013) and AVS3 (first phase finalized in 2019, with ongoing enhancements as of 2025) incorporate advanced entropy coders such as CBAC, which achieve performance comparable to CABAC while addressing ultra-high-definition and screen-content needs.[32][33] Subsequent phases of AVS3 achieved about 30% better compression than AVS2, and the standard was adopted by DVB in 2022 for ultra-HD broadcasting.[34]

By 2016, H.264/AVC had undergone more than ten amendments and corrigenda that refined CABAC for emerging applications, including high-dynamic-range (HDR) and 360-degree video, through extensions such as multiview and range enhancements. Regarding intellectual property, CABAC's implementation in H.264/AVC and its extensions involves essential patents from approximately 20 to 40 entities, collectively licensed through the MPEG LA patent pool to facilitate widespread adoption.[35]

References
