Advanced Video Coding
from Wikipedia

Advanced Video Coding / H.264 / MPEG-4 Part 10
Advanced video coding for generic audiovisual services
Status: In force
Year started: 2003
First published: 17 August 2004
Latest version: 15.0 (13 August 2024)
Organization: ITU-T, ISO, IEC
Committee: SG16 (VCEG), MPEG
Base standards: H.261, H.262 (aka MPEG-2 Video), H.263, ISO/IEC 14496-2 (aka MPEG-4 Part 2)
Related standards: H.265 (aka HEVC), H.266 (aka VVC)
Predecessor: H.263
Successor: H.265
Domain: Video compression
License: MPEG LA[1]
Website: www.itu.int/rec/T-REC-H.264
Block diagram of video coding layer of H.264 encoder with perceptual quality score

Advanced Video Coding (AVC), also referred to as H.264 or MPEG-4 Part 10, is a video compression standard based on block-oriented, motion-compensated coding.[2] It is by far the most commonly used format for the recording, compression, and distribution of video content, used by 79% of video industry developers as of December 2024.[3] It supports a maximum resolution of 8K UHD.[4][5]

The intent of the H.264/AVC project was to create a standard capable of providing good video quality at substantially lower bit rates than previous standards (i.e., half or less the bit rate of MPEG-2, H.263, or MPEG-4 Part 2), without increasing the complexity of design so much that it would be impractical or excessively expensive to implement. This was achieved with features such as a reduced-complexity integer discrete cosine transform (integer DCT),[6] variable block-size segmentation, and multi-picture inter-picture prediction. An additional goal was to provide enough flexibility to allow the standard to be applied to a wide variety of applications on a wide variety of networks and systems, including low and high bit rates, low and high resolution video, broadcast, DVD storage, RTP/IP packet networks, and ITU-T multimedia telephony systems. The H.264 standard can be viewed as a "family of standards" composed of a number of different profiles, although its "High profile" is by far the most commonly used format. A specific decoder decodes at least one, but not necessarily all profiles. The standard describes the format of the encoded data and how the data is decoded, but it does not specify algorithms for encoding—that is left open as a matter for encoder designers to select for themselves, and a wide variety of encoding schemes have been developed. H.264 is typically used for lossy compression, although it is also possible to create truly lossless-coded regions within lossy-coded pictures or to support rare use cases for which the entire encoding is lossless.

H.264 was standardized by the ITU-T Video Coding Experts Group (VCEG) of Study Group 16 together with the ISO/IEC JTC 1 Moving Picture Experts Group (MPEG). The project partnership effort is known as the Joint Video Team (JVT). The ITU-T H.264 standard and the ISO/IEC MPEG-4 AVC standard (formally, ISO/IEC 14496-10 – MPEG-4 Part 10, Advanced Video Coding) are jointly maintained so that they have identical technical content. The final drafting work on the first version of the standard was completed in May 2003, and various extensions of its capabilities have been added in subsequent editions. High Efficiency Video Coding (HEVC), a.k.a. H.265 and MPEG-H Part 2, is a successor to H.264/MPEG-4 AVC developed by the same organizations, while earlier standards are still in common use.

H.264 is perhaps best known as being the most commonly used video encoding format on Blu-ray Discs. It is also widely used by streaming Internet sources, such as videos from Netflix, Hulu, Amazon Prime Video, Vimeo, YouTube (more recently transitioning to VP9 and AV1), and the iTunes Store, Web software such as the Adobe Flash Player and Microsoft Silverlight, and also various HDTV broadcasts over terrestrial (ATSC, ISDB-T, DVB-T or DVB-T2), cable (DVB-C), and satellite (DVB-S and DVB-S2) systems.

H.264 is restricted by patents owned by various parties. A license covering most (but not all[citation needed]) patents essential to H.264 is administered by a patent pool that was formerly run by MPEG LA. Via Licensing Corp acquired MPEG LA in April 2023 and formed a new patent pool administration company called Via Licensing Alliance.[7] The commercial use of patented H.264 technologies requires the payment of royalties to Via and other patent owners. MPEG LA has allowed the free use of H.264 technologies for streaming Internet video that is free to end users, and Cisco paid royalties to MPEG LA on behalf of the users of binaries for its open source H.264 encoder openH264.

Naming

The H.264 name follows the ITU-T naming convention, where Recommendations are given a letter corresponding to their series and a recommendation number within the series. H.264 is part of "H-Series Recommendations: Audiovisual and multimedia systems". H.264 is further categorized into "H.200-H.499: Infrastructure of audiovisual services" and "H.260-H.279: Coding of moving video".[8] The MPEG-4 AVC name relates to the naming convention in ISO/IEC MPEG, where the standard is part 10 of ISO/IEC 14496, which is the suite of standards known as MPEG-4. The standard was developed jointly in a partnership of VCEG and MPEG, after earlier development work in the ITU-T as a VCEG project called H.26L. It is thus common to refer to the standard with names such as H.264/AVC, AVC/H.264, H.264/MPEG-4 AVC, or MPEG-4/H.264 AVC, to emphasize the common heritage. Occasionally, it is also referred to as "the JVT codec", in reference to the Joint Video Team (JVT) organization that developed it. (Such partnership and multiple naming is not uncommon. For example, the video compression standard known as MPEG-2 also arose from the partnership between MPEG and the ITU-T, where MPEG-2 video is known to the ITU-T community as H.262.[9]) Some software programs (such as VLC media player) internally identify this standard as AVC1.

History

Overall history

In early 1998, the Video Coding Experts Group (VCEG – ITU-T SG16 Q.6) issued a call for proposals on a project called H.26L, with the target to double the coding efficiency (which means halving the bit rate necessary for a given level of fidelity) in comparison to any other existing video coding standards for a broad variety of applications. VCEG was chaired by Gary Sullivan (Microsoft, formerly PictureTel, U.S.). The first draft design for that new standard was adopted in August 1999. In 2000, Thomas Wiegand (Heinrich Hertz Institute, Germany) became VCEG co-chair.

In December 2001, VCEG and the Moving Picture Experts Group (MPEG – ISO/IEC JTC 1/SC 29/WG 11) formed a Joint Video Team (JVT), with the charter to finalize the video coding standard.[10] Formal approval of the specification came in March 2003. The JVT was chaired by Gary Sullivan, Thomas Wiegand, and Ajay Luthra (Motorola, U.S.; later Arris, U.S.). In July 2004, the Fidelity Range Extensions (FRExt) project was finalized. From January 2005 to November 2007, the JVT was working on an extension of H.264/AVC towards scalability by an Annex (G) called Scalable Video Coding (SVC). The JVT management team was extended by Jens-Rainer Ohm (RWTH Aachen University, Germany). From July 2006 to November 2009, the JVT worked on Multiview Video Coding (MVC), an extension of H.264/AVC towards 3D television and limited-range free-viewpoint television. That work included the development of two new profiles of the standard: the Multiview High Profile and the Stereo High Profile.

Throughout the development of the standard, additional messages for containing supplemental enhancement information (SEI) have been developed. SEI messages can contain various types of data that indicate the timing of the video pictures or describe various properties of the coded video or how it can be used or enhanced. SEI messages are also defined that can contain arbitrary user-defined data. SEI messages do not affect the core decoding process, but can indicate how the video is recommended to be post-processed or displayed. Some other high-level properties of the video content are conveyed in video usability information (VUI), such as the indication of the color space for interpretation of the video content. As new color spaces have been developed, such as for high dynamic range and wide color gamut video, additional VUI identifiers have been added to indicate them.

Fidelity range extensions and professional profiles

The standardization of the first version of H.264/AVC was completed in May 2003. In the first project to extend the original standard, the JVT then developed what was called the Fidelity Range Extensions (FRExt). These extensions enabled higher quality video coding by supporting increased sample bit depth precision and higher-resolution color information, including the sampling structures known as Y′CBCR 4:2:2 (a.k.a. YUV 4:2:2) and 4:4:4. Several other features were also included in the FRExt project, such as adding an 8×8 integer discrete cosine transform (integer DCT) with adaptive switching between the 4×4 and 8×8 transforms, encoder-specified perceptual-based quantization weighting matrices, efficient inter-picture lossless coding, and support of additional color spaces. The design work on the FRExt project was completed in July 2004, and the drafting work on them was completed in September 2004.

Five other new profiles (see version 7 below) intended primarily for professional applications were then developed, adding extended-gamut color space support, defining additional aspect ratio indicators, defining two additional types of "supplemental enhancement information" (post-filter hint and tone mapping), and deprecating one of the prior FRExt profiles (the High 4:4:4 profile) that industry feedback[by whom?] indicated should have been designed differently.

Scalable video coding

The next major feature added to the standard was Scalable Video Coding (SVC). Specified in Annex G of H.264/AVC, SVC allows the construction of bitstreams that contain layers of sub-bitstreams that also conform to the standard, including one such bitstream known as the "base layer" that can be decoded by a H.264/AVC codec that does not support SVC. For temporal bitstream scalability (i.e., the presence of a sub-bitstream with a smaller temporal sampling rate than the main bitstream), complete access units are removed from the bitstream when deriving the sub-bitstream. In this case, high-level syntax and inter-prediction reference pictures in the bitstream are constructed accordingly. On the other hand, for spatial and quality bitstream scalability (i.e. the presence of a sub-bitstream with lower spatial resolution/quality than the main bitstream), the NAL (Network Abstraction Layer) is removed from the bitstream when deriving the sub-bitstream. In this case, inter-layer prediction (i.e., the prediction of the higher spatial resolution/quality signal from the data of the lower spatial resolution/quality signal) is typically used for efficient coding. The Scalable Video Coding extensions were completed in November 2007.
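The temporal-scalability rule described above (dropping complete access units) is simple enough to sketch directly. This is an illustrative sketch only: the dictionary records and the `temporal_id` field model the concept, not the actual NAL unit syntax.

```python
def extract_temporal_sublayer(access_units, max_temporal_id):
    """Derive a temporal sub-bitstream by removing complete access units
    above the target temporal layer, as described for SVC temporal
    scalability. Each access unit is modeled here as a simple dict."""
    return [au for au in access_units if au["temporal_id"] <= max_temporal_id]

# Halving the frame rate of a stream whose pictures alternate layers 0 and 1:
stream = [{"poc": i, "temporal_id": i % 2} for i in range(8)]
base_layer = extract_temporal_sublayer(stream, 0)  # keeps pictures 0, 2, 4, 6
```

The key point the sketch captures is that temporal sub-bitstream extraction needs no re-encoding: whole access units are simply discarded, and the remaining bitstream still conforms to the standard.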

Multiview video coding

The next major feature added to the standard was Multiview Video Coding (MVC). Specified in Annex H of H.264/AVC, MVC enables the construction of bitstreams that represent more than one view of a video scene. An important example of this functionality is stereoscopic 3D video coding. Two profiles were developed in the MVC work: Multiview High profile supports an arbitrary number of views, and Stereo High profile is designed specifically for two-view stereoscopic video. The Multiview Video Coding extensions were completed in November 2009.

3D-AVC and MFC stereoscopic coding

Additional extensions were later developed that included 3D video coding with joint coding of depth maps and texture (termed 3D-AVC), multi-resolution frame-compatible (MFC) stereoscopic and 3D-MFC coding, various additional combinations of features, and higher frame sizes and frame rates.

Versions

Versions of the H.264/AVC standard include the following completed revisions, corrigenda, and amendments (dates are final approval dates in ITU-T, while final "International Standard" approval dates in ISO/IEC are somewhat different and slightly later in most cases). Each version represents changes relative to the next lower version, which are integrated into the text.

  • Version 1 (Edition 1): (May 30, 2003) First approved version of H.264/AVC containing Baseline, Main, and Extended profiles.[11]
  • Version 2 (Edition 1.1): (May 7, 2004) Corrigendum containing various minor corrections.[12]
  • Version 3 (Edition 2): (March 1, 2005) Major addition containing the first amendment, establishing the Fidelity Range Extensions (FRExt). This version added the High, High 10, High 4:2:2, and High 4:4:4 profiles.[13] After a few years, the High profile became the most commonly used profile of the standard.
  • Version 4 (Edition 2.1): (September 13, 2005) Corrigendum containing various minor corrections and adding three aspect ratio indicators.[14]
  • Version 5 (Edition 2.2): (June 13, 2006) Amendment consisting of removal of prior High 4:4:4 profile (processed as a corrigendum in ISO/IEC).[15]
  • Version 6 (Edition 2.2): (June 13, 2006) Amendment consisting of minor extensions like extended-gamut color space support (bundled with above-mentioned aspect ratio indicators in ISO/IEC).[15]
  • Version 7 (Edition 2.3): (April 6, 2007) Amendment containing the addition of the High 4:4:4 Predictive profile and four Intra-only profiles (High 10 Intra, High 4:2:2 Intra, High 4:4:4 Intra, and CAVLC 4:4:4 Intra).[16]
  • Version 8 (Edition 3): (November 22, 2007) Major addition to H.264/AVC containing the amendment for Scalable Video Coding (SVC) containing Scalable Baseline, Scalable High, and Scalable High Intra profiles.[17]
  • Version 9 (Edition 3.1): (January 13, 2009) Corrigendum containing minor corrections.[18]
  • Version 10 (Edition 4): (March 16, 2009) Amendment containing definition of a new profile (the Constrained Baseline profile) with only the common subset of capabilities supported in various previously specified profiles.[19]
  • Version 11 (Edition 4): (March 16, 2009) Major addition to H.264/AVC containing the amendment for Multiview Video Coding (MVC) extension, including the Multiview High profile.[19]
  • Version 12 (Edition 5): (March 9, 2010) Amendment containing definition of a new MVC profile (the Stereo High profile) for two-view video coding with support of interlaced coding tools and specifying an additional supplemental enhancement information (SEI) message termed the frame packing arrangement SEI message.[20]
  • Version 13 (Edition 5): (March 9, 2010) Corrigendum containing minor corrections.[20]
  • Version 14 (Edition 6): (June 29, 2011) Amendment specifying a new level (Level 5.2) supporting higher processing rates in terms of maximum macroblocks per second, and a new profile (the Progressive High profile) supporting only the frame coding tools of the previously specified High profile.[21]
  • Version 15 (Edition 6): (June 29, 2011) Corrigendum containing minor corrections.[21]
  • Version 16 (Edition 7): (January 13, 2012) Amendment containing definition of three new profiles intended primarily for real-time communication applications: the Constrained High, Scalable Constrained Baseline, and Scalable Constrained High profiles.[22]
  • Version 17 (Edition 8): (April 13, 2013) Amendment with additional SEI message indicators.[23]
  • Version 18 (Edition 8): (April 13, 2013) Amendment to specify the coding of depth map data for 3D stereoscopic video, including a Multiview Depth High profile.[23]
  • Version 19 (Edition 8): (April 13, 2013) Corrigendum to correct an error in the sub-bitstream extraction process for multiview video.[23]
  • Version 20 (Edition 8): (April 13, 2013) Amendment to specify additional color space identifiers (including support of ITU-R Recommendation BT.2020 for UHDTV) and an additional model type in the tone mapping information SEI message.[23]
  • Version 21 (Edition 9): (February 13, 2014) Amendment to specify the Enhanced Multiview Depth High profile.[24]
  • Version 22 (Edition 9): (February 13, 2014) Amendment to specify the multi-resolution frame compatible (MFC) enhancement for 3D stereoscopic video, the MFC High profile, and minor corrections.[24]
  • Version 23 (Edition 10): (February 13, 2016) Amendment to specify MFC stereoscopic video with depth maps, the MFC Depth High profile, the mastering display color volume SEI message, and additional color-related VUI codepoint identifiers.[25]
  • Version 24 (Edition 11): (October 14, 2016) Amendment to specify additional levels of decoder capability supporting larger picture sizes (Levels 6, 6.1, and 6.2), the green metadata SEI message, the alternative depth information SEI message, and additional color-related VUI codepoint identifiers.[26]
  • Version 25 (Edition 12): (April 13, 2017) Amendment to specify the Progressive High 10 profile, hybrid log–gamma (HLG), and additional color-related VUI code points and SEI messages.[27]
  • Version 26 (Edition 13): (June 13, 2019) Amendment to specify additional SEI messages for ambient viewing environment, content light level information, content color volume, equirectangular projection, cubemap projection, sphere rotation, region-wise packing, omnidirectional viewport, SEI manifest, and SEI prefix.[28]
  • Version 27 (Edition 14): (August 22, 2021) Amendment to specify additional SEI messages for annotated regions and shutter interval information, and miscellaneous minor corrections and clarifications.[29]
  • Version 28 (Edition 15): (August 13, 2024) Amendment to specify additional SEI messages for neural-network postfilter characteristics, neural-network post-filter activation, and phase indication, additional colour type identifiers, and miscellaneous minor corrections and clarifications.[30]

Applications

The H.264 video format has a very broad application range that covers all forms of digital compressed video from low bit-rate Internet streaming applications to HDTV broadcast and Digital Cinema applications with nearly lossless coding. With the use of H.264, bit rate savings of 50% or more compared to MPEG-2 Part 2 are reported. For example, H.264 has been reported to give the same Digital Satellite TV quality as current MPEG-2 implementations with less than half the bitrate, with current MPEG-2 implementations working at around 3.5 Mbit/s and H.264 at only 1.5 Mbit/s.[31] Sony claims that 9 Mbit/s AVC recording mode is equivalent to the image quality of the HDV format, which uses approximately 18–25 Mbit/s.[32]

To ensure compatibility and problem-free adoption of H.264/AVC, many standards bodies have amended or added to their video-related standards so that users of these standards can employ H.264/AVC. Both the Blu-ray Disc format and the now-discontinued HD DVD format include the H.264/AVC High Profile as one of three mandatory video compression formats. The Digital Video Broadcast project (DVB) approved the use of H.264/AVC for broadcast television in late 2004.

The Advanced Television Systems Committee (ATSC) standards body in the United States approved the use of H.264/AVC for broadcast television in July 2008, although the standard is not yet used for fixed ATSC broadcasts within the United States.[33][34] It has also been approved for use with the more recent ATSC-M/H (Mobile/Handheld) standard, using the AVC and SVC portions of H.264.[35]

The closed-circuit-television and video-surveillance markets have included the technology in many products.

Many common DSLRs use H.264 video wrapped in QuickTime MOV containers as the native recording format.

Derived formats

AVCHD is a high-definition recording format designed by Sony and Panasonic that uses H.264 (conforming to H.264 while adding additional application-specific features and constraints).

AVC-Intra is an intraframe-only compression format, developed by Panasonic.

XAVC is a recording format designed by Sony that uses level 5.2 of H.264/MPEG-4 AVC, which is the highest level supported by that video standard.[36][37] XAVC can support 4K resolution (4096 × 2160 and 3840 × 2160) at up to 60 frames per second (fps).[36][37] Sony has announced that cameras that support XAVC include two CineAlta cameras—the Sony PMW-F55 and Sony PMW-F5.[38] The Sony PMW-F55 can record XAVC with 4K resolution at 30 fps at 300 Mbit/s and 2K resolution at 30 fps at 100 Mbit/s.[39] XAVC can record 4K resolution at 60 fps with 4:2:2 chroma sampling at 600 Mbit/s.[40][41]

Design

Features

Block diagram of H.264

H.264/AVC/MPEG-4 Part 10 contains a number of new features that allow it to compress video much more efficiently than older standards and to provide more flexibility for application to a wide variety of network environments. In particular, some such key features include:

  • Multi-picture inter-picture prediction including the following features:
    • Using previously encoded pictures as references in a much more flexible way than in past standards, allowing up to 16 reference frames (or 32 reference fields, in the case of interlaced encoding) to be used in some cases. In profiles that support non-IDR frames, most levels specify that sufficient buffering should be available to allow for at least 4 or 5 reference frames at maximum resolution. This is in contrast to prior standards, where the limit was typically one; or, in the case of conventional "B pictures" (B-frames), two.
    • Variable block-size motion compensation (VBSMC) with block sizes as large as 16×16 and as small as 4×4, enabling precise segmentation of moving regions. The supported luma prediction block sizes include 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4, many of which can be used together in a single macroblock. Chroma prediction block sizes are correspondingly smaller when chroma subsampling is used.
    • The ability to use multiple motion vectors per macroblock (one or two per partition) with a maximum of 32 in the case of a B macroblock constructed of 16 4×4 partitions. The motion vectors for each 8×8 or larger partition region can point to different reference pictures.
    • The ability to use any macroblock type in B-frames, including I-macroblocks, resulting in much more efficient encoding when using B-frames. This feature was notably left out from MPEG-4 ASP.
    • Six-tap filtering for derivation of half-pel luma sample predictions, for sharper subpixel motion-compensation. Quarter-pixel motion is derived by linear interpolation of the halfpixel values, to save processing power.
    • Quarter-pixel precision for motion compensation, enabling precise description of the displacements of moving areas. For chroma the resolution is typically halved both vertically and horizontally (see 4:2:0) therefore the motion compensation of chroma uses one-eighth chroma pixel grid units.
    • Weighted prediction, allowing an encoder to specify the use of a scaling and offset when performing motion compensation, and providing a significant benefit in performance in special cases—such as fade-to-black, fade-in, and cross-fade transitions. This includes implicit weighted prediction for B-frames, and explicit weighted prediction for P-frames.
  • Spatial prediction from the edges of neighboring blocks for "intra" coding, rather than the "DC"-only prediction found in MPEG-2 Part 2 and the transform coefficient prediction found in H.263v2 and MPEG-4 Part 2. This includes luma prediction block sizes of 16×16, 8×8, and 4×4 (of which only one type can be used within each macroblock).
  • Integer discrete cosine transform (integer DCT),[5][42][43] a type of discrete cosine transform (DCT)[42] where the transform is an integer approximation of the standard DCT.[44] It has selectable block sizes[6] and exact-match integer computation to reduce complexity, including:
    • An exact-match integer 4×4 spatial block transform, allowing precise placement of residual signals with little of the "ringing" often found with prior codec designs. It is similar to the standard DCT used in previous standards, but uses a smaller block size and simple integer processing. Unlike the cosine-based formulas and tolerances expressed in earlier standards (such as H.261 and MPEG-2), integer processing provides an exactly specified decoded result.
    • An exact-match integer 8×8 spatial block transform, allowing highly correlated regions to be compressed more efficiently than with the 4×4 transform. This design is based on the standard DCT, but simplified and made to provide exactly specified decoding.
    • Adaptive encoder selection between the 4×4 and 8×8 transform block sizes for the integer transform operation.
    • A secondary Hadamard transform performed on "DC" coefficients of the primary spatial transform applied to chroma DC coefficients (and also luma in one special case) to obtain even more compression in smooth regions.
  • Lossless macroblock coding features including:
    • A lossless "PCM macroblock" representation mode in which video data samples are represented directly,[45] allowing perfect representation of specific regions and allowing a strict limit to be placed on the quantity of coded data for each macroblock.
    • An enhanced lossless macroblock representation mode allowing perfect representation of specific regions while ordinarily using substantially fewer bits than the PCM mode.
  • Flexible interlaced-scan video coding features, including:
    • Macroblock-adaptive frame-field (MBAFF) coding, using a macroblock pair structure for pictures coded as frames, allowing 16×16 macroblocks in field mode (compared with MPEG-2, where field mode processing in a picture that is coded as a frame results in the processing of 16×8 half-macroblocks).
    • Picture-adaptive frame-field coding (PAFF or PicAFF) allowing a freely selected mixture of pictures coded either as complete frames where both fields are combined for encoding or as individual single fields.
  • A quantization design including:
    • Logarithmic step size control for easier bit rate management by encoders and simplified inverse-quantization scaling
    • Frequency-customized quantization scaling matrices selected by the encoder for perceptual-based quantization optimization
  • An in-loop deblocking filter that helps prevent the blocking artifacts common to other DCT-based image compression techniques, resulting in better visual appearance and compression efficiency
  • An entropy coding design including:
    • Context-adaptive binary arithmetic coding (CABAC), an algorithm to losslessly compress syntax elements in the video stream knowing the probabilities of syntax elements in a given context. CABAC compresses data more efficiently than CAVLC but requires considerably more processing to decode.
    • Context-adaptive variable-length coding (CAVLC), which is a lower-complexity alternative to CABAC for the coding of quantized transform coefficient values. Although lower complexity than CABAC, CAVLC is more elaborate and more efficient than the methods typically used to code coefficients in other prior designs.
    • A common simple and highly structured variable length coding (VLC) technique for many of the syntax elements not coded by CABAC or CAVLC, referred to as Exponential-Golomb coding (or Exp-Golomb).
  • Loss resilience features including:
    • A Network Abstraction Layer (NAL) definition allowing the same video syntax to be used in many network environments. One very fundamental design concept of H.264 is to generate self-contained packets, to remove the header duplication as in MPEG-4's Header Extension Code (HEC).[46] This was achieved by decoupling information relevant to more than one slice from the media stream. The combination of the higher-level parameters is called a parameter set.[46] The H.264 specification includes two types of parameter sets: Sequence Parameter Set (SPS) and Picture Parameter Set (PPS). An active sequence parameter set remains unchanged throughout a coded video sequence, and an active picture parameter set remains unchanged within a coded picture. The sequence and picture parameter set structures contain information such as picture size, optional coding modes employed, and macroblock to slice group map.[46]
    • Flexible macroblock ordering (FMO), also known as slice groups, and arbitrary slice ordering (ASO), which are techniques for restructuring the ordering of the representation of the fundamental regions (macroblocks) in pictures. Typically considered an error/loss robustness feature, FMO and ASO can also be used for other purposes.
    • Data partitioning (DP), a feature providing the ability to separate more important and less important syntax elements into different packets of data, enabling the application of unequal error protection (UEP) and other types of improvement of error/loss robustness.
    • Redundant slices (RS), an error/loss robustness feature that lets an encoder send an extra representation of a picture region (typically at lower fidelity) that can be used if the primary representation is corrupted or lost.
    • Frame numbering, a feature that allows the creation of "sub-sequences", enabling temporal scalability by optional inclusion of extra pictures between other pictures, and the detection and concealment of losses of entire pictures, which can occur due to network packet losses or channel errors.
  • Switching slices, called SP and SI slices, allowing an encoder to direct a decoder to jump into an ongoing video stream for such purposes as video streaming bit rate switching and "trick mode" operation. When a decoder jumps into the middle of a video stream using the SP/SI feature, it can get an exact match to the decoded pictures at that location in the video stream despite using different pictures, or no pictures at all, as references prior to the switch.
  • A simple automatic process for preventing the accidental emulation of start codes, which are special sequences of bits in the coded data that allow random access into the bitstream and recovery of byte alignment in systems that can lose byte synchronization.
  • Supplemental enhancement information (SEI) and video usability information (VUI), which are extra information that can be inserted into the bitstream for various purposes such as indicating the color space used for the video content or various constraints that apply to the encoding. SEI messages can contain arbitrary user-defined metadata payloads or other messages with syntax and semantics defined in the standard.
  • Auxiliary pictures, which can be used for such purposes as alpha compositing.
  • Support of monochrome (4:0:0), 4:2:0, 4:2:2, and 4:4:4 chroma sampling (depending on the selected profile).
  • Support of sample bit depth precision ranging from 8 to 14 bits per sample (depending on the selected profile).
  • The ability to encode individual color planes as distinct pictures with their own slice structures, macroblock modes, motion vectors, etc., allowing encoders to be designed with a simple parallelization structure (supported only in the three 4:4:4-capable profiles).
  • Picture order count, a feature that serves to keep the ordering of the pictures and the values of samples in the decoded pictures isolated from timing information, allowing timing information to be carried and controlled/changed separately by a system without affecting decoded picture content.
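The Exp-Golomb coding mentioned in the entropy coding features above is compact enough to sketch in full. This is an illustrative implementation of the unsigned ue(v) codeword (a string of leading zeros followed by the binary representation of value + 1), not production bitstream parsing code:

```python
def ue_encode(value: int) -> str:
    """Encode a non-negative integer as an unsigned Exp-Golomb codeword.

    The codeword for v is binary(v + 1) prefixed with len(binary(v + 1)) - 1
    zero bits, so small values get short codes: 0 -> "1", 1 -> "010", ...
    """
    code = bin(value + 1)[2:]              # binary of value + 1, no "0b" prefix
    return "0" * (len(code) - 1) + code

def ue_decode(bits: str) -> tuple[int, str]:
    """Decode one unsigned Exp-Golomb codeword from a bit string.

    Returns (value, remaining_bits): count the leading zeros, then read
    that many bits plus one as the binary of value + 1.
    """
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    codeword = bits[zeros : 2 * zeros + 1]
    return int(codeword, 2) - 1, bits[2 * zeros + 1 :]
```

Because the code length is self-delimiting, a decoder can read consecutive syntax elements from the bitstream without any explicit length fields.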

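The six-tap half-sample luma filter listed among the motion compensation features uses the kernel (1, -5, 20, 20, -5, 1)/32 with rounding. A minimal one-dimensional sketch (the clipping range assumes 8-bit samples; the real codec also applies the filter two-dimensionally and without intermediate clipping for some positions):

```python
def half_pel(samples, i):
    """Interpolate the half-sample position between samples[i] and
    samples[i + 1] using H.264's six-tap filter (1, -5, 20, 20, -5, 1)/32.
    Needs two samples of context on each side (indices i-2 .. i+3)."""
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * samples[i - 2 + k] for k, t in enumerate(taps))
    return min(255, max(0, (acc + 16) >> 5))   # round and clip to 8 bits
```

On a flat region the filter reproduces the input level, while on edges the negative outer taps sharpen the interpolated value relative to plain bilinear averaging, which is why the text describes it as producing "sharper" subpixel motion compensation.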
These techniques, along with several others, help H.264 to perform significantly better than any prior standard under a wide variety of circumstances in a wide variety of application environments. H.264 can often perform radically better than MPEG-2 video—typically obtaining the same quality at half of the bit rate or less, especially on high bit rate and high resolution video content.[47]
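The "exact-match" property of the 4×4 integer transform can be demonstrated concretely. The sketch below is illustrative, not the normative decoder math: it uses the standard's forward core transform matrix, but performs the inverse with exact rational arithmetic, whereas the real codec folds the required scaling into the quantization step.

```python
from fractions import Fraction

# H.264's 4x4 forward core transform matrix (an integer DCT approximation).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(a, b):
    """Plain 4x4 matrix multiply."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_transform(block):
    """Y = CF * X * CF^T: integer-only arithmetic, so every conforming
    implementation computes bit-identical transform results."""
    return matmul(matmul(CF, block), transpose(CF))

def exact_inverse(coeffs):
    """Invert the transform exactly. Since CF * CF^T = diag(4, 10, 4, 10),
    CF^{-1} = CF^T * diag(1/4, 1/10, 1/4, 1/10); the codec itself folds
    this scaling into quantization rather than using rationals."""
    d = [Fraction(1, 4), Fraction(1, 10), Fraction(1, 4), Fraction(1, 10)]
    inv = [[Fraction(CF[j][i]) * d[j] for j in range(4)] for i in range(4)]
    y = [[Fraction(v) for v in row] for row in coeffs]
    return matmul(matmul(inv, y), transpose(inv))
```

Unlike the real-valued DCT of earlier standards, which only specified tolerances and therefore allowed small decoder-to-decoder drift, this integer design makes the decoded result exactly reproducible.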

Like other ISO/IEC MPEG video standards, H.264/AVC has a reference software implementation that can be freely downloaded.[48] Its main purpose is to give examples of H.264/AVC features, rather than being a useful application per se. Some reference hardware design work has also been conducted in the Moving Picture Experts Group. The above-mentioned aspects include features in all profiles of H.264. A profile for a codec is a set of features of that codec identified to meet a certain set of specifications of intended applications. This means that many of the features listed are not supported in some profiles. The various profiles of H.264/AVC are discussed in the next section.

Profiles


The standard defines several sets of capabilities, which are referred to as profiles, targeting specific classes of applications. These are declared using a profile code (profile_idc) and sometimes a set of additional constraints applied in the encoder. The profile code and indicated constraints allow a decoder to recognize the requirements for decoding that specific bitstream. (And in many system environments, only one or two profiles are allowed to be used, so decoders in those environments do not need to be concerned with recognizing the less commonly used profiles.) By far the most commonly used profile is the High Profile.

Profiles for non-scalable 2D video applications include the following:

Constrained Baseline Profile (CBP, 66 with constraint set 1)
Primarily for low-cost applications, this profile is most typically used in videoconferencing and mobile applications. It corresponds to the subset of features that are in common between the Baseline, Main, and High Profiles.
Baseline Profile (BP, 66)
Primarily for low-cost applications that require additional data loss robustness, this profile is used in some videoconferencing and mobile applications. This profile includes all features that are supported in the Constrained Baseline Profile, plus three additional features that can be used for loss robustness (or for other purposes such as low-delay multi-point video stream compositing). The importance of this profile has faded somewhat since the definition of the Constrained Baseline Profile in 2009. All Constrained Baseline Profile bitstreams are also considered to be Baseline Profile bitstreams, as these two profiles share the same profile identifier code value.
Extended Profile (XP, 88)
Intended as the streaming video profile, this profile has relatively high compression capability and some extra tricks for robustness to data losses and server stream switching.
Main Profile (MP, 77)
This profile is used for standard-definition digital TV broadcasts that use the MPEG-4 format as defined in the DVB standard.[49] It is not, however, used for high-definition television broadcasts, as the importance of this profile faded when the High Profile was developed in 2004 for that application.
High Profile (HiP, 100)
The primary profile for broadcast and disc storage applications, particularly for high-definition television applications (for example, this is the profile adopted by the Blu-ray Disc storage format and the DVB HDTV broadcast service).
Progressive High Profile (PHiP, 100 with constraint set 4)
Similar to the High profile, but without support of field coding features.
Constrained High Profile (100 with constraint set 4 and 5)
Similar to the Progressive High profile, but without support of B (bi-predictive) slices.
High 10 Profile (Hi10P, 110)
Going beyond typical mainstream consumer product capabilities, this profile builds on top of the High Profile, adding support for up to 10 bits per sample of decoded picture precision.
High 4:2:2 Profile (Hi422P, 122)
Primarily targeting professional applications that use interlaced video, this profile builds on top of the High 10 Profile, adding support for the 4:2:2 chroma sampling format while using up to 10 bits per sample of decoded picture precision.
High 4:4:4 Predictive Profile (Hi444PP, 244)
This profile builds on top of the High 4:2:2 Profile, supporting up to 4:4:4 chroma sampling, up to 14 bits per sample, and additionally supporting efficient lossless region coding and the coding of each picture as three separate color planes.
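The profile codes listed above appear as the profile_idc byte in the sequence parameter set, followed by a byte carrying the constraint-set flags (constraint_set0_flag in the most significant bit) and then level_idc. The mapping can be sketched roughly as follows; this is an illustrative simplification that omits the scalable/multiview codes and special cases such as level 1b signaling:

```python
PROFILE_NAMES = {
    44: "CAVLC 4:4:4 Intra", 66: "Baseline", 77: "Main", 88: "Extended",
    100: "High", 110: "High 10", 122: "High 4:2:2",
    244: "High 4:4:4 Predictive",
}

def profile_name(profile_idc: int, constraint_flags: int) -> str:
    """constraint_flags is the byte after profile_idc, holding
    constraint_set0_flag..constraint_set5_flag in bits 7..2."""
    set1 = bool(constraint_flags & 0x40)
    set3 = bool(constraint_flags & 0x10)
    set4 = bool(constraint_flags & 0x08)
    set5 = bool(constraint_flags & 0x04)
    if profile_idc == 66 and set1:
        return "Constrained Baseline"
    if profile_idc == 100 and set4 and set5:
        return "Constrained High"
    if profile_idc == 100 and set4:
        return "Progressive High"
    if profile_idc in (110, 122, 244) and set3:
        # All-Intra variants of the High 10 / 4:2:2 / 4:4:4 profiles.
        return PROFILE_NAMES[profile_idc].replace("Predictive", "").strip() + " Intra"
    return PROFILE_NAMES.get(profile_idc, f"unknown ({profile_idc})")

def level_string(level_idc: int) -> str:
    """level_idc is ten times the level number, e.g. 41 -> '4.1'."""
    return f"{level_idc / 10:g}"
```

With these definitions, a bitstream signaling profile_idc 100 with no constraint flags and level_idc 40 would be reported as High profile at Level 4.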

For camcorders, editing, and other professional applications, the standard contains four additional Intra-frame-only profiles, which are defined as simple subsets of the corresponding profiles above:

High 10 Intra Profile (110 with constraint set 3)
The High 10 Profile constrained to all-Intra use.
High 4:2:2 Intra Profile (122 with constraint set 3)
The High 4:2:2 Profile constrained to all-Intra use.
High 4:4:4 Intra Profile (244 with constraint set 3)
The High 4:4:4 Profile constrained to all-Intra use.
CAVLC 4:4:4 Intra Profile (44)
The High 4:4:4 Profile constrained to all-Intra use and to CAVLC entropy coding (i.e., not supporting CABAC).

As a result of the Scalable Video Coding (SVC) extension, the standard contains five additional scalable profiles, which are defined as a combination of an H.264/AVC profile for the base layer (identified by the second word in the scalable profile name) and tools that achieve the scalable extension:

Scalable Baseline Profile (83)
Primarily targeting video conferencing, mobile, and surveillance applications, this profile builds on top of the Constrained Baseline profile to which the base layer (a subset of the bitstream) must conform. For the scalability tools, a subset of the available tools is enabled.
Scalable Constrained Baseline Profile (83 with constraint set 5)
A subset of the Scalable Baseline Profile intended primarily for real-time communication applications.
Scalable High Profile (86)
Primarily targeting broadcast and streaming applications, this profile builds on top of the H.264/AVC High Profile to which the base layer must conform.
Scalable Constrained High Profile (86 with constraint set 5)
A subset of the Scalable High Profile intended primarily for real-time communication applications.
Scalable High Intra Profile (86 with constraint set 3)
Primarily targeting production applications, this profile is the Scalable High Profile constrained to all-Intra use.

As a result of the Multiview Video Coding (MVC) extension, the standard contains two multiview profiles:

Stereo High Profile (128)
This profile targets two-view stereoscopic 3D video and combines the tools of the High profile with the inter-view prediction capabilities of the MVC extension.
Multiview High Profile (118)
This profile supports two or more views using both inter-picture (temporal) and MVC inter-view prediction, but does not support field pictures and macroblock-adaptive frame-field coding.

The Multi-resolution Frame-Compatible (MFC) extension added two more profiles:

MFC High Profile (134)
A profile for stereoscopic coding with two-layer resolution enhancement.
MFC Depth High Profile (135)

The 3D-AVC extension added two more profiles:

Multiview Depth High Profile (138)
This profile supports joint coding of depth map and video texture information for improved compression of 3D video content.
Enhanced Multiview Depth High Profile (139)
An enhanced profile for combined multiview coding with depth information.

Feature support in particular profiles

Feature CBP BP XP MP ProHiP HiP Hi10P Hi422P Hi444PP
I and P slices Yes Yes Yes Yes Yes Yes Yes Yes Yes
Bit depth (per sample) 8 8 8 8 8 8 8 to 10 8 to 10 8 to 14
Chroma formats 4:2:0 4:2:0 4:2:0 4:2:0 4:2:0 4:2:0 4:2:0 4:2:0/4:2:2 4:2:0/4:2:2/4:4:4
Flexible macroblock ordering (FMO) No Yes Yes No No No No No No
Arbitrary slice ordering (ASO) No Yes Yes No No No No No No
Redundant slices (RS) No Yes Yes No No No No No No
Data Partitioning No No Yes No No No No No No
SI and SP slices No No Yes No No No No No No
Interlaced coding (PicAFF, MBAFF) No No Yes Yes No Yes Yes Yes Yes
B slices No No Yes Yes Yes Yes Yes Yes Yes
Multiple reference frames Yes Yes Yes Yes Yes Yes Yes Yes Yes
In-loop deblocking filter Yes Yes Yes Yes Yes Yes Yes Yes Yes
CAVLC entropy coding Yes Yes Yes Yes Yes Yes Yes Yes Yes
CABAC entropy coding No No No Yes Yes Yes Yes Yes Yes
4:0:0 (Greyscale) No No No No Yes Yes Yes Yes Yes
8×8 vs. 4×4 transform adaptivity No No No No Yes Yes Yes Yes Yes
Quantization scaling matrices No No No No Yes Yes Yes Yes Yes
Separate CB and CR QP control No No No No Yes Yes Yes Yes Yes
Separate color plane coding No No No No No No No No Yes
Predictive lossless coding No No No No No No No No Yes

Levels


As the term is used in the standard, a "level" is a specified set of constraints that indicate a degree of required decoder performance for a profile. For example, a level of support within a profile specifies the maximum picture resolution, frame rate, and bit rate that a decoder may use. A decoder that conforms to a given level must be able to decode all bitstreams encoded for that level and all lower levels.

Levels with maximum property values[27]
Level | Maximum decoding speed (macroblocks/s) | Maximum frame size (macroblocks) | Maximum VCL video bit rate (Constrained Baseline, Baseline, Extended and Main Profiles) (kbit/s) | Examples for high resolution @ highest frame rate (maximum stored frames)
1 1,485 99 64
128×96@30.9 (8)
176×144@15.0 (4)
1b 1,485 99 128
128×96@30.9 (8)
176×144@15.0 (4)
1.1 3,000 396 192
176×144@30.3 (9)
320×240@10.0 (3)
352×288@7.5 (2)
1.2 6,000 396 384
320×240@20.0 (7)
352×288@15.2 (6)
1.3 11,880 396 768
320×240@36.0 (7)
352×288@30.0 (6)
2 11,880 396 2,000
320×240@36.0 (7)
352×288@30.0 (6)
2.1 19,800 792 4,000
352×480@30.0 (7)
352×576@25.0 (6)
2.2 20,250 1,620 4,000
352×480@30.7 (12)
352×576@25.6 (10)
720×480@15.0 (6)
720×576@12.5 (5)
3 40,500 1,620 10,000
352×480@61.4 (12)
352×576@51.1 (10)
720×480@30.0 (6)
720×576@25.0 (5)
3.1 108,000 3,600 14,000
720×480@80.0 (13)
720×576@66.7 (11)
1,280×720@30.0 (5)
3.2 216,000 5,120 20,000
1,280×720@60.0 (5)
1,280×1,024@42.2 (4)
4 245,760 8,192 20,000
1,280×720@68.3 (9)
1,920×1,080@30.1 (4)
2,048×1,024@30.0 (4)
4.1 245,760 8,192 50,000
1,280×720@68.3 (9)
1,920×1,080@30.1 (4)
2,048×1,024@30.0 (4)
4.2 522,240 8,704 50,000
1,280×720@145.1 (9)
1,920×1,080@64.0 (4)
2,048×1,080@60.0 (4)
5 589,824 22,080 135,000
1,920×1,080@72.3 (13)
2,048×1,024@72.0 (13)
2,048×1,080@67.8 (12)
2,560×1,920@30.7 (5)
3,672×1,536@26.7 (5)
5.1 983,040 36,864 240,000
1,920×1,080@120.5 (16)
2,560×1,920@51.2 (9)
3,840×2,160@31.7 (5)
4,096×2,048@30.0 (5)
4,096×2,160@28.5 (5)
4,096×2,304@26.7 (5)
5.2 2,073,600 36,864 240,000
1,920×1,080@172.0 (16)
2,560×1,920@108.0 (9)
3,840×2,160@66.8 (5)
4,096×2,048@63.3 (5)
4,096×2,160@60.0 (5)
4,096×2,304@56.3 (5)
6 4,177,920 139,264 240,000
3,840×2,160@128.9 (16)
7,680×4,320@32.2 (5)
8,192×4,320@30.2 (5)
6.1 8,355,840 139,264 480,000
3,840×2,160@257.9 (16)
7,680×4,320@64.5 (5)
8,192×4,320@60.4 (5)
6.2 16,711,680 139,264 800,000
3,840×2,160@300.0 (16)
7,680×4,320@128.9 (5)
8,192×4,320@120.9 (5)

The maximum bit rate for the High Profile is 1.25 times that of the Constrained Baseline, Baseline, Extended and Main Profiles; 3 times for Hi10P, and 4 times for Hi422P/Hi444PP.

The number of luma samples is 16×16=256 times the number of macroblocks (and the number of luma samples per second is 256 times the number of macroblocks per second).

Decoded picture buffering


Previously encoded pictures are used by H.264/AVC encoders to provide predictions of the values of samples in other pictures. This allows the encoder to make efficient decisions on the best way to encode a given picture. At the decoder, such pictures are stored in a virtual decoded picture buffer (DPB). The maximum capacity of the DPB, in units of frames (or pairs of fields), as shown in parentheses in the right column of the table above, can be computed as follows:

DpbCapacity = min(floor(MaxDpbMbs / (PicWidthInMbs * FrameHeightInMbs)), 16)

Where MaxDpbMbs is a constant value provided in the table below as a function of level number, and PicWidthInMbs and FrameHeightInMbs are the picture width and frame height for the coded video data, expressed in units of macroblocks (rounded up to integer values and accounting for cropping and macroblock pairing when applicable). This formula is specified in sections A.3.1.h and A.3.2.f of the 2017 edition of the standard.[27]

Level 1 1b 1.1 1.2 1.3 2 2.1 2.2 3 3.1 3.2 4 4.1 4.2 5 5.1 5.2 6 6.1 6.2
MaxDpbMbs 396 396 900 2,376 2,376 2,376 4,752 8,100 8,100 18,000 20,480 32,768 32,768 34,816 110,400 184,320 184,320 696,320 696,320 696,320

For example, for an HDTV picture that is 1,920 samples wide (PicWidthInMbs = 120) and 1,080 samples high (FrameHeightInMbs = 68), a Level 4 decoder has a maximum DPB storage capacity of floor(32768/(120*68)) = 4 frames (or 8 fields). Thus, the value 4 is shown in parentheses in the table above in the right column of the row for Level 4 with the frame size 1920×1080.
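The computation above can be sketched directly and checked against the parenthesized values in the level table (MaxDpbMbs values transcribed from the table above):

```python
import math

# MaxDpbMbs as a function of level number -- a subset of the table above.
MAX_DPB_MBS = {"3": 8_100, "3.1": 18_000, "4": 32_768, "4.1": 32_768, "5.1": 184_320}

def dpb_capacity(width: int, height: int, level: str) -> int:
    """Maximum decoded picture buffer size in frames for a coded
    picture of the given dimensions at the given level (capped at 16)."""
    pic_width_in_mbs = math.ceil(width / 16)
    frame_height_in_mbs = math.ceil(height / 16)
    return min(MAX_DPB_MBS[level] // (pic_width_in_mbs * frame_height_in_mbs), 16)
```

This reproduces, for instance, the 4-frame capacity for 1920×1080 at Level 4 and the 16-frame cap for the same resolution at Level 5.1.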

The current picture being decoded is not included in the computation of DPB fullness (unless the encoder has indicated for it to be stored for use as a reference for decoding other pictures or for delayed output timing). Thus, a decoder needs to actually have sufficient memory to handle (at least) one frame more than the maximum capacity of the DPB as calculated above.

Implementations

Statistics for a YouTube video using the AVC (H.264) video codec and the Opus audio format

In 2009, the HTML5 working group was split between supporters of Ogg Theora, a free video format thought to be unencumbered by patents, and H.264, which contains patented technology. As late as July 2009, Google and Apple were said to support H.264, while Mozilla and Opera supported Ogg Theora (now Google, Mozilla and Opera all support Theora and WebM with VP8).[50] Microsoft, with the release of Internet Explorer 9, added support for HTML5 video encoded using H.264. At the Gartner Symposium/ITxpo in November 2010, Microsoft CEO Steve Ballmer answered the question "HTML5 or Silverlight?" by saying "If you want to do something that is universal, there is no question the world is going HTML5."[51] In January 2011, Google announced that it was pulling support for H.264 from its Chrome browser in favor of Theora and WebM/VP8, so as to support only open formats.[52]

On March 18, 2012, Mozilla announced support for H.264 in Firefox on mobile devices, due to the prevalence of H.264-encoded video and the increased power efficiency of using the dedicated H.264 decoder hardware common on such devices.[53] On February 20, 2013, Mozilla implemented support in Firefox for decoding H.264 on Windows 7 and above. This feature relies on Windows' built-in decoding libraries.[54] Firefox 35.0, released on January 13, 2015, supports H.264 on OS X 10.6 and higher.[55]

On October 30, 2013, Rowan Trollope from Cisco Systems announced that Cisco would release both binaries and source code of an H.264 video codec called OpenH264 under the Simplified BSD license, and pay all royalties for its use to MPEG LA for any software projects that use Cisco's precompiled binaries, thus making Cisco's OpenH264 binaries free to use. However, any software projects that use Cisco's source code instead of its binaries would be legally responsible for paying all royalties to MPEG LA. Target CPU architectures include x86 and ARM, and target operating systems include Linux, Windows XP and later, Mac OS X, and Android; iOS was notably absent from this list, because it does not allow applications to fetch and install binary modules from the Internet.[56][57][58] Also on October 30, 2013, Brendan Eich from Mozilla wrote that it would use Cisco's binaries in future versions of Firefox to add support for H.264 to Firefox where platform codecs are not available.[59] Cisco published the source code to OpenH264 on December 9, 2013.[60]

Although iOS was not supported by the 2013 Cisco software release, Apple updated its Video Toolbox Framework with iOS 8 (released in September 2014) to provide direct access to hardware-based H.264/AVC video encoding and decoding.[57]

Software encoders

AVC software implementations
Feature QuickTime Nero OpenH264 x264 MainConcept Elecard TSE ProCoder Avivo Elemental IPP
B slices Yes Yes Yes Yes Yes Yes Yes Yes No Yes Yes
Multiple reference frames Yes Yes Yes Yes Yes Yes Yes Yes No Yes Yes
Interlaced coding (PicAFF, MBAFF) No MBAFF MBAFF MBAFF Yes Yes No Yes MBAFF Yes No
CABAC entropy coding Yes Yes Yes Yes Yes Yes Yes Yes No Yes Yes
8×8 vs. 4×4 transform adaptivity No Yes Yes Yes Yes Yes Yes Yes No Yes Yes
Quantization scaling matrices No No Yes Yes Yes No No No No No No
Separate CB and CR QP control No No Yes Yes Yes Yes No No No No No
Extended chroma formats No No No 4:0:0[61]/4:2:0/4:2:2[62]/4:4:4[63] 4:2:2 4:2:2 4:2:2 No No 4:2:0/4:2:2 No
Largest sample depth (bit) 8 8 8 10[64] 10 8 8 8 8 10 12
Predictive lossless coding No No No Yes[65] No No No No No No No

Hardware


Because H.264 encoding and decoding requires significant computing power in specific types of arithmetic operations, software implementations that run on general-purpose CPUs are typically less power efficient. However, the latest[when?] quad-core general-purpose x86 CPUs have sufficient computation power to perform real-time SD and HD encoding. Compression efficiency depends on the algorithmic implementation, not on whether it runs in hardware or software; the difference between hardware- and software-based implementations lies more in power efficiency, flexibility, and cost. To improve power efficiency and reduce hardware form factor, special-purpose hardware may be employed, either for the complete encoding or decoding process or for acceleration assistance within a CPU-controlled environment.

CPU-based solutions are known to be much more flexible, particularly when encoding must be done concurrently in multiple formats, multiple bit rates and resolutions (multi-screen video), possibly with additional features such as container format support and advanced integrated advertising. A CPU-based software solution generally makes it much easier to load-balance multiple concurrent encoding sessions within the same CPU.

The 2nd generation Intel "Sandy Bridge" Core i3/i5/i7 processors introduced at the January 2011 CES (Consumer Electronics Show) offer an on-chip hardware full HD H.264 encoder, known as Intel Quick Sync Video.[66][67]

A hardware H.264 encoder can be an ASIC or an FPGA.

ASIC encoders with H.264 encoder functionality are available from many different semiconductor companies, but the core design used in the ASIC is typically licensed from one of a few companies such as Chips&Media, Allegro DVT, On2 (formerly Hantro, acquired by Google), Imagination Technologies, and NGCodec. Some companies have both FPGA and ASIC product offerings.[68]

Texas Instruments manufactures a line of ARM + DSP cores that perform H.264 BP encoding of 1080p video at 30 fps on the DSP.[69] This permits flexibility with respect to codecs (which are implemented as highly optimized DSP code) while being more efficient than software on a generic CPU.

Licensing


In countries where patents on software algorithms are upheld, vendors and commercial users of products that use H.264/AVC are expected to pay patent licensing royalties for the patented technology that their products use.[70] This applies to the Baseline Profile as well.[71]

A private organization known as MPEG LA, which is not affiliated in any way with the MPEG standardization organization, administers the licenses for patents applying to this standard, as well as other patent pools, such as for MPEG-4 Part 2 Video, HEVC and MPEG-DASH. The patent holders include Fujitsu, Panasonic, Sony, Mitsubishi, Apple, Columbia University, KAIST, Dolby, Google, JVC Kenwood, LG Electronics, Microsoft, NTT Docomo, Philips, Samsung, Sharp, Toshiba and ZTE,[72] although the majority of patents in the pool are held by Panasonic (1,197 patents), Godo Kaisha IP Bridge (1,130 patents) and LG Electronics (990 patents).[73]

On August 26, 2010, MPEG LA announced that royalties would not be charged for H.264 encoded Internet video that is free to end users.[74] All other royalties remain in place, such as royalties for products that decode and encode H.264 video as well as to operators of free television and subscription channels.[75] The license terms are updated in 5-year blocks.[76]

Since the first version of the standard was completed in May 2003 and the most commonly used profile (the High profile) was completed in June 2004,[citation needed] some of the relevant patents have expired,[73] while others remain in force in jurisdictions around the world; one of the US patents in the MPEG LA H.264 pool (granted in 2016, priority from 2001) lasts at least until November 2030.[77]

In 2005, Qualcomm sued Broadcom in US District Court, alleging that Broadcom infringed on two of its patents by making products that were compliant with the H.264 video compression standard.[78] In 2007, the District Court found that the patents were unenforceable because Qualcomm had failed to disclose them to the JVT prior to the release of the H.264 standard in May 2003.[78] In December 2008, the US Court of Appeals for the Federal Circuit affirmed the District Court's order that the patents be unenforceable but remanded to the District Court with instructions to limit the scope of unenforceability to H.264 compliant products.[78]

In October 2023, Nokia sued HP and Amazon for H.264/H.265 patent infringement in the US, the UK, and other jurisdictions.[79]

from Grokipedia
Advanced Video Coding (AVC), formally known as H.264 or ISO/IEC 14496-10 (MPEG-4 Part 10), is a widely used video compression standard designed for the efficient encoding and decoding of video streams in generic audiovisual services. It achieves substantially higher compression efficiency than its predecessors, such as MPEG-2 and H.263, typically requiring about half the bitrate for equivalent video quality, which enables delivery over bandwidth-constrained networks. Developed jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), the standard was first approved in May 2003 by ITU-T and July 2003 by MPEG, with subsequent editions adding features like scalable and multiview extensions up to the 15th edition in August 2024. Key innovations in AVC include variable block-size motion compensation with quarter-sample accuracy and multiple reference frames, an integer-based 4x4 transform (extendable to 8x8 in high profiles), directional intra-prediction modes, and an in-loop deblocking filter to reduce artifacts, all contributing to its robustness against errors and flexibility across diverse applications. The standard defines several profiles to suit different use cases: the Baseline profile for low-complexity applications like videoconferencing; the Main profile adding context-adaptive binary arithmetic coding (CABAC) for better efficiency; and High profiles (including High 10, High 4:2:2, and High 4:4:4) supporting higher bit depths, additional chroma formats, and professional workflows. Extensions such as Scalable Video Coding (SVC), Multiview Video Coding (MVC), and the Stereo High profile further enable layered bitstream scalability, 3D video, and stereoscopic content, respectively. AVC has become foundational for modern video technologies, powering Blu-ray discs, digital television broadcasting, video streaming services, mobile video, and IP-based surveillance systems, despite requiring 2-4 times more computational resources for encoding than earlier standards.
Its network-friendly design supports packetization for protocols like RTP and integration with systems such as MPEG-2 transport streams, ensuring low-latency decoding and exact-match reconstruction in error-prone environments. Supplemental enhancement information (SEI) messages allow embedding of metadata for advanced features like HDR tone mapping and frame packing, with ongoing updates maintaining relevance even as successors like HEVC (H.265) emerge.

Introduction

Overview

Advanced Video Coding (AVC), also known as H.264 or MPEG-4 Part 10, is a block-oriented, motion-compensated video compression standard developed jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). It achieves high compression efficiency for digital video storage, transmission, and playback by reducing redundancy in video data while maintaining quality. The standard supports a wide range of resolutions, from low-definition formats like QCIF (176×144 pixels) to ultrahigh-definition up to 8192×4320 pixels at its highest level (Level 6.2). At its core, AVC employs techniques such as an integer-based 4×4 discrete cosine transform (DCT) for frequency-domain representation of residual data, intra-frame and inter-frame prediction to exploit spatial and temporal correlations, and entropy coding methods including context-adaptive variable-length coding (CAVLC) and context-adaptive binary arithmetic coding (CABAC) for efficient bitstream representation. These elements enable the codec to handle diverse applications, from videoconferencing to broadcast and streaming services. Released in May 2003, AVC quickly became the most widely deployed video coding standard, powering Blu-ray discs, digital television, and online streaming platforms due to its superior performance. Compared to predecessors such as MPEG-2, AVC provides up to 50% better compression efficiency at similar quality levels, allowing for higher resolution video at lower bit rates.

Naming Conventions

Advanced Video Coding (AVC) is known by several designations stemming from its joint development by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), resulting in primary names such as H.264 for the ITU-T recommendation and MPEG-4 Part 10 for the ISO/IEC standard. The H.264 name follows the ITU-T's conventional numbering for video coding recommendations in the H.26x series, where it was officially titled "Advanced video coding for generic audiovisual services" upon its initial publication in May 2003. Similarly, MPEG-4 Part 10, formalized as ISO/IEC 14496-10, integrates AVC into the broader MPEG-4 family of standards for coding audio-visual objects, emphasizing its role in multimedia applications beyond basic video compression. The multiplicity of names arises from this collaborative effort, with "Advanced Video Coding" (AVC) serving as a neutral shorthand that highlights improvements over prior codecs like H.263, such as enhanced compression efficiency for low-bitrate applications. During development, the project was initially termed H.26L by VCEG starting in 1998, evolving through the Joint Video Team (JVT) formed in 2001, which produced a unified specification adopted by both organizations. The "MPEG-4 AVC" variant underscores its alignment with the MPEG-4 ecosystem, while the full "MPEG-4 Part 10" avoids conflation with other parts, such as Part 2 (Visual), which employs simpler coding methods. "AVC" has become the predominant common usage in technical literature and industry, unifying references to the standard across contexts despite its multiple aliases, including the developmental JVT label. This evolution reflects the standard's rapid adoption following its 2003 release. Common misconceptions include confusing AVC with its successor, High Efficiency Video Coding (HEVC or H.265), which builds upon but is distinct from H.264, or with the earlier H.263 baseline for lower-complexity video telephony.

History

Development Timeline

The development of Advanced Video Coding (AVC), also known as H.264 or MPEG-4 Part 10, began as a joint effort between the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). In 1998, VCEG initiated the H.26L project as a long-term effort to create a successor to earlier video coding standards like H.263, with the first test model (TML-1) released in August 1999. By 2001, following MPEG's open call for technology in July, the two organizations formalized their collaboration by forming the Joint Video Team (JVT) in December, aiming to develop a unified standard for advanced video compression. This partnership was driven by the need for a versatile codec capable of supporting emerging applications in streaming and broadcasting. The collaborative process involved rigorous evaluation through core experiments conducted in 2001, where numerous proposals from global contributors were tested to identify optimal technologies. These experiments led to consensus on key elements, including variable block sizes for motion compensation, multiple prediction modes for intra and inter coding, and an integer-based transform for efficient residual representation. Building on this foundation, the JVT produced the first draft in July 2002, followed by a final draft ballot in December 2002 that achieved technical freeze. The standard reached final approval by ITU-T in May 2003 as Recommendation H.264 and by ISO/IEC in July 2003 as ISO/IEC 14496-10, marking the completion of the initial version. Early adoption of AVC was propelled by its superior compression efficiency, offering up to 50% bit rate reduction compared to predecessors like MPEG-2 and H.263 while maintaining equivalent video quality, making it ideal for bandwidth-constrained environments. Targeted applications included broadband internet streaming, DVD storage, and high-definition television (HDTV) broadcast, where its enhanced robustness and flexibility addressed limitations in prior standards. Following the 2003 release, the first corrigendum was issued in May 2004 to address minor corrections and clarifications.
By 2005, amendments had introduced features for improved error resilience in challenging transmission scenarios and high-fidelity profiles via the Fidelity Range Extensions (FRExt), expanding applicability to professional workflows.

Key Extensions and Profiles

The Advanced Video Coding (AVC) standard, also known as H.264, has been extended through several amendments to address diverse applications, including professional workflows, scalable streaming, and immersive 3D content, while maintaining compatibility with the base specification via the network abstraction layer (NAL) unit syntax. These extensions build upon the core block-based hybrid coding framework, introducing enhanced tools for higher fidelity, adaptability, and multi-dimensional representation without altering the fundamental decoding process for legacy conformant bitstreams. Fidelity Range Extensions (FRExt), approved in July 2004 as Amendment 1 to ITU-T H.264 and ISO/IEC 14496-10, expanded AVC capabilities for high-end production environments by supporting bit depths of 10 and 12 bits per sample, additional color spaces such as RGB and YCgCo, and lossless coding modes. These features enable efficient handling of professional-grade video, such as in content production and archiving, where higher precision reduces banding artifacts and supports a broader color gamut without introducing compression losses in selected modes. Scalable Video Coding (SVC), standardized in July 2007 as Amendment 3, introduces hierarchical prediction structures, including medium-grained scalability through layered NAL units, to facilitate bit-rate adaptation, spatial/temporal resolution scaling, and quality enhancement in real-time streaming and mobile applications. SVC bitstreams allow extraction of subsets for lower-bandwidth scenarios while preserving high-quality decoding for full streams, achieving up to 50% bitrate savings over simulcast in scalable scenarios. Multiview Video Coding (MVC), integrated in the March 2009 edition of H.264/AVC, extends the standard to encode multiple synchronized camera views with inter-view prediction, enabling efficient compression for 3D stereoscopic and free-viewpoint video by exploiting redundancy across viewpoints.
This amendment defines the Multiview High Profile, which reduces bitrate by approximately 20-30% compared to independent encoding of views, supporting up to 128 views while remaining compatible with single-view decoders through prefixed base view NAL units. Further 3D enhancements, developed from 2010 to 2014, include depth-plus-view coding in MVC extensions (MVC+D) and asymmetric frame packing, which integrate depth maps with texture views for advanced 3D applications, such as Blu-ray Disc stereoscopic playback. These tools, specified in later amendments (2012), enable view synthesis and improved compression for depth-based 3D content, with depth data coded at lower resolutions to optimize bitrate while supporting backward-compatible stereoscopic profiles. Professional profiles within FRExt, such as High 10 (10-bit intra/inter prediction for reduced quantization noise), High 4:2:2 (supporting 4:2:2 chroma sampling for broadcast SDI workflows), and High 4:4:4 (full chroma resolution with RGB support for professional post-production), cater to studio and transmission needs by handling progressive formats and lossless intra-coding. The High 4:4:4 Profile, initially defined in 2004, was later refined to emphasize additional color spaces while ensuring NAL-based compatibility. All extensions leverage NAL unit header extensions and prefix mechanisms to ensure seamless integration, allowing base AVC decoders to ignore enhanced layers and process only the compatible base layer, thus preserving ecosystem-wide adoption.

Versions and Amendments

The Advanced Video Coding (AVC) standard, jointly developed as ITU-T Recommendation H.264 and ISO/IEC International Standard 14496-10, has evolved through multiple editions and amendments since its initial publication. The first edition was approved in May 2003 by ITU-T and July 2003 by ISO/IEC, establishing the baseline specification for block-oriented, motion-compensated video compression. Subsequent editions integrated key extensions, with the standard reaching its 15th edition in August 2024 for ITU-T H.264, corresponding to version 28 of ISO/IEC 14496-10. The eleventh edition of ISO/IEC 14496-10 was published in July 2025, technically revising the prior edition by integrating the 2024 updates, including additional SEI messages for neural-network post-filtering and color type identifiers, along with minor corrections. Early amendments focused on enhancing fidelity and scalability. The second edition, approved in November 2004, incorporated the Fidelity Range Extensions (FRExt), adding the High, High 10, High 4:2:2, and High 4:4:4 profiles to support higher bit depths and chroma formats for professional applications. The third edition, approved in November 2007, integrated Amendment 3 to introduce Scalable Video Coding (SVC) in three profiles (Scalable Baseline, Scalable High, and Scalable High Intra), enabling temporal, spatial, and quality scalability. The fourth edition, approved in May 2009, added Multiview Video Coding (MVC) along with the Constrained Baseline Profile, improving compression for stereoscopic and multiview content. In 2012, an amendment to the seventh edition introduced MVC extensions for 3D-AVC, including depth handling and 3D-related SEI messages for enhanced stereoscopic and multiview applications. Post-2020 updates have emphasized metadata for emerging applications. The 14th edition, approved in August 2021, added SEI messages for annotated regions to support interactive and region-specific processing.
The 15th edition, approved in August 2024, introduced SEI messages specifying neural-network post-filter characteristics, activation, and phase indication, in alignment with H.274 for AI-enhanced decoding, alongside additional color type identifiers and minor corrections such as the removal of Annex F. These enhancements enable integration with neural-network-based post-processing for improved perceptual quality. Over 20 corrigenda have been issued since 2003 to address errata in syntax, semantics, and decoder conformance behavior, with notable examples including Corrigendum 1 to the first edition (May 2004) for minor corrections and Corrigendum 1 to the second edition (2005) for clarifications integrated into subsequent publications. Maintenance of the standard has been conducted by the Joint Video Team (JVT) and later the Joint Collaborative Team on Video Coding (JCT-VC), which achieved core stability by 2010 while continuing to approve targeted amendments for ongoing relevance in diverse audiovisual services.

Design

Core Features

Advanced Video Coding (AVC), standardized as H.264 and ISO/IEC MPEG-4 Part 10, employs a block-based hybrid coding framework that combines spatial and temporal prediction with transform coding to achieve high compression efficiency. The fundamental processing unit is the macroblock, consisting of a 16×16 block of luma samples and two 8×8 blocks of chroma samples (for 4:2:0 color format), which allows for flexible partitioning to adapt to local video characteristics. These macroblocks can be subdivided into partitions ranging from 16×16 down to 4×4 blocks, enabling finer-grained prediction that reduces residual errors compared to fixed block sizes in prior standards. Prediction in AVC exploits both spatial and temporal redundancies to generate a reference signal for each macroblock. Intra-prediction operates within a frame using directional modes: nine modes (eight directional plus DC) for 4×4 luma blocks, four modes (vertical, horizontal, DC, and plane) for 16×16 luma blocks, and four modes for 8×8 chroma blocks, allowing extrapolation from neighboring samples to minimize spatial residuals. Inter-prediction, used in P and B slices, performs motion-compensated temporal prediction with variable block sizes (up to seven partition types per macroblock) and supports multiple reference frames (up to 16 in certain configurations), employing quarter-sample accuracy for luma and eighth-sample accuracy for chroma via interpolation filters, which improves precision over the integer- and half-sample motion of earlier codecs. Motion vectors are differentially coded using a predictor derived from the median of neighboring vectors, reducing overhead from spatial correlations in motion fields. After prediction, the residual signal undergoes transform and quantization to compact energy into fewer coefficients.
AVC applies a separable integer transform approximating the discrete cosine transform (DCT): primarily 4×4 blocks for luma and chroma residuals, with a secondary Hadamard transform applied to the luma DC coefficients in 16×16 intra mode, and an optional 8×8 transform available in the High profile extensions for better frequency selectivity. Quantization employs a scalar approach with 52 quantization parameter (QP) values for 8-bit video, where the step size doubles approximately every six QP increments, balancing bitrate and distortion while allowing rate control through parameter adjustments. Entropy coding further compresses the quantized coefficients, motion data, and syntax elements using two methods: context-adaptive variable-length coding (CAVLC), which selects from multiple Exp-Golomb or Huffman-like code tables based on local statistics for coefficients and runs, or context-adaptive binary arithmetic coding (CABAC), which models probabilities adaptively for binary symbols and achieves 5–15% bitrate savings over CAVLC by exploiting inter-symbol dependencies. CABAC binarizes non-binary syntax elements and uses adaptive contexts for higher efficiency in complex scenes. To mitigate coding artifacts, AVC incorporates an in-loop deblocking filter applied to block edges after reconstruction, adaptively adjusting filter strength based on coding modes, quantization parameters, and boundary conditions to reduce blocking discontinuities while preserving edges, which improves both subjective quality and prediction efficiency, typically yielding 5–10% bitrate savings. The filter can be disabled per slice if it risks blurring details. The bitstream is structured via the Network Abstraction Layer (NAL), which encapsulates video coding layer (VCL) data—such as slices containing macroblocks—into self-contained units with headers indicating type and importance.
NAL units include sequence parameter sets (SPS) and picture parameter sets (PPS) for global and frame-level configuration, slice units for segmented decoding, and supplemental enhancement information (SEI) messages for non-essential metadata like buffering hints, enabling robust transmission over networks by allowing independent packetization and error resilience.
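As a concrete illustration of this structure, the following sketch parses the one-byte NAL unit header into the three fields defined by the standard (forbidden_zero_bit, nal_ref_idc, nal_unit_type); the NAL_TYPES mapping shown is a partial, illustrative subset of the defined type values.

```python
# Sketch: parse the one-byte H.264 NAL unit header.
# NAL_TYPES is a partial, illustrative mapping of nal_unit_type values.
NAL_TYPES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI",
             7: "SPS", 8: "PPS", 9: "access unit delimiter"}

def parse_nal_header(byte: int) -> dict:
    """Split the NAL header byte into its three bit fields."""
    return {
        "forbidden_zero_bit": (byte >> 7) & 0x1,  # must be 0 in a valid stream
        "nal_ref_idc": (byte >> 5) & 0x3,         # 0 = disposable, >0 = used as reference
        "nal_unit_type": byte & 0x1F,             # identifies SPS, PPS, slice, SEI, ...
    }

hdr = parse_nal_header(0x67)  # 0x67 = 0b0_11_00111: an SPS with nal_ref_idc 3
print(hdr["nal_unit_type"], NAL_TYPES[hdr["nal_unit_type"]])  # 7 SPS
```

The high-importance nal_ref_idc value on parameter sets is what lets transports prioritize them over disposable slice data.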

Profiles

In Advanced Video Coding (AVC), profiles specify constrained subsets of the coding tools, parameters, and syntax elements to meet the needs of particular applications, balancing compression efficiency, implementation complexity, and robustness. Each profile is identified by a unique profile_idc value signaled in the sequence parameter set (SPS) of the bitstream, which indicates the feature set and ensures decoder conformance. The SPS syntax element profile_idc, an 8-bit unsigned integer, along with associated constraint flags (e.g., constraint_set0_flag to constraint_set6_flag), defines the active profile and any additional restrictions. The Baseline Profile (profile_idc = 66) targets low-complexity, low-latency applications in error-prone environments, such as video conferencing and mobile streaming. It supports intra (I) and predicted (P) slices, 4x4 integer transforms, Context-Adaptive Variable-Length Coding (CAVLC) for entropy coding, and 8-bit 4:2:0 chroma format, plus error-resilience tools such as flexible macroblock ordering (FMO), arbitrary slice ordering (ASO), and redundant pictures, but excludes bi-predictive (B) slices, Context-Adaptive Binary Arithmetic Coding (CABAC), and interlaced coding to minimize decoder complexity. The Main Profile (profile_idc = 77) targets broader broadcast and streaming use cases, adding support for B slices, CABAC entropy coding, interlaced coding, weighted prediction, and frame/field adaptive coding while retaining CAVLC and excluding FMO, ASO, and redundant pictures. This profile enables higher compression efficiency for entertainment content, such as digital television broadcast and DVD storage, at bitrates typically ranging from 1 to 8 Mbps. The Extended Profile (profile_idc = 88) builds on the Baseline Profile with enhancements for resilience in streaming over unreliable networks, incorporating B slices, weighted prediction, SP/SI slices for stream switching and recovery, slice data partitioning, FMO, ASO, and redundant pictures, but omitting CABAC and interlaced coding to maintain moderate complexity.
It is suited for applications like Internet video delivery at bitrates of 50–1500 kbps. The High Profile (profile_idc = 100) is optimized for high-quality applications like HDTV broadcasting, introducing 8x8 integer transforms, 8x8 intra prediction modes, custom quantization matrices, and auxiliary components on top of Main Profile features, all with 8-bit 4:2:0 chroma. Variants extend fidelity further: the High 10 Profile (profile_idc = 110) supports up to 10-bit depth; the High 4:2:2 Profile (profile_idc = 122) adds 4:2:2 chroma and up to 10-bit depth for professional production; and the High 4:4:4 Predictive Profile (profile_idc = 244) enables 4:4:4 chroma, up to 14-bit depth, separate color plane coding, and a lossless mode for high-end content creation and archiving. Intra-only variants (signaled via constraint_set3_flag = 1) restrict coding to I slices for simplified editing workflows. The following table compares key feature support across profiles:
Feature | Baseline | Main | Extended | High | High 10 | High 4:2:2 | High 4:4:4 Predictive
I/P Slices | Yes | Yes | Yes | Yes | Yes | Yes | Yes
B Slices | No | Yes | Yes | Yes | Yes | Yes | Yes
CABAC | No | Yes | No | Yes | Yes | Yes | Yes
CAVLC | Yes | Yes | Yes | Yes | Yes | Yes | Yes
8x8 Transform/Intra | No | No | No | Yes | Yes | Yes | Yes
Weighted Prediction | No | Yes | Yes | Yes | Yes | Yes | Yes
Interlaced Coding | No | Yes | No | Yes | Yes | Yes | Yes
FMO/ASO/Redundant Pics | Yes | No | Yes | No | No | No | No
Data Partitioning/SI-SP | No | No | Yes | No | No | No | No
Chroma Format | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:2 | 4:4:4
Bit Depth (max) | 8 | 8 | 8 | 8 | 10 | 10 | 14
Lossless Mode | No | No | No | No | No | No | Yes
Extensions like Scalable Video Coding (SVC) build upon these profiles by adding scalability layers, but are defined in separate amendments.
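The profile_idc signaling described above can be sketched in a few lines; the helper below is an illustrative decoder-side mapping (the function name and the Intra-variant handling via constraint_set3_flag follow the description in this section, not an official API):

```python
# Sketch: map SPS profile_idc values to the profile names discussed above.
PROFILES = {66: "Baseline", 77: "Main", 88: "Extended", 100: "High",
            110: "High 10", 122: "High 4:2:2", 244: "High 4:4:4 Predictive"}

def profile_name(profile_idc: int, constraint_set3_flag: int = 0) -> str:
    """Return a readable profile name for a parsed SPS."""
    name = PROFILES.get(profile_idc, f"Unknown ({profile_idc})")
    # constraint_set3_flag = 1 signals the Intra-only variant of the fidelity profiles
    if constraint_set3_flag and profile_idc in (110, 122, 244):
        name += " Intra"
    return name

print(profile_name(100))     # High
print(profile_name(110, 1))  # High 10 Intra
```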

Levels

In Advanced Video Coding (AVC), also known as H.264, levels define a set of constraints on operational parameters to ensure decoder interoperability and limit computational, memory, and bitrate requirements across different applications. These levels impose limits on factors such as the maximum macroblock processing rate (MaxMBs), maximum frame size in macroblocks (MaxFS), maximum video bitrate (MaxBR), maximum coded picture buffer size (MaxCPB), maximum decoded picture buffer size in macroblocks (MaxDpbMbs), and maximum decoding frame buffering (MaxDecFrameBuffering). There are 16 levels, ranging from Level 1 for low-end mobile devices to Level 6.2 for ultra-high-definition applications up to 8K resolution. Level 1b provides an additional low-complexity option with higher bitrate allowance than Level 1. The level is signaled in the bitstream via the level_idc syntax element in the Sequence Parameter Set (SPS), where values from 10 (Level 1) to 62 (Level 6.2) indicate the conforming level, and 9 denotes Level 1b. Profile-level combinations, such as Main@Level 4 or High@Level 4.1, specify both the toolset (profile) and constraints (level) for a stream, enabling devices to declare supported capabilities. For example, Main@Level 4 supports high-definition broadcast applications like 1080p video at 30 frames per second (fps). Key parameters vary by level and profile; for instance, bitrate limits differ between the Baseline/Main profiles and the High profiles, with High profiles allowing higher MaxBR for improved efficiency in complex content. Level 4.1 accommodates 1080p at 30 fps with up to 50 Mbps in certain profiles, while Level 5.1 supports 4K UHD at 30 fps. These constraints ensure the maximum decoding time per frame aligns with processing capabilities, interacting with buffer management for smooth playback.
Extensions like Scalable Video Coding (SVC) and Multiview Video Coding (MVC) require higher levels due to increased complexity from scalability layers or multiple views, often necessitating Level 4.1 or above for practical deployment. The following table summarizes representative parameters for selected levels in the Baseline/Main profiles (High profiles have elevated MaxBR values, e.g., 14 Mbps for Level 3.1 High versus 10 Mbps for Main). Values are drawn from H.264 Annex A.
Level | MaxMBs (macroblocks/s) | MaxFS (macroblocks) | MaxBR (kbit/s, Baseline/Main) | MaxCPB (kbit) | Example Resolution @ fps
1 | 1,485 | 99 | 64 | 175 | QCIF (176×144) @ 15
2 | 11,880 | 396 | 2,000 | 2,000 | CIF (352×288) @ 30
3.1 | 108,000 | 3,600 | 10,000 | 14,000 | 720p (1280×720) @ 30
4 | 245,760 | 8,192 | 20,000 | 25,000 | 1080p (1920×1080) @ 30
4.2 | 522,240 | 8,704 | 50,000 | 62,500 | 1080p (1920×1080) @ 60
5.1 | 983,040 | 36,864 | 240,000 | 300,000 | 4K (3840×2160) @ 30
6.2 | 4,147,200 | 3,686,400 | 800,000 | 6,000,000 | 8K (8192×4320) @ 60
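Using the MaxMBs and MaxFS columns from the table, a simple conformance check can estimate whether a resolution and frame rate fit a given level; this is an illustrative sketch that considers only those two limits and ignores the bitrate and buffer constraints:

```python
# Sketch: check whether a resolution/frame-rate combination fits a level,
# using the MaxMBs and MaxFS limits (Baseline/Main values) from the table.
LEVEL_LIMITS = {            # level: (MaxMBs per second, MaxFS in macroblocks)
    "1":   (1_485, 99),
    "2":   (11_880, 396),
    "3.1": (108_000, 3_600),
    "4":   (245_760, 8_192),
    "4.2": (522_240, 8_704),
    "5.1": (983_040, 36_864),
}

def fits_level(level: str, width: int, height: int, fps: float) -> bool:
    max_mbs, max_fs = LEVEL_LIMITS[level]
    # Frame size is counted in 16x16 macroblocks, rounded up per dimension.
    frame_mbs = ((width + 15) // 16) * ((height + 15) // 16)
    return frame_mbs <= max_fs and frame_mbs * fps <= max_mbs

print(fits_level("4", 1920, 1080, 30))    # True: 1080p30 fits Level 4
print(fits_level("3.1", 1920, 1080, 30))  # False: frame too large for Level 3.1
```

Note that 1080 is not a multiple of 16, so 1080p frames occupy 120×68 = 8,160 macroblocks, just under the Level 4 limit of 8,192.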

Decoded Picture Buffering

The Decoded Picture Buffer (DPB) in Advanced Video Coding (AVC), also known as H.264, serves to store decoded pictures that are used for motion-compensated prediction and for reordering pictures to match the display order, which may differ from the decoding order due to the use of hierarchical prediction structures. This buffering mechanism enables efficient inter-frame prediction by allowing multiple reference pictures to be retained, supporting up to a maximum of 16 pictures in the DPB depending on the profile and level constraints. Key parameters governing the DPB are specified in the Sequence Parameter Set (SPS), including num_reorder_frames, which indicates the maximum number of frames that can be reordered for output from the DPB, and max_dec_frame_buffering, which signals the maximum number of decoded frames that the DPB is required to hold for both reference and output reordering purposes. If num_reorder_frames is not explicitly present in the SPS, its value is inferred to equal max_dec_frame_buffering to ensure compliance. The bitstream must be constructed such that the required DPB capacity, typically num_reorder_frames + num_ref_frames (where num_ref_frames is the number of reference frames signaled in the SPS), does not exceed max_dec_frame_buffering, ensuring the decoder can handle the buffering needs without overflow. Reference picture marking in the DPB is managed through two primary processes that assign and retire pictures as references: the implicit sliding window mechanism and explicit adaptive Memory Management Control Operations (MMCO). In the sliding window process, when a new short-term picture is added to a DPB that has reached capacity, the oldest short-term picture is automatically marked as unused for reference in a first-in, first-out manner, maintaining a fixed-size buffer without explicit commands.
The MMCO process, signaled via syntax elements in the slice header, provides finer control by allowing operations such as marking a picture as unused for reference, assigning long-term indices, or sliding the window explicitly, which is particularly useful for irregular GOP structures or to optimize reference selection for specific content. The Hypothetical Reference Decoder (HRD) models both the Coded Picture Buffer (CPB) for bitstream arrival and the DPB to verify timing and buffer compliance, preventing underflow or overflow in compliant decoders. The HRD ensures that the DPB adheres to the signaled parameters by simulating decoding and output processes, with initial delays derived from SEI messages or VUI parameters to establish the startup timing for low-latency applications. For B-frame handling, the DPB facilitates reordering to resolve the mismatch between coding order (where future reference frames are coded before the B-frames that depend on them) and display order, buffering non-reference B-frames until their output time while prioritizing reference pictures for prediction. In low-delay configurations, such as Baseline Profile streams, reordering is minimized or eliminated by restricting B-frames, reducing latency by ensuring pictures are output immediately after decoding without buffering for future references. This approach balances compression efficiency with real-time constraints in applications like video conferencing.
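The sliding-window process described above behaves like a FIFO over short-term reference pictures, which can be sketched as follows (an illustrative model, not decoder source code):

```python
# Sketch: the implicit sliding-window reference marking, modeled as a FIFO
# over short-term reference pictures.
from collections import deque

def decode_sequence(frame_nums, num_ref_frames):
    """Return the short-term references remaining after decoding frame_nums."""
    refs = deque()                      # oldest short-term picture at the left
    for n in frame_nums:
        if len(refs) == num_ref_frames:
            refs.popleft()              # oldest picture marked unused for reference
        refs.append(n)                  # newly decoded picture becomes a reference
    return list(refs)

# With num_ref_frames = 4, only the last four decoded pictures stay referable.
print(decode_sequence(range(8), 4))  # [4, 5, 6, 7]
```

MMCO commands replace this automatic retirement with explicit marking, which the model above deliberately omits.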

Applications

Primary Uses

Advanced Video Coding (AVC), also known as H.264, serves as a foundational standard for high-definition television (HDTV) broadcasting within digital TV frameworks such as the Advanced Television Systems Committee (ATSC) and Digital Video Broadcasting (DVB) standards. In ATSC, AVC High Profile at Level 4.1 enables efficient compression for 720p and 1080i/1080p resolutions, supporting up to 1920x1080 at 60 fps while maintaining broadcast quality at lower bitrates compared to prior standards like MPEG-2. Similarly, DVB specifications incorporate AVC for HDTV transmission over satellite, cable, and terrestrial networks, allowing broadcasters to deliver multiple HD channels within constrained bandwidth. In video streaming, AVC dominates adaptive bitrate workflows on major platforms, where Baseline and Main Profiles facilitate seamless quality adjustments based on network conditions, ensuring broad compatibility across devices. This approach supports efficient delivery of on-demand and live content, with AVC's intra-frame prediction and motion compensation enabling up to 50% bitrate reduction over older codecs without perceptible quality loss in standard dynamic range (SDR) streams. Major services use AVC High Profile alongside HEVC and AV1 for optimizing adaptive streams from mobile to 4K resolutions. For storage media, AVC is a mandatory video codec for Blu-ray Disc, supporting High Profile for high-bitrate HD content up to 40 Mbps, which delivers high visual quality on optical discs. This compression efficiency allows full-length HD movies to fit on dual-layer discs, reducing storage needs for archival and distribution. In mobile and video conferencing applications, AVC's Baseline Profile provides low-latency encoding essential for real-time communication, as seen in implementations on resource-constrained devices. Apple's FaceTime similarly mandates H.264 support for video calls, leveraging its Baseline Profile to deliver smooth streams over cellular networks with minimal buffering.
AVC is also widely used in IP-based video surveillance systems, where its error resilience and efficient compression support real-time video transmission over networks with varying bandwidth and error conditions. A notable recent advancement is the 2024 update to H.264, which introduces Supplemental Enhancement Information (SEI) messages for neural-network post-filter characteristics, enabling AI-driven upscaling and enhancement in 4K and 8K streaming pipelines without increasing bitrate. This facilitates post-processing for sharper details and improved perceptual quality in live and VOD scenarios. As of 2025, AVC maintains significant market share, with roughly 80% of video developers adopting it primarily for cross-platform compatibility in streaming and broadcast ecosystems.

Derived Formats

Several derived formats have been developed by industry players to adapt Advanced Video Coding (AVC, also known as H.264) for specific applications, such as professional production, consumer recording, and file distribution, while preserving core compatibility features. These formats typically encapsulate AVC bitstreams in tailored containers and may restrict certain profiles or tools to meet ecosystem needs, ensuring seamless integration with existing hardware and software decoders. One prominent example is AVC-Intra, developed by Panasonic for professional intra-frame video recording in camcorders and production workflows. This format complies with the H.264/AVC standard but limits encoding to intra-frame only, using the High 10 Intra profile for 50 Mb/s bitrates and High 4:2:2 Intra for 100 Mb/s, both at 10-bit 4:2:2 color sampling to support high-quality editing without inter-frame dependencies. Encapsulated in MXF files, AVC-Intra is optimized for P2-series camcorders like the AJ-PX270 and AJ-PX5000, enabling efficient storage on memory cards while maintaining broadcast-grade fidelity for broadcast systems. Sony's XAVC represents another key adaptation, extending AVC to support 4K resolutions and higher frame rates for both professional and consumer cameras. Compliant with H.264/AVC High Profile up to Level 5.2, XAVC uses an MXF OP-1a wrapper for bitrates up to 600 Mb/s in intra-frame modes, accommodating 3840x2160 at 60p and incorporating Multiview Video Coding (MVC) extensions for 3D stereoscopic content. This makes it suitable for digital cinema and broadcast production, as seen in cameras like the PMW-F55, where it balances compression efficiency with support for S-Log gamma and wide color gamuts. For consumer high-definition camcorders, AVCHD provides a widely adopted format using AVC's Main and High Profiles within an MPEG-2 Transport Stream (TS) container. Developed jointly by Sony and Panasonic, it compresses video at bitrates from 12 to 28 Mb/s, paired with Dolby Digital (AC-3) or Linear PCM audio, to enable long recording times on DVDs, HDDs, and memory cards.
Version 2.0 of the AVCHD specification extends support to 1080/60p and 3D via MVC, ensuring compatibility with Blu-ray players and TVs while prioritizing ease of playback in home ecosystems. In the realm of unofficial, community-driven adaptations for file sharing and distribution, DivX has incorporated AVC support through its DivX Plus HD variant, which encodes H.264 bitstreams into MKV or MP4 containers with custom profiles optimized for internet streaming and portable devices. This allows for higher-efficiency compression at lower bitrates compared to earlier MPEG-4 ASP versions, maintaining broad decoder compatibility despite non-standard tweaks like enhanced AAC audio integration. Similarly, the open-source x264 AVC encoder is frequently used with MP4 and MKV containers via libraries such as FFmpeg, enabling custom profiles for peer-to-peer sharing without official standardization. XviD, while primarily an MPEG-4 ASP encoder, has been paired in hybrid workflows with AVC streams for backward-compatible file sharing, though its core remains distinct. A unifying aspect of these derived formats is their adherence to AVC's Network Abstraction Layer (NAL) syntax, which ensures interoperability with standard H.264 decoders by structuring bitstreams into self-contained units that can be parsed without proprietary extensions. This design allows derived content, such as AVC-Intra clips or AVCHD files, to be decoded by generic AVC hardware and software, facilitating interchange across ecosystems while enabling specialized features like intra-only encoding or 4K support.

Implementations

Software Encoders and Decoders

x264 is an open-source H.264/AVC encoder developed by the VideoLAN project, renowned for its high compression efficiency and support for all major H.264 profiles, including Baseline, Main, and High, through advanced features like two-pass rate control that optimizes bitrate allocation across video segments. This encoder employs sophisticated algorithms for motion estimation and rate-distortion optimization, enabling superior quality at lower bitrates compared to reference implementations. FFmpeg's libavcodec library provides an integrated H.264 encoder and decoder, widely adopted in applications for its versatility in handling various input formats and supporting real-time encoding scenarios, such as live streaming. The encoder leverages libx264 as a backend option, allowing tunable presets for balancing speed and quality in real-time workflows. For commercial applications, particularly in broadcast environments, MainConcept offers the AVC/H.264 SDK, a professional-grade toolkit that delivers high-speed encoding and decoding with hooks for GPU acceleration via technologies like Intel Quick Sync Video, facilitating efficient transcoding pipelines. On the decoder side, FFmpeg's H.264 decoder ensures compliance with all H.264 levels, from Level 1 to 5.2, supporting bitstreams up to 4K resolutions and enabling seamless integration in playback tools. Additionally, the Joint Video Team (JVT) provides the open-source JM reference software as a verifiable implementation of the H.264 standard, used for conformance testing and research. Performance benchmarks highlight x264's efficiency, achieving high encoding speeds on modern multi-core CPUs such as Intel Core i9 or AMD Ryzen processors using medium presets, underscoring its balance of speed and visual quality.
Ongoing development in these tools includes updates to enhance H.264 features; for example, HandBrake, which integrates x264 and FFmpeg, released version 1.10.2 in September 2025 with improved codec handling, building on prior enhancements to support advanced Supplemental Enhancement Information (SEI) messages for metadata embedding.
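As an illustration of how these tools are commonly driven, the sketch below assembles a typical FFmpeg command line targeting the libx264 backend; the file names and parameter values are illustrative assumptions, and the command is built as an argument list rather than executed:

```python
# Sketch: assemble an FFmpeg/libx264 invocation (not executed here).
def x264_cmd(src, dst, crf=23, preset="medium", profile="high", level="4.1"):
    """Build an argv list for a constant-quality H.264 encode."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",     # encode video with the x264 backend
        "-preset", preset,     # speed/quality trade-off (ultrafast..placebo)
        "-crf", str(crf),      # constant-rate-factor quality target
        "-profile:v", profile, # constrain the H.264 profile
        "-level", level,       # constrain the H.264 level
        dst,
    ]

print(" ".join(x264_cmd("in.mov", "out.mp4")))
```

In practice such a list would be handed to a process runner (e.g. subprocess.run) on a machine with FFmpeg installed.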

Hardware Implementations

Hardware implementations of Advanced Video Coding (AVC), also known as H.264, primarily involve dedicated application-specific integrated circuits (ASICs) and accelerators integrated into processors, GPUs, and system-on-chips (SoCs) to enable efficient encoding and decoding. These hardware solutions offload computationally intensive tasks from general-purpose CPUs, achieving real-time performance for high-definition video and beyond while minimizing power usage in consumer devices. Early examples include dedicated blocks like the Broadcom VideoCore GPU, which powers Raspberry Pi boards and supports hardware-accelerated H.264 decoding up to 1080p at 30 frames per second (fps) via its multimedia processing unit. Similarly, Intel's Quick Sync Video technology, embedded in CPUs with integrated graphics starting from the Sandy Bridge generation, provides dedicated hardware cores for both H.264 encoding and decoding, supporting profiles up to High 10 and levels suitable for 4K in later implementations. GPU-based acceleration has become prominent for AVC processing, leveraging programmable shaders and fixed-function encoders. NVIDIA's NVENC (NVIDIA Encoder) is a hardware video encoding engine integrated into GeForce, Quadro, and Tesla GPUs since the Kepler architecture, enabling real-time H.264 encoding at high resolutions and frame rates up to 60 fps, with support for APIs to integrate encoding into software pipelines. AMD's Video Coding Engine (VCE), found in GPUs from the Southern Islands series onward, offers comparable H.264 encoding capabilities, including Baseline, Main, and High profiles, often utilized through vendor APIs or direct hardware access for applications requiring low-latency encoding. These GPU accelerators typically outperform software-only methods in throughput, handling multiple simultaneous streams for tasks like live broadcasting. In mobile and embedded systems, AVC hardware is deeply integrated into SoCs for seamless support in smartphones, tablets, and smart TVs.
Qualcomm's Snapdragon processors, such as the Snapdragon 8 series, incorporate dedicated video processing units that handle H.264 encoding and decoding up to Level 5.2, corresponding to 4K UHD at 60 fps, enabling efficient playback and capture in devices like high-end Android smartphones. Samsung's Exynos SoCs, used in Galaxy devices, feature similar integrated decoders supporting H.264 up to 4K resolution, with hardware acceleration for both AVC and its extensions to ensure smooth video experiences in mobile and TV applications. Blu-ray players, as per the Blu-ray Disc specification, mandate hardware decoders capable of processing H.264 High Profile at Level 4.1 for HD content, ensuring compatibility with the format's primary video compression standard. These hardware implementations deliver substantial efficiency gains over software decoding, particularly for HD streams on battery-powered devices, including reduced processing time, CPU usage, and power consumption, often by a factor of several times. Recent SoCs, such as the Rockchip RK3588, extend this capability to 8K-resolution H.264 decoding at 30 fps, incorporating support for Supplemental Enhancement Information (SEI) messages to handle advanced metadata like HDR signaling introduced in updates around 2024. Apple's A-series SoCs, powering devices from the iPhone XS onward, include dedicated video encode/decode engines that support H.264 High Profile and the Multiview Video Coding (MVC) extension for 3D stereoscopic content, facilitating hardware-accelerated playback of stereoscopic and spatial video formats. While software decoders serve as fallbacks in hybrid systems, hardware paths dominate for power-constrained real-time applications.

Licensing and Adoption

Licensing Framework

The licensing framework for Advanced Video Coding (AVC), also known as H.264, is primarily managed through patent pools that aggregate essential patents from multiple contributors to provide implementers with a single point of access under fair, reasonable, and non-discriminatory (FRAND) terms. The main pool, administered by Via Licensing Alliance (formerly MPEG LA), was established in 2003 and encompasses essential patents from more than 41 companies (as of 2022), including major licensors such as Dolby Laboratories and Panasonic Corporation. This structure simplifies compliance for manufacturers and service providers by offering a consolidated license covering patents necessary for AVC implementation, including extensions like Multiview Video Coding (MVC). The fee structure distinguishes between encoders and decoders, with royalties focused on commercial encoders to encourage widespread adoption. For encoders, the royalty rate is $0.20 per unit after the first 100,000 units annually, with an annual cap of $25 million per legal entity to provide cost predictability for high-volume producers. Decoders are royalty-free for products below volume thresholds (e.g., 100,000 units annually). In 2010, MPEG LA (now Via Licensing) announced a permanent waiver of royalties for Internet video distributed free to end users, covering streaming and download services without subscription or per-title fees, which has supported the codec's dominance in web-based applications. Additional patent pools, such as HEVC Advance, hold patents that may overlap with AVC implementations, particularly where they intersect with successor standards like HEVC (H.265). These pools offer separate licensing options, but implementers are advised to cross-reference with Via Licensing to ensure comprehensive coverage under FRAND commitments.
The framework has seen no major changes since the formation of additional pools; Via Licensing's AVC Patent Portfolio License Agreement remains the core reference document, requiring licensees to report usage and adhere to essentiality declarations (as of 2022, over 2,000 licensees).
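The encoder royalty arithmetic described above reduces to a simple capped linear formula, sketched here with the published figures (an illustrative calculation, not legal guidance):

```python
# Sketch of the encoder royalty terms described above: $0.20 per unit after
# the first 100,000 units per year, capped at $25 million per legal entity.
FREE_UNITS = 100_000
RATE = 0.20          # dollars per billable unit
ANNUAL_CAP = 25_000_000

def annual_royalty(units_shipped: int) -> float:
    """Annual encoder royalty in dollars for a given shipment volume."""
    billable = max(0, units_shipped - FREE_UNITS)
    return min(billable * RATE, ANNUAL_CAP)

print(annual_royalty(50_000))     # 0.0 (below the free threshold)
print(annual_royalty(1_000_000))  # $180,000 for one million units
```

The cap is what makes per-unit costs predictable for very high-volume device makers.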

Current Usage Statistics

According to the 8th Annual Bitmovin Video Developer Report (2024–2025), based on a survey of over 1,000 video professionals, 79% used Advanced Video Coding (AVC, also known as H.264) in their workflows as of early 2025, underscoring its enduring role despite the emergence of newer codecs. In streaming applications, H.264 maintains dominance, with virtually all live and video-on-demand traffic relying on it for broad compatibility as of 2025; industry analyses indicate that over 90% of streaming platforms prioritize H.264-based formats to ensure seamless playback across diverse infrastructures. Adoption trends show a gradual decline in H.264 for high-resolution content, particularly 4K and HDR, where High Efficiency Video Coding (HEVC) and AOMedia Video 1 (AV1) are gaining ground due to superior compression efficiency; however, H.264 persists as the compatibility baseline for mixed-device ecosystems. For 8K video, H.264 usage remains niche, often supplemented by neural enhancement techniques to address bandwidth limitations. H.264 decoder support is ubiquitous across major browsers such as Chrome and Safari, smart TVs, and mobile devices, with penetration rates approaching 99% globally as of 2025, enabling near-universal playback without additional plugins. Looking ahead, H.264 is projected to remain stable through 2030 for legacy and compatibility-driven applications, bolstered by 2024 amendments to Supplemental Enhancement Information (SEI) messages that enhance integration with AI-driven workflows, such as timecode embedding and metadata handling. Market forecasts from ITU reports and analyses by Wowza and Gumlet affirm this trajectory, with the AVC sector expected to grow steadily amid ongoing transitions.
