Network Abstraction Layer
from Wikipedia

The Network Abstraction Layer (NAL) is a part of the H.264/AVC and HEVC video coding standards. The main goal of the NAL is the provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "non-conversational" (storage, broadcast, or streaming) applications. The NAL achieved a significant improvement in application flexibility relative to prior video coding standards.

Introduction

An increasing number of services and growing popularity of high definition TV are creating greater needs for higher coding efficiency. Moreover, other transmission media such as cable modem, xDSL, or UMTS offer much lower data rates than broadcast channels, and enhanced coding efficiency can enable the transmission of more video channels or higher quality video representations within existing digital transmission capacities. Video coding for telecommunication applications has diversified from ISDN and T1/E1 service to embrace PSTN, mobile wireless networks, and LAN/Internet network delivery. Throughout this evolution, continued efforts have been made to maximize coding efficiency while dealing with the diversification of network types and their characteristic formatting and loss/error robustness requirements.

The H.264/AVC and HEVC standards are designed as technical solutions for areas including broadcasting (over cable, satellite, cable modem, DSL, terrestrial, etc.), interactive or serial storage on optical and magnetic devices, conversational services, video-on-demand or multimedia streaming, and multimedia messaging services. Moreover, new applications may be deployed over existing and future networks. This raises the question of how to handle this variety of applications and networks. To address this need for flexibility and customizability, the design covers a NAL that formats the Video Coding Layer (VCL) representation of the video and provides header information in a manner appropriate for conveyance by a variety of transport layers or storage media.

The NAL is designed to provide "network friendliness", enabling simple and effective customization of the use of the VCL for a broad variety of systems. The NAL facilitates the ability to map VCL data to transport layers such as:[1]

  • RTP/IP for any kind of real-time wire-line and wireless Internet services.[1]
  • File formats, e.g., ISO MP4 for storage and MMS.[1]
  • H.32X for wireline and wireless conversational services.[1]
  • MPEG-2 systems for broadcasting services, etc.[1]

The full degree of customization of the video content to fit the needs of each particular application is outside the scope of the video coding standardization effort, but the design of the NAL anticipates a variety of such mappings. Some key concepts of the NAL are NAL units, byte-stream and packet-format uses of NAL units, parameter sets, and access units. A short description of these concepts is given below.

NAL units

The coded video data is organized into NAL units, each of which is effectively a packet that contains an integer number of bytes. The first byte of each H.264/AVC NAL unit is a header byte that contains an indication of the type of data in the NAL unit. For HEVC the header was extended to two bytes. All the remaining bytes contain payload data of the type indicated by the header. The NAL unit structure definition specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems, and a series of NAL units generated by an encoder is referred to as a NAL unit stream.
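The one-byte H.264/AVC header can be unpacked with a few bit operations. The sketch below follows the field layout defined in the H.264/AVC specification (forbidden bit, reference priority, unit type); the function name and returned dict are illustrative, not part of any standard API.

```python
def parse_h264_nal_header(first_byte: int) -> dict:
    """Split the one-byte H.264/AVC NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,  # must be 0 in a valid unit
        "nal_ref_idc": (first_byte >> 5) & 0x3,         # 0 = not used for reference
        "nal_unit_type": first_byte & 0x1F,             # e.g. 5 = IDR slice, 7 = SPS
    }

# 0x67 is a typical first byte of a sequence parameter set NAL unit
print(parse_h264_nal_header(0x67))
```

For HEVC the same idea applies, but over the two-byte header with its wider type field.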

NAL Units in Byte-Stream Format Use

Some systems require delivery of the entire or partial NAL unit stream as an ordered stream of bytes or bits within which the locations of NAL unit boundaries need to be identifiable from patterns within the coded data itself. For use in such systems, the H.264/AVC and HEVC specifications define a byte stream format. In the byte stream format, each NAL unit is prefixed by a specific pattern of three bytes called a start code prefix. The boundaries of the NAL unit can then be identified by searching the coded data for the unique start code prefix pattern. The use of emulation prevention bytes guarantees that start code prefixes are unique identifiers of the start of a new NAL unit. A small amount of additional data (one byte per video picture) is also added to allow decoders that operate in systems that provide streams of bits without alignment to byte boundaries to recover the necessary alignment from the data in the stream. Additional data can also be inserted in the byte stream format that allows expansion of the amount of data to be sent and can aid in achieving more rapid byte alignment recovery, if desired.
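The boundary search described above can be sketched as follows. This is a simplified illustration assuming a well-formed Annex B stream (the function name is my own); it scans for the three-byte prefix 0x000001 and treats a preceding extra zero byte as part of a four-byte start code.

```python
def split_annexb(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units by locating start code
    prefixes. Simplified: trailing zero bytes before a following start code
    are assumed to belong to a four-byte start code, not to the payload."""
    units = []
    i = stream.find(b"\x00\x00\x01")
    while i != -1:
        j = stream.find(b"\x00\x00\x01", i + 3)
        end = len(stream) if j == -1 else j
        unit = stream[i + 3:end]
        # a 4-byte start code 0x00000001 leaves one 0x00 before the next prefix
        if j != -1 and unit.endswith(b"\x00"):
            unit = unit[:-1]
        units.append(unit)
        i = j
    return units
```

Because emulation prevention guarantees 0x000001 never occurs inside a NAL unit, this scan is sufficient to recover the unit boundaries.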

NAL Units in Packet-Transport System Use

In other systems (e.g., IP/RTP systems), the coded data is carried in packets that are framed by the system transport protocol, and identification of the boundaries of NAL units within the packets can be established without use of start code prefix patterns. In such systems, the inclusion of start code prefixes in the data would be a waste of data carrying capacity, so instead the NAL units can be carried in data packets without start code prefixes.
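In such packet-oriented settings, each NAL unit is typically framed by an explicit length instead of a start code (the four-byte big-endian length prefix used by MP4's "avcC" convention is a common choice). A minimal sketch of that repackaging, with an illustrative function name:

```python
import struct

def to_length_prefixed(nal_units: list[bytes]) -> bytes:
    """Frame NAL units with 4-byte big-endian length prefixes instead of
    start codes, as packet-oriented containers commonly do."""
    return b"".join(struct.pack(">I", len(u)) + u for u in nal_units)

packed = to_length_prefixed([b"\x67\xaa", b"\x65\xbb\xcc"])
# each unit is now addressable directly, with no scanning for start codes
```

The length field replaces the start code's delimiting role, so no bytes are spent on prefixes or emulation-safe framing at the transport level.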

VCL and Non-VCL NAL Units

NAL units are classified into VCL and non-VCL NAL units.

  • VCL NAL units contain the data that represents the values of the samples in the video pictures.
  • Non-VCL NAL units contain any associated additional information such as parameter sets (important header data that can apply to a large number of VCL NAL units) and supplemental enhancement information (timing information and other supplemental data that may enhance usability of the decoded video signal but are not necessary for decoding the values of the samples in the video pictures).
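For H.264/AVC this classification reduces to a range check on the type field: types 1 through 5 are the coded-slice (VCL) types. A one-line sketch (the function name is illustrative):

```python
def is_vcl_h264(nal_unit_type: int) -> bool:
    """H.264/AVC NAL unit types 1-5 carry coded slice data (VCL);
    other types (SEI, SPS, PPS, delimiters, filler, ...) are non-VCL."""
    return 1 <= nal_unit_type <= 5

assert is_vcl_h264(5)       # IDR slice -> VCL
assert not is_vcl_h264(7)   # sequence parameter set -> non-VCL
```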

Parameter Sets

A parameter set contains shared configuration data that is carried in non-VCL NAL units. Parameter sets are typically reused when decoding many coded pictures within a video sequence. Each VCL NAL unit references a picture parameter set (PPS), which in turn references a sequence parameter set (SPS). There are two types of parameter sets:

  • Sequence parameter set (SPS), which specifies mostly constant configuration such as resolution, bit depth, or chroma format. (For a concrete implementation, see FFmpeg's SPS struct.)
  • Picture parameter set (PPS), which applies on top of an SPS, and specifies configuration such as QP offsets. (For a concrete implementation, see FFmpeg's PPS struct.)

The sequence and picture parameter-set mechanism decouples the transmission of infrequently changing information from the transmission of coded representations of the values of the samples in the video pictures. Each VCL NAL unit contains an identifier that refers to the content of the relevant picture parameter set and each picture parameter set contains an identifier that refers to the content of the relevant sequence parameter set. In this manner, a small amount of data (the identifier) can be used to refer to a larger amount of information (the parameter set) without repeating that information within each VCL NAL unit. Sequence and picture parameter sets can be sent well ahead of the VCL NAL units that they apply to, and can be repeated to provide robustness against data loss. In some applications, parameter sets may be sent within the channel that carries the VCL NAL units (termed "in-band" transmission). In other applications, it can be advantageous to convey the parameter sets "out-of-band" using a more reliable transport mechanism than the video channel itself.
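The two-level indirection can be illustrated with a hypothetical decoder-side registry (the table names, field names, and values below are illustrative, not standard syntax): a slice carries only a small PPS identifier, the PPS carries an SPS identifier, and the bulky configuration lives in the stored sets.

```python
# Hypothetical registries a decoder might keep after receiving parameter sets,
# possibly delivered out-of-band before any VCL NAL unit arrives.
sps_table = {0: {"width_mbs": 120, "height_mbs": 68, "profile_idc": 100}}
pps_table = {0: {"sps_id": 0, "entropy_coding": "CABAC"}}

def resolve_slice_config(pps_id: int) -> dict:
    """Follow the slice -> PPS -> SPS identifier chain to the full config."""
    pps = pps_table[pps_id]           # one small id carried in each VCL NAL unit
    sps = sps_table[pps["sps_id"]]    # the PPS in turn names its SPS
    return {**sps, **pps}

cfg = resolve_slice_config(0)
```

A few bits of identifier per slice thus stand in for the whole configuration, which is exactly why repeating the sets (for loss robustness) or moving them out-of-band costs the VCL stream nothing.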

Figure: Parameter set use with reliable "out-of-band" parameter set exchange

Access Units

A set of NAL units in a specified form is referred to as an access unit. The decoding of each access unit results in one decoded picture. Each access unit contains a set of VCL NAL units that together compose a primary coded picture. It may also be prefixed with an access unit delimiter to aid in locating the start of the access unit. Some supplemental enhancement information containing data such as picture timing information may also precede the primary coded picture. The primary coded picture consists of a set of VCL NAL units consisting of slices or slice data partitions that represent the samples of the video picture. Following the primary coded picture may be some additional VCL NAL units that contain redundant representations of areas of the same video picture. These are referred to as redundant coded pictures, and are available for use by a decoder in recovering from loss or corruption of the data in the primary coded pictures. Decoders are not required to decode redundant coded pictures if they are present. Finally, if the coded picture is the last picture of a coded video sequence (a sequence of pictures that is independently decodable and uses only one sequence parameter set), an end of sequence NAL unit may be present to indicate the end of the sequence; and if the coded picture is the last coded picture in the entire NAL unit stream, an end of stream NAL unit may be present to indicate that the stream is ending.
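A simplified grouping pass can be sketched as follows, splitting a unit list at access unit delimiters (H.264 type 9). This is a sketch under the assumption that delimiters are present; the standard's full boundary-detection rules also cover streams without them, and the function name is my own.

```python
def group_by_delimiter(nal_units: list[bytes]) -> list[list[bytes]]:
    """Simplified access-unit grouping: start a new access unit at each
    access unit delimiter (H.264 nal_unit_type 9, read from the header byte)."""
    access_units: list[list[bytes]] = []
    for unit in nal_units:
        if (unit[0] & 0x1F) == 9 or not access_units:
            access_units.append([])   # delimiter (or stream start) opens a new AU
        access_units[-1].append(unit)
    return access_units
```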

Figure: Structure of a NAL access unit

Coded Video Sequences

A coded video sequence consists of a series of access units that are sequential in the NAL unit stream and use only one sequence parameter set. Each coded video sequence can be decoded independently of any other coded video sequence, given the necessary parameter set information, which may be conveyed "in-band" or "out-of-band". At the beginning of a coded video sequence is an instantaneous decoding refresh (IDR) access unit. An IDR access unit contains an intra picture, which is a coded picture that can be decoded without decoding any previous pictures in the NAL unit stream, and the presence of an IDR access unit indicates that no subsequent picture in the stream will require reference to pictures prior to the intra picture it contains in order to be decoded. A NAL unit stream may contain one or more coded video sequences.

from Grokipedia
The Network Abstraction Layer (NAL) is a fundamental component of video compression standards developed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), including H.264/AVC and H.265/HEVC, that encapsulates the output of the Video Coding Layer (VCL) into a standardized, network-agnostic format for efficient transmission and storage across diverse systems. Introduced in the H.264/AVC standard in 2003, the NAL serves as an intermediary layer between the VCL, which handles the core compression of video into syntax elements like coded pictures and slices, and lower-level transport protocols, enabling flexible adaptation to various network environments such as RTP/IP, transport streams, or file formats without altering the underlying video encoding. Key features include the division of the bitstream into self-contained NAL units (NALUs), each comprising a one-byte header specifying the unit type (e.g., coded slice, sequence parameter set) and a payload carrying either VCL data for video content or non-VCL data for metadata like supplemental enhancement information (SEI). This structure supports error resilience through mechanisms like single NALU packetization or aggregation, and it facilitates scalability by allowing independent decoding of certain units. In H.265/HEVC, released in 2013, the NAL builds on this foundation with enhancements for higher efficiency, including a longer header with additional temporal and layer identifiers to support multi-view and scalable extensions, while maintaining compatibility with H.264-style transport formats. The NAL's design emphasizes robustness and flexibility; for instance, it uses a start code prefix in byte-stream formats or length fields in packet-oriented formats to delineate units, preventing localized errors from affecting the entire stream. Widely adopted in applications ranging from streaming services and broadcast television to mobile video and surveillance systems, the NAL has influenced subsequent standards like H.266/VVC, where it continues to abstract high-level syntax for even greater compression ratios.
Its protocol-independent nature has also enabled integrations with real-time transport protocols, as detailed in IETF RFCs, ensuring seamless delivery over IP networks. Overall, the NAL remains a cornerstone of modern video technologies, balancing coding efficiency with practical deployment needs.

Introduction and Background

Definition and Purpose

The Network Abstraction Layer (NAL) is a key component in video coding standards such as H.264/AVC, serving as an intermediary layer that formats the output of the Video Coding Layer (VCL) into a structure suitable for transmission over diverse networks or storage in various media. The VCL focuses on efficiently representing the video content through processes like motion-compensated prediction and transform coding, while the NAL operates above it to encapsulate this data into discrete, network-friendly packets known as NAL units, which act as the basic building blocks for data organization. Positioned below transport protocols such as RTP/IP or MPEG-2 systems, the NAL ensures that the underlying video data remains independent of specific delivery mechanisms, allowing seamless integration across heterogeneous environments like wireline Internet, wireless networks, or broadcast systems. The primary purpose of the NAL is to enable "network friendliness" by providing a standardized interface that supports customization of VCL data without modifying the core video encoding process. This includes facilitating error resilience through flexible packetization strategies, where coded pictures can be divided into smaller units to minimize the impact of transmission losses, and signaling essential parameters for decoding directly within the bitstream. Additionally, the NAL promotes adaptability to varying network conditions by allowing the video stream to be tailored for different bandwidths or error rates, all while preserving the integrity of the VCL's compressed representation. Key benefits of the NAL include simplified integration with transport protocols, as it abstracts away the complexities of video-specific formatting from upper-layer systems, thereby reducing development overhead for applications like real-time streaming or file storage. It also supports random access to video content by defining structures that enable quick synchronization and decoding from arbitrary points in the stream, which is crucial for interactive applications.
Furthermore, the NAL facilitates bitstream subsetting, allowing subsets of the encoded data to be extracted and transmitted independently for adaptation purposes, such as in low-bandwidth scenarios.

Historical Development and Standards

The Network Abstraction Layer (NAL) was first introduced as a key component of the H.264/MPEG-4 Advanced Video Coding (AVC) standard, finalized in May 2003 by the Joint Video Team (JVT), a collaborative effort between the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). This layer was designed to provide a "network-friendly" representation of video data, separating the Video Coding Layer (VCL) from non-VCL elements to facilitate flexible transport over diverse networks, including packet-based systems. The JVT's work built on prior video coding efforts, aiming to enhance error resilience and interoperability for applications like streaming. The evolution of the NAL continued with the High Efficiency Video Coding (HEVC, H.265) standard, published in April 2013 by the Joint Collaborative Team on Video Coding (JCT-VC), which succeeded the JVT and continued the VCEG and MPEG partnership. HEVC extended NAL functionality with a two-byte header to support advanced scalability features, including explicit signaling of layer identifiers and temporal identifiers, enabling more efficient handling of multilayer bitstreams compared to the one-byte header in H.264/AVC. This change allowed for better temporal sublayer access and improved extensibility for future enhancements, addressing the growing demands of higher-resolution video. Further advancements appeared in the Versatile Video Coding (VVC, H.266) standard, approved in August 2020 by the Joint Video Experts Team (JVET), which expanded the scope of previous teams to include ongoing maintenance of H.264/AVC and HEVC alongside new developments. VVC retained the two-byte NAL header structure from HEVC but refined it with fields such as a six-bit layer ID, six-bit NAL unit type, and three-bit temporal ID plus one, supporting enhanced tools like subpictures and affine motion compensation for ultra-high-definition content. These updates improved compression efficiency by approximately 30-50% over HEVC while maintaining continuity in NAL design principles.
Key differences across standards include the NAL header length—one byte in H.264/AVC versus two bytes in both HEVC and VVC—and the introduction of temporal ID signaling in the latter two for hierarchical temporal scalability, which was absent in AVC. As of 2025, the JVET and MPEG continue exploratory work on next-generation standards beyond VVC, with workshops drafting requirements for improved efficiency in AI-driven and immersive video applications, potentially leading to a new H-series recommendation around 2029-2030.

Core Components of NAL

NAL Units

The Network Abstraction Layer (NAL) in H.264/AVC defines NAL units as the fundamental atomic elements of the coded video bitstream, each consisting of a one-byte header followed by a Raw Byte Sequence Payload (RBSP). The header encapsulates essential metadata for parsing and decoding, while the RBSP carries the actual payload data in a format designed for network-friendly transmission. This structure ensures that NAL units can be independently extracted and processed, facilitating error resilience and flexible packetization in various transport environments. The NAL unit header employs a fixed one-byte syntax with three bit fields: the forbidden_zero_bit (1 bit, always set to 0 to indicate a valid unit and prevent emulation of start codes), nal_ref_idc (2 bits, ranging from 0 for non-reference units to 1-3 for reference units, indicating priority for error recovery), and nal_unit_type (5 bits, with values from 1 to 31 specifying the payload category, such as 1-5 for coded slice data or 6 for Supplemental Enhancement Information (SEI)). NAL units are classified as Video Coding Layer (VCL) or non-VCL based on the nal_unit_type value, with VCL units containing core video content and non-VCL units providing metadata. To maintain bitstream integrity, the RBSP incorporates emulation prevention bytes: whenever two consecutive zero bytes (0x00 0x00) would otherwise be followed by a byte with value 0x03 or less, a byte with value 0x03 is inserted before it, preventing payload sequences that could be mistaken for start code prefixes (0x000001) or the other reserved three-byte patterns (0x000000, 0x000002, 0x000003). These emulation prevention bytes are removed by the decoder during RBSP-to-payload conversion. In H.265/HEVC, the NAL unit header is extended to two bytes to support multi-layer and temporal scalability.
It includes: forbidden_zero_bit (1 bit, must be 0), nuh_layer_id (6 bits, identifying the layer for scalable or multiview extensions, 0 for base layer), nuh_temporal_id_plus1 (3 bits, indicating the temporal sub-layer, from 1 to 7), and nal_unit_type (6 bits, ranging from 0 to 63). This design replaces the nal_ref_idc field and enables better identification of decoding dependencies. Common payload types within NAL units include coded slice data, which represents compressed video frames or partitions thereof, and SEI messages, which convey optional non-essential information such as timing, buffering requirements, or user data without affecting decoding conformance. The RBSP for these payloads consists of a string of data bits (SODB) terminated by a single stop bit (1), followed by zero bits for byte alignment, ensuring the total length is a multiple of eight bits. For certain modes like CABAC, an additional cabac_zero_word (two bytes of 0x0000) may be appended to the RBSP to avoid decoder buffer issues, though it is discarded during processing.
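The two-byte HEVC header fields listed above can be unpacked as follows; the bit positions match the field widths just described, while the function name and returned dict are illustrative.

```python
def parse_hevc_nal_header(b0: int, b1: int) -> dict:
    """Unpack the two-byte HEVC NAL unit header from its two header bytes."""
    return {
        "forbidden_zero_bit": (b0 >> 7) & 0x1,
        "nal_unit_type": (b0 >> 1) & 0x3F,               # 6 bits
        "nuh_layer_id": ((b0 & 0x1) << 5) | (b1 >> 3),   # 6 bits spanning both bytes
        "nuh_temporal_id_plus1": b1 & 0x7,               # 3 bits, values 1-7
    }

# 0x40 0x01 is a typical VPS header: type 32, base layer, lowest temporal sub-layer
print(parse_hevc_nal_header(0x40, 0x01))
```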

VCL and Non-VCL NAL Units

In video coding standards such as H.264/AVC and H.265/HEVC, NAL units are classified into two primary categories: Video Coding Layer (VCL) NAL units and non-VCL NAL units. This distinction separates the core video data from supporting metadata, enabling efficient handling in network environments. VCL NAL units encapsulate the essential coded video content, while non-VCL NAL units provide auxiliary information necessary for proper decoding and enhanced functionality. VCL NAL units contain the coded slice data that represents the actual video pictures. In H.264/AVC, these correspond to NAL unit types 1 through 5, which include coded slices of non-IDR pictures, data partitions, and IDR slices essential for random access points. Similarly, in H.265/HEVC, VCL NAL units are identified by NAL unit types 0 through 31, encompassing various slice types such as trailing slices (TRAIL), temporal sub-layer access (TSA), and instantaneous decoding refresh (IDR) slices. For scalable video coding extensions, such as those in H.264/SVC, type 20 denotes coded slices in scalable extensions, allowing layered enhancement of video quality. These units form the backbone of picture reconstruction, directly contributing to the visual output. Non-VCL NAL units, in contrast, handle auxiliary data that supports but does not directly encode the video content. In H.264/AVC, types 6 through 12 cover supplemental enhancement information (SEI), sequence parameter sets (SPS), picture parameter sets (PPS), access unit delimiters, and filler data. H.265/HEVC extends this with types 32 through 63, including video parameter sets (VPS), SPS, PPS, and SEI messages. Parameter sets, a key subset of non-VCL units, convey decoding parameters like resolution and profile without delving into slice specifics. These units enable features such as timing and buffering control.
VCL NAL units rely heavily on non-VCL units for decoding parameters; for instance, slices in VCL units reference SPS and PPS from non-VCL units to interpret coding structures correctly. Conversely, non-VCL units like SEI facilitate advanced features, including error recovery through messages on picture timing or recovery points, which aid in concealing transmission losses without altering core video data. This interdependency ensures robust video delivery across networks. A representative example of a VCL NAL unit is a coded slice (type 1 in H.264), which carries compressed data for a portion of a picture, directly impacting visual quality. In comparison, an SEI message (type 6 in H.264 or 39 in H.265) as a non-VCL unit embeds user data or timing metadata, such as picture timing hints, which enhance playback but are not required for basic reconstruction. Regarding error sensitivity, VCL units are highly critical, as their loss or corruption can prevent picture reconstruction and propagate errors in subsequent frames, whereas non-VCL units are generally more tolerant: SEI can often be discarded with minimal impact, though parameter sets demand careful protection to avoid decoding failures.

Parameter Sets

Parameter sets are essential non-VCL NAL units in video coding standards such as H.264/AVC that convey decoder configuration information without requiring frequent repetition in the bitstream. They enable efficient transmission by separating essential metadata from video coding layer (VCL) data, allowing decoders to initialize and adapt to sequence or picture properties once and reuse them across multiple frames. The primary types of parameter sets in H.264/AVC are the Sequence Parameter Set (SPS), identified by nal_unit_type 7, and the Picture Parameter Set (PPS), identified by nal_unit_type 8. The SPS provides sequence-level information applicable to an entire coded video sequence, including the profile and level (via profile_idc and level_idc), frame timing (derived from num_units_in_tick and time_scale), resolution (via pic_width_in_mbs_minus1 and pic_height_in_mbs_minus1), and other properties. In contrast, the PPS specifies picture-level details for one or more pictures within the sequence, such as the entropy coding mode (via entropy_coding_mode_flag, where 0 indicates CAVLC and 1 indicates CABAC) and slice group configurations (via num_slice_groups_minus1). Parameter sets are activated and referenced indirectly: slice headers carry pic_parameter_set_id to link to the active PPS, and each PPS carries seq_parameter_set_id to link to the active SPS. An SPS becomes active when first referenced by an IDR access unit or slice header and persists until a new SPS is activated, while a PPS activates upon reference by the first slice of a picture and remains valid for subsequent slices in that picture. Out-of-band transmission of parameter sets is supported, allowing them to be delivered separately from the main video stream via external channels, which decouples them from VCL NAL units for greater flexibility. Later standards extend parameter sets for advanced features; for instance, High Efficiency Video Coding (HEVC, H.265) introduces the Video Parameter Set (VPS) with nal_unit_type 32 to support multi-layer coding structures.
The VPS provides high-level information, including layer dependencies (via direct_dependency_flag), maximum layers (via vps_max_layers_minus1), and scalability dimensions for applications like spatial or quality scalability and multiview video. It is referenced by the SPS via sps_video_parameter_set_id and activates at the start of a coded video sequence or group, typically in an IRAP access unit with TemporalId 0. These parameter sets reduce bitstream overhead by avoiding redundant transmission of configuration data in every slice or picture header, potentially saving significant bandwidth in high-resolution or long-duration videos. By isolating parameter sets from VCL data, they also enhance error resilience, as corruption in video content does not necessarily affect decoder initialization, and lost sets can be retransmitted independently. Key syntax elements in the SPS exemplify its role in defining decoding parameters; for example, chroma_format_idc specifies the chroma sampling format (e.g., 1 for 4:2:0), while bit_depth_luma_minus8 indicates the luma bit depth beyond 8 bits (e.g., 0 for 8-bit). Similar precision applies to chroma via bit_depth_chroma_minus8, ensuring compatibility with high-dynamic-range content. In the VPS of HEVC, elements like vps_num_layer_sets_minus1 further delineate layer set configurations for efficient multi-layer extraction and rendering.
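Many of the parameter-set fields named above (pic_width_in_mbs_minus1, num_slice_groups_minus1, and others) are coded as unsigned Exp-Golomb codewords, written ue(v) in the specifications. A sketch of the decoding rule, operating on a '0'/'1' string for clarity rather than on packed bytes (the function name is illustrative):

```python
def decode_ue(bits: str) -> tuple[int, str]:
    """Decode one unsigned Exp-Golomb codeword ue(v): k leading zeros,
    a 1, then k suffix bits; value = 2**k - 1 + suffix.
    Returns (value, remaining_bits)."""
    k = 0
    while bits[k] == "0":
        k += 1
    suffix = bits[k + 1:k + 1 + k]
    value = (1 << k) - 1 + (int(suffix, 2) if suffix else 0)
    return value, bits[k + 1 + k:]

# "1" -> 0, "010" -> 1, "011" -> 2, "00100" -> 3, ...
```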

Data Organization

Access Units

In the Network Abstraction Layer (NAL) of the H.264/AVC video coding standard, an access unit is defined as a set of NAL units that are consecutive in decoding order and collectively represent the data required to produce a single output picture, always including exactly one primary coded picture composed of video coding layer (VCL) NAL units. This primary coded picture may be accompanied by associated non-VCL NAL units, such as supplemental enhancement information (SEI) NAL units for metadata like timing or user data, sequence parameter set (SPS) NAL units, picture parameter set (PPS) NAL units if they apply to this picture, or filler data NAL units for bitrate control. The structure ensures that decoding an access unit yields one decoded picture, facilitating temporal synchronization in video streams. Access units are delimited in the bitstream by the presence of the primary coded picture's VCL NAL units, which indicate the start of a new unit in decoding order. An optional access unit delimiter NAL unit, identified by nal_unit_type = 9, can precede the primary coded picture to explicitly mark the boundary, aiding parsers in low-delay or fragmented transport scenarios without relying solely on VCL unit detection. This delimiter carries primary_pic_type information to specify the slice types (e.g., I, P, B) in the upcoming picture, enhancing parsing efficiency in network environments. For decodability, each access unit must reference active parameter sets (SPS and PPS) that are either included as non-VCL NAL units within the unit or have been previously delivered and remain valid, ensuring the decoder has all necessary decoding parameters like profile, resolution, and slice structure without prior context. This requirement supports robust decoding in error-prone networks, where mechanisms like intra-refresh (periodic insertion of intra-coded macroblocks within non-IDR pictures) allow gradual recovery from errors without full resynchronization, reducing drift in predicted frames.
Random access points are provided by instantaneous decoding refresh (IDR) access units, where the primary coded picture consists solely of I or SI slices with nal_unit_type = 5, enabling a decoder to start playback cleanly from that point without referencing prior pictures, as it resets the reference picture buffer. Such units are essential for stream seeking, error recovery, and broadcast scenarios, marking clean temporal boundaries in the video sequence. In extensions like scalable video coding (SVC, Annex G) and multiview video coding (MVC, Annex H) of H.264/AVC, access units accommodate multiple layers or views to support scalability and stereoscopic/multiview display; for instance, an SVC access unit includes a base layer primary coded picture plus enhancement layers for spatial, temporal, or quality scalability, while an MVC access unit groups coded pictures from dependent views alongside the base view for efficient inter-view prediction. These extensions maintain the core access unit concept but expand it to include non-primary coded pictures, ensuring backward compatibility with base-layer decoders.

Coded Video Sequences

A coded video sequence (CVS) in the context of the Network Abstraction Layer (NAL) is defined as a series of consecutive access units in the NAL unit stream that share the same sequence parameter set (SPS), enabling independent decodability provided the necessary parameter sets are available either in-band or out-of-band. In H.264/AVC, a CVS specifically consists, in decoding order, of an instantaneous decoding refresh (IDR) access unit followed by zero or more non-IDR access units, up to but not including the next IDR access unit or the end of the bitstream. This structure ensures that the sequence begins with an intra-coded picture in the IDR unit, from which subsequent pictures can be decoded without reference to prior sequences. For conformance, a CVS must adhere to the Hypothetical Reference Decoder (HRD) parameters specified in the SPS, which define buffering and decoding constraints to guarantee timely processing by compliant decoders. The H.264/AVC standard focuses on decoder conformance, meaning any bitstream meeting the syntactic and semantic rules, including proper NAL unit ordering and parameter set activation, produces identical output across conforming decoders, though encoding quality is not guaranteed. In HEVC (H.265), conformance extends this by incorporating an intra random access point (IRAP) access unit as the starting point, replacing the IDR for broader random access support, while still relying on HRD parameters from the SPS for buffer management. Parameter set updates can initiate a new CVS; a fresh SPS (or in HEVC, a video parameter set (VPS) alongside SPS) signals changes in sequence-level parameters, such as resolution or profile, effectively ending the prior sequence. In H.264/AVC, the end of a CVS may be explicitly marked by an end of sequence NAL unit (type 13), though this is rarely used in practice, with sequences more commonly delimited by the next IDR or bitstream termination.
HEVC allows VPS and SPS updates within or across CVSs to accommodate dynamic adjustments, with the VPS providing global parameters for the entire sequence. In applications, the CVS structure validates the integrity of video streams for storage, transmission, or editing, as it encapsulates self-contained segments that can be subsetted or concatenated while maintaining decodability. This logical organization supports robust handling in network environments, where individual CVSs can be extracted or rejoined without affecting overall stream conformance. HEVC introduces the VPS (NAL unit type 32) to enhance sequence handling in layered coding scenarios, such as scalable or multiview extensions, where it specifies inter-layer dependencies and overall structure, differing from H.264/AVC's reliance solely on the SPS for single-layer sequences. This addition enables more flexible multi-layer CVSs while preserving backward compatibility for base-layer decoding using only SPS and picture parameter sets (PPS).

Transport and Usage

Byte-Stream Format

The byte-stream format organizes Network Abstraction Layer (NAL) units into a continuous, serialized stream suitable for storage and non-packetized transmission, where each NAL unit is preceded by a start code prefix for clear delimitation. The start code consists of either the three-byte sequence 0x000001 or the four-byte sequence 0x00000001, ensuring that decoders can reliably identify the boundaries of individual NAL units without additional length fields. This approach, defined in Annex B of the H.264/AVC specification, transforms the discrete NAL units into a seamless byte stream resembling traditional video elementary streams. To facilitate accurate parsing, the format incorporates rules for start code emulation prevention within the NAL unit payloads. Specifically, when encoding the raw byte sequence payload (RBSP), any two consecutive zero bytes (0x00 0x00) followed by a byte with a value of 0x03 or less are escaped by inserting a single emulation prevention byte (0x03) after the two zeros; this prevents false start codes from appearing inside the data, allowing parsers to extract units solely by scanning for the unique start code patterns. Extraction begins at the first occurrence of a start code, with any preceding bytes treated as trailing data from the previous unit, and continues until the next start code is detected. This format offers simplicity and low overhead for applications involving file storage or direct streaming over byte-oriented channels, such as raw H.264 video files (often with an .h264 extension) or transport streams, where no complex packetization is required. However, it has limitations for network transport, particularly over IP protocols, as it provides no inherent mechanism for fragmentation or reordering of NAL units across packets. Moreover, in lossy channels, corruption or loss of bytes can propagate errors across multiple NAL units, hindering robust recovery without external error detection.
A similar byte-stream format is employed in High Efficiency Video Coding (HEVC, H.265), also detailed in its Annex B, using the same start code prefixes and emulation prevention techniques to maintain compatibility with H.264-based workflows while supporting enhanced compression efficiency.
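The emulation prevention rule above can be demonstrated with a small round-trip sketch. The function names here are illustrative (RBSP is the raw payload, EBSP the escaped form that appears in the byte stream); the escape condition follows the rule stated above, and the decoder side simply discards any 0x03 that follows two zero bytes.

```python
def rbsp_to_ebsp(rbsp: bytes) -> bytes:
    """Insert emulation prevention bytes into an RBSP.

    Whenever two consecutive zero bytes would be followed by a byte in
    {0x00, 0x01, 0x02, 0x03}, a single 0x03 byte is inserted after the
    zeros, so no false start code (0x000001) can appear in the stream.
    """
    out = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)          # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)

def ebsp_to_rbsp(ebsp: bytes) -> bytes:
    """Remove emulation prevention bytes, recovering the original RBSP."""
    out = bytearray()
    zeros = 0
    for b in ebsp:
        if zeros >= 2 and b == 0x03:  # 0x03 after 0x00 0x00 is an escape
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```

Note that a payload ending in 0x00 0x00 0x03 is itself escaped (becoming 0x00 0x00 0x03 0x03), which is why the decoder can unconditionally discard a 0x03 seen after two zeros.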

Packet-Transport Systems

The Network Abstraction Layer (NAL) facilitates packet-based transport by adapting its units for protocols like the Real-time Transport Protocol (RTP), primarily through the removal of the start codes that delimit NAL units in byte-stream formats, allowing direct encapsulation into RTP payloads for efficient network transmission. This adaptation enables the packetization of one or more NAL units per RTP packet, supporting real-time delivery over unreliable transports such as UDP/IP. The RTP payload format for H.264/AVC, defined in RFC 6184, specifies three packetization modes: single NAL unit mode, where each RTP packet carries exactly one NAL unit; non-interleaved mode, in which NAL units are sent in decoding order, suited to low-latency applications; and interleaved mode, which permits transmission out of decoding order for improved robustness in error-prone networks. Aggregation packets combine multiple NAL units into a single RTP payload to reduce header overhead, including Single-Time Aggregation Packets (STAP-A and STAP-B) for units sharing the same timestamp and Multi-Time Aggregation Packets (MTAP16 and MTAP24) for units with differing timestamps, using Decoding Order Number (DON) fields to maintain reassembly order. Fragmentation Units (FU-A and FU-B) handle oversized NAL units by splitting them across multiple RTP packets, with FU-A used in non-interleaved mode and FU-B incorporating the DON for interleaved operation; the start (S) and end (E) bits in the FU header mark the first and last fragments. Error handling in this format relies on the forbidden bit (F) in the NAL unit header, which media-aware network elements (MANEs) set to 1 to signal detected bit errors or syntax violations, prompting decoders to discard affected units. The DON mechanism further aids resilience by enabling reordering of out-of-order packets and detection of losses, while optional redundancy through forward error correction (FEC) or retransmission schemes such as NACK-based recovery can protect against packet loss on unreliable transports.
This packet-transport approach is widely applied in real-time streaming scenarios, such as video conferencing and IP-based broadcast, where RTP over UDP/IP provides low-latency delivery of H.264-encoded video. The RTP payload format for High Efficiency Video Coding (HEVC/H.265), defined in RFC 7798, builds on the H.264 format with a longer two-byte NAL unit header to accommodate HEVC's extended syntax, while retaining similar aggregation via Aggregation Packets (AP) and fragmentation via Fragmentation Units (FU), including DON-based ordering for error-prone networks. Error recovery can additionally use RTCP feedback messages such as Picture Loss Indication (PLI) and Slice Loss Indication (SLI) to trigger intra-frame refreshes or retransmissions, supporting scalable real-time applications like 4K streaming.
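The FU-A mechanism described above can be sketched as follows. This is a simplified illustration, not a complete RFC 6184 implementation: it builds only the FU indicator and FU header bytes (no RTP header, no interleaved FU-B/DON handling), and assumes fragments arrive complete and in order.

```python
FU_A = 28  # NAL unit type value identifying an FU-A fragment (RFC 6184)

def fragment_fu_a(nal: bytes, max_payload: int):
    """Split one H.264 NAL unit into FU-A fragments.

    Each fragment carries a 1-byte FU indicator (F/NRI bits from the
    original NAL header plus type 28), a 1-byte FU header (S and E bits
    plus the original NAL unit type), and a slice of the payload. The
    original NAL header byte itself is not repeated in the fragments.
    """
    f_nri = nal[0] & 0xE0                 # forbidden bit + NRI
    nal_type = nal[0] & 0x1F
    payload = nal[1:]
    chunk = max_payload - 2               # room left after the two FU bytes
    fragments = []
    for off in range(0, len(payload), chunk):
        part = payload[off:off + chunk]
        s = 0x80 if off == 0 else 0                      # start bit
        e = 0x40 if off + chunk >= len(payload) else 0   # end bit
        fragments.append(bytes([f_nri | FU_A, s | e | nal_type]) + part)
    return fragments

def reassemble_fu_a(fragments):
    """Rebuild the original NAL unit from complete, in-order fragments."""
    f_nri = fragments[0][0] & 0xE0
    nal_type = fragments[0][1] & 0x1F
    body = b"".join(frag[2:] for frag in fragments)
    return bytes([f_nri | nal_type]) + body
```

A receiver uses the S and E bits to detect fragment boundaries, and a missing fragment (detected via RTP sequence numbers) forces the whole NAL unit to be discarded, which is why the F bit and feedback mechanisms discussed above matter in practice.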