Advanced Audio Coding

from Wikipedia
Filename extensions: .3gp, .aac, .adif, .adts, .m4a, .m4b, .m4p, .m4r, .mp4
Internet media types: audio/aac, audio/aacp, audio/3gpp, audio/3gpp2, audio/mp4, audio/mp4a-latm, audio/mpeg4-generic
Developed by: AT&T Labs, Dolby Laboratories, Fraunhofer Society, Sony Corporation
Initial release: December 1997[1]
Latest release: ISO/IEC 14496-3:2019 (December 2019)
Type of format: Lossy audio
Contained by: MPEG-4 Part 14, 3GP and 3G2, ISO base media file format, Audio Data Interchange Format (ADIF)
Standards: ISO/IEC 13818-7, ISO/IEC 14496-3
Open format? Yes
Free format? No[2]

Advanced Audio Coding (AAC) is an audio coding standard for lossy digital audio compression. It was developed by Dolby, AT&T, Fraunhofer and Sony,[3][4][5] originally as part of the MPEG-2 specification but later improved under MPEG-4.[6][7] AAC was designed to be the successor of the MP3 format (MPEG-2 Audio Layer III) and generally achieves higher sound quality than MP3 at the same bit rate.[8] AAC-encoded audio files are typically packaged in an MP4 container, most commonly using the filename extension .m4a.[9][10]

The basic profile of AAC (both MPEG-4 and MPEG-2) is called AAC-LC (Low Complexity). It is widely supported in the industry and has been adopted as the default or standard audio format on products including Apple's iTunes Store, Nintendo's Wii,[11] DSi and 3DS, and Sony's PlayStation 3.[12] It is further supported on various other devices and software such as the iPhone, iPod, PlayStation Portable and Vita, PlayStation 5, Android and older cell phones,[13] digital audio players like the Sony Walkman and SanDisk Clip, media players such as VLC, Winamp and Windows Media Player, and various in-dash car audio systems,[14] and is used by the Spotify,[a][15] Google Nest, Amazon Alexa,[citation needed] Apple Music, and YouTube streaming services.[16] AAC has been further extended into HE-AAC (High Efficiency, or AAC+), which improves efficiency over AAC-LC.[17] Another variant is AAC-LD (Low Delay).[18]

AAC supports the inclusion of 48 full-bandwidth (up to 96 kHz) audio channels in one stream, plus 16 low-frequency effects (LFE, limited to 120 Hz) channels, up to 16 "coupling" or dialog channels, and up to 16 data streams. Stereo quality is satisfactory for modest requirements at 96 kbit/s in joint stereo mode; however, hi-fi transparency demands data rates of at least 128 kbit/s (VBR). Tests of MPEG-4 audio have shown that AAC meets the requirements referred to as "transparent" by the ITU at 128 kbit/s for stereo and 384 kbit/s for 5.1 audio.[19] AAC uses only a modified discrete cosine transform (MDCT) algorithm, giving it higher compression efficiency than MP3, which uses a hybrid coding algorithm that is part MDCT and part FFT.[8]

History

Background

The discrete cosine transform (DCT), a type of transform coding for lossy compression, was proposed by Nasir Ahmed in 1972, and developed by Ahmed with T. Natarajan and K. R. Rao in 1973, publishing their results in 1974.[20][21][22] This led to the development of the modified discrete cosine transform (MDCT), proposed by J. P. Princen, A. W. Johnson and A. B. Bradley in 1987,[23] following earlier work by Princen and Bradley in 1986.[24] The MP3 audio coding standard introduced in 1992 used a hybrid coding algorithm that is part MDCT and part FFT.[25] AAC uses a purely MDCT algorithm, giving it higher compression efficiency than MP3.[8] Development further advanced when Lars Liljeryd introduced a method that radically shrank the amount of information needed to store the digitized form of a song or speech.[26]

AAC was developed with cooperation between AT&T Labs, Dolby, Fraunhofer IIS (who developed MP3) and Sony Corporation.[3] AAC was officially declared an international standard by the Moving Picture Experts Group in April 1997. It is specified both as Part 7 of the MPEG-2 standard, and Subpart 4 in Part 3 of the MPEG-4 standard.[27] Further companies have contributed to development in later years including Bell Labs, LG Electronics, NEC, Nokia, Panasonic, ETRI, JVC Kenwood, Philips, Microsoft, and NTT.[28][29]

Standardization

In 1997, AAC was first introduced as MPEG-2 Part 7, formally known as ISO/IEC 13818-7:1997. This part of MPEG-2 was a new part, since MPEG-2 already included MPEG-2 Part 3, formally known as ISO/IEC 13818-3: MPEG-2 BC (Backwards Compatible).[30][31] Therefore, MPEG-2 Part 7 is also known as MPEG-2 NBC (Non-Backward Compatible), because it is not compatible with the MPEG-1 audio formats (MP1, MP2 and MP3).[30][32][33][34]

MPEG-2 Part 7 defined three profiles: the Low-Complexity profile (AAC-LC / LC-AAC), the Main profile (AAC Main) and the Scalable Sampling Rate profile (AAC-SSR). The AAC-LC profile consists of a base format very much like AT&T's Perceptual Audio Coding (PAC) format,[35][36][37] with the addition of temporal noise shaping (TNS),[38] the Kaiser window (described below), a nonuniform quantizer, and a reworking of the bitstream format to handle up to 16 stereo channels, 16 mono channels, 16 low-frequency effect (LFE) channels and 16 commentary channels in one bitstream. The Main profile adds a set of recursive predictors that are calculated on each tap of the filterbank. The SSR profile uses a 4-band PQMF filterbank, followed by four shorter filterbanks, to allow for scalable sampling rates.

In 1999, MPEG-2 Part 7 was updated and included in the MPEG-4 family of standards and became known as MPEG-4 Part 3, MPEG-4 Audio or ISO/IEC 14496-3:1999. This update included several improvements. One of these improvements was the addition of Audio Object Types which are used to allow interoperability with a diverse range of other audio formats such as TwinVQ, CELP, HVXC, speech synthesis and MPEG-4 Structured Audio. Another notable addition in this version of the AAC standard is Perceptual Noise Substitution (PNS). In that regard, the AAC profiles (AAC-LC, AAC Main and AAC-SSR profiles) are combined with perceptual noise substitution and are defined in the MPEG-4 audio standard as Audio Object Types.[39] MPEG-4 Audio Object Types are combined in four MPEG-4 Audio profiles: Main (which includes most of the MPEG-4 Audio Object Types), Scalable (AAC LC, AAC LTP, CELP, HVXC, TwinVQ, Wavetable Synthesis, TTSI), Speech (CELP, HVXC, TTSI) and Low Rate Synthesis (Wavetable Synthesis, TTSI).[39][40]

The reference software for MPEG-4 Part 3 is specified in MPEG-4 Part 5 and the conformance bit-streams are specified in MPEG-4 Part 4. MPEG-4 Audio remains backward-compatible with MPEG-2 Part 7.[41]

The MPEG-4 Audio Version 2 (ISO/IEC 14496-3:1999/Amd 1:2000) defined new audio object types: the low delay AAC (AAC-LD) object type, bit-sliced arithmetic coding (BSAC) object type, parametric audio coding using harmonic and individual line plus noise and error resilient (ER) versions of object types.[42][43][44] It also defined four new audio profiles: High Quality Audio Profile, Low Delay Audio Profile, Natural Audio Profile and Mobile Audio Internetworking Profile.[45]

The HE-AAC Profile (AAC LC with SBR) and AAC Profile (AAC LC) were first standardized in ISO/IEC 14496-3:2001/Amd 1:2003.[46] The HE-AAC v2 Profile (AAC LC with SBR and Parametric Stereo) was first specified in ISO/IEC 14496-3:2005/Amd 2:2006.[47][48][49] The Parametric Stereo audio object type used in HE-AAC v2 was first defined in ISO/IEC 14496-3:2001/Amd 2:2004.[50][51][52]

The current version of the AAC standard is defined in ISO/IEC 14496-3:2009.[53]

AAC+ v2 is also standardized by ETSI (the European Telecommunications Standards Institute) as TS 102 005.[50]

The MPEG-4 Part 3 standard also contains other ways of compressing sound. These include lossless compression formats, synthetic audio and low bit-rate compression formats generally used for speech.

AAC's improvements over MP3

Advanced Audio Coding is designed to be the successor of MPEG-1 Audio Layer 3, known as the MP3 format, which was specified by ISO/IEC in 11172-3 (MPEG-1 Audio) and 13818-3 (MPEG-2 Audio).

Improvements include:

  • more sample rates (from 8 to 96 kHz) than MP3 (16 to 48 kHz);
  • up to 48 channels (MP3 supports up to two channels in MPEG-1 mode and up to 5.1 channels in MPEG-2 mode);
  • arbitrary bit rates and variable frame length. Standardized constant bit rate with bit reservoir;
  • higher efficiency and simpler filter bank. AAC uses a pure MDCT (modified discrete cosine transform), rather than MP3's hybrid coding (which was part MDCT and part FFT);
  • higher coding efficiency for stationary signals (AAC uses a blocksize of 1024 or 960 samples, allowing more efficient coding than MP3's 576 sample blocks);
  • higher coding accuracy for transient signals (AAC uses a blocksize of 128 or 120 samples, allowing more accurate coding than MP3's 192 sample blocks);
  • option to use a Kaiser-Bessel derived window function to eliminate spectral leakage at the expense of widening the main lobe;
  • much better handling of audio frequencies above 16 kHz;
  • more flexible joint stereo (different methods can be used in different frequency ranges);
  • additional modules (tools) added to increase compression efficiency: TNS, backwards prediction, perceptual noise substitution (PNS), etc. These modules can be combined to constitute different encoding profiles.

Overall, the AAC format allows developers more flexibility to design codecs than MP3 does, and corrects many of the design choices made in the original MPEG-1 audio specification. This increased flexibility often permits a wider range of encoding strategies and, as a result, more efficient compression. This is especially true at very low bit rates, where the superior stereo coding, pure MDCT, and better transform window sizes leave MP3 unable to compete.
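The Kaiser-Bessel derived (KBD) window mentioned in the list above is built by cumulatively summing an ordinary Kaiser window and taking the square root. A minimal NumPy sketch follows; the shape parameter beta used here is the commonly quoted value for AAC long windows, taken as an assumption rather than something this article specifies:

```python
import numpy as np

def kbd_window(n_samples: int, beta: float = 4.0 * np.pi) -> np.ndarray:
    """Kaiser-Bessel derived window of even length n_samples.

    Built by cumulatively summing a Kaiser window of length
    n_samples // 2 + 1, normalizing, taking the square root,
    then mirroring to form the right half.
    """
    half = n_samples // 2
    kaiser = np.kaiser(half + 1, beta)          # assumed beta; AAC long windows are often cited with pi*4
    cumulative = np.cumsum(kaiser)
    left = np.sqrt(cumulative[:half] / cumulative[half])
    return np.concatenate([left, left[::-1]])

# The construction guarantees the Princen-Bradley condition needed for
# perfect reconstruction with a 50%-overlap MDCT: w[n]^2 + w[n+N]^2 == 1.
w = kbd_window(2048)
print(np.allclose(w[:1024] ** 2 + w[1024:] ** 2, 1.0))
```

Because the cumulative sums of the symmetric Kaiser window pair up exactly, the overlap condition holds to machine precision regardless of the beta chosen.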

Adoption

While the MP3 format has near-universal hardware and software support, primarily because MP3 was the format of choice during the crucial first few years of widespread music file-sharing/distribution over the internet, AAC remained a strong contender due to some unwavering industry support.[54] Due to MP3's dominance, adoption of AAC was initially slow. The first commercialization was in 1997 when AT&T Labs (a co-owner of AAC patents) launched a digital music store with songs encoded in MPEG-2 AAC.[55] HomeBoy for Windows was one of the earliest available AAC encoders and decoders.[56]

Dolby Laboratories took charge of AAC licensing in 2000.[55] A new licensing model was launched by Dolby in 2002, while Nokia became a fifth co-licensor of the format.[57] Dolby itself also marketed its own coding format, Dolby AC-3.

Nokia started supporting AAC playback on devices as early as 2001,[58] but it was the exclusive use of AAC by Apple Computer for its iTunes Store that accelerated attention to AAC. Soon the format was also supported by Sony for its PlayStation Portable (although Sony continued to promote its proprietary ATRAC) and by music-oriented cell phones from Sony Ericsson, beginning with the Sony Ericsson W800.[59] The Windows Media Audio (WMA) format, from Microsoft, was considered to be AAC's main competitor.[60]

By 2017, AAC was considered to have become a de facto industry standard for lossy audio.[61]

Functionality

AAC is a wideband audio coding algorithm that exploits two primary coding strategies to dramatically reduce the amount of data needed to represent high-quality digital audio:

  • Signal components that are perceptually irrelevant are discarded.
  • Redundancies in the coded audio signal are eliminated.

The actual encoding process consists of the following steps:

  • The signal is converted from time-domain to frequency-domain using forward modified discrete cosine transform (MDCT). This is done by using filter banks that take an appropriate number of time samples and convert them to frequency samples.
  • The frequency domain signal is quantized based on a psychoacoustic model and encoded.
  • Internal error correction codes are added.
  • The signal is stored or transmitted.
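The quantization step above can be illustrated with AAC's nonuniform quantizer, a 3/4-power companding rule; the rounding constant 0.4054 and the scalefactor convention follow the commonly published formula and are used here only as an illustrative sketch:

```python
import math

def quantize(coeff: float, scalefactor: int) -> int:
    """AAC-style nonuniform quantization of one MDCT coefficient."""
    magnitude = abs(coeff) / (2 ** (scalefactor / 4))
    q = int(magnitude ** 0.75 + 0.4054)   # 3/4-power companding plus rounding offset
    return int(math.copysign(q, coeff))

def dequantize(q: int, scalefactor: int) -> float:
    """Inverse: reconstruct the coefficient from its quantized value."""
    return math.copysign(abs(q) ** (4.0 / 3.0) * 2 ** (scalefactor / 4), q)

# The 3/4-power law gives large coefficients proportionally coarser steps,
# matching the ear's reduced sensitivity to error in loud components:
for x in (10.0, 100.0, 1000.0):
    q = quantize(x, 0)
    print(x, q, round(dequantize(q, 0), 1))
```

Raising the scalefactor widens the quantization step in that band, which is how a real encoder trades audible noise against bit budget per band.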

The MPEG-4 audio standard does not define a single or small set of highly efficient compression schemes but rather a complex toolbox to perform a wide range of operations from low bit rate speech coding to high-quality audio coding and music synthesis.

  • The MPEG-4 audio coding algorithm family spans the range from low bit rate speech encoding (down to 2 kbit/s) to high-quality audio coding (at 64 kbit/s per channel and higher).
  • AAC offers sampling frequencies between 8 kHz and 96 kHz and any number of channels between 1 and 48.
  • In contrast to MP3's hybrid filter bank, AAC uses the modified discrete cosine transform (MDCT) together with the increased window lengths of 1024 or 960 points.

AAC encoders can switch dynamically between a single MDCT block of length 1024 points or 8 blocks of 128 points (or between 960 points and 120 points, respectively).

  • If a signal change or a transient occurs, 8 shorter windows of 128/120 points each are chosen for their better temporal resolution.
  • By default, the longer 1024-point/960-point window is otherwise used because the increased frequency resolution allows for a more sophisticated psychoacoustic model, resulting in improved coding efficiency.
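The long/short transform switching described above rests on the MDCT's time-domain alias cancellation: each inverse transform is aliased, but 50%-overlap-add with a suitable window cancels the aliasing exactly. The direct matrix form below is purely illustrative (production codecs use fast FFT-based algorithms); the sine window is used here for compactness, though AAC also allows the KBD window:

```python
import numpy as np

def mdct(frame: np.ndarray, window: np.ndarray) -> np.ndarray:
    """Forward MDCT: 2N windowed time samples -> N frequency coefficients."""
    n_half = len(frame) // 2
    n = np.arange(2 * n_half)
    k = np.arange(n_half)
    basis = np.cos(np.pi / n_half * (n[:, None] + 0.5 + n_half / 2) * (k[None, :] + 0.5))
    return (frame * window) @ basis

def imdct(coeffs: np.ndarray, window: np.ndarray) -> np.ndarray:
    """Inverse MDCT: N coefficients -> 2N aliased time samples (pre-overlap-add)."""
    n_half = len(coeffs)
    n = np.arange(2 * n_half)
    k = np.arange(n_half)
    basis = np.cos(np.pi / n_half * (n[:, None] + 0.5 + n_half / 2) * (k[None, :] + 0.5))
    return (2.0 / n_half) * window * (basis @ coeffs)

# Sine window satisfies the Princen-Bradley condition, so 50% overlap-add
# of consecutive inverse transforms cancels the time-domain aliasing.
N = 128                                   # hop size; AAC uses 1024/960 long, 128/120 short
w = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))
x = np.random.default_rng(0).standard_normal(5 * N)
y = np.zeros_like(x)
for t in range(0, len(x) - 2 * N + 1, N):
    y[t:t + 2 * N] += imdct(mdct(x[t:t + 2 * N], w), w)
print(np.allclose(x[N:-N], y[N:-N]))      # interior samples reconstructed exactly
```

Note that each 2N-sample block yields only N coefficients, so despite the 50% overlap the transform is critically sampled, which is what makes the MDCT attractive for this kind of codec.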

Modular encoding

AAC takes a modular approach to encoding. Depending on the complexity of the bitstream to be encoded, the desired performance and the acceptable output, implementers may create profiles to define which of a specific set of tools they want to use for a particular application.

The MPEG-2 Part 7 standard (Advanced Audio Coding) was first published in 1997 and offers three default profiles:[1][63]

  • Low Complexity (LC) – the simplest and most widely used and supported
  • Main Profile (Main) – like the LC profile, with the addition of backwards prediction
  • Scalable Sample Rate (SSR) a.k.a. Sample-Rate Scalable (SRS)

The MPEG-4 Part 3 standard (MPEG-4 Audio) defined various new compression tools (a.k.a. Audio Object Types) and their usage in brand new profiles. AAC is not used in some of the MPEG-4 Audio profiles. The MPEG-2 Part 7 AAC LC profile, AAC Main profile and AAC SSR profile are combined with Perceptual Noise Substitution and defined in the MPEG-4 Audio standard as Audio Object Types (under the name AAC LC, AAC Main and AAC SSR). These are combined with other Object Types in MPEG-4 Audio profiles.[39] Here is a list of some audio profiles defined in the MPEG-4 standard:[47][64]

  • Main Audio Profile – defined in 1999, uses most of the MPEG-4 Audio Object Types (AAC Main, AAC-LC, AAC-SSR, AAC-LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI, Main synthesis)
  • Scalable Audio Profile – defined in 1999, uses AAC-LC, AAC-LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI
  • Speech Audio Profile – defined in 1999, uses CELP, HVXC, TTSI
  • Synthetic Audio Profile – defined in 1999, TTSI, Main synthesis
  • High Quality Audio Profile – defined in 2000, uses AAC-LC, AAC-LTP, AAC Scalable, CELP, ER-AAC-LC, ER-AAC-LTP, ER-AAC Scalable, ER-CELP
  • Low Delay Audio Profile – defined in 2000, uses CELP, HVXC, TTSI, ER-AAC-LD, ER-CELP, ER-HVXC
  • Low Delay AAC v2 – defined in 2012, uses AAC-LD, AAC-ELD and AAC-ELDv2[65]
  • Mobile Audio Internetworking Profile – defined in 2000, uses ER-AAC-LC, ER-AAC-Scalable, ER-TwinVQ, ER-BSAC, ER-AAC-LD
  • AAC Profile – defined in 2003, uses AAC-LC
  • High Efficiency AAC Profile – defined in 2003, uses AAC-LC, SBR
  • High Efficiency AAC v2 Profile – defined in 2006, uses AAC-LC, SBR, PS
  • Extended High Efficiency AAC (xHE-AAC) – defined in 2012, uses USAC

One of many improvements in MPEG-4 Audio is an Object Type called Long Term Prediction (LTP), which is an improvement of the Main profile using a forward predictor with lower computational complexity.[41]

AAC error protection toolkit

Applying error protection enables error correction up to a certain extent. Error correcting codes are usually applied equally to the whole payload. However, since different parts of an AAC payload show different sensitivity to transmission errors, this would not be a very efficient approach.

The AAC payload can be subdivided into parts with different error sensitivities.

  • Independent error correcting codes can be applied to any of these parts using the Error Protection (EP) tool defined in MPEG-4 Audio standard.
  • This toolkit provides the error correcting capability to the most sensitive parts of the payload in order to keep the additional overhead low.
  • The toolkit is backward compatible with simpler and pre-existing AAC decoders. Many of the toolkit's error correction functions are based on spreading information about the audio signal more evenly in the datastream.

Error Resilient (ER) AAC

Error Resilience (ER) techniques can be used to make the coding scheme itself more robust against errors.

For AAC, three custom-tailored methods were developed and defined in MPEG-4 Audio:

  • Huffman Codeword Reordering (HCR) to avoid error propagation within spectral data
  • Virtual Codebooks (VCB11) to detect serious errors within spectral data
  • Reversible Variable Length Code (RVLC) to reduce error propagation within scale factor data

AAC Low Delay

The audio coding standards MPEG-4 Low Delay (AAC-LD), Enhanced Low Delay (AAC-ELD), and Enhanced Low Delay v2 (AAC-ELDv2) as defined in ISO/IEC 14496-3:2009 and ISO/IEC 14496-3:2009/Amd 3 are designed to combine the advantages of perceptual audio coding with the low delay necessary for two-way communication. They are closely derived from the MPEG-2 Advanced Audio Coding (AAC) format.[66][67][68] AAC-ELD is recommended by GSMA as super-wideband voice codec in the IMS Profile for High Definition Video Conference (HDVC) Service.[69]

Licensing and patents

No licenses or payments are required for a user to stream or distribute audio in AAC format.[70] This alone may make AAC a more attractive format than its predecessor MP3 for distributing audio, particularly for streaming (such as Internet radio), depending on the use case.

However, a patent license is required for all manufacturers or developers of AAC "end-user" codecs.[71] The terms (as disclosed to SEC) use per-unit pricing. In the case of software, each computer running the software is to be considered a separate "unit".[72]

It used to be common for free and open source software implementations such as FFmpeg and FAAC to only be distributed in source code form so as to not "otherwise supply" an AAC codec. However, FFmpeg has since become more lenient on patent matters: the "gyan.dev" builds recommended by the official site now contain its AAC codec, with the FFmpeg legal page stating that patent law conformance is the user's responsibility.[73] (See below under Products that support AAC, Software.) The Fedora Project, a community backed by Red Hat, imported the "Third-Party Modified Version of the Fraunhofer FDK AAC Codec Library for Android" to its repositories on September 25, 2018,[74] and has enabled FFmpeg's native AAC encoder and decoder for its ffmpeg-free package on January 31, 2023.[75]

The AAC patent holders include Bell Labs, Dolby, ETRI, Fraunhofer, JVC Kenwood, LG Electronics, Microsoft, NEC, NTT (and its subsidiary NTT Docomo), Panasonic, Philips, and Sony Corporation.[28][29] Based on the list of patents from the SEC terms, the last baseline AAC patent expires in 2028, and the last patent for all AAC extensions mentioned expires in 2031.[76]

Extensions and improvements

Some extensions have been added to the first AAC standard (defined in MPEG-2 Part 7 in 1997):

  • Perceptual Noise Substitution (PNS), added in MPEG-4 in 1999. It allows the coding of noise as pseudorandom data.
  • Long Term Predictor (LTP), added in MPEG-4 in 1999. It is a forward predictor with lower computational complexity.[41]
  • Error Resilience (ER), added in MPEG-4 Audio version 2 in 2000, used for transport over error prone channels[77]
  • AAC-LD (Low Delay), defined in 2000, used for real-time conversation applications
  • High Efficiency AAC (HE-AAC), a.k.a. aacPlus v1 or AAC+, the combination of SBR (Spectral Band Replication) and AAC LC. Used for low bitrates. Defined in 2003.
  • HE-AAC v2, a.k.a. aacPlus v2, eAAC+ or Enhanced aacPlus, the combination of Parametric Stereo (PS) and HE-AAC; used for even lower bitrates. Defined in 2004 and 2006.
  • xHE-AAC (Extended HE-AAC), defined in 2012, which extends the operating range of the codec from 12 to 300 kbit/s.[78][79]
  • MPEG-4 Scalable to Lossless (SLS), not yet published,[80] can supplement an AAC stream to provide a lossless decoding option, such as in Fraunhofer IIS's "HD-AAC" product

Container formats

In addition to MP4, 3GP and other container formats based on the ISO base media file format, AAC audio data was first packaged in a file for the MPEG-2 standard using the Audio Data Interchange Format (ADIF),[81] consisting of a single header followed by the raw AAC audio data blocks.[82] If the data is instead to be streamed within an MPEG-2 transport stream, a self-synchronizing format called the Audio Data Transport Stream (ADTS) is used, consisting of a series of frames, each frame having a header followed by the AAC audio data.[81] These file and streaming formats are defined in MPEG-2 Part 7, but are only considered informative by MPEG-4, so an MPEG-4 decoder does not need to support either format.[81] These containers, as well as a raw AAC stream, may bear the .aac file extension.

MPEG-4 Part 3 also defines its own self-synchronizing format called the Low Overhead Audio Stream (LOAS), which encapsulates not only AAC but any MPEG-4 audio compression scheme, such as TwinVQ and ALS. This is the format defined for use in DVB transport streams when encoders use either the SBR or parametric stereo AAC extensions. However, it is restricted to a single non-multiplexed AAC stream. It is also referred to as the Low Overhead Audio Transport Multiplex (LATM), which is an interleaved, multiple-stream version of a LOAS.[81]
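The ADTS framing described above starts every frame with a 0xFFF syncword followed by a handful of fixed-position fields. The sketch below decodes the fixed part of a 7-byte ADTS header; the field layout follows the published MPEG-2 Part 7 / MPEG-4 Part 3 bit layout, and the example bytes are synthetic, constructed for illustration:

```python
# Sampling-frequency index table from the AAC specification.
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                24000, 22050, 16000, 12000, 11025, 8000, 7350]

def parse_adts_header(data: bytes) -> dict:
    """Parse the fixed fields of a 7-byte ADTS frame header."""
    if len(data) < 7:
        raise ValueError("need at least 7 bytes")
    if data[0] != 0xFF or (data[1] & 0xF0) != 0xF0:
        raise ValueError("missing ADTS syncword 0xFFF")
    return {
        "mpeg_version": 2 if data[1] & 0x08 else 4,   # ID bit: 1 = MPEG-2, 0 = MPEG-4
        "protection_absent": bool(data[1] & 0x01),     # no CRC follows if set
        "profile": ((data[2] >> 6) & 0x03) + 1,        # Audio Object Type (2 = AAC-LC)
        "sample_rate": SAMPLE_RATES[(data[2] >> 2) & 0x0F],
        "channel_config": ((data[2] & 0x01) << 2) | (data[3] >> 6),
        # 13-bit frame length (header included) spans bytes 3-5:
        "frame_length": ((data[3] & 0x03) << 11) | (data[4] << 3) | (data[5] >> 5),
    }

# Synthetic header: MPEG-4, AAC-LC, 44.1 kHz, stereo, 300-byte frame, no CRC.
hdr = parse_adts_header(bytes([0xFF, 0xF1, 0x50, 0x80, 0x25, 0x9F, 0xFC]))
print(hdr["profile"], hdr["sample_rate"], hdr["channel_config"], hdr["frame_length"])
```

Because each header repeats these parameters, a receiver can tune into the middle of an ADTS stream and resynchronize on the next 0xFFF syncword, which is exactly the self-synchronizing property the article describes.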

Encoders and decoders

Tools

Apple AAC

Apple's AAC encoder was first part of the QuickTime media framework but is now part of Audio Toolbox.

FAAC and FAAD2

FAAC and FAAD2 stand for Freeware Advanced Audio Coder and Decoder 2, respectively. FAAC supports audio object types LC, Main and LTP.[83] FAAD2 supports audio object types LC, Main, LTP, SBR and PS.[84] Although FAAD2 is free software, FAAC is not.

Fraunhofer FDK AAC

A Fraunhofer-authored open-source encoder/decoder included in Android that has been ported to other platforms. FFmpeg's native AAC encoder does not support HE-AAC and HE-AAC v2; however, the FDK AAC license is not compatible with FFmpeg's GPL 2.0+, so FFmpeg builds with libfdk-aac are not redistributable. The QAAC encoder, which uses Apple's Core Audio, is still considered higher quality than FDK.

FFmpeg and Libav

The native AAC encoder in FFmpeg's libavcodec, also carried in the Libav fork, was long considered experimental and poor. A significant amount of work was done for the 3.0 release of FFmpeg (February 2016) to make its version usable and competitive with the rest of the AAC encoders.[85] Libav has not merged this work and continues to use the older version of the AAC encoder. These encoders are LGPL-licensed open source and can be built for any platform for which the FFmpeg or Libav frameworks can be built.

Both FFmpeg and Libav can use the Fraunhofer FDK AAC library via libfdk-aac, and while the FFmpeg native encoder has become stable and good enough for common use, FDK is still considered the highest quality encoder available for use with FFmpeg.[86] Libav also recommends using FDK AAC if it is available.[87] FFmpeg 4.4 and above can also use the Apple audiotoolbox encoder.[86]

Although the native AAC encoder only produces AAC-LC, FFmpeg's native decoder can handle a wide range of input formats.
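As a usage sketch of the encoder choice discussed above, the helper below assembles an FFmpeg command line for either the native encoder or libfdk-aac. Only standard FFmpeg options (-i, -c:a, -b:a) are used; the file names are placeholders, and actually running the command of course requires an FFmpeg binary built with the chosen encoder:

```python
def ffmpeg_aac_args(src: str, dst: str, use_fdk: bool = False,
                    bitrate: str = "128k") -> list[str]:
    """Build an ffmpeg argv list for AAC encoding.

    use_fdk selects libfdk-aac (if the binary was built with it);
    otherwise the native 'aac' encoder is used.
    """
    codec = "libfdk_aac" if use_fdk else "aac"
    return ["ffmpeg", "-i", src, "-c:a", codec, "-b:a", bitrate, dst]

# Native encoder (always available) vs. FDK (redistribution-restricted builds):
print(ffmpeg_aac_args("input.wav", "output.m4a"))
print(ffmpeg_aac_args("input.wav", "output.m4a", use_fdk=True))
```

The list could be passed to subprocess.run as-is; building argv as a list rather than a shell string sidesteps quoting issues with file names.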

Nero Digital Audio

In May 2006, Nero AG released an AAC encoding tool free of charge, Nero Digital Audio (the AAC codec portion has become Nero AAC Codec),[88] which is capable of encoding LC-AAC, HE-AAC and HE-AAC v2 streams. The tool is a command-line interface tool only. A separate utility is also included to decode to PCM WAV.

Various tools including the foobar2000 audio player and MediaCoder can provide a GUI for this encoder.

Media players

Almost all current computer media players include built-in decoders for AAC, or can utilize a library to decode it. On Microsoft Windows, DirectShow can be used this way with the corresponding filters to enable AAC playback in any DirectShow based player. Mac OS X supports AAC via the QuickTime libraries. Adobe Flash Player, since version 9 update 3, can also play back AAC streams.[89][90] Since Flash Player is also a browser plugin, it can play AAC files through a browser as well.

The Rockbox open source firmware (available for multiple portable players) also offers support for AAC to varying degrees, depending on the model of player and the AAC profile.

Optional iPod support (playback of unprotected AAC files) for the Xbox 360 is available as a free download from Xbox Live.[91]

Several software players, such as foobar2000, Winamp, and VLC, also support decoding of ADTS (Audio Data Transport Stream) streams using the SHOUTcast protocol. Plug-ins for Winamp and foobar2000 enable the creation of such streams.

Use in HDTV broadcasting

Japanese ISDB-T

In December 2003, Japan started broadcasting the terrestrial DTV ISDB-T standard, which implements MPEG-2 video and MPEG-2 AAC audio. In April 2006, Japan started broadcasting the ISDB-T mobile sub-program, called 1seg, which was the first terrestrial HDTV broadcasting service in the world to implement H.264/AVC video with HE-AAC audio.

International ISDB-Tb

In December 2007, Brazil started broadcasting a terrestrial DTV standard called International ISDB-Tb, which implements H.264/AVC video with AAC-LC audio on the main program (single or multi) and H.264/AVC video with HE-AAC v2 audio on the 1seg mobile sub-program.

DVB

ETSI, the standards governing body for the DVB suite, has supported AAC, HE-AAC and HE-AAC v2 audio coding in DVB applications since at least 2004.[92] DVB broadcasts that use H.264 compression for video normally use HE-AAC for audio.[citation needed]

Hardware

iTunes and iPod

In April 2003, Apple brought mainstream attention to AAC by announcing that its iTunes and iPod products would support songs in MPEG-4 AAC format (via a firmware update for older iPods). Customers could download music in a closed-source digital rights management (DRM)-restricted form of 128 kbit/s AAC (see FairPlay) via the iTunes Store or create files without DRM from their own CDs using iTunes. In later years, Apple began offering music videos and movies, which also use AAC for audio encoding.

On May 29, 2007, Apple began selling songs and music videos from participating record labels at a higher bitrate (256 kbit/s cVBR) and free of DRM, a format dubbed "iTunes Plus". These files mostly adhere to the AAC standard and are playable on many non-Apple products, but they do include custom iTunes information such as album artwork and a purchase receipt to identify the customer in case the file is leaked onto peer-to-peer networks. It is possible, however, to remove these custom tags to restore interoperability with players that conform strictly to the AAC specification. As of January 6, 2009, nearly all music on the US iTunes Store became DRM-free, with the remainder becoming DRM-free by the end of March 2009.[93]

iTunes offers a "Variable Bit Rate" encoding option which encodes AAC tracks in a constrained variable bitrate scheme (a less strict variant of ABR encoding); the underlying QuickTime API does, however, offer a true VBR encoding profile.[94]

As of September 2009, Apple has added support for HE-AAC (which is fully part of the MP4 standard) only for radio streams, not file playback, and iTunes still lacks support for true VBR encoding.

Other portable players

Mobile phones

For a number of years, many mobile phones from manufacturers such as Nokia, Motorola, Samsung, Sony Ericsson, BenQ-Siemens and Philips have supported AAC playback. The first such phone was the Nokia 5510, released in 2002, which also plays MP3s. However, this phone was a commercial failure,[citation needed] and phones with integrated music players did not gain mainstream popularity until 2005, when the trend of supporting AAC as well as MP3 continued. Most new smartphones and music-themed phones support playback of these formats.

  • Sony Ericsson phones support various AAC formats in the MP4 container. AAC-LC is supported in all phones beginning with the K700; phones beginning with the W550 have support for HE-AAC. The latest devices, such as the P990, K610, W890i and later, support HE-AAC v2.
  • Nokia XpressMusic and other newer-generation Nokia multimedia phones, such as the N- and E-Series, also support the AAC format in LC, HE and HE v2 profiles and in M4A files. These phones can also play LTP-encoded AAC audio.
  • BlackBerry phones running the BlackBerry 10 operating system support AAC playback natively. Select previous generation BlackBerry OS devices also support AAC.
  • Samsung's bada OS supports AAC playback.
  • Apple's iPhone supports AAC, including the FairPlay-protected AAC files that were the default encoding format in the iTunes Store until the removal of DRM restrictions in March 2009.
  • Android 2.3[95] and later supports AAC-LC, HE-AAC and HE-AAC v2 in MP4 or M4A containers along with several other audio formats. Android 3.1 and later supports raw ADTS files. Android 4.1 can encode AAC.[96]
  • WebOS by HP/Palm supports AAC, AAC+, eAAC+, and .m4a containers in its native music player as well as several third-party players. However, it does not support Apple's FairPlay DRM files downloaded from iTunes.[97]
  • Windows Phone's Silverlight runtime supports AAC-LC, HE-AAC and HE-AAC v2 decoding.

Other devices

  • Apple's iPad: Supports AAC and FairPlay protected AAC files used as the default encoding format in the iTunes Store
  • Palm OS PDAs: Many Palm OS based PDAs and smartphones can play AAC and HE-AAC with the 3rd party software Pocket Tunes. Version 4.0, released in December 2006, added support for native AAC and HE-AAC files. The AAC codec for TCPMP, a popular video player, was withdrawn after version 0.66 due to patent issues, but can still be downloaded from sites other than corecodec.org. CorePlayer, the commercial follow-on to TCPMP, includes AAC support. Other Palm OS programs supporting AAC include Kinoma Player and AeroPlayer.
  • Windows Mobile: Supports AAC either by the native Windows Media Player or by third-party products (TCPMP, CorePlayer)[citation needed]
  • Epson: Supports AAC playback in the P-2000 and P-4000 Multimedia/Photo Storage Viewers
  • Sony Reader: plays M4A files containing AAC, and displays metadata created by iTunes. Other Sony products, including the A and E series Network Walkmans, support AAC with firmware updates (released May 2006) while the S series supports it out of the box.
  • Sonos Digital Media Player: supports playback of AAC files
  • Barnes & Noble Nook Color: supports playback of AAC encoded files
  • Roku SoundBridge: a network audio player, supports playback of AAC encoded files
  • Squeezebox: network audio player (made by Slim Devices, a Logitech company) that supports playback of AAC files
  • PlayStation 3: supports encoding and decoding of AAC files
  • Xbox 360: supports streaming of AAC through the Zune software, and of supported iPods connected through the USB port
  • Wii: supports AAC files through version 1.1 of the Photo Channel as of December 11, 2007. All AAC profiles and bitrates are supported as long as the file uses the .m4a extension. The 1.1 update removed MP3 compatibility, but according to Nintendo, users who installed it may freely downgrade to the old version if they wish.[98]
  • Livescribe Pulse and Echo Smartpens: record and store audio in AAC format. The audio files can be replayed using the pen's integrated speaker, attached headphones, or on a computer using the Livescribe Desktop software. The AAC files are stored in the user's "My Documents" folder of the Windows OS and can be distributed and played without specialized hardware or software from Livescribe.
  • Google Chromecast: supports playback of LC-AAC and HE-AAC audio[99]

from Grokipedia
Advanced Audio Coding (AAC) is a standardized lossy compression format that provides higher audio quality than MP3 at equivalent or lower bit rates, supporting multichannel audio up to 48 full-bandwidth channels and sample rates from 8 to 96 kHz. It employs perceptual coding techniques, including the modified discrete cosine transform (MDCT) for time-to-frequency representation, temporal noise shaping, and prediction tools, to compress audio efficiently while minimizing perceptible artifacts. AAC was jointly developed by Fraunhofer IIS, Dolby Laboratories, AT&T Bell Laboratories, and Sony Corporation starting in the early 1990s, with the first version standardized in 1997 as part of the MPEG-2 audio specification (ISO/IEC 13818-7). The format was later enhanced under MPEG-4 (ISO/IEC 14496-3), introducing profiles like AAC Low Complexity (LC) for broad compatibility and High-Efficiency AAC (HE-AAC) for low-bitrate applications such as streaming at 24–48 kbit/s per channel. These improvements enable perceptually transparent quality at around 128 kbit/s for stereo audio and support advanced features like parametric stereo and spectral band replication for bandwidth extension. AAC's modular structure allows flexibility in encoding tools, making it suitable for diverse applications.

AAC has become a cornerstone of digital media, serving as the default audio codec for platforms like Apple's iTunes Store and Apple Music, YouTube videos, and digital broadcasting standards such as DAB+. It is also widely used for high-quality audio transmission over Bluetooth and is integral to container formats like MP4, ADTS, and 3GP. AAC is supported in high-definition media like Blu-ray discs. Licensed through the Via Licensing Alliance, AAC owes its widespread adoption to its balance of efficiency and quality, reducing file sizes by up to 30% compared to MP3 without quality loss and thus facilitating efficient storage and transmission in mobile, streaming, and broadcast environments.

History

Background and Origins

In the early 1990s, researchers at the Fraunhofer Institute for Integrated Circuits (IIS) in Erlangen, Germany, in collaboration with AT&T Bell Laboratories, Dolby Laboratories, and Sony Corporation, initiated efforts to advance perceptual audio coding beyond the capabilities of MPEG-1 Audio Layer III (MP3). These investigations focused on overcoming the inherent constraints of MP3, which had been standardized in 1992 as part of the MPEG-1 framework for efficient audio compression. The collaborative research emphasized enhancing compression efficiency for emerging applications such as digital broadcasting and portable media, drawing on prior experience with the MPEG audio layers to push toward higher fidelity at constrained data rates. A primary motivation was addressing MP3's performance shortcomings, particularly its inefficiency at bitrates below 128 kbps, where compression artifacts became prominent, and its issues with stereo imaging, such as blurred spatial separation and pre-echo effects, which degraded perceived quality. For instance, at bitrates around 64 kbps for stereo audio, MP3 often introduced audible distortions that compromised the listening experience, limiting its suitability for bandwidth-limited transmission. Karlheinz Brandenburg, a leading audio engineer at Fraunhofer IIS renowned for his expertise in psychoacoustics, played a central role in this work; his foundational research on human auditory perception informed the development of more sophisticated masking models to minimize these artifacts. Brandenburg's contributions, building on his earlier psychoacoustic advancements for MP3, helped prototype improved encoding strategies that better exploited perceptual redundancies in audio signals. Initial software-based prototypes of the new codec were developed and subjected to verification testing in early 1994, revealing substantial quality gains from novel algorithms that prioritized efficiency over strict adherence to prior formats.
These tests confirmed the potential for a codec that could deliver near-transparent audio reproduction at lower bitrates, setting the stage for its integration into the MPEG-2 standard. This evolution from MP3 and the backward-compatible extensions in the MPEG-2 audio layers underscored the need to balance innovation with ecosystem compatibility: the new approach, later formalized as Advanced Audio Coding (AAC) in MPEG-2 Part 7, opted for non-backward compatibility to achieve superior performance, while allowing decoders to handle legacy content through hybrid implementations. Amid MP3's growing dominance on the Internet for digital music distribution, these foundational efforts laid the groundwork for a more versatile successor.

Standardization Process

The standardization of Advanced Audio Coding (AAC) began in 1996 under the auspices of the Moving Picture Experts Group (MPEG), formally ISO/IEC JTC 1/SC 29/WG 11, the working group responsible for developing multimedia compression standards. This effort aimed to create a next-generation audio codec surpassing the capabilities of MPEG-1 Layer III (MP3), building on prior psychoacoustic research while addressing multichannel and higher-fidelity needs. Key contributors included leading organizations such as the Fraunhofer Institute for Integrated Circuits (IIS), Dolby Laboratories, Sony, and AT&T Bell Laboratories, which collaborated on proposal submissions and technical refinements during MPEG meetings. In July 1996, MPEG issued a call for proposals to solicit advanced audio coding technologies, receiving submissions from multiple parties that underwent rigorous subjective listening tests at the 1996 MPEG meetings. The evaluation process selected a hybrid approach combining elements from several proposals, leading to the finalization of the core specification later that year. This culminated in the ratification of AAC as MPEG-2 Part 7, formally titled ISO/IEC 13818-7, in November 1997, establishing it as an international standard for high-quality, multichannel audio compression. Profiles were defined at this stage, with AAC Low Complexity (AAC-LC) designated as the baseline for broad compatibility, alongside extensions for scalable and parametric coding to support diverse applications like streaming and low-bitrate transmission. The standard evolved further through the MPEG-4 framework, with AAC integrated and enhanced as Part 3 of ISO/IEC 14496 in 1999, enabling object-based audio and improved error resilience for interactive multimedia. Subsequent amendments addressed emerging requirements, including bandwidth-extension tools and high-efficiency variants; notable updates occurred in 2003 (Amendment 1 for HE-AAC), 2004, and up to 2013 (Amendment 4 for new AAC profile levels), ensuring ongoing relevance in digital broadcasting and mobile devices.
These developments maintained AAC's position as a versatile, royalty-bearing standard managed by the Via Licensing Alliance, successor to the original MPEG patent pool.

Key Improvements over MP3

Advanced Audio Coding (AAC) was developed to address the limitations of MP3, particularly its inefficiency in achieving high-quality audio at lower bitrates, by incorporating a range of algorithmic enhancements. One of the primary advancements is AAC's superior compression efficiency, allowing it to deliver audio quality comparable to MP3 at approximately 70% of the bitrate, meaning near-transparent stereo audio at around 96 kbps versus MP3's typical 128 kbps requirement. This efficiency stems from refined perceptual modeling and coding tools that more effectively discard inaudible components while preserving essential audio details. AAC enhances frequency resolution through its use of the modified discrete cosine transform (MDCT) with variable window sizes of 2048 samples for long blocks and 256 samples for short blocks, enabling adaptive block switching for better handling of transients and steady-state signals than MP3's hybrid filterbank with its fixed 576 and 192 frequency lines. This flexibility reduces artifacts like pre-echo in percussive audio, providing more precise spectral representation across diverse content. In stereo handling, AAC can select mid-side (M/S) or intensity stereo coding per scale-factor band, a finer granularity than MP3's frame-level joint stereo modes, resulting in improved spatial imaging and reduced bitrate overhead for stereo signals at low rates. AAC natively supports up to 48 channels, encoding 5.1 surround at perceptually transparent quality around 320 kbps, whereas MP3 is limited to stereo in its core specification and requires less optimized extensions for multichannel in MPEG-2. Quantitative evaluations, such as those using the Perceptual Evaluation of Audio Quality (PEAQ) metric, demonstrate AAC's lower audible distortion levels; for instance, AAC at 64 kbps often achieves PEAQ Objective Difference Grade (ODG) scores closer to transparency (-1 or better) than equivalent MP3 encodings, confirming its perceptual advantages.

Adoption Timeline

In the early 2000s, Advanced Audio Coding (AAC) was integrated into the MPEG-4 standard as its general audio codec, enabling efficient compression for applications such as streaming and interactive content delivery. This integration positioned AAC as a successor to MP3 within the evolving MPEG-4 framework, which emphasized object-based coding for enhanced flexibility in digital media. Apple's launch of the iTunes Music Store in 2003 further accelerated adoption by using AAC at 128 kbps as the default encoding format for digital music downloads, marking a shift toward higher-quality compressed audio in consumer ecosystems. By the mid-2000s, AAC saw widespread integration into portable devices, with Nokia introducing support in models like the 3300 music phone in 2003, allowing playback of AAC files alongside MP3 for mobile audio consumption. Sony followed suit, announcing plans in 2006 to incorporate AAC into its digital audio players, broadening compatibility beyond proprietary formats and aligning with emerging industry standards for portable media. A key milestone came in 2006 with the Blu-ray Disc format's specification, which included AAC as a supported audio codec for high-definition video discs, facilitating its use in home entertainment systems. That same year, the WorldDAB forum adopted HE-AAC (an enhanced AAC profile) for the DAB+ standard, improving audio efficiency for broadcast applications worldwide. During the 2010s, AAC solidified its dominance in online streaming services, becoming the default codec for platforms such as YouTube and Apple Music, where it delivers balanced quality at bitrates around 128–256 kbps to optimize bandwidth and device compatibility. Another milestone occurred around 2012, when AAC emerged as a widely supported audio format for HTML5's <audio> element, particularly in MP4 containers paired with H.264 video, ensuring broad cross-browser support for web-based media playback.
Early adoption faced challenges from licensing complexities, as MP3's more favorable royalty structure initially slowed AAC's momentum in consumer hardware. In the 2020s, AAC maintained its relevance through extensions for advanced applications, including spatial audio delivery in streaming services, where it can serve as the base layer for immersive multichannel experiences. Its compatibility with modern multimedia standards further supports low-latency, high-fidelity audio transmission in mobile networks, underscoring AAC's enduring role in digital ecosystems. In 2025, xHE-AAC saw further adoption in Amazon's new product lines for improved streaming efficiency.

Technical Principles

Psychoacoustic Modeling

The psychoacoustic model in Advanced Audio Coding (AAC) exploits principles of human auditory perception, particularly simultaneous and temporal masking, to determine perceptual irrelevancies in the audio signal and allocate bits efficiently across frequency bands. Simultaneous masking renders sounds close in frequency to a louder masker inaudible due to overlapping excitation patterns on the basilar membrane, while temporal masking suppresses detection of sounds occurring shortly before or after a masker, typically within 1-200 ms depending on the masker's intensity and duration. These effects enable the encoder to introduce quantization noise below computed masking thresholds without perceptible distortion, optimizing compression for transparent quality at low bitrates. Central to the model is the absolute threshold of hearing (ATH), also termed the threshold in quiet (TIQ), which defines the minimum sound pressure level detectable across frequencies from 20 Hz to 20 kHz in the absence of any masker. This frequency-dependent curve follows a characteristic U-shape, with peak sensitivity (lowest threshold) near 2-5 kHz at approximately 0 dB SPL and rising sharply to 50-80 dB SPL at the extremes, reflecting the uneven distribution of hair cells along the cochlea. The ATH serves as the baseline for all masking calculations, ensuring that noise below this level remains inaudible even without signal masking. Masking thresholds are derived by integrating the ATH with signal-induced excitations, using tonal and noise-like components identified via fast Fourier transform analysis. The overall threshold per critical band is the minimum of the individual tonal and noise masker contributions, combined via nonlinear superposition to account for multiple maskers.
The spreading function models the asymmetric excitation pattern underlying these thresholds, applied on the bark scale, a perceptual frequency unit in which each of the 24 critical bands spans approximately equal perceptual distance; linear frequency is mapped to this nonlinear scale via z = 13 \arctan(0.00076 f) + 3.5 \arctan((f/7500)^2), with f in Hz and z in barks. This asymmetry captures the basilar membrane's sharper roll-off at low frequencies (roughly 27 dB per bark) compared to higher frequencies (about 8 dB per bark), ensuring accurate prediction of masking that extends farther upward in frequency than downward. Pre-echo avoidance addresses temporal resolution limits in transform-based coding, where noise from a long window can precede sharp transients, becoming audible due to weak backward temporal masking. The model detects transients via time-domain analysis of signal envelope changes exceeding adaptive thresholds, triggering adjustments to time-frequency granularity, such as shorter windows or temporal noise shaping, to confine noise post-transient and exploit forward masking for imperceptibility.
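The Hz-to-bark mapping above is easy to sketch directly; the following minimal Python illustration (function name hypothetical) shows how strongly the scale compresses high frequencies:

```python
import math

def hz_to_bark(f):
    """Convert a frequency in Hz to the bark scale using the
    arctangent approximation quoted above:
    z = 13*arctan(0.00076*f) + 3.5*arctan((f/7500)^2)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

# Equal bark steps cover ever wider Hz ranges toward high frequencies.
for f in (100, 1000, 10000):
    print(f, round(hz_to_bark(f), 2))
```

For example, 1 kHz lands near 8.5 bark, and the full audible range spans roughly 24-25 barks, matching the 24 critical bands mentioned above.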

Transform Coding Mechanism

The core of Advanced Audio Coding (AAC) lies in its frequency-domain transform coding, which converts time-domain audio signals into a spectral representation for efficient compression. This process begins with the application of the modified discrete cosine transform (MDCT), a critically sampled transform that provides high frequency resolution while minimizing artifacts through 50% overlap-add between adjacent blocks. The overlap-add mechanism ensures seamless reconstruction by adding overlapping portions of consecutive windows, reducing blocking effects and enabling smooth transitions during block switching for handling transients. AAC employs variable block sizes to balance time and frequency resolution, guided by psychoacoustic models for bit allocation. Long blocks consist of 2048 time-domain samples, transforming into 1024 spectral coefficients suitable for stationary signals, while short blocks use 256 samples (yielding 128 coefficients) to capture rapid changes like transients, with eight short blocks grouped to span the equivalent of a long block duration. Window functions include the sine window, commonly used for short blocks and transitions, and the Kaiser-Bessel derived (KBD) window, often chosen for long blocks to improve energy compaction and side-lobe suppression. The MDCT of a block of N windowed samples is defined by the equation: X_k = \sum_{n=0}^{N-1} x_n \cos\left[\frac{2\pi}{N}\left(n + \frac{1}{2} + \frac{N}{4}\right)\left(k + \frac{1}{2}\right)\right], \quad k = 0, 1, \dots, N/2 - 1 where N is the block length (2048 for long blocks, 256 for short), x_n are the windowed time samples, and the transform produces N/2 real-valued coefficients. Following the transform, spectral coefficients undergo quantization to control bitrate and distortion, using a non-uniform quantizer shaped by scale factors that adjust precision across frequency bands.
Quantized values are entropy-coded with Huffman codes drawn from eleven variable-length codebooks covering both spectral coefficients and scale factors, achieving further compression by exploiting statistical redundancies. For regions where quantization noise would exceed the signal, often noise-like high-frequency areas, perceptual noise substitution (PNS) replaces detailed coefficients with a noise generator and a scale factor, preserving perceived quality at low bitrates. To reduce inter-channel redundancy in stereo signals, AAC implements joint stereo coding through two methods: mid-side (M/S) coding, which transforms the left and right channels into sum (mid) and difference (side) signals for selective coding per scale-factor band; and intensity stereo, which encodes a mono signal with directional scale factors for high-frequency bands where precise stereo localization is less critical. For multichannel audio, the coupling channel element enables efficient representation by deriving multiple channels from a shared low-frequency component, with individual scale factors applied to high frequencies to maintain spatial cues while minimizing bitrate.
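A direct, non-optimized Python sketch of the forward MDCT (standard formulation with N inputs and N/2 outputs; production encoders use fast FFT-based algorithms) illustrates the transform and its sine window:

```python
import math

def sine_window(N):
    # Sine window; satisfies the Princen-Bradley condition
    # w[n]^2 + w[n + N/2]^2 = 1 needed for 50% overlap-add reconstruction.
    return [math.sin(math.pi / N * (n + 0.5)) for n in range(N)]

def mdct(block):
    """Naive MDCT: N windowed time samples -> N/2 spectral coefficients."""
    N = len(block)
    return [
        sum(block[n] * math.cos(2 * math.pi / N * (n + 0.5 + N / 4) * (k + 0.5))
            for n in range(N))
        for k in range(N // 2)
    ]

# A pure basis cosine concentrates all of its energy in one coefficient,
# which is a handy sanity check for the transform.
N, k0 = 16, 3
tone = [math.cos(2 * math.pi / N * (n + 0.5 + N / 4) * (k0 + 0.5)) for n in range(N)]
coeffs = mdct(tone)
```

With N = 2048 this yields the 1024 long-block coefficients described above; with N = 256 it yields the 128 short-block coefficients.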

Bitstream Syntax and Tools

The Advanced Audio Coding (AAC) bitstream is organized into access units (AUs), each comprising a header that specifies key parameters such as the audio profile, sampling rate, and number of channels, followed by the encoded audio data. This structure ensures compatibility across MPEG-2 and MPEG-4 systems, with the header providing essential decoding instructions. Within each AU, the core data is encapsulated in one or more raw_data_block() elements, which include side information for decoding guidance, scale factors for spectral coefficient adjustment, the main quantized spectral data derived from the modified discrete cosine transform (MDCT), and fill bits to pad the stream for bitrate control. The side information and main data are positioned variably to optimize error resilience, with fill bits allowing flexible bitrate allocation without altering the audio content. AAC incorporates several optional tools to enhance compression efficiency and perceptual quality. Temporal Noise Shaping (TNS) applies linear prediction across the spectral coefficients to shape quantization noise in the time domain, effectively reducing pre-echo artifacts in transient signals by aligning the noise distribution with the signal's time-domain envelope. Long Term Prediction (LTP), available in MPEG-4 AAC profiles, uses forward prediction across frames to exploit periodicity in stationary signals, such as tonal or repetitive audio, thereby improving coding gain for sources like speech or music with sustained pitches. For scalability, MPEG-4 AAC supports layered coding in which a base layer provides audio at lower quality and bitrate, enhanced by one or more enhancement layers that add detail for higher fidelity, enabling adaptive streaming over varying network conditions. This hierarchical approach allows decoding at multiple quality levels from a single bitstream. AAC accommodates sampling rates from 8 kHz to 96 kHz, supporting applications from low-bitrate speech to high-fidelity multichannel audio, with typical bitrates ranging from 8 kbps for mono signals to 576 kbps for 5.1 surround configurations.
Two primary header formats facilitate bitstream transport: the Audio Data Transport Stream (ADTS), which includes a synchronization header before each raw_data_block for seamless streaming in formats like MPEG-TS, and the Audio Data Interchange Format (ADIF), featuring a single file-level header followed by the data blocks, suited for self-contained file storage without per-frame overhead.
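As an illustration of ADTS framing, the fixed seven-byte header that precedes each raw_data_block can be unpacked with plain bit masking. The sketch below (helper name hypothetical; field layout per the ADTS specification) decodes the profile, sampling rate, channel configuration, and frame length:

```python
# Sampling-frequency index table from the MPEG-4 Audio specification.
ADTS_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                     22050, 16000, 12000, 11025, 8000]

def parse_adts_header(data):
    """Decode the 7-byte fixed ADTS header preceding a raw_data_block."""
    if len(data) < 7:
        raise ValueError("need at least 7 header bytes")
    if data[0] != 0xFF or (data[1] & 0xF0) != 0xF0:
        raise ValueError("bad ADTS syncword")
    return {
        "protection_absent": bool(data[1] & 0x01),        # no CRC if set
        "profile": (data[2] >> 6) & 0x03,                 # 1 = AAC-LC
        "sample_rate": ADTS_SAMPLE_RATES[(data[2] >> 2) & 0x0F],
        "channels": ((data[2] & 0x01) << 2) | (data[3] >> 6),
        # 13-bit frame length spans bytes 3-5 (includes the header itself)
        "frame_length": ((data[3] & 0x03) << 11) | (data[4] << 3) | (data[5] >> 5),
    }

# Hand-built example header: AAC-LC, 44.1 kHz, stereo, 371-byte frame.
hdr = parse_adts_header(bytes([0xFF, 0xF1, 0x50, 0x80, 0x2E, 0x7F, 0xFC]))
```

The per-frame syncword and length fields are what let a decoder resynchronize mid-stream, which is exactly the property ADIF gives up in exchange for a single file-level header.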

Encoding and Decoding

Modular Encoding Framework

The modular encoding framework of Advanced Audio Coding (AAC) enables flexible configuration by allowing encoders to select and combine a suite of coding tools tailored to specific audio content, target bitrates, and computational constraints, thereby optimizing compression efficiency and perceptual quality. Defined in ISO/IEC 14496-3, this framework integrates sophisticated, individually standardized tools to support a wide range of applications, from low-bitrate streaming to high-fidelity multichannel audio. At its core, the framework relies on four primary modules. The filterbank module applies a modified discrete cosine transform (MDCT) to convert the input time-domain signal into a critically sampled frequency-domain representation, facilitating subsequent perceptual processing. The psychoacoustic model module analyzes the signal to compute masking thresholds and signal-to-mask ratios, identifying audible components while suppressing inaudible ones based on principles of human auditory perception. Quantization then applies a nonuniform scalar process to the spectral coefficients, guided by the psychoacoustic data to allocate bits efficiently and shape quantization noise into masked regions. Finally, entropy coding employs Huffman variable-length codes to compress the quantized coefficients and associated side information, minimizing the overall bitstream size. Tool selection enhances the framework's adaptability, permitting optional inclusion of advanced features depending on signal characteristics and bitrate. Perceptual Noise Substitution (PNS) is one such tool, activated for noise-like spectral regions at low bitrates; it replaces detailed coefficient transmission with a noise generator and spectral envelope parameters, significantly reducing data volume while preserving perceived quality.
For stereo audio, Mid/Side (M/S) processing transforms the left and right channels into mid (sum) and side (difference) signals to exploit inter-channel redundancy, while Intensity Stereo (IS) uses directional cues to encode intensity differences rather than full coefficients for high-frequency bands, further improving compression efficiency. These tools are dynamically chosen during encoding to balance quality and bitrate. Encoder complexity levels span a spectrum to accommodate diverse hardware environments. Basic implementations use fixed-point arithmetic for low-power, real-time applications such as mobile devices, prioritizing speed over precision, whereas advanced variants leverage floating-point operations to achieve higher fidelity in studio or broadcast settings, often incorporating vectorized processing for multichannel support. Rate-distortion optimization forms the backbone of tool integration, employing iterative search algorithms, such as dynamic programming or trellis-based methods, to evaluate combinations of modules and parameters, adjusting quantization scales and tool activation to minimize perceptual distortion subject to bitrate constraints. This process typically involves multiple encoding iterations per frame, enabling adaptive decisions that enhance quality at rates from 16 kbit/s upward. Backward compatibility is embedded in the design: MPEG-4 AAC incorporates the full MPEG-2 AAC toolset as a compatible subset, allowing MPEG-4 compliant decoders to process MPEG-2 AAC bitstreams seamlessly using only the core tools and facilitating evolution without obsoleting prior deployments.
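The nonuniform quantizer mentioned above can be illustrated with the well-known 3/4-power companding rule used in AAC (and MP3). The constants below, including the 0.4054 rounding offset, follow the common reference formulation; treat this as an illustrative sketch rather than a bit-exact encoder fragment:

```python
def quantize(x, scalefactor=0):
    """Power-law quantization of one spectral coefficient."""
    step = 2.0 ** (scalefactor / 4.0)      # scale factors step in ~1.5 dB units
    mag = (abs(x) / step) ** 0.75          # 3/4-power companding
    q = int(mag + 0.4054)                  # rounding offset from the reference model
    return q if x >= 0 else -q

def dequantize(q, scalefactor=0):
    """Inverse mapping: |q|^(4/3) expansion, then rescaling."""
    step = 2.0 ** (scalefactor / 4.0)
    return (abs(q) ** (4.0 / 3.0)) * step * (1 if q >= 0 else -1)

# Round-tripping keeps the relative error roughly constant across magnitudes,
# which is the point of the power law: coarser absolute steps for large values.
errs = [abs(dequantize(quantize(x)) - x) / x for x in (10.0, 100.0, 1000.0)]
```

The power law makes quantization noise scale with signal amplitude, so the scale factors only need to steer the noise relative to the masking thresholds computed by the psychoacoustic module.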

Error Protection Features

Advanced Audio Coding (AAC) employs a suite of built-in error protection mechanisms to detect and mitigate transmission errors, ensuring robust decoding in noisy or error-prone environments. Central to this is the use of cyclic redundancy check (CRC) codes for error detection in critical bitstream components, including headers and scale factors. A 16-bit CRC is computed and appended to the side information for each element, such as individual channels or coupling channels, enabling the decoder to verify the integrity of these sensitive data. Errors in scale factors, which control the quantization levels across frequency bands, are particularly detrimental, as they can cause audible clipping or distortion; the CRC allows their prompt identification. Upon CRC failure, the decoder activates error concealment techniques to enable graceful degradation rather than complete frame discard. These include zeroing out affected spectral coefficients, muting erroneous bands, or extrapolating from preceding and succeeding frames using techniques like bandwidth extension or temporal interpolation. Bit error flags embedded in the side information further support fine-grained error localization, allowing the decoder to flag and bypass specific bits without affecting the entire frame. This modular approach leverages the underlying encoding framework, where side information is separated from spectral data for targeted protection. For channels susceptible to packet losses or erasures, such as wireless or broadcast transmissions, AAC supports optional outer error correction via Reed-Solomon coding within the Error Protection (EP) toolkit defined in the MPEG-4 Audio standard. This unequal error protection (UEP) scheme classifies elements by sensitivity: scale factors and headers receive the strongest protection (lower code rates, i.e., more redundancy), while less critical spectral coefficients incur minimal overhead, balancing robustness and efficiency.
Reed-Solomon codes, typically shortened variants like RS(255,223), correct burst errors or erasures by adding parity symbols, making them suitable for packet-based delivery. Performance evaluations demonstrate the effectiveness of these features, with error concealment preserving perceptual fidelity even under moderate error conditions. This robustness stems from the prioritized protection of perceptually vital elements, preventing widespread artifacts in the reconstructed audio.
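A 16-bit CRC of the kind used for ADTS error detection can be sketched as a bitwise computation. The generator polynomial below (x^16 + x^15 + x^2 + 1, i.e., 0x8005 with initial value 0xFFFF) is the one used by MPEG audio; the sketch is illustrative rather than a bit-exact decoder fragment:

```python
def crc16_mpeg(data, poly=0x8005, init=0xFFFF):
    """Bitwise CRC-16 over a byte string, processed MSB first."""
    crc = init
    for byte in data:
        for bit in range(7, -1, -1):
            top = (crc >> 15) & 1            # bit about to shift out
            inbit = (byte >> bit) & 1        # next message bit
            crc = (crc << 1) & 0xFFFF
            if top ^ inbit:
                crc ^= poly                  # reduce modulo the generator
    return crc

frame = bytes([0x21, 0x43, 0x65, 0x87])
good = crc16_mpeg(frame)
corrupted = bytes([0x21, 0x43, 0x65, 0x86])   # single-bit error in last byte
```

Because the generator polynomial has more than one term, any single-bit corruption is guaranteed to change the checksum, which is what lets the decoder trigger concealment instead of playing a damaged frame.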

Low-Delay and Error-Resilient Variants

The Low-Delay AAC (AAC-LD) variant of Advanced Audio Coding is designed to minimize algorithmic delay for applications requiring real-time interaction, achieving a total delay of approximately 20 ms at a 48 kHz sampling rate through the use of a single 512-sample modified discrete cosine transform (MDCT) block length for both analysis and synthesis filtering. This configuration avoids the long windows and bit-reservoir buffering typical of standard AAC and omits tools that would add latency, such as block switching, without introducing additional processing overhead. The resulting latency can be approximated by the equation \text{Latency} \approx 2 \times \frac{\text{block\_size}}{\text{sample\_rate}}, where the factor of 2 accounts for the 50% window overlap of the MDCT. The Error-Resilient AAC (ER AAC) variant extends the core AAC framework with specialized tools to enhance robustness in transmission over error-prone channels, such as wireless networks or packet-based systems. Key additions include Virtual Codebooks (VCB11), which enable partial decoding by detecting and isolating severe errors in spectral data through extended sectioning information, and Huffman Codeword Reordering (HCR), which mitigates error propagation by segmenting and repositioning spectral codewords into fixed-size blocks for independent recovery. These tools build upon AAC's foundational error protection mechanisms, allowing for graceful degradation rather than complete failure in noisy environments. AAC-LD and ER AAC are particularly suited to use cases like voice over IP (VoIP), two-way communication, and wireless broadcasting, where low latency and reliability are critical for maintaining conversational flow and audio intelligibility. However, these variants often require slightly higher bitrates, typically around 64 kbps per channel for acceptable quality, compared to standard AAC, because the constraints on block size and tool usage limit compression efficiency.
AAC-LD was standardized as part of MPEG-4 Audio Amendment 1 in 2000 (ISO/IEC 14496-3:1999/Amd 1:2000), while ER AAC tools were introduced in MPEG-4 Audio Version 2 the same year to address emerging needs in mobile and networked audio delivery.
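The delay relationship above is simple enough to check numerically. The sketch below compares standard AAC framing with AAC-LD framing, counting only the filter-bank delay (encoder look-ahead and transmission buffering are ignored):

```python
def mdct_latency_ms(block_size, sample_rate):
    # Two block lengths of buffering: one to fill the analysis window,
    # one for the 50% overlap-add on the synthesis side.
    return 2.0 * block_size / sample_rate * 1000.0

aac_lc = mdct_latency_ms(1024, 48000)   # standard AAC long-block framing
aac_ld = mdct_latency_ms(512, 48000)    # AAC-LD single 512-sample block
```

At 48 kHz this gives about 21.3 ms for AAC-LD versus roughly 42.7 ms for standard AAC framing, matching the "approximately 20 ms" figure cited above.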

Profiles and Extensions

AAC Low Complexity (AAC-LC)

AAC Low Complexity (AAC-LC) serves as the foundational profile of the Advanced Audio Coding (AAC) standard, defined in ISO/IEC 14496-3 as the baseline object type that excludes advanced extensions such as Spectral Band Replication (SBR) or Parametric Stereo (PS), focusing instead on core perceptual coding tools for full-bandwidth audio signals. This profile supports up to 48 channels at sampling rates ranging from 8 kHz to 96 kHz, enabling applications from mono speech to immersive multichannel audio. Key features of AAC-LC include its reliance on the modified discrete cosine transform (MDCT) for frequency-domain representation, Huffman variable-length coding for entropy compression, and intensity stereo or mid-side joint stereo for efficient stereo encoding. These elements allow operation across a broad bitrate range, typically from around 12 kbps for low-quality mono to 576 kbps for high-fidelity multichannel, though optimal performance is achieved in the 64-320 kbps range for stereo and surround content. AAC-LC offers low decoding complexity, making it particularly suitable for resource-constrained embedded systems such as mobile devices and dedicated playback hardware, where decoding requires minimal processing power compared to more advanced profiles. It delivers perceptually transparent quality for stereo signals at approximately 96-128 kbps using high-quality encoders, surpassing MP3 at equivalent bitrates thanks to its refined psychoacoustic model and transform efficiency. However, AAC-LC exhibits reduced efficiency at bitrates below 64 kbps, where artifacts become more noticeable without bandwidth extension tools, making it less suitable than the high-efficiency variants for ultra-low-rate applications like streamed speech. As the most widely adopted AAC profile, AAC-LC underpins the majority of deployments in consumer electronics, online media, and broadcast systems as of 2024, serving as the default format in platforms like Apple's ecosystem and MPEG-4 containers.

High-Efficiency AAC (HE-AAC)

High-Efficiency AAC (HE-AAC) builds upon the AAC Low Complexity (AAC-LC) profile by incorporating Spectral Band Replication (SBR) as a bandwidth-extension tool to enhance performance at low bitrates, enabling the reconstruction of high-frequency content that would otherwise require excessive bits in the core codec. The core AAC encoder handles the low-frequency band up to approximately 8 kHz, while SBR generates the higher frequencies, typically in the 4-16 kHz range, by replicating spectral patterns from the low band and adjusting them with transmitted parameters. Specifically, the SBR mechanism transmits quantized spectral envelope data to shape the amplitude characteristics, noise-floor levels to model stochastic components, and inverse filtering and tonality flags to identify and adjust tonal structures, with gain factors applied per subband (usually 8-64 bands) to refine the perceptual reconstruction during decoding. This parametric approach exploits the correlation between low and high frequencies in audio signals, allowing decoders to synthesize high-band content efficiently without direct transmission of all spectral details. HE-AAC version 1, standardized in 2003 as part of MPEG-4 Audio Amendment 1, combines the AAC core with SBR to achieve good quality at 24-48 kbps, roughly half the bitrate needed for comparable performance with AAC-LC alone, which typically requires around 96 kbps for similar perceptual results. HE-AAC version 2, released in 2006 via Amendment 2 to ISO/IEC 14496-3:2005, optionally integrates Parametric Stereo (PS) to further compress the stereo image into a compact set of spatial parameters, such as inter-channel intensity differences and phase shifts, enabling high-quality stereo encoding below 24 kbps without significant loss in spatial perception. PS operates on the downmixed mono signal from the AAC core, transmitting only a few parameters per frame to guide the upmixing process, thus reducing bitrate overhead for stereo at ultra-low rates.
This efficiency makes HE-AAC particularly suited to mobile streaming and bandwidth-constrained applications, where it delivers near-transparent audio quality at rates as low as 32 kbps for stereo content. Perceptual listening tests using the MUSHRA methodology have shown that HE-AAC achieves subjective quality equivalent to MP3 at roughly half the bitrate, with mean opinion scores (MOS) indicating superior performance over legacy codecs such as MP3 and WMA at 24-48 kbps for both music and speech. For instance, in standardized evaluations, HE-AAC v2 at 24 kbps stereo matched the perceptual quality of 48 kbps MP3, highlighting its value for efficient delivery in digital broadcasting and internet streaming.

Other Specialized Profiles

Advanced Audio Coding (AAC) encompasses several specialized profiles tailored for niche applications, extending the core technology to address specific requirements in delay-sensitive communication, immersive audio, hybrid speech-music coding, and scalable lossless compression. These variants build on AAC's modular framework, incorporating targeted tools to optimize performance in specialized scenarios without compromising the format's foundational perceptual coding principles. AAC-ELD, or Enhanced Low Delay AAC, is designed for full-duplex communication systems where minimal latency is critical, achieving algorithmic delays as low as 2.5 ms at certain sample rates and under 10 ms in typical configurations for real-time applications. This profile, defined as Audio Object Type 39 in the MPEG-4 Audio standard, employs a reduced window size and algorithmic optimizations to minimize delay while maintaining high-fidelity audio quality comparable to full-bandwidth codecs, supporting bitrates from 24 to 128 kbit/s. It was standardized in ISO/IEC 14496-3 as part of the Low Delay AAC v2 Profile, enabling Full-HD Voice capabilities with a frequency range up to 20 kHz. Adoption of AAC-ELD accelerated in mobile networks following its integration into Voice over LTE (VoLTE) specifications around 2012, when Fraunhofer IIS demonstrated its use for high-definition voice services over LTE, providing superior quality to traditional narrowband codecs at low bitrates. By enabling low-latency, wideband audio in cellular infrastructure, it supported enhanced conversational clarity in real-time telephony. In WebRTC implementations, AAC-ELD has been used for browser-based communication, offering interoperability in full-duplex scenarios like video calls, though usually alongside primary codecs like Opus.
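The latency figures above follow directly from frame size and sample rate; a small helper makes the arithmetic explicit. This is illustrative only: the lookahead values in the examples are assumptions for demonstration, not figures taken from the standard.

```python
def algorithmic_delay_ms(frame_samples, lookahead_samples, sample_rate_hz):
    """Delay contributed by frame buffering plus encoder lookahead, in ms."""
    return (frame_samples + lookahead_samples) * 1000.0 / sample_rate_hz

# Regular AAC-LC: 1024-sample frames plus a full frame of MDCT overlap
# (lookahead value assumed for illustration).
lc_delay = algorithmic_delay_ms(1024, 1024, 48000)

# A low-delay profile with short 480-sample frames and reduced overlap
# (zero lookahead assumed here to show the best case).
eld_delay = algorithmic_delay_ms(480, 0, 48000)
```

The comparison shows why shrinking both the frame length and the window overlap is the central design move of the low-delay profiles.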
MPEG-H 3D Audio represents an extension for immersive spatial audio, combining the USAC core with parametric spatial tools from MPEG Surround and higher-order Ambisonics to deliver 3D soundscapes suitable for 360-degree video and virtual reality experiences. Standardized under ISO/IEC 23008-3:2015, it supports up to 22.2 channels plus object-based audio with metadata for dynamic, personalized rendering in immersive environments. This profile facilitates bandwidth-efficient transmission of spatial content, achieving perceptual transparency at bitrates around 128-256 kbit/s per channel for complex scenes. USAC, or Unified Speech and Audio Coding, is a hybrid profile that seamlessly handles both speech and music content, making it ideal for versatile broadcasting and streaming applications. Defined in ISO/IEC 23003-3 and finalized in 2011, USAC employs a switched core architecture combining Algebraic Code-Excited Linear Prediction (ACELP) for speech, Transform Coded eXcitation (TCX) for mixed signals, and AAC-style transform coding for general audio, with optional Spectral Band Replication (SBR) for bandwidth extension. This allows efficient coding of mixed content at bitrates from as low as 8 kbit/s for speech up to 96 kbit/s for music, providing consistent quality across signal types. Its adoption in broadcast and mobile standards for enhanced audio services underscores its role in multimedia delivery. xHE-AAC, or Extended High-Efficiency AAC, further develops the HE-AAC and USAC toolset for ultra-low-bitrate scenarios, supporting bitrates from as low as 12 kbit/s for speech up to 500 kbit/s for immersive multichannel audio. Standardized in 2012 on the basis of USAC (ISO/IEC 23003-3), it includes advanced tools such as enhanced SBR, parametric stereo, and mandatory MPEG-D Dynamic Range Control (DRC) for consistent loudness. Backward compatible with HE-AAC v2 decoders, xHE-AAC is widely used in adaptive streaming (e.g., HLS, DASH), digital radio (DRM+), and platforms such as Meta's services as of 2025, enabling efficient delivery of high-fidelity audio in bandwidth-limited environments.
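The switched-core idea in USAC (classify each frame, then route it to a speech-oriented or transform coder) can be caricatured with a zero-crossing-rate heuristic. This is purely illustrative: real USAC signal classification is far more sophisticated, and the threshold below is an arbitrary assumption.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)) / (len(frame) - 1)

def choose_core(frame, zcr_threshold=0.25):
    """Route noisy, fricative-like frames to the ACELP speech path and
    tonal frames to the transform (TCX/AAC-style) path. Toy heuristic."""
    return "ACELP" if zero_crossing_rate(frame) > zcr_threshold else "TRANSFORM"

# A 220 Hz tone is strongly tonal, so it should take the transform path.
tone = [math.sin(2 * math.pi * 220 * n / 48000) for n in range(1024)]
# Rapidly alternating samples mimic a noisy fricative, favoring ACELP.
noise_like = [1.0 if n % 2 else -1.0 for n in range(1024)]
```

The point is architectural: a per-frame decision lets one codec apply the best tool for each signal type, which is what gives USAC consistent quality across speech and music.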
AAC-SLS, or Scalable Lossless Audio Coding, extends AAC to provide lossless compression layered atop a lossy core, using an integer approximation of the modified discrete cosine transform (IntMDCT) for a precise, reversible spectral representation. Specified in ISO/IEC 14496-3:2005 Amendment 3, it structures the bitstream with a base AAC layer for perceptual coding followed by enhancement layers that recover the exact original signal through entropy coding and noise compensation, supporting sample rates up to 192 kHz and word lengths up to 24 bits. The IntMDCT ensures bit-exact reconstruction without floating-point operations, enabling compression ratios around 2:1 for CD-quality audio while allowing scalability for progressive transmission. This profile is particularly valuable for archival and professional audio workflows requiring both efficiency and fidelity. As of 2025, these specialized profiles see targeted adoption: AAC-ELD remains prominent in low-latency telecom, while spatial extensions like MPEG-H 3D Audio see increasing use in AR/VR ecosystems for immersive content delivery, driven by standards like MPEG-I for 6DoF audio rendering, though overall penetration remains limited compared to core AAC variants due to computational demands and ecosystem maturity. xHE-AAC continues to gain traction in streaming and broadcast for its efficiency at low bitrates.
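The layered idea behind AAC-SLS, a lossy core plus an enhancement layer that restores the exact signal, can be sketched with integer arithmetic. Real SLS layers entropy-coded IntMDCT residuals; this toy works directly on PCM samples, and all names are illustrative.

```python
def encode_scalable(samples, step=64):
    """Toy two-layer scalable coding in the spirit of AAC-SLS: a coarse
    'core' layer plus an integer residual layer that restores the exact
    input when both layers are received."""
    core = [round(s / step) for s in samples]                 # lossy core layer
    residual = [s - c * step for c, s in zip(core, samples)]  # enhancement layer
    return core, residual

def decode_lossless(core, residual, step=64):
    """Both layers present: bit-exact reconstruction."""
    return [c * step + r for c, r in zip(core, residual)]

def decode_core_only(core, step=64):
    """Base layer alone: a lossy preview, off by at most step/2 per sample."""
    return [c * step for c in core]
```

This captures the scalability property: a receiver that only gets the base layer still decodes a usable signal, while the full bitstream reconstructs the original exactly.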

Licensing and Patents

Patent Holders and Pools

The primary patent holders for Advanced Audio Coding (AAC) include Fraunhofer IIS, which owns core patents on the foundational encoding framework, and Dolby Laboratories, which holds key patents for extensions such as Spectral Band Replication (SBR) and Parametric Stereo (PS) used in High-Efficiency AAC (HE-AAC). Other significant contributors include Sony Corporation, Nokia Corporation, and further holders of essential intellectual property, who together assembled a comprehensive portfolio of patents during the standard's development in the 1990s. The Via Licensing Alliance (Via LA) and its predecessors have administered the AAC patent pool since 1998, offering a unified licensing mechanism that simplifies access to essential patents from over a dozen licensors, including the major holders above, to promote broad implementation without fragmented bilateral negotiations. This one-stop approach covers AAC-LC, HE-AAC, and related profiles, with royalty rates structured on a tiered per-unit basis: for example, $0.98 per unit for the first 500,000 units in high-revenue regions, decreasing to $0.20 per unit for volumes exceeding 10 million units, plus a one-time $15,000 administrative fee. Following the 2023 merger of Via Licensing and MPEG LA, Via LA now manages the program exclusively, ensuring continuity while incorporating updates like MPEG-D Dynamic Range Control for enhanced compatibility. In the early 2000s, the pool's establishment mitigated potential barriers by centralizing licensing, averting the fragmented disputes seen in prior audio standards like MP3, and by 2005, cross-licensing arrangements among holders further streamlined implementation.
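Tiered per-unit royalties of the kind quoted above can be computed mechanically. In the sketch below, only the first and last rates come from the text; the intermediate tier boundary and rate are invented placeholders, since the full published fee schedule is not reproduced here.

```python
def tiered_royalty(units, tiers):
    """Compute a per-unit tiered royalty. `tiers` is a list of
    (tier_ceiling_or_None, rate) pairs in ascending order; units falling
    within each tier are charged at that tier's rate (None marks the
    unbounded final tier)."""
    total, prior_ceiling = 0.0, 0
    for ceiling, rate in tiers:
        in_tier = (units if ceiling is None else min(units, ceiling)) - prior_ceiling
        if in_tier <= 0:
            break
        total += in_tier * rate
        prior_ceiling = ceiling if ceiling is not None else units
    return total

# First and last rates from the text; the middle tier is a placeholder.
EXAMPLE_TIERS = [(500_000, 0.98), (10_000_000, 0.45), (None, 0.20)]
```

Note that each tier's rate applies only to the units inside that tier, so the marginal cost per unit falls as volume grows.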
As of 2025, numerous core AAC patents have expired or are nearing the end of their terms, with the final baseline patents projected to lapse by 2027, lowering overall costs and easing open-source distribution through projects such as Fraunhofer's FDK AAC library, which makes a reference-quality AAC-LC implementation available for use in compliant products. This evolving landscape has facilitated AAC's integration into billions of devices, underscoring the pool's role in sustaining its ubiquity.

Implementation Challenges

One major implementation challenge for AAC lies in its computational demands, particularly for advanced profiles like HE-AAC, which exhibit higher encoder complexity than earlier codecs such as MP3. Encoding with HE-AAC typically requires significantly more processing cycles (often estimated at 2-3 times those of MP3 at equivalent bitrates) due to additional tools like Spectral Band Replication (SBR) and Parametric Stereo, which improve efficiency but increase algorithmic overhead. Decoder complexity remains moderate for HE-AAC, but real-time applications on resource-constrained devices demand optimizations such as SIMD instructions (e.g., SSE or NEON) to accelerate transform operations like the modified discrete cosine transform (MDCT), reducing cycle counts by up to 50% in optimized implementations. Compatibility issues arise from fragmentation across AAC profiles (e.g., LC, Main, LTP, HE-AAC), requiring decoders to handle multiple variants for broad interoperability, which can increase code size and maintenance effort in software implementations. To mitigate playback failures, many decoders incorporate fallback mechanisms, such as gracefully degrading to AAC-LC when encountering unsupported extensions like SBR in legacy streams. This profile diversity, while enabling specialized use cases, complicates deployment in embedded systems, where unified decoding is preferred to minimize memory footprint and power consumption. Quality tuning varies across encoders, as perceptual models and quantization strategies differ, potentially introducing audible artifacts like pre-echo or inconsistent noise shaping at low bitrates. Benchmarks from 3GPP evaluations highlight this, showing that poorly tuned AAC encoders can underperform by 1-2 Mean Opinion Score (MOS) points compared to reference implementations on critical test items, underscoring the need for iterative psychoacoustic optimization to balance bitrate and transparency.
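The fallback behaviour described above can be sketched as a small capability check. The names and string tags are hypothetical; real decoders read the audioObjectType from the stream's AudioSpecificConfig rather than comparing strings.

```python
# Hypothetical capability set for an LC-only decoder.
SUPPORTED = {"AAC-LC"}

def plan_decode(stream_object_type):
    """Pick a decode strategy, falling back to the AAC-LC core when an
    extension (e.g. SBR in HE-AAC) is not supported. An LC-only decoder
    that skips the SBR payload outputs only the core band, at half the
    full sample rate of the HE-AAC stream."""
    if stream_object_type in SUPPORTED:
        return ("full", stream_object_type)
    if stream_object_type in ("HE-AAC", "HE-AAC v2"):
        return ("fallback-core", "AAC-LC")  # SBR/PS extension data is ignored
    return ("unsupported", None)
```

This works because HE-AAC streams embed a standard AAC-LC core, which is exactly what makes graceful degradation possible.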
Ecosystem gaps further hindered adoption, particularly limited hardware support in legacy devices from the 2000s, where MP3 decoders dominated due to simpler integration, delaying the shift to AAC despite its superior efficiency. Transitioning embedded systems often hinged on firmware updates, as older chipsets lacked native AAC acceleration, forcing software-only decoding that strained battery life and CPU resources. Solutions include reference software like the Fraunhofer FDK AAC library, which provides a low-resource, optimized implementation supporting multiple profiles with efficient SIMD usage for real-time encoding and decoding on platforms like Android. Ongoing MPEG updates, such as the xHE-AAC extension, address efficiency through enhanced tools like unified speech-audio coding, reducing complexity while maintaining audio quality.

Applications and Integration

Container Formats

Advanced Audio Coding (AAC) bitstreams are typically encapsulated in standardized container formats to facilitate storage, streaming, and playback, with the choice of container influencing features like seeking, metadata support, and compatibility with playback systems. The primary container for AAC is the MP4 format, based on the ISO base media file format (ISO/IEC 14496-12), which uses the 'mp4a' identifier for AAC audio tracks. This format supports efficient indexing and seeking through atoms such as stts (decoding time to sample) and stsc (sample to chunk), enabling precise navigation within audio files without decoding the entire stream. MP4 containers, often carrying the .m4a extension for audio-only files, are widely used for distribution due to their robustness and integration with ecosystems such as Apple's. For streaming applications, particularly in MPEG-2 Transport Streams (MPEG-TS), AAC employs the Audio Data Transport Stream (ADTS) format as specified in ISO/IEC 13818-7 and ISO/IEC 14496-3. ADTS packages each Access Unit (AU), a complete AAC frame or set of frames, with a dedicated header that includes profile information, sampling rate, channel configuration, and an optional cyclic redundancy check (CRC) for error detection, making it suitable for real-time transmission over networks. This per-AU header structure embeds the AAC bitstream syntax directly, allowing decoders to parse frames sequentially with minimal overhead. Other containers include 3GP, a mobile-optimized variant derived from the ISO base media file format (ISO/IEC 14496-12) and specified by 3GPP for third-generation mobile services, which supports AAC-LC at low bit rates for efficient playback on handheld devices. Experimental encapsulation of AAC in Ogg containers has been implemented in open-source tools like FFmpeg, though it lacks formal standardization and is primarily used in niche applications. FLAC, by contrast, serves as a lossless container for its native codec and is not designed to wrap AAC streams, limiting such use to custom or converter-based scenarios.
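Because the ADTS header layout is fixed and compact, packing and parsing it is straightforward. Below is a sketch under the assumptions of protection_absent=1 (no CRC, so a 7-byte header) and one raw data block per frame; the function names are illustrative.

```python
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                24000, 22050, 16000, 12000, 11025, 8000, 7350]

def pack_adts_header(object_type, rate, channels, payload_len):
    """Build a 7-byte ADTS header. The 2-bit profile field carries the
    MPEG-4 audio object type minus 1 (e.g. AAC-LC, object type 2 -> 1)."""
    sf_index = SAMPLE_RATES.index(rate)
    frame_len = payload_len + 7  # frame length field includes the header
    bits = 0
    for value, width in [
        (0xFFF, 12),           # syncword
        (0, 1),                # ID: MPEG-4
        (0, 2),                # layer, always 0
        (1, 1),                # protection_absent: no CRC
        (object_type - 1, 2),  # profile
        (sf_index, 4),         # sampling_frequency_index
        (0, 1),                # private bit
        (channels, 3),         # channel_configuration
        (0, 4),                # original/home/copyright bits
        (frame_len, 13),       # aac_frame_length
        (0x7FF, 11),           # buffer fullness: VBR sentinel
        (0, 2),                # one raw data block per frame
    ]:
        bits = (bits << width) | value
    return bits.to_bytes(7, "big")

def parse_adts_header(b):
    """Recover (object_type, sample_rate, channels, frame_length)."""
    assert (b[0] << 4) | (b[1] >> 4) == 0xFFF, "bad syncword"
    object_type = ((b[2] >> 6) & 0x3) + 1
    rate = SAMPLE_RATES[(b[2] >> 2) & 0xF]
    channels = ((b[2] & 0x1) << 2) | ((b[3] >> 6) & 0x3)
    frame_len = ((b[3] & 0x3) << 11) | (b[4] << 3) | ((b[5] >> 5) & 0x7)
    return object_type, rate, channels, frame_len
```

The per-frame header is what lets a receiver tune into an MPEG-TS broadcast mid-stream: it scans for the 0xFFF syncword and can begin decoding at the next frame boundary.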
Metadata in AAC containers, particularly MP4, follows iTunes-style tagging (an Apple extension to ISO/IEC 14496-12) stored in the 'udta' atom, supporting fields like artist, album, and artwork for enhanced organization and playback. In multimedia files, AAC audio synchronizes with video tracks in containers like Matroska (MKV) and Audio Video Interleave (AVI) via timestamp-based alignment, where MKV uses EBML elements for precise interleaving and AVI relies on RIFF chunk indexing to maintain lip-sync. For live streaming, best practices emphasize low-latency containers like ADTS in MPEG-TS or fragmented MP4 (fMP4) in HTTP Live Streaming (HLS), where segment durations under 2 seconds and minimal buffering reduce end-to-end latency to approximately 5-10 seconds while preserving AAC's perceptual quality.
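The 5-10 second figure follows from a simple latency budget: encode/packaging time, plus the handful of segments a player typically buffers behind the live edge, plus delivery. A rough model, where all the overhead defaults are assumptions for illustration rather than measured values:

```python
def estimated_live_latency_s(segment_duration_s, buffered_segments=3,
                             encode_s=0.5, delivery_s=0.5):
    """Rough glass-to-glass latency for segmented HLS-style delivery:
    encode/packaging overhead, plus the segments the player keeps
    buffered behind the live edge, plus network delivery. Coarse
    illustration only; real deployments vary widely."""
    return encode_s + segment_duration_s * buffered_segments + delivery_s
```

With 2-second segments the model lands at about 7 seconds, inside the 5-10 second range, and it makes clear why segment duration dominates: shortening segments shrinks the buffered backlog proportionally.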

Broadcasting and Transmission

Advanced Audio Coding (AAC) plays a pivotal role in broadcasting and transmission standards, enabling efficient audio delivery over terrestrial, satellite, and mobile networks with high quality at constrained bitrates. Its variants, particularly the high-efficiency profiles, are integrated into various international standards to support both fixed and mobile reception, optimizing for bandwidth limitations and channel conditions. In Japan's Integrated Services Digital Broadcasting-Terrestrial (ISDB-T) system, launched in 2003, the one-segment (1seg) mobile service, introduced in 2006, mandates High-Efficiency AAC (HE-AAC) for audio coding to accommodate low-bitrate transmission on handheld devices. Typical bitrates for HE-AAC in 1seg range from 32 to 64 kbps, ensuring robust performance in mobile environments while maintaining perceptual audio quality. The Digital Video Broadcasting (DVB) standards, widely adopted in Europe, incorporate AAC audio in both DVB-T (terrestrial) and DVB-H (handheld) specifications, facilitating mobile TV services. AAC supports multichannel configurations up to 5.1 surround, allowing broadcasters to deliver immersive audio experiences within the transport stream framework. Digital Audio Broadcasting Plus (DAB+), an enhanced version of DAB introduced in 2007, specifies HE-AAC version 2 (HE-AAC v2) as the core audio codec to achieve superior efficiency over legacy MPEG Layer II. Operating at bitrates typically between 32 and 96 kbps, HE-AAC v2 enables up to four stereo audio services per ensemble, supporting multilingual broadcasting and higher quality at reduced data rates. The ATSC 3.0 standard for next-generation terrestrial television includes support for AAC, aligning with its 2020 rollout to enhance over-the-air capabilities. This integration allows flexible audio delivery in IP-based streams, complementing primary codecs like Dolby AC-4 for immersive sound. In wireless communications, the Enhanced Voice Services (EVS) codec was standardized for Voice over LTE (VoLTE), while AAC Enhanced Low Delay (AAC-ELD) offers a complementary option for full-bandwidth conversational audio with low latency.
AAC-ELD enhances error resilience in fading channels through techniques such as built-in error concealment, supporting reliable transmission in mobile networks.

Hardware and Software Implementations

Advanced Audio Coding (AAC) has been implemented in a wide array of software libraries and frameworks, enabling encoding and decoding across platforms. FFmpeg, a prominent open-source framework, integrates the libfdk-aac encoder, renowned for high-quality AAC encoding and supporting profiles such as AAC-LC and HE-AAC. Apple's Core Audio framework provides native AAC encoding and decoding, optimized for macOS and iOS devices, and underpins AAC handling in Apple's media applications. For decoding, the open-source FAAD2 library offers robust AAC support, including error resilience features, and is widely used in embedded systems and media players. Hardware implementations of AAC are prevalent in consumer electronics, particularly mobile and home entertainment devices. Qualcomm's Snapdragon processors, found in many Android smartphones and tablets, incorporate dedicated hardware acceleration for AAC decoding, enabling efficient real-time playback of high-bitrate audio streams. Realtek's audio codecs, such as the ALC series used in smart TVs and soundbars, support AAC decoding natively, contributing to widespread adoption in digital broadcasting receivers. These hardware solutions often leverage digital signal processors (DSPs) to handle AAC's perceptual coding algorithms with minimal power consumption. Notable software encoders include the AAC encoder in Apple's iTunes (now the Music app), which offers multiple quality settings that let users balance file size against audio fidelity. Since Android 2.3 (Gingerbread) in 2010, AAC has served as the default audio codec for media playback on the platform, with hardware-accelerated decoding in most devices.
Performance benchmarks for AAC implementations highlight their efficiency; for instance, DSP-based decoders from the Fraunhofer Society achieve real-time decoding of stereo AAC at 128 kbps on processors running at 100 MHz, demonstrating suitability for resource-constrained environments like early mobile phones. As of 2025, AAC continues to evolve through integration in emerging technologies. Browser support has matured, with major browsers providing full HE-AAC decoding since around 2017, including the v2 profile for enhanced efficiency in web audio applications. Apple Safari has long supported HE-AAC, enabling high-fidelity streaming in web-based media players on macOS and iOS.
