Adaptive Multi-Rate audio codec
| Adaptive Multi-Rate (AMR) | |
|---|---|
| Filename extension | .amr, .3ga |
| Internet media type | audio/amr, audio/3gpp, audio/3gpp2 |
| Initial release | 23 June 1999[1][2] |
| Latest release | 14.0.0 (17 March 2017) |
| Type of format | Lossy audio |
| Open format? | Yes |
| Free format? | No |
The Adaptive Multi-Rate (AMR, AMR-NB or GSM-AMR) audio codec is an audio compression format optimized for speech coding. AMR is a multi-rate narrowband speech codec that encodes narrowband (200–3400 Hz) signals at variable bit rates ranging from 4.75 to 12.2 kbit/s with toll quality[3] speech starting at 7.4 kbit/s.[4]
AMR was adopted as the standard speech codec by 3GPP in October 1999 and is now widely used in GSM[5] and UMTS. It uses link adaptation to select from one of eight different bit rates based on link conditions.
AMR is also a file format for storing spoken audio using the AMR codec. Many modern mobile telephone handsets can store short audio recordings in the AMR format, and both free and proprietary programs exist (see Software support) to convert between this and other formats, although AMR is a speech format and is unlikely to give ideal results for other audio. The common filename extension is .amr. There is also another storage format for AMR that suits applications with more advanced demands on the storage format, such as random access or synchronization with video: the 3GPP-specified 3GP container format, based on the ISO base media file format.[6]
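The .amr storage format is specified in RFC 4867: a magic number followed by a sequence of frames, each consisting of a one-byte header that carries the 4-bit frame type (FT) and the frame quality bit (Q), then the speech bits padded to a whole number of octets. The reader below is a minimal sketch for single-channel AMR-NB files only, assuming the per-mode bit counts listed under Features (plus 39 bits for an AMR SID frame). The names `parse_amr_frames` and `SPEECH_BITS` are hypothetical, and multichannel files and other SID frame types are rejected rather than handled.

```python
AMR_MAGIC = b"#!AMR\n"  # magic number of a single-channel AMR-NB file (RFC 4867)

# Speech bits per frame type (FT): 0-7 are the eight speech modes (4.75 ... 12.2 kbit/s),
# 8 is an AMR SID (comfort noise) frame and 15 is NO_DATA. Other FT values are rejected.
SPEECH_BITS = {0: 95, 1: 103, 2: 118, 3: 134, 4: 148, 5: 159, 6: 204, 7: 244, 8: 39, 15: 0}


def parse_amr_frames(data: bytes):
    """Split an .amr file into (frame_type, quality_ok, payload) tuples (sketch only)."""
    if not data.startswith(AMR_MAGIC):
        raise ValueError("not a single-channel AMR-NB file")
    frames = []
    pos = len(AMR_MAGIC)
    while pos < len(data):
        header = data[pos]                 # layout: P FT(4 bits) Q P P
        ft = (header >> 3) & 0x0F          # frame type
        quality_ok = bool((header >> 2) & 0x01)
        if ft not in SPEECH_BITS:
            raise ValueError(f"unsupported frame type {ft} at offset {pos}")
        nbytes = (SPEECH_BITS[ft] + 7) // 8  # speech bits padded to whole octets
        payload = data[pos + 1 : pos + 1 + nbytes]
        if len(payload) != nbytes:
            raise ValueError(f"truncated frame at offset {pos}")
        frames.append((ft, quality_ok, payload))
        pos += 1 + nbytes
    return frames
```

Because every frame covers 20 ms, the playing time of a file is simply `len(parse_amr_frames(data)) * 20` milliseconds. A 12.2 kbit/s frame, for example, occupies 1 header byte plus 31 payload bytes.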
Usage
Each frame contains 160 samples and is 20 milliseconds long.[1] AMR uses techniques such as algebraic code-excited linear prediction (ACELP), discontinuous transmission (DTX), voice activity detection (VAD) and comfort noise generation (CNG). Using AMR effectively requires link adaptation that selects the codec mode best suited to the local radio channel conditions and capacity requirements. When radio conditions are poor, fewer bits are spent on source coding and more on channel coding. This improves the robustness of the connection at the cost of some voice clarity; for AMR, the improvement amounts to roughly 4–6 dB of signal-to-noise ratio for usable communication. Link adaptation also allows the network operator to prioritize capacity or quality per base station.
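Link adaptation can be pictured as a threshold rule on a channel-quality estimate. The sketch below assumes a scheme of the kind used in GSM, where a call uses an active set of up to four AMR modes and moves between neighboring modes when a carrier-to-interference (C/I) estimate crosses configured thresholds, with hysteresis to avoid rapid toggling. The active set, the threshold and hysteresis values, and the function names are all hypothetical, chosen only for illustration.

```python
# Hypothetical active codec set (kbit/s), ordered from most robust to highest quality.
ACTIVE_SET = [4.75, 5.90, 7.40, 12.2]
# Hypothetical C/I thresholds in dB: THRESHOLDS[i] separates ACTIVE_SET[i] from ACTIVE_SET[i + 1].
THRESHOLDS = [6.0, 9.0, 12.0]
HYSTERESIS = 2.0  # dB, hypothetical


def select_mode(current: int, quality_db: float,
                thresholds=THRESHOLDS, hysteresis=HYSTERESIS) -> int:
    """Return the active-set index for the next frame, moving at most one step."""
    # Channel degraded: step down to a lower-rate mode, leaving more room for channel coding.
    if current > 0 and quality_db < thresholds[current - 1]:
        return current - 1
    # Channel clearly good: step up, but only once the threshold is beaten by the hysteresis margin.
    if current < len(thresholds) and quality_db >= thresholds[current] + hysteresis:
        return current + 1
    return current


def run(qualities, start: int = 0):
    """Apply select_mode to a sequence of C/I estimates and return the chosen bit rates."""
    mode = start
    trace = []
    for q in qualities:
        mode = select_mode(mode, q)
        trace.append(ACTIVE_SET[mode])
    return trace
```

For example, `run([15, 15, 15, 15, 8, 8, 4])` climbs from 4.75 to 12.2 kbit/s one step at a time while the channel is good, then steps back down as C/I drops. With the assumed 2 dB hysteresis, a C/I of 10 dB keeps a call at 5.90 kbit/s rather than promoting it to 7.40 kbit/s.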
There are 14 AMR codec modes in total: eight are available on a full-rate (FR) channel and six on a half-rate (HR) channel.
| Mode | Bitrate (kbit/s) | Channel | Compatible with |
|---|---|---|---|
| AMR_12.20 | 12.20 | FR | ETSI GSM enhanced full rate |
| AMR_10.20 | 10.20 | FR | |
| AMR_7.95 | 7.95 | FR/HR | |
| AMR_7.40 | 7.40 | FR/HR | TIA/EIA IS-641 TDMA enhanced full rate |
| AMR_6.70 | 6.70 | FR/HR | ARIB 6.7 kbit/s enhanced full rate |
| AMR_5.90 | 5.90 | FR/HR | |
| AMR_5.15 | 5.15 | FR/HR | |
| AMR_4.75 | 4.75 | FR/HR | |
| AMR_SID | 1.80 | FR/HR | |
Features
- Sampling frequency 8 kHz/13-bit (160 samples for 20 ms frames), filtered to 200–3400 Hz.
- The AMR codec uses eight source codecs with bit-rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s.
- Generates frames of 95, 103, 118, 134, 148, 159, 204 or 244 bits for the AMR FR bit rates of 4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2 or 12.2 kbit/s, respectively (bit rate × 20 ms). AMR HR frame lengths are different.
- AMR utilizes discontinuous transmission (DTX), with voice activity detection (VAD) and comfort noise generation (CNG) to reduce bandwidth usage during silence periods
- Frames are 20 ms long. The 12.2 kbit/s mode needs no algorithmic look-ahead, while the other modes use a 5 ms look-ahead. The 12.2 kbit/s mode nevertheless includes a 5 ms "dummy" look-ahead so that the codec can switch seamlessly between modes from frame to frame, giving the same 25 ms algorithmic delay in every mode.
- AMR is a hybrid speech coder, and as such transmits both speech parameters and a waveform signal
- Linear predictive coding (LPC) is used to synthesize the speech from a residual waveform. The LPC parameters are encoded as line spectral pairs (LSP).
- The residual waveform is coded using algebraic code-excited linear prediction (ACELP).
- The complexity of the algorithm is rated at 5, using a relative scale where G.711 is 1 and G.729a is 15.
- PSQM testing under ideal conditions yields mean opinion scores of 4.14 for AMR (12.2 kbit/s), compared to 4.45 for G.711 (μ-law)[citation needed]
- PSQM testing under network stress yields mean opinion scores of 3.79 for AMR (12.2 kbit/s), compared to 4.13 for G.711 (μ-law)
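The LPC parameters listed above come from solving the linear-prediction normal equations, conventionally with the Levinson-Durbin recursion. The sketch below shows that recursion on an unwindowed autocorrelation and reports the resulting prediction gain (signal energy over residual energy, in dB). It is a generic textbook version, not the AMR reference code, which additionally windows the signal and converts the coefficients to line spectral pairs; the helper names `autocorrelation`, `levinson_durbin` and `prediction_gain_db` are hypothetical.

```python
import math


def autocorrelation(x, order):
    """Return r[0..order] with r[k] = sum over n of x[n] * x[n - k] (no window applied)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a predictor x_hat[n] = sum_i a[i] * x[n - i].

    Returns (a[1..order], reflection coefficients, final residual energy).
    """
    if r[0] <= 0:
        raise ValueError("r[0] must be positive (silent input has no LPC model)")
    a = [0.0] * (order + 1)  # a[0] is unused so indices match the equations
    reflection = []
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # input is perfectly predictable at a lower order
        # Correlation at lag i not yet explained by the order-(i - 1) predictor
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        reflection.append(k)
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k  # residual energy can only shrink as the order grows
    return a[1:], reflection, err


def prediction_gain_db(r0, residual_energy):
    """Prediction gain: 10 * log10(signal energy / residual energy)."""
    return 10.0 * math.log10(r0 / residual_energy)
```

As a check, a first-order autoregressive signal with normalized autocorrelation r[k] = 0.9^k gives a[1] = 0.9, all higher coefficients close to zero, a residual energy of 0.19 and a prediction gain of about 7.2 dB. In an encoder, `r` would be computed from each windowed 30 ms analysis segment with `order=10`.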
Licensing and patent issues
AMR codecs incorporate several patents of Nokia, Ericsson, NTT and VoiceAge,[7][8] the last of which is the License Administrator for the AMR patent pools. VoiceAge also accepts submissions of patents for determination of their possible essentiality to these standards.[9][10]
The initial fee for professional content creation tools and "real-time channel" products is US$6,500.[when?] The minimum annual royalty is $10,000, which, in the first year, excludes the initial fee. Per-channel license fees fall from $0.99 to $0.50 with volume, up to a maximum of $2 million annually.[7][8]
In the category of personal computer products, e.g., media players, the AMR decoder is licensed for free. The license fee for a sold encoder falls from $0.40 to $0.30 with volume, up to a maximum of $300,000 annually. The minimum annual royalty is not applied to licensed products that fall under the category of personal computer products and use only the free decoder.[7][8]
More information:
- VoiceAge licensing information, including pricing to license the AMR codecs
- 3GPP legal issues
- The 3G Patent Platform and its licensing policy
- AMR Codecs as Shared Libraries – legal notices for usage of the amrnb and amrwb libraries based on the reference implementation
Software support
- 3GPP TS 26.073 – AMR speech Codec (C source code) – reference implementation[11]
- Audacity (beta version 1.3) via the FFmpeg integration libraries[12] (both input and output format)
- FFmpeg with OpenCORE AMR libraries[13]
- Android[14] (used by the voice recorder)
- AMR Codecs as Shared Libraries – amrnb and amrwb libraries development site. These libraries are based on the reference implementation and were created to prevent embedding of possibly patented source code into many open source projects.
- Open source software to convert the .amr format: RetroCode and Amr2Wav, both in an early stage of development
- AMR Player is freeware that plays AMR audio files and can convert AMR to and from MP3 and WAV.
- Nokia Multimedia Converter 2.0 can create both .amr and .awb files. It also works on Windows 7 if its setup is run in Windows XP compatibility mode.
- MPlayer (SMPlayer, KMPlayer[15])
- Parole Media Player 0.8.1 (in Ubuntu 16.04)
- QuickTime Player and multimedia framework
- RealPlayer version 11 and later
- VLC media player version 1.1.0 and later (input format only, not output format)
- ffdshow
- Apple iPhone (can play back AMR files)
- iOS & macOS (iMessage)
- BlackBerry smartphones (used as the voice recorder file format, although BlackBerry 10 cannot play AMR)
- K-Lite Codec Pack
- Media Player Classic Home Cinema (since around version 1.7.1)
- foobar2000 with the component foo_input_amr
References
[edit]- ^ a b "3GPP TS 26.090 - Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Transcoding functions". 3GPP. Retrieved 2010-07-21.
- ^ "3GPP TS 26.071 - Mandatory speech CODEC speech processing functions; AMR speech Codec; General description". 3GPP. Retrieved 2010-07-21.
- ^ "What's toll-quality voice?". ITworld. 13 December 2000. Retrieved 26 July 2019.
- ^ RFC 4867 - RTP Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs Page 35
- ^ "Sorting Through GSM Codecs: A Tutorial". 11 July 2003.
- ^ RFC 4867 - RTP Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs Page 35
- ^ a b c VoiceAge Corporation (2007-10-14). "AMR Licensing Terms". VoiceAge Corporation. Archived from the original on 2007-10-14. Retrieved 2009-09-12.
- ^ a b c VoiceAge Corporation (June 2007). "AMR Licensing Terms". VoiceAge Corporation. Archived from the original on 2007-10-14. Retrieved 2009-09-12.
- ^ VoiceAge Corporation. "Licensing - Patent Calls". VoiceAge Corporation. Archived from the original on 2007-10-14. Retrieved 2009-09-12.
- ^ VoiceAge Corporation (2007-10-14). "Licensing - Patent Calls". Archived from the original on 2007-10-14. Retrieved 2009-09-12.
- ^ 3GPP (2008-12-11) 3GPP TS 26.073 - AMR speech Codec, Retrieved 2009-09-08
- ^ Retrieved on 2010-02-28
- ^ FFmpeg General Documentation - AMR external library, Retrieved on 2009-07-08
- ^ Android AMR codecs, Retrieved on 2009-07-08 Archived February 18, 2009, at the Wayback Machine
- ^ KMPlayer Internal Audio Decoder Preferences Archived 2014-10-22 at the Wayback Machine, Retrieved 2014-10-22
External links
- 3GPP TS 26.090 – Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Transcoding functions
- 3GPP TS 26.071 – Mandatory Speech Codec speech processing functions; AMR Speech Codec; General Description
- 3GPP codecs specifications; 3G and beyond / GSM, 26 series
- RFC 4867 – RTP Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs
- RFC 4281 – The Codecs Parameter for "Bucket" Media Types
Background and Development
Historical Context
The development of the Adaptive Multi-Rate (AMR) audio codec was initiated in the late 1990s by the European Telecommunications Standards Institute (ETSI) and subsequently adopted by the 3rd Generation Partnership Project (3GPP) after its formation in 1998. It was driven by growing demand for efficient speech compression suited to second-generation (2G) and emerging third-generation (3G) mobile networks.[3] Work began in earnest when the AMR standardization effort was launched at ETSI's SMG#23 meeting in October 1997, initially focusing on narrowband coding to address limitations in existing mobile telephony systems.[4] Prototypes and feasibility studies evaluating multi-rate approaches followed through 1998. The process included two competitive selection phases, culminating in October 1998 when ETSI chose a codec jointly proposed by a group of companies including Ericsson and Nokia.[5][6]

A key motivation for AMR was the need to improve voice quality under variable radio channel conditions, such as those encountered in Global System for Mobile Communications (GSM) and Universal Mobile Telecommunications System (UMTS) networks, where interference or fading can severely degrade audio fidelity.[7] Unlike earlier fixed-rate codecs, AMR was designed to adjust its bit rate and error-protection level dynamically in response to channel quality and traffic load, optimizing both speech intelligibility and capacity.[7] This adaptive strategy was intended to supersede older codecs such as GSM Full Rate (FR) and Enhanced Full Rate (EFR), which performed poorly in adverse conditions despite their widespread adoption in 2G systems.[8]

At its core, AMR relies on Algebraic Code-Excited Linear Prediction (ACELP), a technique that builds on earlier advances in efficient speech coding.[2] It is closely related to the Conjugate-Structure ACELP (CS-ACELP) coder standardized as ITU-T G.729 in 1996, which uses algebraic pulse structures to reduce computational complexity while maintaining high-quality narrowband speech at 8 kbit/s. AMR extended this ACELP foundation into a multi-mode system with variable-rate operation tailored to mobile constraints, which positioned it for formal adoption by 3GPP in 1999.

Standardization Process
The Adaptive Multi-Rate (AMR) narrowband speech codec, selected by ETSI in October 1998 after competitive evaluation, was adopted by 3GPP in 1999 as the mandatory speech codec for the Release 1999 specifications. This enabled its use in GSM/EDGE (Enhanced Data rates for GSM Evolution) and UMTS networks to improve speech quality under variable channel conditions.[9] The adoption marked a shift away from fixed-rate codecs and made AMR a foundational element of circuit-switched voice services in 2.5G and 3G mobile systems.[10]

The core technical specifications for AMR are documented in a series of 3GPP Technical Specifications (TS), harmonized with ETSI standards. ETSI TS 126.073 provides the ANSI C source code implementation of the codec, including details on frame processing and integration aspects.[11] Complementary documents include 3GPP TS 26.101 (ETSI TS 126.101), which defines the generic frame structure for AMR and GSM Enhanced Full Rate (EFR) payloads, and TS 26.091 (ETSI TS 126.091), which specifies error concealment procedures, including frame substitution and muting, for handling lost or corrupted frames.[12][13] Together, these specifications ensure bit-exact interoperability across implementations.
The wideband extension, AMR-WB, went through a separate standardization process: 3GPP selected the codec in December 2000 and approved its specifications in March 2001 as part of Release 5. It was subsequently adopted as ITU-T Recommendation G.722.2 in 2002. Unlike narrowband AMR, AMR-WB supports a 7 kHz audio bandwidth for higher-fidelity speech.[14][15] Milestones after Release 1999 include enhancements in Releases 4 through 6 for improved robustness, such as optimized half-rate modes (OHR-AMR) and AMR integration in GERAN, addressing packet loss and noise suppression in evolving mobile architectures.[16]

In contrast to fixed-rate predecessors such as the GSM Full Rate codec defined in ETSI GSM 06.10, which operates at a constant 13 kbit/s without channel adaptation, AMR's multi-rate design allows dynamic selection among eight modes (4.75 to 12.2 kbit/s) based on link quality, improving efficiency and error resilience in cellular networks. This adaptive approach was the outcome of collaborative work by ETSI and 3GPP working groups, culminating in frozen specifications that prioritized a seamless evolution from 2G to 3G systems.[17]

Technical Specifications
Encoding Mechanism
The Adaptive Multi-Rate (AMR) speech codec uses Algebraic Code-Excited Linear Prediction (ACELP) as its primary encoding paradigm, combining linear prediction with codebook-based excitation to compress narrowband speech efficiently. In ACELP, speech is modeled as an excitation signal passed through a short-term linear prediction synthesis filter. The excitation is the sum of an adaptive-codebook contribution for periodic components and a fixed algebraic-codebook contribution for the remaining noise-like components. Long-Term Prediction (LTP) via the adaptive codebook thus captures pitch periodicity, while Short-Term Prediction (STP) through a 10th-order linear prediction filter represents the spectral envelope of the vocal tract.[18]

The encoding process operates on fixed frame structures to balance computational efficiency and perceptual quality. Each 20 ms frame contains 160 samples taken at 8 kHz, covering the narrowband frequency range of 200–3400 Hz used in telephony. Each frame is divided into four 5 ms subframes of 40 samples, so that parameters can be estimated and quantized per subframe, refining the prediction model adaptively.[18]

Several signal processing steps are central to the encoding. Linear Predictive Coding (LPC) analysis first estimates the 10th-order LPC coefficients that model the vocal tract filter, computed with the Levinson-Durbin algorithm from autocorrelations of the pre-processed speech signal taken over a 30 ms asymmetric analysis window. Pitch detection then uses an open-loop search over two 10 ms segments per frame (or one segment at the lowest rates), choosing the lag that maximizes the normalized correlation between weighted speech segments as the starting point for the LTP adaptive codebook.
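The open-loop pitch step can be illustrated with a simplified search: for each candidate lag, correlate a 10 ms segment (80 samples) with the same segment delayed by that lag, normalize by the energy of the delayed segment, and keep the best lag. This is a sketch under stated assumptions, not the AMR procedure: it runs on the raw signal rather than perceptually weighted speech, omits AMR's preference for shorter lags, and the default 20–143 sample lag range and the name `open_loop_pitch` are assumptions for illustration.

```python
import math


def open_loop_pitch(x, start, length=80, min_lag=20, max_lag=143):
    """Return the lag maximizing sum(x[n] * x[n - lag]) / sqrt(sum(x[n - lag] ** 2))
    over the segment x[start:start + length] (simplified, unweighted search)."""
    if start < max_lag:
        raise ValueError("need at least max_lag samples of history before start")
    if start + length > len(x):
        raise ValueError("analysis segment runs past the end of the signal")
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        num = 0.0      # correlation between the segment and its delayed copy
        energy = 0.0   # energy of the delayed copy, used for normalization
        for n in range(start, start + length):
            past = x[n - lag]
            num += x[n] * past
            energy += past * past
        if energy == 0.0:
            continue  # delayed segment is silent; no meaningful score
        score = num / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

For a sinusoid with a period of 57 samples, `open_loop_pitch(signal, start=143, max_lag=100)` returns 57. Limiting `max_lag` to 100 sidesteps the equally good multiple at 114 samples; resolving such pitch-multiple ambiguities is exactly what the omitted short-lag preference is for.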
Following the pitch analysis, the algebraic codebook search optimizes the fixed-codebook excitation by selecting sparse pulse positions and signs in an interleaved structure, minimizing the weighted mean-squared error between the target signal and the synthesized speech.[18]

To handle transmission interruptions and silence periods, the codec incorporates error concealment techniques integrated with Discontinuous Transmission (DTX) and Voice Activity Detection (VAD). DTX reduces bandwidth usage during non-speech intervals by transmitting only periodic comfort noise parameters, while VAD classifies input frames as speech or noise based on spectral distortion and full-band energy thresholds, enabling the generation of synthetic comfort noise that maintains natural auditory continuity. These mechanisms use parameter interpolation and predictive buffering to conceal frame erasures while minimizing audible artifacts.[19]

The mathematical foundation of the LPC stage is the Levinson-Durbin recursion, which solves the normal equations for minimum prediction error by iteratively deriving reflection coefficients from the autocorrelation values. Its result determines the prediction gain, which quantifies the compression efficacy of the linear model:

$$G_p = 10\log_{10}\frac{\sigma_x^2}{\sigma_e^2}\ \text{dB},$$

where $\sigma_x^2$ denotes the variance of the input speech signal and $\sigma_e^2$ the variance of the residual after prediction. Gains of 10–20 dB are typical for voiced speech segments.[18]

Bit Rates and Operational Modes
The Adaptive Multi-Rate (AMR) codec operates in eight distinct modes, each corresponding to a specific bit rate optimized for narrowband speech signals (200–3400 Hz), enabling efficient use of bandwidth under varying network conditions. The supported bit rates are 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s, with frame sizes ranging from 244 bits at the highest rate to 95 bits at the lowest, all processed over 20 ms frames of 160 samples.[20] This multi-rate design allows the codec to trade speech quality for lower transmission overhead, achieving an average bit rate of approximately 7.4 kbit/s in typical mixed speech and noise scenarios.

The bit allocation within each frame varies by mode to prioritize essential speech parameters: linear predictive coding (LPC) coefficients for the spectral envelope, the adaptive codebook for pitch periodicity, and the fixed codebook for the residual excitation. The total bits per frame are the sum of the LPC bits (23–38), pitch-related bits (20–46, covering delay and gain) and codebook bits (52–160, covering indices and gains), adjusted per mode to fit the target rate.[20] In the modes listed with zero pitch gain bits, the pitch and codebook gains are vector-quantized jointly, so the last column covers both gains. The following table summarizes the modes, bit rates, frame sizes and bit allocations:

| Mode Bit Rate (kbit/s) | Frame Size (bits) | LPC Bits | Pitch Delay Bits | Pitch Gain Bits | Codebook Index Bits | Codebook Gain Bits |
|---|---|---|---|---|---|---|
| 12.2 | 244 | 38 | 30 | 16 | 140 | 20 |
| 10.2 | 204 | 26 | 26 | 0 | 124 | 28 |
| 7.95 | 159 | 27 | 28 | 16 | 68 | 20 |
| 7.40 | 148 | 26 | 26 | 0 | 68 | 28 |
| 6.70 | 134 | 26 | 24 | 0 | 56 | 28 |
| 5.90 | 118 | 26 | 24 | 0 | 44 | 24 |
| 5.15 | 103 | 23 | 20 | 0 | 36 | 24 |
| 4.75 | 95 | 23 | 20 | 0 | 36 | 16 |
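Because every mode uses 20 ms frames, each row of the table must satisfy bits per frame = bit rate (kbit/s) × 20 ms. The sketch below encodes the table and checks that identity for all eight modes; the dictionary layout and the function names are hypothetical, chosen for illustration.

```python
FRAME_MS = 20  # every AMR mode uses 20 ms frames

# Per-mode allocation copied from the table above:
# bit rate (kbit/s) -> (LPC, pitch delay, pitch gain, codebook index, codebook or joint gain) bits
ALLOCATION = {
    12.2: (38, 30, 16, 140, 20),
    10.2: (26, 26, 0, 124, 28),
    7.95: (27, 28, 16, 68, 20),
    7.40: (26, 26, 0, 68, 28),
    6.70: (26, 24, 0, 56, 28),
    5.90: (26, 24, 0, 44, 24),
    5.15: (23, 20, 0, 36, 24),
    4.75: (23, 20, 0, 36, 16),
}


def bits_per_frame(rate_kbps: float) -> int:
    """kbit/s multiplied by ms gives bits; round() absorbs float error such as in 7.95 * 20."""
    return round(rate_kbps * FRAME_MS)


def check_allocation() -> dict:
    """Return {rate: total bits} for every mode, raising if a row does not add up."""
    totals = {}
    for rate, fields in ALLOCATION.items():
        total = sum(fields)
        expected = bits_per_frame(rate)
        if total != expected:
            raise AssertionError(f"{rate} kbit/s: row sums to {total}, expected {expected}")
        totals[rate] = total
    return totals


def lpc_share(rate_kbps: float) -> float:
    """Fraction of a frame spent on the LPC (spectral envelope) parameters."""
    fields = ALLOCATION[rate_kbps]
    return fields[0] / sum(fields)
```

Every row passes the check. The arithmetic also shows how the allocation shifts with rate: the spectral envelope takes about 16% of a 12.2 kbit/s frame (38 of 244 bits) but about 24% of a 4.75 kbit/s frame (23 of 95 bits), so the lower modes save most of their bits on the fixed-codebook excitation.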
