Codec 2
| Codec 2 | |
|---|---|
| Developer | David Grant Rowe |
| Initial release | August 25, 2010 |
| Stable release | 1.2.0 / June 24, 2023 |
| Repository | github |
| Written in | C99 |
| Platform | Cross-platform |
| Type | Audio codec |
| License | GNU LGPL, v2.1 |
| Website | www |
Codec 2 is a low-bitrate speech audio codec (speech coding) that is patent free and open source.[1] Codec 2 compresses speech using sinusoidal coding, a method specialized for human speech. Modes with bit rates from 3200 down to 450 bit/s have been implemented. Codec 2 was designed to be used for amateur radio and other high compression voice applications.
Overview
The codec was developed by David Grant Rowe, with support and cooperation of other researchers (e.g., Jean-Marc Valin from Opus).[2]
Codec 2 consists of 3200, 2400, 1600, 1400, 1300, 1200, 700 and 450 bit/s codec modes. It outperforms most other low-bitrate speech codecs. For example, it uses half the bandwidth of Advanced Multi-Band Excitation to encode speech with similar quality.[citation needed] The encoder takes 16-bit PCM sampled audio and outputs packed bytes; the decoder takes packed bytes and outputs PCM sampled audio. The audio sample rate is fixed at 8 kHz.
The reference implementation is open source and is freely available in a GitHub repository.[3] The source code is released under the terms of version 2.1 of the GNU Lesser General Public License (LGPL).[4] It is written in C, and the current source code requires floating-point arithmetic, although the algorithm itself does not. The reference software package also includes a frequency-division multiplex digital voice software modem and a graphical user interface based on wxWidgets. The software is developed on Linux; a port for Microsoft Windows created with Cygwin is offered in addition to an Apple macOS version.
The codec has been presented in various conferences and has received the 2012 ARRL Technical Innovation Award,[5] and the Linux Australia Conference's Best Presentation Award.[6]
Technology
Internally, parametric audio coding algorithms operate on 10 ms PCM frames using a model of the human voice. Each of these audio segments is declared voiced (vowel) or unvoiced (consonant).
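As a rough illustration of how a 10 ms frame might be declared voiced or unvoiced, the sketch below uses two classic features: short-term energy and zero-crossing rate. This is only one simple heuristic and is not confirmed to be the decision rule of the reference implementation; the function names, thresholds, and test signals are all hypothetical.

```python
import math

SAMPLE_RATE_HZ = 8000
FRAME_SAMPLES = 80  # 10 ms at 8 kHz


def frame_features(frame):
    """Return (mean energy, zero-crossing rate) for one frame of PCM samples."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return energy, zcr


def is_voiced(frame, energy_threshold=1.0e5, zcr_threshold=0.25):
    """Hypothetical rule: voiced means high energy AND few zero crossings.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    energy, zcr = frame_features(frame)
    return energy > energy_threshold and zcr < zcr_threshold


# A strong 200 Hz tone stands in for a vowel; a weak sample-by-sample
# sign-alternating signal is a crude stand-in for a noisy consonant.
voiced_frame = [8000 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE_HZ)
                for n in range(FRAME_SAMPLES)]
unvoiced_frame = [200 if n % 2 == 0 else -200 for n in range(FRAME_SAMPLES)]

print(is_voiced(voiced_frame), is_voiced(unvoiced_frame))
```

A real codec would need something more robust than fixed thresholds, since energy varies with microphone gain and speaker loudness.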
Codec 2 uses sinusoidal coding to model speech, an approach closely related to that of multi-band excitation codecs. Sinusoidal coding is based on regularities (periodicity) in the pattern of overtone frequencies and layers harmonic sinusoids. Spoken audio is recreated by modelling speech as a sum of harmonically related sine waves with independent amplitudes on top of a determined fundamental frequency of the speaker's voice (pitch). The (quantised) pitch and the amplitude (energy) of the harmonics are encoded and, together with the line spectral pairs (LSPs), exchanged across a channel in a digital format. The LSP coefficients represent the Linear Predictive Coding (LPC) model in the frequency domain, and lend themselves to a robust and efficient quantisation of the LPC parameters.[7]
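To make the relationship between LSPs and the LPC model concrete, the following sketch converts a list of LSP frequencies back into LPC coefficients using the textbook construction A(z) = (P(z) + Q(z)) / 2, where P(z) and Q(z) are built from the alternating LSP frequencies. It is a generic illustration rather than the reference implementation's routine; the function names and the second-order example values are assumptions chosen so the result can be checked by hand.

```python
import math


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists in powers of z^-1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lsp_to_lpc(lsp):
    """Convert ascending LSP frequencies (radians, even count) to [1, a1, ..., ap].

    Assumed convention: the lowest LSP belongs to the symmetric polynomial P(z),
    which also carries the fixed root at z = -1; Q(z) carries the root at z = 1.
    """
    p_poly = [1.0, 1.0]   # (1 + z^-1)
    q_poly = [1.0, -1.0]  # (1 - z^-1)
    for i, w in enumerate(lsp):
        # Each LSP contributes a conjugate root pair on the unit circle.
        section = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:
            p_poly = poly_mul(p_poly, section)
        else:
            q_poly = poly_mul(q_poly, section)
    a = [(p + q) / 2.0 for p, q in zip(p_poly, q_poly)]
    return a[:-1]  # the highest-order terms of P and Q cancel


# Hand-checkable example: these two LSPs correspond to A(z) = 1 - 0.9 z^-1 + 0.2 z^-2.
example_lsp = [math.acos(0.85), math.acos(0.05)]
print(lsp_to_lpc(example_lsp))
```

Because each LSP is a single frequency between 0 and pi and the set must stay in ascending order, a quantiser can check and repair ordering cheaply, which is one reason the representation is considered robust.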
The encoded output is a set of bit fields that have been packed together into bytes. These bit fields are optionally Gray coded before being grouped together. Gray coding may be useful if the bits are sent raw, but normally an application will just burst the bit fields out. The bit fields make up the various parameters that are stored or exchanged (pitch, energy, voicing Booleans, LSPs, etc.).
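The packing step described above can be sketched as follows: each parameter index is optionally Gray coded, and the resulting fixed-width bit fields are concatenated and padded to whole bytes. The field names, widths, and bit order (most significant bit first) are hypothetical and chosen only for illustration; they do not reproduce the layout of any actual Codec 2 mode.

```python
def gray_encode(n):
    """Binary-reflected Gray code: adjacent integers differ in exactly one bit."""
    return n ^ (n >> 1)


def gray_decode(g):
    """Invert gray_encode by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def pack_fields(fields):
    """Pack (value, width) pairs MSB-first into bytes, zero-padding the last byte."""
    bits = []
    for value, width in fields:
        for shift in range(width - 1, -1, -1):
            bits.append((value >> shift) & 1)
    bits += [0] * (-len(bits) % 8)
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits), 8)
    )


# Hypothetical frame: 1 voicing bit, a 7-bit pitch index, a 5-bit energy index.
frame = pack_fields([
    (1, 1),                # voicing flag, sent as-is
    (gray_encode(64), 7),  # pitch index, Gray coded
    (gray_encode(10), 5),  # energy index, Gray coded
])
print(frame.hex())
```

The appeal of Gray coding here is that a single bit error in a received field moves the decoded index to a nearby Gray codeword, which often, though not always, corresponds to a neighbouring parameter value.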
For example, Mode 3200 converts 20 ms of audio into 64 bits, so 64 bits are output every 20 ms (50 times a second), for a minimum data rate of 3200 bit/s. These 64 bits are passed as 8 bytes to the application, which either unpacks the bit fields or sends the bytes over a data channel.
Another example is Mode 1300, which takes 40 ms of audio and outputs 52 bits every 40 ms (25 times a second), for a minimum rate of 1300 bit/s. These 52 bits are passed as 7 bytes to the application or data channel.
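The arithmetic in these two examples can be checked with a few lines of Python. The helper below is a hypothetical illustration: it computes the frame rate, the payload bit rate, the number of whole bytes per frame, and the rate that results if every frame is padded to whole bytes, which is one way to read the word "minimum" above.

```python
import math


def frame_stats(bits_per_frame, frame_ms):
    """Return (frames/s, payload bit/s, bytes per frame, byte-aligned bit/s)."""
    frames_per_second = 1000 / frame_ms
    payload_rate = bits_per_frame * frames_per_second
    bytes_per_frame = math.ceil(bits_per_frame / 8)
    # Rate on a byte-oriented channel where each frame occupies whole bytes.
    byte_aligned_rate = bytes_per_frame * 8 * frames_per_second
    return frames_per_second, payload_rate, bytes_per_frame, byte_aligned_rate


mode_3200 = frame_stats(64, 20)
mode_1300 = frame_stats(52, 40)
print(mode_3200)
print(mode_1300)
```

For Mode 3200 the 64 bits fill 8 bytes exactly, so the two rates coincide. For Mode 1300 the 52 bits occupy 7 bytes with 4 padding bits, so a channel that carries the padded bytes unchanged would need 1400 bit/s rather than 1300 bit/s.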
Adoption
Codec 2 is currently used in several radios and Software Defined Radio Systems:
Codec 2 has also been integrated into FreeSWITCH and there's a patch available for support in Asterisk.
There was an FM-to-Codec2 digital voice repeater in Earth orbit on the amateur radio CubeSat LilacSat-1 (call sign ON02CN, QB50 constellation), which was launched and subsequently deployed from the International Space Station in 2017.[13]
History
The prominent free software advocate and radio amateur Bruce Perens lobbied for the creation of a free speech codec for operation at less than 5 kbit/s. Since Perens did not have the background himself, he approached Jean-Marc Valin in 2008, who introduced him to lead developer David Grant Rowe, who had worked with Valin on Speex on several occasions. Rowe was also a radio amateur (amateur radio call sign VK5DGR) and had experience in creating and using voice codecs and other signal processing algorithms for speech signals. He obtained a PhD in speech coding in the 1990s and was involved in the development of one of the first satellite telephony systems (Mobilesat).
He agreed to the task and announced his decision to work on a format on August 21, 2009. He built on the research and findings from his doctoral thesis.[14][15] The underlying sinusoidal modelling goes back to developments by Robert J. McAulay and Thomas F. Quatieri (MIT Lincoln labs) from the mid-1980s.
In August 2010, David Rowe published version 0.1 alpha.[16] Version 0.2 was released towards the end of 2011, introducing a mode with 1,400 bits/s and significant improvements in quantization.
In January 2012, at linux.conf.au, Jean-Marc Valin helped improve the quantization of line spectral pairs, an area with which Rowe was less familiar.[17] After several changes to the available bit rate modes in winter and spring 2011/2012, 2,400, 1,400 and 1,200 bit/s modes were available after May of that year.
Codec 2 700C, a new mode with a bit rate of 700 bit/s, was finished in early 2017.[18]
In July 2018 an experimental 450 bit/s mode was demonstrated, which was developed as part of a master's thesis at the University of Erlangen-Nuremberg. Building on the principle of the 700C mode, careful training of the vector quantization allowed the data rate to be reduced further.[19]
References
- ^ "DCC2011-Codec2-VK5DGR" (PDF).
- ^ "A Pitch-Energy Quantizer for Codec2". Archived from the original on 2015-06-19.
- ^ "Repository for Codec 2 Source". GitHub. 14 October 2021.
- ^ "Codec2 – an Open Source, Low-Bandwidth Voice Codec". Slashdot. 21 September 2010.
- ^ "ARRL Board of Directors Names Award Recipients at 2012 Second Meeting". www.arrl.org.
- ^ "Linux Australia 2012 conference". Archived from the original on 2012-11-29. Retrieved 2012-08-02.
- ^ "Techniques for Harmonic Sinusoidal Coding" (PDF). Archived from the original (PDF) on 2013-05-15. Retrieved 2013-04-12.
- ^ "FreeDV: Open Source Amateur Digital Voice – Where Amateur Radio Is Driving The State of the Art".
- ^ "FreeDV, CODEC2 and the WaveformAPI". Archived from the original on 2015-04-02. Retrieved 2015-03-06.
- ^ "Introducing the SM1000 Smart Mic – Rowetel". 21 May 2014.
- ^ "Quisk, A Software Defined Radio (SDR)". james.ahlstrom.name.
- ^ "M17 protocol description". GitHub.
- ^ "QB-50 Constellation Satellites Deployed from ISS". American Radio Relay League website. 2017-11-15. Retrieved 2019-03-31.
- ^ "Techniques for Harmonic Sinusoidal Coding" (PDF). Archived from the original (PDF) on 2013-05-15. Retrieved 2013-04-12.
- ^ "Open Source Low Rate Speech Codec Part 1 – Rowetel". 21 August 2009.
- ^ "Codec2 V0.1 Alpha Released – Rowetel". 25 August 2010.
- ^ "A Pitch-Energy Quantizer for Codec2".
- ^ "Open Source Codec Encodes Voice Into Only 700 Bits Per Second". Slashdot. 13 January 2017. Retrieved 2019-03-31.
- ^ "Codec2 HF digital voice at 450 bps". Southgate Amateur Radio News. 2018-07-08. Retrieved 2019-03-31.
Introduction
Overview
Codec 2 is an open-source speech codec designed for low-bitrate digital voice communications, targeting bit rates from 450 to 3200 bit/s to achieve communications-quality speech in bandwidth-constrained environments.[6] It primarily serves applications in amateur radio, enabling efficient voice transmission over narrow bandwidths in HF and VHF digital modes such as FreeDV.[3] The codec accepts 16-bit linear PCM audio sampled at 8 kHz and processes it in frames of 20 ms or 40 ms, depending on the selected mode.[3][7] Licensed under the GNU Lesser General Public License version 2.1, Codec 2 was developed by David Grant Rowe to provide a patent-free alternative to proprietary low-bitrate codecs.[3]
Codec 2 has received recognition for its innovation, including the 2012 ARRL Technical Innovation Award for advancing digital voice technology in amateur radio and the Linux Australia Conference's Best Presentation Award for Rowe's 2012 talk at linux.conf.au.[8][1]
Development Background
Codec 2 was initiated in 2009 by David Grant Rowe, an Australian electrical engineer specializing in digital signal processing and telecommunications. Rowe earned his PhD in 1997 from the University of Wollongong for a thesis on techniques for harmonic sinusoidal coding of speech signals, which laid the groundwork for efficient low-bitrate representation of voiced speech using sinusoidal oscillators with parametric phase modeling.[9] His experience in digital signal processing includes developing speech codecs and modems for open-source amateur radio projects, such as FreeDV, which integrates Codec 2 with channel modulation techniques for high-frequency (HF) radio transmission.[10]
The primary motivation for Codec 2 stemmed from the limitations of existing open-source speech codecs, such as Speex, which were designed for higher bit rates (typically above 2 kbit/s) and struggled to deliver intelligible speech at ultra-low rates suitable for bandwidth-constrained amateur radio applications over HF channels. Rowe was particularly inspired by Bruce Perens, a prominent advocate for open-source software and amateur radio, who in 2008 called for the development of patent-free alternatives to proprietary military-grade codecs like MELP, emphasizing the need for accessible, low-complexity solutions for hobbyist and emergency communications.[11] This push aligned with broader efforts to democratize digital voice technology, avoiding the licensing barriers that restricted adoption in non-commercial settings.[5]
Codec 2's foundational research draws heavily on 1980s advances in sinusoidal speech modeling, pioneered by researchers including Robert J. McAulay and Thomas F. Quatieri, whose 1986 work on sinusoidal transform coding introduced methods for decomposing speech into harmonic sinusoids to enable low-bitrate coding while preserving perceptual quality.[9]
Early development benefited from collaboration with and support from Jean-Marc Valin, creator of the Speex codec, who provided insights on open-source implementation and integration challenges during initial discussions prompted by Perens.[11] The project's initial goals centered on achieving communications-quality speech (defined as highly intelligible with acceptable distortion) at bit rates below 700 bit/s, while minimizing computational demands to run on resource-limited embedded systems like microcontrollers in radio transceivers.[2]
Technical Specifications
Encoding and Decoding Process
Codec 2 operates on input speech sampled at 8 kHz in PCM format, processing it in frames of 20 ms (160 samples) for higher bit rate modes (3200 and 2400 bit/s) or 40 ms (320 samples) for lower bit rate modes, with internal analysis using shorter windows such as 10 ms (80 samples) for LPC parameter estimation to capture quasi-stationary characteristics of the signal. This allows for efficient parameter estimation while minimizing delay.[9][2]
The process begins with voiced/unvoiced detection for each frame, which classifies the speech segment as periodic (voiced) or aperiodic (unvoiced) to guide subsequent modeling. This classification relies on two primary features: the signal's short-term energy, which is higher in voiced frames due to glottal pulses, and the zero-crossing rate, which is lower for voiced speech owing to its periodic nature compared to the noise-like unvoiced segments. These metrics enable a simple yet effective decision threshold to distinguish frame types without complex computation.[9][2]
For voiced frames, parameter extraction employs a sinusoidal model, representing the speech waveform as a sum of harmonically related sine waves: $s(n) = \sum_{l=1}^{L} A_l \cos(l \omega_0 n + \phi_l)$, where $\omega_0$ is the fundamental frequency (pitch), $A_l$ are the harmonic amplitudes, $\phi_l$ the phases, and $L$ the number of harmonics within the 4 kHz bandwidth. The pitch (typically 50-400 Hz) is estimated using an analysis-by-synthesis approach that minimizes spectral distortion, often via a non-linear pitch detection algorithm for robustness. Harmonic amplitudes are derived from the discrete Fourier transform (DFT) of the windowed frame, averaged over frequency bins around each harmonic to yield root-mean-square (RMS) magnitudes.
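The harmonic sum used for voiced frames can be sketched directly: given a fundamental frequency, one amplitude and one phase per harmonic, each output sample is the sum of cosines at integer multiples of the fundamental. The code below is a minimal illustration under those assumptions; the function names are hypothetical, and it omits the interpolation and overlap-add steps a real decoder needs between frames.

```python
import math

SAMPLE_RATE_HZ = 8000


def max_harmonics(f0_hz, bandwidth_hz=4000):
    """Number of harmonics of f0 that fit within the given bandwidth."""
    return int(bandwidth_hz // f0_hz)


def synthesize_voiced(f0_hz, amplitudes, phases, num_samples):
    """Sum of harmonically related cosines: harmonic l sits at l * f0."""
    w0 = 2 * math.pi * f0_hz / SAMPLE_RATE_HZ  # fundamental in radians per sample
    return [
        sum(a * math.cos(l * w0 * n + phi)
            for l, (a, phi) in enumerate(zip(amplitudes, phases), start=1))
        for n in range(num_samples)
    ]


# Example: a 200 Hz pitch with three harmonics of decreasing amplitude.
frame = synthesize_voiced(200, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0], 160)
print(max_harmonics(200), len(frame))
```

With a 200 Hz pitch the waveform repeats every 40 samples, so a 160-sample (20 ms) frame contains four pitch periods.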
The spectral envelope, modeling vocal tract resonances, is captured using line spectral pairs (LSPs), which are roots of polynomials derived from linear predictive coding (LPC) coefficients; these provide stable and efficient quantization of the 10th-order filter typically used.[9][2]
Encoding quantizes these extracted parameters into compact fixed-length bit fields, allocating bits to pitch, LSPs, harmonic amplitudes (or energy), and voicing flags without employing entropy coding to maintain low complexity and fixed delay. Vector quantization is applied to LSPs and sometimes amplitude vectors for perceptual optimality, as scalar methods may introduce spectral mismatches; for instance, in the 3200 bit/s mode, parameters are packed into 64 bits per 20 ms frame using multi-stage vector quantizers trained on speech data. Unvoiced frames simplify encoding by modeling noise excitation shaped by the LSP-derived envelope, reducing bit allocation for harmonics.[9][2][3]
Decoding reconstructs the speech by synthesizing the sinusoidal components from the quantized parameters. For voiced frames, the speech is synthesized as a sum of harmonically related sine waves using the quantized pitch, amplitudes (derived from the LSP envelope sampled at harmonics), and phases (modeled continuously across frames via quadratic interpolation or mixed excitation to avoid discontinuities, with overlap-add windowing of adjacent frames). For unvoiced frames, random phase noise is generated and shaped by the spectral envelope derived from LSPs, using an LPC synthesis filter $1/A(z)$ whose coefficients are obtained by converting LSPs via the relation $A(z) = \tfrac{1}{2}\,[P(z) + Q(z)]$, where $P(z)$ and $Q(z)$ are polynomials with roots at the conjugate pairs of LSP frequencies on the unit circle; transitions between voiced and unvoiced are blended seamlessly.[9][2]
Supported Modes and Bit Rates
Codec 2 operates in several fixed-rate modes tailored to varying bandwidth requirements, ranging from 450 bit/s to 3200 bit/s, each defined by a specific number of bits per frame and frame duration to maintain constant output suitable for channel-constrained applications like HF radio. Higher-rate modes typically employ 20 ms frames, while lower-rate modes extend to 40 ms frames to optimize bit efficiency and reduce synchronization overhead. This structure ensures robust frame synchronization through predictable bit-field packing, where parameters are quantized and arranged in a mode-specific order without variable-length coding. The following table summarizes the supported modes, their bit rates, bits per frame, and frame durations:
| Mode | Bit Rate (bit/s) | Bits per Frame | Frame Duration (ms) |
|---|---|---|---|
| 3200 | 3200 | 64 | 20 |
| 2400 | 2400 | 48 | 20 |
| 1600 | 1600 | 64 | 40 |
| 1400 | 1400 | 56 | 40 |
| 1300 | 1300 | 52 | 40 |
| 1200 | 1200 | 48 | 40 |
| 700 | 700 | 28 | 40 |
| 450 | 450 | 18 | 40 |
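The table can be cross-checked mechanically: dividing bits per frame by the frame duration should reproduce each mode's bit rate, and multiplying the 8 kHz sample rate by the frame duration gives the number of PCM samples an encoder would consume per frame. The sketch below encodes the table as a dictionary and splits a PCM buffer into whole frames; the names are hypothetical and this is not the reference library's API.

```python
SAMPLE_RATE_HZ = 8000

# mode name -> (bits per frame, frame duration in ms), copied from the table
MODES = {
    "3200": (64, 20),
    "2400": (48, 20),
    "1600": (64, 40),
    "1400": (56, 40),
    "1300": (52, 40),
    "1200": (48, 40),
    "700": (28, 40),
    "450": (18, 40),
}


def bit_rate(mode):
    """Bit rate in bit/s implied by the frame size and duration."""
    bits, frame_ms = MODES[mode]
    return bits * 1000 // frame_ms


def samples_per_frame(mode):
    """PCM samples covered by one frame at the fixed 8 kHz sample rate."""
    _, frame_ms = MODES[mode]
    return SAMPLE_RATE_HZ * frame_ms // 1000


def split_into_frames(pcm, mode):
    """Split a list of samples into whole frames, dropping any trailing remainder."""
    n = samples_per_frame(mode)
    return [pcm[i:i + n] for i in range(0, len(pcm) - n + 1, n)]


one_second = [0] * SAMPLE_RATE_HZ
frames = split_into_frames(one_second, "1300")
print(len(frames), samples_per_frame("1300"), bit_rate("1300"))
```

Dropping the trailing partial frame is a simplification; a streaming application would more likely buffer the remainder until enough samples arrive.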
