Codec 2
Codec 2 is a low-bitrate speech codec that is patent-free and open source. It compresses speech using sinusoidal coding, a method specialized for human speech. Modes with bit rates from 3200 down to 450 bit/s have been implemented. Codec 2 was designed for amateur radio and other voice applications that need high compression.
The codec was developed by David Grant Rowe, with support and cooperation of other researchers (e.g., Jean-Marc Valin from Opus).
Codec 2 provides 3200, 2400, 1600, 1400, 1300, 1200, 700 and 450 bit/s modes. It outperforms most other low-bitrate speech codecs. For example, it uses half the bandwidth of Advanced Multi-Band Excitation to encode speech with similar quality.[citation needed] The encoder takes 16-bit PCM sampled audio as input and outputs packed bytes; the decoder takes packed bytes and outputs PCM sampled audio. The audio sample rate is fixed at 8 kHz.
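To put these figures in perspective, the following minimal sketch computes the compression ratio of each mode relative to the uncompressed input. It uses only the numbers stated above (8 kHz, 16-bit PCM, and the listed mode bit rates); the names `PCM_BIT_RATE` and `compression_ratio` are illustrative and not part of the Codec 2 API.

```python
# Raw PCM rate implied by the text: 8 kHz sampling, 16 bits per sample.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 16
PCM_BIT_RATE = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # 128000 bit/s

# Codec 2 modes listed in the text, in bit/s.
MODES_BPS = [3200, 2400, 1600, 1400, 1300, 1200, 700, 450]


def compression_ratio(mode_bps: int) -> float:
    """Ratio of the uncompressed PCM bit rate to the coded bit rate."""
    return PCM_BIT_RATE / mode_bps


if __name__ == "__main__":
    for mode in MODES_BPS:
        print(f"{mode:>5} bit/s  ->  {compression_ratio(mode):6.1f}:1")
```

Under these assumptions the 3200 bit/s mode corresponds to a 40:1 reduction over raw PCM, and the ratio grows as the mode bit rate falls.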
The reference implementation is open source and is freely available in a GitHub repository. The source code is released under the terms of version 2.1 of the GNU Lesser General Public License (LGPL). It is written in C, and the current source code requires floating-point arithmetic, although the algorithm itself does not. The reference software package also includes a frequency-division multiplex digital voice software modem and a graphical user interface based on wxWidgets. The software is developed on Linux; a Microsoft Windows port built with Cygwin and an Apple macOS version are also offered.
The codec has been presented at various conferences and has received the 2012 ARRL Technical Innovation Award and the Linux Australia Conference's Best Presentation Award.
Internally, parametric audio coding algorithms operate on 10 ms PCM frames using a model of the human voice. Each of these audio segments is declared voiced (vowel) or unvoiced (consonant).
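As a rough illustration of the framing step, the sketch below splits a PCM sample sequence into 10 ms frames at the 8 kHz sample rate, which gives 80 samples per frame. The function name `split_frames` is hypothetical, and dropping a trailing partial frame is an assumption made here for simplicity, not a documented behaviour of the reference implementation. The voiced/unvoiced decision itself is not shown.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 10
# 8000 samples/s * 0.010 s = 80 samples per frame.
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000


def split_frames(pcm):
    """Split a sequence of PCM samples into complete 10 ms frames.

    Assumption for this sketch: a trailing partial frame is discarded.
    """
    n_frames = len(pcm) // SAMPLES_PER_FRAME
    return [
        pcm[i * SAMPLES_PER_FRAME:(i + 1) * SAMPLES_PER_FRAME]
        for i in range(n_frames)
    ]


if __name__ == "__main__":
    samples = [0] * 250  # 250 samples: three full frames plus 10 left over
    frames = split_frames(samples)
    print(len(frames), "frames of", SAMPLES_PER_FRAME, "samples")
```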
Codec 2 models speech using sinusoidal coding, an approach closely related to that of multi-band excitation codecs. Sinusoidal coding relies on regularities (periodicity) in the pattern of overtone frequencies and builds the signal by layering harmonic sinusoids. Spoken audio is recreated by modelling speech as a sum of harmonically related sine waves with independent amplitudes on top of a determined fundamental frequency of the speaker's voice (pitch). The spectral envelope of those amplitudes is represented by line spectral pairs (LSPs). The (quantised) pitch and the amplitude (energy) of the harmonics are encoded and, together with the LSPs, exchanged across a channel in a digital format. The LSP coefficients represent the Linear Predictive Coding (LPC) model in the frequency domain, and lend themselves to robust and efficient quantisation of the LPC parameters.
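The core idea of the sinusoidal model, a sum of sine waves at integer multiples of the pitch, can be sketched as follows. This is a simplified illustration and not the reference decoder: the function name `synthesise_frame`, the zero default phases, and the example amplitudes are all assumptions, and the real codec derives amplitudes and phases from its decoded parameters and overlaps adjacent frames.

```python
import math

SAMPLE_RATE_HZ = 8000


def synthesise_frame(f0_hz, amplitudes, n_samples=80, phases=None):
    """Sum harmonics m * f0 (m = 1, 2, ...) with the given amplitudes.

    amplitudes[m - 1] is the amplitude of harmonic m. Phases default to
    zero, which is an assumption made only for this sketch.
    """
    if phases is None:
        phases = [0.0] * len(amplitudes)
    if len(amplitudes) * f0_hz >= SAMPLE_RATE_HZ / 2:
        raise ValueError("highest harmonic must stay below the Nyquist limit")
    w0 = 2.0 * math.pi * f0_hz / SAMPLE_RATE_HZ  # pitch in radians/sample
    frame = []
    for n in range(n_samples):
        sample = sum(
            a * math.cos(m * w0 * n + p)
            for m, (a, p) in enumerate(zip(amplitudes, phases), start=1)
        )
        frame.append(sample)
    return frame


if __name__ == "__main__":
    # A 100 Hz pitch with three harmonics of decreasing amplitude.
    frame = synthesise_frame(100.0, [1.0, 0.5, 0.25])
    print(len(frame), "samples, first sample =", frame[0])
```

Only the pitch and a compact description of the harmonic amplitudes need to be transmitted for each frame, which is why this model suits very low bit rates.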
The output bytes hold a sequence of bit fields that have been packed together. These bit fields can optionally be Gray coded before being grouped together. Gray coding may be useful when sending the bits raw, but normally an application will simply transmit the bit fields as they are. The bit fields carry the various parameters that are stored or exchanged (pitch, energy, voicing Booleans, LSPs, etc.).
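The sketch below illustrates the two operations described above: Gray coding a field value and packing variable-width bit fields into bytes. The field widths and values in the example are hypothetical, and the MSB-first ordering with zero padding is an assumption; the actual field layout depends on the codec mode and is defined by the reference implementation.

```python
def binary_to_gray(n: int) -> int:
    """Convert a non-negative integer to its reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Invert binary_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def pack_fields(fields) -> bytes:
    """Pack (value, width) bit fields MSB-first into bytes.

    The final byte is zero-padded on the right (an assumption of this sketch).
    """
    bits = 0
    total = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        bits = (bits << width) | value
        total += width
    pad = (-total) % 8
    return (bits << pad).to_bytes((total + pad) // 8, "big")


if __name__ == "__main__":
    # Hypothetical frame: 1 voicing bit, 7-bit Gray-coded pitch, 5-bit energy.
    frame = [(1, 1), (binary_to_gray(42), 7), (binary_to_gray(17), 5)]
    print(pack_fields(frame).hex())
```

Consecutive Gray codes differ in exactly one bit, so a single bit error in a Gray-coded field tends to decode to a nearby parameter value, which is the reason it can help on a raw channel.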