Linear predictive coding

Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital speech signal in compressed form, using the parameters of a linear predictive model.

LPC is the most widely used method in speech coding and speech synthesis. It is a powerful speech analysis technique and a useful method for encoding good-quality speech at a low bit rate.

LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (for voiced sounds), with occasional added hissing and popping sounds (for voiceless sounds such as sibilants and plosives). Although apparently crude, this source–filter model is actually a close approximation of the reality of speech production. The glottis (the space between the vocal folds) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances; these resonances give rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives.

LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal after the subtraction of the filtered modeled signal is called the residue.
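The analysis step can be sketched in a few lines. The example below is a minimal illustration, not the method of any particular coder: it assumes the autocorrelation method with the Levinson-Durbin recursion (one common way to fit the predictor; the text above does not name a specific algorithm), and the function names `autocorrelation`, `levinson_durbin` and `inverse_filter` are hypothetical. It fits predictor coefficients to one frame and then subtracts the prediction from the signal to obtain the residue. Estimating the pitch and intensity of the buzz is left out.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], the autocorrelation of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Fit predictor coefficients a[1..order] from autocorrelation values.

    The predictor is s_hat[n] = sum_k a[k] * s[n - k].
    Returns (coefficients, remaining prediction-error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # silent or perfectly predictable frame
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error


def inverse_filter(frame, coeffs):
    """Residue e[n] = s[n] - sum_k a[k] * s[n - k].

    Samples before the start of the frame are treated as zero.
    """
    residue = []
    for n, sample in enumerate(frame):
        predicted = sum(c * frame[n - k]
                        for k, c in enumerate(coeffs, start=1)
                        if n - k >= 0)
        residue.append(sample - predicted)
    return residue


# Toy input (an assumption for illustration): a decaying exponential,
# which a first-order predictor with coefficient 0.9 models almost exactly.
frame = [0.9 ** n for n in range(200)]
coeffs, error = levinson_durbin(autocorrelation(frame, 2), 2)
residue = inverse_filter(frame, coeffs)
```

For this toy frame nearly all of the energy in the residue sits in its first sample, which is the sense in which inverse filtering removes the effect of the resonances and leaves only the excitation.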

The numbers that describe the intensity and frequency of the buzz, the formants, and the residue signal can be stored or transmitted elsewhere. LPC synthesizes the speech signal by reversing the process: use the buzz parameters and the residue to create a source signal, use the formants to create a filter (which represents the tube), and run the source through the filter, resulting in speech.
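The synthesis direction can be sketched the same way. This is a minimal illustration under stated assumptions: the filter is taken to be an all-pole filter driven by predictor coefficients, the buzz is approximated by a plain impulse train, and the names `synthesize` and `impulse_train` are hypothetical rather than drawn from any library.

```python
def impulse_train(length, period):
    """A crude stand-in for the buzz: one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def synthesize(excitation, coeffs):
    """Run a source signal through the all-pole filter (the 'tube').

    s[n] = e[n] + sum_k a[k] * s[n - k], with earlier output taken as zero.
    """
    out = []
    for n, e in enumerate(excitation):
        fed_back = sum(c * out[n - k]
                       for k, c in enumerate(coeffs, start=1)
                       if n - k >= 0)
        out.append(e + fed_back)
    return out


# Hypothetical parameters: a pulse every 4 samples and a single coefficient.
source = impulse_train(8, 4)
speech = synthesize(source, [0.5])
```

If the excitation passed to `synthesize` is the exact residue of a frame and the coefficients are the ones used to compute that residue, the filter reproduces the original frame; substituting a generated buzz for the residue trades accuracy for a much smaller description of the signal.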

Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames; generally, 30 to 50 frames per second give intelligible speech with good compression.
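As a rough illustration of the framing step, the sketch below cuts a signal into consecutive, non-overlapping frames. The 8000 Hz sample rate is an assumption chosen for the example, and the helper name `split_into_frames` is hypothetical; practical coders often use overlapping, windowed frames instead.

```python
def split_into_frames(signal, sample_rate, frames_per_second):
    """Split a list of samples into consecutive frames.

    The last frame may be shorter than the others if the signal length
    is not a multiple of the frame length.
    """
    frame_length = sample_rate // frames_per_second
    if frame_length <= 0:
        raise ValueError("frames_per_second must not exceed sample_rate")
    return [signal[start:start + frame_length]
            for start in range(0, len(signal), frame_length)]


# One second of placeholder samples at an assumed 8000 Hz sample rate.
signal = [0.0] * 8000
frames = split_into_frames(signal, 8000, 50)
```

At these assumed settings each frame holds 160 samples, so one set of filter and source parameters would describe 20 milliseconds of speech.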

Linear prediction (signal estimation) goes back to at least the 1940s when Norbert Wiener developed a mathematical theory for calculating the best filters and predictors for detecting signals hidden in noise. Soon after Claude Shannon established a general theory of coding, work on predictive coding was done by C. Chapin Cutler, Bernard M. Oliver and Henry C. Harrison. Peter Elias in 1955 published two papers on predictive coding of signals.

Linear predictors were applied to speech analysis independently by Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone in 1966, and by Bishnu S. Atal, Manfred R. Schroeder and John Burg in 1967. Itakura and Saito described a statistical approach based on maximum likelihood estimation; Atal and Schroeder described an adaptive linear predictor approach; Burg outlined an approach based on the principle of maximum entropy.
