Lip reading
Lip reading, also known as speechreading, is a technique of understanding a limited range of speech by visually interpreting the movements of the lips, face and tongue without sound. Estimates of the range of lip reading vary, with some figures as low as 30%, because lip reading relies on context, language knowledge, and any residual hearing. Although lip reading is used most extensively by deaf and hard-of-hearing people, most people with normal hearing can infer some speech information by observing a speaker's mouth.
Although speech perception is considered to be an auditory skill, it is intrinsically multimodal, since producing speech requires the speaker to make movements of the lips, teeth and tongue which are often visible in face-to-face communication. Information from the lips and face supports aural comprehension and most fluent listeners of a language are sensitive to seen speech actions (see McGurk effect). The extent to which people make use of seen speech actions varies with the visibility of the speech action and the knowledge and skill of the perceiver.
The phoneme is the smallest detectable unit of sound in a language that serves to distinguish words from one another. /pit/ and /pik/ differ by one phoneme and refer to different concepts. Spoken English has about 44 phonemes. For lip reading, the number of visually distinctive units (visemes) is much smaller, so several phonemes map onto each viseme. This is because many phonemes are produced within the mouth and throat, and are hard to see. These include glottal consonants and most gestures of the tongue. Voiced and unvoiced pairs look identical, such as [p] and [b], [k] and [g], [t] and [d], [f] and [v], and [s] and [z]; likewise for nasalisation (e.g. [m] vs. [b]). Homophenes are words that look similar when lip read, but which contain different phonemes; they are a crucial source of lip-reading errors. Because there are about three times as many phonemes as visemes in English, it is often claimed that only 30% of speech can be lip read.
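The many-to-one mapping from phonemes to visemes can be sketched in a few lines of Python. This is only an illustration: the viseme labels (`P`, `K`, and so on) and the function names are hypothetical, the table covers only the consonant pairs listed above, and any phoneme not in the table is assumed to form its own visual class. Published viseme inventories differ from this toy grouping.

```python
# Toy phoneme-to-viseme table built from the pairs named in the text.
# The class labels are made up for this sketch.
VISEME_OF = {
    "p": "P", "b": "P", "m": "P",   # bilabials, including the nasal [m]
    "k": "K", "g": "K",
    "t": "T", "d": "T",
    "f": "F", "v": "F",
    "s": "S", "z": "S",
}


def viseme_sequence(phonemes):
    """Collapse a phoneme sequence to its viseme sequence.

    Phonemes missing from the table are passed through unchanged,
    i.e. assumed to be visually distinct (a simplification).
    """
    return tuple(VISEME_OF.get(p, p) for p in phonemes)


def are_homophenes(phonemes_a, phonemes_b):
    """True if two different phoneme sequences look the same on the lips."""
    return (tuple(phonemes_a) != tuple(phonemes_b)
            and viseme_sequence(phonemes_a) == viseme_sequence(phonemes_b))


# Simplified transcriptions of "pat", "mad" and "fat".
PAT = ("p", "a", "t")
MAD = ("m", "a", "d")
FAT = ("f", "a", "t")

print(are_homophenes(PAT, MAD))  # True
print(are_homophenes(PAT, FAT))  # False
```

Under this toy table, "pat" and "mad" collapse to the same viseme sequence and so count as homophenes, while "fat" differs in its first viseme.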
Visemes can be captured as still images, but speech unfolds in time. The smooth articulation of speech sounds in sequence can mean that mouth patterns may be 'shaped' by an adjacent phoneme: the 'th' sound in 'tooth' and in 'teeth' appears very different because of the vocalic context. This feature of dynamic speech-reading affects lip-reading 'beyond the viseme'.
While visemes offer a useful starting point for understanding lipreading, some spoken distinctions within a viseme can be told apart by the viewer and can help support identification. Moreover, the statistical distribution of phonemes within the lexicon of a language is uneven. While there are clusters of words which are phonemically similar to each other ('lexical neighbors', such as spit/sip/sit/stick, etc.), others are unlike all other words: they are 'unique' in terms of the distribution of their phonemes ('umbrella' may be an example). Skilled users of the language bring this knowledge to bear when interpreting speech, so it is generally harder to identify a heard word with many lexical neighbors than one with few neighbors. Applying this insight to seen speech, some words in the language can be unambiguously lip-read even when they contain few visemes - simply because no other words could possibly 'fit'.
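The idea that a word can be lip-read unambiguously when no other word 'fits' can be illustrated by grouping a lexicon by viseme sequence and looking for groups of size one. The sketch below rests on loud assumptions: the six-word lexicon, its simplified transcriptions, the viseme labels and the function names are all invented for illustration, and phonemes outside the small table are treated as visually distinct.

```python
from collections import defaultdict

# Hypothetical viseme classes (consonant pairs that look alike on the lips).
VISEME_OF = {
    "p": "P", "b": "P", "m": "P",
    "k": "K", "g": "K",
    "t": "T", "d": "T",
    "f": "F", "v": "F",
    "s": "S", "z": "S",
}

# Toy lexicon with simplified transcriptions; not real corpus data.
TOY_LEXICON = {
    "pat": ("p", "a", "t"),
    "bat": ("b", "a", "t"),
    "mad": ("m", "a", "d"),
    "fig": ("f", "i", "g"),
    "sip": ("s", "i", "p"),
    "zip": ("z", "i", "p"),
}


def viseme_key(phonemes):
    """Viseme sequence for a word; unlisted phonemes keep their own class."""
    return tuple(VISEME_OF.get(p, p) for p in phonemes)


def group_by_viseme_key(lexicon):
    """Map each viseme sequence to the list of words that share it."""
    groups = defaultdict(list)
    for word, phonemes in lexicon.items():
        groups[viseme_key(phonemes)].append(word)
    return dict(groups)


def visually_unique_words(lexicon):
    """Words whose viseme sequence is shared with no other word."""
    groups = group_by_viseme_key(lexicon)
    return sorted(ws[0] for ws in groups.values() if len(ws) == 1)


print(visually_unique_words(TOY_LEXICON))  # ['fig']
```

In this toy lexicon "pat", "bat" and "mad" are visual neighbors, as are "sip" and "zip", whereas "fig" has no competitor and is identifiable from its viseme sequence alone. With a real lexicon the groups would be far larger and the result would depend heavily on the viseme inventory chosen.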
Many factors affect the visibility of a speaking face, including illumination, movement of the head/camera, frame-rate of the moving image and distance from the viewer. Head movement that accompanies normal speech can also improve lip-reading, independently of oral actions. However, when lip-reading connected speech, the viewer's knowledge of the spoken language, familiarity with the speaker and style of speech, and the context of the lip-read material are as important as the visibility of the speaker. While most hearing people are sensitive to seen speech, there is great variability in individual speechreading skill. Good lipreaders are often more accurate than poor lipreaders at identifying phonemes from visual speech.
A simple visemic measure of 'lipreadability' has been questioned by some researchers. The 'phoneme equivalence class' measure takes into account the statistical structure of the lexicon and can also accommodate individual differences in lip-reading ability. In line with this, excellent lipreading is often associated with more broad-based cognitive skills including general language proficiency, executive function and working memory.
Seeing the mouth plays a role in the very young infant's early sensitivity to speech, and prepares them to become speakers at 1–2 years. In order to imitate, a baby must learn to shape their lips in accordance with the sounds they hear; seeing the speaker may help them to do this. Newborns imitate adult mouth movements such as sticking out the tongue or opening the mouth, which could be a precursor to further imitation and later language learning. Infants are disturbed when audiovisual speech of a familiar speaker is desynchronized and tend to show different looking patterns for familiar than for unfamiliar faces when matched to (recorded) voices. Infants are sensitive to McGurk illusions months before they have learned to speak. These studies and many more point to a role for vision in the development of sensitivity to (auditory) speech in the first half-year of life.
