ARPABET
ARPABET (also spelled ARPAbet) is a set of phonetic transcription codes developed by the Advanced Research Projects Agency (ARPA) as part of its Speech Understanding Research project in the 1970s. It represents phonemes and allophones of General American English with distinct sequences of ASCII characters. Two systems were devised: one represents each segment with a single character (alternating upper- and lower-case letters), and the other uses one or two case-insensitive characters. The latter is far more widely adopted.[1]
ARPABET has been used in several speech synthesizers, including Computalker for the S-100 system, SAM (Software Automatic Mouth) for the Atari 8-bit computers and Commodore 64, the Say utility shipped with the Amiga, TextAssist for the PC, and Speakeasy from Intelligent Artefacts, which used the Votrax SC-01 speech synthesizer IC. It is also used in the CMU Pronouncing Dictionary. A revised version of ARPABET is used in the TIMIT corpus.[1]
Symbols
Stress is indicated by a digit immediately following a vowel. Auxiliary symbols are identical in 1- and 2-letter codes. In 2-letter notation, segments are separated by a space.
| ARPABET (1-letter) | ARPABET (2-letter) | IPA | Example(s) |
|---|---|---|---|
| a | AA | ɑ~ɒ | balm, bot (with father–bother merger) |
| @ | AE | æ | bat |
| A | AH | ʌ | buck |
| c | AO | ɔ | caught, story |
| W | AW | aʊ | bout |
| x | AX | ə | comma |
| N/a | AXR[3] | ɚ | letter, forward |
| Y | AY | aɪ | bite |
| E | EH | ɛ | bet |
| R | ER | ɝ | bird, foreword |
| e | EY | eɪ | bait |
| I | IH | ɪ | bit |
| X | IX | ɨ | roses, rabbit |
| i | IY | i | beat |
| o | OW | oʊ | boat |
| O | OY | ɔɪ | boy |
| U | UH | ʊ | book |
| u | UW | u | boot |
| N/a | UX[3] | ʉ | dude |
| ARPABET (1-letter) | ARPABET (2-letter) | IPA | Example |
|---|---|---|---|
| b | B | b | buy |
| C | CH | tʃ | China |
| d | D | d | die |
| D | DH | ð | thy |
| F | DX | ɾ | butter |
| L | EL | l̩ | bottle |
| M | EM | m̩ | rhythm |
| N | EN | n̩ | button |
| f | F | f | fight |
| g | G | ɡ | guy |
| h | HH or H[3] | h | high |
| J | JH | dʒ | jive |
| k | K | k | kite |
| l | L | l | lie |
| m | M | m | my |
| n | N | n | nigh |
| G | NX or NG[3] | ŋ | sing |
| N/a | NX[3] | ɾ̃ | winter |
| p | P | p | pie |
| Q | Q | ʔ | uh-oh |
| r | R | ɹ | rye |
| s | S | s | sigh |
| S | SH | ʃ | shy |
| t | T | t | tie |
| T | TH | θ | thigh |
| v | V | v | vie |
| w | W | w | wise |
| H | WH | ʍ | why (without wine–whine merger) |
| y | Y | j | yacht |
| z | Z | z | zoo |
| Z | ZH | ʒ | pleasure |
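The two tables above amount to a lookup from 1-letter to 2-letter codes. The following is a minimal sketch of that conversion, not a reference implementation. The name `one_to_two` is hypothetical, and where the table lists alternatives (`HH or H`, `NX or NG`) the choice of `HH` and `NG` is an assumption made for illustration.

```python
# Mapping copied from the vowel and consonant tables above.
# Assumption: where the table offers two 2-letter forms, HH and NG are used.
ONE_TO_TWO = {
    # vowels
    "a": "AA", "@": "AE", "A": "AH", "c": "AO", "W": "AW", "x": "AX",
    "Y": "AY", "E": "EH", "R": "ER", "e": "EY", "I": "IH", "X": "IX",
    "i": "IY", "o": "OW", "O": "OY", "U": "UH", "u": "UW",
    # consonants
    "b": "B", "C": "CH", "d": "D", "D": "DH", "F": "DX", "L": "EL",
    "M": "EM", "N": "EN", "f": "F", "g": "G", "h": "HH", "J": "JH",
    "k": "K", "l": "L", "m": "M", "n": "N", "G": "NG", "p": "P",
    "Q": "Q", "r": "R", "s": "S", "S": "SH", "t": "T", "T": "TH",
    "v": "V", "w": "W", "H": "WH", "y": "Y", "z": "Z", "Z": "ZH",
}


def one_to_two(text):
    """Convert a 1-letter ARPABET string to space-separated 2-letter codes.

    The 1-letter system is case-sensitive, so the input is used as-is.
    A digit is treated as a stress marker and attached to the preceding segment.
    """
    segments = []
    for ch in text:
        if ch.isdigit():
            if not segments:
                raise ValueError("stress digit with no preceding segment")
            segments[-1] += ch
        elif ch in ONE_TO_TWO:
            segments.append(ONE_TO_TWO[ch])
        else:
            raise ValueError(f"unknown 1-letter ARPABET code: {ch!r}")
    return " ".join(segments)


print(one_to_two("bY1t"))  # B AY1 T
```

Auxiliary symbols such as `-` or `#` are not handled in this sketch; since they are identical in both systems, they could be passed through unchanged.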
| ARPABET | Description |
|---|---|
| 0 | No stress |
| 1 | Primary stress |
| 2 | Secondary stress |
| 3... | Tertiary and further stress |
| - | Silence |
| ! | Non-speech segment |
| + | Morpheme boundary |
| / | Word boundary |
| # | Utterance boundary |
| : | Tone group boundary |
| :1 or . | Falling or declining juncture |
| :2 or ? | Rising or internal juncture |
| :3 or . | Fall-rise or non-terminal juncture |
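The conventions of the 2-letter notation described in this section (space-separated segments, case-insensitive codes, a stress digit immediately following a vowel) can be illustrated with a small parser. This is a sketch under assumptions: the function name `parse_transcription` and the `(symbol, stress)` tuple layout are hypothetical, and any token that is not letters followed by optional digits is kept verbatim as an auxiliary symbol.

```python
import re

# Letters, optionally followed by a stress digit (or digits, for "3...").
_SEGMENT = re.compile(r"^([A-Z]+)(\d+)?$")


def parse_transcription(text):
    """Split a 2-letter ARPABET transcription into (symbol, stress) pairs.

    stress is an int when a digit follows the symbol, otherwise None.
    Tokens that do not look like a segment (e.g. "#", "/") are returned
    unchanged with stress None.
    """
    segments = []
    for token in text.split():
        match = _SEGMENT.match(token.upper())  # 2-letter codes are case-insensitive
        if match is None:
            segments.append((token, None))
            continue
        symbol, stress = match.groups()
        segments.append((symbol, int(stress) if stress is not None else None))
    return segments


print(parse_transcription("B AY1 T"))
# [('B', None), ('AY', 1), ('T', None)]
```

The parser does not check that a symbol is a valid ARPABET code or that stress digits appear only on vowels; a stricter version could validate against the tables above.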
TIMIT
In TIMIT, the following symbols are used in addition to the ones listed above:[4]
| Symbol | IPA | Example | Description |
|---|---|---|---|
| AX-H | ə̥ | suspect | Devoiced /ə/ |
| BCL | b̚ | obtain | [b] closure |
| DCL | d̚ | width | [d] closure |
| ENG | ŋ̍ | Washington | Syllabic [ŋ] |
| GCL | ɡ̚ | dogtooth | [ɡ] closure |
| HV | ɦ | ahead | Voiced /h/ |
| KCL | k̚ | doctor | [k] closure |
| PCL | p̚ | accept | [p] closure |
| TCL | t̚ | catnip | [t] closure |
| PAU | N/a | N/a | Pause |
| EPI | N/a | N/a | Epenthetic silence |
| H# | N/a | N/a | Begin/end marker |
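When working with label sequences that use the TIMIT additions, a common first step is to separate non-speech markers and stop closures from the other segments. The sketch below groups the symbols from the table above. The helper names and the closure-merging rule (dropping a closure that is immediately followed by the release of the same stop) are assumptions for illustration, not part of the TIMIT specification.

```python
# Symbol groups taken from the table above.
TIMIT_CLOSURES = {"BCL", "DCL", "GCL", "KCL", "PCL", "TCL"}
TIMIT_NON_SPEECH = {"PAU", "EPI", "H#"}


def strip_non_speech(labels):
    """Remove pause, epenthetic-silence, and begin/end markers."""
    return [label for label in labels if label.upper() not in TIMIT_NON_SPEECH]


def merge_closures(labels):
    """Drop a closure label when the next label is the matching stop release.

    Assumption for illustration: "TCL T" is collapsed to "T"; a closure
    with no matching release is left in place.
    """
    merged = []
    for i, label in enumerate(labels):
        upper = label.upper()
        following = labels[i + 1].upper() if i + 1 < len(labels) else None
        # "TCL"[:-2] == "T": the stop whose closure this label marks.
        if upper in TIMIT_CLOSURES and following == upper[:-2]:
            continue
        merged.append(label)
    return merged


labels = ["H#", "TCL", "T", "IY", "PAU", "KCL", "H#"]
print(merge_closures(strip_non_speech(labels)))  # ['T', 'IY', 'KCL']
```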
See also
- Comparison of ASCII encodings of the International Phonetic Alphabet
- SAMPA, language-specific
- X-SAMPA, encoding the whole International Phonetic Alphabet
- Pronunciation respelling for English
References
- [1] Klautau, Aldebaro (2001). "ARPABET and the TIMIT alphabet" (PDF). Archived from the original (PDF) on June 3, 2016. Retrieved September 8, 2017.
- [2] Rice, Lloyd (April 1976). "Hardware & software for speech synthesis". Dr. Dobb's Journal of Computer Calisthenics & Orthodontia. 1 (4): 6–8.
- [3] Jurafsky, Daniel; Martin, James H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall. pp. 94–5. ISBN 0-1309-5069-6.
- [4] "Table of all the phonemic and phonetic symbols used in the TIMIT lexicon". Linguistic Data Consortium. October 12, 1990. Retrieved September 8, 2017.
ARPABET
History and Development
Origins in ARPA Projects
The Advanced Research Projects Agency (ARPA, now DARPA) launched the Speech Understanding Research (SUR) program in 1971, marking a pivotal U.S. government-funded initiative to overcome longstanding challenges in continuous speech recognition for American English. This five-year effort, spanning 1971 to 1976, responded to criticisms of the field's progress, such as those voiced by John R. Pierce in 1969, by allocating resources to develop practical systems capable of handling connected speech from multiple speakers with limited training. The program emphasized interdisciplinary collaboration among computer scientists, linguists, and engineers to create robust acoustic-phonetic models and understanding frameworks, ultimately demonstrating four major systems by 1976.[5][6]
A core outcome of the SUR program was the creation of ARPABET, a phonetic transcription system designed specifically as a machine-readable alphabet to enable consistent representation of American English phonemes in computational environments. The primary goal was to standardize notation for acoustic-phonetic labeling, allowing researchers to annotate speech data, model pronunciations, and integrate phonetic knowledge into recognition algorithms without reliance on complex international symbols incompatible with early computing hardware. This addressed the need for an ASCII-friendly alternative to systems like the International Phonetic Alphabet (IPA), facilitating data sharing across project sites and accelerating experiments in speech segmentation and verification.[2][7]
Key contributors to ARPABET's development included researchers at Carnegie Mellon University (CMU), where teams building the Hearsay and Harpy systems required precise phonetic dictionaries for word hypothesis generation and verification, as well as collaborators at other ARPA contractors such as Bolt Beranek and Newman (BBN), who integrated similar notations into their HWIM system. These groups, funded under the SUR initiative, collectively defined ARPABET to support tasks like syllable-based equivalence classes and allophonic labeling, ensuring interoperability in phonetic networks and acoustic matching. The first formal definition appeared in an early 1970s ARPA project report, specifying 39 core symbols for consonants and vowels, augmented by stress markers (0 for no stress, 1 for primary stress, and 2 for secondary stress) to capture prosodic features essential for natural speech processing.[6][2]
Evolution and Standardization
Following its initial definition in the 1970s as part of ARPA-funded speech research, ARPABET's core phoneme inventory of 39 symbols remained stable, with formal documentation provided in Shoup (1980).[2]
Standardization efforts accelerated through DARPA's Strategic Computing Initiative, launched in 1983, which emphasized interoperability in AI and speech technologies across funded labs. This initiative promoted ARPABET as a common encoding scheme to ensure consistent phonetic annotations in shared datasets and evaluation benchmarks, reducing variability in speech recognition experiments. DARPA's oversight helped establish ARPABET as the de facto standard for American English phonetics in computational linguistics during the decade.[8]
Key milestones included its integration into the 1987 DARPA Resource Management evaluation, the first large-scale benchmark for continuous speech recognition systems, where ARPABET transcriptions were used to assess performance on naval resource queries. Formal documentation appeared in NIST reports, such as those accompanying the TIMIT corpus developed under DARPA auspices from 1982 to 1986, which extended ARPABET for time-aligned phonetic labeling.[9][10]
Prosody handling, including utterance boundaries marked by # to denote silences and phrase breaks, was included from early development and supported segmentation in connected speech analysis. These features, verified in DARPA workshops like the 1986 Speech Recognition Meeting, aided robust modeling of intonation and timing without altering the core phoneme inventory.[11]
Phoneme Inventory
Vowel Phonemes
ARPABET uses a distinct set of symbols to represent the vowel phonemes of General American English, focusing on the primary distinctions in articulation and quality observed in speech. These symbols encode both monophthongs, which maintain a relatively steady tongue position, and diphthongs, which involve a glide between two vowel targets. The system distinguishes tense and lax vowels, particularly in the high and mid positions, to reflect durational and spectral differences crucial for speech recognition and synthesis.[2]
The core monophthong vowels consist of 11 symbols, capturing variations in height (high, mid, low), backness (front, central, back), and rounding, as well as r-colored and reduced forms. For instance, tense high front IY contrasts with lax high front IH: IY has a higher second formant, around 2,200-2,500 Hz, which aids perceptual separation. Similarly, low back AA has a high first formant (F1 ≈ 700-800 Hz) and a low second formant (F2 ≈ 1,100-1,300 Hz); the low F2 distinguishes it from low front AE. The central schwa AX serves as the most common reduced vowel in unstressed positions, with neutral formants (F1 ≈ 500 Hz, F2 ≈ 1,500 Hz). R-colored ER incorporates rhotic resonance, lowering F3 to about 1,600-1,800 Hz. These acoustic properties were considered in ARPABET's design to support machine processing of American English speech variability.[2][12]
The four diphthong symbols covered here each denote a dynamic transition: AY glides from low central to high front, AW from low central to high back, EY from mid front to high front, and OW from mid back to high back. A fifth diphthong symbol, OY (/ɔɪ/, as in "boy"), is also part of ARPABET. These symbols capture the off-glides in words like "bite" (B AY T), where the trajectory shifts F2 from ≈1,200 Hz to ≈2,200 Hz. Each symbol names the primary vowel target followed by the glide component, facilitating efficient transcription in phonetic databases.[2][13]
Stress is indicated directly on vowels using numeric markers: primary stress with '1' (often rendered as ´ in display), secondary stress with '2' (rendered as `), and no stress with '0' (the default, which may be omitted). Stress markers apply exclusively to vowels, as in "father" transcribed as F AA1 DH ER0, where AA bears primary stress, which increases its duration and pitch prominence. These markers enable precise prosodic annotation in applications like speech synthesis.[13][14]
The following table summarizes the vowel phonemes, with ARPABET symbols, approximate IPA equivalents, articulatory descriptions, example words, and transcriptions:
| Symbol | IPA Approx. | Description | Example Word | Transcription |
|---|---|---|---|---|
| AA | /ɑ/ | Low back unrounded monophthong | father | F AA1 DH ER |
| AE | /æ/ | Low front unrounded monophthong | bat | B AE1 T |
| AH | /ʌ/ | Mid central unrounded monophthong | but | B AH1 T |
| AO | /ɔ/ | Mid back rounded monophthong | bought | B AO1 T |
| AX | /ə/ | Mid central reduced monophthong (schwa) | sofa | S OW1 F AX |
| EH | /ɛ/ | Mid front unrounded monophthong (lax) | bet | B EH1 T |
| ER | /ɝ/ | Mid central r-colored monophthong | bird | B ER1 D |
| IH | /ɪ/ | High front unrounded monophthong (lax) | bit | B IH1 T |
| IY | /i/ | High front unrounded monophthong (tense) | beat | B IY1 T |
| UH | /ʊ/ | High back rounded monophthong (lax) | put | P UH1 T |
| UW | /u/ | High back rounded monophthong (tense) | boot | B UW1 T |
| AY | /aɪ/ | Low to high front diphthong | bite | B AY1 T |
| AW | /aʊ/ | Low to high back diphthong | bout | B AW1 T |
| EY | /eɪ/ | Mid to high front diphthong | bait | B EY1 T |
| OW | /oʊ/ | Mid to high back diphthong | boat | B OW1 T |
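Because stress digits attach only to vowels, the stress pattern of a word (and its syllable count) can be read directly off a transcription. The sketch below assumes the vowel symbols from the table above, plus OY, which appears in the symbol table earlier in this document rather than in this one. The name `stress_pattern` is hypothetical, and an omitted digit is treated as 0, following the "default or omitted" convention described above.

```python
# Vowel symbols from the table above; OY is taken from the earlier symbol table.
MONOPHTHONGS = {"AA", "AE", "AH", "AO", "AX", "EH", "ER", "IH", "IY", "UH", "UW"}
DIPHTHONGS = {"AY", "AW", "EY", "OW", "OY"}
VOWELS = MONOPHTHONGS | DIPHTHONGS


def stress_pattern(transcription):
    """Return one stress level per vowel, in order of appearance.

    A vowel without a digit counts as unstressed (0).
    """
    pattern = []
    for token in transcription.upper().split():
        symbol = token.rstrip("0123456789")
        if symbol in VOWELS:
            digits = token[len(symbol):]
            pattern.append(int(digits) if digits else 0)
    return pattern


print(stress_pattern("F AA1 DH ER0"))  # [1, 0]
print(stress_pattern("S OW1 F AX"))    # [1, 0]
```

The length of the returned list is the number of vowel segments, which serves as a simple syllable count for transcriptions that use only these symbols.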
Consonant Phonemes
ARPABET employs 24 consonant phonemes to transcribe General American English sounds, using uppercase ASCII symbols for sounds that differ in place and manner of articulation, voicing, and nasality. These phonemes form the consonantal backbone for applications in speech recognition and synthesis, where precise distinctions enable accurate acoustic modeling. The inventory follows the phonemic contrasts of American English and omits allophonic variation such as the aspiration of voiceless stops.[2]
The stop consonants comprise six symbols, organized as voiceless and voiced pairs across bilabial, alveolar, and velar places of articulation: P (/p/), B (/b/), T (/t/), D (/d/), K (/k/), and G (/g/). Stops involve a complete closure in the vocal tract followed by a sudden release of air pressure; the voiceless variants P, T, and K are typically aspirated ([pʰ], [tʰ], [kʰ]) at the onset of stressed syllables, as in "pin" (P IH N) or "cat" (K AE T). This aspiration, a burst of voiceless airflow, distinguishes English stops from their unaspirated counterparts in other languages, though ARPABET uses a single symbol for each. Alveolar stops (T, D) differ from velar ones (K, G) in the tongue's contact point (the alveolar ridge behind the upper teeth versus the soft palate), yielding contrasts like "tip" (T IH P) versus "keep" (K IY P).[2]
Fricatives and affricates are captured by nine fricative symbols and two affricate symbols, covering continuous airflow turbulence and combined stop-fricative sequences: F (/f/), V (/v/), TH (/θ/), DH (/ð/), S (/s/), Z (/z/), SH (/ʃ/), ZH (/ʒ/), and HH (/h/) for fricatives, plus CH (/tʃ/) and JH (/dʒ/) for affricates. Fricatives produce noise from air forced through a narrow constriction, with voicing distinguishing pairs like S (voiceless alveolar, as in "soup" S UW P) from Z (voiced, as in "zoo" Z UW). Affricates begin with a stop closure and transition to a fricative release, as in CH for "cherry" (CH EH R IY). Postalveolar fricatives (SH, ZH) have their constriction further back than alveolar ones (S, Z), creating contrasts like "ship" (SH IH P) versus "sip" (S IH P); the glottal HH represents a breathy onset, as in "honey" (HH AH N IY).[2]
Nasals, liquids, and glides total seven symbols for resonant sounds with partial or no obstruction: M (/m/), N (/n/), NG (/ŋ/) for nasals; L (/l/) for the lateral liquid; and W (/w/), Y (/j/), R (/ɹ/) for glides and the rhotic approximant. Nasals divert airflow through the nose via a lowered velum, with place varying from bilabial M (as in "mint" M IH N T) to alveolar N ("nutmeg" N AH T M EH G) to velar NG ("baking" B EY K IH NG). The liquid L allows air to flow around the sides of the tongue ("licorice" L IH K ER IH SH), while R is a bunched or retroflex approximant ("rice" R AY S) that differs from L in tongue shape. Glides W and Y are vowel-like transitions, labial-velar W in "kiwi" (K IY W IY) and palatal Y in "yellow" (Y EH L OW), enabling smooth syllable onsets.[2]
The following table summarizes the ARPABET consonant symbols, their IPA equivalents, articulatory details, and representative examples:
| ARPABET | IPA | Place of Articulation | Manner of Articulation | Example Word | ARPABET Example |
|---|---|---|---|---|---|
| P | /p/ | Bilabial | Stop (voiceless, aspirated) | pin | P IH N |
| B | /b/ | Bilabial | Stop (voiced) | bay | B EY |
| T | /t/ | Alveolar | Stop (voiceless, aspirated) | tea | T IY |
| D | /d/ | Alveolar | Stop (voiced) | dill | D IH L |
| K | /k/ | Velar | Stop (voiceless, aspirated) | cook | K UH K |
| G | /g/ | Velar | Stop (voiced) | garlic | G AA R L IH K |
| CH | /tʃ/ | Postalveolar | Affricate (voiceless) | cherry | CH EH R IY |
| JH | /dʒ/ | Postalveolar | Affricate (voiced) | jar | JH AA R |
| F | /f/ | Labiodental | Fricative (voiceless) | flour | F L AW ER |
| V | /v/ | Labiodental | Fricative (voiced) | clove | K L OW V |
| TH | /θ/ | Dental | Fricative (voiceless) | thick | TH IH K |
| DH | /ð/ | Dental | Fricative (voiced) | those | DH OW Z |
| S | /s/ | Alveolar | Fricative (voiceless) | soup | S UW P |
| Z | /z/ | Alveolar | Fricative (voiced) | zoo | Z UW |
| SH | /ʃ/ | Postalveolar | Fricative (voiceless) | ship | SH IH P |
| ZH | /ʒ/ | Postalveolar | Fricative (voiced) | azure | AE ZH ER |
| HH | /h/ | Glottal | Fricative (voiceless) | honey | HH AH N IY |
| M | /m/ | Bilabial | Nasal | mint | M IH N T |
| N | /n/ | Alveolar | Nasal | nutmeg | N AH T M EH G |
| NG | /ŋ/ | Velar | Nasal | baking | B EY K IH NG |
| L | /l/ | Alveolar | Lateral approximant | licorice | L IH K ER IH SH |
| R | /ɹ/ | Alveolar | Approximant (rhotic) | rice | R AY S |
| W | /w/ | Labial-velar | Glide | kiwi | K IY W IY |
| Y | /j/ | Palatal | Glide | yellow | Y EH L OW |
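Taken together, the vowel and consonant tables in this section define a symbol-by-symbol mapping from ARPABET to approximate IPA. The following sketch applies that mapping to a transcription. The name `to_ipa` is hypothetical, and dropping stress digits instead of rendering them as IPA stress marks is a simplification chosen here for brevity.

```python
# IPA values copied from the vowel and consonant tables in this section.
ARPABET_TO_IPA = {
    # vowels
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AX": "ə", "EH": "ɛ",
    "ER": "ɝ", "IH": "ɪ", "IY": "i", "UH": "ʊ", "UW": "u",
    "AY": "aɪ", "AW": "aʊ", "EY": "eɪ", "OW": "oʊ",
    # consonants
    "P": "p", "B": "b", "T": "t", "D": "d", "K": "k", "G": "g",
    "CH": "tʃ", "JH": "dʒ", "F": "f", "V": "v", "TH": "θ", "DH": "ð",
    "S": "s", "Z": "z", "SH": "ʃ", "ZH": "ʒ", "HH": "h",
    "M": "m", "N": "n", "NG": "ŋ", "L": "l", "R": "ɹ", "W": "w", "Y": "j",
}


def to_ipa(transcription):
    """Convert a space-separated ARPABET transcription to an IPA string.

    Stress digits are stripped (a simplification); unknown symbols raise KeyError.
    """
    ipa = []
    for token in transcription.split():
        symbol = token.upper().rstrip("0123456789")
        if symbol not in ARPABET_TO_IPA:
            raise KeyError(f"unknown ARPABET symbol: {token!r}")
        ipa.append(ARPABET_TO_IPA[symbol])
    return "".join(ipa)


print(to_ipa("B AY1 T"))     # baɪt
print(to_ipa("Y EH L OW"))   # jɛloʊ
```

A fuller converter would need the remaining symbols from the complete inventory (for example OY, IX, or DX) and a rule for placing IPA stress marks at syllable boundaries, neither of which is attempted here.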
