List of Latin-script alphabets
View on Wikipedia

The lists and tables below summarize and compare the letter inventories of some of the Latin-script alphabets. In this article, the scope of the word "alphabet" is broadened to include letters with tone marks, and other diacritics used to represent a wide range of orthographic traditions, without regard to whether or how they are sequenced in their alphabet or the table.
Parentheses indicate characters not used in modern standard orthographies of the languages, but used in obsolete and/or dialectal forms.
Letters contained in the ISO basic Latin alphabet
[edit]Alphabets that contain only ISO basic Latin letters
[edit]Among alphabets for natural languages the English,[36] Indonesian, and Malay alphabets only use the 26 letters in both cases.
Among alphabets for constructed languages the Ido and Interlingua alphabets only use the 26 letters, while Toki Pona uses a 14-letter subset.
- German (ß), Scandinavian (æ)
Extended by diacritical marks
[edit]- Spanish (ñ), German (ä, ö, and ü)
Extended by multigraphs
[edit]- Filipino (ng)
- Old French (ch)
Alphabets that contain all ISO basic Latin letters
[edit]Among alphabets for natural languages the Afrikaans,[54] Aromanian, Azerbaijani (some dialects)[53], Basque,[4], Celtic British, Catalan,[6] Cornish, Czech,[8] Danish,[9] Dutch,[10] Emilian-Romagnol, Filipino,[11] Finnish, French,[12], German,[13] Greenlandic, Hungarian,[15] Javanese, Karakalpak,[23] Kurdish, Modern Latin, Luxembourgish, Norwegian,[9] Oromo[65], Papiamento[63], Polish[22], Portuguese, Quechua, Rhaeto-Romance, Romanian, Slovak,[24] Spanish,[25] Sundanese, Swedish, Tswana,[52] Uyghur, Venda,[51] Võro, Walloon,[27] West Frisian, Xhosa, Zhuang, Zulu alphabets include all 26 letters, at least in their largest version.
Among alphabets for constructed languages the Interglossa and Occidental alphabets include all 26 letters.
The International Phonetic Alphabet (IPA) includes all 26 letters in their lowercase forms, although g is always single-storey (ɡ) in the IPA and never double-storey (
).
Alphabets that do not contain all ISO basic Latin letters
[edit]This list is based on official definitions of each alphabet. However, excluded letters might occur in non-integrated loan words and place names.
| Alphabet | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | # |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Classical Latin[2] | A | B | C | D | E | F | G | H | I | K | L | M | N | O | P | Q | R | S | T | V | X | Y | Z | 23 | |||
| Albanian[3] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | X | Y | Z | 25 | |
| Anglo-Saxon | A | B | C | D | E | F | G | H | I | K | L | M | N | O | P | Q | R | S | T | U | X | Y | Z | 23 | |||
| Arapaho | B | C | E | H | I | K | N | O | S | T | U | W | Y | 13 | |||||||||||||
| Arbëresh | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | X | Y | Z | 25 | |
| Asturian | A | B | C | D | E | F | G | H | I | L | M | N | O | P | Q | R | S | T | U | V | X | Y | Z | 23 | |||
| Azerbaijani[53] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | X | Y | Z | 25 | |
| Bambara[39] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | W | Y | Z | 23 | |||
| Belarusian[5] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Y | Z | 23 | |||
| Berber | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | Q | R | S | T | U | W | X | Y | Z | 24 | ||
| Bislama[45] | A | B | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | W | Y | 22 | ||||
| Breton | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | W | Y | Z | 24 | ||
| Chamorro[43] | A | B | C | D | E | F | G | H | I | K | L | M | N | O | P | R | S | T | U | Y | 20 | ||||||
| Chewa | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | W | Y | Z | 24 | ||
| Corsican[31] | A | B | C | D | E | F | G | H | I | J | L | M | N | O | P | Q | R | S | T | U | V | Z | 22 | ||||
| Crimean Tatar | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | Y | Z | 24 | ||
| Croatian [7] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Z | 22 | ||||
| Cypriot Arabic[59] | A | B | C | D | E | F | G | I | J | K | L | M | N | O | P | R | S | T | U | V | W | X | Y | Z | 24 | ||
| Dakelh[61] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | W | Y | Z | 22 | ||
| Dakota | A | B | C | D | E | G | H | I | J | K | M | N | O | P | S | T | U | W | Y | Z | 20 | ||||||
| Dalecarlian | A | B | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | W | Y | 22 | ||||
| Dinka[40] | A | B | C | D | E | G | H | I | J | K | L | M | N | O | P | R | T | U | W | Y | 20 | ||||||
| Esperanto | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Z | 22 | ||||
| Estonian | A | B | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Z | 21 | |||||
| Extremaduran | A | B | C | D | E | F | G | H | I | J | L | M | N | O | P | Q | R | S | T | U | V | X | Y | Z | 24 | ||
| Fala | A | B | C | D | E | F | G | H | I | J | L | M | N | O | P | Q | R | S | T | U | V | X | Z | 23 | |||
| Faroese | A | B | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Y | 21 | |||||
| Filipino Abakada[11] | A | B | D | E | G | H | I | K | L | M | N | O | P | R | S | T | U | W | Y | 19 | |||||||
| Friulian | A | B | C | D | E | F | G | H | I | J | L | M | N | O | P | R | S | T | U | V | Z | 21 | |||||
| Fula[41] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | W | X | Y | 23 | |||
| Gagauz | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Y | Z | 23 | |||
| Galician[33] | A | B | C | D | E | F | G | H | I | L | M | N | O | P | Q | R | S | T | U | V | X | Z | 22 | ||||
| Gilbertese | A | B | E | I | K | M | N | O | R | T | U | W | 12 | ||||||||||||||
| Glosa | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Z | 25 | |
| Traditional Greenlandic | A | E | F | G | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | 18 | ||||||||
| Guaraní[14] | A | B | C | D | E | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Y | 21 | |||||
| Gwich'in[67] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | R | S | T | U | V | W | Y | Z | 23 | |||
| Haitian | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | W | X | Y | Z | 25 | |
| Hän | A | B | C | D | E | G | H | I | J | K | L | M | N | O | P | R | S | T | U | W | Y | Z | 22 | ||||
| Hausa[30] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | W | Y | Z | 23 | |||
| Hawaiian | A | E | H | I | K | L | M | N | O | P | U | W | 12 | ||||||||||||||
| Icelandic | A | B | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | X | Y | 22 | ||||
| Igbo[42] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | W | Y | Z | 24 | ||
| Inari Sami | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Y | Z | 23 | |||
| Irish[16] | A | B | C | D | E | F | G | H | I | L | M | N | O | P | R | S | T | U | V | Z | 20 | ||||||
| Italian[17] | A | B | C | D | E | F | G | H | I | L | M | N | O | P | Q | R | S | T | U | V | Z | 21 | |||||
| Karelian | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Y | Z | 23 | |||
| Kashubian | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | W | Y | Z | 23 | |||
| Kazakh[38] | A | B | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | Y | Z | 23 | |||
| Khasi | A | B | D | E | G | H | I | J | K | L | M | N | O | P | R | S | T | U | W | Y | 20 | ||||||
| Latvian[18] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | (Y) | Z | 23 | |||
| Lithuanian[19] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Y | Z | 23 | |||
| Livonian[46] | A | B | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | (Y) | Z | 23 | ||||
| Lojban | A | B | C | D | E | F | G | I | J | K | L | M | N | O | P | R | S | T | U | V | X | Y | Z | 23 | |||
| Lule Sami[60] | A | B | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | 20 | ||||||
| Malagasy | A | B | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | V | Y | Z | 21 | |||||
| Maltese[20] | A | B | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Z | 24 | ||
| Manx Gaelic | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | Y | 24 | ||
| Māori[34] | A | E | G | H | I | K | M | N | O | P | R | T | U | W | 14 | ||||||||||||
| Marshallese[47] | A | B | D | E | I | J | K | L | M | N | O | P | R | T | U | W | Y | 17 | |||||||||
| Massachusett[62] | A | C | E | H | K | M | N | P | Q | S | T | U | W | Y | 14 | ||||||||||||
| Mirandese | A | B | C | D | E | F | G | H | I | J | L | M | N | O | P | Q | R | S | T | U | X | Y | Z | 23 | |||
| Mohawk | A | E | H | I | K | N | O | R | S | T | W | Y | 12 | ||||||||||||||
| Na'vi[57][2] | A | E | F | G | H | I | K | L | M | N | O | P | R | S | T | U | V | W | X | Y | Z | 21 | |||||
| Navajo | A | B | C | D | E | G | H | I | J | K | L | M | N | O | S | T | W | X | Y | Z | 20 | ||||||
| Northern Sami | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Z | 22 | ||||
| Nuxalk | A | C | H | I | K | L | M | N | P | Q | S | T | U | W | X | Y | 16 | ||||||||||
| Occitan | A | B | C | D | E | F | G | H | I | J | L | M | N | O | P | Q | R | S | T | U | V | X | Z | 23 | |||
| Pan-Nigerian | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | W | Y | Z | 24 | ||
| Piedmontese | A | B | C | D | E | F | G | H | I | J | L | M | N | O | P | Q | R | S | T | U | V | Z | 22 | ||||
| Pinyin[32] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | W | X | Y | Z | 25 | |
| Romani[29] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | X | Z | 23 | |||
| Rotokas | A | E | G | I | K | O | P | R | S | T | U | V | 12 | ||||||||||||||
| Samoan | A | E | F | G | H | I | K | L | M | N | O | P | R | S | T | U | V | 17 | |||||||||
| Sardinian | A | B | C | D | E | F | G | H | I | J | L | M | N | O | P | Q | R | S | T | U | V | X | Y | Z | 24 | ||
| Scottish Gaelic | A | B | C | D | E | F | G | H | I | L | M | N | O | P | R | S | T | U | 18 | ||||||||
| Serbian[7] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Z | 22 | ||||
| Shona | A | B | C | D | E | F | G | H | I | J | K | M | N | O | P | R | S | T | U | V | W | Y | Z | 24 | |||
| Sicilian | A | B | C | D | E | F | G | H | I | J | L | M | N | O | P | Q | R | S | T | U | V | Z | 22 | ||||
| Skolt Sami | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Z | 22 | ||||
| Slovenian | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Z | 22 | ||||
| Somali | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | Q | R | S | T | U | W | X | Y | 23 | |||
| Sorbian[64] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | W | Y | Z | 23 | |||
| Southern Sami | A | B | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Y | 21 | |||||
| Swahili | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | W | Y | Z | 24 | ||
| Tahitian | A | E | F | H | I | M | N | O | P | R | T | U | V | 13 | |||||||||||||
| Tetum | A | B | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | W | X | Z | 23 | |||
| Toki Pona | A | E | I | J | K | L | M | N | O | P | S | T | U | W | 14 | ||||||||||||
| Tongan | A | E | F | G | H | I | K | L | M | N | O | P | S | T | U | V | 16 | ||||||||||
| Turkish | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Y | Z | 23 | |||
| Turkmen[55] | A | B | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | W | Y | Z | 22 | ||||
| Ulithian[49] | A | B | C | D | E | F | G | H | I | K | L | M | N | O | P | R | S | T | U | W | Y | 21 | |||||
| Ume Sami | A | B | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Y | 21 | |||||
| Uzbek[25] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | X | Y | Z | 25 | |
| Veps | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | Z | 22 | ||||
| Vietnamese[26] | A | B | C | D | E | G | H | I | K | L | M | N | O | P | Q | R | S | T | U | V | X | Y | 22 | ||||
| Volapük | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | X | Y | Z | 24 | ||
| Waray | A | B | D | G | H | I | K | L | M | N | P | R | S | T | U | W | Y | 17 | |||||||||
| Welsh[28] | A | B | C | D | E | F | G | H | I | J | L | M | N | O | P | R | S | T | U | W | Y | 21 | |||||
| Wolof | A | B | C | D | E | F | G | I | J | K | L | M | N | O | P | Q | R | S | T | U | W | X | Y | 24 | |||
| Yapese[50] | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | W | Y | 23 | |||
| Yoruba[44] | A | B | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | W | Y | 21 | |||||
| Zuni[66] | A | B | C | D | E | H | I | K | L | M | N | O | P | S | T | U | W | Y | 18 | ||||||||
| count | 101 | 91 | 72 | 89 | 100 | 84 | 93 | 95 | 101 | 77 | 86 | 93 | 99 | 101 | 99 | 95 | 32 | 94 | 96 | 101 | 98 | 68 | 47 | 31 | 70 | 64 |
The I is used in two distinct versions in Turkic languages: dotless (I ı) and dotted (İ i). They are considered different letters, and case conversion must take care to preserve the distinction. Irish traditionally does not write the dot, or tittle, over the small letter i, but the language makes no distinction here if a dot is displayed, so no specific encoding and special case conversion rule is needed as it is for Turkic alphabets.
Statistics
[edit]The chart above lists a variety of alphabets that do not officially contain all 26 letters of the ISO basic Latin alphabet. In this list, at least one language lacks one of every letter. For each of the 26 basic ISO Latin alphabet letters, the number of alphabets in the list above using it is as follows:
| Letter | A | I | N | T | E | O | M | U | S | P | H | R | G | L | B | D | K | F | J | C | Y | V | Z | W | Q | X |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alphabets | 101 | 101 | 101 | 101 | 100 | 99 | 99 | 98 | 96 | 95 | 95 | 94 | 93 | 93 | 91 | 89 | 86 | 84 | 77 | 72 | 70 | 68 | 64 | 47 | 32 | 31 |
Letters not contained in the ISO basic Latin alphabet
[edit]Some languages have extended the Latin alphabet with ligatures, modified letters, or digraphs. These symbols are listed below.
Additional letters by type
[edit]Independent letters and ligatures
[edit]| Additional base letters | Æ | Ɑ | Ꞵ | Ð | Ǝ | Ə | Ɛ | Ɣ | I | Ɩ | Ŋ | Œ | Ɔ | Ꞷ | Ʊ | K' | ẞ | Ʃ | Þ | Ʋ | Ƿ | Ȝ | Ʒ | ʔ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| æ | ɑ | ꞵ | ð | ǝ | ə | ɛ | ɣ | ı | ɩ | ŋ | œ | ɔ | ꞷ | ʊ | ĸ | ß | ʃ | þ | ʋ | ƿ | ȝ | ʒ | ʔ | |
| Anglo-Saxon | Æ | Ð | Þ | Ƿ | Ȝ | |||||||||||||||||||
| Azeri[53] | Ə | I | ||||||||||||||||||||||
| Bambara[39] | Ɛ | Ŋ | Ɔ | |||||||||||||||||||||
| Northern Berber | Ɛ | Ɣ | ||||||||||||||||||||||
| Southern Berber | Ǝ | Ɣ | Ŋ | |||||||||||||||||||||
| Crimean Tatar | I | |||||||||||||||||||||||
| Dalecarlian | Ð | |||||||||||||||||||||||
| Danish[9] Norwegian[9] Southern Sami (Norway) |
Æ | |||||||||||||||||||||||
| Dinka | Ɛ | Ɣ | Ŋ | Ɔ | ||||||||||||||||||||
| Faroese | Æ | Ð | ||||||||||||||||||||||
| Greenlandic | Æ | (ĸ) | ||||||||||||||||||||||
| German[13] | ß | |||||||||||||||||||||||
| Icelandic Norn |
Æ | Ð | Þ | |||||||||||||||||||||
| Celtic British English[36] French[12] Latin[2] |
Æ | Œ | ||||||||||||||||||||||
| Inari Sami Northern Sami Lule Sami[60] Fula[41] Alphabet of Mauritania Alphabet of Senegal |
Ŋ | |||||||||||||||||||||||
| Skolt Sami | Ŋ | Ʒ | ||||||||||||||||||||||
| Pan-Nigerian | Ǝ | |||||||||||||||||||||||
| Turkish Kazakh[38] |
I | |||||||||||||||||||||||
| Alphabet of Cameroon | Æ | Ɑ | Ə | Ɛ | Ŋ | Œ | Ɔ | |||||||||||||||||
| Alphabet of Benin | Ǝ | Ɛ | Ɣ | Ŋ | Ɔ | Ʊ | Ʋ | |||||||||||||||||
| Alphabet of Burkina Faso | Ǝ | Ɛ | Ɩ | Ŋ | Ɔ | Ʋ | ||||||||||||||||||
| Alphabet of Chad[68] | Ə | Ɛ | Ŋ | Ɔ | ||||||||||||||||||||
| Alphabet of Côte d'Ivoire | Ɛ | Ɩ | Ŋ | Ɔ | Ʊ | Ɂ | ||||||||||||||||||
| Scientific Alphabet of Gabon | Ꞵ | Ð | Ǝ | Ɛ | Ɣ | Ŋ | Ɔ | Ʃ | Ʒ | Ɂ | ||||||||||||||
| Alphabet of Mali | Ǝ | Ɛ | Ɣ | Ŋ | Ɔ | Ɂ | ||||||||||||||||||
| Alphabet of Niger | Ǝ | Ɣ | Ŋ | |||||||||||||||||||||
| Alphabet of Zaïre | Ɛ | Ɔ | ||||||||||||||||||||||
| African reference alphabet | Ɑ | Ǝ | Ɛ | Ɣ | Ɩ | Ŋ | Ɔ | Ꞷ | Ʃ | Ʋ | Ʒ | Ɂ | ||||||||||||
| Count | 7 | 2 | 1 | 5 | 8 | 3 | 12 | 8 | 3 | 3 | 14 | 2 | 11 | 1 | 2 | 1 | 1 | 2 | 2 | 3 | 1 | 1 | 3 | 4 |
Letter–diacritic combinations: connected or overlaid
[edit]Other letters in collation order
[edit]The tables below are a work in progress. Eventually, table cells with light blue shading will indicate letter forms that do not constitute distinct letters in their associated alphabets. Please help with this task if you have the required linguistic knowledge and technical editing skill.
For the order in which the characters are sorted in each alphabet, see collating sequence.
Letters derived from A–H
[edit]Letters derived from I–O
[edit]Letters derived from P–Z
[edit]Notes
[edit]- ↑↑↑↑ In classical Latin, the digraphs ⟨ch⟩, ⟨ph⟩, ⟨rh⟩, ⟨th⟩ were used in loanwords from Greek, but they were not included in the alphabet. The ligatures ⟨æ⟩, ⟨œ⟩ and ⟨w⟩, as well as lowercase letters, were added to the alphabet only in Middle Ages. The letters ⟨j⟩ and ⟨u⟩ were used as typographical variants of ⟨i⟩ and ⟨v⟩, respectively, roughly until the Enlightenment.
- ↑↑↑↑ In Afrikaans, ⟨c⟩ and ⟨q⟩ are only (and ⟨x⟩ and ⟨z⟩ almost only) used in loanwords.
- ↑↑↑↑ Albanian officially has the digraphs ⟨dh, gj, ll, nj, rr, sh, th, xh, zh⟩, which is sufficient to represent the Tosk dialect. The Gheg dialect supplements the official alphabet with 6 nasal vowels, namely ⟨â, ê, î, ô, û, ŷ⟩.
- ↑↑↑↑ Arbëresh officially has the digraphs ⟨dh, gj, hj, ll, nj, rr, sh, th, xh, zh⟩. Arbëresh has the distinctive ⟨hj⟩, which is considered as a letter in its own right.
- ↑↑ Achomi also has the digraph ⟨a'⟩.
- ↑↑↑↑ Azeri only uses the letter ⟨ä⟩ as a substitute for ⟨ə⟩ if the latter cannot be used (it was replaced by the schwa one year later because it is the most common letter). These cases should be avoided! The letters ⟨w⟩, ⟨đ⟩, ⟨ŋ⟩, ⟨q̇⟩, ⟨ć⟩ (or the digraph ⟨ts⟩), and the digraph ⟨dz⟩ are only used in certain dialects.
- ↑ Bambara also has the digraphs: ⟨kh⟩ (only present in loanwords), ⟨sh⟩ (also written as ⟨ʃ⟩; only present in some dialects). Historically, ⟨è⟩ was used instead of ⟨ɛ⟩, ⟨ny⟩ was used instead of ⟨ɲ⟩, and ⟨ò⟩ was used instead of ⟨ɔ⟩ in Mali.
- ↑↑↑↑ Basque has several digraphs: ⟨dd, ll, rr, ts, tt, tx, tz⟩. The ⟨ü⟩, which represents /ø/, is required for various words in its Zuberoan dialect. ⟨c, q, v, w, y⟩ are used in foreign words, but are officially considered part of the alphabet.
- ↑↑↑↑ Belarusian also has several digraphs: ⟨ch, dz, dź, dž⟩.
- ↑↑↑↑ Bislama also has the digraph ⟨ng⟩.
- ↑↑↑↑ Breton also has the digraphs ⟨ch, c'h, zh⟩. ⟨c, q, x⟩ are used in foreign words or digraphs only.
- ↑↑↑ Catalan also has a large number of digraphs: ⟨dj, gu, gü, ig, ix, ll, l·l, ny, qu, qü, rr, ss, tg, tj, ts, tx, tz⟩. The letters ⟨k, q, w, y⟩ are only used in loanwords or the digraphs mentioned.
- ↑↑ The Alphabet of Chad also uses the unique letters ⟨n̰⟩ and ⟨r̰⟩.
- ↑↑↑ Chamorro also has the digraphs ⟨ch, ng⟩. ⟨c⟩ used only in digraphs.
- ↑↑↑↑ Corsican has the trigraphs: ⟨chj, ghj⟩.
- ↑↑↑↑↑↑↑↑ Croatian Gaj's alphabet also has the digraphs: ⟨dž, lj, nj⟩. There are also four tone markers that are sometimes used on vowels to avoid ambiguity in homophones, but this is generally uncommon. Gaj's alphabet has been adopted by the Serbian and Bosnian standards and that it has complete one-to-one congruence with Serbian Cyrillic, where the three digraphs map to Cyrillic letters ⟨џ⟩, ⟨љ⟩ and ⟨њ⟩, respectively. Rarely and non-standardly, digraph ⟨dj⟩ is used instead of ⟨đ⟩ (like it was previously) (Cyrillic ⟨ђ⟩). Montenegrin variant additionally uses ⟨ś⟩ and ⟨ź⟩ to indicate dialectal pronunciation.
- ↑↑ Cypriot Arabic also has the letters ⟨Θ⟩ and ⟨Δ⟩.
- ↑↑↑↑ Czech also has the digraph ⟨ch⟩, which is considered a separate letter and is sorted between ⟨h⟩ and ⟨i⟩. While ⟨á, ď, é, ě, í, ň, ó, ť, ú, ů, ý⟩ are considered separate letters, in collation they are treated merely as letters with diacritics. However, ⟨č, ř, š, ž⟩ are sorted as separate letters. ⟨q, w, x⟩ occur only in loanwords.
- ↑ Dakelh also contains the letter ⟨'⟩, which represents the glottal stop. The letters ⟨f, p, r, v⟩ are only used in loanwords.
- ↑↑↑↑↑↑ The Norwegian alphabet is currently identical with the Danish alphabet. ⟨c⟩ is part of both alphabets and is not used in native Danish or Norwegian words (except some proper names), but occurs quite frequently in well-established loanwords in Danish. Norwegian and Danish use ⟨é⟩ in some words such as én, although ⟨é⟩ is considered a diacritic mark, while ⟨å, æ, ø⟩ are letters. ⟨q, w, x, z⟩ are not used except for names and some foreign words.
- ↑ Dinka also has the digraphs: ⟨dh, nh, ny, th⟩. ⟨h⟩ is only present in these digraphs. Dinka also used the letters ⟨ä, ë, ï, ö, ɛ̈, ɔ̈⟩ (the last two which do not exist as precomposed characters in Unicode)
- ↑↑↑ The status of ⟨ij⟩ as a letter, ligature or digraph in Dutch is disputed. ⟨c⟩ (outside the digraph ⟨ch⟩), ⟨q⟩, ⟨x⟩, and ⟨y⟩ occur mostly in foreign words. Letters with grave and letters with circumflex occur only in loanwords.
- ↑↑↑ English generally now uses extended Latin letters only in loan words, such as fiancé, fiancée, and résumé. Rare publication guides may still use the dieresis on words, such as "coöperate", rather than the now-more-common "co-operate" (UK) or "cooperate" (US). For a fuller discussion, see articles branching from Lists of English words of international origin, which was used to determine the diacritics needed for more unambiguous English. However, an ⟨é⟩ or ⟨è⟩ is sometimes used in poetry to show that a normally silent vowel is to be pronounced, as in "blessèd".
- ↑↑↑↑ Filipino [and also applicable in or to Tagalog, which is the topmost influencer and contributor language of Filipino, among the rest of the other influencer and contributor languages of the Philippines and foreign languages for Filipino's evolution, further development, and further enrichment; it (Tagalog) is also the de facto historical, traditional, and linguistic basis of Filipino and the de jure or official basis of Filipino's both predecessor Philippine national and official language/s or language phase/s or stage/s since 1937 (as a national language) and 1946 (as an official language), which is lastly institutionally, officially, and constitutionally named or renamed as or into Pilipino from 1959 to 1987, before being constitutionally and officially replaced by Filipino as the national and an official language since 1987] also uses the digraph ⟨ng⟩, even originally with a large tilde that spanned both ⟨n⟩ and ⟨g⟩ (as in ⟨n͠g⟩) when a vowel follows the digraph. (The use of the tilde over the two letters is now rare). Only ⟨ñ⟩ is required for everyday use (only in loanwords). The accented vowels are used in dictionaries to indicate pronunciation, and ⟨g⟩ with tilde is only present in older works. ⟨Ë⟩ and ⟨ë⟩ are new variants of ⟨E⟩ and ⟨e⟩, respectively, and we're introduced in 2013 by the Komisyon sa Wikang Filipino (Commission on the Filipino Language)'s "Ortograpiyang Pambansa" (National Orthography) and in 2014 by the Komisyon sa Wikang Filipino (Commission on the Filipino Language)'s KWF Manwal sa Masinop na Pagsulat (KWF Manual on Provident Writing) to represent and preserve the schwa vowel sound /ə/ in non-Tagalog Filipino words of Philippine origin or from the other languages of the Philippines that natively have this vowel sound in their languages.
- ↑↑↑ Uppercase diacritics in French are often (incorrectly) thought of as being optional, but the official rules of French orthography designate accents on uppercase letters as obligatory in most cases. Many pairs or triplets are read as digraphs or trigraphs depending on context, but are not treated as such lexicographically: consonants ⟨ph, (ng), th, gu/gü, qu, ce, ch/(sh/sch), rh⟩; vocal vowels ⟨(ee), ai/ay, ei/ey, eu, au/eau, ou⟩; nasal vowels ⟨ain/aim, in/im/ein, un/um/eun, an/am, en/em, om/on⟩; the half-consonant -⟨(i)ll⟩-; half-consonant and vowel pairs ⟨oi, oin/ouin, ien, ion⟩. When rules that govern the French orthography are not observed, they are read as separate letters, or using an approximating phonology of a foreign language for loan words, and there are many exceptions. In addition, most final consonants are mute (including those consonants that are part of feminine, plural, and conjugation endings). ⟨ÿ⟩ and ⟨ü⟩ are only used in certain geographical names and proper names plus their derivatives, or, in the case of ⟨ü⟩ with diaeresis, newly proposed reforms, e.g. capharnaüm 'shambles' is derived from the proper name Capharnaüm. ⟨æ⟩ occurs only in Latin or Greek loanwords.
- ↑ Fula has ⟨x⟩ as part of the alphabet in all countries except Guinea, Guinea-Bissau, Liberia, and Sierra Leone (used only in loanwords in these countries). ⟨ɠ⟩, which is used only in loanwords (but still part of the alphabet), is used in Guinea only. Fula also uses the digraphs ⟨mb⟩ (In Guinea spelled ⟨mb⟩), ⟨nd, ng, nj⟩. ⟨aa, ee, ii, oo, uu⟩ are part of the alphabet in all countries except Guinea, Guinea-Bissau, Liberia, and Sierra Leone. ⟨ƴ⟩ is used in all countries except for Nigeria, where it is written ⟨'y⟩. ⟨ ŋ⟩ is used in all countries except for Nigeria. ⟨ɲ⟩ is used in Guinea, Mali, and Burkina Faso, ⟨ñ⟩ is used in Senegal, Gambia, Mauritania, Guinea-Bissau, Liberia, and Sierra Leone, and the digraph ⟨ny⟩ is used in Niger, Cameroon, Chad, Central African Republic, and Nigeria. The apostrophe is a letter (representing the glottal stop) in Guinea-Bissau, Liberia, and Sierra Leone. ⟨q, v, z⟩ are only used in loanwords, and are not part of the alphabet.
- ↑↑↑↑ Galician. The standard of 1982 set also the digraphs gu, qu (both always before ⟨e⟩ and ⟨i⟩), ch, ll, nh and rr. In addition, the standard of 2003 added the grapheme ⟨ao⟩ as an alternative writing of ⟨ó⟩. Although not marked (or forgotten) in the list of digraphs, they are used to represent the same sound, so the sequence ⟨ao⟩ should be considered as a digraph. The sequence ⟨nh⟩ represents a velar nasal (not a palatal as in Portuguese) and is restricted only to three feminine words, being either demonstrative or pronoun: unha ('a' and 'one'), algunha ('some') and ningunha ('not one'). The Galician reintegracionismo movement uses it as in Portuguese. ⟨j⟩ (outside of the Limia Baixa region), ⟨k⟩, ⟨w⟩, and ⟨y⟩ are only used in loanwords, and are not part of the alphabet.
- ↑↑↑↑ German also retains most original letters in French loan words. Swiss German does not use ⟨ß⟩ any more. The long s ⟨ſ⟩ was in use until the mid-20th century. ⟨sch⟩ is usually not considered a separate letter, neither are the digraphs ⟨ch, ck, st, sp, th, (ph, rh), qu⟩. ⟨q⟩ only appears in the sequence ⟨qu⟩ and in loanwords, while ⟨x⟩ and ⟨y⟩ are found almost only in loan words. The capital ⟨ß⟩ (⟨ẞ⟩) is almost never used. The accented letters (other than the letters ⟨ä⟩, ⟨ö⟩, ⟨ü⟩, and ⟨ß⟩) are used only in loanwords.
- ↑↑↑↑ Guaraní also uses digraphs ⟨ch, mb, nd, ng, nt, rr⟩ and the glottal stop ⟨'⟩. ⟨b, c, d⟩ are only used in these digraphs.
- ↑ Gwich'in also contains the letter ⟨'⟩, which represents the glottal stop. Gwich'in also uses the letters ⟨ą̀, ę̀, į̀, ǫ̀, ų̀⟩, which are not available as precomposed characters in Unicode. Gwich'in also uses the digraphs and trigraphs: ⟨aa, ąą, àà, ą̀ą̀, ch, ch', ddh, dh, dl, dr, dz, ee, ęę, èè, ę̀ę̀, gh, ghw, gw, ii, įį, ìì, į̀į̀, kh, kw, k', nd, nh, nj, oo, ǫǫ, òò, ǫ̀ǫ̀, rh, sh, shr, th, tl, tl', tr, tr', ts, ts', tth, tth', t', uu, ųų, ùù, ų̀ų̀, zh, zhr⟩. The letter ⟨c⟩ is only used the digraphs above. ⟨b, f, m⟩ are only used in loanwords.
- ↑↑↑↑ Hausa has the digraphs: ⟨sh, ts⟩. Vowel length and tone are usually not marked. Textbooks usually use macron or doubled vowel to mark the length, grave to mark the low tone and circumflex to mark the falling tone. Therefore, in some systems, it is possible that macron is used in combination with grave or circumflex over a, e, i, o or u. The letter ⟨p⟩ is only used in loanwords.
- ↑↑↑↑ Hungarian also has the digraphs: ⟨cs, dz, gy, ly, ny, sz, ty, zs⟩; and the trigraph: ⟨dzs⟩. ⟨á, é, í, ó, ő, ú, ű⟩ are considered separate letters, but are collated as variants of ⟨a, e, i, o, ö, u, ü⟩.
- ↑↑↑↑ Irish traditionally used the dot diacritic (Irish: ponc séimhithe) to mark lenition, forming the dotted letters (litreacha buailte "struck letters") ⟨ḃ, ċ, ḋ, ḟ, ġ, ṁ, ṗ, ṡ, ṫ⟩. These have largely been replaced by the digraphs: ⟨bh, ch, dh, fh, gh, mh, ph, sh, th⟩ except for in decorative or self-consciously traditional contexts. ⟨v⟩ occurs in a small number of (mainly onomatopoeic) native words (e.g. vácarnach "to quack") and colloquialisms (vís for bís "screw"). ⟨j, k, q, w, x, y, z⟩ only occur in loanwords and scientific terminology.
- ↑ Igbo writes ⟨ṅ⟩ alternatively as ⟨n̄⟩. Igbo has the digraphs: ⟨ch, gb, gh, gw, kp, kw, nw, ny, sh⟩. ⟨c⟩ is only used in the digraph before. Also, vowels take a grave accent, an acute accent, or no accent, depending on tone.
- ↑↑↑↑ Italian also has the digraphs: ⟨ch, gh, gn, gl, sc⟩. ⟨j, k, w, x, y⟩ are used in foreign words, and are not part of the alphabet. ⟨x⟩ is also used for native words derived from Latin and Greek; ⟨j⟩ is also used for just a few native words, mainly names of persons (as in Jacopo) or of places (as in Jesolo and Jesi), in which represents /i/. While it does not occur in ordinary running texts, geographical names on maps are often written only with acute accents. The circumflex is used on an -i ending that was anciently written -ii (or -ji, -ij, -j, etc.) to distinguish homograph plurals and verb forms: e.g. principî form principi, genî from geni.
- ↑ Karakalpak also has the digraphs: ⟨ch, sh⟩. ⟨c, f, v⟩ are used in foreign words.
- ↑ Kazakh also has the digraphs: ⟨ia, io, iu⟩. ⟨f, h, v⟩ and the digraph ⟨io⟩ are used in foreign words.
- ↑↑↑↑ Latvian also has the digraphs: ⟨dz, dž, ie⟩. Dz and dž are occasionally considered separate letters of the alphabet in more archaic examples, which have been published as recently as the 1950s; however, modern alphabets and teachings discourage this due to an ongoing effort to set decisive rules for Latvian and eliminate barbaric words accumulated during the Soviet occupation. The digraph "ie" is never considered a separate letter. Ō, Ŗ, and the digraphs CH (only used in loanwords) and UO are no longer part of the alphabet, but are still used in certain dialects and newspapers that use the old orthography. Y is used only in certain dialects and not in the standard language. F and H are only used in loanwords.
- ↑↑↑↑ A nearby language, Pite Sami, uses Lule Sami orthography but also includes the letters ⟨đ⟩ and ⟨ŧ⟩, which are not in Lule Sami.
- ↑↑↑↑ Lithuanian also has the digraphs: ⟨ch, dz, dž, ie, uo⟩. However, these are not considered separate letters of the alphabet. F, H, and the digraph CH are only used in loanwords. Demanding publications such as dictionaries, maps, schoolbooks etc. need additional diacritical marks to differentiate homographs. Using grave accent on A, E, I, O, U, acute accent on all vowels, and tilde accent on all vowels and on L, M, N and R. Small E and I (also with ogonek) must retain the dot when additional accent mark is added to the character; the use of ì and í (with missing dot) is considered unacceptable.
- ↑↑↑↑ In Livonian, the letters Ö, Ȫ, Y, Ȳ were used by the older generation, but the younger generation merged these sounds; Around the late 1990s, these letters were removed from the alphabet.
- ↑↑↑↑ Maltese also has the digraphs: ⟨ie, għ⟩.
- ↑ Māori only uses ⟨g⟩ in ⟨ng⟩ digraph. ⟨wh⟩ is also a digraph.
- ↑↑↑↑ Marshallese often uses the old orthography (because people did not approve of the new orthography), which writes ļ as l, m̧ as m, ņ as n, p as b, o̧ as o at the ends of words or in the word yokwe (also spelled iakwe under the old orthography; under the new orthography, spelled io̧kwe), but a[clarification needed] at other places, and d as dr before vowels, or r after vowels. The old orthography writes ā as e in some words, but ā in others; it also writes ū as i between consonants. The old orthography writes geminates and long vowels as two letters instead. Allophones of /ɘ/, written as only e o ō in the new orthography, are also written as i u and very rarely, ū. The letter Y only occurs in the words yokwe or the phrase yokwe yuk (also spelled iakwe iuk in the old orthography or io̧kwe eok in the new orthography).
- ↑↑↑ Massachusett also uses the digraphs ⟨ch, ee, sh, ty⟩ and the letter ⟨8⟩ (which was previously written ⟨oo⟩). ⟨c⟩ is only used in the digraph ⟨ch⟩.
- ↑ Some Mohawk speakers use orthographic ⟨i⟩ in place of the consonant ⟨y⟩. The glottal stop is indicated with an apostrophe ⟨'⟩ and long vowels are written with a colon ⟨:⟩.
- ↑ Na'vi uses the letter ʼ and the digraphs aw, ay, ew, ey, kx, ll, ng (sometimes written ⟨g⟩), px, rr, ts (sometimes written ⟨c⟩), tx. ⟨g⟩ (in standard orthography) and ⟨x⟩ are used only in digraphs.
- ↑ Oromo uses the following digraphs: ⟨ch, dh, ny, ph, sh⟩. ⟨p⟩ is only used in the digraph ⟨ph⟩ and loanwords. ⟨v⟩ and ⟨z⟩ are only used in loanwords.
- ↑↑↑↑ Papiamento also has the digraphs: ⟨ch, dj, sh, zj⟩. ⟨q, x⟩ are only used in loanwords and proper names. ⟨j⟩ is only used in digraphs, loanwords, and proper names. Papiamentu in Bonaire and Curaçao is different from Papiamento in Aruba in the following ways: Papiamento in Aruba uses a more etymological spelling, so Papiamento uses ⟨c⟩ in native words outside of the digraph ⟨ch⟩, but Papiamentu in Bonaire and Curaçao does not. Papiamentu in Bonaire and Curaçao uses ⟨è⟩, ⟨ò⟩, ⟨ù⟩, and ⟨ü⟩ for various sounds and ⟨á, é, í, ó, ú⟩ for stress, but Papiamento in Aruba does not use these letters.
- ↑ Piedmontese also uses the letter ⟨n-⟩, which usually precedes a vowel, as in lun-a "moon".
- ↑↑↑↑ Pinyin has four tone markers that can go on top of any of the six vowels (⟨a, e, i, o, u, ü⟩); e.g.: macron (⟨ā, ē, ī, ō, ū, ǖ⟩), acute accent (⟨á, é, í, ó, ú, ǘ⟩), caron (⟨ǎ, ě, ǐ, ǒ, ǔ, ǚ⟩), grave accent (⟨à, è, ì, ò, ù, ǜ⟩). It also uses the digraphs: ⟨ch, sh, zh⟩.
- ↑↑↑↑ Polish also has the digraphs: ⟨ch, cz, dz, dż, dź, sz, rz⟩. ⟨q, v, x⟩ occur only in loanwords, and are sometimes not considered as part of the alphabet.
- ↑↑↑↑ Portuguese uses the digraphs ⟨ch, lh, nh, rr, ss⟩. The trema on ⟨ü⟩ was used in Brazilian Portuguese from 1943 to 2009. European Portuguese in that case used the grave accent (⟨ù⟩) from 1911 to 1920, then abolished. The grave accent was used on ⟨e, i, o, u⟩, until 1973. ⟨è, ò⟩ are used in geographical names outside Europe and not part of the language proper. The now abandoned practice was to indicate underlying stress in words with suffixes that begin with -z or in words ending in -mente, e.g. cafèzeiro, açaìzal, sòmente, ùltimamente etc. The trema on ⟨ï⟩ could be used to mark not stressed hiatuses, e.g. constituïção, although this use was only optional and applied to ⟨ü⟩ too. Neither the digraphs nor accented letters are considered part of the alphabet. ⟨k, w, y⟩ occur only in loanwords, and were not letters of the alphabet from 1911 (Portugal) or 1943 (Brazil) until 2009, but these letters were in fact used before 1911 in Portugal and before 1943 in Brazil when the word's etymology allowed, e.g. kilometro, sandwiche, typo etc. (although ⟨w⟩ was formally not included in the alphabet).
- ↑↑↑↑ Romani has the digraphs: ⟨čh, dž, kh, ph, th⟩.
- ↑ Romanian normally uses the letters ⟨ș, ț⟩ (⟨s, t⟩ with a comma diacritic below) but they are frequently replaced by ⟨ş, ţ⟩ (⟨s, t⟩ with a cedilla) due to past lack of standardization. ⟨k, q, w, x, y⟩ occur only in loanwords.
- ↑↑↑ Slovak also has the digraphs ⟨dz, dž, ch⟩ which are considered separate letters. While ⟨á, ä, ď, é, í, ĺ, ň, ó, ô, ŕ, ť, ú', ý⟩ are considered separate letters, in collation they are treated merely as letters with diacritics. However, ⟨č, ľ, š, ž⟩, as well as the digraphs, are actually sorted as separate letters. ⟨q, w, x, ö, ü⟩ occur only in loanwords.
- ↑↑↑↑↑ Sorbian also uses the digraphs: ⟨ch⟩, ⟨dź⟩. ⟨ř⟩ is only used in Upper Sorbian, and ⟨ŕ⟩, ⟨ś⟩, and ⟨ź⟩ (outside the digraph ⟨dź⟩) are only used in Lower Sorbian.
- ↑↑↑ Spanish uses several digraphs to represent single sounds: ⟨ch⟩, ⟨gu⟩ (preceding ⟨e⟩ or ⟨i⟩), ⟨ll⟩, ⟨qu⟩, ⟨rr⟩; of these, the digraphs ⟨ch⟩ and ⟨ll⟩ were traditionally considered individual letters with their own name (che, elle) and place in the alphabet (after ⟨c⟩ and ⟨l⟩, respectively), but in order to facilitate international compatibility the Royal Spanish Academy decided to cease this practice in 1994 and all digraphs are now collated as combinations of two separate characters. While cedilla is etymologically Spanish diminutive of ceda (⟨z⟩) and Sancho Pança is the original form in Cervantes books, C with cedilla ⟨ç⟩ is now completely displaced by ⟨z⟩ in contemporary language. In poetry, the diaeresis may be used to break a diphthong into separate vowels. Regarding that usage, Ortografía de la lengua española states that "diaeresis is usually placed over the closed vowel [i.e. ⟨i⟩ or ⟨u⟩] and, when both are closed, generally over the first"[citation needed]. In this context, the use of ⟨ï⟩ is rare, but part of the normative orthography.
- ↑ Swedish uses ⟨é⟩ in well integrated loan words like idé and armé, although ⟨é⟩ is considered a modified ⟨e⟩, while ⟨å⟩, ⟨ä⟩, ⟨ö⟩ are letters. ⟨á⟩ and ⟨à⟩ are rarely used words. ⟨w⟩ and ⟨z⟩ are used in some integrated words like webb and zon. ⟨q⟩, ⟨ü⟩, ⟨è⟩, and ⟨ë⟩ are used for names only, but exist in Swedish names. For foreign names ⟨ó⟩, ⟨ç⟩, ⟨ñ⟩ and more are sometimes used, but usually not. Swedish has many digraphs and some trigraphs. ⟨ch, dj, lj, rl, rn, rs, sj, sk, si, ti, sch, skj, stj⟩ and others are usually pronounced as one sound.
- ↑↑↑↑ Tswana also has the digraphs: ⟨kg, kh, ng, ph, th, tl, tlh, ts, tsh, tš, tšh⟩. The letters ⟨c⟩, ⟨q⟩, and ⟨x⟩ only appear in onomatopoeic and loanwords. The letters ⟨v⟩ and ⟨z⟩ only appear in loanwords.
- ↑↑↑↑ Turkmen had a slightly different alphabet in 1993–1995 (which used some rare letters) ⟨ý⟩ was written as ⟨ÿ⟩ (capital ⟨¥⟩), ⟨ň⟩ as ⟨ñ⟩, ⟨ş⟩ as ⟨¢⟩ (capital ⟨$⟩), and ⟨ž⟩ as ⟨⌠⟩ (capital ⟨£⟩) (so that all characters were available in Code page 437). In the new alphabet, all characters are available in ISO/IEC 8859-2.
- ↑↑↑ Ulithian also has the digraphs: ⟨ch, l', mw, ng⟩. ⟨c⟩ is used only in digraphs.
- ↑ Uzbek also has the digraphs: ⟨ch, ng, sh⟩ considered as letters. ⟨c⟩ is used only in digraphs. ⟨g'⟩, ⟨o'⟩ and apostrophe ⟨'⟩ are considered as letters. These letters have preferred typographical variants: ⟨gʻ⟩, ⟨oʻ⟩, and ⟨ʼ⟩ respectively.
- ↑↑↑↑ Venda also has the digraphs and trigraphs: ⟨bv, bw, dz, dzh, dzw, fh, hw, kh, khw, ng, ny, nz, ṅw, ph, pf, pfh, sh, sw, th, ts, tsh, tsw, ty, ṱh, vh, zh, zw⟩. ⟨c, j, q⟩ are used in foreign words.
- ↑↑↑↑ Vietnamese has seven additional base letters: ⟨ă â đ ê ô ơ ư⟩. It uses five tone markers that can go on top (or below) any of the 12 vowels (⟨a, ă, â, e, ê, i, o, ô, ơ, u, ư, y⟩); e.g.: grave accent (⟨à, ằ, ầ, è, ề, ì, ò, ồ, ờ, ù, ừ, ỳ⟩), hook above (⟨ả, ẳ, ẩ, ẻ, ể, ỉ, ỏ, ổ, ở, ủ, ử, ỷ⟩), tilde (⟨ã, ẵ, ẫ, ẽ, ễ, ĩ, õ, ỗ, ỡ, ũ, ữ, ỹ⟩), acute accent (⟨á, ắ, ấ, é, ế, í, ó, ố, ớ, ú, ứ, ý⟩), and dot below (⟨ạ, ặ, ậ, ẹ, ệ, ị, ọ, ộ, ợ, ụ, ự, ỵ⟩). It also uses several digraphs and trigraphs ⟨ch, gh, gi, kh, ng, ngh, nh, ph, th, tr⟩ but they are no longer considered letters.
- ↑↑↑↑ Walloon has the digraphs and trigraphs: ⟨ae, ch, dj, ea, jh, oe, oen, oi, sch, sh, tch, xh⟩. The letter ⟨x⟩ outside the digraph ⟨xh⟩ is in some orthographies, but not the default two. The letter ⟨q⟩ is in some orthographies, but not in the default two. Also in some orthographies are ⟨à⟩, ⟨ì⟩, ⟨ù⟩, and even ⟨e̊⟩ and ⟨o̊⟩ (which are not available as a precomposed character in Unicode, so ⟨ë⟩ and ⟨ö⟩ are used as substitutes)
- ↑↑↑↑ Welsh has the digraphs ⟨ch⟩, ⟨dd⟩, ⟨ff⟩, ⟨ng⟩, ⟨ll⟩, ⟨ph⟩, ⟨rh⟩, ⟨th⟩. Each of these digraphs is collated as a separate letter, and ⟨ng⟩ comes immediately after ⟨g⟩ in the alphabet. It also frequently uses circumflexes, and occasionally uses diaereses, acute accents and grave accents, on its seven vowels (⟨a, e, i, o, u, w, y⟩), but accented characters are not regarded as separate letters of the alphabet.
- ↑↑↑↑ Xhosa has a large number of digraphs, trigraphs, and even one tetragraph are used to represent various phonemes: ⟨bh, ch, dl, dy, dz, gc, gq, gr, gx, hh, hl, kh, kr, lh, mb, mf, mh, nc, ndl, ndz, ng, ng', ngc, ngh, ngq, ngx, nh, nkc, nkq, nkx, nq, nx, ntl, ny, nyh, ph, qh, rh, sh, th, ths, thsh, ts, tsh, ty, tyh, wh, xh, yh, zh⟩. It also occasionally uses acute accents, grave accents, circumflexes, and diaereses on its five vowels (⟨a, e, i, o, u⟩), but accented characters are not regarded as separate letters of the alphabet.
- ↑↑↑ Yapese has the digraphs and trigraphs: ⟨aa, ae, ch, ea, ee, ii, k', l', m', n', ng, ng', oe, oo, p', t', th, th', uu, w', y'⟩. ⟨q⟩, representing the glottal stop, is not always used. Often an apostrophe is used to represent the glottal stop instead. ⟨c⟩ is used only in digraphs. ⟨h⟩ is used only in digraphs and loanwords. ⟨q⟩ is used only in loanwords.
- ↑↑↑↑ Yoruba uses the digraph ⟨gb⟩. Also, vowels take a grave accent, an acute accent, or no accent, depending on tone. Although the "dot below" diacritic is widely used, purists prefer a short vertical underbar (Unicode COMBINING VERTICAL LINE BELOW U+0329) - this resembles the IPA notation for a syllabic consonant, attached to the base of the letter (⟨e⟩, ⟨o⟩ or ⟨s⟩). The seven Yoruba vowels (⟨a⟩, ⟨e⟩, ⟨ẹ⟩, ⟨i⟩, ⟨o⟩, ⟨ọ⟩, ⟨u⟩) can be uttered in three different tones: high (acute accent); middle (no accent) and low (grave accent). The letters ⟨m⟩ and ⟨n⟩, when written without diacritics, indicate nasalisation of the preceding vowel. ⟨m⟩ and ⟨n⟩ also occur as syllabics - in these circumstances, they take acute or grave tonal diacritics, like the vowels. Middle tone is marked with a macron to differentiate it from the unmarked nasalising consonants. A tilde was used in older orthography (still occasionally used) to indicate a double vowel. This is tonally ambiguous, and has now been replaced by showing the paired vowels, each marked with the appropriate tones. However, where a double vowel has the tonal sequence high-low or low-high, it may optionally be replaced by a single vowel with a circumflex (high-low) or caron (low-high), e.g. á + à = ⟨â⟩; à + á = ⟨ǎ⟩.
- ↑↑ Zuni contains the glottal stop ⟨'⟩ and the digraph: ⟨ch⟩; ⟨c⟩ is only used in that digraph. The other digraphs ⟨kw⟩, ⟨sh⟩, and ⟨ts⟩ are not part of the alphabet.
Miscellaneous
[edit]- Africa Alphabet
- African reference alphabet
- Beghilos
- Gaj's Latin alphabet, is the only script of both the Croatian and Bosniak standard languages in current use, and one of the two scripts of both the Serbian and Montenegrin standard languages alongside the Cyrillic alphabet.
- Initial Teaching Alphabet
- International Phonetic Alphabet
- Łatynka for Ukrainian
- Leet (1337 alphabet)
- Romani alphabet for most Romani languages
- Sámi Latin alphabet
- Standard Alphabet by Lepsius
- Tatar alphabet, similar to Turkish alphabet and Jaꞑalif as a part of Uniform Turkic alphabet
- Uralic Phonetic Alphabet
See also
[edit]- Diacritic
- Latin-script alphabet
- Latin-script multigraph
- Latin script in Unicode
- Ligature
- List of Latin-script letters
- List of precomposed Latin characters in Unicode
- Romanization
- Writing systems of Africa
- Categories
Footnotes
[edit]- ^ As defined in ISO/IEC 646 based on ASCII, which was based on the 26 letters of the English alphabet and previous telecommunications standards, and used in later ISO standards, see Latin characters in Unicode.
- ^ "Nav script".
External links
[edit]List of Latin-script alphabets
View on GrokipediaCore Components of Latin-script Alphabets
ISO Basic Latin Alphabet Overview
The ISO Basic Latin Alphabet comprises 26 uppercase letters (A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z) and their corresponding 26 lowercase letters (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z), excluding all diacritics, ligatures, or supplementary characters. This set forms the unaltered core of Latin-script writing systems, ensuring uniformity in representation across languages that adopt the Latin script.[8][9] The standardization of this alphabet emerged from efforts to create compatible character encodings for early computing and data interchange. It was established in the first edition of ISO/IEC 646, published in July 1973, as a 7-bit coded character set designed specifically for Latin-script alphabets to support international information processing. Building on the equivalent ECMA-6 standard from 1965, ISO/IEC 646 provided a neutral international reference version (IRV) that aligned closely with ASCII while accommodating global needs, thereby enabling widespread adoption in computers, teleprinters, and digital communication protocols.[10][11] In English, these letters carry specific phonetic values, though individual letters often represent multiple sounds depending on context; the following provides representative pronunciations using the International Phonetic Alphabet (IPA) for General American English, with example words illustrating common usages:| Letter (Upper/Lower) | Representative IPA Sound(s) | Example Word and Pronunciation |
|---|---|---|
| A/a | /æ/, /eɪ/ | cat /kæt/, cake /keɪk/ |
| B/b | /b/ | bat /bæt/ |
| C/c | /k/, /s/ | cat /kæt/, city /ˈsɪti/ |
| D/d | /d/ | dog /dɔɡ/ |
| E/e | /ɛ/, /i/ | bed /bɛd/, be /bi/ |
| F/f | /f/ | fish /fɪʃ/ |
| G/g | /ɡ/, /dʒ/ | go /ɡoʊ/, gem /dʒɛm/ |
| H/h | /h/ | hat /hæt/ |
| I/i | /ɪ/, /aɪ/ | sit /sɪt/, site /saɪt/ |
| J/j | /dʒ/ | jam /dʒæm/ |
| K/k | /k/ | kite /kaɪt/ |
| L/l | /l/ | lamp /læmp/ |
| M/m | /m/ | man /mæn/ |
| N/n | /n/ | net /nɛt/ |
| O/o | /ɑ/, /oʊ/ | hot /hɑt/, boat /boʊt/ |
| P/p | /p/ | pen /pɛn/ |
| Q/q | /kw/ | quick /kwɪk/ |
| R/r | /r/ | red /rɛd/ |
| S/s | /s/, /z/ | sun /sʌn/, rose /roʊz/ |
| T/t | /t/ | top /tɑp/ |
| U/u | /ʌ/, /ju/ | sun /sʌn/, use /jus/ |
| V/v | /v/ | van /væn/ |
| W/w | /w/ | wet /wɛt/ |
| X/x | /ks/ | box /bɑks/ |
| Y/y | /j/, /aɪ/ | yes /jɛs/, my /maɪ/ |
| Z/z | /z/ | zoo /zu/ |
Role of Extensions in Latin Scripts
Extensions in Latin scripts refer to mechanisms that modify or supplement the core ISO Basic Latin alphabet to accommodate phonetic, orthographic, or historical needs beyond English. These extensions primarily include ligatures, diacritics, and multigraphs, each serving to expand the script's representational capacity without fundamentally altering its letter inventory. The term "ligature" derives from the Latin ligatura, meaning "a band" or "something used for tying," originating around 1400 CE to describe joined characters that bind multiple graphemes into a single glyph for efficiency in writing and printing. Similarly, "diacritic" comes from the Ancient Greek diakritikós, meaning "distinguishing," highlighting its role in marking distinctions in pronunciation or meaning. "Multigraph," a more modern linguistic term formed from English "multi-" and "-graph" around the early 20th century, denotes a sequence of multiple letters functioning as a single phonemic unit, as documented in the Oxford English Dictionary.[14][15] The primary purpose of these extensions is to represent phonemes and orthographic features in non-English languages, particularly those in Romance and Germanic families, as well as historical scripts derived from Latin. For instance, ligatures like æ (from ae) emerged in medieval Latin to denote diphthongs in Germanic and Romance languages, streamlining the representation of combined vowel sounds that were common in Old English and Old Norse. Diacritics, such as the acute accent in á, are employed in Romance languages like Spanish and Portuguese to indicate stress or vowel quality, distinguishing words like Portuguese sá ("healthy") from sa (a conjunction). Multigraphs, exemplified by ng in languages like Filipino or historical Germanic orthographies, treat consonant clusters as unified sounds, avoiding the need for new letterforms while preserving the basic alphabet's structure. These adaptations arose as Latin script spread across Europe from the Roman era onward, necessitating modifications to capture diverse linguistic inventories without inventing entirely new symbols.[16][17][18] In typography, extensions influence design and legibility by addressing visual collisions and enhancing aesthetic harmony; ligatures prevent overlapping strokes in pairs like fi or ff, a practice rooted in early movable type printing to improve flow, while diacritics require precise positioning to maintain readability in multilingual typesetting. This has led to specialized font features, such as OpenType support for discretionary ligatures, ensuring compatibility across scripts. In Unicode encoding, extensions are handled through dedicated blocks like Latin Extended-A and Latin-1 Supplement, which include over 100 precomposed characters with diacritics and ligatures, alongside combining marks for dynamic composition; this approach, formalized since Unicode 1.0 in 1991, facilitates global text processing but poses challenges for legacy systems limited to 8-bit encodings, impacting digital rendering of accented Latin text in non-Western contexts. The Unicode Consortium's guidelines emphasize normalization to balance precomposed forms with combining sequences, promoting interoperability in software and databases.[19][16]Alphabets Limited to ISO Basic Latin Letters
Pure Basic Letter Alphabets
Pure basic letter alphabets are writing systems that utilize precisely the 26 uppercase and lowercase letters of the ISO basic Latin alphabet—A through Z—without incorporating diacritics, ligatures, or supplementary characters. This set includes A a, B b, C c, D d, E e, F f, G g, H h, I i, J j, K k, L l, M m, N n, O o, P p, Q q, R r, S s, T t, U u, V v, W w, X x, Y y, Z z—providing a streamlined orthography ideal for mechanical reproduction and digital input. Such alphabets emphasize phonetic representation through combinations of these letters rather than modifications, promoting uniformity across diverse linguistic applications.[20] A key characteristic of these alphabets is their reliance on digraphs and multigraphs to denote sounds absent from the basic inventory, avoiding the need for extended glyphs. For instance, in English, the digraph "ch" represents the affricate /tʃ/, while "th" conveys /θ/ or /ð/, and "ng" indicates the velar nasal /ŋ/. This approach maintains orthographic purity while accommodating phonological complexity, though it can lead to ambiguities in spelling and pronunciation for learners. Similar strategies appear in other languages, where positional rules or contextual cues further distinguish meanings without altering the letter set.[20] Prominent examples include the English alphabet, which forms the foundation for writing the English language and has been standardized since the late Middle Ages. The Indonesian alphabet mirrors this exact 26-letter structure for Bahasa Indonesia, a lingua franca spoken by over 200 million people. Likewise, the Malay alphabet employs the same basic letters for standard Malay (Bahasa Malaysia) in Malaysia, Brunei, and Singapore, ensuring phonetic transparency with consistent letter-sound correspondences. These systems highlight the alphabet's adaptability to non-European phonologies while preserving its core form.[21][22] Historically, pure basic letter alphabets proliferated with the invention of movable-type printing in the mid-15th century, as printers like Johannes Gutenberg favored the unmodified Latin set for its efficiency in casting type and compatibility with classical texts. This facilitated the rapid dissemination of knowledge across Europe, where the 26-letter inventory became the norm for vernacular printing by the 16th century. In colonial expansions, European powers exported this simplified script to overseas territories; for example, the Dutch introduced the basic Latin alphabet to the Indonesian archipelago in the 19th century, replacing indigenous scripts like Javanese and Pegon to standardize administration and education. Such adoptions underscored the alphabet's role in linguistic unification during imperial eras.[23][24] In contemporary usage, these alphabets dominate in English-speaking regions, including the United States, United Kingdom, and former colonies, supporting global media, literature, and commerce. In Southeast Asia, Indonesian and Malay variants enable seamless cross-border communication and digital accessibility, with over 270 million speakers collectively relying on this unadorned script for everyday and official purposes. Their persistence reflects a preference for interoperability in an increasingly connected world, though occasional reforms address evolving phonetic needs without straying from the basic framework.[25][26]Extensions via Multigraphs
Multigraphs in Latin-script alphabets consist of two or more basic Latin letters combined to function as a single grapheme, representing phonemes not covered by individual letters and thereby extending the script's expressive range without requiring additional symbols. Common examples include the trigraph "sch" in German, pronounced as /ʃ/ (as in Schule, "school"); the digraph "ll" in traditional Spanish orthography, representing /ʎ/ or /j/ (as in llama, "flame"); and the digraph "ng" in Filipino, denoting /ŋ/ (as in ngayon, "now").[27][28][29] These combinations, often called digraphs for two letters or trigraphs for three, allow languages to adapt the core ISO Basic Latin set to their phonological needs. Several languages employ multigraphs prominently in their orthographies. In Welsh, eight digraphs—"ch" (/χ/, as in chware, "play"), "dd" (/ð/, as in ddŵr, "water"), "ff" (/f/, as in ffrânc, "French"), "ng" (/ŋ/, as in sing, "sing"), "ll" (/ɬ/, as in llyn, "lake"), "ph" (/f/, as in ffôn, "phone"), "rh" (/r̥/, as in rhew, "frost"), and "th" (/θ/, as in ty, "house")"—are treated as distinct letters, expanding the 26-letter ISO basic set to 28 letters (or 29 including the rare 'j').[30] Hungarian incorporates digraphs such as "cs" (/t͡ʃ/, as in csokoládé, "chocolate"), "gy" (/ɟ/, as in gyümölcs, "fruit"), "sz" (/s/, as in szép, "beautiful"), and the trigraph "dzs" (/d͡ʒ/, as in Dzsudzsák, a surname), though their use is limited compared to accented letters.[31] Afrikaans relies on vowel digraphs like "ou" (/ʊə/, as in ou, "old"), "ei" (/ɛɪ/, as in ei, "egg"), "ooi" (/ɔɪ/, as in ooi, "ewe"), and "ui" (/œɪ/ or /ʏə/, as in ui, "onion") to capture diphthongs, alongside consonant combinations such as "tj" (/tʃ/ or /kj/, as in tjank, "whine") and "gh" (/x/ or /ɡ/, context-dependent).[32] In collation and sorting, multigraphs are often handled as unified units in dictionaries and indexes, influencing alphabetical order. For instance, in Welsh, "ll" follows "lk" and precedes "lm", reflecting its status as a single letter.[30] Hungarian sorts digraphs like "cs" after "c" but before "d", with geminated forms (e.g., "ccs") treated similarly.[31] In Spanish, "ll" was historically sorted after "lj" (e.g., llama after liana but before lobo), but since the 2010 orthographic reform by the Real Academia Española, "ch" and "ll" are no longer independent letters and are collated as sequences of individual letters.[33] The primary advantage of multigraphs lies in their ability to encode unique sounds using only the existing 26 basic Latin letters, promoting compatibility with standard keyboards, digital encoding, and international typography while avoiding the visual complexity or input challenges of diacritics or new letterforms.[34] This approach facilitates orthographic adaptation for languages with limited resources for script reform, as seen in the historical evolution of these systems.[35]Alphabets Encompassing All ISO Basic Latin Letters
Complete Basic Letter National Alphabets
National alphabets that encompass all 26 letters of the ISO basic Latin alphabet (A–Z) without omissions form a key subset of Latin-script writing systems used in official national contexts. These alphabets ensure complete utilization of the basic letter set as defined by ISO/IEC 646, providing a standardized foundation for orthographic representation in their respective languages. Prominent examples include modern English, standard German (treating the eszett ß as an optional variant rather than a core addition), and Indonesian, where the full set supports primary written communication without relying on supplementary basic letters beyond the ISO standard.[36] Post-World War II orthographic reforms played a significant role in standardizing these alphabets, often driven by efforts to promote linguistic unity, accessibility, and international compatibility in decolonizing or rebuilding nations. In Indonesia, the 1972 Ejaan Yang Disempurnakan (Perfected Spelling) reform aligned the script more closely with phonetic principles and harmonized it with Malaysian orthography, adopting the full 26-letter set to facilitate education and administration after independence.[37] Germany's 1996 orthographic reform, building on post-war standardization efforts, reaffirmed the use of all basic letters while simplifying rules for consistency, reflecting a broader push for inclusivity in European education systems. English, already established with its 26-letter alphabet, saw indirect reinforcement through global standardization initiatives like ISO 646 in the 1960s, enhancing its role in international diplomacy and trade without major domestic changes.[38] These alphabets achieve phonetic coverage by mapping the 26 letters to the core phonemes of their languages, often through digraphs or contextual variations rather than additional letters. In English, the letters represent approximately 44 phonemes, with vowels like covering /æ/, /eɪ/, and /ɑː/ depending on position, enabling representation of diphthongs and consonants like /θ/ via . German utilizes all basic letters to denote its 20 consonants and 8–12 vowels (including umlaut-modified forms treated as extensions), where,, , or in informal Indonesian speech (though retained in writing), or Swiss German preferences against ß in favor of . These deviations do not alter national orthographic norms, which prioritize the complete ISO basic set for uniformity in education, media, and governance.[42]
Complete Basic Letter Auxiliary Alphabets
Complete Basic Letter Auxiliary Alphabets refer to constructed systems within the Latin script that incorporate the full set of 26 ISO basic Latin letters (A–Z and a–z) to facilitate international or specialized communication, distinguishing them from national alphabets tied to specific sovereign languages. These auxiliary alphabets emerged primarily in the late 19th and early 20th centuries as part of efforts to create international auxiliary languages (IALs) or phonetic notations, aiming for simplicity, universality, and ease of adoption across linguistic boundaries. Unlike more widespread national variants, such as those for English or French, these invented systems prioritize global interoperability, often drawing vocabulary and grammar from multiple Romance or Indo-European sources while adhering strictly to basic Latin letter forms, sometimes with minimal extensions for phonetic precision. The International Phonetic Alphabet (IPA), developed by the International Phonetic Association in 1886, exemplifies a phonetic auxiliary system that encompasses all ISO basic Latin letters as core components of its 107-symbol inventory, which also includes modified Latin forms, Greek letters, and diacritics for transcribing sounds from any language. Its design principle centers on universality, enabling precise representation of global phonetic diversity without bias toward any single tongue, making it indispensable in linguistics for education, research, and documentation. Evolving through revisions—such as the 1947 and 2020 updates—the IPA has become a standard tool in academic fields, with ongoing use in phonetic analysis, language teaching, and speech therapy communities worldwide.[43] Among IALs, Occidental (later renamed Interlingue), created by Edgar de Wahl in 1922, employs exactly the 26 basic letters without diacritics, emphasizing naturalistic grammar inspired by Western European languages to enhance readability and learnability for global users. Interglossa, devised by Lancelot Hogben in 1943, and its successor Glosa (developed in the 1970s), also rely on the basic Latin set to construct a semantically principled auxiliary language from Greco-Latin roots, focusing on isolating morphology for straightforward cross-cultural exchange. These 19th- and 20th-century innovations reflect a broader movement toward esperanto-like constructed tongues, though adoption remains niche.[44][45] In contemporary contexts, these auxiliary alphabets sustain small but dedicated communities: the IPA thrives in linguistic scholarship and conlang design, while Interlingue persists through societies like the Interlingue Union, hosting congresses and online resources for enthusiasts. Interlingua, formalized by the International Auxiliary Language Association in 1951, boasts the largest following among IALs, with publications and media reaching thousands globally, underscoring their role in fostering constructed language experimentation despite limited mainstream penetration compared to national scripts.Alphabets with Incomplete ISO Basic Latin Letters
Partial Basic Letter Historical Alphabets
The classical Latin alphabet, used from the Roman Republic through the early Empire (circa 7th century BCE to 5th century CE), comprised 23 letters derived primarily from Etruscan and Greek influences: A, B, C, D, E, F, G, H, I, K, L, M, N, O, P, Q, R, S, T, V, X, Y, Z. This set omitted J, a distinct U, and W, as the sounds they represent were not phonetically necessary in classical Latin. The letter I served dual roles for the vowel /i/ and the semivowel /j/ (as in "Iulius"), while no affricate /dʒ/ sound existed to warrant a dedicated J; similarly, V covered the vowel /u/, the consonant /w/, and early /v/ fricative variants without requiring separation.[46] Y and Z were late additions solely for transcribing Greek loanwords, placed at the end, underscoring the alphabet's adaptation to Latin's Indo-European phonology rather than exhaustive coverage of all possible sounds. In medieval Europe, particularly during the Anglo-Saxon period (5th to 11th centuries CE), the Latin alphabet was adapted for vernacular languages like Old English, resulting in a partial set of approximately 24 letters that incorporated runic influences while omitting several ISO basic letters. The core letters included A, B, C, D, E, F, G, H, I, L, M, N, O, P, R, S, T, with additions such as thorn (þ) and eth (ð) for /θ/ and /ð/ sounds, ash (æ) for /æ/, and wynn (ƿ) for /w/; K, Q, V, X, and Z were largely absent or rare, as Old English phonology lacked prominent /k/ contrasts beyond C, /kw/ beyond CW, /v/ (represented by F medially), and /ks/ or /z/ sounds.[47] J was entirely omitted, with /j/ rendered by G or I, reflecting the Germanic language's limited need for Romance-derived distinctions.[48] This adaptation arose from the 7th-century introduction of Latin script by Christian missionaries, who modified it to fit West Germanic sounds absent in Latin, such as dental fricatives, without expanding to include extraneous consonants.[47] These historical omissions highlight a principle of phonetic economy in pre-modern Latin scripts, where letters were retained only if essential to the source language's sound system—Latin prioritized its vowel harmony and stop consonants, while Old English emphasized fricatives and diphthongs suited to its dialects. From the Roman era through the medieval period, such alphabets supported literature, inscriptions, and religious texts, but inconsistencies arose due to regional variations and scribal practices. The transition to fuller modern forms occurred gradually during Renaissance humanism (14th to 17th centuries), when scholars like Gian Giorgio Trissino (in 1524) advocated distinguishing J from I for consonantal /j/, U from V for the vowel /u/, and introducing W (as a doubled V) for Germanic /w/ to accommodate evolving European vernaculars and classical revivals.[49] This scholarly refinement standardized the 26-letter ISO basic set, bridging historical partiality with contemporary universality.[50]Partial Basic Letter Modern Variants
Contemporary Latin-script alphabets that employ only a partial set of the ISO basic Latin letters represent adaptations tailored to the specific phonological requirements of their associated languages, prioritizing efficiency in modern usage while omitting letters for sounds absent in the phonemic inventory. These variants emerged or were refined in the 20th century amid language revitalization efforts and global standardization initiatives, often influenced by international bodies promoting indigenous language preservation. Unlike full alphabets, they exclude redundant letters to streamline writing and reduce learner burden, though this can complicate integration with international digital tools. The Hawaiian alphabet, known as ka pīopa Hawaiʻi, consists of 13 letters: five vowels (A, E, I, O, U) and eight consonants (H, K, L, M, N, P, W, ʻokina), deliberately excluding B, C, D, F, G, J, Q, R, S, T, V, X, Y, and Z. This limited set corresponds directly to Hawaiian's eight consonant phonemes (/p/, /k/, /ʔ/, /h/, /m/, /n/, /l/, /w/) and five vowels, which lack voiced stops (like /b/, /d/, /g/) and most fricatives beyond /h/, reflecting the language's Austronesian origins and phonetic simplicity. The ʻokina (ʻ), representing the glottal stop /ʔ/, functions as a consonant and is essential for distinguishing words, such as ka ("the") versus kaʻa ("to roll"). Orthographic standardization occurred in 1826 by missionaries but saw significant 20th-century refinements, including the 1978 Spelling Project by the Bishop Museum, which established guidelines for diacritics like the kahakō (macron) over long vowels to aid pronunciation in revitalization programs. UNESCO has supported Hawaiian revitalization through initiatives like the International Decade of Indigenous Languages (2022–2032), which includes ʻŌlelo Hawaiʻi to promote its transmission and cultural preservation. However, compatibility challenges persist, as standard QWERTY keyboards lack dedicated keys for the ʻokina and kahakō, requiring users to install specialized input methods or use right-Alt combinations, which can hinder digital adoption in global systems. Rotokas, spoken by communities on Bougainville Island in Papua New Guinea, utilizes one of the smallest modern Latin alphabets with just 12 letters: A, E, G, I, K, O, P, R, S, T, U, V, omitting B, C, D, F, H, J, L, M, N, Q, W, X, Y, Z. This minimal configuration aligns with the Central Rotokas dialect's 11 phonemes—six consonants (/p/, /t/, /k/, /g/, /s/, /r/) and five vowels—where nasals are absent and certain sounds like /g/ and /k/ function as allophones, eliminating the need for additional letters. The orthography was developed in the mid-20th century by missionaries and linguists to facilitate literacy, building on the language's inherently sparse sound system for efficient transcription. UNESCO recognizes Rotokas as vulnerable, indirectly supporting its documentation through endangered languages programs, though direct orthographic reforms are limited compared to larger revitalization efforts. Hànyǔ Pīnyīn, the official romanization system for Standard Mandarin Chinese, selectively employs 25 of the 26 ISO basic Latin letters (excluding V, with Ü for the /y/ sound), using combinations like initials (e.g., B, P for bilabials) and finals to represent Mandarin's 21 initial consonants and 39 finals without needing all English-like distinctions. This partial adoption stems from Mandarin's phonemic structure, which lacks sounds like /v/ and relies on tones for differentiation, allowing a streamlined Latin-based script introduced in the 1950s to promote literacy and phonetic teaching. Standardized in 1958 by the People's Republic of China under linguist Zhou Youguang, Pīnyīn underwent 20th-century refinements for international compatibility, influenced by UNESCO's endorsement as a tool for global education in Chinese. Keyboard challenges arise primarily from tone diacritics (ā, á, ǎ, à), which require specialized input methods like IME (Input Method Editor) on standard Latin layouts, though widespread software support has mitigated broader exclusions. These modern variants trace their omissions to historical precursors in missionary adaptations but emphasize practical efficiency in contemporary contexts, such as education and digital communication.Usage and Adoption Statistics
The Latin script serves as the primary writing system for approximately 305 languages worldwide, encompassing a substantial portion of the approximately 7,159 living languages documented globally.[3][51] Among these, a portion utilize partial sets of the ISO basic Latin letters, often in adapted forms tailored to phonetic needs.[52] Adoption of partial basic letter alphabets is notably high in the Pacific Islands, reflecting historical missionary influences and linguistic simplification.[52] In contrast, usage remains low in Europe, where nearly all Latin-script languages employ the full 26-letter ISO basic set due to standardization in education and printing.[53] Trends indicate shifts toward fuller sets driven by digital globalization and Unicode compatibility, particularly evident in revitalization efforts where partial systems are expanded for broader accessibility in computing and media.[54] Recent statistics highlight the vulnerability of partial alphabets in endangered languages, with UNESCO's International Decade of Indigenous Languages (2022–2032) supporting preservation efforts for such systems in contexts like the Pacific and Amazon.[55][56]Additional Letters Beyond ISO Basic Latin
Independent and Ligature Additions
Independent letters in Latin-script alphabets include thorn (Þ, þ) and eth (Ð, ð), which originated from runic influences and are retained in modern Nordic languages to represent dental fricative sounds. In Icelandic and Faroese, thorn denotes the voiceless dental fricative [θ] as in English "thin," while eth represents the voiced dental fricative [ð] as in "this." These letters trace back to Old Norse and were adopted into the insular Latin scripts of the British Isles, appearing in Old English manuscripts where they were used interchangeably for both voiced and voiceless "th" sounds before standardization led to their replacement by the digraph "th."[57][8] Ligature additions encompass characters like the ae ligature (Æ, æ) and ij ligature (IJ, ij), which evolved from fused letter forms to represent diphthongs or distinct phonemes. The ae ligature, historically a fusion of "a" and "e" for the Latin diphthong /ai/, functions as an independent vowel in Nordic alphabets: in Danish and Norwegian, it typically represents the near-open front unrounded vowel [æ] as in "cat"; in Icelandic and Faroese, it denotes the diphthong [ai].[8] Similarly, the ij ligature in Dutch is treated as a single letter for the diphthong [ɛi], akin to the "ay" in English "pay," and is positioned as the 24th letter in the Dutch alphabet between "x" and "z," often rendered as a connected glyph in typography for aesthetic and phonetic unity.[58][59] These additions are primarily used in Nordic languages such as Icelandic, Faroese, Danish, and Norwegian, where they fill phonetic gaps absent in the basic ISO Latin alphabet, and historically in insular Celtic and Anglo-Saxon contexts like Old English, enhancing representation of inherited Germanic and Norse sounds.[57] In modern usage, they distinguish regional orthographies: for instance, Faroese incorporates thorn, eth, and æ alongside basic letters, while Dutch employs the ij ligature in words like "ijzer" (iron).[8] In Unicode, these characters are encoded in early blocks to support legacy and modern European scripts. Thorn (U+00DE capital, U+00FE small) and eth (U+00D0 capital, U+00F0 small) reside in the Latin-1 Supplement (U+0080–U+00FF), while the ae ligature (U+00C6 capital, U+00E6 small) shares this block, and the ij ligature (U+0132 capital, U+0133 small) is in Latin Extended-A (U+0100–U+017F).[8][60] Collation rules under the Unicode Collation Algorithm (UCA) treat these as distinct elements to preserve linguistic order. Thorn collates after "s" and before "t" with primary weight [.0712.0020.0008]; eth follows "d" before "e" at [.0712.0020.0008]; the ae ligature sorts after "d" with equivalence to "ae" at [.06D9.002B.0008]; and the ij ligature expands to "i" + "j" for sorting as a unit.[61] Language-specific tailorings, such as for Icelandic or Dutch, adjust these to match native dictionary orders.[61] Unicode includes independent letter additions for African languages using extended Latin scripts, such as hooked consonants in Latin Extended-B (e.g., U+0181 Ɓ for bilabial implosives in Pan-Nigerian alphabets), with ongoing updates to support diverse phonologies, though specific new ligatures remain limited in adoption.[62]| Character | Unicode Code Point | Block | Primary Usage |
|---|---|---|---|
| Þ (thorn capital) | U+00DE | Latin-1 Supplement | Icelandic, Faroese (voiceless [θ]) |
| þ (thorn small) | U+00FE | Latin-1 Supplement | Icelandic, Faroese (voiceless [θ]) |
| Ð (eth capital) | U+00D0 | Latin-1 Supplement | Icelandic, Faroese (voiced [ð]) |
| ð (eth small) | U+00F0 | Latin-1 Supplement | Icelandic, Faroese (voiced [ð]) |
| Æ (ae capital) | U+00C6 | Latin-1 Supplement | Danish, Norwegian, Icelandic, Faroese ([æ] or [ai]) |
| æ (ae small) | U+00E6 | Latin-1 Supplement | Danish, Norwegian, Icelandic, Faroese ([æ] or [ai]) |
| IJ (ij capital) | U+0132 | Latin Extended-A | Dutch ([ɛi]) |
| ij (ij small) | U+0133 | Latin Extended-A | Dutch ([ɛi]) |