Diacritic
A diacritic (also diacritical mark, diacritical point, diacritical sign, or accent) is a glyph added to a letter or to a basic glyph. The term derives from the Ancient Greek διακριτικός (diakritikós, "distinguishing"), from διακρίνω (diakrínō, "to distinguish"). The word diacritic is a noun, though it is sometimes used in an attributive sense, whereas diacritical is only an adjective. Some diacritics, such as the acute ⟨ó⟩, grave ⟨ò⟩, and circumflex ⟨ô⟩ (all shown above an 'o'), are often called accents. Diacritics may appear above or below a letter or in some other position such as within the letter or between two letters.
The main use of diacritics in Latin script is to change the sound-values of the letters to which they are added. Historically, English has used the diaeresis to indicate the correct pronunciation of ambiguous words, such as "coöperate", without which the ⟨oo⟩ letter sequence could be misread as /ˈkuːpəreɪt/. Other examples are the acute and grave accents, which can indicate that a vowel is to be pronounced differently than is normal in that position: for example, not reduced to /ə/ or silent, as with the two instances of the letter e in the noun résumé (as opposed to the verb resume). The grave accent is also sometimes used as a pronunciation aid in words such as doggèd, learnèd, and blessèd, and especially in words pronounced differently than normal in poetry (for example movèd, breathèd).
Most other words with diacritics in English are borrowings from languages such as French, where the diacritic is kept to better preserve the spelling: the diaeresis in naïve and Noël, the acute in café, the circumflex in crêpe, and the cedilla in façade. All these diacritics, however, are frequently omitted in writing, and English is the only major modern European language that does not have diacritics in common usage.[a]
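The omission of diacritics described above can be approximated in software. The following is a minimal Python sketch, assuming that every mark to be dropped is a nonspacing combining mark once the text is canonically decomposed; the function name `strip_diacritics` is chosen here for illustration only.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (Mark, nonspacing) covers the accents used in these loanwords.
    kept = [ch for ch in decomposed if unicodedata.category(ch) != "Mn"]
    return unicodedata.normalize("NFC", "".join(kept))

words = ["na\u00efve", "No\u00ebl", "caf\u00e9", "cr\u00eape", "fa\u00e7ade"]
plain = [strip_diacritics(w) for w in words]
print(plain)  # ['naive', 'Noel', 'cafe', 'crepe', 'facade']
```

This approach only removes marks that Unicode treats as separable; letters such as ⟨ø⟩ or ⟨đ⟩, which have no canonical decomposition, pass through unchanged.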
In Latin-script alphabets in other languages diacritics may distinguish between homonyms, such as the French là ("there") versus la ("the"), which are both pronounced /la/. In Gaelic type, a dot over a consonant indicates lenition of the consonant in question. In other writing systems, diacritics may perform other functions. Vowel pointing systems, namely the Arabic harakat and the Hebrew niqqud systems, indicate vowels that are not conveyed by the basic alphabet. The Indic virama ( ् etc.) and the Arabic sukūn ( ـْـ ) mark the absence of vowels. Cantillation marks indicate prosody. Other uses include the Early Cyrillic titlo stroke ( ◌҃ ) and the Hebrew gershayim ( ״ ), which, respectively, mark abbreviations or acronyms, and Greek diacritical marks, which showed that letters of the alphabet were being used as numerals. In Vietnamese and the Hanyu Pinyin official romanization system for Mandarin in China, diacritics are used to mark the tones of the syllables in which the marked vowels occur.
In orthography and collation, a letter modified by a diacritic may be treated either as a new, distinct letter or as a letter–diacritic combination. This varies from language to language and may vary from case to case within a language.
In some cases, letters are used as "in-line diacritics", with the same function as ancillary glyphs, in that they modify the sound of the letter preceding them, as in the case of the "h" in the English pronunciation of "sh" and "th".[2] Such letter combinations are sometimes even collated as a single distinct letter. For example, the spelling sch was traditionally often treated as a separate letter in German. Words with that spelling were listed after all other words spelled with s in card catalogs in the Vienna public libraries, for example (before digitization).
Types
Among the types of diacritic used in alphabets based on the Latin script are:
- accents (so called because the acute, grave, and circumflex were originally used to indicate different types of pitch accents in the polytonic transcription of Greek)
- ◌́ – acute (Latin: apex); for example ó
- ◌̀ – grave; for example ò
- ◌̂ – circumflex; for example ô
- ◌̌ – caron, wedge; for example ǒ
- ◌̋ – double acute; for example ő
- ◌̏ – double grave; for example ȍ
- one dot
- ◌̇ – an overdot is used in many orthographies and transcriptions; for example ȯ
- ◌̣ – an underdot is also used in many orthographies and transcriptions; for example ọ
- ◌·◌ – an interpunct is used in the Catalan ela geminada (l·l)
- ◌͘ – a dot above right is used in Pe̍h-ōe-jī
- tittle, the superscript dot of the modern lowercase Latin ⟨i⟩ and ⟨j⟩
- two dots:
- two overdots (◌̈) are used for umlaut, diaeresis and others; for example ö
- two underdots (◌̤) are used in the International Phonetic Alphabet (IPA) and the ALA-LC romanization system
- ◌ː – triangular colon, used in the IPA to mark long vowels (the "dots" are triangular, not circular).
- curves
- ◌̆ – breve; for example ŏ
- ◌̑ – inverted breve; for example ȏ
- ◌͗ – sicilicus, a palaeographic diacritic similar to a caron or breve
- ◌̃ – tilde; for example õ
- ◌҃ – titlo
- vertical stroke
- ◌̩ – a subscript vertical stroke is used in IPA to mark syllabicity and in Rheinische Dokumenta to mark a schwa
- ◌̍ – a superscript vertical stroke is used in Pe̍h-ōe-jī
- macron or horizontal line
- overlays
- ◌⃓ – vertical bar through the character
- ◌̷ – slash through the character; for example ø
- ◌̵ – crossbar through the character
- ring
- ◌̊ – overring: for example å
- superscript curls
- ◌̓ – comma above
- ◌̒ – inverted apostrophe
- ◌̔ – reversed comma above
- ◌̉ – hook above (Vietnamese: dấu hỏi)
- ◌̛ – horn (Vietnamese: dấu móc); for example ơ
- subscript curls
- ◌̦ – undercomma; for example ș
- ◌̧ – cedilla; for example ç
- ◌̡ ◌̢ – hook, left or right, sometimes superscript
- ◌̨ – ogonek; for example ǫ
- double marks (over or under two base characters)
- ◌͝◌ – double breve
- ◌͡◌ – tie bar or top ligature
- ◌᷍◌ – double circumflex
- ◌͞◌ – longum
- ◌͠◌ – double tilde
- double sub/superscript diacritics
- ◌̧ ̧ – double cedilla
- ◌̨ ̨ – double ogonek
- ◌̈ ̈ – double diaeresis
- ◌ͅͺ – double ypogegrammeni
The tilde, dot, comma, titlo, apostrophe, bar, and colon are sometimes diacritical marks, but also have other uses.
Not all diacritics occur adjacent to the letter they modify. In the Wali language of Ghana, for example, an apostrophe indicates a change of vowel quality, but occurs at the beginning of the word, as in the dialects ’Bulengee and ’Dolimi. Because of vowel harmony, all vowels in a word are affected, so the scope of the diacritic is the entire word. In abugida scripts, like those used to write Hindi and Thai, diacritics indicate vowels, and may occur above, below, before, after, or around the consonant letter they modify.
The tittle (dot) on the letter ⟨i⟩ or the letter ⟨j⟩, of the Latin alphabet originated as a diacritic to clearly distinguish ⟨i⟩ from the minims (downstrokes) of adjacent letters. It first appeared in the 11th century in the sequence ii (as in ingeníí), then spread to i adjacent to m, n, u, and finally to all lowercase is. The ⟨j⟩, originally a variant of i, inherited the tittle. The shape of the diacritic developed from initially resembling today's acute accent to a long flourish by the 15th century. With the advent of Roman type it was reduced to the round dot we have today.[3]
Several languages of eastern Europe use diacritics on both consonants and vowels, whereas in western Europe digraphs are more often used to change consonant sounds. Most languages in Europe use diacritics on vowels, aside from English where there are typically none (with some exceptions).
Diacritics specific to non-Latin alphabets
Arabic
- (ئ ؤ إ أ and stand-alone ء) hamza: indicates a glottal stop.
- (ــًــٍــٌـ) tanwīn (تنوين) symbols: Serve a grammatical role in Arabic. The sign ـً is most commonly written in combination with alif, e.g. ـًا.
- (ــّـ) shadda: Gemination (doubling) of consonants.
- (ٱ) waṣla: Comes most commonly at the beginning of a word. Indicates a type of hamza that is pronounced only when the letter is read at the beginning of an utterance.
- (آ) madda: A written replacement for a hamza that is followed by an alif, i.e. (ءا). Read as a glottal stop followed by a long /aː/, e.g. ءاداب، ءاية، قرءان، مرءاة are written out respectively as آداب، آية، قرآن، مرآة. This writing rule does not apply when the alif that follows a hamza is not a part of the stem of the word, e.g. نتوءات is not written out as نتوآت as the stem نتوء does not have an alif that follows its hamza.
- (ــٰـ) superscript alif (also "short" or "dagger" alif): A replacement for an original alif that is dropped in the writing out of some rare words, e.g. لاكن is not written out with the original alif found in the word's pronunciation; instead it is written out as لٰكن.
- ḥarakāt (In Arabic: حركات also called تشكيل tashkīl):
- (ــَـ) fatḥa (a)
- (ــِـ) kasra (i)
- (ــُـ) ḍamma (u)
- (ــْـ) sukūn (no vowel)
- The ḥarakāt or vowel points serve two purposes:
- They serve as a phonetic guide. They indicate the presence of short vowels (fatḥa, kasra, or ḍamma) or their absence (sukūn).
- At the last letter of a word, the vowel point reflects the inflection case or conjugation mood.
- For nouns, the ḍamma is for the nominative, fatḥa for the accusative, and kasra for the genitive.
- For verbs, the ḍamma is for the imperfective, fatḥa for the perfective, and the sukūn is for verbs in the imperative or jussive moods.
- Vowel points or tashkīl should not be confused with consonant points or iʿjam (إعجام) – one, two or three dots written above or below a consonant to distinguish between letters of the same or similar form.
Greek
These diacritics are used in addition to the acute, grave, and circumflex accents and the diaeresis:
- ◌ͺ – iota subscript (ᾳ, εͅ, ῃ, ιͅ, οͅ, υͅ, ῳ)
- ῾◌ – rough breathing (Ancient Greek: δασὺ πνεῦμα, romanized: dasỳ pneûma, Latin: spīritus asper): aspiration
- ᾿◌ – smooth (or soft) breathing (Ancient Greek: ψιλὸν πνεῦμα, romanized: psilòn pneûma, Latin: spīritus lēnis): lack of aspiration
Hebrew
Letters in black, niqqud in red, cantillation in blue
(Cantillation marks do not generally render correctly; refer to Hebrew cantillation#Names and shapes of the ta'amim for a complete table together with instructions for how to maximize the possibility of viewing them in a web browser.)
Korean
The diacritics 〮 and 〯 , known as Bangjeom (방점; 傍點), were used to mark pitch accents in Hangul for Middle Korean. They were written to the left of a syllable in vertical writing and above a syllable in horizontal writing.
Sanskrit and Indic
Syriac
- A dot above and a dot below a letter represent [a], transliterated as a or ă,
- Two diagonally-placed dots above a letter represent [ɑ], transliterated as ā or â or å,
- Two horizontally-placed dots below a letter represent [ɛ], transliterated as e or ĕ; often pronounced [ɪ] and transliterated as i in the East Syriac dialect,
- Two diagonally-placed dots below a letter represent [e], transliterated as ē,
- A dot underneath the Beth represents a soft [v] sound, transliterated as v
- A tilde (~) placed under Gamel represents a [dʒ] sound, transliterated as j
- The letter Waw with a dot below it represents [u], transliterated as ū or u,
- The letter Waw with a dot above it represents [o], transliterated as ō or o,
- The letter Yōḏ with a dot beneath it represents [i], transliterated as ī or i,
- A tilde (~) under Kaph represents a [t͡ʃ] sound, transliterated as ch or č,
- A semicircle under Peh represents an [f] sound, transliterated as f or ph.
In addition to the above vowel marks, transliteration of Syriac sometimes includes ə, e̊ or superscript e (or often nothing at all) to represent an original Aramaic schwa that became lost later on at some point in the development of Syriac.[4] Some transliteration schemes find its inclusion necessary for showing spirantization or for historical reasons.[5][6]
Non-alphabetic scripts
Some non-alphabetic scripts also employ symbols that function essentially as diacritics.
- Non-pure abjads (such as Hebrew and Arabic script) and abugidas use diacritics for denoting vowels. Hebrew and Arabic also indicate consonant doubling and change with diacritics; Hebrew and Devanagari use them for foreign sounds. Devanagari and related abugidas also use a diacritical mark called a virama to mark the absence of a vowel. In addition, Devanagari uses the moon-dot chandrabindu ( ँ ) for vowel nasalization.
- Unified Canadian Aboriginal Syllabics use several types of diacritics, including the diacritics with alphabetic properties known as Medials and Finals. Although long vowels originally were indicated with a negative line through the Syllabic glyphs, making the glyph appear broken, in the modern forms, a dot above is used to indicate vowel length. In some of the styles, a ring above indicates a long vowel with a [j] off-glide. Another diacritic, the "inner ring" is placed at the glyph's head to modify [p] to [f] and [t] to [θ]. Medials such as the "w-dot" placed next to the Syllabics glyph indicates a [w] being placed between the syllable onset consonant and the nucleus vowel. Finals indicate the syllable coda consonant; some of the syllable coda consonants in word medial positions, such as with the "h-tick", indicate the fortification of the consonant in the syllable following it.
- The Japanese hiragana and katakana syllabaries use the dakuten (◌゛) and handakuten (◌゜) (in Japanese: 濁点 and 半濁点) symbols, also known as nigori (濁 "muddying") or ten-ten (点々 "dot dot") and maru (丸 "circle"), to indicate voiced consonants or other phonetic changes.
- Emoticons are commonly created with diacritic symbols, especially Japanese emoticons on popular imageboards.
Alphabetization or collation
Different languages use different rules to put diacritic characters in alphabetical order. For example, French and Portuguese treat letters with diacritical marks the same as the underlying letter for purposes of ordering and dictionaries. The Scandinavian languages and the Finnish language, by contrast, treat the characters with diacritics ⟨å⟩, ⟨ä⟩, and ⟨ö⟩ as distinct letters of the alphabet, and sort them after ⟨z⟩. Usually ⟨ä⟩ (a-umlaut) and ⟨ö⟩ (o-umlaut) [used in Swedish and Finnish] are sorted as equivalent to ⟨æ⟩ (ash) and ⟨ø⟩ (o-slash) [used in Danish and Norwegian]. Also, aa, when used as an alternative spelling to ⟨å⟩, is sorted as such. Other letters modified by diacritics are treated as variants of the underlying letter, with the exception that ⟨ü⟩ is frequently sorted as ⟨y⟩.
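As an illustration of treating ⟨å⟩, ⟨ä⟩ and ⟨ö⟩ as distinct letters sorted after ⟨z⟩, the following Python sketch builds a sort key from an explicit alphabet. The alphabet string follows the Swedish order å, ä, ö; the names `SWEDISH_ALPHABET` and `swedish_key` are hypothetical, and the sketch deliberately ignores the finer rules mentioned above (such as ⟨aa⟩ or ⟨ü⟩).

```python
import unicodedata

# Assumed letter order: the basic Latin letters followed by å, ä, ö.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"
RANK = {letter: index for index, letter in enumerate(SWEDISH_ALPHABET)}

def swedish_key(word: str) -> list[int]:
    """Map each letter to its position in the alphabet above."""
    normalized = unicodedata.normalize("NFC", word).lower()
    # Characters outside the table sort last (a simplification).
    return [RANK.get(ch, len(RANK)) for ch in normalized]

words = ["\u00f6l", "zebra", "\u00e5l", "apa", "\u00e4pple"]
by_alphabet = sorted(words, key=swedish_key)
by_code_point = sorted(words)
print(by_alphabet)    # ['apa', 'zebra', 'ål', 'äpple', 'öl']
print(by_code_point)  # ['apa', 'zebra', 'äpple', 'ål', 'öl']
```

Plain code-point ordering happens to place the three letters after ⟨z⟩ but puts ⟨ä⟩ before ⟨å⟩, which is why an explicit table (or a locale-aware collation library) is needed.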
Languages that treat accented letters as variants of the underlying letter usually alphabetize words with such symbols immediately after similar unmarked words. For instance, in German where two words differ only by an umlaut, the word without it is sorted first in German dictionaries (e.g. schon and then schön, or fallen and then fällen). However, when names are concerned (e.g. in phone books or in author catalogues in libraries), umlauts are often treated as combinations of the vowel with a suffixed ⟨e⟩; Austrian phone books now treat characters with umlauts as separate letters (immediately following the underlying vowel).
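The two German conventions just described can be sketched as two different sort keys. This is a simplified Python illustration rather than an implementation of any official standard: `dictionary_key` compares words by their unmarked form and uses the original spelling only to break ties, while `phonebook_key` expands each umlaut to the vowel plus ⟨e⟩. Both function names are assumptions made for this example.

```python
import unicodedata

def dictionary_key(word: str) -> tuple[str, str]:
    """Sort by the unmarked form; the unmarked word wins ties."""
    nfc = unicodedata.normalize("NFC", word)
    base = "".join(
        ch for ch in unicodedata.normalize("NFD", nfc)
        if unicodedata.category(ch) != "Mn"
    )
    return (base.lower(), nfc)

# Umlaut vowels rewritten as vowel + e (ä, ö, ü and their capitals).
UMLAUT_AS_E = str.maketrans({
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",
})

def phonebook_key(name: str) -> str:
    """Sort names as if each umlaut were spelled with a suffixed e."""
    return unicodedata.normalize("NFC", name).translate(UMLAUT_AS_E).lower()

words = ["sch\u00f6n", "f\u00e4llen", "schon", "fallen"]
names = ["Muller", "M\u00fcller", "Mueller"]

dictionary_order = sorted(words, key=dictionary_key)
names_dictionary = sorted(names, key=dictionary_key)
names_phonebook = sorted(names, key=phonebook_key)
print(dictionary_order)  # ['fallen', 'fällen', 'schon', 'schön']
print(names_dictionary)  # ['Mueller', 'Muller', 'Müller']
print(names_phonebook)   # ['Müller', 'Mueller', 'Muller']
```

Under the phone-book key, Müller and Mueller compare as equal, so their relative order is simply the order in which they appeared in the input.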
In Spanish, the grapheme ⟨ñ⟩ is considered a distinct letter, different from ⟨n⟩ and collated between ⟨n⟩ and ⟨o⟩, as it denotes a different sound from that of a plain ⟨n⟩. But the accented vowels ⟨á⟩, ⟨é⟩, ⟨í⟩, ⟨ó⟩, ⟨ú⟩ are not separated from the unaccented vowels ⟨a⟩, ⟨e⟩, ⟨i⟩, ⟨o⟩, ⟨u⟩, as the acute accent in Spanish only modifies stress within the word or denotes a distinction between homonyms, and does not modify the sound of a letter.
For a comprehensive list of the collating orders in various languages, see Collating sequence.
Generation with computers
Modern computer technology was developed mostly in countries that speak Western European languages (particularly English), and many early binary encodings were developed with a bias favoring English, a language written without diacritical marks. With computer memory and computer storage at a premium, early character sets were limited to the Latin alphabet, the ten digits and a few punctuation marks and conventional symbols. The American Standard Code for Information Interchange (ASCII), first published in 1963, encoded just 95 printable characters. It included just four free-standing diacritics (acute, grave, circumflex and tilde), which were to be used by backspacing and overprinting the base letter. The ISO/IEC 646 standard (1967) defined national variations that replace some American graphemes with precomposed characters (such as ⟨é⟩, ⟨è⟩ and ⟨ë⟩), according to language, but remained limited to 95 printable characters.
Unicode was conceived to solve this problem by assigning every known character its own code; if this code is known, most modern computer systems provide a method to input it. For historical reasons, almost all the letter-with-accent combinations used in European languages were given unique code points, and these are called precomposed characters. For other languages, it is usually necessary to use a combining character diacritic together with the desired base letter. Unfortunately, even as of 2024, many applications and web browsers still do not handle combining diacritics properly.
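The difference between precomposed characters and base-plus-combining sequences can be seen with Python's standard `unicodedata` module. In this small sketch, ⟨é⟩ has a precomposed code point, whereas ⟨o⟩ with the superscript vertical stroke used in Pe̍h-ōe-jī is assumed to have none and so remains two code points even after composition.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single precomposed code point
combining = "e\u0301"    # e followed by COMBINING ACUTE ACCENT

# The two strings render alike but are different code point sequences.
print(precomposed == combining)            # False
print(len(precomposed), len(combining))    # 1 2

# Normalization converts between the two representations.
composed = unicodedata.normalize("NFC", combining)      # compose
decomposed = unicodedata.normalize("NFD", precomposed)  # decompose
print(composed == precomposed, decomposed == combining)  # True True

# o + COMBINING VERTICAL LINE ABOVE: no precomposed form is available,
# so NFC leaves the sequence as two code points.
poj_vowel = unicodedata.normalize("NFC", "o\u030d")
print(len(poj_vowel))  # 2
```

Software that compares or searches text generally needs to normalize both sides to the same form first, since otherwise visually identical strings may not match.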
Depending on the keyboard layout and keyboard mapping, it is more or less easy to enter letters with diacritics on computers and typewriters. Keyboards used in countries where letters with diacritics are the norm have keys engraved with the relevant symbols. In other cases, such as when the US international or UK extended mappings are used, the accented letter is created by first pressing the key with the diacritic mark, followed by the letter to place it on. This method is known as the dead key technique, as the diacritic key produces no output of its own but modifies the output of the key pressed after it.
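The dead key technique can be modelled as a small state machine. The Python sketch below is illustrative only: the `DEAD_KEYS` table is a hypothetical mapping loosely modelled on the US international layout, and the fallback behaviour (emitting both keys when no precomposed letter exists) is an assumption rather than a description of any particular system.

```python
import unicodedata

# Hypothetical dead-key table: key pressed -> combining mark it stands for.
DEAD_KEYS = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis
    "~": "\u0303",  # tilde
}

def type_keys(keys: str) -> str:
    """Simulate typing a sequence of keys on a dead-key layout."""
    output = []
    pending = None  # dead key waiting for its base letter, if any
    for key in keys:
        if pending is None:
            if key in DEAD_KEYS:
                pending = key  # a dead key produces no output of its own
            else:
                output.append(key)
            continue
        # Try to combine the base letter with the pending mark.
        composed = unicodedata.normalize("NFC", key + DEAD_KEYS[pending])
        if len(composed) == 1:
            output.append(composed)
        else:
            # No precomposed letter: emit both keys unchanged (assumed fallback).
            output.append(pending + key)
        pending = None
    return "".join(output)

print(type_keys("caf'e"))   # café
print(type_keys('na"ive'))  # naïve
```

A dead key pressed at the very end of the input stays pending in this sketch and therefore contributes nothing to the result.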
Languages with letters containing diacritics
The following languages have letters with diacritics that are orthographically distinct from those without diacritics.
Latin script
Baltic
- Latvian has the following letters: ⟨ā⟩, ⟨ē⟩, ⟨ī⟩, ⟨ū⟩, ⟨č⟩, ⟨ģ⟩, ⟨ķ⟩, ⟨ļ⟩, ⟨ņ⟩, ⟨š⟩, ⟨ž⟩
- Lithuanian. In general usage, where letters appear with the caron (⟨č⟩, ⟨š⟩ and ⟨ž⟩), they are considered as separate letters from ⟨c⟩, ⟨s⟩ or ⟨z⟩ and collated separately; letters with the ogonek (⟨ą⟩, ⟨ę⟩, ⟨į⟩ and ⟨ų⟩), the macron (⟨ū⟩) and the overdot (⟨ė⟩) are considered as separate letters as well, but not given a unique collation order.
Celtic
- Welsh uses the circumflex, diaeresis, acute, and grave accents on its seven vowels ⟨a⟩, ⟨e⟩, ⟨i⟩, ⟨o⟩, ⟨u⟩, ⟨w⟩, ⟨y⟩ (hence the composites ⟨â⟩, ⟨ê⟩, ⟨î⟩, ⟨ô⟩, ⟨û⟩, ⟨ŵ⟩, ⟨ŷ⟩, ⟨ä⟩, ⟨ë⟩, ⟨ï⟩, ⟨ö⟩, ⟨ü⟩, ⟨ẅ⟩, ⟨ÿ⟩, ⟨á⟩, ⟨é⟩, ⟨í⟩, ⟨ó⟩, ⟨ú⟩, ⟨ẃ⟩, ⟨ý⟩, ⟨à⟩, ⟨è⟩, ⟨ì⟩, ⟨ò⟩, ⟨ù⟩, ⟨ẁ⟩, ⟨ỳ⟩). However, all except the circumflex (which is used as a macron) are fairly rare.
- Following spelling reforms since the 1970s, Scottish Gaelic uses graves only, which can be used on any vowel (⟨à⟩, ⟨è⟩, ⟨ì⟩, ⟨ò⟩, ⟨ù⟩). Formerly acute accents could be used on ⟨á⟩, ⟨ó⟩ and ⟨é⟩, which were used to indicate a specific vowel quality. With the elimination of these accents, the new orthography relies on the reader having prior knowledge of pronunciation of a given word.
- Manx uses the cedilla diacritic ⟨ç⟩ combined with h to give the digraph ⟨çh⟩ (pronounced /tʃ/) to mark the distinction between it and the digraph ⟨ch⟩ (pronounced /h/ or /x/). Other diacritics used in Manx included the circumflex and diaeresis, as in ⟨â⟩, ⟨ê⟩, ⟨ï⟩, etc. to mark the distinction between two similarly spelled words but with slightly differing pronunciation.
- Irish uses only acute accents to mark long vowels, following the 1948 spelling reform. Lenition is indicated using an overdot in Gaelic type (⟨ċ⟩,⟨ḋ⟩,⟨ḟ⟩, ⟨ġ⟩, ⟨ṁ⟩, ⟨ṗ⟩, ⟨ṡ⟩, ⟨ṫ⟩); in Roman type, a suffixed ⟨h⟩ is used. Thus, a ṁáṫair is equivalent to a mháthair.
- Breton does not have a single orthography (spelling system), but uses diacritics for a number of purposes. The diaeresis is used to mark that two vowels are pronounced separately and not as a diphthong/digraph. The circumflex is used to mark long vowels, but usually only when the vowel length is not predictable by phonology. Nasalization of vowels may be marked with a tilde, or following the vowel with the letter ⟨ñ⟩. The plural suffix -où is used as a unified spelling to represent a suffix with a number of pronunciations in different dialects, and to distinguish this suffix from the digraph ⟨ou⟩ which is pronounced as /u:/. An apostrophe is used to distinguish ⟨c'h⟩, pronounced /x/ as the digraph ⟨ch⟩ is used in other Celtic languages, from the French-influenced digraph ch, pronounced /ʃ/.
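The equivalence noted in the Irish entry above, between an overdot in Gaelic type and a suffixed ⟨h⟩ in Roman type, lends itself to a mechanical conversion. The Python sketch below is a minimal illustration covering only the eight dotted consonants listed in that entry; the function name `gaelic_to_roman` is hypothetical.

```python
import re
import unicodedata

# A listed consonant followed by COMBINING DOT ABOVE (after decomposition).
_DOTTED = re.compile("([cdfgmpstCDFGMPST])\u0307")

def gaelic_to_roman(text: str) -> str:
    """Replace overdotted consonants with consonant + h."""
    decomposed = unicodedata.normalize("NFD", text)
    replaced = _DOTTED.sub(r"\1h", decomposed)
    # Recompose so that accented vowels become single code points again.
    return unicodedata.normalize("NFC", replaced)

roman = gaelic_to_roman("a \u1e41\u00e1\u1e6bair")
print(roman)  # a mháthair
```

Decomposing first means that the same pattern matches whether the input uses precomposed dotted letters or separate combining dots.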
Finno-Ugric
- Estonian has a distinct letter ⟨õ⟩, which contains a tilde. Estonian vowels with double-dot diacritics ⟨ä⟩, ⟨ö⟩, ⟨ü⟩ are similar to German, but these are also distinct letters, unlike German umlauted letters. All four have their own place in the alphabet, between ⟨w⟩ and ⟨x⟩. Carons in ⟨š⟩ or ⟨ž⟩ appear only in foreign proper names and loanwords. These are also distinct letters, placed in the alphabet between s and t.
- Finnish uses double-dotted vowels (⟨ä⟩ and ⟨ö⟩). As in Swedish and Estonian, these are regarded as individual letters, rather than 'vowel + diacritic' combinations (as happens in German). It also uses the characters ⟨å⟩, ⟨š⟩ and ⟨ž⟩ in foreign names and loanwords. In the Finnish and Swedish alphabets, ⟨å⟩, ⟨ä⟩ and ⟨ö⟩ collate as separate letters after ⟨z⟩, the others as variants of their base letter.
- Hungarian uses the double-dot, the acute and double acute diacritics (the last is unique to Hungarian): (⟨ö⟩, ⟨ü⟩), (⟨á⟩, ⟨é⟩, ⟨í⟩, ⟨ó⟩, ⟨ú⟩) and (⟨ő⟩, ⟨ű⟩). The acute accent indicates the long form of a vowel (in case of ⟨i⟩/⟨í⟩, ⟨o⟩/⟨ó⟩, ⟨u⟩/⟨ú⟩) while the double acute performs the same function for ⟨ö⟩ and ⟨ü⟩. The acute accent can also indicate a different sound (more open, as in case of ⟨a⟩/⟨á⟩, ⟨e⟩/⟨é⟩). Both long and short forms of the vowels are listed separately in the Hungarian alphabet, but members of the pairs ⟨a⟩/⟨á⟩, ⟨e⟩/⟨é⟩, ⟨i⟩/⟨í⟩, ⟨o⟩/⟨ó⟩, ⟨ö⟩/⟨ő⟩, ⟨u⟩/⟨ú⟩ and ⟨ü⟩/⟨ű⟩ are collated in dictionaries as the same letter.
- Livonian has the following letters: ⟨ā⟩, ⟨ä⟩, ⟨ǟ⟩, ⟨ḑ⟩, ⟨ē⟩, ⟨ī⟩, ⟨ļ⟩, ⟨ņ⟩, ⟨ō⟩, ⟨ȯ⟩, ⟨ȱ⟩, ⟨õ⟩, ⟨ȭ⟩, ⟨ŗ⟩, ⟨š⟩, ⟨ț⟩, ⟨ū⟩, ⟨ž⟩.
Germanic
- German uses the two-dots diacritic (German: umlaut): letters ⟨ä⟩, ⟨ö⟩, ⟨ü⟩, used to indicate the fronting of back vowels (see umlaut (linguistics)).
- Dutch uses acute, circumflex, grave and two-dots diacritics with most vowels and cedilla with c, as in French. This results in ⟨á⟩, ⟨à⟩, ⟨ä⟩, ⟨é⟩, ⟨è⟩, ⟨ê⟩, ⟨ë⟩, ⟨í⟩, ⟨î⟩, ⟨ï⟩, ⟨ó⟩, ⟨ô⟩, ⟨ö⟩, ⟨ú⟩, ⟨û⟩, ⟨ü⟩ and ⟨ç⟩. These occur mostly in words (and names) originating from French (like crème, café, gêne, façade). The acute accent is also used to stress the vowel (like één). The two-dots diacritic is used as a linguistic diaeresis (a vowel hiatus) that splits the two vowels (e.g. reële, reünie, coördinatie), rather than to indicate a linguistic umlaut as used in German.
- Afrikaans uses 16 additional vowel forms, both uppercase and lowercase: ⟨á⟩, ⟨ä⟩, ⟨é⟩, ⟨è⟩, ⟨ê⟩, ⟨ë⟩, ⟨í⟩, ⟨î⟩, ⟨ï⟩, ⟨ó⟩, ⟨ô⟩, ⟨ö⟩, ⟨ú⟩, ⟨û⟩, ⟨ü⟩, ⟨ý⟩.
- Faroese uses acutes and some additional letters. All are considered separate letters and have their own place in the alphabet: ⟨á⟩, ⟨í⟩, ⟨ó⟩, ⟨ú⟩, ⟨ý⟩ and ⟨ø⟩.
- Icelandic uses acutes and other additional letters. All are considered separate letters, and have their own place in the alphabet: ⟨á⟩, ⟨é⟩, ⟨í⟩, ⟨ó⟩, ⟨ú⟩, ⟨ý⟩ and ⟨ö⟩.
- Danish and Norwegian use additional characters like the o-slash ⟨ø⟩ and the a-overring ⟨å⟩. These letters come after ⟨z⟩ and ⟨æ⟩ in the order ⟨ø⟩, ⟨å⟩. Historically, the ⟨å⟩ has developed from a ligature by writing a small superscript ⟨a⟩ over a lowercase ⟨a⟩; if an ⟨å⟩ character is unavailable, some Scandinavian languages allow the substitution of a doubled a, thus ⟨aa⟩. The Scandinavian languages collate these letters after ⟨z⟩, but have different national collation standards.
- Swedish uses a-diaeresis (⟨ä⟩) and o-diaeresis (⟨ö⟩) in the place of ash (⟨æ⟩) and slashed o (⟨ø⟩) in addition to the a-overring (⟨å⟩). Historically, the two-dots diacritic for the Swedish letters ⟨ä⟩ and ⟨ö⟩ developed from a small Gothic ⟨e⟩ written above the letters. These letters are collated after ⟨z⟩, in the order ⟨å⟩, ⟨ä⟩, ⟨ö⟩.
Romance
- In Asturian, Galician and Spanish, the character ⟨ñ⟩ is a letter and collated between n and o.
- Asturian uses an underdot: ⟨Ḷ⟩ (lower case, ⟨ḷ⟩), and ⟨Ḥ⟩ (lower case ⟨ḥ⟩)[7]
- Catalan uses the acute accent ⟨é⟩, ⟨í⟩, ⟨ó⟩, ⟨ú⟩, the grave accent ⟨à⟩, ⟨è⟩, ⟨ò⟩, the diaeresis ⟨ï⟩, ⟨ü⟩, the cedilla ⟨ç⟩, and the interpunct ⟨l·l⟩.
- In Valencian, the circumflex ⟨â⟩, ⟨ê⟩, ⟨î⟩, ⟨ô⟩, ⟨û⟩ may also be used.
- Corsican uses the following in its alphabet: ⟨À⟩/⟨à⟩, ⟨È⟩/⟨è⟩, ⟨Ì⟩/⟨ì⟩, ⟨Ò⟩/⟨ò⟩, ⟨Ù⟩/⟨ù⟩.
- French uses four diacritics, appearing on vowels (circumflex, acute, grave, diaeresis) and the cedilla appearing in ⟨ç⟩.
- Italian uses two diacritics, appearing on vowels (acute, grave)
- Leonese: could use ⟨ñ⟩ or ⟨nn⟩.
- Portuguese uses a tilde with the vowels ⟨a⟩ and ⟨o⟩ and a cedilla with c.
- Romanian uses a breve on the letter a (⟨ă⟩) to indicate the sound schwa /ə/, as well as a circumflex over the letters a (⟨â⟩) and i (⟨î⟩) for the sound /ɨ/. Romanian also writes a comma below the letters s (⟨ș⟩) and t (⟨ț⟩) to represent the sounds /ʃ/ and /t͡s/, respectively. These characters are collated after their non-diacritic equivalent.
- Spanish uses acute accents (⟨á⟩, ⟨é⟩, ⟨í⟩, ⟨ó⟩, ⟨ú⟩) to indicate stress falling on a different syllable than the one it would fall on based on default rules, and to distinguish certain one-syllable homonyms (e.g. el (masculine singular definite article) and él [he]). The acute accent is also used to break up sequences of vowels that would normally be pronounced as a diphthong into two syllables, as in the word reír. Diaeresis is used on u only, to distinguish the combinations gue, gui /ge/, /gi/ from güe, güi /gwe/, /gwi/, e.g. vergüenza, lingüística. The tilde on ⟨ñ⟩ is not considered a diacritic as ⟨ñ⟩ is considered a distinct letter from ⟨n⟩, not a mutated form of it.
Slavic
- Gaj's Latin alphabet, used in Croatian and latinized Serbian, has the symbols ⟨č⟩, ⟨ć⟩, ⟨đ⟩, ⟨š⟩ and ⟨ž⟩, which are considered separate letters and are listed as such in dictionaries and other contexts in which words are listed according to alphabetical order. It also has one digraph including a diacritic, dž, which is also alphabetized independently, and follows ⟨d⟩ and precedes ⟨đ⟩ in the alphabetical order.
- The Czech alphabet uses the acute (lowercase á é í ó ú ý, uppercase Á É Í Ó Ú Ý), caron (lowercase č ď ě ň ř š ť ž, uppercase Č Ď Ě Ň Ř Š Ť Ž), and for one letter (lowercase ů, uppercase Ů) the ring. (In ď and ť the caron is modified to look rather like an apostrophe.) Letters with a caron are considered separate letters, whereas accented vowels are considered only longer variants of the unaccented letters. The acute does not affect alphabetical order; letters with a caron are ordered after their unmarked counterparts.
- Polish has the following letters: ą ć ę ł ń ó ś ź ż. These are considered to be separate letters: each of them is placed in the alphabet immediately after its Latin counterpart (e.g. ⟨ą⟩ between ⟨a⟩ and ⟨b⟩), ⟨ź⟩ and ⟨ż⟩ are placed after ⟨z⟩ in that order.
- The Serbian Cyrillic alphabet has no diacritics, instead it has a grapheme (glyph) for every letter of its Latin counterpart (including Latin letters with diacritics and the digraphs dž, lj and nj).
- The Slovak alphabet uses the acute (lowercase á é í ó ú ý ĺ ŕ, uppercase Á É Í Ó Ú Ý Ĺ Ŕ), caron (lowercase č ď ľ ň š ť ž dž, uppercase Č Ď Ľ Ň Š Ť Ž DŽ), umlaut (ä Ä) and circumflex accent (ô Ô). All of those are considered separate letters and are placed directly after the original counterpart in the alphabet.[8]
- The basic Slovenian alphabet has the symbols ⟨č⟩, ⟨š⟩, and ⟨ž⟩, which are considered separate letters and are listed as such in dictionaries and other contexts in which words are listed according to alphabetical order. Letters with a caron are placed right after the letters as written without the diacritic. The letter ⟨đ⟩ ('d with bar') may be used in non-transliterated foreign words, particularly names, and is placed after ⟨č⟩ and before ⟨d⟩.
Turkic
- Azerbaijani includes the distinct Turkish alphabet letters Ç, Ğ, I, İ, Ö, Ş and Ü.
- Crimean Tatar includes the distinct Turkish alphabet letters Ç, Ğ, I, İ, Ö, Ş and Ü. Unlike Turkish, Crimean Tatar also has the letter Ñ.
- Gagauz includes the distinct Turkish alphabet letters Ç, Ğ, I, İ, Ö and Ü. Unlike Turkish, Gagauz also has the letters Ä, Ê, Ș and Ț. Ș and Ț are derived from the Romanian alphabet for the same sounds. Sometimes the Turkish Ş may be used instead of Ș.
- Turkish uses a ⟨G⟩ with a breve (⟨Ğ⟩), two letters with two dots (⟨Ö⟩ and ⟨Ü⟩, representing two rounded front vowels), two letters with a cedilla (⟨Ç⟩ and ⟨Ş⟩, representing the affricate /tʃ/ and the fricative /ʃ/), and also possesses a dotted capital ⟨İ⟩ (and a dotless lowercase ⟨ı⟩ representing a high unrounded back vowel). In Turkish each of these is a separate letter, rather than a version of another letter; dotted capital ⟨İ⟩ and lowercase ⟨i⟩ are the same letter, as are dotless capital ⟨I⟩ and lowercase ⟨ı⟩. Typographically, ⟨Ç⟩ and ⟨Ş⟩ are sometimes rendered with an underdot, as in ⟨Ṣ⟩. The new Azerbaijani, Crimean Tatar, and Gagauz alphabets are based on the Turkish alphabet and its same diacriticized letters, with some additions.
- Turkmen includes the distinct Turkish alphabet letters Ç, Ö, Ş and Ü. In addition, Turkmen uses A with diaeresis (Ä) to represent /æ/, N with caron (⟨Ň⟩) to represent the velar nasal /ŋ/, Y with acute (⟨Ý⟩) to represent the palatal approximant /j/, and Z with caron (⟨Ž⟩) to represent /ʒ/.
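The Turkish pairing of dotted ⟨İ⟩/⟨i⟩ and dotless ⟨I⟩/⟨ı⟩ described above has a practical consequence for software: language-neutral case conversion pairs ⟨I⟩ with ⟨i⟩ instead. The Python sketch below shows the mismatch and one possible workaround; the helper names `turkish_lower` and `turkish_upper` are hypothetical and handle only these two letter pairs.

```python
# Remap the dotted and dotless i before applying the default case conversion.
_TR_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})  # I -> ı, İ -> i
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})  # i -> İ, ı -> I

def turkish_lower(text: str) -> str:
    return text.translate(_TR_LOWER).lower()

def turkish_upper(text: str) -> str:
    return text.translate(_TR_UPPER).upper()

word = "I\u015eIK"  # IŞIK, written with dotless capital I

default_lower = word.lower()        # pairs I with dotted i
turkish = turkish_lower(word)       # pairs I with dotless ı
print(default_lower)                # işik
print(turkish)                      # ışık
print(turkish_upper("iz"))          # İZ
```

Libraries with locale-aware case mapping exist for production use; the sketch only shows why the default mapping is not sufficient for Turkish and the alphabets derived from it.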
Other
- Albanian has two special letters, Ç and Ë, each in upper and lower case. They are placed next to the most similar letters in the alphabet, c and e respectively.
- Esperanto has the symbols ŭ, ĉ, ĝ, ĥ, ĵ and ŝ, which are included in the alphabet, and considered separate letters.
- Filipino also has the character ñ as a letter, collated between n and o.
- Modern Greenlandic does not use any diacritics, although ø and å are used to spell loanwords, especially from Danish and English.[9][10] From 1851 until 1973, Greenlandic was written in an alphabet invented by Samuel Kleinschmidt, where long vowels and geminate consonants were indicated by diacritics on vowels (in the case of consonant gemination, the diacritics were placed on the vowel preceding the affected consonant). For example, the name Kalaallit Nunaat was spelled Kalâdlit Nunât. This scheme uses the circumflex (◌̂) to indicate a long vowel (e.g. ⟨ât, ît, ût⟩; modern: ⟨aat, iit, uut⟩), an acute accent (◌́) to indicate gemination of the following consonant: (i.e. ⟨ák, ík, úk⟩; modern: ⟨akk, ikk, ukk⟩) and, finally, a tilde (◌̃) or a grave accent (◌̀), depending on the author, indicates vowel length and gemination of the following consonant (e.g. ⟨ãt/àt, ĩt/ìt, ũt/ùt⟩; modern: ⟨aatt, iitt, uutt⟩). ⟨ê, ô⟩, used only before ⟨r, q⟩, are now written ⟨ee, oo⟩ in Greenlandic.
- Hawaiian uses the kahakō (macron) over vowels, although there is some disagreement over considering them as individual letters. The kahakō over a vowel can completely change the meaning of a word that is spelled the same but without the kahakō.
- Kurdish uses the symbols Ç, Ê, Î, Ş and Û alongside the 26 standard Latin alphabet symbols.
- The Lakota alphabet uses the caron for the letters č, ȟ, ǧ, š, and ž. It also uses the acute accent for stressed vowels á, é, í, ó, ú, áŋ, íŋ, úŋ.
- Malay uses some diacritics such as á, ā, ç, í, ñ, ó, š, ú. The use of diacritics continued until the late 19th century, except for ā and ē.
- Maltese uses a C, G, and Z with a dot over them (Ċ, Ġ, Ż), and also has an H with an extra horizontal bar. For uppercase H, the extra bar is written slightly above the usual bar. For lowercase H, the extra bar is written crossing the vertical, like a t, and not touching the lower part (Ħ, ħ). The above characters are considered separate letters. The letter 'c' without a dot has fallen out of use due to redundancy. 'Ċ' is pronounced like the English 'ch' and 'k' is used as a hard c as in 'cat'. 'Ż' is pronounced just like the English 'Z' as in 'Zebra', while 'Z' is used to make the sound of 'ts' in English (like 'tsunami' or 'maths'). 'Ġ' is used as a soft 'G' like in 'geometry', while the 'G' sounds like a hard 'G' like in 'log'. The digraph 'għ' (called għajn after the Arabic letter name ʻayn for غ) is considered separate, and sometimes ordered after 'g', whilst in other volumes it is placed between 'n' and 'o' (the Latin letter 'o' originally evolved from the shape of Phoenician ʻayin, which was traditionally collated after Phoenician nūn).
- The romanization of Syriac uses the altered letters Ā, Č, Ḏ, Ē, Ë, Ġ, Ḥ, Ō, Š, Ṣ, Ṭ, Ū, Ž alongside the 26 standard Latin alphabet symbols.[11]
- Vietnamese uses the horn diacritic for the letters ơ and ư; the circumflex for the letters â, ê, and ô; the breve for the letter ă; and a bar through the letter đ. Separately, it also has á, à, ả, ã and ạ, the five tones used for vowels besides the flat tone 'a'.
Cyrillic letters
- Belarusian and Uzbek Cyrillic have a letter ⟨ў⟩.
- Belarusian, Bulgarian, Russian and Ukrainian have the letter ⟨й⟩.
- Belarusian and Russian have the letter ⟨ё⟩. In Russian, this letter is usually replaced by ⟨е⟩, although it has a different pronunciation. The use of ⟨е⟩ instead of ⟨ё⟩ does not affect the pronunciation. Ё is always used in children's books and in dictionaries. A minimal pair is все (vs'e, "everybody" pl.) and всё (vs'o, "everything" n. sg.). In Belarusian the replacement by ⟨е⟩ is a mistake; in Russian, it is permissible to use either ⟨е⟩ or ⟨ё⟩ for ⟨ё⟩ but the former is more common in everyday writing (as opposed to instructional or juvenile writing).
- The Cyrillic Ukrainian alphabet has the letters ⟨ґ⟩, ⟨й⟩ and ⟨ї⟩. Ukrainian Latynka has many more.
- Macedonian has the letters ⟨ќ⟩ and ⟨ѓ⟩.
- In Bulgarian and Macedonian the possessive pronoun ѝ (ì, "her") is spelled with a grave accent in order to distinguish it from the conjunction и (i, "and").
- The acute accent ◌́ above any vowel in Cyrillic alphabets is used in dictionaries, books for children and foreign learners to indicate word stress; it can also be used to disambiguate similarly spelled words with different lexical stresses.
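The stress-marking convention above can be undone mechanically, for example before searching or indexing text. The following is a minimal Python sketch, not a definitive implementation: the function name `strip_stress_marks` is hypothetical, and it assumes the stress mark is encoded as U+0301 COMBINING ACUTE ACCENT. The text is first normalized to NFC so that letters such as ⟨ѓ⟩, ⟨ќ⟩, ⟨й⟩ and ⟨ё⟩ stay precomposed and are left untouched.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # assumed encoding of the stress mark

def strip_stress_marks(text: str) -> str:
    """Remove stress acutes while keeping precomposed letters intact."""
    # NFC folds base + mark into a precomposed letter where one exists
    # (e.g. Macedonian ѓ), so only free-standing acutes remain to be removed.
    composed = unicodedata.normalize("NFC", text)
    return composed.replace(COMBINING_ACUTE, "")

print(strip_stress_marks("писа\u0301ть"))  # stressed form of the verb "to write"
```

Decomposing to NFD before stripping would be simpler, but it would also turn ⟨ѓ⟩ into ⟨г⟩, which is why this sketch works on the composed form.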
Diacritics that do not produce new letters
English
English is one of the few European languages that does not have many words that contain diacritical marks. Instead, digraphs are the main way the Modern English alphabet adapts the Latin alphabet to its phonemes. Exceptions are unassimilated foreign loanwords, including borrowings from French (and, increasingly, Spanish, like jalapeño and piñata); however, the diacritic is also sometimes omitted from such words. Loanwords that frequently appear with the diacritic in English include café, résumé or resumé (a usage that helps distinguish it from the verb resume), soufflé, and naïveté (see English terms with diacritical marks). In older practice (and even among some orthographically conservative modern writers), one may see examples such as élite, mêlée and rôle.
English speakers and writers once used the diaeresis more often than now in words such as coöperation (from Fr. coopération), zoölogy (from Grk. zoologia), and seeër (now more commonly see-er or simply seer) as a way of indicating that adjacent vowels belonged to separate syllables, but this practice has become far less common. The New Yorker magazine is a major publication that continues to use the diaeresis in place of a hyphen for clarity and economy of space.[12]
A few English words, often when used out of context, especially in isolation, can only be distinguished from other words of the same spelling by using a diacritic or modified letter. These include exposé, lamé, maté, öre, øre, résumé and rosé. In a few words, diacritics that did not exist in the original have been added for disambiguation, as in maté (from Sp. and Port. mate), saké (the standard Romanization of the Japanese has no accent mark), and Malé (from Dhivehi މާލެ), to clearly distinguish them from the English words mate, sake, and male.
The acute and grave accents are occasionally used in poetry and lyrics: the acute to indicate stress overtly where it might be ambiguous (rébel vs. rebél) or nonstandard for metrical reasons (caléndar), the grave to indicate that an ordinarily silent or elided syllable is pronounced (warnèd, parlìament).
In certain personal names such as Renée and Zoë, often two spellings exist, and the person's own preference will be known only to those close to them. Even when the name of a person is spelled with a diacritic, like Charlotte Brontë, this may be dropped in English-language articles, and even in official documents such as passports, due to carelessness, the typist not knowing how to enter letters with diacritical marks, or technical reasons (California, for example, does not allow names with diacritics, as the computer system cannot process such characters). Diacritics also appear in some worldwide company names and/or trademarks, such as Nestlé and Citroën.
Other languages
The following languages have letter-diacritic combinations that are not considered independent letters.
- Afrikaans uses a diaeresis to mark vowels that are pronounced separately and not as one would expect where they occur together, for example voel (to feel) as opposed to voël (bird). The circumflex is used in ê, î, ô and û generally to indicate long close-mid, as opposed to open-mid vowels, for example in the words wêreld (world) and môre (morning, tomorrow). The acute accent is used to add emphasis in the same way as underlining or writing in bold or italics in English, for example Dit is jóú boek (It is your book). The grave accent is used to distinguish between words that are different only in placement of the stress, for example appel (apple) and appèl (appeal) and in a few cases where it makes no difference to the pronunciation but distinguishes between homophones. The two most usual cases of the latter are in the sayings òf... òf (either... or) and nòg... nòg (neither... nor) to distinguish them from of (or) and nog (again, still).
- Aymara uses a diacritical horn over p, q, t, k, ch.
- Catalan has the following composite characters: à, ç, é, è, í, ï, ó, ò, ú, ü, l·l. The acute and the grave indicate stress and vowel height, the cedilla marks the result of a historical palatalization, the diaeresis indicates either a hiatus, or that the letter u is pronounced when the graphemes gü, qü are followed by e or i, the interpunct (·) distinguishes the different values of ll/l·l.
- Some orthographies of Cornish such as Kernowek Standard and Unified Cornish use diacritics, while others such as Kernewek Kemmyn and the Standard Written Form do not (or only use them optionally in teaching materials).
- Dutch uses the diaeresis. For example, in ruïne it means that the u and the i are separately pronounced in their usual way, and not in the way that the combination ui is normally pronounced. Thus it works as a separation sign and not as an indication for an alternative version of the i. Diacritics can be used for emphasis (érg koud for very cold) or for disambiguation between a number of words that are spelled the same when context does not indicate the correct meaning (één appel = one apple, een appel = an apple; vóórkomen = to occur, voorkómen = to prevent). Grave and acute accents are used on a very small number of words, mostly loanwords. The ç also appears in some loanwords.[13]
- Faroese. Non-Faroese accented letters are not added to the Faroese alphabet. These include é, ö, ü, å and recently also letters like š, ł, and ć.
- Filipino has the following composite characters: á, à, â, é, è, ê, í, ì, î, ó, ò, ô, ú, ù, û. Everyday use of diacritics for Filipino is, however, uncommon, and meant only to distinguish homonyms, where one word has the usual penultimate stress and the other a different stress placement. This aids both comprehension and pronunciation if both are relatively adjacent in a text, or if a word is itself ambiguous in meaning. The letter ñ ("eñe") is not an n with a diacritic, but rather collated as a separate letter, one of eight borrowed from Spanish. Diacritics appear in Spanish loanwords and names observing Spanish orthography rules.
- Finnish. Carons in š and ž appear only in foreign proper names and loanwords, but may be substituted with sh or zh if and only if it is technically impossible to produce accented letters in the medium. Contrary to Estonian, š and ž are not considered distinct letters in Finnish.
- French uses five diacritics. The grave (accent grave) marks the sound /ɛ/ when over an e, as in père ("father") or is used to distinguish words that are otherwise homographs such as a/à ("has"/"to") or ou/où ("or"/"where"). The acute (accent aigu) is only used in "é", modifying the "e" to make the sound /e/, as in étoile ("star"). The circumflex (accent circonflexe) generally denotes that an "s" once followed the vowel in Old French or Latin, as in fête ("party"), the Old French being feste and the Latin being festum. Whether the circumflex modifies the vowel's pronunciation depends on the dialect and the vowel. The cedilla (cédille) indicates that a normally hard "c" (before the vowels "a", "o", and "u") is to be pronounced /s/, as in ça ("that"). The diaeresis diacritic (French: tréma) indicates that two adjacent vowels that would normally be pronounced as one are to be pronounced separately, as in Noël ("Christmas").
- Galician vowels can bear an acute (á, é, í, ó, ú) to indicate stress or difference between two otherwise same written words (é, 'is' vs. e, 'and'), but the diaeresis is only used with ï and ü to show two separate vowel sounds in pronunciation. Only in foreign words may Galician use other diacritics such as ç (common during the Middle Ages), ê, or à.
- German uses the three umlauted characters ä, ö and ü. These diacritics indicate vowel changes. For instance, the word Ofen [ˈoːfən] "oven" has the plural Öfen [ˈøːfən]. The mark originated as a superscript e; a handwritten blackletter e resembles two parallel vertical lines, like a diaeresis. Due to this history, "ä", "ö" and "ü" can be written as "ae", "oe" and "ue" respectively, if the umlaut letters are not available.
- Hebrew has many diacritic marks known as niqqud that are used above and below the script to represent vowels. These must be distinguished from cantillation marks, which are keys to pronunciation and syntax.
- The International Phonetic Alphabet uses diacritic symbols and characters to indicate phonetic features or secondary articulations.
- Irish uses the acute to indicate that a vowel is long: á, é, í, ó, ú. It is known as síneadh fada "long sign" or simply fada "long" in Irish. In the older Gaelic type, overdots are used to indicate lenition of a consonant: ḃ, ċ, ḋ, ḟ, ġ, ṁ, ṗ, ṡ, ṫ.
- Italian mainly has the acute and the grave (à, è/é, ì, ò/ó, ù), typically to indicate a stressed syllable that would not be stressed under the normal rules of pronunciation but sometimes also to distinguish between words that are otherwise spelled the same way (e.g. "e", and; "è", is). Despite its rare use, Italian orthography allows the circumflex (î) too, in two cases: it can be found in old literary context (roughly up to 19th century) to signal a syncope (fêro→fecero, they did), or in modern Italian to signal the contraction of ″-ii″ due to the plural ending -i whereas the root ends with another -i; e.g., s. demonio, p. demonii→demonî; in this case the circumflex also signals that the word intended is not demoni, plural of "demone" by shifting the accent (demònî, "devils"; dèmoni, "demons").
- Lithuanian uses the acute, grave and tilde in dictionaries to indicate stress types in the language's pitch accent system.
- Maltese also uses the grave on its vowels to indicate stress at the end of a word with two syllables or more: lowercase letters à, è, ì, ò, ù; capital letters À, È, Ì, Ò, Ù.
- Māori makes use of macrons to mark long vowels.
- Occitan has the following composite characters: á, à, ç, é, è, í, ï, ó, ò, ú, ü, n·h, s·h. The acute and the grave indicate stress and vowel height, the cedilla marks the result of a historical palatalization, the diaeresis indicates either a hiatus, or that the letter u is pronounced when the graphemes gü, qü are followed by e or i, and the interpunct (·) distinguishes the different values of nh/n·h and sh/s·h (i.e., that the letters are supposed to be pronounced separately, not combined into "ny" and "sh").
- Portuguese has the following composite characters: à, á, â, ã, ç, é, ê, í, ó, ô, õ, ú. The acute and the circumflex indicate stress and vowel height, the grave indicates crasis, the tilde represents nasalization, and the cedilla marks the result of a historical lenition.
- Acutes are also used in Slavic language dictionaries and textbooks to indicate lexical stress, placed over the vowel of the stressed syllable. This can also serve to disambiguate meaning (e.g., in Russian писа́ть (pisáť) means "to write", but пи́сать (písať) means "to piss"; likewise бо́льшая часть means "the biggest part", but больша́я часть means "the big part").
- Spanish uses the acute and the diaeresis. The acute is used on a vowel in a stressed syllable in words with irregular stress patterns. It can also be used to "break up" a diphthong as in tío (pronounced [ˈti.o], rather than [ˈtjo] as it would be without the accent). Moreover, the acute can be used to distinguish words that otherwise are spelled alike, such as si ("if") and sí ("yes"), and also to distinguish interrogative and exclamatory pronouns from homophones with a different grammatical function, such as donde/¿dónde? ("where"/"where?") or como/¿cómo? ("as"/"how?"). The acute may also be used to avoid typographical ambiguity, as in 1 ó 2 ("1 or 2"; without the acute this might be interpreted as "1 0 2"). The diaeresis is used only over u (ü) for it to be pronounced [w] in the combinations gue and gui, where u is normally silent, for example ambigüedad. In poetry, the diaeresis may be used on i and u as a way to force a hiatus. As foreshadowed above, in nasal ñ the tilde (squiggle) is not considered a diacritic sign at all, but a composite part of a distinct glyph, with its own chapter in the dictionary: a glyph that denotes the 15th letter of the Spanish alphabet.
- Swedish uses the acute to show non-standard stress, for example in kafé (café) and resumé (résumé). This occasionally helps resolve ambiguities, such as ide (hibernation) versus idé (idea). In these words, the acute is not optional. Some proper names use non-standard diacritics, such as Carolina Klüft and Staël von Holstein. For foreign loanwords the original accents are strongly recommended, unless the word has been infused into the language, in which case they are optional. Hence crème fraîche but ampere. Swedish also has the letters å, ä, and ö, but these are considered distinct letters, not a and o with diacritics.
- Tamil does not have any diacritics in itself, but uses the Arabic numerals 2, 3 and 4 as diacritics to represent aspirated, voiced, and voiced-aspirated consonants when Tamil script is used to write long passages in Sanskrit.
- Thai has its own system of diacritics derived from Indian numerals, which denote different tones.
- Vietnamese uses the acute (dấu sắc), the grave (dấu huyền), the tilde (dấu ngã), the underdot (dấu nặng) and the hook above (dấu hỏi) on vowels as tone indicators.
- Welsh uses the circumflex, diaeresis, acute, and grave on its seven vowels a, e, i, o, u, w, y. The most common is the circumflex (which it calls to bach, meaning "little roof", or acen grom "crooked accent", or hirnod "long sign") to denote a long vowel, usually to disambiguate it from a similar word with a short vowel or a semivowel. The rarer grave accent has the opposite effect, shortening vowel sounds that would usually be pronounced long. The acute accent and diaeresis are also occasionally used, to denote stress and vowel separation respectively. The w-circumflex ŵ and the y-circumflex ŷ are among the most commonly accented characters in Welsh, but unusual in languages generally, and were until recently very hard to obtain in word-processed and HTML documents.
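The German entry in the list above notes that ä, ö and ü can be written as ae, oe and ue when the umlaut letters are unavailable. A minimal Python sketch of that substitution follows; the names `UMLAUT_FALLBACK` and `german_ascii_fallback` are hypothetical, and the handling of capitals (Ä to Ae, and so on) is an assumption that will not suit all-caps text.

```python
import unicodedata

# Fallback spellings from the German entry; the capitalized forms are an assumption.
UMLAUT_FALLBACK = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})

def german_ascii_fallback(text: str) -> str:
    """Replace German umlaut letters with their two-letter fallback spellings."""
    # Normalize first so that "o" + combining diaeresis is treated like "ö".
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(UMLAUT_FALLBACK)

print(german_ascii_fallback("Öfen"))
```

The mapping is one-way: not every "ue" in German text stands for ü, so the original spelling cannot be reliably recovered from the fallback form.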
Transliteration
Several languages that are not written with the Roman alphabet are transliterated, or romanized, using diacritics. Examples:
- Arabic has several romanisations, depending on the type of the application, region, intended audience, country, etc. Many of them use diacritics extensively; e.g., some methods use an underdot for rendering emphatic consonants (ṣ, ṭ, ḍ, ẓ, ḥ). The macron is often used to render long vowels. š is often used for /ʃ/, ġ for /ɣ/.
- Chinese has several romanizations that use the umlaut, but only on u (ü). In Hanyu Pinyin, the four tones of Mandarin Chinese are denoted by the macron (first tone), acute (second tone), caron (third tone) and grave (fourth tone) diacritics. Example: ā, á, ǎ, à.
- Romanized Japanese (Rōmaji) occasionally uses macrons to mark long vowels. The Hepburn romanization system uses macrons to mark long vowels, and the Kunrei-shiki and Nihon-shiki systems use a circumflex.
- Sanskrit, as well as many of its descendants, like Hindi and Bengali, uses a lossless romanization system, IAST. This includes several letters with diacritical markings, such as the macron (ā, ī, ū), over- and underdots (ṛ, ḥ, ṃ, ṇ, ṣ, ṭ, ḍ) as well as a few others (ś, ñ).
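The Hanyu Pinyin tone marks described in the list above (macron, acute, caron, grave) can be illustrated with a short Python sketch that attaches the corresponding combining character to a vowel and composes the result. The names `TONE_MARKS` and `add_tone` are hypothetical; deciding which vowel in a syllable carries the mark is outside the scope of this sketch.

```python
import unicodedata

# Tone number -> combining mark, following the order given for Hanyu Pinyin.
TONE_MARKS = {
    1: "\u0304",  # macron (first tone)
    2: "\u0301",  # acute (second tone)
    3: "\u030C",  # caron (third tone)
    4: "\u0300",  # grave (fourth tone)
}

def add_tone(vowel: str, tone: int) -> str:
    """Return the vowel with the tone mark applied, in composed (NFC) form."""
    return unicodedata.normalize("NFC", vowel + TONE_MARKS[tone])

print([add_tone("a", tone) for tone in (1, 2, 3, 4)])
```

Because the result is normalized to NFC, a vowel that already carries a diaeresis, such as ü, combines with the tone mark into a single precomposed character where Unicode provides one.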
Limits
Orthographic
Possibly the greatest number of combining diacritics required to compose a valid character in any Unicode language is 8, for the "well-known grapheme cluster in Tibetan and Ranjana scripts" or HAKṢHMALAWARAYAṀ.[14]
It consists of
- U+0F67 ཧ TIBETAN LETTER HA
- U+0F90 ྐ TIBETAN SUBJOINED LETTER KA
- U+0FB5 ྵ TIBETAN SUBJOINED LETTER SSA
- U+0FA8 ྨ TIBETAN SUBJOINED LETTER MA
- U+0FB3 ླ TIBETAN SUBJOINED LETTER LA
- U+0FBA ྺ TIBETAN SUBJOINED LETTER FIXED-FORM WA
- U+0FBC ྼ TIBETAN SUBJOINED LETTER FIXED-FORM RA
- U+0FBB ྻ TIBETAN SUBJOINED LETTER FIXED-FORM YA
- U+0F82 ྂ TIBETAN SIGN NYI ZLA NAA DA
An example of the rendering, which may be broken depending on the browser used:
ཧྐྵྨླྺྼྻྂ
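As a rough illustration of the composition above, the following Python sketch assembles the cluster from the listed code points with the standard `unicodedata` module and counts how many of them are nonspacing marks. The variable names are illustrative only, and the character names printed depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# Code points exactly as listed above: one base letter followed by eight marks.
CODE_POINTS = [
    0x0F67, 0x0F90, 0x0FB5, 0x0FA8, 0x0FB3, 0x0FBA, 0x0FBC, 0x0FBB, 0x0F82,
]

cluster = "".join(chr(cp) for cp in CODE_POINTS)

# General category "Mn" (Mark, nonspacing) identifies the combining characters.
marks = [ch for ch in cluster if unicodedata.category(ch) == "Mn"]

for ch in cluster:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")
print(len(cluster), "code points,", len(marks), "nonspacing marks")
```

Whether the nine code points render as a single stacked glyph still depends on the font and the text-shaping engine, which is why the rendering may appear broken in some environments.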
Unorthographic/ornamental
Some users have explored the limits of rendering in web browsers and other software by "decorating" words with excessive nonsensical diacritics per character to produce so-called Zalgo text.
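Software that needs to tame such text can cap the number of combining marks allowed after each base character. The sketch below is one possible approach, assuming a hypothetical helper `limit_combining_marks` and an arbitrary default limit of two marks; it treats every character whose Unicode general category starts with "M" as a mark.

```python
import unicodedata

def limit_combining_marks(text: str, max_marks: int = 2) -> str:
    """Keep at most max_marks consecutive combining marks after each base character."""
    kept = []
    run = 0  # number of marks seen since the last non-mark character
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            run += 1
            if run > max_marks:
                continue  # drop the excess mark
        else:
            run = 0
        kept.append(ch)
    return "".join(kept)

zalgo = "Z" + "\u0301\u0302\u0303\u0304\u0305" + "a"
print(limit_combining_marks(zalgo))
```

A low limit is a trade-off: it would also truncate legitimate sequences such as the eight-mark Tibetan cluster described under Orthographic limits, so the threshold has to be chosen with the expected scripts in mind.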
List of diacritics in Unicode
Diacritics for Latin script in Unicode:
| Character | Character name Unicode code point |
Mark | General category | Script |
|---|---|---|---|---|
| ◌̀ |
|
Grave | Mn: Mark, nonspacing | Inherited |
| ◌́ |
|
Acute | Mn: Mark, nonspacing | Inherited |
| ◌̂ |
|
Circumflex | Mn: Mark, nonspacing | Inherited |
| ◌̃ |
|
Tilde | Mn: Mark, nonspacing | Inherited |
| ◌̄ |
|
Macron | Mn: Mark, nonspacing | Inherited |
| ◌̅ |
|
Overline | Mn: Mark, nonspacing | Inherited |
| ◌̆ |
|
Breve | Mn: Mark, nonspacing | Inherited |
| ◌̇ |
|
Dot | Mn: Mark, nonspacing | Inherited |
| ◌̈ |
|
Diaeresis | Mn: Mark, nonspacing | Inherited |
| ◌̉ |
|
Hook | Mn: Mark, nonspacing | Inherited |
| ◌̊ |
|
Ring | Mn: Mark, nonspacing | Inherited |
| ◌̋ |
|
Double acute | Mn: Mark, nonspacing | Inherited |
| ◌̌ |
|
Caron | Mn: Mark, nonspacing | Inherited |
| ◌̍ |
|
Vertical line | Mn: Mark, nonspacing | Inherited |
| ◌̎ |
|
Double vertical line | Mn: Mark, nonspacing | Inherited |
| ◌̏ |
|
Double grave | Mn: Mark, nonspacing | Inherited |
| ◌̐ |
|
Candrabindu | Mn: Mark, nonspacing | Inherited |
| ◌̑ |
|
Inverted breve | Mn: Mark, nonspacing | Inherited |
| ◌̒ |
|
Turned comma | Mn: Mark, nonspacing | Inherited |
| ◌̓ |
|
Comma | Mn: Mark, nonspacing | Inherited |
| ◌̔ |
|
Reversed comma | Mn: Mark, nonspacing | Inherited |
| ◌̕ |
|
Comma right | Mn: Mark, nonspacing | Inherited |
| ◌̖ |
|
Grave | Mn: Mark, nonspacing | Inherited |
| ◌̗ |
|
Acute | Mn: Mark, nonspacing | Inherited |
| ◌̘ |
|
Left tack | Mn: Mark, nonspacing | Inherited |
| ◌̙ |
|
Right tack | Mn: Mark, nonspacing | Inherited |
| ◌̚ |
|
Left angle | Mn: Mark, nonspacing | Inherited |
| ◌̛ |
|
Horn | Mn: Mark, nonspacing | Inherited |
| ◌̜ |
|
Left half ring | Mn: Mark, nonspacing | Inherited |
| ◌̝ |
|
Up tack | Mn: Mark, nonspacing | Inherited |
| ◌̞ |
|
Down tack | Mn: Mark, nonspacing | Inherited |
| ◌̟ |
|
Plus sign | Mn: Mark, nonspacing | Inherited |
| ◌̠ |
|
Minus sign | Mn: Mark, nonspacing | Inherited |
| ◌̡ |
|
Palatalized hook | Mn: Mark, nonspacing | Inherited |
| ◌̢ |
|
Retroflex hook | Mn: Mark, nonspacing | Inherited |
| ◌̣ |
|
Dot | Mn: Mark, nonspacing | Inherited |
| ◌̤ |
|
Diaeresis | Mn: Mark, nonspacing | Inherited |
| ◌̥ |
|
Ring | Mn: Mark, nonspacing | Inherited |
| ◌̦ |
|
Comma | Mn: Mark, nonspacing | Inherited |
| ◌̧ |
|
Cedilla | Mn: Mark, nonspacing | Inherited |
| ◌̨ |
|
Ogonek | Mn: Mark, nonspacing | Inherited |
| ◌̩ |
|
Vertical line | Mn: Mark, nonspacing | Inherited |
| ◌̪ |
|
Bridge | Mn: Mark, nonspacing | Inherited |
| ◌̫ |
|
Double arch | Mn: Mark, nonspacing | Inherited |
| ◌̬ |
|
Caron | Mn: Mark, nonspacing | Inherited |
| ◌̭ |
|
Circumflex | Mn: Mark, nonspacing | Inherited |
| ◌̮ |
|
Breve | Mn: Mark, nonspacing | Inherited |
| ◌̯ |
|
Inverted breve | Mn: Mark, nonspacing | Inherited |
| ◌̰ |
|
Tilde | Mn: Mark, nonspacing | Inherited |
| ◌̱ |
|
Macron | Mn: Mark, nonspacing | Inherited |
| ◌̲ |
|
Low line | Mn: Mark, nonspacing | Inherited |
| ◌̳ |
|
Double low line | Mn: Mark, nonspacing | Inherited |
| ◌̴ |
|
Tilde overlay | Mn: Mark, nonspacing | Inherited |
| ◌̵ |
|
Short stroke overlay | Mn: Mark, nonspacing | Inherited |
| ◌̶ |
|
Long stroke overlay | Mn: Mark, nonspacing | Inherited |
| ◌̷ |
|
Short solidus overlay | Mn: Mark, nonspacing | Inherited |
| ◌̸ |
|
Long solidus overlay | Mn: Mark, nonspacing | Inherited |
| ◌̹ |
|
Right half ring | Mn: Mark, nonspacing | Inherited |
| ◌̺ |
|
Inverted bridge | Mn: Mark, nonspacing | Inherited |
| ◌̻ |
|
Square | Mn: Mark, nonspacing | Inherited |
| ◌̼ |
|
Seagull | Mn: Mark, nonspacing | Inherited |
| ◌̽ |
|
X | Mn: Mark, nonspacing | Inherited |
| ◌̾ |
|
Vertical tilde | Mn: Mark, nonspacing | Inherited |
| ◌̿ |
|
Double overline | Mn: Mark, nonspacing | Inherited |
| ◌̀ |
|
Grave tone | Mn: Mark, nonspacing | Inherited |
| ◌́ |
|
Acute tone | Mn: Mark, nonspacing | Inherited |
| ◌͆ |
|
Bridge | Mn: Mark, nonspacing | Inherited |
| ◌͇ |
|
Equals sign | Mn: Mark, nonspacing | Inherited |
| ◌͈ |
|
Double vertical line | Mn: Mark, nonspacing | Inherited |
| ◌͉ |
|
Left angle | Mn: Mark, nonspacing | Inherited |
| ◌͊ |
|
Not tilde | Mn: Mark, nonspacing | Inherited |
| ◌͋ |
|
Homothetic | Mn: Mark, nonspacing | Inherited |
| ◌͌ |
|
Almost equal to | Mn: Mark, nonspacing | Inherited |
| ◌͍ |
|
Left right arrow | Mn: Mark, nonspacing | Inherited |
| ◌͎ |
|
Upwards arrow | Mn: Mark, nonspacing | Inherited |
| ◌͐ |
|
Right arrowhead | Mn: Mark, nonspacing | Inherited |
| ◌͑ |
|
Left half ring | Mn: Mark, nonspacing | Inherited |
| ◌͒ |
|
Fermata | Mn: Mark, nonspacing | Inherited |
| ◌͓ |
|
X | Mn: Mark, nonspacing | Inherited |
| ◌͔ |
|
Left arrowhead | Mn: Mark, nonspacing | Inherited |
| ◌͕ |
|
Right arrowhead | Mn: Mark, nonspacing | Inherited |
| ◌͖ |
|
Right arrowhead and up arrowhead | Mn: Mark, nonspacing | Inherited |
| ◌͗ |
|
Right half ring | Mn: Mark, nonspacing | Inherited |
| ◌͘ |
|
Dot right | Mn: Mark, nonspacing | Inherited |
| ◌͙ |
|
Asterisk | Mn: Mark, nonspacing | Inherited |
| ◌͚ |
|
Double ring | Mn: Mark, nonspacing | Inherited |
| ◌͛ |
|
Zigzag | Mn: Mark, nonspacing | Inherited |
| ◌͜◌ |
|
Double breve | Mn: Mark, nonspacing | Inherited |
| ◌͝◌ |
|
Double breve | Mn: Mark, nonspacing | Inherited |
| ◌͞◌ |
|
Double macron | Mn: Mark, nonspacing | Inherited |
| ◌͟◌ |
|
Double macron | Mn: Mark, nonspacing | Inherited |
| ◌͠◌ |
|
Double tilde | Mn: Mark, nonspacing | Inherited |
| ◌͡◌ |
|
Double inverted breve | Mn: Mark, nonspacing | Inherited |
| ◌͢◌ |
|
Double rightwards arrow | Mn: Mark, nonspacing | Inherited |
| ◌ͣ |
|
Latin small letter a | Mn: Mark, nonspacing | Inherited |
| ◌ͤ |
|
Latin small letter e | Mn: Mark, nonspacing | Inherited |
| ◌ͥ |
|
Latin small letter i | Mn: Mark, nonspacing | Inherited |
| ◌ͦ |
|
Latin small letter o | Mn: Mark, nonspacing | Inherited |
| ◌ͧ |
|
Latin small letter u | Mn: Mark, nonspacing | Inherited |
| ◌ͨ |
|
Latin small letter c | Mn: Mark, nonspacing | Inherited |
| ◌ͩ |
|
Latin small letter d | Mn: Mark, nonspacing | Inherited |
| ◌ͪ |
|
Latin small letter h | Mn: Mark, nonspacing | Inherited |
| ◌ͫ |
|
Latin small letter m | Mn: Mark, nonspacing | Inherited |
| ◌ͬ |
|
Latin small letter r | Mn: Mark, nonspacing | Inherited |
| ◌ͭ |
|
Latin small letter t | Mn: Mark, nonspacing | Inherited |
| ◌ͮ |
|
Latin small letter v | Mn: Mark, nonspacing | Inherited |
| ◌ͯ |
|
Latin small letter x | Mn: Mark, nonspacing | Inherited |
| ◌᪰ |
|
Doubled circumflex | Mn: Mark, nonspacing | Inherited |
| ◌᪱ |
|
Diaeresis-ring | Mn: Mark, nonspacing | Inherited |
| ◌᪲ |
|
Infinity | Mn: Mark, nonspacing | Inherited |
| ◌᪳ |
|
Downwards arrow | Mn: Mark, nonspacing | Inherited |
| ◌᪴ |
|
Triple dot | Mn: Mark, nonspacing | Inherited |
| ◌᪵ |
|
X-x | Mn: Mark, nonspacing | Inherited |
| ◌᪶ |
|
Wiggly line | Mn: Mark, nonspacing | Inherited |
| ◌᪷ |
|
Open mark | Mn: Mark, nonspacing | Inherited |
| ◌᪸ |
|
Double open mark | Mn: Mark, nonspacing | Inherited |
| ◌᪹ |
|
Light centralization stroke | Mn: Mark, nonspacing | Inherited |
| ◌᪺ |
|
Strong centralization stroke | Mn: Mark, nonspacing | Inherited |
| ◌᪻ |
|
Parentheses | Mn: Mark, nonspacing | Inherited |
| ◌᪼ |
|
Double parentheses | Mn: Mark, nonspacing | Inherited |
| ◌᪽ |
|
Parentheses | Mn: Mark, nonspacing | Inherited |
| ◌ᪿ |
|
Latin small letter w | Mn: Mark, nonspacing | Inherited |
| ◌ᫀ |
|
Latin small letter turned w | Mn: Mark, nonspacing | Inherited |
| ◌᷀ |
|
Dotted grave | Mn: Mark, nonspacing | Inherited |
| ◌᷁ |
|
Dotted acute | Mn: Mark, nonspacing | Inherited |
| ◌᷂ |
|
Snake | Mn: Mark, nonspacing | Inherited |
| ◌᷃ |
|
Suspension mark | Mn: Mark, nonspacing | Inherited |
| ◌᷄ |
|
Macron-acute | Mn: Mark, nonspacing | Inherited |
| ◌᷅ |
|
Grave-macron | Mn: Mark, nonspacing | Inherited |
| ◌᷆ |
|
Macron-grave | Mn: Mark, nonspacing | Inherited |
| ◌᷇ |
|
Acute-macron | Mn: Mark, nonspacing | Inherited |
| ◌᷈ |
|
Grave-acute-grave | Mn: Mark, nonspacing | Inherited |
| ◌᷉ |
|
Acute-grave-acute | Mn: Mark, nonspacing | Inherited |
| ◌᷊ |
|
Latin small letter r | Mn: Mark, nonspacing | Inherited |
| ◌᷋ |
|
Breve-macron | Mn: Mark, nonspacing | Inherited |
| ◌᷌ |
|
Macron-breve | Mn: Mark, nonspacing | Inherited |
| ◌᷍◌ |
|
Double circumflex | Mn: Mark, nonspacing | Inherited |
| ◌᷎ |
|
Ogonek | Mn: Mark, nonspacing | Inherited |
| ◌᷏ |
|
Zigzag | Mn: Mark, nonspacing | Inherited |
| ◌᷐ |
|
Is | Mn: Mark, nonspacing | Inherited |
| ◌᷑ |
|
Ur | Mn: Mark, nonspacing | Inherited |
| ◌᷒ |
|
Us | Mn: Mark, nonspacing | Inherited |
| ◌ᷓ |
|
Latin small letter flattened open a | Mn: Mark, nonspacing | Inherited |
| ◌ᷔ |
|
Latin small letter ae | Mn: Mark, nonspacing | Inherited |
| ◌ᷕ |
|
Latin small letter ao | Mn: Mark, nonspacing | Inherited |
| ◌ᷖ |
|
Latin small letter av | Mn: Mark, nonspacing | Inherited |
| ◌ᷗ |
|
Latin small letter c cedilla | Mn: Mark, nonspacing | Inherited |
| ◌ᷘ |
|
Latin small letter insular d | Mn: Mark, nonspacing | Inherited |
| ◌ᷙ |
|
Latin small letter eth | Mn: Mark, nonspacing | Inherited |
| ◌ᷚ |
|
Latin small letter g | Mn: Mark, nonspacing | Inherited |
| ◌ᷛ |
|
Latin letter small capital g | Mn: Mark, nonspacing | Inherited |
| ◌ᷜ |
|
Latin small letter k | Mn: Mark, nonspacing | Inherited |
| ◌ᷝ |
|
Latin small letter l | Mn: Mark, nonspacing | Inherited |
| ◌ᷞ |
|
Latin letter small capital l | Mn: Mark, nonspacing | Inherited |
| ◌ᷟ |
|
Latin letter small capital m | Mn: Mark, nonspacing | Inherited |
| ◌ᷠ |
|
Latin small letter n | Mn: Mark, nonspacing | Inherited |
| ◌ᷡ |
|
Latin letter small capital n | Mn: Mark, nonspacing | Inherited |
| ◌ᷢ |
|
Latin letter small capital r | Mn: Mark, nonspacing | Inherited |
| ◌ᷣ |
|
Latin small letter r rotunda | Mn: Mark, nonspacing | Inherited |
| ◌ᷤ |
|
Latin small letter s | Mn: Mark, nonspacing | Inherited |
| ◌ᷥ |
|
Latin small letter long s | Mn: Mark, nonspacing | Inherited |
| ◌ᷦ |
|
Latin small letter z | Mn: Mark, nonspacing | Inherited |
| ◌ᷧ |
|
Latin small letter alpha | Mn: Mark, nonspacing | Inherited |
| ◌ᷨ |
|
Latin small letter b | Mn: Mark, nonspacing | Inherited |
| ◌ᷩ |
|
Latin small letter beta | Mn: Mark, nonspacing | Inherited |
| ◌ᷪ |
|
Latin small letter schwa | Mn: Mark, nonspacing | Inherited |
| ◌ᷫ |
|
Latin small letter f | Mn: Mark, nonspacing | Inherited |
| ◌ᷬ |
|
Latin small letter l with double middle tilde | Mn: Mark, nonspacing | Inherited |
| ◌ᷭ |
|
Latin small letter o with light centralization stroke | Mn: Mark, nonspacing | Inherited |
| ◌ᷮ |
|
Latin small letter p | Mn: Mark, nonspacing | Inherited |
| ◌ᷯ |
|
Latin small letter esh | Mn: Mark, nonspacing | Inherited |
| ◌ᷰ |
|
Latin small letter u with light centralization stroke | Mn: Mark, nonspacing | Inherited |
| ◌ᷱ |
|
Latin small letter w | Mn: Mark, nonspacing | Inherited |
| ◌ᷲ |
|
Latin small letter a with diaeresis | Mn: Mark, nonspacing | Inherited |
| ◌ᷳ |
|
Latin small letter o with diaeresis | Mn: Mark, nonspacing | Inherited |
| ◌ᷴ |
|
Latin small letter u with diaeresis | Mn: Mark, nonspacing | Inherited |
| ◌᷵ |
|
Up tack | Mn: Mark, nonspacing | Inherited |
| ◌᷸ |
|
Dot left | Mn: Mark, nonspacing | Inherited |
| ◌᷹ |
|
Wide inverted bridge | Mn: Mark, nonspacing | Inherited |
| ◌᷻ |
|
Deletion mark | Mn: Mark, nonspacing | Inherited |
| ◌᷼◌ |
|
Double inverted breve | Mn: Mark, nonspacing | Inherited |
| ◌᷽ |
|
Almost equal to | Mn: Mark, nonspacing | Inherited |
| ◌᷾ |
|
Left arrowhead | Mn: Mark, nonspacing | Inherited |
| ◌᷿ |
|
Right arrowhead and down arrowhead | Mn: Mark, nonspacing | Inherited |
| ◌⃐◌ |
|
Left harpoon | Mn: Mark, nonspacing | Inherited |
| ◌⃑◌ |
|
Right harpoon | Mn: Mark, nonspacing | Inherited |
| ◌⃒ |
|
Long vertical line overlay | Mn: Mark, nonspacing | Inherited |
| ◌⃓ |
|
Short vertical line overlay | Mn: Mark, nonspacing | Inherited |
| ◌⃔◌ |
|
Anticlockwise arrow | Mn: Mark, nonspacing | Inherited |
| ◌⃕◌ |
|
Clockwise arrow | Mn: Mark, nonspacing | Inherited |
| ◌⃖◌ |
|
Left arrow | Mn: Mark, nonspacing | Inherited |
| ◌⃗◌ |
|
Right arrow | Mn: Mark, nonspacing | Inherited |
| ◌⃘ |
|
Ring overlay | Mn: Mark, nonspacing | Inherited |
| ◌⃙ |
|
Clockwise ring overlay | Mn: Mark, nonspacing | Inherited |
| ◌⃚ |
|
Anticlockwise ring overlay | Mn: Mark, nonspacing | Inherited |
| ◌⃛◌ |
|
Three dots | Mn: Mark, nonspacing | Inherited |
| ◌⃜◌ |
|
Four dots | Mn: Mark, nonspacing | Inherited |
| ◌⃡◌ |
|
Left right arrow | Mn: Mark, nonspacing | Inherited |
| ◌⃥ |
|
Reverse solidus overlay | Mn: Mark, nonspacing | Inherited |
| ◌⃦ |
|
Double vertical stroke overlay | Mn: Mark, nonspacing | Inherited |
| ◌⃧ |
|
Annuity symbol | Mn: Mark, nonspacing | Inherited |
| ◌⃨ |
|
Triple underdot | Mn: Mark, nonspacing | Inherited |
| ◌⃩◌ |
|
Wide bridge | Mn: Mark, nonspacing | Inherited |
| ◌⃪ |
|
Leftwards arrow overlay | Mn: Mark, nonspacing | Inherited |
| ◌⃫ |
|
Long double solidus overlay | Mn: Mark, nonspacing | Inherited |
| ◌⃬ |
|
Rightwards harpoon with barb downwards | Mn: Mark, nonspacing | Inherited |
| ◌⃭ |
|
Leftwards harpoon with barb downwards | Mn: Mark, nonspacing | Inherited |
| ◌⃮ |
|
Left arrow | Mn: Mark, nonspacing | Inherited |
| ◌⃯ |
|
Right arrow | Mn: Mark, nonspacing | Inherited |
| ◌⃰◌ |
|
Asterisk | Mn: Mark, nonspacing | Inherited |
| ◌︠ |
|
Ligature left half | Mn: Mark, nonspacing | Inherited |
| ◌︡ |
|
Ligature right half | Mn: Mark, nonspacing | Inherited |
| ◌︢ |
|
Double tilde left half | Mn: Mark, nonspacing | Inherited |
| ◌︣ |
|
Double tilde right half | Mn: Mark, nonspacing | Inherited |
| ◌︤ |
|
Macron left half | Mn: Mark, nonspacing | Inherited |
| ◌︥ |
|
Macron right half | Mn: Mark, nonspacing | Inherited |
| ◌︦◌ |
|
Conjoining macron | Mn: Mark, nonspacing | Inherited |
| ◌︧ |
|
Ligature left half | Mn: Mark, nonspacing | Inherited |
| ◌︨ |
|
Ligature right half | Mn: Mark, nonspacing | Inherited |
| ◌︩ |
|
Tilde left half | Mn: Mark, nonspacing | Inherited |
| ◌︪ |
|
Tilde right half | Mn: Mark, nonspacing | Inherited |
| ◌︫ |
|
Macron left half | Mn: Mark, nonspacing | Inherited |
| ◌︬ |
|
Macron right half | Mn: Mark, nonspacing | Inherited |
| ◌︭ |
|
Conjoining macron | Mn: Mark, nonspacing | Inherited |
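The rows above classify each combining mark by its Unicode general category (Mn, nonspacing mark). As a rough illustration, Python's standard `unicodedata` module can report that category together with the character name and canonical combining class. It does not expose the script property (Inherited), so that column is not checked here. The sample code points (U+20D7, U+20DB, U+FE20) and the helper name `describe_mark` are illustrative choices, not part of the table's source.

```python
import unicodedata


def describe_mark(ch):
    """Return a few Unicode properties for a single code point."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        # Fall back to a placeholder if this Python build lacks the name.
        "name": unicodedata.name(ch, "<unnamed>"),
        # Two-letter general category, e.g. "Mn" for nonspacing marks.
        "category": unicodedata.category(ch),
        # Canonical combining class; 0 would mean "not reordered".
        "combining_class": unicodedata.combining(ch),
    }


# Illustrative sample drawn from the ranges listed in the table above.
SAMPLE_MARKS = ["\u20D7", "\u20DB", "\uFE20"]

for mark in SAMPLE_MARKS:
    info = describe_mark(mark)
    print(info["codepoint"], info["category"], info["name"])
```

A mark whose category is Mn takes no horizontal space of its own, which is why the table shows each one attached to a dotted-circle placeholder.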
See also
- Latin-script alphabets
- Alt code
- Category:Letters with diacritics
- Collating sequence
- Combining character
- Compose key
- English terms with diacritical marks
- Heavy metal umlaut
- ISO/IEC 8859 8-bit extended-Latin-alphabet European character encodings
- Latin alphabet
- List of Latin letters
- List of precomposed Latin characters in Unicode
- List of U.S. cities with diacritics
- Romanization
Notes
- ^ The New Yorker is reported as being unique in its continuing usage of them.[1]
References
- ^ Baum, Dan (16 December 2010). "The New Yorker's odd mark — the diaeresis". dscriber. Archived from the original on 16 December 2010.
Among the many mysteries of The New Yorker is that funny little umlaut over words like coöperate and reëlect. The New Yorker seems to be the only publication on the planet that uses it, and I always found it a little pretentious until I did some research. Turns out, it's not an umlaut. It's a diaeresis.
- ^ Sweet, Henry (1877). A Handbook of Phonetics. Cambridge: Cambridge University Press. pp. 174–175.
Even letters with accents and diacritics [...] being only cast for a few founts, act practically as new letters. [...] We may consider the h in sh and th simply as a diacritic written for convenience on a line with the letter it modifies.
- ^ Oxford English Dictionary
- ^ Nestle, Eberhard (1888). Syrische Grammatik mit Litteratur, Chrestomathie und Glossar. Berlin: H. Reuther's Verlagsbuchhandlung. [translated to English as Syriac grammar with bibliography, chrestomathy and glossary, by R. S. Kennedy. London: Williams & Norgate 1889].
- ^ Coakley, J. F. (2002). Robinson's Paradigms and Exercises in Syriac Grammar (5th ed.). Oxford University Press. ISBN 978-0-19-926129-1.
- ^ Michaelis, Ioannis Davidis (1784). Grammatica Syriaca.
- ^ Gramática de la Llingua Asturiana (PDF) (3rd ed.). Academia de la Llingua Asturiana. 2001. section 1.2. ISBN 84-8168-310-8. Archived from the original (PDF) on 2011-05-25. Retrieved 2011-06-07.
- ^ http://www.juls.savba.sk/ediela/psp2000/psp.pdf page 12, section I.2
- ^ Grønlands sprognævn (1992)
- ^ Petersen (1990)
- ^ S.P. Brock, "An Introduction to Syriac Studies", in J.H. Eaton (Ed.,), Horizons in Semitic Studies (1980)
- ^ Norris, Mary (26 April 2012). "The Curse of the Diaeresis". The New Yorker. Retrieved 18 April 2014.
- ^ van Geloven, Sander (2012). Diakritische tekens in het Nederlands (in Dutch). Utrecht: Hellebaard. Archived from the original on 2013-10-29.
- ^ Steele, Shawn (2010-01-25). "Most combining characters in a Unicode glyph/character/whatever". Microsoft. Archived from the original on 2019-05-16. Retrieved 2019-11-25.
External links
- Context of Diacritics | A research project Archived 2014-10-12 at the Wayback Machine
- Diacritics Project
- Unicode
- Orthographic diacritics and multilingual computing, by J. C. Wells
- Notes on the use of the diacritics, by Markus Lång
- Entering International Characters (in Linux, KDE)
- Standard Character Set for Macintosh PDF at Adobe
Diacritic
Basic Concepts
Definition and Purpose
A diacritic is a mark, such as an accent, placed over, under, or through a letter in some languages to indicate that the letter should be pronounced in a different way from usual.[6] These supplementary glyphs are added to a base letter or symbol to modify its phonetic value, distinguish it from similar forms, or signal other linguistic features.[1]
The primary purposes of diacritics in writing systems include distinguishing phonemes or homographs, indicating tones, marking stress or vowel length, and denoting phonetic features such as nasalization.[7] For example, in Spanish, the acute accent (´) differentiates homographs like papá (father) from papa (potato), altering meaning through stress placement.[8] In tonal languages like Vietnamese, diacritics such as the acute (´) and grave (`) accents represent distinct tones that can change word meanings entirely; five of the language's six tones are marked on vowels, and the sixth is left unmarked.[9] The macron (¯) commonly signals vowel length, as seen in Latin where māter (mother) contrasts with shorter-voweled forms to preserve phonological distinctions.[10] Similarly, the tilde (~) indicates nasalization in Portuguese, as in mão (hand), where the vowel is pronounced with nasal airflow.[11]
Diacritics differ from punctuation marks, which are standalone symbols used to structure sentences, indicate pauses, or denote questions and exclamations, whereas diacritics integrate directly with letters to modify their properties.[12] They also contrast with ligatures, which join two or more letters into a single unified glyph for historical, aesthetic, or space-saving reasons, such as æ representing a diphthong, without altering pronunciation independently.[13]
Historical Development
The earliest known use of diacritics in writing systems emerged in ancient Greek during the Hellenistic period, where they served to mark prosody and pronunciation. In the 2nd century BCE, the scholar Aristophanes of Byzantium, working at the Library of Alexandria, invented the system of accent marks (acute, circumflex, and grave) along with breathing marks (rough and smooth) to indicate pitch accent and aspiration in Greek texts, addressing ambiguities in the unaccented script.[14] This innovation, refined by his successor Dionysius Thrax, enhanced the existing alphabetic script with prosodic and phonetic notations, influencing subsequent Greek scholarship.[15]
In Semitic traditions, diacritics appeared independently to denote vowels in consonantal scripts. The Hebrew niqqud system, consisting of points and dashes placed above, below, or within letters to represent vowel sounds, was developed alongside a related set of cantillation marks by the Tiberian Masoretes between the 6th and 10th centuries CE as part of efforts to standardize and preserve the pronunciation of the Hebrew Bible amid linguistic shifts.[16] This system drew on earlier proto-vocalization techniques but achieved widespread adoption through Masoretic manuscripts like the Aleppo Codex.
During the medieval period, these Greek and Hebrew models influenced the Latin script through scholarly exchanges. The Carolingian reforms of the 8th and 9th centuries, led by figures like Alcuin of York under Charlemagne, promoted a clearer minuscule script that facilitated the later integration of diacritics such as the acute accent for stress in Latin pedagogical texts, while Byzantine Greek breathing marks inspired similar notations in ecclesiastical Latin manuscripts to aid pronunciation in polyglot monastic communities.[17]
The Renaissance and the advent of printing accelerated the consistency and dissemination of diacritics across European scripts. Theodorus Gaza, a Byzantine émigré scholar (c. 1398–1475), played a pivotal role by authoring a widely used Greek grammar that incorporated full diacritic notation; its first printed edition in 1495 by Aldus Manutius in Venice established standardized typographic forms for accents and breathings, influencing subsequent Greek printing and Latin adaptations.[18] The printing press itself, introduced in the 15th century, enforced uniformity by casting diacritics as integral to typefaces, reducing scribal variations and promoting their routine use in vernacular languages like French and Italian, where accents clarified etymological and phonetic distinctions.[19]
In the 19th and 20th centuries, colonial linguistics expanded diacritics to non-European languages through Romanization efforts. European missionaries and administrators, such as those in French Indochina, standardized diacritics in scripts for Vietnamese and African languages to facilitate Bible translation and administration, often adapting Latin-based systems to capture tonal and phonetic features absent in European orthographies.[20] Post-World War II decolonization prompted orthographic reforms in newly independent nations, where diacritics were sometimes simplified or rejected to assert cultural autonomy; for instance, in Southern African languages, harmonization initiatives reduced reliance on colonial-era diacritics to promote pan-ethnic readability and national identity.[21]
Types and Classification
Typological Categories
Diacritics are typologically categorized by their position relative to the base letter, including supra-linear marks placed above the letter, sub-linear marks positioned below, inline marks integrated within or through the letter, and combining marks that overlap or attach in varied ways.[22] Supra-linear diacritics, such as the circumflex (^) in â or the acute (´) in é, modify letters from above to indicate features like tone or length. Sub-linear diacritics include the cedilla (¸) in ç, which alters consonant pronunciation, and the ogonek (˛) in ą, extending below and to the right for nasalization. Inline diacritics, like the stroke overlay (̷), intersect the base letter's structure, while combining marks such as the tilde (~) in ã can appear above, below, or centered depending on rendering rules.[22]
Further classification occurs by attachment method, distinguishing free-standing diacritics from ligated ones, and spaced from non-spaced forms. Free-standing diacritics, treated as spacing modifier letters, occupy their own space and function independently, such as the prime (ʹ) for primary stress or the apostrophe-like glottal stop (ʼ) in some orthographies.[23] In contrast, ligated diacritics fuse with the base to form a single glyph, as in precomposed characters like ç (c with cedilla) or ł (l with stroke). Spaced diacritics maintain separation for clarity in certain scripts, whereas non-spaced combining marks attach directly without advancing the cursor, enabling stacked or overlapping applications such as a vowel-quality mark plus a tone mark in Vietnamese (e.g., ế).[23][22]
Structural variations among diacritics include those tailored to vowels, consonants, or multiglyph combinations. Vowel diacritics, such as the diaeresis (¨) in ë, separate vowel sounds or indicate distinct pronunciation, often using paired dots or lines above. Consonant modifiers, like the stroke (/) through đ or the caron (ˇ) in č, adjust articulation without altering the letter's core shape. Multiglyph combinations allow layering, as in the double acute (˝) over ő, which merges two acute marks into one for efficiency in orthographies like Hungarian.[22]
The forms of diacritics have evolved from simple dots or lines to more complex shapes through mechanisms like iconicity, derivation from letters, and adaptation for disambiguation. Basic dots, originally deletion marks in Old Irish (punctum delens ˙) or vowel indicators in Arabic, developed into paired or elongated variants for clarity, such as the diaeresis from two separate dots. Lines, like the Latin apex for length, were stylized into curves or hooks; the cedilla (¸), for instance, derives from a diminutive Visigothic z (zeda), reinterpreted under c to denote softness in Romance languages. Complex shapes like the double acute (˝) emerged from combining simpler acute accents to represent the long counterparts of the umlauted vowels ö and ü in Hungarian orthography.[24][25] These adaptations reflect scribal innovations and borrowing across scripts, prioritizing legibility and phonetic precision.[24]
Functional Roles
Diacritics fulfill diverse linguistic functions across writing systems, primarily by encoding phonetic modifications, grammatical information, semantic contrasts, and certain non-phonemic attributes. These roles enhance the precision of orthographies, allowing scripts to represent complex sound systems or structural nuances without expanding the base alphabet excessively.
Phonetic Functions
Diacritics most commonly serve phonetic purposes, altering the articulation or prosody of letters to reflect specific sounds. For vowel quality, the umlaut (¨) in German denotes fronting, as in ö, where the back rounded vowel /o/ shifts to its front rounded counterpart /ø/ due to historical assimilation before a following high front vowel or semivowel.[26] Similarly, for consonant modification, the háček (ˇ), known as the caron in typography, indicates palatalization in Slavic languages like Czech, transforming z into ž pronounced as /ʒ/, a postalveolar fricative, by raising the tongue toward the hard palate. In tonal languages, diacritics mark pitch contours essential for phonemic distinctions; for instance, the sắc accent (´) in Vietnamese signals a high rising tone on vowels, as in ma meaning "ghost" versus má meaning "mother."[9]
Grammatical Roles
Beyond phonetics, diacritics can signal grammatical categories, such as inflectional changes or morphological patterns. In Irish, the acute accent, or fada, functions as an ablaut indicator by marking vowel length in verb paradigms and nominal forms, where long vowels alternate with short ones to denote tense, mood, or case; the long vowel in tá ("is", present), for example, contrasts with short vowels in other paradigm elements.[27] This role underscores how diacritics integrate historical sound changes into contemporary morphology, preserving paradigmatic oppositions without additional letters.
Semantic Distinctions
Diacritics often differentiate homographs or near-homophones to clarify lexical meaning, preventing ambiguity in communication. In Polish, the stroke diacritic on ł creates a semantic contrast with plain l, as łuk (bow, /wuk/) refers to an archer's weapon while luk (hatch, /luk/) denotes a ship's opening; this distinction arose from the velarization of /l/ to /w/ in Polish phonology, with the diacritic preserving the opposition.[28]
Non-Phonemic Uses
Occasionally, diacritics convey suprasegmental or paralinguistic features without altering core phonemic inventory. For breathiness, an underdot or similar mark appears in some Indic transliteration schemes to denote murmured (breathy-voiced) consonants, as in representations of Hindi aspirates like /bʱ/ transcribed with an underdot for the glottal friction accompanying voicing.[29] Such uses extend to emphasis, where diacritics like the Arabic shadda (ّ) indicate consonant gemination for prosodic stress, reinforcing intensity without shifting segmental pronunciation.
Diacritics in Alphabetic Scripts
Latin Script Applications
In Latin-based writing systems, several diacritics are commonly employed to indicate phonetic distinctions, stress, or tonal variations. The acute accent (á) often marks stress or a closed vowel sound, as seen in Spanish words like café, where it marks the stressed final syllable. The grave accent (à) typically signals an open vowel or differentiates homographs, such as in French à versus a. The circumflex (â) denotes historical vowel length or contraction, prominently in French (e.g., forêt) and Portuguese (e.g., você). The diaeresis (ë) separates vowels to prevent diphthongization, appearing in French (naïf) and Catalan (raïm). The cedilla (ç) softens a "c" before "a," "o," or "u" to produce an /s/ sound, as in French garçon and Portuguese cação.[30][31][32]
Regional patterns of diacritic usage vary across Latin-script languages. In Western Europe, Romance languages heavily feature these marks: French employs the circumflex (ê) for vowel quality in words like fenêtre, while Spanish and Portuguese use the acute for stress (e.g., Spanish mamá, Portuguese café) and the tilde in Spanish ñ and Portuguese ã (e.g., amanhã). Eastern European Slavic languages integrate distinct diacritics, such as the ogonek in Polish (ą, ę) to indicate nasalization (e.g., ręka for "hand") and the caron (háček) in Czech for palatalization (e.g., člověk). Overseas adaptations extend this further; Vietnamese, using a Latin script influenced by French colonialism, marks its six tones with diacritics, including acute (sá), grave (sà), and hook above (sả), to convey pitch contours essential for meaning (e.g., ma "ghost," má "mother," and mạ "rice seedling").[30][33][32][34]
Diacritics frequently combine with base letters to form entirely new letters in the alphabet, treated as distinct graphemes in sorting and pronunciation. In Spanish, the tilde over n creates ñ (e.g., niño), representing a palatal nasal sound /ɲ/ and functioning as the 15th letter of the alphabet. Similarly, Danish and Norwegian use a stroke through o to produce ø (e.g., Danish øl for "beer"), denoting a mid front rounded vowel /ø/, and position it as the 28th letter, near the end of the alphabet. These formations allow Latin script to adapt efficiently to diverse phonologies without adding entirely new symbols.[35][36]
Many modern European languages employing the Latin script incorporate diacritics, reflecting widespread adaptation for phonetic precision; English stands out as a major exception without them in standard usage. This prevalence underscores the script's flexibility across more than 200 languages globally.[37][35]
Non-Latin Alphabetic Scripts
In alphabetic scripts outside the Latin tradition, diacritics often serve to indicate phonetic nuances such as aspiration, vowel quality, or prosody, integrated directly into the script's design to accommodate the language's phonological needs. These marks differ from Latin extensions by being intrinsic to the script's historical development, frequently involving vertical stacking or right-to-left orientation, which adds complexity to rendering and reading compared to the predominantly horizontal, left-to-right Latin modifications.
The Greek alphabet employs a polytonic system of diacritics in its classical and modern forms to denote pitch accents, breathings, and contractions. Rough breathing (῾), shaped like a reversed apostrophe above the vowel, indicates an initial /h/ sound, as in ἁ (ha), while smooth breathing (᾿), an apostrophe-like mark, signals its absence.[38] Iota subscript (ᾳ), a small iota written below the long vowels alpha, eta, and omega, represents a historical diphthong without altering pronunciation in modern Greek.[39] Modern polytonic Greek also uses acute (Ά), grave, and circumflex accents for stress, though monotonic Greek simplifies these to a single tonos mark. These diacritics originated in ancient notations for poetic meter and are combined above or below letters, requiring precise Unicode positioning.[40]
Other non-Latin alphabets incorporate diacritics sparingly for specific contrasts. Georgian's mkhedruli script, written left to right, generally avoids diacritics in favor of distinct letter forms.[41] These examples highlight how non-Latin alphabets adapt diacritics to script direction and phonological demands, often prioritizing compactness over extensibility.[42]
Cyrillic Script Features
The Cyrillic script, originally developed in the 9th century in the First Bulgarian Empire from Greek uncial forms with some Latin letter influences, initially featured no diacritics and relied on a uniform ustav style for phonetic representation across early Slavic texts.[43] Over time, as the script adapted to diverse languages, diacritics and letter modifications emerged to denote palatalization, vowel qualities, and stress, particularly in Slavic orthographies, while non-Slavic adaptations in Soviet-era extensions introduced further variations for indigenous sounds.[43] These features reflect influences from Greek suprascriptal marks and Latin diacritics, enabling distinctions in phonology without altering the core alphabet's structure.
A key unique form in Cyrillic is the soft sign (ь), which functions as a diacritic to palatalize the preceding consonant, softening its articulation in languages like Russian, where it appears in words such as мать (mat', "mother") to indicate [matʲ].[44] The hard sign (ъ) similarly separates hard consonants from following soft vowels but is rarer, used post-consonantally in Russian for clarity, as in объект (obyekt, "object").[44] In Bulgarian, ъ and ь derive from historical yer vowels, with ъ representing a central vowel [ɤ] akin to schwa in unstressed positions (e.g., мъж, "man") and ь serving dual roles as soft sign and reduced vowel [ĭ].[45]
Language-specific distributions vary widely: Russian employs minimal diacritics, relying primarily on ь and ъ for palatalization and separation without additional accents in standard orthography.[44] Ukrainian incorporates ї, a letter with a diaeresis over і that denotes the sequence /ji/ (e.g., їжак, "hedgehog"), distinguishing it from plain і (/i/). In South Slavic languages like Macedonian, combining acute accents (e.g., Ќ ќ for /c/) appear in extended forms or for stress in pedagogical texts, though standard writing avoids them.[46] Belarusian orthography uses the soft sign ь alongside letters such as ў (u with breve) and ё, with no routine use of grave or acute accents. Historical Romanian Cyrillic, used until the 1860s, featured the breve on а (ӑ) to mark the mid-central vowel /ə/ (e.g., ѧмă, "to have"), alongside short forms like й with breve for /j/.[47]
The 1918 Soviet orthography reform simplified Russian spelling, abolishing the word-final ъ and dropping obsolete letters such as ѣ and ѳ, while letters formed with diacritics, such as й and ё, remained in use.[44] This simplification aimed to streamline printing and literacy, and it influenced standardized forms in Russian and other Soviet languages.[44] Post-Soviet revivals in minority languages have reintroduced or retained extended diacritics; for instance, Chuvash Cyrillic uses the double acute on у (ӳ) to represent /y/, preserving Turkic phonetics in Russia's Volga region. Such adaptations support over 50 non-Slavic languages, emphasizing phonetic accuracy in indigenous contexts.[43]
Diacritics in Non-Alphabetic Systems
Logographic and Syllabic Scripts
In logographic and syllabic scripts, diacritics often serve to modify the phonetic or semantic interpretation of base symbols, addressing ambiguities inherent in systems that prioritize morphemes or syllables over linear alphabetic sequences. Unlike alphabetic scripts, where diacritics typically adjust individual letter sounds within a sequential framework, these markers in non-alphabetic systems compensate for the absence of strict phonemic ordering by providing suprasegmental cues, such as tone or voicing, directly attached to holistic units like characters or glyphs. This functional adaptation enhances distinguishability in dense, non-linear representations, where context alone may not suffice for disambiguation.[48]
In the context of Chinese, a prototypical logographic script, diacritics appear prominently in Hanyu Pinyin, the official romanization system used alongside hanzi characters to indicate Mandarin tones. Pinyin employs four primary diacritics, namely the macron (¯) for the high level tone, acute (´) for the rising tone, caron (ˇ) for the dipping tone, and grave (`) for the falling tone, placed over vowels to differentiate otherwise identical syllables, such as mā (mother), má (hemp), mǎ (horse), and mà (scold). These marks are essential because Mandarin's tonal system creates homophones in the logographic writing, where characters convey meaning without inherent phonetic redundancy; without them, pinyin would fail to capture the four tones of standard Putonghua, leading to potential misinterpretation in romanized texts or learning materials.[49]
Japanese syllabaries, hiragana and katakana, incorporate diacritics to alter consonant voicing within moraic units, adapting the script's syllabic nature to phonetic nuances. The dakuten (゙), two small dots, is applied to unvoiced consonants like k, s, t, and h to produce voiced counterparts g, z, d, and b, transforming, for instance, か (ka) into が (ga). Similarly, the handakuten (゜), a small circle, changes the h-sound to the plosive p, as in は (ha) becoming ぱ (pa). These diacritics, derived from historical adaptations of the kana system, enable the representation of voiced and semi-voiced (p-row) sounds without expanding the core syllabary inventory, thus maintaining the script's compact, mora-based structure while accommodating loanwords and native variations. In educational contexts, such markers facilitate second-language acquisition by visually signaling phonological shifts in otherwise uniform symbols.[50]
Korean Hangul, a featural syllabary where letters assemble into blocks representing syllables, features the arae-a (ᆞ) as an under-dot modifier, historically positioned beneath consonants to denote a low central vowel sound /ʌ/ or /a/. This diacritic-like element, meaning "lower a," appears in archaic forms and the Jeju dialect, modifying base vowels or consonants within blocks, such as in obsolete spellings where it alters ㅏ (a) to a deeper variant. Though Hangul's design emphasizes phonetic assembly over additive marks, the arae-a functions as a positional modifier to extend vowel harmony and dialectal distinctions, compensating for the script's limited basic jamo set in representing regional phonology. Its retention in modern Unicode underscores its role in preserving historical and minority usages.[51][52]
In other logographic traditions, such as Classic Maya glyphs, superfixes, which are affixes positioned above the main sign, act as diacritical modifiers to indicate grammatical aspects like passivity or location, often read last despite their visual prominence. For example, a pair of dots as a superfix cues passive voice in verbal expressions, distinguishing active from stative forms in the logographic-syllabic hybrid system. Egyptian hieroglyphs use determinatives, non-phonetic ideograms appended to words to clarify semantic categories without affecting pronunciation, functioning as classifiers rather than diacritics; a god figure determinative, for instance, specifies divine references in ambiguous phonetic strings, aiding interpretation in the script's logographic core. These mechanisms highlight how diacritics in such systems bridge semantic gaps arising from non-alphabetic organization, prioritizing contextual modification over sequential phonetics.[53][54][55]
Abjad and Abugida Scripts
In abjad scripts, which primarily represent consonants while vowels are often implied or optional, diacritics serve to distinguish similar letter forms and indicate pronunciation nuances. The Arabic script employs i'jam, a system of dots added to consonant shapes to differentiate letters such as ب (bāʾ) from ت (tāʾ), essential for clarity in the unvocalized form of the text. In Persian, a subset of Arabic diacritics is used, including zabar (fatha-like mark for /æ/) and zēr (kasra-like for /e/), while Urdu extends this with zabar (above the letter for short /ə/, as in زَبَر zəbər), zair (below for /ɪ/, as in دِل dil), and paish (above for /ʊ/, as in گُل gul), though these vowel marks are frequently omitted in everyday writing for brevity.[56]
The Syriac abjad, derived from earlier Aramaic traditions, incorporates diacritics in its Estrangela form for spirantization and vocalization; for instance, the rwḥā (ܿ) and rukkāḵā (݂) marks indicate spirantized consonants like fricative forms of stops post-vowel, while separate points denote vowels such as "o".[57] Phoenician, as a foundational abjad precursor from the 11th century BCE, lacked such diacritics, relying solely on 22 consonant letters, which influenced later Semitic scripts to develop pointing systems for vowel supplementation and distinction.[58]
Abugida scripts, where consonants carry an inherent vowel modifiable by diacritics, integrate marks more systematically to specify vowel qualities. In the Ethiopic abugida, diacritical order marks (e.g., dots or lines) modify the inherent /ä/ vowel of consonants to other vowels, such as በ (bä) to ቤ (be). In Devanagari, used for Hindi and Sanskrit, matras function as dependent vowel signs attached to consonants; for example, the base क (ka with inherent /ə/) combines with ा (ā matra) to form का (kā, pronounced /kaː/), altering or suppressing the default vowel.[59] Tamil script, an abugida variant, incorporates Grantha letters and diacritics to accommodate Sanskrit loanwords, adding marks for aspirated or retroflex sounds absent in native Tamil, such as puḷḷi (a dot above consonants) as virama to suppress the inherent vowel and form clusters; short vowels like e and o use dedicated forms.[60] These systems highlight diacritics' role in phonetic precision within consonant-vowel matrices.
The complexity of diacritics in abjads and abugidas arises from stacking and directional challenges; in right-to-left (RTL) abjads like Arabic-derived scripts, marks must align without disrupting cursive flow, while abugidas like Khmer employ coeng (្) for subjoined consonants, stacking up to three levels with below-base diacritics (e.g., vowel signs under clusters), necessitating precise rendering to avoid overlap.[61][62]
Modern adaptations often simplify usage, particularly in cursive styles like Nastaliq for Urdu and Persian, where vowel diacritics are routinely reduced or omitted in print and digital media to enhance readability and aesthetic flow, relying on context for interpretation.[63][64]
Language-Specific Usage
Languages Forming New Letters
In languages where diacritics form new letters, these modified characters are recognized as independent elements of the alphabet, occupying unique positions in lexicographic ordering and often having dedicated input methods on keyboards. This approach contrasts with mere phonetic modifications, as the resulting letters represent distinct phonemes or orthographic units essential to the language's identity. Such formations enhance precision in spelling and pronunciation while expanding the base alphabet to accommodate linguistic needs.[65]
In Romance languages using the Latin script, French requires é (acute accent on e) and ç (cedilla on c) for correct orthography, although dictionaries collate them with their base letters and use the accent only as a tie-breaker, so that é follows e but precedes f. Similarly, Portuguese requires ã (tilde on a) for nasal vowels, with its absence rendering words incorrect, as in "maçã" (apple). In contrast, older English usage of diacritics like é was optional and non-alphabetic, as seen in loanwords, and modern English rarely forms new letters this way.
Germanic and Finno-Ugric languages also integrate diacritics as new letters; German's umlauts ä, ö, ü represent separate vowels, pronounced /ɛ/, /ø/, /y/ respectively, and are essential for words like "Mädchen" (girl), although German dictionaries generally collate them with their base vowels. Finnish treats ä and ö as independent letters placed at the end of the alphabet, after z and å, representing front vowels and required for accurate spelling in a highly phonetic system.
Among Celtic languages, Welsh uses ŵ (circumflex on w) for the long vowel /uː/, as in "dŵr" (water), although it is collated with plain w.
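The idea that a modified letter can occupy its own slot in lexicographic ordering can be sketched in Python. The example is illustrative only: it assumes the Spanish convention of placing ñ between n and o, and the word list and the helper names `alphabet_key` and `base_letter_key` are hypothetical, introduced here for the demonstration. It contrasts three orderings: raw code-point order, order under a language-specific alphabet, and order when marks are stripped so that letters collate with their base form.

```python
import unicodedata

# Assumed Spanish alphabet order, with "\u00f1" (ñ) between n and o.
SPANISH_ALPHABET = "abcdefghijklmn\u00f1opqrstuvwxyz"
RANK = {letter: i for i, letter in enumerate(SPANISH_ALPHABET)}


def alphabet_key(word):
    """Sort key that treats ñ as its own letter."""
    composed = unicodedata.normalize("NFC", word.lower())
    # Characters outside the alphabet sort after every known letter.
    return [RANK.get(ch, len(RANK)) for ch in composed]


def base_letter_key(word):
    """Sort key that strips combining marks, so ñ collates as n."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["\u00f1u", "nube", "oso", "nota"]  # ñu, nube, oso, nota

by_code_point = sorted(words)                       # ñ lands after every ASCII letter
by_alphabet = sorted(words, key=alphabet_key)       # ñ sits between n and o
by_base_letter = sorted(words, key=base_letter_key) # ñ is treated like n

print(by_code_point)
print(by_alphabet)
print(by_base_letter)
```

Production software generally relies on a collation library that implements the Unicode Collation Algorithm with locale tailorings rather than hand-built keys like these.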
Slavic languages written in the Latin script exemplify this approach. Czech uses carons to create letters like č (/tʃ/) and ě, which are counted among the 42 letters of the alphabet; letters such as č, ř, š, and ž are collated as letters in their own right, and the diacritics change pronunciation fundamentally.[66] Polish features ł (stroke on l) as a distinct consonant for /w/, separate from l (/l/) and positioned after l in dictionaries. In Turkic languages, Turkish incorporates ü (umlaut on u) as a front rounded vowel letter, vital for phonemic distinctions in words like "gül" (rose). Baltic languages like Lithuanian include letters such as ą (ogonek on a, historically a nasal vowel) and ė (dot on e) as independent letters, expanding the alphabet to 32 letters and mandating their use for grammatical accuracy.
Other alphabets show a range of treatments. Modern Greek marks stress with the tonos (acute accent), as in ά; the accented vowels are obligatory in monotonic orthography but are not counted as separate letters, and the single tonos replaces the several accents and breathings of the older polytonic system. Vietnamese makes extensive use of diacritics in a Latin-based system: the 29-letter quốc ngữ alphabet includes letters such as ă (breve on a) and ơ (horn on o), and tone marks are then added on top of these, as in ắ, so that vowel quality and tone are both written explicitly.
This orthographic status varies by language: mandatory diacritics like Portuguese ã ensure phonemic representation, while optional ones in historical contexts, such as early modern English, did not achieve letter status. New letters have also influenced literacy and script reforms; for instance, Azerbaijan's 1991 switch from Cyrillic to Latin introduced diacritic letters like ç and ş, alongside ə, to better represent Turkic sounds, aligning with international scripts and simplifying instruction during the post-Soviet transition.
Diacritics as Modifiers Only
In English, diacritics serve primarily as optional modifiers that signal pronunciation or etymological origin in loanwords, rather than forming distinct letters in the native lexicon. The diaeresis (¨), for instance, appears over the second vowel in naïve to show that the two vowels belong to separate syllables rather than forming a single vowel sound. Similarly, coöperate historically used the diaeresis to separate the vowels, though modern usage generally drops it or substitutes a hyphen. The cedilla (¸) under the "c" in façade marks it as a soft /s/ sound, retaining the French spelling without adding a new alphabetic character. These instances are rare in native English vocabulary and largely confined to stylistic or pedantic writing, with the diaeresis notably declining since the early 20th century.[67][68][69]
Historically, English employed modifier-like marks in abbreviations for currency, such as the slashed "s" (s/) or tilde-over-s to denote shillings in medieval and early modern manuscripts, functioning as a shorthand diacritic to economize space without altering letter identity. In other languages, similar non-letter-forming roles appear; for example, Danish and Norwegian orthographies allow the digraphs ae and oe as substitutes for the letters æ and ø in informal or pre-digital contexts, treating the marks as optional modifiers rather than essential letter components. The Swedish å, which evolved from a ring diacritic over "a" (originally denoting a prolonged /a:/ sound written with the digraph aa), is generally a full letter but can function as a modifier in certain dialectal transcriptions or historical texts, where the ring indicates vowel lengthening without independent status.
In Pinyin romanization for Mandarin Chinese, the umlaut over "u" (ü) represents the /y/ vowel, distinct from plain "u"; the dots are omitted in combinations like ju, where no contrast with /u/ is possible, so the sound is preserved without a redundant mark.[70][71][72]
Beyond alphabetic modification, marks with a modifier-like role can also attach to punctuation for emphasis or intonation, as in Spanish, where the inverted question mark (¿) and exclamation mark (¡) precede interrogative or exclamatory clauses to signal prosody from the outset without attaching to letters. Debates arise over the status of such marks, particularly the Hawaiian ʻokina (ʻ), a reversed apostrophe representing the glottal stop; while officially a consonant letter in modern Hawaiian orthography, it originated as a phonetic modifier and is sometimes treated as punctuation in non-standard or historical transcriptions, blurring the line between temporary alteration and lexical integration. Overall, English stands out as an outlier among Indo-European languages for its minimal reliance on diacritics as modifiers, largely due to its historical avoidance of systematic accentuation in favor of contextual inference.[73][74][75][68]
Technical and Practical Aspects
Sorting and Collation Rules
In sorting and collation, diacritics influence the order of characters in alphabetical sequences, with rules varying by language and standard to reflect linguistic conventions in dictionaries, search engines, and databases.[76] The Unicode Collation Algorithm (UCA), as implemented in the Common Locale Data Repository (CLDR), defines multiple collation levels to handle these differences systematically. At the primary level, diacritics are ignored and accented characters are treated as equivalent to their base forms; for instance, "é" sorts as "e", so "été" falls among the words beginning with "e" and before "fed".[76] The secondary level accounts for diacritics and other non-base modifications, such as distinguishing "é" from "è", while still ignoring case. The tertiary level provides full distinction, including case sensitivity, placing "É" after "e" but before "f" and ensuring precise ordering for applications requiring exact matches.[76][77] Language-specific variations adapt these principles to cultural norms, often tailoring the UCA through CLDR data.
In French collation, accents may be evaluated using backward secondary sorting (applied in CLDR notably for Canadian French), in which accent differences are compared from the end of the word rather than the beginning; for example, "côte" precedes "coté" because, reading from the right, the final "e" of "côte" is unaccented while that of "coté" carries an acute accent.[78] In German dictionary ordering, umlauted vowels are filed with their base vowels, so words in "ä" fall among the "a" words and before "b", as seen in entries like "Apfel" before "Ärztin" before "Bahn".[79] Swedish places the ring-above "å" at the end of the alphabet after "z", treating it as a separate letter, so words like "zorro" precede "åka".[76] In Spanish phonebooks and standard sorting, "ñ" is a unique letter positioned after "n" but before "o", so that "ñandú" sorts after "nube" but before "oasis".[80]
Sorting with multiple diacritics presents challenges, particularly in languages like Vietnamese, where vowels combine base letters, vowel modifiers (e.g., breve or horn), and tone marks (e.g., acute or grave). Collation must decompose and weigh the combining characters correctly to avoid misordering; for instance, "ả" (a with hook above) must sort appropriately relative to "ã" (a with tilde) under CLDR tailorings.[81] Legacy systems exacerbate these issues through backward compatibility, often relying on byte-based ASCII sorting that ignores or mispositions diacritics, for example placing "é" (U+00E9) after "z" (U+007A) simply because its code point is higher, which leads to inconsistent results in older databases or search tools without Unicode support.[82]
International standards like ISO/IEC 14651:2025 provide a framework for multilingual collation, aligning closely with the UCA to enable consistent string ordering across scripts and languages by specifying a reference method for comparing character weights.[83] This standard supports tailorings for diacritic handling, ensuring interoperability in global applications, such as phonebooks where "ñ" maintains its post-"n" position in Spanish contexts.[83]

| Language | Diacritic Example | Sorting Position | Source |
|---|---|---|---|
| French | é (acute) | After base "e", backward secondary | ICU Documentation |
| German | ä (umlaut) | After "a", before "b" | Microsoft Globalization |
| Swedish | å (ring) | After "z" | Unicode TR10 |
| Spanish | ñ (tilde) | After "n", before "o" | MySQL Reference |
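
The level-by-level comparison and the tailorings summarized in the table can be illustrated with a short Python sketch. It is not an implementation of the Unicode Collation Algorithm: the function `collation_key`, the `SWEDISH` table, and the use of raw code-point values as secondary weights are all assumptions made for illustration, and production code would normally rely on a CLDR-based library instead.

```python
import unicodedata

def collation_key(word, tailoring=None, backward_secondary=False):
    """Build a simplified (primary, secondary, tertiary) sort key.

    primary   -> base letters only (diacritics and case ignored)
    secondary -> combining marks attached to each base letter
    tertiary  -> True where the original character was not lowercase
    """
    tailoring = tailoring or {}
    primary, secondary, tertiary = [], [], []
    for ch in unicodedata.normalize("NFC", word):
        if unicodedata.combining(ch) and primary:
            # A mark with no precomposed form: attach it to the previous base.
            secondary[-1] += (ord(ch),)
            continue
        lower = ch.lower()
        if lower in tailoring:
            # Tailored letters get their own primary weight and no marks.
            base, marks = tailoring[lower], ()
        else:
            decomposed = unicodedata.normalize("NFD", lower)
            base = decomposed[0]
            marks = tuple(ord(m) for m in decomposed[1:])
        primary.append(base)
        secondary.append(marks)
        tertiary.append(ch != lower)
    if backward_secondary:
        # French-style rule: compare accents starting from the end of the word.
        secondary.reverse()
    return primary, secondary, tertiary

# Hypothetical tailoring: "z1" compares greater than "z", so these letters
# sort after "z" as separate letters (å, ä, ö).
SWEDISH = {"\u00e5": "z1", "\u00e4": "z2", "\u00f6": "z3"}

french = ["coté", "côte", "cote", "côté"]
print(sorted(french, key=collation_key))
# ['cote', 'coté', 'côte', 'côté']   (forward secondary)
print(sorted(french, key=lambda w: collation_key(w, backward_secondary=True)))
# ['cote', 'côte', 'coté', 'côté']   (backward secondary)
print(sorted(["åka", "zorro", "apa"], key=lambda w: collation_key(w, tailoring=SWEDISH)))
# ['apa', 'zorro', 'åka']
```

Because Python compares tuples element by element, the secondary and tertiary components are consulted only when the primary components are equal, which mirrors the way the collation levels are meant to act as successive tie-breakers.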
Digital Generation and Input
The 7-bit ASCII standard, developed in the 1960s, limited character representation to 128 basic symbols primarily for English, excluding diacritics and thus hindering support for accented languages.[84] To overcome this, the ISO 8859 series of 8-bit extensions emerged in the 1980s, with ISO 8859-1 (Latin-1) incorporating common Western European diacritics like á, ç, and ñ in positions 160–255.[85] In early web contexts, HTML named character entities provided a workaround for inserting diacritics in ASCII-based environments, such as &eacute; for é or &uuml; for ü, as defined in HTML standards before UTF-8 became dominant.[86]
Input methods for diacritics have evolved to enable efficient entry on physical and virtual keyboards. Dead keys function as modifiers that apply an accent to the subsequent letter without producing output themselves; for instance, pressing the circumflex (^) dead key followed by e yields ê on Windows systems.[87] Compose sequences, prevalent in Unix-like operating systems, use a dedicated Compose key (often mapped to Right Alt) followed by two symbols, such as Compose + ' + e to generate é.[88] The AltGr key on international layouts accesses third- and fourth-level characters, including diacritics; on some layouts, AltGr + , + c produces ç.[87] On mobile devices, virtual keyboards facilitate input via long-press gestures; for example, holding e on iOS reveals options like é, è, and ê, while Android's Gboard offers similar pop-up variants for supported languages.[89]
Rendering diacritics in digital displays often relies on Unicode's combining character mechanism, where a base glyph like e pairs with a modifier like U+0301 (combining acute accent) to form é, positioned algorithmically relative to the base's bounding box.[90] Challenges arise from font inconsistencies, such as improper vertical or horizontal alignment of marks (e.g., acute accents drifting off-center) or failure to maintain ligatures like fi when diacritics are added, leading to fallback displays where components appear uncombined.[90] These issues persist in environments lacking comprehensive OpenType support for mark positioning.
In modern applications, emoji skin tone modifiers operate akin to combining diacritics, appending one of five Fitzpatrick scale-based codes (U+1F3FB to U+1F3FF) to a base emoji like 👏 to alter its appearance, such as 👏🏿 for dark skin tone, with rendering as a unified glyph when the font supports it.[91] AI-assisted predictive text enhances input for accented languages by suggesting contextually appropriate words with diacritics, as in Gboard's multilingual autocorrection, which prioritizes forms like café over cafe based on learned patterns.[92]
For accessibility, screen readers like NVDA and JAWS process diacritics via Unicode support, either naming them (e.g., é as "e acute") or integrating them into word pronunciation (e.g., "café" as "cafay"), while outputting to braille displays where contracted forms accommodate accents.[93] Speech synthesis engines adjust intonation for diacritic-influenced sounds, ensuring comprehensible rendering in languages like French or Spanish, though inconsistencies in symbol voicing can occur across voices.[93]
Transliteration Practices
Transliteration practices employ diacritics to map sounds from non-Latin scripts into the Latin alphabet, enabling accurate representation of phonemes during romanization, especially for academic, governmental, and bibliographic purposes. These systems prioritize reversibility and precision, using marks like underdots, carons, and macrons to distinguish consonants, vowels, and tones absent in standard Latin. For instance, in Arabic romanization, the ALA-LC standard, approved by the Library of Congress and American Library Association, applies underdots to pharyngeal and emphatic consonants, such as ḥ for ح (ḥāʾ), ṣ for ص (ṣād), and ḍ for ض (ḍād), facilitating exact transcription of Semitic phonology.[94][95]
Similar approaches appear in other scripts. The ISO 9:1995 standard for Cyrillic transliteration uses caron diacritics for sibilants and affricates, including š for ш (sha), č for ч (che), and ž for ж (zhe), supporting one-to-one conversion across Slavic languages like Russian and Bulgarian for international exchange.[96] In Indic scripts, ISO 15919:2001 provides a unified scheme for Devanagari and related systems, employing underdots for retroflex consonants and macrons for long vowels, such as ṭ for ट and ā for आ, applicable to languages like Sanskrit, Hindi, and Tamil.[97][98]
Established standards guide these practices across institutions. The ALA-LC system is standard for library cataloging, emphasizing scholarly fidelity with diacritics.[95] In contrast, the BGN/PCGN conventions, adopted by the U.S. Board on Geographic Names and the UK Permanent Committee on Geographical Names, favor simplified forms without diacritics for many languages, including Russian and Arabic, to enhance readability in official mapping and documents, though reversible options with marks are available for precision.[99] For Japanese, the modified Hepburn system, used by the Library of Congress, indicates long vowels with macrons (e.g., Tōkyō for 東京) and uses apostrophes to mark syllable breaks after the syllabic n (e.g., Shin'ichi), aiming at phonetic accuracy for non-native readers.[100]
Diacritic selection varies by tradition and practicality. In Slavic transliterations, scholarly guidelines often prefer acute accents over apostrophes to denote palatalization or the soft sign (e.g., Rus´ as Ruś), as seen in Ukrainian publications, to avoid confusion with punctuation while maintaining typographic clarity. Chinese Hanyu Pinyin, the official romanization system, marks four tones with diacritics (ā for first tone, á for second, ǎ for third, à for fourth), but these are frequently omitted in casual or digital contexts due to input difficulties and visual clutter, risking homophone ambiguity in a tonal language.[101]
Bidirectional transliteration poses accuracy challenges, particularly when source details like diacritics are absent. In Hebrew romanization under ALA-LC, vowels are inferred from grammatical context and syntax rather than explicit niqqud (vowel points), requiring catalogers to supply the correct vocalization; back-transliteration from Latin thus often loses niqqud precision, as modern Hebrew texts typically omit these marks, leading to multiple possible reconstructions.[102][103]
Debates center on simplified versus scholarly approaches, balancing accessibility and fidelity.
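
The trade-off between reversible and simplified romanization can be made concrete with a minimal Python sketch. It assumes only the three ISO 9 pairs and the four Pinyin tone marks mentioned above; the names `ISO9_SAMPLE`, `map_chars`, and `split_tone` are hypothetical helpers rather than part of any standard or library, and the table is an excerpt, not the full ISO 9 mapping.

```python
import unicodedata

# The three ISO 9 pairs cited in the text (an excerpt, not the full table).
ISO9_SAMPLE = {"ш": "\u0161", "ч": "\u010d", "ж": "\u017e"}  # š, č, ž
ISO9_REVERSE = {latin: cyrillic for cyrillic, latin in ISO9_SAMPLE.items()}

# Pinyin tone marks as combining characters: macron, acute, caron, grave.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}

def map_chars(text, table):
    """Replace each character found in the table; leave the rest untouched."""
    return "".join(table.get(ch, ch) for ch in text)

def split_tone(syllable):
    """Return (syllable without its tone mark, tone number or None)."""
    tone = None
    kept = []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # keeps non-tonal marks such as the dots of ü
    return unicodedata.normalize("NFC", "".join(kept)), tone

# One-to-one mapping: the round trip restores the Cyrillic input.
latin = map_chars("шчж", ISO9_SAMPLE)
print(latin, map_chars(latin, ISO9_REVERSE))  # ščž шчж

# Dropping tone marks is lossy: four distinct syllables collapse to "ma".
print([split_tone(s) for s in ["mā", "má", "mǎ", "mà", "nǚ"]])
# [('ma', 1), ('ma', 2), ('ma', 3), ('ma', 4), ('nü', 3)]
```

The first mapping can be inverted because every source letter has exactly one diacritic-bearing counterpart, whereas the plain syllable returned by `split_tone` no longer identifies which of the four tones was intended.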
For Cyrillic, ISO 9's diacritic-heavy scheme enables lossless mapping but complicates everyday use, while BGN/PCGN's diacritic-free variants prioritize phonetic approximation for governmental applications, sparking discussions on whether reversibility should yield to broader adoption in global communication.[99]
Representation and Limitations
Unicode Encoding
Diacritics in Unicode are primarily represented through two mechanisms: combining diacritical marks and precomposed characters. The core set of combining diacritical marks is encoded in the Unicode block "Combining Diacritical Marks" spanning U+0300 to U+036F, which includes 112 characters designed to attach to preceding base characters.[104] For example, the combining acute accent at U+0301 is used to indicate stress or tone in various scripts, such as the Greek tonos or the Pinyin second tone.[104]
Precomposed characters integrate a base letter with a diacritic into a single code point, primarily in blocks like Latin-1 Supplement (U+0080–U+00FF) and Latin Extended-A/B. A representative example is U+00E9, Latin Small Letter E with Acute (é), which is equivalent to the base 'e' (U+0065) followed by a combining acute accent.[105] Together, the combining block and its extensions cover over 100 diacritical marks, encompassing legacy forms from European languages in the Latin Extended blocks as well as marks like the combining caron (háček) at U+030C, used in Slavic and Pinyin notations.[22]
To ensure consistent representation, Unicode employs normalization forms that handle equivalences between precomposed and decomposed sequences. Normalization Form C (NFC) composes canonically equivalent sequences into precomposed characters where possible, such as transforming 'e' + combining acute (U+0065 + U+0301) into é (U+00E9).[106] In contrast, Normalization Form D (NFD) decomposes precomposed forms into base characters followed by combining marks, yielding é as 'e' + combining acute, which facilitates searching and processing by separating components.[106]
Unicode's coverage of diacritics has evolved, with post-2010 updates addressing needs for African languages, such as additions in Unicode 6.1 (2012) including precomposed letters with hooks and dots for scripts like Hausa and Igbo.
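
The equivalence between precomposed and decomposed forms described earlier in this section can be checked with Python's standard `unicodedata` module. This is a minimal sketch; the helper names `code_points` and `canonically_equal` are assumptions introduced here for illustration.

```python
import unicodedata

def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

def canonically_equal(a, b):
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # 'e' followed by the combining acute accent

print(precomposed == decomposed)                               # False
print(code_points(unicodedata.normalize("NFC", decomposed)))   # ['U+00E9']
print(code_points(unicodedata.normalize("NFD", precomposed)))  # ['U+0065', 'U+0301']
print(canonically_equal(precomposed, decomposed))              # True
```

A plain string comparison treats the two spellings as different, which is why software that searches or deduplicates text usually normalizes both sides to one form first.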
The combining hook above (U+0309), vital for Vietnamese and some African tonal indications, was encoded earlier but has seen expanded use in these contexts through subsequent font and script integrations.[22] More recently, compound tone diacritics (U+1AD0–U+1AD5) for complex tonal scripts in African and Asian languages, proposed in 2023, were added in Unicode 16.0 (2024).[107] As of Unicode 17.0 (released in 2025), support continues to expand, though some gaps may remain in highly specialized phonetic representations.[108]
Fonts play a crucial role in rendering these encodings, with open-source families like Noto Sans designed to support stacking of multiple diacritics on a single base character via OpenType glyph positioning tables, ensuring proper vertical alignment and overlap avoidance across scripts.[109] For instance, Noto handles sequences like 'a' + combining acute + combining dot below by positioning the marks hierarchically above and below the base.[110]
Orthographic Constraints
Orthographic constraints on diacritic usage arise from the interplay between a language's phonological inventory and its writing system's capacity to represent distinctions without excessive complexity. In languages like English, the absence of lexical tones or other suprasegmental features that require fine-grained marking largely removes the need for diacritics, as the phonemic system relies primarily on consonant and vowel contrasts that can be conveyed through basic Latin letters.[111] This phonological simplicity contrasts sharply with tonal languages, where diacritics are essential but limited by the number of available marks; Vietnamese exemplifies a maximal system, employing six tones marked by five diacritics (acute, grave, hook above, tilde, and dot below) combined with an unmarked level tone, applied to a 29-letter alphabet derived from Latin.[112] Such constraints ensure that diacritics do not proliferate beyond what the script can visually distinguish, preventing overlap or ambiguity in tone representation.
Design challenges further restrict diacritic integration, particularly in scripts where spatial limitations or stylistic features hinder clear placement.
In polytonic Greek, the accumulation of multiple diacritics (such as breathing marks, accents, and iota subscripts) on a single vowel creates visual density that can obscure legibility, especially in dense text, prompting a shift to monotonic orthography in modern usage to reduce this clutter.[113] Similarly, Indic scripts like Devanagari stack vowel diacritics and consonant conjuncts vertically around a core glyph, leading to cramped forms that challenge readability and typesetting, as the nonlinear arrangement limits the number and size of marks without distorting the base character's proportions.[114] In cursive scripts such as Arabic, diacritics (tashkil) are frequently omitted or repositioned due to the fluid, connected letterforms, which prioritize writing speed and aesthetic flow over full vocalization, rendering stacked or multiple marks incompatible with rapid handwriting.[115]
Standardization efforts often impose additional constraints by balancing orthographic depth (the consistency of grapheme-to-phoneme mappings) with practical usability, sometimes curtailing diacritic reliance to simplify learning and printing.
English exemplifies a deep orthography, where irregular spellings reduce the impetus for diacritics despite phonological ambiguities, as historical conventions favor morpheme preservation over phonetic precision.[116] In contrast, Czech maintains a shallower orthography through consistent diacritics like háčky and čárky to mark distinct phonemes, but reforms have streamlined their application to avoid overcomplication in a language with relatively transparent sound-letter correspondences.[117] Major reforms, such as Turkey's 1928 adoption of a Latin-based alphabet under Atatürk, deliberately minimized diacritics (retaining only a few like ç and ş) by replacing the Arabic script's vowel-pointing system, aiming to enhance literacy rates and align with phonetic needs without excessive markings.[118]
When adapting loanwords, orthographic constraints lead to selective diacritic omission to fit the host script's phonology and visual norms. Japanese katakana, used for foreign borrowings, typically strips original diacritics from source languages, approximating sounds with plain katakana characters or minimal voicing marks (dakuten), as the syllabic structure and lack of tonal distinctions preclude retaining accents or other modifiers from European loans.[119] This adaptation ensures compatibility but can result in loss of source-language nuances, reflecting broader limits on how scripts accommodate external phonological elements without expanding their diacritic inventory.
Ornamental and Non-Standard Uses
Diacritics serve ornamental purposes in typography, where they extend beyond linguistic function to add decorative flourishes, such as swash variants that elongate accents for elegant logos and display fonts. For example, in elaborate gothic typefaces, the German ß (eszett) may incorporate additional curlicues or flourishes to enhance visual appeal in branding materials. These applications prioritize aesthetic enhancement over phonetic accuracy, allowing designers to create distinctive, eye-catching elements in print and digital media.[120][121]
In informal online contexts, diacritics appear in internet slang and stylized text to convey playfulness or subversion, often building on leetspeak traditions by replacing standard letters with accented equivalents like ę for "e" in usernames or chat messages. A prominent example is zalgo text, an internet phenomenon that stacks multiple combining diacritics above and below letters to produce a distorted, eerie appearance, commonly used in memes, horror-themed content, and glitch art for comedic or unsettling effects. This non-standard manipulation exploits Unicode's combining character mechanics; it originated around 2004 and gained traction on social media after 2010.[122][123]
Brands frequently adopt diacritics for non-standard flair to evoke exoticism or sophistication, even absent orthographic need. Häagen-Dazs, an American ice cream company founded in 1961, invented its name with an umlaut over the first "a" to suggest Danish heritage and premium quality, though Danish orthography does not employ umlauts and the term holds no meaning in the language. Similarly, the Pokémon media franchise stylizes its title with an acute accent on the "e" to lend an international, adventurous vibe, while product names like "naïve" preserve the diaeresis for visual distinction in marketing.
Other examples include rock bands like Blue Öyster Cult, which added an umlaut to suggest gothic intrigue, highlighting diacritics' role in crafting memorable, pseudo-European identities.[124][125]
These ornamental and non-standard applications introduce risks, including misinterpretation in digital searches where engines like Google may treat accented and unaccented variants differently; for instance, querying "México" may prioritize pages with the accent, while "Mexico" broadens results, potentially harming brand discoverability. Accessibility challenges also arise, as screen readers often ignore decorative diacritics or pronounce them inconsistently, rendering stylized text opaque or garbled for visually impaired users and violating guidelines for clear communication. Furthermore, employing faux diacritics can border on cultural appropriation; Western advertising's use of inauthentic umlauts or faux-Cyrillic letters (e.g., substituting Latin "R" with Cyrillic "Я" for a "Russian" look) exoticizes non-Western scripts, reinforcing stereotypes without respect for their origins.[126][127][128]
Since Unicode's major expansions around 2015, trends show increased decorative diacritic use in digital stickers and text art, enabled by broader support for combining marks in apps and social platforms, fostering creative expressions like customized glitch effects. However, professional style guides caution against overuse, emphasizing that excessive diacritics in formal writing can obscure meaning and undermine readability, and they recommend restraint to preserve accessibility and clarity.[129]
References
- https://en.wiktionary.org/wiki/Zalgo