Unicode subscripts and superscripts
from Wikipedia

Unicode has subscripted and superscripted versions of a number of characters including a full set of Arabic numerals.[1] These characters allow any polynomial, chemical and certain other equations to be represented in plain text without using any form of markup like HTML or TeX.

The World Wide Web Consortium and the Unicode Consortium have made recommendations on the choice between using markup and using superscript and subscript characters:

When used in mathematical context (MathML) it is recommended to consistently use style markup for superscripts and subscripts [...] However, when super and sub-scripts are to reflect semantic distinctions, it is easier to work with these meanings encoded in text rather than markup, for example, in phonetic or phonemic transcription.[2]

Uses

The difference between superscript/subscript and numerator/denominator glyphs. In many popular computer fonts the Unicode "superscript" and "subscript" characters are actually numerator and denominator glyphs.

The intended use[2] when these characters were added to Unicode was to produce true superscripts and subscripts so that chemical and algebraic formulas could be written without markup. Thus "H₂O" (using a subscript 2 character) is supposed to be identical to "H2O" (with subscript markup).

In reality, many fonts that include these characters ignore the Unicode definition, and instead design the digits for mathematical numerator and denominator glyphs,[3][4] which are aligned with the cap line and the baseline, respectively. When used with the solidus or the Fraction Slash, they produce an almost typographically correct diagonal fraction, such as ³/₄ for the ¾ glyph. Super and subscript markup does not produce a correct fraction (compare markup 3/4 with precomposed ¾). The change also makes the superscript letters useful for ordinal indicators, more closely matching the ª and º characters.

Unicode intended that diagonal fractions be rendered by a different mechanism: the fraction slash U+2044 is visually similar to the solidus, but when used with the ordinary digits (not the superscripts and subscripts), it instructs the layout system that a fraction such as ¾ is to be rendered using automatic glyph substitution.[5][a] User-end support was quite poor for a number of years, but fonts,[b] browsers,[c] word processors,[d] desktop publishing software[e] and others increasingly support the intended Unicode behavior. This browser and your default font render the sequence as ⟨3⁄4⟩. (See Slash (punctuation)#Fractions for rendering in various other fonts.)
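The fraction-slash mechanism described above can be illustrated with a short Python sketch. It assumes only the standard `unicodedata` module; the helper name `plain_fraction` is hypothetical. It builds the sequence digit, U+2044, digit and checks that the precomposed ¾ has the same compatibility form. Whether the sequence is drawn as a diagonal fraction still depends on the font and layout engine.

```python
import unicodedata

FRACTION_SLASH = "\u2044"  # U+2044 FRACTION SLASH, not the ASCII solidus "/"

def plain_fraction(numerator: int, denominator: int) -> str:
    """Join ordinary digits with the fraction slash (hypothetical helper)."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"

three_quarters = plain_fraction(3, 4)
print(three_quarters)

# The precomposed vulgar fraction U+00BE normalizes (NFKC) to the same sequence.
print(unicodedata.normalize("NFKC", "\u00be") == three_quarters)
```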

Superscripts and subscripts block


The most common superscript digits (1, 2, and 3) were included in ISO-8859-1 and were therefore carried over into those code points in the Latin-1 range of Unicode. The remainder were placed along with basic arithmetical symbols, and later some Latin subscripts, in a dedicated block at U+2070 to U+209F. The table below shows these characters together. Each superscript or subscript character is preceded by a baseline x to show the height of subscripting/superscripting.

Six code points in the "Superscripts and Subscripts" block are unassigned, and remain available for future characters. As of November 2024, three of these (209D, 209E, and 209F) were provisionally assigned to new subscript characters, namely Latin lowercase w, y, and z.[6][7]

Unicode characters
         0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
U+00Bx   -   -   x²  x³  -   -   -   -   -   x¹  -   -   -   -   -   -
U+207x   x⁰  xⁱ  ·   ·   x⁴  x⁵  x⁶  x⁷  x⁸  x⁹  x⁺  x⁻  x⁼  x⁽  x⁾  xⁿ
U+208x   x₀  x₁  x₂  x₃  x₄  x₅  x₆  x₇  x₈  x₉  x₊  x₋  x₌  x₍  x₎  ·
U+209x   xₐ  xₑ  xₒ  xₓ  xₔ  xₕ  xₖ  xₗ  xₘ  xₙ  xₚ  xₛ  xₜ  ·   ·   ·
  ·  Not yet assigned.
  -  Other characters from Latin-1 not related to super- or sub-scripts.
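The unassigned positions in the block can be listed programmatically. The following is a minimal sketch, assuming Python's standard `unicodedata` module; the function name `unassigned_in_block` is hypothetical. The result depends on the Unicode version bundled with the interpreter, so positions that are provisionally assigned today may disappear from the list in later releases.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x2070, 0x209F  # Superscripts and Subscripts block

def unassigned_in_block(start: int = BLOCK_START, end: int = BLOCK_END) -> list[int]:
    """Return code points in the range whose general category is Cn (unassigned)."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.category(chr(cp)) == "Cn"]

gaps = unassigned_in_block()
print([f"U+{cp:04X}" for cp in gaps])
print(f"{BLOCK_END - BLOCK_START + 1 - len(gaps)} of "
      f"{BLOCK_END - BLOCK_START + 1} code points assigned")
```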

Other superscript and subscript characters


Unicode also includes codepoints for subscript and superscript characters that are intended for semantic usage, in the following blocks:[1][8]

Superscript
  • The Latin-1 Supplement block contains the feminine and masculine ordinal indicators ª and º.
  • The Latin Extended-C block contains one superscript, ⱽ.
  • The Latin Extended-D block contains seven superscripts: ꝰ ꟲ ꟳ ꟴ ꟱ ꟸ ꟹ.
  • The Latin Extended-E block contains five superscripts: ꭜ ꭝ ꭞ ꭟ ꭩ.
  • The Latin Extended-F block is entirely superscript IPA letters: 𐞁 𐞂 𐞃 𐞄 𐞅 𐞇 𐞈 𐞉 𐞊 𐞋 𐞌 𐞍 𐞎 𐞏 𐞐 𐞑 𐞒 𐞓 𐞔 𐞕 𐞖 𐞗 𐞘 𐞙 𐞚 𐞛 𐞜 𐞝 𐞞 𐞟 𐞠 𐞡 𐞢 𐞣 𐞤 𐞥 𐞦 𐞧 𐞨 𐞩 𐞪 𐞫 𐞬 𐞭 𐞮 𐞯 𐞰 𐞲 𐞳 𐞴 𐞵 𐞶 𐞷 𐞸 𐞹 𐞺.
  • The Spacing Modifier Letters block has superscripted letters and symbols used for phonetic transcription: ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ʸ ˀ ˁ ˠ ˡ ˢ ˣ ˤ.
  • The Phonetic Extensions block has several superscripted letters and symbols: Latin/IPA ᴬ ᴭ ᴮ ᴯ ᴰ ᴱ ᴲ ᴳ ᴴ ᴵ ᴶ ᴷ ᴸ ᴹ ᴺ ᴻ ᴼ ᴽ ᴾ ᴿ ᵀ ᵁ ᵂ ᵃ ᵄ ᵅ ᵆ ᵇ ᵈ ᵉ ᵊ ᵋ ᵌ ᵍ ᵏ ᵐ ᵑ ᵒ ᵓ ᵖ ᵗ ᵘ ᵚ ᵛ, Greek ᵝ ᵞ ᵟ ᵠ ᵡ, Cyrillic ᵸ, other ᵎ ᵔ ᵕ ᵙ ᵜ. These are intended to indicate secondary articulation.
  • The Phonetic Extensions Supplement block has several more: Latin/IPA ᶛ ᶜ ᶝ ᶞ ᶟ ᶠ ᶡ ᶢ ᶣ ᶤ ᶥ ᶦ ᶧ ᶨ ᶩ ᶪ ᶫ ᶬ ᶭ ᶮ ᶯ ᶰ ᶱ ᶲ ᶳ ᶴ ᶵ ᶶ ᶷ ᶸ ᶹ ᶺ ᶻ ᶼ ᶽ ᶾ, Greek ᶿ.
  • The Cyrillic Extended-B block contains two Cyrillic superscripts: ꚜ ꚝ.
  • The Cyrillic Extended-D block contains many Cyrillic superscripts: 𞀰 𞀱 𞀲 𞀳 𞀴 𞀵 𞀶 𞀷 𞀸 𞀹 𞀺 𞀻 𞀼 𞀽 𞀾 𞀿 𞁀 𞁁 𞁂 𞁃 𞁄 𞁅 𞁆 𞁇 𞁈 𞁉 𞁊 𞁋 𞁌 𞁍 𞁎 𞁏 𞁐 𞁫 𞁬 𞁭.
  • The Georgian block contains one superscripted Mkhedruli letter: ჼ.
  • The Kanbun block has superscripted annotation characters used in Japanese copies of Classical Chinese texts: ㆒ ㆓ ㆔ ㆕ ㆖ ㆗ ㆘ ㆙ ㆚ ㆛ ㆜ ㆝ ㆞ ㆟.
  • The Tifinagh block has one superscript letter: ⵯ.
  • The Unified Canadian Aboriginal Syllabics and its Extended blocks contain several mostly consonant-only letters to indicate syllable coda called Finals, along with some characters that indicate syllable medial known as Medials: Main block ᐜ ᐝ ᐞ ᐟ ᐠ ᐡ ᐢ ᐣ ᐤ ᐥ ᐦ ᐧ ᐨ ᐩ ᐪ ᑉ ᑊ ᑋ ᒃ ᒄ ᒡ ᒢ ᒻ ᒼ ᒽ ᒾ ᓐ ᓑ ᓒ ᓪ ᓫ ᔅ ᔆ ᔇ ᔈ ᔉ ᔊ ᔋ ᔥ ᔾ ᔿ ᕀ ᕁ ᕐ ᕑ ᕝ ᕪ ᕻ ᕯ ᕽ ᖅ ᖕ ᖖ ᖟ ᖦ ᖮ ᗮ ᘁ ᙆ ᙇ ᙚ ᙾ ᙿ; Extended block: ᣔ ᣕ ᣖ ᣗ ᣘ ᣙ ᣚ ᣛ ᣜ ᣝ ᣞ ᣟ ᣳ ᣴ ᣵ.
Combining superscript
  • The Combining Diacritical Marks block contains medieval superscript letter diacritics. These letters are written directly above other letters appearing in medieval Germanic manuscripts, and so these glyphs do not include spacing, for example uͤ. They are shown here over the dotted circle placeholder ◌: ◌ͣ ◌ͤ ◌ͥ ◌ͦ ◌ͧ ◌ͨ ◌ͩ ◌ͪ ◌ͫ ◌ͬ ◌ͭ ◌ͮ ◌ͯ.
  • The Combining Diacritical Marks Extended block contains three combining insular letters for the Middle English Ormulum, ◌ᫌ ◌ᫍ ◌ᫎ.[9]
  • The Combining Diacritical Marks Supplement block contains additional medieval superscript letter diacritics, enough to complete the basic lowercase Latin alphabet except for j, q and y, a few small capitals and ligatures (ae, ao, av), and additional letters: ◌᷒ ◌ᷓ ◌ᷔ ◌ᷕ ◌ᷖ ◌ᷗ ◌ᷘ ◌ᷙ ◌ᷚ ◌ᷛ ◌ᷜ ◌ᷝ ◌ᷞ ◌ᷟ ◌ᷠ ◌ᷡ ◌ᷢ ◌ᷣ ◌ᷤ ◌ᷥ ◌ᷦ ◌ᷧ ◌ᷨ ◌ᷪ ◌ᷫ ◌ᷬ ◌ᷭ ◌ᷮ ◌ᷯ ◌ᷰ ◌ᷱ ◌ᷲ ◌ᷳ ◌ᷴ, Greek ◌ᷩ.
  • The Cyrillic Extended-A and -B blocks contain multiple medieval superscript letter diacritics, enough to complete the basic lowercase Cyrillic alphabet used in Church Slavonic texts, as well as an additional ligature (ст): ◌ⷠ ◌ⷡ ◌ⷢ ◌ⷣ ◌ⷤ ◌ⷥ ◌ⷦ ◌ⷧ ◌ⷨ ◌ⷩ ◌ⷪ ◌ⷫ ◌ⷬ ◌ⷭ ◌ⷮ ◌ⷯ ◌ⷰ ◌ⷱ ◌ⷲ ◌ⷳ ◌ⷴ ◌ⷵ ◌ⷶ ◌ⷷ ◌ⷸ ◌ⷹ ◌ⷺ ◌ⷻ ◌ⷼ ◌ⷽ ◌ⷾ ◌ⷿ ◌ꙴ ◌ꙵ ◌ꙶ ◌ꙷ ◌ꙸ ◌ꙹ ◌ꙺ ◌ꙻ ◌ꚞ ◌ꚟ.
  • The Cyrillic Extended-D block has one additional combining character, that being і: ◌𞂏.
Subscript
  • The Latin Extended-C block contains one subscript, ⱼ.
  • The Phonetic Extensions block has several subscripted letters and symbols: Latin/IPA ᵢ ᵣ ᵤ ᵥ and Greek ᵦ ᵧ ᵨ ᵩ ᵪ.
  • The Cyrillic Extended-D block also contains many Cyrillic subscripts: 𞁑 𞁒 𞁓 𞁔 𞁕 𞁖 𞁗 𞁘 𞁙 𞁚 𞁛 𞁜 𞁝 𞁞 𞁟 𞁠 𞁡 𞁢 𞁣 𞁤 𞁥 𞁦 𞁧 𞁨 𞁩 𞁪.
Combining subscript

Latin, Greek, Cyrillic, and IPA tables

A superscript small-cap W may be distinct from a superscript lowercase w in italic typeface, as in this phonetic notation.

Consolidated, the Unicode standard contains superscript and subscript versions of a subset of Latin, Greek and Cyrillic letters. Here they are arranged in alphabetical order for comparison (or for copy and paste convenience). Since these characters appear in different Unicode ranges, they may not appear to be the same size or position due to font substitution by the browser. Shaded cells mark petite capitals that are not very distinct from minuscules in roman typeface, but they may be distinct in italic typeface, as is used in some phonetic notation.

Little punctuation is encoded. Parentheses are shown in the basic superscript block above, and the exclamation mark ⟨ꜝ⟩ is shown in the IPA table below. In a supporting font, a question mark may be created with a superscript gelded question mark and a combining dot below: ⟨ˀ̣⟩.

Basic Latin modifier letters
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Superscript capital ᴿ
Superscript small capital * 𐞄 * * 𐞒 𐞖 𞀹§ 𞀻§ * 𐞪 𞁀§ 𐞲
Superscript minuscule ʰ ʲ ˡ 𐞥 ʳ ˢ ʷ ˣ ʸ
Overscript small capital ◌ⷡ § ◌ᷛ ◌ⷩ § ◌ⷦ § ◌ᷞ ◌ᷟ ◌ᷡ ◌ᷢ ◌ⷮ §
Overscript minuscule ◌ͣ ◌ᷨ ◌ͨ ◌ͩ ◌ͤ ◌ᷫ ◌ᷚ ◌ͪ ◌ͥ ◌ᷜ ◌ᷝ ◌ͫ ◌ᷠ ◌ͦ ◌ᷮ ◌ͬ ◌ᷤ ◌ͭ ◌ͧ ◌ͮ ◌ᷱ ◌ͯ ◌ꙷ § ◌ᷦ
Subscript minuscule 𞁞§ * * *
Underscript minuscule ◌᷊ ◌ᪿ

*Superscript versions of petite capital A, D, E and P, of ƀ, and subscript versions of w, y and z have been accepted for a future version of the Unicode Standard.[6]

§ Cyrillic 𞀹 𞀻 𞁀, ◌ⷡ ◌ⷩ ◌ⷦ ◌ⷮ ◌ꙷ and 𞁞 might be substituted for these letters, but the same font would have to cover both ranges for it to look right.

Additional Latin modifier letters
Æ Ä Ƀ Ǝ Ə Ŋ Ö Ü
Superscript capital (ᴬ̈) (ᴼ̈) (ᵁ̈)
Superscript minuscule 𐞃 (ᵃ̈) * (ᵒ̈) (ᵘ̈)
Overscript minuscule ◌ᷔ ◌ᷲ ◌ᷪ ◌ᷬ ◌ᷳ ◌ᷴ
Subscript minuscule

Some of these superscript capitals are small caps in the source documents in the Unicode proposals. Superscript Ä, Ö, Ü (in parentheses) are composed of the base letter and a combining tréma.

Except for the iota subscript, which has use in Greek text, the modifier Greek letters are intended as phonetic characters in Latin-script text. Shaded cells are indistinguishable from Latin letters, and so would not be expected to have distinctive use in Latin text or to be supported by Unicode.

Greek modifier letters (intended for use in Latin-script text)
Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω
Superscript minuscule [A] ᶿ [A] * *
Overscript minuscule ◌ᷧ[A] ◌ᷩ ◌᫇[f]
Subscript minuscule ͺ[g]
Underscript minuscule ◌ͅ ◌̫[f]
  1. ^ a b c In some fonts, script-a ᵅ and ᶹ have the form of superscript Greek alpha and upsilon, in others they are not a close match.

*Superscript versions of Greek psi and omega have been accepted for a future version of the Unicode Standard.[6]

Cyrillic modifier characters are intended for use in Cyrillic text.

Russian modifier letters
А Б В Г Д Е Ж З И К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я
Superscript 𞀰 𞀱 𞀲 𞀳 𞀴 𞀵 𞀶 𞀷 𞀸 𞀹 𞀺 𞀻 𞀼 𞀽 𞀾 𞀿 𞁀 𞁁 𞁂 𞁃 𞁄 𞁅 𞁆 𞁇 𞁈 𞁉
Overscript ◌ⷶ ◌ⷠ ◌ⷡ ◌ⷢ ◌ⷣ ◌ⷷ ◌ⷤ ◌ⷥ ◌ꙵ ◌ⷦ ◌ⷧ ◌ⷨ ◌ⷩ ◌ⷪ ◌ⷫ ◌ⷬ ◌ⷭ ◌ⷮ ◌ꙷ ◌ꚞ ◌ⷯ ◌ⷰ ◌ⷱ ◌ⷲ ◌ⷳ ◌ꙸ ◌ꙹ ◌ꙺ ◌ⷻ
Subscript 𞁑 𞁒 𞁓 𞁔 𞁕 𞁖 𞁗 𞁘 𞁙 𞁚 𞁛 𞁜 𞁝 𞁞 𞁟 𞁠 𞁡 𞁢 𞁣 𞁤 𞁥 𞁦
Additional modern Cyrillic modifier letters
Ә Ґ Є Ѕ І Ї Ј Ө Ҫ Ү Ұ Џ Ӏ
Superscript 𞁋 𞁊 𞁌 𞁌̈ 𞁍 𞁎 𞁫 𞁏 𞁭 𞁐
Overscript ◌ꙴ ◌𞂏 ◌ꙶ
Subscript 𞁧 𞁩 𞁨 𞁨̈ 𞁪
Additional medieval Cyrillic modifier letters
Ѡ Ѣ Ѥ Ѧ Ѫ Ѭ Ѳ
Superscript 𞁬
Overscript ◌ⷹ ◌ꙻ ◌ⷺ ◌ⷼ ◌ꚟ ◌ⷽ ◌ⷾ ◌ⷿ ◌ⷴ

Superscript and subscript ё, ї, й, ў etc. are handled with diacritics, ⟨𞀵̈ 𞁌̈ 𞀸̆ 𞁁̆⟩ etc. Many of the Cyrillic characters were added to the Cyrillic Extended-D block, which was added to the free Gentium and Andika fonts with version 6.2 in February 2023.

See also small caps in Unicode.

Superscript IPA


The Latin Extended-F block was created for the remaining superscript IPA letters. They are supported by the free Gentium and Andika fonts. Additional superscript characters for historical and para-IPA letters have been accepted for future versions of the Unicode Standard.[6]

Consonant letters


The Unicode characters for superscript (modifier) IPA and extIPA consonant letters are as follows. The entire Latin Extended-F block is dedicated to superscript IPA. Characters for sounds with secondary articulation are set off in parentheses and placed below the base letters. Asterisks mark superscript letters in the Unicode pipeline.

IPA and extIPA consonants, along with superscript variants and their Unicode code points
Bi­labial Labio­dental Dental Alveolar Post­alveolar Retro­flex Palatal Velar Uvular Pharyn­geal Glottal
Nasal m ᵐ
1D50
ɱ ᶬ
1DAC
n ⁿ
207F
(ᶇ *)
 
 
(ȵ *)
ɳ ᶯ
1DAF
ɲ ᶮ
1DAE
ŋ ᵑ
1D51
ɴ ᶰ
1DB0
Plosive p ᵖ
1D56
b ᵇ
1D47
t ᵗ
1D57
(ƫ ᶵ)
1DB5
d ᵈ
1D48
(ᶁ *)
 
 
(ȶ *)
 
 
(ȡ *)
ʈ 𐞯
107AF
ɖ 𐞋
1078B
c ᶜ
1D9C
ɟ ᶡ
1DA1
k ᵏ
1D4F
ɡ ᶢ/g ᵍ
1DA2/1D4D
q 𐞥
107A5
ɢ 𐞒
10792
ʡ 𐞳
107B3
ʔ ˀ
02C0
Affricate ʦ 𐞬
107AC
ʣ 𐞇
10787
ʧ 𐞮
107AE
(ʨ 𐞫)
107AB
ʤ 𐞊
1078A
(ʥ 𐞉)
10789
ꭧ 𐞭
107AD
(𝼜 *)
ꭦ 𐞈
10788
(𝼙 *)
Fricative ɸ ᶲ
1DB2
β ᵝ
1D5D
f ᶠ
1DA0
v ᵛ
1D5B
θ ᶿ
1DBF
ð ᶞ
1D9E
s ˢ
02E2
(ᶊ *)
z ᶻ
1DBB
(ᶎ *)
ʃ ᶴ
1DB4
(ɕ ᶝ)
1D9D
ʒ ᶾ
1DBE
(ʑ ᶽ)
1DBD
ʂ ᶳ
1DB3
(ᶘ *)
ʐ ᶼ
1DBC
(ᶚ *)
ç ᶜ̧
1D9C + 0327[h]
ʝ ᶨ
1DA8
x ˣ
02E3
(ɧ 𐞗)
10797
ɣ ˠ
02E0
χ ᵡ
1D61
ʁ ʶ
02B6
ħ 𐞕
10795
(ʩ 𐞐)
10790
ʕ ˤ
02E4[i]
h ʰ
02B0
(ꞕ *)
ɦ ʱ
02B1
Approximant ʋ ᶹ
1DB9
ɹ ʴ
02B4
ɻ ʵ
02B5
j ʲ
02B2
(ɥ ᶣ)
1DA3
 
 
(ʍ ꭩ)
AB69
ɰ ᶭ
1DAD
(w ʷ)
02B7
Tap/flap ⱱ 𐞰
107B0
ɾ 𐞩
107A9
ɽ 𐞨
107A8
Trill ʙ 𐞄
10784
r ʳ
02B3
ʀ 𐞪
107AA
ʜ 𐞖
10796
ʢ 𐞴
107B4
Lateral fricative ɬ 𐞛
1079B
(ʪ 𐞙)
10799
ɮ 𐞞
1079E
(ʫ 𐞚)
1079A
ꞎ 𐞝
1079D
𝼅 𐞟
1079F
𝼆 𐞡
107A1
𝼄 𐞜
1079C
Lateral approximant l ˡ
02E1
(ᶅ ᶪ)
1DAA
 
 
(ȴ *)
ɭ ᶩ
1DA9
ʎ 𐞠
107A0
ʟ ᶫ
1DAB
(ɫ ꭞ)[j]
AB5E
Lateral tap/flap ɺ 𐞦
107A6
𝼈 𐞧
107A7
Implosive ƥ * ɓ 𐞅
10785
ƭ * ɗ 𐞌
1078C
𝼉 * ᶑ 𐞍
1078D
ƈ * ʄ 𐞘
10798
ƙ * ɠ 𐞓
10793
ʠ * ʛ 𐞔
10794
Click release ʘ 𐞵
107B5
ɋ ǀ 𐞶
107B6
ʇ * ǃ ꜝ
A71D
ʗ * 𝼊 𐞹
107B9
ψ * ǂ 𐞸
107B8
𝼋 * (ʞ *)
Lateral click
release
ǁ 𐞷
107B7
ʖ *
Percussive ¡ ꜞ
A71E[k]

The spacing diacritic for ejective consonants, U+2BC, works with superscript letters despite not being superscript itself: ⟨ᵖʼ ᵗʼ ᶜʼ ᵏˣʼ⟩. If a distinction needs to be made, the combining apostrophe U+315 may be used: ⟨ᵖ̕ ᵗ̕ ᶜ̕ ᵏˣ̕⟩. The spacing diacritic should be used for a baseline letter with a superscript release, such as [tˢʼ] or [kˣʼ], where the scope of the apostrophe includes the non-superscript letter, but the combining apostrophe U+315 might be used to indicate a weakly articulated ejective consonant like [ᵗ̕] or [ᵏ̕], where the whole consonant is written as a superscript, or together with U+2BC when separate apostrophes have scope over the base and modifier letters, as in ⟨pʼᵏˣ̕⟩.[10]

Spacing diacritics, as in ⟨tʲ⟩, cannot be secondarily superscripted in plain text: ⟨ᵗʲ⟩. (In this instance, the old IPA letter for [tʲ], ⟨ƫ⟩, has a superscript variant in Unicode, U+1DB5 ⟨ᶵ⟩, but that is not generally the case.)

Among older letters, the most common letters with palatal hook are supported; they are displayed in the table above. IPA once had an idiosyncratic curl on some of the palatalized letters: these are the fricative letters ⟨ʆ ʓ⟩. Their superscript forms have been accepted for a future version of the Unicode Standard. Old-style click letters and the retired letters ⟨ƞ⟩ and ⟨ɼ⟩ have also been accepted for a future version of the Unicode Standard.[6] The Teuthonista letter ⟨ꜧ⟩ (U+A727) is an old graphic variant of ⟨ɮ⟩. Its superscript is supported at ⟨ꭜ⟩ (U+AB5C).

Among para-IPA letters, superscript variants of Sinological ⟨ȡ ȴ ȵ ȶ⟩, of the Bantuist labio-dental plosives ⟨ȹ⟩ and ⟨ȸ⟩, and of central semivowels ⟨ɉ⟩, ⟨ɥ̶⟩, and ⟨⟩ have been accepted for a future version of the Unicode Standard.[6]

Vowel letters


The Unicode characters for superscript (modifier) IPA vowel letters, plus a pair of extended letters ⟨ᵻ ᵿ⟩ found in English dictionaries, are as follows. Recently retired alternative letters such as ⟨ɩ ɷ⟩ are also supported; they are set off in parentheses and placed below the modern IPA letters. Asterisks mark superscript letters in the Unicode pipeline.

IPA vowels and superscript variants
Front Central Back
Close i ⁱ
2071
y ʸ
02B8
ɨ ᶤ
1DA4
ʉ ᶶ
1DB6
ɯ ᵚ
1D5A
u ᵘ
1D58
Near-close ɪ ᶦ
1DA6
(ɩ ᶥ)
1DA5
ʏ 𐞲
107B2




(ᵻ ᶧ)
1DA7


(ᵿ *)



(ω *)

ʊ ᶷ
1DB7
(ɷ 𐞤)
107A4
Close-mid e ᵉ
1D49
ø 𐞢
107A2
ɘ 𐞎
1078E
ɵ ᶱ
1DB1
ɤ 𐞑
10791
o ᵒ
1D52
Mid ə ᵊ
1D4A
Open-mid ɛ ᵋ
1D4B
œ ꟹ
A7F9
ɜ ᶟ
1D9F
(ᴈ ᵌ)[l]
1D4C
ɞ 𐞏
1078F
ʌ ᶺ
1DBA
ɔ ᵓ
1D53
Near-open æ 𐞃
10783
ɶ 𐞣
107A3
ɐ ᵄ
1D44
ɑ ᵅ
1D45
ɒ ᶛ
1D9B
Open a ᵃ
1D43

The precomposed Unicode rhotic vowel letters ⟨ɚ ɝ⟩ are not directly supported. The rhotic diacritic U+02DE ◌˞ should be used instead: ⟨ᵊ˞ ᶟ˞⟩.[11]

Among older letters, ⟨ᴜ⟩ (U+1D1C), a graphic variant of ⟨ʊ⟩, is supported at ⟨ᶸ⟩ (U+1DB8).[12] The briefly resurrected vowel letter ⟨ʚ⟩ (U+029A) is not supported as a superscript; only its reversed replacement ⟨ɞ⟩ is.

Among para-IPA letters, Sinological superscript ⟨ɿ ʅ ʮ ʯ ⟩ and ⟨ ⟩ have been accepted for a future version of the Unicode Standard.[6]

Length marks


The two length marks are also supported:

Length marks
Long: ː 𐞁 (10781)
Half-long: ˑ 𐞂 (10782)

These are used to add length to another superscript, such as ⟨Cʰ𐞁⟩ or ⟨Cʰ𐞂⟩ for long aspiration.

Wildcards


Superscript wildcards (full caps) are largely supported: e.g. ᴺC (prenasalized consonant), ꟲN (prestopped nasal), Pꟳ (fricative release), D꟱ (sibilant release, added to Unicode in 2025), NᴾF (epenthetic plosive), CVNᵀ (tone-bearing syllable), Cᴸ (liquid or lateral release), Cᴿ (rhotic or resonant release), Vᴳ (off-glide/diphthong), Cⱽ (fleeting vowel). Superscript for fleeting/epenthetic click is not included in the Unicode Standard. Other basic Latin superscript wildcards for tone and weak indeterminate sounds, as described in the article on the International Phonetic Alphabet, are mostly supported. (See table in the Latin section.)

Combining marks and subscripts


In addition to superscript IPA, a very few IPA letters beyond the basic Latin alphabet have combining forms or are supported as subscripts:

Additional IPA modifier letters
ɑ æ β ç ð ə ɣ ʃ ʍ χ ʔ ʼ
Overscript ◌ᷧ ◌ᷔ ◌ᷩ ◌ᷗ ◌ᷙ ◌ᷪ ◌ᷯ ◌̉[m] ◌̓
Subscript *
Underscript ◌ᫀ ◌̦

Composite characters


Primarily for compatibility with earlier character sets, Unicode contains a number of characters that compose super- and subscripts with other symbols.[1] In most fonts these render much better than attempts to construct these symbols from the above characters or by using markup.

from Grokipedia
Unicode subscripts and superscripts are specialized characters in the Unicode Standard that provide fixed-position variants of letters, digits, and symbols raised above (superscripts) or lowered below (subscripts) the baseline, enabling plain-text representation of typographic effects without relying on markup or styling. These characters are primarily intended for semantic uses in fields such as mathematics, chemistry, physics, and linguistics, where the positioning conveys specific meaning, such as exponents in equations (e.g., x²) or atom counts in molecular formulas (e.g., H₂O). Unlike arbitrary font-based superscripting for elements like footnotes, Unicode's approach emphasizes compatibility with legacy encodings and structured notations, and discourages use of these characters as a general formatting substitute.

The core collection resides in the Superscripts and Subscripts block (U+2070–U+209F), which spans 48 code points, of which 42 are assigned. It includes the superscript digits ⁰ and ⁴ through ⁹, common operators like ⁺, ⁻, ⁼, ⁽, and ⁾, and letters such as ⁱ (superscript i) and ⁿ (superscript n), alongside subscript letters like ₐ (subscript a), ₑ (subscript e), and ₜ (subscript t). The superscripts ¹, ², and ³ were inherited from ISO/IEC 8859-1 and therefore sit in the Latin-1 Supplement block rather than in this one. The block originated from compatibility needs and was introduced in Unicode 1.1, with subsequent versions adding more subscript letters, such as ₕ (subscript h) in Unicode 6.0. Additional superscript forms, particularly modifier letters for phonetic notation and abbreviations, appear in blocks like Spacing Modifier Letters (U+02B0–U+02FF), Phonetic Extensions (U+1D00–U+1D7F), and Combining Diacritical Marks Supplement (U+1DC0–U+1DFF), expanding options for scripts beyond Latin and Greek.

In practice, these characters support plain-text encoding of complex expressions, as outlined in Unicode Technical Note #28 on the nearly plain-text encoding of mathematics, where subscripts and superscripts integrate with operators for linear input that can be rendered appropriately. However, the Unicode Standard recommends preferring styled ASCII characters (e.g., via markup in HTML or CSS) for new mathematical content to ensure flexibility, reserving compatibility superscripts and subscripts for legacy systems, unstyled text environments, and cases where semantic distinction is crucial, such as distinguishing ordinal indicators (1ˢᵗ) from cardinal numbers. Limitations include incomplete coverage, since no full set exists for all letters, which has prompted ongoing proposals for expansions, such as subscript w, y, z, and ɣ to better serve phonetics and chemistry.

Overview

Definition and Scope

Superscripts are typographic characters positioned above the baseline of surrounding text, such as in the expression x², while subscripts are positioned below the baseline, as in H₂O. These positions enable compact representation of exponents, indices, and other modifiers in plain text without relying on advanced formatting. In the Unicode Standard, such characters are encoded primarily as compatibility characters to support round-trip conversion with legacy encodings like ISO/IEC 8859-1, ensuring portability across systems.

The core set lies in the Basic Multilingual Plane (BMP), in the Superscripts and Subscripts block from U+2070 to U+209F, which includes characters designed for simple plain-text usage rather than full mathematical presentation forms; further superscript letters are encoded in other blocks, some of them outside the BMP. Unlike complex mathematical notation, which often uses OpenType font features or markup languages like MathML for precise rendering, these Unicode characters provide fixed, encoded glyphs for legacy compatibility and basic needs, such as inline exponents in non-specialized text environments. For instance, superscript numerals run from ⁰ (U+2070) to ⁹ (U+2079), with ¹, ², and ³ located at U+00B9, U+00B2, and U+00B3, and subscript numerals run from ₀ (U+2080) to ₉ (U+2089), allowing expressions like CO₂ without additional styling.

Key properties of these characters include compatibility decompositions, which map them to their base forms with a superscript or subscript tag; for example, ² (U+00B2) decomposes to <super> U+0032. Their general category varies: numeral forms are classified as No (Other Number), distinguishing them from standard decimal digits (Nd), while many letter-based variants are Lm (Modifier Letter). Bidirectional classes are typically L (Left-to-Right) for letters or EN (European Number) for digits, so the characters integrate with surrounding text in mixed-directionality contexts. This encoding emphasizes Unicode's role in providing interoperable, plain-text representations distinct from font-specific mechanisms like OpenType positioning, which apply stylistic variants to base characters.
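The properties discussed in this section can be inspected directly. The sketch below assumes Python's standard `unicodedata` module; the helper name `describe` is hypothetical. It prints the name, general category, bidirectional class, and compatibility decomposition of a few sample characters, then shows how NFKC normalization folds them back to base characters.

```python
import unicodedata

def describe(ch: str) -> tuple[str, str, str, str]:
    """Return (name, general category, bidi class, decomposition) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        unicodedata.decomposition(ch),
    )

for ch in "\u00b2\u2074\u2082\u207f\u2090":  # ² ⁴ ₂ ⁿ ₐ
    name, category, bidi, decomposition = describe(ch)
    print(f"U+{ord(ch):04X} {name}: {category}, {bidi}, {decomposition}")

# Compatibility normalization discards the raised or lowered position entirely.
print(unicodedata.normalize("NFKC", "x\u00b2 + H\u2082O"))
```

Because NFKC removes the distinction, text that relies on these characters for meaning should not be passed through compatibility normalization.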

Historical Development

The use of superscripts and subscripts traces back to early printing, where printers manually positioned smaller characters above or below the baseline for abbreviations, footnotes, and rudimentary notation as early as the 15th century with the advent of movable type. The practice later became standard in scientific publishing, with compositors hand-setting raised or lowered type for chemical formulas and exponents. Typewriters introduced in the late 1800s offered limited support, often requiring manual backspacing and half-spacing mechanisms or specialized mathematical typewriters for creating superscripts and subscripts, as seen in academic typing practices through the mid-20th century.

With the rise of digital computing, early character encoding standards like ASCII (1963) provided no dedicated superscript or subscript characters, leading to reliance on extensions such as ISO/IEC 8859-1 (1987), which included only superscript one (¹, U+00B9), two (², U+00B2), and three (³, U+00B3) for common uses like squared and cubed measurements. For complex mathematical notation, systems like TeX, developed by Donald Knuth in 1978, became essential, allowing programmatic generation of positioned characters through markup rather than fixed encodings. This gap in ISO standards, including the lack of any subscript support in ISO 8859-1, highlighted the need for more comprehensive character sets in plain-text representation.

Unicode addressed these limitations starting with Version 1.1 (June 1993), which introduced the initial Superscripts and Subscripts block (U+2070–U+209F) containing the superscript numerals not already covered by Latin-1 (⁰ and ⁴–⁹), operators like plus (⁺) and minus (⁻), and subscript digits (₀–₉). Version 3.2 (March 2002) added superscript small letter i (ⁱ). Version 4.1 (March 2005) added Latin subscript letters such as a (ₐ), e (ₑ), o (ₒ), and x (ₓ) to support phonetic and chemical notations. Version 6.0 (October 2010) added the subscript letters h (ₕ), k (ₖ), l (ₗ), m (ₘ), n (ₙ), p (ₚ), s (ₛ), and t (ₜ).

Parallel developments influenced Unicode through web standards, such as HTML 2.0 (1995), which defined entities like &sup2; for ² to enable inline superscripts in markup. MathML, introduced in 1998, further standardized superscript rendering in XML for mathematical expressions, informing Unicode's inclusion of compatible characters. As of November 2025, no further characters have been added to the Superscripts and Subscripts block since Unicode 6.0, though ongoing proposals (e.g., L2/24-219) seek to add subscript small letters w (proposed U+209D), y (U+209E), z (U+209F), and gamma (proposed in a supplementary block) to support linguistic applications.

Applications

Scientific and Mathematical Notation

In chemistry, Unicode subscripts are commonly employed to denote the number of atoms in molecular formulas, such as H₂O for water and CO₂ for carbon dioxide, while superscripts indicate ionic charges and oxidation states, as in Fe²⁺ for the iron(II) ion. Superscripts also represent isotopes, as in ¹⁴C for carbon-14, facilitating the representation of chemical structures in plain text without requiring specialized rendering software. These conventions align with standard chemical notation practices, ensuring consistency in digital documents.

In mathematics, superscripts are essential for expressing exponents, such as x² for squaring or aⁿ for general powers, and appear in summations with indices like Σᵢ₌₁ⁿ i for the sum from i=1 to n. Subscripts serve to index variables or elements, while for italic, bold, and other styled letters in more complex expressions the Mathematical Alphanumeric Symbols block (U+1D400–U+1D7FF) is available alongside the legacy Superscripts and Subscripts block. This approach allows basic equations to be encoded directly in plain text, as outlined in the Unicode Nearly Plain-Text Encoding of Mathematics guideline.

In physics, subscripts distinguish components of vectors or tensors, such as vₓ for the x-component of velocity, and superscripts denote powers in units like m⁻¹ for reciprocal metres. These notations enable precise variable labeling in equations without markup, supporting interdisciplinary applications in scientific computing and documentation. The legacy Superscripts and Subscripts block (U+2070–U+209F) provides only partial alphabetic coverage, limited to select Latin letters, so anything beyond them requires characters from other blocks or markup, while subscript digits like ₃ fold to their base forms under compatibility normalization. This incompleteness can lead to inconsistencies in rendering across fonts, though compatibility decompositions provide fallback behavior in compliant systems.

Unicode's adoption in tools like LaTeX, via the unicode-math package's support for Unicode input since version 0.7 in 2012, and Microsoft Word, which has long offered superscript and subscript formatting alongside these characters through font-based rendering and its equation editor, promotes integration between plain text and typeset output. This compatibility supports portability of notation across PDFs and other files, as UnicodeMath linear formats render consistently in applications like Word's equation editor.
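Producing such notation from ASCII input is a simple character mapping. The following is a minimal sketch using Python's built-in `str.translate`; the names `to_subscript` and `to_superscript` are hypothetical helpers, and only digits, plus, and minus are mapped, since the block's letter coverage is incomplete.

```python
# Translation tables from ASCII to the Unicode subscript and superscript forms.
SUBSCRIPT_TABLE = str.maketrans("0123456789+-", "₀₁₂₃₄₅₆₇₈₉₊₋")
SUPERSCRIPT_TABLE = str.maketrans("0123456789+-", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻")

def to_subscript(text: str) -> str:
    """Replace ASCII digits and signs with subscript forms; other characters pass through."""
    return text.translate(SUBSCRIPT_TABLE)

def to_superscript(text: str) -> str:
    """Replace ASCII digits and signs with superscript forms; other characters pass through."""
    return text.translate(SUPERSCRIPT_TABLE)

print("H" + to_subscript("2") + "O")       # water
print("Fe" + to_superscript("2+"))         # iron(II) ion
print(to_superscript("14") + "C")          # carbon-14
print("m" + to_superscript("-1"))          # reciprocal metre
```

Applying the helpers only to the count or charge, rather than to a whole formula string, avoids accidentally converting digits that belong on the baseline.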

Phonetic and Linguistic Representation

In phonetics, particularly within the International Phonetic Alphabet (IPA), superscripts and subscripts enable precise representation of articulatory and prosodic features that extend beyond basic segmental sounds. Ejectives are marked with the modifier letter apostrophe ʼ (U+02BC) following a stop, as in pʼ, distinguishing them from plain stops in languages like Georgian; the superscript glottal stop ˀ (U+02C0) is a separate character used for glottalization. For secondary articulations, modifier letters such as ʰ (U+02B0) indicate aspiration, and ʲ (U+02B2) palatalization. Subscript letters such as ₕ (U+2095) and ₐ (U+2090) serve as annotations in narrow transcription. These characters allow linguists to transcribe subtle phonetic variations without introducing entirely new symbols, supporting narrow transcription in descriptive work.

Suprasegmental features in linguistic markup further rely on these forms to denote prosody and phonological attributes. Primary stress is marked with a raised vertical line ˈ (U+02C8) before the stressed syllable, such as ˈa, while tones are marked with modifier letters and diacritics. In feature geometry models of phonology, subscripts facilitate the annotation of articulatory traits; for instance, subscript n ₙ (U+2099) can represent nasal features. Unicode supports these through dedicated modifier letters in the Spacing Modifier Letters block (U+02B0–U+02FF), including ʰ for aspiration and ʲ for palatalization, which function as superscript attachments in IPA to indicate secondary articulations or co-occurrence with the base sound. Unicode 4.0 (2003) expanded phonetic support further by adding the Phonetic Extensions block.

A proposal (L2/24-219) requested encoding of subscript small letters w, y, z, and gamma for use in phonetic traditions, including Americanist notation and extensions to IPA, to support simultaneous articulations or fricative vowels, but these were not included in Unicode 17.0 and remain under consideration for future versions. In practical applications, such as dictionaries and language learning resources, these elements appear alongside English phonemic transcriptions like /kæt/ for "cat," where modifier letters may indicate additional features. Integration with XML formats, via Unicode entities, facilitates structured linguistic data exchange in corpora and annotation tools, ensuring compatibility for cross-linguistic analysis and preservation efforts.
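Because several of these modifier letters look alike, it helps to confirm a character's identity by code point rather than by appearance. A minimal sketch, assuming Python's standard `unicodedata` module; the dictionary `MODIFIERS` and the helper `with_modifier` are hypothetical names chosen for illustration.

```python
import unicodedata

# Modifier letters mentioned in this section, keyed by the feature they mark.
MODIFIERS = {
    "aspiration": "\u02b0",      # ʰ
    "palatalization": "\u02b2",  # ʲ
    "ejective": "\u02bc",        # ʼ
    "glottalization": "\u02c0",  # ˀ
    "primary stress": "\u02c8",  # ˈ
}

def with_modifier(base: str, feature: str) -> str:
    """Append the modifier letter for a feature to a base letter."""
    return base + MODIFIERS[feature]

for feature, ch in MODIFIERS.items():
    print(f"{feature}: U+{ord(ch):04X} {unicodedata.name(ch)}")

print(with_modifier("p", "ejective"), with_modifier("t", "aspiration"))
```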

Typographic and General Uses

Superscript digits are widely used in word processors, wikis, and digital documents as footnote and endnote markers, giving compact citation markers without relying on stylistic formatting. The digits ¹, ², and ³ are encoded in the Latin-1 Supplement block (U+00B9, U+00B2, U+00B3); the remaining digits come from the Superscripts and Subscripts block. These compatibility characters preserve plain-text readability while mimicking typographic conventions.

Ordinal indicators use related characters. Spanish and Portuguese use the masculine º (U+00BA) and feminine ª (U+00AA) ordinal indicators, as in 5º for "quinto," to mark ordinal positions in abbreviations and numbering. In French, superscript letters like ᵉ (U+1D49) from the Phonetic Extensions block form ordinals such as 2ᵉ for "deuxième" in plain text. These characters work across systems without requiring styling markup.

In informal digital communication, superscripts are used for playful and expressive text, with users combining them in memes and stylized messages, such as raised letters for humorous effect.

In programming, support for these characters in identifiers depends on the language. Python 3 accepts modifier-letter forms such as ᵢ (U+1D62) in names but normalizes identifiers with NFKC, so xᵢ and xi are the same name; it rejects subscript digits like ₁ (U+2081), which have general category No and are not valid identifier characters. Some other languages, such as Julia, do allow names like x₁. Inside strings there is no such restriction, and ² (U+00B2) is commonly used for area units like km².

Screen readers generally handle these characters, though behavior varies by product and context. JAWS, for instance, may pronounce superscript ² as "squared" in units or as "superscript two" otherwise, so semantic markup is still recommended where a fuller interpretation matters.

The <sup> and <sub> elements, introduced in the HTML 3.2 specification in 1997, laid the early groundwork for superscripts and subscripts on the web, and browsers now render them through CSS for consistent display. After 2010, mobile keyboards on iOS and Android added long-press access to superscript symbols, which increased their use in everyday typing.
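The identifier rules for Python can be checked directly from the language itself. The following is a minimal sketch, assuming a standard CPython 3 interpreter; `describe_identifier` is a hypothetical helper name, not part of any library.

```python
import unicodedata


def describe_identifier(name: str) -> str:
    """Report whether Python accepts `name` as an identifier and how it normalizes."""
    if not name.isidentifier():
        return f"{name!r}: rejected"
    # Python compares identifiers after NFKC normalization.
    normalized = unicodedata.normalize("NFKC", name)
    return f"{name!r}: accepted as {normalized!r}"


# x + SUBSCRIPT ONE, x + LATIN SUBSCRIPT SMALL LETTER I, km + SUPERSCRIPT TWO
for candidate in ("x\u2081", "x\u1d62", "km\u00b2"):
    print(describe_identifier(candidate))
```

Only the second candidate is accepted, because U+1D62 is a modifier letter (Lm) while the subscript and superscript digits are numbers of category No. The characters remain usable in string literals in every case.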

Unicode Encoding

Core Superscripts and Subscripts Block

The Core Superscripts and Subscripts block occupies code points U+2070 through U+209F in the Basic Multilingual Plane of Unicode, between the General Punctuation block (U+2000–U+206F) and the Currency Symbols block (U+20A0–U+20CF). As of Unicode 17.0 (released in 2025), it includes 42 assigned characters out of 48 possible positions, covering typographic variants for superscripts and subscripts used in mathematical and other technical notation. These characters serve as compatibility equivalents to styled base forms, enabling plain-text representation of raised or lowered glyphs without requiring rich formatting.

The block divides into two main segments: superscripts from U+2070 to U+207F (16 code points, 14 assigned) and subscripts from U+2080 to U+209F (32 code points, with 28 assigned and 4 unassigned). The superscript section contains digits (excluding ¹, ², and ³, which are encoded in the Latin-1 Supplement block at U+00B9, U+00B2, and U+00B3), two Latin letters, and common mathematical operators. Representative examples include superscript zero (U+2070 ⁰), superscript Latin small letter i (U+2071 ⁱ), superscript four (U+2074 ⁴), and superscript n (U+207F ⁿ). The subscript section similarly covers digits and operators, plus a broader set of Latin letters, such as subscript zero (U+2080 ₀), subscript a (U+2090 ₐ), subscript schwa (U+2094 ₔ), and subscript t (U+209C ₜ).

The unassigned positions are U+2072, U+2073, U+208F, and U+209D–U+209F. A 2024 proposal sought to add subscript w, y, and z at U+209D–U+209F, along with a subscript gamma for phonetics, which would fill the remaining gap at the end of the block in a future version.
| Category | Code Range | Assigned Count | Key Examples |
|---|---|---|---|
| Superscripts | U+2070–U+207F | 14 | Digits: ⁰ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹; Letters: ⁱ ⁿ; Operators: ⁺ ⁻ ⁼ ⁽ ⁾ |
| Subscripts | U+2080–U+209F | 28 | Digits: ₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉; Operators: ₊ ₋ ₌ ₍ ₎; Letters: ₐ ₑ ₒ ₓ ₔ ₕ ₖ ₗ ₘ ₙ ₚ ₛ ₜ |
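The assigned and unassigned positions in the block can be listed with Python's standard `unicodedata` module. This is a sketch under the assumption that the interpreter's bundled Unicode database is version 6.0 or later; the counts will change if a later Unicode version fills the gaps. `assigned_in_block` is a hypothetical helper name.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x2070, 0x209F


def assigned_in_block(start: int = BLOCK_START, end: int = BLOCK_END) -> list[int]:
    """Return the code points in [start, end] that have a character name."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.name(chr(cp), "") != ""]


assigned = assigned_in_block()
unassigned = [cp for cp in range(BLOCK_START, BLOCK_END + 1) if cp not in assigned]

# Report the database version alongside the count, since the result depends on it.
print("Unicode", unicodedata.unidata_version, "assigned:", len(assigned))
print("unassigned:", " ".join(f"U+{cp:04X}" for cp in unassigned))
```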
The characters in the block fall into several general categories: No (other number) for the digits, Sm (math symbol) for the plus, minus, and equals signs, Ps and Pe (open and close punctuation) for the parentheses, and Lm (modifier letter) for the letters. All of them carry compatibility decomposition mappings, which are applied during NFKD (Normalization Form Compatibility Decomposition) and NFKC (Normalization Form Compatibility Composition) but not during the canonical forms NFD and NFC. Each mapping consists of a formatting tag followed by the base character. For example, superscript n (U+207F ⁿ) has the compatibility decomposition <super> 006E (the base Latin small letter n), and subscript e (U+2091 ₑ) has <sub> 0065. Such decompositions help interoperability with systems lacking native superscript or subscript support, such as legacy word processors, at the cost of discarding the raised or lowered position.

The block's initial allocation occurred in Unicode 1.1 (1993), with 28 characters: the superscript and subscript digits and operators, plus superscript n. Later versions added letters for linguistic and scientific needs: Unicode 3.2 (2002) added superscript i (U+2071); Unicode 4.1 (2005) introduced five subscript letters (a, e, o, x, and schwa, U+2090–U+2094); and Unicode 6.0 (2010) added eight more subscript letters (h, k, l, m, n, p, s, and t, U+2095–U+209C), bringing the total to 42. These additions were driven by proposals for better plain-text encoding of phonetic transcriptions and mathematical expressions, and the block has remained unchanged through Unicode 17.0.
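The general categories and tagged decomposition mappings described above can be read from the Unicode Character Database through Python's `unicodedata` module. The sketch below is illustrative only; `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return (name, general category, decomposition mapping) for one character."""
    return (unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.decomposition(ch))


# superscript n, subscript e, subscript two, subscript plus, subscript left paren
for ch in "\u207f\u2091\u2082\u208a\u208d":
    name, category, mapping = describe(ch)
    print(f"U+{ord(ch):04X}  {category}  {mapping:<14}{name}")
```

The mapping string keeps the `<super>` or `<sub>` tag, which is how the database distinguishes these compatibility mappings from untagged canonical ones.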

Supplementary Superscript and Subscript Characters

Supplementary superscript and subscript characters are encoded outside the dedicated Superscripts and Subscripts block (U+2070–U+209F), appearing in various Unicode blocks to support linguistic and typographic needs across scripts. These forms extend the range of available modifiers, either as spacing modifier letters that follow a base character or as non-spacing marks that attach to it.

The Spacing Modifier Letters block (U+02B0–U+02FF) houses a collection of superscript modifier letters, primarily for phonetic transcription, such as ʰ (U+02B0, modifier letter small h) and ʲ (U+02B2, modifier letter small j). Fourteen characters in this block (U+02B0–U+02B8 and U+02E0–U+02E4) carry a <super> compatibility decomposition. The block also includes marks that sit low on the line, for example ˌ (U+02CC, modifier letter low vertical line), but no true subscript letters.

In the Combining Diacritical Marks block (U+0300–U+036F), several non-spacing marks attach below base characters, such as ̱ (U+0331, combining macron below) or ̖ (U+0316, combining grave accent below). These are diacritics rather than subscripts: they place a mark beneath the base letter but do not lower or shrink the letter itself.

Phonetic extensions provide further supplementary forms, notably in the Phonetic Extensions block (U+1D00–U+1D7F), which includes subscript letters like ᵢ (U+1D62, Latin subscript small letter i) and superscript variants such as ᵝ (U+1D5D, modifier letter small beta). Some other scripts encode their own raised or lowered marks; Kayah Li (U+A900–U+A92F), for example, has tone marks at U+A92B–U+A92D that are combining characters attached to the base letter. These supplementary characters act as fallbacks for forms missing from the core block, particularly for non-Latin scripts.

The Mathematical Alphanumeric Symbols block (starting at U+1D400) is sometimes mentioned in this context, but it contains styled letters and digits (bold, italic, script, and similar) rather than superscript or subscript forms. Many supplementary characters fall under the Lm (Letter, modifier) general category. Their encoding includes compatibility decompositions for normalization; for instance, ᵦ (U+1D66, Greek subscript small letter beta) decomposes to <sub> followed by U+03B2 (Greek small letter beta).
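One way to see how many superscript and subscript forms live outside the core block is to scan the whole code space for `<super>` and `<sub>` decomposition tags. The sketch below assumes Python's bundled `unicodedata` database, so the totals depend on the interpreter's Unicode version; `script_variants` is a hypothetical helper name. Characters without a tagged decomposition, such as combining marks, are not counted.

```python
import sys
import unicodedata
from collections import Counter


def script_variants() -> dict[int, str]:
    """Map each code point tagged <super> or <sub> to its tag."""
    found = {}
    for cp in range(sys.maxunicode + 1):
        mapping = unicodedata.decomposition(chr(cp))
        if mapping.startswith(("<super>", "<sub>")):
            found[cp] = mapping.split()[0]
    return found


variants = script_variants()
outside_core = {cp: tag for cp, tag in variants.items()
                if not 0x2070 <= cp <= 0x209F}

print("total tagged characters:", len(variants))
print("outside U+2070..U+209F:", len(outside_core))
print(Counter(outside_core.values()))
```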

Character Inventories

Latin

The Unicode Standard provides extensive support for superscript and subscript forms of Latin letters, primarily through the Superscripts and Subscripts block (U+2070–U+209F) for core compatibility characters and the Phonetic Extensions block (U+1D00–U+1D7F) for additional modifier forms used in linguistic and phonetic contexts. Superscripts mostly serve as modifier letters, while subscripts appear in chemical and mathematical notation. Compatibility decompositions map most of these characters to their base letters with a <super> or <sub> tag, which keeps them interoperable with legacy systems. The core block contains 2 superscript letters and 13 subscript letters, while the Phonetic Extensions block adds 49 superscript modifier letters (U+1D2C–U+1D5C) and 4 subscript letters (U+1D62–U+1D65).
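| Code Point | Glyph | Name | Decomposition |
|---|---|---|---|
| U+2071 | ⁱ | SUPERSCRIPT LATIN SMALL LETTER I | <super> 0069 |
| U+207F | ⁿ | SUPERSCRIPT LATIN SMALL LETTER N | <super> 006E |
| U+2080 | ₀ | SUBSCRIPT ZERO | <sub> 0030 |
| U+2090 | ₐ | LATIN SUBSCRIPT SMALL LETTER A | <sub> 0061 |
| U+2091 | ₑ | LATIN SUBSCRIPT SMALL LETTER E | <sub> 0065 |
| U+2092 | ₒ | LATIN SUBSCRIPT SMALL LETTER O | <sub> 006F |
| U+2093 | ₓ | LATIN SUBSCRIPT SMALL LETTER X | <sub> 0078 |
| U+2094 | ₔ | LATIN SUBSCRIPT SMALL LETTER SCHWA | <sub> 0259 |
| U+2095 | ₕ | LATIN SUBSCRIPT SMALL LETTER H | <sub> 0068 |
| U+2096 | ₖ | LATIN SUBSCRIPT SMALL LETTER K | <sub> 006B |
| U+2097 | ₗ | LATIN SUBSCRIPT SMALL LETTER L | <sub> 006C |
| U+2098 | ₘ | LATIN SUBSCRIPT SMALL LETTER M | <sub> 006D |
| U+2099 | ₙ | LATIN SUBSCRIPT SMALL LETTER N | <sub> 006E |
| U+209A | ₚ | LATIN SUBSCRIPT SMALL LETTER P | <sub> 0070 |
| U+209B | ₛ | LATIN SUBSCRIPT SMALL LETTER S | <sub> 0073 |
| U+209C | ₜ | LATIN SUBSCRIPT SMALL LETTER T | <sub> 0074 |
| U+1D2E | ᴮ | MODIFIER LETTER CAPITAL B | <super> 0042 |
| U+1D34 | ᴴ | MODIFIER LETTER CAPITAL H | <super> 0048 |
| U+1D3B | ᴻ | MODIFIER LETTER CAPITAL REVERSED N | none |
| U+1D43 | ᵃ | MODIFIER LETTER SMALL A | <super> 0061 |
| U+1D46 | ᵆ | MODIFIER LETTER SMALL TURNED AE | <super> 1D02 |
| U+1D47 | ᵇ | MODIFIER LETTER SMALL B | <super> 0062 |
| U+1D48 | ᵈ | MODIFIER LETTER SMALL D | <super> 0064 |
| U+1D4B | ᵋ | MODIFIER LETTER SMALL OPEN E | <super> 025B |
| U+1D4D | ᵍ | MODIFIER LETTER SMALL G | <super> 0067 |
| U+1D4F | ᵏ | MODIFIER LETTER SMALL K | <super> 006B |
| U+1D50 | ᵐ | MODIFIER LETTER SMALL M | <super> 006D |
| U+1D51 | ᵑ | MODIFIER LETTER SMALL ENG | <super> 014B |
| U+1D52 | ᵒ | MODIFIER LETTER SMALL O | <super> 006F |
| U+1D54 | ᵔ | MODIFIER LETTER SMALL TOP HALF O | <super> 1D16 |
| U+1D56 | ᵖ | MODIFIER LETTER SMALL P | <super> 0070 |
| U+1D57 | ᵗ | MODIFIER LETTER SMALL T | <super> 0074 |
| U+1D58 | ᵘ | MODIFIER LETTER SMALL U | <super> 0075 |
| U+1D5B | ᵛ | MODIFIER LETTER SMALL V | <super> 0076 |
| U+1D62 | ᵢ | LATIN SUBSCRIPT SMALL LETTER I | <sub> 0069 |
| U+1D63 | ᵣ | LATIN SUBSCRIPT SMALL LETTER R | <sub> 0072 |
| U+1D64 | ᵤ | LATIN SUBSCRIPT SMALL LETTER U | <sub> 0075 |
| U+1D65 | ᵥ | LATIN SUBSCRIPT SMALL LETTER V | <sub> 0076 |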
Note: The table lists representative forms only; the full inventory exceeds 50 characters across blocks. Most entries have a compatibility decomposition to a tagged base letter.
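Because each decomposition names a single base character, the mappings can be inverted to build lookup tables from plain characters to their subscript or superscript forms. The following is a minimal sketch using Python's `unicodedata` module; `build_table`, `SUBSCRIPT`, and `SUPERSCRIPT` are hypothetical names, and where several variants share a base character the lowest code point is kept, which is an arbitrary choice.

```python
import sys
import unicodedata


def build_table(tag: str) -> dict[int, int]:
    """Build a str.translate table from base characters to `tag` variants."""
    table: dict[int, int] = {}
    for cp in range(sys.maxunicode + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep single-character mappings only, e.g. "<sub> 0069".
        if len(parts) == 2 and parts[0] == tag:
            table.setdefault(int(parts[1], 16), cp)
    return table


SUBSCRIPT = build_table("<sub>")
SUPERSCRIPT = build_table("<super>")

# Translate only the index or exponent, not the base symbol.
print("x" + "12".translate(SUBSCRIPT))
print("x" + "n+1".translate(SUPERSCRIPT))
```

Characters with no encoded variant, such as a subscript q, are simply left unchanged by `str.translate`, which makes the gaps in the inventory easy to spot.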

Greek

Support for Greek superscripts and subscripts is limited compared to Latin, with dedicated characters appearing mainly in the Phonetic Extensions block (U+1D00–U+1D7F) as modifier letters for linguistic use. Superscripts include small forms of beta, gamma, delta, phi, and chi, plus theta in the Phonetic Extensions Supplement block. Subscripts are limited to five letters. Combining marks such as the combining macron below (U+0331) can place a mark under a base Greek letter (e.g., α̱), but this does not produce a true subscript, so other Greek subscripts require markup. Key characters number around 10, focusing on phonetic modifiers rather than general use. No core block (U+2070–U+209F) entries exist for Greek letters.
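| Code Point | Glyph | Name | Decomposition |
|---|---|---|---|
| U+1D5D | ᵝ | MODIFIER LETTER SMALL BETA | <super> 03B2 |
| U+1D5E | ᵞ | MODIFIER LETTER SMALL GREEK GAMMA | <super> 03B3 |
| U+1D5F | ᵟ | MODIFIER LETTER SMALL DELTA | <super> 03B4 |
| U+1D60 | ᵠ | MODIFIER LETTER SMALL GREEK PHI | <super> 03C6 |
| U+1D61 | ᵡ | MODIFIER LETTER SMALL CHI | <super> 03C7 |
| U+1DBF | ᶿ | MODIFIER LETTER SMALL THETA | <super> 03B8 |
| U+1D66 | ᵦ | GREEK SUBSCRIPT SMALL LETTER BETA | <sub> 03B2 |
| U+1D67 | ᵧ | GREEK SUBSCRIPT SMALL LETTER GAMMA | <sub> 03B3 |
| U+1D68 | ᵨ | GREEK SUBSCRIPT SMALL LETTER RHO | <sub> 03C1 |
| U+1D69 | ᵩ | GREEK SUBSCRIPT SMALL LETTER PHI | <sub> 03C6 |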
These forms are primarily for phonetic transcription. Under compatibility normalization, each one decomposes to its base Greek letter, with the <super> or <sub> tag recording the original position.

Cyrillic

Cyrillic support for superscripts and subscripts remains sparse in Unicode, with no dedicated characters in the core Superscripts and Subscripts block (U+2070–U+209F). Instead, combining letters in the Cyrillic Extended-A block (U+2DE0–U+2DFF) provide superscript-like forms for Old Church Slavonic texts, where they are written above a base letter. The Cyrillic Extended-D block (U+1E030–U+1E08F), introduced in Unicode 15.0 (2022), adds precomposed characters for phonetic and phonological notation, including superscript modifier letters (e.g., U+1E030–U+1E04A) and subscript small letters (e.g., U+1E051–U+1E067), though font support and adoption are limited as of 2025. Coverage gaps persist, particularly for subscripts in everyday text, which are normally handled with markup.
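| Code Point | Glyph | Name | Decomposition |
|---|---|---|---|
| U+2DE0 | ◌ⷠ | COMBINING CYRILLIC LETTER BE | none (corresponds to 0431) |
| U+2DE3 | ◌ⷣ | COMBINING CYRILLIC LETTER DE | none (corresponds to 0434) |
| U+2DF7 | ◌ⷷ | COMBINING CYRILLIC LETTER IE | none (corresponds to 0435) |
| U+2DE4 | ◌ⷤ | COMBINING CYRILLIC LETTER ZHE | none (corresponds to 0436) |
| U+2DEF | ◌ⷯ | COMBINING CYRILLIC LETTER HA | none (corresponds to 0445) |
| U+2DEE | ◌ⷮ | COMBINING CYRILLIC LETTER TE | none (corresponds to 0442) |
| U+1E030 | 𞀰 | MODIFIER LETTER CYRILLIC SMALL A | <super> 0430 |
| U+1E051 | 𞁑 | CYRILLIC SUBSCRIPT SMALL LETTER A | <sub> 0430 |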
The combining characters function as superscripts in stacked notations, supporting legacy texts but lacking subscript equivalents; the Extended-D forms provide precomposed options for modern phonetic use.

Other alphabetic scripts, such as Armenian (U+0530–U+058F) and Georgian (U+10A0–U+10FF), have minimal dedicated superscript or subscript characters in Unicode. Georgian has a single modifier letter, U+10FC (modifier letter Georgian nar), and Armenian has none. Beyond that, raised or lowered forms in these scripts rely on markup, which limits plain-text use in typographic or scientific contexts compared to Latin. Proposals for expanded support remain under discussion but unimplemented as of 2025.

International Phonetic Alphabet Extensions

The International Phonetic Alphabet (IPA) employs a range of superscript and subscript characters in Unicode to denote phonetic modifications, such as aspiration and syllabicity, enabling precise transcription of speech sounds across languages. These extensions draw primarily on the Spacing Modifier Letters block (U+02B0–U+02FF) for superscripts and the Combining Diacritical Marks block (U+0300–U+036F) for many marks placed below a letter, with additional support from the Superscripts and Subscripts block (U+2070–U+209F). Over 50 such characters are relevant to IPA usage, giving digital tools for linguistic research a standardized encoding for modifiers that alter base symbols.

Consonant superscripts in the U+02B0–U+02E4 range indicate secondary articulations or releases, such as aspiration (ʰ U+02B0, modifier letter small h) or fricative release (ˢ U+02E2, modifier letter small s). These are placed after the base consonant, as in [pʰ] for an aspirated bilabial plosive.

Other superscript modifiers include ʷ (U+02B7, modifier letter small w) for labialization. Marks below the letter are usually combining characters, such as ̥ (U+0325, combining ring below) for voicelessness, as in [i̥] for a voiceless high front vowel.

Length and prosody marks include ː (U+02D0, modifier letter triangular colon) for long duration, as in [aː], and ˑ (U+02D1, modifier letter half triangular colon) for half-long duration. Syllabicity is marked with ̩ (U+0329, combining vertical line below), as in [n̩] for a syllabic nasal. These are essential for suprasegmental features like length and syllable structure.

Non-syllabicity and ties use characters like ̯ (U+032F, combining inverted breve below), which marks a non-syllabic vowel as in [aɪ̯], and the combining double inverted breve (U+0361), which ties two symbols together as in [t͡s]. The glottal stop ʔ is encoded as U+0294 (Latin letter glottal stop) and should not be confused with ʼ (U+02BC, modifier letter apostrophe), which marks ejectives.
| Phonetic Value | Glyph | Code Point | IPA Chart Reference | Notes |
|---|---|---|---|---|
| Aspiration | ʰ | U+02B0 | Pulmonic consonants | Sequential modifier; added in Unicode 1.1. |
| Palatalization | ʲ | U+02B2 | Pulmonic consonants | Alternative to the combining acute accent used in some traditions. |
| Labialization | ʷ | U+02B7 | Vowels & consonants | Marks labialized sounds such as [kʷ]; added in Unicode 1.1. |
| Pharyngealization | ˤ | U+02E4 | Pulmonic consonants | For emphatic consonants; added in Unicode 1.1. |
| Voiceless (below) | ̥ | U+0325 | Vowels & diacritics | Combining ring below; e.g., [ḁ] for a devoiced vowel. |
| Syllabicity (below) | ̩ | U+0329 | Suprasegmentals | Combining vertical line below; for syllabic consonants like [l̩]. |
| Long duration | ː | U+02D0 | Suprasegmentals | Triangular colon; e.g., [aː]. |
| Half-long duration | ˑ | U+02D1 | Suprasegmentals | Half triangular colon; for intermediate lengths. |
| Non-syllabicity | ̯ | U+032F | Suprasegmentals | Combining inverted breve below; e.g., [ɪ̯]. |
| Affricate tie | ◌͡◌ | U+0361 | Pulmonic consonants | Combining double inverted breve; for ties like [t͡ʃ]. |
Recent proposals, such as those made in 2024 for subscript forms of w, y, z, and ɣ, aim to expand the options for marking simultaneous articulations in extIPA, distinguishing them from superscript modifiers, which mark sequential ones. These IPA extensions became more comprehensive for digital linguistics tools with Unicode 5.1 in 2008, which incorporated key modifiers for broad compatibility in software and fonts.
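When working with IPA strings, it helps to see which code points a transcription actually contains, since spacing modifier letters and combining marks behave differently. The sketch below uses Python's `unicodedata` module; `inspect_text` is a hypothetical helper name.

```python
import unicodedata


def inspect_text(text: str) -> list[tuple[str, str, str, int]]:
    """List (code point, category, name, combining class) for each character."""
    return [(f"U+{ord(ch):04X}",
             unicodedata.category(ch),
             unicodedata.name(ch, "<unnamed>"),
             unicodedata.combining(ch))
            for ch in text]


# aspirated p, syllabic n, tied affricate ts
for sample in ("p\u02b0", "n\u0329", "t\u0361s"):
    for row in inspect_text(sample):
        print(*row)
    print()
```

Spacing modifiers such as ʰ report category Lm with combining class 0, while the marks written below or across letters report category Mn with a non-zero combining class.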

Advanced Features

Composite and Precomposed Forms

Unicode provides both precomposed characters and combining sequences for building complex superscript and subscript notations in mathematical, phonetic, and typographic text. Precomposed forms are single code points that inherently encode the raised or lowered position, such as U+2082 SUBSCRIPT TWO (₂) for numerical subscripts or U+00B2 SUPERSCRIPT TWO (²) for exponents, many of which originate in legacy encodings. They are particularly useful in plain text, where the offset must be carried by the character itself, as in the Superscripts and Subscripts block (U+2070–U+209F).

Combining sequences, in contrast, attach diacritical marks to a base character. They cannot turn an arbitrary letter into a subscript, but they add marks above or below it: U+0332 COMBINING LOW LINE (a̲) places an underline below the base, and marks can be stacked on precomposed forms, as in a dotted superscript i (xⁱ̇) using U+0307 COMBINING DOT ABOVE on U+2071 SUPERSCRIPT LATIN SMALL LETTER I. Such sequences follow canonical ordering rules so that equivalent sequences render and compare consistently.

For phonetic representations, ties enhance clustering, such as U+0361 COMBINING DOUBLE INVERTED BREVE to connect affricates (t͡s), which visually ties adjacent consonants without altering their base forms. The zero width joiner (U+200D) can additionally request connected rendering in sequences that might otherwise separate.

Normalization affects these forms differently. Under Normalization Form C (NFC), characters like ² remain intact because they have only compatibility decompositions, not canonical ones, so the intended offset is preserved. Normalization Form KC (NFKC), however, applies compatibility decompositions, converting superscripts and subscripts to their base equivalents (e.g., ² to 2). This can simplify searching and storage but discards typographic distinctions: x² becomes x2, and the original cannot be recovered from the normalized text.

Font support for these constructions is good in modern typefaces, with Noto Sans including glyphs for the full Superscripts and Subscripts block. In dynamic text applications, fonts use OpenType's Glyph Positioning table (GPOS) for precise adjustments, and variable fonts (introduced in OpenType 1.8 in 2016) allow that positioning to scale with the design axes.
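The difference between NFC and NFKC described above can be observed with Python's `unicodedata.normalize`. This is a small illustrative sketch; `forms` is a hypothetical helper name.

```python
import unicodedata


def forms(text: str) -> tuple[str, str]:
    """Return the (NFC, NFKC) normalizations of `text`."""
    return (unicodedata.normalize("NFC", text),
            unicodedata.normalize("NFKC", text))


samples = {
    "superscript two": "x\u00b2",
    "subscript digit": "H\u2082O",
    "a + combining ring below": "a\u0325",
}

for label, text in samples.items():
    nfc, nfkc = forms(text)
    print(f"{label}: NFC={nfc!r} NFKC={nfkc!r}")
```

The first two samples survive NFC unchanged and are flattened by NFKC, while the combining sequence is composed into the single code point U+1E01 by both forms, because that equivalence is canonical rather than a compatibility mapping.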

Compatibility Decomposition and Rendering

Unicode superscripts and subscripts belong to the class of compatibility characters in the Unicode Standard, meaning they are subject to compatibility decomposition during normalization processes such as NFKC (Normalization Form Compatibility Composition) and NFKD (Normalization Form Compatibility Decomposition). In these forms, characters like subscript zero (U+2080, ₀) map to their base equivalents, such as the plain digit zero (U+0030, 0). The mapping comes from the decomposition field of the Unicode Character Database, which exists largely for round-trip compatibility with legacy encodings. Canonical decomposition, used in NFC and NFD, does not apply these mappings, so the distinct superscript and subscript forms are preserved.

Rendering these characters presents challenges due to inconsistent font support across systems, often requiring font fallback to display glyphs properly. A given font may omit glyphs for less frequently used subscript letters, leading the rendering engine to substitute from a fallback font with broader coverage, which can produce mismatched sizes or baselines. Newly accepted characters raise the same issue: the acceptance in October 2025 of subscript small letters w (U+209D), y (U+209E), z (U+209F), and gamma (U+1DFD0) for Unicode 18.0 will require font updates before they display reliably.

In web contexts, the CSS declarations vertical-align: sub and vertical-align: super adjust baseline positioning to produce the visual offset through styling, though the exact shift varies with font metrics and browser implementation. The <sub> and <sup> elements, present since HTML 3.2 (1997) and retained in the HTML 4.01 specification of December 1999, provide semantic markup for such rendering, typically with a default font-size reduction to approximately 83% and appropriate vertical alignment.

Cross-platform variations further complicate rendering, particularly with stylistic interpretations of related characters; for instance, enclosed alphanumeric symbols like circled digit two (U+2461, ②) in the Enclosed Alphanumerics block may appear with a superscript-like size or position in certain fonts or mobile renderers, despite their primary design as circled forms. Incomplete support remains a gap, as seen with the Latin subscript small letter z (U+209F), which was provisionally assigned in November 2024 and accepted for inclusion in Unicode 18.0 in October 2025, and thus is not part of a published version and lacks widespread font support as of November 2025.

For advanced layout needs, OpenType MATH tables allow fonts to define precise positioning, spacing, and variants for mathematical superscripts and subscripts, improving consistency in equation rendering. Unicode Technical Report #25, updated in October 2025, outlines recommendations for math-specific handling of these characters, emphasizing properties like italic correction and script style to avoid clashes in subscript placement, such as the tail of an italic lowercase z colliding with an adjacent subscript. Developers can test normalization behavior using established libraries like International Components for Unicode (ICU), which implements all Unicode normalization forms.
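Where font coverage is unreliable, one option is to convert Unicode superscripts and subscripts into `<sup>` and `<sub>` markup before display. The sketch below derives the conversion from the tagged decompositions in Python's `unicodedata` module; `split_char` and `to_markup` are hypothetical names. Note the assumption that every `<super>`-tagged character should be converted, which also affects phonetic modifier letters such as ʰ and symbols such as ™, so a real converter would likely restrict the set.

```python
import html
import unicodedata
from itertools import groupby

TAGS = {"<super>": "sup", "<sub>": "sub"}


def split_char(ch: str) -> tuple[str, str]:
    """Return (HTML tag name or "", base text) for one character."""
    parts = unicodedata.decomposition(ch).split()
    if parts and parts[0] in TAGS:
        base = "".join(chr(int(p, 16)) for p in parts[1:])
        return TAGS[parts[0]], base
    return "", ch


def to_markup(text: str) -> str:
    """Replace runs of Unicode super/subscripts with <sup>/<sub> elements."""
    out = []
    for tag, run in groupby(map(split_char, text), key=lambda pair: pair[0]):
        body = html.escape("".join(base for _, base in run))
        out.append(f"<{tag}>{body}</{tag}>" if tag else body)
    return "".join(out)


print(to_markup("H\u2082SO\u2084"))
print(to_markup("x\u207f\u207a\u00b9"))
```

Adjacent characters with the same tag are grouped into one element, so an exponent such as ⁿ⁺¹ becomes a single `<sup>` run rather than three.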
