Variant form (Unicode)
A variant form is an alternate glyph for a character, encoded in Unicode through the mechanism of variation sequences: sequences in Unicode that consist of a base character followed by a variation selector character.
A variant form usually has an appearance and meaning very similar to those of its base form. The mechanism is intended for cases where, if the variant form is unavailable, displaying the base character instead does not change the meaning of the text and may not even be noticeable to many readers.
Unicode defines two types of variation sequences:
- Standardized variation sequences defined in StandardizedVariants.txt[1]
- Ideographic variation sequences defined in the Ideographic Variation Database (IVD)[2][3]
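As a rough illustration of how such a data file can be consumed, the sketch below parses lines laid out like those in StandardizedVariants.txt: semicolon-separated fields, with the variation sequence first, a description of the intended appearance second, and a trailing `#` comment. The two sample lines are modeled on that layout and are only an excerpt for demonstration; the names `VariationSequence`, `SAMPLE`, and `parse_variation_lines` are hypothetical, and the real file should be consulted for the authoritative format.

```python
from typing import List, NamedTuple


class VariationSequence(NamedTuple):
    base: int          # code point of the base character
    selector: int      # code point of the variation selector
    description: str   # description of the intended appearance


# Illustrative sample only, modeled on the layout of StandardizedVariants.txt.
SAMPLE = """\
# comment lines and blank lines are ignored

0030 FE00; short diagonal stroke form; # DIGIT ZERO
2205 FE00; zero with long diagonal stroke overlay form; # EMPTY SET
"""


def parse_variation_lines(text: str) -> List[VariationSequence]:
    """Parse 'BASE SELECTOR; description; ...' lines into tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        base_hex, selector_hex = fields[0].split()
        entries.append(
            VariationSequence(int(base_hex, 16), int(selector_hex, 16), fields[1])
        )
    return entries


for entry in parse_variation_lines(SAMPLE):
    print(f"U+{entry.base:04X} U+{entry.selector:04X}: {entry.description}")
```

The parser assumes exactly two code points in the first field, which matches the base-plus-selector shape described above; any further fields on a line are ignored.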
Variation selector characters reside in several Unicode blocks:
- Variation Selectors (16 characters abbreviated VS1–VS16)
- Variation Selectors Supplement (240 characters abbreviated VS17–VS256)
- Mongolian (4 characters abbreviated FVS1–FVS4)
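The three groups of selectors above can be recognized by code point range alone. The following is a minimal sketch, assuming the ranges U+FE00–U+FE0F (VS1–VS16), U+E0100–U+E01EF (VS17–VS256), and U+180B–U+180D plus U+180F (FVS1–FVS4); the function names `selector_label` and `find_variation_sequences` are hypothetical. It only detects the base-plus-selector shape and does not check whether a given sequence is actually defined in StandardizedVariants.txt or the IVD.

```python
from typing import List, Optional, Tuple


def selector_label(ch: str) -> Optional[str]:
    """Return 'VS1'..'VS256' or 'FVS1'..'FVS4' for a selector, else None."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:        # Variation Selectors block
        return f"VS{cp - 0xFE00 + 1}"
    if 0xE0100 <= cp <= 0xE01EF:      # Variation Selectors Supplement block
        return f"VS{cp - 0xE0100 + 17}"
    if 0x180B <= cp <= 0x180D:        # Mongolian free variation selectors 1-3
        return f"FVS{cp - 0x180B + 1}"
    if cp == 0x180F:                  # Mongolian free variation selector 4
        return "FVS4"
    return None


def find_variation_sequences(text: str) -> List[Tuple[str, str]]:
    """List (base character, selector label) for each selector that follows a non-selector."""
    pairs = []
    for prev, ch in zip(text, text[1:]):
        label = selector_label(ch)
        if label is not None and selector_label(prev) is None:
            pairs.append((prev, label))
    return pairs


print(find_variation_sequences("#\uFE0F and \u8FBB\U000E0100"))
```

Iterating over a Python `str` yields whole code points, so supplementary-plane selectors such as U+E0100 are handled without any surrogate-pair logic.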
Variation selectors are not required for Arabic and Latin cursive characters, where substitution of glyphs can occur based on context: glyphs may be connected together depending on whether the character is the initial character in a word, the final character, a medial character or an isolated character. These types of glyph substitution are easily handled by the context of the character with no other authoring input involved. Authors may also use special-purpose characters such as joiners and non-joiners to force an alternate form of glyph where it would not otherwise appear. Ligatures are similar instances where glyphs may be substituted simply by turning ligatures on or off as a rich text attribute.
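The joiner and non-joiner mentioned above are ordinary characters stored in the text, namely U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH NON-JOINER. The short sketch below only shows how they appear in a string; the actual change of shape happens in the rendering engine and font, which this code does not model. The helper name `describe` is hypothetical.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER: asks for no cursive connection
ZWJ = "\u200D"   # ZERO WIDTH JOINER: asks for a cursive connection


def describe(text: str) -> list:
    """Return the Unicode character name of each code point in the text."""
    return [unicodedata.name(ch, f"U+{ord(ch):04X}") for ch in text]


# Two Arabic letters BEH: normally connected by contextual shaping ...
joined = "\u0628\u0628"
# ... and the same letters with a non-joiner inserted by the author.
separated = "\u0628" + ZWNJ + "\u0628"

print(describe(joined))
print(describe(separated))
```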
For other glyph substitution, the author's intent may need to be encoded with the text and cannot be determined contextually. This is the case with the characters or glyphs referred to as gaiji, where different glyphs are used for the same character, either for historical reasons or in ideographs for family names. This is one of the gray areas in distinguishing between a glyph and a character: if a family name differs slightly from the ideograph it derives from, is that a simple glyph variant or a character variant?
Character substitutions may also occur outside of Unicode, for example with OpenType Layout tags.[4]
Blocks with standardized variation sequences
As of Unicode version 17.0, standardized variation sequences specifically for emoji/text presentation are defined for base characters in 20 blocks:[1]
- Arrows
- Basic Latin
- CJK Symbols and Punctuation
- Dingbats
- Emoticons
- Enclosed Alphanumeric Supplement
- Enclosed Alphanumerics
- Enclosed CJK Letters and Months
- Enclosed Ideographic Supplement
- General Punctuation
- Geometric Shapes
- Latin-1 Supplement
- Letterlike Symbols
- Mahjong Tiles
- Miscellaneous Symbols
- Miscellaneous Symbols and Arrows
- Miscellaneous Symbols and Pictographs
- Miscellaneous Technical
- Supplemental Arrows-B
- Transport and Map Symbols
Other standardized variation sequences are formed with base characters in the following sixteen blocks:[1]
- CJK Unified Ideographs
- CJK Unified Ideographs Extension A
- CJK Unified Ideographs Extension B
- Egyptian Hieroglyph Format Controls
- Egyptian Hieroglyphs
- Egyptian Hieroglyphs Extended-A
- Halfwidth and Fullwidth Forms
- Manichaean
- Mathematical Alphanumeric Symbols
- Mathematical Operators
- Miscellaneous Mathematical Symbols-B
- Mongolian
- Myanmar
- Myanmar Extended-A
- Phags-pa
- Supplemental Mathematical Operators
Blocks with ideographic variation sequences
As of 14 July 2025, ideographic variation sequences are defined for base characters in eleven blocks:[2][3]
- CJK Compatibility Ideographs
- CJK Unified Ideographs
- CJK Unified Ideographs Extension A
- CJK Unified Ideographs Extension B
- CJK Unified Ideographs Extension C
- CJK Unified Ideographs Extension D
- CJK Unified Ideographs Extension E
- CJK Unified Ideographs Extension F
- CJK Unified Ideographs Extension G
- CJK Unified Ideographs Extension H
- CJK Unified Ideographs Extension I
References
[1] "UCD: Standardized Variation Sequences". Unicode Consortium.
[2] "Ideographic Variation Database". Unicode Consortium.
[3] "UTS #37, Unicode Ideographic Variation Database". Unicode Consortium.
[4] "Language system tags". Microsoft. 30 September 2022.
Standardized variants, listed in the StandardizedVariants.txt file, cover over 1,000 sequences for elements like punctuation, arrows, and mathematical operators, so that a specific form can be requested consistently across platforms, for example a non-fullwidth left single quotation mark (U+2018 U+FE00). Ideographic variants, managed through the Unicode Ideographic Variation Database (UTS #37), address glyph differences in Han characters for regions like Japan or Korea, with registered sequences in IVD_Sequences.txt to preserve distinctions lost in unification. Emoji variants, outlined in UTS #51 and emoji-variation-sequences.txt, allow toggling between text-style presentation (e.g., U+0023 U+FE0E for a black-and-white "#") and emoji-style presentation (U+0023 U+FE0F for a colorful "#"), supporting over 100 such pairs.[1][2][3][4]
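The text/emoji toggle described above amounts to appending U+FE0E (VS15, text presentation) or U+FE0F (VS16, emoji presentation) to the base character. The sketch below is a minimal illustration under that assumption; `with_presentation` is a hypothetical helper, and it does not verify that the base character is one for which emoji-variation-sequences.txt defines a sequence, nor can it tell how a particular font will render the result.

```python
VS15 = "\uFE0E"  # text presentation selector
VS16 = "\uFE0F"  # emoji presentation selector


def with_presentation(base: str, emoji: bool) -> str:
    """Return base with any trailing VS15/VS16 replaced by the requested selector."""
    stripped = base.rstrip(VS15 + VS16)
    return stripped + (VS16 if emoji else VS15)


def code_points(text: str) -> str:
    """Format a string as space-separated U+XXXX values for inspection."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(code_points(with_presentation("#", emoji=False)))
print(code_points(with_presentation("#", emoji=True)))
```

Stripping an existing selector first keeps the function idempotent, so switching a string back and forth never accumulates selectors.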
Beyond variation selectors, Unicode also encodes some variant forms as separate compatibility characters: the Small Form Variants (U+FE50–U+FE6B) provide reduced-size punctuation (e.g., small comma U+FE50) for legacy compatibility, and the CJK Compatibility Forms (U+FE30–U+FE4F) provide forms for vertical typesetting. Variation sequences and these compatibility characters are left unchanged by the canonical normalization forms NFC and NFD (UAX #15), which avoids unintended glyph changes during text collation or searching; the compatibility forms NFKC and NFKD, by contrast, generally replace such compatibility characters with their ordinary counterparts.[5][6]
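The interaction with normalization can be checked directly with Python's standard `unicodedata` module. The sketch below is illustrative only: it compares one variation sequence ("#" followed by U+FE0F) and one compatibility character (U+FE50 SMALL COMMA) across the four normalization forms. The helper name `normalization_report` is hypothetical, and the results reflect the Unicode data bundled with the Python version in use.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalization_report(text: str) -> dict:
    """Map each normalization form to the normalized version of the text."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


emoji_hash = "#\uFE0F"   # variation sequence: NUMBER SIGN + VS16
small_comma = "\uFE50"   # compatibility character: SMALL COMMA

for label, sample in (("emoji_hash", emoji_hash), ("small_comma", small_comma)):
    for form, result in normalization_report(sample).items():
        changed = "changed" if result != sample else "unchanged"
        print(f"{label} {form}: {changed}")
```

The variation sequence survives all four forms, while the small comma is kept by NFC and NFD but folded to an ordinary comma by NFKC and NFKD, so code that relies on small form variants should avoid compatibility normalization.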
Overall, variant forms enhance Unicode's flexibility for global text interchange, balancing unification of similar glyphs with the need for precise, locale-specific rendering in applications ranging from digital typography to international software localization.[1]
