Unicode compatibility characters

Wikipedia

In Unicode and the Universal Character Set, a compatibility character is a character that is encoded solely to maintain round-trip convertibility with other, often older standards. According to the Unicode Glossary:

A character that would not have been encoded except for compatibility and round-trip convertibility with other standards.

Although the term compatibility appears in character names, compatibility is not itself recorded as a distinct character property, and in practice the definition is more complex than the glossary suggests. One of the properties that the Unicode Consortium assigns to characters is decomposition, which includes compatibility decomposition. More than five thousand characters have a compatibility decomposition mapping that relates the compatibility character to one or more other UCS characters. By assigning a compatibility decomposition to a character, Unicode effectively designates it as a compatibility character.

The reasons for assigning compatibility status vary and are discussed in more detail below. The term decomposition can be confusing, because in some cases a character’s decomposition consists of a single character. In such cases, the decomposition maps one character to another that is approximately—but not canonically—equivalent.
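As a rough illustration of a single-character decomposition, the sketch below uses Python's standard unicodedata module. The choice of U+2160 ROMAN NUMERAL ONE as the sample and the helper name describe are assumptions made for this example only.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the raw decomposition mapping of one character ('' if none)."""
    return unicodedata.decomposition(ch)

# U+2160 ROMAN NUMERAL ONE decomposes to a single character, U+0049 (Latin 'I').
roman_one = "\u2160"
mapping = describe(roman_one)

# Canonical normalization (NFD) leaves the character unchanged, because the
# mapping is a compatibility mapping rather than a canonical one.
nfd = unicodedata.normalize("NFD", roman_one)

# Compatibility normalization (NFKD) applies the mapping.
nfkd = unicodedata.normalize("NFKD", roman_one)

print(repr(mapping))
print(nfd == roman_one, nfkd == "I")
```

The two normalization calls show the practical meaning of "approximately, but not canonically, equivalent": only the compatibility forms (NFKD, NFKC) replace the Roman numeral with the Latin letter.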

The compatibility decomposition property of the 5,402 Unicode compatibility characters includes a keyword that divides them into 17 logical groups. Characters that have a decomposition without a keyword are termed canonically decomposable characters and are not compatibility characters. Keywords for compatibility decomposable characters include: <font>, <initial>, <medial>, <final>, <isolated>, <wide>, <narrow>, <small>, <square>, <vertical>, <circle>, <noBreak>, <fraction>, <sub>, <super>, and <compat>. These keywords give some indication of how the compatibility character relates to its compatibility decomposition character sequence. Compatibility characters fall into three basic categories: characters that encode alternate glyph forms or precomposed combinations, rich text variants of other characters, and semantically distinct characters whose glyphs resemble those of other characters.
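The keyword convention can be inspected with a short sketch, again assuming Python's unicodedata module. The function names compat_keyword and tally_keywords are hypothetical, and the counts depend on the Unicode version bundled with the Python build, so they need not match the figures quoted above.

```python
import sys
import unicodedata
from collections import Counter
from typing import Optional

def compat_keyword(ch: str) -> Optional[str]:
    """Return the keyword (e.g. '<wide>') of a compatibility mapping.

    Returns None when the character has no decomposition at all, or only a
    canonical one (a mapping that carries no keyword).
    """
    mapping = unicodedata.decomposition(ch)
    if mapping.startswith("<"):
        return mapping.split(" ", 1)[0]
    return None

def tally_keywords() -> Counter:
    """Count compatibility mappings per keyword over every code point."""
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        keyword = compat_keyword(chr(cp))
        if keyword is not None:
            counts[keyword] += 1
    return counts

if __name__ == "__main__":
    for keyword, n in sorted(tally_keywords().items()):
        print(f"{keyword:12} {n}")
```

For instance, compat_keyword("\uff21") (FULLWIDTH LATIN CAPITAL LETTER A) yields the <wide> keyword, while compat_keyword("\u00e9") returns None because the mapping of U+00E9 is canonical.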

Because these semantically distinct characters may be displayed with glyphs similar to those of other characters, text processing software should try to address possible confusion for the sake of end users. When text strings are compared and collated (sorted), different forms and rich text variants of characters should not alter the results. For example, users may be confused when a 'find' on a page for the capital Latin letter 'I' fails to match the visually similar Roman numeral 'Ⅰ'.
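One possible way to make such a search tolerant of compatibility variants is to fold both strings with NFKC before comparing. This is a minimal sketch, not a full search implementation; fold_compat, contains, and the sample page text are hypothetical names and data.

```python
import unicodedata

def fold_compat(text: str) -> str:
    """Fold compatibility variants into their NFKC form."""
    return unicodedata.normalize("NFKC", text)

def contains(haystack: str, needle: str) -> bool:
    """Substring test that ignores compatibility differences."""
    return fold_compat(needle) in fold_compat(haystack)

# Sample text that uses U+2160 ROMAN NUMERAL ONE instead of Latin 'I'.
page = "Chapter \u2160: Overview"

plain_hit = "I" in page            # naive code point comparison
folded_hit = contains(page, "I")   # comparison after NFKC folding

print(plain_hit, folded_hit)
```

Folding discards every compatibility distinction, including ones a user may care about (a superscript digit becomes an ordinary digit), so a sketch like this suits matching rather than storage. Offsets in the folded text also do not necessarily line up with offsets in the original.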

Some compatibility characters are completely dispensable for text processing and display software that conforms to the Unicode standard. These include:

The UCS, the Unicode character properties, and the Unicode algorithms give software implementations everything needed to display these characters correctly from their decomposition equivalents, which makes the decomposable compatibility characters redundant. Their presence in the character set requires extra text processing to ensure that text is compared and collated properly (see Unicode normalization). Moreover, these compatibility characters carry no additional or distinct semantics, and they provide no visually distinct rendering as long as the text layout and fonts conform to Unicode. Nor is any of them required for round-trip convertibility to other character sets, since a converter can easily map decomposed characters to their precomposed counterparts in another character set. Similarly, a contextual form, such as a final Arabic letter, can be mapped to the appropriate legacy form character based on its position within a word.
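The round-trip argument can be sketched for the precomposed case, assuming Latin-1 as the legacy character set and Python's unicodedata module. The helper name to_legacy and the sample word are illustrative assumptions; contextual Arabic forms would need position-aware logic that is not shown here.

```python
import unicodedata

def to_legacy(text: str, encoding: str = "latin-1") -> bytes:
    """Compose the text (NFC), then encode it to a legacy character set."""
    return unicodedata.normalize("NFC", text).encode(encoding)

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: a fully decomposed spelling.
decomposed = "cafe\u0301"

# Composition yields U+00E9, which Latin-1 encodes as the single byte 0xE9.
legacy = to_legacy(decomposed)

# Decoding and decomposing again recovers the original code point sequence.
round_trip = unicodedata.normalize("NFD", legacy.decode("latin-1"))

print(legacy, round_trip == decomposed)
```

Encoding the decomposed string directly would fail, because U+0301 has no Latin-1 equivalent; the composition step is what maps the sequence to its precomposed counterpart.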
