Hubbry Logo
search
logo
1228886

Extended ASCII

logo
Community Hub0 Subscribers
Write something...
Be the first to start a discussion here.
Be the first to start a discussion here.
See all
Extended ASCII

Extended ASCII is a repertoire of character encodings that include (most of) the original 96 ASCII character set, plus up to 128 additional characters. There is no formal definition of "extended ASCII", and even use of the term is sometimes criticized, because it can be mistakenly interpreted to mean that the American National Standards Institute (ANSI) had updated its ANSI X3.4-1986 standard to include more characters, or that the term identifies a single unambiguous encoding, neither of which is the case.

The ISO standard ISO 8859 was the first international standard to formalise a (limited) expansion of the ASCII character set: of the many language variants it encoded, ISO 8859-1 ("ISO Latin 1") – which supports most Western European languages – is best known in the West. There are many other extended ASCII encodings (more than 220 DOS and Windows codepages). EBCDIC ("the other" major character code) likewise developed many extended variants (more than 186 EBCDIC codepages) over the decades.

All modern operating systems use Unicode which supports thousands of characters. However, extended ASCII remains important in the history of computing, and supporting multiple extended ASCII character sets required software to be written in ways that made it much easier to support the UTF-8 encoding method later on.

ASCII was designed in the 1960s for teleprinters and telegraphy, and some computing. Early teleprinters were electromechanical, having no microprocessor and just enough electromechanical memory to function. They fully processed one character at a time, returning to an idle state immediately afterward; this meant that any control sequences had to be only one character long, and thus a large number of codes needed to be reserved for such controls. They were typewriter-derived impact printers, and could only print a fixed set of glyphs, which were cast into a metal type element or elements; this also encouraged a minimum set of glyphs.

Seven-bit ASCII improved over prior five- and six-bit codes. Of the 27=128 codes, 33 were used for controls, and 95 carefully selected printable characters (94 glyphs and one space), which include the English alphabet (uppercase and lowercase), digits, and 31 punctuation marks and symbols: all of the symbols on a standard US typewriter plus a few selected for programming tasks. Some popular peripherals only implemented a 64-printing-character subset: Teletype Model 33 could not transmit "a" through "z" or five less-common symbols (`, {, |, }, and ~). and when they received such characters they instead printed "A" through "Z" (forced all caps) and five other mostly-similar symbols (@, [, \, ], and ^).

The ASCII character set is barely large enough for US English use, lacks many glyphs common in typesetting, and is far too small for universal use. Many more letters and symbols are desirable, useful, or required to directly represent letters of alphabets other than English, more kinds of punctuation and spacing, more mathematical operators and symbols (× ÷ ⋅ ≠ ≥ ≈ π etc.), some unique symbols used by some programming languages, ideograms, logograms, box-drawing characters, etc.

The biggest problem for computer users around the world was the needs of their local alphabets. ASCII's English alphabet almost accommodates European languages, if accented letters are written without accents or two-character approximations, such as ss for ß, are used. Modified local variants of 7-bit ASCII appeared promptly, trading some lesser-used symbols for highly desired symbols or letters, such as replacing # with £ on UK Teletypes, \ with ¥ in Japan or in Korea, etc. At least 29 variant sets resulted. Twelve codepoints were modified by at least one national set, leaving only 82 "invariant" codes. Programming languages however had assigned meaning to many of the replaced characters, work-arounds were devised such as C three-character sequences ??< and ??> to represent { and }. Languages with dissimilar basic alphabets could use transliteration, such as replacing all the Latin letters with the closest match Cyrillic letters (resulting in odd but somewhat readable text when English was printed in Cyrillic or vice versa). Schemes were also devised so that two letters could be overprinted (often with the backspace control between them) to produce accented letters. Users were not comfortable with any of these compromises and they were often poorly supported.[citation needed]

When computers and peripherals standardized on eight-bit bytes in the 1970s, it became obvious that computers and software could handle text that uses 256-character sets at almost no additional cost in programming, and no additional cost for storage (assuming that the unused 8th bit of each byte was not reused in some way, such as error checking, Boolean fields, or packing 8 characters into 7 bytes). This would allow ASCII to be used unchanged and provide 128 more characters. Many manufacturers devised 8-bit character sets consisting of ASCII plus up to 128 of the unused codes. Thus encodings which covered all the major Western European (and Latin American) languages and more could be made.

See all
User Avatar
No comments yet.