ISO/IEC 8859-8
| MIME / IANA | ISO-8859-8 |
|---|---|
| Alias(es) | iso-ir-138, hebrew, csISOLatinHebrew[1] |
| Languages | Hebrew, English |
| Standard | ISO/IEC 8859-8, ECMA-121, SI 1311 |
| Classification | extended ASCII, ISO 8859 |
| Based on | DEC Hebrew (8-bit), ISO/IEC 8859-1 |
| Other related encoding | Windows-1255 |
ISO/IEC 8859-8, Information technology — 8-bit single-byte coded graphic character sets — Part 8: Latin/Hebrew alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings. The second and current edition, ISO/IEC 8859-8:1999, replaced the first edition, ISO/IEC 8859-8:1988. It is informally referred to as Latin/Hebrew. ISO/IEC 8859-8 covers all the Hebrew letters, but no Hebrew vowel signs. IBM assigned code page 916 (CCSIDs 916 and 5012) to it.[2][3][4] This character set was also adopted, with some extensions, by Israeli Standard SI 1311:2002.
ISO-8859-8 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. The text is (usually) in logical order, so bidi processing is required for display. Nominally ISO-8859-8 (code page 28598) is for “visual order”, and ISO-8859-8-I (code page 38598) is for logical order. But usually in practice, and required for XML documents,[citation needed] ISO-8859-8 also stands for logical order text. The WHATWG Encoding Standard used by HTML5 treats ISO-8859-8 and ISO-8859-8-I as distinct encodings with the same mapping due to influence on the layout direction, but notes that this no longer applies to ISO-8859-6 (Arabic), only to ISO-8859-8.[5]
There is also ISO-8859-8-E which supposedly requires directionality to be explicitly specified with special control characters; this latter variant is in practice unused.
The Microsoft Windows code page for Hebrew, Windows-1255, is mostly an extension of ISO/IEC 8859-8 without C1 controls, except for the omission of the double underscore, and replacement of the generic currency sign (¤) with the sheqel sign (₪). It adds support for vowel points as combining characters, and some additional punctuation.
Over a decade after the publication of that standard, Unicode is preferred, at least for the Internet[6] (meaning UTF-8, the dominant encoding for web pages). ISO-8859-8 is used by less than 0.1% of websites.[7]
Code page layout
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | | | | | | | | | | | | | | | | |
| 1x | | | | | | | | | | | | | | | | |
| 2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
| 8x | | | | | | | | | | | | | | | | |
| 9x | | | | | | | | | | | | | | | | |
| Ax | NBSP | | ¢ | £ | ¤ | ¥ | ¦ | § | ¨ | © | × | « | ¬ | SHY | ® | ¯ |
| Bx | ° | ± | ² | ³ | ´ | µ | ¶ | · | ¸ | ¹ | ÷ | » | ¼ | ½ | ¾ | |
| Cx | | | | | | | | | | | | | | | | |
| Dx | | | | | | | | | | | | | | | | ‗ |
| Ex | א | ב | ג | ד | ה | ו | ז | ח | ט | י | ך | כ | ל | ם | מ | ן |
| Fx | נ | ס | ע | ף | פ | ץ | צ | ק | ר | ש | ת | | | LRM | RLM | |
FD is the left-to-right mark (U+200E) and FE is the right-to-left mark (U+200F); both were added in the 1999 edition, ISO/IEC 8859-8:1999.
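As a quick illustration of the layout above, the following minimal sketch decodes a few bytes with Python's built-in `iso8859_8` codec and prints the Unicode name of each resulting character. The sample bytes (the word שלום in logical order, followed by a right-to-left mark) are chosen here for illustration and do not come from the standard.

```python
import unicodedata

# Illustrative sample: Shin, Lamed, Vav, Final Mem, then the RLM at 0xFE.
sample = bytes([0xF9, 0xEC, 0xE5, 0xED, 0xFE])

# "iso8859_8" is the name of Python's standard-library codec for this encoding.
text = sample.decode("iso8859_8")

for byte, char in zip(sample, text):
    print(f"0x{byte:02X} -> U+{ord(char):04X} {unicodedata.name(char)}")
```

Each byte maps to exactly one code point, so the byte string and the decoded string have the same length.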
2002 Israeli Standard extensions
Israeli Standard SI1311:2002 matches ISO/IEC 8859-8:1999 except for a number of additional character allocations for the euro sign, new shekel sign and more advanced explicit bidirectional formatting.[12]
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
| Dx | € | ₪ | LRO | RLO | ‗ | |||||||||||
| Ex | א | ב | ג | ד | ה | ו | ז | ח | ט | י | ך | כ | ל | ם | מ | ן |
| Fx | נ | ס | ע | ף | פ | ץ | צ | ק | ר | ש | ת | LRE | RLE | LRM | RLM |
See also
- 8-bit DEC Hebrew (similar DEC code page)
- Code page 1255 (similar Windows code page)
- SI 960
- 7-bit DEC Hebrew
References
1. Character Sets, Internet Assigned Numbers Authority (IANA), 2018-12-12
2. "Code page 916 information document". Archived from the original on 2017-02-16.
3. "CCSID 916 information document". Archived from the original on 2014-11-29.
4. "CCSID 5012 information document". Archived from the original on 2016-03-27.
5. van Kesteren, Anne. "9. Legacy single-byte encodings". Encoding Standard. WHATWG.
   Note: ISO-8859-8 and ISO-8859-8-I are distinct encoding names, because ISO-8859-8 has influence on the layout direction. And although historically this might have been the case for ISO-8859-6 and "ISO-8859-6-I" as well, that is no longer true.
6. John, Nicholas A. (2013). "The Construction of the Multilingual Internet: Unicode, Hebrew, and Globalization". Journal of Computer-Mediated Communication. 18 (3): 321–338. doi:10.1111/jcc4.12015. ISSN 1083-6101.
   Background: the problem of Hebrew and the Internet
7. "Usage Statistics of ISO-8859-8 for Websites, January 2019". w3techs.com. Retrieved 2019-01-17.
8. Code Page CPGID 00916 (pdf) (PDF), IBM
9. Code Page CPGID 00916 (txt), IBM
10. International Components for Unicode (ICU), ibm-916_P100-1995.ucm, 2002-12-03
11. International Components for Unicode (ICU), ibm-5012_P100-1999.ucm, 2002-12-03
12. Standards Institution of Israel. ISO-IR-234: Latin/Hebrew character set for 8-bit codes (PDF). ITSCJ/IPSJ.
External links
- ISO/IEC 8859-8:1999
- Standard ECMA-121 - 8-Bit Single-Byte Coded Graphics Character Sets - Latin/Hebrew Alphabet
- Israeli Standard SI1311:2002 Archived 2005-11-24 at the Wayback Machine (Hebrew)
- ISO-IR registrations:
  - From ECMA-121:1987 and following ISO/IEC 8859-8:1988: European Computer Manufacturers Association (1987-07-31). ISO-IR-138: Latin/Hebrew Alphabet (PDF). ITSCJ/IPSJ.
  - Following ISO/IEC 8859-8:1999 and ECMA-121:2000: Standards Institution of Israel (1998-05-01). ISO-IR-198: Latin/Hebrew Alphabet (PDF). ITSCJ/IPSJ.
  - From SI 1311:2002: Standards Institution of Israel (2004-07-20). ISO-IR-234: Latin/Hebrew character set for 8-bit codes (PDF). ITSCJ/IPSJ.
ISO/IEC 8859-8
Overview
Introduction
ISO/IEC 8859-8 is an international standard defining an 8-bit single-byte coded graphic character set for the Latin/Hebrew alphabet, designed to represent text in English and Hebrew.[1] It extends the 7-bit ASCII character set: the lower 128 code points (0x00 to 0x7F) are unchanged, while the upper 128 (0x80 to 0xFF) carry C1 control codes, a selection of Latin symbols, and the Hebrew letters.[3]
As part of the broader ISO/IEC 8859 series, which consists of ASCII-based 8-bit standards tailored to various scripts and languages, ISO/IEC 8859-8 addresses the needs of Hebrew text in data processing environments.[3] It incorporates the 95 printable ASCII characters (0x20 to 0x7E) and adds 27 Hebrew letters, comprising the 22 basic consonants and their 5 final forms, in the upper range.[6] A key limitation of the standard is that it excludes niqqud (Hebrew vowel signs) and other diacritical marks, making it suitable for unpointed Modern Hebrew text but not for fully vocalized or accented content.[3]
The preferred MIME/IANA name for this character encoding is ISO-8859-8. It is also known by code page identifiers: CP916 in IBM systems and, in Windows, 28598 for the visual ordering variant and 38598 for the logical ordering variant.[7][8] Like other members of the series, it shares the basic Latin repertoire with ISO/IEC 8859-1 but replaces the extended Latin letters with Hebrew ones.[1]
Scope and Purpose
ISO/IEC 8859-8 is designed as an 8-bit single-byte coded graphic character set for information interchange in data processing systems that handle Hebrew text alongside Latin-based scripts such as English. It encodes unpointed Hebrew, enabling compact storage and transmission of bilingual content.[9]
The standard covers the Hebrew letters only, without vowel points (niqqud) or other diacritical marks, and supports English through its Latin subset. This keeps the set compact and suited to Modern Hebrew (Ivrit) text in general office applications, at the cost of excluding pointed Hebrew. Yiddish orthography is not accommodated. In total the set contains 155 graphic characters, and the text it encodes is bidirectional, requiring right-to-left rendering of Hebrew runs.[9]
Key design goals emphasize integration with existing 7-bit systems: the 0x00–0x7F range is retained unchanged, mirroring ISO 646 (International Reference Version), that is, ASCII, while the Hebrew-specific codes are placed in the 0x80–0xFF range. This allows adoption without major changes to existing systems. The base encoding nominally uses visual ordering for Hebrew text, storing characters in the order they appear on the display from left to right, which is the reverse of the logical reading order; a variant, ISO-8859-8-I, uses logical ordering, storing characters in natural reading sequence to make processing in bidirectional environments easier. The standard also supports bidirectional processing through the Left-to-Right Mark (LRM) at 0xFD and the Right-to-Left Mark (RLM) at 0xFE.[9][10]
History
Development and Standardization
ISO/IEC 8859-8 was developed in the 1980s as part of the international effort to create standardized 8-bit single-byte coded graphic character sets for various scripts, including Hebrew. The work was conducted under ISO/IEC Joint Technical Committee 1 (JTC 1), Subcommittee 2 (SC 2) on Coded Character Sets, specifically by Working Group 3 (WG 3), responsible for character sets and internationalization.[11][12] The character set was registered as ISO-IR-138 in 1987.[2] The initiative built on earlier proposals from organizations such as ECMA International's Technical Committee 1 (TC 1), which in February 1984 recommended an 8-bit character set framework to ISO Technical Committee 97 (TC 97)/SC 2, the predecessor of JTC 1/SC 2, following discussions with ANSI/X3L2.[13]
ISO/IEC 8859-8 was first published on June 15, 1988, as the eighth part of the ISO/IEC 8859 series, titled "Information processing — 8-bit single-byte coded graphic character sets — Part 8: Latin/Hebrew alphabet."[14] This edition defined 153 coded graphic characters, two fewer than the current edition because the directional marks at 0xFD and 0xFE were added only in 1999. It focused on the Hebrew alphabet while maintaining compatibility with the ISO 646 International Reference Version (IRV), equivalent to ASCII, in the lower 128 positions (codes 00–7F hex). The design drew structural influence from ISO/IEC 8859-1 (Latin alphabet No. 1), sharing the 7-bit subset to ease interoperability across the 8859 family of standards.[13]
Upon its release, ISO/IEC 8859-8:1988 was jointly adopted as an International Standard by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) through their collaborative JTC 1 framework, a key milestone in supporting Hebrew text processing in information technology systems.[14] The initial edition remained in effect until its withdrawal in 1999, when it was superseded by a revised version.[14]
Revisions and Amendments
The second edition, ISO/IEC 8859-8:1999, was published on January 21, 1999, as a technical revision of the 1988 edition.[1] It added the left-to-right mark (LRM) at code position 0xFD and the right-to-left mark (RLM) at 0xFE, which support bidirectional text rendering in Hebrew and mixed-language environments.[15][16] These formatting characters help resolve direction at the boundaries between left-to-right and right-to-left runs, improving the standard's applicability for international text interchange.[15]
In 2002, the Standards Institution of Israel adopted SI 1311 as a national extension of the 1999 edition, adding characters while maintaining compatibility with the core ISO/IEC 8859-8 structure for Hebrew computing applications.[17]
The 1999 edition remains the current and final revision of ISO/IEC 8859-8, last reviewed and confirmed in 2020 without amendments.[1] The ISO/IEC working group responsible for maintaining the 8859 series was disbanded in 2004 and no further ISO updates have been issued, reflecting the broader transition within the ISO/IEC 8859 family toward Unicode as the preferred universal encoding scheme.[18]
Technical Specifications
Character Set Composition
ISO/IEC 8859-8 defines 155 coded graphic characters: the 95 printable characters of the ASCII repertoire in the range 0x20–0x7E and 60 additional characters in the upper range 0xA0–0xFF, 27 of which are Hebrew letters.[1] The set is structured to support bilingual text processing in English and Hebrew within the 8-bit single-byte framework of the ISO/IEC 8859 series.[1]
The code space divides into several categories. Control characters occupy positions 0x00–0x1F and 0x80–0x9F, in line with ISO/IEC 6429 for formatting and control functions. Basic Latin characters occupy 0x20–0x7E, covering the standard printable ASCII symbols, letters, and numerals. The no-break space sits at 0xA0 and the soft hyphen, which marks a discretionary line break and is normally invisible, at 0xAD; further Latin symbols fill most of the remaining positions up to 0xBE, and the double low line sits at 0xDF.[3] Hebrew letters are assigned to 0xE0–0xFA, forming a compact block for the script's core elements. These categories keep the set compatible with Latin-based systems while extending it to right-to-left Hebrew text.[1]
The Hebrew subset consists of 27 letters from Alef to Tav: the 22 consonants plus the five final forms (Final Kaf, Final Mem, Final Nun, Final Pe, and Final Tsadi). It excludes niqqud (vowel points) and cantillation marks, prioritizing the unpointed text common in modern usage.[3] This selection keeps implementation in legacy systems straightforward.[1]
In terms of Unicode mapping, the Hebrew letters correspond to U+05D0–U+05EA within the Hebrew block. The 1999 revision introduced bidirectional marks to improve handling of mixed Latin-Hebrew content.[3][1]
Encoding Scheme
ISO/IEC 8859-8 employs an 8-bit single-byte encoding scheme, where each character is represented by exactly one byte. This fixed-width approach means that all 256 possible values can be processed uniformly, without variable-length complications, which suited legacy systems and early internationalization efforts. The lower half of the code space, bytes 0x00 through 0x7F, holds the 7-bit US-ASCII controls and basic Latin characters, providing backward compatibility with ASCII-based environments.[1][4]
The upper half (0x80 through 0xFF) is partitioned as follows: 0x80 to 0x9F are designated for C1 control functions, 0xA0 to 0xBE for Latin punctuation and symbols, 0xDF for the double low line, 0xE0 to 0xFA for the Hebrew alphabet, and 0xFD and 0xFE for the bidirectional marks. The remaining positions (0xA1, 0xBF to 0xDE, 0xFB, 0xFC, and 0xFF) are left undefined. This division yields 60 defined graphic characters in the non-ASCII range (0xA0–0xFF).[3][19]
In terms of text directionality, the standard nominally specifies visual ordering: Hebrew segments are stored in the order in which they appear on the display from left to right, the reverse of the logical sequence, so that they can be rendered directly without additional processing. There are no multi-byte sequences in this scheme.[20][4]
Error handling for undefined byte values, such as those in 0xBF to 0xDE or at 0xFB and 0xFC, varies across systems: decoders often map them to control characters or substitute the Unicode replacement character U+FFFD to avoid silent data corruption during conversion.[4][3]
Code Page Layout
Basic Layout
ISO/IEC 8859-8 employs a standard 8-bit single-byte encoding scheme, where each character is represented by a single byte from 0x00 to 0xFF. The lower half (0x00 to 0x7F) conforms exactly to US-ASCII, including the control characters at 0x00 to 0x1F and 0x7F (DELETE) and the printable characters from 0x20 (SPACE) to 0x7E (TILDE).[21]
In the upper half (0x80 to 0xFF), positions 0x80 to 0x9F carry no graphic characters and are typically treated as control characters. The defined graphic characters begin at 0xA0 (NO-BREAK SPACE) and include Latin symbols and punctuation up to 0xBE, with 0xA1 left undefined. A gap follows until 0xDF (DOUBLE LOW LINE), and the dedicated Hebrew block runs from 0xE0 to 0xFA. Positions 0xFB, 0xFC, and 0xFF are undefined. The 1999 edition introduced mappings for 0xFD (LEFT-TO-RIGHT MARK, U+200E) and 0xFE (RIGHT-TO-LEFT MARK, U+200F) to support bidirectional text handling.[21]
The core of the encoding's distinctiveness lies in the Hebrew block (0xE0 to 0xFA), which maps to the 27 Hebrew letters (22 letters plus 5 final forms) at U+05D0 to U+05EA. The code points follow alphabetical order, with each final form immediately preceding its non-final counterpart (e.g., Final Kaf at 0xEA followed by Kaf at 0xEB), so byte 0xE0 + n corresponds to U+05D0 + n. For example, 0xE0 maps to Hebrew Letter Alef (א, U+05D0), while 0xFA maps to Hebrew Letter Tav (ת, U+05EA). The base standard is nominally designed for visual ordering, where Hebrew text runs are stored in reverse order to match left-to-right display without reordering; in practice, many systems treat ISO/IEC 8859-8 as logically ordered.[21]
The following table summarizes the positions in the upper half, focusing on graphic characters; undefined positions are noted as such. Full mappings align with the Unicode standard derivation.[21]
| Hex | Character/Description |
|---|---|
| 0x80-0x9F | Undefined (treated as controls) |
| 0xA0 | NBSP (NO-BREAK SPACE, U+00A0) |
| 0xA1 | Undefined |
| 0xA2 | ¢ (CENT SIGN, U+00A2) |
| 0xA3 | £ (POUND SIGN, U+00A3) |
| 0xA4 | ¤ (CURRENCY SIGN, U+00A4) |
| 0xA5 | ¥ (YEN SIGN, U+00A5) |
| 0xA6 | ¦ (BROKEN BAR, U+00A6) |
| 0xA7 | § (SECTION SIGN, U+00A7) |
| 0xA8 | ¨ (DIAERESIS, U+00A8) |
| 0xA9 | © (COPYRIGHT SIGN, U+00A9) |
| 0xAA | × (MULTIPLICATION SIGN, U+00D7) |
| 0xAB | « (LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, U+00AB) |
| 0xAC | ¬ (NOT SIGN, U+00AC) |
| 0xAD | SHY (SOFT HYPHEN, U+00AD) |
| 0xAE | ® (REGISTERED SIGN, U+00AE) |
| 0xAF | ¯ (MACRON, U+00AF) |
| 0xB0 | ° (DEGREE SIGN, U+00B0) |
| 0xB1 | ± (PLUS-MINUS SIGN, U+00B1) |
| 0xB2 | ² (SUPERSCRIPT TWO, U+00B2) |
| 0xB3 | ³ (SUPERSCRIPT THREE, U+00B3) |
| 0xB4 | ´ (ACUTE ACCENT, U+00B4) |
| 0xB5 | µ (MICRO SIGN, U+00B5) |
| 0xB6 | ¶ (PILCROW SIGN, U+00B6) |
| 0xB7 | · (MIDDLE DOT, U+00B7) |
| 0xB8 | ¸ (CEDILLA, U+00B8) |
| 0xB9 | ¹ (SUPERSCRIPT ONE, U+00B9) |
| 0xBA | ÷ (DIVISION SIGN, U+00F7) |
| 0xBB | » (RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, U+00BB) |
| 0xBC | ¼ (VULGAR FRACTION ONE QUARTER, U+00BC) |
| 0xBD | ½ (VULGAR FRACTION ONE HALF, U+00BD) |
| 0xBE | ¾ (VULGAR FRACTION THREE QUARTERS, U+00BE) |
| 0xBF-0xDE | Undefined |
| 0xDF | ‗ (DOUBLE LOW LINE, U+2017) |
| 0xE0 | א (HEBREW LETTER ALEF, U+05D0) |
| 0xE1 | ב (HEBREW LETTER BET, U+05D1) |
| 0xE2 | ג (HEBREW LETTER GIMEL, U+05D2) |
| 0xE3 | ד (HEBREW LETTER DALET, U+05D3) |
| 0xE4 | ה (HEBREW LETTER HE, U+05D4) |
| 0xE5 | ו (HEBREW LETTER VAV, U+05D5) |
| 0xE6 | ז (HEBREW LETTER ZAYIN, U+05D6) |
| 0xE7 | ח (HEBREW LETTER HET, U+05D7) |
| 0xE8 | ט (HEBREW LETTER TET, U+05D8) |
| 0xE9 | י (HEBREW LETTER YOD, U+05D9) |
| 0xEA | ך (HEBREW LETTER FINAL KAF, U+05DA) |
| 0xEB | כ (HEBREW LETTER KAF, U+05DB) |
| 0xEC | ל (HEBREW LETTER LAMED, U+05DC) |
| 0xED | ם (HEBREW LETTER FINAL MEM, U+05DD) |
| 0xEE | מ (HEBREW LETTER MEM, U+05DE) |
| 0xEF | ן (HEBREW LETTER FINAL NUN, U+05DF) |
| 0xF0 | נ (HEBREW LETTER NUN, U+05E0) |
| 0xF1 | ס (HEBREW LETTER SAMEKH, U+05E1) |
| 0xF2 | ע (HEBREW LETTER AYIN, U+05E2) |
| 0xF3 | ף (HEBREW LETTER FINAL PE, U+05E3) |
| 0xF4 | פ (HEBREW LETTER PE, U+05E4) |
| 0xF5 | ץ (HEBREW LETTER FINAL TSADI, U+05E5) |
| 0xF6 | צ (HEBREW LETTER TSADI, U+05E6) |
| 0xF7 | ק (HEBREW LETTER QOF, U+05E7) |
| 0xF8 | ר (HEBREW LETTER RESH, U+05E8) |
| 0xF9 | ש (HEBREW LETTER SHIN, U+05E9) |
| 0xFA | ת (HEBREW LETTER TAV, U+05EA) |
| 0xFB-0xFC | Undefined (treated as controls) |
| 0xFD | LEFT-TO-RIGHT MARK (U+200E) |
| 0xFE | RIGHT-TO-LEFT MARK (U+200F) |
| 0xFF | Undefined |
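The ranges in this layout are regular enough to express as a small function. The sketch below is one way to do that, not a reference implementation: `iso8859_8_to_codepoint` and `codec_codepoint` are hypothetical helper names introduced here, and the cross-check assumes that Python's built-in `iso8859_8` codec follows the 1999 edition.

```python
from typing import Optional


def iso8859_8_to_codepoint(byte: int) -> Optional[int]:
    """Map one ISO-8859-8 byte to a Unicode code point, or None if unassigned."""
    if not 0x00 <= byte <= 0xFF:
        raise ValueError("expected a single byte value")
    if byte <= 0x9F:                  # ASCII plus C0/C1 controls map to themselves
        return byte
    if byte == 0xA1:                  # unassigned hole in the symbol range
        return None
    if byte == 0xAA:                  # multiplication sign
        return 0x00D7
    if byte == 0xBA:                  # division sign
        return 0x00F7
    if byte <= 0xBE:                  # remaining symbols share Latin-1 positions
        return byte
    if byte == 0xDF:                  # double low line
        return 0x2017
    if 0xE0 <= byte <= 0xFA:          # Hebrew letters: constant offset from Alef
        return 0x05D0 + (byte - 0xE0)
    if byte == 0xFD:                  # left-to-right mark
        return 0x200E
    if byte == 0xFE:                  # right-to-left mark
        return 0x200F
    return None                       # 0xBF-0xDE, 0xFB, 0xFC, 0xFF


def codec_codepoint(byte: int) -> Optional[int]:
    """Same lookup via Python's codec; undefined bytes raise and become None."""
    try:
        return ord(bytes([byte]).decode("iso8859_8"))
    except UnicodeDecodeError:
        return None


# Bytes on which the range rules and the built-in codec disagree.
mismatches = [b for b in range(256)
              if iso8859_8_to_codepoint(b) != codec_codepoint(b)]
# Number of assigned positions in the upper graphic range 0xA0-0xFF.
upper_assigned = sum(iso8859_8_to_codepoint(b) is not None
                     for b in range(0xA0, 0x100))
print(mismatches, upper_assigned)
```

Counting the assigned positions in 0xA0–0xFF this way gives 60, which together with the 95 printable ASCII characters accounts for the 155 characters of the 1999 edition.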
Visual vs. Logical Ordering
In bidirectional Hebrew text encoded in ISO/IEC 8859-8, logical ordering means that characters are stored in the sequence in which they are read or typed: the first byte in memory is the first character of the text, even though Hebrew is displayed right to left. This preserves the natural input order, but requires reordering at display time to produce the correct visual presentation, particularly when the text mixes in left-to-right (LTR) elements such as Latin words or numbers.[20]
In contrast, visual ordering stores the characters in the sequence in which they appear on the screen from left to right, which reverses Hebrew runs relative to their reading order so that an LTR display can render them without further processing. Nominally, ISO-8859-8 (code page 28598) is for visual order, but in practice and per web standards (e.g., Character Model for the Web), it is often interpreted as logical order. The variant ISO-8859-8-I (code page 38598) is nominally for logical order, though it is sometimes used for visual order in legacy contexts.[8][4] For example, the word שלום is stored in logical order as the bytes F9 EC E5 ED (Shin, Lamed, Vav, Final Mem) and in visual order as ED E5 EC F9.[20]
To manage directionality in logical order without altering the character sequence, bidirectional marks such as the Left-to-Right Mark (LRM, U+200E) and Right-to-Left Mark (RLM, U+200F) are used; these invisible control characters give explicit directional cues at the boundaries between LTR and right-to-left (RTL) runs, so that the runs resolve correctly during rendering.[22] ISO/IEC 8859-8 has included these marks since its 1999 edition, although they are used most fully in Unicode-based systems.[22]
Logical ordering in ISO-8859-8-I requires a bidirectional algorithm for display, with the Unicode Bidirectional Algorithm serving as the reference for resolving embedding levels, directional runs, and reversals in mixed-script text.[22] Historically, visual ordering was favored in early computing environments, such as mainframe terminals and pre-bidi word processors, where direct LTR rendering simplified output; logical ordering gained prominence with the rise of editable text systems and web standards in the 1990s, enabling more intuitive editing and compatibility with evolving internationalization frameworks.[20][4]
Extensions and Variants
Israeli Standard SI 1311
The Israeli Standard SI 1311:2002, published by the Standards Institution of Israel, adopts ISO/IEC 8859-8:1999 as its base and extends it to provide an 8-bit coded character set tailored for Hebrew interchange in Israeli contexts.[17] This national standard, registered internationally as ISO-IR-234, adds currency symbols and further bidirectional formatting controls while keeping the ISO character assignments.[17]
The graphic additions are the euro sign (€) and the new shekel sign (₪), both placed in previously unassigned positions of the 0xD0–0xDE range. The generic currency sign at 0xA4, the double low line at 0xDF, and the LRM and RLM at 0xFD and 0xFE keep their ISO/IEC 8859-8:1999 positions.[17] These characters address practical needs for financial applications in Hebrew environments.
For bidirectional (BiDi) text support, SI 1311:2002 introduces explicit formatting characters beyond the two marks in ISO/IEC 8859-8:1999: the directional overrides LRO and RLO, also in the 0xD0–0xDE range, and the embeddings LRE and RLE at 0xFB and 0xFC. They allow more robust handling of mixed Hebrew and Latin text, along the lines of the Unicode Bidirectional Algorithm.[17]
Because the extensions occupy only positions that ISO/IEC 8859-8:1999 leaves undefined, all existing assignments, including the Hebrew letter mappings, are preserved and data encoded under the base standard remains valid.[17] As a national specification, SI 1311:2002 holds official status in Israel for legacy computing and data interchange but does not constitute an ISO/IEC international standard.[17]
Relation to Windows-1255
Windows-1255, also known as CP1255, is a single-byte character encoding developed by Microsoft, largely as an extension of ISO/IEC 8859-8, to support the Hebrew script in the Windows operating system.[8] It expands the base character set with niqqud (vowel points) encoded as combining characters, some additional marks and punctuation, and the new shekel sign (₪).[23] This makes Windows-1255 more comprehensive for pointed Hebrew than ISO/IEC 8859-8, which covers only the 27 Hebrew letters and the directional marks, without diacritics.[21]
The key differences lie in the upper byte range. Windows-1255 populates positions 0xC0–0xCF with niqqud and related points (e.g., 0xC1 maps to U+05B1 HEBREW POINT HATAF SEGOL and 0xCF to U+05BF HEBREW POINT RAFE), which are undefined in ISO/IEC 8859-8.[23][21] It assigns 0xA4 to the new shekel sign (U+20AA) in place of the generic currency sign, omits the double low line at 0xDF, and places Western European punctuation and symbols (e.g., 0x91–0x94 for curved quotes) in 0x80–0x9F, which ISO/IEC 8859-8 reserves for control characters.[23] Windows-1255 text is stored in logical order, as ISO-8859-8 text usually is in practice.
In terms of compatibility, ISO/IEC 8859-8 is close to, but not strictly, a subset of Windows-1255: the mappings match for ASCII (0x00–0x7F) and the Hebrew letters (0xE0–0xFA, corresponding to U+05D0–U+05EA), but 0xA4, 0xDF, and the 0x80–0x9F range differ.[21][23] Text encoded in ISO/IEC 8859-8 can thus usually be read as Windows-1255 without loss of letters, whereas reading Windows-1255 data as ISO/IEC 8859-8 leaves the added diacritics as undefined bytes. Both encodings map to the Unicode Hebrew block (U+0590–U+05FF), with Windows-1255's extras corresponding to combining marks.[23][21]
Historically, Windows-1255 emerged in the 1990s as part of Microsoft's internationalization efforts for Windows 3.1 and later versions, providing broader Hebrew support than the international standard to accommodate users in Israel and other Hebrew-speaking communities.[24] Windows also supports a visual-order variant of CP1255 in certain legacy applications, where characters are stored in display order to simplify rendering without a bidirectional algorithm, though standard CP1255 is logical.[8]
| Aspect | ISO/IEC 8859-8 | Windows-1255 |
|---|---|---|
| Core Hebrew Letters | 0xE0–0xFA (22 letters + 5 finals) | Identical: 0xE0–0xFA |
| Diacritics (Niqqud) | None | 0xC0–0xCF (e.g., sheva, hiriq) |
| Other Points | None | Select (e.g., 0xCF rafe) |
| Special Symbols | Basic punctuation; ¤ at 0xA4, no shekel sign | Includes ₪ (0xA4); Euro (0x80) |
| Ordering | Nominally visual; logical in practice | Logical (visual variant available) |
| Unicode Mapping | Hebrew block + controls | Hebrew block + combining marks |
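The comparison above can be checked mechanically. The following sketch, which assumes that Python's built-in `iso8859_8` and `cp1255` codecs reflect the two encodings faithfully, collects every byte in 0xA0–0xFF that the two decode differently; `decode_byte` is a hypothetical helper introduced here. The 0x80–0x9F range is skipped because it is known to differ wholesale (C1 controls versus punctuation).

```python
from typing import Dict, Optional, Tuple


def decode_byte(byte: int, codec: str) -> Optional[str]:
    """Decode one byte with the named codec, returning None if it is undefined."""
    try:
        return bytes([byte]).decode(codec)
    except UnicodeDecodeError:
        return None


# byte -> (ISO-8859-8 character, Windows-1255 character) wherever they differ
differences: Dict[int, Tuple[Optional[str], Optional[str]]] = {}
for b in range(0xA0, 0x100):
    iso_char = decode_byte(b, "iso8859_8")
    win_char = decode_byte(b, "cp1255")
    if iso_char != win_char:
        differences[b] = (iso_char, win_char)

for b, (iso_char, win_char) in sorted(differences.items()):
    print(f"0x{b:02X}: ISO-8859-8={iso_char!r} Windows-1255={win_char!r}")
```

The Hebrew letter range 0xE0–0xFA does not appear in `differences`, while 0xA4 and the niqqud positions do, which matches the table.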
Usage and Compatibility
Applications and Adoption
ISO/IEC 8859-8 found early adoption in computing environments requiring Hebrew text support during the late 1980s and 1990s, particularly in IBM mainframe systems, where it was assigned code page 916 (CCSIDs 916 and 5012) for processing Latin and Hebrew characters. It was also integrated into early Unix-like systems as part of the broader ISO 8859 family for terminal and internationalization efforts, enabling Hebrew display in text-based interfaces.[25] In Israel, the encoding followed the Standards Institution of Israel's earlier national standard SI 960 (a 7-bit code), which ISO 8859-8 extended to 8 bits for combined Latin-Hebrew processing in applications such as the ALEPH library system.[26]
Regionally, ISO/IEC 8859-8 became prevalent in Israel for email and for printing Hebrew documents before widespread Unicode adoption, with over 40 Listserv mailing lists using it via MIME headers specifying the charset.[27][28] The encoding initially supported visual left-to-right ordering, later supplemented by logical variants such as ISO-8859-8-I for better right-to-left handling in email and text files.[29] Its use extended to government and institutional documents in the 1990s, where it allowed interchange of Hebrew text without vowel points (niqqud).[26]
In web applications, ISO/IEC 8859-8 is identified by the charset label ISO-8859-8 and persists in legacy Hebrew content, though it is detected on less than 0.1% of websites as of November 2025.[30] Modern software maintains compatibility: browsers implement it through the WHATWG Encoding Standard, and text editors and databases support it for decoding historical files.[31]
Adoption has declined significantly in favor of UTF-8, which offers superior multilingual capabilities and avoids the limitations of single-byte encodings for bidirectional text.[28] While still present in some embedded systems and legacy Israeli applications, ISO/IEC 8859-8 is largely superseded for new developments.[27]
Migration to Unicode
Migrating from ISO/IEC 8859-8 to Unicode involves a straightforward mapping for its core characters. The 27 Hebrew letters (including final forms) map one-to-one to U+05D0–U+05EA in the Unicode Hebrew block, preserving their identity during conversion. The bidirectional control characters at positions 0xFD and 0xFE map to U+200E (LEFT-TO-RIGHT MARK) and U+200F (RIGHT-TO-LEFT MARK), respectively.[15]
Unicode offers significant advantages over ISO/IEC 8859-8, particularly for advanced Hebrew features and broader text handling. Unlike ISO/IEC 8859-8, which has no encoding for niqqud (vowel points) or cantillation marks, Unicode includes these as combining marks in the Hebrew block (e.g., U+05B0 to U+05C7), allowing full representation of pointed Hebrew text. It also provides right-to-left (RTL) rendering through its bidirectional algorithm, supports multilingual documents by covering all scripts, and offers UTF-8, a variable-width encoding that is backward compatible with ASCII.[32][33]
Conversion typically relies on standard utilities. On Unix-like systems the iconv command handles it, as in `iconv -f ISO-8859-8 -t UTF-8 input.txt > output.txt`, which converts every byte that has a defined mapping. Web browsers and applications detect encodings via HTTP Content-Type headers (e.g., Content-Type: text/html; charset=ISO-8859-8), HTML meta tags, or a UTF-8 byte order mark (BOM), and convert legacy content to Unicode internally for display. For Windows-1255 data, similar mappings apply, though that encoding includes additional characters such as niqqud points.
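The same conversion can be sketched in Python with the standard-library `iso8859_8` codec. The function names `transcode` and `convert_file` and the file paths passed to them are illustrative, and the sketch assumes the input is already in logical order, since it does no reordering.

```python
from pathlib import Path


def transcode(raw: bytes, errors: str = "strict") -> bytes:
    """Convert ISO-8859-8 bytes to UTF-8 bytes.

    With errors="strict", undefined bytes raise UnicodeDecodeError;
    with errors="replace", they become U+FFFD REPLACEMENT CHARACTER.
    """
    text = raw.decode("iso8859_8", errors=errors)
    return text.encode("utf-8")


def convert_file(src: str, dst: str, errors: str = "strict") -> None:
    """Read an ISO-8859-8 file and write its UTF-8 equivalent."""
    Path(dst).write_bytes(transcode(Path(src).read_bytes(), errors=errors))


# Each Hebrew letter is one byte in ISO-8859-8 and two bytes in UTF-8.
hebrew = bytes([0xF9, 0xEC, 0xE5, 0xED])
print(len(hebrew), len(transcode(hebrew)))
```

Running with `errors="strict"` first is a reasonable default, because a decoding failure is a useful hint that the file is really Windows-1255 or another encoding.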
Challenges in migration arise primarily from the visual ordering variant of ISO/IEC 8859-8, in which Hebrew characters are stored in display sequence rather than logical (input) order, whereas Unicode text is expected to be in logical order. Converting visually ordered text therefore requires reordering the bidirectional segments to avoid reversed rendering, and a conversion tool has to be told whether its input is the visual (ISO-8859-8) or the logical (ISO-8859-8-I) variant. Unassigned bytes in ISO/IEC 8859-8 (0xA1, 0xBF–0xDE, 0xFB, 0xFC, and 0xFF) may be converted to substitution characters such as U+FFFD (REPLACEMENT CHARACTER), which can hide data loss if the output is not reviewed.[20][32]
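To make the reordering step concrete, here is a deliberately naive sketch for a single, already decoded line. It assumes a right-to-left line that contains only Hebrew letters, spaces, and simple left-to-right runs of ASCII letters and digits; it ignores mirrored punctuation such as parentheses, line wrapping, and nested directions, all of which a real converter must handle. `visual_to_logical` and `LTR_RUN` are names introduced here.

```python
import re

# A left-to-right run: ASCII words or numbers, optionally joined by single
# separators, e.g. "Hello World" or "3.14". This is a simplifying assumption.
LTR_RUN = re.compile(r"[0-9A-Za-z]+(?:[ .,:/\-][0-9A-Za-z]+)*")


def visual_to_logical(line: str) -> str:
    """Naively convert one visually ordered RTL line to logical order."""
    # Step 1: reversing the whole line puts the Hebrew into reading order.
    reversed_line = line[::-1]
    # Step 2: LTR runs were already in reading order on screen, so step 1
    # flipped them; flip each one back.
    return LTR_RUN.sub(lambda match: match.group(0)[::-1], reversed_line)


# Visual storage of the logical text "shalom 123": the number comes first,
# followed by the Hebrew letters in display (reversed) order.
visual = "123 \u05dd\u05d5\u05dc\u05e9"
logical = visual_to_logical(visual)
print([f"U+{ord(ch):04X}" for ch in logical])
```

The result starts with Shin (U+05E9) and ends with the digits in their original order, which is what a bidirectional renderer expects as input.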
Since the early 2000s, ISO/IEC 8859-8 has been deprecated in favor of Unicode for new content creation, with standards bodies and software ecosystems recommending UTF-8 as the default for Hebrew and multilingual text to ensure future-proof compatibility and full feature support. Legacy systems may still encounter it in older files, but comprehensive migration tools and Unicode's superset nature minimize ongoing reliance.[32][34]