Mac OS Roman
| MIME / IANA | macintosh |
|---|---|
| Languages | English, various others |
| Created by | Apple Computer, Inc. |
| Classification | Extended ASCII, Mac OS script |
| Extends | ASCII, Macintosh character set |
Mac OS Roman is a character encoding created by Apple Computer, Inc. for use by Macintosh computers.[1] It is suitable for representing text in English and several other languages that use the Latin script. Mac OS Roman encodes 256 characters, the first 128 of which are identical to ASCII, with the remaining characters including mathematical symbols, diacritics, and additional punctuation marks. Mac OS Roman is an extension of the original Macintosh character set, which encoded 217 characters.[1] Full support for Mac OS Roman first appeared in System 6.0.4, released in 1989,[2] and the encoding is still supported in current versions of macOS, though the standard character encoding is now UTF-8. Apple modified Mac OS Roman in 1998 with the release of Mac OS 8.5 by replacing the currency sign with the euro sign,[3] but otherwise the encoding has been unchanged since its release.
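The ASCII-compatible lower half and the one-byte-per-character layout can be checked in a few lines. This is a minimal sketch that assumes Python's built-in `mac_roman` codec, which reflects the post-1998 (euro) version of the encoding; the variable names are illustrative only.

```python
# Every byte value 0x00-0xFF, in order.
all_bytes = bytes(range(256))

# Bytes 0x00-0x7F decode to the same text under ASCII and Mac OS Roman.
lower_half = all_bytes[:128]
ascii_compatible = lower_half.decode("mac_roman") == lower_half.decode("ascii")

# Each byte decodes to exactly one character, and re-encoding
# returns the original bytes (a lossless round trip).
decoded = all_bytes.decode("mac_roman")
one_char_per_byte = len(decoded) == 256
round_trips = decoded.encode("mac_roman") == all_bytes

print(ascii_compatible, one_char_per_byte, round_trips)
```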
Character set
The following table shows how characters are encoded in Mac OS Roman. The row and column headings give the first and second digit of the hexadecimal code for each character in the table.
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
| 0x | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
| 1x | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
| 2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
| 8x | Ä | Å | Ç | É | Ñ | Ö | Ü | á | à | â | ä | ã | å | ç | é | è |
| 9x | ê | ë | í | ì | î | ï | ñ | ó | ò | ô | ö | õ | ú | ù | û | ü |
| Ax | † | ° | ¢ | £ | § | • | ¶ | ß | ® | © | ™ | ´ | ¨ | ≠ | Æ | Ø |
| Bx | ∞ | ± | ≤ | ≥ | ¥ | µ | ∂ | ∑ | ∏ | π | ∫ | ª | º | Ω[a] | æ | ø |
| Cx | ¿ | ¡ | ¬ | √ | ƒ | ≈ | ∆ | « | » | … | NBSP | À | Ã | Õ | Œ | œ |
| Dx | – | — | “ | ” | ‘ | ’ | ÷ | ◊ | ÿ | Ÿ | ⁄ | €[b] | ‹ | › | ﬁ | ﬂ |
| Ex | ‡ | · | ‚ | „ | ‰ | Â | Ê | Á | Ë | È | Í | Î | Ï | Ì | Ó | Ô |
| Fx | Apple logo[c] | Ò | Ú | Û | Ù | ı | ˆ | ˜ | ¯ | ˘ | ˙ | ˚ | ¸ | ˝ | ˛ | ˇ |
- [a] Prior to December 1997, Apple's mapping published on Unicode.org mapped this character to U+2126 OHM SIGN.[5]
- [b] Before Mac OS 8.5, the character at position 0xDB was the generic currency sign (¤).[5]
- [c] The character at position 0xF0 is a solid Apple logo. Apple uses Unicode character U+F8FF in the Corporate Private Use Area for this character, but it may not be supported on non-Apple systems.
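The table and its footnotes can be reproduced programmatically. The sketch below assumes Python's built-in `mac_roman` codec, which follows the current (euro) mapping; `mac_roman_row` and `decode_pre_euro` are hypothetical helper names introduced here for illustration, and the pre-8.5 behaviour is approximated by swapping the euro sign back to the generic currency sign after decoding.

```python
import unicodedata

def mac_roman_row(high_nibble: int) -> list[str]:
    """Return the 16 characters of one table row, e.g. 0x8 for row '8x'."""
    row_bytes = bytes(high_nibble * 16 + low for low in range(16))
    return list(row_bytes.decode("mac_roman"))

def decode_pre_euro(data: bytes) -> str:
    """Decode as Mac OS Roman, treating 0xDB as the currency sign (footnote b).

    0xDB is the only byte that decodes to U+20AC, so replacing the euro sign
    after decoding is equivalent to using the older mapping for that position.
    """
    return data.decode("mac_roman").replace("\u20ac", "\u00a4")

# Row 8x of the table above.
print(" ".join(mac_roman_row(0x8)))

# The three footnoted positions and the code points Python maps them to.
for code in (0xBD, 0xDB, 0xF0):
    char = bytes([code]).decode("mac_roman")
    name = unicodedata.name(char, "<private use, no name>")
    print(f"0x{code:02X} -> U+{ord(char):04X} {name}")
```

Because 0xF0 decodes to a Private Use Area code point, the Apple logo will only display where a font assigns a glyph to U+F8FF.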
Notes
- [1] Apple Computer, Inc. (1993). Inside Macintosh: Text (PDF). Addison Wesley Publishing Company. p. 1-53. ISBN 0-201-63298-5. Archived (PDF) from the original on 2019-12-11. Retrieved July 10, 2021.
- [2] Apple Computer, Inc. (1991). Inside Macintosh, Volume VI. p. 14-104. ISBN 0-201-57755-0.
- [3] Apple Computer, Inc. (September 14, 1998). "Technical Note TN1104: The Euro Currency Symbol". Retrieved July 10, 2021.
- [4] Apple Computer, Inc. (1993). Inside Macintosh: Text (PDF). pp. 1–54, A-5 – A-18. ISBN 0-201-63298-5. Archived from the original (PDF) on 2019-12-11. Retrieved July 10, 2021.
- [5] Apple Computer, Inc. (5 April 2005). "ROMAN.TXT". Unicode.org. Retrieved 9 October 2023.
History and Development
Origins in Early Macintosh Systems
Mac OS Roman originated with the launch of the original Macintosh 128K computer on January 24, 1984, as Apple's proprietary 8-bit character encoding scheme tailored for the system's text-handling capabilities.[5] Developed to meet the demands of the Macintosh's graphical user interface, it extended the foundational character set used in the machine's ROM and software, enabling robust support for typography in early applications.[6] The encoding built upon an initial character repertoire of 217 glyphs, primarily designed to accommodate English and major Western European languages through the inclusion of diacritics, punctuation variants, and typographic symbols.[7]

Apple's design motivation stemmed from the Macintosh's emphasis on desktop publishing and creative workflows, where basic 7-bit ASCII proved insufficient for professional text composition involving accented characters (such as é or ñ) and specialized symbols (like © or ™) essential for international documents and graphic design.[8] This extension allowed the system to handle composite diacritics via keyboard combinations, such as the Option key paired with letters, facilitating seamless input for multilingual content without requiring complex software add-ons.[6]

Implemented as the core text encoding in System Software 1.0, the operating environment shipped with the Macintosh 128K, Mac OS Roman functioned as the default script system for U.S. English localizations, ensuring compatibility with the system's fonts like Chicago and Geneva.[9] The full encoding supported 256 characters in total, with the lower 128 positions (0x00–0x7F) directly mirroring the US-ASCII standard to maintain interoperability with existing computing standards and peripherals.[6] This structure provided a solid foundation for text rendering via QuickDraw, prioritizing visual consistency and ease of use in the Macintosh's pioneering bitmap displays.[6]

Standardization and Key Updates
Mac OS Roman achieved full standardization as Apple's primary single-byte encoding for the Roman script system with the release of System 6.0.4 in September 1989. This update expanded the character set from its earlier 217-character precursor to a complete 256-character repertoire, adding upper-range (0x80–0xFF) assignments for accented letters, symbols, and diacritical marks while keeping the lower range identical to ASCII. As the foundational encoding for the Macintosh's Script Manager, it became the default for text handling in U.S. English and other Western European languages, ensuring consistent rendering across fonts and applications in the evolving Mac OS ecosystem.[2][10]

A significant modification occurred in 1998 with Mac OS 8.5, where Apple replaced the generic currency sign (¤) at code point 0xDB with the euro sign (€, Unicode U+20AC) to support the European Monetary Union's adoption of the euro as a common currency. This change was implemented system-wide, affecting text rendering, input methods, and font mappings; legacy content was unaffected except that byte 0xDB now displayed as the euro sign. The update reflected Apple's commitment to aligning its encoding standards with international economic developments, and the euro variant has remained the standard in subsequent macOS releases.[11][10]

For internet compatibility, the Internet Assigned Numbers Authority (IANA) registered "macintosh" as the official MIME charset name for Mac OS Roman, with aliases "mac" and "csMacintosh," facilitating reliable transmission of Macintosh-encoded text in email, web content, and other protocols. Although the core encoding prioritized English and standard Roman alphabets, minor adjustments accommodated localizations like Swiss French and Swiss German through script-specific resources in the Script Manager, such as tailored keyboard layouts and sorting rules, while preserving the underlying 256-character structure. These adaptations ensured broad usability across European regions without requiring a divergent encoding scheme.[12][1]

Technical Specifications
Encoding Structure
Mac OS Roman is an 8-bit single-byte character encoding that defines 256 code points ranging from 0x00 to 0xFF. This fixed-width structure allows each character to be represented by exactly one byte, facilitating efficient processing on early Macintosh hardware without the complexity of variable-length sequences.

The encoding extends the 7-bit US-ASCII standard by maintaining full compatibility in its lower half, where code points 0x00 through 0x7F are identical to ASCII.[1] This includes 33 control characters, consisting of those from 0x00 to 0x1F (such as NUL at 0x00) and DEL at 0x7F, along with 95 printable characters from 0x20 (space) to 0x7E (tilde), covering basic English text and punctuation.[1] The adherence to ASCII ensures seamless interoperability for standard Latin scripts while reserving the upper range for enhancements.

The upper half, comprising code points 0x80 to 0xFF, allocates 128 slots for proprietary extensions defined by Apple, including diacritics, symbols, and typographic elements tailored to Western European languages and Macintosh-specific needs.[1] Unlike ISO 8859-1, which follows an international standard for its high-byte assignments, Mac OS Roman's upper range is vendor-specific without adherence to a broader standardization body beyond Apple's implementation.[13] This design choice prioritized Macintosh ecosystem cohesion over universal portability, resulting in a self-contained encoding that lacks multi-byte or variable-length mechanisms.[1]

Character Repertoire
Mac OS Roman, as an 8-bit character encoding, defines a repertoire of 256 code points, with the lower 128 (0x00 to 0x7F) mirroring the ASCII standard: 95 printable characters from 0x20 to 0x7E and control codes in the remaining positions.[1] The upper 128 code points (0x80 to 0xFF) extend this base with 128 additional printable characters tailored for enhanced text representation in Western contexts, for a total of 223 printable characters once the controls (0x00–0x1F and 0x7F) are excluded.[3][1]

These upper-half characters fall into several groups: accented letters, mathematical and technical symbols, currency marks, and typographic elements. Accented letters include Ä (0x80), é (0x8E), â (0x89), and ñ (0x96), which together with the ligatures support languages like French, German, and Spanish.[1] Mathematical symbols feature ∑ (0xB7) for summation, ∞ (0xB0) for infinity, and ± (0xB1) for plus-minus, alongside others like ≠ (0xAD).[1] Currency symbols encompass ¢ (0xA2), £ (0xA3), and ¥ (0xB4), with the € (0xDB) added starting from Mac OS 8.5 to accommodate the eurozone.[1][3] Typographic characters provide punctuation and ornaments, such as † (0xA0) for dagger, … (0xC9) for ellipsis, and « (0xC7) for left guillemet.[1]

Also notable are the Apple logo (0xF0), a glyph specific to Apple, and the bullet • (0xA5), both of which were integral to early Macintosh interface and documentation elements.[1][3] Overall, the encoding prioritizes Roman-based scripts for Western European languages, offering robust coverage for everyday text in those languages but with notable exclusions: it lacks comprehensive support for non-Latin alphabets, such as full Cyrillic or Greek sets beyond a few letters used as mathematical symbols, limiting its applicability to broader multilingual scenarios.[3][1]

Usage in Macintosh Ecosystems
Integration with Operating Systems
Mac OS Roman served as the default character encoding and script system in Macintosh operating systems from System 1.0 (released in 1984) through Mac OS 9 (released in 1999 and last updated in 2001), forming the baseline for text processing in Roman-localized versions such as those for the U.S., UK, and French markets.[14] It handled essential system elements including file names, menu labels, and dialog boxes, ensuring consistent sorting and display through the Script Manager's string-manipulation resources.[14] In these environments, the encoding provided a 256-character repertoire optimized for Western European languages, with the first 128 characters matching ASCII for basic compatibility.[3]

Within the graphics subsystem, Mac OS Roman played a central role in QuickDraw text rendering, where character code points directly corresponded to glyph indices in one-byte Roman fonts managed by the Font Manager.[15] This direct mapping enabled efficient drawing of text via routines like DrawText and StdText, positioning glyphs along the graphics pen's baseline in the current graphics port, with styles, sizes, and modes applied at the system level.[16] The integration supported seamless text output to screens and printers without additional translation layers for Roman script content.
For international variants, Mac OS Roman was employed in various European localizations, such as those for German and Swedish, through the Script Manager, which adapted keyboard layouts, sorting orders, and string comparison routines while retaining the core encoding.[14] These systems overrode U.S.-specific behaviors for diacritics and collation but lacked comprehensive support for non-Roman scripts, limiting full localization to Latin-based alphabets.[14]
System 7 (introduced in 1991) enhanced diacritic handling in international text via updated Text Utilities in the Script Manager, incorporating the 'itl2' resource for routines like StripDiacritics and UppercaseStripDiacritics.[17] These improvements allowed accented characters (e.g., Å to A, ê to e) to be stripped or case-converted accurately, supporting better multilingual input and output in Roman-localized environments without altering the underlying encoding.[17]
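The effect of stripping diacritics (Å to A, ê to e) can be approximated outside the classic Mac OS with Unicode decomposition. This is only a sketch of the idea, not the Script Manager's implementation: it assumes Python's `mac_roman` codec and the standard `unicodedata` module, and `strip_diacritics` is a hypothetical name that merely echoes the routine mentioned above.

```python
import unicodedata

def strip_diacritics(mac_roman_bytes: bytes) -> str:
    """Decode Mac OS Roman bytes and drop combining marks from the result."""
    text = mac_roman_bytes.decode("mac_roman")
    # NFD splits a precomposed letter into a base letter plus combining marks,
    # e.g. 'Å' becomes 'A' followed by COMBINING RING ABOVE.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# 0x81 is Å and 0x90 is ê in Mac OS Roman.
print(strip_diacritics(b"\x81 \x90"))
```

Letters that Unicode does not decompose, such as Ø (0xAF) or ß (0xA7), pass through this sketch unchanged.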
Application and Font Support
Mac OS Roman served as the default character encoding for text handling in many early Macintosh applications, particularly those developed for desktop publishing and word processing. Applications such as MacWrite, Apple's bundled word processor, natively supported Mac OS Roman for saving and loading plain text files (.txt) and rich text format files (.rtf), ensuring seamless integration with the system's text utilities without requiring explicit encoding declarations.[18] Similarly, Aldus PageMaker, a pioneering desktop publishing tool, imported and exported text files using Mac OS Roman as the standard Macintosh text encoding, often labeled simply as "ASCII" in import dialogs, which facilitated layout workflows involving accented characters and symbols common in Western European languages.[19]

In the realm of font technologies, Mac OS Roman was integral to glyph mapping in both bitmap and outline formats prevalent on Macintosh systems. Outline fonts (TrueType and PostScript), as well as Apple's bitmap fonts like Geneva and Chicago, incorporated glyph tables aligned with Mac OS Roman code points to enable accurate rendering on screen and in print. For instance, the Standard Roman character set, which defines Mac OS Roman, ensured that characters from $20 to $FF were available in most Roman outline fonts, though bitmapped versions of Geneva and Chicago provided partial support, prioritizing readability for common Latin scripts over full repertoire coverage.[3] This mapping allowed fonts to render diacritics and typographic symbols directly from the encoding without additional translation layers.[13]

Developer tools and APIs in the Classic Macintosh environment further embedded Mac OS Roman into string handling routines. In the Carbon and Classic APIs, functions such as DrawString in QuickDraw expected input as Pascal strings encoded in Mac OS Roman, using the current graphics port's text attributes (e.g., font, size, and mode) to render text at the pen location.[16] This design streamlined application development by assuming Mac Roman as the native format for text measurement and drawing operations, with routines like TextWidth and CharWidth computing widths based on the encoding's glyph assignments.[16]
Despite its ubiquity, Mac OS Roman's implementation in early applications revealed limitations, particularly in handling undefined characters during cross-platform file transfers. Many pre-1990s apps lacked fallback mechanisms for characters outside the expected repertoire, resulting in mojibake—garbled text—when files were opened on non-Macintosh systems or vice versa, as bytes above $7F were misinterpreted without proper encoding detection.[20] This issue was exacerbated in desktop publishing workflows, where transferring Mac OS Roman-encoded documents to Windows environments often led to visual corruption of accented letters and symbols until later tools introduced explicit conversions.[20]
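The corruption described above is easy to reproduce. The sketch below assumes the receiving Windows program decodes the bytes as Windows-1252 (Python's `cp1252` codec); that choice of codec is an assumption made here for illustration, as is the sample text.

```python
original = "café über niño"

# Bytes as a classic Macintosh application would have written them.
mac_bytes = original.encode("mac_roman")

# A program assuming Windows-1252 misreads every byte above 0x7F.
misread = mac_bytes.decode("cp1252")

# Decoding with the correct codec recovers the text.
recovered = mac_bytes.decode("mac_roman")

print(misread)
print(recovered == original)
```

Only the three accented letters are affected; the ASCII letters survive because both encodings agree on bytes 0x00–0x7F.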
Compatibility and Comparisons
Relation to ASCII and ISO 8859-1
Mac OS Roman maintains full compatibility with the 7-bit ASCII standard in its lower range, where bytes 0x00 through 0x7F map identically to the corresponding ASCII control codes and printable characters.[21] This design choice ensured seamless interoperability with early computing systems and protocols that relied on ASCII, allowing the ASCII portion of Mac OS Roman text to display correctly in ASCII-only environments without alteration.[21]

In the extended 8-bit range from 0x80 to 0xFF, however, Mac OS Roman diverges from both ASCII extensions and ISO 8859-1 (Latin-1), prioritizing characters suited to Macintosh typography and Western European languages over international standardization. Specifically, the range 0x80–0x9F in Mac OS Roman assigns printable glyphs such as Ä (U+00C4) at 0x80 and ï (U+00EF) at 0x95, whereas ISO 8859-1 treats most of these positions as undefined or reserves them for C1 control codes, leading to potential rendering failures when Mac OS Roman text is misinterpreted as Latin-1.[1] For instance, the copyright symbol © (U+00A9) appears at 0xA9 in both encodings, providing a point of overlap, but mismatches abound elsewhere, such as Mac OS Roman's placement of the dagger † (U+2020) at 0xA0 compared to ISO 8859-1's non-breaking space (U+00A0) at the same position.[1] Overall, while the two encodings share a substantial portion of their character repertoire, focusing on Latin-script letters with diacritics and common symbols, their positional differences result in only partial direct compatibility in the upper half.[22]

These structural variances frequently caused compatibility challenges in cross-platform data exchange during the pre-Unicode era. In email and file transfers between Macintosh systems and PCs assuming ISO 8859-1 as the default, Mac OS Roman text often appeared garbled, a phenomenon known as mojibake, where bytes intended as accented characters or symbols were rendered as unintended punctuation or controls.[23] Historically, Mac OS Roman was tailored for the proprietary Macintosh hardware and font rendering introduced in 1984, predating widespread web standards that favored ISO 8859-1 for HTML and internet protocols in the early 1990s, which exacerbated display issues on non-Mac platforms accessing Mac-generated content.[10] Apple's inclusion of non-standard elements, such as the Apple logo (U+F8FF) at 0xF0, further highlighted its platform-specific focus, rendering such symbols invisible or substituted on systems lacking Macintosh font support.[1]

Mapping to Unicode
Apple publishes an official one-to-one mapping table for Mac OS Roman on Unicode.org, documented in the ROMAN.TXT file, which assigns each code point from 0x00 to 0xFF to a corresponding Unicode scalar value.[1] This mapping covers the full 256-character repertoire of the standard variant, including standard ASCII in the lower half (0x00–0x7F) and Macintosh-specific extensions in the upper half (0x80–0xFF), such as 0xA3 mapping to the pound sign £ (U+00A3) and 0xCF to the œ ligature (U+0153); separate variant tables exist for international versions such as Croatian, Icelandic, Turkish, and Romanian.[1] The table ensures lossless conversion from Mac OS Roman to Unicode for all characters, with the Apple logo at 0xF0 specifically assigned to U+F8FF in the Private Use Area (PUA).[1]

Round-trip compatibility between Mac OS Roman and Unicode is generally preserved, meaning most characters can be converted back and forth without loss, as Unicode encompasses the entire Mac OS Roman set.[1] However, an exception arises at code point 0xDB: before Mac OS 8.5 (1998) it mapped to the generic currency sign ¤ (U+00A4), while later versions map it to the euro sign € (U+20AC) to reflect the introduction of the euro currency.[1] This change requires variant-specific handling for accurate round-trip conversions in legacy contexts.[13]

Apple's developer documentation supports these mappings through the Text Encoding Conversion Manager, which includes C functions such as ConvertFromTextToUnicode for programmatic conversion from Mac OS Roman (identified by the encoding constant kTextEncodingMacRoman) to Unicode scalars.[24] These APIs handle the one-to-one assignments and account for variants like the euro update, enabling developers to process legacy Macintosh text in modern Unicode-based applications.[25]

A second exception in the mapping is the Apple logo glyph at 0xF0, placed in the Unicode Private Use Area at U+F8FF, which was not available for standardization until Unicode version 1.1 in June 1993.[1] This PUA assignment allows proprietary rendering on Apple systems but lacks universal standardization, potentially leading to fallback glyphs in non-Apple environments.[26]

Legacy and Modern Relevance
Transition to Unicode
Apple began integrating Unicode support into its operating systems in 1998 with the release of Mac OS 8.5, which introduced the Apple Type Services for Unicode Imaging (ATSUI) framework. This allowed for the rendering and input of Unicode text (specifically using UTF-16 encoding based on Unicode 2.1) while continuing to operate alongside the legacy Mac OS Roman encoding for backward compatibility with existing applications and files.[10][27]

The primary motivations for this transition stemmed from the limitations of single-byte encodings like Mac OS Roman, which were inadequate for supporting a wide range of global languages and non-Roman scripts such as Japanese, Arabic, and Chinese. Apple, a key participant in the development of Unicode from 1987 and a co-founder of the Unicode Consortium in 1991 alongside other companies including Xerox, sought to address the growing need for multilingual text processing in software and documents. Additionally, compliance with emerging web standards, including the Multipurpose Internet Mail Extensions (MIME) defined in RFC 2046, which aligns with ISO/IEC 10646 (the basis for Unicode), drove the shift to enable seamless handling of internationalized content on the internet.[28][10]

The full transition occurred with the launch of Mac OS X in 2001 (version 10.0), where UTF-8 became the default encoding for new text files and system interfaces, marking the deprecation of Mac OS Roman for modern development. To maintain compatibility with classic Macintosh applications, Apple introduced the Carbon framework, which ported APIs from the Classic Mac OS environment to Mac OS X and included on-the-fly conversion of Mac Roman strings to Unicode during runtime execution. This ensured that legacy software could run under emulation without immediate rewriting, while new applications were encouraged to adopt Unicode natively. Furthermore, Mac OS X favored Unicode Normalization Form D (NFD) for text storage, particularly in the HFS+ file system, to preserve compatibility with decomposed character representations from earlier Macintosh encodings.[10][1][29]

Ongoing Support and Tools
Modern macOS includes built-in support for Mac OS Roman encoding to handle legacy text files. The iconv command-line tool, part of the system's core utilities, enables conversion between Mac OS Roman and contemporary encodings like UTF-8; for example, the command iconv -f macroman -t utf-8 input.txt reads a Mac OS Roman file and writes it to standard output in UTF-8.[30] TextEdit, the default text editor, automatically detects Mac OS Roman files upon opening and displays them correctly, with options to manually select or change the encoding via the "Plain Text File Encoding" preferences if characters appear garbled.[31] Similarly, the Finder's Get Info panel provides information about plain text files to aid in file management and conversion workflows.
Tools outside macOS extend this support for developers and archivists working with Mac OS Roman data. Unicode.org hosts Apple's official mapping tables that convert Mac OS Roman characters to Unicode code points, facilitating integration with modern systems.[1] In Python, the standard codecs module recognizes 'mac_roman' as the name of this encoding, allowing file operations like open('legacy.txt', encoding='mac_roman') to read and decode older documents without corruption.[32]
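The Python call mentioned above extends naturally to a whole-file conversion, roughly what the iconv command does. This is a minimal sketch under stated assumptions: `convert_to_utf8` and the file names are hypothetical, and passing `newline=""` is a choice made here so that line endings are copied through unchanged (classic Mac OS text files typically end lines with a bare carriage return).

```python
import tempfile
from pathlib import Path

def convert_to_utf8(source: Path, target: Path) -> None:
    """Re-encode a Mac OS Roman text file as UTF-8."""
    # newline="" disables newline translation on both read and write,
    # so \r line endings in the legacy file are preserved as-is.
    with open(source, encoding="mac_roman", newline="") as src:
        text = src.read()
    with open(target, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)

# Usage: write a small legacy file, convert it, and inspect the result.
with tempfile.TemporaryDirectory() as folder:
    legacy = Path(folder) / "legacy.txt"
    converted = Path(folder) / "legacy-utf8.txt"
    legacy.write_bytes(b"na\x95ve\r")  # 0x95 is ï in Mac OS Roman
    convert_to_utf8(legacy, converted)
    print(converted.read_text(encoding="utf-8"))
```

Converting line endings as well would be a separate, deliberate step, for example replacing "\r" with "\n" in the text before writing.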
This ongoing support is particularly valuable for processing artifacts from the 1980s and 1990s, including PDFs, emails, and system files stored in digital archives or used in forensic investigations, where accurate decoding preserves original content.[33] However, attempting to view or edit these files in software without proper Mac OS Roman handling risks mojibake—garbled text resulting from mismatched encodings, such as interpreting bytes as ISO-8859-1 instead.[34] Apple continues to include Mac OS Roman compatibility in macOS as of 2025 to maintain access to historical data, but discourages its use for new content, aligning with Unicode Consortium best practices that prioritize UTF-8 for universal interoperability and future-proofing.[35]
