Windows-1251
| MIME / IANA | windows-1251 |
|---|---|
| Alias(es) | cp1251 (Code page 1251) |
| Languages | Russian, Ukrainian, Belarusian, Bulgarian, Serbian Cyrillic, Bosnian Cyrillic, Macedonian, Rotokas, Rusyn, English |
| Created by | Microsoft |
| Standard | WHATWG Encoding Standard |
| Classification | extended ASCII, Windows-125x |
| Other related encodings | Amiga-1251, KZ-1048, RFC 1345's "ECMA-Cyrillic" |
Windows-1251 is an 8-bit character encoding, designed to cover languages that use the Cyrillic script such as Russian, Ukrainian, Belarusian, Bulgarian, Serbian Cyrillic, Macedonian and other languages.
On the web, it is the second most-used single-byte character encoding (or third most-used character encoding overall), and the most used of the single-byte encodings supporting Cyrillic. As of January 2024, 0.3% of all websites use Windows-1251.[1][2] It is used mostly for Russian, but only a small minority of Russian websites use it: 94.6% of Russian (.ru) websites use UTF-8,[3][4][5] with this legacy 8-bit encoding a distant second. In Linux, the encoding is known as cp1251.[6] IBM uses code page 1251 (CCSID 1251 and euro sign extended CCSID 5347) for Windows-1251.[7][8][9][10][11][12][13]
Windows-1251 and KOI8-R (or its Ukrainian variant KOI8-U) are much more commonly used than ISO 8859-5 (which is used by less than 0.0004% of websites).[14] In contrast to Windows-1252 and ISO 8859-1, Windows-1251 is not closely related to ISO 8859-5.
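How little these encodings have in common can be checked directly. The following is a minimal Python sketch, assuming the standard-library codec names `cp1251`, `koi8_r`, and `iso8859_5`; the sample word and the variable names are illustrative only.

```python
# Compare how three Cyrillic encodings store the same letter.
# Codec names are those shipped with Python's standard library.
ENCODINGS = ["cp1251", "koi8_r", "iso8859_5"]

letter = "А"  # CYRILLIC CAPITAL LETTER A, U+0410

# Each encoding is single-byte, so encode() yields exactly one byte.
byte_for_letter = {enc: letter.encode(enc)[0] for enc in ENCODINGS}

for enc, value in byte_for_letter.items():
    print(f"{enc:10} 0x{value:02X}")

# Decoding with the wrong table produces mojibake rather than an error,
# because every byte of this sample is also assigned in KOI8-R.
sample = "Привет"  # illustrative sample word
garbled = sample.encode("cp1251").decode("koi8_r")
print(garbled)

# The damage is reversible as long as no byte was dropped or replaced.
restored = garbled.encode("koi8_r").decode("cp1251")
print(restored == sample)
```

The three byte values differ for the same letter, which is why text must be labelled with its encoding or converted before exchange.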
Unicode (e.g. UTF-8) is preferred to Windows-1251 or other Cyrillic encodings in modern applications, especially on the Internet, making UTF-8 the dominant encoding for web pages. (For further discussion of Unicode's complete coverage, of 436 Cyrillic letters/code points, including for Old Cyrillic, and how single-byte character encodings, such as Windows-1251 and KOI8-R, cannot provide this, see Cyrillic script in Unicode.)
Character set
The following table shows Windows-1251. Rows give the high nibble and columns the low nibble of each byte.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
| 1x | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
| 2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
| 8x | Ђ | Ѓ | ‚ | ѓ | „ | … | † | ‡ | € | ‰ | Љ | ‹ | Њ | Ќ | Ћ | Џ |
| 9x | ђ | ‘ | ’ | “ | ” | • | – | — | | ™ | љ | › | њ | ќ | ћ | џ |
| Ax | NBSP | Ў | ў | Ј | ¤ | Ґ | ¦ | § | Ё | © | Є | « | ¬ | SHY | ® | Ї |
| Bx | ° | ± | І | і | ґ | µ | ¶ | · | ё | № | є | » | ј | Ѕ | ѕ | ї |
| Cx | А | Б | В | Г | Д | Е | Ж | З | И | Й | К | Л | М | Н | О | П |
| Dx | Р | С | Т | У | Ф | Х | Ц | Ч | Ш | Щ | Ъ | Ы | Ь | Э | Ю | Я |
| Ex | а | б | в | г | д | е | ж | з | и | й | к | л | м | н | о | п |
| Fx | р | с | т | у | ф | х | ц | ч | ш | щ | ъ | ы | ь | э | ю | я |
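Because each byte stands for one character, a cell of the table can be addressed as high nibble (row) plus low nibble (column). The following is a small Python sketch of that lookup, assuming the standard-library `cp1251` codec; `table_cell` is a hypothetical helper written for this illustration, not part of any library.

```python
def table_cell(row: int, column: int) -> str:
    """Return the character at the given row (high nibble) and column (low nibble)."""
    byte_value = (row << 4) | column
    # errors="replace" turns the one unassigned byte (0x98) into U+FFFD
    # instead of raising UnicodeDecodeError.
    return bytes([byte_value]).decode("cp1251", errors="replace")


print(table_cell(0xC, 0x0))        # row Cx, column 0
print(table_cell(0xA, 0x8))        # row Ax, column 8
print(repr(table_cell(0x9, 0x8)))  # the unassigned position

# Decoding a whole byte string works the same way, one byte per character.
raw = bytes([0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2])
text = raw.decode("cp1251")
print(text, len(raw), len(text))
```

Using `errors="replace"` is a choice made for display purposes; strict decoding is usually safer when converting data for storage.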
Kazakh variants
KZ-1048
An altered version of Windows-1251 was standardised in Kazakhstan as Kazakh standard STRK1048, and is known by the label KZ-1048. It differs in the rows shown below:
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8x | Ђ | Ѓ | ‚ | ѓ | „ | … | † | ‡ | € | ‰ | Љ | ‹ | Њ | Қ | Һ | Џ |
| 9x | ђ | ‘ | ’ | “ | ” | • | – | — | | ™ | љ | › | њ | қ | һ | џ |
| Ax | NBSP | Ұ | ұ | Ә | ¤ | Ө | ¦ | § | Ё | © | Ғ | « | ¬ | SHY | ® | Ү |
| Bx | ° | ± | І | і | ө | µ | ¶ | · | ё | № | ғ | » | ә | Ң | ң | ү |
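The positions where KZ-1048 departs from Windows-1251 can also be listed programmatically. The sketch below assumes a Python build whose standard library ships both the `cp1251` and `kz1048` codecs; `decode_byte` is a hypothetical helper introduced only for this example.

```python
def decode_byte(value: int, encoding: str) -> str:
    """Decode one byte, mapping unassigned positions to U+FFFD."""
    return bytes([value]).decode(encoding, errors="replace")


# Collect every upper-half byte whose meaning differs between the two tables.
differences = {}
for value in range(0x80, 0x100):
    in_1251 = decode_byte(value, "cp1251")
    in_kz = decode_byte(value, "kz1048")
    if in_1251 != in_kz:
        differences[value] = (in_1251, in_kz)

for value, (in_1251, in_kz) in differences.items():
    print(f"0x{value:02X}: {in_1251} -> {in_kz}")
```

All differences fall in rows 8x to Bx; the Russian alphabet in rows Cx to Fx is untouched, which is what keeps Russian text readable under either label.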
Code Page 1174
Code Page 1174 is another variant created for the Kazakh language, which matches Windows-1251 for the Russian subset of the Cyrillic letters. It differs from KZ-1048 by moving the Cyrillic letter Shha from 8E/9E to 8A/9A.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8x | Ђ | Ѓ | ‚ | ѓ | „ | … | † | ‡ | € | ‰ | Һ | ‹ | Њ | Қ | Ћ | Џ |
| 9x | ђ | ‘ | ’ | “ | ” | • | – | — | | ™ | һ | › | њ | қ | ћ | џ |
| Ax | NBSP | Ұ | ұ | Ә | ¤ | Ө | ¦ | § | Ё | © | Ғ | « | ¬ | SHY | ® | Ү |
| Bx | ° | ± | І | і | ө | µ | ¶ | · | ё | № | ғ | » | ә | Ң | ң | ү |
Latvian variant
Windows Latvian + Russian is a modification of Windows-1251 to support the Latvian language. It includes the letter Ō/ō, which was abolished in 1946 but is still used in the Latgalian language, while it lacks the letter Ŗ/ŗ.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8x | Ū | Ģ | ‚ | ō | „ | … | † | ‡ | Ž | ‰ | Š | ‹ | Ē | Ķ | Č | ģ |
| 9x | ū | ‘ | ’ | “ | ” | • | – | — | ž | ™ | š | › | ē | ķ | č | Ō |
| Ax | NBSP | Ā | ā | Ļ | ¤ | ļ | ¦ | § | Ё | © | Ņ | « | ¬ | SHY | ® | ¯ |
| Bx | ° | ± | Ī | ī | ´ | µ | ¶ | · | ё | № | ņ | » | ¼ | ½ | ¾ | × |
Finnish variant
Windows Cyrillic + Finnish is a modification of Windows-1251 that was used by Paratype to cover the Finnish language. This encoding is supported by FontLab Studio 5.[18] This variant is missing the letters Š and Ž, which are used in loanwords in Finnish and can be replaced by the digraphs SH and ZH.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8x | Ђ | Ѓ | ‚ | ѓ | „ | … | † | ‡ | ˆ | ‰ | Љ | ‹ | Њ | Ќ | Ћ | Џ |
| 9x | ђ | ‘ | ’ | “ | ” | • | – | — | ˜ | ™ | љ | › | њ | ќ | ћ | џ |
| Ax | NBSP | Ў | ў | Ó | ¤ | Ґ | ¦ | § | Ё | © | Ä | « | ¬ | SHY | ® | Ö |
| Bx | ° | ± | Å | å | ґ | µ | ¶ | · | ё | № | ä | » | ó | É | é | ö |
Amiga variant
| MIME / IANA | Amiga-1251 |
|---|---|
| Alias(es) | Ami1251 |
| Languages | English, Russian |
| Classification | extended ASCII |
| Based on | Windows-1251, ISO-8859-1, ISO-8859-15 |
Russian Amiga OS systems used a version of code page 1251 which matches Windows-1251 for the Russian subset of the Cyrillic letters, but otherwise mostly follows ISO-8859-1. This version is known as Amiga-1251,[19] under which name it is registered with the IANA.[20]
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8x | XXX | XXX | BPH | NBH | IND | NEL | SSA | ESA | HTS | HTJ | VTS | PLD | PLU | RI | SS2 | SS3 |
| 9x | DCS | PU1 | PU2 | STS | CCH | MW | SPA | EPA | SOS | XXX | SCI | CSI | ST | OSC | PM | APC |
| Ax | NBSP | ¡ | ¢ | £ | €[a] | ¥ | ¦ | § | Ё | © | №[b] | « | ¬ | SHY | ® | ¯ |
| Bx | ° | ± | ² | ³ | ´ | µ | ¶ | · | ё | ¹ | º | » | ¼ | ½ | ¾ | ¿ |
- [a] Matching ISO-8859-15; at a different location than in Windows-1251
- [b] Present in Windows-1251, but in a different location (absent from ISO-8859-1/15)
References
1. "Historical trends in the usage of character encodings, January 2024". Retrieved 2024-01-01.
2. "Frequently Asked Questions".
3. "Distribution of Character Encodings among websites that use .ru". w3techs.com. Retrieved 2024-01-01.
4. "Distribution of Character Encodings among websites that use Russian". w3techs.com. Retrieved 2023-01-16.
5. "Distribution of Character Encodings among websites that use Russian Federation". w3techs.com. Retrieved 2021-11-05.
6. "cp1251(7) - Linux manual page". man7.org. Retrieved 2018-07-01.
7. "Code page 1251 information document". Archived from the original on 2016-03-03.
8. "CCSID 1251 information document". Archived from the original on 2014-11-29.
9. "CCSID 5347 information document". Archived from the original on 2014-11-29.
10. Code Page CPGID 01251 (PDF), IBM
11. Code Page CPGID 01251 (txt), IBM
12. International Components for Unicode (ICU), ibm-1251_P100-1995.ucm, 2002-12-03
13. International Components for Unicode (ICU), ibm-5347_P100-1998.ucm, 2002-12-03
14. "Usage Statistics of Character Encodings for Websites". w3techs.com. Archived from the original on 2012-05-30.
15. Steele, Shawn (1998). CP1251 to Unicode table. Unicode Consortium. CP1251.TXT.
16. Whistler, Ken (2007). KZ-1048 to Unicode. Unicode Consortium. KZ1048.TXT.
17. ibm-1174_X100-2007.ucm, IBM
18. "FontLab Studio 5. Classic pro font editor for Mac & Windows".
19. Malyshev, Michael (2003). "Amiga-1251 to Unicode table". Registration of new charset [Amiga-1251]. IANA.
20. "Character Sets". IANA.
Further reading
- Kornai, Andras; Birnbaum, David J.; da Cruz, Frank; Davis, Bur; Fowler, George; Paine, Richard B.; Paperno, Slava; Simonsen, Keld J.; Thobe, Glenn E.; Vulis, Dimitri; van Wingen, Johan W. (1993-03-13). "CYRILLIC ENCODING FAQ Version 1.3". Retrieved 2020-06-24.
External links
- Windows 1251 reference chart
- IANA Charset Name Registration
- Unicode mappings of windows 1251 with "best fit"
- Universal Cyrillic decoder, an online program that may help recover unreadable Cyrillic texts with broken Windows-1251 or other character encodings.
Windows-1251
History and Development
Origins and Introduction
Windows-1251 was developed by Microsoft in the early 1990s to provide enhanced support for Cyrillic languages, overcoming the constraints of earlier encodings such as the DOS code page 866, which was primarily suited for console-based applications and lacked comprehensive coverage for graphical user interfaces.[4] This new encoding addressed the growing need for internationalization in Windows environments, where DOS-era code pages like 866 offered limited character sets that hindered proper rendering of text in multilingual contexts.[4]

The encoding was introduced in 1992 alongside Windows 3.1, marking a key advancement in Microsoft's efforts to localize the operating system for non-Western markets.[5] As part of the broader Windows-125x family of code pages, designed to deliver single-byte character support for various scripts, Windows-1251 enabled Russian-language versions of Windows 3.1 and subsequent editions like Windows for Workgroups 3.11.[4][5] This family of encodings, including 1250 for Central European and 1252 for Western European languages, reflected Microsoft's strategy to extend beyond the ASCII standard for regional adaptations.[3]

Initially targeted at Cyrillic-using languages such as Russian, Ukrainian, Belarusian, Bulgarian, Serbian, and Macedonian, Windows-1251 expanded the available character repertoire beyond the basic ASCII set by using the upper half of the byte range (0x80–0xFF).[6] This focus allowed for the inclusion of the letters and punctuation specific to these languages, facilitating broader adoption in Eastern European computing.[5]

A core design principle of Windows-1251 was maintaining full compatibility with the ASCII standard in the 0x00–0x7F range, ensuring seamless integration with existing English-language software and data.[4] The remaining 128 slots were allocated primarily to uppercase and lowercase Cyrillic letters, along with punctuation marks and other symbols necessary for accurate text representation in the supported languages.[4] These choices prioritized efficiency in single-byte storage while supporting the nuances of Cyrillic orthography.
Standardization Efforts
Microsoft registered Windows-1251 as code page 1251 in the early 1990s as part of its efforts to support Cyrillic languages in Windows operating systems. The encoding received formal recognition through the Internet Assigned Numbers Authority (IANA) on May 3, 1996, which assigned it the MIME name "windows-1251" along with aliases such as "cp1251" and "x-cp1251" for use in internet protocols and applications.[5]

In the 2010s, the Web Hypertext Application Technology Working Group (WHATWG) standardized Windows-1251 within its Encoding Standard to ensure compatibility with legacy web content, defining precise algorithms for encoding and decoding byte sequences to scalar values. This inclusion facilitated its support in modern browsers and web technologies, including as a legacy single-byte encoding with predefined labels. IBM further integrated the encoding into its systems by assigning it CCSID 1251 for general Windows Cyrillic use and CCSID 5347 for an extended variant incorporating the euro sign, enabling conversions across EBCDIC and ASCII-based environments.[7][8]

Maintenance of Windows-1251 has involved minor revisions for consistency, particularly in mappings to Unicode, as documented in Microsoft's official CP1251.TXT file released on April 15, 1998, which provides a comprehensive table aligning code points to Unicode 2.0 equivalents. These updates addressed errata in character assignments to support interoperability with Unicode-based systems. No major changes have occurred post-2020, though the encoding remains incorporated in contemporary specifications, such as the HTML5 encoding sniffing algorithm, which detects and handles it in document parsing for backward compatibility.[9]
Encoding Details
Character Set Composition
Windows-1251 is an 8-bit single-byte character encoding with 256 byte values, divided into 128 positions compatible with the ASCII standard (0x00 to 0x7F) and 128 extended positions (0x80 to 0xFF) primarily dedicated to Cyrillic and related characters.[10] The extended range provides comprehensive coverage for Cyrillic letters, with uppercase and lowercase forms tailored to six Slavic languages (Russian, Ukrainian, Belarusian, Bulgarian, Serbian in its Cyrillic variant, and Macedonian) as well as other Cyrillic-using languages.[10][11] This repertoire ensures support for the core alphabets of these languages, including letters such as the dotted I (І/і) for Ukrainian and Belarusian, and the Kje (Ќ/ќ) and Gje (Ѓ/ѓ) for Macedonian. The letters Yo (Ё) and yo (ё) have their own code points, 0xA8 and 0xB8, outside the main alphabetical run at 0xC0–0xFF, so each is stored as a single precomposed letter rather than as Ye (Е/е) plus a separate diacritic.[10]

Beyond letters, the encoding incorporates a variety of additional symbols in the extended range, including punctuation marks such as curved quotes and em dashes, mathematical operators like the plus-minus sign (±), and currency symbols such as the euro sign (€) and the generic currency sign (¤). The control characters follow the standard C0 set (0x00 to 0x1F, plus DEL at 0x7F), while the 0x80 to 0x9F range, which ISO 8-bit standards reserve for C1 controls, is assigned to printable symbols and letters instead. Notably, Windows-1251 includes no private use area; every byte except the undefined 0x98 maps to a defined character.[10]
Code Point Assignments
Windows-1251 employs a single-byte, fixed-width encoding scheme, assigning each character to one of 256 possible byte values ranging from 0x00 to 0xFF, with no byte order or endianness considerations due to its fixed single-byte nature.[10] The code points from 0x00 to 0x7F are identical to those in US-ASCII (equivalent to ISO 646 basic set), providing direct compatibility for Latin characters and control codes.[10] In the extended range (0x80 to 0xFF), the encoding primarily maps to Cyrillic characters, along with additional punctuation, symbols, and spacing characters; notably, 0x98 remains undefined.[10] A key extension in this range is 0xA0, which maps to the non-breaking space (Unicode U+00A0), differing from some other 8-bit encodings where 0xA0 might be undefined or a control.[10]

The following table details the code point assignments for 0x80 to 0xFF, including the byte value in hexadecimal, the corresponding character, its Unicode code point, and a brief description. These mappings are based on the official bidirectional conversion table provided by Microsoft.[10]
| Hex | Character | Unicode | Description |
|---|---|---|---|
| 0x80 | Ђ | U+0402 | CYRILLIC CAPITAL LETTER DJE |
| 0x81 | Ѓ | U+0403 | CYRILLIC CAPITAL LETTER GJE |
| 0x82 | ‚ | U+201A | SINGLE LOW-9 QUOTATION MARK |
| 0x83 | ѓ | U+0453 | CYRILLIC SMALL LETTER GJE |
| 0x84 | „ | U+201E | DOUBLE LOW-9 QUOTATION MARK |
| 0x85 | … | U+2026 | HORIZONTAL ELLIPSIS |
| 0x86 | † | U+2020 | DAGGER |
| 0x87 | ‡ | U+2021 | DOUBLE DAGGER |
| 0x88 | € | U+20AC | EURO SIGN |
| 0x89 | ‰ | U+2030 | PER MILLE SIGN |
| 0x8A | Љ | U+0409 | CYRILLIC CAPITAL LETTER LJE |
| 0x8B | ‹ | U+2039 | SINGLE LEFT-POINTING ANGLE QUOTATION MARK |
| 0x8C | Њ | U+040A | CYRILLIC CAPITAL LETTER NJE |
| 0x8D | Ќ | U+040C | CYRILLIC CAPITAL LETTER KJE |
| 0x8E | Ћ | U+040B | CYRILLIC CAPITAL LETTER TSHE |
| 0x8F | Џ | U+040F | CYRILLIC CAPITAL LETTER DZHE |
| 0x90 | ђ | U+0452 | CYRILLIC SMALL LETTER DJE |
| 0x91 | ‘ | U+2018 | LEFT SINGLE QUOTATION MARK |
| 0x92 | ’ | U+2019 | RIGHT SINGLE QUOTATION MARK |
| 0x93 | “ | U+201C | LEFT DOUBLE QUOTATION MARK |
| 0x94 | ” | U+201D | RIGHT DOUBLE QUOTATION MARK |
| 0x95 | • | U+2022 | BULLET |
| 0x96 | – | U+2013 | EN DASH |
| 0x97 | — | U+2014 | EM DASH |
| 0x98 | | | (undefined) |
| 0x99 | ™ | U+2122 | TRADE MARK SIGN |
| 0x9A | љ | U+0459 | CYRILLIC SMALL LETTER LJE |
| 0x9B | › | U+203A | SINGLE RIGHT-POINTING ANGLE QUOTATION MARK |
| 0x9C | њ | U+045A | CYRILLIC SMALL LETTER NJE |
| 0x9D | ќ | U+045C | CYRILLIC SMALL LETTER KJE |
| 0x9E | ћ | U+045B | CYRILLIC SMALL LETTER TSHE |
| 0x9F | џ | U+045F | CYRILLIC SMALL LETTER DZHE |
| 0xA0 | NBSP | U+00A0 | NO-BREAK SPACE |
| 0xA1 | Ў | U+040E | CYRILLIC CAPITAL LETTER SHORT U |
| 0xA2 | ў | U+045E | CYRILLIC SMALL LETTER SHORT U |
| 0xA3 | Ј | U+0408 | CYRILLIC CAPITAL LETTER JE |
| 0xA4 | ¤ | U+00A4 | CURRENCY SIGN |
| 0xA5 | Ґ | U+0490 | CYRILLIC CAPITAL LETTER GHE WITH UPTURN |
| 0xA6 | ¦ | U+00A6 | BROKEN BAR |
| 0xA7 | § | U+00A7 | SECTION SIGN |
| 0xA8 | Ё | U+0401 | CYRILLIC CAPITAL LETTER IO |
| 0xA9 | © | U+00A9 | COPYRIGHT SIGN |
| 0xAA | Є | U+0404 | CYRILLIC CAPITAL LETTER UKRAINIAN IE |
| 0xAB | « | U+00AB | LEFT-POINTING DOUBLE ANGLE QUOTATION MARK |
| 0xAC | ¬ | U+00AC | NOT SIGN |
| 0xAD | SHY | U+00AD | SOFT HYPHEN |
| 0xAE | ® | U+00AE | REGISTERED SIGN |
| 0xAF | Ї | U+0407 | CYRILLIC CAPITAL LETTER YI |
| 0xB0 | ° | U+00B0 | DEGREE SIGN |
| 0xB1 | ± | U+00B1 | PLUS-MINUS SIGN |
| 0xB2 | І | U+0406 | CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I |
| 0xB3 | і | U+0456 | CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I |
| 0xB4 | ґ | U+0491 | CYRILLIC SMALL LETTER GHE WITH UPTURN |
| 0xB5 | µ | U+00B5 | MICRO SIGN |
| 0xB6 | ¶ | U+00B6 | PILCROW SIGN |
| 0xB7 | · | U+00B7 | MIDDLE DOT |
| 0xB8 | ё | U+0451 | CYRILLIC SMALL LETTER IO |
| 0xB9 | № | U+2116 | NUMERO SIGN |
| 0xBA | є | U+0454 | CYRILLIC SMALL LETTER UKRAINIAN IE |
| 0xBB | » | U+00BB | RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK |
| 0xBC | ј | U+0458 | CYRILLIC SMALL LETTER JE |
| 0xBD | Ѕ | U+0405 | CYRILLIC CAPITAL LETTER DZE |
| 0xBE | ѕ | U+0455 | CYRILLIC SMALL LETTER DZE |
| 0xBF | ї | U+0457 | CYRILLIC SMALL LETTER YI |
| 0xC0 | А | U+0410 | CYRILLIC CAPITAL LETTER A |
| 0xC1 | Б | U+0411 | CYRILLIC CAPITAL LETTER BE |
| 0xC2 | В | U+0412 | CYRILLIC CAPITAL LETTER VE |
| 0xC3 | Г | U+0413 | CYRILLIC CAPITAL LETTER GHE |
| 0xC4 | Д | U+0414 | CYRILLIC CAPITAL LETTER DE |
| 0xC5 | Е | U+0415 | CYRILLIC CAPITAL LETTER IE |
| 0xC6 | Ж | U+0416 | CYRILLIC CAPITAL LETTER ZHE |
| 0xC7 | З | U+0417 | CYRILLIC CAPITAL LETTER ZE |
| 0xC8 | И | U+0418 | CYRILLIC CAPITAL LETTER I |
| 0xC9 | Й | U+0419 | CYRILLIC CAPITAL LETTER SHORT I |
| 0xCA | К | U+041A | CYRILLIC CAPITAL LETTER KA |
| 0xCB | Л | U+041B | CYRILLIC CAPITAL LETTER EL |
| 0xCC | М | U+041C | CYRILLIC CAPITAL LETTER EM |
| 0xCD | Н | U+041D | CYRILLIC CAPITAL LETTER EN |
| 0xCE | О | U+041E | CYRILLIC CAPITAL LETTER O |
| 0xCF | П | U+041F | CYRILLIC CAPITAL LETTER PE |
| 0xD0 | Р | U+0420 | CYRILLIC CAPITAL LETTER ER |
| 0xD1 | С | U+0421 | CYRILLIC CAPITAL LETTER ES |
| 0xD2 | Т | U+0422 | CYRILLIC CAPITAL LETTER TE |
| 0xD3 | У | U+0423 | CYRILLIC CAPITAL LETTER U |
| 0xD4 | Ф | U+0424 | CYRILLIC CAPITAL LETTER EF |
| 0xD5 | Х | U+0425 | CYRILLIC CAPITAL LETTER HA |
| 0xD6 | Ц | U+0426 | CYRILLIC CAPITAL LETTER TSE |
| 0xD7 | Ч | U+0427 | CYRILLIC CAPITAL LETTER CHE |
| 0xD8 | Ш | U+0428 | CYRILLIC CAPITAL LETTER SHA |
| 0xD9 | Щ | U+0429 | CYRILLIC CAPITAL LETTER SHCHA |
| 0xDA | Ъ | U+042A | CYRILLIC CAPITAL LETTER HARD SIGN |
| 0xDB | Ы | U+042B | CYRILLIC CAPITAL LETTER YERU |
| 0xDC | Ь | U+042C | CYRILLIC CAPITAL LETTER SOFT SIGN |
| 0xDD | Э | U+042D | CYRILLIC CAPITAL LETTER E |
| 0xDE | Ю | U+042E | CYRILLIC CAPITAL LETTER YU |
| 0xDF | Я | U+042F | CYRILLIC CAPITAL LETTER YA |
| 0xE0 | а | U+0430 | CYRILLIC SMALL LETTER A |
| 0xE1 | б | U+0431 | CYRILLIC SMALL LETTER BE |
| 0xE2 | в | U+0432 | CYRILLIC SMALL LETTER VE |
| 0xE3 | г | U+0433 | CYRILLIC SMALL LETTER GHE |
| 0xE4 | д | U+0434 | CYRILLIC SMALL LETTER DE |
| 0xE5 | е | U+0435 | CYRILLIC SMALL LETTER IE |
| 0xE6 | ж | U+0436 | CYRILLIC SMALL LETTER ZHE |
| 0xE7 | з | U+0437 | CYRILLIC SMALL LETTER ZE |
| 0xE8 | и | U+0438 | CYRILLIC SMALL LETTER I |
| 0xE9 | й | U+0439 | CYRILLIC SMALL LETTER SHORT I |
| 0xEA | к | U+043A | CYRILLIC SMALL LETTER KA |
| 0xEB | л | U+043B | CYRILLIC SMALL LETTER EL |
| 0xEC | м | U+043C | CYRILLIC SMALL LETTER EM |
| 0xED | н | U+043D | CYRILLIC SMALL LETTER EN |
| 0xEE | о | U+043E | CYRILLIC SMALL LETTER O |
| 0xEF | п | U+043F | CYRILLIC SMALL LETTER PE |
| 0xF0 | р | U+0440 | CYRILLIC SMALL LETTER ER |
| 0xF1 | с | U+0441 | CYRILLIC SMALL LETTER ES |
| 0xF2 | т | U+0442 | CYRILLIC SMALL LETTER TE |
| 0xF3 | у | U+0443 | CYRILLIC SMALL LETTER U |
| 0xF4 | ф | U+0444 | CYRILLIC SMALL LETTER EF |
| 0xF5 | х | U+0445 | CYRILLIC SMALL LETTER HA |
| 0xF6 | ц | U+0446 | CYRILLIC SMALL LETTER TSE |
| 0xF7 | ч | U+0447 | CYRILLIC SMALL LETTER CHE |
| 0xF8 | ш | U+0448 | CYRILLIC SMALL LETTER SHA |
| 0xF9 | щ | U+0449 | CYRILLIC SMALL LETTER SHCHA |
| 0xFA | ъ | U+044A | CYRILLIC SMALL LETTER HARD SIGN |
| 0xFB | ы | U+044B | CYRILLIC SMALL LETTER YERU |
| 0xFC | ь | U+044C | CYRILLIC SMALL LETTER SOFT SIGN |
| 0xFD | э | U+044D | CYRILLIC SMALL LETTER E |
| 0xFE | ю | U+044E | CYRILLIC SMALL LETTER YU |
| 0xFF | я | U+044F | CYRILLIC SMALL LETTER YA |
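A table like the one above can be regenerated, or checked for transcription errors, from a codec implementation. The following is a sketch under the assumption that Python's standard-library `cp1251` codec follows the same Microsoft mapping; `describe` is a hypothetical helper, and the output format only approximates the table layout.

```python
import unicodedata


def describe(byte_value: int):
    """Return (code point label, Unicode name) for a byte, or None if unassigned."""
    try:
        char = bytes([byte_value]).decode("cp1251")
    except UnicodeDecodeError:
        return None
    return f"U+{ord(char):04X}", unicodedata.name(char)


# One entry per byte in the extended range 0x80 to 0xFF.
rows = {value: describe(value) for value in range(0x80, 0x100)}
unassigned = [value for value, row in rows.items() if row is None]

for value, row in rows.items():
    label, name = row if row else ("", "(undefined)")
    print(f"| 0x{value:02X} | {label} | {name} |")
```

Strict decoding is used here on purpose, so that the unassigned byte surfaces as an exception rather than being silently replaced.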
Variants
Kazakh Variants
The Kazakh variants of Windows-1251 adapt the standard encoding to support the additional letters of the Kazakh Cyrillic alphabet, such as schwa (Ә/ә), en with descender (Ң/ң), straight U (Ү/ү), straight U with stroke (Ұ/ұ), barred O (Ө/ө), and others, by reassigning code points in the 0x80–0xBF range that Windows-1251 uses for letters of other languages.[13] These modifications ensure compatibility with Kazakh orthography while preserving the Russian Cyrillic subset from the base Windows-1251 encoding.[14]

The primary variant is KZ-1048, also designated as STRK1048-2002, which serves as the national standard established by the Republic of Kazakhstan in 2002.[14] This encoding reassigns 16 code points from Windows-1251 to hold eight Kazakh letters in both cases (Қ, Һ, Ұ, Ә, Ө, Ғ, Ү, and Ң), taking over positions that Windows-1251 assigns to Serbian, Macedonian, Ukrainian, and Belarusian letters.[15] Key examples include the assignment of 0xA3 to Ә (U+04D8, CYRILLIC CAPITAL LETTER SCHWA), 0xBD to Ң (U+04A2, CYRILLIC CAPITAL LETTER EN WITH DESCENDER), and 0xAF to Ү (U+04AE, CYRILLIC CAPITAL LETTER STRAIGHT U).[13]

The following table summarizes select remapped code points for Kazakh letters in KZ-1048:
| Hex Code (Capital / Small) | Character (Capital / Small) | Unicode (Capital) | Description |
|---|---|---|---|
| 0xA1 / 0xA2 | Ұ / ұ | U+04B0 | CYRILLIC CAPITAL LETTER STRAIGHT U WITH STROKE |
| 0xA3 / 0xBC | Ә / ә | U+04D8 | CYRILLIC CAPITAL LETTER SCHWA |
| 0xA5 / 0xB4 | Ө / ө | U+04E8 | CYRILLIC CAPITAL LETTER BARRED O |
| 0xAF / 0xBF | Ү / ү | U+04AE | CYRILLIC CAPITAL LETTER STRAIGHT U |
| 0xBD / 0xBE | Ң / ң | U+04A2 | CYRILLIC CAPITAL LETTER EN WITH DESCENDER |
Amiga Adaptation
Amiga-1251, a variant adapted for Amiga operating systems during the 1990s, combines the Cyrillic letter block from Windows-1251 (code points 0xC0–0xFF) with the ISO/IEC 8859-1 layout in the rest of the upper half (C1 control codes in 0x80–0x9F and Latin-1 symbols in 0xA0–0xBF), so that Russian can be used alongside English on the platform.[18] This encoding also incorporates the euro sign at 0xA4, as in ISO/IEC 8859-15, along with the numero sign (№) at 0xAA and the Russian letters Ё and ё at 0xA8 and 0xB8.[19] Developed by the Amiga user community to address the platform's need for multilingual text handling in software and fonts, Amiga-1251 is not an official Microsoft product but a grassroots effort tailored to AmigaOS environments.[18]

Because the Cyrillic letters occupy 0xC0–0xFF, where ISO/IEC 8859-1 places its accented Latin letters, the encoding keeps Latin-1 punctuation and symbols such as ¡ and ¿ but has no room for accented letters such as Ñ/ñ; it was used in Amiga font editors and localized ports of international applications.[19] The partial overlap between Amiga-1251 and standard Windows-1251 often results in compatibility issues when files in the two encodings are interchanged without proper conversion: the Russian letters survive, but bytes in the 0x80–0xBF range display as different characters (mojibake).[18] Following the decline of Amiga hardware and software ecosystems in the early 2000s, Amiga-1251 fell into obsolescence, though it persists in documentation within retro computing repositories like the Aminet archive.[20]
Usage and Adoption
In Microsoft Windows
Windows-1251 serves as the default ANSI code page (code page 1251) for Cyrillic locales, such as Russian, in Microsoft Windows operating systems from Windows 95 through Windows 10 and Windows 11.[3][21] This encoding enables proper representation of Cyrillic characters in system interfaces and applications configured for those locales, where the ANSI code page determines the mapping for non-Unicode text handling.[22]

Core Windows fonts, including Arial and Times New Roman, provide full glyph support for Windows-1251, ensuring accurate rendering of its character set across various weights and styles.[23][24] These fonts, bundled with Windows since version 95, include the necessary Cyrillic extensions in their design, facilitating consistent display in documents and user interfaces without requiring additional installations for basic usage.[25]

In legacy applications developed before widespread Unicode adoption, Windows-1251 was extensively used for text storage and display on Cyrillic-configured systems.[26] Applications like the original Notepad saved plain text files (.txt) in the system's ANSI code page, which was Windows-1251 for Russian locales, while legacy Microsoft Word documents (.doc) similarly relied on it for non-Unicode content.[27] This made Windows-1251 the de facto standard for file formats and data interchange in Eastern European and CIS region deployments during the 1990s and early 2000s.
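Files saved this way carry no marker of their encoding, so a reader has to supply it. The following is a minimal Python sketch of reading such a file, assuming the file is known to be Windows-1251; `read_legacy_text` and the file name are hypothetical, and the temporary file merely simulates output from a pre-Unicode application.

```python
import tempfile
from pathlib import Path


def read_legacy_text(path, encoding="cp1251"):
    """Read a text file saved in a legacy single-byte code page."""
    return Path(path).read_text(encoding=encoding)


# Simulate a file written by a pre-Unicode application on a Russian-locale system.
sample = "Привет, мир"
with tempfile.TemporaryDirectory() as folder:
    legacy_file = Path(folder) / "legacy.txt"  # hypothetical file name
    legacy_file.write_bytes(sample.encode("cp1251"))

    size_in_bytes = legacy_file.stat().st_size
    recovered = read_legacy_text(legacy_file)

print(recovered)
print(size_in_bytes, len(sample))  # one byte per character
```

Passing the encoding explicitly avoids depending on the platform default, which differs between systems and locales.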
Windows provides built-in system APIs for handling Windows-1251, such as the MultiByteToWideChar function, which converts text from code page 1251 to wide-character (UTF-16) format for internal processing.[28] This support has been available since Windows NT, allowing developers to integrate Cyrillic text conversion seamlessly into applications without external libraries.[29]

As of 2025, Windows-1251 remains supported in Windows 11 for backward compatibility with legacy files and non-Unicode applications, though Microsoft recommends UTF-8 for new development due to its broader character coverage and cross-platform consistency.[1] Modern Windows applications, including the updated Notepad, feature auto-detection mechanisms that identify and correctly render Windows-1251 encoded files, mitigating issues in mixed-encoding environments.[27]
Web and Other Applications
Windows-1251 remains in use on approximately 0.2% of global websites as of November 2025, primarily as a legacy encoding for Cyrillic content, though its adoption has steadily declined over the past decade.[30] Among websites hosted in the Russian Federation, UTF-8 dominates with 96.4% usage, leaving Windows-1251 and other legacy Cyrillic encodings to serve the remaining share, often in older or regionally specific sites.[31] This encoding's web presence is concentrated in legacy systems where migration to Unicode has not yet occurred, but new web development overwhelmingly favors UTF-8 for broader compatibility.[32]

In email and MIME standards, Windows-1251 is officially registered as a charset name by the Internet Assigned Numbers Authority (IANA), enabling its use in email headers and bodies for Cyrillic text transmission.[5] It persists in legacy email systems, particularly for Russian and Bulgarian correspondence, where older clients and servers default to this encoding for compatibility with pre-Unicode infrastructure.[33]

Support for Windows-1251 extends to non-Windows platforms, including Linux and Unix systems through the iconv utility, which handles conversions using the alias CP1251 for Cyrillic data processing.[34] In Java environments, Oracle's implementation includes windows-1251 as a supported charset in the java.base module, with aliases like Cp1251 and cp1251 for input/output operations involving legacy Cyrillic files.[35] IBM mainframes recognize it via Coded Character Set Identifier (CCSID) 1251, facilitating data exchange in enterprise environments with historical Cyrillic requirements.[8]

Various applications continue to leverage Windows-1251 for Cyrillic handling, such as Adobe Acrobat for importing and exporting text in PDF documents, where it ensures proper rendering of Russian characters in legacy workflows.[36] Older web browsers like Internet Explorer 6 natively supported windows-1251 for displaying Cyrillic web pages, a feature inherited from earlier versions to accommodate regional content without Unicode.[37] PDF generation tools also employ it for embedding Cyrillic fonts in documents targeted at legacy systems.[38] As of 2025, Windows-1251 sees minimal new adoption, confined largely to legacy databases and regional intranets in Cyrillic-speaking areas, where it maintains compatibility with unmodernized archives and internal networks.[39]
Compatibility and Comparisons
With Other Cyrillic Encodings
Windows-1251 differs significantly from ISO/IEC 8859-5, the international standard for Cyrillic encoding established in 1988, in both character coverage and code point assignments. While ISO/IEC 8859-5 provides 96 graphic characters in the 0xA0–0xFF range, including 66 core Cyrillic letters for Russian and related languages, Windows-1251 expands this to 191 usable characters across 0x80–0xFF, incorporating additional symbols, punctuation, and letters for non-Russian Slavic languages such as Ukrainian (e.g., Ukrainian IE at 0xAA/0xBA) and Belarusian (e.g., BYELORUSSIAN-UKRAINIAN I at 0xB2/0xB3).[40][10] Specific additions in Windows-1251 include typographic symbols like the euro sign (0x88, U+20AC) and en dash (0x96, U+2013), absent in ISO/IEC 8859-5, as well as the Yo letter (Ё/ё at 0xA8/0xB8), which appears at 0xA1/0xF1 in the ISO standard. Mappings for common letters are incompatible; for instance, Cyrillic capital A (А, U+0410) is at 0xC0 in Windows-1251 but 0xB0 in ISO/IEC 8859-5.[40][10][41] In comparison to KOI8-R, the de facto Unix standard for Russian text defined in RFC 1489, Windows-1251 offers broader multilingual support while maintaining ASCII compatibility in the 0x00–0x7F range, unlike KOI8-R, which reserves this for controls and symbols. Both encodings cover the 33 letters of the Russian alphabet, but KOI8-R employs a distinctive design where uppercase letters occupy 0xE0–0xFF and lowercase 0xC0–0xDF, effectively reversing the case bit relative to standard 7-bit assignments (e.g., capital A at 0xE1, U+0410; small a at 0xC1, U+0430).[42][43] This phonetic ordering in KOI8-R prioritizes Russian frequency but limits it primarily to Russian, excluding dedicated support for Bulgarian or Serbian variants found in Windows-1251 (e.g., short U at 0xA1/0xA2). 
KOI8-R was preferred in Unix environments for its legacy in Soviet computing standards, whereas Windows-1251 integrated better with Microsoft ecosystems.[42][43][10]
IBM-866, a DOS-era code page (CCSID 866) developed by IBM for Cyrillic under OS/2 and MS-DOS, contrasts with Windows-1251 through its emphasis on terminal graphics over typographic symbols. IBM-866 places the uppercase Cyrillic letters at 0x80–0x9F and the first half of the lowercase letters at 0xA0–0xAF, with the remaining lowercase letters at 0xE0–0xEF and further letters and symbols at 0xF0–0xFF. The letters are split around the box-drawing range at 0xB0–0xDF, unlike Windows-1251's contiguous Cyrillic block at 0xC0–0xFF.[44][45] For example, capital A (U+0410) is at 0x80 in IBM-866 versus 0xC0 in Windows-1251. While both support Russian Cyrillic, IBM-866 prioritizes box-drawing characters (0xB0–0xDF for lines and blocks) for text-mode interfaces, sacrificing space for the punctuation and currency symbols (e.g., section sign at 0xA7 in Windows-1251) that enhance Windows-1251's document handling.[44][10][45]
Direct interchange between Windows-1251 and these encodings risks mojibake due to mismatched mappings. For instance, the byte 0xC0 (А in Windows-1251) is rendered as Р when read as ISO/IEC 8859-5, while an ISO/IEC 8859-5 А (0xB0) read as Windows-1251 becomes a degree sign; KOI8-R text misread as Windows-1251 appears with its case swapped and its letters scrambled. Conversion tools like GNU recode or iconv are essential for safe migration, supporting pairwise transformations via Unicode as an intermediary.
Windows-1251's advantages include a superset of ISO/IEC 8859-5's Cyrillic letter coverage, added typographic and multilingual extensions, and a contiguous alphabetical layout in place of KOI8-R's transliteration-based ordering or IBM-866's graphics focus, making it suitable for broader East European text processing in Windows environments.[10][40][43]
Migration to Unicode
Every assigned byte in Windows-1251 maps to exactly one Unicode code point: its Cyrillic letters fall within the Unicode Cyrillic block (U+0400–U+04FF), and its punctuation and symbols map to other blocks. This enables straightforward conversion for standard Cyrillic characters as defined in Unicode 1.0 and later versions. Microsoft provides official conversion tables that align the 8-bit code points of Windows-1251 with their Unicode equivalents, primarily covering Russian, Bulgarian, Serbian, and other Slavic languages using the Cyrillic script.[10][3]
However, migrations can encounter challenges, particularly with variants of Windows-1251 such as those adapted for Kazakh (e.g., Code Page 1174 or 58595), where certain byte positions differ from the standard encoding and decode to the wrong characters if the data is treated as standard Windows-1251. Conversions can also be lossy where undefined bytes (e.g., 0x98 in the standard table) are replaced with substitution characters like the Unicode replacement character (U+FFFD) or question marks, potentially losing semantic information in legacy data. Additionally, as a single-byte encoding, Windows-1251 has no byte-order mark (BOM) or other signature, complicating automatic detection during conversion without explicit metadata.[10][46]
For effective migration, best practices emphasize using established libraries for batch conversions to preserve data integrity. The International Components for Unicode (ICU) library supports direct conversion from Windows-1251 (CP1251) to UTF-8 or UTF-16, handling fallback mappings for undefined characters. Similarly, Python's built-in codecs module allows reliable decoding via codecs.decode(data, 'cp1251') followed by UTF-8 encoding, suitable for scripting large-scale file conversions. New content creation should default to UTF-8 to align with modern standards, ensuring compatibility across platforms without region-specific limitations.[47]
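A minimal sketch of the Python route mentioned above, assuming the input really is standard Windows-1251 rather than a regional variant. The function name `cp1251_to_utf8` and the sample bytes are hypothetical; `codecs.decode` and the `strict` and `replace` error handlers are part of the standard library.

```python
import codecs


def cp1251_to_utf8(data, errors="strict"):
    """Decode Windows-1251 bytes to text, then re-encode the text as UTF-8.

    With errors="strict" an undefined byte such as 0x98 raises
    UnicodeDecodeError; with errors="replace" it becomes U+FFFD.
    """
    text = codecs.decode(data, "cp1251", errors)
    return text.encode("utf-8")


sample = b"\xcf\xf0\xe8\xe2\xe5\xf2"  # "Привет" encoded as Windows-1251
print(cp1251_to_utf8(sample).decode("utf-8"))

damaged = sample + b"\x98"  # 0x98 is unassigned in the standard table
try:
    cp1251_to_utf8(damaged)
except UnicodeDecodeError as exc:
    print("strict decoding failed at byte offset", exc.start)

# Lossy fallback: the undefined byte is replaced rather than rejected.
print(cp1251_to_utf8(damaged, errors="replace").decode("utf-8"))
```

Running in strict mode first is one possible design choice for batch jobs: it surfaces files that contain undefined bytes or that may be in a different encoding, before any lossy replacement is accepted.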
As of November 2025, Windows-1251 persists as a legacy encoding on approximately 0.2% of websites globally. Modern browsers like Chrome and Firefox render it correctly when a page declares it with a label recognized by the Encoding Standard, and fall back on auto-detection heuristics when no encoding is declared, usually without user intervention. Microsoft has deprecated reliance on code pages in favor of Unicode for new Windows 11 applications, where UTF-8 or UTF-16 serves as the default for text handling, eliminating Windows-1251 as an implicit fallback in UWP apps and .NET frameworks.[30][48][3]
Migration to Unicode offers key benefits, including support for 154,998 characters across 168 scripts (as of Unicode 16.0), far exceeding Windows-1251's 256-code-point limit and enabling seamless handling of mixed-language content. This transition prevents mojibake (garbled text from encoding mismatches) in global contexts, such as international email or web localization, where legacy 8-bit encodings often fail.[49][50]
A notable case study is the Russian web ecosystem, where Windows-1251 dominated over 50% of sites in the early 2000s due to its status as the default for Russian Windows installations, but adoption has plummeted to under 5% by 2025 amid widespread UTF-8 standardization. This shift, driven by browser support and content management systems like WordPress defaulting to Unicode, has improved accessibility while requiring one-time conversions for archival content.[30][51]
References
- https://en.wikibooks.org/wiki/Character_Encodings/Code_Tables/Windows/Code_page_58595
