Windows-1251
from Wikipedia
Windows-1251
MIME / IANA: windows-1251
Alias(es): cp1251 (Code page 1251)
Languages: Russian, Ukrainian, Belarusian, Bulgarian, Serbian Cyrillic, Bosnian Cyrillic, Macedonian, Rotokas, Rusyn, English
Created by: Microsoft
Standard: WHATWG Encoding Standard
Classification: extended ASCII, Windows-125x
Other related encodings: Amiga-1251, KZ-1048, RFC 1345's "ECMA-Cyrillic"

Windows-1251 is an 8-bit character encoding, designed to cover languages that use the Cyrillic script such as Russian, Ukrainian, Belarusian, Bulgarian, Serbian Cyrillic, Macedonian and other languages.

On the web, it is the second most-used single-byte character encoding (or third most-used character encoding overall), and the most used of the single-byte encodings supporting Cyrillic. As of January 2024, 0.3% of all websites use Windows-1251.[1][2] It is used mostly for Russian, yet only a small minority of Russian websites use it: 94.6% of Russian (.ru) websites use UTF-8,[3][4][5] and the legacy 8-bit encoding is a distant second. In Linux, the encoding is known as cp1251.[6] IBM uses code page 1251 (CCSID 1251, and CCSID 5347 with the euro sign added) for Windows-1251.[7][8][9][10][11][12][13]

Windows-1251 and KOI8-R (or its Ukrainian variant KOI8-U) are much more commonly used than ISO 8859-5, which is used by less than 0.0004% of websites.[14] Unlike Windows-1252, which is closely related to ISO 8859-1, Windows-1251 is not closely related to ISO 8859-5.

Unicode (e.g. UTF-8) is preferred over Windows-1251 and other Cyrillic encodings in modern applications, especially on the Internet, where UTF-8 is the dominant encoding for web pages. (For further discussion of Unicode's complete coverage of 436 Cyrillic letters/code points, including Old Cyrillic, and why single-byte character encodings such as Windows-1251 and KOI8-R cannot provide it, see Cyrillic script in Unicode.)

Character set


The following table shows Windows-1251. Each character is shown with its Unicode equivalent and its Alt code.

Windows-1251[15]
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
1x DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
2x  SP  ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~ DEL
8x Ђ Ѓ ‚ ѓ „ … † ‡ € ‰ Љ ‹ Њ Ќ Ћ Џ
9x ђ ‘ ’ “ ” • – — (undefined) ™ љ › њ ќ ћ џ
Ax NBSP Ў ў Ј ¤ Ґ ¦ § Ё © Є « ¬ SHY ® Ї
Bx ° ± І і ґ µ ¶ · ё № є » ј Ѕ ѕ ї
Cx А Б В Г Д Е Ж З И Й К Л М Н О П
Dx Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я
Ex а б в г д е ж з и й к л м н о п
Fx р с т у ф х ц ч ш щ ъ ы ь э ю я
  Differences from Windows-1252
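The upper half of the grid above can be regenerated from an implementation's own mapping as a cross-check. This is a minimal sketch assuming Python 3.9 or later and its built-in `cp1251` codec; `windows_1251_grid` is an illustrative name, and the `--` placeholder for undefined bytes (only 0x98) and the NBSP/SHY labels are presentation choices, not part of the encoding.

```python
# Sketch: rebuild rows 8x-Fx of the Windows-1251 grid from Python's "cp1251" codec.
# Invisible characters get text labels so the printed grid stays readable.
LABELS = {"\u00a0": "NBSP", "\u00ad": "SHY"}


def windows_1251_grid(start_row: int = 0x8, end_row: int = 0xF) -> list[str]:
    """Return one text line per row, e.g. 'Cx А Б В ...'."""
    lines = []
    for row in range(start_row, end_row + 1):
        cells = []
        for col in range(16):
            byte = bytes([row * 16 + col])
            try:
                ch = byte.decode("cp1251")  # strict: raises on undefined bytes
            except UnicodeDecodeError:
                ch = "--"  # placeholder for an undefined position (0x98)
            cells.append(LABELS.get(ch, ch))
        lines.append(f"{row:X}x " + " ".join(cells))
    return lines


if __name__ == "__main__":
    for line in windows_1251_grid():
        print(line)
```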

Kazakh variants


KZ-1048


An altered version of Windows-1251 was standardised in Kazakhstan as Kazakh standard STRK1048, and is known by the label KZ-1048. It differs in the rows shown below:

KZ-1048 (STRK1048-2002)[16]
0 1 2 3 4 5 6 7 8 9 A B C D E F
8x Ђ Ѓ ‚ ѓ „ … † ‡ € ‰ Љ ‹ Њ Қ Һ Џ
9x ђ ‘ ’ “ ” • – — (undefined) ™ љ › њ қ һ џ
Ax NBSP Ұ ұ Ә ¤ Ө ¦ § Ё © Ғ « ¬ SHY ® Ү
Bx ° ± І і ө µ ¶ · ё № ғ » ә Ң ң ү
  Differences from Windows-1251
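The differing cells can also be computed rather than read off the table. The sketch below compares Python's built-in `cp1251` and `kz1048` codecs byte by byte; it assumes those codecs follow the published tables, skips positions left undefined by either codec, and uses hypothetical helper names.

```python
from typing import Dict, Optional, Tuple


def decode_byte(b: int, codec: str) -> Optional[str]:
    """Decode one byte, returning None if the codec leaves it undefined."""
    try:
        return bytes([b]).decode(codec)
    except UnicodeDecodeError:
        return None


def kz1048_differences() -> Dict[int, Tuple[str, str]]:
    """Map byte -> (Windows-1251 char, KZ-1048 char) where both are defined and differ."""
    diffs = {}
    for b in range(0x80, 0x100):  # 0x00-0x7F is plain ASCII in both encodings
        w, k = decode_byte(b, "cp1251"), decode_byte(b, "kz1048")
        if w is not None and k is not None and w != k:
            diffs[b] = (w, k)
    return diffs


if __name__ == "__main__":
    for b, (w, k) in sorted(kz1048_differences().items()):
        print(f"0x{b:02X}: {w} -> {k}")
```

Each printed line pairs a remapped byte with the Windows-1251 letter it replaces, which corresponds to the highlighted cells in the KZ-1048 rows.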

Code Page 1174


Code Page 1174 is another variant created for the Kazakh language, which matches Windows-1251 for the Russian subset of the Cyrillic letters. It differs from KZ-1048 by moving the Cyrillic letter Shha from 8E/9E to 8A/9A.

Code page 1174[17]
0 1 2 3 4 5 6 7 8 9 A B C D E F
8x Ђ Ѓ ‚ ѓ „ … † ‡ € ‰ Һ ‹ Њ Қ Ћ Џ
9x ђ ‘ ’ “ ” • – — (undefined) ™ һ › њ қ ћ џ
Ax NBSP Ұ ұ Ә ¤ Ө ¦ § Ё © Ғ « ¬ SHY ® Ү
Bx ° ± І і ө µ ¶ · ё № ғ » ә Ң ң ү
  Different from Windows-1251
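Python ships no codec for Code Page 1174, so a hypothetical decoder can be built as an overlay: start from KZ-1048 and apply the Shha move described above. The four overrides come from this section's table. Treating every other byte as identical to KZ-1048 (through Python's `kz1048` codec) is an assumption, and `decode_cp1174` is an illustrative name, not an existing API.

```python
from typing import List, Optional

# Bytes where Code Page 1174 differs from KZ-1048, per the table above.
CP1174_OVERRIDES = {
    0x8A: "\u04ba",  # Һ CYRILLIC CAPITAL LETTER SHHA (moved here from 0x8E)
    0x9A: "\u04bb",  # һ CYRILLIC SMALL LETTER SHHA (moved here from 0x9E)
    0x8E: "\u040b",  # Ћ CYRILLIC CAPITAL LETTER TSHE (as in Windows-1251)
    0x9E: "\u045b",  # ћ CYRILLIC SMALL LETTER TSHE (as in Windows-1251)
}


def build_cp1174_table() -> List[Optional[str]]:
    """256-entry decoding table; ASSUMPTION: non-overridden bytes match KZ-1048."""
    table: List[Optional[str]] = []
    for b in range(256):
        if b in CP1174_OVERRIDES:
            table.append(CP1174_OVERRIDES[b])
            continue
        try:
            table.append(bytes([b]).decode("kz1048"))
        except UnicodeDecodeError:
            table.append(None)  # undefined in the base codec
    return table


CP1174_TABLE = build_cp1174_table()


def decode_cp1174(data: bytes, replacement: str = "\ufffd") -> str:
    """Decode bytes with the overlay table, substituting undefined bytes."""
    return "".join(
        CP1174_TABLE[b] if CP1174_TABLE[b] is not None else replacement
        for b in data  # iterating over bytes yields ints
    )
```

Defining the variant as a small overlay keeps it down to the four bytes that actually differ, which makes it easy to audit against a published table.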

Latvian variant


Windows Latvian + Russian is a modification of Windows-1251 to support the Latvian language. It includes the letter Ō/ō, abolished in 1946 but still used in the Latgalian language, while it lacks the letter Ŗ/ŗ.

Windows Latvian + Russian
0 1 2 3 4 5 6 7 8 9 A B C D E F
8x Ū Ģ ō Ž Š Ē Ķ Č ģ
9x ū ž š ē ķ č Ō
Ax NBSP Ā ā Ļ ¤ ļ ¦ § Ё © Ņ « ¬ SHY ® ¯
Bx ° ± Ī ī ´ µ · ё ņ » ¼ ½ ¾ ×
  Differences from Windows-1251

Finnish variant


Windows Cyrillic + Finnish is a modification of Windows-1251 that was used by Paratype to cover the Finnish language. This encoding is supported by FontLab Studio 5.[18] This variant lacks the letters Š and Ž, which are used in Finnish loanwords and can be replaced by the digraphs SH and ZH.

Windows Cyrillic + Finnish
0 1 2 3 4 5 6 7 8 9 A B C D E F
8x Ђ Ѓ ѓ ˆ Љ Њ Ќ Ћ Џ
9x ђ ˜  љ њ ќ ћ џ
Ax NBSP Ў ў Ó ¤ Ґ ¦ § Ё © Ä « ¬ SHY ® Ö
Bx ° ± Å å ґ µ · ё ä » ó É é ö
  Differences from Windows-1251

Amiga variant

Amiga-1251
MIME / IANA: Amiga-1251
Alias(es): Ami1251
Languages: English, Russian
Classification: extended ASCII
Based on: Windows-1251, ISO-8859-1, ISO-8859-15

Russian Amiga OS systems used a version of code page 1251 which matches Windows-1251 for the Russian subset of the Cyrillic letters, but otherwise mostly follows ISO-8859-1. This version is known as Amiga-1251,[19] under which name it is registered with the IANA.[20]

Amiga-1251[19]
0 1 2 3 4 5 6 7 8 9 A B C D E F
8x XXX XXX BPH NBH IND NEL SSA ESA HTS HTJ VTS PLD PLU RI SS2 SS3
9x DCS PU1 PU2 STS CCH MW SPA EPA SOS XXX SCI CSI ST OSC PM APC
Ax NBSP ¡ ¢ £ [a] ¥ ¦ § Ё © [b] « ¬ SHY ® ¯
Bx ° ± ² ³ ´ µ ¶ · ё ¹ º » ¼ ½ ¾ ¿
  Different from Windows-1251 to match ISO-8859-1
  Different from both Windows-1251 and ISO-8859-1
  1. ^ Matching ISO-8859-15; at a different location than in Windows-1251
  2. ^ Present in Windows-1251, but in a different location (absent from ISO-8859-1/15)

from Grokipedia
Windows-1251, also known as code page 1251 (CP1251), is an 8-bit single-byte character encoding developed by Microsoft to support the Cyrillic script and associated languages, including Russian, Bulgarian, and Serbian. It has 256 code points, 255 of which are assigned (0x98 is undefined): the first 128 (0x00–0x7F) are identical to ASCII for basic Latin letters, digits, and symbols, while the upper range (0x80–0xFF) holds Cyrillic letters, punctuation, and additional symbols. This encoding serves as the standard for Cyrillic text representation in legacy Windows applications and documents. Introduced in the early 1990s, Windows-1251 first appeared with Windows 3.1 and subsequent versions such as Windows 3.11 for Russian and other Cyrillic-language locales, enabling proper display and input of Cyrillic text in early multilingual Windows environments. It became the default ANSI code page for Cyrillic regions in Windows 95 and Windows NT 3.5/3.51, and was fully integrated into English-language versions starting with Windows NT 4.0 in 1996. In 1998, the encoding was updated to include the euro sign (€) at code point 0x88, along with other minor additions for broader compatibility. As a legacy encoding, Windows-1251 remains supported in modern Windows for compatibility with older files, software, and web content, though Microsoft recommends transitioning to Unicode (UTF-8 or UTF-16) for new applications to handle global scripts more effectively. It is recognized by labels such as "windows-1251", "cp1251", and "x-cp1251" in various systems and standards. Because each defined byte maps to exactly one character, conversion tools can be simple lookup tables.

History and Development

Origins and Introduction

Windows-1251 was developed by Microsoft in the early 1990s to provide better support for Cyrillic languages, overcoming the constraints of earlier encodings such as DOS code page 866, which was designed for console-based applications and was poorly suited to graphical user interfaces. The new encoding addressed the growing need for internationalization in Windows environments, where DOS-era code pages like 866 offered limited character sets that hindered proper rendering of text in multilingual contexts. The encoding was introduced in 1992 alongside Windows 3.1, marking a key advancement in Microsoft's efforts to localize the operating system for non-Western markets. As part of the broader Windows-125x family of code pages, designed to provide single-byte character support for various regional scripts, Windows-1251 enabled Russian-language versions of Windows 3.1 and subsequent editions such as Windows for Workgroups 3.11. This family, which also includes code page 1250 for Central European and 1252 for Western European languages, reflected Microsoft's strategy of extending the ASCII standard for regional adaptations. Initially targeted at Cyrillic-using languages such as Russian, Ukrainian, Belarusian, Bulgarian, Serbian, and Macedonian, Windows-1251 expanded the available character repertoire beyond the basic ASCII set by using the upper byte range (0x80–0xFF). This allowed the inclusion of the letters and punctuation specific to these languages, facilitating broader adoption in Eastern European computing. A core design principle of Windows-1251 was full compatibility with ASCII in the 0x00–0x7F range, ensuring seamless integration with existing English-language software and data. The remaining 128 positions were allocated primarily to uppercase and lowercase Cyrillic letters, along with punctuation marks and other symbols needed for accurate text representation in the supported languages.
These choices prioritized efficiency in single-byte storage while supporting the nuances of Cyrillic orthography.

Standardization Efforts

Microsoft registered Windows-1251 as code page 1251 in the early 1990s as part of its efforts to support Cyrillic languages in Windows operating systems. The encoding received formal recognition from the Internet Assigned Numbers Authority (IANA) on May 3, 1996, which registered the MIME name "windows-1251" for use in internet protocols and applications; other labels such as "cp1251" and "x-cp1251" are recognized by the WHATWG Encoding Standard and many software libraries. The WHATWG standardized Windows-1251 within its Encoding Standard to ensure compatibility with legacy web content, defining precise algorithms for decoding byte sequences to Unicode scalar values and encoding them back. This inclusion facilitated its support in modern browsers and web technologies as a legacy single-byte encoding with predefined labels. IBM also integrated the encoding into its systems, assigning CCSID 1251 for general Windows Cyrillic use and CCSID 5347 for an extended variant incorporating the euro sign, enabling conversions between EBCDIC and ASCII-based environments. Maintenance of Windows-1251 has involved minor revisions for consistency, particularly in mappings to Unicode, as documented in Microsoft's official CP1251.TXT file dated April 15, 1998, which provides a complete table aligning code points with their Unicode 2.0 equivalents. These updates addressed errata in character assignments to support interoperability with Unicode-based systems. No major changes have occurred since 2020, and the encoding remains incorporated in contemporary specifications such as the HTML encoding sniffing algorithm, which can detect and handle it during document parsing.
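Label matching in the WHATWG Encoding Standard strips leading and trailing ASCII whitespace, compares ASCII case-insensitively, and looks the result up in a label table. This is a minimal sketch restricted to the three windows-1251 labels; the function names are hypothetical, and a real implementation would cover every encoding in the standard. It also shows that Python's own codec registry resolves "windows-1251" to its `cp1251` codec.

```python
import codecs
from typing import Optional

# Labels the WHATWG Encoding Standard lists for windows-1251.
WINDOWS_1251_LABELS = frozenset({"cp1251", "windows-1251", "x-cp1251"})
ASCII_WHITESPACE = "\t\n\f\r "


def ascii_lower(s: str) -> str:
    """Lowercase only A-Z, as ASCII case-insensitive matching requires."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def resolve_label(label: str) -> Optional[str]:
    """Return 'windows-1251' if the label names it, otherwise None (sketch only)."""
    key = ascii_lower(label.strip(ASCII_WHITESPACE))
    return "windows-1251" if key in WINDOWS_1251_LABELS else None


def python_codec_name(label: str) -> Optional[str]:
    """Map a recognized label to the name of Python's matching codec."""
    if resolve_label(label) is None:
        return None
    return codecs.lookup("windows-1251").name  # Python reports 'cp1251'
```

Only ASCII letters are folded, so a label containing look-alike non-ASCII characters is rejected rather than silently matched.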

Encoding Details

Character Set Composition

Windows-1251 is an 8-bit single-byte character encoding with 256 code points, divided into 128 positions compatible with ASCII (0x00 to 0x7F) and 128 extended positions (0x80 to 0xFF) primarily dedicated to Cyrillic and related characters. The extended range provides uppercase and lowercase Cyrillic letters for six Slavic languages (Russian, Ukrainian, Belarusian, Bulgarian, Serbian Cyrillic, and Macedonian) as well as other Cyrillic-using languages. This repertoire covers the core alphabets of these languages, including distinctive letters such as the dotted I (І/і) for Ukrainian and Belarusian, and Kje (Ќ/ќ) and Gje (Ѓ/ѓ) for Macedonian. Yo (Ё/ё) has its own code points (0xA8/0xB8), reflecting its status as an independent letter in Russian and related orthographies rather than a variant of Ye (Е/е). Beyond letters, the extended range includes punctuation marks such as curved quotation marks and en and em dashes, mathematical signs such as the plus-minus sign (±), and currency symbols such as the euro sign (€) and the generic currency sign (¤). The control characters follow the standard C0 set (0x00 to 0x1F, plus 0x7F for DEL), while the C1 range (0x80 to 0x9F) departs from traditional control functions by assigning most positions to printable symbols and letters. Windows-1251 includes no private use area: every code point except 0x98, which is undefined, maps to a defined character.
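The composition described above can be checked by classifying every byte in 0x80–0xFF. A minimal sketch using Python's built-in `cp1251` codec and the standard `unicodedata` module; counting a character as Cyrillic when its Unicode name starts with "CYRILLIC" is a simplifying assumption, and `classify_upper_half` is an illustrative name.

```python
import unicodedata
from typing import List, Tuple


def classify_upper_half(codec: str = "cp1251") -> Tuple[int, int, List[int]]:
    """Count Cyrillic letters, other characters, and undefined bytes in 0x80-0xFF."""
    cyrillic, other, undefined = 0, 0, []
    for b in range(0x80, 0x100):
        try:
            ch = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            undefined.append(b)  # no character assigned to this byte
            continue
        if unicodedata.name(ch, "").startswith("CYRILLIC"):
            cyrillic += 1
        else:
            other += 1
    return cyrillic, other, undefined
```

Under that rule the upper half holds 94 Cyrillic letters (30 in 0x80–0xBF plus the 64 letters of 0xC0–0xFF), 33 other characters, and the single undefined byte 0x98.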

Code Point Assignments

Windows-1251 employs a single-byte, fixed-width encoding scheme, assigning each character to one of 256 possible byte values ranging from 0x00 to 0xFF, with no byte order or endianness considerations due to its fixed single-byte nature. The code points from 0x00 to 0x7F are identical to those in US-ASCII (equivalent to ISO 646 basic set), providing direct compatibility for Latin characters and control codes. In the extended range (0x80 to 0xFF), the encoding primarily maps to Cyrillic characters, along with additional punctuation, symbols, and spacing characters; notably, 0x98 remains undefined. A key extension in this range is 0xA0, which maps to the non-breaking space (Unicode U+00A0), differing from some other 8-bit encodings where 0xA0 might be undefined or a control. The following table details the code point assignments for 0x80 to 0xFF, including the byte value in hexadecimal, the corresponding character, its Unicode code point, and a brief description. These mappings are based on the official bidirectional conversion table provided by Microsoft.
Hex | Character | Unicode | Description
0x80 | Ђ | U+0402 | CYRILLIC CAPITAL LETTER DJE
0x81 | Ѓ | U+0403 | CYRILLIC CAPITAL LETTER GJE
0x82 | ‚ | U+201A | SINGLE LOW-9 QUOTATION MARK
0x83 | ѓ | U+0453 | CYRILLIC SMALL LETTER GJE
0x84 | „ | U+201E | DOUBLE LOW-9 QUOTATION MARK
0x85 | … | U+2026 | HORIZONTAL ELLIPSIS
0x86 | † | U+2020 | DAGGER
0x87 | ‡ | U+2021 | DOUBLE DAGGER
0x88 | € | U+20AC | EURO SIGN
0x89 | ‰ | U+2030 | PER MILLE SIGN
0x8A | Љ | U+0409 | CYRILLIC CAPITAL LETTER LJE
0x8B | ‹ | U+2039 | SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8C | Њ | U+040A | CYRILLIC CAPITAL LETTER NJE
0x8D | Ќ | U+040C | CYRILLIC CAPITAL LETTER KJE
0x8E | Ћ | U+040B | CYRILLIC CAPITAL LETTER TSHE
0x8F | Џ | U+040F | CYRILLIC CAPITAL LETTER DZHE
0x90 | ђ | U+0452 | CYRILLIC SMALL LETTER DJE
0x91 | ‘ | U+2018 | LEFT SINGLE QUOTATION MARK
0x92 | ’ | U+2019 | RIGHT SINGLE QUOTATION MARK
0x93 | “ | U+201C | LEFT DOUBLE QUOTATION MARK
0x94 | ” | U+201D | RIGHT DOUBLE QUOTATION MARK
0x95 | • | U+2022 | BULLET
0x96 | – | U+2013 | EN DASH
0x97 | — | U+2014 | EM DASH
0x98 | (undefined) | (none) | Undefined
0x99 | ™ | U+2122 | TRADE MARK SIGN
0x9A | љ | U+0459 | CYRILLIC SMALL LETTER LJE
0x9B | › | U+203A | SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9C | њ | U+045A | CYRILLIC SMALL LETTER NJE
0x9D | ќ | U+045C | CYRILLIC SMALL LETTER KJE
0x9E | ћ | U+045B | CYRILLIC SMALL LETTER TSHE
0x9F | џ | U+045F | CYRILLIC SMALL LETTER DZHE
0xA0 | (NBSP) | U+00A0 | NO-BREAK SPACE
0xA1 | Ў | U+040E | CYRILLIC CAPITAL LETTER SHORT U
0xA2 | ў | U+045E | CYRILLIC SMALL LETTER SHORT U
0xA3 | Ј | U+0408 | CYRILLIC CAPITAL LETTER JE
0xA4 | ¤ | U+00A4 | CURRENCY SIGN
0xA5 | Ґ | U+0490 | CYRILLIC CAPITAL LETTER GHE WITH UPTURN
0xA6 | ¦ | U+00A6 | BROKEN BAR
0xA7 | § | U+00A7 | SECTION SIGN
0xA8 | Ё | U+0401 | CYRILLIC CAPITAL LETTER IO
0xA9 | © | U+00A9 | COPYRIGHT SIGN
0xAA | Є | U+0404 | CYRILLIC CAPITAL LETTER UKRAINIAN IE
0xAB | « | U+00AB | LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0xAC | ¬ | U+00AC | NOT SIGN
0xAD | (SHY) | U+00AD | SOFT HYPHEN
0xAE | ® | U+00AE | REGISTERED SIGN
0xAF | Ї | U+0407 | CYRILLIC CAPITAL LETTER YI
0xB0 | ° | U+00B0 | DEGREE SIGN
0xB1 | ± | U+00B1 | PLUS-MINUS SIGN
0xB2 | І | U+0406 | CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0xB3 | і | U+0456 | CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0xB4 | ґ | U+0491 | CYRILLIC SMALL LETTER GHE WITH UPTURN
0xB5 | µ | U+00B5 | MICRO SIGN
0xB6 | ¶ | U+00B6 | PILCROW SIGN
0xB7 | · | U+00B7 | MIDDLE DOT
0xB8 | ё | U+0451 | CYRILLIC SMALL LETTER IO
0xB9 | № | U+2116 | NUMERO SIGN
0xBA | є | U+0454 | CYRILLIC SMALL LETTER UKRAINIAN IE
0xBB | » | U+00BB | RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0xBC | ј | U+0458 | CYRILLIC SMALL LETTER JE
0xBD | Ѕ | U+0405 | CYRILLIC CAPITAL LETTER DZE
0xBE | ѕ | U+0455 | CYRILLIC SMALL LETTER DZE
0xBF | ї | U+0457 | CYRILLIC SMALL LETTER YI
0xC0 | А | U+0410 | CYRILLIC CAPITAL LETTER A
0xC1 | Б | U+0411 | CYRILLIC CAPITAL LETTER BE
0xC2 | В | U+0412 | CYRILLIC CAPITAL LETTER VE
0xC3 | Г | U+0413 | CYRILLIC CAPITAL LETTER GHE
0xC4 | Д | U+0414 | CYRILLIC CAPITAL LETTER DE
0xC5 | Е | U+0415 | CYRILLIC CAPITAL LETTER IE
0xC6 | Ж | U+0416 | CYRILLIC CAPITAL LETTER ZHE
0xC7 | З | U+0417 | CYRILLIC CAPITAL LETTER ZE
0xC8 | И | U+0418 | CYRILLIC CAPITAL LETTER I
0xC9 | Й | U+0419 | CYRILLIC CAPITAL LETTER SHORT I
0xCA | К | U+041A | CYRILLIC CAPITAL LETTER KA
0xCB | Л | U+041B | CYRILLIC CAPITAL LETTER EL
0xCC | М | U+041C | CYRILLIC CAPITAL LETTER EM
0xCD | Н | U+041D | CYRILLIC CAPITAL LETTER EN
0xCE | О | U+041E | CYRILLIC CAPITAL LETTER O
0xCF | П | U+041F | CYRILLIC CAPITAL LETTER PE
0xD0 | Р | U+0420 | CYRILLIC CAPITAL LETTER ER
0xD1 | С | U+0421 | CYRILLIC CAPITAL LETTER ES
0xD2 | Т | U+0422 | CYRILLIC CAPITAL LETTER TE
0xD3 | У | U+0423 | CYRILLIC CAPITAL LETTER U
0xD4 | Ф | U+0424 | CYRILLIC CAPITAL LETTER EF
0xD5 | Х | U+0425 | CYRILLIC CAPITAL LETTER HA
0xD6 | Ц | U+0426 | CYRILLIC CAPITAL LETTER TSE
0xD7 | Ч | U+0427 | CYRILLIC CAPITAL LETTER CHE
0xD8 | Ш | U+0428 | CYRILLIC CAPITAL LETTER SHA
0xD9 | Щ | U+0429 | CYRILLIC CAPITAL LETTER SHCHA
0xDA | Ъ | U+042A | CYRILLIC CAPITAL LETTER HARD SIGN
0xDB | Ы | U+042B | CYRILLIC CAPITAL LETTER YERU
0xDC | Ь | U+042C | CYRILLIC CAPITAL LETTER SOFT SIGN
0xDD | Э | U+042D | CYRILLIC CAPITAL LETTER E
0xDE | Ю | U+042E | CYRILLIC CAPITAL LETTER YU
0xDF | Я | U+042F | CYRILLIC CAPITAL LETTER YA
0xE0 | а | U+0430 | CYRILLIC SMALL LETTER A
0xE1 | б | U+0431 | CYRILLIC SMALL LETTER BE
0xE2 | в | U+0432 | CYRILLIC SMALL LETTER VE
0xE3 | г | U+0433 | CYRILLIC SMALL LETTER GHE
0xE4 | д | U+0434 | CYRILLIC SMALL LETTER DE
0xE5 | е | U+0435 | CYRILLIC SMALL LETTER IE
0xE6 | ж | U+0436 | CYRILLIC SMALL LETTER ZHE
0xE7 | з | U+0437 | CYRILLIC SMALL LETTER ZE
0xE8 | и | U+0438 | CYRILLIC SMALL LETTER I
0xE9 | й | U+0439 | CYRILLIC SMALL LETTER SHORT I
0xEA | к | U+043A | CYRILLIC SMALL LETTER KA
0xEB | л | U+043B | CYRILLIC SMALL LETTER EL
0xEC | м | U+043C | CYRILLIC SMALL LETTER EM
0xED | н | U+043D | CYRILLIC SMALL LETTER EN
0xEE | о | U+043E | CYRILLIC SMALL LETTER O
0xEF | п | U+043F | CYRILLIC SMALL LETTER PE
0xF0 | р | U+0440 | CYRILLIC SMALL LETTER ER
0xF1 | с | U+0441 | CYRILLIC SMALL LETTER ES
0xF2 | т | U+0442 | CYRILLIC SMALL LETTER TE
0xF3 | у | U+0443 | CYRILLIC SMALL LETTER U
0xF4 | ф | U+0444 | CYRILLIC SMALL LETTER EF
0xF5 | х | U+0445 | CYRILLIC SMALL LETTER HA
0xF6 | ц | U+0446 | CYRILLIC SMALL LETTER TSE
0xF7 | ч | U+0447 | CYRILLIC SMALL LETTER CHE
0xF8 | ш | U+0448 | CYRILLIC SMALL LETTER SHA
0xF9 | щ | U+0449 | CYRILLIC SMALL LETTER SHCHA
0xFA | ъ | U+044A | CYRILLIC SMALL LETTER HARD SIGN
0xFB | ы | U+044B | CYRILLIC SMALL LETTER YERU
0xFC | ь | U+044C | CYRILLIC SMALL LETTER SOFT SIGN
0xFD | э | U+044D | CYRILLIC SMALL LETTER E
0xFE | ю | U+044E | CYRILLIC SMALL LETTER YU
0xFF | я | U+044F | CYRILLIC SMALL LETTER YA
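A mapping table like this can be verified mechanically: every defined byte should decode to a single character that re-encodes to the same byte, and no two bytes should share a character. This sketch assumes Python's built-in `cp1251` codec, which follows Microsoft's published mapping; `check_round_trip` and `format_row` are illustrative helpers, the latter producing the row layout used in the table.

```python
import unicodedata
from typing import Dict


def check_round_trip(codec: str = "cp1251") -> Dict[str, int]:
    """Return char -> byte for all defined bytes, raising if the mapping is not 1:1."""
    mapping: Dict[str, int] = {}
    for b in range(256):
        raw = bytes([b])
        try:
            ch = raw.decode(codec)
        except UnicodeDecodeError:
            continue  # undefined byte (0x98 in cp1251)
        if ch.encode(codec) != raw:
            raise ValueError(f"0x{b:02X} does not round-trip")
        if ch in mapping:
            raise ValueError(f"0x{b:02X} duplicates 0x{mapping[ch]:02X}")
        mapping[ch] = b
    return mapping


def format_row(b: int, codec: str = "cp1251") -> str:
    """Render one table row as 'Hex | Character | Unicode | Description'."""
    try:
        ch = bytes([b]).decode(codec)
    except UnicodeDecodeError:
        return f"0x{b:02X} | (undefined) | (none) | Undefined"
    # Control characters have no Unicode name, so fall back to a generic label.
    return f"0x{b:02X} | {ch} | U+{ord(ch):04X} | {unicodedata.name(ch, '<control>')}"
```

For instance, `format_row(0x88)` yields `0x88 | € | U+20AC | EURO SIGN`, and `check_round_trip()` returns 255 entries, one for every byte except 0x98.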
In Windows, users can type Windows-1251 characters by holding the Alt key and entering a leading zero followed by the decimal value of the byte on the numeric keypad; for example, Alt+0192 produces the Cyrillic capital A (А, byte 0xC0, Unicode U+0410). This method depends on the system's active ANSI code page being Windows-1251. Some applications also accept the decimal Unicode code point instead, such as Alt+1040 for the same letter.
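The Alt+0nnn sequence for a character is its byte value in the ANSI code page, written in decimal with a leading zero. A minimal sketch, assuming Python's `cp1251` codec stands in for the active ANSI code page; `alt_code` and `unicode_alt_code` are hypothetical helper names, and whether a given application honors the Unicode form varies.

```python
def alt_code(ch: str, codec: str = "cp1251") -> str:
    """Alt+0nnn sequence: the character's single byte in the ANSI code page, in decimal."""
    (byte,) = ch.encode(codec)  # UnicodeEncodeError if ch is not in the code page
    return f"Alt+{byte:04d}"  # zero-padded to four digits, e.g. Alt+0192


def unicode_alt_code(ch: str) -> str:
    """Alt+nnnn form that some applications accept: the decimal Unicode code point."""
    return f"Alt+{ord(ch)}"
```

For example, `alt_code("ё")` gives `Alt+0184` because ё is byte 0xB8 (184), while `unicode_alt_code("А")` gives `Alt+1040` because U+0410 is 1040 in decimal.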

Variants

Kazakh Variants

The Kazakh variants of Windows-1251 adapt the standard encoding to support the additional letters of the Kazakh Cyrillic alphabet, such as schwa (Ә/ә), en with descender (Ң/ң), straight U (Ү/ү), straight U with stroke (Ұ/ұ), barred O (Ө/ө), and others, by reassigning positions in the 0x80–0xBF range that Windows-1251 uses for letters of other languages, such as Ukrainian, Belarusian, Serbian, and Macedonian. These modifications support Kazakh orthography while preserving the Russian Cyrillic subset of the base Windows-1251 encoding. The primary variant is KZ-1048, also designated STRK1048-2002, a national standard adopted in 2002. It reassigns 16 code points from Windows-1251 to the 16 Kazakh-specific letters (eight uppercase and lowercase pairs), replacing letters not needed for Kazakh. Key examples include 0xA3 for Ә (U+04D8, CYRILLIC CAPITAL LETTER SCHWA), 0xBD for Ң (U+04A2, CYRILLIC CAPITAL LETTER EN WITH DESCENDER), and 0xAF for Ү (U+04AE, CYRILLIC CAPITAL LETTER STRAIGHT U). The following table summarizes selected remapped code points for Kazakh letters in KZ-1048:
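Hex (capital / small) | Character (capital / small) | Unicode (capital / small) | Description
0xA1 / 0xA2 | Ұ / ұ | U+04B0 / U+04B1 | CYRILLIC CAPITAL/SMALL LETTER STRAIGHT U WITH STROKE
0xA3 / 0xBC | Ә / ә | U+04D8 / U+04D9 | CYRILLIC CAPITAL/SMALL LETTER SCHWA
0xA5 / 0xB4 | Ө / ө | U+04E8 / U+04E9 | CYRILLIC CAPITAL/SMALL LETTER BARRED O
0xAF / 0xBF | Ү / ү | U+04AE / U+04AF | CYRILLIC CAPITAL/SMALL LETTER STRAIGHT U
0xBD / 0xBE | Ң / ң | U+04A2 / U+04A3 | CYRILLIC CAPITAL/SMALL LETTER EN WITH DESCENDER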
All of these letters lie in Unicode's main Cyrillic block (U+0400–U+04FF), which makes conversion to modern standards straightforward. Another variant, IBM Code Page 1174 (CCSID 1174), is an ASCII-based encoding for Kazakh Cyrillic that matches Windows-1251 for the core Russian letters and includes the same Kazakh extensions as KZ-1048, but differs in a few positions; for example, it places Shha (Һ/һ) at 0x8A/0x9A instead of 0x8E/0x9E. KZ-1048 was adopted as the Kazakh standard for digital text in 2002 and remained in use for official documents until a 2017 presidential decree initiated a transition to a Latin-based alphabet, with phased implementation planned through 2031 as of 2021. Despite this shift, KZ-1048 and Code Page 1174 persist in legacy archives, educational materials, and applications in Kazakhstan.
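A short sketch shows why the remapping matters in practice: Kazakh-specific letters cannot be encoded in standard Windows-1251 at all, but each occupies a single byte in KZ-1048. It assumes Python's built-in `kz1048` codec; `try_encode` is a hypothetical helper.

```python
from typing import Dict, Iterable, Optional


def try_encode(text: str, codecs_to_try: Iterable[str] = ("cp1251", "kz1048")) -> Dict[str, Optional[bytes]]:
    """Encode text with each codec, recording None when a codec cannot represent it."""
    results: Dict[str, Optional[bytes]] = {}
    for codec in codecs_to_try:
        try:
            results[codec] = text.encode(codec)
        except UnicodeEncodeError:
            results[codec] = None  # at least one character has no byte in this codec
    return results


if __name__ == "__main__":
    print(try_encode("Қазақ"))
```

Here "Қазақ" fails in `cp1251` because of Қ/қ, while `kz1048` encodes it as the five bytes 8D E0 E7 E0 9D.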

Amiga Adaptation

Amiga-1251, a variant adapted for Amiga operating systems during the 1990s, combines the Russian Cyrillic block from Windows-1251 (code points 0xC0–0xFF) with the C1 control codes and Latin-1 symbols of ISO/IEC 8859-1 (code points 0x80–0xBF) to support both Cyrillic and Western European text on the platform. It also incorporates elements from ISO/IEC 8859-15, such as the euro sign, along with the numero sign (№) and the letters Ё and ё. Developed by the Amiga user community to address the platform's need for multilingual text handling in software and fonts, Amiga-1251 is not an official Microsoft product but a grassroots effort tailored to AmigaOS environments. Because the 0xA0–0xBF range keeps ISO-8859-1 symbols such as ¡, £, and ¿ where Windows-1251 places additional Cyrillic letters, Amiga-1251 lacks the Ukrainian, Belarusian, Serbian, and Macedonian letters of Windows-1251; this layout facilitated its use in Amiga font editors and localized ports of international applications. The partial overlap between Amiga-1251's assignments and those of standard Windows-1251 often results in mojibake (garbled text) when files are interchanged between the two encodings without proper conversion. Following the decline of Amiga hardware and software ecosystems in the early 2000s, Amiga-1251 fell into obsolescence, though it persists in documentation within retro computing repositories such as the Aminet archive.

Usage and Adoption

In Microsoft Windows

Windows-1251 serves as the default ANSI code page (code page 1251) for Cyrillic locales, such as Russian, in Microsoft Windows operating systems from early versions through Windows 10 and Windows 11. The ANSI code page determines how non-Unicode text is mapped, so this setting enables proper representation of Cyrillic characters in system interfaces and applications configured for those locales. Core Windows fonts, including Arial and Times New Roman, provide full glyph support for Windows-1251, ensuring accurate rendering of its character set across various weights and styles. These fonts, bundled with Windows since Windows 95, include the necessary Cyrillic glyphs, so basic use requires no additional installation. Before widespread Unicode adoption, Windows-1251 was extensively used for text storage and display in software running on Cyrillic-configured systems. Plain-text editors saved .txt files in the system's ANSI code page, which was Windows-1251 for Russian locales, and legacy .doc documents similarly relied on it for non-Unicode content. This made Windows-1251 the de facto standard for file formats and data interchange in Eastern European and CIS deployments during the pre-Unicode era. Windows provides built-in APIs for handling Windows-1251, such as the MultiByteToWideChar function, which converts text from code page 1251 to wide characters (UTF-16) for internal processing, and its counterpart WideCharToMultiByte for the reverse direction. These long-standing APIs let developers integrate Cyrillic text conversion into applications without external libraries. As of 2025, Windows-1251 remains supported in Windows for compatibility with legacy files and non-Unicode applications, though Microsoft recommends Unicode for new development because of its broader character coverage and cross-platform consistency.
Some modern Windows applications include encoding auto-detection that can identify and correctly render Windows-1251 encoded files, mitigating issues in mixed-encoding environments.

Web and Other Applications

Windows-1251 remains in use on approximately 0.2% of global websites as of November 2025, primarily as a legacy encoding for Cyrillic content, though its adoption has steadily declined over the past decade. Among websites hosted in the Russian Federation, UTF-8 dominates with 96.4% usage, leaving Windows-1251 and other legacy Cyrillic encodings to serve the remaining share, often in older or regionally specific sites. This encoding's web presence is concentrated in legacy systems where migration to Unicode has not yet occurred, but new web development overwhelmingly favors UTF-8 for broader compatibility. In email and MIME standards, Windows-1251 is officially registered as a charset name by the Internet Assigned Numbers Authority (IANA), enabling its use in email headers and bodies for Cyrillic text transmission. It persists in legacy email systems, particularly for Russian and Bulgarian correspondence, where older clients and servers default to this encoding for compatibility with pre-Unicode infrastructure. Support for Windows-1251 extends to non-Windows platforms, including Linux and Unix systems through the iconv utility, which handles conversions using the alias CP1251 for Cyrillic data processing. In Java environments, Oracle's implementation includes windows-1251 as a supported charset in the java.base module, with aliases like Cp1251 and cp1251 for input/output operations involving legacy Cyrillic files. IBM mainframes recognize it via Coded Character Set Identifier (CCSID) 1251, facilitating data exchange in enterprise environments with historical Cyrillic requirements. Various applications continue to leverage Windows-1251 for Cyrillic handling, such as Adobe Acrobat for importing and exporting text in PDF documents, where it ensures proper rendering of Russian characters in legacy workflows. 
Older web browsers like Internet Explorer 6 natively supported windows-1251 for displaying Cyrillic web pages, a feature inherited from earlier versions to accommodate regional content without Unicode. PDF generation tools also employ it for embedding Cyrillic fonts in documents targeted at legacy systems. As of 2025, Windows-1251 sees minimal new adoption, confined largely to legacy databases and regional intranets in Cyrillic-speaking areas, where it maintains compatibility with unmodernized archives and internal networks.

Compatibility and Comparisons

With Other Cyrillic Encodings

Windows-1251 differs significantly from ISO/IEC 8859-5, the international standard for Cyrillic encoding established in 1988, in both character coverage and code point assignments. ISO/IEC 8859-5 provides 96 characters in the 0xA0–0xFF range, including the 66 letters of the Russian alphabet, whereas Windows-1251 assigns 127 of the 128 positions in 0x80–0xFF (all except 0x98), adding symbols, punctuation, and letters for non-Russian Slavic languages such as Ukrainian (e.g., Ukrainian IE, Є/є at 0xAA/0xBA) and Belarusian (e.g., Byelorussian-Ukrainian I, І/і at 0xB2/0xB3). Specific additions in Windows-1251 include typographic symbols such as the euro sign (0x88, U+20AC) and the en dash (0x96, U+2013), both absent from ISO/IEC 8859-5. Yo (Ё/ё) is at 0xA8/0xB8 in Windows-1251 but at 0xA1/0xF1 in the ISO standard. Mappings for common letters are also incompatible; for instance, Cyrillic capital A (А, U+0410) is at 0xC0 in Windows-1251 but 0xB0 in ISO/IEC 8859-5. KOI8-R, the Unix standard for Russian text defined in RFC 1489, is likewise ASCII-compatible in the 0x00–0x7F range, but uses 0x80–0xBF largely for box-drawing and other symbols. Both encodings cover the 33 letters of the Russian alphabet, but KOI8-R places the lowercase letters at 0xC0–0xDF and the uppercase letters at 0xE0–0xFF (apart from Ё/ё), so that stripping the eighth bit leaves a rough Latin transliteration with the case inverted (e.g., small а at 0xC1, U+0430; capital А at 0xE1, U+0410). This transliteration-based ordering limits KOI8-R essentially to Russian; it lacks the Ukrainian, Belarusian, Serbian, and Macedonian letters found in Windows-1251 (e.g., Belarusian short U, Ў/ў at 0xA1/0xA2). KOI8-R was preferred in Unix environments because of its roots in Soviet computing standards, whereas Windows-1251 integrated better with the Windows ecosystem.
IBM-866, a DOS-era code page (CCSID 866) used for Cyrillic under MS-DOS and OS/2, contrasts with Windows-1251 through its emphasis on terminal graphics over typographic symbols. IBM-866 places the uppercase letters at 0x80–0x9F and splits the lowercase letters between 0xA0–0xAF and 0xE0–0xEF, keeping the box-drawing characters of the original IBM PC code page (437) at 0xB0–0xDF, with Ё/ё and a few other letters and symbols at 0xF0–0xFF. Windows-1251, by contrast, uses a single contiguous Cyrillic block at 0xC0–0xFF. For example, capital A (U+0410) is at 0x80 in IBM-866 versus 0xC0 in Windows-1251. While both support Russian Cyrillic, IBM-866 devotes space to box-drawing characters for text-mode interfaces at the expense of the punctuation and symbols (e.g., the section sign at 0xA7 in Windows-1251) that improve Windows-1251's document handling. Direct interchange between Windows-1251 and these encodings risks mojibake because of mismatched mappings: a Windows-1251 file read as ISO/IEC 8859-5 renders 0xC0 (А) as Р, the same byte read as IBM-866 appears as the box-drawing character └, and KOI8-R text misread as Windows-1251 swaps case and scatters letters. Conversion tools such as GNU recode or iconv are essential for safe migration, performing pairwise transformations with Unicode as an intermediary. Windows-1251's advantages include a superset of ISO/IEC 8859-5's Cyrillic letter coverage with added multilingual extensions, and a broader repertoire than KOI8-R's Russian-only letter set or IBM-866's graphics-oriented layout, making it suitable for East European text processing in Windows environments.
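The position differences and mojibake cases described above can be reproduced with Python's built-in codecs for the four encodings (`cp1251`, `iso8859_5`, `koi8_r`, `cp866`). This is an illustrative sketch; `byte_positions` and `misread` are hypothetical helper names.

```python
from typing import Dict

CODECS = ("cp1251", "iso8859_5", "koi8_r", "cp866")


def byte_positions(ch: str) -> Dict[str, int]:
    """Byte value of a single character in each of the four encodings."""
    return {codec: ch.encode(codec)[0] for codec in CODECS}


def misread(text: str, written_as: str, read_as: str) -> str:
    """Simulate mojibake: encode with one codec, then decode the bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    print(byte_positions("А"))                   # Cyrillic capital A in each encoding
    print(misread("А", "cp1251", "iso8859_5"))   # Windows-1251 byte read as ISO 8859-5
    print(misread("привет", "koi8_r", "cp1251"))  # KOI8-R text read as Windows-1251
```

The last call returns РТЙЧЕФ: KOI8-R's lowercase block sits where Windows-1251 keeps its uppercase letters, which is the case swap described above.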

Migration to Unicode

Every defined Windows-1251 byte maps to exactly one Unicode code point: the Cyrillic letters map into the Cyrillic block (U+0400–U+04FF), and the remaining symbols into blocks such as Latin-1 Supplement, General Punctuation, Currency Symbols, and Letterlike Symbols. This enables straightforward conversion for standard Cyrillic characters as defined in Unicode 1.0 and later versions. Microsoft provides official conversion tables that align the 8-bit code points of Windows-1251 with their Unicode equivalents, covering Russian, Bulgarian, Serbian, and other languages using the Cyrillic script. However, migrations can encounter challenges with variants of Windows-1251, such as those adapted for Kazakh (e.g., Code Page 1174 or 58595), where certain byte positions differ from the standard encoding, so decoding such data with the standard table silently yields the wrong letters. Conversions can also be lossy: undefined bytes (e.g., 0x98 in the standard table) are typically replaced with substitution characters such as the replacement character (U+FFFD) or question marks, potentially losing information in legacy data. Additionally, as a single-byte encoding, Windows-1251 has no byte order mark (BOM), which complicates automatic detection during conversion without explicit metadata. For effective migration, best practice is to use established libraries for batch conversions so that the original text is preserved. The International Components for Unicode (ICU) library supports direct conversion from Windows-1251 (CP1251) to UTF-8 or UTF-16, with fallback mappings for undefined characters. Similarly, Python's built-in codecs module allows reliable decoding via codecs.decode(data, 'cp1251') followed by encoding to UTF-8, which suits scripting large-scale file conversions. New content should default to UTF-8 to align with modern standards and ensure compatibility across platforms without region-specific limitations.
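The Python route mentioned above can be sketched as a small conversion helper. This is a minimal sketch, not a complete migration tool: `cp1251_to_utf8` and `convert_file` are hypothetical names, the whole file is read into memory, and the caller chooses between failing on undefined bytes (`"strict"`) and lossy substitution with U+FFFD (`"replace"`).

```python
import codecs
from pathlib import Path


def cp1251_to_utf8(data: bytes, errors: str = "strict") -> bytes:
    """Decode Windows-1251 bytes and re-encode them as UTF-8.

    errors="strict" raises UnicodeDecodeError on undefined bytes such as 0x98;
    errors="replace" substitutes U+FFFD instead, which loses the original byte.
    """
    text = codecs.decode(data, "cp1251", errors)
    return text.encode("utf-8")


def convert_file(src: Path, dst: Path, errors: str = "strict") -> None:
    """Convert one file; the whole file is read into memory at once."""
    dst.write_bytes(cp1251_to_utf8(src.read_bytes(), errors))
```

Defaulting to `"strict"` makes unexpected bytes stop the batch, which is usually preferable to silently writing replacement characters into archival data.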
As of November 2025, Windows-1251 persists as a legacy encoding on approximately 0.2% of websites globally, and modern browsers such as Chrome use encoding-detection heuristics, together with the decoders defined in the WHATWG Encoding Standard, to render it correctly without user intervention. Microsoft discourages reliance on code pages in favor of Unicode for new applications, where UTF-8 or UTF-16 serves as the default for text handling, eliminating Windows-1251 as an implicit fallback in UWP apps and modern .NET. Migration to Unicode offers key benefits, including support for 154,998 characters across 168 scripts (as of Unicode 16.0), far beyond Windows-1251's 256 code points, and seamless handling of mixed-language content. The transition also prevents mojibake, the garbled text caused by encoding mismatches, in global contexts such as international email or web localization, where legacy 8-bit encodings often fail. A notable example is the Russian web ecosystem, where Windows-1251 was once used by over 50% of sites because it was the default for Russian Windows installations, but its share had fallen below 5% by 2025 amid widespread UTF-8 standardization. This shift, driven by browser support and content management systems defaulting to UTF-8, has improved accessibility while requiring one-time conversions of archival content.
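Because single-byte encodings carry no BOM, detection relies on heuristics. The sketch below is a deliberately simple illustration, not the algorithm browsers use: it assumes mostly lowercase running Russian text and picks whichever candidate decoding yields the highest share of lowercase Cyrillic letters. That works for Windows-1251 versus KOI8-R because each encoding's lowercase block is the other's uppercase block. The function names are hypothetical.

```python
from typing import Sequence


def lowercase_cyrillic_ratio(text: str) -> float:
    """Share of Cyrillic letters (U+0400-U+04FF) in text that are lowercase."""
    letters = [c for c in text if "\u0400" <= c <= "\u04ff" and c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.islower() for c in letters) / len(letters)


def guess_cyrillic_codec(data: bytes, candidates: Sequence[str] = ("cp1251", "koi8_r")) -> str:
    """Pick the candidate whose decoding looks most like lowercase running text."""
    best, best_score = candidates[0], -1.0
    for codec in candidates:
        score = lowercase_cyrillic_ratio(data.decode(codec, errors="replace"))
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = codec, score
    return best
```

Short, all-caps, or mixed-language samples can defeat this rule, which is why production detectors combine several statistical signals and prefer explicit charset metadata when it is available.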

References

  1. https://en.wikibooks.org/wiki/Character_Encodings/Code_Tables/Windows/Code_page_58595