Windows-1250

from Wikipedia

Windows-1250
MIME / IANA	windows-1250
Alias(es)	cp1250 (Code page 1250)
Languages	Czech, Polish, Slovak, Hungarian, Slovene, Serbo-Croatian (Latin script), Montenegrin, Romanian (before 1993 spelling reform), Turkmen, Rotokas, Albanian, English, German, Irish, Luxembourgish, Dutch
Created by	Microsoft
Standard	WHATWG Encoding Standard
Classification	extended ASCII, Windows-125x
Other related encoding	ISO-8859-2

Windows-1250 is a code page used under Microsoft Windows to represent texts in Central European and Eastern European languages that use the Latin script. Its main use is for Czech.^[1] It is also used for Polish and Slovene (both of which can also be written in Windows-1257), Slovak, Hungarian, Serbo-Croatian (Latin script), Romanian (before a 1993 spelling reform) and Albanian (which can also be written in Windows-1252). It may also be used for German, though it lacks the uppercase ẞ.^[a] German-language texts encoded in Windows-1250 and Windows-1252 are byte-for-byte identical.
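The claim about German text can be checked with a short sketch. It assumes Python's built-in `cp1250` and `cp1252` codecs are faithful to the two code pages; the sample sentence is made up for illustration.

```python
# Sample German text; the capital sharp s (U+1E9E) is deliberately left out
# of this string because neither code page contains it.
text = "Größe, Äpfel, Öl und Übung kosten 5 €."

as_1250 = text.encode("cp1250")
as_1252 = text.encode("cp1252")

# True when both code pages produce exactly the same byte sequence.
same_bytes = as_1250 == as_1252
print(same_bytes)

# The capital sharp s cannot be encoded in Windows-1250.
try:
    "STRAẞE".encode("cp1250")
    capital_eszett_ok = True
except UnicodeEncodeError:
    capital_eszett_ok = False
print(capital_eszett_ok)
```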

UTF-8 has displaced Windows-1250 to a far greater extent than it has displaced Windows-1252. As of March 2025, less than 0.05% of all web pages use Windows-1250.^[2]^[3]^[4]

Windows-1250 is similar to ISO-8859-2 and has all the printable characters it has and more. However, a few of them are rearranged (unlike Windows-1252, which keeps all printable characters from ISO-8859-1 in the same place). Most of the rearrangements seem to have been done to keep characters shared with Windows-1252 in the same place but three of the characters moved (Ą, Ľ, ź) cannot be explained this way, since those do not occur in Windows-1252 and could have been put in the same positions as in ISO-8859-2 if ˇ had been put e.g. at 9F.
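A minimal sketch of how the rearranged characters could be listed, assuming Python's `iso8859_2` and `cp1250` codecs match the two tables. The names `moved` and `missing` are illustrative only.

```python
moved = {}    # character -> (ISO-8859-2 byte, Windows-1250 byte)
missing = []  # ISO-8859-2 printable characters absent from Windows-1250

# 0xA0..0xFF is the printable upper half of ISO-8859-2.
for byte in range(0xA0, 0x100):
    char = bytes([byte]).decode("iso8859_2")
    try:
        win_byte = char.encode("cp1250")[0]
    except UnicodeEncodeError:
        missing.append(char)
        continue
    if win_byte != byte:
        moved[char] = (byte, win_byte)

for char, (iso, win) in moved.items():
    print(f"{char!r}: ISO-8859-2 0x{iso:02X} -> Windows-1250 0x{win:02X}")
print("missing from Windows-1250:", missing)
```

Characters that keep their position, such as Á at 0xC1, do not appear in `moved`.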

IBM uses code page 1250 (CCSID 1250 and euro sign extended CCSID 5346) for Windows-1250.^[5]^[6]^[7]^[8]^[9]^[10]^[11]

Character set

The following table shows Windows-1250. Each character is shown with its Unicode equivalent.

Windows-1250^[12]
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0x	NUL	SOH	STX	ETX	EOT	ENQ	ACK	BEL	BS	HT	LF	VT	FF	CR	SO	SI
1x	DLE	DC1	DC2	DC3	DC4	NAK	SYN	ETB	CAN	EM	SUB	ESC	FS	GS	RS	US
2x	SP	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3x	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4x	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5x	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
6x	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7x	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~	DEL
8x	€		‚		„	…	†	‡		‰	Š	‹	Ś	Ť	Ž	Ź
9x		‘	’	“	”	•	–	—		™	š	›	ś	ť	ž	ź
Ax	NBSP	ˇ	˘	Ł	¤	Ą	¦	§	¨	©	Ş	«	¬	SHY	®	Ż
Bx	°	±	˛	ł	´	µ	¶	·	¸	ą	ş	»	Ľ	˝	ľ	ż
Cx	Ŕ	Á	Â	Ă	Ä	Ĺ	Ć	Ç	Č	É	Ę	Ë	Ě	Í	Î	Ď
Dx	Đ	Ń	Ň	Ó	Ô	Ő	Ö	×	Ř	Ů	Ú	Ű	Ü	Ý	Ţ	ß
Ex	ŕ	á	â	ă	ä	ĺ	ć	ç	č	é	ę	ë	ě	í	î	ď
Fx	đ	ń	ň	ó	ô	ő	ö	÷	ř	ů	ú	ű	ü	ý	ţ	˙

Notes

^ In 2017, the Council for German Orthography officially adopted a capital, ⟨ẞ⟩, before support for German was complete. Fully compatible with ISO/IEC 8859-1 for German texts.

References

^ "Distribution of Content Languages among websites that use Windows-1250". w3techs.com. Retrieved 2022-10-23.
^ "Historical trends in the usage of character encodings for websites, October 2022". w3techs.com.
^ "Frequently Asked Questions". w3techs.com.
^ "Distribution of Character Encodings among websites that use Czech". w3techs.com. Retrieved 2022-10-23.
^ "Code page 1250 information document". Archived from the original on 2016-03-03.
^ "CCSID 1250 information document". Archived from the original on 2016-03-27.
^ "CCSID 5346 information document". Archived from the original on 2014-11-29.
^ Code Page CPGID 01250 (pdf) (PDF), IBM
^ Code Page CPGID 01250 (txt), IBM
^ International Components for Unicode (ICU), ibm-1250_P100-1995.ucm, 2002-12-03
^ International Components for Unicode (ICU), ibm-5346_P100-1998.ucm, 2002-12-03
^ Steele, Shawn (1998), CP1250 to Unicode table, Unicode Consortium, CP1250.TXT

from Grokipedia

Windows-1250 is a single-byte character encoding (SBCS) developed by Microsoft for representing text in Central and Eastern European languages that use the Latin script, such as Polish, Czech, Slovak, Hungarian, Slovene, Croatian, Serbian (in Latin script), and Romanian.^[1]^[2] It serves as code page 1250 in Windows environments, mapping 256 code points (0–255) to characters while maintaining compatibility with ASCII in the lower range (0–127).^[3]^[4] Also referred to as ANSI Central European or Central European (Windows), this encoding was introduced to support multilingual text in early Windows versions, including Windows 3.1, Windows 95, and Windows NT.^[3]^[2] It was formally registered with the Internet Assigned Numbers Authority (IANA) on May 3, 1996, under the MIME name "windows-1250."^[2] The encoding defines 251 of the 256 code points, of which 218 are printable characters (the rest are the 33 ASCII control characters); it focuses on the diacritics and symbols essential for the targeted languages and is part of the broader Windows-125x family of code pages designed for regional text handling.^[2]^[4] While Windows-1250 provided an efficient solution for 8-bit text storage and display in legacy systems, it has been largely superseded by Unicode, which lifts the 256-character limit and handles many languages in a single encoding standard.^[4] Detailed mappings and specifications are documented in Microsoft's globalization resources and IANA registries for interoperability in software development and data processing.^[4]^[2]

Introduction

Overview

Windows-1250 is an 8-bit, single-byte character encoding standard, designated as code page 1250 by Microsoft, that maps 256 distinct code points to characters, with the initial 128 code points identical to those in the ASCII standard for basic Latin letters, digits, and symbols.^[3] This design ensures backward compatibility with ASCII while extending support to additional glyphs required for regional scripts.^[4] Developed as part of Microsoft's efforts to internationalize the Windows operating system, Windows-1250 was introduced in the 1990s to accommodate Central and Eastern European languages that employ the Latin alphabet augmented with diacritical marks and special characters.^[2] It primarily serves languages such as Polish, Czech, Slovak, Hungarian, Slovene, Croatian, Serbian (in Latin script), and Romanian, enabling proper representation of their native texts in software and documents.^[2] As a member of the Windows-125x family of code pages, which is intended for non-East Asian scripts, Windows-1250 parallels encodings like Windows-1252, which targets Western European languages, but focuses on the orthographic needs of its designated region.^[3]

Purpose and Scope

Windows-1250 was designed to provide an efficient single-byte encoding scheme for Central European languages within Windows applications, addressing the limitations of ASCII and early ISO standards that lacked sufficient support for accented characters in these languages.^[5]^[1] The scope of Windows-1250 is limited to a total of 256 characters, prioritizing Latin-based scripts with common diacritics required for languages such as Polish, Czech, Slovak, and Hungarian; it does not support non-Latin scripts like Cyrillic or Greek, which are instead covered by other code pages such as Windows-1251 (Cyrillic) and Windows-1253 (Greek).^[3]^[1] This encoding is optimized for text processing in Windows environments, with a strong emphasis on compatibility for file input/output operations and display in regional settings so that legacy applications keep working.^[5] Compared to ASCII, Windows-1250 maintains full compatibility in the 0x00–0x7F range while extending the 0x80–0xFF byte values to include accented letters and symbols essential for the target Central European languages.^[5]
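The scope limits described above can be illustrated with a small sketch. It assumes Python's built-in `cp1250`, `cp1251`, and `cp1253` codecs; `can_encode` is a hypothetical helper, not part of any library.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

print(can_encode("Zażółć gęślą jaźń", "cp1250"))  # Polish, Latin script
print(can_encode("Привет", "cp1250"))              # Cyrillic is out of scope
print(can_encode("Привет", "cp1251"))              # Cyrillic code page
print(can_encode("Γειά", "cp1250"))                # Greek is out of scope
print(can_encode("Γειά", "cp1253"))                # Greek code page
```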

History and Development

Origins in Microsoft Code Pages

Windows-1250 was developed by Microsoft in the early 1990s as part of the internationalization efforts for Windows 3.1, which was released in April 1992, to support Central and Eastern European languages using the Latin script, such as Czech, Hungarian, Polish, and Slovak.^[2] This code page built upon earlier OEM code pages like CP852, which had been used in DOS and OS/2 environments for similar linguistic regions, by extending the character repertoire to better accommodate Windows-specific needs while maintaining compatibility with legacy systems.^[6] Introduced in non-English versions of Windows 3.1 and Windows for Workgroups 3.11, it represented a shift toward region-specific encodings in Microsoft's ecosystem, diverging from the single ANSI code page used in the English edition.^[7] The code page drew influences from extensions to the ISO 646 standard and early drafts of ISO/IEC 8859-2 (Latin-2), but Microsoft customized it for optimal integration with Windows font rendering technologies, including the newly introduced TrueType fonts in Windows 3.1, which enabled scalable typography for accented characters common in Central European languages.^[7] Unlike strict adherence to international standards, these adaptations prioritized practical usability in Windows applications, such as improved support for diacritics and punctuation in graphical user interfaces. 
First prominently documented in 1993 alongside Windows NT 3.1, it was named "Windows Central European" to highlight its regional focus and distinguish it from ANSI-compliant alternatives.^[2] As part of the broader Windows-125x series of single-byte code pages—where the "1250" designation specifically denotes Central European coverage—Windows-1250 contrasted with Windows-1252, which targeted Western European languages.^[7] This series emerged from Microsoft's need to provide localized text handling without relying solely on emerging Unicode standards, ensuring backward compatibility while expanding global reach in the pre-Windows 95 era.^[2]

Standardization and Evolution

Windows-1250 was formally documented and registered with the Internet Assigned Numbers Authority (IANA) in 1996 as the MIME charset name "windows-1250," enabling its use in internet protocols and email standards.^[2] Following its initial development in the 1990s, Windows-1250 underwent minor updates, particularly in 1998, when Microsoft released a euro symbol (€) addition for code pages including 1250 to support the upcoming introduction of the euro currency; this update was integrated into Windows NT 4.0 service packs and later extended to Windows 95 and 98 for improved compatibility with Central European fonts and symbols.^[7] The encoding has seen no major revisions since its inception, remaining largely stable under Microsoft's proprietary control, which allowed for tweaks diverging from international standards like ISO/IEC 8859-2, such as the addition of characters for better Windows font rendering. With the release of Windows XP in 2001, Microsoft began emphasizing Unicode (UTF-16 internally) for new applications, marking the start of deprecation trends for legacy code pages like Windows-1250, though full support persisted for backward compatibility in non-Unicode programs.^[5]^[3] By the 2010s, Microsoft explicitly recommended UTF-8 over code pages for consistent internationalization in Windows applications, as outlined in developer guidelines promoting Unicode to avoid locale-specific inconsistencies.^[3] Despite this shift, Windows-1250 continues to serve as the default ANSI code page for Central European locales (e.g., Polish, Czech) in Windows as of 2025, ensuring legacy software compatibility while encouraging migration to UTF-8 via system settings.^[8]^[3]

Technical Specifications

Encoding Structure

Windows-1250 is an 8-bit single-byte character encoding scheme that defines a total of 256 code points, corresponding to byte values from 0x00 to 0xFF in hexadecimal notation.^[5]^[2] This fixed structure allows for straightforward mapping of characters to bytes, making it suitable for legacy systems where processing efficiency was paramount. The encoding is designed as a superset of the 7-bit US-ASCII standard, ensuring compatibility with basic Latin text and control characters.^[5]

The code point allocation divides the 256 slots into two primary ranges: bytes 0x00 through 0x7F, which align directly with ASCII for control characters (such as 0x00 for null and 0x1F for unit separator) and printable basic Latin characters (from 0x20 space to 0x7E tilde), and bytes 0x80 through 0xFF, reserved for language-specific extensions primarily targeting Central and Eastern European scripts using the Latin alphabet.^[5]^[9] This split maintains interoperability with ASCII-based systems while providing room for additional glyphs, such as accented letters and diacritics, without altering the foundational 128-character set.^[5]

As a single-byte encoding, Windows-1250 represents each character with exactly one 8-bit byte, unlike the multi-byte sequences of schemes such as UTF-8; this fixed-width design simplifies parsing and rendering in early computing environments, where variable-length encodings could introduce overhead in string manipulation and display routines.^[5]^[2] A notable feature is its deviation from ISO standards in the 0x80–0x9F range, where positions that ISO/IEC 8859-2 leaves to control codes are reassigned to printable characters (for instance, 0x8A maps to the uppercase Š, S with caron) to better accommodate practical needs in Windows applications.^[9]^[2] Unlike UTF-16 or UTF-32, Windows-1250 needs no byte-order mark (BOM), because byte order is irrelevant to a single-byte format; text streams in this encoding can thus be processed directly without preamble indicators.^[5]^[2]
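The fixed-width property can be seen in a brief sketch, assuming Python's `cp1250` codec; the sample phrase is arbitrary.

```python
text = "Příliš žluťoučký kůň"

cp1250_bytes = text.encode("cp1250")
utf8_bytes = text.encode("utf-8")

# One byte per character in Windows-1250; UTF-8 needs two bytes
# for each accented letter in this sample.
print(len(text), len(cp1250_bytes), len(utf8_bytes))

# Fixed width means the byte at index i is the character at index i.
third_char = cp1250_bytes[2:3].decode("cp1250")
print(third_char == text[2])
```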

Byte Values and Assignment

Windows-1250 employs an 8-bit structure where the byte values 0x00 through 0x7F directly correspond to the ASCII standard, ensuring compatibility with basic Latin text. Bytes 0x00 to 0x1F are assigned to control characters identical to those in ASCII, such as 0x00 for NULL and 0x0A for line feed. Bytes 0x20 to 0x7E map to the printable ASCII characters, including 0x20 for space and 0x7A for lowercase 'z'. The byte 0x7F is designated for the delete control character.^[10] In the extended range of 0x80 to 0xFF, Windows-1250 assigns 123 characters (all printable), primarily supporting Central European scripts through Latin letters with modifications and typographic symbols. Notably, 27 positions within 0x80 to 0x9F are redefined from the C1 control codes used in ISO/IEC 8859-1 or ISO/IEC 8859-2 to printable characters, such as 0x82 for the single low-9 quotation mark (U+201A) and 0x8D for the Latin capital letter T with caron (U+0164). Five bytes in this subrange (0x81, 0x83, 0x88, 0x90, and 0x98) remain undefined.^[10]^[11] The assignments in 0xA0 to 0xBF cover the non-breaking space, symbols, spacing diacritical marks, and a few accented letters in both cases, exemplified by 0xA0 for the non-breaking space (U+00A0) and 0xA1 for the caron (U+02C7). The range 0xC0 to 0xFF predominantly covers uppercase and lowercase letters with various accents along with additional symbols, such as 0xC0 for the Latin capital letter R with acute (U+0154) and 0xFF for the dot above (U+02D9). These mappings enable representation of characters unique to languages like Polish, Czech, and Hungarian.^[10]

Byte Range	Description of Assignments	Examples
0x00–0x1F	ASCII control characters	0x00: NULL (U+0000), 0x09: Horizontal tab (U+0009)
0x20–0x7E	Printable ASCII characters	0x21: Exclamation mark (!, U+0021), 0x41: Latin capital letter A (A, U+0041)
0x7F	Delete control	0x7F: Delete (U+007F)
0x80–0x9F	Typographic symbols, letters with carons/acutes, and undefined positions (27 redefined from controls)	0x82: Single low-9 quotation mark (‚, U+201A), 0x91: Left single quotation mark (‘, U+2018), 0x9A: Latin small letter s with caron (š, U+0161)
0xA0–0xBF	Non-breaking space, symbols, spacing diacritics, and accented letters	0xA0: Non-breaking space (NBSP, U+00A0), 0xA2: Breve (˘, U+02D8), 0xA3: Latin capital letter L with stroke (Ł, U+0141), 0xAF: Latin capital letter Z with dot above (Ż, U+017B)
0xC0–0xFF	Accented uppercase and lowercase letters, additional symbols	0xC0: Latin capital letter R with acute (Ŕ, U+0154), 0xE0: Latin small letter r with acute (ŕ, U+0155), 0xFF: Dot above (˙, U+02D9)
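The assignments in the upper half can be enumerated programmatically. This is a sketch that assumes Python's `cp1250` codec treats the undefined positions as decoding errors, which is how its strict mode is expected to behave; other decoders may map those bytes differently.

```python
undefined = []      # byte values with no assigned character
defined_upper = {}  # byte value -> character, for 0x80..0xFF

for byte in range(0x80, 0x100):
    try:
        defined_upper[byte] = bytes([byte]).decode("cp1250")
    except UnicodeDecodeError:
        undefined.append(byte)

print([f"0x{b:02X}" for b in undefined])
print(len(defined_upper))

# Spot-check two entries from the table above.
print(f"0x82 -> U+{ord(defined_upper[0x82]):04X}")
print(f"0x9A -> U+{ord(defined_upper[0x9A]):04X}")
```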

Character Coverage

Basic Latin and Punctuation

The Basic Latin and Punctuation range in Windows-1250 corresponds to byte values 0x00 through 0x7F, encompassing 128 characters that form the core of the encoding.^[4] This range is defined identically to the US-ASCII standard, ensuring backward compatibility with systems and software designed for 7-bit ASCII text processing.^[4] As a single-byte encoding, Windows-1250 assigns these bytes directly to characters without alteration, facilitating the representation of basic English-language content in international contexts.^[5]

The printable characters within this range include the 26 uppercase letters (A–Z, bytes 0x41–0x5A), 26 lowercase letters (a–z, bytes 0x61–0x7A), and the 10 digits (0–9, bytes 0x30–0x39).^[9] Punctuation and symbols include the exclamation mark (!) at 0x21, quotation mark (") at 0x22, number sign (#) at 0x23, dollar sign ($) at 0x24, percent sign (%) at 0x25, ampersand (&) at 0x26, apostrophe (') at 0x27, parentheses ( and ) at 0x28 and 0x29, asterisk (*) at 0x2A, plus (+) at 0x2B, comma (,) at 0x2C, hyphen-minus (-) at 0x2D, period (.) at 0x2E, slash (/) at 0x2F, colon (:) at 0x3A, semicolon (;) at 0x3B, less-than (<) at 0x3C, equals (=) at 0x3D, greater-than (>) at 0x3E, question mark (?) at 0x3F, at sign (@) at 0x40, square brackets [ and ] at 0x5B and 0x5D, backslash (\) at 0x5C, caret (^) at 0x5E, underscore (_) at 0x5F, grave accent (`) at 0x60, curly braces { and } at 0x7B and 0x7D, vertical bar (|) at 0x7C, and tilde (~) at 0x7E.^[9] The space character at 0x20 serves as a fundamental delimiter.^[9]

Control characters occupy bytes 0x00–0x1F and 0x7F, supporting non-printable functions critical for data interchange and display.^[4] Notable examples include the null character (NUL) at 0x00 for string termination, line feed (LF) at 0x0A, which is the Unix line ending, carriage return (CR) at 0x0D, which precedes LF in DOS and Windows line endings, horizontal tab (HT) at 0x09 for columnar alignment, and delete (DEL) at 0x7F, originally used to erase positions in teletype operations.^[9] These controls enable consistent text formatting across files, terminals, and protocols.^[5] By mirroring US-ASCII in this block, Windows-1250 lets English terminology and symbols appear unchanged in documents that primarily use Central European scripts.^[3]
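A quick way to confirm the ASCII identity of the lower half, sketched with Python's `cp1250` and `ascii` codecs:

```python
ascii_bytes = bytes(range(0x80))  # every byte value 0x00..0x7F
decoded = ascii_bytes.decode("cp1250")

# The lower half decodes exactly as ASCII does.
same_as_ascii = decoded == ascii_bytes.decode("ascii")
# Each byte value equals the Unicode code point it decodes to.
identity = all(ord(ch) == value for value, ch in zip(ascii_bytes, decoded))
print(same_as_ascii, identity)

# Print the byte values of a few symbols from the list above.
for ch in "<=>[\\]{|}":
    print(f"{ch} 0x{ch.encode('cp1250')[0]:02X}")
```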

Central European Extensions

Windows-1250 extends the Basic Latin and punctuation characters of the ASCII standard with the 128 code points in the range 0x80–0xFF to support the diverse Latin-based scripts of Central and Eastern Europe. Of these, five code points (0x81, 0x83, 0x88, 0x90, 0x98) are undefined, leaving 123 assigned characters. These extensions focus on accented letters and diacritics essential for accurate representation of regional orthographies, enabling proper rendering of text in multiple languages without resorting to Unicode or other multi-byte encodings in legacy Windows environments.^[10]^[2]

The character set includes key uppercase accented letters such as Á, Č, Ď, É, Ě, Í, Ň, Ó, Ř, Š, Ť, Ú, Ů, Ý, and Ž, along with their lowercase counterparts á, č, ď, é, ě, í, ň, ó, ř, š, ť, ú, ů, ý, and ž. These characters are grouped primarily by case, with uppercase forms concentrated in the 0xC0–0xDF range and lowercase in 0xE0–0xFF, while symbols and further accented letters are interspersed in the 0x80–0xBF blocks for compatibility with existing software layouts.^[10] Coverage is tailored to several languages, including Polish with characters like Ł and ł for the unique "w" sound, Czech and Slovak featuring ř and ů for specific phonetic distinctions, and Hungarian incorporating ő and ű for rounded vowels. Other supported languages include Croatian, Romanian, Serbian (Latin), and Slovenian; in all, the extensions provide several dozen accented letters and prioritize Latin alphabet variations over non-Latin scripts. This selection supports everyday text in these regions, such as literature, signage, and official documents.^[10]^[2]

As in Windows-1252, the 0x80–0x9F range carries typographic elements like the single low-9 quotation mark ‚, the double low-9 quotation mark „, the en dash –, and the em dash —, which facilitate professional typesetting in European publications. Also like Windows-1252, the original design omitted the euro symbol €, which was added later at 0x80 via Windows updates in the late 1990s.^[10]^[12]
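To make the coverage concrete, the following sketch encodes one short word per language. It assumes Python's `cp1250` codec; the word choices are arbitrary examples.

```python
samples = {
    "Polish": "Łódź",
    "Czech": "kůň",
    "Hungarian": "hűtő",
}

encoded = {lang: word.encode("cp1250") for lang, word in samples.items()}

for lang, data in encoded.items():
    # bytes.hex with a separator requires Python 3.8 or later.
    print(f"{lang}: {samples[lang]} -> {data.hex(' ')}")
```

Each accented letter occupies a single byte in the 0x80–0xFF range, while the plain letters keep their ASCII values.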

Compatibility and Mappings

Relation to ISO/IEC 8859-2

Windows-1250 and ISO/IEC 8859-2 are both single-byte character encodings primarily designed to support Central and Eastern European languages that use Latin-based scripts with diacritical marks, such as Polish, Czech, Slovak, Hungarian, and Croatian.^[7] Windows-1250 includes all the printable characters of ISO/IEC 8859-2 plus additional ones, but a few are rearranged, including key diacritics like acute accents, carons, and ogoneks.^[13]^[14] For instance, the character Š (Latin capital letter S with caron, U+0160) appears in both encodings but is mapped to byte 0xA9 in ISO/IEC 8859-2 and 0x8A in Windows-1250. Specific rearrangements include Ą, Ľ, and ź.^[13]^[14] A significant structural difference arises in the byte range 0x80–0x9F, which ISO/IEC 8859-2 reserves for non-printable C1 control codes, such as line tabulation set (0x8A) and single shift two (0x8E), to maintain compatibility with international standards for data interchange.^[14] In contrast, Windows-1250 repurposes 27 of these 32 slots—leaving five undefined, similar to other Windows code pages—for printable characters tailored to graphical user interfaces, including typographic symbols like the en dash (– at 0x96), em dash (— at 0x97), and additional diacritics such as Ž (Latin capital letter Z with caron, U+017D at 0x8E).^[13]^[7] This redefinition enhances support for common punctuation and quotes in Windows applications but introduces incompatibilities with pure ISO-compliant systems. 
Within the same 0x80–0x9F range, Windows-1250 also adds glyphs not present in the standard, such as the euro sign (€ at 0x80) and single angle quotation marks (‹ at 0x8B and › at 0x9B, U+2039 and U+203A).^[13] While Windows-1250 includes all printable characters from ISO/IEC 8859-2, the positional shifts for some characters mean the two encodings are not byte-compatible; direct byte-to-byte conversion can result in mojibake for mismatched characters.^[14] A few characters, such as Ľ (Latin capital letter L with caron, U+013D at 0xA5 in ISO/IEC 8859-2 versus 0xBC in Windows-1250), require explicit remapping during conversion to preserve fidelity.^[13]^[14] The Internet Assigned Numbers Authority (IANA) maintains them as distinct registered character sets, underscoring their non-interchangeability without transformation.^[15] Historically, the divergence traces to ISO/IEC 8859-2's finalization as an international standard in 1987, aimed at uniform encoding across diverse systems.^[16] Windows-1250 emerged in 1992 as part of Microsoft Windows 3.1, positioned as a "Windows Latin-2" variant with optimizations for the operating system's text rendering and localization needs, including the printable C1 extensions to better accommodate UI elements like dialog boxes and menus.^[7]^[2] This evolution reflects Microsoft's proprietary adaptations for enhanced usability in regional Windows deployments, finalized around 1993.^[7]
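The remapping requirement can be sketched as a decode-then-encode round through Unicode, assuming Python's `iso8859_2` and `cp1250` codecs. The name used as sample data is arbitrary.

```python
# "Šárka" as it would be stored in an ISO/IEC 8859-2 file.
iso_bytes = "Šárka".encode("iso8859_2")

# Reading those bytes directly as Windows-1250 yields mojibake,
# because 0xA9 is the copyright sign in Windows-1250.
wrong = iso_bytes.decode("cp1250")

# Correct conversion: decode with the source encoding, then re-encode.
win_bytes = iso_bytes.decode("iso8859_2").encode("cp1250")

print(iso_bytes.hex(" "), "->", win_bytes.hex(" "))
print(wrong)
```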

Unicode Conversion Guidelines

Windows-1250 provides a direct one-to-one mapping for its defined characters to Unicode code points, primarily utilizing the Latin-1 Supplement (U+0080 to U+00FF) for accented Latin letters and the Latin Extended-A block (U+0100 to U+017F) for additional Central European extensions such as carons and rings.^[10] For instance, the byte 0xC1 maps to U+00C1 (LATIN CAPITAL LETTER A WITH ACUTE), while 0x8A maps to U+0160 (LATIN CAPITAL LETTER S WITH CARON).^[17] This structure ensures that the 251 defined characters in Windows-1250, including the ASCII-compatible bytes from 0x00 to 0x7F, correspond uniquely to Unicode scalars without overlap or ambiguity.^[10] Converting from Windows-1250 to Unicode involves the code page's Windows-specific assignments, such as 0xA5 (Ą, U+0104), which sits at a different position than in ISO/IEC 8859-2, but these pose no significant challenges since all 251 defined characters map losslessly to Unicode (the 5 undefined bytes, such as 0x81, have no mapping).^[10] No lossy conversions are required, because every Windows-1250 character has a code point in Unicode's Basic Multilingual Plane, allowing full preservation of the original data during transformation.^[11] Round-trip compatibility is complete: encoding a Unicode string to Windows-1250 and decoding it back recovers the exact original, provided only supported characters are used.^[10] Practical guidelines for conversion emphasize using established APIs and detection methods.
In Windows environments, the MultiByteToWideChar function with the code page identifier 1250 converts Windows-1250 bytes to UTF-16 Unicode wide characters, handling the mapping automatically for strings or buffers.^[18] To detect Windows-1250 encoding in files or streams lacking a byte-order mark (BOM), rely on heuristics such as the presence of bytes in the 0x80–0xFF range typical of Central European text, combined with a check that none of the five undefined byte values (0x81, 0x83, 0x88, 0x90, 0x98) occur.^[19] For web applications or modern interoperability, convert to UTF-8 as the target encoding, which supports seamless display and transmission of these characters across platforms. Converted this way, the diacritics of languages like Polish and Czech are preserved exactly.^[10]
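One possible detection heuristic, sketched here under stated assumptions: it tries strict UTF-8 first and falls back to Windows-1250 only if that fails. The function name `decode_legacy` is hypothetical, and the fallback cannot tell Windows-1250 from other single-byte code pages, so it is a starting point rather than a reliable detector.

```python
def decode_legacy(data: bytes, fallback: str = "cp1250") -> str:
    """Decode as UTF-8 if valid, otherwise as the given legacy code page.

    Raises UnicodeDecodeError if the data contains a byte that is
    undefined in the fallback code page (for cp1250: 0x81, 0x83,
    0x88, 0x90, 0x98).
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)

original = "Žluťoučký kůň"
legacy = original.encode("cp1250")

# The legacy bytes are not valid UTF-8, so the fallback path is taken.
print(decode_legacy(legacy) == original)
# Genuine UTF-8 input is returned unchanged by the first attempt.
print(decode_legacy(original.encode("utf-8")) == original)

# Round trip: Windows-1250 -> Unicode -> Windows-1250.
round_trip = decode_legacy(legacy).encode("cp1250")
print(round_trip == legacy)
```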

Usage and Applications

In Windows Operating Systems

Windows-1250 serves as the default ANSI code page for Central European locales, such as Czech, Hungarian, Polish, Slovak, and Slovenian, in Microsoft Windows operating systems from Windows 95 through Windows 11.^[3]^[20] In these locales, it is used by applications like Notepad for saving and opening text files in ANSI format and by Windows Explorer for handling non-Unicode file names, ensuring compatibility with legacy text-based operations.^[5]^[21] The encoding is integrated into Windows through the Code Page APIs, where it is identified by the numeric code page identifier 1250, which developers pass to functions like MultiByteToWideChar and WideCharToMultiByte to convert between Windows-1250 and Unicode strings.^[3] Font support is provided by system fonts such as Arial CE (Central European variant) and Tahoma, which include glyphs for the extended Latin characters defined in Windows-1250.^[22] Users can switch code pages via regional settings in the Control Panel, under the non-Unicode language options, to accommodate different locales without reinstalling the OS.^[5] In early Windows versions, such as Windows 95, Windows-1250 was essential for maintaining compatibility with MS-DOS applications using OEM code page 852, bridging the gap between console and graphical interfaces.^[7] However, starting with Windows NT, Microsoft shifted internal string storage to UTF-16 (initially UCS-2), deprecating code pages like Windows-1250 for core system operations while retaining them for legacy support.^[23] As of 2025, Windows-1250 remains available in the Win32 API for backward-compatible applications, though Microsoft documentation strongly recommends migrating to Unicode to avoid encoding limitations and ensure global compatibility.^[5]^[1]
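To see which ANSI code page a given Windows machine is using, something like the following sketch may help. It calls the Win32 `GetACP` function through `ctypes` and is guarded so that it does nothing harmful on other platforms; `ansi_code_page` is a hypothetical helper name.

```python
import sys

def ansi_code_page():
    """Return the active ANSI code page on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported lazily; windll exists only on Windows
    return ctypes.windll.kernel32.GetACP()

acp = ansi_code_page()
if acp is None:
    print("Not running on Windows; no ANSI code page to query.")
elif acp == 1250:
    print("The ANSI code page is Windows-1250.")
else:
    print(f"The ANSI code page is {acp}.")
```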

In Web and Legacy Software

Windows-1250 is declared as the character encoding in HTML documents using the <meta charset="windows-1250"> tag, allowing browsers to correctly interpret text content for Central and Eastern European languages.^[24] This declaration ensures proper rendering of characters such as accented letters in Polish, Czech, and Hungarian. Modern web browsers, including Google Chrome and Microsoft Edge, provide support for Windows-1250 primarily to handle legacy websites, decoding pages that specify this encoding via HTTP headers or meta tags.^[25] The official IANA MIME name for this encoding is "windows-1250", which is used in HTTP Content-Type headers like text/html; charset=windows-1250 to signal the encoding to clients.^[2] In legacy software from the 1990s and 2000s, Windows-1250 was commonly employed in applications targeting Central and Eastern European users, including older email clients that processed messages with regional diacritics and databases such as pre-Unicode versions of Oracle, which supported it as the EE8MSWIN1250 character set for storing and retrieving text data. Mis-detection of this encoding in such systems often results in mojibake, where bytes are interpreted in the wrong charset; for example, the Windows-1250 byte 0xB3 ("ł") is displayed as "³" when read as Windows-1252.^[26] As of 2025, Windows-1250 is rarely used in new web development due to the dominance of UTF-8, which accounts for over 97% of websites, while Windows-1250 appears on less than 0.1% of sites with known encodings.^[27] However, it persists in some Eastern European government and enterprise systems for backward compatibility with legacy data and applications.
The W3C recommends against specifying legacy encodings like Windows-1250 in HTTP headers for new content, favoring UTF-8 to ensure universal interoperability.^[28] In Unix-like environments, tools such as GNU iconv facilitate conversions involving Windows-1250, for example, using the command iconv -f WINDOWS-1250 -t UTF-8 input.txt > output.txt to migrate legacy files to modern standards.
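Where iconv is not available, the same migration can be approximated in a few lines of Python. This is a sketch assuming the `cp1250` codec; `convert_to_utf8` and the file names are hypothetical. It works on raw bytes so that line endings are preserved as they are.

```python
import tempfile
from pathlib import Path

def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1250") -> None:
    """Re-encode a legacy file as UTF-8 without touching line endings."""
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))

# Demonstration on a temporary file standing in for a legacy document.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"
    dst = Path(tmp) / "output.txt"
    src.write_bytes("Dzień dobry, łódź\r\n".encode("cp1250"))
    convert_to_utf8(src, dst)
    converted = dst.read_bytes().decode("utf-8")

print(converted.strip())

# What happens when the source encoding is guessed wrongly:
# the Windows-1250 byte for "ł" read as Windows-1252.
mojibake = "ł".encode("cp1250").decode("cp1252")
print(mojibake)
```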
