Windows-1256

MIME / IANA	windows-1256
Alias(es)	cp1256 (Code page 1256)
Languages	Arabic, Persian, Urdu, English, French (except capital letters with diacritics)
Created by	Microsoft
Standard	WHATWG Encoding Standard
Classification	extended ASCII, Windows-125x

Windows-1256 is a code page used under Microsoft Windows to write Arabic and other languages that use Arabic script, such as Persian and Urdu.

This code page is compatible with neither ISO/IEC 8859-6 nor the MacArabic encoding.
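The incompatibility with ISO/IEC 8859-6 can be seen by encoding the same letter under both schemes. This minimal sketch assumes Python's standard-library codecs `cp1256` and `iso8859_6`; it does not cover MacArabic.

```python
# The same Arabic letter gets different byte values in Windows-1256 and
# ISO/IEC 8859-6, so bytes written in one cannot safely be read as the other.
LAM = "\u0644"  # ARABIC LETTER LAM

cp1256_bytes = LAM.encode("cp1256")   # b'\xe1' in Windows-1256
iso_bytes = LAM.encode("iso8859_6")   # b'\xe4' in ISO/IEC 8859-6

# Misreading Windows-1256 bytes as ISO/IEC 8859-6 silently yields another
# letter: 0xE1 is ARABIC LETTER FEH (U+0641) in ISO/IEC 8859-6.
misread = cp1256_bytes.decode("iso8859_6")

print(cp1256_bytes, iso_bytes, misread)
```

Because both decodings succeed without error, this kind of mix-up produces wrong text rather than an exception.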

Windows-1256 encodes each abstract letter of the basic Arabic alphabet, not each concrete visual form (isolated, initial, medial, final, or ligature); that is, it encodes characters, not glyphs. The Arabic letters in the range 0xC0-0xFF are in Arabic alphabetical order, but some Latin characters are interspersed among them. These are Latin letters used in French, placed at the same positions they occupy in Windows-1252, since French retains historical importance in former French colonies in North Africa such as Morocco and Algeria. This allows French and Arabic text to be mixed in Windows-1256 without any need to switch code pages, although upper-case letters with diacritics are not included.
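The French/Arabic mixing described above can be checked with a short sketch. It assumes Python's built-in `cp1256` and `cp1252` codecs; the sample strings are illustrative only.

```python
# French and Arabic text stored in a single Windows-1256 byte string.
mixed = "Café à Alger / مقهى في الجزائر"
encoded = mixed.encode("cp1256")
assert encoded.decode("cp1256") == mixed  # lossless round trip, one byte per character

# The lower-case French letters occupy the same byte values as in Windows-1252.
french = "àâçèéêëîïôùûüœ"
same_as_1252 = french.encode("cp1256") == french.encode("cp1252")

# Upper-case letters with diacritics have no byte value, so strict encoding fails.
try:
    "École".encode("cp1256")
    upper_ok = True
except UnicodeEncodeError:
    upper_ok = False

print(same_as_1252, upper_ok)
```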

IBM uses code page 1256 for Windows-1256: CCSID 1256 is the base version, CCSID 5352 extends it with the euro sign, and CCSID 9448 further adds some letters used in modern Persian and Urdu.^[1]^[2]^[3]^[4]

Unicode is preferred over Windows-1256 in modern applications, especially on the Internet, where UTF-8 is the dominant encoding for web pages, including Arabic ones. Unlike Windows-1256 or ISO/IEC 8859-6, which cover only part of the extended Arabic script, Unicode covers it completely (see Arabic script in Unicode). In October 2022, fewer than 0.03% of all web pages used Windows-1256;^[5]^[6] although it is used mostly for Arabic and is the second most popular encoding for Arabic, it accounts for only 1.6% of Arabic text on the web.
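Migrating legacy Windows-1256 content to UTF-8 amounts to decoding and re-encoding. The helper name `cp1256_to_utf8` below is hypothetical, and the sketch assumes Python's built-in `cp1256` codec; it does not handle mislabeled input.

```python
def cp1256_to_utf8(data: bytes) -> bytes:
    """Re-encode Windows-1256 bytes as UTF-8 (hypothetical helper)."""
    return data.decode("cp1256").encode("utf-8")

legacy = "سلام".encode("cp1256")   # one byte per Arabic letter
modern = cp1256_to_utf8(legacy)     # two bytes per Arabic letter in UTF-8

print(len(legacy), len(modern))
```

The size difference reflects the trade-off: Windows-1256 stores each Arabic letter in one byte, while UTF-8 needs two bytes for letters in the Arabic block but can represent every Unicode character.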

Character set

The original specification left nine byte values marked as "NOT USED" (hexadecimal 0x80, 0x8A, 0x8F, 0x98, 0x9A, 0x9F, 0xAA, 0xC0, and 0xFF).^[7] These bytes were later assigned to the euro sign and to additional letters of the Perso-Arabic script used for Persian and Urdu.^[8]
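The later assignments of these nine bytes can be listed with Python's `cp1256` codec and the `unicodedata` module. This sketch assumes Python's codec follows the extended mapping shown in the table below.

```python
import unicodedata

# Byte values that the original specification marked "NOT USED".
FORMERLY_UNUSED = [0x80, 0x8A, 0x8F, 0x98, 0x9A, 0x9F, 0xAA, 0xC0, 0xFF]

# Decode each byte and record its Unicode code point and character name.
assignments = []
for value in FORMERLY_UNUSED:
    ch = bytes([value]).decode("cp1256")
    assignments.append((value, ch, unicodedata.name(ch)))

for value, ch, name in assignments:
    print(f"0x{value:02X}  U+{ord(ch):04X}  {name}")
```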

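The following table shows the extended version of Windows-1256. The row label gives the high hexadecimal digit of the byte value and the column header gives the low digit; control and formatting characters are shown by their abbreviations (for example NBSP, SHY, ZWNJ, LRM).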

Here every Arabic letter is shown in isolated form. The actual forms of the letters inside Arabic words are rendered by a combination of software rules and appropriate font support.
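Because the encoding stores abstract letters, text containing Unicode's Arabic presentation-form characters (glyph-level code points) cannot be encoded directly. One possible approach, sketched here as an assumption rather than a prescribed conversion, is NFKC normalization via Python's `unicodedata.normalize`, which folds such forms back to their base letters. Note that NFKC also changes other compatibility characters.

```python
import unicodedata

# U+FE91 is ARABIC LETTER BEH INITIAL FORM, a glyph-level presentation form.
initial_beh = "\ufe91"

# Windows-1256 has no byte for presentation forms, so strict encoding fails.
try:
    initial_beh.encode("cp1256")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# NFKC folds the presentation form to the abstract letter U+0628 (BEH),
# which Windows-1256 encodes as the single byte 0xC8.
folded = unicodedata.normalize("NFKC", initial_beh)
encoded = folded.encode("cp1256")

print(encodable, hex(ord(folded)), encoded)
```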

Windows-1256^[8]^[9]^[10]^[11]^[12]^[13]^[14]
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0x	NUL	SOH	STX	ETX	EOT	ENQ	ACK	BEL	BS	HT	LF	VT	FF	CR	SO	SI
1x	DLE	DC1	DC2	DC3	DC4	NAK	SYN	ETB	CAN	EM	SUB	ESC	FS	GS	RS	US
2x	SP	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3x	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4x	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5x	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
6x	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7x	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~	DEL
8x	€	پ	‚	ƒ	„	…	†	‡	ˆ	‰	ٹ	‹	Œ	چ	ژ	ڈ
9x	گ	‘	’	“	”	•	–	—	ک	™	ڑ	›	œ	ZWNJ	ZWJ	ں
Ax	NBSP	،	¢	£	¤	¥	¦	§	¨	©	ھ	«	¬	SHY	®	¯
Bx	°	±	²	³	´	µ	¶	·	¸	¹	؛	»	¼	½	¾	؟
Cx	ہ	ء	آ	أ	ؤ	إ	ئ	ا	ب	ة	ت	ث	ج	ح	خ	د
Dx	ذ	ر	ز	س	ش	ص	ض	×	ط	ظ	ع	غ	ـ	ف	ق	ك
Ex	à	ل	â	م	ن	ه	و	ç	è	é	ê	ë	ى	ي	î	ï
Fx	ً	ٌ	ٍ	َ	ô	ُ	ِ	÷	ّ	ù	ْ	û	ü	LRM	RLM	ے
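The rows of the table can be regenerated from Python's `cp1256` codec as a cross-check. The helper `table_row` and the `LABELS` abbreviations are hypothetical and mirror the labels used above; rows 0x and 1x (C0 control codes) are deliberately not handled.

```python
# Short labels used in the table for non-printing characters.
LABELS = {0x20: "SP", 0x7F: "DEL", 0xA0: "NBSP", 0xAD: "SHY",
          0x200C: "ZWNJ", 0x200D: "ZWJ", 0x200E: "LRM", 0x200F: "RLM"}

def table_row(high: int) -> list:
    """Return the 16 cells of row `high` (e.g. 0x8 for the '8x' row)."""
    if not 0x2 <= high <= 0xF:
        raise ValueError("rows 0x and 1x (C0 controls) are not handled")
    cells = []
    for low in range(16):
        ch = bytes([(high << 4) | low]).decode("cp1256")
        cells.append(LABELS.get(ord(ch), ch))
    return cells

# Print one row in the same tab-separated layout as the table.
print("\t".join(["9x", *table_row(0x9)]))
```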

Differences from Windows-1252
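The byte values at which Windows-1256 differs from Windows-1252 can be computed rather than tabulated by hand. This sketch assumes Python's built-in `cp1252` and `cp1256` codecs; Python's `cp1252` leaves five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, which the hypothetical helper `decode_byte` reports as `None`.

```python
from typing import Optional

def decode_byte(value: int, codec: str) -> Optional[str]:
    """Decode one byte value, returning None if the codec leaves it undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None

# Byte values whose character differs between the two code pages.
differences = [v for v in range(256)
               if decode_byte(v, "cp1252") != decode_byte(v, "cp1256")]

for v in differences:
    print(f"0x{v:02X}: {decode_byte(v, 'cp1252')!r} -> {decode_byte(v, 'cp1256')!r}")
```

Shared characters such as the euro sign, à, é, × and ÷ drop out of the list automatically, leaving only the positions reassigned to Arabic-script characters.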

References

^ "Code page 1256 information document". Archived from the original on 2016-03-03.
^ "CCSID 1256 information document". Archived from the original on 2016-03-27.
^ "CCSID 5352 information document". Archived from the original on 2014-11-29.
^ "CCSID 9448 information document". Archived from the original on 2014-11-29.
^ "Historical trends in the usage of character encodings for websites, October 2022". w3techs.com.
^ "Frequently Asked Questions". w3techs.com.
^ Archiveddocs. "Code Page 1256 Windows Arabic". docs.microsoft.com.
^ "cp1256 to Unicode table" (PDF). www.unicode.org. Retrieved 2019-05-31.
^ Unicode mappings of windows 1256 with "best fit"
^ Code Page CPGID 01256 (PDF), IBM
^ Code Page CPGID 01256 (txt), IBM
^ International Components for Unicode (ICU), ibm-1256_P110-1997.ucm, 2002-12-03
^ International Components for Unicode (ICU), ibm-5352_P100-1998.ucm, 2002-12-03
^ International Components for Unicode (ICU), ibm-9448_X100-2005.ucm, 2005-11-15
