BCD (character encoding)

BCD Interchange Codes
Classification	6-bit alphanumeric basic Latin encodings
Succeeded by	EBCDIC

BCD (binary-coded decimal), also called alphanumeric BCD, alphameric BCD, BCD Interchange Code,^[1] or BCDIC,^[1] is a family of representations of numerals, uppercase Latin letters, and some special and control characters as six-bit character codes.

Unlike later encodings such as ASCII, BCD codes were not standardized. Different computer manufacturers, and even different product lines from the same manufacturer, often had their own variants, and sometimes included unique characters. Other six-bit encodings with completely different mappings, such as some FIELDATA^[1] variants or Transcode, are sometimes incorrectly termed BCD.

Many variants of BCD encode the characters '0' through '9' as the corresponding binary values.

History

Technically, binary-coded decimal describes the encoding of decimal numbers where each decimal digit is represented by a fixed number of bits, usually four.

With the introduction of the IBM card in 1928, IBM created a code^[a] capable of representing alphanumeric information,^[2] later adopted by other manufacturers. This code represents each of the digits 0–9 by a single punch, and uses multiple punches for upper-case letters and special characters.^[3] A letter has two punches (zone [12, 11, 0] + digit [1–9]); most special characters have two or three punches (zone [12, 11, 0, or none] + digit [2–7] + 8).

The BCD code is the adaptation of the punched card code to a six-bit binary code by encoding the digit rows (nine rows, plus unpunched) into the low four bits, and the zone rows (three rows, plus unpunched) into the high two bits.^[4] The digit zero (a single punch in row 0) is usually handled specially in some way, and the digit code was extended to values 10 through 15 by combining a digit in the range 2–7 with a punch in row 8. IBM applied the terms binary-coded decimal and BCD to the variations of BCD alphamerics used in most early IBM computers, including the IBM 1620, IBM 1400 series, and non-Decimal Architecture members of the IBM 700/7000 series.
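The adaptation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of any particular machine: it assumes the tape-style zone assignment (zone 0 to 01, zone 11 to 10, zone 12 to 11) and the tape-style code 10 for the digit zero, and the names `ZONE_BITS` and `punches_to_bcd` are hypothetical. Other BCD variants assign the zones and the zero differently.

```python
# Hypothetical sketch: convert the set of punched rows in one card column
# to a six-bit tape-style BCD code. Rows are named 12, 11, 0 and 1..9.
ZONE_BITS = {None: 0b00, 0: 0b01, 11: 0b10, 12: 0b11}  # assumed tape-style order


def punches_to_bcd(punches):
    rows = set(punches)
    if not rows:
        return 0x00              # unpunched column: blank
    if rows == {0}:
        return 0x0A              # lone 0 punch: digit zero, special-cased
    zones = rows & {12, 11, 0}
    digits = rows - zones
    if len(zones) > 1:
        raise ValueError("more than one zone punch")
    valid_digits = (
        len(digits) == 0                                   # zone punch only
        or (len(digits) == 1 and digits <= set(range(1, 10)))
        or (len(digits) == 2 and 8 in digits
            and digits - {8} <= set(range(2, 8)))          # 8 plus one of 2..7
    )
    if not valid_digits:
        raise ValueError("unsupported digit punches")
    zone = next(iter(zones)) if zones else None
    low = sum(digits)            # e.g. rows 8 and 3 give 11 (binary 1011)
    return (ZONE_BITS[zone] << 4) | low


print(hex(punches_to_bcd({12, 1})))    # letter A
print(hex(punches_to_bcd({0, 8, 3})))  # comma in the tape-style tables
```

Under these assumptions the two calls give 0x31 and 0x1b, matching the positions of 'A' and ',' in the tape-style charts later in the article.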

Among the vendors using BCD were Burroughs,^[5] Bull, CDC,^[6] IBM, General Electric (the computer division was purchased by Honeywell in 1969), NCR, Siemens, and Sperry-UNIVAC.

IBM announced the 8-bit Extended Binary Coded Decimal Interchange Code (EBCDIC), based on BCDIC, in 1964 with the introduction of its System/360 line.

Special characters

The Recordmark or Record mark character (represented as ‡) is a character used to mark the end of a record.^[7] The BCD code for this character is 32₈ in some BCD variants. The closest Unicode equivalent is U+29E7 ⧧ THERMODYNAMIC, but that is not found in many fonts, so U+2021 ‡ DOUBLE DAGGER is often used instead. Functionally this corresponds to the EBCDIC IRS character (ASCII RS), X'1E'.

The Groupmark or Group mark character (represented as ⯒) is a character used to indicate the start or finish of a group of related fields.^[8] The BCD code for this character is 77₈ in some BCD variants. The groupmark was proposed for Unicode standardization in 2015^[9] and was assigned the code point U+2BD2 ⯒ GROUP MARK, which has been part of Unicode since version 10.0, although only the Symbola and Unifont fonts support it. Functionally this corresponds to the EBCDIC IGS character (ASCII GS), X'1D'.

The Wordmark, by contrast, is not a BCD character. Rather, it is a flag bit used to mark the end of a word on some variable word length computers such as the IBM 1401.
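As a small illustration of the record mark and group mark described above, the following Python sketch maps their codes to the Unicode characters named in the text. It assumes a variant in which the marks sit at 32₈ and 77₈; the names `MARK_TO_UNICODE` and `mark_glyph` are hypothetical.

```python
# Hypothetical sketch: Unicode stand-ins for the two BCD marks, assuming a
# variant where the record mark is octal 32 and the group mark is octal 77.
RECORD_MARK = 0o32
GROUP_MARK = 0o77

MARK_TO_UNICODE = {
    RECORD_MARK: "\u29E7",   # THERMODYNAMIC, the closest equivalent
    GROUP_MARK: "\u2BD2",    # GROUP MARK
}
RECORD_MARK_FALLBACK = "\u2021"  # DOUBLE DAGGER, more widely available in fonts


def mark_glyph(code, use_fallback=False):
    """Return the character used to display a record or group mark."""
    if code == RECORD_MARK and use_fallback:
        return RECORD_MARK_FALLBACK
    return MARK_TO_UNICODE[code]


print(hex(RECORD_MARK), hex(GROUP_MARK))  # the same codes in hexadecimal
```

The wordmark is deliberately absent from the mapping, since it is a flag bit rather than a character code.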

BCD code variations

There are many different versions of the six-bit BCD code. There are three major categories of difference:

The mapping from zone punches to high-order bits. All codes translate no zone punches to a bit pattern of 00, but some encode the zone punches in 12-11-0 order, preserving alphabetical order, while others use 0-11-12 order, resulting in a partially reversed alphabet.
The handling of the digit 0. The straightforward translation from punched form would place the blank before digits 1–9, and encode 0 at the start of the row that contains 'S'. All codes have some special-case handling, which either translates the digit 0 to the all-zero binary code (and moves the blank elsewhere) or gives it binary code 001010 (decimal 10) and moves the 8+2 punch elsewhere.
The assignment of special characters. The characters assigned to codes beyond the basic alphanumeric set varied widely, even within one model of computer. For example, some computers^[b] had the percent and lozenge (U+2311 ⌑ SQUARE LOZENGE) at the same codes as left and right parentheses in other^[c] encodings.
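The first of these categories, the zone ordering, can be illustrated with a short Python sketch. It builds letter codes from their punches (A–I use zone 12, J–R zone 11, S–Z zone 0) under each of the two zone orders and sorts the alphabet by the resulting code. The names `ORDER_12_11_0`, `ORDER_0_11_12` and `collate` are hypothetical, and the sketch ignores the other two categories of difference.

```python
# Letters as (zone row, digit row) punch pairs.
LETTER_PUNCHES = {}
for i, ch in enumerate("ABCDEFGHI"):
    LETTER_PUNCHES[ch] = (12, i + 1)   # zone 12, digits 1-9
for i, ch in enumerate("JKLMNOPQR"):
    LETTER_PUNCHES[ch] = (11, i + 1)   # zone 11, digits 1-9
for i, ch in enumerate("STUVWXYZ"):
    LETTER_PUNCHES[ch] = (0, i + 2)    # zone 0, digits 2-9

# The two zone-to-bits assignments described in the text (00 means no zone).
ORDER_12_11_0 = {12: 0b01, 11: 0b10, 0: 0b11}
ORDER_0_11_12 = {0: 0b01, 11: 0b10, 12: 0b11}


def collate(zone_bits):
    """Return the alphabet sorted by six-bit code under a zone assignment."""
    def code(ch):
        zone, digit = LETTER_PUNCHES[ch]
        return (zone_bits[zone] << 4) | digit
    return "".join(sorted(LETTER_PUNCHES, key=code))


print(collate(ORDER_12_11_0))  # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(collate(ORDER_0_11_12))  # STUVWXYZJKLMNOPQRABCDEFGHI
```

The second line of output shows the partially reversed alphabet: each group of letters stays in order internally, but the three groups appear in reverse.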

In Spanish-speaking countries, the character "Ñ" did not exist in the original system, so most manufacturers (Bull, NCR, and Control Data) chose "@" to represent it. An inconsistency arose when merging databases into 7-bit ASCII, because in that coding system the "/" character was chosen instead, resulting in two different codes for the same character.

Examples of BCD codes

The following charts show the numeric values of BCD characters in hexadecimal (base-16) notation, as that most clearly reflects the structure of 4-bit binary coded decimal, plus two extra bits. For example, the code for 'A', in row 3x and column x1, is hexadecimal 31, or binary '11 0001'.
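The row-and-column reading of the charts corresponds to splitting a six-bit code into its high two bits and low four bits. The Python sketch below shows that split and the equivalent octal form, which is the notation used for the record mark and group mark earlier in the article; the function names `split_bcd` and `describe` are hypothetical.

```python
def split_bcd(code):
    """Return (zone, digit): the high two bits and low four bits of a code."""
    if not 0 <= code <= 0x3F:
        raise ValueError("not a six-bit value")
    return code >> 4, code & 0x0F


def describe(code):
    """Format a six-bit code in hexadecimal, grouped binary, and octal."""
    zone, digit = split_bcd(code)
    return f"hex {code:02X} = binary {zone:02b} {digit:04b} = octal {code:02o}"


print(describe(0x31))  # hex 31 = binary 11 0001 = octal 61
print(describe(0x1A))  # hex 1A = binary 01 1010 = octal 32
```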

Tape style

48-character BCD code

The first versions of BCDIC had 48 characters, as they were based on card punch patterns and the character sets of printers, neither of which encouraged having a power-of-two number of characters.

IBM 48-character BCDIC code^[1]^: 68
	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC
0x	space	1	2	3	4	5	6	7	8	9	0	#	@
1x		/	S	T	U	V	W	X	Y	Z		,	%
2x	-	J	K	L	M	N	O	P	Q	R		$	*
3x	&	A	B	C	D	E	F	G	H	I		.	⌑

This was based on a 40-character punched card code; the original 37 (10 digits, 26 letters, and blank), plus three commercially important characters added around 1932:^[1]^: 67 hyphen-minus used for printing credit balances and hyphenated names, the ampersand also used in many names and addresses (Procter & Gamble, Mr. & Mrs. Smith), and the asterisk used to overprint unused fields when printing cheques.

IBM 1401 BCD code

Rather than following the IBM 704's storage representation, the IBM 1401 followed the tape representation (descended from the 48-character BCD), thus using the all-zero code for blank and the code 10 (0x0A) for the digit zero. It had defined character forms for all possible values, for documentation purposes,^[10] but only 48 of the 63 non-blank characters were printable, and there was considerable variation in how the other code values (shaded in the table below) were depicted in practice. Even the other characters varied between different available print chains for the IBM 1403 printer.

	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC	xD	xE	xF
0x	space	1	2	3	4	5	6	7	8	9	0	#	@	:	>	√
1x	¢	/	S	T	U	V	W	X	Y	Z	⧧	,	%	=	'	"
2x	-	J	K	L	M	N	O	P	Q	R	!	$	*	)	;	Δ
3x	&	A	B	C	D	E	F	G	H	I	?	.	⌑	(	<	⯒
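A table like the one above translates directly into a lookup structure. The following Python sketch transcribes the four rows and uses the position of each character as its code; it is an illustration only, the names `ROWS_1401`, `encode` and `decode` are hypothetical, and the glyphs chosen for the non-printable positions are simply those shown in the table.

```python
# The four rows of the table above; a character's index in the flattened
# list is its six-bit code (row number * 16 + column number).
ROWS_1401 = [
    " 1234567890#@:>√",
    "¢/STUVWXYZ⧧,%='\"",
    "-JKLMNOPQR!$*);Δ",
    "&ABCDEFGHI?.⌑(<⯒",
]
DECODE = [ch for row in ROWS_1401 for ch in row]
ENCODE = {ch: code for code, ch in enumerate(DECODE)}


def encode(text):
    """Map a string to a list of six-bit codes (KeyError if unmappable)."""
    return [ENCODE[ch] for ch in text]


def decode(codes):
    """Map a sequence of six-bit codes back to a string."""
    return "".join(DECODE[code] for code in codes)


codes = encode("HELLO 1401")
print([f"{c:02X}" for c in codes])
print(decode(codes))
```

Note that the blank encodes as 00 and the digit zero as 0A, as described in the paragraph before the table.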

Code page 353

The BCDIC-A Code page was assigned as Code page 353, also known as CP353. Some of the characters in this code page are not in Unicode. (The duplication of '#' can be found in IBM's own documentation and is not a mistake here.^[11])

	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC	xD	xE	xF
0x	space	1	2	3	4	5	6	7	8	9	0	#	@	:	>	√
1x	␢	/	S	T	U	V	W	X	Y	Z	⧧	,	%	γ	\	⧻
2x	-	J	K	L	M	N	O	P	Q	R	!	#	*	]	;	Δ
3x	&	A	B	C	D	E	F	G	H	I	?	.	⌑	[	<	⯒

At 0x1A is the record mark. At 0x3F is the group mark.

Code page 354

The BCDIC-B Code page was assigned as Code page 354, also known as CP354.^[12] Some of the characters in this code page are not in Unicode.

	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC	xD	xE	xF
0x	space	1	2	3	4	5	6	7	8	9	0	⊙	'	:	>	√
1x	␢	/	S	T	U	V	W	X	Y	Z	⧧	,	(	γ	\	⧻
2x	-	J	K	L	M	N	O	P	Q	R	!	#	*	]	;	Δ
3x	+	A	B	C	D	E	F	G	H	I	?	.	)	[	<	⯒

At 0x1A is the record mark. At 0x3F is the group mark.

PTTC/BCD code pages

PTTC/BCD had five options, each assigned its own code page; they are shown below. The PTTC/BCD Standard Option was assigned as Code page 355, or CP355.

	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xD
0x	space	1	2	3	4	5	6	7	8	9	0	#
1x	@	/	S	T	U	V	W	X	Y	Z	⧧	,	γ
2x	-	J	K	L	M	N	O	P	Q	R	<	$
3x	&	A	B	C	D	E	F	G	H	I	)	.

The PTTC/BCD H Option was assigned as Code page 357, or CP357.

	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB
0x	space	1	2	3	4	5	6	7	8	9	0	=
1x	'	/	S	T	U	V	W	X	Y	Z	⧧	,
2x	-	J	K	L	M	N	O	P	Q	R	!	$
3x	+	A	B	C	D	E	F	G	H	I	?	.

The PTTC/BCD Correspondence Option was assigned as Code page 358, or CP358.

	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB
0x	space	1	2	3	4	5	6	7	8	9	0	'
1x	!	/	S	T	U	V	W	X	Y	Z	⧧	,
2x	-	J	K	L	M	N	O	P	Q	R	<	;
3x	=	A	B	C	D	E	F	G	H	I	>	.

The PTTC/BCD Monocase Option was assigned as Code page 359, or CP359.

	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB
0x	space	1	2	3	4	5	6	7	8	9	0	#
1x	@	/	S	T	U	V	W	X	Y	Z		,
2x	-	J	K	L	M	N	O	P	Q	R		$
3x	&	A	B	C	D	E	F	G	H	I		.

The PTTC/BCD Duocase Option was assigned as Code page 360, or CP360.

	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB
0x	space	1	2	3	4	5	6	7	8	9	0	#
1x	@	/	S	T	U	V	W	X	Y	Z		,
2x	-	J	K	L	M	N	O	P	Q	R		$
3x	&	A	B	C	D	E	F	G	H	I		.

IBM 704 storage style

IBM 704 BCD code

The IBM 704 reordered the BCDIC code to allow a normal alphabetic collating order internally, with 0 before 1 and A before Z. It could automatically translate between this internal form and the earlier BCDIC when reading and writing magnetic tapes.^[13]^: 35

The following table shows the code assignments for the IBM 704 computer. Unassigned code positions appear as blanks.^[13]^: 35

IBM 704 character set
	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC
0x	0	1	2	3	4	5	6	7	8	9		#	@
1x	&	A	B	C	D	E	F	G	H	I	+0	.	⌑
2x	-	J	K	L	M	N	O	P	Q	R	−0	$	*
3x	space	/	S	T	U	V	W	X	Y	Z	⧧	,	%

(+0 and −0 were rarely used characters that corresponded to the punched-card convention of a digit 0 with an overpunched sign in rows 12 or 11.)
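The relationship between the two forms can be explored with a short Python sketch that derives a translation purely by matching characters between the 48-character BCDIC table and the IBM 704 table in this article. It does not model the 704's actual translation hardware; the names `TAPE_ROWS`, `IBM704_ROWS` and `TAPE_TO_704` are hypothetical, and the 704's +0, −0 and record mark positions are left out because they have no counterpart in the 48-character table.

```python
UNASSIGNED = "~"  # placeholder for positions omitted from this sketch

# Columns x0..xC of each table row; index in a row is the low four bits.
TAPE_ROWS = [
    " 1234567890#@",
    "~/STUVWXYZ~,%",
    "-JKLMNOPQR~$*",
    "&ABCDEFGHI~.⌑",
]
IBM704_ROWS = [
    "0123456789~#@",
    "&ABCDEFGHI~.⌑",
    "-JKLMNOPQR~$*",
    " /STUVWXYZ~,%",
]


def code_table(rows):
    """Map each character to its six-bit code (row in the high two bits)."""
    return {ch: (r << 4) | c
            for r, row in enumerate(rows)
            for c, ch in enumerate(row)
            if ch != UNASSIGNED}


TAPE = code_table(TAPE_ROWS)
IBM704 = code_table(IBM704_ROWS)
TAPE_TO_704 = {TAPE[ch]: IBM704[ch] for ch in TAPE}
IBM704_TO_TAPE = {v: k for k, v in TAPE_TO_704.items()}

print(hex(TAPE_TO_704[0x31]))  # 'A': tape 0x31 becomes 0x11
print("".join(sorted("ZEBRA", key=IBM704.get)))  # sorted by 704 code: ABERZ
```

In this derived mapping the letters keep their low four bits and only the zone bits change (01 and 11 are exchanged), while blank and zero are the special cases.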

The following table shows the code assignments for the type 716 printer used starting with the IBM 704 computer and through the 7094.^[13]^: 58 The 704 interface^[d] sent virtual punched-card rows to this printer, two words (72 bits) at a time, so the mapping from 6-bit BCD characters was done by software, and was not built into the printer.

IBM 716 printer character set G
	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xB	xC
0x	*	1	2	3	4	5	6	7	8	9	+	-
1x	+	A	B	C	D	E	F	G	H	I	.	⌑
2x	-	J	K	L	M	N	O	P	Q	R	$	*
3x	0	/	S	T	U	V	W	X	Y	Z	,	%

This is a repertoire of 45 characters (not counting blank, which is handled specially by the printer), as the characters +, - and * are duplicated.
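The count can be checked mechanically. This Python sketch transcribes the four rows of character set G from the table above and tallies the characters; the variable names are hypothetical.

```python
from collections import Counter

# Rows of the IBM 716 character set G table (columns x0..x9, xB, xC).
ROWS_716_G = [
    "*123456789+-",
    "+ABCDEFGHI.⌑",
    "-JKLMNOPQR$*",
    "0/STUVWXYZ,%",
]

counts = Counter("".join(ROWS_716_G))
positions = sum(counts.values())                       # code positions in use
duplicated = sorted(ch for ch, n in counts.items() if n > 1)
distinct = len(counts)                                 # size of the repertoire

print(positions, distinct, duplicated)  # 48 45 ['*', '+', '-']
```

The 48 code positions hold 45 distinct characters because each of the three duplicated characters occupies two positions.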

Fortran character set

There was some variation; IBM 704 Fortran had a different set of special characters (preserving only the duplicated minus sign and asterisk, period, comma, and dollar sign).^[14]

IBM 716 printer Fortran character set
	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xB	xC
0x	*	1	2	3	4	5	6	7	8	9	=	-
1x	+	A	B	C	D	E	F	G	H	I	.	)
2x	-	J	K	L	M	N	O	P	Q	R	$	*
3x	0	/	S	T	U	V	W	X	Y	Z	,	(

A similar code was used for the IBM 709, 7090 and 7094 successors,^[15] but with some of the special characters reassigned:

IBM 7090/7094 character set
	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC
0x	0	1	2	3	4	5	6	7	8	9		=	"
1x	&	A	B	C	D	E	F	G	H	I	+0	.	)
2x	-	J	K	L	M	N	O	P	Q	R	−0	$	*
3x	space	/	S	T	U	V	W	X	Y	Z	±	,	(

GBCD code

Below is the table of GE/Honeywell's GBCD code, a variant of BCD.^[16]

	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC	xD	xE	xF
0x	0	1	2	3	4	5	6	7	8	9	[	#	@	:	>	?
1x	space	A	B	C	D	E	F	G	H	I	&	.	]	(	<	\
2x	^	J	K	L	M	N	O	P	Q	R	-	$	*	)	;	'
3x	+	/	S	T	U	V	W	X	Y	Z	_	,	%	=	"	!

Burroughs B5500 BCD code

The following table shows the code assignments for the Burroughs B5500 computer, sometimes referred to as BIC (Burroughs Interchange Code).^[17]

	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC	xD	xE	xF
0x	0	1	2	3	4	5	6	7	8	9	#	@	?	:	>	≥
1x	+	A	B	C	D	E	F	G	H	I	.	[	&	(	<	←
2x	×	J	K	L	M	N	O	P	Q	R	$	*	-	)	;	≤
3x	space	/	S	T	U	V	W	X	Y	Z	,	%	≠	=	]	"

Notes

^ There are actually multiple card codes; e.g., by 1964 there were ten versions of the IBM 026 with slightly different character sets.
^ E.g., IBM 702, IBM 705.
^ E.g., IBM 701, IBM 704.
^ The interface on, e.g., the 7090, is different, although the software still must do mapping.

References

^ ^a ^b ^c ^d ^e Mackenzie, Charles E. (1980). Coded Character Sets, History and Development (PDF). The Systems Programming Series (1 ed.). Addison-Wesley Publishing Company, Inc. ISBN 0-201-14460-3. LCCN 77-90165. Archived (PDF) from the original on 2016-05-26. Retrieved 2017-04-22.
^ Pugh, Emerson W.; Heide, Lars. "STARS:Punched Card Equipment". IEEE Global History Network. Archived from the original on 2012-05-11. Retrieved 2012-06-09.
^ Pugh, Emerson W. (1995). Building IBM: Shaping an Industry and Its Technology. MIT Press. pp. 50–51. ISBN 978-0-262-16147-3.
^ Jones, Douglas W. "Punched Card Codes". Retrieved 2014-01-01.
^ Burroughs B5500 Information Processing Systems: Reference Manual (PDF). Burroughs Corporation. 1964. Archived from the original (PDF) on 2020-07-29. Retrieved 2012-06-08.
^ Control Data Corporation (1965). Codes/Control Data 6600 Computer System (PDF).
^ "Record-mark". Encyclopedia. PC Magazine. Retrieved 2016-04-09.
^ "group mark". Encyclopedia.com. Retrieved 2016-04-09.
^ Shirriff, Ken. "Proposal for addition of Group Mark symbol" (PDF). unicode.org. Retrieved 2016-04-09.
^ IBM 1401 Data Processing System: Reference Manual (PDF). IBM. April 1962. p. 170. A24-1403-5. Archived from the original (PDF) on 2012-03-14.
^ "Systems i Software Globalization cp00353z" (PDF). www-03.ibm.com. Archived from the original (PDF) on 2013-01-21. Retrieved 2022-06-30.
^ https://ccsids.net/ccsids.html#ccsid-354.
^ ^a ^b ^c IBM 704 electronic data-processing machine manual of operation (PDF). IBM. 1955. pp. 35, 58. Form 24-6661-2. Retrieved 2017-04-22.
^ "Fortran Automatic Coding System for the IBM 704" (PDF). IBM. 1956-10-15. p. 49. Archived from the original (PDF) on 2015-09-24. Retrieved 2015-09-15.
^ Harper, Jack (2001-08-21). "IBM 7090/94 Character Representation". Archived from the original on 2017-03-16. Retrieved 2017-04-22.
^ "Section: Tables of characters in BULL computers" (PDF). Archived from the original (PDF) on 2011-07-08. Retrieved 2010-11-15.
^ Burroughs B 5500 Information Processing Systems Extended Algol Reference Manual (PDF). 1966. p. B-1.


The following table breaks selected six-bit codes down into their zone bits and digit bits; the values follow the IBM 1401 BCD code table above.

Character	Decimal	Binary (6-bit)	Zone-Digit Breakdown
0	10	001010	00-1010
1	1	000001	00-0001
2	2	000010	00-0010
3	3	000011	00-0011
4	4	000100	00-0100
5	5	000101	00-0101
6	6	000110	00-0110
7	7	000111	00-0111
8	8	001000	00-1000
9	9	001001	00-1001
A	49	110001	11-0001
B	50	110010	11-0010
C	51	110011	11-0011
D	52	110100	11-0100
E	53	110101	11-0101
F	54	110110	11-0110
G	55	110111	11-0111
H	56	111000	11-1000
I	57	111001	11-1001
J	33	100001	10-0001
K	34	100010	10-0010
L	35	100011	10-0011
M	36	100100	10-0100
N	37	100101	10-0101
O	38	100110	10-0110
P	39	100111	10-0111
Q	40	101000	10-1000
R	41	101001	10-1001
S	18	010010	01-0010
T	19	010011	01-0011
U	20	010100	01-0100
V	21	010101	01-0101
W	22	010110	01-0110
X	23	010111	01-0111
Y	24	011000	01-1000
Z	25	011001	01-1001
-	32	100000	10-0000
/	17	010001	01-0001
,	27	011011	01-1011
"	31	011111	01-1111
