C0 and C1 control codes
from Wikipedia

The C0 and C1 control code or control character sets define control codes for use in text by computer systems that use ASCII and derivatives of ASCII. The codes represent additional information about the text, such as the position of a cursor, an instruction to start a new line, or a message that the text has been received.

C0 codes are the range 0x00–0x1F and the default C0 set was originally defined in ISO 646 (ASCII). C1 codes are the range 0x80–0x9F and the default C1 set was originally defined in ECMA-48 (harmonized later with ISO 6429). The ISO/IEC 2022 system of specifying control and graphic characters allows other C0 and C1 sets to be available for specialized applications, but they are rarely used.

C0 controls

ASCII defines 32 control characters, plus the DEL character. This large number of codes was desirable at the time, as multi-byte controls would require implementation of a state machine in the terminal, which was very difficult with contemporary electronics and mechanical terminals.

Only a few codes have maintained their use: BEL, ESC, and the format effector[1] (FEn) characters BS, TAB, LF, VT, FF, and CR. Others are unused or have acquired different meanings, such as NUL serving as the C string terminator. Some data transfer protocols such as ANPA-1312, Kermit, and XMODEM do make extensive use of SOH, STX, ETX, EOT, ACK, NAK and SYN for purposes approximating their original definitions. The "Information Separators" (ISn) also survive in some software: the Unix info format uses them,[2] and Python's splitlines string method treats FS, GS and RS as line boundaries.[3]
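As a small illustration of the Python behaviour mentioned above, the following sketch shows which information separators `str.splitlines` treats as line boundaries. The sample string is made up for the example.

```python
# FS (0x1C), GS (0x1D) and RS (0x1E) count as line boundaries for
# str.splitlines; US (0x1F) does not, so it stays inside the last item.
text = "alpha\x1cbeta\x1dgamma\x1edelta\x1fepsilon"

parts = text.splitlines()
print(parts)
```

Note that US is left untouched, so code that relies on all four separators cannot use `splitlines` alone.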

The names of some codes were changed in ISO 6429:1992 (or ECMA-48:1991) to be neutral with respect to writing direction. The abbreviations used were not changed, as the standard had already specified that those would remain unchanged when the standard is translated to other languages. In this table both new and old names are shown for the renamed controls (the old name is the one matching the abbreviation).

Unicode provides Control Pictures that can replace C0 control characters to make them visible on screen. However, caret notation is used more often.
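Both conventions can be computed directly from the code value. The sketch below is illustrative only; the function names `caret_notation` and `control_picture` are invented for this example. It assumes the usual rule that caret notation flips bit 6 of the code, and that the Control Pictures block starts at U+2400.

```python
def caret_notation(code: int) -> str:
    """Return the caret notation for a C0 control code or DEL."""
    if not (0x00 <= code <= 0x1F or code == 0x7F):
        raise ValueError("not a C0 control code or DEL")
    # XOR with 0x40 maps 0x00-0x1F to '@'..'_' and 0x7F to '?'.
    return "^" + chr(code ^ 0x40)


def control_picture(code: int) -> str:
    """Return the Unicode Control Pictures character for a C0 code, SP or DEL."""
    if 0x00 <= code <= 0x20:
        # U+2400 SYMBOL FOR NULL through U+2420 SYMBOL FOR SPACE.
        return chr(0x2400 + code)
    if code == 0x7F:
        return chr(0x2421)  # SYMBOL FOR DELETE
    raise ValueError("no control picture for this code")


print(caret_notation(0x1B), control_picture(0x1B))
```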

ASCII control codes, originally defined in ANSI X3.4.[4]
| Caret notation | Decimal | Hexadecimal | Abbreviations | Name | C escape | Description |
| --- | --- | --- | --- | --- | --- | --- |
| ^@ | 0 | 00 | NUL | Null | \0 | Does nothing. The code of blank paper tape, and also used for padding to slow transmission. |
| ^A | 1 | 01 | TC1, SOH | Start of Heading | | First character of the heading of a message.[5] |
| ^B | 2 | 02 | TC2, STX | Start of Text | | Terminates the header and starts the message text. |
| ^C | 3 | 03 | TC3, ETX | End of Text | | Ends the message text, starts a footer (up to the next TC character).[5][6] |
| ^D | 4 | 04 | TC4, EOT | End of Transmission | | Ends the transmission of one or more messages.[5][6] May place terminals on standby.[6] |
| ^E | 5 | 05 | TC5, ENQ, WRU[a] | Enquiry | | Trigger a response at the receiving end, to see if it is still present. |
| ^F | 6 | 06 | TC6, ACK | Acknowledge | | Indication of successful receipt of a message. |
| ^G | 7 | 07 | BEL[b] | Bell, Alert | \a | Call for attention from an operator. |
| ^H | 8 | 08 | FE0, BS | Backspace | \b | Move one position leftwards. Next character may overprint or replace the character that was there. |
| ^I | 9 | 09 | FE1, HT | Character Tabulation, Horizontal Tabulation | \t | Move right to the next tab stop. |
| ^J | 10 | 0A | FE2, LF | Line Feed | \n | Move down to the same position on the next line (some devices also moved to the left column). |
| ^K | 11 | 0B | FE3, VT | Line Tabulation, Vertical Tabulation | \v | Move down to the next vertical tab stop. |
| ^L | 12 | 0C | FE4, FF | Form Feed | \f | Move down to the top of the next page. |
| ^M | 13 | 0D | FE5, CR | Carriage Return | \r | Move to column zero while staying on the same line. |
| ^N | 14 | 0E | SO, LS1[13][c] | Shift Out | | Switch to an alternative character set. |
| ^O | 15 | 0F | SI, LS0[13][c] | Shift In | | Return to regular character set after SO. |
| ^P | 16 | 10 | TC7, DC0,[d] DLE | Data Link Escape | | Cause a limited number of contiguously following characters to be interpreted in some different way.[15][16] |
| ^Q | 17 | 11 | DC1, XON | Device Control One | | Turn on (DC1 and DC2) or off (DC3 and DC4) devices. Teletype[7] used these for the paper tape reader and the paper tape punch. The first use became the de facto standard for software flow control.[17] |
| ^R | 18 | 12 | DC2, TAPE | Device Control Two | | |
| ^S | 19 | 13 | DC3, XOFF | Device Control Three | | |
| ^T | 20 | 14 | DC4, TAPE | Device Control Four | | |
| ^U | 21 | 15 | TC8, NAK | Negative Acknowledge | | Negative response to a sender, such as a detected error. |
| ^V | 22 | 16 | TC9, SYN | Synchronous Idle | | Sent in synchronous transmission systems when no other character is being transmitted. |
| ^W | 23 | 17 | TC10, ETB | End of Transmission Block | | End of a transmission block of data when data are divided into such blocks for transmission purposes. |
| ^X | 24 | 18 | CAN | Cancel | | Indicates that the data preceding it are in error or are to be disregarded. |
| ^Y | 25 | 19 | EM | End of medium | | Indicates on paper or magnetic tapes that the end of the usable portion of the tape had been reached.[4] |
| ^Z | 26 | 1A | SUB | Substitute | | Replaces a character that was found to be invalid or in error. Should be ignored. |
| ^[ | 27 | 1B | ESC | Escape | \e[e] | Alters the meaning of a limited number of following bytes. Nowadays this is almost always used to introduce an ANSI escape sequence. |
| ^\ | 28 | 1C | IS4, FS | File Separator | | Can be used as delimiters to mark fields of data structures. US is the lowest level, while RS, GS, and FS are of increasing level to divide groups made up of items of the level beneath it. SP (space) could be considered an even lower level. |
| ^] | 29 | 1D | IS3, GS | Group Separator | | |
| ^^ | 30 | 1E | IS2, RS | Record Separator | | |
| ^_ | 31 | 1F | IS1, US | Unit Separator | | |

While not technically part of the C0 control character range, the following two characters can be thought of as having some characteristics of control characters.

| Caret notation | Decimal | Hexadecimal | Abbreviations | Name | C escape | Description |
| --- | --- | --- | --- | --- | --- | --- |
| | 32 | 20 | SP | Space | | Move right one character position. |
| ^? | 127 | 7F | DEL | Delete | | Should be ignored. Used to delete characters on punched tape by punching out all the holes. |
  a. Teletype labelled the key WRU for 'who are you?'[7]
  b. The name BELL is assigned by Unicode to the unrelated emoji character 🔔 (U+1F514). While C0 and C1 control characters were not formally named by the Unicode standard itself at the time, this collided with existing use of BELL as the name of this control character in software following the previous versions of UTS#18 (the Unicode Regular Expressions standard),[8] e.g. in Perl.[9] Unicode now accepts ALERT and BEL (but not BELL) as formal aliases for the control character,[10] although the code chart still lists BELL as the ISO 6429 alias,[11] and the corresponding control picture code point is called SYMBOL FOR BELL. Perl subsequently switched to using BELL for the emoji in version 5.18.[12]
  c. ISO/IEC 2022 (ECMA-35) refers to these as LS0 and LS1 in 8-bit environments, and as SI and SO in 7-bit environments.[13]
  d. The first, 1963 edition of ASCII classified DLE as a device control, rather than a transmission control, and gave it the abbreviation DC0 ("device control reserved for data link escape").[14]
  e. The '\e' escape sequence is not part of ISO C and many other language specifications. However, it is understood by several compilers, including GCC.
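The hierarchy of information separators described in the table (US below RS, and so on) can be sketched as a simple record format. This is a minimal illustration and not a standardized file format; the helper names `encode_records` and `decode_records` are invented here, and the sketch assumes field values never contain US or RS themselves.

```python
US = "\x1f"  # Unit Separator: divides fields within a record
RS = "\x1e"  # Record Separator: divides records


def encode_records(records: list[list[str]]) -> str:
    """Join fields with US and records with RS."""
    return RS.join(US.join(fields) for fields in records)


def decode_records(data: str) -> list[list[str]]:
    """Reverse encode_records by splitting on RS, then on US."""
    return [record.split(US) for record in data.split(RS)]


rows = [["Ada", "London, UK"], ["Grace", "line one\nline two"]]
encoded = encode_records(rows)
print(decode_records(encoded) == rows)
```

Because the delimiters are control characters, fields may contain commas, quotes and newlines without any escaping, which is the practical appeal of this scheme over comma-separated text.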

C1 controls

In 1973, ECMA-35 and ISO 2022[18] attempted to define a method so an 8-bit "extended ASCII" code could be converted to a corresponding 7-bit code, and vice versa.[19] In a 7-bit environment, the Shift Out (SO) would change the meaning of the 96 bytes 0x20 through 0x7F[a][21] (i.e. all but the C0 control codes), to be the characters that an 8-bit environment would print if it used the same code with the high bit set. This meant that the range 0x80 through 0x9F could not be printed in a 7-bit environment,[19] thus it was decided that no alternative character set could use them, and that these codes should be additional control codes, which became known as the C1 control codes. To allow a 7-bit environment to use these new controls, the sequences ESC @ through ESC _ were to be considered equivalent.[19] The later ISO 8859 standards abandoned support for 7-bit codes, but preserved this range of control characters.
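The equivalence between an 8-bit C1 code and its 7-bit form (ESC followed by a byte from @ through _) amounts to subtracting 0x40 from the code. The sketch below is a hypothetical helper, `c1_to_7bit`, written for this illustration; it assumes the input is a single-byte encoding in which 0x80 through 0x9F really are C1 controls, so it must not be applied to UTF-8 data.

```python
ESC = 0x1B


def c1_to_7bit(data: bytes) -> bytes:
    """Rewrite each 8-bit C1 control as its two-byte ESC equivalent."""
    out = bytearray()
    for byte in data:
        if 0x80 <= byte <= 0x9F:
            # 0x80 -> ESC '@' (0x40) ... 0x9F -> ESC '_' (0x5F)
            out += bytes((ESC, byte - 0x40))
        else:
            out.append(byte)
    return bytes(out)


# 0x9B is CSI, so this becomes the familiar ESC [ 2 J sequence.
print(c1_to_7bit(b"\x9b2J"))
```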

The first C1 control code set to be registered for use with ISO 2022 was DIN 31626,[22] a specialised set for bibliographic use which was registered in 1979.[23]

The more common general-use ISO/IEC 6429 set was registered in 1983,[24] although the ECMA-48 specification upon which it was based had first been published in 1976.[25] The same set is also specified by JIS X 0211 (formerly JIS C 6323).[26] Symbolic names defined by RFC 1345 and early drafts of ISO 10646, but not in ISO/IEC 6429 (PAD, HOP and SGC), are also used.[9][27]

Except for SS2 and SS3 in EUC-JP text, and NEL in text transcoded from EBCDIC, the 8-bit forms of these codes were almost never used. CSI, DCS and OSC are used to control text terminals and terminal emulators, but almost always by using their 7-bit escape code representations. Nowadays, if bytes in this range are encountered, it is far more likely that they are intended as printing characters from the same positions of Windows-1252 or Mac OS Roman.
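The ambiguity can be seen by decoding the same bytes two ways. In this sketch the sample bytes are invented for the example; Python's `latin-1` codec maps 0x80 through 0x9F to the C1 control code points, while `cp1252` maps most of them to printing characters.

```python
import unicodedata

raw = b"\x93quoted\x94 \x85"

as_latin1 = raw.decode("latin-1")   # bytes become C1 controls U+0093, U+0094, U+0085
as_cp1252 = raw.decode("cp1252")    # bytes become curly quotes and an ellipsis

print(repr(as_latin1))
print(as_cp1252)
print(unicodedata.category(as_latin1[0]))  # general category of U+0093
```

A few positions (for example 0x81 and 0x8D) are undefined in Python's `cp1252` codec and raise an error when decoded, so real-world code usually needs an explicit error-handling policy.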

Except for NEL, Unicode does not provide a "control picture" for any of these. There is no well-known variation of Caret notation for them either.

ISO/IEC 6429 and RFC 1345 C1 control codes
| ESC+ | Decimal | Hex | Abbr | Name | Description[28] |
| --- | --- | --- | --- | --- | --- |
| @ | 128 | 80 | PAD[10] | Padding Character[b] | Proposed as a "padding" or "high byte" for single-byte characters to make them two bytes long for easier interoperability with multiple byte characters. Extended Unix Code (EUC) occasionally uses this.[32] |
| A | 129 | 81 | HOP[10] | High Octet Preset[b] | Proposed to set the high byte of a sequence of multiple byte characters so they only need one byte each, as a simple form of data compression. |
| B | 130 | 82 | BPH | Break Permitted Here[c] | Follows a graphic character where a line break is permitted. Roughly equivalent to a soft hyphen or zero-width space except it does not define what is printed at the line break. |
| C | 131 | 83 | NBH | No Break Here[c] | Follows the graphic character that is not to be broken. See also word joiner. |
| D | 132 | 84 | IND | Index[d] | Move down one line without moving horizontally, to eliminate ambiguity about the meaning of LF. |
| E | 133 | 85 | NEL | Next Line | Equivalent to CR+LF, to match the EBCDIC control character. |
| F | 134 | 86 | SSA | Start of Selected Area | Used by block-oriented terminals. In xterm ESC F moves to the lower-left corner of the screen, since certain software assumes this behaviour.[35] |
| G | 135 | 87 | ESA | End of Selected Area | |
| H | 136 | 88 | HTS | Character Tabulation Set, Horizontal Tabulation Set | Set a tab stop at the current position. |
| I | 137 | 89 | HTJ | Character Tabulation With Justification, Horizontal Tabulation With Justification | Right-justify the text since the last tab against the next tab stop. |
| J | 138 | 8A | VTS | Line Tabulation Set, Vertical Tabulation Set | Set a vertical tab stop. |
| K | 139 | 8B | PLD | Partial Line Forward, Partial Line Down | To produce subscripts and superscripts in ISO/IEC 6429. Subscripts use PLD text PLU while superscripts use PLU text PLD. |
| L | 140 | 8C | PLU | Partial Line Backward, Partial Line Up | |
| M | 141 | 8D | RI | Reverse Line Feed, Reverse Index | Move up one line. |
| N | 142 | 8E | SS2 | Single-Shift 2 | Next character is from the G2 or G3 sets, respectively. |
| O | 143 | 8F | SS3 | Single-Shift 3 | |
| P | 144 | 90 | DCS | Device Control String | Followed by a string of printable characters (0x20 through 0x7E) and format effectors (0x08 through 0x0D), terminated by ST (0x9C). Xterm defined a number of these.[36] |
| Q | 145 | 91 | PU1 | Private Use 1 | Reserved for private function agreed on between the sender and the recipient of the data. |
| R | 146 | 92 | PU2 | Private Use 2 | |
| S | 147 | 93 | STS | Set Transmit State | |
| T | 148 | 94 | CCH | Cancel character | Destructive backspace, to eliminate ambiguity about meaning of BS. |
| U | 149 | 95 | MW | Message Waiting | |
| V | 150 | 96 | SPA | Start of Protected Area | Used by block-oriented terminals. |
| W | 151 | 97 | EPA | End of Protected Area | |
| X | 152 | 98 | SOS | Start of String[c] | Followed by a control string terminated by ST (0x9C) which (unlike DCS, OSC, PM or APC) may contain any character except SOS or ST. |
| Y | 153 | 99 | SGC,[10] SGCI[37] | Single Graphic Character Introducer[b] | Intended to allow an arbitrary Unicode character to be printed; it would be followed by that character, most likely encoded in UTF-1.[37] |
| Z | 154 | 9A | SCI | Single Character Introducer[c] | To be followed by a single printable character (0x20 through 0x7E) or format effector (0x08 through 0x0D), and to print it as ASCII no matter what graphic or control sets were in use. |
| [ | 155 | 9B | CSI | Control Sequence Introducer | Used to introduce control sequences that take parameters. Used for ANSI escape sequences. |
| \ | 156 | 9C | ST | String Terminator | Terminates a string started by DCS, SOS, OSC, PM or APC. |
| ] | 157 | 9D | OSC | Operating System Command | Followed by a string of printable characters (0x20 through 0x7E) and format effectors (0x08 through 0x0D), terminated by ST (0x9C), intended for use to allow in-band signaling of protocol information, but rarely used for that purpose. |
| ^ | 158 | 9E | PM | Privacy Message | |
| _ | 159 | 9F | APC | Application Program Command | |

Some terminal emulators, including xterm, use OSC sequences for setting the window title and changing the colour palette. They may also support terminating an OSC sequence with BEL instead of ST.[38] Kermit used APC to transmit commands.[39]
  a. In early versions the range excluded SP and DEL[20]
  b. Not part of ISO/IEC 6429 (ECMA-48)[9][27][29]: 4 [30]: 5 [31]: 8
  c. Not part of the first edition of ISO/IEC 6429.[24][29]: 4
  d. Deprecated in 1988 and withdrawn in 1992 from ISO/IEC 6429[31]: 87  (1986[33] and 1991[34] respectively for ECMA-48).
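As an illustration of how an OSC string is framed in its 7-bit form, the sketch below builds a window-title sequence. The `0;` prefix is the xterm convention for setting the icon name and window title and is an assumption beyond what the table states; the function name `osc_set_title` is invented for this example, and other terminals may behave differently.

```python
ESC = "\x1b"
BEL = "\x07"
ST_7BIT = ESC + "\\"   # 7-bit form of ST (0x9C)


def osc_set_title(title: str, use_bel: bool = False) -> str:
    """Build an xterm-style OSC sequence that sets the window title."""
    # Reject C0, DEL and C1 characters so the title cannot end the string early.
    if any(ord(ch) < 0x20 or 0x7F <= ord(ch) <= 0x9F for ch in title):
        raise ValueError("title must not contain control characters")
    terminator = BEL if use_bel else ST_7BIT
    # ESC ] is the 7-bit form of OSC (0x9D).
    return f"{ESC}]0;{title}{terminator}"


print(repr(osc_set_title("build log")))
```

Writing the returned string to a terminal that follows the xterm convention should change the title; on other terminals the sequence may be ignored.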

Other control code sets

The ISO/IEC 2022 (ECMA-35) extension mechanism allowed escape sequences to change the C0 and C1 sets. The standard C0 control character set shown above is chosen with the sequence ESC ! @ and the above C1 set chosen with the sequence ESC " C.[24]

Several official and unofficial alternatives have been defined, but these are now largely obsolete. Most were forced to retain a good deal of compatibility with the ASCII controls for interoperability. The standard makes ESC,[40][41] SP and DEL[a] "fixed" coded characters, which are available in their ASCII locations in all encodings that conform to the standard.[43] It also specifies that if a C0 set included transmission control (TCn) codes, they must be encoded at their ASCII locations[40] and could not be put in a C1 set,[44] and any new transmission controls must be in a C1 set.[40]

Alternative C0 character sets

  • ANPA-1312, a text markup language used for news transmission, replaces several C0 control characters.
  • IPTC 7901, the newer international version of the above, has its own variations.
  • Videotex has a completely different set.
  • Teletext also defines a set similar to Videotex.
  • T.61/T.51,[45] and others[46] replaced EM and GS with SS2 and SS3 so these functions could be used in a 7-bit environment without resorting to escape sequences.
  • Some sets replaced FS with SS2,[47] (same as ANPA-1312).
  • The now-withdrawn JIS C 6225, designated JIS X 0207 in later sources,[48] replaced FS with CEX or "Control Extension",[49] which introduces control sequences for vertical text behaviour, superscripts and subscripts[50] and for transmitting custom character graphics.[48]

Alternative C1 character sets

  • A specialized C1 control code set is registered for bibliographic use (including string collation), such as by MARC-8.[23][51][52]
  • Various specialised C1 control code sets are registered for use by Videotex formats.[22]
  • The Stratus VOS operating system uses a C1 set called the NLS control set.[53] It includes SS1 (Single-Shift 1) through SS15 (Single-Shift 15) controls,[54] used to invoke individual characters from pre-defined supplementary character sets,[55] in a similar manner to the single-shift mechanism of ISO/IEC 2022. The only single-shift controls defined by ISO/IEC 2022 are SS2 and SS3; these are retained in the VOS set at their original code points and function the same way.
  • EBCDIC defines up to 29 additional control codes besides those present in ASCII. When translating EBCDIC to Unicode (or to ISO 8859), these codes are mapped to C1 control characters in a manner specified by IBM's Character Data Representation Architecture (CDRA).[56][57] Although the New Line (NL) does translate to the ISO/IEC 6429 NEL (although it is often swapped with LF, following UNIX line ending convention),[56] the remainder of the control codes do not correspond. For example, the EBCDIC control SPS and the ECMA-48 control PLU are both used to begin a superscript or end a subscript, but are not mapped to one another. Extended-ASCII-mapped EBCDIC can therefore be regarded as having its own C1 set, although it is not registered with the ISO-IR registry for ISO/IEC 2022.[22]

Unicode

Unicode reserves the 65 code points described above for compatibility with the C0 and C1 control codes, giving them the general category Cc (control). These are U+0000–U+001F, U+007F, and U+0080–U+009F.

Unicode only specifies semantics for the C0 format controls HT, LF, VT, FF, and CR (note BS is missing); the C0 information separators FS, GS, RS, US (and SP); and the C1 control NEL.[58] The rest of the codes are transparent to Unicode and their meanings are left to higher-level protocols, with ISO/IEC 6429 suggested as a default.[58]

Unicode includes many additional format effector characters besides these, such as marks, embeds, isolates and pops for explicit bidirectional formatting, and the zero-width joiner and non-joiner for controlling ligature use. However, these are given the general category Cf (format) rather than Cc.
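The general categories can be checked with Python's standard `unicodedata` module. This sketch scans every code point, collects those with category Cc, and compares the result with the C0, DEL and C1 ranges; the variable names are chosen for this example.

```python
import sys
import unicodedata

# Collect every code point whose general category is Cc (control).
cc_points = [
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Cc"
]

# C0 (0x00-0x1F), DEL (0x7F) and C1 (0x80-0x9F).
expected = list(range(0x00, 0x20)) + [0x7F] + list(range(0x80, 0xA0))

print(len(cc_points), cc_points == expected)

# The zero-width joiner is a format character, not a control character.
print(unicodedata.category("\u200d"))
```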

from Grokipedia
C0 and C1 control codes are standardized sets of non-printable characters that provide essential control functions for text processing, formatting, and device management in computer systems adhering to ASCII and its extensions. The C0 set comprises 32 codes in the range 0 to 31 (decimal), plus code 127 (DEL), originally defined in the ASCII standard (ISO 646) for basic operations like null termination, alerts, and line breaks. The C1 set includes 32 codes in the range 128 to 159 (decimal), enabling advanced features such as escape sequences for cursor control and private use areas, as specified in 8-bit character encodings. Together, these codes facilitate interoperability across text-based protocols, terminals, and data streams without altering printable content.

Defined primarily in ISO/IEC 6429:1992 (equivalent to ECMA-48, 5th edition, 1991), the control codes support both 7-bit and 8-bit environments, with C1 codes representable in 7-bit systems via escape sequences starting with the ESC (U+001B) character from the C0 set. In the Unicode Standard, 65 code points (U+0000–U+001F, U+007F, U+0080–U+009F) are permanently reserved for C0 and C1 compatibility, ensuring stable semantics for higher-level protocols like terminal emulators and file formats, though their interpretation often depends on application-specific rules rather than Unicode itself. This reservation prevents reallocation and supports legacy data interchange, with semantics aligned to ISO/IEC 6429 where applicable.

Key functions in the C0 set include basic formatting and communication aids, while C1 extends to presentation controls. Notable examples are summarized below:

Selected C0 Control Codes

| Code (Decimal) | Name | Function |
| --- | --- | --- |
| 0 | NUL | Null character; padding or terminator |
| 7 | BEL | Audible bell or visual alert |
| 9 | HT | Horizontal tabulation |
| 10 | LF | Line feed |
| 13 | CR | Carriage return |
| 27 | ESC | Escape; introduces control sequences |

Selected C1 Control Codes

| Code (Decimal) | Name | Function |
| --- | --- | --- |
| 128 | PAD | Padding character |
| 133 | NEL | Next line (CR + LF equivalent) |
| 155 | CSI | Control Sequence Introducer; starts parameterized sequences (e.g., for cursor movement) |
| 156 | ST | String Terminator |
These codes originated in early standards and evolved through international harmonization to support global text handling; they remain integral to modern systems, such as ANSI escape sequences in terminals.

Overview and History

Definition and Purpose

C0 and C1 control codes refer to specific ranges of non-printable characters within 7-bit and 8-bit coded character sets, designed for signaling and control rather than representing visible graphic symbols. The C0 set occupies the first 32 positions (codes 0x00 to 0x1F) and code 0x7F (DELETE) in both 7-bit and 8-bit encodings, while the C1 set comprises the next 32 positions (codes 0x80 to 0x9F) available only in 8-bit extensions.

These control codes serve to manage the operation of peripherals and text processing systems, such as printers, terminals, and displays, by issuing commands for actions like formatting text layout (for instance, initiating line breaks) and regulating data transmission. Their primary purpose is to enable efficient interchange and processing of information without altering the visual content, with semantics typically defined by higher-level protocols for device-specific behaviors.

Unlike graphic characters, which form the printable repertoire of a character set, C0 and C1 codes do not produce visible output but instead trigger functional responses in receiving systems; they can also form the basis for escape sequences that invoke additional control or character set changes. In standards like ISO 646, which defines a 7-bit structure with 128 total positions, the C0 set is mandatory as the 32 control positions, leaving 96 for graphic characters, while C1 extends this framework in 8-bit codes to support more advanced control functions.
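The ranges described above can be expressed as a small classifier for single-byte codes. This is only a sketch: the function name `classify` and its labels are invented here, DEL is reported separately even though the text groups it with C0, and the remaining positions (including SP) are simply labelled as graphic positions.

```python
def classify(code: int) -> str:
    """Label an 8-bit code by the range it falls in."""
    if not 0x00 <= code <= 0xFF:
        raise ValueError("expected a value from 0 to 255")
    if code <= 0x1F:
        return "C0"
    if code == 0x7F:
        return "DEL"
    if 0x80 <= code <= 0x9F:
        return "C1"
    return "graphic"


for sample in (0x0A, 0x41, 0x7F, 0x9B):
    print(hex(sample), classify(sample))
```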

Historical Development

The origins of C0 and C1 control codes trace back to 19th-century telegraphy, where early binary encoding systems laid the groundwork for non-printing characters used in device control and formatting. Émile Baudot's 5-bit code, patented in 1874, introduced uniform-length binary sequences for letters, numbers, and symbols, marking the first widely adopted digital communication protocol that influenced subsequent character encodings. By the early 20th century, these concepts evolved into teletype systems, such as those based on the International Telegraph Alphabet No. 2 (ITA2), a refined Baudot variant standardized in 1930 for mechanical teleprinters, which incorporated basic control functions for shifting between character sets and managing transmission.

The standardization of the C0 control set began in the mid-20th century with the development of ASCII in 1963, published as ASA X3.4-1963 (later ANSI X3.4), which defined 32 control characters in positions 00–1F for functions like carriage return and line feed, alongside the delete (DEL) character at position 7F to aid tape erasure. This 7-bit code was quickly adopted internationally through the International Reference Version (IRV) of ISO 646, published in 1973, which harmonized national variants while preserving the core C0 controls to ensure compatibility across telegraph and other systems. In Europe, ECMA-6, adopted in 1965, mirrored ASCII's structure by specifying a 7-bit coded character set with up to 32 C0 controls, facilitating input/output operations in early computers and peripherals.

The C1 control set emerged in the 1970s to extend capabilities for 8-bit environments, with ECMA-48's first edition in March 1976 introducing codes in the 80–9F hexadecimal range, including the Control Sequence Introducer (CSI) for parameterized device commands. This was harmonized internationally via ISO 6429, first published in 1983, which defined C1 functions for advanced formatting. It works together with the code extension techniques of ISO 2022 (first published in 1973, with a revised edition in 1986), which enable dynamic switching between C0/C1 sets and graphic character sets. Key milestones included minor refinements in ECMA-48's subsequent editions, culminating in the 5th edition of June 1991, which added controls for coded character imaging while maintaining backward compatibility.

International harmonization efforts addressed national variants in ISO 646, where differing graphic symbols were allowed but C0 controls remained invariant to promote interoperability, as seen in standards like the UK's BS 4731 and France's variants from the 1970s. By the 1990s, with Unicode's adoption in 1991 incorporating C0 and C1 as fixed ranges (U+0000–U+001F and U+0080–U+009F), the sets achieved global stability under a policy prohibiting removals or reassignments, ensuring no major changes through 2025 despite ongoing digital evolution.

Standard Control Codes

C0 Control Codes

The C0 control codes constitute the invariant set of 33 non-printing control characters standardized across all 7-bit systems, assigned to bit combinations 00/00 through 01/15 (hex 00 to 1F) and 07/15 (hex 7F, for DEL). These codes are designed primarily for managing data interchange, transmission control, text formatting, and device operations in early systems. Unlike graphic characters, C0 codes do not represent visible symbols but instead trigger specific actions in receiving devices, such as terminals or printers. The following table enumerates the standard C0 control codes, including their names, hexadecimal values, and primary functions as specified in the relevant international standards.
| Hex | Name | Primary Function |
| --- | --- | --- |
| 00 | NULL (NUL) | Acts as a filler or padding character with no effect on data content, often used for media-fill or time-fill. |
| 01 | START OF HEADING (SOH) | Marks the beginning of a message heading or control block in data streams. |
| 02 | START OF TEXT (STX) | Indicates the start of the textual content, terminating any preceding heading. |
| 03 | END OF TEXT (ETX) | Signals the conclusion of a block of text. |
| 04 | END OF TRANSMISSION (EOT) | Denotes the end of a complete transmission, potentially including multiple texts. |
| 05 | ENQUIRY (ENQ) | Requests a response from the receiving station, such as status information. |
| 06 | ACKNOWLEDGE (ACK) | Provides affirmative confirmation of successful data receipt. |
| 07 | BELL (BEL) | Triggers an audible or visible alert to notify the operator. |
| 08 | BACKSPACE (BS) | Moves the active position one character backward, sometimes interpreted as non-destructive in display systems. |
| 09 | HORIZONTAL TABULATION (HT) | Advances the position to the next predefined tab stop on the current line. |
| 0A | LINE FEED (LF) | Advances the active position to the next line, maintaining the horizontal position. |
| 0B | VERTICAL TABULATION (VT) | Moves to the next predefined vertical tab stop. |
| 0C | FORM FEED (FF) | Advances to the starting position on the next page or form. |
| 0D | CARRIAGE RETURN (CR) | Returns the active position to the beginning of the current line. |
| 0E | SHIFT OUT (SO) | Invokes an alternate graphic character set, as per code extension techniques. |
| 0F | SHIFT IN (SI) | Reverts to the standard (primary) character set. |
| 10 | DATA LINK ESCAPE (DLE) | Modifies the interpretation of subsequent characters for transmission control. |
| 11 | DEVICE CONTROL ONE (DC1) | Activates or initializes a device, often used as X-ON for flow control. |
| 12 | DEVICE CONTROL TWO (DC2) | Triggers device-specific operations or modes. |
| 13 | DEVICE CONTROL THREE (DC3) | Deactivates or halts a device, often used as X-OFF for flow control. |
| 14 | DEVICE CONTROL FOUR (DC4) | Interrupts or stops device operation. |
| 15 | NEGATIVE ACKNOWLEDGE (NAK) | Indicates refusal or error in response to a transmission. |
| 16 | SYNCHRONOUS IDLE (SYN) | Maintains timing synchronization during idle periods in synchronous transmission. |
| 17 | END OF TRANSMISSION BLOCK (ETB) | Marks the end of a logical block within a larger transmission. |
| 18 | CANCEL (CAN) | Aborts the current procedure and ignores preceding data as erroneous. |
| 19 | END OF MEDIUM (EM) | Identifies the physical end of a recording medium or data segment. |
| 1A | SUBSTITUTE (SUB) | Replaces invalid or erroneous characters to prevent processing errors. |
| 1B | ESCAPE (ESC) | Serves as a prefix to introduce extended control sequences for additional functions. |
| 1C | FILE SEPARATOR (FS) | Logically separates higher-level data structures, such as files. |
| 1D | GROUP SEPARATOR (GS) | Delimits subgroups within a larger data hierarchy. |
| 1E | RECORD SEPARATOR (RS) | Separates individual records in a structured dataset. |
| 1F | UNIT SEPARATOR (US) | Divides the smallest units, such as fields, within a record. |
| 7F | DELETE (DEL) | Obliterates or erases data, originally used to punch all holes in tape media for security. |
These functions are categorized into transmission controls (e.g., SOH, ETX, EOT for message delimitation), format effectors (e.g., LF and CR for line navigation, HT for spacing), and information separators (e.g., FS through US for hierarchical data organization). In practice, LF and CR are frequently combined for line endings in text files and displays, while ESC enables invocation of more sophisticated controls beyond the basic C0 set. BEL remains widely used for audible feedback in terminals. Interpretations can vary by system; for instance, BS may delete characters in some printers but only move the cursor in others. The C0 set was formalized in ISO/IEC 646 (International Reference Version, IRV) and ECMA-6, ensuring compatibility across 7-bit codes, with DEL specifically for media erasure to prevent data recovery. This foundational set supports basic operations, serving as an extension point for the optional C1 controls in 8-bit environments.

C1 Control Codes

The C1 control codes constitute the secondary control set defined in ISO/IEC 6429 (harmonized from ECMA-48), occupying positions 80–9F in 8-bit coded character sets. This set extends the basic transmission and formatting capabilities of the C0 controls by introducing functions for structured text manipulation, device coordination, and intermediate sequence initiation, which are essential for advanced applications like video displays and printers. Unlike the C0 set, C1 codes generally require an 8-bit environment for direct transmission; in 7-bit systems, they are emulated via escape sequences consisting of the ESC (1B hex) character followed by a final byte in the range 40–5F hex.

Key functions in the C1 set support enhanced formatting, such as Next Line (NEL, 85 hex), which repositions the active cursor to the first character position on the subsequent line, functioning equivalently to a combined carriage return and line feed in many systems. The Control Sequence Introducer (CSI, 9B hex) enables parameterized commands for precise control, such as cursor movement or attribute setting in terminal emulators, while the String Terminator (ST, 9C hex) marks the end of control strings opened by DCS, SOS, OSC, PM, or APC, preventing ambiguity in data streams. Additional codes facilitate device-specific operations, including Start of Protected Area (SPA, 96 hex) and End of Protected Area (EPA, 97 hex), which designate text regions immune to erasure or overwrite in interactive displays. Codes like Device Control String (DCS, 90 hex) allow for vendor-defined commands, supporting customization in hardware like impact printers. The C1 set reserves several positions for private or national use, such as Private Use One (PU1, 91 hex) and Private Use Two (PU2, 92 hex), permitting implementers to assign non-standard functions through bilateral agreement without conflicting with the core standard.

Overall, these codes emphasize sequential and contextual control, distinguishing them from the immediate-action focus of C0, and their adoption underscores the evolution toward interoperable, feature-rich text processing in international standards. The C1 control codes are detailed in the following table, including hexadecimal values, official names, and concise functional descriptions. Position 84 (IND), which was withdrawn from ISO/IEC 6429, is not listed.
| Hex | Name | Description |
|-----|------|-------------|
| 80 | PAD | Provides padding for time-fill or media synchronization in transmission. |
| 81 | HOP | Presets high-order bits for subsequent code extension techniques. |
| 82 | BPH | Signals a permissible point for line breaking during text formatting. |
| 83 | NBH | Prohibits line breaking at the current position in formatted text. |
| 85 | NEL | Moves the active position to the initial position on the following line. |
| 86 | SSA | Designates the start of a selectable or transmittable text area. |
| 87 | ESA | Designates the end of a selectable or transmittable text area. |
| 88 | HTS | Establishes a horizontal tab stop at the current active position. |
| 89 | HTJ | Advances to the next tab stop and performs character justification. |
| 8A | VTS | Establishes a vertical tab stop at the current active line. |
| 8B | PLD | Shifts the active position forward by a partial line increment for imaging. |
| 8C | PLU | Shifts the active position backward by a partial line increment for imaging. |
| 8D | RI | Moves the active position to the corresponding position on the preceding line. |
| 8E | SS2 | Temporarily invokes the G2 character set for the immediately following graphic character. |
| 8F | SS3 | Temporarily invokes the G3 character set for the immediately following graphic character. |
| 90 | DCS | Introduces a device-specific control string, terminated by ST. |
| 91 | PU1 | Reserved for private, user-defined control functions. |
| 92 | PU2 | Reserved for private, user-defined control functions. |
| 93 | STS | Configures the transmit state for data flow from the device. |
| 94 | CCH | Invalidates the effect of the preceding character in the stream. |
| 95 | MW | Activates a message-waiting indicator on the receiving device. |
| 96 | SPA | Designates the start of a protected or guarded text area. |
| 97 | EPA | Designates the end of a protected or guarded text area. |
| 98 | SOS | Introduces a delimited string for special processing, terminated by ST. |
| 99 | SGCI | Introduces a single graphic character for intermediate processing. |
| 9A | SCI | Introduces a single control function or character. |
| 9B | CSI | Introduces a control sequence, optionally with parameters and intermediates. |
| 9C | ST | Terminates control strings initiated by DCS, OSC, PM, APC, or SOS. |
| 9D | OSC | Introduces an operating system command string, terminated by ST. |
| 9E | PM | Introduces a privacy or user message string, terminated by ST. |
| 9F | APC | Introduces an application program command string, terminated by ST. |
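The pairing of string-introducing codes with ST described in the table can be illustrated with a short Python sketch. It assumes an 8-bit byte stream in which C1 controls appear as single bytes; the function name `extract_control_strings` and its handling of unterminated strings are illustrative choices, not part of any standard.

```python
# C1 bytes that open a control string, taken from the table above.
STRING_OPENERS = {0x90: "DCS", 0x98: "SOS", 0x9D: "OSC", 0x9E: "PM", 0x9F: "APC"}
ST = 0x9C  # String Terminator


def extract_control_strings(data: bytes) -> list[tuple[str, bytes]]:
    """Return (opener, payload) pairs for each ST-terminated control string.

    Hypothetical helper: assumes single-byte (8-bit) C1 controls and stops
    scanning at the first opener that has no matching ST.
    """
    results = []
    i = 0
    while i < len(data):
        kind = STRING_OPENERS.get(data[i])
        if kind is None:
            i += 1          # ordinary byte, keep scanning
            continue
        end = data.find(bytes([ST]), i + 1)
        if end == -1:
            break           # unterminated control string
        results.append((kind, data[i + 1:end]))
        i = end + 1
    return results
```

For example, `extract_control_strings(b"ab\x9d0;title\x9ccd")` yields one OSC string with the payload `b"0;title"`.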

Alternative Control Code Sets

Alternative C0 Sets

In certain specialized systems, alternative C0 control sets deviate from the ISO 646 baseline to accommodate domain-specific requirements, such as enhanced formatting for transmission protocols or display capabilities. For instance, the ANPA-1312 standard, developed for news wire services by the American Newspaper Publishers Association, makes extensive use of C0 characters like SOH (start of heading), STX (start of text), ETX (end of text), and EOT (end of transmission) for markup and segmentation in text transmission, repurposing them beyond general-purpose control while keeping their positions in the 7-bit code unchanged. This approach arose from legacy hardware constraints in teletype systems, prioritizing reliable news dissemination over universal compatibility.

Videotex systems, including the British Prestel service, introduce a variant C0 set tailored for interactive display and mosaic graphics, as defined in the North American Presentation Level Protocol Syntax (PLPS). Here, standard format effectors like BS (backspace) and LF (line feed) are supplemented or replaced by adjacent positioning controls such as APB (0/8, adjacent back), APF (0/9, adjacent forward), APD (0/10, adjacent down), and APU (0/11, adjacent up), which enable precise cursor movement for rendering alphanumeric and block mosaic characters without disrupting screen layout. Additional codes like CS (0/12, clear screen), APR (0/13, adjacent return), and NSR (1/15, new screen request) support page-based navigation and reset functions essential for low-bandwidth terminal interactions. These adaptations stem from the need to optimize 7-bit or 8-bit channels for consumer-grade modems and televisions, with mosaic controls invoking G3 sets via SS3 (1/13) for semigraphic elements like diagonals and lines.

In EBCDIC, IBM's 8-bit encoding for mainframe systems, C0-equivalent controls occupy positions 0x00 to 0x3F but with significant shifts from ASCII/ISO alignments, reflecting punch-card heritage and hardware-specific signaling. For example, NUL remains at X'00', SOH at X'01', and CR at X'0D', but LF is relocated to X'25' (outside the traditional C0 range), while controls like RES (restore, X'14') and NL (new line, X'15') have no direct ASCII counterpart, with NL serving a combined CR+LF function. This mapping supports legacy peripherals like tape drives and printers, where bit patterns prioritize BCD compatibility over international standardization.

The CCITT T.61 recommendation for Teletex and telematic services adheres closely to ISO 646 C0 without positional deviations, defining standard functions like HT (horizontal tabulation), VT (vertical tabulation), and FF (form feed) for document interchange, though invocation via ESC sequences allows context-specific extensions. Such alternatives often emerged from industry needs, including banking protocols requiring custom delimiters or hardware limitations in systems, but later convergence toward ISO standards limited their proliferation. ISO 2022 facilitates transitions by permitting C0 designation via ESC 02/01 F (where F is the final byte identifying the set), yet in most uses the primary C0 set remains invariant, with G0/G1 shifts handling graphic variants instead.

Alternative C1 Sets

Alternative C1 sets emerged in proprietary, sector-specific, and international standards to address specialized requirements beyond the baseline ISO/IEC 6429 C1 controls, often redefining the 0x80–0x9F range for enhanced formatting, graphics, or other functions. In videotex systems, particularly those standardized by ETSI for services like the French Minitel, the C1 set incorporates extensions for visual presentation, including color selection (e.g., 0x90 for foreground color), mosaic graphics rendering (e.g., 0x97 for mosaic block selection), and display adjustments such as size control (e.g., 0x8B followed by parameters for double height or width). These deviate from the standard C1 by prioritizing telematic and interactive display features over general text processing.

IBM's EBCDIC encoding remaps control functions across its 8-bit structure, placing some C1-equivalent operations at different positions (e.g., New Line at 0x15, inside the range that ASCII reserves for C0), while additional C1-like controls support device-specific operations like printer formatting in legacy systems. Notable deviations appear in code pages like Windows-1252, where the 0x80–0x9F range assigns printable graphic characters (e.g., 0x80 as a euro sign, 0x92 as a closing single quote) instead of control functions, repurposing the C1 block for Western European text display in Windows environments. Private C1 sets, registered under ISO 2022 for OSI application layers, allow invocation via escape sequences for domain-specific uses, such as locking shifts in telematic protocols. The CCITT (now ITU-T) Recommendation T.61 for Teletex defines a specialized C1 set focused on document interchange, including codes for page ejection (0x0C shifted), superscript/subscript toggles, and fixed-spacing modes to support international telex-like formatting in early electronic mail and similar systems.

Early proposals during Unicode's development in the late 1980s and early 1990s explored custom C1 sets, such as the bibliographic-oriented DIN 31626 registered in 1979, but the standard ultimately adopted the ISO C1 while reserving flexibility for private-use controls in terminal and legacy integrations. By the 1990s, adoption shifted toward escape sequences (e.g., ECMA-48 CSI sequences) for invoking advanced functions, reducing reliance on raw alternative C1 codes and confining their use to legacy hardware and protocols.
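The difference between a code page that keeps 0x80–0x9F as controls and one that reassigns them to graphic characters can be seen by decoding the same bytes twice. The following Python sketch uses the standard `latin-1` (ISO/IEC 8859-1) and `cp1252` (Windows-1252) codecs; the sample bytes are chosen only for illustration.

```python
import unicodedata

raw = bytes([0x80, 0x92, 0x9B])

# ISO/IEC 8859-1 leaves 0x80-0x9F as C1 controls (U+0080-U+009F).
as_latin1 = raw.decode("latin-1")

# Windows-1252 assigns printable characters to the same byte values.
as_cp1252 = raw.decode("cp1252")

for byte, a, b in zip(raw, as_latin1, as_cp1252):
    print(
        f"{byte:#04x}: latin-1 -> U+{ord(a):04X} ({unicodedata.category(a)}), "
        f"cp1252 -> U+{ord(b):04X} ({unicodedata.name(b)})"
    )
```

Under `latin-1` all three bytes decode to category `Cc` controls, while under `cp1252` they decode to the euro sign, the right single quotation mark, and a single right-pointing angle quotation mark.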

Encoding and Representation

In ASCII and ISO Standards

In the American Standard Code for Information Interchange (ASCII), defined as a 7-bit code in ANSI X3.4-1986 and harmonized with ISO/IEC 646, the C0 control codes occupy positions 0/0 through 1/15, corresponding to hexadecimal values 00 through 1F (7-bit patterns 0000000 to 0011111). The delete character (DEL) is positioned at 7/15 (hexadecimal 7F, bit pattern 1111111). ASCII provides no native positions for C1 control codes, as it is limited to 7 bits; instead, C1 emulation in 7-bit environments uses the escape character (ESC, hexadecimal 1B, bit pattern 0011011) followed by a byte in the range 40 through 5F (bit patterns 1000000 to 1011111). A C1 code (hexadecimal 80–9F) is therefore represented as ESC followed by the C1 value minus hexadecimal 80 plus hexadecimal 40, which preserves the low 5 bits of the C1 code. For example, the control sequence introducer (CSI, 9B) becomes ESC followed by 5B (ESC [).

In 8-bit extensions like the ISO/IEC 8859 series (e.g., ISO/IEC 8859-1), the C0 set remains fixed at positions 00–1F, matching ASCII for compatibility. The C1 set is assigned to positions 80–9F (bit patterns 10000000 through 10011111), enabling direct single-byte transmission in 8-bit environments. These standards reserve columns 0–1 and 8–9 of the 16×16 code table for control functions, placing C0 in the CL area and C1 in the CR area.

Invocation and shifting mechanisms follow ISO/IEC 2022 (ECMA-35), which supports both 7-bit and 8-bit codes. The shift-in (SI, hexadecimal 0F) and shift-out (SO, hexadecimal 0E) controls, both from C0, invoke the G0 and G1 graphic sets respectively into the GL position, allowing 7-bit safe transmission where the eighth bit may serve as parity. For C1 designation, the sequence ESC followed by 02/02 and a final byte F (e.g., ESC 02/02 04/03 for the ISO/IEC 6429 C1 set) identifies and enables the C1 control set.

In 7-bit systems lacking direct C1 support, individual C1 functions are invoked via the ESC Fe sequence, where Fe is a byte from 40–5F hexadecimal, so that each C1 function is transmitted as two 7-bit bytes over parity-aware channels. This approach allows 7-bit devices to process C1 controls without loss, though it doubles the byte count for those functions.
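The arithmetic relating an 8-bit C1 byte to its 7-bit ESC Fe form can be written out as a minimal Python sketch. The function names `c1_to_7bit` and `esc_fe_to_c1` are hypothetical helpers introduced here for illustration.

```python
ESC = 0x1B


def c1_to_7bit(byte: int) -> bytes:
    """Return the two-byte ESC Fe form of a C1 control (0x80-0x9F)."""
    if not 0x80 <= byte <= 0x9F:
        raise ValueError(f"not a C1 control: {byte:#04x}")
    # Subtract 0x80 to drop the eighth bit, then add 0x40 to land in 40-5F.
    return bytes([ESC, byte - 0x80 + 0x40])


def esc_fe_to_c1(seq: bytes) -> int:
    """Return the 8-bit C1 value for a two-byte ESC Fe sequence."""
    if len(seq) != 2 or seq[0] != ESC or not 0x40 <= seq[1] <= 0x5F:
        raise ValueError("not an ESC Fe sequence")
    return seq[1] - 0x40 + 0x80
```

With these definitions, `c1_to_7bit(0x9B)` gives `b"\x1b["` (ESC [), and `esc_fe_to_c1(b"\x1b]")` gives `0x9D` (OSC).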

In Unicode

In Unicode, the C0 control codes are assigned to the code points U+0000 through U+001F and U+007F (DELETE), while the C1 control codes occupy U+0080 through U+009F, all located within the Basic Multilingual Plane (BMP) for compatibility with legacy 7-bit and 8-bit encodings. These 65 code points (33 for C0 including DEL and 32 for C1) are classified under the Unicode category "Cc" (Other, Control), distinguishing them from graphic characters and treating them as non-renderable controls whose semantics are defined by external protocols rather than by Unicode itself. Applications must preserve these codes during interchange to maintain integrity, and Unicode normalization forms leave them unchanged.

Unicode supports the interpretation of these controls in accordance with ISO/IEC 6429:1992, where the escape character (U+001B) initiates control sequences such as Control Sequence Introducer (CSI) for formatting and device control. This ensures compatibility with ISO/IEC 2022 escape sequences, allowing higher-level protocols to define behaviors like cursor movement or screen clearing. Additionally, Unicode's format characters in the General Punctuation block (U+2000 through U+206F), such as the ZERO WIDTH NON-JOINER (U+200C) and INVISIBLE SEPARATOR (U+2063), extend concepts from C1 controls by providing invisible operators for text shaping and layout without affecting visible rendering. Certain C1 code points carry aliases reflecting historical proposals, such as U+0080 (PADDING CHARACTER) and U+0081 (HIGH OCTET PRESET), but Unicode strongly discourages remapping these positions to graphic characters or alternative semantics to preserve interoperability and stability. The C0 controls were included from Unicode 1.0 (1991), with the full C1 set integrated in the same initial release as part of the Latin-1 compatibility block, and no substantive changes have occurred since.

As of Unicode 17.0 (2025), these assignments remain stable, with ongoing policies prohibiting their repurposing as private-use areas. In UTF-8 encoding, C0 controls are represented as single bytes from 00 to 1F and 7F, directly matching their code point values, while C1 controls use two-byte sequences from C2 80 to C2 9F to avoid conflicts with ASCII and ensure lossless round-trip conversion from legacy 8-bit standards. This encoding preserves the controls' positions and behaviors across modern text processing, supporting their use in formats such as XML where they must be escaped or handled explicitly.
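The category, normalization, and UTF-8 properties described above can be checked with Python's standard `unicodedata` module. This is a small sketch; the variable names are illustrative.

```python
import unicodedata

# C0 (U+0000-U+001F), DEL (U+007F), and C1 (U+0080-U+009F).
controls = (
    [chr(cp) for cp in range(0x00, 0x20)]
    + ["\x7f"]
    + [chr(cp) for cp in range(0x80, 0xA0)]
)

count = len(controls)
all_cc = all(unicodedata.category(ch) == "Cc" for ch in controls)

# Normalization leaves control characters untouched.
sample = "a\x85b\x1bc"
nfc_unchanged = unicodedata.normalize("NFC", sample) == sample

# UTF-8: C0 and DEL are one byte each, C1 controls are two bytes (C2 80..C2 9F).
lf_bytes = "\x0a".encode("utf-8")
nel_bytes = "\x85".encode("utf-8")
apc_bytes = "\x9f".encode("utf-8")
```

Here `count` is 65 and `all_cc` is true, while `nel_bytes` is `b"\xc2\x85"`.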

In EBCDIC and Legacy Systems

In EBCDIC, the Extended Binary Coded Decimal Interchange Code primarily used on IBM mainframe systems, the C0 control codes are not assigned to a contiguous block of code points from 0x00 to 0x1F as in ASCII and ISO 646 standards. Instead, they are scattered across the lower range of code points, reflecting EBCDIC's origins in punched-card encodings and early hardware designs. For instance, in IBM code page 1047 (CCSID 1047), a common EBCDIC variant for open systems, the null character (NUL) is at 0x00, carriage return (CR) at 0x0D, and line feed (LF) at 0x25, while the delete character (DEL) does not sit at the ASCII position 0x7F and is instead placed at 0x07.

Extended code pages provide partial support for C1-like control functions, which correspond to the range 0x80–0x9F after conversion, though they do not form a full, contiguous set compatible with ISO/IEC 6429. In IBM 1047, examples include the control sequence introducer (CSI) at 0x3B (mapping to Unicode U+009B) and single character introducer (SCI) at 0x3A (U+009A), which enable escape sequences for terminal control, while other C1 codes like privacy message (PM) appear at 0x3E (U+009E). These mappings allow limited emulation of C1 behaviors through software interpretation, but direct hardware support is absent, leading to incompatibilities in interchange. Conversion tables between EBCDIC and ASCII, such as those used in mainframe utilities, adjust for these differences; for example, EBCDIC LF (0x25) maps to ASCII LF (0x0A), and CR remains at 0x0D in both, though new line sequences may require additional handling, such as converting EBCDIC NL (0x15) to ASCII CR-LF pairs.

Legacy systems beyond standard EBCDIC implementations diverge further in control code usage. Punched-card systems, foundational to EBCDIC's development, employed Hollerith codes (a 12-row, 80-column format influenced by early Baudot-like 5- and 6-bit encodings), where control functions like end-of-record were indicated by specific hole patterns rather than byte values, and compatible hole assignments were later adopted for shared controls such as CR. IBM 3270 terminals, used in mainframe environments, relied on custom orders embedded in the data stream rather than standard C0 or C1 codes; these include start field (SF) at 0x1D for attribute setting and set buffer address (SBA) at 0x11 for cursor positioning, processed by the terminal to manage screen displays. Mainframe utilities like those in z/OS handle these through proprietary protocols, often requiring explicit translation.

EBCDIC lacks native, direct support for the full C1 set, necessitating software-based translations in tools such as iconv for interchange with ASCII or Unicode systems, where escape sequences approximate missing functions. By 2025, EBCDIC control handling primarily occurs through emulation layers, supporting legacy applications while facilitating conversions to modern encodings. New development has shifted away from EBCDIC since the 1990s, reducing reliance on its scattered controls, though it remains entrenched in existing mainframe environments.
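| Control Function | EBCDIC 1047 (Hex) | ASCII Equivalent (Hex) | Notes |
|------------------|-------------------|------------------------|-------|
| NUL | 0x00 | 0x00 | Null terminator |
| CR | 0x0D | 0x0D | Carriage return |
| LF | 0x25 | 0x0A | Line feed; requires mapping |
| ESC | 0x27 | 0x1B | Escape for sequences |
| CSI | 0x3B | 0x9B | C1-like; partial support |
| DEL | 0x07 (partial) | 0x7F | No direct match; often ignored in conversions |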
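The scattered control positions can be inspected with Python's built-in EBCDIC codecs. This sketch uses the `cp037` codec as a stand-in, on the assumption that the standard library does not ship a code page 1047 codec; the control positions shown are those of Python's cp037 table, and the result for 0x07 depends on the conversion table in use (this one maps it to DEL).

```python
# EBCDIC byte -> label, following the rows of the table above plus NL.
EBCDIC_CONTROLS = {
    0x00: "NUL",
    0x0D: "CR",
    0x25: "LF",
    0x27: "ESC",
    0x3B: "CSI",
    0x07: "DEL",
    0x15: "NL",
}


def ebcdic_to_code_point(byte: int, codec: str = "cp037") -> int:
    """Decode one EBCDIC byte and return the resulting Unicode code point."""
    return ord(bytes([byte]).decode(codec))


mapping = {name: ebcdic_to_code_point(b) for b, name in EBCDIC_CONTROLS.items()}

for name, cp in mapping.items():
    print(f"{name}: U+{cp:04X}")
```

Note that NL (0x15) decodes to U+0085 (NEL) rather than to a CR-LF pair; expanding it into two characters is a separate, application-level step.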

Modern Usage and Implications

In Terminals and Display Systems

In terminal emulators, C0 and C1 control codes remain integral for implementing ANSI and VT100-compatible sequences that manage cursor movement, colors, and text attributes. For instance, the C1 Control Sequence Introducer (CSI, ESC [ or single-byte 0x9B) enables commands like ESC [A to move the cursor up one line or ESC [31m to set red foreground color, ensuring compatibility with legacy applications while supporting modern rendering. Modern extensions build on these foundations; the xterm-256color terminfo entry, widely adopted by terminal emulators, leverages C1-derived sequences such as Operating System Command (OSC, ESC ] or 0x9D) to set window titles or manipulate 256-color palettes. Similarly, web-based terminals like xterm.js preserve core C0 functions, including Line Feed (LF, 0x0A) and Carriage Return (CR, 0x0D), to accurately render line breaks and cursor positioning in browser environments.

Network protocols like SSH and Telnet carry raw C0 and C1 codes over connections, allowing remote terminals to interpret them directly for interactive sessions. Some emulators, such as those using the VTE library, support C1 codes like Next Line (NEL, 0x85), which combines line feed with carriage return. As of 2025, Wayland compositors have enhanced support for these codes in native terminals like Foot, integrating them with GPU acceleration for smoother rendering, though raw C1 usage has declined in UTF-8-dominant environments where bytes 0x80–0x9F serve as parts of multi-byte sequences and so cannot stand alone as controls. The ncurses library exemplifies this evolution by generating C0/C1-equivalent escape sequences via terminfo databases, optimizing output for diverse terminals without direct byte manipulation.

Physical hardware terminals are rare today, with C0 and C1 primarily emulated in software-based virtual consoles, such as Linux's /dev/tty devices, which interpret a subset of these codes for text-mode operation on framebuffers.
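The escape sequences mentioned above are plain strings and can be assembled directly. The following Python sketch builds a few of them in their 7-bit forms; the helper names `sgr`, `cursor_up`, and `set_title` are hypothetical, and the `OSC 0 ; text ST` form for setting a window title is the xterm convention, which other emulators may or may not honor.

```python
ESC = "\x1b"
CSI = ESC + "["    # 7-bit form of the C1 Control Sequence Introducer (0x9B)
OSC = ESC + "]"    # 7-bit form of Operating System Command (0x9D)
ST = ESC + "\\"    # 7-bit form of String Terminator (0x9C)


def sgr(*params: int) -> str:
    """Build a Select Graphic Rendition sequence, e.g. sgr(31) for red text."""
    return f"{CSI}{';'.join(str(p) for p in params)}m"


def cursor_up(lines: int = 1) -> str:
    """Build a sequence that moves the cursor up by the given number of lines."""
    return f"{CSI}{lines}A"


def set_title(title: str) -> str:
    """Build an xterm-style OSC 0 sequence that sets the window title."""
    return f"{OSC}0;{title}{ST}"
```

Writing `sgr(31) + "error" + sgr(0)` to a compatible terminal would display the word in red and then reset the attributes. In practice, libraries such as ncurses look these strings up in terminfo rather than hard-coding them.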

In Programming and Data Processing

In contemporary programming languages, C0 and C1 control codes are integral to text manipulation and validation routines. For instance, Python's str.splitlines() method relies on C0 controls such as LF (U+000A) and CR (U+000D) to identify line boundaries, treating the sequence CR LF as a single line break and removing the line-break characters from the output unless keepends=True is passed. Similarly, Java's Character.isISOControl() method identifies characters in the C0 and C1 ranges (U+0000–U+001F and U+007F–U+009F) as ISO control codes, enabling developers to filter or validate input strings. In Rust, the std::char::from_u32() function constructs characters from code points, including those in the C0 and C1 blocks (U+0000–U+009F), which is commonly used for safe parsing of legacy streams.

Specialized libraries extend this handling for terminal and internationalization contexts. The ncurses library, widely used for text-based user interfaces, emits escape sequences (e.g., CSI for cursor control) to manage screen output in Unix-like environments, using functions like tputs() for device-independent rendering. PDCurses, a portable variant, offers the same interface across platforms, ensuring compatibility with Windows consoles by translating operations into API calls. For Unicode-aware applications, the International Components for Unicode (ICU) library provides normalization functions that handle C0 and C1 codes, such as converting variant forms or stripping them during collation, adhering to Unicode Standard Annex #15 for compatibility decomposition.

Data formats impose specific rules for C0 and C1 inclusion to maintain portability. In JSON, control characters U+0000 through U+001F inside strings must be escaped (e.g., as \u000A or \n for LF), as unescaped control characters within strings are invalid per RFC 8259. XML 1.0 documents permit only three C0 controls in character data, HT (U+0009), LF (U+000A), and CR (U+000D); the remaining C0 controls, including NUL (U+0000), are not allowed even as character references. CSV processing, governed by RFC 4180, treats the C0 HT (U+0009) as a common field delimiter in tab-separated variants, while other controls may disrupt parsing if not quoted, prompting libraries like Python's csv module to handle them as literal data.

Common processing techniques involve regex patterns and I/O modes to manage these codes. Python's re module supports substitutions like re.sub(r'[\x00-\x1F\x7F-\x9F]', '', text) to strip C0 and C1 characters (including tabs and line breaks), a common approach for sanitizing user input in web backends. File I/O in binary mode, such as Python's 'rb' or 'wb', preserves all byte values including controls without alteration, in contrast to text mode, which may normalize line endings based on C0 LF/CR. As of 2025, trends in cloud-native development emphasize robust UTF-8 handling of C0 controls; for example, the AWS CLI incorporates LF and CR for interactive prompts in its output streams, leveraging UTF-8 encoding for global compatibility. Meanwhile, web applications increasingly implement proactive sanitization of C1 codes at the framework level, such as in Django, to align with browser security models while supporting legacy data migration.
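The Python behaviors discussed above can be tried out in a few lines. This is a minimal sketch using only the standard library; the sample strings are made up for illustration.

```python
import json
import re

text = "one\r\ntwo\x1cthree\x85four"

# splitlines() treats CR LF as one break and also splits on FS (U+001C)
# and NEL (U+0085); line breaks are dropped unless keepends=True.
lines = text.splitlines()
kept = text.splitlines(keepends=True)

# json.dumps escapes C0 controls inside strings.
encoded = json.dumps("a\nb\x01")

# A raw (unescaped) control character inside a JSON string is rejected.
try:
    json.loads('"a\nb"')
    raw_control_accepted = True
except json.JSONDecodeError:
    raw_control_accepted = False

# Strip every C0, DEL, and C1 character; note this also removes
# tabs and line breaks.
stripped = re.sub(r"[\x00-\x1F\x7F-\x9F]", "", "a\x1b[31mb\x9bc\n")
```

Stripping removes only the control characters themselves: the printable remainder of an escape sequence, such as `[31m`, stays in `stripped`, so this pattern alone does not remove whole ANSI sequences.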

Security and Compatibility Issues

C0 and C1 control codes, particularly ESC (U+001B) and BEL (U+0007) and the escape sequences built from them, have been exploited in injection attacks against terminal emulators, allowing attackers to manipulate display output or execute unintended actions. In 2022, a vulnerability (CVE-2022-45063) in xterm enabled code execution through crafted OSC 50 sequences pasted into zsh with vi-mode line editing, potentially allowing unauthorized actions. More broadly, ANSI injections can lead to denial-of-service (DoS) by overwhelming terminal rendering or altering user interfaces to deceive operators. Raw C0 characters, such as NUL (U+0000), have also been used in DoS attacks by flooding inputs, causing hangs in systems or string-handling functions due to improper neutralization.

Compatibility challenges arise in modern environments where legacy C1 codes (U+0080 to U+009F) are often ignored or stripped by decoders, leading to inconsistent behavior across systems. Web browsers, for instance, typically treat these C1 controls as invalid and remove them during rendering, which can result in unexpected parsing in web-based terminals. In networked transmissions, while single-byte C0 codes are generally unaffected by byte order due to their ASCII origins, multi-byte protocols embedding control sequences may encounter byte-order mismatches, complicating interoperability in heterogeneous environments. Additionally, overlong UTF-8 encodings of C1 characters enable evasion of input validation filters, allowing malicious sequences to bypass security checks and inject control codes into applications. Private C1 codes, intended for vendor-specific extensions, have been misused in malware to embed hidden commands or disrupt parsing, though documented cases remain limited. A notable 2025 example is CVE-2025-61984 in OpenSSH versions before 10.1, where control characters such as newlines in usernames from untrusted sources, expanded into a ProxyCommand, allow command injection and remote code execution.

Post-2020 exploits have increasingly targeted terminals with ANSI sequences for remote code execution across platforms, including mobile devices. In AI contexts, large language models (LLMs) face parsing risks from control codes, as attackers use ANSI escapes to deceive models by hiding malicious instructions in terminal-like outputs, potentially leading to insecure code generation or overlooked vulnerabilities.

To mitigate these risks, developers should sanitize inputs by stripping or escaping control characters like ESC before processing, using libraries that validate UTF-8 strictly to reject overlong forms. Employing Unicode Normalization Form C (NFC) ensures consistent representation of equivalent character sequences, reducing evasion opportunities. Cross-platform testing is essential, particularly for line-ending differences where Windows uses CR+LF (U+000D U+000A) while Unix employs LF alone, to prevent parsing errors in data exchange.
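The mitigation steps above might be sketched in Python as follows. The function names `decode_strict`, `make_visible`, and `normalize_newlines` are hypothetical, and the policy of keeping LF and HT while escaping every other control is one possible choice rather than a recommendation from any standard.

```python
import unicodedata


def decode_strict(data: bytes) -> str:
    """Decode UTF-8, raising UnicodeDecodeError on malformed or overlong input."""
    return data.decode("utf-8", errors="strict")


def make_visible(text: str) -> str:
    """Replace each control character except LF and HT with a \\xNN escape."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cc" and ch not in "\n\t":
            out.append(f"\\x{ord(ch):02x}")   # render harmlessly as text
        else:
            out.append(ch)
    return "".join(out)


def normalize_newlines(text: str) -> str:
    """Convert CR LF and lone CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# E0 82 9B is an overlong three-byte form of U+009B (CSI); Python's
# UTF-8 decoder rejects it rather than producing the control character.
try:
    decode_strict(b"\xe0\x82\x9b")
    overlong_rejected = False
except UnicodeDecodeError:
    overlong_rejected = True
```

Escaping rather than deleting keeps evidence of the injected bytes in logs; for instance, `make_visible("ok\x1b[31m")` returns the literal text `ok\x1b[31m`, which a terminal will print instead of interpreting.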
