Caret notation
Caret notation is a notation for control characters in ASCII. The notation assigns ^A to control-code 1, sequentially through the alphabet to ^Z assigned to control-code 26. For the control-codes outside of the range 1–26, the notation extends to the adjacent, non-alphabetic ASCII characters; for example ^@ is used for control-code 0.
Often a control character can be typed on a keyboard by holding down the Ctrl key and typing the character shown after the caret. The notation is often used to describe keyboard shortcuts even though the control character is not actually used (as in "type ^X to cut the text").
The letters in caret notation do not prescribe the meaning or interpretation of, or the response to, the individual control codes.
Description
The notation consists of a caret (^) followed by a single character (usually a capital letter). That character has the ASCII code equal to the control code with the bit representing 0x40 inverted. As a useful mnemonic, this renders the control codes 1 through 26 as ^A through ^Z. Seven ASCII control characters map outside the upper-case alphabet: 0 (NUL) is ^@, 27 (ESC) is ^[, 28 (FS) is ^\, 29 (GS) is ^], 30 (RS) is ^^, 31 (US) is ^_, and 127 (DEL) is ^?.
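The bit inversion described above can be illustrated with a minimal Python sketch. The function name `caret_notation` is chosen here for illustration and does not come from any particular library.

```python
def caret_notation(code: int) -> str:
    """Return the caret notation for an ASCII control code (0-31 or 127)."""
    if not (0 <= code <= 31 or code == 127):
        raise ValueError(f"{code} is not an ASCII control code")
    # Inverting bit 0x40 maps 0..31 onto '@'..'_' (64..95)
    # and 127 (0x7F) onto '?' (0x3F).
    return "^" + chr(code ^ 0x40)


if __name__ == "__main__":
    for code in (0, 1, 13, 26, 27, 127):
        print(code, caret_notation(code))
```

Because a single XOR covers both the range 0–31 and DEL, no lookup table is needed.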
Examples are "^M^J" for the Windows CR, LF newline pair, and describing the ANSI escape sequence to clear the screen as "^[[3J".
Only the use of characters in the range of 63–95 ("?@ABC...XYZ[\]^_") is specifically allowed in the notation, but use of lower-case alphabetic characters entered at the keyboard is nearly always allowed – they are treated as equivalent to upper-case letters. When converting to a control character, except for '?', masking with 0x1F will produce the same result and also turn lower-case into the same control character as upper-case.
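The masking rule can be sketched in Python as follows. The function name `control_code` and the decision to reject characters outside the allowed set are assumptions of this sketch; real programs differ in how strictly they validate input.

```python
def control_code(char: str) -> int:
    """Return the control code named by the character after the caret."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    if char == "?":
        # DEL is the one case that masking with 0x1F does not cover.
        return 0x7F
    if "@" <= char <= "_" or "a" <= char <= "z":
        # Keeping the low five bits maps '@'..'_' to 0..31 and folds
        # 'a'..'z' (0x61..0x7A) onto the same codes as 'A'..'Z'.
        return ord(char) & 0x1F
    raise ValueError(f"{char!r} is not valid after a caret")


if __name__ == "__main__":
    for char in "@AaMm[_?":
        print(f"^{char} ->", control_code(char))
```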
There is no corresponding version of the caret notation for control-codes with more than 7 bits such as the C1 control characters from 128–159 (0x80–0x9F). Some programs that produce caret notation show these as backslash and octal ("\200" through "\237"). Also see the bar notation used by Acorn Computers, below.
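A display routine of the kind described might look like the following Python sketch. It is not modeled on any specific program: the name `show_controls` is hypothetical, and the choice to render every byte from 128 upward (not only 128–159) as backslash and octal is an assumption made to keep the example short.

```python
def show_controls(data: bytes) -> str:
    """Render bytes as text, making control and high-bit bytes visible."""
    out = []
    for byte in data:
        if byte < 0x20 or byte == 0x7F:
            # 7-bit control codes: caret notation via the 0x40 bit flip.
            out.append("^" + chr(byte ^ 0x40))
        elif byte >= 0x80:
            # No caret form exists here, so fall back to backslash + octal.
            out.append(f"\\{byte:03o}")
        else:
            out.append(chr(byte))
    return "".join(out)


if __name__ == "__main__":
    print(show_controls(b"line one\r\nC1: \x80 \x9f\n"))
```

The output of this sketch is ambiguous for input that already contains a literal caret or backslash, since neither is escaped.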
History
The convention dates back to at least the PDP-6 (1964) from Digital Equipment Corporation (DEC) and DEC's operating system for it. A manual for the PDP-6 describes Control+C as printing ↑C, i.e., a small superscript upwards arrow before the C.[1] In the change from 1961 ASCII to 1968 ASCII, the up arrow became a caret.[2] The PDP-6's successor, the PDP-10, and its operating system used the same convention. Some non-DEC operating systems for PDP-10s, such as TENEX[3] and ITS,[4] adopted the convention as well.
The same convention was used in DEC's operating systems for its PDP-11 minicomputer, such as RT-11,[5] RSTS,[6][7] and RSX-11M.[8]
Earlier versions of Unix did not use the caret convention to display non-printing control characters, although the stty command accepted caret notation when setting the character-erase and line-kill characters.[9] 4BSD added a ctlecho mode in which control characters are echoed using caret notation;[10] this has been adopted by modern Unix-like systems as echoctl.[11][12][13][14][15][16]
Use in software
Caret notation is used to describe control characters in output by many programs, especially on Unix. It appears when characters are echoed as the user types them, and when the contents of files are shown in a text editor or with the more and less commands.
Many terminals and terminal emulators allow the user to enter a control character by holding down Ctrl and typing the caret notation letter. Many control characters (e.g., EOT) otherwise cannot be entered directly from a keyboard. Usually the need to hold down ⇧ Shift is avoided; for instance, lower-case letters work just like upper-case ones. On a US keyboard layout, Ctrl+/ produces DEL and Ctrl+2 produces ^@. It is also common for Ctrl+Space to produce ^@.
This correspondence has affected shortcuts used even in modern software. For instance, it might be tempting to make Ctrl+H mean "Help", but this is the same code as ← Backspace, so other shortcuts for Help were devised.
Alternate notations
The GSTrans string processing API on the operating systems for the Acorn Atom and the BBC Micro, and on RISC OS for the Acorn Archimedes and later machines, uses the vertical bar character | in place of the caret. For example, |M (pronounced "control em", the same as for the ^M notation) is the carriage return character, ASCII 13. || is the vertical bar character (code 124), |? is character 127 as above, and |! adds 128 to the code of the character that follows it, so |!|? is character code 128 + 127 = 255.
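The four rules just described can be sketched as a small Python decoder. This is an illustration only, not a reimplementation of GSTrans: the name `decode_bar` is hypothetical, only the rules stated above are handled, and any other character after a bar is rejected, which the real API may treat differently.

```python
def decode_bar(text: str) -> bytes:
    """Decode the bar notation rules described above into raw bytes."""
    out = bytearray()
    high = 0  # becomes 0x80 while a preceding |! is pending
    i = 0
    while i < len(text):
        ch = text[i]
        i += 1
        if ch == "|":
            if i >= len(text):
                raise ValueError("dangling '|' at end of input")
            nxt = text[i]
            i += 1
            if nxt == "!":
                high = 0x80  # applies to the next decoded character
                continue
            if nxt == "|":
                code = ord("|")  # || is a literal vertical bar (124)
            elif nxt == "?":
                code = 0x7F  # |? is DEL (127)
            elif "@" <= nxt <= "_" or "a" <= nxt <= "z":
                code = ord(nxt) & 0x1F  # |M -> 13, as with ^M
            else:
                raise ValueError(f"unsupported sequence |{nxt}")
        else:
            code = ord(ch)
            if code > 0x7F:
                raise ValueError("only ASCII input is handled in this sketch")
        out.append(code | high)
        high = 0
    if high:
        raise ValueError("dangling '|!' at end of input")
    return bytes(out)


if __name__ == "__main__":
    print(decode_bar("Hello|M|J"))
    print(decode_bar("|!|?"))
```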
See also
- C0 and C1 control codes, which shows the caret notation for all C0 control codes as well as DEL
- Control key
References
1. PDP-6 Timesharing Software (PDF). Digital Equipment Corporation. p. 4.
2. Haynes, Jim (2015-01-13). "First-Hand: Chad is Our Most Important Product: An Engineer's Memory of Teletype Corporation". Engineering and Technology History Wiki (ETHW). Archived from the original on October 31, 2016. Retrieved 2016-10-31.
   "There was the change from 1961 ASCII to 1968 ASCII. Some computer languages used characters in 1961 ASCII such as up arrow and left arrow. These characters disappeared from 1968 ASCII. We worked with Fred Mocking, who by now was in Sales at Teletype, on a type cylinder that would compromise the changing characters so that the meanings of 1961 ASCII were not totally lost. The underscore character was made rather wedge-shaped so it could also serve as a left arrow."
3. TENEX Executive Manual for Users (PDF). Bolt, Beranek, and Newman. April 1973. pp. 6, 17–18.
4. "ITSTTY".
5. RT-11 System Reference Manual (PDF). Digital Equipment Corporation. September 1973. p. 2-3. DEC-11-ORUGA-A-D.
6. PDP-11 Resource Time-Sharing System (RSTS-11) User's Guide BASIC-Plus Programming Language (PDF). Digital Equipment Corporation. September 1973. p. 1-6. PL-11-71-01-01-A-D.
7. RSTS-11 System User's Guide (PDF). Digital Equipment Corporation. July 1975. pp. 3-2 – 3-3. DEC-11-ORSUA-D-D.
8. RSX-11M Operator's Procedures Manual (PDF). Digital Equipment Corporation. 1975. pp. 2-2 – 2-4. DEC-11-OMOGA-B-D.
9. Version 7 Unix Programmer's Manual
10. stty(1) – 4BSD Unix Programmer's Manual
11. Linux User Manual – User Commands from Manned.org
12. Solaris 11.4 User Commands Reference Manual
13. FreeBSD General Commands Manual
14. Darwin and macOS General Commands Manual
15. HP-UX 11i User Commands Manual
16. stty – AIX Commands Manual
Fundamentals
Definition and Purpose
Caret notation is a convention for representing non-printable ASCII control characters using the caret symbol (^) followed by an uppercase letter or specific symbol, corresponding to the 33 control codes in the ASCII standard: values 0 through 31 and 127.[3][4] This notation provides a compact, mnemonic way to denote these characters in textual contexts. For codes 1 through 26, the letter following the caret is the letter at that position in the alphabet (e.g., ^A for code 1).[3]

ASCII control characters are a subset of the character set defined in the American Standard Code for Information Interchange (ASCII), consisting of non-printable codes intended for device control, text formatting, or data transmission rather than visual display.[4] Examples include the line feed (code 10), which advances the cursor to the next line, and the horizontal tab (code 9), which moves the cursor to the next tab stop.[4] These characters are "invisible" in output: they do not produce visible glyphs but instead trigger specific hardware or software behaviors, such as carriage return (code 13), which returns the cursor to the start of the line.[4]

The primary purpose of caret notation is to give a human-readable depiction of these control characters in environments where direct rendering is impossible or impractical, such as plain text files, command-line interfaces, or programming documentation.[3] By converting control codes into printable strings like ^G for the bell character (code 7), it turns low-level byte values into readable text, which helps with troubleshooting and communication.[3] This approach is particularly valuable in software libraries, such as curses implementations that provide the unctrl() function, which generate such representations for display purposes.[3]

Caret notation also improves clarity in technical contexts by avoiding more cumbersome alternatives like decimal or hexadecimal values, allowing developers and users to recognize and reference control sequences in logs, error messages, and manuals without specialized tools.[5] For instance, in debugging terminal output, ^D (code 4) can succinctly indicate an end-of-file signal, which is easier to read than a raw byte value.[3]

Syntax and Mapping
Caret notation represents non-printable ASCII control characters (codes 0–31 and 127) using a caret symbol (^) followed immediately by an uppercase letter from A to Z or a specific symbol, providing a textual way to denote these otherwise invisible characters.[6] For the standard alphabetic mappings, ^X denotes the ASCII control code equal to the position of X in the alphabet, where A is position 1, B is 2, and so on up to Z as 26; thus, ^A corresponds to code 1 (Start of Heading, SOH), ^B to code 2 (Start of Text, STX), and ^Z to code 26 (Substitute, SUB).[7]

Control codes outside the A–Z range use special symbols after the caret: ^@ for code 0 (Null, NUL), ^[ for code 27 (Escape, ESC), ^\ for code 28 (File Separator, FS), ^] for code 29 (Group Separator, GS), ^^ for code 30 (Record Separator, RS), ^_ for code 31 (Unit Separator, US), and ^? for code 127 (Delete, DEL).[6] These mappings cover all 33 ASCII control characters. No notation is defined for the characters in the range 32–126, which are represented directly.[7] The notation is case-insensitive, meaning ^a is equivalent to ^A, though uppercase letters are conventionally used for consistency in documentation and displays.[6]

The following table lists all caret notations with their corresponding ASCII decimal values and standard names:

| Caret | Decimal | Name |
|---|---|---|
| ^@ | 0 | Null (NUL) |
| ^A | 1 | Start of Heading (SOH) |
| ^B | 2 | Start of Text (STX) |
| ^C | 3 | End of Text (ETX) |
| ^D | 4 | End of Transmission (EOT) |
| ^E | 5 | Enquiry (ENQ) |
| ^F | 6 | Acknowledgment (ACK) |
| ^G | 7 | Bell (BEL) |
| ^H | 8 | Backspace (BS) |
| ^I | 9 | Horizontal Tab (HT) |
| ^J | 10 | Line Feed (LF) |
| ^K | 11 | Vertical Tab (VT) |
| ^L | 12 | Form Feed (FF) |
| ^M | 13 | Carriage Return (CR) |
| ^N | 14 | Shift Out (SO) |
| ^O | 15 | Shift In (SI) |
| ^P | 16 | Data Link Escape (DLE) |
| ^Q | 17 | Device Control 1 (DC1) |
| ^R | 18 | Device Control 2 (DC2) |
| ^S | 19 | Device Control 3 (DC3) |
| ^T | 20 | Device Control 4 (DC4) |
| ^U | 21 | Negative Acknowledgment (NAK) |
| ^V | 22 | Synchronous Idle (SYN) |
| ^W | 23 | End of Transmission Block (ETB) |
| ^X | 24 | Cancel (CAN) |
| ^Y | 25 | End of Medium (EM) |
| ^Z | 26 | Substitute (SUB) |
| ^[ | 27 | Escape (ESC) |
| ^\ | 28 | File Separator (FS) |
| ^] | 29 | Group Separator (GS) |
| ^^ | 30 | Record Separator (RS) |
| ^_ | 31 | Unit Separator (US) |
| ^? | 127 | Delete (DEL) |
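
As a consistency check on the table, the following Python sketch builds the letter entries from the alphabet-position rule, compares them with the codes obtained by inverting bit 0x40 of the letter's ASCII code, and then adds the seven symbol entries. The function name `build_table` is hypothetical.

```python
import string


def build_table() -> dict[str, int]:
    """Map each caret notation to its ASCII control code."""
    # Letters: the position in the alphabet (A=1 ... Z=26) is the code.
    by_position = {
        f"^{letter}": pos
        for pos, letter in enumerate(string.ascii_uppercase, start=1)
    }
    # The same codes, obtained by inverting bit 0x40 of the letter's code.
    by_bit_flip = {
        f"^{letter}": ord(letter) ^ 0x40 for letter in string.ascii_uppercase
    }
    if by_position != by_bit_flip:
        raise AssertionError("the two rules disagree")
    # The seven non-alphabetic entries follow the bit-flip rule as well.
    symbols = {f"^{ch}": ord(ch) ^ 0x40 for ch in "@[\\]^_?"}
    return {**by_position, **symbols}


if __name__ == "__main__":
    for notation, code in sorted(build_table().items(), key=lambda kv: kv[1]):
        print(f"| {notation} | {code} |")
```

The resulting dictionary has one entry per table row, covering codes 0 through 31 and 127.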
