Substitute character
from Wikipedia

In computer data, a substitute character (␚) is a control character that is used to pad transmitted data in order to send it in blocks of fixed size, or to stand in place of a character that is recognized to be invalid, erroneous or unrepresentable on a given device. It is also used as an escape sequence in some programming languages.

In the ASCII character set, this character is encoded by the number 26 (1A hex). Standard keyboards transmit this code when the Ctrl and Z keys are pressed simultaneously (Ctrl+Z, often documented by convention as ^Z).[1] Unicode inherits this character from ASCII, but recommends that the replacement character (�, U+FFFD) be used instead to represent un-decodable inputs, when the output encoding is compatible with it.

Uses

End of file

Historically, under PDP-6 monitor,[2] RT-11, VMS, and TOPS-10,[3] and in early PC CP/M 1 and 2 operating systems (and derivatives like MP/M) it was necessary to explicitly mark the end of a file (EOF) because the native filesystem could not record the exact file size by itself; files were allocated in extents (records) of a fixed size, typically leaving some allocated but unused space at the end of each file.[4][5][6][7] This extra space was filled with 1A (hex) characters under CP/M. The extended CP/M filesystems used by CP/M 3 and higher (and derivatives like Concurrent CP/M, Concurrent DOS, and DOS Plus) did support byte-granular files,[8][9] so this was no longer a requirement, but it remained as a convention (especially for text files) in order to ensure backward compatibility.
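
The padding described above can be sketched in a few lines of Python. This is only an illustration: the helper name pad_to_records is hypothetical, and the 128-byte record size is an assumption taken from the usual description of CP/M records, not something a real CP/M API exposes in this form.

```python
RECORD_SIZE = 128   # assumed CP/M record size in bytes
SUB = b"\x1a"       # the substitute character, Ctrl+Z


def pad_to_records(data: bytes, record_size: int = RECORD_SIZE) -> bytes:
    """Pad data with SUB bytes up to the next multiple of record_size.

    Data that already fills its last record exactly is returned unchanged.
    """
    remainder = len(data) % record_size
    if remainder == 0:
        return data
    return data + SUB * (record_size - remainder)


# A 7-byte text file occupies one full record; the rest is SUB filler.
padded = pad_to_records(b"HELLO\r\n")
```

A reader that knows only the record count can recover the text by stopping at the first SUB byte.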

In CP/M, 86-DOS, MS-DOS, PC DOS, DR-DOS, and their various derivatives, the SUB character was also used to indicate the end of a character stream,[citation needed] and thereby used to terminate user input in an interactive command line window (and as such, often used to finish console input redirection, e.g. as instigated by the command COPY CON: TYPEDTXT.TXT).

While no longer technically required to indicate the end of a file, as of 2017, many text editors[which?] and programming languages still support this convention, or can be configured to insert this character at the end of a file when editing, or at least cope with it properly in text files.[citation needed] In such cases, it is often termed a "soft" EOF, as it does not necessarily represent the physical end of the file, but is more a marker indicating that "there is no useful data beyond this point". In reality, more data may exist beyond this character up to the actual end of the data in the file system, so it can be used to hide file content when the file is typed at the console or opened in editors. Some file format standards (e.g. PNG) include the SUB character in their headers to perform precisely this function. Some modern text file formats (e.g. CSV-1203[10]) still recommend a trailing EOF character to be appended as the last character in the file. However, typing Control+Z does not embed an EOF character into a file in either MS-DOS or Windows, nor do the APIs of those systems use the character to denote the actual end of a file.
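
As a small illustration of the header trick, the eight-byte PNG signature carries a SUB byte in its seventh position, so a tool that honours the soft EOF shows only the first six bytes. The helper names below are hypothetical and exist only for this sketch.

```python
# The PNG file signature: 0x89 "PNG" CR LF SUB LF
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def sub_offset(header: bytes) -> int:
    """Return the index of the first SUB byte, or -1 if there is none."""
    return header.find(b"\x1a")


def visible_before_soft_eof(data: bytes) -> bytes:
    """Return what a soft-EOF-aware text viewer would display."""
    index = data.find(b"\x1a")
    return data if index == -1 else data[:index]


offset = sub_offset(PNG_SIGNATURE)
shown = visible_before_soft_eof(PNG_SIGNATURE + b"binary image data")
```

Everything after the signature's SUB byte, including the image data itself, stays hidden from such a viewer.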

Some programming languages (e.g. Visual Basic) will not read past a "soft" EOF when using the built-in text file reading primitives (INPUT, LINE INPUT etc.),[citation needed] and alternate methods must be adopted, e.g. opening the file in binary mode or using the File System Object to progress beyond it.

Character 26 was used to mark "End of file" even though ASCII calls this character Substitute and provides other characters to indicate the end of a file. Character 28, which is called "File Separator", has also been used for similar purposes.

Other uses

In Unix-like operating systems, this character is typically used in shells as a way for the user to suspend the currently executing interactive process.[11] The suspended process can then be resumed in foreground (interactive) mode, or be made to resume execution in background mode, or be terminated. When entered by a user at their computer terminal, the currently running foreground process is sent a "terminal stop" (SIGTSTP) signal, which generally causes the process to suspend its execution. The user can later continue the process execution by using the "foreground" command (fg) or the "background" command (bg).

The Unicode Security Considerations report[12] recommends this character as a safe replacement for unmappable characters during character set conversion.
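
A minimal sketch of this kind of replacement during conversion, using Python's codec error-handler hook. The handler name "sub_replace" is made up for this example; Python's built-in "replace" handler substitutes "?" when encoding, so a custom handler is assumed here.

```python
import codecs

SUB = "\x1a"


def sub_replace(error: UnicodeError):
    """Replace each unmappable character with one SUB and resume after it."""
    if isinstance(error, UnicodeEncodeError):
        count = error.end - error.start
        return (SUB * count, error.end)
    raise error  # decoding errors are out of scope for this sketch


# Register under a hypothetical name so it can be passed as errors=...
codecs.register_error("sub_replace", sub_replace)

encoded = "naïve café".encode("ascii", errors="sub_replace")
```

The output has the same number of characters as the input, so positions in the converted text still line up with the original.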

In many GUIs and applications, Control+Z (⌘ Command+Z on macOS) can be used to undo the last action. In many applications, earlier actions than the last one can also be undone by pressing Control+Z multiple times. Control+Z was one of a handful of keyboard sequences chosen by the program designers at Xerox PARC to control text editing.

Representation

ASCII and Unicode representation of "substitute":

  • Octal code: 32
  • Decimal code: 26
  • Hexadecimal code: 1A, U+001A
  • Mnemonic symbol: SUB
  • Binary value: 11010
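
These representations, and the Ctrl+Z relationship mentioned earlier, can be checked with a short Python sketch. The variable names are illustrative only; the masking with 0x1F reflects the conventional way a terminal derives a control code from a letter key.

```python
SUB = 26

octal = format(SUB, "o")     # base 8
hexadecimal = format(SUB, "X")  # base 16
binary = format(SUB, "b")    # base 2

# Ctrl+<letter> keeps only the low five bits of the letter's code.
ctrl_z = ord("Z") & 0x1F

# Caret notation goes the other way: set bit 6 to get the letter back.
caret = "^" + chr(SUB ^ 0x40)

# The Control Pictures block offers a printable stand-in at U+2400 + code.
picture = chr(0x2400 + SUB)
```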

from Grokipedia
The substitute character (SUB), also known as the control-Z character, is a non-printable control character in the ASCII standard with value 26 (hexadecimal 1A, U+001A), defined as a character that may be substituted for any character determined to be invalid or in error during transmission or processing. Defined under this name in the 1967 revision of ASCII and adopted as the American National Standard Code for Information Interchange (ANSI X3.4-1968), it serves as a transmission control mechanism to signal and replace garbled or erroneous data in communication protocols, allowing a stream to continue rather than halting it entirely.

In practical applications, the substitute character gained prominence in early operating systems like CP/M and MS-DOS, where it functions as an end-of-file (EOF) marker in text files, particularly when files are created or copied in ASCII mode; for instance, the Windows copy command can append Ctrl+Z (SUB) to indicate the file's end in plain text contexts. This dual role (error substitution and file termination) has persisted in legacy systems, though modern environments treat it primarily as a legacy control code within the C0 Controls and Basic Latin block. It has no glyph of its own but may be rendered with a symbol such as ␚. Its history illustrates the evolution of character encoding standards from 7-bit ASCII to broader multilingual support, where SUB remains reserved for compatibility but is rarely invoked in contemporary data handling, which relies on error-detection methods such as checksums and parity bits.

Definition and History

Core Definition

The substitute character, known as SUB, is a control character defined in the American Standard Code for Information Interchange (ASCII) with a decimal value of 26. It functions primarily as a mechanism for replacing characters that are determined to be invalid or erroneous during processing or transmission. In data streams, SUB maintains structural integrity by standing in for unknown, invalid, or unrepresentable characters, preventing disruptions in parsing or rendering without altering the intended flow of data. This substitution role lets systems continue operating when they encounter anomalies such as corrupted bytes or encoding mismatches.

SUB is inherently non-printable: it does not correspond to a visible glyph on output devices, but operates as a signal within control protocols. As part of the broader category of control characters in encoding systems, it belongs to the non-textual instructions that manage device behavior, formatting, and error recovery, distinct from printable alphanumeric content. These control characters, including SUB, form the foundational layer for reliable data interchange in computing environments.

Historical Origins

The substitute character, designated as code 1A in hexadecimal, was first assigned as a control character in the initial American Standard Code for Information Interchange (ASCII), published by the American Standards Association (ASA), predecessor to the American National Standards Institute (ANSI), as X3.4-1963. This early version of ASCII reserved the first 32 positions (codes 00 to 1F) for control characters, including the position that would later be explicitly named the substitute (SUB), initially termed a "generic separator" to handle data formatting in transmission. The inclusion reflected the need for mechanisms to manage erroneous or unrepresentable data in nascent digital systems, drawing on precedents in telegraphy and punched-card technologies, where substitution codes were employed to flag and mitigate corruption during mechanical data transfer.

Influenced by International Telegraph Alphabet No. 2 (ITA2) and related teletype standards from the early 20th century, the SUB concept addressed transmission errors common in wire-based communication, where invalid signals could propagate corruption; punched card systems, prevalent in data processing since Herman Hollerith's 1890 tabulating machines, similarly used filler or error-indicating punches to preserve data integrity during batch handling. In the ASCII standard, SUB's role emphasized reliability in environments like perforated tape readers and early terminals, preventing cascade failures from garbled inputs. This foundational design prioritized interoperability across heterogeneous hardware, a critical concern for the emerging computer industry.

The 1967 revision (USAS X3.4-1967) explicitly redefined the 1A position as SUBSTITUTE, clarifying its function to replace invalid or erroneous characters, often inserted automatically during transmission errors, while adding lowercase letters and refining other controls for broader adoption in teletypewriters and computer terminals. This update, adopted as ANSI X3.4-1968, solidified SUB's integration into practical systems, enhancing error recovery in real-time data exchange.

Concurrently, the European Computer Manufacturers Association (ECMA) incorporated SUB into its ECMA-6 standard in 1965, aligning it closely with the evolving ASCII to promote transatlantic compatibility in computing hardware. International standardization advanced with the adoption of SUB by the International Organization for Standardization (ISO) in ISO Recommendation R 646 (1967), later formalized in ISO 646 (1973), which harmonized 7-bit codes for global use, designating it as a transmission control character to indicate substituted invalid data while allowing national variants in graphic positions. This inclusion gave SUB the same error-handling role across diverse international networks, building on ASCII's framework to support the growing demand for standardized information interchange.

Encoding and Representation

In ASCII Standard

In the American Standard Code for Information Interchange (ASCII), the substitute character is designated SUB, with a decimal value of 26, a hexadecimal value of 1A, and a binary representation of 0011010 (00011010 when stored as an 8-bit byte). This assignment, SUB = 26 (0x1A), is part of the standardized control set, whose values are fixed by the 7-bit encoding scheme defined in the standard. SUB occupies position 26 in the 7-bit ASCII table, within the control block spanning 0–31 (hexadecimal 00–1F), which is reserved for non-printable codes that manage device operations rather than rendering text. These control codes, including SUB, originated from adaptations of earlier signaling systems to support reliable data transmission in computing environments. As a non-printable control character, SUB is used as a replacement for any invalid or erroneous character detected in a data stream during processing or transmission. Its primary property in ASCII is to indicate that such a substitution has occurred, so that errors can be handled without altering the structure of the transmitted information.

In Unicode and Extended Encodings

In Unicode, the substitute character is assigned the code point U+001A, named SUBSTITUTE (also abbreviated as SUB or CONTROL 26), and is classified as a control character within the Basic Latin block (U+0000–U+007F). This placement ensures backward compatibility with earlier standards like ASCII, where it occupies the same position, while integrating it into the broader repertoire for multilingual text processing.

In the Unicode transformation formats, the substitute character keeps its legacy numeric value. In UTF-8, it is encoded as the single byte 0x1A, allowing seamless handling in byte-oriented environments without multi-byte overhead for this low code point. In UTF-16, it is represented as the single 16-bit code unit 0x001A, serialized as the bytes 00 1A in big-endian order or 1A 00 in little-endian order; it falls within the Basic Multilingual Plane and requires no surrogate pairs.

Extended encodings preserve the substitute character at its original position for compatibility. In the ISO-8859 series, including ISO-8859-1 (Latin-1), it maps directly to byte 0x1A, corresponding to U+001A in Unicode. Similarly, Windows-1252, a superset of ISO-8859-1 for Western European languages, retains it at position 26 (0x1A), ensuring consistent interpretation across Windows applications and text files.

Regarding text processing behaviors, the substitute character has a bidirectional class of Boundary Neutral (BN), meaning it acts as a neutral boundary in bidirectional algorithms without influencing the directionality of adjacent characters in mixed left-to-right and right-to-left scripts. In Unicode normalization forms, such as NFC (Normalization Form Canonical Composition) and NFD (Normalization Form Canonical Decomposition), control characters like U+001A remain unchanged, since they have no decomposition and a canonical combining class of 0.
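
The byte sequences and character properties described above can be inspected with Python's standard library. This is a quick check, not a normative reference; the codec names ("utf-16-be", "latin-1", "cp1252") are Python's own labels for these encodings.

```python
import unicodedata

SUB = "\u001a"

# Byte-level representations in several encodings.
utf8 = SUB.encode("utf-8")
utf16_be = SUB.encode("utf-16-be")   # big-endian, no byte order mark
utf16_le = SUB.encode("utf-16-le")   # little-endian, no byte order mark
latin1 = SUB.encode("latin-1")
cp1252 = SUB.encode("cp1252")

# Character properties from the Unicode Character Database.
category = unicodedata.category(SUB)        # general category
bidi_class = unicodedata.bidirectional(SUB)  # bidirectional class
combining = unicodedata.combining(SUB)       # canonical combining class

# Normalization leaves the control character untouched.
nfc = unicodedata.normalize("NFC", SUB)
nfd = unicodedata.normalize("NFD", SUB)
```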

Primary Uses

End-of-File Marker

In CP/M operating systems, the substitute character (SUB, Ctrl+Z, ASCII 0x1A) functioned as an end-of-file (EOF) marker for text files, a convention adopted because the system stored files in fixed 128-byte blocks without explicit length tracking. Programs would append SUB at the logical end of content, signaling readers to halt and ignore any trailing bytes in the final block, which might otherwise contain unrelated or uninitialized data.

Early MS-DOS versions, including MS-DOS 1.0, preserved this CP/M compatibility by treating SUB as an EOF indicator in text file operations, such as input from the console or file commands like COPY. Users could enter Ctrl+Z at the command line to terminate input streams, ensuring clean file endings without extraneous content.

The mechanism operated by having file I/O routines scan for SUB during sequential reads; upon detection, reading ceased, so that potential garbage data beyond the intended content was never read. For instance, in GW-BASIC programming under MS-DOS, the EOF function returned true (-1) when encountering SUB in sequential files or redirected console input, allowing programs to test for file completion reliably.

This approach contrasted with Unix-like systems, where EOF is not marked by an embedded character but signaled interactively via Ctrl+D (EOT, ASCII 0x04), which flushes the input buffer and indicates no further data without altering the file contents.
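
The sequential-read behavior can be approximated as follows. This is a sketch of the convention only, not the actual CP/M or MS-DOS routine, and read_text_until_soft_eof is a hypothetical helper name.

```python
SUB = b"\x1a"


def read_text_until_soft_eof(raw: bytes) -> bytes:
    """Return the bytes before the first SUB; later bytes are ignored."""
    head, _separator, _ignored = raw.partition(SUB)
    return head


# The final block holds the text, a SUB marker, then leftover bytes.
block = b"line one\r\nline two\r\n" + SUB + b"\x00stale data from an old file"
text = read_text_until_soft_eof(block)
```

A file without any SUB byte is returned whole, which matches how size-aware systems read it.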

Character Substitution Mechanism

The substitute character (SUB, ASCII code 26 or 0x1A) serves as a control character specifically intended to replace any character identified as invalid or erroneous during data processing or transmission. According to the American National Standard Code for Information Interchange (ANSI X3.4-1977), SUB is defined as "a control character that may be substituted for a character that is determined to be invalid or in error." This mechanism preserves the structure and layout of the text or data stream, preventing cascading errors or complete processing failure when problematic bytes are encountered. By inserting SUB in place of the offending character, systems can continue handling the input without loss of positional information, which is particularly useful in environments constrained to 7-bit ASCII encoding.

In text editors and parsers operating within ASCII-compatible frameworks, the substitution process typically involves scanning input bytes and replacing with SUB those that fall outside the valid range (e.g., values greater than 127 for non-ASCII content) or fail encoding validation. This preserves the document's visual and logical flow, avoiding the gaps or misalignments that could arise from deleting or ignoring invalid elements. A basic algorithmic outline for this substitution in pseudocode is:

function processCharacter(char):
    if not isValidASCII(char) or isUndefinedInEncoding(char):
        return SUB
    else:
        return char

Such logic is applied sequentially during input processing to flag and substitute errors on the fly, allowing downstream operations like rendering or storage to proceed reliably. As a non-printable control character, SUB does not produce a visible glyph in most text editors, so the document's structure is maintained without visual interruption.
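
A runnable counterpart to the outline, written as a sketch under one simplifying assumption: "invalid" means any byte outside the 7-bit range. The function name substitute_invalid is illustrative.

```python
SUB = 0x1A


def substitute_invalid(data: bytes) -> bytes:
    """Replace every byte above 0x7F with SUB, keeping the length unchanged."""
    return bytes(byte if byte < 0x80 else SUB for byte in data)


# One Latin-1 byte for "é" becomes one SUB.
latin1_result = substitute_invalid("café".encode("latin-1"))

# In UTF-8 the same letter is two bytes, so two SUBs appear.
utf8_result = substitute_invalid("café".encode("utf-8"))
```

Because the substitution is byte for byte, offsets into the output still correspond to offsets in the input, which is the positional property the text describes.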

Additional Applications

Error Detection and Recovery

The substitute character (SUB, ASCII code 26) functions as a transmission control element for error detection and recovery, primarily by marking and replacing invalid or erroneous data during transmission or storage. According to the ASCII standard, SUB is intended to indicate that a received character is garbled or invalid, allowing receiving systems to substitute it automatically and thereby prevent the propagation of corrupted data. This automatic insertion occurs in response to detected anomalies, such as parity errors or decoding failures, so that the stream continues without a complete halt.

In serial communications, including standards like RS-232, SUB plays a role in flagging bit errors identified through parity checks, where the receiver replaces suspect characters to maintain stream integrity. Early systems used this to signal errors in real time, enabling protocols to isolate faulty segments.

A practical example appears in legacy storage media like tape drives, where SUB denotes read errors caused by media degradation or misalignment, triggering fallback mechanisms such as re-reading adjacent blocks or accessing redundant data copies. This application was common in 1960s computing environments, where systems relied on such indicators to improve reliability amid frequent hardware limitations. Reports from the period describe how control characters like SUB contributed to error mitigation, though specific quantitative impacts varied by implementation.
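
The parity-based replacement can be illustrated with a toy receiver. This sketch assumes even parity carried in the eighth bit of each frame; it models the idea only and does not reproduce any particular hardware or protocol. Both function names are hypothetical.

```python
SUB = 0x1A


def add_even_parity(code: int) -> int:
    """Set bit 7 so the 8-bit frame has an even number of 1 bits."""
    data = code & 0x7F
    parity = bin(data).count("1") % 2
    return data | (parity << 7)


def receive(frames: bytes) -> bytes:
    """Strip parity from good frames; replace frames that fail with SUB."""
    output = bytearray()
    for frame in frames:
        if bin(frame).count("1") % 2 == 0:
            output.append(frame & 0x7F)
        else:
            output.append(SUB)
    return bytes(output)


sent = bytes(add_even_parity(code) for code in b"DATA")

# Flip one bit in the second frame to simulate line noise.
corrupted = bytearray(sent)
corrupted[1] ^= 0x04
received = receive(bytes(corrupted))
```

A single flipped bit is caught and replaced, while the other characters arrive in their original positions. Two flipped bits in one frame would pass unnoticed, which is the known limit of simple parity.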

Legacy System Integration

In MS-DOS, the substitute character (SUB, ASCII code 0x1A, equivalent to Ctrl+Z) functioned as the conventional end-of-file (EOF) marker for text files, a practice inherited from earlier systems like CP/M to delimit usable content in environments where file length metadata was unreliable or absent. This allowed applications to process text as if it stopped at SUB, though the DOS file API (e.g., INT 21h AH=3Fh) relies on the recorded file size for reads from files, treating them as binary data; SUB is interpreted as EOF primarily in device I/O (e.g., console) and by applications in text mode. For console input functions such as AH=0Ah (buffered keyboard input), SUB similarly triggered EOF behavior when entered by the user.

On Teletype (TTY) devices, common in early terminals, SUB served a control function for handling transmission errors: it was substituted for garbled or invalid characters to prompt operator intervention. The operator could then correct the problem manually, for example by skipping the faulty code or adjusting the tape reader, preventing propagation of errors in real-time teletypewriter communications over telegraph lines. The character's role aligned with ASCII standards for device control, ensuring reliable operation in multidrop networks where error recovery required human oversight.

IBM System/360 mainframes utilized SUB within the EBCDIC encoding scheme (code point 0x3F) as a substitute for non-representable or erroneous data during operations, particularly in channel-attached peripherals like tape drives or printers.

Transferring files between DOS and Unix systems highlighted compatibility challenges stemming from SUB's role as an EOF marker, which Unix treated as ordinary data rather than a terminator, potentially leading to truncated reads or unexpected file behavior when combined with differing line-ending conventions (CRLF in DOS versus LF in Unix). Tools like dos2unix addressed these by converting line endings and, in extended usage or with additional processing, removing trailing SUB characters to ensure seamless integration and prevent misprocessing of text content across platforms. Such converters were essential for maintaining interoperability in cross-environment workflows prevalent in the pre-1990s era.
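
The cleanup described above can be sketched in Python. This is not how dos2unix itself is implemented; dos_text_to_unix is a hypothetical helper that combines the two steps under the assumption that SUB bytes appear only as trailing markers or padding.

```python
SUB = b"\x1a"


def dos_text_to_unix(raw: bytes) -> bytes:
    """Drop trailing SUB markers, then convert CRLF line endings to LF."""
    without_marker = raw.rstrip(SUB)
    return without_marker.replace(b"\r\n", b"\n")


dos_file = b"first line\r\nsecond line\r\n" + SUB
unix_file = dos_text_to_unix(dos_file)
```

Stripping only from the end leaves any SUB byte inside the text untouched, so content that legitimately contains 0x1A is not silently shortened.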
