from Wikipedia

A newline inserted between the words "Hello" and "world"

A newline (frequently called line ending, end of line (EOL), next line (NEL) or line break) is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, and Unicode. The character, or sequence of characters, signifies the end of a line of text and the start of a new one.[1]

History


In the mid-1800s, long before the advent of teleprinters and teletype machines, Morse code operators (telegraphists) invented and used Morse code prosigns to encode white-space formatting in formal written text messages. In particular, the Morse prosign BT (mnemonic: break text), formed by running together the codes for the characters "B" and "T" without the normal inter-character spacing, was used in Morse code to indicate a new line or new section in a formal text message.

Later, in the age of modern teleprinters, standardized character set control codes were developed to aid in white space text formatting. ASCII was developed simultaneously by the International Organization for Standardization (ISO) and the American Standards Association (ASA), the latter being the predecessor organization to American National Standards Institute (ANSI). During the period of 1963 to 1968, the ISO draft standards supported the use of either CR+LF or LF alone as a newline, while the ASA drafts supported only CR+LF.

The sequence CR+LF was commonly used on many early computer systems that had adopted Teletype machines—typically a Teletype Model 33 ASR—as a console device, because this sequence was required to position those printers at the start of a new line. The separation of newline into two functions concealed the fact that the print head could not return from the far right to the beginning of the next line in time to print the next character. Any character printed after a CR would often print as a smudge in the middle of the page while the print head was still moving the carriage back to the first position. "The solution was to make the newline two characters: CR to move the carriage to column one, and LF to move the paper up."[2] In fact, it was often necessary to send extra padding characters—extraneous CRs or NULs—which are ignored but give the print head time to move to the left margin. Many early video displays also required multiple character times to scroll the display.

On such systems, applications had to talk directly to the Teletype machine and follow its conventions since the concept of device drivers hiding such hardware details from the application was not yet well developed. Therefore, text was routinely composed to satisfy the needs of Teletype machines. Most minicomputer systems from DEC used this convention. CP/M also used it in order to print on the same terminals that minicomputers used. From there MS-DOS (1981) adopted CP/M's CR+LF in order to be compatible, and this convention was inherited by Microsoft's later Windows operating system.

The Multics operating system began development in 1964 and used LF alone as its newline. Multics used a device driver to translate this character to whatever sequence a printer needed (including extra padding characters), and the single byte was more convenient for programming. What seems like a more obvious choice – CR – was not used, as CR provided the useful function of overprinting one line with another to create boldface, underscore and strikethrough effects. Perhaps more importantly, the use of LF alone as a line terminator had already been incorporated into drafts of the eventual ISO/IEC 646 standard. Unix followed the Multics practice, and later Unix-like systems followed Unix. This created conflicts between Windows and Unix-like operating systems, whereby files composed on one operating system could not be properly formatted or interpreted by another operating system (for example a UNIX shell script written in a Windows text editor like Notepad[3][4]).

Representation


The concepts of carriage return (CR) and line feed (LF) are closely associated and can be considered either separately or together. In the physical media of typewriters and printers, two axes of motion, "down" and "across", are needed to create a new line on the page. Although the design of a machine (typewriter or printer) must consider them separately, the abstract logic of software can combine them together as one event. This is why a newline in character encoding can be defined as CR and LF combined into one (commonly called CR+LF or CRLF).

Some character sets provide a separate newline character code. EBCDIC, for example, provides an NL character code in addition to the CR and LF codes. Unicode, in addition to providing the ASCII CR and LF control codes, also provides a "next line" (NEL) control code, as well as control codes for "line separator" and "paragraph separator" markers. Unicode also contains printable characters for visually representing line feed ␊, carriage return ␍, and other C0 control codes (as well as a generic newline, ␤) in the Control Pictures block.

Software applications and operating systems represent a newline with one or two control characters:

  • ASCII LF (hex 0A, dec 10, escape \n): Multics; POSIX standard oriented systems, i.e. Unix and Unix-like systems (Linux, macOS, *BSD, AIX, Xenix, etc.) and QNX 4+; others such as BeOS, Amiga, and RISC OS[5]
  • ASCII CR LF (hex 0D 0A, dec 13 10, escape \r\n): Windows, MS-DOS compatibles, Atari TOS, DEC TOPS-10, RT-11, CP/M, MP/M, OS/2, Symbian OS, Palm OS, Amstrad CPC, and most other early non-Unix and non-IBM operating systems
  • ASCII CR (hex 0D, dec 13, escape \r): Commodore 64, Commodore 128, Acorn BBC, ZX Spectrum, TRS-80, Apple II, Oberon, classic Mac OS, HP Series 80, MIT Lisp Machine, and OS-9
  • ASCII LF CR (hex 0A 0D, dec 10 13, escape \n\r): Acorn BBC[6] and RISC OS spooled text output[7]
  • ASCII RS (hex 1E, dec 30, escape \036): QNX pre-POSIX implementation (version < 4)
  • ATASCII EOL (hex 9B, dec 155): Atari 8-bit computers
  • EBCDIC NL (hex 15, dec 21, escape \025): IBM mainframe systems, including z/OS (OS/390) and IBM i (OS/400)
  • ZX80/ZX81 proprietary encoding NEWLINE (hex 76, dec 118): ZX80 and ZX81 (home computers from Sinclair Research Ltd)
  • EBCDIC systems—mainly IBM mainframe systems, including z/OS (OS/390) and IBM i (OS/400)—use NL (New Line, 0x15)[8] as the character combining the functions of line feed and carriage return. The equivalent Unicode character (0x85) is called NEL (Next Line). EBCDIC also has control characters called CR and LF, but the numerical value of LF (0x25) differs from the one used by ASCII (0x0A). Additionally, some EBCDIC variants also use NL but assign a different numeric code to the character. However, those operating systems use a record-based file system, which stores text files as one record per line. In most file formats, no line terminators are actually stored.
  • Operating systems for the CDC 6000 series defined a newline as two or more zero-valued six-bit characters at the end of a 60-bit word. Some configurations also defined a zero-valued character as a colon character, with the result that multiple colons could be interpreted as a newline depending on position.
  • RSX-11 and OpenVMS also use a record-based file system, which stores text files as one record per line. In most file formats, no line terminators are actually stored, but the Record Management Services facility can transparently add a terminator to each line when it is retrieved by an application. The records themselves can contain the same line terminator characters, which can be considered either a feature or a nuisance depending on the application. To complicate matters further, RMS stores not only the records but also metadata about the record separators in file attribute bits (files can have fixed-length records, records prefixed by a count, or records terminated by a specific character). The attribute bits are not generic: they can specify that CR LF, LF, or even CR is the line terminator, but they cannot designate some other code.
  • Fixed line length was used by some early mainframe operating systems. In such a system, an implicit end-of-line was assumed every 72 or 80 characters, for example. No newline character was stored. If a file was imported from the outside world, lines shorter than the line length had to be padded with spaces, while lines longer than the line length had to be truncated. This mimicked the use of punched cards, on which each line was stored on a separate card, usually with 80 columns on each card, often with sequence numbers in columns 73–80. Many of these systems added a carriage control character to the start of the next record; this could indicate whether the next record was a continuation of the line started by the previous record, or a new line, or should overprint the previous line (similar to a CR). Often this was a normal printing character such as # that thus could not be used as the first character in a line. Some early line printers interpreted these characters directly in the records sent to them.
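The EBCDIC code point difference mentioned above can be observed with Python's cp500 codec, a minimal sketch assuming code page 500 as a representative EBCDIC variant:

```python
# EBCDIC (code page 500) assigns LF a different value than ASCII does.
assert "\n".encode("cp500") == b"\x25"    # EBCDIC LF is 0x25, not ASCII's 0x0A
assert b"\x15".decode("cp500") == "\x85"  # EBCDIC NL (0x15) maps to Unicode NEL (U+0085)
```

Other EBCDIC code pages may assign NL differently, as the text notes.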

Communication protocols


Many communications protocols have some sort of new line convention. In particular, protocols published by the Internet Engineering Task Force (IETF) typically use the ASCII CRLF sequence.

In some older protocols, the new line may be followed by a checksum or parity character.

Unicode


The Unicode standard defines a number of characters that conforming applications should recognize as line terminators:[9]

 LF: Line Feed, U+000A
 VT: Vertical Tab, U+000B
 FF: Form Feed, U+000C
 CR: Carriage Return, U+000D
 CR+LF: CR (U+000D) followed by LF (U+000A)
 NEL: Next Line, U+0085
 LS: Line Separator, U+2028
 PS: Paragraph Separator, U+2029

This may seem overly complicated compared to an approach such as converting all line terminators to a single character (e.g. LF). However, Unicode is designed to preserve all information when a text file is converted from any existing encoding to Unicode and back (round-trip integrity), so Unicode must make the same distinctions between line breaks that other encodings make. For instance, EBCDIC has distinct NL, CR, and LF characters, so all three must also exist in Unicode.

Most newline characters and sequences are in ASCII's C0 controls (i.e. have Unicode code points up to 0x1F). The three newline characters outside of this range—NEL, LS and PS—are often not recognized as newlines by software. For example:

  • JSON recognizes CR and LF as whitespace, but not any other newline characters.[10] C0 controls cannot appear unescaped within strings, but any other line break characters can.[11]
  • ECMAScript only recognizes CR, LF, LS and PS as line terminators.[12] Historically, unescaped line terminators were not permitted in string literals,[13] but this was changed in ES2019 to allow unescaped LS and PS in strings[12] for compatibility with JSON.[14]
  • YAML 1.1 recognized all three as line breaks; YAML 1.2 no longer recognizes them as line breaks in order to be compatible with JSON.[15]
  • Windows Notepad, the default text editor of Microsoft Windows, does not treat any of NEL, LS, or PS as line breaks.
  • gedit, the default text editor of the GNOME desktop environment, treats LS and PS as line breaks, but not NEL.
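Recognition can also vary between APIs within one language. For example, Python's str.splitlines() treats all of the Unicode terminators listed above, including NEL, LS, and PS, as line boundaries:

```python
text = "one\ntwo\r\nthree\x85four\u2028five\u2029six"
# splitlines() treats LF, CR+LF, NEL, LS and PS all as line breaks;
# CR+LF counts as a single break, not two.
assert text.splitlines() == ["one", "two", "three", "four", "five", "six"]
```

By contrast, iterating over a Python text file splits only on the translated \n.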

Unicode also includes characters intended to present a visible symbol to the reader; these are therefore not themselves recognized as newlines:

  • U+23CE RETURN SYMBOL
  • U+240A SYMBOL FOR LINE FEED
  • U+240D SYMBOL FOR CARRIAGE RETURN
  • U+2424 SYMBOL FOR NEWLINE

HTML


In HTML, line breaks are whitespace and are generally[a] treated no differently from spaces.[16] Paragraphs are created as separate instances of the HTML element <p>, with the physical separation between paragraphs controlled by the rendering engine.[17]

Line breaks can be explicitly created with the HTML element <br>. So that screen readers can interpret pages correctly, HTML documentation recommends against using this element to separate paragraphs; sources including MDN Web Docs instead suggest reserving it for content in which the line divisions are meaningful, such as poems.[18]

In programming languages


To facilitate creating portable programs, programming languages provide some abstractions to deal with the different types of newline sequences used in different environments.

The C language provides the escape sequences \n (newline) and \r (carriage return). However, these are not required to be equivalent to the ASCII LF and CR control characters. The C standard only guarantees two traits:

  1. Each of these escape sequences maps to a unique implementation-defined number that can be stored in one char value.
  2. When writing to a file, device node, or socket/fifo in text mode, \n is transparently translated to the native newline sequence used by the system, which may be longer than one character. When reading in text mode, the native newline sequence is translated back to \n. In binary mode, no translation is performed, and the internal representation produced by \n is output directly.

On Unix operating system platforms, where C originated, the native newline sequence is ASCII LF (0x0A), so \n was simply defined to be that value. With the internal and external representation being identical, the translation performed in text mode is a no-op, and Unix has no notion of text mode or binary mode. This has caused many programmers who developed their software on Unix systems simply to ignore the distinction completely, resulting in code that is not portable to different platforms.
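The same text/binary distinction appears in other languages. A Python sketch of the translation described above: write \n in text mode, then read the raw bytes back in binary mode (the file path is a throwaway temporary file):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:        # text mode: \n becomes the native sequence
    f.write("hello\n")
with open(path, "rb") as f:       # binary mode: bytes exactly as stored
    raw = f.read()
# raw is b"hello\r\n" on Windows but b"hello\n" on Unix
assert raw == b"hello" + os.linesep.encode()
```

On Unix the translation is a no-op, which is exactly why the distinction is so easy to overlook there.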

The C standard library function fgets() is best avoided in binary mode because any file not written with the Unix newline convention will be misread. Also, in text mode, any file not written with the system's native newline sequence (such as a file created on a Unix system, then copied to a Windows system) will be misread as well.

Another common problem is the use of \n when communicating using an Internet protocol that mandates the use of ASCII CR+LF for ending lines. Writing \n to a text mode stream works correctly on Windows systems, but produces only LF on Unix, and something completely different on more exotic systems. Using \r\n in binary mode is slightly better.

Many languages, such as C++, Perl,[19] and Haskell provide the same interpretation of \n as C. C++ has an alternative input/output (I/O) model where the manipulator std::endl can be used to output a newline (and flushes the stream buffer).

Java, PHP,[20] and Python[21] provide the \r\n sequence (for ASCII CR+LF). In contrast to C, these escape sequences are guaranteed to represent the values U+000D and U+000A, respectively.

The Java Class Library input/output (I/O) methods do not transparently translate these into platform-dependent newline sequences on input or output. Instead, they provide functions for writing a full line that automatically add the native newline sequence, and functions for reading lines that accept any of CR, LF, or CR+LF as a line terminator (see BufferedReader.readLine()). The System.lineSeparator() method can be used to retrieve the underlying line separator.

Example:

   String eol = System.lineSeparator();
   String lineColor = "Color: Red" + eol;

Python has a "universal newline support" feature enabled by default, which translates all three commonly found line ending conventions (\n, \r, \r\n) into Python's standard \n convention when opening a file for reading, when importing modules, and when executing a file. This feature can be controlled using the newline argument in the open() function when opening the file.[22][23]
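A short sketch of the newline argument's effect, using a temporary file containing mixed line endings:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mixed.txt")
with open(path, "wb") as f:                 # write raw bytes, no translation
    f.write(b"one\r\ntwo\rthree\n")
with open(path, "r") as f:                  # newline=None: universal newlines
    assert f.read() == "one\ntwo\nthree\n"
with open(path, "r", newline="") as f:      # "": endings returned untranslated
    assert f.read() == "one\r\ntwo\rthree\n"
```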

Some languages have created special variables, constants, and subroutines to facilitate newlines during program execution. In some languages such as PHP and Perl, double quotes are required to perform escape substitution for all escape sequences, including \n and \r. In PHP, to avoid portability problems, newline sequences should be issued using the PHP_EOL constant.[24]

Example in C#:

   string eol = Environment.NewLine;
   string lineColor = "Color: Red" + eol;
   
   string eol2 = "\n";
   string lineColor2 = "Color: Blue" + eol2;

Issues with different newline formats

A text file created with gedit and viewed with a hex editor. Besides the text objects, there are only EOL markers with the hexadecimal value 0A.

The different newline conventions cause text files that have been transferred between systems of different types to be displayed incorrectly.

Text files created with programs common on Unix-like systems or classic Mac OS appear as a single long line in most programs common to MS-DOS and Microsoft Windows, because these do not display a lone line feed or a lone carriage return as a line break.

Conversely, when viewing a file originating from a Windows computer on a Unix-like system, the extra CR may be displayed as a second line break, as ^M, or as <cr> at the end of each line.

Furthermore, programs other than text editors may not accept a file, e.g. some configuration file, encoded using the foreign newline convention, as a valid file.

The problem can be hard to spot because some programs handle the foreign newlines properly while others do not. For example, a compiler may fail with obscure syntax errors even though the source file looks correct when displayed on the console or in an editor. Modern text editors generally recognize all flavours of CR+LF newlines and allow users to convert between the different standards. Web browsers are usually also capable of displaying text files and websites which use different types of newlines.

Even if a program supports different newline conventions, these features are often not sufficiently labeled, described, or documented. Typically a menu or combo-box enumerating different newline conventions will be displayed to users without an indication if the selection will re-interpret, temporarily convert, or permanently convert the newlines. Some programs will implicitly convert on open, copy, paste, or save—often inconsistently.

Most textual Internet protocols (including HTTP, SMTP, FTP, IRC, and many others) mandate the use of ASCII CR+LF (\r\n, 0x0D 0x0A) on the protocol level, but recommend that tolerant applications recognize lone LF (\n, 0x0A) as well. Despite the dictated standard, many applications erroneously use the C newline escape sequence \n (LF) instead of the correct combination of carriage return escape and newline escape sequences \r\n (CR+LF) (see section Newline in programming languages above). This accidental use of the wrong escape sequences leads to problems when trying to communicate with systems adhering to the stricter interpretation of the standards instead of the suggested tolerant interpretation. One such intolerant system is the qmail mail transfer agent that actively refuses to accept messages from systems that send bare LF instead of the required CR+LF.[25]
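When protocol messages are built by hand, the CR+LF must therefore be written explicitly rather than relying on \n. A minimal Python sketch using an illustrative HTTP request (the host name is a placeholder, not from the source):

```python
# Join header lines with explicit CRLF, ending with the blank line that
# terminates the header block.
lines = ["GET / HTTP/1.1", "Host: example.com", "Connection: close", ""]
request = "\r\n".join(lines) + "\r\n"

assert request.endswith("\r\n\r\n")             # blank line ends the headers
assert "\n" not in request.replace("\r\n", "")  # no bare LFs remain
```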

The standard Internet Message Format[26] for email states: "CR and LF MUST only occur together as CRLF; they MUST NOT appear independently in the body". Differences between SMTP implementations in how they treat bare LF and/or bare CR characters have led to SMTP spoofing attacks referred to as "SMTP smuggling".[27]

The File Transfer Protocol can automatically convert newlines in files being transferred between systems with different newline representations when the transfer is done in "ASCII mode". However, transferring binary files in this mode usually has disastrous results: any occurrence of the newline byte sequence—which does not have line terminator semantics in this context, but is just part of a normal sequence of bytes—will be translated to whatever newline representation the other system uses, effectively corrupting the file. FTP clients often employ some heuristics (for example, inspection of filename extensions) to automatically select either binary or ASCII mode, but in the end it is up to users to make sure their files are transferred in the correct mode. If there is any doubt as to the correct mode, binary mode should be used, as then no files will be altered by FTP, though they may display incorrectly.[28]

Conversion between newline formats


Text editors are often used for converting a text file between different newline formats; most modern editors can read and write files using at least the different ASCII CR/LF conventions.

For example, the editor Vim can make a file compatible with the Windows Notepad text editor. Within Vim:

 :set fileformat=dos
 :wq

Editors can be unsuitable for converting larger files or bulk conversion of many files. For larger files (on Windows NT) the following command is often used:

D:\>TYPE unix_file | FIND /V "" > dos_file

Special purpose programs to convert files between different newline conventions include unix2dos and dos2unix, mac2unix and unix2mac, mac2dos and dos2mac, and flip.[29] The tr command is available on virtually every Unix-like system and can be used to perform arbitrary replacement operations on single characters. A DOS/Windows text file can be converted to Unix format by simply removing all ASCII CR characters with

$ tr -d '\r' < inputfile > outputfile

or, if the text has only CR newlines, by converting all CR newlines to LF with

$ tr '\r' '\n' < inputfile > outputfile

The same tasks are sometimes performed with awk, sed, or in Perl if the platform has a Perl interpreter:

$ awk '{sub("$","\r\n"); printf("%s",$0);}' inputfile > outputfile  # UNIX to DOS (adds CRs; works without GNU extensions)
$ awk '{gsub("\r",""); print;}' inputfile > outputfile              # DOS to UNIX (removes CRs; works without GNU extensions)
$ sed -e 's/$/\r/' inputfile > outputfile              # UNIX to DOS (adds CRs; requires GNU sed extensions)
$ sed -e 's/\r$//' inputfile > outputfile              # DOS to UNIX (removes CRs; requires GNU sed extensions)
$ perl -pe 's/\r?\n|\r/\r\n/g' inputfile > outputfile  # Convert to DOS
$ perl -pe 's/\r?\n|\r/\n/g'   inputfile > outputfile  # Convert to UNIX
$ perl -pe 's/\r?\n|\r/\r/g'   inputfile > outputfile  # Convert to old Mac
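The same conversions can be written portably in Python by operating on raw bytes, a sketch with hypothetical helper names:

```python
def to_unix(data: bytes) -> bytes:
    # Normalize DOS (CR LF) first, then any remaining bare CRs (old Mac)
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

def to_dos(data: bytes) -> bytes:
    # Normalize to LF first so existing CR LFs are not doubled
    return to_unix(data).replace(b"\n", b"\r\n")

sample = b"one\r\ntwo\rthree\n"   # mixed DOS, old-Mac and Unix endings
assert to_unix(sample) == b"one\ntwo\nthree\n"
assert to_dos(sample) == b"one\r\ntwo\r\nthree\r\n"
```

Reading and writing the files in binary mode ("rb"/"wb") is essential here, so the interpreter's own newline translation does not interfere.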

The file command can identify the type of line endings:

 $ file myfile.txt
 myfile.txt: ASCII English text, with CRLF line terminators

The Unix egrep (extended grep) command can be used to print filenames of Unix or DOS files (assuming Unix and DOS-style files only, no classic Mac OS-style files):

$ egrep -L '\r\n' myfile.txt # show UNIX style file (LF terminated)
$ egrep -l '\r\n' myfile.txt # show DOS style file (CRLF terminated)

Other tools permit the user to visualise the EOL characters:

$ od -a myfile.txt
$ cat -e myfile.txt
$ cat -v myfile.txt
$ hexdump -c myfile.txt
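A rough Python equivalent of such inspection, classifying a file's convention from its raw bytes (a sketch; a file mixing conventions simply reports the first match):

```python
def detect_eol(data: bytes) -> str:
    # Check CRLF first, since CRLF data also contains both CR and LF bytes
    if b"\r\n" in data:
        return "CRLF"   # DOS/Windows
    if b"\r" in data:
        return "CR"     # classic Mac OS
    if b"\n" in data:
        return "LF"     # Unix
    return "none"

assert detect_eol(b"a\r\nb\r\n") == "CRLF"
assert detect_eol(b"a\nb\n") == "LF"
assert detect_eol(b"a\rb\r") == "CR"
```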

Interpretation


There are two self-consistent ways to view newlines: as separators between lines, or as terminators of lines. If a newline is considered a separator, there will be no newline after the last line of a file; some programs have problems processing the last line of a file if it is not terminated by a newline. On the other hand, programs that expect newline to be used as a separator will interpret a final newline as starting a new (empty) line. Conversely, if a newline is considered a terminator, all text lines, including the last, are expected to end with a newline. If the final character sequence in a text file is not a newline, the final line may be considered an improper or incomplete text line, or the file may be considered improperly truncated.
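The two views correspond to different splitting behaviours; a Python illustration:

```python
text = "one\ntwo\n"
# Separator view: a trailing newline implies a final empty line
assert text.split("\n") == ["one", "two", ""]
# Terminator view: a trailing newline simply ends the last line
assert text.splitlines() == ["one", "two"]
```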

In text intended primarily to be read by humans using software which implements the word wrap feature, a newline character typically only needs to be stored if a line break is required independent of whether the next word would fit on the same line, such as between paragraphs and in vertical lists. Therefore, in the logic of word processing and most text editors, newline is used as a paragraph break and is known as a "hard return", in contrast to "soft returns" which are dynamically created to implement word wrapping and are changeable with each display instance. In many applications a separate control character called "manual line break" exists for forcing line breaks inside a single paragraph. The glyph for the control character for a hard return is usually a pilcrow (¶), and for the manual line break is usually a carriage return arrow (↵).

Reverse and partial line feeds


RI (U+008D REVERSE LINE FEED,[30] ISO/IEC 6429 8D, decimal 141) is used to move the printing position back one line (by reverse feeding the paper, or by moving a display cursor up one line) so that other characters may be printed over existing text. This may be done to make them bolder, or to add underlines, strike-throughs or other characters such as diacritics. The reverse line feed was called a line starve – a pun on line feed – in the Hacker's Dictionary.[31]

Similarly, PLD (U+008B PARTIAL LINE FORWARD, decimal 139) and PLU (U+008C PARTIAL LINE BACKWARD, decimal 140) can be used to advance or reverse the text printing position by some fraction of the vertical line spacing (typically, half). These can be used in combination for subscripts (by advancing and then reversing) and superscripts (by reversing and then advancing), and may also be useful for printing diacritics.

from Grokipedia
A newline, also known as a line ending or end-of-line (EOL) marker, is a control character or sequence of control characters in character encoding standards such as ASCII and Unicode that denotes the conclusion of one line of text and the commencement of the next. In the ASCII standard (ANSI X3.4-1968), the primary newline character is the line feed (LF), assigned decimal 10 (0x0A), which functions as a format effector to advance the printing or display position to the next line. The carriage return (CR), decimal 13 (0x0D), separately moves the position to the beginning of the current line, but combinations like CR followed by LF (CRLF) emerged as conventions for complete line termination. Unicode incorporates these via the Newline Function (NLF), which includes LF (U+000A), CR (U+000D), the sequence CRLF, and next line (NEL, U+0085); Unicode also defines line separator (LS, U+2028) and paragraph separator (PS, U+2029) as explicit break characters, with guidelines recommending consistent handling across platforms to avoid interoperability issues. Newline conventions vary by operating system and historical context: Unix-like systems (e.g., Linux, macOS) standardize on LF alone, while Windows employs CRLF for compatibility with its DOS heritage, and older Macintosh systems used CR exclusively until adopting LF with macOS. These differences can lead to challenges in text processing and file transfers, prompting tools and protocols such as RFC 5198 to normalize line endings to CRLF for network interchange. In programming languages, newlines are often represented by escape sequences such as \n for LF, facilitating portable text manipulation.

History

Origins in Typewriters and Teleprinters

The typewriter, a pivotal invention in mechanical writing, was patented on June 23, 1868, by Christopher Latham Sholes, along with Carlos Glidden and Samuel W. Soule, marking the first practical model, known as the "Type-Writer." This device featured a carriage, a movable frame holding the paper, that advanced incrementally as keys were struck, thanks to an escapement mechanism ensuring precise letter spacing. At the end of each line, the typist manually operated a return lever, which moved the carriage back to the left margin, while a separate line feed lever or platen knob advanced the paper upward by one line to prepare for the next row of text. These physical operations, driven by springs and gears, addressed the need for organized linear text production on paper, preventing overlap and maintaining readability without digital aids. The introduction of electric typewriters in the 1930s further refined these mechanisms, automating actions for greater efficiency. IBM's Electromatic model, released in 1935 after IBM acquired the Northeast Electric Company's typewriter business, incorporated an electric motor to power the carriage return and line feed, reducing manual effort and enabling faster operation than purely mechanical predecessors. Earlier attempts at electrification dated back to Thomas Edison's 1872 printing-wheel design, but practical office models emerged only in this decade, with Royal introducing its first electric typewriter in 1950. These innovations preserved the core principles of carriage return (resetting the print position horizontally) and line feed (vertical paper advancement) while enhancing reliability for professional use. Teleprinters, or teletypewriters, emerged in the early 1900s as electromechanical devices for transmitting typed messages over telegraph lines, building directly on typewriter mechanics for remote printing. Émile Baudot's five-bit telegraph code, patented in 1874, enabled efficient character transmission but initially lacked dedicated line control signals.
This changed with Donald Murray's 1901 adaptation of the Baudot code for English-language use, which introduced specific control characters for carriage return (CR) and line feed (LF); these simulated the typewriter's physical actions by signaling the receiving device's carriage to shift left and the platen to advance the paper, respectively. Teletype machines, first commercialized by the Morkrum Company from 1906 onward and later by the Teletype Corporation, standardized the CR+LF sequence to ensure complete line transitions over asynchronous telegraph connections, allowing synchronized printing at both ends. A key feature of teleprinters was the ability to perform overstriking, reprinting on the same line for emphasis or correction, by issuing a CR without a subsequent LF, which returned the print head to the line's start without advancing the paper, thus enabling manipulation of text on a single line before feeding to the next. This capability, rooted in the separate mechanical controls of typewriters, foreshadowed flexible line handling in later communication systems and highlighted the practical need for distinct CR and LF operations in noisy telegraph environments.

Evolution in Early Computing

As early computers emerged, the newline concept transitioned from mechanical teleprinters to digital terminals: repurposed teleprinters served as console and terminal devices for time-sharing systems, allowing multiple users to interact with a single machine over phone lines. Devices like the IBM 026 printing card punch, introduced in 1949, adapted typewriter mechanisms for data entry, incorporating carriage return and line feed operations to print punched cards while advancing the paper feed. Early line printers, such as the IBM 1403, extended this by using carriage control characters in the first column of each line to manage paper advancement, spacing, and form feeds, ensuring efficient output formatting without dedicated newline sequences. The standardization of newline in computing advanced significantly with the development of the American Standard Code for Information Interchange (ASCII) in 1963 by the American Standards Association (ASA) X3 committee. ASCII defined line feed (LF, 0x0A, decimal 10) as a format effector to advance the paper or cursor to the next line, and carriage return (CR, 0x0D, decimal 13) to move the cursor to the line's starting position, drawing from teleprinter conventions to support data transmission and display. These definitions, published as ASA X3.4-1963, provided a common framework for text handling across systems, influencing subsequent protocols and software. In the 1960s and 1970s, operating systems diverged in newline adoption: the TECO text editor, developed in 1962 for DEC systems, treated line breaks as single LF characters internally, automatically appending LF to input carriage returns for buffer storage. Multics, an influential system from the late 1960s, stored text with LF alone but inserted CR before LF during output to terminals or printers for compatibility with teleprinters. In contrast, UNIX, developed in the early 1970s at Bell Labs, standardized on LF-only line endings to enhance storage efficiency by avoiding redundant CR characters, a saving particularly beneficial on limited media like tapes and disks.
The ARPANET, launched in 1969, further emphasized consistent newline handling through its reliance on ASCII control characters in early protocols like the 1822 interface message processor protocol, ensuring reliable text transmission across heterogeneous hosts by standardizing LF and CR for line demarcation in network messages. This approach influenced subsequent standards, promoting interoperability in data exchange.

Technical Representation

ASCII Control Characters

The American Standard Code for Information Interchange (ASCII), formalized in 1963 as ASA X3.4-1963, established a 7-bit character encoding scheme that reserved the code positions from 0 to 31 (and 127) for control characters, which lack visual glyphs and serve to control text layout, transmission, and peripheral devices rather than represent printable symbols. These controls were influenced by earlier codes, such as the International Telegraph Alphabet No. 2 (ITA2) from the 1920s, which introduced non-printing signals for formatting in mechanical systems like Baudot-derived teleprinters. Central to newline operations are the Line Feed (LF) and Carriage Return (CR) control characters. LF, assigned code 10 (hex 0A, octal 012, binary 0001010), instructs devices to advance the active position to the next line, performing a vertical movement without horizontal reset, as originally defined for paper feed mechanisms. CR, with code 13 (hex 0D, octal 015, binary 0001101), returns the active position to the start of the current line, resetting the horizontal position to the left margin while leaving the vertical position unchanged, emulating the mechanical action of a typewriter carriage return. In 7-bit ASCII streams, these appear as non-printable bytes; for example, a sequence ending a line might embed LF as the byte 0x0A in a binary data flow, invisible to direct display but interpreted by parsers to format output. Related control characters include Vertical Tabulation (VT, code 11, hex 0B, octal 013, binary 0001011), which advances the position to the next vertical tab stop for multi-line spacing, and Form Feed (FF, code 12, hex 0C, octal 014, binary 0001100), which ejects the current page and advances to the start of a new one, both supporting vertical progression in early printing and display systems.
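The numeric identities above can be checked directly in Python; a minimal sketch (the dictionary of names is illustrative, not part of any standard library):

```python
# Newline-related C0 control characters and their numeric forms
controls = {"LF": 0x0A, "VT": 0x0B, "FF": 0x0C, "CR": 0x0D}
for name, code in controls.items():
    print(f"{name}: dec {code:2d}, hex {code:02X}, oct {code:03o}, bin {code:07b}")

# A CRLF-terminated line as raw bytes: CR is 0x0D, LF is 0x0A
print("line one\r\n".encode("ascii"))  # b'line one\r\n'
```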
Subsequent extensions to ASCII, such as the ISO 8859 family of standards (e.g., ISO 8859-1 from 1987), preserved the core 7-bit structure unchanged for these control characters in the 0-127 range, ensuring compatibility while assigning codes 128-255 to additional printable symbols in regional variants.

End-of-Line Sequences Across Systems

In computing environments, end-of-line sequences represent the transition to a new line in text data, with variations arising from historical and technical considerations across systems. The most prevalent are the line feed (LF, ASCII 0x0A) used alone in Unix, Linux, and macOS (Mac OS X and later); the carriage return followed by line feed (CR+LF, ASCII 0x0D followed by 0x0A) in Windows and DOS-derived systems; and the carriage return (CR, ASCII 0x0D) alone in classic Mac OS (pre-OS X). These sequences build on ASCII control characters for carriage positioning and paper advancement. The LF-only approach emerged in early Unix implementations for storage efficiency and standardization, as a single character adequately advanced the cursor on line-buffered terminals without needing separate return and feed operations. In contrast, CR+LF originated in teleprinter practice and was adopted by DOS and Windows to ensure compatibility with mechanical teletypes and printers, where CR reset the print head to the line start and LF advanced the paper. Classic Mac OS employed CR-only for its straightforward text rendering model, simplifying file processing on resource-constrained hardware. Less common variants include the Next Line (NEL) control, encoded as 0x85 in ISO-8859-1 and equivalent to EBCDIC's NL (0x15), primarily used in IBM mainframe environments as a combined carriage-return-and-line-feed function.
Sequence            Systems                          Rationale
LF (0x0A)           Unix/Linux/macOS (OS X onward)   Efficiency in file size and terminal handling
CR+LF (0x0D 0x0A)   Windows/DOS                      Compatibility with teletype mechanics
CR (0x0D)           Classic Mac OS (pre-OS X)        Simplicity in text display
NEL (0x85)          IBM mainframes (EBCDIC)          Vertical movement in legacy encodings
Internet protocols standardize these for interoperability; for instance, RFC 4180 (2005) specifies CR+LF as the record delimiter for comma-separated values (CSV) files, allowing the final record to optionally omit a trailing break. The JSON interchange format (ECMA-404, 2013) flexibly recognizes LF, CR, or CR+LF within whitespace to separate tokens, accommodating diverse input sources. In XML documents, the 1.0 specification mandates processor normalization of any CR, LF, or CR+LF to a single LF (#xA) during input parsing for consistent entity handling. Tools like Microsoft Excel, when processing CSV files, generally expect CR+LF for row boundaries to align with Windows conventions, though they may tolerate variations in quoted fields containing embedded breaks.
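Python's csv module follows the RFC 4180 convention out of the box; a small sketch writing to an in-memory buffer rather than a file:

```python
import csv
import io

buf = io.StringIO()
# csv.writer's default lineterminator is "\r\n", matching RFC 4180
writer = csv.writer(buf)
writer.writerow(["name", "value"])
writer.writerow(["alpha", "1"])
print(repr(buf.getvalue()))  # 'name,value\r\nalpha,1\r\n'
```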

Encoding in Unicode

In Unicode, newline functionality is represented through several dedicated control characters and separators, each serving specific roles in text formatting and line progression. The primary characters include Line Feed (LF, U+000A), which advances the cursor to the next line while maintaining the horizontal position; Carriage Return (CR, U+000D), which returns the cursor to the line start; and Next Line (NEL, U+0085), a control from ISO 6429 that combines both effects in some legacy systems. Additionally, Unicode defines Line Separator (LS, U+2028) for breaking lines within paragraphs without implying a new paragraph, and Paragraph Separator (PS, U+2029) for separating entire paragraphs, both aiding in structured text processing. LF and CR were included as part of the basic C0 control set in Unicode 1.0, released in 1991, inherited from ASCII and ISO standards to ensure compatibility with existing text processing. LS and PS were added in Unicode 1.1 in 1993 to provide unambiguous line and paragraph separators for layouts—such as bidirectional text in Arabic and Hebrew, or East Asian typography with vertical writing modes and differing character widths—where visual line breaking differs from Western conventions. Unicode normalization forms, such as Normalization Form C (NFC) and Normalization Form D (NFD), preserve these line break characters without alteration, as they are neither decomposable nor composed with other characters; for instance, LF, CR, LS, and PS remain unchanged during decomposition or composition to maintain text integrity. This stability is crucial for applications involving text transformation, where unintended splitting or merging of lines could disrupt formatting. In multi-byte encodings like UTF-8 and UTF-16, these characters must be treated as atomic units to avoid splitting their byte sequences; for example, in UTF-8, LF encodes as the single byte 0x0A, whereas LS requires the three-byte sequence 0xE2 0x80 0xA8, so decoders must avoid partial reads mid-sequence.
In UTF-16, characters in the Basic Multilingual Plane (BMP) such as LS (U+2028) are encoded directly as a single 16-bit code unit (bytes 0x20 0x28 in big-endian order), whereas characters in higher planes (U+10000 and above) use surrogate pairs, which parsers must handle as indivisible units to preserve line semantics.
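These encodings can be observed directly in Python, whose str.splitlines() also treats LS and PS as line boundaries:

```python
ls = "\u2028"  # LINE SEPARATOR (LS)
print(ls.encode("utf-8"))      # b'\xe2\x80\xa8' — three bytes
print(ls.encode("utf-16-be"))  # b' (' — the single code unit 0x20 0x28

# splitlines() recognizes LF, CR, CRLF, NEL, LS, and PS as line breaks
parts = "first\u2028second\u2029third".splitlines()
print(parts)  # ['first', 'second', 'third']
```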

Usage Contexts

Operating Systems and Text Files

In Unix-like operating systems, including Linux, the native end-of-line sequence for text files is the line feed (LF) character, as standardized by POSIX for portability across systems. Editors such as vi and Emacs handle this by detecting the file's line ending format upon opening and optionally converting to LF for editing; for instance, Vim uses the :set fileformat=unix command to ensure LF consistency, while Emacs employs set-buffer-file-coding-system with a "unix" coding system to normalize endings without altering content encoding. Windows traditionally employs the carriage return plus line feed (CR+LF) sequence in text files, which is the default for applications like Notepad when creating or saving files. In Windows scripting, PowerShell Core (version 6 and later, now PowerShell 7+), first released in preview in 2016, favors LF line endings by default to support cross-platform compatibility, though this can cause issues with Windows tools expecting CRLF, as discussed in ongoing compatibility reports. macOS underwent a significant shift in newline handling with the transition to OS X in 2001, moving from the classic Mac OS's single carriage return (CR) to LF for compliance with POSIX standards and the system's Unix heritage, ensuring seamless integration with Unix-based tools and file systems. Plain text files with a .txt extension exhibit newline variations depending on the originating system or application, leading to potential challenges. Structured formats like JSON and XML demand consistent normalization of line endings to prevent parsing errors; for example, the XML specification requires processors to normalize all line breaks to LF during input parsing, while JSON parsers may fail on unescaped or mismatched endings in multi-line values unless files are pre-normalized to a single convention.
Version control systems like Git, first released in 2005, address these discrepancies by storing text files internally with LF endings regardless of the originating platform, then converting to the system's native format (such as CR+LF on Windows) during checkout via the core.autocrlf configuration setting, which can be set to true for automatic handling or input to enforce LF on commit. A practical example arises with CSV files generated by Microsoft Excel, which enforces CR+LF as the row terminator to align with Windows conventions, often resulting in parsing issues when these files are opened in Unix tools such as csvkit without prior conversion, as the extra CR may be interpreted as embedded data rather than a line separator.
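Python's text mode illustrates the normalization such tools perform: by default (newline=None), universal-newline decoding translates CRLF and lone CR to \n on read. A sketch using an in-memory stream in place of an Excel-produced file:

```python
import io

# CRLF-terminated rows, as Excel writes them
raw = io.BytesIO(b"name,value\r\nalpha,1\r\n")
text = io.TextIOWrapper(raw, encoding="ascii", newline=None)
content = text.read()
print(repr(content))  # 'name,value\nalpha,1\n' — CRs translated away
```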

Programming Languages

In programming languages, newlines are commonly represented using escape sequences within string literals to insert line feed (LF) or carriage return (CR) characters. For instance, the sequence \n denotes LF in languages such as C, Java, and Python, while \r represents CR, and \r\n can be specified explicitly for the combined sequence used on Windows systems. Python implements universal newlines (introduced by PEP 278 and the default in Python 3's text mode), treating \n in file input as a portable representation that automatically handles LF, CR, or CRLF sequences regardless of the platform's native convention. In contrast, Java provides System.lineSeparator(), a method that returns the platform-specific newline string—such as \n on Unix-like systems or \r\n on Windows—to ensure compatibility with operating system text file conventions in input and output operations. Modern languages address variability in line endings through flexible APIs; for example, Rust's std::io::BufRead trait, via its lines() method, recognizes both LF and CRLF as line terminators, stripping the terminator (including the optional CR) without including it in the resulting string, while its read_until method allows splitting on arbitrary byte delimiters. Similarly, Go's bufio.Scanner with the default ScanLines function splits on an optional CR followed by a mandatory LF (matching the regex \r?\n), allowing developers to define custom split functions for other endings. In JSON strings, the escape sequence \n is interpreted as a literal LF character (U+000A), preserving the newline in serialized data across language implementations. Newline handling in SQL varies by database system; for instance, PostgreSQL preserves newlines in string literals and text fields when inserted using escape syntax such as E'\n', though functions such as trim() may strip leading or trailing whitespace including newlines. In regular expressions, languages like Perl treat \n as matching only the LF character by default, requiring modifiers such as /s (dotall) to make the . metacharacter match newlines, or explicit patterns like \r?\n for broader line-ending support. A practical example in C++ is the std::getline function from <string>, which reads input until it encounters the delimiter (by default \n), consumes the delimiter to advance the stream, but excludes it from the output string, helping prevent residual characters in subsequent reads.
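Python's str.splitlines() shows the same tolerant behavior as the Rust and Go APIs above, accepting LF, CR, and CRLF interchangeably and dropping the terminator:

```python
samples = ["unix\nstyle\n", "dos\r\nstyle\r\n", "classic\rmac\r", "mixed\r\nall\rthree\n"]
for s in samples:
    # Each call yields clean, terminator-free fields regardless of convention
    print(s.splitlines())
# The last sample prints ['mixed', 'all', 'three']
```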

Web Technologies and Markup

In HTML, whitespace characters, including newlines, are collapsed into a single space during rendering in normal text flow, preventing multiple spaces or line breaks from affecting layout unless explicitly preserved. The <br> element provides a mechanism for inserting a single line break, equivalent to a newline in visual rendering, and is commonly used to simulate the effect of newlines in non-preformatted content. However, within the <pre> element, all whitespace—including newlines—is preserved exactly as authored, rendering fixed-width text with explicit line breaks. Numeric character entities such as &#10; (representing LF, U+000A) allow authors to embed line feeds directly in markup where needed. CSS extends control over newline handling through the white-space property, where the pre-line value collapses consecutive whitespace sequences but preserves newlines as line breaks, allowing text to wrap while respecting authored line separations. This behavior applies to standard LF characters, enabling dynamic formatting in web layouts. Gaps exist in handling Unicode-specific separators like U+2028 (line separator, LS) and U+2029 (paragraph separator, PS), which are treated as non-collapsible segment breaks in pre-line mode but may not always render consistently across browsers in internationalized content. In XML-based formats such as XHTML and SVG, parsers normalize all line endings—whether CR, LF, or CR+LF—to a single LF (U+000A) before processing, ensuring consistent internal representation regardless of the source file's platform. Similarly, JSON used in web APIs escapes newlines within strings as \n (denoting LF), adhering to the format's strict escaping rules for control characters to maintain parsability across systems. HTTP protocol specifications mandate CR+LF (CRLF) as the line terminator for header fields, separating name-value pairs in requests and responses.
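The XML normalization rule is easy to observe with Python's standard parser (the sample document is contrived):

```python
import xml.etree.ElementTree as ET

# The source mixes CRLF and bare CR; XML 1.0 requires both
# to be normalized to a single LF during parsing.
elem = ET.fromstring("<p>one\r\ntwo\rthree</p>")
print(repr(elem.text))  # 'one\ntwo\nthree'
```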
Markdown, as defined in the CommonMark specification (version 0.31.2, released January 2024), treats single newlines in paragraphs as soft breaks that are ignored for rendering, requiring either two trailing spaces followed by a newline or a blank line to produce a hard line break or paragraph separation. In code blocks, however, raw newlines are preserved literally as LF, maintaining the original formatting for embedded code snippets.

Interpretation and Processing

Software Parsing Behaviors

Software applications and systems interpret newline sequences differently based on their design goals, platform conventions, and standards, which influences how text is stored, processed, and rendered during reading and display. In web browsers, HTML parsing collapses sequences of whitespace characters—including newlines (LF or CR+LF)—into a single space, except within elements like <pre> or when the CSS white-space property is set to pre or pre-wrap. This behavior ensures consistent layout rendering across documents but can obscure original formatting unless preserved explicitly. Terminal emulators, such as xterm, map the LF character to advancing the cursor to the next line while maintaining the horizontal position, and the CR character to moving the cursor to the start of the current line without vertical movement. These mappings align with legacy teletype behaviors and enable precise cursor control in command-line interfaces. Language runtimes often implement flexible line-terminator detection to handle cross-platform compatibility. In Java, the BufferedReader.readLine() method operates in a universal newline mode, recognizing any of \r (CR), \n (LF), or \r\n (CR+LF) as a line terminator and returning the line without the terminator. Similarly, in .NET Framework and .NET Core, the StreamReader.ReadLine() method detects and consumes \r\n, \n, or \r as line endings, normalizing them during text stream processing. Modern development tools address parsing inconsistencies by auto-detecting and managing newline variants. Visual Studio Code, released in 2015, automatically detects line ending types (LF, CRLF, or CR) upon opening files and displays the current format in the status bar, allowing users to configure detection and normalization to prevent display artifacts. Version control systems like Git handle mixed newline sequences in diff and merge operations through settings such as core.autocrlf, which normalize endings during checkout and commit to ensure consistent comparisons across environments.
In POSIX-compliant environments, as defined by IEEE Std 1003.1, a text line consists of zero or more non-newline characters terminated by a newline character (LF), so a CR+LF sequence is parsed as line content ending with a CR character followed by the newline delimiter, potentially causing visible artifacts such as trailing ^M characters in displays unless normalized. Text editors like Emacs support specialized handling—detecting DOS (CRLF) and classic Mac (CR) conventions—to convert foreign newline formats to Unix LF internally for editing while preserving the original convention on save. Email clients encounter parsing variations due to protocol requirements. The MIME standard (RFC 2045) mandates CRLF as the canonical line break for message headers and overall structure, but encapsulated body text may contain platform-specific newlines, resulting in display quirks such as extra blank lines or misaligned content if the client does not normalize during rendering.
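The ^M artifact is easy to reproduce in Python: naive splitting on \n leaves the CR attached, while line-aware APIs consume the full terminator:

```python
line = "hello\r\n"
print(line.split("\n"))           # ['hello\r', ''] — stray CR survives the split
print(line.splitlines())          # ['hello'] — the whole CRLF terminator is consumed
print(repr(line.rstrip("\r\n")))  # 'hello' — explicit normalization of either ending
```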

Format Conversion Methods

Command-line tools provide straightforward methods for converting newline formats, particularly between Unix-style LF and Windows-style CRLF sequences. The dos2unix and unix2dos utilities convert files by removing or adding carriage returns as needed; for instance, dos2unix strips trailing \r characters from lines ending in \r\n, while unix2dos inserts \r before existing \n terminators. These tools are available on most Unix-like systems and process files in batch mode, preserving content while normalizing line endings. Stream editors like sed and awk offer scriptable alternatives for targeted conversions without dedicated binaries. A common sed command to remove carriage returns from DOS-formatted files is sed 's/\r$//', which substitutes any \r at the end of a line with nothing, effectively converting CRLF to LF. Similarly, awk can process and rewrite lines, such as awk '{sub(/\r$/,""); print}' to strip trailing \r before outputting. The tr utility simplifies deletion of carriage returns across an entire file using tr -d '\r' < input > output, which removes all instances of the \r character (ASCII 13) from input and redirects to output. In programming environments, APIs facilitate programmatic newline handling for cross-platform compatibility. Python's os module provides os.linesep, a string representing the native line separator (\r\n on Windows, \n on Unix), which can be used with str.replace() to normalize text; for example, text.replace('\n', os.linesep) converts Unix newlines to the local format before writing to disk. Node.js's fs module, when reading files without an encoding option, returns raw bytes that preserve the original sequences, including mixed newlines; once decoded to a string, they can be unified to LF via text.replace(/\r\n/g, '\n') for processing. Perl supports in-place editing through the $^I variable, set to an extension for backups (e.g., $^I = ".bak"), enabling scripts like perl -i -pe 's/\r\n?/\n/g' to convert CRLF or CR variants directly in the file.
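The shell one-liners above reduce to a few string replacements; a hedged Python equivalent (the function name is illustrative):

```python
def normalize_newlines(text: str, eol: str = "\n") -> str:
    """Convert CRLF and lone CR to LF, then to the requested end-of-line."""
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", eol)

print(repr(normalize_newlines("a\r\nb\rc\n")))         # 'a\nb\nc\n'
print(repr(normalize_newlines("a\nb\n", eol="\r\n")))  # 'a\r\nb\r\n'
```

Replacing CRLF before lone CR matters: doing it in the other order would turn each CRLF into two line breaks.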
Integrated development environments (IDEs) and cloud services address conversion gaps through automation. JetBrains IDEs such as IntelliJ IDEA allow configuration of line separators per file or globally via the Editor > Code Style settings, with options to change existing files' endings (e.g., from CRLF to LF) and apply normalization during saves if tied to code style schemes. In cloud storage like AWS S3, objects are stored as immutable bytes, preserving native newline formats without alteration, but transformations can be applied via AWS Lambda functions or S3 Select queries for on-demand conversion during retrieval or processing. Version control systems like Git incorporate line ending filters to manage conversions in cross-platform repositories. Git's smudge and clean filters, defined in .gitattributes files, process files during checkout (smudge: apply local CRLF) and commit (clean: normalize to LF); for example, a .gitattributes line such as *.txt text enforces LF normalization in the repository while adapting to developer platforms. This approach mitigates compatibility issues by automating transformations at the repository level.

Common Compatibility Issues

One prevalent compatibility issue arises in version control systems like Git, where files with mixed or platform-specific newline sequences—such as CRLF on Windows versus LF on Unix-like systems—can produce misleading commit diffs that appear to show unnecessary changes to entire files. This occurs because Git normalizes line endings during commits based on configuration settings like core.autocrlf, leading developers to inadvertently introduce or propagate spurious modifications across repositories. In email systems, mixed newline sequences can disrupt automatic line wrapping, causing text to render incorrectly in clients that expect uniform CRLF delimiters as per MIME standards, where any occurrence of CRLF must represent a line break and isolated CR or LF usage is prohibited. For instance, a message composed with LF-only lines on a Unix system may result in broken formatting or unintended reflow when viewed in Windows-based email software. Cross-platform deployment exacerbates these problems; for example, shell scripts authored on Windows with CRLF endings often fail on Unix servers because the shebang line (e.g., #!/bin/bash) becomes #!/bin/bash\r, rendering the interpreter path invalid and preventing execution. Similarly, JSON parsers adhering strictly to RFC 8259 may reject or misparse documents using CR-only line endings, as the specification defines insignificant whitespace (including line breaks) between tokens but requires control characters inside strings to be escaped, so a bare CR in string content is treated as an unescaped control character. Post-2020 developments in containerization, particularly with Docker, have introduced practices enforcing LF endings in Linux-based images to mitigate portability issues, as CRLF files mounted from Windows hosts can cause runtime errors in scripts or configurations within the container environment. This standardization helps avoid inconsistencies but highlights ongoing challenges in hybrid development workflows.
Security risks also stem from unnormalized newline inputs; in web forms, failure to sanitize user-supplied values containing CRLF sequences can enable header injection attacks, allowing attackers to append arbitrary HTTP headers and facilitate response splitting or cache poisoning. RFC 2046 recommends CRLF as the standard line break in MIME text parts while acknowledging tolerance for legacy systems using other conventions, yet deviations persist and cause interoperability failures. A notable example is Excel's handling of CSV files, where LF-only endings from Unix sources are often mangled during import, resulting in data appearing in a single row or column misalignment due to improper newline interpretation. Tools like Vim address detection challenges via options such as :e ++ff=dos when editing files, which forces interpretation as DOS (CRLF) format to prevent display artifacts from mismatched endings. Additionally, regular expressions in programming languages may fail if the escape sequence \n (matching LF) is used on CRLF files without accounting for the preceding CR, leading to incomplete matches or errors across platforms. These issues can typically be resolved through format conversion methods that normalize endings to a consistent standard.
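Many of these failures can be caught early by inspecting which terminators a file actually contains; a small diagnostic sketch (the function name is illustrative):

```python
def count_eols(data: bytes) -> dict:
    """Count CRLF, lone LF, and lone CR occurrences in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,  # LFs not preceded by CR
        "cr": data.count(b"\r") - crlf,  # CRs not followed by LF
    }

# A script whose shebang line picked up a CRLF on Windows:
print(count_eols(b"#!/bin/bash\r\necho ok\n"))  # {'crlf': 1, 'lf': 1, 'cr': 0}
```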

Specialized Variants

Reverse Line Feeds

Reverse line feeds, designated as the Reverse Line Feed (RI, sometimes called reverse index) control function in standards like ECMA-48, enable the printing or cursor position to move upward by one line, countering the downward movement of a standard line feed. This capability facilitates overstriking or overprinting, where subsequent characters are printed over previous ones to simulate effects such as bolding (by reprinting the same text) or underlining (by printing characters beneath the original line) on hardware without dedicated formatting features. The process typically involves a carriage return (CR, ASCII 13) to reposition to the line's start, followed by the RI control (code 141, U+008D in the C1 control set) to shift upward, and then outputting the overstrike characters; in some implementations, combinations of backspace (BS, ASCII 8) with line feeds approximate this upward and leftward motion. In historical contexts, reverse line feeds were integral to dot-matrix printers prevalent from the 1970s through the 1990s, where escape sequences like Epson's ESC j n allowed partial reverse feeding (in n/216 inch increments) to align the print head for precise overprinting and emphasis without advanced graphics modes. Early terminals also utilized them within ANSI escape sequences or direct control characters for text formatting in line-oriented interfaces, supporting applications like document preparation where visual enhancements were achieved through mechanical repetition rather than fonts. The Teletype Model 37, released in 1968, incorporated support for reverse line feed alongside half-forward and half-reverse feeds, enabling sophisticated output like charts and emphasized text on friction- or sprocket-fed paper. Modern terminal emulators, including xterm, process these operations via ECMA-48-compliant controls, preserving compatibility for legacy software that relies on RI for formatting. Today, reverse line feeds see limited application primarily in retro computing recreations of vintage systems or emulations of period printers, where they recreate authentic overstrike behaviors.
In digital text, similar effects are often emulated using combining characters, such as U+0332 COMBINING LOW LINE to simulate underlining over existing glyphs without positional reversal. For instance, in early BASIC environments, a carriage return could be issued via PRINT CHR$(13); to reposition without advancing, followed by backspaces with CHR$(8) to enable overprinting on the current line, though a true reverse line feed required the RI character, CHR$(141), on 8-bit systems whose output path honored C1 controls.
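The backspace-based overstriking described above is the same convention nroff-era tools used for terminal bold and underline; a minimal sketch (function names are illustrative):

```python
def overstrike_bold(text: str) -> str:
    # char, backspace, same char: an impact printer strikes each glyph twice
    return "".join(c + "\b" + c for c in text)

def overstrike_underline(text: str) -> str:
    # underscore, backspace, char: the glyph prints on top of an underline
    return "".join("_\b" + c for c in text)

print(repr(overstrike_bold("hi")))       # 'h\x08hi\x08i'
print(repr(overstrike_underline("hi")))  # '_\x08h_\x08i'
```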

Partial Line Feeds

Partial line feeds involve advancing the cursor or print position by a fraction of a standard line height, typically half a line (such as 1/12 inch at 6 lines per inch spacing), to enable precise vertical positioning in printing and display systems. These mechanisms, often implemented via custom control codes or mechanical adjustments, differ from full line feeds by allowing incremental movements without completing a full newline operation. In early teleprinter contexts, partial feeds were achieved through dedicated commands like "Half Line Feed Forward" and "Half Line Feed Reverse," which facilitated vertical motions for enhanced text formatting. A primary application of partial line feeds appears in typewriters and impact printers for creating subscripts and superscripts, where the platen or carriage advances halfway to position smaller characters relative to the baseline. For instance, IBM Wheelwriter typewriters, such as the Model 1000, use a key combination (Code + H) to move the paper one-half line downward for subscript entry, followed by typing and an automatic return to the baseline upon completion. Similarly, in dot-matrix printers supporting ESC/P commands, sequences like ESC j n enable reverse partial feeds of n/216 inch, allowing fine adjustments for subscript rendering in documents. These techniques were essential in pre-digital typesetting to approximate mathematical or scientific notation without dedicated fonts. Partial line feeds were also used in early terminal processing to position output vertically: half-line feed characters were emitted in numbers matched to the height of preceding glyphs for accurate rendering of superscripts and subscripts. Although Unicode's variation selectors provide glyph alternatives, they do not directly control positioning, leaving partial feeds as a legacy solution for fine vertical adjustments.
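The ESC j n sequence mentioned above is just three bytes; a hedged sketch of constructing it for ESC/P printers that support reverse feeds (the helper name is illustrative):

```python
ESC = b"\x1b"

def escp_reverse_feed(n: int) -> bytes:
    """Build ESC j n: reverse-feed the paper by n/216 inch (ESC/P)."""
    if not 0 <= n <= 255:
        raise ValueError("n must fit in one byte")
    return ESC + b"j" + bytes([n])

# Half of a 1/6-inch line is 1/12 inch, i.e. 18/216 inch
half_line_up = escp_reverse_feed(18)
print(half_line_up)  # b'\x1bj\x12'
```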
Despite their utility, partial line feeds lack a widespread digital standard today, remaining largely confined to legacy hardware and printer emulations. In modern web technologies, fractional line effects are simulated via CSS properties like line-height set to values such as 0.5em, which adjusts the height of line boxes without altering newline semantics, though this does not replicate true partial advances. Printer languages like HP PCL 5 include explicit half line-feed controls (e.g., moving the cursor one-half line upward or downward) via escape sequences for compatibility with older workflows. PostScript, while not relying on ESC codes, approximates partial line feeds through relative positioning commands like rmoveto for fine-grained vertical offsets in document composition. Related techniques, such as reverse line feeds, complement partial advances by enabling upward movements for overwriting or alignment corrections.
