from Wikipedia

An e-text (from "electronic text"; sometimes written as etext) is a general term for any document that is read in digital form, and especially a document that consists mainly of text. For example, a computer-based art book with minimal text, or a set of photographs or scans of pages, would not usually be called an "e-text". An e-text may be a binary or a plain text file, viewed with any open-source or proprietary software. An e-text may or may not have markup or other formatting information. It may be an electronic edition of a work originally composed or published in other media, or it may have been created in electronic form originally. The term is usually synonymous with e-book.

E-text origins


E-texts, or electronic documents, have been around since long before the Internet, the Web, and specialized e-book reading hardware. Roberto Busa began developing an electronic edition of Aquinas in the 1940s, while large-scale electronic text editing, hypertext, and online reading platforms such as Augment and FRESS appeared in the 1960s. These early systems made extensive use of formatting, markup, automatic tables of contents, hyperlinks, and other information in their texts, and in some cases (such as FRESS) supported not just text but also graphics.[1]

"Just plain text"


In some communities, "e-text" is used much more narrowly, to refer to electronic documents that are, so to speak, "plain vanilla ASCII". By this is meant not only that the document is a plain text file, but that it has no information beyond "the text itself": no representation of bold or italics, or of paragraph, page, chapter, or footnote boundaries, and so on. Michael S. Hart,[2] for example, argued that this "is the only text mode that is easy on both the eyes and the computer". Hart made the point that proprietary word-processor formats render texts grossly inaccessible, but that objection does not apply to standard, open data formats. The narrow sense of "e-text" is now uncommon, because the notion of "just vanilla ASCII", attractive at first glance, has turned out to have serious difficulties:

First, this narrow type of "e-text" is limited to English letters. Even Spanish ñ, or the accented vowels used in many European languages, cannot be represented (except awkwardly and ambiguously, for example as "~n" or "a'"). Asian, Slavic, Greek, and other writing systems are impossible.

Second, diagrams and pictures cannot be accommodated, and many books have at least some such material; often it is essential to the book.

Third, "e-texts" in this narrow sense have no reliable way to distinguish "the text" from other things that occur in a work. For example, page numbers, page headers, and footnotes might be omitted, or might simply appear as additional lines of text, perhaps with blank lines before and after (or not). An ornate separator line might instead be represented by a line of asterisks (or not). Chapter and section titles, likewise, are just additional lines of text: they might be detectable by capitalization if they were all caps in the original (or not). Even discovering what conventions (if any) were used makes each book a new research or reverse-engineering project.

As a consequence, such texts cannot be reliably re-formatted. A program cannot reliably tell where footnotes, headers, or footers are, or perhaps even where paragraphs begin and end, so it cannot re-arrange the text, for example to fit a narrower screen or to read it aloud for the visually impaired. Programs might apply heuristics to guess at the structure, but such heuristics can easily fail, as the sketch later in this section illustrates.

Fourth, and a perhaps surprisingly important issue, a "plain-text" e-text affords no way to represent information about the work. For example, is it the first or the tenth edition? Who prepared it, and what rights do they reserve or grant to others? Is this the raw version straight off a scanner, or has it been proofread and corrected? Metadata relating to the text is sometimes included with an e-text, but under this definition there is no way to say whether or where it is present. At best, the text of the title page might be included (or not), perhaps with centering imitated by indentation.

Fifth, texts with more complicated information cannot really be handled at all: a bilingual edition, a critical edition with footnotes, commentary, critical apparatus, or cross-references, or even the simplest tables. This leads to endless practical problems: for example, if the computer cannot reliably distinguish footnotes, it cannot find a phrase that a footnote interrupts.

Even raw scanner OCR output usually produces more information than this, such as the use of bold and italics. If this information is not kept, it is expensive and time-consuming to reconstruct; more sophisticated information, such as which edition you have, may not be recoverable at all.

In actuality, even "plain text" uses some kind of "markup", usually control characters, spaces, tabs, and the like: spaces between words, or two line breaks and five spaces to mark a paragraph. The main difference from more formal markup is that "plain texts" use implicit, usually undocumented conventions, which are therefore inconsistent and difficult to recognize.[3]
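The fragility of such implicit conventions is easy to demonstrate in code. The following Python sketch is a hypothetical illustration, assuming one common Gutenberg-style convention (blank lines between paragraphs, all-caps lines for headings); it shows how a program might guess at structure, and how easily the guess fails when a text follows different conventions:

    import re

    def guess_structure(plain_text):
        """Heuristically split a "plain vanilla" text into headings and
        paragraphs, assuming blank-line paragraph breaks and all-caps
        headings. These are undocumented conventions, so the result is
        only a guess."""
        blocks = re.split(r"\n\s*\n", plain_text.strip())
        structure = []
        for block in blocks:
            text = " ".join(block.split())  # re-flow hard-wrapped lines
            if text.isupper() and len(text) < 60:
                structure.append(("heading", text))
            else:
                structure.append(("paragraph", text))
        return structure

    sample = """CHAPTER I

    It was the best of times,
    it was the worst of times.

    * * * * *"""

    for kind, text in guess_structure(sample):
        print(kind, ":", text)
    # The separator line "* * * * *" is misclassified as a paragraph,
    # and a heading that was not typed in all caps would be missed.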

The narrow sense of e-text as "plain vanilla ASCII" has therefore fallen out of favor. Nevertheless, many such texts are freely available on the Web, perhaps as much because they are easy to produce as because of any purported portability advantage. For many years Project Gutenberg strongly favored this model of text, but over time it has begun to develop and distribute more capable formats such as HTML.

from Grokipedia
An e-text, or electronic text, is a digital representation of textual content that can be stored, accessed, and manipulated by computers, encompassing both digitized versions of printed materials and content created natively in digital formats such as e-books, web pages, and interactive documents. Unlike traditional printed texts, e-texts leverage digital encoding for representation, enabling features like hyperlinking, searchability, and multimedia elements that integrate text with images, audio, or video. This form of text has become central to modern information dissemination, scholarship, and entertainment, with global repositories hosting millions of works.

The development of e-texts traces back to the mid-20th century, with pioneering efforts in computing such as Father Roberto Busa's 1940s project to create a machine-generated concordance of Thomas Aquinas's works using early computers, marking one of the first systematic uses of digital tools for textual analysis. In 1971, Michael Hart founded Project Gutenberg, the world's first digital library, which began distributing free e-books by typing literary classics into computers to promote universal access to knowledge. By the 1980s, advancements in microcomputers and software like Micro-OCP facilitated interactive text analysis, expanding e-text applications in academia. The 1990s saw explosive growth with the World Wide Web, introducing hypertext and new digital genres, while the 2000s brought widespread adoption through e-readers and mobile devices.

Key characteristics of e-texts include their hypertextuality, which allows users to navigate non-linearly via hyperlinks, and their adaptability to various devices, which democratizes access but also raises challenges in long-term preservation. In scholarly contexts, e-texts serve as primary sources for literary and historical research, enabling computational methods like text mining and corpus analysis that reveal patterns in language and authorship. Their significance extends to digital preservation, with initiatives like the Dictionary of Old English and Internet Shakespeare Editions digitizing rare materials to ensure long-term availability and foster interdisciplinary studies. As technology evolves, e-texts continue to redefine reading, blending traditional practice with dynamic, user-driven experiences.

Definition and Scope

Definition of E-text

An e-text, short for electronic text, refers to any document that is primarily composed of text and is accessed or read in digital form, such as computer-based versions of books, poetry, or novels. This form of content emphasizes written information encoded in a machine-readable format, distinguishing it from physical printed materials by its electronic storage and display capabilities. The scope of e-texts includes a range of formats centered on text, such as plain text files and markup-enhanced texts that incorporate structural elements like hyperlinks for navigation and cross-referencing. While historically focused on simple encodings like ASCII, modern e-texts often use Unicode for broader language support and can integrate formats supporting multimodal elements such as images or audio where text remains primary. E-texts prioritize textual content over predominantly non-text media like movies or audio files, where text is secondary or absent. E-texts are typically viewed through software readers, which can be open-source or proprietary applications designed for digital devices like computers, e-readers, or smartphones, enabling features such as searching, editing, and portability across platforms. These documents may be encoded in plain text for simplicity or in more complex digital formats, ensuring human readability while supporting computational processing. E-books represent a common subset of e-texts, often tailored for dedicated reading devices but sharing the core emphasis on textual narrative.

Distinction from Other Digital Formats

E-texts are fundamentally text-centric digital documents that prioritize the conveyance of written content in a readable form, distinguishing them from multimedia formats such as interactive videos, audio files, or graphics-heavy applications that integrate non-text elements as primary components. Unlike multimedia digital media, which often rely on embedded visuals, animations, or sound to drive user engagement, e-texts maintain a focus on prose-based information, though they may include integrated enhancements like hyperlinks for non-linear navigation or supporting media such as images, ensuring the core experience remains centered on textual comprehension. This text-oriented design allows e-texts to avoid the resource-intensive demands of multimedia rendering, making them suitable for environments where processing power or bandwidth may be limited. While e-texts frequently serve as the foundational content for e-books, they encompass a broader scope that includes non-commercial materials such as academic papers, public domain works, and scholarly articles not necessarily packaged for consumer markets. E-books, by contrast, typically involve formatted, commercially distributed versions with added features like navigation aids or metadata, building upon e-texts but extending into structured publishing ecosystems. For instance, e-texts from repositories like Project Gutenberg exemplify this generality, providing raw digital transcripts accessible beyond proprietary e-book platforms. A key attribute of e-texts is their emphasis on portability and cross-device compatibility, often achieved through open, non-proprietary formats that render consistently on diverse hardware without conversion. In opposition, many proprietary e-book formats, such as those tied to specific platforms like Amazon's AZW or Apple's iBooks, restrict access to designated devices or software, potentially limiting interoperability and requiring additional conversions for broader use. This design choice in e-texts promotes universal accessibility, enabling seamless viewing on computers, mobile devices, or e-readers while minimizing compatibility barriers.

Historical Development

Early Innovations (1940s-1960s)

The early innovations in electronic text during the 1940s through the 1960s laid the groundwork for digital humanities by transforming literary works into machine-readable formats for analysis, predating widespread digital distribution. In 1949, Italian Jesuit scholar Roberto Busa initiated the Index Thomisticus project, creating the first comprehensive electronic edition of the works of Thomas Aquinas by encoding over 10 million words onto punched cards using tabulating equipment. This effort, which spanned decades and involved manual punching of text by a team of operators, enabled the generation of a massive concordance through electro-mechanical sorting, marking the inception of machine-readable texts for scholarly linguistic and theological analysis. Busa's collaboration with IBM not only adapted punched-card technology for humanities applications but also demonstrated the potential of computational tools to index and retrieve textual data efficiently, though the initial card-punching phase alone took years to complete.

By the 1960s, advancements shifted toward interactive systems that incorporated hypertext concepts, allowing users to navigate and manipulate electronic texts dynamically. Douglas Engelbart and his team at the Stanford Research Institute developed the oN-Line System (NLS), later known as Augment, starting in the early 1960s, which introduced hyperlinked text, diagrams, and collaborative features on early computers. This system, demonstrated publicly in the 1968 "Mother of All Demos," supported formatted text with embedded links and graphics, enabling online reading and annotation in a shared environment. Concurrently, at Brown University, Andries van Dam and his students created the File Retrieval and Editing SyStem (FRESS) in 1968 as a multi-user hypertext platform running on an IBM 360 mainframe, which allowed for the integration of formatted text, hyperlinks, and graphical elements in educational and research contexts. FRESS built on prior experiments like the 1967 Hypertext Editing System (HES), emphasizing editable, linked documents for scholarly use. These pioneering efforts in punched-card encoding and hypertext systems influenced subsequent approaches to electronic text by highlighting the value of structured, searchable digital representations of written works.

Project Gutenberg Era (1970s Onward)

The Project Gutenberg era marked a pivotal shift toward the widespread and free distribution of electronic texts, beginning with the efforts of Michael S. Hart. In 1971, while a student at the University of Illinois at Urbana-Champaign, Hart gained access to a mainframe connected to the ARPANET and decided to create the first e-text by typing the U.S. Declaration of Independence into the system, inspired by a printed copy he received during Independence Day celebrations. This act, distributed electronically via the network to demonstrate the potential of digital texts, aimed to provide unlimited free access to literature and to promote literacy and education. Hart's initiative laid the foundation for Project Gutenberg, emphasizing volunteer-driven production of plain text files from public domain works to ensure broad accessibility. In 2000, the project transitioned to nonprofit status under the Project Gutenberg Literary Archive Foundation.

During the 1970s and into the 1980s, the collection expanded slowly but steadily through Hart's personal efforts and early volunteers, focusing on transcribing classic literature and historical documents. By the 1980s, the project had produced hundreds of e-texts, distributed at first over the ARPANET and later through FTP sites as network infrastructure improved, allowing users to download files from university servers and bulletin boards. This era saw the collection grow from a handful of titles to include works like Lewis Carroll's Alice's Adventures in Wonderland, chosen for their cultural significance and compatibility with limited storage media such as 360K floppy disks. The volunteer model encouraged involvement in transcription, proofreading, and formatting, transitioning from Hart's solo typing to collaborative efforts.

Central to Project Gutenberg's approach was Hart's philosophy of creating "plain vanilla" ASCII texts—simple, unadorned files using the basic 7-bit American Standard Code for Information Interchange—to maximize accessibility, portability, and longevity across diverse hardware and software without reliance on proprietary formats. This minimalist strategy ensured e-texts could be accessed on early computers, avoiding obsolescence and prioritizing content over presentation, which became a hallmark of the project's commitment to universal availability. By the late 2010s, this foundational work had scaled dramatically through distributed networks, resulting in over 60,000 e-texts. As of 2025, the collection includes over 75,000 e-texts.

Plain Text as E-text

Characteristics of Plain Text E-texts

Plain text e-texts consist of unformatted sequences of characters encoded using standards like ASCII or Unicode, representing text as a simple linear stream without any markup languages, typographic emphases such as bold or italics, or embedded multimedia elements. This format prioritizes raw content delivery, where each character is mapped to a unique code point in the chosen encoding scheme, ensuring the text remains a pure, unaltered representation of the original material. A core characteristic of early plain text e-texts is their reliance on 7-bit ASCII encoding for basic English-language content, which utilizes code points from 0 to 127 to cover uppercase and lowercase letters, digits, common punctuation, and control characters. This limited set facilitates universal readability for standard Western scripts while keeping the structure straightforward and devoid of variable-width complexities in early implementations. Additionally, plain text exhibits high portability across diverse operating systems and hardware platforms, as its standardized encoding allows any compliant software to interpret the file without conversion or specialized interpreters. The format also achieves minimal file sizes, often requiring just one byte per character in ASCII, which optimizes storage and transmission efficiency by excluding all overhead from formatting or metadata.

The deliberate adoption of "just plain text" in e-texts underscores a commitment to long-term durability, circumventing dependencies on proprietary software or evolving format specifications that could render content obsolete over time. This principle ensures that e-texts remain accessible indefinitely using basic text editors or viewers, preserving the integrity of the digital archive. Project Gutenberg's early files exemplify this approach by employing plain vanilla ASCII to safeguard literary works against technological obsolescence.
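These properties are easy to verify directly. The following Python snippet, a minimal sketch rather than an excerpt from any particular e-text, demonstrates the 0-127 code point range and the one-byte-per-character economy of ASCII:

    # ASCII maps each character to a code point from 0 to 127,
    # and each code point fits in a single byte.
    text = "Plain vanilla ASCII."
    encoded = text.encode("ascii")

    print(len(text), "characters ->", len(encoded), "bytes")  # 20 -> 20
    print([ord(c) for c in "ASCII"])  # [65, 83, 67, 73, 73]

    # Every code point stays within the 7-bit range.
    assert all(b < 128 for b in encoded)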

Advantages for Early Digital Use

One of the primary advantages of plain text e-texts in the pre-internet era was their exceptional portability, as they could be read on virtually any computing device equipped with a basic text editor, ranging from large mainframes to early personal computers, without requiring specialized software or proprietary viewers. This compatibility stemmed from the simplicity of the format, which relied on standard character encoding to ensure seamless access across diverse hardware and operating systems prevalent in the 1970s and 1980s. Another key benefit was the ease of distribution, facilitated by the format's minimal file sizes and low bandwidth demands, which made sharing over early networks and via the File Transfer Protocol (FTP) practical even on the limited infrastructure of the time. In the 1970s, Project Gutenberg's initial e-texts were disseminated primarily through the ARPANET, while by the 1980s and early 1990s, FTP servers and bulletin board systems enabled broader sharing among academic and hobbyist communities worldwide. This approach allowed for rapid, cost-free proliferation of digital literature, democratizing access in an age when physical book distribution was constrained by costs and logistics. Project Gutenberg's adoption of a "plain vanilla" approach—producing e-texts in unadorned ASCII format—further ensured their long-term usability amid evolving technologies, thereby promoting universal access to literary works without dependency on fleeting hardware or software standards. This strategy not only preserved texts against obsolescence but also encouraged volunteer contributions and global collaboration in digitization efforts throughout the late 20th century.

Evolution Beyond Plain Text

Limitations of Plain Text Formats

Plain text formats for e-texts inherently exclude non-text elements such as images, tables, and structured footnotes, limiting their ability to represent illustrations or complex layouts found in traditional books. This restriction stems from the format's design as a sequence of characters without embedded objects or visual aids, making it unsuitable for works that rely on illustrations or tabular data for comprehension. Additionally, ASCII lacks support for non-Latin characters in its early implementations, complicating the representation of international texts with accents, diacritics, or symbols. The original ASCII standard, developed in 1963, was confined to 7 bits, providing only 128 character combinations primarily for English-language needs and excluding most non-Roman scripts. This limitation persisted until the adoption of Unicode in the 1990s, with the first version published in 1991, which enabled broader multilingual support through encodings like UTF-8.

The absence of formatting in plain text also leads to readability challenges, resulting in a monotonous presentation that hinders engagement with complex works like poetry or scholarly texts. For instance, visual elements such as line breaks, indents, or stanza shapes in poems cannot be preserved, reducing the artistic intent and structural nuances of the original composition. In scholarly contexts, the lack of hierarchical formatting for footnotes or references further obscures navigational aids essential for academic analysis. The shift toward including HTML versions, as seen in Project Gutenberg's practices beginning in the mid-1990s and becoming standard in the early 2000s, highlights the format's constraints for modern e-texts.
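The encoding limitation is easy to observe in practice. In this short illustrative Python sketch, ASCII rejects a character as common as the Spanish ñ, while UTF-8 (one of Unicode's encodings) represents it directly at the cost of a second byte:

    text = "mañana"

    try:
        text.encode("ascii")
    except UnicodeEncodeError as err:
        # ñ (U+00F1) has no place among ASCII's 128 code points.
        print("ASCII cannot encode:", err.object[err.start])

    utf8 = text.encode("utf-8")
    print(utf8)  # b'ma\xc3\xb1ana' -- the ñ occupies two bytes in UTF-8
    print(len(text), "characters,", len(utf8), "bytes")  # 6 characters, 7 bytes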

Adoption of Markup and Structured Formats

The adoption of markup and structured formats marked a significant evolution in e-text production, driven by the need to overcome the rigidity of plain text while preserving its core accessibility. In the mid-1980s, the Standard Generalized Markup Language (SGML), formalized as ISO 8879 in 1986, emerged as a foundational meta-language for defining document structures using descriptive tags, enabling the separation of content from presentation in digital texts. SGML served as the precursor to subsequent formats, allowing for the encoding of complex hierarchies, metadata, and semantic elements essential for scholarly and archival e-texts. Building on SGML, the Text Encoding Initiative (TEI) arose in 1987 as an international effort to standardize markup for electronic texts, addressing the fragmentation of early digital encoding practices through hardware- and software-independent guidelines. The TEI's first draft guidelines were released in 1990, with the initial full version (P3) published in 1994, emphasizing structured encoding of linguistic features, annotations, and variants while maintaining text as the primary medium. Concurrently, the development of HTML in the early 1990s, directly derived from SGML, facilitated the integration of hyperlinks and basic formatting into e-texts, broadening their utility for web-based distribution. By the mid-1990s, Project Gutenberg began incorporating HTML into its e-books to enhance readability and visual structure, transitioning from exclusive reliance on plain ASCII to support hypertext elements without external dependencies.

The late 1990s saw the rise of Extensible Markup Language (XML), a simplified subset of SGML released in 1998 by the World Wide Web Consortium (W3C), which further enabled customizable schemas for e-text metadata, navigation, and interoperability. This paved the way for specialized e-text standards, such as the Open eBook Publication Structure (OEB) introduced in 1999 by the Open eBook Forum, which utilized HTML 4.0 and CSS for structured, reflowable digital books with embedded metadata. Evolving from OEB, the EPUB format, standardized in 2007 by the International Digital Publishing Forum, combined XHTML 1.1 with CSS 2.1 to create accessible, device-agnostic e-books that retained the text-centric focus while supporting advanced styling and multimedia integration. These formats collectively transformed e-texts from static files into dynamic, searchable resources, influencing digital publishing by prioritizing semantic markup for long-term preservation and usability. Subsequent advancements continued with EPUB 3.0 in 2011, which introduced support for HTML5, MathML, audio, video, and enhanced accessibility features, allowing for more interactive and multimedia-rich e-texts. The International Digital Publishing Forum merged with the W3C in 2017, placing EPUB under W3C standards. As of November 2023, EPUB 3.3 remains the latest version, incorporating improvements in packaging, accessibility, and support for complex layouts while maintaining backward compatibility.
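The practical benefit of explicit markup over the implicit conventions of plain text can be shown in a few lines of Python. The markup below is a simplified, TEI-flavored fragment (an illustration, not a complete conforming TEI document); with explicit tags, software can locate headings and footnotes unambiguously instead of guessing from capitalization or blank lines:

    import xml.etree.ElementTree as ET

    # Explicit tags mark the heading, paragraph, and footnote; no
    # guessing from capitalization or blank lines is required.
    doc = """<text>
      <div type="chapter">
        <head>Chapter I</head>
        <p>It was the best of times<note place="foot">Opening line
        of A Tale of Two Cities.</note>, it was the worst of times.</p>
      </div>
    </text>"""

    root = ET.fromstring(doc)
    print("Heading:", root.find(".//head").text)
    for note in root.iter("note"):
        print("Footnote:", " ".join(note.text.split()))

Because the footnote is an element rather than interleaved lines of text, a program can just as easily strip it out to recover the uninterrupted sentence, exactly the operation the "plain vanilla" model makes unreliable.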

Modern Applications and Impact

Role in Digital Libraries and Publishing

E-texts have played a pivotal role in the establishment and growth of digital libraries, serving as the foundational content for preserving and disseminating written works. Project Gutenberg, launched in 1971 as the world's first digital library, pioneered the creation of e-texts by digitizing literary classics in plain text and other formats, amassing over 75,000 free e-books by 2025 through volunteer efforts. This initiative demonstrated the potential of e-texts to democratize access to knowledge, enabling global users to download or read works online without physical constraints. Building on this model, expansions such as the Internet Archive have incorporated millions of digitized texts into their collections, including scanned books and e-texts from public domain sources, fostering collaborative preservation efforts across institutions. Similarly, Google Books has digitized over 40 million titles since 2004, focusing on library materials to create searchable e-text archives that enhance scholarly research and access.

In the publishing industry, e-texts have driven a profound shift toward digital distribution and self-publishing, lowering barriers to entry and expanding market reach. Platforms like Smashwords, founded in 2008, revolutionized the self-publishing landscape by allowing authors to upload e-texts in formats such as EPUB and distribute them to major retailers like Apple Books and Barnes & Noble, offering royalties up to 80% and bypassing traditional gatekeepers. This model reduced production costs—eliminating printing and shipping expenses—while enabling instant global availability, which has empowered independent authors to reach diverse audiences. Evolving from their plain text origins, modern e-texts in structured formats have further facilitated this transition by supporting multimedia and reflowable content. By 2025, e-books, primarily composed of e-texts, accounted for approximately 21% of global book sales, with platforms like Amazon Kindle driving adoption through formats that ensure compatibility across devices. This growth underscores e-texts' impact on commercial publishing, where they now represent a core revenue stream, particularly for digital-first titles.

Accessibility and Standardization Efforts

E-texts significantly enhance accessibility for users with disabilities through features like reflowable text in the EPUB format, which allows content to adapt to different screen sizes and reading preferences, thereby supporting screen readers for visually impaired individuals by maintaining a logical reading order and enabling text-to-speech functionality. This reflowable design ensures that assistive technologies can navigate and interpret the content without fixed layout constraints, promoting inclusivity across diverse devices. Complementing this, the DAISY format provides audio-enhanced e-texts with synchronized audio narration, text, and images, allowing print-disabled users to access material through listening, enlarged text, or refreshable braille displays for a navigable experience similar to print books.

Standardization efforts further bolster e-text usability, with the World Wide Web Consortium (W3C) issuing the Web Content Accessibility Guidelines (WCAG) 2.1, which outline principles for making web-based content, including e-texts, perceivable, operable, understandable, and robust for users with disabilities, such as ensuring compatibility with screen readers and keyboard navigation. Additionally, the International Organization for Standardization (ISO) formalized EPUB 3.0, first released in 2011, as ISO/IEC TS 30135, incorporating support for MathML to render mathematical expressions accessibly and multimedia elements like synchronized audio and video, enabling richer, inclusive digital publications. These standards build on the portability of early plain-text e-texts by extending compatibility to advanced assistive tools.

A key initiative advancing global access is the Accessible Books Consortium (ABC), launched in 2014 by the World Intellectual Property Organization (WIPO), which facilitates the production and distribution of e-texts in braille-compatible formats, alongside audio and large-print options, to serve print-disabled users worldwide through an international exchange system. By partnering with libraries, publishers, and advocacy groups, ABC has expanded the availability of such accessible materials, addressing barriers for over 285 million people with visual impairments globally.
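Machine-checkable structure is part of what makes such guidelines enforceable in tooling. The Python sketch below is a toy check over a hypothetical EPUB-style XHTML content document, not a WCAG conformance tool; it inspects two common requirements, a declared document language and alternative text on images:

    import xml.etree.ElementTree as ET

    XHTML_NS = "{http://www.w3.org/1999/xhtml}"

    # A hypothetical EPUB content document, reduced to the essentials.
    page = """<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
      <body>
        <p>A figure referenced by the text.</p>
        <img src="figure1.png" alt="Diagram of an EPUB container"/>
        <img src="ornament.png"/>
      </body>
    </html>"""

    root = ET.fromstring(page)

    # Rule 1: the document should declare its language for screen readers.
    print("lang declared:", root.get("lang") is not None)

    # Rule 2: every image should carry alternative text.
    for img in root.iter(XHTML_NS + "img"):
        status = "ok" if img.get("alt") else "missing alt text"
        print(img.get("src"), "->", status)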

