Machine-readable medium and data
In communications and computing, a machine-readable medium (or computer-readable medium) is a medium capable of storing data in a format easily readable by a digital computer or a sensor. It contrasts with human-readable medium and data.
The result is called machine-readable data or computer-readable data, and the data itself can be described as having machine-readability.
Data
Machine-readable data must be structured data.[1]
Attempts to create machine-readable data occurred as early as the 1960s. At the same time that seminal developments in machine reading and natural-language processing were being released (such as Weizenbaum's ELIZA), people were anticipating the success of machine-readable functionality and attempting to create machine-readable documents. One such example was musicologist Nancy B. Reich's creation of a machine-readable catalog of composer William Jay Sydeman's works in 1966.
In the United States, the OPEN Government Data Act of 14 January 2019 defines machine-readable data as "data in a format that can be easily processed by a computer without human intervention while ensuring no semantic meaning is lost." The law directs U.S. federal agencies to publish public data in such a manner,[2] ensuring that "any public data asset of the agency is machine-readable".[3]
Machine-readable data may be classified into two groups: human-readable data that is marked up so that it can also be read by machines (e.g. microformats, RDFa, HTML), and data file formats intended principally for processing by machines (CSV, RDF, XML, JSON). These formats are only machine readable if the data contained within them is formally structured; exporting a CSV file from a badly structured spreadsheet does not meet the definition.
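As a minimal illustration of the distinction, the following Python sketch parses a formally structured CSV file into records and re-serializes them as JSON with no human interpretation in between; the sample rows are invented for the example:

```python
import csv
import io
import json

# A formally structured CSV: one header row, one record per line
# (hypothetical sample data).
csv_text = "title,composer,year\nStudy No. 1,Example Composer,1966\n"

# csv.DictReader maps each row onto the header fields automatically.
records = list(csv.DictReader(io.StringIO(csv_text)))

# The same records re-serialize losslessly into another machine-readable
# format, which is what formal structure buys in practice.
print(json.dumps(records, indent=2))
```

A spreadsheet export without a consistent header row or one record per line would defeat this kind of automatic mapping, which is the sense in which a badly structured CSV fails the definition.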
Machine readable is not synonymous with digitally accessible. A digitally accessible document may be online, making it easier for humans to access via computers, but its content is much harder to extract, transform, and process via computer programming logic if it is not machine-readable.[4]
Extensible Markup Language (XML) is designed to be both human- and machine-readable, and Extensible Stylesheet Language Transformations (XSLT) is used to improve the presentation of the data for human readability. For example, XSLT can be used to automatically render XML in Portable Document Format (PDF). Machine-readable data can be automatically transformed for human-readability but, generally speaking, the reverse is not true.
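A hedged sketch of that XML-to-presentation pipeline, using the third-party lxml library to apply a tiny XSLT stylesheet to an XML fragment and emit HTML for human readers; the document and stylesheet are invented for the example:

```python
from lxml import etree  # third-party; pip install lxml

xml_doc = etree.fromstring(
    b"<catalog><work><title>Study No. 1</title><year>1966</year></work></catalog>"
)

# A minimal stylesheet that renders each <work> as an HTML list item.
xslt_doc = etree.fromstring(b"""\
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <ul>
      <xsl:for-each select="catalog/work">
        <li><xsl:value-of select="title"/> (<xsl:value-of select="year"/>)</li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(xslt_doc)
print(str(transform(xml_doc)))  # human-readable HTML rendered from the XML
```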
For purposes of implementation of the Government Performance and Results Act (GPRA) Modernization Act, the Office of Management and Budget (OMB) defines "machine readable format" as follows: "Format in a standard computer language (not English text) that can be read automatically by a web browser or computer system. (e.g.; xml). Traditional word processing documents and portable document format (PDF) files are easily read by humans but typically are difficult for machines to interpret. Other formats such as extensible markup language (XML), (JSON), or spreadsheets with header columns that can be exported as comma separated values (CSV) are machine readable formats. As HTML is a structural markup language, discreetly labeling parts of the document, computers are able to gather document components to assemble tables of contents, outlines, literature search bibliographies, etc. It is possible to make traditional word processing documents and other formats machine readable but the documents must include enhanced structural elements."[5]
Media
Examples of machine-readable media include magnetic media such as magnetic disks, cards, tapes, and drums, punched cards and paper tapes, optical discs, barcodes and magnetic ink characters.
Common machine-readable technologies include magnetic recording, processing waveforms, and barcodes. Optical character recognition (OCR) can be used to enable machines to read information available to humans (a brief sketch follows the list below). Any information retrievable by any form of energy can be machine-readable.
Examples include:
- Acoustics
- Chemical
- Electrical
- Magnetic storage
- Mechanical
  - Punched card
  - Paper tape
  - Music box cylinder or disk
- Grooves (see also: audio data)
  - Phonograph cylinder
  - Gramophone record
  - DictaBelt (groove on plastic belt)
  - Capacitance Electronic Disc
- Optics
- Thermodynamic
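The OCR route mentioned above can be sketched with the third-party pytesseract wrapper around the Tesseract engine; both must be installed separately, and the input file name here is hypothetical:

```python
from PIL import Image   # third-party: pip install pillow
import pytesseract      # third-party wrapper; requires the Tesseract binary

# Convert a scanned (human-readable) page into machine-readable text.
page = Image.open("scanned_page.png")  # hypothetical input file
text = pytesseract.image_to_string(page)
print(text)
```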
Applications
Documents
Catalogs
Dictionaries
A machine-readable dictionary (MRD) is a dictionary stored as machine-readable data instead of being printed on paper. It is an electronic dictionary and lexical database.
A machine-readable dictionary is a dictionary in an electronic form that can be loaded in a database and queried via application software. It may be a single-language explanatory dictionary, a multi-language dictionary supporting translations between two or more languages, or a combination of both. Translation software between multiple languages usually applies bidirectional dictionaries. An MRD may have a proprietary structure that is queried by dedicated software (for example, online via the internet), or it may have an open structure and be available for loading into computer databases, and thus be usable via various software applications. Conventional dictionaries contain a lemma with various descriptions. A machine-readable dictionary may have additional capabilities and is therefore sometimes called a smart dictionary. An example of a smart dictionary is the Open Source Gellish English dictionary.
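To make "loaded in a database and queried via application software" concrete, here is a minimal sketch using Python's built-in sqlite3 module; the schema and entries are hypothetical, not those of any published MRD:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE entry (lemma TEXT, pos TEXT, definition TEXT)")
con.executemany(
    "INSERT INTO entry VALUES (?, ?, ?)",
    [
        ("bank", "noun", "land alongside a river"),
        ("bank", "noun", "financial institution"),
        ("bank", "verb", "to tilt an aircraft"),
    ],
)

# Application software queries the dictionary instead of scanning pages.
for pos, definition in con.execute(
    "SELECT pos, definition FROM entry WHERE lemma = ?", ("bank",)
):
    print(pos, "-", definition)
```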
The term dictionary is also used to refer to an electronic vocabulary or lexicon as used, for example, in spelling checkers. If dictionaries are arranged in a subtype-supertype hierarchy of concepts (or terms), the result is called a taxonomy. If it also contains other relations between the concepts, it is called an ontology. Search engines may use a vocabulary, a taxonomy or an ontology to optimise search results. Specialised types of electronic dictionaries include morphological and syntactic dictionaries.
The term MRD is often contrasted with NLP dictionary: an MRD is the electronic form of a dictionary that was previously printed on paper, whereas the term NLP dictionary is preferred when the dictionary was built from scratch with NLP in mind, although both are used by programs. An ISO standard able to represent both structures, Lexical Markup Framework, covers MRDs and NLP dictionaries.[6]
Passports
A machine-readable passport (MRP) is a machine-readable travel document (MRTD) with the data on the identity page encoded in optical character recognition format. Many countries began to issue machine-readable travel documents in the 1980s. Most travel passports worldwide are MRPs. The International Civil Aviation Organization (ICAO) requires all ICAO member states to issue only MRPs as of April 1, 2010, and all non-MRP passports must expire by November 24, 2015.[7]
Machine-readable passports are standardized by the ICAO Document 9303 (endorsed by the International Organization for Standardization and the International Electrotechnical Commission as ISO/IEC 7501-1) and have a special machine-readable zone (MRZ), which is usually at the bottom of the identity page at the beginning of a passport. The ICAO 9303 describes three types of documents corresponding to the ISO/IEC 7810 sizes:
- "Type 3" is typical of passport booklets. The MRZ consists of 2 lines × 44 characters.
- "Type 2" is relatively rare with 2 lines × 36 characters.
- "Type 1" is of a credit card-size with 3 lines × 30 characters.
The fixed format allows specification of document type, name, document number, nationality, date of birth, sex, and document expiration date. All these fields are required on a passport. There is room for optional, often country-dependent, supplementary information. There are also two sizes of machine-readable visas similarly defined.
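A hedged sketch of reading the Type 3 fixed format described above: field positions follow the ICAO 9303 TD3 layout, and the check digit uses the standard's repeating 7-3-1 weighting (digits keep their value, letters map A=10 through Z=35, and the filler '<' counts as 0). The sample line mirrors the Utopia specimen used in ICAO documentation:

```python
def char_value(c: str) -> int:
    """Map an MRZ character to its numeric value for check digits."""
    if c.isdigit():
        return int(c)
    if c == "<":                      # filler character
        return 0
    return ord(c) - ord("A") + 10     # A=10 ... Z=35

def check_digit(field: str) -> int:
    """ICAO 9303 check digit: weighted sum mod 10 with weights 7, 3, 1."""
    weights = (7, 3, 1)
    return sum(char_value(c) * weights[i % 3] for i, c in enumerate(field)) % 10

def parse_td3_line2(line: str) -> dict:
    """Slice the second 44-character line of a passport MRZ."""
    assert len(line) == 44
    return {
        "document_number": line[0:9].rstrip("<"),
        "document_number_valid": check_digit(line[0:9]) == int(line[9]),
        "nationality": line[10:13],
        "birth_date": line[13:19],    # YYMMDD
        "sex": line[20],
        "expiry_date": line[21:27],   # YYMMDD
    }

# Specimen-style line (Utopia example from ICAO documentation):
print(parse_td3_line2("L898902C36UTO7408122F1204159ZE184226B<<<<<10"))
```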
Computers with a camera and suitable software can directly read the information on machine-readable passports. This enables faster processing of arriving passengers by immigration officials, and greater accuracy than manually-read passports, as well as faster data entry, more data to be read and better data matching against immigration databases and watchlists.
Apart from optically readable information, many passports contain an RFID chip that enables computers to read additional information, for example a photo of the bearer. These passports are called biometric passports and are also described by ICAO 9303.
See also
- Paper data storage
- Symmetric Phase Recording
- Open data
- Linked data
- Human-readable medium and data
- Semantic Web
- Machine-readable postal marking
References
- ^ "Machine readable". opendatahandbook.org. Retrieved 2019-07-22.
- ^ "HR4174". stratml.us.
- ^ "HR4174". stratml.us.
- ^ Hendler, Jim; Pardo, Theresa A. (2012-09-24). "A Primer on Machine Readability for Online Documents and Data". Data.gov. Archived from the original on 2021-03-20. Retrieved 2015-02-27.
- ^ OMB Circular A-11, Part 6, Preparation, Submission, and Execution of the Budget
- ^ Francopoulo, Gil, ed. LMF: Lexical Markup Framework. ISTE/Wiley, 2013. ISBN 978-1-84821-430-9.
- ^ "Last Week for States to Ensure Expiration of Non-Machine Readable Passports". ICAO. Montréal. 17 November 2015. Retrieved 11 March 2024.
This article incorporates public domain material from Federal Standard 1037C. General Services Administration. Archived from the original on 2022-01-22.
Definition and Fundamentals
Core Concepts
A machine-readable medium refers to any physical or electronic carrier capable of storing data in a format that a computer or mechanical device can access and interpret directly, such as through binary encoding preserved in magnetic, optical, or electrical states.[8] This distinguishes it from transient signals, as the medium must retain information durably for repeated access, exemplified by devices like hard disk drives where data persists via aligned magnetic domains representing 0s and 1s.[9] In patent contexts, such media are often specified as non-transitory to exclude propagating signals like carrier waves, ensuring eligibility under U.S. law by limiting scope to tangible storage.[10]

Machine-readable data, conversely, constitutes the encoded content itself—structured sequences of bits or symbols processable by algorithms without human intervention, typically in binary form where each bit denotes one of two states (0 or 1) to represent all information from numbers to text.[11] This binary foundation arises from the two-state nature of electronic switches in hardware, enabling reliable logic operations via Boolean algebra, as standardized in computing architectures since the mid-20th century.[12] Encoding schemes, such as ASCII for 7-bit text (mapping 128 characters to binary tuples) or UTF-8 for variable-length Unicode, ensure data fidelity across systems, with error-detection methods like parity bits or CRC codes mitigating corruption during read/write cycles.[13]

At its core, the interplay between medium and data hinges on causal mechanisms: physical phenomena (e.g., laser reflection on pits in optical media or voltage levels in RAM) map to logical bits, which software interprets per predefined schemas, facilitating scalability from kilobytes in early floppy disks (introduced 1971, holding ~80 KB) to petabytes in modern SSDs.[2] Standardization bodies emphasize structured formats like XML or JSON for interoperability, where data elements are tagged for parsing, reducing ambiguity in automated processing—unlike unstructured text requiring optical character recognition.[14] This framework underpins data integrity, as verifiable checksums (e.g., MD5 hashes, 128-bit digests) confirm unaltered transmission or storage.[15]
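The parity-bit and checksum mechanisms mentioned above can be sketched in a few lines of Python using only the standard library; the payload is arbitrary example data:

```python
import hashlib

def even_parity_bit(data: bytes) -> int:
    """Return the bit that makes the total count of 1-bits even."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2

payload = b"machine-readable payload"  # arbitrary example data

# A 128-bit MD5 digest acts as a fixed-length fingerprint; comparing
# digests before and after storage or transfer detects corruption.
digest = hashlib.md5(payload).hexdigest()
print(even_parity_bit(payload), digest)
```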
Distinction from Human-Readable Formats
Machine-readable media encode data in formats that enable direct, automated interpretation and processing by computational systems, often utilizing binary representations, structured schemas, or proprietary serializations that prioritize efficiency over visual legibility.[16][14] In essence, such data requires no human intervention for machines to parse, manipulate, or execute it, as seen in formats like compiled executables or database binaries where content manifests as non-intuitive sequences of bits or bytes.[2] This design stems from the causal imperative of computational hardware, which operates on low-level electrical states rather than symbolic meaning, necessitating encodings that align with processor architectures for minimal latency and resource use.[16]

Human-readable formats, by contrast, employ textual or graphical encodings optimized for direct human cognition, such as plain ASCII text, natural language documents, or tabular printouts, where information is conveyed through familiar alphabets, numbers, and spacing without decoding tools.[17] These prioritize perceptual accessibility, allowing unaided eyes to discern patterns and semantics, but they impose parsing overhead on machines, often demanding algorithms like natural language processing or regex matching to extract usable structures.[16] The reciprocal incompatibility arises because human-readable data lacks inherent rigidity—its flexibility for subjective interpretation hinders deterministic machine handling—while machine-readable data's opacity to humans stems from abstraction layers that discard legibility for compactness, as evidenced by compression ratios in binary versus textual storage where the former achieves up to 90% size reduction in datasets like genomic sequences.[17][16]

A core distinction manifests in processing paradigms: machine-readable data supports causal chains of automated operations, such as real-time analytics on structured JSON feeds ingested by APIs, enabling scalability in systems handling petabytes daily, whereas human-readable formats facilitate manual verification and iterative editing, critical in domains like legal contracts or scientific notebooks but prone to errors in bulk automation, with studies showing transcription inaccuracies exceeding 1% in OCR-scanned text versus near-zero in native digital parses.[18][16] Hybrid formats like XML or CSV approximate both worlds by imposing syntactic rules—tags or delimiters—that machines exploit for parsing while affording humans approximate readability, though even these trade off: XML's verbosity inflates file sizes by factors of 2-10 compared to binary alternatives like Protocol Buffers, illustrating the tension between human intuitiveness and machine throughput.[19][17]

This divide influences practical applications, where machine-readable primacy drives interoperability in enterprise systems—e.g., EDI standards for supply chains process transactions at millions per hour—while human-readable persists for accountability, as in audit trails requiring human oversight to detect anomalies undetectable in abstracted binaries.[18] Empirical trade-offs reveal no universal superiority; selection hinges on context, with machine-readable excelling in high-volume, low-latency environments per benchmarks from data processing frameworks, yet human-readable mitigating risks in interpretive tasks where causal fidelity demands human judgment over algorithmic approximation.[16][17]
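A small sketch of the parsing-overhead point: pulling the same date out of a structured JSON record is a direct key lookup, while the free-text version needs a brittle pattern match. Both snippets of data are invented for the example:

```python
import json
import re

structured = '{"invoice": {"date": "2024-05-01", "total": 129.95}}'
unstructured = "Invoice issued on May 1, 2024, for a total of $129.95."

# Machine-readable: deterministic traversal of a known schema.
date_from_json = json.loads(structured)["invoice"]["date"]

# Human-readable: a regex approximation that breaks if the wording shifts.
match = re.search(r"on (\w+ \d{1,2}, \d{4})", unstructured)
date_from_text = match.group(1) if match else None

print(date_from_json, "|", date_from_text)
```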
Historical Development
Pre-Digital Era (Punch Cards and Early Mechanical Storage)
The Jacquard loom, invented by Joseph Marie Jacquard in 1801, represented the earliest practical application of punched cards as a machine-readable medium for controlling automated processes. These cards, made of stiff paper or cardboard with holes punched in specific positions, directed the loom's needles and hooks to weave intricate textile patterns by mechanically selecting warp threads for each row. Unlike manual control, which required skilled operators for complex designs, the punched cards allowed repeatable, error-free instruction storage and execution, with chains of cards handling extended sequences. This system automated what was previously labor-intensive, enabling mass production of figured fabrics and demonstrating punched media's capacity for encoding sequential instructions interpretable solely by machinery.[20]

Building on this principle, punched cards transitioned to data storage and processing in the late 19th century through Herman Hollerith's innovations for statistical tabulation. In 1889, facing delays in manual processing of the 1880 U.S. Census—which had taken nearly a decade—Hollerith adapted Jacquard's concept, creating rectangular cards sized to the U.S. dollar bill (approximately 7.375 by 3.25 inches) with 24 columns of round holes to encode demographic variables like age, marital status, and occupation via hole positions and combinations. His electromechanical tabulating machine, deployed for the 1890 Census, read cards by passing them through mercury contacts under spring-loaded pins; a hole allowed pin contact with conductive fluid, completing an electrical circuit to increment counters on dials. This reduced census tabulation from over seven years to about two months, processing 62 million cards with 99% accuracy in data capture.[21][22]

Hollerith's Tabulating Machine Company, founded in 1896 and later evolving into the Computing-Tabulating-Recording Company (renamed IBM in 1924), standardized 80-column punched cards by 1928, expanding capacity with rectangular holes for denser encoding while maintaining mechanical and electrical readability. These cards served as non-volatile storage for unit-record data processing, where each card held one record, sorted and tabulated via gang punches, sorters, and reproducers in electromechanical systems predating electronic computers. Early mechanical storage complemented this through devices like notched-edge cards or edge-punched variants, which allowed manual or semi-automated sorting by physical alignment of notches representing categories, though less precise than Hollerith's perforated interiors. Such media prioritized durability and machine-specific interpretability over human legibility, laying groundwork for scalable data handling in business and government before digital electronics.[23][24]
Analog to Digital Transition (Magnetic and Optical Media)
The adaptation of magnetic media for digital data storage began shortly after its invention for analog audio recording. In 1928, German engineer Fritz Pfleumer patented magnetic tape coated with iron oxide particles on a paper or film base, initially enabling continuous analog signal capture for sound.[25] By the early 1950s, this technology transitioned to discrete binary encoding for computer data, with IBM's 1952 Model 726 tape drive—used with the IBM 701—storing up to 2 million alphanumeric characters per 1,200-foot reel at densities of 100 bits per inch, far surpassing punch cards in capacity and access speed.[26][27] This shift facilitated reliable, rewritable machine-readable storage through magnetization patterns representing 0s and 1s, incorporating error-checking via parity bits and enabling sequential data processing essential for early batch computing.

Optical media's analog-to-digital transition occurred later, building on laser-based reading principles. Analog optical formats, such as the 1978 LaserDisc, encoded video signals as variable-length pits modulating reflected laser light for continuous playback, but suffered from lower precision and vulnerability to dust. The compact disc (CD), jointly developed by Philips and Sony with prototypes demonstrated in 1979 and commercial audio release in 1982, digitized content using pulse-code modulation (PCM) to represent samples as binary pits and lands, achieving 16-bit/44.1 kHz audio fidelity with error correction via Reed-Solomon codes. The CD-ROM variant, standardized in 1985 under ISO 9660, extended this to data storage, holding approximately 650 megabytes—equivalent to 250,000 pages of text—on a 12 cm polycarbonate disc read by a 780 nm laser, transforming software distribution from floppy disks to high-density, tamper-resistant media.

These transitions underscored causal advantages of digital encoding: binary states resisted noise degradation inherent in analog signals, enabling data integrity through redundancy and algorithmic correction, while magnetic and optical substrates provided scalable, non-volatile persistence for machine-readable instructions and datasets. By the 1990s, hybrid advancements like magneto-optical discs further bridged eras, combining magnetic writing with optical verification for capacities exceeding 1 GB per cartridge.[28]
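The PCM step described above, sampling a continuous signal and storing each sample as a fixed-width binary integer, can be sketched as follows; the tone frequency and duration are arbitrary example values:

```python
import math
import struct

SAMPLE_RATE = 44_100   # CD-audio sampling rate in Hz
AMPLITUDE = 32_767     # maximum value of a signed 16-bit sample

def pcm_encode(freq_hz: float = 440.0, duration_s: float = 0.01) -> bytes:
    """Quantize a sine tone into little-endian 16-bit PCM samples."""
    n = int(SAMPLE_RATE * duration_s)
    samples = [
        int(round(AMPLITUDE * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)))
        for i in range(n)
    ]
    return struct.pack(f"<{n}h", *samples)

data = pcm_encode()
print(len(data), "bytes for", len(data) // 2, "samples")
```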
Contemporary Evolution (Solid-State Drives and Cloud-Based Storage)
The transition to solid-state drives (SSDs) marked a pivotal shift in machine-readable media by replacing mechanical components with electronic NAND flash memory, eliminating latency from spinning platters and read/write heads inherent in hard disk drives (HDDs). NAND flash, invented in 1987 by Fujio Masuoka and colleagues at Toshiba, enabled non-volatile storage without power, storing data as electrical charges in floating-gate transistors.[29] Early SSD prototypes appeared in the 1970s using DRAM or other semiconductors for military and mainframe applications, but flash-based designs gained traction in the 1990s; SanDisk released a 20 MB SSD for IBM laptops in 1991. Mass-market viability emerged in the mid-2000s as NAND fabrication processes scaled from 90 nm to sub-10 nm nodes, reducing costs per gigabyte from over $10 in 2008 to under $0.10 by 2020, while capacities surged from tens of GB to multi-TB.[30] This evolution improved random access times by orders of magnitude—SSDs achieve latencies under 100 microseconds versus milliseconds for HDDs—due to parallel access across flash cells, enhancing throughput for machine-readable data processing in databases and virtualization.[31]

By the 2010s, SSDs supplanted HDDs in consumer devices and enterprise servers, with PCIe NVMe interfaces enabling sequential speeds exceeding 7 GB/s in 2020-era drives, compared to SATA HDD limits around 200 MB/s.[32] Global SSD shipments hit 1.1 billion units in 2023, comprising over 50% of storage revenue in data centers by 2024, driven by lower power consumption (watts per TB far below HDDs) and durability against vibration, critical for mobile and cloud infrastructure.[33] However, flash wear from program/erase cycles—limited to 3,000-100,000 per cell depending on TLC vs. SLC types—necessitates over-provisioning and error-correcting codes, with enterprise SSDs incorporating SLC caching for sustained writes. This solid-state foundation underpins contemporary machine-readable media by prioritizing speed and reliability over HDD density for cost-sensitive archival roles.

Cloud-based storage further abstracted machine-readable data from local media, evolving into distributed systems where data resides on remote server farms accessed via APIs over networks, often classified as non-transitory despite transmission signals. AWS Simple Storage Service (S3), launched March 14, 2006, pioneered durable object storage with 99.999999999% (11 9s) availability, using replication across facilities.[34] Competitors followed: Microsoft Azure Blob in 2008 and Google Cloud Storage in 2010, leveraging virtualization to pool SSD/HDD resources for elastic scaling. Adoption accelerated post-2020 amid remote work; corporate data in the cloud rose from 30% in 2015 to 60% by 2022, with projections for 50% of global data (200 zettabytes total) cloud-stored by 2025.[35] The market reached $161.28 billion in 2025, growing at 21.7% CAGR through 2032, fueled by AI training datasets and edge computing, though causal risks include vendor lock-in and outages—like AWS's 2021 disruptions affecting millions—highlighting dependencies on proprietary protocols over sovereign local media.[36] Hybrid models integrate on-premises SSDs with cloud for tiered access, optimizing cost via infrequent-access tiers at $0.00099/GB-month.[37]
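Object-store access of the kind S3 popularized looks roughly like the following boto3 sketch; the bucket and key names are hypothetical, and working AWS credentials are assumed to be configured locally:

```python
import boto3  # third-party AWS SDK; pip install boto3

s3 = boto3.client("s3")

# Write a machine-readable object to remote storage via the S3 API.
s3.put_object(
    Bucket="example-bucket",            # hypothetical bucket name
    Key="datasets/records.json",        # hypothetical object key
    Body=b'[{"id": 1, "value": 42}]',
)

# Read it back; the data round-trips as bytes over the network.
response = s3.get_object(Bucket="example-bucket", Key="datasets/records.json")
print(response["Body"].read())
```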
Types of Machine-Readable Media
Physical and Tangible Media
Physical and tangible machine-readable media consist of non-transitory storage devices that encode data in physical forms accessible by machines, such as altered magnetic domains, optical reflections, or semiconductor states, providing persistent retention independent of power or transmission. These media underpin data storage in computing by offering capacities from kilobytes in early formats to petabytes in contemporary drives, with read/write mechanisms tailored to their material properties.[38][7]

Magnetic storage media utilize ferromagnetic materials to represent binary data through oriented magnetic fields on rotating or linear substrates. Hard disk drives (HDDs), a primary example, store data on spinning platters coated with magnetic oxide; the first commercial HDD, IBM's Model 350 introduced in 1956, offered 5 megabytes of capacity across 50 24-inch platters rotating at 1,200 RPM.[39][40] Floppy disks, flexible magnetic discs in protective envelopes, emerged in 1971 with IBM's 8-inch format holding 80 kilobytes, enabling portable data transfer before widespread HDD adoption.[41] Magnetic tapes, sequential-access reels or cassettes, provided cost-effective archival storage, with early variants like those in the 1950s IBM systems supporting batch processing.[5]

Optical storage media encode data as microscopic pits and lands on polycarbonate discs, read via laser reflection to detect variations in light intensity. Compact Disc Read-Only Memory (CD-ROM), jointly developed by Philips and Sony, was demonstrated in 1984 with a standard capacity of approximately 650 megabytes, revolutionizing software distribution by allowing vast data volumes on removable media.[42] DVD-ROMs extended this to 4.7 gigabytes per layer, while Blu-ray discs reach 25 gigabytes single-layer, using shorter-wavelength lasers for denser packing. These formats prioritize read-heavy applications like media libraries due to write-once or limited-rewrite limitations in standard variants.[38]

Solid-state storage media employ flash memory chips, typically NAND architecture, to trap electrons in floating gates for non-volatile retention without moving parts, yielding higher speeds and shock resistance than mechanical alternatives. USB flash drives and solid-state drives (SSDs) exemplify this, with SSDs replacing HDDs in many systems for capacities exceeding 1 terabyte at read speeds over 7,000 MB/s in enterprise models. Invented in the late 1980s, NAND flash enabled compact, removable media like memory cards, displacing floppies for portable storage by the 2000s.[38][43]

Early mechanical forms, such as punched cards and tapes, prefigure modern media by perforating paper or film to represent data via absence or presence of material, readable by mechanical or optical sensors; these tangible formats facilitated tabulation in 19th-century censuses and early computing.[44] Across categories, physical media ensure data integrity through error-correcting codes and redundancy, though susceptibility to environmental degradation—magnetic demagnetization, optical scratching, or charge leakage—necessitates backups.[5]
Electronic and Digital Media
Electronic machine-readable media encompass storage devices that utilize electrical or electromagnetic processes, often combined with digital binary encoding, to record and access data without mechanical intermediaries like punch readers. These media typically involve active electronic components, such as transistors or read heads, for data manipulation, enabling high-speed, automated retrieval by computing systems. Examples include hard disk drives (HDDs), which employ electronic servo mechanisms to position heads over magnetic platters, and solid-state drives (SSDs), which rely on semiconductor memory cells.[5][45][46]

Solid-state electronic media, exemplified by NAND flash memory, represent a non-volatile digital storage solution where data persists without continuous power, stored via charge trapping in floating-gate transistors. Toshiba invented and commercialized NAND flash in 1987, revolutionizing portable and high-performance storage by eliminating moving parts and reducing failure rates from mechanical wear.[47] USB flash drives and SSDs, built on this technology, facilitate machine-readable data transfer and retention in formats like FAT32 or NVMe, with widespread adoption in consumer electronics by the early 2000s due to their durability and energy efficiency.[5]

Hard disk drives integrate electronic circuitry with magnetic domains to encode digital bits, allowing random access to large datasets. IBM shipped the first commercial HDD, the Model 350 disk storage unit, in 1956 as part of the RAMAC system, offering approximately 5 megabytes of capacity across 50 platters.[39] Modern HDDs continue to serve as cost-effective, high-capacity electronic media for archival machine-readable data, though susceptible to electromagnetic interference and requiring error-correcting codes for reliability. Both SSDs and HDDs support standardized digital serialization protocols, ensuring interoperability in computational environments.[46]
Legal Distinctions: Transitory Signals vs. Non-Transitory Storage
In United States patent law, the distinction between transitory signals and non-transitory storage is critical for determining the eligibility of claims directed to machine-readable media under 35 U.S.C. § 101, which limits patentable subject matter to processes, machines, manufactures, and compositions of matter. Transitory signals, such as propagating electromagnetic waves or electrical impulses carrying data, are deemed ineligible because they lack the tangible, structural permanence required to qualify as a "manufacture" or "machine," despite being physical phenomena.[48] This ruling stems from the Federal Circuit's decision in In re Nuijten (500 F.3d 1346, Fed. Cir. 2007), where claims to signals embodying watermark data were rejected as non-statutory, emphasizing that fleeting, ephemeral forms do not constitute statutory categories.[10]

Non-transitory storage, by contrast, refers to physical media capable of persistently retaining data without ongoing propagation or power dependency for mere existence, such as magnetic disks, optical discs, or semiconductor memory. The U.S. Patent and Trademark Office (USPTO) guidance explicitly notes that transitory signals fail Step 1 of the eligibility analysis due to insufficient concrete structure, whereas non-transitory media satisfy the "manufacture" prong by embodying fixed, tangible form.[10] Post-Nuijten, patent drafters adopted the qualifier "non-transitory" in claims like "a non-transitory computer-readable medium storing instructions" to explicitly disclaim signal embodiments and preempt § 101 rejections, a practice endorsed in USPTO examples where such language ensures claims avoid encompassing ineligible signals.[49]

This legal bifurcation influences the scope and enforceability of patents on machine-readable data, as transitory claims risk invalidation for abstractness or lack of tangibility, while non-transitory formulations anchor inventions to verifiable physical embodiments. For instance, the Federal Circuit has upheld that even encoded data on transitory carriers does not confer eligibility absent a claim to the underlying storage structure. Internationally, similar principles apply under frameworks like the European Patent Convention, where signals are often excluded from "physical carrier" definitions, though U.S. jurisprudence provides the most codified distinction via Nuijten and subsequent USPTO clarifications.[38] The term "non-transitory" thus serves not as a functional limitation but as a clarificatory one, preventing overbroad coverage of non-patentable ephemera while protecting tangible storage innovations.[50]
Machine-Readable Data Characteristics
Structured Data Formats and Encoding
Structured data formats impose a predefined organization on information, such as hierarchical, tabular, or relational models, facilitating unambiguous parsing and validation by machines on storage media. These formats ensure data integrity and interoperability across systems, contrasting with unstructured data that lacks such schema enforcement. Common examples include text-based representations like CSV for simple tabular records, where each line denotes a row and fields are delimited by commas, as formalized in IETF RFC 4180, published on October 20, 2005. XML provides a tag-based hierarchy for complex, nested data, originating as a W3C Recommendation on February 10, 1998, derived from SGML to enable extensible schemas via XSD. JSON, leveraging key-value pairs and arrays from JavaScript object notation, supports lightweight serialization and was standardized in IETF RFC 8259 on December 7, 2017.[51]

Encoding schemes convert logical data structures into byte sequences suitable for machine-readable media, balancing compactness, speed, and error resilience. Text encodings, such as UTF-8—which maps Unicode code points to variable-length bytes and was specified in IETF RFC 3629 in November 2003—predominate in formats like XML and JSON, allowing partial human inspection while ensuring universal character representation across 1,112,064 assigned code points as of Unicode 15.1 in September 2023. Binary encodings, by contrast, forgo human readability for efficiency; Protocol Buffers, developed by Google and open-sourced in July 2008, use schema-defined wire formats to achieve up to 10x size reduction over equivalent JSON for large payloads. Similarly, Apache Avro employs schema evolution in binary streams, optimizing for distributed systems like Hadoop since its initial release in 2009.

| Encoding Type | Characteristics | Examples | Performance Notes |
|---|---|---|---|
| Text-based | Human-inspectable byte sequences using printable characters; larger footprint due to redundancy. | UTF-8 in JSON/XML; ASCII subsets in CSV. | Parsing latency ~20-50% higher than binary for datasets >1 GB; easier debugging via tools like jq or xmllint.[52] |
| Binary | Opaque byte streams with schema-defined packing; compact and cache-efficient. | Protocol Buffers (varints for integers); Avro with sync markers. | Reduced bandwidth (e.g., 3-10x smaller than text equivalents); faster deserialization via direct memory mapping, critical for real-time media like SSDs. |
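The "varints for integers" entry in the table refers to base-128 variable-length encoding, in which each byte carries seven payload bits plus a continuation flag. A minimal sketch for non-negative integers, following the scheme documented for Protocol Buffers:

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F              # take the low seven bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set continuation bit
        else:
            out.append(byte)         # final byte: continuation bit clear
            return bytes(out)

def decode_varint(data: bytes) -> int:
    """Decode a base-128 varint back into an integer."""
    result = shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # last byte has the continuation bit clear
            return result
        shift += 7
    raise ValueError("truncated varint")

assert encode_varint(300) == b"\xac\x02"   # the classic two-byte example
assert decode_varint(b"\xac\x02") == 300
```

Small values occupy a single byte while large values grow only as needed, which is the compactness the table credits to binary encodings.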