JSON streaming
View on Wikipedia
JSON streaming comprises communications protocols that delimit JSON objects built upon lower-level stream-oriented protocols (such as TCP), ensuring that individual JSON objects are recognized when the server and clients use the same framing (e.g., one agreed upon implicitly). This is necessary because JSON is not a concatenative format: the concatenation of two JSON objects does not itself produce a valid JSON object.
Introduction
JSON is a popular format for exchanging object data between systems. Frequently there is a need for a stream of objects to be sent over a single connection, such as a stock ticker or application log records.[1] In these cases it is necessary to identify where one JSON-encoded object ends and the next begins. Technically this is known as framing.
There are four common ways to achieve this:
- Send the JSON objects formatted without newlines and use a newline as the delimiter.[2]
- Send the JSON objects concatenated with a record separator control character as the delimiter.[3]
- Send the JSON objects concatenated with no delimiters and rely on a streaming parser to extract them.
- Send the JSON objects prefixed with their length and rely on a streaming parser to extract them.
Comparison
Line-delimited JSON works very well with traditional line-oriented tools.
Concatenated JSON works with pretty-printed JSON but requires more effort and complexity to parse. It doesn't work well with traditional line-oriented tools. Concatenated JSON streaming is a superset of line-delimited JSON streaming.
Length-prefixed JSON works with pretty-printed JSON. It doesn't work well with traditional line-oriented tools, but may offer performance advantages over line-delimited or concatenated streaming. It can also be simpler to parse.
Approaches
Newline-delimited JSON
Two terms for equivalent formats of line-delimited JSON are:
- Newline-delimited JSON (NDJSON),[4][5] formerly called line-delimited JSON (LDJSON).[6]
- JSON Lines (JSONL),[7] currently (as of 2025) the most widely used name, in big data and other applications.[8]
Line-delimited streaming relies on the fact that the JSON format does not allow raw carriage return or newline characters within primitive values (in strings they must be escaped as \r and \n, respectively) and that most JSON serializers default to emitting no whitespace, including carriage returns and newlines. These properties allow the newline character, or the carriage return and newline sequence, to be used as a delimiter.
This example shows two JSON objects (the implicit newline characters at the end of each line are not shown):
{"some":"thing\n"}
{"may":{"include":"nested","objects":["and","arrays"]}}
The use of a newline as a delimiter enables this format to work very well with traditional line-oriented Unix tools.
A log file, for example, might look like:
{"ts":"2020-06-18T10:44:12","started":{"pid":45678}}
{"ts":"2020-06-18T10:44:13","logged_in":{"username":"foo"},"connection":{"addr":"1.2.3.4","port":5678}}
{"ts":"2020-06-18T10:44:15","registered":{"username":"bar","email":"bar@example.com"},"connection":{"addr":"2.3.4.5","port":6789}}
{"ts":"2020-06-18T10:44:16","logged_out":{"username":"foo"},"connection":{"addr":"1.2.3.4","port":5678}}
which is very easy to sort by date, grep for usernames, actions, IP addresses, etc.
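Programmatic consumption is just as simple. The following minimal Python sketch (the file name and the counting of log-in events are illustrative) reads such a log one line at a time with the standard json module, so memory use stays constant regardless of file size:
import json

def read_ndjson(path):
    # Yield one parsed object per non-empty line of an NDJSON file.
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank or trailing lines
                yield json.loads(line)

# Hypothetical usage: count the log-in events in the log shown above.
logins = sum(1 for event in read_ndjson("app.log") if "logged_in" in event)
print(logins)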
Compatibility
Line-delimited JSON can be read by a parser that can handle concatenated JSON. Concatenated JSON that contains newlines within a JSON object can't be read by a line-delimited JSON parser.
The terms "line-delimited JSON" and "newline-delimited JSON" are often used without clarifying if embedded newlines are supported.
In the past, the newline-delimited JSON specification[9] allowed comments to be embedded if the first two characters of a given line were "//". Such comments could not be handled by standard JSON parsers. The current version of the specification ("NDJSON - Newline delimited JSON")[10] no longer includes comments.
Concatenated JSON can be converted into line-delimited JSON by a suitable JSON utility such as jq. For example
jq --compact-output . < concatenated.json > lines.json
Record separator-delimited JSON
Record separator-delimited JSON streaming allows JSON texts to be delimited without requiring the JSON formatter to exclude whitespace. Since JSON texts cannot contain unescaped control characters, a record separator control character can be used to delimit the sequences. In addition, it is suggested that each JSON text be followed by a line feed character to allow proper handling of top-level JSON values that are not self-delimiting (numbers, true, false, and null).
This format is also known as JSON Text Sequences or MIME type application/json-seq, and is formally described in IETF RFC 7464.
The example below shows two JSON objects with ␞ representing the record separator control character and ␊ representing the line feed character:
␞{"some":"thing\n"}␊
␞{
"may": {
"include": "nested",
"objects": [
"and",
"arrays"
]
}
}␊
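A receiver for this framing only needs to split the byte stream on the RS character and discard the trailing line feed before handing each element to an ordinary JSON parser. The following minimal Python sketch (operating on an in-memory byte string, not a full RFC 7464 implementation) illustrates the idea:
import json

RS = b"\x1e"

def parse_json_seq(data):
    # Yield each JSON text found between record separator characters.
    for element in data.split(RS):
        element = element.strip(b"\r\n ")
        if element:  # skip the empty piece before the first RS
            yield json.loads(element.decode("utf-8"))

stream = b'\x1e{"some":"thing\\n"}\n\x1e{"may":{"include":"nested"}}\n'
print(list(parse_json_seq(stream)))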
Concatenated JSON
Concatenated JSON streaming allows the sender to simply write each JSON object into the stream with no delimiters. It relies on the receiver using a parser that can recognize and emit each JSON object as its terminating character is parsed. Concatenated JSON isn't a new format; it's simply a name for streaming multiple JSON objects without any delimiters.
The advantage of this format is that it can handle JSON objects that have been formatted with embedded newline characters, e.g., pretty-printed for human readability. For example, these two inputs are both valid and produce the same output:
{"some":"thing\n"}{"may":{"include":"nested","objects":["and","arrays"]}}
{
"some": "thing\n"
}
{
"may": {
"include": "nested",
"objects": [
"and",
"arrays"
]
}
}
Implementations that rely on line-based input may require a newline character after each JSON object in order for the object to be emitted by the parser in a timely manner. (Otherwise the line may remain in the input buffer without being passed to the parser.) This is rarely recognised as an issue because terminating JSON objects with a newline character is very common.
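In Python, for example, the standard library's json.JSONDecoder.raw_decode can act as such a parser once the input has been buffered as a string; the sketch below (illustrative only) repeatedly decodes one value and skips any whitespace between values:
import json

def parse_concatenated(text):
    # Yield each top-level JSON value in a concatenated JSON string.
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace, including the newlines of pretty-printed input.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value

doc = '{"some":"thing\\n"}{"may":{"include":"nested","objects":["and","arrays"]}}'
print(list(parse_concatenated(doc)))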
Length-prefixed JSON
Length-prefixed or framed JSON streaming allows the sender to explicitly state the length of each message. It relies on the receiver using a parser that can recognize each length n and then read the following n bytes to parse as JSON.
The advantage of this format is that it can speed up parsing due to the fact that the exact length of each message is explicitly stated, rather than forcing the parser to search for delimiters. Length-prefixed JSON is also well-suited for TCP applications, where a single "message" may be divided into arbitrary chunks, because the prefixed length tells the parser exactly how many bytes to expect before attempting to parse a JSON string.
This example shows two length-prefixed JSON objects (with each length being the byte-length of the following JSON string):
18{"some":"thing\n"}55{"may":{"include":"nested","objects":["and","arrays"]}}
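A reader for the decimal-prefixed framing shown above has to collect digits until the first byte of the JSON text appears and then consume exactly that many bytes. The following Python sketch is a minimal illustration (it assumes each value begins with a non-digit character, such as "{"):
import io
import json

def read_length_prefixed(stream):
    # Yield JSON values from a stream of decimal-length-prefixed messages.
    while True:
        digits = b""
        ch = stream.read(1)
        while ch.isdigit():        # accumulate the decimal length prefix
            digits += ch
            ch = stream.read(1)
        if not digits:
            return                 # end of stream
        length = int(digits)
        payload = ch + stream.read(length - 1)   # ch is already the first payload byte
        yield json.loads(payload.decode("utf-8"))

data = io.BytesIO(b'18{"some":"thing\\n"}55{"may":{"include":"nested","objects":["and","arrays"]}}')
print(list(read_length_prefixed(data)))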
Applications and tools
Newline-delimited JSON
- jq can both create and read line-delimited JSON texts.
- Jackson (API) can read and write line-delimited JSON texts.
- logstash includes a json_lines codec.[11]
- ldjson-stream module for Node.js
- ld-jsonstream dependency free module for Node.js
- json-stream-es is a JavaScript/TypeScript library (frontend and backend) that can create and read newline-delimited JSON documents.
- ArduinoJson is a C++ library that supports line-delimited JSON.
- RecordStream A set of tools to manipulate line delimited JSON (generate, transform, collect statistics, and format results).
- The Go standard library's encoding/json package can be used to read and write line-delimited JSON.
- RDF4J and Ontotext GraphDB support NDJSON for JSON-LD (called NDJSONLD)[12] since February 2021.[13]
Record separator-delimited JSON
- jq can both create and read record separator-delimited JSON texts.
- json-stream-es is a JavaScript/TypeScript library (frontend and backend) that can create and read record separator-delimited JSON documents.
Concatenated JSON
- concatjson concatenated JSON streaming parser/serializer module for Node.js
- json-stream-es is a JavaScript/TypeScript library (frontend and backend) that can create and read concatenated JSON documents.
- Jackson (API) can read and write concatenated JSON content.
- jq lightweight flexible command-line JSON processor
- Noggit Solr's streaming JSON parser for Java
- Yajl – Yet Another JSON Library. YAJL is a small event-driven (SAX-style) JSON parser written in ANSI C, and a small validating JSON generator.
- ArduinoJson is a C++ library that supports concatenated JSON.
- GSON JsonStreamParser.java can read concatenated JSON.
- json-stream is a streaming JSON parser for python.
Length-prefixed JSON
- missive Fast, lightweight library for encoding and decoding length-prefixed JSON messages over streams
- Native messaging WebExtensions Native Messaging
References
- ^ Ryan, Film Grain. "How We Built Filmgrain, Part 2 of 2". filmgrainapp.com. Archived from the original on 5 July 2013. Retrieved 4 July 2013.
- ^ "JSON Lines".
- ^ Williams, N. (2015). "RFC 7464". Request for Comments. doi:10.17487/RFC7464.
- ^ "ndjson - Newline delimited JSON - Format for Structured Data". ndjson.org. Archived from the original on 2023-12-18. Retrieved 2025-10-22.
- ^ ndjson/ndjson-spec, ndjson, 2025-10-18, retrieved 2025-10-22
- ^ ndjson. "Update specification_draft2.md · ndjson/ndjson-spec@c658c26". GitHub. Retrieved 2025-10-22.
- ^ "JSON Lines". jsonlines.org. Retrieved 2025-10-22.
- ^ "JSON Lines |On The Web". jsonlines.org. Retrieved 2025-10-22.
- ^ "Newline Delimited JSON". Jimbo JW. Archived from the original on 2015-12-22.
- ^ "NDJSON - Newline delimited JSON". GitHub. 2 June 2021.
- ^ "Centralized Logging with Monolog, Logstash, and Elasticsearch".
- ^ "Package org.eclipse.rdf4j.rio.ndjsonld". Eclipse Foundation. Retrieved 1 May 2023.
- ^ "Introduce RDFParser and RDFWriter implementation for Newline Delimited JSON-LD format". rdf4j Github repository. February 2021. Retrieved 1 May 2023.
JSON streaming
View on Grokipedia
In newline-delimited streaming, each JSON object is written on a single line terminated by a line feed (\n), with parsers accepting either \n or \r\n as delimiters while ignoring empty lines.[3] This format ensures low memory usage and supports UTF-8 encoding, prohibiting internal newlines or carriage returns within objects to maintain stream integrity.[3]
Streaming JSON processing is implemented across various programming ecosystems through dedicated APIs. In Java, the JSON Processing API (JSON-P, JSR 353) provides a streaming model via JsonParser for forward-only, event-based reading and JsonGenerator for writing, optimizing for large datasets in resource-limited settings.[2] Similarly, the C++ library RapidJSON employs stream concepts like FileReadStream and custom input interfaces to parse JSON from files or networks using fixed-size buffers (e.g., 64 KB), supporting encodings like UTF-8 and reducing memory overhead for massive documents.[1] These implementations enable applications in big data analytics, real-time APIs, and IoT systems where full JSON loading would be inefficient or impossible.[1]
Introduction
Definition and Motivation
JSON streaming refers to methods for transmitting and parsing individual JSON objects incrementally over stream-oriented protocols such as TCP, enabling processing without buffering the entire payload in memory. This approach builds on JSON's structure as a lightweight, text-based, language-independent data interchange format derived from the JavaScript Programming Language Standard, which uses UTF-8 encoding and defines primitives like strings, numbers, booleans, and null, alongside structured types such as objects and arrays.[4] Standard JSON parsing requires the complete text to be available upfront, as parsers transform the entire input into an in-memory representation, which limits its suitability for continuous or voluminous data flows.[4]
The primary motivation for JSON streaming arises from the limitations of traditional JSON handling in scenarios involving large datasets, real-time feeds, or resource-constrained environments. Conventional JSON lacks inherent delimitation for multiple objects in a stream, necessitating the full payload to resolve nested structures and boundaries, which can lead to high memory usage and latency, critical issues for applications like log aggregation or sensor data ingestion where payloads may reach gigabytes. By allowing chunked transmission and on-the-fly parsing, JSON streaming supports low-latency processing, reduces memory footprint, and facilitates efficient data pipelines in big data ecosystems.[5][6]
JSON streaming emerged in the early 2010s amid the rapid growth of big data technologies, which amplified the need for scalable data interchange beyond batch-oriented processing. Initial informal applications appeared around 2010 in areas like log streaming, where developers sought ways to handle continuous JSON outputs without full buffering. This practice was formalized in specifications such as RFC 7464, published in 2015, which introduced JSON Text Sequences for serializing indeterminate-length sequences, particularly for logfiles and similar use cases.[5]
Comparison to Traditional JSON Processing
Traditional JSON processing typically involves loading an entire document into memory before parsing, which can lead to significant memory overhead and latency, particularly for large datasets such as gigabyte-scale files.[7] This approach requires building a complete in-memory representation of the JSON structure, often resulting in memory usage that scales linearly with the document size, O(n), where n is the total data volume.[8] For instance, parsing a conventional JSON array containing millions of objects may consume 4 to 6 times the file's on-disk size in RAM due to object allocations and buffering.[9]
In contrast, JSON streaming enables partial and incremental parsing, processing data object-by-object or record-by-record as it arrives, which reduces memory complexity to O(1) per object and supports indefinite-length streams without predefined boundaries.[7] Unlike traditional methods that demand the full document for validation and traversal, streaming formats allow discarding processed portions immediately, avoiding the need to retain the entire structure in memory.[10] This facilitates real-time handling of unbounded data flows, where traditional JSON would require buffering until completion, potentially causing out-of-memory errors.[8]
Performance benefits are evident in benchmarks, where streaming can achieve up to 99% memory savings for large files; for example, processing a multi-gigabyte CityJSON dataset requires 3,743 MB with traditional parsing but only 15 MB using a streaming variant, due to line-by-line feature extraction without loading full vertex lists.[10] Latency improvements arise from on-the-fly processing, enabling sub-second responses for incoming data versus post-load delays in batch methods, which is critical for high-volume ingestion.[7]
Use cases highlight these distinctions: traditional JSON suits static, finite payloads like API responses from batch endpoints, where complete loading ensures structural integrity, while streaming excels in dynamic scenarios such as live stock tickers or log aggregation, where continuous, low-latency updates prevent bottlenecks.[8]
Approaches
Newline-Delimited JSON
Newline-delimited JSON, also known as NDJSON or JSON Lines (JSONL), is a format for representing a sequence of JSON objects where each object occupies a single line in a text file or stream, terminated by a line feed character (LF, or \n in ASCII, 0x0A).[3] This approach ensures that individual JSON objects are self-contained and separated solely by newlines, with optional carriage returns (CR, \r) preceding the LF for compatibility with different line-ending conventions, such as \r\n on Windows systems.[3] Unlike traditional JSON arrays, newline-delimited JSON does not wrap objects in an enclosing array or permit trailing commas between lines, and each object ends with a newline character. All content must be encoded in UTF-8, and no newlines or carriage returns are allowed within the JSON objects themselves to maintain line boundaries.[11]
The mechanics of this format facilitate straightforward parsing by reading one line at a time, allowing each line to be independently validated and processed as a complete JSON object according to RFC 8259. Parsers typically ignore empty lines, though this behavior should be explicitly documented for consistency across implementations. If a line contains invalid JSON, parsers raise an error for that specific object without affecting the rest of the stream, enabling resilient processing in streaming scenarios.[3]
One key advantage of newline-delimited JSON is its human readability, as the line-based structure makes it easy to inspect and edit in standard text editors without disrupting the overall format. It also integrates seamlessly with Unix command-line tools such as grep, awk, and sed, which operate on line-oriented input, enabling efficient filtering, transformation, and analysis of large datasets in pipelines. This compatibility extends to memory-efficient streaming applications, where objects can be processed incrementally without loading the entire dataset into memory, isolating errors to individual lines and supporting partial recovery from corrupted streams.[8]
However, compatibility issues arise when JSON objects include unescaped newlines, such as within string values, which would violate the single-line requirement and cause parsing failures across line boundaries. Implementations must enforce strict adherence to this rule, rejecting any object that spans multiple lines, which limits its use for JSON containing multiline strings unless those newlines are properly escaped as \n. Additionally, while widely supported, the lack of a registered IANA media type means that some tools may not natively recognize it, requiring explicit configuration.[3][8]
For example, a simple log entry in newline-delimited JSON might appear as:
{"timestamp":"2025-11-10T12:00:00Z","event":"login","user_id":123}
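Producing such a stream is a matter of serializing each record compactly and appending a newline. A minimal Python sketch (the record contents are illustrative) that writes one event per line:
import json
import sys

def write_ndjson(records, out):
    # Write each record as one compact JSON object per line.
    for record in records:
        # With the default indent of None, json.dumps emits no raw newlines,
        # so the line boundary is preserved; compact separators avoid spaces.
        out.write(json.dumps(record, separators=(",", ":"), ensure_ascii=False) + "\n")

write_ndjson([{"event": "login", "user_id": 123},
              {"event": "logout", "user_id": 123}], sys.stdout)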
Record Separator-Delimited JSON
Record separator-delimited JSON, also known as JSON text sequences, employs the ASCII record separator character (RS, 0x1E) to delimit individual JSON objects within a stream. Each JSON text in the sequence is encoded in UTF-8, prefixed by an RS byte, and terminated by an ASCII line feed (LF, 0x0A), forming a repeatable structure of RS followed by a complete JSON text and LF. This approach permits arbitrary whitespace, including newlines, within each JSON text, as the delimiters are independent of line boundaries.[5]
Unlike newline-delimited JSON, which assumes no unescaped newlines within JSON objects and can break with pretty-printed formatting, record separator-delimited JSON handles embedded newlines gracefully through its fixed control character prefixes. This makes it more robust for scenarios involving formatted or multi-line JSON texts, enabling support for pretty-printed structures without delimiter ambiguity. Additionally, the binary-safe RS delimiter ensures reliable parsing in streaming contexts, as it avoids conflicts with common text characters, and the LF suffix serves as a "canary" to detect and recover from truncated top-level values like numbers or booleans.[5]
However, the use of non-printable control characters like RS reduces human readability, as the stream appears garbled in standard text editors without specialized rendering. Parsing requires tools capable of handling binary data, which may complicate integration with purely text-based processors.[5]
An example sequence with two JSON objects might appear as follows, where <RS> represents 0x1E and <LF> represents 0x0A:
<RS>{"key": "value"}<LF><RS>{"next": "object", "array": [1, 2, 3]}<LF>
Concatenated JSON
Concatenated JSON refers to a method of streaming where multiple complete JSON values, such as objects or arrays, are placed sequentially without any delimiters, separators, or additional formatting between them. In this approach, specialized parsers detect the boundaries between individual JSON values by leveraging the syntactic structure of JSON itself; for instance, the completion of a top-level object (marked by a closing brace, }) signals the end of one value, while the start of the next (e.g., an opening brace, {, for another object) indicates a new one. This relies on the parser operating in a streaming mode that can incrementally process input and recognize multiple top-level values, rather than expecting a single root element as per the standard JSON specification (RFC 8259).[12][13]
A simple example illustrates this: the input string {"a":1}{"b":2} is parsed as two distinct top-level objects, {"a":1} and {"b":2}, without requiring explicit separation. Such parsers emit each complete value as it is detected, enabling real-time processing in streaming scenarios.[12]
The primary advantages of concatenated JSON include its minimal overhead, as no extra bytes are introduced for delimiters or whitespace beyond what is already present in the JSON values themselves, making it compatible with existing pretty-printed JSON output. This simplicity allows for straightforward concatenation on the producer side, often just by appending serialized JSON strings directly.[13][14]
However, this method presents challenges, as it demands parsers explicitly designed to support streaming and multiple top-level values—standard JSON parsers will typically fail or only process the first value, treating the rest as invalid syntax. Ambiguity arises with malformed input, such as trailing commas (e.g., {"a":1,}{"b":2}), which violate JSON syntax and prevent boundary detection, potentially causing parsing errors or incomplete results.[12][15]
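When concatenated JSON arrives over a network in arbitrary chunks, a receiver typically buffers the input and retries decoding until a complete value is available. The following Python sketch (illustrative only, built on the standard library's json.JSONDecoder) shows one way to do this; note that it treats any decode error as "incomplete", whereas a production parser would distinguish truncated from malformed input:
import json

class ConcatenatedJSONReceiver:
    # Accumulate text chunks and emit each completed top-level JSON value.
    def __init__(self):
        self._decoder = json.JSONDecoder()
        self._buffer = ""

    def feed(self, chunk):
        # Add a chunk of text and return any newly completed values.
        self._buffer += chunk
        values = []
        while True:
            candidate = self._buffer.lstrip()
            if not candidate:
                self._buffer = ""
                break
            try:
                value, end = self._decoder.raw_decode(candidate)
            except json.JSONDecodeError:
                self._buffer = candidate   # value still incomplete; wait for more data
                break
            values.append(value)
            self._buffer = candidate[end:]
        return values

rx = ConcatenatedJSONReceiver()
print(rx.feed('{"a":1}{"b"'))   # [{'a': 1}] -- second object is still incomplete
print(rx.feed(':2}'))           # [{'b': 2}]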
Concatenated JSON has been common in early streaming attempts and legacy systems, where the need for low-overhead serialization preceded the adoption of more structured formats like NDJSON.[16]
Length-Prefixed JSON
Length-prefixed JSON is a streaming format where each JSON object is preceded by a prefix specifying the exact byte length of the subsequent JSON data, facilitating unambiguous message boundaries in continuous data flows. The prefix can take various forms, such as a human-readable decimal string (e.g., "15" followed by the 15-byte JSON object {"key":"value"}) or a compact binary representation like a 32-bit unsigned integer in native byte order, with the length value excluding the prefix itself. This approach ensures that parsers can immediately determine the extent of each message without scanning for syntactic delimiters.[17]
This technique offers precise boundaries, making it particularly efficient for binary protocols and network communications where messages may arrive in fragmented chunks, such as over TCP. By explicitly stating the length upfront, it accelerates parsing by avoiding the need to process the entire JSON structure to find endpoints, and it supports high-performance scenarios by enabling direct jumps to message starts when offsets are known, thus facilitating random access within streams. In contrast to simpler concatenation methods that rely on JSON syntax for boundary detection, length-prefixing provides reliability in ambiguous or mixed-content streams.[17][18][19]
However, the format introduces overhead from the prefix—typically 4 bytes for a binary integer or more for textual representations—and requires an initial integer parsing step before JSON decoding, which can add minor computational cost in low-latency environments. An illustrative example appears in the Native Messaging protocol used by web browsers like Chrome and Firefox, where each message consists of a uint32 length prefix followed by UTF-8 encoded JSON, supporting bidirectional communication between extensions and native applications via standard input/output.[20][21]
Since around 2012, length-prefixed JSON has been employed in high-throughput systems, including message queues and real-time protocols, to handle efficient, scalable data exchange in distributed environments.[17]
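A minimal Python sketch of the uint32-prefixed framing described above (native byte order, as in browser native messaging; the in-memory stream and message content are illustrative):
import io
import json
import struct

def write_message(stream, obj):
    # Write a native-byte-order uint32 length prefix followed by UTF-8 JSON.
    payload = json.dumps(obj).encode("utf-8")
    stream.write(struct.pack("=I", len(payload)))
    stream.write(payload)

def read_message(stream):
    # Read one framed message; return None at end of stream.
    prefix = stream.read(4)
    if len(prefix) < 4:
        return None
    (length,) = struct.unpack("=I", prefix)
    return json.loads(stream.read(length).decode("utf-8"))

buf = io.BytesIO()
write_message(buf, {"key": "value"})
buf.seek(0)
print(read_message(buf))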
Standards and Specifications
RFC 7464 and JSON Sequences
RFC 7464, published by the Internet Engineering Task Force (IETF) in February 2015, formally defines the JavaScript Object Notation (JSON) text sequence format along with the associated media type application/json-seq. This specification addresses the need for a standardized way to stream multiple JSON objects over protocols that operate on octet streams, such as TCP, by delimiting each JSON text with control characters to enable unambiguous parsing without requiring prior knowledge of object boundaries.[7]
The core structure of a JSON text sequence consists of one or more JSON texts, each encoded exclusively in UTF-8, prefixed by an ASCII Record Separator (RS, hexadecimal 0x1E), and immediately followed by an ASCII Line Feed (LF, hexadecimal 0x0A). This RS delimiter ensures that each JSON object can be isolated and processed incrementally, even in the presence of streaming interruptions. Parsers should continue parsing after encountering invalid JSON by skipping to the next RS delimiter. The format mandates strict adherence to JSON syntax per RFC 7159 (the JSON specification current at the time of publication; see also RFC 8259 for updates) and prohibits any non-UTF-8 encodings or extraneous characters between delimiters.[7]
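The recovery behaviour described above can be sketched in Python by splitting on RS and letting a parse failure affect only the element in which it occurs (a simplified illustration, not a complete RFC 7464 parser):
import json

RS, LF = b"\x1e", b"\x0a"

def parse_json_text_sequence(data):
    # Yield (ok, value_or_raw_bytes) for each element of a JSON text sequence.
    for element in data.split(RS):
        element = element.strip(LF + b" ")
        if not element:
            continue                        # nothing before the first RS
        try:
            yield True, json.loads(element.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            yield False, element            # report the bad text, resume at the next RS

seq = b'\x1e{"ok":1}\n\x1e{"broken":\n\x1e{"ok":2}\n'
for ok, value in parse_json_text_sequence(seq):
    print(ok, value)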
Since its publication, RFC 7464 has seen adoption in various software tools and libraries beginning in 2016, with implementations available in languages such as Python and JavaScript for generating and consuming JSON text sequences in logging and data transfer scenarios. It has also been incorporated into IETF protocols, most notably as the foundational format for RFC 8142, which extends it to GeoJSON text sequences for streaming geospatial data.[22][23]
As of November 2025, RFC 7464 remains unchanged without major revisions, continuing to influence contemporary standards; for instance, it is referenced in the OpenAPI Specification version 3.2.0 (released September 2025) to support application/json-seq as a media type for specifying streaming responses in API documentation, allowing schemas to be applied to individual sequence items.[7][24][25]
NDJSON and JSON Lines Formats
NDJSON, or Newline Delimited JSON, is an informal specification for streaming JSON data introduced in 2014 through the ndjson-spec repository.[3] It requires that each line of the file or stream consist of exactly one valid JSON object, with lines separated by a newline character (\n), optionally preceded by a carriage return (\r\n), and no commas between objects.[3] The format mandates UTF-8 encoding without a byte order mark, and JSON texts within lines must not contain newlines or carriage returns.[3] Parsers are permitted to ignore empty lines, though this behavior should be configurable and documented.[3] A representative NDJSON file might appear as follows:
{"name": "Alice", "age": 30}
{"name": "Bob", "age": 25}
{"name": "Charlie", "age": 35}
The closely related JSON Lines (JSONL) format likewise separates values with a newline (\n, supporting \r\n), but it prohibits blank lines entirely, requiring every line to contain a valid JSON value.[26] A trailing newline after the last JSON value is strongly recommended but not strictly required, distinguishing it from NDJSON's stricter enforcement where each JSON object must be followed by a newline.[26] JSON Lines also avoids wrapping the entire content in an array or using commas between records.[26] An example JSON Lines file is:
{"name": "Alice", "age": 30}
{"name": "Bob", "age": 25}
{"name": "Charlie", "age": 35}
Applications
Real-Time Data Streaming
JSON streaming plays a pivotal role in real-time data applications by enabling continuous, low-latency delivery of structured data over protocols such as Server-Sent Events (SSE) and WebSockets. In SSE, servers push unidirectional streams of JSON events to clients, ideal for live updates like notifications or dashboard refreshes, where each event is a complete JSON object prefixed with "data:" and terminated by double newlines. WebSockets, supporting bidirectional communication, facilitate JSON message exchanges in interactive scenarios, such as chat applications where user messages are serialized as JSON payloads and broadcast to connected clients in real time. These mechanisms ensure efficient handling of dynamic data flows without the overhead of repeated HTTP requests.[28]
A prominent example is the X API v2's filtered stream endpoint (formerly Twitter API), which delivers real-time post data using newline-delimited JSON (NDJSON) since its launch in 2020, allowing developers to receive matching posts as they occur via a persistent HTTP connection. This format suits high-volume live feeds, as each line represents a self-contained JSON object for easy parsing and processing on the fly. NDJSON's readability and simplicity make it particularly effective for such streaming use cases.[29]
In artificial intelligence and machine learning pipelines, JSON streaming supports real-time inference by transmitting responses in chunks, reducing wait times for large outputs from models like large language models. A key application is using JSON structures in LLM outputs for structured content, such as narratives, where JSON ensures a parseable structure (e.g., for chapters or image prompts), promotes consistency across generations, facilitates integration with application pipelines, and reduces parsing errors in downstream processing.[30][31][32] JSON streaming enables efficient handling of such outputs in real-time applications by delivering partial results progressively, allowing for immediate processing or display.
The OpenAPI Specification version 3.2.0, released in September 2025, introduces native support for streaming responses through media types such as application/jsonl (JSON Lines) and application/x-ndjson, allowing API specifications to define chunked JSON delivery for progressive rendering of inference results. This enables applications to display partial outputs, such as generated text, as they become available, enhancing user experience in interactive AI tools.[24][33]
Length-prefixed JSON is a general method used in TCP-based streaming applications to achieve low latency by indicating message size upfront, allowing clients to delineate payloads in continuous byte streams without excessive buffering; this technique is applicable to high-frequency scenarios like financial data feeds.[34]
As of 2025, JSON streaming has integrated deeply with serverless functions in edge computing platforms, enabling scalable, globally distributed real-time streams. Edge functions on services like Vercel and Cloudflare Workers process and forward JSON data near users, supporting low-latency applications in IoT sensor feeds and live analytics by executing streaming logic without traditional server management. This convergence allows developers to deploy resilient streams that adapt to traffic spikes while maintaining data freshness across regions.[34]
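As a simplified illustration of the SSE framing mentioned above, the following Python sketch formats JSON objects as text/event-stream events (a "data:" line followed by a blank line); the event objects are purely illustrative:
import json

def sse_events(objects):
    # Format each object as one Server-Sent Events frame carrying JSON.
    for obj in objects:
        payload = json.dumps(obj, separators=(",", ":"))
        yield "data: " + payload + "\n\n"   # the blank line terminates the event

for frame in sse_events([{"symbol": "XYZ", "price": 101.5},
                         {"symbol": "XYZ", "price": 101.7}]):
    print(frame, end="")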
Logging and Batch Processing
In logging applications, newline-delimited JSON (NDJSON) is widely adopted for structuring server logs, where each line represents a complete JSON object containing key fields such as timestamp, log level, and message. For instance, a typical entry might appear as {"time":"2023-04-15T15:13:17Z","level":"error","msg":"Connection timeout detected"}, allowing for easy appending of new events without invalidating the file structure.[35] This format facilitates real-time monitoring tools like tail -f, which can process lines incrementally without requiring full file parsing, making it suitable for persistent log storage in production environments.[26][36]
In extract-transform-load (ETL) pipelines for big data ingestion, concatenated or record-separator-delimited JSON streams are employed to handle high-volume data flows efficiently. Tools such as Apache NiFi use processors like SplitJson and MergeContent to break down concatenated JSON arrays into individual records or merge multiple small JSON files before routing to destinations like Apache Kafka topics.[37][38] Similarly, Kafka streams often ingest NDJSON payloads, where messages are delimited by newlines, enabling scalable processing in distributed systems without the overhead of schema enforcement for each record.[39][40] This approach supports batch-oriented ingestion from sources like databases or sensors, ensuring data integrity across clusters.
For batch exports, such as database dumps, JSON Lines (JSONL) format is preferred to mitigate the issues of monolithic JSON files that can become unwieldy at scale. Each record in a JSONL file is a standalone JSON object on its own line, allowing exports from relational databases like SQL Server to produce files that can be processed sequentially without loading the entire dataset into memory.[41] Analytics platforms like Splunk leverage this format for ingesting log exports, where tools automatically parse line-delimited JSON events to index fields for querying, avoiding bloat from nested arrays in single-object exports.[42][43]
At scale, JSON streaming in these contexts enables handling terabyte-scale log volumes without out-of-memory (OOM) errors by utilizing streaming parsers that process data incrementally. Libraries like Python's ijson or Node.js stream transformers read NDJSON files line-by-line, maintaining constant memory usage regardless of file size, which is critical for post-hoc analysis of massive persistent logs.[44][45] This contrasts with traditional full-load parsing, which would fail on datasets exceeding available RAM, and complements real-time streaming by focusing on durable, batch-retrieval scenarios.
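A minimal Python sketch of this kind of constant-memory batch analysis (the log path and the "level" field are illustrative) tallies log levels from an NDJSON file of arbitrary size:
import json
from collections import Counter

def count_levels(path):
    # Stream an NDJSON log file line by line and tally the "level" field.
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                counts[json.loads(line).get("level", "unknown")] += 1
    return counts

print(count_levels("server.log.ndjson"))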
Tools and Libraries
Command-Line and General Tools
One of the most widely used command-line tools for processing JSON streams is jq, a lightweight utility designed for slicing, filtering, and transforming JSON data akin to Unix tools like sed and awk. jq applies its filter to each input value in turn, which makes newline-delimited JSON (NDJSON) and concatenated JSON straightforward to filter; in addition, since version 1.5, released in 2015, jq has included the --stream option, which parses a single large JSON text as a stream of events so it can be processed without loading the entire document into memory. For example, to process an NDJSON file and select objects where a field meets a condition, one can use:
jq -c 'select(.id > 10)' input.ndjson
The json-stream-es library, available for Node.js and browser environments, provides a streaming JSON parser and stringifier based on web streams, supporting formats like record separator-delimited JSON for both reading and writing streams. It can be invoked via Node for CLI-like usage, such as piping input to parse concatenated or delimited JSON without full materialization. For instance:
node -e "require('json-stream-es').parse().pipe(process.stdout).on('data', console.log)" < input.json
Streaming HTTP APIs can be consumed in a pipeline by combining curl with jq; for example, to filter a live NDJSON feed:
curl -N -H "Accept: application/x-ndjson" https://api.example.com/stream | jq 'select(.status == "active")'
The -N flag disables buffering in curl, ensuring continuous flow to jq for low-latency handling of live data.[48][49]
General utilities for NDJSON manipulation include the ndjson-cli suite, a set of command-line tools for operating on newline-delimited JSON streams. The ndjson-cat command concatenates input files or stdin into a single NDJSON stream, removing internal newlines from pretty-printed JSON to produce clean delimited output. Conversely, ndjson-split expands an array in the input stream into multiple NDJSON lines based on a JavaScript expression. An example workflow to convert a JSON array file to NDJSON:
ndjson-cat array.json | ndjson-split 'd.features' > features.ndjson
This expands the features array into individual lines, enabling efficient manipulation of large datasets in pipelines. These tools integrate seamlessly with jq or curl for broader streaming workflows.[50]
In 2025, jq received significant updates with version 1.8.0, released on June 1, enhancing performance for parsing and streaming operations through optimizations in the core engine, including better handling of large inputs and reduced memory usage in stream mode. These improvements make jq even more suitable for processing high-volume JSON streams on the command line, though no direct integration with external backends like simdjson[51] was added.[52][53]
