Jq (programming language)
| jq | |
|---|---|
| Paradigms | Purely functional, JSON-oriented processing, tacit |
| Designed by | Stephen Dolan |
| First appeared | August 21, 2012 |
| Stable release | 1.8.1[1] |
| Typing discipline | dynamic |
| Memory management | automatic |
| Scope | lexical |
| Implementation language | C |
| Platform | Cross-platform[2] |
| OS | Cross-platform[note 1] |
| License | MIT |
| Website | jqlang |
jq is a widely used command-line utility and very high-level, functional, domain-specific programming language designed for processing JSON data. jq filters its input data to produce modified output in a manner similar to AWK or sed, but operates on JSON values rather than lines. In jq, programs consist of "filters" that can be composed into pipelines that perform a variety of operations on their inputs.[3][4]
History
jq was created by Stephen Dolan and released in October 2012.[5][6] It was described as being "like sed for JSON data".[7] Support for regular expressions was added in jq version 1.5.
The original implementation of jq was in Haskell[8] before being ported to C.
Usage
Command-line usage
jq is typically used at the command line and can be used with other command-line utilities, such as curl. Here is an example showing how the output of a curl command can be piped to a jq filter to determine the category names associated with this Wikipedia page:
$ curl -s 'https://en.wikipedia.org/w/api.php?action=parse&page=jq_(programming_language)&format=json' | jq '.parse.categories[]."*"'
The output produced by this pipeline consists of a stream of JSON strings, the first few of which are:
"Articles_with_short_description"
"Short_description_matches_Wikidata"
"Dynamically_typed_programming_languages"
"Functional_languages"
"Programming_languages"
"Programming_languages_created_in_2012"
"Query_languages"
"2012_software"
The curl command above uses the MediaWiki API for this page to produce a JSON response.
The pipe (|), a standard Unix shell mechanism, passes the output of curl to jq.[9]
The jq filter shown is an abbreviation for the jq pipeline:
.["parse"] | .["categories"] | .[] | .["*"]
This corresponds to the nested JSON structure produced by the call to curl. Notice that the jq pipeline is constructed with the | character in the same manner as the Unix pipeline.
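The equivalence of the abbreviated and expanded forms can be checked locally; the sample document below merely mimics the shape of the API response (illustrative data, not the live API output):

```shell
# Sample JSON with the same nesting as the MediaWiki API response (illustrative).
json='{"parse":{"categories":[{"*":"Query_languages"},{"*":"2012_software"}]}}'

# Abbreviated filter:
echo "$json" | jq '.parse.categories[]."*"'

# Fully expanded pipeline; prints the same strings:
echo "$json" | jq '.["parse"] | .["categories"] | .[] | .["*"]'
```

Both commands emit "Query_languages" and "2012_software", one JSON string per line.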
Embedded usage
jq provides a C API, libjq, allowing it to be used from C.[4]
Modes of operation
jq by default acts as a "stream editor" for JSON inputs, much as the sed utility can be thought of as a "stream editor" for lines of text. However, jq has several other modes of operation:
- it can treat its input from one or more sources as lines of text;
- it can gather a stream of inputs from a specified source into a JSON array;
- it can parse its JSON inputs using a so-called "streaming parser" that produces a stream of [path, value] arrays for all "leaf" paths.
The streaming parser is very useful when one or more of the JSON inputs is too large to fit in memory, since its memory needs are usually quite small. For example, for an arbitrarily large array of JSON objects, the peak memory need is little more than is needed to handle the largest top-level object.
These modes of operation can, within certain limitations, be combined.
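The streaming parser's [path, value] output can be observed directly with the --stream option; a small array is enough to show the shape of the stream:

```shell
# --stream emits a [path, value] pair for each leaf, plus a [path] event
# marking the close of each array or object:
echo '[1,2]' | jq -c --stream '.'
```

This prints `[[0],1]`, `[[1],2]`, and finally the closing event `[[1]]`, each on its own line.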
Syntax and semantics
Types
Every JSON value is also a value in jq, which accordingly has the data types shown in the table below.[4]
| Type | Examples |
|---|---|
| "number" | 42, 3.14 |
| "string" | "hello" |
| "boolean" | true, false |
| "array" | [1, "two", [3]] |
| "object" | {"name": "value"} |
| "null" | null |
null is a value, just like any other JSON scalar; it is not a pointer or a "null pointer".
nan (corresponding to NaN) and infinite (see IEEE 754) are the only two jq scalars that are not also JSON values.
Forms
Special syntactic forms exist for function creation, conditionals, stream reduction, and the module system.
Filters
Here is an example of defining a named, parameterized filter for formatting an integer in any base from 2 to 36 inclusive. The implementation illustrates tacit (or point-free) programming:
def tobase($b):
def digit: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"[.:.+1];
def mod: . % $b;
def div: ((. - mod) / $b);
def digits: recurse( select(. >= $b) | div) | mod ;
select(2 <= $b and $b <= 36)
| [digits | digit] | reverse | add;
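The filter can be exercised from a shell; the definition above is reproduced inline and applied to the input 255 in two bases:

```shell
# Reproduces the tobase definition from the article and applies it to 255.
prog='def tobase($b):
  def digit: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"[.:.+1];
  def mod: . % $b;
  def div: ((. - mod) / $b);
  def digits: recurse( select(. >= $b) | div) | mod ;
  select(2 <= $b and $b <= 36)
  | [digits | digit] | reverse | add;
tobase(2), tobase(16)'

echo 255 | jq -r "$prog"
```

This prints 11111111 (base 2) followed by FF (base 16).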
The next example demonstrates the use of generators in the classic "SEND MORE MONEY" verbal arithmetic game:
def send_more_money:
def choose(m;n;used): ([range(m;n+1)] - used)[];
def num(a;b;c;d): 1000*a + 100*b + 10*c + d;
def num(a;b;c;d;e): 10*num(a;b;c;d) + e;
first(
1 as $m
| 0 as $o
| choose(8;9;[]) as $s
| choose(2;9;[$s]) as $e
| choose(2;9;[$s,$e]) as $n
| choose(2;9;[$s,$e,$n]) as $d
| choose(2;9;[$s,$e,$n,$d]) as $r
| choose(2;9;[$s,$e,$n,$d,$r]) as $y
| select(num($s;$e;$n;$d) + num($m;$o;$r;$e) ==
num($m;$o;$n;$e;$y))
| [$s,$e,$n,$d,$m,$o,$r,$e,$m,$o,$n,$e,$y] );
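Running the generator-based search (definition reproduced inline, with null input since the program takes no data) yields the puzzle's unique solution, SEND = 9567, MORE = 1085, MONEY = 10652:

```shell
# Reproduces the send_more_money definition from the article and runs it.
jq -nc '
def send_more_money:
  def choose(m;n;used): ([range(m;n+1)] - used)[];
  def num(a;b;c;d): 1000*a + 100*b + 10*c + d;
  def num(a;b;c;d;e): 10*num(a;b;c;d) + e;
  first(
    1 as $m | 0 as $o
    | choose(8;9;[]) as $s
    | choose(2;9;[$s]) as $e
    | choose(2;9;[$s,$e]) as $n
    | choose(2;9;[$s,$e,$n]) as $d
    | choose(2;9;[$s,$e,$n,$d]) as $r
    | choose(2;9;[$s,$e,$n,$d,$r]) as $y
    | select(num($s;$e;$n;$d) + num($m;$o;$r;$e) ==
             num($m;$o;$n;$e;$y))
    | [$s,$e,$n,$d,$m,$o,$r,$e,$m,$o,$n,$e,$y] );
send_more_money'
```

The output is the digit sequence [9,5,6,7,1,0,8,5,1,0,6,5,2].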
Related software
jq has inspired several clones and similar programs.
gojq reimplements much of jq in Go.[10] It also supports processing YAML files.
jaq is a Rust implementation of jq developed using denotational semantics to formalize its behavior in cases where the original jq's documentation is unclear or does not match its behavior.[10]
Mike Farah's yq is a jq-like program that supports several file formats, including JSON, YAML, and XML.[11] Its syntax is not fully compatible with jq.[10]
Andrey Kislyuk's yq provides a collection of wrapper scripts that use jq to process YAML, XML, or TOML files.[12][13]
Notes
References
Bibliography
- Janssens, Jeroen (2021). Data Science at the Command Line. O'Reilly Media. ISBN 9781492087885.
- Janssens, Jeroen (2014). Data Science at the Command Line: Facing the Future with Time-Tested Tools. O'Reilly Media. ISBN 9781491947807.
- Marrs, Tom (2017). JSON at Work: Practical Data Integration for the Web. O'Reilly Media. ISBN 9781491982419.
Others
- ^ https://github.com/jqlang/jq/blob/master/NEWS.md
- ^ a b "Download jq". jq. Archived from the original on 10 September 2025. Retrieved 27 September 2025.
- ^ Homer, Michael (19 October 2023). Branching Compositional Data Transformations in jq, Visually. Proceedings of the 2nd ACM SIGPLAN International Workshop on Programming Abstractions and Interactive Notations, Tools, and Environments. New York: Association for Computing Machinery. pp. 11–16. doi:10.1145/3623504.3623567. ISBN 9798400703997. Retrieved 27 September 2025.
- ^ a b c "jq 1.8 Manual". jq. Archived from the original on 16 September 2025. Retrieved 27 September 2025.
- ^ Janssens 2014.
- ^ "jqlang". GitHub: jqlang. Retrieved January 6, 2023.
- ^ "like sed". Archived from the original on 2013-04-14.
- ^ "Initial: jqlang/Jq@eca89ac". GitHub: jqlang.
- ^ "Tutorial". jq. Retrieved January 6, 2023.
- ^ a b c Färber, Michael (2023). "Denotational Semantics and a fast interpreter for jq". arXiv:2302.10576 [cs.LO].
- ^ Farah, Mike. "yq". yq. Archived from the original on 20 August 2025. Retrieved 27 September 2025.
- ^ Asher, Ben (9 January 2022). "Querying JSON and XML with jq and xq". Ashby. Archived from the original on 29 April 2025. Retrieved 27 September 2025.
- ^ Andrey, Kislyuk. "yq: Command-line YAML/XML/TOML processor - jq wrapper for YAML, XML, TOML documents". yq documentation. Archived from the original on 26 August 2025. Retrieved 27 September 2025.
External links
- Official website
- FAQ on GitHub – jq FAQ
- The jq Programming Language page on the Rosetta Code comparative programming tasks project site
Jq (programming language)
jq is a command-line tool and functional language for filtering and transforming JSON data, often likened to Unix text-processing utilities such as sed, awk, and grep.[1][2] Developed by Stephen Dolan and first released in October 2012, it is implemented in portable C with zero runtime dependencies, allowing it to run as a single binary across various platforms without additional installations.[3][4]
The language operates on the principle of filters, where each jq program acts as a filter that takes JSON input and produces JSON output, processing data as a stream of values rather than loading entire structures into memory.[5] Key features include a pipeline operator (|) for chaining transformations, support for generators that produce multiple outputs from a single input (such as iterating over arrays with .[]), and built-in functions for operations like length calculation, mapping, and error handling via try/catch.[6][7] It handles standard JSON types—numbers, strings, booleans, arrays, objects, and null—while extending functionality with datetime, mathematical, and regular expression capabilities introduced in later versions.[8]
Originally hosted under Stephen Dolan's personal GitHub repository (stedolan/jq), maintenance transitioned to the community-driven jqlang organization in 2023 to ensure ongoing development amid Dolan's shift to other projects.[4] As of July 2025, the latest stable release is version 1.8.1, which includes enhancements like a module system (since 1.5), improved streaming parsing for large files, and recursive function definitions for complex data manipulations.[9] Widely adopted in scripting, DevOps, and data analysis workflows, jq emphasizes composability and simplicity, enabling concise expressions for tasks such as extracting fields (e.g., .field) or reformatting nested objects without requiring full-fledged programming environments.[10][11]
History and Development
Origins and Creation
jq was created by Stephen Dolan in 2012 as a Haskell prototype aimed at providing a lightweight tool for JSON manipulation, analogous to classic Unix utilities like sed and AWK.[12][2] The project first appeared publicly on August 21, 2012, when its GitHub repository was established, followed by the initial release in October 2012.[2] Dolan's motivations centered on enabling efficient JSON handling directly in command-line workflows, avoiding the overhead of full-fledged programming languages, while emphasizing a functional paradigm with pipeline-based data processing for stream-oriented operations.[13][14] Early development faced performance limitations in the Haskell version, particularly for large-scale JSON processing, prompting a rewrite in C to achieve greater speed and portability without sacrificing the tool's core design principles.[15]
Key Releases and Evolution
The early development of jq featured an initial prototype implemented in Haskell, which was ported to a C implementation around 2013 to enhance performance and portability.[2] Version 1.5, released on August 16, 2015, marked a significant milestone by introducing support for regular expressions via the Oniguruma library, alongside a new module system for organizing code, a streaming parser for handling large inputs efficiently, destructuring syntax, mathematical functions, try/catch error handling, and tail call optimization.[16] Subsequent releases built on these foundations, with version 1.6 arriving on November 2, 2018, to provide enhancements including new built-in functions such as walk/1 for recursive traversal, halt/0 for early termination, and access to environment variables via $ENV, as well as improvements to the module system and streaming capabilities through better shebang support and build optimizations.[17]
Version 1.7, released on September 6, 2023, followed the project's transition to the jqlang GitHub organization with new maintainers, introduced continuous integration via GitHub Actions, expanded platform builds including Docker support, and added new builtins like pick/1 along with language enhancements.[18]
Version 1.8.0, released on June 1, 2025, addressed security vulnerabilities (e.g., CVE-2024-23337), introduced new functions such as trim/0, and included language and CLI improvements.[19]
The most recent stable release, version 1.8.1 on July 1, 2025, focused on maintenance with bug fixes for security and performance issues, refined error handling in the CLI, and updates for compatibility across platforms, addressing feedback from prior versions.[20] Since the transition to the jqlang organization, jq has been governed and maintained by a dedicated community on GitHub, ensuring ongoing development and responsiveness to user contributions.[21]
Core Concepts and Usage
Command-Line Usage
jq is invoked from the command line using the syntax jq 'filter' [file], where the filter is a jq expression that processes JSON input from standard input (stdin) or a specified file, producing output to standard output (stdout).[22] For instance, to extract the value of a key named "foo" from a JSON object, one can run echo '{"foo": 42}' | jq '.foo', which outputs 42.[22] This allows jq to function as a standalone tool for quick JSON manipulations directly in the shell.[5]
Several command-line options modify jq's behavior to suit different needs. The -r option enables raw output, printing strings without surrounding quotes; for example, jq -r '.foo' <<< '{"foo": "bar"}' outputs bar instead of "bar".[22] The -c flag produces compact output on single lines, useful for scripting, as in jq -c '.[]' <<< '[1, 2, 3]', which outputs each element on its own line without indentation.[22] The -n option provides null input, allowing filters to generate output without external data, such as jq -n '{a: 1}' to create a simple JSON object.[22] For handling large files, the --stream option enables streaming mode, processing input incrementally and outputting path-value pairs like [[], "a"] for a top-level string "a".[23]
Piping integrates jq seamlessly with other shell commands, particularly for processing API responses. A common pattern is combining it with curl to fetch and filter JSON data; for example, curl 'https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Talking_Heads_discography/daily/20210928/20210930' | jq '.' retrieves and prettifies page view statistics from the Wikipedia API.[24] Similarly, curl http://api.open-notify.org/iss-now.json | jq '.' formats the current International Space Station position from a public API.[25]
jq excels in Unix pipelines for data extraction and transformation, often combined with tools like grep or sort. For instance, to read directory listings as strings and process them, ls | jq -R '.' treats each line as a raw string input.[22] More complex workflows might involve printf '1\n2\n3\n' | jq -n 'reduce inputs as $i (0; . + ($i | tonumber))' to sum numbers via piping, demonstrating jq's role in aggregating data alongside traditional Unix utilities.[22]
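The raw-input and null-input options described above compose naturally; a minimal sketch summing plain text lines (here -R treats each line as a string and -n defers all reading to the inputs builtin):

```shell
# -n: no implicit input; -R: read raw lines; inputs: stream of all lines.
# Each line is converted to a number, collected into an array, and summed.
printf '1\n2\n3\n' | jq -n -R '[inputs | tonumber] | add'
```

This prints 6, the same result as the reduce-based formulation shown above.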
Embedded and Library Usage
libjq is the C library that enables embedding jq's JSON processing capabilities directly into applications written in C or other languages that can interface with C libraries. It exposes a set of functions for parsing JSON input, compiling jq filters, and executing them to generate output, allowing developers to integrate jq's filtering and transformation features without invoking the command-line tool. The core API functions include jq_init, which creates a jq state; jv_parse (from the companion jv value library), which parses a JSON text string into jq's internal value representation; jq_compile, which takes a jq program string and compiles it into an executable form that can be reused; jq_start, which begins applying the compiled program to the input; and jq_next, which retrieves each output value, with jq_teardown for cleanup. These functions support custom input and output handling, making libjq suitable for high-performance, embedded use cases where JSON manipulation is needed within larger programs.[5]
To facilitate usage in higher-level languages, community-maintained bindings wrap libjq's C API. For Python, the pyjq library provides a simple interface to compile and run jq filters on Python objects or strings, enabling seamless JSON processing within Python applications. In Ruby, the ruby-jq gem offers bindings that allow executing jq programs on Ruby data structures, with support for custom libjq versions if needed.[26] For Node.js, bindings such as node-libjq use the C API directly to provide jq functionality in JavaScript environments, avoiding the overhead of subprocess calls.[27]
A representative example of embedding libjq in a C program involves including the <jq.h> header, initializing a jq state, parsing input JSON, compiling a filter like .users[] | .name, running it, and dumping the output. Developers can consult the libjq source code and binding implementations for complete usage patterns, as detailed documentation is primarily available through the jq repository.[2]
jq's permissive MIT license supports broad adoption, allowing libjq to be embedded in open-source projects and commercial software alike without restrictive requirements.
Operational Modes
Streaming and Parsing Modes
jq supports several operational modes to accommodate diverse input formats, sizes, and processing needs, enabling efficient handling of data without always requiring full in-memory loading or strict JSON adherence. These modes include raw input processing for non-JSON text, streaming for large-scale JSON parsing, slurp for aggregating multiple inputs, and array-based gathering for batch operations.[28] In raw mode, activated via the --raw-input (or -R) option, jq treats the input as a sequence of unparsed text lines, converting each line into a JSON string rather than attempting to parse it as JSON objects. This mode is particularly useful for processing plain text files, logs, or CSV-like data where JSON structure is absent, allowing line-by-line manipulation without parsing errors. For output, the -r flag complements this by emitting strings without surrounding quotes, facilitating integration with other text-based tools. When combined with --slurp, raw input can be gathered into a single array of strings for unified processing.[28][29]
Streaming mode, invoked with the --stream option, enables partial parsing of large JSON documents by processing them incrementally without loading the entire structure into memory. Instead of producing a complete JSON tree, it outputs a stream of arrays containing paths to leaf nodes and their corresponding values, such as [[0], "value"] for accessing elements in arrays or objects. This approach is ideal for massive files where full parsing would be resource-intensive, and it pairs well with jq's iteration constructs like foreach to handle the stream progressively. Streams represent data types as paths leading to primitives like strings or numbers, maintaining compatibility with jq's core data model.[30][28]
Slurp mode, enabled by the -s option, reads all input from files or standard input and consolidates it into a single JSON array before applying the filter. This is beneficial for scenarios involving multiple small JSON documents, such as processing several files at once, where the aggregated array serves as the input to a single filter execution. However, it can consume significant memory for very large or numerous inputs, contrasting with streaming's efficiency.[28][29]
For batch processing of multiple inputs, jq can treat them as elements of an array through slurp mode or the built-in inputs stream, which provides access to all gathered items for operations like reduction or mapping across the collection. Without slurp, jq processes each input separately and emits results sequentially, but array gathering via -s or inputs allows holistic treatment, such as summing values from disparate sources. This mechanism supports workflows like aggregating logs from multiple files into a unified dataset.[29][28]
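The contrast between slurp and raw-slurp can be sketched in two one-liners (the inputs here are arbitrary sample data):

```shell
# -s gathers the three JSON inputs into one array [1,2,3] before filtering:
printf '1 2 3' | jq -s 'add'

# -R -s yields the ENTIRE raw input as a single string; length counts
# its Unicode codepoints, including the newlines:
printf 'a\nb\n' | jq -R -s 'length'
```

The first command prints 6; the second prints 4 (two letters plus two newlines).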
Input and Output Handling
jq processes input from various sources, allowing flexibility in data ingestion. By default, jq reads JSON data from files specified as command-line arguments or from standard input (stdin) if no files are provided. Input from stdin is parsed as a series of whitespace-separated JSON values, which are processed sequentially. For scenarios where no input is needed, such as generating data from scratch, the --null-input or -n option treats the input as null, enabling computations without external data.[13]
Output is formatted as pretty-printed JSON by default, with each result emitted on stdout as a valid JSON entity, separated by newlines. To produce unquoted strings for easier integration with shell scripts or other tools, the --raw-output or -r flag can be used, which outputs string values without JSON escaping or quotes. Colorized output is enabled automatically when writing to a terminal for better readability, but this can be forced with --color-output or -C, or disabled via --monochrome-output or -M for plain text environments. Additionally, --compact-output or -c minimizes whitespace for single-line JSON, while --sort-keys or -S ensures object keys are output in sorted order.[13]
Error handling in jq emphasizes reliability in scripting contexts. When parsing invalid JSON, jq typically exits with a non-zero status and emits an error message in JSON format, such as an array detailing the location of the issue (e.g., ["Invalid literal at line 1, column 7", [1,7]]), especially when using --stream-errors. The --exit-status or -e option enhances this by setting the exit code to reflect the filter's result: 0 for successful non-empty, non-null, non-false output; 1 for false/null/empty results; and 4 if no valid input was processed, with defaults of 2 for usage errors and 3 for compilation failures. This allows jq to integrate seamlessly into pipelines and conditional scripts.[13]
For handling large inputs without loading the entire document into memory, jq supports streaming strategies that parse JSON incrementally, avoiding full materialization of the data structure. This is particularly useful for massive files or streams, where the --stream option outputs path-value pairs that can be processed iteratively using constructs like reduce or foreach.[13]
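The --exit-status behavior described above can be demonstrated directly; jq's exit code then reflects the truthiness of the last output:

```shell
# A truthy last output (anything but false/null) gives exit status 0:
echo 'true' | jq -e '.' > /dev/null; echo "exit: $?"

# A null (or false) last output gives exit status 1:
echo 'null' | jq -e '.' > /dev/null; echo "exit: $?"
```

This makes jq usable directly in shell conditionals, e.g. `if jq -e '.ok' config.json > /dev/null; then …` (config.json being a hypothetical file).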
Data Model and Types
Supported Data Types
jq supports a data model aligned with JSON standards, encompassing both primitive and composite types that enable processing of structured data streams. The primitive types form the foundational building blocks for values in jq expressions. These include null, which represents the absence of a value and has a length of 0; booleans, consisting of true and false for logical operations; numbers, implemented as IEEE 754 double-precision floating-point values to handle arithmetic with potential precision limitations; and strings, defined as sequences of Unicode codepoints with length measured by codepoint count rather than bytes.[31][32]
Composite types in jq allow for hierarchical data representation. Arrays are ordered collections of values, accessible by zero-based integer indices, with length corresponding to the number of elements; they support nesting to model lists or sequences. Objects, in contrast, are unordered sets of key-value pairs where keys must be strings, facilitating associative storage with length equal to the number of pairs; values within objects can also be nested composites. jq preserves the order of keys as they first appear in the input for iteration (such as with .[]) and for JSON output serialization by default (unless the --sort-keys option is used).[31][32][33]
Type introspection is provided through the built-in type filter, which returns a string indicating the input's type: "null", "boolean", "number", "string", "array", or "object". This enables conditional logic based on value categories without explicit type checking in filters. For instance, applying type to a number yields "number", aiding in dynamic processing of heterogeneous inputs.[34]
jq handles undefined or invalid types gracefully in many cases to prevent processing failures. Accessing non-existent keys in objects or out-of-bounds indices in arrays returns null rather than raising an error, allowing pipelines to continue with fallback values. However, operations incompatible with a type, such as invoking length on a boolean, trigger runtime errors; these can be intercepted using the try expression with a catch handler that defaults to empty or a custom value like null, ensuring robust error recovery in scripts.[35][36]
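Applying type across one value of each category shows all six type names at once:

```shell
# map(type) replaces each element of the array with its type name:
echo '[null, true, 1, "s", [], {}]' | jq -c 'map(type)'
```

This prints ["null","boolean","number","string","array","object"].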
Type Conversions and Operations
jq supports basic arithmetic operations on its numeric type, which is implemented using IEEE 754 double-precision floating-point arithmetic. The operators +, -, *, and / perform addition, subtraction, multiplication, and division, respectively, but only when both operands are numbers; attempts to apply these to non-numeric types result in errors. For instance, given the input {"price": 10.5}, the filter .price * 2 yields 21, while .price + "extra" fails due to type mismatch.[37] Additionally, the + operator serves a polymorphic role: when both operands are strings, it concatenates them, as in "foo" + "bar" producing "foobar". This dual behavior allows flexible handling of textual data without explicit conversion.[38]
Comparison operators in jq include == (equal), != (not equal), < (less than), > (greater than), <= (less than or equal), and >= (greater than or equal), which evaluate to booleans. For equality checks, no type coercion occurs; strings are never equal to numbers, so "1" == 1 returns false, even though both represent the value one. This strictness aligns with jq's design to avoid unexpected behaviors in JSON processing. Numeric comparisons like 1.0 == 1 succeed due to floating-point equivalence. For ordering operators, comparisons follow a total order: null < false < true < numbers (compared numerically) < strings (compared lexicographically by Unicode code points) < arrays < objects. Thus, any number is less than any string, such as 1 < "2" yielding true, but within types, 2 < 3 is true while "2" < "10" is false because '1' precedes '2' in lexicographical order.[39]
To facilitate operations across types, jq provides explicit conversion functions. The tonumber function attempts to parse its input as a number: it leaves existing numbers unchanged, converts valid numeric strings (e.g., "123.45" to 123.45), and errors on invalid inputs like "abc" unless suppressed with the alternative operator ?, which returns null on failure. Similarly, tostring converts non-strings to their JSON string representations, such as 123 to "123" or [1,2] to "[1,2]", while leaving strings intact. The toarray function wraps a scalar value into a single-element array (e.g., 42 becomes [42]) but leaves arrays unchanged. These functions enable type-safe transformations, as in the example of summing string-encoded prices: given [{"price": "10.5"}, {"price": "20.3"}], the filter map(.price | tonumber) | add first converts the strings to numbers and then computes 30.8. Without conversion, arithmetic would fail.[41][42][34]
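The strict-equality, cross-type ordering, and conversion rules above can be checked in three one-liners:

```shell
# Equality never coerces: a string is never equal to a number.
jq -n '"1" == 1'     # false

# Ordering follows jq's total order: numbers sort before strings.
jq -n '1 < "2"'      # true

# Explicit conversion before arithmetic on string-encoded values:
echo '[{"price":"10.5"},{"price":"20.3"}]' | jq 'map(.price | tonumber) | add'
```

The last command prints 30.8, matching the example in the text.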
Syntax and Language Features
Basic Syntax Elements
jq's basic syntax revolves around expressions that manipulate JSON data, forming the building blocks for more complex filters. Expressions include literals, which directly represent JSON values such as numbers (e.g., 42), strings (e.g., "hello"), booleans (true or false), null, arrays (e.g., [1, 2]), and objects (e.g., {"key": "value"}). These literals can be used standalone or combined in expressions to produce output matching the input's structure.[5]
Variables in jq are prefixed with a dollar sign ($) and are typically bound using the as clause, such as . as $var or expression as $identifier, allowing values to be captured and reused within a scope. For instance, given an input object {"foo": 10, "bar": 200}, the expression .bar as $x | .foo | . + $x binds the value of .bar to $x and adds it to .foo, yielding 210. Variables enable temporary storage without mutating the original data, maintaining jq's functional paradigm.[5]
Accessing elements within objects and arrays forms a core part of expressions. Object fields are retrieved using dot notation, like .key for simple keys, or the quoted forms ."key" and .["key"] for keys containing special characters (e.g., ."foo$bar"). For arrays, .[] iterates over elements, while indexed access uses .[index] (e.g., .[0] for the first element). These operations assume the input is a valid JSON object or array; otherwise, they may produce null or errors.[5]
Control structures provide conditional logic and error handling. The if-then-else construct evaluates a condition and branches accordingly, formatted as if condition then true_expression else false_expression end. For example, if . == 0 then "zero" else "positive" end outputs "positive" for input 5. Error handling uses try expression catch handler, which executes the expression and falls back to the handler if it raises an error; e.g., try .missing catch "not found" returns "not found" when the input cannot be indexed at all, such as a string. (Note that accessing a key merely absent from an object returns null rather than raising an error.) These structures ensure robust processing of potentially malformed JSON.[5]
Comments in jq are single-line and begin with #, allowing explanatory notes without affecting execution (e.g., # This filters numbers). Whitespace is insignificant outside string literals, promoting flexible formatting for readability, though it must be preserved within strings. Strings are delimited by double quotes and support escaping for special characters, such as \" for embedded quotes or \\ for backslashes; for shell compatibility, single quotes can enclose entire jq programs (e.g., jq '.key'). These rules align jq's syntax with JSON standards while accommodating command-line use.[5]
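The conditional and error-handling forms behave as follows (the second example feeds a string, so the .missing lookup raises an indexing error that catch intercepts):

```shell
# Conditional branching on the input value:
echo '5' | jq 'if . == 0 then "zero" else "positive" end'    # "positive"

# Indexing a string with .missing errors; catch supplies the fallback:
echo '"text"' | jq 'try .missing catch "not found"'          # "not found"
```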
Filters and Pipelines
jq's filtering mechanism is central to its functionality, allowing users to process and transform JSON data streams by applying operations that refine or restructure the input. A filter in jq is a program that takes JSON input and produces JSON output, often by extracting, modifying, or selecting portions of the data. The simplest filter is the identity operator ., which passes its input through unchanged, serving as a foundational element for more complex expressions.[5]
The pipe operator | enables the composition of filters into pipelines, where the output of the filter on the left becomes the input to the filter on the right, facilitating sequential data transformations. This operator supports jq's stream-oriented processing, allowing efficient handling of large or nested JSON structures by chaining operations without loading the entire dataset into memory. For instance, .[] | .name iterates over an array with .[] and then extracts the name field from each element.[5]
jq provides several built-in filters for common tasks. The map filter applies a given filter to each element of an input array, producing a new array with the results; for example, map(. + 1) increments every number in an array by one. The select filter outputs only those inputs for which a provided condition evaluates to true, enabling conditional filtering such as select(. > 0) to retain positive values. The length filter returns the size of its input, whether a string (character count), array or object (number of elements), or number (absolute value). Additionally, keys produces an array of the keys in an input object, sorted alphabetically, as in keys applied to {"a": 1, "b": 2} yielding ["a", "b"].[5]
A practical example of a pipeline involves transforming and filtering an array of objects, such as [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]. The expression .[] | select(.age > 25) | {name: .name} first iterates over the array, then selects objects where age exceeds 25, and finally constructs new objects containing only the name, resulting in {"name": "Bob"}. This demonstrates how pipelines combine iteration, selection, and projection to extract targeted information efficiently.[5]
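The pipeline just described runs as a one-liner:

```shell
# Iterate (.[]), filter (select), then project ({name: .name}):
echo '[{"name":"Alice","age":25},{"name":"Bob","age":30}]' \
  | jq -c '.[] | select(.age > 25) | {name: .name}'
```

This prints {"name":"Bob"}, since only Bob's age exceeds 25.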
Functions and Modules
Functions in jq are defined using the def keyword, allowing users to create reusable filters that encapsulate common operations on JSON data. The basic syntax for defining a function is def function_name(parameters): body;, where parameters are optional and separated by semicolons if multiple, and body is the expression or filter that computes the output based on the input and parameters.[11] For instance, a simple function to increment a number can be written as def increment: . + 1;, which adds 1 to its input; applying it via echo 5 | jq 'increment' (with the definition prepended) yields 6.[11] Functions without parameters operate directly on the input stream, while those with parameters allow passing values, such as def add(a; b): a + b;, which can be invoked with add(3; 4) to produce 7; note that using generators like .[] in parameters results in multiple outputs due to Cartesian product evaluation.[11]
Recursion is supported in jq functions, enabling the definition of self-referential operations such as computing factorials or traversing nested structures. A recursive function calls itself within its body, typically using conditional logic to establish a base case; for example, def factorial: if . <= 1 then 1 else . * (. - 1 | factorial) end; computes the factorial of its input, producing 120 for an input of 5.[11] Tail call optimization is applied when recursive calls appear in tail position, preventing stack overflows for deep recursions such as tree traversals.[43] Another common pattern is stream generation: the built-in recurse filter is itself defined recursively as def recurse(f): def r: ., (f | r); r;, which outputs its input and then repeatedly applies f to generate successive values.[44]
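A recursive definition of this kind can be run as a one-liner (assuming jq is installed):

```shell
# Recursive factorial on the input value; the base case ends the recursion.
echo 5 | jq 'def factorial: if . <= 1 then 1 else . * (. - 1 | factorial) end; factorial'
# -> 120
```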
Modules in jq promote code organization and reusability by allowing the import of external jq files containing function definitions. The import directive loads a module under an alias for qualified access, using the syntax import "module_name" as alias;, after which its functions are invoked as alias::function_name(arguments).[45] For example, given a file utils.jq defining def double: . * 2;, a program can begin with import "utils" as utils; and call utils::double. The include directive, include "module_name";, imports all definitions from a module into the current namespace without aliasing, making functions directly available but risking name conflicts.[45] Module search paths can be specified via the -L command-line option to locate files in custom directories.[45]
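The import mechanism can be sketched with a throwaway module; the directory, the file name utils.jq, and the function double below are illustrative examples, not part of jq's distribution.

```shell
# Create a module file in a temporary search-path directory.
libdir=$(mktemp -d)
printf 'def double: . * 2;\n' > "$libdir/utils.jq"
# -L adds the directory to the module search path;
# the imported function is called with a qualified name.
result=$(echo 21 | jq -L "$libdir" 'import "utils" as utils; utils::double')
echo "$result"    # -> 42
```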
jq also includes a large standard library of built-in functions that require no import. Its mathematical functions operate on IEEE 754 double-precision floating-point numbers and include sin, cos, pow, floor, and sqrt, most of which are delegated to the C math library of the host platform.[46] Further builtins cover dates, strings, paths, and objects, enabling complex transformations without external dependencies; for instance, pow(base; exponent) raises base to the power of exponent, which is useful in statistical computations on JSON data.[46] Together with user-defined modules, these builtins form the foundation for extending jq's expressiveness in scripts.[7]
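Because these math functions are builtins, they can be invoked without any import directive (assuming jq is on the PATH):

```shell
# sqrt operates on the input value.
echo 16 | jq 'sqrt'            # -> 4
# pow takes base and exponent as arguments; the input is unused here.
echo null | jq 'pow(2; 10)'    # -> 1024
# floor rounds toward negative infinity.
echo 3.7 | jq 'floor'          # -> 3
```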
Implementations and Variants
Official C Implementation
The official implementation of jq is a lightweight, portable program written in C, serving as the reference version for the language and maintained under the jqlang organization on GitHub. It has zero runtime dependencies, enabling easy compilation and deployment on various platforms including Unix-like systems, Windows, and embedded environments. The core engine processes JSON inputs through a filter-based pipeline, emphasizing efficiency for command-line usage akin to tools like sed and awk.[2] For regular expression support in filters such as match, test, and sub, the implementation integrates the Oniguruma library, a flexible regex engine compatible with Perl-style syntax; regex support first appeared in jq 1.5, and since version 1.6 Oniguruma can be bundled as a submodule, configurable via the build option --with-oniguruma=builtin to avoid an external dependency. In April 2025, the Oniguruma project was archived, leading to community discussions on migrating to an alternative regex engine such as PCRE2.[47][48][49]
The parser for jq expressions consists of a lexer generated by Flex from src/lexer.l and an LALR(1) syntactic parser generated by Bison, producing src/lexer.c and src/parser.c in maintainer-mode builds; the generated code is integrated into the C codebase to handle the language's grammar for filters, pipelines, and functions. JSON input parsing employs a custom streaming mechanism that supports incremental processing, keeping the memory footprint low by avoiding full document buffering even for multi-gigabyte streams.[50][51][2]
Performance optimizations focus on memory efficiency, particularly in streaming mode where jq can process unbounded JSON inputs with constant memory usage by emitting outputs as they are produced. This design suits high-volume data pipelines, with benchmarks showing effective handling of large files on standard hardware. The build system utilizes Autotools for configuration and compilation, requiring dependencies like autoconf, automake, libtool, and make; the ./configure && make sequence generates the executable, with options for static linking or builtin libraries to minimize deployment size.[1][2]
Ports, Clones, and Alternatives
gojq is a pure Go implementation of jq that aims for full compatibility with the original while serving as an embeddable library for Go applications. It supports most jq features, including enhanced integer precision and YAML input/output, but does not preserve key order in objects. Compared to the official C implementation, gojq offers faster builds and arbitrary-precision arithmetic without external dependencies, making it suitable for environments where Go is preferred.[52]
jaq is a Rust-based clone of jq emphasizing correctness, speed, and simplicity, functioning as both a command-line tool and a library. It achieves high compatibility as a drop-in replacement, often outperforming jq in accuracy for certain edge cases, and extends support to formats like YAML, CBOR, TOML, and XML. In benchmarks, jaq demonstrates superior performance, such as completing tasks in 330ms versus 440ms for the official jq on specific workloads, with a startup time under 50ms and extensive testing including fuzzing for safety.[53]
yq exists in two prominent variants, both extending jq-like querying to non-JSON formats. The version by Mike Farah is a standalone, portable command-line processor written in Go, supporting YAML, JSON, XML, CSV, TOML, and properties files with jq-compatible syntax for operations like selection and merging. In contrast, Andrey Kislyuk's yq is a Python wrapper around the official jq, converting YAML, XML, or TOML to JSON for processing and back, preserving features like roundtrip YAML tags while requiring jq to be installed separately.[54][55]
dasel serves as a multi-format alternative to jq, enabling selection, modification, and deletion of data across JSON, YAML, TOML, XML, and CSV using a unified query syntax that includes jq-inspired selectors like recursive descent. It supports in-place edits and format conversions, positioning it as a versatile tool for diverse data environments without direct dependency on jq.[56]
Ecosystem and Advanced Topics
Integrations with Other Tools
jq integrates seamlessly with shell environments like Bash and Zsh, where it is commonly used in pipelines to process JSON data and assign outputs to variables. The -r option outputs raw strings without quotes, facilitating direct assignment in scripts, such as name=$(echo '{"name": "example"}' | jq -r '.name'), which extracts the value for further use in automation tasks.[35][57]
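The effect of -r on shell assignment is easy to demonstrate (assuming jq is installed):

```shell
# With -r, the value arrives as a raw string, ready for shell use.
name=$(echo '{"name": "example"}' | jq -r '.name')
echo "$name"    # -> example
# Without -r, the output keeps its JSON quotes.
quoted=$(echo '{"name": "example"}' | jq '.name')
echo "$quoted"  # -> "example"
```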
In DevOps workflows, jq enhances configuration processing in tools like Ansible and Docker. Ansible users often invoke jq via the shell module to parse JSON outputs from tasks, such as filtering API responses or playbook results, with dedicated collections like moreati.jq providing native filter plugins for jq expressions within playbooks.[58] For Docker, jq parses command outputs like docker inspect to extract container details, and it modifies JSON-based configurations in Dockerfiles or entrypoint scripts, enabling dynamic environment handling.[59][60]
jq pairs effectively with API clients like curl and wget for extracting web data. A typical pattern fetches JSON from an endpoint and pipes it to jq for filtering, as in curl https://api.example.com/data | jq '.items[] | .id', which isolates specific fields from responses.[13][61] Similarly, wget can download JSON files for processing, supporting automated data retrieval in scripts.[62]
Language bindings extend jq to programmatic environments, notably through pyjq in Python, which embeds jq filters for JSON manipulation within scripts. This enables efficient data transformation in ETL pipelines, where pyjq processes API or file-based JSON inputs before loading into databases, as shown in examples compiling jq queries for repeated use on structured data.[63]
Performance and Optimization
jq's performance is shaped by its design as a single-threaded, stream-oriented processor, making it efficient for sequential JSON manipulation but potentially slower than compiled alternatives for computationally intensive tasks. The official C implementation has low startup time and handles small to medium inputs well, but large-scale parsing can become a bottleneck. For instance, processing a 1 GB JSON file in default mode may consume several gigabytes of RAM if the entire structure is loaded, whereas streaming mode mitigates this by incrementally parsing and outputting path-value pairs, maintaining low and constant memory usage even for multi-gigabyte files.[30]
Memory efficiency is a core strength of jq's streaming capabilities: in default (non-slurp) mode, jq parses one top-level JSON value at a time rather than buffering the whole input stream, which is particularly beneficial for GB-scale JSON datasets common in log analysis or API responses. The --stream flag goes further by producing outputs like [[path], value] for each leaf node, allowing pipelines to filter or transform data on the fly without buffering the entire document. Conversely, invoking --slurp (-s) to treat the input as a single array drastically increases memory usage, often leading to out-of-memory errors for files exceeding a few hundred megabytes, because it constructs a monolithic in-memory representation.[28][64]
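The event stream produced by --stream can be inspected directly: each leaf yields a [path, value] pair, and closing events carry a path alone. A small illustration, assuming jq is on the PATH:

```shell
# Stream a small document as path/value events, one per line.
echo '{"a": [1, 2]}' | jq -c --stream .
# Emits:
#   [["a",0],1]
#   [["a",1],2]
#   [["a",1]]     <- array closed
#   [["a"]]       <- object closed
```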
In speed comparisons, the official C-based jq (version 1.8) performs adequately for typical command-line tasks but lags behind Rust-implemented clones like jaq, which achieves 5-10x faster execution on parsing and filtering benchmarks across 23 test cases from the wsjq suite, particularly for iterative processing of arrays or objects. jaq's optimizations, such as a more efficient parser and reduced startup time (under 1ms vs. jq's ~50ms), make it up to 196x faster than alternatives like gojq in select scenarios, though jq remains competitive on simpler extractions where its mature Oniguruma regex integration shines. Other variants, such as query-json, report 2-5x gains over jq for operations on files up to 100MB, highlighting jq's baseline as solid but improvable for high-throughput needs.[53][65]
Optimization strategies center on pipeline design to minimize data movement and computation. Early application of the select filter, such as .[] | select(.field > threshold), prunes irrelevant elements before expensive operations like mapping or sorting, reducing overall processing time by up to 50% on filtered datasets. Avoiding slurp unless necessary preserves streaming benefits, and for very large inputs, combining --stream with fromstream builtins reconstructs partial objects without full deserialization. These techniques enable jq to handle 5GB+ files in minutes on standard hardware, prioritizing selective data access over exhaustive traversal.[66][35]
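The --stream/fromstream combination mentioned above round-trips a document through the event stream; a minimal sketch (assuming jq is installed):

```shell
# Reconstruct the original value from its streamed events.
# -n suppresses the default input so fromstream consumes the event stream via inputs.
echo '{"a": [1, 2], "b": 3}' | jq -cn --stream 'fromstream(inputs)'
# -> {"a":[1,2],"b":3}
```

In real pipelines a filter is typically interposed between the stream events and fromstream, so that only the selected subtrees are ever materialized in memory.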
Despite these efficiencies, jq has inherent limitations that can degrade performance in edge cases. Complex regular expressions, powered by the Oniguruma engine, may exhibit slowdowns on intricate patterns due to backtracking overhead, though Oniguruma remains suitable for most jq use cases. The Oniguruma library was archived in April 2025, with an ongoing discussion in the jq repository about migrating to an alternative engine.[48] Additionally, jq lacks native parallel processing, relying on external tools for concurrency, which limits scalability on multi-core systems for embarrassingly parallel tasks like batch filtering.