XQuery
| XQuery | |
|---|---|
| Paradigm | declarative, functional, modular |
| Designed by | W3C |
| First appeared | 2007 |
| Stable release | 3.1 (March 21, 2017)[1] |
| Typing discipline | dynamic or static,[2][3] strong |
| OS | Cross-platform |
| Filename extensions | .xq, .xql, .xqm, .xqy, .xquery |
| Website | www |
| Major implementations | Many |
| Influenced by | XPath, SQL, XSLT |
XQuery (XML Query) is a query language and functional programming language designed to query and transform collections of structured and unstructured data, primarily in the form of XML. It also supports text data and, through implementation-specific extensions, other formats like binary and relational data.
The language was developed by the XML Query working group of the W3C, with version 1.0 becoming a W3C Recommendation in January 2007. XQuery development is closely coordinated with the development of XSLT by the XSL Working Group. Both groups jointly maintain XPath, a shared component of XQuery and XSLT. XQuery extends XPath with features like FLWOR (For, Let, Where, Order by, Return) expressions, making it semantically similar to SQL but optimized for hierarchical rather than relational data.
XQuery 3.1, published in March 2017, added support for JSON and introduced maps, arrays, and additional higher-order functions, significantly expanding the language's capabilities for modern data processing.
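As an illustration of this JSON support, the following is a minimal sketch assuming an XQuery 3.1 processor; the JSON text and the function name local:total-qty are made up for the example. fn:parse-json turns JSON objects into maps and JSON arrays into arrays, the ? lookup operator navigates them, and fn:serialize writes a map back out as JSON text.

```xquery
xquery version "3.1";

(: Example JSON text; parse-json() turns JSON objects into maps, JSON arrays
   into arrays, and JSON numbers into xs:double values. :)
declare variable $order := parse-json(
  '{"id": 17, "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 5}]}'
);

(: $o?items selects the array, ?* returns its members (maps),
   and ?qty looks up the "qty" entry of each map. :)
declare function local:total-qty($o as map(*)) as xs:double {
  sum($o?items?*?qty)
};

(: Build a new map and serialize it as JSON text. :)
serialize(
  map { "id": $order?id, "total": local:total-qty($order) },
  map { "method": "json" }
)
```

The order of keys in the serialized object is implementation-dependent, since maps are unordered.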
XQuery is implemented by many database systems, XML databases, and XML processors, including BaseX, eXist, MarkLogic, Saxon, and Berkeley DB XML, making it a cornerstone technology for processing XML data in enterprise software applications.
Features
XQuery's mission is to:
"provide flexible query facilities to extract data from real and virtual documents on the World Wide Web, therefore finally providing the needed interaction between the Web world and the database world. Ultimately, collections of XML files will be accessed like databases."[4]
It is a functional, side-effect-free, expression-oriented programming language with a simple type system, as summarized by Kilpeläinen:[5]
All XQuery expressions operate on sequences, and evaluate to sequences. Sequences are ordered lists of items. Items can be either nodes, which represent components of XML documents, or atomic values, which are instances of XML Schema base types like xs:integer or xs:string. Sequences can also be empty, or consist of a single item only. No distinction is made between a single item and a singleton sequence. (...) XQuery/XPath sequences differ from lists in languages like Lisp and Prolog by excluding nested sequences. Designers of XQuery may have considered nested sequences unnecessary for the manipulation of document contents. Nesting, or hierarchy of document structures is instead represented by nodes and their child-parent relationships
XQuery provides the means to extract and manipulate data from XML documents or any data source that can be viewed as XML, such as relational databases[6] or office documents.
XQuery contains a superset of XPath expression syntax to address specific parts of an XML document. It supplements this with a SQL-like "FLWOR expression" for performing joins. A FLWOR expression is constructed from the five clauses after which it is named: FOR, LET, WHERE, ORDER BY, RETURN.
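A short sketch of a FLWOR join, assuming two small inline documents in place of real files; the element and attribute names and the function local:catalog are hypothetical. The two for clauses pair every author with every book, and the where clause keeps only the matching pairs, much like an inner join in SQL.

```xquery
(: Inline sample data standing in for doc("authors.xml") and doc("books.xml"). :)
declare variable $authors :=
  <authors>
    <author id="a1"><name>Ada</name></author>
    <author id="a2"><name>Brian</name></author>
  </authors>;

declare variable $books :=
  <books>
    <book author="a1"><title>Notes</title></book>
    <book author="a2"><title>Pipes</title></book>
    <book author="a1"><title>Engines</title></book>
  </books>;

(: Pair every author with every book, keep the pairs that satisfy the
   join condition, and sort the results by title. :)
declare function local:catalog() as element(entry)* {
  for $a in $authors/author
  for $b in $books/book
  where $b/@author = $a/@id
  order by string($b/title)
  return <entry author="{ $a/name }" title="{ $b/title }"/>
};

local:catalog()
```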
The language also provides syntax allowing new XML documents to be constructed. Where the element and attribute names are known in advance, an XML-like syntax can be used; in other cases, expressions referred to as dynamic node constructors are available. All these constructs are defined as expressions within the language, and can be arbitrarily nested.
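The two styles of constructor can be compared in a small sketch; the variable names and data are illustrative only. Both variables below hold the same person element, one built with the XML-like direct syntax and one with computed constructors, whose names are given by expressions.

```xquery
declare variable $name := "Ada";
declare variable $year := 1843;

(: Direct constructor: names are written literally, and enclosed
   expressions in { } supply the attribute value and the content. :)
declare variable $direct := <person born="{ $year }">{ $name }</person>;

(: Computed constructors: the names are themselves expressions, so they
   could be calculated at run time. This builds the same element. :)
declare variable $computed :=
  element { "person" } {
    attribute { "born" } { $year },
    $name
  };

deep-equal($direct, $computed)
```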
The language is based on the XQuery and XPath Data Model (XDM) which uses a tree-structured model of the information content of an XML document, containing seven kinds of nodes: document nodes, elements, attributes, text nodes, comments, processing instructions, and namespaces.
XDM also models all values as sequences (a singleton value is considered to be a sequence of length one). The items in a sequence can either be XML nodes or atomic values. Atomic values may be integers, strings, Booleans, and so on: the full list of types is based on the primitive types defined in XML Schema.
Features for updating XML documents or databases, and full-text search, are not part of the core language but are defined in add-on extension standards: XQuery Update Facility 1.0 adds update capabilities, and XQuery and XPath Full Text 1.0 adds full-text search over XML documents.
XQuery 3.0 adds support for full functional programming, in that functions are values that can be manipulated (stored in variables, passed to higher-order functions, and dynamically called).
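A minimal sketch of functions as values, assuming an XQuery 3.0 or later processor; local:apply-twice and $double are illustrative names, while fn:for-each, fn:filter and the named function reference upper-case#1 are standard in XQuery 3.0.

```xquery
xquery version "3.0";

(: A higher-order function: $f is a function item that is called twice. :)
declare function local:apply-twice(
  $f as function(xs:integer) as xs:integer,
  $x as xs:integer
) as xs:integer {
  $f($f($x))
};

(: An inline (anonymous) function stored in a global variable. :)
declare variable $double := function($n as xs:integer) as xs:integer { $n * 2 };

(
  local:apply-twice($double, 5),                  (: 20 :)
  for-each(("a", "b"), upper-case#1),             (: "A", "B" via a named function reference :)
  filter(1 to 10, function($n) { $n mod 2 = 0 })  (: 2, 4, 6, 8, 10 :)
)
```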
Examples
The sample XQuery code below lists the unique speakers in each act of Shakespeare's play Hamlet, encoded in hamlet.xml:
```xquery
<html>
  <body>
  {
    (: one div per act, listing each distinct speaker once :)
    for $act in doc("hamlet.xml")//ACT
    let $speakers := distinct-values($act//SPEAKER)
    return
      <div>
        <h1>{ string($act/TITLE) }</h1>
        <ul>
        {
          for $speaker in $speakers
          return <li>{ $speaker }</li>
        }
        </ul>
      </div>
  }
  </body>
</html>
```
All XQuery constructs for performing computations are expressions. There are no statements, even though some keywords suggest statement-like behavior. When a function is called, the expression in its body is evaluated and its value is returned. Thus, to write a function that doubles an input value, one simply writes:
```xquery
declare function local:doubler($x) { $x * 2 };
local:doubler(21)
```
To write a full query saying 'Hello World', one writes the expression:

```xquery
"Hello World"
```
This style is common in functional programming languages.
Applications
Below are a few examples of how XQuery can be used:
- Extracting information from a database for use in a web service.[7]
- Generating summary reports on data stored in an XML database.[7]
- Searching textual documents on the Web for relevant information and compiling the results.[8]
- Selecting and transforming XML data to XHTML to be published on the Web.[7]
- Pulling data from databases for application integration.[7]
- Splitting up an XML document that represents multiple transactions into multiple XML documents.[8]
XQuery and XSLT compared
Scope
Although XQuery was initially conceived as a query language for large collections of XML documents, it is also capable of transforming individual documents. As such, its capabilities overlap with XSLT, which was designed expressly to allow input XML documents to be transformed into HTML or other formats.
The XSLT 2.0 and XQuery standards were developed by separate working groups within W3C, working together to ensure a common approach where appropriate. They share the same data model (XDM), type system, and function library, and both include XPath 2.0 as a sublanguage.
Origin
The two languages, however, are rooted in different traditions and serve the needs of different communities. XSLT was conceived mainly as a stylesheet language whose primary goal was to render XML for the human reader on screen, on the web (as a web template language), or on paper. XQuery was conceived mainly as a database query language in the tradition of SQL.
Because the two languages originate in different communities, XSLT is stronger[according to whom?] in its handling of narrative documents with more flexible structure, while XQuery is stronger in its data handling (for example, when performing relational joins).
Versions
XSLT 1.0 appeared as a Recommendation in 1999, whereas XQuery 1.0 only became a Recommendation in early 2007; as a result, XSLT is still much more widely used. Both languages have similar expressive power, though XSLT 2.0 has many features that are missing from XQuery 1.0, such as grouping, number and date formatting, and greater control over XML namespaces.[9][10][11] Many of these features were planned for XQuery 3.0,[12] which added, for example, a group by clause in FLWOR expressions and the format-number and format-date functions.
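Grouping can be sketched with the XQuery 3.0 group by clause; the sales data and the function name local:totals are invented for the example. After grouping, each non-grouping variable (here $s) is bound to all the items in its group, so an aggregate such as sum() applies to the whole group.

```xquery
xquery version "3.0";

declare variable $sales :=
  <sales>
    <sale region="north" amount="10"/>
    <sale region="south" amount="5"/>
    <sale region="north" amount="7"/>
  </sales>;

(: One total element per distinct region, in region order. :)
declare function local:totals() as element(total)* {
  for $s in $sales/sale
  let $region := string($s/@region)
  group by $region
  order by $region
  return <total region="{ $region }">{ sum($s/@amount) }</total>
};

local:totals()
```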
Any comparison must take into account the version of XSLT. XSLT 1.0 and XSLT 2.0 are very different languages. XSLT 2.0, in particular, has been heavily influenced by XQuery in its move to strong typing and schema-awareness.
Strengths and weaknesses
Usability studies have shown that XQuery is easier to learn than XSLT, especially for users with previous experience of database languages such as SQL.[13] This can be attributed to XQuery being a smaller language with fewer concepts to learn, and to its programs being more concise. XQuery is also more orthogonal, in that any expression can be used in any syntactic context. By contrast, XSLT is a two-language system in which XPath expressions can be nested in XSLT instructions but not vice versa.
XSLT is stronger than XQuery 1.0 for applications that involve making small changes to a document (for example, deleting all the NOTE elements). In XSLT, such applications are generally handled with a coding pattern in which an identity template copies all nodes unchanged and more specific templates modify selected nodes. XQuery has no direct equivalent to this coding pattern; in XQuery 1.0 the same effect requires a recursive function that rebuilds the tree, and the XQuery Update Facility[14] later added update expressions, including a transform (copy/modify) expression, for such tasks.
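One common way to delete NOTE elements without update facilities is a recursive copy driven by typeswitch; the sketch below assumes that approach, with local:strip-notes as a hypothetical name. With the XQuery Update Facility, roughly the same result can be written as copy $c := $doc modify delete node $c//NOTE return $c.

```xquery
(: Recursive copy that mimics XSLT's identity-template pattern:
   every node is copied unchanged except NOTE elements, which are dropped. :)
declare function local:strip-notes($n as node()) as node()* {
  typeswitch ($n)
    case element(NOTE) return ()
    case element() return
      element { node-name($n) } {
        $n/@*,
        for $c in $n/node() return local:strip-notes($c)
      }
    case document-node() return
      document { for $c in $n/node() return local:strip-notes($c) }
    default return $n  (: text, comments, processing instructions, attributes :)
};

local:strip-notes(
  <SPEECH>
    <SPEAKER>HAMLET</SPEAKER>
    <NOTE>Aside</NOTE>
    <LINE>To be, or not to be</LINE>
  </SPEECH>
)
```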
XQuery 1.0 lacked any mechanism for dynamic binding or polymorphism. The absence of this capability becomes noticeable when writing large applications, or when writing code that is designed to be reusable in different environments.[citation needed] This was remedied in XQuery 3.0 by the introduction of functions as first-class values. XSLT offers two complementary mechanisms in this area: the dynamic matching of template rules, and the ability to override rules using xsl:import, which together make it possible to write applications with multiple customization layers.
The absence of these facilities from XQuery 1.0 was a deliberate design decision: it has the consequence that XQuery is very amenable to static analysis, which is essential to achieve the level of optimization needed in database query languages. This also makes it easier to detect errors in XQuery code at compile time.
The fact that XSLT 2.0 uses XML syntax makes it rather verbose in comparison to XQuery 1.0. However, many large applications take advantage of this capability by using XSLT to read, write, or modify stylesheets dynamically as part of a processing pipeline. The use of XML syntax also enables the use of XML-based tools for managing XSLT code. By contrast, XQuery syntax is more suitable for embedding in traditional programming languages such as Java (see XQuery API for Java) or C#. If necessary, XQuery code can also be expressed in an XML syntax called XQueryX. The XQueryX representation of XQuery code is rather verbose and not convenient for humans, but can easily be processed with XML tools, for example transformed with XSLT stylesheets.[15][16]
Versions and extensions
Versions
- XQuery 1.0 became a W3C Recommendation on January 23, 2007.[17]
- XQuery 3.0 became a W3C Recommendation on April 8, 2014.[18]
- XQuery 3.1 became a W3C Recommendation on March 21, 2017.[19]
W3C extensions
The World Wide Web Consortium (W3C) developed two major extensions to XQuery:
- XQuery and XPath Full Text 1.0,[20] which extends XQuery with full-text search capabilities
- XQuery Update Facility, which enables data modification in XQuery
Both became W3C Recommendations as extensions to XQuery 1.0. Efforts to adapt them for XQuery 3.0 were abandoned due to resource constraints.
A scripting (procedural) extension for XQuery was proposed but never completed.[21][22]
The EXPath Community Group[23] develops extensions for XQuery and related standards (XPath, XSLT, XProc, and XForms). The following extensions are available:
- Packaging System,[24] for managing XQuery libraries and modules.
- File Module,[25] for file system operations.
- Binary Module,[26] for handling binary data.
- Web Applications,[27] for building web-based applications.
Third-party extensions
JSONiq is an extension of XQuery that adds support for extracting and transforming data from JSON documents. JSONiq is a superset of XQuery 3.0. It is published under the Creative Commons Attribution-ShareAlike 3.0 license.
Because XQuery 3.1 added full support for JSON, it has de facto deprecated JSONiq.
The EXQuery[28] project develops standards around creating portable XQuery applications. The following standards are currently available:
- RESTXQ[29]
Further reading
- Melton, Jim; Buxton, Stephen (2006). Querying XML: XQuery, XPath, and SQL/XML in Context. Morgan Kaufmann. ISBN 1-55860-711-0.
- Walmsley, Priscilla (2007). XQuery, 1st Edition. O'Reilly Media. ISBN 978-0-596-00634-1.
- Walmsley, Priscilla (2015). XQuery, 2nd Edition. O'Reilly Media. ISBN 978-1-4919-1510-3.
- Brundage, Michael (2004). XQuery: The XML Query Language. Addison-Wesley Professional. ISBN 0-321-16581-0.
- Katz, Howard, ed. (2004). XQuery from the Experts: A Guide to the W3C XML Query Language. Addison-Wesley. ISBN 0-321-18060-7.
- Kay, Michael (W3C XQuery Committee) (2005). An Introduction to the XQuery FLWOR Expression.
Implementations
| Name | License | Language | XQuery 3.1 | XQuery 3.0 | XQuery 1.0 | XQuery Update 1.0 | XQuery Full Text 1.0 |
|---|---|---|---|---|---|---|---|
| BaseX | BSD license | Java | Yes | Yes | Yes | Yes | Yes |
| eXist | LGPL | Java | Partial | Partial | Yes | No | No |
| MarkLogic | Proprietary | C++ | No | Partial | Yes | No | No |
| Saxon HE | Mozilla Public License | Java | Partial | Partial | Yes | Yes | No |
| Saxon EE | Proprietary | Java | Yes | Yes | Yes | Yes | No |
| Xidel | GPLv3+ | FreePascal | Yes | Yes | Yes | No | No |
| Zorba | Apache License | C++ | No | Yes | Yes | Yes | Yes |
References
[edit]- ^ "XQuery 3.1 Recommendation". 2017-03-21.
- ^ "XQuery 3.1: An XML Query Language". 2017-03-21.
- ^ "XQuery and Static Typing". 3 April 2023.
- ^ W3C (2003-10-25). "cited by J.Robie".
{{cite web}}: CS1 maint: numeric names: authors list (link) - ^ Kilpeläinen, Pekka (2012). "Using XQuery for problem solving" (PDF). Software: Practice and Experience. 42 (12): 1433–1465. doi:10.1002/spe.1140. S2CID 15561027. Archived from the original (PDF) on 2016-03-04. Retrieved 2015-08-29.
- ^ "Data retrieval with XQuery". Retrieved on 18 January 2016.
- ^ a b c d "XQuery Tutorial". www.w3schools.com. Retrieved 2025-11-06.
- ^ a b "XQuery and XPath Full Text 3.0". www.w3.org. Retrieved 2025-11-06.
- ^ Kay, Michael (May 2005). "Comparing XSLT and XQuery". Archived from the original on 2018-06-27. Retrieved 2016-06-15.
- ^ Eisenberg, J. David (2005-03-09). "Comparing XSLT and XQuery".
- ^ Smith, Michael (2001-02-23). "XQuery, XSLT "overlap" debated". Archived from the original on 2006-06-16. Retrieved 2006-06-19.
- ^ "XQuery 3.0 requirements".
- ^ Usability of XML Query Languages. Joris Graaumans. SIKS Dissertation Series No 2005-16, ISBN 90-393-4065-X
- ^ "XQuery Update Facility".
- ^ "XML Syntax for XQuery (XQueryX)".
- ^ Michael Kay. "Saxon diaries: How not to fold constants".
- ^ "XML and Semantic Web W3C Standards Timeline" (PDF). 2012-02-04. Archived from the original (PDF) on 2019-10-26. Retrieved 2012-02-04.
- ^ "XQuery 3.0 Recommendation". 2014-04-08.
- ^ "XQuery 3.1 Recommendation". 2017-03-21.
- ^ XQuery and XPath Full Text 1.0
- ^ XQuery Scripting Extension 1.0 Requirements
- ^ XQuery 1.0 Scripting Extension
- ^ EXPath Community Group
- ^ Packaging System
- ^ File Module
- ^ Binary Module
- ^ Web Applications
- ^ "Standard for portable XQuery applications". Retrieved 12 December 2013.
- ^ "RESTXQ 1.0: RESTful Annotations for XQuery".
- Portions borrowed with permission from the books "XML Hacks" (O'Reilly Media) and "XQuery" (O'Reilly Media).
- Previous version based on an article at the French language Wikipedia
XQuery
Overview
Definition and Purpose
XQuery is a standardized query language and functional programming language developed by the World Wide Web Consortium (W3C) specifically for retrieving and manipulating data in XML and related formats. As a W3C Recommendation, it operates on abstract representations of XML data, enabling users to express complex queries declaratively and without side effects, which makes query evaluation deterministic and predictable.[5]

The primary purpose of XQuery is to facilitate the extraction, transformation, and construction of structured data from diverse sources, such as XML documents, relational and native XML databases, object repositories, and web services. It addresses the need for a unified language to process both hierarchical XML structures and, in later versions, JSON data, allowing applications to integrate and analyze information across heterogeneous environments.[5][1]

Among its key capabilities, XQuery supports the processing of sequences (ordered collections of items including nodes, atomic values, and functions), enabling iterative and recursive operations on data sets. It integrates path expressions from XPath for navigating and selecting elements within documents, and provides mechanisms for serializing output as XML, JSON, or plain text to suit various application needs.[1][6]

Within the XML ecosystem, XQuery builds directly on XPath 2.0 and subsequent versions for expression syntax and semantics, and relies on the XQuery and XPath Data Model (XDM) to provide a typed, tree-based representation of data that accommodates both XML infosets and additional structures such as JSON maps and arrays.[5][7]

History and Development
XQuery emerged in response to the growing need for a standardized query language capable of handling semi-structured XML data in a manner analogous to SQL for relational databases. In December 1998, the World Wide Web Consortium (W3C) held a workshop on query languages (QL'98), which led to the formation of the XML Query Working Group in September 1999, chaired by Paul Cotton.[8][9][10] This group drew significant influence from Quilt, an early XML query language developed in 2000 that borrowed features from XPath, XQL, and SQL to enable declarative queries on XML documents.[5] The working group's efforts were guided by requirements emphasizing support for diverse XML sources, including documents and databases, while ensuring compatibility with emerging standards such as XML Schema.[11]

Key milestones in XQuery's development included the release of XPath 1.0 in November 1999 as a foundational precursor for path-based navigation in XML.[12] The language evolved alongside XPath 2.0 and XSLT 2.0, with the XML Query Working Group collaborating closely with the XSL Working Group on a shared data model, the XQuery and XPath Data Model (XDM), developed concurrently to unify type systems and instance representations across these specifications.[13] XQuery 1.0 achieved W3C Recommendation status on January 23, 2007, marking its formal standardization as a query language for XML.[14] Subsequent versions expanded its functionality; XQuery 3.1, published on March 21, 2017, introduced native support for JSON data.[1]

The XML Query Working Group operated until its charter expired on May 31, 2015, after delivering the core specifications and extensions such as the update facility and full-text search.[15] Following the group's closure, ongoing development shifted to a W3C Community Group, the Query and Transformation Community Group (QT CG), informally QT4CG, formed in 2020 to propose extensions for XQuery 4.0 and related standards through collaborative drafts.[16][4] This transition reflected a move toward community-driven evolution while maintaining backward compatibility with earlier recommendations.[17]

Language Fundamentals
Data Model (XDM)
The XQuery Data Model (XDM) provides an abstract, tree-based representation of XML data and other information sources, serving as the foundation for all XQuery, XPath, and XSLT operations. It defines how data is conceptualized and manipulated, so that queries operate on a consistent, ordered collection of items rather than on raw XML syntax. All inputs to XQuery expressions, such as XML documents or external data, must first be mapped to an XDM instance, and query results are likewise expressed as XDM instances.[18] At its core, the XDM comprises three primary components: atomic values, nodes, and sequences. Atomic values are indivisible scalar items drawn from the value spaces of atomic types defined in XML Schema, such as strings (xs:string), integers (xs:integer), and dates (xs:date); unlike nodes, they have no identity and no parent or children. Nodes represent the structural components of XML documents; they have identity and take part in a hierarchy, forming a tree that mirrors the Infoset or Post-Schema-Validation Infoset (PSVI) of the source XML. There are seven kinds of nodes: document nodes (roots of entire documents), element nodes (tagged structures), attribute nodes (name-value pairs attached to elements), text nodes (character data), processing-instruction nodes (such as <?xml-stylesheet ?>), comment nodes, and namespace nodes (bindings of prefixes to namespace URIs). Nodes and atomic values are the kinds of items in XDM 1.0; XDM 3.0 adds function items, and in XDM 3.1 maps and arrays are also represented as function items.[19][20][21]
Sequences serve as the overarching container in the XDM: a sequence is an ordered collection of zero or more items, which may mix nodes and atomic values (and, from XDM 3.0, function items). Sequences are always flat; a sequence never contains another sequence, and placing one sequence inside another simply splices its items in. Unlike sets, sequences preserve duplicates and order, which is crucial for operations like sorting or positional access in queries. For instance, a sequence might combine an element node with an xs:integer atomic value, such as (<book/>, 42). This design supports flexible data flow, where queries can produce and consume sequences as results.[22]
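Flattening can be checked directly; a minimal sketch, with the variable name $s chosen for illustration.

```xquery
(: Nested parentheses do not create nested sequences: inner sequences are
   spliced into the outer one, and the empty sequence () contributes nothing. :)
declare variable $s := (1, (2, 3), (), (4, (5)));

(
  count($s),                        (: 5 :)
  deep-equal($s, (1, 2, 3, 4, 5)),  (: true :)
  deep-equal(42, (42)),             (: true: an item is a one-item sequence :)
  count((<book/>, 42))              (: 2: nodes and atomic values can be mixed :)
)
```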
The XDM's type system builds on XML Schema datatypes to provide both schema-aware and schema-free processing modes. In schema-aware mode, items carry precise type annotations from a validated PSVI, such as an element annotated as xs:integer for numeric content, enabling type-safe operations such as arithmetic on validated numbers or comparisons of validated dates. Schema-free mode, used for unvalidated or partially validated data, annotates elements as xs:untyped and attributes as xs:untypedAtomic, so their content is treated as untyped text until it is cast explicitly or converted implicitly (for example, to xs:double in arithmetic). This duality allows XQuery to handle diverse data sources without requiring full schema validation upfront, while still supporting typed computations when schemas are available.[23][24]
Every item in the XDM exposes a set of standard properties for access and manipulation. The string value of a node is the concatenation of its descendant text content (for an atomic value, fn:string returns the value cast to xs:string), providing a simple textual representation. The typed value extracts the underlying atomic values and respects type annotations: for example, an element validated as xs:decimal has an xs:decimal typed value, whereas the typed value of an unvalidated element is xs:untypedAtomic. The base URI property, inherited from the document or parent, anchors relative references to an absolute URI, which is essential for resolving external resources such as included schemas or linked documents. These properties give uniform access across item kinds and underlie functions such as string() and data().[25][26][27]
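A small sketch contrasting string values and typed values, assuming no schema validation takes place, so the typed values of the constructed elements are xs:untypedAtomic; the element names are illustrative.

```xquery
(: Elements built by direct constructors are not schema-validated here,
   so their typed values are xs:untypedAtomic. :)
declare variable $price := <price>12.50</price>;
declare variable $para := <p>Hello <b>world</b></p>;

(
  string($para),                              (: "Hello world": all descendant text :)
  data($price) instance of xs:untypedAtomic,  (: true :)
  $price + 1,                                 (: 13.5: untypedAtomic is cast to xs:double :)
  xs:decimal($price) * 2                      (: 25: explicit cast to xs:decimal :)
)
```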
As a prerequisite for XQuery execution, all external inputs, whether full XML documents, fragments, or non-XML data such as JSON, must be mapped to XDM instances by rules that preserve order and structure. Outputs are likewise XDM sequences, which can then be serialized to XML, JSON, or other formats as needed. This conformance guarantees interoperability across XQuery implementations and related standards.[28]
Basic Syntax and Expressions
XQuery's basic syntax revolves around expressions that operate on instances of the XQuery and XPath Data Model (XDM), producing sequences of items as results.[29] These expressions form the core of queries, allowing navigation, computation, construction, and declaration of data structures within the language's prolog and query body. The syntax draws heavily from XPath for navigation and adds operators for manipulation, keeping queries concise and declarative.[30]

Path expressions in XQuery enable hierarchical navigation through XML documents, using XPath 3.1 syntax to select nodes based on their location and properties.[31] A single slash (/) separates steps along the child axis, as in /doc/item, which selects all item children of the root doc element.[32] The double slash (//) uses the descendant-or-self axis to select nodes regardless of depth; for example, //item retrieves all item elements anywhere in the document.[33] Axes extend navigation in other directions; the attribute axis (attribute::, abbreviated @) accesses attributes, as in /doc/item/@price, which retrieves the price attribute of each item.[33] Predicates enclosed in square brackets [] filter selections conditionally, as in /doc/item[price > 30], which chooses only item elements with a price child exceeding 30.[34] Wildcards allow flexible matching: * matches any element node, as in /doc/* for all children of the doc element, while @* selects all attributes of a node.[35]
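The path forms described here can be tried on a small in-memory document; a sketch, with the document and its contents invented for illustration. Because the document is held in a variable rather than being the context item, each path starts from $doc instead of a leading slash.

```xquery
(: A small document held in a variable. :)
declare variable $doc := document {
  <doc>
    <item price="25"><name>Pen</name></item>
    <item price="40"><name>Lamp</name></item>
  </doc>
};

(
  $doc/doc/item/name/string(),               (: child axis: "Pen", "Lamp" :)
  count($doc//name),                         (: descendant-or-self: 2 :)
  $doc/doc/item/@price/string(),             (: attribute axis: "25", "40" :)
  $doc/doc/item[@price > 30]/name/string(),  (: predicate: "Lamp" :)
  count($doc/doc/*),                         (: element wildcard: 2 :)
  count($doc/doc/item/@*)                    (: attribute wildcard: 2 :)
)
```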
Operators in XQuery perform computations on atomic values and sequences and fall into arithmetic, comparison, logical, and sequence categories.[36] Arithmetic operators include addition (+), subtraction (-), multiplication (*), division (div), integer division (idiv), and modulus (mod), applied to numeric operands; for instance, 5 + 3 yields 8, while 10 div 2 produces 5.[37] Value comparison operators such as eq (equality), ne (inequality), and lt (less than) compare single atomic values, so $price eq 10 is true if the numeric value $price equals 10; general comparison operators such as =, != and > compare sequences and are true if any pair of items satisfies the comparison.[38] Logical operators and and or combine Boolean expressions, as in (price > 10) and (stock > 0), which checks both conditions.[39] Sequence operators manipulate collections: the comma (,) concatenates sequences, as in (1, 2, 3); the union operator (union or |) combines two sequences of nodes, removing duplicate nodes and returning the result in document order; and intersect and except return the nodes common to both operands or present only in the first. These three operators apply only to nodes; for atomic values such as (1, 2, 2, 3), duplicates are removed with distinct-values() instead.[40]
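A sketch of the operator categories on small literal inputs; the element names a, b and c are arbitrary.

```xquery
declare variable $d := <d><a/><b/><c/></d>;

(
  5 + 3,                                        (: 8 :)
  10 div 4,                                     (: 2.5 :)
  10 idiv 4,                                    (: 2 :)
  10 mod 4,                                     (: 2 :)
  3 lt 5,                                       (: value comparison of two single values: true :)
  (1, 2, 3) = 3,                                (: general comparison, any item may match: true :)
  (5 > 3) and (2 > 4),                          (: false :)
  count(($d/a, $d/b) union ($d/b, $d/c)),       (: 3 distinct nodes: a, b, c :)
  count(($d/a, $d/b) intersect ($d/b, $d/c)),   (: 1: only b :)
  count(distinct-values((1, 2, 2, 3)))          (: 3: distinct-values for atomic values :)
)
```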
Constructors build new XML nodes and values from expressions, supporting both direct and computed forms.[41] Direct element constructors use XML-like syntax with embedded expressions in curly braces { }, for example <book>{ $title }</book> where $title is a variable holding the book title string.[42] Attribute constructors follow similarly, as in <book id="{ $id }">...</book>. Computed constructors provide dynamic naming and content, using keywords like element followed by a name expression and content, such as element { "book" } { "XML Querying" } to create an element named book with the given text.[43] These support document, element, attribute, text, comment, and processing instruction nodes, ensuring flexible output generation.[43]
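Computed constructors are most useful when names come from data. The sketch below assumes a hypothetical record format in which each field element carries the name of the element it should become in a name attribute; local:to-elements is an illustrative name.

```xquery
(: Hypothetical input: each field names the element it should become. :)
declare variable $record :=
  <record>
    <field name="title">XML Querying</field>
    <field name="year">2007</field>
  </record>;

(: element, attribute and comment are computed constructors; the inner
   element constructor takes its name from the data at run time. :)
declare function local:to-elements($r as element(record)) as element(book) {
  element book {
    attribute source { "record" },
    comment { "generated from a field list" },
    for $f in $r/field
    return element { string($f/@name) } { string($f) }
  }
};

local:to-elements($record)
```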
Declarations appear in the query prolog to set up namespaces, import modules, and bind variables for use in the main expression.[44] Namespace declarations use declare namespace prefix = "URI";, binding a prefix to a namespace URI for qualified name resolution throughout the query.[45] Module imports use import module namespace prefix = "module-URI" at "location"; to incorporate external library modules, enabling reuse of functions and variables.[46] Variable declarations such as declare variable $name := expression; initialize global variables; for example, declare variable $doc := doc("books.xml"); loads an external document for later reference.[47] The prolog precedes the query body, so all declarations are processed before the body is evaluated.[44]
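A sketch of a main module whose prolog declares a namespace, a global variable and a function before the query body; the namespace URI, the data and the function name local:titles-after are placeholders, and an inline element stands in for doc("books.xml").

```xquery
(: Bind the prefix ex to a placeholder namespace URI. :)
declare namespace ex = "http://example.com/ns/books";

(: Global variable; an inline element stands in for doc("books.xml"). :)
declare variable $books :=
  <ex:books>
    <ex:book year="2004">Alpha</ex:book>
    <ex:book year="2015">Beta</ex:book>
  </ex:books>;

(: A function that uses both the namespace prefix and the global variable. :)
declare function local:titles-after($year as xs:integer) as xs:string* {
  $books/ex:book[xs:integer(@year) gt $year]/string()
};

(: The query body follows the prolog declarations. :)
local:titles-after(2010)
```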
Core Features
FLWOR Expressions
FLWOR expressions form the cornerstone of XQuery for performing complex queries that iterate over sequences, bind variables, filter results, sort them, and construct outputs, much like SQL's SELECT-FROM-WHERE construct but tailored for XML and other data models.[48] Introduced in XQuery 1.0 and refined in subsequent versions, a FLWOR expression (named for its clauses: For, Let, Where, Order by, Return) processes a stream of tuples of variable bindings to generate a result sequence. It begins with one or more For or Let clauses that bind variables, may continue with Where and Order by clauses that filter and sort the tuples, and ends with a Return clause that defines the output.[48] The For clause iterates by binding a variable to each item in a sequence in turn.[49] For example, for $item in doc("books.xml")//book binds $item to each <book> element in the document.[49] An optional positional variable, introduced with at $pos, captures the one-based position of the current item, enabling position-aware processing such as numbering results.[48] The Let clause complements For by binding a variable to the entire result of an expression without iterating over it, once for each tuple produced by the preceding clauses; for example, let $total := sum($prices).[50]
Filtering occurs in the Where clause, which retains only tuples satisfying a Boolean expression, often using path expressions or predicates to select relevant items.[51] For instance, where $item/price > 20 excludes books priced at 20 or less.[51] The Order by clause then sorts the remaining tuples in ascending or descending order on one or more keys, with the modifiers empty greatest and empty least controlling where empty keys are placed in the sorted sequence.[52] Collation specifications can further customize string comparisons.[52] Finally, the Return clause is evaluated once per tuple and can construct new XML nodes, sequences, or atomic values from the bound variables, as in return <book>{ $item/title }</book>.[53]
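A sketch combining these clauses, with invented book data and a hypothetical function local:report. The book without a price has an empty sort key, which empty least places last in a descending sort, and the positional variable reports each book's position in the original input.

```xquery
declare variable $books :=
  <books>
    <book><title>C</title><price>35</price></book>
    <book><title>A</title><price>15</price></book>
    <book><title>B</title><price>28</price></book>
    <book><title>D</title></book>
  </books>;

declare function local:report() as element(row)* {
  for $b at $pos in $books/book           (: $pos: position in the input :)
  let $price := xs:decimal($b/price)      (: empty sequence if there is no price :)
  where empty($price) or $price gt 20
  order by $price descending empty least
  return <row pos="{ $pos }">{ string($b/title) }</row>
};

local:report()
```

Here the rows come out as C, B and D, carrying the input positions 1, 3 and 4.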
XQuery 3.0 introduced the Window clause, which binds a variable to successive windows (subsequences) of an input sequence and so supports aggregation over sliding or tumbling windows in time-series or grouped data.[54] In a sliding window, consecutive windows may overlap, which allows computations such as moving averages; in a tumbling window, the windows are non-overlapping partitions of the input.[54] The general form is for sliding window $w in $seq start $s when cond1 end $e when cond2 (or for tumbling window $w in $seq ..., where the end condition is optional). The start and end conditions can bind the current item, its position (at $i), and the previous and next items (previous $p, next $n), while the window variable $w is bound to the items of each window, so aggregates such as sum($w) or avg($w) apply directly to the window's contents.[54]
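A sketch of both window kinds, assuming an XQuery 3.0 processor; the function names local:moving-avg and local:chunks are illustrative. The sliding window uses only end so that incomplete windows at the end of the input are discarded.

```xquery
xquery version "3.0";

(: Moving average over three consecutive values. "start at $s" binds the
   window's start position; "only end" drops windows that cannot be completed. :)
declare function local:moving-avg($values as xs:decimal*) as xs:decimal* {
  for sliding window $w in $values
    start at $s when true()
    only end at $e when $e - $s eq 2
  return avg($w)
};

(: Non-overlapping chunks of three: a new window starts at positions 1, 4, 7, ...
   and each tumbling window ends just before the next one starts. :)
declare function local:chunks($values as item()*) as element(chunk)* {
  for tumbling window $w in $values
    start at $s when ($s - 1) mod 3 eq 0
  return <chunk>{ $w }</chunk>
};

(local:moving-avg((2, 4, 6, 8, 10)), local:chunks(1 to 7))
```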
Additionally, the Count clause, introduced in XQuery 3.0, binds a variable to the ordinal position of each tuple in the tuple stream at the point where the clause appears.[55] For example, for $x in (1 to 100) count $c return $c returns the integers 1 through 100. Because it can be placed after Where or Order by clauses, Count numbers results after filtering and sorting, which a positional variable on the For clause cannot do.[55]
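A sketch in which count numbers the rows after filtering and sorting; local:ranked and its input are illustrative.

```xquery
xquery version "3.0";

(: count $c numbers the tuples that reach it, that is, after the where
   and order by clauses have filtered and sorted them. :)
declare function local:ranked($scores as xs:integer*) as element(rank)* {
  for $x in $scores
  where $x ge 50
  order by $x descending
  count $c
  return <rank n="{ $c }">{ $x }</rank>
};

local:ranked((70, 20, 90, 55))
```

With this input, 20 is filtered out and the remaining scores 90, 70 and 55 are numbered 1, 2 and 3.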
Functions, Types, and Modules
XQuery provides a rich set of built-in functions in the fn: namespace, which includes over 200 functions for performing common operations on data such as accessing documents, manipulating strings, and aggregating sequences.[56] For instance, fn:doc() loads an XML document from a URI, fn:count($sequence) returns the number of items in a sequence, and fn:substring($string, $start, $length) extracts a portion of a string.[57][58][59] These functions are defined in the XPath and XQuery Functions and Operators 3.1 specification and form the core library for query expressions.[56]
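A few of these functions applied to literal values; a sketch, with fn:doc() left out because it needs an external document.

```xquery
(: fn is the default function namespace, so fn:count(...) and count(...)
   call the same function. :)
(
  fn:count((10, 20, 30)),                (: 3 :)
  fn:substring("XQuery", 2, 3),          (: "Que" :)
  fn:string-join(("a", "b", "c"), "-"),  (: "a-b-c" :)
  fn:sum((1.5, 2.5)),                    (: 4 :)
  fn:upper-case("xml")                   (: "XML" :)
)
```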
In addition to built-in functions, XQuery supports user-defined functions to promote code reuse and modularity. User-defined functions are declared in the query prolog with syntax such as declare function local:myfunc($param as xs:string) { ... }; where the function name is namespace-qualified (here with the predeclared local prefix), parameters may carry optional type annotations, and the body contains the function's logic.[60] Several functions may share a name if they differ in the number of parameters (arity), allowing multiple implementations distinguished by parameter count. Declaring two functions with the same expanded QName and the same arity is a static error [err:XQST0034], even if their signatures are otherwise consistent.[61][62] Higher-order functions, introduced in XQuery 3.0, enable advanced patterns such as passing functions as arguments or returning them as results. For example, an inline function like function($x) { $x * 2 } can be passed as an argument to another function.[63][64]
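The following sketch shows both ideas. The function names local:greet and local:apply-twice are hypothetical. The two local:greet functions can coexist because they differ in arity, and local:apply-twice takes a function item as its first argument:
```xquery
(: Same name, different arity (1 vs. 2 parameters): both declarations are allowed. :)
declare function local:greet($name as xs:string) as xs:string {
  local:greet($name, "Hello")
};

declare function local:greet($name as xs:string, $greeting as xs:string) as xs:string {
  $greeting || ", " || $name || "!"
};

(: Higher-order function: $f is any function from xs:integer to xs:integer. :)
declare function local:apply-twice($f as function(xs:integer) as xs:integer,
                                   $x as xs:integer) as xs:integer {
  $f($f($x))
};

(
  local:greet("Ada"),                              (: "Hello, Ada!" :)
  local:greet("Ada", "Welcome"),                   (: "Welcome, Ada!" :)
  local:apply-twice(function($x) { $x * 2 }, 5)    (: 20 :)
)
```
The untyped inline function is coerced to the declared parameter type function(xs:integer) as xs:integer when it is passed to local:apply-twice.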
XQuery's type system builds on the XDM, using sequence types for precise declarations and type checking. A sequence type describes the expected item type and its cardinality through an occurrence indicator, such as xs:integer+ for one or more integers, xs:string? for an optional string, or item()* for zero or more items of any type.[65] Types can be tested and converted with expressions like $value instance of xs:string, which checks whether a value matches a sequence type, and $value cast as xs:integer, which attempts to convert a value to the target type and raises an error if the conversion is not possible.[66][67] These mechanisms ensure type safety in function signatures and variable declarations.[68]
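The sketch below applies these expressions to literal values. local:total is a hypothetical function whose xs:integer+ parameter rejects an empty sequence. The last line uses castable as, a related test that reports whether a cast would succeed. This avoids the error err:FORG0001 that "abc" cast as xs:integer would raise.
```xquery
(: The parameter type xs:integer+ requires one or more integers. :)
declare function local:total($values as xs:integer+) as xs:integer {
  sum($values)
};

let $raw := "42"   (: a string, as if read from an attribute :)
return (
  $raw instance of xs:string,                (: true :)
  $raw instance of xs:integer,               (: false: instance of never converts :)
  (1, 2, 3) instance of xs:integer+,         (: true :)
  () instance of xs:string?,                 (: true: ? allows the empty sequence :)
  local:total(($raw cast as xs:integer, 8)), (: 50 :)
  "abc" castable as xs:integer               (: false :)
)
```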
Modules in XQuery organize code into reusable libraries and come in two kinds: main modules and library modules. A main module consists of a prolog and a query body that is evaluated. A library module begins with a module declaration and contains only function and variable declarations, with no query body.[69] Modules are imported with a prolog declaration of the form import module namespace prefix = "module-uri" at "location"; (the at clause is optional). This binds a namespace prefix to the imported module's target namespace and can hint at where the module is located.[46] Problems with module imports are reported as static errors; for example, err:XQST0059 is raised if the processor cannot find a module for the specified target namespace.[70] This modular structure supports large-scale query development by allowing separation of concerns and dependency management.[71]
Practical Usage
Code Examples
XQuery provides a variety of expressions for querying and transforming XML and JSON data, as defined in the W3C specifications. The following examples illustrate practical applications using sample data sources. They demonstrate key constructs such as path expressions, FLWOR (For-Let-Where-Order by-Return) expressions for iteration and aggregation, JSON navigation in version 3.1, and XML construction. These snippets run in conforming XQuery processors, provided the external documents they reference (such as XML files of books or sales records) are available.[72]
A simple query can extract specific elements from an XML document using path expressions. For instance, to retrieve the titles of all books from a catalog file, the following expression iterates over book elements and returns their title children:
```xquery
for $book in doc("books.xml")//book
return $book/title
```
The result is a sequence of title elements, such as <title>XPath</title> and <title>XQuery</title>. Such queries are foundational for selecting subsets of data without complex logic.[3]
For more complex operations involving grouping and aggregation, FLWOR expressions enable iteration, binding variables, filtering, sorting, and computation. Consider a sales records XML document (sales-records.xml) with records containing product names and quantities. The query below groups sales by product name, sums the quantities, and orders the results alphabetically, constructing a new XML fragment:
```xquery
<sales-qty-by-product>{
  for $sales in doc("sales-records.xml")/*/record
  let $pname := $sales/product-name
  (: After grouping, $pname holds one product name and $sales all of its records. :)
  group by $pname
  order by $pname
  return <product name="{$pname}">{sum($sales/qty)}</product>
}</sales-qty-by-product>
```
The result contains one product element per product name, such as <product name="Laptop">150</product>, with totals aggregated by the group by clause and sorted by the order by clause. In such data summarization, for iterates over the records, let binds variables, group by categorizes, and return constructs the output.[72]
XQuery 3.1 introduces native support for JSON data through functions like json-doc and the lookup operator (?) for map and array access, allowing JSON structures to be queried directly. For example, given a JSON file (mildred.json) with contact details such as {"phone": [{"type": "mobile", "number": "07356 740756"}]}, the following extracts the mobile phone number:
```xquery
json-doc("mildred.json")?phone?*[?type = 'mobile']?number
```
The expression chains ? operators to access the array of phones, select its members with ?*, filter them by type, and retrieve the number, returning "07356 740756". Such syntax simplifies JSON processing without conversion to XML.[73]
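Because json-doc needs a file, the same lookup can also be tried with fn:parse-json, which is also part of XQuery 3.1, applied to an inline string. The second phone entry below is hypothetical and is included only to show the filter at work:
```xquery
let $contact := parse-json('{"phone": [
  {"type": "mobile", "number": "07356 740756"},
  {"type": "home", "number": "01234 567890"}
]}')
return (
  (: The filter keeps only the map whose "type" entry is "mobile". :)
  $contact?phone?*[?type = 'mobile']?number,   (: "07356 740756" :)
  (: ?* returns every member of the array. :)
  count($contact?phone?*),                     (: 2 :)
  (: Integer keys select array members; arrays are 1-based. :)
  $contact?phone?2?type                        (: "home" :)
)
```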
Transformation examples demonstrate how XQuery constructs new XML from input data, often combining queries with element constructors. Using a document (head_para.xml) with implicit sections marked by <h2> headings followed by paragraphs, the following FLWOR with a tumbling window restructures it into explicit sections:
```xquery
declare variable $seq := doc("head_para.xml");
<chapter>{
  (: Each window starts just after an <h2> and ends just before the next <h2>
     (or at the end of the input); $s is bound to the preceding <h2>. :)
  for tumbling window $w in $seq/body/*
  start previous $s when $s[self::h2]
  end next $e when $e[self::h2]
  return <section title="{data($s)}">
    {for $x in $w return <para>{data($x)}</para>}
  </section>
}</chapter>
```
The result is a <chapter> element containing <section> elements whose titles come from the <h2> headings and whose paragraphs are wrapped in <para> elements, converting the flat structure into hierarchical XML. Window clauses like tumbling window enable such groupings over sequences.[72]
Error Handling and Optimization
XQuery distinguishes three primary categories of errors: static errors, dynamic errors, and type errors. Static errors are detected during the static analysis phase, before query evaluation. They include syntax violations and references to undeclared variables; for example, err:XQST0046 may be raised for an invalid URI literal.[74] Dynamic errors arise during evaluation and include runtime failures such as numeric overflow or division by zero. err:XPDY0002, for instance, is raised when evaluation depends on a part of the dynamic context, such as the context item, that is absent.[75] Type errors, which can be raised either statically or dynamically, occur when the type of a value does not match the type expected in its context, such as err:XPTY0004 for a type mismatch in a function call.[76]
Error handling in XQuery addresses dynamic and type errors through the try/catch expression. It encapsulates potentially failing code and supplies alternative processing in a catch clause. The try clause evaluates the enclosed expression. If an error is raised, a matching catch clause (selected by error code, or by the wildcard *) is evaluated instead. Inside that clause, error details are available through implicitly bound variables such as $err:code (a QName, in the namespace http://www.w3.org/2005/xqt-errors for standard errors), $err:description, and $err:value.[77] This mechanism supports graceful fallback logic without halting the entire query.
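A minimal sketch of try/catch follows. The app namespace, its URI, and local:checked-price are hypothetical; the helper raises an application-defined error with fn:error so that it can be caught by its QName.
```xquery
(: The err prefix is declared explicitly for portability; many processors predeclare it. :)
declare namespace err = "http://www.w3.org/2005/xqt-errors";
(: Hypothetical namespace for application-defined error codes. :)
declare namespace app = "http://example.com/app-errors";

(: Hypothetical helper that raises a custom error with fn:error. :)
declare function local:checked-price($p as xs:decimal) as xs:decimal {
  if ($p lt 0)
  then fn:error(fn:QName("http://example.com/app-errors", "app:NEGPRICE"),
                "Price must not be negative", $p)
  else $p
};

(
  (: Integer division by zero raises err:FOAR0001; the catch clause supplies 0. :)
  try { 10 idiv 0 } catch err:FOAR0001 { 0 },

  (: The custom error is matched by its QName; $err:value holds the third fn:error argument. :)
  try { local:checked-price(-5) }
  catch app:NEGPRICE { $err:description || " (" || $err:value || ")" },

  (: The wildcard catches any error; "abc" cannot be cast to xs:integer. :)
  try { xs:integer("abc") } catch * { "caught " || local-name-from-QName($err:code) }
)
```
The expected result is the sequence 0, "Price must not be negative (-5)", "caught FORG0001".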
Predefined error codes, standardized across implementations, support precise error identification and debugging. Over 200 codes are defined for conditions such as err:FORG0006, an invalid argument type (raised, for example, when the effective boolean value of an operand cannot be computed).[78] Additionally, the fn:error function raises errors explicitly during evaluation, optionally with a custom error code, description, and value, providing a means to enforce business rules or validate inputs programmatically.[79]
Optimization in XQuery focuses on improving query efficiency while preserving semantics, and implementations are permitted to apply transformations during compilation and execution. Query rewriting techniques such as predicate pushdown, where selection conditions are moved closer to the points of data access, reduce intermediate result sizes and make effective use of indexes. For example, a path expression like //part[color eq "Red"] can be answered from a value index on the color element.[80] Some implementations compile XQuery to an intermediate representation, such as bytecode, to enable just-in-time optimization and faster execution on virtual machines, which benefits complex FLWOR expressions.[81] The stable keyword in the order by clause of a FLWOR expression requires that tuples with equal sort keys keep their input order. This gives deterministic results, but it can rule out some reorderings that an optimizer could apply to an unstable order by.[82]
Profiling tools in XQuery environments analyze query execution to identify bottlenecks, generating plans that detail operator costs, data flows, and timings to inform rewrites or index strategies. These tools measure metrics such as total execution time and subexpression durations, allowing developers to quantify the effect of optimizations such as index use, which can reduce lookup cost from linear scans to logarithmic time in large XML datasets.[83]
Comparisons
XQuery vs. XSLT
Both XQuery and XSLT share foundational elements that enable them to process XML data effectively. They operate on the XQuery and XPath Data Model (XDM), which defines the structure and types of XML instances, including nodes, atomic values, and sequences.[18] Both languages use XPath as their core expression language for navigating and selecting data within XML documents. Additionally, both produce XML output by default and support serialization to other formats, ensuring compatibility in XML-centric environments.[6] Starting with version 2.0, XSLT shares XQuery's expression language through XPath 2.0. It also adds functional constructs such as user-defined functions (xsl:function) and sequence processing that align closely with XQuery.
Despite these commonalities, XQuery and XSLT diverge significantly in their paradigms and design goals. XQuery is a Turing-complete functional programming language optimized for querying and manipulating XML data in a declarative, expression-based manner, resembling SQL but extended for hierarchical structures.[84] In contrast, XSLT is a template-based stylesheet language with a rule-driven, push-style processing model, in which patterns match input nodes and templates generate the corresponding output.[85] XSLT is also Turing-complete, but its template matching is geared toward document-oriented transformation rather than the query-style iteration of XQuery's FLWOR expressions.[86] These differences stem from XQuery's focus on database-like retrieval and aggregation, versus XSLT's emphasis on stylistic and structural reformatting.
In practice, XQuery excels in use cases involving database-style data retrieval and reporting from large XML repositories, such as extracting and aggregating information from multi-terabyte XML databases to generate structured reports.[3] XSLT, however, is particularly suited for document styling and narrative transformations, like converting XML content into HTML for web presentation or reformatting reports for human-readable output. For instance, XQuery might query a collection of XML invoices to compute totals and filter by criteria, while XSLT would apply templates to render the same data as a formatted web page.
Interoperability between the two languages is facilitated by their shared foundations, allowing for integration in certain implementations. While the XSLT 3.0 standard does not directly support importing XQuery libraries or embedding XQuery expressions, some processors enable this through extensions, such as invoking XQuery functions from XSLT stylesheets.[86] Additionally, XSLT 2.0's xsl:analyze-string instruction enables regular-expression-based analysis akin to XQuery's string functions, and vendor extensions often support direct XQuery embedding for hybrid processing. This integration allows developers to use XQuery's querying power within XSLT's transformation framework when needed.[86]
XQuery vs. SQL and Other Query Languages
XQuery is tailored for querying hierarchical and semi-structured XML data, enabling direct navigation through path expressions that avoid many of the explicit joins required in relational models.[1] In contrast, SQL is optimized for structured, tabular data stored in relations, where joins are essential to combine data across tables.[87] This fundamental difference in data models, XQuery's tree-based XDM versus SQL's flat rows and columns, makes XQuery more intuitive for document-centric tasks, while SQL excels at enforcing schemas and performing set operations on normalized data.[87] To address interoperability, SQL/XML standards provide bridging mechanisms, such as the XMLQUERY function, which embeds XQuery expressions within SQL statements to process XML alongside relational data.[88]
Both languages are declarative: users specify the desired results without detailing execution steps. XQuery, however, natively handles nesting and returns sequences of XDM items, accommodating non-tabular outputs such as mixed XML structures.[89][1] SQL relies on extensions for hierarchy and often flattens nested data into rows via functions like XMLTABLE.[87] XQuery's native support for hierarchical traversal and XML construction therefore yields more concise queries for complex, nested datasets than SQL's row-oriented approach.[88]
Regarding other languages, XQuery 3.1 incorporates JSON support through maps and arrays in its data model, allowing it to query both XML and JSON without additional extensions.[1] JSONiq, an earlier XQuery extension designed to bridge XML querying with NoSQL JSON stores, adds JSON-specific constructs but has become largely redundant as a standalone language due to these native advances in XQuery.[90] In contemporary applications, XQuery is well suited to semi-structured data from web APIs and documents, where its flexibility with XML and JSON is an advantage over SQL's schema-bound approach for dynamic, hierarchical content.[88] However, for RDF-based linked data, XQuery faces scalability limitations relative to SPARQL, which is engineered for efficient graph pattern matching across large-scale triple stores.
Evolution and Standards
Version History
The XQuery 1.0 specification became a W3C Recommendation on January 23, 2007. It introduced the core FLWOR (For-Let-Where-Order by-Return) expression syntax for querying and transforming XML data, tight integration with XPath 2.0 for path navigation and expression evaluation, and support for basic datatypes derived from XML Schema Part 2. This version established XQuery as a functional language capable of handling ordered and unordered sequences of items, with built-in functions and operators aligned with XPath 2.0.[91]
XQuery 3.0 became a W3C Recommendation on April 8, 2014. It added the group by clause for partitioning sequences into groups based on grouping keys, tumbling and sliding window clauses for processing ordered data in frames, try/catch expressions for error handling and recovery, and higher-order functions that allow functions to be passed as arguments or returned as results.[3] These enhancements built on the XQuery 1.0 foundation to support more complex analytical queries and robust programming constructs.[92]
XQuery 3.1 was published as a W3C Recommendation on March 21, 2017. It extended support for JSON data through functions like fn:json-doc() for loading JSON documents and fn:parse-json() for parsing JSON strings. It also introduced maps and arrays to the data model for representing key-value pairs and ordered collections, with constructors for building them.[1] In addition, it added the lookup operator (?) for maps and arrays and the arrow operator (=>) for chaining function calls.[1]
The XQuery Update Facility 1.0 was developed as an extension to XQuery 1.0, providing expressions for modifying instances of the data model, such as insert, delete, and replace operations.[93] It was followed by XQuery Update Facility 3.0, published by the W3C as a Working Group Note on January 24, 2017, which aligned the update facility with XQuery 3.0 features.[94] The XQueryX serialization format, an XML-based representation of XQuery expressions, has seen limited adoption in practice despite its inclusion across versions.[95]
Current Status and Future Developments
As of 2025, XQuery 3.1 remains the stable W3C Recommendation, serving as a versatile query language for processing XML, JSON, and other structured data sources.[2] This version is widely implemented in production tools. These include Saxon 12, which provides full support for XQuery 3.1 alongside experimental features from upcoming standards, and BaseX 12.0, released in June 2025, which includes a compliant XQuery processor emphasizing high-performance XML database operations.[96][97] The XQuery and XSLT Extensions Community Group (QT4CG) is actively developing XQuery 4.0, with an Editor's Draft published on 29 October 2025.[4] This draft builds on XQuery 3.1 by enhancing the underlying XQuery and XPath Data Model (XDM 4.0). It introduces generalized nodes (GNodes) that encompass XML nodes (XNodes) and JSON nodes (JNodes), to better support querying across diverse data formats like XML and JSON.[98] Timezone handling is improved through explicit timezone support in functions like fn:current-dateTime and an implementation-defined implicit timezone expressed as an xs:dayTimeDuration, addressing precision in date/time operations across global data sources.[4]
Adoption of XQuery persists in XML-centric ecosystems, particularly for data integration and querying in enterprise tools, where XML-based technologies are supported by approximately 70% of data integration platforms.[99] It integrates with REST APIs in XML-heavy applications, such as content management and document processing. The XML databases software market, reliant on XQuery for native querying, is projected to reach $329 million in 2025, reflecting steady growth in specialized domains like publishing and compliance reporting.[100]
Future developments under QT4CG emphasize multimodality in XDM 4.0, enabling seamless handling of XML, JSON, HTML, and emerging structures like maps and arrays to converge with NoSQL data models and broaden applicability beyond traditional XML stores.[98] While direct AI-assisted querying remains exploratory, enhancements like generalized nodes position XQuery for integration with hybrid data environments, potentially supporting automated query generation for diverse sources including NoSQL systems.[101]
Extensions
W3C Extensions
The W3C has developed several standardized extensions to the core XQuery language to address specific needs in data manipulation and querying, ensuring compatibility while enhancing functionality for XML processing. These extensions are defined in separate specifications that build upon the XQuery data model and extend its grammar, so processors that implement them accept the additional expressions directly. They maintain the declarative nature of XQuery but introduce specialized expressions for updates, full-text search, and alternative syntax representations.
The XQuery Update Facility 1.0, published as a W3C Recommendation on March 17, 2011, updates XML data using snapshot semantics: update expressions collect modifications in a pending update list rather than altering instances immediately. This facility supports several operations. insert adds nodes before, after, or into a target (e.g., insert node <year>2005</year> after $book/year). delete removes specified nodes (e.g., delete node $book/author). replace substitutes a node or its value (e.g., replace value of node $book/price with 29.99). The pending updates are applied atomically via the upd:applyUpdates operation at the end of evaluation, preserving node identities where possible and allowing integration with FLWOR expressions for patterned modifications. Additionally, the transform expression (copy-modify-return) modifies copies of nodes without affecting the originals (e.g., copy $newBook := $book modify (replace value of node $newBook/price with 29.99) return $newBook), and revalidation modes such as strict, lax, or skip control schema validation of updated documents.[93]
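A minimal sketch of the transform expression follows. It assumes a processor that implements the Update Facility (BaseX and Saxon-EE do, for example), and the book record is hypothetical. All three changes are collected in a pending update list and applied to the copy when the modify clause finishes, so $book itself still reports its original price:
```xquery
(: Hypothetical book record. :)
let $book :=
  <book>
    <title>XQuery Basics</title>
    <author>A. Writer</author>
    <year>2004</year>
    <price>39.99</price>
  </book>
return (
  (: copy/modify/return edits a deep copy of $book. :)
  copy $newBook := $book
  modify (
    replace value of node $newBook/price with 29.99,
    insert node <edition>2</edition> after $newBook/year,
    delete node $newBook/author
  )
  return $newBook,
  (: The original is untouched: this is still "39.99". :)
  $book/price/string()
)
```
Under these assumptions the first item is <book><title>XQuery Basics</title><year>2004</year><edition>2</edition><price>29.99</price></book>.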
The XQuery and XPath Full Text 1.0 specification, also a W3C Recommendation from March 17, 2011, extends XQuery with capabilities for sophisticated text retrieval in XML documents through the contains text expression (FTContainsExpr), as in $doc/books/book contains text "XML" ftand "query". Key features include thesaurus options (FTThesaurusOption), which expand queries with synonyms or related terms from external thesauri (e.g., "duty" using thesaurus at "http://example.org/thesauri.xml" relationship "synonym"), with configurable levels of relatedness. Match options allow customization of case sensitivity, diacritics, stemming, wildcards, and stop words (e.g., using case insensitive using stemming using stop words ("the", "a")), enabling precise control over token normalization. The specification also defines positional constraints such as distance, scope (same sentence or paragraph), and relevance scoring through weights and score variables. This extension was updated in Full Text 3.0 (W3C Recommendation, November 24, 2015), which aligned the grammar with XQuery 3.0 without altering core semantics.[102][103]
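A small sketch of contains text follows. It assumes a processor that implements the Full Text extension (BaseX is one), and the titles are hypothetical. Search terms are matched against the words of each title, and ftand and ftnot combine the conditions:
```xquery
(: Hypothetical book titles. :)
let $books :=
  <books>
    <book><title>XML Query Basics</title></book>
    <book><title>Relational Databases</title></book>
    <book><title>Query Optimization for XML Stores</title></book>
  </books>
return (
  (: Both words must occur, in any order: matches the first and third titles. :)
  $books/book[title contains text "XML" ftand "Query"]/title/string(),
  (: ftnot excludes titles containing the word "Stores": matches the first title only. :)
  $books/book[title contains text "Query" ftand ftnot "Stores"]/title/string()
)
```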
XQueryX 3.1, released as a W3C Recommendation on March 21, 2017, provides an XML-based syntax as an alternative to the textual XQuery notation, facilitating embedding in XML documents or processing with XML tools like XSLT. It represents XQuery constructs as XML elements (e.g., a FLWOR expression as <flworExpr><forClause><forClauseItem><typedVariableBinding><varName>item</varName></typedVariableBinding>...</forClauseItem></forClause>...</flworExpr>). This ensures semantic equivalence to textual XQuery 3.1 while avoiding the parsing ambiguities of the textual form. The format is backward compatible with XQueryX 3.0, adds representations for new features like map and array constructors, and is particularly useful for generating or querying XQuery code within XML workflows.[95]
All these W3C extensions integrate with core XQuery by extending its grammar and its static and dynamic contexts. No module import is needed to use them: a processor that implements, for example, the Update Facility accepts update expressions directly, while processors without such support typically report them as static errors. Because support for each extension is optional, the extensions can be adopted modularly without affecting base-language conformance.[93][102][95]
Third-Party and Vendor Extensions
MarkLogic Server provides a suite of proprietary XQuery extensions through the xdmp namespace, enabling server-specific operations such as retrieving cluster configuration details and managing mimetypes across nodes.[104] These functions support high-availability clustering by allowing queries to interact with host statuses, database IDs, and modules roots, facilitating distributed document processing in enterprise environments.[105] For security, MarkLogic's security functions offer granular control over authentication, roles, and permissions, including tasks like user management and privilege evaluation directly within XQuery expressions.[106] MarkLogic's JSON handling draws inspiration from JSONiq principles but has evolved into custom extensions, such as json:transform-to-json and json:transform-from-json for converting between XML and JSON. These are optimized for its multi-model database architecture rather than adhering strictly to the JSONiq specification.
Saxon, an XQuery and XSLT processor whose Home Edition is open source, extends the language with functions like saxon:serialize. This function allows fine-grained control over output formatting by serializing nodes according to custom parameters, such as indentation or encoding.[107] This extension is particularly useful for generating dynamic outputs in applications requiring precise document rendering. For handling large datasets, Saxon's streaming enhancements, including saxon:stream, enable processing of documents that exceed memory limits by reading input sequentially.[108] These features maintain compatibility with W3C standards while adding proprietary optimizations for performance-critical transformations.[109]
BaseX supports XQJ (XQuery API for Java), a Java-centric binding for executing XQuery with methods for connection pooling, prepared queries, and result handling, alongside its open-source implementation supporting XQuery up to version 4.0.[110][97] This API facilitates integration with Java applications by allowing dynamic query compilation and error reporting tailored to database operations. eXist-db complements this with its XQuery Update Extension, which provides update statements such as update insert and update replace for modifying persistent documents atomically. It also offers a dedicated versioning module that tracks changes to documents over time.[111] These features support temporal-like querying by maintaining historical revisions, though they require explicit module imports for compatibility.[112]
In the broader community, JSONiq emerged as a third-party extension to XQuery for native JSON processing, introducing constructs like object and array literals while building on XQuery's data model, and it is maintained as of 2025.[113] JSONiq influenced XQuery 3.1's adoption of maps and arrays for JSON interoperability, though implementations must address compatibility gaps, such as stricter type restrictions in JSONiq versus XQuery's flexible sequences.[114] This evolution highlights community-driven innovations that prioritize JSON workflows but risk fragmentation without full alignment to core standards.[115]
