RELAX NG
| RELAX NG | |
|---|---|
| Filename extension | .rng |
| Internet media type | application/xml, text/xml |
| Type of format | XML schema language |
| Extended from | XML |
In computing, RELAX NG (REgular LAnguage for XML Next Generation) is a schema language for XML. A RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema is itself an XML document, but RELAX NG also offers a popular compact, non-XML syntax.[1] Compared to other XML schema languages, RELAX NG is considered relatively simple.
It was defined by a committee specification of the OASIS RELAX NG technical committee in 2001 and 2002, based on Murata Makoto's RELAX and James Clark's TREX,[2][3][4] and also by part two of the international standard ISO/IEC 19757: Document Schema Definition Languages (DSDL).[5][6] ISO/IEC 19757-2 was developed by ISO/IEC JTC 1/SC 34 and published in its first version in 2003.[7]
Comparison with W3C XML Schema
Although the RELAX NG specification was developed at roughly the same time as the W3C XML Schema specification, the latter was arguably better known and more widely implemented in both open-source and proprietary XML parsers and editors when it became a W3C Recommendation in 2001. Since then, however, RELAX NG support has increasingly found its way into XML software, and its acceptance has been aided by its adoption as a primary schema for popular document-centric markup languages such as DocBook, the TEI Guidelines, OpenDocument, and EPUB.
RELAX NG shares many features with W3C XML Schema that set both apart from traditional DTDs: data typing, regular expression support, namespace support, ability to reference complex definitions.
Filename extensions
By informal convention, RELAX NG schemas in the regular syntax are typically named with the filename extension ".rng". For schemas in the compact syntax, the extension ".rnc" is used.
Determinism
RELAX NG schemas are not necessarily "deterministic" or "unambiguous".
Converting Relax NG to DTD
RELAX NG schemas can be converted to DTDs with Trang, which can be found at: [1]. The manual for Trang is located at [2]. Note that Trang is unable to convert the OASIS DITA 1.3 schema to DTDs, failing with messages like:
sorry, combining definitions with combine="choice" is not supported
See also
- XML schemas
- DTD (Document Type Definition)
- Document Structure Description
- XML Schema (W3C)
- Schematron
- ODD (One Document Does it all)
- SXML
References
- ^ RELAX NG Compact Syntax
- ^ James Clark. "TREX - Tree Regular Expressions for XML - "TREX has been merged with RELAX to create RELAX NG."". Retrieved 2009-12-28.
- ^ Murata Makoto (2002-04-03). "RELAX (Regular Language description for XML) -- "RELAX NG of OASIS. It is a schema language created by unifying RELAX Core and TREX."". Retrieved 2009-12-28.
- ^ "TREX and RELAX Unified as RELAX NG, a Lightweight XML Language Validation Specification". Cover Pages. 2001-06-05. Retrieved 2009-12-28.
- ^ RELAX NG Specification
- ^ RELAX NG Technical Committee
- ^ ISO. "ISO/IEC 19757-2:2008 - Information technology -- Document Schema Definition Language (DSDL) -- Part 2: Regular-grammar-based validation -- RELAX NG". ISO. Retrieved 2009-12-28.
External links
- RELAX NG home page
- "The Design of RELAX NG" by James Clark
- RELAX NG tutorial for the XML syntax
- RELAX NG tutorial for the compact syntax
- Design patterns for structuring XML documents
- RELAX NG Book by Eric van der Vlist, released under the GNU Free Documentation License
- Relax NG Reference by ZVON
- RELAX NG Java community projects at java.net
- Sun Multi-Schema Validator (MSV) open-source Java XML toolkit
- Relax NG Compact Syntax validator open-source C program
- XSD to Relax NG Converter Web-based converter
- https://github.com/relaxng/jing-trang
RELAX NG
Overview
Definition and Purpose
RELAX NG, an acronym for REgular LAnguage for XML Next Generation, is a schema language for defining patterns that describe the structure, content, and valid sequences of elements and attributes in XML documents.[6][7] Its primary purpose is to enable validation of XML instances against these schemas, ensuring conformance to predefined rules, maintaining data integrity, and promoting interoperability across XML-based applications and systems.[7][3] A RELAX NG schema can be written either as an XML document or in a compact non-XML syntax, and validating against it leaves the XML information set of the instance unmodified.[3][8] RELAX NG emerged as an alternative to more complex schema languages for XML, prioritizing simplicity and ease of use over the more intricate features of W3C XML Schema while being more expressive than DTDs.[5]
Key Features
RELAX NG supports two distinct syntaxes for authoring schemas: an XML-based syntax typically stored in files with the .rng extension, and a compact, non-XML syntax stored in files with the .rnc extension. Authors can choose between a structured XML representation and a more concise, grammar-like notation that improves readability while preserving full expressiveness.[8]
The schema language integrates with W3C XML Schema datatypes, enabling validation of content against built-in types such as string and integer, as well as restrictions of them, through the data element together with the datatypeLibrary attribute, which specifies the datatype library.[9] This integration allows RELAX NG to reuse a robust datatype system without redefining it, focusing instead on structural patterns.
RELAX NG has a namespace-friendly design: names are resolved by namespace URI rather than by prefix, so schemas can reference external namespaces via URI identifiers or prefixed names without prefix conflicts during schema interpretation and validation.[9]
Unlike some schema languages that enforce strict determinism, RELAX NG permits ambiguous grammars in which multiple patterns may match a given context; a document is accepted if any of them matches, so schemas need not be rewritten into an unambiguous form.[9]
RELAX NG documents in XML syntax are identified by the internet media types application/xml and text/xml, ensuring compatibility with standard XML processing tools and parsers.[1]
The language can also express co-occurrence constraints between attributes and elements, for instance requiring a specific attribute only when certain child elements are present.
History
Development Origins
RELAX NG originated from the convergence of two independent efforts to create more expressive and flexible schema languages for XML documents. In early 2000, Makoto Murata developed RELAX (Regular Language description for XML), a schema language that leveraged regular language theory to describe XML structures, initially published as a specification on February 24, 2000, and later approved as an ISO/IEC Technical Report (TR 22250-1) in May 2001.[11][12] Independently, in January 2001, James Clark introduced TREX (Tree Regular Expressions for XML), a compact schema language based on tree automata that emphasized simplicity and the use of W3C XML Schema datatypes.[13][14] These prototypes addressed key shortcomings in existing XML validation mechanisms, particularly the Document Type Definitions (DTDs) introduced with XML 1.0, which lacked support for XML namespaces and offered limited expressiveness for complex content models.[5]
The unification of RELAX and TREX began in March 2001 when OASIS formed the TREX Technical Committee, chaired by James Clark, to standardize TREX as an XML validation language.[15] By June 2001, recognizing the synergies between the two approaches, OASIS merged the TREX committee with efforts around Murata's RELAX, renaming it the RELAX NG Technical Committee to develop a single, lightweight schema language.[16] Clark continued as chair, with Murata as a key contributor, aiming to create a "next generation" schema that combined RELAX's modularity with TREX's streamlined syntax while avoiding the anticipated verbosity and complexity of the emerging W3C XML Schema recommendation.[17] This motivation stemmed from the need for a schema language that was easier to author and maintain than DTDs (without their namespace insensitivity and rigid attribute handling) and simpler than XML Schema's type hierarchy and particle constraints.[5]
Early development involved iterative prototypes and discussions within the committee, building on the theoretical foundations of hedge automata from RELAX and the practical validation algorithms from TREX. The first working draft of the RELAX NG specification was released on September 17, 2001, marking the initial formal outline of the unified language.[18] This draft incorporated core patterns from both predecessors, focusing on composable grammars to enable reusable schema modules, and set the stage for further refinement toward an OASIS committee specification.[17]
Standardization Process
The standardization of RELAX NG began with its approval as an OASIS Committee Specification. On December 3, 2001, the OASIS RELAX NG Technical Committee approved the RELAX NG specification, which defined the XML syntax for the schema language. This marked the formal consolidation of earlier efforts into a unified standard for XML schema definition.
Subsequently, the compact syntax was developed to provide a more concise, non-XML alternative for authoring schemas. On November 21, 2002, the committee approved the RELAX NG Compact Syntax as a Committee Specification, making schemas easier to read and edit by hand while maintaining equivalence to the XML syntax.[8]
RELAX NG was incorporated into the broader Document Schema Definition Languages (DSDL) framework, which coordinates multiple schema languages for comprehensive XML validation. As Part 2 of DSDL, RELAX NG focuses on regular-grammar-based validation, integrating with other parts like Schematron for rule-based assertions.[19]
The international standardization process advanced through ISO/IEC JTC 1/SC 34. In December 2003, RELAX NG was published as ISO/IEC 19757-2:2003, titled Information technology -- Document Schema Definition Languages (DSDL) -- Part 2: RELAX NG, establishing it as a formal international standard for XML schema patterns based on regular tree grammars.[19] An amendment in 2006 (ISO/IEC 19757-2:2003/Amd 1:2006) added explicit support for the compact syntax.[20] The standard underwent a minor revision in December 2008 with the second edition (ISO/IEC 19757-2:2008), which incorporated the 2006 amendment and provided clarifications on schema requirements and validation rules without altering core functionality.[21] The 2008 edition remains the current version of RELAX NG Version 1, with the standard last reviewed and confirmed in 2024. A Version 2 is under development by ISO/IEC JTC1/SC34/WG1 to address evolving needs in XML processing.[3][21]
Design Principles
Simplicity and Ease of Use
RELAX NG was designed with a primary goal of minimizing complexity to enhance usability for schema authors and implementers. Its pattern-based approach draws inspiration from context-free grammars, allowing schemas to define XML structures in a way that feels intuitive, particularly for those accustomed to regular expressions or formal language theory. By focusing on declarative patterns rather than intricate type hierarchies or inheritance mechanisms, RELAX NG avoids unnecessary abstractions that could complicate schema authoring. This design choice ensures that schemas can directly mirror the hierarchical structure of the target XML documents without requiring additional flattening or restructuring steps.[5][22]
A key aspect of RELAX NG's simplicity lies in its validation process, which operates directly on the input XML without altering the document's infoset. Unlike certain schema languages that mandate preprocessing, such as inserting default attributes or handling nilled elements, RELAX NG performs validation as a pure matching exercise against defined patterns. This approach eliminates the need for schema processors to modify the original document, reducing potential errors and simplifying integration into XML pipelines. As a result, validation is free of side effects and non-intrusive, preserving the integrity of the input while confirming conformance.[5]
The learning curve for RELAX NG is notably gentle, as it supports straightforward pattern matching without the verbose declarations often required in alternatives like XML Schema. Developers can express constraints using concise constructs that prioritize clarity over boilerplate, such as defining sequences and choices in a linear, grammar-like fashion. This reduces the overhead of schema creation and maintenance, making it accessible even to users without deep expertise in schema technologies. The language's uniform treatment of elements and attributes further streamlines comprehension, as patterns apply consistently across content models.[5]
Readability is further enhanced in RELAX NG's compact syntax, which employs familiar delimiters like parentheses for grouping subpatterns and vertical bars for specifying alternatives, evoking the notation of regular expressions. This syntax option allows authors to write schemas in a textual format that closely resembles the informal sketches often used in documentation or prototyping. By leveraging such intuitive symbols, RELAX NG facilitates quick iteration and review, contributing to its overall ease of use in practical development workflows.[22][23]
Modularity and Namespace Handling
RELAX NG supports modularity through the include and div elements, which enable schemas to reference and incorporate external definitions for reuse across multiple schemas. The include element uses an href attribute to merge an external grammar into the current one, effectively inlining its patterns while allowing optional redefinition of specific definitions via nested define elements. For instance, a schema might include a module for common inline elements like this: <include href="inline.rng"/>, which adds reusable patterns without duplicating code. The div element further enhances organization by grouping related definitions within a grammar, such as <div><define name="header">...</define></div>, facilitating modular structure and annotation without affecting semantics.[7]
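The redefinition behaviour of include can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not a conforming processor: the two schema strings, the module name inline.rng, and the helper functions are hypothetical, and the sketch ignores div, combine attributes, and nested includes. It also lets the grammar's own definitions silently win, whereas a real processor would report a name clash that lacks a combine attribute.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Hypothetical module standing in for the contents of inline.rng.
INLINE_RNG = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <define name="inline"><text/></define>
  <define name="emph"><element name="em"><ref name="inline"/></element></define>
</grammar>
"""

# Hypothetical including schema: it redefines "emph" inside <include>.
MAIN_RNG = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <include href="inline.rng">
    <define name="emph"><element name="strong"><ref name="inline"/></element></define>
  </include>
  <start><ref name="emph"/></start>
</grammar>
"""

def defines(parent):
    """Map each define name to its <define> element (direct children only)."""
    return {d.get("name"): d for d in parent.findall(RNG + "define")}

def merge_include(main_xml, modules):
    """Return the effective definitions after resolving each <include>."""
    grammar = ET.fromstring(main_xml)
    effective = {}
    for inc in grammar.findall(RNG + "include"):
        included = ET.fromstring(modules[inc.get("href")])
        effective.update(defines(included))  # definitions from the module
        effective.update(defines(inc))       # overrides nested in <include>
    effective.update(defines(grammar))       # the grammar's own definitions
    return effective

def describe(define):
    """Name of the first pattern in a definition, or its element type."""
    first = define[0]
    return first.get("name") or first.tag.replace(RNG, "")

merged = merge_include(MAIN_RNG, {"inline.rng": INLINE_RNG})
for name in sorted(merged):
    print(name, "->", describe(merged[name]))
```

In this sketch the nested define replaces the module's version of emph, while inline is taken from the module unchanged.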
Linking mechanisms in RELAX NG, particularly the externalRef element, allow schemas to import patterns from external files while preserving the original context during composition. The href attribute of externalRef identifies another schema file, and the element matches whatever pattern that file contains, with the namespace and scoping information of the imported module kept intact to avoid resolution conflicts. An example is <externalRef href="common.rng"/> within an element pattern, which integrates reusable components like data types or structures into the host schema. This approach promotes reuse by treating external patterns as black boxes, with validation occurring as if they were inline.[7]
RELAX NG integrates namespaces directly through xmlns declarations in schema documents, enabling the use of qualified names for elements and attributes without complications from default namespace inheritance. Schemas declare namespaces via attributes like xmlns:prefix="URI", and the ns attribute on grammar elements specifies the default namespace URI for contained names, such as ns="http://example.com/doc". This allows precise qualification, for example, <element name="ex:doc" xmlns:ex="http://example.com/doc">, ensuring that local names are bound correctly regardless of surrounding contexts and avoiding issues where unqualified names might inadvertently fall into unintended namespaces. Attributes, by convention, belong to the empty namespace unless explicitly qualified.[7]
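How the ns attribute determines the namespace of unprefixed names can be sketched in Python. This is an illustration under assumptions rather than a full implementation: the schema string and the function expanded_names are hypothetical, and prefixed names such as ex:doc are not handled because ElementTree discards prefix bindings while parsing.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Hypothetical schema: ns is set on the outer element and reset on one child.
SCHEMA = """
<element name="doc" ns="http://example.com/doc"
         xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="id"/>
  <element name="title"><text/></element>
  <element name="note" ns=""><text/></element>
</element>
"""

def expanded_names(node, inherited_ns=""):
    """Yield (kind, namespace URI, local name) for each named pattern."""
    ns = node.get("ns", inherited_ns)  # the nearest ns attribute wins
    kind = node.tag.replace(RNG, "")
    name = node.get("name")
    if kind == "element" and name is not None:
        yield kind, ns, name
    elif kind == "attribute" and name is not None:
        # An attribute pattern without its own ns attribute names an
        # attribute in no namespace; it does not inherit from ancestors.
        yield kind, node.get("ns", ""), name
    for child in node:
        yield from expanded_names(child, ns)

names = list(expanded_names(ET.fromstring(SCHEMA)))
for entry in names:
    print(entry)
```

The title element inherits the namespace of doc, the note element opts out with ns="", and the id attribute stays in no namespace.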
Co-occurrence constraints in RELAX NG permit the definition of interdependencies across modules, such as requiring specific attribute-element pairings, through compositors like interleave and choice applied to imported patterns. For example, a module might define a card element that interleaves a name attribute with an email child element, ensuring they co-occur appropriately when referenced via externalRef or include: <interleave><attribute name="name"/><element name="email">...</element></interleave>. This mechanism enforces relational rules between components from different modules during validation, maintaining consistency in composed schemas without needing monolithic definitions.[7]
Syntax
XML Syntax
RELAX NG schemas in XML syntax are themselves valid XML documents that adhere to the RELAX NG namespace http://relaxng.org/ns/structure/1.0.[1] This syntax enables the use of standard XML tools for parsing, editing, and processing schemas. The root of the schema can be a single pattern, such as an <element>, or a <grammar> element for more modular definitions containing named patterns via <define> elements.[7]
Key elements in the XML syntax include <element> for defining element patterns, <attribute> for attribute patterns, and <text/> for matching text content. The <grammar> element, when used, serves as a container that includes a <start> element to designate the initial matching pattern and <define> elements to create reusable named patterns referenced by <ref>. This structure allows for hierarchical and modular schema composition while maintaining XML conformance.[1]
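Because a grammar is ordinary XML, the relationship between start, define, and ref can be inspected with a standard XML parser. The following Python sketch uses a hypothetical grammar that deliberately leaves one reference undefined; the function check_refs is an illustrative helper, and it assumes a single flat grammar with no include, nested grammar, or parentRef.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Hypothetical grammar; the "author" definition is missing on purpose.
GRAMMAR = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="book"/></start>
  <define name="book">
    <element name="book">
      <ref name="title"/>
      <ref name="author"/>
    </element>
  </define>
  <define name="title"><element name="title"><text/></element></define>
</grammar>
"""

def check_refs(xml_text):
    """Return (has_start, sorted names that are referenced but not defined)."""
    root = ET.fromstring(xml_text)
    defined = {d.get("name") for d in root.iter(RNG + "define")}
    referenced = {r.get("name") for r in root.iter(RNG + "ref")}
    has_start = root.find(RNG + "start") is not None
    return has_start, sorted(referenced - defined)

has_start, missing = check_refs(GRAMMAR)
print("has start:", has_start)
print("undefined refs:", missing)
```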
A simple example of an XML syntax schema defines a book element with required title and author child elements, each containing text:
    <element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
      <element name="title">
        <text/>
      </element>
      <element name="author">
        <text/>
      </element>
    </element>
This schema matches a document such as <book><title>Foundation</title><author>Isaac Asimov</author></book>, ensuring the specified structure and content types.[7]
Namespaces are declared using standard XML xmlns attributes on the schema document or elements, with the RELAX NG namespace typically bound to a default or prefix like rng. For patterns matching namespaced elements or attributes, the ns attribute specifies the target namespace URI on <element> or <attribute>, such as <element name="book" ns="http://example.com/book">. This approach integrates seamlessly with XML's namespace mechanisms without requiring additional scoping elements within the <grammar>.[7]
The XML syntax is notably verbose compared to alternatives but excels in machine readability, making it ideal for automated schema generation by tools and integration into XML processing pipelines.
Compact Syntax
The RELAX NG compact syntax provides a non-XML, grammar-like notation for defining schemas, designed to enhance readability while preserving full equivalence to the XML syntax.[8] It employs keywords such as element for defining elements and attribute for attributes, along with braces ({}) for enclosing the content of named constructs, a comma (,) for sequences, a vertical bar (|) for choices, an ampersand (&) for interleaving, and the repetition operators * for zero or more occurrences, + for one or more, and ? for optional patterns.[8] These constructs map directly to their XML syntax counterparts, such as element nameClass { pattern } translating to <element name="nameClass"><pattern/></element>, ensuring lossless bidirectional conversion between the two forms.[8][23]
For instance, a simple schema for a book element with required title and author subelements can be expressed as:
    element book {
      element title { text },
      element author { text }
    }
This defines a book root element containing exactly one title and one author, each holding text content, demonstrating the compact syntax's concise, declarative style akin to a context-free grammar.[23]
The advantages of the compact syntax include maximized human readability, which facilitates manual editing and comprehension without the verbosity of XML tags, while supporting all RELAX NG features such as namespaces, datatypes, and modularity through mechanisms like named patterns and includes.[8] It is particularly beneficial for schema authors preferring a programming-language-like format over XML's markup-heavy structure, and schemas in this form can be independently translated to XML syntax without introducing dependencies or loss of information.[8][23]
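The correspondence between the two syntaxes can be illustrated by rendering one in-memory pattern tree in both forms. The Python sketch below is a toy serializer, not a substitute for a converter such as Trang: the classes Text and Element and the functions to_rnc and to_rng are hypothetical, only element and text patterns in sequence are supported, element content is assumed to be non-empty, and names are not escaped.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Tuple, Union

RNG_NS = "http://relaxng.org/ns/structure/1.0"

@dataclass(frozen=True)
class Text:
    pass

@dataclass(frozen=True)
class Element:
    name: str
    content: Tuple["Pattern", ...]  # children are matched in sequence

Pattern = Union[Text, Element]

def to_rnc(p, indent=0):
    """Render a pattern in the compact syntax."""
    pad = "  " * indent
    if isinstance(p, Text):
        return pad + "text"
    if len(p.content) == 1 and isinstance(p.content[0], Text):
        return f"{pad}element {p.name} {{ text }}"
    body = ",\n".join(to_rnc(c, indent + 1) for c in p.content)
    return f"{pad}element {p.name} {{\n{body}\n{pad}}}"

def to_rng(p, indent=0):
    """Render a pattern in the XML syntax; the root carries the namespace."""
    pad = "  " * indent
    xmlns = f' xmlns="{RNG_NS}"' if indent == 0 else ""
    if isinstance(p, Text):
        return f"{pad}<text{xmlns}/>"
    body = "\n".join(to_rng(c, indent + 1) for c in p.content)
    return f'{pad}<element name="{p.name}"{xmlns}>\n{body}\n{pad}</element>'

book = Element("book", (Element("title", (Text(),)),
                        Element("author", (Text(),))))
rnc = to_rnc(book)
rng = to_rng(book)
root = ET.fromstring(rng)  # sanity check: the XML form is well-formed
print(rnc)
print(rng)
```

Keeping a single tree and two serializers mirrors the idea that the compact and XML forms are two notations for the same underlying pattern.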
Annotations and comments further enhance usability in the compact syntax. Annotations are enclosed in square brackets [ ] and placed before patterns, allowing embedding of foreign elements or attributes for metadata, which map to XML annotations like <db:annotation> in DocBook-integrated schemas.[8] Comments begin with # and extend to the line end, while ## specifically denotes documentation comments that integrate with annotation mechanisms for schema documentation.[8][23]
Core Concepts
Patterns and Elements
In RELAX NG, patterns serve as the fundamental building blocks for specifying the structure of XML documents, defining what constitutes a valid tree of elements, attributes, and text content.[1] These patterns are composable, allowing schema authors to construct complex structures from simpler ones through operators that model sequences, alternatives, and repetitions.[5] By treating elements and attributes uniformly within content models, RELAX NG patterns enable a declarative approach that mirrors the hierarchical nature of XML without imposing rigid ordering constraints where unnecessary.[5]
The core atomic patterns include text, empty, and notAllowed, which handle textual content and constraints at the leaf level. The text pattern matches any sequence of text characters, including whitespace or empty strings, and is used to specify locations where arbitrary character data is permitted.[24] In contrast, the empty pattern matches only an absence of content, ensuring no text or child elements are present in that position.[8] The notAllowed pattern matches nothing at all, not even empty content; it serves to explicitly prohibit elements or text in a given context and to prevent unintended expansions in schema composition.[1]
Element and attribute declarations are primary patterns for defining named nodes in the XML tree. The element pattern specifies an XML element with a given name (or name class) and an inner pattern that constrains its content; for instance, in compact syntax, element foo { text } requires a <foo> element containing only text.[8] Similarly, the attribute pattern declares an XML attribute with a name and inner pattern, such as attribute name { text }, which mandates a name attribute holding text on its parent element.[24] These declarations can nest patterns recursively to capture document hierarchies.
Patterns combine via operators to express relationships between components. The sequence operator, denoted by a comma (,) in compact syntax or a group element in XML syntax, enforces ordered concatenation of subpatterns, as in element header { element title { text }, element author { text } }, which requires a <header> containing a <title> followed immediately by an <author>.[8] The choice operator, represented by a vertical bar (|) or a choice element, allows one of several alternatives, enabling flexible structures like element greeting { "hello" | "hi" } for selectable fixed content.[24]
Repetition modifiers extend patterns to handle multiplicity without verbose recursion. The optional modifier (?) permits zero or one occurrence of a pattern, such as element optionalNote { text }? to allow but not require a note.[8] The one-or-more modifier (+) requires at least one repetition, while zero-or-more (*) allows any number including none; for example, element items { element item { text }+ } defines an <items> element containing one or more <item> subelements, each with text, modeling a non-empty list structure.[1] These operators apply uniformly to elements, attributes, and other patterns, promoting modularity in schema design.[5]
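For readers who know regular expressions, the sequence, choice, and repetition operators can be related to regex operators by looking only at the names of an element's children. The Python sketch below is an analogy under assumptions, not RELAX NG semantics in full: content_regex and matches are hypothetical helpers, the model string is a made-up mini-notation covering names, commas, vertical bars, parentheses, and the ?, +, * suffixes, and attributes, text, nesting, and interleave are ignored. As in the compact syntax, sequences and choices are expected to be mixed only with explicit parentheses.

```python
import re

def content_regex(model):
    """Translate a tiny content-model string into a regex over child names."""
    tokens = re.findall(r"[A-Za-z_][\w.-]*|[(),|?+*]", model)
    out = []
    for tok in tokens:
        if tok == ",":
            continue  # a sequence is plain concatenation in a regex
        elif tok in "()|?+*":
            out.append("(?:" if tok == "(" else tok)
        else:
            # One child element name, terminated by a single space.
            out.append(f"(?:{re.escape(tok)} )")
    return re.compile("".join(out))

def matches(model, children):
    """Check a list of child element names against the content model."""
    text = "".join(name + " " for name in children)
    return content_regex(model).fullmatch(text) is not None

print(matches("title, author+, note?", ["title", "author", "author"]))
print(matches("title, author+, note?", ["author", "title"]))
print(matches("(para | list)*, footer", ["list", "para", "footer"]))
print(matches("item+", []))
```

The last call corresponds to the items example: one or more item children are required, so an empty list of children does not match.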
Data Types and Attributes
RELAX NG integrates data types primarily through external libraries, with the W3C XML Schema Datatypes serving as the recommended standard for defining and constraining textual content within patterns.[7] In the XML syntax this is done with the <data> element, whose type attribute names the datatype and whose datatypeLibrary attribute (which may be inherited from an ancestor element) identifies the library, http://www.w3.org/2001/XMLSchema-datatypes in the case of the XML Schema datatypes; in the compact syntax a datatype is written as a prefixed name, with the xsd prefix predeclared for that library.[25] Custom facets, such as length restrictions or value ranges, are applied using <param> children in the XML syntax or parameters in braces in the compact syntax, enabling precise control over content validity without embedding full schema complexity.[7]
Attributes in RELAX NG are defined using the <attribute> pattern, which specifies the attribute's name and optional content constraints, including data types from the integrated libraries.[1] An attribute pattern is required unless it is wrapped in optional (the ? suffix in the compact syntax), and, unlike elements, the order of attributes is insignificant.[7] For typed attributes, the pattern can use a datatype in place of text (a <data> child in the XML syntax), as in these compact syntax examples:
    attribute name { text }
    attribute age { xsd:integer }
Facets are written as parameters in braces after the datatype name:
    element person {
      attribute name { text },
      element age { xsd:integer { minInclusive = "0" maxInclusive = "150" } }
    }
Here the age element's text content must be an integer between 0 and 150, leveraging XML Schema facets for validation.[7] This approach maintains RELAX NG's focus on pattern-based typing while borrowing robust primitive and derived types from XML Schema.
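What a datatype library is asked to do for a constraint such as xsd:integer with minInclusive and maxInclusive can be approximated in a few lines of Python. This is a minimal sketch, assuming only the integer lexical form (an optional sign followed by decimal digits, with surrounding whitespace ignored) and the two range facets; valid_integer is a hypothetical helper, not part of any RELAX NG implementation.

```python
import re

# Lexical form of xsd:integer: optional sign followed by decimal digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")

def valid_integer(text, min_inclusive=None, max_inclusive=None):
    """Check text against an integer type with optional range facets."""
    lexical = text.strip(" \t\r\n")  # leading/trailing whitespace is ignored
    if not _INTEGER.fullmatch(lexical):
        return False
    value = int(lexical)
    if min_inclusive is not None and value < min_inclusive:
        return False
    if max_inclusive is not None and value > max_inclusive:
        return False
    return True

# The facets from the person/age example: 0 <= age <= 150.
for sample in ["42", " 150 ", "151", "-1", "forty"]:
    print(repr(sample), valid_integer(sample, 0, 150))
```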
For linking and uniqueness constraints akin to ID/IDREF in XML DTDs, RELAX NG relies on the companion RELAX NG DTD Compatibility specification, which introduces specialized datatypes from the library http://relaxng.org/ns/compatibility/datatypes/1.0.[26] These include ID for unique identifiers, IDREF for single references, and IDREFS for lists of references, applied within attribute patterns to enforce the uniqueness of identifiers and the integrity of references within a document.[26] Validation processors supporting this compatibility layer check that all IDREF values match existing IDs and normalize whitespace in lists, updating the infoset accordingly without altering RELAX NG's core schema language.[26] This modular extension preserves RELAX NG's simplicity while providing key constraints via datatype libraries.
Validation
Validation Mechanism
The validation of an XML document against a RELAX NG schema involves a two-input process: the schema itself, which defines the expected structure and content patterns, and the instance document to be checked. The schema is first loaded and compiled into an internal representation, typically as a set of patterns forming a grammar based on regular tree languages. This compilation resolves references and expands patterns like choices or repetitions into a form suitable for matching, enabling efficient processing without requiring the entire document to be built in memory upfront.[27]
Validation commences at the document's root element, where the schema's start pattern (often a top-level element pattern) is applied. The process recursively traverses the document's tree structure, computing "derivatives" of the current pattern with respect to each encountered node, such as start tags, attributes, text content, or child elements. For instance, the derivative of a pattern p with respect to a node x yields a new pattern that matches the remaining sequence after x has been consumed, allowing step-by-step expansion and matching. This derivative-based approach, grounded in finite tree automata theory, ensures that the validation proceeds in a deterministic manner for unambiguous schemas, recursively descending into child patterns while verifying attributes, data types, and sequences. If a mismatch occurs, such as an unexpected element, an incorrect attribute value, or a violation of required cardinality, the process identifies the error by reaching a notAllowed pattern state, pinpointing issues like sequence deviations or type failures.[27]
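The derivative idea can be sketched in Python for a deliberately reduced setting. The assumptions are significant: the input is only a flat list of child element names, so attributes, text, datatypes, and nested content are left out, and all class and function names are hypothetical rather than taken from any validator.

```python
from dataclasses import dataclass

class Pattern:
    """Base class for the simplified pattern algebra."""

@dataclass(frozen=True)
class Empty(Pattern):        # matches the empty sequence
    pass

@dataclass(frozen=True)
class NotAllowed(Pattern):   # matches nothing
    pass

@dataclass(frozen=True)
class Name(Pattern):         # one child element with this name
    name: str

@dataclass(frozen=True)
class Group(Pattern):        # left followed by right
    left: Pattern
    right: Pattern

@dataclass(frozen=True)
class Choice(Pattern):
    left: Pattern
    right: Pattern

@dataclass(frozen=True)
class Interleave(Pattern):
    left: Pattern
    right: Pattern

@dataclass(frozen=True)
class OneOrMore(Pattern):
    inner: Pattern

def choice(a, b):
    """Build a choice, dropping branches that can no longer match."""
    if isinstance(a, NotAllowed):
        return b
    if isinstance(b, NotAllowed):
        return a
    return Choice(a, b)

def pair(cls, a, b):
    """Build a Group or Interleave, simplifying trivial operands."""
    if isinstance(a, NotAllowed) or isinstance(b, NotAllowed):
        return NotAllowed()
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    return cls(a, b)

def nullable(p):
    """True if p matches the empty sequence."""
    if isinstance(p, Empty):
        return True
    if isinstance(p, (NotAllowed, Name)):
        return False
    if isinstance(p, Choice):
        return nullable(p.left) or nullable(p.right)
    if isinstance(p, (Group, Interleave)):
        return nullable(p.left) and nullable(p.right)
    return nullable(p.inner)  # OneOrMore

def deriv(p, name):
    """Pattern that must match what remains after consuming name."""
    if isinstance(p, Name):
        return Empty() if p.name == name else NotAllowed()
    if isinstance(p, Choice):
        return choice(deriv(p.left, name), deriv(p.right, name))
    if isinstance(p, Group):
        first = pair(Group, deriv(p.left, name), p.right)
        if nullable(p.left):
            return choice(first, deriv(p.right, name))
        return first
    if isinstance(p, Interleave):
        return choice(pair(Interleave, deriv(p.left, name), p.right),
                      pair(Interleave, p.left, deriv(p.right, name)))
    if isinstance(p, OneOrMore):
        return pair(Group, deriv(p.inner, name), choice(p, Empty()))
    return NotAllowed()  # Empty and NotAllowed consume nothing

def matches(p, names):
    """Consume names one at a time; valid if the residue is nullable."""
    for n in names:
        p = deriv(p, n)
    return nullable(p)

book = Group(Name("title"), OneOrMore(Name("author")))
print(matches(book, ["title", "author", "author"]))
print(matches(book, ["author", "title"]))
```

In this sketch a mismatch collapses the pattern to NotAllowed, which is the state described above, and a choice keeps every branch that is still viable instead of committing to one early.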
Error reporting focuses on actionable diagnostics, such as "expected element 'foo' but found 'bar'" or "attribute 'id' must match datatype 'ID'", derived from the context of the failed pattern derivative. This granular feedback aids developers in correcting structural or content violations without needing to reparse the entire document. Due to RELAX NG's foundation in regular languages, the mechanism supports streaming validation, processing the document linearly in a single pass over its serialized form, which is particularly useful for large or incrementally generated XML streams.[5][27]
A practical workflow for validation uses tools like Jing, an open-source RELAX NG validator. For example, to check an XML instance file document.xml against a compact schema schema.rnc, the command jing -c schema.rnc document.xml is executed (the -c option tells Jing that the schema uses the compact syntax); if the document is valid, Jing exits silently with status 0, otherwise it writes detailed error messages to standard output and exits with status 1. This command-line approach integrates easily into build pipelines or scripts, handling both XML and compact syntax schemas.[28]
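In a build script, the Jing invocation can be wrapped in a small Python helper. The sketch assumes that a launcher named jing is available on the PATH (some installations instead run the jar through java) and that a schema whose filename ends in .rnc is in the compact syntax, for which Jing's -c option is passed; jing_command and validate are hypothetical helper names.

```python
import shutil
import subprocess

def jing_command(schema, document):
    """Build the argument list for a Jing run."""
    cmd = ["jing"]
    if schema.endswith(".rnc"):
        cmd.append("-c")  # -c tells Jing the schema uses the compact syntax
    return cmd + [schema, document]

def validate(schema, document):
    """Return (ok, output), or None if no jing executable is on the PATH."""
    if shutil.which("jing") is None:
        return None
    result = subprocess.run(jing_command(schema, document),
                            capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

print(jing_command("schema.rnc", "document.xml"))
print(jing_command("schema.rng", "document.xml"))
```

Separating command construction from execution keeps the part that depends on an installed tool small and lets the rest be checked without Jing present.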
Determinism and Ambiguity
RELAX NG schemas are inherently capable of expressing non-deterministic content models, where a single XML document may match a pattern in multiple distinct ways. This arises primarily from the use of choice patterns, which allow overlapping alternatives, and interleave patterns, which permit elements in arbitrary orders without enforcing a unique sequence. Unlike strictly deterministic systems, RELAX NG does not require schemas to guarantee a unique derivation for every valid document, prioritizing expressive power for modeling complex structures.[5]
In choice patterns, ambiguity occurs when multiple branches can consume the same input sequence. A choice matches whenever any of its branches matches, so the order in which the alternatives are written does not change the outcome; a derivative-based validator carries all viable alternatives forward together rather than committing to the first one. For example, a schema defining an element's content as a choice between two alternatives that both allow plain text accepts text content under either interpretation, without preferring one over the other. This approach simplifies implementation while allowing schemas that capture real-world variability, such as optional refinements in data formats.[29][27]
Interleave patterns introduce further non-determinism by allowing permutations of child elements, where the validator must in effect consider every possible ordering to confirm a match. Consider a pattern interleaving an "author" element and zero or more "keyword" elements; a document with "author" followed by "keyword" would match, as would "keyword" preceding "author". While this enables flexible modeling of unordered collections, it can result in exponential computational complexity for deeply nested interleaves, though practical implementations mitigate this via optimizations like derivative computation.[5][27]
The allowance for such ambiguity provides significant flexibility in schema design, facilitating the representation of intricate grammars that reflect natural data ambiguities, such as mixed-order metadata. However, it carries the risk of inconsistent behavior in applications, particularly if downstream processing assumes a unique match; schema authors must therefore take care to avoid unintended multiple interpretations. Detection algorithms exist to identify ambiguous grammars proactively, supporting robust schema maintenance.[30]
Comparisons
With W3C XML Schema
RELAX NG and W3C XML Schema were developed concurrently, and both reached specification status in 2001: XML Schema Part 1 on May 2 and the RELAX NG OASIS Committee Specification on December 3. Despite the shared timeline, RELAX NG emphasizes simplicity by avoiding XML Schema's intricate type hierarchies and derivation mechanisms, which can complicate schema authorship and maintenance. This design choice makes RELAX NG more concise and semantically straightforward for defining XML structures, prioritizing pattern-based expressions over XML Schema's component-oriented approach.[31]
Adoption patterns diverge significantly between the two. W3C XML Schema remains dominant in enterprise environments due to its extensive tool support and integration with industry standards, backed by major vendors.[32] In contrast, RELAX NG finds preference in document-centric publishing workflows, powering schemas for formats like DocBook, the Text Encoding Initiative (TEI) Guidelines, EPUB, and OpenDocument.[33][34] These applications leverage RELAX NG's readability and flexibility for content markup.
In terms of features, RELAX NG lacks XML Schema's substitution groups, which enable polymorphic element replacement based on a head element. However, it handles namespaces uniformly, treating them as integral to patterns without the wildcard complexities that can lead to validation ambiguities in XML Schema.[5] RELAX NG also offers strong modularity through the include and div elements, which allow schemas to reference and extend external modules for reusable vocabularies.[7][35]
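As a rough illustration of the include and div elements mentioned above, the following sketch embeds a small RELAX NG grammar in a Python string and inspects it with the standard library. The module name `inline.rng` and the definition names are hypothetical, and the included module is assumed to define `inline.class` and to contain no start pattern of its own. ElementTree only checks that the schema is well-formed XML; it does not validate it as RELAX NG.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Hypothetical driver schema: "inline.rng" and all define names are made up.
DRIVER = f"""\
<grammar xmlns="{RNG_NS}">
  <!-- Pull in an external module and override one of its definitions. -->
  <include href="inline.rng">
    <define name="inline.class">
      <choice>
        <ref name="emphasis"/>
        <ref name="link"/>
      </choice>
    </define>
  </include>

  <!-- div groups related definitions without changing their meaning. -->
  <div>
    <define name="emphasis">
      <element name="em"><text/></element>
    </define>
    <define name="link">
      <element name="a">
        <attribute name="href"/>
        <text/>
      </element>
    </define>
  </div>

  <start>
    <element name="para">
      <zeroOrMore><ref name="inline.class"/></zeroOrMore>
    </element>
  </start>
</grammar>
"""


def tag(name):
    """Qualified ElementTree tag for a RELAX NG element name."""
    return f"{{{RNG_NS}}}{name}"


root = ET.fromstring(DRIVER)
overridden = [d.get("name") for d in root.find(tag("include")).findall(tag("define"))]
grouped = [d.get("name") for d in root.find(tag("div")).findall(tag("define"))]
print(overridden)  # ['inline.class']
print(grouped)     # ['emphasis', 'link']
```

A define placed inside include replaces the definition of the same name from the included module, which is how a driver schema customizes a shared vocabulary without editing the module itself.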
As of 2025, RELAX NG maintains stability in legacy XML workflows, particularly within publishing tools like Oxygen XML Editor and XMLmind, where it supports validation and authoring for established document standards.[36][37] Its use has diminished in emerging JSON/XML hybrid environments, where XML Schema's broader ecosystem and adaptations for modern data interchange prevail.[38]
With DTDs
RELAX NG offers significant improvements over Document Type Definitions (DTDs) in XML validation by providing native support for XML namespaces, which DTDs handle through awkward prefix-based mechanisms that do not recognize namespace URIs as the primary identifiers.[5] In DTDs, namespace declarations must be mimicked using parameter entities or conditional sections, leading to brittle and non-portable schemas, whereas RELAX NG directly uses qualified names tied to namespace URIs for precise matching.[23] This namespace awareness enables RELAX NG to support extensible and modular schemas without the hacks required in DTDs, such as those needed for documents involving multiple vocabularies like XSLT or RDF.[5]
Another key advantage is RELAX NG's richer datatyping capabilities, which extend far beyond the limited options in DTDs, such as #PCDATA, CDATA, or enumerated tokens.[5] RELAX NG integrates external datatype libraries, including those from W3C XML Schema, allowing uniform specification of types like integers, strings with length constraints, or unions for both element content and attribute values.[23] This decoupling from fixed built-in types addresses DTDs' inability to express complex constraints, such as pattern matching or derivations, enabling more precise validation of structured data.[5]
RELAX NG also overcomes DTD limitations in modularity and pattern expressiveness by supporting clean includes and composable patterns that resemble regular expressions.[5] Unlike DTDs, which rely on parameter entities for reuse but impose ordering constraints and flattening requirements, RELAX NG allows definitions to be included and overridden in any order within grammars, facilitating maintainable, context-dependent content models.[5] Its patterns support interleaving for unordered content and a choice operator under which patterns are closed under union, providing regex-like flexibility for sequences, repetitions, and alternatives that DTDs' rigid sequence (,) and choice (|) operators cannot fully capture without ambiguity or loss of precision.[23]
Conversion between RELAX NG and DTDs works reliably in one direction only: DTDs can be automatically translated to RELAX NG schemas, but the reverse is often lossy because RELAX NG features such as full interleave semantics or namespace-independent name classes have no direct equivalents in DTD syntax.[5] The RELAX NG DTD Compatibility specification addresses some gaps by allowing annotations for DTD-specific behaviors like attribute defaults or ID/IDREF validation, but core RELAX NG validation remains focused on schema constraints without modifying the document infoset.[26]
In practice, DTDs persist primarily for legacy XML systems where simplicity and broad tool support are prioritized, while RELAX NG is preferred for developing precise, maintainable schemas in modern structured documents, such as those in publishing or data exchange standards.[5] This shift underscores RELAX NG's role in evolving beyond DTDs' foundational but restrictive paradigm, much like how it complements W3C XML Schema in namespace-aware validation.[23]
Implementations
Tools and Validators
Jing, developed by James Clark in 2002, serves as the primary RELAX NG validator, implemented in Java and supporting both XML and compact syntaxes through its command-line interface and Java API for programmatic integration.[39] It enables efficient validation of XML documents against RELAX NG schemas, with features including pluggable datatype libraries and compatibility with SAX2 parsers for streaming validation.[39]
For schema authoring and editing, Oxygen XML Editor provides comprehensive RELAX NG support, including a visual schema editor, content completion assistant, and specialized views for elements, attributes, and constraints, alongside validation for both syntaxes.[40] Similarly, Altova XMLSpy offers RELAX NG validation capabilities through its associated command-line tool AltovaXML, facilitating schema checking in development workflows, though primary editing focuses on graphical views for related schema languages.
RELAX NG validation is integrated into libraries such as Apache Xerces, where configurations like ManekiNeko enable parser-based schema enforcement during XML processing.[3] In the .NET ecosystem, libraries like dotnet-dsdl provide RELAX NG validation readers, allowing programmatic checks against schemas in C# applications via XmlReader extensions.[41]
As of 2025, RELAX NG tools like Jing are incorporated into IDEs such as IntelliJ IDEA through plugins that offer on-the-fly validation, error checking, and content completion for XML projects using RELAX NG schemas.[42] Extensions for CI/CD pipelines, including GitHub Actions and Jenkins, leverage Jing's command-line interface to automate schema validation in build processes, ensuring document compliance during continuous integration.[28] Converters like Trang complement these tools by enabling format translations for broader interoperability.[3]
Other notable validators include the Multi-Schema Validator (MSV), a Java library for validating against RELAX NG and other schema languages, and librelaxng, a C implementation for integrating RELAX NG validation into C-based applications.[43]
Converters and Integrations
One prominent converter for RELAX NG schemas is Trang, developed by James Clark, which facilitates interconversion between RELAX NG (in both XML and compact syntaxes) and other schema languages such as DTDs and W3C XML Schema.[44] Trang processes schemas by first converting them to an intermediate RELAX NG object model, applying necessary transformations, and then generating the target format, enabling seamless integration into environments that rely on alternative schema standards.[45] For specific use cases, custom tools have been developed to generate parsers from RELAX NG schemas, such as ANTLR grammars for XKB configuration files.[46]
Integrations with Schematron enhance RELAX NG by embedding rule-based constraints for co-occurrence and semantic validations that go beyond structural patterns, often achieved through annotations in the RELAX NG XML syntax or direct embedding of Schematron rules within schema elements.[47] This combination is particularly useful for enforcing business rules in complex documents, where RELAX NG handles content models and Schematron addresses interdependencies.[48]
In broader XML ecosystems, RELAX NG integrates into pipelines via standards like XProc, which includes a dedicated p:validate-with-relax-ng step to apply schema validation within processing workflows, ensuring document conformance during transformations or data flows.[49] Similarly, in publishing tools, DocBook leverages RELAX NG schemas for vocabulary definition, with XSL stylesheets incorporating validation extensions to enforce schema rules during document processing and output generation.[50] These integrations promote RELAX NG's use in modular XML authoring and automated workflows.[33]
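The Trang conversions described above are usually run from the command line, and a script can assemble the invocation. The sketch below only builds the argument list. The jar location `trang.jar`, the file names, and the helper `trang_command` are assumptions made for illustration, while the `-I` and `-O` format options follow Trang's documented usage.

```python
from pathlib import Path

# Formats Trang accepts on each side, per its manual.
INPUT_FORMATS = {"rng", "rnc", "dtd", "xml"}
OUTPUT_FORMATS = {"rng", "rnc", "dtd", "xsd"}


def trang_command(src, dst, jar="trang.jar"):
    """Build a Trang invocation, taking the formats from the file extensions."""
    in_fmt = Path(src).suffix.lstrip(".").lower()
    out_fmt = Path(dst).suffix.lstrip(".").lower()
    if in_fmt not in INPUT_FORMATS:
        raise ValueError(f"unsupported input format: {in_fmt!r}")
    if out_fmt not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {out_fmt!r}")
    return ["java", "-jar", str(jar), "-I", in_fmt, "-O", out_fmt, str(src), str(dst)]


# Hypothetical file names: convert a compact-syntax schema to the XML syntax.
cmd = trang_command("book.rnc", "book.rng")
print(" ".join(cmd))
# prints: java -jar trang.jar -I rnc -O rng book.rnc book.rng

# To actually run the conversion (requires Java and the Trang jar):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Trang can also infer the formats from the file extensions on its own; passing them explicitly here just makes the direction of the conversion visible and lets the script reject unsupported combinations, such as W3C XML Schema as an input, before starting Java.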
RELAX NG schemas conventionally use the .rng filename extension for the XML syntax and .rnc for the compact syntax, though these are not strictly enforced and serve primarily as identification aids in tools like converters and editors.[51]