RELAX NG

from Wikipedia

RELAX NG
Filename extension	.rng
Internet media type	application/xml, text/xml
Type of format	XML schema language
Extended from	XML

In computing, RELAX NG (REgular LAnguage for XML Next Generation) is a schema language for XML. A RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema is itself an XML document, but RELAX NG also offers a popular compact, non-XML syntax.^[1] Compared to other XML schema languages, RELAX NG is considered relatively simple.

It was defined by a committee specification of the OASIS RELAX NG technical committee in 2001 and 2002, based on Murata Makoto's RELAX and James Clark's TREX,^[2]^[3]^[4] and also by part two of the international standard ISO/IEC 19757: Document Schema Definition Languages (DSDL).^[5]^[6] ISO/IEC 19757-2 was developed by ISO/IEC JTC 1/SC 34 and published in its first version in 2003.^[7]

Comparison with W3C XML Schema

Although the RELAX NG specification was developed at roughly the same time as the W3C XML Schema specification, the latter was arguably better known and more widely implemented in both open-source and proprietary XML parsers and editors when it became a W3C Recommendation in 2001. Since then, however, RELAX NG support has increasingly found its way into XML software, and its acceptance has been aided by its adoption as a primary schema for popular document-centric markup languages such as DocBook, the TEI Guidelines, OpenDocument, and EPUB.

RELAX NG shares many features with W3C XML Schema that set both apart from traditional DTDs: data typing, regular expression support, namespace support, and the ability to reference complex definitions.

Filename extensions

By informal convention, RELAX NG schemas in the regular syntax are typically named with the filename extension ".rng". For schemas in the compact syntax, the extension ".rnc" is used.

Determinism

RELAX NG schemas are not required to be "deterministic" or "unambiguous".

Converting RELAX NG to DTD

RELAX NG schemas can be converted to DTDs with Trang, a schema converter that is distributed, together with its manual, as part of the jing-trang project. Note that Trang is unable to convert the OASIS DITA 1.3 schema to DTDs, failing with messages like:

 sorry, combining definitions with combine="choice" is not supported

References

1. RELAX NG Compact Syntax.
2. James Clark. "TREX - Tree Regular Expressions for XML". Quote: "TREX has been merged with RELAX to create RELAX NG." Retrieved 2009-12-28.
3. Murata Makoto (2002-04-03). "RELAX (Regular Language description for XML)". Quote: "RELAX NG of OASIS. It is a schema language created by unifying RELAX Core and TREX." Retrieved 2009-12-28.
4. "TREX and RELAX Unified as RELAX NG, a Lightweight XML Language Validation Specification". Cover Pages. 2001-06-05. Retrieved 2009-12-28.
5. RELAX NG Specification.
6. RELAX NG Technical Committee.
7. ISO. "ISO/IEC 19757-2:2008 - Information technology -- Document Schema Definition Language (DSDL) -- Part 2: Regular-grammar-based validation -- RELAX NG". Retrieved 2009-12-28.

External links

RELAX NG home page
"The Design of RELAX NG" by James Clark
RELAX NG tutorial for the XML syntax
RELAX NG tutorial for the compact syntax
Design patterns for structuring XML documents
RELAX NG Book by Eric van der Vlist, released under the GNU Free Documentation License
Relax NG Reference by ZVON
RELAX NG Java community projects at java.net
Sun Multi-Schema Validator (MSV) open-source Java XML toolkit
Relax NG Compact Syntax validator open-source C program
XSD to Relax NG Converter Web-based converter
https://github.com/relaxng/jing-trang

from Grokipedia

RELAX NG is a schema language for XML that specifies patterns for the structure, content, and data types of XML documents, so that XML instances can be validated against a defined grammar.^[1] It supports two equivalent syntaxes: a full XML syntax that works with standard XML tools, and a compact, non-XML syntax optimized for readability and conciseness.^[2] Developed as a unification of the TREX and RELAX Core proposals, RELAX NG emphasizes simplicity, modularity, and internationalizability while remaining compatible with W3C XML Schema datatypes.^[3]^[1]

The language was created by the OASIS RELAX NG Technical Committee, with committee specifications published in 2001 and 2002.^[4] It became an international standard as ISO/IEC 19757-2:2003 (Information technology -- Document Schema Definition Languages (DSDL) -- Part 2: Regular-grammar-based validation -- RELAX NG), with a second edition in 2008 that incorporated minor revisions.^[3]

RELAX NG's design prioritizes ease of use for both authors and implementers, supporting features like named patterns for reuse, context-dependent content models, and extensibility without requiring schema modifications.^[5] It is used to define document vocabularies such as the Citation Style Language, and the XSLT 3.0 specification includes a RELAX NG schema for stylesheets.^[3] As part of the broader Document Schema Definition Languages (DSDL) framework, RELAX NG complements other schema technologies by focusing on declarative, pattern-based validation rather than complex type hierarchies.^[1]

Overview

Definition and Purpose

RELAX NG, an acronym for REgular LAnguage for XML Next Generation, is a schema language for defining patterns that describe the structure, content, and valid sequences of elements and attributes in XML documents.^[6]^[7] Its primary purpose is to validate XML instances against such schemas, which ensures conformance to predefined rules, helps maintain data integrity, and promotes interoperability across XML-based applications and systems.^[7]^[3] A RELAX NG schema can be written either as an XML document or in a non-XML compact syntax, and validating an instance against it leaves the instance's XML information set unmodified.^[3]^[8] RELAX NG emerged as an alternative to other schema languages for XML, prioritizing simplicity and ease of use over the more intricate features of W3C XML Schema while offering more expressive power than DTDs.^[5]

Key Features

RELAX NG supports two syntaxes for authoring schemas: an XML-based syntax, typically stored in files with the .rng extension, and a compact, non-XML syntax, stored in files with the .rnc extension. Authors can choose between a structured XML representation and a more concise, grammar-like notation that is easier to read and equally expressive.^[8]

The schema language can use the W3C XML Schema datatypes, so content can be validated against built-in types such as string and integer, optionally restricted by facets, through the <data> element together with the datatypeLibrary attribute that identifies the datatype library.^[9] This lets RELAX NG rely on an existing datatype system rather than redefining one, and concentrate on structural patterns.

RELAX NG has a namespace-friendly design: names can be qualified with a namespace URI or written as prefixed names, and matching always relies on the namespace URI rather than the prefix, which avoids prefix conflicts during schema interpretation and validation.^[9]

Unlike some schema languages that enforce strict determinism, RELAX NG permits ambiguous grammars in which more than one pattern may match a given piece of content. An instance is valid if any match exists, so authors are not required to rewrite a grammar into an unambiguous form.^[9]

RELAX NG documents in XML syntax are identified by the internet media types application/xml and text/xml, ensuring compatibility with standard XML processing tools and parsers.^[1]

The language excels at expressing complex constraints, such as co-occurrence rules between attributes and elements (for instance, requiring a specific attribute only when certain child elements are present), through constructs like <choice>, <group>, and <interleave> that model interdependencies flexibly.^[9] This capability stems from RELAX NG's design goal of simplicity, which prioritizes intuitive pattern composition over rigid ordering requirements.^[10]
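As an illustration of a co-occurrence constraint, the following sketch in the XML syntax combines <choice> and <group> so that each permitted attribute is tied to a particular content model. The image element and the src, encoding, and data names are hypothetical, chosen only for this example.

```xml
<element name="image" xmlns="http://relaxng.org/ns/structure/1.0">
  <choice>
    <!-- Variant 1: an external image, identified by a "src" attribute, with no content -->
    <group>
      <attribute name="src"/>
      <empty/>
    </group>
    <!-- Variant 2: an inline image; "encoding" is required together with a "data" child -->
    <group>
      <attribute name="encoding"/>
      <element name="data">
        <text/>
      </element>
    </group>
  </choice>
</element>
```

Under this schema, <image src="a.png"/> and <image encoding="base64"><data>...</data></image> would both be accepted, while an image carrying src together with a data child would match neither branch and be rejected.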

History

Development Origins

RELAX NG originated from the convergence of two independent efforts to create more expressive and flexible schema languages for XML documents. In early 2000, Makoto Murata developed RELAX (Regular Language description for XML), a schema language that leveraged regular language theory to describe XML structures, initially published as a specification on February 24, 2000, and later approved as an ISO/IEC Technical Report (TR 22250-1) in May 2001.^[11]^[12] Independently, in January 2001, James Clark introduced TREX (Tree Regular Expressions for XML), a compact schema language based on tree automata that emphasized simplicity and the use of W3C XML Schema datatypes.^[13]^[14] These prototypes addressed key shortcomings in existing XML validation mechanisms, particularly the Document Type Definitions (DTDs) introduced with XML 1.0, which lacked support for XML namespaces and offered limited expressiveness for complex content models.^[5]

The unification of RELAX and TREX began in March 2001 when OASIS formed the TREX Technical Committee, chaired by James Clark, to standardize TREX as an XML validation language.^[15] By June 2001, recognizing the synergies between the two approaches, OASIS merged the TREX committee's work with the efforts around Murata's RELAX and renamed the committee the RELAX NG Technical Committee, with the aim of developing a single, lightweight schema language.^[16] Clark continued as chair, with Murata as a key contributor, aiming to create a "next generation" schema language that combined RELAX's modularity with TREX's streamlined syntax while avoiding the anticipated verbosity and complexity of the emerging W3C XML Schema recommendation.^[17] The motivation was a schema language that was easier to author and maintain than DTDs, without their lack of namespace awareness and their rigid attribute handling, and simpler than XML Schema with its type hierarchy and particle constraints.^[5]

Early development involved iterative prototypes and discussions within the committee, building on the theoretical foundations of hedge automata from RELAX and the practical validation algorithms from TREX. The first working draft of the RELAX NG specification was released on September 17, 2001, marking the initial formal outline of the unified language.^[18] This draft incorporated core patterns from both predecessors, focusing on composable grammars to enable reusable schema modules, and set the stage for further refinement toward an OASIS committee specification.^[17]

Standardization Process

The standardization of RELAX NG began with its approval as an OASIS Committee Specification. On December 3, 2001, the OASIS RELAX NG Technical Committee approved the RELAX NG specification, which defined the XML syntax for the schema language. This marked the formal consolidation of earlier efforts into a unified standard for XML schema definition. Subsequently, the compact syntax was developed to provide a more concise, non-XML alternative for authoring schemas. On November 21, 2002, the OASIS RELAX NG Technical Committee approved the RELAX NG Compact Syntax as a Committee Specification, enabling easier human readability and editing while maintaining equivalence to the XML syntax.^[8] RELAX NG was incorporated into the broader Document Schema Definition Languages (DSDL) framework, which coordinates multiple schema languages for comprehensive XML validation. As Part 2 of DSDL, RELAX NG focuses on regular-grammar-based validation, integrating with other parts like Schematron for rule-based assertions.^[19] The international standardization process advanced through ISO/IEC JTC 1/SC 34. In December 2003, RELAX NG was published as ISO/IEC 19757-2:2003, titled Information technology — Document Schema Definition Languages (DSDL) — Part 2: RELAX NG, establishing it as a formal international standard for XML schema patterns based on regular tree grammars.^[19] An amendment in 2006 (ISO/IEC 19757-2:2003/Amd 1:2006) added explicit support for the compact syntax.^[20] The standard underwent a minor revision in December 2008 with the second edition (ISO/IEC 19757-2:2008), which incorporated the 2006 amendment and provided clarifications on schema requirements and validation rules without altering core functionality.^[21] The 2008 edition remains the current version of RELAX NG Version 1, with the standard last reviewed and confirmed in 2024. A Version 2 is under development by ISO/IEC JTC1/SC34/WG1 to address evolving needs in XML processing.^[3]^[21]

Design Principles

Simplicity and Ease of Use

RELAX NG was designed with a primary goal of minimizing complexity to enhance usability for schema authors and implementers. Its pattern-based approach draws inspiration from context-free grammars, allowing schemas to define XML structures in a way that feels intuitive, particularly for those accustomed to regular expressions or formal language theory. By focusing on declarative patterns rather than intricate type hierarchies or inheritance mechanisms, RELAX NG avoids unnecessary abstractions that could complicate schema authoring. This design choice ensures that schemas can directly mirror the hierarchical structure of the target XML documents without requiring additional flattening or restructuring steps.^[5]^[22]

A key aspect of RELAX NG's simplicity lies in its validation process, which operates directly on the input XML without altering the document's infoset. Unlike certain schema languages that mandate preprocessing, such as inserting default attributes or handling nilled elements, RELAX NG performs validation as a pure matching exercise against defined patterns. This approach eliminates the need for schema processors to modify the original document, reducing potential errors and simplifying integration into XML pipelines. As a result, validation remains predictable and non-intrusive, preserving the integrity of the input while confirming conformance.^[5]

The learning curve for RELAX NG is notably gentle, as it supports straightforward pattern matching without the verbose declarations often required in alternatives like XML Schema. Developers can express constraints using concise constructs that prioritize clarity over boilerplate, such as defining sequences and choices in a linear, grammar-like fashion. This reduces the overhead of schema creation and maintenance, making it accessible even to users without deep expertise in schema technologies. The language's uniform treatment of elements and attributes further streamlines comprehension, as patterns apply consistently across content models.^[5]

Readability is further enhanced in RELAX NG's compact syntax, which employs familiar delimiters like parentheses for grouping subpatterns and vertical bars for specifying alternatives, evoking the notation of regular expressions. This syntax option allows authors to write schemas in a textual format that closely resembles the informal sketches often used in documentation or prototyping. By leveraging such intuitive symbols, RELAX NG facilitates quick iteration and review, contributing to its overall ease of use in practical development workflows.^[22]^[23]

Modularity and Namespace Handling

RELAX NG supports modularity through the include and div elements, which let a schema reuse definitions kept in separate files. The include element uses an href attribute to merge an external grammar into the current one, effectively inlining its definitions, and it can override specific definitions with nested define elements. For instance, a schema might pull in a module of common inline elements with <include href="inline.rng"/>, which adds reusable patterns without duplicating them. The div element groups related definitions within a grammar, as in <div><define name="header">...</define></div>, which helps with modular organization and annotation without affecting semantics.^[7]

The externalRef element lets a schema use the pattern defined in an external file at the point of reference. Its href attribute identifies another schema document, and the pattern of that document as a whole (not an individual named definition inside it) is matched in place of the externalRef. The external schema keeps its own definitions and namespace context, which avoids resolution conflicts with the referencing schema. An example is <externalRef href="common.rng"/> within an element pattern, which integrates a reusable structure into the host schema. This approach promotes reuse by treating external patterns as black boxes, with validation occurring as if they were inline.^[7]

RELAX NG integrates namespaces directly through xmlns declarations in schema documents, enabling the use of qualified names for elements and attributes without complications from default namespace inheritance. Schemas declare namespaces via attributes like xmlns:prefix="URI", and the ns attribute on grammar elements specifies the default namespace URI for contained names, such as ns="http://example.com/doc". This allows precise qualification, for example <element name="ex:doc" xmlns:ex="http://example.com/doc">, ensuring that local names are bound correctly regardless of surrounding contexts and avoiding issues where unqualified names might inadvertently fall into unintended namespaces. Attributes belong to the empty namespace by default unless explicitly qualified.^[7]

Co-occurrence constraints in RELAX NG permit the definition of interdependencies across modules, such as requiring specific attribute-element pairings, through compositors like interleave and choice applied to imported patterns. For example, a module might define a card element that interleaves a name attribute with an email child element, ensuring they co-occur appropriately when referenced via externalRef or include: <interleave><attribute name="name"/><element name="email">...</element></interleave>. This mechanism enforces relational rules between components from different modules during validation, maintaining consistency in composed schemas without needing monolithic definitions.^[7]
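The following sketch shows how an include with a nested define might override one definition of an included module. It assumes, purely for illustration, that inline.rng is a grammar that defines the patterns inline and inline.extra and contains no start element of its own; the para and code element names are likewise hypothetical.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Merge the definitions of inline.rng into this grammar -->
  <include href="inline.rng">
    <!-- Replaces the definition of "inline.extra" found in inline.rng
         (assumed to exist there; overriding a missing definition is an error) -->
    <define name="inline.extra">
      <element name="code">
        <text/>
      </element>
    </define>
  </include>

  <start>
    <element name="para">
      <!-- Text freely mixed with whatever the included "inline" pattern allows -->
      <zeroOrMore>
        <choice>
          <text/>
          <ref name="inline"/>
        </choice>
      </zeroOrMore>
    </element>
  </start>
</grammar>
```

Keeping the override inside the include element makes it explicit which module's definition is being replaced, while the rest of the module is reused unchanged.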

Syntax

XML Syntax

RELAX NG schemas in XML syntax are themselves valid XML documents that adhere to the RELAX NG namespace http://relaxng.org/ns/structure/1.0.^[1] This syntax enables the use of standard XML tools for parsing, editing, and processing schemas. The root of the schema can be a single pattern, such as an <element>, or a <grammar> element for more modular definitions containing named patterns via <define> elements.^[7] Key elements in the XML syntax include <element> for defining element patterns, <attribute> for attribute patterns, and <text/> for matching text content. The <grammar> element, when used, serves as a container that includes a <start> element to designate the initial matching pattern and <define> elements to create reusable named patterns referenced by <ref>. This structure allows for hierarchical and modular schema composition while maintaining XML conformance.^[1] A simple example of an XML syntax schema defines a book element with required title and author child elements, each containing text:

```xml
<element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="title">
    <text/>
  </element>
  <element name="author">
    <text/>
  </element>
</element>
```

This schema validates XML instances like <book><title>Foundation</title><author>Isaac Asimov</author></book>, ensuring the specified structure and content types.^[7] Namespaces are declared using standard XML xmlns attributes on the schema document or elements, with the RELAX NG namespace typically bound to a default or prefix like rng. For patterns matching namespaced elements or attributes, the ns attribute specifies the target namespace URI on <element> or <attribute>, such as <element name="book" ns="http://example.com/book">. This approach integrates seamlessly with XML's namespace mechanisms without requiring additional scoping elements within the <grammar>.^[7] The XML syntax is notably verbose compared to alternatives but excels in machine readability, making it ideal for automated schema generation by tools and integration into XML processing pipelines.
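The <grammar>, <start>, <define>, and <ref> elements described in this section can be combined as in the following sketch, which restates the book example using named patterns. The definition names (book, title, author) are arbitrary labels chosen for illustration.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Validation begins with the pattern named in <start> -->
  <start>
    <ref name="book"/>
  </start>

  <!-- A book is a title followed by an author -->
  <define name="book">
    <element name="book">
      <ref name="title"/>
      <ref name="author"/>
    </element>
  </define>

  <define name="title">
    <element name="title">
      <text/>
    </element>
  </define>

  <define name="author">
    <element name="author">
      <text/>
    </element>
  </define>
</grammar>
```

This form accepts the same documents as the single-pattern version, but each named definition can be referenced from several places or overridden by another module.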

Compact Syntax

The RELAX NG compact syntax provides a non-XML, grammar-like notation for defining schemas, designed to enhance readability while preserving full equivalence to the XML syntax.^[8] It employs keywords such as element for defining elements and attribute for attributes, along with delimiters including {} for grouping content within named constructs, , for sequences, | for choices, & for interleaving, and repetition operators like * for zero or more occurrences, + for one or more, and ? for optional patterns.^[8] These elements map directly to corresponding XML syntax constructs, such as element nameClass { pattern } translating to <element name="nameClass"><pattern/></element>, ensuring lossless bidirectional conversion between the two forms.^[8]^[23] For instance, a simple schema for a book element with required title and author subelements can be expressed as:

element book { element title { text }, element author { text } }

This notation specifies that a valid document must have a book root element containing exactly one title and one author, each holding text content, demonstrating the compact syntax's concise, declarative style akin to a context-free grammar.^[23] The advantages of the compact syntax include maximized human readability, which facilitates manual editing and comprehension without the verbosity of XML tags, while supporting all RELAX NG features such as namespaces, datatypes, and modularity through mechanisms like named patterns and includes.^[8] It is particularly beneficial for schema authors preferring a programming-language-like format over XML's markup-heavy structure, and schemas in this form can be independently translated to XML syntax without introducing dependencies or loss of information.^[8]^[23] Annotations and comments further enhance usability in the compact syntax. Annotations are enclosed in square brackets [ ] and placed before patterns, allowing embedding of foreign elements or attributes for metadata, which map to XML annotations like <db:annotation> in DocBook-integrated schemas.^[8] Comments begin with # and extend to the line end, while ## specifically denotes documentation comments that integrate with annotation mechanisms for schema documentation.^[8]^[23]
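To make the correspondence between the two syntaxes concrete, the sketch below shows a small compact-syntax pattern (inside the comment) next to an XML-syntax rendering of it. The addressBook, card, id, name, and email names are hypothetical and serve only to exercise the ?, &, +, and * operators.

```xml
<!-- Compact form, for comparison:
     element addressBook {
       element card {
         attribute id { text }?,
         (element name { text } & element email { text }+)
       }*
     }
-->
<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>                       <!-- trailing * -->
    <element name="card">
      <optional>                     <!-- trailing ? -->
        <attribute name="id">
          <text/>
        </attribute>
      </optional>
      <interleave>                   <!-- & : children in any order -->
        <element name="name">
          <text/>
        </element>
        <oneOrMore>                  <!-- trailing + -->
          <element name="email">
            <text/>
          </element>
        </oneOrMore>
      </interleave>
    </element>
  </zeroOrMore>
</element>
```

The comma in the compact form needs no explicit counterpart here, because the children of an element pattern in the XML syntax are treated as a sequence by default.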

Core Concepts

Patterns and Elements

In RELAX NG, patterns serve as the fundamental building blocks for specifying the structure of XML documents, defining what constitutes a valid tree of elements, attributes, and text content.^[1] These patterns are composable, allowing schema authors to construct complex structures from simpler ones through operators that model sequences, alternatives, and repetitions.^[5] By treating elements and attributes uniformly within content models, RELAX NG patterns enable a declarative approach that mirrors the hierarchical nature of XML without imposing rigid ordering constraints where unnecessary.^[5]

The core atomic patterns include text, empty, and notAllowed, which handle textual content and constraints at the leaf level. The text pattern matches any sequence of text characters, including whitespace or empty strings, and is used to specify locations where arbitrary character data is permitted.^[24] In contrast, the empty pattern matches only an absence of content, ensuring no text or child elements are present in that position.^[8] The notAllowed pattern never matches anything, so it can be used to explicitly prohibit content in a given context or to disable a definition during schema composition.^[1]

Element and attribute declarations are primary patterns for defining named nodes in the XML tree. The element pattern specifies an XML element with a given name (or name class) and an inner pattern that constrains its content; for instance, in compact syntax, element foo { text } requires a <foo> element containing only text.^[8] Similarly, the attribute pattern declares an XML attribute with a name and inner pattern, such as attribute name { text }, which mandates a name attribute holding text on its parent element.^[24] These declarations can nest patterns recursively to capture document hierarchies.

Patterns combine via operators to express relationships between components. The sequence operator, denoted by a comma (,) in compact syntax or a group element in XML syntax, enforces ordered concatenation of subpatterns, as in element header { element title { text }, element author { text } }, which requires a <header> containing a <title> followed immediately by an <author>.^[8] The choice operator, represented by a vertical bar (|) or a choice element, allows one of several alternatives, enabling flexible structures like element greeting { "hello" | "hi" } for selectable fixed content.^[24]

Repetition modifiers extend patterns to handle multiplicity without verbose recursion. The optional modifier (?) permits zero or one occurrence of a pattern, such as element optionalNote { text }? to allow but not require a note.^[8] The one-or-more modifier (+) requires at least one repetition, while zero-or-more (*) allows any number including none; for example, element items { element item { text }+ } defines an <items> element containing one or more <item> subelements, each with text, modeling a non-empty list structure.^[1] These operators apply uniformly to elements, attributes, and other patterns, promoting modularity in schema design.^[5]

Data Types and Attributes

RELAX NG integrates data types primarily through external libraries, with the W3C XML Schema Datatypes serving as the recommended standard for defining and constraining textual content within patterns.^[7] In the XML syntax, the <data> element names a datatype in its type attribute, and the datatypeLibrary attribute identifies the library; the XML Schema datatypes are selected with the URI http://www.w3.org/2001/XMLSchema-datatypes. When no library is specified, only the built-in string and token types are available. In the compact syntax, the XML Schema datatypes are written with the xsd prefix, as in xsd:integer.^[25] Facets, such as length restrictions or value ranges, are applied using <param> children in XML syntax or a braced parameter list in compact syntax, enabling precise control over content validity without embedding full schema complexity.^[7]

Attributes in RELAX NG are defined using the <attribute> pattern, which specifies the attribute's name and optional content constraints, including data types from the integrated libraries.^[1] An attribute pattern is required wherever it appears unless it is wrapped in an optional pattern (<optional> in XML syntax, a trailing ? in compact syntax), and the order of attributes is insignificant, unlike that of elements.^[7] For typed attributes, the pattern can incorporate a <data> child. In the compact syntax, an untyped attribute is written:

attribute name { text }

or with typing:

attribute age { xsd:integer }

This ensures the attribute value conforms to the specified datatype, such as an integer for age.^[7] Data types extend to element content as well, allowing constraints like numeric ranges within simple elements, as shown in this compact syntax example for a person element:

element person { attribute name { text }, element age { xsd:integer { minInclusive = "0" maxInclusive = "150" } } }

Here, the age element's text content must be an integer between 0 and 150, leveraging XML Schema facets for validation.^[7] This approach maintains RELAX NG's focus on pattern-based typing while borrowing robust primitive and derived types from XML Schema.

For linking and uniqueness constraints akin to ID/IDREF in XML DTDs, RELAX NG relies on the companion RELAX NG DTD Compatibility specification, which introduces specialized datatypes from the library http://relaxng.org/ns/compatibility/datatypes/1.0.^[26] These include ID for unique identifiers, IDREF for single references, and IDREFS for lists of references, applied within attribute patterns to enforce uniqueness and referential integrity within a document.^[26] Validation processors supporting this compatibility layer check that all IDREF values match existing IDs and normalize whitespace in lists, updating the infoset accordingly without altering RELAX NG's core schema language.^[26] This modular extension preserves RELAX NG's simplicity while providing key constraints via datatype libraries.
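For comparison, the person example above might be written in the XML syntax roughly as follows. This is a sketch rather than an excerpt from any published schema: the datatypeLibrary attribute on the root selects the XML Schema datatypes for every <data> element beneath it, and each facet becomes a <param> child.

```xml
<element name="person"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- An attribute pattern with no child pattern accepts any text value -->
  <attribute name="name"/>
  <element name="age">
    <!-- Integer content restricted to the range 0..150 by two facets -->
    <data type="integer">
      <param name="minInclusive">0</param>
      <param name="maxInclusive">150</param>
    </data>
  </element>
</element>
```

Declaring datatypeLibrary once on an ancestor avoids repeating the library URI on each <data> element.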

Validation

Validation Mechanism

Validating an XML document against a RELAX NG schema takes two inputs: the schema, which defines the expected structure and content patterns, and the instance document to be checked. The schema is first loaded and compiled into an internal representation, typically a set of patterns forming a grammar based on regular tree languages. This compilation resolves references and simplifies the patterns into a form suitable for matching, enabling efficient processing without requiring the entire document to be built in memory upfront.^[27]

Validation commences at the document's root element, where the schema's start pattern (often a top-level element pattern) is applied. The process recursively traverses the document's tree structure, computing "derivatives" of the current pattern with respect to each encountered node, such as start tags, attributes, text content, or child elements. The derivative of a pattern p with respect to a node x is a new pattern that matches whatever may legally follow once x has been consumed, so matching advances one step at a time. This derivative-based approach, grounded in finite tree automata theory, lets validation proceed without backtracking, because the derivative carries every alternative that is still possible forward while descending into child patterns and verifying attributes, data types, and sequences. If a mismatch occurs, such as an unexpected element, an incorrect attribute value, or a violation of required cardinality, the derivative becomes the notAllowed pattern, which pinpoints where the sequence or type constraint failed.^[27]

Error reporting focuses on actionable diagnostics, with messages along the lines of "expected element 'foo' but found 'bar'" or "attribute 'id' must match datatype 'ID'", derived from the context of the failed pattern derivative. This granular feedback aids developers in correcting structural or content violations without needing to reparse the entire document. Because RELAX NG is founded on regular tree languages, the mechanism supports streaming validation, processing the document linearly in a single pass over its serialized form, which is particularly useful for large or incrementally generated XML streams.^[5]^[27]

A practical workflow for validation uses tools like Jing, an open-source RELAX NG validator. For example, to check an XML instance file document.xml against a compact schema schema.rnc, the command jing -c schema.rnc document.xml is executed (the -c option tells Jing that the schema uses the compact syntax); if the document is valid, Jing exits silently with status 0, otherwise it writes detailed error messages to standard output and exits with status 1. This command-line approach integrates easily into build pipelines or scripts, handling both XML and compact syntax schemas.^[28]
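The derivative idea can be made concrete with a deliberately small Python sketch. It covers only text, elements, groups, choices, and repetition, and ignores attributes, namespaces, datatypes, and whitespace rules. Unlike a streaming validator, it checks each child element's content recursively before moving on. All class and function names are invented for this illustration.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


class Pattern:
    pass

@dataclass(frozen=True)
class Empty(Pattern):
    pass

@dataclass(frozen=True)
class NotAllowed(Pattern):
    pass

@dataclass(frozen=True)
class Text(Pattern):
    pass

@dataclass(frozen=True)
class Element(Pattern):
    name: str
    content: Pattern

@dataclass(frozen=True)
class Group(Pattern):
    first: Pattern
    second: Pattern

@dataclass(frozen=True)
class Choice(Pattern):
    left: Pattern
    right: Pattern

@dataclass(frozen=True)
class OneOrMore(Pattern):
    body: Pattern


def choice(a, b):
    # notAllowed is the identity for choice.
    if isinstance(a, NotAllowed):
        return b
    if isinstance(b, NotAllowed):
        return a
    return Choice(a, b)


def group(a, b):
    # notAllowed absorbs a group; empty is its identity.
    if isinstance(a, NotAllowed) or isinstance(b, NotAllowed):
        return NotAllowed()
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    return Group(a, b)


def nullable(p):
    """True if p matches the empty sequence of children."""
    if isinstance(p, (Empty, Text)):
        return True
    if isinstance(p, Group):
        return nullable(p.first) and nullable(p.second)
    if isinstance(p, Choice):
        return nullable(p.left) or nullable(p.right)
    if isinstance(p, OneOrMore):
        return nullable(p.body)
    return False  # Element and NotAllowed


def children(elem):
    """Child elements and non-blank text chunks, in document order."""
    items = [elem.text] if elem.text and elem.text.strip() else []
    for child in elem:
        items.append(child)
        if child.tail and child.tail.strip():
            items.append(child.tail)
    return items


def deriv(p, item):
    """Pattern that must match whatever follows item."""
    if isinstance(p, Choice):
        return choice(deriv(p.left, item), deriv(p.right, item))
    if isinstance(p, Group):
        d = group(deriv(p.first, item), p.second)
        if nullable(p.first):
            d = choice(d, deriv(p.second, item))
        return d
    if isinstance(p, OneOrMore):
        return group(deriv(p.body, item), choice(p, Empty()))
    if isinstance(p, Text):
        return p if isinstance(item, str) else NotAllowed()
    if isinstance(p, Element):
        ok = (not isinstance(item, str) and item.tag == p.name
              and matches(p.content, item))
        return Empty() if ok else NotAllowed()
    return NotAllowed()  # Empty and NotAllowed accept nothing further


def matches(content, elem):
    p = content
    for item in children(elem):
        p = deriv(p, item)
        if isinstance(p, NotAllowed):
            return False  # mismatch detected at this child
    return nullable(p)


def validate(start, root):
    return nullable(deriv(start, root))


# Roughly: element person { element name { text }, element age { text },
#                           element email { text }* }
schema = Element("person", Group(
    Element("name", Text()),
    Group(Element("age", Text()),
          Choice(OneOrMore(Element("email", Text())), Empty()))))

valid_doc = ET.fromstring(
    "<person><name>Ada</name><age>36</age><email>a@example.org</email></person>")
invalid_doc = ET.fromstring("<person><age>36</age><name>Ada</name></person>")

print(validate(schema, valid_doc))
print(validate(schema, invalid_doc))
```

In the invalid document the first child is age, so the derivative of the group collapses to notAllowed immediately, which is the point at which a real validator would report the error.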

Determinism and Ambiguity

RELAX NG schemas are inherently capable of expressing non-deterministic content models, where a single XML document may match a pattern in multiple distinct ways. This arises primarily from the use of choice patterns, which allow overlapping alternatives, and interleave patterns, which permit elements in arbitrary orders without enforcing a unique sequence. Unlike strictly deterministic systems, RELAX NG does not require schemas to guarantee a unique derivation for every valid document, prioritizing expressive power for modeling complex structures.^[5]

In choice patterns, ambiguity occurs when multiple branches can consume the same input sequence. The choice is unordered: a document is valid if any branch matches, and a derivative-based validator keeps all branches alive simultaneously rather than committing to the first one that succeeds. For example, a content model that is either a single item element or one or more item elements accepts a document containing exactly one item under both interpretations, and validity does not depend on which interpretation is taken. This approach simplifies implementation while allowing schemas that capture real-world variability, such as optional refinements in data formats.^[29]^[27]

Interleave patterns introduce further non-determinism by allowing the children matched by their operands to be mixed in any order. Consider a pattern interleaving an "author" element and zero or more "keyword" elements: a document with "author" followed by "keyword" matches, as does one with "keyword" preceding "author". A derivative-based validator does not enumerate permutations; at each child it considers which operand could consume that child and carries the resulting alternatives forward. While this enables flexible modeling of unordered collections, the set of alternatives can grow exponentially for deeply nested interleaves in the worst case, and practical implementations rely on optimizations such as simplifying and memoizing derivatives to keep it manageable.^[5]^[27]

The allowance for such ambiguity provides significant flexibility in schema design, facilitating the representation of intricate grammars that reflect natural data ambiguities, such as mixed-order metadata. Validity itself remains well defined, but ambiguity carries a risk for downstream processing, such as data binding or type assignment, that assumes each node matches the schema in exactly one way; schema authors must therefore exercise care to avoid unintended multiple interpretations. Detection algorithms exist to identify ambiguous grammars proactively, supporting robust schema maintenance.^[30]
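How a validator can accept either ordering of an interleave, or both readings of an overlapping choice, without trying permutations one by one is easiest to see in a toy model. The Python sketch below works on flat lists of element names only and represents patterns as tuples. Every name in it is an assumption made for illustration, and it omits the simplification and memoization that production validators use.

```python
EMPTY = ("empty",)
NOT_ALLOWED = ("notAllowed",)


def name(n):
    return ("name", n)


def choice(a, b):
    if a == NOT_ALLOWED:
        return b
    if b == NOT_ALLOWED:
        return a
    return ("choice", a, b)


def _pair(kind, a, b):
    # Shared simplification for group and interleave.
    if NOT_ALLOWED in (a, b):
        return NOT_ALLOWED
    if a == EMPTY:
        return b
    if b == EMPTY:
        return a
    return (kind, a, b)


def group(a, b):
    return _pair("group", a, b)


def interleave(a, b):
    return _pair("interleave", a, b)


def one_or_more(p):
    return ("oneOrMore", p)


def zero_or_more(p):
    return choice(one_or_more(p), EMPTY)


def nullable(p):
    kind = p[0]
    if kind == "empty":
        return True
    if kind == "choice":
        return nullable(p[1]) or nullable(p[2])
    if kind in ("group", "interleave"):
        return nullable(p[1]) and nullable(p[2])
    if kind == "oneOrMore":
        return nullable(p[1])
    return False  # name and notAllowed


def deriv(p, token):
    kind = p[0]
    if kind == "name":
        return EMPTY if p[1] == token else NOT_ALLOWED
    if kind == "choice":
        # Both branches are followed at once; neither is preferred.
        return choice(deriv(p[1], token), deriv(p[2], token))
    if kind == "group":
        d = group(deriv(p[1], token), p[2])
        if nullable(p[1]):
            d = choice(d, deriv(p[2], token))
        return d
    if kind == "interleave":
        # The token may be consumed by either operand.
        return choice(interleave(deriv(p[1], token), p[2]),
                      interleave(p[1], deriv(p[2], token)))
    if kind == "oneOrMore":
        return group(deriv(p[1], token), choice(p, EMPTY))
    return NOT_ALLOWED  # empty and notAllowed consume nothing


def matches(p, tokens):
    for token in tokens:
        p = deriv(p, token)
    return nullable(p)


# element author & element keyword*
metadata = interleave(name("author"), zero_or_more(name("keyword")))
# element item | element item+  (overlapping alternatives)
ambiguous = choice(name("item"), one_or_more(name("item")))

print(matches(metadata, ["author", "keyword"]))
print(matches(metadata, ["keyword", "author", "keyword"]))
print(matches(metadata, ["keyword"]))
print(matches(ambiguous, ["item"]))
```

After the single item is consumed, the derivative of the ambiguous pattern still contains the remainders of both branches, which is why the result does not depend on the order in which the alternatives were written.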

Comparisons

With W3C XML Schema

RELAX NG and W3C XML Schema were developed concurrently, and both reached specification status in 2001: XML Schema Part 1 became a W3C Recommendation on May 2, and the RELAX NG OASIS Committee Specification was finalized on December 3. Despite this shared timeline, RELAX NG emphasizes simplicity by avoiding XML Schema's intricate type hierarchies and derivation mechanisms, which can complicate schema authorship and maintenance. This design choice makes RELAX NG more concise and semantically straightforward for defining XML structures, prioritizing pattern-based expressions over XML Schema's component-oriented approach.^[31]

Adoption patterns diverge significantly between the two. W3C XML Schema remains dominant in enterprise environments due to its extensive tool support and integration with industry standards, backed by major vendors.^[32] In contrast, RELAX NG finds preference in document-centric publishing workflows, powering schemas for formats like DocBook, the Text Encoding Initiative (TEI) Guidelines, EPUB, and OpenDocument.^[33]^[34] These applications leverage RELAX NG's readability and flexibility for content markup.

In terms of features, RELAX NG lacks XML Schema's substitution groups, which enable polymorphic element replacement based on a head element. However, it excels in unambiguous namespace handling, treating namespaces as integral to patterns without the wildcard complexities that can lead to validation ambiguities in XML Schema.^[5] RELAX NG also offers superior modularity through mechanisms like the include and div elements, allowing schemas to reference and extend external modules seamlessly for reusable vocabularies.^[7]^[35]

As of 2025, RELAX NG maintains stability in legacy XML workflows, particularly within publishing tools like Oxygen XML Editor and XMLmind, where it supports validation and authoring for established document standards.^[36]^[37] Its use has diminished in emerging JSON/XML hybrid environments, where XML Schema's broader ecosystem and adaptations for modern data interchange prevail.^[38]

With DTDs

RELAX NG offers significant improvements over Document Type Definitions (DTDs) in XML validation by providing native support for XML namespaces, which DTDs can only approximate by matching literal prefixed names, since they do not recognize namespace URIs as the primary identifiers.^[5] In DTDs, namespace declarations must be mimicked using parameter entities or conditional sections, leading to brittle and non-portable schemas, whereas RELAX NG directly uses qualified names tied to namespace URIs for precise matching.^[23] This namespace awareness enables RELAX NG to support extensible and modular schemas without the hacks required in DTDs, such as those needed for documents involving multiple vocabularies like XSLT or RDF.^[5]

Another key advantage is RELAX NG's richer datatyping capabilities, which extend far beyond the limited options in DTDs, such as #PCDATA, CDATA, or enumerated tokens.^[5] RELAX NG integrates external datatype libraries, including those from W3C XML Schema, allowing uniform specification of types like integers, strings with length constraints, or unions for both element content and attribute values.^[23] This decoupling from fixed built-in types addresses DTDs' inability to express complex constraints, such as pattern matching or derivations, enabling more precise validation of structured data.^[5]

RELAX NG also overcomes DTD limitations in modularity and pattern expressiveness by supporting clean includes and composable patterns that resemble regular expressions.^[5] Unlike DTDs, which rely on parameter entities for reuse but impose ordering constraints and flattening requirements, RELAX NG allows definitions to be included and overridden in any order within grammars, facilitating maintainable, context-dependent content models.^[5] Its patterns support interleaving for unordered content and a pattern language that is closed under union, providing regex-like flexibility for sequences, repetitions, and alternatives that the , and | connectors of XML DTD content models cannot fully capture without loss of precision.^[23]

Conversion between RELAX NG and DTDs is reliable in only one direction: DTDs can be automatically translated to RELAX NG schemas, but the reverse is often lossy because RELAX NG features such as full interleave semantics or wildcard name classes have no direct equivalents in DTD syntax.^[5] The RELAX NG DTD Compatibility specification addresses some gaps by allowing annotations for DTD-specific behaviors like attribute defaults or ID/IDREF validation, but core RELAX NG validation remains focused on schema constraints without modifying the document infoset.^[26]

In practice, DTDs persist primarily for legacy XML systems where simplicity and broad tool support are prioritized, while RELAX NG is preferred for developing precise, maintainable schemas in modern structured documents, such as those in publishing or data exchange standards.^[5] This shift underscores RELAX NG's role in evolving beyond DTDs' foundational but restrictive paradigm, much like how it complements W3C XML Schema in namespace-aware validation.^[23]

Implementations

Tools and Validators

Jing, developed by James Clark in 2002, serves as the primary RELAX NG validator, implemented in Java and supporting both XML and compact syntaxes through its command-line interface and Java API for programmatic integration.^[39] It enables efficient validation of XML documents against RELAX NG schemas, with features including pluggable datatype libraries and compatibility with SAX2 parsers for streaming validation.^[39]

For schema authoring and editing, Oxygen XML Editor provides comprehensive RELAX NG support, including a visual schema editor, content completion assistant, and specialized views for elements, attributes, and constraints, alongside validation for both syntaxes.^[40] Similarly, Altova XMLSpy offers RELAX NG validation capabilities through its associated command-line tool AltovaXML, facilitating schema checking in development workflows, though primary editing focuses on graphical views for related schema languages.

RELAX NG validation is integrated into libraries such as Apache Xerces, where configurations like ManekiNeko enable parser-based schema enforcement during XML processing.^[3] In the .NET ecosystem, libraries like dotnet-dsdl provide RELAX NG validation readers, allowing programmatic checks against schemas in C# applications via XmlReader extensions.^[41]

As of 2025, RELAX NG tools like Jing are incorporated into IDEs such as IntelliJ IDEA through plugins that offer on-the-fly validation, error checking, and content completion for XML projects using RELAX NG schemas.^[42] Extensions for CI/CD pipelines, including GitHub Actions and Jenkins, leverage Jing's command-line interface to automate schema validation in build processes, ensuring document compliance during continuous integration.^[28] Converters like Trang complement these tools by enabling format translations for broader interoperability.^[3]

Other notable validators include the Multi-Schema Validator (MSV), a Java library for validating against RELAX NG and other schema languages, and librelaxng, a C implementation for integrating RELAX NG validation into C-based applications.^[43]
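A build script can wrap Jing's command-line interface along the following lines. This is a sketch under stated assumptions: it presumes a jing launcher is available on PATH (some installations instead run java -jar jing.jar), it relies on Jing's -c option for compact-syntax schemas and on a zero exit status meaning success, and the function names are hypothetical.

```python
import shutil
import subprocess


def build_jing_command(schema_path, document_paths, executable="jing"):
    """Assemble the argument list for a Jing run.

    The -c option is added for .rnc files so that the schema is read
    as RELAX NG compact syntax.
    """
    command = [executable]
    if schema_path.endswith(".rnc"):
        command.append("-c")
    command.append(schema_path)
    command.extend(document_paths)
    return command


def validate_with_jing(schema_path, document_paths, executable="jing"):
    """Run Jing and return (is_valid, diagnostics).

    Assumes the launcher named by `executable` is installed; raises
    FileNotFoundError otherwise so a pipeline fails loudly.
    """
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} was not found on PATH")
    result = subprocess.run(
        build_jing_command(schema_path, document_paths, executable),
        capture_output=True,
        text=True,
    )
    # Exit status 0 is treated as "all documents valid".
    return result.returncode == 0, result.stdout + result.stderr


# Only the command is built here; nothing is executed.
print(build_jing_command("schema.rnc", ["document.xml"]))
```

In a CI job, the boolean returned by validate_with_jing can be turned into the job's exit status, and the diagnostics string can be written to the build log.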

Converters and Integrations

One prominent converter for RELAX NG schemas is Trang, developed by James Clark, which translates RELAX NG schemas (in either the XML or the compact syntax) and DTDs into other schema languages, including the other RELAX NG syntax, DTDs, and W3C XML Schema; W3C XML Schema is supported as an output format only.^[44] Trang processes schemas by first converting them to an intermediate RELAX NG object model, applying necessary transformations, and then generating the target format, enabling seamless integration into environments that rely on alternative schema standards.^[45] For specific use cases, custom tools have been developed to generate parsers from RELAX NG schemas, such as ANTLR grammars for XKB configuration files.^[46]

Integrations with Schematron enhance RELAX NG by embedding rule-based constraints for co-occurrence and semantic validations that go beyond structural patterns, often achieved through annotations in the RELAX NG XML syntax or direct embedding of Schematron rules within schema elements.^[47] This combination is particularly useful for enforcing business rules in complex documents, where RELAX NG handles content models and Schematron addresses interdependencies.^[48]

In broader XML ecosystems, RELAX NG integrates into pipelines via standards like XProc, which includes a dedicated p:validate-with-relax-ng step to apply schema validation within processing workflows, ensuring document conformance during transformations or data flows.^[49] Similarly, in publishing tools, DocBook leverages RELAX NG schemas for vocabulary definition, with XSL stylesheets incorporating validation extensions to enforce schema rules during document processing and output generation.^[50] These integrations promote RELAX NG's use in modular XML authoring and automated workflows.^[33]

RELAX NG schemas conventionally use the .rng filename extension for the XML syntax and .rnc for the compact syntax, though these are not strictly enforced and serve primarily as identification aids in tools like converters and editors.^[51]
