TermBase eXchange
| TermBase eXchange | |
|---|---|
| Filename extension | .tbx |
| Internet media type | application/x-tbx [1] |
| UTI conformation | public.xml |
| Developed by | Localization Industry Standards Association |
| Initial release | 2002? |
| Latest release | April 2019[2] |
| Type of format | Terminology |
| Extended from | XML |
| Standard | ISO 30042 |
| Open format? | yes |
| Website | https://www.gala-global.org/sites/default/files/migrated-pages/docs/tbx_oscar_0.pdf |
TermBase eXchange (TBX) is an international standard (ISO 30042:2019) for the representation of structured concept-oriented terminological data, copublished by ISO and the Localization Industry Standards Association (LISA).[3][4][5] Originally released in 2002 by LISA's OSCAR special interest group, TBX was adopted by ISO TC 37 in 2008. In 2019 ISO 30042:2008 was withdrawn and revised by ISO 30042:2019. It is currently available as an ISO standard and as an open, industry standard, available at no charge.[4][5]
TBX defines an XML format for the exchange of terminology data, and is "an industry standard for terminology exchange".[6]
See also
- IATE (“Inter-Active Terminology for Europe”) is the EU's inter-institutional terminology database, used in the EU institutions and agencies since summer 2004 for the collection, dissemination and shared management of EU-specific terminology. The IATE multilingual databases can be downloaded in a zipped format; multilanguage glossaries in TBX format can then be generated using a free tool.
- OpenTMS (Open Source Translation Management System)
- XLIFF (XML Localisation Interchange File Format): an XML-based format created to standardize the way localizable data are passed between tools during a localization process and a common format for CAT tool files.
References
- ^ dwaynebailey. "https://github.com/translate/virtaal/blob/master/share/applications/virtaal.desktop.in". GitHub. Retrieved 14 September 2015.
- ^ "ISO 30042:2019". Retrieved 21 June 2019.
- ^ ISO 30042:2008: Systems to manage terminology, knowledge and content -- TermBase eXchange (TBX). International Organization for Standardization, http://www.iso.org/iso/catalogue_detail.htm?csnumber=45797
- ^ a b "LISA OSCAR Standards", GALA website. http://www.gala-global.org/lisa-oscar-standards
- ^ a b "TermBase eXchange". https://www.gala-global.org/sites/default/files/migrated-pages/docs/tbx_oscar_0.pdf
- ^ "Microsoft Terminology Collection". Microsoft Language. Archived from the original on 2015-09-23. Retrieved 14 September 2015.
History
Origins in LISA
TermBase eXchange (TBX) originated in 2002 within the Localization Industry Standards Association (LISA), specifically through its OSCAR special interest group, whose name stands for Open Standards for Container/Content Allowing Re-use. This group focused on creating XML-based standards to support automated language processing across globalization, internationalization, localization, and translation processes. TBX emerged as a dedicated format for the representation and exchange of terminological data, marking an early effort to unify data handling in the emerging field of multilingual content management.[1][5]
The core purpose of TBX was to standardize termbase interchange within localization workflows, addressing the fragmentation caused by proprietary formats in dominant tools of the era, such as SDL Trados and Déjà Vu. These systems often used incompatible structures for storing and sharing terminology, leading to inefficiencies in collaborative translation projects and data migration. By providing a neutral, XML-based interchange format, TBX aimed to promote interoperability, allowing terminological resources to be shared across diverse software environments without loss of structure or meaning.[6]
Development involved key contributors from LISA's membership, including academics like Sue Ellen Wright of Kent State University and professionals from translation memory and terminology software vendors, who collaborated to define the foundational framework. The initial 2002 release featured a basic XML schema centered on essential elements such as terms, language equivalents, and definitions, establishing a flexible yet structured approach to terminological data exchange prior to any formal international standardization.[7][8]
Adoption by ISO
The adoption of TermBase eXchange (TBX) by the International Organization for Standardization (ISO) marked a pivotal transition from an industry-driven initiative to a globally recognized standard. In 2007, the Localization Industry Standards Association (LISA) submitted the TBX specification, developed by its OSCAR special interest group, to ISO Technical Committee 37 (TC 37), Terminology and other language and content resources, Subcommittee 3 (SC 3), Management of terminology.[9] This submission used a fast-track procedure, leading to the formal adoption and publication of ISO 30042:2008 in December 2008, which defined TBX as an XML-based framework for the interchange of terminological data.[4] The standard was co-published by ISO and LISA, ensuring continuity while elevating TBX to an international benchmark for terminology management systems.[10]
A key aspect of this adoption was TBX's alignment with established ISO terminology standards, particularly ISO 12620, which specifies data categories for language resources. ISO 30042:2008 required that all TBX data categories be drawn from the ISO 12620 registry, promoting interoperability and consistency across terminological databases used in translation, knowledge management, and content creation.[11] This integration facilitated the modular representation of terminological elements, allowing TBX to support diverse processes such as term extraction, concept modeling, and data exchange without prescribing a single rigid structure.[12]
Following LISA's insolvency in March 2011, its OSCAR standards portfolio, including TBX, was transferred to ISO TC 37 for ongoing maintenance and development.[13] This handover solidified ISO's sole custodianship, ending LISA's formal role and ensuring the standard's evolution under international governance.
Early post-adoption efforts by ISO emphasized TBX's modularity to accommodate varying terminological needs, such as specialized dialects for translation workflows or broader knowledge organization tasks.[4] This focus enhanced TBX's adaptability, positioning it as a flexible tool for global standardization in terminology resources.
Technical Specifications
Core Framework
TermBase eXchange (TBX) is an Extensible Markup Language (XML) format designed for the interchange of terminological data, adhering to the ISO 30042 standard for structured representation of terms and related linguistic information. This framework enables the exchange of terminology resources across diverse software systems, ensuring compatibility and interoperability in fields such as translation and localization.[11] At its foundation, TBX uses XML to define a modular architecture that separates core structural elements from customizable data categories, allowing users to tailor the format to specific needs without altering the underlying schema. TBX supports two styles: Data Category as Attribute (DCA), where categories are attributes on generic elements like <descrip>, and Data Category as Tag (DCT), which uses specific element names like <definition>.[2]
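The difference between the two styles can be illustrated with a small sketch. It assumes a simplified view in which a DCA element's type attribute value becomes the DCT element name directly; the helper dca_to_dct and the set GENERIC_DCA_ELEMENTS are hypothetical names, and the namespace prefixes that real DCT documents may attach to module elements are ignored here.

```python
import xml.etree.ElementTree as ET

# Generic DCA carrier elements named in this article (assumed list, not exhaustive).
GENERIC_DCA_ELEMENTS = {"descrip", "termNote", "admin"}


def dca_to_dct(element):
    """Return a copy of `element` where DCA-style children use DCT-style names.

    Simplification: the `type` value is used as the new tag name as-is.
    """
    tag = element.tag
    attrib = dict(element.attrib)
    if tag in GENERIC_DCA_ELEMENTS and "type" in attrib:
        tag = attrib.pop("type")  # e.g. descrip[type=definition] -> definition
    converted = ET.Element(tag, attrib)
    converted.text = element.text
    converted.tail = element.tail
    for child in element:
        converted.append(dca_to_dct(child))
    return converted


dca = ET.fromstring(
    '<conceptEntry id="c1">'
    '<descrip type="definition">A setup program.</descrip>'
    "</conceptEntry>"
)
dct = dca_to_dct(dca)
print(ET.tostring(dct, encoding="unicode"))
```

Both forms carry the same information; the choice mainly affects how schemas and XPath queries are written.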
The basic structure of a TBX document begins with the root element <tbx>, which encapsulates the entire terminological database and evolved from the earlier MARTIF interchange format.[14][15] Within this root, individual concepts are represented by <conceptEntry> elements, each containing nested components such as <langSec> for language-specific sections and <termSec> for term details. This hierarchical organization supports multilingual entries through the xml:lang attribute on <langSec>, which allows equivalent terms to be recorded across languages, along with definitions via <descrip type="definition">, notes in <termNote> elements (e.g., for grammatical information), and administrative metadata in <admin> elements for tracking origins, status, and revision history.[15]
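As a rough illustration of this hierarchy, the sketch below parses a hand-written sample and lists the terms of each concept by language. The sample is an assumption made for illustration: it omits the header and namespace declarations a real file would carry, the <text>/<body> wrappers and the attribute values on <tbx> are not taken from the text above, and the function name terms_by_language is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written sample; header and namespace declarations omitted for brevity.
SAMPLE_TBX = """\
<tbx type="TBX-Basic" style="dca" xml:lang="en">
  <text>
    <body>
      <conceptEntry id="c1">
        <descrip type="definition">A program that installs software.</descrip>
        <langSec xml:lang="en">
          <termSec>
            <term>installer</term>
            <termNote type="partOfSpeech">noun</termNote>
          </termSec>
        </langSec>
        <langSec xml:lang="fr">
          <termSec>
            <term>programme d'installation</term>
          </termSec>
        </langSec>
      </conceptEntry>
    </body>
  </text>
</tbx>
"""


def terms_by_language(tbx_text):
    """Map each concept id to {language code: [terms]}."""
    root = ET.fromstring(tbx_text)
    result = {}
    for entry in root.iter("conceptEntry"):
        per_lang = {}
        for lang_sec in entry.findall("langSec"):
            lang = lang_sec.get(XML_LANG)
            per_lang[lang] = [t.text for t in lang_sec.findall("termSec/term")]
        result[entry.get("id")] = per_lang
    return result


print(terms_by_language(SAMPLE_TBX))
```

Because the entry is concept-oriented, the English and French terms sit side by side under one <conceptEntry> rather than in separate per-language lists.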
Key principles of the TBX core framework emphasize modularity through dialects, such as the simplified TBX-Basic or more complex private dialects, which extend the mandatory TBX-Core structure with optional modules for enhanced functionality.[11] These dialects maintain backward compatibility while allowing customization of data categories, ensuring that essential elements like terms, definitions, and notes remain consistently represented.[16] Furthermore, TBX integrates seamlessly with XML technologies, including XML Schema Definition (XSD) files for validation, which enforce structural integrity and data constraints across implementations.[17] This compatibility enables automated processing, parsing, and verification of terminological data in standard XML environments.
Data Categories and Modules
The TermBase eXchange (TBX) standard incorporates data categories defined in ISO 12620, which provide a registry of standardized attributes for terminological resources, enabling consistent representation of elements such as terms, definitions, and notes. These categories are carried by core elements like <term> for a concept's designation, <descrip type="definition"> for explanatory text, <note> for general annotations, <termNote> for notes specific to a term's usage or status, and <adminNote> for administrative metadata such as entry modification history. By drawing from this inventory, TBX ensures interoperability across terminological databases while allowing for precise categorization of linguistic and conceptual data.
TBX organizes these data categories into modules and dialects, which serve as customizable building blocks for handling diverse terminological needs. Modules are predefined subsets that specify permissible categories and their constraints. They are either public (endorsed for general use, such as those behind TBX-Basic for simple exchanges) or private (user-defined for specialized applications, such as complex ontologies).[18][19] Dialects, in turn, combine one or more modules to create tailored profiles, with public examples like TBX-Basic extending the minimal TBX-Core to include categories such as /definition/ and /subjectField/, which allows extensions without altering the core structure.[19] This modular approach supports user-defined extensions, ensuring flexibility for domain-specific terminology while maintaining compatibility.
Flexibility in TBX is achieved through attribute-value pairs applied to elements, allowing nuanced specifications such as <termNote type="termType">acronym</termNote> to mark a term as an acronym, or xml:lang="fr" to denote the language of a language section. These pairs, often in Data Category as Attribute (DCA) style, enable concise encoding of metadata like term types (e.g., abbreviation, acronym) or notes, promoting efficient data exchange.[18]
To ensure interoperability, TBX documents must validate against dialect-specific schemas, typically using RELAX NG (RNG) for structural constraints, supplemented by XSD for datatype validation and Schematron for additional rules like cardinality or value restrictions. This validation framework, integrated into the core structure, verifies compliance with selected modules and prevents errors in terminological exchanges.[20]
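To give a feel for the kind of rules such schemas express, here is a hand-written structural pre-check using only the Python standard library. It is not RELAX NG or Schematron validation and does not replace the official dialect schemas; the three rules it checks (a <tbx> root, at least one <langSec> with xml:lang per concept, exactly one <term> per <termSec>) are assumptions chosen for illustration, and check_structure is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def check_structure(tbx_text):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    root = ET.fromstring(tbx_text)
    if root.tag != "tbx":
        problems.append(f"root element is <{root.tag}>, expected <tbx>")
    for entry in root.iter("conceptEntry"):
        entry_id = entry.get("id", "(no id)")
        lang_secs = entry.findall("langSec")
        if not lang_secs:
            problems.append(f"{entry_id}: no <langSec>")
        for lang_sec in lang_secs:
            lang = lang_sec.get(XML_LANG) or "?"
            if lang == "?":
                problems.append(f"{entry_id}: <langSec> without xml:lang")
            term_secs = lang_sec.findall("termSec")
            if not term_secs:
                problems.append(f"{entry_id}/{lang}: no <termSec>")
            for term_sec in term_secs:
                # Cardinality rule: one designation per term section.
                if len(term_sec.findall("term")) != 1:
                    problems.append(
                        f"{entry_id}/{lang}: <termSec> needs exactly one <term>"
                    )
    return problems


GOOD = (
    '<tbx><text><body><conceptEntry id="c1"><langSec xml:lang="en">'
    "<termSec><term>installer</term></termSec>"
    "</langSec></conceptEntry></body></text></tbx>"
)
BAD = (
    '<tbx><text><body><conceptEntry id="c2"><langSec>'
    "<termSec/></langSec></conceptEntry></body></text></tbx>"
)

print(check_structure(GOOD))
print(check_structure(BAD))
```

A check like this can catch gross structural errors early in a pipeline, but conformance to a dialect should still be established with the schemas published for that dialect.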
Versions and Evolution
TBX 2.0 (ISO 30042:2008)
ISO 30042:2008, published in December 2008, established the first international standard for TermBase eXchange (TBX), adopting the framework previously developed by the Localization Industry Standards Association (LISA) as TBX version 2.0 in 2002 with minor enhancements for standardization.[4][5] This XML-based specification provided a structured format for interchanging terminological data, primarily aimed at supporting translation and authoring processes in computer-based environments.[4]
The core features of TBX 2.0 centered on a modular architecture consisting of a core structure module and an eXtensible Constraint Specification (XCS) module, enabling customization through subsets or supersets of default data categories defined in accordance with ISO 12620.[5] It supported hierarchical term entries via the <termEntry> element, which encapsulated conceptual information and included multiple <langSet> elements for multilingual equivalents, each containing <tig> (term information group) or <ntig> (nested term information group) elements for terms and related attributes.[5] Administrative metadata was integrated through elements like <admin>, <transac>, and <date>, allowing tracking of creation dates, ownership, and transaction notes to maintain provenance in terminological databases.[5]
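Comparing these element names with those of the current core structure (<conceptEntry>, <langSec>, <termSec>) suggests a simple renaming step when moving older data forward. The sketch below shows only that step, under the assumption that the three pairs correspond one to one; the mapping table RENAMES and the function rename_elements are hypothetical, and the root element, <ntig> groups, and XCS-declared data categories are deliberately left untouched.

```python
import xml.etree.ElementTree as ET

# Assumed one-to-one correspondence between 2008-era and current element names.
RENAMES = {
    "termEntry": "conceptEntry",
    "langSet": "langSec",
    "tig": "termSec",
}


def rename_elements(element):
    """Rename known 2008-era elements in place and return the same tree."""
    for node in element.iter():
        node.tag = RENAMES.get(node.tag, node.tag)
    return element


old_entry = ET.fromstring(
    '<termEntry id="c1"><langSet xml:lang="en">'
    "<tig><term>installer</term></tig>"
    "</langSet></termEntry>"
)
new_entry = rename_elements(old_entry)
print([node.tag for node in new_entry.iter()])
```

A real migration involves more than renaming, so this should be read as a starting point rather than a converter.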
Despite its advancements, TBX 2.0 had limitations. Its structure could represent many termbase formats without enforcing a single compatible schema, which often led to interoperability problems among implementations.[21] It offered limited support for linked data or ontologies, focusing instead on basic terminological interchange without provisions for semantic web integration.[5] Additionally, its reliance on separate XCS files for declaring variants and on ISO 12620 for data category specifications created dependencies that could confuse implementers who did not follow them fully.[5][21]
In the early 2010s, TBX 2.0 saw widespread adoption within the localization industry for exchanging terminology data between computer-aided translation (CAT) tools and translation memory systems, facilitating seamless integration of termbases in multilingual projects.[5][22]
TBX 3.0 (ISO 30042:2019)
TBX 3.0, formalized as ISO 30042:2019 (the second edition of the standard), was published on April 4, 2019, by the International Organization for Standardization (ISO), superseding the withdrawn 2008 edition (ISO 30042:2008).[2][23] This revision enhances support for complex terminologies by introducing a more flexible, modular framework that accommodates domain-specific needs while maintaining backward compatibility through defined migration paths.[14][21]
Major changes in TBX 3.0 emphasize improved modularity, enabling the creation of industry-defined dialects (such as TBX-Core, TBX-Min, and TBX-Basic) that build progressively as supersets using a telescoping principle to avoid overlapping modules.[14][23] These dialects replace the previous XCS formalism with a simpler @type attribute on the root attributes and adding
Applications and Usage
Terminology Management in Translation
TermBase eXchange (TBX) serves as a standardized XML-based format for exchanging structured terminological data, enabling import and export of term lists between computer-assisted translation (CAT) tools such as memoQ to maintain consistency across multilingual translation projects.[24][25] In memoQ, for instance, term bases can be exported in TBX format for compatibility with various translation environments, so terminology resources can be shared without format-specific barriers.[24][25] This interoperability ensures that translators working on the same project can access unified term repositories, regardless of their preferred software.
The adoption of TBX in translation workflows yields significant benefits, particularly in reducing localization errors by standardizing terms, their equivalents, and associated context notes among distributed teams.[26] By enforcing terminological consistency, TBX minimizes inconsistencies that could arise from ad-hoc translations, thereby enhancing technical accuracy and overall quality in multilingual outputs.[11] For example, context notes embedded in TBX files give translators guidance on usage, further preventing mistranslations and promoting adherence to project-specific glossaries.[11]
In translation workflows, TBX facilitates the conversion of proprietary termbase formats into a neutral, exchangeable structure, supporting efficient sharing within global supply chains and aligning with ISO 17100 requirements for terminology management in professional translation services.[11][27] This process allows agencies and freelancers to integrate terminology from vendor-specific tools into collaborative platforms, ensuring that resources like style guides and approved terms are accessible throughout the project lifecycle without proprietary lock-in. TBX's XML modularity further aids this integration by permitting selective data exchange tailored to workflow needs.
A practical case of TBX application is in software localization, where it manages user interface (UI) terms across languages to preserve brand integrity and functionality. For instance, Microsoft provides its official terminology collections in TBX format, enabling localization teams to consistently translate UI elements like buttons and menus while incorporating language-specific adaptations and notes.[28] Similarly, Mozilla's Pontoon platform uses TBX files to handle UI terminology for open-source software, ensuring that terms such as "install" are rendered accurately in target languages with contextual details for software-specific usage.[29] This approach streamlines the localization of dynamic UI components, reducing revision cycles and errors in end-user experiences.
Integration with Localization Tools
TBX integrates with various localization tools by providing a standardized XML-based format for exchanging terminological data, enabling direct import into termbases without loss of structure. For instance, SDL Trados Studio supports importing TBX files into its MultiTerm termbases using built-in conversion tools, allowing users to populate terminology resources from external glossaries efficiently.[30][31] Similarly, ApSIC Xbench, a quality assurance and terminology management tool, permits adding TBX files to projects for bilingual reference searches and validation during localization workflows.[32][33] memoQ, another computer-assisted translation (CAT) platform, also handles TBX import and export, ensuring compatibility across multilingual projects.[34]
Automation of TBX in localization pipelines is achieved through converters and APIs that transform terminological data for broader use, such as converting TBX to TMX (Translation Memory eXchange) format to integrate terms into translation memories.
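A conversion of this kind can be sketched in a few lines. The example below is a minimal illustration and does not describe how any particular tool works: it assumes a namespace-free TBX sample, takes only the first term per language, and fills the TMX header with placeholder values; the function name tbx_to_tmx and the creationtool value are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = (
    '<tbx><text><body><conceptEntry id="c1">'
    '<langSec xml:lang="en"><termSec><term>installer</term></termSec></langSec>'
    '<langSec xml:lang="fr"><termSec><term>programme d\'installation</term></termSec></langSec>'
    "</conceptEntry></body></text></tbx>"
)


def tbx_to_tmx(tbx_text, src_lang="en"):
    """Build a TMX tree with one translation unit per multilingual concept."""
    tbx_root = ET.fromstring(tbx_text)
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "tbx-to-tmx-sketch",  # placeholder tool name
        "creationtoolversion": "0.1",
        "segtype": "phrase",
        "o-tmf": "TBX",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for entry in tbx_root.iter("conceptEntry"):
        tu = ET.Element("tu")
        for lang_sec in entry.findall("langSec"):
            term = lang_sec.find("termSec/term")  # first term only
            if term is None or not term.text:
                continue
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang_sec.get(XML_LANG, "")})
            ET.SubElement(tuv, "seg").text = term.text
        if len(tu) >= 2:  # skip concepts that lack an equivalent
            body.append(tu)
    return tmx


tmx_tree = tbx_to_tmx(SAMPLE)
print(ET.tostring(tmx_tree, encoding="unicode"))
```

The conversion is lossy by design: definitions, notes, and synonyms have no direct place in a translation unit, which is one reason termbases and translation memories are usually kept as separate resources.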
Tools like TTMEM's Convert and Go utility enable batch conversion of TBX files to TMX, supporting automated workflows in software localization where terminology must align with existing translation assets.[35] The Goldpan TMX/TBX Editor further aids this by allowing creation, editing, and conversion of TBX files with up to eight languages, which helps integration into CI/CD (continuous integration/continuous deployment) pipelines for agile development.[36] In CI/CD environments, these converters ensure terminology updates propagate automatically, as seen in platforms like RWS that incorporate standards like TBX for continuous localization in software releases.[37] As of 2025, tools such as the TBX Exporter for Excel add-in simplify the creation of TBX files from spreadsheets, making terminology workflows more accessible to non-specialist users.[38]
TBX extends its utility through compatibility with complementary standards like SRX (Segmentation Rules eXchange) and XLIFF (XML Localization Interchange File Format), enhancing precision in text processing and segment-level terminology application. When paired with SRX, TBX supports consistent segmentation rules across tools, allowing terminological data to inform how source text is divided for translation, as implemented in systems like XTM Cloud that handle both formats for workflow optimization.[39] With XLIFF, TBX enables embedding or referencing terminology at the segment level, where notes and attributes from TBX (inspired by XLIFF structures) maintain data integrity during file exchanges in localization tools.[22][40] This integration is evident in frameworks like Okapi, which process TBX alongside XLIFF for end-to-end translation pipelines.[41]
A key challenge in localization ecosystems is bridging proprietary formats from diverse tools, which TBX addresses as a vendor-neutral intermediary for lossless data transfer. By serving as a common XML schema, TBX allows terminology from closed systems (such as SDL MultiTerm or other CAT environments) to be exported, shared, and re-imported without proprietary dependencies, promoting interoperability in the translation and localization industry.[43] This neutrality reduces conversion errors and supports scalable exchanges, as outlined in ISO 30042, ensuring TBX's role in handling complex, multi-vendor workflows.
Adoption and Community
Industry Implementation
TBX has achieved significant adoption within European Union institutions, particularly through the Inter-Active Terminology for Europe (IATE) database, which has offered exports of terminological data in TBX format compliant with ISO 30042:2019 since at least 2021. This integration supports multilingual drafting and legal-linguistic consistency across EU agencies. Similarly, multinational corporations such as Microsoft have incorporated TBX into their terminology workflows, offering product-specific terminology collections for download in .tbx format to facilitate standardized exchange and integration with translation tools.[44][28]
In specialized sectors, TBX facilitates the management of controlled vocabularies where precision is essential. In the medical field, it has been applied to represent multilingual extensions of systems like SNOMED CT, using an adapted TBX framework to structure and exchange clinical terminology for translation purposes.[45] The legal sector benefits from TBX's ability to handle concept-oriented data in controlled environments, supporting consistent terminology in multilingual legal documents, though specific implementations often align with broader translation standards.
According to the TBX Council's registry of implementations (last updated 2021), over 30 tools claim support for TBX, including more than 20 commercial applications such as SDL Trados and memoQ.[25] This widespread tool compatibility underscores TBX's role as an industry standard for terminological interchange.
A primary barrier to broader implementation has been the initial learning curve associated with TBX's XML structure, which requires familiarity with markup for effective use. This challenge is increasingly mitigated by converters, validators, and simplified subsets like TBX-Basic, which reduce complexity while maintaining ISO compliance.[46]
Resources and Support
Official documents for TermBase eXchange (TBX) are primarily governed by the ISO 30042:2019 standard, which defines the framework for representing structured terminological data, including the metamodel, data categories, and XML encoding styles such as Data Category as Attribute (DCA) and Data Category as Tag (DCT).[2] The full text of ISO 30042:2019 can be purchased directly from the International Organization for Standardization (ISO) website, providing detailed specifications for terminology management and exchange.[2] Complementary specifications and guidelines are hosted on tbxinfo.net, a dedicated resource site maintained by LTAC Global, offering downloadable documents on TBX dialects, modules, and implementation best practices.[1]
For validation of TBX files, RELAX NG schemas are available through tbxinfo.net, enabling users to check compliance with the standard's core structure and dialect-specific requirements.[47] These schemas, often integrated with Schematron rules for additional constraints, support automated verification in XML editors like Oxygen XML Editor, ensuring data integrity during exchange.[47] The site also provides an online TBX validation API for quick testing without local software installation.[48]
Community hubs for TBX development and support include tbxinfo.net, which serves as a central repository for public dialects, modules, and developer resources, fostering collaboration among terminologists and software developers.[20] Open-source tools, such as converters for transforming TBX to other formats like JSON, are available on GitHub; for example, the Csh-MultiTerm-TBX-Converter facilitates bidirectional exchange between SDL MultiTerm and TBX formats using JSON mappings.[49] Another repository, tbx-conversion, provides Python scripts for converting between TBX and formats like NTRF, aiding integration with databases such as MongoDB.[50]
Training resources for learning and implementing TBX are offered through LTAC Global, including tutorials on creating dialects and using public modules, accessible via their GitHub organization and tbxinfo.net help pages.[51] These materials cover practical topics like schema generation for DCA and DCT styles, with examples for the TBX-Basic dialect.[52] A starter guide, TBXStarterGuide.docx, provides step-by-step instructions for building initial TBX files, including validation against integrated RELAX NG schemas and Schematron rules.[10] While specific webinars on TBX dialects are not frequently scheduled, LTAC Global contributes to industry events and online resources that address dialect customization and terminology workflows.[19]
Ongoing updates to TBX are managed through ISO Technical Committee 37 (ISO/TC 37), which oversees language and terminology standards, including a five-year systematic review of ISO 30042:2019 initiated in 2024 and ongoing as of 2025.[53][1] Feedback for the 2025 review can be submitted to the review coordinator through the contact details announced on tbxinfo.net, allowing stakeholders to propose enhancements to dialects and validation mechanisms.[1] This process ensures TBX remains aligned with evolving needs in terminology management and localization.[54]
References
- https://learn.microsoft.com/en-us/globalization/localization/localization-file-formats
