Data element
In metadata, a data element is an atomic unit of data that has precise meaning or precise semantics. Data element usage can be discovered by inspecting software applications or application data files, through a process of manual or automated Application Discovery and Understanding. Once data elements are discovered, they can be registered in a metadata registry. In databases and data systems more generally, a data element is a concept forming part of a data model. As an element of data representation, a collection of data elements forms a data structure.[1]
Properties
A data element has:
- An identification such as a data element name
- A clear data element definition
- One or more representation terms
- Optional enumerated values (see Code (metadata))
- A list of synonyms to data elements in other metadata registries (see Synonym ring)
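The properties listed above can be pictured as fields of a simple record. The following is a minimal sketch, not a structure taken from any standard: the class name `DataElement`, its field names, the `accepts` helper, and the `OrderStatusCode` example are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class DataElement:
    """Hypothetical registry record for a single data element."""

    name: str                                   # identification
    definition: str                             # clear definition
    representation_terms: Tuple[str, ...]       # one or more representation terms
    enumerated_values: Optional[FrozenSet[str]] = None  # optional code list
    synonyms: Tuple[str, ...] = ()              # names used in other registries

    def __post_init__(self) -> None:
        # The list above requires at least one representation term.
        if not self.representation_terms:
            raise ValueError("at least one representation term is required")

    def accepts(self, value: str) -> bool:
        """Return True if the value is allowed by the optional code list."""
        return self.enumerated_values is None or value in self.enumerated_values


# Made-up example element; the name and codes are illustrative only.
order_status = DataElement(
    name="OrderStatusCode",
    definition="A code indicating the current processing state of an order.",
    representation_terms=("Code",),
    enumerated_values=frozenset({"OPEN", "SHIPPED", "CLOSED"}),
    synonyms=("order_state",),
)

print(order_status.accepts("OPEN"))  # True
print(order_status.accepts("LOST"))  # False
```

Keeping the record immutable (`frozen=True`) reflects the idea that a registered element is changed through a new registration rather than edited in place; that design choice is also an assumption of this sketch.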
Name
A data element name is a name given to a data element in, for example, a data dictionary or metadata registry. In a formal data dictionary, there is often a requirement that no two data elements may have the same name, to allow the data element name to become an identifier, though some data dictionaries may provide ways to qualify the name in some way, for example by the application system or other context in which it occurs.
In a database driven data dictionary, the fully qualified data element name may become the primary key, or an alternate key, of a Data Elements table of the data dictionary.
The data element name typically conforms to ISO/IEC 11179 metadata registry naming conventions and has at least three parts: Object, Property and Representation term.
Many standards require the use of Upper camel case to differentiate the components of a data element name. This is the standard used by ebXML, GJXDM and NIEM.
Example of ISO/IEC 11179 name in XML
Users frequently encounter ISO/IEC 11179 when they are exposed to XML data element names that have a multi-part camel case format:
Object [Qualifier] Property RepresentationTerm
The specification also includes normative documentation in appendices.
For example, the XML element for a person's given (first) name would be expressed as:
<PersonGivenName>John</PersonGivenName>
Here the Object is Person, the Property is Given and the Representation term is Name. The optional qualifier is not used in this case. Because the components are only implicit in the data element name, identifying them requires knowledge of the naming convention rather than use of structured data.
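The decomposition of an Upper Camel Case name into Object, Qualifier, Property and Representation term can be sketched in code. This is a heuristic only, under the loudly stated assumption that every component is a single capitalized word; the function name `split_element_name` and the short `REPRESENTATION_TERMS` set are hypothetical and not taken from ISO/IEC 11179.

```python
import re

# Illustrative subset only; a real registry defines its own controlled list.
REPRESENTATION_TERMS = {"Name", "Code", "Date", "Amount", "Identifier", "Text"}


def split_element_name(name: str) -> dict:
    """Split an Upper Camel Case name into its assumed components.

    Assumes the pattern Object [Qualifier...] Property RepresentationTerm
    with one capitalized word per component.
    """
    words = re.findall(r"[A-Z][a-z0-9]*", name)
    if "".join(words) != name:
        raise ValueError(f"{name!r} is not in Upper Camel Case")
    if len(words) < 3:
        raise ValueError(f"{name!r} needs at least three parts")
    if words[-1] not in REPRESENTATION_TERMS:
        raise ValueError(f"{words[-1]!r} is not a known representation term")
    return {
        "object": words[0],
        "qualifiers": words[1:-2],   # empty when no qualifier is used
        "property": words[-2],
        "representation_term": words[-1],
    }


print(split_element_name("PersonGivenName"))
```

Multi-word components and acronyms cannot be recovered reliably this way, which is exactly the limitation noted above: the structure lives in the naming convention, not in the data.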
Definition
In metadata, a data element definition is a human readable phrase or sentence associated with a data element within a data dictionary that describes the meaning or semantics of a data element.
Data element definitions are critical for external users of any data system. Good definitions can dramatically ease the process of mapping one set of data into another set of data. This is a core feature of distributed computing and intelligent agent development.
There are several guidelines that should be followed when creating high-quality data element definitions.
Properties of clear definitions
A good definition is:
- Precise - The definition should use words that have a precise meaning. Try to avoid words that have multiple meanings or multiple word senses. The definition should use the shortest description. The definition should not use the term you are trying to define in the definition itself. This is known as a circular definition.
- Distinct - The definition should differentiate a data element from other data elements, a process called disambiguation. The definition should also be free of embedded rationale, functional usage, and legal or registration metadata.
Definitions should not refer to terms or concepts that might be misinterpreted by others or that have different meanings based on the context of a situation. Definitions should not contain acronyms that are not clearly defined or linked to other precise definitions.
If one is creating a large number of data elements, all the definitions should be consistent with related concepts.
Critical Data Element – Not all data elements are of equal importance or value to an organization. A key metadata property of an element is whether it is categorized as a Critical Data Element (CDE). This categorization provides focus for data governance and data quality. An organization often has various sub-categories of CDEs, based on how the data is used, for example:
- Security Coverage – Data elements categorized as personal health records or personal health information (PHI) warrant particular attention for security and access.
- Marketing Department Usage – The marketing department could have a particular set of CDEs identified for identifying Unique Customer or for Campaign Management.
- Finance Department Usage – The Finance department could have a different set of CDEs from Marketing. They are focused on data elements which provide measures and metrics for fiscal reporting.
Standards such as the ISO/IEC 11179 Metadata Registry specification give guidelines for creating precise data element definitions, specifically in part 4 of the standard (ISO/IEC 11179-4).
Common words such as play or run often have many meanings. For example, one lexical database documents over 57 distinct meanings for the word "play" but only a single definition for the term "dramatic play". A word or term with fewer definitions in its dictionary entry is preferable, because this minimizes misinterpretation related to a reader's context and background. The process of finding the intended meaning of a word is called word-sense disambiguation.
Examples of definitions that could be improved
Here is the definition of the "person" data element as defined in the www.w3c.org Friend of a Friend specification:
Person: A person.
Although most people do have an intuitive understanding of what a person is, the definition has much room for improvement. The first problem is that the definition is circular. Note that this definition really does not help most readers and needs to be clarified.
Here is the definition of the "Person" data element in the Global Justice XML Data Model 3.0:
person: Describes inherent and frequently associated characteristics of a person.
Note that once again the definition is still circular. Person should not reference itself. The definition should use terms other than person to describe what a person is.
Here is a more precise but shorter definition of a person:
Person: An individual human being.
Note that it uses the word individual to state that this is an instance of a class of things called human being. Technically you might use "homo sapiens" in your definition, but more people are familiar with the term "human being" than "homo sapiens," so commonly used terms, if they are still precise, are always preferred.
Sometimes the definitions in your system embed cultural norms and assumptions. For example, if your "Person" data element tracked characters in a science fiction series that included aliens, you may need a more general term than human being.
Person: An individual of a sentient species.
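The circularity problem in the examples above can be checked mechanically. The following is a minimal sketch, assuming that a definition counts as circular when the defined term appears in it as a whole word or phrase; the function name `is_circular` is hypothetical, and the check does not catch plurals or other inflected forms.

```python
import re


def is_circular(term: str, definition: str) -> bool:
    """Return True if the definition reuses the term being defined."""
    # Whole-word, case-insensitive match of the term inside the definition.
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


candidates = [
    "A person.",
    "Describes inherent and frequently associated characteristics of a person.",
    "An individual human being.",
    "An individual of a sentient species.",
]

for text in candidates:
    label = "circular" if is_circular("Person", text) else "ok"
    print(f"{label:8} {text}")
```

A check like this only flags one of the guideline violations; precision and distinctness still need human review.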
In telecommunications
In telecommunications, the term data element has the following meanings:
- A named unit of data that, in some contexts, is considered indivisible and in other contexts may consist of data items.
- A named identifier of each of the entities and their attributes that are represented in a database.
- A basic unit of information built on standard structures having a unique meaning and distinct units or values.
- In electronic record-keeping, a combination of characters or bytes referring to one separate item of information, such as name, address, or age.
In practice
In practice, data elements (fields, columns, attributes, etc.) are sometimes "overloaded", meaning a given data element will have multiple potential meanings. While a known bad practice, overloading is nevertheless a very real factor or barrier to understanding what a system is doing.
References
- [1] Beynon-Davies P. (2004). Database Systems, 3rd Edition. Palgrave, Basingstoke, UK.
This article incorporates public domain material from Federal Standard 1037C. General Services Administration. Archived from the original on 2022-01-22.
This article incorporates public domain material from Dictionary of Military and Associated Terms. United States Department of Defense.
Bibliography
- ISO/IEC 11179-5:2015 Metadata registries (MDR) - Part 5: Naming and identification principles
- ISO/IEC 11179-4:2004 Metadata registries (MDR) - Part 4
- ISO/IEC Technical Report 20943-1, First edition, 2003-08-01 Information technology — Procedures for achieving metadata registry consistency
- "XML Developer's Guide V1.1 – 1 May 2002". DON XML Working Group. 2002-05-01. Retrieved 24 June 2025.
External links
- Federal XML Developer's Guide
- ISO/IEC 11179 Standards (see ISO/IEC 11179-3:2003 clause 3.3.36)
Data element
Fundamentals
Definition
A data element is an atomic unit of data that is indivisible and carries precise, unambiguous meaning within a specific context.[4] It represents the smallest meaningful component of information that cannot be further subdivided without losing its semantic integrity, ensuring clarity in data processing and interpretation.[4] According to ISO/IEC 11179, a data element combines a data element concept (which captures the semantic meaning) and a value domain (which specifies the allowable values and representation).[6]

In metadata, data models, and information exchange, the data element serves as the foundational building block for constructing larger structures, such as records or messages, enabling consistent representation and interoperability across systems.[7] By providing a standardized unit of meaning, data elements facilitate the organization of complex data hierarchies and support reliable data sharing and analysis. Properties such as identification and representation further characterize these units, though detailed attributes are explored elsewhere.

The concept of the data element traces its historical origins to early database theory in the 1960s, particularly through the efforts of the CODASYL Data Base Task Group (DBTG), which formalized data structures in reports that influenced the development of database management systems (DBMS).[8] These foundational works emphasized atomic data units within network models, evolving over decades into modern data management practices that integrate data elements into relational, NoSQL, and big data architectures for enhanced scalability and semantics.[9]

It is important to distinguish a data element from a related term like data item; according to standards such as those from HHS, the latter often refers to a specific occurrence or instance of the data element, while the data element itself is the definitional atomic unit.[10]

Properties
A data element is characterized by several core properties that ensure its clarity, reusability, and interoperability in information systems. These include a unique identification, typically in the form of a name or identifier, which distinguishes it within a given context or registry.[10] A precise definition is essential, providing a concise, unambiguous statement of the element's meaning without circular references or embedded explanations.[11] Additionally, the data type specifies the nature of the values it can hold, such as string, integer, or date, while the representation term (often a qualifier like "Code," "Amount," or "Identifier") indicates the general category of representation to promote consistency.[11][10]

Optional properties enhance the element's utility and flexibility. Enumerated values may be defined for categorical data, listing permissible options within a value domain to restrict inputs and ensure semantic accuracy.[11] Synonyms or aliases can be included to accommodate alternative names used in different systems or contexts, facilitating mapping and integration.[11] Constraints, such as maximum length, format requirements, or units of measure, further delimit the element's valid representations, preventing errors in data capture and processing.[10]

Guidelines for constructing these properties emphasize precision to avoid ambiguity. Definitions should be context-specific, tailored to the domain without vagueness; for instance, specifying "Age: The number of years since birth" rather than a generic term like "how old someone is."[11] They must remain non-circular, relying on established terms rather than self-referential loops, and unambiguous to support consistent interpretation across users and systems.[10]

An illustrative example is the data element PersonBirthDate, which includes: a unique name "PersonBirthDate"; a definition "The date on which an individual was born"; data type "date"; representation term "Date"; format constraint "YYYY-MM-DD"; and no enumerated values, as it draws from a standard calendar domain.[10] This set ensures the element's atomic nature as an indivisible unit of data.[11]

Standardization
ISO/IEC 11179
ISO/IEC 11179 is an international standard developed by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) that provides a framework for metadata registries (MDRs) to register, manage, and describe data elements, concepts, and classifications in a structured manner. First published in parts during the mid-1990s, with initial editions such as ISO/IEC 11179-4 in 1995, the standard has evolved through multiple revisions, reaching its latest editions in 2023 and 2024 across its multi-part structure. It consists of several parts, including the overall framework (Part 1), the registry metamodel (Part 3), naming and identification principles (Part 5), and registration procedures (Part 6), enabling organizations to ensure semantic consistency and interoperability of data across systems.[12][13]

The standard defines key components essential for describing data elements within an MDR. A data element concept represents the abstract meaning or semantic content of a data item, independent of its specific format, such as "Person Birth Date" denoting the date of birth without specifying how it is stored. A data element is a specific instantiation of a data element concept, including its representation (e.g., data type, length), such as "PersonBirthDate" formatted as YYYY-MM-DD. Value domains specify the permissible values or ranges for data elements, either as enumerated value domains (e.g., a list of countries) or described value domains (e.g., numeric ranges with precision), ensuring controlled and consistent usage. These components collectively support the classification and governance of metadata to facilitate data sharing and reuse.

The registration process in ISO/IEC 11179 outlines a formal procedure for submitting, evaluating, and maintaining entries in an MDR to maintain quality and authority. Submission involves providing detailed metadata for a proposed data element, including its concept, representation, and value domain, along with supporting documentation for review by a registration authority. The review process assesses compliance with standard criteria, such as semantic clarity and uniqueness, potentially involving iterations for refinement before approval or rejection. Once registered, data elements undergo stewardship, including versioning to track changes (e.g., updates to value domains) and periodic reviews for obsolescence, ensuring ongoing relevance and traceability. This process promotes accountability through designated stewards responsible for maintenance.[14]

Naming conventions under ISO/IEC 11179 emphasize clarity, consistency, and semantic precision to avoid ambiguity in data element identifiers. Names should employ Upper Camel Case, where each word starts with an uppercase letter and subsequent letters are lowercase, such as "PersonGivenName" for a first name field. A representation term from a controlled list (e.g., "Identifier," "Name," "Date") must conclude the name to indicate the data's form or qualifier, drawn from standardized glossaries to ensure uniformity. Abbreviations are discouraged to prevent misinterpretation, favoring full terms unless explicitly defined in the registry, thereby enhancing readability and machine-processability across diverse systems.

As of 2025, recent updates to ISO/IEC 11179, including extensions in Parts 31, 33, and 34 published in 2023 and 2024, have enhanced support for semantic web technologies such as Resource Description Framework (RDF) to improve interoperability with linked data environments. These revisions introduce metamodels for data provenance and conceptual mappings that align with RDF schemas, enabling MDRs to export metadata as triples for integration with ontologies and knowledge graphs, thus bridging traditional data element management with modern semantic ecosystems. Additionally, in 2025, ISO/IEC TR 19583-21 and TR 19583-24 were published, offering SQL instantiation and RDF schema mappings for the ISO/IEC 11179 metamodel to support integration with relational databases and linked data environments.[15][16][17]

Related Standards
Several standards and frameworks extend the foundational concepts of data elements outlined in ISO/IEC 11179, focusing on domain-specific interoperability and reusable components for information exchange.

The ebXML Core Components Technical Specification, developed in the 2000s under ISO/TS 15000-5, defines core components as reusable building blocks for business document exchange, where data elements represent atomic pieces of business information structured within XML schemas to ensure semantic consistency across electronic transactions. This approach promotes the reuse of data elements in supply chain and e-commerce contexts by specifying aggregate and basic components that encapsulate business semantics.[18]

In the United States, the Global Justice XML Data Model (GJXDM), initiated in the early 2000s, provides an object-oriented framework for justice and public safety information sharing, organizing data elements into a dictionary and XML schema to standardize exchanges among law enforcement and judicial entities.[19] Building on GJXDM, the National Information Exchange Model (NIEM), launched in 2005, expands this to broader government domains by defining a core set of reusable data elements (such as those for persons, activities, and locations) that support XML-based information exchanges while allowing domain-specific extensions.[20] NIEM's data model emphasizes governance processes to maintain element definitions, facilitating interoperability across federal, state, and local agencies.[21]

For metadata applications, the Dublin Core Metadata Element Set, version 1.1, offers a simple vocabulary of 15 properties, including dc:title for resource naming, designed as basic data elements for describing digital resources in a cross-domain, interoperable manner.[22] These elements prioritize simplicity and extensibility, enabling lightweight resource discovery without complex hierarchies.[23]

ISO/IEC 19773:2011 further supports data element reuse by extracting modular components from ISO/IEC 11179, including data element concepts for integration into open technical dictionaries, which serve as shared repositories for standardized terminology in engineering and technical applications.[24] These modules define value spaces and datatypes to ensure consistency in multilingual and multi-domain environments.[25]

By 2025, data element standards have increasingly aligned with web technologies, such as W3C-endorsed schema.org, which provides structured data vocabularies (including types like WebPage and properties for entities) to mark up web content as reusable data elements for enhanced search and interoperability.[26]

Usage in Information Systems
Databases and Data Models
In relational databases, data elements serve as columns within tables, defining the structure and type of information stored for each attribute of an entity. For instance, a column representing a customer's name might use the VARCHAR data type in SQL to accommodate variable-length strings, ensuring efficient storage and querying of textual data.[27] This organization into rows and columns allows for systematic representation of relationships between data, where each row (or tuple) corresponds to a complete record. Normalization techniques, such as those outlined in Edgar F. Codd's relational model, are applied to these data elements to minimize redundancy and dependency issues, organizing tables to eliminate duplicate information across columns.[28][29]

Within conceptual data models, particularly entity-relationship (ER) diagrams, data elements are represented as attributes attached to entities, capturing specific properties that describe real-world objects. An attribute like CustomerID functions as a unique identifier (primary key) linked to the Customer entity, enabling the modeling of one-to-many or many-to-many relationships between entities without data duplication. These attributes can be simple, such as a single-valued field for an employee's ID, or composite, combining multiple data elements like address components (street, city, zip code). This approach ensures that data elements maintain referential integrity and support the translation of ER models into physical database schemas.[30][31]

The role of data elements has evolved across database paradigms, originating from hierarchical models in the 1960s and 1970s, where data was structured in tree-like parent-child relationships, to the relational model introduced by Codd in 1970, which emphasized tabular independence and query flexibility. In modern NoSQL databases like MongoDB, data elements appear as key-value pairs within flexible document structures, allowing nested or varying attributes in BSON format without rigid schemas; for example, a user document might include a "preferences" key with sub-elements like language and theme. This shift accommodates diverse data types and scalability needs, contrasting with the fixed columns of relational systems.[32][33][34]

Interoperability between database schemas relies on mapping data elements to align disparate structures, often facilitated by Extract, Transform, Load (ETL) processes that extract data from source systems, transform elements (e.g., converting date formats or aggregating values), and load them into target databases. Tools in ETL pipelines define mappings to ensure semantic consistency, such as linking a "client_name" field from one schema to "customer_fullname" in another, preventing data silos in integrated environments.[35][36]

Markup Languages and XML
In markup languages, data elements serve as the fundamental building blocks for structuring and exchanging information in a human- and machine-readable format. In XML, data elements are represented as tagged components enclosed by start and end tags, such as <GivenName>John</GivenName>, which encapsulate specific pieces of data while allowing for hierarchical organization and extensibility.[37] This structure enables the definition of custom tags to represent domain-specific data, ensuring that documents can be parsed and validated consistently across systems.[37]
To enforce consistency and interoperability, XML data elements are typically defined and validated using XML Schema Definition (XSD), a W3C recommendation that specifies constraints on element types, cardinality, and content models. For instance, an XSD can declare an element like <GivenName> with a string type and length restrictions, allowing tools to verify document compliance before processing.[38] Namespaces in XML further support reusability by qualifying element names to avoid conflicts, as seen in schemas where global elements—reusable across multiple documents—are prefixed with unique URIs. In ebXML, an OASIS and UN/CEFACT standard for electronic business, global elements exemplify this by defining reusable core components, such as <ID> or <Amount>, with attributes for data types and business semantics to facilitate standardized B2B exchanges.[18]
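How a namespace qualifies an element name can be seen by parsing a small document. The sketch below uses Python's standard `xml.etree.ElementTree`; the namespace URI `http://example.com/person`, the prefix `p`, the sample document, and the `MAX_LENGTH` limit are all made up for illustration. The standard library has no XSD validator, so the length check at the end is a hand-written stand-in for a schema restriction, not real XSD validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the namespace URI is a placeholder, not a real schema.
DOC = """
<p:Person xmlns:p="http://example.com/person">
  <p:GivenName>John</p:GivenName>
</p:Person>
""".strip()

# Prefix-to-URI map used only for lookups; the prefix itself is arbitrary.
NS = {"p": "http://example.com/person"}

root = ET.fromstring(DOC)
given = root.find("p:GivenName", NS)

# ElementTree stores the qualified name as {namespace-uri}local-name.
print(given.tag)   # {http://example.com/person}GivenName
print(given.text)  # John

# Stand-in for a schema length restriction (assumed limit of 35 characters).
MAX_LENGTH = 35
is_valid = given.text is not None and 0 < len(given.text) <= MAX_LENGTH
print(is_valid)    # True
```

Because the element is identified by the URI rather than the prefix, two vocabularies can both define a `GivenName` element without colliding.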
These XML-based data elements find practical application in web services and configuration files, promoting portable data interchange. In SOAP, a protocol for XML messaging in web services, data elements form the payload within the <Body> envelope, enabling structured requests and responses over HTTP for operations like remote procedure calls. Similarly, RESTful APIs can leverage XML payloads where data elements represent resources, though JSON has become more prevalent; in both cases, schemas ensure data integrity during transmission. Configuration files, such as those in enterprise software, use XML data elements to define parameters, like <server><host>example.com</host></server>, allowing modular and version-controlled settings that are easily parsed by applications. Naming conventions for these elements often draw from ISO/IEC 11179 to promote clarity and semantic consistency.
Post-2010 developments have extended these concepts beyond pure XML, with JSON-LD emerging as a W3C recommendation for linked data serialization. In JSON-LD, properties function as data elements annotated with semantic contexts, such as {"@context": {"givenName": "http://schema.org/givenName"}, "givenName": "John"}, enabling JSON documents to link to ontologies like Schema.org for enhanced discoverability and interoperability in web-scale data exchange.[39] This approach bridges traditional markup with semantic web technologies, treating properties as reusable, context-aware data elements without requiring full XML adoption.
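The role of `@context` can be illustrated by resolving each property name to the IRI it is mapped to. This is a deliberately simplified sketch using only the standard `json` module: the helper `expand_terms` is hypothetical and handles only flat string-to-IRI mappings, whereas the actual JSON-LD expansion algorithm covers much more (nested contexts, value objects, type coercion) and produces a different output shape.

```python
import json

# The example document from the paragraph above.
raw = '{"@context": {"givenName": "http://schema.org/givenName"}, "givenName": "John"}'
doc = json.loads(raw)


def expand_terms(document: dict) -> dict:
    """Replace each property name with the IRI its @context maps it to.

    Simplification: assumes @context is a flat dict of term -> IRI strings;
    terms without a mapping are kept unchanged.
    """
    context = document.get("@context", {})
    return {
        context.get(key, key): value
        for key, value in document.items()
        if key != "@context"
    }


expanded = expand_terms(doc)
print(expanded)  # {'http://schema.org/givenName': 'John'}
```

After this step, the property is identified by a globally unique IRI rather than a local key, which is what lets independent documents agree on what "givenName" means.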
