Identifier
An identifier is a name that identifies (that is, labels the identity of) either a unique object or a unique class of objects, where the "object" or class may be an idea, person, physical countable object (or class thereof), or physical noncountable substance (or class thereof). The abbreviation ID often refers to identity, identification (the process of identifying), or an identifier (that is, an instance of identification). An identifier may be a word, number, letter, symbol, or any combination of those.
The words, numbers, letters, or symbols may follow an encoding system (wherein letters, digits, words, or symbols stand for [represent] ideas or longer names) or they may simply be arbitrary. When an identifier follows an encoding system, it is often referred to as a code or ID code. For instance, the ISO/IEC 11179 metadata registry standard defines a code as a system of valid symbols that substitute for longer values, in contrast to identifiers without symbolic meaning. Identifiers that do not follow any encoding scheme are often said to be arbitrary IDs; they are arbitrarily assigned and have no greater meaning. (Sometimes identifiers are called "codes" even when they are actually arbitrary, whether because the speaker believes that they have deeper meaning or simply because they are speaking casually and imprecisely.)
The unique identifier (UID) is an identifier that refers to only one instance—only one particular object in the universe. A part number is an identifier, but it is not a unique identifier—for that, a serial number is needed, to identify each instance of the part design. Thus the identifier "Model T" identifies the class (model) of automobiles that Ford's Model T comprises; whereas the unique identifier "Model T Serial Number 159,862" identifies one specific member of that class—that is, one particular Model T car, owned by one specific person.
The concepts of name and identifier are denotatively equal, and the terms are thus denotatively synonymous; but they are not always connotatively synonymous, because code names and ID numbers are often connotatively distinguished from names in the sense of traditional natural language naming. For example, both "Jamie Zawinski" and "Netscape employee number 20" are identifiers for the same specific human being; but normal English-language connotation may consider "Jamie Zawinski" a "name" and not an "identifier", whereas it considers "Netscape employee number 20" an "identifier" but not a "name." This is an emic indistinction rather than an etic one.
Metadata
In metadata, an identifier is a language-independent label, sign or token that uniquely identifies an object within an identification scheme. The suffix "identifier" is also used as a representation term when naming a data element.
ID codes may inherently carry metadata along with them. For example, when you know that the food package in front of you has the identifier "2011-09-25T15:42Z-MFR5-P02-243-45", you not only have that data, you also have the metadata that tells you that it was packaged on September 25, 2011, at 3:42pm UTC, manufactured by Licensed Vendor Number 5, at the Peoria, IL, USA plant, in Building 2, and was the 243rd package off the line in that shift, and was inspected by Inspector Number 45.
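The way such an ID code carries metadata can be sketched with a small parser. This is only an illustration: the field layout (timestamp, `MFR` vendor number, plant letter plus building number, sequence number, inspector number) is inferred from the single example above, and the names `ID_PATTERN` and `parse_package_id` are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred from the one example in the text:
# <UTC timestamp>Z-MFR<vendor>-<plant letter><building>-<sequence>-<inspector>
ID_PATTERN = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2})Z"
    r"-MFR(?P<vendor>\d+)"
    r"-(?P<plant>[A-Z])(?P<building>\d+)"
    r"-(?P<sequence>\d+)"
    r"-(?P<inspector>\d+)$"
)


def parse_package_id(code: str) -> dict:
    """Split a structured package ID into the metadata fields it encodes."""
    match = ID_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a recognised package ID: {code!r}")
    packaged_at = datetime.strptime(match["stamp"], "%Y-%m-%dT%H:%M")
    return {
        "packaged_at": packaged_at.replace(tzinfo=timezone.utc),
        "vendor": int(match["vendor"]),
        "plant": match["plant"],          # a lookup table would map "P" to a plant name
        "building": int(match["building"]),
        "sequence": int(match["sequence"]),
        "inspector": int(match["inspector"]),
    }


info = parse_package_id("2011-09-25T15:42Z-MFR5-P02-243-45")
```

An arbitrary identifier such as 100054678214 would simply fail to match the pattern, which is the point: it carries identity but no decodable metadata.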
Arbitrary identifiers might lack metadata. For example, if a food package just says 100054678214, its ID may not tell anything except identity: no date, manufacturer name, production sequence rank, or inspector number. In some cases, arbitrary identifiers such as sequential serial numbers still leak information (e.g., the German tank problem, in which the total number of items produced is estimated from a sample of observed serial numbers). Opaque identifiers (identifiers designed to avoid leaking even that small amount of information) include "really opaque pointers" and Version 4 UUIDs.
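The leak from sequential serial numbers can be made concrete with the standard frequentist estimator for the German tank problem, N ≈ m + m/k − 1, where m is the largest serial observed and k is the sample size. The sketch below assumes serials start at 1 and are sampled without replacement; the function name and the sample values are made up for illustration. It then contrasts this with a Version 4 UUID from Python's standard `uuid` module.

```python
import uuid


def estimate_total_from_serials(serials: list[int]) -> float:
    """Estimate how many items exist from a sample of sequential serial numbers."""
    if not serials:
        raise ValueError("need at least one observed serial number")
    k = len(serials)   # sample size
    m = max(serials)   # largest serial seen
    return m + m / k - 1


# Hypothetical sample of four captured serial numbers.
observed = [19, 40, 42, 60]
estimate = estimate_total_from_serials(observed)  # 60 + 60/4 - 1

# A version 4 UUID is drawn at random, so it reveals nothing about
# how many identifiers were issued before it.
opaque_id = uuid.uuid4()
```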
In computer science
In computer science, identifiers (IDs) are lexical tokens that name entities. Identifiers are used extensively in virtually all information processing systems. Identifying entities makes it possible to refer to them, which is essential for any kind of symbolic processing.
In computer languages
In computer languages, identifiers are tokens (also called symbols) which name language entities. Some of the kinds of entities an identifier might denote include variables, types, labels, subroutines, and packages.
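As a minimal sketch of what counts as an identifier token in one particular language, Python exposes its own rules through `str.isidentifier()` and the `keyword` module. The helper name `is_usable_identifier` and the candidate strings are illustrative only; other languages apply different rules.

```python
import keyword


def is_usable_identifier(name: str) -> bool:
    """True if `name` is lexically an identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


candidates = ["total", "_private", "2fast", "my-var", "class", "naïve"]
results = {name: is_usable_identifier(name) for name in candidates}
```

Here `2fast` fails because it starts with a digit, `my-var` because a hyphen is not an identifier character, and `class` because it is reserved, even though it has the shape of an identifier.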
Ambiguity
Identifiers (IDs) versus Unique identifiers (UIDs)
A resource may carry multiple identifiers. Typical examples are:
- One person with multiple names, nicknames, and forms of address (titles, salutations)
- For example: One specific person may be identified by all of the following identifiers: Jane Smith; Jane Elizabeth Meredith Smith; Jane E. M. Smith; Jane E. Smith; Janie Smith; Janie; Little Janie (as opposed to her mother or sister or cousin, Big Janie); Aunt Jane; Auntie Janie; Mom; Grandmom; Nana; Kelly's mother; Billy's grandmother; Ms. Smith; Dr. Smith; Jane E. Smith, PhD; and Fuzzy (her jocular nickname at work).
- One document with multiple versions[1]
- One substance with multiple names (for example, CAS index names versus IUPAC names;[2] INN generic drug names versus USAN generic drug names versus brand names)
The inverse is also possible, where multiple resources are represented with the same identifier (discussed below).
Implicit context and namespace conflicts
Many codes and nomenclatural systems originate within a small namespace. Over the years, some of them bleed into larger namespaces (as people interact in ways they formerly had not, e.g., cross-border trade, scientific collaboration, military alliance, and general cultural interconnection or assimilation). When such dissemination happens, the limitations of the original naming convention, which had formerly been latent and moot, become painfully apparent, often necessitating retronymy, synonymity, translation/transcoding, and so on. Such limitations generally accompany the shift away from the original context to the broader one. Typically the system shows implicit context (context was formerly assumed, and narrow), lack of capacity (e.g., low number of possible IDs, reflecting the outmoded narrow context), lack of extensibility (no features defined and reserved against future needs), and lack of specificity and disambiguating capability (related to the context shift, where longstanding uniqueness encounters novel nonuniqueness). Within computer science, this problem is called naming collision. The story of the origination and expansion of the CODEN system provides a good case example in a recent-decades, technical-nomenclature context. The capitalization variations seen with specific designators reveal an instance of this problem occurring in natural languages, where the proper noun/common noun distinction (and its complications) must be dealt with. A universe in which every object had a UID would not need any namespaces, which is to say that it would constitute one gigantic namespace; but human minds could never keep track of, or semantically interrelate, so many UIDs.
Identifiers in various disciplines
| Identifier | Scope |
|---|---|
| atomic number, corresponding one-to-one with element name | international (via ISV) |
| Australian Business Number | Australian |
| CAGE code | U.S. and NATO |
| CAS registry number | originated in U.S.; today international (via ISV) |
| CODEN | originated in U.S.; today international |
| Digital object identifier (DOI, doi) | Handle System Namespace, international scope |
| DIN standard number | originated in Germany; today international |
| E number | originated in E.U.; may be seen internationally |
| EC number | |
| Employer Identification Number (EIN) | U.S. |
| Electronic Identifier Serial Publication (EISP) | international |
| Global Trade Item Number | international |
| Group identifier | many scopes, e.g., specific computer systems |
| International Chemical Identifier | international |
| International Standard Book Number (ISBN) | ISBN is part of the EAN Namespace; international scope |
| International eBook Identifier Number (IEIN) | international |
| International Standard Serial Number (ISSN) | international |
| ISO standard number, e.g., ISO 8601 | international |
| Library of Congress Control Number | U.S., with some international bibliographic usefulness |
| Personal identification number (Denmark) | Denmark |
| Pharmaceutical code | Many different systems |
| Product batch number | |
| Serial Item and Contribution Identifier | U.S., with some international bibliographic usefulness |
| Serial number | many scopes, e.g., company-specific, government-specific |
| Service batch number | |
| Social Security Number | U.S. |
| Tax file number | Australian |
| Unique Article Identifier (UAI) | international |
See also
- Barcode
- Binomial nomenclature
- British Approved Name
- Data descriptor
- Data element
- Descriptor
- Diagnosis code
- Document management system
- File descriptor
- Food labeling regulations
- Gene nomenclature
- Handle (computing)
- Identification
- Identity (object-oriented programming)
- Identity document
- Index term
- Marketing part number
- Metadata
- Name binding
- Namespace
- Naming convention (programming)
- National identification number
- Nomenclature – contains various standardized naming systems
- Nomenclature code
- Overloading
- Part number
- Personally identifiable information
- Product code
- Reference (computer science)
- Referent
- Representation term
- Systematized Nomenclature of Medicine
- Uniform resource identifier (URI)
- Unique identifier
- Unique key
References
- [1] University of Glasgow. "Procedure for Applying Identifiers to Documents". Archived from the original on 5 June 2011. Retrieved 28 April 2009.
- [2] University of Pennsylvania. "Information on Chemical Nomenclature". Archived from the original on 4 January 2009. Retrieved 28 April 2009.
External links
The dictionary definition of identifier at Wiktionary
Media related to Identifiers at Wikimedia Commons
Core Concepts
Definition and Purpose
An identifier is a name, symbol, or code that refers to a specific object, entity, or concept, enabling its distinction from others within a given system.[10] In information systems, it typically takes the form of a unique alphanumeric string, numeric value, or URL that associates with the entity in a particular context, serving as a label for identity or classification.[7] This foundational role allows identifiers to function across diverse domains, from physical artifacts to abstract ideas, by providing a consistent point of reference.
The historical origins of identifiers trace back to early cataloging systems in the 19th century, which aimed to organize growing collections of knowledge systematically. A key precursor to modern identifiers is the Dewey Decimal Classification (DDC) system, developed by Melvil Dewey in 1876 as a hierarchical method for classifying books in libraries using numeric codes based on subject matter.[11][12] These early systems evolved from manual indexing practices in archives and libraries, laying the groundwork for structured naming that could scale with information volume, influencing later developments in metadata and digital organization.[13]
The primary purposes of identifiers include facilitating reference, retrieval, and disambiguation in information systems, ensuring that entities can be located and differentiated efficiently. In everyday language, identifiers manifest as simple naming conventions, such as personal names or common nouns, which provide informal reference within social contexts.[14] In formal systems, they enable precise retrieval by linking to metadata records, enhancing search precision and recall, while disambiguating similar entities, such as homonyms, to avoid confusion in large datasets.[15][16]
Key characteristics of identifiers include their design to be human-readable for intuitive use, machine-processable for automated handling, persistent to maintain stability over time where required, and context-dependent to operate effectively within specific scopes. Human-readability often involves alphanumeric formats that convey meaning, while machine-processability relies on standardized structures like strings or codes for computational efficiency.[17] Persistence ensures long-term resolvability, particularly for digital objects, preventing obsolescence in evolving systems.[8] Context-dependency means an identifier's uniqueness and applicability are bounded by its defined namespace or environment, adapting to the needs of the system it serves.[18][19]
Types and Characteristics
Identifiers are classified primarily by their scope of uniqueness, distinguishing between local and global types. Local identifiers are unique only within a defined context or scope, such as a specific document, process, or subsystem, allowing reuse across different contexts without collision. For example, a label like "item1" might identify an element within one report but could be reused in another without ambiguity. In contrast, global identifiers ensure uniqueness across broader or entire systems, facilitating interoperability and tracking on a large scale; the International Standard Book Number (ISBN), a 13-digit code assigned to books, exemplifies this by uniquely identifying publications worldwide regardless of publisher or region.[8][20][21]
Structurally, identifiers vary in composition to suit different needs for representation and processing. Alphanumeric identifiers combine letters and numbers, such as "user123," offering flexibility for human-readable yet compact forms in user accounts or product codes. Numeric identifiers use solely digits, like the integer 42, which are efficient for computational storage and comparison but less descriptive. Symbolic identifiers, such as Universally Unique Identifiers (UUIDs), are 128-bit values usually written as hexadecimal strings (e.g., "123e4567-e89b-12d3-a456-426614174000"), giving opaque, collision-resistant labels that can be generated without relying on a central authority. Composite identifiers build hierarchically from multiple components, as seen in domain names like "example.com," where each label nests beneath a top-level domain to organize namespaces.[22]
Essential properties of identifiers influence their effectiveness in identification tasks. Readability refers to how easily humans can interpret and use the identifier, favoring meaningful or pronounceable forms over random strings to reduce errors in manual entry. Brevity keeps identifiers short to minimize transcription mistakes and storage overhead, with optimal lengths balancing uniqueness against usability, typically 8-20 characters for many applications. Consistency involves standardized formats and conventions across uses, enabling predictable parsing and validation. Mutability addresses whether the identifier can change over time; while some local identifiers may be mutable for flexibility, global ones are generally immutable to maintain persistence and referential integrity.[8]
The evolution of identifiers reflects advancing needs for organization and automation. In ancient record-keeping, such as the Inca khipu system of knotted strings from the 15th century, simple symbolic labels encoded administrative data like inventories through knot positions and colors, serving as early non-written identifiers. This progressed to printed labels in the 19th century with lithography, but a major leap occurred in the mid-20th century with standardized machine-readable formats; barcodes, patented in 1952 and first scanned commercially in 1974, introduced linear patterns like the Universal Product Code (UPC) for rapid, error-free identification in retail. Different structural types can contribute to namespace conflicts when scopes overlap, as explored in later sections.[23][24]
Computing Applications
In Programming Languages
In programming languages, identifiers serve as names for entities such as variables, functions, and classes, adhering to specific syntax rules to ensure parseability and consistency. Typically, an identifier begins with a letter or underscore (classified as an ID_Start character per Unicode standards), followed by zero or more alphanumeric characters, underscores, or other ID_Continue characters like combining marks, but excluding reserved keywords and spaces.[25] For instance, in Python, identifiers must start with a letter (a-z, A-Z, or Unicode equivalents) or underscore, followed by letters, digits (0-9), or underscores, with no length limit, but cannot match reserved keywords such as "if" or "class".[26] Similarly, in C, identifiers start with a letter or underscore, followed by letters, digits, or underscores; older standards required implementations to treat at least the first 31 characters as significant for internal identifiers and the first 6 for external ones, though modern compilers often support longer names. In Java, identifiers follow a comparable pattern, starting with a Unicode letter, $, or _, followed by letters or digits, with no length restriction and case sensitivity distinguishing names like "myVar" from "MyVar".[27]
Scoping mechanisms determine the visibility and lifetime of identifiers, primarily through lexical (static) scoping in most modern languages, where scope is resolved based on the code's textual structure rather than the runtime call stack. Local identifiers, such as those declared within a function or block, are accessible only within that enclosing scope; for example, in Java, variables declared in a method or block have block-level scope, ceasing to exist after the block ends, promoting encapsulation and preventing unintended side effects.[28] Global identifiers, conversely, are visible across a broader context, like module-wide in Python, where they reside in the module's namespace; they can be read from any function in the module, but rebinding them inside a function requires the "global" keyword. Python employs lexical scoping to resolve names by searching the local scope, then enclosing functions, then the global module, and finally built-ins.[29] This lexical approach, exemplified in both languages, ensures predictable name resolution, as the scope of an identifier like a nested function's variable is determined by its position in the source code.[29]
Identifiers play a crucial role in structuring code by naming variables, functions, and classes, directly influencing readability and maintainability through conventions that enhance clarity. Case sensitivity is standard in languages like Python, C, and Java, allowing distinct names such as "userName" and "username", which supports expressive naming but requires careful attention to avoid errors.[26][27] Common conventions include camelCase (e.g., "myVariable" in Java for variables) and snake_case (e.g., "my_variable" in Python), which separate words to improve human readability without compromising machine parsing, as these styles align with language-specific guidelines to foster consistent, self-documenting code.[30][28]
Historically, identifier rules evolved from hardware constraints to greater flexibility, reflecting advancements in compiler technology and usability. The original FORTRAN I, released in 1957, limited identifiers to six alphanumeric characters starting with a letter, a constraint tied to the IBM 704's 36-bit word, which held six 6-bit characters and so simplified symbol table management in early compilers.[31] Subsequent languages like C retained partial echoes of this with initial significant character limits (e.g., 6 for external identifiers pre-C99), but modern ones such as JavaScript impose no length restrictions, allowing arbitrary-length identifiers starting with letters, underscores, or dollar signs to support more descriptive naming and Unicode integration. This progression from Fortran's rigid six-character cap to flexible rules in contemporary languages underscores a shift toward prioritizing developer productivity and code expressiveness.[25]
In Databases and Systems
In relational databases, identifiers play a central role in maintaining data integrity and enabling relationships between tables. A primary key is a column or set of columns that uniquely identifies each row in a table, enforcing entity integrity by ensuring no duplicate or null values exist in that column.[32] For example, an auto-incrementing integer column, such as id INT AUTO_INCREMENT PRIMARY KEY in SQL (MySQL syntax), automatically generates sequential unique values for new rows.[33] A foreign key, conversely, is a column or set of columns in one table that references the primary key in another table, establishing referential integrity to prevent orphaned records and ensure valid relationships.[32] For instance, a customer_id foreign key in an orders table links to the primary key of a customers table.[33]
These concepts were formalized in the ANSI SQL standards starting with SQL-89 in 1989, which introduced primary key constraints for unique row identification, and SQL-92, which added foreign keys and referential constraints to enforce data integrity across tables.[34]
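The primary-key and foreign-key behaviour described above can be sketched with Python's built-in `sqlite3` module. The table and column names follow the customers/orders example in the text; the helper `try_insert` and the sample rows are assumptions made for illustration. Note that SQLite uses `INTEGER PRIMARY KEY` rather than `AUTO_INCREMENT`, and enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " item TEXT NOT NULL)"
)

# The primary key is generated automatically for the new row.
cur = conn.execute("INSERT INTO customers (name) VALUES (?)", ("Ada",))
customer_id = cur.lastrowid
conn.execute(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)", (customer_id, "widget")
)


def try_insert(sql: str, params: tuple) -> bool:
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Entity integrity: a second row with the same primary key is rejected.
duplicate_ok = try_insert(
    "INSERT INTO customers (id, name) VALUES (?, ?)", (customer_id, "Bob")
)
# Referential integrity: an order for a non-existent customer is rejected.
orphan_ok = try_insert(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)", (9999, "gadget")
)
```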
At the system level, identifiers facilitate resource management in operating systems and applications. In Unix-like systems, a process ID (PID) is a unique integer assigned sequentially to each running process, serving as its identifier for scheduling, monitoring, and termination.[35] File handles act as opaque integer references provided by the operating system to open files, allowing processes to read, write, or manipulate them without exposing underlying storage details.[36] Session tokens, often implemented as unique strings or IDs, maintain state for user interactions in web or distributed systems, binding requests to authenticated sessions without requiring constant database lookups.[37]
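These three kinds of system-level identifier can be observed directly from Python's standard library. The sketch below is illustrative: the file name `example.txt` and the token length are arbitrary choices, and a real web framework would manage session tokens itself.

```python
import os
import secrets
import tempfile

# Process ID: an integer the operating system assigned to this process.
pid = os.getpid()

# File descriptor: a small integer handle returned by the OS for an open file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    fd = os.open(path, os.O_CREAT | os.O_WRONLY)
    try:
        written = os.write(fd, b"hello")  # the handle, not the path, names the file
    finally:
        os.close(fd)

# Session token: an unguessable random string used to tie requests to a session.
session_token = secrets.token_urlsafe(32)
```

The descriptor is meaningful only inside the process that opened it and only until it is closed, which makes it a local, short-lived identifier; the session token is deliberately opaque so that it reveals nothing about the session it names.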
Identifiers are essential in querying and indexing for efficient data retrieval. In SQL, they appear in statements like SELECT * FROM users WHERE id = 5, where the id primary key filters rows rapidly.[38] Primary keys automatically create clustered indexes in many systems, organizing data physically for faster lookups and joins, while foreign keys benefit from non-clustered indexes to optimize relationship queries.[32] This indexing role underscores the surrogate versus natural keys debate: natural keys derive from business data (e.g., email addresses), but surrogate keys like UUIDs—128-bit globally unique identifiers—are preferred in distributed systems to avoid central coordination and collision risks during data replication across nodes.[39] For example, UUIDs generated via functions like gen_random_uuid() ensure scalability in multi-node environments without sequential ID conflicts.[39]
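A surrogate UUID key can be sketched as follows, again using `sqlite3` for self-containment. The `users` table, the `add_user` helper, and the email addresses are hypothetical. SQLite has no `gen_random_uuid()` function, so the identifier is generated on the client with `uuid.uuid4()`, which is itself the coordination-free property the paragraph describes.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# The email is a natural key candidate; the UUID is the surrogate primary key.
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT NOT NULL UNIQUE)")


def add_user(email: str) -> str:
    """Insert a user under a freshly generated UUID and return that UUID."""
    user_id = str(uuid.uuid4())  # no central counter or database round trip needed
    conn.execute("INSERT INTO users (id, email) VALUES (?, ?)", (user_id, email))
    return user_id


alice_id = add_user("alice@example.com")
bob_id = add_user("bob@example.com")

# Lookup by primary key, analogous to SELECT * FROM users WHERE id = 5.
row = conn.execute("SELECT email FROM users WHERE id = ?", (alice_id,)).fetchone()
```

Because the key does not depend on the email, a user can change address without any foreign keys elsewhere needing to be rewritten.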
Distinctions and Challenges
IDs versus UIDs
In computing, an identifier (ID) serves as a descriptive label for an entity, which may or may not be unique within its context, such as a name like "John" assigned to multiple individuals in a contact list.[40] In contrast, a unique identifier (UID) is a numeric or alphanumeric string guaranteed to distinguish a single entity across a defined domain, exemplified by a Social Security Number that uniquely identifies an individual within the U.S. system.[41]
The primary differences between IDs and UIDs lie in their scope of uniqueness, generation methods, and associated collision risks. IDs often operate within a local scope, ensuring uniqueness only in limited contexts like a specific list or block, whereas UIDs aim for global or domain-wide uniqueness, potentially across very large or distributed systems.[8] Generation for IDs typically involves simple sequential methods, such as auto-incrementing integers, while UIDs employ more robust techniques like UUIDs, which combine timestamps, random values, or hashing to minimize predictability.[42] Collision risks are higher for IDs due to their potential reusability or duplication in shared spaces, but UIDs are designed with probabilistic or deterministic guarantees to avoid overlaps, though collisions remain possible in principle at very large scales.[43]
Practical examples illustrate these distinctions: in spreadsheets, row numbers function as local IDs that are unique within a single sheet but repeat across sheets and workbooks, allowing easy local referencing without global enforcement.[40] Conversely, MAC addresses serve as UIDs, providing 48-bit hardware-based uniqueness for network interfaces worldwide, assigned by manufacturers under IEEE standards to prevent conflicts in Ethernet communications.[41]
While UIDs effectively prevent duplicates in large-scale or distributed environments, they introduce trade-offs such as increased complexity in implementation and higher storage overhead; for instance, a 128-bit UUID requires more space than a 32- or 64-bit integer ID, potentially impacting database index efficiency and query performance.[44] Non-unique IDs, by avoiding such overhead, simplify local operations but can contribute to namespace issues when scaled.[8]
Namespace Conflicts and Resolution
Namespace conflicts arise when identifiers with the same name exist in overlapping or shared contexts, leading to ambiguities in resolution. Implicit conflicts often occur when the same identifier is defined in different modules or scopes that are later combined, such as a variable named x declared both locally and globally in C++, where the local declaration shadows the global one unless the global is explicitly qualified.[45] Explicit conflicts emerge from overlaps in distributed environments, like domain name collisions where an internal private namespace (e.g., .internal) inadvertently resolves to a public top-level domain after its delegation, potentially exposing sensitive systems.[46]
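The text describes shadowing in C++; the same effect can be sketched in Python, which also resolves names lexically. The function and variable names below are made up for the illustration.

```python
x = "global"


def shadowing() -> str:
    x = "local"  # a new local binding; the module-level x is untouched
    return x


def outer() -> tuple:
    x = "enclosing"

    def inner() -> str:
        x = "inner"  # shadows the x of the enclosing function
        return x

    # inner() sees its own x; outer's x is unchanged after the call
    return inner(), x


local_view = shadowing()
nested_view = outer()
module_view = x  # still the original module-level value
```

Each assignment creates a binding in the innermost scope only, so three different entities share the name `x` without any of them being overwritten.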
Detection of these conflicts varies by system type and phase. In compiled languages like C++ and C#, compile-time checks identify ambiguities, producing errors such as "conflicting declaration" when identical identifiers appear in the same scope.[45][47] In dynamic languages like Python, conflicts in the module ecosystem—such as one module overwriting another's namespace—are often detected at installation or runtime through tools like ModuleGuard, which simulates environments to reveal issues like module-to-third-party-library overlaps affecting over 21% of PyPI packages. In distributed systems, runtime resolution relies on scoping mechanisms; for instance, Kubernetes enforces uniqueness within namespaces during resource creation, preventing conflicts proactively, though misconfigurations can lead to DNS resolution failures.[48]
Resolution strategies focus on disambiguation and isolation. Namespaces partition identifiers into distinct domains, as in Java packages, where classes like com.example.Class avoid clashes by organizing code hierarchically based on reversed domain names.[49] Qualification uses fully specified paths, such as C#'s global::N1.N2.A or the scope resolution operator :: in C++ to access specific instances like ::x for globals.[47][45] Aliasing provides temporary renamings, seen in C# with using A = N1.N2.A; for shorthand access or in SQL's AS clause (e.g., SELECT e.name AS employee_name FROM employees e), which resolves column ambiguities during joins from multiple tables.[47][50]
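Qualification and aliasing have direct counterparts in Python's import system, shown here as a sketch using two standard-library modules that both define a function called `sqrt`. The alias names `real_sqrt` and `complex_sqrt` are arbitrary.

```python
import cmath
import math

# Aliasing: give each clashing name a distinct local name at import time.
from cmath import sqrt as complex_sqrt
from math import sqrt as real_sqrt

# Qualification: the module name disambiguates which sqrt is meant.
qualified_real = math.sqrt(4)
qualified_complex = cmath.sqrt(-4)

aliased_real = real_sqrt(9)
aliased_complex = complex_sqrt(-9)
```

Writing `from math import sqrt` followed by `from cmath import sqrt` without aliases would silently rebind `sqrt` to the second import, which is the kind of collision these two techniques avoid.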
Case studies illustrate these issues in practice. In the Python ecosystem, a 2024 study analyzed 4.2 million PyPI packages (434,823 latest versions as of April 2023) and found that 21.45% exhibit module-to-third-party-library (module-to-TPL) conflicts. Among 97 issues collected in the study, 65.98% were module-to-TPL conflicts, often involving third-party libraries that define modules overlapping with standard library ones, leading to import errors; the ModuleGuard tool detected conflicts in 108 GitHub projects (65 in their latest versions), highlighting the need for environment-aware resolution.[51] In modern microservices architectures, Kubernetes namespaces mitigate conflicts by isolating resources (e.g., allowing duplicate service names like payment in dev and prod namespaces) and using DNS FQDNs (e.g., payment.dev.svc.cluster.local) for runtime communication, though overlapping deployments without proper scoping can cause resource contention in collaborative environments.[48] Legacy systems, particularly during 1990s integrations, faced similar challenges when merging disparate codebases, often requiring manual renaming or wrappers to handle identifier overlaps in COBOL or mainframe environments.
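The namespace-scoped uniqueness described for Kubernetes can be modelled with a toy registry. This is not the Kubernetes API: `NamespacedRegistry` and its methods are hypothetical, and only the `name.namespace.svc.cluster.local` naming pattern is taken from the example above.

```python
class NamespacedRegistry:
    """Toy model: a service name must be unique within a namespace, not globally."""

    def __init__(self, cluster_domain: str = "cluster.local") -> None:
        self._cluster_domain = cluster_domain
        self._services: dict[tuple[str, str], str] = {}

    def register(self, namespace: str, name: str) -> str:
        key = (namespace, name)
        if key in self._services:
            raise ValueError(f"{name!r} already exists in namespace {namespace!r}")
        # The fully qualified name embeds the namespace, so it is globally unambiguous.
        fqdn = f"{name}.{namespace}.svc.{self._cluster_domain}"
        self._services[key] = fqdn
        return fqdn


registry = NamespacedRegistry()
dev_fqdn = registry.register("dev", "payment")
prod_fqdn = registry.register("prod", "payment")  # same short name, different namespace

try:
    registry.register("dev", "payment")  # duplicate within one namespace
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Keying the registry on the (namespace, name) pair is what lets the short name stay local while the derived FQDN acts as the unique identifier.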