from Wikipedia
Universally unique identifier
Acronym: UUID
Organisation: Open Software Foundation (OSF), ISO/IEC, Internet Engineering Task Force (IETF)
No. of digits: 32
Example: f81d4fae-7dec-11d0-a765-00a0c91e6bf6
Website: RFC 9562 (obsoletes RFC 4122)

A universally unique identifier (UUID) is a 128-bit number designed to be a unique identifier for objects in computer systems. UUIDs are designed to be large enough that any randomly-generated UUID will, in practice, be unique from all other UUIDs. The term globally unique identifier (GUID) is also used, mostly in Microsoft-designed systems.[1][2] The standard way to represent UUIDs is as 32 hexadecimal digits, which are split with hyphens into five groups.

Universally unique identifiers are typically generated with a random number generator, with some systems also incorporating the time of generation or other information into the identifier. There are multiple standards for generating UUIDs for different applications with different requirements.[1] While the probability that a UUID value will be duplicated is not zero, it is generally considered negligible.[3][4] Because there are on the order of 10^38 possible UUID values, different computer systems can assume that any UUID they generate will be unique across all computer systems in the world: there is no need for systems to coordinate to avoid reusing the same identifier.

UUIDs are in widespread use in modern computer systems and on the internet to label data objects, for example files or database entries. Despite being large enough to be universally unique, UUIDs still have a low overhead and are quick to generate and compare.

History


In the 1980s, Apollo Computer originally used UUIDs in the Network Computing System (NCS). Later, the Open Software Foundation (OSF) used UUIDs for their Distributed Computing Environment (DCE). The design of the DCE UUIDs was partly based on the NCS UUIDs,[5] whose design was in turn inspired by the (64-bit) unique identifiers defined and used pervasively in Domain/OS, an operating system designed by Apollo Computer.[6] Later in the early 1990s, the Microsoft Windows platforms adopted the DCE design as "Globally Unique IDentifiers" (GUIDs).

RFC 4122 registered a URN namespace for UUIDs and recapitulated the earlier specifications, with the same technical content.[2] When in July 2005 RFC 4122 was published as a proposed IETF standard, the ITU had also standardized UUIDs, based on the previous standards and early versions of RFC 4122. On May 7, 2024, RFC 9562[1] was published, introducing 3 new "versions" and clarifying some ambiguities.

Standards


UUIDs are standardized by several bodies. The definition is standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE).[7][8] It is documented as part of ISO/IEC 11578:1996 "Information technology – Open Systems Interconnection – Remote Procedure Call (RPC)" and more recently in ITU-T Rec. X.667 | ISO/IEC 9834-8:2014.[9] The Internet Engineering Task Force (IETF) published the Standards-Track RFC 9562[1] from the "Revise Universally Unique Identifier Definitions Working Group"[10] as a revision of RFC 4122.[2] RFC 4122 is technically equivalent to ITU-T Rec. X.667 | ISO/IEC 9834-8, but is now obsolete.

Format


A UUID is 128 bits in size, in which 2 to 4 bits are used to indicate the format's variant. The most common variant in use, OSF DCE, additionally defines 4 bits for its version.

The use of the remaining bits is governed by the variant/version selected.

Variants


The variant field indicates the format of the UUID (and in case of the legacy UUID also the address family used for the node field). The following variants are defined:

  • The Apollo NCS variant (indicated by the one-bit pattern 0xxx in binary) is for backwards compatibility with the now-obsolete Apollo Network Computing System 1.5 UUID format developed around 1988. Though different in detail, the similarity with modern UUIDv1 is evident. The variant bits in the current UUID specification coincide with the high bits of the address family octet in NCS UUIDs. Though the address family could hold values in the range 0..255, only the values 0..13 were ever defined. Accordingly, the bit pattern 0xxx avoids conflicts with historical NCS UUIDs, should any still exist in databases.[11] This variant defines "families" as its subtype.
  • The OSF DCE variant (10xx in binary) covers the UUIDs referred to as RFC 4122/DCE 1.1 UUIDs, or "Leach–Salz" UUIDs, after the authors of the original Internet Draft. This variant defines "versions" as its subtype.
  • The Microsoft COM/DCOM variant (110x in binary) is characterized in the RFC as "reserved, Microsoft Corporation backward compatibility" and was used for early GUIDs on the Microsoft Windows platform.
  • The Reserved variant space is not currently used by any specification.
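
The variant rules above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `variant_name` and the returned labels are made up for this example, and the 111x pattern for the reserved space is taken from the family / variant table later in this article.

```python
import uuid

def variant_name(u: uuid.UUID) -> str:
    """Classify a UUID by the high bits of octet 8 (hypothetical helper)."""
    octet = u.bytes[8]
    if octet & 0b1000_0000 == 0:
        return "Apollo NCS"          # 0xxx
    if octet & 0b1100_0000 == 0b1000_0000:
        return "OSF DCE"             # 10xx
    if octet & 0b1110_0000 == 0b1100_0000:
        return "Microsoft COM/DCOM"  # 110x
    return "Reserved"                # 111x

# Octet 8 of the sample UUID is 0xa7 (binary 1010 0111), so the pattern is 10xx.
print(variant_name(uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")))
```

Python's standard `uuid.UUID.variant` property performs a similar classification; the hand-written version is shown only to make the bit patterns visible.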

Versions of the OSF DCE variant


The OSF DCE variant defines eight "versions" in the standard, and each version may be more appropriate than the others in specific use cases. The version is indicated by the value of the higher nibble (higher 4 bits, or higher hexadecimal digit) of the 7th byte of the UUID. In hex, this is the character after the second dash. For example, the UUID 9c5b94b1-35ad-49bb-b118-8e8fc24abf80 is version 4, because the digit after the second dash is 4 (...-49bb-...).
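
As a minimal sketch of the rule just described, the version can be read either from the high nibble of the 7th byte or from the first character of the third text group. The helper name `version_of` is hypothetical.

```python
import uuid

def version_of(text: str) -> int:
    """Return the version nibble: the high 4 bits of the 7th byte (index 6)."""
    u = uuid.UUID(text)
    return u.bytes[6] >> 4

sample = "9c5b94b1-35ad-49bb-b118-8e8fc24abf80"
print(version_of(sample))        # version read from the binary form
print(sample.split("-")[2][0])   # same digit read from the text form
```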

Versions 1 and 6 (date-time and MAC address)


Version 1 concatenates the 48-bit MAC address of the "node" (that is, the computer generating the UUID), with a 60-bit timestamp, being the number of 100-nanosecond intervals since midnight 15 October 1582 Coordinated Universal Time (UTC), the date on which the Gregorian calendar was first adopted by the bulk of Europe. RFC 4122 states that the time value rolls over around 3400 AD,[2]: 3 depending on the algorithm used, which implies that the 60-bit timestamp is a signed quantity. However some software, such as the libuuid library, treats the timestamp as unsigned, putting the rollover time in 5236 AD (2^60 intervals of 100 nanoseconds span about 3,653 years).[12] The rollover time as defined by ITU-T Rec. X.667 is 3603 AD.[13]: v

A 13-bit or 14-bit "uniquifying" clock sequence extends the timestamp in order to handle cases where the processor clock does not advance fast enough, or where there are multiple processors and UUID generators per node. When UUIDs are generated faster than the system clock can advance, the lower bits of the timestamp field can be generated by incrementing them each time a UUID is generated, to simulate a high-resolution timestamp. With each version 1 UUID corresponding to a single point in space (the node) and time (intervals and clock sequence), the chance of two properly generated version-1 UUIDs being unintentionally the same is practically nil. Since the time and clock sequence total 74 bits, 2^74 (1.8×10^22, or 18 sextillion) version-1 UUIDs can be generated per node ID, at a maximal average rate of 163 billion per second per node ID.[2]
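
To make the version-1 layout concrete, the following sketch decodes the timestamp, clock sequence and node of the sample UUID from the infobox using Python's standard `uuid` module. The helper `decode_v1` and the constant names are hypothetical. The last lines reproduce the capacity figures under the assumption that the "163 billion per second" rate is 16,384 clock-sequence values multiplied by 10^7 timestamp ticks per second.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the version-1 timestamp scale: midnight, 15 October 1582 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def decode_v1(u: uuid.UUID):
    """Split a version-1 UUID into (creation time, clock sequence, node)."""
    ticks = u.time  # 60-bit count of 100-nanosecond intervals
    when = GREGORIAN_EPOCH + timedelta(microseconds=ticks // 10)
    return when, u.clock_seq, f"{u.node:012x}"

when, clock_seq, node = decode_v1(uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6"))
print(when.isoformat(), hex(clock_seq), node)

# Capacity arithmetic (assumed derivation of the figures in the text).
IDS_PER_NODE = 2**74                  # about 1.8e22
MAX_RATE_PER_SECOND = 2**14 * 10**7   # about 163 billion
print(f"{IDS_PER_NODE:.2e}", MAX_RATE_PER_SECOND)
```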

In contrast to other UUID versions, version-1 and -2 UUIDs based on MAC addresses from network cards rely for their uniqueness in part on an identifier issued by a central registration authority, namely the Organizationally Unique Identifier (OUI) part of the MAC address, which is issued by the IEEE to manufacturers of networking equipment.[14] The uniqueness of version-1 and version-2 UUIDs based on network-card MAC addresses also depends on network-card manufacturers properly assigning unique MAC addresses to their cards, which like other manufacturing processes is subject to error. Virtual machines receive a MAC address in a range that is configurable in the hypervisor.[15] Additionally some operating systems permit the end user to customise the MAC address, notably OpenWRT.[16]

Usage of the node's network card MAC address for the node ID means that a version-1 UUID can be tracked back to the computer that created it. Documents can sometimes be traced to the computers where they were created or edited through UUIDs embedded into them by word processing software. This privacy hole was used when locating the creator of the Melissa virus.[17]

RFC 9562[1] does allow the MAC address in a version-1 (or 2) UUID to be replaced by a random 48-bit node ID, either because the node does not have a MAC address, or because it is not desirable to expose it. In that case, the RFC requires that the least significant bit of the first octet of the node ID should be set to 1.[2] This corresponds to the multicast bit in MAC addresses, and setting it serves to differentiate UUIDs where the node ID is randomly generated from UUIDs based on MAC addresses from network cards, which typically have unicast MAC addresses.[2]

Version 6 is the same as version 1 except all timestamp bits are ordered from most significant to least significant. This allows systems to sort UUIDs in order of creation simply by sorting them lexically, whereas this is not possible with version 1.
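
The field reordering can be sketched as a conversion from a version-1 value to the corresponding version-6 value. The function name `v1_to_v6` is hypothetical, and the input is just a sample version-1 UUID; the sketch assumes the version-6 layout places the 60 timestamp bits most-significant first, with the version nibble inserted after the first 48 of them.

```python
import uuid

def v1_to_v6(u: uuid.UUID) -> uuid.UUID:
    """Reorder the 60-bit timestamp of a v1 UUID so it sorts by creation time."""
    ts = u.time                                   # 60-bit timestamp
    time_high = (ts >> 28) & 0xFFFFFFFF           # most significant 32 bits
    time_mid = (ts >> 12) & 0xFFFF                # next 16 bits
    time_low_and_version = 0x6000 | (ts & 0x0FFF) # version nibble + last 12 bits
    tail = u.int & ((1 << 64) - 1)                # variant, clock sequence, node
    value = (time_high << 96) | (time_mid << 80) | (time_low_and_version << 64) | tail
    return uuid.UUID(int=value)

v1 = uuid.UUID("c232ab00-9414-11ec-b3c8-9f6bdeced846")
print(v1_to_v6(v1))
```

Because the most significant timestamp bits now come first, comparing the resulting strings (or bytes) lexically follows creation order.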

Version 2 (date-time and MAC address, DCE security version)


RFC 9562[1] reserves version 2 for "DCE security" UUIDs; but it does not provide any details. For this reason, many UUID implementations omit version 2. However, the specification of version-2 UUIDs is provided by the DCE 1.1 Authentication and Security Services specification.[8]

Version-2 UUIDs are similar to version 1, except that the least significant 8 bits of the clock sequence are replaced by a "local domain" number, and the least significant 32 bits of the timestamp are replaced by an integer identifier meaningful within the specified local domain. On POSIX systems, local-domain numbers 0 and 1 are for user ids (UIDs) and group ids (GIDs) respectively, and other local-domain numbers are site-defined.[8] On non-POSIX systems, all local domain numbers are site-defined.

The ability to include a 40-bit domain/identifier in the UUID comes with a tradeoff. On the one hand, 40 bits allow about 1 trillion domain/identifier values per node ID. On the other hand, with the clock value truncated to the 28 most significant bits, compared to 60 bits in version 1, the clock in a version 2 UUID will "tick" only once every 429.49 seconds, a little more than 7 minutes, as opposed to every 100 nanoseconds for version 1. And with a clock sequence of only 6 bits, compared to 14 bits in version 1, only 64 unique UUIDs per node/domain/identifier can be generated per 7-minute clock tick, compared to 16,384 clock sequence values for version 1.[18]
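
The figures in this tradeoff follow from simple powers of two, as this short sketch shows. The variable names are illustrative only.

```python
TICK_NS = 100  # one timestamp unit is 100 nanoseconds

# Replacing the low 32 timestamp bits means the clock advances once per 2^32 ticks.
v2_tick_seconds = (2**32 * TICK_NS) / 1e9
domain_identifier_values = 2**40   # 8-bit local domain + 32-bit identifier
v2_clock_seq_values = 2**6         # 6-bit clock sequence in version 2
v1_clock_seq_values = 2**14        # 14-bit clock sequence in version 1

print(round(v2_tick_seconds, 2), "seconds per tick")
print(domain_identifier_values, v2_clock_seq_values, v1_clock_seq_values)
```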

Versions 3 and 5 (namespace name-based)


Version-3 and version-5 UUIDs are generated by hashing a namespace identifier and name. Version 3 uses MD5 as the hashing algorithm, and version 5 uses SHA-1.[1]

The namespace identifier is itself a UUID. The specification provides constant UUIDs to represent the namespaces for URLs, fully qualified domain names, object identifiers, and X.500 distinguished names; but any desired UUID may be used as a namespace designator.

To determine the version-3 UUID corresponding to a given namespace and name, the UUID of the namespace is transformed to a string of bytes, concatenated with the input name, then hashed with MD5, yielding 128 bits. Then 6 or 7 bits are replaced by fixed values: the 4-bit version (e.g. 0011 in binary for version 3), and the 2- or 3-bit UUID "variant" (e.g. 10 in binary indicating an RFC 9562[1] UUID, or 110 in binary indicating a legacy Microsoft GUID). Since 6 or 7 bits are thus predetermined, only 121 or 122 bits contribute to the uniqueness of the UUID.

Version-5 UUIDs are similar, but SHA-1 is used instead of MD5. Since SHA-1 generates 160-bit digests, the digest is truncated to 128 bits before the version and variant bits are replaced.

Version-3 and version-5 UUIDs have the property that the same namespace and name will map to the same UUID. However, neither the namespace nor name can be determined from the UUID, even if one of them is specified, except by brute-force search. RFC 4122 recommends version 5 (SHA-1) over version 3 (MD5), and warns against use of UUIDs of either version as security credentials.[2]
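
The hashing procedure for both versions can be sketched by hand and compared with the `uuid3` and `uuid5` functions in Python's standard library. The helper `name_based` is hypothetical, the name "www.example.com" is only a sample input, and the sketch assumes the name is encoded as UTF-8 and the 2-bit variant 10 is used.

```python
import hashlib
import uuid

def name_based(namespace: uuid.UUID, name: str, version: int) -> uuid.UUID:
    """Hash namespace bytes + name, then overwrite the version and variant bits."""
    data = namespace.bytes + name.encode("utf-8")
    if version == 3:
        digest = hashlib.md5(data).digest()        # 128 bits
    elif version == 5:
        digest = hashlib.sha1(data).digest()[:16]  # 160 bits truncated to 128
    else:
        raise ValueError("only versions 3 and 5 are name-based")
    value = int.from_bytes(digest, "big")
    value = (value & ~(0xF << 76)) | (version << 76)  # 4-bit version field
    value = (value & ~(0x3 << 62)) | (0b10 << 62)     # 2-bit variant field
    return uuid.UUID(int=value)

print(name_based(uuid.NAMESPACE_DNS, "www.example.com", 3))
print(name_based(uuid.NAMESPACE_DNS, "www.example.com", 5))
```

Calling the function twice with the same namespace and name returns the same UUID, which is the defining property of these versions.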

Version 4 (random)


A version 4 UUID is randomly generated. As in other UUIDs, 4 bits are used to indicate version 4, and 2 or 3 bits to indicate the variant (10 or 110 in binary for variants 1 and 2 respectively). Thus, for variant 1 (that is, most UUIDs) a random version 4 UUID will have 6 predetermined variant and version bits, leaving 122 bits for the randomly generated part, for a total of 2^122, or 5.3×10^36 (5.3 undecillion) possible version-4 variant-1 UUIDs. There are half as many possible version 4, variant 2 UUIDs (legacy GUIDs) because there is one fewer random bit available, 3 bits being consumed for the variant.

Per RFC 9562[1], the seventh octet's most significant 4 bits indicate which version the UUID adheres to. This means that the first hexadecimal digit of the third group is always 4 in UUIDv4s. Written out, the layout is xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx, where M is the UUID version field. The upper two or three bits of digit N encode the variant: N is 8, 9, A or B for the 2-bit indication, and C or D for the 3-bit indication. For example, a random UUID version 4, variant 1 could be 8D8AC610-566D-4EF0-9C22-186B2A5ED793.[19]
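
A version-4, variant-1 UUID can be sketched as 16 random bytes with the M and N fields overwritten. The function name `uuid4_sketch` is hypothetical; in practice Python's `uuid.uuid4()` does the equivalent work.

```python
import os
import uuid

def uuid4_sketch() -> uuid.UUID:
    """Build a random UUID: 122 random bits plus fixed version and variant bits."""
    value = int.from_bytes(os.urandom(16), "big")
    value = (value & ~(0xF << 76)) | (0x4 << 76)    # M: version nibble = 4
    value = (value & ~(0x3 << 62)) | (0b10 << 62)   # N: top two bits = 10
    return uuid.UUID(int=value)

text = str(uuid4_sketch())
# Position 14 is the M digit and position 19 is the N digit of the text form.
print(text, text[14], text[19])
```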

Version 7 (timestamp and random)


Version 7 UUIDs (UUIDv7) are designed for keys in high-load databases and distributed systems.

UUIDv7 begins with a 48-bit big-endian Unix Epoch timestamp with approximately millisecond granularity. The timestamp can be shifted by any time shift value. Directly after the timestamp follows the version nibble, which must have the value 7. The variant bits must be 10. The remaining 74 bits consist of an optional randomly seeded counter (at least 12 bits but no longer than 42 bits) followed by random bits.

Two counter rollover handling methods can be used together:

  • Seeding the counter with its most significant (leftmost) bit set to zero, so that this bit acts as a guard bit.
  • Incrementing the timestamp ahead of the actual time and reinitializing the counter when it overflows.

In a DBMS, a UUIDv7 generator can be shared between threads (tied to a table or to a DBMS instance) or can be thread-local (with worse monotonicity, locality and performance).
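
The basic layout can be sketched as follows. This is a simplified illustration, not a production generator: the name `uuid7_sketch` and its `unix_ms` parameter are hypothetical, all 74 remaining bits are filled with random data, and the optional counter and rollover handling described above are omitted.

```python
import os
import time
import uuid
from typing import Optional

def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """48-bit millisecond timestamp, version 7, variant 10, 74 random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 are used
    rand_a = (rand >> 62) & 0xFFF                  # 12 bits after the version
    rand_b = rand & ((1 << 62) - 1)                # 62 bits after the variant
    value = (
        ((unix_ms & ((1 << 48) - 1)) << 80)
        | (0x7 << 76)
        | (rand_a << 64)
        | (0b10 << 62)
        | rand_b
    )
    return uuid.UUID(int=value)

earlier = uuid7_sketch(unix_ms=1_700_000_000_000)
later = uuid7_sketch(unix_ms=1_700_000_000_001)
print(earlier, later, earlier < later)
```

Because the timestamp occupies the most significant bits, values created in different milliseconds sort in creation order; ordering within the same millisecond would need the counter that this sketch leaves out.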

Version 8 (custom)


Version 8 only has two requirements:

  • The variant bits have to be 10, so the nibble containing the variant must be 8 (0b1000), 9 (0b1001), A (0b1010), or B (0b1011).
  • The version nibble has to be the value of 8.

Those requirements tell the system that it is a version 8 UUID. The remaining 122 bits are up to the vendor to customize. The difference from version 4 is that in version 4 those 122 bits are random, whereas in version 8 they follow vendor-specific rules.

Special values


The "nil" UUID is 00000000-0000-0000-0000-000000000000; that is, all clear bits.[1] The "max" UUID, sometimes also called the "omni" UUID, is FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF; that is, all set bits.[1]

Encoding


Binary representation


Initially, Apollo Computer designed the UUID with the following wire format:[5][11]

The legacy wire format
| Name | Offset | Length | Description |
|---|---|---|---|
| time_high | 0x00 | 4 octets / 32 bits | Together with time_low, the first 6 octets are the number of four-microsecond (μs) units of time that have passed since 1980-01-01 00:00 UTC. The time 2^48 × 4 μs after 1980 started was 2015-09-05 05:58:26.84262 UTC. Thus, the last time at which UUIDs could be generated in this original format was in 2015.[20] |
| time_low | 0x04 | 2 octets / 16 bits | (see time_high) |
| reserved | 0x06 | 2 octets / 16 bits | These octets are reserved for future use. |
| family | 0x08 | 1 octet / 8 bits | This octet is an address family. |
| node | 0x09 | 7 octets / 56 bits | These octets are a host ID in the form allowed by the specified address family. |

Later, the UUID was extended by combining the legacy family field with the new variant field. Because the family field had only ever used values ranging from 0 to 13, it was decided that a UUID with the most significant bit of this field set to 0 was a legacy UUID. This gives the following table for the family group:

Family / variant field
| MSB 0 | MSB 1 | MSB 2 | Legacy family field value range | In hex | Description |
|---|---|---|---|---|---|
| 0 | x | x | 0–127 (only 0–13 are used) | 0x00–0x7f | The legacy Apollo NCS UUID |
| 1 | 0 | x | 128–191 | 0x80–0xbf | OSF DCE UUID |
| 1 | 1 | 0 | 192–223 | 0xc0–0xdf | Microsoft COM / DCOM UUID |
| 1 | 1 | 1 | 224–255 | 0xe0–0xff | Reserved for future definition |

The legacy Apollo NCS UUID has the format described in the previous table. The OSF DCE UUID variant is described in RFC 9562[1]. The Microsoft COM / DCOM UUID has its variant described in the Microsoft documentation.

Endianness


When saving UUIDs to binary format, they are sequentially encoded in big-endian. For example, 00112233-4455-6677-8899-aabbccddeeff is encoded as the bytes 00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff.[21][22]

An exception to this is Microsoft's variant 2 UUIDs ("GUIDs"): historically used in COM/OLE libraries, they use a little-endian format, but appear mixed-endian, with the first three components of the UUID little-endian and the last two big-endian. Microsoft's GUID structure defines the last eight bytes as an 8-byte array, which is serialized in ascending order; this makes the byte representation appear mixed-endian.[23] For example, 00112233-4455-6677-8899-aabbccddeeff is encoded as the bytes 33 22 11 00 55 44 77 66 88 99 aa bb cc dd ee ff.[24][25]
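
Both byte orders can be inspected with Python's standard `uuid` module, whose `bytes` attribute gives the big-endian encoding and whose `bytes_le` attribute gives the mixed-endian layout described for Microsoft GUIDs. The sketch below simply prints the two encodings of the example value used in this section.

```python
import uuid

u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

big_endian = u.bytes.hex(" ")       # sequential, big-endian encoding
guid_layout = u.bytes_le.hex(" ")   # first three fields little-endian

print(big_endian)
print(guid_layout)

# Reading the GUID layout back requires saying which byte order was used.
assert uuid.UUID(bytes_le=u.bytes_le) == u
```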

Textual representation


In most cases, UUIDs are represented as hexadecimal values. The most used format is the 8-4-4-4-12 format, xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, where every x represents 4 bits. Other well-known formats are the 8-4-4-4-12 format with braces, {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}, as used in Microsoft's systems such as Windows, and xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, where all hyphens are removed. In some cases the unhyphenated form also carries a "0x" prefix or an "h" suffix to indicate hexadecimal values.

The format with hyphens was introduced with the newer variant system. Before that, the legacy Apollo format used a slightly different format: 34dc23469000.0d.00.00.7c.5f.00.00.00. The first part is the time (time_high and time_low combined). The reserved field is skipped. The family field comes directly after the first dot, so in this case 0d (13 in decimal) for DDS (Data Distribution Service). The remaining parts, each separated with a dot, are the node bytes.

The lowercase form of the hexadecimal values is the generally preferred format. Specifically in some contexts such as those defined in ITU-T Rec. X.667, lowercase is required when the text is generated, but the uppercase version must also be accepted.

Like any integer, a UUID can be represented as a decimal number. For example, the UUID 550e8400-e29b-41d4-a716-446655440000 can be represented as the decimal number 113059749145936325402354257176981405696. It is up to the implementation to decide whether this number is signed or unsigned, i.e. whether this decimal number is negative if the first bit is a 1.

A UUID can also be represented in binary, as a string of 128 bits. For example, the UUID 550e8400-e29b-41d4-a716-446655440000 can be represented as 01010101000011101000010000000000111000101001101101000001110101001010011100010110010001000110011001010101010001000000000000000000.

RFC 9562[1] registers the "uuid" namespace. This makes it possible to make URNs out of UUIDs, like urn:uuid:550e8400-e29b-41d4-a716-446655440000. The normal 8-4-4-4-12 format is used for this. It is also possible to make an OID URN out of a UUID, like urn:oid:2.25.113059749145936325402354257176981405696. In that case, the unsigned decimal format is used. The "uuid" URN is recommended over the "oid" URN.
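
The textual forms discussed in this section can all be derived from one value. The sketch below uses Python's standard `uuid` module with the example UUID from the text; the variable names are illustrative, and the braces and OID URN are built by plain string formatting rather than by any dedicated library call.

```python
import uuid

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

hyphenated = str(u)                 # 8-4-4-4-12 form
braced = "{" + str(u) + "}"         # form with braces
compact = u.hex                     # 32 hex digits, no hyphens
decimal = str(u.int)                # unsigned decimal form
binary = f"{u.int:0128b}"           # 128-character bit string
uuid_urn = u.urn                    # urn:uuid:... form
oid_urn = f"urn:oid:2.25.{u.int}"   # OID URN built from the decimal form

for form in (hyphenated, braced, compact, decimal, binary, uuid_urn, oid_urn):
    print(form)
```

The `uuid.UUID` constructor also accepts the braced, compact and `urn:uuid:` forms as input, so these representations round-trip.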

Collisions


Collision occurs when the same UUID is generated more than once and assigned to different referents. In the case of standard version-1 and version-2 UUIDs using unique MAC addresses from network cards, collisions are unlikely to occur, with an increased possibility only when an implementation varies from the standards, either inadvertently or intentionally.

In contrast, collisions can occur even without implementation problems for version-1 and version-2 UUIDs that use randomly generated node IDs, for hash-based version-3 and version-5 UUIDs, and for random version-4 UUIDs, albeit with a probability so small that it can normally be ignored. This probability can be computed precisely based on analysis of the birthday problem.[26]

For example, the number of random version-4 UUIDs which need to be generated in order to have a 50% probability of at least one collision is 2.71 quintillion, computed as follows:[27]

$$n \approx \frac{1}{2} + \sqrt{\frac{1}{4} + 2 \ln(2) \cdot 2^{122}} \approx 2.71 \times 10^{18}$$

This number would be equivalent to generating 1 billion UUIDs per second for about 86 years. A file containing this many UUIDs, at 16 bytes per UUID, would be about 43.4 exabytes (37.7 EiB).

The smallest number of version-4 UUIDs which must be generated for the probability of finding a collision to be p is approximated by the formula

$$n \approx \sqrt{2 \cdot 2^{122} \cdot \ln\frac{1}{1-p}}$$

Thus, the probability of finding a duplicate within 103 trillion version-4 UUIDs is one in a billion.
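
The collision figures can be checked numerically with the standard birthday-problem approximation over the 2^122 possible version-4 values. This is a sketch under that assumption; the function name `count_for_probability` and the variable names are hypothetical.

```python
import math

SPACE = 2**122  # number of distinct version-4, variant-1 UUIDs

def count_for_probability(p: float) -> float:
    """Approximate how many UUIDs give collision probability p (birthday bound)."""
    # ln(1 / (1 - p)) computed as -log1p(-p) to stay accurate for tiny p.
    return math.sqrt(2 * SPACE * -math.log1p(-p))

n_half = 0.5 + math.sqrt(0.25 + 2 * math.log(2) * SPACE)
years = n_half / 1e9 / (365.25 * 24 * 3600)   # at 1 billion UUIDs per second
exabytes = n_half * 16 / 1e18                 # at 16 bytes per UUID
n_billionth = count_for_probability(1e-9)

print(f"{n_half:.3g} UUIDs for a 50% chance")
print(f"{years:.0f} years, {exabytes:.1f} EB")
print(f"{n_billionth:.3g} UUIDs for a one-in-a-billion chance")
```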

Collisions have occurred when manufacturers assign a default UUID to a product, such as a motherboard, and then fail to over-write the default UUID later in the manufacturing process. For example, UUID 03000200-0400-0500-0006-000700080009 occurs on many different units of Gigabyte-branded motherboards.[28]

Uses


Filesystems


Significant uses include filesystem userspace tools,[29] most of which are derived from the original implementation by Theodore Ts'o.[12] The "partition label" and the "partition UUID" are both stored in the superblock. They are both part of the file system rather than of the partition. For example, ext2–4 contain a UUID, while NTFS or FAT32 do not. The superblock is a part of the file system, thus fully contained within the partition; hence a raw block-level copy of one partition onto another leaves both sda1 and sdb1 with the same label and UUID.

Partition tables


The GUID Partition Table (GPT) is one example of a format that uses GUIDs to label partition types.

Remoting


There are several flavors of GUIDs used in Microsoft's Component Object Model (COM):

  • IID – interface identifier. The ones that are registered on a system are stored in the Windows Registry at [HKEY_CLASSES_ROOT\Interface].[30]
  • CLSID – class identifier, stored at [HKEY_CLASSES_ROOT\CLSID]. In practice it is not entirely separate from the IID space, because remoting the interface can require a proxy/stub object which some toolsets used to create with a CLSID equal to the interface's IID.
  • LIBID – type library identifier, stored at [HKEY_CLASSES_ROOT\TypeLib].[31]
  • CATID – category identifier. Its presence on a class identifies it as belonging to certain class categories, listed at [HKEY_CLASSES_ROOT\Component Categories].[32]

Databases


UUIDs are commonly used as a unique key in database tables. The NEWID function in Microsoft SQL Server version 4 Transact-SQL returns standard random version-4 UUIDs, while the NEWSEQUENTIALID function returns 128-bit identifiers similar to UUIDs which are committed to ascend in sequence until the next system reboot.[33] The Oracle Database SYS_GUID function does not return a standard GUID, despite the name. Instead, it returns a 16-byte 128-bit RAW value based on a host identifier and a process or thread identifier, somewhat similar to a GUID.[34] PostgreSQL contains a UUID datatype[35] and can generate most versions of UUIDs through the use of functions from modules.[36][37] MySQL provides a UUID function, which generates standard version-1 UUIDs.[38]

Combined Time-GUID


The random nature of standard UUIDs of versions 3, 4, and 5, and the ordering of the fields within standard versions 1 and 2 may create problems with database locality or performance when UUIDs are used as primary keys. For example, in 2002 Jimmy Nilsson reported a significant improvement in performance with Microsoft SQL Server when the version-4 UUIDs being used as keys were modified to include a non-random suffix based on system time. This so-called "COMB" (combined time-GUID) approach made the UUIDs significantly more likely to be duplicated, as Nilsson acknowledged, but Nilsson only required uniqueness within the application.[39] By reordering and encoding version 1 and 2 UUIDs so that the timestamp comes first, insertion performance loss can be averted.[40]

COMB-like arrangements of UUID payloads were eventually standardized in RFC 9562[1] as UUIDv6 and UUIDv7.

from Grokipedia
A universally unique identifier (UUID) is a 128-bit label designed to uniquely identify objects or entities in computer systems without requiring a central authority, ensuring uniqueness across space and time. Standardized by the Internet Engineering Task Force (IETF) in RFC 9562 (which obsoleted RFC 4122 in 2024) as a Uniform Resource Name (URN) namespace, a UUID, also known as a globally unique identifier (GUID) in some contexts, facilitates distributed systems by providing collision-resistant identifiers for resources such as files, transactions, or database records.

Its structure consists of 16 octets (128 bits) arranged in a specific layout: a 32-bit time-low field, a 16-bit time-mid field, a 16-bit time-high-and-version field, an 8-bit clock-sequence-high-and-reserved field, an 8-bit clock-sequence-low field, and a 48-bit node field. It is typically represented in a canonical string format of eight hexadecimal digits followed by groups of four, four, four, and twelve digits, each separated by a hyphen (e.g., f81d4fae-7dec-11d0-a765-00a0c91e6bf6). The variant field in octet 8 (the ninth octet, counting from one) distinguishes the UUID layout, with the defined variant using the bit pattern 10xx for compatibility.

UUIDs come in eight versions, each with a distinct algorithm for achieving uniqueness (detailed in the dedicated section):
  • Version 1: Gregorian time-based, incorporating a timestamp and the generating system's node identifier (often a MAC address).
  • Version 2: DCE security version, similar to version 1 but includes user or group IDs.
  • Version 3: Name-based, using MD5 hashing of a namespace UUID and a name string.
  • Version 4: Random or pseudo-random generation, relying on high-quality randomness for uniqueness.
  • Version 5: Name-based, using SHA-1 hashing instead of MD5 for improved security.
  • Version 6: Reordered Gregorian time-based, similar to version 1 but with reordered timestamp fields for improved sorting in databases.
  • Version 7: Unix timestamp-based, combining a Unix timestamp with random bits for uniqueness and monotonicity.
  • Version 8: Custom, allowing application-specific algorithms while maintaining the UUID format.
Uniqueness is achieved through these methods: time-based versions leverage monotonic clocks and unique node IDs (or random alternatives), while name-based, random, and custom versions use cryptographic hashing, sufficient entropy, or defined procedures to minimize collision probabilities, making UUIDs essential for scalable, decentralized applications like cloud computing and databases.

History

Origins in OSF DCE

The universally unique identifier (UUID) was initially developed as a component of the Open Software Foundation's (OSF) Distributed Computing Environment (DCE), a middleware system aimed at facilitating distributed computing in client-server architectures during the early 1990s. The primary goal was to enable the generation of unique identifiers for objects, such as files, processes, or resources, across networked systems without relying on a central coordinating authority, thereby supporting scalability in heterogeneous environments. This approach addressed the challenges of ensuring global uniqueness in distributed setups where multiple nodes might independently create identifiers, avoiding conflicts through decentralized mechanisms.

The concept traces its roots to the Apollo Network Computing System (NCS), developed by Apollo Computer, before being formalized and expanded within OSF DCE. Key figures in its specification included Paul J. Leach and Rich Salz (the latter associated with Certco), who co-authored early drafts building directly on the OSF DCE framework. The first specification appeared in OSF DCE version 1.0, released around 1990–1991, defining the UUID as a fixed 128-bit value to balance compactness with sufficient entropy for uniqueness. This structure incorporated a 60-bit timestamp (representing 100-nanosecond intervals since October 15, 1582), a 14-bit clock sequence to handle potential timestamp collisions, and a 48-bit node identifier typically derived from the machine's MAC address, ensuring both temporal and spatial uniqueness.

Early adoption of UUIDs occurred within OSF DCE-based systems for tasks like naming interfaces and binding endpoints in remote procedure calls. Microsoft integrated the OSF DCE remote procedure call (RPC) mechanism, including UUIDs, into its Component Object Model (COM) framework, where they served as globally unique identifiers (GUIDs) for components, interfaces, and classes in distributed applications. This integration marked one of the earliest widespread uses beyond pure DCE environments, influencing subsequent implementations in Windows platforms and other enterprise systems.

Standardization and Evolution

The standardization of universally unique identifiers (UUIDs) originated in the Open Software Foundation's Distributed Computing Environment (OSF DCE) in the early 1990s, providing a foundational specification for generating unique identifiers across distributed systems. This initial framework was subsequently formalized as an international standard through ISO/IEC 11578:1996, which defines UUIDs within the context of Open Systems Interconnection – Remote Procedure Call (RPC), ensuring interoperability in information technology environments.

In July 2005, the Internet Engineering Task Force (IETF) published RFC 4122, titled "A Universally Unique IDentifier (UUID) URN Namespace," which codified the DCE-based UUID format as a Uniform Resource Name (URN) namespace while introducing three new versions: version 3 for name-based UUIDs using MD5 hashing, version 4 for random or pseudo-random generation, and version 5 for name-based UUIDs using SHA-1 hashing. This RFC addressed gaps in the earlier DCE and ISO specifications by providing a broader set of generation methods suitable for diverse applications, including those not reliant on distributed time synchronization. Over the years, RFC 4122 underwent refinements through published errata, resolving ambiguities in areas such as variant bit interpretation and byte order to enhance implementation consistency.

Advancements continued with the IETF's uuidrev working group, which in 2023 released draft-ietf-uuidrev-rfc4122bis-08, proposing version 7 UUIDs that incorporate a 48-bit Unix timestamp for improved chronological sortability without depending on synchronized clocks or hardware identifiers. This draft evolved into RFC 9562, published in May 2024 as "Universally Unique IDentifiers (UUIDs)," which obsoletes RFC 4122 and introduces version 6 (a reordered time-based variant of version 1), version 7, and version 8 for custom or experimental applications where implementers define their own layouts within the UUID framework. RFC 9562 also aligns UUID specifications more closely with ITU-T Recommendation X.667 | ISO/IEC 9834-8, emphasizing best practices for generation and usage.

A key aspect of UUID evolution has been the shift away from hardware-dependent generation methods, particularly those using MAC addresses in versions 1 and 2, toward software-only approaches. This transition addresses privacy concerns, as MAC addresses can reveal device identities and enable tracking, a concern reinforced by the spread of MAC address randomization to protect user anonymity in network environments. RFC 9562 explicitly recommends using fixed or random node identifiers instead of actual MAC addresses to maintain uniqueness while mitigating these risks, reflecting broader industry adoption of privacy-preserving identifier strategies.

Standards

Core Specifications

The core specifications for universally unique identifiers (UUIDs) establish a standardized 128-bit format designed to ensure uniqueness across space and time without relying on a central registration authority. This fixed length allows for 2^128, or approximately 3.4 × 10^38, possible values, minimizing collision risks in distributed systems.
The foundational international standard, ISO/IEC 11578:1996, defines the representation and generation algorithms for UUIDs within the Open Systems Interconnection (OSI) framework, specifically as part of the Distributed Computing Environment (DCE) Remote Procedure Call (RPC) bindings. It specifies the canonical string format (32 hexadecimal digits grouped as 8-4-4-4-12 with hyphens) and outlines procedures for creating time-based and name-based identifiers to maintain global uniqueness. Compliance with this standard ensures interoperability in OSI-conformant environments by mandating precise bit layouts for the timestamp, clock sequence, and node fields.
Building on DCE principles, RFC 4122 (published in 2005 by the IETF) formalizes UUIDs as a Uniform Resource Name (URN) namespace, defining versions 1 through 5 with distinct generation methods while emphasizing collision avoidance through elements such as clock sequences and node identifiers. It describes the variant field, encoded in the high-order bits of octet 8 (the ninth octet, counting octets from zero) so that the octet's high nibble has the binary pattern 10xx (where xx are variable), to distinguish UUIDs generated under this specification from other variants and to reserve space for future extensions. For interoperability, all compliant UUIDs must adhere to this variant encoding and include a version nibble (a 4-bit value in the high-order bits of octet 6, the seventh octet) that identifies the generation algorithm used. Guidelines for collision avoidance include using unique node identifiers, such as MAC addresses, and changing the clock sequence when the clock is set backward or when the generator's saved state is lost (for example, after a reboot).
RFC 9562 (published in 2024) obsoletes and expands RFC 4122, adding versions 6 through 8 to address modern needs such as sortable timestamps and custom layouts, while clarifying and slightly revising the variant rules. It reaffirms the 128-bit length and the core requirement for decentralized generation, with no central coordination needed for generation or validation. For all defined versions, the variant bits (bits 64-65, counting from the most significant bit of the UUID) are set to 10 and the version nibble (bits 48-51) matches the UUID type (for example, 0110 for version 6). These specifications collectively prioritize global interoperability by standardizing the layout and the metadata fields that signal how a UUID was generated.
In Microsoft Windows environments, the UUID is implemented as a Globally Unique Identifier (GUID), a 16-byte binary structure used extensively in the Component Object Model (COM) and Distributed COM (DCOM) to uniquely identify interfaces, classes, and other objects. GUIDs are used within the Windows Registry to index configuration information for applications and system components, enabling object resolution across distributed systems. This adaptation maintains compatibility with the core UUID structure while integrating with Windows-specific protocols for remote procedure calls.
The Bluetooth specification adopts 128-bit UUIDs for identifying services and profiles, often deriving them from shorter 16-bit or 32-bit forms assigned by the Bluetooth Special Interest Group (SIG) to reduce transmission size on low-power devices. These short UUIDs are expanded into full 128-bit equivalents by placing the assigned value in the first 32 bits of a fixed base UUID (00000000-0000-1000-8000-00805F9B34FB), ensuring global uniqueness while minimizing on-air bytes in protocols like GATT. Short forms are valid only for SIG-assigned values; vendor-specific extensions use full custom 128-bit UUIDs.
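The Bluetooth short-form expansion described above amounts to adding the assigned value, shifted into the top 32 bits, to the base UUID. The following is a minimal sketch using Python's standard uuid module; the function name expand_short_uuid and the sample value 0x180F are illustrative choices, not part of any Bluetooth API.

```python
import uuid

# Bluetooth Base UUID as quoted in the text.
BLUETOOTH_BASE = uuid.UUID("00000000-0000-1000-8000-00805F9B34FB")


def expand_short_uuid(short: int) -> uuid.UUID:
    """Expand a 16- or 32-bit short UUID into its 128-bit form.

    Hypothetical helper: the short value fills the first 32 bits of the
    base UUID, which are zero, so a shift and an addition are enough.
    """
    if not 0 <= short <= 0xFFFFFFFF:
        raise ValueError("short UUID must fit in 32 bits")
    return uuid.UUID(int=BLUETOOTH_BASE.int + (short << 96))


full = expand_short_uuid(0x180F)  # sample 16-bit value
print(full)
```

Because a 16-bit value is simply a 32-bit value with leading zeros, the same helper covers both short forms.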
In web and data interchange standards, UUIDs are represented in their canonical string form, a sequence of 32 hexadecimal digits grouped as 8-4-4-4-12 (enclosed in braces in some contexts), to ensure consistent parsing within JSON documents; because JSON natively supports strings, no special handling is required. This format promotes interoperability in JSON-based APIs, where UUIDs serve as keys or identifiers without altering the underlying 128-bit binary value.
Emerging standards and drafts extend UUIDs for cloud environments and enhanced privacy. For instance, Amazon Web Services (AWS) incorporates UUIDs within Amazon Resource Names (ARNs) for certain resource identifiers, and AWS Glue offers a transformation that generates unique IDs for data rows. Privacy-enhanced variants, addressed in recent IETF updates, introduce new versions such as UUIDv7 (time-ordered with high-entropy random bits) and UUIDv8 (custom layouts), which avoid exposing hardware identifiers like MAC addresses to mitigate tracking risks in distributed systems. These changes, formalized in RFC 9562, prioritize privacy and temporal sorting while preserving the existing 128-bit format.

Structure

Overall Layout

A Universally Unique Identifier (UUID) is a 128-bit value designed to be unique across space and time without centralized coordination. It is typically stored and transmitted in network byte order (big-endian), ensuring consistent interpretation across different systems regardless of local endianness. This fixed 128-bit size provides ample space to avoid collisions while remaining compact for storage and comparison.
In the original time-based layout, the UUID is divided into several fixed fields: a variant field of up to three bits (indicating the encoding variant), a 4-bit version field (specifying the generation algorithm), a 60-bit timestamp (for time-based versions), a clock sequence field (typically 14 bits, to handle clock adjustments and prevent duplicates), and a 48-bit node ID (often derived from hardware such as a MAC address). Each field has a fixed width, so the total is always exactly 128 bits. From the most significant octet to the least significant, the 128-bit UUID can be broken down as follows:

UUID = time_low (32 bits) | time_mid (16 bits) | time_hi_and_version (16 bits) | clock_seq_and_variant (16 bits) | node (48 bits)

In this layout, time_low occupies the first (most significant) four octets, followed by time_mid, then time_hi_and_version (which holds the version in its high 4 bits), clock_seq_and_variant (which holds the variant in its high bits), and finally the node in the last six octets. Although time_low comes first in the byte sequence, it carries the least significant 32 bits of the timestamp. The purpose of these fields is to ensure global uniqueness by combining temporal information (the timestamp), a spatial identifier (the node ID), and an element of randomness or sequencing (the clock sequence), allowing UUIDs to be generated independently on different systems without coordination. For instance, in time-based versions the timestamp provides ordering while the node distinguishes devices; in random versions the fields are filled with pseudorandom values to achieve the same goal. This field-based design supports multiple UUID versions while maintaining a uniform overall layout.
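As an illustration of the field layout, the sketch below splits a UUID into its fields with the standard library's UUID.fields attribute and then reassembles the 128-bit integer by hand. The sample value is the example UUID from the article's infobox; the shift amounts follow the field widths listed above.

```python
import uuid

u = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")

# UUID.fields yields the six fields in wire order.
(time_low, time_mid, time_hi_version,
 clock_seq_hi_variant, clock_seq_low, node) = u.fields

# Reassemble: time_low sits in the most significant 32 bits,
# node in the least significant 48 bits.
rebuilt = (
    (time_low << 96)
    | (time_mid << 80)
    | (time_hi_version << 64)
    | (clock_seq_hi_variant << 56)
    | (clock_seq_low << 48)
    | node
)

print(hex(time_low), hex(node), rebuilt == u.int)
```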

Variant and Version Fields

The variant field in a UUID occupies up to the three most significant bits (bits 0 through 2) of the clock_seq_hi_and_reserved octet (octet 8 in the 128-bit layout), determining the overall layout and interpretation of the remaining bits for interoperability across systems. This field uses specific bit patterns to distinguish between UUID encoding schemes: 0xx for Network Computing System (NCS) backward compatibility, 10x for the standard defined in RFC 9562 (compatible with earlier RFC 4122 UUIDs), 110 for Microsoft GUID backward compatibility, and 111 reserved for future use. The 10x pattern, where the two most significant bits are 10, is the one used by modern implementations.
The version field, a 4-bit nibble, is located in the most significant bits (bits 0 through 3) of the time_hi_and_version field (the high nibble of octet 6), specifying the UUID generation algorithm and thus how the other fields should be interpreted. Versions 1 through 5 represent the original methods: version 1 for time-based UUIDs using Gregorian timestamps, version 2 for DCE security UUIDs (rarely used), version 3 for name-based UUIDs using MD5 hashing, version 4 for random or pseudorandom UUIDs, and version 5 for name-based UUIDs using SHA-1 hashing. Versions 6 through 8 extend this scheme: version 6 for reordered time-based UUIDs (rearranging version 1 fields for better sorting), version 7 for Unix timestamp-based UUIDs with random components, and version 8 for custom or application-specific UUIDs.
To detect the UUID type, systems parse these fields during decoding: the variant bits first classify the layout, and the version bits then identify the exact generation method, enabling validation of compliance with RFC 9562. This dual classification is crucial for preventing misinterpretation; for instance, a variant mismatch (such as treating a Microsoft GUID as an RFC 9562 UUID) can lead to incorrect extraction of timestamps or node identifiers, causing errors in distributed systems.
By standardizing these bits, UUIDs maintain uniqueness and portability across diverse environments without requiring additional metadata.
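The two-step classification (variant first, then version) can be sketched directly on the 16 raw bytes. The function name classify and the returned labels are illustrative; Python's own uuid.UUID exposes similar information through its variant and version attributes.

```python
import uuid
from typing import Optional, Tuple


def classify(raw: bytes) -> Tuple[str, Optional[int]]:
    """Return (variant label, version) for a 16-byte UUID.

    Hypothetical helper: the version is reported only for the
    RFC 9562 variant, where the nibble has a defined meaning.
    """
    if len(raw) != 16:
        raise ValueError("a UUID is exactly 16 bytes")
    top3 = raw[8] >> 5                 # three most significant bits of octet 8
    if top3 & 0b100 == 0:              # 0xx
        variant = "NCS"
    elif top3 & 0b110 == 0b100:        # 10x
        variant = "RFC 9562"
    elif top3 == 0b110:                # 110
        variant = "Microsoft"
    else:                              # 111
        variant = "reserved"
    version = raw[6] >> 4 if variant == "RFC 9562" else None
    return variant, version


print(classify(uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6").bytes))
```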

Time and Node Components

In time-based UUIDs, such as versions 1 and 6, the timestamp component records the generation time as a 60-bit value representing the number of 100-nanosecond intervals since the epoch of October 15, 1582, 00:00:00. This starting point is the date of the Gregorian reform of the calendar, which avoids complications from the earlier Julian calendar and gives a consistent reference across systems. A 60-bit count of 100-nanosecond intervals spans roughly 3,650 years; RFC 4122 cites a rollover around A.D. 3400, depending on the specific algorithm used.
The timestamp is subdivided and positioned within the 128-bit UUID structure as follows: the least significant 32 bits occupy the time_low field (octets 0-3), the next 16 bits form the time_mid field (octets 4-5), and the most significant 12 bits appear in the time_hi portion of the time_hi_and_version field (octets 6-7, whose remaining 4 bits hold the version number). In version 1 UUIDs, this layout preserves the original DCE ordering, while version 6 reorders the fields to place the most significant timestamp bits first for improved chronological sorting in databases.
To mitigate the risk of duplicate UUIDs arising from clock adjustments, such as regressions or resets on the generating system, a 14-bit clock sequence field is included. This sequence is typically initialized to a random value between 0 and 16,383 and is incremented (modulo 16,384) whenever the local clock is found to have moved backward relative to the last UUID generation time; it is also changed when the node ID changes, which could otherwise cause collisions. The clock sequence occupies bits 66-79 in the binary representation: the low 6 bits of octet 8 (bits 66-71), which follow the two variant bits 10, and all 8 bits of octet 9 (bits 72-79).
This mechanism protects temporal uniqueness even in environments with imperfect clocks, without requiring synchronized time across nodes.
The node ID component, a 48-bit field, identifies the hardware or network interface generating the UUID and occupies the final octets (10-15) in the binary layout. It is conventionally set to the IEEE 802 MAC address of the local node, which is intended to identify a network interface uniquely worldwide. When a real MAC address is unavailable, or to preserve privacy, a randomly generated 48-bit value is used instead, with the multicast bit (the least significant bit of octet 10) set to 1 to distinguish it from real MAC addresses. This node ID, combined with the timestamp and clock sequence, ensures global uniqueness by tying the UUID to a specific generating node. The variant and version fields sit adjacent to these components and classify the UUID type without altering their roles.
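To make the timestamp layout concrete, the following sketch reassembles the 60-bit tick count from the three time fields of a version 1 UUID and converts it to a calendar date. The helper names v1_ticks and v1_datetime are illustrative; the result is truncated to microseconds because Python's datetime cannot represent 100-nanosecond precision.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Epoch of the UUID timestamp: 15 October 1582, 00:00:00 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def v1_ticks(u: uuid.UUID) -> int:
    """Reassemble the 60-bit count of 100 ns intervals from a version 1 UUID."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    time_low, time_mid, time_hi_version = u.fields[:3]
    time_hi = time_hi_version & 0x0FFF  # drop the 4 version bits
    return (time_hi << 48) | (time_mid << 32) | time_low


def v1_datetime(u: uuid.UUID) -> datetime:
    """Convert the tick count to a UTC datetime (microsecond precision)."""
    return GREGORIAN_EPOCH + timedelta(microseconds=v1_ticks(u) // 10)


sample = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
print(v1_datetime(sample).isoformat())
```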

UUID Versions

Time-based with MAC Address (Versions 1 and 6)

UUID version 1, also known as the time-based UUID, generates identifiers using a 60-bit timestamp representing the number of 100-nanosecond intervals since 00:00:00.00 UTC on October 15, 1582 (the date of the Gregorian calendar reform), combined with a 14-bit clock sequence and a 48-bit node identifier. The timestamp is divided into three fields: time_low (32 bits), time_mid (16 bits), and time_hi_and_version (16 bits, with the 4-bit version set to 0001 and the remaining 12 bits for time_hi). The clock sequence, initialized to a random value between 0 and 16383, prevents duplicates if the system clock is reset or adjusted backward, while the node field typically holds the MAC address of the generating machine's network interface; if no MAC address is available, a random 48-bit value is used with the multicast bit (the least significant bit of the first node octet) set to 1.
The generation process for version 1 UUIDs follows a stateful algorithm: obtain an exclusive lock on the UUID generation state, retrieve the current UTC timestamp and node ID, and compare the timestamp to the previous one. If it has not advanced, increment the clock sequence (or generate a new one if it overflows) and retry the timestamp acquisition up to a system-defined limit. Finally, format the fields into the 128-bit structure and release the lock. This design supports high generation rates, up to approximately 10 million UUIDs per second per node, as the 100-nanosecond resolution allows 10^7 intervals per second. Uniqueness does not depend on central coordination: the combination of timestamp and node prevents collisions across distinct machines (provided their MAC addresses are unique), while the clock sequence handles duplicates within the same node and time slot.
Version 6 UUIDs, introduced in RFC 9562, keep the core elements of version 1 (60-bit timestamp, 14-bit clock sequence, and 48-bit node) but reorder the fields for improved lexical sorting when stored as binary or text, enhancing index locality and query performance in distributed systems.
Specifically, the structure places the most significant 48 bits of the timestamp first (32 bits in time_high, octets 0-3, and 16 bits in time_mid, octets 4-5), followed by the version (0110 in bits 48-51, the high nibble of octet 6), the least significant 12 bits of the timestamp (time_low, bits 52-63 in octets 6-7), the variant (10 in bits 64-65 of octet 8), the clock sequence (14 bits across octets 8-9), and the node (48 bits in octets 10-15). Generation mirrors version 1, including the stateful algorithm and clock sequence logic, but with the timestamp bits rearranged after capture so that the higher-order bits come first, using the same timestamp and node derivation rules.
Both versions 1 and 6 provide uniqueness guarantees matching those of the original DCE specification, with one UUID per 100-nanosecond interval per node, enabling collision-free operation across space and time in uncoordinated environments. They are particularly suited to distributed computing systems, such as the Open Software Foundation's Distributed Computing Environment (DCE), where temporal ordering is useful for logging, transaction tracking, or replication without requiring synchronized clocks beyond the node level. They offer sortability by timestamp but carry privacy risks from potential exposure of the MAC address or node ID. Version 6's sorting advantage makes it preferable in modern databases for range queries or partitioning by time.
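The field reordering can be sketched as a pure bit manipulation that converts a version 1 UUID into the corresponding version 6 UUID. The function name v1_to_v6 is illustrative; the sample pair used below is the version 1 and version 6 test vector given in RFC 9562.

```python
import uuid


def v1_to_v6(u: uuid.UUID) -> uuid.UUID:
    """Rearrange a version 1 UUID into the version 6 layout (sketch)."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    ts = u.time                       # 60-bit timestamp
    time_high = ts >> 28              # most significant 32 bits
    time_mid = (ts >> 12) & 0xFFFF    # next 16 bits
    time_low = ts & 0x0FFF            # least significant 12 bits
    low64 = u.int & 0xFFFFFFFFFFFFFFFF  # variant, clock sequence, node: unchanged
    value = (
        (time_high << 96)
        | (time_mid << 80)
        | (0x6 << 76)                 # version nibble
        | (time_low << 64)
        | low64
    )
    return uuid.UUID(int=value)


v1 = uuid.UUID("c232ab00-9414-11ec-b3c8-9f6bdeced846")
print(v1_to_v6(v1))
```

Because the most significant timestamp bits now lead, comparing two such values as plain byte strings orders them by generation time.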

DCE Security with MAC Address (Version 2)

Version 2 UUIDs, known as DCE security UUIDs, are a specialized variant of time-based identifiers designed for environments requiring embedded security contexts. They extend the core structure of version 1 UUIDs by incorporating local identifiers such as POSIX user IDs (UIDs) or group IDs (GIDs), associating UUIDs with specific principals for authorization and auditing purposes. This variant was specified in the DCE 1.1 Authentication and Security Services standard to support privilege management within DCE cells.
The layout of a version 2 UUID mirrors that of version 1 in its overall 128-bit composition: a 60-bit timestamp split across time_low (32 bits), time_mid (16 bits), and time_hi_and_version (16 bits, with the 4 most significant bits set to 0010 binary to indicate version 2), a clock sequence across clock_seq_hi_and_reserved (8 bits, with variant bits 10 in the 2 most significant bits) and clock_seq_low (8 bits), and a 48-bit node field containing the MAC address. However, the time_low field holds the 32-bit local identifier (UID or GID) in place of the least significant 32 bits of the timestamp, reducing timestamp precision but embedding security information. The clock sequence is effectively shortened to the 6 bits in clock_seq_hi_and_reserved, while the 8-bit clock_seq_low field holds the domain value, which differentiates the type of local identifier used.
The domain value in clock_seq_low specifies the security context and has three defined values: 0 for the person domain (using a user ID), 1 for the group domain (using a group ID), and 2 for the organization domain (using an organizational unit ID). These domains enable DCE systems to map UUIDs to specific entries, such as in privilege attribute certificates, ensuring that identifiers reflect the creating entity's security role within a local cell. Although the field is 8 bits wide (allowing values up to 255), only these three are standardized, with others left for potential future or implementation-specific use.
Generation of a version 2 UUID involves capturing the current UTC timestamp in 100-nanosecond intervals since October 15, 1582, incrementing a 6-bit clock sequence (modulo 64) if the timestamp has not advanced, selecting the appropriate domain and retrieving the corresponding local ID (for example, via getuid() or getgid()), and combining these with the system's 48-bit node ID (MAC address). The local ID and domain are embedded at creation time to record the principal responsible, aiding auditing and authorization without requiring centralized coordination. Unlike version 1, no standard DCE API directly generates version 2 UUIDs; implementations must customize the uuid_create() routine accordingly.
Because they depend on POSIX-specific identifiers and saw limited adoption beyond DCE ecosystems, version 2 UUIDs are largely obsolete today and are omitted from many modern libraries and standards. RFC 9562 reserves the version for DCE security but provides no further details, deferring to the original specification, and version 2 UUIDs are rare in contemporary systems outside legacy DCE environments.

Namespace Name-based (Versions 3 and 5)

Namespace name-based UUIDs, designated as versions 3 and 5, are generated by applying a hash function to the combination of a namespace UUID and a name, ensuring deterministic uniqueness within that namespace. These versions create identifiers from human-readable names that are unique as long as the name is unique within its namespace, making them suitable for applications requiring reproducible UUIDs, such as federated naming systems. Unlike random or time-based UUIDs, the output is always the same for identical inputs, allowing consistent identification across distributed systems without coordination.
The generation process begins with selecting a namespace UUID, which acts as a context for the name, and concatenating the namespace UUID, in network byte order, with the name encoded as a sequence of octets (commonly UTF-8 for strings). For version 3, an MD5 hash is computed over this concatenation, yielding a 128-bit digest from which the UUID fields are derived: the first 32 bits form time_low, the next 16 bits time_mid, the following 16 bits populate time_hi_and_version (with the version bits set to 0011 binary, or 3 in hexadecimal), the subsequent 8 bits fill clock_seq_hi_and_reserved (with the variant bits set to 10 binary), the next 8 bits clock_seq_low, and the final 48 bits the node field. Version 5 follows an identical structure but uses a SHA-1 hash instead of MD5, keeping the first 128 bits of the 160-bit digest, with the version bits in time_hi_and_version set to 0101 binary (or 5 in hexadecimal). Version 5 is recommended over version 3 because of MD5's known weaknesses, though neither version is intended for security-sensitive uses such as credentials. The resulting UUID adheres to the standard variant (the two most significant bits of octet 8 set to 10).
RFC 4122 defines several predefined namespace UUIDs to standardize common use cases, including the DNS namespace (6ba7b810-9dad-11d1-80b4-00c04fd430c8) for domain names, the URL namespace (6ba7b811-9dad-11d1-80b4-00c04fd430c8) for uniform resource locators, and the OID namespace (6ba7b812-9dad-11d1-80b4-00c04fd430c8) for object identifiers. These namespaces enable reproducibility, allowing different systems to independently generate the same UUID for the same name and thus supporting scenarios such as naming resources in distributed directories or registries.
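The name-based procedure can be sketched by hand and checked against the standard library's uuid.uuid5. The function name name_based_v5 is illustrative, and UTF-8 encoding of the name is an assumption that matches what Python's uuid.uuid5 does for string names.

```python
import hashlib
import uuid


def name_based_v5(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Hand-rolled version 5 UUID (sketch)."""
    # Hash the namespace bytes (network byte order) followed by the name.
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    raw = bytearray(digest[:16])        # keep the first 128 of 160 bits
    raw[6] = (raw[6] & 0x0F) | 0x50     # version nibble 0101
    raw[8] = (raw[8] & 0x3F) | 0x80     # variant bits 10
    return uuid.UUID(bytes=bytes(raw))


mine = name_based_v5(uuid.NAMESPACE_DNS, "www.example.com")
print(mine, mine == uuid.uuid5(uuid.NAMESPACE_DNS, "www.example.com"))
```

A version 3 generator would differ only in using hashlib.md5 and the version nibble 0011.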

Randomly Generated (Version 4)

Version 4 UUIDs are generated using random or pseudo-random numbers, providing a method for creating unique identifiers without reliance on timestamps or hardware addresses. Uniqueness rests on high-entropy random bits, which makes this version suitable for environments where deterministic generation is undesirable or impractical; it is the most commonly used version, in part because it reveals nothing about the generating system.
The structure of a version 4 UUID follows the standard 128-bit layout, with specific fixed bits to indicate the version and variant. The version field, consisting of 4 bits (bits 12-15 of the 16-bit time_hi_and_version field), is set to the binary value 0100 to denote version 4. The variant field, using 2 bits (bits 6-7 of the clock_seq_hi_and_reserved octet), is set to 10 to conform to the RFC 4122 variant. The remaining 122 bits are filled with random values, yielding 2^122 (about 5.3 × 10^36) possible UUIDs and making collisions negligible in practical applications.
Generation of version 4 UUIDs requires a source of random numbers, preferably of cryptographic quality, to maximize entropy and prevent predictability from poor seeding or algorithmic weaknesses. The process involves setting the fixed version and variant bits, then populating the other fields with random data: the 32-bit time_low field entirely random; the 16-bit time_mid field entirely random; the 12 least significant bits (0-11) of the time_hi_and_version field random; the 14-bit clock sequence (6 bits from clock_seq_hi_and_reserved, positions 0-5, plus all 8 bits of clock_seq_low) random; and the 48-bit node field entirely random. Filling the fields this way maintains compatibility with the UUID format while distributing entropy evenly.
A key advantage of version 4 UUIDs is their independence from system clocks, which avoids the synchronization issues of time-based versions and enables generation in offline or distributed systems without coordination.
Additionally, by eschewing timestamps and MAC addresses, they enhance privacy by not leaking temporal or hardware-specific information about the generating system. The RFC 4122 standard explicitly recommends this random method for scenarios prioritizing simplicity and security over reproducibility.
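As a minimal sketch of the procedure, a version 4 UUID can be built from 16 random bytes by overwriting only the six fixed bits. The function name random_v4 is illustrative; in practice Python's uuid.uuid4() does the equivalent. The secrets module is used here as the cryptographic-quality source the text calls for.

```python
import secrets
import uuid


def random_v4() -> uuid.UUID:
    """Build a version 4 UUID from 16 random bytes (sketch)."""
    raw = bytearray(secrets.token_bytes(16))
    raw[6] = (raw[6] & 0x0F) | 0x40   # version nibble 0100
    raw[8] = (raw[8] & 0x3F) | 0x80   # variant bits 10
    return uuid.UUID(bytes=bytes(raw))


u = random_v4()
print(u, u.version)
```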

Unix Timestamp with Random (Version 7)

UUID Version 7 (UUIDv7) is a time-ordered variant of the Universally Unique Identifier (UUID) standard, defined in RFC 9562. Its structure incorporates a 48-bit Unix timestamp representing milliseconds since the Unix epoch (January 1, 1970, 00:00:00 UTC, excluding leap seconds), 4 bits for the version number (set to 0111 binary), a 12-bit field for additional randomness or a counter, and 62 bits of additional randomness, with the 2-bit variant field set to 10 binary to indicate RFC compliance.
The layout of UUIDv7 places the 48-bit timestamp in the most significant bits (octets 0-5, the positions occupied by time_low and time_mid in version 1), so that UUIDs generated in temporal sequence sort lexically when represented as strings or binary values. The version bits occupy the high-order 4 bits of octet 6, and the following 12 bits hold random data or a counter. The variant bits are placed in the high-order 2 bits of octet 8, and the remaining 62 bits are random, replacing the clock sequence and node identifier fields used in earlier time-based UUIDs. This configuration, shown in the following bit-level breakdown, prioritizes temporal ordering in the initial 48 bits, followed by randomized bits for uniqueness:
Field | Bits | Description
unix_ts_ms | 0-47 | 48-bit Unix timestamp (ms)
ver | 48-51 | Version (7)
rand_a (or counter/sub-ms) | 52-63 | 12 bits random or monotonic counter
var | 64-65 | Variant (10)
rand_b | 66-127 | 62 bits random
For generation, the process begins by capturing the current Unix timestamp in milliseconds, which is placed in the high bits. Within each millisecond interval, a per-process monotonic counter (up to 4096 values, fitting the 12-bit field) can ensure uniqueness and ordering for multiple UUIDs generated in the same millisecond, while the 62 random bits (sourced from a cryptographically secure pseudorandom number generator, as for version 4) provide high entropy and replace the node identifier, avoiding any dependency on hardware addresses. If the counter overflows within a millisecond, the generator may either increment the timestamp or report an error, so that duplicates are not produced.
This design offers lexical sortability akin to version 6 but needs no clock sequence or node identifier, relying solely on the timestamp and randomness. Consequently, UUIDv7 provides better index locality than purely random UUIDs such as version 4, as temporally proximate identifiers cluster together, improving query performance in sorted or time-range-based operations without requiring separate timestamp fields. Adoption of UUIDv7 is emerging in distributed systems and databases, where its sortable, time-embedded nature supports efficient indexing for timestamp-driven workloads.
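The bit layout can be sketched as a small generator. This is a simplified illustration, not a complete implementation: rand_a is filled with random bits rather than a monotonic counter, so ordering within a single millisecond is not guaranteed. The function name uuid7_sketch and its optional unix_ms parameter are hypothetical.

```python
import secrets
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Assemble a version 7 UUID from a millisecond timestamp (sketch)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand_a = secrets.randbits(12)     # could be a counter instead
    rand_b = secrets.randbits(62)
    value = (
        ((unix_ms & 0xFFFFFFFFFFFF) << 80)  # bits 0-47: unix_ts_ms
        | (0x7 << 76)                       # bits 48-51: version
        | (rand_a << 64)                    # bits 52-63: rand_a
        | (0b10 << 62)                      # bits 64-65: variant
        | rand_b                            # bits 66-127: rand_b
    )
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier < later)
```

Because the timestamp occupies the leading bits, any two values with different millisecond timestamps compare in time order, whatever the random bits are.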

Custom (Version 8)

Version 8 UUIDs provide a flexible framework for custom identifier generation tailored to specific applications or vendors, where the standard layouts of other versions do not suffice. Defined in RFC 9562, this version reserves 122 bits for implementation-specific use while fixing the version field to 8 (binary 1000 in bits 48-51) and the variant field to 10 (bits 64-65), ensuring basic compatibility with UUID parsing systems. This approach allows embedding domain-specific data, such as application metadata or custom hashes, without conflicting with the predefined structures of versions 1 through 7.
Implementations of version 8 UUIDs should fully document their custom layout to enable understanding and potential interoperability, as the RFC does not prescribe any particular algorithm beyond the fixed fields. The 128-bit structure allocates bits 0-47 (custom_a, 48 bits), bits 52-63 (custom_b, 12 bits), and bits 66-127 (custom_c, 62 bits) for user-defined content, leaving the version and variant bits to signal the custom nature. Uniqueness is the responsibility of the implementer, who must ensure that the method used, whether time-based, random, or otherwise, avoids collisions within the intended scope; the layout should also not mimic patterns from other UUID versions, to prevent misinterpretation.
For example, a custom version 8 UUID might incorporate a Unix timestamp in the initial bits followed by application-specific counters, as illustrated in RFC 9562 with the identifier 2489E9AD-2EE2-8E00-8EC9-32D5F69181C0, or use a SHA-256 hash of namespace and name data for deterministic generation, such as 5c146b14-3c52-8afd-938a-375d0df1fbf6. These examples are illustrative only and not recommended for production without modification to suit the domain's needs. The RFC emphasizes that custom formats should prioritize uniqueness guarantees and rigorous testing, avoiding reliance on security properties like those in version 2.
A primary drawback of version 8 UUIDs is diminished interoperability, as undocumented or proprietary layouts may render identifiers unusable across systems or lead to unintended collisions if uniqueness is not properly managed. To mitigate this, the RFC recommends public documentation of algorithms and advises against using version 8 for scenarios requiring broad interoperability, reserving it for controlled, application-specific environments.
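The fixed parts of a version 8 UUID can be sketched as a packing function that accepts the three custom fields and inserts the version and variant bits between them. The function name uuid8_sketch is illustrative, and what goes into custom_a, custom_b, and custom_c is entirely up to the application; the sample values below simply reproduce the bit pattern of the first RFC 9562 example quoted above.

```python
import uuid


def uuid8_sketch(custom_a: int, custom_b: int, custom_c: int) -> uuid.UUID:
    """Pack three application-defined fields into a version 8 UUID (sketch)."""
    if not 0 <= custom_a < (1 << 48):
        raise ValueError("custom_a must fit in 48 bits")
    if not 0 <= custom_b < (1 << 12):
        raise ValueError("custom_b must fit in 12 bits")
    if not 0 <= custom_c < (1 << 62):
        raise ValueError("custom_c must fit in 62 bits")
    value = (
        (custom_a << 80)     # bits 0-47
        | (0x8 << 76)        # bits 48-51: version 1000
        | (custom_b << 64)   # bits 52-63
        | (0b10 << 62)       # bits 64-65: variant 10
        | custom_c           # bits 66-127
    )
    return uuid.UUID(int=value)


print(uuid8_sketch(0x2489E9AD2EE2, 0xE00, 0x0EC932D5F69181C0))
```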

Encoding

Binary Representation

A universally unique identifier (UUID) is represented in binary form as a fixed-size 16-byte (128-bit) array, providing a compact and efficient means of storage and transmission without variable-length overhead. This fixed size simplifies low-level operations such as memory allocation and direct byte manipulation in programming languages.
The byte order for this 16-byte array is big-endian (most significant byte first, also known as network byte order) as specified in RFC 9562. This applies in particular to timestamp-related fields such as time_low, time_mid, and time_hi_and_version, whose multi-byte values are serialized with the most significant octet first. The node identifier field is likewise transmitted in the order it appears on the network wire. However, Windows APIs, such as the GUID structure, store the multi-byte fields (Data1, Data2, and Data3) in little-endian order on little-endian architectures like x86, requiring conversion to big-endian when interfacing with network protocols or standards-compliant systems.
In database systems, UUIDs are commonly stored using a BINARY(16) data type, preserving the exact 16 bytes without additional formatting or padding, which allows for efficient indexing and querying. In C programming environments, a typical representation is a type such as typedef unsigned char uuid_t[16];, which treats the UUID as an opaque byte array to avoid endianness assumptions during local operations. For transmission, protocols that follow RFC 9562 send UUIDs as raw 16-byte sequences in big-endian order without byte swapping or transformation, so they can be used directly in headers or payloads.
To parse the binary representation and extract components such as the version field, bitwise operations are applied directly to the byte array, assuming the standard big-endian layout.
For instance, the UUID version is obtained from the high nibble of the seventh byte (octet 6, zero-based indexing), corresponding to bits 12-15 of the time_hi_and_version field:

version = (bytes[6] >> 4) & 0x0F;

This approach enables quick validation and field access in performance-critical code, such as in cryptographic libraries or identifier generators.
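The difference between the RFC byte order and the mixed-endian GUID storage described above can be seen with Python's uuid module, whose bytes attribute gives the big-endian form and whose bytes_le attribute gives the form with the first three fields byte-swapped. The sample value is arbitrary and chosen so that each byte is recognizable.

```python
import uuid

u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

network_order = u.bytes     # big-endian, as in RFC 9562
guid_order = u.bytes_le     # first three fields little-endian

print(network_order.hex())
print(guid_order.hex())

# Round trip: tell the constructor which layout the bytes are in.
same = uuid.UUID(bytes_le=guid_order) == uuid.UUID(bytes=network_order)
print(same)
```

Only the first eight bytes differ between the two forms; the clock sequence and node bytes are stored identically in both.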

Textual Representation

The textual representation of a UUID, as defined in RFC 9562, consists of 32 hexadecimal digits (using lowercase letters a to f) arranged in five groups separated by hyphens in the format 8-4-4-4-12: the first group contains 8 digits for the time-low field, followed by 4 digits each for time-mid, time-high-and-version, and clock-seq-and-reserved plus clock-seq-low, and finally 12 digits for the node field. For example, a typical UUID appears as 123e4567-e89b-12d3-a456-426614174000. This format ensures human readability and interoperability across systems.
A compact variant omits the hyphens, resulting in a continuous 32-hex-digit string, which is commonly used where brevity is a priority, though it is not the canonical form specified by the RFC. Uppercase hexadecimal letters are permitted on input for compatibility but are not preferred for output, which should use lowercase; the RFC treats hexadecimal values as case-insensitive during processing. When used as a Uniform Resource Name (URN), a UUID is prefixed with urn:uuid:, yielding forms like urn:uuid:123e4567-e89b-12d3-a456-426614174000.
Validation of a UUID string typically involves verifying its length (36 characters with hyphens or 32 without), ensuring all characters are valid hexadecimal digits, and checking the variant and version identifiers embedded in specific positions. The version nibble, the first hexadecimal digit of the third group (the 15th character of the hyphenated string), must be 1 through 8 to indicate one of the defined UUID versions. Similarly, the variant bits, which begin the first digit of the fourth group (the 20th character), should match the RFC 9562 variant (binary 10xx, corresponding to hexadecimal 8, 9, a, or b). Beyond format checks, the RFC provides no formal mechanism to confirm a UUID's overall validity, such as whether it has been assigned or whether its timestamp lies in the future.
Many programming libraries support parsing UUID strings into binary form, often accommodating both hyphenated and compact representations.
For instance, the uuid_parse() function in the libuuid library (part of the util-linux package) converts a standard hyphenated string to a 128-bit binary UUID, expecting the exact 36-character format including hyphens and null terminator.
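The format checks described above can be sketched in Python as follows. This is a minimal illustration rather than a reference validator; the helper name check_uuid_text is hypothetical, and the indices 14 and 19 are the zero-based equivalents of the version and variant digit positions in the hyphenated form.

```python
import re

# Canonical 8-4-4-4-12 layout; hex digits are matched case-insensitively.
_CANONICAL = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def check_uuid_text(text):
    """Return (version_nibble, variant_ok) for a hyphenated UUID string.

    Returns None when the string is not in the 36-character canonical
    layout. An optional "urn:uuid:" prefix is accepted and stripped.
    """
    if text.lower().startswith("urn:uuid:"):
        text = text[len("urn:uuid:"):]
    if not _CANONICAL.fullmatch(text):
        return None
    version_nibble = int(text[14], 16)        # first digit of the third group
    variant_ok = text[19].lower() in "89ab"   # binary 10xx
    return version_nibble, variant_ok
```

For everyday parsing, Python's standard uuid.UUID constructor is more lenient: it also accepts the compact 32-digit form and normalizes the output to lowercase with hyphens.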

Special Values

Nil UUID

The nil UUID is a special form of universally unique identifier defined in the standards for UUIDs, consisting of 128 bits all set to zero. It serves as a reserved value to represent the absence of a UUID, analogous to a null or uninitialized state in data structures. In textual representation, the nil UUID is expressed as 00000000-0000-0000-0000-000000000000. According to RFC 9562, which obsoletes the earlier RFC 4122, this value is explicitly designated as the "nil UUID" and is not produced by any standard UUID generation algorithm. Its variant bits are all zero (which reads as the legacy NCS variant), and its version bits are also 0, distinguishing it from versioned UUIDs.

The nil UUID is commonly used in databases to indicate unassigned or optional identifiers, such as in PostgreSQL, where the all-zero value of the uuid type can act as a flag for an unknown or unset UUID and can be produced by functions like uuid_nil(). In programming environments, it represents uninitialized objects; for instance, Python's uuid module provides uuid.NIL (in Python 3.14 and later) as this constant for scenarios requiring a null UUID placeholder. In APIs and data serialization formats like JSON, it denotes optional fields without a valid UUID, avoiding the need for separate null types while maintaining type consistency. Such usage signals absence clearly without risking collision with generated UUIDs, because the nil value is explicitly reserved and never generated.
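A minimal Python sketch of the nil value used as a placeholder follows. The constant is built by hand so the example does not depend on a recent interpreter, and the helper names is_nil and parent_or_none are hypothetical.

```python
import uuid

# All 128 bits zero; constructed directly rather than relying on uuid.NIL.
NIL = uuid.UUID(int=0)


def is_nil(value):
    """True when the UUID is the reserved all-zero value."""
    return value.int == 0


def parent_or_none(parent_id):
    """Map the nil placeholder to None; pass real identifiers through."""
    return None if is_nil(parent_id) else parent_id
```

Because the variant bits are zero, the standard library reports the variant of this value as uuid.RESERVED_NCS, so code that branches on the variant should test for the nil value first.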

Maximum UUID

The maximum UUID, also known as the Max UUID, is a special value consisting of 128 bits all set to 1, represented in hexadecimal as ffffffff-ffff-ffff-ffff-ffffffffffff. This value serves as the theoretical upper bound within the UUID namespace, contrasting with the nil UUID by representing a "full" state rather than an "empty" one. Defined in RFC 9562, the Max UUID adheres to the overall UUID format but carries the value 15 in its version bits (the four most significant bits of the seventh octet, all set to 1), which is not one of the defined UUID versions 1 through 8. Although not explicitly described in the earlier RFC 4122, it is structurally well formed as a 128-bit value, with variant bits (111) that fall in the range reserved for future definition. In binary form, it is a continuous sequence of 128 ones, making it the largest possible 128-bit value expressible as a UUID.

In practice, the Max UUID is rarely encountered and is never produced by the standard generation algorithms, as it is intended for specific system-level purposes rather than routine identification. It functions as a sentinel value in scenarios requiring a 128-bit UUID placeholder where no valid identifier applies, such as denoting an invalid or uninitialized state in protocols or data structures. Common contexts include overflow protection in UUID-based counters, where it signals the exhaustion of the identifier space, or use as a marker in technical specifications to avoid conflicts with assignable values. Some database systems provide functions to generate this value explicitly as the counterpart to the nil UUID for such sentinel roles. Implementations in programming languages further highlight its specialized role: the Python standard library's uuid module exposes it as uuid.MAX (in Python 3.14 and later) for boundary checks or defaults, while similar constants appear in Rust's uuid crate and Node.js's uuid package to represent the all-ones boundary.

Overall, its adoption emphasizes conceptual completeness in UUID ecosystems without implying routine generation, and because it is reserved it cannot collide with generated identifiers.
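As an illustration of the boundary role described above, the following Python sketch constructs the all-ones value by hand and uses it as an upper bound. The helper name in_range is hypothetical.

```python
import uuid

# All 128 bits set; constructed directly rather than relying on uuid.MAX.
MAX = uuid.UUID(int=(1 << 128) - 1)


def in_range(value, low, high=MAX):
    """Inclusive range test; UUID objects compare by their integer value."""
    return low <= value <= high


# The raw version nibble is 15, which is not one of the defined versions.
RAW_VERSION = (MAX.int >> 76) & 0xF
```

The standard library classifies the variant of this value as uuid.RESERVED_FUTURE, which matches the description of its bits as reserved rather than assigned.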

Collisions

Probability Calculations

The probability of collisions in UUIDs is analyzed using the birthday paradox approximation, which estimates the likelihood of at least one duplicate among $n$ generated identifiers in a space of size $N$:

$$P(\text{collision}) \approx 1 - e^{-n^2 / (2N)}$$

where $N = 2^m$ and $m$ is the number of effective random bits. This formula provides a practical estimate of collision risk across UUID versions, assuming the identifiers are uniformly distributed and independent. Setting the probability to 50% gives $n \approx \sqrt{2N \ln 2} \approx 1.18\sqrt{N}$.

For version 1 and version 6 UUIDs, a collision occurs only if two identifiers share the exact 60-bit timestamp, 14-bit clock sequence, and 48-bit node identifier. Within a single timestamp tick, the clock sequence and node identifier leave a 62-bit space ($m = 62$, $N = 2^{62}$), so duplicates are extremely unlikely unless on the order of $2^{31}$ UUIDs are generated in the same tick with randomly chosen clock sequences and node identifiers, which is practically impossible. Globally, uniqueness is further ensured by distinct timestamps and by unique node identifiers, such as MAC addresses.

Version 2 UUIDs, a legacy DCE Security variant, follow a similar time-based structure to version 1 but replace the low 32 bits of the timestamp with a 32-bit local identifier (such as a POSIX UID or GID) and the low 8 bits of the clock sequence with a local domain number. This leaves only 28 timestamp bits and 6 clock-sequence bits, so for a given node, domain, and local identifier at most 64 distinct UUIDs can be issued per timestamp tick of roughly seven minutes. Uniqueness relies on distinct inputs and is scoped to a specific user or group on a system; because the version is rarely used, detailed probability analyses are uncommon.

Version 4 UUIDs utilize 122 random bits ($m = 122$, $N = 2^{122}$), excluding the fixed version and variant fields. Under the birthday approximation, generating $2^{61}$ (about 2.3 quintillion) UUIDs yields a collision probability of $1 - e^{-1/2} \approx 39\%$, and the 50% point is reached at about $2.7 \times 10^{18}$ UUIDs. Smaller sets such as $2^{50}$ (about $10^{15}$) UUIDs give a probability of only about $2^{-23} \approx 1.2 \times 10^{-7}$. This vast space makes collisions extremely unlikely in most applications.

Version 3 UUIDs, based on MD5 hashing of a namespace and name, inherit MD5's known collision vulnerabilities, where practical attacks can produce distinct inputs with identical 128-bit outputs, though namespace scoping (e.g., DNS or URL) confines risks to specific domains and limits global impact. Version 5 UUIDs use SHA-1 hashing, which also suffers from demonstrated collisions (e.g., chosen-prefix attacks requiring feasible computation), but the same namespace constraints mitigate widespread uniqueness failures; both versions assume input uniqueness to avoid hash-based duplicates. NIST has deprecated SHA-1 due to these weaknesses and announced that SHA-1 should be phased out by December 31, 2030, for cryptographic uses.

Version 7 UUIDs combine a 48-bit Unix timestamp (milliseconds since 1970) with up to 74 random bits ($m = 74$, $N = 2^{74}$), providing less effective randomness than version 4 because 48 bits carry the time. Under the birthday approximation, generating $2^{37}$ (about $1.4 \times 10^{11}$) version 7 UUIDs within the same millisecond would yield about a 39% chance of collision, though such volume in one millisecond is practically impossible. For realistic rates (e.g., millions per second), the risk remains negligible.

Version 8 UUIDs are custom, with structure defined by the implementer, leaving 122 bits for implementation-defined data that may include random components (e.g., 74 or more random bits). Collision probability depends on the effective random bits used ($m$ up to 122); following RFC 9562 guidelines for sufficient entropy keeps risks similar to version 4, but poor implementations could increase vulnerabilities.
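The birthday approximation can be evaluated numerically with a few lines of Python. This is a sketch of the formula only; the function names are hypothetical, and math.expm1 is used so that very small probabilities are not lost to floating-point rounding.

```python
import math


def collision_probability(n, random_bits):
    """Birthday approximation 1 - exp(-n^2 / (2N)) with N = 2**random_bits."""
    exponent = n * n / (2.0 * 2.0 ** random_bits)
    return -math.expm1(-exponent)


def count_for_probability(p, random_bits):
    """Number of identifiers at which the approximation reaches probability p."""
    space = 2.0 ** random_bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))
```

For example, collision_probability(2**61, 122) evaluates to roughly 0.39, and count_for_probability(0.5, 122) to roughly 2.7e18, under the assumption of 122 uniformly random bits.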

Mitigation Strategies

To minimize the risk of collisions in UUID generation, implementations must adhere to the specifications outlined in relevant standards, particularly for random and time-based variants. For Version 4 and Version 7 UUIDs, which rely heavily on randomness, using a cryptographically secure pseudorandom number generator (CSPRNG) is essential to ensure sufficient entropy and unguessability; weak pseudorandom number generators like the standard C rand() function should be avoided, as they can lead to predictable sequences and increased collision probabilities. For Version 8, implementers must ensure adequate random bits and CSPRNG usage to match version 4 security levels.

For Version 1 and Version 6 UUIDs, which incorporate timestamps and node identifiers, maintaining stable, monotonically increasing clocks is critical to prevent duplicates from clock rollbacks or low-resolution timing; if the clock regresses, the clock sequence must be incremented or randomized to maintain uniqueness. Unique node IDs, such as MAC addresses, further reduce collision risks; in their absence, a randomly generated 48-bit node ID with the multicast bit set to 1 is a viable fallback, since it cannot clash with a real network card address. Version 2 follows similar clock and sequence rules but scopes uniqueness via the local UID or GID.

Version 3 and Version 5 UUIDs, being name-based, mitigate cross-domain collisions by scoping generation to predefined namespaces, such as the DNS namespace (UUID: 6ba7b810-9dad-11d1-80b4-00c04fd430c8), ensuring that identical names in different namespaces produce distinct UUIDs through hashing (MD5 for Version 3, SHA-1 for Version 5). Version 5 is preferred over Version 3 due to MD5's known vulnerabilities, though both rely on the uniqueness of the input name within its namespace to avoid collisions.

Beyond version-specific measures, general best practices include generating UUIDs on demand at runtime rather than pre-allocating batches, which can introduce errors if not properly synchronized across distributed systems. In high-volume environments, such as those handling millions of insertions, application-level monitoring for duplicates (via unique or hash indexes, or periodic scans) is recommended to detect and handle any rare collisions promptly. Standard-compliant libraries facilitate these practices: the OSSP UUID library implements Versions 1, 3, 4, and 5 per RFC 4122 (since obsoleted by RFC 9562), using system-appropriate entropy sources for randomness, while Java's java.util.UUID class employs SecureRandom in randomUUID() to generate Version 4 UUIDs with cryptographic strength.
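Two of these measures, the random node ID fallback and namespace scoping, can be sketched with Python's standard library. The helper name random_node_id is hypothetical and example.com is a placeholder; the multicast bit is the least significant bit of the first node octet, which is bit 40 of the 48-bit value.

```python
import secrets
import uuid


def random_node_id():
    """48-bit node ID drawn from a CSPRNG, with the multicast bit set to 1."""
    return secrets.randbits(48) | (1 << 40)


# Time-based UUID that uses the random node instead of a MAC address.
time_based = uuid.uuid1(node=random_node_id())

# The same name hashed under two namespaces yields two different UUIDs,
# while repeating the call with the same inputs is fully deterministic.
dns_scoped = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
url_scoped = uuid.uuid5(uuid.NAMESPACE_URL, "example.com")
```

Python's uuid.uuid4() already draws its bits from the operating system's secure random source, so no extra step is needed for Version 4 in that language.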

Uses

Filesystems and Storage

In filesystems, universally unique identifiers (UUIDs) serve as persistent, hardware-independent labels for volumes and partitions, enabling reliable identification and mounting without reliance on volatile device paths. This approach facilitates seamless operation across diverse hardware configurations and prevents conflicts arising from device enumeration changes. UUIDs are typically generated randomly during filesystem creation, often adhering to version 4 of the UUID standard for high entropy and uniqueness.

The GUID Partition Table (GPT), standardized in the UEFI specification, employs a 128-bit Disk GUID to uniquely identify the entire disk, including its header and associated storage. This GUID is generated randomly upon GPT initialization and stored in the GPT header at byte offset 56, serving as a disk signature that distinguishes it from other storage devices (although a bit-for-bit clone carries the same GUID until it is regenerated). Partition entries in GPT also use GUIDs, one for the partition type and one unique to each partition, ensuring unambiguous recognition in bootloaders and operating systems.

Linux filesystems like ext4 and XFS integrate UUIDs directly into their superblocks for volume identification. For ext4, the UUID is automatically generated as a random 128-bit value during filesystem creation with the mkfs.ext4 command, unless explicitly set via the -U option; this identifier is then referenced in /etc/fstab for stable mounting, decoupling the process from device names like /dev/sda1 that may shift due to hardware additions. Similarly, XFS generates a random UUID by default when formatted with mkfs.xfs; the value is stored in the superblock and can be customized with the -m uuid=value option, allowing consistent administration and mounting via tools like mount and xfs_admin. These UUIDs enable automated detection and configuration in environments with dynamic storage topologies.

Microsoft's NTFS filesystem utilizes 128-bit GUIDs as Object IDs for volumes and files, assigned to metadata structures like the master file table (MFT) records and the volume root. These GUIDs, supported exclusively on NTFS volumes, allow objects to be identified without depending on file paths. The volume's Object ID acts as a persistent GUID for the entire filesystem, complementing the shorter volume serial number, and Windows exposes volume GUID paths of the form \\?\Volume{GUID}\ through tools such as the mountvol command, which also manages volume mount points.

Apple File System (APFS) organizes storage into containers, each identified by a unique 128-bit UUID that encapsulates multiple volumes sharing the same physical space. This container UUID plays a role in encryption, where keybags are encrypted using the UUID to enable rapid, secure erasure of contents by invalidating keys tied to the identifier. For snapshots, APFS leverages the container structure to manage point-in-time copies across volumes, with volumes within the container inheriting contextual metadata from it.

The adoption of UUIDs in these filesystems yields key advantages, including portability across hardware platforms, as identifiers remain constant regardless of port changes or system reconfiguration, which simplifies migration. They also mitigate naming conflicts in multi-disk setups by providing globally unique labels, reducing errors in mounting and data access while enhancing resilience in distributed or cloud environments.
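To illustrate the GPT layout mentioned above, the following Python sketch extracts the Disk GUID from a GPT header held in memory. It is a simplified example on synthetic bytes, not a disk utility: it assumes the caller has already read the header block, checks only the "EFI PART" signature, and skips CRC verification. The function name is hypothetical. GPT stores GUIDs in the mixed-endian layout that uuid.UUID accepts through its bytes_le argument.

```python
import uuid

GPT_SIGNATURE = b"EFI PART"   # first 8 bytes of a GPT header
DISK_GUID_OFFSET = 56         # byte offset of the 16-byte Disk GUID


def disk_guid_from_gpt_header(header):
    """Return the Disk GUID stored in raw GPT header bytes."""
    if header[:8] != GPT_SIGNATURE:
        raise ValueError("not a GPT header")
    raw = header[DISK_GUID_OFFSET:DISK_GUID_OFFSET + 16]
    if len(raw) != 16:
        raise ValueError("header too short")
    return uuid.UUID(bytes_le=raw)


# Synthetic 92-byte header for demonstration; every other field is zeroed.
example_guid = uuid.UUID("123e4567-e89b-42d3-a456-426614174000")
example_header = GPT_SIGNATURE + bytes(48) + example_guid.bytes_le + bytes(20)
```

On Linux, the same identifiers are usually read with existing tools such as blkid rather than by parsing headers by hand.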

Databases and Identification

In relational databases, UUIDs serve as surrogate primary keys, providing globally unique identifiers without relying on sequential values generated by the database. For instance, PostgreSQL includes a native uuid data type that stores 128-bit UUIDs efficiently as binary values, making it suitable for primary keys in distributed environments where uniqueness across systems is essential. This approach avoids the predictability of auto-incrementing integer IDs, which can expose sensitive information through enumeration attacks or reveal database growth patterns.

Certain UUID versions enhance database operations involving time. Versions 1 and 6 incorporate timestamps, enabling temporal queries by extracting creation times directly from the identifier for filtering or ordering records based on when data was inserted. In contrast, version 7 prioritizes sortability by placing a Unix timestamp in the most significant bits, improving index performance in B-tree structures for time-ordered data retrieval.

In NoSQL databases, UUIDs offer alternatives to native identifier schemes. MongoDB defaults to ObjectIds, which are 12-byte values embedding timestamps for efficient indexing and sorting, but UUIDs provide stronger cross-system uniqueness at the cost of larger storage when encoded as binary (16 bytes versus the 12 bytes of an ObjectId). Apache Cassandra commonly employs version 4 UUIDs, randomly generated for even data distribution, as partition keys, ensuring balanced load across nodes in clustered setups without hotspots from sequential patterns.

For indexing, UUIDs are stored in binary format (16 bytes) in systems like PostgreSQL, which is more space-efficient than text representations but twice the size of 8-byte integers like BIGINT, potentially increasing index bloat in high-volume tables. Despite this, binary storage supports fast comparisons and hashing. In distributed transactions, UUIDs can be generated client-side at insert time, ensuring consistency across replicas without central coordination and preventing ID conflicts during merges.
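The sortability of version 7 comes from its bit layout, which the following Python sketch assembles by hand: a 48-bit millisecond timestamp, the version nibble, 12 random bits, the two variant bits, and 62 further random bits. The function name uuid7_sketch is hypothetical. The sketch omits the optional monotonic counter, so values created within the same millisecond are not ordered among themselves; recent Python releases ship a built-in uuid.uuid7() that should be preferred where available.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a version 7 style UUID from a millisecond timestamp and random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 are used
    rand_a = rand >> 68                            # 12 bits
    rand_b = rand & ((1 << 62) - 1)                # 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80      # timestamp in the top 48 bits
    value |= 0x7 << 76                             # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # RFC variant (binary 10)
    value |= rand_b
    return uuid.UUID(int=value)
```

Because the timestamp occupies the most significant bits, comparing two such values (or their 16-byte encodings) orders them by creation millisecond, which is what keeps B-tree inserts clustered near the end of the index.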

Networking and Distributed Systems

In distributed networking and systems, universally unique identifiers (UUIDs) play a critical role in ensuring unambiguous identification of objects, sessions, and messages across heterogeneous environments, preventing conflicts in transient references without relying on centralized coordination. This is particularly valuable in protocols where objects or data must be referenced remotely, as UUIDs provide a 128-bit space that minimizes collision risks even in high-scale, decentralized scenarios.

In the Common Object Request Broker Architecture (CORBA) using the Internet Inter-ORB Protocol (IIOP), UUIDs form part of the object key within Interoperable Object References (IORs), enabling unique identification of distributed objects across ORBs. The DCE UUID format is specified for this purpose in IIOP profiles, allowing clients to invoke methods on remote objects without name resolution dependencies. Similarly, Microsoft's Distributed Component Object Model (DCOM) employs GUIDs, which are equivalent to UUIDs, for interface marshaling, where the Interface Identifier (IID) uniquely specifies the COM interface being accessed, and the Causality Identifier (CID) tracks related call chains during remote activation and invocation.

For modern web-based protocols, RESTful HTTP APIs frequently incorporate UUIDs in resource URLs to denote specific entities, such as /api/resources/{uuid}, which obscures sequential patterns and supports distributed generation without database coordination. ETags for caching can also use UUIDs as opaque validators, supporting efficient conditional requests by comparing resource versions across distributed caches. In gRPC with Protocol Buffers, UUIDs are typically encoded as fixed-length strings (e.g., 36 characters in hyphenated form) or 16-byte fields within message definitions to identify requests, responses, or session objects, facilitating reliable routing in distributed architectures. Custom UUID variants may be defined in application-specific protocols to incorporate network metadata, such as timestamps or node IDs, for use in remoting scenarios.
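The two common wire encodings, a 16-byte field and a 36-character string embedded in a URL, can be sketched in Python as follows. The function names are hypothetical and the /api/resources/ prefix is the placeholder path used above, not a real API.

```python
import uuid

RESOURCE_PREFIX = "/api/resources/"


def to_wire(value):
    """Encode a UUID as the 16-byte big-endian form used in binary messages."""
    return value.bytes


def from_wire(data):
    """Decode a 16-byte field back into a UUID, rejecting other lengths."""
    if len(data) != 16:
        raise ValueError("a UUID field must be exactly 16 bytes")
    return uuid.UUID(bytes=data)


def resource_path(value):
    """Build a resource URL path containing the hyphenated form."""
    return f"{RESOURCE_PREFIX}{value}"


def parse_resource_path(path):
    """Extract the UUID from a resource path; raises ValueError if malformed."""
    if not path.startswith(RESOURCE_PREFIX):
        raise ValueError("unexpected path")
    return uuid.UUID(path[len(RESOURCE_PREFIX):])
```

The binary form is less than half the size of the text form, which is the usual reason to prefer a bytes field in message definitions when human readability is not needed.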

Other Applications

UUIDs find application in software licensing as product keys or identifiers, where a unique value tied to hardware or machine-specific attributes ensures licensed use on designated systems. For instance, some licensing systems bind licenses to the compute resource's identifier, generating a single key that validates the software installation on that specific hardware. Version 5 UUIDs, which are name-based and employ SHA-1 hashing of inputs like machine attributes, support such deterministic generation for reproducible yet unique keys.

In logging and observability systems, UUIDs serve as trace identifiers to correlate events across distributed components in structured logs. OpenTelemetry, a widely adopted observability framework, uses randomly generated 128-bit trace IDs, the same size as a UUID, enabling end-to-end tracing of requests through distributed services.

UUIDs contribute to cryptography by providing unique values for nonces or salts in protocols or hashing schemes, though implementations must prioritize randomness to avoid predictability. Version 4 UUIDs are the usual choice for such uses, provided their random bits come from a cryptographically secure generator.

For containerization, UUIDs assign unique instance identifiers, facilitating orchestration and isolation in environments like Docker and Kubernetes. In Kubernetes, every object receives a globally unique UID based on UUID standards (ISO/IEC 9834-8), distinguishing instances across the cluster regardless of names or namespaces. Docker similarly allocates a unique 64-character hexadecimal identifier as the container ID upon creation (a longer value than a UUID), ensuring unambiguous reference in commands and APIs.

In multimedia applications, UUIDs are embedded as unique identifiers in metadata standards to track and distinguish media files. The Exif ImageUniqueID tag, a fixed-length string encoding a 128-bit value, functions as a UUID-equivalent to assign persistent uniqueness to images, aiding in database cataloging and verification. Similarly, ID3 tags in audio files use the UFID frame for unique file identifiers, where a UUID can be stored under an owner like a music database to enable cross-system recognition without relying on filenames or other mutable data.
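The deterministic, name-based generation mentioned for licensing can be sketched with Python's uuid.uuid5. Everything specific here is a labeled assumption: the namespace is derived from the placeholder name licensing.example.com, and the function name and the fingerprint string are hypothetical.

```python
import uuid

# Hypothetical private namespace for this sketch, itself derived from a DNS name.
LICENSE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "licensing.example.com")


def license_id(product, machine_fingerprint):
    """Derive a reproducible version 5 UUID from a product and a machine string."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from hashing alike.
    name = f"{product}\x00{machine_fingerprint}"
    return uuid.uuid5(LICENSE_NAMESPACE, name)
```

Because anyone who knows the inputs can recompute the same value, such an identifier labels a license but does not authenticate it; a real scheme would pair it with a signature or a server-side check.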
