Universally unique identifier
| Acronym | UUID |
|---|---|
| Organisation | Open Software Foundation (OSF), ISO/IEC, Internet Engineering Task Force (IETF) |
| No. of digits | 32 |
| Example | f81d4fae-7dec-11d0-a765-00a0c91e6bf6 |
| Website | RFC 9562 (obsoleted RFC 4122) |
A universally unique identifier (UUID) is a 128-bit number designed to be a unique identifier for objects in computer systems. UUIDs are designed to be large enough that any randomly-generated UUID will, in practice, be unique from all other UUIDs. The term globally unique identifier (GUID) is also used, mostly in Microsoft-designed systems.[1][2] The standard way to represent UUIDs is as 32 hexadecimal digits, which are split with hyphens into five groups.
Universally unique identifiers are typically generated with a random number generator, with some systems also incorporating the time of generation or other information into the identifier. There are multiple standards for generating UUIDs for different applications with different requirements.[1] While the probability that a UUID value will be duplicated is not zero, it is generally considered negligible.[3][4] Because there are on the order of 10³⁸ possible UUID values, different computer systems can assume that any UUID they generate will be unique across all computer systems in the world: there is no need for systems to coordinate to avoid reusing the same identifier.
UUIDs are in widespread use in modern computer systems and on the internet to label data objects, for example files or database entries. Despite being large enough to be universally unique, UUIDs still have a low overhead and are quick to generate and compare.
History
In the 1980s, Apollo Computer originally used UUIDs in the Network Computing System (NCS). Later, the Open Software Foundation (OSF) used UUIDs for their Distributed Computing Environment (DCE). The design of the DCE UUIDs was partly based on the NCS UUIDs,[5] whose design was in turn inspired by the (64-bit) unique identifiers defined and used pervasively in Domain/OS, an operating system designed by Apollo Computer.[6] Later in the early 1990s, the Microsoft Windows platforms adopted the DCE design as "Globally Unique IDentifiers" (GUIDs).
RFC 4122 registered a URN namespace for UUIDs and recapitulated the earlier specifications, with the same technical content.[2] When in July 2005 RFC 4122 was published as a proposed IETF standard, the ITU had also standardized UUIDs, based on the previous standards and early versions of RFC 4122. On May 7, 2024, RFC 9562[1] was published, introducing 3 new "versions" and clarifying some ambiguities.
Standards
The UUID technology is standardized by various bodies. The definition is standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE).[7][8] The definition is documented as part of ISO/IEC 11578:1996 "Information technology – Open Systems Interconnection – Remote Procedure Call (RPC)" and more recently in ITU-T Rec. X.667 | ISO/IEC 9834-8:2014.[9] The Internet Engineering Task Force (IETF) published the Standards-Track RFC 9562[1] from the "Revise Universally Unique Identifier Definitions Working Group"[10] as a revision of RFC 4122.[2] RFC 4122 is technically equivalent to ITU-T Rec. X.667 | ISO/IEC 9834-8, but is now obsolete.
Format
A UUID is 128 bits in size, in which 2 to 4 bits are used to indicate the format's variant. The most common variant in use, OSF DCE, additionally defines 4 bits for its version.
The use of the remaining bits is governed by the variant/version selected.
Variants
The variant field indicates the format of the UUID (and in case of the legacy UUID also the address family used for the node field). The following variants are defined:
- The Apollo NCS variant (indicated by the one-bit pattern 0xxx₂) is for backwards compatibility with the now-obsolete Apollo Network Computing System 1.5 UUID format developed around 1988. Though different in detail, the similarity with modern UUIDv1 is evident. The variant bits in the current UUID specification coincide with the high bits of the address family octet in NCS UUIDs. Though the address family could hold values in the range 0..255, only the values 0..13 were ever defined. Accordingly, the bit pattern 0xxx avoids conflicts with historical NCS UUIDs, should any still exist in databases.[11] This variant defines "families" as subtype.
- The OSF DCE variant (10xx₂) UUIDs are referred to as RFC 4122/DCE 1.1 UUIDs, or "Leach–Salz" UUIDs, after the authors of the original Internet Draft. This variant defines "versions" as subtype.
- The Microsoft COM/DCOM variant (110x₂) is characterized in the RFC as "reserved, Microsoft Corporation backward compatibility" and was used for early GUIDs on the Microsoft Windows platform.
- The Reserved variant space (111x₂) is not currently used by any specification.
Versions of the OSF DCE variant
The OSF DCE variant defines eight "versions" in the standard, and each version may be more appropriate than the others in specific use cases. The version is indicated by the value of the higher nibble (higher 4 bits, or higher hexadecimal digit) of the 7th byte of the UUID. In hex, this is the character after the second dash. For example, the UUID 9c5b94b1-35ad-49bb-b118-8e8fc24abf80 is version 4, because the digit after the second dash is 4 (...-49bb-...).
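The rule for locating the version can be checked with a short Python sketch. The helper name `version_nibble` is only illustrative; the standard-library `uuid` module is used here just to parse the text into 16 octets.

```python
import uuid


def version_nibble(text: str) -> int:
    """Return the high nibble of the 7th octet, which holds the version."""
    raw = uuid.UUID(text).bytes   # 16 octets in big-endian order
    return raw[6] >> 4            # index 6 is the 7th octet


example = "9c5b94b1-35ad-49bb-b118-8e8fc24abf80"
print(version_nibble(example))        # version read from the binary form
print(example.split("-")[2][0])       # same digit read from the text form
```

Both lines read the same four bits: once from the binary form and once as the first character after the second dash.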
Versions 1 and 6 (date-time and MAC address)
Version 1 concatenates the 48-bit MAC address of the "node" (that is, the computer generating the UUID), with a 60-bit timestamp, being the number of 100-nanosecond intervals since midnight 15 October 1582 Coordinated Universal Time (UTC), the date on which the Gregorian calendar was first adopted by the bulk of Europe. RFC 4122 states that the time value rolls over around 3400 AD,[2]: 3 depending on the algorithm used, which implies that the 60-bit timestamp is a signed quantity. However some software, such as the libuuid library, treats the timestamp as unsigned, putting the rollover time in 5623 AD.[12] The rollover time as defined by ITU-T Rec. X.667 is 3603 AD.[13]: v
A 13-bit or 14-bit "uniquifying" clock sequence extends the timestamp in order to handle cases where the processor clock does not advance fast enough, or where there are multiple processors and UUID generators per node. When UUIDs are generated faster than the system clock could advance, the lower bits of the timestamp fields can be generated by incrementing it every time a UUID is being generated, to simulate a high-resolution timestamp. With each version 1 UUID corresponding to a single point in space (the node) and time (intervals and clock sequence), the chance of two properly generated version-1 UUIDs being unintentionally the same is practically nil. Since the time and clock sequence total 74 bits, 2⁷⁴ (1.8×10²², or 18 sextillion) version-1 UUIDs can be generated per node ID, at a maximal average rate of 163 billion per second per node ID.[2]
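As an illustration of the fields described above, the following Python sketch decodes the timestamp, clock sequence and node of a version-1 UUID. It relies on the `time`, `clock_seq` and `node` attributes of the standard-library `uuid.UUID` class; the helper name `v1_timestamp` is an assumption made for this example, and the timestamp is treated as unsigned.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the version-1 time scale: midnight UTC, 15 October 1582.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def v1_timestamp(u: uuid.UUID) -> datetime:
    """Convert the 60-bit count of 100 ns intervals to a UTC datetime."""
    if u.version != 1:
        raise ValueError("not a version-1 UUID")
    # Ten 100 ns intervals make one microsecond; the remainder is dropped.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


u = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
print(v1_timestamp(u))        # moment of generation
print(f"{u.clock_seq:#06x}")  # 14-bit clock sequence
print(f"{u.node:012x}")       # 48-bit node ID
```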
In contrast to other UUID versions, version-1 and -2 UUIDs based on MAC addresses from network cards rely for their uniqueness in part on an identifier issued by a central registration authority, namely the Organizationally Unique Identifier (OUI) part of the MAC address, which is issued by the IEEE to manufacturers of networking equipment.[14] The uniqueness of version-1 and version-2 UUIDs based on network-card MAC addresses also depends on network-card manufacturers properly assigning unique MAC addresses to their cards, which like other manufacturing processes is subject to error. Virtual machines receive a MAC address in a range that is configurable in the hypervisor.[15] Additionally some operating systems permit the end user to customise the MAC address, notably OpenWRT.[16]
Usage of the node's network card MAC address for the node ID means that a version-1 UUID can be tracked back to the computer that created it. Documents can sometimes be traced to the computers where they were created or edited through UUIDs embedded into them by word processing software. This privacy hole was used when locating the creator of the Melissa virus.[17]
RFC 9562[1] does allow the MAC address in a version-1 (or 2) UUID to be replaced by a random 48-bit node ID, either because the node does not have a MAC address, or because it is not desirable to expose it. In that case, the RFC requires that the least significant bit of the first octet of the node ID should be set to 1.[2] This corresponds to the multicast bit in MAC addresses, and setting it serves to differentiate UUIDs where the node ID is randomly generated from UUIDs based on MAC addresses from network cards, which typically have unicast MAC addresses.[2]
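A random node ID with the multicast bit set can be sketched in Python as follows. The function name `random_node_id` is hypothetical; `uuid.uuid1` accepts a `node` argument, which is used here to show the value being placed in a version-1 UUID.

```python
import secrets
import uuid


def random_node_id() -> int:
    """Return a random 48-bit node ID marked as not being a real MAC address."""
    node = secrets.randbits(48)
    # The first octet occupies bits 47..40, so its least significant bit
    # (the multicast bit of a MAC address) is bit 40 of the 48-bit value.
    return node | (1 << 40)


node = random_node_id()
u = uuid.uuid1(node=node)
print(f"{node:012x}", u)
```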
Version 6 is the same as version 1 except all timestamp bits are ordered from most significant to least significant. This allows systems to sort UUIDs in order of creation simply by sorting them lexically, whereas this is not possible with version 1.
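The reordering can be sketched as a conversion from version 1 to version 6. This is a minimal illustration, assuming the version-6 layout of RFC 9562 (the 48 most significant timestamp bits first, then the version nibble, then the 12 least significant timestamp bits); `v1_to_v6` is a name chosen for this example.

```python
import uuid


def v1_to_v6(u: uuid.UUID) -> uuid.UUID:
    """Rearrange the timestamp of a version-1 UUID into version-6 order."""
    if u.version != 1:
        raise ValueError("not a version-1 UUID")
    ts = u.time                        # 60-bit timestamp, already reassembled
    high48 = ts >> 12                  # most significant 48 bits come first
    low12 = ts & 0xFFF                 # least significant 12 bits follow the version
    low64 = u.int & ((1 << 64) - 1)    # variant, clock sequence and node are unchanged
    return uuid.UUID(int=(high48 << 80) | (0x6 << 76) | (low12 << 64) | low64)


v1 = uuid.UUID("c232ab00-9414-11ec-b3c8-9f6bdeced846")
print(v1_to_v6(v1))
```

Because the most significant timestamp bits now lead the value, comparing two such UUIDs as strings or as integers orders them by creation time.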
Version 2 (date-time and MAC address, DCE security version)
RFC 9562[1] reserves version 2 for "DCE security" UUIDs; but it does not provide any details. For this reason, many UUID implementations omit version 2. However, the specification of version-2 UUIDs is provided by the DCE 1.1 Authentication and Security Services specification.[8]
Version-2 UUIDs are similar to version 1, except that the least significant 8 bits of the clock sequence are replaced by a "local domain" number, and the least significant 32 bits of the timestamp are replaced by an integer identifier meaningful within the specified local domain. On POSIX systems, local-domain numbers 0 and 1 are for user ids (UIDs) and group ids (GIDs) respectively, and other local-domain numbers are site-defined.[8] On non-POSIX systems, all local domain numbers are site-defined.
The ability to include a 40-bit domain/identifier in the UUID comes with a tradeoff. On the one hand, 40 bits allow about 1 trillion domain/identifier values per node ID. On the other hand, with the clock value truncated to the 28 most significant bits, compared to 60 bits in version 1, the clock in a version 2 UUID will "tick" only once every 429.49 seconds, a little more than 7 minutes, as opposed to every 100 nanoseconds for version 1. And with a clock sequence of only 6 bits, compared to 14 bits in version 1, only 64 unique UUIDs per node/domain/identifier can be generated per 7-minute clock tick, compared to 16,384 clock sequence values for version 1.[18]
Versions 3 and 5 (namespace name-based)
Version-3 and version-5 UUIDs are generated by hashing a namespace identifier and name. Version 3 uses MD5 as the hashing algorithm, and version 5 uses SHA-1.[1]
The namespace identifier is itself a UUID. The specification provides constant UUIDs to represent the namespaces for URLs, fully qualified domain names, object identifiers, and X.500 distinguished names; but any desired UUID may be used as a namespace designator.
To determine the version-3 UUID corresponding to a given namespace and name, the UUID of the namespace is transformed to a string of bytes, concatenated with the input name, then hashed with MD5, yielding 128 bits. Then 6 or 7 bits are replaced by fixed values, the 4-bit version (e.g. 0011₂ for version 3), and the 2- or 3-bit UUID "variant" (e.g. 10₂ indicating an RFC 9562[1] UUID, or 110₂ indicating a legacy Microsoft GUID). Since 6 or 7 bits are thus predetermined, only 121 or 122 bits contribute to the uniqueness of the UUID.
Version-5 UUIDs are similar, but SHA-1 is used instead of MD5. Since SHA-1 generates 160-bit digests, the digest is truncated to 128 bits before the version and variant bits are replaced.
Version-3 and version-5 UUIDs have the property that the same namespace and name will map to the same UUID. However, neither the namespace nor name can be determined from the UUID, even if one of them is specified, except by brute-force search. RFC 4122 recommends version 5 (SHA-1) over version 3 (MD5), and warns against use of UUIDs of either version as security credentials.[2]
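The hashing procedure can be written out by hand and compared with the standard library. The sketch below is illustrative only: `name_based_uuid` is a hypothetical helper, the name is assumed to be encoded as UTF-8, and only the 2-bit variant 10 is produced.

```python
import hashlib
import uuid


def name_based_uuid(namespace: uuid.UUID, name: str, version: int) -> uuid.UUID:
    """Build a version-3 (MD5) or version-5 (SHA-1) UUID step by step."""
    data = namespace.bytes + name.encode("utf-8")
    if version == 3:
        digest = hashlib.md5(data).digest()         # exactly 128 bits
    elif version == 5:
        digest = hashlib.sha1(data).digest()[:16]   # 160 bits truncated to 128
    else:
        raise ValueError("only versions 3 and 5 are name-based")
    raw = bytearray(digest)
    raw[6] = (raw[6] & 0x0F) | (version << 4)       # overwrite the 4 version bits
    raw[8] = (raw[8] & 0x3F) | 0x80                 # overwrite 2 bits with variant 10
    return uuid.UUID(bytes=bytes(raw))


print(name_based_uuid(uuid.NAMESPACE_DNS, "www.example.com", 5))
```

Calling the function twice with the same namespace and name yields the same UUID, which is the defining property of these versions.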
Version 4 (random)
A version 4 UUID is randomly generated. As in other UUIDs, 4 bits are used to indicate version 4, and 2 or 3 bits to indicate the variant (10₂ or 110₂ for variants 1 and 2 respectively). Thus, for variant 1 (that is, most UUIDs) a random version 4 UUID will have 6 predetermined variant and version bits, leaving 122 bits for the randomly generated part, for a total of 2¹²², or 5.3×10³⁶ (5.3 undecillion) possible version-4 variant-1 UUIDs. There are half as many possible version 4, variant 2 UUIDs (legacy GUIDs) because there is one less random bit available, 3 bits being consumed for the variant.
Per RFC 9562[1], the seventh octet's most significant 4 bits indicate which version the UUID adheres to. This means that the first hexadecimal digit in the third group always starts with a 4 in UUIDv4s. Visually, this looks like this xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx, where M is the UUID version field. The upper two or three bits of digit N encode the variant. Values are 8, 9, A or B for the 2 bit indication, values C or D for the 3 bit indication. For example, a random UUID version 4, variant 1 could be 8D8AC610-566D-4EF0-9C22-186B2A5ED793.[19]
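A version-4, variant-1 UUID can be assembled by hand to make the fixed bits visible. This is a sketch rather than a recommended generator (the standard library already offers `uuid.uuid4`); the name `make_uuid4` is an assumption for the example.

```python
import os
import uuid


def make_uuid4() -> uuid.UUID:
    """Generate 128 random bits, then overwrite the 6 predetermined bits."""
    raw = bytearray(os.urandom(16))
    raw[6] = (raw[6] & 0x0F) | 0x40   # M: version nibble set to 4
    raw[8] = (raw[8] & 0x3F) | 0x80   # N: top two bits set to 10 (variant 1)
    return uuid.UUID(bytes=bytes(raw))


u = make_uuid4()
text = str(u)
print(text, "M =", text[14], "N =", text[19])
```

In the printed form, position 14 is the digit M and position 19 is the digit N from the pattern above.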
Version 7 (timestamp and random)
Version 7 UUIDs (UUIDv7) are designed for keys in high-load databases and distributed systems.
UUIDv7 begins with a 48-bit big-endian Unix Epoch timestamp with approximately millisecond granularity. The timestamp can be shifted by any time shift value. Directly after the timestamp follows the version nibble, which must have a value of 7. The variant bits have to be 10x. The remaining 74 bits hold an optional randomly seeded counter (at least 12 bits but no longer than 42 bits) followed by random data.
Two counter rollover handling methods can be used together:
- Zero seeded most significant, leftmost guard bit of the counter.
- Increment of the timestamp ahead of the actual time and reinitialize the counter when it overflows.
In a DBMS, a UUIDv7 generator can be shared between threads (tied to a table or to a DBMS instance) or can be thread-local (with worse monotonicity, locality and performance).
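The layout described in this section can be sketched in Python as follows. This is a simplified illustration, not a production generator: it fills all 74 remaining bits with random data and omits the optional counter and the rollover handling, so UUIDs created within the same millisecond are not ordered. The name `make_uuid7` and its `unix_ms` parameter are assumptions for the example.

```python
import os
import time
import uuid
from typing import Optional


def make_uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7: 48-bit millisecond timestamp, version, variant, random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")            # 80 bits, 74 of them kept
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rand
    value = (value & ~(0xF << 76)) | (0x7 << 76)            # version nibble = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)            # variant bits = 10
    return uuid.UUID(int=value)


earlier = make_uuid7(1_700_000_000_000)
later = make_uuid7(1_700_000_000_001)
print(earlier, later, earlier < later)
```

Because the timestamp occupies the most significant bits, UUIDs from different milliseconds compare in creation order.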
Version 8 (custom)
Version 8 only has two requirements:
- The variant bits have to be 10, so the nibble containing the variant must be 8 (0b1000), 9 (0b1001), A (0b1010), or B (0b1011).
- The version nibble has to be the value of 8.
Those requirements tell the system that it is a version 8 UUID. The remaining 122 bits are up to the vendor to customize. The difference from version 4 is that in version 4 those 122 bits are random, whereas in version 8 they are not, because they follow vendor-specific rules.
Special values
The "nil" UUID is 00000000-0000-0000-0000-000000000000; that is, all clear bits.[1] The "max" UUID, sometimes also called the "omni" UUID, is FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF; that is, all set bits.[1]
Encoding
Binary representation
Initially, Apollo Computer designed the UUID with the following wire format:[5][11]
| Name | Offset | Length | Description |
|---|---|---|---|
| time_high | 0x00 | 4 octets / 32 bits | The first 6 octets are the number of four-microsecond (μs) units of time that have passed since 1980-01-01 00:00 UTC. The time 2⁴⁸ × 4 μs after 1980 started was 2015-09-05 05:58:26.84262 UTC. Thus, the last time at which UUIDs could be generated in this original format was in 2015.[20] |
| time_low | 0x04 | 2 octets / 16 bits | |
| reserved | 0x06 | 2 octets / 16 bits | These octets are reserved for future use. |
| family | 0x08 | 1 octet / 8 bits | This octet is an address family. |
| node | 0x09 | 7 octets / 56 bits | These octets are a host ID in the form allowed by the specified address family. |
Later, the UUID was extended by combining the legacy family field with the new variant field. Because the family field only had used the values ranging from 0 to 13 in the past, it was decided that a UUID with the most significant bit set to 0 was a legacy UUID. This gives the following table for the family group:
| MSB 0 | MSB 1 | MSB 2 | Legacy family field value range | In hex | Description |
|---|---|---|---|---|---|
| 0 | x | x | 0–127 (Only 0–13 are used) | 0x00–0x7f | The legacy Apollo NCS UUID |
| 1 | 0 | x | 128–191 | 0x80–0xbf | OSF DCE UUID |
| 1 | 1 | 0 | 192–223 | 0xc0–0xdf | Microsoft COM / DCOM UUID |
| 1 | 1 | 1 | 224–255 | 0xe0–0xff | Reserved for future definition |
The legacy Apollo NCS UUID has the format described in the previous table. The OSF DCE UUID variant is described in RFC 9562[1]. The Microsoft COM / DCOM UUID has its variant described in the Microsoft documentation.
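The table above amounts to a prefix test on the octet at offset 0x08. A minimal Python sketch, with `variant_of` as an illustrative name and the sample UUIDs chosen only to exercise each row:

```python
import uuid


def variant_of(u: uuid.UUID) -> str:
    """Classify a UUID by the most significant bits of the octet at offset 8."""
    octet = u.bytes[8]
    if (octet & 0x80) == 0x00:    # 0xx: 0x00-0x7f
        return "Apollo NCS"
    if (octet & 0xC0) == 0x80:    # 10x: 0x80-0xbf
        return "OSF DCE"
    if (octet & 0xE0) == 0xC0:    # 110: 0xc0-0xdf
        return "Microsoft COM/DCOM"
    return "Reserved"             # 111: 0xe0-0xff


for text in (
    "00000000-0000-0000-0000-000000000000",
    "f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "00000000-0000-0000-c000-000000000046",
    "ffffffff-ffff-ffff-ffff-ffffffffffff",
):
    print(text, variant_of(uuid.UUID(text)))
```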
Endianness
When saving UUIDs to binary format, they are sequentially encoded in big-endian. For example, 00112233-4455-6677-8899-aabbccddeeff is encoded as the bytes 00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff.[21][22]
An exception to this are Microsoft's variant 2 UUIDs ("GUID"): historically used in COM/OLE libraries, they use a little-endian format, but appear mixed-endian with the first three components of the UUID as little-endian and last two big-endian. Microsoft's GUID structure defines the last eight bytes as an 8-byte array, which are serialized in ascending order, which makes the byte representation appear mixed-endian.[23] For example, 00112233-4455-6677-8899-aabbccddeeff is encoded as the bytes 33 22 11 00 55 44 77 66 88 99 aa bb cc dd ee ff.[24][25]
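Both byte orders are exposed by Python's standard-library `uuid.UUID` class, as the `bytes` and `bytes_le` attributes, which makes the difference easy to inspect. The snippet is only a demonstration of the two encodings described above.

```python
import uuid

u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

# Big-endian encoding used by RFC-style UUIDs.
print(u.bytes.hex(" "))

# Mixed-endian encoding used by Microsoft GUIDs: the first three fields are
# byte-swapped, while the last eight bytes stay in order.
print(u.bytes_le.hex(" "))

# Reading the bytes back requires knowing which order was used.
assert uuid.UUID(bytes_le=u.bytes_le) == u
```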
Textual representation
In most cases, UUIDs are represented as hexadecimal values. The most used format is the 8-4-4-4-12 format, xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, where every x represents 4 bits. Other well-known formats are the 8-4-4-4-12 format with braces, {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}, like in Microsoft's systems, e.g. Windows, or xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, where all hyphens are removed. In some cases, it is also possible to have xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx with the "0x" prefix or the "h" suffix to indicate hexadecimal values. The format with hyphens was introduced with the newer variant system. Before that, the legacy Apollo format used a slightly different format: 34dc23469000.0d.00.00.7c.5f.00.00.00. The first part is the time (time_high and time_low combined). The reserved field is skipped. The family field comes directly after the first dot, so in this case 0d (13 in decimal) for DDS (Data Distribution Service). The remaining parts, each separated with a dot, are the node bytes.
The lowercase form of the hexadecimal values is the generally preferred format. Specifically in some contexts such as those defined in ITU-T Rec. X.667, lowercase is required when the text is generated, but the uppercase version must also be accepted.
Like any integer, a UUID can be represented as a decimal number. For example, the UUID 550e8400-e29b-41d4-a716-446655440000 can be represented as the decimal number 113059749145936325402354257176981405696. It is up to the implementation to decide whether this number is signed or unsigned, i.e. whether this decimal number is negative if the first bit is a 1.
A UUID can also be represented in binary, as a string of 128 bits. For example, the UUID 550e8400-e29b-41d4-a716-446655440000 can be represented as 0101010100001110
RFC 9562[1] registers the "uuid" namespace. This makes it possible to make URNs out of UUIDs, like urn:uuid:550e8400-e29b-41d4-a716-446655440000. The normal 8-4-4-4-12 format is used for this. It is also possible to make a OID URN out of UUIDs, like urn:oid:2.25.113059749145936325402354257176981405696. In that case, the unsigned decimal format is used. The "uuid" URN is recommended over the "oid" URN.
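The textual forms discussed in this section can be produced from one value with Python's standard-library `uuid` module. The variable names below are illustrative; the OID URN is assembled by hand from the unsigned integer value because the module has no dedicated helper for it.

```python
import uuid

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

canonical = str(u)                   # 8-4-4-4-12 form, lowercase
braced = "{" + canonical + "}"       # form commonly seen on Microsoft systems
compact = u.hex                      # 32 hexadecimal digits, no hyphens
decimal = str(u.int)                 # unsigned decimal integer
bits = f"{u.int:0128b}"              # string of 128 bits
uuid_urn = u.urn                     # urn:uuid:... form
oid_urn = f"urn:oid:2.25.{u.int}"    # OID form based on the unsigned decimal

for form in (canonical, braced, compact, decimal, bits, uuid_urn, oid_urn):
    print(form)
```

The `uuid.UUID` constructor accepts the canonical, braced, compact and `urn:uuid:` forms as input, so these four round-trip without extra parsing.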
Collisions
Collision occurs when the same UUID is generated more than once and assigned to different referents. In the case of standard version-1 and version-2 UUIDs using unique MAC addresses from network cards, collisions are unlikely to occur, with an increased possibility only when an implementation varies from the standards, either inadvertently or intentionally.
In contrast to version-1 and version-2 UUIDs generated using MAC addresses, with version-1 and -2 UUIDs which use randomly generated node ids, hash-based version-3 and version-5 UUIDs, and random version-4 UUIDs, collisions can occur even without implementation problems, albeit with a probability so small that it can normally be ignored. This probability can be computed precisely based on analysis of the birthday problem.[26]
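For example, the number of random version-4 UUIDs which need to be generated in order to have a 50% probability of at least one collision is 2.71 quintillion, computed as follows:[27]
$$n \approx \frac{1}{2} + \sqrt{\frac{1}{4} + 2 \ln(2) \cdot 2^{122}} \approx 2.71 \times 10^{18}$$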
This number would be equivalent to generating 1 billion UUIDs per second for about 86 years. A file containing this many UUIDs, at 16 bytes per UUID, would be about 43.4 exabytes (37.7 EiB).
The smallest number of version-4 UUIDs which must be generated for the probability of finding a collision to be p is approximated by the formula
$$n(p) \approx \sqrt{2 \cdot 2^{122} \cdot \ln \frac{1}{1 - p}}$$
Thus, the probability to find a duplicate within 103 trillion version-4 UUIDs is one in a billion.
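The figures quoted in this section can be reproduced numerically. The sketch below assumes the usual birthday-problem approximation, n ≈ √(2·N·ln(1/(1−p))) with N = 2¹²² possible version-4 UUIDs, plus the slightly refined expression 1/2 + √(1/4 + 2·ln(2)·N) for the 50% case; the function names are illustrative.

```python
import math

SPACE = 2 ** 122   # number of possible version-4, variant-1 UUIDs


def uuids_for_probability(p: float) -> float:
    """Approximate count of random UUIDs giving collision probability p."""
    return math.sqrt(2 * SPACE * -math.log1p(-p))   # -log1p(-p) = ln(1 / (1 - p))


def uuids_for_even_odds() -> float:
    """Approximate count giving a 50% chance of at least one collision."""
    return 0.5 + math.sqrt(0.25 + 2 * math.log(2) * SPACE)


n_half = uuids_for_even_odds()
print(f"50% collision chance:  {n_half:.3e} UUIDs")
print(f"years at 1e9 per sec:  {n_half / 1e9 / (365.25 * 86400):.1f}")
print(f"storage at 16 bytes:   {n_half * 16 / 1e18:.1f} EB")
print(f"one-in-a-billion risk: {uuids_for_probability(1e-9):.3e} UUIDs")
```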
Collisions have occurred when manufacturers assign a default UUID to a product, such as a motherboard, and then fail to over-write the default UUID later in the manufacturing process. For example, UUID 03000200-0400-0500-0006-000700080009 occurs on many different units of Gigabyte-branded motherboards.[28]
Uses
Filesystems
Significant uses include filesystem userspace tools,[29] most of which are derived from the original implementation by Theodore Ts'o.[12] The "partition label" and the "partition UUID" are both stored in the superblock. They are both part of the file system rather than of the partition. For example, ext2–4 contain a UUID, while NTFS or FAT32 do not. The superblock is a part of the file system, thus fully contained within the partition, hence copying the contents of sda1 onto sdb1 block for block leaves both sda1 and sdb1 with the same label and UUID.
Partition tables
The GUID Partition Table (GPT) is one example that uses GUIDs to label partition types.
Remoting
There are several flavors of GUIDs used in Microsoft's Component Object Model (COM):
- IID – interface identifier (the ones that are registered on a system are stored in the Windows Registry at [HKEY_CLASSES_ROOT\Interface][30])
- CLSID – class identifier (stored at [HKEY_CLASSES_ROOT\CLSID]). In practice it is not entirely separate from the IID space, because remoting the interface can require a proxy/stub object which some toolsets used to create with a CLSID equal to the interface's IID.
- LIBID – type library identifier (stored at [HKEY_CLASSES_ROOT\TypeLib][31])
- CATID – category identifier (its presence on a class identifies it as belonging to certain class categories, listed at [HKEY_CLASSES_ROOT\Component Categories][32])
Databases
UUIDs are commonly used as a unique key in database tables. The NEWID function in Microsoft SQL Server version 4 Transact-SQL returns standard random version-4 UUIDs, while the NEWSEQUENTIALID function returns 128-bit identifiers similar to UUIDs which are committed to ascend in sequence until the next system reboot.[33] The Oracle Database SYS_GUID function does not return a standard GUID, despite the name. Instead, it returns a 16-byte 128-bit RAW value based on a host identifier and a process or thread identifier, somewhat similar to a GUID.[34] PostgreSQL contains a UUID datatype[35] and can generate most versions of UUIDs through the use of functions from modules.[36][37] MySQL provides a UUID function, which generates standard version-1 UUIDs.[38]
Combined Time-GUID
The random nature of standard UUIDs of versions 3, 4, and 5, and the ordering of the fields within standard versions 1 and 2 may create problems with database locality or performance when UUIDs are used as primary keys. For example, in 2002 Jimmy Nilsson reported a significant improvement in performance with Microsoft SQL Server when the version-4 UUIDs being used as keys were modified to include a non-random suffix based on system time. This so-called "COMB" (combined time-GUID) approach made the UUIDs significantly more likely to be duplicated, as Nilsson acknowledged, but Nilsson only required uniqueness within the application.[39] By reordering and encoding version 1 and 2 UUIDs so that the timestamp comes first, insertion performance loss can be averted.[40]
COMB-like arrangements of UUID payloads were eventually standardized in RFC 9562[1] as UUIDv6 and UUIDv7.
References
1. Davis, K.; Peabody, B.; Leach, P. (2024). Universally Unique IDentifiers (UUIDs). Internet Engineering Task Force. doi:10.17487/RFC9562. RFC 9562. Retrieved 9 May 2024.
2. Leach, P.; Mealling, M.; Salz, R. (2005). A Universally Unique IDentifier (UUID) URN Namespace. Internet Engineering Task Force. doi:10.17487/RFC4122. RFC 4122. Retrieved 17 January 2017.
3. "Universally Unique Identifiers (UUID)". H2. Retrieved 21 March 2021.
4. ITU-T Recommendation X.667: Generation and registration of Universally Unique Identifiers (UUIDs) and their use as ASN.1 Object Identifier components. Standard. October 2012.
5. Zahn, Lisa; Dineen, Terence; Leach, Paul; Martin, Elizabeth; Mishkin, Nathaniel; Pato, Joseph; Wyant, Geoffrey (1990). Network Computing Architecture. Prentice Hall. p. 10. ISBN 978-0-13-611674-5.
6. Leach, P. J.; Levine, P. H.; Hamilton, J. A.; Stumpf, B. L. (18–20 August 1982). "UIDs as internal names in a distributed file system". Proceedings of the first ACM SIGACT-SIGOPS symposium on Principles of distributed computing - PODC '82. pp. 34–41. doi:10.1145/800220.806679. ISBN 0-89791-081-8.
7. "DCE 1.1: Remote Procedure Call". The Open Group. 1997.
8. "DCE 1.1: Authentication and Security Services". The Open Group. 1997.
9. "ITU-T Study Group 17 - Object Identifiers (OID) and Registration Authorities Recommendations". ITU.int. Retrieved 28 March 2023.
10. "Revise Universally Unique Identifier Definitions (uuidrev)". Retrieved 30 May 2023.
11. "uuid.c". opensource.apple.com. Archived from the original on 24 February 2021. Retrieved 8 June 2017.
12. "ext2/e2fsprogs.git - Ext2/3/4 filesystem userspace utilities". Kernel.org. Retrieved 9 January 2017.
13. "Recommendation ITU-T X.667". www.itu.int. October 2012. Retrieved 19 December 2020.
14. "Registration Authority". IEEE Standards Association. Archived from the original on 4 April 2011.
15. "MAC addresses for virtual machines". Super User.
16. "MAC Address Setup". OpenWRT. 15 September 2021.
17. Reiter, Luke (2 April 1999). "Tracking Melissa's Alter Egos". ZDNet. Retrieved 16 January 2017.
18. Kuchling, A. M. "What's New in Python 2.5". Python.org. Retrieved 23 January 2016.
19. "draft-ietf-uuidrev-rfc4122bis-14". University of Washington. 6 November 2023. Archived from the original on 17 April 2024.
20. But a bug in Domain/OS made only the first half of the timespace usable, so problems occurred on 1997-11-02. Jim Rees (1996). "Apollo Date Bug".
21. Steele, Nick. "Breaking Down UUIDs".
22. "UUID Versions Explained | UUIDTools.com". www.uuidtools.com.
23. Chen, Raymond (28 September 2022). "Why does COM express GUIDs in a mix of big-endian and little-endian? Why can't it just pick a side and stick with it?". The Old New Thing. Retrieved 31 October 2022.
24. Leach, Paul. "UUIDs and GUIDs".
25. "Guid.ToByteArray Method (System)". learn.microsoft.com.
26. Jesus, Paulo; Baquero, Carlos; Almeida, Paulo. "ID Generation in Mobile Environments" (PDF). Repositorium.Sdum.Uminho.pt.
27. Mathis, Frank H. (June 1991). "A Generalized Birthday Problem". SIAM Review. 33 (2): 265–270. CiteSeerX 10.1.1.5.5851. doi:10.1137/1033051. ISSN 0036-1445. JSTOR 2031144. OCLC 37699182.
28. "Duplicate UUID (Universally Unique ID) - more common than expected - Pocketables". pocketables.com. 27 August 2019.
29. "Wayback Machine". opensource.apple.com. Archived from the original on 6 July 2017. Retrieved 17 September 2017.
30. "Interface Pointers and Interfaces". Windows Dev Center - Desktop app technologies. Microsoft. Retrieved 15 December 2015.
   You reference an interface at run time with a globally unique interface identifier (IID). This IID, which is a specific instance of a globally unique identifier (GUID) supported by COM, allows a client to ask an object precisely whether it supports the semantics of the interface, without unnecessary overhead and without the confusion that could arise in a system from having multiple versions of the same interface with the same name.
31. "Registering a Type Library". Microsoft Developer Network. Microsoft. Retrieved 15 December 2015.
32. "Categorizing by Component Capabilities". Windows Dev Center - Desktop app technologies. Microsoft. Retrieved 15 December 2015.
   A listing of the CATIDs and the human-readable names is stored in a well-known location in the registry.
33. "NEWSEQUENTIALID (Transact-SQL)". Microsoft Developer Network. Microsoft. 8 August 2015. Retrieved 14 January 2017.
34. "Oracle Database SQL Reference". Oracle.
35. "Section 8.12 UUID Type". PostgreSQL 9.4.10 Documentation. PostgreSQL Global Development Group. 13 February 2020.
36. "uuid-ossp". PostgreSQL: Documentation: 9.6. PostgreSQL Global Development Group. 12 August 2021.
37. "pgcrypto". PostgreSQL: Documentation: 9.6. PostgreSQL Global Development Group. 12 August 2021.
38. "Section 13.20 Miscellaneous Functions". MySQL 5.7 Reference Manual. Oracle Corporation.
39. Nilsson, Jimmy (8 March 2002). "The Cost of GUIDs as Primary Keys". InformIT. Retrieved 20 June 2012.
40. "Storing UUID Values in MySQL". Percona. 19 December 2014. Archived from the original on 29 November 2020. Retrieved 10 February 2021.
External links
- Recommendation ITU-T X.667 (Free access)
- ISO/IEC 9834-8:2014 (Paid)
- Technical Note TN2166 - Secrets of the GPT - Apple Developer
- UUID Documentation - Apache Commons Id
- CLSID Key - Microsoft Docs
- Universal Unique Identifier - The Open Group Library
- UUID Decoder tool
- A Brief History of the UUID
- Understanding How UUIDs Are Generated
Universally unique identifier
… f81d4fae-7dec-11d0-a765-00a0c91e6bf6).[1] The variant field in the eighth octet distinguishes the UUID layout, with the defined variant using the bit pattern 10xx for compatibility.[1]
UUIDs come in eight versions, each with distinct algorithms to guarantee uniqueness (detailed in the dedicated section):
- Version 1: Gregorian time-based, incorporating a timestamp and the generating system's node identifier (often a MAC address).[1]
- Version 2: DCE security version, similar to version 1 but includes POSIX user or group IDs for access control.[1]
- Version 3: Name-based, using MD5 hashing of a namespace UUID and a name string.[1]
- Version 4: Random or pseudo-random generation, relying on high-quality randomness for uniqueness.[1]
- Version 5: Name-based, using SHA-1 hashing instead of MD5 for improved security.[1]
- Version 6: Reordered Gregorian time-based, similar to version 1 but with timestamp reordered for improved sorting in databases.[1]
- Version 7: Unix timestamp-based, combining a Unix Epoch timestamp with random bits for uniqueness and monotonicity.[1]
- Version 8: Custom, allowing application-specific algorithms while maintaining the UUID format.[1]
History
Origins in OSF DCE
The universally unique identifier (UUID) was initially developed as a component of the Open Software Foundation's (OSF) Distributed Computing Environment (DCE), a middleware system aimed at facilitating distributed computing in client-server architectures during the early 1990s.[2] The primary goal was to enable the generation of unique identifiers for objects, such as files, processes, or resources, across networked systems without relying on a central coordinating authority, thereby supporting scalability in heterogeneous environments.[2] This approach addressed the challenges of ensuring global uniqueness in distributed setups where multiple nodes might independently create identifiers, avoiding conflicts through decentralized mechanisms.[3] The concept traces its roots to the Apollo Network Computing System (NCS), developed by Apollo Computer, before being formalized and expanded within OSF DCE.[2] Key figures in its specification included Paul J. Leach, then at Microsoft, and Rich Salz, associated with Certco, who co-authored early drafts building directly on the OSF DCE framework.[3] The first specification appeared in OSF DCE version 1.0, released around 1990–1991, defining the UUID as a fixed 128-bit value to balance compactness with sufficient entropy for uniqueness.[4] This structure incorporated a 60-bit timestamp (representing 100-nanosecond intervals since October 15, 1582), a 14-bit clock sequence to handle potential timestamp collisions, and a 48-bit node identifier typically derived from the machine's IEEE 802 MAC address, ensuring both temporal and spatial uniqueness.[5] Early adoption of UUIDs occurred within OSF DCE-based systems for tasks like naming interfaces and binding endpoints in remote procedure calls.[5] Microsoft integrated the OSF DCE remote procedure call (RPC) mechanism, including UUIDs, into its Component Object Model (COM) framework, where they served as globally unique identifiers (GUIDs) for components, interfaces, and classes in 
distributed applications.[2] This integration marked one of the earliest widespread uses beyond pure DCE environments, influencing subsequent implementations in Windows platforms and other enterprise systems.[2]
Standardization and Evolution
The standardization of universally unique identifiers (UUIDs) originated in the Open Software Foundation's Distributed Computing Environment (OSF DCE) in the early 1990s, providing a foundational specification for generating unique identifiers across distributed systems. This initial framework was subsequently formalized as an international standard through ISO/IEC 11578:1996, which defines UUIDs within the context of Open Systems Interconnection—Remote Procedure Call (RPC), ensuring interoperability in information technology environments. In July 2005, the Internet Engineering Task Force (IETF) published RFC 4122, titled "A Universally Unique IDentifier (UUID) URN Namespace," which codified the DCE-based UUID format as a Uniform Resource Name (URN) namespace while introducing three new versions: version 3 for name-based UUIDs using MD5 hashing, version 4 for random or pseudo-random generation, and version 5 for name-based UUIDs using SHA-1 hashing.[2] This RFC addressed gaps in the earlier DCE and ISO specifications by providing a broader set of generation methods suitable for diverse applications, including those not reliant on distributed time synchronization. 
Over the years, RFC 4122 underwent refinements through published errata, resolving ambiguities in areas such as variant bit interpretation and byte order to enhance implementation consistency.[6] Advancements continued with the IETF's uuidrev working group, which in 2023 released draft-ietf-uuidrev-rfc4122bis-08, proposing version 7 UUIDs that incorporate a 48-bit Unix timestamp for improved chronological sortability without depending on synchronized clocks or hardware identifiers.[7] This draft evolved into RFC 9562, published in May 2024 as "Universally Unique IDentifiers (UUIDs)," which obsoletes RFC 4122 and introduces version 6 (a reordered time-based variant of version 1), version 7, and version 8 for custom or experimental applications where implementers define their own layouts within the UUID framework.[1] RFC 9562 also aligns UUID specifications more closely with ITU-T Recommendation X.667 | ISO/IEC 9834-8, emphasizing best practices for generation and usage.
A key aspect of UUID evolution has been the shift away from hardware-dependent generation methods, particularly those using MAC addresses in versions 1 and 2, toward software-only approaches. This transition addresses privacy concerns, as MAC addresses can reveal device identities and enable tracking, compounded by modern operating systems' implementation of MAC address randomization to protect user anonymity in network environments.[1] RFC 9562 explicitly recommends using fixed or random node identifiers instead of actual MAC addresses to maintain uniqueness while mitigating these risks, reflecting broader industry adoption of privacy-preserving identifier strategies.[1]
Standards
Core Specifications
The core specifications for universally unique identifiers (UUIDs) establish a standardized 128-bit format designed to ensure uniqueness across both spatial and temporal dimensions without relying on a central registration authority. This fixed length allows for an immense address space of approximately 3.4 × 10^38 possible values, minimizing collision risks in distributed systems.[2][1]
The foundational international standard, ISO/IEC 11578:1996, defines the representation and generation algorithms for UUIDs within the Open Systems Interconnection (OSI) framework, specifically as part of the Distributed Computing Environment (DCE) remote procedure call (RPC) bindings. It specifies the canonical string format (32 hexadecimal digits grouped as 8-4-4-4-12 with hyphens) and outlines procedures for creating time-based and name-based identifiers to maintain global uniqueness. Compliance with this standard ensures interoperability in OSI-conformant environments by mandating the use of precise bit layouts for timestamp, clock sequence, and node fields.[8][9]
Building on DCE principles, RFC 4122 (published in 2005 by the Internet Engineering Task Force) formalizes UUIDs as a Uniform Resource Name (URN) namespace, defining versions 1 through 5 with distinct generation methods while emphasizing collision avoidance through randomized elements like clock sequences and node identifiers. It specifies the variant field, encoded in the two most significant bits of octet 8 (counting octets from zero) as the binary pattern 10, to distinguish UUIDs generated under this specification from other variants and reserve space for future extensions. For interoperability, all compliant UUIDs must adhere to this variant encoding and include a version nibble (a 4-bit value in the high-order bits of octet 6) that identifies the specific generation algorithm used.
Guidelines for collision avoidance include using unique node identifiers, such as IEEE 802 MAC addresses, and incrementing a clock sequence on system reboots or clock adjustments to prevent duplicates.[2]
RFC 9562 (published in 2024) obsoletes and expands RFC 4122, incorporating versions 6 through 8 to address modern needs like sortable timestamps and custom subtypes, while clarifying and slightly revising the variant rules for enhanced robustness. It reaffirms the 128-bit length and the core requirement for decentralized uniqueness, with no central coordination needed for generation or validation. Updated compliance criteria mandate the variant bits (positions 64-65 in the UUID octet stream) be set to 10 for all defined versions, alongside the version nibble (bits 48-51) matching the UUID type (e.g., 0110 for version 6), ensuring seamless integration across systems and networks. These specifications collectively prioritize global interoperability by standardizing the layout and metadata fields that signal generation provenance.[1]
Related and Variant Standards
In Microsoft Windows environments, the UUID is implemented as a Globally Unique Identifier (GUID), a 16-byte binary structure used extensively in Component Object Model (COM) and Distributed COM (DCOM) to uniquely identify interfaces, classes, and other objects.[10] GUIDs are stored in binary form within the Windows registry to index configuration information for applications and system components, enabling seamless object resolution across distributed systems.[11] This adaptation maintains compatibility with the core UUID structure while integrating with Windows-specific protocols for remote procedure calls.[12] The Bluetooth specification adopts 128-bit UUIDs for identifying services and profiles, often deriving them from shorter 16-bit or 32-bit forms assigned by the Bluetooth Special Interest Group (SIG) to optimize transmission efficiency in low-power devices.[13] These short UUIDs are expanded into full 128-bit equivalents by inserting the assigned value into a fixed base UUID (00000000-0000-1000-8000-00805F9B34FB), ensuring global uniqueness while minimizing on-air bytes in protocols like GATT.[14] Only SIG-assigned short forms are permitted for interoperability, with custom 128-bit UUIDs reserved for vendor-specific extensions.[14] In web and data interchange standards, UUIDs are represented in their canonical string lexical form—a sequence of 32 hexadecimal digits grouped as 8-4-4-4-12 and enclosed in braces for some contexts—to ensure consistent parsing within JSON documents, as JSON natively supports strings without requiring special handling.[15] This format promotes interoperability in JSON-based APIs and serialization, where UUIDs serve as keys or identifiers without altering the underlying 128-bit binary value.[15] Emerging standards and drafts extend UUIDs for cloud environments and enhanced privacy. 
For instance, Amazon Web Services (AWS) incorporates UUIDs within Amazon Resource Names (ARNs) for certain resource identifiers, such as in AWS Glue transformations that generate unique IDs for data rows, facilitating scalable, distributed resource management.[16] Privacy-enhanced variants, addressed in recent IETF updates, introduce new versions like UUIDv7 (time-ordered with high-entropy random bits) and UUIDv8 (custom subtypes), which avoid exposing hardware identifiers like MAC addresses to mitigate tracking risks in distributed systems.[1] These evolutions, formalized in RFC 9562, prioritize randomness and temporal sorting while preserving collision resistance.[17]
Structure
Overall Layout
A Universally Unique Identifier (UUID) is a 128-bit integer value designed to be unique across space and time without centralized coordination.[17] It is typically stored and transmitted in network byte order (big-endian), ensuring consistent interpretation across different systems regardless of local endianness.[17] This fixed 128-bit size provides ample space to avoid collisions while remaining compact for storage and comparison purposes.[17]
In the original time-based layout, the UUID is divided into fixed-width fields: a variant field (2 bits in the RFC variant, indicating the encoding variant), a 4-bit version field (specifying the generation algorithm), a 60-bit timestamp or equivalent value (for time-based uniqueness), a clock sequence field (typically 14 bits, to handle clock adjustments and prevent duplicates), and a 48-bit node ID (often derived from hardware such as a MAC address).[17] Together these account for all 128 bits (2 + 4 + 60 + 14 + 48), so no padding is needed and every field sits at a fixed bit offset during generation and parsing.[17] Visually, the 128-bit UUID can be broken down as follows:
UUID = time_low (32 bits) | time_mid (16 bits) | time_hi_and_version (16 bits) | clock_seq_and_variant (16 bits) | node (48 bits)
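The breakdown above can be illustrated with a minimal Python sketch that slices the 16 big-endian bytes into the five fields. The helper name `split_fields` is hypothetical (it is not part of any standard), and the sample value is the example UUID from the infobox; only the standard-library `uuid` module is assumed.

```python
import uuid


def split_fields(u: uuid.UUID) -> dict:
    """Slice the 16 network-order bytes into the five layout fields."""
    b = u.bytes  # always big-endian, regardless of host endianness
    return {
        "time_low": int.from_bytes(b[0:4], "big"),               # 32 bits
        "time_mid": int.from_bytes(b[4:6], "big"),               # 16 bits
        "time_hi_and_version": int.from_bytes(b[6:8], "big"),    # 16 bits
        "clock_seq_and_variant": int.from_bytes(b[8:10], "big"), # 16 bits
        "node": int.from_bytes(b[10:16], "big"),                 # 48 bits
    }


fields = split_fields(uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6"))
for name, value in fields.items():
    print(f"{name:22} 0x{value:x}")
```

Because the textual form is just the hexadecimal rendering of the same bytes, each printed value matches the corresponding hyphen-separated group, with the last two groups covering `clock_seq_and_variant` and `node`.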
Variant and Version Fields
The variant field in a UUID occupies up to the three most significant bits (bits 0 through 2) of the clock_seq_hi_and_reserved octet (octet 8 in the 128-bit layout), determining the overall layout and interpretation of the remaining bits for interoperability across systems.[18] This field uses specific bit patterns to distinguish between different UUID encoding schemes: 0xx for Network Computing System (NCS) backward compatibility, 10x for the standard defined in RFC 9562 (providing compatibility with earlier RFC 4122 UUIDs), 110 for Microsoft GUID backward compatibility, and 111 reserved for future use.[18] The 10x pattern, in which only the two most significant bits (10) belong to the variant and the third bit carries other data, is the most commonly used in modern implementations.[18]
The version field, a 4-bit nibble, is located in the most significant bits (bits 0 through 3) of the time_hi_and_version octet (octet 6), specifying the UUID generation algorithm and thus how the other fields should be interpreted.[19] Versions 1 through 5 represent the original methods: version 1 for time-based UUIDs using Gregorian timestamps, version 2 for DCE security UUIDs (largely reserved), version 3 for name-based UUIDs using MD5 hashing, version 4 for random or pseudorandom UUIDs, and version 5 for name-based UUIDs using SHA-1 hashing.[19] Updated versions 6 through 8 extend this scheme: version 6 for reordered time-based UUIDs (rearranging version 1 fields for better sorting), version 7 for Unix timestamp-based UUIDs with random components for improved monotonicity, and version 8 for custom or application-specific UUIDs.[19]
To detect the UUID type, systems parse these fields during decoding: the variant bits first classify the layout, followed by the version bits to identify the exact generation method, enabling validation of compliance with RFC 9562.[20] This dual classification is crucial for preventing misinterpretation; for instance, a variant mismatch (e.g., treating a Microsoft GUID as an RFC 9562 UUID) can lead to incorrect extraction of timestamps or node identifiers, causing errors in distributed systems.[18] By standardizing these bits, UUIDs maintain uniqueness and portability across diverse environments without requiring additional metadata.[21]
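The two-step classification described above can be sketched as follows. The function name `classify` and the label strings are illustrative choices, not terms from the RFC; the bit masks follow the 0xx / 10x / 110 / 111 patterns listed earlier in this section.

```python
import uuid


def classify(u: uuid.UUID):
    """Return (variant label, version or None) from octets 8 and 6."""
    b = u.bytes
    octet8 = b[8]
    if octet8 & 0x80 == 0x00:        # 0xx
        variant = "NCS"
    elif octet8 & 0xC0 == 0x80:      # 10x
        variant = "RFC"
    elif octet8 & 0xE0 == 0xC0:      # 110
        variant = "Microsoft"
    else:                            # 111
        variant = "reserved"
    # The version nibble is only meaningful inside the RFC variant.
    version = b[6] >> 4 if variant == "RFC" else None
    return variant, version


print(classify(uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")))
print(classify(uuid.UUID("00000000-0000-0000-c000-000000000046")))
```

Checking the variant before reading the version mirrors the order given in the text: the version nibble is returned as `None` for any UUID outside the 10x variant, since its meaning is undefined there.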
Time and Node Components
In time-based UUIDs, such as versions 1 and 6, the timestamp component provides a measure of the generation time, consisting of a 60-bit value representing the number of 100-nanosecond intervals since the Gregorian calendar epoch of October 15, 1582, 00:00:00.[22] This starting point is the date of the Gregorian calendar reform, which avoids complications from the earlier Julian calendar and ensures a consistent reference across systems.[22] A full 60-bit count of 100-nanosecond intervals spans roughly 3,650 years from the epoch, although the RFC cautions that rollover can occur around A.D. 3400 depending on the algorithm used; either way, this provides ample longevity for practical use.[22]
The timestamp is subdivided and positioned within the 128-bit UUID structure as follows: the least significant 32 bits occupy the time_low field (octets 0-3), the next 16 bits form the time_mid field (octets 4-5), and the most significant 12 bits of the timestamp appear in the time_hi portion of the time_hi_and_version field (octets 6-7, with the remaining 4 bits reserved for the version number).[22] In version 1 UUIDs, this layout preserves the original DCE ordering, while version 6 reorders the fields to place the most significant timestamp bits first for improved chronological sorting in databases.[23] These components collectively ensure that the timestamp contributes to the UUID's uniqueness by embedding precise temporal information.
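As a rough illustration of the version 1 field positions, the sketch below reassembles the 60-bit timestamp and converts it to a calendar date. The helper names `v1_timestamp` and `v1_datetime` are hypothetical, the sample is the infobox example UUID, and the year length of 31,556,952 seconds (the mean Gregorian year) is an assumption used only for the range estimate.

```python
import uuid
from datetime import datetime, timedelta, timezone

GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def v1_timestamp(u: uuid.UUID) -> int:
    """Reassemble the 60-bit count of 100 ns ticks from a version 1 UUID."""
    b = u.bytes
    time_low = int.from_bytes(b[0:4], "big")          # least significant 32 bits
    time_mid = int.from_bytes(b[4:6], "big")          # next 16 bits
    time_hi = int.from_bytes(b[6:8], "big") & 0x0FFF  # top 12 bits, version masked off
    return (time_hi << 48) | (time_mid << 32) | time_low


def v1_datetime(u: uuid.UUID) -> datetime:
    """Convert the tick count to a UTC datetime (microsecond resolution)."""
    return GREGORIAN_EPOCH + timedelta(microseconds=v1_timestamp(u) // 10)


sample = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
print(v1_datetime(sample))

# Range of an unsigned 60-bit tick counter, in mean Gregorian years.
span_years = 2**60 / 10**7 / 31_556_952
print(round(span_years))
```

Python's own `uuid.UUID.time` attribute performs the same reassembly, so it can serve as a cross-check for the manual bit arithmetic.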
To mitigate risks of duplicate UUIDs arising from clock adjustments, such as regressions or resets on the generating system, a 14-bit clock sequence field is included.[22] This sequence is typically initialized to a random value between 0 and 16,383 and is incremented (modulo 16,384) whenever the local clock is found to have regressed relative to the last UUID generation time, or upon node ID changes that could otherwise cause collisions.[22] The clock sequence occupies bits 66-79 in the binary representation: the two variant bits (10) take bits 64-65 of octet 8, the six most significant clock sequence bits fill the remainder of octet 8 (bits 66-71), and the eight least significant bits fill octet 9 (bits 72-79).[22] This mechanism preserves temporal uniqueness even in environments with imperfect clocks, without requiring synchronized time across nodes.
The node ID component, a 48-bit field, identifies the hardware or network interface generating the UUID and occupies the final octets (10-15) in the binary layout.[22] It is conventionally set to the IEEE 802 MAC address of the local node, which uniquely identifies network interfaces worldwide.[22] When a true MAC address is unavailable or to preserve privacy, a randomly generated 48-bit value is used instead, with the multicast bit (the least significant bit of octet 10) set to 1 to distinguish it from unicast MAC addresses.[24] This node ID, combined with the timestamp and clock sequence, ensures global uniqueness by tying the UUID to a specific generating entity.[22] The variant and version fields are integrated adjacent to these components to classify the UUID type without altering their core roles.[25]
UUID Versions
Time-based with MAC Address (Versions 1 and 6)
UUID version 1, also known as the time-based UUID, generates identifiers using a 60-bit timestamp representing the number of 100-nanosecond intervals since 00:00:00.00 UTC on October 15, 1582 (the Gregorian calendar epoch), combined with a 14-bit clock sequence and a 48-bit node identifier.[26] The timestamp is divided into three fields: time_low (32 bits), time_mid (16 bits), and time_hi_and_version (16 bits, with the 4-bit version set to 0001 and the remaining 12 bits for time_hi).[27] The clock sequence prevents duplicates if the system clock is reset or adjusted backward, initialized to a random value between 0 and 16383, while the node field typically holds the IEEE 802 MAC address of the generating machine's network interface; if no MAC is available, a random 48-bit value is used with the multicast bit (least significant bit of the first octet) set to 1.[28][29] The generation process for version 1 UUIDs follows a stateful algorithm to ensure monotonicity: obtain an exclusive lock to access the UUID generation state, retrieve the current UTC timestamp and node ID, compare the timestamp to the previous one—if it has not advanced, increment the clock sequence (or generate a new one if it overflows) and retry the timestamp acquisition up to a system-defined limit, then format the fields into the 128-bit structure and release the lock.[30] This design supports high generation rates, up to approximately 10 million UUIDs per second per node, as the 100-nanosecond granularity allows 10^7 intervals per second.[31] Uniqueness is guaranteed globally without central coordination: the timestamp and node combination ensures no collisions across distinct machines (due to unique MAC addresses), while the clock sequence handles duplicates within the same node and time slot.[32] Version 6 UUIDs, introduced as an update in RFC 9562, maintain the core elements of version 1—60-bit timestamp, 14-bit clock sequence, and 48-bit node—but reorder the timestamp fields for improved 
lexical sorting when stored as binary or text representations, enhancing database index locality and query performance in distributed systems.[33] Specifically, the structure places the most significant 48 bits of the timestamp first (time_high across octets 0-5, split as 32 bits in octets 0-3 and 16 bits in 4-5), followed by the version (0110 in bits 48-51 of octet 6), the least significant 12 bits of the timestamp (time_low in bits 52-63 of octets 6-7), the variant (10 in bits 64-65 of octet 8), the clock sequence (14 bits across octets 8-9), and the node (48 bits in octets 10-15).[33] Generation mirrors version 1, including the stateful timestamp and clock sequence logic, but with the timestamp bytes rearranged post-capture to prioritize higher-order bits for sortability, using the same epoch and node derivation rules.[33]
Both versions 1 and 6 provide strong uniqueness guarantees identical to those of the original DCE specification, with one UUID per 100-nanosecond interval per node, enabling collision-free operation across space and time in uncoordinated environments.[32][33] They are particularly suited for use cases in distributed computing systems, such as the Open Software Foundation's Distributed Computing Environment (DCE), where temporal ordering is beneficial for logging, transaction tracking, or replication without requiring synchronized clocks beyond the node level, offering sortability by timestamp but with privacy risks from potential exposure of the MAC address or node ID.[34][35] Version 6's sorting advantage makes it preferable in modern databases for range queries or partitioning by time.[33]
DCE Security with MAC Address (Version 2)
Version 2 UUIDs, known as DCE security UUIDs, represent a specialized variant of time-based identifiers designed for distributed computing environments requiring embedded security contexts. They extend the core structure of version 1 UUIDs by incorporating local identifiers such as POSIX user IDs (UIDs) or group IDs (GIDs) to associate UUIDs with specific principals for access control and auditing purposes. This variant was specified in the DCE 1.1 Authentication and Security Services standard to support privilege management within DCE cells. The layout of a version 2 UUID mirrors that of version 1 in its overall 128-bit composition, including a 60-bit timestamp split across time_low (32 bits), time_mid (16 bits), and time_hi_and_version (16 bits, with the 4 most significant bits set to 0010 binary to indicate version 2), a 14-bit clock sequence across clock_seq_hi_and_reserved (8 bits, with variant bits 10 in the 2 most significant bits) and clock_seq_low (8 bits), and a 48-bit node field containing the MAC address. However, the time_low field replaces the least significant 32 bits of the timestamp with the 32-bit local identifier (UID or GID), reducing timestamp precision but embedding security information. The clock sequence is effectively shortened to 6 bits in clock_seq_hi_and_reserved (bits 8-13 of the original sequence), while the 8-bit clock_seq_low field holds the domain value, which differentiates the type of local identifier used.[5] The domain value in clock_seq_low specifies the security context and supports three defined values: 0 for the person domain (using a user ID), 1 for the group domain (using a group ID), and 2 for the organization domain (using an organizational unit ID). These domains enable DCE systems to map UUIDs to specific access control entries, such as in privilege attribute certificates, ensuring that identifiers reflect the creating entity's security role within a local cell. 
Although the field is 8 bits (allowing values up to 255), only these three are standardized, with others left for potential future or implementation-specific use.
Generation of a version 2 UUID involves capturing the current UTC timestamp in 100-nanosecond intervals since October 15, 1582, incrementing a 6-bit clock sequence (modulo 64) if the timestamp has not advanced, selecting the appropriate domain and retrieving the corresponding local ID (e.g., via POSIX getuid() or getgid()), and combining these with the system's 48-bit node ID (MAC address). The local ID and domain are embedded at creation time to record the security principal responsible, aiding in auditing and authorization without requiring centralized coordination. Unlike version 1, no standard DCE API directly generates version 2 UUIDs; implementations must customize the uuid_create() routine accordingly.[5]
Due to their dependency on POSIX-specific identifiers and limited adoption beyond DCE ecosystems, version 2 UUIDs are largely obsolete today and omitted from many modern libraries and standards. RFC 9562 reserves the version for DCE security but provides no further details, deferring to the original specification, and notes their rarity in contemporary systems except for legacy DCE or certain Microsoft environments.[17]
Namespace Name-based (Versions 3 and 5)
Namespace name-based UUIDs, designated as versions 3 and 5, are generated by applying a cryptographic hash function to a combination of a predefined namespace UUID and a unique name string, ensuring deterministic uniqueness within that namespace.[2] These versions provide a mechanism to create identifiers from human-readable names that are guaranteed to be unique as long as the name is unique within its specified namespace, making them suitable for applications requiring reproducible UUIDs, such as federated naming systems.[2] Unlike random or time-based UUIDs, the output is always the same for identical inputs, facilitating consistent identification across distributed systems without coordination.[2]
The generation process begins with selecting a namespace UUID, which acts as a context for the name, followed by concatenating the namespace UUID (in network byte order) with the name encoded as a sequence of octets (using UTF-8 for strings).[2] For version 3, an MD5 hash is computed over this concatenation, yielding a 128-bit digest from which the UUID fields are derived: the first 32 bits form time_low, the next 16 bits time_mid, the following 16 bits populate time_hi_and_version (with the version bits set to 0011 binary, or 3 decimal), the subsequent 8 bits fill clock_seq_hi_and_reserved (with variant bits set to 10 binary), the next 8 bits clock_seq_low, and the final 48 bits the node field.[2] Version 5 follows an identical structure but uses a SHA-1 hash instead of MD5, keeping only the first 128 bits of the 160-bit digest, with the version bits in time_hi_and_version set to 0101 binary (or 5 decimal); this substitution is recommended due to MD5's vulnerabilities and provides greater security than version 3, though neither version is intended for security-sensitive applications like credentials.[2] The resulting UUID adheres to the standard variant (the two most significant bits of octet 8 set to 10) and is converted to the appropriate byte order for representation.[2]
RFC 4122 defines several predefined namespace UUIDs to standardize common use cases, including the DNS namespace (6ba7b810-9dad-11d1-80b4-00c04fd430c8) for domain names, the URL namespace (6ba7b811-9dad-11d1-80b4-00c04fd430c8) for uniform resource locators, and the OID namespace (6ba7b812-9dad-11d1-80b4-00c04fd430c8) for object identifiers.[2] These namespaces enable interoperability, allowing different systems to independently generate the same UUID for the same name, thus supporting scenarios like naming resources in distributed directories or registries.[2]
Randomly Generated (Version 4)
Version 4 UUIDs are generated using random or pseudo-random numbers, providing a method for creating unique identifiers without reliance on timestamps or hardware addresses. This approach relies on high-entropy random bits for uniqueness, making it suitable for environments where deterministic generation is undesirable or impractical; it is also the most common type, owing to its security and privacy-friendly properties.[2]
The structure of a Version 4 UUID follows the standard 128-bit layout, with specific fixed bits to indicate the version and variant. The version field, consisting of 4 bits (positions 12-15 in the time_hi_and_version field), is set to the binary value 0100 to denote Version 4. The variant field, using 2 bits (positions 6-7 in the clock_seq_hi_and_reserved octet), is set to 10 to conform to the RFC 4122 variant specification. The remaining 122 bits are filled with random values, yielding 2^122 (about 5.3 × 10^36) possible UUIDs and making collisions negligible in practical applications.[2]
Generation of Version 4 UUIDs requires a source of random numbers, preferably of cryptographic quality to maximize entropy and prevent predictability from poor seeding or algorithmic weaknesses. The process involves setting the fixed version and variant bits, then populating the other fields with random data: the 32-bit time_low field entirely random; the 16-bit time_mid field entirely random; the 12 least significant bits (0-11) of the time_hi_and_version field random; the 14-bit clock sequence (6 bits from clock_seq_hi_and_reserved positions 0-5 plus all 8 bits of clock_seq_low) random; and the 48-bit node field entirely random.
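The procedure above can be sketched in a few lines of Python. The function name `make_v4` is hypothetical, and `os.urandom` is assumed here as the cryptographic-quality random source; in practice the standard library's `uuid.uuid4()` already does the equivalent.

```python
import os
import uuid


def make_v4() -> uuid.UUID:
    """Fill 16 bytes with random data, then overwrite the 6 fixed bits."""
    b = bytearray(os.urandom(16))   # 128 random bits from the OS
    b[6] = (b[6] & 0x0F) | 0x40     # high nibble of octet 6 = 0100 (version 4)
    b[8] = (b[8] & 0x3F) | 0x80     # top two bits of octet 8 = 10 (variant)
    return uuid.UUID(bytes=bytes(b))


u4 = make_v4()
print(u4, u4.version)
```

Filling all 16 bytes first and masking afterwards is simply a convenient way to leave exactly 122 random bits in place.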
This random placement across fields maintains compatibility with the UUID format while distributing entropy evenly.[2]
A key advantage of Version 4 UUIDs is their independence from system clocks, avoiding synchronization issues common in time-based variants and enabling generation in offline or distributed systems without coordination. Additionally, by eschewing timestamps and MAC addresses, they enhance privacy by not leaking temporal or hardware-specific information about the generating system. The RFC 4122 standard explicitly recommends this random method for scenarios prioritizing simplicity and security over reproducibility.[2]
Unix Timestamp with Random (Version 7)
UUID Version 7 (UUIDv7) is a time-ordered variant of the Universally Unique Identifier (UUID) standard, defined in RFC 9562. It incorporates a 48-bit Unix timestamp representing milliseconds since the Unix epoch (January 1, 1970, 00:00:00 UTC, excluding leap seconds), alongside 4 bits for the version number (set to 0111 binary), a 12-bit field for randomness or a counter, and 62 bits of additional randomness, with the 2-bit variant field set to 10 binary to indicate RFC compliance. As a newer version, it emphasizes time-based generation with sortability.[17]
The layout of UUIDv7 places the 48-bit timestamp in the most significant bits, occupying the positions of the 32-bit time_low and 16-bit time_mid fields of the legacy layout, so that UUIDs generated in temporal sequence sort lexically when represented as strings or binary values.[17] The version bits occupy the high-order 4 bits of the time_hi_and_version field, and the remaining 12 bits of that field hold rand_a. The variant bits are placed in the high-order 2 bits of the clock_seq_hi_and_reserved octet, and the remaining 62 bits (rand_b) replace the traditional clock sequence and node identifier fields used in earlier time-based UUIDs.[17] This configuration, illustrated in the following bit-level breakdown, prioritizes temporal ordering in the initial 48 bits followed by randomized bits for uniqueness:
| Field | Bits | Description |
|---|---|---|
| unix_ts_ms | 0-47 | 48-bit Unix timestamp (ms) |
| ver | 48-51 | Version (7) |
| rand_a (or counter/sub-ms) | 52-63 | 12 bits random or monotonic counter |
| var | 64-65 | Variant (10) |
| rand_b | 66-127 | 62 bits random |
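The table translates fairly directly into bit shifts. The following is a minimal sketch, assuming purely random `rand_a` and `rand_b` (no monotonic counter); the function name `make_v7` and its optional `unix_ms` parameter are hypothetical and exist only so the timestamp can be fixed for demonstration.

```python
import secrets
import time
import uuid


def make_v7(unix_ms=None) -> uuid.UUID:
    """Pack unix_ts_ms | ver | rand_a | var | rand_b into 128 bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    ts = unix_ms & ((1 << 48) - 1)      # bits 0-47
    rand_a = secrets.randbits(12)       # bits 52-63
    rand_b = secrets.randbits(62)       # bits 66-127
    value = (
        (ts << 80)
        | (0x7 << 76)                   # bits 48-51: version 7
        | (rand_a << 64)
        | (0b10 << 62)                  # bits 64-65: variant 10
        | rand_b
    )
    return uuid.UUID(int=value)


earlier = make_v7(1_000)
later = make_v7(2_000)
print(earlier < later)  # ordering follows the millisecond timestamp
print(make_v7())
```

With random `rand_a`, two UUIDs created within the same millisecond are not ordered relative to each other; an implementation that needs strict monotonicity would use the counter option mentioned in the table instead.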
Custom (Version 8)
Version 8 UUIDs provide a flexible framework for custom identifier generation tailored to specific applications or vendors, where the standard layouts of other versions do not suffice. Defined in RFC 9562, version 8 reserves 122 bits for implementation-specific use while fixing the version field to 8 (binary 1000 in bits 48-51) and the variant field to 10 (bits 64-65), ensuring basic compatibility with UUID parsing systems.[17] This approach allows embedding domain-specific data, such as sequence numbers, application metadata, or custom hashes, without conflicting with predefined structures in versions 1 through 7.[36]
Implementations of version 8 UUIDs must fully document their custom layout to enable understanding and potential interoperability, as the RFC does not prescribe any particular algorithm beyond the fixed fields. The 128-bit structure allocates bits 0-47 (custom_a, 48 bits), bits 52-63 (custom_b, 12 bits), and bits 66-127 (custom_c, 62 bits) for user-defined content, leaving the version and variant bits to signal the custom nature.[37] Uniqueness is the responsibility of the implementer, who must ensure that the method used (whether time-based, random, or otherwise) avoids collisions within the intended scope, and the layout should not mimic patterns from other UUID versions to prevent misinterpretation.[36]
For example, a custom version 8 UUID might incorporate a Unix timestamp in the initial bits followed by application-specific counters, as illustrated in RFC 9562 with the identifier 2489E9AD-2EE2-8E00-8EC9-32D5F69181C0, or use a SHA-256 hash of namespace and name data for deterministic generation, such as 5c146b14-3c52-8afd-938a-375d0df1fbf6.[38] These examples are illustrative only and not recommended for production without modification to suit the domain's needs.
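A minimal sketch of the version 8 framing, assuming the three custom fields are supplied as plain integers: the helper name `make_v8` is hypothetical, and what the fields mean is entirely up to the application. The sample values are obtained by slicing the first identifier quoted above at the stated bit boundaries.

```python
import uuid


def make_v8(custom_a: int, custom_b: int, custom_c: int) -> uuid.UUID:
    """Wrap 48 + 12 + 62 application-defined bits in version 8 framing."""
    if not (0 <= custom_a < (1 << 48)):
        raise ValueError("custom_a must fit in 48 bits")
    if not (0 <= custom_b < (1 << 12)):
        raise ValueError("custom_b must fit in 12 bits")
    if not (0 <= custom_c < (1 << 62)):
        raise ValueError("custom_c must fit in 62 bits")
    value = (
        (custom_a << 80)    # bits 0-47
        | (0x8 << 76)       # bits 48-51: version 8
        | (custom_b << 64)  # bits 52-63
        | (0b10 << 62)      # bits 64-65: variant 10
        | custom_c          # bits 66-127
    )
    return uuid.UUID(int=value)


example = make_v8(0x2489E9AD2EE2, 0xE00, 0x0EC932D5F69181C0)
print(str(example).upper())
```

The function only enforces the framing; it says nothing about uniqueness, which remains the implementer's responsibility.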
The RFC emphasizes that custom formats should prioritize uniqueness guarantees and rigorous testing, avoiding reliance on security properties like those in version 2.[39]
A primary risk of version 8 UUIDs is diminished interoperability, as undocumented or proprietary layouts may render identifiers unusable across systems or lead to unintended collisions if uniqueness is not properly managed.[40] To mitigate this, the RFC recommends public documentation of algorithms and advises against using version 8 for scenarios requiring broad standardization, reserving it for controlled, application-specific environments.[36]
Encoding
Binary Representation
A universally unique identifier (UUID) is represented in binary form as a fixed-size 16-byte (128-bit) array, providing a compact and efficient means for storage and transmission across systems without introducing variable-length overhead. This binary format ensures interoperability in low-level operations, such as memory allocation or direct byte manipulation in programming languages.[1]
The byte order for this 16-byte array follows big-endian (most significant byte first, also known as network byte order) as specified in RFC 9562, particularly for timestamp-related fields like time_low, time_mid, and time_hi_and_version, where multi-byte values are serialized with the most significant octet first. The node identifier field is likewise transmitted in the order it appears on the network wire, maintaining consistency for cross-platform compatibility. However, implementations in Microsoft Windows APIs, such as the GUID structure, store multi-byte fields (Data1, Data2, and Data3) in little-endian order on little-endian architectures like x86, requiring conversion to big-endian when interfacing with network protocols or standards-compliant systems.[1][10]
In database systems, UUIDs are commonly stored using a BINARY(16) data type, preserving the exact 16 bytes without additional formatting or padding, which allows for efficient indexing and querying. In C programming environments, a typical representation is a structure like typedef unsigned char uuid_t[16];, treating the UUID as an opaque byte array to avoid endianness assumptions during local operations.[1]
For transmission in network protocols, UUIDs are sent as raw 16-byte sequences in big-endian order without byte swapping or transformation, ensuring direct usability in headers or payloads; examples include Server Message Block (SMB) for file sharing and custom HTTP headers in distributed systems.[1]
To parse the binary representation and extract components like the version field, bit manipulation operations are applied directly to the byte array assuming the standard big-endian layout. For instance, the UUID version is obtained from the high nibble of the seventh byte (octet 6, zero-based indexing), corresponding to bits 12-15 of the time_hi_and_version field:
version = (bytes[6] >> 4) & 0x0F;
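The same extraction can be sketched in Python, extended to the variant bits in octet 8. The function name version_and_variant and the variant labels are illustrative assumptions, not part of any standard API.

```python
import uuid


def version_and_variant(raw: bytes):
    """Return (version, variant label) for a 16-byte big-endian UUID."""
    if len(raw) != 16:
        raise ValueError("a UUID is exactly 16 bytes")
    # High nibble of octet 6 holds the version.
    version = (raw[6] >> 4) & 0x0F
    # The most significant bits of octet 8 hold the variant.
    octet8 = raw[8]
    if (octet8 & 0x80) == 0x00:
        variant = "NCS"          # 0xxx
    elif (octet8 & 0xC0) == 0x80:
        variant = "RFC 9562"     # 10xx
    elif (octet8 & 0xE0) == 0xC0:
        variant = "Microsoft"    # 110x
    else:
        variant = "reserved"     # 111x
    return version, variant


example = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
result = version_and_variant(example.bytes)
```

For the example UUID, octet 6 is 0x11 and octet 8 is 0xa7, so the sketch reports version 1 with the RFC 9562 variant.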
Textual Representation
The canonical textual representation of a UUID, as defined in RFC 9562, consists of 32 hexadecimal digits (using lowercase letters a–f) arranged in five groups separated by hyphens in the format 8-4-4-4-12: the first group contains 8 digits for the time-low field, followed by 4 digits each for time-mid, time-high-and-version, clock-seq-and-reserved plus clock-seq-low, and 12 digits for the node field.[1] For example, a typical UUID appears as 123e4567-e89b-12d3-a456-426614174000.[1] This format ensures human readability and interoperability across systems.[1]
A compact variant omits the hyphens, resulting in a continuous 32-hex-digit string, which is commonly used for storage efficiency or in contexts where brevity is prioritized, though it is not the canonical form specified by the RFC.[1] Uppercase hexadecimal letters are permitted on input for parsing but are not preferred for output, which should use lowercase; the RFC treats hexadecimal values as case-insensitive during processing.[1] When used as a Uniform Resource Name (URN), a UUID is prefixed with urn:uuid:, yielding forms like urn:uuid:123e4567-e89b-12d3-a456-426614174000.[1]
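As an illustration of these textual forms, the following sketch uses Python's standard uuid module, which accepts uppercase, compact, and URN input and always emits lowercase output. The variable names are arbitrary.

```python
import uuid

# Uppercase input is accepted when parsing.
u = uuid.UUID("123E4567-E89B-12D3-A456-426614174000")

canonical = str(u)   # lowercase, hyphenated 8-4-4-4-12 form
compact = u.hex      # 32 hex digits, no hyphens
urn = u.urn          # canonical form prefixed with "urn:uuid:"

# All three spellings parse back to the same 128-bit value.
round_trip_ok = (
    uuid.UUID(canonical) == u
    and uuid.UUID(compact) == u
    and uuid.UUID(urn) == u
)
```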
Validation of a UUID string typically involves verifying its length (36 characters with hyphens or 32 without), ensuring all characters are valid hexadecimal digits, and checking the variant and version identifiers embedded in specific positions.[1] The version nibble, located as the first hexadecimal digit of the third group (zero-based index 14 in the hyphenated string), must be 1 through 8 to indicate one of the defined UUID versions.[1] Similarly, the variant bits, starting with the first digit of the fourth group (zero-based index 19), should match the RFC 9562 variant (binary 10xx, corresponding to hexadecimal 8, 9, a, or b) for compatibility; the nil and max UUIDs are deliberate exceptions to both checks.[1] Beyond format checks, the RFC provides no formal mechanism to confirm a UUID's overall validity, such as whether it was ever actually issued or whether an embedded timestamp lies in the future.[1]
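A format check along these lines can be sketched with a regular expression. The pattern and the function name is_rfc_uuid are assumptions made for illustration; the sketch accepts only the hyphenated form, treats versions 1 through 8 as valid, and rejects the nil and max UUIDs because they do not carry the 10xx variant.

```python
import re

# Third group starts with the version digit (1-8); fourth group starts
# with the variant digit (8, 9, a or b for the 10xx variant).
_UUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def is_rfc_uuid(text: str) -> bool:
    """Return True if text is a hyphenated UUID with a defined version."""
    return _UUID_PATTERN.fullmatch(text) is not None


sample = "123e4567-e89b-12d3-a456-426614174000"
version_digit = sample[14]   # first digit of the third group
variant_digit = sample[19]   # first digit of the fourth group
```

Such a check confirms only the shape of the string; it cannot tell whether the identifier was ever issued.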
Many programming libraries support parsing UUID strings into binary form, often accommodating both hyphenated and compact representations. For instance, the uuid_parse() function in the libuuid library (part of the util-linux package) converts a standard hyphenated string to a 128-bit binary UUID, expecting the exact 36-character format including hyphens and null terminator.[41]
Special Values
Nil UUID
The nil UUID is a special form of universally unique identifier defined in the standards for UUIDs, consisting of 128 bits all set to zero.[42] It serves as a reserved value to represent the absence of a UUID, analogous to a null or uninitialized state in data structures.[42] In textual representation, the nil UUID is expressed as 00000000-0000-0000-0000-000000000000.[42] According to RFC 9562, which obsoletes the earlier RFC 4122, this value is explicitly designated as the "nil UUID" and is not produced by any standard UUID generation algorithm.[42] Its variant field evaluates to 0 (following the NCS backward compatibility scheme due to the all-zero bits), and its version field is also 0, distinguishing it from versioned UUIDs.[43][44]
This nil UUID is commonly used in databases to indicate unassigned or optional identifiers, such as in PostgreSQL where the UUID type treats the all-zero value as a flag for an unknown or unset UUID, often inserted via functions like uuid_nil().[45] In programming environments, it represents uninitialized objects; for instance, Python's uuid module provides uuid.NIL as this constant for scenarios requiring a null UUID placeholder.[46] In APIs and data serialization formats like JSON, it denotes optional fields without a valid UUID, avoiding the need for separate null types while maintaining type consistency.[47] Such usage ensures clear signaling of absence without risking collision with generated UUIDs, as the nil value is explicitly reserved for implementation-specific null-like purposes.[42]
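A minimal sketch of treating the nil UUID as an "absent" marker follows. It constructs the value from the integer zero rather than relying on a library constant, so it does not depend on a particular Python release; the helper to_optional is a hypothetical name.

```python
import uuid
from typing import Optional

# All 128 bits zero.
NIL = uuid.UUID(int=0)


def to_optional(value: uuid.UUID) -> Optional[uuid.UUID]:
    """Map the nil UUID to None and pass every other UUID through."""
    return None if value.int == 0 else value


nil_text = str(NIL)
nil_version_bits = NIL.bytes[6] >> 4   # high nibble of octet 6
```

Converting to None at the application boundary keeps the "no identifier" case explicit in code while still allowing the wire format to carry a fixed-size value.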
Maximum UUID
The maximum UUID, also known as the Max UUID, is a special value consisting of 128 bits all set to 1, represented in hexadecimal as ffffffff-ffff-ffff-ffff-ffffffffffff.[17] This value serves as the theoretical upper bound within the UUID namespace, contrasting with the nil UUID by representing a "full" state rather than an "empty" one.[17]
Defined in RFC 9562, the Max UUID adheres to the overall UUID format but features a version number of 15 in its version bits (the high four bits of the seventh octet, octet 6 in zero-based indexing, set to 1111), which does not correspond to any of the defined UUID versions 1 through 8 and is reserved for future extensions.[17] Although not explicitly outlined in the earlier RFC 4122, it remains a valid UUID per the structural rules, as the specification does not prohibit all-ones configurations beyond defined variants.[2] In binary form, it is a continuous sequence of 128 ones, making it the largest possible 128-bit value expressible as a UUID.[17]
In practice, the Max UUID is rarely generated or encountered, as it is primarily reserved for specific system-level purposes rather than routine identification.[17] It functions as a sentinel value in scenarios requiring a 128-bit UUID placeholder where no valid identifier applies, such as denoting an invalid or uninitialized state in protocols or data structures.[17] Common contexts include overflow protection in UUID-based counters, where it signals the exhaustion of the identifier space, or as a reserved marker in technical specifications to avoid conflicts with assignable values.[17] For instance, database systems like Percona Server for MySQL provide functions to generate this value explicitly as the counterpart to the nil UUID for such sentinel roles.[48]
Implementations in programming languages further highlight its specialized role; the Python standard library's uuid module exposes it as uuid.MAX for programmatic use in boundary checks or defaults, while similar constants appear in Rust's uuid crate and Node.js's uuid package to represent the all-ones boundary.[46][49][50] Overall, its adoption emphasizes conceptual completeness in UUID ecosystems without implying routine generation, ensuring it does not collide with probabilistically unique identifiers.[17]
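The boundary role of the Max UUID can be sketched as follows. The value is built from its integer form so the example does not depend on a library constant being available; the function in_range is a hypothetical helper for illustration.

```python
import uuid

NIL = uuid.UUID(int=0)
MAX = uuid.UUID(int=(1 << 128) - 1)   # all 128 bits set to 1


def in_range(value: uuid.UUID, low: uuid.UUID = NIL, high: uuid.UUID = MAX) -> bool:
    """Check whether value lies in the inclusive range [low, high]."""
    return low <= value <= high


max_text = str(MAX)
max_version_bits = MAX.bytes[6] >> 4   # high nibble of octet 6
```

Because Python compares UUID objects by their 128-bit integer value, the nil and max values bracket every other UUID, which makes them convenient default bounds for range scans.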
Collisions
Probability Calculations
The probability of collisions in UUIDs is analyzed using the birthday paradox approximation, which estimates the likelihood of at least one duplicate among $n$ generated identifiers in a space of size $N$:
$$p(n) \approx 1 - e^{-n^2/(2N)},$$
where $N = 2^b$ and $b$ is the number of effective random bits. This formula provides a practical bound for collision risk across UUID versions, assuming uniform distribution and independence.[17]
For version 1 and version 6 UUIDs, collisions occur only if two identifiers share the exact 60-bit timestamp, 14-bit clock sequence, and 48-bit node identifier, so within each timestamp slot the remaining fields span 62 bits ($b = 62$, $N = 2^{62}$). Global uniqueness is further ensured by using unique node identifiers, such as IEEE 802 MAC addresses, which reduces the practical collision risk across distributed systems. Even if the 62 bits were drawn at random, duplicates within one slot would become likely only when generating on the order of $2^{31}$ UUIDs in that slot, which is practically impossible.[17]
Version 2 UUIDs, a legacy DCE security variant, follow a similar time-based structure to version 1 but overwrite the low 32 bits of the timestamp with a local identifier (such as a POSIX UID or GID) and the low 8 bits of the clock sequence with a local-domain number. The coarser timestamp and shorter clock sequence leave fewer distinguishing bits per node than version 1, and uniqueness is scoped to specific users or groups on a system. Uniqueness relies on distinct inputs; because version 2 is rarely used, detailed probability analyses are uncommon.[51]
Version 4 UUIDs utilize 122 random bits ($b = 122$, $N = 2^{122}$), excluding the fixed version and variant fields. Under the birthday approximation, generating $2^{61}$ (about 2.3 quintillion) UUIDs yields a collision probability of roughly 39%, and about $2.7 \times 10^{18}$ UUIDs are needed to reach 50%; for sets many orders of magnitude smaller, the risk is approximately $n^2/(2N)$ and therefore negligible. This vast space makes collisions extremely unlikely in most applications.[17]
Version 3 UUIDs, based on MD5 hashing of a namespace and name, inherit MD5's known collision vulnerabilities, where practical attacks can produce distinct inputs with identical 128-bit outputs, though namespace scoping (e.g., DNS or URL) confines risks to specific domains and limits global impact. Version 5 UUIDs use SHA-1 hashing, which also suffers from demonstrated collisions (e.g., chosen-prefix attacks requiring feasible computation), but the same namespace constraints mitigate widespread uniqueness failures; both versions assume input uniqueness to avoid hash-based duplicates. NIST has deprecated MD5 due to these weaknesses and announced that SHA-1 should be phased out by December 31, 2030, for cryptographic uses.[17][52][53]
Version 7 UUIDs combine a 48-bit Unix timestamp (milliseconds since 1970) with 74 random bits ($b = 74$, $N = 2^{74}$), providing less effective randomness than version 4 because of the fixed time component. Under the birthday approximation, on the order of $2^{37}$ (about $1.4 \times 10^{11}$) version 7 UUIDs would have to be generated within the same millisecond for a collision to become likely, a volume that is practically impossible. For realistic rates (e.g., millions per second), the risk remains negligible.[17]
Version 8 UUIDs are custom, with structure defined by the implementer: the 122 bits outside the version and variant fields are available for implementation-specific data, which may include random components (e.g., at least 74 random bits). Collision probability depends on the effective random bits used ($b$ up to 122); following RFC 9562 guidelines for sufficient entropy ensures risks similar to version 4, but poor implementations could increase vulnerabilities.[17]
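The birthday approximation used in this section can be evaluated numerically. The sketch below is a plain Python rendering of $p(n) \approx 1 - e^{-n^2/(2N)}$ with $N = 2^b$; the function names are illustrative, and math.expm1 is used so that very small probabilities are not lost to floating-point rounding.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate probability of at least one duplicate among n values
    drawn uniformly from a space of 2**bits possibilities."""
    exponent = -(n * n) / float(2 ** (bits + 1))
    # 1 - exp(x) computed as -expm1(x) to stay accurate for tiny x.
    return -math.expm1(exponent)


def count_for_probability(p: float, bits: int) -> float:
    """Approximate number of values needed to reach collision probability p."""
    return math.sqrt(2.0 * (2.0 ** bits) * math.log(1.0 / (1.0 - p)))


p_v4_at_2_61 = collision_probability(2 ** 61, 122)   # version 4, 122 random bits
n_v4_half = count_for_probability(0.5, 122)
p_v4_trillion = collision_probability(10 ** 12, 122)
n_v7_half = count_for_probability(0.5, 74)            # version 7, per millisecond
```

Under these assumptions a trillion version 4 UUIDs carry a collision probability below one in a trillion, while the version 7 figure applies only to identifiers that share the same millisecond.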
Mitigation Strategies
To minimize the risk of collisions in UUID generation, implementations must adhere to the specifications outlined in relevant standards, particularly for random and time-based variants. For Version 4 and Version 7 UUIDs, which rely heavily on randomness, using a cryptographically secure pseudorandom number generator (CSPRNG) is essential to ensure sufficient entropy and unguessability; weak pseudorandom number generators like the standard C rand() function should be avoided, as they can lead to predictable sequences and increased collision probabilities. For Version 8, implementers must ensure adequate random bits and CSPRNG usage to match version 4 security levels.[1][54]
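As a sketch of the CSPRNG recommendation, the following builds a version 4 UUID by hand from Python's secrets module and then stamps the version and variant bits. The function name uuid4_from_csprng is hypothetical; in practice the standard uuid.uuid4() already draws from the operating system's random source, so this is shown only to make the bit layout explicit.

```python
import secrets
import uuid


def uuid4_from_csprng() -> uuid.UUID:
    """Build a version 4 UUID from 16 bytes of CSPRNG output."""
    raw = bytearray(secrets.token_bytes(16))
    raw[6] = (raw[6] & 0x0F) | 0x40   # set version nibble to 4
    raw[8] = (raw[8] & 0x3F) | 0x80   # set variant bits to 10xx
    return uuid.UUID(bytes=bytes(raw))


first = uuid4_from_csprng()
second = uuid4_from_csprng()
```

Six of the 128 bits are fixed by the version and variant fields, which leaves the 122 random bits assumed in the collision estimates.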
For Version 1 and Version 6 UUIDs, which incorporate timestamps and node identifiers, maintaining stable, monotonically increasing clocks is critical to prevent duplicates from clock rollbacks or low-resolution timing; if the clock regresses, the clock sequence must be incremented or randomized to maintain uniqueness. Unique node IDs, such as IEEE 802 MAC addresses, further reduce collision risks, but in their absence, a fallback to a randomly generated 48-bit node ID with the multicast bit set to 1 provides a viable alternative while preserving global uniqueness properties. Version 2 follows similar clock and sequence rules but scopes uniqueness via UID/GID.[1]
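The two safeguards described above, a random node ID with the multicast bit set and a clock sequence that changes when the clock regresses, might be sketched as follows. The helper names and the choice to increment (rather than re-randomize) the clock sequence are assumptions for illustration.

```python
import secrets
import uuid


def random_node_id() -> int:
    """Random 48-bit node ID with the multicast bit set, so it cannot be
    mistaken for a real IEEE 802 MAC address."""
    # The multicast bit is the least significant bit of the first octet,
    # which is bit 40 of the 48-bit value.
    return secrets.randbits(48) | (1 << 40)


def next_clock_seq(clock_seq: int, last_timestamp: int, now_timestamp: int) -> int:
    """Advance the 14-bit clock sequence if the clock moved backwards."""
    if now_timestamp < last_timestamp:
        return (clock_seq + 1) & 0x3FFF
    return clock_seq


node = random_node_id()
u1 = uuid.uuid1(node=node, clock_seq=next_clock_seq(3, last_timestamp=10, now_timestamp=5))
```

Passing node and clock_seq explicitly to uuid.uuid1() keeps the hardware address out of the identifier while preserving the version 1 layout.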
Version 3 and Version 5 UUIDs, being name-based, mitigate cross-domain collisions by scoping generations to predefined namespaces, such as the DNS namespace (UUID: 6ba7b810-9dad-11d1-80b4-00c04fd430c8), ensuring that identical names in different namespaces produce distinct UUIDs through hashing (MD5 for Version 3, SHA-1 for Version 5). Version 5 is preferred over Version 3 due to MD5's known vulnerabilities, though both rely on the uniqueness of the input name within its namespace to avoid collisions.[1]
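The namespace scoping described here can be made concrete with a sketch that computes a version 5 UUID by hand and compares it with the standard library's uuid.uuid5(). The function uuid5_manual is a hypothetical name; the construction (SHA-1 over the namespace bytes followed by the name, truncated to 16 bytes, with version and variant bits overwritten) follows the usual name-based procedure.

```python
import hashlib
import uuid


def uuid5_manual(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Name-based UUID: SHA-1 of namespace bytes + name, truncated to 128 bits."""
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    raw = bytearray(digest[:16])
    raw[6] = (raw[6] & 0x0F) | 0x50   # set version nibble to 5
    raw[8] = (raw[8] & 0x3F) | 0x80   # set variant bits to 10xx
    return uuid.UUID(bytes=bytes(raw))


in_dns = uuid5_manual(uuid.NAMESPACE_DNS, "python.org")
in_url = uuid5_manual(uuid.NAMESPACE_URL, "python.org")
```

The same name hashed under the DNS and URL namespaces yields two unrelated identifiers, while repeating either call always reproduces the same value.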
Beyond version-specific measures, general best practices include generating UUIDs on demand during runtime rather than pre-allocating batches, which can introduce errors if not properly synchronized across distributed systems; in high-volume environments, such as databases handling millions of insertions, application-level monitoring for duplicates (via unique or hash indexes, or periodic scans) is recommended to detect and handle any rare collisions promptly. Standard-compliant libraries facilitate these practices: the OSSP UUID library implements Versions 1, 3, 4, and 5 per RFC 4122 (since obsoleted by RFC 9562), using system-appropriate entropy sources for randomness, while Java's java.util.UUID class employs SecureRandom for randomUUID() to generate Version 4 UUIDs with cryptographic strength.[1][55][56]
Uses
Filesystems and Storage
In filesystems, universally unique identifiers (UUIDs) serve as persistent, hardware-independent labels for volumes and partitions, enabling reliable identification and mounting without reliance on volatile device paths. This approach facilitates seamless operation across diverse hardware configurations and prevents conflicts arising from device enumeration changes. UUIDs are typically generated randomly during filesystem creation, often adhering to version 4 of the UUID standard for high entropy and collision resistance.[57]
The GUID Partition Table (GPT), standardized in the UEFI specification, employs a 128-bit Disk GUID to uniquely identify the entire disk, including its header and associated storage. This GUID is generated randomly upon GPT initialization and stored in the GPT header at byte offset 56, serving as a disk signature that distinguishes it from other storage devices. Partition entries in GPT also use UUIDs for type identification and unique partitioning, ensuring unambiguous recognition in bootloaders and operating systems.[57]
Linux filesystems like ext4 and XFS integrate UUIDs directly into their superblocks for volume identification. For ext4, the UUID is automatically generated as a random 128-bit value during filesystem creation with the mkfs.ext4 command, unless explicitly set via the -U option; this identifier is then referenced in /etc/fstab for stable mounting, decoupling the process from device names like /dev/sda1 that may shift due to hardware additions. Similarly, XFS generates a random UUID by default when formatted with mkfs.xfs, stored in the superblock and customizable with the -m uuid=value option, allowing consistent administration and mounting via tools like mount and xfs_admin. These UUIDs enable automated detection and configuration in environments with dynamic storage topologies.[58][59]
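Reading the Disk GUID at byte offset 56 of a GPT header, as mentioned above, might look like the following sketch. It assumes the header has already been read into memory, that it begins with the "EFI PART" signature, and that the GUID is stored in the mixed-endian layout used by UEFI structures; the function name and the synthetic header used to exercise it are illustrative only.

```python
import uuid

GPT_SIGNATURE = b"EFI PART"
DISK_GUID_OFFSET = 56   # byte offset of the Disk GUID within the header


def disk_guid_from_gpt_header(header: bytes) -> uuid.UUID:
    """Extract the Disk GUID from an in-memory GPT header."""
    if header[:8] != GPT_SIGNATURE:
        raise ValueError("missing GPT signature")
    field = header[DISK_GUID_OFFSET:DISK_GUID_OFFSET + 16]
    if len(field) != 16:
        raise ValueError("header too short to contain a Disk GUID")
    # Assumption: first three GUID fields are stored little-endian.
    return uuid.UUID(bytes_le=field)


# Synthetic header for demonstration: signature, zero padding up to
# offset 56, a known GUID, then zero padding to a 92-byte header.
known = uuid.UUID("123e4567-e89b-42d3-a456-426614174000")
fake_header = GPT_SIGNATURE + bytes(48) + known.bytes_le + bytes(20)
parsed = disk_guid_from_gpt_header(fake_header)
```

On a real system the header would come from the disk's second logical block, which requires elevated privileges and knowledge of the sector size; neither is handled here.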
Microsoft's NTFS filesystem utilizes 128-bit GUIDs as Object IDs for volumes and files, assigned to metadata structures like the master file table (MFT) records and the volume root. These GUIDs, supported exclusively on NTFS volumes, facilitate secure identification and access, particularly in security descriptors where they link ownership and permissions without depending on file paths. The volume's Object ID acts as a persistent GUID for the entire filesystem, complementing the 64-bit volume serial number and enabling features like volume mount points via the mountvol command, which references volumes as \\?\Volume{GUID}\.[60][61]
Apple File System (APFS) organizes storage into containers, each identified by a unique 128-bit UUID that encapsulates multiple volumes sharing the same physical space. This container UUID plays a critical role in encryption, where keybags are encrypted using the UUID to enable rapid, secure erasure of contents by invalidating keys tied to the identifier. For snapshots, APFS leverages the container structure to manage point-in-time copies across volumes, with the UUID ensuring integrity and isolation during operations like cloning or rollback, as volumes within the container inherit contextual metadata from it.[62]
The adoption of UUIDs in these filesystems yields key advantages, including portability across hardware platforms, as identifiers remain constant regardless of port changes or system reconfiguration, thus simplifying migration and virtualization. They also mitigate naming conflicts in multi-disk setups by providing globally unique labels, reducing errors in mounting and data access while enhancing resilience in distributed or cloud environments.[63]
Databases and Identification
In relational databases, UUIDs serve as surrogate primary keys, providing globally unique identifiers without relying on sequential values generated by the database. For instance, PostgreSQL includes a native uuid data type that stores 128-bit UUIDs efficiently as binary values, making it suitable for primary keys in distributed environments where uniqueness across systems is essential.[64] This approach avoids the predictability of auto-incrementing integer IDs, which can expose sensitive information through enumeration attacks or reveal database growth patterns.[65]
Certain UUID versions enhance database operations involving time. Versions 1 and 6 incorporate timestamps, enabling temporal queries by extracting creation times directly from the identifier for filtering or ordering records based on when data was inserted.[1] In contrast, version 7 prioritizes sortability by placing a Unix timestamp in the most significant bits, improving index performance in B-tree structures for time-ordered data retrieval.[1]
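The timestamp properties of these versions can be sketched in Python. The version 1 helper converts the 60-bit count of 100-nanosecond intervals since 15 October 1582 into a UTC datetime; the version 7 helpers build and read the 48-bit millisecond prefix by hand so that the example does not depend on a particular library release. All function names are hypothetical.

```python
import datetime
import os
import time
import uuid

# 100-ns intervals between 1582-10-15 and 1970-01-01.
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000


def v1_datetime(u: uuid.UUID) -> datetime.datetime:
    """Creation time embedded in a version 1 UUID, as a UTC datetime."""
    unix_100ns = u.time - GREGORIAN_TO_UNIX_100NS
    return datetime.datetime.fromtimestamp(unix_100ns / 1e7, tz=datetime.timezone.utc)


def make_v7(unix_ms=None) -> uuid.UUID:
    """Version 7 layout: 48-bit millisecond timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    raw = bytearray(unix_ms.to_bytes(6, "big") + os.urandom(10))
    raw[6] = (raw[6] & 0x0F) | 0x70   # set version nibble to 7
    raw[8] = (raw[8] & 0x3F) | 0x80   # set variant bits to 10xx
    return uuid.UUID(bytes=bytes(raw))


def v7_unix_ms(u: uuid.UUID) -> int:
    """Millisecond timestamp stored in the top 48 bits of a version 7 UUID."""
    return u.int >> 80


created = v1_datetime(uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6"))
earlier, later = make_v7(1_700_000_000_000), make_v7(1_700_000_000_001)
```

Because the timestamp occupies the most significant bits, version 7 values created in later milliseconds always compare greater, which is the property that keeps B-tree inserts close to append-only.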
In NoSQL databases, UUIDs offer alternatives to native identifier schemes. MongoDB defaults to ObjectIds, which are 12-byte values embedding timestamps for efficient indexing and sorting, but UUIDs provide stronger cross-system uniqueness at the cost of larger storage when encoded as binary (16 bytes versus ObjectId's compact form). Cassandra employs version 4 UUIDs—randomly generated for even data distribution—as partition keys, ensuring balanced load across nodes in clustered setups without hotspots from sequential patterns.[66]
For indexing, UUIDs are stored in binary format (16 bytes) in systems like PostgreSQL, which is more space-efficient than text representations but doubles the size of 8-byte integers like BIGINT, potentially increasing index bloat in high-volume tables.[64] Despite this, binary storage supports fast comparisons and hashing. In distributed transactions, UUIDs are generated client-side at insert time, ensuring consistency across replicas without central coordination and preventing ID conflicts during merges.[65]
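Client-side generation with compact binary storage can be sketched using SQLite from Python's standard library, swapped in here for PostgreSQL purely so that the example is self-contained. The table and function names are hypothetical; the 16-byte value is stored in a BLOB primary key, and the primary-key constraint doubles as a duplicate detector.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id BLOB PRIMARY KEY, label TEXT NOT NULL)")


def insert_item(db: sqlite3.Connection, label: str) -> uuid.UUID:
    """Generate the key on the client and store it as 16 raw bytes."""
    new_id = uuid.uuid4()
    db.execute("INSERT INTO item (id, label) VALUES (?, ?)", (new_id.bytes, label))
    return new_id


def find_label(db: sqlite3.Connection, item_id: uuid.UUID):
    """Look up a row by its binary UUID key; returns None if absent."""
    row = db.execute("SELECT label FROM item WHERE id = ?", (item_id.bytes,)).fetchone()
    return row[0] if row else None


key = insert_item(conn, "first")
binary_size = len(key.bytes)   # bytes stored per key
text_size = len(str(key))      # characters in the hyphenated form
```

Storing the binary form uses 16 bytes per key instead of 36 characters, and any attempt to insert an already-used key fails with an integrity error rather than silently overwriting data.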
Networking and Distributed Systems
In distributed networking and systems, universally unique identifiers (UUIDs) play a critical role in ensuring unambiguous identification of objects, sessions, and messages across heterogeneous environments, preventing conflicts in transient references without relying on centralized coordination. This is particularly valuable in protocols where objects or data must be referenced remotely, as UUIDs provide a 128-bit space that minimizes collision risks even in high-scale, decentralized scenarios.
In the Common Object Request Broker Architecture (CORBA) using the Internet Inter-ORB Protocol (IIOP), UUIDs form part of the object key within Interoperable Object References (IORs), enabling unique identification of distributed objects across ORBs. The DCE UUID format is specified for this purpose in IIOP profiles, allowing clients to invoke methods on remote objects without name resolution dependencies.[67] Similarly, Microsoft's Distributed Component Object Model (DCOM) employs GUIDs (equivalent to UUIDs) for interface marshaling, where the Interface Identifier (IID) uniquely specifies the COM interface being accessed, and the Causality Identifier (CID) tracks related call chains during remote activation and invocation.[68]
For modern web-based protocols, RESTful HTTP APIs frequently incorporate UUIDs in resource URLs to denote specific entities, such as /api/resources/{uuid}, which obscures sequential patterns and supports distributed generation without database coordination.[69] ETags for caching can also leverage UUIDs as opaque validators, ensuring efficient conditional requests by comparing resource versions across distributed caches. In gRPC with Protocol Buffers, UUIDs are typically encoded as fixed-length strings (e.g., 36 characters in hyphenated form) or 16-byte fields within message definitions to identify requests, responses, or session objects, facilitating reliable routing in microservices architectures.[70]
Custom UUID variants may be defined in application-specific protocols to incorporate network metadata, such as timestamps or node IDs, enhancing traceability in remoting scenarios.
