Union type
from Wikipedia

In computer science, a union is a value that may have any of several representations or formats within the same area of memory, or a data structure consisting of a variable that may hold such a value. Some programming languages support a union type for such a data type. In other words, a union type specifies the permitted types that may be stored in its instances, e.g., float and int. In contrast with a record, which could be defined to contain both a float and an integer, a union holds only one at a time.

A union can be pictured as a chunk of memory that is used to store variables of different data types. Once a new value is assigned to a field, the existing data is overwritten with the new data. The memory area storing the value has no intrinsic type (other than just bytes or words of memory), but the value can be treated as one of several abstract data types, having the type of the value that was last written to the memory area.
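The overwriting behaviour described above can be observed directly. The following is a minimal C sketch rather than anything taken from the article; the type `union number` and both helper functions are hypothetical names chosen for illustration, and the sketch assumes `int` and `float` have the same size, as on most current platforms.

```c
#include <string.h>

/* Hypothetical example type: one storage area viewed as an int or a float. */
union number {
    int   i;
    float f;
};

/* Returns 1 if both members start at the same address, 0 otherwise. */
int members_share_address(void)
{
    union number n;
    return (void *)&n.i == (void *)&n.f;
}

/* Returns 1 if assigning the float member changed the bytes previously
   written through the int member, 0 otherwise. */
int write_replaces_old_value(void)
{
    union number n;
    unsigned char before[sizeof n.i];
    unsigned char after[sizeof n.i];

    n.i = 42;
    memcpy(before, &n, sizeof before);   /* snapshot the int's bytes */

    n.f = 1.5f;                          /* overwrites the same storage */
    memcpy(after, &n, sizeof after);

    return memcmp(before, after, sizeof before) != 0;
}
```

Both functions are expected to return 1: the members share one address, and the second assignment replaces the bytes left by the first.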

In type theory, a union has a sum type; this corresponds to disjoint union in mathematics.

Depending on the language and type, a union value may be used in some operations, such as assignment and comparison for equality, without knowing its specific type. Other operations may require that knowledge, either by some external information, or by the use of a tagged union.

Untagged unions

Because of the limitations of their use, untagged unions are generally only provided in untyped languages or in a type-unsafe way (as in C). They have the advantage over simple tagged unions of not requiring space to store a data type tag.

The name "union" stems from the type's formal definition. If a type is considered as the set of all values that the type can take on, a union type is simply the mathematical union of its constituent types, since it can take on any value any of its fields can. Also, because a mathematical union discards duplicates, if more than one field of the union can take on a single common value, it is impossible to tell from the value alone which field was last written.

However, one useful programming function of unions is to map smaller data elements to larger ones for easier manipulation. A data structure consisting, for example, of 4 bytes and a 32-bit integer, can form a union with an unsigned 64-bit integer, and thus be more readily accessed for purposes of comparison etc.
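As a rough illustration of this overlay, the C sketch below places a record of four bytes and a 32-bit integer over a single unsigned 64-bit integer so that two records can be compared with one integer comparison. The names `key_parts`, `key`, `make_key` and `keys_equal` are hypothetical, and the sketch assumes the record occupies exactly 8 bytes with no padding, which the static assertion checks.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record: four tag bytes followed by a 32-bit id. */
struct key_parts {
    uint8_t  tag[4];
    uint32_t id;
};

/* The sketch assumes the record is exactly 8 bytes with no padding. */
_Static_assert(sizeof(struct key_parts) == sizeof(uint64_t),
               "key_parts is assumed to be 8 bytes");

/* The same 8 bytes, viewed either field by field or as one integer. */
union key {
    struct key_parts parts;
    uint64_t         whole;
};

/* Builds a key from its parts; every byte of the union is written. */
union key make_key(const char tag[4], uint32_t id)
{
    union key k;
    memcpy(k.parts.tag, tag, sizeof k.parts.tag);
    k.parts.id = id;
    return k;
}

/* Compares two keys with a single 64-bit comparison. */
int keys_equal(const union key *a, const union key *b)
{
    return a->whole == b->whole;
}
```

Equality through `whole` does not depend on byte order, but an ordering comparison would, so this sketch only tests for equality.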

Unions in various programming languages

ALGOL 68

ALGOL 68 has tagged unions, and uses a case clause to distinguish and extract the constituent type at runtime. A union containing another union is treated as the set of all its constituent possibilities, and if the context requires it a union is automatically coerced into the wider union. A union can explicitly contain no value, which can be distinguished at runtime. An example is:

 mode node = union (real, int, string, void);
 
 node n := "abc";
 
 case n in
   (real r):   print(("real:", r)),
   (int i):    print(("int:", i)),
   (string s): print(("string:", s)),
   (void):     print(("void:", "EMPTY")),
   out         print(("?:", n))
 esac

The syntax of the C/C++ union type and the notion of casts was derived from ALGOL 68, though in an untagged form.[1]

C/C++

In C and C++, untagged unions are expressed nearly exactly like structures (structs), except that each data member is located at the same memory address. The data members, as in structures, need not be primitive values, and in fact may be structures or even other unions. C++ (since C++11) also allows for a data member to be any type that has a full-fledged constructor/destructor and/or copy constructor, or a non-trivial copy assignment operator. For example, it is possible to have the standard C++ string as a member of a union.

The primary use of a union is allowing access to a common location by different data types, for example hardware input/output access, bitfield and word sharing, or type punning. Unions can also provide low-level polymorphism. However, there is no checking of types, so it is up to the programmer to be sure that the proper fields are accessed in different contexts. The relevant field of a union variable is typically determined by the state of other variables, possibly in an enclosing struct.

One common C programming idiom uses unions to perform what C++ calls a reinterpret_cast, by assigning to one field of a union and reading from another, as is done in code which depends on the raw representation of the values. A practical example is the method of computing square roots using the IEEE representation. This is not, however, a safe use of unions in general.
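The idiom can be sketched in C as follows. This is an illustrative example, not code from the article: the function names are hypothetical, and the sketch assumes `float` is 32 bits wide (checked at compile time). A `memcpy`-based version is shown alongside it as a commonly used alternative that does not rely on reading a different union member.

```c
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "float is assumed to be 32 bits wide");

/* Returns the raw bit pattern of x by writing one union member
   and reading the other (the idiom described in the text). */
uint32_t float_bits_union(float x)
{
    union {
        float    f;
        uint32_t u;
    } pun;

    pun.f = x;
    return pun.u;
}

/* Returns the same bit pattern by copying the bytes instead. */
uint32_t float_bits_memcpy(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}
```

Both functions should return the same value for any input; on platforms that use the IEEE 754 single-precision format, the result for `1.0f` is `0x3F800000`.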

Structure and union specifiers have the same form. [ . . . ] The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.

— ANSI/ISO 9899:1990 (the ANSI C standard) Section 6.5.2.1

Anonymous union

In C++, in C11, and as a non-standard extension in many compilers, unions can also be anonymous. Their data members are not referenced through a union name; they are accessed directly in the enclosing scope. Anonymous unions have some restrictions compared with traditional unions: in C11, they must be a member of another structure or union,[2] and in C++, they cannot have methods or access specifiers.

Simply omitting the class-name portion of the syntax does not make a union an anonymous union. For a union to qualify as an anonymous union, the declaration must not declare an object. Example:

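import std;

int main() {
    union {
        float f;
        std::uint32_t d; // Assumes float is 32 bits wide
    };

    f = 3.14f;
    // Reading d after writing f is the type-punning idiom; C permits it,
    // but in C++ it is formally undefined behavior.
    std::println("Hexadecimal representation of 3.14f: {:x}", d);
    return 0;
}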

Anonymous unions are also useful in C struct definitions to provide a sense of namespacing.[3]
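To show the direct member access that anonymous unions give inside a C struct, here is a small C11 sketch. The type `struct reg16` and the function `swap_bytes` are hypothetical; the example relies on C11 anonymous structures and unions and on reading a union member other than the one last written, which C permits.

```c
#include <stdint.h>

/* Hypothetical 16-bit register with direct access to its two bytes.
   Both the union and the inner struct are anonymous (C11), so their
   members are accessed as r.word, r.b0 and r.b1. */
struct reg16 {
    union {
        uint16_t word;
        struct {
            uint8_t b0;   /* byte at the lower address */
            uint8_t b1;   /* byte at the higher address */
        };
    };
};

/* Swaps the two bytes of w by exchanging them through the byte view. */
uint16_t swap_bytes(uint16_t w)
{
    struct reg16 r;
    uint8_t tmp;

    r.word = w;
    tmp  = r.b0;
    r.b0 = r.b1;
    r.b1 = tmp;
    return r.word;
}
```

Because the bytes are exchanged in place, the result is the same on little-endian and big-endian machines; for example, `swap_bytes(0x1234)` yields `0x3412`.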

Transparent union

In compilers such as GCC, Clang, and IBM XL C for AIX, a transparent_union attribute is available for union types. Types contained in the union can be converted transparently to the union type itself in a function call, provided that all types have the same size. It is mainly intended for functions with multiple parameter interfaces, a use necessitated by early Unix extensions and later re-standardisation.[4]
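A minimal sketch of the attribute, assuming a GCC- or Clang-compatible C compiler (it is not standard C and is not expected to build elsewhere). The type `int_ptr_arg` and both functions are hypothetical names used only for this example.

```c
/* Hypothetical parameter type accepting either pointer type.
   __transparent_union__ is a GCC/Clang extension, not standard C. */
typedef union __attribute__((__transparent_union__)) {
    int      *ip;
    unsigned *up;
} int_ptr_arg;

/* Clears the pointed-to value. Callers may pass an int * or an
   unsigned * without a cast. */
void clear_value(int_ptr_arg p)
{
    *p.ip = 0;   /* int and unsigned may alias, so this store is valid for both */
}

/* Calls clear_value with both pointer types and returns the sum of the
   two cleared values. */
int demo_transparent_union(void)
{
    int      a = 5;
    unsigned b = 7u;

    clear_value(&a);   /* int * is accepted directly */
    clear_value(&b);   /* unsigned * is accepted directly */
    return a + (int)b;
}
```

Without the attribute, both calls would need an explicit union value or a cast; with it, either pointer type is accepted as-is.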

COBOL

In COBOL, union data items are defined in two ways. The first uses the RENAMES (66 level) keyword, which effectively maps a second alphanumeric data item on top of the same memory location as a preceding data item. In the example code below, data item PERSON-REC is defined as a group containing another group and a numeric data item. PERSON-DATA is defined as an alphanumeric data item that renames PERSON-REC, treating the data bytes contained within it as character data.

  01  PERSON-REC.
      05  PERSON-NAME.
          10  PERSON-NAME-LAST    PIC X(12).
          10  PERSON-NAME-FIRST   PIC X(16).
          10  PERSON-NAME-MID     PIC X.
      05  PERSON-ID               PIC 9(9) PACKED-DECIMAL.
  
  01  PERSON-DATA                 RENAMES PERSON-REC.

The second way to define a union type is by using the REDEFINES keyword. In the example code below, data item VERS-NUM is defined as a 2-byte binary integer containing a version number. A second data item VERS-BYTES is defined as a two-character alphanumeric variable. Since the second item is redefined over the first item, the two items share the same address in memory, and therefore share the same underlying data bytes. The first item interprets the two data bytes as a binary value, while the second item interprets the bytes as character values.

  01  VERS-INFO.
      05  VERS-NUM        PIC S9(4) COMP.
      05  VERS-BYTES      REDEFINES VERS-NUM
                          PIC X(2).

Pascal

In Pascal, there are two ways to create unions. One is the standard way through a variant record. The second is a nonstandard means of declaring a variable as absolute, meaning it is placed at the same memory location as another variable or at an absolute address. While all Pascal compilers support variant records, only some support absolute variables.

For the purposes of this example, the following are all integer types: a byte consists of 8 bits, a word is 16 bits, and an integer is 32 bits.

The following example shows the non-standard absolute form:

var
    A: Integer;
    B: array[1..4] of Byte absolute A;
    C: Integer absolute 0;

In the first example, each of the elements of the array B maps to one of the specific bytes of the variable A. In the second example, the variable C is assigned to the exact machine address 0.

In the following example, a record has variants, some of which share the same location as others:

type
     Shape = (Circle, Square, Triangle);
     Dimensions = record
        case Figure: Shape of 
           Circle: (Diameter: real);
           Square: (Width: real);
           Triangle: (Side: real; Angle1, Angle2: 0..360)
        end;

PL/I

In PL/I the original term for a union was cell,[5] which is still accepted as a synonym for union by several compilers. The union declaration is similar to the structure definition, where elements at the same level within the union declaration occupy the same storage. Elements of the union can be any data type, including structures and arrays.[6]: pp192–193  Here vers_num and vers_bytes occupy the same storage locations.

  1  vers_info         union,
     5 vers_num        fixed binary,
     5 vers_bytes      pic '(2)A';

An alternative to a union declaration is the DEFINED attribute, which allows alternative declarations of storage; however, the data types of the base and defined variables must match.[6]: pp.289–293 

Rust

Rust implements both tagged and untagged unions. In Rust, tagged unions are implemented using the enum keyword. Unlike enumerated types in most other languages, enum variants in Rust can contain additional data in the form of a tuple or struct, making them tagged unions rather than simple enumerated types.[7]

Rust also supports untagged unions using the union keyword. The memory layout of unions in Rust is undefined by default,[8] but a union with the #[repr(C)] attribute will be laid out in memory exactly like the equivalent union in C.[9] Reading the fields of a union can only be done within an unsafe function or block, as the compiler cannot guarantee that the data in the union will be valid for the type of the field; if this is not the case, it will result in undefined behavior.[10]

Syntax and example

C/C++

In C and C++, the syntax is:

union <name> {
    <datatype>  <1st variable name>;
    <datatype>  <2nd variable name>;
    .
    .
    .
    <datatype>  <nth variable name>;
} <union variable name>;

A structure can also be a member of a union, as the following example shows:

union Name1 {
    struct Name2 {  
        int a;
        float b;
        char c;
    } svar;
    int d;
} uvar;

This example defines a variable uvar as a union (tagged as Name1), which contains two members, a structure (tagged as Name2) named svar (which in turn contains three members), and an integer variable named d.

Unions may occur within structures and arrays, and vice versa:

struct Name1 {  
    int flags;
    char* name;
    int utype;
    union Name2 {
        int ival;
        float fval;
        char* sval;
    } u;
} symtab[NSYM];

The number ival is referred to as symtab[i].u.ival and the first character of string sval by either of *symtab[i].u.sval or symtab[i].u.sval[0].
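How the utype field might select the union member can be sketched as follows. The declaration repeats the one above so that the example is self-contained; the value of `NSYM`, the `U_INT`/`U_FLOAT`/`U_STRING` codes and the function `format_symbol` are hypothetical additions for illustration, since the article does not say how utype is encoded.

```c
#include <stdio.h>

#define NSYM 8                       /* hypothetical table size */
enum { U_INT, U_FLOAT, U_STRING };   /* hypothetical utype codes */

struct Name1 {
    int flags;
    char *name;
    int utype;                       /* says which member of u is current */
    union Name2 {
        int ival;
        float fval;
        char *sval;
    } u;
} symtab[NSYM];

/* Formats the value of entry i into buf according to its utype.
   Returns the number of characters snprintf reports, or -1 if the
   utype code is not recognised. */
int format_symbol(int i, char *buf, size_t n)
{
    switch (symtab[i].utype) {
    case U_INT:
        return snprintf(buf, n, "%d", symtab[i].u.ival);
    case U_FLOAT:
        return snprintf(buf, n, "%g", symtab[i].u.fval);
    case U_STRING:
        return snprintf(buf, n, "%s", symtab[i].u.sval);
    default:
        return -1;
    }
}
```

The compiler does not check that utype matches the member actually stored, so keeping the two consistent remains the programmer's responsibility.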

Java

Union types do not exist in Java, although they can be somewhat emulated using interfaces. However, catch blocks can represent exception types as unions.

package org.wikipedia.examples;

import java.io.IOException;
import java.sql.SQLException;

public class Example {
    private static void mightThrow() throws IOException, SQLException {
        // some actions which may throw here
    }

    public static void main(String[] args) {
        try {
            mightThrow();
        } catch (IOException | SQLException e) {
             System.err.printf("Either an IOException or SQLException was caught: %s%n", e.getMessage());
             e.printStackTrace();
        }
    }
}

PHP

Union types were introduced in PHP 8.0.[11] The values are implicitly "tagged" with a type by the language, and may be retrieved by "gettype()".

class Example
{
    private int|float $foo;

    public function squareAndAdd(float|int $bar): int|float
    {
        return $bar ** 2 + $this->foo;
    }
}

Python

Support for typing was introduced in Python 3.5.[12] The new syntax for union types was introduced in Python 3.10.[13]

from typing import Union

class Example:
    foo: int = 0

    # old style:
    def square_and_add(self, bar: Union[int, float]) -> Union[int, float]:
        return bar ** 2 + self.foo

    # new style:
    def square_and_add(self, bar: int | float) -> int | float:
        return bar ** 2 + self.foo

TypeScript

Union types are supported in TypeScript.[14] The values are implicitly "tagged" with a type by the language, and may be retrieved using a typeof call for primitive values and an instanceof comparison for complex data types. Types with overlapping usage (e.g. a slice method exists on both strings and arrays, the plus operator works on both strings and numbers) don't need additional narrowing to use these features.

function successor(n: number | bigint): number | bigint {
    // types that support the same operations don't need narrowing
    return ++n;
}

function dependsOnParameter(v: string | string[] | number) {
    // distinct types need narrowing
    if (v instanceof Array) {
        // do something
    } else if (typeof(v) === "string") {
        // do something else
    } else {
        // has to be a number
    }
}

Rust

Tagged unions in Rust use the enum keyword, and can contain tuple and struct variants:

enum Foo {
	Bar(i32),
	Baz { x: String, y: i32 },
}

Untagged unions in Rust use the union keyword:

union Foo {
	bar: i32,
	baz: bool,
}

Reading from the fields of an untagged union results in undefined behavior if the data in the union is not valid as the type of the field, and thus requires an unsafe block:

let x = Foo { bar: 10 };
let y = unsafe { x.bar }; // This will set y to 10, and does not result in undefined behavior.
let z = unsafe { x.baz }; // This results in undefined behavior, as the value stored in x is not a valid bool.


from Grokipedia
In computer science, a union type is a type that represents the union of two or more types, allowing a value to belong to any one of those types at a time, often interpreted set-theoretically as the set of all values from the constituent types. This concept enables variables or expressions to accommodate multiple possible data representations, in contrast with product types such as tuples or structs, which combine multiple values simultaneously. Widely used implementations dating from the 1970s include variant records in Pascal and untagged unions in C.

In practice, union types take two primary forms: untagged unions, which overlap memory for different types to optimize storage, and tagged unions (also known as sum types or variants), which include a discriminator to ensure type safety at run time or compile time. Untagged unions, as in C, allocate space equal to the largest member and allow only one member to be active, making them useful for memory-efficient type punning but prone to undefined behavior if misused. For example, a C union might store either an integer or a floating-point number in the same memory location, with the size determined by the larger type.

In modern statically typed languages, union types emphasize safety and expressiveness through tagging and type narrowing. In TypeScript, unions are denoted with the | operator, such as string | number, permitting functions to accept values of either type while restricting access to shared properties. Languages like Rust implement safe tagged unions via enums, which support pattern matching for exhaustive handling of variants, as in a Result<T, E> type that holds either a success value or an error. Similarly, in functional languages, algebraic data types provide union-like sum types for modeling disjoint alternatives, such as shapes (e.g., circle or rectangle). These features support techniques such as occurrence typing and improve code reliability.

Overview

Definition and purpose

A union is a user-defined data type in programming languages that enables a single variable to store values of different types, with all possible members occupying the same contiguous block of memory and only one member holding a valid value at any given time. This overlapping storage sets unions apart from other composite types: the size of a union is determined by its largest member, allowing efficient reuse of space for mutually exclusive data variants.

The primary purpose of union types is to optimize memory usage by permitting the representation of variant data, such as a field that might hold an integer or a string, without allocating separate space for each possibility. This is particularly valuable in systems programming or embedded environments where resources are limited. Additionally, unions facilitate type punning, a technique for reinterpreting the binary representation of data as a different type to perform low-level manipulations, such as byte-order conversions or hardware register access. In certain contexts, they support polymorphism-like behavior by enabling the storage of disparate object types within a uniform structure, aiding in the design of flexible data representations.

Unlike structs, which provide non-overlapping storage for all members and thus consume memory proportional to the sum of their sizes, unions share one allocation among all members to save space. Enumerations, by contrast, define named constants without storage for variant data, focusing on symbolic representation rather than polymorphic containment. Union types may be implemented as untagged variants, which depend on external context to identify the active member, or as tagged variants, which incorporate a type discriminator for runtime safety. Overall, the key benefits of union types lie in their space efficiency and representational flexibility, allowing developers to model data that inherently varies in form while minimizing overhead.

Historical development

Early implementations of union types appeared in PL/I, released in 1964, where they were defined using the UNION keyword to overlay storage for different data elements. The concept was further formalized in ALGOL 68, introduced in 1968, where unions were defined as "united modes" that allow a variable to hold values of several types, with the active type determined at run time. This design emphasized expressive type systems and influenced subsequent languages' approaches to type flexibility.

In the early 1970s, union types evolved in systems-oriented languages for memory efficiency. Pascal, released in 1970, introduced tagged unions through variant records, which use a discriminant tag to distinguish among alternative types within a record. Concurrently, C, developed by Dennis Ritchie between 1971 and 1973, adopted untagged unions to facilitate low-level memory manipulation and data overlaying without runtime overhead, prioritizing portability and performance in Unix system programming. These developments in Pascal and C highlighted a divergence: tagged variants for safety versus untagged unions for raw control.

The influence of these early unions extended to variant types in functional programming, where sum types (essentially tagged unions) emerged in languages like ML (1973) and later Haskell (1990) to model disjoint choices with exhaustive pattern matching.

In modern dynamic and typed languages, union types have expanded for better expressiveness in type hinting and structural typing. PHP 8.0 (2020) added native union types for parameters, properties, and return values, allowing multiple type declarations separated by pipes. TypeScript introduced union types in version 1.4 (2015), enabling values to be one of several types via the | operator. Python 3.10 (2021) simplified unions with the | syntax for annotations, reducing reliance on the typing.Union class. Meanwhile, Go has seen experimental proposals for restricted union types using type sets in interfaces, aiming to address limitations in handling variant data without full sum type adoption as of 2025.

Classification

Untagged unions

Untagged unions are a form of union type that lacks a runtime discriminant or tag to specify which member of the union is currently active, placing the responsibility on the programmer to keep track of the active type. This design allows all possible members to overlay the same block of memory, enabling efficient space usage where only one is needed at a time, but it requires explicit tracking to avoid misinterpretation of the stored value. A key characteristic of untagged unions is their shared memory allocation: the size of the union is at least the size of its largest member, and assigning to one member overwrites the data of the others, potentially leading to undefined behavior if the programmer accesses an incorrect member. For instance, in C-like pseudocode, an untagged union might be declared as:

union Example {
    int integer;
    float real;
};

Here, setting u.integer = 1 would store the integer value in the shared memory, but reading u.real could produce garbage or erroneous results unless the programmer ensures the type matches the intended use. Untagged unions find common application in embedded systems and performance-critical code, particularly for type punning, where raw bytes are reinterpreted as different types to optimize access to hardware registers or packed structures without additional overhead. For example, they enable efficient manipulation of bitfields in low-level device drivers by allowing direct overlay of scalar types onto structured representations. Historically, untagged unions have been a staple in C-family languages since the development of C in the early 1970s by Dennis Ritchie at Bell Labs, where they provided a lightweight mechanism for variant data handling without the complexity of type tags. In contrast to tagged unions, which incorporate a discriminator for runtime type identification to enhance safety, untagged variants prioritize raw efficiency at the cost of potential errors.

Tagged unions

Tagged unions, also known as discriminated unions or sum types, pair a union data structure with an explicit tag or enumerator field that indicates which member of the union is valid at runtime. This tag serves as a discriminator, allowing the program to perform type checking before accessing the underlying data, thereby mitigating risks associated with incorrect variant interpretation. These structures are particularly prevalent in functional programming, where they form the basis of algebraic data types by representing a choice among mutually exclusive possibilities, each potentially carrying associated data. The tag ensures that only the appropriate variant is accessed, enforcing type safety either at compile time in statically checked languages or via runtime verification. Unlike untagged unions, which lack this mechanism and depend on manual tracking, tagged unions systematically prevent invalid operations, reducing errors in variant handling.

Common use cases include modeling optional or nullable values through types like Maybe or Option, which encapsulate either a present value or an absence indicator, and error-handling constructs such as Result or Either, which distinguish between successful outcomes and failure states with descriptive payloads. These applications promote robust code by making invalid states unrepresentable and facilitating exhaustive pattern matching over all variants. For illustration, consider a C-like representation of a simple tagged union holding either an integer or a string:

enum VariantType { INTEGER, STRING };

struct TaggedValue {
    enum VariantType tag;   /* identifies the active member of payload */
    union {
        int integerValue;
        char *stringValue;
    } payload;
};

To initialize an integer variant:

struct TaggedValue tv;
tv.tag = INTEGER;
tv.payload.integerValue = 42;

Access requires tag verification:

if (tv.tag == INTEGER) {
    /* Safely use tv.payload.integerValue */
} else {
    /* Handle mismatch, e.g., raise error or default */
}

This pattern guarantees that the payload is interpreted correctly, with the tag acting as a runtime sentinel.

Technical details

Memory representation

In union types, all members occupy the same memory region, with their storage overlapping such that each member begins at the initial offset of the union object, so only one member can hold a valid value at any time and no member has an allocation of its own. The size of a union is determined by the size of its largest member, rounded upward as necessary to meet the alignment constraints of the union as a whole. For instance, a union containing a 4-byte int and an 8-byte double would have a size of 8 bytes to accommodate the larger type. The alignment requirement of the union matches that of its most strictly aligned member, ensuring the entire object can be placed at valid addresses; trailing padding bytes may be added so that the size is a multiple of the alignment, which keeps every element of an array of unions correctly aligned. Unions facilitate type punning by allowing the bit pattern stored in one member to be reinterpreted as another member's type when accessed, a technique that C explicitly permits without violating its strict aliasing rules or invoking undefined behavior.
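These layout rules can be checked with `sizeof`, `_Alignof` and `offsetof`. The sketch below is illustrative only: the two union types and the function `trailing_padding` are hypothetical, and the exact amount of padding depends on the platform's ABI, so it is computed rather than assumed.

```c
#include <stddef.h>

/* Hypothetical unions used only to inspect layout. */
union int_or_double {
    int    i;
    double d;
};

union text_or_double {
    char   text[9];   /* 9 bytes: larger than double on common ABIs */
    double d;
};

/* Properties that follow from the rules described in the text. */
_Static_assert(sizeof(union int_or_double) >= sizeof(double),
               "a union is at least as large as its largest member");
_Static_assert(_Alignof(union int_or_double) >= _Alignof(double),
               "a union is at least as strictly aligned as any member");
_Static_assert(offsetof(union text_or_double, d) == 0,
               "every member starts at offset 0");
_Static_assert(sizeof(union text_or_double)
                   % _Alignof(union text_or_double) == 0,
               "size is a multiple of alignment, so arrays stay aligned");

/* Number of bytes in the union beyond its 9-byte text member. */
size_t trailing_padding(void)
{
    return sizeof(union text_or_double) - sizeof(char[9]);
}
```

On a typical 64-bit ABI where double is 8-byte aligned, `union text_or_double` would be expected to occupy 16 bytes, leaving 7 bytes of trailing padding, but other platforms may differ.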

Type safety considerations

Union types, particularly untagged variants, pose significant type safety risks because a program may access an inactive member. In languages like C and C++, reading a union member other than the one most recently written reinterprets the shared memory, which can yield meaningless values, data corruption, or program crashes, and in C++ is undefined behavior. This occurs because all members overlay the same memory region, and without an indicator of the active type, the compiler cannot enforce correct access, allowing erroneous code to compile and execute unpredictably.

Tagged unions mitigate these issues by including a discriminator (such as an enum) to identify the active variant, but safety is not inherent; if the tag is not checked before accessing the data, tag mismatches can still trigger the same failures as in the untagged case. For instance, assuming the wrong variant is active may lead to invalid memory reads or writes. To address this, developers must explicitly validate the tag prior to access, ensuring the correct member is used.

Mitigation strategies include leveraging compiler extensions and tools to enforce stricter rules. In GCC, the -fstrict-aliasing flag enables optimizations based on the strict aliasing rule, which assumes pointers of different types do not alias the same object; improper union use for type punning can violate this, so disabling it with -fno-strict-aliasing is recommended when necessary to avoid optimization-induced bugs. Additionally, static analysis tools can detect potential misuse by tracking active members and flagging unchecked accesses.

Best practices for safe union usage emphasize proactive measures: always initialize and maintain the tag in tagged unions to track the active variant; avoid unions for types with non-trivial constructors or destructors, as these complicate lifetime management; and prefer type-safe alternatives like C++'s std::variant, which tracks the active type at runtime and throws an exception on invalid access. In modern languages such as Rust, enums serve as tagged unions with compile-time enforcement of exhaustive variant handling via pattern matching, and they compile to efficient machine code.

Tags introduce a modest cost: additional memory overhead (typically 1-8 bytes for the discriminator, depending on the enum's size and alignment) and a check on each access, usually a single branch or a switch on the tag. This balance favors tagged approaches in safety-critical code, where the modest space cost outweighs the risks of undefined behavior.
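The advice to validate the tag before every access can be packaged into accessor functions, so that callers never touch the payload directly. The following C sketch is one possible arrangement, not a standard API; all type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum kind { KIND_INT, KIND_TEXT };

/* Hypothetical tagged value: the tag records which payload member is active. */
struct tagged {
    enum kind tag;
    union {
        int         number;
        const char *text;
    } payload;
};

/* Constructors set the tag and the matching member together. */
struct tagged make_int(int n)
{
    struct tagged t;
    t.tag = KIND_INT;
    t.payload.number = n;
    return t;
}

struct tagged make_text(const char *s)
{
    struct tagged t;
    t.tag = KIND_TEXT;
    t.payload.text = s;
    return t;
}

/* Checked accessors: return false and leave *out untouched on a mismatch. */
bool get_int(const struct tagged *t, int *out)
{
    if (t->tag != KIND_INT) {
        return false;
    }
    *out = t->payload.number;
    return true;
}

bool get_text(const struct tagged *t, const char **out)
{
    if (t->tag != KIND_TEXT) {
        return false;
    }
    *out = t->payload.text;
    return true;
}
```

A caller would write, for example, `if (get_int(&v, &n)) { ... }`, handling the mismatch case in the else branch instead of reading the wrong member.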

Language implementations

C and C++

In C, unions are declared using the syntax union name { member-declarations };, where the members are a sequence of declarations sharing the same memory location, and the size of the union is determined by the largest member, potentially with padding for alignment. The union is untagged by default, meaning there is no built-in mechanism to track which member is active, requiring manual management by the programmer. Unions in C are typically used for space-efficient storage of variant data or for type punning, where the language allows reinterpretation of the same bytes as different types. A common example is a union that holds either an integer or a character array, as shown below:

#include <stdio.h>

union Data {
    int i;
    char str[10];
};

int main(void) {
    union Data d;
    d.i = 42;  /* store through the int member */

    /* Read the same bytes back through the char array (type punning).
       Only the first sizeof(int) bytes hold the int's representation;
       the remaining bytes of str are unspecified. */
    for (size_t j = 0; j < sizeof d.i; ++j) {
        printf("%02x ", (unsigned char)d.str[j]);
    }
    printf("\n");
    return 0;
}

This demonstrates type punning, which is permitted in C when accessing through the union type, allowing the bytes of the int to be reinterpreted as characters. Another practical use is detecting endianness via a union with an int and a byte array:

union Endian {
    int value;
    unsigned char bytes[sizeof(int)];
};

int main(void) {
    union Endian e;
    e.value = 1;
    if (e.bytes[0] == 1) {
        // Little-endian: the least significant byte is stored first
    } else {
        // Big-endian
    }
    return 0;
}

Such patterns rely on the union's overlapping storage but must adhere to strict aliasing rules, which prohibit direct pointer aliasing outside the union context to enable compiler optimizations.

C++ inherits C's union syntax but introduces restrictions and extensions. Unions in C++ cannot have virtual functions or base classes and cannot be used as base classes, but they may have member functions, including constructors and destructors. C++11 expanded support to allow members of non-POD (Plain Old Data) types, that is, types with non-trivial constructors, destructors, or assignment operators; the union's corresponding special member functions are then implicitly deleted unless the programmer defines them, and the programmer must construct and destroy the active member explicitly. Anonymous unions, which embed members directly into an enclosing struct without qualification (e.g., struct S { union { int a; float b; }; }; accesses s.a), are standard in C++ and, since C11, in C; before that, compilers such as GCC offered them in C as an extension. GCC also provides transparent unions as an extension, which let a function parameter of union type accept an argument of any member type without a cast.

A key pitfall in both C and C++ is undefined behavior from violating strict aliasing, where accessing a union member via a pointer of an incompatible type (outside the union) can lead to optimization errors or crashes. In C++, unlike C, reading a member other than the most recently written one is in general undefined behavior as well. Multi-threaded access to unions without synchronization risks data races, as the active member is not atomically tracked, potentially causing inconsistent reads. As a safer alternative in modern C++, std::variant (introduced in C++17) provides a type-safe union with compile-time type checking and runtime active-alternative tracking.

Rust

In Rust, unions provide a way to define untagged union types similar to those in C, where multiple fields overlap in the same memory location. The syntax for declaring a union mirrors that of a struct but uses the union keyword: union Name { field1: Type1, field2: Type2 }. For example:

```rust
union MyUnion {
    f1: u32,
    f2: f32,
}
```

All fields share the same storage, so writing to one overwrites the others. Initialization is safe, as in let u = MyUnion { f1: 1 };, but reading a field requires an unsafe block, such as unsafe { let val = u.f1; }, because the compiler cannot verify that the bytes currently stored form a valid value of the field's type. Responsibility for tracking the active field therefore rests with the programmer. Unions have been stable since Rust 1.19.0 and work the same way in the 2021 edition; requiring explicit unsafe keeps C-style risks visible in the source code.

For safe handling of variant data, Rust favors enums, which act as tagged unions: each value carries a discriminant that identifies its variant at runtime, and the compiler checks at compile time that the variants are handled correctly. The syntax is enum Name { Variant1(Type1), Variant2(Type2) }, where each variant can carry associated data. A canonical example is the standard Option<T> enum:

```rust
enum Option<T> {
    None,
    Some(T),
}
```

Usage involves safe pattern matching with match, which the compiler checks for exhaustiveness: every variant must be handled, either explicitly or by a wildcard arm (_). For instance:

```rust
let some_value = Some(42);
match some_value {
    Some(x) => println!("Got {}", x),
    None => println!("Nothing"),
}
```

This prevents invalid access, since a variant's data can only be reached through a pattern that matches that variant. Enums are often described as zero-cost tagged unions: the tag and payload are stored together, so the size is roughly that of the largest variant plus the tag (subject to alignment padding), and for some types, such as Option<&T>, the compiler can omit a separate tag entirely. This approach contrasts with raw unions by integrating safety into the type system, eliminating the need for unsafe blocks in typical use.

Python

In Python, union types are supported through the typing module for static type hinting, allowing variables, function parameters, and return values to be annotated as compatible with multiple types. Introduced in Python 3.5 via PEP 484, the Union type constructor enables annotations like Union[int, str], which indicate that a value can be of any of the specified types during static analysis. These hints are optional and have no effect on runtime behavior, serving primarily to improve code readability and enable tools for early error detection.

Starting with Python 3.10, PEP 604 simplified union syntax by overloading the bitwise OR operator (|) for types, allowing int | str as a direct alternative to Union[int, str] in annotations and even in isinstance and issubclass calls. This change enhances readability without requiring imports from the typing module for basic cases. For example, a function can be annotated as follows:

```python
from typing import Union  # For Python < 3.10

def process_input(value: Union[int, str]) -> None:
    pass

# In Python 3.10+, equivalent to:
def process_input(value: int | str) -> None:
    pass
```

Such annotations help static type checkers infer compatibility, for instance, ensuring that process_input(42) or process_input("hello") are valid while rejecting incompatible types like process_input([1, 2]). Python's union types lack runtime enforcement or shared memory representation, distinguishing them from true unions in languages like C; instead, they are purely for static checking with tools such as mypy, which verifies annotations against code usage. At runtime, union-like behavior can be emulated using isinstance checks or by defining classes and dataclasses that wrap variants, akin to tagged unions, to enforce type-specific logic. For instance:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class IntWrapper:
    value: int

@dataclass
class StrWrapper:
    value: str

# Emulate union at runtime
def handle_variant(variant: Union[IntWrapper, StrWrapper]) -> None:
    if isinstance(variant, IntWrapper):
        print(f"Integer: {variant.value}")
    elif isinstance(variant, StrWrapper):
        print(f"String: {variant.value}")
```

This approach allows runtime discrimination but requires explicit implementation, as Python has no dedicated discriminated-union construct. Historically, before explicit unions, developers relied on the Any type from the typing module for flexible annotations, but unions marked a shift toward more precise static typing, reducing over-reliance on Any and improving checker accuracy in large codebases. The 3.10 syntax further streamlined adoption, aligning Python's type-hint syntax more closely with modern idioms while preserving backward compatibility.

TypeScript

In TypeScript, union types enable a value to be one of several possible types within the language's structural type system, which enhances type safety in JavaScript development. This feature allows developers to model scenarios where a variable or property can hold values of multiple types, such as strings or numbers, without runtime enforcement, since TypeScript is a compile-time superset of JavaScript. The syntax for declaring a union type uses the vertical bar (|) to separate constituent types, either as a named type alias or inline. For example, type ID = string | number; defines a type alias for an identifier that can be either a string or a number, while inline usage appears as function log(id: string | number) { ... }. Unions can combine primitive types, object types, or other unions, but access to properties is limited to those common across all members to prevent type errors.

Type narrowing refines the inferred type within conditional blocks using type guards, such as typeof checks or the in operator, allowing safe access to type-specific members. For instance, in a function handling a string | number parameter, an if (typeof value === "string") block narrows value to string, enabling methods like toUpperCase(). Discriminated unions extend this by using a shared literal-type discriminant property for precise narrowing via switch or if statements. A common pattern involves objects with a kind or type field:

```typescript
type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "square"; side: number };

function getArea(shape: Shape) {
  switch (shape.kind) {
    case "circle":
      return Math.PI * shape.radius ** 2;
    case "square":
      return shape.side ** 2;
  }
}
```

This pattern supports exhaustiveness checking: assigning shape to a variable of type never in a default case makes the compiler report any unhandled variant at compile time. Unions integrate with interfaces and function overloads; for example, an interface might declare a property as status: "loading" | "success" | "error", restricting values to those literals and improving autocomplete in editors like VS Code via IntelliSense. Unions can also appear in overload signatures, where each overload maps a narrower input type to its own return type. Since unions are erased during compilation to plain JavaScript, they add no runtime overhead; any runtime discrimination relies on ordinary JavaScript values, such as a discriminant property on an object.

Introduced in TypeScript 3.4, const assertions (as const) enhance unions by preserving literal types in expressions, preventing widening to broader types like string and enabling precise union construction from constants. For example, const directions = ["north", "east"] as const; gives directions the type readonly ["north", "east"], from which the union "north" | "east" can be derived as type Directions = (typeof directions)[number];. This is useful for creating closed literal unions without manual enumeration. Subsequent versions, including TypeScript 4.0 with its variadic tuple types, improved related inference for tuples and unions, facilitating more robust type combinations in advanced scenarios.

PHP

Union types in PHP, introduced in version 8.0, enable developers to specify multiple acceptable types for function parameters, class properties, and return values, enhancing type safety in this dynamically typed language commonly used for web development. This feature addresses limitations in PHP's prior type system by allowing declarations like int|string, where the vertical bar (|) separates individual types. The mixed type, also added in PHP 8.0, is equivalent to the union of all types; for that reason it can only be used on its own and cannot be combined with other types in a union.

Union types are supported in function parameters, class properties, and return types; local variables cannot carry type declarations at all. For instance, a class property can be declared as public int|float $value;, allowing the property to hold either an int or a float value. Similarly, functions can specify union types in parameters and returns, such as function processData(array|int $input): string|int { return gettype($input) === 'array' ? 'processed' : 42; }, where type checking might use built-in functions like gettype() for runtime validation. These declarations promote clearer code intent without requiring extensive documentation annotations.

Type enforcement for union types occurs at runtime: a TypeError is thrown if a value does not match any permitted type in the declaration. In the default coercive mode, a scalar value that matches none of the listed types may first be converted to one of them where a conversion is possible. Developers can disable this coercion with declare(strict_types=1); at the file level, while the coercive default is retained for backward compatibility so that existing codebases do not break. Limitations include the rejection of redundant combinations such as false|true (use bool instead) and the exclusion of void and never from unions.
In PHP 8.1 and later, union types integrate with readonly properties, which can only be initialized once (typically in the constructor) and cannot be modified afterward, providing immutable typed data structures like public readonly string|int $id;. This combination supports safer data transfer objects in web applications, building on the union type foundation from PHP 8.0.

Go and Swift

In Go, there are no native union types, with developers instead emulating them using the empty interface interface{} to hold values of any type, combined with type switches for runtime type inspection and handling. For example, a value can be assigned to an interface{} variable and then inspected via a type switch:

```go
var i interface{} = 42
switch v := i.(type) {
case int:
	fmt.Println("It's an integer:", v)
default:
	fmt.Println("Unknown type")
}
```

This approach sacrifices compile-time type safety for flexibility, as the compiler cannot enforce which types might be stored. Since the introduction of generics in Go 1.18, type parameters can use union-like constraints to define sets of allowable types, such as int64 | float64, enabling more type-safe generic functions that approximate variant handling without full union semantics for variables. For instance, a generic sum function might constrain its value type to numeric unions:

```go
func SumIntsOrFloats[K comparable, V int64 | float64](m map[K]V) V {
	var s V
	for _, v := range m {
		s += v
	}
	return s
}
```

However, these constraints apply only to generic type parameters and do not provide true discriminated unions at the value level. Go's design philosophy prioritizes simplicity and explicitness, avoiding complex features like native unions to reduce cognitive load and potential errors in concurrent code. As of November 2025, ongoing proposals such as issue #70752 seek to enable finite type sets as union types via extended constraints, but none have been adopted in the language specification.

Swift, in contrast, supports tagged unions natively through enumerations (enums) with associated values, allowing each case to carry data of different types while ensuring type safety via exhaustive pattern matching. This feature models sum types, where the enum acts as a discriminated union, storing a tag (the case) alongside optional payload data. A common example is a Result type, used for error handling. The definition below is a simplified version; the standard library's own Result names its cases success and failure and requires the failure type to conform to Error:

```swift
enum Result<T, E> {
    case success(T)
    case error(E)
}

let result: Result<String, Error> = .success("Hello")
```

Pattern matching occurs via switch statements, which the compiler requires to cover all cases, preventing runtime surprises:

```swift
switch result {
case .success(let value):
    print("Success: \(value)")
case .error(let error):
    print("Error: \(error)")
}
```

This exhaustive checking promotes safe handling of variants and integrates with optionals (themselves an enum with associated values) for representing possibly absent data. Swift's enums thus provide a robust alternative to raw unions, emphasizing compile-time guarantees over Go's runtime flexibility.
