Hubbry Logo
Set operations (SQL)Set operations (SQL)Main
Open search
Set operations (SQL)
Community hub
Set operations (SQL)
logo
7 pages, 0 posts
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
Contribute something
Set operations (SQL)
Set operations (SQL)
from Wikipedia

Set operations in SQL is a type of operations which allow the results of multiple queries to be combined into a single result set.[1]

Set operators in SQL include UNION, INTERSECT, and EXCEPT, which mathematically correspond to the concepts of union, intersection and set difference.

UNION operator

[edit]

In SQL the UNION clause combines the results of two SQL queries into a single table of all matching rows. The two queries must result in the same number of columns and compatible data types in order to unite. Any duplicate records are automatically removed unless UNION ALL is used.

UNION can be useful in data warehouse applications where tables are not perfectly normalized.[2] A simple example would be a database having tables sales2005 and sales2006 that have identical structures but are separated because of performance considerations. A UNION query could combine results from both tables.

Note that UNION ALL does not guarantee the order of rows. Rows from the second operand may appear before, after, or mixed with rows from the first operand. In situations where a specific order is desired, ORDER BY must be used.

Note that UNION ALL may be much faster than plain UNION.

Examples

[edit]

Given these two tables:

sales2005
person amount
Joe 1000
Alex 2000
Bob 5000
sales2006
person amount
Joe 2000
Alex 2000
Zach 35000

Executing this statement:

SELECT * FROM sales2005
UNION
SELECT * FROM sales2006;

yields this result set, though the order of the rows can vary because no ORDER BY clause was supplied:

person amount
Joe 1000
Alex 2000
Bob 5000
Joe 2000
Zach 35000

Note that there are two rows for Joe because those rows are distinct across their columns. There is only one row for Alex because those rows are not distinct for both columns.

UNION ALL gives different results, because it will not eliminate duplicates. Executing this statement:

SELECT * FROM sales2005
UNION ALL
SELECT * FROM sales2006;

would give these results, again allowing variance for the lack of an ORDER BY statement:

person amount
Joe 1000
Joe 2000
Alex 2000
Alex 2000
Bob 5000
Zach 35000

The discussion of full outer joins also has an example that uses UNION.

INTERSECT operator

[edit]

The SQL INTERSECT operator takes the results of two queries and returns only rows that appear in both result sets. For purposes of duplicate removal the INTERSECT operator does not distinguish between NULLs. The INTERSECT operator removes duplicate rows from the final result set. The INTERSECT ALL operator does not remove duplicate rows from the final result set, but if a row appears X times in the first query and Y times in the second, it will appear times in the result set.

Example

[edit]

The following example INTERSECT query returns all rows from the Orders table where Quantity is between 50 and 100.

SELECT *
FROM   Orders
WHERE  Quantity BETWEEN 1 AND 100

INTERSECT

SELECT *
FROM   Orders
WHERE  Quantity BETWEEN 50 AND 200;

EXCEPT operator

[edit]

The SQL EXCEPT operator takes the distinct rows of one query and returns the rows that do not appear in a second result set. For purposes of row elimination and duplicate removal, the EXCEPT operator does not distinguish between NULLs. The EXCEPT ALL operator does not remove duplicates, but if a row appears X times in the first query and Y times in the second, it will appear times in the result set.

Notably, the Oracle platform provides a MINUS operator which is functionally equivalent to the SQL standard EXCEPT DISTINCT operator.[3]

Example

[edit]

The following example EXCEPT query returns all rows from the Orders table where Quantity is between 1 and 49, and those with a Quantity between 76 and 100.

Worded another way; the query returns all rows where the Quantity is between 1 and 100, apart from rows where the quantity is between 50 and 75.

SELECT *
FROM   Orders
WHERE  Quantity BETWEEN 1 AND 100

EXCEPT

SELECT *
FROM   Orders
WHERE  Quantity BETWEEN 50 AND 75;

Example

[edit]

The following example is equivalent to the above example but without using the EXCEPT operator.

SELECT o1.*
FROM (
    SELECT *
    FROM Orders
    WHERE Quantity BETWEEN 1 AND 100) o1
LEFT JOIN (
    SELECT *
    FROM Orders
    WHERE Quantity BETWEEN 50 AND 75) o2
ON o1.id = o2.id
WHERE o2.id IS NULL

See also

[edit]

References

[edit]
[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
In SQL, set operations are fundamental language constructs that enable the combination of result sets from multiple SELECT queries into a single unified output, treating the data as mathematical sets to perform operations like union, intersection, and difference. The core set operators include UNION, which merges rows from two or more queries while eliminating duplicates; INTERSECT, which returns only the rows common to all input queries; and EXCEPT (or MINUS in some database systems like ), which retrieves rows from the first query that are absent in subsequent queries. These operations require participating SELECT statements to have the same number of columns with compatible data types and sorting of the final result set can be specified using an ORDER BY clause applied to the entire compound statement. Variants such as UNION ALL, INTERSECT ALL, and EXCEPT ALL extend these operators by retaining duplicate rows, offering greater flexibility for scenarios involving redundant data without the overhead of deduplication. Set operations are supported across major relational database management systems (RDBMS), including , , SQL Server, , and , though exact syntax and availability of operators (e.g., EXCEPT vs. MINUS) may vary slightly between implementations. They are particularly useful for data analysis, reporting, and integration tasks, such as consolidating customer lists from different tables or identifying discrepancies between datasets, enhancing SQL's expressive power for complex queries without requiring subqueries or joins in every case.

Overview

Definition and Purpose

In SQL, set operations are binary mechanisms that combine the result sets—known as relations—from two or more SELECT statements to generate a new result set, without modifying the underlying original tables or relations. These operations treat the outputs of independent queries as mathematical sets, applying principles from to produce a unified relation that adheres to the same compatibility rules, such as matching column counts and compatible types. The core purpose of set operations is to facilitate complex in relational databases by merging datasets into a single view, identifying overlapping elements across queries, or highlighting discrepancies between them, thereby embodying the foundational operations of union, intersection, and difference. This approach allows users to perform set-based manipulations efficiently, supporting advanced analytical tasks without restructuring the source data. In contrast to JOIN operations, which pair rows from multiple tables through column-based matching conditions to form horizontally expanded results, set operations process entire result sets vertically by appending, intersecting, or differencing them as cohesive units, independent of inter-row relationships. Introduced as part of SQL standards to streamline multi-table querying, they eliminate the need for intermediate temporary tables or nested subqueries in many scenarios.

Historical Development

Set operations in SQL trace their conceptual origins to Edgar F. Codd's , introduced in his seminal 1970 paper, which drew directly from to define relations as sets and apply standard set operations like union, , and difference to them. This foundation addressed the need for declarative, set-based querying in database systems, overcoming limitations in earlier navigational data models by enabling efficient manipulation of entire datasets without procedural code. The first formalization of set operations in SQL came with the ANSI standard (ISO/IEC 9075:1992), which included support for the UNION, INTERSECT, and EXCEPT operators to combine, intersect, and difference results from multiple queries while eliminating duplicates by default. Prior to this, proprietary dialects like Oracle's had implemented equivalents such as MINUS for set difference since the 1980s, reflecting early commercial adoption of set-based features. Subsequent standards refined these operations: SQL:1999 (SQL3) enhanced SQL with features like recursive queries and common table expressions while maintaining core set operations; SQL:2003 clarified semantics around NULL handling in set operations—treating NULLs as equal for comparison purposes—and ordering behaviors to improve consistency across implementations. Later revisions, including SQL:2011 and SQL:2023 under ISO/IEC 9075, have maintained the core stability of UNION, INTERSECT, and EXCEPT while focusing on integrations like and property graphs, confirming their enduring role in set-based querying. For instance, achieved full support for these operators starting with version 7.1 in 2000, facilitating broader adoption in open-source systems.

Fundamentals

Prerequisites

To effectively utilize set operations in SQL, users must first be proficient in constructing basic SELECT statements, including the use of core clauses such as FROM to specify data sources, WHERE for filtering rows, GROUP BY for aggregation grouping, HAVING for filtering grouped results, and ORDER BY for sorting output. This foundational capability allows the assembly of individual queries that can then be combined via set operators, as each operand in a set operation is itself a complete SELECT statement. A key conceptual prerequisite is recognizing that SQL result sets are inherently unordered multisets—also known as bags—that permit duplicate rows unless DISTINCT is explicitly applied to impose set-like uniqueness. This semantics underpins how set operations process inputs, where the default behavior (without ALL) eliminates duplicates to mimic mathematical sets, while variants like UNION ALL preserve duplicates and maintain the bag structure. Furthermore, NULL values require special attention: in set operation comparisons, two NULLs are treated as equal to each other, enabling them to match or be excluded consistently across rows, in contrast to standard equality checks in WHERE clauses where NULL equals NULL evaluates to unknown. Set operations demand compatibility among operand result sets, specifically requiring an identical number of columns in each participating SELECT query; column names play no role in compatibility but determine the names in the final output set. Corresponding columns must feature compatible data types, with the SQL standard permitting implicit to resolve minor mismatches—such as promoting integers to decimals—based on type precedence rules that prioritize exact matches or common supertypes. These rules ensure seamless integration in management systems (RDBMS) like and SQL Server, where no familiarity with procedural SQL elements, such as functions or triggers, is necessary for basic set operation usage.

General Syntax Rules

Set operations in SQL combine the result sets of two or more SELECT queries into a single result set, following a standardized syntax framework defined in the SQL language specification. The basic form for a set operation is <query1> <set_operator> [ALL | DISTINCT] <query2>, where <set_operator> can be UNION, INTERSECT, or EXCEPT, and this structure can be extended to multiple queries, such as <query1> UNION <query2> INTERSECT <query3>. The optional ALL qualifier preserves duplicate rows from the input queries, while DISTINCT (the default if unspecified) eliminates them; column names and aliases in the final result are derived from the first query. For queries to be compatible, each SELECT statement must produce the same number of columns, with corresponding columns having compatible data types (e.g., within the same type family like numeric or character, allowing implicit where defined). Mismatched column counts or incompatible types result in a . Precedence among operators follows a left-to-right evaluation, with INTERSECT having higher precedence than EXCEPT and UNION in standard implementations, though parentheses are required to enforce specific grouping, such as (<query1> UNION <query2>) EXCEPT <query3>. In terms of row handling, NULL values are treated as equal to one another during comparisons for duplicate elimination or set inclusion, consistent with SQL's standard comparison semantics. Clauses like ORDER BY and LIMIT (or equivalent paging constructs) apply only to the overall result set of the combined query, not to individual component queries.

Core Operators

UNION Operator

The UNION operator in SQL combines the result sets of two or more SELECT queries into a single result set, including all unique rows from the inputs while eliminating duplicates by default. The operation aligns columns positionally, requiring the participating queries to have the same number of columns with compatible data types; the data type of each output column is determined by the highest-precedence type among the corresponding input columns. This positional alignment ensures that the first column of one query corresponds to the first column of the other, without regard to column names unless the optional CORRESPONDING BY clause is used in implementations supporting SQL:1999 Feature F301 or later standards. A primary for the UNION operator is merging similar datasets from different tables or sources, such as combining records from multiple regions or lists from separate databases to produce a unified view without redundant entries. For instance, to retrieve a list of unique product models from both a main table and a specialized accessories table, the following query can be employed:

SELECT ProductModelID, Name FROM Production.ProductModel UNION SELECT ProductModelID, Name FROM dbo.Gloves;

SELECT ProductModelID, Name FROM Production.ProductModel UNION SELECT ProductModelID, Name FROM dbo.Gloves;

This returns all distinct rows where matching entries, such as identical product IDs and names, appear only once. Similarly, combining names from employees and contractors would yield unique names across both groups, as in SELECT name FROM employees UNION SELECT name FROM contractors;. Regarding NULL handling, the UNION operator treats NULL values in corresponding columns as equal for the purpose of duplicate elimination, such that multiple identical rows containing NULLs are reduced to a single row in the output. For example, if both input queries return a row with all NULLs in matching positions, only one such row appears in the UNION result, as NULLs are considered comparable under the set operation's equality rules. The UNION operator has the same precedence as other set operators like INTERSECT and EXCEPT, making it the lowest level among them; operations are evaluated from left to right, and parentheses are required to enforce specific nesting or grouping in compound expressions. For instance, (SELECT * FROM table1 UNION SELECT * FROM table2) EXCEPT SELECT * FROM table3 ensures the UNION is computed first before applying EXCEPT.

INTERSECT Operator

The INTERSECT operator in SQL combines the result sets of two or more SELECT statements and returns only the distinct rows that are present in every input query, effectively computing the set . This operation requires exact matches across all corresponding columns in the rows, with the number, order, and compatible data types of columns being identical between the queries. By default, INTERSECT eliminates duplicates, treating the output as a where only unique rows from the intersection are retained. Row comparison for INTERSECT is based on equality of all column values, following the database's precedence and rules, without implicit conversions between incompatible types such as character and numeric. NULL values are handled in a null-safe manner: two NULLs in corresponding positions are considered equal, allowing rows with matching NULLs to be included in the result if they appear in all input sets. This differs from standard WHERE clause comparisons, where NULL typically does not equal NULL. When using INTERSECT ALL, the operator retains duplicates based on the minimum multiplicity across the input queries, rather than removing them. This variant is useful for preserving the frequency of overlapping rows as determined by the scarcest occurrence in the inputs. However, not all database systems support the ALL modifier uniformly. A practical example illustrates this: consider two tables tracking products ordered in different years.

sql

SELECT product FROM orders_2024 INTERSECT SELECT product FROM orders_2025;

SELECT product FROM orders_2024 INTERSECT SELECT product FROM orders_2025;

This query returns only the products that appear in both result sets, such as those ordered in both 2024 and 2025, with duplicates removed. If the orders_2024 table has two entries for "Widget A" and orders_2025 has one, the result includes "Widget A" only once under default INTERSECT. Unique use cases for INTERSECT include identifying overlapping data across datasets, such as common customers between lists from different campaigns or shared employees in multiple department rosters, enabling analysis of intersections without including extraneous records. For instance, it can pinpoint products stocked in both warehouses by intersecting queries. Limitations of INTERSECT include lack of native support in older versions of some database management systems, such as prior to 8.0 (2018), where emulation via subqueries or joins is required. However, major systems like have supported it since 2000, and current versions of , SQL Server, and provide full support. Additionally, it cannot operate on certain data types like BLOB, CLOB, or XML in systems like and SQL Server, and performance may degrade in distributed environments due to local execution.

EXCEPT Operator

The EXCEPT operator, introduced in the SQL:1999 standard, performs a set difference operation by returning all distinct rows from the first SELECT query that do not appear in the second SELECT query (or subsequent queries if chained). This operator is asymmetric, meaning the order of the queries matters; swapping the queries will generally produce different results, as it subtracts the right-hand set from the left-hand set. By default, EXCEPT eliminates duplicates across the result set, treating it as a difference where matching rows are excluded only once unless the ALL modifier is used. In terms of detailed handling, EXCEPT compares rows based on all columns, requiring the participating queries to have the same number of columns with compatible data types; column names and order from the first query are preserved in the output. NULL values are treated as equal during row comparisons, so a row with NULL in the first query will be excluded if an identical row (including NULL) exists in the second query. When using EXCEPT ALL, the operation preserves multiplicities by subtracting the count of matching rows from the second query from those in the first; for example, if the first query has three identical rows and the second has two, one row remains in the result. If no rows in the second query match any in the first, the entire result set from the first query is returned unchanged. Unique use cases for EXCEPT include identifying discrepancies between datasets, such as finding employees eligible for a program but absent from a promotion list, or detecting records in a current not reflected in transactions. For instance, the following query retrieves user IDs from a full users table that are not present in an active_users table, effectively listing inactive users:

SELECT id FROM users EXCEPT SELECT id FROM active_users;

SELECT id FROM users EXCEPT SELECT id FROM active_users;

This returns only the distinct IDs unique to the users table. In some database systems, EXCEPT serves as the standard name since SQL:1999, while alternatives like MINUS are used interchangeably in others, such as where MINUS has identical semantics to EXCEPT. supports EXCEPT but not MINUS as a direct synonym.

Modifiers and Variations

ALL vs DISTINCT Behavior

In SQL set operations, the DISTINCT modifier, which is the default when no set quantifier is specified, eliminates duplicate rows from the combined result set, treating rows as identical based on all column values, including those containing NULLs, which are considered equal for comparison purposes. This behavior ensures that the output represents a true set, with each unique row appearing exactly once, regardless of how many times it occurs across the input queries. For instance, if two queries each return the row (1, NULL), the result under DISTINCT will include only one such row. In contrast, the ALL modifier preserves all rows from the input queries, including duplicates, treating the operation as a union, intersection, or difference that retains multiplicity (i.e., the count of occurrences in the inputs). This is particularly useful when the intent is to combine datasets without deduplication, such as aggregating s from multiple sources. For UNION ALL, INTERSECT ALL, and EXCEPT ALL, the result includes duplicates based on the minimum multiplicity for intersections (retaining the smaller of matching rows) or the adjusted difference for exceptions (retaining the excess from the first input after subtracting the second). To illustrate, the query SELECT 1 AS val UNION ALL SELECT 1 AS val; produces two rows of value 1, whereas SELECT 1 AS val UNION SELECT 1 AS val; yields only one row. The ALL and DISTINCT modifiers apply independently to each set operator in a chained expression, allowing mixtures such as UNION ALL followed by EXCEPT DISTINCT, though such combinations are uncommon in practice due to their complexity. In chains of operations, the choice of modifier can multiplicatively affect the final row count; for example, multiple ALL operators may exponentially increase output size by preserving duplicates at each step, while DISTINCT operations reduce it. According to the SQL:1999 standard (ISO/IEC 9075-2:1999), the DISTINCT behavior is implicit when the set quantifier is omitted, making explicit specification of ALL optional but necessary for semantics; support for explicit DISTINCT (as in UNION DISTINCT) is an optional feature (T551).

Database-Specific Implementations

provides full support for SQL:1999 set operations, including UNION, INTERSECT, EXCEPT, and their ALL variants, allowing duplicates to be preserved when specified. INTERSECT and EXCEPT were introduced in version 7.1, released on April 13, 2001, enabling comprehensive set-based query combinations from that point onward. introduced support for EXCEPT and INTERSECT in version 2005, aligning with SQL:2003 standards for these operators while maintaining UNION as a longstanding feature. Unlike some other systems, SQL Server does not recognize MINUS as a for EXCEPT, requiring explicit use of the standard keyword. Additionally, the TOP clause can be integrated directly into set operations to limit result sets, such as in SELECT TOP 10 * FROM table1 EXCEPT SELECT TOP 10 * FROM table2, providing flexible row limiting without subqueries. Oracle Database traditionally employs MINUS in place of EXCEPT for set difference operations prior to version 21c, where EXCEPT was formally added alongside enhancements like EXCEPT ALL. INTERSECT has been supported since Oracle 9i, facilitating row across queries. The ALL modifier for preserving duplicates in UNION has been available since Oracle 7, though extensions to INTERSECT ALL and MINUS ALL arrived in 21c. Oracle uniquely extends set operations through integration with hierarchical queries using the CONNECT BY clause, allowing compound queries to traverse tree-like structures, such as combining UNION with hierarchical subqueries for organizational . MySQL lacked native support for INTERSECT and EXCEPT until version 8.0.31, released in October 2022, where limited implementation was added for these operators alongside UNION, primarily supporting DISTINCT semantics by default and ALL for duplicates. Prior to this, developers emulated INTERSECT and EXCEPT using JOINs or subqueries, with UNION ALL serving as the dominant set operation for combining results without deduplication. SQLite offers full support for UNION, INTERSECT, and EXCEPT, including ALL variants, since version 2.4.0, released on June 6, 2000, enabling lightweight set operations in embedded environments. Across these systems, key differences include case insensitivity for SQL keywords like UNION or EXCEPT, which is standard and does not vary significantly, though identifier case sensitivity depends on database settings—such as SQL Server's default case-insensitive versus PostgreSQL's configurable options. Mismatches in column count or data types during set operations trigger vendor-specific error messages, like Oracle's ORA-01789 for differing SELECT lists or MySQL's more generic syntax errors pre-8.0.31.

Advanced Topics

Performance Implications

Set operations in SQL are executed using mechanisms such as hashing or sorting to handle deduplication and row matching, particularly when the default DISTINCT semantics are applied to UNION, INTERSECT, and EXCEPT. For UNION ALL, INTERSECT ALL, and EXCEPT ALL, the database management system (DBMS) often employs simple concatenation or append operations, avoiding the overhead of duplicate elimination and resulting in lower computational costs. INTERSECT and EXCEPT are commonly implemented via semi-join operators (left semi-join for INTERSECT and left anti semi-join for EXCEPT), which require building hash tables or sorted structures from both input result sets to identify matching or non-matching rows. Cost factors for set operations scale significantly with dataset size, amplifying input/output (I/O) demands and CPU usage due to full scans of input queries and subsequent processing for matching or elimination. UNION ALL is generally the least expensive, as it bypasses deduplication and can leverage parallel scans in modern DBMS implementations. In contrast, INTERSECT and EXCEPT incur higher costs from the need to process both result sets completely, often involving hash builds or sorts that can spill to disk for large inputs, making them more resource-intensive than UNION. For example, in SQL Server, hash-based deduplication for UNION often has lower estimated costs than sorting-based approaches on comparable inputs. Optimization strategies include preferring ALL variants when duplicates are acceptable or absent, as this eliminates sorting or hashing overhead. Query planners in advanced DBMS like and SQL Server may rewrite set operations as equivalent JOINs, allowing the use of indexes on join columns to reduce scan costs. Avoiding an explicit ORDER BY clause is advisable unless required, as set operations do not guarantee output order and adding sorting can double execution costs in some cases. In versions 14 and later, parallelism enhances performance for set operations by distributing scans and appends across worker processes, particularly for UNION ALL on partitioned tables. Benchmarks illustrate these differences; optimized UNION plans with deduplication can reduce the number of I/O scans compared to concatenation but may increase overall cost slightly due to additional processing. Operations like EXCEPT can exhibit higher latency in execution plans, with hash match semi-joins incurring higher estimated costs than simple concatenations in SQL Server. In distributed environments, local execution of INTERSECT and EXCEPT further impacts performance by preventing pushdown to remote servers. Common pitfalls include overusing default DISTINCT semantics, which triggers unnecessary deduplication and can significantly increase CPU usage on unsorted inputs compared to ALL variants. Large result sets from set operations may also require substantial temporary in memory or on disk for intermediate hashing or sorting, leading to spills and degraded on systems with limited resources.

Common Use Cases

Set operations in SQL are frequently employed for tasks, where the UNION operator consolidates results from disparate sources into a unified . For instance, in enterprise environments, UNION can merge log entries from web and mobile platforms to create a comprehensive view of user activity across channels, enabling holistic reporting without silos. This approach is particularly useful in scenarios involving multiple feeds, such as combining department records with warehouse inventories to generate a single list of locations. Overlap analysis leverages the INTERSECT operator to identify shared records between datasets, facilitating auditing and validation processes. A common application is in (CRM) systems, where INTERSECT detects duplicate addresses across customer lists to prevent redundant outreach and ensure data accuracy. Similarly, it can pinpoint common product IDs in both inventory and order tables, aiding in stock reconciliation and oversight. The EXCEPT operator excels in difference detection, highlighting records unique to one for change tracking and anomaly identification. In inventory management, for example, EXCEPT compares current stock against previous snapshots to isolate new or discontinued items, supporting timely updates to strategies. This is evident in queries that return products listed in an inventory table but absent from work orders, allowing teams to address potential discrepancies. Chained set operations enable complex filtering by nesting UNION, INTERSECT, and EXCEPT to refine results progressively. Consider a scenario for user management: a query might first UNION active and pending user lists, then apply EXCEPT to exclude banned accounts, yielding a targeted set of eligible users for notifications. Parentheses ensure proper precedence, as in (SELECT user_id FROM active_users UNION SELECT user_id FROM pending_users) EXCEPT SELECT user_id FROM banned_users. Best practices for set operations include integrating them with subqueries for modular query design, which enhances readability and —such as wrapping a UNION result in a subquery for further filtering. NULL values require careful testing, as INTERSECT and EXCEPT treat matching NULLs as equal (unlike standard comparisons where NULL ≠ NULL), potentially including unintended rows; use COALESCE or IS NULL checks to mitigate this. For scaling analyses, pair set operations with window functions like COUNT() OVER() to aggregate counts within results, e.g., tallying overlaps per category without collapsing the dataset.

References

Add your contribution
Related Hubs
Contribute something
User Avatar
No comments yet.