Hash join
from Wikipedia

The hash join is an example of a join algorithm and is used in the implementation of a relational database management system. All variants of hash join algorithms involve building hash tables from the tuples of one or both of the joined relations, and subsequently probing those tables so that only tuples with the same hash code need to be compared for equality in equijoins.

Hash joins are typically more efficient than nested loops joins, except when the probe side of the join is very small. They require an equijoin predicate (a predicate comparing records from one table with those from the other table using a conjunction of equality operators '=' on one or more columns).

Classic hash join


The classic hash join algorithm for an inner join of two relations proceeds as follows:

  • First, prepare a hash table using the contents of one relation, ideally whichever one is smaller after applying local predicates. This relation is called the build side of the join. The hash table entries are mappings from the value of the (composite) join attribute to the remaining attributes of that row (whichever ones are needed).
  • Once the hash table is built, scan the other relation (the probe side). For each row of the probe relation, find the relevant rows from the build relation by looking in the hash table.

The first phase is usually called the "build" phase, while the second is called the "probe" phase. Similarly, the join relation on which the hash table is built is called the "build" input, whereas the other input is called the "probe" input.
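The build and probe phases described above can be sketched in a few lines of Python. This is a minimal illustration, not a database-engine implementation: a Python dict stands in for the hash table, rows are plain dicts, and the relation and column names (`custs`, `orders`, `cust_id`) are invented for the example.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Classic in-memory hash join (inner equi-join) over lists of dicts."""
    # Build phase: map each join-key value to the build rows carrying it.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: one hash lookup per probe row.
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            yield {**match, **row}

custs = [{"cust_id": 1, "name": "Ada"}]
orders = [{"cust_id": 1, "item": "pen"}, {"cust_id": 2, "item": "ink"}]
print(list(hash_join(custs, orders, "cust_id", "cust_id")))
# → [{'cust_id': 1, 'name': 'Ada', 'item': 'pen'}]
```

Note that the smaller relation (`custs`) is used as the build side, matching the heuristic described above.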

This algorithm is simple, but it requires that the smaller join relation fits into memory, which is sometimes not the case. A simple approach to handling this situation proceeds as follows:

  1. For each tuple in the build input
    1. Add to the in-memory hash table
    2. If the size of the hash table equals the maximum in-memory size:
      1. Scan the probe input, and add matching join tuples to the output relation
      2. Reset the hash table, and continue scanning the build input
  2. Do a final scan of the probe input and add the resulting join tuples to the output relation

This is essentially the same as the block nested loop join algorithm. This algorithm may scan the probe input more times than necessary.
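The numbered steps above can be sketched as follows, under stated simplifications: the "maximum in-memory size" is modeled as a cap on distinct keys in the table, rows are dicts, and the function and parameter names (`limited_memory_hash_join`, `max_entries`) are invented for illustration.

```python
def limited_memory_hash_join(build_rows, probe_rows, key, max_entries):
    """Spill-free fallback: fill the hash table up to max_entries, join the
    whole probe input against it, then reset and keep scanning the build input."""
    out = []
    table = {}
    for row in build_rows:
        table.setdefault(row[key], []).append(row)
        if len(table) >= max_entries:
            out.extend(_probe(table, probe_rows, key))
            table = {}  # reset the hash table, continue scanning the build input
    if table:  # final scan of the probe input for the leftover build rows
        out.extend(_probe(table, probe_rows, key))
    return out

def _probe(table, probe_rows, key):
    for row in probe_rows:
        for match in table.get(row[key], []):
            yield {**match, **row}
```

Each reset forces another full scan of the probe input, which is exactly the repeated-scan cost the grace hash join below is designed to avoid.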

Grace hash join


A better approach is known as the "grace hash join", after the GRACE database machine for which it was first implemented.

This algorithm avoids rescanning the entire relation by first partitioning both relations via a hash function, and writing these partitions out to disk. The algorithm then loads pairs of partitions into memory, builds a hash table for the smaller partitioned relation, and probes the other relation for matches with the current hash table. Because the partitions were formed by hashing on the join key, any join output tuples must come from the same pair of partitions.

It is possible that one or more of the partitions still does not fit into the available memory, in which case the algorithm is recursively applied: an additional orthogonal hash function is chosen to hash the large partition into sub-partitions, which are then processed as before. Since this is expensive, the algorithm tries to reduce the chance that it will occur by forming the smallest partitions possible during the initial partitioning phase.
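The partition-then-join structure can be sketched as below. This is a simplification for illustration: partitions are held in Python lists rather than written to disk, and the function name `grace_hash_join` and the parameter `num_partitions` are invented. Note that both relations must be partitioned with the same hash function so that matching keys land in the same partition pair.

```python
def grace_hash_join(r_rows, s_rows, key, num_partitions=8):
    """Sketch of a grace hash join; lists stand in for on-disk partitions."""
    # Partition phase: route both inputs by a hash of the join key.
    r_parts = [[] for _ in range(num_partitions)]
    s_parts = [[] for _ in range(num_partitions)]
    for row in r_rows:
        r_parts[hash(row[key]) % num_partitions].append(row)
    for row in s_rows:
        s_parts[hash(row[key]) % num_partitions].append(row)
    # Join phase: classic hash join on each corresponding partition pair.
    for r_part, s_part in zip(r_parts, s_parts):
        table = {}
        for row in r_part:
            table.setdefault(row[key], []).append(row)
        for row in s_part:
            for match in table.get(row[key], []):
                yield {**match, **row}
```

In a real system the join-phase hash table would use a hash function orthogonal to the partitioning hash, as the recursion step in the text requires.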

Hybrid hash join


The hybrid hash join algorithm[1] is a combination of the classical hash join and grace hash join. It uses a minimal amount of memory for partitioning, as in the grace hash join, and uses the remaining memory to run a classical hash join during the partitioning phase. During the partitioning phase, the hybrid hash join uses the available memory for two purposes:

  1. To partition both relations, and
  2. To hold an entire partition from the build relation in memory, known as "partition 0"

Because partition 0 is never written to disk, hybrid hash join typically performs fewer I/O operations than grace hash join. When the build relation fits almost entirely into memory, hybrid hash join behaves like the classical hash join, which is more beneficial; otherwise it imitates grace hash join.

Note that this algorithm is memory-sensitive, because there are two competing demands for memory (the hash table for partition 0, and the output buffers for the remaining partitions). Choosing too large a hash table for partition 0 might cause the algorithm to recurse because one of the non-zero partitions is too large to fit into memory.
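A sketch of the hybrid scheme follows, with loud simplifications: "partition 0" is simply whichever rows hash to bucket 0 (a real optimizer sizes it against available memory), spilled partitions are plain lists standing in for disk files, and the name `hybrid_hash_join` is invented.

```python
def hybrid_hash_join(r_rows, s_rows, key, num_partitions=4):
    """Sketch: partition 0 of the build side stays in an in-memory hash
    table and is probed on the fly; other partitions are 'spilled'."""
    part0_table = {}
    r_spill = [[] for _ in range(num_partitions - 1)]
    for row in r_rows:
        p = hash(row[key]) % num_partitions
        if p == 0:
            part0_table.setdefault(row[key], []).append(row)
        else:
            r_spill[p - 1].append(row)
    s_spill = [[] for _ in range(num_partitions - 1)]
    out = []
    for row in s_rows:
        p = hash(row[key]) % num_partitions
        if p == 0:  # probe partition 0 immediately, no I/O for these rows
            out.extend({**m, **row} for m in part0_table.get(row[key], []))
        else:
            s_spill[p - 1].append(row)
    # Grace-style pass over the spilled partition pairs.
    for r_part, s_part in zip(r_spill, s_spill):
        table = {}
        for row in r_part:
            table.setdefault(row[key], []).append(row)
        out.extend({**m, **row} for row in s_part
                   for m in table.get(row[key], []))
    return out
```

The memory tension the paragraph above describes corresponds here to the competition between `part0_table` and the spill buffers.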

Hash anti-join


Hash joins can also be evaluated for an anti-join predicate (a predicate selecting values from one table when no related values are found in the other). Depending on the sizes of the tables, different algorithms can be applied:

Hash left anti-join

  • Prepare a hash table for the NOT IN side of the join.
  • Scan the other table, selecting any rows where the join attribute hashes to an empty entry in the hash table.

This is more efficient when the NOT IN table is smaller than the FROM table.
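Since only the existence of a key matters here, a hash set serves as the hash table. A minimal sketch (row and function names invented for illustration):

```python
def hash_left_anti_join(from_rows, not_in_rows, key):
    """Build on the NOT IN side; emit FROM rows whose key is absent."""
    excluded = {row[key] for row in not_in_rows}  # a set suffices: only existence matters
    return [row for row in from_rows if row[key] not in excluded]
```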

Hash right anti-join

  • Prepare a hash table for the FROM side of the join.
  • Scan the NOT IN table, removing the corresponding records from the hash table on each hash hit.
  • Return everything left in the hash table.

This is more efficient when the NOT IN table is larger than the FROM table.
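The three steps above can be sketched as follows (a simplified illustration; names invented). Each hash hit deletes the matching build entries, and the survivors are the anti-join result:

```python
def hash_right_anti_join(from_rows, not_in_rows, key):
    """Build on the FROM side, delete on each hit, return survivors."""
    table = {}
    for row in from_rows:
        table.setdefault(row[key], []).append(row)
    for row in not_in_rows:
        table.pop(row[key], None)  # each hash hit removes the matching entries
    return [row for rows in table.values() for row in rows]
```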

Hash semi-join


Hash semi-join is used to return the records found in the other table. Unlike the plain join, it returns each matching record from the leading table only once, regardless of how many matches there are in the IN table.

As with the anti-join, semi-join can also be left and right:

Hash left semi-join

  • Prepare a hash table for the IN side of the join.
  • Scan the other table, returning any rows that produce a hash hit.

The records are returned as soon as they produce a hit. The actual records from the hash table are ignored.

This is more efficient when the IN table is smaller than the FROM table.
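As with the left anti-join, only key existence matters on the build side, so a set can serve as the hash table. A minimal sketch (names invented):

```python
def hash_left_semi_join(from_rows, in_rows, key):
    """Build on the IN side; each FROM row is returned at most once,
    as soon as its key hits the hash table."""
    keys = {row[key] for row in in_rows}
    return [row for row in from_rows if row[key] in keys]
```

Duplicates in the IN table collapse into the set, so a FROM row cannot be returned twice regardless of how many matches exist.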

Hash right semi-join

  • Prepare a hash table for the FROM side of the join.
  • Scan the IN table, returning the corresponding records from the hash table and removing them.

With this algorithm, each record from the hash table (that is, FROM table) can only be returned once, since it is removed after being returned.

This is more efficient when the IN table is larger than the FROM table.
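A sketch of the right semi-join (names invented): removing entries on the first hit is what guarantees each FROM record appears at most once in the output.

```python
def hash_right_semi_join(from_rows, in_rows, key):
    """Build on the FROM side; each build row is returned at most once
    because it is removed from the table after the first hit."""
    table = {}
    for row in from_rows:
        table.setdefault(row[key], []).append(row)
    out = []
    for row in in_rows:
        out.extend(table.pop(row[key], []))  # return and remove on hit
    return out
```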

from Grokipedia
A hash join is a fundamental join algorithm in database management systems (DBMS) that combines two input relations, R and S, based on an equality predicate on specified join attributes, by constructing an in-memory hash table from one relation and probing it with tuples from the other to identify matches. The algorithm operates in two primary phases: the build phase, where the smaller relation (typically R) is scanned to populate a hash table using a hash function on the join keys, and the probe phase, where the larger relation (S) is scanned, with each tuple hashed to locate potential matches in the table, followed by verification to handle collisions. This approach minimizes the number of comparisons required, making it particularly efficient for equi-joins on large datasets where the build relation fits in memory. For cases where relations exceed available memory, variants like the Grace hash join extend the algorithm with an initial partitioning phase, where both relations are hashed into multiple buckets using a partitioning hash function, potentially recursively until buckets fit in memory, followed by independent joins on matching buckets to reduce I/O costs to approximately 3(M + N) page accesses, where M and N are the sizes of R and S in pages. The hybrid hash join further optimizes this by retaining frequently accessed (or "hot") partitions in memory during partitioning, avoiding disk spills for skewed data distributions and blending elements of both in-memory and partitioned strategies. These variants can leverage techniques such as Bloom filters during probing to filter out non-matching tuples early, reducing unnecessary I/O. Hash joins are widely used in modern DBMS for online analytical processing (OLAP) workloads due to their superior performance over alternatives like nested-loop joins (which scale poorly with O(m × n) comparisons) or sort-merge joins (which incur sorting overhead), especially on unsorted data with sufficient memory availability.
However, they are memory-intensive, requiring space for the hash table proportional to the build relation's size, and are limited to equality conditions, with potential performance degradation from hash collisions or data skew. Optimizations in multi-core environments focus on parallel partitioning and probing to exploit hardware parallelism while minimizing synchronization overhead.

Fundamentals

Definition and Motivation

The hash join is an algorithm for performing equi-joins between two relations in a database management system, where tuples from relations R and S are matched based on equality conditions such as R.A = S.B. It applies a hash function to the join attributes of one relation (typically the smaller, called the build relation) to construct an in-memory hash table with buckets containing tuples grouped by their hash values. Tuples from the other relation (the probe relation) are then hashed and probed against this table to identify matching tuples efficiently, with collisions resolved within the same bucket. This hashing reduces the search space dramatically compared to brute-force methods, allowing for rapid identification of join pairs through in-memory lookups. The motivation for hash joins arises from the limitations of alternative join strategies, particularly nested-loop joins, which incur a quadratic time complexity of O(n × m) for relations of sizes n and m, rendering them inefficient for large-scale data. In contrast, hash joins achieve a linear time complexity of O(n + m) on average by scanning each relation once and leveraging hashing to limit comparisons to relevant subsets of tuples. Additionally, variants of hash joins excel in memory-limited settings by employing partitioning to process data in smaller, memory-resident units, thereby reducing disk I/O overhead and supporting scalability in environments where entire relations cannot fit in main memory. The classic hash join assumes sufficient memory for the build relation. Hash joins trace their origins to the early 1980s amid research on optimizing relational query processing, with key developments including Kjell Bratbergsengen's hash-based relational algebra methods from the ASTRA project, first documented in 1980 and fully published in 1984. Concurrently, David DeWitt and colleagues advanced hash join techniques for multiprocessor systems in 1984-1985, influencing parallel database implementations.
These foundational works enabled the integration of hash joins into major commercial systems by the mid-1990s.

Comparison to Other Join Methods

Hash joins are one of several algorithms used in database management systems (DBMS) to combine data from multiple tables based on join conditions. The nested-loop join is the simplest approach, iterating through each row of one table (the outer) and scanning the other table (the inner) for matches on the join predicate, resulting in a worst-case time complexity of O(n × m), where n and m are the sizes of the two relations. This method performs well for small datasets or when indexes are available on the inner table but scales poorly for large, unindexed tables due to its quadratic behavior. In contrast, the sort-merge join first sorts both input relations on the join keys, which takes O(n log n + m log m) time, followed by a linear merge pass in O(n + m) to produce matching pairs. It is particularly stable for non-equi-join conditions, such as inequalities, and handles sorted data efficiently without requiring indexes, though the upfront sorting cost can be prohibitive for unsorted inputs. Hash joins excel in equi-join scenarios, where the join condition is equality on specific attributes, achieving an average-case complexity of O(n + m) under uniform hash distribution, making them asymptotically optimal for large equi-joins. Their primary strength lies in constant-time lookups via in-memory hash tables, avoiding the repeated scans of nested loops or the sorting overhead of sort-merge joins, particularly when sufficient memory is available to build the hash table. However, hash joins falter with skewed distributions, where hash collisions lead to uneven partition sizes and potential degradation, and they are generally unsuitable for non-equi conditions due to their reliance on exact matches. In modern DBMS such as SQL Server, hash joins are the preferred strategy for medium-to-large equi-joins on unindexed or unsorted tables, as they leverage available memory to minimize I/O and enable parallelism.
They are commonly selected by query optimizers when the build phase fits in memory and the join keys exhibit good selectivity, outperforming alternatives in data warehousing workloads. Empirical studies on benchmarks like TPC-H demonstrate hash joins outperforming sort-merge and nested-loop joins by factors of 2-10x in equi-join dominated queries, especially for scale factors where memory constraints are manageable and data skew is moderate. For instance, in TPC-H query processing, hash-based implementations reduce execution time significantly compared to sort-merge variants by avoiding sorting overhead, though sort-merge may edge out hash joins in scenarios with pre-sorted inputs or inequality joins.

Core Mechanics

Build and Probe Phases

The hash join algorithm consists of two primary phases: the build phase, in which a hash table is constructed from one input relation, and the probe phase, in which the second relation is used to query the hash table for matches. This two-phase structure forms the foundational workflow for all hash join variants, enabling efficient equi-joins on large datasets when sufficient memory is available. In the build phase, the smaller of the two relations (R and S) is conventionally selected as the build input to minimize memory consumption and maximize the likelihood that the hash table fits entirely in main memory. For each tuple in the build relation, a hash function is applied to the join key (or composite join keys) to compute a bucket index, and the tuple is inserted into the corresponding bucket of the in-memory hash table. If the entire build relation fits in memory, the process avoids disk I/O; otherwise, the algorithm may require extensions beyond the basic phases, though the core insertion logic remains the same. The hash table is typically sized to achieve a load factor below 1 to balance space and performance. Collisions in the hash table, arising when multiple tuples hash to the same bucket, are resolved using methods such as chaining, where the tuples in a bucket are stored in a linked list, or open addressing, where an alternative slot is probed via techniques like linear or quadratic probing. Chaining is particularly common in database implementations due to its simplicity and robustness against clustering, though it introduces pointer overhead. The algorithm assumes a high-quality hash function that distributes tuples uniformly across buckets, minimizing skew where some buckets become disproportionately large and degrade performance through excessive chain lengths or probes. This uniform distribution ensures average-case constant-time lookups and is a key precondition for the efficiency of the classic hash join, the simplest non-partitioned realization of these phases. The probe phase processes the larger relation (the probe input) by iterating through its tuples, computing the hash of each join key, and looking up the corresponding bucket in the hash table.
For each potential match in the bucket (e.g., by scanning the chain or probing slots), an equality check on the join keys confirms a join; matching tuples are then concatenated and output. Non-matching probe tuples are discarded for inner joins. This phase leverages the precomputed hash table for rapid access, typically achieving O(1) average time per probe under uniform hashing. The following high-level pseudocode illustrates the build and probe phases for an in-memory hash join:

Build Phase:

    for each tuple t in build_relation:
        h = hash(t.join_key)
        insert t into hash_table[h]   // using chaining or open addressing for collisions

Probe Phase:

    for each tuple s in probe_relation:
        h = hash(s.join_key)
        for each t in bucket hash_table[h]:   // scan chain or probe slots
            if t.join_key == s.join_key:
                output joined tuple (t, s)

This outline captures the essential steps, with the insert and bucket traversal adapted to the chosen collision resolution strategy.

Classic Hash Join

The classic hash join is a fundamental join algorithm in database management systems that operates entirely in main memory, assuming the smaller relation fits completely within available memory to avoid disk I/O during execution. This method leverages hashing to enable efficient equi-joins by constructing an in-memory hash table from one relation and probing it with tuples from the other, making it suitable for scenarios where memory constraints are not prohibitive and relations are not excessively large. The algorithm proceeds in two primary phases without any partitioning: first, the build phase constructs a hash table using the smaller relation (denoted as R) on the join attribute(s); second, the probe phase scans the larger relation (S) and uses each tuple's join attribute to probe the hash table for matches. In the build phase, each tuple from R is hashed based on the join key, and the tuple (or a pointer to it) is inserted into the corresponding bucket of the hash table, which may handle collisions via chaining or open addressing. During the probe phase, for each tuple in S, the system computes the hash value of its join attribute, locates the matching bucket, and compares the tuple against those in the bucket to identify exact matches; matching pairs are then output, while non-matches are discarded. This process assumes a good hash function that distributes keys uniformly to minimize collisions and bucket overflows. To illustrate, consider a simple equi-join on attribute A between relations R (3 tuples) and S (5 tuples), where R is the build relation. Relation R:
ID  A   B
1   10  x
2   20  y
3   30  z
Relation S:
ID  A   C
1   10  p
2   15  q
3   20  r
4   25  s
5   30  t
In the build phase, a hash table is constructed from R using attribute A as the key (assuming a simple hash function h(A) = A mod 7, with chaining for collisions):
  • h(10) = 3 → Bucket 3: (10, x)
  • h(20) = 6 → Bucket 6: (20, y)
  • h(30) = 2 → Bucket 2: (30, z)
In the probe phase, S is scanned sequentially:
  • For the S tuple with A=10, h(10)=3 → Match with (10, x) → Output (10, x, p)
  • For the S tuple with A=15, h(15)=1 → No match
  • For the S tuple with A=20, h(20)=6 → Match with (20, y) → Output (20, y, r)
  • For the S tuple with A=25, h(25)=4 → No match
  • For the S tuple with A=30, h(30)=2 → Match with (30, z) → Output (30, z, t)
The resulting join contains three tuples. A key limitation of the classic hash join is its reliance on the entire build relation fitting in memory; if R exceeds available memory, the algorithm fails or degrades severely due to excessive paging in virtual-memory systems, leading to high I/O costs and poor performance. It also offers no mechanism for handling spills to disk, making it unsuitable for large-scale data beyond early prototypes. The classic hash join was employed in early relational DBMS prototypes, particularly in research systems exploring efficient join strategies for main-memory operations.
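The worked example above can be replayed in a few lines of Python; the modulo-7 hash and the relation contents are taken directly from the tables, while the variable names are invented for illustration.

```python
R = [(1, 10, "x"), (2, 20, "y"), (3, 30, "z")]   # (ID, A, B)
S = [(1, 10, "p"), (2, 15, "q"), (3, 20, "r"),
     (4, 25, "s"), (5, 30, "t")]                 # (ID, A, C)

buckets = {}                      # build phase, using h(A) = A mod 7
for _, a, b in R:
    buckets.setdefault(a % 7, []).append((a, b))

result = []                       # probe phase
for _, a, c in S:
    for a2, b in buckets.get(a % 7, []):
        if a2 == a:               # confirm equality after the hash hit
            result.append((a, b, c))

print(result)
# → [(10, 'x', 'p'), (20, 'y', 'r'), (30, 'z', 't')]
```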

Partitioned Approaches

Grace Hash Join

The Grace hash join is a variant of the hash join algorithm designed to handle relations larger than available main memory by recursively partitioning the input relations into smaller buckets that fit in memory, thereby minimizing disk I/O operations. This approach separates the process into distinct partitioning and joining phases, enabling efficient processing on disk-based storage systems where random access is feasible. In the partitioning phase, both input relations are hashed on the join attribute using the same hash function to produce k buckets, where k is selected such that each resulting partition is small enough to fit into available memory during subsequent joining. The relations are read from disk, partitioned in a single pass, and the resulting buckets are written to disk, ensuring that corresponding partitions from each relation can be joined independently without cross-partition dependencies. The choice of k balances the number of partitions against I/O overhead, typically aiming for partitions that are approximately the size of memory minus space for output buffers. Following partitioning, the recursive joining phase processes corresponding pairs of partitions from the two relations. For each pair, if the combined size fits in memory, a classic hash join is applied directly; otherwise, the process recurses by further partitioning the pair into sub-buckets and repeating the algorithm until the base case is reached. This ensures scalability to arbitrarily large datasets, assuming sufficient disk space and random access capabilities for efficient reads and writes. The algorithm assumes storage systems supporting random access, which allows partitions to be read and written without sequential constraints, optimizing for multi-disk environments. The value of k is chosen to minimize total I/O, often set based on the ratio of relation size to memory capacity, ensuring balanced load across levels.
The name "Grace hash join" originates from the GRACE database machine project in the 1980s, an initiative at the University of Tokyo that developed hash-based relational processing techniques, though the method has since become a standard reference independent of the hardware. Compared to the classic hash join, which requires both relations to fit in memory, the Grace hash join accommodates datasets of arbitrary size by distributing processing across multiple passes, reducing the effective I/O to approximately three passes over each input relation (two reads and one write), as the partitioning phase involves reading each relation once and writing the partitions to disk, while the joining phase involves reading the partition pairs. This results in significantly lower I/O costs for large-scale joins, making it suitable for traditional database systems with limited main memory.

Hybrid Hash Join

The hybrid hash join algorithm optimizes hash-based equi-joins by selectively retaining smaller partitions in memory during the initial partitioning phase, thereby minimizing disk I/O compared to fully disk-based approaches. Introduced to leverage increasing main memory sizes in database systems, it partitions both input relations using a hash function on the join attribute while estimating the distribution of join keys, typically assuming uniformity or using available statistics, to determine partition sizes. In execution, the algorithm first hashes the build relation (R) into B+1 partitions, retaining the smallest partition (R₀) in memory (sized to approximately M - B blocks, where M is the available memory in blocks and B is the number of buffer pages for partitioning). The probe relation (S) is similarly partitioned into corresponding buckets, with S₀ joined immediately against the in-memory hash table for R₀. Larger partitions (Rᵢ and Sᵢ for i > 0) are spilled to disk and later joined using a Grace hash join as a fallback. Bucket sizing is determined by selecting the number of partitions such that 1-2 buckets can fit comfortably in memory, often starting with an estimate based on relation cardinalities and memory constraints; dynamic adjustments occur if skew causes overflows, such as by repartitioning oversized buckets or using sampling to refine distribution estimates. This approach handles data skew effectively by isolating large buckets for disk processing while joining smaller ones on-the-fly, reducing overall latency in skewed datasets. The primary advantages include fewer I/O operations, typically 2 passes over the data versus 3 for the Grace hash join, due to immediate in-memory joining of retained partitions, along with lower CPU overhead from reduced record comparisons. It is widely adopted in modern database optimizers, serving as the default equi-join operator in systems such as Apache AsterixDB.
For example, consider relations R (1 million tuples) and S (2 million tuples) with skewed join keys where 80% of tuples share one key value, available memory M = 100 blocks, and B = 10 buffers. The algorithm partitions into 11 buckets, retaining R₀ and S₀ (the non-skewed portions, roughly 20% of the data) in memory for immediate joining, while spilling the large skewed buckets to disk for subsequent Grace-style processing; this avoids unnecessary disk writes and reads for the smaller partitions, potentially halving I/O compared to uniform spilling.

Specialized Hash-Based Joins

Hash Semi-Join

A hash semi-join is a variant of the semi-join operation in relational databases that returns only the tuples from one input relation (typically the left relation, R) for which there exists at least one matching tuple in the other input relation (S) under the specified join condition, while retaining only the attributes from R and suppressing any duplicates from the matching side. This operation is formally defined as R ⋉ S = π_R (R ⋈ S), where π_R projects onto R's attributes after performing the inner join, ensuring no attributes from S appear in the output. The algorithm adapts the core hash join mechanism by building a hash table on the smaller relation (S) during the build phase and then probing with tuples from the larger relation (R) in the probe phase, outputting a probe tuple from R immediately upon finding the first match in the hash table without including any attributes from S. This adaptation leverages the same partitioning and hashing principles as classic hash joins but optimizes for existence checks rather than producing full cross-products. In a left semi-join (R ⋉ S), the output consists of tuples from the left relation R that have matches in S; a right semi-join (S ⋉ R) is symmetric but less commonly implemented due to typical query patterns favoring left-side outputs. The operation is directional, meaning R ⋉ S generally differs from S ⋉ R unless the relations and join condition are symmetric. Hash semi-joins are particularly useful in query optimization scenarios, such as rewriting EXISTS subqueries or applying filters to reduce intermediate result sizes in complex queries, thereby minimizing data transmission in distributed systems and avoiding the overhead of full joins.
Implementation details include early stopping during the probe phase, where processing for a given probe tuple halts upon the first match to confirm existence without scanning the entire hash bucket, and the use of bitmap or bit vector indexes in some database management systems to accelerate filtering by marking potential matches prior to probing.

Hash Anti-Join

A hash anti-join is a specialized variant of the hash join algorithm used in relational database systems to compute the anti-join operation, which returns tuples from one input relation that lack matching tuples in the other relation according to the specified join condition. This operation is particularly useful for identifying non-matching elements, forming the relational algebra equivalent of set difference under the join predicate. In a left anti-join, tuples from the left (outer) relation are output only if no corresponding tuples exist in the right (inner) relation; the right anti-join reverses this, outputting unmatched tuples from the right relation. The algorithm typically selects the smaller relation as the build input to optimize memory usage. During the build phase, a hash table is constructed by hashing tuples from the inner relation on the join attributes, often employing hybrid hashing to handle spills to disk if memory is limited. In the probe phase, each tuple from the outer relation is hashed to locate the relevant bucket in the table; the entire bucket must then be scanned to verify the absence of any matching tuple, with non-matching outer tuples passed to the output. This assumes equi-join predicates and relies on full bucket scans for correctness, distinguishing it from inner joins where matches trigger output. The technique is sensitive to hash collisions, which populate buckets with multiple tuples and necessitate exhaustive scans to confirm non-matches, thereby increasing computational overhead in skewed distributions. In exact implementations, these scans ensure no false negatives, but the cost rises with collision frequency; approximate variants may leverage probabilistic structures to mitigate this, though at the risk of erroneously excluding valid non-matches. Hash anti-joins find application in evaluating NOT EXISTS subqueries and NOT IN predicates, where confirming the absence of related records is essential, such as in constraint validation.
They support data cleaning by detecting discrepancies like missing references or duplicate exclusions in integration tasks. For anti-joins in distributed environments, they can be approximated using Bloom filters to pre-filter probes, complementing hash semi-joins that handle existence checks, and reducing data transfer while accepting controlled error rates.
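The Bloom-filter pre-filtering idea can be sketched as below. This is an illustrative toy, not a production filter: the class and function names (`TinyBloom`, `bloom_prefilter_anti_join`), the bit count, and the choice of three positions from one blake2b digest are all invented for the example. The key property used is that a Bloom filter has no false negatives, so a "definitely absent" answer safely confirms an anti-join match, while "might contain" rows still need the exact hash-table check.

```python
import hashlib

class TinyBloom:
    """Minimal Bloom filter sketch: three positions from one blake2b
    digest, m bits stored in a Python int."""
    def __init__(self, m=1024):
        self.m = m
        self.bits = 0

    def _positions(self, key):
        digest = hashlib.blake2b(str(key).encode(), digest_size=12).digest()
        return [int.from_bytes(digest[i:i + 4], "big") % self.m for i in (0, 4, 8)]

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits >> p & 1 for p in self._positions(key))

def bloom_prefilter_anti_join(from_rows, not_in_rows, key):
    """Split FROM rows into guaranteed non-matches and candidates that
    still require the exact hash anti-join check."""
    bf = TinyBloom()
    for row in not_in_rows:
        bf.add(row[key])
    definite, maybe = [], []
    for row in from_rows:
        (maybe if bf.might_contain(row[key]) else definite).append(row)
    return definite, maybe
```

In a distributed setting, only the compact bit array need be shipped between nodes, which is the data-transfer saving the paragraph above describes.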

Performance Considerations

Complexity Analysis

The classic hash join achieves an average-case time complexity of O(n + m), where n and m denote the sizes of the two input relations, under the assumption that the smaller relation fits entirely in main memory and the hash function provides uniform distribution without significant collisions. This bound arises from the build phase scanning and hashing the smaller relation once (O(min(n, m))) and the probe phase scanning the larger relation once while performing constant-time lookups (O(max(n, m))). The space complexity for the in-memory hash table during the build phase is O(min(n, m)), limited to the hashed tuples plus overhead for buckets and chains. In the average case, the total processing time can be formalized as T = n + m + h, where h represents the number of probes into the hash table, typically equaling the size of the probe relation (h = m), with successful or unsuccessful matches resolved in constant time. However, the worst-case time complexity degrades to O(n · m) due to data skew, where uneven hashing causes long chains or overflow, effectively reverting to a nested-loop scan within buckets. For the Grace hash join, which addresses inputs exceeding available memory through partitioning, the average-case complexity extends to O((n + m) log k), where k is the number of partitions determined by the available memory, accounting for the logarithmic depth of recursive partitioning and sub-join phases. The memory requirement per partition reduces to approximately O(√n).