Disjoint-set data structure
Disjoint-set data structure
Main page

Disjoint-set data structure

logo
Community Hub0 subscribers
What are your thoughts?
Be the first to start a discussion here.
Be the first to start a discussion here.
Disjoint-set data structure

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets (replacing them with their union), and finding a representative member of a set. The last operation makes it possible to determine efficiently whether any two elements belong to the same set or to different sets.

While there are several ways of implementing disjoint-set data structures, in practice they are often identified with a particular implementation known as a disjoint-set forest. This specialized type of forest performs union and find operations in near-constant amortized time. For a sequence of m addition, union, or find operations on a disjoint-set forest with n nodes, the total time required is O(mα(n)), where α(n) is the extremely slow-growing inverse Ackermann function. Although disjoint-set forests do not guarantee this time per operation, each operation rebalances the structure (via tree compression) so that subsequent operations become faster. As a result, disjoint-set forests are both asymptotically optimal and practically efficient.

Disjoint-set data structures play a key role in Kruskal's algorithm for finding the minimum spanning tree of a graph. The importance of minimum spanning trees means that disjoint-set data structures support a wide variety of algorithms. In addition, these data structures find applications in symbolic computation and in compilers, especially for register allocation problems.

Disjoint-set forests were first described by Bernard A. Galler and Michael J. Fischer in 1964. In 1973, their time complexity was bounded to , the iterated logarithm of , by Hopcroft and Ullman. In 1975, Robert Tarjan was the first to prove the (inverse Ackermann function) upper bound on the algorithm's time complexity. He also proved it to be tight. In 1979, he showed that this was the lower bound for a certain class of algorithms, pointer algorithms, that include the Galler-Fischer structure. In 1989, Fredman and Saks showed that (amortized) words of bits must be accessed by any disjoint-set data structure per operation, thereby proving the optimality of the data structure in this model.

In 1991, Galil and Italiano published a survey of data structures for disjoint-sets.

In 1994, Richard J. Anderson and Heather Woll described a parallelized version of Union–Find that never needs to block.

In 2007, Sylvain Conchon and Jean-Christophe Filliâtre developed a semi-persistent version of the disjoint-set forest data structure and formalized its correctness using the proof assistant Rocq (then: Coq). "Semi-persistent" means that previous versions of the structure are efficiently retained, but accessing previous versions of the data structure invalidates later ones. Their fastest implementation achieves performance almost as efficient as the non-persistent algorithm. They do not perform a complexity analysis.

Variants of disjoint-set data structures with better performance on a restricted class of problems have also been considered. Gabow and Tarjan showed that if the possible unions are restricted in certain ways, then a truly linear time algorithm is possible. In particular, linear time is achievable if a "union tree" is given a priori. This is a tree that includes all elements of the sets. Let p[v] denote the parent in the tree, then the assumption is that union operations must have the form union(v,p[v]) for some v.

See all
User Avatar
No comments yet.