Hubbry Logo
search
logo

Fusion tree

logo
Community Hub0 Subscribers
Write something...
Be the first to start a discussion here.
Be the first to start a discussion here.
See all
Fusion tree

In computer science, a fusion tree is a type of tree data structure that implements an associative array on w-bit integers on a finite universe, where each of the input integers has size less than 2w and is non-negative. When operating on a collection of n key–value pairs, it uses O(n) space and performs searches in O(logw n) time, which is asymptotically faster than a traditional self-balancing binary search tree, and also better than the van Emde Boas tree for large values of w. It achieves this speed by using certain constant-time operations that can be done on a machine word. Fusion trees were invented in 1990 by Michael Fredman and Dan Willard.

Several advances have been made since Fredman and Willard's original 1990 paper. In 1999 it was shown how to implement fusion trees under a model of computation in which all of the underlying operations of the algorithm belong to AC0, a model of circuit complexity that allows addition and bitwise Boolean operations but does not allow the multiplication operations used in the original fusion tree algorithm. A dynamic version of fusion trees using hash tables was proposed in 1996 which matched the original structure's O(logw n) runtime in expectation. Another dynamic version using exponential tree was proposed in 2007 which yields worst-case runtimes of O(logw n + log log n) per operation. Finally, it was shown that dynamic fusion trees can perform each operation in O(logw n) time deterministically.

This data structure implements add key, remove key, search key, and predecessor (next smaller value) and successor (next larger value) search operations for a given key. The partial result of most significant bit locator in constant time has also helped further research. Fusion trees utilize word-level parallelism to be efficient, performing computation on several small integers, stored in a single machine word, simultaneously to reduce the number of total operations.

A fusion tree is essentially a B-tree with branching factor of w1/5 (any small exponent is also possible as it will not have a great impact on the height of the tree), which gives it a height of O(logw n). To achieve the desired runtimes for updates and queries, the fusion tree must be able to search a node containing up to w1/5 keys in constant time. This is done by compressing ("sketching") the keys so that all can fit into one machine word, which in turn allows comparisons to be done in parallel. So, a series of computations involving sketching, parallel comparison and most significant bit index locator, help reach the required solution.

Sketching is the method by which each w-bit key at a node containing k keys is compressed into only k − 1 bits. Each key x may be thought of as a path in the full binary tree of height w starting at the root and ending at the leaf corresponding to x. This path can be processed by recursively searching the left child of i if the ith bit is 0, and the right child if it is 1, generally, until all bits are scanned. To distinguish two paths, it suffices to look at their branching point (the first bit where any two keys differ). As there are a maximum of k keys, there will not be more than k-1 branching points, which means that no more than k-1 bits are required to identify a key. And hence, no sketch will have more than k-1 bits.

An important property of the sketch function is that it preserves the order of the keys. That is, sketch(x) < sketch(y) for any two keys x < y. So, for the entire range of keys, sketch(x0)<sketch(x1)<...<sketch(xk-1) because if the binary-tree-like path is followed, the nodes will be ordered in such a manner that x0<x1<...<xk-1.

If the locations of the sketch bits are b1 < b2 < ⋅⋅⋅ < br, then the sketch of the key xw-1⋅⋅⋅x1x0 is the r-bit integer .

With only standard word operations, such as those of the C programming language, it is difficult to directly compute the perfect sketch of a key in constant time. Instead, the sketch bits can be packed into a range of size at most r4, using bitwise AND and multiplication, called the approximate sketch, which does have all the important bits but also some additional useless bits spread out in a predictable pattern. The bitwise AND operation serves as a mask to remove all these non-sketch bits from the key, while the multiplication shifts the sketch bits into a small range. Like the "perfect" sketch, the approximate sketch also preserves the order of the keys and means that sketch(x0)<sketch(x1)<...<sketch(xk-1).

See all
User Avatar
No comments yet.