Hubbry Logo
search
logo

Optimal binary search tree

logo
Community Hub0 Subscribers
Write something...
Be the first to start a discussion here.
Be the first to start a discussion here.
See all
Optimal binary search tree

In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic.

In the static optimality problem, the tree cannot be modified after it has been constructed. In this case, there exists some particular layout of the nodes of the tree which provides the smallest expected search time for the given access probabilities. Various algorithms exist to construct or approximate the statically optimal tree given the information on the access probabilities of the elements.

In the dynamic optimality problem, the tree can be modified at any time, typically by permitting tree rotations. The tree is considered to have a cursor starting at the root which it can move or use to perform modifications. In this case, there exists some minimal-cost sequence of these operations which causes the cursor to visit every node in the target access sequence in order. The splay tree is conjectured to have a constant competitive ratio compared to the dynamically optimal tree in all cases, though this has not yet been proven.

In the static optimality problem as defined by Knuth, we are given a set of n ordered elements and a set of probabilities. We will denote the elements through and the probabilities through and through . is the probability of a search being done for element (or successful search). For , is the probability of a search being done for an element between and (or unsuccessful search), is the probability of a search being done for an element strictly less than , and is the probability of a search being done for an element strictly greater than . These probabilities cover all possible searches, and therefore add up to one.

The static optimality problem is the optimization problem of finding the binary search tree that minimizes the expected search time, given the probabilities. As the number of possible trees on a set of n elements is , which is exponential in n, brute-force search is not usually a feasible solution.

In 1971, Knuth published a relatively straightforward dynamic programming algorithm capable of constructing the statically optimal tree in only O(n2) time. In this work, Knuth extended and improved the dynamic programming algorithm by Edgar Gilbert and Edward F. Moore introduced in 1958. Gilbert's and Moore's algorithm required time and space and was designed for a particular case of optimal binary search trees construction (known as optimal alphabetic tree problem) that considers only the probability of unsuccessful searches, that is, . Knuth's work relied upon the following insight: the static optimality problem exhibits optimal substructure; that is, if a certain tree is statically optimal for a given probability distribution, then its left and right subtrees must also be statically optimal for their appropriate subsets of the distribution (known as monotonicity property of the roots).

To see this, consider what Knuth calls the "weighted path length" of a tree. The weighted path length of a tree of n elements is the sum of the lengths of all possible search paths, weighted by their respective probabilities. The tree with the minimal weighted path length is, by definition, statically optimal.

But weighted path lengths have an interesting property. Let E be the weighted path length of a binary tree, EL be the weighted path length of its left subtree, and ER be the weighted path length of its right subtree. Also let W be the sum of all the probabilities in the tree. Observe that when either subtree is attached to the root, the depth of each of its elements (and thus each of its search paths) is increased by one. Also observe that the root itself has a depth of one. This means that the difference in weighted path length between a tree and its two subtrees is exactly the sum of every single probability in the tree, leading to the following recurrence:

See all
User Avatar
No comments yet.