Optimal binary search tree

current hub

Write something...

Be the first to start a discussion here.

Recent from talks

Be the first to start a discussion here.

Recent from talks

Be the first to start a discussion here.

About hubStatsRules

See all

Wikipedia

Grokipedia

Optimal binary search tree

In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic.

In the static optimality problem, the tree cannot be modified after it has been constructed. In this case, there exists some particular layout of the nodes of the tree which provides the smallest expected search time for the given access probabilities. Various algorithms exist to construct or approximate the statically optimal tree given the information on the access probabilities of the elements.

In the dynamic optimality problem, the tree can be modified at any time, typically by permitting tree rotations. The tree is considered to have a cursor starting at the root which it can move or use to perform modifications. In this case, there exists some minimal-cost sequence of these operations which causes the cursor to visit every node in the target access sequence in order. The splay tree is conjectured to have a constant competitive ratio compared to the dynamically optimal tree in all cases, though this has not yet been proven.

In the static optimality problem as defined by Knuth, we are given a set of $n$ ordered elements and a set of $2n+1$ probabilities. We will denote the elements $a_{1}$ through $a_{n}$ and the probabilities $A_{1}$ through $A_{n}$ and $B_{0}$ through $B_{n}$ . $A_{i}$ is the probability of a search being done for element $a_{i}$ (or successful search). For $1\leq i<n$ , $B_{i}$ is the probability of a search being done for an element between $a_{i}$ and $a_{i+1}$ (or unsuccessful search), $B_{0}$ is the probability of a search being done for an element strictly less than $a_{1}$ , and $B_{n}$ is the probability of a search being done for an element strictly greater than $a_{n}$ . These $2n+1$ probabilities cover all possible searches, and therefore add up to one.

The static optimality problem is the optimization problem of finding the binary search tree that minimizes the expected search time, given the $2n+1$ probabilities. As the number of possible trees on a set of $n$ elements is ${2n \choose n}{\frac {1}{n+1}}$ , which is exponential in $n$ , brute-force search is not usually a feasible solution.

In 1971, Knuth published a relatively straightforward dynamic programming algorithm capable of constructing the statically optimal tree in only O(n²) time. In this work, Knuth extended and improved the dynamic programming algorithm by Edgar Gilbert and Edward F. Moore introduced in 1958. Gilbert's and Moore's algorithm required $O(n^{3})$ time and $O(n^{2})$ space and was designed for a particular case of optimal binary search trees construction (known as optimal alphabetic tree problem) that considers only the probability of unsuccessful searches, that is, ${\textstyle \sum _{i=1}^{n}A_{i}=0}$ . Knuth's work relied upon the following insight: the static optimality problem exhibits optimal substructure; that is, if a certain tree is statically optimal for a given probability distribution, then its left and right subtrees must also be statically optimal for their appropriate subsets of the distribution (known as monotonicity property of the roots).

To see this, consider what Knuth calls the "weighted path length" of a tree. The weighted path length of a tree of n elements is the sum of the lengths of all $2n+1$ possible search paths, weighted by their respective probabilities. The tree with the minimal weighted path length is, by definition, statically optimal.

But weighted path lengths have an interesting property. Let E be the weighted path length of a binary tree, $E L$ be the weighted path length of its left subtree, and $E R$ be the weighted path length of its right subtree. Also let W be the sum of all the probabilities in the tree. Observe that when either subtree is attached to the root, the depth of each of its elements (and thus each of its search paths) is increased by one. Also observe that the root itself has a depth of one. This means that the difference in weighted path length between a tree and its two subtrees is exactly the sum of every single probability in the tree, leading to the following recurrence:

See all

Hub AI

Optimal binary search tree AI simulator

(@Optimal binary search tree_simulator)

Wikipedia

Grokipedia

Hub AI

Optimal binary search tree

See all

Knowledge Base

Talk Channels

Special Pages

Optimal binary search tree

Optimal binary search tree

Recent from talks

Recent from talks

Knowledge base stats:

Talk channels stats:

Members stats:

Optimal binary search tree

Hub AI

Optimal binary search tree

History

Optimal binary search tree

Optimal binary search tree

Recent from talks

Recent from talks

Knowledge base stats:

Talk channels stats:

Members stats:

Optimal binary search tree

Hub AI

Optimal binary search tree