Order statistic tree
In computer science, an order statistic tree is a variant of the binary search tree (or more generally, a B-tree[1]) that supports two additional operations beyond insertion, lookup and deletion:
- Select(i) – find the i-th smallest element stored in the tree
- Rank(x) – find the rank of element x in the tree, i.e. its index in the sorted list of elements of the tree
Both operations can be performed in O(log n) worst case time when a self-balancing tree is used as the base data structure.
Augmented search tree implementation
To turn a regular search tree into an order statistic tree, the nodes of the tree need to store one additional value, which is the size of the subtree rooted at that node (i.e., the number of nodes below it). All operations that modify the tree must adjust this information to preserve the invariant that
size[x] = size[left[x]] + size[right[x]] + 1
where size[nil] = 0 by definition. Select can then be implemented as[2]: 342
function Select(t, i)
    // Returns the i'th element (one-indexed) of the elements in t
    p ← size[left[t]] + 1
    if i = p
        return t
    else if i < p
        return Select(left[t], i)
    else
        return Select(right[t], i - p)
Rank can be implemented, using the parent-function p[x], as[3]: 342
function Rank(T, x)
    // Returns the position of x (one-indexed) in the linear sorted list of elements of the tree T
    r ← size[left[x]] + 1
    y ← x
    while y ≠ T.root
        if y = right[p[y]]
            r ← r + size[left[p[y]]] + 1
        y ← p[y]
    return r
Order-statistic trees can be further amended with bookkeeping information to maintain balance (e.g., tree height can be added to get an order statistic AVL tree, or a color bit to get a red–black order statistic tree). Alternatively, the size field can be used in conjunction with a weight-balancing scheme at no additional storage cost.[4]
References
- ^ "Counted B-Trees". 11 December 2004. Retrieved 18 January 2014.
- ^ Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001) [1990]. Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03293-7.
- ^ Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03384-4.
- ^ Roura, Salvador (2001). A new method for balancing binary search trees. ICALP. Lecture Notes in Computer Science. Vol. 2076. pp. 469–480. doi:10.1007/3-540-48224-5_39. ISBN 978-3-540-42287-7.
See also
- Implicit treap
- Segment tree – can be used for counting queries; rank is a counting query, equivalent to storing 1 at each element and summing from the start of the sequence up to that element
External links
- Order statistic tree on PineWiki, Yale University.
- The Python package blist uses order statistic B-trees to implement lists with fast insertion at arbitrary positions.
- G++ extension to STL: Policy-Based Data Structures. tree_order_statistics_node_update extension for ordered tree-based containers
Order statistic tree
## Fundamentals
### Definition
An order statistic tree is a balanced binary search tree, such as a red-black tree or AVL tree, augmented with subtree size information stored in each node to support efficient order statistic queries.[4] Each node contains a key value, pointers to its left and right child nodes, and a size field that indicates the total number of nodes in the subtree rooted at that node, including the node itself.[4] This structure enables key operations, such as identifying the k-th smallest element (the order-k select operation), to be performed in $O(\log n)$ worst-case time, assuming the underlying tree is balanced (with height $O(\log n)$), where $n$ is the number of elements in the tree.[4] The node structure can be illustrated in pseudocode as follows:

```
node {
key,
left,
right,
size = 1 + size(left) + size(right)
}
```
[](https://mitpress.mit.edu/9780262046305/introduction-to-algorithms/)
### Historical Development
The concept of order statistic trees originated from advancements in binary search tree research during the late 20th century, building on foundational work in balanced search structures. [Binary search tree](/page/Binary_search_tree)s, the underlying structure, were formalized in the early 1960s, with early balanced variants like AVL trees introduced in 1962 by Georgii Adelson-Velsky and Evgenii Landis to ensure logarithmic-time operations.[](https://doi.org/10.1007/978-3-642-63606-7_2)
In the 1970s and 1980s, researchers augmented [binary search tree](/page/Binary_search_tree)s with additional node information, such as subtree sizes, to enable efficient order statistics queries like selection and ranking. This augmentation drew from earlier developments in balanced binary trees, including red-black trees, which were first conceptualized by [Rudolf Bayer](/page/Rudolf_Bayer) in 1972 as symmetric binary B-trees and refined by Leo Guibas and Robert Sedgewick in 1978 through a dichromatic framework that simplified rotations and color maintenance.[](https://doi.org/10.1145/359340.359342)[](https://doi.org/10.1145/359460.359470) Sedgewick's contributions to searching algorithms in this period, including analyses of tree balancing, laid groundwork for size-augmented variants that supported dynamic order statistics.
The term "order statistic tree" and its comprehensive formulation as a size-augmented red-black tree were introduced by [Thomas H. Cormen](/page/Thomas_H._Cormen), [Charles E. Leiserson](/page/Charles_E._Leiserson), Ronald L. Rivest, and Clifford Stein in the first edition of their textbook *[Introduction to Algorithms](/page/Introduction_to_Algorithms)* in 1990, emphasizing operations like OS-SELECT and OS-RANK with O(log n) [time complexity](/page/Time_complexity).[](https://mitpress.mit.edu/9780262046305/introduction-to-algorithms/) This work popularized the structure in academic and practical contexts, integrating it with balanced trees for guaranteed performance.
Later developments included array-based approaches like the [Fenwick tree](/page/Fenwick_tree), proposed by Peter M. Fenwick in 1994 for cumulative frequency tables, which enabled order statistics in linear storage through [prefix sum](/page/Prefix_sum) updates and queries.[](https://doi.org/10.1145/230201.230395) Balanced BSTs from the [1970s](/page/1970s) and [1980s](/page/1980s) further inspired augmentations for order maintenance in dynamic sets.
Evolution continued with implicit-key structures, such as treaps, where Cecilia Aragon and Raimund Seidel introduced randomized implicit treaps in 1996 to support order statistics without explicit keys via heap priorities.[](https://doi.org/10.1145/235347.233380) In the [2000s](/page/2000s), the structure influenced library implementations, notably the GNU C++ library's policy-based data structures, where `__gnu_pbds::ordered_set`—introduced around 2005—provided a templated order statistic tree with find_by_order and order_of_key operations.[](https://gcc.gnu.org/onlinedocs/libstdc++/manual/policy_data_structures.html)
## Underlying Data Structure
### Binary Search Tree Basics
A [binary search tree](/page/Binary_search_tree) (BST) is a [binary tree](/page/Binary_tree) [data structure](/page/Data_structure) in which each node has a key, and for any given node, all keys in its left subtree are less than the node's key, while all keys in its right subtree are greater than the node's key.[](https://opendsa-server.cs.vt.edu/ODSA/Books/Everything/html/BST.html) This ordering property holds recursively for every subtree, ensuring that the tree maintains a sorted structure based on key values.[](https://courses.grainger.illinois.edu/cs225/fa2021/resources/bst/) BSTs assume that keys are comparable and unique, though variations may allow duplicates.[](https://www.cs.cmu.edu/afs/cs/academic/class/15210-s15/www/lectures/bst-notes.pdf)
The fundamental operations on a BST include searching for a key, finding the minimum or maximum key, and determining the predecessor or successor of a given key. Searching starts at the [root](/page/Root) and traverses left or right based on comparisons, taking time proportional to the tree's [height](/page/Height) $h$.[](https://softwarefoundations.cis.upenn.edu/vfa-current/SearchTree.html) Finding the minimum involves traversing leftward to the leftmost node, while the maximum is the rightmost node; both require $O(h)$ time.[](https://pages.cs.wisc.edu/~siff/CS367/Notes/bsts.html) Predecessor and successor operations similarly follow the tree structure to locate the largest key less than or the smallest key greater than a target, each in $O(h)$ time.[](https://ics.uci.edu/~thornton/ics46/Notes/BinarySearchTrees/)
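The search procedure underlying these bounds can be sketched in pseudocode (assuming node fields `key`, `left`, and `right`, with `nil` for an absent child):

```
function Search(t, k)
    if t = nil or k = t.key
        return t                    // found k, or reached an empty subtree
    if k < t.key
        return Search(t.left, k)    // by the BST property, k can only lie to the left
    return Search(t.right, k)       // otherwise k can only lie to the right
```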
An in-order traversal of a BST visits nodes in ascending order of their keys, which arises directly from the left-smaller/right-larger property and enables efficient access to sorted sequences without additional sorting.[](https://pages.cs.wisc.edu/~siff/CS367/Notes/bsts.html) This implicit ordering makes BSTs suitable for dynamic sets where elements are inserted and deleted while preserving sortability.[](https://web.stanford.edu/class/archive/cs/cs161/cs161.1168/lecture8.pdf)
However, without mechanisms to control balance, BSTs constructed from sequential or adversarially ordered insertions can degenerate into a [linked list](/page/Linked_list), resulting in a height of $O(n)$ in the worst case, where $n$ is the number of nodes, and thus degrading operation times to linear.[](https://softwarefoundations.cis.upenn.edu/vfa-current/SearchTree.html)
### Size Augmentation
To enable order statistic functionality in a [binary search tree](/page/Binary_search_tree) (BST), each node is augmented with an [integer](/page/Integer) field that stores the [size](/page/Size) of the subtree rooted at that node. This augmentation process begins by initializing the size field to 1 for [leaf](/page/Leaf) nodes, which contain a single element. For any non-[leaf](/page/Leaf) node, the size is defined as $ \text{size}(node) = 1 + \text{size}(left) + \text{size}(right) $, where left and right refer to the node's child subtrees. This recursive definition ensures that the size field captures the total number of nodes in the subtree, providing a compact summary of its [cardinality](/page/Cardinality).[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)
Maintaining the accuracy of these size fields requires an invariant: after any [structural change](/page/Structural_change) to the [tree](/page/Tree), such as a [rotation](/page/Rotation) or node addition/removal, the sizes must be recomputed bottom-up along the affected path to the [root](/page/Root). This bottom-up update propagates changes from the modified subtrees upward, preserving the correctness of all size values throughout the [tree](/page/Tree). Failure to uphold this invariant would lead to inconsistent subtree cardinalities, undermining the tree's utility for order statistics.[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)
The [size](/page/Size) of any subtree can be retrieved in constant time by directly accessing the precomputed field:
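```
function size(t)
    if t = nil
        return 0          // empty subtree
    return t.size         // stored field: constant-time access
```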
This achieves O(1) time by reading the stored value without [recursion](/page/Recursion) or traversal.[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)
The primary benefit of size augmentation is the ability to compute subtree cardinalities in constant time per node during traversals, avoiding the need for expensive full-subtree enumerations that would otherwise require O(n) time in a standard BST. This efficiency stems from the augmented fields, which transform the [tree](/page/Tree) into a structure capable of supporting dynamic order statistics while inheriting the BST's logarithmic search properties.
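Rotations used by balancing schemes illustrate how local these updates are: only the nodes whose subtrees actually change need their size fields recomputed, children first. A sketch of a left rotation in the same pseudocode style (parent-pointer maintenance omitted):

```
function Left-Rotate(x)
    y ← x.right
    x.right ← y.left                              // y's left subtree becomes x's right subtree
    y.left ← x                                    // x becomes y's left child
    x.size ← size(x.left) + size(x.right) + 1    // recompute x first (now the lower node)
    y.size ← size(y.left) + size(y.right) + 1    // then y, from its new children
    return y                                      // y is the new subtree root
```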
## Core Operations
### Selection Operation
The selection operation in an order statistic tree, also known as order-k select, retrieves the k-th smallest element from the dynamic set of keys stored in the tree, where the keys are ordered according to the [binary search tree](/page/Binary_search_tree) property.[](https://ocw.mit.edu/courses/6-046j-introduction-to-algorithms-sma-5503-fall-2005/resources/lecture-11-augmenting-data-structures-dynamic-order-statistics-interval-trees/) This operation exploits the size augmentation in each node, which stores the total number of nodes in its subtree, to perform the search in O(log n) time on a balanced [tree](/page/Tree) without scanning the entire [structure](/page/Structure).[](https://ocw.mit.edu/courses/6-046j-introduction-to-algorithms-sma-5503-fall-2005/resources/lecture-11-augmenting-data-structures-dynamic-order-statistics-interval-trees/)
The algorithm starts at the root node and proceeds recursively as follows: compute the size of the left subtree, denoted as left_size, which equals the number of elements smaller than the current node's key. If k ≤ left_size, the k-th smallest element lies in the left subtree, so recurse on that subtree with the same k. If k = left_size + 1, the current node contains the k-th smallest element, so return it. Otherwise, the k-th smallest element is in the right subtree, so recurse there with adjusted index k - (left_size + 1).[](https://ocw.mit.edu/courses/6-046j-introduction-to-algorithms-sma-5503-fall-2005/resources/lecture-11-augmenting-data-structures-dynamic-order-statistics-interval-trees/) This navigation mimics the position in the inorder traversal of the tree, using subtree sizes to skip irrelevant branches efficiently.[](https://ocw.mit.edu/courses/6-046j-introduction-to-algorithms-sma-5503-fall-2005/resources/lecture-11-augmenting-data-structures-dynamic-order-statistics-interval-trees/)
The operation assumes 1-based indexing, where k = 1 yields the smallest element and k = n (the total number of elements) yields the largest.[](https://ocw.mit.edu/courses/6-046j-introduction-to-algorithms-sma-5503-fall-2005/resources/lecture-11-augmenting-data-structures-dynamic-order-statistics-interval-trees/) The following pseudocode implements this recursive procedure:
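```
function Select(t, k)
    // Returns the k-th smallest element (1-based) in the subtree rooted at t
    left_size ← size(t.left)
    if k ≤ left_size
        return Select(t.left, k)                    // target lies in the left subtree
    else if k = left_size + 1
        return t                                    // current node is the k-th smallest
    else
        return Select(t.right, k - left_size - 1)   // skip left subtree and current node
```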
Here, `size` is a function that returns the augmented size field of the given subtree root (0 for null).[](https://cse.sc.edu/~fenner/csce750/OKane-Fall-2020/notes-augmenting.pdf)
The correctness of the select algorithm stems from the binary search tree invariant, which guarantees that all keys in the left subtree precede the current key in sorted order, combined with the size fields that precisely quantify the number of preceding elements in the inorder sequence.[](https://ocw.mit.edu/courses/6-046j-introduction-to-algorithms-sma-5503-fall-2005/resources/lecture-11-augmenting-data-structures-dynamic-order-statistics-interval-trees/) Subtree sizes, computed as the sum of left subtree size, right subtree size, and 1 for the node itself, enable this direct positional lookup.[](https://ocw.mit.edu/courses/6-046j-introduction-to-algorithms-sma-5503-fall-2005/resources/lecture-11-augmenting-data-structures-dynamic-order-statistics-interval-trees/)
### Rank Operation
The rank operation in an [order statistic tree](/page/Order_statistic) computes the 1-based rank of a specified key in the sorted sequence of all elements stored in the [tree](/page/Tree), i.e., 1 + the number of elements strictly smaller than the key, leveraging the augmented subtree size information to do so efficiently. If the key exists in the tree, the operation returns its rank; if absent, it returns the 1-based position at which the key would be inserted. This operation is fundamental for applications requiring quick positional queries, such as [percentile](/page/Percentile) calculations or order-based indexing. The operation assumes distinct keys; for multisets with duplicates, size augmentation and comparisons must be adjusted accordingly.[](https://sites.math.rutgers.edu/~ajl213/CLRS/Ch14.pdf)
The algorithm performs a standard binary search traversal starting from the [root](/page/Root), accumulating counts derived from subtree [size](/page/Size)s along the path instead of performing a full inorder traversal. Whenever the search path turns to the right child of a node (the sought key is greater than the current node's key), the rank accumulator adds the [size](/page/Size) of the current node's left subtree (all of whose elements are smaller than the sought key) plus one (for the current node itself). If the path goes left, nothing is added, since the current node and everything in its right subtree are larger. Upon reaching the node matching the key, the accumulator plus that node's left-subtree size plus one is returned as the rank; if no match is found (a null pointer is reached), the accumulator plus one gives the insertion position. This approach runs in $O(\log n)$ time on a balanced [tree](/page/Tree), matching the [height](/page/Height) of the structure.[](https://sites.math.rutgers.edu/~ajl213/CLRS/Ch14.pdf)
For a key $k$ found at node $x$, the [rank](/page/RANK) can be formalized as $r(k) = 1 + \text{size}(x.\text{left}) + \sum_{v} (\text{size}(v.\text{left}) + 1)$, where the sum ranges over the proper ancestors $v$ of $x$ on the search path with $k > v.\text{key}$ (the nodes at which the path branched right). This [summation](/page/Summation) counts exactly the nodes preceding $k$ in an inorder traversal, plus one.[](https://sites.math.rutgers.edu/~ajl213/CLRS/Ch14.pdf)
Here is the iterative pseudocode for the rank operation, assuming a node structure with `key`, `left`, `right`, and `size` fields:
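```
function Rank(T, k)
    // Returns the 1-based rank of k in T, or the 1-based insertion
    // position if k is absent
    r ← 0
    x ← T.root
    while x ≠ nil
        if k < x.key
            x ← x.left                       // nothing skipped precedes k
        else if k > x.key
            r ← r + size(x.left) + 1         // x and its left subtree precede k
            x ← x.right
        else
            return r + size(x.left) + 1      // found: count x's left subtree and x itself
    return r + 1                             // not found: insertion position
```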
This implementation handles empty left subtrees through the `size` helper, which returns 0 for null, and terminates with the appropriate 1-based rank or insertion position. The selection operation is a complementary query that, given a rank, retrieves the corresponding key.[](https://sites.math.rutgers.edu/~ajl213/CLRS/Ch14.pdf)
## Modification Operations
### Insertion
The insertion operation in an order-statistic tree begins with the standard [binary search tree](/page/Binary_search_tree) (BST) insertion procedure to locate the appropriate leaf position for the new key $k$. The algorithm starts at the [root](/page/Root) and traverses downward, moving to the left child if $k$ is less than the current node's key or to the right child otherwise, until reaching a [null pointer](/page/Null_pointer) where the new leaf can be attached. A new node $z$ is then created with key $k$ and its size field initialized to 1, as it has no children, and linked as the appropriate child of the parent node.[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)[](https://cse.sc.edu/~fenner/csce750/OKane-Fall-2020/notes-augmenting.pdf)
To maintain the order-statistic augmentation, the sizes of all nodes on the path from the insertion point back to the root must be updated after attaching the new node. This is achieved by traversing upward from $z$ to the root, incrementing the size field of each ancestor by 1, since the subtree sizes increase by exactly one along this path. The update ensures that every node's size accurately reflects the total number of nodes in its subtree post-insertion.[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)[](https://cse.sc.edu/~fenner/csce750/OKane-Fall-2020/notes-augmenting.pdf)
The following pseudocode outlines the augmented insertion process, assuming the underlying BST implementation (such as TREE-INSERT from standard references) returns the newly inserted node $z$ and that parent pointers are maintained for traversal:
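```
function OS-Insert(T, k)
    z ← Tree-Insert(T, k)        // standard BST insertion; the new leaf z has z.size = 1
    y ← z.parent
    while y ≠ nil                // walk the path from z back to the root
        y.size ← y.size + 1      // each ancestor's subtree has gained one node
        y ← y.parent
    return z
```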
This approach recomputes sizes incrementally along the path, avoiding full subtree recalculations.[](https://cse.sc.edu/~fenner/csce750/OKane-Fall-2020/notes-augmenting.pdf)
In cases involving duplicate keys, the insertion policy must be consistent to preserve correct ordering and statistics; for example, equal keys are typically inserted into the right subtree to ensure they appear after existing instances in the in-order traversal.[](https://www.geeksforgeeks.org/dsa/how-to-handle-duplicates-in-binary-search-tree/)
Assuming the tree is balanced (e.g., via a red-black tree implementation), the insertion operation, including traversal and size updates, takes $O(\log n)$ worst-case time, where $n$ is the number of nodes.[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)
### Deletion
Deletion in an order statistic tree follows the standard [binary search tree](/page/Binary_search_tree) (BST) deletion procedure, augmented with updates to the subtree size fields to maintain the order statistic functionality. The process begins by locating the node containing the key to delete using a standard BST search. Once found, the deletion handles three primary cases based on the number of children the node has, ensuring the BST property is preserved throughout.[](http://www.cs.ucf.edu/courses/cop3502h/spring2012/Lectures/Lec11_BinaryTrees_Deletion.pdf)
In the first case, if the node is a leaf (no children), it is simply removed by setting the appropriate child pointer of its parent to null. This is the simplest scenario, requiring no further structural adjustments beyond the removal. In the second case, if the node has exactly one child, the child is promoted to take the node's position by linking the parent directly to this child, and the node is then freed. These cases maintain the tree structure with minimal changes.[](http://www.cs.ucf.edu/courses/cop3502h/spring2012/Lectures/Lec11_BinaryTrees_Deletion.pdf)
The third case, involving a node with two children, is more involved: the node's value is replaced with its inorder successor (the minimum value in the right subtree), which is typically a leaf or has only a left child. The successor node is then deleted using one of the simpler cases above. This approach avoids directly merging subtrees while preserving the BST ordering. Careful handling during the successor swap ensures that temporary inconsistencies in sizes are resolved through subsequent updates.[](http://www.cs.ucf.edu/courses/cop3502h/spring2012/Lectures/Lec11_BinaryTrees_Deletion.pdf)
Following any structural changes from deletion, the subtree sizes must be updated bottom-up along the path from the affected node to the [root](/page/Root) to reflect the removal of one element. Initially, sizes can be decremented by 1 for nodes along this path, but where subtrees are restructured (e.g., after child replacement or successor swap), the sizes are recomputed explicitly as the sum of the sizes of the left and right subtrees plus one for the node itself. This ensures no size inconsistencies arise, allowing [order statistic](/page/Order_statistic) operations like selection and rank to remain accurate. In balanced variants such as red-black trees, rotations triggered by deletion also require O(1) size fixes per rotation.[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)[](https://www.cs.virginia.edu/~robins/cs4102/slides/Cormen%20et%20al%20Slides/PDF/11-Augmenting%20Data%20Structures.pdf)
The following pseudocode snippet illustrates the size update process post-deletion, starting from the parent of the deleted node and traversing upward:
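```
// y is the parent of the node that was structurally removed
while y ≠ nil
    y.size ← size(y.left) + size(y.right) + 1   // recompute from the (possibly new) children
    y ← y.parent
```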
This traversal ensures all [ancestor](/page/Ancestor) sizes are correctly adjusted in O(h) time, where h is the tree height, typically O(log n) in balanced trees.[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)
## Analysis and Applications
### Time and Space Complexity
The core operations of an order statistic tree—selection (finding the i-th smallest element), rank (finding the position of an element), insertion, and deletion—each require O(h) time, where h denotes the height of the underlying [binary search tree](/page/Binary_search_tree).[](https://web.stanford.edu/class/archive/cs/cs106b/cs106b.1224/section/section8/)
In the worst case for an unbalanced tree, the height h can grow to O(n), resulting in O(n) time complexity for these operations. However, when elements are inserted in random order, the expected height is O(log n), leading to O(log n) average-case [time complexity](/page/Time_complexity) across operations.[](https://web.stanford.edu/class/archive/cs/cs106b/cs106b.1224/section/section8/)
To achieve guaranteed O(log n) worst-case performance, [order statistic](/page/Order_statistic) trees are often built atop self-balancing structures like red-black trees or AVL trees, which maintain height at O(log n) through rotations and other adjustments during insertions and deletions.[](https://ocw.mit.edu/courses/6-046j-design-and-analysis-of-algorithms-spring-2015/fc870caae0e6812787bb5d50ea4d5e24_MIT6_046JS15_lec09.pdf)
For a sequence of operations on a tree with n elements, [amortized analysis](/page/Amortized_analysis) under such balancing schemes likewise gives O(log n) per operation, as the cost of rebalancing is distributed across updates without exceeding the logarithmic per-operation bound in aggregate.[](https://ocw.mit.edu/courses/6-046j-design-and-analysis-of-algorithms-spring-2015/fc870caae0e6812787bb5d50ea4d5e24_MIT6_046JS15_lec09.pdf)
The space complexity is O(n) for n elements, since each node stores only O(1) extra information—the size of the subtree rooted at that node—beyond the standard binary search tree fields.[](https://web.stanford.edu/class/archive/cs/cs106b/cs106b.1224/section/section8/)
In comparison, a sorted array supports O(log n) rank queries via binary search and O(1) selection by direct indexing, but requires O(n) time for insertions and deletions because elements must be shifted; a balanced order statistic tree delivers O(log n) for all of these operations, making it preferable for dynamic sets with frequent updates.[](https://cs.stackexchange.com/questions/56175/prove-disprove-the-existance-of-a-data-structure-that-has-olog-n-inserts-delet)
### Practical Uses
Order statistic trees find application in dynamic median finding, where maintaining the [median](/page/Median) of a set under insertions and deletions is required, such as in [real-time data](/page/Real-time_data) analysis. By augmenting a balanced [binary search tree](/page/Binary_search_tree) with subtree sizes, these structures support selecting the floor(n/2)-th or ceiling(n/2)-th [order statistic](/page/Order_statistic) in O(log n) time, enabling efficient computation without resorting the entire dataset.[](https://www.cs.yale.edu/homes/aspnes/pinewiki/OrderStatisticsTree.html)
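A minimal sketch of dynamic median maintenance in the same pseudocode style, assuming the `Select` routine and `size` helper shown earlier and a non-empty tree:

```
function Median(T)
    // Lower median: the ⌈n/2⌉-th smallest of the n elements currently in T
    n ← size(T.root)
    return Select(T.root, ⌈n / 2⌉)
```

Because insertions and deletions keep the size fields current, each median query stays O(log n) with no re-sorting.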
They also extend traditional priority queues by providing access to elements by rank, allowing retrieval of the k-th largest element alongside standard insert and extract-max operations. This capability is valuable in scheduling algorithms and [resource management](/page/Resource_management) systems, where not only the highest-priority item but also intermediate priorities need to be queried efficiently.
In [computational geometry](/page/Computational_geometry), order statistic trees and similar augmented search trees are utilized for storing points and supporting queries like range counting or reporting, which underpin k-nearest neighbor searches. For instance, search-tree tables based on augmented binary search trees achieve O(log n) query times for half-space range queries in 2D and higher dimensions, facilitating efficient point location and proximity computations in spatial datasets.[](https://dspace.mit.edu/bitstream/handle/1721.1/149146/MIT-LCS-TM-388.pdf?sequence=1&isAllowed=y)
Implementations appear in libraries such as the GNU C++ Standard Template Library's policy-based data structures, which offer an `ordered_set` container supporting `order_of_key` (rank query) and `find_by_order` (selection) in O(log n) time via red-black tree augmentation. Custom implementations in other languages, like augmenting Java's `TreeMap`, replicate these features for similar use cases.
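A minimal C++ sketch of this container, following the libstdc++ policy-based data structures documentation:

```
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <iostream>

using namespace __gnu_pbds;

// Red-black tree augmented with subtree sizes: an order statistic tree.
typedef tree<int, null_type, std::less<int>,
             rb_tree_tag, tree_order_statistics_node_update>
    ordered_set;

int main() {
    ordered_set s;
    for (int x : {41, 10, 7, 3, 16, 30}) s.insert(x);

    // Selection: iterator to the k-th smallest key (0-based rank).
    std::cout << *s.find_by_order(2) << "\n";  // prints 10 (sorted: 3 7 10 16 30 41)

    // Rank-style query: number of keys strictly less than the argument.
    std::cout << s.order_of_key(16) << "\n";   // prints 3
}
```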
Real-world deployments include database indexing, where [B-tree](/page/B-tree) variants of order statistic trees enable [percentile](/page/Percentile) queries by computing ranks and selections on sorted indices, supporting analytical operations like SQL's `PERCENTILE_CONT`. In [streaming data](/page/Streaming_data) processing, they maintain running order statistics for tasks such as [anomaly detection](/page/Anomaly_detection), as seen in incremental [decision tree](/page/Decision_tree) algorithms that use order statistics summaries for classification on continuous data flows.[](https://inria.hal.science/hal-00758003/document)
Despite these advantages, the overhead of maintaining subtree sizes during rotations and rebalancing can increase constant factors in highly dynamic environments with frequent modifications, making them less suitable for sets with minimal order-based queries compared to standard balanced trees.[](https://www.cs.yale.edu/homes/aspnes/pinewiki/OrderStatisticsTree.html)
