Order statistic tree
In computer science, an order statistic tree is a variant of the binary search tree (or more generally, a B-tree[1]) that supports two additional operations beyond insertion, lookup and deletion:
- Select(i) – find the i-th smallest element stored in the tree
- Rank(x) – find the rank of element x in the tree, i.e. its index in the sorted list of elements of the tree
Both operations can be performed in O(log n) worst case time when a self-balancing tree is used as the base data structure.
Augmented search tree implementation
To turn a regular search tree into an order statistic tree, the nodes of the tree need to store one additional value, which is the size of the subtree rooted at that node (i.e., the number of nodes below it). All operations that modify the tree must adjust this information to preserve the invariant that
size[x] = size[left[x]] + size[right[x]] + 1
where size[nil] = 0 by definition. Select can then be implemented as[2]: 342
function Select(t, i)
    // Returns the i'th element (one-indexed) of the elements in t
    p ← size[left[t]] + 1
    if i = p
        return t
    else if i < p
        return Select(left[t], i)
    else
        return Select(right[t], i - p)
Rank can be implemented, using the parent-function p[x], as[3]: 342
function Rank(T, x)
    // Returns the position of x (one-indexed) in the linear sorted list of elements of the tree T
    r ← size[left[x]] + 1
    y ← x
    while y ≠ T.root
        if y = right[p[y]]
            r ← r + size[left[p[y]]] + 1
        y ← p[y]
    return r
Order-statistic trees can be further amended with bookkeeping information to maintain balance (e.g., tree height can be added to get an order statistic AVL tree, or a color bit to get a red–black order statistic tree). Alternatively, the size field can be used in conjunction with a weight-balancing scheme at no additional storage cost.[4]
References
- ^ "Counted B-Trees". 11 December 2004. Retrieved 18 January 2014.
- ^ Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001) [1990]. Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03293-7.
- ^ Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03384-4.
- ^ Roura, Salvador (2001). A new method for balancing binary search trees. ICALP. Lecture Notes in Computer Science. Vol. 2076. pp. 469–480. doi:10.1007/3-540-48224-5_39. ISBN 978-3-540-42287-7.
See also
- Implicit treap
- Segment tree – can be used for counting queries; rank is a counting query, equivalent to storing 1 at each element and summing from the start of the sequence up to that element
External links
- Order statistic tree on PineWiki, Yale University.
- The Python package blist uses order statistic B-trees to implement lists with fast insertion at arbitrary positions.
- G++ extension to STL: Policy-Based Data Structures. tree_order_statistics_node_update extension for ordered tree-based containers
Order statistic tree
## Fundamentals
### Definition
An order statistic tree is a balanced binary search tree, such as a red-black tree or AVL tree, augmented with subtree size information stored in each node to support efficient order statistic queries.[4] Each node contains a key value, pointers to its left and right child nodes, and a size field that indicates the total number of nodes in the subtree rooted at that node, including the node itself.[4] This structure enables key operations, such as identifying the k-th smallest element (the order-k select operation), to be performed in $O(\log n)$ worst-case time, assuming the underlying tree is balanced (with height $O(\log n)$), where $n$ is the number of elements in the tree.[4] The node structure can be illustrated in pseudocode as follows:

```
node {
key,
left,
right,
size = 1 + size(left) + size(right)
}
```
[](https://mitpress.mit.edu/9780262046305/introduction-to-algorithms/)
### Historical Development
The concept of order statistic trees originated from advancements in binary search tree research during the late 20th century, building on foundational work in balanced search structures. [Binary search tree](/page/Binary_search_tree)s, the underlying structure, were formalized in the early 1960s, with early balanced variants like AVL trees introduced in 1962 by Georgii Adelson-Velsky and Evgenii Landis to ensure logarithmic-time operations.[](https://doi.org/10.1007/978-3-642-63606-7_2)
In the 1970s and 1980s, researchers augmented [binary search tree](/page/Binary_search_tree)s with additional node information, such as subtree sizes, to enable efficient order statistics queries like selection and ranking. This augmentation drew from earlier developments in balanced binary trees, including red-black trees, which were first conceptualized by [Rudolf Bayer](/page/Rudolf_Bayer) in 1972 as symmetric binary B-trees and refined by Leo Guibas and Robert Sedgewick in 1978 through a dichromatic framework that simplified rotations and color maintenance.[](https://doi.org/10.1145/359340.359342)[](https://doi.org/10.1145/359460.359470) Sedgewick's contributions to searching algorithms in this period, including analyses of tree balancing, laid groundwork for size-augmented variants that supported dynamic order statistics.
The term "order statistic tree" and its comprehensive formulation as a size-augmented red-black tree were introduced by [Thomas H. Cormen](/page/Thomas_H._Cormen), [Charles E. Leiserson](/page/Charles_E._Leiserson), Ronald L. Rivest, and Clifford Stein in the first edition of their textbook *[Introduction to Algorithms](/page/Introduction_to_Algorithms)* in 1990, emphasizing operations like OS-SELECT and OS-RANK with O(log n) [time complexity](/page/Time_complexity).[](https://mitpress.mit.edu/9780262046305/introduction-to-algorithms/) This work popularized the structure in academic and practical contexts, integrating it with balanced trees for guaranteed performance.
Later developments included array-based approaches like the [Fenwick tree](/page/Fenwick_tree), proposed by Peter M. Fenwick in 1994 for cumulative frequency tables, which enabled order statistics in linear storage through [prefix sum](/page/Prefix_sum) updates and queries.[](https://doi.org/10.1145/230201.230395) Balanced BSTs from the [1970s](/page/1970s) and [1980s](/page/1980s) further inspired augmentations for order maintenance in dynamic sets.
Evolution continued with implicit-key structures, such as treaps, where Cecilia Aragon and Raimund Seidel introduced randomized implicit treaps in 1996 to support order statistics without explicit keys via heap priorities.[](https://doi.org/10.1145/235347.233380) In the [2000s](/page/2000s), the structure influenced library implementations, notably the GNU C++ library's policy-based data structures, where `__gnu_pbds::ordered_set`—introduced around 2005—provided a templated order statistic tree with find_by_order and order_of_key operations.[](https://gcc.gnu.org/onlinedocs/libstdc++/manual/policy_data_structures.html)
## Underlying Data Structure
### Binary Search Tree Basics
A [binary search tree](/page/Binary_search_tree) (BST) is a [binary tree](/page/Binary_tree) [data structure](/page/Data_structure) in which each node has a key, and for any given node, all keys in its left subtree are less than the node's key, while all keys in its right subtree are greater than the node's key.[](https://opendsa-server.cs.vt.edu/ODSA/Books/Everything/html/BST.html) This ordering property holds recursively for every subtree, ensuring that the tree maintains a sorted structure based on key values.[](https://courses.grainger.illinois.edu/cs225/fa2021/resources/bst/) BSTs assume that keys are comparable and unique, though variations may allow duplicates.[](https://www.cs.cmu.edu/afs/cs/academic/class/15210-s15/www/lectures/bst-notes.pdf)
The fundamental operations on a BST include searching for a key, finding the minimum or maximum key, and determining the predecessor or successor of a given key. Searching starts at the [root](/page/Root) and traverses left or right based on comparisons, taking time proportional to the tree's [height](/page/Height) $h$.[](https://softwarefoundations.cis.upenn.edu/vfa-current/SearchTree.html) Finding the minimum involves traversing leftward to the leftmost node, while the maximum is the rightmost node; both require $O(h)$ time.[](https://pages.cs.wisc.edu/~siff/CS367/Notes/bsts.html) Predecessor and successor operations similarly follow the tree structure to locate the largest key less than or the smallest key greater than a target, each in $O(h)$ time.[](https://ics.uci.edu/~thornton/ics46/Notes/BinarySearchTrees/)
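The search procedure underlying these bounds can be sketched in pseudocode (assuming node fields `key`, `left`, and `right`, with `nil` for an absent child):

```
function Search(t, k)
    if t = nil or k = t.key
        return t                    // found k, or reached an empty subtree
    if k < t.key
        return Search(t.left, k)    // by the BST property, k can only lie to the left
    return Search(t.right, k)       // otherwise k can only lie to the right
```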
An in-order traversal of a BST visits nodes in ascending order of their keys, which arises directly from the left-smaller/right-larger property and enables efficient access to sorted sequences without additional sorting.[](https://pages.cs.wisc.edu/~siff/CS367/Notes/bsts.html) This implicit ordering makes BSTs suitable for dynamic sets where elements are inserted and deleted while preserving sortability.[](https://web.stanford.edu/class/archive/cs/cs161/cs161.1168/lecture8.pdf)
However, without mechanisms to control balance, BSTs constructed from sequential or adversarially ordered insertions can degenerate into a [linked list](/page/Linked_list), resulting in a height of $O(n)$ in the worst case, where $n$ is the number of nodes, and thus degrading operation times to linear.[](https://softwarefoundations.cis.upenn.edu/vfa-current/SearchTree.html)
### Size Augmentation
To enable order statistic functionality in a [binary search tree](/page/Binary_search_tree) (BST), each node is augmented with an [integer](/page/Integer) field that stores the [size](/page/Size) of the subtree rooted at that node. This augmentation process begins by initializing the size field to 1 for [leaf](/page/Leaf) nodes, which contain a single element. For any non-[leaf](/page/Leaf) node, the size is defined as $ \text{size}(node) = 1 + \text{size}(left) + \text{size}(right) $, where left and right refer to the node's child subtrees. This recursive definition ensures that the size field captures the total number of nodes in the subtree, providing a compact summary of its [cardinality](/page/Cardinality).[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)
Maintaining the accuracy of these size fields requires an invariant: after any [structural change](/page/Structural_change) to the [tree](/page/Tree), such as a [rotation](/page/Rotation) or node addition/removal, the sizes must be recomputed bottom-up along the affected path to the [root](/page/Root). This bottom-up update propagates changes from the modified subtrees upward, preserving the correctness of all size values throughout the [tree](/page/Tree). Failure to uphold this invariant would lead to inconsistent subtree cardinalities, undermining the tree's utility for order statistics.[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)
The [size](/page/Size) of any subtree can be retrieved in constant time by directly accessing the precomputed field:
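```
function size(t)
    if t = nil
        return 0          // empty subtree
    return t.size         // stored field: constant-time access
```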
This achieves O(1) time by reading the stored value without [recursion](/page/Recursion) or traversal.[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)
The primary benefit of size augmentation is the ability to compute subtree cardinalities in constant time per node during traversals, avoiding the need for expensive full-subtree enumerations that would otherwise require O(n) time in a standard BST. This efficiency stems from the augmented fields, which transform the [tree](/page/Tree) into a structure capable of supporting dynamic order statistics while inheriting the BST's logarithmic search properties.
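Rotations used by balancing schemes illustrate how local these updates are: only the nodes whose subtrees actually change need their size fields recomputed, children first. A sketch of a left rotation in the same pseudocode style (parent-pointer maintenance omitted):

```
function Left-Rotate(x)
    y ← x.right
    x.right ← y.left                              // y's left subtree becomes x's right subtree
    y.left ← x                                    // x becomes y's left child
    x.size ← size(x.left) + size(x.right) + 1    // recompute x first (now the lower node)
    y.size ← size(y.left) + size(y.right) + 1    // then y, from its new children
    return y                                      // y is the new subtree root
```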
## Core Operations
### Selection Operation
The selection operation in an order statistic tree, also known as order-k select, retrieves the k-th smallest element from the dynamic set of keys stored in the tree, where the keys are ordered according to the [binary search tree](/page/Binary_search_tree) property.[](https://ocw.mit.edu/courses/6-046j-introduction-to-algorithms-sma-5503-fall-2005/resources/lecture-11-augmenting-data-structures-dynamic-order-statistics-interval-trees/) This operation exploits the size augmentation in each node, which stores the total number of nodes in its subtree, to perform the search in O(log n) time on a balanced [tree](/page/Tree) without scanning the entire [structure](/page/Structure).[](https://ocw.mit.edu/courses/6-046j-introduction-to-algorithms-sma-5503-fall-2005/resources/lecture-11-augmenting-data-structures-dynamic-order-statistics-interval-trees/)
The algorithm starts at the root node and proceeds recursively as follows: compute the size of the left subtree, denoted as left_size, which equals the number of elements smaller than the current node's key. If k ≤ left_size, the k-th smallest element lies in the left subtree, so recurse on that subtree with the same k. If k = left_size + 1, the current node contains the k-th smallest element, so return it. Otherwise, the k-th smallest element is in the right subtree, so recurse there with adjusted index k - (left_size + 1).[](https://ocw.mit.edu/courses/6-046j-introduction-to-algorithms-sma-5503-fall-2005/resources/lecture-11-augmenting-data-structures-dynamic-order-statistics-interval-trees/) This navigation mimics the position in the inorder traversal of the tree, using subtree sizes to skip irrelevant branches efficiently.[](https://ocw.mit.edu/courses/6-046j-introduction-to-algorithms-sma-5503-fall-2005/resources/lecture-11-augmenting-data-structures-dynamic-order-statistics-interval-trees/)
The operation assumes 1-based indexing, where k = 1 yields the smallest element and k = n (the total number of elements) yields the largest.[](https://ocw.mit.edu/courses/6-046j-introduction-to-algorithms-sma-5503-fall-2005/resources/lecture-11-augmenting-data-structures-dynamic-order-statistics-interval-trees/) The following pseudocode implements this recursive procedure:
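```
function Select(t, k)
    // Returns the k-th smallest element (1-based) in the subtree rooted at t
    left_size ← size(t.left)
    if k ≤ left_size
        return Select(t.left, k)                    // target lies in the left subtree
    else if k = left_size + 1
        return t                                    // current node is the k-th smallest
    else
        return Select(t.right, k - left_size - 1)   // skip left subtree and current node
```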
Here, `size` is a function that returns the augmented size field of the given subtree root (0 for null).[](https://cse.sc.edu/~fenner/csce750/OKane-Fall-2020/notes-augmenting.pdf)
The correctness of the select algorithm stems from the binary search tree invariant, which guarantees that all keys in the left subtree precede the current key in sorted order, combined with the size fields that precisely quantify the number of preceding elements in the inorder sequence.[](https://ocw.mit.edu/courses/6-046j-introduction-to-algorithms-sma-5503-fall-2005/resources/lecture-11-augmenting-data-structures-dynamic-order-statistics-interval-trees/) Subtree sizes, computed as the sum of left subtree size, right subtree size, and 1 for the node itself, enable this direct positional lookup.[](https://ocw.mit.edu/courses/6-046j-introduction-to-algorithms-sma-5503-fall-2005/resources/lecture-11-augmenting-data-structures-dynamic-order-statistics-interval-trees/)
### Rank Operation
The rank operation in an [order statistic tree](/page/Order_statistic) computes the 1-based rank of a specified key in the sorted sequence of all elements stored in the [tree](/page/Tree), i.e., 1 + the number of elements strictly smaller than the key, leveraging the augmented subtree size information to do so efficiently. If the key exists in the tree, the operation returns its rank; if absent, it returns the 1-based position at which the key would be inserted. This operation is fundamental for applications requiring quick positional queries, such as [percentile](/page/Percentile) calculations or order-based indexing. The operation assumes distinct keys; for multisets with duplicates, size augmentation and comparisons must be adjusted accordingly.[](https://sites.math.rutgers.edu/~ajl213/CLRS/Ch14.pdf)
The algorithm performs a standard binary search traversal starting from the [root](/page/Root), accumulating counts derived from subtree [size](/page/Size)s along the path instead of performing a full inorder traversal. Whenever the search path turns to the right child of a node (the sought key is greater than the current node's key), the rank accumulator adds the [size](/page/Size) of the current node's left subtree (all of whose elements are smaller than the sought key) plus one (for the current node itself). If the path goes left, nothing is added, since the current node and everything in its right subtree are larger. Upon reaching the node matching the key, the accumulator plus that node's left-subtree size plus one is returned as the rank; if no match is found (a null pointer is reached), the accumulator plus one gives the insertion position. This approach runs in $O(\log n)$ time on a balanced [tree](/page/Tree), matching the [height](/page/Height) of the structure.[](https://sites.math.rutgers.edu/~ajl213/CLRS/Ch14.pdf)
For a key $k$ found at node $x$, the [rank](/page/RANK) can be formalized as $r(k) = 1 + \text{size}(x.\text{left}) + \sum_{v} (\text{size}(v.\text{left}) + 1)$, where the sum ranges over the proper ancestors $v$ of $x$ on the search path with $k > v.\text{key}$ (the nodes at which the path branched right). This [summation](/page/Summation) counts exactly the nodes preceding $k$ in an inorder traversal, plus one.[](https://sites.math.rutgers.edu/~ajl213/CLRS/Ch14.pdf)
Here is the iterative pseudocode for the rank operation, assuming a node structure with `key`, `left`, `right`, and `size` fields:
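```
function Rank(T, k)
    // Returns the 1-based rank of k in T, or the 1-based insertion
    // position if k is absent
    r ← 0
    x ← T.root
    while x ≠ nil
        if k < x.key
            x ← x.left                       // nothing skipped precedes k
        else if k > x.key
            r ← r + size(x.left) + 1         // x and its left subtree precede k
            x ← x.right
        else
            return r + size(x.left) + 1      // found: count x's left subtree and x itself
    return r + 1                             // not found: insertion position
```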
This implementation handles empty left subtrees through the `size` helper, which returns 0 for null, and terminates with the appropriate 1-based rank or insertion position. The selection operation is a complementary query that, given a rank, retrieves the corresponding key.[](https://sites.math.rutgers.edu/~ajl213/CLRS/Ch14.pdf)
## Modification Operations
### Insertion
The insertion operation in an order-statistic tree begins with the standard [binary search tree](/page/Binary_search_tree) (BST) insertion procedure to locate the appropriate leaf position for the new key $k$. The algorithm starts at the [root](/page/Root) and traverses downward, moving to the left child if $k$ is less than the current node's key or to the right child otherwise, until reaching a [null pointer](/page/Null_pointer) where the new leaf can be attached. A new node $z$ is then created with key $k$ and its size field initialized to 1, as it has no children, and linked as the appropriate child of the parent node.[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)[](https://cse.sc.edu/~fenner/csce750/OKane-Fall-2020/notes-augmenting.pdf)
To maintain the order-statistic augmentation, the sizes of all nodes on the path from the insertion point back to the root must be updated after attaching the new node. This is achieved by traversing upward from $z$ to the root, incrementing the size field of each ancestor by 1, since the subtree sizes increase by exactly one along this path. The update ensures that every node's size accurately reflects the total number of nodes in its subtree post-insertion.[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)[](https://cse.sc.edu/~fenner/csce750/OKane-Fall-2020/notes-augmenting.pdf)
The following pseudocode outlines the augmented insertion process, assuming the underlying BST implementation (such as TREE-INSERT from standard references) returns the newly inserted node $z$ and that parent pointers are maintained for traversal:
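```
function OS-Insert(T, k)
    z ← Tree-Insert(T, k)        // standard BST insertion; the new leaf z has z.size = 1
    y ← z.parent
    while y ≠ nil                // walk the path from z back to the root
        y.size ← y.size + 1      // each ancestor's subtree has gained one node
        y ← y.parent
    return z
```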
This approach recomputes sizes incrementally along the path, avoiding full subtree recalculations.[](https://cse.sc.edu/~fenner/csce750/OKane-Fall-2020/notes-augmenting.pdf)
In cases involving duplicate keys, the insertion policy must be consistent to preserve correct ordering and statistics; for example, equal keys are typically inserted into the right subtree to ensure they appear after existing instances in the in-order traversal.[](https://www.geeksforgeeks.org/dsa/how-to-handle-duplicates-in-binary-search-tree/)
Assuming the tree is balanced (e.g., via a red-black tree implementation), the insertion operation, including traversal and size updates, takes $O(\log n)$ worst-case time, where $n$ is the number of nodes.[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)
### Deletion
Deletion in an order statistic tree follows the standard [binary search tree](/page/Binary_search_tree) (BST) deletion procedure, augmented with updates to the subtree size fields to maintain the order statistic functionality. The process begins by locating the node containing the key to delete using a standard BST search. Once found, the deletion handles three primary cases based on the number of children the node has, ensuring the BST property is preserved throughout.[](http://www.cs.ucf.edu/courses/cop3502h/spring2012/Lectures/Lec11_BinaryTrees_Deletion.pdf)
In the first case, if the node is a leaf (no children), it is simply removed by setting the appropriate child pointer of its parent to null. This is the simplest scenario, requiring no further structural adjustments beyond the removal. In the second case, if the node has exactly one child, the child is promoted to take the node's position by linking the parent directly to this child, and the node is then freed. These cases maintain the tree structure with minimal changes.[](http://www.cs.ucf.edu/courses/cop3502h/spring2012/Lectures/Lec11_BinaryTrees_Deletion.pdf)
The third case, involving a node with two children, is more involved: the node's value is replaced with its inorder successor (the minimum value in the right subtree), which is typically a leaf or has only a left child. The successor node is then deleted using one of the simpler cases above. This approach avoids directly merging subtrees while preserving the BST ordering. Careful handling during the successor swap ensures that temporary inconsistencies in sizes are resolved through subsequent updates.[](http://www.cs.ucf.edu/courses/cop3502h/spring2012/Lectures/Lec11_BinaryTrees_Deletion.pdf)
Following any structural changes from deletion, the subtree sizes must be updated bottom-up along the path from the affected node to the [root](/page/Root) to reflect the removal of one element. Initially, sizes can be decremented by 1 for nodes along this path, but where subtrees are restructured (e.g., after child replacement or successor swap), the sizes are recomputed explicitly as the sum of the sizes of the left and right subtrees plus one for the node itself. This ensures no size inconsistencies arise, allowing [order statistic](/page/Order_statistic) operations like selection and rank to remain accurate. In balanced variants such as red-black trees, rotations triggered by deletion also require O(1) size fixes per rotation.[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)[](https://www.cs.virginia.edu/~robins/cs4102/slides/Cormen%20et%20al%20Slides/PDF/11-Augmenting%20Data%20Structures.pdf)
The following pseudocode snippet illustrates the size update process post-deletion, starting from the parent of the deleted node and traversing upward:
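```
// y is the parent of the node that was structurally removed
while y ≠ nil
    y.size ← size(y.left) + size(y.right) + 1   // recompute from the (possibly new) children
    y ← y.parent
```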
This traversal ensures all [ancestor](/page/Ancestor) sizes are correctly adjusted in O(h) time, where h is the tree height, typically O(log n) in balanced trees.[](https://www2.cs.arizona.edu/classes/cs445/fall15/AugmentedDS.pdf)
## Analysis and Applications
### Time and Space Complexity
The core operations of an order statistic tree—selection (finding the i-th smallest element), rank (finding the position of an element), insertion, and deletion—each require O(h) time, where h denotes the height of the underlying [binary search tree](/page/Binary_search_tree).[](https://web.stanford.edu/class/archive/cs/cs106b/cs106b.1224/section/section8/)
In the worst case for an unbalanced tree, the height h can grow to O(n), resulting in O(n) time complexity for these operations. However, when elements are inserted in random order, the expected height is O(log n), leading to O(log n) average-case [time complexity](/page/Time_complexity) across operations.[](https://web.stanford.edu/class/archive/cs/cs106b/cs106b.1224/section/section8/)
To achieve guaranteed O(log n) worst-case performance, [order statistic](/page/Order_statistic) trees are often built atop self-balancing structures like red-black trees or AVL trees, which maintain height at O(log n) through rotations and other adjustments during insertions and deletions.[](https://ocw.mit.edu/courses/6-046j-design-and-analysis-of-algorithms-spring-2015/fc870caae0e6812787bb5d50ea4d5e24_MIT6_046JS15_lec09.pdf)
For a sequence of operations on a tree with n elements, [amortized analysis](/page/Amortized_analysis) under such balancing schemes likewise gives O(log n) per operation, as the cost of rebalancing is distributed across updates without exceeding the logarithmic per-operation bound in aggregate.[](https://ocw.mit.edu/courses/6-046j-design-and-analysis-of-algorithms-spring-2015/fc870caae0e6812787bb5d50ea4d5e24_MIT6_046JS15_lec09.pdf)
The space complexity is O(n) for n elements, since each node stores only O(1) extra information—the size of the subtree rooted at that node—beyond the standard binary search tree fields.[](https://web.stanford.edu/class/archive/cs/cs106b/cs106b.1224/section/section8/)
In comparison, a sorted array supports O(log n) rank queries via binary search and O(1) selection by direct indexing, but requires O(n) time for insertions and deletions because elements must be shifted; a balanced order statistic tree delivers O(log n) for all of these operations, making it preferable for dynamic sets with frequent updates.[](https://cs.stackexchange.com/questions/56175/prove-disprove-the-existance-of-a-data-structure-that-has-olog-n-inserts-delet)
### Practical Uses
Order statistic trees find application in dynamic median finding, where maintaining the [median](/page/Median) of a set under insertions and deletions is required, such as in [real-time data](/page/Real-time_data) analysis. By augmenting a balanced [binary search tree](/page/Binary_search_tree) with subtree sizes, these structures support selecting the floor(n/2)-th or ceiling(n/2)-th [order statistic](/page/Order_statistic) in O(log n) time, enabling efficient computation without resorting the entire dataset.[](https://www.cs.yale.edu/homes/aspnes/pinewiki/OrderStatisticsTree.html)
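A minimal sketch of dynamic median maintenance in the same pseudocode style, assuming the `Select` routine and `size` helper shown earlier and a non-empty tree:

```
function Median(T)
    // Lower median: the ⌈n/2⌉-th smallest of the n elements currently in T
    n ← size(T.root)
    return Select(T.root, ⌈n / 2⌉)
```

Because insertions and deletions keep the size fields current, each median query stays O(log n) with no re-sorting.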
They also extend traditional priority queues by providing access to elements by rank, allowing retrieval of the k-th largest element alongside standard insert and extract-max operations. This capability is valuable in scheduling algorithms and [resource management](/page/Resource_management) systems, where not only the highest-priority item but also intermediate priorities need to be queried efficiently.
In [computational geometry](/page/Computational_geometry), order statistic trees and similar augmented search trees are utilized for storing points and supporting queries like range counting or reporting, which underpin k-nearest neighbor searches. For instance, search-tree tables based on augmented binary search trees achieve O(log n) query times for half-space range queries in 2D and higher dimensions, facilitating efficient point location and proximity computations in spatial datasets.[](https://dspace.mit.edu/bitstream/handle/1721.1/149146/MIT-LCS-TM-388.pdf?sequence=1&isAllowed=y)
Implementations appear in libraries such as the GNU C++ Standard Template Library's policy-based data structures, which offer an `ordered_set` container supporting `order_of_key` (rank query) and `find_by_order` (selection) in O(log n) time via red-black tree augmentation. Custom implementations in other languages, like augmenting Java's `TreeMap`, replicate these features for similar use cases.
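A minimal C++ sketch of this container, following the libstdc++ policy-based data structures documentation:

```
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <iostream>

using namespace __gnu_pbds;

// Red-black tree augmented with subtree sizes: an order statistic tree.
typedef tree<int, null_type, std::less<int>,
             rb_tree_tag, tree_order_statistics_node_update>
    ordered_set;

int main() {
    ordered_set s;
    for (int x : {41, 10, 7, 3, 16, 30}) s.insert(x);

    // Selection: iterator to the k-th smallest key (0-based rank).
    std::cout << *s.find_by_order(2) << "\n";  // prints 10 (sorted: 3 7 10 16 30 41)

    // Rank-style query: number of keys strictly less than the argument.
    std::cout << s.order_of_key(16) << "\n";   // prints 3
}
```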
Real-world deployments include database indexing, where [B-tree](/page/B-tree) variants of order statistic trees enable [percentile](/page/Percentile) queries by computing ranks and selections on sorted indices, supporting analytical operations like SQL's `PERCENTILE_CONT`. In [streaming data](/page/Streaming_data) processing, they maintain running order statistics for tasks such as [anomaly detection](/page/Anomaly_detection), as seen in incremental [decision tree](/page/Decision_tree) algorithms that use order statistics summaries for classification on continuous data flows.[](https://inria.hal.science/hal-00758003/document)
Despite these advantages, the overhead of maintaining subtree sizes during rotations and rebalancing can increase constant factors in highly dynamic environments with frequent modifications, making them less suitable for sets with minimal order-based queries compared to standard balanced trees.[](https://www.cs.yale.edu/homes/aspnes/pinewiki/OrderStatisticsTree.html)
