Recent from talks
Knowledge base stats:
Talk channels stats:
Members stats:
Selection algorithm
In computer science, a selection algorithm is an algorithm for finding the th smallest value in a collection of ordered values, such as numbers. The value that it finds is called the th order statistic. Selection includes as special cases the problems of finding the minimum, median, and maximum element in the collection. Selection algorithms include quickselect, and the median of medians algorithm. When applied to a collection of values, these algorithms take linear time, as expressed using big O notation. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted array takes time .
An algorithm for the selection problem takes as input a collection of values, and a number . It outputs the th smallest of these values, or, in some versions of the problem, a collection of the smallest values. For this to be well-defined, it should be possible to sort the values into an order from smallest to largest; for instance, they may be integers, floating-point numbers, or some other kind of object with a numeric key. However, they are not assumed to have been already sorted. Often, selection algorithms are restricted to a comparison-based model of computation, as in comparison sort algorithms, where the algorithm has access to a comparison operation that can determine the relative ordering of any two values, but may not perform any other kind of arithmetic operations on these values.
To simplify the problem, some works on this problem assume that the values are all distinct from each other, or that some consistent tie-breaking method has been used to assign an ordering to pairs of items with the same value as each other. Another variation in the problem definition concerns the numbering of the ordered values: is the smallest value obtained by setting , as in zero-based numbering of arrays, or is it obtained by setting , following the usual English-language conventions for the smallest, second-smallest, etc.? This article follows the conventions used by Cormen et al., according to which all values are distinct and the minimum value is obtained from .
With these conventions, the maximum value, among a collection of values, is obtained by setting . When is an odd number, the median of the collection is obtained by setting . When is even, there are two choices for the median, obtained by rounding this choice of down or up, respectively: the lower median with and the upper median with .
As a baseline algorithm, selection of the th smallest value in a collection of values can be performed by the following two steps:
The time for this method is dominated by the sorting step, which requires time using a comparison sort. Even when integer sorting algorithms may be used, these are generally slower than the linear time that may be achieved using specialized selection algorithms. Nevertheless, the simplicity of this approach makes it attractive, especially when a highly-optimized sorting routine is provided as part of a runtime library, but a selection algorithm is not. For inputs of moderate size, sorting can be faster than non-random selection algorithms, because of the smaller constant factors in its running time. This method also produces a sorted version of the collection, which may be useful for other later computations, and in particular for selection with other choices of .
For a sorting algorithm that generates one item at a time, such as selection sort, the scan can be done in tandem with the sort, and the sort can be terminated once the th element has been found. One possible design of a consolation bracket in a single-elimination tournament, in which the teams who lost to the eventual winner play another mini-tournament to determine second place, can be seen as an instance of this method. Applying this optimization to heapsort produces the heapselect algorithm, which can select the th smallest value in time . This is fast when is small relative to , but degenerates to for larger values of , such as the choice used for median finding.
Many methods for selection are based on choosing a special "pivot" element from the input, and using comparisons with this element to divide the remaining input values into two subsets: the set of elements less than the pivot, and the set of elements greater than the pivot. The algorithm can then determine where the th smallest value is to be found, based on a comparison of with the sizes of these sets. In particular, if , the th smallest value is in , and can be found recursively by applying the same selection algorithm to . If , then the th smallest value is the pivot, and it can be returned immediately. In the remaining case, the th smallest value is in , and more specifically it is the element in position of . It can be found by applying a selection algorithm recursively, seeking the value in this position in .
Hub AI
Selection algorithm AI simulator
(@Selection algorithm_simulator)
Selection algorithm
In computer science, a selection algorithm is an algorithm for finding the th smallest value in a collection of ordered values, such as numbers. The value that it finds is called the th order statistic. Selection includes as special cases the problems of finding the minimum, median, and maximum element in the collection. Selection algorithms include quickselect, and the median of medians algorithm. When applied to a collection of values, these algorithms take linear time, as expressed using big O notation. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted array takes time .
An algorithm for the selection problem takes as input a collection of values, and a number . It outputs the th smallest of these values, or, in some versions of the problem, a collection of the smallest values. For this to be well-defined, it should be possible to sort the values into an order from smallest to largest; for instance, they may be integers, floating-point numbers, or some other kind of object with a numeric key. However, they are not assumed to have been already sorted. Often, selection algorithms are restricted to a comparison-based model of computation, as in comparison sort algorithms, where the algorithm has access to a comparison operation that can determine the relative ordering of any two values, but may not perform any other kind of arithmetic operations on these values.
To simplify the problem, some works on this problem assume that the values are all distinct from each other, or that some consistent tie-breaking method has been used to assign an ordering to pairs of items with the same value as each other. Another variation in the problem definition concerns the numbering of the ordered values: is the smallest value obtained by setting , as in zero-based numbering of arrays, or is it obtained by setting , following the usual English-language conventions for the smallest, second-smallest, etc.? This article follows the conventions used by Cormen et al., according to which all values are distinct and the minimum value is obtained from .
With these conventions, the maximum value, among a collection of values, is obtained by setting . When is an odd number, the median of the collection is obtained by setting . When is even, there are two choices for the median, obtained by rounding this choice of down or up, respectively: the lower median with and the upper median with .
As a baseline algorithm, selection of the th smallest value in a collection of values can be performed by the following two steps:
The time for this method is dominated by the sorting step, which requires time using a comparison sort. Even when integer sorting algorithms may be used, these are generally slower than the linear time that may be achieved using specialized selection algorithms. Nevertheless, the simplicity of this approach makes it attractive, especially when a highly-optimized sorting routine is provided as part of a runtime library, but a selection algorithm is not. For inputs of moderate size, sorting can be faster than non-random selection algorithms, because of the smaller constant factors in its running time. This method also produces a sorted version of the collection, which may be useful for other later computations, and in particular for selection with other choices of .
For a sorting algorithm that generates one item at a time, such as selection sort, the scan can be done in tandem with the sort, and the sort can be terminated once the th element has been found. One possible design of a consolation bracket in a single-elimination tournament, in which the teams who lost to the eventual winner play another mini-tournament to determine second place, can be seen as an instance of this method. Applying this optimization to heapsort produces the heapselect algorithm, which can select the th smallest value in time . This is fast when is small relative to , but degenerates to for larger values of , such as the choice used for median finding.
Many methods for selection are based on choosing a special "pivot" element from the input, and using comparisons with this element to divide the remaining input values into two subsets: the set of elements less than the pivot, and the set of elements greater than the pivot. The algorithm can then determine where the th smallest value is to be found, based on a comparison of with the sizes of these sets. In particular, if , the th smallest value is in , and can be found recursively by applying the same selection algorithm to . If , then the th smallest value is the pivot, and it can be returned immediately. In the remaining case, the th smallest value is in , and more specifically it is the element in position of . It can be found by applying a selection algorithm recursively, seeking the value in this position in .