Counting sort

from Wikipedia
Class: Sorting algorithm
Data structure: Array
Worst-case performance: O(n + k), where k is the range of the non-negative key values
Worst-case space complexity: O(n + k)

In computer science, counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess distinct key values, and applying prefix sum on those counts to determine the positions of each key value in the output sequence. Its running time is linear in the number of items and the difference between the maximum key value and the minimum key value, so it is only suitable for direct use in situations where the variation in keys is not significantly greater than the number of items. It is often used as a subroutine in radix sort, another sorting algorithm, which can handle larger keys more efficiently.[1][2][3]

Counting sort is not a comparison sort; it uses key values as indexes into an array and the Ω(n log n) lower bound for comparison sorting will not apply.[1] Bucket sort may be used in lieu of counting sort, and entails a similar time analysis. However, compared to counting sort, bucket sort requires linked lists, dynamic arrays, or a large amount of pre-allocated memory to hold the sets of items within each bucket, whereas counting sort stores a single number (the count of items) per bucket.[4]

Input and output assumptions


In the most general case, the input to counting sort consists of a collection of n items, each of which has a non-negative integer key whose maximum value is at most k.[3] In some descriptions of counting sort, the input to be sorted is assumed to be more simply a sequence of integers itself,[1] but this simplification does not accommodate many applications of counting sort. For instance, when used as a subroutine in radix sort, the keys for each call to counting sort are individual digits of larger item keys; it would not suffice to return only a sorted list of the key digits, separated from the items.

In applications such as in radix sort, a bound on the maximum key value k will be known in advance, and can be assumed to be part of the input to the algorithm. However, if the value of k is not already known then it may be computed, as a first step, by an additional loop over the data to determine the maximum key value.

The output is an array of the elements ordered by their keys. Because of its application to radix sorting, counting sort must be a stable sort; that is, if two elements share the same key, their relative order in the output array and their relative order in the input array should match.[1][2]

Pseudocode


In pseudocode, the algorithm may be expressed as:

function CountingSort(input, k) is
    
    count ← array of k + 1 zeros
    output ← array of same length as input
    
    for i = 0 to length(input) - 1 do
        j = key(input[i])
        count[j] = count[j] + 1

    for i = 1 to k do
        count[i] = count[i] + count[i - 1]

    for i = length(input) - 1 down to 0 do
        j = key(input[i])
        count[j] = count[j] - 1
        output[count[j]] = input[i]

    return output

Where input is the array to be sorted, key returns the numeric key of each item in the input array, count is an auxiliary array used first to store the numbers of items with each key, and then (after the second loop) to store the positions where items with each key should be placed, k is the maximum value of the non-negative key values and output is the sorted output array.

In summary, the algorithm loops over the items in the first loop, computing a histogram of the number of times each key occurs within the input collection. After that, in the second loop, it performs a prefix sum computation on count in order to determine, for each key, the position range where the items having that key should be placed; that is, after this loop, count[i] is one past the position of the last item with key i. Finally, in the third loop, it loops over the items of input again, but in reverse order, moving each item into its sorted position in the output array.[1][2][3]
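The pseudocode above can be transliterated into a short Python sketch; the default identity key function is an assumption for the common case where the items are themselves their keys:

```python
def counting_sort(items, k, key=lambda x: x):
    """Stable counting sort of items whose keys are integers in [0, k]."""
    count = [0] * (k + 1)
    output = [None] * len(items)

    # First loop: histogram of key occurrences.
    for item in items:
        count[key(item)] += 1

    # Second loop: prefix sums; count[j] becomes one past the
    # last output position for items with key j.
    for j in range(1, k + 1):
        count[j] += count[j - 1]

    # Third loop: walk the input backwards so that equal keys
    # keep their relative order (stability).
    for item in reversed(items):
        count[key(item)] -= 1
        output[count[key(item)]] = item

    return output
```

Sorting (key, payload) pairs by their first component illustrates the stability property: pairs with equal keys emerge in input order.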

The relative order of items with equal keys is preserved here; i.e., this is a stable sort.

Complexity analysis


Because the algorithm uses only simple for loops, without recursion or subroutine calls, it is straightforward to analyze. The initialization of the count array, and the second for loop which performs a prefix sum on the count array, each iterate at most k + 1 times and therefore take O(k) time. The other two for loops, and the initialization of the output array, each take O(n) time. Therefore, the time for the whole algorithm is the sum of the times for these steps, O(n + k).[1][2]

Because it uses arrays of length k + 1 and n, the total space usage of the algorithm is also O(n + k).[1] For problem instances in which the maximum key value is significantly smaller than the number of items, counting sort can be highly space-efficient, as the only storage it uses other than its input and output arrays is the Count array which uses space O(k).[5]

Variant algorithms


If each item to be sorted is itself an integer, and used as key as well, then the second and third loops of counting sort can be combined; in the second loop, instead of computing the position where items with key i should be placed in the output, simply append Count[i] copies of the number i to the output.
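When the items are the integer keys themselves, this combined variant might be sketched as follows (assuming, as above, keys in the range [0, k]):

```python
def counting_sort_ints(values, k):
    """Sort integers in the range [0, k] without a separate placement loop."""
    count = [0] * (k + 1)
    for v in values:
        count[v] += 1
    output = []
    for i, c in enumerate(count):
        output.extend([i] * c)  # append c copies of the value i
    return output
```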

This algorithm may also be used to eliminate duplicate keys, by replacing the Count array with a bit vector that stores a one for a key that is present in the input and a zero for a key that is not present. If additionally the items are the integer keys themselves, both second and third loops can be omitted entirely and the bit vector will itself serve as output, representing the values as offsets of the non-zero entries, added to the range's lowest value. Thus the keys are sorted and the duplicates are eliminated in this variant just by being placed into the bit array.
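A minimal sketch of this duplicate-eliminating variant, assuming the items are the integer keys themselves:

```python
def sort_unique(values, k):
    """Sort integers in [0, k], discarding duplicates, via a bit vector."""
    present = [False] * (k + 1)  # bit vector: True where a key occurs
    for v in values:
        present[v] = True
    # The indices of the set entries are the sorted, deduplicated keys.
    return [i for i, seen in enumerate(present) if seen]
```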

For data in which the maximum key size is significantly smaller than the number of data items, counting sort may be parallelized by splitting the input into subarrays of approximately equal size, processing each subarray in parallel to generate a separate count array for each subarray, and then merging the count arrays. When used as part of a parallel radix sort algorithm, the key size (base of the radix representation) should be chosen to match the size of the split subarrays.[6] The simplicity of the counting sort algorithm and its use of the easily parallelizable prefix sum primitive also make it usable in more fine-grained parallel algorithms.[7]
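The histogram phase is the part that splits naturally across subarrays: each chunk produces its own count array, and the per-chunk counts merge by element-wise addition. A sequential sketch of that decomposition (the chunk count here is an arbitrary illustrative choice; real parallel code would process the chunks concurrently):

```python
def merged_counts(values, k, chunks=4):
    """Histogram values in [0, k] chunk by chunk, then merge the counts."""
    n = len(values)
    step = max(1, -(-n // chunks))  # ceiling division for chunk size
    partial = []
    for start in range(0, n, step):
        count = [0] * (k + 1)
        for v in values[start:start + step]:
            count[v] += 1
        partial.append(count)
    # Merge: element-wise sum of the per-chunk histograms.
    return [sum(col) for col in zip(*partial)]
```

The merged array equals the histogram a single pass would have produced, so the prefix-sum and placement phases can proceed unchanged.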

As described, counting sort is not an in-place algorithm; even disregarding the count array, it needs separate input and output arrays. It is possible to modify the algorithm so that it places the items into sorted order within the same array that was given to it as the input, using only the count array as auxiliary storage; however, the modified in-place version of counting sort is not stable.[3]

History


Although radix sorting itself dates back far longer, counting sort, and its application to radix sorting, were both invented by Harold H. Seward in 1954.[1][4][8]

References

from Grokipedia
Counting sort is a non-comparison-based sorting algorithm designed for sorting a collection of integers within a small, known range, achieving linear time complexity by counting the occurrences of each possible value and using these counts to place elements in their correct positions in the output array. The algorithm assumes the input consists of n elements drawn from the set {0, 1, ..., k-1}, where k is the range of key values, and it operates by first initializing a count array of size k to zero, then incrementing the count for each input element, computing cumulative sums to determine positions, and finally building the sorted array by placing elements in reverse order to maintain stability. This process ensures the relative order of equal elements is preserved, making it a stable sort. Invented by Harold H. Seward in 1954 as part of early work on digital computer applications for business operations at MIT, counting sort laid foundational techniques for efficient sorting and was integrated into radix sort implementations to handle multi-digit keys. Its time complexity of Θ(n + k) makes it optimal when k is O(n), outperforming comparison-based sorts like quicksort or mergesort in such scenarios, though it requires additional space proportional to k and is impractical for large or unknown ranges without modifications. Counting sort is often used as a subroutine in more advanced algorithms, such as radix sort for larger keys, and remains a key example in algorithm education for illustrating space-time tradeoffs in sorting.

Fundamentals

Description

Counting sort is a non-comparative sorting algorithm that determines the sorted order of elements by first counting the occurrences of each possible value in the input and then using these counts, along with arithmetic operations, to directly compute and place each element in its final position in the output array. This approach was originally described as a subroutine within digital sorting methods by Harold H. Seward in his 1954 master's thesis at MIT. Unlike comparison-based algorithms that rely on pairwise element comparisons to establish order, counting sort leverages the discrete nature of the input values and a known bounded range, employing indices directly corresponding to those values to avoid comparisons altogether. It operates by initializing a count array sized to the input range, incrementing entries for each input value's frequency, and then transforming this into a cumulative distribution that specifies output positions. As a distribution-based sorting method, counting sort partitions elements into temporary storage based on their values (effectively using the count array as a set of buckets) and reconstructs the sorted sequence by placing elements according to the cumulative counts, akin to a single-pass variant of the distribution step in bucket sort. This design enables efficient handling of discrete data when the value range is not excessively large, though it involves a space trade-off proportional to that range.

Assumptions

Counting sort requires a specific set of input conditions to function correctly and efficiently. The input is an array of n elements, where each element is a non-negative integer with a value ranging from 0 to k, and k represents the maximum value in the array, which must be known beforehand to determine the size of the auxiliary count array. This known range allows for the allocation of a count array of size k + 1, enabling the algorithm to tally occurrences without exceeding memory bounds proportional to the input range. The output of counting sort is a new array of the same size n, containing the input elements rearranged in non-decreasing order. The assumption of non-negative integers is crucial because negative values would correspond to invalid negative indices in the count array, which are not supported in standard array implementations. Although adaptations exist, such as shifting all values by adding the absolute value of the minimum element to map them to a non-negative range, this approach deviates from the standard algorithm and can inflate the effective range k, thereby reducing efficiency in both time and space when the input span is large.

The Algorithm

Counting sort assumes the input consists of non-negative integers within a known range [0, k]. The following pseudocode describes the stable version of the algorithm, using zero-based indexing for arrays. It operates on an input array A of length n, producing a sorted output array B of the same length, with a count array C of size k + 1.

procedure countingSort(A[0..n-1], B[0..n-1], k)
    C ← array of size (k + 1), initialized to 0
    // Count occurrences of each value
    for i ← 0 to n-1 do
        C[A[i]] ← C[A[i]] + 1
    // Compute cumulative counts for placement positions
    for j ← 1 to k do
        C[j] ← C[j] + C[j-1]
    // Place each A[i] at the correct position in B, backward for stability
    for i ← n-1 downto 0 do
        B[C[A[i]] - 1] ← A[i]
        C[A[i]] ← C[A[i]] - 1    // Decrement the count to handle duplicates

The algorithm first initializes the count array C to zeros; C tracks the frequency of each possible value in the range [0, k]. The first loop increments C[j] by 1 for each occurrence of value j in A, building the distribution. The second loop transforms C into a cumulative sum array, where C[j] now represents the number of elements less than or equal to j, giving the ending position for elements of value j in the sorted output. The third loop iterates backward through A to ensure stability (preserving the relative order of equal elements) by placing each A[i] at position C[A[i]] - 1 in B and then decrementing C[A[i]] to adjust for subsequent identical elements.

Step-by-Step Execution

To illustrate the execution of counting sort, consider an input array A of length n = 8 containing integers in the range 0 to k = 5: A = [2, 5, 3, 0, 2, 3, 0, 3]. This example follows the standard implementation of the algorithm as described in introductory algorithms texts. The process begins by initializing a count array C of size k + 1 = 6 (indices 0 to 5) to all zeros. Then, for each element in A, the corresponding entry in C is incremented to record the frequency of each value:
Value   0  1  2  3  4  5
Count   2  0  2  3  0  1
Next, the count array is transformed into a cumulative count by setting C[i] = C[i] + C[i-1] for i = 1 to k, where C[0] remains unchanged. After this step, C[j] holds the number of elements less than or equal to j, which is one past the last output position for value j, ensuring elements appear in non-decreasing order:
Value        0  1  2  3  4  5
Cumulative   2  2  4  7  7  8
An output array B of size n is then initialized (typically to empty or a placeholder). To produce a stable sort, the input array is scanned from right to left. For each element A[i], it is placed at position C[A[i]] - 1 in B, and C[A[i]] is decremented. This backward pass preserves the relative order of equal elements. The placements proceed as follows:
  • For A[7] = 3, place 3 at position 6 in B; decrement C[3] to 6.
  • For A[6] = 0, place 0 at position 1 in B; decrement C[0] to 1.
  • For A[5] = 3, place 3 at position 5 in B; decrement C[3] to 5.
  • For A[4] = 2, place 2 at position 3 in B; decrement C[2] to 3.
  • For A[3] = 0, place 0 at position 0 in B; decrement C[0] to 0.
  • For A[2] = 3, place 3 at position 4 in B; decrement C[3] to 4.
  • For A[1] = 5, place 5 at position 7 in B; decrement C[5] to 7.
  • For A[0] = 2, place 2 at position 2 in B; decrement C[2] to 2.
The final output array is B = [0, 0, 2, 2, 3, 3, 3, 5]. Throughout this execution, positions for elements in the output are determined directly from the precomputed counts and cumulative sums, without any direct comparisons between input values.

Analysis

Time Complexity

The time complexity of counting sort is O(n + k), where n is the length of the input array and k is the number of possible key values (i.e., one more than the difference between the largest and smallest possible keys). This bound holds regardless of the input distribution, as the algorithm performs a fixed set of operations independent of the specific arrangement of elements. The analysis breaks down as follows: the initial counting phase iterates over the n input elements exactly once to tally occurrences in the auxiliary array, requiring O(n) time; the cumulative sum phase (the prefix sum computation for stable placement) iterates over the k possible values in the auxiliary array, requiring O(k) time; and the output phase iterates over the n input elements once more to construct the sorted array, again requiring O(n) time. Initialization of the auxiliary array also takes O(k) time, which is subsumed in the overall O(k) term. Up to constant factors, the total running time is therefore proportional to n + k, reflecting the linear traversal costs. When k = O(n), this simplifies to linear time O(n), making counting sort particularly efficient for inputs with bounded integer keys. As a non-comparison-based sorting algorithm, counting sort avoids the Ω(n log n) lower bound that applies to comparison sorts, enabling its linear performance in suitable scenarios.

Space Complexity

The space complexity of counting sort is O(n + k), where n is the number of elements to sort and k is the range of input values (specifically, the difference between the maximum and minimum keys plus one). This arises primarily from the count array, which requires O(k) space to tally frequencies of each possible key value, and the output array, which needs O(n) space to build the sorted result without overwriting the input. Beyond these two arrays, only a constant amount of auxiliary space is needed for temporary variables and indices. A key trade-off occurs when k ≫ n, as the space then scales with k, rendering counting sort inefficient or infeasible for datasets with expansive key ranges, such as when keys span up to n^3 or more.

Variants

Stable Variant

A stable sorting algorithm preserves the relative order of records with equal keys as they appear in the sorted output compared to the input. An implementation of counting sort that fills the output array by traversing the input from beginning to end, decrementing each cumulative count before placement, puts earlier occurrences of equal keys toward the end of their group in the output, thereby reversing their relative order and making the sort unstable. To ensure stability, the algorithm is modified to traverse the input array in reverse order during the placement phase, assigning positions such that earlier input elements with equal keys receive lower indices in the output. This reverse traversal works because the cumulative count array initially tracks the position just past the rightmost slot for each key; processing later input elements first places them at higher indices within the key group, leaving lower indices for the earlier elements encountered subsequently. The pseudocode for the stable variant, adapted to zero-based indexing, highlights the backward loop:

COUNTING-SORT(A, B, k)
 1  let count[0..k] be a new array
 2  for i ← 0 to k
 3      count[i] ← 0
 4  for j ← 0 to n - 1
 5      count[A[j]] ← count[A[j]] + 1
 6  for i ← 1 to k
 7      count[i] ← count[i] + count[i - 1]
 8  for j ← n - 1 downto 0
 9      B[count[A[j]] - 1] ← A[j]
10      count[A[j]] ← count[A[j]] - 1

The loop at lines 8–10 iterates backward over the input, decrementing the count after each placement to fill positions from right to left within equal-key groups.

Generalized Variant

Counting sort can be generalized to sort objects with arbitrary keys by first mapping those keys to a range of non-negative integers, enabling the use of the standard counting mechanism on the mapped values. This mapping ensures a one-to-one correspondence between the original keys and integers within a feasible range [0, k], where k is the number of distinct keys or buckets. A common application of this generalized form is as a stable subroutine in radix sort, where counting sort processes individual digits or characters of the keys in multiple passes to achieve overall sorting of larger integers or strings. Each pass treats the digit positions as small integer keys, mapping them directly to counts without needing additional hashing. To handle negative integers, which fall outside the non-negative assumption of basic counting sort, an offset is applied by adding the absolute value of the minimum element to all keys, shifting the range to start at zero. For example, given keys {-3, -1, 0, 2}, the minimum is -3, so add 3 to each: {0, 2, 3, 5}; after counting and placement, subtract 3 during output to restore the originals. This preserves the linear running time as long as the shifted range remains manageable. For floating-point numbers, counting sort is adapted by bucketing the values into discrete ranges, such as by scaling or quantizing to map continuous values to a set of integer indices representing intervals (e.g., dividing the range [min, max] into k equal buckets and assigning each float to its bucket index). This transforms the problem into integer counting, though it introduces approximation if exact ordering within buckets is needed, often requiring a secondary sort within those buckets.
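The offset adaptation for negative keys described above might look like this in Python (a sketch; here the range is derived from the data itself rather than passed in, and the input is assumed non-empty):

```python
def counting_sort_signed(values):
    """Sort integers of any sign by shifting the key range to start at zero."""
    lo, hi = min(values), max(values)
    count = [0] * (hi - lo + 1)
    for v in values:
        count[v - lo] += 1           # shift each key into [0, hi - lo]
    output = []
    for i, c in enumerate(count):
        output.extend([i + lo] * c)  # undo the shift when emitting
    return output
```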

Applications and Limitations

Applications

Counting sort plays a crucial role in radix sort, particularly for sorting multi-digit numbers, where it serves as the stable subroutine applied to each digit position to achieve overall linear time sorting for fixed-length keys. The algorithm is well-suited for sorting data with small fixed ranges, such as student grades from 0 to 100, enabling efficient organization of educational records without comparison overhead. Similarly, in image processing, counting sort is suitable for sorting pixel intensities from 0 to 255, as these values lie within a small discrete range ideal for linear-time sorting methods. As a preprocessing step in data structures, counting sort computes frequency distributions, which is essential for histogram generation and subsequent analysis, such as in statistical profiling or data summarization, where the count array directly represents occurrence tallies for discrete values.

Limitations

Counting sort is fundamentally limited to sorting integers within a known, bounded range [0, k], rendering it inapplicable to continuous types such as real numbers or to unbounded domains where the range cannot be predefined. This restriction arises because the algorithm relies on indexing to count occurrences of each possible value, which is infeasible for non-discrete or infinite sets. When the range k significantly exceeds the input size n (i.e., k >> n), counting sort suffers from excessive space usage due to the O(k) count array, leading to substantial memory waste for sparse or widely distributed keys. This space explosion makes the algorithm impractical in such scenarios, as the auxiliary storage requirements can dominate practical implementations. Additionally, counting sort is not an in-place algorithm in its standard form, necessitating O(n + k) extra memory for the count and output arrays, which contrasts with space-efficient alternatives for general-purpose sorting. For general cases without the integer-range assumption, comparison-based algorithms like quicksort or mergesort are preferable, as they handle arbitrary data types and orderings without space overhead proportional to the key range.

Historical Development

Origins

Counting sort was invented by Harold H. Seward in 1954 while he was a graduate student at the Massachusetts Institute of Technology (MIT). Seward first described the algorithm in his master's thesis titled Information Sorting in the Application of Electronic Digital Computers to Business Operations, published as MIT Digital Computer Laboratory Report R-232 in May 1954. In this work, he presented counting sort, referred to as a "floating digital sort," as an efficient method for internal sorting of records represented by decimal digits, particularly suited to the hardware constraints of early electronic computers. The development occurred during the early computing era, when data processing in business applications relied heavily on punched card systems, such as those from IBM, which processed cards at rates like 450 per minute per digit, using electric brushes to detect punched holes. Seward's algorithm addressed the need for streamlined sorting on limited hardware, including magnetic tape units and the Whirlwind computer project at MIT, by minimizing passes over the data and leveraging direct frequency-based distribution rather than comparisons. Preceding mechanical techniques, such as manual "Keysort" cards that used binary sorting with needles to separate punched cards by holes, also informed the digit-by-digit distribution central to the algorithm.

Significance

Counting sort represents a pioneering example of non-comparative sorting algorithms, achieving linear O(n + k) time under the assumption of a limited range of input values, which sidesteps the Ω(n log n) lower bound for general comparison-based sorts and has influenced the development of other linear-time sorting techniques. As a foundational building block, counting sort serves as a key subroutine in more advanced algorithms like radix sort and bucket sort, enabling efficient handling of larger keys or distributed datasets by breaking sorting down into digit-by-digit or bucket-level operations. In educational contexts, counting sort plays a crucial role in illustrating design trade-offs, such as the balance between time efficiency and space requirements, as well as the impact of input assumptions on performance, making it a staple in curricula for demonstrating non-comparison-based methods and stability in sorting. Its relevance persists in big data environments, where it supports distributed sorting for data with known distributions, such as through radix sort variants in MapReduce frameworks for partitioning and processing large-scale integer keys efficiently.
