Shellsort

from Wikipedia
Step-by-step visualisation of Shellsort
Shellsort with gaps 23, 10, 4, 1 in action
Class: Sorting algorithm
Data structure: Array
Worst-case performance: O(n^2) (worst known worst-case gap sequence); O(n log^2 n) (best known worst-case gap sequence)[1]
Best-case performance: O(n log n) (most gap sequences); O(n log^2 n) (best known worst-case gap sequence)[2]
Average performance: depends on gap sequence
Worst-case space complexity: O(n) total, O(1) auxiliary
Optimal: No
The steps of Shellsort.
Swapping pairs of items in successive steps of Shellsort with gaps 5, 3, 1

Shellsort, also known as Shell sort or Shell's method, is an in-place comparison sort. It can be understood as either a generalization of sorting by exchange (bubble sort) or sorting by insertion (insertion sort).[3] The method starts by sorting pairs of elements far apart from each other, then progressively reducing the gap between elements to be compared. By starting with far-apart elements, it can move some out-of-place elements into position faster than a simple nearest-neighbor exchange. The running time of Shellsort is heavily dependent on the gap sequence it uses. For many practical variants, determining their time complexity remains an open problem.

The algorithm was first published by Donald Shell in 1959; the name refers to its inventor, not to shells.[4][5]

Description


Shellsort is an optimization of insertion sort that allows the exchange of items that are far apart. The idea is to arrange the list of elements so that, starting anywhere, taking every hth element produces a sorted list. Such a list is said to be h-sorted. It can also be thought of as h interleaved lists, each individually sorted.[6] Beginning with large values of h allows elements to move long distances in the original list, reducing large amounts of disorder quickly, and leaving less work for smaller h-sort steps to do.[7] If the list is then k-sorted for some smaller integer k, then the list remains h-sorted. A final sort with h = 1 ensures the list is fully sorted at the end,[6] but a judiciously chosen decreasing sequence of h values leaves very little work for this final pass to do.

In simplistic terms, this means if we have an array of 1024 numbers, our first gap (h) could be 512. We then run through the list comparing each element in the first half to the element in the second half. Our second gap (k) is 256, which breaks the array into four sections (starting at 0, 256, 512, 768), and we make sure the first items in each section are sorted relative to each other, then the second item in each section, and so on. In practice the gap sequence could be anything, but the last gap is always 1 to finish the sort (effectively finishing with an ordinary insertion sort).
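
To make the h-sorted property concrete, here is a minimal sketch in C (the helper name is_h_sorted is ours, introduced only for illustration): an array is h-sorted exactly when no element is smaller than the element h positions before it.

#include <stdbool.h>
#include <stddef.h>

/* Return true if a[0..n-1] is h-sorted: starting anywhere and taking
   every h-th element yields a nondecreasing sequence.  This is
   equivalent to requiring a[i - h] <= a[i] for every i >= h. */
static bool is_h_sorted(const int a[], size_t n, size_t h)
{
    for (size_t i = h; i < n; i++)
        if (a[i - h] > a[i])
            return false;
    return true;
}

/* After an h-sorting pass, is_h_sorted(a, n, h) holds; the array is
   fully sorted exactly when is_h_sorted(a, n, 1) holds. */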

An example run of Shellsort with gaps 5, 3 and 1 is shown below.

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12
Input data 62 83 18 53 07 17 95 86 47 69 25 28
After 5-sorting 17 28 18 47 07 25 83 86 53 69 62 95
After 3-sorting 17 07 18 47 28 25 69 62 53 83 86 95
After 1-sorting 07 17 18 25 28 47 53 62 69 83 86 95

The first pass, 5-sorting, performs insertion sort on five separate subarrays (a1, a6, a11), (a2, a7, a12), (a3, a8), (a4, a9), (a5, a10). For instance, it changes the subarray (a1, a6, a11) from (62, 17, 25) to (17, 25, 62). The next pass, 3-sorting, performs insertion sort on the three subarrays (a1, a4, a7, a10), (a2, a5, a8, a11), (a3, a6, a9, a12). The last pass, 1-sorting, is an ordinary insertion sort of the entire array (a1,..., a12).

As the example illustrates, the subarrays that Shellsort operates on are initially short; later they are longer but almost ordered. In both cases insertion sort works efficiently.

Unlike insertion sort, Shellsort is not a stable sort since gapped insertions transport equal elements past one another and thus lose their original order. It is an adaptive sorting algorithm in that it executes faster when the input is partially sorted.

Pseudocode


Using Marcin Ciura's gap sequence, with an inner insertion sort.

# Sort an array a[0...n-1].
gaps = [701, 301, 132, 57, 23, 10, 4, 1]  # Ciura gap sequence

# Start with the largest gap and work down to a gap of 1
# similar to insertion sort but instead of 1, gap is being used in each step
foreach (gap in gaps)
{
    # Do a gapped insertion sort for every element in the range [gap, n)
    # After this loop, the array is gap-sorted
    for (i = gap; i < n; i += 1)
    {
        # save a[i] in temp and make a hole at position i
        temp = a[i]
        # shift earlier gap-sorted elements up until the correct location for a[i] is found
        for (j = i; (j >= gap) && (a[j - gap] > temp); j -= gap)
        {
            a[j] = a[j - gap]
        }
        # put temp (the original a[i]) in its correct location
        a[j] = temp
    }
}
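
The pseudocode translates almost directly into C. The following sketch is for illustration only (the function name shellsort_ciura and the demo driver are ours, not from any cited source); it applies the same Ciura gaps and gapped insertion sort to the example data used above.

#include <stdio.h>
#include <stddef.h>

/* Sort a[0..n-1] in place using Shellsort with the Ciura gap sequence. */
static void shellsort_ciura(int a[], size_t n)
{
    static const size_t gaps[] = {701, 301, 132, 57, 23, 10, 4, 1};
    for (size_t g = 0; g < sizeof gaps / sizeof gaps[0]; g++) {
        size_t gap = gaps[g];
        if (gap >= n)
            continue;               /* skip gaps larger than the array */
        for (size_t i = gap; i < n; i++) {
            int temp = a[i];        /* save a[i] and make a hole at position i */
            size_t j = i;
            /* shift earlier gap-sorted elements up until the correct
               location for temp is found */
            for (; j >= gap && a[j - gap] > temp; j -= gap)
                a[j] = a[j - gap];
            a[j] = temp;            /* put temp in its correct location */
        }
    }
}

int main(void)
{
    int data[] = {62, 83, 18, 53, 7, 17, 95, 86, 47, 69, 25, 28};
    size_t n = sizeof data / sizeof data[0];
    shellsort_ciura(data, n);
    for (size_t i = 0; i < n; i++)
        printf("%d ", data[i]);     /* prints 7 17 18 25 28 47 ... 95 */
    putchar('\n');
    return 0;
}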

Gap sequences


The question of deciding which gap sequence to use is difficult. Every gap sequence that contains 1 yields a correct sort (as this makes the final pass an ordinary insertion sort); however, the properties of the resulting versions of Shellsort may be very different. Too few gaps slow down the passes, and too many gaps produce overhead.

The table below compares most proposed gap sequences published so far. Some of them have decreasing elements that depend on the size of the sorted array (N). Others are increasing infinite sequences, whose elements less than N should be used in reverse order.

The listing gives, for each sequence, the author and year of publication, the OEIS entry where available, the general term (k ≥ 1), concrete gaps, and the worst-case time complexity.

Shell, 1959:[4] general term ⌊N/2^k⌋; gaps (e.g. when N = 2^p) N/2, N/4, ..., 1; worst case Θ(N^2).
Frank & Lazarus, 1960:[8] general term 2⌊N/2^{k+1}⌋ + 1; gaps (e.g. when N = 2^p) N/2 + 1, N/4 + 1, ..., 3, 1; worst case Θ(N^{3/2}).
Hibbard, 1963 (A000225):[9] general term 2^k − 1; gaps 1, 3, 7, 15, 31, 63, ...; worst case Θ(N^{3/2}).
Papernov & Stasevich, 1965 (A083318):[10] general term 2^k + 1, prefixed with 1; gaps 1, 3, 5, 9, 17, 33, 65, ...; worst case Θ(N^{3/2}).
Pratt, 1971 (A003586):[1] successive numbers of the form 2^p·3^q (3-smooth numbers); gaps 1, 2, 3, 4, 6, 8, 9, 12, ...; worst case Θ(N log^2 N).
Knuth, 1973,[3] based on Pratt, 1971[1] (A003462): general term (3^k − 1)/2, not greater than ⌈N/3⌉; gaps 1, 4, 13, 40, 121, ...; worst case Θ(N^{3/2}).
Incerpi & Sedgewick, 1985,[11] Knuth[3] (A036569): gaps 1, 3, 7, 21, 48, 112, 336, 861, ...; worst case O(N^{1 + √(8 ln(5/2)/ln N)}).
Sedgewick, 1982 (A036562):[6] general term 4^k + 3·2^{k−1} + 1, prefixed with 1; gaps 1, 8, 23, 77, 281, ...; worst case O(N^{4/3}).
Sedgewick, 1986 (A033622):[12] general term 9(2^k − 2^{k/2}) + 1 for even k, 8·2^k − 6·2^{(k+1)/2} + 1 for odd k; gaps 1, 5, 19, 41, 109, ...; worst case O(N^{4/3}).
Gonnet & Baeza-Yates, 1991:[13] decreasing gaps h_k = max(⌊5 h_{k−1}/11⌋, 1), starting from h_0 = N; worst case unknown.
Tokuda, 1992 (A108870):[14] general term h_k = ⌈h'_k⌉ with h'_k = 2.25 h'_{k−1} + 1 and h'_1 = 1 (the closed form is frequently misquoted, per OEIS); gaps 1, 4, 9, 20, 46, 103, 233, 525, ...; worst case unknown.
Ciura, 2001 (A102549):[15] general term unknown (experimentally derived); gaps 1, 4, 10, 23, 57, 132, 301, 701; worst case unknown.
Lee, 2021 (A366726):[16] general term ⌈(γ^k − 1)/(γ − 1)⌉ with γ ≈ 2.2436; gaps 1, 4, 9, 20, 45, 102, 230, 516, ...; worst case unknown.
Skean, Ehrenborg & Jaromczyk, 2023:[17] worst case unknown.
Glenn C. Rhoads:[18] start at 1; each successive increment is the smallest prime times the previous increment; worst case unknown.

When the binary representation of N contains many consecutive zeroes, Shellsort using Shell's original gap sequence makes Θ(N2) comparisons in the worst case. For instance, this case occurs for N equal to a power of two when elements greater and smaller than the median occupy odd and even positions respectively, since they are compared only in the last pass.

Although it has higher complexity than the O(N log N) that is optimal for comparison sorts, Pratt's version lends itself to sorting networks and has the same asymptotic gate complexity as Batcher's bitonic sorter.

Gonnet and Baeza-Yates observed that Shellsort makes the fewest comparisons on average when the ratios of successive gaps are roughly equal to 2.2.[13] This is why their sequence with ratio 2.2 and Tokuda's sequence with ratio 2.25 prove efficient. However, it is not known why this is so. Sedgewick recommends using gaps which have low greatest common divisors or are pairwise coprime.[19][failed verification] Gaps which are odd numbers seem to work well in practice: 25% reductions have been observed by avoiding even-numbered gaps. Gaps which avoid multiples of 3 and 5 seem to produce small benefits of < 10%.[original research?]

With respect to the average number of comparisons, Ciura's sequence[15] has the best known performance; gaps greater than 701 were not determined, but the sequence can be further extended according to the recursive formula h_k = ⌊2.25 h_{k−1}⌋.

Tokuda's sequence, defined by the simple formula h_k = ⌈h'_k⌉, where h'_k = 2.25 h'_{k−1} + 1 and h'_1 = 1, can be recommended for practical applications.
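
For illustration, here is a small C sketch of the two formulas just given (the helper names tokuda_gaps and ciura_gaps are ours, and cutting the sequences off at the array size n is one common convention rather than part of either source):

#include <math.h>
#include <stdio.h>
#include <stddef.h>

/* Tokuda's gaps: h'_1 = 1, h'_k = 2.25*h'_{k-1} + 1, gap = ceil(h'_k),
   emitted in increasing order while smaller than n. Returns the count. */
static int tokuda_gaps(size_t n, size_t out[], int max)
{
    int count = 0;
    double h = 1.0;
    while (count < max && (size_t)ceil(h) < n) {
        out[count++] = (size_t)ceil(h);
        h = 2.25 * h + 1.0;
    }
    return count;
}

/* Ciura's gaps, extended past 701 with h_k = floor(2.25 * h_{k-1}). */
static int ciura_gaps(size_t n, size_t out[], int max)
{
    static const size_t base[] = {1, 4, 10, 23, 57, 132, 301, 701};
    int count = 0;
    for (int i = 0; i < 8 && count < max && base[i] < n; i++)
        out[count++] = base[i];
    while (count > 0 && count < max) {
        size_t next = (size_t)(2.25 * (double)out[count - 1]);
        if (next >= n)
            break;
        out[count++] = next;
    }
    return count;
}

int main(void)
{
    size_t g[32];
    int m = tokuda_gaps(100000, g, 32);
    for (int i = 0; i < m; i++)
        printf("%zu ", g[i]);   /* 1 4 9 20 46 103 233 525 1182 2660 5985 ... */
    putchar('\n');
    m = ciura_gaps(100000, g, 32);
    for (int i = 0; i < m; i++)
        printf("%zu ", g[i]);   /* 1 4 10 23 57 132 301 701 1577 3548 ... */
    putchar('\n');
    return 0;
}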

If the maximum input size is small, as may occur if Shellsort is used on small subarrays by another recursive sorting algorithm such as quicksort or merge sort, then it is possible to tabulate an optimal sequence for each input size.[20][21]

Computational complexity


The following property holds: after h2-sorting of any h1-sorted array, the array remains h1-sorted.[22] Every h1-sorted and h2-sorted array is also (a1h1+a2h2)-sorted, for any nonnegative integers a1 and a2. The worst-case complexity of Shellsort is therefore connected with the Frobenius problem: for given integers h1,..., hn with gcd = 1, the Frobenius number g(h1,..., hn) is the greatest integer that cannot be represented as a1h1+ ... +anhn with nonnegative integer a1,..., an. Using known formulae for Frobenius numbers, we can determine the worst-case complexity of Shellsort for several classes of gap sequences.[23] Proven results are shown in the above table.

Mark Allen Weiss proved that Shellsort runs in O(N log N) time when the input array is in reverse order.[24]

With respect to the average number of operations, none of the proven results concerns a practical gap sequence. For gaps that are powers of two, Espelid computed this average as 0.5349N√N − 0.4387N − 0.097√N + O(1).[25] Knuth determined the average complexity of sorting an N-element array with two gaps (h, 1) to be 2N^2/h + √(πN^3 h).[3] It follows that a two-pass Shellsort with h = Θ(N^{1/3}) makes on average O(N^{5/3}) comparisons/inversions/running time. Yao found the average complexity of a three-pass Shellsort.[26] His result was refined by Janson and Knuth,[27] who derived asymptotic expressions for the average number of comparisons/inversions/running time made in each of the three passes of a Shellsort with gaps (ch, cg, 1), where h and g are coprime. In particular, when h = Θ(N^{7/15}) and g = Θ(N^{1/5}), the average time of sorting is O(N^{23/15}).

Based on experiments, it is conjectured that Shellsort with Hibbard's gap sequence runs in O(N5/4) average time,[3] and that Gonnet and Baeza-Yates's sequence requires on average 0.41N ln N (ln ln N + 1/6) element moves.[13] Approximations of the average number of operations formerly put forward for other sequences fail when sorted arrays contain millions of elements.

The average number of element comparisons used by various gap sequences has been measured and divided by the theoretical lower bound, i.e. log2(N!), for comparison; in these measurements, Ciura's sequence 1, 4, 10, 23, 57, 132, 301, 701 (labelled Ci01) was extended according to the formula h_k = ⌊2.25 h_{k−1}⌋.

Applying the theory of Kolmogorov complexity, Jiang, Li, and Vitányi[28] proved the following lower bound for the order of the average number of operations/running time in a p-pass Shellsort: Ω(pN^{1+1/p}) when p ≤ log2 N, and Ω(pN) when p > log2 N. Therefore, Shellsort has prospects of running in an average time that asymptotically grows like N log N only when using gap sequences whose number of gaps grows in proportion to the logarithm of the array size. It is, however, unknown whether Shellsort can reach this asymptotic order of average-case complexity, which is optimal for comparison sorts. The lower bound was later improved by Vitányi,[29] who gave, for every number of passes, a bound expressed in terms of the successive gaps of the sequence used. This result implies the Jiang-Li-Vitányi lower bound for all p-pass gap sequences and improves it for particular sequences; in fact, all average-case bounds (lower and upper) currently known are matched precisely by this lower bound. For example, the Janson-Knuth upper bound for three-pass Shellsort is matched by the corresponding lower bound for the gap sequence used, showing that three-pass Shellsort with that sequence makes Θ(N^{23/15}) comparisons/inversions/running time on average. The refined bound also allows searching for gap sequences whose lower bounds were previously unknown, for example four-pass sequences whose lower bound exceeds the generic Ω(pN^{1+1/p}) bound.

The worst-case complexity of any version of Shellsort is of higher order: Plaxton, Poonen, and Suel showed that it grows at least as rapidly as Ω(N(log N/log log N)^2).[30][31] Robert Cypher proved a stronger lower bound of Ω(N log^2 N/log log N) under additional restrictions on the gap sequence.[32]

Applications


Shellsort performs more operations and has a higher cache miss ratio than quicksort. However, since it can be implemented using little code and does not use the call stack, some implementations of the qsort function in the C standard library targeted at embedded systems use it instead of quicksort. Shellsort is, for example, used in the uClibc library.[33] For similar reasons, in the past, Shellsort was used in the Linux kernel.[34]

Shellsort can also serve as a sub-algorithm of introspective sort, to sort short subarrays and to prevent a slowdown when the recursion depth exceeds a given limit. This principle is employed, for instance, in the bzip2 compressor.[35]

from Grokipedia
Shellsort is a comparison-based sorting algorithm that improves upon insertion sort by sorting elements that are initially far apart in the array before addressing closer elements, using a sequence of progressively smaller increments (or gaps) to define subarrays for sorting. Invented by Donald L. Shell and first described in 1959, it operates in-place and is unstable, meaning that equal elements may not retain their relative order. The algorithm begins with a large gap size h, applying insertion sort to each of the h independent subarrays formed by elements separated by h positions; it then reduces h according to a predefined sequence (such as Shell's original powers-of-2 decrements) until h = 1, at which point a standard insertion sort completes the process. This diminishing-gap approach reduces the total number of comparisons and swaps compared to naive insertion sort, achieving subquadratic time complexity. The performance of Shellsort heavily depends on the choice of increment sequence: Shell's original sequence yields a worst-case time complexity of O(n^2), while optimized sequences like those proposed by Hibbard (h_k = 2^k − 1) or Sedgewick achieve O(n^{3/2}) or better, such as O(n^{4/3}) in some analyses. Theoretical bounds include a worst-case upper bound of O(n^{1+ε}) for any ε > 0 with carefully chosen gaps, and a lower bound of Ω(n(log n/log log n)^2) comparisons. Empirical studies show Shellsort to be efficient for moderate-sized inputs due to its simplicity and low overhead, often outperforming more complex algorithms like quicksort on small arrays, though it lacks the adaptive properties of mergesort or heapsort. Variants include multi-pass strategies like h-bubble, h-shake, and h-brick passes to further minimize inversions, and randomized versions for data-oblivious sorting in secure computation contexts. Despite ongoing research into optimal gap sequences, spanning decades of work by researchers like Knuth, Pratt, and Incerpi, Shellsort remains valued for its ease of implementation and practical speed without requiring recursion or extra memory.

Overview

Description

Shellsort is a comparison-based sorting algorithm that generalizes insertion sort by allowing the comparison and exchange of elements that are far apart in the array, specifically those separated by a sequence of decreasing gaps known as h-sorts. Introduced by Donald Shell in 1959, it performs a series of insertion sorts on subarrays defined by these gaps, enabling elements to move more quickly toward their final positions compared to the adjacent swaps in standard insertion sort. The primary motivation for Shellsort stems from the limitations of insertion sort, which exhibits quadratic time complexity in the worst case due to the need to shift distant inverted elements one position at a time, leading to O(n²) operations for large inversions. By preprocessing the array with larger gaps, Shellsort reduces the number of such distant inversions early in the process, bringing elements roughly into their correct relative order before applying smaller gaps and ultimately a full insertion sort with gap 1. This approach leverages the adaptive nature of insertion sort on partially ordered data while mitigating its inefficiency on random inputs. At a high level, the algorithm begins with a large initial gap (often around n/2, where n is the array length) and applies insertion sort to each of the independent subarrays formed by elements spaced at that gap. The gap is then systematically reduced—typically by halving or following a predefined sequence—repeating the insertion sort passes on the new subarrays until the gap reaches 1, at which point the entire array is fully sorted. Various gap sequences can be used, though their selection is discussed in dedicated analyses. Shellsort is particularly advantageous as an in-place algorithm that requires no additional memory beyond the input array and adapts well to input data that is already partially sorted, making it efficient for moderately sized arrays in practice. Its simplicity in implementation and lack of recursion further contribute to its utility in resource-constrained environments. To illustrate, consider sorting the array [5, 2, 8, 1, 9] with an initial gap of 2. The subarrays are the elements at even indices (positions 0, 2, 4: [5, 8, 9], already sorted) and odd indices (positions 1, 3: [2, 1]). Applying insertion sort to the odd subarray swaps 2 and 1, resulting in [5, 1, 8, 2, 9]. This pass reduces inversions, such as moving 1 closer to its final position near the front, without resolving the full order yet—subsequent smaller gaps will refine it further.

History

Shellsort was invented by Donald L. Shell, an American computer scientist employed at the General Electric Company in Cincinnati, Ohio. In 1959, he published the algorithm in his seminal paper "A High-Speed Sorting Procedure" in Communications of the ACM, marking the first formal description of the method. Shell developed the algorithm during his work on computing systems in the late 1950s, drawing inspiration from existing sorting techniques such as insertion sort but innovating with a series of diminishing increments to enable more efficient element exchanges across larger distances early in the process. This approach addressed limitations in simple sorts for handling larger datasets in internal memory, reflecting the computational constraints of the era. Early empirical evaluations presented in Shell's paper revealed substantial performance gains over basic insertion sort, prompting its initial adoption in practical computing applications. The introduction of this adaptive strategy profoundly influenced later sorting algorithm research, establishing a foundation for exploring increment-based optimizations in comparison sorts. Post-1959, the algorithm underwent significant refinements, particularly in gap sequence design to mitigate worst-case inefficiencies. A notable early improvement came in 1963 when Thomas N. Hibbard proposed a sequence of gaps following powers of 2 minus 1 (e.g., 1, 3, 7, 15, ...), which empirical studies showed could achieve better average-case bounds than Shell's original powers-of-2 decrements. Further advancements in the 1960s through 1980s, including sequences by Donald E. Knuth in 1973, continued to enhance its theoretical and practical viability. In the 2020s, Shellsort remains a subject of theoretical investigation for insights into adaptive sorting behaviors, with recent work analyzing its optimization challenges and complexity under various gap configurations, including a 2025 study demonstrating expected O(n log² n) complexity using specific increment sequences. It also sees occasional use in modern contexts, such as the bzip2 compression utility, where its in-place nature and avoidance of deep recursion suit resource-limited environments.

Algorithm

Core Mechanism

Shellsort operates by iteratively sorting subsets of the input array defined by a decreasing sequence of gaps, known as h-sorting phases. For a given gap size h, the array is conceptually divided into h interleaved subarrays, where each subarray consists of elements separated by h positions (i.e., elements at indices i, i+h, i+2h, ..., for i = 0 to h-1). Each of these subarrays is then independently sorted using insertion sort, which involves shifting elements within the subarray to maintain order while operating directly on the original array positions. This process begins with the largest gap h (often on the order of n/3 or similar, depending on the gap sequence) and proceeds by reducing h in subsequent iterations until h reaches 1. The choice of gaps plays a crucial role in the algorithm's efficiency, as larger initial gaps address long-range disorder by allowing elements that are far apart to be compared and moved early, thereby reducing the number of inversions (pairs of elements out of their natural order) across the array more rapidly than a single insertion sort would. As the gaps diminish, the focus shifts to refining local order within smaller subarrays, progressively bringing the array closer to fully sorted. This staged reduction in gap sizes ensures that by the time h=1, the array is already partially ordered, minimizing the work required in the final phase. All operations—comparisons, shifts, and swaps—are performed in-place on the original array, requiring no additional storage beyond a few variables for indexing. The algorithm terminates when the gap h equals 1, at which point it executes a standard insertion sort on the entire array, which benefits from the prior phases having eliminated many distant inversions. To illustrate, consider a 7-element array A = [3, 7, 1, 5, 2, 4, 6] using gaps h=3 followed by h=1 (a simplified sequence for demonstration).
  • Initial array: [3, 7, 1, 5, 2, 4, 6]
For h=3, the three interleaved subarrays are:
  • Subarray 0 (indices 0,3,6): [3, 5, 6] — already sorted, no changes.
  • Subarray 1 (indices 1,4): [7, 2] — insert 2 before 7, shifting: array becomes [3, 2, 1, 5, 7, 4, 6].
  • Subarray 2 (indices 2,5): [1, 4] — 4 > 1, no change.
  • After h=3: [3, 2, 1, 5, 7, 4, 6]
For h=1, perform insertion sort on the full array:
  • Pass over elements, shifting out-of-order items: after full passes, the array sorts to [1, 2, 3, 4, 5, 6, 7].
This trace shows how the initial h=3 phase rearranges distant elements (e.g., moving 2 past 7), reducing inversions before the local h=1 refinement completes the sort.

Pseudocode

Shellsort is typically presented in pseudocode as an in-place sorting algorithm that operates on an array of comparable elements, generalizing insertion sort by applying it to subarrays defined by decreasing gap values until the array is fully sorted. The standard pseudocode, assuming a simple gap sequence where gaps halve from the array length divided by 2 down to 1, follows this structure:

procedure ShellSort(A: array of size n)
    if n <= 1 then
        return  // Empty or single-element array is already sorted
    end if
    gap ← floor(n / 2)
    while gap > 0 do
        for i ← gap to n-1 do
            temp ← A[i]
            j ← i
            while j >= gap and A[j - gap] > temp do
                A[j] ← A[j - gap]
                j ← j - gap
            end while
            A[j] ← temp
        end for
        gap ← floor(gap / 2)
    end while
end procedure


This template initializes the first gap as approximately half the array size and iteratively reduces it by half, ensuring gaps are positive integers decreasing to 1. The outer loop iterates over each gap value, while the middle loop scans the array starting from the gap index, treating each gapped subsequence A[i], A[i+gap], A[i+2*gap], ... as a sequence to sort via insertion. The inner while loop performs the shifts and insertion for each element in the subsequence, moving elements by multiples of the gap when an inversion is detected (i.e., when A[j - gap] > temp). For edge cases, if the input array has zero or one element, the procedure terminates immediately without processing, as no sorting is needed; additionally, the gap loop condition ensures only positive gaps are used, preventing invalid operations. While this pseudocode employs Donald Shell's original gap sequence of successive halvings, the algorithm is adaptable to custom gap sequences by replacing the gap initialization and update with a predefined list of decreasing positive integers ending at 1, such as those analyzed in later refinements. To illustrate, consider applying the pseudocode to the array A = [5, 2, 8, 1, 9, 3] where n = 6. Initially, gap = 3. For i = 3 (A[3] = 1), the relevant subsequence elements are A[0] = 5 and A[3] = 1; since 5 > 1, the 5 is shifted to A[3], j becomes 0, and 1 is placed at A[0], yielding [1, 2, 8, 5, 9, 3]. For i = 4 (A[4] = 9), the comparison is with A[1] = 2; since 2 < 9, no shift occurs. For i = 5 (A[5] = 3), the comparison is with A[2] = 8; since 8 > 3, the 8 is shifted to A[5], j becomes 2, and 3 is placed at A[2], yielding [1, 2, 3, 5, 9, 8]. Next gap = 1, which performs a full insertion sort on the partially ordered array, resulting in the sorted [1, 2, 3, 5, 8, 9]. At each step, the inner loop compares and shifts only within the current gap's subsequences, progressively reducing disorder.

Gap Selection Process

In Shellsort, the gap selection process involves generating a sequence of decreasing increments, known as gaps, that determine the distances between elements compared in each pass of the algorithm. The original method, proposed by Donald Shell in 1959, starts with an initial gap h approximately equal to half the array size n (typically h = ⌊n/2⌋) and repeatedly halves it until reaching 1, yielding gaps such as n/2, n/4, n/8, ..., 1. This sequence allows the algorithm to initially sort widely spaced subsequences and progressively refine the sorting with smaller gaps. Effective gap selection relies on specific criteria to optimize performance and minimize redundant operations. Gaps in the sequence should be pairwise coprime (i.e., their greatest common divisor is 1) to prevent clustering of elements and ensure even distribution across passes, as shared factors can lead to inefficient comparisons. Additionally, the sequence must be strictly decreasing to enable a refinement process where each subsequent pass builds on the partial order established by prior larger gaps, ultimately converging to a fully sorted array when the final gap is 1. The gaps are integrated into the algorithm either by precomputing the full sequence or generating them dynamically during execution. In the dynamic approach, common in simple implementations, the outer loop iterates over the gaps as follows:

for (int h = n / 2; h > 0; h /= 2) {
    // Perform insertion sort on subsequences spaced by h
    for (int i = h; i < n; i++) {
        int temp = a[i];
        int j;
        for (j = i; j >= h && a[j - h] > temp; j -= h) {
            a[j] = a[j - h];
        }
        a[j] = temp;
    }
}


This structure ensures that for each gap h, the array is partitioned into h independent subsequences, each sorted via insertion sort. Empirical guidelines for gap selection emphasize balancing the number of passes with the computational work per pass. Sequences should avoid powers of 2, as in Shell's original method, because they can exhibit pathological behavior on certain inputs, such as when n is a power of 2, leading to uneven element movement. Instead, designers test candidate sequences for their ability to reduce inversions efficiently across random and worst-case inputs, often aiming for a logarithmic number of gaps (Θ(log n)) to achieve subquadratic performance. A common pitfall in gap selection is choosing sequences where all gaps share a common factor, such as all even numbers, which restricts element exchanges to even indices and causes the algorithm to degrade to O(n^2) time complexity by failing to interleave subsequences effectively. Shell's original halving sequence, for instance, results in Θ(n^2) worst-case comparisons due to such interactions.

Gap Sequences

Common Sequences

Shellsort's performance is highly dependent on the choice of gap sequence, with several well-established sequences developed over time to improve efficiency. The original sequence proposed by its inventor uses simple halvings, while subsequent refinements introduced geometric progressions and empirical optimizations. Donald Shell's original gap sequence, introduced in 1959, starts with an initial gap of h = n/2 and repeatedly halves it until reaching 1, yielding gaps such as n/2, n/4, n/8, ..., 1. This approach is straightforward but can lead to suboptimal performance, with a worst-case time complexity of O(n^2). In 1963, Thomas Hibbard proposed a sequence of gaps given by h_k = 2^k − 1 for k = 1, 2, 3, ..., producing values like 1, 3, 7, 15, 31, and so on. This sequence achieves a proven worst-case time complexity of O(n^{3/2}), marking an early theoretical improvement over Shell's method. Donald Knuth suggested a sequence in his seminal work on sorting algorithms, defined as h_k = (3^k − 1)/2 for k = 1, 2, 3, ..., generating gaps such as 1, 4, 13, 40, 121. This formulation balances the growth of gaps to reduce the number of inversions more evenly across passes, contributing to better practical behavior. Incerpi and Sedgewick developed a hybrid sequence in 1985, combining elements of powers of 2 and 3 for empirical efficiency. It merges two geometric progressions, one with terms 9·4^k − 9·2^k + 1 and the other 4^k − 3·2^k + 1 (for k ≥ 2), resulting in gaps like 1, 5, 19, 41, 109, 209. This sequence is noted for its fast average-case performance in practice due to its distribution of gap sizes. Vaughan Pratt's sequence, outlined in his 1972 thesis, consists of all 3-smooth numbers (products of powers of 2 and 3, i.e., 2^a·3^b for non-negative integers a, b), arranged in decreasing order (largest first, down to 1) and truncated to those less than or equal to the array size; examples (in ascending order) include 1, 2, 3, 4, 6, 8, 9, 12. It is theoretically significant for achieving O(n log^2 n) worst-case complexity, though the number of gaps grows as Θ(log^2 n), which makes it less practical for typical implementations. A more recent empirical sequence by Marcin Ciura, published in 2001, was optimized through experimentation for array sizes up to 10^5. It uses fixed gaps: 1, 4, 10, 23, 57, 132, 301, 701, and so forth, selected to minimize average-case comparisons and swaps. This sequence outperforms many predecessors in benchmarks for moderate-sized inputs.
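
As a small illustrative sketch (helper names and the cutoff at an array size n are ours), the following C snippet generates the Hibbard, Knuth, and Pratt gaps described above for n = 1000, using the recurrences h ← 2h+1 and h ← 3h+1 and the 3-smooth-number definition:

#include <stdio.h>

/* Print a named list of gaps. */
static void print_gaps(const char *name, const long gaps[], int count)
{
    printf("%-8s", name);
    for (int i = 0; i < count; i++)
        printf(" %ld", gaps[i]);
    putchar('\n');
}

int main(void)
{
    const long n = 1000;
    long g[64];
    int m;

    /* Hibbard: 2^k - 1 = 1, 3, 7, 15, ... */
    m = 0;
    for (long h = 1; h < n; h = 2 * h + 1)
        g[m++] = h;
    print_gaps("Hibbard", g, m);

    /* Knuth: (3^k - 1)/2 = 1, 4, 13, 40, ... */
    m = 0;
    for (long h = 1; h < n; h = 3 * h + 1)
        g[m++] = h;
    print_gaps("Knuth", g, m);

    /* Pratt: 3-smooth numbers 2^a * 3^b = 1, 2, 3, 4, 6, 8, 9, 12, ... */
    m = 0;
    for (long x = 1; x < n; x++) {
        long y = x;
        while (y % 2 == 0) y /= 2;
        while (y % 3 == 0) y /= 3;
        if (y == 1)
            g[m++] = x;
    }
    print_gaps("Pratt", g, m);

    return 0;
}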

Sequence Properties and Design

Effective gap sequences in Shellsort are designed to balance efficiency across multiple insertion sort passes, with desirable properties including a geometric decrease in gap sizes—typically with ratios between 2 and 3—to ensure rapid reduction while maintaining coverage of the array elements. Such ratios, like 11/5 ≈ 2.2, promote even distribution of comparisons and minimize the number of inversions carried over between passes. Additionally, gaps should be pairwise relatively prime (i.e., coprime in pairs) to avoid amplifying periodic disorders in the input, and the overall sequence should aim for a small greatest common divisor, ideally 1, to prevent worst-case scenarios where subarrays remain poorly sorted. This even coverage ensures that distant elements are compared early, facilitating a smooth transition to finer sorting in later passes. Designing these sequences involves key trade-offs, such as the number of passes versus the size of individual gaps: longer sequences with more smaller gaps increase the total work per pass but can lead to better overall sortedness, while shorter sequences with larger initial gaps reduce passes but risk inefficient final stages. The goal is typically O(log n) passes to achieve sub-quadratic performance in practice, balancing computational overhead against the benefits of diminishing gap sizes. For instance, sequences emphasizing fewer passes may use ratios closer to 3, trading some evenness for speed, whereas those prioritizing uniformity opt for slightly smaller ratios around 2.2. To avoid worst-case pathologies, sequences must minimize the potential for input-dependent degradations, such as when gaps share large common factors that align with adversarial patterns; employing coprime gaps and leveraging number-theoretic tools like the Frobenius coin problem helps bound the maximum disorder propagated, ensuring no subarray experiences excessive swaps. This design choice is critical, as sequences with a greatest common divisor greater than 1 can trap elements in unsorted cycles, inflating the comparison count by up to a factor of the divisor. Early gap sequences, such as those proposed by Shell in 1959 and Hibbard in 1963, were derived theoretically based on intuitive progressions like multiples of 3 or powers of 2, aiming for simple asymptotic behavior without extensive testing. In contrast, later developments by Incerpi and Sedgewick in the 1980s and Ciura in 2001 shifted toward empirical tuning through simulations on random permutations, optimizing for average-case metrics like comparisons and assignments on arrays up to several thousand elements. These simulation-driven approaches revealed that theoretical sequences often underperform in practice due to high constants, favoring hybrid designs that incorporate both geometric growth and data-specific refinements. Despite advances, the optimal gap sequence remains conjectural, with open problems centering on sequences that adapt to input distributions beyond uniform random data, potentially incorporating runtime analysis of disorder types. Post-2000 research has explored parameterized functions for gaps; for example, a 2023 study introduced efficient sequences based on grid-search optimization of parameters, potentially improving scalability for larger inputs, though surpassing empirically tuned sequences like Ciura's remains challenging. A 2021 analysis also examined gamma-parameterized gaps inspired by Ciura's method. 
As an example evaluation, for an array of size n=1000, the theoretical Hibbard sequence (powers of 2 minus 1) requires approximately 10 passes, while the empirically tuned Ciura sequence uses about 8 passes, demonstrating how simulation-optimized designs reduce overhead without sacrificing effectiveness.

Analysis

Time Complexity

The time complexity of Shellsort depends heavily on the chosen gap sequence and input characteristics, with no tight universal bound established. In the worst case, it ranges from subquadratic to O(n^2) depending on the gaps, while a general lower bound of Ω(n(log n/log log n)^2) comparisons holds regardless of the sequence. For specific gap sequences, performance improves with better designs. The original Shell sequence using powers of two yields a worst-case Θ(n^2), which performs worse than sequences like Hibbard's that achieve O(n^{3/2}); the Sedgewick sequence provides a worst-case O(n^{4/3}). However, the average-case complexity for practical sequences like Sedgewick's remains an open theoretical problem, though empirical studies indicate strong subquadratic performance. A sketch of the derivation shows why the bounds vary. An h-pass runs insertion sort on h independent subarrays of roughly n/h elements each, so its worst-case cost is O(h·(n/h)^2) = O(n^2/h); summed over a halving sequence this alone only gives O(n^2). For the smaller gaps of a well-designed sequence that bound is far too pessimistic, because earlier passes with larger gaps have already removed most long-range inversions: for sequences such as Hibbard's, each element moves only O(h) positions during an h-pass, so the pass costs O(nh). Balancing O(n^2/h) for gaps above √n against O(nh) for gaps below √n yields the O(n^{3/2}) worst-case bound; see the sketch after this paragraph. In the average case on random inputs, the partial order established by earlier passes similarly reduces the work of later passes, and with good sequences the effective cost per pass approaches O(n), yielding subquadratic totals over O(log n) passes. The best-case complexity is O(n log n) for nearly sorted inputs, where minimal shifts occur per pass. Average-case performance for strong sequences approximates O(n^1.3), while worst-case inputs can be engineered for near-O(n^2) behavior with poorly chosen gaps. Input randomness and sequence quality are key factors; empirical studies show Shellsort 2-5 times slower than quicksort on average but far faster than bubble sort for lists beyond 100 elements.
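
A minimal worked version of that balancing argument, written as a sketch under the stated assumptions about per-pass costs (O(n^2/h) for large gaps, O(nh) for small ones, as with Hibbard-like sequences), splits the gap sequence at √n:

\[
\sum_{h_k \ge \sqrt{n}} O\!\left(\frac{n^2}{h_k}\right)
+ \sum_{h_k < \sqrt{n}} O\!\left(n\,h_k\right)
= O\!\left(\frac{n^2}{\sqrt{n}}\right) + O\!\left(n\sqrt{n}\right)
= O\!\left(n^{3/2}\right),
\]

since both roughly geometric sums are dominated by the term whose gap is closest to √n.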

Space Complexity and Stability

Shellsort operates as a fully in-place algorithm, requiring only O(1) auxiliary space beyond the input array itself. This is achieved through the use of a constant number of variables, such as loop indices and a temporary variable for element swaps during comparisons. The algorithm's implementation avoids recursion entirely and does not allocate extra arrays or data structures for intermediate results, distinguishing it from algorithms like merge sort that demand O(n) additional space for merging. This in-place nature makes Shellsort particularly advantageous in environments with limited memory availability, such as embedded systems or large datasets where auxiliary storage is prohibitive. Regarding stability, Shellsort is not stable, meaning that equal elements may change their relative order in the sorted output compared to the input. Instability occurs primarily due to swaps in early passes with large gaps, where elements separated by the gap distance—potentially including equal keys—can be exchanged if their current positions violate the sorting order within their subarray. For instance, consider two identical elements positioned such that one precedes the other by the gap h; during the h-sort phase, the algorithm may shift them past each other to resolve inversions in interleaved subarrays. While the insertion sort applied to each subarray during an h-sort phase preserves the relative order of equal elements locally within that subarray, the overall process interleaves multiple such subarrays, allowing global reordering of equals across different groups as gaps decrease. This contrasts with basic insertion sort, which is inherently stable because it only performs adjacent swaps and thus never inverts equal elements. Shellsort deliberately sacrifices this stability to enable faster corrections of distant inversions, enhancing its efficiency over insertion sort for larger, less ordered inputs. To achieve stability when required, modified variants of Shellsort can incorporate stable sorting routines (such as a stability-preserving insertion sort) for each subarray phase, though this typically increases the time overhead due to additional checks for equal elements.
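
A minimal C demonstration of this instability (the struct and function names are ours, and the tiny gap sequence {2, 1} is chosen purely for illustration): the 2-pass moves one of two equal keys across the other, so the sorted output no longer preserves the input order of equal elements.

#include <stdio.h>

struct item { int key; char id; };

/* Shellsort on an array of items, comparing keys only. */
static void shellsort_items(struct item a[], int n)
{
    static const int gaps[] = {2, 1};
    for (int g = 0; g < 2; g++) {
        int gap = gaps[g];
        for (int i = gap; i < n; i++) {
            struct item temp = a[i];
            int j = i;
            for (; j >= gap && a[j - gap].key > temp.key; j -= gap)
                a[j] = a[j - gap];
            a[j] = temp;
        }
    }
}

int main(void)
{
    /* Two equal keys: record 'B' comes before record 'A' in the input. */
    struct item a[] = { {5, 'x'}, {2, 'B'}, {2, 'A'} };
    shellsort_items(a, 3);
    for (int i = 0; i < 3; i++)
        printf("%d%c ", a[i].key, a[i].id);
    /* Prints "2A 2B 5x": the 2-pass carried the 'A' record past the 'B'
       record, so the equal keys no longer appear in their input order. */
    putchar('\n');
    return 0;
}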

Applications and Comparisons

Practical Applications

Shellsort is particularly valued in embedded systems due to its in-place nature, which requires no additional memory allocation beyond the input array, and its avoidance of recursive calls that could lead to stack overflows in resource-constrained environments. Its implementation simplicity further suits these platforms, enabling efficient handling of moderately sized datasets without the overhead of more complex algorithms. In legacy software, Shellsort was formerly integrated into the Linux kernel's generic sorting routines to replace recursive qsort implementations, ensuring robustness in kernel-space operations with limited stack resources. Similarly, the bzip2 compression utility utilizes Shellsort to sort block data, avoiding recursion depth issues that could disrupt compression in constrained environments like embedded file systems. Shellsort is often incorporated into hybrid sorting strategies, such as combining it with quicksort for subarrays smaller than 50 elements, to mitigate quicksort's potential recursion depth problems while leveraging Shellsort's efficiency on small, nearly sorted inputs. This approach enhances overall performance in general-purpose libraries by switching to Shellsort at low thresholds, reducing overhead from quicksort's partitioning on tiny segments. Its non-recursive structure also aids in environments requiring bounded stack usage, such as task scheduling in real-time operating systems. Additionally, Shellsort serves as an educational tool for illustrating adaptive sorting concepts, demonstrating how incremental gap reductions improve upon basic insertion sort in introductory algorithms courses. Shellsort continues to be used in the uClibc library for embedded systems. It also serves as a subroutine in hybrid algorithms like introsort for sorting small subarrays. In practice, Shellsort performs effectively for array sizes between 100 and 10,000 elements, where empirical benchmarks indicate it completes in approximately 10-20% of the time required by bubble sort on random data, owing to its sub-quadratic behavior. Despite these strengths, Shellsort is rarely the first choice in modern applications, as optimized library implementations like introsort in standard C++ or Java libraries generally outperform it through hybrid techniques tailored for larger datasets. Nonetheless, it remains valuable for understanding incremental algorithmic improvements and serves as a building block in custom sorting solutions for niche constraints.

Comparisons to Other Sorting Algorithms

Shellsort generalizes insertion sort by applying it to interleaved subarrays defined by decreasing gap sizes; the early swaps between distant elements reduce the total number of inversions and improve the worst-case time complexity from O(n²) to sub-quadratic bounds, such as O(n^{4/3}) for certain gap sequences like Sedgewick's increments. Compared to quicksort, Shellsort is fully in-place and non-recursive, avoiding pivot selection overhead and worst-case O(n²) behavior from poor partitions, though its average-case performance is typically 2–5 times slower due to higher constant factors in comparisons and swaps. Empirical tests on random inputs show quicksort outperforming Shellsort variants for arrays larger than 10,000 elements; for example, on 75,000 integers, quicksort takes 20.8 seconds while a divisor-based Shellsort requires 32.4 seconds on comparable hardware (1987 benchmarks). However, Shellsort offers more predictable runtime without randomization needs.
Array size    Shellsort (divisors)    Quicksort
1,000         1.5 s                   1.7 s
5,000         2.9 s                   2.6 s
10,000        4.6 s                   3.8 s
35,000        14.7 s                  10.5 s
75,000        32.4 s                  20.8 s
Table: Execution times (seconds) for sorting random integers on 1987 hardware, adapted from shaker sort variants approximating standard Shellsort performance.

Like merge sort, Shellsort achieves sub-quadratic average time complexity, such as O(n^{4/3}) for certain gap sequences, but it uses constant extra space versus merge sort's O(n) auxiliary array requirement, making it preferable in memory-constrained environments. Shellsort's non-sequential access patterns, however, can lead to poorer cache performance compared to merge sort's sequential merges, and it is not stable while merge sort preserves the relative order of equal elements. In practice, on datasets up to 42,000 elements, Shellsort execution times are comparable to or slightly faster than merge sort in some empirical tests, but merge sort scales better for very large inputs.

Heapsort, like Shellsort, is in-place, comparison-based, and needs no recursion, and it guarantees an O(n log n) worst case that Shellsort lacks; even so, Shellsort is often faster in practice for small to medium arrays (under 10,000 elements) due to its adaptive behavior on partially sorted data and simpler implementation without heap maintenance overhead. For larger arrays, such as 1 million elements, both perform poorly in unoptimized implementations (e.g., over 2000 seconds in Python on modern hardware), though optimized versions and hybrids outperform them. Neither is stable, but Shellsort's gap-based passes make it more responsive to input distribution.

Empirical studies indicate Shellsort requires roughly 3n log₂ n comparisons in practice for good gap sequences, compared to about 1.4n log₂ n for quicksort, highlighting the higher constant factor that contributes to its relative slowness despite sub-O(n^2) scaling. Shellsort is chosen for its simplicity and low overhead in embedded systems or non-general-purpose code where recursion or extra space is undesirable, whereas quicksort, merge sort, or heapsort are preferred for large-scale sorting needing optimal speed or stability. In modern languages like C++, the standard library's std::sort, which implements introsort (a quicksort-heapsort hybrid), consistently outperforms pure Shellsort by factors of 2-10x on typical workloads, but Shellsort remains useful for custom constraints like limited stack space or sorting small buffers.
