K-way merge algorithm
K-way merge algorithm
Main page

K-way merge algorithm

logo
Community Hub0 subscribers
What are your thoughts?
Be the first to start a discussion here.
Be the first to start a discussion here.
K-way merge algorithm

In computer science, k-way merge algorithms or multiway merges are a specific type of sequence merge algorithms that specialize in taking in k sorted lists and merging them into a single sorted list. These merge algorithms generally refer to merge algorithms that take in a number of sorted lists greater than two. Two-way merges are also referred to as binary merges. The k-way merge is also an external sorting algorithm.

A 2-way merge, or a binary merge, has been studied extensively due to its key role in merge sort. An example of such is the classic merge that appears frequently in merge sort examples. The classic merge outputs the data item with the lowest key at each step; given some sorted lists, it produces a sorted list containing all the elements in any of the input lists, and it does so in time proportional to the sum of the lengths of the input lists.

Denote by A[1..p] and B[1..q] two arrays sorted in increasing order. Further, denote by C[1..n] the output array. The canonical 2-way merge algorithm stores indices i, j, and k into A, B, and C respectively. Initially, these indices refer to the first element, i.e., are 1. If A[i] < B[j], then the algorithm copies A[i] into C[k] and increases i and k. Otherwise, the algorithm copies B[j] into C[k] and increases j and k. A special case arises if either i or j have reached the end of A or B. In this case the algorithm copies the remaining elements of B or A into C and terminates.

The k-way merge problem consists of merging k sorted arrays to produce a single sorted array with the same elements. Denote by n the total number of elements. n is equal to the size of the output array and the sum of the sizes of the k input arrays. For simplicity, we assume that none of the input arrays is empty. As a consequence , which simplifies the reported running times. The problem can be solved in running time with space. Several algorithms that achieve this running time exist.

The problem can be solved by iteratively merging two of the k arrays using a 2-way merge until only a single array is left. If the arrays are merged in arbitrary order, then the resulting running time is only O(kn). This is suboptimal.

The running time can be improved by iteratively merging the first with the second, the third with the fourth, and so on. As the number of arrays is halved in each iteration, there are only Θ(log k) iterations. In each iteration every element is moved exactly once. The running time per iteration is therefore in Θ(n) as n is the number of elements. The total running time is therefore in Θ(n log k).

We can further improve upon this algorithm, by iteratively merging the two shortest arrays. It is clear that this minimizes the running time and can therefore not be worse than the strategy described in the previous paragraph. The running time is therefore in O(n log k). Fortunately, in border cases the running time can be better. Consider for example the degenerate case, where all but one array contain only one element. The strategy explained in the previous paragraph needs Θ(n log k) running time, while the improved one only needs Θ(n + k log k) running time.

In this case, we would simultaneously merge k-runs together.

See all
User Avatar
No comments yet.