Jheel Patel | Cloud Systems & Software Engineer

Problem 1.1

The question asks about the state of two subarrays after the two outermost recursive calls in a merge-sort.

The original array is: [5, 3, 8, 9, 1, 7, 0, 2, 6, 4]

Solution

The two outhermost recursive calls are made on the two halves of the original array. Merge sort is a divide-and-conquer algorithm hence it relies on the fact that at any given recursion level, the two subarrays returned by the recursive calls are individually sorted.

Before the two outermost recursive calls:

Left subarray: [5, 3, 8, 9, 1]

Right subarray: [7, 0, 2, 6, 4]

In this case both the subarrays are of same length otherwise we need to make assumption on which subarray can be odd. That assumption won't change the result.
After the two outermost recursive calls:

Left subarray: [1, 3, 5, 8, 9]

Right subarray: [0, 2, 4, 6, 7]

Hence, the answer to the original question: number at the 7th position after gluing both the individually sorted subarrays is 2.

Problem 1.2

The question asks about the running time of a slightly modified merge sort algorithm. Divide the input array into thirds (rather than halves), recursively sort each third, and finally combine the results using a three-way Merge subroutine.

Solution

In the recursion tree with the root as the input array, then at a given level $l$ ( $l=0$ for root level) if size of the subarray is $m$ ,

\begin{equation} m \cdot 3^l = n \implies m = \frac{n}{3^l} \end{equation}

At level $(l-1)$ , for each three-way merge, total comparisons required, in worst case, is $2 \cdot 3 \cdot \frac{n}{3^l}$ . Hence, total no. comparisons at level $(l-1)$ is:

\begin{equation} 2 \cdot 3 \cdot \frac{n}{3^l} \cdot 3^{l-1} = 2 \cdot n \end{equation}

If total no. of levels is L:

\begin{equation} 3 ^ L = n \implies L = \log_2{n}/\log_2{3} \end{equation}

Hence, if we assume sequential execution, total running time, approximetely, is:

\begin{equation} T(n) = L \cdot 2 \cdot n = \frac{2n}{\log_2{3}} \cdot \log_2{n} = O(n \log{n}) \end{equation}

Problem 1.3

In this problem, we are given $k$ sorted arrays, each containing $n$ elements. We need to merge them into a single sorted array of $k \cdot n$ elements. For each array of size $n$ , merge it with previously merged array of size $i \cdot n$ where $i$ is the index of current array (0-indexed).

Solution

Using the merge procedure from merge sort to merge two sorted arrays of sizes $a$ and $b$ has time complexity $O(a+b)$ . Hence total time taken to merge $k$ arrays is:

\begin{equation} \begin{aligned} T(n, k) & = n + 2 \cdot n + 3 \cdot n + ... + k \cdot n \\ & = (1 + 2 + 3 + ... + k) \cdot n \\ & = \frac{k(k+1)}{2} \cdot n = O(k^2 \cdot n) \end{aligned} \end{equation}

Problem 1.4

This is same problem as Problem 1.3 but we need to use different merging procedure. It divides the $k$ arrays into $k / 2$ pairs, merges each pair, resulting in $k / 2$ sorted arrays of length $2n$ , and repeats this process until only one sorted array remains.

Solution

Before each merging step, we have $k$ arrays of size $n$ . After first merging step, we have $k/2$ arrays of size $2n$ . After second merging step, we have $k/4$ arrays of size $4n$ . Continuing this way, after $i^{th}$ merging step, we have $k / 2^i$ arrays of size $2^i \cdot n$ .
Hence time taken at each merging step $i$ is $O(k \cdot n)$ .
Total merging steps required to get a single sorted array is $\log_2{k}$ .
Hence total time taken to merge $k$ arrays is:

\begin{equation} T(n, k) = O(k \cdot n \cdot \log_2{k}) \end{equation}

Problem 1.5

In this problem, we are given an unsorted array of $n$ distinct numbers, and we need to give an algorithm to find the $2nd$ largest number in the array using at most $n + \log_2{n} - 2$ comparisons. For simplicity, we can assume that $n$ is a power of $2$ .

Solution

Idea: Using tournament method. Pair up the elements and compare them. The larger element moves to the next round. Continue this until we find the largest element. The second largest element must be one of the elements that lost to the largest element in each round.

Use merge sort to create a tournament tree. This will take $n - 1$ comparisons to find the largest element.
To find the second largest element, we need to look at the elements that lost to the largest element. The largest element will have competed in $\log_2{n}$ rounds, so there are $\log_2{n}$ candidates for the second largest element.
We can find the largest among these candidates with $\log_2{n} - 1$ comparisons.
Hence, total comparisons made is:

\begin{equation} T(n) = (n - 1) + (\log_2{n} - 1) = n + \log_2{n} - 2 \end{equation}

Python Code Implementation:

def find_second_largest(arr):
    comparisons = {}

    def tournament(arr):
        if len(arr) == 1:
            return arr[0]
        mid = len(arr) // 2
        left_winner = tournament(arr[:mid])
        right_winner = tournament(arr[mid:])
        if left_winner > right_winner:
            comparisons[left_winner] = comparisons.get(left_winner, []) + [right_winner]
            if right_winner in comparisons:
                del comparisons[right_winner]
            return left_winner
        # distinct elements assumption
        else:
            comparisons[right_winner] = comparisons.get(right_winner, []) + [left_winner]
            if left_winner in comparisons:
                del comparisons[left_winner]
            return right_winner

    largest = tournament(arr)
    candidates = comparisons[largest]
    second_largest = max(candidates)

    return second_largest

Proof of correctness

In tournament recursion tree at the last recursion level, we have pairs of elements compared. The larger element from each pair moves up the tree. These moved up elements are the only candidates for the largest element. Is this is not the case then one of the pairs returned the smaller element which contradicts the if-else case. Using induction, we can say that at each level of recursion tree, the larger elements move up and smaller elements are discarded. Hence, at the root of the recursion tree, we have the largest element.
Proof that second largest element is among the elements that lost to the largest element: At each recursion level,
- Case 1: 2nd largest element is paired with largest element. In this case, it loses to the largest element and is among the candidates.
- Case 2: 2nd largest element is paired with some other element. In this case, it wins and moves up the tree. At some later level, or ultimately after the two outermost recursions, it will be paired with the largest element and lose to it. Hence, it will be among the candidates.
At each level of recursion, largest element competes with one element and loses to none. Hence, total no. of candidates for 2nd largest element is $\log_2{n}$ . Finding the largest among these candidates will give us the 2nd largest element in the original array.