Python Advanced Algorithms Tutorial

Find the Median of Two Sorted Arrays (via Leetcode)

This example was found on Leetcode.

In this tutorial, I'll walk extensively through my logic for solving this problem.

Problem statement:

There are two sorted arrays, nums1 and nums2 of size m and n respectively.

Find the median of the two sorted arrays. Assume both arrays are non-empty.

Test cases:

Each list must have at least one value, should be sorted from least to greatest and should only contain integers.

Sum of length of lists is an odd number:

  • nums1=[1] and nums2=[1, 2]
    • len(nums1) = 1 and len(nums2) = 2
    • Sum of length of lists is 3
    • Median is 1
    • Need to traverse to index 1 of combined lists - equivalent to range(0, 2) since 2nd argument exclusive
  • nums1=[1] and nums2=[1, 2, 3, 4]
    • len(nums1) = 1 and len(nums2) = 4
    • Sum of length of lists is 5
    • Median is 2
    • Need to traverse to index 2 of combined lists - equivalent to range(0, 3)
    • A special edge case: want to keep on iterating over elements nums2 and can't utilize index greater than 0 for nums1
  • nums1=[1, 5] and nums2=[0, 3, 6]
    • len(nums1) = 2 and len(nums2) = 3
    • Sum of length of lists is 5
    • Median is 3
    • Need to traverse to index 2 of combined lists - equivalent to range(0, 3)

Sum of length of lists is an even number:

  • nums1=[1] and nums2=[2]
    • len(nums1) = 1 and len(nums2) = 1
    • Sum of length of lists is 2.
    • Median is (1+2)/2 = 1.5
    • Need to traverse to index 1 of combined lists - equivalent to range(0, 2)
  • nums1=[1] and nums2=[2, 4, 6]
    • len(nums1) = 1 and len(nums2) = 3
    • Sum of length of lists is 4
    • Median is (2+4)/2 = 3
    • Need to traverse to index 2 of combined lists - equivalent to range(0, 3)
  • nums1=[1, 3] and nums2=[2, 4]
    • len(nums1) = 2 and len(nums2) = 2
    • Sum of length of lists is 4
    • Median is (2+3)/2 = 2.5
    • Need to traverse to index 2 of combined lists - equivalent to range(0, 3)

Let's write out Python test cases.

In [1]:
# assert find_median_of_two_sorted_arrays([1], [1, 2]) == 1
# assert find_median_of_two_sorted_arrays([1], [1, 2, 3, 4]) == 2
# assert find_median_of_two_sorted_arrays([1, 5], [0, 3, 6]) == 3
# assert find_median_of_two_sorted_arrays([1], [2]) == 1.5
# assert find_median_of_two_sorted_arrays([1], [2, 4, 6]) == 3
# assert find_median_of_two_sorted_arrays([1, 3], [2, 4]) == 2.5

Brute Force Method

In general, I think it's best practice to first try to solve the problem using brute force - meaning solve it by any means necessary - even if it's not best-performant code. Use the plethora of Python functions, data structures and methods to derive the solution in the quickest way you know possible.

Programming in the real world, in most cases, just solved in a quick fashion - likely the brute force way - is more optimal than solving the problem over a long duration with more performant code that uses less memory and computes in a shorter time period. Once the problem is solved, then work on optimizing it to be better.

Pseudocode

We have two sorted lists. It's confusing to work with two separate data structures and manage iterating over them. Therefore, I think it's simplest to create one sorted list.

There are two formulas to find the median of a list - one if the list has an even number of elements and another if the list has an odd number of elements.

Since there will be one sorted list, I can easily retrieve the element(s) at a given index needed to calculate the median.

This method isn't performant because it's high time complexity to sort a list and additional space complexity to store a new sorted list. My hypothesis is that there's a simpler time complexity way that may involve just one iteration over half of m + n in order to find the median value.

Coded Solution

In [3]:
import logging
logging.basicConfig(level=logging.DEBUG, format='%(asctime)-25s %(levelname)-7s %(lineno)-4s %(message)-8s')                      

def find_median_of_two_sorted_arrays(nums1, nums2):
    """
    Find the median of two sorted arrays through iterating over one combined sorted list.
    Reach middle index and calculate median based on specific formula relative to count of 
    elements in combined list.
    
    param nums1: first sorted list
    param nums2: second sorted list
    return median: median of two sorted lists    
    """

    logging.debug("create one sorted list from nums1 and nums2...")
    sorted_list = sorted(nums1+nums2)
    logging.debug("calculate the length of sorted_list...")
    len_sorted_list = len(sorted_list)
    if len_sorted_list == 2:
        logging.debug("given 2-number sorted list, calculate average of two middle numbers in list...")
        median = (sorted_list[0] + sorted_list[1])/2
    elif len_sorted_list % 2 == 0:
        logging.debug("given even numbered sorted_list, use Python floor division operator to calculate two middle # indices...")
        lower_middle_index = len_sorted_list // 2 - 1
        higher_middle_index = len_sorted_list // 2
        logging.debug("calculate median as ")
        median = (sorted_list[lower_middle_index] + sorted_list[higher_middle_index])/2
    else:
        logging.debug("given odd-numbered sorted_list, calculate median as exact middle value...")
        exact_middle_list_index = len_sorted_list // 2
        median = sorted_list[exact_middle_list_index]
    logging.debug("median: {median}")
    return median

All the test cases below pass.

In [4]:
assert find_median_of_two_sorted_arrays([1], [1, 2]) == 1
2020-03-08 20:47:31,891   DEBUG   4    create one sorted list from nums1 and nums2...
2020-03-08 20:47:31,892   DEBUG   6    calculate the length of sorted_list...
2020-03-08 20:47:31,893   DEBUG   18   given odd-numbered sorted_list, calculate median as exact middle value...
2020-03-08 20:47:31,894   DEBUG   21   median: {median}

Turn off logger messages for verifying rest of test cases.

In [5]:
logger = logging.getLogger()
logger.disabled = True
In [6]:
assert find_median_of_two_sorted_arrays([1], [1, 2, 3, 4]) == 2
assert find_median_of_two_sorted_arrays([1, 5], [0, 3, 6]) == 3
assert find_median_of_two_sorted_arrays([1], [2]) == 1.5
assert find_median_of_two_sorted_arrays([1], [2, 4, 6]) == 3
assert find_median_of_two_sorted_arrays([1, 3], [2, 4]) == 2.5

A More Performant Method

Pseudocode

In [7]:
# a python deque is a great data structure for quick O(1) pops at the front - unlike a list
# assign a Python deque to store the values of nums1
# assign a Python deque to store the values of nums2

# assign variable sum_length_two_lists to be sum of the lengths of the input lists
# assign variable last_iterated_value to be None

# as we iterate over each list, we'll compare the values at index 0 of each list...
# create a function called element_comparisons_on_lists that takes in inputs nums1, nums2, and last_iterated_value
    # NEED TO CHECK LENGTH OF LIST...
    # if len(nums2) is equal to 0 or nums1[0] <= nums2[0]:
        # pop out first element of nums1 and assign it to last_iterated_value
    # else:
        # pop out first element of nums2 and assign it to last_iterated_value
    # return nums1, nums2 and last_iterated_value

# if sum_length_two_lists is equal to two:
    # calculate median as average of two values - the one in east list
# else if sum_length_two_lists is even number:
    # assign middle_values_list to be empty python list
    # create python set of lower-middle-index and upper-middle-index
    # iterate from combined_index in range(0, upper-middle-# + 1)
        # if combined_index in python set:
            # append element to middle_values_list
        # call element_comparisons_on_lists and return values
    # median is average of two numbers in middle_values_list
# else:
    # sum_length_two_lists is an odd number...
    # iterate from combined_index in range(0, middle number in list + 1)
        # call element_comparisons_on_lists and return values
    # median is last_iterated_value
# return median

Algorithm Coded

In [9]:
from collections import deque

def element_comparisons_on_lists(nums1, nums2, last_iterated_value):
    """
    Compute the smaller element of the two sorted lists and remove that value from deque
    
    param nums1: first sorted list
    param nums2: second sorted list
    last_iterated_value: the smaller of available elements in each of two sorted lists
    return nums1, nums2, last_iterated_value
    """
    if len(nums2) == 0: 
        logging.debug("no elements in nums2; pop out first element of nums1 and assign it to last_iterated_value")
        last_iterated_value = nums1.popleft()
    elif len(nums1) == 0:
        logging.debug("no elements in nums1; pop out first element of nums2 and assign it to last_iterated_value")
        last_iterated_value = nums2.popleft()
    elif nums1[0] <= nums2[0]:
        logging.debug("pop out first element of nums1 and assign it to last_iterated_value")
        last_iterated_value = nums1.popleft()
    else:
        logging.debug("pop out first element of nums2 and assign it to last_iterated_value")
        last_iterated_value = nums2.popleft()
    logging.info(f"nums1: {nums1} and nums2: {nums2} and last_iterated_value: {last_iterated_value}")
    return nums1, nums2, last_iterated_value

def find_median_of_two_sorted_arrays(nums1, nums2):
    """
    Find the median of two sorted arrays through iterating over lists to reach middle index and using median
    function specific to the total count of elements in nums1 and nums2
    
    param nums1: first sorted list
    param nums2: second sorted list
    return median: median of two sorted lists
    """
    # a python deque is a great data structure for quick O(1) pops at the front - unlike a list
    logging.debug("assign a Python deque to store the values of nums1...")
    nums1 = deque(nums1)
    logging.debug("assign a Python deque to store the values of nums2...")
    nums2 = deque(nums2)

    logging.debug("assign variable sum_length_two_lists to be sum of the lengths of the input lists...")
    sum_length_two_lists = len(nums1) + len(nums2)
    logging.info(f"sum_length_two_lists: {sum_length_two_lists}")
    logging.debug("assign variable last_iterated_value to be None...") 
    last_iterated_value = None

    logging.debug("as we iterate over each list, we'll compare the values at index 0 of each list...")

    if sum_length_two_lists == 2:
        logging.debug("calculate median as average of two values - the one in each list")
        median = (nums1[0] + nums2[0])/2
    elif sum_length_two_lists % 2 == 0:
        logging.debug("sum_length_two_lists is an even number")
        logging.debug("assign middle_values_list to be empty python list...")
        middle_values_list = []
        logging.debug("create python set of lower-middle-index and upper-middle-index")
        set_middle_indices = {sum_length_two_lists // 2 - 1, sum_length_two_lists // 2}

        logging.debug("iterate over roughly half the indices in sum_length_two_lists...")
        # need +1 on right side b/c it's exclusive and we need to reach index at sum_length_two_lists // 2
        for combined_index in range(0, sum_length_two_lists // 2 + 1):
            logging.debug(f"combined_index: {combined_index}")
            nums1, nums2, last_iterated_value = element_comparisons_on_lists(nums1, nums2, last_iterated_value)

            logging.debug("check if we've reached index of index in list to calculate the median...")
            if combined_index in set_middle_indices:
                middle_values_list.append(last_iterated_value)
                logging.debug(f"middle_values_list: {middle_values_list}")  
        median = sum(middle_values_list)/2 
    # else:
    elif sum_length_two_lists % 2 != 0:
        logging.debug("iterate over roughly half the indices in sum_length_two_lists...")
        for combined_index in range(0, sum_length_two_lists // 2 + 1):
            logging.info(f"combined_index: {combined_index}")
            nums1, nums2, last_iterated_value = element_comparisons_on_lists(nums1, nums2, last_iterated_value)
        median = last_iterated_value
    logging.info(f"median: {median}")
    return median

Verify test cases. All pass.

In [10]:
assert find_median_of_two_sorted_arrays([1], [1, 2]) == 1
assert find_median_of_two_sorted_arrays([1], [1, 2, 3, 4]) == 2
assert find_median_of_two_sorted_arrays([1, 5], [0, 3, 6]) == 3
assert find_median_of_two_sorted_arrays([1], [2]) == 1.5
assert find_median_of_two_sorted_arrays([1], [2, 4, 6]) == 3
assert find_median_of_two_sorted_arrays([1, 3], [2, 4]) == 2.5