Python Beginner Algorithms Tutorial

Largest Substring Without Repeating Characters (via Leetcode)

This example was found on Leetcode.

In this tutorial, I'll walk extensively through my logic for solving this problem.

Problem statement:

Given a string of characters, find the length of the largest substring of unique consecutive characters.

Below are some example inputs and the expected output integer.

The function will be called largest_unique_substring() and will take an input as a string and return an integer

Below are some commented out test cases we'll check after the function is created.

In [3]:
# assert largest_unique_substring("abcabcbb") == 3  # "abc" is largest substring of unique consecutive characters
# assert largest_unique_substring("bbbbbbbb") == 1  # "b" is largest substring of unique consecutive characters
# assert largest_unique_substring("badcajd") == 4  # "badc" is largest substring of unique consecutive characters 

Import Modules

In [1]:
import logging

Setup format of logging messages.

In [2]:
logging.basicConfig(level=logging.DEBUG, format='%(asctime)-25s %(levelname)-7s %(lineno)-4s %(message)-8s')                      

My Setup Logic

I think I can calculate the solution in O(n) as I should only need to iterate over the length of elements in the input string, n, once. As I iterate over the input string, I picture utilizing a storage of a moving window that holds the characters the largest running substring and can be reset to an empty string whenever I come across a non-unique element that already exists in the substring.

I need a variable that holds an integer for the length of the largest running substring. I'll call it largest_substring_length.

As I iterate over the string, I'll create several substrings and continually compare its length to largest_substring_length. I only want to create a bigger and bigger substring so long as it holds unique values. Ideally I want as quick lookup as possible to see if an element in input string exists in my running substring. In Python, an optimal data structure to hold unique elements and allow for O(1) time lookup of an element in the data structure is a set. I'll call the data structure set_values.

Given these data structures, I want to write pseudocode to devise the logic for the solution.

In [4]:
# assign variable substring_set to be an empty set
# assign variable largest_substring_length to 0

# iterate over each element in the input string:
    # if element exists in substring_set:
        # this element's value already exists in substring_set so it shouldn't be added to substring_set
        # assign variable substring_set to be an empty set (again)
    # else:
        # element does not exist in substring_set...
        # add element to end of substring_set
        # check if length of substring_set is greater than largest_substring_length...
        # if length of substring_set is greater than largest_substring_length:
            # assign largest_substring_length to be the length of substring_set

Coded Solution

In [5]:
def largest_unique_substring(input_string):
    """
    param input_string: Python string of any length
    return largest_substring_length: integer to represent length of characters in largest substring of unique consecutive values
    """
    logging.debug("assign variable substring_set to be an empty set")
    substring_set = set()

    logging.debug("assign variable largest_substring_length to 0")
    largest_substring_length = 0 

    logging.debug("iterate over each element in the input string...")
    for element in input_string:
        logging.info(f"element: {element}")
        
        logging.debug("compare if element exists in substring_set...")
        if element in substring_set:
            logging.debug("this element's value already exists in substring_set so it shouldn't be added to substring_set...")
            logging.debug("# assign variable substring_set to be an empty set (again)")
            substring_set = set()
        else:
            logging.debug("element does not exist in substring_set...")
            logging.debug("add element to end of substring_set")
            substring_set.add(element)
            logging.info(f"current substring_set: {substring_set}")
            
            logging.debug("assign variable substring_set_length to be the length of substring_set")
            substring_set_length = len(substring_set)
            logging.info(f"current substring_set_length: {substring_set_length}")
            
            logging.debug("check if length of substring_set is greater than largest_substring_length...")
            if len(substring_set) > largest_substring_length:
                logging.debug("assign largest_substring_length to be the length of substring_set")
                largest_substring_length = substring_set_length
                logging.info(f"current largest_substring_length: {largest_substring_length}")
    return largest_substring_length

The function above can be written in 12 lines but I always recommend including logging messages to ensure the code works as you expect.

Test Cases

Let's validate the loggging messages with our first test case above.

In [6]:
assert largest_unique_substring("abcabcbb") == 3  # "abc" is largest substring of unique consecutive characters
2020-03-06 18:02:07,154   DEBUG   6    assign variable substring_set to be an empty set
2020-03-06 18:02:07,157   DEBUG   9    assign variable largest_substring_length to 0
2020-03-06 18:02:07,159   DEBUG   12   iterate over each element in the input string...
2020-03-06 18:02:07,160   INFO    14   element: a
2020-03-06 18:02:07,162   DEBUG   16   compare if element exists in substring_set...
2020-03-06 18:02:07,163   DEBUG   22   element does not exist in substring_set...
2020-03-06 18:02:07,164   DEBUG   23   add element to end of substring_set
2020-03-06 18:02:07,165   INFO    25   current substring_set: {'a'}
2020-03-06 18:02:07,166   DEBUG   27   assign variable substring_set_length to be the length of substring_set
2020-03-06 18:02:07,167   INFO    29   current substring_set_length: 1
2020-03-06 18:02:07,167   DEBUG   31   check if length of substring_set is greater than largest_substring_length...
2020-03-06 18:02:07,168   DEBUG   33   assign largest_substring_length to be the length of substring_set
2020-03-06 18:02:07,170   INFO    35   current largest_substring_length: 1
2020-03-06 18:02:07,172   INFO    14   element: b
2020-03-06 18:02:07,173   DEBUG   16   compare if element exists in substring_set...
2020-03-06 18:02:07,175   DEBUG   22   element does not exist in substring_set...
2020-03-06 18:02:07,176   DEBUG   23   add element to end of substring_set
2020-03-06 18:02:07,177   INFO    25   current substring_set: {'b', 'a'}
2020-03-06 18:02:07,178   DEBUG   27   assign variable substring_set_length to be the length of substring_set
2020-03-06 18:02:07,179   INFO    29   current substring_set_length: 2
2020-03-06 18:02:07,180   DEBUG   31   check if length of substring_set is greater than largest_substring_length...
2020-03-06 18:02:07,181   DEBUG   33   assign largest_substring_length to be the length of substring_set
2020-03-06 18:02:07,182   INFO    35   current largest_substring_length: 2
2020-03-06 18:02:07,182   INFO    14   element: c
2020-03-06 18:02:07,184   DEBUG   16   compare if element exists in substring_set...
2020-03-06 18:02:07,184   DEBUG   22   element does not exist in substring_set...
2020-03-06 18:02:07,185   DEBUG   23   add element to end of substring_set
2020-03-06 18:02:07,187   INFO    25   current substring_set: {'c', 'b', 'a'}
2020-03-06 18:02:07,189   DEBUG   27   assign variable substring_set_length to be the length of substring_set
2020-03-06 18:02:07,192   INFO    29   current substring_set_length: 3
2020-03-06 18:02:07,193   DEBUG   31   check if length of substring_set is greater than largest_substring_length...
2020-03-06 18:02:07,195   DEBUG   33   assign largest_substring_length to be the length of substring_set
2020-03-06 18:02:07,196   INFO    35   current largest_substring_length: 3
2020-03-06 18:02:07,196   INFO    14   element: a
2020-03-06 18:02:07,197   DEBUG   16   compare if element exists in substring_set...
2020-03-06 18:02:07,198   DEBUG   18   this element's value already exists in substring_set so it shouldn't be added to substring_set...
2020-03-06 18:02:07,199   DEBUG   19   # assign variable substring_set to be an empty set (again)
2020-03-06 18:02:07,201   INFO    14   element: b
2020-03-06 18:02:07,204   DEBUG   16   compare if element exists in substring_set...
2020-03-06 18:02:07,207   DEBUG   22   element does not exist in substring_set...
2020-03-06 18:02:07,209   DEBUG   23   add element to end of substring_set
2020-03-06 18:02:07,210   INFO    25   current substring_set: {'b'}
2020-03-06 18:02:07,212   DEBUG   27   assign variable substring_set_length to be the length of substring_set
2020-03-06 18:02:07,213   INFO    29   current substring_set_length: 1
2020-03-06 18:02:07,215   DEBUG   31   check if length of substring_set is greater than largest_substring_length...
2020-03-06 18:02:07,216   INFO    14   element: c
2020-03-06 18:02:07,218   DEBUG   16   compare if element exists in substring_set...
2020-03-06 18:02:07,219   DEBUG   22   element does not exist in substring_set...
2020-03-06 18:02:07,220   DEBUG   23   add element to end of substring_set
2020-03-06 18:02:07,222   INFO    25   current substring_set: {'b', 'c'}
2020-03-06 18:02:07,223   DEBUG   27   assign variable substring_set_length to be the length of substring_set
2020-03-06 18:02:07,225   INFO    29   current substring_set_length: 2
2020-03-06 18:02:07,226   DEBUG   31   check if length of substring_set is greater than largest_substring_length...
2020-03-06 18:02:07,228   INFO    14   element: b
2020-03-06 18:02:07,229   DEBUG   16   compare if element exists in substring_set...
2020-03-06 18:02:07,230   DEBUG   18   this element's value already exists in substring_set so it shouldn't be added to substring_set...
2020-03-06 18:02:07,231   DEBUG   19   # assign variable substring_set to be an empty set (again)
2020-03-06 18:02:07,231   INFO    14   element: b
2020-03-06 18:02:07,232   DEBUG   16   compare if element exists in substring_set...
2020-03-06 18:02:07,233   DEBUG   22   element does not exist in substring_set...
2020-03-06 18:02:07,233   DEBUG   23   add element to end of substring_set
2020-03-06 18:02:07,234   INFO    25   current substring_set: {'b'}
2020-03-06 18:02:07,235   DEBUG   27   assign variable substring_set_length to be the length of substring_set
2020-03-06 18:02:07,236   INFO    29   current substring_set_length: 1
2020-03-06 18:02:07,238   DEBUG   31   check if length of substring_set is greater than largest_substring_length...

Let's turn logging messages off and run the next two test cases.

In [7]:
logger = logging.getLogger()
logger.disabled = True
In [8]:
assert largest_unique_substring("bbbbbbbb") == 1  # "b" is largest substring of unique consecutive characters
assert largest_unique_substring("badcajd") == 4  # "badc" is largest substring of unique consecutive characters