Largest Substring Without Repeating Characters (via Leetcode)
- March 6, 2020
- Key Terms: functions, loops, sets
This example was found on Leetcode.
In this tutorial, I'll walk extensively through my logic for solving this problem.
Problem statement:
Given a string of characters, find the length of the largest substring of unique consecutive characters.
Below are some example inputs and the expected output integer.
The function will be called largest_unique_substring()
and will take an input as a string and return an integer
Below are some commented out test cases we'll check after the function is created.
# assert largest_unique_substring("abcabcbb") == 3 # "abc" is largest substring of unique consecutive characters
# assert largest_unique_substring("bbbbbbbb") == 1 # "b" is largest substring of unique consecutive characters
# assert largest_unique_substring("badcajd") == 4 # "badc" is largest substring of unique consecutive characters
Import Modules¶
import logging
Setup format of logging messages.
logging.basicConfig(level=logging.DEBUG, format='%(asctime)-25s %(levelname)-7s %(lineno)-4s %(message)-8s')
My Setup Logic¶
I think I can calculate the solution in O(n) as I should only need to iterate over the length of elements in the input string, n
, once. As I iterate over the input string, I picture utilizing a storage of a moving window that holds the characters the largest running substring and can be reset to an empty string whenever I come across a non-unique element that already exists in the substring.
I need a variable that holds an integer for the length of the largest running substring. I'll call it largest_substring_length
.
As I iterate over the string, I'll create several substrings and continually compare its length to largest_substring_length
. I only want to create a bigger and bigger substring so long as it holds unique values. Ideally I want as quick lookup as possible to see if an element in input string exists in my running substring. In Python, an optimal data structure to hold unique elements and allow for O(1) time lookup of an element in the data structure is a set. I'll call the data structure set_values
.
Given these data structures, I want to write pseudocode to devise the logic for the solution.
# assign variable substring_set to be an empty set
# assign variable largest_substring_length to 0
# iterate over each element in the input string:
# if element exists in substring_set:
# this element's value already exists in substring_set so it shouldn't be added to substring_set
# assign variable substring_set to be an empty set (again)
# else:
# element does not exist in substring_set...
# add element to end of substring_set
# check if length of substring_set is greater than largest_substring_length...
# if length of substring_set is greater than largest_substring_length:
# assign largest_substring_length to be the length of substring_set
Coded Solution¶
def largest_unique_substring(input_string):
"""
param input_string: Python string of any length
return largest_substring_length: integer to represent length of characters in largest substring of unique consecutive values
"""
logging.debug("assign variable substring_set to be an empty set")
substring_set = set()
logging.debug("assign variable largest_substring_length to 0")
largest_substring_length = 0
logging.debug("iterate over each element in the input string...")
for element in input_string:
logging.info(f"element: {element}")
logging.debug("compare if element exists in substring_set...")
if element in substring_set:
logging.debug("this element's value already exists in substring_set so it shouldn't be added to substring_set...")
logging.debug("# assign variable substring_set to be an empty set (again)")
substring_set = set()
else:
logging.debug("element does not exist in substring_set...")
logging.debug("add element to end of substring_set")
substring_set.add(element)
logging.info(f"current substring_set: {substring_set}")
logging.debug("assign variable substring_set_length to be the length of substring_set")
substring_set_length = len(substring_set)
logging.info(f"current substring_set_length: {substring_set_length}")
logging.debug("check if length of substring_set is greater than largest_substring_length...")
if len(substring_set) > largest_substring_length:
logging.debug("assign largest_substring_length to be the length of substring_set")
largest_substring_length = substring_set_length
logging.info(f"current largest_substring_length: {largest_substring_length}")
return largest_substring_length
The function above can be written in 12 lines but I always recommend including logging messages to ensure the code works as you expect.
Test Cases¶
Let's validate the loggging messages with our first test case above.
assert largest_unique_substring("abcabcbb") == 3 # "abc" is largest substring of unique consecutive characters
Let's turn logging messages off and run the next two test cases.
logger = logging.getLogger()
logger.disabled = True
assert largest_unique_substring("bbbbbbbb") == 1 # "b" is largest substring of unique consecutive characters
assert largest_unique_substring("badcajd") == 4 # "badc" is largest substring of unique consecutive characters