Largest Substring Without Repeating Characters (via Leetcode)¶
Date published: 2020-03-06
Category: Python
Subcategory: Beginner Algorithms
Tags: functions, loops, sets
This example was found on Leetcode.
In this tutorial, I'll walk extensively through my logic for solving this problem.
Problem statement:
Given a string of characters, find the length of the largest substring of unique consecutive characters.
Below are some example inputs and the expected output integer.
The function will be called largest_unique_substring()
and will take an input as a string and return an integer
Below are some commented out test cases we'll check after the function is created.
# assert largest_unique_substring("abcabcbb") == 3 # "abc" is largest substring of unique consecutive characters
# assert largest_unique_substring("bbbbbbbb") == 1 # "b" is largest substring of unique consecutive characters
# assert largest_unique_substring("badcajd") == 4 # "badc" is largest substring of unique consecutive characters
Import Modules¶
import logging
Setup format of logging messages.
logging.basicConfig(level=logging.DEBUG, format='%(asctime)-25s %(levelname)-7s %(lineno)-4s %(message)-8s')
My Setup Logic¶
I think I can calculate the solution in O(n) as I should only need to iterate over the length of elements in the input string, n
, once. As I iterate over the input string, I picture utilizing a storage of a moving window that holds the characters the largest running substring and can be reset to an empty string whenever I come across a non-unique element that already exists in the substring.
I need a variable that holds an integer for the length of the largest running substring. I'll call it largest_substring_length
.
As I iterate over the string, I'll create several substrings and continually compare its length to largest_substring_length
. I only want to create a bigger and bigger substring so long as it holds unique values. Ideally I want as quick lookup as possible to see if an element in input string exists in my running substring. In Python, an optimal data structure to hold unique elements and allow for O(1) time lookup of an element in the data structure is a set. I'll call the data structure set_values
.
Given these data structures, I want to write pseudocode to devise the logic for the solution.
# assign variable substring_set to be an empty set
# assign variable largest_substring_length to 0
# iterate over each element in the input string:
# if element exists in substring_set:
# this element's value already exists in substring_set so it shouldn't be added to substring_set
# assign variable substring_set to be an empty set (again)
# else:
# element does not exist in substring_set...
# add element to end of substring_set
# check if length of substring_set is greater than largest_substring_length...
# if length of substring_set is greater than largest_substring_length:
# assign largest_substring_length to be the length of substring_set
Coded Solution¶
def largest_unique_substring(input_string):
"""
param input_string: Python string of any length
return largest_substring_length: integer to represent length of characters in largest substring of unique consecutive values
"""
logging.debug("assign variable substring_set to be an empty set")
substring_set = set()
logging.debug("assign variable largest_substring_length to 0")
largest_substring_length = 0
logging.debug("iterate over each element in the input string...")
for element in input_string:
logging.info(f"element: {element}")
logging.debug("compare if element exists in substring_set...")
if element in substring_set:
logging.debug("this element's value already exists in substring_set so it shouldn't be added to substring_set...")
logging.debug("# assign variable substring_set to be an empty set (again)")
substring_set = set()
else:
logging.debug("element does not exist in substring_set...")
logging.debug("add element to end of substring_set")
substring_set.add(element)
logging.info(f"current substring_set: {substring_set}")
logging.debug("assign variable substring_set_length to be the length of substring_set")
substring_set_length = len(substring_set)
logging.info(f"current substring_set_length: {substring_set_length}")
logging.debug("check if length of substring_set is greater than largest_substring_length...")
if len(substring_set) > largest_substring_length:
logging.debug("assign largest_substring_length to be the length of substring_set")
largest_substring_length = substring_set_length
logging.info(f"current largest_substring_length: {largest_substring_length}")
return largest_substring_length
The function above can be written in 12 lines but I always recommend including logging messages to ensure the code works as you expect.
Test Cases¶
Let's validate the loggging messages with our first test case above.
assert largest_unique_substring("abcabcbb") == 3 # "abc" is largest substring of unique consecutive characters
2020-03-06 18:02:07,154 DEBUG 6 assign variable substring_set to be an empty set 2020-03-06 18:02:07,157 DEBUG 9 assign variable largest_substring_length to 0 2020-03-06 18:02:07,159 DEBUG 12 iterate over each element in the input string... 2020-03-06 18:02:07,160 INFO 14 element: a 2020-03-06 18:02:07,162 DEBUG 16 compare if element exists in substring_set... 2020-03-06 18:02:07,163 DEBUG 22 element does not exist in substring_set... 2020-03-06 18:02:07,164 DEBUG 23 add element to end of substring_set 2020-03-06 18:02:07,165 INFO 25 current substring_set: {'a'} 2020-03-06 18:02:07,166 DEBUG 27 assign variable substring_set_length to be the length of substring_set 2020-03-06 18:02:07,167 INFO 29 current substring_set_length: 1 2020-03-06 18:02:07,167 DEBUG 31 check if length of substring_set is greater than largest_substring_length... 2020-03-06 18:02:07,168 DEBUG 33 assign largest_substring_length to be the length of substring_set 2020-03-06 18:02:07,170 INFO 35 current largest_substring_length: 1 2020-03-06 18:02:07,172 INFO 14 element: b 2020-03-06 18:02:07,173 DEBUG 16 compare if element exists in substring_set... 2020-03-06 18:02:07,175 DEBUG 22 element does not exist in substring_set... 2020-03-06 18:02:07,176 DEBUG 23 add element to end of substring_set 2020-03-06 18:02:07,177 INFO 25 current substring_set: {'b', 'a'} 2020-03-06 18:02:07,178 DEBUG 27 assign variable substring_set_length to be the length of substring_set 2020-03-06 18:02:07,179 INFO 29 current substring_set_length: 2 2020-03-06 18:02:07,180 DEBUG 31 check if length of substring_set is greater than largest_substring_length... 2020-03-06 18:02:07,181 DEBUG 33 assign largest_substring_length to be the length of substring_set 2020-03-06 18:02:07,182 INFO 35 current largest_substring_length: 2 2020-03-06 18:02:07,182 INFO 14 element: c 2020-03-06 18:02:07,184 DEBUG 16 compare if element exists in substring_set... 2020-03-06 18:02:07,184 DEBUG 22 element does not exist in substring_set... 2020-03-06 18:02:07,185 DEBUG 23 add element to end of substring_set 2020-03-06 18:02:07,187 INFO 25 current substring_set: {'c', 'b', 'a'} 2020-03-06 18:02:07,189 DEBUG 27 assign variable substring_set_length to be the length of substring_set 2020-03-06 18:02:07,192 INFO 29 current substring_set_length: 3 2020-03-06 18:02:07,193 DEBUG 31 check if length of substring_set is greater than largest_substring_length... 2020-03-06 18:02:07,195 DEBUG 33 assign largest_substring_length to be the length of substring_set 2020-03-06 18:02:07,196 INFO 35 current largest_substring_length: 3 2020-03-06 18:02:07,196 INFO 14 element: a 2020-03-06 18:02:07,197 DEBUG 16 compare if element exists in substring_set... 2020-03-06 18:02:07,198 DEBUG 18 this element's value already exists in substring_set so it shouldn't be added to substring_set... 2020-03-06 18:02:07,199 DEBUG 19 # assign variable substring_set to be an empty set (again) 2020-03-06 18:02:07,201 INFO 14 element: b 2020-03-06 18:02:07,204 DEBUG 16 compare if element exists in substring_set... 2020-03-06 18:02:07,207 DEBUG 22 element does not exist in substring_set... 2020-03-06 18:02:07,209 DEBUG 23 add element to end of substring_set 2020-03-06 18:02:07,210 INFO 25 current substring_set: {'b'} 2020-03-06 18:02:07,212 DEBUG 27 assign variable substring_set_length to be the length of substring_set 2020-03-06 18:02:07,213 INFO 29 current substring_set_length: 1 2020-03-06 18:02:07,215 DEBUG 31 check if length of substring_set is greater than largest_substring_length... 2020-03-06 18:02:07,216 INFO 14 element: c 2020-03-06 18:02:07,218 DEBUG 16 compare if element exists in substring_set... 2020-03-06 18:02:07,219 DEBUG 22 element does not exist in substring_set... 2020-03-06 18:02:07,220 DEBUG 23 add element to end of substring_set 2020-03-06 18:02:07,222 INFO 25 current substring_set: {'b', 'c'} 2020-03-06 18:02:07,223 DEBUG 27 assign variable substring_set_length to be the length of substring_set 2020-03-06 18:02:07,225 INFO 29 current substring_set_length: 2 2020-03-06 18:02:07,226 DEBUG 31 check if length of substring_set is greater than largest_substring_length... 2020-03-06 18:02:07,228 INFO 14 element: b 2020-03-06 18:02:07,229 DEBUG 16 compare if element exists in substring_set... 2020-03-06 18:02:07,230 DEBUG 18 this element's value already exists in substring_set so it shouldn't be added to substring_set... 2020-03-06 18:02:07,231 DEBUG 19 # assign variable substring_set to be an empty set (again) 2020-03-06 18:02:07,231 INFO 14 element: b 2020-03-06 18:02:07,232 DEBUG 16 compare if element exists in substring_set... 2020-03-06 18:02:07,233 DEBUG 22 element does not exist in substring_set... 2020-03-06 18:02:07,233 DEBUG 23 add element to end of substring_set 2020-03-06 18:02:07,234 INFO 25 current substring_set: {'b'} 2020-03-06 18:02:07,235 DEBUG 27 assign variable substring_set_length to be the length of substring_set 2020-03-06 18:02:07,236 INFO 29 current substring_set_length: 1 2020-03-06 18:02:07,238 DEBUG 31 check if length of substring_set is greater than largest_substring_length...
Let's turn logging messages off and run the next two test cases.
logger = logging.getLogger()
logger.disabled = True
assert largest_unique_substring("bbbbbbbb") == 1 # "b" is largest substring of unique consecutive characters
assert largest_unique_substring("badcajd") == 4 # "badc" is largest substring of unique consecutive characters