Trie optimization for Word Break I and II

Isaac_Haseley · March 18, 2024, 11:28pm

For both Word Break I and II, our inner loops have bodies that perform polynomial time checks to see whether any words in our dictionary match any of O(n) possible substrings. What if we loaded the dictionary into a trie, and the inner loop traversed the trie for each remaining char in s?

Course: Grokking Coding Interview Patterns in Python - Learn Interactively
Lesson: Word Break II - Grokking Coding Interview Patterns in Python

Dian_Us_Suqlain · April 15, 2024, 5:55am

Hi @Isaac_Haseley,

I’m thinking that having a dictionary loaded onto a trie can be done in the following way for word break 1 and 2:

For “Word Break I” instead of checking if each substring exists in a set (which involves creating many substring objects and potentially large hash table lookups), we can walk through the trie from the start of s for as long as possible. We can also use the dp array to mark valid endpoints of words found in the trie.

And for “Word Break II” similarly, the trie can be used to traverse through each substring, and when a leaf node (end of a valid word) is reached, we can recurse or iterate to check further substrings. This way, we construct only valid sentences from words existing in the trie, thus reducing unnecessary checks and concatenations.

I’m not sure about the complexity analysis, but the approach of loading the dictionary into a trie might allow us for rapid prefix matching during traversal in O(k) time complexity per character, where k is the length of the query string. In this way, we can effectively eliminate the need for polynomial time checks, enabling quick identification of valid word segments and resulting in notable runtime performance improvements for both problems.

Isaac_Haseley · April 17, 2024, 1:51am

Hey @Dian_Us_Suqlain,

Yay! Here’s an approach for Word Break I:

class TrieNode:
    def __init__(self):
        self.children = {}
        self.end_of_word = False

def insert_to_trie(word, root):
    current = root
    for letter in word:
        current = current.children.setdefault(letter, TrieNode())
    current.end_of_word = True

def build_trie(words):
    root = TrieNode()
    for word in words:
        insert_to_trie(word, root)
    return root

def search_in_trie(node, s, index, dp):
    current = node
    for i in range(index, len(s)):
        if s[i] not in current.children:
            return False
        current = current.children[s[i]]
        if current.end_of_word:
            dp[i+1] = True

def word_break(s, word_dict):
    root = build_trie(word_dict)
    dp = [False] * (len(s) + 1)
    dp[0] = True

    for i in range(len(s)):
        if dp[i]:
            search_in_trie(root, s, i, dp):
    return dp[-1]

Time: O(nk), where k is the length of the longest word in the dictionary
Space: O(n+w), where w is the total number of characters in the dictionary