Chapter 11: Segmenting a String - Word Break
After mastering problems that compare two strings, we return to single-sequence problems, but with a new twist. In this chapter, we'll tackle "Word Break," a classic problem that involves determining if a string can be segmented into a space-separated sequence of one or more dictionary words.
This problem (LeetCode #139) is a prime example of a decision-based DP problem, where the state dp[i] represents whether a certain condition (in this case, successful segmentation) is possible for the prefix of the string up to index i.
The Problem:
Given a string s and a dictionary of strings wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words.
Note that the same word in the dictionary may be reused multiple times in the segmentation.
For example:
s = "leetcode",wordDict = ["leet", "code"]-> The answer istruebecause "leetcode" can be segmented as "leet code".s = "applepenapple",wordDict = ["apple", "pen"]-> The answer istrue.s = "catsandog",wordDict = ["cats", "dog", "sand", "and", "cat"]-> The answer isfalse.
Approach 1: The Recursive Solution
Let's define a recursive function, solve(i), that returns true if the suffix s[i:] can be segmented.
To implement this, we can iterate through all possible end points j for the first word in the suffix s[i:]. For each j, we check if the substring s[i:j] is in our dictionary. If it is, we then recursively call our function on the rest of the string, solve(j). If any of these recursive calls return true, it means we've found a valid segmentation for the entire suffix, and we can return true.
The base case is when i reaches the end of the string (i == len(s)), which signifies a successful segmentation.
from typing import List
def wordBreak_recursive(s: str, wordDict: List[str]) -> bool:
word_set = set(wordDict) # Use a set for O(1) average time complexity lookups
def solve(i):
# Base case: we've successfully segmented the whole string
if i == len(s):
return True
# Try every possible end point for the current word
for j in range(i + 1, len(s) + 1):
prefix = s[i:j]
if prefix in word_set and solve(j):
return True
return False
return solve(0)
# Trying it out
# print(wordBreak_recursive("leetcode", ["leet", "code"])) # true
# print(wordBreak_recursive("aaaaaaaaaaaaaaaaaaaaaab", ["a","aa","aaa"])) # Very slow
This brute-force approach explores every possible segmentation, leading to an exponential time complexity due to re-computing results for the same suffixes.
Approach 2: Memoization with a Decorator (Top-Down DP)
The state of our recursion is solely determined by the starting index i. This is a perfect scenario for memoization to avoid re-solving for the same suffix.
With memoization, each state solve(i) is computed only once. The time complexity becomes roughly O(n^2), where n is the length of the string, because for each starting index i, the inner loop can run up to n times.
Approach 3: Iteration (Bottom-Up DP)
For the bottom-up approach, we'll use a boolean array dp of size n+1. dp[i] will be true if the prefix s[:i] can be segmented.
Create a
dparray of sizen+1and initialize all elements tofalse.Set
dp[0] = true. This is our base case, representing that an empty prefix can always be segmented.Iterate
ifrom 1 ton. This is the length of the prefix we are trying to validate.Inside, iterate
jfrom 0 toi-1.jrepresents a potential split point.Check two conditions:
Is
dp[j]true? (Meaning, can the string up tojbe segmented?)Is the substring
s[j:i]in our dictionary?
If both are true, it means we can extend the valid segmentation at
jwith a new word (s[j:i]) to form a valid segmentation for the prefix of lengthi. So, we setdp[i] = trueand canbreakthe inner loop (since we just need one valid segmentation).The final answer is
dp[n].
This bottom-up approach is clean, has the same O(n^2) time complexity as the memoized solution, and is often preferred in interviews for its explicit, non-recursive structure.
Conclusion
The "Word Break" problem introduced a new way to think about 1D DP. Instead of calculating an optimal value (like a sum or count), we calculated a boolean possibility. The core idea of dp[i] relying on previous valid states (dp[j] where j < i) remains the same. This pattern is fundamental for solving many parsing and segmentation problems where you need to determine if an input conforms to a set of rules.
Last updated