After mastering problems that compare two strings, we return to single-sequence problems, but with a new twist. In this chapter, we'll tackle "Word Break," a classic problem that involves determining if a string can be segmented into a space-separated sequence of one or more dictionary words.
This problem (LeetCode #139) is a prime example of a decision-based DP problem, where the state dp[i] represents whether a certain condition (in this case, successful segmentation) is possible for the prefix of the string up to index i.
The Problem:
Given a string s and a dictionary of strings wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words.
Note that the same word in the dictionary may be reused multiple times in the segmentation.
For example:
s = "leetcode", wordDict = ["leet", "code"] -> The answer is true because "leetcode" can be segmented as "leet code".
s = "applepenapple", wordDict = ["apple", "pen"] -> The answer is true.
s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"] -> The answer is false.
Approach 1: The Recursive Solution
Let's define a recursive function, solve(i), that returns true if the suffix s[i:] can be segmented.
To implement this, we can iterate through all possible end points j for the first word in the suffix s[i:]. For each j, we check if the substring s[i:j] is in our dictionary. If it is, we then recursively call our function on the rest of the string, solve(j). If any of these recursive calls return true, it means we've found a valid segmentation for the entire suffix, and we can return true.
The base case is when i reaches the end of the string (i == len(s)), which signifies a successful segmentation.
from typing import List
def wordBreak_recursive(s: str, wordDict: List[str]) -> bool:
word_set = set(wordDict) # Use a set for O(1) average time complexity lookups
def solve(i):
# Base case: we've successfully segmented the whole string
if i == len(s):
return True
# Try every possible end point for the current word
for j in range(i + 1, len(s) + 1):
prefix = s[i:j]
if prefix in word_set and solve(j):
return True
return False
return solve(0)
# Trying it out
# print(wordBreak_recursive("leetcode", ["leet", "code"])) # true
# print(wordBreak_recursive("aaaaaaaaaaaaaaaaaaaaaab", ["a","aa","aaa"])) # Very slow
This brute-force approach explores every possible segmentation, leading to an exponential time complexity due to re-computing results for the same suffixes.
Approach 2: Memoization with a Decorator (Top-Down DP)
The state of our recursion is solely determined by the starting index i. This is a perfect scenario for memoization to avoid re-solving for the same suffix.
With memoization, each state solve(i) is computed only once. The time complexity becomes roughly O(n^2), where n is the length of the string, because for each starting index i, the inner loop can run up to n times.
Approach 3: Iteration (Bottom-Up DP)
For the bottom-up approach, we'll use a boolean array dp of size n+1. dp[i] will be true if the prefix s[:i] can be segmented.
Create a dp array of size n+1 and initialize all elements to false.
Set dp[0] = true. This is our base case, representing that an empty prefix can always be segmented.
Iterate i from 1 to n. This is the length of the prefix we are trying to validate.
Inside, iterate j from 0 to i-1. j represents a potential split point.
Check two conditions:
Is dp[j] true? (Meaning, can the string up to j be segmented?)
Is the substring s[j:i] in our dictionary?
If both are true, it means we can extend the valid segmentation at j with a new word (s[j:i]) to form a valid segmentation for the prefix of length i. So, we set dp[i] = true and can break the inner loop (since we just need one valid segmentation).
The final answer is dp[n].
This bottom-up approach is clean, has the same O(n^2) time complexity as the memoized solution, and is often preferred in interviews for its explicit, non-recursive structure.
Conclusion
The "Word Break" problem introduced a new way to think about 1D DP. Instead of calculating an optimal value (like a sum or count), we calculated a boolean possibility. The core idea of dp[i] relying on previous valid states (dp[j] where j < i) remains the same. This pattern is fundamental for solving many parsing and segmentation problems where you need to determine if an input conforms to a set of rules.
import functools
from typing import List
class Solution:
def wordBreak_memoized(self, s: str, wordDict: List[str]) -> bool:
word_set = set(wordDict)
@functools.lru_cache(maxsize=None)
def solve(i):
if i == len(s):
return True
for j in range(i + 1, len(s) + 1):
if s[i:j] in word_set and solve(j):
return True
return False
return solve(0)
# Trying it out
# solver = Solution()
# print(solver.wordBreak_memoized("aaaaaaaaaaaaaaaaaaaaaab", ["a","aa","aaa"])) # false, and fast!
def wordBreak_iterative(s: str, wordDict: List[str]) -> bool:
word_set = set(wordDict)
n = len(s)
dp = [False] * (n + 1)
dp[0] = True # Base case for empty prefix
for i in range(1, n + 1):
for j in range(i):
# If s[:j] can be segmented AND s[j:i] is a valid word...
if dp[j] and s[j:i] in word_set:
# ...then s[:i] can also be segmented.
dp[i] = True
break # Found a way, no need to check other j's for this i
return dp[n]
# Trying it out
# print(wordBreak_iterative("applepenapple", ["apple", "pen"])) # true