Chapter 11: Segmenting a String - Word Break

After mastering problems that compare two strings, we return to single-sequence problems, but with a new twist. In this chapter, we'll tackle "Word Break," a classic problem that involves determining if a string can be segmented into a space-separated sequence of one or more dictionary words.

This problem (LeetCode #139) is a prime example of a decision-based DP problem, where the state dp[i] represents whether a certain condition (in this case, successful segmentation) is possible for the prefix of the string up to index i.

The Problem:

Given a string s and a dictionary of strings wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words.

Note that the same word in the dictionary may be reused multiple times in the segmentation.

For example:

s = "leetcode", wordDict = ["leet", "code"] -> The answer is true because "leetcode" can be segmented as "leet code".
s = "applepenapple", wordDict = ["apple", "pen"] -> The answer is true.
s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"] -> The answer is false.

Approach 1: The Recursive Solution

Let's define a recursive function, solve(i), that returns true if the suffix s[i:] can be segmented.

To implement this, we can iterate through all possible end points j for the first word in the suffix s[i:]. For each j, we check if the substring s[i:j] is in our dictionary. If it is, we then recursively call our function on the rest of the string, solve(j). If any of these recursive calls return true, it means we've found a valid segmentation for the entire suffix, and we can return true.

The base case is when i reaches the end of the string (i == len(s)), which signifies a successful segmentation.

from typing import List

def wordBreak_recursive(s: str, wordDict: List[str]) -> bool:
    word_set = set(wordDict) # Use a set for O(1) average time complexity lookups

    def solve(i):
        # Base case: we've successfully segmented the whole string
        if i == len(s):
            return True

        # Try every possible end point for the current word
        for j in range(i + 1, len(s) + 1):
            prefix = s[i:j]
            if prefix in word_set and solve(j):
                return True

        return False

    return solve(0)

# Trying it out
# print(wordBreak_recursive("leetcode", ["leet", "code"])) # true
# print(wordBreak_recursive("aaaaaaaaaaaaaaaaaaaaaab", ["a","aa","aaa"])) # Very slow

This brute-force approach explores every possible segmentation, leading to an exponential time complexity due to re-computing results for the same suffixes.

Approach 2: Memoization with a Decorator (Top-Down DP)

The state of our recursion is solely determined by the starting index i. This is a perfect scenario for memoization to avoid re-solving for the same suffix.

import functools
from typing import List

class Solution:
    def wordBreak_memoized(self, s: str, wordDict: List[str]) -> bool:
        word_set = set(wordDict)

        @functools.lru_cache(maxsize=None)
        def solve(i):
            if i == len(s):
                return True

            for j in range(i + 1, len(s) + 1):
                if s[i:j] in word_set and solve(j):
                    return True

            return False

        return solve(0)

# Trying it out
# solver = Solution()
# print(solver.wordBreak_memoized("aaaaaaaaaaaaaaaaaaaaaab", ["a","aa","aaa"])) # false, and fast!

With memoization, each state solve(i) is computed only once. The time complexity becomes roughly O(n^2), where n is the length of the string, because for each starting index i, the inner loop can run up to n times.

Approach 3: Iteration (Bottom-Up DP)

For the bottom-up approach, we'll use a boolean array dp of size n+1. dp[i] will be true if the prefix s[:i] can be segmented.

Create a dp array of size n+1 and initialize all elements to false.
Set dp[0] = true. This is our base case, representing that an empty prefix can always be segmented.
Iterate i from 1 to n. This is the length of the prefix we are trying to validate.
Inside, iterate j from 0 to i-1. j represents a potential split point.
Check two conditions:
- Is dp[j] true? (Meaning, can the string up to j be segmented?)
- Is the substring s[j:i] in our dictionary?
If both are true, it means we can extend the valid segmentation at j with a new word (s[j:i]) to form a valid segmentation for the prefix of length i. So, we set dp[i] = true and can break the inner loop (since we just need one valid segmentation).
The final answer is dp[n].

def wordBreak_iterative(s: str, wordDict: List[str]) -> bool:
    word_set = set(wordDict)
    n = len(s)
    dp = [False] * (n + 1)
    dp[0] = True  # Base case for empty prefix

    for i in range(1, n + 1):
        for j in range(i):
            # If s[:j] can be segmented AND s[j:i] is a valid word...
            if dp[j] and s[j:i] in word_set:
                # ...then s[:i] can also be segmented.
                dp[i] = True
                break # Found a way, no need to check other j's for this i

    return dp[n]

# Trying it out
# print(wordBreak_iterative("applepenapple", ["apple", "pen"])) # true

This bottom-up approach is clean, has the same O(n^2) time complexity as the memoized solution, and is often preferred in interviews for its explicit, non-recursive structure.

Conclusion

The "Word Break" problem introduced a new way to think about 1D DP. Instead of calculating an optimal value (like a sum or count), we calculated a boolean possibility. The core idea of dp[i] relying on previous valid states (dp[j] where j < i) remains the same. This pattern is fundamental for solving many parsing and segmentation problems where you need to determine if an input conforms to a set of rules.

PreviousChapter 10: Transforming One String to Another - Edit Distance NextChapter 12: Reconstructing All Solutions - Word Break II

Last updated 1 month ago