Chapter 11: Segmenting a String - Word Break

After mastering problems that compare two strings, we return to single-sequence problems, but with a new twist. In this chapter, we'll tackle "Word Break," a classic problem that involves determining if a string can be segmented into a space-separated sequence of one or more dictionary words.

This problem (LeetCode #139) is a prime example of a decision-based DP problem, where the state dp[i] represents whether a certain condition (in this case, successful segmentation) is possible for the prefix of the string up to index i.

The Problem:

Given a string s and a dictionary of strings wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words.

Note that the same word in the dictionary may be reused multiple times in the segmentation.

For example:

  • s = "leetcode", wordDict = ["leet", "code"] -> The answer is true because "leetcode" can be segmented as "leet code".

  • s = "applepenapple", wordDict = ["apple", "pen"] -> The answer is true.

  • s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"] -> The answer is false.

Approach 1: The Recursive Solution

Let's define a recursive function, solve(i), that returns true if the suffix s[i:] can be segmented.

To implement this, we can iterate through all possible end points j for the first word in the suffix s[i:]. For each j, we check if the substring s[i:j] is in our dictionary. If it is, we then recursively call our function on the rest of the string, solve(j). If any of these recursive calls return true, it means we've found a valid segmentation for the entire suffix, and we can return true.

The base case is when i reaches the end of the string (i == len(s)), which signifies a successful segmentation.

from typing import List

def wordBreak_recursive(s: str, wordDict: List[str]) -> bool:
    word_set = set(wordDict) # Use a set for O(1) average time complexity lookups

    def solve(i):
        # Base case: we've successfully segmented the whole string
        if i == len(s):
            return True

        # Try every possible end point for the current word
        for j in range(i + 1, len(s) + 1):
            prefix = s[i:j]
            if prefix in word_set and solve(j):
                return True

        return False

    return solve(0)

# Trying it out
# print(wordBreak_recursive("leetcode", ["leet", "code"])) # true
# print(wordBreak_recursive("aaaaaaaaaaaaaaaaaaaaaab", ["a","aa","aaa"])) # Very slow

This brute-force approach explores every possible segmentation, leading to an exponential time complexity due to re-computing results for the same suffixes.

Approach 2: Memoization with a Decorator (Top-Down DP)

The state of our recursion is solely determined by the starting index i. This is a perfect scenario for memoization to avoid re-solving for the same suffix.

With memoization, each state solve(i) is computed only once. The time complexity becomes roughly O(n^2), where n is the length of the string, because for each starting index i, the inner loop can run up to n times.

Approach 3: Iteration (Bottom-Up DP)

For the bottom-up approach, we'll use a boolean array dp of size n+1. dp[i] will be true if the prefix s[:i] can be segmented.

  1. Create a dp array of size n+1 and initialize all elements to false.

  2. Set dp[0] = true. This is our base case, representing that an empty prefix can always be segmented.

  3. Iterate i from 1 to n. This is the length of the prefix we are trying to validate.

  4. Inside, iterate j from 0 to i-1. j represents a potential split point.

  5. Check two conditions:

    • Is dp[j] true? (Meaning, can the string up to j be segmented?)

    • Is the substring s[j:i] in our dictionary?

  6. If both are true, it means we can extend the valid segmentation at j with a new word (s[j:i]) to form a valid segmentation for the prefix of length i. So, we set dp[i] = true and can break the inner loop (since we just need one valid segmentation).

  7. The final answer is dp[n].

This bottom-up approach is clean, has the same O(n^2) time complexity as the memoized solution, and is often preferred in interviews for its explicit, non-recursive structure.

Conclusion

The "Word Break" problem introduced a new way to think about 1D DP. Instead of calculating an optimal value (like a sum or count), we calculated a boolean possibility. The core idea of dp[i] relying on previous valid states (dp[j] where j < i) remains the same. This pattern is fundamental for solving many parsing and segmentation problems where you need to determine if an input conforms to a set of rules.

Last updated