Chapter 14: State-Machine DP - Best Time to Buy and Sell Stock with Cooldown

So far, our DP states have primarily been defined by an index or two in an array or string. Now, we'll explore problems where the state needs to capture more than just position—it needs to capture a status. These problems can be modeled as state machines, where each day (or step) involves transitioning from one status to another.

A classic example of this pattern is LeetCode's "Best Time to Buy and Sell Stock with Cooldown" (Problem #309). Here, we want to maximize profit, but our actions are constrained by a "cooldown" rule, making it a perfect candidate for a state-based DP approach.

The Problem:

You are given an array prices where prices[i] is the price of a given stock on the i-th day.

Find the maximum profit you can achieve. You may complete as many transactions as you like (i.e., buy one and sell one share of the stock multiple times) with the following restrictions:

  • You may not engage in multiple transactions simultaneously (i.e., you must sell the stock before you buy again).

  • After you sell your stock, you cannot buy stock on the next day (i.e., cooldown period is 1 day).

For example:

  • prices = [1, 2, 3, 0, 2]

  • Output: 3

  • Explanation: transactions = [buy, sell, cooldown, buy, sell] -> profit = (2-1) + (2-0) = 3

Approach 1: The Recursive Solution with State

For this problem, knowing the day i is not enough. We also need to know if we are currently holding a stock. So, let's define our state by (i, holding), where holding is a boolean.

Our function solve(i, holding) will return the maximum profit we can make from day i onwards.

  1. If holding is False (we can buy):

    • Option A: Buy. We buy the stock at prices[i]. Buying costs money, so we subtract prices[i] and add whatever we can make from day i+1 onwards in a holding = True state. So, -prices[i] + solve(i + 1, True).

    • Option B: Don't buy (rest). We do nothing. Our profit is whatever we can make from day i+1 onwards in a holding = False state. So, solve(i + 1, False).

    • We take the maximum of these two options.

  2. If holding is True (we can sell):

    • Option A: Sell. We sell the stock at prices[i]. We gain prices[i], plus whatever we can make from day i+2 onwards (skipping day i+1 due to the cooldown) in a holding = False state. So, prices[i] + solve(i + 2, False).

    • Option B: Don't sell (rest). We do nothing. Our profit is whatever we can make from day i+1 onwards in a holding = True state. So, solve(i + 1, True).

    • We take the maximum of these two options.

The base case is when i >= len(prices), at which point no more profit can be made, so we return 0. We start the process at day 0 in a holding = False state.
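These cases translate directly into a recursive function. The sketch below follows the structure described above; the name max_profit_recursive is chosen here for illustration. Without caching, this runs in exponential time, which the next approach fixes.

```python
def max_profit_recursive(prices):
    n = len(prices)

    def solve(i, holding):
        # Base case: past the last day, no more profit can be made.
        if i >= n:
            return 0
        if holding:
            # Sell today (cooldown skips day i + 1), or rest.
            return max(prices[i] + solve(i + 2, False),
                       solve(i + 1, True))
        # Buy today (paying prices[i]), or rest.
        return max(solve(i + 1, True) - prices[i],
                   solve(i + 1, False))

    # Start on day 0, not holding any stock.
    return solve(0, False)
```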

Approach 2: Memoization with a Decorator (Top-Down DP)

The recursive state is (i, holding), which is a tuple and can be easily cached. Applying our decorator will prevent re-computation and make the solution efficient.

With memoization, each state (i, holding) is computed just once. The time complexity becomes O(n), because there are n * 2 possible states and each is resolved in constant time.
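A sketch of the memoized version follows. The book's own decorator is not reproduced here; the standard-library functools.lru_cache plays the same role, caching results keyed on the (i, holding) arguments.

```python
from functools import lru_cache

def max_profit_memo(prices):
    n = len(prices)

    @lru_cache(maxsize=None)  # cache results per (i, holding) state
    def solve(i, holding):
        if i >= n:
            return 0
        if holding:
            return max(prices[i] + solve(i + 2, False),  # sell, then cooldown
                       solve(i + 1, True))               # rest while holding
        return max(solve(i + 1, True) - prices[i],       # buy today
                   solve(i + 1, False))                  # rest without buying

    return solve(0, False)
```

Note that the inner function's arguments must be hashable for lru_cache to work, which an int and a bool are.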

Approach 3: Iteration (Bottom-Up DP)

For the bottom-up approach, we can track the maximum profit for three different states at the end of each day i:

  • hold: The maximum profit if we are holding a stock.

  • sold: The maximum profit if we just sold a stock today.

  • rest: The maximum profit if we are "resting" (not holding a stock and not in cooldown).

We can iterate through the prices and update these three state variables:

  • hold_new = max(hold, rest - price): The only way to be holding today is if we were holding yesterday, OR if we were resting yesterday and bought today.

  • sold_new = hold + price: The only way to sell today is if we were holding yesterday.

  • rest_new = max(rest, sold): The only way to be resting today is if we were resting yesterday, OR if we finished our cooldown from selling yesterday.

We can initialize hold to negative infinity (holding is impossible before the first buy), and sold and rest to 0. After processing all days, the answer is max(sold, rest): ending while still holding an unsold stock is never optimal.

This O(1) space, O(n) time solution is extremely efficient. It elegantly captures the state transitions without recursion or large DP arrays.
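The three transitions above can be sketched as a single loop. The tuple assignment updates all three variables simultaneously, so each update reads yesterday's values, which is exactly what the recurrences require.

```python
def max_profit(prices):
    # hold: best profit while holding a stock.
    # sold: best profit having sold today (cooldown tomorrow).
    # rest: best profit while free to buy (not holding, not in cooldown).
    hold, sold, rest = float('-inf'), 0, 0
    for price in prices:
        hold, sold, rest = (
            max(hold, rest - price),  # keep holding, or buy after resting
            hold + price,             # sell the stock we were holding
            max(rest, sold),          # keep resting, or cooldown just ended
        )
    # An optimal answer never ends holding an unsold stock.
    return max(sold, rest)
```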

Conclusion

The "Buy and Sell Stock with Cooldown" problem is a gateway to a powerful DP paradigm. By defining the state to include not just the position but also the status of our agent (holding, sold, resting), we can model complex decision-making processes. This state-machine approach is applicable to a wide range of problems where actions in the present constrain actions in the future.