Chapter 9: Comparing Two Sequences - Longest Common Subsequence

Having explored subsequences within a single array, we now tackle problems that involve comparing two different sequences. The quintessential problem in this domain is finding the "Longest Common Subsequence" (LCS). This is a cornerstone algorithm with wide-ranging applications, from bioinformatics (comparing DNA sequences) to software engineering (calculating differences between files).

LeetCode problem #1143 provides the perfect platform to learn the 2D DP approach required for these types of problems. The state dp[i][j] will represent the solution for the prefixes of both sequences.

The Problem:

Given two strings text1 and text2, return the length of their longest common subsequence. If there is no common subsequence, return 0.

A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters. For example, "ace" is a subsequence of "abcde".

A common subsequence of two strings is a subsequence that is common to both original strings.

For example:

  • text1 = "abcde", text2 = "ace" -> The answer is 3. The longest common subsequence is "ace".

  • text1 = "abc", text2 = "abc" -> The answer is 3.

  • text1 = "abc", text2 = "def" -> The answer is 0.

Approach 1: The Recursive Solution

Let's define a recursive function solve(i, j) that computes the length of the LCS for the suffixes text1[i:] and text2[j:].

We have two main cases when comparing the characters text1[i] and text2[j]:

  1. The characters match: If text1[i] == text2[j], then this character is part of our LCS. The length is 1 plus the LCS of the rest of the strings, which is solve(i + 1, j + 1).

  2. The characters do not match: If text1[i] != text2[j], we can't use both characters. We must skip one of them and take the best result. We find the maximum of:

    • Skipping the character in text1: solve(i + 1, j)

    • Skipping the character in text2: solve(i, j + 1)

The base case is when either i or j goes out of bounds, meaning we've run out of characters in one of the strings. In that case, the LCS is 0.

This brute-force approach correctly calculates the answer but is extremely inefficient due to massive overlapping subproblems.

Approach 2: Memoization with a Decorator (Top-Down DP)

The state of our recursion is defined by the pair of indices (i, j). Caching the results for each (i, j) pair will prevent redundant computations and make the solution viable.

The top-down DP solution is a direct translation of the recursive logic, but it is dramatically faster thanks to memoization. Its time complexity is O(m * n), where m and n are the lengths of the two strings.

Approach 3: Iteration (Bottom-Up DP)

For the bottom-up approach, we'll create a 2D dp table. Let dp[i][j] be the length of the LCS of text1's prefix of length i (text1[:i]) and text2's prefix of length j (text2[:j]).

  1. Create a dp table of size (m+1) x (n+1), where m=len(text1) and n=len(text2). Initialize it with zeros. The extra row and column handle the base cases of empty string prefixes.

  2. Iterate i from 1 to m and j from 1 to n.

  3. Inside the loop, compare text1[i-1] and text2[j-1] (we use i-1 and j-1 because our DP table is 1-indexed relative to the strings).

    • If text1[i-1] == text2[j-1]: The characters match, so we extend the LCS from the previous prefixes. dp[i][j] = 1 + dp[i-1][j-1].

    • If they don't match: We take the best result from the subproblems where we exclude one of the characters. dp[i][j] = max(dp[i-1][j], dp[i][j-1]).

  4. The final answer is in dp[m][n].

This bottom-up solution is clean, efficient, and clearly shows the relationships between the subproblems. It also has a time complexity of O(m * n).

Conclusion

You've now mastered one of the most important dynamic programming patterns: comparing two sequences using a 2D DP table. The logic of matching or mismatching characters and recurring on the subproblems (dp[i-1][j-1], dp[i-1][j], dp[i][j-1]) is a template that can be adapted to solve many related problems, such as "Edit Distance" and "Shortest Common Supersequence".

Last updated