1. Introduction
The Longest Common Substring (LCS) problem is a classic string problem in computer science. It involves finding the longest contiguous substring that appears in both strings.
Example:
String A: "abcdxyz" String B: "xyzabcd" Longest common substring: "abcd" (length 4)
Note:
- Substring is contiguous.
- This is different from the Longest Common Subsequence, where characters need not be contiguous.
Problem Statement (Typical Coding Problem):
Given two strings
s1
ands2
, find the length of the longest common substring. Optionally, return the substring itself.
2. When to Use Longest Common Substring Pattern
Use this pattern when:
- You need to find common sequences between two strings that are contiguous.
- You need maximum length or the substring itself.
- Problems involve text comparison, DNA sequences, data deduplication, or pattern matching.
Problem Indicators:
- “Longest contiguous substring common in two strings”
- “Maximum overlap between two sequences”
- “Find matching blocks of text”
3. Core Idea
We can define a dynamic programming recurrence:
For strings s1
and s2
of lengths m
and n
:
dp[i][j] = length of longest common substring ending at s1[i-1] and s2[j-1]
Recurrence relation:
if s1[i-1] == s2[j-1]: dp[i][j] = dp[i-1][j-1] + 1 else: dp[i][j] = 0
dp[i][j]
only increases when characters match and substring is contiguous.- Track the maximum value in the DP table → length of LCS.
Base Case:
dp[0][j] = 0 for all j dp[i][0] = 0 for all i
4. Example
s1 = "abcdxyz" s2 = "xyzabcd" DP Table (partial view): '' x y z a b c d '' 0 0 0 0 0 0 0 0 a 0 0 0 0 1 0 0 0 b 0 0 0 0 0 2 0 0 c 0 0 0 0 0 0 3 0 d 0 0 0 0 0 0 0 4 x 0 1 0 0 0 0 0 0 y 0 0 2 0 0 0 0 0 z 0 0 0 3 0 0 0 0
Maximum value = 4 → LCS length = 4 (“abcd”)
5. Bottom-Up DP (Tabulation)
public int longestCommonSubstring(String s1, String s2) { int m = s1.length(); int n = s2.length(); int[][] dp = new int[m+1][n+1]; int maxLen = 0; for (int i = 1; i <= m; i++) { for (int j = 1; j <= n; j++) { if (s1.charAt(i-1) == s2.charAt(j-1)) { dp[i][j] = dp[i-1][j-1] + 1; maxLen = Math.max(maxLen, dp[i][j]); } else { dp[i][j] = 0; } } } return maxLen; }
- Time Complexity: O(m * n)
- Space Complexity: O(m * n)
6. Space Optimization
We only need the previous row, so we can optimize space to O(n):
public int longestCommonSubstringOptimized(String s1, String s2) { int m = s1.length(); int n = s2.length(); int[] prev = new int[n+1]; int[] curr = new int[n+1]; int maxLen = 0; for (int i = 1; i <= m; i++) { for (int j = 1; j <= n; j++) { if (s1.charAt(i-1) == s2.charAt(j-1)) { curr[j] = prev[j-1] + 1; maxLen = Math.max(maxLen, curr[j]); } else { curr[j] = 0; } } int[] temp = prev; prev = curr; curr = temp; // reset current row } return maxLen; }
- Time Complexity: O(m * n)
- Space Complexity: O(n)
7. Finding the Substring Itself
To get the substring, track the ending index when updating maxLen
:
int endIndex = 0; if (dp[i][j] > maxLen) { maxLen = dp[i][j]; endIndex = i; // end of substring in s1 } String lcs = s1.substring(endIndex - maxLen, endIndex);
8. Variants of the Pattern
- Longest Repeated Substring – substring that occurs at least twice in a string.
- Longest Common Subsequence (LCS) – characters do not need to be contiguous.
- Longest Palindromic Substring – substring is a palindrome.
Observation:
- LCS problems often rely on DP tables, subproblem overlapping, and tracking indices.
- Recognize contiguous vs non-contiguous requirements.
9. When to Use This Pattern
- When you need maximum-length contiguous matching segments between two strings.
- When tracking common patterns in sequences is required.
- Problems in text comparison, bioinformatics, plagiarism detection, and pattern matching.
Indicators:
- “Longest common substring”
- “Maximum overlap between strings”
- “Find common blocks in sequences”
10. Key Takeaways
- Longest common substring requires contiguous characters.
- DP relation:
if s1[i-1] == s2[j-1] → dp[i][j] = dp[i-1][j-1] + 1 else → dp[i][j] = 0
- Base case: first row/column = 0.
- Track maximum length and optionally the ending index.
- Optimize space using only previous row for large strings.