Problem Overview
You are given a string s, and your task is to find the longest substring that appears at least twice in the string.
These two occurrences may overlap.
If there is no duplicate substring, return an empty string.
Example 1:
Input: s = "banana" Output: "ana" Explanation: "ana" is the longest substring that appears twice in "banana".
Example 2:
Input: s = "abcd" Output: "" Explanation: No duplicate substring exists.
Intuition / Approach
This is a hard problem that combines binary search with string hashing (Rabin-Karp algorithm) to efficiently find the longest repeated substring.
Let’s break it down step-by-step.
1. Brute Force Idea (and why it fails)
A naive approach would be:
- Generate all substrings.
- Check which ones occur more than once.
- Return the longest one.
However:
- There are O(n²) substrings.
- Each comparison can take O(n).
- Overall complexity: O(n³) — infeasible for large strings (
n≤ 10⁵).
So we need a more optimized approach.
2. Optimized Strategy
We can use Binary Search + Rolling Hash (Rabin-Karp) to achieve nearly O(n log n) performance.
Key Idea
We binary search on the length of the substring, and for each length L:
- Check if there exists any substring of length
Lthat appears at least twice. - If yes → search for longer substrings.
- If no → search for shorter substrings.
To efficiently check duplicates, we use Rabin-Karp hashing:
- Compute hash values for all substrings of length
L. - Store them in a hash set.
- If a hash collision is found (same hash already exists), it might indicate a duplicate substring.
3. Detailed Algorithm
- Binary Search on Substring Length
low = 1,high = n - 1- While
low <= high:- Compute
mid = (low + high) / 2 - Check if there’s a duplicate substring of length
mid. - If found, move
low = mid + 1(try longer substring). - Else, move
high = mid - 1.
- Compute
- Rabin-Karp Rolling Hash
- Treat the string as a base-26 number (for lowercase letters).
- Compute hash for each substring of length
Lusing a sliding window. - Maintain a rolling hash by removing the leading character and adding the next.
- Use modulo arithmetic to prevent overflow.
- Return the Longest Valid Substring
Step-by-Step Example
Input: s = "banana"
Binary Search Steps
| low | high | mid | Found Duplicate? | Longest so far |
|---|---|---|---|---|
| 1 | 5 | 3 | Yes (“ana”) | “ana” |
| 4 | 5 | 4 | No | “ana” |
| Stop when low > high |
Output: “ana”
Java Code Implementation
import java.util.*;
public class Solution {
public String longestDupSubstring(String s) {
int n = s.length();
int left = 1, right = n - 1;
String result = "";
// Convert string to integer array for faster hashing
int[] nums = new int[n];
for (int i = 0; i < n; i++) {
nums[i] = s.charAt(i) - 'a';
}
long mod = (1L << 32); // Large modulus for hash
int base = 26; // Since lowercase English letters
while (left <= right) {
int mid = left + (right - left) / 2;
String dup = search(nums, mid, base, mod, s);
if (!dup.isEmpty()) {
result = dup; // found a longer duplicate
left = mid + 1;
} else {
right = mid - 1;
}
}
return result;
}
private String search(int[] nums, int len, int base, long mod, String s) {
int n = nums.length;
long hash = 0;
for (int i = 0; i < len; i++) {
hash = (hash * base + nums[i]) % mod;
}
HashSet<Long> seen = new HashSet<>();
seen.add(hash);
// precompute base^len % mod
long baseL = 1;
for (int i = 1; i <= len; i++) {
baseL = (baseL * base) % mod;
}
for (int i = 1; i < n - len + 1; i++) {
// Rolling hash: remove leading char, add trailing char
hash = (hash * base - nums[i - 1] * baseL % mod + mod) % mod;
hash = (hash + nums[i + len - 1]) % mod;
if (seen.contains(hash)) {
return s.substring(i, i + len);
}
seen.add(hash);
}
return "";
}
}
Code Explanation
numsarray: Converts each character into an integer (0–25).modvalue: Prevents overflow in hash computations.search()method:- Calculates rolling hash for substrings of a given length
len. - Uses a HashSet to detect duplicates efficiently.
- Calculates rolling hash for substrings of a given length
- Binary Search: Adjusts substring length based on whether a duplicate exists.
Time and Space Complexity
| Complexity | Description |
|---|---|
| Time Complexity | O(n log n) — Binary search over substring length with O(n) hash checks |
| Space Complexity | O(n) — To store hash values and intermediate computations |
Edge Cases
- All unique characters:
"abcd"→ returns"". - All same characters:
"aaaaa"→ returns"aaaa". - Very long input (10⁵ length): Works efficiently due to binary search.
Key Takeaways
- Combining binary search with rolling hash (Rabin-Karp) is a powerful technique for substring problems.
- We avoid comparing actual substrings directly — instead, we compare hash values for efficiency.
- Be careful with hash collisions and overflow handling.
- The problem illustrates how mathematical hashing + search logic can reduce complex O(n³) problems to O(n log n).
