Learnitweb

LeetCode Problem 1044: Longest Duplicate Substring

Problem Overview

You are given a string s, and your task is to find the longest substring that appears at least twice in the string.
These two occurrences may overlap.

If there is no duplicate substring, return an empty string.

Example 1:

Input: s = "banana"
Output: "ana"
Explanation:
"ana" is the longest substring that appears twice in "banana".

Example 2:

Input: s = "abcd"
Output: ""
Explanation:
No duplicate substring exists.

Intuition / Approach

This is a hard problem that combines binary search with string hashing (Rabin-Karp algorithm) to efficiently find the longest repeated substring.

Let’s break it down step-by-step.


1. Brute Force Idea (and why it fails)

A naive approach would be:

  • Generate all substrings.
  • Check which ones occur more than once.
  • Return the longest one.

However:

  • There are O(n²) substrings.
  • Each comparison can take O(n).
  • Overall complexity: O(n³) — infeasible for large strings (n ≤ 10⁵).

So we need a more optimized approach.


2. Optimized Strategy

We can use Binary Search + Rolling Hash (Rabin-Karp) to achieve nearly O(n log n) performance.

Key Idea

We binary search on the length of the substring, and for each length L:

  • Check if there exists any substring of length L that appears at least twice.
  • If yes → search for longer substrings.
  • If no → search for shorter substrings.

To efficiently check duplicates, we use Rabin-Karp hashing:

  • Compute hash values for all substrings of length L.
  • Store them in a hash set.
  • If a hash collision is found (same hash already exists), it might indicate a duplicate substring.

3. Detailed Algorithm

  1. Binary Search on Substring Length
    • low = 1, high = n - 1
    • While low <= high:
      • Compute mid = (low + high) / 2
      • Check if there’s a duplicate substring of length mid.
      • If found, move low = mid + 1 (try longer substring).
      • Else, move high = mid - 1.
  2. Rabin-Karp Rolling Hash
    • Treat the string as a base-26 number (for lowercase letters).
    • Compute hash for each substring of length L using a sliding window.
    • Maintain a rolling hash by removing the leading character and adding the next.
    • Use modulo arithmetic to prevent overflow.
  3. Return the Longest Valid Substring

Step-by-Step Example

Input: s = "banana"

Binary Search Steps

lowhighmidFound Duplicate?Longest so far
153Yes (“ana”)“ana”
454No“ana”
Stop when low > high

Output: “ana”


Java Code Implementation

import java.util.*;

public class Solution {
    public String longestDupSubstring(String s) {
        int n = s.length();
        int left = 1, right = n - 1;
        String result = "";
        
        // Convert string to integer array for faster hashing
        int[] nums = new int[n];
        for (int i = 0; i < n; i++) {
            nums[i] = s.charAt(i) - 'a';
        }

        long mod = (1L << 32); // Large modulus for hash
        int base = 26; // Since lowercase English letters

        while (left <= right) {
            int mid = left + (right - left) / 2;
            String dup = search(nums, mid, base, mod, s);
            if (!dup.isEmpty()) {
                result = dup; // found a longer duplicate
                left = mid + 1;
            } else {
                right = mid - 1;
            }
        }
        return result;
    }

    private String search(int[] nums, int len, int base, long mod, String s) {
        int n = nums.length;
        long hash = 0;
        for (int i = 0; i < len; i++) {
            hash = (hash * base + nums[i]) % mod;
        }

        HashSet<Long> seen = new HashSet<>();
        seen.add(hash);

        // precompute base^len % mod
        long baseL = 1;
        for (int i = 1; i <= len; i++) {
            baseL = (baseL * base) % mod;
        }

        for (int i = 1; i < n - len + 1; i++) {
            // Rolling hash: remove leading char, add trailing char
            hash = (hash * base - nums[i - 1] * baseL % mod + mod) % mod;
            hash = (hash + nums[i + len - 1]) % mod;
            if (seen.contains(hash)) {
                return s.substring(i, i + len);
            }
            seen.add(hash);
        }
        return "";
    }
}

Code Explanation

  • nums array: Converts each character into an integer (0–25).
  • mod value: Prevents overflow in hash computations.
  • search() method:
    • Calculates rolling hash for substrings of a given length len.
    • Uses a HashSet to detect duplicates efficiently.
  • Binary Search: Adjusts substring length based on whether a duplicate exists.

Time and Space Complexity

ComplexityDescription
Time ComplexityO(n log n) — Binary search over substring length with O(n) hash checks
Space ComplexityO(n) — To store hash values and intermediate computations

Edge Cases

  1. All unique characters: "abcd" → returns "".
  2. All same characters: "aaaaa" → returns "aaaa".
  3. Very long input (10⁵ length): Works efficiently due to binary search.

Key Takeaways

  • Combining binary search with rolling hash (Rabin-Karp) is a powerful technique for substring problems.
  • We avoid comparing actual substrings directly — instead, we compare hash values for efficiency.
  • Be careful with hash collisions and overflow handling.
  • The problem illustrates how mathematical hashing + search logic can reduce complex O(n³) problems to O(n log n).