Learnitweb

Problem 705: Design HashSet

You must design your own HashSet data structure from scratch.
It should support the following operations efficiently:

add(key)
remove(key)
contains(key)

You are NOT allowed to use built-in HashSet libraries.

The input range of keys is typically from 0 to 1,000,000.


Understanding the Problem

HashSet stores unique integers with fast O(1) operations.

You need to implement:

void add(int key)
void remove(int key)
boolean contains(int key)

Key constraints:

No duplicate entries
Must run efficiently
Must avoid using Java’s built-in set structures


Approach (Explained in Simple Language)

There are two main implementation strategies:

Strategy 1: Boolean Array

Since key range is small (0 to 1,000,000), we can simply keep a boolean array:

arr[key] = true means key exists
arr[key] = false means key does not exist

This gives direct O(1) access.

Pros:
Fast
Simple
No hashing or collisions

Cons:
Uses large memory (boolean array of size 1,000,001)

This is the easiest and most common LeetCode solution.


Strategy 2: Separate Chaining (LinkedList Buckets)

If key range was large, we’d use hashing:

Compute index = key % bucketCount
Store values in a linked list inside that bucket
Check, add, and remove within that bucket

More complex but more realistic.


Java Code (Boolean Array Method — Recommended)

class MyHashSet {

    private boolean[] set;

    public MyHashSet() {
        set = new boolean[1_000_001];
    }

    public void add(int key) {
        set[key] = true;
    }

    public void remove(int key) {
        set[key] = false;
    }

    public boolean contains(int key) {
        return set[key];
    }
}

This solves the problem perfectly.


Java Code (Hashing + Separate Chaining)

This mimics a real HashSet:

import java.util.LinkedList;

class MyHashSet {

    private LinkedList<Integer>[] buckets;
    private int size = 1000;

    public MyHashSet() {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private int hash(int key) {
        return key % size;
    }

    public void add(int key) {
        int index = hash(key);
        if (!buckets[index].contains(key)) {
            buckets[index].add(key);
        }
    }

    public void remove(int key) {
        int index = hash(key);
        buckets[index].remove((Integer) key);
    }

    public boolean contains(int key) {
        int index = hash(key);
        return buckets[index].contains(key);
    }
}

This version handles collisions using linked lists.


Dry Run (Boolean Array Method)

Let’s perform operations:

MyHashSet set = new MyHashSet()

Add keys

set.add(5)
set.add(12)

Internal array:
set[5] = true
set[12] = true

Check contains

set.contains(5) → true
set.contains(12) → true
set.contains(1) → false

Remove a key

set.remove(5)

Now:
set[5] = false

Check again

set.contains(5) → false
set.contains(12) → true

Everything works exactly as expected.


Dry Run (Chaining Method)

MyHashSet set = new MyHashSet()

Assume size = 1000
Key = 1005

hash(1005) = 1005 % 1000 = 5

Add:
buckets[5].add(1005)

Another key = 2005
hash(2005) = 2005 % 1000 = 5

Collision → stored in the same bucket:
buckets[5] = [1005, 2005]

contains(2005) searches bucket 5 → returns true
remove(1005) removes from bucket 5


Why This Approach Works

The boolean array method gives:

Direct access
No hashing
Guaranteed O(1) operations
Very simple implementation

The chaining method demonstrates how a real hash set works:

Hash function distributes keys across buckets
Linked lists handle collisions
Still average O(1) performance