Learnitweb

How HashSet Internally Works in Java?

1. Overview of HashSet

The HashSet class is part of Java’s Collection Framework and implements the Set interface, which means:

  • It does not allow duplicate elements.
  • It provides constant-time performance for basic operations like add, remove, contains, assuming a good hash function.

However, HashSet is not backed by its own custom data structure. Instead, it uses a HashMap internally to store its elements.

public class HashSet<E> extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable {
    
    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();
}

2. Internal Data Structure

The HashSet internally uses a HashMap<E, Object> where:

  • The element you add to the set becomes the key in the map.
  • The value is a constant dummy object (PRESENT) just to fulfill the key-value structure of a map.

For example:

HashSet<String> set = new HashSet<>();
set.add("Java");

// Internally becomes:
map.put("Java", PRESENT);
  • The PRESENT object is just a placeholder to distinguish between keys that are present and not.
  • Since HashMap does not allow duplicate keys, HashSet automatically prevents duplicate elements.

3. Constructors in HashSet

HashSet offers several constructors, all of which initialize the internal HashMap accordingly.

3.1 Default Constructor

HashSet<E> set = new HashSet<>();
  • Creates a HashMap with default capacity (16) and load factor (0.75).

3.2 Constructor with Initial Capacity

HashSet<E> set = new HashSet<>(initialCapacity);
  • Optimizes internal resizing by setting the expected number of elements in advance.

3.3 Constructor with Capacity and Load Factor

HashSet<E> set = new HashSet<>(initialCapacity, loadFactor);
  • Gives you control over performance trade-offs (memory vs time).

3.4 Constructor with Collection

HashSet<E> set = new HashSet<>(Collection<? extends E> c);
  • Initializes the set with all elements from the collection.

4. Key Operations and Their Internal Working

4.1 Adding Elements

set.add("Apple");

Internally:

map.put("Apple", PRESENT);
  • The HashMap computes the hash of the key ("Apple").
  • Finds the appropriate bucket using (hash & (n - 1)).
  • If the key doesn’t already exist, it creates a new Node.
  • If a collision occurs, it adds the new entry to the linked list or tree at that bucket.
  • Returns true if the element was added, false if it was already present.

Time Complexity: O(1) (average case), O(n) (worst pre-Java 8), O(log n) (Java 8+ with trees)

4.2 Checking for Existence (contains())

set.contains("Apple");

Internally:

map.containsKey("Apple");
  • The map calculates the hash of the key and looks it up in the appropriate bucket.
  • Traverses the list/tree if needed to compare using equals().

Time Complexity: O(1) (average), O(log n) (tree), O(n) (worst case)

4.3 Removing Elements

set.remove("Apple");

Internally:

map.remove("Apple");
  • Calculates the hash and locates the bucket.
  • Removes the entry if it exists.
  • Returns true if removal was successful, false otherwise.

Time Complexity: O(1) on average, O(log n) in tree buckets

4.4 Iterating Over Elements

When you use an iterator:

for (String value : set) {
    System.out.println(value);
}

Internally:

  • The iterator returned is actually backed by the HashMap.keySet() iterator.
  • So it’s traversing keys of the internal map.

Time Complexity: O(n) traversal

5. Uniqueness Guarantee

5.1 No Duplicates Allowed

This is the core property of any Set:

set.add("Apple");
set.add("Apple");  // Ignored

Internally:

  • The second add() checks if "Apple" is already a key in the internal map.
  • If yes, no change is made and false is returned.

This is handled by the HashMap.put() method, which doesn’t overwrite the value if the key already exists (when used in HashSet).

5.2 Based on hashCode() and equals()

Two objects are considered the same if:

  1. Their hashCode() is the same.
  2. Their equals() method returns true.

Hence, if you override equals(), you must override hashCode() properly, or HashSet will behave unpredictably.

6. Null Element Handling

HashSet allows one null element.

Example:

set.add(null);

Internally:

map.put(null, PRESENT);
  • Since HashMap allows one null key, the same applies to HashSet.
  • If you add null again, it’s ignored (since it’s already present).

7. Load Factor and Resizing

Just like HashMap:

  • Default load factor is 0.75.
  • When the number of elements exceeds capacity × loadFactor, the internal map resizes by doubling the bucket array.
  • All existing entries are rehashed and redistributed.

This resizing process has a time complexity of O(n) and is expensive, which is why setting an appropriate initial capacity is important when adding many elements.


8. Java 8+ Improvements

Since HashSet uses HashMap, it benefits from all Java 8+ improvements:

8.1 Red-Black Tree Buckets

  • If too many elements fall into a single bucket (linked list exceeds 8 nodes and map size ≥ 64), the bucket is converted to a Red-Black Tree.
  • Lookup, insert, and delete operations become O(log n) instead of O(n).

8.2 Better Hash Function

  • HashMap uses a better hash spreading technique:
(h) ^ (h >>> 16)
  • This spreads out poorly implemented hashCode() results and reduces clustering.

9. Time Complexity Summary

OperationBest CaseAverage CaseWorst Case (pre-Java 8)Worst Case (Java 8+)
add()O(1)O(1)O(n)O(log n)
contains()O(1)O(1)O(n)O(log n)
remove()O(1)O(1)O(n)O(log n)
iterator()O(n)O(n)O(n)O(n)

Note: Worst-case happens when many hash collisions occur, or all elements fall into the same bucket.

10. Thread Safety

  • HashSet is not thread-safe.
  • If multiple threads modify it concurrently, it must be synchronized externally.
  • For thread-safe variants, use:
Set<String> set = Collections.synchronizedSet(new HashSet<>());

Or consider using:

  • ConcurrentHashMap.newKeySet() for high concurrency.
  • CopyOnWriteArraySet for read-heavy concurrent sets.

11. Comparison with Other Set Implementations

FeatureHashSetLinkedHashSetTreeSet
OrderingNo orderingInsertion orderSorted (natural/custom)
Underlying DSHashMapHashMap + Linked ListTreeMap (Red-Black Tree)
Time ComplexityO(1)O(1)O(log n)
Null Elements1 null allowed1 null allowedNot allowed (by default)

12. Real-World Example

Set<String> emails = new HashSet<>();
emails.add("john@example.com");
emails.add("jane@example.com");
emails.add("john@example.com"); // ignored

System.out.println(emails.size()); // 2

Behind the scenes:

  • "john@example.com" is stored as a key in the internal HashMap.
  • The second insertion is ignored because the key already exists.

13. Summary Table

FeatureDescription
Internal Data StructureHashMap<K, Object>
Element StorageKeys of the map
Uniqueness EnforcementBy using map keys
Null HandlingAllows one null element
Thread-Safe?No
OrderingNo
Collision HandlingChaining → Tree (Java 8+)
ResizingWhen size > capacity × loadFactor
Time Complexity (Avg)O(1) for add, contains, remove