1. Overview of HashSet
The HashSet
class is part of Java’s Collection Framework and implements the Set interface, which means:
- It does not allow duplicate elements.
- It provides constant-time performance for basic operations like
add
,remove
,contains
, assuming a good hash function.
However, HashSet
is not backed by its own custom data structure. Instead, it uses a HashMap
internally to store its elements.
public class HashSet<E> extends AbstractSet<E> implements Set<E>, Cloneable, java.io.Serializable { private transient HashMap<E,Object> map; // Dummy value to associate with an Object in the backing Map private static final Object PRESENT = new Object(); }
2. Internal Data Structure
The HashSet
internally uses a HashMap<E, Object>
where:
- The element you add to the set becomes the key in the map.
- The value is a constant dummy object (
PRESENT
) just to fulfill the key-value structure of a map.
For example:
HashSet<String> set = new HashSet<>(); set.add("Java"); // Internally becomes: map.put("Java", PRESENT);
- The
PRESENT
object is just a placeholder to distinguish between keys that are present and not. - Since
HashMap
does not allow duplicate keys,HashSet
automatically prevents duplicate elements.
3. Constructors in HashSet
HashSet offers several constructors, all of which initialize the internal HashMap accordingly.
3.1 Default Constructor
HashSet<E> set = new HashSet<>();
- Creates a HashMap with default capacity (16) and load factor (0.75).
3.2 Constructor with Initial Capacity
HashSet<E> set = new HashSet<>(initialCapacity);
- Optimizes internal resizing by setting the expected number of elements in advance.
3.3 Constructor with Capacity and Load Factor
HashSet<E> set = new HashSet<>(initialCapacity, loadFactor);
- Gives you control over performance trade-offs (memory vs time).
3.4 Constructor with Collection
HashSet<E> set = new HashSet<>(Collection<? extends E> c);
- Initializes the set with all elements from the collection.
4. Key Operations and Their Internal Working
4.1 Adding Elements
set.add("Apple");
Internally:
map.put("Apple", PRESENT);
- The
HashMap
computes the hash of the key ("Apple"
). - Finds the appropriate bucket using
(hash & (n - 1))
. - If the key doesn’t already exist, it creates a new
Node
. - If a collision occurs, it adds the new entry to the linked list or tree at that bucket.
- Returns
true
if the element was added,false
if it was already present.
Time Complexity: O(1) (average case), O(n) (worst pre-Java 8), O(log n) (Java 8+ with trees)
4.2 Checking for Existence (contains()
)
set.contains("Apple");
Internally:
map.containsKey("Apple");
- The map calculates the hash of the key and looks it up in the appropriate bucket.
- Traverses the list/tree if needed to compare using
equals()
.
Time Complexity: O(1) (average), O(log n) (tree), O(n) (worst case)
4.3 Removing Elements
set.remove("Apple");
Internally:
map.remove("Apple");
- Calculates the hash and locates the bucket.
- Removes the entry if it exists.
- Returns
true
if removal was successful,false
otherwise.
Time Complexity: O(1) on average, O(log n) in tree buckets
4.4 Iterating Over Elements
When you use an iterator:
for (String value : set) { System.out.println(value); }
Internally:
- The iterator returned is actually backed by the
HashMap.keySet()
iterator. - So it’s traversing keys of the internal map.
Time Complexity: O(n) traversal
5. Uniqueness Guarantee
5.1 No Duplicates Allowed
This is the core property of any Set:
set.add("Apple"); set.add("Apple"); // Ignored
Internally:
- The second
add()
checks if"Apple"
is already a key in the internal map. - If yes, no change is made and
false
is returned.
This is handled by the HashMap.put()
method, which doesn’t overwrite the value if the key already exists (when used in HashSet).
5.2 Based on hashCode()
and equals()
Two objects are considered the same if:
- Their
hashCode()
is the same. - Their
equals()
method returnstrue
.
Hence, if you override equals()
, you must override hashCode()
properly, or HashSet will behave unpredictably.
6. Null Element Handling
HashSet allows one null element.
Example:
set.add(null);
Internally:
map.put(null, PRESENT);
- Since
HashMap
allows onenull
key, the same applies toHashSet
. - If you add
null
again, it’s ignored (since it’s already present).
7. Load Factor and Resizing
Just like HashMap:
- Default load factor is 0.75.
- When the number of elements exceeds
capacity × loadFactor
, the internal map resizes by doubling the bucket array. - All existing entries are rehashed and redistributed.
This resizing process has a time complexity of O(n) and is expensive, which is why setting an appropriate initial capacity is important when adding many elements.
8. Java 8+ Improvements
Since HashSet uses HashMap
, it benefits from all Java 8+ improvements:
8.1 Red-Black Tree Buckets
- If too many elements fall into a single bucket (linked list exceeds 8 nodes and map size ≥ 64), the bucket is converted to a Red-Black Tree.
- Lookup, insert, and delete operations become O(log n) instead of O(n).
8.2 Better Hash Function
- HashMap uses a better hash spreading technique:
(h) ^ (h >>> 16)
- This spreads out poorly implemented
hashCode()
results and reduces clustering.
9. Time Complexity Summary
Operation | Best Case | Average Case | Worst Case (pre-Java 8) | Worst Case (Java 8+) |
---|---|---|---|---|
add() | O(1) | O(1) | O(n) | O(log n) |
contains() | O(1) | O(1) | O(n) | O(log n) |
remove() | O(1) | O(1) | O(n) | O(log n) |
iterator() | O(n) | O(n) | O(n) | O(n) |
Note: Worst-case happens when many hash collisions occur, or all elements fall into the same bucket.
10. Thread Safety
HashSet
is not thread-safe.- If multiple threads modify it concurrently, it must be synchronized externally.
- For thread-safe variants, use:
Set<String> set = Collections.synchronizedSet(new HashSet<>());
Or consider using:
ConcurrentHashMap.newKeySet()
for high concurrency.CopyOnWriteArraySet
for read-heavy concurrent sets.
11. Comparison with Other Set Implementations
Feature | HashSet | LinkedHashSet | TreeSet |
---|---|---|---|
Ordering | No ordering | Insertion order | Sorted (natural/custom) |
Underlying DS | HashMap | HashMap + Linked List | TreeMap (Red-Black Tree) |
Time Complexity | O(1) | O(1) | O(log n) |
Null Elements | 1 null allowed | 1 null allowed | Not allowed (by default) |
12. Real-World Example
Set<String> emails = new HashSet<>(); emails.add("john@example.com"); emails.add("jane@example.com"); emails.add("john@example.com"); // ignored System.out.println(emails.size()); // 2
Behind the scenes:
"john@example.com"
is stored as a key in the internalHashMap
.- The second insertion is ignored because the key already exists.
13. Summary Table
Feature | Description |
---|---|
Internal Data Structure | HashMap<K, Object> |
Element Storage | Keys of the map |
Uniqueness Enforcement | By using map keys |
Null Handling | Allows one null element |
Thread-Safe? | No |
Ordering | No |
Collision Handling | Chaining → Tree (Java 8+) |
Resizing | When size > capacity × loadFactor |
Time Complexity (Avg) | O(1) for add, contains, remove |