A HashMap
in Java is a key-value based data structure that allows fast data access by leveraging a technique called hashing. It is part of the java.util
package and is widely used due to its average constant time complexity (O(1)) for insertion and retrieval operations.
Core Internal Concepts Behind HashMap
1. Hashing Mechanism
- Hashing is the process of converting a key into an integer (called a hash code) using a hash function.
- Java uses the
hashCode()
method defined in theObject
class (which you can override). - HashMap improves the hash value using a supplemental hash function to reduce collisions:
static final int hash(Object key) { int h; return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); }
This supplemental function helps in evenly distributing entries across buckets.
2. Bucket Index Calculation
After computing the hash, HashMap uses this formula to compute the array index (bucket):
index = (n - 1) & hash;
where n
is the size of the internal array (must be a power of 2). This bitwise AND operation ensures fast modulo-like behavior and prevents costly modulo operations.
3. Structure of a Bucket (Array + Linked List + Tree)
- Internally, HashMap uses an array of Node<K, V> (or Entry<K, V> in older versions).
- Each index in the array is called a bucket, which stores key-value pairs.
- If multiple keys hash to the same bucket (called a hash collision), these entries are stored:
- As a linked list in Java 7 and earlier
- As a balanced tree (Red-Black Tree) in Java 8+ (if entries > 8)
static class Node<K,V> implements Map.Entry<K,V> { final int hash; final K key; V value; Node<K,V> next; }
Each node contains:
- The computed hash
- The key
- The value
- A reference to the next node in the same bucket (for chaining)
4. Collision Handling Mechanism
- Hash collision occurs when two different keys produce the same bucket index.
- HashMap handles collisions using chaining:
- Entries are linked together in a singly linked list at the same index.
- In Java 8 and above, when the list grows beyond 8 entries (and the map size is at least 64), it is converted into a Red-Black Tree for faster lookup (
O(log n)
).
Why Treeify?
- To maintain performance in heavily loaded maps.
- Reduces the worst-case time complexity from
O(n)
toO(log n)
during retrieval.
5. Load Factor and Resizing
- Load factor determines how full the HashMap can get before it resizes.
- Default value:
0.75
- Default value:
- When the number of entries exceeds
capacity × loadFactor
, the map resizes (rehashes):- It doubles the size of the internal array.
- Each existing entry is rehashed and placed into the new bucket.
Resizing is an expensive operation as it involves:
- Creating a new array
- Iterating all existing entries
- Re-inserting them into the new array using new indexes
6. Working of put()
Method Internally
Here’s what happens when you call put(key, value)
:
- Hash code generation: The key’s
hashCode()
is computed and processed. - Bucket index calculation:
(n - 1) & hash
determines the index in the array. - Check if bucket is empty:
- If yes, create a new Node and insert it.
- Collision Handling:
- If an entry exists at that index:
- Traverse the chain (list or tree).
- If a key match is found (via
equals()
), update the value. - If not, append a new node to the chain.
- If an entry exists at that index:
- Treeify if needed:
- If the number of entries at that index exceeds 8, and total size ≥ 64, convert it to a Red-Black Tree.
7. Working of get()
Method Internally
When you call get(key)
:
- Compute hash of the key using the same logic.
- Find bucket index using
(n - 1) & hash
. - Traverse the bucket at that index:
- If only one node is present, compare keys.
- If it’s a list or tree, search accordingly.
- Return value if found, else return
null
.
8. Other Important Characteristics
Feature | Details |
---|---|
Null keys | Allows one null key only. Hash code is 0, and it’s always stored at index 0. |
Null values | Multiple null values are allowed. |
Thread Safety | Not thread-safe. Use Collections.synchronizedMap() or ConcurrentHashMap for concurrency. |
Ordering | No ordering is guaranteed. Use LinkedHashMap if insertion order matters. |
Initial Capacity | Default is 16; always a power of 2. |
Load Factor | Default is 0.75; can be tuned for performance. |