
What is initial capacity, load factor and rehashing of a HashMap?

As per Java documentation:

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
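A minimal sketch of this sizing rule (the class name and entry count are illustrative): choosing the initial capacity as the expected entry count divided by the load factor means the threshold is never exceeded, so no rehash should occur while filling the map. Note that HashMap internally rounds the requested capacity up to the next power of two.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMapExample {
    public static void main(String[] args) {
        int expectedEntries = 1000;
        float loadFactor = 0.75f;

        // Pick an initial capacity so that expectedEntries never exceeds
        // capacity * loadFactor; per the documentation quoted above,
        // no rehash operation should then occur while filling the map.
        int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor);

        Map<Integer, String> map = new HashMap<>(initialCapacity, loadFactor);
        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println("Stored " + map.size() + " entries");
    }
}
```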

The default initial capacity of a HashMap is 16 and the default load factor is 0.75f (i.e., the map is resized once the number of entries reaches 75% of the current capacity).

The load factor determines at what fill level the HashMap capacity is doubled. For example, with the defaults the product of capacity and load factor is 12 (16 * 0.75 = 12). This means that once the map holds more than 12 key-value pairs (i.e., on inserting the 13th entry), its capacity doubles from 16 to 32.
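To make the arithmetic concrete, here is a small sketch that computes the resize threshold for the default settings (the threshold formula follows the rule quoted from the documentation above):

```java
public class ThresholdExample {
    public static void main(String[] args) {
        int capacity = 16;          // default initial capacity
        float loadFactor = 0.75f;   // default load factor

        // Exceeding this entry count triggers a resize.
        int threshold = (int) (capacity * loadFactor);

        System.out.println("Resize threshold: " + threshold);         // 12
        System.out.println("Capacity after resize: " + capacity * 2); // 32
    }
}
```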

The load factor is thus the measure used to decide when to increase the HashMap capacity so that get and put operations remain O(1) on average.

Significance of load factor

A well-defined hashCode() method distributes entries uniformly across the buckets of a HashMap (16 initially). As the number of items increases, they remain evenly spread across these 16 buckets: with 32 items there are 2 per bucket, with 48 items 3 per bucket, and so on. Even as the item count grows, the maximum lookup time within each bucket therefore rises only gradually. This is the benefit of a well-defined hashCode() method.
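As an illustration, here is a sketch of a key class (a hypothetical Point) whose hashCode() combines all significant fields, so that distinct keys tend to spread across buckets while equal keys always hash to the same one:

```java
import java.util.Objects;

public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        // Combines both fields so distinct points tend to spread
        // across buckets rather than colliding in one.
        return Objects.hash(x, y);
    }
}
```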

Now, if we keep adding items, performance starts to degrade because each bucket holds more and more entries. The solution in this case is to increase the number of buckets (rather than leave it fixed at 16) and redistribute the entries across them. Increasing the number of buckets is what keeps get and put operations at O(1).

This is where the load factor helps: it decides when to increase the number of buckets in the HashMap.
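To illustrate, here is a hedged simulation of the doubling rule (it recomputes capacity and threshold with the formula above rather than inspecting HashMap internals):

```java
public class ResizeSimulation {
    public static void main(String[] args) {
        int capacity = 16;          // default initial capacity
        float loadFactor = 0.75f;   // default load factor
        int threshold = (int) (capacity * loadFactor);

        // Simulate inserting 100 entries and report each doubling.
        for (int size = 1; size <= 100; size++) {
            if (size > threshold) {
                capacity *= 2;
                threshold = (int) (capacity * loadFactor);
                System.out.println("Entry " + size + " triggers resize: capacity="
                        + capacity + ", next threshold=" + threshold);
            }
        }
    }
}
```

Running this prints resizes at entries 13, 25, 49 and 97, as the capacity grows from 16 to 32, 64, 128 and 256.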