1. What is Hashing?
Hashing is the process of converting an input of arbitrary length (like a string, file, or object) into a fixed-size numeric or alphanumeric value, known as the hash code or hash value. It’s a core concept in computer science and widely used in:
- Data retrieval (e.g.,
HashMap
,HashSet
) - Cryptography (e.g., SHA-256)
- File integrity checks (e.g., MD5 hash of a file)
- Password storage
- Digital signatures
2. Key Characteristics of Hashing
- Deterministic: For the same input, the output hash is always the same.
- Fast computation: Good hash functions are computationally efficient.
- Fixed output: Output size is fixed (e.g., Java
hashCode()
is 32 bits). - One-way (in cryptographic hashing): It’s computationally infeasible to retrieve the original input from the hash.
- Uniqueness: Ideally, different inputs generate different outputs (called collision resistance).
3. Logic to Create a Good Hash Function
A good hash function balances efficiency and effectiveness. Here’s what it must achieve:
a. Uniform Distribution
- Hash codes should spread values evenly across all buckets.
- Prevent clustering that causes multiple entries to fall into the same bucket, which slows down lookup time.
b. Minimize Collisions
- Different inputs should ideally produce different hash codes.
- A collision occurs when two distinct inputs produce the same hash code. Java handles this internally, but too many collisions degrade performance.
c. Consistent with equals()
- If two objects are equal according to
equals()
, they must return the samehashCode()
. - Otherwise, hash-based collections like
HashMap
andHashSet
will not function correctly.
d. Use Prime Numbers
- A common technique is to use a prime number (usually 31 or 37) to multiply intermediate results.
- This reduces the likelihood of hash collisions.
e. Use Immutable Fields
- Only include fields in
hashCode()
that are immutable. If a field changes after insertion into aHashMap
, the object becomes “lost” in the map.
4. How to Generate hashCode()
in Java
Java provides multiple ways to generate hash codes effectively:
Manual Hash Code Implementation
Writing your own hashCode()
method gives you full control over how hash codes are computed for your objects. This is particularly useful when:
- You want optimal performance.
- You need to avoid overhead from methods like
Objects.hash()
(which involves varargs and array creation). - You’re working in performance-critical applications, such as large-scale data structures.
@Override public int hashCode() { int result = 17; // Step 1: Start with a non-zero arbitrary number result = 31 * result + field1.hashCode(); // Step 2: Multiply by a prime and add hash of each field result = 31 * result + field2; // Repeat for each significant field ... return result; // Step 3: Return final result }
Step 1: Starting Value
- Begin with a non-zero constant like
17
. - This avoids starting with
0
, which would be neutral in multiplication and reduce variability in the result.
Step 2: Use of Prime Numbers (usually 31)
- Multiplying by a prime number reduces the number of collisions by spreading out hash codes more evenly.
- 31 is chosen in Java for historical and performance reasons:
- It’s an odd prime, reducing symmetry.
- It allows optimization:
31 * x
can be computed as(x << 5) - x
.
Step 3: Combining Field Values
- For object fields (e.g.,
String
,Date
), use their ownhashCode()
methods. - For primitive fields:
int
: use directly.long
: combine using(int)(value ^ (value >>> 32))
boolean
: use 1 for true, 0 for false.float
: useFloat.floatToIntBits(value)
double
: useDouble.doubleToLongBits(value)
and then combine like along
.
Step 4: Return the Final Result
- After combining all significant fields, return the result.
- This final result should give a well-distributed 32-bit integer hash code.
Example
public class Book { private String title; private String author; private int yearPublished; private boolean isBestSeller; @Override public int hashCode() { int result = 17; result = 31 * result + (title != null ? title.hashCode() : 0); result = 31 * result + (author != null ? author.hashCode() : 0); result = 31 * result + yearPublished; result = 31 * result + (isBestSeller ? 1 : 0); return result; } }
Using Objects.hash()
Java 7+ provides a utility method to simplify hash code generation:
@Override public int hashCode() { return Objects.hash(name, age); }
Internally, it uses a similar pattern to combine the hash codes of the fields.