Learnitweb

Hashing in Java

1. What is Hashing?

Hashing is the process of converting an input of arbitrary length (like a string, file, or object) into a fixed-size numeric or alphanumeric value, known as the hash code or hash value. It’s a core concept in computer science and widely used in:

  • Data retrieval (e.g., HashMap, HashSet)
  • Cryptography (e.g., SHA-256)
  • File integrity checks (e.g., MD5 hash of a file)
  • Password storage
  • Digital signatures

2. Key Characteristics of Hashing

  • Deterministic: For the same input, the output hash is always the same.
  • Fast computation: Good hash functions are computationally efficient.
  • Fixed output: Output size is fixed (e.g., Java hashCode() is 32 bits).
  • One-way (in cryptographic hashing): It’s computationally infeasible to retrieve the original input from the hash.
  • Uniqueness: Ideally, different inputs generate different outputs (called collision resistance).

3. Logic to Create a Good Hash Function

A good hash function balances efficiency and effectiveness. Here’s what it must achieve:

a. Uniform Distribution

  • Hash codes should spread values evenly across all buckets.
  • Prevent clustering that causes multiple entries to fall into the same bucket, which slows down lookup time.

b. Minimize Collisions

  • Different inputs should ideally produce different hash codes.
  • A collision occurs when two distinct inputs produce the same hash code. Java handles this internally, but too many collisions degrade performance.

c. Consistent with equals()

  • If two objects are equal according to equals(), they must return the same hashCode().
  • Otherwise, hash-based collections like HashMap and HashSet will not function correctly.

d. Use Prime Numbers

  • A common technique is to use a prime number (usually 31 or 37) to multiply intermediate results.
  • This reduces the likelihood of hash collisions.

e. Use Immutable Fields

  • Only include fields in hashCode() that are immutable. If a field changes after insertion into a HashMap, the object becomes “lost” in the map.

4. How to Generate hashCode() in Java

Java provides multiple ways to generate hash codes effectively:

Manual Hash Code Implementation

Writing your own hashCode() method gives you full control over how hash codes are computed for your objects. This is particularly useful when:

  • You want optimal performance.
  • You need to avoid overhead from methods like Objects.hash() (which involves varargs and array creation).
  • You’re working in performance-critical applications, such as large-scale data structures.
@Override
public int hashCode() {
    int result = 17; // Step 1: Start with a non-zero arbitrary number
    result = 31 * result + field1.hashCode(); // Step 2: Multiply by a prime and add hash of each field
    result = 31 * result + field2;            // Repeat for each significant field
    ...
    return result; // Step 3: Return final result
}

Step 1: Starting Value

  • Begin with a non-zero constant like 17.
  • This avoids starting with 0, which would be neutral in multiplication and reduce variability in the result.

Step 2: Use of Prime Numbers (usually 31)

  • Multiplying by a prime number reduces the number of collisions by spreading out hash codes more evenly.
  • 31 is chosen in Java for historical and performance reasons:
    • It’s an odd prime, reducing symmetry.
    • It allows optimization: 31 * x can be computed as (x << 5) - x.

Step 3: Combining Field Values

  • For object fields (e.g., String, Date), use their own hashCode() methods.
  • For primitive fields:
    • int: use directly.
    • long: combine using (int)(value ^ (value >>> 32))
    • boolean: use 1 for true, 0 for false.
    • float: use Float.floatToIntBits(value)
    • double: use Double.doubleToLongBits(value) and then combine like a long.

Step 4: Return the Final Result

  • After combining all significant fields, return the result.
  • This final result should give a well-distributed 32-bit integer hash code.

Example

public class Book {
    private String title;
    private String author;
    private int yearPublished;
    private boolean isBestSeller;

    @Override
    public int hashCode() {
        int result = 17;
        result = 31 * result + (title != null ? title.hashCode() : 0);
        result = 31 * result + (author != null ? author.hashCode() : 0);
        result = 31 * result + yearPublished;
        result = 31 * result + (isBestSeller ? 1 : 0);
        return result;
    }
}

Using Objects.hash()

Java 7+ provides a utility method to simplify hash code generation:

@Override
public int hashCode() {
    return Objects.hash(name, age);
}

Internally, it uses a similar pattern to combine the hash codes of the fields.