How HashMap works internally in Java?

UPDATED: 27 November 2019
HashMap is one of the implementation of java.util.Map interface. We will understand how it internally stores (put) and retrieve (get) values. HashMap uses array known as bucket/table to store (key, value) pair.

This is one of the popular question in Java Interviews, lets understand how it works.

Instantiation: How to create HashMap object?
There are 3 different ways you instantiate HashMap

  1. new HashMap() - It will create HashMap with default capacity of 16 and default load factor 0.75f.
  2. new HashMap(int initialCapacity) - You can provide the initial capacity of HashMap and default load factor 0.75f is used.
  3. new HashMap(int initialCapacity, float loadFactor) - You can provide both initial capacity and load factor.
Note: We will consider HashMap with default capacity and load factor for explanation of this article.

Understanding implementation of HashMap
I extracted Node class (inner class) from HashMap which implements sub-interface Entry of Map interface. Node class plays important role in implementation of HashMap. Every (key, value) pair in HashMap will be stored as Entry object using Node class implementation.

Node class holds - hash code of key, key, value and `next` is used to store next element (node) reference in case of hash code/key collision. We'll understand hash code/key collision later in this article.
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash; // hash code of key
    final K key; // the key
    V value; // the value to put
    Node<K,V> next; // next element(node) in case of hash code/key collision 

    Node(int hash, K key, V value, Node<K,V> next) {...}
    ...
    ...
}

What happens when you put(K key, V value) in HashMap?
When you put (key, value) pair in HashMap, it generates hash of key by calling hashCode() method of key and follows these steps...

  • put#1.1: Create the bucket (array) of Node<K, V>[] with default capacity or given capacity if not initialized. 
    • In case of bucket (array) is full then it increases capacity by twice the current capacity and transfer all data to new bucket (array). (i.e default capacity [16] > [32] > [64] ...)
  • put#1.2: Now it generates bucket location (array-index) by performing bitwise AND on current capacity of bucket and hash code of key
    /**
     * i = index to store (key, pair) in bucket (array)
     * n = size of bucket (array)
     * hash = hash code of key
     */
    int i = (n - 1) & hash;
    
  • put#1.3: Fetch node (element) from bucket using generated location (index).
    • put#1.3.1: no element found - insert given key, pair in form of Node object (Entry object).
    • put#1.3.2: element found at that location (index) - its the case of Hash code/key collision.
      • put#1.3.2.1: Now it'll compare key of existing node (element) with newly given key. If both the keys are same then it'll replace current Node's value with new value.
      • put#1.3.2.2: If both the keys are different then it forms linked list at the bucket location (index). Current node's next variable will be assigned to reference of newly given Node (key, value). 


What happens when you get(Object key) in HashMap?
When you try to get value using key, it generates hash of key by calling hashCode() method of key and follows these steps...

  • get#1.1: Generates bucket location (array-index) by performing bitwise AND on current capacity of bucket and hash code of key.
  • get#1.2: Find the Entry(Node) object from that bucket index.
    • get#1.2.1: No Entry object found - returns null.
    • get#1.2.2: Entry object found from bucket - It will compare provided hash of key, key and calls equals() method of key with Entry object found from bucket. 
      • get#1.2.2.1: If everything matched then it returns the Entry object. 
      • get#1.2.2.2: If not matched then it check linked list formed at that bucket location (has `next` element) then do the same check as in get#1.2.2 until it find the correct Entry object from linked list.
Now lets discuss question can be asked in interview related to implementation.

What is the initial capcity and load factor of HashMap?
Initial capacity of HashMap is 16 and load factor is 0.75f.

How HashMap stores data in bucket (array/table)?
Its important to remember that HashMap stores not just hash code of key in Array. It stores hash code, key, value and `next` as Entry object in Array.

What is Hash code/key collision?
When two same or different hash code of key generates same index of bucket location by performing bitwise AND is called hash code/key collision. In this situation HashMap forms linked list at given bucket location (index).

For example "AaAaAa".hashCode() and "AaAaBB".hashCode() generates same hash code.

How HashMap handles Hash code/key collision?
What will happend if two objects have same hash code?
It forms linked list at bucket location.

How HashMap will retrieve value when two keys have same hash code?
Linked list formed at bucket location when two keys have same hash code so it'll travese through all linked list node until it find the correct key and equals method of key retruns true. Read get1.2.2 for complete understanding.

Where does HashMap stores null key?
null key will be stored at 0th index of bucket.

Why String, Integer and other wrapper classesare good choice for HashMap key?
String, Integer and other wrapper classes are immutable so its best choice to use it as key. Why String is immutable in Java?

Can we use mutable key in HashMap?
Yes, You can use mutable key but its not good choice because it'll fail to retrive correct value in get(key).

Can we use our own custom object as key in HashMap?
Yes, You can use your custom object as key in HashMap but its necessary to consider immutablity of that object and also implementing hashCode() and equals() method in your class.

How HashMap is improved in Java 8?
Prior to Java 8, HashMap forms linked list in case of hash collision. Now from Java 8 when hash collision occurs and it hits following threshold TREEIFY_THRESHOLD = 8 and MIN_TREEIFY_CAPACITY = 64 then it uses TreeNode to store Entry object to improve HashMap get performance.
Why HashMap resize when it hits TREEIFY_THRESHOLD value which is not required?

Which tree stucture is used in Java 8 to improve performance of HashMap?
red-black tree structure is used to improce performance of HashMap.

References
Popular Map interview questions in Java
How LinkedHashMap works internally in Java?

0 comments :