Java’s HashMap and HashSet are widely used data structures that provide efficient storage and retrieval of data. However, their internal working mechanisms are often asked in technical interviews, especially their use of hashing, handling of collisions, and performance characteristics.
Let’s explore their internal workings, accompanied by examples and typical interview questions.
A HashMap in Java is a key-value pair data structure that allows fast insertion, deletion, and retrieval of data. It uses hashing to map keys to values.
Basic Structure
Array of Buckets: Internally, HashMap is implemented as an array of buckets.
Hashing: Each key is hashed to produce a hash code, which determines the index of the array (bucket) where the key-value pair will be stored.
Collisions: When multiple keys hash to the same bucket (index), it handles collisions using either a linked list or a balanced tree (since Java 8).
Load Factor and Rehashing: When the HashMap exceeds a certain load factor (default is 0.75), it resizes (doubles the size of the array) and rehashes all the keys to distribute them evenly.
Internal Working Steps:
Hashing: The hashCode() of the key is computed, and it is used to find the bucket where the key-value pair will be stored.
Index Calculation: The bucket index is calculated as (hashCode & (n - 1)) where n is the length of the bucket array.
Collision Handling: If two keys hash to the same index (collision):
Pre-Java 8: A linked list is used to store multiple key-value pairs in the same bucket.
Post-Java 8: If the number of collisions at a bucket exceeds a threshold (8), the linked list is converted into a balanced binary tree (Red-Black Tree) to optimize performance.
Rehashing: If the load factor exceeds the threshold, the HashMap grows and rehashes all entries.
Example:
java
Copy code
import java.util.HashMap;
public class HashMapExample {
public static void main(String[] args) {
// Creating a HashMap with key as String and value as Integer
HashMap<String, Integer> map = new HashMap<>();
// Adding elements (key-value pairs)
map.put("John", 25);
map.put("Emma", 30);
map.put("Sam", 22);
// Accessing an element
System.out.println("John's Age: " + map.get("John")); // Output: 25
// Iterating through HashMap
map.forEach((k, v) -> System.out.println(k + ": " + v));
}
}
Performance:
Time Complexity:
Best case: O(1) for put/get operations.
Worst case: O(n) (if all keys hash to the same bucket), but after Java 8, with balanced trees, it becomes O(log n) for collisions.
Space Complexity: Depends on the number of entries and the load factor.
Common Interview Questions on HashMap:
How does HashMap work internally?
Explain the bucket array, hashing mechanism, and collision resolution techniques (linked list vs. red-black tree).
What happens when two keys have the same hash code?
They are placed in the same bucket and handled by a linked list or a balanced tree if collisions exceed a threshold.
What is the load factor in HashMap?
The load factor defines the threshold for resizing. The default load factor is 0.75, meaning the HashMap will resize when it is 75% full.
What is rehashing? When does it occur?
Rehashing occurs when the HashMap grows and redistributes the entries to a new bucket array after resizing.
A HashSet is a data structure that stores unique elements. Internally, it uses a HashMap to ensure the uniqueness of elements, where the element is stored as the key, and a constant dummy object (PRESENT) is stored as the value.
Key Characteristics:
Uniqueness: All elements in a HashSet must be unique. It ensures this by storing elements as keys in the underlying HashMap.
No Order: Elements are not stored in any particular order.
Hashing: The HashSet uses hashing to store elements, which allows for constant-time complexity on average for add, remove, and contains operations.
Internal Working Steps:
Uses HashMap: Internally, HashSet uses a HashMap to store its elements. When you add an element to a HashSet, it is inserted as a key in the HashMap with a dummy value.
Ensuring Uniqueness: If an element already exists (i.e., its key is already present in the HashMap), it is not added again, ensuring uniqueness.
Hashing and Collisions: It uses the same hashing mechanism and collision resolution as HashMap.
Example:
java
Copy code
import java.util.HashSet;
public class HashSetExample {
public static void main(String[] args) {
// Creating a HashSet of String
HashSet<String> set = new HashSet<>();
// Adding elements
set.add("Apple");
set.add("Banana");
set.add("Mango");
set.add("Apple"); // Duplicate element will not be added
// Display the set (order is not guaranteed)
System.out.println(set); // Output: [Apple, Banana, Mango]
// Checking if an element exists
System.out.println(set.contains("Apple")); // Output: true
}
}
Performance:
Time Complexity:
Best case: O(1) for add, remove, and contains operations.
Worst case: O(n) if there are many collisions.
Space Complexity: Dependent on the number of elements in the HashSet and the load factor.
Common Interview Questions on HashSet:
How is HashSet implemented internally?
Explain that HashSet uses a HashMap internally, where the element is stored as the key, and a constant dummy value is used for the value.
How does HashSet ensure uniqueness?
By checking the keys of the underlying HashMap. If the key (element) already exists, the new element is not added.
Why is HashSet faster than TreeSet?
HashSet uses hashing, giving O(1) average time complexity for operations, while TreeSet uses a red-black tree with O(log n) time complexity.
Feature
HashMap
HashSet
Structure
Key-Value pairs
Stores only unique elements
Internal Data
Uses an array of buckets (hash table)
Uses a HashMap internally
Null Values
Allows one null key and multiple null values
Allows one null element
Duplicate Values
Keys must be unique; values can be duplicated
Does not allow duplicate elements
Time Complexity
O(1) average for put/get/remove
O(1) average for add/remove/contains
Ordering
No order (unless LinkedHashMap is used)
No order (unless LinkedHashSet is used)
Use Case
Key-value mappings, fast lookups by key
Collection of unique elements
What is the difference between HashMap and HashSet?
Explain the difference in purpose (key-value pairs vs. unique elements) and internal structure (hash table for HashMap, HashMap used internally for HashSet).
Why is HashSet faster than ArrayList for search operations?
HashSet uses hashing, providing O(1) time complexity, while ArrayList requires O(n) time for search operations.
What happens if the hashCode() method is not implemented correctly in a class used as a key in HashMap?
If hashCode() is not implemented correctly, the HashMap may fail to locate keys correctly, causing collisions, or it may prevent keys from being found during lookups.
Why is the load factor important in HashMap?
The load factor determines when the HashMap should resize. A low load factor increases the frequency of resizing, while a high load factor can lead to more collisions.