HashMap and HashSet internal working

HashMap and HashSet in Java: Internal Working with Examples

Java’s HashMap and HashSet are widely used data structures that provide efficient storage and retrieval of data. However, their internal working mechanisms are often asked in technical interviews, especially their use of hashing, handling of collisions, and performance characteristics.

Let’s explore their internal workings, accompanied by examples and typical interview questions.

1. HashMap Internal Working

A HashMap in Java is a key-value pair data structure that allows fast insertion, deletion, and retrieval of data. It uses hashing to map keys to values.

Basic Structure

Array of Buckets: Internally, HashMap is implemented as an array of buckets.
Hashing: Each key is hashed to produce a hash code, which determines the index of the array (bucket) where the key-value pair will be stored.
Collisions: When multiple keys hash to the same bucket (index), it handles collisions using either a linked list or a balanced tree (since Java 8).
Load Factor and Rehashing: When the HashMap exceeds a certain load factor (default is 0.75), it resizes (doubles the size of the array) and rehashes all the keys to distribute them evenly.

Internal Working Steps:

Hashing: The hashCode() of the key is computed, and it is used to find the bucket where the key-value pair will be stored.
Index Calculation: The bucket index is calculated as (hashCode & (n - 1)) where n is the length of the bucket array.
Collision Handling: If two keys hash to the same index (collision):
- Pre-Java 8: A linked list is used to store multiple key-value pairs in the same bucket.
- Post-Java 8: If the number of collisions at a bucket exceeds a threshold (8), the linked list is converted into a balanced binary tree (Red-Black Tree) to optimize performance.
Rehashing: If the load factor exceeds the threshold, the HashMap grows and rehashes all entries.

Example:

java

Copy code

import java.util.HashMap;

public class HashMapExample {

public static void main(String[] args) {

// Creating a HashMap with key as String and value as Integer

HashMap<String, Integer> map = new HashMap<>();

// Adding elements (key-value pairs)

map.put("John", 25);

map.put("Emma", 30);

map.put("Sam", 22);

// Accessing an element

System.out.println("John's Age: " + map.get("John")); // Output: 25

// Iterating through HashMap

map.forEach((k, v) -> System.out.println(k + ": " + v));

}

Performance:

Time Complexity:
- Best case: O(1) for put/get operations.
- Worst case: O(n) (if all keys hash to the same bucket), but after Java 8, with balanced trees, it becomes O(log n) for collisions.
Space Complexity: Depends on the number of entries and the load factor.

Common Interview Questions on HashMap:

How does HashMap work internally?
- Explain the bucket array, hashing mechanism, and collision resolution techniques (linked list vs. red-black tree).
What happens when two keys have the same hash code?
- They are placed in the same bucket and handled by a linked list or a balanced tree if collisions exceed a threshold.
What is the load factor in HashMap?
- The load factor defines the threshold for resizing. The default load factor is 0.75, meaning the HashMap will resize when it is 75% full.
What is rehashing? When does it occur?
- Rehashing occurs when the HashMap grows and redistributes the entries to a new bucket array after resizing.

2. HashSet Internal Working

A HashSet is a data structure that stores unique elements. Internally, it uses a HashMap to ensure the uniqueness of elements, where the element is stored as the key, and a constant dummy object (PRESENT) is stored as the value.

Key Characteristics:

Uniqueness: All elements in a HashSet must be unique. It ensures this by storing elements as keys in the underlying HashMap.
No Order: Elements are not stored in any particular order.
Hashing: The HashSet uses hashing to store elements, which allows for constant-time complexity on average for add, remove, and contains operations.

Internal Working Steps:

Uses HashMap: Internally, HashSet uses a HashMap to store its elements. When you add an element to a HashSet, it is inserted as a key in the HashMap with a dummy value.
Ensuring Uniqueness: If an element already exists (i.e., its key is already present in the HashMap), it is not added again, ensuring uniqueness.
Hashing and Collisions: It uses the same hashing mechanism and collision resolution as HashMap.

Example:

java

Copy code

import java.util.HashSet;

public class HashSetExample {

public static void main(String[] args) {

// Creating a HashSet of String

HashSet<String> set = new HashSet<>();

// Adding elements

set.add("Apple");

set.add("Banana");

set.add("Mango");

set.add("Apple"); // Duplicate element will not be added

// Display the set (order is not guaranteed)

System.out.println(set); // Output: [Apple, Banana, Mango]

// Checking if an element exists

System.out.println(set.contains("Apple")); // Output: true

}

Performance:

Time Complexity:
- Best case: O(1) for add, remove, and contains operations.
- Worst case: O(n) if there are many collisions.
Space Complexity: Dependent on the number of elements in the HashSet and the load factor.

Common Interview Questions on HashSet:

How is HashSet implemented internally?
- Explain that HashSet uses a HashMap internally, where the element is stored as the key, and a constant dummy value is used for the value.
How does HashSet ensure uniqueness?
- By checking the keys of the underlying HashMap. If the key (element) already exists, the new element is not added.
Why is HashSet faster than TreeSet?
- HashSet uses hashing, giving O(1) average time complexity for operations, while TreeSet uses a red-black tree with O(log n) time complexity.

Comparison of HashMap and HashSet

Feature

HashMap

HashSet

Structure

Key-Value pairs

Stores only unique elements

Internal Data

Uses an array of buckets (hash table)

Uses a HashMap internally

Null Values

Allows one null key and multiple null values

Allows one null element

Duplicate Values

Keys must be unique; values can be duplicated

Does not allow duplicate elements

Time Complexity

O(1) average for put/get/remove

O(1) average for add/remove/contains

Ordering

No order (unless LinkedHashMap is used)

No order (unless LinkedHashSet is used)

Use Case

Key-value mappings, fast lookups by key

Collection of unique elements

Typical Interview Questions on HashMap and HashSet:

What is the difference between HashMap and HashSet?
- Explain the difference in purpose (key-value pairs vs. unique elements) and internal structure (hash table for HashMap, HashMap used internally for HashSet).
Why is HashSet faster than ArrayList for search operations?
- HashSet uses hashing, providing O(1) time complexity, while ArrayList requires O(n) time for search operations.
What happens if the hashCode() method is not implemented correctly in a class used as a key in HashMap?
- If hashCode() is not implemented correctly, the HashMap may fail to locate keys correctly, causing collisions, or it may prevent keys from being found during lookups.
Why is the load factor important in HashMap?
- The load factor determines when the HashMap should resize. A low load factor increases the frequency of resizing, while a high load factor can lead to more collisions.

Page updated

Google Sites

Report abuse