41 Hashing, Hash Tables, and Hash Maps
41.1 Background and Motivation
41.1.1 The Power and Constraints of Indexing
Indexing allows for direct access to individual elements within an array, using an integer as a specific reference, referred to as an ‘index’. When called with the same index, an array always returns the same element, provided the array itself hasn’t been altered. For instance, to retrieve the 12th element of an array arr, you would use arr[11].
Mathematically, the formula to find the memory address of an array element is given by:
baseAddress + (index * sizeOfElement)
While this concept is powerful and efficient, it encounters certain limitations when extended beyond its original design. Let’s delve into one of these constraints.
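The address formula can be checked with a quick sketch; the base address (1000) and element size (4 bytes) below are made-up values for illustration:

```java
public class ArrayAddress {
    // baseAddress + (index * sizeOfElement), using illustrative numbers.
    static long address(long baseAddress, int index, int sizeOfElement) {
        return baseAddress + (long) index * sizeOfElement;
    }

    public static void main(String[] args) {
        // A hypothetical int array starting at address 1000 with 4-byte
        // elements: the 12th element (index 11) sits at 1000 + 11 * 4.
        System.out.println(address(1000, 11, 4)); // prints 1044
    }
}
```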
41.1.2 The Challenge of Non-numeric Indexing
Imagine a scenario where we want to access an array element using a string as an index, for example, arr["dhruv"]. What’s the issue here?
The problem lies in the arithmetic: baseAddress + ("dhruv" * sizeOfElement) is not feasible because the index, “dhruv”, is not a numeric value. Without the ability to perform this operation, we’re unable to directly use strings as indices in an array.
41.1.3 Mapping Strings to Numbers: A Key to Overcoming the Constraint
Consider an array of strings where we map numeric indices to string values:
0 -> "Alice"
1 -> "Bob"
2 -> "Charlie"
With an array allowing us to map a number to a string, could we use a similar mechanism to map strings to numbers? Absolutely!
One approach involves a linear search for a string in the array, noting its index. For instance, let’s assume we find the string “Dhruv” at index 5:
0 -> "Alice"
1 -> "Bob"
2 -> "Charlie"
...
5 -> "Dhruv"
The index where we locate the string “Dhruv” (5, in this case) can serve as a key in another array, enabling access to data associated with “Dhruv”. Nevertheless, this method, while feasible, isn’t very efficient: linear searching becomes significantly slow on large datasets, raising the question - is there a better solution? We’ll explore the answer to this question in the upcoming sections.
41.1.4 Hashing: A Key to Efficient Mapping
This brings us to hashing, an ingenious method that enables us to map non-numeric keys, such as strings, to numeric indices, achieving efficiency even with a large volume of data. At the core of this mechanism is a hash function, which transforms a given key into a number that serves as an index within an array.
A hash function takes a key as input and produces an index, pointing to a location within the hash table’s array. An effective hash function should meet the following criteria:
- Uniform distribution: The function should scatter keys evenly across the array, thereby reducing the likelihood of collisions (where multiple keys map to the same index).
- Minimal collisions: It should minimize the occurrence of hash collisions.
- Swift computation: The function should quickly compute the hash, thereby enabling rapid data insertion, retrieval, and deletion.
- Deterministic output: The function should consistently produce the same output for a given input.
Consider a rudimentary hash function that transforms the first character of a string into its ASCII code:
hash("dhruv") = ASCII('d') = 100
The hash function yields 100, which we can utilize as an index in an array to store or retrieve data related to “dhruv”. This illustrates how hashing enables us to use strings (and other non-numeric keys) as indices, thus offering a practical solution to efficient mapping.
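This rudimentary first-character hash can be written in a few lines of Java; the class name is illustrative, and the function assumes a non-empty string:

```java
public class FirstCharHash {
    // Rudimentary hash: the code of the first character of the key.
    // A real hash function would use every character, as later
    // sections show.
    static int hash(String key) {
        return key.charAt(0);
    }

    public static void main(String[] args) {
        System.out.println(hash("dhruv")); // 'd' has code 100
    }
}
```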
41.2 Hash Functions
41.2.1 Introduction
In our earlier discussion, we mapped the string “dhruv” to certain data, akin to mapping an index to specific data in an array. The resulting data structure, capable of mapping any string to any data, is termed a hash table. The function that conducts this mapping from strings to indices is, as you’ve guessed, the hash function. The procedure of transforming a key such as “dhruv” into the index where we locate its corresponding value is referred to as hashing the key.
As we step further into the realm of hashing, we’ll explore how it enables mapping any object (referred to as a ‘key’) to any other object (termed the ‘record’ or ‘value’). Picture your student ID as a key, while the record stores all relevant data about you attached to this ID.
Figure 2: A hash function maps different keys (string, numeric, or otherwise) to different indices in an array (the hash table), which in turn point to the corresponding data or records.
41.2.2 Hashing
We can conceptualize hashing as a mechanism for storing and retrieving records from a database, similar to an adept librarian who knows exactly where to fetch a book or where to replace it on the vast shelves. Given a search key value, hashing facilitates the insertion, deletion, and search for records. The speed and efficiency of these operations is remarkable when the hashing is properly implemented, often examining merely one or two records for each operation. This outperforms the average cost of \(O(\log n)\) required to execute a binary search on a sorted array of n records, or to conduct an operation on a binary search tree.
Despite its simplicity in concept, the implementation of hashing can be surprisingly complex, requiring careful attention to every detail to ensure a correctly working system. A hash system stores records in an array known as a hash table, denoted as HT in our discussions. Hashing operates by applying a hash function, denoted h, to a search key K in order to pinpoint the location in HT that houses the record with key K.
Given that hashing schemes position records in the table based on the needs of the address calculation, records aren’t sorted by value. A position in the hash table is often referred to as a slot. The number of slots in the hash table HT is denoted by the variable M, with slots numbered from 0 to M−1. The ultimate aim of a hashing system is to ensure that for any key value K and a hash function h, \(i=h(K)\) identifies a slot in the table such that \(0 \leq i < M\), with the key of the record at HT[i] being equal to K.
However, hashing isn’t the panacea for all applications. It stumbles in scenarios where multiple records with the same key value are allowed. Nor is it effective in addressing range searches, i.e., locating all records whose key values fall within a particular range. It also falters when it comes to finding the record with the minimum or maximum key value, or traversing the records in key order. Hashing excels at answering the question, ‘Which record, if any, has the key value K?’ It’s most appropriate for exact-match queries, and proves to be incredibly efficient when implemented correctly.
Given the large range of key values, hashing generally takes records and stores them in a table with a relatively small number of slots. This inevitably leads to situations where a hash function maps two keys to the same slot, an event known as a collision.
To illustrate, imagine a classroom of students. What is the likelihood that a pair of students share the same birthday (same day of the year, not necessarily the same year)? With 23 students and 365 “slots” or possible days for a birthday, the odds seem slim; in fact, the probability of a shared birthday already exceeds 50%, and as the number of students grows, a collision becomes nearly certain. A database utilizing hashing must likewise be careful not to create a hash table so large it wastes space.
An ideal hash function should distribute keys to slots such that every slot in the hash table has an equal probability of being filled, given the actual set of keys being used. Unfortunately, we typically have no control over the distribution of key values for the actual records in a given database or collection. Hence, the effectiveness of any particular hash function depends on the actual distribution of the keys used within the allowable key range.
In some cases, incoming data are well distributed across their key range. For instance, if the input is a set of random numbers selected uniformly from the key range, any hash function that assigns the key range so that each slot in the hash table receives an equal share of the range will likely also distribute the input records uniformly within the table.
However, there are numerous scenarios where incoming records are highly clustered or otherwise poorly distributed. It can be challenging to devise a hash function that successfully scatters the records throughout the table, especially if the input distribution is unknown in advance. For example, if the input is a collection of English words (such as a dictionary of words mapped to their frequencies, as used in basic natural language processing), the beginning letter will be poorly distributed.
In summary, while any function can serve as a hash function (i.e., mapping a value to an index), not all can qualify as a good hash function. A function that constantly returns the index 0 is a hash function, albeit a subpar one as it maps everything to 0. A common hash function is the modulus operator: in an N-sized hash table, it’s common to use the key modulo N as the hash function. If N is \(20\), the key 113 will be hashed to index \(113 \% 20 = 13\).
But what happens when multiple pieces of data map to the same index using the modulus operator? For instance, \(53 \% 20 = 13\), \(73 \% 20 = 13\), etc. The solution lies in the use of nested data structures, allowing us to store everything at index 13. We will explore this in more detail later.
41.2.3 Simple Hash Functions
Let’s envision hashing as a real-world scenario. A teacher has a stack of students’ tests, and she wants to categorize them by the last digit of their respective student IDs. In this case, the “bucket” will be the bins labeled with numbers from 0 to 9, and the teacher will place each test paper in the bin that matches the last digit of the student’s ID. This process is analogous to a hash function, and the simple function in this example is equivalent to the modulo operation.
Given a hash table of size 5, we can use this hash function to assign keys to the appropriate index:
HashTable size: 5
HashFunction: key % size
Keys: 15, 28, 47, 10, 33
Indices:
15 % 5 = 0
28 % 5 = 3
47 % 5 = 2
10 % 5 = 0
33 % 5 = 3
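The same arithmetic can be reproduced in Java; this is only a sketch of the mapping step, not a full hash table:

```java
public class ModuloHash {
    // key % size maps any non-negative integer key into 0..size-1.
    static int hash(int key, int size) {
        return key % size;
    }

    public static void main(String[] args) {
        int size = 5;
        for (int key : new int[]{15, 28, 47, 10, 33}) {
            System.out.println(key + " % " + size + " = " + hash(key, size));
        }
    }
}
```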
This quick mapping of keys to indices highlights the simplicity and efficiency of a well-chosen hash function. However, not all hash functions are created equal, and their effectiveness can be dependent on the data they’re being applied to. Let’s explore a few other hash functions that are commonly used in different scenarios.
41.2.4 Various Hash Functions and Their Applications
41.2.4.1 Direct Hashing
Direct hashing is as straightforward as it gets. If we consider the item’s key as its index, this constitutes direct hashing. It’s like walking into a classroom and asking students to sit according to their roll numbers, which are unique to each student. So, a student with roll number 15 would sit at desk number 15.
This simplicity, however, comes with its own limitations:
- Keys must be non-negative integers.
- The size of the hash table must equal the maximum key value plus 1, which can lead to a large table and inefficient use of space if the keys are large.
41.2.4.2 Modulo Hash
The modulo hash function, as seen in our example above, utilizes the remainder of the key divided by the table size M to determine the index. It’s a simple and effective way to convert a larger key range into a manageable index range.
h(K) = K % M
41.2.4.3 Mid-Square Hash
Mid-square hashing takes a unique approach. The key is squared, and a portion of the resulting squared value is used as the index. This technique can be particularly useful when keys are not uniformly distributed.
h(K) = middle_digits(K^2)
41.2.4.4 Mid-Square Hash with Base 2
This variation of mid-square hashing squares the key and extracts the middle bits of the binary representation of the squared value as the index. This technique can be particularly useful for binary keys.
h(K) = middle_bits(K^2)
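A sketch of the decimal mid-square idea follows; how many middle digits to keep, and from where, is a design choice made up for this example (a base-2 variant would extract middle bits of the binary square instead):

```java
public class MidSquareHash {
    // Mid-square sketch: square the key, keep two digits from the
    // middle of the square. The window position is illustrative.
    static int hash(int key) {
        long squared = (long) key * key;
        String digits = Long.toString(squared);
        int start = Math.max(0, digits.length() / 2 - 1);
        int end = Math.min(digits.length(), start + 2);
        return Integer.parseInt(digits.substring(start, end));
    }

    public static void main(String[] args) {
        // 47^2 = 2209; the middle digits "20" become the index.
        System.out.println(hash(47)); // prints 20
    }
}
```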
41.2.4.5 Multiplicative String Hashing
Multiplicative string hashing offers a way to handle string keys. It treats the characters in the string as numbers and combines them using multiplication and a constant. This approach can help achieve a good distribution of string keys in the hash table.
h(K) = (c1 * a^(n-1) + c2 * a^(n-2) + ... + cn) % M
Here, c1, c2, ..., cn are the character codes of the string, a is a constant, n is the length of the string, and M is the size of the hash table.
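The formula above can be computed without explicit powers using Horner’s method; a = 31 (the constant Java’s own String.hashCode uses) and M = 100 are illustrative choices:

```java
public class StringHash {
    // h(K) = (c1*a^(n-1) + c2*a^(n-2) + ... + cn) % M, evaluated
    // left to right via Horner's method.
    static int hash(String key, int a, int m) {
        long h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = (h * a + key.charAt(i)) % m;
        }
        return (int) h;
    }

    public static void main(String[] args) {
        // Always lands in 0..99 for a table of size 100.
        System.out.println(hash("dhruv", 31, 100)); // prints 63
    }
}
```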
Applying our hash function (modulo 5) to the keys from the example above, we end up with a hash table represented in ASCII as follows:
Index | Key
-------------
0 | 15
1 | -
2 | 47
3 | 28
4 | -
Here, we observe that keys 15 and 10 collide at index 0, and keys 28 and 33 collide at index 3; only the first key placed at each index appears in the table. The presence of collisions brings us to another important aspect of hashing: collision resolution strategies, which we will explore in the upcoming sections.
41.2.5 Comparing Different Hash Functions
Various hash functions exist, each offering a distinct balance between computational speed and uniform key distribution, thus leading to different degrees of collision occurrence. The effectiveness of a hash function can significantly influence the overall performance of a hash table.
41.2.5.1 Efficiency versus Complexity Trade-off
For instance, a simple hash function, such as modulo operation, is computationally efficient but tends to distribute keys unevenly across the table. This non-uniform distribution increases the probability of collisions, resulting in deteriorated performance due to the need for collision resolution techniques.
On the other hand, more sophisticated hash functions, including cryptographic hash functions, yield a more uniform distribution of keys across the table. However, their complexity leads to slower computation times, impacting the speed of operations.
In real-world applications, the choice of a hash function is largely influenced by the specific demands of the task and the nature of the data being processed. The primary goal is to strike a balance between achieving a uniform key distribution, minimal collisions, fast computation, and deterministic output.
41.3 Collision Resolution in Hashing: Chaining and Open Addressing
41.3.1 Hash Collisions
Hash collisions, a scenario where multiple keys map to the same index, are an inevitable outcome in hash tables due to the pigeonhole principle, which states that if there are more pigeons than pigeonholes, at least one pigeonhole must contain more than one pigeon. In the context of hashing, the keys represent pigeons, and the indices of the hash table are the pigeonholes. This analogy aptly describes how more keys than available indices will result in collisions. Such collisions can have a detrimental effect on the efficiency of hash table operations, increasing access times for insertions, deletions, and retrievals.
Two prominent strategies for collision resolution include chaining and open addressing.
41.3.2 Chaining in Hash Collisions
Chaining employs an auxiliary data structure, such as a linked list, at each index to accommodate multiple key-value pairs. This strategy effectively creates a ‘chain’ of entries at the index where a collision occurs.
41.3.2.1 Process for Insertion, Search, and Deletion
The following steps outline the handling of key-value pairs during insertion, search, and deletion when using chaining:
Insertion: The hash function calculates the index for the key. If the index is empty, a new data structure is initialized, and the key-value pair is inserted. In case of a non-empty index, the key-value pair is added to the existing data structure.
Search: Upon calculating the index using the hash function, if the index is empty, it indicates the absence of the key in the hash table. If not empty, the data structure at the index is searched to find the key.
Deletion: Similar to the search operation, the index is determined using the hash function. If the index is empty, it means the key does not exist in the hash table. If the index is non-empty, the data structure is searched to locate the key, which is then removed.
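The three operations above can be sketched as a minimal chained table in Java; the class and method names are illustrative, not a standard API:

```java
import java.util.LinkedList;

public class ChainedHashTable {
    static class Entry {
        final String key;
        int value;
        Entry(String key, int value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] slots;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int size) {
        slots = new LinkedList[size];
        for (int i = 0; i < size; i++) slots[i] = new LinkedList<>();
    }

    // Simplified index computation (ignores the Integer.MIN_VALUE edge case).
    private int index(String key) {
        return Math.abs(key.hashCode()) % slots.length;
    }

    void put(String key, int value) {
        for (Entry e : slots[index(key)]) {
            if (e.key.equals(key)) { e.value = value; return; } // update in place
        }
        slots[index(key)].add(new Entry(key, value));           // append to chain
    }

    Integer get(String key) {
        for (Entry e : slots[index(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // empty or exhausted chain: key absent
    }

    boolean remove(String key) {
        return slots[index(key)].removeIf(e -> e.key.equals(key));
    }
}
```

Each operation first hashes to a slot and then walks only that slot’s chain, which is what keeps the expected cost low when keys are well distributed.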
41.3.2.2 Advantages and Disadvantages of Chaining
Chaining presents several benefits and drawbacks:
- Advantages:
- Ease of implementation: The use of pre-existing data structures, like linked lists, simplifies the implementation of chaining.
- Dynamic size management: The data structure at each index can grow or shrink as necessary, enabling efficient space utilization.
- Disadvantages:
- Increased space requirements: Chaining necessitates extra space to store the data structure at each index, contributing to memory overhead.
- Variable access time: The time taken to access key-value pairs is contingent on the length of the data structure at the index, which can vary.
Despite the memory overhead and potentially variable access times, chaining remains a widely used method for resolving hash collisions due to its simplicity and capacity to dynamically manage memory.
41.3.3 Open Addressing in Hash Collisions
Open addressing presents an alternative strategy for collision resolution. It entails finding a substitute index for a key-value pair if the original index is already occupied. This method employs a technique known as probing to find the next available index in the event of a collision. Common probing techniques include linear probing, quadratic probing, and double hashing.
41.3.3.1 Probing Techniques
- Linear probing: If a collision is encountered, the hash table is scanned sequentially (one index at a time) until an empty slot is discovered.
- Quadratic probing: During a collision, the hash table is searched quadratically (by incrementing the index by the square of the probe number) until a vacant slot is identified.
- Double hashing: In the event of a collision, a secondary hash function is utilized to determine a new index for the key-value pair. This process is repeated until an empty slot is found.
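Each technique reduces to a formula for the i-th probe (i = 0, 1, 2, ...) from the home slot; a minimal sketch, where M is the table size and h2 is an assumed nonzero value from a secondary hash function:

```java
public class Probing {
    static int linear(int home, int i, int m)    { return (home + i) % m; }
    static int quadratic(int home, int i, int m) { return (home + i * i) % m; }
    static int doubleHash(int home, int i, int h2, int m) {
        return (home + i * h2) % m; // h2 comes from a second hash function
    }
}
```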
41.3.3.2 Insertion, Search, and Deletion
When interacting with a hash table operating under open addressing, three main operations emerge: insertion, search, and deletion. The performance of these operations deeply intertwines with the hash function and probing technique utilized:
Insertion: To insert a key-value pair, compute the hash function to determine the initial index. If that slot is unoccupied, place the key-value pair there. If a collision occurs, apply the probing technique to locate an alternative index for the pair.
Search: To find a key, repeat the hashing process to reach the first candidate slot. If that slot is empty, the key does not exist in the hash table. If it is occupied, compare the stored key with the desired one: a match means the search succeeds; a mismatch means the probing technique determines the next index to examine. Repeat until the key is found or an empty slot ends the search.
Deletion: To remove a key-value pair, perform the same hashing and searching process. If the key is located, remove the pair and mark the slot as deleted (a so-called tombstone) rather than simply emptying it. A plain deletion could break the probe sequences of other keys in the same cluster, so the deleted state must be remembered or the cluster rearranged.
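A linear-probing sketch of these three operations, storing bare keys and using a tombstone marker for deletions (the names and the marker value are illustrative):

```java
public class LinearProbingTable {
    // A deleted slot keeps a tombstone so that probe sequences passing
    // through it continue during a search instead of stopping early.
    private static final String TOMBSTONE = "<deleted>";
    private final String[] keys;

    LinearProbingTable(int size) { keys = new String[size]; }

    // Simplified home-slot computation (ignores the MIN_VALUE edge case).
    private int home(String key) {
        return Math.abs(key.hashCode()) % keys.length;
    }

    boolean insert(String key) {
        int i = home(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null || keys[i].equals(TOMBSTONE)) {
                keys[i] = key; // empty or tombstoned slots are reusable
                return true;
            }
            i = (i + 1) % keys.length; // linear probe
        }
        return false; // table full
    }

    boolean contains(String key) {
        int i = home(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null) return false; // truly empty slot ends search
            if (keys[i].equals(key)) return true;
            i = (i + 1) % keys.length;         // tombstones are stepped over
        }
        return false;
    }

    boolean remove(String key) {
        int i = home(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null) return false;
            if (keys[i].equals(key)) { keys[i] = TOMBSTONE; return true; }
            i = (i + 1) % keys.length;
        }
        return false;
    }
}
```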
41.3.3.3 Advantages and Disadvantages
Like all data structures, open addressing presents both advantages and disadvantages, a balance one must consider carefully:
- Advantages:
- Memory-Efficient: Open addressing demands no extra storage for additional data structures at each index, presenting a more frugal option.
- Defined Size: The static size of the hash table allows for predictable memory usage, a boon when memory resources are constrained.
- Disadvantages:
- Clustering Phenomenon: Certain probing techniques can induce clusters of key-value pairs, hindering efficient access.
- Deletion Dilemmas: The removal of key-value pairs can disrupt the structure, creating “holes” within clusters that must be addressed.
While open addressing provides a viable alternative for resolving hash collisions and may outperform chaining in memory-constrained situations, it’s not the panacea for all use cases. One must consider clustering phenomena and deletion challenges, which could tip the scales unfavorably.
41.4 Evaluating Complexity and Load Factor in Hash Tables
In order to properly assess the complexity of hash functions and hash tables, it’s important to understand the time required to perform fundamental operations such as search, insertion, and deletion. These operations typically consist of two essential steps:
- Computation of the hash function for the given key.
- Traversal of the list of key-value pairs at the computed index.
41.4.1 Analyzing Time Complexity in Hash Computation
In the first step, the time complexity depends on both the key and the hash function used. Let’s take an example where the key is a string, such as “abcd”. In this case, the complexity of its hash function might depend on the length of the string. However, when we’re dealing with an exceptionally large number of entries n in the map, the length of the keys becomes almost negligible compared to n. Therefore, we can treat the computation of the hash function as a constant-time operation, i.e., O(1).
41.4.2 Investigating Time Complexity in List Traversal
For the second step, which involves traversing the list of key-value pairs at the computed index, the time complexity can be quite variable. In the worst-case scenario, where all n entries end up at the same index, the time complexity escalates to O(n). However, thanks to the substantial amount of research dedicated to designing hash functions that distribute keys uniformly in the array, this situation occurs very infrequently.
41.4.3 Understanding the Load Factor
The load factor, represented by the symbol λ, plays a pivotal role in the performance of our hash table. It’s defined as n/b, where n represents the number of entries and b stands for the size of the array. Hence, it is the average number of entries at each index:
λ = n/b
It’s crucial to keep this load factor low in order to limit the number of entries at a single index, thus keeping the complexity close to a constant, i.e., O(1).
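As a quick worked example (the numbers are chosen only for illustration):

```java
public class LoadFactor {
    // λ = n / b: number of entries divided by the array size.
    static double loadFactor(int entries, int buckets) {
        return (double) entries / buckets;
    }

    public static void main(String[] args) {
        // 12 entries in 16 slots gives λ = 0.75, a common resize threshold.
        System.out.println(loadFactor(12, 16)); // prints 0.75
    }
}
```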
41.4.4 The Interplay Between Load Factor and Complexity
In order to maintain the load factor within an acceptable range, we can resize the hash table when the load factor surpasses a certain predefined threshold. This action effectively helps to maintain the complexity of the hash table operations around O(1) by ensuring a uniform distribution of the keys across a larger array.
In conclusion, a comprehensive understanding of the complexity and load factor of hash functions is key to designing efficient hash tables. By carefully selecting an appropriate hash function and managing the load factor, we can aim to achieve a near-constant time complexity for various operations in a hash table.
41.5 Rehashing
Rehashing—quite literally, hashing anew—is a mechanism that helps manage the load factor when it swells beyond a certain threshold. This predefined value often defaults to 0.75. If the load factor rises above this threshold, the complexity of hash table operations increases, thereby prompting a need for rehashing.
During rehashing, the size of the array (typically doubling) is augmented, and all existing values are rehashed, subsequently being placed into the new, more capacious array. This enlargement of the array helps in maintaining a lower load factor and thereby, lower complexity.
41.5.1 The Need for Rehashing
As key-value pairs accumulate in the map, they contribute to an increasing load factor. As previously detailed, a rising load factor implies an escalating time complexity, which could potentially compromise the desirable time complexity of O(1). Consequently, to restore the balance, rehashing is employed, expanding the size of the bucketArray to dial down both the load factor and the time complexity.
41.5.2 The Mechanics of Rehashing
Rehashing in a hash table typically unfolds as follows:
- Monitor the load factor with each addition of a new entry to the map.
- If the load factor surpasses its predefined value (or defaults to 0.75 if not specified), initiate rehashing.
- As part of the rehashing process, construct a new array that’s double the size of the previous one and assign it as the new bucketArray.
- Iterate through each element in the former bucketArray and employ the insert() method for each, allowing it to find its place in the new, enlarged bucketArray.
The diagram below offers a visual representation of this rehashing process:
Initial bucketArray (size = 4):
+---+---+---+---+
| | K1| | K2|
+---+---+---+---+
Upon insertion of a new key K3 (load factor > 0.75):
New bucketArray (size = 8):
+---+---+---+---+---+---+---+---+
| | K1| | K2| | | | K3|
+---+---+---+---+---+---+---+---+
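The steps above can be sketched in Java; the linear-probing placement, integer keys, and starting size of 4 are illustrative simplifications of a real implementation:

```java
public class RehashSketch {
    static final double THRESHOLD = 0.75;
    int[] bucketArray = new int[4];
    boolean[] occupied = new boolean[4]; // marks filled slots
    int count = 0;

    void insert(int key) {
        // Rehash first if this insertion would push λ past the threshold.
        if ((double) (count + 1) / bucketArray.length > THRESHOLD) rehash();
        int i = key % bucketArray.length;
        while (occupied[i]) i = (i + 1) % bucketArray.length; // linear probe
        bucketArray[i] = key;
        occupied[i] = true;
        count++;
    }

    private void rehash() {
        int[] old = bucketArray;
        boolean[] oldOccupied = occupied;
        bucketArray = new int[old.length * 2];  // double the size
        occupied = new boolean[old.length * 2];
        count = 0;
        for (int i = 0; i < old.length; i++) {
            if (oldOccupied[i]) insert(old[i]); // re-insert every key
        }
    }
}
```

Inserting a fourth key into the 4-slot table pushes the load factor past 0.75, so the array doubles to 8 slots and the existing keys are hashed anew against the larger size.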
By applying rehashing, a hash table can retain its desired time complexity of O(1) even as the element count increases. It’s noteworthy that rehashing, while an effective mechanism, can be a costly operation—especially if the hash table houses a large number of elements. Nevertheless, since rehashing is triggered infrequently and only when the load factor breaches a certain threshold, the amortized cost of rehashing remains low. This allows hash table operations to sustain their near-constant time complexity.
41.6 Hash Tables and Hash Maps
It’s common for individuals, even computer science students and practitioners, to use the terms hash table and hash map interchangeably. However, these are distinct constructs in computing, each with its unique characteristics, capabilities, and preferred applications.
Hash tables and hash maps, while sharing similarities, diverge significantly in their operational mechanisms:
A hash table implements direct hashing, necessitating that the key either be an integer or directly convertible to one, such as a string of numerals. This integer then serves as the determinant for the index in the hash table.
A hash map, in contrast, embraces indirect hashing, where keys of any data type are permitted. This necessitates an auxiliary hash function that transforms the key into an index within the hash table.
When making the choice between a hash table or a hash map, one must critically examine the problem domain, particularly the data type of the keys:
In scenarios where keys are integers or easily translatable into integers, a hash table might be a more fitting solution. A case in point is when handling student IDs as keys; in such a context, a hash table would be ideal.
If the keys, however, belong to other data types or resist easy conversion into integers, a hash map is often the superior choice. For instance, if you are dealing with strings like usernames or URLs as keys, a hash map tends to be the better fit.
41.7 HashMaps in Java
The HashMap class is a part of the Java Collection Framework and encapsulates the functionality of a hash table. As a key-value pair repository, each key within a HashMap is unique, and the ordering of keys is not maintained.
Here’s how to utilize a HashMap in Java:
Import the HashMap class: First, you must import the HashMap class from the java.util package to use it within your Java code:
import java.util.HashMap;
Instantiate a HashMap: To create a new HashMap instance, employ the following syntax:
HashMap<String, Integer> myMap = new HashMap<String, Integer>();
Inserting elements: To add key-value pairs to the HashMap, make use of the put() method:
myMap.put("apple", 3);
myMap.put("banana", 5);
myMap.put("orange", 2);
Retrieving elements: To fetch the value associated with a specific key, use the get() method:
int apples = myMap.get("apple"); // returns 3
int oranges = myMap.get("orange"); // returns 2
Removing elements: To delete a key-value pair from the HashMap, apply the remove() method:
myMap.remove("banana");
Verifying key existence: To verify whether a particular key resides within the HashMap, call the containsKey() method:
boolean hasApple = myMap.containsKey("apple"); // returns true
boolean hasGrape = myMap.containsKey("grape"); // returns false
Iterating over keys: To traverse the keys in a HashMap, you can utilize a for-each loop in conjunction with the keySet() method:
for (String fruit : myMap.keySet()) {
    System.out.println(fruit + ": " + myMap.get(fruit));
}
Iterating over values: If you wish to iterate over the values within a HashMap, a for-each loop alongside the values() method can be used:
for (Integer count : myMap.values()) {
    System.out.println(count);
}
Iterating over key-value pairs: To cycle through the key-value pairs in a HashMap, employ a for-each loop with the entrySet() method (the entry type is Map.Entry, imported from java.util.Map):
for (Map.Entry<String, Integer> entry : myMap.entrySet()) {
    System.out.println(entry.getKey() + ": " + entry.getValue());
}
The HashMap proves to be a handy data structure when it comes to efficiently storing key-value pairs. With constant-time performance for common operations such as put, get, and remove, it emerges as an invaluable asset for a wide array of applications.
41.8 HashTables in Java
In Java, Hashtable is a collection class that implements the Map interface, utilizing a hash table for storage. It shares similarities with HashMap; however, it distinguishes itself through its inherent synchronization, which confers thread-safety. The Hashtable class stores unique keys and their corresponding values and, much like HashMap, does not maintain any ordering of these keys.
Here is how one might put HashTable to use in Java:
Importing the Hashtable Class: The Hashtable class can be integrated into your Java code via importing from the java.util package:
import java.util.Hashtable;
Instantiating a HashTable: The creation of a new HashTable can be achieved with the following syntax:
Hashtable<String, Integer> myTable = new Hashtable<String, Integer>();
Inserting Elements: Key-value pairs can be added to the Hashtable using the put() method:
myTable.put("apple", 3);
myTable.put("banana", 5);
myTable.put("orange", 2);
Accessing Elements: The value associated with a specific key can be retrieved using the get() method:
int apples = myTable.get("apple"); // 3
int oranges = myTable.get("orange"); // 2
Deleting Elements: To remove a key-value pair from the Hashtable, make use of the remove() method:
myTable.remove("banana");
Key Existence Verification: To ascertain the existence of a key in the Hashtable, the containsKey() method is handy:
boolean hasApple = myTable.containsKey("apple"); // true
boolean hasGrape = myTable.containsKey("grape"); // false
Iteration Over Keys: Iteration through the keys in a Hashtable can be achieved with a for-each loop in conjunction with the keySet() method:
for (String fruit : myTable.keySet()) {
    System.out.println(fruit + ": " + myTable.get(fruit));
}
Iteration Over Values: To traverse through the values in a Hashtable, a for-each loop combined with the values() method works well:
for (Integer count : myTable.values()) {
    System.out.println(count);
}
Iteration Over Key-Value Pairs: To iterate through the key-value pairs in a Hashtable, the entrySet() method can be used in combination with a for-each loop (note that the entry type is Map.Entry, from java.util.Map):
for (Map.Entry<String, Integer> entry : myTable.entrySet()) {
    System.out.println(entry.getKey() + ": " + entry.getValue());
}
The Hashtable class proves useful when key-value pairs must be stored and thread-safe operation is required. However, because synchronization imposes a performance overhead, a HashMap is generally the more efficient choice when thread safety is not a concern.
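Putting the steps above together, here is a short runnable sketch (the class name and sample data are illustrative):

```java
import java.util.Hashtable;

public class HashtableDemo {
    // Builds a small inventory table using the operations shown above.
    static Hashtable<String, Integer> buildInventory() {
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put("apple", 3);
        table.put("banana", 5);
        table.put("orange", 2);
        table.remove("banana"); // delete a key-value pair
        return table;
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> myTable = buildInventory();
        System.out.println(myTable.get("apple"));         // 3
        System.out.println(myTable.containsKey("grape")); // false: never inserted
        // Iterate over the remaining key-value pairs (order is not guaranteed).
        for (String fruit : myTable.keySet()) {
            System.out.println(fruit + ": " + myTable.get(fruit));
        }
    }
}
```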
41.9 HashSets in Java
A HashSet in Java is a collection class that implements the Set interface and uses a hash table as its storage mechanism. Unlike Hashtable and HashMap, it does not store key-value pairs; it stores unique elements only. The elements are kept in no particular order, and duplicate values are not allowed.
The usage of a HashSet in Java can be illustrated as follows:
Importing the HashSet Class: To integrate the HashSet class into your Java code, import it from the java.util package:
import java.util.HashSet;
Creating a HashSet: A new HashSet can be instantiated using the following syntax:
HashSet<String> mySet = new HashSet<String>();
Adding Elements: Elements can be added to the HashSet using the add() method:
mySet.add("apple");
mySet.add("banana");
mySet.add("orange");
Removing Elements: To remove an element from the HashSet, employ the remove() method:
mySet.remove("banana");
Element Existence Verification: To verify whether an element exists in the HashSet, the contains() method is quite useful:
boolean hasApple = mySet.contains("apple"); // true
boolean hasGrape = mySet.contains("grape"); // false
Iteration Over Elements: To traverse the elements stored in a HashSet, a for-each loop is effective:
for (String fruit : mySet) {
    System.out.println(fruit);
}
The HashSet class is particularly useful when you need to store a unique set of elements without any particular order. Because it offers constant-time average performance for common operations such as add, remove, and contains, it is an efficient choice for a wide range of applications.
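The uniqueness guarantee can be observed directly: inserting a duplicate leaves the set unchanged. A minimal sketch (names and data illustrative):

```java
import java.util.HashSet;

public class HashSetDemo {
    // Returns the number of distinct elements after inserting a duplicate.
    static int distinctCount() {
        HashSet<String> mySet = new HashSet<>();
        mySet.add("apple");
        mySet.add("banana");
        mySet.add("apple"); // duplicate: ignored, and add() returns false
        return mySet.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // 2
    }
}
```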
41.10 hashCode and equals in Java
At the heart of Java's object-oriented structure lies the Object class, the ultimate superclass of every Java class. Two of its methods, hashCode and equals, form the fundamental basis for how objects interact in various Java collections. They play a crucial role in object comparison and are instrumental in the design and behavior of hash-based collections such as HashSet, HashMap, and Hashtable.
41.10.1 Understanding hashCode
The hashCode method produces an object's hash code: an integer that hash-based collections use to decide where to store and look up the object. The method signature is as follows:
public int hashCode()
By default, the method returns a hash code derived from the object's identity (often based on its memory address, though this is implementation-specific). Subclasses can override this behavior to provide custom hash code generation. For hash-based collections to function efficiently, and to maintain object-to-hash-code consistency, a well-defined hashCode method should adhere to these principles:
- If equals() considers two objects equal, their hash codes must be identical.
- The inverse is not required: objects with matching hash codes are not necessarily equal according to their equals() method.
- An object's hash code should remain consistent unless data that influences the equals() method changes.
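These rules can be observed with two distinct but equal String objects, since String overrides both methods; a minimal sketch using only the standard library:

```java
public class HashContractDemo {
    public static void main(String[] args) {
        // Two distinct String objects with equal contents.
        String a = new String("dhruv");
        String b = new String("dhruv");
        System.out.println(a == b);                       // false: different objects
        System.out.println(a.equals(b));                  // true: equal contents
        System.out.println(a.hashCode() == b.hashCode()); // true: required by the contract
    }
}
```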
41.10.2 Overriding the hashCode Method
For custom classes, overriding hashCode
becomes essential if the equals()
method is also overridden. Ensuring that both these methods are in sync upholds the general hashCode
contract, and thereby guarantees the smooth operation of hash-based data structures.
Below is an instance of a custom Person
class, which overrides both equals()
and hashCode()
methods:
import java.util.Objects;

public class Person {
    private String name;
    private int age;

    // Constructor, getters, and setters

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        Person person = (Person) obj;
        return age == person.age && Objects.equals(name, person.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }
}
Here, the equals()
method verifies if two Person
objects share the same name and age. Meanwhile, the hashCode()
method leverages the utility function Objects.hash()
, which generates a hash code based on the name and age fields.
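With both methods overridden consistently, Person objects behave correctly inside hash-based collections. The sketch below uses a compact stand-in for the class above (field values are illustrative) to show that a HashSet stores two equal Person objects only once:

```java
import java.util.HashSet;
import java.util.Objects;

public class PersonSetDemo {
    // Compact stand-in for the Person class above, with the same
    // consistent equals()/hashCode() pair.
    static class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null || getClass() != obj.getClass()) return false;
            Person person = (Person) obj;
            return age == person.age && Objects.equals(name, person.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, age);
        }
    }

    // Adds two equal Person objects; the set keeps only one.
    static int sizeAfterDuplicateInsert() {
        HashSet<Person> people = new HashSet<>();
        people.add(new Person("Alice", 30));
        people.add(new Person("Alice", 30));
        return people.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterDuplicateInsert()); // 1
    }
}
```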
41.10.3 hashCode in the Context of Java Collections
In hash-based collections like HashSet
, HashMap
, and Hashtable
, the hashCode
method takes center stage, boosting their performance through efficient storage and retrieval of objects based on hash codes.
Maintaining an aptly implemented hashCode
method for the objects being stored is paramount when working with these collections. Missteps could result in compromised performance or erroneous behavior.
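To see what such a misstep looks like, consider a hypothetical Point class that overrides equals() but not hashCode(). Two equal points then usually land in different hash buckets, so a HashSet will typically fail to find an element that is, by equals(), already present:

```java
import java.util.HashSet;

public class BrokenHashDemo {
    // Deliberately broken: equals() is overridden but hashCode() is not,
    // so equal Points usually produce different (identity-based) hash codes.
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof Point)) return false;
            Point p = (Point) obj;
            return x == p.x && y == p.y;
        }
        // hashCode() intentionally NOT overridden: violates the contract.
    }

    // Deterministic part of the demo: the two points really are equal.
    static boolean equalByEquals() {
        return new Point(1, 2).equals(new Point(1, 2));
    }

    public static void main(String[] args) {
        HashSet<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        // Equal by equals(), but almost always hashed to a different bucket,
        // so this lookup usually (not provably always) prints false.
        System.out.println(set.contains(new Point(1, 2)));
        System.out.println(equalByEquals()); // true
    }
}
```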
In essence, the hashCode
method in Java, which stems from the Object
class, provides a default blueprint for hash code generation. While creating custom classes, it is critical to override the hashCode
method in sync with the equals()
method. This symbiosis between the two methods ensures a seamless operation of hash-based data structures like HashSet
and HashMap
.
While this section provides an understanding of hashCode
and equals
in Java, the next chapter will further delve into comparing objects in Java and the intricacies involved therein.