Hash tables are a fundamental data structure used to store key-value pairs efficiently. However, collisions can occur when two keys hash to the same index. Several techniques have been developed to handle these collisions:
Separate Chaining
Separate chaining, also known as open hashing, involves maintaining a list of all elements that hash to the same index. Each slot in the hash table points to a linked list (or another data structure like a binary search tree) that contains all the elements with the same hash value.
Advantages:
- Simple to implement.
- The hash table never fills up; we can always add more elements to the chain.
- Less sensitive to the hash function or load factors.
Disadvantages:
- Poor cache performance due to linked list traversal.
- Extra memory is required for the pointers in the linked lists.
- Potential for long chains, leading to O(n) search time in the worst case[1][3][11][19].
Open Addressing
Open addressing, also known as closed hashing, stores all elements directly within the hash table. When a collision occurs, the algorithm probes for the next available slot according to a specific sequence.
Techniques in Open Addressing:
-
Linear Probing:
- Probes the next slot sequentially until an empty slot is found.
- Simple to implement and has good cache performance.
- Suffers from primary clustering, where a cluster of occupied slots forms, leading to longer search times[1][3][9][14].
-
Quadratic Probing:
- Probes slots using a quadratic function of the form $$i^2$$, where $$i$$ is the number of attempts.
- Reduces primary clustering but can still suffer from secondary clustering.
- Ensures that the probe sequence covers the entire table if the table size is a prime number[1][3][9][14][18].
-
Double Hashing:
- Uses a second hash function to determine the probe sequence.
- Minimizes clustering issues and provides a more uniform distribution of keys.
- More complex to implement and requires careful selection of hash functions to ensure they are relatively prime to the table size[1][3][9][14][18].
Advantages of Open Addressing:
- Better cache performance since all data is stored in the same table.
- No extra memory is needed for pointers.
Disadvantages of Open Addressing:
- Performance degrades as the load factor increases.
- Requires careful management of the load factor to avoid excessive probing[1][3][9][14]...