Hubbry Logo
search
logo

Dynamic perfect hashing

logo
Community Hub0 Subscribers
Write something...
Be the first to start a discussion here.
Be the first to start a discussion here.
See all
Dynamic perfect hashing

In computer science, dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure. While more memory-intensive than its hash table counterparts,[citation needed] this technique is useful for situations where fast queries, insertions, and deletions must be made on a large set of elements.

The problem of optimal static hashing was first solved in general by Fredman, Komlós and Szemerédi. In their 1984 paper, they detail a two-tiered hash table scheme in which each bucket of the (first-level) hash table corresponds to a separate second-level hash table. Keys are hashed twice—the first hash value maps to a certain bucket in the first-level hash table; the second hash value gives the position of that entry in that bucket's second-level hash table. The second-level table is guaranteed to be collision-free (i.e. perfect hashing) upon construction. Consequently, the look-up cost is guaranteed to be O(1) in the worst-case.

In the static case, we are given a set with a total of x entries, each one with a unique key, ahead of time. Fredman, Komlós and Szemerédi pick a first-level hash table with size buckets.

To construct, x entries are separated into s buckets by the top-level hashing function, where . Then for each bucket with k entries, a second-level table is allocated with slots, and its hash function is selected at random from a universal hash function set so that it is collision-free (i.e. a perfect hash function) and stored alongside the hash table. If the hash function randomly selected creates a table with collisions, a new hash function is randomly selected until a collision-free table can be guaranteed. Finally, with the collision-free hash, the k entries are hashed into the second-level table.

The quadratic size of the space ensures that randomly creating a table with collisions is infrequent and independent of the size of k, providing linear amortized construction time. Although each second-level table requires quadratic space, if the keys inserted into the first-level hash table are uniformly distributed, the structure as a whole occupies expected space, since bucket sizes are small with high probability.

The first-level hash function is specifically chosen so that, for the specific set of x unique key values, the total space T used by all the second-level hash tables has expected space, and more specifically . Fredman, Komlós and Szemerédi showed that given a universal hashing family of hash functions, at least half of those functions have that property.

Dietzfelbinger et al. present a dynamic dictionary algorithm that, when a set of n items is incrementally added to the dictionary, membership queries always run in constant time and therefore worst-case time, the total storage required is (linear), and expected amortized insertion and deletion time (amortized constant time).

In the dynamic case, when a key is inserted into the hash table, if its entry in its respective subtable is occupied, then a collision is said to occur and the subtable is rebuilt based on its new total entry count and randomly selected hash function. Because the load factor of the second-level table is kept low , rebuilding is infrequent, and the amortized expected cost of insertions is . Similarly, the amortized expected cost of deletions is .

See all
User Avatar
No comments yet.