Hubbry Logo
search
logo

Linear hashing

logo
Community Hub0 Subscribers
Write something...
Be the first to start a discussion here.
Be the first to start a discussion here.
See all
Linear hashing

Linear hashing (LH) is a dynamic data structure which implements a hash table and grows or shrinks one bucket at a time. It was invented by Witold Litwin in 1980. It has been analyzed by Baeza-Yates and Soza-Pollman. It is the first in a number of schemes known as dynamic hashing such as Larson's Linear Hashing with Partial Extensions, Linear Hashing with Priority Splitting, Linear Hashing with Partial Expansions and Priority Splitting, or Recursive Linear Hashing.

The file structure of a dynamic hashing data structure adapts itself to changes in the size of the file, so expensive periodic file reorganization is avoided. A Linear Hashing file expands by splitting a predetermined bucket into two and shrinks by merging two predetermined buckets into one. The trigger for a reconstruction depends on the flavor of the scheme; it could be an overflow at a bucket or load factor (i.e., the number of records divided by the number of buckets) moving outside of a predetermined range. In Linear Hashing there are two types of buckets, those that are to be split and those already split. While extendible hashing splits only overflowing buckets, spiral hashing (a.k.a. spiral storage) distributes records unevenly over the buckets such that buckets with high costs of insertion, deletion, or retrieval are earliest in line for a split.

Linear Hashing has also been made into a scalable distributed data structure, LH*. In LH*, each bucket resides at a different server. LH* itself has been expanded to provide data availability in the presence of failed buckets. Key based operations (inserts, deletes, updates, reads) in LH and LH* take maximum constant time independent of the number of buckets and hence of records.

Records in LH or LH* consists of a key and a content, the latter basically all the other attributes of the record. They are stored in buckets. For example, in Ellis' implementation, a bucket is a linked list of records. The file allows the key based CRUD operations create or insert, read, update, and delete as well as a scan operations that scans all records, for example to do a database select operation on a non-key attribute. Records are stored in buckets whose numbering starts with 0.

The key distinction from schemes such as Fagin's extendible hashing is that as the file expands due to insertions, only one bucket is split at a time, and the order in which buckets are split is already predetermined.

The hash function returns the 0-based index of the bucket that contains the record with key . When a bucket which uses the hash function is split into two new buckets, the hash function is replaced with for both of those new buckets. At any time, at most two hash functions and are used; such that corresponds to the current level. The family of hash functions is also referred to as the dynamic hash function.

Typically, the value of in corresponds to the number of rightmost binary digits of the key that are used to segregate the buckets. This dynamic hash function can be expressed arithmetically as . Note that when the total number of buckets is equal to one, .

Complete the calculations below to determine the correct hashing function for the given hashing key .

See all
User Avatar
No comments yet.