Linear hashing
Linear hashing (LH) is a dynamic data structure which implements a hash table and grows or shrinks one bucket at a time. It was invented by Witold Litwin in 1980.[1][2] It has been analyzed by Baeza-Yates and Soza-Pollman.[3] It is the first in a number of schemes known as dynamic hashing[3][4] such as Larson's Linear Hashing with Partial Extensions,[5] Linear Hashing with Priority Splitting,[6] Linear Hashing with Partial Expansions and Priority Splitting,[7] or Recursive Linear Hashing.[8]
The file structure of a dynamic hashing data structure adapts itself to changes in the size of the file, so expensive periodic file reorganization is avoided.[4] A Linear Hashing file expands by splitting a predetermined bucket into two and shrinks by merging two predetermined buckets into one. The trigger for a reconstruction depends on the flavor of the scheme; it could be an overflow at a bucket or load factor (i.e., the number of records divided by the number of buckets) moving outside of a predetermined range.[1] In Linear Hashing there are two types of buckets, those that are to be split and those already split. While extendible hashing splits only overflowing buckets, spiral hashing (a.k.a. spiral storage) distributes records unevenly over the buckets such that buckets with high costs of insertion, deletion, or retrieval are earliest in line for a split.[5]
Linear Hashing has also been made into a scalable distributed data structure, LH*. In LH*, each bucket resides at a different server.[9] LH* itself has been expanded to provide data availability in the presence of failed buckets.[10] Key based operations (inserts, deletes, updates, reads) in LH and LH* take maximum constant time independent of the number of buckets and hence of records.[1][10]
Algorithm details
Records in LH or LH* consist of a key and a content, the latter comprising essentially all the other attributes of the record.[1][10] They are stored in buckets. For example, in Ellis' implementation, a bucket is a linked list of records.[2] The file allows the key-based CRUD operations create or insert, read, update, and delete, as well as a scan operation that reads all records, for example to do a database select operation on a non-key attribute.[10] Records are stored in buckets whose numbering starts with 0.[10]
The key distinction from schemes such as Fagin's extendible hashing is that as the file expands due to insertions, only one bucket is split at a time, and the order in which buckets are split is already predetermined.[11]
Hash functions
The hash function h_i returns the 0-based index of the bucket that contains the record with key c. When a bucket which uses the hash function h_i is split into two new buckets, the hash function h_i is replaced with h_{i+1} for both of those new buckets. At any time, at most two hash functions h_l and h_{l+1} are used, where l corresponds to the current level. The family of hash functions h_i is also referred to as the dynamic hash function.
Typically, the value of i in h_i corresponds to the number of rightmost binary digits of the key that are used to segregate the buckets. This dynamic hash function can be expressed arithmetically as h_i(c) = c mod 2^i. Note that when the total number of buckets is equal to one, i = 0.
The calculation below determines which hash function applies to a given key c.[10]
# l represents the current level
# s represents the split pointer index
a = h_l(c)
if (a < s): a = h_{l+1}(c)
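In runnable form, the address calculation above can be written as follows (a minimal sketch, assuming integer keys and the hash family h_l(c) = c mod 2^l described earlier):

```python
def bucket_address(c, l, s):
    """Return the bucket index for key c at level l with split pointer s."""
    a = c % (2 ** l)            # h_l(c): the l rightmost bits of the key
    if a < s:                   # bucket a has already been split...
        a = c % (2 ** (l + 1))  # ...so rehash with h_{l+1}(c), one more bit
    return a
```

For example, with l = 2 and s = 2, the key 5 first hashes to bucket 1; since bucket 1 has already been split, the key is rehashed with h_3 and lands in bucket 5.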
Split control
Linear hashing algorithms may use only controlled splits or both controlled and uncontrolled splits.
Controlled splitting occurs if a split is performed whenever the load factor, which is monitored by the file, exceeds a predetermined threshold.[10] If the hash index uses controlled splitting, the buckets are allowed to overflow by using linked overflow blocks. When the load factor surpasses a set threshold, the split pointer's designated bucket is split. Instead of using the load factor, this threshold can also be expressed as an occupancy percentage, in which case, the maximum number of records in the hash index equals (occupancy percentage)*(max records per non-overflowed bucket)*(number of buckets).[12]
An uncontrolled split occurs when a split is performed whenever a bucket overflows, in which case that bucket would be split into two separate buckets.
File contraction occurs in some LH algorithm implementations if a controlled split causes the load factor to sink below a threshold. In this case, a merge operation would be triggered which would undo the last split, and reset the file state.[10]
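A merge that undoes the last split can be sketched as the exact inverse of the pointer update (an illustrative in-memory model, not part of the original scheme, with buckets represented as a hypothetical list of record lists):

```python
def contract(buckets, l, s):
    # Undo the last split. Step the split pointer back; if it is already
    # at bucket 0, drop down one level first (requires l > 0).
    if s > 0:
        s -= 1
    else:
        l -= 1
        s = 2 ** l - 1
    # Fold the records of the last bucket (index s + 2^l) back into its
    # partner bucket s, then remove the emptied bucket.
    buckets[s].extend(buckets.pop())
    return l, s
```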
Split pointer
The index of the next bucket to be split is part of the file state and called the split pointer s. The split pointer corresponds to the first bucket that still uses the hash function h_l rather than h_{l+1}.[10]
For example, suppose numerical records are assigned to buckets in the hash index according to their rightmost binary digits. When the next sequential bucket is appended, the bucket designated by the split pointer is split and its records are redistributed between it and the new bucket. Thus, if we have the buckets labelled 000, 001, 10, 11, 100, 101, we would split bucket 10 because we are appending and creating the next sequential bucket 110. This would give us the buckets 000, 001, 010, 11, 100, 101, 110.[12]
When a bucket is split, the split pointer s and possibly the level l are updated according to the following, such that the level is 0 when the linear hashing index only has 1 bucket.[10]
# l represents the current level
# s represents the split pointer index
s = s + 1
if (s >= 2^l):
l = l + 1
s = 0
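The same update as runnable Python (names are illustrative), showing in particular the level transition when the pointer wraps:

```python
def advance_split_pointer(l, s):
    # After a split, move the pointer to the next bucket; once every
    # bucket of the current level has been split, start a new level.
    s += 1
    if s >= 2 ** l:
        l += 1
        s = 0
    return l, s
```

Starting from a single bucket (l = 0, s = 0), the very first split already completes level 0: the pointer wraps back to 0 and the level becomes 1.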
LH*
The main contribution of LH* is to allow a client of an LH* file to find the bucket where the record resides even if the client does not know the file state. Clients in fact store their version of the file state, which is initially just the knowledge of the first bucket, namely Bucket 0. Based on their file state, a client calculates the address of a key and sends a request to that bucket. At the bucket, the request is checked and if the record is not at the bucket, it is forwarded. In a reasonably stable system, that is, if there is only one split or merge going on while the request is processed, it can be shown that there are at most two forwards. After a forward, the final bucket sends an Image Adjustment Message to the client whose state is now closer to the state of the distributed file.[10] While forwards are reasonably rare for active clients, their number can be even further reduced by additional information exchange between servers and clients.[13]
Other properties
File state calculation
The file state consists of the split pointer s and the level l. If the original file started with n_0 buckets, then the number of buckets n and the file state are related via[13]

n = 2^l * n_0 + s.
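In code form, this relation (with n_0 initial buckets, level l, and split pointer s) is a one-liner:

```python
def bucket_count(n0, l, s):
    # n = 2^l * n0 + s: the n0 original buckets have doubled l times,
    # and s further splits have occurred within the current level.
    return (2 ** l) * n0 + s
```

For instance, a file that started with one bucket and is at level 2 with split pointer 3 has 2^2 * 1 + 3 = 7 buckets.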
Adoption in language systems
Griswold and Townsend[14] discussed the adoption of linear hashing in the Icon language. They examined implementation alternatives for the dynamic array algorithm used in linear hashing and presented performance comparisons using a set of Icon benchmark applications.
Adoption in database systems
Linear hashing is used in the Berkeley database system (BDB), which in turn is used by many software systems, using a C implementation derived from the CACM article and first published on Usenet in 1988 by Esmond Pitt.
References
[edit]- ^ a b c d Litwin, Witold (1980), "Linear hashing: A new tool for file and table addressing" (PDF), Proc. 6th Conference on Very Large Databases: 212–223
- ^ a b Ellis, Carla Schlatter (June 1987), "Concurrency in Linear Hashing", ACM Transactions on Database Systems, 12 (2): 195–217, doi:10.1145/22952.22954, S2CID 14260177
- ^ a b Baeza-Yates, Ricardo; Soza-Pollman, Hector (1998), "Analysis of Linear Hashing Revised" (PDF), Nordic Journal of Computing: 70–85, S2CID 7497598, archived from the original (PDF) on 2019-03-07
- ^ a b Enbody, Richard; Du, HC (June 1988), "Dynamic hashing schemes", ACM Computing Surveys, 20 (2): 85–113, doi:10.1145/46157.330532, S2CID 1437123
- ^ a b Larson, Per-Åke (April 1988), "Dynamic Hash Tables", Communications of the ACM, 31 (4): 446–457, doi:10.1145/42404.42410, S2CID 207548097
- ^ Ruchte, Willard; Tharp, Alan (Feb 1987), "Linear hashing with Priority Splitting: A method for improving the retrieval performance of linear hashing", IEEE Third International Conference on Data Engineering: 2–9
- ^ Manolopoulos, Yannis; Lorentzos, N. (1994), "Performance of linear hashing schemes for primary key retrieval", Information Systems, 19 (5): 433–446, doi:10.1016/0306-4379(94)90005-1
- ^ Ramamohanarao, K.; Sacks-Davis, R. (Sep 1984), "Recursive linear hashing", ACM Transactions on Database Systems, 9 (3): 369–391, doi:10.1145/1270.1285, S2CID 18577730
- ^ Litwin, Witold; Neimat, Marie-Anne; Schneider, Donavan A. (1993), "LH: Linear Hashing for distributed files", ACM SIGMOD Record, 22 (2): 327–336, doi:10.1145/170036.170084, S2CID 259938726
- ^ a b c d e f g h i j k Litwin, Witold; Moussa, Rim; Schwarz, Thomas (Sep 2005), "LH*RS - a highly-available scalable distributed data structure", ACM Transactions on Database Systems, 30 (3): 769–811, doi:10.1145/1093382.1093386, S2CID 1802386
- ^ Fagin, Ronald; Nievergelt, Jurg; Pippenger, Nicholas; Strong, Raymond (Sep 1979), "Extendible Hashing - A Fast Access Method for Dynamic Files", ACM Transactions on Database Systems, 4 (2): 315–344, doi:10.1145/320083.320092, S2CID 2723596
- ^ a b Silberschatz, Abraham; Korth, Henry F.; Sudarshan, S. (2020). Database system concepts (Seventh ed.). New York, NY: McGraw-Hill Education. ISBN 978-1-260-08450-4.
- ^ a b Chabkinian, Juan; Schwarz, Thomas (2016), "Fast LH*", International Journal of Parallel Programming, 44 (4): 709–734, doi:10.1007/s10766-015-0371-8, S2CID 7448240
- ^ Griswold, William G.; Townsend, Gregg M. (April 1993), "The Design and Implementation of Dynamic Hashing for Sets and Tables in Icon", Software: Practice and Experience, 23 (4): 351–367, doi:10.1002/spe.4380230402, S2CID 11595927
Linear hashing
Overview
Definition and Motivation
Hashing is a data structure technique that employs a hash function to compute an array index for key-value pairs, enabling efficient storage and retrieval by mapping keys to buckets or slots. This method supports average-case constant-time operations for insertion, deletion, and search, assuming a uniform distribution of keys. The load factor, defined as the ratio of the number of stored elements to the total number of buckets, is a critical metric; it influences collision frequency, with optimal performance typically maintained below 0.7 to 0.8 to avoid excessive probing or chaining.[4][5]

Static hashing, with its fixed number of buckets determined at initialization, encounters significant challenges in dynamic environments where data volumes fluctuate. High load factors lead to bucket overflows, necessitating expensive full-table rehashing or reorganization to restore performance, while preallocating excess buckets results in inefficient space utilization. These issues underscore the need for dynamic hashing schemes that expand or contract incrementally without global disruptions.[6]

Linear hashing is a dynamic hashing method that adjusts the address space through sequential bucket splitting guided by a split pointer, eliminating the need for a directory structure to resolve addresses. It mitigates the rehashing costs of static hashing and the memory overhead and access delays from directory doublings in extendible hashing variants, allowing growth one bucket at a time. This design facilitates precise load factor control, commonly targeting 60% to 90%, to balance space and performance.[6]

The technique delivers key benefits including average constant-time insertions and deletions, with mean successful search times ranging from 1.03 to 1.73 accesses for bucket capacities of 1 to 50 at loads up to 90%.
Overflow chains remain bounded, preventing disproportionate slowdowns, and the structure's adaptability suits varying data scales, such as in file systems or database indexes.[6]

Historical Development
Linear hashing was developed by Witold Litwin in 1980 as a dynamic file organization technique designed to support insertions and deletions without requiring full file reorganization. It was first described in his seminal paper "Linear Hashing: A New Tool for File and Table Addressing," presented at the 6th International Conference on Very Large Data Bases (VLDB) in Montreal, Canada.[1] The method emerged amid the rapid growth of relational database management systems in the late 1970s and early 1980s, when there was increasing demand for scalable storage solutions capable of handling fluctuating data volumes and workloads efficiently.[1]

Linear hashing built upon prior dynamic hashing approaches, notably extendible hashing, a directory-based predecessor introduced by Ronald Fagin, Jürg Nievergelt, Nicholas Pippenger, and H. Raymond Strong in 1979, which addressed similar challenges but relied on a directory structure that linear hashing sought to eliminate for simplicity.[7] In the early 1980s, linear hashing attracted significant interest in the database research community for its potential in relational systems, leading to early extensions such as partial expansions proposed by Per-Åke Larson in 1980, which generalized Litwin's scheme to allow more controlled growth and better space efficiency.[8] By the 1990s, as distributed computing gained prominence, further refinements adapted linear hashing for parallel and networked environments; a key milestone was the LH* algorithm, developed by Litwin, Marie-Anne Neimat, and Donovan A. Schneider in 1993, which extended the technique to scalable distributed files across multiple nodes.[9]

Core Algorithm
Hash Functions
In linear hashing, the primary hash function h_m(k) = h(k) mod 2^m maps a key k to an initial bucket address, where h is a base hash function that maps arbitrary keys to integers uniformly distributed over a large range, and 2^m corresponds to the number of primary buckets before the ongoing expansion phase.[6] This function ensures that keys are distributed across the existing bucket space in a manner that supports incremental growth without requiring full reorganization.[6]

The secondary hash function h_{m+1}(k) = h(k) mod 2^{m+1} serves to resolve mappings for keys affected by recent splits, distinguishing those that remain in the original bucket from those redirected to the newly created one.[6] By considering one additional bit in the modulus, it facilitates the redistribution of keys during bucket expansion, maintaining balance as the file grows.[6]

The bucket address for a key k is determined by first computing the primary hash h_m(k). If this value falls below the current split pointer, the secondary hash h_{m+1}(k) is used instead; otherwise, the primary hash prevails. This selective application minimizes disruption to the majority of keys during each split.[6] The following pseudo-code demonstrates the computation of the bucket address:

function bucket_address(k, m, split_pointer):
    h_val = h(k)                      # base hash
    a = h_val % (2 ** m)              # primary hash h_m(k)
    if a < split_pointer:
        a = h_val % (2 ** (m + 1))    # secondary hash h_{m+1}(k)
    return a
Bucket Splitting Process
In linear hashing, the bucket splitting process is initiated to maintain balanced distribution as the file grows. In controlled splitting variants, a split is typically triggered when the overall load factor, the ratio of records to total bucket capacity, exceeds a predetermined threshold (e.g., 0.8 or 0.9, depending on the variant and bucket capacity).[10][6] This threshold ensures that splits occur proactively rather than reactively on every overflow, allowing for space utilization around 0.7 to 0.9 depending on implementation parameters.[10] In uncontrolled splitting, as originally proposed, a split is triggered immediately upon an insertion causing a bucket overflow (local load factor reaching 1), but the process always targets the bucket indicated by the split pointer rather than the overflowing one, to promote uniform growth.[6]

The splitting procedure begins by identifying the bucket at the current position of the split pointer, which designates the next candidate for division in a linear sequence. All records in this selected bucket are then rehashed using a secondary hash function, such as h_{m+1}(k) = h(k) mod 2^{m+1}, to redistribute them approximately evenly: records hashing to the original bucket address remain there, while those mapping to the new address are moved to a freshly allocated bucket appended at the end of the file structure.[6] This redistribution leverages the secondary function, typically an extension of the primary hash by one additional bit, to ensure balanced occupancy without requiring a full file reorganization. The new bucket is created sequentially, expanding the address space incrementally by one unit per split.
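The redistribution step can be sketched as follows (an illustrative in-memory model, assuming integer keys with h(k) = k, so that the secondary function is h_{m+1}(k) = k mod 2^(m+1)):

```python
def split_bucket(buckets, m, s):
    # Split the bucket at split pointer s: rehash its records with the
    # secondary function; each record either stays in bucket s or moves
    # to the new bucket s + 2^m appended at the end of the file.
    old = buckets[s]
    buckets[s] = [k for k in old if k % (2 ** (m + 1)) == s]
    buckets.append([k for k in old if k % (2 ** (m + 1)) != s])
    return buckets
```

Splitting the single bucket of a one-bucket file (m = 0, s = 0), for example, separates even keys from odd ones.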
Unlike directory-based dynamic hashing methods that double the entire structure at once, linear hashing employs partial expansions where only a single bucket is split per operation, enabling gradual file growth that adapts smoothly to insertion rates.[6] This approach avoids the overhead of global rehashing, with the address space effectively doubling after a full cycle of splits equal to the initial number of buckets, after which the process repeats with deeper hash levels. The result is a file that can handle varying workloads with minimal disruption, as each split operation is localized and completes in time proportional to the records in the affected bucket.

Overflows, which arise from temporary collisions during insertions, are resolved through these splits, as redistributed records reduce chain lengths and restore primary bucket usage over time.[6] For deletions, the scheme supports contraction via coalescing, the reverse of splitting: when the load factor falls below a lower threshold, the split pointer moves backward, merging records from a pair of sibling buckets back into one, thereby reclaiming space efficiently.[6]

During an insertion, the process integrates seamlessly into the access path: after computing the primary bucket address, the algorithm checks the split pointer to determine whether the higher-level hash should be used for buckets preceding the pointer; if the target bucket is at or near capacity, a split is performed on the designated bucket before proceeding with the insertion, ensuring the new record finds space without immediate overflow.[6] This check maintains the integrity of the linear expansion while keeping insertion costs amortized constant.

Split Pointer Management
In linear hashing, the split pointer s serves as a dynamic index that identifies the next bucket eligible for splitting during file expansion. Defined as an integer ranging from 0 to 2^l - 1, where l represents the current level of the hash structure (with fully split buckets numbered 0 to s - 1), the split pointer ensures that splits occur sequentially, starting from bucket 0 and progressing linearly through the address space. This pointer maintains the order of expansion, preventing the need for global reorganization by focusing splits on individual buckets as needed.[6]

The advancement of s is triggered immediately after a split operation on the bucket it currently points to, incrementing s by 1 to target the subsequent bucket. When s reaches 2^l, indicating that all buckets up to the current level have been split, the level is incremented to l + 1, effectively doubling the address space, and s resets to 0 to begin the splitting process for the new level. This mechanism interacts with the bucket splitting process by designating the exact bucket for division into two, after which the pointer updates to continue the controlled growth.[6]

During key-to-bucket mapping for insertions, searches, or deletions, the split pointer integrates directly into the hashing algorithm to determine the appropriate bucket address. Specifically, the primary hash function h_l (modulo 2^l) is used if the primary hash value h_l(k) >= s; otherwise, if h_l(k) < s, the secondary hash function h_{l+1} (modulo 2^{l+1}) is applied to resolve the bucket, accounting for partially split buckets. This comparison ensures that keys are directed to either an existing bucket or one that has already been split, maintaining load balance without requiring full rehashing.[6]

Edge cases in split pointer management arise at power-of-two boundaries and during load adjustments via deletions. At boundaries, such as when s wraps from 2^l - 1 back to 0 upon level increment, the structure seamlessly transitions to the expanded space, with all previous mappings remaining valid due to the linear progression. For deletions, if the overall load factor drops below a predefined threshold, the pointer can decrement to initiate contraction through grouping (merging split pairs), reversing the expansion sequence to reclaim space efficiently while avoiding underutilization. These rules preserve the dynamic adaptability of linear hashing to varying workloads.[6]

Variants and Extensions
LH* Algorithm
The LH* algorithm, proposed by Witold Litwin, Marie-Anne Neimat, and Donovan A. Schneider in 1993, is a scalable distributed data structure that generalizes linear hashing to parallel or distributed RAM and disk files.[11] It supports dynamic growth and shrinkage across multiple servers without a central directory, using a split coordinator to manage incremental bucket splits one at a time. Deletions are handled implicitly through the splitting process, allowing the structure to adapt to varying loads in distributed environments.

LH* partitions data across servers, with each bucket typically residing on a different node. Insertions and searches involve a small number of messages (e.g., 1-3 for inserts, 2-4 for searches in general cases), enabling efficient parallel operations like hash joins. Contractions occur similarly to splits when buckets empty, merging with buddy buckets to reclaim space. Later extensions, such as LH*RS (2004), add high availability features for fault tolerance in multicomputer settings.[12]

Other Modifications
Parallel linear hashing adaptations have been developed for multi-processor systems to enable concurrent operations without significant performance degradation. In the 1990s, researchers proposed locking protocols that protect the split pointer during concurrent splits, allowing multiple threads to perform inserts and searches while a single thread handles bucket splitting under a directory lock.[13] This approach minimizes lock contention by using selective locking on overflow chains and deallocating old buckets exclusively, supporting multithreaded environments in main-memory databases.

Distributed variants of linear hashing extend the structure to cluster environments by partitioning the hash space across multiple nodes, often using a global split pointer replicated or coordinated via directory structures. These adaptations, such as Distributed Dynamic Hashing (DDH), distribute data dynamically across networked servers while maintaining load balance through incremental bucket splits coordinated between nodes.[14]

Memory-optimized versions of linear hashing target in-memory databases by reducing rehashing overhead through cache-conscious designs and adaptive splitting. Post-2010 research has focused on variants that leverage modern hardware, such as Self-Adaptive Linear Hashing (SAL-Hashing) for solid-state drives integrated with in-memory components, which minimizes random writes by adjusting split thresholds based on access patterns.[15] Recent analyses highlight linear hashing's suitability for main-memory key-value stores, with optimizations like fringe behavior improvements via spiral extensions to enhance insertion throughput under varying loads.[16]

Hybrid approaches have been explored to combine elements of linear hashing with other techniques for better performance in specific scenarios.
A specific example is partial linear hashing, introduced by Per-Åke Larson in 1980 and further developed in 1988, which supports dynamic files by allowing selective partial expansions of buckets rather than full splits, thereby reducing reorganization costs and accommodating varying data distributions.[17][18]

Properties and Analysis
File State and Capacity
In linear hashing, the file's internal state is maintained through a set of variables that monitor its current capacity and dictate when expansions occur. The variable l denotes the current level, establishing the base range for the primary hash function as 2^l buckets (assuming an initial bucket count n_0 = 1 for simplicity; in general, 2^l * n_0). The split pointer s tracks the index of the next bucket designated for splitting, with 0 <= s < 2^l. The total number of buckets is computed as n = 2^l + s, reflecting the incremental growth beyond the power-of-two boundary.[6]

The capacity of the file is determined by the load factor threshold t, a configurable value typically ranging from 0.6 to 0.9, which represents the desired utilization ratio of bucket space. The maximum number of records before triggering the next split is approximately t * b * n, where b is the bucket capacity (often normalized to 1 for analysis). A split is initiated when the number of records exceeds this bound, ensuring the file expands proactively to maintain performance. Once s reaches 2^l, the current level is complete: l is incremented to l + 1, s resets to 0, and the address space effectively doubles, completing a full expansion round.[6]

Overflow handling is integrated into the state via tracking of chain lengths for each primary bucket, where excess records are stored in linked overflow buckets. During an insert operation, if a target bucket overflows, records are appended to its chain, and a split may be triggered based on the overall load; the state updates include rehashing affected records into the new bucket and incrementing s. Deletions reduce record counts and chain lengths, potentially enabling contractions if the load falls below a lower threshold, which merge buckets and decrement s or adjust l accordingly. These transitions ensure the file state remains balanced without full reorganization.[6]

For illustration, consider an initially empty file with l = 0 and s = 0, yielding one bucket with capacity for approximately 1.6 records (assuming t = 0.8 and a bucket size of 2).
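These state transitions can be traced with a minimal in-memory sketch (illustrative only: class and parameter names are invented, integer keys are hashed with h(k) = k, and the original scheme's disk-based overflow chains are omitted):

```python
class LinearHashFile:
    """Toy linear hashing file with controlled splitting."""

    def __init__(self, cap=2, threshold=0.8):
        self.cap, self.threshold = cap, threshold  # bucket size b, load t
        self.l, self.s = 0, 0                      # level and split pointer
        self.buckets = [[]]                        # start with one bucket
        self.n = 0                                 # number of records

    def _addr(self, k):
        a = k % (2 ** self.l)
        if a < self.s:                   # already-split bucket: use h_{l+1}
            a = k % (2 ** (self.l + 1))
        return a

    def insert(self, k):
        self.buckets[self._addr(k)].append(k)
        self.n += 1
        # Controlled split: expand when the load factor exceeds t.
        if self.n / (self.cap * len(self.buckets)) > self.threshold:
            self._split()

    def _split(self):
        old, self.buckets[self.s] = self.buckets[self.s], []
        self.buckets.append([])          # new bucket at index s + 2^l
        for k in old:                    # redistribute with h_{l+1}
            self.buckets[k % (2 ** (self.l + 1))].append(k)
        self.s += 1
        if self.s >= 2 ** self.l:        # level complete: double and reset
            self.l, self.s = self.l + 1, 0

    def lookup(self, k):
        return k in self.buckets[self._addr(k)]
```

Inserting two records into the initially empty file pushes the load factor past 0.8 and triggers the first split, advancing the structure to level 1.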
After inserting the first two records, the load exceeds the threshold, triggering a split: the single bucket divides, s advances to 1, the bucket count becomes 2, and records are redistributed using the next-level hash function. Since s now equals 2^0, the level increments to l = 1 and s resets to 0.[6]

Performance Metrics
Linear hashing achieves average-case O(1) time complexity for insert, search, and delete operations, primarily due to the use of bounded overflow chains that limit the number of probes per operation. Splits are local to one bucket, so their cost is amortized to a constant across operations. In the original formulation, successful searches require approximately 1 to 1.77 disk accesses on average, while unsuccessful searches average 1.62 to 1.63 accesses, assuming bucket capacities of 1 to 50 records.[6]

Space efficiency in linear hashing is notably higher than in directory-based methods like extendible hashing, which incur up to 100% overhead from directory doublings. Linear hashing with partial expansions exhibits an overflow storage ratio of 1/13 to 1/17 (approximately 6-8% extra space) at 85% utilization, enabling partial file growth without full rehashing.[19]

Load factor maintenance in linear hashing is optimized at 70-90%, with controlled splits allowing up to 90% utilization to maximize storage density, an improvement of 31-40% over uncontrolled variants. Variance analysis reveals even distribution after splits, with successful search lengths stabilizing at 1.08-1.39 accesses and insertion costs at 4-5 accesses under 85% load, though temporary unevenness arises during growth phases.[6][19]

The structure scales to over 10^6 records with low rehash costs, as it grows linearly without directory overhead, extending to billions of buckets using minimal core memory (a few bytes for pointers). Benchmarks from 1980s VLDB analyses demonstrate linear hashing is 2-5 times faster than quadratic probing for dynamic workloads, owing to reduced clustering and incremental expansions.[6] A key trade-off is higher variance in chain lengths during growth phases compared to perfect hashing schemes, potentially increasing unsuccessful search times to 1.63 accesses at low bucket capacities, though this is mitigated by overflow chaining and partial expansions.[6][19]

Implementations and Applications
Use in Programming Languages
In the 1980s, linear hashing was adopted in Lisp interpreters for efficient symbol table management, where dynamic growth was essential for handling variable-sized environments without frequent full rehashes. For instance, the TAO Lisp system implemented a simple linear hashing scheme for its symbol table to support fast lookups and insertions in a firmware-based interpreter.[20] A prominent modern adoption appears in Erlang's Erlang Term Storage (ETS) module, where tables of types set, bag, and duplicate_bag employ linear hashing to enable dynamic resizing one bucket at a time. This implementation uses an array of fine-grained locks (initially 16, expanded to 64 in later versions) mapped to buckets via modulo operations, allowing concurrent reads and writes while maintaining amortized constant-time operations for inserts and lookups in set tables.[21][22]
Linear hashing's incremental bucket splitting offers advantages in garbage-collected languages by minimizing pause times during resizing, as it avoids the need for complete table rehashing that could interfere with collection cycles. In Erlang, this aligns well with the BEAM virtual machine's lightweight processes and generational garbage collection, enabling scalable in-memory storage for concurrent applications without blocking mutators during growth.[21][23]
Despite these benefits, linear hashing remains less prevalent in strictly typed languages like Java or C++, where simpler open-addressing schemes with periodic doubling predominate due to their predictability and lower implementation overhead in compile-time environments. This contrasts with its broader use in database systems for persistent storage, as explored elsewhere.[24]
