Judy array
View on WikipediaThe topic of this article may not meet Wikipedia's general notability guideline. (April 2022) |
In computer science, a Judy array is an early-2000s Hewlett-Packard hand-optimized implementation of a 256-ary radix tree that uses many situational node types to reduce latency from CPU cache-line fills.[1][2] As a compressed radix tree, a Judy array can store potentially sparse integer- or string-indexed data with comparatively low memory usage and low read latency, without relying on hashing or tree balancing, and without sacrificing in-order traversal.[3] Per-operation latency scales as —as expected of a tree—and the leading constant factor is small enough that Judy arrays are suitable even to the peta-element range.[4] When applicable, they can be faster than implementations of AVL trees, B-trees, hash tables, or skip lists from the same time period.[3][needs update]
History
[edit]The Judy array was invented by Douglas Baskins over the years leading up to 2002 and named after his sister.[5]
Node types
[edit]Broadly, tree nodes in Judy arrays fall into one of three categories, though the implementation uses situational variations within each category:[2]
- A linear node is a short, fixed-capacity, array-based association list meant to fit in one cache line. That is, such a node has an array of key bytes and a parallel array of values or pointers. Lookup is by linear search over the key array and then random access to the corresponding index in the value/pointer array.
- A bitmap node is a size-256 bitvector tracking which values/children are present and then a sorted list of corresponding values or pointers. Lookup is by population count of the bits up to the target index and then random access to the corresponding entry in the value/pointer array. The bitmap fits within a typical CPU cache line, and random access only loads one cache line from the sorted list, so for reading these nodes require at most two cache-line fills.
- An uncompressed node is a conventional trie node as an array of values/pointers. Lookup is by random access using the key byte as an index, which at the CPU level requires visiting one cache line.
Linear nodes are used for low branching, bitmap nodes for intermediate branching, and uncompressed nodes for high branching.[2]
Advantages and disadvantages
[edit]Due to cache optimizations, Judy arrays are fast, especially for very large datasets. On certain tasks involving data that are sequential or nearly sequential, Judy arrays can even outperform hash tables, since, unlike hash tables, the internal tree structure of Judy arrays maintains the ordering of the keys.[6]
On the other hand, Judy arrays are not suitable for all key types, rely heavily on compile-time case-splitting (which increases both the compiled code size and the work involved in retuning for a new architecture[6]), make some concessions to older architectures that may not be relevant to modern machines, and do not exploit SIMD.[2] They are optimized for read performance over write performance.[2]
See also
[edit]References
[edit]- ^ Robert Gobeille and Douglas Baskins' patent
- ^ a b c d e Alan Silverstein, "Judy IV Shop Manual", 2002
- ^ a b "A 10-Minute Description of How Judy Arrays Work and Why They Are So Fast".
- ^ "Debian -- Details of package libjudy-dev in buster".
- ^ "Home". judy.sourceforge.net.
- ^ a b "A performance comparison of Judy to hash tables".
External links
[edit]Judy array
View on GrokipediaFundamentals
Definition and Purpose
A Judy array is a specialized data structure implemented as a 256-ary radix tree, designed for efficient dynamic storage and retrieval of key-value pairs in sparse datasets. It optimizes memory usage by allocating space only for populated entries, avoiding the overhead of dense arrays or the hashing collisions common in traditional hash tables. The structure supports integer keys (32- or 64-bit words) or pointer keys, enabling it to function as an associative array that maps keys to values such as bits, words, or pointers.[2] The primary purpose of a Judy array is to provide fast lookups, insertions, and deletions in scenarios where data is sparse or irregularly distributed, outperforming conventional arrays, binary search trees, or hash tables by minimizing memory waste and cache misses. It excels in applications requiring scalable indexing without predefined size limits or tuning parameters, making it ideal for handling large, unpredictable datasets like network routing tables or database indices. By treating the array as growable and indexed by arbitrary keys, it eliminates the need for contiguous memory allocation, thus reducing fragmentation in memory-constrained environments.[2][5] At a high level, the Judy array API includes functions such as JudySet (or variants like JLI for integers) to insert or update key-value pairs, JudyGet (e.g., JLG) to retrieve values by key, JudyDel to remove entries, and JudyCount to determine the population size. For example, it can efficiently store mappings for IP addresses in a routing system, where only a fraction of the 2^32 possible IPv4 addresses are used, avoiding memory bloat from unused slots while enabling constant-time access (with respect to the number of stored entries), bounded by the key length in bytes.[5]Basic Principles
The Judy array operates on the core principle of a 256-ary radix tree, where each level of the tree branches based on 8 bits (one byte) of the key, enabling efficient navigation through the key space.[2] This high branching factor minimizes the tree's depth, resulting in a maximum of four levels for 32-bit keys (covering an expanse of 2^32 possible values) and eight levels for 64-bit keys (covering 2^64).[2] By decoding the key byte-by-byte during traversal, the structure achieves low latency with few cache-line accesses, typically requiring fewer than the maximum levels due to data density optimizations.[2] Central to the Judy array's adaptability are the concepts of expanse, population, and density, which guide the dynamic structuring of the tree. Expanse refers to the total range of possible keys within a subtree, such as 256 to 511 for an 8-bit segment.[2] Population denotes the actual number of keys stored in that expanse, while density is the ratio of population to expanse, indicating sparseness (e.g., a density of 1.0 means all keys in the range are present).[2] These metrics determine the choice of subtree representations, allowing the array to efficiently handle sparse distributions by using compact nodes for low-density areas and expanded structures for high-density ones, thereby balancing memory usage and access speed across varying data sets.[2] Key compression in Judy arrays is achieved through implicit positioning within the tree, eliminating the need to store full keys in most nodes. As traversal progresses, each path from the root encodes the key's digits, so the position in the structure inherently represents the key without redundant storage.[2] This approach shares common key prefixes across subtrees, storing only the differing portions as needed, which reduces memory overhead while maintaining fast lookups.[2] Judy arrays exist in two primary variants: Judy1 for integer (bit-vector) arrays and JudyL for pointer (key-value) arrays. JudyL stores explicit keys and associated pointers in sorted order within nodes, adapting to population size—for instance, a single entry uses a minimal two-word structure, while denser groups employ linear arrays of keys and values.[2] In contrast, Judy1 optimizes for dense subtrees by employing bitmaps, such as a 32-byte bitmap representing 256 possible keys when density exceeds approximately 0.094, allowing bit-level operations for compact storage and rapid testing of key presence.[2]History and Development
Invention and Implementation
The Judy array was invented by Douglas Baskins while working at Hewlett-Packard. It originated from his explorations in digital tree compression techniques around 2000, with a development team formed in January of that year to productize the concept. This work culminated in the early 2000s, specifically through patent applications filed in December 1999 describing a fast, efficient, adaptive hybrid tree structure known as Judy.[4][1] Baskins named the data structure after his sister Judy, reflecting a personal touch in its creation.[1] Developed as an internal tool at Hewlett-Packard to address high-performance computing requirements, particularly for managing sparse datasets in memory-constrained environments, the Judy array was hand-optimized in the C programming language to target Unix-like systems prevalent in HP's engineering workflows.[1] The structure evolved rapidly: Judy III became available internally by March 2000, and Judy IV—featuring roughly twice the speed and space efficiency of its predecessor—was delivered in April 2001 for HP-UX systems.[3] Early prototypes were tested on Hewlett-Packard hardware, allowing Baskins and collaborators to refine the structure iteratively based on real-world performance observations in HP's computing infrastructure.[4] The Judy array remained proprietary to Hewlett-Packard until its open sourcing in 2002.[1]Open Sourcing and Legacy
In 2002, the Judy array implementation was released as open-source software under the GNU Lesser General Public License (LGPL) version 2.0 by inventor Douglas Baskins and the Hewlett-Packard development team through SourceForge, providing a complete C library along with extensive documentation and examples.[6][1] The original Hewlett-Packard patents related to the Judy array, such as US Patent 6,735,595 issued in 2004, were covered under the LGPL's implicit patent grant, allowing free use and distribution without additional licensing requirements; these patents expired in 2020, further enabling unrestricted adoption.[4][7] The core Judy project saw its last official update in 2009 with version 1.0.5, after which community-driven ports to other languages proliferated in the 2010s, including PyJudy for Python in 2009, go-judy for Go around 2013, and rudy for Rust in 2017.[8][9] The Judy array's legacy endures in its influence on high-performance computing designs, where its cache-optimized radix tree approach has been cited in research papers through the 2020s for applications requiring efficient sparse array operations, such as in bioinformatics databases for petabase-scale nucleotide indexing.[10] Despite its complexity limiting widespread modern adoption in favor of simpler alternatives like hash tables, Judy remains integrated into niche tools, including monitoring software like Netdata for performance data storage and various specialized networking and database systems for fast key-value lookups.[11] As of 2025, the core project remains inactive, with maintenance relying on community forks and bindings.[6]Internal Structure
Node Types
Judy arrays employ a diverse set of node types optimized for varying population densities within key expanses, enabling efficient memory usage and cache performance across sparse and dense scenarios.[2] These nodes form the building blocks of the radix tree structure, with designs tailored to fit within one or two cache lines to minimize access latency.[2] Linear nodes handle low-population cases, starting with compact representations such as 2-word nodes for a single key, 4-word for two keys, 8-word for three keys, and scaling up to 32-word nodes capable of storing up to 32 keys before transitioning to branched structures.[2] These linear nodes have similar capacities in 64-bit implementations while remaining cache-efficient. This progression allows sequential storage of keys and associated pointers or values without branching overhead for small sets.[2] Branch nodes come in three primary variants to manage higher densities: full 256-ary branches using sparse pointers for any populated subexpanse, bitmap branches that employ a 32-byte (256-bit) bitmap to indicate active indices (particularly in Judy1 arrays for populations exceeding 25 keys), and compressed branches that eliminate unused portions to fit within 1-2 cache lines.[2][12] Bitmap branches activate at a density threshold of approximately 0.094 (25 keys out of 256 possible), providing a compact alternative to full arrays by using bit positions to index subtrees.[2] The Judy array supports around 25 major node types for 32-bit keys and approximately 85 for 64-bit keys, encompassing variations for both Judy1 (index arrays) and JudyL (general associative arrays), along with sub-tree pointers and auxiliary structures.[2] The root pointer initializes as NULL for an empty array and evolves into the appropriate node type as elements are inserted, such as a 2-word linear node for the first key.[2] Transition rules govern node evolution: linear nodes split into branch nodes when the population surpasses 32 keys, prompting the formation of a 256-ary structure; further, bitmap branches emerge in Judy1 when density exceeds the 0.094 threshold to optimize sparse high-level expanses.[2] These rules ensure adaptive growth, balancing depth and memory footprint.[2]Tree Organization and Key Representation
The Judy array organizes its data as a hierarchical 256-ary digital tree, where each level processes 8 bits of the key starting from the most significant bit (MSB), effectively dividing the key space into 256 possible digits per level.[12] This structure forms a "tree of trees," with the root pointing to subtrees that recursively apply the same radix-based subdivision to the remaining key bits, enabling efficient traversal without fixed depth limitations.[2] Keys are represented by splitting them into these 8-bit digits, allowing traversal to infer the key's position implicitly through the tree's branching paths rather than storing full keys in internal nodes.[12] For instance, in bitmap nodes, the digit is represented by bit offsets within a compact bitmap, where set bits indicate populated subexpanses without explicit digit storage.[2] This digit-wise processing supports both fixed-size integer keys (e.g., 32- or 64-bit words) and, in variants like JudySL, variable-length strings by treating them as sequences of bytes.[12] The root begins as a simple linear structure—a null pointer for an empty array, a 2-word object for a single key, or a small multi-word object for a few keys—before expanding into branched subtrees as the population grows beyond thresholds like 32 keys.[2] Subtrees are linked via Judy Pointers (JPs), which are enriched pointers containing addresses and metadata, forming the hierarchical cascade without any rebalancing mechanism; instead, the tree adapts dynamically through splitting (cascading) when nodes overflow and merging (decascading) when subtrees underflow during population changes.[12] To handle sparse key distributions, empty branches are skipped entirely using null JPs for unpopulated digits, minimizing memory overhead in low-density regions.[12] Compression is achieved by sharing common prefixes across subtrees, where index compression omits redundant most-significant bits or digits that are identical for all keys at a given level, further optimizing storage for clustered or sparse datasets.[2]Operations
Insertion and Deletion
Insertion in a Judy array, such as via theJudyLIns or JudyLSet functions, begins by traversing the tree structure digit by digit from the most significant byte of the key, following pointers in branch nodes to reach the appropriate leaf or subarray.[13] If the key already exists at the leaf, the associated value is updated or overwritten, ensuring no duplicates are stored as Judy arrays function as sparse maps with unique keys.[2] For a new key, nodes are created or expanded along the traversal path; initial insertions build small linear leaves that store keys and values directly in sorted order, accommodating up to 25 entries before splitting to optimize space and access time.[3]
When a linear leaf exceeds its capacity—typically 25 keys for JudyL arrays, corresponding to a density threshold of approximately 0.094 over a 256-byte expanse—it splits into a branch node, such as a bitmap branch with a 32-byte bitmap indicating populated subexpanses and pointers to child nodes.[2][3] This splitting allocates new nodes dynamically, with memory managed in cache-line multiples (e.g., 64 bytes) to reduce fragmentation and improve locality, using a custom allocator that provisions chunks in sizes like 2, 4, 8, up to 512 words for full branches.[3] Existence checks during insertion can be performed using functions like JudyLGet, which returns a null pointer if the key is absent, allowing conditional updates without full insertion.[13]
Deletion, implemented via JudyLDel, follows a similar digit-by-digit traversal to locate the leaf containing the key, where the value is removed—either by clearing a bit in a bitmap leaf or extracting from a linear leaf—and the key is marked as absent.[13][2] Post-removal, the structure is inspected for underpopulation; if a branch node's occupancy falls below a hysteresis threshold (e.g., fewer than 7 populated subexpanses in a bitmap branch or 25 keys in a leaf), it collapses back to a more compact linear form to reclaim memory and maintain efficiency.[3] This dynamic collapsing frees unused nodes, with the allocator coalescing small freed blocks where possible to minimize fragmentation, though hysteresis prevents frequent resizing to avoid thrashing during repeated inserts and deletes.[3] If the array empties entirely, it reverts to a null pointer, releasing all associated memory.[13]