Hubbry Logo
MurmurHashMurmurHashMain
Open search
MurmurHash
Community hub
MurmurHash
logo
7 pages, 0 posts
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
MurmurHash
MurmurHash
from Wikipedia

MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup.[1][2][3] It was created by Austin Appleby in 2008[4] and, as of 8 January 2016,[5] is hosted on GitHub along with its test suite named SMHasher. It also exists in a number of variants,[6] all of which have been released into the public domain. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop.

Unlike cryptographic hash functions, it is not specifically designed to be difficult to reverse by an adversary, making it unsuitable for cryptographic purposes.

Variants

[edit]

MurmurHash1

[edit]

The original MurmurHash was created as an attempt to make a faster function than Lookup3.[7] Although successful, it had not been tested thoroughly and was not capable of providing 64-bit hashes as in Lookup3. Its design would be later built upon in MurmurHash2, combining a multiplicative hash (similar to the Fowler–Noll–Vo hash function) with an Xorshift.

MurmurHash2

[edit]

MurmurHash2[8] yields a 32- or 64-bit value. It comes in multiple variants, including some that allow incremental hashing and aligned or neutral versions.

  • MurmurHash2 (32-bit, x86)—The original version; contains a flaw that weakens collision in some cases.[9]
  • MurmurHash2A (32-bit, x86)—A fixed variant using Merkle–Damgård construction. Slightly slower.
  • CMurmurHash2A (32-bit, x86)—MurmurHash2A, but works incrementally.
  • MurmurHashNeutral2 (32-bit, x86)—Slower, but endian- and alignment-neutral.
  • MurmurHashAligned2 (32-bit, x86)—Slower, but does aligned reads (safer on some platforms).
  • MurmurHash64A (64-bit, x64)—The original 64-bit version. Optimized for 64-bit arithmetic.
  • MurmurHash64B (64-bit, x86)—A 64-bit version optimized for 32-bit platforms. It is not a true 64-bit hash due to insufficient mixing of the stripes.[10]

The person who originally found the flaw[clarification needed] in MurmurHash2 created an unofficial 160-bit version of MurmurHash2 called MurmurHash2_160.[11]

MurmurHash3

[edit]

The current version, completed April 3, 2011, is MurmurHash3,[12][13] which yields a 32-bit or 128-bit hash value. When using 128-bits, the x86 and x64 versions do not produce the same values, as the algorithms are optimized for their respective platforms. MurmurHash3 was released alongside SMHasher, a hash function test suite.

Implementations

[edit]

The canonical implementation is in C++, but there are efficient ports for a variety of popular languages, including Python,[14] C,[15] Go,[16] C#,[13][17] D,[18] Lua, Perl,[19] Ruby,[20] Rust,[21][22] PHP,[23][24] Common Lisp,[25] Haskell,[26] Elm,[27] Clojure,[28] Scala,[29] Java,[30][31][32] Erlang,[33] Swift,[34] Object Pascal,[35] Kotlin,[36] JavaScript,[37] OCaml[38] and Microsoft Excel.[39]

It has been adopted into a number of open-source projects, most notably libstdc++ (ver 4.6), nginx (ver 1.0.1),[40] Rubinius,[41] libmemcached (the C driver for Memcached),[42] npm (nodejs package manager),[43] maatkit,[44] Hadoop,[1] Kyoto Cabinet,[45] Cassandra,[46][47] Solr,[48] vowpal wabbit,[49] Elasticsearch,[50] Guava,[51] Kafka,[52] and RedHat Virtual Data Optimizer (VDO).[53]

Vulnerabilities

[edit]

Hash functions can be vulnerable to collision attacks, where a user can choose input data in such a way so as to intentionally cause hash collisions. Jean-Philippe Aumasson and Daniel J. Bernstein were able to show that even implementations of MurmurHash using a randomized seed are vulnerable to so-called HashDoS attacks.[54] With the use of differential cryptanalysis, they were able to generate inputs that would lead to a hash collision. The authors of the attack recommend using their own SipHash instead.

Algorithm

[edit]
algorithm Murmur3_32 is
    // Note: In this version, all arithmetic is performed with unsigned 32-bit integers.
    //       In the case of overflow, the result is reduced modulo 232.
    input: key, len, seed

    c1 ← 0xcc9e2d51
    c2 ← 0x1b873593
    r1 ← 15
    r2 ← 13
    m ← 5
    n ← 0xe6546b64

    hash ← seed

    for each fourByteChunk of key do
        k ← fourByteChunk

        k ← k × c1
        k ← k ROL r1
        k ← k × c2

        hash ← hash XOR k
        hash ← hash ROL r2
        hash ← (hash × m) + n

    with any remainingBytesInKey do
        remainingBytes ← SwapToLittleEndian(remainingBytesInKey)
        // Note: Endian swapping is only necessary on big-endian machines.
        //       The purpose is to place the meaningful digits towards the low end of the value,
        //       so that these digits have the greatest potential to affect the low range digits
        //       in the subsequent multiplication.  Consider that locating the meaningful digits
        //       in the high range would produce a greater effect upon the high digits of the
        //       multiplication, and notably, that such high digits are likely to be discarded
        //       by the modulo arithmetic under overflow.  We don't want that.

        remainingBytes ← remainingBytes × c1
        remainingBytes ← remainingBytes ROL r1
        remainingBytes ← remainingBytes × c2

        hash ← hash XOR remainingBytes

    hash ← hash XOR len

    hash ← hash XOR (hash >> 16)
    hash ← hash × 0x85ebca6b
    hash ← hash XOR (hash >> 13)
    hash ← hash × 0xc2b2ae35
    hash ← hash XOR (hash >> 16)

A sample C implementation follows (for little-endian CPUs):

static inline uint32_t murmur_32_scramble(uint32_t k) {
    k *= 0xcc9e2d51;
    k = (k << 15) | (k >> 17);
    k *= 0x1b873593;
    return k;
}
uint32_t murmur3_32(const uint8_t* key, size_t len, uint32_t seed)
{
	uint32_t h = seed;
    uint32_t k;
    /* Read in groups of 4. */
    for (size_t i = len >> 2; i; i--) {
        // Here is a source of differing results across endiannesses.
        // A swap here has no effects on hash properties though.
        memcpy(&k, key, sizeof(uint32_t));
        key += sizeof(uint32_t);
        h ^= murmur_32_scramble(k);
        h = (h << 13) | (h >> 19);
        h = h * 5 + 0xe6546b64;
    }
    /* Read the rest. */
    k = 0;
    for (size_t i = len & 3; i; i--) {
        k <<= 8;
        k |= key[i - 1];
    }
    // A swap is *not* necessary here because the preceding loop already
    // places the low bytes in the low places according to whatever endianness
    // we use. Swaps only apply when the memory is copied in a chunk.
    h ^= murmur_32_scramble(k);
    /* Finalize. */
	h ^= len;
	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}
Tests
Test string Seed value Hash value (hexadecimal) Hash value (decimal)
0x00000000 0x00000000 0
0x00000001 0x514E28B7 1,364,076,727
0xffffffff 0x81F16F39 2,180,083,513
test 0x00000000 0xba6bd213 3,127,628,307
test 0x9747b28c 0x704b81dc 1,883,996,636
Hello, world! 0x00000000 0xc0363e43 3,224,780,355
Hello, world! 0x9747b28c 0x24884CBA 612,912,314
The quick brown fox jumps over the lazy dog 0x00000000 0x2e4ff723 776,992,547
The quick brown fox jumps over the lazy dog 0x9747b28c 0x2FA826CD 799,549,133

See also

[edit]

References

[edit]
[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
MurmurHash is a family of fast, non-cryptographic hash functions designed for general hash-based lookup operations, such as those in hash tables and other data structures requiring efficient key distribution. Developed by , the initial version was released in as , with no claims attached to allow unrestricted use and modification. The name derives from the core operations in its mixing algorithm—multiply (MU) and rotate (R)—which contribute to its simplicity and performance. Subsequent iterations include MurmurHash2 and the latest version, MurmurHash3 (2011), introduced to address limitations in earlier versions like poorer distribution on certain inputs. MurmurHash3 supports both 32-bit and 128-bit output hashes, with variants optimized for x86 and x64 architectures to achieve high speed while maintaining strong properties for uniform hash value spread. It processes input in architecture-dependent word-sized chunks, employing a combination of , , and XOR operations in its finalization step to ensure robustness against collisions in non-adversarial scenarios. The algorithm's quality has been rigorously evaluated using the SMHasher test suite, which assesses distribution, collision resistance, and performance metrics, confirming MurmurHash3's suitability for performance-critical applications. Widely integrated into programming language standard libraries and frameworks—such as Apache Commons Codec, the D programming language, and Scala—it excels in scenarios like bloom filters and approximate counting structures where cryptographic security is unnecessary but speed and reliability are paramount. All versions remain in the public domain, facilitating broad adoption across open-source and proprietary software.

Overview

Definition and Purpose

MurmurHash is a of non-cryptographic hash functions designed for general-purpose hashing applications, emphasizing speed and uniform distribution of outputs. These functions derive their name from the "murmur"-like iterative mixing process in their core operations, which involves multiplication and rotation to bits effectively. Unlike cryptographic hashes, MurmurHash prioritizes performance over resistance to intentional attacks, making it unsuitable for security-sensitive uses such as or message . The primary purposes of MurmurHash include supporting hash-based data structures like hash tables, where it generates indices to enable efficient lookups and storage. It is also widely applied in Bloom filters for space-efficient probabilistic set membership testing, reducing false positives through multiple hash computations. Additionally, MurmurHash facilitates data partitioning in distributed systems, such as assigning keys to nodes in like to balance load across clusters. Key advantages of MurmurHash lie in its high execution speed on modern processors, achieving rates up to several gigabytes per second for typical inputs, alongside excellent properties that ensure good distribution and minimize collisions. The algorithm's simplicity further aids implementation, requiring minimal code while supporting output sizes of 32 bits or 128 bits across variants, tailored to common use cases in caches and non-relational databases.

Development History

MurmurHash was developed by and first released in 2008 as a family of non-cryptographic hash functions aimed at high in general-purpose hashing applications. The initial version, MurmurHash 1.0, focused on 32-bit hashing and was designed to surpass the speed and distribution quality of prior simple hashes, such as ' lookup3, while maintaining low collision rates in hash tables. In 2009, Appleby released MurmurHash 2.0 to address minor weaknesses in the first version and extend support to 64-bit architectures, enabling better utilization of modern processors at the time. In , Appleby undertook a major redesign, releasing MurmurHash 3.0 after extensive testing with the SMHasher suite he developed to evaluate robustness. This version introduced improvements for both 32-bit and 64-bit platforms, including endian-neutral processing to ensure consistent results across architectures and configurable seed options for enhanced security against hash flooding attacks. The redesign was motivated by the growing demands of environments, where faster mixing and finalization steps were needed without sacrificing avalanche properties or speed; it remains the last official update, with no subsequent versions released due to its proven stability and wide adoption. As of 2025, MurmurHash3 remains the latest version, with no official updates since due to its stability. Key adoption milestones began shortly after MurmurHash 3.0's finalization, with integration into Google's library in February 2013 (Guava 14.0). By 2012, it was incorporated into for efficient data partitioning and hashing tasks. With adoptions including in 2012 for partitioning keys and for hash table operations by 2015, it solidified its role in open-source high-throughput projects.

Variants

MurmurHash 1.0

MurmurHash 1.0, the original version of the released in by , produces a 32-bit output and is designed for non-cryptographic applications requiring fast hashing with good distribution properties. It processes input data in 32-bit (4-byte) blocks, making it efficient for typical use cases in hash tables and similar structures. A key feature is its use of a simple 32-bit seed parameter to initialize the hash, allowing for seeded variations to reduce collisions in probabilistic settings. The function takes three parameters: an input key as a byte array (const void *key), the length of the key in bytes (int len), and the 32-bit seed (uint32_t seed). These enable flexible hashing of arbitrary while incorporating length awareness to avoid certain attacks or biases. The basic structure involves initializing the hash state, processing full 4-byte blocks through a mixing operation, handling any remaining bytes (tail), and applying finalization steps. For each block, the algorithm reads the 4 bytes as an unsigned k, mixes it into the hash h via , by a constant (m = 0xc6a4a793), and a right-shift XOR (h ^= h >> 16), effectively applying a single round per block. The tail processing incorporates leftover bytes with similar mixing, followed by finalization that includes additional multiplications and shift-XORs to propagate bits across the hash value. Here is an outline of the algorithm in :

function MurmurHash1(key, len, seed): m = 0xc6a4a793 r = 16 h = seed ^ (len * m) data = key while len >= 4: k = read_uint32(data) // Read 4 bytes as unsigned int h += k h *= m h ^= h >> r data += 4 len -= 4 // Tail handling if len > 0: if len == 3 or len >= 3: h += read_uint8(data + 2) << 16 if len >= 2: h += read_uint8(data + 1) << 8 if len >= 1: h += read_uint8(data) h *= m h ^= h >> r // Finalization h *= m h ^= h >> 10 h *= m h ^= h >> 17 return h

function MurmurHash1(key, len, seed): m = 0xc6a4a793 r = 16 h = seed ^ (len * m) data = key while len >= 4: k = read_uint32(data) // Read 4 bytes as unsigned int h += k h *= m h ^= h >> r data += 4 len -= 4 // Tail handling if len > 0: if len == 3 or len >= 3: h += read_uint8(data + 2) << 16 if len >= 2: h += read_uint8(data + 1) << 8 if len >= 1: h += read_uint8(data) h *= m h ^= h >> r // Finalization h *= m h ^= h >> 10 h *= m h ^= h >> 17 return h

This structure assumes little-endian byte order and 32-bit integer alignment for optimal performance. Despite its speed, MurmurHash 1.0 has notable limitations, including poor hashing quality on certain inputs like long strings, where bit diffusion is insufficient. It is also endian-dependent, producing different results on little-endian versus big-endian architectures due to unnormalized byte reads. Additionally, it exhibits higher collision rates in statistical tests compared to subsequent versions, making it less suitable for modern high-collision-risk scenarios. MurmurHash 1.0 has largely been superseded by later variants offering better quality and portability, though it remains in use within some legacy systems for compatibility.

MurmurHash 2.0

MurmurHash 2.0 serves as a transitional refinement in the MurmurHash family, extending support to 64-bit systems while building on the 32-bit foundation of version 1.0. Released by , it introduced a dedicated 64-bit output variant alongside the existing 32-bit one, with improved mixing mechanisms tailored for longer input keys to enhance overall hash quality. The algorithm retains block-based processing, handling 4-byte blocks in the 32-bit mode and 8-byte blocks in the 64-bit mode, making it adaptable to both architectures without fundamental redesign. Parameters in MurmurHash 2.0 mirror those of version 1.0 in structure, utilizing a for —now supporting 64-bit seeds in the extended variant—but provide distinct functions for 32-bit (e.g., MurmurHash2) and 64-bit (e.g., MurmurHash64A) outputs to optimize performance on target hardware. This separation allows developers to select the appropriate output size based on application needs, such as sizing in 64-bit environments. Compared to MurmurHash 1.0, version 2.0 incorporates better handling of partial blocks via detailed tail-processing logic for remainders of 1 to 7 bytes, reducing inconsistencies in short or uneven inputs. It also mitigates bias in hash distributions for certain input patterns through refined and constants, resulting in superior speed and robustness overall. MurmurHash 2.0, however, remains endian-specific, assuming little-endian byte order as on x86 platforms, which limits portability to big-endian systems without modifications. It is also vulnerable to some collision attacks identified in subsequent analyses using tools like the , and its lags behind MurmurHash 3.0 on 64-bit hardware due to less efficient mixing rounds. This version saw brief adoption in early 64-bit applications, including caching frameworks like JBoss Infinispan, where its speed-to-quality balance suited non-cryptographic lookups before MurmurHash 3.0 superseded it. For illustration, the 64-bit variant processes 8-byte blocks with adjusted constants, such as a multiplication factor of 0xc6a4a7935bd1e995 and a right shift by 47 bits, to propagate changes effectively across the hash state.

MurmurHash 3.0

MurmurHash 3.0, released in by , represents the current standard version of the MurmurHash family, designed as a fully portable non-cryptographic suitable for general-purpose hashing tasks. It addresses limitations in earlier iterations by providing endian-neutral processing, ensuring consistent results across little-endian and big-endian systems without platform-specific adjustments. This version supports both 32-bit and 64-bit architectures, making it adaptable to diverse hardware environments while maintaining high performance. Key features of MurmurHash 3.0 include its ability to produce variable output sizes of 32 bits or 128 bits, allowing users to select the appropriate hash length based on application needs, such as space efficiency in hash tables or in probabilistic data structures. It also offers flexible seeding options, with 32-bit or 64-bit seeds to enable multiple independent hash instances for techniques like . The primary parameters are the input key (a byte ), its length, the seed value, and an output size selector, which together define the hashing operation. Within MurmurHash 3.0, there are three main variants optimized for specific platforms: the x86 variant for 32-bit hashing on x86 processors, the x64 variant, which produces 128-bit hashes optimized for x64 architectures, and the x86-128 variant, which generates 128-bit hashes for enhanced quality on x86 systems. Compared to previous versions, MurmurHash 3.0 introduces endian-neutral mixing to eliminate byte-order dependencies, leverages SSE and AVX instructions for accelerated processing on modern CPUs, and achieves a superior avalanche effect for better bit diffusion and distribution properties. These enhancements result in faster execution and more robust hashing without sacrificing simplicity. MurmurHash 3.0 has seen widespread adoption as the default hashing algorithm in numerous modern libraries. For instance, Python's mmh3 package implements it as its core functionality, providing bindings for high-performance applications as of 2025. Similarly, Google's Guava library in Java includes MurmurHash 3.0 variants for 32-bit and 128-bit outputs, integrated into its hashing utilities for production use. Although there have been no official updates from the original author since 2011, community efforts continue to maintain and adapt implementations for emerging architectures, ensuring ongoing relevance.

Algorithm

Core Principles

MurmurHash employs a block-based approach, where the input data is divided into fixed-size blocks—typically 4 bytes for 32-bit variants and 16 bytes for 64-bit variants—and processed sequentially to generate the hash value. Any remaining bytes that do not fill a complete block are handled separately as a during the finalization phase, ensuring the entire input contributes to the output without truncation. A key feature is the use of a seed parameter to initialize the internal hash state, which serves as a salting mechanism. This allows the same input to produce distinct hash values across different invocations, facilitating applications such as generating multiple independent hashes from a single input set or mitigating predictable patterns in hash table usage. The core mixing operations aim to thoroughly diffuse the bits from each input block throughout the accumulating hash state, promoting a uniform distribution of output values and minimizing the likelihood of collisions in downstream applications like hash tables or bloom filters. These operations are designed to achieve strong avalanche properties, where small changes in the input lead to significant changes in the output. MurmurHash variants share empirically tuned magic constants, such as 0x85ebca6b in 32-bit implementations, which are carefully selected through testing to enhance mixing quality and ensure desirable statistical properties without relying on complex computations. The algorithm aspires to pseudorandom behavior, producing outputs that mimic a for typical, non-adversarial inputs, thereby providing reliable non-cryptographic hashing suitable for performance-critical scenarios. In terms of high-level flow, MurmurHash loads successive input blocks into the processing pipeline, mixes each one into an accumulating hash value, and concludes by incorporating the total input length to prevent length-extension ambiguities and finalize the result. This structured progression balances computational efficiency with quality, making it adaptable for both streaming and of data.

Mixing and Finalization Steps

The mixing process in MurmurHash3 involves applying a series of , , and XOR operations to input blocks and partial blocks (tails) to diffuse bits effectively. For the 32-bit variant (MurmurHash3_x86_32), each 4-byte block is loaded as a 32-bit value k1k_1, which is then mixed using the constants 0xcc9e2d51 and 0x1b873593: k1k1×0xcc9e2d51k_1 \leftarrow k_1 \times 0xcc9e2d51, followed by a left by 15 bits (rotl15(k1)\text{rotl}_{15}(k_1)), another k1k1×0x1b873593k_1 \leftarrow k_1 \times 0x1b873593, and XOR into the accumulator h1h1k1h_1 \leftarrow h_1 \oplus k_1. The accumulator is then further mixed: h1rotl13(h1)h_1 \leftarrow \text{rotl}_{13}(h_1), h1h1×5+0xe6546b64h_1 \leftarrow h_1 \times 5 + 0xe6546b64. The tail, consisting of remaining bytes less than 4, is handled by shifting those bytes into a 32-bit k1k_1 (e.g., for 3 bytes: k_1 \leftarrow \text{tail}{{grok:render&&&type=render_inline_citation&&&citation_id=0&&&citation_type=wikipedia}} + (\text{tail}{{grok:render&&&type=render_inline_citation&&&citation_id=1&&&citation_type=wikipedia}} \ll 8) + (\text{tail}{{grok:render&&&type=render_inline_citation&&&citation_id=2&&&citation_type=wikipedia}} \ll 16)) and applying the block mixing operations (multiplications, rotation, XOR into h1h_1) without the subsequent accumulator mixing step, yielding the final 32-bit output after finalization. The finalization step incorporates the input length and ensures complete bit using the dedicated mixing function fmix32(h)\text{fmix32}(h), defined as: hh(h16),hh×0x85ebca6b,hh(h13),hh×0xc2b2ae35,hh(h16).\begin{align*} h &\leftarrow h \oplus (h \gg 16), \\ h &\leftarrow h \times 0x85ebca6b, \\ h &\leftarrow h \oplus (h \gg 13), \\ h &\leftarrow h \times 0xc2b2ae35, \\ h &\leftarrow h \oplus (h \gg 16). \end{align*} Prior to applying fmix32\text{fmix32}, the length is XORed into the accumulator: h1h1lenh_1 \leftarrow h_1 \oplus \text{len}. This fmix32 promotes strong properties, where each output bit depends on a large number of input bits. In the 64-bit variant (underlying the 128-bit output version, MurmurHash3_x64_128), mixing operates on 16-byte blocks with two parallel 64-bit accumulators h1h_1 and h2h_2, using constants c1=0x87c37b91114253d5c_1 = 0x87c37b91114253d5 and c2=0x4cf5ad432745937fc_2 = 0x4cf5ad432745937f. For each block, two 8-byte values k1k_1 and k2k_2 are loaded; k1k_1 is mixed as k1k1×c1k_1 \leftarrow k_1 \times c_1, rotl31(k1)\text{rotl}_{31}(k_1), k1k1×c2k_1 \leftarrow k_1 \times c_2, then h1h1k1h_1 \leftarrow h_1 \oplus k_1, followed by h1rotl27(h1)+h2h_1 \leftarrow \text{rotl}_{27}(h_1) + h_2, h1h1×5+0x52dce729h_1 \leftarrow h_1 \times 5 + 0x52dce729. Similarly, k2k_2 is mixed as k2k2×c2k_2 \leftarrow k_2 \times c_2, rotl33(k2)\text{rotl}_{33}(k_2), k2k2×c1k_2 \leftarrow k_2 \times c_1, h2h2k2h_2 \leftarrow h_2 \oplus k_2, h2rotl31(h2)+h1h_2 \leftarrow \text{rotl}_{31}(h_2) + h_1, h2h2×5+0x38495ab5h_2 \leftarrow h_2 \times 5 + 0x38495ab5. Tail bytes (less than 16) are incorporated by loading them into k1k_1 (for 0-8 bytes, partial low bits for 0-7 or full for 8) or into partial k2k_2 (low bits for 9-15 bytes, with high byte zeroed) and full k1k_1, then applying the full block mixing operations to each, including accumulator updates. Finalization XORs the length into both accumulators: h1h1lenh_1 \leftarrow h_1 \oplus \text{len}, h2h2lenh_2 \leftarrow h_2 \oplus \text{len}; mixes them pairwise h1h1+h2h_1 \leftarrow h_1 + h_2, h2h2+h1h_2 \leftarrow h_2 + h_1; applies the 64-bit mixing function fmix64\text{fmix64} to each; and combines again h1h1+h2h_1 \leftarrow h_1 + h_2, h2h2+h1h_2 \leftarrow h_2 + h_1 to produce the 128-bit output as two 64-bit values. The fmix64\text{fmix64} is defined as: kk(k33),kk×0xff51afd7ed558ccd,kk(k33),kk×0xc4ceb9fe1a85ec53,kk(k33).\begin{align*} k &\leftarrow k \oplus (k \gg 33), \\ k &\leftarrow k \times 0xff51afd7ed558ccd, \\ k &\leftarrow k \oplus (k \gg 33), \\ k &\leftarrow k \times 0xc4ceb9fe1a85ec53, \\ k &\leftarrow k \oplus (k \gg 33). \end{align*} This structure ensures thorough bit diffusion across the parallel hashes.

Implementations

Official Releases

The official reference implementation of MurmurHash was developed by its creator, , in C/C++, with initial commits dating from 2008 to 2011. This canonical is hosted on in the smhasher repository, which serves as the primary distribution point for the hash functions alongside the SMHasher . Early versions, MurmurHash 1.0 and 2.0, are provided as compact single-file implementations suitable for direct inclusion in projects. In contrast, MurmurHash 3.0 expands into a fuller library structure, featuring dedicated functions for 32-bit and 128-bit hashing on x86 and x64 architectures. The repository includes comprehensive test vectors to verify correct implementation across these variants. The code is released into the , allowing unrestricted use, modification, and distribution without licensing obligations. Maintenance of the project halted following the completion of MurmurHash 3.0 in 2011, leaving it in a stable, archived state with no subsequent development. The original distribution site, murmurhash..net, is now defunct, but the GitHub mirror preserves all assets for download and verification. While optimized for x86 processors, the is portable and has been adapted to other architectures through minor adjustments to and integer handling.

Language-Specific Adaptations

MurmurHash has been adapted across numerous programming languages through dedicated libraries and built-in modules, enabling its use in diverse ecosystems while addressing language-specific constraints such as , performance, and platform portability. These ports often maintain fidelity to the original algorithm but incorporate optimizations like architecture-specific assembly or bindings to native code for efficiency. In Python, the mmh3 library provides a comprehensive implementation of MurmurHash3, available as both a pure Python version via pymmh3 and a high-performance extension for faster execution. This library supports all major MurmurHash3 variants, including 32-bit, 64-bit, and 128-bit hashes, making it suitable for applications requiring in and . Java integrations prominently feature MurmurHash3 within the library from , specifically through the com.google.common.hash.Murmur3_128 class, which generates 128-bit hashes and is widely used in Android applications for tasks like caching and data partitioning. Standalone pure implementations, such as the murmur library, offer additional flexibility for environments without Guava dependencies. For C#, third-party libraries like Data.HashFunction.MurmurHash provide robust MurmurHash3 support via , implementing 32-bit and 128-bit variants compatible with .NET Core and later frameworks. While .NET's System.HashCode does not natively include MurmurHash3, these packages fill the gap for scenarios demanding high-speed non-cryptographic hashing, such as in game development or database indexing. In Go, the twmb/murmur3 package delivers a native of MurmurHash3 with assembly optimizations for AMD64 architectures, supporting 32-bit, 64-bit, and 128-bit outputs for efficient string and byte slicing. Go's includes hash/maphash, a distinct but performance-oriented hash inspired by principles similar to MurmurHash, though it is not a direct port. Rust adaptations include the , a pure port of MurmurHash3 that ensures safety and portability without unsafe code, alongside the , which draws inspiration from MurmurHash's design for even faster performance but diverges in its mixing mechanics. These crates are commonly used in for hash tables and bloom filters. JavaScript implementations, such as murmurhash-js, offer an optimized port of MurmurHash algorithms for both and browser environments, processing strings and buffers to produce consistent 32-bit or 128-bit hashes suitable for client-side data structures. Beyond language-specific ports, MurmurHash is integrated into major frameworks like Apache Hadoop, where the org.apache.hadoop.util.hash.MurmurHash class facilitates data partitioning in distributed computing. In Redis clients such as Jedis, it aids key hashing for consistent operations across clusters. The V8 JavaScript engine in Chrome employs MurmurHash variants for internal optimizations in object hashing and garbage collection. Recent adaptations as of 2025 have focused on platform-specific enhancements, including SIMD instructions in assembly-optimized ports for ARM and x86-64 to boost throughput in vectorized workloads, and fixes for endianness discrepancies to ensure cross-platform hash consistency. For instance, some C++-based bindings in Python and Go libraries explicitly handle byte-order variations to prevent mismatches on big-endian systems.

Performance and Quality

Speed Benchmarks

MurmurHash3, particularly its x64 variant, demonstrates high throughput in benchmarks, achieving approximately 7.1 GB/s for processing large inputs on a 3.6 GHz 5 PRO 3350G processor. This performance positions it as one of the faster non-cryptographic hash functions, though it lags behind more recent alternatives optimized for modern hardware. Benchmarks from the SMHasher suite, originally developed in 2011 and updated through 2023, provide standardized evaluations across various hash functions using consistent test conditions. Comparisons highlight MurmurHash3's efficiency relative to older algorithms. For instance, it outperforms FNV-1a by a factor of about 9x (FNV-1a at 0.75 GB/s) and Jenkins' one-at-a-time hash (OOAT) at 0.62 GB/s, offering similar or better speed with superior distribution properties. Against contemporaries like CityHash64 (14.4 GB/s) and xxHash64 (11.2 GB/s), MurmurHash3 is slower by roughly 1.6x to 2x, reflecting trade-offs in algorithmic complexity and SIMD utilization.
Hash FunctionThroughput (GB/s)Relative to MurmurHash3
MurmurHash3 (x64, finalizer)7.11x
FNV-1a0.759.5x slower
Jenkins OOAT0.6211.5x slower
CityHash6414.42x faster
xxHash6411.21.6x faster
Several factors influence MurmurHash3's speed. It excels with block-sized inputs due to its efficient mixing of 32- or 64-bit chunks, but performance degrades on short keys where initialization and finalization overhead dominate. The x86-128 variant leverages 128-bit integer operations for better efficiency on compatible hardware, though it does not extensively use SIMD intrinsics like later hashes. MurmurHash3 is optimized for little-endian x86 and architectures, where it achieves peak performance. As of October 2025, optimizations in libstdc++ have improved its efficiency on platforms, such as a ~9% on Neoverse-V2 hardware. These benchmarks typically involve gigabyte-scale inputs processed in chunks, with metrics reported in cycles per byte or throughput in GB/s to capture sustained performance under realistic workloads.

Avalanche and Distribution Properties

MurmurHash3 exhibits a strong , where altering a single bit in the input typically results in approximately 50% of the output bits flipping, with a bias of less than 0.25% per bit as reported by the algorithm's creator. This property is evaluated through chi-squared tests in the SMHasher suite, where MurmurHash3 variants, such as Murmur3A, achieve low moment chi-squared values around 69, indicating effective and passing most avalanche-related assessments. However, minor flaws have been noted in the 128-bit variant under specific seeds, showing slightly elevated collision rates in certain structured inputs, though these do not significantly impact overall performance. The distribution properties of MurmurHash3 demonstrate high uniformity across the 32-bit output space, with low bias levels, such as deviations under 0.1% in collision tests conducted via SMHasher. This ensures even spreading of hash values, minimizing clustering in applications. In contrast, earlier versions like MurmurHash2 exhibit higher variance, with distribution biases up to 1.7% and failures in bit independence criterion (BIC) tests, leading to correlated outputs. Key statistical properties tested include serial correlation and bit independence, where MurmurHash1 and 2 show elevated serial correlation in sequential inputs, resulting in predictable patterns and higher collision probabilities in tests like LongNeighbors. MurmurHash3 addresses these by incorporating improved mixing rounds, reducing such biases and achieving better independence across output bits, as verified in SMHasher's 2011 results with average avalanche scores near 49.5% bit flips. These enhancements in MurmurHash3 over predecessors mitigate issues like sequential input biases observed in versions 1.0 and , where incremental data led to non-uniform distributions and increased variance in chi-squared metrics. The algorithm's design prioritizes these qualities for non-cryptographic use, passing the majority of SMHasher's distribution and independence tests without the flaws present in older variants. Benchmarks from the SMHasher suite (updated through 2023) confirm the stability of these properties, with no major regressions in or distribution metrics.

Security Analysis

Known Weaknesses

MurmurHash versions 1.0 and 2.0 exhibit vulnerabilities stemming from inadequate mixing in their finalization steps, which allow attackers to generate chosen-prefix collisions with relatively low computational effort. Specifically, for the 32-bit variant, adversaries can find colliding pairs using approximately 2^16 targeted queries, as demonstrated through statistical tests in the SMHasher suite released in 2011, which exposed these flaws in distribution uniformity and . These issues prompted the development of MurmurHash 3.0 to address the weak properties observed in earlier iterations. MurmurHash 3.0 demonstrates improved resilience, with no full collisions identified in exhaustive testing. A primary attack vector against MurmurHash variants involves hash flooding in hash table structures, enabling amortized denial-of-service (DoS) by forcing degenerate performance through induced collisions. In 2012 demonstrations, attackers exploited MurmurHash 2.0 to generate multicollisions requiring only about 2^20 operations on certain platforms, targeting applications like web parsers in Ruby and Perl environments. MurmurHash 3.0 remains susceptible to multicollision attacks for hash flooding, requiring approximately 2^20 operations similar to version 2.0, though with slightly improved mixing in some scenarios. Similar exploits were noted in hash-dependent systems, though not directly tied to Node.js V8, which adopted alternative mitigations like SipHash for related flooding risks around 2019. Mitigations for these weaknesses primarily involve incorporating randomized seeds at runtime to disrupt predictable collision patterns, a non-cryptographic approach sufficient for MurmurHash's intended use cases. For environments handling untrusted inputs, such as servers, transitioning to collision-resistant functions like is recommended alongside randomization. As of 2025, no new (CVEs) have been assigned to MurmurHash itself, though tests in tools like SMHasher continue to flag potential edge cases, such as handling zero-length inputs, which may lead to suboptimal mixing in unpatched implementations. Recent analyses confirm structural weaknesses allowing efficient collision generation for DoS attacks in MurmurHash3.

Non-Cryptographic Limitations

MurmurHash lacks preimage resistance and strong diffusion properties essential for cryptographic applications, as it is engineered primarily for efficient general-purpose hashing rather than protection against inversion or targeted collision searches by adversaries. Its design prioritizes uniform distribution and speed over security margins, making it unsuitable for scenarios where inputs might be crafted to exploit structural weaknesses. The algorithm's deterministic behavior with a fixed renders outputs predictable, allowing potential reversal through computational effort; for instance, recovering a 32-bit preimage requires approximately 2^{32} operations, which is practical on contemporary hardware, while longer variants remain vulnerable without cryptographic safeguards. This predictability exposes applications to risks like hash flooding in denial-of-service attacks, where attackers can generate colliding inputs to degrade performance. MurmurHash does not possess formal security proofs akin to those for the SHA family, relying instead on empirical evaluation via test suites like SMHasher for quality metrics such as effects and distribution. It has not been validated against cryptographic standards, including NIST's requirements for in secure modules, underscoring its non-cryptographic intent. Experts recommend confining MurmurHash to internal data structures in trusted environments, such as hash tables for non-sensitive lookups, and switching to keyed primitives like or BLAKE3 for DoS-resistant or security-critical needs, where a secret enhances unpredictability. In comparisons, MurmurHash achieves higher throughput than cryptographic hashes but offers inferior protection; for example, while enables practical chosen-prefix collisions at around 2^{39} complexity, MurmurHash provides no such resistance and is prone to multicollision exploits. Best practices advocate combining it with salting via a secret key to mitigate some risks.

References

Add your contribution
Related Hubs
User Avatar
No comments yet.