Recent from talks
Nothing was collected or created yet.
MurmurHash
View on Wikipedia
MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup.[1][2][3] It was created by Austin Appleby in 2008[4] and, as of 8 January 2016,[5] is hosted on GitHub along with its test suite named SMHasher. It also exists in a number of variants,[6] all of which have been released into the public domain. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop.
Unlike cryptographic hash functions, it is not specifically designed to be difficult to reverse by an adversary, making it unsuitable for cryptographic purposes.
Variants
[edit]MurmurHash1
[edit]The original MurmurHash was created as an attempt to make a faster function than Lookup3.[7] Although successful, it had not been tested thoroughly and was not capable of providing 64-bit hashes as in Lookup3. Its design would be later built upon in MurmurHash2, combining a multiplicative hash (similar to the Fowler–Noll–Vo hash function) with an Xorshift.
MurmurHash2
[edit]MurmurHash2[8] yields a 32- or 64-bit value. It comes in multiple variants, including some that allow incremental hashing and aligned or neutral versions.
- MurmurHash2 (32-bit, x86)—The original version; contains a flaw that weakens collision in some cases.[9]
- MurmurHash2A (32-bit, x86)—A fixed variant using Merkle–Damgård construction. Slightly slower.
- CMurmurHash2A (32-bit, x86)—MurmurHash2A, but works incrementally.
- MurmurHashNeutral2 (32-bit, x86)—Slower, but endian- and alignment-neutral.
- MurmurHashAligned2 (32-bit, x86)—Slower, but does aligned reads (safer on some platforms).
- MurmurHash64A (64-bit, x64)—The original 64-bit version. Optimized for 64-bit arithmetic.
- MurmurHash64B (64-bit, x86)—A 64-bit version optimized for 32-bit platforms. It is not a true 64-bit hash due to insufficient mixing of the stripes.[10]
The person who originally found the flaw[clarification needed] in MurmurHash2 created an unofficial 160-bit version of MurmurHash2 called MurmurHash2_160.[11]
MurmurHash3
[edit]The current version, completed April 3, 2011, is MurmurHash3,[12][13] which yields a 32-bit or 128-bit hash value. When using 128-bits, the x86 and x64 versions do not produce the same values, as the algorithms are optimized for their respective platforms. MurmurHash3 was released alongside SMHasher, a hash function test suite.
Implementations
[edit]The canonical implementation is in C++, but there are efficient ports for a variety of popular languages, including Python,[14] C,[15] Go,[16] C#,[13][17] D,[18] Lua, Perl,[19] Ruby,[20] Rust,[21][22] PHP,[23][24] Common Lisp,[25] Haskell,[26] Elm,[27] Clojure,[28] Scala,[29] Java,[30][31][32] Erlang,[33] Swift,[34] Object Pascal,[35] Kotlin,[36] JavaScript,[37] OCaml[38] and Microsoft Excel.[39]
It has been adopted into a number of open-source projects, most notably libstdc++ (ver 4.6), nginx (ver 1.0.1),[40] Rubinius,[41] libmemcached (the C driver for Memcached),[42] npm (nodejs package manager),[43] maatkit,[44] Hadoop,[1] Kyoto Cabinet,[45] Cassandra,[46][47] Solr,[48] vowpal wabbit,[49] Elasticsearch,[50] Guava,[51] Kafka,[52] and RedHat Virtual Data Optimizer (VDO).[53]
Vulnerabilities
[edit]Hash functions can be vulnerable to collision attacks, where a user can choose input data in such a way so as to intentionally cause hash collisions. Jean-Philippe Aumasson and Daniel J. Bernstein were able to show that even implementations of MurmurHash using a randomized seed are vulnerable to so-called HashDoS attacks.[54] With the use of differential cryptanalysis, they were able to generate inputs that would lead to a hash collision. The authors of the attack recommend using their own SipHash instead.
Algorithm
[edit]algorithm Murmur3_32 is
// Note: In this version, all arithmetic is performed with unsigned 32-bit integers.
// In the case of overflow, the result is reduced modulo 232.
input: key, len, seed
c1 ← 0xcc9e2d51
c2 ← 0x1b873593
r1 ← 15
r2 ← 13
m ← 5
n ← 0xe6546b64
hash ← seed
for each fourByteChunk of key do
k ← fourByteChunk
k ← k × c1
k ← k ROL r1
k ← k × c2
hash ← hash XOR k
hash ← hash ROL r2
hash ← (hash × m) + n
with any remainingBytesInKey do
remainingBytes ← SwapToLittleEndian(remainingBytesInKey)
// Note: Endian swapping is only necessary on big-endian machines.
// The purpose is to place the meaningful digits towards the low end of the value,
// so that these digits have the greatest potential to affect the low range digits
// in the subsequent multiplication. Consider that locating the meaningful digits
// in the high range would produce a greater effect upon the high digits of the
// multiplication, and notably, that such high digits are likely to be discarded
// by the modulo arithmetic under overflow. We don't want that.
remainingBytes ← remainingBytes × c1
remainingBytes ← remainingBytes ROL r1
remainingBytes ← remainingBytes × c2
hash ← hash XOR remainingBytes
hash ← hash XOR len
hash ← hash XOR (hash >> 16)
hash ← hash × 0x85ebca6b
hash ← hash XOR (hash >> 13)
hash ← hash × 0xc2b2ae35
hash ← hash XOR (hash >> 16)
A sample C implementation follows (for little-endian CPUs):
static inline uint32_t murmur_32_scramble(uint32_t k) {
k *= 0xcc9e2d51;
k = (k << 15) | (k >> 17);
k *= 0x1b873593;
return k;
}
uint32_t murmur3_32(const uint8_t* key, size_t len, uint32_t seed)
{
uint32_t h = seed;
uint32_t k;
/* Read in groups of 4. */
for (size_t i = len >> 2; i; i--) {
// Here is a source of differing results across endiannesses.
// A swap here has no effects on hash properties though.
memcpy(&k, key, sizeof(uint32_t));
key += sizeof(uint32_t);
h ^= murmur_32_scramble(k);
h = (h << 13) | (h >> 19);
h = h * 5 + 0xe6546b64;
}
/* Read the rest. */
k = 0;
for (size_t i = len & 3; i; i--) {
k <<= 8;
k |= key[i - 1];
}
// A swap is *not* necessary here because the preceding loop already
// places the low bytes in the low places according to whatever endianness
// we use. Swaps only apply when the memory is copied in a chunk.
h ^= murmur_32_scramble(k);
/* Finalize. */
h ^= len;
h ^= h >> 16;
h *= 0x85ebca6b;
h ^= h >> 13;
h *= 0xc2b2ae35;
h ^= h >> 16;
return h;
}
| Test string | Seed value | Hash value (hexadecimal) | Hash value (decimal) |
|---|---|---|---|
0x00000000
|
0x00000000
|
0
| |
0x00000001
|
0x514E28B7
|
1,364,076,727
| |
0xffffffff
|
0x81F16F39
|
2,180,083,513
| |
| test | 0x00000000
|
0xba6bd213
|
3,127,628,307
|
| test | 0x9747b28c
|
0x704b81dc
|
1,883,996,636
|
| Hello, world! | 0x00000000
|
0xc0363e43
|
3,224,780,355
|
| Hello, world! | 0x9747b28c
|
0x24884CBA
|
612,912,314
|
| The quick brown fox jumps over the lazy dog | 0x00000000
|
0x2e4ff723
|
776,992,547
|
| The quick brown fox jumps over the lazy dog | 0x9747b28c
|
0x2FA826CD
|
799,549,133
|
See also
[edit]References
[edit]- ^ a b "Hadoop in Java". Hbase.apache.org. 24 July 2011. Archived from the original on 12 January 2012. Retrieved 13 January 2012.
- ^ Chouza et al.
- ^ "Couceiro et al" (PDF) (in Portuguese). p. 14. Retrieved 13 January 2012.
- ^ Tanjent (tanjent) wrote,3 March 2008 13:31:00. "MurmurHash first announcement". Tanjent.livejournal.com. Retrieved 13 January 2012.
{{cite web}}: CS1 maint: numeric names: authors list (link) - ^ Austin Appleby. "SMHasher". Github.com. Retrieved 23 September 2024.
- ^ "MurmurHash2-160". Simonhf.wordpress.com. 25 September 2010. Retrieved 13 January 2012.
- ^ "MurmurHash1". GitHub. Retrieved 12 January 2019.
- ^ "MurmurHash2 on Github". GitHub.
- ^ "MurmurHash2Flaw". GitHub. Retrieved 15 January 2019.
- ^ "MurmurHash3 (see note on MurmurHash2_x86_64)". GitHub. Retrieved 15 January 2019.
- ^ "MurmurHash2_160". 25 September 2010. Retrieved 12 January 2019.
- ^ "MurmurHash3 on Github". GitHub.
- ^ a b Horvath, Adam (10 August 2012). "MurMurHash3, an ultra fast hash algorithm for C# / .NET".
- ^ "pyfasthash in Python". Retrieved 13 January 2012.
- ^ "C implementation in qLibc by Seungyoung Kim". GitHub.
- ^ "murmur3 in Go". GitHub.
- ^ Landman, Davy. "Davy Landman in C#". Landman-code.blogspot.com. Retrieved 13 January 2012.
- ^ "std.digest.murmurhash - D Programming Language". dlang.org. Retrieved 5 November 2016.
- ^ "Toru Maesaka in Perl". metacpan.org. Retrieved 13 January 2012.
- ^ Yuki Kurihara (16 October 2014). "Digest::MurmurHash". GitHub.com. Retrieved 18 March 2015.
- ^ "stusmall/murmur3". GitHub. Retrieved 29 November 2015.
- ^ "owengombas/murmurs". GitHub. Retrieved 13 June 2025.
- ^ "PHP userland implementation of MurmurHash3". github.com. Retrieved 18 December 2017.
- ^ "PHP 8.1 with MurmurHash3 support".
- ^ "tarballs_are_good / murmurhash3". Retrieved 7 February 2015.
- ^ "Haskell". Hackage.haskell.org. Retrieved 13 January 2012.
- ^ "Elm". package.elm-lang.org. Retrieved 12 June 2019.
- ^ "Murmur3.java in Clojure source code on Github". clojure.org. Retrieved 11 March 2014.
- ^ "Scala standard library implementation". GitHub. 26 September 2014.
- ^ Murmur3, part of Guava
- ^ "Murmur3A and Murmur3F Java classes on Github". greenrobot. Retrieved 5 November 2014.
- ^ "Hash4j by Dynatrace". GitHub. Retrieved 24 September 2025.
- ^ "bipthelin/murmerl3". GitHub. Retrieved 21 October 2015.
- ^ Daisuke T (7 February 2019). "MurmurHash-Swift". GitHub.com. Retrieved 10 February 2019.
- ^ GitHub - Xor-el/HashLib4Pascal: Hashing for Modern Object Pascal
- ^ "goncalossilva/kotlinx-murmurhash". GitHub.com. 10 December 2021. Retrieved 14 December 2021.
- ^ raycmorgan (owner). "Javascript implementation by Ray Morgan". Gist.github.com. Retrieved 13 January 2012.
- ^ INRIA. "OCaml Source". GitHub.com.
- ^ Tim Wambach. "Murmur3 in Excel [ENG]". infsec.de.
- ^ "nginx". Retrieved 13 January 2012.
- ^ "Rubinius". GitHub. Retrieved 29 February 2012.
- ^ "libMemcached". libmemcached.org. Retrieved 21 October 2015.
- ^ "switch from MD5 to murmur". GitHub.
- ^ "maatkit". 24 March 2009. Retrieved 13 January 2012.
- ^ "Kyoto Cabinet specification". Fallabs.com. 4 March 2011. Retrieved 13 January 2012.
- ^ "Partitioners". apache.org. 15 November 2013. Retrieved 19 December 2013.
- ^ "Introduction to Apache Cassandra™ + What's New in 4.0 by Patrick McFadin. DataStax Presents". YouTube. 10 April 2019.
- ^ "Solr MurmurHash2 Javadoc". 31 August 2022. Archived from the original on 2 April 2015.
- ^ "hash.cc in vowpalwabbit source code". GitHub.
- ^ "Elasticsearch 2.0 - CRUD and routing changes".
- ^ "Guava Hashing.java". GitHub.
- ^ "Kafka BuiltInPartitioner.java". GitHub.
- ^ Virtual Data Optimizer source code
- ^ "Breaking Murmur: Hash-flooding DoS Reloaded".
External links
[edit]MurmurHash
View on GrokipediaOverview
Definition and Purpose
MurmurHash is a family of non-cryptographic hash functions designed for general-purpose hashing applications, emphasizing speed and uniform distribution of outputs.[2] These functions derive their name from the "murmur"-like iterative mixing process in their core operations, which involves multiplication and rotation to avalanche bits effectively.[2] Unlike cryptographic hashes, MurmurHash prioritizes performance over resistance to intentional attacks, making it unsuitable for security-sensitive uses such as cryptography or message authentication.[2] The primary purposes of MurmurHash include supporting hash-based data structures like hash tables, where it generates indices to enable efficient lookups and storage.[2] It is also widely applied in Bloom filters for space-efficient probabilistic set membership testing, reducing false positives through multiple hash computations.[5] Additionally, MurmurHash facilitates data partitioning in distributed systems, such as assigning keys to nodes in databases like Cassandra to balance load across clusters. Key advantages of MurmurHash lie in its high execution speed on modern processors, achieving rates up to several gigabytes per second for typical inputs, alongside excellent avalanche properties that ensure good distribution and minimize collisions.[2] The algorithm's simplicity further aids implementation, requiring minimal code while supporting output sizes of 32 bits or 128 bits across variants, tailored to common use cases in caches and non-relational databases.[2]Development History
MurmurHash was developed by Austin Appleby and first released in 2008 as a family of non-cryptographic hash functions aimed at high performance in general-purpose hashing applications.[2] The initial version, MurmurHash 1.0, focused on 32-bit hashing and was designed to surpass the speed and distribution quality of prior simple hashes, such as Bob Jenkins' lookup3, while maintaining low collision rates in hash tables.[6] In 2009, Appleby released MurmurHash 2.0 to address minor weaknesses in the first version and extend support to 64-bit architectures, enabling better utilization of modern processors at the time.[7] In 2011, Appleby undertook a major redesign, releasing MurmurHash 3.0 after extensive testing with the SMHasher suite he developed to evaluate hash function robustness.[8] This version introduced improvements for both 32-bit and 64-bit platforms, including endian-neutral processing to ensure consistent results across architectures and configurable seed options for enhanced security against hash flooding attacks.[9] The redesign was motivated by the growing demands of high-performance computing environments, where faster mixing and finalization steps were needed without sacrificing avalanche properties or speed; it remains the last official update, with no subsequent versions released due to its proven stability and wide adoption. As of 2025, MurmurHash3 remains the latest version, with no official updates since 2011 due to its stability.[2] Key adoption milestones began shortly after MurmurHash 3.0's finalization, with integration into Google's Guava library in February 2013 (Guava 14.0).[10] By 2012, it was incorporated into Apache Hadoop for efficient data partitioning and hashing tasks. With adoptions including Apache Cassandra in 2012 for partitioning keys and Redis for hash table operations by 2015, it solidified its role in open-source high-throughput projects.[11]Variants
MurmurHash 1.0
MurmurHash 1.0, the original version of the hash function released in 2008 by Austin Appleby, produces a 32-bit output and is designed for non-cryptographic applications requiring fast hashing with good distribution properties.[9] It processes input data in 32-bit (4-byte) blocks, making it efficient for typical use cases in hash tables and similar structures.[9] A key feature is its use of a simple 32-bit seed parameter to initialize the hash, allowing for seeded variations to reduce collisions in probabilistic settings.[9] The function takes three parameters: an input key as a byte array (const void *key), the length of the key in bytes (int len), and the 32-bit seed (uint32_t seed).[12] These enable flexible hashing of arbitrary binary data while incorporating length awareness to avoid certain attacks or biases.[12] The basic structure involves initializing the hash state, processing full 4-byte blocks through a mixing operation, handling any remaining bytes (tail), and applying finalization steps.[12] For each block, the algorithm reads the 4 bytes as an unsigned integer k, mixes it into the hash h via addition, multiplication by a constant (m = 0xc6a4a793), and a right-shift XOR (h ^= h >> 16), effectively applying a single avalanche round per block.[12] The tail processing incorporates leftover bytes with similar mixing, followed by finalization that includes additional multiplications and shift-XORs to propagate bits across the hash value.[12] Here is an outline of the algorithm in pseudocode:function MurmurHash1(key, len, seed):
m = 0xc6a4a793
r = 16
h = seed ^ (len * m)
data = key
while len >= 4:
k = read_uint32(data) // Read 4 bytes as unsigned int
h += k
h *= m
h ^= h >> r
data += 4
len -= 4
// Tail handling
if len > 0:
if len == 3 or len >= 3:
h += read_uint8(data + 2) << 16
if len >= 2:
h += read_uint8(data + 1) << 8
if len >= 1:
h += read_uint8(data)
h *= m
h ^= h >> r
// Finalization
h *= m
h ^= h >> 10
h *= m
h ^= h >> 17
return h
function MurmurHash1(key, len, seed):
m = 0xc6a4a793
r = 16
h = seed ^ (len * m)
data = key
while len >= 4:
k = read_uint32(data) // Read 4 bytes as unsigned int
h += k
h *= m
h ^= h >> r
data += 4
len -= 4
// Tail handling
if len > 0:
if len == 3 or len >= 3:
h += read_uint8(data + 2) << 16
if len >= 2:
h += read_uint8(data + 1) << 8
if len >= 1:
h += read_uint8(data)
h *= m
h ^= h >> r
// Finalization
h *= m
h ^= h >> 10
h *= m
h ^= h >> 17
return h
MurmurHash 2.0
MurmurHash 2.0 serves as a transitional refinement in the MurmurHash family, extending support to 64-bit systems while building on the 32-bit foundation of version 1.0. Released by Austin Appleby, it introduced a dedicated 64-bit output variant alongside the existing 32-bit one, with improved mixing mechanisms tailored for longer input keys to enhance overall hash quality.[9] The algorithm retains block-based processing, handling 4-byte blocks in the 32-bit mode and 8-byte blocks in the 64-bit mode, making it adaptable to both architectures without fundamental redesign.[13] Parameters in MurmurHash 2.0 mirror those of version 1.0 in structure, utilizing a seed for randomization—now supporting 64-bit seeds in the extended variant—but provide distinct functions for 32-bit (e.g., MurmurHash2) and 64-bit (e.g., MurmurHash64A) outputs to optimize performance on target hardware.[13] This separation allows developers to select the appropriate output size based on application needs, such as hash table sizing in 64-bit environments.[9] Compared to MurmurHash 1.0, version 2.0 incorporates better handling of partial blocks via detailed tail-processing logic for remainders of 1 to 7 bytes, reducing inconsistencies in short or uneven inputs.[13] It also mitigates bias in hash distributions for certain input patterns through refined multiplication and rotation constants, resulting in superior speed and robustness overall.[9] MurmurHash 2.0, however, remains endian-specific, assuming little-endian byte order as on x86 platforms, which limits portability to big-endian systems without modifications.[13] It is also vulnerable to some collision attacks identified in subsequent analyses using tools like the SMHasher test suite, and its performance lags behind MurmurHash 3.0 on 64-bit hardware due to less efficient mixing rounds.[2] This version saw brief adoption in early 64-bit applications, including caching frameworks like JBoss Infinispan, where its speed-to-quality balance suited non-cryptographic lookups before MurmurHash 3.0 superseded it. For illustration, the 64-bit variant processes 8-byte blocks with adjusted constants, such as a multiplication factor of 0xc6a4a7935bd1e995 and a right shift by 47 bits, to propagate changes effectively across the hash state.[13]MurmurHash 3.0
MurmurHash 3.0, released in 2011 by Austin Appleby, represents the current standard version of the MurmurHash family, designed as a fully portable non-cryptographic hash function suitable for general-purpose hashing tasks.[2] It addresses limitations in earlier iterations by providing endian-neutral processing, ensuring consistent results across little-endian and big-endian systems without platform-specific adjustments.[14] This version supports both 32-bit and 64-bit architectures, making it adaptable to diverse hardware environments while maintaining high performance.[2] Key features of MurmurHash 3.0 include its ability to produce variable output sizes of 32 bits or 128 bits, allowing users to select the appropriate hash length based on application needs, such as space efficiency in hash tables or collision resistance in probabilistic data structures. It also offers flexible seeding options, with 32-bit or 64-bit seeds to enable multiple independent hash instances for techniques like double hashing. The primary parameters are the input key (a byte array), its length, the seed value, and an output size selector, which together define the hashing operation. Within MurmurHash 3.0, there are three main variants optimized for specific platforms: the x86 variant for 32-bit hashing on x86 processors, the x64 variant, which produces 128-bit hashes optimized for x64 architectures, and the x86-128 variant, which generates 128-bit hashes for enhanced quality on x86 systems.[2] Compared to previous versions, MurmurHash 3.0 introduces endian-neutral mixing to eliminate byte-order dependencies, leverages SSE and AVX instructions for accelerated processing on modern CPUs, and achieves a superior avalanche effect for better bit diffusion and distribution properties.[14] These enhancements result in faster execution and more robust hashing without sacrificing simplicity.[2] MurmurHash 3.0 has seen widespread adoption as the default hashing algorithm in numerous modern libraries. For instance, Python's mmh3 package implements it as its core functionality, providing bindings for high-performance applications as of 2025.[15] Similarly, Google's Guava library in Java includes MurmurHash 3.0 variants for 32-bit and 128-bit outputs, integrated into its hashing utilities for production use.[16] Although there have been no official updates from the original author since 2011, community efforts continue to maintain and adapt implementations for emerging architectures, ensuring ongoing relevance.[2]Algorithm
Core Principles
MurmurHash employs a block-based processing approach, where the input data is divided into fixed-size blocks—typically 4 bytes for 32-bit variants and 16 bytes for 64-bit variants—and processed sequentially to generate the hash value. Any remaining bytes that do not fill a complete block are handled separately as a tail during the finalization phase, ensuring the entire input contributes to the output without truncation.[2] A key feature is the use of a seed parameter to initialize the internal hash state, which serves as a salting mechanism. This allows the same input to produce distinct hash values across different invocations, facilitating applications such as generating multiple independent hashes from a single input set or mitigating predictable patterns in hash table usage.[2] The core mixing operations aim to thoroughly diffuse the bits from each input block throughout the accumulating hash state, promoting a uniform distribution of output values and minimizing the likelihood of collisions in downstream applications like hash tables or bloom filters. These operations are designed to achieve strong avalanche properties, where small changes in the input lead to significant changes in the output.[2] MurmurHash variants share empirically tuned magic constants, such as 0x85ebca6b in 32-bit implementations, which are carefully selected through testing to enhance mixing quality and ensure desirable statistical properties without relying on complex computations. The algorithm aspires to pseudorandom behavior, producing outputs that mimic a random oracle for typical, non-adversarial inputs, thereby providing reliable non-cryptographic hashing suitable for performance-critical scenarios.[2] In terms of high-level flow, MurmurHash loads successive input blocks into the processing pipeline, mixes each one into an accumulating hash value, and concludes by incorporating the total input length to prevent length-extension ambiguities and finalize the result. This structured progression balances computational efficiency with quality, making it adaptable for both streaming and batch processing of data.[2]Mixing and Finalization Steps
The mixing process in MurmurHash3 involves applying a series of multiplication, rotation, and XOR operations to input blocks and partial blocks (tails) to diffuse bits effectively. For the 32-bit variant (MurmurHash3_x86_32), each 4-byte block is loaded as a 32-bit value , which is then mixed using the constants 0xcc9e2d51 and 0x1b873593: , followed by a left rotation by 15 bits (), another multiplication , and XOR into the accumulator . The accumulator is then further mixed: , .[17][18] The tail, consisting of remaining bytes less than 4, is handled by shifting those bytes into a 32-bit (e.g., for 3 bytes: k_1 \leftarrow \text{tail}{{grok:render&&&type=render_inline_citation&&&citation_id=0&&&citation_type=wikipedia}} + (\text{tail}{{grok:render&&&type=render_inline_citation&&&citation_id=1&&&citation_type=wikipedia}} \ll 8) + (\text{tail}{{grok:render&&&type=render_inline_citation&&&citation_id=2&&&citation_type=wikipedia}} \ll 16)) and applying the block mixing operations (multiplications, rotation, XOR into ) without the subsequent accumulator mixing step, yielding the final 32-bit output after finalization. The finalization step incorporates the input length and ensures complete bit avalanche using the dedicated mixing function , defined as: Prior to applying , the length is XORed into the accumulator: . This fmix32 promotes strong avalanche properties, where each output bit depends on a large number of input bits.[17][18] In the 64-bit variant (underlying the 128-bit output version, MurmurHash3_x64_128), mixing operates on 16-byte blocks with two parallel 64-bit accumulators and , using constants and . For each block, two 8-byte values and are loaded; is mixed as , , , then , followed by , . Similarly, is mixed as , , , , , .[17][18] Tail bytes (less than 16) are incorporated by loading them into (for 0-8 bytes, partial low bits for 0-7 or full for 8) or into partial (low bits for 9-15 bytes, with high byte zeroed) and full , then applying the full block mixing operations to each, including accumulator updates. Finalization XORs the length into both accumulators: , ; mixes them pairwise , ; applies the 64-bit mixing function to each; and combines again , to produce the 128-bit output as two 64-bit values. The is defined as: This structure ensures thorough bit diffusion across the parallel hashes.[17][18]Implementations
Official Releases
The official reference implementation of MurmurHash was developed by its creator, Austin Appleby, in C/C++, with initial commits dating from 2008 to 2011. This canonical source code is hosted on GitHub in the smhasher repository, which serves as the primary distribution point for the hash functions alongside the SMHasher test suite.[2] Early versions, MurmurHash 1.0 and 2.0, are provided as compact single-file implementations suitable for direct inclusion in projects. In contrast, MurmurHash 3.0 expands into a fuller library structure, featuring dedicated functions for 32-bit and 128-bit hashing on x86 and x64 architectures. The repository includes comprehensive test vectors to verify correct implementation across these variants.[2] The code is released into the public domain, allowing unrestricted use, modification, and distribution without licensing obligations.[2] Maintenance of the project halted following the completion of MurmurHash 3.0 in April 2011, leaving it in a stable, archived state with no subsequent development. The original distribution site, murmurhash.sourceforge.net, is now defunct, but the GitHub mirror preserves all assets for download and verification.[2] While optimized for x86 processors, the reference implementation is portable and has been adapted to other architectures through minor adjustments to endianness and integer handling.[2]Language-Specific Adaptations
MurmurHash has been adapted across numerous programming languages through dedicated libraries and built-in modules, enabling its use in diverse ecosystems while addressing language-specific constraints such as memory management, performance, and platform portability. These ports often maintain fidelity to the original algorithm but incorporate optimizations like architecture-specific assembly or bindings to native code for efficiency. In Python, the mmh3 library provides a comprehensive implementation of MurmurHash3, available as both a pure Python version via pymmh3 and a high-performance Cython extension for faster execution. This library supports all major MurmurHash3 variants, including 32-bit, 64-bit, and 128-bit hashes, making it suitable for applications requiring consistent hashing in data science and web development.[15][19] Java integrations prominently feature MurmurHash3 within the Guava library from Google, specifically through thecom.google.common.hash.Murmur3_128 class, which generates 128-bit hashes and is widely used in Android applications for tasks like caching and data partitioning. Standalone pure Java implementations, such as the murmur library, offer additional flexibility for environments without Guava dependencies.[20][21]
For C#, third-party libraries like Data.HashFunction.MurmurHash provide robust MurmurHash3 support via NuGet, implementing 32-bit and 128-bit variants compatible with .NET Core and later frameworks. While .NET's System.HashCode does not natively include MurmurHash3, these packages fill the gap for scenarios demanding high-speed non-cryptographic hashing, such as in game development or database indexing.
In Go, the twmb/murmur3 package delivers a native implementation of MurmurHash3 with assembly optimizations for AMD64 architectures, supporting 32-bit, 64-bit, and 128-bit outputs for efficient string and byte slicing. Go's standard library includes hash/maphash, a distinct but performance-oriented hash inspired by principles similar to MurmurHash, though it is not a direct port.[22]
Rust adaptations include the murmur3 crate, a pure Rust port of MurmurHash3 that ensures safety and portability without unsafe code, alongside the seahash crate, which draws inspiration from MurmurHash's design for even faster performance but diverges in its mixing mechanics. These crates are commonly used in systems programming for hash tables and bloom filters.[23][24]
JavaScript implementations, such as murmurhash-js, offer an optimized port of MurmurHash algorithms for both Node.js and browser environments, processing strings and buffers to produce consistent 32-bit or 128-bit hashes suitable for client-side data structures.[25]
Beyond language-specific ports, MurmurHash is integrated into major frameworks like Apache Hadoop, where the org.apache.hadoop.util.hash.MurmurHash class facilitates data partitioning in distributed computing. In Redis clients such as Jedis, it aids key hashing for consistent operations across clusters. The V8 JavaScript engine in Chrome employs MurmurHash variants for internal optimizations in object hashing and garbage collection.
Recent adaptations as of 2025 have focused on platform-specific enhancements, including SIMD instructions in assembly-optimized ports for ARM and x86-64 to boost throughput in vectorized workloads, and fixes for endianness discrepancies to ensure cross-platform hash consistency. For instance, some C++-based bindings in Python and Go libraries explicitly handle byte-order variations to prevent mismatches on big-endian systems.[22][26]
Performance and Quality
Speed Benchmarks
MurmurHash3, particularly its x64 variant, demonstrates high throughput in benchmarks, achieving approximately 7.1 GB/s for processing large inputs on a 3.6 GHz AMD Ryzen 5 PRO 3350G processor.[27] This performance positions it as one of the faster non-cryptographic hash functions, though it lags behind more recent alternatives optimized for modern hardware. Benchmarks from the SMHasher suite, originally developed in 2011 and updated through 2023, provide standardized evaluations across various hash functions using consistent test conditions.[28] Comparisons highlight MurmurHash3's efficiency relative to older algorithms. For instance, it outperforms FNV-1a by a factor of about 9x (FNV-1a at 0.75 GB/s) and Jenkins' one-at-a-time hash (OOAT) at 0.62 GB/s, offering similar or better speed with superior distribution properties.[27] Against contemporaries like CityHash64 (14.4 GB/s) and xxHash64 (11.2 GB/s), MurmurHash3 is slower by roughly 1.6x to 2x, reflecting trade-offs in algorithmic complexity and SIMD utilization.[27]| Hash Function | Throughput (GB/s) | Relative to MurmurHash3 |
|---|---|---|
| MurmurHash3 (x64, finalizer) | 7.1 | 1x |
| FNV-1a | 0.75 | 9.5x slower |
| Jenkins OOAT | 0.62 | 11.5x slower |
| CityHash64 | 14.4 | 2x faster |
| xxHash64 | 11.2 | 1.6x faster |
