Hubbry Logo
MD4MD4Main
Open search
MD4
Community hub
MD4
logo
8 pages, 0 posts
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
Contribute something
MD4
MD4
from Wikipedia
MD4
General
DesignersRonald Rivest
First publishedOctober 1990[1]
SeriesMD2, MD4, MD5, MD6
Cipher detail
Digest sizes128 bits
Block sizes512 bits
Rounds3
Best public cryptanalysis
A collision attack published in 2007 can find collisions for full MD4 in less than two hash operations.[2]

The MD4 Message-Digest Algorithm is a cryptographic hash function developed by Ronald Rivest in 1990.[3] The digest length is 128 bits. The algorithm has influenced later designs, such as the MD5, SHA-1 and RIPEMD algorithms. The initialism "MD" stands for "Message Digest".

One MD4 operation. MD4 consists of 48 of these operations, grouped in three rounds of 16 operations. F is a nonlinear function; one function is used in each round. Mi denotes a 32-bit block of the message input, and Ki denotes a 32-bit constant, different for each round.

The security of MD4 has been severely compromised. The first full collision attack against MD4 was published in 1995, and several newer attacks have been published since then. As of 2007, an attack can generate collisions in less than two MD4 hash operations.[2] A theoretical preimage attack also exists.

A variant of MD4 is used in the ed2k URI scheme to provide a unique identifier for a file in the popular eDonkey2000 / eMule P2P networks. MD4 was also used by the rsync protocol (prior to version 3.0.0).

MD4 is used to compute NTLM password-derived key digests on Microsoft Windows NT, XP, Vista, 7, 8, 10 and 11.[4]

Security

[edit]

Weaknesses in MD4 were demonstrated by Den Boer and Bosselaers in a paper published in 1991.[5] The first full-round MD4 collision attack was found by Hans Dobbertin in 1995, which took only seconds to carry out at that time.[6] In August 2004, Wang et al. found a very efficient collision attack, alongside attacks on later hash function designs in the MD4/MD5/SHA-1/RIPEMD family. This result was improved later by Sasaki et al., and generating a collision is now as cheap as verifying it (a few microseconds).[2]

In 2008, the preimage resistance of MD4 was also broken by Gaëtan Leurent, with a 2102 attack.[7] In 2010 Guo et al published a 299.7 attack.[8]

In 2011, RFC 6150 stated that RFC 1320 (MD4) is historic (obsolete).

MD4 hashes

[edit]

The 128-bit (16-byte) MD4 hashes (also termed message digests) are typically represented as 32-digit hexadecimal numbers. The following demonstrates a 43-byte ASCII input and the corresponding MD4 hash:

MD4("The quick brown fox jumps over the lazy dog")
= 1bee69a46ba811185c194762abaeae90

Even a small change in the message will (with overwhelming probability) result in a completely different hash, e.g. changing d to c:

MD4("The quick brown fox jumps over the lazy cog")
= b86e130ce7028da59e672d56ad0113df

The hash of the zero-length string is:

MD4("") = 31d6cfe0d16ae931b73c59d7e0c089c0

MD4 test vectors

[edit]

The following test vectors are defined in RFC 1320 (The MD4 Message-Digest Algorithm)

MD4 ("") = 31d6cfe0d16ae931b73c59d7e0c089c0
MD4 ("a") = bde52cb31de33e46245e05fbdbd6fb24
MD4 ("abc") = a448017aaf21d8525fc10ae87aa6729d
MD4 ("message digest") = d9130a8164549fe818874806e1c7014b
MD4 ("abcdefghijklmnopqrstuvwxyz") = d79e1c308aa5bbcdeea8ed63df412da9
MD4 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = 043f8582f241db351ce627e153e7f0e4
MD4 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = e33b4ddc9c38f2199c3e7b164fcc0536

MD4 collision example

[edit]

Let:

 k1 = 839c7a4d7a92cb5678a5d5b9eea5a7573c8a74deb366c3dc20a083b69f5d2a3bb3719dc69891e9f95e809fd7e8b23ba6318edd45e51fe39708bf9427e9c3e8b9
 k2 = 839c7a4d7a92cbd678a5d529eea5a7573c8a74deb366c3dc20a083b69f5d2a3bb3719dc69891e9f95e809fd7e8b23ba6318edc45e51fe39708bf9427e9c3e8b9
MD4(k1) = MD4(k2) = 4d7e6a1defa93d2dde05b45d864c429b

Note that two hex-digits of k1 and k2 define one byte of the input string, whose length is 64 bytes .

See also

[edit]

References

[edit]
[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
MD4 (Message-Digest Algorithm 4) is a cryptographic hash function that produces a 128-bit hash value, known as a message digest, from an input message of arbitrary length. Developed by Ronald L. Rivest of the MIT Laboratory for and RSA Data Security, Inc., it was published in 1990 as a faster and simpler successor to the earlier MD2 algorithm, with the primary goal of enabling secure compression of large files for digital signature applications under public-key cryptosystems like RSA. The algorithm processes input in 512-bit blocks through three rounds of 16 operations each, using a 128-bit buffer initialized with fixed constants and incorporating bitwise functions (f, g, h) along with modular additions and rotations optimized for 32-bit processors. At the time of its release, MD4 was conjectured to offer strong security properties, including —making it computationally infeasible (requiring more than 264 operations) to find two distinct messages with the same digest—and —requiring more than 2128 operations to find a message matching a given digest. It was specified in RFC 1320 in 1992 and implemented in various systems for checks, such as early versions of Microsoft's authentication protocol. However, subsequent revealed significant weaknesses: a full was demonstrated in 1996 by Hans Dobbertin with 220 MD4 computations, and it was further improved in 2004 by Xiaoyun Wang and colleagues with a complexity of just 28 hash operations (verifiable manually). Further advances included a first in 2008 with 2100 effort and improvements reducing second preimage attacks to 269.4 complexity, along with key recovery vulnerabilities in HMAC-MD4 requiring 277 computations. Due to these flaws, MD4 provides no meaningful and has been deprecated since the mid-1990s by its creators at RSA, with the IETF formally moving it to Historic status in 2011 to discourage any further use in cryptographic protocols. It served as the basis for subsequent hash functions like (1991) and influenced the design of , but modern applications have shifted to secure alternatives such as SHA-256 from the family. Despite its obsolescence, MD4 remains of historical and academic interest for studying design and differential techniques.

Overview and History

Description and Purpose

MD4 is a developed by Ronald Rivest in 1990. It takes an input message of arbitrary length and produces a fixed 128-bit output, typically represented as a 32-character digest. The primary purpose of MD4 is to provide a secure and efficient method for compressing , particularly for use in applications where large files must be "compressed" before with a private key. It was designed to support checks by generating a unique that verifies whether a has been altered. Optimized for fast computation on 32-bit processors, MD4 achieves high throughput, such as approximately 1,450,000 bytes per second on a SUN Sparc workstation. MD4 processes input messages in 512-bit blocks and maintains an internal state consisting of four 32-bit words, labeled A, B, C, and D. These state variables are initialized to fixed constants: A = 0x67452301, B = 0xefcdab89, C = 0x98badcfe, and D = 0x10325476. Compared to its predecessor MD2, which was optimized for 8-bit processors, MD4 offers significant improvements in speed and simplicity while maintaining the 128-bit output size and targeting enhanced security against collision attacks at the time of its release. Rivest later developed MD5 in 1991 as a strengthened successor to address emerging concerns with MD4.

Development and Key Milestones

MD4 was developed by Ronald L. Rivest, a prominent cryptographer known for co-inventing the RSA public-key cryptosystem in 1977 alongside and , and for creating the MD2 message-digest algorithm in 1989 as an early optimized for 8-bit processors. Building on these foundations, Rivest designed MD4 in late 1990 as a faster and more secure successor to MD2, specifically tailored for 32-bit architectures to address the growing need for efficient one-way hash functions in digital signature schemes. The primary motivation for MD4 stemmed from the requirements of emerging protocols and applications, such as (PGP), which demanded a secure method to compress arbitrary-length messages into a fixed 128-bit digest for integrity verification and signing with public-key systems like RSA, without relying on large lookup tables that could hinder performance. Rivest placed the algorithm in the to encourage review and potential standardization, reflecting the collaborative ethos of early cryptographic development in the era. Key milestones include the first public description in RFC 1186 in October 1990, which outlined an initial version of the algorithm, followed by the definitive specification in RFC 1320 published by the Internet Engineering Task Force (IETF) in April 1992, providing a portable reference implementation. MD4 saw rapid adoption shortly thereafter, notably in the initial release of PGP software in 1991, where it served as the core hashing mechanism for message authentication, and in various early cryptographic libraries by 1993, underscoring its role in facilitating secure email and data exchange protocols. This widespread integration was short-lived, however, as emerging security concerns prompted Rivest to introduce MD5 in 1991 as a strengthened variant.

Algorithm Design

Message Preparation and Padding

In the MD4 algorithm, the input message is first prepared by appending padding bits to ensure it can be divided into 512-bit blocks for processing. This padding scheme begins with the addition of a single '1' bit to the end of the message, followed by enough '0' bits to make the total length congruent to 448 512 bits. The purpose of this padding is to align the message length with MD4's block-based design, facilitating the compression function's operation on fixed-size inputs. Padding is applied regardless of the message's original length; if the length is already congruent to 448 modulo 512 bits, a full 512 bits of padding (starting with the '1' bit) are still appended to create an additional block. At least one bit is always added, and the maximum padding added in a single step is 512 bits. Following the bit padding, a 64-bit little-endian representing the original length in bits is appended as the final 64 bits, resulting in a total length that is a multiple of 512 bits. For messages of 0 to 447 bits, the results in exactly one 512-bit block; for 448 to 511 bits, it results in two 512-bit blocks. This ensures all inputs are processed through the compression function. MD4 supports a maximum original message length of 26412^{64} - 1 bits to avoid overflow in the 64-bit length field; if the message exceeds this, only the low-order 64 bits of the length are used, though such cases are not standard.

Initialization and State Management

The MD4 hash function begins with the initialization of a state consisting of four 32-bit registers, denoted as A, B, C, and D. These registers are set to specific initial values: A = 0x67452301, B = 0xefcdab89, C = 0x98badcfe, and D = 0x10325476. These constants are chosen arbitrarily but fixed to ensure consistent hashing across implementations. The state is maintained as a chain of these four 32-bit words throughout the computation. For each 512-bit message block, the current values of A, B, C, and D serve as the input to the compression function. To preserve the chaining, the pre-compression values are temporarily saved (as AA, BB, CC, and DD), the compression is performed to update A, B, C, and D, and then the saved values are added to the updated registers: A ← AA + A, B ← BB + B, C ← CC + C, and D ← DD + D. This modular addition (modulo 2^32) ensures that the output state of one block directly becomes the input state for the subsequent block, enabling the processing of messages longer than 512 bits. MD4 operates on data in little-endian byte order, where the least significant byte of each 32-bit word is stored first. This convention applies to both the representation of the state registers and the parsing of message blocks into 32-bit words, ensuring portability across different system architectures.

Compression Function Steps

The MD4 compression function processes each 512-bit message block to update the 128-bit variables, producing an intermediate hash value that is carried forward. Each block is divided into 16 32-bit words, denoted as M through M, which serve as inputs to a series of 48 computational steps organized into three rounds of 16 steps each. These steps perform nonlinear transformations, modular additions, and bit rotations to achieve and mixing across the state variables A, B, C, and D (initialized from the chaining values). All arithmetic operations are performed 2^{32}, ensuring the values remain within 32 bits. In the first round (steps to 15), the function F is applied, defined as F(X, Y, Z) = (X \land Y) \lor (\lnot X \land Z), which selectively combines bits based on the value of X. Each step updates one of the state variables according to the formula: a=(a+F(b,c,d)+M)sa = (a + F(b, c, d) + M) \ll s where k indexes the message word (following a specific sequence: ,1,,3,,,6,7,8,9,10,11,12,13,14,15), and s is the left amount, through 3, 7, 11, 7, 11, 19, 3, 7, 11, 7, 11, 19, 3, 7, 11, 7 for the respective steps. No constant is added in this round (equivalent to adding ). The state variables are cyclically permuted after each update: for example, after updating A, the registers become D, A, B, C for the next step. This round emphasizes bitwise selection to propagate changes from the message into the state. The second round (steps 16 to 31) employs the function G, defined as G(X, Y, Z) = (X \land Y) \lor (X \land Z) \lor (Y \land Z), which provides a more balanced mixing by including majority-like operations among the inputs. The update formula becomes: a=(a+G(b,c,d)+M+0x5A827999)sa = (a + G(b, c, d) + M + 0x5A827999) \ll s Here, k follows the sequence 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, and rotation amounts s cycle through 3, 5, 9, 13, 3, 5, 9, 13, 3, 5, 9, 13, 3, 5, 9, 13. The constant 0x5A827999, derived from the sine of fractions of π to promote avalanche effects, is added to each step. State permutation follows the same cyclic pattern as in the first round, enhancing nonlinear diffusion. The third round (steps 32 to 47) uses the function H, defined as H(X, Y, Z) = X \oplus Y \oplus Z, a linear XOR operation that further spreads bits across the state for parity-like mixing. The formula is: a=(a+H(b,c,d)+M+0x6ED9EBA1)sa = (a + H(b, c, d) + M + 0x6ED9EBA1) \ll s with k sequencing as 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15, and s values of 3, 9, 11, 15, 3, 9, 11, 15, 3, 9, 11, 15, 3, 9, 11, 15. The constant 0x6ED9EBA1, similarly based on trigonometric values, is incorporated. Cyclic permutation of the state continues, completing the block's transformation. Upon completing the 48 steps, the updated state values are added to the initial chaining variables: A \leftarrow A + AA, B \leftarrow B + BB, C \leftarrow C + CC, D \leftarrow D + DD, where AA through DD are the saved copies from the block's start. This produces the value for the next block or the final 128-bit hash if it is the last block. The design of these rounds, with varying functions, word orders, and rotations, aims to ensure thorough and irreversibility in the compression.

Security Analysis

Known Weaknesses and Theoretical Flaws

MD4 employs the Merkle-Damgård construction, which appends the message length after padding to the input blocks processed by the compression function; this structural choice renders it vulnerable to length extension attacks, where an adversary can extend a known hash value to forge a valid hash for a longer message without knowledge of the original secret. The early rounds of MD4 exhibit weak diffusion properties, with insufficient bit mixing that allows differences introduced in the input to propagate predictably through the state registers, facilitating the application of differential cryptanalysis techniques. This limited avalanche in the initial steps enables attackers to construct low-probability differential paths that exploit the linear nature of the operations, compromising the function's resistance to chosen-difference manipulations. MD4's compression function consists of 48 steps across three rounds, a choice that was deemed inadequate for providing full 128-bit even by mid-1990s standards, as it failed to achieve the expected level of non-linearity and mixing required to withstand emerging cryptanalytic methods. The reliance on relatively simple functions and fixed shift operations in these steps contributes to predictable behavior under analysis, falling short of the robustness needed for long-term . Theoretically, MD4 was intended to offer on the order of 2^{64} operations due to its 128-bit output size, but the specific choices in its nonlinear functions and round structure result in a to attain this bound, allowing collisions to be found with significantly lower complexity through structured differences. In comparison to its successor , MD4's three-round structure lacks the additional fourth round introduced in MD5, which enhances the by further propagating bit changes across the state and improving overall to better resist differential attacks. This simpler round design in MD4 prioritizes computational efficiency over security margins, contributing to its earlier vulnerability revelations.

Practical Attacks and Exploits

The first practical on MD4 was demonstrated by Hans Dobbertin in 1996, utilizing differential cryptanalysis to identify two distinct 512-bit blocks that produce identical hash outputs. This attack targets the full compression function and requires roughly 2^{20} evaluations of the MD4 function, executable in seconds on contemporary 1990s hardware such as a standard PC. Building on earlier partial attacks, including those on reduced rounds by den Boer and Bosselaers from , Dobbertin's method extended the approach to the complete , confirming MD4's vulnerability to collisions with high probability. The feasibility of generating such collisions in minimal time underscored MD4's inadequacy for collision-resistant applications, prompting immediate scrutiny of its deployment in protocols like digital signatures and integrity checks. In 2004, Xiaoyun Wang and colleagues further advanced collision attacks on MD4, achieving practical results with a complexity of 2^8 MD4 computations (about 256 operations, verifiable by hand) through refined differential paths. This improvement rendered collisions trivial to compute even on modest resources, amplifying concerns over MD4's in legacy systems. Regarding second-preimage resistance, weaknesses were first exploited in 2005 by Hongbo Yu, Gaoli Wang, and others, who developed a second-preimage attack with complexity reduced compared to the brute-force 2^{128}. For first preimage resistance, Gaëtan Leurent presented an attack in 2008 with complexity approximately 2^{102}. These were further improved in 2010 by Jian Guo et al., reducing first preimage attacks to 2^{78.4} and second preimage attacks to 2^{69.4} with pre-computations. Although not as inexpensive as collisions, these attacks demonstrated that finding colliding messages is feasible with significant but achievable resources, further eroding trust in MD4 for purposes. Although no high-profile malware like (which targeted weaknesses for certificate forgery in 2012) directly leveraged MD4, its vulnerabilities have contributed to exploits in outdated protocols, such as legacy authentication in Windows environments where MD4 hashing enables offline attacks on captured challenges. MD4's demonstrated flaws led to its deprecation by NIST in 2011 via RFC 6150, which retired the algorithm to historic status and explicitly advised against its use in any new or existing security contexts due to the ease of practical attacks. Today, MD4 is considered obsolete, with modern systems migrating to secure alternatives like SHA-256 to mitigate collision and preimage risks.

Examples and Implementations

Standard Test Vectors

Standard test vectors for the MD4 algorithm are defined in RFC 1320 to validate the correctness of implementations by comparing computed hashes against expected outputs for specific inputs. These vectors form part of a comprehensive , ensuring and adherence to the algorithm's specifications across different computing environments. The following table presents key test vectors from RFC 1320, including the empty message, a single-byte input, and a 62-byte message:
InputDescriptionMD4 Hash (hexadecimal)
(empty string)No input bytes31d6cfe0d16ae931b73c59d7e0c089c0
"a"Single ASCII byte (0x61)bde52cb31de33e46245e05fbdbd6fb24
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"62 ASCII bytes covering uppercase, lowercase letters, and digits043f8582f241db351ce627e153e7f0e4
These vectors account for MD4's message preparation, including with a single '1' bit followed by zeros to reach a congruent to 448 512 bits, and appending the 64-bit . Implementations must process data in little-endian byte order for 32-bit words to achieve reproducible results matching these hashes.

Collision Demonstration

A prominent demonstration of MD4's collision vulnerability was provided by Hans Dobbertin in 1996, who developed an attack to construct colliding s for the full MD4 algorithm. The attack exploits structural weaknesses through differential , allowing collisions to be found with low complexity (around 220 operations) in seconds on contemporary hardware. Dobbertin's method adapts techniques from attacking related functions like , propagating differences through MD4's rounds to produce distinct messages with the same hash output. This example highlights MD4's lack of , as small input modifications can lead to identical digests, undermining its use in integrity checks.
Add your contribution
Related Hubs
Contribute something
User Avatar
No comments yet.