Hubbry Logo
Merkle treeMerkle treeMain
Open search
Merkle tree
Community hub
Merkle tree
logo
8 pages, 0 posts
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
Merkle tree
Merkle tree
from Wikipedia
An example of a binary hash tree. Hashes 0-0 and 0-1 are the hash values of data blocks L1 and L2, respectively, and hash 0 is the hash of the concatenation of hashes 0-0 and 0-1.

In cryptography and computer science, a hash tree or Merkle tree is a tree in which every "leaf" node is labelled with the cryptographic hash of a data block, and every node that is not a leaf (called a branch, inner node, or inode) is labelled with the cryptographic hash of the labels of its child nodes. A hash tree allows efficient and secure verification of the contents of a large data structure. A hash tree is a generalization of a hash list and a hash chain.

Demonstrating that a leaf node is a part of a given binary hash tree requires computing a number of hashes proportional to the logarithm of the number of leaf nodes in the tree.[1] Conversely, in a hash list, the number is proportional to the number of leaf nodes itself. A Merkle tree is therefore an efficient example of a cryptographic commitment scheme, in which the root of the tree is seen as a commitment and leaf nodes may be revealed and proven to be part of the original commitment.[2]

The concept of a hash tree is named after Ralph Merkle, who patented it in 1979.[3][4]

Uses

[edit]

Hash trees can be used to verify any kind of data stored, handled and transferred in and between computers. They can help ensure that data blocks received from other peers in a peer-to-peer network are received undamaged and unaltered, and even to check that the other peers do not lie and send fake blocks.

Hash trees are used in:

Suggestions have been made to use hash trees in trusted computing systems.[12]

Overview

[edit]

A hash tree is a tree of hashes in which the leaves (i.e., leaf nodes, sometimes also called "leafs") are hashes of data blocks in, for instance, a file or set of files. Nodes farther up in the tree are the hashes of their respective children. For example, in the above picture hash 0 is the result of hashing the concatenation of hash 0-0 and hash 0-1. That is, hash 0 = hash( hash 0-0 + hash 0-1 ) where "+" denotes concatenation.

Most hash tree implementations are binary (two child nodes under each node) but they can just as well use many more child nodes under each node.

Usually, a cryptographic hash function such as SHA-2 is used for the hashing. If the hash tree only needs to protect against unintentional damage, unsecured checksums such as CRCs can be used.

In the top of a hash tree there is a top hash (or root hash or master hash). Before downloading a file on a P2P network, in most cases the top hash is acquired from a trusted source, for instance a friend or a web site that is known to have good recommendations of files to download. When the top hash is available, the hash tree can be received from any non-trusted source, like any peer in the P2P network. Then, the received hash tree is checked against the trusted top hash, and if the hash tree is damaged or fake, another hash tree from another source will be tried until the program finds one that matches the top hash.[13]

The main difference from a hash list is that one branch of the hash tree can be downloaded at a time and the integrity of each branch can be checked immediately, even though the whole tree is not available yet. For example, in the picture, the integrity of data block L2 can be verified immediately if the tree already contains hash 0-0 and hash 1 by hashing the data block and iteratively combining the result with hash 0-0 and then hash 1 and finally comparing the result with the top hash. Similarly, the integrity of data block L3 can be verified if the tree already has hash 1-1 and hash 0. This can be an advantage since it is efficient to split files up in very small data blocks so that only small blocks have to be re-downloaded if they get damaged. If the hashed file is big, such a hash list or hash chain becomes fairly big. But if it is a tree, one small branch can be downloaded quickly, the integrity of the branch can be checked, and then the downloading of data blocks can start.[citation needed]

Second preimage attack

[edit]

The Merkle hash root does not indicate the tree depth, enabling a second-preimage attack in which an attacker creates a document other than the original that has the same Merkle hash root. For the example above, an attacker can create a new document containing two data blocks, where the first is hash 0-0 + hash 0-1, and the second is hash 1-0 + hash 1-1.[14][15]

One simple fix is defined in Certificate Transparency: when computing leaf node hashes, a 0x00 byte is prepended to the hash data, while 0x01 is prepended when computing internal node hashes.[13] Limiting the hash tree size is a prerequisite of some formal security proofs, and helps in making some proofs tighter. Some implementations limit the tree depth using hash tree depth prefixes before hashes, so any extracted hash chain is defined to be valid only if the prefix decreases at each step and is still positive when the leaf is reached.

Tiger tree hash

[edit]

The Tiger tree hash is a widely used form of hash tree. It uses a binary hash tree (two child nodes under each node), usually has a data block size of 1024 bytes and uses the Tiger hash.[16]

Tiger tree hashes are used in Gnutella,[17] Gnutella2, and Direct Connect P2P file sharing protocols[18] and in file sharing applications such as Phex,[19] BearShare, LimeWire, Shareaza, DC++[20] and gtk-gnutella.[21]

See also

[edit]

References

[edit]

Further reading

[edit]
[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
A Merkle tree, also known as a hash tree, is a tree data structure in cryptography and computer science where each leaf node is labeled with the cryptographic hash of a data block, and each non-leaf node is labeled with the hash of the labels of its child nodes, resulting in a single root hash that efficiently represents and verifies the integrity of the entire dataset. This structure enables logarithmic-time verification of individual data elements without needing to process the full dataset, making it ideal for applications requiring secure and scalable data authentication. The Merkle tree was invented by cryptographer Ralph C. Merkle to address challenges in schemes, particularly for authenticating multiple messages efficiently using one-time signatures. It was first detailed in Merkle's U.S. Patent No. 4,309,569, filed on September 5, 1979, and issued on January 5, 1982, which describes an "authentication tree function" for providing digital signatures on messages via a of hashes. In this foundational work, Merkle proposed the tree to allow selective verification of "leaves" (individual message hashes) while compressing the overall authentication into a compact root value, overcoming limitations of traditional signature methods that required signing each item separately. To construct a Merkle tree, data blocks are hashed to form nodes, which are then pairwise hashed to create parent nodes, repeating upward until reaching the ; if the number of leaves is odd, a leaf may be duplicated or padded to maintain balance. Verification involves a Merkle proof, consisting of the hashes along the path from a target to the , allowing a verifier to recompute the and confirm inclusion or integrity with minimal data—typically O(log n) hashes for n leaves—while detecting tampering if the recomputed mismatches the known . This property ensures tamper-evident data structures, as any alteration in a propagates changes to all ancestors, invalidating the . Merkle trees have become fundamental in modern and distributed systems, particularly in technologies like and , where they summarize transactions within blocks to enable light clients to verify inclusion without downloading full block data. They are also integral to hash-based schemes, such as those standardized by NIST for , where trees aggregate one-time signatures into a single public key for scalability. Additional applications include certificate transparency logs for detecting misissued TLS certificates and authenticated data structures in secure storage systems, enhancing efficiency in networks and .

Definition and Construction

Core Components

A Merkle tree, also known as a hash tree, is a composed of nodes where each non-leaf node is derived from the cryptographic hashes of its child nodes. Cryptographic hash functions form the foundational mechanism in Merkle trees, mapping input data of arbitrary length to a fixed-size output string, typically 256 bits for functions like SHA-256, while exhibiting properties such as one-way computation (infeasible to reverse), , and the where small input changes produce significantly different outputs. The leaf nodes at the bottom level of a Merkle tree each contain the hash of an individual block, denoted as H(datai)H(data_i) where HH is the and dataidata_i represents the ii-th block of , such as a transaction or file segment. Internal (non-leaf) nodes are computed by concatenating the hashes of their child nodes and applying the again, for example in a binary Merkle tree as H(left_childright_child)H(left\_child || right\_child), where || denotes , allowing hierarchical representation of larger datasets. The root node at the top encapsulates a single hash value that uniquely represents the entire underlying dataset, enabling efficient integrity verification across all leaves without recomputing every hash. For illustration, consider a simple binary Merkle tree with four leaf nodes hashing data blocks D1,D2,D3,D4D_1, D_2, D_3, D_4:

Root: H(H1 || H2) / \ H1: H(H(D1) || H(D2)) H2: H(H(D3) || H(D4)) / \ / \ H(D1) H(D2) H(D3) H(D4)

Root: H(H1 || H2) / \ H1: H(H(D1) || H(D2)) H2: H(H(D3) || H(D4)) / \ / \ H(D1) H(D2) H(D3) H(D4)

This structure, often used in for transaction verification, scales to larger numbers of leaves by or adjusting for balance.

Building Process

The of a Merkle tree begins with dividing the input data into fixed-size blocks, typically of equal length for simplicity, though variable sizes can be handled through . Each data block is then individually hashed using a , such as SHA-256, to produce the leaf nodes of the tree. These leaf hashes represent the base level of the structure and ensure that even minor changes in the data result in a completely different hash value. The building process proceeds recursively in a bottom-up manner. The leaf nodes are paired from left to right, and for each pair, their hashes are concatenated (often with a delimiter or in a specific order to prevent ambiguity) and hashed together to form the parent nodes at the next level. This pairing and hashing step is repeated for subsequent levels: if there are an even number of nodes at a level, they pair perfectly; if odd, the final unpaired node is duplicated and paired with itself to maintain balance, ensuring the tree remains binary and complete. The recursion continues until only a single hash remains, which becomes the Merkle root, encapsulating the entire dataset in a compact form. This root can then be used for efficient integrity verification without recomputing all hashes. To illustrate, consider the following pseudocode for constructing a binary Merkle tree from a list of n data blocks, where hash_function is a cryptographic hash like SHA-256:

function build_merkle_tree(data_blocks): # Step 1: Create leaves by hashing each data block leaves = [] for block in data_blocks: leaf_hash = hash_function(block) leaves.append(leaf_hash) # Step 2-3: Recursively build levels until root current_level = leaves while len(current_level) > 1: next_level = [] for i in range(0, len(current_level), 2): left = current_level[i] right = current_level[i+1] if i+1 < len(current_level) else left # Duplicate if odd parent = hash_function(left + right) # Concatenate and hash next_level.append(parent) current_level = next_level return current_level[0] # The root hash

function build_merkle_tree(data_blocks): # Step 1: Create leaves by hashing each data block leaves = [] for block in data_blocks: leaf_hash = hash_function(block) leaves.append(leaf_hash) # Step 2-3: Recursively build levels until root current_level = leaves while len(current_level) > 1: next_level = [] for i in range(0, len(current_level), 2): left = current_level[i] right = current_level[i+1] if i+1 < len(current_level) else left # Duplicate if odd parent = hash_function(left + right) # Concatenate and hash next_level.append(parent) current_level = next_level return current_level[0] # The root hash

This algorithm handles variable data sizes by implicitly supporting unbalanced trees through the duplication mechanism for odd counts, though explicit padding of data blocks to a power-of-two number can also be applied beforehand for perfectly balanced trees. The time complexity is O(n), as each of the n leaves and approximately n-1 internal nodes requires a constant-time hash computation. Space complexity is similarly O(n), due to storing all intermediate nodes during construction, though optimized implementations can reduce this by computing levels iteratively without full storage. As an example, suppose we have a dataset of 8 blocks labeled D1 through D8. First, compute leaf hashes: H1 = hash(D1), H2 = hash(D2), ..., H8 = hash(D8). Next level: pair them to get HA = hash(H1 + H2), HB = hash(H3 + H4), HC = hash(H5 + H6), HD = hash(H7 + H8). Then, the level above: HE = hash(HA + HB), HF = hash(HC + HD). Finally, the root HR = hash(HE + HF). This hierarchical structure allows the root HR to represent all 8 blocks succinctly, with each level reducing the data by half through hashing.

Properties

Security Characteristics

The security of Merkle trees fundamentally depends on the cryptographic properties of the underlying hash function, particularly its collision resistance. A collision occurs when two distinct inputs produce the same hash output, and a collision-resistant hash function makes it computationally infeasible for an adversary to find such a pair. For instance, SHA-256, a widely used hash function in Merkle tree implementations, provides 128 bits of collision resistance, meaning the expected effort to find a collision is on the order of 21282^{128} operations, which is negligible in probability for all practical purposes (approximately 21282^{-128} for random inputs). This property ensures the integrity of the entire tree: any attempt to alter data in one or more leaves would change the root hash unless a collision is found, thereby detecting tampering efficiently. Merkle trees also leverage second preimage resistance, which prevents an attacker from finding a different input that hashes to the same output as a given input. In naive Merkle tree implementations without proper domain separation (e.g., using identical hashing for leaves and internal nodes), a second preimage attack vulnerability arises, where an adversary could potentially forge an inclusion proof by treating an internal node as a leaf and finding a preimage that matches an existing path hash. However, the hierarchical structure of the Merkle tree mitigates this by requiring the attacker to perform full path recomputation from the tampered leaf to the root, involving successive second preimage searches across multiple levels (typically O(logn)O(\log n) deep), which compounds the computational cost exponentially and renders the attack infeasible under standard hash assumptions. The construction of Merkle trees exhibits parallels to the Merkle-Damgård paradigm used in iterative hash functions, where security is preserved through chained hashing. Specifically, tampering with a single leaf propagates changes upward, invalidating the root hash while making it impossible to alter intermediate nodes deceptively without breaking the hash function's one-wayness or collision resistance; any such modification would require solving hard problems at each affected level to restore the original root. This design ensures that even localized changes are globally detectable, bolstering overall tamper resistance. A key security feature is the provision of concise proofs of inclusion, verified via the sibling hashes along the path from a leaf to the root. To confirm a data block's presence (inclusion), the verifier recomputes the root by iteratively hashing the leaf with its siblings, using only O(logn)O(\log n) hashes for a tree with nn leaves; if the recomputed root matches the known root, inclusion is proven without exposing unrelated data. This process relies on the formula: Root=H(H(H(leafsibling1)sibling2))\text{Root} = H\bigl( H\bigl( \cdots H(\text{leaf} \parallel \text{sibling}_1) \parallel \text{sibling}_2 \bigr) \cdots \bigr) where \parallel denotes concatenation and HH is the collision-resistant hash function, illustrating the interdependent chain that enforces cryptographic verification.

Efficiency Aspects

One key efficiency advantage of Merkle trees lies in their logarithmic proof size for verifying individual data elements. To confirm the integrity of a specific leaf node in a tree with n leaves, a verifier needs only the O(log n) sibling hashes along the path from that leaf to the root, enabling authentication without accessing the full dataset. For instance, in a balanced binary Merkle tree with n = 1,000,000 leaves, verification requires roughly 20 hashes, each typically 256 bits for modern cryptographic functions, compared to transmitting or recomputing all 1,000,000 leaf hashes. This structure ensures that proof sizes scale logarithmically, making Merkle trees highly scalable for large datasets. Updates to the tree are similarly efficient, requiring recomputation of only the O(log n) affected nodes from the modified leaf up to the root. This localized adjustment avoids propagating changes across the entire structure, contrasting sharply with naive approaches that demand O(n) operations to rehash the full dataset upon any modification. For example, altering a single transaction in a block of millions would necessitate only path-length recomputations, preserving overall system performance in dynamic environments. In terms of space, a binary Merkle tree stores 2n - 1 hash values for n leaves—n for the leaves and n - 1 for internal nodes—resulting in approximately twice the storage of the raw data when assuming uniform hash sizes. However, this overhead enables compact proofs that are far smaller than alternative full-data representations, optimizing memory use for applications handling massive datasets. Compared to simple checksums, which provide O(1) computation for overall integrity but no mechanism for partial verification, or full rehashing methods that scale linearly in both time and space, Merkle trees strike a balance favoring scalability and selective access. Merkle trees further excel in bandwidth-constrained distributed systems by minimizing data transfer during audits and verifications. Instead of broadcasting entire datasets, parties exchange only the logarithmic-sized Merkle proof, as seen in blockchain protocols where light clients confirm transaction inclusion using a block header and the relevant branch, reducing required downloads from gigabytes to kilobytes per verification. This efficiency underpins their adoption in resource-limited settings, ensuring low-latency integrity checks without compromising reliability.

Applications

In Blockchain Technology

In blockchain technology, Merkle trees play a pivotal role in ensuring efficient and secure transaction verification within decentralized ledgers. Introduced in Bitcoin's design, they allow the entire set of transactions in a block to be represented compactly by including only the Merkle root in the block header, which is then hashed as part of the proof-of-work computation. This structure facilitates the validation of transaction inclusion without requiring the transmission or storage of full transaction lists across the network. A key application is Simplified Payment Verification (SPV), which enables lightweight clients to confirm the validity of specific transactions by downloading only the block headers and a logarithmic number of hashes forming the Merkle branch from the leaf to the root. This process, requiring O(log n) additional data where n is the number of transactions, allows users to verify payments without storing or processing the entire blockchain, reducing bandwidth and storage demands significantly. SPV proofs rely on the cryptographic properties of the Merkle tree to demonstrate that a transaction is included in the block referenced by its header, assuming the header chain is valid. This efficiency extends to other cryptocurrencies, such as , where light clients leverage Merkle proofs to synchronize and verify transactions and state changes with minimal resource overhead, cutting down storage requirements and synchronization times compared to full nodes. In 's block headers, the transactionsRoot field serves as the Merkle root for transaction hashes, enabling similar compact verification. The integration of Merkle roots into block headers in both proof-of-work and proof-of-stake consensus mechanisms ensures tamper-evident blocks: any alteration to the transaction set would change the root hash, invalidating the block's proof and requiring recomputation by miners or validators. For example, in Bitcoin's block structure, the 32-byte Merkle root is positioned in the header alongside fields like the previous block hash and nonce, providing a fixed-size commitment to potentially thousands of transactions while supporting scalable network propagation. Merkle trees are also employed in token distribution systems, such as airdrops and claims, where the Merkle root enables efficient verification of user eligibility through Merkle proofs, without disclosing the full list of eligible addresses. This method enhances privacy and scalability in decentralized applications. Furthermore, in permissioned escrow mechanisms within decentralized finance (DeFi), Merkle trees support conditional token claims by incorporating hashed conditions into the tree structure and deriving escrow addresses from the Merkle root along with claimant-specific details, allowing for verifiable proofs of compliance with criteria such as elapsed time periods or absence of abusive behavior.

In Data Verification Systems

In peer-to-peer (P2P) file sharing protocols, Merkle trees enable efficient verification of file chunks without requiring the entire file to be downloaded or retransmitted. The Merkle Torrent extension, proposed as BitTorrent Enhancement Proposal 30 (BEP-30), integrates Merkle hashes into torrent files to create a hierarchical structure where each leaf represents a hash of an individual data block, and internal nodes aggregate hashes of their children. This allows peers to verify the integrity of specific blocks by requesting only the necessary path from the root hash to the leaf, reducing bandwidth usage in distributed environments. Certificate Transparency (CT), introduced by Google in 2013, employs to maintain append-only logs of publicly issued certificates, facilitating the detection of misissued or unauthorized certificates by certificate authorities. Each log operates as a where certificate entries form the leaves, and signed tree heads provide verifiable proofs of inclusion or consistency for auditors and subscribers. Browsers and clients can monitor these logs to ensure transparency in the public key infrastructure (), with monitors cross-checking logs to identify discrepancies. In version control systems like , Merkle trees underpin the structure of repository objects to ensure data integrity and enable efficient comparisons across versions. represents directories and files as tree objects, where each tree node contains SHA-1 hashes of its child trees or blobs (file contents), forming a Merkle tree rooted at the commit object. This design allows for tamper-evident storage, as any modification to a file propagates changes up the tree, altering the root hash and enabling quick diffing without recomputing entire histories. Merkle trees are integral to cloud storage auditing protocols, particularly in provable data possession (PDP) schemes, where they support efficient integrity checks for large-scale outsourced data without full retrieval. In dynamic PDP models, the client constructs a over file blocks stored on an untrusted server, generating compact proofs via authenticated paths that verify block presence and unaltered state. For instance, updates to specific blocks can be handled by recomputing only the affected subtree, minimizing computational overhead while maintaining verifiability. Seminal work on dynamic PDP highlights the use of hash-based structures like Merkle trees as alternatives to more complex authenticated data structures for scalable auditing. As an illustrative example, consider a large file divided into 1,000 chunks stored in cloud or P2P systems; the provider computes hashes for each chunk to build a , culminating in a root hash shared with the verifier. To audit the integrity of chunk 500, the verifier requests the logarithmic-path hashes (approximately 10 sibling hashes for a balanced binary tree) from the provider, recomputes the root from these and the known chunk hash, and confirms it matches the trusted root, thus detecting any tampering without downloading the full 1,000 chunks.

History and Variants

Origins and Development

The concept of the Merkle tree, originally termed a "hash tree," was invented by Ralph C. Merkle in 1979 as a key component in his doctoral thesis on and . In this work, Merkle proposed a binary tree structure composed of cryptographic hashes to enable efficient verification of data integrity and inclusion proofs for large datasets, allowing a signer to certify multiple messages without recomputing signatures for each one individually. The primary motivation for this invention stemmed from the need to scale digital signature schemes for practical use, particularly in scenarios requiring proof that a specific item belongs to a large set without disclosing the entire set or incurring high computational costs for verification. Merkle patented this authentication tree method in 1982, formalizing its application to digital signatures based on one-way functions. A significant milestone followed in 1987, when Merkle published a paper detailing a digital signature scheme using conventional encryption functions like DES, which relied on the hash tree to reduce signature sizes and enable many-time use while maintaining security equivalent to the underlying encryption. Merkle further elaborated on the hash tree in his 1989 paper "A Certified Digital Signature," presented at CRYPTO '89, where he described its use in constructing certified signatures for arbitrary-length messages via a tree of one-way hashes, emphasizing its role in collision-resistant authentication. This structure built upon emerging ideas in hash function design, including Merkle's own contribution to the Merkle-Damgård construction, which he outlined in a concurrent 1989 CRYPTO paper as a method to extend fixed-size compression functions into variable-length hash functions secure against certain attacks. By the 1990s, the hash tree—now widely known as the in honor of its inventor—gained adoption in cryptographic literature for applications in secure data structures and protocols.

Notable Implementations

One notable variant of the Merkle tree is the Tiger Tree Hash (TTH), which employs the Tiger cryptographic hash function to construct a binary hash tree for verifying file integrity in peer-to-peer networks. TTH processes data in 1024-byte blocks at the leaves and builds upward through binary combinations, allowing for variable fan-out where some internal nodes may have only one child to accommodate non-power-of-two block counts, thus enabling efficient partial file verification in protocols like Gnutella and Direct Connect. Multi-ary Merkle trees generalize the binary structure to k-ary trees, where each non-leaf node has up to k children (with k > 2), which reduces the tree height to approximately logkn\log_k n for n leaves and shortens proof paths in verification processes. This adaptation is particularly useful in large-scale systems requiring faster membership proofs, as the logarithmic depth scales better with higher , though it increases node storage due to hashing k children. Extensions of Merkle trees to authenticated dictionaries incorporate dynamic updates and membership queries, often using Merkle signature schemes to build accumulators that commit to sets while enabling succinct proofs of inclusion or exclusion without revealing the full structure. These schemes, such as those based on hash trees for persistent dictionaries, support insertions and deletions with logarithmic-time verifiability, providing a foundation for secure outsourcing and certificate revocation lists. In modern distributed systems, Merkle trees evolve into Merkle Directed Acyclic Graphs (DAGs) for content-addressed storage, as implemented in the (IPFS), where nodes represent blocks or directories linked by multihashes to form a generalized acyclic structure for versioning and immutable data sharing. This approach extends traditional trees by allowing arbitrary linking and fan-out, facilitating decentralized file systems with content identifiers derived from Merkle proofs. A more recent variant is the Verkle tree, which integrates Merkle trees with vector commitments to generate much smaller inclusion proofs, on the order of a few hundred bytes regardless of tree size. This makes Verkle trees suitable for enabling stateless clients in networks like , where full state storage is impractical, and has been part of 's scalability roadmap as of 2025. Compared to binary Merkle trees, which offer balanced simplicity and minimal node overhead suitable for standard verification, multi-ary variants provide reduced proof sizes and heights for high-fan-out scenarios but require more computational resources per node due to aggregating multiple children.

References

Add your contribution
Related Hubs
User Avatar
No comments yet.