This may seem like a random topic, but I’m currently taking a cryptography class, and my project idea is related to homomorphic encryption. While doing some preliminary research into the topic, I stumbled across Merkle trees and distributed filesystems. So, naturally, I decided to make it the topic of my post in hopes of learning more about them and imparting some knowledge.


So, without further ado…enjoy.


Before we can dive into Merkle Trees, we first have to talk about plain old trees.

What is a Tree?

A tree is a data structure in computer science that represents a hierarchical tree structure composed of nodes. Each node can be connected to many children, but can only have one parent node; this is not the case for the root node. The root node has no parents; it’s the top-most node in the tree. Nodes with no children are called leaf nodes.


Below is a figure that visually explains this:


The constraints put on the nodes is actually quite useful. This means that there are no loops in relations, meaning a node cannot be it’s own ancestor. Secondly, this means that we can consider each child node the root node of its own subtree. That might sound confusing, but let’s use some visuals:


Let’s show the entire tree with leaf nodes and subtrees marked:

With that in mind, how can we visit/access each node? This is called “walking the tree” or tree traversal.

Tree Traversal

We won’t be looking at tree traversal in-depth for this post; however, just at a high level, I’ll briefly describe the two most popular types.


Unlike linear data structures, which are traversed linearly (looping over the data structure itself and computing at each element), trees can be traversed in multiple ways. Two of the most common traversal methods are breadth-first and depth-first.



We now have enough background info to tackle this week's post…

What is a Merkle Tree?

A Merkle tree is a binary tree (just a tree where a node can have up to two child nodes), obviously, in which every leaf node contains the cryptographic hash of a block of data, and every non-leaf node contains the cryptographic hash of the hashes of its children (sounds confusing, I’ll show a visual in a second).


Now, what this specific tree allows us to do is securely verify that arbitrary content exists within the tree.


Let’s get a visualization of a Merkle Tree:

Inclusivity

So, how can we verify that a piece of data is included in the Merkle tree? By providing a Merkle Proof, aka an inclusion proof.

In an inclusion proof, all we need is the leaf node’s (the data block being verified) sibling and all other intermediate hashes needed to calculate the Merkle root hash. I’ll explain visually below:


Let's say we want to check if data block L1 is in the Merkle Tree. The way we would do that is:

This method of proof is actually very advantageous when compared to linear data structures because it scales logarithmically. This means that we need to compute log(N) hashes to prove the inclusion of a data element, where N is the number of leaf nodes. As data grows, this logarithmic scaling proves to be much more manageable.


So in our example above, we have four leaf nodes, so we only need log(4) hashes to verify any data block. Let’s see how that scales:

As the leaf nodes grow, the number of hashes to verify any individual data block grows at a sustainable pace.

Uses

So, where do we see Merkle Trees today? Basically everywhere. The most common and prevalent usage of Merkle Trees is within the Blockchain and Cryptocurrency space. It’s a fundamental part of Blockchain technology; they’re used to verify and organize transactions efficiently within data blocks.



So basically, Merkle Trees are everywhere, from the file system on your computer to your version control system (Git, Mercurial) to distributed databases like Amazon DynamoDB and Google’s BigTable.

Conclusion

Merkle Trees play a crucial role in ensuring data integrity and efficient verification in distributed systems. Their hierarchical hashing mechanism allows for secure and scalable data verification, making them indispensable in modern technology.


For my next post, we’ll be implementing Merkle Trees.


If you found this post helpful or have questions, please leave a comment below. Don't forget to subscribe to stay updated on future posts exploring fascinating topics in computer science and cryptography.

Thanks for tuning in,

Ali.