sia.hackernoon.com

Ethereum processes vast amounts of data every second. To keep the network efficient and accessible, this data must be organized in standardized and memory-optimized formats. To achieve this, Ethereum uses specialized data structures that allow anyone to run a node, even on modest consumer-grade hardware.

This blog explores the core data structures that power Ethereum’s execution and consensus layers: Patricia Merkle Tries, which provide cryptographically secure storage of key-value pairs, Recursive Length Prefix (RLP), a compact and consistent data serialization method, and Simple Serialize (SSZ), the consensus layer’s preferred serialization format designed to work seamlessly with merklelization.

Understanding these foundations is key to seeing how Ethereum achieves security, scalability, and decentralization.

In this blog, we will focus specifically on Merkle Patricia Tries.

Merkle Patricia Trie

Ethereum works like a state machine. Every transaction updates the global state, which contains account balances, contract storage, and code. This state is stored as key–value pairs, and Ethereum needs a data structure that is both efficient and secure. For this, it uses the Modified Merkle Patricia Trie (MPT).

The MPT combines two ideas: tries and Merkle trees. A trie is a tree-like structure for looking up keys by following a path of characters (or, in Ethereum’s case, hexadecimal nibbles). A Merkle tree is a hash-based structure that allows anyone to verify data without needing the whole dataset.

By combining them, Ethereum gets a database that is efficient for lookups and updates, while also cryptographically secure against tampering.

The key strength of the MPT is that it produces a single root hash that represents the entire state of Ethereum. If even one balance changes, this root hash changes. This means two nodes can quickly agree on whether they have the same state by just comparing root hashes. It also enables Merkle proofs, where someone can prove a piece of data exists in the state by only providing the path and hashes, rather than the whole database.

Radix Tries (The Foundation)

The foundation of MPT is a radix trie. In a radix trie, each key is split into symbols (like hex digits), and we walk down the tree one symbol at a time. Each node can have multiple children, and the value is stored at the end of the path.

For example, suppose the key "dog" is stored. First, we convert it into hex (64 6f 67). Then we start at the root and follow the path nibble by nibble: 6 → 4 → 6 → 15 → 6 → 7. At the end of this path, we find the value associated with "dog".

Updating works similarly: you walk down the path, copy the nodes that change, update the value, and then recompute the hashes on the way back up. Deleting means following the path, removing the value, and cleaning up any nodes that become unnecessary.

While this works, it can be very inefficient. Ethereum addresses are 32 bytes, which is 64 hex characters. A plain radix trie would require 64 steps just to look up one key, with many empty branches along the way. This is where the Patricia optimization comes in.

Adding Hashing: Merkle Radix Tries

Ethereum doesn’t just store values in the trie—it also hashes them. Every node is stored in the database under a key equal to the hash of its contents, specifically keccak256(rlp(value)). This makes the structure content-addressable: the same node always has the same hash, and if the content changes, the hash changes.

This hashing turns the trie into a Merkle Radix Tree, where the root hash acts as a fingerprint of the entire structure. If you know the root, you can verify that any piece of data belongs in the trie by walking down the path and checking the hashes. If someone tries to fake data, the hashes won’t line up.

Think of it like a library catalog: the root hash is like a digital seal on the entire catalog. If even one entry is wrong, the seal won’t match.

The Patricia Optimization

To fix the inefficiency of radix tries, Ethereum adds Patricia optimizations. Instead of storing one nibble per node, paths are compressed. There are three types of nodes in MPT:

Branch nodes, which have 16 child slots (one for each hex nibble) plus a value slot.
Leaf nodes, which store the end of a path and the value.
Extension nodes, which store a sequence of nibbles leading to the next node.

This way, instead of having 64 steps for a 64-nibble path, we might only have a handful of steps, since long single-child paths are collapsed into extension nodes.

Compact Encoding of Paths

When Ethereum stores paths in the Merkle Patricia Trie, it does not save them nibble by nibble (too large). Instead, it uses a compact encoding scheme that packs extra information into the first nibble.

This first nibble tells us two things:

Node type: Is it a leaf node or an extension node?
Path length: Is the remaining path odd or even in length?

The rules are:

Hex	Bits	Node type	Path length
0	0000	Extension	Even
1	0001	Extension	Odd
2	0010	Leaf (end)	Even
3	0011	Leaf (end)	Odd

For even remaining path length (0 or 2), another 0 "padding" nibble will always follow.

    def compact_encode(hexarray):
        term = 1 if hexarray[-1] == 16 else 0
        if term:
            hexarray = hexarray[:-1]
        oddlen = len(hexarray) % 2
        flags = 2 * term + oddlen
        if oddlen:
            hexarray = [flags] + hexarray
        else:
            hexarray = [flags] + [0] + hexarray
        # hexarray now has an even length whose first nibble is the flags.
        o = ""
        for i in range(0, len(hexarray), 2):
            o += chr(16 * hexarray[i] + hexarray[i + 1])
        return o

Example:

Input:

> hexarray = [0xf, 0x1, 0xc, 0xb, 0x8, 0x10] or [f, 1, c, b, 8, 10]
  '3f 1c b8'

Notice the last nibble is 0x10(decimal 16).
- That’s theterminator marker (leaf) in Ethereum’s compact encoding.

Walk through code:

term = 1 if hexarray[-1] == 16 else 0

Last element = 16 → so term = 1.

if term:
    hexarray = hexarray[:-1]
    
Strip the terminator →
hexarray = [f, 1, c, b, 8] (5 nibbles).

oddlen = len(hexarray) % 2

Length = 5 → oddlen = 1

flags = 2 * term + oddlen

2 * 1 + 1 = 3 → so flags = 3.

if oddlen:
    hexarray = [flags] + hexarray

Since oddlen = 1, prepend 3 →
hexarray = [3, f, 1, c, b, 8].

Now, Pack Into Bytes

The loop does:
o += chr(16 * hexarray[i] + hexarray[i + 1])

Pairs:

(3, f) → 0x3f
(1, c) → 0x1c
(b, 8) → 0xb8

Output: 
'3f 1c b8'

Input:

> hexarray = [0x1, 0x2, 0x3, 0x4, 0x5, ...] or [1, 2, 3, 4, 5, ...]
  '11 23 45'

Walk through code:

term = 1 if hexarray[-1] == 16 else 0

Last element = 5 → so term = 0.

if term:
    hexarray = hexarray[:-1]
    
will not execute as term = 0

oddlen = len(hexarray) % 2

Length = 5 → oddlen = 1

flags = 2 * term + oddlen

2 * 0 + 1 = 1 → so flags = 1.

if oddlen:
    hexarray = [flags] + hexarray

Since oddlen = 1, prepend 1 →
hexarray = [1, 1, 2, 3, 4, 5].

Now, Pack Into Bytes

The loop does:
o += chr(16 * hexarray[i] + hexarray[i + 1])

Pairs:

(1, 1) → 0x11
(2, 3) → 0x23
(4, 5) → 0x45

Output: 
'11 23 45'

Here is the extended code for getting a node in Merkle Patricia Trie.

    def get_helper(node_hash, path):
        if path == []:
            return node_hash
        if node_hash == "":
            return ""
        curnode = rlp.decode(node_hash if len(node_hash) < 32 else db.get(node_hash))
        if len(curnode) == 2:
            (k2, v2) = curnode
            k2 = compact_decode(k2)
            if k2 == path[: len(k2)]:
                return get(v2, path[len(k2) :])
            else:
                return ""
        elif len(curnode) == 17:
            return get_helper(curnode[path[0]], path[1:])

    def get(node_hash, path):
        path2 = []
        for i in range(len(path)):
            path2.push(int(ord(path[i]) / 16))
            path2.push(ord(path[i]) % 16)
        path2.push(16)
        return get_helper(node_hash, path2)

Example Trie

Let’s walk through an example of how a Merkle Patricia Trie (MPT) is built from key-value pairs.

We’ll use five accounts with these key-value pairs:

Account	Key (address/hash, shortened)	Value (simplified state)
1	`0x616b6c64`	`0x01`
2	`0x616b6c65`	`0x02`
3	`0x616b6c78`	`0x03`
4	`0x616b6d31`	`0x04`
5	`0x31323334`	`0x05`

In Ethereum, the keys are normally 32-byte Keccak-256 hashes of addresses. Here, we’re using shorter keys so the example is easy to follow.

Step 1: Insert the first key-value pair.

Start with an empty MPT, and add the first key-value pair: (0x616b6c64, 0x01). Since it is just one key, it results in a trie of only one leaf node. To create that node, proceed in the following way:

Write a two-item array: The first element should be the whole key and second one the value. Add the prefix flag 20 to the first element, indicating that the node is a leaf and that the key has an even amount of nibbles. Then, the array should look like this: ["0x20616b6c64","0x01"].

Encode in RLP the array: You can use an RLP converter or this Python script:

import rlp

key = bytes.fromhex("20616b6c64")
value = bytes.fromhex("01")
encoded = rlp.encode([key, value])
print(encoded.hex())

This should output this hex: c78520616b6c6401.

Hash the RLP encoding using Keccak-256 to get the pointer of this node. You can use an online hasher or this script:

from eth_utils import keccak

rlp_bytes = bytes.fromhex("c78520616b6c6401")
hash_pointer = keccak(rlp_bytes)
print(hash_pointer.hex())

This should output:

4e2d0fbe6726eac15c5ecf49a4e1f947aa50e0531f4f3e98b8e1577ba52e1783

The resulting MPT should look like this:

[Key, Value] -> [“0x616b6c64”, “0x01”]
RLP Encoding -> 0xc78520616b6c6401
Hash of RLP Encoding -> 4e2d0fbe6726eac15c5ecf49a4e1f947aa50e0531f4f3e98b8e1577ba52e1783

Step 2: Insert the second key-value pair.

[“0x616b6c65”, “0x02”]. Since this key shares the first 7 digits with the previous one, we proceed in the following way:

Build one leaf for each key: In each leaf, the array's first element should be the remaining path, but since the two keys share every digit except the last one, the remaining path is empty. So, we should just write there the flag 20, indicating that we are in a leaf node and that the path has an even amount of digits (zero digits). After that, encode the arrays in RLP and hash the RLP encodings, as we did in step 1.
Build a branch node: Create a 17-item array. Write the hash of the first key's leaf node at index 4 and the hash of the second key's leaf node at index 5. Encode the array in RLP and hash the encoding.

Build the extension and root node: Create a two-item array that contains the shared prefix as the first element and the hash of the previously built branch node as the second element. Since the shared prefix has an odd number of digits, add the flag 1 to it. Encode the array in RLP and hash the encoding.

[Key, Value] -> ["0x20", "0x01"]
RLP Encoding: 0xc22001
Hash of RLP Encoding: 0xce8c4b060e961e285a1c2d6af956fae96986f946102f23b71506524eea9e2450

[Key, Value] -> ["0x20", "0x02"]
RLP Encoding: 0xc22002
Hash of RLP Encoding: 0x1e116569fc77f5d4721ac60cbbeee7bdf40344511d9c4b1ac6ac9e6a8ae3c4e7

[“0x”, "0x", "0x", "0x", "0xce8c4b060e961e285a1c2d6af956fae96986f946102f23b71506524eea9e2450", "0x1e116569fc77f5d4721ac60cbbeee7bdf40344511d9c4b1ac6ac9e6a8ae3c4e7", "0x", "0x", "0x", "0x", "0x", "0x", "0x", "0x", "0x", "0x", "0x"]
RLP Encoding: 0xf85180808080a0ce8c4b060e961e285a1c2d6af956fae96986f946102f23b71506524eea9e2450a01e116569fc77f5d4721ac60cbbeee7bdf40344511d9c4b1ac6ac9e6a8ae3c4e78080808080808080808080
Hash of RLP Encoding: 0xc64b8c068233f978ac4e505b00ab3186aa3deeb29e76c928cd372161b899bc2b

[Key, Value] -> ["0x1616b6c6", "0xc64b8c068233f978ac4e505b00ab3186aa3deeb29e76c928cd372161b899bc2b"]
RLP Endcoding: 0xe6841616b6c6a0c64b8c068233f978ac4e505b00ab3186aa3deeb29e76c928cd372161b899bc2b
Hash of RLP Encoding: 0x870b83eb1f33d66c00bf9b0e1b3fce53ef7b8e9999d7f94df24214ae49238d2e

Step 3: Insert the third key-value pair.

[“0x616b6c78”, “0x03”], Since this key shares the first 6 digits with the previous two keys, we proceed in the following way:

Add a leaf node: Notice that in this case, since the new key shares with the previous ones all the digits except the last two, the array's first item will have just one digit as the remaining path, and the flag 3 indicating that it is a leaf node with an odd amount of path digits.
Add a branch node: Its array should contain at index 6 the hash pointer of the branch node built in step 2, and at index 7 the hash pointer of the new leaf node we recently added.

Add an extension node: The root will be another extension node. Its array should contain the shared prefix as the first element and the hash of the recently added branch node as the second element. Since the share prefix has an even number of digits, add the flag 00 to it.

[Key, Value] -> ["0x38", "0x03"]
RLP Encoding: 0xc23803
Hash of RLP Encoding: 0xcbfcfed99ffa2123bea4569cf9b9201b1f87e4649acfbdb4ad36aa1f5317c33a

["0x", "0x", "0x", "0x", "0x", "0x", "0xc64b8c068233f978ac4e505b00ab3186aa3deeb29e76c928cd372161b899bc2b", "0xcbfcfed99ffa2123bea4569cf9b9201b1f87e4649acfbdb4ad36aa1f5317c33a", "0x", "0x", "0x", "0x", "0x", "0x", "0x", "0x", "0x"]
RLP Encoding: 0xf851808080808080a0c64b8c068233f978ac4e505b00ab3186aa3deeb29e76c928cd372161b899bc2ba0cbfcfed99ffa2123bea4569cf9b9201b1f87e4649acfbdb4ad36aa1f5317c33a808080808080808080
Hash of RLP Encoding: 0xcc97f12ea3217345e666974cd81b117ca02404f19c15d31158ac1d1e55398706

[Key, Value] -> ["0x00616b6c", "0xcc97f12ea3217345e666974cd81b117ca02404f19c15d31158ac1d1e55398706"]
RLP Endcoding: 0xe68400616b6ca0cc97f12ea3217345e666974cd81b117ca02404f19c15d31158ac1d1e55398706
Hash of RLP Encoding: 0x85fb3f4e331ecdabcf4d198863553663a6191de909e8e4a8dad0dd8d50debb97

Add the last two key-value pairs, continuing in this way, following the same steps as we did for the previous keys. When you're done, you should have the following MPT:

Verify

When a verifier gets the StateRoot and the proof path for a key, they must check if the proof is correct. Unlike a normal Merkle Tree, where the check starts at the leaf and goes up to the root, in a Merkle Patricia Trie, the process goes from the top down.

The verifier begins at the StateRoot and then moves step by step through the nodes in the proof path. At each step, they decode the node and make sure that the hash stored inside matches the next node in the path. This continues until the verifier reaches the final leaf node. If this leaf contains both the remaining part of the key and the correct value, then the proof is valid.

Let's say the verifier receives the proof of the above for the key-value (0x616b6d31, 0x04). Then, he has to follow these steps:

Hash the first element of the path and check that it matches the given StateRoot. Indeed:

keccak(bytes.fromhex(
    "f851808080a0a26b2ac124718443aeed68fa0309225d0c8dd9dbee45685909e92fb594e1a4638080a02ccd118c9470c051689543b233ab109ad74d2fb4f57eb429c4d43294d6ae686780808080808080808080"
)).hex() ==
    "13ea549e268b5aa80e9752c6be0770cffba34d2b1aa1f858cb90f3f13ac3bed7"

Decode the first RLP element of the path and verify that in the index, 6 has the hash of the second path element. Indeed:

rlp.decode(bytes.fromhex(
    "f851808080a0a26b2ac124718443aeed68fa0309225d0c8dd9dbee45685909e92fb594e1a4638080a02ccd118c9470c051689543b233ab109ad74d2fb4f57eb429c4d43294d6ae686780808080808080808080"
))[6].hex() ==
    "2ccd118c9470c051689543b233ab109ad74d2fb4f57eb429c4d43294d6ae6867"

Decode the second RLP element of the path. You will see an array with two items. The first item is 0x1616b6. Because it starts with 1, this means it is an extension node. Now, check that the rest of the digits match the key we are trying to prove. Then, confirm that the second item in the array is the hash of the next element in the path.
Decode the third element of the path. This will be a branch node. Look inside it, and make sure that at index d, it holds the hash of the following element in the path.

Merkle Patricia Trie
Decode the last element of the path. This will be an array with two items. The first item is 0x2031. Since it begins with 20, this means it is a leaf node. Verify that this first item contains the remaining key digits 31 and that the second item is the value 0x04.

Tries in Ethereum

In Ethereum’s execution layer, all the merkle tries are based on a data structure called the Merkle Patricia Trie (MPT). From every block header, you can actually see three important roots — each representing a different trie:

stateRoot – tracks account state.
transactionsRoot – tracks all transactions in the block.
receiptsRoot – tracks receipts of those transactions.

Let’s break them down one by one.

State Trie

There is one global state trie, and it gets updated every time a block is processed.

The path in this trie is always keccak256(ethereumAddress).
The value stored at that path is rlp(ethereumAccount).

An Ethereum account is stored as an array of four items:

[ nonce, balance, storageRoot, codeHash ]

Here’s the interesting part: the storageRoot itself is the root of another trie — the Storage Trie.

Storage Trie

The storage trie is where all contract data lives. Each account (that is, a contract) has its own storage trie.

If you want to retrieve a contract’s stored value, you need:

the storage address (contract address),
the position of the storage slot, and
the block number (or just "latest" for the current state).

For example, to fetch data from slot 0 of address 0xc0ffee254729296a45a3885639AC7E10F9d54979, you can use the JSON-RPC method eth_getStorageAt:

curl -X POST --data '{
  "jsonrpc":"2.0",
  "method":"eth_getStorageAt",
  "params": [
    "0xc0ffee254729296a45a3885639AC7E10F9d54979", 
    "0x0", 
    "latest"
  ],
  "id":1
}' localhost:8545

This might return something like:

{"jsonrpc":"2.0","id":1,"result":"0x00000000000000000000000000000000000000000000000000000000000004d2"}

For other slots, you can’t just pass the number directly. Instead, Ethereum calculates the slot’s position as:

keccak256( padded(address) + padded(slotIndex) )

For example, for slot 1, at address 0x391694e7e0b0cce554cb130d723a9d27458f9298, the key is:

keccak256("000...391694e7...f9298" + "000...0001")

In a Geth console, you can calculate it with:

var key = "000000000000000000000000391694e7e0b0cce554cb130d723a9d27458f9298" +
          "0000000000000000000000000000000000000000000000000000000000000001"
web3.sha3(key, { "encoding": "hex" })

This gives the hashed position, which you then pass into eth_getStorageAt to fetch the actual stored value.

If an account is not a contract, its storageRoot is empty.

Transactions Trie

Every block also has its own transactions trie. This stores all the transactions of that block as (key, value) pairs.

The path is rlp(transactionIndex) (basically the transaction’s position in the block).
The value depends on the transaction type:
- For legacy transactions → rlp(tx)
- For typed transactions (EIP-2718) → TxType | encode(tx)

This allows Ethereum to support both old and new transaction formats

Receipts Trie

Finally, every block also has a receipts trie. Like the transactions trie, the path is also rlp(transactionIndex).

The receipts trie is never updated once the block is sealed.
A receipt is basically proof of what happened in a transaction (gas used, logs, etc.).

Receipts can be of two kinds:

LegacyReceipt → stored as rlp([status, cumulativeGasUsed, logsBloom, logs]).
Receipt (typed) → stored as TransactionType | ReceiptPayload.

Again, this supports both old and new transaction formats, thanks to EIP-2718.

Ethereum’s Hidden Backbone: A Deep Dive Into Merkle Patricia Tries

Merkle Patricia Trie

Radix Tries (The Foundation)

Adding Hashing: Merkle Radix Tries

The Patricia Optimization

Compact Encoding of Paths

Example Trie

Step 1: Insert the first key-value pair.

Step 2: Insert the second key-value pair.

Step 3: Insert the third key-value pair.

Verify

Tries in Ethereum

State Trie

Storage Trie

Transactions Trie

Receipts Trie