In this article, we propose a blockchain network that acts as a centralized append-only distributed file system (DFS) such as Hadoop Distributed File System (HDFS) or Google File System (GFS). The potential advantages of blockchain as a distributed file system (BaaDFS) include:
Currently, the blockchain network is designed for private use (as the target DFS is centralized), but it should be able to extend to public/consortium use with some modifications.

DFS Semantics Supported

The BaaDFS supports single-writer multiple-reader append-only DFS semantics, which has been widely used in existing DFSs such as HDFS and GFS. To be more specific, for a given file, we only allow to write from a writer at any time, but multiple readers may open and read the same file simultaneously. The writer could only append data to a file. Such DFS semantics is highly suitable for high-performance batch processing.

Client Operation Semantics

The BaaDFS supports the following file operations from client perspective:

Representation of the File System as a Blockchain Ledger

Similar to HDFS/GFS, a file in BaaDFS is represented as a list of data chunks
where the file content is equally divided into chunks with the same size (chunk_size) except the last chunk, whose size, namely, last_chunk_size, is file_size % chunk_size.
In the proposed BaaDFS, a chunk of data is stored as part of a blockchain block, where the block consists of
tx := (filename, chunk_idx, chunk_num, chunk_data_0, chunk_data_1, chunk_data_${chunk_num — 1}),
which means that the data chunk_data_0, chunk_data_1, …, chunk_data_${chunk_num — 1} are written to the file with the offset starting from chunk_idx * chunk_size. The size of chunk_data’s in a transaction must be chunk_size, except the last one, whose size, last_chunk_size <= chunk_size. The resulting new file_size after applying the write transaction becomes (chunk_idx + chunk_num — 1) * chunk_size + last_chunk_size. The hash of the list of transactions (likely in a Merkle tree way) will be stored in a field of the block header as conventional blockchain does.
Note that since we implement an append-only file system, the write transactions must satisfy the following constraints (assuming file_size’ is the pre-write file size, and file_size is the post-write file size)
To lookup the chunk, we define a chunk_info, which tells where the chunk data can be read as
chunk_info := (block_index, tx_index, tx_chunk_index)
where block_index is the height of the block that contains the corresponding write operation/transaction, tx_index is the position of the write transaction in the block, and tx_chunk_index is the position of the chunk_data in the transaction.
As a result, given the history of the ledger, i.e., blocks, a reader could fully read any part of the file by a list of chunk_info’s together with file_size and chunk_size, where the list of chunk_info can be efficiently implemented as a Merkle tree (likely an accumulator) for fast update (only the last item), append, and read.
The tuple of (file_size, chunk_size, and chunk_info_trie_hash) is defined as the metadata of the file as:
metadata := (file_size, chunk_size, chunk_info_trie_hash),
and the state of the ledger given a block is basically a mapping as:
state := filename -> metadata
which could be implemented as another Merkle tree (e.g., Patricia Merkle Tree or Sparse Merkle Tree), whose hash value will be stored in the header of the block.
Summarizing the aforementioned details, the diagram of the ledger of a BaaDFS looks like
A good property of such a file system representation is that given the hash of the state trie or a block hash, a reader or writer could uniquely determine a snapshot of the filesystem and check the integrity or immunity of the filesystem.

Components of the BaaDFS Network

In this subsection, we illustrate a blockchain network and its components for the BaaDFS

Example Implementations of Supported Operations

We assume a full node has the following RPC service to a DFSClient:
Given above RPCs, the file operations can be supported by a DFSClient in BaaDFS as follows:

Advantages over HDFS/GFS

High availability

Consider the network adopts a BFT consensus that tolerates up to f byzantine failures with 3f + 1 full nodes

High data integrity

Given the position of the data to be validated (filename, offset, len), the node could validate the integrity of the data as follows (assuming the LAST_FINALIZED_BLOCK and its hash is valid (agreed by consensus)):
  1. Read the metadata of the file from the state_trie_hash of LAST_FINALIZED_BLOCK, and validate all cryptographic proofs that the metadata is indeed in the state_trie.
  2. Read the chunk_info’s from the chunk_info_trie_hash in the metadata, and validate all cryptographic proofs that the chunk_info’s are included in the chunk_info_trie.
  3. Read the data chunk for each chunk_info, and verify that each data chunk is included in the corresponding blocks.
  4. Validate the blocks containing the data chunks are part of the history of the ledger (i.e., previous blocks of LAST_FINALIZED_BLOCK)
Note that obtaining cryptographic proofs of steps 1–3 only traverses O(log(|tree|)) elements in the Merkle tree, while for step 4, a standard hashed linked list of blocks may take linear time to cryptographically verify if a block is ahead of LAST_FINALIZED_BLOCK in the ledger. An improvement can be done by using a Merkle tree accumulator to store all the blocks as also adopted by Facebook Libra, and as a result, verifying the block relationship can be done in O(log(|blocks|)) time.

Highly trustworthy storage for clients

A writer can check the integrity of the written data according the cryptographic proofs of the blockchain without trusting the node that writes the data. For readers, assuming no writer challenges the proofs, any reader could use the proofs and verify such integrity and immunity of data from any node. Again, a reader only needs to trust the consensus and the ledger instead of a specific node. As a comparison, for traditional systems such as HDFS/GFS, we have to trust the systems are properly implemented and operated (e.g., a chunk of a file in a datanode in HDFS is corrupted (with checksum disabled) or the datanode is hacked, and the DFSClient has no way to verify the corrupted data after reading the chunk from the datanode).

Scalability

Scalability on Storage

To scale storage, a full node can be implemented as a cluster (server farms) and the block can be distributedly stored on the servers in the cluster. The cluster may or may not implement HA. If an HA feature is implemented, the full node itself may replicate the blocks to multiple servers in the cluster.

Scalability on Read

Similar to HDFS/GFS, scalability on read can be achieved in the way that a full node will respond to a data chunk read request with an IP address of the server that stores the actual data chunk in the cluster, and the following data read operation will be performed on that server instead of the full node itself.

Scalability on Write

All full nodes must reach the same view on all the blocks, and if the amount of requests of write operation is high and the blocks are large, synchronizing the blocks among the full nodes may be costly or even prohibited. To optimize write performance, we could have the following optimizations:

Further Enhancements

Further Extensions

We may extend the idea to the blockchain as a distributed key-value store (BaaDKYS).