By: David Zhu

Metadata synchronization (sync) is a core feature in Alluxio that keeps files and directories consistent with their source of truth in under storage systems, making it simple for users to retrieve the latest version of data from Alluxio. Understanding the internal process is also important for tuning performance. This article describes how Alluxio is designed and implemented to keep metadata synchronized.

Why is Metadata Sync Critical in Alluxio

In Alluxio, metadata refers to information about the files and directories in the Alluxio file system, including their owners, groups, permissions, and creation and modification times. Metadata is independent of file content: even an empty file or directory has associated metadata.

Alluxio maintains a copy of the file system or the object store namespace of the under storage system. Metadata consistency is important in Alluxio, especially when different clusters write/read data in the data pipeline and make changes outside of Alluxio.

A typical scenario in the above diagram is a data pipeline that combines Spark ETL and Presto SQL. The ETL cluster (without Alluxio) writes data, and the analytics cluster with Alluxio then reads the transformed data. Because Alluxio keeps and manages its own copy of the under storage metadata, when the ETL step changes data in under storage, the Alluxio instance on the analytics cluster must learn of the change and stay consistent with the underlying storage system for queries to proceed correctly.

How Does Metadata Sync Work in Alluxio

Alluxio provides a file system abstraction in a unified namespace on one or more under storage systems. Accessing files or directories through Alluxio is expected to get the same result as directly accessing the under storage. For example, if the under storage mounted to Alluxio root is s3://bucket/data, listing the "/" directory in Alluxio yields the same result as listing objects in s3://bucket/data and printing "/file" in Alluxio should return the same result as s3://bucket/data/file.
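The mapping in this example can be illustrated with a minimal sketch. The function name to_ufs_uri and the constant UFS_ROOT are hypothetical names chosen for illustration, not part of any Alluxio API; the sketch only shows how an absolute Alluxio path could correspond to an under storage URI for a single root mount.

```python
import posixpath

# Mount point taken from the example in the text.
UFS_ROOT = "s3://bucket/data"


def to_ufs_uri(alluxio_path: str, ufs_root: str = UFS_ROOT) -> str:
    """Translate an absolute Alluxio path into the matching under storage URI.

    Hypothetical helper for illustration only; real mount resolution in
    Alluxio handles nested mounts and is more involved.
    """
    if not alluxio_path.startswith("/"):
        raise ValueError("Alluxio paths must be absolute")
    # Collapse duplicate slashes and "." / ".." segments.
    normalized = posixpath.normpath(alluxio_path)
    relative = normalized.lstrip("/")
    root = ufs_root.rstrip("/")
    return f"{root}/{relative}" if relative else root
```

With this sketch, "/" maps to s3://bucket/data and "/file" maps to s3://bucket/data/file, matching the example above.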

In Alluxio, metadata is only stored and served from Alluxio masters, whereas the content of individual files is served from Alluxio workers.

By default, Alluxio loads the metadata from the under storage on demand. In the above example, an Alluxio master starting from an empty state does not have any information about s3://bucket/data/file after startup. The file is only recognized when a user lists the "/" directory in Alluxio or tries to access "/file". This “lazy” behavior avoids unnecessary work and significantly improves performance, because metadata operations in under storage can be slow.
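The on-demand behavior can be pictured with a toy model. The class LazyMaster and its fields are hypothetical and only illustrate the idea of loading metadata on first access; they do not reflect the actual master implementation.

```python
class LazyMaster:
    """Toy master that loads metadata from under storage on first access only."""

    def __init__(self, ufs_listing):
        # ufs_listing is a hypothetical dict of Alluxio path -> metadata dict,
        # standing in for a slow under storage metadata call.
        self._ufs = ufs_listing
        self._inodes = {}   # metadata currently known to the master
        self.ufs_calls = 0  # number of under storage lookups performed

    def knows(self, path):
        """Return True if the master already holds metadata for the path."""
        return path in self._inodes

    def get_status(self, path):
        """Return metadata for the path, loading it from under storage if needed."""
        if path not in self._inodes:
            self.ufs_calls += 1
            if path not in self._ufs:
                raise FileNotFoundError(path)
            self._inodes[path] = dict(self._ufs[path])
        return self._inodes[path]
```

In this model, the first get_status call on a path pays the under storage cost, and later calls are served from the master's own state until something triggers a sync.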

Note that updating metadata can be bidirectional. If all modifications to the file system occur through Alluxio, Alluxio only needs to scan under storage once to retrieve the initial state, then apply changes in both Alluxio and under storage synchronously as part of the file system RPC calls. This provides users with a consistent view of under storage. In reality, however, changes are often made to the under storage outside of Alluxio. As a result, the Alluxio master must discover additions, removals, and updates of files and directories in under storage, and apply those changes to the Alluxio file system. This process of synchronizing the two namespaces is called metadata sync.
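Conceptually, discovering additions, removals, and updates amounts to comparing two views of the namespace. The sketch below assumes each view is a plain dict of path to a fingerprint (here a hypothetical tuple of length and modification time); the function name diff_namespaces is illustrative and is not how Alluxio represents this internally.

```python
def diff_namespaces(alluxio, ufs):
    """Compare two {path: fingerprint} maps and report what a sync must apply.

    The fingerprint is any comparable value, for example (length, mtime).
    """
    added = sorted(p for p in ufs if p not in alluxio)
    removed = sorted(p for p in alluxio if p not in ufs)
    updated = sorted(p for p in ufs if p in alluxio and ufs[p] != alluxio[p])
    return {"added": added, "removed": removed, "updated": updated}


# Example: one file was rewritten, one deleted, and one created outside Alluxio.
alluxio_view = {"/a": (10, 100), "/b": (20, 100)}
ufs_view = {"/a": (12, 200), "/c": (5, 200)}
changes = diff_namespaces(alluxio_view, ufs_view)
```

Here changes reports "/c" as added, "/b" as removed, and "/a" as updated.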

How to Trigger Metadata Sync

When applications make changes to the metadata of an Alluxio file, and this file is persisted, the change will always be propagated to under storage synchronously, and there is no need to trigger metadata sync. When applications update under storage files without letting Alluxio know, there are two approaches to govern the timing of metadata sync.

1. Rely on a time-based automatic sync

One can set a sync interval through the Alluxio property key “alluxio.user.file.metadata.sync.interval”.

Note that with this approach, a path in Alluxio that is never accessed never triggers a sync. Once the path is accessed after the sync interval has expired, Alluxio syncs with the under storage again. For example, in a Presto job, the query planning stage lists all the files necessary for the job, which triggers a sync if those paths have not been accessed recently. However, subsequent stages of the job will not sync unless the job duration exceeds the sync interval.

So we can technically resync more frequently than the sync interval in this case.

This property key can be given a new global default value (when set in alluxio-site.properties), or it can be set on a per-directory basis, in which case it applies recursively to all children of that directory.
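The time-based decision and the per-directory override can be sketched as follows. This is a simplified model under stated assumptions, not the actual master logic: the function names, the overrides dict, and the millisecond timestamps are hypothetical, and the treatment of a negative interval as "time-based sync disabled" and of 0 as "sync on every access" reflects common usage of this property and should be checked against the documentation of your Alluxio version.

```python
import posixpath


def effective_interval(path, overrides, default_ms):
    """Return the interval of the nearest ancestor with an override, else the default.

    overrides is a hypothetical dict of directory path -> interval in ms.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    current = path
    while True:
        if current in overrides:
            return overrides[current]
        if current == "/":
            return default_ms
        current = posixpath.dirname(current)


def should_sync(path, now_ms, last_sync_ms, overrides, default_ms):
    """Decide whether accessing `path` at time `now_ms` should trigger a sync."""
    interval = effective_interval(path, overrides, default_ms)
    if interval < 0:
        # Assumption: a negative interval disables time-based sync.
        return False
    last = last_sync_ms.get(path)
    if last is None:
        # The path has not been synced before, so the first access syncs.
        return True
    # Assumption: an interval of 0 means every access syncs.
    return now_ms - last >= interval
```

Note that the check only runs when a path is accessed, which is why an untouched path is never synced regardless of the interval.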

2. Sync manually using LoadMetadata flags

If a metadata sync is not triggered because the sync interval has not expired, most Alluxio operations will continue to execute with the metadata currently in the Alluxio file system, with some exceptions:

How is Metadata Sync Implemented

The Alluxio master may trigger a metadata sync on an Alluxio path when it receives an RPC request to retrieve the metadata of that path. Rather than having a dedicated service traverse the entire file system inode tree and keep it in sync, the effort is amortized across the individual Alluxio file system operations on the master. The high-level process for syncing inside the RPC request is:

Note that the metadata sync process can be relatively expensive and can block other operations that touch the same part of the inode tree. This is because the sync process may write-lock the part of the file system metadata it is updating. In particular, when syncing a path in the tree, the RPC handling thread first acquires a read lock on the entire path to the file. Because the syncing thread also needs to be able to create paths, it needs a write lock on the sync root path. As the syncing thread processes each path under the root path, additional locks are acquired: the syncing thread takes a write lock on each file path and releases it as soon as that path is processed.
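One way to picture the locking described above is as a plan of lock modes along the path from the root down to the sync root. The sketch below is a simplification for illustration only: the function lock_plan and the READ/WRITE labels are hypothetical, and it assumes ancestors of the sync root are read-locked while the sync root itself is write-locked.

```python
def lock_plan(sync_root):
    """List (path, mode) pairs in the order locks are taken from "/" down.

    Assumption: ancestors are read-locked and the sync root is write-locked.
    """
    if not sync_root.startswith("/"):
        raise ValueError("sync root must be an absolute path")
    parts = [p for p in sync_root.split("/") if p]
    paths = ["/"]
    for part in parts:
        # Build each ancestor path in turn, e.g. "/", "/a", "/a/b".
        paths.append(paths[-1].rstrip("/") + "/" + part)
    return [(p, "READ") for p in paths[:-1]] + [(paths[-1], "WRITE")]
```

For a sync rooted at "/a/b", this yields read locks on "/" and "/a" and a write lock on "/a/b". Any operation that needs to modify something under "/a/b" would wait for that write lock, which is why syncing a large directory can delay other requests.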

Performance Optimization

Tuning parallelism

One can tune the level of parallelism to sync metadata by controlling three configuration parameters:

Caching result

There are three different types of caches, each with its own goal and use in the metadata sync process. Here is a quick summary of all of them.

Summary

Metadata synchronization is one of the most important features in Alluxio. There are multiple ways to trigger synchronization, each with different performance tradeoffs. Inside the Alluxio master, several optimizations are applied to speed up synchronization.
