Authors:

(1) Kaiyuan Chen, University of California, Berkeley;

(2) Alexander Thomas, University of California, Berkeley;

(3) Hanming Lu, University of California, Berkeley;

(4) William Mullen, University of California, Berkeley;

(5) Jeff Ichnowski, University of California, Berkeley;

(6) Rahul Arya, University of California, Berkeley;

(7) Nivedha Krishnakumar, University of California, Berkeley;

(8) Ryan Teoh, University of California, Berkeley;

(9) Willis Wang, University of California, Berkeley;

(10) Anthony Joseph, University of California, Berkeley;

(11) John Kubiatowicz, University of California, Berkeley.

Abstract and I. Introduction

II. Background

III. Paranoid Stateful Lambda

IV. SCL Design

V. Optimizations

VI. PSL with SCL

VII. Implementation

VIII. Evaluation

IX. Related Work

X. Conclusion, Acknowledgment, and References

Abstract—We propose a federated Function-as-a-Service (FaaS) execution model that provides secure and stateful execution in both Cloud and Edge environments. The FaaS workers, called Paranoid Stateful Lambdas (PSLs), collaborate with one another to perform large parallel computations. We exploit cryptographically hardened and mobile bundles of data, called DataCapsules, to provide persistent state for our PSLs, whose execution is protected using hardware-secured TEEs. To make PSLs easy to program and performant, we build the familiar Key-Value Store interface on top of DataCapsules in a way that allows amortization of cryptographic operations. We demonstrate PSLs functioning in an edge environment running on a group of Intel NUCs with SGXv2.

As described, our Secure Concurrency Layer (SCL) provides eventually-consistent semantics over written values using untrusted and unordered multicast. All SCL communication is encrypted, unforgeable, and private. For durability, updates are recorded in replicated DataCapsules, which are append-only, cryptographically hardened blockchains with confidentiality, integrity, and provenance guarantees. Values for inactive keys are stored in a log-structured merge-tree (LSM) in the same DataCapsule. SCL features a variety of communication optimizations, such as an efficient message passing framework that reduces latency by up to 44x compared to the Intel SGX SDK, and an actor-based cryptographic processing architecture that batches cryptographic operations and increases throughput by 81x.

I. INTRODUCTION

Distributed computing uses workers on multiple hosts to jointly run a single task. Existing Function-as-a-Service (FaaS) providers, such as AWS Lambda [8], have pushed distributed computing to an extreme: users can launch hundreds or thousands of distributed workers concurrently. Some FaaS implementations, such as Cloudburst [38], even support stateful executions, in which distributed workers share storage with one another. At this time, the serverless FaaS model has become quite popular for a wide variety of applications.

Edge computing, in contrast to cloud computing, exploits resources at the edge of the network, presenting a variety of opportunities for low-latency, high-bandwidth communication, lower energy usage, and better privacy. It is arguably the next major computing paradigm after cloud computing [44]. Providing a stateful, serverless model of access to resources at the edge seems ideal for a variety of emerging IoT and robotic applications, since it is directly compatible with the paradigm of on-the-fly allocation of compute and storage resources by mobile devices as they transit regions on the edge of the network [23, 39, 41].

Unfortunately, general-purpose Edge computing presents a number of challenges [19, 35] and is thus not widely used by existing FaaS implementations. One huge challenge is that the edge environment is often not as trustworthy as the cloud; resources at the edge of the network may be owned and maintained by novice users or malicious third parties. Furthermore, physical security is less prevalent in edge environments, leading to a variety of physical attack vectors. Compromised devices, while appearing legitimate, could steal information, covertly monitor communication, or deny service. Even worse, malicious entities could seek to alter information in subtle ways that are not immediately obvious, but which corrupt edge applications in damaging or even dangerous ways. Application writers often attempt to “roll their own” data protection in ad-hoc and sometimes buggy ways, leading to data breaches and security violations. Clearly, the lack of a standardized approach to both protect and easily utilize information on the edge hinders exploitation of edge computing resources.

Paranoid Stateful Lambdas: In this paper, we introduce the first FaaS execution service that enables secure and stateful execution for both cloud and edge environments, while at the same time being easy to use. The FaaS workers, called Paranoid Stateful Lambdas (PSLs), can collaborate to perform large parallel computations that span the globe and securely exploit resources from many domains. See Figure 1. We provide an easy-to-use data model and automatically manage cryptographic keys for compute and storage resources.

We take a two-fold approach to supporting PSLs. First, we exploit trusted execution environments (TEEs), such as those provided by Intel SGX [7, 15] and ARM TrustZone. TEEs provide confidentiality and integrity of executed functions while also providing strong isolation of data, computation, and cryptographic assets from the untrusted kernel or hypervisor. Through attestation, a multi-PSL application can be served by an untrusted, third-party service provider.

Second, we package persistent state in cryptographically hardened bundles of data called DataCapsules [29]. Each DataCapsule is a “blockchain [48] in a box,” exploiting a standardized metadata format to guarantee provenance, integrity, and privacy via cryptography. With append-only semantics, DataCapsules provide a permanent audit-trail of operations on their contents, thus allowing undo-like operations, multiversion support, and mediation in the presence of malicious failure. See Figure 2. While DataCapsules can be embedded in any network storage environment, they are particularly powerful when combined with a data-centric network such as the Global Data Plane (GDP) [29], which allows DataCapsules to be stored, migrated, and interacted with anywhere in the network. In this paper, we treat the DataCapsule service as a black box provided by the underlying infrastructure.
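The append-only, hash-linked structure described above can be illustrated with a minimal Python sketch. It is not the DataCapsule metadata format, which this paper treats as a black box: the class `ToyCapsule`, its record fields, and the `verify` helper are hypothetical names, an HMAC stands in for the writer's signature, and payload encryption is omitted. The sketch only shows why a chain of hashes makes tampering with earlier records detectable.

```python
import hashlib
import hmac
import json


class ToyCapsule:
    """Toy append-only log: each record commits to its predecessor's digest."""

    GENESIS = "0" * 64  # placeholder "previous digest" for the first record

    def __init__(self, writer_key: bytes):
        self._key = writer_key
        self._records = []

    def append(self, payload: str) -> dict:
        prev = self._records[-1]["digest"] if self._records else self.GENESIS
        body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
        digest = hashlib.sha256(body.encode()).hexdigest()
        # Assumption: an HMAC stands in for a real public-key signature.
        tag = hmac.new(self._key, digest.encode(), hashlib.sha256).hexdigest()
        record = {"prev": prev, "payload": payload, "digest": digest, "tag": tag}
        self._records.append(record)
        return record

    def records(self) -> list:
        # Return copies so callers cannot mutate the log in place.
        return [dict(r) for r in self._records]


def verify(records: list, writer_key: bytes) -> bool:
    """Recompute every digest and tag, and check that the chain is unbroken."""
    prev = ToyCapsule.GENESIS
    for r in records:
        body = json.dumps({"prev": r["prev"], "payload": r["payload"]}, sort_keys=True)
        digest = hashlib.sha256(body.encode()).hexdigest()
        tag = hmac.new(writer_key, digest.encode(), hashlib.sha256).hexdigest()
        if r["prev"] != prev or r["digest"] != digest:
            return False
        if not hmac.compare_digest(r["tag"], tag):
            return False
        prev = digest
    return True
```

Because every record embeds the digest of the one before it, changing any earlier payload invalidates that record's digest and therefore the whole suffix of the chain, which is the property that supports an audit trail.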

The combination of TEEs (for active computation and data in use) and DataCapsules (for data at rest or in motion) enables secure, stateful computation in insecure environments. While DataCapsules by themselves provide a standardized way to encapsulate and protect information as it moves within the network (leading to a global, federated data service), we ease the burden on PSL programmers by presenting them with a familiar key-value store (KVS) interface. This interface is implemented on top of DataCapsules via a protected “runtime system” we call a Common Access API (CAAPI). Consequently, communicating PSLs may interact via shared keys in the key-value store.

Performance Challenges: While security is one of our primary motivations for implementing the PSL framework, we also wish to enable high performance parallel computation using PSLs. This goal is hindered in multiple ways: First, the distributed nature of PSL-based parallelism leads to a need for relaxed consistency for most writes in our KVS. We discuss how to implement an eventually-consistent model for put() operations that allows interacting PSLs to operate with independence from one another while still detecting denial of service attacks and bounding the maximum write propagation delay over an unordered and untrusted network. Our mechanism further enables a release-consistent locking scheme [20].

Second, all communication between collaborating enclaves must be encrypted and signed to prevent malicious parties from forging, corrupting, or observing such communication. This security tax can be significant if not mitigated through batching and suppression of locally overwritten updates. We show how our relaxed consistency implementation permits a variety of cryptographic optimizations.
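To make the idea of batching and suppressing locally overwritten updates concrete, the following Python sketch buffers writes per key and performs one cryptographic operation per flushed batch. It is an illustration under stated assumptions, not SCL's implementation: `CoalescingBuffer`, `demo_seal`, `DEMO_KEY`, and the `max_keys` threshold are hypothetical, and an HMAC over the serialized batch stands in for the real encrypt-and-sign step.

```python
import hashlib
import hmac
import json

DEMO_KEY = b"placeholder-key"  # assumption: a real system would use managed keys


def demo_seal(blob: bytes) -> bytes:
    """Stand-in for encrypt-and-sign: prepend a 32-byte HMAC-SHA256 tag."""
    return hmac.new(DEMO_KEY, blob, hashlib.sha256).digest() + blob


class CoalescingBuffer:
    """Keeps only the latest pending value per key and seals whole batches."""

    def __init__(self, seal, max_keys: int = 64):
        self._seal = seal
        self._max_keys = max_keys
        self._pending = {}
        self.seal_calls = 0  # counts cryptographic operations actually performed

    def put(self, key: str, value):
        # A later write to the same key replaces the earlier pending one,
        # so the overwritten value is never sealed or sent.
        self._pending[key] = value
        if len(self._pending) >= self._max_keys:
            return self.flush()
        return None

    def flush(self):
        if not self._pending:
            return None
        blob = json.dumps(self._pending, sort_keys=True).encode()
        self._pending = {}
        self.seal_calls += 1
        return self._seal(blob)
```

With this structure, three `put()` calls that touch two keys cost one sealing operation and transmit two values, which is only acceptable because the consistency model does not require every intermediate write to be visible to other workers.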

Third, the strong isolation provided by TEEs is a double-edged sword: while shielding in-enclave applications from external malicious parties, it imposes a strong impediment to communication across the enclave barrier. The common communication approach [15] involves hardware-specific attestation and complicated key exchange protocols for one-to-one communication, the complexity only increasing with larger enclave group sizes. Even crossing the enclave barrier on a local node using the popular SGX container framework (GrapheneSGX [43]) can exhibit horrendous overhead, combining an expensive context switch with byte-wise data copying[1]. Our approach to speeding up communication exploits the standardized DataCapsule format (which protects information) combined with heavy optimization of communication across the enclave barrier and a fast but untrusted multicast tree for communication.

The Secure Concurrency Layer: Many of our communication innovations are embodied in the Secure Concurrency Layer (SCL), one of the primary topics of this paper. SCL is an in-enclave cache manager that securely and efficiently relays data between multiple enclaves while providing well-formed update semantics. In our system, a given PSL interacts with remote PSLs by issuing KVS put() operations to its own local cache. SCL translates these write operations into encrypted and signed update records compatible with the underlying DataCapsule. The updates are then propagated to other enclaves as well as the network-embedded DataCapsule (for durability) over an untrusted and unordered network multicast tree. SCL provides eventual consistency semantics over the written values, but enforces epoch-based resynchronization for liveness[2]. SCL also features various performance optimizations. For example, SCL uses a circular-buffer-based message passing design, which passes messages across the secure enclave boundary 44x faster than using standard send ecalls. To parallelize the cryptographic computations, such as encryption, hashing, and signing, SCL uses an actor-based architecture for computing the DataCapsule’s headers. When combined with batching, these optimizations increase throughput by 81x over the unoptimized baseline.
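As a rough illustration of circular-buffer message passing, the Python sketch below shows the index discipline of a single-producer, single-consumer ring: the producer advances only the tail, and the consumer advances only the head. It is not SCL's implementation. The class `RingBuffer` and its method names are hypothetical, and Python cannot model memory shared across an enclave boundary, the memory barriers such a design needs, or the validation an enclave must apply to data read from untrusted memory.

```python
class RingBuffer:
    """Fixed-capacity single-producer, single-consumer message ring."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._capacity = capacity
        self._head = 0  # count of messages consumed; advanced by the consumer only
        self._tail = 0  # count of messages produced; advanced by the producer only

    def try_push(self, msg: bytes) -> bool:
        """Producer side: returns False instead of blocking when the ring is full."""
        if self._tail - self._head == self._capacity:
            return False
        self._slots[self._tail % self._capacity] = msg
        self._tail += 1  # publish the slot only after it has been written
        return True

    def try_pop(self):
        """Consumer side: returns None when the ring is empty."""
        if self._head == self._tail:
            return None
        index = self._head % self._capacity
        msg = self._slots[index]
        self._slots[index] = None
        self._head += 1  # release the slot back to the producer
        return msg
```

The appeal of this pattern is that, in steady state, the two sides exchange messages by polling shared indices rather than by making a boundary-crossing call per message.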

We design and implement the PSL FaaS infrastructure using SCL. PSL-enabled worker nodes can run directly on top of SCL by static linking or dynamic script interpretation[3]. To bootstrap secure enclaves with appropriate cryptographic identities, we design a key management scheme inspired by the Bitcoin wallet [6] and an optimized attestation protocol. Unlike previous works [25, 45] that only support Intel SGX and assume SGXv1, we implement SCL on Asylo [21, 22], a hardware-agnostic framework that allows SCL to run on most mainstream TEE hardware. The result is a third-party service running on the edge that can satisfy on-the-fly requests to securely execute PSL applications using compute and storage resources embedded in the edge environment.

We claim the following contributions in this paper:

• Paranoid Stateful Lambdas (PSLs): We introduce the notion of Paranoid Stateful Lambdas and show the design and implementation of our PSL execution environment.

• Separation of State and Computation: We propose to use DataCapsules as the ground-truth vehicle for communication among different types of secure enclave hardware with confidentiality, integrity, and provenance guarantees.

• SCL KVS: We design, implement and evaluate SCL, a secure and eventually-consistent replicated KVS that facilitates inter-enclave communication and bounds maximum write latency while mitigating denial of service. We implement associated key distribution and attestation protocols.

• Communication Optimizations: We reduce and amortize the communication and cryptographic overhead by rearchitecting the cryptographic pipeline and designing a circular buffer based message passing mechanism.

This paper is available on arXiv under the CC BY 4.0 DEED license.

[1] The standard system call facility for SGX incurs between 8,000 and 20,000 cycles for an ecall and takes 8,000 cycles for an ocall.

[2] Our system utilizes a Log-Structured Merge (LSM) tree to efficiently store idle key-value pairs, namely those not currently in PSL caches.

[3] In the future, we hope to support dynamic linking of PSL binaries residing in DataCapsules.