Authors:

(1) Mathias Brossard, Systems Group, Arm Research;

(2) Guilhem Bryant, Systems Group, Arm Research;

(3) Basma El Gaabouri, Systems Group, Arm Research;

(4) Xinxin Fan, IoTeX.io;

(5) Alexandre Ferreira, Systems Group, Arm Research;

(6) Edmund Grimley-Evans, Systems Group, Arm Research;

(7) Christopher Haster, Systems Group, Arm Research;

(8) Evan Johnson, University of California, San Diego;

(9) Derek Miller, Systems Group, Arm Research;

(10) Fan Mo, Imperial College London;

(11) Dominic P. Mulligan, Systems Group, Arm Research;

(12) Nick Spinale, Systems Group, Arm Research;

(13) Eric van Hensbergen, Systems Group, Arm Research;

(14) Hugo J. M. Vincent, Systems Group, Arm Research;

(15) Shale Xiong, Systems Group, Arm Research.

Editor's note: this is part 3 of 6 of a study detailing the development of a framework to help people collaborate securely. Read the rest below.

3 IceCap

IceCap is a hypervisor with a minimal trusted computing base (TCB, henceforth) built around the formally verified seL4 microkernel. It provides a pragmatic and flexible software isolate for many existing Armv8-A devices. The IceCap hypervisor relegates the untrusted operator to a domain of limited privilege called the host. This domain consists of a distinguished virtual machine, housing a rich operating system such as Linux, and a minimal accompanying virtual machine monitor. The host domain manages the device's CPU and memory resources, and drives the device peripherals on which the TCB does not depend. Among these resources are opaque memory and CPU resources for confidential virtual machines, or isolates. The host, however, has no right to access the resources of isolates: while scheduling and memory management policy is controlled by the host, the corresponding mechanism is the responsibility of more trustworthy components.

IceCap’s TCB includes the seL4 microkernel and compartmentalized, privileged seL4-native services running in EL0. These co-operate defensively with the host to expose isolate lifecycle, scheduling, and memory management mechanisms.
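To make this policy/mechanism split concrete, the sketch below shows the shape of interface such trusted services might expose to the host. It is a minimal illustration in Rust, the language in which IceCap's trusted userspace components are written; all names and signatures here are hypothetical and do not come from the IceCap source tree.

```rust
// A minimal sketch of the policy/mechanism split described above.
// All names and signatures are hypothetical illustrations, not
// IceCap's actual API. The host decides *when* an isolate runs and
// *which* resources it receives (policy); the trusted seL4-native
// services validate each request and perform the privileged
// operation (mechanism), never granting the host access to isolate
// state.

/// Opaque handle to an isolate, held by the untrusted host.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct IsolateId(pub u32);

/// Reasons the trusted services may reject a host request.
#[derive(Debug)]
pub enum MechanismError {
    NoSuchIsolate,
    InsufficientResources,
    InvalidSpec,
}

/// The interface the trusted EL0 services expose to the host. Note
/// the absence of any read or write operation on isolate memory: the
/// host drives lifecycle and scheduling but cannot inspect isolates.
pub trait IsolateMechanisms {
    /// Create an isolate from a declarative specification, carving
    /// its memory and CPU out of the host-inaccessible pool.
    fn spawn(&mut self, spec_bytes: &[u8]) -> Result<IsolateId, MechanismError>;

    /// Run the isolate for one time slice; the host chooses when to
    /// call this, implementing its own scheduling policy.
    fn run(&mut self, id: IsolateId) -> Result<(), MechanismError>;

    /// Tear the isolate down, scrub its memory, and return the
    /// now-zeroed resources to the pool.
    fn destroy(&mut self, id: IsolateId) -> Result<(), MechanismError>;
}
```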

At system initialization, the hypervisor extends the device's root of trust via a device-specific measured boot process and then passes control to the untrusted host domain. To spawn a new isolate, a remote party coordinates with the host: it sends a declarative specification of the isolate's initial state, via the host, to IceCap's trusted spawning service, which then carves out the requested memory and CPU resources from the pool inaccessible to the host. A process on the host, called the shadow virtual machine monitor, provides untrusted paravirtualized device backends to isolates, and also acts as a token representing the isolate in the host's scheduler, enabling the host operating system to manage isolate scheduling policy with minimal modification.
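The declarative specification mentioned above fixes the isolate's initial state before any of its code runs. The following Rust sketch suggests what such a specification might contain; the field names and types are illustrative assumptions rather than IceCap's actual format.

```rust
// Sketch of the kind of declarative isolate specification a remote
// party might send to the trusted spawning service via the host.
// All fields are illustrative assumptions, not IceCap's wire format.

/// Initial state of an isolate, fixed before it first runs.
pub struct IsolateSpec {
    /// Measured payload: kernel image and optional initial ramdisk.
    pub kernel_image: Vec<u8>,
    pub initrd: Option<Vec<u8>>,
    /// Resources to carve out of the host-inaccessible pool.
    pub memory_bytes: u64,
    pub vcpus: u8,
    /// Paravirtualized devices backed by the shadow VMM on the host;
    /// these backends are untrusted, so the isolate must treat them
    /// as adversarial.
    pub virtio_consoles: u8,
    pub virtio_nets: u8,
}

impl IsolateSpec {
    /// The spawning service would validate a specification before
    /// carving out resources; here we check only obvious invariants.
    pub fn validate(&self) -> Result<(), &'static str> {
        if self.kernel_image.is_empty() {
            return Err("empty kernel image");
        }
        if self.memory_bytes == 0 || self.vcpus == 0 {
            return Err("isolate needs memory and at least one vCPU");
        }
        Ok(())
    }
}
```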

To support attestation of isolates, IceCap would use a platform-specific measured boot to prove its own identity and then attest the identity of an isolate to a remote challenger. This is not yet implemented; IceCap's attestation is currently stubbed out to support Veracruz, but implementing it would be straightforward.
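Since this attestation flow is not yet implemented, the sketch below only illustrates the intended shape of the protocol as just described: a platform measurement proving the hypervisor's identity, an isolate measurement, and a challenger-supplied nonce, bound together by a signature chaining back to the device's root of trust. Every type and function here is a hypothetical placeholder.

```rust
// Hypothetical sketch of the intended isolate attestation flow; none
// of these types exist in IceCap today.

/// Hash of the hypervisor image, produced by the device's measured boot.
pub struct PlatformMeasurement(pub [u8; 32]);

/// Hash of the isolate's declarative initial-state specification.
pub struct IsolateMeasurement(pub [u8; 32]);

/// Evidence returned to a remote challenger.
pub struct AttestationToken {
    pub platform: PlatformMeasurement,
    pub isolate: IsolateMeasurement,
    pub challenge_nonce: [u8; 16],
    /// Signature over the fields above, made with a device-bound key
    /// that chains back to the device's root of trust.
    pub signature: Vec<u8>,
}

/// Placeholder for a platform-specific signing primitive rooted in
/// the measured boot chain.
fn sign_with_device_key(_payload: &[u8]) -> Vec<u8> {
    unimplemented!("platform-specific; rooted in measured boot")
}

pub fn attest(
    platform: PlatformMeasurement,
    isolate: IsolateMeasurement,
    challenge_nonce: [u8; 16],
) -> AttestationToken {
    // Bind the hypervisor identity, the isolate identity, and the
    // challenger's nonce into a single signed payload.
    let mut payload = Vec::new();
    payload.extend_from_slice(&platform.0);
    payload.extend_from_slice(&isolate.0);
    payload.extend_from_slice(&challenge_nonce);
    let signature = sign_with_device_key(&payload);
    AttestationToken { platform, isolate, challenge_nonce, signature }
}
```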

seL4 is accompanied by security and functional correctness proofs, checked in Isabelle/HOL [68, 69, 82], providing assurance that IceCap correctly protects isolates from software attacks. By building on seL4, IceCap will also benefit from ongoing research into the elimination of certain classes of timing channels [38]. The trusted seL4 userspace components of IceCap are not yet verified, though they are compartmentalized and initialized using CapDL [55], whose formal semantics previous work has shown to be amenable to verification [18]. Thanks to the high-level seL4 API, these components are also implemented at a high level of abstraction in Rust, making auditing easier and eliminating the need to subvert the Rust compiler's memory safety checks, even for components which interact with hardware address translation structures. The IceCap TCB is small and limited in scope, at about 40,000 lines of code. Virtual machine monitors are moved into the trust domains of the virtual machines they supervise, eliminating emulation code from the TCB; to that end, cross-domain fault handling is replaced with higher-level message passing via seL4 IPC.

Isolates are also protected by the System MMU (SMMU) from attacks originating from peripherals under the host's control. IceCap is designed to take seamless advantage of additional hardware security features based on, or aligned with, address translation-based access controls, such as Arm TrustZone [7]. TrustZone firmware typically uses the NS state bit to implement a coarse context switch, logically partitioning execution on the application processor into two worlds. IceCap could use this to run isolates out of secure-world memory resources, protected by platform-specific mechanisms which may mitigate certain classes of physical attack.

Under IceCap, the host and its isolates incur minimal performance overhead compared to a host and its guests under KVM [51]. We use Firecracker [2], an open-source VMM for KVM from AWS, as our point of comparison, owing to its minimalism in the name of performance and its preference for paravirtualization over emulation. Compute-bound workloads in IceCap isolates incur a ∼2.2% overhead compared to native Linux processes and a ∼1.8% overhead compared to Firecracker guests, due to context switches through the TCB on timer ticks (see Table 1). The virtual network bandwidth between the host and an isolate is representative of how bulk data flows through IceCap. At the time of writing, however, IceCap's untrusted network device emulation differs from Firecracker's trusted network device emulation in ways that hinder a precise comparison; with that caveat, guest-to-host traffic incurs a ∼9.9% bandwidth overhead, whereas host-to-guest traffic outperforms Firecracker by a small margin. As IceCap's implementation matures, we expect virtual network bandwidth overhead to settle between these two points.

The high performance of seL4 IPC [81] helps reduce IceCap's overhead, which is lowered further by minimizing VM exits through aggressive paravirtualization: the VMMs for both the host and the isolates do not map any of their VMs' memory into their own address spaces, and their only runtime responsibility is emulating the interrupt controller; their VMs employ interrupt mitigation to avoid even that.
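The sketch below illustrates this narrow runtime role: a VMM that blocks on message receipt and handles only interrupt-controller emulation and coalesced interrupt injection. The event names and loop structure are our own illustration, with the seL4 IPC details elided; none of this is IceCap's actual API.

```rust
// Illustrative event loop for a VMM whose only runtime job is
// interrupt-controller emulation; names are hypothetical.

enum VmmEvent {
    /// The guest trapped on an interrupt-controller register access.
    GicAccess { offset: u64, is_write: bool, value: u64 },
    /// A (possibly coalesced) batch of pending interrupt lines to
    /// inject, delivered via message passing rather than cross-domain
    /// fault handling.
    InjectIrqs { lines: u32 /* bitmask */ },
}

/// Placeholder for blocking receipt on an seL4 IPC endpoint.
fn recv_message() -> VmmEvent {
    unimplemented!("would block on an seL4 IPC endpoint")
}

fn vmm_event_loop() -> ! {
    loop {
        match recv_message() {
            VmmEvent::GicAccess { offset, is_write, value } => {
                // Emulate the register access. Note that the VMM never
                // needs the guest's memory mapped into its own address
                // space to do this.
                let _ = (offset, is_write, value);
            }
            VmmEvent::InjectIrqs { lines } => {
                // Inject all coalesced lines at once, so a burst of
                // interrupts costs a single trip through the VMM: this
                // is the interrupt mitigation referred to above.
                let _ = lines;
            }
        }
    }
}
```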

Next, we introduce a framework for designing and deploying privacy-preserving delegated computations across a variety of isolation technologies, IceCap included.

This paper is available on arxiv under CC BY 4.0 DEED license.