Abstract
The expression problem describes how most types can easily be extended with new ways to produce the type or new ways to consume the type, but not both. When abstract syntax trees are defined as an algebraic data type, for example, they can easily be extended with new consumers, such as print or eval, but adding a new constructor requires the modification of all existing pattern matches. The expression problem is one way to elucidate the difference between functional or data-oriented programs (easily extendable by new consumers) and object-oriented programs (easily extendable by new producers).
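As a concrete illustration, the following TypeScript sketch (names are our own, not from the paper) shows the data-oriented side of the expression problem: a new consumer is just one more function, but a new constructor would force every existing pattern match to change.

```typescript
// Data-oriented decomposition: the type is fixed by its constructors.
type Expr =
  | { tag: "lit"; value: number }
  | { tag: "add"; left: Expr; right: Expr };

// Adding a new consumer is easy: just write another function.
function evaluate(e: Expr): number {
  switch (e.tag) {
    case "lit": return e.value;
    case "add": return evaluate(e.left) + evaluate(e.right);
  }
}

function show(e: Expr): string {
  switch (e.tag) {
    case "lit": return String(e.value);
    case "add": return `(${show(e.left)} + ${show(e.right)})`;
  }
}

// Adding a new constructor (say, "mul") would require editing every
// existing switch statement: this is the expression problem.
const e: Expr = {
  tag: "add",
  left: { tag: "lit", value: 1 },
  right: { tag: "lit", value: 2 },
};
```

An object-oriented decomposition would invert this trade-off: each expression form becomes a class implementing an `evaluate`/`show` interface, so new producers are easy and new consumers are hard.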
This difference between programs which are extensible by new producers or new consumers also exists for dependently typed programming, but with one core difference: Dependently-typed programming almost exclusively follows the functional programming model and not the object-oriented model, which leaves an interesting space in the programming language landscape unexplored. In this paper, we explore the field of dependently-typed object-oriented programming by deriving it from first principles using the principle of duality.
That is, we do not extend an existing object-oriented formalism with dependent types in an ad-hoc fashion, but instead start from a familiar data-oriented language and derive its dual fragment by the systematic use of defunctionalization and refunctionalization. Our central contribution is a dependently typed calculus which contains two dual language fragments.
We provide type- and semantics-preserving transformations between these two language fragments: defunctionalization and refunctionalization. We have implemented this language and these transformations and use this implementation to explain the various ways in which constructions in dependently typed programming can be explained as special instances of the general phenomenon of duality.
Introduction
There are many programming paradigms, but dependently typed programming languages almost exclusively follow the functional programming model. In this paper, we show why dependently-typed programming languages should also include object-oriented principles, and how this can be done. One of the main reasons why object-oriented features should be included is a consequence of how the complexity of the domain is modeled in the functional and object-oriented paradigms.
Functional programmers structure the domain using data types defined by their constructors, whereas object-oriented programmers structure the domain using classes and interfaces defined by methods. This choice has important implications for the extensibility properties of large programs, which are only more accentuated for dependently typed programs. Why do most dependently typed languages follow the functional style?
One of the main reasons is that dependent type theories, on which many of these languages are based, are best studied for functional programming languages. Our challenge, then, is to develop a dependently-typed object-oriented calculus that can serve as the foundation for object-oriented dependently-typed programming languages. Instead of specifying this calculus in an ad-hoc fashion, we want to use de- and refunctionalization as systematic tools to derive an object-oriented language fragment from its functional counterpart.
We want to show how object-oriented programming is dual to functional programming, that this duality extends from non-dependent programming languages to dependently typed programming languages, and that we can use this duality to derive our calculus.
1.1 Data and Codata: The Essence of Functional and Object-Oriented Programming
How can functional programming (FP) and object-oriented programming (OOP) be dual if there is no precise definition of these two paradigms? We have to define what we mean by functional and object-oriented programming if we want to pose a precise research question. For the purposes of this paper, and notwithstanding other reasonable definitions, we focus on the differences in program decomposition between the two paradigms.
For us, the essence of functional programming is programming with algebraic data types and pattern matching, whereas the essence of object-oriented programming is programming against interfaces, which correspond to the type-theoretic concept of codata and copattern matching. This definition is not novel but follows similar observations by Cook [1990, 2009] and Downen et al. [2019]. A potentially confusing but important aspect of this definition is that first-class functions are in the object-oriented space, since they are a particular form of codata (and 𝜆 is a particular form of copattern matching).
In the rest of this subsection, we elaborate on this definition. Let us verify first that this definition captures the essence of FP. An essential part of the programming experience in statically typed functional languages like OCaml, Scala, Haskell or SML, but also proof assistants like Coq, Agda, Idris and Lean, is modeling the domain with algebraic data types. Algebraic data types consist of product types like structs and records, sum types and enums, and recursive types like lists, which together form the essential vocabulary with which programmers in those languages express themselves.
The dependently typed languages in this list extend this vocabulary by allowing data types to be indexed; the vector type, for example, is indexed over the number of its elements. That OOP can be identified with codata types is less obvious, so we will introduce them with a bit more detail. Data types and codata types differ in how they are defined: Whereas a data type is defined by its constructors, i.e. all the ways in which terms of that type can be constructed, a codata type is defined by all the ways it can be observed.
One type which is defined by its two canonical observations is the type of infinite streams. We can either observe the head of a stream, yielding one element, or we can observe the tail, yielding a new stream. Equivalently, we can say that every stream has to implement the stream interface which requires a head and a tail method.
Instead of this object-oriented terminology, we use the type-theoretic jargon and the following syntax for defining the type of streams:
codata Stream(a: Type) {
  Stream(a).head(a: Type): a,
  Stream(a).tail(a: Type): Stream(a)
}

codef Ones: Stream(Nat) {
  head() => S(Z),
  tail() => Ones
}
The codef declaration shows how to construct a stream by implementing the stream interface, i.e. by specifying how it behaves under the head and the tail observations. This particular stream models an infinite sequence of ones. The syntactic construct we use here is called copattern matching [Abel et al. 2013] and is the precise dual of pattern matching.
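In a mainstream object-oriented language, the same idea reads as an interface with two methods; here is a minimal TypeScript sketch (the names are ours, and the dependent type index on streams is omitted since TypeScript cannot express it):

```typescript
// Codata: the Stream type is defined by its observations, not by constructors.
interface Stream<A> {
  head(): A;
  tail(): Stream<A>;
}

// An object implementing the interface: the infinite stream of ones.
// The tail observation returns the same object, so the stream is
// infinite without ever being materialized.
const ones: Stream<number> = {
  head: () => 1,
  tail: () => ones,
};
```

Observing `ones.head()` yields 1 and `ones.tail()` yields the stream itself; only the finitely many observations a client actually makes are ever evaluated.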
We mentioned that according to our definition of FP and OOP, first-class functions counterintuitively belong to the object-oriented space, so let us substantiate that claim. Programmers in functional programming languages can define many types, but they usually cannot define the function type. Functions can be defined, however, using codata types:
A function is just an object which implements an interface with a single apply method. For example, the type of functions from natural numbers to Booleans, and the constant function which always returns True, are defined in the following way:
codata Fun {
  ap(x: Nat): Bool
}

codef ConstTrue: Fun {
  ap(_) => True
}
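The same construction in TypeScript makes the claim concrete (a minimal sketch with names of our choosing): a "lambda" is nothing but an object implementing a one-method interface.

```typescript
// The function type as a codata type with a single observation.
interface Fun {
  ap(x: number): boolean;
}

// A first-class "function" is an object implementing that interface.
const constTrue: Fun = {
  ap: (_x: number) => true,
};

// It can be passed around like any other value and observed via ap.
function applyToZero(f: Fun): boolean {
  return f.ap(0);
}
```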
The research question that motivated this paper is this: If functional programming can be and has been extended to dependent functional programming, can object-oriented programming be similarly extended? Codata has been introduced to many proof assistants before, but for an entirely different purpose. The purpose was to model certain infinite structures and coinductive objects, not to program in an object-oriented style.
In this paper, we are interested in this second aspect, and we are (to the best of our knowledge) the first ones to discuss this question in detail. To approach this question in a principled way, we need an additional technical tool, defunctionalization and refunctionalization, which we introduce in the next section.
1.2 De- and Refunctionalization: A Tool for Systematic Language Design
Now that we have introduced two alternative programming paradigms, let us look at how one paradigm can express programs in the other paradigm. One way in which object-oriented programmers have often represented the functional style is with the visitor pattern [Gamma et al. 1995].
Later, Downen et al. [2019] showed how the visitor pattern can be used as a compilation technique for data and codata types; using the visitor pattern, they can compile functional programs to object-oriented programs, and using a related tabulation technique they can compile object-oriented programs to functional ones. In this paper, we use an alternative technique: defunctionalization and refunctionalization.
Defunctionalization [Danvy and Nielsen 2001; Reynolds 1972] is a whole-program transformation which eliminates higher-order functions by replacing lambda abstractions by constructors of a data type, together with a top-level apply function. Refunctionalization [Danvy and Millikin 2009] is its partial inverse, and re-introduces higher-order functions by replacing occurrences of the constructors by lambda abstractions. We already observed in the previous section that the function type is just one instance of a codata type.
Based on this observation, Rendel et al. [2015] showed that defunctionalization and refunctionalization can be generalized to arbitrary data and codata types, which makes these transformations both more powerful and more symmetric since refunctionalization is now a full inverse instead of a partial one. Let us look at an example of how these generalized defunctionalization and refunctionalization transformations work. In Figure 1a we have defined Booleans as a data type with two constructors, and negation by pattern matching on True and False.
For negation we use syntax familiar to object-oriented programmers: negating a boolean b can be written as b.neg. Refunctionalizing this program results in the program in Figure 1b. In this representation, negation is the single observation of a codata type, and True and False are defined as objects implementing this interface.
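Since Figure 1 is not reproduced here, the following TypeScript sketch shows the two decompositions side by side (our rendering; the paper uses its own calculus syntax):

```typescript
// (a) Data decomposition: Bool is defined by its constructors,
// and neg is defined by pattern matching on them.
type BoolData = { tag: "True" } | { tag: "False" };

function neg(b: BoolData): BoolData {
  switch (b.tag) {
    case "True": return { tag: "False" };
    case "False": return { tag: "True" };
  }
}

// (b) Codata decomposition (the refunctionalized program):
// Bool is defined by its single observation, and True and False
// are objects implementing that interface.
interface BoolCodata {
  neg(): BoolCodata;
}

const True: BoolCodata = { neg: () => False };
const False: BoolCodata = { neg: () => True };
```

Both programs answer the same question, "what happens when True or False is negated?", but organize the answers along different axes: by constructor in (a), by observation in (b).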
One way to visualize how defunctionalization and refunctionalization work is to think of each type as a matrix. The two programs of Figure 1, for example, can be represented by the following matrix:
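The matrix itself is not reproduced here; for the Bool example of Figure 1, a reconstruction (ours, so the paper's figure may differ in layout) looks as follows, with constructors as columns and the negation observation as the single row:

```latex
\begin{array}{c|cc}
              & \mathtt{True}  & \mathtt{False} \\ \hline
\mathtt{.neg} & \mathtt{False} & \mathtt{True}
\end{array}
```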
The rows of the matrix enumerate all the ways elements of the type can be consumed, whereas the columns enumerate the ways in which elements of the type can be constructed. The cells of the matrix specify the result of an interaction between one of each. Data types and codata types are then just two different linear presentations of this type-matrix, and defunctionalization and refunctionalization transpose the linearization.
In this paper, we use defunctionalization and refunctionalization not as a compilation technique, but as a tool for systematic language design. These transformations are only total in a language where the data and codata fragments of the language are equally expressive. We can therefore use them to systematically derive the codata fragment of an object-oriented dependently-typed language by starting from a familiar design for dependent data types and pattern matching, and refunctionalizing programs in that language.
1.3 A Minimal Dependently-Typed Example
Let us now extend the example from the previous section by a simple proof that negation is an involution, i.e. that applying negation twice is the identity. We look at this example first from the familiar point of view of functional programming, and then from the more unfamiliar point of view of dependently-typed object-oriented programming.
These two dual presentations are not artificially constructed but inter-derived using the de- and refunctionalization introduced in the previous section. In the accompanying implementation that we provide, each version can be automatically transformed into the other presentation at the click of a button. In the functional decomposition, shown in Figure 2a, we use the Martin-Löf equality type Eq(a: Type, x y: a) to express propositional equality.
The way we define the proposition that negating a boolean twice is the identity is noteworthy: instead of as a dependent function, it is formulated more directly as an elimination on a named boolean self which yields a proof that self is equal to self twice-negated, i.e. self.neg.neg. The proof pattern matches on True and False and returns the Refl constructor in each branch.
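For readers more familiar with the functional presentation, the proposition and its proof by case analysis can be sketched in Lean 4 (our rendering, not the paper's syntax; we name the type B to avoid clashing with Lean's builtin Bool):

```lean
inductive B where
  | true
  | false

def B.neg : B → B
  | .true  => .false
  | .false => .true

-- Negating twice is the identity: proved by case analysis on the
-- two constructors, with reflexivity in each branch.
theorem neg_involutive (b : B) : b = b.neg.neg := by
  cases b <;> rfl
```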
In the object-oriented decomposition, shown in Figure 2b, we have kept the definition of the Martin-Löf equality type. The definition of Booleans, on the other hand, has changed dramatically. Booleans are now defined via the two observations that we defined in the original program: negation and the proof that negating a boolean twice is the identity.
Instead of the two canonical constructors True and False, we now have two mutually recursive top-level definitions of True and False. This means that we are now free to add new Booleans without changing the definition of the type Bool: we could simply add another object and implement the negation operation together with a proof of its correctness, i.e. a proof that applying it twice yields the original element.
Defining objects by the interfaces that they have to implement is of course familiar. Including proofs of correctness in those interfaces, however, and looking at familiar types like Booleans in this flipped representation, is novel. In this article, we invite you to follow us on an exploration of the duality of these two programming styles and to discover both the expressive power we gain and the sometimes subtle problems we encounter and restrictions we have to impose.
1.4 Overview
The remainder of this article is structured as follows:
• Building on our minimal example, we present dependently typed object-oriented programming in Section 2. Our language consists of two fragments, a functional/data-oriented fragment and an object-oriented/codata fragment, and the specification of these two fragments is dictated by the requirement that defunctionalization and refunctionalization are total and semantics-preserving transformations.
• In Section 3 we evaluate the expressive power and the extensibility properties of our system using a case study of a dependently typed web server. In Appendix C, we further evaluate our design and implementation with a formalization of type preservation for a simple expression language and of full type soundness for the simply typed lambda calculus. Since an online implementation is available, the reader can flip any of the involved types from the data to the codata representation, and vice versa, and observe the resulting program.
• In Section 4 we discuss the constraints on the design of the type system that we had to observe because we want our system to be closed under de- and refunctionalization. For each such constraint, we discuss both the problem and the solution that we have chosen for our formalization.
• In Sections 5 and 6 we present all the formal details. We specify a declarative formalization in the style of Martin-Löf in Section 5 and the details of the defunctionalization and refunctionalization algorithms in Section 6.
• We discuss future work in Section 7, related work in Section 8 and conclude in Section 9. We have implemented the language, and the defunctionalization and refunctionalization algorithms presented in this paper. We make an IDE available in the browser (cf. Figure 3) which supports the defunctionalization and refunctionalization transformations as code actions using the language server protocol (LSP).
This paper is