Rehabilitating Packs using Functors and Recursivity, part 2.

Auteurs: Pierrick Couderc
Date: 2020-09-30
Catégorie: OCaml
Tags: ocaml



This blog post and the previous one about functor packs covers two RFCs currently developed by OCamlPro and Jane Street. We previously introduced functor packs, a new feature adding the possiblity to compile packs as functors, allowing the user to implement functors as multiple source files or even parameterized libraries.

In this blog post, we will cover the other aspect of the packs rehabilitation: allowing anyone to implement recursive compilation units using packs (as described formally in the RFC#20). Our previous post introduced briefly how packs were compiled and why we needed some bits of closure conversion to effectively implement big functors. Once again, to implement recursive packs we will need to encode modules through this technique, as such we advise the reader to check at least the introduction and the compilation part of functor packs.

Recursive modules through recursive packs

Recursive modules are a feature long available in the compiler, but restricted to modules, not compilation units. As such, it is impossible to write two files that depend on each other, except by using scripts that tie up these modules into a single compilation file. Due to the internal representation of recursive modules, it would be difficult to implement recursive (and mutually recursive) compilation units. However, we could use packs to implement these.

One common example of recursive modules are trees whose nodes are represented by sets. To implement such a data structure with the standard library we need recursive modules: Set is a functor that takes as parameter a module describing the values embedded in the set, but in our case the type needs the already applied functor.

module rec T : sig
  type t =
      Leaf of int
    | Node of TSet.t

  val compare : t -> t -> int
end = struct
  type t =
      Leaf of int
    | Node of TSet.t

  let compare t1 t2 =
    match t1, t2 with
      Leaf v1, Leaf v2 -> Int.compare v1 v2
    | Node s1, Node s2 -> TSet.compare s1 s2
    | Leaf _, Node _ -> -1
    | Node _, Leaf _ -> 1
end

and TSet : Set.S with type elt = T.t = Set.Make(T)

With recursive pack, we can simply put T and TSet into their respective files (t.ml and tSet.ml), and tie them into one module (let's name it P). Signature of recursive modules cannot be infered, as such we also need to define t.mli and tSet.mli. Both must be compiled simultaneously since they refer to each other. The result of the compilation is the following:

ocamlopt -c -for-pack P -recursive t.mli tSet.mli
ocamlopt -c -for-pack P -pack-is-recursive P t.ml
ocamlopt -c -for-pack P -pack-is-recursive P tSet.ml
ocamlopt -o p.cmx -recursive-pack t.cmx tSet.cmx

We have three new compilation options:

  • -recursive indicates to the compiler to typecheck all the given mlis simultaneously, as recursive modules.
  • -pack-is-recursive indicates which pack(s) in the hierarchy are meant to be recursive. This is necessary since it determines how the module must be compiled (i.e if we will need to apply closure conversion).
  • recursive-pack generates a pack that deals with the initialization of its modules, as for recursive modules.

Recursives modules compilation

One may be wondering why we need packs to compile recursive modules. Let's take a look at how they are encoded. We will craft a naive example that is simple enough once compiled:

module rec Even : sig
  val test: int -> bool
end = struct
  let test i =
    if i-1 <= 0 then false else Odd.test (i-1)
end

and Odd : sig
  val test: int -> bool
end = struct
  let test i =
    if i-1 <= 0 then true else Even.test (i-1)
end

It defines two modules Even and Odd, that both test whether an integer is even or odd, and if that is not the case calls the test function from the other module. Not a really interesting use of recursive modules obviously. The compilation schema for recursive modules is the following:

  • First, it allocates empty blocks for each module according to its shape (how many values are bound and what size they need in the block, if the module is a functor and what are its values, etc).
  • Then these blocks are filled with the implementation.

In our case, in a pseudo-code that is a bit higher order than Lambda (the first intermediate language of ocaml) it would translate as:

module Even = <allocation of the shape of even.cmx>
module Odd = <allocation of the shape of odd.cmx>

Even := <struct let test = .. end>
Odd := <struct let test = .. end>

This ensures that every reference to Even in Odd (and vice-versa) are valid pointers. To respect this schema, we will use packs to tie recursive modules together. Without packs, this means we would generate this code when linking the units into an executable which can be tricky. The pack can simply do it as initialization code.

Compiling modules for recursive pack

If we tried to compile these modules naively, we would end up in the same situation than for the functor pack: the compilation units would refer to identifiers that do not exist at the time they are generated. Moreover, the initialization part needs to know the shape of the compilation unit to be able to allocate precisely the block that will contain the recursive module. In order to implement recursive compilation units into packs, we extends the compilation units in two ways:

  • The shape of the unit is computed and stored in the cmo (or cmx).
  • As for functor pack, we apply closure conversion on the free variables that are modules from the same pack or from packs above in the hierarchy as long as they are recursive.

As an example, we will reuse our Even / Odd example above and split it into two units even.ml and odd.ml, and compile them into a recursive pack P. Both have the same shape: a module with a single value. Even refers to a free variable Odd, which is in the same recursive pack, and vice-versa. The result of the closure conversion is a function that will take the pointer resulting from the initialization. Since the module is also recursive itself, it takes its own pointer resulting from its initialization. The result will look as something like:

(* even.cmx *)
module Even_rec (Even: <even.mli><even.mli>)(Odd: <odd.mli><odd.mli>) = ..

(* odd.cmx *)
module Odd_rec (Odd: <odd.mli><odd.mli>)(Even: <even.mli><even.mli>) = ..

(* p.cmx *)
module Even = <allocation of the shape of even.cmx>
module Odd = <allocation of the shape of odd.cmx>

Even := Even_rec(Even)(Odd)
Odd := Odd_rec(Odd)(Even)

Rejunavating packs

Under the hood, these new features come with some refactoring in the pack implementation which follows work done for RFC on the representation of symbols in the middle-end of the compiler. Packs were not really used anymore and were deprecated by module aliases, this work makes them relevant again. These RFCs improve the OCaml ecosystem in multiple ways:

  • Compilation units are now on par with modules, since they can be functors.
  • Functor packs allow developers to implement parameterized libraries, without having to rely on scripts to produce multiple libraries linked with different backends (for example, Cohttp can use Lwt or Async as backend, and provides two libraries, one for each of these).
  • Recursive packs allow the implementation of recursive modules into separate files.

We hope that such improvements will benefit the users and library developers. Having a way to implement parameterize libraries without having to describe big functors by hand, or use mutually recursive compilation units without using scripts to generate a unique ml file will certainly introduce new workflows.



Au sujet d'OCamlPro :

OCamlPro développe des applications à haute valeur ajoutée depuis plus de 10 ans, en utilisant les langages les plus avancés, tels que OCaml et Rust, visant aussi bien rapidité de développement que robustesse, et en ciblant les domaines les plus exigeants (méthodes formelles, cybersécurité, systèmes distribués/blockchain, conception de DSLs). Fort de plus de 20 ingénieurs R&D, avec une expertise unique sur les langages de programmation, aussi bien théorique (plus de 80% de nos ingénieurs ont une thèse en informatique) que pratique (participation active au développement de plusieurs compilateurs open-source, prototypage de la blockchain Tezos, etc.), diversifiée (OCaml, Rust, Cobol, Python, Scilab, C/C++, etc.) et appliquée à de multiples domaines. Nous dispensons également des [formations sur mesure sur OCaml, Rust, et les méthodes formelles] (https://training.ocamlpro.com/) Pour nous contacter : contact@ocamlpro.com.