Rehabilitating Packs using Functors and Recursivity, part 2.

Authors: Pierrick Couderc
Date: 2020-09-30
Category: OCaml
Tags: ocaml



This blog post and the previous one about functor packs covers two RFCs currently developed by OCamlPro and Jane Street. We previously introduced functor packs, a new feature adding the possiblity to compile packs as functors, allowing the user to implement functors as multiple source files or even parameterized libraries.

In this blog post, we will cover the other aspect of the packs rehabilitation: allowing anyone to implement recursive compilation units using packs (as described formally in the RFC#20). Our previous post introduced briefly how packs were compiled and why we needed some bits of closure conversion to effectively implement big functors. Once again, to implement recursive packs we will need to encode modules through this technique, as such we advise the reader to check at least the introduction and the compilation part of functor packs.

Recursive modules through recursive packs

Recursive modules are a feature long available in the compiler, but restricted to modules, not compilation units. As such, it is impossible to write two files that depend on each other, except by using scripts that tie up these modules into a single compilation file. Due to the internal representation of recursive modules, it would be difficult to implement recursive (and mutually recursive) compilation units. However, we could use packs to implement these.

One common example of recursive modules are trees whose nodes are represented by sets. To implement such a data structure with the standard library we need recursive modules: Set is a functor that takes as parameter a module describing the values embedded in the set, but in our case the type needs the already applied functor.

module rec T : sig
  type t =
      Leaf of int
    | Node of TSet.t

  val compare : t -> t -> int
end = struct
  type t =
      Leaf of int
    | Node of TSet.t

  let compare t1 t2 =
    match t1, t2 with
      Leaf v1, Leaf v2 -> Int.compare v1 v2
    | Node s1, Node s2 -> TSet.compare s1 s2
    | Leaf _, Node _ -> -1
    | Node _, Leaf _ -> 1
end

and TSet : Set.S with type elt = T.t = Set.Make(T)

With recursive pack, we can simply put T and TSet into their respective files (t.ml and tSet.ml), and tie them into one module (let's name it P). Signature of recursive modules cannot be infered, as such we also need to define t.mli and tSet.mli. Both must be compiled simultaneously since they refer to each other. The result of the compilation is the following:

ocamlopt -c -for-pack P -recursive t.mli tSet.mli
ocamlopt -c -for-pack P -pack-is-recursive P t.ml
ocamlopt -c -for-pack P -pack-is-recursive P tSet.ml
ocamlopt -o p.cmx -recursive-pack t.cmx tSet.cmx

We have three new compilation options:

Recursives modules compilation

One may be wondering why we need packs to compile recursive modules. Let's take a look at how they are encoded. We will craft a naive example that is simple enough once compiled:

module rec Even : sig
  val test: int -> bool
end = struct
  let test i =
    if i-1 <= 0 then false else Odd.test (i-1)
end

and Odd : sig
  val test: int -> bool
end = struct
  let test i =
    if i-1 <= 0 then true else Even.test (i-1)
end

It defines two modules Even and Odd, that both test whether an integer is even or odd, and if that is not the case calls the test function from the other module. Not a really interesting use of recursive modules obviously. The compilation schema for recursive modules is the following:

In our case, in a pseudo-code that is a bit higher order than Lambda (the first intermediate language of ocaml) it would translate as:

module Even = <allocation of the shape of even.cmx>
module Odd = <allocation of the shape of odd.cmx>

Even := <struct let test = .. end>
Odd := <struct let test = .. end>

This ensures that every reference to Even in Odd (and vice-versa) are valid pointers. To respect this schema, we will use packs to tie recursive modules together. Without packs, this means we would generate this code when linking the units into an executable which can be tricky. The pack can simply do it as initialization code.

Compiling modules for recursive pack

If we tried to compile these modules naively, we would end up in the same situation than for the functor pack: the compilation units would refer to identifiers that do not exist at the time they are generated. Moreover, the initialization part needs to know the shape of the compilation unit to be able to allocate precisely the block that will contain the recursive module. In order to implement recursive compilation units into packs, we extends the compilation units in two ways:

As an example, we will reuse our Even / Odd example above and split it into two units even.ml and odd.ml, and compile them into a recursive pack P. Both have the same shape: a module with a single value. Even refers to a free variable Odd, which is in the same recursive pack, and vice-versa. The result of the closure conversion is a function that will take the pointer resulting from the initialization. Since the module is also recursive itself, it takes its own pointer resulting from its initialization. The result will look as something like:

(* even.cmx *)
module Even_rec (Even: <even.mli><even.mli>)(Odd: <odd.mli><odd.mli>) = ..

(* odd.cmx *)
module Odd_rec (Odd: <odd.mli><odd.mli>)(Even: <even.mli><even.mli>) = ..

(* p.cmx *)
module Even = <allocation of the shape of even.cmx>
module Odd = <allocation of the shape of odd.cmx>

Even := Even_rec(Even)(Odd)
Odd := Odd_rec(Odd)(Even)

Rejunavating packs

Under the hood, these new features come with some refactoring in the pack implementation which follows work done for RFC on the representation of symbols in the middle-end of the compiler. Packs were not really used anymore and were deprecated by module aliases, this work makes them relevant again. These RFCs improve the OCaml ecosystem in multiple ways:

We hope that such improvements will benefit the users and library developers. Having a way to implement parameterize libraries without having to describe big functors by hand, or use mutually recursive compilation units without using scripts to generate a unique ml file will certainly introduce new workflows.


About OCamlPro:

OCamlPro is a R&D lab founded in 2011, with the mission to help industrial users benefit from state-of-the art programming languages like OCaml and Rust.

We design, create and implement custom ad-hoc software for our clients. We also have a long experience in developing and maintaining open-source tooling for OCaml, such as Opam, TryOCaml, ocp-indent, ocp-index and ocp-browser, and we contribute to the core-development of OCaml, notably with our work on the Flambda optimizer branch.

Another area of expertise is that of Formal Methods, with tools such as our SMT Solver Alt-Ergo (check our Alt-Ergo Users'). We also provide vocational trainings in OCaml and Rust, and we can build courses on formal methods on-demand. Do not hesitate to reach out by email: contact@ocamlpro.com.