Rehabilitating Packs using Functors and Recursivity, part 2.
This blog post and the previous one about functor packs covers two RFCs currently developed by OCamlPro and Jane Street. We previously introduced functor packs, a new feature adding the possiblity to compile packs as functors, allowing the user to implement functors as multiple source files or even parameterized libraries.
In this blog post, we will cover the other aspect of the packs rehabilitation: allowing anyone to implement recursive compilation units using packs (as described formally in the RFC#20). Our previous post introduced briefly how packs were compiled and why we needed some bits of closure conversion to effectively implement big functors. Once again, to implement recursive packs we will need to encode modules through this technique, as such we advise the reader to check at least the introduction and the compilation part of functor packs.
Recursive modules through recursive packs
Recursive modules are a feature long available in the compiler, but restricted to modules, not compilation units. As such, it is impossible to write two files that depend on each other, except by using scripts that tie up these modules into a single compilation file. Due to the internal representation of recursive modules, it would be difficult to implement recursive (and mutually recursive) compilation units. However, we could use packs to implement these.
One common example of recursive modules are trees whose nodes are represented by sets. To implement such a data structure with the standard library we need recursive modules: Set
is a functor that takes as parameter a module describing the values embedded in the set, but in our case the type needs the already applied functor.
module rec T : sig
type t =
Leaf of int
| Node of TSet.t
val compare : t -> t -> int
end = struct
type t =
Leaf of int
| Node of TSet.t
let compare t1 t2 =
match t1, t2 with
Leaf v1, Leaf v2 -> Int.compare v1 v2
| Node s1, Node s2 -> TSet.compare s1 s2
| Leaf _, Node _ -> -1
| Node _, Leaf _ -> 1
end
and TSet : Set.S with type elt = T.t = Set.Make(T)
With recursive pack, we can simply put T
and TSet
into their respective files (t.ml
and tSet.ml
), and tie them into one module (let's name it P
). Signature of recursive modules cannot be infered, as such we also need to define t.mli
and tSet.mli
. Both must be compiled simultaneously since they refer to each other. The result of the compilation is the following:
ocamlopt -c -for-pack P -recursive t.mli tSet.mli
ocamlopt -c -for-pack P -pack-is-recursive P t.ml
ocamlopt -c -for-pack P -pack-is-recursive P tSet.ml
ocamlopt -o p.cmx -recursive-pack t.cmx tSet.cmx
We have three new compilation options:
-recursive
indicates to the compiler to typecheck all the givenmli
s simultaneously, as recursive modules.-pack-is-recursive
indicates which pack(s) in the hierarchy are meant to be recursive. This is necessary since it determines how the module must be compiled (i.e if we will need to apply closure conversion).recursive-pack
generates a pack that deals with the initialization of its modules, as for recursive modules.
Recursives modules compilation
One may be wondering why we need packs to compile recursive modules. Let's take a look at how they are encoded. We will craft a naive example that is simple enough once compiled:
module rec Even : sig
val test: int -> bool
end = struct
let test i =
if i-1 <= 0 then false else Odd.test (i-1)
end
and Odd : sig
val test: int -> bool
end = struct
let test i =
if i-1 <= 0 then true else Even.test (i-1)
end
It defines two modules Even
and Odd
, that both test whether an integer is even or odd, and if that is not the case calls the test function from the other module. Not a really interesting use of recursive modules obviously. The compilation schema for recursive modules is the following:
- First, it allocates empty blocks for each module according to its shape (how many values are bound and what size they need in the block, if the module is a functor and what are its values, etc).
- Then these blocks are filled with the implementation.
In our case, in a pseudo-code that is a bit higher order than Lambda (the first intermediate language of ocaml) it would translate as:
module Even = <allocation of the shape of even.cmx>
module Odd = <allocation of the shape of odd.cmx>
Even := <struct let test = .. end>
Odd := <struct let test = .. end>
This ensures that every reference to Even
in Odd
(and vice-versa) are valid pointers. To respect this schema, we will use packs to tie recursive modules together. Without packs, this means we would generate this code when linking the units into an executable which can be tricky. The pack can simply do it as initialization code.
Compiling modules for recursive pack
If we tried to compile these modules naively, we would end up in the same situation than for the functor pack: the compilation units would refer to identifiers that do not exist at the time they are generated. Moreover, the initialization part needs to know the shape of the compilation unit to be able to allocate precisely the block that will contain the recursive module. In order to implement recursive compilation units into packs, we extends the compilation units in two ways:
- The shape of the unit is computed and stored in the
cmo
(orcmx
). - As for functor pack, we apply closure conversion on the free variables that are modules from the same pack or from packs above in the hierarchy as long as they are recursive.
As an example, we will reuse our Even
/ Odd
example above and split it into two units even.ml
and odd.ml
, and compile them into a recursive pack P
. Both have the same shape: a module with a single value. Even
refers to a free variable Odd
, which is in the same recursive pack, and vice-versa. The result of the closure conversion is a function that will take the pointer resulting from the initialization. Since the module is also recursive itself, it takes its own pointer resulting from its initialization. The result will look as something like:
(* even.cmx *)
module Even_rec (Even: <even.mli><even.mli>)(Odd: <odd.mli><odd.mli>) = ..
(* odd.cmx *)
module Odd_rec (Odd: <odd.mli><odd.mli>)(Even: <even.mli><even.mli>) = ..
(* p.cmx *)
module Even = <allocation of the shape of even.cmx>
module Odd = <allocation of the shape of odd.cmx>
Even := Even_rec(Even)(Odd)
Odd := Odd_rec(Odd)(Even)
Rejunavating packs
Under the hood, these new features come with some refactoring in the pack implementation which follows work done for RFC on the representation of symbols in the middle-end of the compiler. Packs were not really used anymore and were deprecated by module aliases, this work makes them relevant again. These RFCs improve the OCaml ecosystem in multiple ways:
- Compilation units are now on par with modules, since they can be functors.
- Functor packs allow developers to implement parameterized libraries, without having to rely on scripts to produce multiple libraries linked with different backends (for example, Cohttp can use Lwt or Async as backend, and provides two libraries, one for each of these).
- Recursive packs allow the implementation of recursive modules into separate files.
We hope that such improvements will benefit the users and library developers. Having a way to implement parameterize libraries without having to describe big functors by hand, or use mutually recursive compilation units without using scripts to generate a unique ml
file will certainly introduce new workflows.
Au sujet d'OCamlPro :
OCamlPro développe des applications à haute valeur ajoutée depuis plus de 10 ans, en utilisant les langages les plus avancés, tels que OCaml et Rust, visant aussi bien rapidité de développement que robustesse, et en ciblant les domaines les plus exigeants (méthodes formelles, cybersécurité, systèmes distribués/blockchain, conception de DSLs). Fort de plus de 20 ingénieurs R&D, avec une expertise unique sur les langages de programmation, aussi bien théorique (plus de 80% de nos ingénieurs ont une thèse en informatique) que pratique (participation active au développement de plusieurs compilateurs open-source, prototypage de la blockchain Tezos, etc.), diversifiée (OCaml, Rust, Cobol, Python, Scilab, C/C++, etc.) et appliquée à de multiples domaines. Nous dispensons également des [formations sur mesure sur OCaml, Rust, et les méthodes formelles] (https://training.ocamlpro.com/) Pour nous contacter : contact@ocamlpro.com.
Articles les plus récents
2024
- opam 2.3.0 release!
- Optimisation de Geneweb, 1er logiciel français de Généalogie depuis près de 30 ans
- Alt-Ergo 2.6 is Out!
- Flambda2 Ep. 3: Speculative Inlining
- opam 2.2.0 release!
- Flambda2 Ep. 2: Loopifying Tail-Recursive Functions
- Fixing and Optimizing the GnuCOBOL Preprocessor
- OCaml Backtraces on Uncaught Exceptions
- Opam 102: Pinning Packages
- Flambda2 Ep. 1: Foundational Design Decisions
- Behind the Scenes of the OCaml Optimising Compiler Flambda2: Introduction and Roadmap
- Lean 4: When Sound Programs become a Choice
- Opam 101: The First Steps
2023
- Maturing Learn-OCaml to version 1.0: Gateway to the OCaml World
- The latest release of Alt-Ergo version 2.5.1 is out, with improved SMT-LIB and bitvector support!
- 2022 at OCamlPro
- Autofonce, GNU Autotests Revisited
- Sub-single-instruction Peano to machine integer conversion
- Statically guaranteeing security properties on Java bytecode: Paper presentation at VMCAI 23
- Release of ocplib-simplex, version 0.5