Rehabilitating Packs using Functors and Recursivity, part 1.

Auteurs: Pierrick Couderc
Date: 2020-09-24
Catégorie: OCaml
Tags: ocaml



OCamlPro has a long history of dedicated efforts to support the development of the OCaml compiler, through sponsorship or direct contributions from Flambda Team. An important one is the Flambda intermediate representation designed for optimizations, and in the future its next iteration Flambda 2. This work is funded by JaneStreet.

Packs in the OCaml ecosystem are kind of an outdated concept (options -pack and -for-pack in the OCaml manual), and their main utility has been overtaken by the introduction of module aliases in OCaml 4.02. What if we tried to redeem them and give them a new youth and utility by adding the possibility to generate functors or recursive packs?

This blog post covers the functor units and functor packs, while the next one will be centered around recursive packs. Both RFCs are currently developed by JaneStreet and OCamlPro. This idea was initially introduced by functor packs (Fabrice Le Fessant) and later generalized by functorized namespaces (Pierrick Couderc et al.).

Packs for the masses

First of all let's take a look at what packs are, and how they fixed some issues that arose when the ecosystem started to grow and the number of libraries got quite large.

One common problem in any programming language is how names are treated and disambiguated. For example, look at this small piece of code:

let x = "something"

let x = "something else"

We declare two variables x, but actually the first one is shadowed by the second, and is now unavailable for the rest of the program. It is perfectly valid in OCaml. Let's try to do the same thing with modules:

module M = struct end

module M = struct end

The compiler rejects it with the following error:

File "m.ml", line 3, characters 0-21:
3 | module M = struct end
    ^^^^^^^^^^^^^^^^^^^^^
Error: Multiple definition of the module name M.
       Names must be unique in a given structure or signature.

This also applies with programs linking two compilation units of the same name. Imagine you are using two libraries (here lib_a and lib_b), that both define a module named Misc.

ocamlopt -o prog.asm -I lib_a -I lib_b lib_a.cmxa lib_b.cmxa prog.ml 
File "prog.ml", line 1:
Error: The files lib_a/a.cmi and lib_b/b.cmi make inconsistent assumptions
over interface Misc

At link time, the compiler will reject your program since you are trying to link two modules with the same name but different implementations. The compiler is unable to differentiate the two compilation units since they define some identical symbols, as such cannot link the program. Enforcing unique module names in the same namespace (i.e. a signature) is consistent with the inability to link two modules of the same name in the same program.

However, Misc is a common name for a module in any project. How can we avoid that? As a user of the libraries there is nothing you can do, since you cannot rename the modules (you will eventually need to link two files named misc.cmx). As the developer, you need to ensure that your module names are unique enough to be used along any other libraries. One solution would be to use prefixes for each of your compilation units, for example by naming your files mylib_misc.ml, with the drawback that you will need to use those long module names inside your library. Another solution is packing your units.

A pack is simply a generated module that appends all your compilation units into one. For example, suppose you have two files a.ml and b.ml, you can generate a pack (i.e. a module) mylib.cmx that is equivalent to:

module A = struct <content of a.ml> end
module B = struct <content of b.ml> end

As such, A and B can retain their original module name, and be accessed from the outside as Mylib.A and Mylib.B. It uses the namespacing induced by the module system. A developer can simply generate a pack for its library, assuming its library name will be unique enough to be linked with other modules without the risk of name clashing. However it has one big downside: suppose you use a library with many modules but only use one. Without packs the compiler will only link the necessary compilation units from this library, but since the pack is one big compilation unit this means your program embeds the complete library.

This problem is fixed using module aliases and the compiler option -no-alias-deps since OCaml 4.02, and the result for the user is equivalent to a pack, making them more or less deprecated.

Functorizing packs, or how to parameterize a library

Packs being modules representing libraries, a useful feature would be to be able to produce libraries that take modules as parameters, just like functors. Another usage would be to split a huge functor into multiple files. In other words, we want our pack Mylib to be compiled as:

functor (P : sig .. end) -> struct 
  module A = struct <content of a.ml> end
  module B = struct <content of b.ml> end
end

while A and B would use the parameter P as a module, and Mylib instantiated later as

module Mylib = Mylib(Some_module_with_sig_compatible_with_P)

One can notice that our pack is indeed a functor, and not simply a module that binds a functor. To be able to do that, we also extends classical compilation units to be compiled as functors. Such functors are not expressed in the language, we do not provide a syntax for that, they are a matter of options at compile-time. For example:

ocamlopt -c -parameter P m.ml

will compile m.ml as a functor that has a parameter P whose interface is described in p.cmi in the compilation path. Similarly, our pack Mylib can be produced by the following compilation steps:

ocamlopt -c -parameter-of Mylib p.mli
ocamlopt -c -for-pack "Mylib(P)" a.ml
ocamlopt -c -for-pack "MyLib(P)" b.ml
ocamlopt -pack -o mylib.cmx -parameter P a.cmx b.cmx

In details:

  • The parameter is compiled with the flag -parameter-of Mylib, as such it won't be used as the interface of an implementation.
  • The two modules packed are compiled with the flag -for-pack "MyLib(P)". Expressing the parameter of the pack is mandatory since P must be known as a functor parameter (we will see why in the next section).
  • The pack is compiled with -parameter P, which will indeed produce a functorized compilation unit.

Functors are not limited to a unique parameter, as such they can be compiled with multiple -parameter options and multiple arguments in -for-pack. This implementation being on the build system side, it does not need to change the syntax of the language. We expect build tools like dune to provide supports for this feature, making it maybe more easier to use. Moreover, it makes compilation units on par with modules which can have a functor type. One downside however is that we cannot express type equalities between two parameters or with the functor body type as we would do with substitutions in module types.

Functor packs under the hood

In terms of implementation, packs should be seen as a concatenation of the compilation units then a rebinding of each of them in the newly created one. For example, a pack P of two units m.cmx and n.cmx is actually compiled as something like:

module P__M = <code of m.cmx> 
module P__N = <code of n.cmx> 
module M = P__M 
module N = P__N

According to this representation, if we tried to naively implement our previous functor pack Mylib(P) we would end up with a functor looking like this:

module Mylib__A = <code of a.cmx, with references to P>
module Mylib__B = <code of b.cmx, with references to P>

functor (P : <signature of p.cmi>) -> struct
  module A = Mylib__A
  module B = Mylib__B
end

Unfortunately, this encoding of functor packs is wrong: P is free in a.cmx and b.cmx and its identifier cannot correspond to the one generated for the functor retrospectively. The solution is actually quite simple and relies on a transformation known as closure conversion. In other words we will transform our modules into functors that takes as parameters their free variables, which in our case are the parameters of the functor pack and the dependencies from the same pack. Let's do it on a concrete functor equivalent to Mylib:

module Mylib' (P : P_SIG) = struct
  module A = struct .. <references to P> end
  module B = struct .. <references to P> <references to A> end
end

Our goal here is to move A and B outside of the functor, as such out of the scope of P, which is done by transforming those two modules into functors that takes a parameter P' with the same signature as P:

module A_funct (P' : P_SIG) = struct .. <references to P as P'> end 
module B_funct (P' : P_SIG) = struct 
  module A' = A_funct(P') 
  .. 
  <references to P as P'> 
  <references to A as A'> 
end 

module Mylib' (P : P_SIG) = struct 
  module A = A_funct(P) 
  module B = B_funct(P) 
end

While this code compiles it is not semantically equivalent. A_funct is instantiated twice, its side effects are computed twice: the first time when instantiating A in the functor, and the second when instantiating B. The solution is simply to go further with closure conversion and make the result of applying A_funct to P an argument of B_funct.

module A_funct (P' : P_SIG) = struct .. <references to P as P'> end
module B_funct (P' : P_SIG)(A': module type of A_funct(P'))= struct
  ..
  <references to P as P'>
  <references to A as A'>
end

module Mylib' (P : P_SIG) = struct
  module A = A_funct(P)
  module B = B_funct(P)(A)
end

This represents exactly how our functor pack Mylib is encoded. Since we need to compile modules in a specific way if they belong to a functor pack, the compiler has to know in the argument -for-pack that the pack is a functor, and what are its parameters.

Functor packs applied to ocamlopt

What we described is a functional prototype of functor packs, implemented on OCaml 4.10, as described in RFC#11. In practice, we already have one usage that we could benefit of in the future: cross-compilation of native code. At the moment the compiler is configured to target the architecture which it is compiled on. The modules relative to the current architecture are linked symbolically into the backend folder and the backend is compiled as if it only supported one architecture. One downside of this approach is that changes into the interface of the backend that need some modifications in each architecture are not detected at compile time, but only for the current one. You need to reconfigure the OCaml compiler and rebuild it to check if another architecture still compiles. One interesting property is that each architecture backend defines the same set of modules with compatible interfaces. In other words, these modules could simply be parameters of a functor, that is instantiated for a given architecture.

Following this idea, we implemented a prototype of native compiler whose backend is indeed packed into a functor, and instantiated at the initialization of the compiler. With this approach, we can easily switch the targeted architecture, and moreover we can be sure that each architecture is compiled, leveraging the fact that some necessary refactoring is always done when changes happen in the backend interface. Implementing such a functor is mainly a matter of adapting the build system to produce a functor pack, writing few signatures for the functor and its parameters, and instantiating the backend at the right time.

This proof of concept shows how functor packs can ease some complicated build system and allows new workflow.

Making packs useful again

Packs were an old concept mainly outdated by module aliases. They were not practical as they are some sort of monolithic libraries shaped into a unique module containing sub modules. While they perfectly use the module system for its namespacing properties, their usage enforces the compiler to link an entire library even if only one module is actually used. This improvement allows programmers to define big functors, functors that are split among multiple files, resulting in what we can view as a way to implement some form of parameterized libraries.

In the second part, we will cover another aspect of the rehabilitation of packs: using packs to implement mutually recursive compilation units.

Comments

François Bobot (25 September 2020 at 9 h 16 min):

I believe there is a typo

module Mylib’ (P : P_SIG) = struct
module A = A_funct(P)
module B = A_funct(P)
end

The last must be B_funct(P), the next example as also the same typo.

Pierrick Couderc (25 September 2020 at 10 h 31 min):

Indeed, thank you!

Cyrus Omar (8 February 2021 at 3 h 49 min):

This looks very useful! Any updates on this work? I’d like to be able to use it from dune.



Au sujet d'OCamlPro :

OCamlPro développe des applications à haute valeur ajoutée depuis plus de 10 ans, en utilisant les langages les plus avancés, tels que OCaml et Rust, visant aussi bien rapidité de développement que robustesse, et en ciblant les domaines les plus exigeants (méthodes formelles, cybersécurité, systèmes distribués/blockchain, conception de DSLs). Fort de plus de 20 ingénieurs R&D, avec une expertise unique sur les langages de programmation, aussi bien théorique (plus de 80% de nos ingénieurs ont une thèse en informatique) que pratique (participation active au développement de plusieurs compilateurs open-source, prototypage de la blockchain Tezos, etc.), diversifiée (OCaml, Rust, Cobol, Python, Scilab, C/C++, etc.) et appliquée à de multiples domaines. Nous dispensons également des [formations sur mesure sur OCaml, Rust, et les méthodes formelles] (https://training.ocamlpro.com/) Pour nous contacter : contact@ocamlpro.com.