Rehabilitating Packs using Functors and Recursivity, part 1.
OCamlPro has a long history of dedicated efforts to support the development of the OCaml compiler, through sponsorship or direct contributions from Flambda Team. An important one is the Flambda intermediate representation designed for optimizations, and in the future its next iteration Flambda 2. This work is funded by JaneStreet.
Packs in the OCaml ecosystem are kind of an outdated concept (options -pack
and -for-pack
in the OCaml manual), and their main utility has been overtaken by the introduction of module aliases in OCaml 4.02. What if we tried to redeem them and give them a new youth and utility by adding the possibility to generate functors or recursive packs?
This blog post covers the functor units and functor packs, while the next one will be centered around recursive packs. Both RFCs are currently developed by JaneStreet and OCamlPro. This idea was initially introduced by functor packs (Fabrice Le Fessant) and later generalized by functorized namespaces (Pierrick Couderc et al.).
Packs for the masses
First of all let's take a look at what packs are, and how they fixed some issues that arose when the ecosystem started to grow and the number of libraries got quite large.
One common problem in any programming language is how names are treated and disambiguated. For example, look at this small piece of code:
let x = "something"
let x = "something else"
We declare two variables x
, but actually the first one is shadowed by the second, and is now unavailable for the rest of the program. It is perfectly valid in OCaml. Let's try to do the same thing with modules:
module M = struct end
module M = struct end
The compiler rejects it with the following error:
File "m.ml", line 3, characters 0-21:
3 | module M = struct end
^^^^^^^^^^^^^^^^^^^^^
Error: Multiple definition of the module name M.
Names must be unique in a given structure or signature.
This also applies with programs linking two compilation units of the same name. Imagine you are using two libraries (here lib_a
and lib_b
), that both define a module named Misc
.
ocamlopt -o prog.asm -I lib_a -I lib_b lib_a.cmxa lib_b.cmxa prog.ml
File "prog.ml", line 1:
Error: The files lib_a/a.cmi and lib_b/b.cmi make inconsistent assumptions
over interface Misc
At link time, the compiler will reject your program since you are trying to link two modules with the same name but different implementations. The compiler is unable to differentiate the two compilation units since they define some identical symbols, as such cannot link the program. Enforcing unique module names in the same namespace (i.e. a signature) is consistent with the inability to link two modules of the same name in the same program.
However, Misc
is a common name for a module in any project. How can we avoid that? As a user of the libraries there is nothing you can do, since you cannot rename the modules (you will eventually need to link two files named misc.cmx
). As the developer, you need to ensure that your module names are unique enough to be used along any other libraries. One solution would be to use prefixes for each of your compilation units, for example by naming your files mylib_misc.ml
, with the drawback that you will need to use those long module names inside your library. Another solution is packing your units.
A pack is simply a generated module that appends all your compilation units into one. For example, suppose you have two files a.ml
and b.ml
, you can generate a pack (i.e. a module) mylib.cmx
that is equivalent to:
module A = struct <content of a.ml> end
module B = struct <content of b.ml> end
As such, A
and B
can retain their original module name, and be accessed from the outside as Mylib.A
and Mylib.B
. It uses the namespacing induced by the module system. A developer can simply generate a pack for its library, assuming its library name will be unique enough to be linked with other modules without the risk of name clashing. However it has one big downside: suppose you use a library with many modules but only use one. Without packs the compiler will only link the necessary compilation units from this library, but since the pack is one big compilation unit this means your program embeds the complete library.
This problem is fixed using module aliases and the compiler option -no-alias-deps
since OCaml 4.02, and the result for the user is equivalent to a pack, making them more or less deprecated.
Functorizing packs, or how to parameterize a library
Packs being modules representing libraries, a useful feature would be to be able to produce libraries that take modules as parameters, just like functors. Another usage would be to split a huge functor into multiple files. In other words, we want our pack Mylib
to be compiled as:
functor (P : sig .. end) -> struct
module A = struct <content of a.ml> end
module B = struct <content of b.ml> end
end
while A
and B
would use the parameter P
as a module, and Mylib
instantiated later as
module Mylib = Mylib(Some_module_with_sig_compatible_with_P)
One can notice that our pack is indeed a functor, and not simply a module that binds a functor. To be able to do that, we also extends classical compilation units to be compiled as functors. Such functors are not expressed in the language, we do not provide a syntax for that, they are a matter of options at compile-time. For example:
ocamlopt -c -parameter P m.ml
will compile m.ml
as a functor that has a parameter P
whose interface is described in p.cmi
in the compilation path. Similarly, our pack Mylib
can be produced by the following compilation steps:
ocamlopt -c -parameter-of Mylib p.mli
ocamlopt -c -for-pack "Mylib(P)" a.ml
ocamlopt -c -for-pack "MyLib(P)" b.ml
ocamlopt -pack -o mylib.cmx -parameter P a.cmx b.cmx
In details:
- The parameter is compiled with the flag
-parameter-of Mylib
, as such it won't be used as the interface of an implementation. - The two modules packed are compiled with the flag
-for-pack "MyLib(P)"
. Expressing the parameter of the pack is mandatory sinceP
must be known as a functor parameter (we will see why in the next section). - The pack is compiled with
-parameter P
, which will indeed produce a functorized compilation unit.
Functors are not limited to a unique parameter, as such they can be compiled with multiple -parameter
options and multiple arguments in -for-pack
. This implementation being on the build system side, it does not need to change the syntax of the language. We expect build tools like dune to provide supports for this feature, making it maybe more easier to use. Moreover, it makes compilation units on par with modules which can have a functor type. One downside however is that we cannot express type equalities between two parameters or with the functor body type as we would do with substitutions in module types.
Functor packs under the hood
In terms of implementation, packs should be seen as a concatenation of the compilation units then a rebinding of each of them in the newly created one. For example, a pack P
of two units m.cmx
and n.cmx
is actually compiled as something like:
module P__M = <code of m.cmx>
module P__N = <code of n.cmx>
module M = P__M
module N = P__N
According to this representation, if we tried to naively implement our previous functor pack Mylib(P)
we would end up with a functor looking like this:
module Mylib__A = <code of a.cmx, with references to P>
module Mylib__B = <code of b.cmx, with references to P>
functor (P : <signature of p.cmi>) -> struct
module A = Mylib__A
module B = Mylib__B
end
Unfortunately, this encoding of functor packs is wrong: P
is free in a.cmx
and b.cmx
and its identifier cannot correspond to the one generated for the functor retrospectively. The solution is actually quite simple and relies on a transformation known as closure conversion. In other words we will transform our modules into functors that takes as parameters their free variables, which in our case are the parameters of the functor pack and the dependencies from the same pack. Let's do it on a concrete functor equivalent to Mylib:
module Mylib' (P : P_SIG) = struct
module A = struct .. <references to P> end
module B = struct .. <references to P> <references to A> end
end
Our goal here is to move A
and B
outside of the functor, as such out of the scope of P
, which is done by transforming those two modules into functors that takes a parameter P'
with the same signature as P
:
module A_funct (P' : P_SIG) = struct .. <references to P as P'> end
module B_funct (P' : P_SIG) = struct
module A' = A_funct(P')
..
<references to P as P'>
<references to A as A'>
end
module Mylib' (P : P_SIG) = struct
module A = A_funct(P)
module B = B_funct(P)
end
While this code compiles it is not semantically equivalent. A_funct
is instantiated twice, its side effects are computed twice: the first time when instantiating A
in the functor, and the second when instantiating B
. The solution is simply to go further with closure conversion and make the result of applying A_funct
to P
an argument of B_funct
.
module A_funct (P' : P_SIG) = struct .. <references to P as P'> end
module B_funct (P' : P_SIG)(A': module type of A_funct(P'))= struct
..
<references to P as P'>
<references to A as A'>
end
module Mylib' (P : P_SIG) = struct
module A = A_funct(P)
module B = B_funct(P)(A)
end
This represents exactly how our functor pack Mylib
is encoded. Since we need to compile modules in a specific way if they belong to a functor pack, the compiler has to know in the argument -for-pack
that the pack is a functor, and what are its parameters.
Functor packs applied to ocamlopt
What we described is a functional prototype of functor packs, implemented on OCaml 4.10, as described in RFC#11. In practice, we already have one usage that we could benefit of in the future: cross-compilation of native code. At the moment the compiler is configured to target the architecture which it is compiled on. The modules relative to the current architecture are linked symbolically into the backend folder and the backend is compiled as if it only supported one architecture. One downside of this approach is that changes into the interface of the backend that need some modifications in each architecture are not detected at compile time, but only for the current one. You need to reconfigure the OCaml compiler and rebuild it to check if another architecture still compiles. One interesting property is that each architecture backend defines the same set of modules with compatible interfaces. In other words, these modules could simply be parameters of a functor, that is instantiated for a given architecture.
Following this idea, we implemented a prototype of native compiler whose backend is indeed packed into a functor, and instantiated at the initialization of the compiler. With this approach, we can easily switch the targeted architecture, and moreover we can be sure that each architecture is compiled, leveraging the fact that some necessary refactoring is always done when changes happen in the backend interface. Implementing such a functor is mainly a matter of adapting the build system to produce a functor pack, writing few signatures for the functor and its parameters, and instantiating the backend at the right time.
This proof of concept shows how functor packs can ease some complicated build system and allows new workflow.
Making packs useful again
Packs were an old concept mainly outdated by module aliases. They were not practical as they are some sort of monolithic libraries shaped into a unique module containing sub modules. While they perfectly use the module system for its namespacing properties, their usage enforces the compiler to link an entire library even if only one module is actually used. This improvement allows programmers to define big functors, functors that are split among multiple files, resulting in what we can view as a way to implement some form of parameterized libraries.
In the second part, we will cover another aspect of the rehabilitation of packs: using packs to implement mutually recursive compilation units.
Comments
François Bobot (25 September 2020 at 9 h 16 min):
I believe there is a typo
module Mylib’ (P : P_SIG) = struct
module A = A_funct(P)
module B = A_funct(P)
end
The last must be
B_funct(P)
, the next example as also the same typo.
Pierrick Couderc (25 September 2020 at 10 h 31 min):
Indeed, thank you!
Cyrus Omar (8 February 2021 at 3 h 49 min):
This looks very useful! Any updates on this work? I’d like to be able to use it from dune.
Au sujet d'OCamlPro :
OCamlPro développe des applications à haute valeur ajoutée depuis plus de 10 ans, en utilisant les langages les plus avancés, tels que OCaml et Rust, visant aussi bien rapidité de développement que robustesse, et en ciblant les domaines les plus exigeants (méthodes formelles, cybersécurité, systèmes distribués/blockchain, conception de DSLs). Fort de plus de 20 ingénieurs R&D, avec une expertise unique sur les langages de programmation, aussi bien théorique (plus de 80% de nos ingénieurs ont une thèse en informatique) que pratique (participation active au développement de plusieurs compilateurs open-source, prototypage de la blockchain Tezos, etc.), diversifiée (OCaml, Rust, Cobol, Python, Scilab, C/C++, etc.) et appliquée à de multiples domaines. Nous dispensons également des [formations sur mesure sur OCaml, Rust, et les méthodes formelles] (https://training.ocamlpro.com/) Pour nous contacter : contact@ocamlpro.com.
Articles les plus récents
2024
- opam 2.3.0 release!
- Optimisation de Geneweb, 1er logiciel français de Généalogie depuis près de 30 ans
- Alt-Ergo 2.6 is Out!
- Flambda2 Ep. 3: Speculative Inlining
- opam 2.2.0 release!
- Flambda2 Ep. 2: Loopifying Tail-Recursive Functions
- Fixing and Optimizing the GnuCOBOL Preprocessor
- OCaml Backtraces on Uncaught Exceptions
- Opam 102: Pinning Packages
- Flambda2 Ep. 1: Foundational Design Decisions
- Behind the Scenes of the OCaml Optimising Compiler Flambda2: Introduction and Roadmap
- Lean 4: When Sound Programs become a Choice
- Opam 101: The First Steps
2023
- Maturing Learn-OCaml to version 1.0: Gateway to the OCaml World
- The latest release of Alt-Ergo version 2.5.1 is out, with improved SMT-LIB and bitvector support!
- 2022 at OCamlPro
- Autofonce, GNU Autotests Revisited
- Sub-single-instruction Peano to machine integer conversion
- Statically guaranteeing security properties on Java bytecode: Paper presentation at VMCAI 23
- Release of ocplib-simplex, version 0.5