Flambda2 Ep. 4: How to write a purely functional compiler
Welcome to a new episode of The Flambda2 Snippets!
Today, we will cover key high-level aspects of the algorithm of Flambda2. We will do our best to explain the fundamental design decisions pertaining to the architecture of the compiler. We will touch on how we managed to make a purely functional optimising compiler (leveraging tail-recursion, backtracking, and non-linear traversal) by covering how the code is traversed, what actions this design facilitates, and more!
All feedback is welcome, thank you for staying tuned and happy reading!
The F2S blog posts aim at gradually introducing the world to the inner-workings of a complex piece of software engineering: the Flambda2 Optimising Compiler for OCaml, a technical marvel born from a 10-year-long effort in Research & Development and Compilation; with many more years of expertise in all aspects of Computer Science and Formal Methods.
Expression traversal
Here's a code snippet we would like to be able to optimise and that demonstrates a set of properties that we want our code optimiser to have.
(* original code *)
let bar x =
  let d = x + x in
  let y = x, d in
  y

let foo z =
  let x, d = bar z in
  if x = z then x + 1 else d
We will optimise this code to:
let foo z =
  z + 1
And we will do that in a single pass that is both efficient and maintainable.
Here are the key transformations we would like to apply to this code block:
- The first element of the pair returned by the `bar` function is an alias to the `z` argument of `foo`, thus the `if` condition in `foo` always evaluates to `true`.
- That means that the `d` variable is never used, since the `else` branch in `foo` is never executed.
That being said, to discover the aliased values `x` and `z`, we have to follow the `z` variable from `foo` to `bar` and back again. And to discover that the `let d = x + x` is unused in `bar`, we have to know about the alias and then go back from the `d` used in `foo` to the `let` in `bar`. The point is, there is a complex order of dependencies between these properties that we have to follow in order to learn about the code.
Keep in mind that we aim for our compiler to remain reasonably fast. In order to do that, we conduct all code transformations at the same time as the analysis. This entails that we cannot just plug a constraint solver inside of `Flambda2` in order to discover these properties.
You have to understand that there are two kinds of properties that we want to track.
One of them, like discovering that the `if` condition always evaluates to `true`, flows in the order of evaluation, i.e. top-down. The others, like finding dead code such as `let d = x + x`, and thus eliminating it, can only be discovered in the reverse order of evaluation, i.e. bottom-up.
Interesting detail: properties of the first category sometimes help discover properties from the second, as in this specific example, but never the other way round.
And now, we will explain how we have designed `Flambda2` to be able to operate within these constraints while transforming the code at the same time.
(*
  CPS-converted version.
  Same code as before in the FL2 IR.
  All variables with names starting with `k` are continuations.
*)
let bar x k_ret =
  let d = x + x in
  apply_cont k_ret x d

let foo z k_ret =
  let_cont k x d =
    let r = x + 1 in
    if x = z then
      apply_cont k_ret r
    else
      apply_cont k_ret d
  in
  apply bar z k
If you recall our very first F2S snippet, we mentioned one of the fundamental design decisions of `Flambda2`, which consists in representing programs using CPS. One of the main reasons for that is that inlining becomes very simple.
But there's a catch…
If you refer back to the original version of the CPS-converted code block above, you will see that `if x = z then x + 1 else d` is inside the scope of the `bar` function call. It's no longer the case once the function has been converted to CPS. This shrinking of the scope is inherent to the CPS representation. In an expression language, value analysis can simply be written as a recursive function on expressions, propagating properties through an environment. That is how the simplification pass was written on our previous IR, `Flambda1`. It did produce some imprecisions here and there, but the trade-off in code simplicity favoured this route rather than the one we have taken with `Flambda2` today.
In direct-style language representations, traversals in the order of evaluation may be roughly emulated by simply traversing the tree recursively. On the other hand, in CPS-style language representations, this doesn't hold.
That's the catch: analysing CPS code entails more complex algorithms.
Overview of the traversal
Reasoning about code requires a specific kind of data structure. This data structure must behave like a kind of database of properties of expressions: we naturally attach a name to each expression, and the data structure itself keeps track of the properties related to them. This data structure will be named `acc` in the following code blocks (short for accumulator).
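To make this concrete, here is a minimal sketch, in plain OCaml, of what such an accumulator could look like. The type, names, and properties below are invented for illustration and are far simpler than the real Flambda2 data structure.

```ocaml
(* A hypothetical accumulator: properties indexed by variable name. *)
module Acc = Map.Make (String)

type property =
  | Known_constant of int   (* the variable is a known integer *)
  | Alias of string         (* the variable is an alias of another one *)

(* Record a property discovered while analysing an expression. *)
let learn var prop acc = Acc.add var prop acc

(* Look a property up, following alias chains. *)
let rec resolve var acc =
  match Acc.find_opt var acc with
  | Some (Alias v) -> resolve v acc
  | other -> other

let () =
  let acc =
    Acc.empty
    |> learn "x" (Known_constant 1)
    |> learn "y" (Alias "x")
  in
  (* Resolving [y] follows the alias back to the constant bound to [x]. *)
  assert (resolve "y" acc = Some (Known_constant 1))
```

Following alias chains at lookup time, as above, is one simple way to make facts about one variable available through all of its aliases.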
A design decision we made early was that we wanted to traverse the code only once while doing the maximum amount of simplifications. Of course, there are exceptions to this rule but that’s a topic for another time.
Experience gained from designing `Flambda1` guided this decision. In practice, this overarching traversal manifests itself as two distinct passes: one downwards and one upwards. The downwards pass performs static analysis and inlining, while the upwards one handles code reconstruction and dead code elimination. We call this whole process "Simplify".
Downward traversal
As mentioned in F2S1, the FL2 AST is simple and represented with only 6 different cases. You can find it again below:
type expr =
  | Let_val of { var; prim; body : expr }
  | Let_cont of { k; params; handler : expr; body : expr }
  | Apply_cont of { k; args }
  | Apply_val of { f; k_return; args }
  | Switch of { arg; cases }
  | Invalid
We are going to cover each of these cases separately and explain how each behave and their role in how they help us reason about the code. Then, once all that is clear, we will explain how we traverse each constructor. This should help you understand what information we accumulate during both passes, and what exactly we can do with them.
Let_val
type expr =
  | Let_val of {
      var : variable;
      prim : named;
      body : expr;
    }
  […]
Overview and semantics:
The `Let_val` constructor evaluates a `named` primitive, binds it to a `variable` inside the `body`, and then evaluates that `body`. A `named` primitive is a single atomic operation applied to some `variable`s. Primitives have no impact on control flow; for instance, they cannot raise exceptions.
Traversal algorithm:
This is the easy case: we just follow the evaluation order above. We analyse the `named` primitive, extend the `acc` data structure with the discovered properties, and proceed with analysing the body using the new `acc`.
The most important thing about this process on the way down is this specific extension of the `acc` data structure. Most other constructors will pipe the `acc` smartly all along the computation rather than extending it.
Additional details:
One interesting thing to note: we can discover properties on the arguments of the primitive, and not only on the bound `variable`. For example, the primitive that reads the field of a value allows us to discover that the argument is a block where that field exists in the current `acc`.
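As an illustration, here is a hedged sketch of this downward step on a toy `Let_val`-only language. The types and function names are ours, not the actual Flambda2 code: the "properties" are reduced to known integer values.

```ocaml
(* A toy subset of the IR: just [Let_val] over primitives and variables. *)
type prim = Const of int | Add of string * string

type expr =
  | Let_val of string * prim * expr
  | Var of string

module Acc = Map.Make (String)

(* What a primitive teaches us, given what we already know. *)
let analyse_prim acc = function
  | Const n -> Some n
  | Add (a, b) ->
    (match Acc.find_opt a acc, Acc.find_opt b acc with
     | Some x, Some y -> Some (x + y)
     | _ -> None)

(* Downward step on [Let_val]: analyse the primitive, extend [acc] with
   the discovered property, then proceed into the body with the new [acc]. *)
let rec traverse acc = function
  | Let_val (var, prim, body) ->
    let acc =
      match analyse_prim acc prim with
      | Some n -> Acc.add var n acc
      | None -> acc
    in
    traverse acc body
  | Var v -> Acc.find_opt v acc

let () =
  (* let x = 1 in let d = x + x in d *)
  let e = Let_val ("x", Const 1, Let_val ("d", Add ("x", "x"), Var "d")) in
  assert (traverse Acc.empty e = Some 2)
```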
Let_cont
type expr =
  | Let_cont of {
      body : expr;
      k : continuation;
      params : variable list;
      handler : expr;
    }
  […]
Overview and semantics:
The `Let_cont` constructor evaluates `body`. `body` is allowed to refer to the `k` continuation, and when encountering an application of `k`, the control flow will evaluate `handler` after binding the arguments of the given application to `params`.
Traversal algorithm:
The first thing to note is that there might be several `apply_cont`s to `k` inside the `body`, and since we want to analyse the `handler` only once, we cannot just follow the evaluation order naively like in the `Let_val` case.
Therefore, we first analyse the `body` and collect all the data about the applications of `k` (see the `Apply_cont` case below).
Once we have that, we can analyse and deduce the properties that we can know about the arguments given to `k`. We can then bind these properties to the corresponding parameters and then analyse the `handler` itself.
Quick rundown:
Let's consider the following code snippet.
let foo_d b k_ret =
  let_cont k x =
    let y = (x <= 1) in
    apply_cont k_ret y
  in
  if b then apply_cont k 0
  else apply_cont k 1
When we analyse the `let_cont`, we first analyse the body and see the conditional on `b`. We'll see the two `apply_cont`s to `k` and we'll be able to deduce that the argument given to `k` is either `0` or `1`. With that knowledge, we can analyse the `handler` of `k` and deduce that `y` is always `true`.
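The joining of information from the two `apply_cont` sites can be sketched as follows. We approximate the value of the parameter as a set of possible integers, a deliberately simplified stand-in for the real abstract domain used by Flambda2.

```ocaml
(* Joining what we know at every [apply_cont k] site before analysing
   the handler of [k]. Names and the abstract domain are illustrative. *)
module Int_set = Set.Make (Int)

(* Approximations recorded for the argument of [k] at each application
   site of the [foo_d] example: [apply_cont k 0] and [apply_cont k 1]. *)
let uses_of_k = [ Int_set.singleton 0; Int_set.singleton 1 ]

(* The parameter of the handler gets the join (here, set union) of the
   approximations from all uses. *)
let param_approx = List.fold_left Int_set.union Int_set.empty uses_of_k

(* Inside the handler [let y = (x <= 1)], we can now decide [y] precisely:
   every possible value of [x] satisfies [x <= 1]. *)
let y_always_true = Int_set.for_all (fun x -> x <= 1) param_approx

let () = assert y_always_true
```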
- Side note: so far, we've only considered the case where `let_cont`s are not recursive. We also allow `let_cont` to be recursive, namely to represent the control flow of loops, which means that the `handler` can contain `apply_cont k`. Since we won't be able to see all the `apply_cont`s before analysing the `handler`, we will have to stay conservative by over-approximating the properties we know about the parameters.
Apply_cont
type expr =
  | Apply_cont of {
      k : continuation;
      args : variable list;
    }
  […]
Overview and semantics:
As described in `Let_cont`, this only transfers the control to the `handler` associated to `k`, using the `args` to populate the values of the parameters of `k`.
Traversal algorithm:
In this constructor, we extend the `acc` by associating the current context to `k`. This will be retrieved later (see the `Let_cont` case) to know which contexts led to this continuation, and thus set up a context for the handler.
Furthermore, `Apply_cont` has no underlying field of type `expr`, so it is a leaf of the ongoing traversal. Assuming that there was a `Let_cont` earlier, the traversal will forward the `acc` to the last `Let_cont` encountered and proceed from there again as explained above.
If there is no remaining `Let_cont`, then it means that the analysis of the function is over and that we've traversed all the live code.
See the `Let_cont` example.
Apply_val
type expr =
  | Apply_val of {
      f : variable;
      args : variable list;
      k_return : continuation;
    }
  […]
[…]
Overview and semantics:
`Apply_val` is the usual function application. `f` is interpreted as a functional value, so control flow jumps to the associated code, binding `args` to the function parameters. Since this is CPS, when the function returns, the control is transferred to `k_return`; as with an `Apply_cont`, its return value is bound to the parameter of `k_return`.
It closely resembles something like:
let x = f args in
apply_cont k_return x
But since we don't allow normal function applications inside of a `Let_val`, we have an `Apply_val` constructor to handle it.
Traversal algorithm:
The first thing we do is recover the known properties about `f` from our `acc`. Depending on what properties we have discovered so far, we decide whether to inline `f` or not:
If we choose not to inline `f`, we handle this `Apply_val` as another `Apply_cont` to `k_return`; but if we do decide to inline it, we replace the current `Apply_val` with the body of the `f` function and continue the traversal from there.
The properties that matter for the inlining decision include:
- Do we know the actual function called? (As shown above, `f` is a variable, and we may or may not know which function it refers to.)
- Are there any user annotations, either on the definition of `f`, such as `[@inline]`, or at the application, with `[@inlined]`?
- The size of `f` is important too, because inlining large functions may be detrimental.
- The values of the `args` matter because, for instance, when we know nothing about the arguments, inlining `f` is less likely to be beneficial.
- In some cases, this is where we try Speculative Inlining.
Some static analyses require a whole view of the program, or at least of the current function. So when the downwards pass has traversed the whole term, we trigger a few analyses that we could not do on-the-fly, such as properties that involve loops. Such properties can't be computed during a single pass; they usually require a fixpoint. Once that is done, we use the result of the downward pass to initialise the upward environment (`uenv`).
Upward traversal
You will be happy to learn that the upward traversal is much easier to break down than the downward one! 🎉
Upward environment
Since we have all the data accumulated on our way down at our disposal, we only have a few more properties to track on our way back up. As said previously, we gathered the properties inside an accumulator while following the evaluation order, on our way down. On our way up, we will feed something more akin to an environment.
(* example of a rebuilding step function *)
val rebuild_let : var -> prim -> args:var list -> body:(term * uenv) -> term * uenv
This upward environment (`uenv`) will mainly hold data about:
- free (live) variables, which are variables that are used in the subterms of the term being traversed;
- relevant information to aid the Speculative Inlining heuristic, which includes the size of the term and the optimisation benefits, for instance the number of operations eliminated during both traversals.
These properties are inherently structural. Thus, tracking them is easily done while traversing the tree in the structural order.
Furthermore, these are properties of the rebuilt version of the term, not the original one:
- Since some optimisations can remove variable uses (making such variables potentially useless), we are required to work with the rebuilt version of the term;
- Obviously, optimisation benefits can only be computed after actually performing the optimisations.
That is why we could not have tracked them on the way down, thus relegating them to the way back up. Hence, we have designed the upward pass to follow the structural order to track that effortlessly.
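A possible shape for such an upward environment, with invented field names, could be the following; the real `uenv` tracks considerably more.

```ocaml
(* Hypothetical shape of the upward environment: free variables of the
   rebuilt subterm, plus cost/benefit counters for inlining heuristics. *)
module Var_set = Set.Make (String)

type uenv = {
  free_vars : Var_set.t;   (* variables live in the rebuilt subterm *)
  size : int;              (* size of the rebuilt term *)
  removed_ops : int;       (* operations eliminated so far (benefit) *)
}

let empty_uenv = { free_vars = Var_set.empty; size = 0; removed_ops = 0 }

(* Rebuilding steps would thread a [uenv] upwards, e.g. marking a use: *)
let add_use v uenv = { uenv with free_vars = Var_set.add v uenv.free_vars }

let () = assert (Var_set.mem "x" (add_use "x" empty_uenv).free_vars)
```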
Dead code elimination
Free variables are useful for dead code elimination.
Dead code is code which can be removed from the term without altering its semantics.
There are two kinds of dead code:
- pure expressions whose result is never used;
- code sections that are never reached.
The first one can be detected by looking for variables which are never mentioned outside of their definition. The second is the same, but relative to continuation names.
In order to understand how it is done, let's see how we do it for a simple `let` binding, and then we'll see how it is done through continuations with `let_cont`.
Let-bindings: when rebuilding the `let`, we have the rebuilt body and the set of free variables of the rebuilt body; if the variable bound by the `let` is not part of those free variables, we delete that `let`.
Rundown:
let x = 1 in
(* Step 2 *)
let z = 0 in
(* Step 1 *)
let y = x + 1 in
(* Step 0 *)
42 + z
The rebuilding order follows the `Step` annotations of the example from 0 to 2.
- Step 0: the free-variable set of the body of the `let y` is `{ z }`. `y` is not present in that set so we don't rebuild the `let`. Had we rebuilt it, `x` would have been part of the free-variable set. So we just keep `{ z }` as the set of free variables.
- Step 1: we rebuild the `let z` and remove `z` from the free-variable set because it is now bound.
- Step 2: and now we continue onto the `let x`, which we also remove.
We can observe that this method can remove all useless `let`s in a single traversal.
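A minimal sketch of this rebuilding step, assuming the bound primitive is pure so the binding can be dropped safely; types and names are simplified stand-ins for the real ones.

```ocaml
(* Toy term language, enough to demonstrate dropping a dead [let]. *)
module Var_set = Set.Make (String)

type expr =
  | Let_val of string * expr * expr
  | Var of string
  | Const of int

let rec free_vars = function
  | Var v -> Var_set.singleton v
  | Const _ -> Var_set.empty
  | Let_val (v, def, body) ->
    Var_set.union (free_vars def) (Var_set.remove v (free_vars body))

(* Rebuild a [let]: keep it only if its variable is free in the rebuilt
   body; otherwise return the body unchanged (the definition is pure). *)
let rebuild_let var def (body, fv) =
  if Var_set.mem var fv then
    (Let_val (var, def, body),
     Var_set.union (free_vars def) (Var_set.remove var fv))
  else
    (body, fv)

let () =
  (* [let x = 1 in 42]: the binding is dead and disappears. *)
  let body = Const 42 in
  let rebuilt, fv = rebuild_let "x" (Const 1) (body, free_vars body) in
  assert (rebuilt = Const 42 && Var_set.is_empty fv)
```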
Let's see now how we can extend this method to rebuilding `let_cont`s while still maintaining this property.
Let_conts: as for `let_cont`, we want to be able to remove the unused parameters of the continuation. We can see that a parameter is unused after having analysed the continuation handler: when the parameter is absent from the set of free variables of the handler.
Furthermore, we need to change the `apply_cont`s of the continuation from which we remove parameters: we need to only pass arguments for live parameters. That entails going through the body of the continuation after traversing its handler, because it's inside the body that the `apply_cont`s to that continuation appear (we are going to put aside recursive continuations for the sake of simplicity).
And so, we keep track of which of these continuations' parameters we removed in order to rebuild the `apply_cont`s.
Rundown:
(* Step 0 *)
let_cont k0 z =
  (* Step 1 (k0 handler) *)
  return 42
in
(* Step 2 (k0 body) *)
let_cont k1 y =
  (* Step 4 (k1 handler) *)
  let z1 = y + 1 in
  (* Step 3 (k1 handler) *)
  apply_cont k0 z1
in
(* Step 5 (k1 body) *)
apply_cont k1 420
The rebuilding order follows the `Step` annotations of the example from 0 to 5, though we will only mention the relevant steps.
- Step 1: parameter `z` of `k0` is dead, so we get rid of it.
- Step 3: so now, we have to update the `apply_cont k0 z1`, which in turn becomes `apply_cont k0` (the argument disappears).
- Step 4: since `z1` is no longer used, its `let` is then removed, and `y` becomes useless and is, in turn, eliminated.
- Step 5: eventually, we can replace the `apply_cont k1 420` with `apply_cont k1`, because the `y` parameter was previously eradicated.
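The bookkeeping for removed parameters can be sketched as follows; the map and function names are illustrative only, not the real Flambda2 API.

```ocaml
(* Remembering, per continuation, which parameter positions were kept,
   so that [apply_cont] sites can drop the matching arguments. *)
module Cont_map = Map.Make (String)

(* For [k0] above, the single parameter [z] was removed: nothing is kept. *)
let kept : int list Cont_map.t = Cont_map.singleton "k0" []

let rebuild_apply_cont k args =
  match Cont_map.find_opt k kept with
  | None -> args   (* no parameter of [k] was removed *)
  | Some positions -> List.filteri (fun i _ -> List.mem i positions) args

let () =
  (* [apply_cont k0 z1] becomes [apply_cont k0]: the argument disappears. *)
  assert (rebuild_apply_cont "k0" [ "z1" ] = [])
```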
It is this traversal order which allows us to conduct simplifications on CPS in one go and perform dead code elimination on the upwards traversal.
Conclusion
In this episode of The Flambda2 Snippets, we have explored how the Flambda2 Optimising Compiler performs upwards and downwards traversals to analyse and transform OCaml code efficiently. By structuring our passes in this way, we ensure that static analysis and optimisations are performed in a single traversal while maintaining precision and efficiency.
The downward traversal enables us to propagate information about variables, functions, and continuations, allowing for effective inlining and simplification. Meanwhile, the upward traversal facilitates optimisations such as dead code elimination by identifying and removing unnecessary expressions in a structured and efficient manner.
Through these mechanisms, `Flambda2` is able to navigate the complexities introduced by CPS conversion while still achieving significant performance gains. Understanding these traversal strategies is key to grasping the power behind `Flambda2`'s approach to optimisation and why it stands as a robust solution for compiling OCaml code.
Thank you all for reading! We hope these articles keep the community eager to dive even deeper with us into OCaml compilation. Until next time, mind the stalactites! ⛏️🔦
About OCamlPro:
OCamlPro is an R&D lab founded in 2011, with the mission to help industrial users benefit from experts with a state-of-the-art knowledge of programming languages theory and practice.
- We provide audit, support, custom developer tools and training for both the most modern languages, such as Rust, Wasm and OCaml, and for legacy languages, such as COBOL or even home-made domain-specific languages;
- We design, create and implement software with great added-value for our clients. High complexity is not a problem for our PhD-level experts. For example, we helped the French Income Tax Administration re-adapt and improve their internally kept M language, we designed a DSL to model and express revenue streams in the Cinema Industry, codename Niagara, and we also developed the prototype of the Tezos proof-of-stake blockchain from 2014 to 2018.
- We have a long history of creating open-source projects, such as the Opam package manager, the LearnOCaml web platform, and contributing to other ones, such as the Flambda optimizing compiler, or the GnuCOBOL compiler.
- We are also experts of Formal Methods, developing tools such as our SMT Solver Alt-Ergo (check our Alt-Ergo Users' Club) and using them to prove safety or security properties of programs.
Please reach out, we'll be delighted to discuss your challenges: contact@ocamlpro.com or book a quick discussion.