
Flambda2 Ep. 4: How to write a purely functional compiler




As we dive deeper into the Flambda2 Optimising Compiler, who knows what marvels might await us. Picture: Son Doong Cave (Vietnam). Credit: Collected.


Welcome to a new episode of The Flambda2 Snippets!

Today, we will cover key high-level aspects of the algorithm of Flambda2. We will do our best to explain the fundamental design decisions pertaining to the architecture of the compiler. We will touch on how we managed to make a purely functional optimising compiler (leveraging tail-recursion, backtracking, and non-linear traversal) by covering how the code is traversed, what actions this design facilitates, and more!

All feedback is welcome, thank you for staying tuned and happy reading!

The F2S blog posts aim at gradually introducing the world to the inner workings of a complex piece of software engineering: the Flambda2 Optimising Compiler for OCaml, a technical marvel born from a 10-year-long effort in Research & Development and Compilation, backed by many more years of expertise in all aspects of Computer Science and Formal Methods.

Expression traversal

Here's a code snippet that we would like to be able to optimise; it exercises a set of properties that we want our code optimiser to have.

(* original code *)
let bar x =
  let d = x + x in
  let y = x, d in
  y

let foo z =
  let x, d = bar z in
  if x = z then x + 1 else d

We will optimise this code to:

let foo z =
  z + 1

And we will do that in a single pass that is both efficient and maintainable.

Here are the key transformations we would like to apply to this codeblock:

  • The first element of the pair returned by the bar function is an alias to the z argument of foo, so the if condition in foo always evaluates to true.
  • As a consequence, the d variable is never used, since the else branch in foo is never executed.

That being said, to discover that x is an alias of z, we have to follow the z variable from foo into bar and back again. And to discover that let d = x + x is unused in bar, we have to know about the alias first and then go back from the use of d in foo to the let in bar. The point is: there is a complex order of dependencies between these properties, which we have to follow in order to learn about the code.

Keep in mind that we aim for our compiler to remain reasonably fast. In order to do that, we conduct all code transformations at the same time as the analysis. This entails that we cannot just plug a constraint solver inside of Flambda2 in order to discover these properties.

You have to understand that there are two kinds of properties that we want to track. Some, like discovering that the if condition always evaluates to true, flow in the order of evaluation, i.e. top-down. Others, like finding dead code such as let d = x + x and eliminating it, can only be discovered in the reverse order of evaluation, i.e. bottom-up.

Interesting detail: properties of the first category sometimes help discover properties of the second, as in this specific example, but never the other way round.

We will now explain how we designed Flambda2 to operate within these constraints while transforming the code at the same time.

(*
  CPS-converted version
  Same code as before in the FL2 IR.
  All variables with names starting with `k` are continuations.
*)
let bar x k_ret =
  let d = x + x in
  apply_cont k_ret x d

let foo z k_ret =
  let_cont k x d =
    let r = x + 1 in
    if x = z then
      apply_cont k_ret r
    else
      apply_cont k_ret d
  in
  apply bar z k

If you recall our very first F2S snippet, we mentioned one of the fundamental design decisions of Flambda2 which consists in representing programs using CPS. One of the main reasons for that is that inlining becomes very simple.

But there's a catch…

If you refer back to the original version of the CPS-converted codeblock above, you will see that if x = z then x + 1 else d is inside the scope of the bar function call. That is no longer the case once the function has been converted to CPS. This shrinking of the scope is inherent to the CPS representation. In an expression language, value analysis can simply be written as a recursive function on expressions, propagating properties through an environment. That is how the simplification pass was written on our previous IR, Flambda1. It did produce some imprecisions here and there, but the trade-off in code simplicity favoured this route rather than the one we have taken with Flambda2 today.

In direct-style language representations, traversals in the order of evaluation may be roughly emulated by simply traversing the tree recursively. On the other hand, in CPS-style language representations, this doesn't hold.

That's the catch: analysing CPS code entails more complex algorithms.

Overview of the traversal

Reasoning about code requires having a specific kind of data structure.

This data structure must behave like a kind of database of the properties of expressions: we naturally attach a name to each expression, and the data structure itself keeps track of the properties related to them. In the following code blocks, this data structure will be named acc (short for accumulator).
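To make this more concrete, here is a minimal sketch of what such an accumulator could look like. The names and property kinds below are purely hypothetical; the actual Flambda2 data structures are far richer.

```ocaml
(* Hypothetical sketch of the acc database: a persistent map from
   variable names to the properties discovered so far. *)
type property =
  | Known_const of int    (* the variable is a known constant *)
  | Alias_of of string    (* the variable is an alias of another one *)
  | Unknown

module VarMap = Map.Make (String)

type acc = property VarMap.t

(* Extending the accumulator with a newly discovered property. *)
let learn (acc : acc) var prop : acc = VarMap.add var prop acc

(* Querying the accumulator, defaulting to Unknown. *)
let find (acc : acc) var : property =
  match VarMap.find_opt var acc with
  | Some p -> p
  | None -> Unknown
```

Because acc is a persistent (immutable) map, it can be extended along one branch of the traversal without affecting the others, which is part of what makes the purely functional style workable.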

A design decision we made early was that we wanted to traverse the code only once while doing the maximum amount of simplifications. Of course, there are exceptions to this rule but that’s a topic for another time.

Experience gained from designing Flambda1 guided this decision. In practice, this overarching traversal manifests itself as two distinct passes: one downwards and one upwards. The downwards pass performs static analysis and inlining, while the upwards one handles code reconstruction and dead code elimination. We call this whole process "Simplify".

Downward traversal

As mentioned in F2S1, the FL2 AST is simple and represented with only 6 different cases. You can find it again below:

type expr =
  | Let_val of { var; prim; body : expr }
  | Let_cont of { k; params; handler : expr; body : expr }
  | Apply_cont of { k; args }
  | Apply_val of { f; k_return; args }
  | Switch of { arg; cases }
  | Invalid

We are going to cover each of these cases separately, explaining how each behaves and how it helps us reason about the code. Then, once all that is clear, we will explain how we traverse each constructor. This should help you understand what information we accumulate during both passes, and what exactly we can do with it.

Let_val

type expr =
  | Let_val of
      {
        var : variable;
        prim : named;
        body : expr;
      }
[…]

Overview and semantics:

The Let_val constructor evaluates a named primitive, and binds it to a variable inside the body and then evaluates that body. A named primitive is a single atomic operation applied to some variables. Primitives have no impact on control flow, for instance they cannot raise exceptions.

Traversal algorithm:

This is the easy case, we just follow the evaluation order above. We analyse the named primitive, extend the acc data structure with the discovered properties, and proceed with analysing the body using the new acc.

The most important thing about this process on the way down is this specific extension of the acc data structure. Most other constructors will pipe the acc smartly all along the computation rather than extending it.
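As an illustration, here is what this downward step could look like on a miniature language with only constants and additions. The expr type and all names below are simplified stand-ins, not Flambda2's actual definitions.

```ocaml
(* Miniature downward pass on Let_val: constant propagation. *)
type prim =
  | Const of int
  | Add of string * string          (* addition of two variables *)

type expr =
  | Let_val of string * prim * expr
  | Result of string                (* end of the chain of lets *)

module VarMap = Map.Make (String)

(* The acc maps variables to the constants they are known to hold. *)
let analyse_prim acc = function
  | Const n -> Some n
  | Add (a, b) ->
    (match VarMap.find_opt a acc, VarMap.find_opt b acc with
     | Some x, Some y -> Some (x + y)
     | _ -> None)

(* Analyse the primitive, extend acc, then recurse into the body. *)
let rec traverse_down acc = function
  | Let_val (var, prim, body) ->
    let acc =
      match analyse_prim acc prim with
      | Some n -> VarMap.add var n acc
      | None -> acc
    in
    traverse_down acc body
  | Result v -> VarMap.find_opt v acc
```

On let x = 2 in let d = x + x in d, this traversal learns x = 2 while analysing the first binding, which lets it learn d = 4 while analysing the second.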

Additional details:

One interesting thing to note: we can discover properties on the arguments of the primitive and not only on the bound variable. For example, the primitive that reads the field of a value allows us to discover that the argument is a block where that field exists in the current acc.

Let_cont

type expr =
  | Let_cont of
      {
        body : expr;
        k : continuation;
        params : variable list;
        handler : expr;
      }
[…]

Overview and semantics:

The Let_cont constructor evaluates body. body is allowed to refer to the k continuation, and when encountering an application of k the control flow will evaluate handler after binding the arguments of the given application to params.

Traversal algorithm:

The first thing to note is that there might be several apply_conts to k inside the body, and since we want to analyse the handler only once, we cannot just follow the evaluation order naively as in the Let_val case.

Therefore, we first analyse the body, and collect all the data about the applications of k (see the apply_cont case below).

Once we have that, we can analyse and deduce the properties that we can know about the arguments given to k. We can then bind these properties to the corresponding parameters and then analyse the handler itself.

Quick rundown:

Let's consider the following code snippet.

let foo_d b k_ret = 
  let_cont k x =
    let y = (x <= 1) in
    apply_cont k_ret y
  in
  if b then apply_cont k 0
  else apply_cont k 1

When we analyse the let_cont we first analyse the body and see the conditional on b. We'll see the two apply_conts to k and we'll be able to deduce that the argument given to k is either 0 or 1. With that knowledge, we can analyse the handler of k and deduce that y is always true.

  • Side note:

So far, we've only considered the case where let_conts are not recursive. We also allow let_cont to be recursive, namely to represent the control-flow of loops, which means that the handler can contain apply_cont k. Since we won't be able to see all apply_conts before analysing the handler we will have to stay conservative by over-approximating the properties we know about the parameters.
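One way to picture the "collect then join" step is as a join over the information gathered at each apply_cont. The following deliberately simplified sketch represents that information as a set of possible constant arguments; the names are hypothetical, not the actual Flambda2 abstract domain.

```ocaml
(* Simplified join of what several apply_conts tell us about one
   continuation parameter. *)
module IntSet = Set.Make (Int)

type arg_info =
  | One_of of IntSet.t   (* the argument is one of these constants *)
  | Top                  (* we know nothing about the argument *)

let join a b =
  match a, b with
  | One_of s1, One_of s2 -> One_of (IntSet.union s1 s2)
  | _, _ -> Top
```

In the foo_d example, joining the information from apply_cont k 0 and apply_cont k 1 yields x ∈ {0, 1}, from which the handler can deduce that x <= 1 is always true. For recursive let_conts, the conservative over-approximation corresponds to falling back to Top.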

Apply_cont

type expr =
  | Apply_cont of
      {
        k : continuation;
        args : variable list;
      }
[…]

Overview and semantics:

As described in Let_cont, this only transfers control to the handler associated with k, using the args to populate the values of the parameters of k.

Traversal algorithm:

In this constructor, we extend the acc by associating the current context to k. This will be retrieved later (see Let_cont case) to know which contexts led to this continuation, and thus setup a context for the handler.

Furthermore, Apply_cont has no field of type expr, so it is a leaf of the ongoing traversal. Assuming that there was a Let_cont earlier, the traversal will forward the acc to the last Let_cont encountered and proceed from there again, as explained above.

If there is no remaining Let_cont then it means that the analysis of the function is over and that we've traversed all the live code.

See the Let_cont example.

Apply_val

type expr =
  | Apply_val of
      {
        f : variable;
        args : variable list;
        k_return : continuation;
      }
[…]

Overview and semantics:

Apply_val is the usual function application. f is interpreted as a functional value, so control flow jumps to the associated code, binding args to the function parameters. Since this is CPS, when the function returns, control is transferred to k_return, as with an Apply_cont, and its return value is bound to the parameter of k_return.

It closely resembles something like:

let x = f args in
apply_cont k_return x

But since we don't allow normal function applications inside of a Let_val, we have an Apply_val constructor to handle it.

Traversal algorithm:

The first thing we do is: recover the known properties about f from our acc.

Depending on what properties we have discovered so far, we decide whether to inline f or not: If we choose not to inline f, we handle this Apply_val as another Apply_cont to k_return, but if we do decide to inline it, we replace the current Apply_val with the body of the f function and continue the traversal from there.

The properties that matter for the inlining decision include:

  • Do we know the actual function being called? (as shown above, f is a variable, and we may or may not know which function it refers to)
  • Are there any user annotations, either on the definition of f, such as [@inline], or at the application site, with [@inlined]?
  • The size of f matters too, because inlining large functions may be detrimental;
  • The values of the args matter because, for instance, when we know nothing about the arguments, inlining f is less likely to be beneficial;
  • In some cases, this is where we try Speculative Inlining.
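Put together, the decision could be caricatured as a function like the one below. Every name and threshold here is illustrative only; the real heuristic in Flambda2 is considerably more involved.

```ocaml
(* Caricature of an inlining decision; hypothetical names, not
   Flambda2's actual API. *)
type annotation = Inline_always | Inline_never | Default

type call_info = {
  known_callee : bool;        (* do we know which function f is? *)
  annotation : annotation;    (* [@inline] / [@inlined] attributes *)
  callee_size : int;          (* size of the candidate function body *)
  informative_args : bool;    (* do we know anything about the args? *)
}

let size_threshold = 20       (* arbitrary threshold for this sketch *)

let should_inline info =
  (* We can only inline a call whose callee is known. *)
  info.known_callee
  && (match info.annotation with
      | Inline_always -> true
      | Inline_never -> false
      | Default ->
        (* Small functions are always worth inlining; larger ones
           only when the arguments carry useful information. *)
        info.callee_size <= size_threshold
        || (info.informative_args
            && info.callee_size <= 2 * size_threshold))
```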

Some static analyses require a whole view of the program, or at least of the current function.

So when the downwards pass has traversed the whole term, we trigger a few analyses that we could not do on the fly, such as properties that involve loops. Such properties cannot be computed during a single pass; they usually require a fixpoint. Once that is done, we use the results of the downward pass to initialise the upward environment (uenv).

Upward traversal

You will be happy to learn that the upward traversal is much easier to break down than the downward one! 🎉

Upward environment

Since we have all the data accumulated on our way down at our disposal, we only have a few more properties to track on our way back up. As said previously, we gathered the properties inside an accumulator while following the evaluation order, on our way down. On our way up, we will feed something more akin to an environment.

(* example of a rebuilding step function *)
val rebuild_let : var -> prim -> args:var list -> body:(term * uenv) -> term * uenv

This upward environment (uenv) will mainly hold data about:

  • free (live) variables, which are variables that are used in the subterms of the term being traversed;
  • relevant information to aid the Speculative Inlining heuristic, including the size of the term and the optimisation benefits, for instance the number of operations eliminated during both traversals.

These properties are inherently structural. Thus, tracking them is easily done while traversing the tree in the structural order.

Furthermore, these are properties of the rebuilt version of the term, not the original one:

  • Since some optimisations can remove variable uses (making such variables potentially useless), we are required to work with a rebuilt version of that term;
  • Obviously, optimisation benefits can only be computed after the optimisations have actually been performed;

That is why we could not have tracked them on the way down, thus relegating them to the way back up. Hence, we have designed the upward pass to follow the structural order to track that effortlessly.

Dead code elimination

Free variables are useful for dead code elimination.

Dead code is code which can be removed from the term without altering its semantics.

There are two kinds of dead code:

  • Pure expressions whose result is never used;
  • Code sections that are never reached;

The first one can be detected by looking for variables which are never mentioned outside of their definition.

The second is the same, but relative to continuation names.

In order to understand how it is done, let's see how we do it for a simple let binding, and then we'll see how it is done through continuations with let_cont.


Let-bindings: when rebuilding a let, we have the rebuilt body and the set of free variables of that rebuilt body; if the variable bound by the let is not part of those free variables, we delete the let.

Rundown:

let x = 1 in
(* Step 2 *)
let z = 0 in
(* Step 1 *)
let y = x + 1 in
(* Step 0 *)
42 + z

The rebuilding order follows the Step annotations of the example from 0 to 2.

  • Step 0: the set of free variables of the body of the let y is { z }. y is not present in that set, so we don't rebuild the let. Had we rebuilt it, x would have been part of the free-variable set. So we just keep { z } as the set of free variables.

  • Step 1: We rebuild the let z and remove z from the free-variables set because it is now bound.

  • Step 2: And now we continue onto the let x that we also remove.

We can observe that this method can remove all useless lets in a single traversal.
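This single-traversal let elimination can be sketched as a bottom-up rebuild that returns each rebuilt term together with its free variables. The mini expression language below is a simplified stand-in for the real IR, and all definitions are assumed pure so that an unused let can always be dropped.

```ocaml
(* Upward dead-let elimination on a mini expression language. *)
module VarSet = Set.Make (String)

type expr =
  | Const of int
  | Var of string
  | Add of expr * expr
  | Let of string * expr * expr   (* let x = def in body *)

(* Rebuild bottom-up, returning the term and its free variables. *)
let rec rebuild = function
  | Const n -> Const n, VarSet.empty
  | Var v -> Var v, VarSet.singleton v
  | Add (a, b) ->
    let a, fa = rebuild a in
    let b, fb = rebuild b in
    Add (a, b), VarSet.union fa fb
  | Let (x, def, body) ->
    let body, fv_body = rebuild body in
    if VarSet.mem x fv_body then
      let def, fv_def = rebuild def in
      Let (x, def, body), VarSet.union fv_def (VarSet.remove x fv_body)
    else
      (* x is unused in the rebuilt body: drop the whole let. *)
      body, fv_body
```

On the rundown above, this deletes let y first, which makes x dead in turn and lets its binding be dropped at Step 2, all in one traversal.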

Let's now see how we can extend this method to rebuilding let_conts while still maintaining this property.


Let_conts: As for let_cont, we want to be able to remove the unused parameters of the continuation. We can see that a parameter is unused after having analysed the continuation handler: when the parameter is absent from the set of free-variables of the handler.

Furthermore, we need to change the apply_cont of the continuation from which we remove parameters. We need to only pass arguments for live parameters.

That entails going through the body of the continuation after traversing its handler, because it is inside the body that apply_conts to that continuation appear (we will put aside recursive continuations for the sake of simplicity).

And so, we keep track of which of these continuations' parameters we removed in order to rebuild the apply_cont.
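One way to picture that bookkeeping is a uenv table recording, for every continuation whose parameters were pruned, which positions survive; the apply_cont sites in the body then filter their arguments accordingly. The names below are hypothetical sketches, not Flambda2's actual code.

```ocaml
(* Sketch: per-continuation record of which parameters were kept. *)
module KMap = Map.Make (String)

(* For each continuation, one boolean per original parameter:
   true when the parameter survives. *)
type uenv = bool list KMap.t

(* Rebuild the argument list of an apply_cont to k. *)
let filter_args (uenv : uenv) k args =
  match KMap.find_opt k uenv with
  | None -> args   (* no parameter of k was removed *)
  | Some kept ->
    List.filter_map
      (fun (keep, arg) -> if keep then Some arg else None)
      (List.combine kept args)
```

In the rundown below, k0's only parameter is recorded as removed, so apply_cont k0 z1 is rebuilt with an empty argument list.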

Rundown:

(* Step 0 *)
let_cont k0 z =
(* Step 1 (k0 handler) *)
  return 42
in
(* Step 2 (k0 body) *)
let_cont k1 y =
(* Step 4 (k1 handler) *)
  let z1 = y + 1 in 
(* Step 3 (k1 handler) *)
  apply_cont k0 z1
in
(* Step 5 (k1 body) *)
apply_cont k1 420

The rebuilding order follows the Step annotations of the example from 0 to 5, though, we will only mention the relevant steps.

  • Step 1: Parameter z of k0 is dead, so we get rid of it.

  • Step 3: So now, we have to update the apply_cont k0 z1 which in turn becomes apply_cont k0 (the argument disappears).

  • Step 4: Since z1 was deleted, its let is then removed, and y becomes useless and in turn, eliminated.

  • Step 5: Eventually, we can replace the apply_cont k1 420 with apply_cont k1 because the y parameter was previously eradicated.

It is this traversal order that allows us to conduct simplifications on CPS in one go and to perform dead code elimination during the upwards traversal.


Conclusion

In this episode of The Flambda2 Snippets, we have explored how the Flambda2 Optimising Compiler performs upwards and downwards traversals to analyze and transform OCaml code efficiently. By structuring our passes in this way, we ensure that static analysis and optimizations are performed in a single traversal while maintaining precision and efficiency.

The downward traversal enables us to propagate information about variables, functions, and continuations, allowing for effective inlining and simplification. Meanwhile, the upward traversal facilitates optimizations such as dead code elimination by identifying and removing unnecessary expressions in a structured and efficient manner.

Through these mechanisms, Flambda2 is able to navigate the complexities introduced by CPS conversion while still achieving significant performance gains. Understanding these traversal strategies is key to grasping the power behind Flambda2’s approach to optimization and why it stands as a robust solution for compiling OCaml code.

Thank you all for reading! We hope these articles keep the community eager to dive even deeper with us into OCaml compilation. Until next time, mind the stalactites! ⛏️🔦



About OCamlPro:

OCamlPro is an R&D lab founded in 2011, with the mission to help industrial users benefit from experts with state-of-the-art knowledge of programming language theory and practice.

  • We provide audit, support, custom developer tools and training for both the most modern languages, such as Rust, Wasm and OCaml, and for legacy languages, such as COBOL or even home-made domain-specific languages;
  • We design, create and implement software with great added-value for our clients. High complexity is not a problem for our PhD-level experts. For example, we helped the French Income Tax Administration re-adapt and improve their internally kept M language, we designed a DSL to model and express revenue streams in the Cinema Industry, codename Niagara, and we also developed the prototype of the Tezos proof-of-stake blockchain from 2014 to 2018.
  • We have a long history of creating open-source projects, such as the Opam package manager, the LearnOCaml web platform, and contributing to other ones, such as the Flambda optimizing compiler, or the GnuCOBOL compiler.
  • We are also experts of Formal Methods, developing tools such as our SMT Solver Alt-Ergo (check our Alt-Ergo Users' Club) and using them to prove safety or security properties of programs.

Please reach out, we'll be delighted to discuss your challenges: contact@ocamlpro.com or book a quick discussion.