Better Inlining: Progress Report
As announced some time ago, I am working on a new intermediate language within the OCaml compiler to improve its inlining strategy. After some time of bug squashing, I prepared a testable version of the patchset, available either on Github (branch flambda_experiments
), or through OPAM, in the following repository:
opam repo add inlining https://github.com/OCamlPro/opam-compilers-repository.git
opam switch flambda
opam install inlining-benchs
The series of patches is not ready for benchmarking against real applications, as no cross module information is propagated yet (this is more practical for development because it simplifies debugging a lot), but it already works quite well on single-file code. Some very simple benchmark examples are available in the inlining-benchs
package.
The series of patches implements a set of 'reasonable' compilation passes, that do not try anything too complicated, but combined, generates quite efficient code.
Current Status
As said in the previous post, I decided to design a new intermediate language to implement better inlining heuristics in the compiler. This intermediate language, called flambda
, lies between the lambda
code and the Clambda
code. It has an explicit representation of closures, making them easier to manipulate, and modules do not appear in it anymore (they have already been compiled to static structures).
I then started to implement new inlining heuristics as functions from the lambda
code to the flambda
code. The following features are already present:
- intra function value analysis
- variable rebinding
- dead code elimination (which needs purity analysis)
- known match / if branch elimination
In more detail, the chosen strategy is divided into two passes, which can be described by the following pseudo-code:
if function is at toplevel
then if applied to at least one constant OR small enough
then inline
else if applied to at least one constant AND small enough
then inline
if function is small enough
AND does not contain local function declarations
then inline
The first pass eliminates most functor applications and functions of the kind:
let iter f x =
let rec aux x = ... f ... in
aux x
The second pass eliminates the same kind of functions as Ocaml 4.01, but being after the first pass, it can also inline functions revealed by inlining functors.
Benchmarks
I ran a few benchmarks to ensure that there were no obvious miscompilations (and there were, but they are now fixed). On benchmarks that were too carefully written there was not much gain, but I got interesting results on some examples: those illustrate quite well the improvements, and can be seen at $(opam config var lib)/inlining-benchs
(binaries at $(opam congfig var bin)/bench-*
).
The Knuth-Bendix Benchmark (single-file)
Performance gains against OCaml 4.01 are around 20%. The main difference is that exceptions are compiled to constants, hence not allocated when raised. In that particular example, this halves the allocations.
In general, constant exceptions can be compiled to constants when predefined (Not_found
, Failure
, ...). They cannot yet when user defined: to improve this a few things need to be changed in translcore.ml
to annotate values created by exceptions.
The Noiz Benchmark:
Performance gains are around 30% against OCaml 4.01. This code uses a lot of higher order functions of the kind:
let map_triple f (a,b,c) = (f a, f b, f c)
OCaml 4.01 can inline map_triple
itself but then cannot inline the parameter f
. Moreover, when writing:
let (x,y,z) = map_triple f (1,2,3)
the tuples are not really used, and after inlining their allocations can be eliminated (thanks to rebinding and dead code elimination)
The Set Example
Performance gains are around 20% compared to OCaml 4.01. This example shows how inlining can help defunctorization: when inlining the Set
functor, the provided comparison function can be inlined in Set.add
, allowing direct calls everywhere.
Known Bugs
Recursive Values
A problem may arise in a rare case of recursive values where a field access can be considered to be a constant. Something that would look like (if it were allowed):
type 'a v = { v : 'a }
let rec a = { v = b }
and b = (a.v, a.v)
I have a few solutions, but not sure yet which one is best. This probably won't appear in any normal test. This bug manifests through a segmentation fault (cmmgen
fails to compile that recursive value reasonably).
Pattern-Matching
The new passes assume that every identifier is declared only once in a given module, but this assumption can be broken on some rare pattern matching cases. I will have to dig through matching.ml
to add a substitution in these cases. (the only non hand-built occurence that I found is in ocamlnet
)
Known Mis-compilations
- since there is no cross-module information at the moment, calls to functions from other modules are always slow.
- In some rare cases, there could be functions with more values in their closure, thus resulting in more allocations.
What's next ?
I would now like to add back cross-module information, and after a bit of cleanup the first series of patches should be ready to propose upstream.
About OCamlPro:
OCamlPro is a R&D lab founded in 2011, with the mission to help industrial users benefit from experts with a state-of-the-art knowledge of programming languages theory and practice.
- We provide audit, support, custom developer tools and training for both the most modern languages, such as Rust, Wasm and OCaml, and for legacy languages, such as COBOL or even home-made domain-specific languages;
- We design, create and implement software with great added-value for our clients. High complexity is not a problem for our PhD-level experts. For example, we helped the French Income Tax Administration re-adapt and improve their internally kept M language, we designed a DSL to model and express revenue streams in the Cinema Industry, codename Niagara, and we also developed the prototype of the Tezos proof-of-stake blockchain from 2014 to 2018.
- We have a long history of creating open-source projects, such as the Opam package manager, the LearnOCaml web platform, and contributing to other ones, such as the Flambda optimizing compiler, or the GnuCOBOL compiler.
- We are also experts of Formal Methods, developing tools such as our SMT Solver Alt-Ergo (check our Alt-Ergo Users' Club) and using them to prove safety or security properties of programs.
Please reach out, we'll be delighted to discuss your challenges: contact@ocamlpro.com or book a quick discussion.
Most Recent Articles
2024
- opam 2.3.0 release!
- Optimisation de Geneweb, 1er logiciel français de Généalogie depuis près de 30 ans
- Alt-Ergo 2.6 is Out!
- Flambda2 Ep. 3: Speculative Inlining
- opam 2.2.0 release!
- Flambda2 Ep. 2: Loopifying Tail-Recursive Functions
- Fixing and Optimizing the GnuCOBOL Preprocessor
- OCaml Backtraces on Uncaught Exceptions
- Opam 102: Pinning Packages
- Flambda2 Ep. 1: Foundational Design Decisions
- Behind the Scenes of the OCaml Optimising Compiler Flambda2: Introduction and Roadmap
- Lean 4: When Sound Programs become a Choice
- Opam 101: The First Steps
2023
- Maturing Learn-OCaml to version 1.0: Gateway to the OCaml World
- The latest release of Alt-Ergo version 2.5.1 is out, with improved SMT-LIB and bitvector support!
- 2022 at OCamlPro
- Autofonce, GNU Autotests Revisited
- Sub-single-instruction Peano to machine integer conversion
- Statically guaranteeing security properties on Java bytecode: Paper presentation at VMCAI 23
- Release of ocplib-simplex, version 0.5