Better Inlining: Progress Report
As announced some time ago, I am working on a new intermediate language within the OCaml compiler to improve its inlining strategy. After some time of bug squashing, I prepared a testable version of the patchset, available either on Github (branch flambda_experiments
), or through OPAM, in the following repository:
opam repo add inlining https://github.com/OCamlPro/opam-compilers-repository.git
opam switch flambda
opam install inlining-benchs
The series of patches is not ready for benchmarking against real applications, as no cross module information is propagated yet (this is more practical for development because it simplifies debugging a lot), but it already works quite well on single-file code. Some very simple benchmark examples are available in the inlining-benchs
package.
The series of patches implements a set of 'reasonable' compilation passes, that do not try anything too complicated, but combined, generates quite efficient code.
Current Status
As said in the previous post, I decided to design a new intermediate language to implement better inlining heuristics in the compiler. This intermediate language, called flambda
, lies between the lambda
code and the Clambda
code. It has an explicit representation of closures, making them easier to manipulate, and modules do not appear in it anymore (they have already been compiled to static structures).
I then started to implement new inlining heuristics as functions from the lambda
code to the flambda
code. The following features are already present:
- intra function value analysis
- variable rebinding
- dead code elimination (which needs purity analysis)
- known match / if branch elimination
In more detail, the chosen strategy is divided into two passes, which can be described by the following pseudo-code:
if function is at toplevel
then if applied to at least one constant OR small enough
then inline
else if applied to at least one constant AND small enough
then inline
if function is small enough
AND does not contain local function declarations
then inline
The first pass eliminates most functor applications and functions of the kind:
let iter f x =
let rec aux x = ... f ... in
aux x
The second pass eliminates the same kind of functions as Ocaml 4.01, but being after the first pass, it can also inline functions revealed by inlining functors.
Benchmarks
I ran a few benchmarks to ensure that there were no obvious miscompilations (and there were, but they are now fixed). On benchmarks that were too carefully written there was not much gain, but I got interesting results on some examples: those illustrate quite well the improvements, and can be seen at $(opam config var lib)/inlining-benchs
(binaries at $(opam congfig var bin)/bench-*
).
The Knuth-Bendix Benchmark (single-file)
Performance gains against OCaml 4.01 are around 20%. The main difference is that exceptions are compiled to constants, hence not allocated when raised. In that particular example, this halves the allocations.
In general, constant exceptions can be compiled to constants when predefined (Not_found
, Failure
, ...). They cannot yet when user defined: to improve this a few things need to be changed in translcore.ml
to annotate values created by exceptions.
The Noiz Benchmark:
Performance gains are around 30% against OCaml 4.01. This code uses a lot of higher order functions of the kind:
let map_triple f (a,b,c) = (f a, f b, f c)
OCaml 4.01 can inline map_triple
itself but then cannot inline the parameter f
. Moreover, when writing:
let (x,y,z) = map_triple f (1,2,3)
the tuples are not really used, and after inlining their allocations can be eliminated (thanks to rebinding and dead code elimination)
The Set Example
Performance gains are around 20% compared to OCaml 4.01. This example shows how inlining can help defunctorization: when inlining the Set
functor, the provided comparison function can be inlined in Set.add
, allowing direct calls everywhere.
Known Bugs
Recursive Values
A problem may arise in a rare case of recursive values where a field access can be considered to be a constant. Something that would look like (if it were allowed):
type 'a v = { v : 'a }
let rec a = { v = b }
and b = (a.v, a.v)
I have a few solutions, but not sure yet which one is best. This probably won't appear in any normal test. This bug manifests through a segmentation fault (cmmgen
fails to compile that recursive value reasonably).
Pattern-Matching
The new passes assume that every identifier is declared only once in a given module, but this assumption can be broken on some rare pattern matching cases. I will have to dig through matching.ml
to add a substitution in these cases. (the only non hand-built occurence that I found is in ocamlnet
)
Known Mis-compilations
- since there is no cross-module information at the moment, calls to functions from other modules are always slow.
- In some rare cases, there could be functions with more values in their closure, thus resulting in more allocations.
What's next ?
I would now like to add back cross-module information, and after a bit of cleanup the first series of patches should be ready to propose upstream.
Au sujet d'OCamlPro :
OCamlPro développe des applications à haute valeur ajoutée depuis plus de 10 ans, en utilisant les langages les plus avancés, tels que OCaml et Rust, visant aussi bien rapidité de développement que robustesse, et en ciblant les domaines les plus exigeants (méthodes formelles, cybersécurité, systèmes distribués/blockchain, conception de DSLs). Fort de plus de 20 ingénieurs R&D, avec une expertise unique sur les langages de programmation, aussi bien théorique (plus de 80% de nos ingénieurs ont une thèse en informatique) que pratique (participation active au développement de plusieurs compilateurs open-source, prototypage de la blockchain Tezos, etc.), diversifiée (OCaml, Rust, Cobol, Python, Scilab, C/C++, etc.) et appliquée à de multiples domaines. Nous dispensons également des [formations sur mesure certifiées Qualiopi sur OCaml, Rust, et les méthodes formelles] (https://training.ocamlpro.com/) Pour nous contacter : contact@ocamlpro.com.
Articles les plus récents
2024
- opam 2.3.0 release!
- Optimisation de Geneweb, 1er logiciel français de Généalogie depuis près de 30 ans
- Alt-Ergo 2.6 is Out!
- Flambda2 Ep. 3: Speculative Inlining
- opam 2.2.0 release!
- Flambda2 Ep. 2: Loopifying Tail-Recursive Functions
- Fixing and Optimizing the GnuCOBOL Preprocessor
- OCaml Backtraces on Uncaught Exceptions
- Opam 102: Pinning Packages
- Flambda2 Ep. 1: Foundational Design Decisions
- Behind the Scenes of the OCaml Optimising Compiler Flambda2: Introduction and Roadmap
- Lean 4: When Sound Programs become a Choice
- Opam 101: The First Steps
2023
- Maturing Learn-OCaml to version 1.0: Gateway to the OCaml World
- The latest release of Alt-Ergo version 2.5.1 is out, with improved SMT-LIB and bitvector support!
- 2022 at OCamlPro
- Autofonce, GNU Autotests Revisited
- Sub-single-instruction Peano to machine integer conversion
- Statically guaranteeing security properties on Java bytecode: Paper presentation at VMCAI 23
- Release of ocplib-simplex, version 0.5