An in-depth Look at OCaml’s new “Best-fit” Garbage Collector Strategy
The Garbage Collector probably is OCaml’s greatest unsung hero. Its pragmatic approach allows us to allocate without much fear of efficiency loss. In a way, the fact that most OCaml hackers know little about it is a good sign: you want a runtime to gracefully do its job without having to mind it all the time.
But as OCaml 4.10.0 has now hit the shelves, a very exciting feature is in the changelog:
#8809, #9292: Add a best-fit allocator for the major heap; still experimental, it should be much better than current allocation policies (first-fit and next-fit) for programs with large heaps, reducing both GC cost and memory usage. This new best-fit is not (yet) the default; set it explicitly with OCAMLRUNPARAM="a=2" (or Gc.set from the program). You may also want to increase the
space_overhead
parameter of the GC (a percentage, 80 by default), for example OCAMLRUNPARAM="o=85", for optimal speed. (Damien Doligez, review by Stephen Dolan, Jacques-Henri Jourdan, Xavier Leroy, Leo White)
At OCamlPro, some of the tools that we develop, such as the package manager opam, the Alt-Ergo SMT solver or the Flambda optimizer, can be quite demanding in memory usage, so we were curious to better understand the properties of this new allocator.
Minor heap and Major heap: the GC in a nutshell
Not all values are allocated equal. Some will only be useful for the span of local calculations, some will last as long as the program lives. To handle those two kinds of values, the runtime uses a Generational Garbage Collector with two spaces:
- The minor heap uses the Stop-and-copy principle. It is fast but has to stop the computation to perform a full iteration.
- The major heap uses the Mark-and-sweep principle. It has the perk of being incremental and behaves better for long-lived data.
Allocation in the minor heap is straightforward and efficient: values are stored sequentially, and when there is no space anymore, space is emptied, surviving values get allocated in the major heap while dead values are just forgotten for free. However, the major heap is a bit more tricky, since we will have random allocations and deallocations that will eventually produce a scattered memory. This is called fragmentation, and this means that you’re using more memory than necessary. Thankfully, the GC has two strategies to counter that problem:
- Compaction: a heavyweight reallocation of everything that will remove those holes in our heap. OCaml’s compactor is cleverly written to work in constant space, and would be worth its own specific article!
- Free-list Allocation: allocating the newly coming data in the holes (the free-list) in memory, de-scattering it in the process.
Of course, asking the GC to be smarter about how it allocates data makes the GC slower. Coding a good GC is a subtle art: you need to have something smart enough to avoid fragmentation but simple enough to run as fast as possible.
Where and how to allocate: the 3 strategies
OCaml used to propose 2 free-list allocation strategies: next-fit, the default, and first-fit. Version 4.10 of OCaml introduces the new best-fit strategy. Let’s compare them:
Next-fit, the original and remaining champion
OCaml’s original (and default) “next-fit” allocating strategy is pretty simple:
- Keep a (circular) list of every hole in memory ordered by increasing addresses;
- Have a pointer on an element of that list;
- When an allocation is needed, if the currently pointed-at hole is big enough, allocate in it;
- Otherwise, try the next hole and so-on.
This strategy is extremely efficient, but a big hole might be fragmented with very small data while small holes stay unused. In some cases, the GC would trigger costly compactions that would have been avoidable.
First-fit, the unsuccessful contender
To counteract that problem, the “first-fit” strategy was implemented in 2008 (OCaml 3.11.0):
- Same idea as next-fit, but with an extra allocation table.
- Put the pointer back at the beginning of the list for each allocation.
- Use the allocation table to skip some parts of the list.
Unfortunately, that strategy is slower than the previous one. This is an example of making the GC smarter ends up making it slower. It does, however, reduce fragmentation. It was still useful to have this strategy at hand for the case where compaction would be too costly (on a 100Gb heap, for instance). An application that requires low latency might want to disable compaction and use that strategy.
Best-fit: a new challenger enters!
This leads us to the brand new “best-fit” strategy. This strategy is actually composite and will have different behaviors depending on the size of the data you’re trying to allocate.
- On small data (up to 32 words), segregated free lists will allow allocation in (mostly) constant time.
- On big data, a general best-fit allocator based on splay trees.
This allows for the best of the two worlds, as you can easily allocate your numerous small blocks in the small holes in your memory while you take a bit more time to select a good place for your big arrays.
How will best-fit fare? Let’s find out!
Try it!
First, let us remind you that this is still an experimental feature, which from the OCaml development team means “We’ve tested it thoroughly on different systems, but only for months and not on a scale as large as the whole OCaml ecosystem”.
That being said, we’d advise you don’t use it in production code yet.
Why you should try it
Making benchmarks of this new strategy could be beneficial for you and the language at large: the dev team is hoping for feedback, the more quality feedback you give means the more the future GC will be tuned for your needs.
In 2008, the first-fit strategy was released with the hope of improving memory usage by reducing fragmentation. However, the lack of feedback meant that the developers were not aware that it didn’t meet the users’ needs. If more feedback had been given, it’s possible that work on improving the strategy or on better strategies would have happened sooner.
Choosing the allocator strategy
Now, there are two ways to control the GC behavior: through the code or through environment variables.
First method: Adding instructions in your code
This method should be used by those of us who have code that already does some GC fine-tuning. As early as possible in your program, you want to execute the following lines:
let () =
Gc.(set
{ (get()) with
allocation_policy = 2; (* Use the best-fit strategy *)
space_overhead = 100; (* Let the major GC work a bit less since it's more efficient *)
})
You might also want to add verbose = 0x400;
or verbose = 0x404;
in order to get some GC debug information. See here for more details on how to use the GC
module.
Of course, you’ll need to recompile your code, and this will apply only after the runtime has initialized itself, triggering a compaction in the process. Also, since you might want to easily switch between different allocation policies and overhead specifications, we suggest you use the second method.
Second method: setting $OCAMLRUNPARAM
At OCamlPro, we develop and maintain a program that any OCaml developer should want to run smoothly. It’s called Opam, maybe you’ve heard of it? Though most commands take a few seconds, some administrative-heavy commands can be a strain on our computer. In other words: those are perfect for a benchmark.
Here’s what we did to benchmark Opam:
$ opam update
$ opam switch create 4.10.0
$ opam install opam-devel # or build your own code
$ export OCAMLRUNPARAM='b=1,a=2,o=100,v=0x404'
$ cd my/local/opam-repository
$ perf stat ~/.opam/4.10.0/lib/opam-devel/opam admin check --installability # requires right to execute perf, time can do the trick
If you want to compile and run your own benchmarks, here are a few details on OCAMLRUNPARAM
:
b=1
means “print the backtrace in case of uncaught exception”a=2
means “use best-fit” (default is0
, first-fit is1
)o=100
means “do less work” (default is80
, lower means more work)v=0x404
means “have the gc be verbose” (0x400
is “print statistics at exit”, 0x4 is “print when changing heap size”)
See the manual for more details on OCAMLRUNPARAM
You might want to compare how your code fares on all three different GC strategies (and fiddle a bit with the overhead to find your best configuration).
Our results on opam
Our contribution in this article is to benchmark opam
with the different allocation strategies:
Strategy: | Next-fit | First-fit | Best-fit | ||
Overhead: | 80 | 80 | 80 | 100 | 120 |
Cycles used (Gcycle) | 2,040 | 3,808 | 3,372 | 2,851 | 2,428 |
Maximum heap size (kb) | 793,148 | 793,148 | 689,692 | 689,692 | 793,148 |
User time (s) | 674 | 1,350 | 1,217 | 1,016 | 791 |
A quick word on these results. Most of opam
‘s calculations are done by dose and rely heavily on small interconnected blocks. We don’t really have big chunks of data we want to allocate, so the strategy won’t give us the bonus you might have as it perfectly falls into the best-case scenario of the next-fit strategy. As a matter of fact, for every strategy, we didn’t have a single GC compaction happen. However, Best-fit still allows for a lower memory footprint!
Conclusions
If your software is highly reliant on memory usage, you should definitely try the new Best-fit strategy and stay tuned on its future development. If your software requires good performance, knowing if your performances are better with Best-fit (and giving feedback on those) might help you in the long run.
The different strategies are:
- Next-fit: generally good and fast, but has very bad worst cases with big heaps.
- First fit: mainly useful for very big heaps that must avoid compaction as much as possible.
- Best-fit: almost the best of both worlds, with a small performance hit for programs that fit well with next-fit.
Remember that whatever works best for you, it’s still better than having to malloc
and free
by hand. Happy allocating!
Comments
gasche (23 March 2020 at 17 h 50 min):
What about higher overhead values than 120, like 140, 160, 180 and 200?
Thomas Blanc (23 March 2020 at 18 h 17 min):
Because 100 was the overhead value Leo advised in the PR discussion I decided to put it in the results. As 120 got the same maximum heap size as next-fit I found it worth putting it in. Higher overhead values lead to faster execution time but a bigger heap.
I don’t have my numbers at hand right now. You’re probably right that they are relevant (to you and Damien at least) but I didn’t want to have a huge table at the end of the post.
nbbb (24 March 2020 at 11 h 18 min):
Higher values would allow us to see if best-fit can reproduce the performance characteristics of next-fit, for some value of the overhead.
nbbb (24 March 2020 at 16 h 51 min):
I just realized that 120 already has a heap as bit as next-fit — so best-fit can’t get as good as next-fit in this example, and higher values of the overhead are not quite as informative. Should have read more closely the first time.
Thomas Blanc (24 March 2020 at 16 h 55 min):
Sorry that it wasn’t as clear as it could be.
Note that opam and dose are in the best-case scenario of best-fit. Your own code would probably produce a different result and I encourage you to test it and communicate about it.
Au sujet d'OCamlPro :
OCamlPro développe des applications à haute valeur ajoutée depuis plus de 10 ans, en utilisant les langages les plus avancés, tels que OCaml et Rust, visant aussi bien rapidité de développement que robustesse, et en ciblant les domaines les plus exigeants (méthodes formelles, cybersécurité, systèmes distribués/blockchain, conception de DSLs). Fort de plus de 20 ingénieurs R&D, avec une expertise unique sur les langages de programmation, aussi bien théorique (plus de 80% de nos ingénieurs ont une thèse en informatique) que pratique (participation active au développement de plusieurs compilateurs open-source, prototypage de la blockchain Tezos, etc.), diversifiée (OCaml, Rust, Cobol, Python, Scilab, C/C++, etc.) et appliquée à de multiples domaines. Nous dispensons également des [formations sur mesure certifiées Qualiopi sur OCaml, Rust, et les méthodes formelles] (https://training.ocamlpro.com/) Pour nous contacter : contact@ocamlpro.com.
Articles les plus récents
2024
- opam 2.3.0 release!
- Optimisation de Geneweb, 1er logiciel français de Généalogie depuis près de 30 ans
- Alt-Ergo 2.6 is Out!
- Flambda2 Ep. 3: Speculative Inlining
- opam 2.2.0 release!
- Flambda2 Ep. 2: Loopifying Tail-Recursive Functions
- Fixing and Optimizing the GnuCOBOL Preprocessor
- OCaml Backtraces on Uncaught Exceptions
- Opam 102: Pinning Packages
- Flambda2 Ep. 1: Foundational Design Decisions
- Behind the Scenes of the OCaml Optimising Compiler Flambda2: Introduction and Roadmap
- Lean 4: When Sound Programs become a Choice
- Opam 101: The First Steps
2023
- Maturing Learn-OCaml to version 1.0: Gateway to the OCaml World
- The latest release of Alt-Ergo version 2.5.1 is out, with improved SMT-LIB and bitvector support!
- 2022 at OCamlPro
- Autofonce, GNU Autotests Revisited
- Sub-single-instruction Peano to machine integer conversion
- Statically guaranteeing security properties on Java bytecode: Paper presentation at VMCAI 23
- Release of ocplib-simplex, version 0.5