The Growth of the OCaml Distribution
We recently worked on a project to build a binary installer for OCaml, inspired by Rustup for Rust. We had to build binary packages of the distribution for every OCaml version since 4.02.0, and we were surprised to discover that their (compressed) size grew from 18 MB to about 200 MB. This post gives a survey of our findings.
Introduction
One of the strengths of Rust is how easily it can be installed on a new computer in user space: with a simple command copy-pasted from a website into a terminal, you get everything you need to start building Rust projects in a few seconds. Rustup, together with a set of prebuilt packages for many architectures, is the project that makes all this possible.
OCaml, on the other hand, is a bit harder to install: you need to find in the documentation the proper way to install opam for your operating system, find out how to create a switch with a compiler version, and then wait for the compiler to be built and installed. This usually takes much more time.
As a winter holiday project, we worked on a project similar to Rustup, providing binary packages for most OCaml distribution versions. It builds upon our experience of opam and opam-bin, our plugin to build and share binary packages for opam.
While building binary packages for most versions of the OCaml distribution, we were surprised to discover that the size of the binary archive grew from 18 MB to about 200 MB in 10 years. Though this is not a problem on many high-bandwidth connections, it might become one when you go far from big towns (fortunately, we designed our tool to be able to install from sources in such a case, trading a longer installation for a smaller download).
We decided it was worth investigating this growth in more detail, and this post is about our early findings.
General Trends
So, let's have a closer look at the evolution of the size of the binary OCaml distribution. Between version 4.02.0 (Aug 2014) and version 5.0.0 (Dec 2022):
- The size of the compressed binary archive grew from 18 MB to 198 MB
- The size of the installed binary distribution grew from 73 MB to 522 MB
- The number of installed files grew from 748 to 2433
On the other hand, the source distribution itself was much more stable:
- The size of the compressed source archive grew only from 3 MB to 5 MB
- The size of the sources grew from 14 MB to 26 MB
- The number of source files grew from 2355 to 4084
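To put the two lists side by side, here is a small sketch that turns the figures quoted above into growth factors. The labels and function names are ours, chosen for illustration only; the numbers are the ones given in the lists.

```python
# Figures quoted above for 4.02.0 and 5.0.0 (sizes in MB, counts as-is).
FIGURES = {
    "compressed binary archive (MB)": (18, 198),
    "installed binary distribution (MB)": (73, 522),
    "installed files": (748, 2433),
    "compressed source archive (MB)": (3, 5),
    "unpacked sources (MB)": (14, 26),
    "source files": (2355, 4084),
}


def growth_factor(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before


def growth_table(figures):
    """Map each label to its growth factor, rounded to one decimal."""
    return {
        label: round(growth_factor(before, after), 1)
        for label, (before, after) in figures.items()
    }


if __name__ == "__main__":
    for label, factor in growth_table(FIGURES).items():
        print(f"{label}: x{factor}")
```

With these inputs, the compressed binary archive comes out at 11 times its 4.02.0 size, while every source-side figure stays below a factor of 2.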
For our project, this evolution makes the source distribution a good alternative to binary distributions for low-bandwidth settings, especially as OCaml is much faster than Rust at building itself. For the record, version 5.0.0 takes about 1 minute to build on a 16-core 64GB-RAM computer.
Interestingly, if we plot the total size of the binary distribution, and the total size with only files that were present in the previous version, we can notice that the growth is mostly caused by the increase in size of these existing files, and not by the addition of new files:
Causes and Consequences
We tried to identify the main causes of this growth: the growth is linear most of the time, with sharp increases (and decreases) at some versions. We plotted the difference in size, for the total size, the new files, the deleted files and the same files, i.e. the files that made it from one version to the next one:
Let's have a look at the versions with the highest increases in size:
- +86 MB for 4.08.0: though there are a lot of new files (+307), they only account for 3 MB of additional storage. Most of the difference comes from an increase in size of both compiler libraries (probably related to the use of Menhir for parsing) and of some binaries. In particular:
  - +13 MB for bin/ocamlobjinfo.byte (2_386_046 -> 16_907_776)
  - +12 MB for bin/ocamldep.byte (2_199_409 -> 15_541_022)
  - +6 MB for bin/ocamldebug (1_092_173 -> 7_671_300)
  - +6 MB for bin/ocamlprof.byte (630_989 -> 7_043_717)
  - +6 MB for lib/ocaml/compiler-libs/parser.cmt (2_237_513 -> 9_209_256)
- +74 MB for 4.03.0: again, though there are a lot of new files (+475, mostly in compiler-libs), they only account for 11 MB of additional storage, and a large part is compensated by the removal of ocamlbuild from the distribution, causing a gain of 7 MB. Indeed, most of the increase in size is probably caused by the compilation with debug information (option -g), which considerably increases the size of all executables, for example:
  - +12 MB for bin/ocamlopt (2_016_697 -> 15_046_969)
  - +9 MB for bin/ocaml (1_833_357 -> 11_574_555)
  - +8 MB for bin/ocamlc (1_748_717 -> 11_070_933)
  - +8 MB for lib/ocaml/expunge (1_662_786 -> 10_672_805)
  - +7 MB for lib/ocaml/compiler-libs/ocamlcommon.cma (1_713_947 -> 8_948_807)
- +72 MB for 4.11.0: again, the increase comes almost entirely from existing files. For example:
  - +16 MB for bin/ocamldebug (8_170_424 -> 26_451_049)
  - +6 MB for bin/ocamlopt.byte (21_895_130 -> 28_354_131)
  - +5 MB for lib/ocaml/extract_crc (659_967 -> 6_203_791)
  - +5 MB for bin/ocaml (17_074_577 -> 22_388_774)
  - +5 MB for bin/ocamlobjinfo.byte (17_224_939 -> 22_523_686)

  Again, the increase is probably related to adding more debug information in the executables (there is a specific PR on ocamldebug for that, and for all executables more debug info is available for each allocation);
- +48 MB for 5.0.0: a big difference in storage is not surprising for a change in a major version, but actually half of the difference just comes from an increase of 23 MB of bin/ocamldoc;
- +34 MB for 4.02.3: this one is worth noting, as it comes at a minor version change. The increase is mostly caused by the addition of 402 new files, corresponding to cmt/cmti files for the stdlib and compiler-libs.
We could of course study some other versions, but understanding the root causes of most of these changes would require going deeper than we can in a blog post like this one. Still, these figures should give experts good hints about which versions to investigate first.
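For readers who want to run the same kind of comparison on other versions, here is a minimal sketch of how the new, deleted and common files of two installed distributions could be classified. It is not the script we used for this post: the function names and the returned keys are illustrative, and it assumes both versions are installed in separate directories and that symbolic links should be ignored.

```python
import os


def file_sizes(root):
    """Map each regular file under `root` (path relative to `root`) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so that a file is never counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                sizes[os.path.relpath(path, root)] = os.path.getsize(path)
    return sizes


def compare_installs(old_root, new_root, top=5):
    """Split the size difference between two installs into new, deleted and common files."""
    old, new = file_sizes(old_root), file_sizes(new_root)
    added = new.keys() - old.keys()
    deleted = old.keys() - new.keys()
    # Growth of the files present in both versions.
    growth = {path: new[path] - old[path] for path in old.keys() & new.keys()}
    return {
        "new_files": len(added),
        "new_bytes": sum(new[path] for path in added),
        "deleted_files": len(deleted),
        "deleted_bytes": sum(old[path] for path in deleted),
        "same_files_delta": sum(growth.values()),
        "total_delta": sum(new.values()) - sum(old.values()),
        "top_growth": sorted(growth.items(), key=lambda item: item[1], reverse=True)[:top],
    }
```

The total difference is always the bytes of new files, minus the bytes of deleted files, plus the growth of the common files, which is a convenient sanity check on the output.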
Inside the OCaml Installation
Before concluding, it might also be worth studying which parts of the OCaml installation take most of the space. 5.0.0 is a good candidate for such a study, as libraries have been moved to separate directories, instead of all being directly stored in lib/ocaml.
Here is a decomposition of the OCaml Installation:
- Total: 529 MB
  - share: 1 MB
  - man: 4 MB
  - bin: 303 MB
  - lib/ocaml: 223 MB
    - compiler-libs: 134 MB
    - expunge: 20 MB
As we can see, a large majority of the space is used by executables. For example, all of the following are above 10 MB:
- 28 MB ocamldoc
- 26 MB ocamlopt.byte
- 25 MB ocamldebug
- 21 MB ocamlobjinfo.byte, ocaml
- 20 MB ocamldep.byte, ocamlc.byte
- 19 MB ocamldoc.opt
- 18 MB ocamlopt.opt
- 15 MB ocamlobjinfo.opt
- 14 MB ocamldep.opt, ocamlc.opt, ocamlcmt
There are both bytecode and native code executables in this list.
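A decomposition like the one above can be obtained with a short script. The following is only a sketch under our own assumptions: the function names are illustrative, symbolic links are skipped, and the threshold is expressed in bytes, so how you convert to MB (10^6 or 2^20 bytes) is up to you.

```python
import os
from collections import defaultdict


def iter_files(prefix):
    """Yield (relative path, size in bytes) for each regular file under `prefix`."""
    for dirpath, _dirnames, filenames in os.walk(prefix):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield os.path.relpath(path, prefix), os.path.getsize(path)


def size_by_directory(prefix, depth=1):
    """Sum file sizes, grouped by the first `depth` directory components of each path."""
    totals = defaultdict(int)
    for relpath, size in iter_files(prefix):
        directories = relpath.split(os.sep)[:-1]
        # Files stored directly under `prefix` are grouped under ".".
        key = os.sep.join(directories[:depth]) or "."
        totals[key] += size
    return dict(totals)


def large_files(prefix, threshold_bytes):
    """Return (size, relative path) pairs at or above the threshold, largest first."""
    found = [(size, relpath) for relpath, size in iter_files(prefix) if size >= threshold_bytes]
    return sorted(found, reverse=True)
```

Calling `size_by_directory(prefix, depth=2)` on an installation prefix groups everything below lib/ocaml together while keeping bin, man and share as separate entries, and `large_files(prefix, 10 * 2**20)` lists the candidates for a table like the one above.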
Conclusion
Our installer project would benefit from having a smaller binary OCaml distribution, but most OCaml users in general would also benefit from that: after a few years of using OCaml, OCaml developers usually end up with huge $HOME/.opam directories, because every opam switch often takes more than 1 GB of space, and the OCaml distribution takes a big part of that. opam-bin partially solves this problem by sharing identical files between several switches (when the --enable-share configuration option has been used).
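To get a rough idea of how much space such sharing could save on a given machine, one can look for files with identical content across several switch directories. The sketch below is an estimate only and does not describe how opam-bin is implemented: comparing files by size and SHA-256 digest is our assumption, and the function names are illustrative.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def shareable_bytes(switch_roots):
    """Estimate the bytes saved if identical files in the given trees were stored once."""
    by_content = defaultdict(list)
    for root in switch_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    key = (os.path.getsize(path), file_digest(path))
                    by_content[key].append(path)
    # Every copy beyond the first one is space that sharing would recover.
    return sum(size * (len(paths) - 1) for (size, _digest), paths in by_content.items())
```

Note that this counts duplicates inside a single tree as well, and that files which are already hard-linked to each other are counted as if they were separate copies, so the result is an upper bound.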
Here is a short list of ideas to test to decrease the size of the binary OCaml distribution:
- Use the same executable for multiple programs (ocamlc.opt, ocamlopt.opt, ocamldep.opt, etc.), using the first command argument (the name under which the program is invoked) to choose the behavior to have. Rustup, for example, only installs one binary in $HOME/.cargo/bin for cargo, rustc, rustup, etc. and actually, our tool does the same trick to share the same binary for itself, opam, opam-bin, ocp-indent and drom.
- Split installed files into separate opam packages, of which only one would be installed as the compiler distribution. For example, most cmt files of compiler-libs are not needed by most users: they might only be useful for compiler/tooling developers, and even then, only in very rare cases. They could be installed as another opam package.
- Remove the -linkall flag on ocamlcommon.cm[x]a libraries. In general, such a flag should only be set when building an executable that is expected to use plugins, because otherwise, this executable will contain all the modules of the library, even the ones that are not useful for its specific purpose.
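The first idea, a single executable that behaves differently depending on the name it is called with, can be illustrated with a minimal sketch. The tool names and handlers below are hypothetical placeholders, not the behavior of any real OCaml tool, and neither Rustup nor our installer is implemented this way in Python.

```python
import os


def run_compile(args):
    """Placeholder for the behavior of a hypothetical compiler front-end."""
    return f"compiling {args}"


def run_deps(args):
    """Placeholder for the behavior of a hypothetical dependency tool."""
    return f"computing dependencies of {args}"


# Hypothetical tool names: each one would be a link to the same executable.
TOOLS = {
    "demo-compile": run_compile,
    "demo-deps": run_deps,
}


def dispatch(argv):
    """Select the behavior from the name the program was invoked under (argv[0])."""
    name = os.path.basename(argv[0])
    handler = TOOLS.get(name)
    if handler is None:
        raise ValueError(f"unknown tool name: {name}")
    return handler(argv[1:])


# A real entry point would call dispatch(sys.argv) and print or act on the result.
```

Installed once, with one hard link or symbolic link per tool name, such a program only stores the shared code a single time on disk.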
About OCamlPro:
OCamlPro has been developing high-value-added applications for more than 10 years, using the most advanced languages, such as OCaml and Rust, aiming at both fast development and robustness, and targeting the most demanding domains (formal methods, cybersecurity, distributed systems/blockchain, design of DSLs). We rely on more than 20 R&D engineers with a unique expertise in programming languages that is theoretical (more than 80% of our engineers hold a PhD in computer science), practical (active participation in the development of several open-source compilers, prototyping of the Tezos blockchain, etc.), diversified (OCaml, Rust, Cobol, Python, Scilab, C/C++, etc.) and applied to many domains. We also provide [tailor-made, Qualiopi-certified training courses on OCaml, Rust, and formal methods](https://training.ocamlpro.com/). To contact us: contact@ocamlpro.com.