The Growth of the OCaml Distribution
We recently worked on a project to build a binary installer for OCaml, inspired by Rustup for Rust. We had to build binary packages of the distribution for every OCaml version since 4.02.0, and we were surprised to discover that their (compressed) size grew from 18 MB to about 200 MB. This post gives a survey of our findings.
Introduction
One of the strengths of Rust is the ease with which it gets installed on a new computer in user space: with a simple command copy-pasted from a website into a terminal, you get everything you need to start building Rust projects in a few seconds. Rustup, together with a set of prebuilt packages for many architectures, is the project that makes all this possible.
OCaml, on the other hand, is a bit harder to install: you need to find in the documentation the proper way for your operating system to install opam, find how to create a switch with a compiler version, and then wait for the compiler to be built and installed. This usually takes much more time.
As a winter holiday project, we worked on a project similar to Rustup, providing binary packages for most OCaml distribution versions. It builds upon our experience of opam and opam-bin, our plugin to build and share binary packages for opam.
While building binary packages for most versions of the OCaml distribution, we were surprised to discover that the size of the binary archive grew from 18 MB to about 200 MB in less than 10 years. On many high-bandwidth connections this is not a problem, but it might become one far from big towns (fortunately, we designed our tool to be able to install from sources in such a case, trading a longer installation for a faster download).
We decided it was worth investigating this growth in more detail, and this post is about our early findings.
General Trends
So, let's have a look at the evolution of the size of the binary OCaml distribution in more detail. Between version 4.02.0 (Aug 2014) and version 5.0.0 (Dec 2022):
- The size of the compressed binary archive grew from 18 MB to 198 MB
- The size of the installed binary distribution grew from 73 MB to 522 MB
- The number of installed files grew from 748 to 2433
On the other hand, the source distribution itself was much more stable:
- The size of the compressed source archive grew only from 3 MB to 5 MB
- The size of the sources grew from 14 MB to 26 MB
- The number of source files grew from 2355 to 4084
For our project, this evolution makes the source distribution a good alternative to binary distributions for low-bandwidth settings, especially as OCaml is much faster than Rust at building itself. For the record, version 5.0.0 takes about 1 minute to build on a 16-core 64GB-RAM computer.
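To put these figures in perspective, the growth factors can be computed directly. The short Python sketch below only restates the numbers quoted above; the names FIGURES and growth_factor are ours and purely illustrative.

```python
# Figures quoted above for versions 4.02.0 and 5.0.0 (sizes in MB, counts in files).
FIGURES = {
    "compressed binary archive (MB)": (18, 198),
    "installed binary distribution (MB)": (73, 522),
    "installed files": (748, 2433),
    "compressed source archive (MB)": (3, 5),
    "sources (MB)": (14, 26),
    "source files": (2355, 4084),
}


def growth_factor(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before


if __name__ == "__main__":
    for label, (before, after) in FIGURES.items():
        print(f"{label}: x{growth_factor(before, after):.1f}")
```

The compressed binary archive grew 11-fold, while every figure for the source distribution grew by less than a factor of 2.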
Interestingly, if we plot the total size of the binary distribution, and the total size with only files that were present in the previous version, we can notice that the growth is mostly caused by the increase in size of these existing files, and not by the addition of new files:
Causes and Consequences
We tried to identify the main causes of this growth: the growth is linear most of the time, with sharp increases (and decreases) at some versions. We plotted the difference in size, for the total size, the new files, the deleted files and the same files, i.e. the files that made it from one version to the next one:
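A minimal sketch of how such a breakdown could be computed, assuming each installed version is described by a mapping from file path to size in bytes. This is an illustration in Python, not the script used for the plots, and the function name size_breakdown is hypothetical.

```python
def size_breakdown(old, new):
    """Split the size change between two versions into three parts.

    `old` and `new` map an installed file path to its size in bytes.
    """
    common = old.keys() & new.keys()
    # Bytes brought by files that only exist in the newer version.
    added = sum(size for path, size in new.items() if path not in old)
    # Bytes freed by files that only existed in the older version.
    deleted = sum(size for path, size in old.items() if path not in new)
    # Net change of the files present in both versions.
    same = sum(new[path] - old[path] for path in common)
    return {
        "new": added,
        "deleted": -deleted,
        "same": same,
        "total": added - deleted + same,
    }
```

By construction, the "total" entry equals the difference between the summed sizes of the two versions, so the three other entries always add up to the overall change.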
Let's have a look at the versions with the highest increases in size:
- +86 MB for 4.08.0: though there are a lot of new files (+307), they only account for 3 MB of additional storage. Most of the difference comes from an increase in size of both compiler libraries (probably related to the use of Menhir for parsing) and of some binaries. In particular:
  - +13 MB for bin/ocamlobjinfo.byte (2_386_046 -> 16_907_776)
  - +12 MB for bin/ocamldep.byte (2_199_409 -> 15_541_022)
  - +6 MB for bin/ocamldebug (1_092_173 -> 7_671_300)
  - +6 MB for bin/ocamlprof.byte (630_989 -> 7_043_717)
  - +6 MB for lib/ocaml/compiler-libs/parser.cmt (2_237_513 -> 9_209_256)
- +74 MB for 4.03.0: again, though there are a lot of new files (+475, mostly in compiler-libs), they only account for 11 MB of additional storage, and a large part is compensated by the removal of ocamlbuild from the distribution, causing a gain of 7 MB. Indeed, most of the increase in size is probably caused by the compilation with debug information (option -g), which considerably increases the size of all executables, for example:
  - +12 MB for bin/ocamlopt (2_016_697 -> 15_046_969)
  - +9 MB for bin/ocaml (1_833_357 -> 11_574_555)
  - +8 MB for bin/ocamlc (1_748_717 -> 11_070_933)
  - +8 MB for lib/ocaml/expunge (1_662_786 -> 10_672_805)
  - +7 MB for lib/ocaml/compiler-libs/ocamlcommon.cma (1_713_947 -> 8_948_807)
- +72 MB for 4.11.0: again, the increase almost only comes from existing files. For example:
  - +16 MB for bin/ocamldebug (8_170_424 -> 26_451_049)
  - +6 MB for bin/ocamlopt.byte (21_895_130 -> 28_354_131)
  - +5 MB for lib/ocaml/extract_crc (659_967 -> 6_203_791)
  - +5 MB for bin/ocaml (17_074_577 -> 22_388_774)
  - +5 MB for bin/ocamlobjinfo.byte (17_224_939 -> 22_523_686)

  Again, the increase is probably related to adding more debug information in the executable (there is a specific PR on ocamldebug for that, and for all executables more debug info is available for each allocation);
- +48 MB for 5.0.0: a big difference in storage is not surprising for a change in a major version, but actually half of the difference just comes from an increase of 23 MB of bin/ocamldoc;
- +34 MB for 4.02.3: this one is worth noting, as it comes at a minor version change. The increase is mostly caused by the addition of 402 new files, corresponding to cmt/cmti files for the stdlib and compiler-libs.
We could of course study some other versions, but understanding the root causes of most of these changes would require going deeper than we can in such a blog post. Still, these figures give experts good hints about which versions to investigate first.
Inside the OCaml Installation
Before concluding, it might also be worth studying which parts of the OCaml Installation take most of the space. 5.0.0 is a good candidate for such a study, as libraries have been moved to separate directories, instead of all being directly stored in lib/ocaml.
Here is a decomposition of the OCaml Installation:
- Total: 529 MB
  - share: 1 MB
  - man: 4 MB
  - bin: 303 MB
  - lib/ocaml: 223 MB
    - compiler-libs: 134 MB
    - expunge: 20 MB
As we can see, most of the space is used by executables. For example, all of the following are above 10 MB:
- 28 MB ocamldoc
- 26 MB ocamlopt.byte
- 25 MB ocamldebug
- 21 MB ocamlobjinfo.byte, ocaml
- 20 MB ocamldep.byte, ocamlc.byte
- 19 MB ocamldoc.opt
- 18 MB ocamlopt.opt
- 15 MB ocamlobjinfo.opt
- 14 MB ocamldep.opt, ocamlc.opt, ocamlcmt
There are both bytecode and native code executables in this list.
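A decomposition like the one above can be reproduced on any installation with a few lines of code. The Python sketch below is an illustration only (the function name size_by_top_level is hypothetical): it sums file sizes per top-level directory and skips symbolic links, so that a link pointing to another executable is not counted twice.

```python
import os


def size_by_top_level(root):
    """Sum file sizes (in bytes) under each top-level entry of `root`.

    Files stored directly in `root` are reported under the key ".".
    """
    totals = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks: their target is counted on its own
            totals[top] = totals.get(top, 0) + os.path.getsize(path)
    return totals
```

Called on the prefix of an installed compiler, it returns one entry per directory such as bin, lib, man and share; calling it again on lib/ocaml gives the next level of detail.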
Conclusion
Our installer project would benefit from having a smaller binary OCaml distribution, but most OCaml users in general would also benefit from that: after a few years of using OCaml, OCaml developers usually end up with huge $HOME/.opam directories, because each opam switch often takes more than 1 GB of space, and the OCaml distribution takes a big part of that. opam-bin partially solves this problem by sharing equal files between several switches (when the --enable-share configuration option has been used).
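The potential gain from sharing identical files can be estimated with a short script. The Python sketch below is only an illustration: it is not how opam-bin implements sharing, and the function names are hypothetical. It assumes a POSIX-like file system where the pair (st_dev, st_ino) identifies a file, so that files already hard-linked together are counted once.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def reclaimable_bytes(roots):
    """Bytes that would be saved if identical files under `roots` were stored once."""
    seen_digests = set()
    seen_inodes = set()
    saved = 0
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    continue  # symlinks take no significant space
                st = os.stat(path)
                inode = (st.st_dev, st.st_ino)
                if inode in seen_inodes:
                    continue  # already shared through a hard link
                seen_inodes.add(inode)
                digest = file_digest(path)
                if digest in seen_digests:
                    saved += st.st_size  # duplicate content: could be shared
                else:
                    seen_digests.add(digest)
    return saved
```

Called with the paths of two or more switches (for example directories under $HOME/.opam), reclaimable_bytes returns the number of bytes held by duplicate copies.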
Here is a short list of ideas to test to decrease the size of the binary OCaml distribution:
- Use the same executable for multiple programs (ocamlc.opt, ocamlopt.opt, ocamldep.opt, etc.), using the first command argument (the name under which the program was invoked) to choose the behavior to have. Rustup, for example, only installs one binary in $HOME/.cargo/bin for cargo, rustc, rustup, etc. and actually, our tool does the same trick to share the same binary for itself, opam, opam-bin, ocp-indent and drom.
- Split installed files into separate opam packages, of which only one would be installed as the compiler distribution. For example, most cmt files of compiler-libs are not needed by most users: they might only be useful for compiler/tooling developers, and even then, only in very rare cases. They could be installed as another opam package.
- Remove the -linkall flag on ocamlcommon.cm[x]a libraries. In general, such a flag should only be set when building an executable that is expected to use plugins, because otherwise, this executable will contain all the modules of the library, even the ones that are not useful for its specific purpose.