OCamlPro Feed

OCaml Onboarding: Introduction to the Dune build system

2025-07-29T09:05:17Z

A camel sitting atop a dune in the middle of the desert. He wears his hard hat as he takes a break from all the building and running of OCaml code. Want to start building your own? Follow the tracks below.

Welcome to all Camleers

We are back with another practical walkthrough for the newcomers of the OCaml ecosystem. We understand from the feedback we have gathered over the years that getting started with the OCaml Distribution can sometimes be perceived as challenging at first. That's why we keep it in mind when planning each post - to make your onboarding smoother and more approachable.

Case in point: today's topic, which came to us during the making of our latest opam deep-dive: Opam 103: Bootstrapping a New OCaml Project with opam.

It occurred to us that we were assuming a level of familiarity with the toolchain that we had never explicitly explained or clarified. We decided to put together a short, practical guide for newer developers looking for quick, on-the-fly tutorials for OCaml. 🛠️

A Camleer's basics: Dune

If you're new to OCaml, or any other programming language for that matter, the first necessities you'll encounter are building, running, and testing your code. Fortunately, there is a powerful build system called dune that we can use. It is widespread and makes project setup and compilation straightforward. Understanding how dune works is a key step towards becoming productive in the OCaml ecosystem.

In this article, we’ll walk you through the essentials of using dune to build libraries, executables, and tests, and to manage your project structure. Whether you're writing your first OCaml program or stepping into a new dune-based codebase, this guide will help you get up and running quickly.

We strongly believe that starting from scratch is key when approaching a brand new technical topic — and today's topic is no exception. Anyone who has ever felt lost exploring a new codebase knows that minimal, toy examples are often the best way to build intuition.

Table of contents

A Camleer's basics: Dune
Resources
Project metadata and build specification files
- dune-project
- dune file
  - Key stanzas
Build and run your project
Test your project with Dune
- Cram tests
- dune runtest
Scaffolding with dune init

Resources

As said previously, this article was written in the context of the latest Opam 103: Bootstrapping a New OCaml Project with opam. That article explained how an OCaml developer should go about structuring an OCaml project when they intend to use it with opam.

The point of today's topic is to focus on the other defining parameter of the structure of an OCaml project: your build system. The goal is to show how the workflows of opam and dune fit together, while giving you a solid introduction to the fundamentals of dune.

We're using the same toy project helloer as basis for this rundown. It's a simple, well-scoped example with a structure that's idiomatic to both opam and dune, making it a great fit for illustrating the fundamentals without unnecessary complexity.

Note that helloer was not created using dune init, which we will introduce at the end of this article. First, it's important to understand how Dune works under the hood, so you know what it's generating for you, how to modify it confidently, and how it fits into your overall build workflow.

Consider checking Dune's official reference manual or visiting the official OCaml Discuss forum to reach out to the OCaml Community.

Project metadata and build specification files

dune-project

Let's first start with the dune-project file since every Dune-driven project should have one at its root.

This file is the entry point for your project and its contents are its metadata — which Dune uses to understand how your project is structured.

Said metadata includes things like:

the version of dune you're using;
important URLs for your project's lifecycle;
optional settings like dependencies, licensing, and documentation;
and even configuration for automatic opam file generation. More on that in Opam 103.

This information not only guides Dune, but also helps tools like opam understand how to build, distribute, and document your project.

$ cat dune-project
(lang dune 3.15)
(package (name helloer))

(cram enable)

Note: The first line must be (lang dune X.Y) - with no comments or extra whitespace. This line determines which features and syntax dune will recognize.

NB: You will find all complementary information in the official docs 👈.

dune file

A dune file is a build specification file that tells Dune how to compile the OCaml code within a specific directory.

Usually there's one dune file per subdirectory, with the description of what's there - library, executable, or some tests. Since our toy helloer project is flat in structure, we’ll place this file at the root of the project.

$ cat dune
(library
 (name helloer_lib)
 (modules helloer_lib)
)

(executable
 (public_name helloer)
 (name helloer)
 (libraries cmdliner helloer_lib)
 (modules helloer)
)

(test
 (name test)
 (libraries alcotest helloer_lib)
 (modules test)
)

In effect, this tells dune:

how to build the OCaml files in that directory;
how libraries, executables, and test targets are defined.

Key stanzas

In the context of Dune, a stanza is just a fancy word for a block of configuration. It tells the build system what kind of artifact you want to define — be it a library, an executable, a test, a documentation alias, or even an installable binary. Each stanza lives inside a dune file and follows a structured, declarative syntax.

They’re usually grouped by purpose, and each type comes with its own expected fields. Each of these stanzas deserves a deeper dive, but here's a quick overview to get you started.
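Since stanzas are written as s-expressions, their shape can be modelled in a few lines of OCaml. The sketch below only illustrates the syntax: the sexp type and the to_string function are hypothetical helpers written for this illustration, not Dune's own parser or API.

```ocaml
(* A minimal s-expression model: an atom, or a parenthesised list.
   Hypothetical helper types, not part of Dune. *)
type sexp = Atom of string | List of sexp list

(* Render an s-expression on a single line. *)
let rec to_string = function
  | Atom a -> a
  | List items -> "(" ^ String.concat " " (List.map to_string items) ^ ")"

(* The library stanza of the helloer project: a stanza name followed by fields. *)
let library_stanza =
  List
    [ Atom "library";
      List [ Atom "name"; Atom "helloer_lib" ];
      List [ Atom "modules"; Atom "helloer_lib" ] ]

let () = print_endline (to_string library_stanza)
(* prints: (library (name helloer_lib) (modules helloer_lib)) *)
```

Seen this way, every stanza is a list whose first atom names the kind of artifact and whose remaining items are its fields.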

`library` stanza

(library
 (name helloer_lib)
 (modules helloer_lib)
)

A library stanza tells Dune how to compile a set of modules into a reusable package.

Purpose of this stanza:

defines a library named helloer_lib;
which will be built from the module helloer_lib.ml (by default, each .ml file defines a module with the same name);
the modules field lists the modules of this directory that belong to the library. It is needed here because our flat directory also contains helloer.ml and test.ml, which belong to the executable and the test; without it, the library would claim every module in the directory.

OCaml module names should match the filename, so helloer_lib.ml is expected to exist in this directory.
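To make this concrete, here is a minimal sketch of what helloer_lib.ml could contain. The function name greeting and its ~gentle argument are assumptions made for illustration; only the two messages come from the program output shown in this article.

```ocaml
(* Hypothetical contents of helloer_lib.ml. Dune compiles this file into the
   module Helloer_lib, so other code would call Helloer_lib.greeting. *)

(** [greeting ~gentle] is the message to print: the gentle variant when
    [gentle] is [true], the default one otherwise. *)
let greeting ~gentle =
  if gentle then "Welcome my dear OCamlers." else "Hello OCamlers!!"
```

Any stanza that lists helloer_lib in its libraries field can then refer to the module by its capitalised name, Helloer_lib.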

`executable` stanza

(executable
 (public_name helloer)
 (name helloer)
 (libraries cmdliner helloer_lib)
 (modules helloer)
)

An executable stanza explains how to bundle up some code into a runnable binary.

Purposes:

name: builds an executable named helloer;
libraries: it needs the external cmdliner library (for CLI parsing) and the internal helloer_lib (our own library);
public_name: makes the executable available publicly under the name helloer. It is used by dune install helloer in the opam file, for instance.

You can learn how to find and install cmdliner with opam in the latest Opam 103 blog post. You'll find a simple breakdown of opam files there too.

`test` stanza

(test
 (name test)
 (libraries alcotest helloer_lib)
 (modules test)
)

What it does:

declares a test target named test, defined in the file test.ml. A test stanza registers the executable as part of the runtest rule alias, meaning it will be compiled and run automatically when you invoke dune runtest (or its alias dune test);
uses the alcotest testing library;
also uses helloer_lib to test its functionality.
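As an illustration, a test.ml matching this stanza could look like the sketch below. The suite name Tests, the group messages, the case names normal and gentle, and the label same string mirror the alcotest report shown later in this article; the greeting function is a local stand-in (an assumption) for whatever helloer_lib actually exposes, so that the file only depends on alcotest.

```ocaml
(* Local stand-in for the function under test; in the real project this
   would come from the helloer_lib library (hypothetical name). *)
let greeting ~gentle =
  if gentle then "Welcome my dear OCamlers." else "Hello OCamlers!!"

(* Each test compares an expected string with the value actually returned. *)
let test_normal () =
  Alcotest.(check string) "same string" "Hello OCamlers!!"
    (greeting ~gentle:false)

let test_gentle () =
  Alcotest.(check string) "same string" "Welcome my dear OCamlers."
    (greeting ~gentle:true)

(* One suite named "Tests" with a single group "messages" of two cases. *)
let () =
  Alcotest.run "Tests"
    [ ( "messages",
        [ Alcotest.test_case "normal" `Quick test_normal;
          Alcotest.test_case "gentle" `Quick test_gentle ] ) ]
```

Alcotest.check takes the expected value first and the actual value second, which is why a failure report lists them as Expected and Received.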

Now your project is set up and structured. Next, let’s see how to build it.

Build and run your project

`dune build`

As you can see below, the dune build @all command builds all the targets defined in your dune files. This is also the default behaviour of a plain dune build.

$ tree
.
├── dune
├── dune-project
├── helloer_lib.ml
├── helloer.ml
├── helloer.opam
└── test.ml
$ dune build @all

$ tree -L 2
.
├── _build
│   ├── default
│   │   ├── helloer.exe      // executable in its build dir
│   │   ├── helloer_lib.cmxs // built library
│   │   ├── test.exe         // test executable
│   │   └── [...]
│   ├── install
│   └── log
├── dune
├── dune-project
├── helloer_lib.ml
├── helloer.ml
├── helloer.opam
└── test.ml

Explanation:

@all is an alias that includes all buildable targets defined in your dune files: executables, libraries, tests, docs, etc;
it is useful for doing a full build to ensure everything compiles.

You can also use other built-in aliases (like @doc, @runtest, etc.), or define your own in your dune files.

`dune build @doc`

Once your code builds and your project has a proper dune-project file, you can generate documentation using:

$ dune build @doc

What it does:

uses odoc behind the scenes to build API docs from your OCaml code. This implies that installing odoc is mandatory to benefit from this feature, a simple opam install odoc will do just fine;
builds HTML files in _build/default/_doc/_html/.

Make sure your dune-project file includes a (package ...) stanza, and that your libraries are properly documented using OCaml comments (** your comment *).

You can see the generated documentation for the toy project here.

NB: You will find all complementary information in the official docs 👈.

After building, you can view the generated docs:

$ open _build/default/_doc/_html/index.html

This is great for checking your module interfaces or publishing documentation online.

`dune exec --`

This command is used to run executables defined in your project.

So, something like:

$ dune exec -- ./helloer.exe
Hello OCamlers!!
$ dune exec -- ./helloer.exe --gentle
Welcome my dear OCamlers.

This tells dune to build the executable if necessary, then run it. The -- separates the dune options from the executable and its arguments. The first item after -- is the executable to run.

This can be:

A relative path to a built target, so: dune exec -- ./path/to/executable
A public name of an installed executable, meaning: dune exec -- helloer.

All additional arguments after the executable name (like --gentle) are passed to the executable itself.
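On the OCaml side, those arguments arrive in Sys.argv. The sketch below uses only the standard library to show the idea; it is a deliberately simplified stand-in, since the real helloer relies on cmdliner for CLI parsing, and the function names wants_gentle and message are hypothetical.

```ocaml
(* argv.(0) is the program name; the user's arguments start at index 1. *)
let wants_gentle argv =
  Array.length argv > 1
  && Array.exists (String.equal "--gentle")
       (Array.sub argv 1 (Array.length argv - 1))

(* Choose the message according to the arguments received. *)
let message argv =
  if wants_gentle argv then "Welcome my dear OCamlers." else "Hello OCamlers!!"

let () = print_endline (message Sys.argv)
```

With dune exec -- ./helloer.exe --gentle, the array seen by the program holds the executable name followed by --gentle; the options placed before -- are consumed by dune and never reach it.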

Essentially, dune exec -- COMMAND behaves the same way as calling dune install first and then COMMAND sequentially.

NB: If you'd like to copy the executable to your project root (outside _build/), you can add (promote (until-clean)) to your executable stanza.

Great, our little project builds and runs smoothly. Now, onto testing it.

Test your project with Dune

In our helloer project, we use the alcotest library to test our internal helloer_lib. This is quite standard. However, testing the executable itself can be done without depending on an external tool, with the help of cram tests.

Cram tests

Dune supports a special kind of test called a cram test, inspired by the original Cram, which checks that command-line examples produce the expected output.

The "expected output" is written in the shell session itself: whatever your executable prints during the test run for a given command is checked against the output recorded under that command.

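To create a cram test, you just write a .t file that contains a succession of shell-like sessions. Each command starts with two spaces followed by $, and its expected output is indented by two spaces as well; unindented lines are free-form comments:

Default behaviour
  $ helloer
  Hello OCamlers!!

Gentle behaviour
  $ helloer --gentle
  Welcome my dear OCamlers.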

How it works:

it runs the commands in .t files;
it compares what is printed to stdout by our binary to the expected output written in the cram file;
fails if the outputs differ. However, you can use dune promote whenever you wish to replace all the failed tests with the new output, which will most often happen when you make changes to your binary's printing to stdout.
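To build intuition about the file format, here is a toy OCaml model of how the commands of a cram file can be told apart from comments and expected output. It assumes the usual cram layout, where a command line starts with two spaces followed by "$ "; the function commands_of_cram is a hypothetical helper, not Dune's implementation.

```ocaml
(* Toy model of reading a cram file; not Dune's implementation. *)
let command_prefix = "  $ "

(* Keep only the command lines and strip their prefix. *)
let commands_of_cram text =
  let n = String.length command_prefix in
  String.split_on_char '\n' text
  |> List.filter_map (fun line ->
         if String.length line >= n && String.sub line 0 n = command_prefix
         then Some (String.sub line n (String.length line - n))
         else None)

let example =
  "Default behaviour\n\
  \  $ helloer\n\
  \  Hello OCamlers!!\n\
   Gentle behaviour\n\
  \  $ helloer --gentle\n\
  \  Welcome my dear OCamlers.\n"

let () = List.iter print_endline (commands_of_cram example)
```

Everything indented under a command that is not itself a command is the expected output; that is the part dune promote rewrites when the program's output changes.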

You can test it here.

`dune runtest`

You can run all your tests using:

$ dune runtest

it builds test targets defined in your project;
it looks for cram files (ending in .t) and for the tests declared in your dune files;
it executes them, whether they are cram tests, expect-style tests (like ppx_expect), or test executables built with a library like alcotest.

It's quite straightforward: whether your tests come from a test stanza, a cram file, or a library with an inline_tests field, dune runtest will run them and tell you if anything failed.

For example, a successful run will output something like the following. The report comes from the alcotest executable; cram tests that pass print nothing:

$ dune runtest
Testing `Tests'.                 
This run has ID `N39NJ5ZE'.

  [OK]          messages          0   normal.
  [OK]          messages          1   gentle.

Full test results in `~/ocamler/dev/helloer/_build/default/_build/_tests/Tests'.
Test Successful in 0.000s. 2 tests run.

However, if one of these tests were to fail, you would see something like:

$ dune runtest
File "test.t", line 1, characters 0-0:
diff --git a/_build/.sandbox/e6d6dcfb864b62e42104889af2a44f23/default/test.t b/_build/.sandbox/e6d6dcfb864b62e42104889af2a44f23/default/test.t.corrected
index f79b63c..70c7a17 100644
--- a/_build/.sandbox/e6d6dcfb864b62e42104889af2a44f23/default/test.t
+++ b/_build/.sandbox/e6d6dcfb864b62e42104889af2a44f23/default/test.t.corrected
@@ -3,7 +3,7 @@ Default behaviour
   Hello OCamlers!!
 Gentle behaviour
   $ helloer --gentle
-  Welcome my deer OCamlers.
+  Welcome my dear OCamlers.
 Unknown behaviour
   $ helloer --unknown
   helloer: unknown option '--unknown'.
File "dune", line 16, characters 7-11:       
16 |  (name test)
            ^^^^
Testing `Tests'.
This run has ID `1OS0H3WP'.

  [OK]          messages          0   normal.
> [FAIL]        messages          1   gentle.

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ [FAIL]        messages          1   gentle.                                                                                              │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
ASSERT same string
FAIL same string

   Expected: `"Welcome my deer OCamlers."'
   Received: `"Welcome my dear OCamlers."'

Raised at Alcotest_engine__Test.check in file "src/alcotest-engine/test.ml", lines 216-226, characters 4-19
Called from Alcotest_engine__Core.Make.protect_test.(fun) in file "src/alcotest-engine/core.ml", line 186, characters 17-23
Called from Alcotest_engine__Monad.Identity.catch in file "src/alcotest-engine/monad.ml", line 24, characters 31-35

Logs saved to `~/ocamler/dev/helloer/_build/default/_build/_tests/Tests/messages.001.output'.
 ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Full test results in `~/ocamler/dev/helloer/_build/default/_build/_tests/Tests'.
1 failure! in 0.000s. 2 tests run.

At this point in the development process, we can assume that you know how to use the most basic dune command-lines to make your OCaml projects a reality!

Now that we’ve explored how Dune works at a foundational level — writing stanzas by hand, managing libraries and executables, building, running, testing — you’re probably starting to see patterns. These project ingredients don’t change much from one small OCaml project to the next. That’s exactly where dune init comes in.

Scaffolding with dune init

dune init is the starting point for creating a new OCaml project using Dune. It scaffolds a working directory structure and sets up the essential files you’ll need.

Rather than writing every file from scratch, Dune offers a command-line scaffolding tool that sets up a complete, minimal project for you — so you can jump straight to writing code with a solid structure already in place.

This means the following command is all you need to scaffold a basic project:

$ dune init project helloer

What it does:

creates a new directory helloer with a working OCaml project inside;
sets up the dune-project file;
adds sample source files and their associated dune build files.

The structure you'll get looks like this:

$ tree
helloer/
├── bin/
│   ├── dune
│   └── main.ml
├── dune-project
├── lib/
│   └── dune
├── test/
│   ├── dune
│   └── test_helloer.ml
└── [...]

From here, you can build on this template by adding libraries, tests, and more.

If your project is only a library or binary, you can use the other project template with dune init lib helloer or dune init exec helloer.

Sharp-eyed readers may notice differences between our toy project and the layout generated by dune init.

You can see the end result in this branch.

Conclusion

By now, you should be comfortable with the basic building blocks of a dune-based OCaml project: initializing it, defining libraries and executables, running it, writing tests, and even generating documentation. dune takes care of a lot of the heavy lifting, letting you focus on writing code rather than fiddling with build scripts. As you grow more confident with OCaml and Dune, you’ll discover even more powerful features, but for now, you’re well-equipped to start building real-world OCaml applications.

opam 2.4 release

2025-07-23T09:05:17Z

Feedback on this post is welcomed on Discuss!

We are extremely happy to announce the release of opam 2.4.0 and encourage all users to upgrade. Please read on for installation and upgrade instructions.

Major changes

On opam init, the compiler chosen for the default switch will no longer be ocaml-system (#3509). This was done because the system compiler (i.e. your OCaml installed system-wide, e.g. /usr/bin/ocaml) is known to be under-tested and prone to a variety of bugs and configuration issues. Removing it from the default compiler gives newcomers a smoother experience. Note: if you wish to use it anyway, you can always do so explicitly using opam init --compiler=ocaml-system.
GNU patch and the diff command are no longer runtime dependencies. Instead, the OCaml patch library is used (#6019, #6052, #3782, ocaml/setup-ocaml#933). In doing so, we've removed some rarely used features of GNU patch, such as the support for context diffs. The new implementation only supports unified diffs, including the git extended headers; however, file permission changes via said extended headers have no effect.
Add Nix support for external dependencies (depexts) by adding support for stateless package managers (#5982). Thanks to @RyanGibb for this contribution.
Fix opam install <local_dir> with and without options like --deps-only or --show-action having unexpected behaviours (#6248, #5567) such as:
- reporting Nothing to do despite dependencies or package not being up-to-date
- asking to install the wrong dependencies

UI changes

opam show now displays the version number of packages flagged with avoid-version/deprecated in gray (#6354)
opam upgrade: Do not show the message about packages "not up-to-date" when the package is tagged with avoid-version/deprecated (#6271)
Fail when trying to pin a package whose definition could not be found instead of forcing interactive edition (e.g. this could happen when making a typo in the package name of a pin-depends) (#6322)

New commands / options

Add opam admin compare-versions to compare package versions for sanity checks. Thanks to @mbarbin for this contribution
Add opam lock --keep-local to keep local pins url in pin-depends field (#4897)
Add opam admin migrate-extrafiles which moves all extra-files of an existing opam repository into extra-sources. Thanks to @hannesm for this contribution
The -i/--ignore-test-doc argument has been removed from opam admin check (#6335)

Other noteworthy changes

opam pin/opam pin list now displays the current revision of a pinned repository in a new column. Thanks to @desumn for this contribution
Symlinks in repositories are no longer supported (#5892)
Fix sandboxing support in NixOS (#6333)
Add the OPAMSOLVERTOLERANCE environment variable to allow users to fix solver timeouts for good (#3230)
Fix a regression on opam upgrade <package> upgrading unrelated packages (#6373). Thanks to @AltGr for this contribution
Fix pin-depends for with-* dependencies when creating a lock file (#5428)
opam admin check now sets with-test and with-doc to false instead of true
Add apt-rpm/ALTLinux family support for depexts. Thanks to @RiderALT for this contribution
Fix the detection of installed external packages on OpenBSD to not just consider manually installed packages (#6362). Thanks to @semarie for this contribution
Disable the detection of available system packages on SUSE-based distributions (#6426)


Changes

opam switch create [name] <version> will not include compiler packages flagged with avoid-version/deprecated in the generated invariant anymore (#6494). This will allow opam to avoid the use of the ocaml-system package unless actually explicitly requested by the user. The opam experience when the ocaml-system compiler is used is known to be prone to a variety of bugs and configuration issues.
Cygwin: Fallback to the existing setup-x86_64.exe if its upgrade failed to be fetched (#6495, partial fix for #6474)
Fix a memory leak happening when running large numbers of commands or opening large number of opam files (#6484). Thanks to @hannesm for this contribution
Remove handling of the OPAMSTATS environment variable (#6485). Thanks to @hannesm for this contribution

Changes

Fixed some bugs in opam install --deps-only (and other commands simulating package pins, such as --depext-only) more visible in 2.4:
- When a package pkg is already installed and opam install ./pkg --deps is called, if there is a conflict between the installed pkg dependencies and the definition of the local pkg, the conflict was not seen and the already installed pkg was kept (#6529)
- No longer fetch and write the sources when simulating packages that were already pinned (#6532)
- opam was triggering the reinstall of the package based on the already pinned packages instead of the expected newly simulated pinned packages (#6501)
- opam was using the opam description of the wrong package in some cases (#6535)
Change the behaviour of --deps-only, where it no longer requires uniqueness of package version between the request and the installed packages. In other words, if you have pkg.1 installed, installing dependencies of pkg.2 no longer removes pkg.1. This also allows installing dependencies of conflicting packages when their dependencies are compliant. (#6520)

Windows binary

Improve the prebuilt Windows binaries by including Cygwin's setup-x86_64.exe in the binary itself as fallback, in case cygwin.com is inaccessible (#6538)

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

Opam 103: Bootstrapping a New OCaml Project with opam

2025-04-19T09:05:17Z

A young camel is ready to leave on its first journey through the desert, he is well-prepared and has the perfect tools at his disposal!

Curious about the origins of opam?

Check out this short history on its evolution as the de facto package manager and environment manager for OCaml.

Welcome back to the `opam deep-dives` series!

Finally - you've asked for it since our very first opam deep-dive: it's time to explore the developer side of the opam experience.

So far, we have focused on user-facing scenarios to provide a gentle introduction. Now, we are shifting gears into project creation and development workflows.

Thank you for your patience - the wait was worth it! Today, we will guide you through starting a new OCaml project with a full opam-integrated workflow. 🚀

This guide is especially geared toward newer OCaml developers who want to master opam when setting up and managing a project. 😇

Table of contents

Prerequisites & Context
Setting up the environment
- Creating a new local switch
Getting started
Your first opam file
- A minimal functional opam file
- A real-world opam file
Conclusion

If you haven't yet, we recommend starting with Opam 101: The First steps to get comfortable with installation and usage basics and Opam 102: Pinning Packages, which already dives quite deep into package pinning, one of the first keys to tailoring your workflow and environment to your exact needs.

Also, check out the tags of each article to get an idea of the entry level required for the smoothest read possible!

Prerequisites & Context

Our goal across this post and the next one is to guide you through the full life cycle of an OCaml project - from creating a directory on your machine to publishing your package to the official opam-repository.

We will walk through each step of the journey, highlighting not just how to do things, but also why they matter in a pragmatic, real-world opam workflow. You’ll see how to:

Create and manage local switches
Select and install packages
Prepare your project for distribution

This post assumes you've read Opam 101: The First Step - especially the section on switches.

Nevertheless, here's a quick TL;DR for those of you who would rather get started:

What is an opam switch?

An opam switch is a development environment in the OCaml world. Opam provides a command-line interface that lets you customise and maintain a safe and stable environment. A switch is defined by a specific version of the OCaml compiler and a set of versioned packages that are compatible with it.

Functionally, it is a set of environment variables that are user-updated and point to the different locations of installed versions of packages, binaries and other utilities either in a ~/.opam directory for global switches or in the current _opam directory for local ones.
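One of those variables can be inspected directly from OCaml. The sketch below reads OPAM_SWITCH_PREFIX, which opam exports to point at the root of the active switch; the function describe_switch is a hypothetical helper used only for this illustration.

```ocaml
(* Turn the (possibly missing) switch prefix into a readable message. *)
let describe_switch = function
  | Some prefix -> "active switch prefix: " ^ prefix
  | None -> "no opam switch found in the environment"

(* Sys.getenv_opt returns None when the variable is not set. *)
let () = print_endline (describe_switch (Sys.getenv_opt "OPAM_SWITCH_PREFIX"))
```

For a local switch, the prefix is expected to end in the project's _opam directory, while for a global switch it lives under opam's root directory.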

👉 More in the official docs.

In this tutorial, we'll use local switches, which are especially well-suited for project-based workflows like the one we are building today. Furthermore, know that this article uses opam 2.1.5!

Ready? Let's dive in!

Setting up the environment

Before publishing an OCaml package, you have to develop it - and that means setting up your environment.

This encompasses everything from creating the working directory of your new project, to setting up a custom, local switch for it.

We will consider that you have created a new directory for your project and have since moved into it in order to progress further in the setup process.

Something like:

$ mkdir helloer
$ cd helloer

Here are the things that opam will help you accomplish at this stage of the development process:

Setting up a new switch (i.e., environment creation);
Exploring the packages (libraries / tooling) available in the OCaml ecosystem (i.e., technical exploration);
Selecting and installing OCaml software inside a switch (i.e., environment setup and tailoring).

Creating a new local switch

A switch is the isolated environment in which opam will operate and assist you in taking all the necessary steps towards an optimal workflow.

It defines a specific OCaml compiler version and a set of compatible packages, allowing you to safely build and manage your project.

So let's first create a local switch in our helloer directory.

$ opam switch create .

<><> Installing new switch packages <><><><><><><><><><><><><><><><><><><><><><>
Switch invariant: ["ocaml" {>= "4.05.0"}]

We let opam select the default switch invariant when creating a new switch which is OCaml compiler version >= 4.05.0. You can define any set of switch invariants that you wish.

In the call above, the . character indicates that we are asking opam to create a switch inside the current directory, a local one. This differs from a global switch, which lives in your ~/.opam/ folder.

What's a “switch invariant”?

The idea of switch invariants is quite simple: they are the parameters of the automatic solving of package dependency trees. More specifically, this switch invariant defines the OCaml version your environment relies on. Invariants are stable: opam will never change them without notifying you first, and it will always take the switch invariant into account when building the graph of available and compatible packages for your current switch, or for any other switch-altering operation for that matter.
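To get a feel for what a constraint such as "ocaml" {>= "4.05.0"} means, here is a deliberately simplified OCaml sketch. It assumes purely numeric, dot-separated versions and compares them component by component; opam's real version ordering is more general (it follows Debian-style rules), and the helpers parse_version and satisfies_at_least are hypothetical.

```ocaml
(* Split "4.14.1" into [4; 14; 1]. Assumes numeric components only. *)
let parse_version v = String.split_on_char '.' v |> List.map int_of_string

(* Lists of integers compare lexicographically, component by component. *)
let satisfies_at_least ~minimum version =
  compare (parse_version version) (parse_version minimum) >= 0

let () =
  List.iter
    (fun v ->
      Printf.printf "%s >= 4.05.0: %b\n" v
        (satisfies_at_least ~minimum:"4.05.0" v))
    [ "4.14.1"; "4.02.3" ]
```

Under this simplified rule, the 4.14.1 compiler selected in the example satisfies the invariant, whereas a 4.02.3 compiler would be rejected by the solver.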

So, back to our example:

$ opam switch create .

<><> Installing new switch packages <><><><><><><><><><><><><><><><><><><><><><>
Switch invariant: ["ocaml" {>= "4.05.0"}]

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><><><>
∗ installed base-bigarray.base
∗ installed base-threads.base
∗ installed base-unix.base
∗ installed ocaml-system.4.14.1
∗ installed ocaml-config.2
∗ installed ocaml.4.14.1
Done.

We can see that opam selected ocaml-system.4.14.1 as opposed to ocaml-base-compiler.4.14.1 as the OCaml compiler to install in your current local switch along with its dependencies. What's the difference?

ocaml-system is a system-bound compiler, typically one already installed on your system (e.g. via apt, brew, etc.), outside of your opam installation
ocaml-base-compiler would be a new compiler installed within your opam installation, one that opam would have permission over.

If you recall this section of Opam 101, you should know that creating a switch can be a fairly time-consuming task depending on whether or not the compiler version you have queried from opam is already installed somewhere on your machine. Therefore, every time you ask opam to install a version of the compiler, it will first scour your installation for a locally available version of that compiler to save you the time necessary for downloading, compiling and installing a brand new one. This is the reason why opam has selected an ocaml-system.4.14.1 compiler instead of installing a brand new ocaml-base-compiler.4.14.1.

A quick look at our current directory will show that an _opam directory can now be found.

$ ls
_opam

And opam switch will list the currently available switches:

$ opam switch
#  switch                        compiler             description
→  /home/ocamler/dev/helloer     ocaml.4.14.1         /home/ocamler/dev/helloer
   my-switch                     ocaml-system.4.14.1  my-switch

[NOTE] Current switch has been selected based on the current directory.
       The current global system switch is my-switch.

opam indicates that it has selected the local switch as the currently active one with the → character and then tells us that the currently active global switch outside of this directory is still a previously created one called my-switch.

Local switches were explained in detail in this section of Opam 101. We learned in it that opam automatically selects the local switch as the currently active one as soon as we move inside the directory in which it was created.

opam list will show us what packages are currently installed in our switch:

$ opam list
# Packages matching: installed
# Name        # Installed # Synopsis
base-bigarray base
base-threads  base
base-unix     base
ocaml         4.14.1      The OCaml compiler (virtual package)
ocaml-config  2           OCaml Switch Configuration
ocaml-system  4.14.1      The OCaml compiler (system version, from outside of opam)

Which is simply the list of dependencies of the OCaml compiler (since it is the only thing we have installed in our switch so far).

To confirm which OCaml binary is being used:

$ ocaml -vnum
4.14.1
$ which ocaml
/usr/bin/ocaml

This confirms you are using the system compiler (not an opam-installed one), as the path points to neither the global ~/.opam directory nor the local /home/ocamler/dev/helloer/_opam directory.

In the case of a global switch, the following would be true:

$ ocaml -vnum
4.14.1
$ which ocaml
/home/ocamler/.config/opam/my-global-switch/bin/ocaml

To verify that everything works, and since we have a compiler, let's compile a program:

$ cat helloer.ml
let () =
  print_endline "Hello OCamlers!!"
$ ocamlc -o hello helloer.ml
$ ./hello
Hello OCamlers!!

Nice! You've just compiled your first OCaml program in a fresh local switch. 🎉

Of course, this isn't very exciting yet. Let’s spice things up by adding external libraries to our helloer program.

Getting started

helloer is a toy project - with it we can play around with the tools at hand and learn a thing or two about them.

However, if you're curious about the source code of the project, you can check it out right here.

Do keep in mind that this is rather remote from what you will encounter in the wild. Both the code base and the structure of the repository are made intentionally barebones. This will allow us to smoothly introduce what we believe are the most common features a beginner OCaml dev should get familiar with.

Looking for the necessary tools with `opam search`

One of the most valuable skills in OCaml development is selecting the right libraries for the job. Over time, you will build a mental toolbox of go-to libraries for all kinds of engineering goals.

You can explore the package ecosystem in several ways.

From the command line:

opam search <keyword>: find packages

opam show <package>: see details of a package

From the web:

opam.ocaml.org/packages

ocaml.org/packages

Choose a build system

The first tool your OCaml project needs is a build system.

We are using Dune in this tutorial because it is the most widely used build system in the OCaml ecosystem today. It is well integrated with opam, supports both libraries and executables, and offers fast incremental builds with dependency tracking. Most packages in the opam-repository use dune as their build system.

That said, OCaml also has other build tools, either still used in specific contexts or maintained for compatibility - for example, ocp-build, ocamlbuild, topkg or just make and ocamlc combined. In some cases, these alternative tools can feel more lightweight or straightforward, especially for very small projects or when fine-grained control is needed. Choosing the right tool often depends on your project’s scope and your familiarity with the ecosystem. For most new development, however, dune remains the most common and actively maintained choice.

If you happen to look for guidance or any kind of support for your OCaml developments, keep in mind that the Discuss OCaml Community Forum is the best place to engage with your peers!

We believe that introducing you to the current most common practices of the OCaml Community is a solid way to get you going.

You can expect us to cover in detail how to properly begin a project with dune in upcoming blogposts.

A call to opam install will change the state of our current switch by installing this build system:

$ opam install dune

You can now get to writing your first dune-project file. You may either refer to the documentation or get some inspiration from the dune-project file we provide at the end of this article!

Adding Command-Line Tools

Now onto finding a library to help us build a neat command-line interface for helloer.

We know that the OCaml Standard Library ships an Arg module for parsing command-line arguments. However, it is quite limited and verbose, so let's look for something more ergonomic. A simple opam search with your keywords can help greatly:

$ opam search "command line interface"
# Packages matching: match(*command line interface*)
# Name                  # Installed # Synopsis
bap-byteweight-frontend --          BAP Toolkit for training and controlling Byteweight algorithm
clim                    --          Command Line Interface Maker
cmdliner                --          Declarative definition of command line interfaces for OCaml
dream-cli               --          Command Line Interface for Dream applications
hg_lib                  --          A library that wraps the Mercurial command line interface
inquire                 --          Create beautiful interactive command line interface in OCaml
kappa-binaries          --          Command line interfaces of the Kappa tool suite
minicli                 --          Minimalist library for command line parsing
ocal                    --          An improved Unix `cal` utility
ocamline                --          Command line interface for user input
wcs                     --          Command line interface for Watson Conversation Service

cmdliner is one of our favourite libraries for that matter so let's use it in helloer.

Update the switch:

$ opam install cmdliner

Now is the time to implement your first command-line with cmdliner! You can check how we did it for helloer here in the helloer.ml file. You may also refer directly to the cmdliner library documentation!

$ ./helloer.exe
Hello OCamlers!!           
$ ./helloer.exe --gentle
Welcome my dear OCamlers.
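
For comparison, the same behaviour can be approximated with only the standard library's Arg module, which gives a feel for what cmdliner spares you. This is a minimal sketch and not helloer's actual code: the names `message` and `run` are hypothetical, and the argument arrays are hard-coded so that the example is self-contained.

```ocaml
(* Sketch: a --gentle flag using only the standard library's Arg module. *)

(* Hypothetical helper: choose the greeting from the flag value. *)
let message ~gentle =
  if gentle then "Welcome my dear OCamlers." else "Hello OCamlers!!"

(* Parse [argv] and return the greeting instead of printing it,
   which keeps the function easy to test. *)
let run argv =
  let gentle = ref false in
  let spec = [ ("--gentle", Arg.Set gentle, " Greet politely") ] in
  (* Reject positional arguments: helloer does not expect any. *)
  let anon arg = raise (Arg.Bad ("unexpected argument: " ^ arg)) in
  (* A fresh [current] keeps parsing independent of the global Arg.current. *)
  Arg.parse_argv ~current:(ref 0) argv spec anon "helloer [--gentle]";
  message ~gentle:!gentle

let () =
  print_endline (run [| "helloer" |]);
  print_endline (run [| "helloer"; "--gentle" |])
```

In a real executable you would pass Sys.argv to `run` instead of a literal array.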

Adding a Test Library

Finally, before we get to coding our little project, we should consider adding a test library to our project. This will make writing tests much easier, less time-consuming and less tedious.

Again, calling opam search with one or several keywords will yield many packages that pertain to testing OCaml binaries. Our selection for today will be alcotest, a well-known and widespread option for conducting tests on OCaml binaries.

$ opam search "test"
# Packages matching: match(*test*)
# Name                              # Installed # Synopsis
afl-persistent                      --          Use afl-fuzz in persistent mode
ahrocksdb                           --          A binding to RocksDB
alcotest                            --          Alcotest is a lightweight and colourful test framework
alcotest-async                      --          Async-based helpers for Alcotest
alcotest-js                         --          Virtual package containing optional JavaScript dependencies for Alcotest
alcotest-lwt                        --          Lwt-based helpers for Alcotest
alcotest-mirage                     --          Mirage implementation for Alcotest
[...]

Update the switch:

$ opam install alcotest

You can check out how we used it in helloer here, also refer to the documentation of the library for further exploration.

Run your tests:

$ dune runtest
Testing `Tests'.                 
This run has ID `N39NJ5ZE'.

  [OK]          messages          0   normal.
  [OK]          messages          1   gentle.

Full test results in `~/ocamler/dev/helloer/_build/default/_build/_tests/Tests'.
Test Successful in 0.000s. 2 tests run.

Now that we have found and used our new tools, we need only to create a package for helloer!

This means writing an opam file. The next section covers what information goes into it.

Furthermore, the distribution of your newly developed package to the rest of the OCaml Community on the opam-repository will be covered in the next opam blog post!

Your first opam file

So how exactly does one write an opam file?

A minimal functional opam file

You will find below a minimal opam file for the helloer project.

This file is minimal in the sense that it is complete enough for you to work with your package in your local environment. However, there remain a few fields that we will explain in a moment and that are necessary for you to distribute your code.

$ cat helloer.opam
opam-version: "2.0"
depends: [
  "cmdliner"
  "ocaml"
  "alcotest" {with-test}
]
build: [
 [ "dune" "build" "-p" name ]
 [ "dune" "runtest" ] {with-test}
]
install: [ "dune" "install" ]

For now, there is enough information for opam to install and use your OCaml project locally.

What are these fields for?

opam-version: "2.0":

Specifies that this opam file uses syntax compatible with opam 2.0 or later.
Required at the top of every opam file.

depends: [...]:

Lists the packages that your project depends on. opam uses this information to work out which other packages must be installed before yours can be built.
- You can optionally set lower or upper bounds on versions (e.g., "cmdliner" {>= "1.0.0"}), but omitting them is fine for a minimal file like this one.

build: [...]:

Tells opam how to build your project.
["dune" "build" "-p" name]: Builds the package. -p name means "build the part of the project with the same name as the opam package".

name here is an opam variable: opam replaces it with the actual package name (here, helloer) when it runs the command.
["dune" "runtest"] {with-test}: Runs the test suite, only if the --with-test option is passed (e.g. during CI or development).
You can put any command here; opam will run it in a sandboxed environment.

install: [...]:

This tells opam how to install the built binaries and libraries into the opam environment.
Dune will install any libraries, executables, and other files that you have declared with a public_name.

That is it for the essential fields of an opam file. Now onto the metadata fields which are required for you to later distribute your package through the Official OCaml opam-repository.

An opam file supports about thirty fields to specify a package; again, you can look them all up here.

A real-world opam file with `opam lint`

What fields are mandatory for a proper opam package?

A good question, and the answer is simple since opam features a linting command with opam lint.

This means that running this command on our small opam file will yield the following message to help us make sense of what more is required to make our newly developed package distributable:

$ opam lint .
/home/ocamler/dev/helloer/helloer.opam: Errors.
    error 23: Missing field 'maintainer'
  warning 25: Missing field 'authors'
  warning 35: Missing field 'homepage'
  warning 36: Missing field 'bug-reports'
    error 57: Synopsis must not be empty
  warning 68: Missing field 'license'

As you can see, some of these missing fields are considered errors, and others are considered mere warnings. Each also comes with a designated code. You can find all warning and error codes by either running the opam lint --help command in your terminal, or going to the corresponding opam man page on the interwebs.

Let's break each of them down and see how these linting errors can be fixed.

Error 23: Missing field maintainer
What: Contact email for the package maintainer.
Example:
```
maintainer: ["Your Name <hell@er.com>"]
```
Error 57: Synopsis must not be empty
What: One-line description of your project.
Example:
```
synopsis: "A simple and polite greeter in OCaml"
```
Warning 25: Missing field authors
What: List of project authors.
Example:
```
authors: ["Your Name <hell@er.com>"]
```
Warning 35: Missing field homepage
What: URL to your project's website or repository.
Example:
```
homepage: "https://github.com/OCamlPro/opam_bp_examples"
```
Warning 36: Missing field bug-reports
What: URL for reporting issues.
Example:
```
bug-reports: "https://github.com/OCamlPro/opam_bp_examples/issues"
```
Warning 68: Missing field license
What: License under which your project is distributed.
Example:
```
license: "ISC"
```

Now that you have linted an opam file manually, you will be very happy to learn that dune can actually generate that file for you. If you prefer the syntax of the dune-project file, you can let dune handle it!

Here's what a complete dune-project file would look like for our little helloer project.

(lang dune 3.15)

(name helloer)

(license ISC)

(authors "Your Name <hell@er.com>")

(maintainers "Your Name <hell@er.com>")

(source
 (uri "https://github.com/OCamlPro/opam_bp_examples"))

(homepage "https://github.com/OCamlPro/opam_bp_examples")

(bug_reports "https://github.com/OCamlPro/opam_bp_examples/issues")

(documentation "N/A")

(generate_opam_files true)

(package
  (name helloer)
  (synopsis "A simple and polite greeter in OCaml")
  (description
    "This is an example package for article 'opam 103' on OCamlPro's blog")
  (tags (greeter opam 103 tutorial beginner))
  (depends
    ocaml
    cmdliner
    (alcotest :with-test)
  )
)

Conclusion and what's next

At this point, our project is in great shape. We've seen how to set up your local switch, integrate third-party libraries, run tests, and write a minimal but functional opam file. With that foundation, you’re ready to build and manage your OCaml projects locally with confidence.

But writing code for yourself is only half the journey. The next step is sharing it with the world. In our next post, we’ll dive into how to publish your package to the official OCaml opam-repository — covering everything from the structure of an opam package submission, to working with the community through pull requests, versioning, opam-CI, and more.

Thank you for trodding along the dunes with this little OCaml caravan of ours 🚶🚶🐫🐫🐫 — and until our next Oasis, happy hacking!

Flambda2 Ep. 4: How to write a purely functional compiler

2025-02-19T09:05:17Z

As we dive deeper into the Flambda2 Optimising Compiler, who knows what marvels might await us. Picture: Son Doong Cave (Vietnam). Credit: Collected.

Welcome to a new episode of The Flambda2 Snippets!

Today, we will cover key high-level aspects of the algorithm of Flambda2. We will do our best to explain the fundamental design decisions pertaining to the architecture of the compiler. We will touch on how we managed to make a purely functional optimising compiler (leveraging tail-recursion, backtracking, and non-linear traversal) by covering how the code is traversed, what actions this design facilitates, and more!

All feedback is welcome, thank you for staying tuned and happy reading!

The F2S blog posts aim at gradually introducing the world to the inner-workings of a complex piece of software engineering: The Flambda2 Optimising Compiler for OCaml, a technical marvel born from a 10 year-long effort in Research & Development and Compilation; with many more years of expertise in all aspects of Computer Science and Formal Methods.

Table of contents

Expression traversal
Overview of the traversal
Downward traversal
- Let_val
- Let_cont
- Apply_cont
- Apply_val
Upward traversal
Upward environment
Dead code elimination
Conclusion

Expression traversal

Here's a code snippet we would like to be able to optimise and that demonstrates a set of properties that we want our code optimiser to have.

(* original code *)
let bar x =
  let d = x + x in
  let y = x, d in
  y

let foo z =
  let x, d = bar z in
  if x = z then x + 1 else d

We will optimise this code to:

let foo z =
  z + 1

And we will do that in a single pass that is both efficient and maintainable.

Here are the key transformations we would like to apply to this codeblock:

The first element of the pair returned by the bar function is an alias to the z argument of foo, thus the if condition in foo always evaluates to true.
That means that the d variable is never used, since the else branch in foo is never executed.
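
As a quick sanity check of the claim above, the two versions can be compared directly. This is only a sketch that tests observable behaviour on a range of inputs; the name `foo_optimised` is ours, and the check says nothing about how Flambda2 actually arrives at the result.

```ocaml
(* Original definitions. *)
let bar x =
  let d = x + x in
  let y = x, d in
  y

let foo z =
  let x, d = bar z in
  if x = z then x + 1 else d

(* The version we expect the optimiser to produce. *)
let foo_optimised z = z + 1

let () =
  (* Compare both versions on a small range of inputs. *)
  for z = -1000 to 1000 do
    assert (foo z = foo_optimised z)
  done;
  print_endline "foo and foo_optimised agree on [-1000, 1000]"
```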

That being said, to discover the aliased values x and z, we have to follow the z variable from foo to bar and back again. And to discover that the let d = x + x is unused in bar we have to know about the alias and then go back from the d used in foo to the let in bar. The point is, there is a complex order of dependencies between these properties that we have to follow in order to learn about the code.

Keep in mind that we aim for our compiler to remain reasonably fast. In order to do that, we conduct all code transformations at the same time as the analysis. This entails that we cannot just plug a constraint solver inside of Flambda2 in order to discover these properties.

There are two kinds of properties that we want to track. Some, like discovering that the if condition always evaluates to true, flow in the order of evaluation, i.e. top-down. Others, like discovering that let d = x + x is dead code and eliminating it, can only be found in the reverse order of evaluation, i.e. bottom-up.

Interesting detail: properties of the first category sometimes help discover properties of the second, as in this specific example, but never the other way round.

We will now explain how we have designed Flambda2 to operate within these constraints while transforming the code at the same time.

(*
  CPS-converted version
  Same code as before in the FL2 IR.
  All variables with names starting with `k` are continuations.
*)
let bar x k_ret =
  let d = x + x in
  apply_cont k_ret x d

let foo z k_ret =
  let_cont k x d =
    let r = x + 1 in
    if x = z then
      apply_cont k_ret r
    else
      apply_cont k_ret d
  in
  apply bar z k

If you recall our very first F2S snippet, we mentioned one of the fundamental design decisions of Flambda2 which consists in representing programs using CPS. One of the main reasons for that is that inlining becomes very simple.
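
To make the CPS version above concrete, here is a sketch that models it in plain OCaml, with continuations represented as ordinary functions. This is only an illustration of the control flow: the FL2 IR has dedicated let_cont and apply_cont constructs, which we emulate here with a local function and function application.

```ocaml
(* Continuations are modelled as ordinary OCaml functions. *)
let bar x k_ret =
  let d = x + x in
  k_ret x d

let foo z k_ret =
  (* [k] plays the role of the continuation bound by let_cont. *)
  let k x d =
    let r = x + 1 in
    if x = z then k_ret r else k_ret d
  in
  bar z k

let () =
  (* The identity function serves as the final return continuation. *)
  let result = foo 41 (fun r -> r) in
  Printf.printf "foo 41 = %d\n" result
```

Note how the conditional now lives inside `k`, which is passed to `bar` rather than wrapped around its call.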

But there's a catch…

If you refer back to the original (non-CPS) version of the codeblock above, you will see that if x = z then x + 1 else d is inside the scope of the bar function call. This is no longer the case once the function has been converted to CPS. This shrinking of the scope is inherent to the CPS representation. In an expression language, value analysis can simply be written as a recursive function on expressions, propagating properties through an environment. That is how the simplification pass was written on our previous IR, Flambda1. It did produce some imprecisions here and there, but the trade-off in code simplicity favoured this route rather than the one we have taken with Flambda2 today.

In direct-style language representations, traversals in the order of evaluation may be roughly emulated by simply traversing the tree recursively. On the other hand, in CPS-style language representations, this doesn't hold.

That's the catch: analysing CPS code entails more complex algorithms.

Overview of the traversal

Reasoning about code requires having a specific kind of data structure.

This data structure must behave like a kind of database of properties of expressions: we attach a name to each expression, and the data structure itself keeps track of the properties related to them. It will be named acc in the following code blocks (short for accumulator).

A design decision we made early was that we wanted to traverse the code only once while doing the maximum amount of simplifications. Of course, there are exceptions to this rule but that’s a topic for another time.

Experience gained from designing Flambda1 guided this decision. In practice, this overarching traversal manifests itself as two distinct passes: one downwards and one upwards. The downwards pass performs static analysis and inlining, while the upwards one handles code reconstruction and dead code elimination. We call this whole process "Simplify".

Downward traversal

As mentioned in F2S1, the FL2 AST is simple and represented with only 6 different cases. You can find it again below:

type expr =
  | Let_val of { var; prim; body : expr }
  | Let_cont of { k; param; handler : expr ; body : expr }
  | Apply_cont of { k; arg }
  | Apply_val of { f; k_return; arg }
  | Switch of { arg; cases }
  | Invalid

We are going to cover each of these cases separately and explain how each behaves and how it helps us reason about the code. Then, once all that is clear, we will explain how we traverse each constructor. This should help you understand what information we accumulate during both passes, and what exactly we can do with it.

Let_val

type expr =
  | Let_val of
      {
        var : variable ;
        prim : named ;
        body : expr ;
      }
[…]

Overview and semantics:

The Let_val constructor evaluates a named primitive, and binds it to a variable inside the body and then evaluates that body. A named primitive is a single atomic operation applied to some variables. Primitives have no impact on control flow, for instance they cannot raise exceptions.

Traversal algorithm:

This is the easy case, we just follow the evaluation order above. We analyse the named primitive, extend the acc data structure with the discovered properties, and proceed with analysing the body using the new acc.

The most important thing about this process on the way down is this specific extension of the acc data structure. Most other constructors will pipe the acc smartly all along the computation rather than extending it.

Additional details:

One interesting thing to note: we can discover properties on the arguments of the primitive and not only on the bound variable. For example, the primitive that reads the field of a value allows us to discover that the argument is a block where that field exists in the current acc.
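
The downward step for Let_val can be sketched as follows. This is a toy model under loud assumptions: the primitive language (`Const`, `Add`), the `Return` leaf, and the choice of tracking only known integer constants in acc are all hypothetical simplifications, not Flambda2's actual types.

```ocaml
module Acc = Map.Make (String)

(* A tiny, hypothetical primitive language: constants and additions. *)
type prim = Const of int | Add of string * string

type expr =
  | Let_val of { var : string; prim : prim; body : expr }
  | Return of string (* stand-in for the rest of the IR *)

(* What we learn about a primitive from the current [acc]:
   [Some n] when its value is a known constant, [None] otherwise. *)
let analyse_prim acc = function
  | Const n -> Some n
  | Add (a, b) -> (
      match Acc.find_opt a acc, Acc.find_opt b acc with
      | Some x, Some y -> Some (x + y)
      | _ -> None)

(* Downward step: analyse the primitive, extend [acc], then carry on
   with the body using the extended [acc]. *)
let rec down acc = function
  | Let_val { var; prim; body } ->
      let acc =
        match analyse_prim acc prim with
        | Some n -> Acc.add var n acc
        | None -> acc
      in
      down acc body
  | Return v -> Acc.find_opt v acc

let () =
  let e =
    Let_val { var = "x"; prim = Const 20;
      body = Let_val { var = "y"; prim = Add ("x", "x");
        body = Let_val { var = "z"; prim = Add ("y", "w");
          body = Return "y" } } }
  in
  match down Acc.empty e with
  | Some n -> Printf.printf "y is known to be %d\n" n
  | None -> print_endline "nothing is known about y"
```

Here nothing is recorded for z, because w is unknown, while y is discovered to be a constant.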

Let_cont

type expr =
  | Let_cont of
      {
        body : expr;
        k : continuation;
        params : variable list;
        handler : expr;
      }
[…]

Overview and semantics:

The Let_cont constructor evaluates body. body is allowed to refer to the k continuation, and when encountering an application of k the control flow will evaluate handler after binding the arguments of the given application to params.

Traversal algorithm:

The first thing to note is that there might be several apply_conts to k inside the body, and since we want to analyse the handler only once, we cannot just follow the evaluation order naively like with the Let_val case.

Therefore, we first analyse the body, and collect all the data about the applications of k (see the apply_cont case below).

Once we have that, we can analyse and deduce the properties that we can know about the arguments given to k. We can then bind these properties to the corresponding parameters and then analyse the handler itself.

Quick rundown:

Let's consider the following code snippet.

let foo_d b k_ret = 
  let_cont k x =
    let y = (x <= 1) in
    apply_cont k_ret y
  in
  if b then apply_cont k 0
  else apply_cont k 1

When we analyse the let_cont we first analyse the body and see the conditional on b. We'll see the two apply_conts to k and we'll be able to deduce that the argument given to k is either 0 or 1. With that knowledge, we can analyse the handler of k and deduce that y is always true.
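
The reasoning of this rundown can be sketched in a few lines. The mini-IR below (`If`, `Apply_k`) and the functions `collect_args` and `analyse_handler` are hypothetical and specialised to this one example: real properties in Flambda2 are richer than a list of possible constants.

```ocaml
(* Hypothetical mini-IR for the body of the let_cont: a conditional on an
   unknown boolean, or an application of k to a constant. *)
type body =
  | If of body * body (* condition unknown: both branches are reachable *)
  | Apply_k of int    (* apply_cont k <constant> *)

(* Downward pass over the body: collect every argument given to k. *)
let rec collect_args acc = function
  | If (t, e) -> collect_args (collect_args acc t) e
  | Apply_k n -> if List.mem n acc then acc else n :: acc

(* Analysis of the handler [let y = (x <= 1)]: given the possible values
   of x, return [Some b] if y is the same boolean b for all of them. *)
let analyse_handler possible_xs =
  match List.map (fun x -> x <= 1) possible_xs with
  | [] -> None (* k is never applied: the handler is unreachable *)
  | b :: rest -> if List.for_all (Bool.equal b) rest then Some b else None

let () =
  let body = If (Apply_k 0, Apply_k 1) in
  let xs = collect_args [] body in
  match analyse_handler xs with
  | Some b -> Printf.printf "y is always %b\n" b
  | None -> print_endline "y is not known"
```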

Side note:

So far, we've only considered the case where let_conts are not recursive. We also allow let_cont to be recursive, namely to represent the control-flow of loops, which means that the handler can contain apply_cont k. Since we won't be able to see all apply_conts before analysing the handler we will have to stay conservative by over-approximating the properties we know about the parameters.

Apply_cont

type expr =
  | Apply_cont of
      {
        k : continuation;
        args : variable list;
      }
[…]

Overview and semantics:

As described in Let_cont this only transfers the control to the handler associated to k using the args to populate the value of the parameters of k.

Traversal algorithm:

In this constructor, we extend the acc by associating the current context to k. This will be retrieved later (see Let_cont case) to know which contexts led to this continuation, and thus set up a context for the handler.

Furthermore, Apply_cont has no underlying field of type expr so it is a leaf of the on-going traversal. Assuming that there was a Let_cont earlier, the traversal will forward the acc to the last Let_cont encountered and proceed from there again as explained above.

If there is no remaining Let_cont then it means that the analysis of the function is over and that we've traversed all the live code.

See the Let_cont example.

Apply_val

type expr =
  | Apply_val of
      {
        f : variable;
        args : variable list;
        k_return : continuation;
      }
[…]

Overview and semantics:

Apply_val is the usual function application. f is interpreted as a functional value, so control-flow jumps to the associated code, binding args to the function parameters. Since this is CPS, when the function returns, control is transferred to k_return, as with an Apply_cont: the return value is bound to the parameter of k_return.

It closely resembles something like:

let x = f args in
apply_cont k_return x

But since we don't allow normal function applications inside of a Let_val, we have an Apply_val constructor to handle it.

Traversal algorithm:

The first thing we do is: recover the known properties about f from our acc.

Depending on what properties we have discovered so far, we decide whether to inline f or not: If we choose not to inline f, we handle this Apply_val as another Apply_cont to k_return, but if we do decide to inline it, we replace the current Apply_val with the body of the f function and continue the traversal from there.

The properties that matter for the inlining decision include:

Do we know the actual function being called? (As shown above, f is a variable, and we may or may not know which function it refers to.)
Are there any user annotations, either on the definition of f, such as [@inline], or at the application, with [@inlined]?
The size of f matters too, because inlining large functions may be detrimental.
The values of the args matter because, for instance, when we know nothing about the arguments, inlining f is less likely to be beneficial.
In some cases, this is where we try Speculative Inlining.

Some static analysis requires a whole-view of the program, or at least, the current function.

So when the downwards pass has traversed the whole term, we trigger a few analyses that we could not do on the fly, such as those for properties that involve loops. Such properties can't be computed during a single pass; they usually require a fix-point. Once that is done, we use the result of the downward pass to initialise the upward environment (uenv).

Upward traversal

You will be happy to learn that the upward traversal is much easier to break down than the downward one! 🎉

Upward environment

Since we have all the data accumulated on our way down at our disposal, we only have a few more properties to track on our way back up. As said previously, we gathered the properties inside an accumulator while following the evaluation order, on our way down. On our way up, we will feed something more akin to an environment.

(* example of a rebuilding step function *)
val rebuild_let : var -> prim -> args : var list -> body : (term * uenv) -> term * uenv

This upward environment (uenv) will mainly hold data about:

free (live) variables, which are variables that are used in the subterms of the term being traversed;
relevant information to aid the Speculative Inlining heuristic, which includes the size of the term and the optimization benefits, for instance the number of operations eliminated during both traversals.

These properties are inherently structural. Thus, tracking them is easily done while traversing the tree in the structural order.

Furthermore, these are properties of the rebuilt version of the term, not the original one:

Since some optimisations can remove variable uses (making such variables potentially useless), we are required to work with a rebuilt version of that term;
Obviously, optimization benefits can only be computed after actually performing them;

That is why we could not have tracked them on the way down, thus relegating them to the way back up. Hence, we have designed the upward pass to follow the structural order to track that effortlessly.

Dead code elimination

Free variables are useful for dead code elimination.

Dead code is code which can be removed from the term without altering its semantics.

There are two kinds of dead code:

Pure expressions whose result is never used;
Code sections that are never reached;

The first one can be detected by looking for variables which are never mentioned outside of their definition.

The second is the same, but relative to continuation names.

In order to understand how it is done, let's see how we do it for a simple let binding, and then we'll see how it is done through continuations with let_cont.

Let-bindings: when rebuilding the let, we have the rebuilt body, and the set of free-variables of the rebuilt body, and if the variable bound by the let is not part of the free-variables, we delete that let.

Rundown:

let x = 1 in
(* Step 2 *)
let z = 0 in
(* Step 1 *)
let y = x + 1 in
(* Step 0 *)
42 + z

The rebuilding order follows the Step annotations of the example from 0 to 2.

Step 0: the free-variable of the body of the let y is { z }. y is not present in that set so we don't rebuild the let. Had we rebuilt it, x would have been part of the free-variables set. So we just keep { z } as the set of free-variables.
Step 1: We rebuild the let z and remove z from the free-variables set because it is now bound.
Step 2: And now we continue onto the let x that we also remove.

We can observe that this method can remove all useless lets in a single traversal.
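
The rundown above can be reproduced with a small sketch. The types `prim` and `term` and the function `rebuild` are hypothetical: each primitive is described only by its text and the variables it uses, and all primitives are assumed to be pure, which is what makes dropping an unused let safe.

```ocaml
module Vars = Set.Make (String)

(* Hypothetical mini-language: pure primitives described by their text
   and by the variables they mention. *)
type prim = { text : string; uses : string list }

type term =
  | Let of string * prim * term
  | Final of prim

(* Upward rebuild: returns the rebuilt term and its free variables. *)
let rec rebuild = function
  | Final p -> (Final p, Vars.of_list p.uses)
  | Let (x, p, body) ->
      (* The body is rebuilt first, so its free variables are known. *)
      let body, fv = rebuild body in
      if Vars.mem x fv then
        (* Keep the let: x becomes bound, the variables used by p are free. *)
        (Let (x, p, body), Vars.union (Vars.remove x fv) (Vars.of_list p.uses))
      else
        (* x is never used: drop the let, free variables are unchanged. *)
        (body, fv)

let example =
  Let ("x", { text = "1"; uses = [] },
    Let ("z", { text = "0"; uses = [] },
      Let ("y", { text = "x + 1"; uses = [ "x" ] },
        Final { text = "42 + z"; uses = [ "z" ] })))

let rec print_term = function
  | Final p -> print_endline p.text
  | Let (x, p, body) ->
      Printf.printf "let %s = %s in\n" x p.text;
      print_term body

let () = print_term (fst (rebuild example))
```

Only the let z survives: let y is dropped first, which in turn makes let x useless by the time the traversal reaches it.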

Let's see now, how we can extend this method to rebuilding let_conts while still maintaining this property.

Let_conts: As for let_cont, we want to be able to remove the unused parameters of the continuation. We can see that a parameter is unused after having analysed the continuation handler: when the parameter is absent from the set of free-variables of the handler.

Furthermore, we need to change the apply_cont of the continuation from which we remove parameters. We need to only pass arguments for live parameters.

That entails going through the body of the continuation after traversing its handler, because it's inside the body that apply_conts to that continuation appear (we are going to put aside recursive continuations for the sake of simplicity).

And so, we keep track of which of these continuations' parameters we removed in order to rebuild the apply_cont.

Rundown:

(* Step 0 *)
let_cont k0 z =
(* Step 1 (k0 handler) *)
  return 42
in
(* Step 2 (k0 body) *)
let_cont k1 y =
(* Step 4 (k1 handler) *)
  let z1 = y + 1 in 
(* Step 3 (k1 handler) *)
  apply_cont k0 z1
in
(* Step 5 (k1 body) *)
apply_cont k1 420

The rebuilding order follows the Step annotations of the example from 0 to 5, though, we will only mention the relevant steps.

Step 1: Parameter z of k0 is dead, so we get rid of it.
Step 3: So now, we have to update the apply_cont k0 z1 which in turn becomes apply_cont k0 (the argument disappears).
Step 4: Since z1 was deleted, its let is then removed, and y becomes useless and in turn, eliminated.
Step 5: Eventually, we can replace the apply_cont k1 420 with apply_cont k1 because the y parameter was previously eradicated.
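
The steps above can be sketched for non-recursive continuations as follows. Everything here is a hypothetical simplification: the `term` type, the `masks` map (which records, for each continuation, which parameter positions are still live), and the assumption that all let-bound primitives are pure.

```ocaml
module Vars = Set.Make (String)
module Conts = Map.Make (String)

type arg = Var of string | Const of int

type term =
  | Let of { var : string; uses : string list; body : term }
  | Let_cont of { k : string; params : string list; handler : term; body : term }
  | Apply_cont of string * arg list
  | Return of int

let free_vars_of_args args =
  List.fold_left
    (fun fv a -> match a with Var v -> Vars.add v fv | Const _ -> fv)
    Vars.empty args

(* [masks] maps each continuation already rebuilt to the liveness of
   its parameters, position by position. *)
let rec rebuild masks = function
  | Return n -> (Return n, Vars.empty)
  | Apply_cont (k, args) ->
      (* Only pass arguments for the parameters of k that are still live. *)
      let args =
        match Conts.find_opt k masks with
        | Some mask -> List.combine mask args |> List.filter fst |> List.map snd
        | None -> args
      in
      (Apply_cont (k, args), free_vars_of_args args)
  | Let { var; uses; body } ->
      let body, fv = rebuild masks body in
      if Vars.mem var fv then
        (Let { var; uses; body },
         Vars.union (Vars.remove var fv) (Vars.of_list uses))
      else (body, fv)
  | Let_cont { k; params; handler; body } ->
      (* Handler first: its free variables tell us which parameters are live. *)
      let handler, fv_h = rebuild masks handler in
      let mask = List.map (fun p -> Vars.mem p fv_h) params in
      let params = List.filter (fun p -> Vars.mem p fv_h) params in
      (* Then the body, where the apply_conts to k get rewritten. *)
      let body, fv_b = rebuild (Conts.add k mask masks) body in
      let fv_h = Vars.diff fv_h (Vars.of_list params) in
      (Let_cont { k; params; handler; body }, Vars.union fv_h fv_b)

let example =
  Let_cont { k = "k0"; params = [ "z" ]; handler = Return 42;
    body =
      Let_cont { k = "k1"; params = [ "y" ];
        handler =
          Let { var = "z1"; uses = [ "y" ];
                body = Apply_cont ("k0", [ Var "z1" ]) };
        body = Apply_cont ("k1", [ Const 420 ]) } }

let () =
  match fst (rebuild Conts.empty example) with
  | Let_cont { params = []; body = Let_cont { params = []; _ }; _ } ->
      print_endline "all parameters of k0 and k1 were removed"
  | _ -> print_endline "some parameters are still live"
```

Rebuilding the handler before the body is what lets the apply_conts be rewritten in the same traversal.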

It is this traversal order which allows us to conduct simplifications on CPS in one go and perform dead code elimination on the upwards traversal.

Conclusion

In this episode of The Flambda2 Snippets, we have explored how the Flambda2 Optimising Compiler performs upwards and downwards traversals to analyze and transform OCaml code efficiently. By structuring our passes in this way, we ensure that static analysis and optimizations are performed in a single traversal while maintaining precision and efficiency.

The downward traversal enables us to propagate information about variables, functions, and continuations, allowing for effective inlining and simplification. Meanwhile, the upward traversal facilitates optimizations such as dead code elimination by identifying and removing unnecessary expressions in a structured and efficient manner.

Through these mechanisms, Flambda2 is able to navigate the complexities introduced by CPS conversion while still achieving significant performance gains. Understanding these traversal strategies is key to grasping the power behind Flambda2’s approach to optimization and why it stands as a robust solution for compiling OCaml code.

Thank you all for reading! We hope these articles keep the community eager to dive even deeper with us into OCaml compilation. Until next time, mind the stalactites! ⛏️🔦

opam 2.3.0 release!

2024-11-13T09:05:17Z

Feedback on this post is welcomed on Discuss!

As mentioned in our talk at the OCaml Workshop 2024, we decided to switch to a time-based release cycle (every 6 months), starting with opam 2.3.

As promised, we are very pleased to announce the release of opam 2.3.0, and encourage all users to upgrade. Please read on for installation and upgrade instructions.

Try it!

In case you plan a possible rollback, you may want to first back up your ~/.opam or $env:LOCALAPPDATA\opam directory.

The upgrade instructions are unchanged:

Either from binaries: run

For Unix systems

$ bash -c "sh <(curl -fsSL https://opam.ocaml.org/install.sh)"

or from PowerShell for Windows systems

Invoke-Expression "& { $(Invoke-RestMethod https://opam.ocaml.org/install.ps1) }"

or download manually from the GitHub "Releases" page to your PATH.

Or from source, manually: see the instructions in the README.

You should then run:

opam init --reinit -ni

Major breaking change: extra-files

When loading a repository, opam now ignores files in packages' files/ directories which aren't listed in the extra-files field of the opam file. This was done to simplify the opam specification where we hope the opam file to be the only thing that you have to look at when reading a package specification. It being optional to list all files in the extra-files: field went against that principle. This change also reduces the surface area for potential file corruption as all extra-files must have checksums.

This is a breaking change and means that if you are using the files/ directory without listing them in the extra-files: field, you need to make sure that all files in that directory are included in the extra-files field. The resulting opam file remains compatible with all previous opam 2.x releases.

If you have an opam repository, you should make sure all files are listed so that every package continues to work without any issue, which can be done automatically using the opam admin update-extrafiles command.

Major changes

Packages requiring an unsupported version of opam are now marked unavailable, instead of causing a repository error. This means an opam repository can now allow smoother upgrade in the future where some packages can require a newer version of opam without having to fork the repository to upgrade every package to that version as was done for the upgrade from opam 1.2 to 2.0
Add a new opam list --latests-only option to list only the latest versions of packages. Note that this option respects the order options were given on the command line. For example: --available --latests-only will first list all the available packages, then choose only the latest packages in that set; while --latests-only --available will first list all the latest packages, then only show the ones that are available in that set
Fix and improve opam install --check, which now checks if the whole dependency tree of the package is installed instead of only the root dependencies
Add a new --verbose-on option to enable verbose output for specified package names. Thanks to @desumn for this contribution
Add a new opam switch import --deps-only option to install only the dependencies of the root packages listed in the opam switch export file
opam switch list-available no longer displays compilers flagged with avoid-version/deprecated unless --all is given, meaning that pre-release or unreleased OCaml packages no longer appear to be the latest version
opam switch create --repositories now correctly infers --kind=git for URLs ending with .git rather than requiring the git+https:// protocol. This is consistent with other commands such as opam repository add. Thanks to @Keryan-dev for this contribution
opam switch set-invariant now displays the switch invariant using the same syntax as the --formula flag
The builtin-0install solver was improved and should now be capable of being your default solver instead of builtin-mccs+glpk. It was previously mostly only suited for automated tasks such as Continuous Integration. If you wish to give it a try, simply call opam option solver=builtin-0install (calling opam option solver= restores the default)
Most of the unhelpful conflict messages were fixed. (#4373)
Fix an opam 2.1 regression where the initial pin of a local VCS directory would store untracked and ignored files. Those files would usually be cleaned before building the package, however git submodules would not be cleaned and would cause issues when paired with the new behaviour added in 2.3.0~alpha1 which makes opam error when git submodules fail to update (it was previously a warning). (#5809)
Fix the value of the arch variable when the current OS is 32bit on a 64bit machine (e.g. Raspberry Pi OS). (#5949)
opam now fails when git submodules fail to update instead of ignoring the error and just showing a warning
opam's libraries now compile with OCaml >= 5.0 on Windows
Fix the installed packages internal cache, which was storing the wrong version of the opam file after a build failure. This could be triggered easily for users with custom repositories with non-populated extra-files. (#6213)
Several improvements to the pre-built release binaries were made:
- The Linux binaries are now built on Alpine 3.20
- The FreeBSD binary is now built on FreeBSD 14.1
- The OpenBSD binary is now built on OpenBSD 7.6 and loses support for OpenBSD 7.5 and earlier
- Linux/riscv64 and NetBSD/x86_64 binaries are now available

Many other general, performance and UI improvements were made, and many bugs were fixed. You can take a look at previous blog posts. API changes and a more detailed description of the changes are listed in:

This release also includes PRs improving the documentation and improving and extending the tests.

Please report any issues to the bug-tracker.

We hope you will enjoy the new features of opam 2.3!

Optimising Geneweb, France's leading genealogy software for nearly 30 years

2024-11-06T09:05:17Z

A bonsai under its glass cloche. Nowadays, public access to genealogy is preserved above all through the maintenance of legacy code.

The OCamlPro team was recently approached by Roglo, a French genealogy association that manages a database of more than 10 million people connected in a single family tree, and which grows by about 500,000 new contributions every year. The association relies on the free software Geneweb, one of the most powerful programs in the field, created in 1997 at Inria, which makes it possible to share family trees on the web and is used both by individuals and by industry leaders such as the French company Geneanet, acquired in 2021 by the American company Ancestry.

Our mission first focused on performance optimisation, to bring the processing of certain queries on Roglo's gargantuan database back down to reasonable times. After quickly surveying the code base of more than 80,000 lines and profiling the most expensive queries, we were able to propose a solution, implement it and integrate it into the main branch. For one of the queries, the time thus drops from 77s to 4s, that is, 18 times faster! We are now working on enriching Geneweb with new features for its users, but also for its contributors and the maintainers of the platform!

This mission, split into development sprints, is part of an ongoing effort to evolve Geneweb so that it can handle even larger volumes of data. We are delighted to contribute to this evolution, bringing our expertise in optimisation and software development to help this reference platform grow.

Alt-Ergo 2.6 is Out!

2024-09-30T09:05:17Z

The Alt-Ergo 2.6 release comes with many enhancements!

We are excited to announce the release of Alt-Ergo 2.6!

Alt-Ergo is an open-source automated prover used for formal verification in software development. It is part of the arsenal behind static analysis frameworks such as TrustInSoft Analyzer and Frama-C, and is one of the solvers behind Why3, a platform for deductive program verification. The newly released version 2.6 brings new features and performance improvements.

Development on Alt-Ergo has accelerated significantly this past year, thanks to the launch of the DéCySif joint research project (i-Démo) with AdaCore, Inria, OCamlPro and TrustInSoft. The improvements to bit-vectors and algebraic data types in this release are sponsored by the DéCySif project.

The highlights of Alt-Ergo 2.6 are:

Support for reasoning and model generation with bit-vectors
Model generation for algebraic data types
Optimization with (maximize) and (minimize)
FPA support is enabled by default and available in SMT-LIB format
Binary releases now on GitHub

Alt-Ergo 2.6 also includes other improvements to the user interface (notably the set-option SMT-LIB command), use of Dolmen as the default frontend for SMT-LIB and native input, and many bug fixes.

Bit-vectors

In Alt-Ergo 2.5, we introduced built-in functions for the bit-vector primitives from the SMT-LIB standard, but only provided limited reasoning support. For Alt-Ergo 2.6, we set out to improve this reasoning support, and have developed a new and improved relational theory for bit-vectors. This new theory is based on an also new constraint propagation core that draws heavily on the architecture of the Colibri solver (as in Sharpening Constraint Programming approaches for Bit-Vector Theory), integrated into Alt-Ergo's existing normalizing Shostak solver.

Bit-vectors are commonly used in verification of low-level code and in cryptography, so improved support significantly enhances Alt-Ergo’s applicability in these domains.

There is still room for improvement, so please share any issue you encounter with the bit-vector theory (or Alt-Ergo in general) via our issue tracker.

To showcase improvements in Alt-Ergo 2.6, we compared it against version 2.5 and the industry-leading solvers Z3 and CVC5 on a dataset of bit-vector problems collected from our partners in the DéCySif project. The (no BV) variants for Alt-Ergo do not use the new bit-vector theory but instead an axiomatization of bit-vector primitives provided by Why3. The percentages represent the proportion of bit-vector problems solved successfully in each configuration.

	AE 2.5 (BV)	AE 2.5 (no BV)	AE 2.6 (BV)	AE 2.6 (no BV)	Z3 (4.12.5)	CVC5 (1.1.2)	Total
# solved	4128	4870	6265	4940	5482	7415	9038
% solved	46%	54%	69%	54%	61%	82%	100%

As the table shows, Alt-Ergo 2.6 significantly outperforms version 2.5, and the new built-in bit-vector theory outperforms Why3's axiomatization. We even surpass Z3 on this benchmark, a testament to the new bit-vector theory in Alt-Ergo 2.6.

Model Generation

Bit-vectors are not the only theory Alt-Ergo 2.6 improves upon. Model generation was introduced in Alt-Ergo 2.5 with support for booleans, integers, reals, arrays, enumerated types, and records. Alt-Ergo 2.6 extends this support to bit-vectors and arbitrary algebraic data types, which means that model generation is now enabled for all the theories supported by Alt-Ergo.

Model generation allows users to extract concrete examples or counterexamples, aiding in debugging and verification of their systems.

Model generation is also more robust in Alt-Ergo 2.6, with numerous bug fixes and improvements for edge cases.

Optimization

Alt-Ergo 2.6 introduces optimization capabilities, available via SMT-LIB input using OptiSMT primitives such as (minimize) and (maximize) and compatible with Z3 and OptiMathSat. Optimization allows guiding the solver towards simpler and smaller counterexamples, helping users find more concrete and realistic scenarios to trigger a bug.

See some examples in the documentation.

SMT-LIB command support

Alt-Ergo 2.6 supports more SMT-LIB syntax and commands, such as:

The (get-info :all-statistics) command to obtain information about the solver's statistics
The (reset), (exit) and (echo) commands
The (get-assignment) command, as well as the :named attribute and :produce-assignments option

See the SMT-LIB standard for more details about these commands.

Floating-point theory

In this release, we have made Alt-Ergo's floating-point theory enabled by default: there is no need to provide the --enable-theories fpa flag anymore. The theory can be disabled with --disable-theories fpa,nra,ria (the nra and ria theories were automatically enabled along with the fpa theory in Alt-Ergo 2.5).

We have also made the floating-point primitives available in the SMT-LIB format as the indexed constant ae.round and the convenience ae.float16, ae.float32, ae.float64 and ae.float128 functions; see the documentation.

Dolmen is the new default frontend

Introduced in Alt-Ergo 2.5, the Dolmen frontend has been rigorously tested for regressions and is now the default for both .smt2 and .ae files; the --frontend dolmen flag that was introduced in Alt-Ergo 2.5 is no longer necessary.

The Dolmen frontend is based on the Dolmen library developed by Guillaume Bury at OCamlPro. It provides excellent support for the SMT-LIB standard and is used to check validity of all new problems in the SMT-LIB benchmark collection, as well as the results of the annual SMT-LIB affiliated solver competition SMT-COMP.

The preferred input format for Alt-Ergo is now the SMT-LIB format. The legacy .ae format is still supported, but is now deprecated and users are encouraged to migrate to the SMT-LIB format if possible. Please reach out if you find any issue while migrating to the SMT-LIB format.

As we announced when releasing Alt-Ergo 2.5, the legacy frontend (supports .ae files only) is deprecated in Alt-Ergo 2.6, but it can still be enabled with the --frontend legacy option. It will be removed entirely from Alt-Ergo 2.7.

Parser extensions, such as the built-in AB-Why3 plugin, only work with the legacy frontend, and will no longer work with Alt-Ergo 2.7. We are not aware of any current users of either parser extensions or the AB-Why3 plugin: if you need these features, please reach out to us on GitHub or by email so that we can figure out a path forward.

Use of `dune-site` for plugins

Starting with Alt-Ergo 2.6, we are using the plugin mechanism from dune-site to replace the custom Dynlink-based plugin loading. Plugins now need to be registered in the (alt-ergo plugins) site with the plugin stanza.

This does not impact users, but only impacts developers of Alt-Ergo plugins. See the dune file for Alt-Ergo's built-in FM-Simplex plugin for reference.

Binary releases on GitHub

Starting with Alt-Ergo 2.6, we will be providing binary releases on the GitHub Releases page for Linux (x86_64) and macOS (x86_64 and arm). These are released under the same licensing conditions as the Alt-Ergo source code.

The binary releases are statically linked and have no dependencies, except for system dependencies on macOS. They do not support dynamically loading plugins.

Performance

For Alt-Ergo 2.6, our main focus of improvement in terms of reasoning was on bit-vectors and algebraic data types. Other theories also benefit from broader performance improvements we made. On our internal problem dataset, Alt-Ergo 2.6 is about 5% faster than Alt-Ergo 2.5 on the goals they both prove.

And more!

This release also includes significant internal refactoring, notably a rewrite from scratch of the interval domain. This improves the accuracy of Alt-Ergo in handling interval arithmetic and facilitates mixed operations involving integers and bit-vectors, resulting in shorter and more reliable proofs.

See the complete changelog here.

We encourage you to try out Alt-Ergo 2.6 and share your experience or any feedback on our GitHub or by email at alt-ergo@ocamlpro.com. Your input will help shape future releases!

Acknowledgements

We thank the Alt-Ergo Users' Club members: AdaCore, the CEA, Thales, Mitsubishi Electric R&D Center Europe (MERCE) and TrustInSoft.

Special thanks to David Mentré and Denis Cousineau at MERCE for funding the initial optimization work. MERCE has been a Member of the Alt-Ergo Users' Club for four years. This partnership allowed Alt-Ergo to evolve and we hope that more users will join the Club on our journey to make Alt-Ergo a must-have tool.

The dedicated members of our Alt-Ergo Club!

Flambda2 Ep. 3: Speculative Inlining

2024-08-09T09:05:17Z

A representation of Speculative Inlining through the famous Weighing Of The Heart of Egyptian Mythology. Egyptian God Anubis weighs his OCaml function, to see if it is worth inlining.
Credit: The Weighing of the Heart Ceremony, Ammit. Angus McBride (British, 1931-2007)

Welcome to a new episode of The Flambda2 Snippets!

The F2S blog posts aim at gradually introducing the world to the inner workings of a complex piece of software engineering: The Flambda2 Optimising Compiler for OCaml, a technical marvel born from a 10-year-long effort in Research & Development and Compilation, with many more years of expertise in all aspects of Computer Science and Formal Methods.

Today's article will serve as an introduction to one of the key design decisions structuring Flambda2 that we will cover in the next episode in the series: Upward and Downward Traversals.

See, there are interesting things to be said about how inlining is conducted inside of our compiler. Inlining in itself is rather ubiquitous in compilers. The goal here is to show how we approach inlining, and present what we call Speculative Inlining.

Table of contents

Inlining in general
When inlining is detrimental
How to decide when inlining is beneficial
Speculative inlining
Speculative inlining in practice
Summary
Conclusion

Inlining in general

Given the way people write functional programs, inlining is an important part of the optimisation pipeline of such functional languages.

What we call inlining in this series is the process of duplicating some code to specialise it to a specific context.

Usually, this can be thought of as copy-pasting the body of a function at its call site. A common misunderstanding is to think that the main benefit of this optimisation is to remove the cost of the function call. However, with modern computer architectures, this has become less and less relevant over the last decades. The actual benefit is to use the specific context to trigger further optimisations.

Suppose we have the following option_map and double functions:

let option_map f x =
  match x with
  | None -> None
  | Some x -> Some (f x)

let double i =
  i + i

Additionally, suppose we are currently considering the following function:

let stuff () =
  option_map double (Some 21)

In this short example, inlining the option_map function would perform the following transformation:

let stuff () =
  let f = double in
  let x = Some 21 in
  match x with
  | None -> None
  | Some x -> Some (f x)

Now we can inline the double function.

let stuff () =
  let x = Some 21 in
  match x with
  | None -> None
  | Some x ->
    Some (let i = x in i + i)

As you can see, inlining alone isn't that useful an optimisation per se. In this context, applying Constant Propagation will optimise and simplify it to the following:

let stuff () = Some 42

Although this is a toy example, combining small functions is a common pattern in functional programs. It's very convenient that using combinators is not significantly worse than writing this function by hand.
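As a sanity check, the successive versions of stuff shown above can be written side by side and compared. This is only a sketch of ours, not compiler output: the names stuff_original, stuff_step1, stuff_step2 and stuff_final are hypothetical labels for the four versions, and the assertion simply confirms that each rewriting step preserves the result.

```ocaml
let option_map f x =
  match x with
  | None -> None
  | Some x -> Some (f x)

let double i = i + i

(* The function as written by the programmer. *)
let stuff_original () = option_map double (Some 21)

(* After inlining option_map at its call site. *)
let stuff_step1 () =
  let f = double in
  let x = Some 21 in
  match x with
  | None -> None
  | Some x -> Some (f x)

(* After inlining double as well. *)
let stuff_step2 () =
  let x = Some 21 in
  match x with
  | None -> None
  | Some x -> Some (let i = x in i + i)

(* After constant propagation. *)
let stuff_final () = Some 42

let () =
  (* Every version must compute the same value. *)
  assert (
    List.for_all
      (fun stuff -> stuff () = Some 42)
      [ stuff_original; stuff_step1; stuff_step2; stuff_final ])
```

Only the last version makes the result obvious to the reader, but all four are observationally equivalent, which is exactly what an optimiser has to guarantee.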

When inlining is detrimental

We cannot just go around and inline everything, everywhere... all at once.

As we said, inlining is mainly code duplication; doing it everywhere would be detrimental and would drastically blow up the size of the compiled code. There is a sweet spot between absolute inlining and no inlining at all, but it is hard to find.

Here's an example of exploding code at inlining time:

(* val h : int -> int *)
let h n = n * n + 1 (* stand-in for some non-constant expression *)

(* val f : (int -> int) -> int -> int *)
let f g x = g (g x)

(* 4 calls to f -> 2^4 calls to h *)
let n = f (f (f (f h))) 42

Following through with the inlining process will produce a very large binary relative to its source code. This contrived example highlights potential problems that might arise in ordinary codebases in the wild, even if this one is tailored to be quite nasty for inlining. Notice the exponential blowup in the number of nested calls: every additional call to f doubles the number of calls to h after inlining.
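The doubling can be observed at run time by counting how many times h is actually called. The sketch below is an illustration of ours: the calls counter and the body chosen for h are assumptions made for the demonstration, not part of the original example.

```ocaml
(* Hypothetical instrumentation: count every call to h. *)
let calls = ref 0

(* Stand-in body for h; any int -> int function would do. *)
let h n =
  incr calls;
  n + 1

let f g x = g (g x)

(* Four nested applications of f: h ends up being called 2^4 times. *)
let n = f (f (f (f h))) 42

let () = Printf.printf "n = %d, calls to h = %d\n" n !calls
(* prints "n = 58, calls to h = 16" *)
```

Each of those 16 dynamic calls corresponds to a copy of the body of h once everything has been inlined, which is where the code-size explosion comes from.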

How to decide when inlining is beneficial

Most compilers use a collection of heuristics to guide their decision making. A good collection of heuristics is hard both to design and to fine-tune. Heuristics can also be quite specific to a programming style and unfit for other compilers to integrate. The takeaway is: there is no best way.

Side Note:

This topic would make for an interesting blog post but is, unfortunately, rather remote from the point of this article. If you are interested in going deeper into that subject right now, we have found references for you to explore until we get around to writing a comprehensive, and more digestible, explanation of the heuristic nature of inlining:

Secrets of the Glasgow Haskell Compiler inliner, by SIMON PEYTON JONES and SIMON MARLOW, 2002.

Extending the Scope of Syntactic Abstraction, by OSCAR WADDELL, 1999. Section 4.4 (PDF Download link), for the case of Scheme.

Towards Better Inlining Decisions Using Inlining Trials, by JEFFREY DEAN and CRAIG CHAMBERS, 1994.

Understanding and Exploiting Optimal Function Inlining, by THEODOROS THEODORIDIS, TOBIAS GROSSER, ZHENDONG SU, 2022.

Before we get to a concrete example, and break down Speculative Inlining for you, we would like to discuss the trade-offs of duplicating code.

CPUs execute instructions one by one, or at least they pretend that they do. In order to execute an instruction, they need to load both code and data from memory. In modern CPUs, most instructions take only a few cycles to execute and, in practice, CPUs often execute several at the same time. To put this into perspective, loading from memory can, in the worst case, take hundreds of CPU cycles. Most of the time this is not the case, because CPUs have complex cache hierarchies: loading from the instruction cache can take just a few cycles, loading from level 2 caches may take dozens of them, and the worst case is loading from main memory, which can take hundreds of cycles.

The takeaway is that, when executing a program, the cost of one instruction that has to be loaded from main memory can be larger than the cost of executing a hundred instructions that are already in cache.

There is a way to avoid the worst-case scenario. Since caches are rather small, the main way to avoid loading from main memory is to keep your program rather small, or at least the parts of it that are regularly executed.

Keep these orders of magnitude in mind when we address the trade-offs between improving the number of instructions that we run and keeping the program to a reasonably small size.

Before explaining Speculative Inlining let's consider a piece of code.

The following pattern is quite common in OCaml and other functional languages, let's see how one would go about inlining this code snippet.

Example 1: Notice the higher-order function f:

(*
  val f :
    (condition:bool -> int -> unit) 
    -> condition:bool
    -> int
    -> unit
 *)
let f g ~condition n =
  for i = 0 to n do
    g ~condition i
  done

let g_real ~condition i =
  if condition then
    (* small operation *)
  else
    (* big piece of code *)

let condition = true

let foo n =
  f g_real ~condition n

Even for such a small example, we will see that the heuristics involved in finding the right solution can become quite complex.

Keeping in mind the fact that condition is always true, the best set of inlining decisions would yield the following code:

(* All the code before [foo] is kept as is, from the previous codeblock *)
let foo n =
  for i = 0 to n do
    (* small operation *)
  done

But if condition had always been false, then instead of the small operation, we would have had a big chunk of g_real duplicated in foo (i.e. (* big piece of code *)). Moreover, it would only have spared us the running time of a few call instructions. Therefore, we would probably have preferred not to inline anything.

Specifically, we would have liked to refrain from inlining g, and to avoid inlining f as well, because doing so would have needlessly increased the size of the code with no substantial benefit.
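To make Example 1 concrete, here is a runnable variant in which the two placeholder branches are replaced by counters. This is a sketch under our own assumptions: small_ops, big_ops, foo_specialised and run are hypothetical names introduced for the demonstration, and foo_specialised is the hand-written equivalent of what the ideal inlining decisions would produce when condition is true.

```ocaml
(* Hypothetical counters standing in for the two branches. *)
let small_ops = ref 0
let big_ops = ref 0

let f g ~condition n =
  for i = 0 to n do
    g ~condition i
  done

let g_real ~condition i =
  if condition then
    (* stand-in for the small operation *)
    incr small_ops
  else
    (* stand-in for the big piece of code *)
    big_ops := !big_ops + i

let condition = true

let foo n = f g_real ~condition n

(* Hand-specialised version: f and g_real inlined, else branch removed. *)
let foo_specialised n =
  for _ = 0 to n do
    incr small_ops
  done

(* Reset the counters, run a job, and report what was executed. *)
let run job n =
  small_ops := 0;
  big_ops := 0;
  job n;
  (!small_ops, !big_ops)

let () =
  assert (run foo 9 = (10, 0));
  assert (run foo_specialised 9 = (10, 0))
```

Both versions perform the same ten small operations and never touch the big branch, but only the specialised one is free of the indirect call and of the test on condition.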

However, if we want to be able to make an educated decision based on the value of condition, we will have to consider the entirety of the code relevant to that choice. Indeed, if we just look at the code for f, or its call site in foo, nothing would guide us to the right decision. In order to make the right decision, we need to understand that if the ~condition parameter to the g_real function is true, then we can remove a large piece of code, namely the else branch and the condition check as well.

But to understand that ~condition in g_real is always true, we need to see it in the context of f in foo. This again implies that the inlining choice is not based on a property of g_real but rather on a property of the context of its call.

There exists a very large number of combinations of such difficult situations that would each require different heuristics which would be incredibly tedious to design, implement, and maintain.

Speculative inlining

We manage to circumvent the hurdle that this decision problem represents thanks to what we call Speculative Inlining. This strategy requires two properties from the compiler: the ability to inline and optimise at the same time, as well as being able to backtrack inlining decisions.

Let's look at Example 1 again and look into the Speculative Inlining strategy.

let f g ~condition n =
  for i = 0 to n do
    g ~condition i
  done

let g_real ~condition i =
  if condition then
    (* small operation *)
  else
    (* big piece of code *)

let condition = true

let foo n =
  f g_real ~condition n

We will focus only on the traversal of the foo function.

Before we try and inline anything, there are a couple things we have to keep in mind about values and functions in OCaml:

Application arity may not match function arity

To give you an idea, the function foo could also have been written in the following way:

let foo x =
  let f1 = f in
  let f2 = f1 g_real in 
  let f3 = f2 ~condition in
  f3 x

We expect the compiler to translate it as well as the original, but we cannot inline a function unless all its arguments are provided. To solve this, we need to handle partial applications precisely. Over-applications also present similar challenges.
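The mismatch between application arity and function arity can be seen on a small, self-contained case. The sketch below is ours: add3 and choose are hypothetical functions introduced only to show one partial application and one over-application.

```ocaml
(* Function arity 3. *)
let add3 a b c = a + b + c

(* Partial application: only two arguments are supplied, so the result
   is a closure still waiting for the last argument. *)
let partial = add3 1 2
let total = partial 3

(* Function arity 1: choose takes a flag and returns a function. *)
let choose flag =
  if flag then add3 else fun a b c -> a * b * c

(* Over-application: four arguments are supplied to a function of
   arity 1; the extra three go to the function it returns. *)
let sum = choose true 2 3 4
let product = choose false 2 3 4

let () =
  assert (total = 6);
  assert (sum = 9);
  assert (product = 24)
```

In both situations the body that should be inlined only becomes fully applicable once the compiler has tracked the intermediate closures, which is why these cases need precise handling.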

Functions are values in OCaml

We have to understand that the call to f in foo is not trivially a direct call to f in this context. Indeed, at this point functions could instead be stored in pairs, or lists, or even hashtables, to be later retrieved and applied at will, and we call such functions general functions.

Since our goal is to inline it, we need to know the body of the function. We call a function concrete when we have knowledge of its body. This entails Constant Propagation in order to associate a concrete function to general function values and, consequently, be able to simplify it while inlining.

Here's the simplest case to demonstrate the importance of Constant Propagation.

let foo_bar y =
  let pair = foo, y in
  (fst pair) (snd pair)

In this case, we have to look inside the pair in order to find the function. This demonstrates that we sometimes have to do some amount of value analysis in order to proceed. It's quite common to come across such cases in OCaml programs due to the module system, and other functional languages present similar characteristics.
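To contrast the two situations, here is a small sketch of ours in which one call goes through a pair that is fully visible, while another goes through a hash table filled at run time. The body given to foo, the table and the apply_named function are hypothetical, introduced only for this illustration.

```ocaml
(* A stand-in body for foo, so that the example can run. *)
let foo x = x + 1

(* The function travels through a pair, but the pair is built right
   here: a value analysis can see that (fst pair) is foo. *)
let foo_bar y =
  let pair = foo, y in
  (fst pair) (snd pair)

(* Here the function is fetched from a mutable table: without further
   knowledge of the table contents, the callee stays a general function. *)
let table : (string, int -> int) Hashtbl.t = Hashtbl.create 4

let () =
  Hashtbl.replace table "succ" foo;
  Hashtbl.replace table "double" (fun x -> 2 * x)

let apply_named name x = (Hashtbl.find table name) x

let () =
  assert (foo_bar 41 = 42);
  assert (apply_named "double" 21 = 42)
```

In foo_bar the function becomes concrete as soon as the pair is analysed; in apply_named nothing in the local code reveals which body will run.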

There are many scenarios which also require a decent amount of context in order to identify which function should be called. For example, when a function passed as a parameter is called, we need to know the context of the caller functions, sometimes up to an arbitrarily large context. Analysing the relevant context will tell us which function is being called and thus help us make educated inlining decisions. This problem is specific to functional languages: functions in good old imperative languages are seldom ambiguous, even though such considerations would be relevant when function pointers are involved.

This small code snippet shows us that we have to inline some functions in order to know whether we should have inlined them.

Speculative inlining in practice

In practice, Speculative Inlining means being able to quantify the benefits brought by a set of optimisations, which have to be applied after a given inlining decision, and using these results to determine whether said inlining decision is in fact worth carrying out, all things considered.

The criterion for accepting an inlining decision is that the resulting code should be faster than the original one. We use "should be" because program speed cannot be fully understood with absolutes.

That's why we use a heuristic algorithm in order to compare the original and the optimised versions of the code. It roughly consists in counting the number of retired (executed) instructions and comparing it to the increase in code size introduced by inlining the body of that function. The value of that cut-off ratio is by definition heuristic and different compilation options given to ocamlopt change it.
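The shape of such a comparison can be sketched as follows. This is a deliberately simplified illustration and not the formula used by Flambda2: the estimate record, its fields, the accept function and the cutoff parameter are all hypothetical names, and the numbers in the usage lines are made up.

```ocaml
(* Hypothetical summary of one speculative inlining attempt. *)
type estimate = {
  instructions_saved : int; (* estimated executed instructions removed *)
  size_increase : int;      (* estimated growth of the code size *)
}

(* Accept the decision when the benefit per unit of added code reaches
   the cut-off ratio. If the code does not grow at all, any non-negative
   saving is accepted. *)
let accept ~cutoff e =
  if e.size_increase <= 0 then e.instructions_saved >= 0
  else
    float_of_int e.instructions_saved /. float_of_int e.size_increase
    >= cutoff

let () =
  (* Many instructions removed for a small growth: accepted. *)
  assert (accept ~cutoff:0.5 { instructions_saved = 12; size_increase = 8 });
  (* Hardly any benefit for a large growth: rejected. *)
  assert (not (accept ~cutoff:0.5 { instructions_saved = 1; size_increase = 40 }))
```

Changing cutoff plays the role that compilation options play in the text: a higher value makes the compiler more reluctant to trade code size for speed.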

As said previously, we cannot go around and evaluate each inlining decision independently because there are cases where inlining a function allows for more of them to happen, and sometimes a given inlining choice validates another one. We can see this in Example 1, where deciding not to inline function g_real would make the inlining of function f useless.

Naturally, every combination of inlining decision cannot be explored exhaustively. We can only explore a small subset of them, and for that we have another heuristic that was already used in Flambda1, although Flambda2 does not yet implement it in full.

It's quite simple: we choose to consider inlining decision relationships only when there are nested calls. As for any other heuristic, it does not cover every useful case, but not only is it the easiest to implement, we are also fairly confident that it covers the most important cases.

Here's a small rundown of that heuristic:

A is a function which calls B
- Case 1: we evaluate the body of A at its definition, possibly inlining B in the process
- Case 2: at a specific callsite of A, we evaluate A in the inlining context.
  - Case 2.a: inlining A is beneficial no matter the decision on B, so we do it.
  - Case 2.b: inlining A is potentially detrimental, so we go and evaluate B before deciding to inline A for good.

Keep in mind that case 2.b is recursive and can go arbitrarily deep. This amounts to looking for the best leaf in the decision tree. Since we can't explore the whole tree, we do have a limit on the depth of the exploration.
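The rundown above can be mimicked with a toy model of a depth-limited exploration. Everything in this sketch is an assumption of ours and not the actual Flambda1 or Flambda2 algorithm: the call record, the integer gain and cost fields, and the net and decide functions are hypothetical simplifications.

```ocaml
(* A call site, with made-up numbers for the benefit and the code-size
   cost of inlining it, and the calls found in the inlined body. *)
type call = {
  name : string;
  gain : int;
  cost : int;
  nested : call list;
}

type decision = Inline | Keep

(* Net benefit of inlining [c], exploring at most [depth] nested levels. *)
let rec net ~depth c =
  let alone = c.gain - c.cost in
  if alone >= 0 || depth = 0 then
    (* Case 2.a (already beneficial) or exploration limit reached. *)
    alone
  else
    (* Case 2.b: speculate on the nested calls, and only count the ones
       that turn out to be profitable. *)
    List.fold_left
      (fun acc n -> acc + max 0 (net ~depth:(depth - 1) n))
      alone c.nested

let decide ~depth c = if net ~depth c >= 0 then Inline else Keep

(* Mirrors Example 1: inlining f only pays off if g_real does. *)
let g_small = { name = "g_real, condition = true"; gain = 10; cost = 2; nested = [] }
let g_big = { name = "g_real, condition = false"; gain = 1; cost = 50; nested = [] }
let f_with g = { name = "f"; gain = 1; cost = 4; nested = [ g ] }

let () =
  assert (decide ~depth:1 (f_with g_small) = Inline);
  assert (decide ~depth:1 (f_with g_big) = Keep);
  (* With no exploration budget, f alone is not worth inlining. *)
  assert (decide ~depth:0 (f_with g_small) = Keep)
```

The depth argument is the exploration limit mentioned above: it bounds how far the search goes down the decision tree before giving up on a branch.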

Reminder for our fellow Cameleers: Flambda1 and Flambda2 have a flag you can pass through the CLI which generates a .org file detailing all the inlining decisions taken by the compiler. That flag is: -inlining-report. Note that .org files make it easy to visualise a decision tree inside the Emacs editor.

Summary

By now, you should have a better understanding of the intricacies inherent to Speculative Inlining. Prior to its inception, it was fair to question how feasible (and eligible, considering the many requirements for developing a compiler) such an algorithm would be in practice. Since then, it has demonstrated its usefulness in Flambda1 and, consequently, its porting to Flambda2 was called for.

So before we move on to the next stop in the F2S series, let's summarize what we know of Speculative Inlining.

We learned that inlining is the process of copying the body of a function at its callsite. We also learned that it is not a very interesting transformation by itself, especially nowadays with how efficient modern CPUs are, but that its usefulness is found in how it facilitates other optimisations to take place later.

We also learned about the heuristic nature of inlining and how difficult it would be to maintain finely-tailored heuristics in the long run, as many others have tried before us. Actually, it is because there is no best way that we came up with the need for an algorithm capable of simultaneously inlining and optimising, as well as backtracking when needed, which we called Speculative Inlining. In a nutshell, Speculative Inlining is one of the algorithms of the optimisation framework of Flambda2 which allows other optimisations to take place.

We have covered the constraints that the algorithm has to respect for it to hold ground in practice, like performance. We value a fast compiler and aim to keep both its execution and the code it generates fast. Take an optimisation such as Constant Propagation as an example. It would be a naïve approach to try and perform this transformation everywhere, because the resulting complexity of the compiler would amount to something like size_of_the_code * number_of_inlinings_performed, which is unacceptable to say the least. We aim at making the complexity of our compiler linear in the code size, which in turn entails plenty of logarithms anytime it is possible. Instead, we choose to apply any transformation only in the inlined parts of the code.

With all these parameters in mind, can we imagine ways to tackle these multi-layered challenges all at the same time? There are solutions out there that do so in an imperative manner. In fact, the most intuitive way to implement such an algorithm may be fairly easily done with imperative code. You may want to read about Equality Saturation for instance, or even download Manuel Serrano's Paper inside the Scheme Bigloo compiler to learn more about it. However, we require backtracking, and the nested nature of these transformations (inlining, followed by different optimising transformations) would make backtracking bug-prone and tedious to maintain if it were to be written imperatively.

It soon became evident for us that we were going to leverage one of the key characteristics of functional languages in order to make this whole ordeal easier to design, implement and maintain: purity of terms. Indeed, not only is it easier to support backtracking when manipulating pure code, but it also becomes impossible for us to introduce cascades of hard to detect nested bugs by avoiding transforming code in place. From this point on, we knew we had to perform all transformations at the same time, making our inlining function one that would return an optimised inlined function. This does introduce complexities that we have chosen over the hurdles of maintaining an imperative version of that same algorithm, which can be seen as pertaining to graph traversal and tree rewriting for all intents and purposes.

Despite the density of this article, keep in mind that we aim at explaining Flambda2 in the most comprehensive manner possible and that there are voluntary shortcuts taken throughout these snippets for all of this to make sense for the broader audience. In time, these articles will go deep into the guts of the compiler and by then, hopefully, we will have done a good job at providing our readers with all necessary information for all of you to continue enjoying this rabbit-hole with us!

Here's a pseudo-code snippet representing Speculative Inlining.

(* Pseudo-code representing the actual speculation *)
let try_inlining f env args =
  let inlined_version_of_f = inline f env args in
  let benefit = compare inlined_version_of_f f in
  if benefit > 0 then
    inlined_version_of_f
  else
    f
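For readers who want something they can run, here is a toy version of the same idea. It is only a sketch under strong assumptions: the expr language, the func record, and the size, subst and simplify helpers are hypothetical and far simpler than Flambda2's IR, and comparing term sizes merely stands in for the compiler's real benefit estimate.

```ocaml
(* Hypothetical toy language, not Flambda2's IR. *)
type expr =
  | Const of int
  | Var of string
  | Add of expr * expr
  | Call of string * expr           (* call to a known one-argument function *)

type func = { param : string; body : expr }

(* Crude cost model: number of nodes in the term. *)
let rec size = function
  | Const _ | Var _ -> 1
  | Add (a, b) -> 1 + size a + size b
  | Call (_, a) -> 1 + size a

(* Substitute [v] for variable [x]; the toy language has no binders,
   so no capture can occur. *)
let rec subst x v = function
  | Var y when y = x -> v
  | (Const _ | Var _) as e -> e
  | Add (a, b) -> Add (subst x v a, subst x v b)
  | Call (f, a) -> Call (f, subst x v a)

(* Constant folding, standing in for "all the other optimisations". *)
let rec simplify = function
  | Add (a, b) ->
    (match simplify a, simplify b with
     | Const m, Const n -> Const (m + n)
     | a', b' -> Add (a', b'))
  | Call (f, a) -> Call (f, simplify a)
  | e -> e

(* Speculate: inline and simplify, keep the result only if it is smaller.
   Terms are pure values, so backtracking is just returning [call]. *)
let try_inlining env call =
  match call with
  | Call (f, arg) ->
    (match List.assoc_opt f env with
     | Some { param; body } ->
       let inlined = simplify (subst param arg body) in
       if size inlined < size call then inlined else call
     | None -> call)
  | e -> e

let env = [ "incr", { param = "x"; body = Add (Var "x", Const 1) } ]
```

Inlining incr on a constant argument folds down to a single constant and is kept, whereas on an unknown variable the inlined body is larger than the call, so the original term is returned untouched.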

Conclusion

As we said at the start of this article, this one is but an introduction to a major topic we will cover next, namely: Upwards and Downwards Traversals.

We had to cover Speculative Inlining first. It is a reasonably approachable solution to a complex problem, and having an idea of all the requirements for its good implementation is half of the work done for understanding key design decisions such as how code traversal was designed for algorithms such as Speculative Inlining to hold out.

Thank you all for reading! We hope that these articles will keep the community hungry for more!

Until next time, keep calm and OCaml! ⚱️🐫🏺📜

opam 2.2.0 release!

2024-07-01T09:05:17Z

Feedback on this post is welcomed on Discuss!

We are very pleased to announce the release of opam 2.2.0, and encourage all users to upgrade. Please read on for installation and upgrade instructions.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com, and published in discuss.ocaml.org.

Try it!

In case you plan a possible rollback, you may want to first back up your ~/.opam or $env:LOCALAPPDATA\opam directory.

The upgrade instructions are unchanged:

Either from binaries: run

For Unix systems

bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh) --version 2.2.0"

or from PowerShell for Windows systems

Invoke-Expression "& { $(Invoke-RestMethod https://raw.githubusercontent.com/ocaml/opam/master/shell/install.ps1) }"

or download manually from the GitHub "Releases" page to your PATH.

Or from source, manually: see the instructions in the README.

You should then run:

opam init --reinit -ni

Changes

Major change: Windows support

After 8 years' effort, opam and opam-repository now have official native Windows support! A big thank you is due to Andreas Hauptmann (@fdopen), whose WODI and OCaml for Windows projects were for many years the principal downstream way to obtain OCaml on Windows, Jun Furuse (@camlspotter) whose initial experimentation with OPAM from Cygwin formed the basis of opam-repository-mingw, and, most recently, Jonah Beckford (@jonahbeckford) whose DkML distribution kept - and keeps - a full development experience for OCaml available on Windows.

OCaml when used on native Windows requires certain tools from the Unix world which are provided by either Cygwin or MSYS2. We have engineered opam init so that it is possible for a user not to need to worry about this, with opam managing the Unix world, and the user being able to use OCaml from either the Command Prompt or PowerShell. However, for the Unix user coming over to Windows to test their software, it is also possible to have your own Cygwin/MSYS2 installation and use native Windows opam from that. Please see the previous blog post for more information.

There are two "ports" of OCaml on native Windows, referred to by the name of provider of the C compiler. The mingw-w64 port is GCC-based. opam's external dependency (depext) system works for this port (including providing GCC itself), and many packages are already well-supported in opam-repository, thanks to the previous efforts in opam-repository-mingw. The MSVC port is Visual Studio-based. At present, there is less support in this ecosystem for external dependencies, though this is something we expect to work on both in opam-repository and in subsequent opam releases. In particular, it is necessary to install Visual Studio or Visual Studio BuildTools separately, but opam will then automatically find and use the C compiler from Visual Studio.

Major change: opam tree / opam why

opam tree is a new command showing packages and their dependencies with a tree view. It is very helpful to determine which packages bring which dependencies in your installed switch.

$ opam tree cppo
cppo.1.6.9
├── base-unix.base
├── dune.3.8.2 (>= 1.10)
│   ├── base-threads.base
│   ├── base-unix.base [*]
│   └── ocaml.4.14.1 (>= 4.08)
│       ├── ocaml-base-compiler.4.14.1 (>= 4.14.1~ & < 4.14.2~)
│       └── ocaml-config.2 (>= 2)
│           └── ocaml-base-compiler.4.14.1 (>= 4.12.0~) [*]
└── ocaml.4.14.1 (>= 4.02.3) [*]

Reverse-dependencies can also be displayed using the new opam why command. This is useful to examine how dependency versions get constrained.

$ opam why cmdliner
cmdliner.1.2.0
├── (>= 1.1.0) b0.0.0.5
│   └── (= 0.0.5) odig.0.0.9
├── (>= 1.1.0) ocp-browser.1.3.4
├── (>= 1.0.0) ocp-indent.1.8.1
│   └── (>= 1.4.2) ocp-index.1.3.4
│       └── (= version) ocp-browser.1.3.4 [*]
├── (>= 1.1.0) ocp-index.1.3.4 [*]
├── (>= 1.1.0) odig.0.0.9 [*]
├── (>= 1.0.0) odoc.2.2.0
│   └── (>= 2.0.0) odig.0.0.9 [*]
├── (>= 1.1.0) opam-client.2.2.0~alpha
│   ├── (= version) opam.2.2.0~alpha
│   └── (= version) opam-devel.2.2.0~alpha
├── (>= 1.1.0) opam-devel.2.2.0~alpha [*]
├── (>= 0.9.8) opam-installer.2.2.0~alpha
└── user-setup.0.7

Special thanks to @cannorin for contributing this feature.

Major change: with-dev-setup

There is now a way for a project maintainer to share their project development tools: the with-dev-setup dependency flag. It is used in the same way as with-doc and with-test: by adding a {with-dev-setup} filter after a dependency. It will be ignored when installing normally, but it's pulled in when the package is explicitly installed with the --with-dev-setup flag specified on the command line.

For example

opam-version: "2.0"
depends: [
  "ocaml"
  "ocp-indent" {with-dev-setup}
]
build: [make]
install: [make "install"]
post-messages:
[ "Thanks for installing the package"
  "as well as its development setup. It will help with your future contributions" {with-dev-setup} ]

Major change: opam pin --recursive

When pinning a package using opam pin, opam looks for opam files in the root directory only. With recursive pinning, you can now instruct opam to look for .opam files in subdirectories as well, while maintaining the correct relationship between the .opam files and the package root for versioning and build purposes.

Recursive pinning is enabled by the following options to opam pin and opam install:

With --recursive, opam will look for .opam files recursively in all subdirectories.
With --subpath <path>, opam will only look for .opam files in the subdirectory <path>.

The two options can be combined: for instance, if your opam packages are stored as a deep hierarchy in the mylib subdirectory of your project you can try opam pin . --recursive --subpath mylib.

These options are useful when dealing with a large monorepo-type repository with many opam libraries spread about.

New Options

opam switch -, inspired by git switch -, makes opam switch back to the previously selected global switch.
opam pin --current fixes a package to its current state (disabling pending reinstallations or removals from the repository). The installed package will be pinned to its current installed state, i.e. the pinned opam file is the one installed.
opam pin remove --all removes all the pinned packages from a switch.
opam exec --no-switch removes the opam environment when running a command. It is useful when you want to launch a command without opam environment changes.
opam clean --untracked removes untracked files interactively remaining from previous packages removal.
opam admin add-constraint <cst> --packages pkg1,pkg2,pkg3 applies the given constraint to a given set of packages.
opam list --base has been renamed into --invariant, reflecting the fact that since opam 2.1 the "base" packages of a switch are instead expressed using a switch invariant.
opam install --formula <formula> installs a formula instead of a list of packages. This can be useful if you would like to install one package or another one. For example opam install --formula '"extlib" |"extlib-compat"' will install either extlib or extlib-compat depending on what's best for the current switch.

Miscellaneous changes

The UI now displays a status when extracting an archive or reloading a repository
Overhauled the implementation of opam env, fixing many corner cases for environment updates and making the reverting of package environment variables precise. As a result, using setenv in an opam file no longer triggers a lint warning.
Fix parsing pre-opam 2.1.4 switch import files containing extra-files
Add a new sys-ocaml-system default global eval variable
Hijack the "%{var?string-if-true:string-if-false-or-undefined}%" syntax to support extending the variables of packages with + in their name (conf-c++ and conf-g++ already exist) using "%{?pkgname:var:}%"
Fix issues when using fish as shell
Sandbox: Mark the user temporary directory (as returned by getconf DARWIN_USER_TEMP_DIR) as writable when TMPDIR is not defined on macOS
Add Warning 69: Warn for new syntax when package name in variable in string interpolation contains several '+' (this is related to the "hijack" item above)
Add support for Wolfi OS, treating it like Alpine family as it also uses apk
Sandbox: /tmp is now writable again, restoring POSIX compliance
Add a new opam admin add-extrafiles command to add/check/update the extra-files: field according to the files present in the files/ directory
Add a new opam lint -W @1..9 syntax to allow marking a set of warnings as errors
Fix bugs in the handling of the OPAMCURL, OPAMFETCH and OPAMVERBOSE environment variables
Fix bugs in the handling of the --assume-built argument
Software Heritage fallbacks are now supported, but disabled by default for now. For more information you can read one of our previous blog posts

Many other general and performance improvements were made, and many bugs were fixed. You can take a look at previous blog posts. API changes and a more detailed description of the changes are listed in:

This release also includes PRs improving the documentation and improving and extending the tests.

Please report any issues to the bug-tracker.

We hope you will enjoy the new features of opam 2.2! 📯

Flambda2 Ep. 2: Loopifying Tail-Recursive Functions

2024-05-07T09:05:17Z

Two camels are taking a break from crossing the desert, they know their path could not have been more optimised.

Welcome to a new episode of The Flambda2 Snippets!

Today's topic is Loopify, one of Flambda2's many optimisation algorithms, which specifically deals with optimising purely tail-recursive functions and/or functions annotated with the [@@loop] attribute in OCaml.

A lazy explanation for its utility would be to say that it simply aims at reducing the number of memory allocations in the context of recursive and tail-recursive function calls in OCaml. However, we will see that this is just part of the point, so we will address the broader context: what tail-calls are, how they are optimised and how they fit in the functional programming world, what dilemma Loopify nullifies exactly and, in time, many details on how it's all implemented!

If you happen to be stumbling upon this article and wish to get a bird's-eye view of the entire F2S series, be sure to refer to Episode 0 which does a good amount of contextualising as well as summarising of, and pointing to, all subsequent episodes.

All feedback is welcome, thank you for staying tuned and happy reading!

The F2S blog posts aim at gradually introducing the world to the inner-workings of a complex piece of software engineering: The Flambda2 Optimising Compiler, a technical marvel born from a 10 year-long effort in Research & Development and Compilation; with many more years of expertise in all aspects of Computer Science and Formal Methods.

Table of contents

Tail-Call Optimisation
Tail-Calls in OCaml
The Conundrum of Reducing Allocations Versus Writing Clean Code
Loopify
Conclusion

Tail-Call Optimisation

As far as we know, Tail-Call optimisation (TCO) has been a reality since at least the 70s. Some LISP implementations used it and Scheme made it part of its language specification around 1975.

The debate over supporting TCO still comes up regularly today. Nowadays, it's a given that most functional languages support it (Scala, OCaml, Haskell, Scheme and so on...). Other languages and compilers have supported it for some time too: either optionally, as with some C compilers (gcc and clang) that support TCO in some specific compilation scenarios; or systematically, like Lua, which, despite not usually being considered a functional language, specifies that TCO occurs whenever possible (you may want to read section 3.4.10 of the Lua manual here).

So what exactly is Tail-Call Optimisation?

A place to start would be the Wikipedia page. You may also find some precious insight about the link between the semantics of GOTO and tail calls here, a course from Xavier Leroy at the Collège de France, which is in French.

In addition to these resources, here are images to help you visualise how TCO improves stack memory consumption. Assume that g is a recursive function called from f:

A representation of the textbook behaviour for recursive functions stackframe allocations. You can see here that the stackframes of non-tail-recursive functions are allocated sequentially on decreasing memory addresses which may eventually lead to a stack overflow.

Now, let's consider a tail-recursive implementation of the g function in a context where TCO is not supported. Tail-recursion means that the last thing t_rec_g does before returning is calling itself. The key is that we still have a frame for the caller version of t_rec_g but we know that it will only be used to return to the parent. The frame itself no longer holds any relevant information besides the return address, so the corresponding memory space is mostly wasted.

A representation of the textbook behaviour for tail-recursive functions stackframe allocations without Tail Call Optimisation (TCO). When TCO is not implemented the behaviour for these allocations and the potential for a stack overflow are the same as with non-tail-recursive functions.

And finally, let us look at the same function in a context where TCO is supported. It is now apparent that memory consumption is much improved by the fact that we reuse the space from the previous stackframe to allocate the next one all the while preserving its return address:

A representation of the textbook behaviour for tail-recursive functions stackframe allocations with TCO. Since TCO is implemented, we can see that the stack memory consumption is now constant, and that the potential that this specific tail-recursive function will lead to a stack overflow is diminished.
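As a source-level counterpart to these pictures, here is a minimal sketch of what g and t_rec_g could look like. The function bodies are assumptions chosen for illustration (they simply count down from n); only the names come from the text above.

```ocaml
(* Not tail-recursive: the addition happens after the recursive call
   returns, so every pending call keeps its own stack frame alive. *)
let rec g n = if n = 0 then 0 else 1 + g (n - 1)

(* Tail-recursive: the recursive call is the last thing the function
   does, so with TCO the current frame can be reused for the next call. *)
let rec t_rec_g acc n = if n = 0 then acc else t_rec_g (acc + 1) (n - 1)

(* Entry point playing the role of [f] in the pictures. *)
let f n = t_rec_g 0 n
```

Both functions compute the same result; the difference lies only in how much stack they need while doing so.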

Tail-Calls in OCaml

The List data structure is fundamental to and ubiquitous in functional programming. Therefore, it's important to not have an arbitrary limit on the size of lists that one can manipulate. Indeed, most List manipulation functions are naturally expressed as recursive functions, and can most of the time be implemented as tail-recursive functions. Without guaranteed TCO, a programmer could not have the assurance that their program would not stack overflow at some point. That reasoning also applies to a lot of other recursive data structures that commonly occur in programs or libraries.

In OCaml, TCO is guaranteed. Ever since its inception, Cameleers have unanimously agreed to guarantee the optimisation of tail-calls. While the compiler's support for TCO has been a thing from the beginning, an attribute, [@tailcall], was later added to help users ensure that their calls are in tail position.

Recently, TCO was also extended with the Tail Mod Cons optimisation, which allows the compiler to generate tail-calls in more cases.
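Here is a short sketch of how both attributes are typically written. The functions length_aux, length and map are illustrative names, and the [@tail_mod_cons] part assumes a compiler recent enough to know that attribute (it was introduced in OCaml 4.14).

```ocaml
(* [@tailcall] asks the compiler to warn if the annotated call
   turns out not to be in tail position. *)
let rec length_aux acc = function
  | [] -> acc
  | _ :: tl -> (length_aux [@tailcall]) (acc + 1) tl

let length l = length_aux 0 l

(* With Tail Mod Cons, a recursive call sitting directly under a
   constructor application can still be compiled as a tail-call. *)
let[@tail_mod_cons] rec map f = function
  | [] -> []
  | x :: xs ->
    let y = f x in
    y :: map f xs
```

The map function above keeps its natural, non-accumulator shape; it is the attribute that requests the transformation.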

The Conundrum of Reducing Allocations Versus Writing Clean Code

One would find one of the main purposes for the existence of Loopify in the following conversation: a Discuss Post about the unboxing of floating-point values in OCaml and performance.

This specific comment sparks a secondary conversation that you may want to read yourself. You will find a quick breakdown of it below, which makes a nice starting point to understand today's subject.

Consider the following code:

let sum l =
  let rec loop s l =
    match l with
    | [] -> s
    | hd :: tl ->
      (* This allocates a boxed float *)
      let s = s +. hd in
      loop s tl 
  in
  loop 0. l

This is a simple tail-recursive implementation of a sum function for a list of floating-point numbers. However, this is not as efficient as we would like it to be.

Indeed, OCaml needs a uniform representation of its values in order to implement polymorphic functions. In the case of floating-point numbers this means that the numbers are boxed whenever they need to be used as generic values.

Besides, every time we call a function, all parameters have to be considered as generic values. We thus cannot avoid their allocation at each recursive call in this function.

If we were to optimise it in order to get every last bit of performance out of it, we could try something like:

Warning: The following was coded by trained professionals, do NOT try this at home.

let sum l = 
  (* Local references *)
  let s = ref 0. in
  let cur = ref l in
  try
    while true do
      match !cur with
      | [] -> raise Exit
      | hd :: tl ->
        (* Unboxed floats -> No allocation *)
        s := !s +. hd;
        cur := tl
    done; assert false
  with Exit -> !s (* The only allocation *)

While in general references introduce one allocation and a layer of indirection, when the compiler can prove that a reference is strictly local to a given function it will use mutable variables instead of reference cells.

In our case, s and cur do not escape the function and are therefore eligible for this optimisation.

After this optimisation, s is now a mutable variable of type float and so it can also trigger another optimisation: float unboxing.

You can see more details here but note that, in this specific example, all occurrences of boxing operations disappear except a single one at the end of the function.

We like to think that not forcing the user to write such code is a benefit, to say the least.

Loopify

Concept

There is a general concept of transforming function-level control-flow into direct IR continuations to benefit from "basic block-level" optimisations. One such pattern is present in the local-function optimisation triggered by the [@local] attribute. Here's the link to the PR that implements it. Loopify is an attempt to extend the range of this kind of optimisation to proper (meaning self) tail-recursive calls.

As you saw previously, in some cases (e.g. numerical calculus), recursive functions hurt performance because they introduce allocations.

That lost performance can be recovered by hand-writing loops using local references; however, it's unfortunate to encourage non-functional code in a language such as OCaml.

One of Flambda and Flambda2's goals is to avoid situations such as those and allow for good-looking, functional code, to be as performant as code which is written and optimised by hand at the user-level.

Therefore, we introduce a solution to the specific problem described above with Loopify, which, in a nutshell, transforms tail-recursive functions into non-recursive functions containing a loop, hence the name.

Deciding to Loopify or not

The decision to loopify a given function is made during the conversion from the Lambda IR to the Flambda2 IR. The conversion is triggered in two cases:

when a function is purely tail-recursive, meaning all its uses within its body are self-tail calls (these are called proper calls);
when an annotation is given by the user in the source code using the [@loop] attribute.

Let's see an example of each, starting with the first case:

(* Not a tail-rec function: is not loopified *)
let rec map f = function
  | [] -> []
  | x :: r -> f x :: map f r

(* Is tail-rec: is loopified *)
let rec fold_left f acc = function
  | [] -> acc
  | x :: r -> fold_left f (f acc x) r

Here, the decision to loopify is automatic and requires no input from the user. Quite straightforward.

Onto the second case now:

(* Helper function, not recursive, nothing to do. *)
let log dbg f arg =
  if dbg then
    print_endline "Logging...";
  f arg
[@@inline]

(* 
  Not tail-rec in the source, but may become
  tail-rec after inlining of the [log] function.
  At this point we can loopify, provided that the
  user specified a [@@loop] attribute.
*)
let rec iter_with_log dbg f = function
  | [] -> ()
  | x :: r ->
    f x;
    log dbg (iter_with_log dbg f) r
[@@loop]

The recursive function iter_with_log is not initially purely tail-recursive.

However, after the inlining of the log function and then simplification, the new code for iter_with_log becomes purely tail-recursive.

At that point we have the ability to loopify the function, but we refrain from doing so unless the user specifies the [@@loop] attribute on the function definition.

The nature of the transformation

Onto the details of the transformation.

First, we introduce a recursive continuation at the start of the function. Let's call it self (it appears as k_self in the snippets below).

Then, at each tail-recursive call, we replace the function call with a continuation call to self with the same arguments as the original call.

let rec iter_with_log dbg f l =
  let_cont rec k_self dbg f l =
    match l with
    | [] -> ()
    | x :: r ->
      f x;
      log dbg (iter_with_log dbg f) r
  in
  apply_cont k_self (dbg, f, l)

Then, we inline the log function:

let rec iter_with_log dbg f l =
  let_cont rec k_self dbg f l =
    match l with
    | [] -> ()
    | x :: r ->
      f x;
      (* Here the inlined code starts *)
      (*
        We first start by binding the arguments of the
        original call to the parameters of the function's code
       *)
      let dbg = dbg in
      let f = iter_with_log dbg f in
      let arg = r in
      if dbg then
        print_endline "Logging...";
      f arg
  in
  apply_cont k_self (dbg, f, l)

Then, following these transformations, we discover a proper tail-recursive call, which we replace with the adequate continuation call.

let rec iter_with_log dbg f l =
  let_cont rec k_self dbg f l =
    match l with
    | [] -> ()
    | x :: r ->
      f x;
      (* Here the inlined code starts *)
      (*
        Here, the let bindings have been substituted
        by the simplification.
       *)
      if dbg then
        print_endline "Logging...";
      apply_cont k_self (dbg, f, r)
  in
  apply_cont k_self (dbg, f, l)

In this context, the benefit of transforming a function call to a continuation call is mainly about allowing other optimisations to take place. As shown in the previous section, one of these optimisations is unboxing which can be important in some cases like numerical calculus. Such optimisations can take place because continuations are local to a function while OCaml ABI-abiding function calls require a prior global analysis.

One could think that a continuation call is intrinsically cheaper than a function call. However, the OCaml compiler already optimises self-tail-calls such that they are as cheap as continuation calls (i.e., a single jump instruction).

An astute reader could realise that this transformation can apply to any function and will result in one of three outcomes:

if a function is not tail-recursive, or even not recursive at all, the transformation does nothing;
if a function is purely tail-recursive, then all recursive calls will be replaced with continuation calls and the function after optimisation will no longer be recursive. This allows us to later inline it and even specialise some of its arguments. This happens precisely when we automatically decide to loopify a function;
if a function is not purely tail-recursive but contains some tail-recursive calls, then the transformation will rewrite those calls but not the other ones. This may result in better code but it's hard to be sure in advance. In such cases (and cases where functions become purely tail-recursive only after inlining), users can force the transformation by using the [@@loop] attribute.
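The third outcome can be illustrated with a small, hypothetical example: the tree type and the count_leaves function below are not from the compiler sources, they only show a function with one tail-recursive call and one that is not. We assume here that a compiler without Flambda2 simply ignores the [@@loop] attribute, so the code stays valid OCaml either way.

```ocaml
type tree = Leaf | Node of tree * tree

(* Partially tail-recursive: the call on the left subtree must return
   before we continue, while the call on the right subtree is in tail
   position and is the one Loopify could turn into a continuation call. *)
let rec count_leaves acc = function
  | Leaf -> acc + 1
  | Node (l, r) ->
    let acc = count_leaves acc l in   (* not a tail call *)
    count_leaves acc r                (* tail call *)
[@@loop]
```

After the rewrite, the function would still be recursive because of the left-subtree call, which is why the benefit is harder to predict than in the purely tail-recursive case.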

Conclusion

Here it is, the concept behind the Loopify optimisation pass as well as the general context and philosophy which led to its inception!

It should be clear enough now that having to choose between writing clean or efficient code was always unsatisfactory to us. With Loopify, as well as with the rest of the Flambda and Flambda2 compiler backends, we aim at making sure that users do not have to write imperative code to get efficient programs: functional code should be just as efficient. Ideally, any way of writing a piece of code should be as efficient as the next.

This article describes one of the very first user-facing optimisations of this series of snippets on Flambda2. We have not gotten into any of the neat implementation details yet. This is a topic for another time. The functioning of Loopify will be much clearer next time we talk about it.

Loopify is only applied automatically when the tail-recursive nature of a function call is visible in the source from the get-go. However, the optimisations applied by Loopify can still very much be useful in other situations, as seen earlier in this article. That is why we have the [@loop] attribute in order to enforce loopification. Good canonical examples for applying Loopify with the [@loop] attribute would be partially tail-recursive functions (i.e., functions with only some tail-recursive paths), or functions which are not obviously tail-recursive in the source code but could become so after some optimisation steps.

This transformation illustrates a core principle behind the Flambda2 design: applying a somewhat naïve optimisation that is not transformative by itself, but changes the way the compiler can look at the code and trigger a whole lot of other useful ones. Conversely, it being triggered in the middle of the inlining phase can allow some non-obvious cases to become radically better. Coding a single optimisation that would discover the cases demonstrated in the examples above would be quite complex, while this one is rather simple thanks to these principles.

Throughout the entire series of snippets, we will continue seeing these principles in action, starting with the next blog post that will introduce Downward and Upward Traversals.

Stay tuned, and thank you for reading, until next time, see you Space Cowboy. 🤠

Fixing and Optimizing the GnuCOBOL Preprocessor

2024-04-30T09:05:17Z

In this post, I will present some work that we did on the GnuCOBOL compiler, the only fully-mature open-source compiler for COBOL. It all started with a bug reported by one of our customers that we fixed by improving the preprocessing pass of the compiler. We later went on and optimised it to get better performance than the initial version.

Supporting the GnuCOBOL compiler has become one of our commercial activities. If you are interested in this project, we have a dedicated website on our SuperBOL offer, a set of tools and services to ease deploying GnuCOBOL in a company to replace proprietary COBOL environments.

At OCamlPro, we often favor correctness over performance. But in the end, our software is correct AND often faster than its competitors! Optimizing software is an art that often contradicts popular beliefs.

Table of contents

Preprocessing and Replacements in COBOL
Preprocessing in the GnuCOBOL Compiler
Conformance to the ISO Standard
Preprocessing with Automata on Streams
Some Performance Issues
Optimising Allocations
What about Fast Paths ?
Conclusion

Preprocessing and Replacements in COBOL

COBOL was born in 1959, at a time when the science of programming languages was just starting. If you had to design a new language for the same purpose today, the result would be very different: you would make different mistakes, but maybe not fewer. Actually, COBOL has proven particularly resilient to time, as it is still used more than 60 years later! Though it has evolved over the years (the last ISO standard for COBOL was released in January 2023), the kernel of the language is still the same, showing that most of the initial design choices were not perfect, but still got the job done.

One of these choices, which would surely scare off young developers, is how COBOL favors code reusability and sharing, through replacements done in its preprocessor.

Let's consider this COBOL code, this will be our example for the rest of this article:

DATA DIVISION.
WORKING-STORAGE SECTION.
  01 VAL1.
    COPY MY-RECORD REPLACING ==:XXX:== BY ==VAL1==.
  01 VAL2.
    COPY MY-RECORD REPLACING ==:XXX:== BY ==VAL2==.
  01 COUNTERS.
     05 COUNTER-NAMES  PIC 999 VALUE 0.
     05 COUNTER-VALUES PIC 999 VALUE 0.

We are using the free format, a modern way of formatting code; the older fixed format would require leaving a margin of 7 characters on the left. We are in the DATA division, the part of the program that defines the format of data, and specifically, in the WORKING-STORAGE section, where global variables are defined. In standard COBOL, there are no local variables, so the WORKING-STORAGE section usually contains all the variables of the program, even temporary ones.

In COBOL, there are variables of basic types (integers and strings with specific lengths), and composite types (arrays and records). Records are defined using levels: global variables are at level 01 (such as VAL1, VAL2 and COUNTERS in our example), whereas most other levels indicate inner fields: here, COUNTER-NAMES and COUNTER-VALUES are two direct fields of COUNTERS, as shown by their lower level 05 (both are actually integers of 3 digits as specified by PIC 999). Moreover, COBOL programmers like to be able to access fields directly, by making them unique in the program: it is thus possible to use COUNTER-NAMES everywhere in the program, without referring to COUNTERS itself (note that if the field wasn't assigned a unique name, it would be possible to use COUNTER-NAMES OF COUNTERS to disambiguate them).

On the other hand, in older versions of COBOL, there were no type definitions.

So how would one create two record variables with the same content?

One would use the preprocessor to include the same file, describing the structure of the record, several times into the program. One would also use that same file to describe the format of some data files storing such records. Actually, COBOL developers use external tools to manage data files and generate the descriptions, which are then included into COBOL programs in order to manipulate the files (pacbase, for example, is one such tool).

In our example, there would be a file MY-RECORD.CPY (usually called a copybook), containing something like the following somewhere in the filesystem:

05 :XXX:-USERNAME PIC X(30).
05 :XXX:-BIRTHDATE.
  10 :XXX:-BIRTHDATE-YEAR PIC 9999.
  10 :XXX:-BIRTHDATE-MONTH PIC 99.
  10 :XXX:-BIRTHDATE-MDAY PIC 99.
05 :XXX:-ADDRESS PIC X(100).

This code excerpt is actually not really correct COBOL code, because identifiers cannot contain a :XXX: part. It was written instead to be included and modified in other COBOL programs.

Indeed, the following line will include the file and perform a replacement of a :XXX: partial token by VAL1:

COPY MY-RECORD REPLACING ==:XXX:== BY ==VAL1==.

So, in our main example, we now have two global record variables VAL1 and VAL2, of the same format, but containing fields with unique names such as VAL1-USERNAME and VAL2-USERNAME.
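To make the effect of the REPLACING clause concrete, here is a minimal OCaml sketch of a partial-token substitution applied to copybook lines. It is only an illustration: `replace_all` and `copy_replacing` are hypothetical helpers that do a plain substring substitution on text lines, whereas a real COBOL preprocessor works on tokens.

```ocaml
(* Replace every occurrence of [pattern] in [line] by [by].
   Hypothetical helper: plain substring substitution, not GnuCOBOL's algorithm. *)
let replace_all ~pattern ~by line =
  let plen = String.length pattern and len = String.length line in
  let buf = Buffer.create len in
  let rec go i =
    if i >= len then ()
    else if plen > 0 && i + plen <= len && String.sub line i plen = pattern
    then begin
      Buffer.add_string buf by;
      go (i + plen)
    end else begin
      Buffer.add_char buf line.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf

(* Two lines taken from the MY-RECORD copybook of the article. *)
let copybook = [
  "05 :XXX:-USERNAME PIC X(30).";
  "05 :XXX:-ADDRESS PIC X(100).";
]

(* Mimics COPY ... REPLACING ==pattern== BY ==by== on a list of lines. *)
let copy_replacing ~pattern ~by lines =
  List.map (replace_all ~pattern ~by) lines

let () =
  List.iter print_endline (copy_replacing ~pattern:":XXX:" ~by:"VAL1" copybook)
```

Calling `copy_replacing` a second time with `~by:"VAL2"` gives the fields of the second record, which is how the same copybook yields uniquely named fields for both variables.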

Allow me to repeat that, despite their peculiar nature, these features have stood the test of time.

The journey continues. Suppose now that you are in a specific part of your program, and that you wish to manipulate longer names: say, you would like the :XXX:-USERNAME variable to be of size 60 instead of 30.

Here is how you could do it:

  [...]
REPLACE ==PIC X(30)== BY ==PIC X(60)==.
  01 VAL1.
    COPY [...]
REPLACE OFF.
  01 COUNTERS.
  [...]

Here, we can replace a list of consecutive tokens PIC X(30) by another list of tokens PIC X(60). The result is that the fields VAL1-USERNAME and VAL2-USERNAME are now 60 bytes long.

REPLACE and COPY REPLACING can both perform the same kind of replacements on both parts of tokens (using LEADING or TRAILING keywords) and lists of tokens. COBOL programmers combine them to perform their daily job of building consistent software, by sharing formats using shared copybooks.

Let's see now how GnuCOBOL can deal with that.

Preprocessing in the GnuCOBOL Compiler

The GnuCOBOL compiler is a transpiler: it translates COBOL source code into C89 source code, which can then be compiled to executable code by a C compiler. It has two main benefits: high portability, as GnuCOBOL will work on any platform with any C compiler, including very old hardware and mainframes, and simplicity, as code generation is reduced to its minimum. Most of the code of the compiler is its parser, which is actually still huge as COBOL is a particularly rich language.

GnuCOBOL implements many dialects (i.e. extensions of COBOL available on proprietary compilers such as IBM, MicroFocus, etc.) in order to provide a solution to the migration issues posed by proprietary platforms.

The support of dialects is one of the most interesting features of GnuCOBOL: by natively supporting many extensions of proprietary compilers, it makes it possible to migrate applications from these compilers to GnuCOBOL without modifying the sources, and to run the same code on the old platform and the new one throughout the migration.

One of OCamlPro's main contributions to GnuCOBOL has been to create such a dialect for GCOS7, a former Bull mainframe still in use in some places.

This is a Bull DPS-7 mainframe around 1980, running the GCOS7 operating system. Such systems are still used to run COBOL critical applications in some companies, though running on software emulators on PCs. GnuCOBOL is a mature solution to migrate such applications to modern Linux computers.

To perform its duty, GnuCOBOL processes COBOL source files in two passes: it preprocesses them during the first phase, generating a new temporary COBOL file with all inclusions and replacements done, and then parses this file and generates the corresponding C code.

To do that, GnuCOBOL includes two pairs of lexers and parsers, one for each phase. The first pair only recognises a very limited set of constructions, such as COPY... REPLACING, REPLACE, but also some other ones like compiler directives.

The lexer/parser pair for preprocessing works directly on the input file and, before version 3.2, performed all these operations in a single pass.

The output can be seen using the -E argument:

$ cobc -E --free foo.cob
#line 1 "foo.cob"
DATA DIVISION.
WORKING-STORAGE SECTION.
 
 01 VAL1.
 
#line 1 "MY-RECORD.CPY"
05 VAL1-USERNAME PIC X(60).
05 VAL1-BIRTHDATE.
 10 VAL1-BIRTHDATE-YEAR PIC 9999.
 10 VAL1-BIRTHDATE-MONTH PIC 99.
 10 VAL1-BIRTHDATE-MDAY PIC 99.
05 VAL1-ADDRESS PIC X(100).
#line 5 "foo.cob"

 01 VAL2.
 
#line 1 "MY-RECORD.CPY"
05 VAL2-USERNAME PIC X(60).
05 VAL2-BIRTHDATE.
 10 VAL2-BIRTHDATE-YEAR PIC 9999.
 10 VAL2-BIRTHDATE-MONTH PIC 99.
 10 VAL2-BIRTHDATE-MDAY PIC 99.
05 VAL2-ADDRESS PIC X(100).
#line 7 "foo.cob"

 
 01 COUNTERS.
 05 COUNTER-NAMES PIC 999 VALUE 0.
 05 COUNTER-VALUES PIC 999 VALUE 0.

The -E option is particularly useful if you want to understand the final code that GnuCOBOL will compile. You can also get access to this information using the option --save-temps (save intermediate files), in which case cobc will generate a file with extension .i (foo.i in our case) containing the preprocessed COBOL code.

You can see that cobc successfully performed both the REPLACE and COPY REPLACING instructions.

The corresponding code in version 3.1.2 is in file cobc/pplex.l, function ppecho. Fully understanding it is left as an exercise for the motivated reader.

The general idea is that replacements defined by COPY REPLACING and REPLACE are added to the same list of active replacements.

We show in the next section that such an implementation does not conform to the ISO standard.

Conformance to the ISO Standard

You may wonder if it is possible for REPLACE statements to perform replacements that would change a COPY statement, such as:

REPLACE ==COPY MY-RECORD== BY == COPY OTHER-RECORD==.
COPY MY-RECORD.

You may also wonder what happens if we try to combine replacements by COPY and REPLACE on the same tokens, for example:

REPLACE ==VAL1-USERNAME PIC X(30)== BY ==VAL1-USERNAME PIC X(60)==.

Such a statement only makes sense if we assume the COPY replacements have been performed before the REPLACE replacements are performed.

Such ambiguities have been resolved in the ISO Standard for COBOL: in section 7.2.1. Text Manipulation >> General, it is specified that preprocessing is executed in 4 phases on the streams of tokens:

1. `COPY` statements are performed, and the corresponding `REPLACING`
   replacements too;
2. Conditional compiler directives are then performed;
3. `REPLACE` statements are performed;
4. `COBOL-WORDS` statements are performed (allowing some keywords to be
   enabled or disabled).

So, a REPLACE cannot modify a COPY statement (and the opposite is also impossible, as REPLACE are not allowed in copybooks), but it can modify the same set of tokens that are being modified by the REPLACING part of a COPY.

The ISO standard specifies the different steps to preprocess COBOL files and perform replacements in a specific order.

As described in the previous section, GnuCOBOL implements phases 1, 2 and 3 in a single one, even mixing replacements defined by COPY and by REPLACE statements. Fortunately, this behavior is good enough for most programs. Unfortunately, there are still programs that combine COPY and REPLACE on the same tokens, leading to hard-to-debug errors, as the compiler does not conform to the specification.

This difficult situation happened to one of our customers, and we promptly addressed it by patching a part of the compiler.

Preprocessing with Automata on Streams

Correctly implementing the specification written in the standard would make the preprocessing phase quite complicated. Indeed, we would have to implement a small parser for every one of the four steps of preprocessing. That's actually what we did for our COBOL parser in OCaml used by the LSP (Language Server Protocol) of our SuperBOL Studio COBOL plugin for VSCode.

However, doing the same in GnuCOBOL is much harder: GnuCOBOL is written in C, and such a change would require a complete rewriting of the preprocessor, something that would take more time than we had on our hands. Instead, we opted for rewriting the replacement function, to split COPY REPLACING and REPLACE into two different replacement phases.

The corresponding C code has been moved into a file cobc/replace.c. It implements an automaton that applies a list of replacements on a stream of tokens, returning another stream of tokens. The preprocessor is thus composed of two instances of this automaton, one for COPY REPLACING statements and another one for REPLACE statements.

The second instance takes the stream of tokens produced by the first one as input. The automaton is implemented using recursive functions, which is particularly suitable to allow reasoning about its correctness. Actually, several bugs were found in the former C implementation while designing this automaton. Each automaton has an internal state, composed of a set of tokens which are queued (and waiting for a potential match) and a list of possible replacements of these tokens.
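The idea of chaining two replacement passes can be sketched in a few lines of OCaml. This is not the C code of cobc/replace.c: it is a simplified illustration working on whole lists of string tokens rather than on a stream with a queue of pending tokens, and the names `rule`, `strip_prefix`, `apply` and `preprocess` are hypothetical.

```ocaml
(* A replacement rule: source tokens and the tokens that replace them.
   Tokens are plain strings here, a simplification of real COBOL tokens. *)
type rule = { src : string list; dst : string list }

(* If [tokens] starts with [prefix], return the tokens that follow it. *)
let rec strip_prefix prefix tokens =
  match prefix, tokens with
  | [], rest -> Some rest
  | p :: ps, t :: ts when String.equal p t -> strip_prefix ps ts
  | _ -> None

(* Apply the first matching rule at each position; replaced tokens are
   emitted as-is and not scanned again by the same pass. *)
let rec apply rules tokens =
  match tokens with
  | [] -> []
  | t :: rest ->
    let matching r =
      if r.src = [] then None
      else
        Option.map (fun remaining -> (r, remaining)) (strip_prefix r.src tokens)
    in
    (match List.find_map matching rules with
     | Some (r, remaining) -> r.dst @ apply rules remaining
     | None -> t :: apply rules rest)

(* The COPY REPLACING pass feeds the REPLACE pass. *)
let preprocess ~copy_rules ~replace_rules tokens =
  apply replace_rules (apply copy_rules tokens)

let copy_rules = [ { src = [ ":XXX:-USERNAME" ]; dst = [ "VAL1-USERNAME" ] } ]

let replace_rules =
  [ { src = [ "VAL1-USERNAME"; "PIC"; "X(30)" ];
      dst = [ "VAL1-USERNAME"; "PIC"; "X(60)" ] } ]

let input = [ ":XXX:-USERNAME"; "PIC"; "X(30)"; "." ]

let () =
  print_endline
    (String.concat " " (preprocess ~copy_rules ~replace_rules input))
```

With the passes in this order, the REPLACE rule sees VAL1-USERNAME and can match. Swapping the two calls to `apply` would leave PIC X(30) untouched, which is the kind of ordering issue the standard settles.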

Thanks to this design, it was possible to provide a working implementation in a very short time, considering the complexity of that part of the compiler.

We added several tests to the testsuite of the compiler for all the bugs that had been detected in the process to prevent regressions in the future, and the corresponding pull request was reviewed by Simon Sobisch, the GnuCOBOL project leader, and later upstreamed.

Some Performance Issues

Unfortunately, it was not the end of the work: Simon performed some performance evaluations on this new implementation, and although it had improved the conformance of GnuCOBOL to the standard, it did affect the performance negatively.

Compiler performance is not critical for most applications, as long as you compile only individual COBOL source files. However, some source files can become very big, especially when part of the code is auto-generated. In COBOL, a typical case of that is the use of a pre-compiler, typically for SQL. Such programs contain EXEC SQL statements, which are translated by the SQL pre-compiler into much longer COBOL code, consisting mostly of CALL statements calling C functions in the SQL library to build and execute SQL requests.

For such a generated program, of a whopping 700 kLines, Simon noticed an important degradation in compilation time, and profiling tools concluded that the new preprocessor implementation was responsible for it, as shown in the flamegraph below:

A flamegraph generated by perf stats visualised on hotspot: the horizontal axis is the total duration. We can see that ppecho, the function for replacements, takes most of the preprocessing time, with the two-automata replacement phases. Credit: Simon Sobisch

So we started investigating to fix the problem in a new pull-request.

Optimising Allocations

Our first intuition was that the main difference with the previous implementation came from allocating too many lists in the temporary state of the two automata. This intuition was only partially right, as we will see.

Mutable lists were used in the automaton (and also in the former implementation) to store a small part of the stream of tokens, while they were being matched with a replacement source. On a partial match, the list had to wait for additional tokens to check for a full match. Actually, these lists were used as queues, as tokens were always added at the end, while matched or un-matched tokens were removed from the top. Also, the size of these lists was bounded by the longest replacement defined in the code, which was unlikely to be more than a few dozen tokens.

Our first idea was to replace these lists by real queues, that can be efficiently implemented using circular buffers and arrays. Each and every allocation of a new list element would then be replaced by the single allocation of a circular buffer, granted with a few possible reallocations further down the road if the list of replacements was to grow bigger.
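For readers unfamiliar with the technique, here is a minimal OCaml sketch of a queue backed by a circular buffer that grows on demand. The actual change was made in C inside GnuCOBOL; the `ring` type and the `create`, `push` and `pop` functions below are hypothetical names used only for illustration.

```ocaml
(* A FIFO queue stored in a circular array. *)
type 'a ring = {
  mutable data : 'a option array;
  mutable head : int;    (* index of the oldest element *)
  mutable length : int;  (* number of stored elements *)
}

let create capacity =
  { data = Array.make (max 1 capacity) None; head = 0; length = 0 }

(* Double the capacity, copying elements back to the start of the array. *)
let grow q =
  let cap = Array.length q.data in
  let data = Array.make (2 * cap) None in
  for i = 0 to q.length - 1 do
    data.(i) <- q.data.((q.head + i) mod cap)
  done;
  q.data <- data;
  q.head <- 0

(* Add an element at the end; allocates only when the buffer is full. *)
let push q x =
  if q.length = Array.length q.data then grow q;
  q.data.((q.head + q.length) mod Array.length q.data) <- Some x;
  q.length <- q.length + 1

(* Remove and return the oldest element, if any. *)
let pop q =
  if q.length = 0 then None
  else begin
    let x = q.data.(q.head) in
    q.data.(q.head) <- None;
    q.head <- (q.head + 1) mod Array.length q.data;
    q.length <- q.length - 1;
    x
  end
```

Once the buffer has reached the size of the longest replacement, pushing and popping tokens no longer allocates anything, unlike a list that allocates one cell per token.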

The results were a bit disappointing: on the flamegraph, there was some improvement, but the replacement phase still took a lot of time:

The flamegraph is better, as shown by the disappearance of calls to token_list_add. But our work is not yet finished! Credit: Simon Sobisch

Another intuition we had was that we had been a bit naive about allocating tokens: in the initial implementation of version 3.1.2, tokens were allocated when copied from the lexer into the single queue for replacement; in our implementation, that job was also done, but twice, as they were allocated in both automata. So, we modified our implementation to only allocate tokens when they are first entered in the COPY REPLACING stream, and not anymore when entering the REPLACE stream. A simple idea, that reduced again the remaining allocations by a factor of 2.

Yet, the new optimised implementation still didn't match the performance of the former 3.1.2 version, and we were running out of ideas on how the allocations performed by the automata could again be improved:

Using circular buffers instead of mutable lists for queues decreased allocations by a factor of 3. Removing the re-allocations between the two streams would also improve it by a factor of 2. A nice improvement, but not yet the performances of version 3.1.2

What about Fast Paths ?

So we decided to study some of the code from 3.1.2 to understand what could cause such a difference, and it became immediately obvious: the former version had two fast paths, that we had left out of our own implementation!

The two fast paths that completely shortcut the replacement mechanisms are the following:

The first one is when there are no replacements defined in the source. In COBOL, most replacements are only performed in the DATA DIVISION, and moreover, COPY REPLACING ones are only performed during copies. This means that a large part of the code that did not need to go through our two automata still did!

The second fast path is for spaces: replacements always start and finish by a non-space token in COBOL, so, if we check that we are not in the middle of partial match (i.e. both internal token queues are empty), we can safely make the space token skip the automata. Again, given the frequency of space tokens (about half, as there are very few other separators), this fast path is likely to be used very, very frequently.
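The two checks can be summarised by the following OCaml sketch. It only models the decision of whether a token must enter the replacement machinery: `slow_path` is a hypothetical stand-in for the two automata that merely counts how often it is entered, and `process` is an illustrative name, not a function of GnuCOBOL.

```ocaml
type token = Space | Word of string

(* Number of tokens that had to go through the (simulated) automata. *)
let slow_calls = ref 0

(* Stand-in for the two automata: it only counts its invocations. *)
let slow_path tok =
  incr slow_calls;
  tok

(* [rules] is the list of active replacements; [queue_is_empty] tells
   whether no partial match is currently pending in the token queues. *)
let process ~rules ~queue_is_empty tok =
  if rules = [] then tok                          (* fast path 1: nothing to replace *)
  else if tok = Space && queue_is_empty then tok  (* fast path 2: space, no pending match *)
  else slow_path tok
```

Only tokens that can actually take part in a replacement pay the cost of the automata; everything else is passed through unchanged.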

Implementing them was straightforward, and the results were as expected:

After implementing the same fast paths as in 3.1.2, the flamegraph is back to normal, with the time spent in the replacement function being almost not noticeable. Credit: Simon Sobisch

Conclusion

As often with optimisations, intuitions do not always lead to the expected improvements: in our case, the real improvement came not from improving the algorithm, but from shortcutting it!

Yet, we are still very pleased by the results: the new optimised implementation of replacements in GnuCOBOL makes it more conformant to the standard, and also more efficient than the former 3.1.2 version, as shown by the final results sent to us by Simon:

These results show that the new implementation is now a little better than 3.1.2. It comes from using the circular buffers instead of the mutable lists for queues, but the optimisation only happens when replacements are defined, which is a very small part of the source code.

OCaml Backtraces on Uncaught Exceptions

2024-04-25T09:05:17Z

A mystical Camel using its net to catch all uncaught... Butterflies.

Uncaught exception: Not_found

This blog post probably won't teach anything new to OCaml veterans; but for the others, you might be glad to learn that this very basic, yet surprisingly little-known feature of OCaml will give you backtraces with source file positions on any uncaught exception.

Since it can save hours of frustrating debugging, my intent is to give some publicity to this accidentally hidden feature.

PSA: define OCAMLRUNPARAM=b in your environment.

For those wanting to go further, I'll then go on with hints and guidelines for good exception management in OCaml.

For the details, everything here is documented in the Printexc module.

Table of contents

Uncaught exception: Not_found
Get your stacktraces!
Improve your traces
- Properly Re-raising exceptions, and finalisers
- There are holes in my backtrace!
Guidelines for exception handling, and Control-C
- Controlling the backtraces from OCaml

Get your stacktraces!

Compile-time errors are good, but sometimes you just have to cope with run-time failures.

Here is a simple (and buggy) program:

let dict = [
    "foo", "bar";
    "foo2", "bar2";
]

let rec replace = function
  | [] -> []
  | w :: words -> List.assoc w dict :: words

let () =
  let words = Array.to_list Sys.argv in
  List.iter print_endline (replace words)

Side note

For purposes of the example, we use List.assoc here; this relies on OCaml's structural equality, which is often a bad idea in projects, as it can break in surprising ways when the matched type gets more complex. A more serious implementation would use e.g. Map.Make with an explicit comparison function.

Here is the result of executing this program with no options:

$ ./foo
Fatal error: exception Not_found

This isn't very helpful, but no need for a debugger, lots of printf or tedious debugging, just do the following:

$ export OCAMLRUNPARAM=b
$ ./foo
Fatal error: exception Not_found
Raised at Stdlib__List.assoc in file "list.ml", line 191, characters 10-25
Called from Foo.replace in file "foo.ml", line 8, characters 18-35
Called from Foo in file "foo.ml", line 12, characters 26-41

Much more helpful! In most cases, this will be enough to find and fix the bug.

If you still don't get the backtrace, you may need to recompile with -g (with dune, ensure your default profile is dev or specify --profile=dev).

So, now we know where the failure occurred... But not on what input. This is not a matter of backtraces: if that's an issue, define your own exceptions, with arguments, and raise that rather than the basic Not_found.
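As an illustration of that advice, here is a minimal sketch of an exception carrying the faulty input. The names `Unknown_word` and `lookup` are hypothetical and not part of the program above.

```ocaml
(* The exception carries the word that could not be found. *)
exception Unknown_word of string

let dict = [
  "foo", "bar";
  "foo2", "bar2";
]

(* Same lookup as with List.assoc, but the failure now names its input. *)
let lookup w =
  match List.assoc_opt w dict with
  | Some v -> v
  | None -> raise (Unknown_word w)
```

When such an exception is left uncaught, the runtime prints its argument along with its name, so the faulty input appears next to the backtrace.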

Hint

If you run the program directly from your editor, with a properly configured OCaml mode, the file positions in the backtrace should be parsed and become clickable, making navigation very quick and easy.

Improve your traces

The above works well in general, but depending on the complexity of the programs, there are some more advanced tricks that may be helpful, to preserve or improve the backtraces.

Properly Re-raising exceptions, and finalisers

It's pretty common to want a finaliser after some processing, here to remove a temporary file:

let with_temp_file basename (f: string -> 'a) : 'a =
  (* Filename.temp_file takes a prefix and a suffix *)
  let filename = Filename.temp_file basename "" in
  match f filename with
  | result ->
    Sys.remove filename;
    result
  | exception e ->
    Sys.remove filename;
    raise e

In simple cases this will work, but if e.g. you are using the Printf module before re-raising, it will break the printed backtrace.

Solution 1: use Fun.protect ~finally f that handles the backtrace properly.
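A possible way to write the finaliser with Fun.protect is sketched below. The ".tmp" suffix is an arbitrary choice made for this example.

```ocaml
(* Run [f] on a fresh temporary file, and remove the file afterwards,
   whether [f] returns normally or raises. *)
let with_temp_file basename (f : string -> 'a) : 'a =
  let filename = Filename.temp_file basename ".tmp" in
  Fun.protect
    ~finally:(fun () -> Sys.remove filename)
    (fun () -> f filename)
```

Note that Fun.protect expects the finaliser itself not to raise: if it does, the exception is wrapped into Fun.Finally_raised.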

Solution 2: manually, use raw backtrace access from the Printexc module:

| exception e ->
  let bt = Printexc.get_raw_backtrace () in
  Sys.remove filename;
  Printexc.raise_with_backtrace e bt

Re-raising exceptions after catching them should always be done in this way.

There are holes in my backtrace!

Indeed, it may appear that not all function calls show up in the backtrace.

There are two main reasons for that:

functions can get inlined by the compiler, so they don't actually appear in the concrete backtrace at runtime;
tail-call optimisation also affects the stack, which can be visible here;

Don't run and disable all optimisations though! Some effort has been put in recording useful debugging information even in these cases. The Flambda pass of the compiler, which does more inlining, also actually makes it more traceable.

As a consequence, switching to Flambda will often give you more helpful backtraces with recursive functions and tail-calls. It can be done with opam install ocaml-option-flambda (this will recompile the whole opam switch).

Well, what if my program uses lwt?

Backtraces in this context are a complex matter -- but they can be simulated: a good practice is to use ppx_lwt and the let%lwt syntax rather than let* or Lwt.bind directly, because the ppx will insert calls that reconstruct "fake" backtrace information.

Guidelines for exception handling, and Control-C

Exceptions in OCaml can happen anywhere in the program: besides uses of raise, system errors can trigger them. In particular, if you want to implement clean termination on the user pressing Control-C without manually handling signals, you should call Sys.catch_break true; you will then get a Sys.Break exception raised when the user interrupts the program.

Anyway, this is one reason why you must never use try .. with _ ->

let find_opt x m =
  try Some (Map.find x m)
  with _ -> None

The programmer was too lazy to write with Not_found. They may think this is OK since Map.find won't raise anything else. But if Control-C is pressed at the wrong time, this will catch it, and return None instead of stopping the program!

let find_debug x m =
  try Map.find x m
  with e ->
    let bt = Printexc.get_raw_backtrace () in
    Printf.eprintf "Error on %s!" (to_string x);
    Printexc.raise_with_backtrace e bt

This version is OK since it re-raises the exception. If you absolutely need to catch all exceptions, a last resort is to explicitly re-raise "uncatchable" exceptions:

let this_is_a_last_resort =
  try .. with
  | (Sys.Break | Assert_failure _ | Match_failure _) as e -> raise e
  | _ -> ..

In practice, you'll finally want to catch exceptions from your main function (cmdliner already offers to do this, for example); catching Sys.Break at that point will offer a better message than Uncaught exception, give you control over finalisation and the final exit code (the convention is to use 130 for Sys.Break).
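A top-level handler along those lines could look like the following sketch. The `run` function is a hypothetical wrapper: it returns the exit code instead of exiting, and the code 2 used for other exceptions is simply the one the OCaml runtime uses for uncaught exceptions.

```ocaml
(* Run [main] and turn its outcome into an exit code. *)
let run (main : unit -> unit) : int =
  (* Have Control-C raise Sys.Break instead of killing the program. *)
  Sys.catch_break true;
  match main () with
  | () -> 0
  | exception Sys.Break ->
    prerr_endline "Interrupted.";
    130
  | exception e ->
    (* Grab the backtrace first, before any other call can overwrite it. *)
    let bt = Printexc.get_raw_backtrace () in
    Printf.eprintf "Fatal error: %s\n%s" (Printexc.to_string e)
      (Printexc.raw_backtrace_to_string bt);
    2

(* Typical use at the end of the main module:
   let () = exit (run my_main) *)
```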

Controlling the backtraces from OCaml

Setting OCAMLRUNPARAM=b in the environment works from the outside, but the module Printexc can also be used to enable or disable them from the OCaml program itself.

Printexc.record_backtrace: bool -> unit toggles the recording of backtraces. Forcing it off when running tests, or on when a debug flag is specified, can be good ideas;
Printexc.backtrace_status: unit -> bool checks if recording is enabled. This can be used when finalising the program to print the backtraces when enabled;
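Here is a small sketch of how these two functions might be combined. The `--debug` command-line flag and the `report` function are assumptions made for the example.

```ocaml
(* Hypothetical flag: record backtraces only when --debug is given. *)
let debug = Array.exists (( = ) "--debug") Sys.argv

let () = if debug then Printexc.record_backtrace true

(* To be called from an exception handler: print the exception, and the
   backtrace of the most recent one only if recording is enabled. *)
let report e =
  prerr_endline (Printexc.to_string e);
  if Printexc.backtrace_status () then Printexc.print_backtrace stderr
```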

Nota Bene

The base library turns on backtrace recording by default. While I salute an attempt to remedy the issue that this post aims to address, this can lead to surprises when just linking the library can change the output of a program (e.g. this might require specific code for cram tests not to display backtraces).

The Printexc module also allows registering custom exception printers: if, following the advice above, you defined your own exceptions with parameters, use Printexc.register_printer to have that information available when they are uncaught.
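For instance, a printer for a hypothetical Unknown_word exception could be registered as follows. The message format is an arbitrary choice.

```ocaml
exception Unknown_word of string

(* Printers return None for exceptions they do not handle. *)
let () =
  Printexc.register_printer (function
    | Unknown_word w -> Some (Printf.sprintf "unknown word %S" w)
    | _ -> None)
```

Printexc.to_string and the message printed for uncaught exceptions will then use this text instead of the default rendering.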

Opam 102: Pinning Packages

2024-03-25T09:05:17Z

Pins stand out. They help us anchor interest points, thus helping us focus on what's important. They become the catalyst for experimentation and help us navigate the strong safety features that opam provides users with.

Welcome, dear reader, to a new opam blog post!

Today we take an additional step down the metaphorical rabbit hole with opam pin, the easiest way to catch a ride on the development version of a package in opam.

We are aware that our readers are eager to see these blog posts venture on the developer side of the opam experience, and so are we, but we need to spend just a little bit more time on the beginner and user-side of it for now so please, bear with us! 🐻

This tutorial is the second one in this on-going series about the OCaml package manager opam. Be sure to read the first one to get up to speed. Also, check out each article's tags to get an idea of the entry level required for the smoothest read possible!

New to the expansive OCaml sphere? As said on the official opam website, opam has been a game changer for the OCaml distribution, since it first saw the light of day here, almost a decade ago.

Table of contents

Tutorial context
Use-case for opam pin
- Pinning a released package development version: opam pin add --dev-repo
- Pinning an unreleased package development version: opam pin add <url>
Dig into opam pin, find spicy features
Conclusion

Tutorial context and basis

As far as context goes for this article, we will consider that you already are familiar with the concepts introduced in our tutorial opam 101.

Your current environment should thus be somewhat similar to the one we had by the end of that tutorial. Meaning: your version of opam is at least 2.1.5 (all outputs were generated with this version), you have already launched opam init, created a global switch my-switch and, possibly, you have even populated it with a few packages with a few calls to the opam install command.

Furthermore, keep in mind that, in this blog post, we are approaching this subject from the perspective of a developer who is looking into integrating new packages to his current workload, not from the perspective of someone who is looking into sharing a project or publishing a new software.

opam pin is a feature that will quickly become necessary for you to use as you continue your exploration of opam. It allows for the user to pin a given package to a specific version, or even change the source from which said package is pulled, installed, and synchronised with from within your currently active switch.

This feature shines the most in contexts such as:

when doing ordinary switch management;
for incorporating external, still under-construction, libraries to your own current workload;
when designing a specific switch: pinning a specific package version will make it the main compatibility constraint for that switch, thus tailoring the environment around it in the process.

Reminder

Remember that opam's command-line interface is beginner friendly. You can, at any point of your exploration, use the --help option to have every command and subcommand explained. You may also check out the opam cheat-sheet that was released a while ago and still holds some precious insights on opam's CLI.

Use-case for `opam pin`

Now onto today's use-cases for opam pin, the premise is as follows:

The package on which your current development depends has just had a major update on its development branch. This package is available on the opam repository and its name is hc.

That update introduced a new feature that you would very much like to experiment with for your own on-going project.

However, that feature is still very much a work-in-progress and the maintainers of hc are not about to release their package anytime soon...

That's when opam pin comes in. In this article, we will cover two similar use-cases for opam pin, namely the one dealing with pinning a version of a package that is already available on the opam repository, and that of pinning a version of an unreleased package, directly from its public URL.

After all the basics have been laid out, we will eventually cover some of the more underground ⛏ and dangerous 🔥 features available when pinning packages.

Important Notice

For the sake of convenience and brevity, we will break down the opam pin command, and some of its options, by only dealing with addresses that obey the classic definition of the word URL.

However do keep in mind that opam uses a broader definition for that word, going as far as to consider a filesystem path to be a valid string for a URL argument, thus allowing all opam pin calls and options to be valid when manipulating opam packages inside a local filesystem or local network instead of just on the web.

Pinning the dev version of a released package: `opam pin add --dev-repo`

Picking up from the base context: our project depends on hc, and hc has just received an update. The first option available for us to access this fresh update on the hc repository is to use the opam pin add --dev-repo <pkg> command.

$ opam pin add --dev-repo hc
[hc.0.3] synchronised (git+https://git.zapashcanon.fr/zapashcanon/hc.git)
hc is now pinned to git+https://git.zapashcanon.fr/zapashcanon/hc.git (version 0.3)

The following actions will be performed:
  ∗ install dune 3.14.0 [required by hc]
  ∗ install hc   0.3*
===== ∗ 2 =====
Do you want to continue? [Y/n] y

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><>
⬇ retrieved hc.0.3  (no changes)
⬇ retrieved dune.3.14.0  (https://opam.ocaml.org/cache)
∗ installed dune.3.14.0
∗ installed hc.0.3
Done.

So what exactly did `opam pin` do here?

$ opam pin add --dev-repo hc
[hc.0.3] synchronised (git+https://git.zapashcanon.fr/zapashcanon/hc.git)

When you feed a package name to the opam pin add --dev-repo command, it will first retrieve the package definition found inside the opam file in the directory of the corresponding package on the official OCaml opam repository or any other opam repositories that your local opam installation happens to be synchronised with.

You can inspect said package definition directly yourself with the opam show <pkg> command.

Let's take a look at the package definition for hc:

$ opam show hc

<><> hc: information on all versions ><><><><><><><><><><><><><><><><>
name         hc
all-versions 0.0.1  0.2  0.3

<><> Version-specific details <><><><><><><><><><><><><><><><><><><><>
version      0.3
repository   default
url.src      "https://git.zapashcanon.fr/zapashcanon/hc/archive/0.3.tar.gz"
url.checksum
          "sha256=61b443056adec3f71904c5775b8521b3ac8487df618a8dcea3f4b2c91bedc314"
          "sha512=a1d213971230e9c7362749d20d1bec6f5e23af191522a65577db7c0f9123ea4c0fc678e5f768418d6dd88c1f3689a49cf564b5c744995a9db9a304f4b6d2c68a"
homepage     "https://git.zapashcanon.fr/zapashcanon/hc"
doc          "https://doc.zapashcanon.fr/hc/"
bug-reports  "https://git.zapashcanon.fr/zapashcanon/hc/issues"
dev-repo     "git+https://git.zapashcanon.fr/zapashcanon/hc.git"
authors      "Léo Andrès <contact@ndrs.fr>"
maintainer   "Léo Andrès <contact@ndrs.fr>"
license      "ISC"
depends      "dune" {>= "3.0"} "ocaml" {>= "4.14"} "odoc" {with-doc}
synopsis     Hashconsing library
description  hc is an OCaml library for hashconsing. It provides
             easy ways to use hashconsing, in a type-safe and
             modular way and the ability to get forgetful
             memoïzation.

Here, you can see the dev-repo field which contains the URL of the development repository of that package. Opam will use that information to retrieve package sources for you.

hc is now pinned to git+https://git.zapashcanon.fr/zapashcanon/hc.git (version 0.3)

Once it has retrieved the hc sources, opam stores the status of the pin internally: hc is git-pinned to the URL git+https://git.zapashcanon.fr/zapashcanon/hc.git at version 0.3.

$ opam pin list
hc.0.3    git  git+https://git.zapashcanon.fr/zapashcanon/hc.git

Did you know? The default behaviour of opam pin, when called without arguments, is the list subcommand, which shows all pinned packages in the currently active switch.

On the other hand, the default behaviour of the opam pin <target> command is the add subcommand. Keep it in mind if you happen to grow tired of typing opam pin add <target> every time.

Opam will then analyse hc dependencies and compute a solution that respects the dependencies constraints and state of your current switch (i.e. the compatibility constraints between the packages currently installed in your switch).

If it manages to do so, it will come forth with a prompt to install the pinned package and its dependencies.

The following actions will be performed:
  ∗ install dune 3.14.0 [required by hc]
  ∗ install hc   0.3*
===== ∗ 2 =====
Do you want to continue? [Y/n] y

Pressing Enter or y + Enter will perform the installation.

Notice that a * character sometimes appears next to a package action? It is shorthand for "this package is pinned". Once you know what to look for, you can spot pinned packages at a glance in the list of actions that opam is about to perform.

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><>
⬇ retrieved hc.0.3  (no changes)
⬇ retrieved dune.3.14.0  (https://opam.ocaml.org/cache)
∗ installed dune.3.14.0
∗ installed hc.0.3
Done.

Congratulations, you now have a pinned development version of the hc package. You can now start exploring the neat feature you have been looking forward to!

Pinning the dev version of an unreleased package: `opam pin add <url>`

Every once in a while on your OCaml journey, you will come across unreleased software.

These OCaml programs and libraries can still very much have active repositories but their maintainers have not yet gone as far as to release them in order to distribute their work through opam to the rest of the OCaml ecosystem.

Yet, you might still want to have seamless access to these software solutions on your local opam installation for your own personal enjoyment and developments. That's when opam pin add <url> comes in handy.

Modern OCaml projects will most often have one or several opam files in their tree which opam can operate with.

$ opam pin git+https://github.com/rjbou/opam-otopop
Package opam-otopop does not exist, create as a NEW package? [Y/n] y
opam-otopop is now pinned to git+https://github.com/rjbou/opam-otopop (version 0.1)

The following actions will be performed:
  ∗ install opam-client 2.0.10 [required by opam-otopop]
  ∗ install opam-otopop 0.1*
===== ∗ 2 =====
Do you want to continue? [Y/n] y

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><>
⬇ retrieved opam-client.2.0.10  (https://opam.ocaml.org/cache)
∗ installed opam-client.2.0.10
∗ installed opam-otopop.0.1
Done.

As you can see, the course of an opam pin add <url> call is very close to that of an opam pin add --dev-repo <pkg>, the only exception being the following line:

Package opam-otopop does not exist, create as a NEW package? [Y/n] y

Since the package is unavailable on the opam repositories that your opam installation is synchronised with, opam doesn't know about it.

That's why it will ask you if you want to create it as a NEW package.

Once pinned, that package is available in your switch as any other ordinarily available repository package.

You can see here that opam has pinned the opam-otopop package to a specific 0.1 version.

opam-otopop is now pinned to git+https://github.com/rjbou/opam-otopop (version 0.1)

The reason for that is found inside the opam file at the root of the source repository for that package:

version: "0.1"

In any instance where this specific field is not found in the opam file, the version name would then be pinned to the verbatim ~dev version.
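
The rule described above can be summed up in a few lines of OCaml. This is only a minimal sketch of the behaviour as described in this article, not opam's actual implementation: the names version_of_line and pinned_version are hypothetical, and the line-by-line scan is a deliberate simplification of the real opam file syntax.

```ocaml
(* Hypothetical helper: extract the value of a [version: "..."] line, if any.
   Real opam files have a richer syntax; this only handles the simple case. *)
let version_of_line line =
  try Some (Scanf.sscanf line " version: %S" (fun v -> v))
  with Scanf.Scan_failure _ | End_of_file | Failure _ -> None

(* The version a pin ends up with, as described in the text:
   the [version] field when present, the literal "~dev" otherwise. *)
let pinned_version opam_file_lines =
  match List.find_map version_of_line opam_file_lines with
  | Some v -> v
  | None -> "~dev"
```

Calling pinned_version on the lines of an opam file that contains version: "0.1" yields "0.1", while a file without that field yields "~dev".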

Dig into opam pin, find spicy features

Add a pin without installing with `--no-action`

Here are the two main use-cases for a call to opam pin with the --no-action option:

You don't want to install a package immediately, but do want to inform opam of its existence to allow opam to keep the compatibility constraints of that specific package in the equation whenever you are undertaking operations that would require such calculations;
You just want to be assured that your package will be synchronised with the right sources;

--no-action performs only the first steps of an opam pin call and quits before installing the package. It can be used with all pin subcommands.

$ opam pin add hc --dev-repo --no-action
[hc.0.3] synchronised (git+https://git.zapashcanon.fr/zapashcanon/hc.git)
hc is now pinned to git+https://git.zapashcanon.fr/zapashcanon/hc.git (version 0.3)
$

Update your pinned packages

There are two ways to go about updating and upgrading your pinned packages. They are the same no matter if you used the --dev-repo option, or <url> argument, or any other method for pinning them.

The first one you may consider is to either install, or reinstall the specific package(s). The reason is that opam will always first synchronise with the linked source, and then proceed to recompiling.

$ opam install opam-otopop

<><> Synchronising pinned packages ><><><><><><><><><><><><><><><><><><><><><><>
[opam-otopop.0.1] synchronised (git+https://github.com/rjbou/opam-otopop#master)

The following actions will be performed:
  ↻ recompile opam-otopop 0.1*

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><><><>
⊘ removed   opam-otopop.0.1
∗ installed opam-otopop.0.1
Done.

In the above code block, opam-otopop has been upgraded by that opam install call.

The second method is to use the specific opam update and opam upgrade mechanisms. These commands are very common in an opam abiding workflow. Their general usage was briefly mentioned in our article opam 101.

By default, opam update updates the state of your opam repositories, for you to have access to the most recent version of your packages. If you add the --development flag to it, it will also update the source code of your pinned packages internally.

$ opam update --development

<><> Synchronising development packages <><><><><><><><><><><><><><><><><><><><>
[opam-otopop.0.1] synchronised (git+https://github.com/rjbou/opam-otopop#master)
Now run 'opam upgrade' to apply any package updates.

Then you run upgrade as you would in any other package upgrade scenario.

$ opam upgrade
The following actions will be performed:
  ↻ recompile opam-otopop 0.1* [upstream or system changes]

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><><><>
⊘ removed   opam-otopop.0.1
∗ installed opam-otopop.0.1
Done.

Unpin packages

When you are done with your experimentation and wish to remove a pinned package, you can simply call the remove subcommand.

Keep in mind that opam unpin is an alias for opam pin remove.

The behaviour of opam unpin is slightly different between released and unreleased packages.

Released packages

If the pinned package is released, by default, opam will retrieve and install the released version of the package instead of removing that package altogether.

$ opam pin list
hc.0.3    git  git+https://git.zapashcanon.fr/zapashcanon/hc.git

$ opam list hc
# Packages matching: name-match(hc) & (installed | available)
# Package # Installed # Synopsis
hc.0.3    0.3         pinned to version 0.3 at git+https://git.zapashcanon.fr/zapashcanon/hc.git

$ opam pin remove hc
Ok, hc is no longer pinned to git+https://git.zapashcanon.fr/zapashcanon/hc.git (version 0.3)
The following actions will be performed:
  ↻ recompile hc 0.3
Do you want to continue? [Y/n] y

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><><><>
⬇ retrieved hc.0.3  (https://opam.ocaml.org/cache)
⊘ removed   hc.0.3
∗ installed hc.0.3
Done.

$ opam list hc
# Packages matching: name-match(hc) & (installed | available)
# Package # Installed # Synopsis
hc.0.3    0.3         Hashconsing library

As we can see in the details:

⬇ retrieved hc.0.3  (https://opam.ocaml.org/cache)

opam has retrieved the sources from the archive that is specified in the opam file of the relevant opam repository, thus pulling hc back down to its latest available, current-switch compatible, release.

Notice the absence of the * character next to the package action? It means the package is no longer pinned.

Unreleased packages

On the other hand, an unreleased package, since its only definition source—meaning both the location of its source code as well as all information required for opam to operate, found in the corresponding opam file—is the pin itself, opam will have no other choice than to offer to remove it for you.

$ opam pin list
opam-otopop.0.1    git  git+https://github.com/rjbou/opam-otopop#master

In this case, opam unpin <package-name> (or equivalently: opam pin remove <package-name>) launches an opam remove action:

$ opam pin remove opam-otopop
Ok, opam-otopop is no longer pinned to git+https://github.com/rjbou/opam-otopop#master (version 0.1)
The following actions will be performed:
  ⊘ remove opam-otopop 0.1
Do you want to continue? [Y/n] y

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><><><>
⊘ removed   opam-otopop.0.1
Done.

Unpin but do no action

Just like with the opam pin add command, the --no-action option is available when removing pins. It will only unpin the package, without removing it, or recompiling it.

$ opam pin remove opam-otopop --no-action
Ok, opam-otopop is no longer pinned to git+https://github.com/rjbou/opam-otopop#master (version 0.1)

$ opam list opam-otopop
# Packages matching: name-match(opam-otopop) & (installed | available)
# Package      # Installed # Synopsis
opam-otopop.0.1 0.1         An opam-otopop package

You may use it for removing the pin from a package while still keeping it installed in your switch, or replacing it by its opam repository definition version.

The resulting package remains linked to its URL, but it is not considered as pinned, so there will be no update or automatic syncing to follow the changes of the upstream branch.

You may also consider this feature to prepare a specific action, say, as a temporary state. For example, you could unpin several packages in a row, and then proceed to recompiling the whole batch in one go.
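
To recap the unpinning rules of this section, here is a small OCaml model of the decision opam makes. It is a sketch of the behaviour described above, under our own simplifying assumptions: the types package_origin and unpin_action and the function action_after_unpin are hypothetical names, not part of opam.

```ocaml
(* Where can opam find a definition for the package once the pin is gone? *)
type package_origin =
  | Released    (* a definition exists in a configured opam repository *)
  | Unreleased  (* the pin was the package's only definition *)

(* What opam offers to do right after removing the pin. *)
type unpin_action =
  | Reinstall_from_repository
  | Remove
  | Nothing

(* [no_action] models the --no-action option: the pin is dropped,
   and the installed package is left untouched. *)
let action_after_unpin ~no_action origin =
  if no_action then Nothing
  else
    match origin with
    | Released -> Reinstall_from_repository
    | Unreleased -> Remove
```

With this model, unpinning hc corresponds to action_after_unpin ~no_action:false Released, and unpinning opam-otopop with --no-action corresponds to action_after_unpin ~no_action:true Unreleased.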

One URL to pin them all: handling a multi-package repository

Every example seen so far had but one opam file at the root of their respective work tree (sometimes in a specific opam/ directory).

Yet it is possible for some projects to have several packages distributed by a single repository. An example of this would be the opam project source repository itself. If that is the case, and you pin that URL, the default behaviour is that all the packages defined at that address will be pinned.

Let's take the ocp-index project.

You can see that several packages are defined: ocp-index and ocp-browser.

Here's how a pin action behaves when given that URL:

$ opam pin add git+https://github.com/OCamlPro/ocp-index
This will pin the following packages: ocp-browser, ocp-index.
Continue? [Y/n] y
ocp-browser is now pinned to git+https://github.com/OCamlPro/ocp-index (version 1.3.6)
ocp-index is now pinned to git+https://github.com/OCamlPro/ocp-index (version 1.3.6)

The following actions will be performed:
  ∗ install ocp-indent  1.8.1  [required by ocp-index]
  ∗ install ocp-index   1.3.6*
  ∗ install ocp-browser 1.3.6*
===== ∗ 3 =====
Do you want to continue? [Y/n] y

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><>
⬇ retrieved ocp-indent.1.8.1  (https://opam.ocaml.org/cache)
∗ installed ocp-indent.1.8.1
∗ installed ocp-index.1.3.6
∗ installed ocp-browser.1.3.6
Done.

As you can see, this process is exactly the same as before, but with 3 packages in one go.

What if I do not want to pin every package in that repository?

Easy: if you just need one of the packages found at that URL, you can just feed that package name to the opam pin add <package-name> <url> CLI call, just like we did at the beginning of this tutorial!

$ opam pin add ocp-index git+https://github.com/OCamlPro/ocp-index
[ocp-index.1.3.6] synchronised (git+https://github.com/OCamlPro/ocp-index)
ocp-index is now pinned to git+https://github.com/OCamlPro/ocp-index (version 1.3.6)

The following actions will be performed:
  ∗ install ocp-indent 1.8.1  [required by ocp-index]
  ∗ install ocp-index  1.3.6*
===== ∗ 2 =====
Do you want to continue? [Y/n] y

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><>
⬇ retrieved ocp-indent.1.8.1  (cached)
∗ installed ocp-indent.1.8.1
∗ installed ocp-index.1.3.6
Done.

If you do not know the exact names of these different packages, you may also consider using the very handy opam pin scan command, which will look up the contents of the repository at the URL and list its opam packages for you:

$ opam pin scan git+https://github.com/OCamlPro/ocp-index
# Name       # Version  # Url
ocp-index    1.3.6      git+https://github.com/OCamlPro/ocp-index
ocp-browser  1.3.6      git+https://github.com/OCamlPro/ocp-index

Setting arbitrary version numbers, toying with fire

As demonstrated earlier, opam will choose a version of the pinned package according to the contents of the opam file.

The important thing to take away from that is, in most usual scenarios, the contents of the opam file are paramount to how opam will calculate compatibility constraints in a given switch.

It is from the information hardcoded inside the opam file that opam is able to make informed decisions whenever the state of your current switch has to change. There is, however, a way to circumvent that behaviour. We want you to know about it, even though it calls for a bit of caution.

Naturally, directly tinkering with such a key stability feature like compatibility constraints solving does require you to tread carefully. We will see together some of the pitfalls and things to do that will keep you from finding yourself in confusing situations in regards to the state of your switch and the dependencies within it.

Ready? Let's get acquainted with our first slightly dangerous opam feature:

You are allowed to append an arbitrary version number to the name of the pinned package for opam to incorporate in its calculations, as seen in the following code block:

$ opam pin add directories.1.0 git+https://github.com/ocamlpro/directories --no-action
[directories.1.0] synchronised (git+https://github.com/ocamlpro/directories)
directories is now pinned to git+https://github.com/ocamlpro/directories (version 1.0)

In this specific example, the directories package is available in the opam repository that our opam installation is synchronised with. However, there is no such 1.0 version in that repository. Not a single reference to such a version number can be found at that address either: not in the tags, not in the releases of the repository, and not even in the opam file.

$ opam show directories --field all-versions
0.1  0.2  0.3  0.4  0.5

What we have done here is effectively telling opam that directories is at a different version number than it actually is in the most purely technical aspect...

But why would we want to do such a thing?

Let's consider a reasonable use-case for opam pin add <package>.<my-version-number> <url>:

You have been working on a project called my-project for some time and you are using a package named fst-dep for your development.

Below, you will find an excerpt of the fst-dep.opam file, specifically its dependencies:

depends: [
  "dep-to-try" { <= "3.0.0" }
  "other-dep"
]

All three packages (fst-dep, dep-to-try and other-dep) are installed in your current switch and are available on your favourite opam repository.

One day you go about checking the repository for each dependency, and you find that dep-to-try has just had one of its main features reimplemented, improved and optimised. Its maintainers are preparing to release a 4.0.0 version soon.

See, these changes would have been available for you to fetch directly from its development repository had you been working with it directly, but you are not. It is up to the maintainers of fst-dep to do that work.

Since you have no ownership over any of these dependencies, you have no way of changing any of the version constraints in this tiny dependency tree, from fst-dep upwards.

Here are the three mainstream solutions to this problem:

Wait for both packages to publish new releases. A new official release from the dep-to-try team, which would ship said reimplementation, and another from the fst-dep team which would update its dependency tree to include dep-to-try's latest version. Needless to say that this could take an arbitrary amount of time which is unsatisfying at best.
Another suboptimal solution would be to copy the current state of the entire opam repository relevant to your package distribution, go to the corresponding directory for fst-dep inside that repository, relax the hard dependency "dep-to-try" { <= "3.0.0" } and reinstall all the packages that are directly or indirectly affected by that change. A very time consuming task for such a small edit to the global dependency tree.
Last option would be to pin fst-dep, then go about manually editing the dependencies of fst-dep with the opam pin --edit option to relax the dependency. The only pitfall with this solution is that, in a context where dep-to-try is a key package in the OCaml distribution, and many other packages depend on it as well, you might have to do a lot of editing to make your switch a stable environment with all dependency constraints met...

So none of these solutions fits our needs. They are all unsatisfactory at best and even counter-productive at worst.

That's when arbitrary version pinning shines.

The main benefit of this feature is that it allows for added flexibility in navigating and tweaking the compatibility tree of any opam repository at the switch-level. It provides the user with ways to circumvent all tasks pertaining to a larger operation on the global graph of packages.

$ opam pin dep-to-try.3.0.0 git+https://github.com/OCamlPro/dep-to-try
[dep-to-try.3.0.0] synchronised (file:///home/rjbou/ocamlpro/opam_bps_examples/dep-to-try)
dep-to-try is now pinned to git+https://github.com/OCamlPro/dep-to-try#master (version 3.0.0)

opam will still think that dep-to-try's version is valid ({ <= "3.0.0"}), even if you are synchronised with the state of its development branch, thus giving you access to the latest changes with the minimal amount of manual editing required. Pretty neat, right?
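
To make the trick more concrete, here is a toy constraint checker in OCaml. It is a simplified sketch and not opam's algorithm: opam's real version ordering is more elaborate than this one, which only handles versions made of dotted integers. The names parse_version, constr and satisfies are hypothetical.

```ocaml
(* Parse "3.0.0" into [3; 0; 0]. Assumption: only dotted integers are
   supported; anything else makes [int_of_string] raise. *)
let parse_version s = List.map int_of_string (String.split_on_char '.' s)

(* A tiny subset of version constraints, e.g. { <= "3.0.0" }. *)
type constr =
  | Le of string
  | Ge of string

(* Structural comparison on int lists is lexicographic, which is
   enough for this sketch. *)
let satisfies version = function
  | Le bound -> compare (parse_version version) (parse_version bound) <= 0
  | Ge bound -> compare (parse_version version) (parse_version bound) >= 0
```

From the solver's point of view, only the version label matters: satisfies "3.0.0" (Le "3.0.0") holds even though the sources behind the pin are those of the development branch, whereas satisfies "4.0.0" (Le "3.0.0") does not.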

Now, onto the pitfalls that you should keep in mind when tinkering with your dependencies like that.

What kind of predicament awaits you?

You could introduce unforeseen behaviours. This could be anything from errors at compile-time, if dep-to-try's interfaces have changed significantly, to runtime crashes if you're unlucky.
Another source of confusion could arise if you happen to use the opam unpin dep-to-try --no-action command on such a package. After unpinning it, there's a chance that you would later forget it used to be pinned to a development version. There would be little to no way for you to remember which package it was that you had experimented with at some point. You would either have to inspect all your installed packages or even rebuild a switch from scratch, which would not be affected by your reckless arbitrary version pinning and would work just fine after that.

Our advice is rather simple: use this feature with discretion and try to avoid unpinning packages if it's not to reinstall or remove them altogether. If you follow these instructions, you should be safe...

Setting multiple arbitrary version numbers

One last bit of black magic for you to play around with.

Instead of pinning package-name.my-version-number, you may use the --with-version option to pin packages at that URL to an arbitrary version. A key detail is that it is compatible with multiple opam file pinning... Just keep in mind that all the pitfalls mentioned previously apply here too, only with multiple packages at once, which could make it more confusing.

Below, you can see that we are setting all the packages found in that repository to the same version:

$ opam pin add git+https://github.com/OCamlPro/ocp-index --with-version 2.0.0
This will pin the following packages: ocp-browser, ocp-index.
Continue? [Y/n] y
ocp-browser is now pinned to git+https://github.com/OCamlPro/ocp-index (version 2.0.0)
ocp-index is now pinned to git+https://github.com/OCamlPro/ocp-index (version 2.0.0)

The following actions will be performed:
  ∗ install ocp-indent  1.8.1  [required by ocp-index]
  ∗ install ocp-index   2.0.0*
  ∗ install ocp-browser 2.0.0*
===== ∗ 3 =====
Do you want to continue? [Y/n] y

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><>
⬇ retrieved ocp-indent.1.8.1  (cached)
⬇ retrieved ocp-index.2.0.0  (no changes)
⬇ retrieved ocp-browser.2.0.0  (no changes)
∗ installed ocp-indent.1.8.1
∗ installed ocp-index.2.0.0
∗ installed ocp-browser.2.0.0
Done.

You can see that all these packages are pinned to 2.0.0 now.

$ opam pin list
ocp-browser.2.0.0    git  git+https://github.com/OCamlPro/ocp-index
ocp-index.2.0.0      git  git+https://github.com/OCamlPro/ocp-index

Conclusion

Here it is, the opam pin command in most of its glory.

If you have stuck with this article this far, you should no longer feel confused about pinning projects, and you now have another of opam's most commonly used features in your arsenal for tackling your own development challenges!

So it is that we have learned about pinning both released and unreleased packages. Additionally, we showcased several features for orthogonal use-cases: from the more quality of life-oriented calls such as opam show and opam pin scan, to obscure features like arbitrary version pinning as well as ordinary options like --no-action, --dev-repo and subcommands like opam unpin.

We are steadily approaching a level of familiarity with opam that will allow us to get into some really neat features soon.

Be sure to stay tuned with our blog, the journey into the rabbit hole has only started and opam is a deep one indeed!

Thank you for reading,

From 2011, with love,

The OCamlPro Team

Flambda2 Ep. 1: Foundational Design Decisions

2024-03-19T09:05:17Z

Welcome to The Flambda2 Snippets!

In this first post of The Flambda2 Snippets, we dive into the powerful CPS-based internal representation used within the Flambda2 optimizer, which was one of the main motivations to move on from the former Flambda optimizer.

Credit goes to Andrew Kennedy's paper Compiling with Continuations, Continued for pointing us in this direction.

The F2S blog posts aim at gradually introducing the world to the inner-workings of a complex piece of software engineering: The Flambda2 Optimising Compiler, a technical marvel born from a 10 year-long effort in Research & Development and Compilation; with many more years of expertise in all aspects of Computer Science and Formal Methods.

Table of contents

CPS (Continuation Passing Style)
Double Barrelled CPS
The Flambda2 Term Language
Following up

CPS (Continuation Passing Style)

Terms in the Flambda2 IR are represented in CPS style, so let us briefly explain what that means.

Some readers may already be familiar with what we call First-Class CPS where continuations are represented using functions of the language:

(* Non-tail-recursive implementation of map *)
let rec map f = function
| [] -> []
| x :: r -> f x :: map f r

(* Tail-recursive CPS implementation of map *)
let rec map_cps f l k =
  match l with
  | [] -> k []
  | x :: r ->
    let fx = f x in
    map_cps f r (fun r -> k (fx :: r))

This kind of transformation is useful to make a recursive function tail-recursive and sometimes to avoid allocations for functions returning multiple values.
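
As a quick usage sketch, the CPS version is started by passing it the identity function as its initial continuation. The wrapper name map_via_cps is ours, chosen for illustration, and map_cps is repeated so that the snippet is self-contained.

```ocaml
(* Tail-recursive CPS implementation of map, as above *)
let rec map_cps f l k =
  match l with
  | [] -> k []
  | x :: r ->
    let fx = f x in
    map_cps f r (fun r -> k (fx :: r))

(* Hypothetical wrapper: the initial continuation simply returns the result. *)
let map_via_cps f l = map_cps f l (fun result -> result)

(* The pending work lives in heap-allocated closures rather than on the
   call stack, so a long list is not a problem. *)
let doubled = map_via_cps (fun x -> x * 2) (List.init 100_000 (fun i -> i))
```

The price to pay is one closure allocation per element, which is why this style is a trade-off rather than a free improvement.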

In Flambda2, we use Second-Class CPS instead, where continuations are control-flow constructs in the Intermediate Language. In practice, this is equivalent to an explicit representation of a control-flow graph.

Here's an example using some hopefully intuitive syntax for the Flambda2 IR.

let rec map f = function
| [] -> []
| x :: r -> f x :: map f r

(* WARNING: FLAMBDA2 PSEUDO-SYNTAX INBOUND *)
let rec map
  ((f : <whatever_type1> ),
  (param : <whatever_type2>))
  {k_return_continuation : <return_type>}
{
  let_cont k_empty () = k_return_continuation [] in 
  let_cont k_nonempty x r =
    let_cont k_after_f fx =
      let_cont k_after_map map_result =
        let result = fx :: map_result in
        k_return_continuation result 
      in
      Apply (map f r {k_after_map})
    in
    Apply (f x {k_after_f})
  in
  match param with
  | [] -> k_empty ()
  | x :: r -> k_nonempty x r
}

Every let_cont binding declares a new sequence of instructions in the control-flow graph, which can be terminated either by:

calling a continuation (for example, k_return_continuation) which takes a fixed number of parameters;
applying an OCaml function (Apply), which takes as a special parameter the continuation that it must jump to at the end of its execution. Unlike continuations, OCaml functions can take a number of arguments that does not match the number of parameters at their definition;
branching constructions like match _ with and if _ then _ else _, in these cases each branch is a call to a (potentially different) continuation;

This image shows the previous code represented as a graph.

Notice that some boxes are nested to represent scoping relations: variables defined in the outer boxes are available in the inner ones.

To demonstrate the kinds of optimisations that such control-flow graphs allow us, see the following simple example:

Original Program:

let f cond =
  let v =
    if cond then
      raise (Failure "cond")
    else 0
  in
  v, v

We then represent the same program using CPS in two steps, the first is the direct translation of the original program, the second is an equivalent program represented in a more compact form.

Minimal CPS transformation, using pseudo-syntax

(* showing only the body of f *)
(* STEP 1 - Before graph compaction *)
let_cont k_after_if v =
  let result = v, v in
  k_return_continuation result
in
let_cont k_then () = k_raise_exception Failure in 
let_cont k_else () = k_after_if 0 in
if cond then k_then () else k_else ()

which becomes after inlining k_after_if:

(* STEP 2 - After graph compaction *)
let_cont k_then () = k_raise_exception Failure in 
let_cont k_else () =
  let v = 0 in 
  let result = v, v in
  k_return_continuation result
in
if cond then k_then () else k_else ()

This allows us, by using the translation to CPS and back, to transform the original program into the following:

Optimized original program

let f cond =
  if cond then
    raise (Failure "cond")
  else (0, 0)

As you can see, the original program is simpler now. The changes made to the code are in fact not tied to a particular optimisation but rather to the nature of the CPS transformation itself. Moreover, we do want to actively perform optimisations, and to that end, having an intermediate representation that is equivalent to a control-flow graph lets us benefit from the large body of literature on static analysis of imperative programs, which are often represented as control-flow graphs.

To be fair, in the previous example, we have cheated in how we have translated the raise primitive. Indeed, we used a simple continuation (k_raise_exception) but we haven't defined it anywhere prior. This is possible because of our use of Double Barrelled CPS.

Double Barrelled CPS

In OCaml, every function can not only return normally (Barrel 1) but also raise an exception (Barrel 2). These correspond to two different paths in the control flow, and we need to be able to represent both in our own control-flow graph.

Hence the name: Double Barrelled CPS, that we took from this paper, by Hayo Thielecke. In practice this only has consequences in four places:

the function definitions must have two special parameters instead of one: the exception continuation (k_raise_exception) in addition to the normal return continuation (k_return_continuation);
the function applications must have two special arguments, reciprocally;
try ... with terms are translated using regular continuations with the exception handler (the with path of the construct) compiled to a continuation handler (let_cont);
raise terms are translated into continuation calls, to either the current function exception continuation (e.g. in case of uncaught exception) or the enclosing try ... with handler continuation.
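
To build some intuition, here is the earlier f example written in plain OCaml with both barrels passed explicitly. Keep in mind that this is a first-class illustration using ordinary functions as continuations, whereas Flambda2 uses second-class continuations. The names f_cps and f_or_default, and the Failure message, are ours.

```ocaml
(* [f] in double-barrelled style: one continuation for the normal return,
   one for the exceptional return. *)
let f_cps cond ~k_return ~k_raise =
  let k_after_if v = k_return (v, v) in
  if cond then k_raise (Failure "cond") else k_after_if 0

(* Rough equivalent of [try f cond with _ -> (-1, -1)]:
   the handler becomes the exception continuation of the call. *)
let f_or_default cond =
  f_cps cond
    ~k_return:(fun pair -> pair)
    ~k_raise:(fun _exn -> (-1, -1))
```

Note how raise no longer needs any special machinery here: it is just a call to the other continuation.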

The Flambda2 Term Language

This CPS form has directed the concrete implementation of the FL2 language.

We can see that the previous IRs have very descriptive representations, with about 20 constructors for Clambda and 15 for Flambda while Flambda2 has regrouped all these features into only 6 categories which are sorted by how they affect the control-flow.

type expr =
  | Let of let_expr
  | Let_cont of let_cont_expr
  | Apply of apply
  | Apply_cont of apply_cont
  | Switch of switch
  | Invalid of { message : string }

The main benefits we reap from such a strong design choice are that:

Code organisation is better: dealing with control-flow is only done when matching on full expressions and dealing with specific features of the language is done at a lower level;
Code duplication is reduced: features that behave in a similar way will have their common code shared by design;
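
To give a feel for what matching on full expressions looks like, here is a toy version of the type above together with a traversal that only cares about control-flow. The constructor payloads are hypothetical stand-ins that we made up for this sketch (the real let_expr, apply, switch and friends are much richer), and count_let_conts is not a function of the compiler.

```ocaml
(* Toy payloads, for illustration only. *)
type expr =
  | Let of string * expr                (* bound variable, body *)
  | Let_cont of string * expr * expr    (* continuation name, handler, body *)
  | Apply of string * string            (* callee, return continuation *)
  | Apply_cont of string                (* continuation being called *)
  | Switch of string list               (* one continuation per branch *)
  | Invalid of { message : string }

(* Count the continuations declared in a term: only the two binding
   constructors recurse, the four others end a sequence of instructions. *)
let rec count_let_conts = function
  | Let (_, body) -> count_let_conts body
  | Let_cont (_, handler, body) ->
    1 + count_let_conts handler + count_let_conts body
  | Apply _ | Apply_cont _ | Switch _ | Invalid _ -> 0

(* The compacted body of [f] from the previous section, in toy form. *)
let example =
  Let_cont ("k_then", Apply_cont "k_raise_exception",
    Let_cont ("k_else", Let ("v", Apply_cont "k_return_continuation"),
      Switch ["k_then"; "k_else"]))
```

With only six cases to consider, such a traversal stays short, and the compiler's exhaustiveness check tells us if a case was forgotten.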

Following up

The goal of this article was to show a fundamental design choice in Flambda2 which is using a CPS-based representation. This design is felt throughout the Flambda2 architecture and will be mentioned and strengthened again in later posts.

Flambda2 takes the Lambda IR as input, then performs CPS conversion, followed by Closure conversion, each of them worth their own blog post, and this produces the terms in the Flambda2 IR.

From there, we have our main optimisation pass that we call Simplify which first performs static analysis on the term during a single Downwards Traversal, and then rebuilds an optimised term during the Upwards Traversal.
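
The shape of such a pass can be sketched on a toy language. The following is not Simplify itself: the term type and the simplify function are hypothetical, and the only analysis performed is constant propagation. It merely shows information flowing downwards through an environment while the optimised term is rebuilt on the way back up.

```ocaml
(* A toy term language, for illustration only. *)
type term =
  | Const of int
  | Var of string
  | Add of term * term
  | Let of string * term * term

(* [env] maps variables to their known constant value. It is threaded
   downwards; the result of each call is the rebuilt term. *)
let rec simplify env t =
  match t with
  | Const _ -> t
  | Var x ->
    (match List.assoc_opt x env with
     | Some n -> Const n
     | None -> t)
  | Add (a, b) ->
    (match simplify env a, simplify env b with
     | Const x, Const y -> Const (x + y)
     | a', b' -> Add (a', b'))
  | Let (x, def, body) ->
    (match simplify env def with
     | Const n ->
       (* Every use of [x] gets replaced, so the binding itself is dropped. *)
       simplify ((x, n) :: env) body
     | def' ->
       (* [x] is not a known constant: forget any outer binding it shadows. *)
       Let (x, def', simplify (List.remove_assoc x env) body))
```

For instance, simplify [] (Let ("x", Const 1, Add (Var "x", Const 2))) rebuilds the term Const 3 in a single traversal.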

Once we have an optimised term, we can convert it to the CMM IR and feed it to the rest of the backend. This part is mostly CPS elimination but with added original and interesting work we will detail in a specific snippet.

The single-pass design allows us to consider all the interactions between optimisations.

Some examples of optimisations performed during Simplify:

Inlining of function calls;
Constant propagation;
Dead code elimination;
Loopification, that is transforming tail-recursive functions into loops;
Unboxing;
Specialisation of polymorphic primitives;
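
As a source-level illustration of Loopification, here is a tail-recursive function next to the loop it is morally equivalent to. This is a hand-written sketch to convey the idea, not the output of Flambda2, which operates on its own IR. The names sum_to and sum_to_loop are ours.

```ocaml
(* Tail-recursive version: the recursive call is the last thing [go] does. *)
let sum_to n =
  let rec go acc i = if i > n then acc else go (acc + i) (i + 1) in
  go 0 1

(* Hand-written loop computing the same result. *)
let sum_to_loop n =
  let acc = ref 0 and i = ref 1 in
  while !i <= n do
    acc := !acc + !i;
    incr i
  done;
  !acc
```

Both functions return the same value for any n, for example 5050 when n is 100.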

Most of the following snippets will detail one or several parts of these optimisations.

Stay tuned, and thank you for reading!

Behind the Scenes of the OCaml Optimising Compiler Flambda2: Introduction and Roadmap

2024-03-18T09:05:17Z

Introducing our Flambda2 snippets

At OCamlPro, the main ongoing task on the OCaml Compiler is to improve the high-level optimisation. This is something that we have been doing for quite some time now. Indeed, we are the authors behind the Flambda optimisation pass and today we would like to introduce the series of blog snippets showcasing the direct successor to it, the creatively named Flambda2.

This series of blog posts will cover everything about Flambda2, a new optimising backend for the OCaml native compiler. This introductory episode will provide you with some context and history about Flambda2 but also about its predecessor Flambda and, of course, the OCaml compiler!

This work may be considered a complement to an ongoing documentation effort at OCamlPro, as well as to the many talks we gave on the subject last year, two of which you can watch online: OCaml Workshop (slideshow), ML Workshop (slideshow).

This work was developed in collaboration with, and funded by, Jane Street. Warm thanks to Mark Shinwell for shepherding the Flambda project and to Ron Minsky for his support.

Table of contents

Introduction
Compiling OCaml
Snippets Roadmap
The F2S Series!

Compiling OCaml

Compiling OCaml code involves a multitude of passes (see the simplified representation below), and the bulk of high-level optimisations happens between the Lambda IR (Intermediate Representation) and CMM (which stands for C--). This set of optimisations will be the main focus of this series of snippets.

The different passes of the OCaml compiler, from sources to executable code, before the addition of Flambda.

Indeed, that part of the compiler is quite crowded. Originally, after the frontend has type-checked the sources, the Closure pass was in charge of transforming the Lambda IR (see source code) into the Clambda IR (see source code). This transformation handles Constant Propagation, some inlining, and some Constant Lifting (moving constant structures to static allocation). Then, a subsequent pass (called Cmmgen) transforms the Clambda IR into the CMM IR (see source code) and handles some peep-hole optimisations and unboxing. This final representation will be used by architecture-specific backends to produce assembler code.

Before we get any further into the hairy details of Flambda2 in the upcoming snippets, it is important that we address some context.

We introduced the Flambda framework which was released with OCaml 4.03. This was a success in improving inlining and related optimisations, and has been stable ever since, with very few bug reports.

We kept both Closure and Flambda alive together because some users cared a lot about the compilation speed of OCaml - Flambda is indeed a bit slower than Closure.

Flambda provides an alternative to the classic Closure transformation, with additional optimisations.

Now is the time to introduce an alternative to both Flambda and Closure: Flambda2, which is meant to eventually replace Flambda and potentially Closure as well. In fact, Jane Street has been gradually moving from Closure and Flambda to Flambda2 over the past year and, to this day, no longer has any systems relying on Closure or Flambda.

You can read more about the transition from staging to production-level workloads of Flambda2 right here.

Flambda is still maintained and will be for the foreseeable future. However, we have noticed some limitations that prevented us from performing certain kinds of optimisations, and we will elaborate on them in the following episodes of The Flambda2 Snippets series.

Flambda2 provides a much extended alternative to Flambda, from Lambda IR to CMM.

One obvious difference to notice is that Flambda2 translates directly to CMM, circumventing the Clambda IR, allowing us to lift some limitations inherent to Clambda itself.

Furthermore, after releasing Flambda, we kept experimenting with the aim of incrementally improving it and adding new optimisations. We tried to improve its internal representation and noticed that we could gain a lot by doing so, but also that it required deeper changes. That is what led us to Flambda2.

Snippets Roadmap

This is but the zeroth snippet of the series. It aims at providing you with history and context for Flambda2.

You can expect the rest of the snippets to alternate between deep dives into the technical aspects of Flambda2, and user-facing descriptions of the new optimisations that we enable.

The F2S Series!

Episode 1: Foundational Design Decisions in Flambda2

The first snippet covers the characteristics and benefits of a CPS-based internal representation for the optimisation of the OCaml language. It was already covered in part at the OCaml Workshop in 2023 and we go deeper into the subject in these blog posts.

Episode 2: Loopifying Tail-Recursive Functions

Loopify is the first optimisation algorithm that we introduce in the F2S series. In this post, we break down the concept of transforming tail-recursive functions in the context of reducing memory allocations inside of the Flambda2 compiler. We start by giving broader context around tail-recursion and tail-recursion optimisation before diving into how this transformation is both simple and representative of the philosophy behind all the optimisations conducted by the Flambda2 compiler.

Episode 3: Speculative Inlining

This article introduces Speculative Inlining, which is the name of the algorithm responsible for computing and inlining optimised function code inside of Flambda2. We cover how quickly we are faced with complex questions that only have heuristic answers when it comes to making an optimal inlining choice. Speculative Inlining is also the best demonstration of how we traverse code in our compilation pipeline.

Episode 4: How to write a purely functional compiler

This article explores how Flambda2 processes and optimises code through structured traversals. We break down the key principles behind upward and downward traversals, explaining how they enable effective propagation of information, elimination of redundancies, and efficient transformation of expressions. These mechanisms play a crucial role in the simplification and optimisation pipeline, tying together techniques introduced in previous episodes.

Episode 5: A lifecycle of IR semantics: what goes into a conditional

Coming soon...

Stay tuned, and thank you for reading!

Lean 4: When Sound Programs become a Choice

2024-03-07T09:05:17Z

Monitoring Edge Technical Endeavours

As a company specialized in strongly-typed programming languages with strong static guarantees, OCamlPro closely monitors the ongoing trend of bringing more and more of these elements into mainstream programming languages. Rust is a relatively recent example of this trend; another one is the very recent Lean 4 language.

Table of contents

Monitoring Edge Technical Endeavours
Lean 4, the Promising Future of Proven Software
Pushing for a Future of Trustworthy Software

Lean 4, the Promising Future of Proven Software

Lean 4 builds on the shoulders of giants like the Coq proof assistant, and languages such as OCaml and Haskell, to put programmers in a world where they can write elegant programs, express their specification with the full power of modern logics, and prove that these programs are correct with respect to their specification. Doing all this in the same language is crucial as it can streamline the certification process: once Lean 4 is trusted (audits, certification...), then programs, specifications, and proofs are also trusted. This contrasts with having a programming language, a specification language, and a separate verification/certification tool, and then having to argue about the trustworthiness of each of them, and that the glue linking all of them together makes sense. This is extremely interesting in the context of critical embedded systems in particular, and in qualified/certified "high-trust" development in general.
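
As a minimal sketch of what "program, specification and proof in the same language" looks like, consider the snippet below. The names `double` and `double_spec` are ours, chosen for the example, and the proof assumes a Lean 4 toolchain recent enough to ship the `omega` tactic.

```lean
-- A program.
def double (n : Nat) : Nat := n + n

-- Its specification, stated as a theorem, and a proof checked by Lean.
theorem double_spec (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

If the definition of `double` were changed in a way that breaks the specification, the file would simply stop compiling.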

While admittedly not as mainstream as Rust, Lean 4 has recently seen an explosion in interest from the media, developers, mathematicians, and (some) industry players. Quanta now routinely publishes articles about or mentioning Lean 4; Fields medalist Terry Tao is increasingly vocal about (and productive with) his use of Lean 4, see here and here for (very technical) examples. On the industrial side, Leonardo de Moura (Lean 4's lead designer) recently moved from a position at Microsoft Research to Amazon Web Services, which was followed by a fast and still ongoing expansion of the infrastructure around Lean 4.

Pushing for a Future of Trustworthy Software

OCamlPro has been closely monitoring Lean 4's progress by regularly developing in-house prototypes in Lean 4. Getting involved in the community and Lean 4's development effort is also part of our culture. This is to give back to the community, but also to closely follow the evolution of Lean 4 and sharpen our skills.

There are a few notable and public examples of our involvement. As part of our in-house prototyping, we discovered a "major bug" in Lean 4's dependent pattern-matching; later, we contributed to improving aspects of the by notation (used to construct proofs), which then ricocheted into fixing problems in the calc tactic. More recently, we contributed on various fronts such as improving the ecosystem's ergonomics, adding useful lemmas to Lean 4's standard library, contributing to the documentation effort...

Lean 4 is not yet industrial-strength, but it is getting closer and closer, and quickly enough for us to think that now is a reasonable time to invest in exploring it.

Opam 101: The First Steps

2024-01-23T09:05:17Z

Opam is like a magic box that allows people to be tidy when they share their work with the world, thus making the environment stable and predictable for everybody!

Welcome, dear reader, to a new series of blog posts!

This series will be about everything opam. Each article will cover a specific aspect of the package manager, and make sure to dissipate any confusion or misunderstandings on this keystone of the OCaml distribution!

Each technical article will be tailored for specific levels of engineering -- everyone, be they beginners, intermediate or advanced in the OCaml Arts will find answers to some questions about opam right here.

Check out each article's tags to get an idea of the entry level required for the smoothest read possible!

Table of contents

Walking the path of opam, treading on solid ground
First step: installing opam
Second step: initialisation
Acclimating to the environment
Switches, tailoring your workspace to your vision
- Creating a global switch
- Creating a local switch
The official opam-repository, the safe for all your packages
Installing packages in your current switch
Conclusion

New to the expansive OCaml sphere? As said on the official opam website, opam has been a game changer for the OCaml distribution, since it first saw the light of day here, almost a decade ago.

Walking the path of opam, treading on solid ground

We are aware that getting on board with the OCaml distribution can be quite a daunting task, be it because of its decentralised nature (a plethora of different tools, a variety of sometimes clashing modi operandi and practices, usually poorly documented edge use-cases, the many ways to go about having a working environment) or for many other reasons.

We have been thinking about making it easier for everyone, even the more confirmed Cameleers, by releasing a set of blogposts progressively detailing the depths at which opam can go.

Be sure to read these articles from the start if you are new to the beautiful world of OCaml and, if you are already familiar with it, use them as trustworthy documentation on speed-dial... You never know when you will have to set up an opam installation while off-the-grid, do you?

Are you ready to dive in?

First step: installing opam

First, let's talk about installing opam.

DISCLAIMER: In this tutorial, we will only be addressing a fresh install of opam on Linux and Mac. For more information about a Windows installation, stay tuned with this blog!

One would expect to have to interact with the package manager of one's favourite distribution in order to install opam, and, to some extent, one would be correct. However, we cannot guarantee that the version of opam you have at your disposal through these means is indeed the one expected by this tutorial, and every subsequent one for that matter.

You can check that here, make sure the version available to you is 2.1.5 or above.

Thus, in order for us to guarantee that we are on the same version, we will use the installation method found here and add an option to specify the version of opam we will be working with from now on.

Note that if you don't add the --version 2.1.5 option to the following command line, the script will download and install the latest opam release. opam's CLI is designed to remain consistent between versions so, unless you have a very old version or are reading this article in the very distant future, you should not run into problems if you don't use the exact same version as we do. For the sake of consistency though, I will use this specific version.

$ bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh) --version 2.1.5"

This script will download the necessary binaries for a proper installation of opam. Once done, you can move on to the nitty gritty of having a working opam environment with opam init.

Second step: initialisation

The first command to launch, after the initial opam binaries have been downloaded and opam has been installed on your system, is opam init.

This is when you step into the OCaml distribution for the first time.

opam init does several crucial things for you when you launch it, and the rest of this article will detail what exactly these crucial things are and what they mean:

it checks some required and recommended tools;
it syncs with the official OCaml opam-repository, which you can find here;
it sets up the opam environment in your *rc files;
it creates a switch and installs an ocaml-compiler for you;

Let's take a step-by-step look at the output of that command:

$ opam init
No configuration file found, using built-in defaults.
Checking for available remotes: rsync and local, git, mercurial, darcs.
Perfect!

<><> Fetching repository information ><><><><><><><><><><><><><><><><>
[default] Initialised

<><> Required setup - please read <><><><><><><><><><><><><><><><><><>

  In normal operation, opam only alters files within ~/.opam.

  However, to best integrate with your system, some environment
  variables should be set. If you allow it to, this initialisation
  step will update your bash configuration by adding the following
  line to ~/.profile:

    test -r ~/.opam/opam-init/init.sh && . ~/.opam/opam-init/init.sh > /dev/null 2> /dev/null || true

  Otherwise, every time you want to access your opam installation,
  you will need to run:

    eval $(opam env)

  You can always re-run this setup with 'opam init' later.

Do you want opam to modify ~/.profile? [N/y/f]
(default is 'no', use 'f' to choose a different file) y

User configuration:
  Updating ~/.profile.
[NOTE] Make sure that ~/.profile is well sourced in your ~/.bashrc.


<><> Creating initial switch 'default' (invariant ["ocaml" {>= "4.05.0"}] - initially with ocaml-base-compiler)

<><> Installing new switch packages <><><><><><><><><><><><><><><><><>
Switch invariant: ["ocaml" {>= "4.05.0"}]

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><>
∗ installed base-bigarray.base
∗ installed base-threads.base
∗ installed base-unix.base
∗ installed ocaml-options-vanilla.1
⬇ retrieved ocaml-base-compiler.5.1.0  (https://opam.ocaml.org/cache)
∗ installed ocaml-base-compiler.5.1.0
∗ installed ocaml-config.3
∗ installed ocaml.5.1.0
∗ installed base-domains.base
∗ installed base-nnp.base
Done.

The main result of an opam init call is to set up what is called your opam root. It does so by creating a ~/.opam directory to operate inside of. By default, opam only modifies and writes to this location.

First, opam checks that there is at least one required tool for syncing to the opam-repository. Then it checks what backends are available in your system. Here all are available: rsync, git, mercurial, and darcs. They will be used to sync repositories or packages.

$ opam init
No configuration file found, using built-in defaults.
Checking for available remotes: rsync and local, git, mercurial, darcs.
Perfect!

Then, opam fetches the default opam repository: opam.ocaml.org.

<><> Fetching repository information ><><><><><><><><><><><><><><><><>
[default] Initialised

Next, opam requires your input in order to configure your shell for the smoothest possible experience. For more details about the opam environment, refer to the next section.

Something interesting to remember for later is that, in the excerpt below, we grant opam permission to edit the ~/.profile file. This is one of the quality-of-life features for everyday use of an opam environment, and we will detail how further down.

<><> Required setup - please read <><><><><><><><><><><><><><><><><><>

  In normal operation, opam only alters files within ~/.opam.

  However, to best integrate with your system, some environment
  variables should be set. If you allow it to, this initialisation
  step will update your bash configuration by adding the following
  line to ~/.profile:

    test -r ~/.opam/opam-init/init.sh && . ~/.opam/opam-init/init.sh > /dev/null 2> /dev/null || true

  Otherwise, every time you want to access your opam installation,
  you will need to run:

    eval $(opam env)

  You can always re-run this setup with 'opam init' later.

Do you want opam to modify ~/.profile? [N/y/f]
(default is 'no', use 'f' to choose a different file) y

User configuration:
  Updating ~/.profile.
[NOTE] Make sure that ~/.profile is well sourced in your ~/.bashrc.

The next action is the installation of your very first switch alongside a version of the OCaml compiler, by default a compiler >= 4.05.0 to be exact.

For more information about what a switch is, be sure to read the rest of the article.

<><> Creating initial switch 'default' (invariant ["ocaml" {>= "4.05.0"}] - initially with ocaml-base-compiler)

<><> Installing new switch packages <><><><><><><><><><><><><><><><><>
Switch invariant: ["ocaml" {>= "4.05.0"}]

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><>
∗ installed base-bigarray.base
∗ installed base-threads.base
∗ installed base-unix.base
∗ installed ocaml-options-vanilla.1
⬇ retrieved ocaml-base-compiler.5.1.0  (https://opam.ocaml.org/cache)
∗ installed ocaml-base-compiler.5.1.0
∗ installed ocaml-config.3
∗ installed ocaml.5.1.0
∗ installed base-domains.base
∗ installed base-nnp.base
Done.

Great! So let's focus on the actions performed by the opam init call!

Acclimating to the environment

Well, as said previously, the first action was to set up an opam root in your $HOME directory (i.e. ~/.opam). This is where opam will operate. opam will never modify other locations in your filesystem without notifying you first.

An opam root is laid out like a Linux-style file hierarchy: inside it you will find directories such as /usr, /etc, /bin and so on. By default, this is where opam stores everything related to your system-wide installation: config files, packages and their configurations, and binaries.

This leads us to the need for an eval $(opam env) call.

Indeed, in order to make your binaries and such accessible as system-wide tools, you need to update all the relevant environment variables (PATH, MANPATH, etc.) with all the locations for all of your everyday OCaml tools.

To see what variables are exported when evaluating the opam env command, you can check the following codeblock:

$ opam env
OPAM_SWITCH_PREFIX='~/.opam/default'; export OPAM_SWITCH_PREFIX;
CAML_LD_LIBRARY_PATH='~/.opam/default/lib/stublibs:~/.opam/default/lib/ocaml/stublibs:~/.opam/default/lib/ocaml'; export CAML_LD_LIBRARY_PATH;
OCAML_TOPLEVEL_PATH='~/.opam/default/lib/toplevel'; export OCAML_TOPLEVEL_PATH;
MANPATH=':~/.opam/default/man'; export MANPATH;
PATH='~/.opam/default/bin:$PATH'; export PATH;

Remember when we granted opam init permission to edit the ~/.profile file, earlier in this tutorial? That comes in handy now: it keeps us from having to use eval $(opam env) more than necessary.

Indeed, you would otherwise have to call it every time you launch a new shell, among other things. Instead, opam adds a hook at prompt level that keeps the opam environment synced, updating it every time the user presses Enter. Very handy indeed.

Switches, tailoring your workspace to your vision

The second task accomplished by opam init was installing the first switch inside your fresh installation.

A switch is one of opam's core operational concepts. Its definition can vary depending on your exact use-case but, in the case of OCaml, a switch is a named pair:

an arbitrary version of the OCaml compiler
a list of packages available for that specific version of the compiler.
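
One way to picture this "named pair" is as a small record. The types and field names below are a hypothetical model written for this article, far simpler than opam's real internal representation; the dune version used in the usage line is likewise made up for the example.

```ocaml
(* Hypothetical model of a switch; opam's real representation is richer. *)
type package = { name : string; version : string }

type switch = {
  switch_name : string;     (* e.g. "default" or a directory path *)
  compiler : string;        (* e.g. "5.1.0" *)
  installed : package list; (* packages installed for that compiler *)
}

let default_switch = {
  switch_name = "default";
  compiler = "5.1.0";
  installed = [
    { name = "ocaml-base-compiler"; version = "5.1.0" };
    { name = "ocaml-config"; version = "3" };
    { name = "ocaml"; version = "5.1.0" };
  ];
}

(* Installing returns a new switch value: other switches are unaffected. *)
let install pkg sw = { sw with installed = pkg :: sw.installed }

let with_dune = install { name = "dune"; version = "3.0.0" } default_switch
```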

In our example, we see that the only packages installed in the process were the dependencies for the OCaml compiler version 5.1.0 inside the switch named default.

<><> Creating initial switch 'default' (invariant ["ocaml" {>= "4.05.0"}] - initially with ocaml-base-compiler)

<><> Installing new switch packages <><><><><><><><><><><><><><><><><>
Switch invariant: ["ocaml" {>= "4.05.0"}]

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><>
∗ installed base-bigarray.base
∗ installed base-threads.base
∗ installed base-unix.base
∗ installed ocaml-options-vanilla.1
⬇ retrieved ocaml-base-compiler.5.1.0  (https://opam.ocaml.org/cache)
∗ installed ocaml-base-compiler.5.1.0
∗ installed ocaml-config.3
∗ installed ocaml.5.1.0
∗ installed base-domains.base
∗ installed base-nnp.base
Done.

You can create an arbitrary number of parallel switches in opam. This allows users to manage parallel, independent OCaml environments for their developments.

There are two types of switches:

global switches have their packages, binaries and tools available anywhere on your computer. They are useful when you consider a given switch to be your default and most adequate environment for your everyday use of opam and OCaml.
local switches on the other hand are only available in a given directory. Their packages and binaries are local to that specific directory. This allows users to make specific projects have their own self-contained working environments. The local switch is automatically selected by opam as the current one when you are located inside the appropriate directory. More details on local switches below.
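
The automatic selection of a local switch can be pictured as a walk up the directory tree. The sketch below assumes that a local switch lives in a `_opam` directory at the root of the project, and that any sub-directory of the project also selects it; the function name and the `exists` parameter are ours, and this is not opam's actual implementation.

```ocaml
(* [exists] abstracts the file system so the logic can be tried without
   touching the disk; in real use it could be [Sys.file_exists]. *)
let rec find_local_switch ~exists dir =
  if exists (Filename.concat dir "_opam") then Some dir
  else
    let parent = Filename.dirname dir in
    (* [Filename.dirname "/"] is "/", so this stops at the root. *)
    if parent = dir then None else find_local_switch ~exists parent

(* A fake file system containing a single local switch. *)
let fake_exists path = path = "/home/ocp/my-project/_opam"

let in_project =
  find_local_switch ~exists:fake_exists "/home/ocp/my-project/src"

let elsewhere = find_local_switch ~exists:fake_exists "/home/ocp/other"
(* in_project = Some "/home/ocp/my-project"; elsewhere = None, in which
   case the global switch would be used. *)
```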

The default behaviour for opam when creating a switch at init-time is to make it global and name it default.

$ opam switch show
default
$ opam switch
#  switch   compiler     description
→  default  ocaml.5.1.0  default

Now that you have a general understanding of what exactly a switch is and how it is used, let's get into how you can go about manually creating your first switch.

Creating a global switch

NB: Remember that opam's command-line interface is beginner friendly. You can, at any point of your exploration, use the --help option to have every command and subcommand explained. You may also check out the opam cheat-sheet that was released a while ago and might still hold some precious insights on opam's CLI.

So how does one create a switch? The short answer is bafflingly straightforward:

# Installs a switch named "my-switch" based on an OCaml compiler version >= 4.05.0
# Here 4.05 is the default lower compiler version opam selects when unspecified
$ opam switch create my-switch

Easy, right? Now let's imagine that you would like to specify a later version of the OCaml compiler. The first thing you would want to know is which versions are available for you to specify, and you can use opam list for that.

Other commands can be used to the same effect but we prefer introducing you to this specific one as it may also be used for any other package available via opam.

Just as for any other package, opam list will give you all the versions of ocaml available for your currently active switch. Since we don't yet have an OCaml compiler installed, it will list all of them so that we may pick and choose our favourite for the switch we are making.

$ opam list ocaml
# Packages matching: name-match(ocaml) & (installed | available)
# Package    # Installed # Synopsis
ocaml.3.07   --          The OCaml compiler (virtual package)
ocaml.3.07+1 --          The OCaml compiler (virtual package)
ocaml.3.07+2 --          The OCaml compiler (virtual package)
ocaml.3.08.0 --          The OCaml compiler (virtual package)
(...)
ocaml.4.13.1 --          The OCaml compiler (virtual package)
ocaml.4.13.2 --          The OCaml compiler (virtual package)
(...)
ocaml.5.2.0  --          The OCaml compiler (virtual package)

Let's use it for a switch:

# Installs a switch named "my-switch" based on OCaml compiler version = 4.13.1
$ opam switch create my-switch ocaml.4.13.1

That's it, for the first time, you have manually created your own global switch tailored to your specific needs, congratulations!

NB: Creating a switch can be a fairly time-consuming task depending on whether or not the compiler version you have queried from opam is already installed on your machine, typically in a previously created switch. Every time you ask opam to install a version of the compiler, it will first scour your installation for a locally available version of that compiler to save you the time necessary for downloading, compiling and installing a brand new one.

Now, onto local switches.

Creating a local switch

As said previously, the use of a local switch is to constrain a specific OCaml environment to a specific location on your workstation.

Let's imagine you are about to start a new development called my-project.

While preparing all necessary pre-requisites for it, you notice something problematic: your global default switch is drastically incompatible with the dependencies of your project. In this imaginary situation, you have a default global switch that is useful for most of your other tasks but now have only one project that differs from your usual usage of OCaml.

To remedy this situation, you could create another global switch for your upcoming dev requirements on my-project, install all relevant packages and rebuild a full switch from scratch for that specific project. However, this would require you to always keep track of which switch is currently active, while possibly having to regularly alternate between your global default switch and your alternative global my-project switch, which you could understandably find suboptimal and tedious to incorporate into your workflow in the long run.

That's when local switches come in handy because they allow you to leave the rest of your OCaml dev environment unaffected by whatever out-of-bounds or specific workload you're undertaking. Additionally, the fact that opam automatically selects your local switch as your current active one as soon as you step inside the relevant directory makes the developer's context switch seamless.

Let's examine how you can create such a switch:

# Hop inside the directory of your project
$ cd my-project
# We consider your project already has an opam file describing only
# its main dependency: ocaml.4.14.1
$ opam switch create .

<><> Installing new switch packages <><><><><><><><><><><><><><><><><>
Switch invariant: ["ocaml" {>= "4.05.0"}]

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><>
∗ installed base-bigarray.base
∗ installed base-threads.base
∗ installed base-unix.base
∗ installed ocaml-system.4.14.1
∗ installed ocaml-config.2
∗ installed ocaml.4.14.1
Done.
$ opam switch
#  switch                   compiler      description
→  /home/ocp/my-project     ocaml.4.14.1  /home/ocp/my-project
   default                  ocaml.5.1.0   default
   my-switch                ocaml.4.13.1  my-switch

[NOTE] Current switch has been selected based on the current directory.
       The current global system switch is default.

Here it is, you can now hop into your local switch /home/ocp/my-project whenever you have time to deviate from your global environment.

The official opam-repository, the safe for all your packages

Among all the things that opam init did when it was executed, there is still one detail we have yet to explain, and that's the first action of the process: retrieving package specifications from the official OCaml opam-repository.

Explaining what exactly an opam-repository is requires the recipient to have a slightly deeper understanding of how opam works than the average reader this article was written for might have; so you will have to wait for us to go deeper into that subject in another blogpost when the time is ripe.

What we will do now though is explain what the official OCaml opam-repository is and how it relates to our use of opam in this blog post.

The official OCaml opam-repository is an open-source project where all released software of the OCaml distribution is referenced. It holds different compilers, basic tools and thousands of libraries, approximately 4500 packages in total as of today, and is configured to be the default repository for opam to sync to. You may add your own repositories for your own use of opam, but again, that's a subject for another time.

In case the repository itself is not what you are looking for, know that all packages available throughout the entire OCaml distribution may be browsed directly on ocaml.org.

The repository is essentially a collection of opam packages described in the opam file format. Check out the manual for more information about the opam file format.

In short, an opam package file holds all the information opam needs to operate. The file lists all of the package's direct dependencies, where to find its source code, the names and emails of maintainers and authors, different checksums for each archive release, and the list goes on.

Here's a quick example for you to have an idea of what it looks like:

opam-version: "2.0"
synopsis: "OCaml bindings to Zulip API"
maintainer: ["Dario Pinto <dario.pinto@ocamlpro.com>"]
authors: ["Mohamed Hernouf <mohamed.hernouf@ocamlpro.com>"]
license: "LGPL-2.1-only WITH OCaml-LGPL-linking-exception"
homepage: "https://github.com/OCamlPro/ozulip"
doc: "https://ocamlpro.github.io/ozulip"
bug-reports: "https://github.com/OCamlPro/ozulip/issues"
dev-repo: "git+https://github.com/OCamlPro/ozulip.git"
tags: ["zulip" "bindings" "api"]
depends: [
  "ocaml" {>= "4.10"}
  "dune" {>= "2.0"}
  "ez_api" {>= "2.0.0"}
  "re"
  "base64"
  "json-data-encoding" {>= "1.0.0"}
  "logs"
  "lwt" {>= "5.4.0"}
  "ez_file" {>= "0.3.0"}
  "cohttp-lwt-unix"
  "yojson"
  "logs"
]
build: [ "dune" "build" "-p" name "-j" jobs "@install" ]
url {
  src: "https://github.com/OCamlPro/ozulip/archive/refs/tags/0.1.tar.gz"
  checksum: [
    "md5=4173fefee440773dd0f8d7db5a2e01e5"
    "sha512=cb53870eb8d41f53cf6de636d060fe1eee6c39f7c812eacb803b33f9998242bfb12798d4922e7633aa3035cf2ab98018987b380fb3f380f80d7270e56359c5d8"
  ]
}
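
To make the `depends` field above more concrete, here is a rough sketch of what checking a lower-bound constraint such as `"lwt" {>= "5.4.0"}` against the contents of a switch could look like. Everything in it is an assumption made for illustration: the `dep` type, the naive dotted-number comparison (opam's real version ordering handles many more cases) and the made-up list of installed packages.

```ocaml
(* Naive dotted-number comparison, e.g. "4.10" vs "4.9.1". opam's real
   ordering also handles suffixes such as "~rc1"; this sketch does not. *)
let parse v = List.map int_of_string (String.split_on_char '.' v)

let rec compare_versions a b =
  match a, b with
  | [], [] -> 0
  | [], _ :: _ -> -1
  | _ :: _, [] -> 1
  | x :: xs, y :: ys -> if x = y then compare_versions xs ys else compare x y

(* A dependency with an optional lower bound, as in ["lwt" {>= "5.4.0"}]. *)
type dep = { pkg : string; at_least : string option }

(* [installed] maps package names to the version present in the switch. *)
let satisfied installed dep =
  match List.assoc_opt dep.pkg installed, dep.at_least with
  | None, _ -> false
  | Some _, None -> true
  | Some v, Some bound -> compare_versions (parse v) (parse bound) >= 0

let deps = [
  { pkg = "ocaml"; at_least = Some "4.10" };
  { pkg = "lwt"; at_least = Some "5.4.0" };
  { pkg = "re"; at_least = None };
]

(* Hypothetical switch contents, chosen for the example. *)
let installed = [ ("ocaml", "4.14.1"); ("lwt", "5.3.0"); ("re", "1.11.0") ]

let missing = List.filter (fun d -> not (satisfied installed d)) deps
(* Only "lwt" is reported: 5.3.0 is below the required 5.4.0. *)
```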

Okay so now, how do we go about populating a switch with packages and really get started?

Installing packages in your current switch

It's elementary. This simple command will do the trick of trying to install a package, and its dependencies, in your currently active switch.

$ opam install my-package

I say trying because opam will notify you if the current package version and its dependencies you are querying are or not compatible with the current state of your switch. It will also offer you solutions for the compatibility constraints between packages to be satisfiable: it may suggest to upgrade some of your packages, or even to remove them entirely.

The key thing about this process is that opam is designed to solve compatibility constraints in the global dependency graph that OCaml packages form. This design is what makes opam the average Cameleer's best friend. It highlights inconsistencies between dependencies and works out a way to satisfy your specific query, saving you a lot of head-scratching, provided you are willing to spend a little time getting used to it.
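To make the idea of compatibility constraints more concrete, here is a small Python sketch that checks a few `depends` entries against the contents of a switch. It is only a toy: opam's real solver and version ordering are far more elaborate, the helper names (`parse_version`, `unsatisfied`) are hypothetical, versions are assumed to be purely numeric and dotted, and the switch contents are invented for the example.

```python
import operator

# Comparison operators that may appear in a constraint such as {>= "4.10"}.
OPS = {">=": operator.ge, "<=": operator.le, "=": operator.eq,
       ">": operator.gt, "<": operator.lt}

def parse_version(text):
    """Turn "5.4.0" into (5, 4, 0); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))

def unsatisfied(depends, installed):
    """Return the dependencies that the installed packages do not satisfy."""
    problems = []
    for name, constraint in depends.items():
        if name not in installed:
            problems.append((name, "not installed"))
            continue
        if constraint is None:  # no constraint: any version is acceptable
            continue
        op, wanted = constraint
        if not OPS[op](parse_version(installed[name]), parse_version(wanted)):
            problems.append((name, f"needs {op} {wanted}, found {installed[name]}"))
    return problems

# A few constraints taken from the ozulip example above.
depends = {"ocaml": (">=", "4.10"), "lwt": (">=", "5.4.0"), "re": None}
# Hypothetical switch contents, invented for this example.
switch = {"ocaml": "4.14.1", "lwt": "5.3.0"}

for name, reason in unsatisfied(depends, switch):
    print(f"{name}: {reason}")
```

Here the sketch reports that lwt is too old and that re is missing, which is the kind of situation where opam would propose installing or upgrading packages on your behalf.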

The next command allows you to uninstall a package from your currently active switch as well as the packages that depend on it:

$ opam remove my-package

The following two commands update the state of the repositories opam is synchronised with, and then upgrade the installed packages, always keeping package compatibility in mind.

$ opam update
$ opam upgrade

Conclusion

There you have it: you should now be knowledgeable enough about opam to jump right into discovering OCaml!

Today we covered the essentials of opam: from installation to initialisation, along with the core concepts of the opam environment, switches, packages and the official OCaml opam-repository.

Be sure to stay tuned to our blog: the journey down the rabbit hole has only just started, and opam is a deep one indeed!

Thank you for reading,

From 2011, with love,

The OCamlPro Team

Maturing Learn-OCaml to version 1.0: Gateway to the OCaml World

2023-12-13T09:05:17Z

Camels are known to be able to walk long distances. They have adapted to an inhospitable environment and help Humanity daily.

From the very start, OCamlPro has been trying to make learning the OCaml language easier. OCaml has been used around the world to teach a variety of Computer Science domains, from algorithmics to calculus, functional programming and compilation.

The language had long been taught in academia when initiatives arose to offer simple web tools for writing and compiling OCaml code in a web browser. We launched the TryOCaml web editor for OCaml all the way back in 2012. In 2015, we were appointed by Roberto Di Cosmo, from the French University Paris-Diderot, to create the OCaml FUN MOOC platform, and we helped write the exercises used as pedagogical resources for the Introduction to Functional Programming.

That is how the Learn-OCaml open source learning platform was born, created then maintained at OCamlPro until 2018. Its steering was then transferred to the OCaml Software Foundation in 2019 and the project steadily grew into a fully fledged tool used by teachers and students around the world to this day.

Kudos to all OCaml teachers around the world, and to the Learn-OCaml team, shepherded by Louis Gesbert.

Learn-OCaml v1.0

What is Learn-OCaml today?

Learn-OCaml is a web platform for orchestrating exercises for OCaml programming, with automated grading. The interface features a code editor and client-side evaluation and grading; it can be served statically, but if running the bundled server there are also server-side saves, facilities for teachers to follow the progress of students, give assignments, get grades, etc.

We are thrilled to announce that the steady work accomplished over the years on Learn-OCaml is finally bearing fruit in the form of a long-awaited, soon-to-be-released v1.0!

For all details relative to the upcoming 1.0 release, do refer to Louis' post on OCaml Discuss.

For historical context, do refer to the original 2016 OCaml Workshop paper on Learn-OCaml, which kickstarted a long stream of updates and improvements to the platform and its public corpus of exercises.

The maintenance and development work on the platform is now funded by the OCaml Software Foundation.

The latest release of Alt-Ergo version 2.5.1 is out, with improved SMT-LIB and bitvector support!

2023-09-18T09:05:17Z

Alt‑Ergo: An Automated SMT Solver for Program Verification

We are happy to announce a new release of Alt‑Ergo (version 2.5.1).

Alt-Ergo is a cutting-edge automated prover designed specifically for mathematical formulas, with a primary focus on advancing program verification.

This powerful tool is instrumental in the arsenal of static analysis solutions such as Trust-In-Soft Analyzer and Frama-C. It accompanies other major solvers like CVC5 and Z3, and is part of the solvers used behind Why3, a platform renowned for deductive program verification.

Find out more about Alt‑Ergo and how to join the Alt-Ergo Users' Club here!

This release includes the following new features and improvements:

support for bit-vectors in the SMT-LIB format;
new SMT-LIB parser and typechecker;
improved bit-vector reasoning;
partial support for SMT-LIB commands set-option and get-model;
simplified options to enable floating-point arithmetic theory;
various bug fixes.

Update for bug fixes

Since writing this blog post, we have released Alt-Ergo version 2.5.2, which fixes an incorrect implementation of the (distinct) SMT-LIB operator when applied to more than two arguments, and a (rare) crash in model generation. We strongly advise users interested in SMT-LIB or model generation support to upgrade to version 2.5.2 on OPAM.
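As a reminder of what the SMT-LIB standard expects from distinct, the operator holds when its arguments are pairwise different, not merely when neighbouring arguments differ. The Python sketch below models that semantics. The function names are hypothetical, and `adjacent_only` is a deliberately wrong reading included for contrast; it is not a description of the actual bug fixed in 2.5.2.

```python
from itertools import combinations

def smt_distinct(*values):
    """SMT-LIB `distinct`: true when all arguments are pairwise different."""
    return all(a != b for a, b in combinations(values, 2))

def adjacent_only(*values):
    """A deliberately wrong reading that only compares neighbouring arguments."""
    return all(a != b for a, b in zip(values, values[1:]))

# With two arguments, both readings agree.
print(smt_distinct(1, 2), adjacent_only(1, 2))        # True True
# With three arguments they can differ: the first and third are equal.
print(smt_distinct(1, 2, 1), adjacent_only(1, 2, 1))  # False True
```

The two readings only diverge from three arguments onwards, which is why this class of mistake is easy to miss with binary test cases.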

Better SMT-LIB Support

This release includes better support for the SMT-LIB standard v2.6. More precisely, the release contains:

built-in primitives for the FixedSizeBitVectors and Reals_Ints theories and the QF_BV logic;
new fully-featured parsers and type-checkers for SMT-LIB and native Alt-Ergo languages;
specific and meaningful messages for syntax and typing errors.

These features are powered by the Dolmen Library through a new frontend alongside the legacy one. Dolmen, developed by our own Guillaume Bury, is also used by the SMT community to check the conformity of the SMT-LIB benchmarks.

Important: In this release, the legacy frontend is still the default. If you want to enable the new Dolmen frontend, use the option --frontend dolmen. We encourage you to try it and report any bugs on our issue tracker.

Note: We plan to deprecate the legacy frontend and make Dolmen the default frontend in version 2.6.0, and to fully remove the legacy frontend in version 2.7.0.

Support For Bit-Vectors Primitives

Alt-Ergo has had support for bit-vectors in its native language for a long time, but bit-vectors were not supported by the old SMT-LIB parser, and hence not available in the SMT-LIB format. This has changed with the new Dolmen front-end, and support for bit-vectors in the SMT-LIB format is now available starting with Alt-Ergo 2.5.1!

The SMT-LIB theories for bit-vectors, BV and QF_BV, have more primitives than those previously available in Alt-Ergo. Alt-Ergo 2.5.1 supports all the primitives of BV and QF_BV when using the Dolmen frontend. Alt-Ergo's reasoning capabilities on the new primitives are limited, and will be gradually improved in future releases.

Built-in Primitives For Mixed Integer And Real Problems

In this release, we add support for the primitives to_real, to_int and is_int of the SMT-LIB theory Reals_Ints. Note that this support is only available through the Dolmen frontend.
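For readers who have not met these three primitives before, the following Python sketch models their meaning as defined by the SMT-LIB Reals_Ints theory: to_real embeds an integer into the reals, to_int returns the largest integer less than or equal to its argument, and is_int holds exactly for reals that are the image of an integer. It uses exact fractions to stand in for reals and says nothing about how Alt-Ergo implements them.

```python
from fractions import Fraction
import math

def to_real(n: int) -> Fraction:
    """Embed an integer into the (here: rational) reals."""
    return Fraction(n)

def to_int(x: Fraction) -> int:
    """Largest integer n such that to_real(n) <= x, i.e. the floor of x."""
    return math.floor(x)

def is_int(x: Fraction) -> bool:
    """True exactly when x is the image of some integer under to_real."""
    return x == to_real(to_int(x))

print(to_int(Fraction(7, 2)))    # 3
print(to_int(Fraction(-7, 2)))   # -4, since to_int rounds towards minus infinity
print(is_int(Fraction(4, 2)))    # True
print(is_int(Fraction(1, 3)))    # False
```

Note the behaviour on negative numbers: to_int is a floor, not a truncation towards zero.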

Example

For instance, the input file input.smt2:

(set-logic ALL)
(declare-const x Int)
(declare-const y Real)
(declare-fun f (Int Int) Int)
(assert (= (f x y) 0))
(check-sat)

with the command:

alt-ergo --frontend dolmen input.smt2

produces the following clear error message:

File "input.smt2", line 5, character 11-18:
5 | (assert (= (f x y) 0))
               ^^^^^^^
Error The term: `y` has type `real` but was expected to be of type `int`

Model Generation

Generating models (also known as counterexamples) is highly appreciated by users of SMT-solvers. Indeed, most builtin theories in common SMT-solvers are incomplete. As a consequence, solvers can fail to discharge goals and, without models, the SMT-solver behaves as a black box by outputting laconic answers: sat, unsat or unknown.

Providing best-effort counterexamples helps developers understand why the solver failed to validate goals. If the goal isn't valid, the solver should, as much as it can, output a correct counterexample that helps users fix their specifications. If the goal is actually valid, the generated model is wrong, but it can help the SMT-solver's maintainers understand why their solver didn't manage to discharge the goal.

Model generation for the LIA theory and the enum theory is available in Alt-Ergo. The feature for other theories is either in testing phase or being implemented. If you run into wrong models, please report them on our GitHub repository.

Usage

The present release provides convenient ways to request models. Note that model invocation has changed since the post Alt-Ergo: the SMT solver with model generation, which described model generation on the next development branch.

Check out the documentation for more details.

Floating Point Support

In version 2.5.1, the options to enable support for unbounded floating-point arithmetic have been simplified. The options --use-fpa and --prelude fpa-theory-2019-10-08-19h00.ae are gone: floating-point arithmetic is now treated as a built-in theory and can be enabled with --enable-theories fpa. We plan on enabling support for the FPA theory by default in a future release.

Usage

To turn on the fpa theory, use the new option --enable-theories fpa as follows:

alt-ergo --enable-theories fpa input.smt2

About Alt-Ergo 2.5.0

Version 2.5.0 should not be used, as it contains a soundness bug with the new bvnot primitive that slipped through the cracks. The bug was found immediately after the release, and version 2.5.1 was released with a fix.

Acknowledgements

We thank members of the Alt-Ergo Users' Club: Thales, Trust-in-Soft, AdaCore, MERCE and the CEA.

We specially thank David Mentré and Denis Cousineau at Mitsubishi Electric R&D Center Europe for funding the initial work on model generation. Note that MERCE has been a Member of the Alt-Ergo Users' Club for three years. This partnership allowed Alt-Ergo to evolve and we hope that more users will join the Club on our journey to make Alt-Ergo a must-have tool.

The dedicated members of our Alt-Ergo Club!

2022 at OCamlPro

2023-06-30T09:05:17Z

Clear skies on OCamlPro's way of life.

For 12 years now, OCamlPro has been empowering a large range of customers, allowing them to harness state-of-the-art technologies and languages like OCaml and Rust. Our not-so-small-anymore company steadily grew into a team of highly-skilled and passionate engineers, experts in Computer Science, from Compilation and Software Analysis to Domain Specific Languages design and Formal Methods.

In this article, as we do every year (see last year's post), albeit later than usual, we review some of the work we did during 2022 across many different domains, as the wide range of missions we carried out shows.

Table of contents

Newcomers at OCamlPro

Modernizing Core Parts of Real Life Applications

MLANG, keystone of the French citizens' Income Tax Calculation
Contributing to GnuCOBOL, the Free Open-Source COBOL Alternative

Rust Expertise and Developments

Ecore, a heart of Rust for EMF
Open-Source Rust Contributions

The WebAssembly Garbage Collection Working-Group

Tooling for Formal Methods

The Alt-Ergo Theorem Prover
- The Alt-Ergo Users' Club
- Developing Alt-Ergo
Dolmen Library for Automated Deduction Languages

Contributions to OCaml

About opam, the OCaml Package Manager
The Flambda2 Optimizing Compiler

Organizing OCaml Meetups

OCaml Users in PariS (OUPS)
OCaml Meet-Up in Toulouse

Participation to External Events

The OCaml Workshop 2022 - ICFP Ljubljana
Journées Francophones des Langages Applicatifs 2022

Newcomers at OCamlPro

OCamlPro is not just an R&D software company; we like to think of it more as a team of people who like to work together. So, we are proud to introduce to you the incredible human beings who joined us in 2022:

Pierre Villemot joined us in June. After three years of research at the Weizmann Institute on transcendental measures in Arithmetical Geometry, he was recruited and became the main maintainer of the Alt-Ergo Theorem Prover.
Milàn Martos joined us in July. He studied Chemistry and Computer Science at ENS, and he holds an MBA. He joined the Team as a Presales Engineer and as a Junior OCaml Web Developer.
Nathanaëlle Courant joined us in September. She holds a Master's degree from École Normale Supérieure in Paris, and is finishing her Ph.D. on efficient and verified reduction and convertibility tests for theorem provers. She joined OCamlPro in 2022 and works on the OCaml optimizer, in the Flambda team.
Arthur Carcano also joined us in September. Arthur is a Rust developer interested in performance optimization, software design, and crafting powerful and user-friendly tools. After completing his M.Sc. in Computer Science at ENS Ulm, he obtained a Ph.D. in Mathematics and Computer Science from Université de Paris.
Emilien Lemaire joined us in December 2022. After an internship on typechecking COBOL statements, he will be working with our COBOL team on creating a studio of modern tools for COBOL.

Modernizing Core Parts of Real Life Applications

We love to harness our IT expertise to give a competitive advantage to our clients by modernizing core chunks of key infrastructures. For example, we are working with the French Public Finances General Directorate on two of their modernization projects, to reimplement the language used for the computation of the Income Tax (MLang) and to provide support on the GnuCOBOL compiler used by the MedocDB application (COBOL).

MLANG, keystone of the French citizens' Income Tax Calculation

The M language, designed in the 80s to compute the French Income Tax, is still being rewritten in OCaml!

In 2022, our work on MLANG passed a significant milestone: it may no longer be considered a prototype! Code generation is now behaviourally compliant with the upstream compiler. David focused on rewriting the C architecture, which has been of great aid in iterating through each version of this new implementation of MLANG.

As far as testing goes, we were allowed to compare the results of our implementation against those of the upstream calculator, on real-life inputs too. We are talking about calculations of immense scale, which makes this a highly performance-dependent project. Naturally, we managed to produce something of equivalent performance, which was a very important matter for our contractors, who have since voiced their satisfaction. It is always great for us to feel appreciated for our work.

The next step is to make a production-level language by the end of 2023, so stay tuned if you are interested in this great project.

Wondering what MLANG is? Be sure to read last year's post on the matter.

Contributing to GnuCOBOL, the Free Open-Source COBOL Alternative

COBOL runs in the gargantuan infrastructures of many insurance companies and banks across the globe.

In 2022, we started contributing to the GnuCOBOL project: the GnuCOBOL compiler is, today, the only free, open-source, industrial-grade alternative to proprietary compilers by IBM and Micro-Focus. A cornerstone feature of GnuCOBOL is its ability to understand proprietary extensions to COBOL (dialects), to decrease the migration work of developers.

Last year's "at OCamlPro" post presented our gradual introduction to the COBOL Universe as one of our latest technical endeavours. In the beginning, our main objective was to wrap our heads around the state of the environment for COBOL developers.

Our main contribution for now is to add support for the GCOS7 dialect, to ease migration from obsolete GCOS Bull mainframes to a cluster of PCs running GnuCOBOL for our first COBOL customer, the French DGFIP (Public Finances General Directorate). We also contributed a few fixes and small useful general features. Our contributions are gradually upstreamed in the official version of GnuCOBOL.

The other part of our COBOL strategy is to progressively develop our SuperBOL Studio, a set of modern tools for COBOL, and especially GnuCOBOL, based on an OCaml parser for COBOL that we have been improving during the year to support the full COBOL standard. More on this next year, hopefully!

Rust Expertise and Developments

Kind words sent our way by Florian Gilcher (skade), managing director at Ferrous Systems!

OCamlPro's culture is one of human values and appeal for everything scientific.

Programming languages of all kinds have caught our attention at some point in time. After years of working with all sorts of languages, we have grown fond of strongly-typed, memory-safe ones. Eventually gravitating towards Rust, we have since invested significantly in adding this state-of-the-art language to our toolset, notably through the training sessions we deliver to industrial actors of various backgrounds to help them get to grips with such technological marvels.

Our trainers are qualified engineers, some of whom have more than ten years of industry experience in R&D, Formal Methods and embedded systems alike, including seven years working solely in Rust.

Building on our collective experience, 2022 was indeed the stage for many contributions and missions, some of which we will share with you right now.

Ecore, a heart of Rust for EMF

Ecore is the code generator at the heart of the EMF Architecture.

In 2022, we seized the opportunity to work at the threshold between Java and Rust for our clients and academic partners at the CEA (Commissariat à l'énergie atomique et aux énergies alternatives). The product was a Rust-written and Rust-encoded Java class hierarchy code generator.

Ecore is the core metamodel at the heart of the Eclipse Modeling Framework (EMF), which is used internally at the CEA. Ecore is a superset of UML and allows for the engineers of the CEA to express a Java class hierarchy through a graphical interface. In practice, this allows for the generation of basic Java models for the engineers to then build upon.

Our mission consisted in writing, in Rust, a new model generator for them to use in their workflows and to make it capable of generating Rust code instead of Java.

The cost of harnessing the objective qualities of a new implementation in Rust was having to tackle the scientific challenges that stem from the inherent structural differences between the two languages. Our goal was to find a way to express, in Rust, the semantics of Java's class hierarchy, hence merging the worlds of Rust and Java on the way there.

Eventually, our partners were convinced the challenges were worth the improved speed at which models were generated. Furthermore, the now embedded-programming compliant platform, the runtime safety and even Rust's broader WebAssembly-ready toolchain have cleared a new path for the future iterations of their internal projects.

Open-Source Rust Contributions

Ferris the Crab is the mascot of the Rust Language. No wonder why we converged as well!

As we continue scouring the market for more and more Rust projects, we still actively contribute to the open-source community whenever the opportunity shows up. Here is some of this year's open-source work:

Lean4

Here's a project suited for all who, like us, are functional programming and formal methods enthusiasts: Lean.

Lean is a functional programming language that makes it easy to write correct and maintainable code. You can also use Lean as an interactive theorem prover. Lean programming primarily involves defining types and functions. This allows your focus to remain on the problem domain and manipulating its data, rather than the details of programming.

The list of our contributions to the repository of lean4:

Detection of a major dependent pattern matching bug
Some QA with unintuitive calc indentation
And some more with strict indentation in nested by-s requirement

Matla, TLA+ Projects Manager

Last year, we shared a sneak peek of Matla, introducing its use-case and the motivations for implementing such a manager for TLA+ projects. As we tinkered with TLA+, sometimes finding a bug, we continued our development of Matla on the side.

The tool, although still a work-in-progress, has since then undergone a few changes and releases:

You are welcome to contribute if you happen to find yourself in the same situation we were in when we started the project.

Agnos, for Let's Encrypt Wildcard Certificates

Agnos is a single-binary program allowing you to easily obtain certificates (including wildcards) from Let's Encrypt using DNS-01 challenges. It answers Let's Encrypt DNS queries on its own, bypassing the need for API calls to your DNS provider, and strives to offer a user-friendly and easy configuration.

Often, the best contributions are of a practical nature, which is the case for Agnos.

If that sounds interesting to you, you can learn more about it by reading this article.

Make sure to give us some feedback when you end up using it!

The WebAssembly Garbage Collection Working-Group

WebAssembly is used to compile many languages to an efficient portable code for web-browsers.

Late 2022 was finally time for us to put into practice the knowledge we have acquired about WebAssembly over the years by writing and presenting the first compiler of a real-world functional language targeting the WasmGC proposal.

Although a relatively new technology, its great design, huge potential, and already very tangible and interesting use-cases have not escaped our watch and we are very happy to have kept a sharp eye on it.

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications.

WasmGC is the name of the on-going working group and proposal towards eventually adding support for garbage collection to Wasm. December 2022 saw a significant amount of work accomplished by both Léo Andrès (whose thesis work is directed by Pierre and Jean-Christophe Filliâtre) and Pierre Chambart towards finding viable compilation strategies from OCaml to WasmGC. The goal was three-fold: make a prototype compiler to demonstrate the soundness of the proposal, show that our compilation strategies were viable and, finally, convince the committee of the significance of the Wasm i31ref for OCaml.

Our success in these three distinct points was paramount for OCaml, and other languages that depend on the presence of i31ref, in order to one day benefit from having WebAssembly as a natively supported compilation target for Web-bound applications.
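For context, i31ref is the WasmGC reference type that carries a 31-bit integer directly inside the reference instead of pointing to a heap object, which fits languages like OCaml that represent small integers as immediate values. The Python sketch below only illustrates what a 31-bit signed payload can hold and how wrapping behaves. The helper names are hypothetical: they are neither Wasm instructions nor Wasocaml code.

```python
I31_BITS = 31
I31_MASK = (1 << I31_BITS) - 1        # 0x7FFFFFFF, the 31 payload bits
I31_MIN = -(1 << (I31_BITS - 1))      # -2**30
I31_MAX = (1 << (I31_BITS - 1)) - 1   # 2**30 - 1

def fits_in_i31(n):
    """True when n can be stored in a 31-bit signed payload without loss."""
    return I31_MIN <= n <= I31_MAX

def wrap_to_i31(n):
    """Keep only the low 31 bits of n, discarding anything above."""
    return n & I31_MASK

def read_signed(payload):
    """Interpret a 31-bit payload as a signed integer (sign-extend bit 30)."""
    sign_bit = 1 << (I31_BITS - 1)
    return payload - (1 << I31_BITS) if payload & sign_bit else payload

for n in (42, -1, 2**30):
    print(n, fits_in_i31(n), read_signed(wrap_to_i31(n)))
```

Values inside the range round-trip unchanged, while 2**30 falls just outside it and wraps around to -2**30.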

Here's a short listing of the work they have accomplished for that matter. Please rest assured, more detailed explanations are to be expected on this very blog in the future so stay tuned!

Introducing Wasocaml to the Wasm-GC Group and demonstrating OCaml's dependency on Wasm keeping i31ref in their GC proposal.
Wasocaml, an OCaml compiler to Wasm. Wasocaml is also the first compiler for a real-world functional language to Wasm-GC.
owi, an OCaml toolchain to work with Wasm. It provides an interpreter as an executable and a library.

Tooling for Formal Methods

Programming language theory is closely tied to the idea of proper mathematical formalisation. Hence the strong scientific background in Formal Methods that we draw on for both language design and formal verification for cybersecurity.

The Alt-Ergo Theorem Prover

OCamlPro develops and maintains Alt-Ergo, an automatic solver of mathematical formulas designed for program verification and based on Satisfiability Modulo Theories (SMT) technology. Alt-Ergo was initially created within the VALS team at University of Paris-Saclay.

Alt-Ergo proves mathematical formulas corresponding to software program properties.

The Alt-Ergo Users' Club

The Alt-Ergo Users' Club was launched in 2019. Its 4th annual meeting was held in late March 2022.

These meetings allow us to keep track of our members' needs and demands as well as keep them informed with the latest changes to the SMT Solver; they are the lifeline of our Club and help us guarantee that the project lives on, despite the enormous task it represents.

This is a good time to appreciate the scope of the project: Alt-Ergo is the fruit of more than 10 years' worth of Research & Development. It is currently maintained by Pierre Villemot, whom we introduce in the next section, as a full-time R&D engineer.

The dedicated members of the Club!

This is the reason why we would like to thank our partners from the Alt-Ergo Users’ Club, for their trust: Thales, Trust-in-Soft, AdaCore, MERCE (Mitsubishi Electric R&D Centre Europe) and the CEA. Their support allows us to maintain our tool.

Congratulations and many thanks to our partners at Trust-In-Soft, who have upgraded their subscription to the Club to Gold-tier, allowing this great open-source tool to live on!

Developing Alt-Ergo

In 2022, the Alt-Ergo team welcomed Pierre Villemot as full-time maintainer of the project! His recruitment shows our commitment to the project's long term maintenance and evolution. We are looking forward to seeing him take it to new heights in future releases! Speaking of releases, 2022 has also been the stage for Alt-Ergo's v2.4.2 minor release, which introduced an update of the lablgtk library to version 3 and a set of bug fixes.

Now onto the more substantial changes to Alt-Ergo, the integration into next of all the following:

Integration of the SMT-LIB2 format parser Dolmen to Alt-Ergo's frontend;
Improvement and testing of model generation;
Addition of mutually recursive functions for the legacy frontend and Dolmen alike;
Significant amounts of documentation and code-cleaning;
Implementation of systematic benchmarks of the SMT-LIB for regression prevention;
Prototypical Dockerisation.

These are significant improvements to the User Experience and overall ergonomics of the tool. You can already benefit from these changes by using Alt-Ergo's dev version.

Finally, let us inform you that our candidacy for the DECYSIF project was approved. Indeed, we and our partners at AdaCore, Trust-In-Soft and the Laboratoire Méthodes Formelles have been selected to conduct this funded research project as consultants in Formal Methods. Now, we hope to be part of collaborative R&D projects to further fund core Alt-Ergo developments. This should allow us to deepen collaboration with old partners like the Why3 team at the Formal Methods Lab (LMF) and the ProofinUse consortium members. Stay tuned!

Dolmen Library for Automated Deduction Languages

Dolmens are Neolithic megalithic structures composed of menhirs and they can range from a few centimeters to several meters high!

Dolmen is an OCaml Library developed by Guillaume Bury as part of our Research and Development processes around Formal Methods and our development efforts for our automated SMT-Solver Alt-Ergo.

Dolmen is a testimony to our push towards standardised input languages for SMT-Solvers. Indeed, it provides flexible Menhir parsers and typecheckers for several languages used in automated deduction such as: smtlib, ae (Alt-Ergo), dimacs, iCNF, tptp and zf (zipperposition). Dolmen thus aims to cover as many input languages for automated deduction as possible, and provides the OCaml community with a centralised solution for input parsing and typechecking, sparing them from having to reimplement these each time.

Furthermore, the Dolmen binary is used by the maintainers of the SMTLIB to check that newly submitted benchmarks are compliant with the specification, which makes Dolmen its de facto reference implementation. In time, Dolmen will become the default frontend for Alt-Ergo and, we hope, for any other SMT-Solver written in OCaml from now on.

Contributions to OCaml

Last but not least, OCamlPro’s DNA built itself on one of the most powerful and elegant programming languages ever, born from more than 30 years of French public Research in Computer Science, and widely used in safety critical industries. OCaml’s traits pervasively inspired many new languages (like F#). We are proud to be part of this great community of researchers and computer scientists.

About opam, the OCaml Package Manager

opam, the OCaml Package Manager, remains one of OCamlPro's greatest achievements!

2022 has been the theatre of a sustained and continuous effort from the opam team.

The fruits of their labor have been compiled into an alpha release of version 2.2.0 by June 28th 2023, so here is a taste of what should make the final 2.2.0 version of opam a treat for its users:

Windows support: opam 2.2 comes with native Windows compatibility. You can now use opam from your preferred Windows terminal!
Recursive pinning: lets opam look for opam files in subdirectories.
Software Heritage binding: opam now integrates a fallback to Software Heritage archive retrieval, based on SWHID. If an SWHID url is present in an opam file, the fallback can be activated.
Enhanced features for developers: such as the development tools variable to share a development setup, the opam tree command to get a better overview of dependencies, new pinning subcommands, and so on.

That being said, 2022 was a very special year for opam. Indeed, 10 years prior, on the 26th of June 2012, OCamlPro birthed version 0.1 of what was to become the official OCaml Package Manager, the cornerstone of the OCaml environment.

It was no small feat to make opam what it is today. It took approximately 5 years to bring 1.0.0 up to 2.0.0 and another 3 to reach 2.1.0 all the while ensuring changes were compliant with the current ecosystem (opam repository, OCaml tooling) and the public's feedback and vision.

This work is made possible thanks to Jane Street's funding.

The Flambda2 Optimizing Compiler

Flambda2 is a powerful code optimizer for the OCaml compiler, backed by many years of R&D.

OCamlPro is proud to be working on Flambda2, an ambitious OCaml optimizing compiler project, initiated with Mark Shinwell from Jane Street, our long-term partner and client. Flambda2 builds upon its predecessor: Flambda, which focused on reducing the runtime cost of abstractions and removing as many short-lived allocations as possible. Thus, Flambda2 not only shines with the maturity and experience its architects acquired through years worth of R&D and dev-time for Flambda, but it improves upon them.

In 2022, Flambda2 was used for production workloads for the first time, and has been ever since! Indeed, we can officially say that Flambda2 has left the realm of the prototype to enter that of real-life, production-tested software, for which we continue to provide development and support, as we have for years now.

This achievement comes along having our engineers take more and more action in maintaining the OCaml compiler. Being part of the OCaml Core-Team is an honour.

Finally, in 2022, the Flambda Team welcomed a new member: Nathanaëlle Courant will be joining forces with Pierre Chambart, Damien Doligez, Vincent Laviron and Guillaume Bury to tackle the challenges inherent to maintaining Flambda2 and that of the Core-Team.

If you are interested in more things Flambda2, stay tuned to our blog: there should be a series of very interesting articles coming up in the not-so-distant future!

This work is made possible thanks to Jane Street's funding.

In other OCaml compiler news, 2022 was also the year of the official release of OCaml 5.0.0, also known as Multi-Core, on the 16th of December. This major release introduced a new runtime system with support for shared memory parallelism and effect handlers! This fabulous milestone is brought to us by the joint work of the amazing people of the OCaml Core-Team; among them some of our own.

Many thanks to all who take part in uncovering the yet untrodden paths of the OCaml distribution!

What a time to be an OCaml enthusiast!

Organizing Meetups for the OCaml Community

OCaml Users in PariS (OUPS)

Camels going to their pluri-annual OUPS Meet-up.

Just under 10 years ago, Fabrice Le Fessant initiated the very first OCaml Users in Paris.

This event allowed OCaml users in Paris, professionals and amateurs alike, to meet and exchange on OCaml novelties. This is still the case and the organising crew now includes several people of diverse affiliations, maintaining the purpose of this friendly event.

Every two months or so, the organisers reach out to the community, hail volunteers and select presentations of on-going works. When the time and place are settled, the ocaml-paris Meetup members are informed by various means. The OCaml Users in PariS meetup is the place to enthusiastically share knowledge and a few pizzas. It is supported by the OCaml Software Foundation, which graciously pays for the pizzas.

You can register to the OCaml Users in PariS (OUPS) meetup group here.

Here are all the relevant links to the talks that happened in Paris in 2022:

OCaml Meet-Up in Toulouse

Toulouse also has its set of enthusiastic OCaml supporters.

Fortunately for OCaml Users that live in the French South-West, a new Meet-up is now available to them. On the 11th of October 2022, the first OCaml meet-up in Toulouse happened.

The first occurrence of the OCaml Users in Toulouse Meetup kicked off with Erik Martin-Dorel (OCaml Software Foundation) presenting Learn-OCaml, followed by David Declerck (OCamlPro) presenting his OCaml-Canvas graphics library for OCaml.

You can register to the OCaml Meet-Up in Toulouse group here.

Here's to sharing a slice or two with you soon!

Participation in External Events

The OCaml Workshop 2022 - ICFP Ljubljana

ICFP 2022 took place in the beautiful town of Ljubljana, Slovenia.

The OCaml Workshop is an international conference that focuses on everything OCaml and is part of the ICFP (International Conference on Functional Programming).

We attended many of these and have presented numerous papers throughout the years.

In 2022, a paper co-authored by the maintainers of opam, the OCaml Package Manager, was submitted and approved for presentation: "Supporting a decade of opam".

You can find the textual references of the talk here and a replay of the presentation there.

You can expect more papers and interesting talks coming from us in upcoming editions of the conference!

Journées Francophones Langages Applicatifs 2022

The JFLA'2022 took place in the beautiful Domaine d'Essendiéras in Périgord, France.

Among the many scientific conferences we attend on an annual basis, the JFLA (Journées Francophones des Langages Applicatifs, or French-speaking annual gathering on Application Programming Languages, mainly Functional Languages) is the one where we have felt most at home since 2016.

We have remained among its faithful supporters and participants ever since. This gathering of many of our fellow French computer scientists and industrial actors alike has been our go-to conference to catch up with our peers and present our work. The 2022 edition was no exception!

We submitted and presented the following papers:

Mikino, formal verification made accessible (link to dedicated blogpost);
Connecting Software Heritage with the OCaml ecosystem;
Alt-Ergo-Fuzz, hunting the bugs of the bug hunter;

You can find a more detailed recounting of our JFLA2022 submissions in this blog post, as well as the links to the actual submitted papers (written in French).

Conclusion

As always, we warmly thank all our clients, partners, and friends, for their support and collaboration throughout the year,

And to you, dear reader, thank you for tagging along,

Since 2011 with love,

The OCamlPro Team

Autofonce, GNU Autotests Revisited

2023-06-27T09:05:17Z

Since 2022, OCamlPro has been contributing to GnuCOBOL, the only fully open-source compiler for the COBOL language. To speed up our contributions to the compiler, we developed a new tool, autofonce, to easily run and modify the testsuite of the compiler, originally written as a GNU Autoconf testsuite. This article describes this tool, which could be useful for other project testsuites.

Table of contents

Introduction
The GNU Autoconf Testsuite of GnuCOBOL
Main Features of Autofonce
Conclusion

Autofonce is a modern runner for GNU Autoconf Testsuite

Introduction

Since 2022, OCamlPro has been involved in a big modernization project for the French state: the goal is to move a large COBOL application, running on a former Bull mainframe (GCOS) to a cluster of Linux computers. The choice was made to use the most open-source compiler, GnuCOBOL, that had already been used in such projects.

One of the main problems in such migration projects is that most COBOL proprietary compilers provide extensions to the COBOL language standard, that are not supported by other compilers. Fortunately, GnuCOBOL has good support for several mainstream COBOL dialects, such as IBM or Micro-Focus ones. Unfortunately, GnuCOBOL had no support at the time for the GCOS COBOL dialect developed by Bull for its mainframes.

As a consequence, OCamlPro got involved in the project to extend GnuCOBOL with the support for the GCOS dialect needed for the application. This work implied a lot of (sometimes very deep) modifications of the compiler and its runtime library, both of them written in the C language. And of course, our modifications had first to pass the large existing testsuite of COBOL examples, and then extend it with new tests, so that the new dialect would continue to work in the future.

This work led us to develop autofonce, a modern open-source runner for GNU Autoconf Testsuites, the framework used in GnuCOBOL to manage its testsuite. Our tool is available on GitHub, with Linux and Windows binaries on the release page.

The GNU Autoconf Testsuite of GnuCOBOL

GNU Autoconf is a set of powerful tools, developed to help developers of open-source projects to manage their projects, from configuration steps to testing and installation. As a very old technology, GNU Autoconf relies heavily on M4 macros both as its own development language, and as its extension language, typically for tests.

In GnuCOBOL, the testsuite is in a sub-directory tests/, containing a file testsuite.at, itself including other files from a sub-directory testsuite.src/.

As an example, a typical test from syn_copy.at looks like:

AT_SETUP([INITIALIZE constant])
AT_KEYWORDS([misc])
AT_DATA([prog.cob], [
       IDENTIFICATION   DIVISION.
       PROGRAM-ID.      prog.
       DATA             DIVISION.
       WORKING-STORAGE  SECTION.
       01  CON          CONSTANT 10.
       01  V            PIC 9.
       78  C78          VALUE 'A'.
       PROCEDURE DIVISION.
           INITIALIZE CON.
           INITIALIZE V.
           INITIALIZE V, 9.
           INITIALIZE C78, V.
])
AT_CHECK([$COMPILE_ONLY prog.cob], [1], [],
[prog.cob:10: error: invalid INITIALIZE statement
prog.cob:12: error: invalid INITIALIZE statement
prog.cob:13: error: invalid INITIALIZE statement
])
AT_CLEANUP

Actually, we were quite pleased by the syntax of tests: it is easy to generate test files (using the AT_DATA macro) and to test the execution of commands (using the AT_CHECK macro), checking the exit code, the standard output and the error output separately. It is even possible to combine checks to run additional checks in case of error or success. In general, the testsuite is easy to read and to extend.
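To make the semantics of such a check concrete, here is a minimal sketch in Rust of what checking one command involves. It is only an illustration: the `Check` struct and the `run_check` function are hypothetical names of ours, not part of GNU Autoconf or autofonce, and we assume a POSIX `sh` is available.

```rust
use std::process::Command;

/// Expected outcome of one shell command, mirroring the four arguments of
/// `AT_CHECK`: command, exit code, standard output, error output.
/// (Hypothetical type, for illustration only.)
pub struct Check<'a> {
    pub command: &'a str,
    pub exit_code: i32,
    pub stdout: &'a str,
    pub stderr: &'a str,
}

/// Runs the command through `sh -c` and returns one message per mismatch.
/// An empty vector means the check passed.
pub fn run_check(check: &Check) -> Vec<String> {
    let output = Command::new("sh")
        .arg("-c")
        .arg(check.command)
        .output()
        .expect("failed to spawn sh");
    let mut mismatches = Vec::new();
    // `code()` is `None` when the process was terminated by a signal.
    if output.status.code() != Some(check.exit_code) {
        mismatches.push(format!(
            "exit code: expected {}, got {:?}",
            check.exit_code,
            output.status.code()
        ));
    }
    // The three outcomes are compared independently of each other.
    if String::from_utf8_lossy(&output.stdout) != check.stdout {
        mismatches.push("standard output differs".to_string());
    }
    if String::from_utf8_lossy(&output.stderr) != check.stderr {
        mismatches.push("error output differs".to_string());
    }
    mismatches
}
```

The point of the design is that a single check can fail for up to three separate reasons, each reported on its own, which is what makes failed tests easy to diagnose.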

However, there were still some issues:

At every update of the code or the tests, the testsuite runner has to be recompiled;
Running the testsuite requires being in the correct sub-directory, typically within the _build/ sub-directory;
By default, tests are run sequentially, even when many cores are available;

The output is pretty verbose, showing all tests that have been executed. Failed tests are often lost in the middle of other successful tests, and you have to wait for the end of the run to start investigating them;

## -------------------------------------------- ##
## GnuCOBOL 3.2-dev test suite: GnuCOBOL Tests. ##
## -------------------------------------------- ##
  
General tests of used binaries
  
  1: compiler help and information                   ok
  2: compiler warnings                               ok
  3: compiler outputs (general)                      ok
  4: compiler outputs (file specified)               ok
  5: compiler outputs (path specified)               ok
  6: compiler outputs (assembler)                    ok
  7: source file not found                           ok
  8: temporary path invalid                          ok
  9: use of full path for cobc                       ok
 10: C Compiler optimizations                        ok
 11: invalid cobc option                             ok
 12: cobcrun help and information                    ok
 13: cobcrun validation                              ok
 14: cobcrun -M DSO entry argument                   ok
 15: cobcrun -M directory/ default                   ok
 [...]

There is no automatic way to update tests when their output has changed: every test has to be updated manually;
In case of error, it is not always easy to rerun a specific test within its directory.

With autofonce, we tried to solve all of these issues...

Main Features of Autofonce

autofonce is written in a modern language, OCaml, so that it can handle a large testsuite much faster than GNU Autoconf. Since we do not expect users to have an OCaml environment set up, we provide binary versions of autofonce for both Linux (static executable) and Windows (cross-compiled executable) on GitHub.

autofonce does not use m4. Instead, it has limited support for a small set of predefined m4 macros, typically supporting m4 escape sequences (quadrigraphs), but neither the addition of new m4 macros nor the execution of shell commands outside of these macros (yes, testsuites in GNU Autoconf are actually sh shell scripts with m4 macros...). In the case of GnuCOBOL, we were lucky enough that the testsuite was well written and avoided such problems (we had to fix only a few of them, such as including shell commands into AT_CHECK macros). The syntax of tests is documented here.

Some interesting features of autofonce are:

autofonce executes the tests in parallel by default, using as many cores as available. Only failed tests are printed, so that the developer can immediately start investigating them;
autofonce can be run from any directory in the project. A .autofonce file has to be present at the root of the project, to describe where the tests are located and in which environment they should be executed;
autofonce makes it easy to re-execute a specific test that failed, by generating, within the test sub-directory, a script for every step of the test;
autofonce provides many options to filter which tests should be executed. Tests can be specified by number, range of numbers, keywords, or negative keywords. The complete list of options is easily printable using autofonce run --help for example;
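As an illustration of the first feature, here is a minimal Rust sketch of running tests in parallel and reporting only the failures. This is not autofonce's actual implementation (which is written in OCaml); `TestCase` and `run_parallel` are hypothetical names, and a real runner would execute shell commands rather than Rust functions.

```rust
use std::thread;

/// A hypothetical test case: an id plus a function returning `Err(msg)` on failure.
pub struct TestCase {
    pub id: usize,
    pub run: fn() -> Result<(), String>,
}

/// Runs the tests on up to `jobs` threads and returns only the failures,
/// sorted by test id. Successful tests produce no output at all.
pub fn run_parallel(tests: &[TestCase], jobs: usize) -> Vec<(usize, String)> {
    let jobs = jobs.max(1);
    // Split the tests into at most `jobs` contiguous chunks.
    let chunk_size = ((tests.len() + jobs - 1) / jobs).max(1);
    let mut failures: Vec<(usize, String)> = thread::scope(|scope| {
        // Spawn every worker first, then join them all.
        let handles: Vec<_> = tests
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .filter_map(|t| (t.run)().err().map(|msg| (t.id, msg)))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("test thread panicked"))
            .collect()
    });
    failures.sort();
    failures
}
```

To use as many cores as available, `jobs` could be taken from `std::thread::available_parallelism()`.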

Additionally, autofonce implements a powerful promotion mechanism to update tests, with the autofonce promote sub-command. For example, if you update a warning message in the compiler, you would like all tests where this message appears to be modified. With autofonce, it is as easy as:

# Run all tests at least once
autofonce run
# Print the patch that would be applied in case of promotion
autofonce promote
# Apply the patch above
autofonce promote --apply
# Combine running and promotion 10 times:
autofonce run --auto-promote 10

The last command iterates promotion 10 times: indeed, since a test may have multiple checks, and only the first failed check of the test will be updated during one iteration (because the test aborts at the first failed check), as many iterations as the maximal number of failed checks within a test may be needed.
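The reasoning about the number of iterations can be checked with a small simulation. The Rust sketch below is a toy model of ours, not autofonce code: each test is a list of checks, a stale check is marked `true`, and one round updates only the first stale check of each test.

```rust
/// One "run, then promote" round on a toy model. Each test is a list of
/// checks; `true` means the expected output of that check is stale, so the
/// check would fail. A test aborts at its first failing check, hence only
/// that check gets updated. Returns the number of checks updated.
pub fn promote_once(tests: &mut [Vec<bool>]) -> usize {
    let mut updated = 0;
    for checks in tests.iter_mut() {
        if let Some(first_stale) = checks.iter_mut().find(|stale| **stale) {
            *first_stale = false;
            updated += 1;
        }
    }
    updated
}

/// Number of rounds needed until no check fails any more.
pub fn rounds_until_clean(mut tests: Vec<Vec<bool>>) -> usize {
    let mut rounds = 0;
    while promote_once(&mut tests) > 0 {
        rounds += 1;
    }
    rounds
}
```

In this model, a testsuite whose worst test has three stale checks needs exactly three rounds, whatever the number of tests: the bound is the maximal number of failed checks within a single test, not the total number of failed checks.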

Also, as for GNU Autoconf, autofonce generates a final log file containing the results with a full log of errors and files needed to reproduce the error. This file can be uploaded into the artefacts of a CI system to easily debug errors after a CI failure.

Conclusion

During our work on GnuCOBOL, autofonce greatly improved our experience of running the testsuite, especially thanks to the auto-promotion feature used to update tests after modifications.

We hope autofonce could be used for other open-source projects that already use the GNU Autoconf testsuite. Of course, it requires that the testsuite does not make heavy use of shell features and mainly relies on standard m4 macros.

We found the format of GNU Autoconf tests to be quite powerful for easily checking exit codes, standard outputs and error outputs of shell commands. autofonce could help bring this format to projects that do not want to rely on an old tool like GNU Autoconf and are looking for a much more modern test framework.

Sub-single-instruction Peano to machine integer conversion

2023-01-23T09:05:17Z

It is a rainy end of January in Paris, morale is getting soggier by the day, and the bulk of our light exposure needs are now fulfilled by our computer screens as the sun seems to have definitively disappeared behind a continuous stream of low-hanging clouds. But, all is not lost, the warm rays of comradeship pierce through the bleak afternoons, and our joyful party of adventurers once again embarked on an adventure of curiosity and rabbit-hole depth-first-searching.

Last week's quest led us to a treasure coveted by a mere handful of enlightened connoisseurs, but a treasure of tremendous nerdy-beauty, known to the academics as "Sub-single-instruction Peano to machine integer conversion" and to the locals as "How to count how many nested Some there are very very fast by leveraging druidic knowledge about unspecified, undocumented, and unstable behavior of the Rust compiler".

Our quest begins

Our whole quest started when we wanted to learn more about discriminant elision. Discriminant elision in Rust is part of what makes it practical to use Option<&T> in place of *const T. More precisely it is what allows Option<&T> to fit in as much memory as *const T, and not twice as much. To understand why, let's consider an Option<u64>. A u64 is 8 bytes in size. An Option<u64> should have at least one more bit, to indicate whether it is a None, or a Some. But bits are not very practical to deal with for computers, and hence this discriminant value -- indicating which of the two variants (Some or None) the value is -- should take up at least one byte. Because of alignment requirements (and because the size is always a multiple of the alignment) it actually ends up taking 8 bytes as well, so that the whole Option<u64> occupies twice the size of the wrapped u64.
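These sizes can be observed directly with `std::mem::size_of` and `std::mem::align_of`. The helper below is a small sketch of ours (the name `u64_layouts` is hypothetical); keep in mind that the layout of `Option<u64>` is not formally specified, so the exact numbers depend on the target.

```rust
use std::mem::{align_of, size_of};

/// Returns the (size, alignment) in bytes of `u64`, then of `Option<u64>`.
pub fn u64_layouts() -> ((usize, usize), (usize, usize)) {
    (
        (size_of::<u64>(), align_of::<u64>()),
        (size_of::<Option<u64>>(), align_of::<Option<u64>>()),
    )
}
```

On a typical x86-64 target we would expect `((8, 8), (16, 8))`: the discriminant only needs one byte, but the size is rounded up to the next multiple of the 8-byte alignment.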

In languages like C, it is very common to pass around pointers, and give them a specific meaning if they are null. Typically, a function like lfind, which searches for an element in an array, will return a pointer to the matching element, and this pointer will be null if no such element was found. In Rust, however, fallibility is expected to be encoded in the type system. Hence, functions like find return a reference wrapped in an Option. Because this kind of API is so ubiquitous, it would have been a real hindrance to Rust adoption if it took twice as much space as the C version.

This is why discriminant elision exists. In our Option<&T> example Rust can leverage the same logic as C: &T references in Rust are guaranteed to be -- among other things -- non-null. Hence Rust can choose to encode the None variant as the null value of the variable. Transparently to the user, our Option<&T> now fits in 8 bytes, the same size as a simple &T. But Rust's discriminant elision mechanism goes beyond Option<&T> and works for any general type if:

The option-like value has one fieldless variant and one single-field variant
The wrapped type has so-called niche values, that is values that are statically known to never be valid for said type.
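A quick way to see which types benefit from this is to compare sizes. The generic helper below is a sketch of ours (`option_is_free` is a hypothetical name); it simply reports whether wrapping a type in Option costs any extra space.

```rust
use std::mem::size_of;

/// `true` when wrapping `T` in `Option` costs no extra space, i.e. when
/// the compiler found a niche value in `T` to encode `None`.
pub fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}
```

For instance, `option_is_free::<&u64>()`, `option_is_free::<Box<u64>>()` and `option_is_free::<std::num::NonZeroU8>()` all hold, since null (or zero) is never a valid value for these types, whereas `option_is_free::<u64>()` and `option_is_free::<u8>()` do not: every bit pattern of an integer is a valid value, so there is no niche left for None.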

Discriminant elision remains under-specified, but more information can be found in the FFI guidelines. Note that other unspecified situations seem to benefit from niche optimization (e.g. PR#94075).

Too many options

Out of curiosity, we wanted to investigate how the Rust compiler represents a series of nested Option. It turns out that up to 255 nested options can be stored into a byte, which is also the theoretical limit. Because this mechanism is not limited to Option, we can use it with (value-level) Peano integers. Peano integers are a theoretical encoding of integers in "unary base", but it is enough for this post to consider them a fun little gimmick. If you want to go further, know that Peano integers are more often used at the type-level, to try to emulate type-level arithmetic.

In our case, we are mostly interested in Peano-integers at the value level. We define them as follows:

#![recursion_limit = "512"]
#![allow(dead_code)]

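/// An empty enum, a type without inhabitants.
/// Cf: https://en.wikipedia.org/wiki/Bottom_type
/// `Debug` is derived so that the derived `Debug` of PeanoEncoder<Null> is usable.
#[derive(Debug)]
enum Null {}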

/// PeanoEncoder<Null> is a Peano-type able to represent integers up to 0.
/// If T is a Peano-type able to represent integers up to n
/// PeanoEncoder<T> is a Peano-type able to represent integers up to n+1
#[derive(Debug)]
enum PeanoEncoder<T> {
    Successor(T),
    Zero,
}

macro_rules! times2 {
    ($peano_2x:ident, $peano_x:ident ) => {
        type $peano_2x<T> = $peano_x<$peano_x<T>>;
    };
}
times2!(PeanoEncoder2, PeanoEncoder);
times2!(PeanoEncoder4, PeanoEncoder2);
times2!(PeanoEncoder8, PeanoEncoder4);
times2!(PeanoEncoder16, PeanoEncoder8);
times2!(PeanoEncoder32, PeanoEncoder16);
times2!(PeanoEncoder64, PeanoEncoder32);
times2!(PeanoEncoder128, PeanoEncoder64);
times2!(PeanoEncoder256, PeanoEncoder128);

type Peano0 = PeanoEncoder<Null>;
type Peano255 = PeanoEncoder256<Null>;

Note that we cannot simply go for

enum Peano {
    Successor(Peano),
    Zero,
}

like in Haskell or OCaml because without indirection the type has infinite size, and adding indirection would break discriminant elision. What we really have is a type-level Peano-encoding of integers, used to create a type Peano255 that contains value-level Peano-encodings of integers up to 255, as a byte would.

We can define the typical recursive pattern matching based way of converting our Peano integer to a machine integer (a byte).

trait IntoU8 {
    fn into_u8(self) -> u8;
}

impl IntoU8 for Null {
    fn into_u8(self) -> u8 {
        match self {}
    }
}

impl<T: IntoU8> IntoU8 for PeanoEncoder<T> {
    fn into_u8(self) -> u8 {
        match self {
            PeanoEncoder::Successor(x) => 1 + x.into_u8(),
            PeanoEncoder::Zero => 0,
        }
    }
}

Here, according to godbolt, Peano255::into_u8 gets compiled to more than 900 lines of assembly, which resembles a binary decision tree with jump-tables at the leaves.

However, we can inspect a bit how rustc represents a few values:

println!("Size of Peano255: {} byte", std::mem::size_of::<Peano255>());
for x in [
    Peano255::Zero,
    Peano255::Successor(PeanoEncoder::Zero),
    Peano255::Successor(PeanoEncoder::Successor(PeanoEncoder::Zero)),
] {
    println!("Machine representation of {:?}: {}", x, unsafe {
        std::mem::transmute::<_, u8>(x)
    })
}

which gives

Size of Peano255: 1 byte
Machine representation of Zero: 255
Machine representation of Successor(Zero): 254
Machine representation of Successor(Successor(Zero)): 253

A pattern seems to emerge. Rustc chooses to represent Peano255::Zero as 255, and each successor as one less.

As a brief detour, let's see what happens for PeanoN with other values of N.

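// Assumed definitions of Peano1 and Peano2 (not shown in the original
// snippet), chosen to be consistent with the output below.
type Peano1 = PeanoEncoder<Peano0>;
type Peano2 = PeanoEncoder<Peano1>;

let x = Peano1::Zero;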
println!("Machine representation of Peano1::{:?}: {}", x, unsafe {
    std::mem::transmute::<_, u8>(x)
});
for x in [
    Peano2::Successor(PeanoEncoder::Zero),
    Peano2::Zero,
] {
    println!("Machine representation of Peano2::{:?}: {}", x, unsafe {
        std::mem::transmute::<_, u8>(x)
    })
}

gives

Machine representation of Peano1::Zero: 1
Machine representation of Peano2::Successor(Zero): 1
Machine representation of Peano2::Zero: 2

Notice that the representation of Zero is not the same for each PeanoN. What we actually have -- and what is key here -- is that the representation of x of type PeanoN is the same as the representation of Successor(x) of type PeanoEncoder<PeanoN>, which implies that the machine representation of an integer k in the type PeanoN is N-k.
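This pattern can be written down as a tiny function and compared with the values printed above. The sketch below only models the observed behavior; `peano_repr` is a hypothetical helper of ours, and nothing in Rust guarantees that the compiler keeps choosing this representation.

```rust
/// Byte observed for the Peano integer `k` stored in the type `PeanoN`,
/// following the "N minus k" pattern. Returns `None` when `k` does not
/// fit in `PeanoN`, that is when `k > n`.
pub fn peano_repr(n: u8, k: u8) -> Option<u8> {
    n.checked_sub(k)
}
```

With this model, `peano_repr(255, 0)`, `peano_repr(255, 1)` and `peano_repr(255, 2)` give 255, 254 and 253, matching the Peano255 output, while `peano_repr(1, 0)`, `peano_repr(2, 1)` and `peano_repr(2, 0)` give 1, 1 and 2, matching the detour.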

That detour being concluded, we refocus on Peano255, for which we can write a very efficient conversion function:

impl Peano255 {
    pub fn transmute_u8(x: u8) -> Self {
        unsafe { std::mem::transmute(u8::MAX - x) }
    }
}

Note that this function's mere existence is very wrong and a sinful abomination in the eyes of anything that is holy and maintainable. But provided you run the same compiler version as me on the very same architecture, you may be ok using it. Please don't use it.

In any case transmute_u8 gets compiled to

movl    %edi, %eax
notb    %al
retq

that is, a simple function that applies a bitwise not to its argument register. And in most use cases, this function would actually be inlined and combined with the surrounding operations, making it run in less than one processor operation!
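If it seems surprising that a subtraction turns into a not instruction, remember that on a byte, subtracting x from 255 flips every bit of x. The two hypothetical helpers below spell out both sides of that identity so that it can be checked exhaustively.

```rust
/// The subtraction written in `transmute_u8`.
pub fn by_subtraction(x: u8) -> u8 {
    u8::MAX - x
}

/// The bitwise complement, which is what the `notb` instruction computes.
pub fn by_complement(x: u8) -> u8 {
    !x
}
```

Both functions agree on all 256 inputs, which is why the compiler is free to pick the single-instruction form.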

And because 255 is so small, we can exhaustively check that the behavior is correct for all values! Take that, formal methods!

for i in 0_u8..=u8::MAX {
    let x = Peano255::transmute_u8(i);
    if i % 8 == 0 {
        print!("{:3} ", i)
    } else if i % 8 == 4 {
        print!(" ")
    }
    let c = if x.into_u8() == i { '✓' } else { '✗' };
    print!("{}", c);
    if i % 8 == 7 {
        println!()
    }
}

  0 ✓✓✓✓ ✓✓✓✓
  8 ✓✓✓✓ ✓✓✓✓
 16 ✓✓✓✓ ✓✓✓✓
 24 ✓✓✓✓ ✓✓✓✓
 32 ✓✓✓✓ ✓✓✓✓
 40 ✓✓✓✓ ✓✓✓✓
 48 ✓✓✓✓ ✓✓✓✓
 56 ✓✓✓✓ ✓✓✓✓
 64 ✓✓✓✓ ✓✓✓✓
 72 ✓✓✓✓ ✓✓✓✓
 80 ✓✓✓✓ ✓✓✓✓
 88 ✓✓✓✓ ✓✓✓✓
 96 ✓✓✓✓ ✓✓✓✓
104 ✓✓✓✓ ✓✓✓✓
112 ✓✓✓✓ ✓✓✓✓
120 ✓✓✓✓ ✓✓✓✓
128 ✓✓✓✓ ✓✓✓✓
136 ✓✓✓✓ ✓✓✓✓
144 ✓✓✓✓ ✓✓✓✓
152 ✓✓✓✓ ✓✓✓✓
160 ✓✓✓✓ ✓✓✓✓
168 ✓✓✓✓ ✓✓✓✓
176 ✓✓✓✓ ✓✓✓✓
184 ✓✓✓✓ ✓✓✓✓
192 ✓✓✓✓ ✓✓✓✓
200 ✓✓✓✓ ✓✓✓✓
208 ✓✓✓✓ ✓✓✓✓
216 ✓✓✓✓ ✓✓✓✓
224 ✓✓✓✓ ✓✓✓✓
232 ✓✓✓✓ ✓✓✓✓
240 ✓✓✓✓ ✓✓✓✓
248 ✓✓✓✓ ✓✓✓✓

Isn't computer science fun?

Note: The code for this blog post is available here.

Statically guaranteeing security properties on Java bytecode: Paper presentation at VMCAI 23

2023-01-12T09:05:17Z

We are excited to announce that Nicolas will present a paper at the International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI) on the 16th and 17th of January.

This year, VMCAI is co-located with the Symposium on Principles of Programming Languages (POPL) conference, which, as its name suggests, is a flagship conference in the Programming Languages domain.

What's more, for its 50th anniversary edition, POPL will return to where its first edition took place: Boston! It is thus in the vicinity of MIT and Harvard that we will meet with prominent figures of computer science research.

This paper will be presented at VMCAI'2023, co-located with POPL'2023 in Boston!

A sound technique to statically guarantee security properties on Java bytecode

Nicolas will be presenting a novel static program analysis technique dedicated to the discovery of information flows in Java bytecode. By automatically discovering such flows, the new technique allows developers and users of Java libraries to assess key security properties on the software they run.

Two prominent examples of such properties are confidentiality (stating that no single bit of secret information may be inadvertently revealed by the software), and its dual, integrity (stating that no single bit of trusted information may be tampered with via untrusted data).
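To give an intuition of what an information flow is, here is a toy illustration of ours, written in Rust rather than Java bytecode. It is not taken from the paper and says nothing about how the analysis works; it only shows three functions, with hypothetical names, that such an analysis would have to tell apart when checking confidentiality.

```rust
/// Explicit flow: the secret value is copied into the public result.
pub fn explicit_leak(secret: bool) -> bool {
    secret
}

/// Implicit flow: the secret is never copied, but the branch taken
/// depends on it, so the public result still reveals it.
pub fn implicit_leak(secret: bool) -> bool {
    let mut public = false;
    if secret {
        public = true;
    }
    public
}

/// No flow: the result is the same whatever the secret is.
pub fn no_leak(_secret: bool) -> bool {
    false
}
```

An observer of the result learns the secret bit in the first two cases and nothing in the third.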

The technique is proven sound (i.e. it cannot miss a flow of information), and achieves state-of-the-art precision (i.e. it does not raise too many false alarms) according to evaluations using the IFSpec benchmark suite.

Try it out!

In addition to being supported by a proof, the technique has also been implemented in a tool called Guardies.

We believe this static analysis tool will naturally complement the taint tracking and dynamic analysis techniques that are usually employed to assess software security.

Reading more about it

You may already access the full paper here.

Nicolas developed this contribution while working at the University of Liverpool, in collaboration with Narges Khakpour, herself from the University of Newcastle.

Release of ocplib-simplex, version 0.5

2023-01-05T09:05:17Z

Last November, we released version 0.5 of ocplib-simplex, a generic library implementing the Simplex Algorithm in OCaml. It is a key component of the Alt-Ergo automatic theorem prover that we keep developing at OCamlPro.

Table of contents

The Simplex Algorithm
What Changed in 0.5?

Try ocplib-simplex before implementing your own library!

The simplex algorithm

The Simplex Algorithm is well known among linear optimization enthusiasts. Let's say you own a factory producing two kinds of chairs: the first is cheap, you make a small profit out of them but they are quick to produce; the second one is a bit more fancy, you make a bigger profit but they need a lot of time to build. You have a limited amount of wood and time. How many cheap and fancy chairs should you produce to optimize your profits?

You can represent this problem with a set of mathematical constraints (more precisely, linear inequalities) which is precisely the scope of the simplex algorithm. Given a set of linear inequalities, it computes a solution maximizing a given value (in our example, the total profit). If you are interested in the detail of the algorithm, you should definitely watch this video.
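To make the chair example concrete, here is a small Rust sketch with made-up numbers: a cheap chair needs 1 unit of wood and 1 hour, a fancy chair needs 2 units of wood and 4 hours, and we have 40 units of wood and 60 hours. Note that this is not the simplex algorithm, and not the ocplib-simplex API either: with only two variables, we can simply enumerate the corners of the feasible region, which is where an optimum of a bounded linear problem is found. All names are hypothetical.

```rust
/// A linear inequality `a * x + b * y <= c`.
#[derive(Clone, Copy)]
pub struct Constraint {
    pub a: f64,
    pub b: f64,
    pub c: f64,
}

const EPS: f64 = 1e-9;

/// Maximizes `px * x + py * y` by brute force: intersects every pair of
/// constraint boundaries, keeps the feasible points and returns the best
/// one as `(x, y, profit)`. Assumes the feasible region is bounded.
pub fn best_vertex(constraints: &[Constraint], px: f64, py: f64) -> Option<(f64, f64, f64)> {
    let mut best: Option<(f64, f64, f64)> = None;
    for (i, c1) in constraints.iter().enumerate() {
        for c2 in &constraints[i + 1..] {
            let det = c1.a * c2.b - c2.a * c1.b;
            if det.abs() < EPS {
                continue; // parallel boundaries: no single intersection point
            }
            // Cramer's rule on the two boundary equations.
            let x = (c1.c * c2.b - c2.c * c1.b) / det;
            let y = (c1.a * c2.c - c2.a * c1.c) / det;
            let feasible = constraints.iter().all(|k| k.a * x + k.b * y <= k.c + EPS);
            let profit = px * x + py * y;
            if feasible && best.map_or(true, |(_, _, p)| profit > p) {
                best = Some((x, y, profit));
            }
        }
    }
    best
}

/// Made-up chair factory: x cheap chairs, y fancy chairs.
pub fn chair_factory() -> Vec<Constraint> {
    vec![
        Constraint { a: 1.0, b: 2.0, c: 40.0 }, // wood:  x + 2y <= 40
        Constraint { a: 1.0, b: 4.0, c: 60.0 }, // hours: x + 4y <= 60
        Constraint { a: -1.0, b: 0.0, c: 0.0 }, // x >= 0
        Constraint { a: 0.0, b: -1.0, c: 0.0 }, // y >= 0
    ]
}
```

With a profit of 20 per cheap chair and 50 per fancy chair, `best_vertex(&chair_factory(), 20.0, 50.0)` selects 20 cheap chairs and 10 fancy ones, for a profit of 900. The simplex algorithm reaches the same kind of corner without enumerating all of them, which matters as soon as there are more than a handful of variables.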

The simplex algorithm is known for its poor worst-case complexity: the base algorithm takes exponential time in the worst case. It is nevertheless very efficient in practice.

What Changed in 0.5?

Among the main changes in this new version of ocplib-simplex:

The library's API is now more generic and easier to use (see the System Solving Example or the Linear Optimization Example);
All the modules are better documented in their .mli interfaces (see coreSig.mli for example);
The build system has been switched to dune.

We hope that this work of simplification will help you to integrate this library more easily in your projects!

If you want to follow this project, report an issue or contribute, you can find it on GitHub.

Please do not hesitate to contact us at OCamlPro: alt-ergo@ocamlpro.com.

The Growth of the OCaml Distribution

2023-01-02T09:05:17Z

We recently worked on a project to build a binary installer for OCaml, inspired by Rustup for Rust. We had to build binary packages of the distribution for every OCaml version since 4.02.0, and we were surprised to discover that their (compressed) size grew from 18 MB to about 200 MB. This post gives a survey of our findings.

Table of contents

Introduction
General Trends
Causes and Consequences
Inside the OCaml Installation
Conclusion

Introduction

One of the strengths of Rust is the ease with which it gets installed on a new computer in user space: with a simple command copy-pasted from a website into a terminal, you get all you need to start building Rust projects in a few seconds. Rustup, together with a set of prebuilt packages for many architectures, is the project that makes all this possible.

OCaml, on the other hand, is a bit harder to install: you need to find in the documentation the proper way for your operating system to install opam, find how to create a switch with a compiler version, and then wait for the compiler to be built and installed. This usually takes much more time.

As a winter holiday project, we worked on a project similar to Rustup, providing binary packages for most OCaml distribution versions. It builds upon our experience of opam and opam-bin, our plugin to build and share binary packages for opam.

While building binary packages for most versions of the OCaml distribution, we were surprised to discover that the size of the binary archive grew from 18 MB to about 200 MB in 10 years. Though it is not a problem on many high-bandwidth connections, it might become one when you go far from big towns (and fortunately, we designed our tool to be able to install from sources in such a case, trading installation speed for download speed).

We decided it was worth trying to investigate this growth in more detail, and this post is about our early findings.

General Trends

In 10 years, the OCaml Distribution binary archive grew by a factor of more than 10, from 18 MB to 198 MB, corresponding to a growth from 73 MB to 522 MB after installation, and from 748 to 2433 installed files.

So, let's have a look at the evolution of the size of the binary OCaml distribution in more detail. Between version 4.02.0 (Aug 2014) and version 5.0.0 (Dec 2022):

The size of the compressed binary archive grew from 18 MB to 198 MB
The size of the installed binary distribution grew from 73 MB to 522 MB
The number of installed files grew from 748 to 2433

The OCaml Distribution source archive was much more stable, with a global growth factor smaller than 2.

On the other hand, the source distribution itself was much more stable:

The size of the compressed source archive grew only from 3 MB to 5 MB
The size of the sources grew from 14 MB to 26 MB
The number of source files grew from 2355 to 4084
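The growth factors behind these two lists are easy to recompute. The short Rust sketch below only uses the figures quoted above; the `growth` function and the `FIGURES` table are names of ours.

```rust
/// Growth factor between two measurements (sizes in MB or file counts).
pub fn growth(before: f64, after: f64) -> f64 {
    after / before
}

/// (label, value for 4.02.0, value for 5.0.0), as quoted in this post.
pub const FIGURES: [(&str, f64, f64); 6] = [
    ("binary archive (MB)", 18.0, 198.0),
    ("installed binaries (MB)", 73.0, 522.0),
    ("installed files", 748.0, 2433.0),
    ("source archive (MB)", 3.0, 5.0),
    ("sources (MB)", 14.0, 26.0),
    ("source files", 2355.0, 4084.0),
];
```

The three binary measurements grew by factors of 11, about 7.2 and about 3.3 respectively, whereas the three source measurements all stay below 2 (about 1.7, 1.9 and 1.7).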

For our project, this evolution makes the source distribution a good alternative to binary distributions for low-bandwidth settings, especially as OCaml is much faster than Rust at building itself. For the record, version 5.0.0 takes about 1 minute to build on a 16-core 64GB-RAM computer.

Interestingly, if we plot the total size of the binary distribution, and the total size with only files that were present in the previous version, we can notice that the growth is mostly caused by the increase in size of these existing files, and not by the addition of new files:

The growth is mostly caused by the increase in size of existing files, and not by the addition of new files.

Causes and Consequences

We tried to identify the main causes of this growth: the growth is linear most of the time, with sharp increases (and decreases) at some versions. We plotted the difference in size, for the total size, the new files, the deleted files and the same files, i.e. the files that made it from one version to the next one:

The difference of size between two versions is not big most of the time, but some versions exhibit huge increases or decreases.

Let's have a look at the versions with the highest increases in size:

+86 MB for 4.08.0: though there are a lot of new files (+307), they only account for 3 MB of additional storage. Most of the difference comes from an increase in size of both compiler libraries (probably in relation with the use of Menhir for parsing) and of some binaries. In particular:
- +13 MB for bin/ocamlobjinfo.byte (2_386_046 -> 16_907_776)
- +12 MB for bin/ocamldep.byte (2_199_409 -> 15_541_022)
- +6 MB for bin/ocamldebug (1_092_173 -> 7_671_300)
- +6 MB for bin/ocamlprof.byte (630_989 -> 7_043_717)
- +6 MB for lib/ocaml/compiler-libs/parser.cmt (2_237_513 -> 9_209_256)
+74 MB for 4.03.0: again, though there are a lot of new files (+475, mostly in compiler-libs), they only account for 11 MB of additional storage, and a large part is compensated by the removal of ocamlbuild from the distribution, causing a gain of 7 MB.

Indeed, most of the increase in size is probably caused by the compilation with debug information (option -g), which considerably increases the size of all executables, for example:
- +12 MB for bin/ocamlopt (2_016_697 -> 15_046_969)
- +9 MB for bin/ocaml (1_833_357 -> 11_574_555)
- +8 MB for bin/ocamlc (1_748_717 -> 11_070_933)
- +8 MB for lib/ocaml/expunge (1_662_786 -> 10_672_805)
- +7 MB for lib/ocaml/compiler-libs/ocamlcommon.cma (1_713_947 -> 8_948_807)
+72 MB for 4.11.0: again, the increase almost only comes from existing files. For example:
- +16 MB for bin/ocamldebug (8_170_424 -> 26_451_049)
- +6 MB for bin/ocamlopt.byte (21_895_130 -> 28_354_131)
- +5 MB for lib/ocaml/extract_crc (659_967 -> 6_203_791)
- +5 MB for bin/ocaml (17_074_577 -> 22_388_774)
- +5 MB for bin/ocamlobjinfo.byte (17_224_939 -> 22_523_686)
Again, the increase is probably related to adding more debug information in the executable (there is a specific PR on ocamldebug for that, and for all executables more debug info is available for each allocation);
+48 MB for 5.0.0: a big difference in storage is not surprising for a change in a major version, but actually half of the difference just comes from an increase of 23 MB of bin/ocamldoc;
+34 MB for 4.02.3: this one is worth noting, as it comes at a minor version change. The increase is mostly caused by the addition of 402 new files, corresponding to cmt/cmti files for the stdlib and compiler-libs

We could of course study some other versions, but understanding the root causes of most of these changes would require going deeper than we can in a blog post. Yet, these figures give experts good hints about which versions to investigate first.

Inside the OCaml Installation

Before concluding, it might also be worth studying which parts of the OCaml Installation take most of the space. 5.0.0 is a good candidate for such a study, as libraries have been moved to separate directories, instead of all being directly stored in lib/ocaml.

Here is a decomposition of the OCaml Installation:

Total: 529 MB
- share: 1 MB
- man: 4 MB
- bin: 303 MB
- lib/ocaml: 223 MB
  - compiler-libs: 134 MB
  - expunge: 20 MB

As we can see, more than half of the space (303 MB out of 529 MB) is used by the executables in bin. For example, all of these are above 10 MB:

28 MB ocamldoc
26 MB ocamlopt.byte
25 MB ocamldebug
21 MB ocamlobjinfo.byte, ocaml
20 MB ocamldep.byte, ocamlc.byte
19 MB ocamldoc.opt
18 MB ocamlopt.opt
15 MB ocamlobjinfo.opt
14 MB ocamldep.opt, ocamlc.opt, ocamlcmt

There are both bytecode and native code executables in this list.

Conclusion

Our installer project would benefit from having a smaller binary OCaml distribution, but most OCaml users in general would also benefit from that: after a few years of using OCaml, OCaml developers usually end up with huge $HOME/.opam directories, because every opam switch often takes more than 1 GB of space, and the OCaml distribution takes a big part of that. opam-bin partially solves this problem by sharing equal files between several switches (when the --enable-share configuration option has been used).
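The idea of sharing equal files between switches can be illustrated with a small sketch that groups files by a hash of their contents. This is only an assumption about how such sharing could be detected, not a description of how opam-bin is implemented; the paths and contents below are invented, and `shareable_groups` is a hypothetical helper.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def shareable_groups(files: Dict[str, bytes]) -> List[List[str]]:
    """Group paths whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path, content in sorted(files.items()):
        by_digest[hashlib.sha256(content).hexdigest()].append(path)
    # Only groups with at least two paths can be shared.
    return [paths for paths in by_digest.values() if len(paths) > 1]


# Invented in-memory "files" standing in for two opam switches.
files = {
    "switch-a/bin/ocamlc.opt": b"same compiler bytes",
    "switch-b/bin/ocamlc.opt": b"same compiler bytes",
    "switch-b/bin/ocamlfind": b"only in one switch",
}

groups = shareable_groups(files)
print(groups)
```

Each group could then be stored once on disk, with the other paths pointing to that single copy.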

Here is a short list of ideas to test to decrease the size of the binary OCaml distribution:

Use the same executable for multiple programs (ocamlc.opt, ocamlopt.opt, ocamldep.opt, etc.), using the first command argument to choose the behavior to have. Rustup, for example, only installs one binary in $HOME/.cargo/bin for cargo, rustc, rustup, etc. and actually, our tool does the same trick to share the same binary for itself, opam, opam-bin, ocp-indent and drom.
Split installed files into separate opam packages, of which only one would be installed as the compiler distribution. For example, most cmt files of compiler-libs are not needed by most users, they might only be useful for compiler/tooling developers, and even then, only in very rare cases. They could be installed as another opam package.
Remove the -linkall flag on ocamlcommon.cm[x]a libraries. In general, such a flag should only be set when building an executable that is expected to use plugins, because otherwise, this executable will contain all the modules of the library, even the ones that are not useful for its specific purpose.
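The first idea, a single executable shared by several programs, can be sketched as follows. This is a toy illustration of dispatching on the name the program was invoked with; the tool names `toy-ocamlc` and `toy-ocamldep` and their handlers are hypothetical and do not correspond to any existing binary.

```python
import os
from typing import Callable, Dict, List


def run_compiler(args: List[str]) -> str:
    return "compile " + " ".join(args)


def run_deps(args: List[str]) -> str:
    return "deps " + " ".join(args)


# Hypothetical tool names mapped to the behavior they select.
TOOLS: Dict[str, Callable[[List[str]], str]] = {
    "toy-ocamlc": run_compiler,
    "toy-ocamldep": run_deps,
}


def dispatch(argv: List[str]) -> str:
    """Choose the behavior from the name the binary was invoked as."""
    tool = os.path.basename(argv[0])
    handler = TOOLS.get(tool)
    if handler is None:
        return f"unknown tool: {tool}"
    return handler(argv[1:])


print(dispatch(["/usr/bin/toy-ocamlc", "a.ml"]))
print(dispatch(["/usr/bin/toy-ocamldep", "a.ml", "b.ml"]))
```

With such a scheme, each tool name would typically be installed as a link to the same file, so the code shared between the tools is stored only once.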

WebAssembly/Wasm and OCaml

2022-12-14T09:05:17Z

The Dragon-Camel is raging at the sight of all the challenges we overcome!

In this first post about WebAssembly (Wasm) and OCaml, we introduce the work we have been doing for quite some time now, though without publicity, about our participation in the Garbage-Collection (GC) Working Group for Wasm, and two related development projects in OCaml.

WebAssembly, a fast and portable bytecode

WebAssembly is a low-level, binary format that allows compiled code to run efficiently in the browser. Its roadmap is decided by Working Groups from multiple organizations and companies, including Microsoft, Google, and Mozilla. These groups meet regularly to discuss and plan the development of WebAssembly, with the broader community of developers, academics, and other interested parties to gather feedback and ideas for the future of WebAssembly.

There are multiple projects in OCaml related to Wasm, notably Wasicaml, a production-ready port of the OCaml bytecode interpreter to Wasm. However, these projects don't tackle the domain we would like to address, and for good reasons: they target the existing version of Wasm, which is basically a very simple programming language with no data structures, but with an access to a large memory array. Almost anything can of course be compiled to something like that, but there is a big restriction: the resulting program can interact with the outside world only through the aforementioned memory buffer. This is perfectly fine if you write Command-Line Interface (CLI) tools, or workers to be deployed in a Content Delivery Network (CDN). However, this kind of interaction can become quite tedious if you need to deal with abstract objects provided by your environment, for example DOM objects in a browser to manipulate webpages. In such cases, you will need to write some wrapper access functions in JavaScript (or OCaml with js_of_ocaml of course), and you will have to be very careful about the lifetime of those objects to avoid memory leaks.
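To give an intuition of what "interacting only through a memory buffer" means, here is a small simulation in which a host passes a string to a guest function as a pointer and a length into a shared byte array. No real Wasm runtime is involved: `MEMORY`, `host_write_string` and `guest_count_vowels` are hypothetical names chosen for this sketch.

```python
from typing import Tuple

# A single 64 KiB page standing in for the module's linear memory.
MEMORY = bytearray(64 * 1024)


def host_write_string(memory: bytearray, offset: int, text: str) -> Tuple[int, int]:
    """Copy a string into linear memory and return (pointer, length)."""
    data = text.encode("utf-8")
    memory[offset:offset + len(data)] = data
    return offset, len(data)


def guest_count_vowels(memory: bytearray, ptr: int, length: int) -> int:
    """The 'guest' only sees integers: an address and a byte count."""
    view = memory[ptr:ptr + length]
    return sum(1 for byte in view if chr(byte) in "aeiou")


ptr, length = host_write_string(MEMORY, 16, "hello wasm")
print(guest_count_vowels(MEMORY, ptr, length))
```

Anything richer than bytes, such as a DOM object, cannot be copied into the buffer this way, which is why wrapper functions and careful lifetime management are needed without the GC proposal.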

Hence the shiny new proposals to extend Wasm with various useful features that can be very convenient for OCaml. In particular, three extensions crucially matter to us, functional programmers: the Garbage Collection, Exceptions and Tail-Call proposals.

The Wasm committee has already worked on these proposals for a few years, and the Exceptions and Tail-Call proposals are now quite satisfying. However, this is not yet the case for the GC proposal. Indeed, finding a good API for a GC that is compatible with all the languages in the wild, that can be implemented efficiently, and can be used to run a program you don't trust, is anything but an easy task. Multiple attempts by strong teams, for different virtual machines, have exposed limitations of past proposals. But we must now admit that the current proposal has reached a state where it is quite impressive, being both simple and generic.

The proposal is now getting close to a feature freeze status. Thanks to the hard work of many people on the committee, including us, the particularities of functional typed languages were not forgotten in the design, and we are convinced that there should be no problem for OCaml. Now is the time to test it for real!

Targeting Wasm from the OCaml Compiler

Adding a brand new backend to a compiler to target something that is quite different from your usual assembly can be a huge amount of work, and only a few language developers actively work on making a prototype for Wasm+GC. Yet, we think that it is important for the committee to have as many examples as possible to validate the proposal and move it to the next step.

That's the reason why we decided to contribute to the proposal, by prototyping a backend for Wasm to the OCaml compiler.

Our experimental Wasm interpreter in OCaml

In parallel, we are also working on the development of our own Wasm Virtual Machine in OCaml, to be able to easily experiment both on the OCaml side and Wasm side, while waiting for most official Wasm VMs to fully implement the new proposals.

These experimental projects and related discussions are very important design steps, although obviously far from production-ready status.

As our current work focuses on OCaml 4.14, effect handlers are left for future work. The current proposal that would make it possible to compile effect handlers to Wasm nicely is still in its earlier stages. We hope to be able to prototype it too on our Wasm VM.

Note that we are looking for sponsors to fund this work. If supporting Wasm in OCaml may impact your business, you can contact us to discuss how we can use your help!

Our next blog post in January will provide more technical details on our two prototyping efforts.

Alt-Ergo: the SMT solver with model generation

2022-11-16T09:05:17Z

The Alt-Ergo automatic theorem prover developed at OCamlPro has just been released with a major update: counterexample models can now be generated. This is now available on the next branch, and will officially be part of the 2.5.0 release, coming this year!

Alt-Ergo at a Glance

Alt-Ergo is an open source automatic theorem prover based on the SMT technology. It was born at the Laboratoire de Recherche en Informatique, Inria Saclay Ile-de-France and CNRS in 2006 and has been maintained and developed by OCamlPro since 2013.

It is capable of reasoning in a combination of several built-in theories such as:

uninterpreted equality;
integer and rational arithmetic;
arrays;
records;
algebraic data types;
bit vectors.

It also is able to deal with commutative and associative operators, quantified formulas and has a polymorphic first-order native input language. Alt-Ergo is written in OCaml. Its core has been formally proved in the Coq proof assistant.

Alt-Ergo has been involved in a qualification process (DO-178C) by Airbus Industrie. During this process, a qualification kit has been produced. It was composed of a technical document with tool requirements (TR) that gives a precise description of each part of the prover, a companion document (~ 450 pages) of tests, and an instrumented version of the tool with a TR trace mechanism.

Model Generation

When a property is false, generating a counterexample is a key feature that many state-of-the-art SMT solvers should include by default. However, this is a complex problem in the first place.

The first obstacle is the decidability of the theories manipulated by the SMT solvers. In general, the complexity class (i.e. the classification of algorithmic problems) is between "NP-Hard" (for the linear arithmetic theory on integers for example) and "Undecidable" (for the polynomial arithmetic on integers for example). Then come the quantified properties, i.e. properties prefixed with foralls and exists, adding an additional layer of complexity and undecidability. Another challenge was the core algorithm behind Alt-Ergo, which does not natively support model generation. Finally, an implementation of models has to take care of Alt-Ergo's support of polymorphism.
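As a toy illustration of what a counterexample model is, the sketch below enumerates a small finite domain and returns the first assignment that falsifies a property. This is not how Alt-Ergo works: an SMT solver reasons symbolically over infinite domains, whereas `find_counterexample` is a hypothetical brute-force helper that only makes sense on tiny finite ranges.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Optional


def find_counterexample(
    prop: Callable[..., bool], names: List[str], domain: Iterable[int]
) -> Optional[Dict[str, int]]:
    """Return an assignment falsifying prop, or None if there is none in the domain."""
    values = list(domain)
    for candidate in product(values, repeat=len(names)):
        model = dict(zip(names, candidate))
        if not prop(**model):
            return model
    return None


# "For all x and y, x + y > x" is false as soon as y <= 0.
model = find_counterexample(lambda x, y: x + y > x, ["x", "y"], range(-2, 3))
print(model)

# "For all x, x * x >= 0" has no counterexample in the domain.
print(find_counterexample(lambda x: x * x >= 0, ["x"], range(-2, 3)))
```

The returned dictionary plays the role of the model: a concrete value for each variable under which the property does not hold.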

How to use Model Generation in Alt-Ergo

There are two ways to activate model generation in Alt-Ergo.

Basic usage: simply add the option --model to your command ($ alt-ergo file --model)
Advanced usage: three options mainly impact the model generation.
- --interpretation: sets the model generation strategy. It can be none for no model generation; first for generating only the very first interpretation computed; every for generating a model after each decision; and last for generating a model only when alt-ergo concludes on the formula satisfiability.
- --sat-solver: only the 'Tableaux-CDCL' sat solver is compatible with the interpretation feature
- --instantiation-heuristic: when set to normal, alt-ergo generates models faster. This is an experimental feature that sometimes generates incorrect models.
  
  Example:
  
  $ alt-ergo file --interpretation every --sat-solver Tableaux-CDCL --instantiation-heuristic auto

Warning: only the linear arithmetic and the enum model generation have been tested. Other theories are either not implemented (ADTs) or experimental (risk of crash or unsound models). We are currently still heavily testing the feature, so feel free to join us on Alt-Ergo's GitHub repository if you have questions or issues with this new feature. Note that the models generated are best-effort models; Alt-Ergo does not answer Sat when it outputs a model. In a future version, we will add a mechanism that automatically checks the model generated.

Godspeed!

Acknowledgements

We want to thank David Mentré and Denis Cousineau at Mitsubishi Electric R&D Center Europe for funding the initial work on counterexample.

Note that MERCE has been a Member of the Alt-Ergo Users’ Club for 3 years. This partnership allowed Alt-Ergo to evolve and we hope that more users will join the Club on our journey to make Alt-Ergo a must-have tool. Please do not hesitate to contact the Alt-Ergo team at OCamlPro: alt-ergo@ocamlpro.com.

Let's Encrypt Wildcard Certificates Made Easy with Agnos

2022-10-05T09:05:17Z

It is with great pleasure that we announce the first beta release of Agnos. A former personal project of our new recruit, Arthur, Agnos development is now hosted at and sponsored by OCamlPro's Rust division, Red Iron.

TL;DR: If you are familiar with ACME providers like Let's Encrypt, DNS-01 and the challenges relating to wildcard certificates, simply know that Agnos touts itself as a single-binary, API-less, provider-agnostic dns-01 client, allowing you to easily obtain wildcard certificates without having to interface with your DNS provider. To do so, it offers a user-friendly configuration and answers Let's Encrypt DNS-01 challenges on its own, bypassing the need for API calls to edit DNS zones. You may want to jump to the last section of this post, or directly join us on Agnos's github.

Agnos was born from the observation that even though wildcard certificates are in many cases more convenient and useful than their fully qualified counterparts, they are not often used in practice. As of today it is not uncommon to see certificates with multiple Subject Alternative Names (SAN) for multiple subdomains, which can become problematic and weaken infrastructure. While some situations do require forgoing wildcard certificates, this choice is too often still a default one.

At OCamlPro, we believe that technical difficulties should not stand in the way of optimal decision making, and that compromises should only be made in the face of unsolvable challenges. By releasing this first beta of Agnos, we hope that your feedback will help us build a tool truly useful to the community and that together, we can open a path towards seamless wildcard certificate issuance, making previously encountered issues and pain points a thing of the past.

This blog post describes the different ACME challenges, why DNS provider APIs have so far been hindering DNS-01 adoption, and how Agnos solves this issue. If you are already curious and want to run some code, let's meet on Agnos's github.

Let's Encrypt's mechanism and ACME challenges

The Automatic Certificate Management Environment (ACME) is the protocol behind automated certificate authority services like Let's Encrypt. At its core, this protocol requires the client asking for a certificate to provide evidence that they control a resource by having said resource display some authority-determined token.

The easiest way to do so is to serve a file on a web-server. For example, serving a file containing the token at my-domain.example would prove that I control the web-server that the fully qualified domain name my-domain.example is pointing to. This, under normal circumstances, proves that I somewhat control this fully qualified domain. This process is illustrated below.

The ACME client initiates the certificate issuance process and is challenged to serve the token via HTTP at the domain address. The ACME client and HTTP server can be and often are on the same machine. The token can be quickly provisioned, and the ACME client can ask the ACME server to validate the challenge and issue the certificate.

However, demonstrating that one controls an HTTP server pointed to by my-domain.example is not deemed enough by Let's Encrypt to demonstrate full control of the my-domain.example domain and all its subdomains. Hence, the user cannot be issued a wildcard certificate through this method.

To obtain a wildcard certificate, one must rely on the DNS-01 type of challenge, illustrated below. The ACME client initiates the certificate issuance process and is challenged to serve the token via a DNS TXT record. Because DNS management is often delegated to a DNS provider, the DNS server is rarely on the same machine, and the token must be provisioned via a call to the DNS provider API, if there is any. Moreover, DNS providers virtually always use multiple servers, and the new record must be propagated to all of them. The ACME client must then wait and check for the propagation to be finished before asking the ACME server to validate the challenge and issue the certificate.
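For readers who want to see what is actually served in each challenge, here is a minimal sketch following RFC 8555: HTTP-01 serves the key authorization (the token joined to the account key thumbprint) under a well-known path, while DNS-01 publishes the base64url-encoded SHA-256 digest of that key authorization in a TXT record. The token and thumbprint values below are placeholders, and the helper names are hypothetical.

```python
import base64
import hashlib
from typing import Tuple


def b64url(data: bytes) -> str:
    """Base64url encoding without padding, as used throughout ACME."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def key_authorization(token: str, thumbprint: str) -> str:
    return f"{token}.{thumbprint}"


def http01_path(token: str) -> str:
    """Path at which the key authorization is served for HTTP-01."""
    return f"/.well-known/acme-challenge/{token}"


def dns01_record(domain: str, token: str, thumbprint: str) -> Tuple[str, str]:
    """Name and value of the TXT record expected for DNS-01."""
    digest = hashlib.sha256(key_authorization(token, thumbprint).encode("ascii")).digest()
    return f"_acme-challenge.{domain}", b64url(digest)


# Placeholder values: real ones come from the ACME server and the account key.
token = "placeholder-token"
thumbprint = "placeholder-thumbprint"

print(http01_path(token))
name, value = dns01_record("my-domain.example", token, thumbprint)
print(name, value)
```

The record name is the same whether or not the certificate is a wildcard, which is why controlling the `_acme-challenge` subdomain is enough to solve the challenge.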

The pros and cons of each of these two challenge types are summarized by Let's Encrypt's documentation as follows:

HTTP-01

Pros

It’s easy to automate without extra knowledge about a domain’s configuration.

It allows hosting providers to issue certificates for domains CNAMEd to them.

It works with off-the-shelf web servers.

Cons

It doesn’t work if your ISP blocks port 80 (this is rare, but some residential ISPs do this).

Let’s Encrypt doesn’t let you use this challenge to issue wildcard certificates.

If you have multiple web servers, you have to make sure the file is available on all of them.

DNS-01

Pros

You can use this challenge to issue certificates containing wildcard domain names.

It works well even if you have multiple web servers.

Cons

Keeping API credentials on your web server is risky.

Your DNS provider might not offer an API.

Your DNS API may not provide information on propagation times.

Agnos as the best of both worlds

By using NS records to delegate the DNS-01 challenge to Agnos itself, we can remove virtually all of DNS-01's cons. Indeed, by serving its own DNS answers, Agnos:

Nullifies the need for API and API credentials
Nullifies all concerns regarding propagation times

In more detail, Agnos proceeds as follows (and as illustrated below). Before any ACME transaction takes place (and only once), the ACME client user manually updates their DNS zone to delegate ACME-specific subdomains to Agnos. Note that the rest of the DNS functionality is still handled by the DNS provider. To carry out the ACME transaction, the ACME client initiates the certificate issuance process and is challenged to serve the token via a DNS TXT record. Agnos does so using its own DNS functionality (leveraging Trust-dns). The ACME client can immediately ask the ACME server for validation. The ACME server asks the DNS provider for the TXT record and is told that the ACME-specific subdomain is delegated to Agnos. The ACME server then asks Agnos-as-a-DNS-server for the TXT record, which Agnos provides. Finally, the certificate is issued and stored by Agnos on the client machine.
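The lookup performed by the ACME server can be sketched with two in-memory zones. This is a heavily simplified model of DNS delegation (no root servers, caching or TTLs), not Agnos code: the zone contents reuse the doma.in records from the end of this post, and the TXT value is a placeholder.

```python
from typing import Dict, Optional, Tuple

Zone = Dict[Tuple[str, str], str]

# Zone held by the DNS provider: it only knows about the delegation.
PROVIDER_ZONE: Zone = {
    ("agnos-ns.doma.in", "A"): "1.2.3.4",
    ("_acme-challenge.doma.in", "NS"): "agnos-ns.doma.in",
}

# Records answered by Agnos itself during the challenge (placeholder value).
AGNOS_ZONE: Zone = {
    ("_acme-challenge.doma.in", "TXT"): "token-digest-placeholder",
}

# Which zone is served at which address.
SERVERS: Dict[str, Zone] = {"1.2.3.4": AGNOS_ZONE}


def resolve_txt(name: str) -> Optional[str]:
    """Look up a TXT record, following one level of NS delegation."""
    direct = PROVIDER_ZONE.get((name, "TXT"))
    if direct is not None:
        return direct
    nameserver = PROVIDER_ZONE.get((name, "NS"))
    if nameserver is None:
        return None
    address = PROVIDER_ZONE.get((nameserver, "A"))
    delegated_zone = SERVERS.get(address, {})
    return delegated_zone.get((name, "TXT"))


print(resolve_txt("_acme-challenge.doma.in"))
```

Because the TXT record lives on the machine running Agnos, it is visible as soon as Agnos serves it: no provider API call is needed and there is no propagation delay to wait for.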

Taking Agnos for a ride

In conclusion, we hope that by switching to Agnos, or more generally to provider-agnostic DNS-01 challenge solving, individuals and organizations will benefit from the full power of DNS-01 and wildcard certificates, without having to take API-related concerns into account when choosing their DNS provider.

If this post has piqued your interest and you want to help us develop Agnos further by trying the beta out, let's meet on our github. We would very much appreciate any feedback and bug reports, so we did our best to streamline and document the installation process for new users. On ArchLinux for example, getting started can be as easy as:

Adding two records to your DNS zone using your provider web GUI:

agnos-ns.doma.in            A       1.2.3.4
_acme-challenge.doma.in     NS      agnos-ns.doma.in

and running on your server

# Install the agnos binary
yay -S agnos
# Allow agnos to bind the privileged port 53
sudo setcap 'cap_net_bind_service=+ep' /usr/bin/agnos
# Download the example configuration file
curl 'https://raw.githubusercontent.com/krtab/agnos/v.0.1.0-beta.1/config_example.toml' > agnos_config.toml
# Edit it to suit your need
vim agnos_config.toml
# Launch agnos 🚀
agnos agnos_config.toml

Until then, happy hacking!

opam 2.1.3 is released!

2022-08-12T09:05:17Z

Feedback on this post is welcomed on Discuss!

We are pleased to announce the minor release of opam 2.1.3.

This opam release consists of backported fixes:

Fix opam init and opam init --reinit when the jobs variable has been set in the opamrc or the current config. (#5056)
opam var no longer fails if no switch is set (#5025)
Setting a variable with option --switch <sw> fails instead of writing an invalid switch-config file (#5027)
Handle external dependencies when updating switch state pin status (all pins), instead of as a post pin action (only when called with opam pin) (#5046)
Remove windows double printing on commands and their output (#4940)
Stop Zypper from upgrading packages on updates on OpenSUSE (#4978)
Clearer error message if a command doesn't exist (#4112)
Actually allow multiple state caches to co-exist (#4554)
Fix some empty conflict explanations (#4373)
Fix an internal error on admin repository upgrade from OPAM 1.2 (#4965)

and improvements:

When inferring a 2.1+ switch invariant from 2.0 base packages, don't filter out pinned packages as that causes very wide invariants for pinned compiler packages (#4501)
Some optimisations to opam list --installable queries combined with other filters (#4311)
Improve performance of some opam list combinations (e.g. --available, --installable) (#4999)
Improve performance of opam list --conflicts-with when combined with other filters (#4999)
Improve performance of opam show by as much as 300% when the package to show is given explicitly or is unique (#4997)(#4172)
When a field is defined in switch and global scope, try to determine the scope also by checking switch selection (#5027)

You can also find API changes in the release note.

Opam installation instructions (unchanged):

From binaries: run
```
$ bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh) --version 2.1.3"
```
or download manually from the Github "Releases" page to your PATH. In this case, don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed or to update your sandbox script.
From source, using opam:
```
$ opam update; opam install opam-devel
```
(then copy the opam binary to your PATH as explained, and don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed or to update your sandbox script)
From source, manually: see the instructions in the README.

We hope you enjoy this new minor version, and remain open to bug reports and suggestions.

OCamlPro at the JFLA2022 Conference

2022-07-12T09:05:17Z

Domaine d'Essendiéras, located in the French region of Périgord, where the JFLA2022 took place!

In today's article, we share our contributions to the 2022 JFLAs, the French-Speaking annual gathering on Application Programming Languages, mainly Functional Languages such as OCaml (Journées Francophones des Langages Applicatifs).

This much awaited event is organised by Inria, the French National Institute for Research in Science and Digital Technologies.

This is always the best occasion for us to directly share our latest works and contributions with this diverse community of researchers, professors, students and industrial actors alike. Moreover, it allows us to meet up with all our long known peers and get in contact with an ever changing pool of actors in the fields of functional languages in general, formal methods and everything OCaml!

This year the three papers we submitted were selected, and this is what this article is about!

Table of contents

Mikino, formal verification made accessible

Connecting Software Heritage with the OCaml ecosystem

Alt-Ergo-Fuzz, hunting the bugs of the bug hunter

Mikino, formal verification made accessible

Mikino and all correlated content mentioned in this article were made by Adrien Champion

If you follow our Blog, you may have already read our Mikino blogpost, but if you haven't, here's a quick breakdown and a few pointers, in case you wish to play around or maybe contribute to the project. ;)

So what is Mikino?

Mikino is a simple induction engine over transition systems. It is written in Rust, with a strong focus on ergonomics and user-friendliness.
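To give a feel for what checking a property by induction over a transition system means, here is a toy sketch on a finite system: a counter that starts at 0 and is incremented by 2 modulo 10. Mikino itself relies on an SMT solver rather than enumeration; `check_induction` is a hypothetical helper that only works because the state space is tiny.

```python
from typing import Callable, Iterable, Tuple


def check_induction(
    states: Iterable[int],
    is_init: Callable[[int], bool],
    step: Callable[[int], int],
    prop: Callable[[int], bool],
) -> Tuple[bool, bool]:
    """Return (base case holds, inductive step holds) for a candidate property."""
    all_states = list(states)
    # Base case: every initial state satisfies the property.
    base = all(prop(s) for s in all_states if is_init(s))
    # Step case: from ANY state satisfying the property, the successor does too.
    inductive = all(prop(step(s)) for s in all_states if prop(s))
    return base, inductive


def is_init(state: int) -> bool:
    return state == 0


def step(state: int) -> int:
    return (state + 2) % 10


STATES = range(10)

# "The counter is even" is inductive.
print(check_induction(STATES, is_init, step, lambda s: s % 2 == 0))
# "The counter is never 5" is true of all reachable states, but not inductive:
# the unreachable state 3 satisfies it while its successor 5 does not.
print(check_induction(STATES, is_init, step, lambda s: s != 5))
```

The second check shows why induction can fail on a property that actually holds: the step case also considers unreachable states, so the property sometimes has to be strengthened before it can be proved.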

Depending on what your needs are, you could either be interested in the Mikino API or the Mikino Binary or just, for purely theoretical reasons, want to go through our Verification for Dummies: SMT and Induction tutorial, which is specifically tailored to appeal to the newbies of formal verification!

Have a go at it, learn and have fun!

For further reading: OCamlPro's paper for the JFLA2022 (Mikino) (French-written article describing the entire project).

Connecting Software Heritage with the OCaml ecosystem

The archiving of OCaml packages into the SWH architecture, the release of swhid library and the integration of SWH into opam was done by Léo Andrès, Raja Boujbel, Louis Gesbert and Dario Pinto

Once again, if you follow our Blog, you must have seen Software Heritage (SWH) mentioned in our yearly review article.

Now you can also look at SWH paper by OCamlPro for the JFLA2022 (French) if you are looking for a detailed explanation of how important Software Heritage is to free software as a whole, and in what manner OCamlPro contributed to this gargantuan long-term endeavour of theirs.

This great collaboration was one of the highlights of last year from which arose an OCaml library called swhid and the guaranteed perennity of all the packages found on opam.

The work we did to achieve this was to:

add a few modules to the SWH architecture in order to store all the OCaml packages found on opam in the Library of Alexandria of open source software.
release a library used for computing SWH identifiers
add support in opam in order to allow a fallback on SWH architecture if a given package is missing from the opam repository
patch the opam repository in order to detect already missing packages
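As an illustration of what computing an SWH identifier involves, the sketch below builds the identifier of a file content: its intrinsic hash is the same as the one Git computes for a blob, i.e. the SHA-1 of the header `blob <length>\0` followed by the bytes. This Python helper is hypothetical and is not the API of the OCaml swhid library; other object types (directories, revisions, etc.) use different schemes that are not shown here.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Compute the SWHID of a file content (same hash as a Git blob)."""
    header = b"blob %d\0" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


# The empty content has the well-known Git empty-blob hash.
print(swhid_for_content(b""))  # swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(swhid_for_content(b'let () = print_endline "hello"\n'))
```

Since the identifier only depends on the bytes themselves, anyone holding a copy of a file can recompute it and check that it matches the archived object.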

Alt-Ergo-Fuzz, hunting the bugs of the bug hunter

The fuzzing of the SMT-Solver Alt-Ergo was done by Hichem Rami Ait El Hara, Guillaume Bury and Steven de Oliveira

As the last entry of OCamlPro's papers that have made it to this year's JFLA: a rundown of Hichem's work, guided by Guillaume and Steven, on developing a Fuzzer for Alt-Ergo.

When it comes to critical systems, and industry-borne software, there are no limits to the requirements in safety, correctness, testing that would prove a program's reliability.

This is what SMT (Satisfiability Modulo Theories)-Solvers like Alt-Ergo are for: they use a complex mix of theory and implementation in order to prove, given a set of input theories, whether a program is acceptable... But SMT-Solvers, like any other program in the world, have to be searched for bugs or unwanted behaviours - this is the harsh reality of development.

With that in mind, Hichem sought to provide a fuzzer for Alt-Ergo to help hunt the bugs of the bug hunter: Alt-Ergo-Fuzz.

This tool has helped identify several bugs of unsoundness and crashes:

#474 - Crash
#475 - Crash
#476 - Unsoundness
#477 - Unsoundness
#479 - Unsoundness
#481 - Crash
#482 - Crash

More details in OCamlPro's paper for the JFLA2022 (Alt-Ergo-Fuzz).

2021 at OCamlPro

2022-01-31T09:05:17Z

Passing from one year to another is a great time to have a look back!

OCamlPro was created in 2011 to advocate the adoption of the OCaml language and Formal Methods in general in the industry. 2021 was a very special year as we celebrated our 10th anniversary! While building a team of highly-skilled engineers, we navigated through our expertise domains, programming languages design, compilation and analysis, advanced developer tooling, formal methods, blockchains and high-value software prototyping.

In this article, as every year (see last year's post), we review some of the work we did during 2021, in many different worlds.

Table of contents

Newcomers at OCamlPro

Real Life Modern Applications

Modernizing the French Income Tax System
A First Step in the COBOL Universe
Auditing a High-Scale Genealogy Application
Improving an ecotoxicology platform

Contributions to OCaml

Flambda Code Optimizer
Opam Package Manager
LearnOCaml and TryOCaml
OCaml Documentation Hub
Plugging Opam into Software Heritage

Tooling for Formal Methods

Alt-Ergo Development
Alt-Ergo Users’ Club and R&D Projects
Dolmen Library for Automated Deduction Languages

Rust Developments

SMT, Induction and Mikino
Matla, a Project Manager for TLA+/TLC
Rust Training at Onera
Audit of a Rust Blockchain Node

Scaling and Verifying Blockchains

From Dune Network to FreeTON/EverScale
A Why3 Framework for Solidity

Participations in Public Events

Open Source Experience 2021
OCaml Workshop at ICFP 2021
Joining the Why3 Consortium at the ProofInUse Seminar

Towards 2022

As always, we warmly thank all our clients, partners, and friends, for their support and collaboration during this peculiar year!

Newcomers at OCamlPro

Some of the new and old members of the team: Pierre Chambart, Dario Pinto, Léo Andrès, Fabrice Le Fessant, Louis Gesbert, Artemiy Rozovyk, Muriel Shan Sei Fan, Nicolas Berthier, Vincent Laviron, Steven De Oliveira and Keryan Didier.

A company is nothing without its employees. This year, we have been delighted to welcome a great share of newcomers:

Hichem Rami Ait El Hara recently completed his master's degree in Computer Science. After an internship at OCamlPro, during which he developed a fuzzer for Alt-Ergo, he joined OCamlPro to work on Alt-Ergo and the verification of smart contracts. He will soon start a PhD on SMT solving.
Nicolas Berthier holds a PhD on synchronous programming for resource-constrained systems. With many years of experience in model-checking, abstract interpretation, and software analysis, he joined OCamlPro to work on programming language compilation and analysis.
Julien Blond is a senior OCaml developer with a strong experience in formal verification of security software. He joined OCamlPro as both a project manager and a Coq expert.
Keryan Didier joined the team as an R&D engineer. He holds a PhD from University Pierre et Marie Curie, during which he developed an automated implementation method for hard real-time applications. Previously, he studied functional programming and language design at University Paris-Diderot. Keryan has been involved in the MLang project as well as the flambda2 project within OCamlPro's Compilation team.
Mohamed Hernouf recently completed his master's degree in Computer Science. After an internship at OCamlPro, working on the OCaml Documentation Hub, he joined OCamlPro and continues to work on the documentation hub and other OCaml applications.
Dario Pinto is a student at the 42Paris School of Computer Science. He joined OCamlPro in a work-study contract for two years.
Artemiy Rozovyk recently completed his master's degree in Computer Science. He joined OCamlPro to work on the development of applications for the EverScale and Avalanche blockchains.

Real Life Modern Applications

Modernizing the French Income Tax System

The M language, designed in the 80s for the Income Tax, is now being rewritten and extended in OCaml.

The M language is a very old programming language developed by the French tax administration to compute income taxes. Recently, Denis Merigoux and Raphael Monat have implemented a new compiler in OCaml for the M language. This new compiler shows better performance, clearer semantics, and achieves greater maintainability than the former compiler. OCamlPro is now involved in strengthening this new compiler, to put it in production and eventually compute the taxes of more than 30 million French families.

A First Step in the COBOL Universe

Recent studies still estimate that COBOL has the highest number of lines of code running.

Born more than 60 years ago, COBOL is still said to be the most used language in the world, in terms of the number of lines running in computers, though many people forecast it would disappear at the turn of the 21st century. With more than 300 reserved keywords, it is also one of the most complex languages to parse and analyse. That is not enough to scare the developers at OCamlPro: while helping one of the biggest COBOL users in France to port its programs to the GNUCobol open-source compiler, OCamlPro built strong expertise in COBOL and mainframes, and developed a powerful COBOL parser that will help us bring modern development tools to COBOL developers.

Auditing a High-Scale Genealogy Application

Geneweb was developed in the 90s to manage family trees... and is still managing them!

Geneweb is one of the most powerful pieces of software for managing and sharing genealogical data to date. Written in OCaml more than 20 years ago, it contains a web server and complex algorithms to compute information on family trees. It is used by Geneanet, which is one of the leading companies in the genealogy field, to store more than 800,000 family trees and more than 7 billion names of ancestors. OCamlPro is now working with Geneanet to improve Geneweb and make it scale to even larger data sets.

Improving an ecotoxicology platform

Mosaic is used by ecotoxicologists and regulators to obtain advanced and innovative methods for environmental risk assessment.

The Mosaic platform helps researchers, industrial actors and regulators in the field of ecotoxicology by providing an easy way to run various statistical analyses. All the user has to do is enter some data on the web interface; computations are then run on the server and the results are displayed. The platform is fully written in OCaml and takes care of calling the mathematical model, which is written in R. OCamlPro modernised the project in order to ease maintenance and new contributions. In the process, we discovered bugs introduced by new R versions (without any kind of warning). We then developed a new interface for data input: it is similar to a spreadsheet and much more convenient than having to write raw CSV. During this work, we had the opportunity to contribute to some other OCaml packages such as leveldb or write new ones such as agrid.

Contributions to OCaml

Flambda Code Optimizer

Flambda2 is a powerful code optimizer for the OCaml compiler.

OCamlPro is proud to be working on Flambda2, an ambitious OCaml optimizing compiler project, initiated with Mark Shinwell from Jane Street, our long-term partner and client. Flambda focuses on reducing the runtime cost of abstractions and removing as many short-lived allocations as possible. Jane Street has launched large-scale testing of flambda2, and on our side, we have documented the design of some key algorithms. In 2021, the Flambda team grew bigger with Keryan. Along with the considerable amount of fixes and improvements, this will allow us to publish Flambda2 in the coming months!

In other OCaml compiler news, 2021 saw the long-awaited merge of the multicore branch into the official development branch. This was thanks to the amazing work of many people, including our own Damien Doligez. This is far from the end of the story though, and we're looking forward to both further contributing to the compiler (fixing bugs, re-enabling support for all platforms) and making use of the features in our own programs.

This work is made possible by Jane Street’s funding.

Opam Package Manager

A large set of new features was implemented in Opam in 2021.

Opam is the OCaml source-based package manager. The first specification draft was written in early 2012 and went on to become OCaml’s official package manager — though it may be used for other languages and projects, since Opam is language-agnostic! If you need to install, upgrade and manage your compiler(s), tools and libraries easily, Opam is meant for you. It supports multiple simultaneous compiler installations, flexible package constraints, and a Git-friendly development workflow.

Opam development and maintenance is a collaboration between OCamlPro, with Raja & Louis, and OCamlLabs, with David Allsopp & Kate Deplaix.

Our 2021 work on opam led to the final release of the long-awaited opam 2.1, three versions of opam 2.0 and two versions of opam 2.1 with small fixes.

Opam 2.1 introduced several new features:

Integration of system dependencies (formerly the opam-depext plugin)
Creation of lock files for reproducible installations (formerly the opam-lock plugin)
Switch invariants, replacing the "base packages" in opam 2.0 and allowing for easier compiler upgrades
Improved option configurations
CLI versioning, allowing cleaner deprecations for opam now and also improvements to semantics in future without breaking backwards-compatibility
opam root readability by newer and older versions, even if the format changed
Performance improvements to opam-update, conflict messages, and many other areas

Take a stroll through the blog post for a closer look.

In 2021, we also prepared the upcoming alpha release of opam 2.2. It will provide better handling of the Windows ecosystem, integration of Software Heritage archive fallback, a better UI for user interactions, recursive pinning of projects, fetching optimisations, etc.

This work is greatly helped by Jane Street’s funding and support.

LearnOCaml and TryOCaml

We have also been active in the maintenance of Learn-ocaml. What was originally designed as the platform for the OCaml MOOC is now a tool in the hands of OCaml teachers worldwide, managed and funded by the OCaml Foundation.

The work included a well overdue port to OCaml 4.12; generation of portable executables (automatic through CI) for much easier deployment and use of the command-line client; as well as many quality-of-life and usability improvements following from two-way conversations with many teachers.

On a related matter, we also reworked our on-line OCaml editor and toplevel TryOCaml, improving its design and adding features like code snippet sharing. We were glad to see that, in these difficult times, these tools proved useful to both teachers and students, and look forward to improving them further.

OCaml Documentation Hub

The OCaml Documentation Hub includes browsable documentation and sources for more than 2000 Opam packages.

As one of the biggest users of OCaml, OCamlPro aims to facilitate the daily use of OCaml by developing a lot of open-source tooling.

One of our main contributions to the OCaml ecosystem in 2021 was probably the OCaml Documentation Hub at docs.ocaml.pro.

The OCaml Documentation Hub is a website that provides documentation for more than 2000 OPAM packages, among which of course the most popular ones, with inter-package documentation links! The website also contains browsable sources for all these packages, and a search engine to discover useful OCaml functions, modules, types and classes.

All this documentation is generated using our custom tool Digodoc. Though it's not worth a specific section, we also kept maintaining Drom, our layer on Dune and Opam that most of our recent projects use.

Plugging Opam into Software Heritage

Svalbard Global Seed Vault in Norway.

Last year also saw the long-awaited collaboration between Software Heritage and OCamlPro happen.

Thanks to a grant by the Alfred P. Sloan Foundation, OCamlPro has been able to collaborate with our partners at Software Heritage and managed to further expand the coverage of this gargantuan endeavour of theirs by archiving 3516 opam packages. In effect, the main benefits of this Open Source collaboration have been:

The addition of several modules to the Software Heritage architecture, allowing the archiving of said opam packages;
The publication of an OCaml library for working with SWHIDs;
An implementation of a possible fallback onto Software Heritage if a given package on opam is no longer available;
A fix for the official opam repository in order to identify already missing packages.

Not long after Software was at last acknowledged by Unesco as part of the World Heritage, we were thrilled to be part of this great and meaningful initiative. We could feel how true passion remained throughout our interactions and long after the work was done.

Tooling for Formal Methods

Avionics, blockchains, cyber-security, cloud, etc... formal methods are spreading in the computer industry.

Alt-Ergo Development

OCamlPro develops and maintains Alt-Ergo, an automatic solver of mathematical formulas designed for program verification and based on Satisfiability Modulo Theories (SMT). Alt-Ergo was initially created within the VALS team at University of Paris-Saclay.

In 2021, we continued to focus on the maintainability of our solver. We released versions 2.4.0 and 2.4.1 in January and July respectively, with 2.4.1 containing a bugfix as well as some performance improvements.

In order to increase our test coverage, we instrumented Alt-Ergo so that we could run it using afl-fuzz. Although this is a proof of concept, and has yet to be integrated into Alt-Ergo's continuous integration, it has already helped us find a few bugs, such as this.

Alt-Ergo Users’ Club and R&D Projects

We thank our partners from the Alt-Ergo Users’ Club, Adacore, CEA List, MERCE (Mitsubishi Electric R&D Centre Europe), Thalès, and Trust-In-Soft, for their trust. Their support allows us to maintain our tool.

The club was launched in 2019 and the third annual meeting of the Alt-Ergo Users’ Club was held in early April 2021. Our annual meeting is the perfect place to review each partner’s needs regarding Alt-Ergo. This year, we had the pleasure of receiving our partners to discuss the roadmap for future Alt-Ergo features and enhancements. If you want to join us for the next meeting (coming soon), contact us!

Finally, we will be able to merge into the main branch of Alt-Ergo some of the work we did in 2020. Thanks to our partner MERCE (Mitsubishi Electric R&D Centre Europe), we worked on the SMT model generation. Alt-Ergo is now (partially) able to output a model in the smt-lib2 format. Thanks to the Why3 team from University of Paris-Saclay, we hope that this work will be available in the Why3 platform to help users in their program verification efforts. OCamlPro was very happy to join the Why3 Consortium this year, for even more collaborations to come!

This work is funded in part by the FUI R&D Project LCHIP, MERCE, Adacore and with the support of the Alt-Ergo Users’ Club.

Dolmen Library for Automated Deduction Languages

Dolmen is a powerful library providing flexible parsers and typecheckers for many languages used in automated deduction.

The ongoing work on using the Dolmen library as frontend for Alt-Ergo has progressed considerably, both on the side of Dolmen, which has been extended to support Alt-Ergo's native language in this PR, and on Alt-Ergo's side, to add Dolmen as a frontend that can be chosen in this PR. Once these are merged, Alt-Ergo will be able to read input problems in new languages, such as TPTP!

Rust Developments

Rust is a very good complement to OCaml for performance critical applications.

SMT, Induction and Mikino

A few months ago, we published a series of posts: verification for dummies: SMT and induction. These posts introduce and discuss SMT solvers, the notion of induction and that of invariant strengthening. They rely on mikino, a simple software we wrote that can analyze simple transition systems and perform SMT-based induction checks (as well as BMC, i.e. bug-finding). We wrote mikino in Rust with readability and ergonomics in mind: mikino showcases the basics of writing an SMT-based model checker performing induction. The posts are very hands-on and leverage mikino's high-quality output to discuss induction and invariant strengthening, with examples that readers can run and edit themselves.

Matla, a Project Manager for TLA+/TLC

During 2021 we ended up using the TLA+ language and its associated TLC verification engine in several completely unrelated projects. TLC is an amazing tool, but it is not suited to handle a TLA+ project with many modules (files), regression tests, etc. In particular, TLA+ is not a typed language. This means that TLA+ code tends to have many checks (dynamic assertions) checking that quantities have the expected type. This is fine, albeit a bit tedious, to some extent, but as the code grows bigger the analysis conducted by TLC can become very, very expensive. Eventually it is not reasonable to keep assert-type-checking everything since it contributes to TLC's analysis exploding.

As TLA+/TLC users, we are currently developing matla which manages TLA+ projects. Written in Rust, matla is heavily inspired by the Rust ecosystem, in particular cargo. Matla has not been publicly released yet as we are waiting for more feedback from early users. We do use it internally however as its various features make our TLA+ projects much simpler:

handling the TLA toolchain (download, PATH, updates...) for the user;
providing a Matla module with "debug assertions" helpers: these assertions are active in debug mode, which is the default when running matla run. Passing --release to matla's run mode however compiles all debug assertions away; this makes it possible to type-check everything when debugging while making sure release runs do not pay the price of these checks;
handling integration testing: matla projects have a tests directory where users can write tests (each consisting of a .tla and a .cfg file) and specify if they are expected to be successful or to fail (and how);
understanding and transforming TLC's output to improve user feedback, in particular when TLC yields an error (this is not good enough yet, and is the reason we have not released yet); matla also parses and prettifies TLC's counterexample traces by formatting values, formatting states (aggregations of values), and rendering traces of states graphically using ASCII art.

Rust Training at Onera

The ongoing pandemic is undoubtedly impacting our professional training activities. Still, we had the opportunity to set up a Rust training session with applied researchers at ONERA during the summer. The session spanned a week (about seven hours a day) and was our first fully remote Rust training session. While we still believe on-site training (when possible) is better, fully remote training offers some flexibility (spreading out the training over several weeks for instance) and our experience with ONERA shows that it can work in practice with the right technology. Interestingly, it turns out that some aspects of the session actually work better remotely: hands-on exercises and projects for instance benefit from screen sharing. Discussing code with one participant is done with screen sharing, meaning all participants can follow along if they so choose.

Long story short, fully remote training is something we now feel confident proposing to our clients as a flexible alternative to on-site training.

Audit of a Rust Blockchain Node

We participated in a contest aiming at writing a high-level specification of (the compiler for) the TON VM assembler, in particular its instructions and how they are compiled. This contest was a first step towards applying Formal Methods, and in particular formal verification, to the TON VM. We are happy to report that we finished first in this contest, and are looking forward to future contests pushing Formal Methods further in the Everscale blockchain.

Scaling and Verifying Blockchains

OCamlPro is involved in several projects with high-throughput blockchains, such as EverScale and Avalanche.

From Dune Network to FreeTON/EverScale

In 2019-2020, we concentrated our blockchain development efforts on adding new programming languages to the Dune Network ecosystem, in collaboration with Origin Labs. You can read more about Love and Solidity for Dune.

At the end of 2020, it became clear that high-throughput was becoming a major requirement for blockchain adoption in real applications, and that the Tezos-based technology behind Dune Network could not compete with high-performance blockchains such as Solana or Avalanche. Following this observation, the Dune Network community decided to merge with the FreeTON community early in 2021. Initially developed by Telegram, the TON project was stopped under legal threats, but another company, TONLabs, restarted the project from its open-source code under the FreeTON name, and the blockchain was launched mid-2020. FreeTON, now renamed EverScale, is today the fastest blockchain in the world, with around 55,000 transactions per second on an open network sustained for several days.

EverScale uses a unique community-driven development process: contests are organized by thematic sub-governances (subgov) to improve the ecosystem, and contestants win prizes in tokens to reward their high-quality work. During 2021, OCamlPro got involved in several of these sub-governances, both as a jury, in the Formal Methods subgov and the Developer Experience subgov, and as a contestant winning multiple prizes for the development of smart contracts (zksnarks use-cases, auctions and recurring payments), the audit of several smart contracts (TrueNFT audit, Smart Majority Voting audit and a DEX audit), and the specification of some Rust components in the node (the Assembler module).

This work in the EverScale ecosystem gave us the opportunity to develop some interesting OCaml contributions:

We improved our ocaml-solidity parser to support all the extensions of the Solidity language required to parse EverScale contracts;
We developed an OCaml binding for the EverScale Rust SDK;
We developed a command line wallet called ft to help developers easily deploy the contracts and interact with them;
We developed a bridge between Dune Network and EverScale to swap DUN tokens into EVER tokens.

This work was funded by the EverScale community through contests.

A Why3 Framework for Solidity

Our most recent work on the EverScale blockchain has been targeted at the development of a Why3 framework to formally verify EverScale Solidity contracts. At the same time, we have been involved in the specification of several big smart contract projects, and we plan to use this framework in practice on these projects as soon as their formal verification starts.

We hope to be able to extend this work to EVM based Solidity contracts, as available on Ethereum and Avalanche and many other blockchains. By comparison with other frameworks that work directly on the EVM bytecode, this work, focused directly on the Solidity language, should make the verification much higher-level and thus more straightforward.

Participations in Public Events

Open Source Experience 2021

Stéfane Fermigier (Abilian) and Pierre Baudracco (BlueMind) from Systematic Open Source Hub meet Amélie de Montchalin (French Minister of Public Service) in front of OCamlPro's booth.

We were present at the new edition of the Open Source Experience in Paris! Our booth welcomed visitors to discuss tailor-made software solutions. Fabrice had the opportunity to give a presentation on FreeTON (now EverScale) (Watch the video), the high-speed blockchain he is working on. We were delighted to meet the open source community. Moreover, Amélie de Montchalin, French Minister of Transformation and Public Service, attended the Open Source Experience to thank all the free software actors. A very nice experience for us, we can't wait to be back in 2022!

OCaml Workshop at ICFP 2021

We participated in the programming competition organized by the International Conference on Functional Programming (ICFP). Three talks we submitted to the OCaml Workshop were accepted!

Fabrice, Mohamed and Louis presented Digodoc, our new tool that builds a graph of an opam switch, associating files, libraries and opam packages into a cyclic graph of inclusions and dependencies;
Fabrice spoke about Opam-bin, a plugin that builds binary opam packages on the fly;
Lastly, Steven and David presented Love, a smart contract language embedded in the Dune Network blockchain.

It was an opportunity to present our tools and projects, and above all to discuss with the OCaml community. We're delighted to take part in this adventure every year!

Joining the Why3 Consortium at the ProofInUse seminar

We were very happy to join the Why3 Consortium while participating in the ProofInUse joint lab seminar on counterexamples on October 1st. Many thanks to Claude Marché for his role as scientific shepherd.

Towards 2022

Though 2022 is just starting, it already sounds like a great year with many new interesting and innovative projects for OCamlPro.

After a phase of adaptation to the health context in 2020 and a year of growth in 2021, we are motivated to start the year 2022 with new and very enriching projects, new professional encounters, leading to the growth of our teams. If you want to be part of a passionate team, we would love to hear from you! We are currently actively hiring. Check the available job positions and follow the application instructions!

All our amazing achievements are the result of incredible people and teamwork, kudos to Fabrice, Pierre, Louis, Vincent, Damien, Raja, Steven, Guillaume, David, Adrien, Léo, Keryan, Mohamed, Hichem, Dario, Julien, Artemiy, Nicolas, Elias, Marla, Aurore and Muriel.

Verification for Dummies: SMT and Induction

2021-10-14T09:05:17Z

Adrien Champion adrien.champion@ocamlpro.com
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

These posts broadly discuss induction as a formal verification technique, which here really means formal program verification. I will use concrete, runnable examples whenever possible. Some of them can run directly in a browser, while others require running small easy-to-retrieve tools locally. Such is the case for pretty much all examples dealing directly with induction.

The next chapters discuss the following notions:

formal logics and formal frameworks;
SMT-solving: modern, low-level verification building blocks;
declarative transition systems;
transition system unrolling;
BMC and induction proofs over transition systems;
candidate strengthening.

The approach presented here is far from being the only one when it comes to program verification. It happens to be relatively simple to understand, and I believe that familiarity with the notions discussed here makes understanding other approaches significantly easier.

This book thus hopes to serve both as a relatively deep dive into the specific technique of SMT-based induction, as well as an example of the technical challenges inherent to both developing and using automated proof engines.

Some chapters contain a few pieces of Rust code, usually to provide a runnable version of a system under discussion, or to serve as an example of actual code that we want to encode and verify. Some notions of Rust could definitely help in places, but this is not mandatory (probably).

Generating static and portable executables with OCaml

2021-09-02T09:05:17Z

Distributing OCaml software on opam is great (if I dare say so myself), but sometimes you need to provide your tools to an audience outside of the OCaml community, or just without recompilations or in a simpler way.

However, just distributing the locally generated binaries requires that the users have all the needed shared libraries installed, and a compatible libc. It's not something you can assume in general, and even if you don't need any C shared library or are confident enough it will be installed everywhere, the libc issue will arise for anyone using a distribution based on a different libc, or one a little older than the one you used to build.

There is no built-in support for generating static executables in the OCaml compiler, and it may seem a bit tricky, but it's not in fact too complex to do by hand, something you may be ready to do for a release that will be published. So here are a few tricks, recipes and advice that should enable you to generate truly portable executables with no external dependency whatsoever. Both Linux and macOS will be treated, but the examples will be based on Linux unless otherwise specified.

Example

I will take as an example a trivial HTTP file server based on Dream.

Sample code

fserv.ml

let () = Dream.(run @@ logger @@ static ".")

fserv.opam

opam-version: "2.0"
depends: ["ocaml" "dream"]

dune-project

(lang dune 2.8)
(name fserv)

The relevant part of our dune file is just:

(executable
  (public_name fserv)
  (libraries dream))

This is how we check the resulting binary:

$ dune build fserv.exe
      ocamlc .fserv.eobjs/byte/dune__exe__Fserv.{cmi,cmo,cmt}
    ocamlopt .fserv.eobjs/native/dune__exe__Fserv.{cmx,o}
    ocamlopt fserv.exe
$ file _build/default/fserv.exe
_build/default/fserv.exe: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=1991bb9f1d67807411c93f6fb6ec46b4a0ee8ed5, for GNU/Linux 3.2.0, with debug_info, not stripped
$ ldd _build/default/fserv.exe
        linux-vdso.so.1 (0x00007ffe97690000)
        libssl.so.1.1 => /usr/lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007fd6cc636000)
        libcrypto.so.1.1 => /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007fd6cc342000)
        libev.so.4 => /usr/lib/x86_64-linux-gnu/libev.so.4 (0x00007fd6cc330000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd6cc30e000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd6cc1ca000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd6cc1c4000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd6cbffd000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fd6cced7000)

(on macOS, replace ldd with otool -L; dune output is obtained with (display short) in ~/.config/dune/config)

So let's see how to change this result. Basically, here, libev, libssl and libcrypto are required shared libraries that may not be installed on every system, while all the others are part of the core system:

linux-vdso, libdl and ld-linux are concerned with the dynamic loading of shared objects;
libm and libpthread are extensions of the core libc that are tightly bound to it, and always installed.
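
To spot the problematic libraries at a glance, the classification above can be turned into a small filter. This is only a sketch: non_core_libs is a hypothetical helper name, and the list of "core" libraries is simply the one given above for a glibc-based Linux system, so adjust it to your platform.

```shell
# Hypothetical helper: print the shared libraries found in `ldd` output that
# are NOT in the core set listed above (vdso, dynamic loader, libc and its
# tightly bound extensions). Reads `ldd` output on stdin.
#
# Usage: ldd _build/default/fserv.exe | non_core_libs
non_core_libs() {
  # Keep the first field of each non-empty line (the library name or path),
  # strip any leading directory, then drop the core libraries.
  awk 'NF { print $1 }' |
    sed 's|.*/||' |
    { grep -Ev '^(linux-vdso|libdl|ld-linux[^.]*|libm|libpthread|libc)\.so' || true; }
}
```

On the ldd output shown earlier, this leaves libssl.so.1.1, libcrypto.so.1.1 and libev.so.4, which are exactly the libraries we want to get rid of.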

Statically linking the libraries

In simple cases, static linking can be turned on as easily as passing the -static flag to the C compiler: through OCaml you will need to pass -cclib -static. We can add that to our dune file:

(executable
  (public_name fserv)
  (flags (:standard -cclib -static))
  (libraries dream))

... which gives:

$ dune build fserv.exe
      ocamlc .fserv.eobjs/byte/dune__exe__Fserv.{cmi,cmo,cmt}
    ocamlopt .fserv.eobjs/native/dune__exe__Fserv.{cmx,o}
    ocamlopt fserv.exe
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/10/../../../x86_64-linux-gnu/libcrypto.a(dso_dlfcn.o): in function `dlfcn_globallookup':
(.text+0x13): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/bin/ld: ~/.opam/4.11.0/lib/ocaml/libunix.a(initgroups.o): in function `unix_initgroups':
initgroups.c:(.text.unix_initgroups+0x1f): warning: Using 'initgroups' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
[...]
$ file _build/default/fserv.exe 
_build/default/fserv.exe: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=9ee3ae1c24fbc291d1f580bc7aaecba2777ee6c2, for GNU/Linux 3.2.0, with debug_info, not stripped
$ ldd _build/default/fserv.exe
        not a dynamic executable

The executable was generated... and the result seems OK, but we shouldn't skip all these ld warnings. Basically, what ld is telling us is that you shouldn't statically link glibc (it internally uses dynlinking, to libraries that also need glibc functions, and will therefore still need to dynlink a second version from the system 🤯).

Indeed here, we have been statically linking a dynamic linking engine, among other things. Don't do it.

Linux solution: linking with musl instead of glibc

The easiest workaround at this point, on Linux, is to compile with musl, which is basically a glibc replacement that can be statically linked. There are some OCaml and gcc variants to automatically use musl (comments welcome if you have been successful with them!), but I have found the simplest option is to use a tiny Alpine Linux image through a Docker container. Here we'll use OCamlPro's minimal Docker images but anything based on musl should do.

$ docker run --rm -it ocamlpro/ocaml:4.12
[...]
~/fserv $ sudo apk add openssl-libs-static
(1/1) Installing openssl-libs-static (1.1.1l-r0)
OK: 161 MiB in 52 packages
~/fserv $ opam switch create . --deps ocaml-system
[...]
~/fserv $ eval $(opam env)
~/fserv $ dune build fserv.exe
~/fserv $ file _build/default/fserv.exe
_build/default/fserv.exe: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, with debug_info, not stripped
~/fserv $ ldd _build/default/fserv.exe
        /lib/ld-musl-x86_64.so.1 (0x7ff41353f000)

Almost there! We see that we had to install extra packages with apk add: the static libraries might not be already installed and in this case are in a separate package (you would get bin/ld: cannot find -lssl). The last remaining dynamic loader in the output of ldd is because static PIE executables were not supported until recently. To get rid of it, we just need to add -cclib -no-pie (note: a previous revision of this blogpost mentioned -static-pie instead, which may work with recent compilers, but didn't seem to give reliable results):

(executable
  (public_name fserv)
  (flags (:standard -cclib -static -cclib -no-pie))
  (libraries dream))

And we are good!

~/fserv $ file _build/default/fserv.exe
_build/default/fserv.exe: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, with debug_info, not stripped
~/fserv $ ldd _build/default/fserv.exe
/lib/ld-musl-x86_64.so.1: _build/default/fserv.exe: Not a valid dynamic program
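
If you build releases from a script, it can be worth failing early when the result is not what you expect. Here is a minimal sketch, assuming the wording of file's description shown above ("statically linked"); both function names are made up for illustration, and other versions of file may phrase things differently.

```shell
# Hypothetical helper: succeed only if a description, as printed by `file`,
# reports a statically linked executable.
is_static_description() {
  case "$1" in
    *"statically linked"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper for a release script: `file -b` prints the
# description without the leading file name.
#
# Usage: check_static _build/default/fserv.exe || exit 1
check_static() {
  is_static_description "$(file -b "$1")"
}
```

Matching on the description text is crude but keeps the check independent of the differences between the glibc and musl versions of ldd seen above.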

Trick: short script to compile through a Docker container

Passing the context to a Docker container and getting the artefacts back can be bothersome and often causes file ownership issues, so I use the following snippet to pipe them to/from it using tar:

git ls-files -z | xargs -0 tar c |
docker run --rm -i ocamlpro/ocaml:4.12 \
  sh -uexc \
    '{ tar x &&
       opam switch create . ocaml-system --deps-only --locked &&
       opam exec -- dune build --profile=release @install;
     } >&2 && tar c -hC _build/install/default/bin .' |
tar vx

The other cases: turning to manual linking

Sometimes you can't use the above: the automatic linking options may need to be tweaked for static libraries, your app may still need dynlinking support at some point, or you may not have the musl option. On macOS, for example, the libc doesn't have a static version at all (and the -static option of ld is explicitly "only used building the kernel"). Let's get our hands dirty and see how to use a mixed static/dynamic linking scheme. First, we examine how OCaml does the linking usually:

The linking options are passed automatically by OCaml, using information that is embedded in the cm(x)a files, for example:

$ ocamlobjinfo $(opam var lwt:lib)/unix/lwt_unix.cma |head
File ~/.opam/4.11.0/lib/lwt/unix/lwt_unix.cma
Force custom: no
Extra C object files: -llwt_unix_stubs -lev -lpthread
Extra C options:
Extra dynamically-loaded libraries: -llwt_unix_stubs
Unit name: Lwt_features
Interfaces imported:
        c21c5d26416461b543321872a551ea0d        Stdlib
        1372e035e54f502dcc3646993900232f        Lwt_features
        3a3ca1838627f7762f49679ce0278ad1        CamlinternalFormatBasics
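
If you only care about the C link flags, you can filter them out of that output. A small sketch: extra_c_flags is a hypothetical helper that simply matches the "Extra C object files" line as printed above.

```shell
# Hypothetical helper: print the C link flags recorded in an OCaml archive.
# Reads `ocamlobjinfo` output on stdin and keeps only what follows
# "Extra C object files:".
#
# Usage: ocamlobjinfo "$(opam var lwt:lib)/unix/lwt_unix.cma" | extra_c_flags
extra_c_flags() {
  sed -n 's/^Extra C object files: *//p'
}
```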

Now the linking flags, here -llwt_unix_stubs -lev -lpthread, let the C compiler choose the best way to link; in the case of stubs, they will be static (using the .a files, unless you make a special effort to use dynamic ones), but -lev will let the system linker select the shared library, because it is generally preferred. Gathering these flags by hand would be tedious: my preferred trick is to just add the -verbose flag to OCaml (for the lazy, you can just temporarily set OCAMLPARAM=_,verbose=1):

(executable
  (public_name fserv)
  (flags (:standard -verbose))
  (libraries dream))

$ dune build
      ocamlc .fserv.eobjs/byte/dune__exe__Fserv.{cmi,cmo,cmt}
    ocamlopt .fserv.eobjs/native/dune__exe__Fserv.{cmx,o}
+ as  -o '.fserv.eobjs/native/dune__exe__Fserv.o' '/tmp/build8eb7e5.dune/camlasm91a0b9.s'
    ocamlopt fserv.exe
+ as  -o '/tmp/build8eb7e5.dune/camlstartupc9267f.o' '/tmp/build8eb7e5.dune/camlstartup1d9915.s'
+ gcc -O2 -fno-strict-aliasing -fwrapv -Wall -Wdeclaration-after-statement -fno-common -fexcess-precision=standard -fno-tree-vrp -ffunction-sections -D_FILE_OFFSET_BITS=64 -D_REENTRANT -DCAML_NAME_SPACE  -Wl,-E -o 'fserv.exe'  '-L~/.opam/4.11.0/lib/bigstringaf' '-L~/.opam/4.11.0/lib/ocaml' '-L~/.opam/4.11.0/lib/ocaml' '-L~/.opam/4.11.0/lib/ocaml' '-L~/.opam/4.11.0/lib/lwt/unix' '-L~/.opam/4.11.0/lib/cstruct' '-L~/.opam/4.11.0/lib/mirage-crypto' '-L~/.opam/4.11.0/lib/mirage-crypto-rng/unix' '-L~/.opam/4.11.0/lib/mtime/os' '-L~/.opam/4.11.0/lib/digestif/c' '-L~/.opam/4.11.0/lib/bigarray-overlap/stubs' '-L~/.opam/4.11.0/lib/ocaml' '-L~/.opam/4.11.0/lib/ssl' '-L~/.opam/4.11.0/lib/ocaml'  '/tmp/build8eb7e5.dune/camlstartupc9267f.o' '~/.opam/4.11.0/lib/ocaml/std_exit.o' '.fserv.eobjs/native/dune__exe__Fserv.o' '~/.opam/4.11.0/lib/dream/dream.a' '~/.opam/4.11.0/lib/dream/sql/dream__sql.a' '~/.opam/4.11.0/lib/dream/http/dream__http.a' '~/.opam/4.11.0/lib/dream/websocketaf/websocketaf.a' '~/.opam/4.11.0/lib/dream/httpaf-lwt-unix/httpaf_lwt_unix.a' '~/.opam/4.11.0/lib/dream/httpaf-lwt/httpaf_lwt.a' '~/.opam/4.11.0/lib/dream/h2-lwt-unix/h2_lwt_unix.a' '~/.opam/4.11.0/lib/dream/h2-lwt/h2_lwt.a' '~/.opam/4.11.0/lib/dream/h2/h2.a' '~/.opam/4.11.0/lib/psq/psq.a' '~/.opam/4.11.0/lib/dream/httpaf/httpaf.a' '~/.opam/4.11.0/lib/dream/hpack/hpack.a' '~/.opam/4.11.0/lib/dream/gluten-lwt-unix/gluten_lwt_unix.a' '~/.opam/4.11.0/lib/lwt_ssl/lwt_ssl.a' '~/.opam/4.11.0/lib/ssl/ssl.a' '~/.opam/4.11.0/lib/dream/gluten-lwt/gluten_lwt.a' '~/.opam/4.11.0/lib/faraday-lwt-unix/faraday_lwt_unix.a' '~/.opam/4.11.0/lib/faraday-lwt/faraday_lwt.a' '~/.opam/4.11.0/lib/dream/gluten/gluten.a' '~/.opam/4.11.0/lib/faraday/faraday.a' '~/.opam/4.11.0/lib/dream/localhost/dream__localhost.a' '~/.opam/4.11.0/lib/dream/graphql/dream__graphql.a' '~/.opam/4.11.0/lib/ocaml/str.a' '~/.opam/4.11.0/lib/graphql-lwt/graphql_lwt.a' '~/.opam/4.11.0/lib/graphql/graphql.a' '~/.opam/4.11.0/lib/graphql_parser/graphql_parser.a' 
'~/.opam/4.11.0/lib/re/re.a' '~/.opam/4.11.0/lib/dream/middleware/dream__middleware.a' '~/.opam/4.11.0/lib/yojson/yojson.a' '~/.opam/4.11.0/lib/biniou/biniou.a' '~/.opam/4.11.0/lib/easy-format/easy_format.a' '~/.opam/4.11.0/lib/magic-mime/magic_mime_library.a' '~/.opam/4.11.0/lib/fmt/fmt_tty.a' '~/.opam/4.11.0/lib/multipart_form/lwt/multipart_form_lwt.a' '~/.opam/4.11.0/lib/dream/pure/dream__pure.a' '~/.opam/4.11.0/lib/hmap/hmap.a' '~/.opam/4.11.0/lib/multipart_form/multipart_form.a' '~/.opam/4.11.0/lib/rresult/rresult.a' '~/.opam/4.11.0/lib/pecu/pecu.a' '~/.opam/4.11.0/lib/prettym/prettym.a' '~/.opam/4.11.0/lib/bigarray-overlap/overlap.a' '~/.opam/4.11.0/lib/bigarray-overlap/stubs/overlap_stubs.a' '~/.opam/4.11.0/lib/base64/rfc2045/base64_rfc2045.a' '~/.opam/4.11.0/lib/unstrctrd/parser/unstrctrd_parser.a' '~/.opam/4.11.0/lib/unstrctrd/unstrctrd.a' '~/.opam/4.11.0/lib/uutf/uutf.a' '~/.opam/4.11.0/lib/ke/ke.a' '~/.opam/4.11.0/lib/fmt/fmt.a' '~/.opam/4.11.0/lib/base64/base64.a' '~/.opam/4.11.0/lib/digestif/c/digestif_c.a' '~/.opam/4.11.0/lib/stdlib-shims/stdlib_shims.a' '~/.opam/4.11.0/lib/dream/graphiql/dream__graphiql.a' '~/.opam/4.11.0/lib/dream/cipher/dream__cipher.a' '~/.opam/4.11.0/lib/mirage-crypto-rng/lwt/mirage_crypto_rng_lwt.a' '~/.opam/4.11.0/lib/mtime/os/mtime_clock.a' '~/.opam/4.11.0/lib/mtime/mtime.a' '~/.opam/4.11.0/lib/duration/duration.a' '~/.opam/4.11.0/lib/mirage-crypto-rng/unix/mirage_crypto_rng_unix.a' '~/.opam/4.11.0/lib/mirage-crypto-rng/mirage_crypto_rng.a' '~/.opam/4.11.0/lib/mirage-crypto/mirage_crypto.a' '~/.opam/4.11.0/lib/eqaf/cstruct/eqaf_cstruct.a' '~/.opam/4.11.0/lib/eqaf/bigstring/eqaf_bigstring.a' '~/.opam/4.11.0/lib/eqaf/eqaf.a' '~/.opam/4.11.0/lib/cstruct/cstruct.a' '~/.opam/4.11.0/lib/caqti-lwt/caqti_lwt.a' '~/.opam/4.11.0/lib/lwt/unix/lwt_unix.a' '~/.opam/4.11.0/lib/ocaml/threads/threads.a' '~/.opam/4.11.0/lib/ocplib-endian/bigstring/ocplib_endian_bigstring.a' '~/.opam/4.11.0/lib/ocplib-endian/ocplib_endian.a' 
'~/.opam/4.11.0/lib/mmap/mmap.a' '~/.opam/4.11.0/lib/ocaml/bigarray.a' '~/.opam/4.11.0/lib/ocaml/unix.a' '~/.opam/4.11.0/lib/logs/logs_lwt.a' '~/.opam/4.11.0/lib/lwt/lwt.a' '~/.opam/4.11.0/lib/caqti/caqti.a' '~/.opam/4.11.0/lib/uri/uri.a' '~/.opam/4.11.0/lib/angstrom/angstrom.a' '~/.opam/4.11.0/lib/bigstringaf/bigstringaf.a' '~/.opam/4.11.0/lib/bigarray-compat/bigarray_compat.a' '~/.opam/4.11.0/lib/stringext/stringext.a' '~/.opam/4.11.0/lib/ptime/ptime.a' '~/.opam/4.11.0/lib/result/result.a' '~/.opam/4.11.0/lib/logs/logs.a' '~/.opam/4.11.0/lib/ocaml/stdlib.a' '-lssl_stubs' '-lssl' '-lcrypto' '-lcamlstr' '-loverlap_stubs_stubs' '-ldigestif_c_stubs' '-lmtime_clock_stubs' '-lrt' '-lmirage_crypto_rng_unix_stubs' '-lmirage_crypto_stubs' '-lcstruct_stubs' '-llwt_unix_stubs' '-lev' '-lpthread' '-lthreadsnat' '-lpthread' '-lunix' '-lbigstringaf_stubs' '~/.opam/4.11.0/lib/ocaml/libasmrun.a' -lm -ldl

There is a lot of noise, but the interesting part is at the end, the -l* options before the standard ocaml/libasmrun -lm -ldl:

  '-lssl_stubs' '-lssl' '-lcrypto' '-lcamlstr' '-loverlap_stubs_stubs' '-ldigestif_c_stubs' '-lmtime_clock_stubs' '-lrt' '-lmirage_crypto_rng_unix_stubs' '-lmirage_crypto_stubs' '-lcstruct_stubs' '-llwt_unix_stubs' '-lev' '-lpthread' '-lthreadsnat' '-lpthread' '-lunix' '-lbigstringaf_stubs'
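
Picking these options out of the verbose output by eye is error-prone. As a minimal sketch (the function name extract_l_flags is made up for this example, and it assumes the link command arrives on standard input with the quoting style shown above), a small filter could do the job:

```shell
#!/bin/sh
# Hypothetical helper: reads a compiler/linker command line on stdin and
# prints the -l<name> options, one per line, in their original order.
# -lm and -ldl are skipped because OCaml always adds them itself.
extract_l_flags() {
    tr -s ' \t' '\n\n' |                  # one word per line
    tr -d "'" |                           # drop the quotes added by -verbose
    grep -E '^-l[A-Za-z0-9_+.-]+$' |      # keep only -l options (not -L paths)
    grep -v -e '^-lm$' -e '^-ldl$'        # drop the standard trailing flags
}
```

Something like `dune build 2>&1 | extract_l_flags` on a fresh build with -verbose enabled should then print the list above. Duplicates such as -lpthread are deliberately kept, since the order of libraries can matter for static linking.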

Manually linking with glibc (Linux)

To link these statically, but the glibc dynamically:

we disable the automatic generation of linking flags by OCaml with -noautolink
we pass directives to the linker through OCaml and the C compiler, using -cclib -Wl,xxx. -Bstatic makes static linking the preferred option
we escape the linking flags we extracted above through -cclib

(executable
  (public_name fserv)
  (flags (:standard
          -noautolink
          -cclib -Wl,-Bstatic
          -cclib -lssl_stubs                    -cclib -lssl
          -cclib -lcrypto                       -cclib -lcamlstr
          -cclib -loverlap_stubs_stubs          -cclib -ldigestif_c_stubs
          -cclib -lmtime_clock_stubs            -cclib -lrt
          -cclib -lmirage_crypto_rng_unix_stubs -cclib -lmirage_crypto_stubs
          -cclib -lcstruct_stubs                -cclib -llwt_unix_stubs
          -cclib -lev                           -cclib -lthreadsnat
          -cclib -lunix                         -cclib -lbigstringaf_stubs
          -cclib -Wl,-Bdynamic
          -cclib -lpthread))
  (libraries dream))

Note that -lpthread and -lm are tightly bound to the libc and can't be static in this case, so we moved -lpthread to the end, outside of the static section. The part between the -Bstatic and the -Bdynamic is what will be statically linked, leaving the defaults and the libc dynamic. Result:

$ dune build fserv.exe
      ocamlc .fserv.eobjs/byte/dune__exe__Fserv.{cmi,cmo,cmt}
    ocamlopt .fserv.eobjs/native/dune__exe__Fserv.{cmx,o}
    ocamlopt fserv.exe
$ file _build/default/fserv.exe
_build/default/fserv.exe: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=31c93085284da5d74002218b1d6b61c0efbdefe4, for GNU/Linux 3.2.0, with debug_info, not stripped
$ ldd _build/default/fserv.exe
        linux-vdso.so.1 (0x00007ffe207c5000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f49d5e56000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f49d5d12000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f49d5d0c000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f49d5b47000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f49d69bf000)

The remaining ones are the base of the dynamic linking / shared object system, but we got rid of libssl, libcrypto and libev, which were the ones possibly absent from target systems. The resulting executable should work on any glibc-based Linux distribution that is recent enough; on older ones you will likely get missing GLIBC symbols.

If you need to distribute that way, it's a good idea to compile on an old release (like Debian 'oldstable' or 'oldoldstable') for maximum portability.
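
To keep unwanted shared libraries from creeping back in, the ldd output can be checked automatically, for instance in CI. This is only a sketch: check_ldd_output is a hypothetical name, and its allow-list simply mirrors the ldd output shown above, so adjust it to your targets.

```shell
#!/bin/sh
# Hypothetical helper: reads `ldd` output on stdin and fails if a shared
# library outside the allow-list is still required by the executable.
check_ldd_output() {
    unexpected=$(
        awk 'NF { print $1 }' |
        grep -v -E '^(linux-vdso\.so\.[0-9]+|lib(pthread|m|dl|c)\.so\.[0-9]+|/lib(64)?/ld-linux.*\.so\.[0-9]+)$' ||
        true    # grep exits with 1 when nothing is left, which is the good case
    )
    if [ -n "$unexpected" ]; then
        echo "unexpected dynamic dependencies:" >&2
        echo "$unexpected" >&2
        return 1
    fi
}
```

Usage would look like `ldd _build/default/fserv.exe | check_ldd_output`, which returns a non-zero status if, say, libssl.so shows up again.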

Manually linking on macOS

Unfortunately, the linker on macOS doesn't seem to have options to select the static versions of the libraries; the only solution is to get our hands even dirtier, and link directly to the .a files, instead of using -l arguments.

Most of the flags just link with stubs; we can keep them as is: -lssl_stubs -lcamlstr -loverlap_stubs_stubs -ldigestif_c_stubs -lmtime_clock_stubs -lmirage_crypto_rng_unix_stubs -lmirage_crypto_stubs -lcstruct_stubs -llwt_unix_stubs -lthreadsnat -lunix -lbigstringaf_stubs

That leaves us with: -lssl -lcrypto -lev -lpthread

-lpthread is built-in; we can ignore it

for the others, we need to look up the .a file: I use e.g.

$ echo $(pkg-config libssl --variable libdir)/libssl.a
~/brew/Cellar/openssl@1.1/1.1.1k/lib/libssl.a

Of course you don't want to hardcode these paths, but let's test for now:

(executable
  (public_name fserv)
  (flags (:standard
          -noautolink
          -cclib -lssl_stubs           -cclib -lcamlstr
          -cclib -loverlap_stubs_stubs -cclib -ldigestif_c_stubs
          -cclib -lmtime_clock_stubs   -cclib -lmirage_crypto_rng_unix_stubs
          -cclib -lmirage_crypto_stubs -cclib -lcstruct_stubs
          -cclib -llwt_unix_stubs      -cclib -lthreadsnat
          -cclib -lunix                -cclib -lbigstringaf_stubs
          -cclib ~/brew/Cellar/openssl@1.1/1.1.1k/lib/libssl.a
          -cclib ~/brew/Cellar/openssl@1.1/1.1.1k/lib/libcrypto.a
          -cclib ~/brew/Cellar/libev/4.33/lib/libev.a))
  (libraries dream))

$ dune build fserv.exe
      ocamlc .fserv.eobjs/byte/dune__exe__Fserv.{cmi,cmo,cmt}
    ocamlopt .fserv.eobjs/native/dune__exe__Fserv.{cmx,o}
    ocamlopt fserv.exe
$ file _build/default/fserv.exe
_build/default/fserv.exe: Mach-O 64-bit executable x86_64
$ otool -L _build/default/fserv.exe
_build/default/fserv.exe:
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.60.1)

This is as good as it will get!

Cleaning up the build system

We have until now been adding the linking flags manually in the dune file; you probably don't want to do that and be restricted to static builds only! Not counting the non-portable link options we have been using...

The quick&dirty way

Don't use this in your build system! But for quick testing you can conveniently pass flags to the OCaml compilers using the OCAMLPARAM variable. Combined with the tar/docker snippet above, we get a very simple static-binary generating command:

git ls-files -z | xargs -0 tar c | \
docker run --rm -i ocamlpro/ocaml:4.12 \
  sh -uexc '{
    tar x &&
    sudo apk add openssl-libs-static &&
    opam switch create . ocaml-system --deps-only --locked &&
    OCAMLPARAM=_,cclib=-static,cclib=-no-pie opam exec -- dune build --profile=release @install;
  } >&2 && tar c -hC _build/install/default/bin .' | \
tar vx

Note that, for releases, you may also want to strip the generated binaries.

Making it an option of the build system (with dune)

For something you will want to commit, I recommend generating the flags in a separate file, linking-flags-fserv.sexp:

(executable
  (public_name fserv)
  (flags (:standard (:include linking-flags-fserv.sexp)))
  (libraries dream))

The linking flags will depend on the chosen linking mode and on the OS. For the OS, it's easiest to generate them through a script; for the linking mode, I use an environment variable to optionally turn static linking on.

(rule
  (with-stdout-to linking-flags-fserv.sexp
    (run ./gen-linking-flags.sh %{env:LINKING_MODE=dynamic} %{ocaml-config:system})))

This will use the following gen-linking-flags.sh script to generate the file, passing it the value of $LINKING_MODE and defaulting to dynamic. Doing it this way also ensures that dune will properly recompile when the value of the environment variable changes.

#!/bin/sh
set -ue

LINKING_MODE="$1"
OS="$2"
FLAGS=
CCLIB=

case "$LINKING_MODE" in
    dynamic)
        ;; # No extra flags needed
    static)
        case "$OS" in
            linux) # Assuming Alpine here
                CCLIB="-static -no-pie";;
            macosx)
                FLAGS="-noautolink"
                CCLIB="-lssl_stubs -lcamlstr -loverlap_stubs_stubs
                       -ldigestif_c_stubs -lmtime_clock_stubs
                       -lmirage_crypto_rng_unix_stubs -lmirage_crypto_stubs
                       -lcstruct_stubs -llwt_unix_stubs -lthreadsnat -lunix
                       -lbigstringaf_stubs"
                LIBS="libssl libcrypto libev"
                for lib in $LIBS; do
                    CCLIB="$CCLIB $(pkg-config $lib --variable libdir)/$lib.a"
                done;;
            *)
                echo "No known static compilation flags for '$OS'" >&2
                exit 1
        esac;;
    *)
        echo "Invalid linking mode '$LINKING_MODE'" >&2
        exit 2
esac

echo '('
for f in $FLAGS; do echo "  $f"; done
for f in $CCLIB; do echo "  -cclib $f"; done
echo ')'

Then you'll only have to run LINKING_MODE=static dune build fserv.exe to generate the static executable (wrapped in the Docker script above, in the case of Alpine), and can include that in your CI as well.
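
The script above assumes Alpine for the Linux case. If you would rather reuse the mixed static/dynamic scheme from the glibc section, the corresponding flags could be generated along the following lines. This is a sketch under assumptions: gen_glibc_mixed_flags is a hypothetical helper, not part of the script above, and the list of libraries to link statically is passed as arguments.

```shell
#!/bin/sh
# Hypothetical helper: prints a dune-readable flag list that links the
# libraries given as arguments statically and -lpthread dynamically,
# following the -Bstatic / -Bdynamic scheme of the glibc section.
gen_glibc_mixed_flags() {
    echo '('
    echo '  -noautolink'
    echo '  -cclib -Wl,-Bstatic'
    for lib in "$@"; do echo "  -cclib -l$lib"; done
    echo '  -cclib -Wl,-Bdynamic'
    echo '  -cclib -lpthread'
    echo ')'
}
```

For example, `gen_glibc_mixed_flags ssl_stubs ssl crypto ev` prints a list starting with -noautolink that can be included from the dune file just like linking-flags-fserv.sexp.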

For real-world examples, you can check learn-ocaml or opam.

Related topics

reproducible builds should be a goal when you intend to distribute pre-compiled binaries.

opam-bundle is a different, heavy-weight approach to distributing opam software to non-OCaml developers, that retains the "compile all from source" policy but provides one big package that bootstraps OCaml, opam and all the dependencies with a single command.

opam 2.1.0 is released!

2021-08-04T09:05:17Z

Feedback on this post is welcomed on Discuss!

We are happy to announce the release of opam 2.1.0.

Many new features made it in (see the pre-release changelogs or release notes for the details), but here are a few highlights.

What's new in opam 2.1?

Integration of system dependencies (formerly the opam-depext plugin), increasing their reliability as it integrates the solving step
Creation of lock files for reproducible installations (formerly the opam-lock plugin)
Switch invariants, replacing the "base packages" in opam 2.0 and allowing for easier compiler upgrades
Improved options configuration (see the new option and expanded var sub-commands)
CLI versioning, allowing cleaner deprecations for opam now and also improvements to semantics in future without breaking backwards-compatibility
opam root readability by newer and older versions, even if the format changed
Performance improvements to opam-update, conflict messages, and many other areas

Seamless integration of System dependencies handling (a.k.a. "depexts")

opam has long included the ability to install system dependencies automatically via the depext plugin. This plugin has been promoted to a native feature of opam 2.1.0 onwards, giving the following benefits:

You no longer have to remember to run opam depext: opam always checks depexts (there are options to disable this or automate it for CI use). Installation of an opam package in a CI system is now as easy as opam install ., without having to do the dance of opam pin add -n/depext/install. Just one command now for the common case!
The solver is only called once, which both saves time and also stabilises the behaviour of opam in cases where the solver result is not stable. It was possible to get one package solution for the opam depext stage and a different solution for the opam install stage, resulting in some depexts missing.
opam now has full knowledge of depexts, which means that packages can be automatically selected based on whether a system package is already installed. For example, if you have neither MariaDB nor MySQL dev libraries installed, opam install mysql will offer to install conf-mysql and mysql, but if you have the MariaDB dev libraries installed, opam will offer to install conf-mariadb and mysql.

Hint: You can set OPAMCONFIRMLEVEL=unsafe-yes or --confirm-level=unsafe-yes to launch non-interactive system package commands.

opam lock files and reproducibility

When opam was first released, it had the mission of gathering together scattered OCaml source code to build a community repository. As time marches on, the size of the opam repository has grown tremendously, to over 3000 unique packages with over 19500 unique versions. opam looks at all these packages and is designed to solve for the best constraints for a given package, so that your project can keep up with releases of your dependencies.

While this works well for libraries, we need a different strategy for projects that need to test and ship using a fixed set of dependencies. To satisfy this use-case, opam 2.0.0 shipped with support for using project.opam.locked files. These are normal opam files but with exact versions of dependencies. The lock file can be used as simply as opam install . --locked to have a reproducible package installation.

With opam 2.1.0, the creation of lock files is also now integrated into the client:

opam lock will create a .locked file for your current switch and project, that you can check into the repository.
opam switch create . --locked can be used by users to reproduce your dependencies in a fresh switch.

This lets a project simultaneously keep up with the latest dependencies (without lock files) while providing a stricter set for projects that need it (with lock files).

Hint: You can export the full configuration of a switch using the new opam switch export options: --full to include all package metadata, and --freeze to freeze all VCS to their current commit.

Switch invariants

In opam 2.0, when a switch is created the packages selected are put into the “base” of the switch. These packages are not normally considered for upgrade, in order to ease pressure on opam's solver. This was a much bigger concern early on in opam 2.0's development, but is less of a problem with the default mccs solver.

However, it's a problem for system compilers. opam would detect that your system compiler version had changed, but be unable to upgrade the ocaml-system package unless you went through a slightly convoluted process with --unlock-base.

In opam 2.1, base packages have been replaced by switch invariants. The switch invariant is a package formula which must be satisfied on every upgrade and install. All existing switches' base packages could just be expressed as package1 & package2 & package3 etc. but opam 2.1 recognises many existing patterns and simplifies them, so in most cases the invariant will be "ocaml-base-compiler" {= "4.11.1"}, etc. This means that opam switch create my_switch ocaml-system now creates a switch invariant of "ocaml-system" rather than a specific version of the ocaml-system package. If your system OCaml package is updated, opam upgrade will seamlessly switch to the new package.

This also allows you to have switches which automatically install new point releases of OCaml. For example:

opam switch create ocaml-4.11 --formula='"ocaml-base-compiler" {>= "4.11.0" & < "4.12.0~"}' --repos=old=git+https://github.com/ocaml/opam-repository#a11299d81591
opam install utop

This creates a switch with OCaml 4.11.0 (the --repos= option is only there to select a version of opam-repository from before 4.11.1 was released). Now issue:

opam repo set-url old git+https://github.com/ocaml/opam-repository
opam upgrade

and opam 2.1 will automatically offer to upgrade OCaml to 4.11.1, along with a rebuild of the switch. There's not yet a clean CLI for specifying the formula, but we intend to iterate further on this with future opam releases so that there is an easier way of saying “install OCaml 4.11.x”.

Hint: You can set up a default invariant that will apply for all new switches, via a specific opamrc. The default one is ocaml >= 4.05.0

Configuring opam from the command-line

Configuring opam is not a simple task: you need to use an opamrc at the init stage, hack the global or switch config file, or use opam config var for additional variables. To ease that step and allow more consistent tweaking of the opam configuration, a new command was added: opam option.

For example:

opam option download-jobs gives the global download-jobs value (as it exists only in global configuration)
opam option jobs=6 --global will set the number of parallel build jobs opam is allowed to run (along with the associated jobs variable)
opam option depext-run-commands=false disables the use of sudo for handling system dependencies; it will be replaced by a prompt to run the installation commands
opam option depext-bypass=m4 --global bypasses the m4 system package check globally, while opam option depext-bypass=m4 --switch myswitch will only bypass it in the selected switch

The command opam var is extended with the same format, acting on switch and global variables.

Hint: to revert your changes, use opam option <field>= and the field will take its default value.

CLI Versioning

A new --cli switch was added to the first beta release, but it's only now that it's being widely used. opam is a complex enough system that sometimes bug fixes need to change the semantics of some commands. For example:

opam show --file needed to change behaviour
The addition of new controls for setting global variables meant that the opam config command was becoming cluttered, and some things needed to move to opam var
opam switch install 4.11.1 still works in opam 2.0, but it's really an OPAM 1.2.2 syntax.

Changing the CLI is exceptionally painful since it can break scripts and tools which themselves need to drive opam. CLI versioning is our attempt to solve this. The feature is inspired by the (lang dune ...) stanza in dune-project files which has allowed the Dune project to rename variables and alter semantics without requiring every single package using Dune to upgrade their dune files on each release.

Now you can specify which version of opam you expected the command to be run against. In day-to-day use of opam at the terminal, you wouldn't specify it, and you'll get the latest version of the CLI. For example: opam var --global is the same as opam var --cli=2.1 --global. However, if you issue opam var --cli=2.0 --global, you will be told that --global was added in 2.1 and so is not available to you. You can see similar things with the renaming of opam upgrade --unlock-base to opam upgrade --update-invariant.

The intention is that --cli should be used in scripts, user guides (e.g. blog posts), and in software which calls opam. The only decision you have to take is the oldest version of opam which you need to support. If your script is using a new opam 2.1 feature (for example opam switch create --formula=) then you simply don't support opam 2.0. If you need to support opam 2.0, then you can't use --formula and should use --packages instead. opam 2.0 does not have the --cli option, so for opam 2.0 instead of --cli=2.0 you should set the environment variable OPAMCLI to 2.0. As with all opam command line switches, OPAMCLI is simply the equivalent of --cli which opam 2.1 will pick-up but opam 2.0 will quietly ignore (and, as with other options, the command line takes precedence over the environment).
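
In a script that has to run against both opam 2.0 and opam 2.1, the environment variable form can be wrapped once and reused. The following is a minimal sketch: opam_compat and the OPAM override variable are names invented for this example, not something opam provides.

```shell
#!/bin/sh
# Hypothetical wrapper: runs opam with the 2.0 command-line interface.
# OPAMCLI is picked up by opam 2.1 and quietly ignored by opam 2.0.
# OPAM is an assumption of this sketch: it lets you point at another binary.
opam_compat() {
    OPAMCLI=2.0 "${OPAM:-opam}" "$@"
}
```

A script would then call, for instance, `opam_compat install .`, or `opam_compat switch create . --packages=ocaml-base-compiler.4.11.1` instead of the 2.1-only --formula form. Setting the variable per call rather than exporting it globally keeps it from leaking into unrelated commands.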

Note that opam 2.1 sets OPAMCLI=2.0 when building packages, so on the rare instances where you need to use the opam command in a package build: command (or in your build system), you must specify --cli=2.1 if you're using new features.

Since 2.1.0~rc2, CLI versioning applies to opam environment variables. The previous behavior was to ignore unknown or wrongly set environment variables; now you get a warning telling you that the environment variable won't be handled by this version of opam.

To avoid breaking compatibility for some widely used deprecated options, a default CLI is introduced: when no CLI is specified, those deprecated options are accepted. This concerns the opam exec and opam var subcommands.

There's even more detail on this feature in our wiki. We're hoping that this feature will make it much easier in future releases for opam to make required changes and improvements to the CLI without breaking existing set-ups and tools.

Note: For users of the opam libraries, since 2.1 environment variables are no longer loaded by the libraries, only by the opam client. You need to load them explicitly.

opam root portability

The opam root format changes over opam's life cycle: fields are added or removed and new files are added, so an older opam version can sometimes no longer read an upgraded or newly created opam root. The opam root format has been updated to allow new versions of opam to indicate that the root may still be read by older versions of the opam libraries. A plugin compiled against the 2.0.9 opam libraries will therefore be able to read information about an opam 2.1 root (plugins and tools compiled against 2.0.8 are unable to load opam 2.1.0 roots). This is read-only, best-effort access: any attempt to modify the opam root fails.

Hint: for opam libraries users, you can safely load states with OpamStateConfig load functions.

Tremendous thanks to all involved people, who've developed, tested & retested, helped with issue reports, comments, feedback...

Try it!

In case you plan a possible rollback, you may want to first back up your ~/.opam directory.

The upgrade instructions are unchanged:

Either from binaries: run

bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh) --version 2.1.0"

or download manually from the GitHub "Releases" page to your PATH.

Or from source, manually: see the instructions in the README.

You should then run:

opam init --reinit -ni

opam 2.0.9 release

2021-08-03T09:05:17Z

Feedback on this post is welcomed on Discuss!

We are pleased to announce the minor release of opam 2.0.9.

This new version contains some back-ported fixes.

New features

Back-ported ability to load upgraded roots read-only; allows applications compiled with opam-state 2.0.9 to load a root which has been upgraded to opam 2.1 [#4636]
macOS sandbox now supports OPAM_USER_PATH_RO for adding a custom read-only directory to the sandbox [#4589, #4609]
OPAMROOT and OPAMSWITCH now reflect the --root and --switch parameters in the package build [#4668]
When built with opam-file-format 2.1.3+, opam-format 2.0.x displays better errors for newer opam files [#4394]

Bug fixes

Linux sandbox now mounts host $TMPDIR read-only, then sets the sandbox $TMPDIR to a new separate tmpfs. Hardcoded /tmp access no longer works if TMPDIR points to another directory [#4589]
Stop clobbering DUNE_CACHE in the sandbox script [#4535, fixing ocaml/dune#4166]
Ctrl-C now correctly terminates builds with bubblewrap; sandbox now requires bubblewrap 0.1.8 or later [#4400]
Linux sandbox script no longer makes PWD read-write on remove actions [#4589]
Lint W59 and E60 no longer trigger for packages flagged conf [#4549]
Reduce the length of temporary file names for pin caching to ease pressure on Windows [#4590]
Security: correct quoting of arguments when removing switches [#4707]
Stop advertising the removed option --compiler when creating local switches [#4718]
Pinning no longer fails if the archive's opam file is malformed [#4580]
Fish: stop using deprecated ^ syntax to fix support for Fish 3.3.0+ [#4736]

Installation instructions (unchanged):

From binaries: run

bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh) --version 2.0.9"

or download manually from the GitHub "Releases" page to your PATH. In this case, don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed or to update your sandbox script.

From source, using opam:

opam update; opam install opam-devel

(then copy the opam binary to your PATH as explained, and don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed or to update your sandbox script)

From source, manually: see the instructions in the README.

We hope you enjoy this new minor version, and remain open to bug reports and suggestions.

Detecting identity functions in Flambda

2021-07-16T09:05:17Z

In some discussions among OCaml developers around the empty type (PR#9459), some people mused about the possibility of annotating functions with an attribute telling the compiler that the function should be trivial, and always return a value strictly equivalent to its argument.
Curious about the feasibility of implementing this feature, we advertised an internship with our compiler team aimed at exploring this subject.
We welcomed Léo Boitel for three months to work on this topic, with Vincent Laviron as mentor, and we're proud to let him show off what he has achieved in this post.

The problem at hand

OCaml's strong typing system is one of its main perks: it allows writing safe code thanks to the abstraction it provides. Most basic design mistakes directly result in a typing error, and the user cannot mess up the memory, as it is automatically handled by the compiler and runtime.

However, these perks keep a power-user from implementing some optimizations, in particular those linked to memory representation as they cannot be accessed directly.

A good example would be this piece of code:

type return = Ok of int | Failure
let id = function
| Some x -> Ok x
| None -> Failure

In terms of memory representation, this function is indeed the identity function. Some x and Ok x share the same representation (and so do None and Failure). However, this identity is invisible to the user. Even if the user knows the representation is the same, they would need to use this function to avoid a typing error.

Another good example would be this one:

type record = { a:int; b:int }
let id (x,y) = { a = x; b = y }

Even if those functions are the identity, they come with a cost: not only do they cost a function call, they reallocate the result instead of just returning their argument directly. Detecting those functions would allow us to produce interesting optimizations.

Hurdles

If we want to detect identities, we quickly hit the problem of recursive functions: how does one recognize identity in those cases? Can a function be an identity if it doesn't always terminate, or if it never does?

Once we have a good definition of what exactly an identity function is, we still need to prove that an existing function fits the definition. Indeed, we want to assure the user that this optimization will not change the observable behavior of the program.

We also want to avoid breaking type safety. As an example, consider the following function:

let rec fake_id = function
| [] -> 0
| t::q -> fake_id (t::q)

A naive induction proof would allow us to replace this function with the identity, as [] and 0 share the same memory representation. However, this is unsafe as applying this to a non-empty list would return a list even if this function has an int type (we'll talk more about it later).

To tackle those challenges, we started the internship with a theoretical study that lasted for three quarters of the allocated time, and then implemented a practical solution in the Flambda representation of the compiler.

Theoretical results

We worked on extensions of lambda-calculus (implemented in OCaml) in order to gradually test our ideas in a simpler framework than the full Flambda.

Pairs

We started with a lambda-calculus to which we only added the concept of pairs. To prove identities, every function has to be annotated as identity or not. We then prove these annotations by β-reducing the function bodies. After each recursive reduction, we apply a rule that states that a pair made of the first and second projection of a variable is equal to that variable. We do not reduce applications, but we replace them by the argument if the concerned function is annotated as identity.

Using this method, we maintain a reasonable complexity compared to a full β-reduction which would be unrealistic on a big program.

We then add higher-order capabilities by allowing annotations of the form Annotation → Annotation. Functions such as List.map can thus be abstracted as Id → Id. Even though this solution doesn't cover every case, most real-world uses are recognized by these patterns.

Tuple reconstruction

We then move from just pairs to tuples of arbitrary size. This adds a new problem: if we make a pair out of the first two fields of a variable, this is no longer necessarily that variable, as it may have more than two fields.

We then have two solutions: we can first annotate every projection with the size of the involved tuple to know if we are indeed reconstructing the entire variable. As an example, if we make a pair from the fields of a triplet, we know there is no way to simplify this reconstruction.
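A minimal sketch of the first solution, assuming a hypothetical term type in which every projection records the arity of the tuple it reads:

```ocaml
(* Hypothetical terms: each projection carries the size of the source tuple. *)
type term =
  | Var of string
  | Tuple of term list
  | Proj of { index : int; size : int; arg : term }

(* [rebuilds fields] returns [Some v] when [fields] is exactly
   (proj 0 v, proj 1 v, ..., proj (n-1) v) and every projection states
   that v has n fields. Otherwise the tuple is a genuinely new value. *)
let rebuilds (fields : term list) : term option =
  let n = List.length fields in
  match fields with
  | Proj { arg; _ } :: _ ->
    let ok =
      List.for_all Fun.id
        (List.mapi
           (fun i f ->
              match f with
              | Proj { index; size; arg = a } ->
                index = i && size = n && a = arg
              | _ -> false)
           fields)
    in
    if ok then Some arg else None
  | _ -> None
```

A pair built from the first two fields of a triplet fails the size check, so it is not simplified.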

Another solution, more ambitious, is to adopt a less restrictive definition of equality and to allow the replacement of (x,y) by (x,y,z). Indeed, if the variable was typed as a pair, we are guaranteed that the third field will never be accessed. The behavior of the program will therefore never be affected by this extension.

Though this avoids a lot of allocations, it may also increase memory usage in some cases: even once the triplet itself is no longer used, it cannot be deallocated by the Garbage Collector (GC), and the field z is kept in memory for as long as (x,y) is still accessible.

This approach remains interesting to us, as long as it is manually enabled by the user for some specific blocks.

Recursion

We now add recursive definitions to our language, through the use of a fixpoint operator.

To prove that a recursive function is the identity, we have to use induction. The main difficulty is to prove that the function indeed terminates to ensure the validity of the induction.

We can separate this into three different levels of safety. The first option is to not prove termination at all, and let the user state which functions they know will return. We can then assume the function is the identity and simplify its body under that hypothesis. This approach is enough for most practical cases, but its main problem is that it allows unsafe code to be written (as we've already seen).

Our second option is to limit our induction hypothesis to recursive applications on "smaller" elements than the argument. An element is defined as smaller if it is a projection of the argument or a projection of a smaller element. This is not enough to prove that the function will terminate (the argument might be cyclic, for example) but is enough to ensure type safety. The reason is that any possibly returned value is constructed (as it cannot directly come from a recursive call) and therefore has a defined type. Typing would fail if the function were to return a value that cannot be identified with its argument.

Finally, we may want to establish a perfect equivalence between the function and the identity function before simplifying it. In that case, we propose to create a special annotation for functions that are the identity when applied to a non-cyclical object. We can prove they have this property with the already described induction. The difficulty now lies in applying the simplification only to valid applications: if an object is immutable, wasn't recursively defined and is made of components that also have that property, we can declare that object inductive and simplify applications on it. The inductive state of variables can be propagated during our recursive pass of optimization.
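The "smaller" relation can be written down directly. The term type below is a hypothetical simplification used only to illustrate the definition:

```ocaml
(* Hypothetical terms, restricted to what the definition needs. *)
type term =
  | Var of string
  | Proj of int * term   (* field access *)
  | Block of term list   (* freshly constructed value *)

(* [smaller ~arg t] holds when [t] is a projection of the argument [arg],
   or a projection of an element that is itself smaller. *)
let rec smaller ~(arg : string) (t : term) : bool =
  match t with
  | Proj (_, Var v) -> v = arg
  | Proj (_, u) -> smaller ~arg u
  | Var _ | Block _ -> false
```

In fake_id, the recursive call is made on t::q, which is a reconstructed block rather than a projection, so the induction hypothesis would not apply to it under this rule.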

Block reconstruction

The representation of blocks in Flambda provides interesting challenges in terms of equality detection, which is often crucial to prove an identity. It is very hard to detect an identical block reconstruction.

Blocks in Flambda

Variants

The blocks in Flambda come from the existence of variants in OCaml: one type may have several different constructors, as we can see in

type choice = A of int | B of int

When OCaml is compiled to Flambda, the information about which constructor was used is lost and replaced by a tag. The tag is a number between 0 and 255, stored in the header of the object's memory representation, that identifies the constructor. As an example, an element of type choice would have tag 0 for the A constructor, and 1 for B.

That tag is kept at runtime, which makes it possible, for example, to implement pattern matching as a simple switch in Flambda that compares the tag to decide which branch to execute next.
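The tags can be inspected at runtime with the standard (unsafe) Obj module. This is purely an illustration; tag_of is a helper name introduced here:

```ocaml
type choice = A of int | B of int

(* Read the tag stored in the header of the block representing [v]. *)
let tag_of (v : choice) : int = Obj.tag (Obj.repr v)

let () = Printf.printf "A: %d, B: %d\n" (tag_of (A 7)) (tag_of (B 7))
(* prints "A: 0, B: 1" *)
```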

This system complicates our task as Flambda's typing doesn't inform us which type the constructor is supposed to have, and therefore keeps us from easily knowing if two variants are indeed equal.

Tag generalization

To complicate things, tags are actually used for any block, meaning tuples, modules or functions (as a matter of fact, almost anything but constant constructors and integers). If the object is not a variant, it will usually have tag 0. This tag is never read (as there are no constructors to differentiate) but keeps us from simply comparing two tuples, because Flambda will simply see two blocks of unknown tag.
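Again using Obj for illustration only, one can check that a tuple is a block with tag 0 while integers and constant constructors are not blocks at all. The color type is a made-up example:

```ocaml
type color = Red | Green

(* Integers and constant constructors are immediates, not blocks. *)
let red_is_immediate : bool = Obj.is_int (Obj.repr Red)
let green_is_immediate : bool = Obj.is_int (Obj.repr Green)
let int_is_immediate : bool = Obj.is_int (Obj.repr 42)

(* A tuple is a block and carries tag 0, although nothing ever reads it. *)
let tuple_is_block : bool = Obj.is_block (Obj.repr (1, "a"))
let tuple_tag : int = Obj.tag (Obj.repr (1, "a"))
```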

Inlining

Finally, this system is optimized by inlining tuples: if a variant has the shape Pair of int * int, it is not stored as the Pair tag plus a pointer to a separate pair; instead it is flattened into a single block (tag Pair, int, int).

This also means that variants can have an arbitrary size, which is also unknown in Flambda.
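The effect of inlining on block size can be seen by comparing a constructor with two arguments to one taking a single tuple argument. The type names are illustrative and Obj is used only for inspection:

```ocaml
type inlined = Pair of int * int    (* two fields stored in the block itself *)
type boxed = Boxed of (int * int)   (* one field pointing to a separate tuple *)

let size_inlined : int = Obj.size (Obj.repr (Pair (1, 2)))  (* 2 *)
let size_boxed : int = Obj.size (Obj.repr (Boxed (1, 2)))   (* 1 *)
```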

Existing approach

A partial solution to the problem already existed in a Pull Request (PR) you can read here.

The chosen approach in this PR is the natural one: we use the switch to gain information on the tag of a block, depending on the branch taken. The PR also makes the mutability and size of the block known in each branch, starting from OCaml (where this information is explicit in the pattern matching) and propagating the knowledge to Flambda.

This makes it possible to register every block on which a switch is done, along with its tag, size and mutability. We can then detect if one of them is reconstructed with the use of a Pmakeblock primitive.

Unfortunately, this path has its limits as there are numerous cases where the tag and size could be known without performing a switch on the value. As an example, this doesn't allow the simplification of tuple reconstruction.

New solution

Our new solution has to propagate more information from OCaml into Flambda. This propagation is based on two PRs that already existed for Flambda 2, which annotate each projection (Pfield) in the lambda representation with typing information. One adds the block's mutability, the other its tag and its size.

Our first contribution was to translate these PRs to Flambda 1, and to propagate from lambda to Flambda correctly.

We then had access to all the information needed to detect and prove block reconstruction: not only do we have a list of blocks that were pattern-matched, we can also build a list of partially immutable blocks, meaning blocks for which we know that some fields are immutable.

Here's how we use it:

Block discovery

As soon as we find a projection, we verify whether it is done on an immutable block of known size. If so, we add that block to the list of partial blocks. We verify that the information we have on the tag and size are compatible with the already known projections. If all of the fields of the block are known, the block is added to the list of simplifiable blocks.

Of course, we also keep track of known blocks through switches.
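The discovery step can be sketched as follows. Blocks are identified by plain strings, and the env type, the discover function and the choice to drop a block when the information is inconsistent are all assumptions made for this example, not the actual Flambda implementation.

```ocaml
module IntSet = Set.Make (Int)
module StrMap = Map.Make (String)

(* What has been learned so far about one immutable block. *)
type partial = { tag : int; size : int; seen : IntSet.t }

type env = {
  partial : partial StrMap.t;  (* blocks with some known fields *)
  simplifiable : string list;  (* blocks whose fields are all known *)
}

let empty : env = { partial = StrMap.empty; simplifiable = [] }

(* Record a projection of field [index] on the immutable block [block],
   annotated with the block's [tag] and [size]. *)
let discover (env : env) ~block ~tag ~size ~index : env =
  let seen =
    match StrMap.find_opt block env.partial with
    | Some p when p.tag = tag && p.size = size ->
      Some (IntSet.add index p.seen)
    | Some _ -> None (* assumption: incompatible information, forget the block *)
    | None -> Some (IntSet.singleton index)
  in
  match seen with
  | None -> { env with partial = StrMap.remove block env.partial }
  | Some seen ->
    let partial = StrMap.add block { tag; size; seen } env.partial in
    let simplifiable =
      if IntSet.cardinal seen = size && not (List.mem block env.simplifiable)
      then block :: env.simplifiable
      else env.simplifiable
    in
    { partial; simplifiable }
```

After both fields of a two-field block have been projected, the block moves to the simplifiable list.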

Simplification

This part is similar to the original PR: when an immutable block is met, we check whether this block is known as simplifiable. In that case we avoid a reallocation.

Compared to the original approach, we also reduced the asymptotic complexity (from quadratic to linear) by registering the association of every projection variable to its index and original block. We also modified some implementation details that could have triggered a bug when associated with our PR.
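A possible shape for that association, sketched with hypothetical names: each projection variable maps to its source block and field index, so checking a makeblock needs one lookup per argument rather than a scan over every known block.

```ocaml
module StrMap = Map.Make (String)

(* [origin] maps a projection variable to (source block, field index).
   [size_of] returns the known size of a block, if any.
   Returns [Some b] when [args] are exactly the fields 0..n-1 of block [b]
   and [b] has size n. *)
let reconstructed_block
    (origin : (string * int) StrMap.t)
    ~(size_of : string -> int option)
    (args : string list) : string option =
  match args with
  | [] -> None
  | first :: _ ->
    (match StrMap.find_opt first origin with
     | None -> None
     | Some (block, _) ->
       let rec check i = function
         | [] -> size_of block = Some i
         | v :: rest ->
           (match StrMap.find_opt v origin with
            | Some (b, idx) when b = block && idx = i -> check (i + 1) rest
            | _ -> false)
       in
       if check 0 args then Some block else None)
```

A hash table would give constant-time lookups; a map is used here only to keep the sketch purely functional.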

Example

Let's consider this function:

type typ1 = A of int | B of int * int
type typ2 = C of int | D of {x:int; y:int}
let id = function
  | A n -> C n
  | B (x,y) -> D {x; y}

The current compiler would produce the resulting Flambda output:

End of middle end:
let_symbol
  (camlTest__id_21
    (Set_of_closures (
      (set_of_closures id=Test.8
        (id/5 = fun param/7 ->
          (switch*(0,2) param/7
           case tag 0:
            (let
              (Pmakeblock_arg/11 (field 0<{../../test.ml:4,4-7}> param/7)
               Pmakeblock/12
                 (makeblock 0 (int)<{../../test.ml:4,11-14}>
                   Pmakeblock_arg/11))
              Pmakeblock/12)
           case tag 1:
            (let
              (Pmakeblock_arg/15 (field 1<{../../test.ml:5,4-11}> param/7)
               Pmakeblock_arg/16 (field 0<{../../test.ml:5,4-11}> param/7)
               Pmakeblock/17
                 (makeblock 1 (int,int)<{../../test.ml:5,17-23}>
                   Pmakeblock_arg/16 Pmakeblock_arg/15))
              Pmakeblock/17)))
         free_vars={ } specialised_args={}) direct_call_surrogates={ }
        set_of_closures_origin=Test.1])))
  (camlTest__id_5_closure (Project_closure (camlTest__id_21, id/5)))
  (camlTest (Block (tag 0,  camlTest__id_5_closure)))
End camlTest

Our optimization detects that this function reconstructs a similar block and can therefore simplify it:

End of middle end:
let_symbol
  (camlTest__id_21
    (Set_of_closures (
      (set_of_closures id=Test.7
        (id/5 = fun param/7 ->
          (switch*(0,2) param/7
           case tag 0 (1): param/7
           case tag 1 (2): param/7))
         free_vars={ } specialised_args={}) direct_call_surrogates={ }
        set_of_closures_origin=Test.1])))
  (camlTest__id_5_closure (Project_closure (camlTest__id_21, id/5)))
  (camlTest (Block (tag 0,  camlTest__id_5_closure)))
End camlTest

Possible improvements

Equality relaxation

We can use observational equality studied in the theoretical part for block equality in order to avoid more allocations. The implementation is simple:

When a block is created, to know if it will be allocated, the normal course of action is to check if all of its fields are the known projections of another block, with the same index, and if the block sizes are the same. We can just remove that last check.

Implementing this was a bit more tricky because of several practical details. First, we want that optimization to be triggered only on user-annotated blocks, so we had to propagate that annotation to Flambda.

Additionally, if we only implement that optimization, numerous optimization cases will be ignored because unused variables are simplified before our optimization pass. As an example, if a function looks like

let loose_id (a,b,c) = (a,b)

The c variable will be simplified away before reaching Flambda, and there will be no way to prove that (a,b,c) is immutable, as its third field might not be. This problem is being solved in Flambda 2 thanks to a PR that propagates mutability information for every block, but we didn't have the time necessary to port it to Flambda 1.

Detecting recursive identities

Now that we can detect block reconstruction, we're left with solving the problem of recursive functions.

Unsafe approach

We began the implementation of a pass that contains no termination proof. The idea is to add the proof later, or to authorize non-terminating functions to be simplified as long as they type correctly (see previously in the theory part).

For now, we trust the user to verify these properties manually.

Hence, we modified the function simplification procedure: when a function with a single argument is simplified, we first assume that this function is the identity before simplifying its body. We then check whether the result is equivalent to an identity by recursively going through it, so as to cover as many cases as possible (for example in conditional branches). If it is the case, the function is replaced by the identity; otherwise, we go back to a normal simplification, without the induction hypothesis.

Constant propagation

We took some time to improve our code that checks whether the body of a function is an identity or not, so as to handle constant values. It propagates identity information we have on an argument during conditional branching.

This way, on a function like

type truc = A | B | C
let id = function
  | A -> A
  | B -> B
  | C -> C

or even

let id x = if x=0 then 0 else x

We can successfully detect identity.
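The branch-sensitive check for the if form can be sketched on a tiny, hypothetical expression type. The real pass works on Flambda terms and also handles switches, which this example does not cover.

```ocaml
(* Hypothetical body language for a one-argument function. *)
type expr =
  | Arg                         (* the function's argument *)
  | Const of int
  | If_eq of int * expr * expr  (* if arg = k then e1 else e2 *)

(* [is_identity known e] checks that [e] always evaluates to the argument.
   [known] is [Some k] when the enclosing branch tells us that arg = k. *)
let rec is_identity (known : int option) (e : expr) : bool =
  match e with
  | Arg -> true
  | Const k -> known = Some k
  | If_eq (k, then_, else_) ->
    is_identity (Some k) then_ && is_identity known else_
```

The body of let id x = if x=0 then 0 else x is encoded as If_eq (0, Const 0, Arg) and is accepted, since the constant 0 is only returned in the branch where the argument is known to be 0.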

Examples

Recursive functions

We can now detect recursive identities:

let rec listid = function
  | t::q -> t::(listid q)
  | [] -> []

Used to compile to:

End of middle end:
let_rec_symbol
  (camlTest__listid_5_closure
    (Project_closure (camlTest__set_of_closures_20, listid/5)))
  (camlTest__set_of_closures_20
    (Set_of_closures (
      (set_of_closures id=Test.11
        (listid/5 = fun param/7 ->
          (if param/7 then begin
            (let
              (apply_arg/13 (field 1<{../../test.ml:9,4-8}> param/7)
               apply_funct/14 camlTest__listid_5_closure
               Pmakeblock_arg/15
                 *(apply*[listid/5]<{../../test.ml:9,15-25}> apply_funct/14
                    apply_arg/13)
               Pmakeblock_arg/16 (field 0<{../../test.ml:9,4-8}> param/7)
               Pmakeblock/17
                 (makeblock 0<{../../test.ml:9,12-25}> Pmakeblock_arg/16
                   Pmakeblock_arg/15))
              Pmakeblock/17)
            end else begin
            (let (const_ptr_zero/27 Const(0a)) const_ptr_zero/27) end))
         free_vars={ } specialised_args={}) direct_call_surrogates={ }
        set_of_closures_origin=Test.1])))
let_symbol (camlTest (Block (tag 0,  camlTest__listid_5_closure)))
End camlTest

But is now detected as being the identity:

End of middle end:
let_symbol
  (camlTest__set_of_closures_20
    (Set_of_closures (
      (set_of_closures id=Test.13 (listid/5 = fun param/7 -> param/7)
        free_vars={ } specialised_args={}) direct_call_surrogates={ }
        set_of_closures_origin=Test.1])))
  (camlTest__listid_5_closure
    (Project_closure (camlTest__set_of_closures_20, listid/5)))
  (camlTest (Block (tag 0,  camlTest__listid_5_closure)))
End camlTest

Unsafe example

However, we can use the unsafety of the feature to go around the typing system and access a memory address as if it was an integer:

type bugg = A of int*int | B of int
let rec bug = function
  | A (a,b) -> (a,b)
  | B x -> bug (B x)
  
let (a,b) = (bug (B 42))
let _ = print_int b

This function will be simplified to the identity even though the bugg type is not compatible with tuples; projecting on the second field of a B value, which only has one field, accesses an undefined part of memory:

$ ./unsafe.out
47423997875612

Possible improvements - short term

Function annotation

A theoretically simple thing to add would be to let the choice of applying unsafe optimizations to the user. We lacked the time to do the work of propagating the information to Flambda, but it would not be hard to implement.

Order on arguments

For a safer optimization, we could use the idea developed in the theoretical part to make the optimization correct on non-cyclical objects and more importantly give us typing guarantees to avoid the problem we just saw.

To get this guarantee, we would have to change the simplification pass by adding an optional function-argument pair to the environment. When this option is set, the pair indicates that we are inside the body of the function being simplified and that applications on smaller elements can be simplified as identity. Of course, the pass would need to be modified to remember which elements are not smaller than the previous argument.

Possible improvements - long term

Exclusion of cyclical objects

As described in the theoretical part, we could recursively deduce which objects are cyclical and attempt to remove them from our optimization. The problem is then that instead of having to replace functions by the identity, we need to add a special annotation that represents IdRec.

This amounts to a lot of added implementation complexity when compiling over several files, as we need access to the interface of already compiled files to know when the optimization can be used.

A possibility would be to use .cmx files to store this information when the file is compiled, but that kind of work would have taken too long to be achieved during the internship. Moreover, the practicality of that choice is far from obvious: it would complicate the optimization pass for a small improvement over a version that is correct on non-cyclical objects and activated through annotations.

Detecting identity functions in Flambda (French edition)

2021-07-15T09:05:17Z

During discussions among OCaml developers about the empty type (PR#9459), some toyed with the idea of annotating functions with an attribute telling the compiler that the function should be trivial and always return a value strictly equivalent to its argument. We were curious to see whether implementing such a feature would be possible, and we published an internship offer to explore the subject. OCamlPro's Compilation team thus hosted Léo Boitel for three months to work on it, with Vincent Laviron as his supervisor. We are proud of the results Léo achieved!

Here is what Léo wrote about it 🙂

Description of the problem

OCaml's strong typing is one of its great advantages: it lets us write safer code thanks to the abstraction it offers. Most design errors translate directly into typing errors, and the user cannot make mistakes when manipulating memory, since it is entirely managed by the compiler.

However, these advantages prevent the user from performing some optimizations themselves, in particular those related to memory representation, since they have no direct access to it.

A classic case would be the following:

type return = Ok of int | Failure
let id = function
| Some x -> Ok x
| None -> Failure

This function is an identity, because the memory representation of Some x and Ok x is the same (likewise for None and Failure). However, the user cannot see this, and even if they could, they would still need this function to keep the typing correct.

Another good example would be this one:

type record = { a:int; b:int }
let id (x,y) = { a = x; b = y }

Even though these functions are identities, they come at a cost: on top of a function call, they reallocate the result instead of returning their argument directly. This is why detecting them would enable interesting optimizations.

Hurdles

If we want to detect identities, we quickly run into the problem of recursive functions: how should identity be defined for them? Can a function be the identity if it does not always terminate, or never does?

Once identity is defined, the problem is proving that a function really is the identity. Indeed, we want to guarantee to the user that this optimization will not change the observable behavior of the program.

We also want to avoid introducing type-safety holes. For example, suppose we have a function of the following form:

let rec fake_id = function
| [] -> 0
| t::q -> fake_id (t::q)

A naive proof by induction would have us replace this function with the identity, because [] and 0 have the same memory representation. This is dangerous because the result of applying it to a non-empty list would be a list even though it is typed as an integer (see examples below).

To solve these problems, we started with a theoretical part that took up three quarters of the internship, and finished with a practical part implementing the results in Flambda.

Theoretical results

For this part, we worked on extensions of lambda-calculus, implemented in OCaml, so that we could test our ideas as we went in a simpler setting than Flambda.

Pairs

We started with a lambda-calculus to which we add only pairs. To carry out our proofs, we annotate every function as an identity or not. We then prove these annotations by β-reducing the function bodies. After each recursive reduction, we apply a rule stating that a pair made of the two projections of a variable is equal to that variable. We do not reduce applications, but we replace them with their argument if the function is annotated as an identity.

We thus keep a reasonable complexity compared with a full β-reduction, which would obviously be unrealistic for large programs.

We then move to higher order by allowing annotations of the form Annotation → Annotation. Functions such as List.map can therefore be represented as Id → Id. Although this solution is not complete, it covers the vast majority of use cases.

Tuple reconstruction

We then move from pairs to tuples of arbitrary size. This makes the problem harder: if we build a pair from the projections of the first two fields of a variable, the result is not necessarily that variable, since it may have more fields.

We then have two solutions. First, we can annotate projections with the size of the tuple, to know whether we are rebuilding the whole variable. For example, if we rebuild a pair from two projections of a triplet, we know that this reconstruction cannot be simplified.

The other, more ambitious solution is to adopt a less strict definition of equality, and to say that we can replace, for example, (x,y) by (x,y,z). Indeed, if the variable was typed as a pair, we are guaranteed that field z will never be accessed anyway. The behavior of the program is therefore the same if the variable is extended with additional fields.

Using observational equality avoids many allocations, but it may use more memory in some cases: even once the triplet itself is no longer used, it cannot be deallocated by the Garbage Collector (GC), and field z therefore stays in memory for nothing as long as (x,y) is in use.

This approach remains interesting, at least if the user is given the possibility of enabling it manually for certain blocks.

Recursion

We now add recursive definitions to our language, by means of a fixpoint operator.

To prove that a recursive function is the identity, we must proceed by induction. The difficulty is then to prove that the function terminates, so that the induction is sound.

We can distinguish three levels of proof. The first option is to not prove termination, and to let the user choose the functions they are sure will terminate. We then assume that the function is the identity and simplify its body under that hypothesis. This approach is sufficient for most practical cases, but its main problem is that it allows code that breaks type safety to be written, as discussed above.

The second option is to apply our induction hypothesis only to applications of the function on elements "smaller" than the argument. An element is defined as smaller if it is a projection of the argument, or a projection of a smaller element. This is not enough to prove that the function terminates (for example if the argument is cyclic), but it is enough to get safe typing. Indeed, it implies that every possible return value of the function is constructed (since it cannot come directly from a recursive call), and therefore has a defined type. Typing would thus fail if the function could return a value that cannot be identified with its argument.

Finally, we may want a perfect observational equivalence between the function and the identity before simplifying it. In that case, the solution we propose is to create a special annotation for functions that are the identity when applied to a non-cyclic object. We can prove that they have this property with the induction described above. The difficulty is then to apply the simplification to the right applications: if an object is immutable, is not recursively defined, and all of its sub-objects satisfy this property, we call it inductive and can simplify applications on it. The inductive status of objects is propagated during our recursive optimization pass.

Block reconstruction

The representation of blocks in Flambda raises interesting problems when detecting their equality, which is often necessary to prove an identity. Indeed, it is hard to detect that a block is being rebuilt identically.

Blocks in Flambda

Variants

The blocks in Flambda come from the existence of variants in OCaml: one type may have several different constructors, as we can see in

type choice = A of int | B of int

When OCaml is compiled to Flambda, the information about which constructor an object was built with is lost and replaced by a tag. The tag is a number between 0 and 255, stored in a header of the object's memory representation, that identifies the object's constructor. For example, an object of type choice would have tag 0 if it is an A and 1 if it is a B.

The tag is therefore present in memory at runtime, which makes it possible, for example, to implement OCaml's pattern matching as a switch in Flambda that performs simple comparisons on the tag to decide which branch to take.

This system complicates our task, since Flambda's typing does not tell us which constructor type a variant holds, and so prevents us from easily deciding whether two variants are equal.

Tag generalization

To add to the complexity, tags are in fact used for all blocks, that is tuples, modules and functions (in fact almost every value except integers and constant constructors). When the object is not a variant, it is generally given tag 0. This tag is then never read (since no match is done on the object), but it prevents us from simply comparing two tuples, since in Flambda we will just see two objects of unknown tag.

Inlining

Finally, this system is optimized by inlining tuples: for a variant of type Pair of int*int, instead of being represented as the Pair tag plus a memory address pointing to a pair (hence a tag 0 and the two integers), the pair is inlined and the object has the form (tag Pair, integer, integer).

This implies that variants have an arbitrary size, which is also unknown in Flambda.

Existing approach

A partial solution to the problem already existed in a Pull Request (PR) available here.

The approach adopted there is a natural one: switches are used to gain information on the tag of a block, depending on the branch taken. The PR also makes it possible to know the mutability and size of the block in each branch, starting from OCaml (where the information is known because the constructor is explicit in the match) and propagating it down to Flambda.

This makes it possible to record in the environment every block on which a switch has been performed, with its tag, size and mutability. We can then detect whether one of them is being rebuilt with the Pmakeblock primitive.

This approach is unfortunately limited, since there are many cases where the tag and size of the block could be known without performing a switch on it. For example, a tuple reconstruction can never be simplified with this solution.

New approach

Our new approach therefore starts by propagating more information from OCaml. The propagation is based on two PRs that existed for Flambda 2 and that annotate, in lambda, each projection (Pfield) with information derived from OCaml typing. One adds the block's mutability, and the other its tag and its size.

Our first contribution was to adapt these PRs to Flambda 1 and to propagate the information correctly from lambda to Flambda.

We then have the information needed to detect block reconstructions: in addition to a list of blocks on which a switch was performed, we build a list of partially immutable blocks, that is, blocks for which some fields are known to be immutable.

We use it as follows:

Block discovery

As soon as we see a projection, we check whether it is performed on an immutable block of known size. If so, we add the corresponding block to the partial blocks. We check that the information we have on the tag and size is compatible with that of the previously seen projections of this block. If we now know all the fields of the block, we add it to our list of known blocks on which simplifications can be performed.

We also keep the information on the blocks we know through switches.

Simplification

This part is similar to that of the original PR: when an immutable block is built, we check whether we know it, and if so we do not reallocate it.

Compared with the original approach, we also reduced the complexity of the original PR (from quadratic to linear) by recording the association of each projection variable with its index and original block. We also changed some details of the original implementation that could have caused a bug when combined with our PR.

Example

Let's consider this function:

type typ1 = A of int | B of int * int
type typ2 = C of int | D of {x:int; y:int}
let id = function
  | A n -> C n
  | B (x,y) -> D {x; y}

The current compiler would produce the following Flambda:

End of middle end:
let_symbol
  (camlTest__id_21
    (Set_of_closures (
      (set_of_closures id=Test.8
        (id/5 = fun param/7 ->
          (switch*(0,2) param/7
           case tag 0:
            (let
              (Pmakeblock_arg/11 (field 0<{../../test.ml:4,4-7}> param/7)
               Pmakeblock/12
                 (makeblock 0 (int)<{../../test.ml:4,11-14}>
                   Pmakeblock_arg/11))
              Pmakeblock/12)
           case tag 1:
            (let
              (Pmakeblock_arg/15 (field 1<{../../test.ml:5,4-11}> param/7)
               Pmakeblock_arg/16 (field 0<{../../test.ml:5,4-11}> param/7)
               Pmakeblock/17
                 (makeblock 1 (int,int)<{../../test.ml:5,17-23}>
                   Pmakeblock_arg/16 Pmakeblock_arg/15))
              Pmakeblock/17)))
         free_vars={ } specialised_args={}) direct_call_surrogates={ }
        set_of_closures_origin=Test.1])))
  (camlTest__id_5_closure (Project_closure (camlTest__id_21, id/5)))
  (camlTest (Block (tag 0,  camlTest__id_5_closure)))
End camlTest

Our improvement detects that this function rebuilds similar blocks and therefore simplifies it:

End of middle end:
let_symbol
  (camlTest__id_21
    (Set_of_closures (
      (set_of_closures id=Test.7
        (id/5 = fun param/7 ->
          (switch*(0,2) param/7
           case tag 0 (1): param/7
           case tag 1 (2): param/7))
         free_vars={ } specialised_args={}) direct_call_surrogates={ }
        set_of_closures_origin=Test.1])))
  (camlTest__id_5_closure (Project_closure (camlTest__id_21, id/5)))
  (camlTest (Block (tag 0,  camlTest__id_5_closure)))
End camlTest

Pistes d’amélioration

Relâchement de l’égalité

On peut utiliser l’égalité observationnelle étudiée dans la partie théorique pour l’égalité de blocs, afin d’éviter plus d’allocations. L’implémentation est simple :

Quand on crée un bloc, pour voir si il est alloué, l’approche normale est de regarder si chacun de ses champs est une projection connue d’un autre bloc, a le même index et si les deux blocs sont de même taille. On peut simplement supprimer la dernière vérification.

The implementation turned out to be a bit harder than expected because of practical details. First, we want to apply this optimisation only to certain blocks annotated by the user, so the annotation has to be propagated all the way to Flambda.

Moreover, if we merely implement the optimisation, many cases will be missed because unused variables are simplified away before our pass. For example, take a function of the following form:

let loose_id (a,b,c) = (a,b)

The variable c will be simplified away before reaching Flambda, so we can no longer prove that (a,b,c) is immutable, since its third field might not be. This problem is about to be solved in Flambda 2 thanks to a PR that propagates mutability information for all blocks, but we did not have the time needed to adapt it to Flambda 1.

Detecting recursive identities

Now that we can detect block reconstructions, the problem of recursive functions remains to be solved.

Approach without guarantees

We started by implementing an approach that comes with no termination proof. The idea is either to add the proof later, or to allow functions that do not always terminate to be simplified provided they are correct with respect to typing (see section 7 in the theoretical part).

Here, we trust the user to check these properties manually.

We therefore modified function simplification: when simplifying a function with a single argument, we start by assuming that this function is the identity before simplifying its body. We then check whether the result is equivalent to an identity by traversing it recursively, so as to cover as many cases as possible (for example conditional branches). If it is, the function is replaced by the identity; otherwise, we fall back to a classic simplification, without the induction hypothesis.
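As a rough illustration of this strategy, here is a minimal sketch on a toy expression language. The constructors and the functions `simplify`, `is_identity` and `simplify_function` are hypothetical and much simpler than the real Flambda pass; in particular the tag and size checks on blocks are omitted.

```ocaml
(* Toy expression language; illustrative only. *)
type expr =
  | Arg                      (* the single argument of the function *)
  | Field of int * expr      (* field projection *)
  | Block of int * expr list (* allocation of a block with a tag *)
  | Rec_call of expr         (* recursive call to the function itself *)
  | If of expr * expr * expr (* conditional *)

(* Simplify a body under the hypothesis that recursive calls are identities. *)
let rec simplify = function
  | Arg -> Arg
  | Rec_call e -> simplify e
  | Field (i, e) -> Field (i, simplify e)
  | If (c, a, b) -> If (simplify c, simplify a, simplify b)
  | Block (tag, fields) ->
    let fields = List.map simplify fields in
    (match fields with
     | Field (0, src) :: _
       when List.for_all Fun.id
              (List.mapi (fun i f -> f = Field (i, src)) fields) ->
       src (* every field is re-read from [src]: reuse [src] *)
     | _ -> Block (tag, fields))

(* The simplified body is an identity if every branch returns the argument. *)
let rec is_identity = function
  | Arg -> true
  | If (_, a, b) -> is_identity a && is_identity b
  | _ -> false

(* Replace the body by the identity when the check succeeds; otherwise keep
   the original body (standing in for the classic simplification). *)
let simplify_function body =
  if is_identity (simplify body) then Arg else body
```

For instance, a body that rebuilds a block from its first field and a recursive call on its second field, `Block (0, [Field (0, Arg); Rec_call (Field (1, Arg))])`, is replaced by `Arg`.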

Constant propagation

We then improved the function that determines whether a function body is an identity, so that it handles constants. It propagates the equality information gained about the argument at conditional branches.

Thus, given a function of the form

type truc = A | B | C
let id = function
  | A -> A
  | B -> B
  | C -> C

or even

let id x = if x=0 then 0 else x

we do detect that it is the identity.
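The propagation of equalities through branches can be sketched as follows. This is a simplified model under our own assumptions: the `expr` type and the `known` parameter are hypothetical, and constant constructors are represented by their integer value.

```ocaml
(* Toy bodies of a one-argument function; illustrative only. *)
type expr =
  | Arg                         (* the argument itself *)
  | Const of int                (* an integer or constant constructor *)
  | If_eq of int * expr * expr  (* if arg = n then e1 else e2 *)
  | Switch of (int * expr) list (* match arg with constant constructors *)

(* [known] is the constant the argument is known to be equal to in the
   current branch, if any. *)
let rec is_identity known = function
  | Arg -> true
  | Const n -> known = Some n
  | If_eq (n, e1, e2) -> is_identity (Some n) e1 && is_identity known e2
  | Switch cases ->
    List.for_all (fun (n, e) -> is_identity (Some n) e) cases
```

In this model, the match on `A | B | C` above corresponds to `Switch [ (0, Const 0); (1, Const 1); (2, Const 2) ]` and `if x=0 then 0 else x` to `If_eq (0, Const 0, Arg)`; both are accepted by `is_identity None`.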

Examples

Recursive functions

We can now detect recursive identities:

let rec listid = function
  | t::q -> t::(listid q)
  | [] -> []

previously compiled to:

End of middle end:
let_rec_symbol
  (camlTest__listid_5_closure
    (Project_closure (camlTest__set_of_closures_20, listid/5)))
  (camlTest__set_of_closures_20
    (Set_of_closures (
      (set_of_closures id=Test.11
        (listid/5 = fun param/7 ->
          (if param/7 then begin
            (let
              (apply_arg/13 (field 1<{../../test.ml:9,4-8}> param/7)
               apply_funct/14 camlTest__listid_5_closure
               Pmakeblock_arg/15
                 *(apply*[listid/5]<{../../test.ml:9,15-25}> apply_funct/14
                    apply_arg/13)
               Pmakeblock_arg/16 (field 0<{../../test.ml:9,4-8}> param/7)
               Pmakeblock/17
                 (makeblock 0<{../../test.ml:9,12-25}> Pmakeblock_arg/16
                   Pmakeblock_arg/15))
              Pmakeblock/17)
            end else begin
            (let (const_ptr_zero/27 Const(0a)) const_ptr_zero/27) end))
         free_vars={ } specialised_args={}) direct_call_surrogates={ }
        set_of_closures_origin=Test.1])))
let_symbol (camlTest (Block (tag 0,  camlTest__listid_5_closure)))
End camlTest

It is now detected as the identity:

End of middle end:
let_symbol
  (camlTest__set_of_closures_20
    (Set_of_closures (
      (set_of_closures id=Test.13 (listid/5 = fun param/7 -> param/7)
        free_vars={ } specialised_args={}) direct_call_surrogates={ }
        set_of_closures_origin=Test.1])))
  (camlTest__listid_5_closure
    (Project_closure (camlTest__set_of_closures_20, listid/5)))
  (camlTest (Block (tag 0,  camlTest__listid_5_closure)))
End camlTest

Unsafe example

On the other hand, the absence of guarantees can be exploited to bypass the type system and read a memory address as if it were an integer:

type bugg = A of int*int | B of int
let rec bug = function
  | A (a,b) -> (a,b)
  | B x -> bug (B x)
  
let (a,b) = (bug (B 42))
let _ = print_int b

This function will be simplified to the identity even though the type bugg is not compatible with the tuple type; when we try to project on the second field of the variant B, we access an undefined part of memory:

$ ./unsafe.out
47423997875612

Possible improvements: short term

Function annotations

A simple improvement, in theory, would be to let the user choose which functions these not-always-correct optimisations are applied to. We did not have time to do the work of propagating this information to Flambda, but there should be no implementation difficulty.

Ordering on arguments

To get a safer optimisation, we would like to use the idea developed in the theoretical part, which makes the optimisation correct on non-cyclic objects and, above all, restores the typing guarantees, avoiding the problem seen in the example above.

To obtain this guarantee, we want to change the simplification pass so that its environment contains an optional (function, argument) pair. When this option is set, the pair indicates that we are inside the body of a function, currently simplifying it, and therefore that applications of the function to elements smaller than the argument can be simplified to an identity. Of course, the pass would also have to be modified to remember the elements that are not smaller than the argument.
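A possible shape for such an environment is sketched below. This is only an assumption about how the idea could look: the `env` record, `smaller_than` and `simplify` are hypothetical names, and "smaller" is approximated here as "a (nested) projection of the argument".

```ocaml
type var = string

type expr =
  | Var of var
  | Field of int * expr
  | Apply of var * expr (* application of a named function *)

(* The environment optionally records the function being simplified
   together with its argument. *)
type env = { current : (var * var) option }

(* [e] is strictly smaller than [arg] if it is a nested projection of it. *)
let rec smaller_than arg = function
  | Field (_, Var v) -> v = arg
  | Field (_, e) -> smaller_than arg e
  | Var _ | Apply _ -> false

(* Recursive calls are turned into identities only on smaller elements. *)
let rec simplify env = function
  | Var v -> Var v
  | Field (i, e) -> Field (i, simplify env e)
  | Apply (f, e) ->
    let e = simplify env e in
    (match env.current with
     | Some (g, arg) when f = g && smaller_than arg e -> e
     | _ -> Apply (f, e))
```

With `{ current = Some ("f", "x") }`, the call `Apply ("f", Field (1, Var "x"))` is simplified to its argument, while `Apply ("f", Var "x")` is kept, which is what rules out a function like `bug` from the unsafe example.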

Possible improvements: long term

Excluding cyclic objects

As described in the theoretical part, we could recursively deduce which objects are cyclic and try to exclude them from our optimisation. The problem is then that instead of replacing functions by the identity, we need a special annotation that represents IdRec.
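To see why cyclic objects matter, here is a small self-contained example of a cyclic value in plain OCaml, reusing the `listid` function from the examples above:

```ocaml
(* A cyclic list: its tail is physically the list itself. *)
let rec ones = 1 :: ones

let rec listid = function
  | t :: q -> t :: listid q
  | [] -> []

let () =
  assert (List.tl ones == ones);
  (* On a finite list, [listid] behaves like the identity. *)
  assert (listid [ 1; 2; 3 ] = [ 1; 2; 3 ])
  (* [listid ones] is deliberately not called: it would recurse forever and
     never return a value, whereas the identity returns [ones] at once. *)
```

Replacing `listid` by the identity therefore changes the behaviour of the program on `ones`, which is why the optimisation is only claimed to be correct on non-cyclic objects.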

This becomes much more complex to implement when compiling across several files, since this information must then be available in the interface of already compiled files so that the optimisation can be performed when needed.

One option would be to use .cmx files to record this information when compiling a file, but such an implementation would have taken too long to carry out during the internship. Moreover, it is not even clear that it would be a good practical choice: it would make the optimisation much more complex for little benefit compared to a version that is correct on non-cyclic objects and enabled by a user annotation.

opam 2.1.0~rc2 released

2021-06-23T09:05:17Z

Feedback on this post is welcomed on Discuss!

The opam team has great pleasure in announcing opam 2.1.0~rc2!

The focus since beta4 has been preparing for a world with more than one released version of opam (i.e. 2.0.x and 2.1.x). The release candidate extends CLI versioning further and, under the hood, includes a big change to the opam root format which allows new versions of opam to indicate that the root may still be read by older versions of the opam libraries. A plugin compiled against the 2.0.9 opam libraries will therefore be able to read information about an opam 2.1 root (plugins and tools compiled against 2.0.8 are unable to load opam 2.1.0 roots).

Please do take this release candidate for a spin! It is available in the Docker images at ocaml/opam on Docker Hub as the opam-2.1 command (or you can sudo ln -f /usr/bin/opam-2.1 /usr/bin/opam in your Dockerfile to switch to it permanently). The release candidate can also be tested via our installation script (see the wiki for more information).

Thank you to anyone who noticed the unannounced first release candidate and tried it out. Between tagging and what would have been announcing it, we discovered an issue with upgrading local switches from earlier alpha/beta releases, and so fixed that for this second release candidate.

Assuming no showstoppers, we plan to release opam 2.1.0 next week. The improvements made in 2.1.0 will allow for a much faster release cycle, and we look forward to posting about the 2.2.0 plans soon!

Try it!

In case you plan a possible rollback, you may want to first back up your ~/.opam directory.

The upgrade instructions are unchanged:

Either from binaries: run

bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh) --version 2.1.0~rc2"

or download manually from the GitHub "Releases" page to your PATH.

Or from source, manually: see the instructions in the README.

You should then run:

opam init --reinit -ni

We hope there won't be any, but please report any issues to the bug-tracker. Thanks for trying it out, and hoping you enjoy!

Tutorial: Format Module of OCaml

2021-05-06T09:05:17Z

The Format module of OCaml is an extremely powerful but unfortunately often poorly used module.

It combines two distinct elements:

pretty-print boxes
semantic tags

This tutorial aims to demystify much of this module and explain the range of things that you can do with it.
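As a small taste of both elements, here is a minimal sketch using only the standard library. The printer name `pp_int_list` and the tag name `warning` are ours, chosen for illustration.

```ocaml
(* Print an int list inside a "hov" box: items stay on one line while they
   fit, and the line breaks at the "@ " hints (indented by 2) otherwise. *)
let pp_int_list ppf xs =
  let pp_sep ppf () = Format.fprintf ppf ";@ " in
  Format.fprintf ppf "@[<hov 2>[%a]@]"
    (Format.pp_print_list ~pp_sep Format.pp_print_int)
    xs

let () =
  (* Prints [1; 2; 3] followed by a newline. *)
  Format.printf "%a@." pp_int_list [ 1; 2; 3 ];
  (* A semantic tag: with the default settings the tag is not rendered,
     only its contents are printed. *)
  Format.printf "@{<warning>careful@}@."
```

The same printer can be reused with `Format.asprintf` to build a string, or with any other formatter, since it only depends on the `ppf` it receives.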

2021 Annual Meeting of the Alt-Ergo Users’ Club

2021-04-29T09:05:17Z

The third annual meeting of the Alt-Ergo Users’ Club was held on April 1st! This annual meeting is the perfect place to review each partner’s needs regarding Alt-Ergo. We had the pleasure of receiving our partners to discuss the roadmap for future Alt-Ergo developments and enhancements.

Alt-Ergo is an automatic prover of mathematical formulas, created at LRI and developed by OCamlPro since 2013. To learn more or join the Club, visit https://alt-ergo.ocamlpro.com.

Our Club has several goals. Its main goal is to guarantee the sustainability of Alt-Ergo by fostering collaboration between Club members and building ties with users of formal methods, such as the Why3 community. One of our priorities is to identify the needs of constraint solver users by extending Alt-Ergo to new domains such as model checking, while competing with other state-of-the-art solvers in international competitions. Finally, the last goal of the Club is to find new projects or contracts for the development of long-term features.

We would like to thank all our members for their support: Mitsubishi Electric R&D Centre Europe, AdaCore and CEA List. We also want to highlight the Why3 development team, with whom we work to improve our tools.

This year, our members raised new points of interest. First, model generation, added to Alt-Ergo following the previous edition, has been useful to most Club members. The technical points now requested are the ability to refine constraints and to study how to propagate them. Second, we presented Dolmen, the parser/typer that will make it possible to type SMT2 files only once and to be ready for SMT3. Its integration into Alt-Ergo is under way, and Club members are enthusiastic about what Dolmen will bring to the SMT solver community!

These features are now our main priorities; see the slides presented at the 2021 Club meeting. To follow our progress and the latest news, feel free to read the articles on our blog.

New Try-Alt-Ergo

2021-03-29T09:05:17Z

Have you heard about our Try-Alt-Ergo website? Created in 2014 (see our blogpost), the first objective was to facilitate access to our performant SMT Solver Alt-Ergo. Try-Alt-Ergo allows you to write and run your problems in your browser without any server computation.

This playground website has been maintained by OCamlPro for many years, and it's high time to bring it back to life with new updates. We are therefore pleased to announce the new version of the Try-Alt-Ergo website! In this article, we first explain what has changed in the back end, and what you can use if you are interested in running your own version of Alt-Ergo on a website or in an application. We then focus on the new front-end of our website, from its interface to its features, including its tutorial about the program.

Try-Alt-Ergo 2014

Try-Alt-Ergo was designed to be a powerful tool that is simple to use. Its interface was minimalist. It offered three panels: a left panel with a text area containing the problem to prove; a center panel with buttons to run Alt-Ergo, load examples and set options; and a right panel showing these options, examples and other information. This design lacked some features that have been added to our solver over the years: models (counter-examples), unsat-core, more options and debug information were missing in this version.

Try-Alt-Ergo offered neither a proper editor (with syntax coloration), nor a way to save the problem file, nor an option to bound the solver's run with a time limit. Another issue concerned threading: when the solver was called, the webpage froze. This was problematic for long runs because there was no way to stop the solver.

Alt-Ergo 1.30

The 1.30 version of Alt-Ergo was the version used in the back-end to prove problems. Since this version, a lot of improvements have been made to Alt-Ergo. To learn more about these improvements, see our changelog in the documentation.

Over the years we encountered some difficulties updating the Alt-Ergo version used in Try-Alt-Ergo. We used Js_of_ocaml to compile the OCaml code of our solver into runnable JavaScript code. Some libraries were not available in JavaScript and we needed to disable them manually. Because this process was not automated, we lacked the time to keep the JavaScript version of Alt-Ergo in Try-Alt-Ergo up to date.

In 2019 we switched our build system to dune, which opened the possibility of easing the cross-compilation of Alt-Ergo to JavaScript.

New back-end

With some simple modifications, we were able to compile Alt-Ergo to JavaScript. These modifications are simple enough that the process is now automated in our continuous integration. This will enable us to easily provide a JavaScript version of our solver for each future version.

Two ways of using our solver in JavaScript are available:

alt-ergo.js, a JavaScript version of the Alt-Ergo CLI. It can be run with node: node alt-ergo.js <options> <file>. Note that this code is slower than the natively compiled CLI of Alt-Ergo. In our effort to open the SMT world to more people, an npm package is the next step of this work.
alt-ergo-worker.js, a web worker of Alt-Ergo. This web worker needs JSON to send the problem file and the options to Alt-Ergo and to return its answers:
- Options are sent as a list of name:value pairs, like:
{"debug":true,"input_format":"Native","steps_bound":100,"sat_solver": "Tableaux","file":"test-file"}
You can specify all options used in Alt-Ergo. If some options are missing, the worker uses the default value for these options. For example, if debug is not specified the worker will use its default value: false.
- The input file is sent as a list of strings, with the following format:
{ "content": [ "goal g: true"] }
- Alt-Ergo answers can be composed of its results, debug information, errors, warnings …
{ "results": [ "File \"test-file\", line 1, characters 9-13: Valid (0.2070) (0 steps) (goal g)" ], "debugs": [ "[Debug][Sat_solver]", "use Tableaux-like solver"] }
As with the options, if a result field such as debugs does not contain anything, "debugs": [...] is not returned.
- See the Alt-Ergo web-worker documentation to learn more on how to use it.

New Front-end

The Try-Alt-Ergo has been completely reworked and we added some features:

The left panel is still composed of an editor and an answer area
- Ace editor with custom syntax coloration (both native and smt-lib2) is now used to make it more pleasant to write your problems.
A top panel that contains the following buttons:
- Ask Alt-Ergo, which retrieves content from the editor and options, launches the web worker and prints answers in the defined areas.
- Load and Save files.
- Documentation, that sends users to the newly added native syntax documentation of Alt-Ergo.
- Tutorial, that opens an interactive tutorial to introduce you to Alt-Ergo native syntax and program verification.

A right panel composed of tabs:
- Start and About that contains general information about Alt-Ergo, Try-Alt-Ergo and how to use it.
- Outputs prints more information than the basic answer area under the editor. In this tab you can find (long) debug outputs, unsat-core or models (counter-examples) generated by Alt-Ergo.
- Options contains every option you can use, such as the time limit / steps limit or the format of the input file to prove.
- Statistics is still a basic tab that only outputs axioms used to prove the input problem.
- Examples contains some basic examples showing the capabilities of our solver.

We hope you will enjoy this new version of Try-Alt-Ergo, we can't wait to read your feedback!

This work was done at OCamlPro.

opam 2.0.8 release

2021-02-08T09:05:17Z

We are pleased to announce the minor release of opam 2.0.8.

This new version contains some backported fixes:

Critical for fish users! Don't add . to PATH. [#4078]
Fix sandbox script for newer ccache versions. [#4079 and #4087]
Fix sandbox crash when ~/.cache is a symlink. [#4068]
User modifications to the sandbox script are no longer overwritten by opam init. [#4020 & #4092]
macOS sandbox script always mounts /tmp read-write, regardless of TMPDIR [#3742, addressing ocaml/opam-repository#13339]
pre- and post-session hooks can now print to the console [#4359]
Switch-specific pre/post sessions hooks are now actually run [#4472]
Standalone opam-installer now correctly builds from sources [#4173]
Fix arch variable detection when using 32bit mode on ARM64 and i486 [#4462]

A more complete release note is available.

Installation instructions (unchanged):

From binaries: run

$~ bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh) --version 2.0.8"

From source, using opam:

$~ opam update; opam install opam-devel

From source, manually: see the instructions in the README.

We hope you enjoy this new minor version, and remain open to bug reports and suggestions.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com, and published in discuss.ocaml.org.

2020 at OCamlPro

2021-02-02T09:05:17Z

OCamlPro was created in 2011 to advocate the adoption of the OCaml language and formal methods in general in the industry. While building a team of highly-skilled engineers, we navigated through our expertise domains, delivering works on the OCaml language and tooling, training companies in the use of strongly-typed languages like OCaml but also Rust, tackling formal verification challenges with formal methods, maintaining the SMT solver Alt-Ergo, designing languages and tools for smart contracts and blockchains, and more!

In this article, as every year (see 2019 at OCamlPro for last year's post), we review some of the work we did during 2020, in many different worlds.

Table of contents

In the World of OCaml

Flambda & Compilation Team
Opam, the OCaml Package Manager
Encouraging OCaml Adoption: Trainings and Resources for OCaml
Open Source Tooling and Libraries for OCaml
Supporting the OCaml Software Foundation
Events

In the World of Formal Methods

Alt-Ergo Development
Alt-Ergo Users’ Club and R&D Projects
Alt-Ergo’s Roadmap

In the World of Rust

In the World of Blockchain Languages

We warmly thank all our partners, clients and friends for their support and collaboration during this peculiar year!

The first lockdown was a surprise and we took advantage of this special moment to go over our past contributions and sum them up in a timeline that gives an overview of the key events that made OCamlPro over the years. The timeline format is amazing to reconnect with our history and to take stock of our accomplishments.

Now this will turn into a generic timeline edition tool on the Web, stay tuned if you are interested in our internal project to be available to the general public! If you think that a timeline would fit your needs and audience, we designed a simplistic tool, tailored for users who want complete control over their data.

In the World of OCaml

Flambda & Compilation Team

Work by Pierre Chambart, Vincent Laviron, Guillaume Bury, Pierrick Couderc and Louis Gesbert

OCamlPro is proud to be working on Flambda2, an ambitious work on an OCaml optimizing compiler, in close collaboration with Mark Shinwell from our long-term partner and client Jane Street. Flambda focuses on reducing the runtime cost of abstractions and removing as many short-lived allocations as possible. In 2020, the Flambda team worked on a considerable number of fixes and improvements, transforming Flambda2 from an experimental prototype to a version ready for testing in production!

This year also marked the conclusion of our work on the pack rehabilitation (see our two recent posts Part 1 and Part 2, and a much simpler Version in 2011). Our work aimed to give them a new youth and utility by adding the possibility to generate functors or recursive packs. This improvement allows programmers to define big functors, functors that are split among multiple files, resulting in what we can view as a way to implement some form of parameterized libraries.

This work is made possible by Jane Street’s funding.

Opam, the OCaml Package Manager

Work by Raja Boujbel, Louis Gesbert and Thomas Blanc

Our 2020 work on Opam led to the release of two versions of opam 2.0 with small fixes, and the release of three alphas and two betas of Opam 2.1!

Opam 2.1.0 will soon go to release candidate and will introduce a seamless integration of depexts (system dependencies handling), dependency locking, pinning sub-directories, invariant-based definition for Opam switches, the configuration of Opam from the command-line without the need for a manual edition of the configuration files, and the CLI versioning for better handling of CLI evolutions.

This work is greatly helped by Jane Street’s funding and support.

Encouraging OCaml Adoption: Trainings and Resources for OCaml

Work by Pierre Chambart, Vincent Laviron, Adrien Champion, Mattias, Louis Gesbert and Thomas Blanc

OCamlPro is also a training centre. We organise yearly training sessions for programmers from multiple companies in our offices: from OCaml to OCaml tooling to Rust! We can also design custom and on-site trainings to meet specific needs.

We released a brand new version of TryOCaml, a tool born from our work on Learn-OCaml! Try OCaml has been highly praised by professors at the beginning of the Covid lockdown. Even if it can be used as a personal sandbox, it’s also possible to adapt its usage for classes. TryOCaml is a hassle-free tool that lowers significantly the barriers to start coding in OCaml, as no installation is required.

We regularly release cheat sheets for developers: in 2020, we shared the long-awaited Opam 2.0 cheat sheet, with a new theme! In just two pages, you’ll have in one place the everyday commands you need as an Opam user. We also shine some light on unsung features which may just change your coding life.

2020 was also an important year for the OCaml language itself: we were pleased to welcome OCaml 4.10! One of the highlights of the release was the “Best-fit” Garbage Collector Strategy. We had an in-depth look at this exciting change.

This work is self-funded by OCamlPro as part of its effort to ease the adoption of OCaml.

Open Source Tooling and Libraries for OCaml

Work by Fabrice Le Fessant, Léo Andrès and David Declerck

OCamlPro has a long history of developing open source tooling and libraries for the community. 2020 was no exception!

drom is a simple tool to create new OCaml projects that will use best OCaml practices, i.e. Opam, Dune and tests. Its goal is to provide a cargo-like user experience and to help onboard new developers into the community. drom is available in the official opam repository.

directories is a new OCaml Library that provides configuration, cache and data paths (and more!). The library follows the suitable conventions on Linux, MacOS and Windows.

opam-bin is a framework to create and use binary packages with Opam. It enables you to create, use and share binary packages easily with opam, and to create as many local switches as you want spending no time, no disk space! If you often use Opam, opam-bin is a must-have!

We also released a number of libraries, focused on making things easy for developers… so we named them with an ez_ prefix: ez_cmdliner provides an Arg-like interface for cmdliner, ez_file provides simple functions to read and write files, ez_subst provides easily configurable string substitutions for shell-like variable syntax, ez_config provides abstract options stored in configuration files with an OCaml syntax. There are also a lot of ezjs-* libraries, that are bindings to Javascript libraries that we used in some of our js_of_ocaml projects.

This work was self-funded by OCamlPro as part of its effort to improve the OCaml ecosystem.

Supporting the OCaml Software Foundation

OCamlPro was proud and happy to initiate the OCaml User Survey 2020 as part of the mission of the OCaml Software Foundation. The goal of the survey was to better understand the community and its needs. The final results have not yet been published by the Foundation; we are looking forward to reading them soon!

Events

Though the year took its toll on our usual tour of the world conferences and events, OCamlPro members still took part in the annual 72-hour team programming competition organised by the International Conference on Functional Programming (ICFP). Our joint team “crapo on acid” went through the final!

In the World of Formal Methods

Work by Albin Coquereau, Mattias, Sylvain Conchon, Guillaume Bury and Louis Rustenholz

Sylvain Conchon joined OCamlPro as Formal Methods Chief Scientific Officer in 2020!

Alt-Ergo Development

In 2020, we focused on the maintainability of our solver. The first part of this work was to maintain and fix issues within the already released version. The 2.3.0 version (released in 2019) had some issues that needed to be fixed in minor releases.

The second part of the maintainability work on Alt-Ergo contains more major features. All these features were released in the new version 2.4.0 of Alt-Ergo. The main goal of this release was to focus on the user experience and the documentation. This release also contains bug fixes and many other improvements. Alt-Ergo is on its way towards a new documentation and in particular a new documentation on its native syntax.

We also tried to improve the command line experience of our tools with the use of the cmdliner library to parse Alt-Ergo options. This library allows us to improve the manpage of our tool. We tried to harmonise the debug messages and to improve all of Alt-Ergo’s outputs to make it clearer for the users.

Alt-Ergo Users’ Club and R&D Projects

We thank our partners from the Alt-Ergo Users’ Club, Adacore, CEA List, MERCE (Mitsubishi Electric R&D Centre Europe) and Trust-In-Soft, for their trust. Their support allows us to maintain our tool.

The club was launched in 2019 and the second annual meeting of the Alt-Ergo Users’ Club was held in mid-February 2020. Our annual meeting is the perfect place to review each partner’s needs regarding Alt-Ergo. This year, we had the pleasure of receiving our partners to discuss the roadmap for future Alt-Ergo developments and enhancements. If you want to join us for the next meeting (coming soon), contact us!

We also want to thank our partners from the FUI R&D Project LCHIP. Thanks to this project, we were able to add a new major feature in Alt-Ergo: the support for incremental commands (push, pop and check-sat-assuming) from the smt-lib2 standard.

Alt-Ergo’s Roadmap

Some of the work we did in 2020 is not yet available. Thanks to our partner MERCE (Mitsubishi Electric R&D Centre Europe), we worked on the SMT model generation. Alt-Ergo is now (partially) able to output a model in the smt-lib2 format. Thanks to the Why3 team from University of Paris-Saclay, we hope that this work will be available in the Why3 platform to help users in their program verification efforts.

Another project was launched in 2020 but is still in early development: the complete rework of our Try-Alt-Ergo website with new features such as model generation. Try Alt-Ergo current version allows users to use Alt-Ergo directly from their browsers (Firefox, Chromium) without the need of a server for computations.

This work needed a JavaScript compatible version of Alt-Ergo. We have made some work to build our solver in two versions, one compatible with Node.js and another as a webworker. We hope that this work can make it easier to use our SMT solver in web applications.

This work is funded in part by the FUI R&D Project LCHIP, MERCE, Adacore and with the support of the Alt-Ergo Users’ Club.

In the World of Rust

Work by Adrien Champion

As OCaml-ians, we naturally saw in the Rust language a beautiful complement to our approach. One opportunity to explore this state-of-the-art language has been to pursue our work on ocp-memprof and build Memthol, a visualizer and analyzer to profile OCaml programs. It works on memory dumps containing information about the size and (de)allocation date of part of the allocations performed by some execution of a program.

Between lockdowns, we’ve also been able to hold our Rust training. It’s designed as a highly-modular vocational course, from 1 to 4 days. The training covers a beginner introduction to Rust’s basic features, crucial features and libraries for real-life development and advanced features, all through complex use-cases one would find in real life.

This work was self-funded by OCamlPro as part of our exploration of other statically and strongly typed functional languages.

In the World of Blockchain Languages

Work by David Declerck and Steven de Oliveira

One of our favourite activities is to develop new programming languages, specialized for specific domains, but with nice properties like clear semantics, strong typing, static typing and functional features. In 2020, we applied our skills in the domain of blockchains and smart contracts, with the creation of a new language, Love, and work on a well-known language, Solidity.

In 2020, our blockchain experts released Love, a type-safe language with an ML syntax and suited for formal verification. In a few words, Love is designed to be expressive for fast development, efficient in execution time and cheap in storage, and readable in terms of smart contracts auditability. Yet, it has a clear and formal semantics and a strong type system to detect bugs. It allows contracts to use other contracts as libraries, and to call viewers on other contracts. Contracts developed in Love can also be formally verified.

We also released a Solidity parser and printer written in OCaml using Menhir, and used it to implement a full interpreter directly in a blockchain. Solidity is probably the most used language for smart contracts, it was first born on Ethereum but many other blockchains provide it as a way to easily onboard new developers coming from the Ethereum ecosystem. In the future, we plan to extend this work with formal verification of Solidity smart contracts.

This is a joint effort with Origin Labs, the company created to tackle blockchain-related challenges.

Towards 2021

Adaptability and continuous improvement, that’s what 2020 brought to OCamlPro!

We will remember 2020 as a complicated year, but one that allowed us to surpass ourselves and challenge our projects. We are very proud of our team who all continued to grow, learn, and develop our projects in this particular context. We are more motivated than ever for the coming year, which marks our tenth anniversary! We’re excited to continue sharing our knowledge of the OCaml world and to accompany you in your own projects.

Release of Alt-Ergo 2.4.0

2021-01-22T09:05:17Z

A new release of Alt-Ergo (version 2.4.0) is available.

You can get it from Alt-Ergo's website. The associated opam package will be published in the next few days.

This release contains some major novelties:

Alt-Ergo supports incremental commands (push/pop) from the smt-lib standard.
We switched command line parsing to use cmdliner. You will need to use --<option name> instead of -<option name>. Some options have also been renamed, see the manpage or the documentation.
We improved the online documentation of your solver, available here.

This release also contains some minor novelties:

The .mlw and .why extensions are deprecated; the use of the .ae extension is advised.
Add --input (resp --output) option to manually set the input (resp output) file format
Add --pretty-output option to add better debug formatting and to add colors
Add exponentiation operation, ** in native Alt-Ergo syntax. The operator is fully interpreted when applied to constants
Fix --steps-count and improve the way steps are counted (AdaCore contribution)
Add --instantiation-heuristic option that can enable lighter or heavier instantiation
Reduce the instantiation context (considered foralls / exists) in CDCL-Tableaux to better mimic the Tableaux-like SAT solver
Multiple bugfixes

The full list of changes is available here. As usual, do not hesitate to report bugs, to ask questions, or to give your feedback!

opam 2.1.0~beta4 released

2021-01-13T09:05:17Z

Feedback on this post is welcomed on Discuss!

On behalf of the opam team, it gives me great pleasure to announce the third beta release of opam 2.1. Don’t worry, you didn’t miss beta3 - we had an issue with a configure script that caused beta2 to report as beta3 in some instances, so we skipped to beta4 to avoid any further confusion!

We encourage you to try out this new beta release: there are instructions for doing so in our wiki. The instructions include taking a backup of your ~/.opam root as part of the process, which can be restored in order to wind back. Please note that local switches which are written to by opam 2.1 are upgraded and will need to be rebuilt if you go back to opam 2.0. This can either be done by removing _opam and repeating whatever you use in your build process to create the switch, or you can use opam switch export switch.export to backup the switch to a file before installing new packages. Note that opam 2.1 shouldn’t upgrade a local switch unless you upgrade the base packages (i.e. the compiler).

What’s new in opam 2.1?

Switch invariants
Improved options configuration (see the new option and expanded var sub-commands)
Integration of system dependencies (formerly the opam-depext plugin), increasing their reliability as it integrates the solving step
Creation of lock files for reproducible installations (formerly the opam-lock plugin)
CLI versioning, allowing cleaner deprecations for opam now and also improvements to semantics in future without breaking backwards-compatibility
Performance improvements to opam-update, conflict messages, and many other areas
New plugins: opam-compiler and opam-monorepo

Switch invariants

In opam 2.0, when a switch is created the packages selected are put into the “base” of the switch. These packages are not normally considered for upgrade, in order to ease pressure on opam’s solver. This was a much bigger concern early on in opam 2.0’s development, but is less of a problem with the default mccs solver.

However, it’s a problem for system compilers. opam would detect that your system compiler version had changed, but be unable to upgrade the ocaml-system package unless you went through a slightly convoluted process with --unlock-base.

In opam 2.1, base packages have been replaced by switch invariants. The switch invariant is a package formula which must be satisfied on every upgrade and install. All existing switches’ base packages could just be expressed as package1 & package2 & package3 etc. but opam 2.1 recognises many existing patterns and simplifies them, so in most cases the invariant will be "ocaml-base-compiler" {= 4.11.1}, etc. This means that opam switch create my_switch ocaml-system now creates a switch invariant of "ocaml-system" rather than a specific version of the ocaml-system package. If your system OCaml package is updated, opam upgrade will seamlessly switch to the new package.

This also allows you to have switches which automatically install new point releases of OCaml. For example:

$~ opam switch create ocaml-4.11 --formula='"ocaml-base-compiler" {>= "4.11.0" & < "4.12.0~"}' --repos=old=git+https://github.com/ocaml/opam-repository#a11299d81591
$~ opam install utop

Creates a switch with OCaml 4.11.0 (the --repos= was just to select a version of opam-repository from before 4.11.1 was released). Now issue:

$~ opam repo set-url old git+https://github.com/ocaml/opam-repository
$~ opam upgrade

and opam 2.1 will automatically offer to upgrade to OCaml 4.11.1 along with a rebuild of the switch. There’s not yet a clean CLI for specifying the formula, but we intend to iterate further on this with future opam releases so that there is an easier way of saying “install OCaml 4.11.x”.
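
To make the idea of an invariant as a formula more concrete, here is a minimal OCaml sketch. It is not opam's implementation: versions are simplified to integer triples (opam's real version ordering, including the ~ suffix, is richer), and the names constr, invariant and satisfies are hypothetical.

```ocaml
(* Simplified model: a version is (major, minor, patch).
   Assumption: opam's real version ordering is more elaborate. *)
type version = int * int * int

type constr =
  | Eq of version
  | Ge of version
  | Lt of version
  | And of constr * constr

(* An invariant here is one package name plus a version constraint. *)
type invariant = { package : string; constr : constr }

let rec holds c v =
  match c with
  | Eq w -> compare v w = 0
  | Ge w -> compare v w >= 0
  | Lt w -> compare v w < 0
  | And (a, b) -> holds a v && holds b v

(* [installed] maps package names to their installed version. *)
let satisfies inv installed =
  match List.assoc_opt inv.package installed with
  | Some v -> holds inv.constr v
  | None -> false

(* Rough analogue of "ocaml-base-compiler" {>= "4.11.0" & < "4.12.0~"}. *)
let ocaml_4_11 =
  { package = "ocaml-base-compiler";
    constr = And (Ge (4, 11, 0), Lt (4, 12, 0)) }

let () =
  assert (satisfies ocaml_4_11 [ ("ocaml-base-compiler", (4, 11, 0)) ]);
  assert (satisfies ocaml_4_11 [ ("ocaml-base-compiler", (4, 11, 1)) ]);
  assert (not (satisfies ocaml_4_11 [ ("ocaml-base-compiler", (4, 12, 0)) ]))
```

Both 4.11.0 and 4.11.1 satisfy the formula, which is why an upgrade from one to the other keeps the switch valid, while 4.12.0 would not.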

opam depext integration

You no longer have to remember to run opam depext, opam always checks depexts (there are options to disable this or automate it for CI use). Installation of an opam package in a CI system is now as easy as opam install ., without having to do the dance of opam pin add -n/depext/install. Just one command now for the common case!
The solver is only called once, which both saves time and also stabilises the behaviour of opam in cases where the solver result is not stable. It was possible to get one package solution for the opam depext stage and a different solution for the opam install stage, resulting in some depexts missing.
opam now has full knowledge of depexts, which means that packages can be automatically selected based on whether a system package is already installed. For example, if you have neither MariaDB nor MySQL dev libraries installed, opam install mysql will offer to install conf-mysql and mysql, but if you have the MariaDB dev libraries installed, opam will offer to install conf-mariadb and mysql.

opam lock files and reproducibility

When opam was first released, it had the mission of gathering together scattered OCaml source code to build a community repository. As time marches on, the size of the opam repository has grown tremendously, to over 3000 unique packages with over 18000 unique versions. opam looks at all these packages and is designed to solve for the best constraints for a given package, so that your project can keep up with releases of your dependencies.

With opam 2.1.0, the creation of lock files is also now integrated into the client:

opam lock will create a .locked file for your current switch and project, that you can check into the repository.
opam switch create . --locked can be used by users to reproduce your dependencies in a fresh switch.

This lets a project simultaneously keep up with the latest dependencies (without lock files) while providing a stricter set for projects that need it (with lock files).

CLI Versioning

A new --cli switch was added to the first beta release, but it’s only now that it’s being widely used. opam is a complex enough system that sometimes bug fixes need to change the semantics of some commands. For example:

opam show --file needed to change behaviour
The addition of new controls for setting global variables means that the opam config was becoming cluttered and some things want to move to opam var
opam switch install 4.11.1 still works in opam 2.0, but it’s really an OPAM 1.2.2 syntax.

Now you can specify which version of opam you expected the command to be run against. In day-to-day use of opam at the terminal, you wouldn’t specify it, and you’ll get the latest version of the CLI. For example: opam var --global is the same as opam var --cli=2.1 --global. However, if you issue opam var --cli=2.0 --global, you will be told that --global was added in 2.1 and so is not available to you. You can see similar things with the renaming of opam upgrade --unlock-base to opam upgrade --update-invariant.

The intention is that --cli should be used in scripts, user guides (e.g. blog posts), and in software which calls opam. The only decision you have to take is the oldest version of opam which you need to support. If your script is using a new opam 2.1 feature (for example opam switch create --formula=) then you simply don’t support opam 2.0. If you need to support opam 2.0, then you can’t use --formula and should use --packages instead. opam 2.0 does not have the --cli option, so for opam 2.0 instead of --cli=2.0 you should set the environment variable OPAMCLI to 2.0. As with all opam command line switches, OPAMCLI is simply the equivalent of --cli which opam 2.1 will pick-up but opam 2.0 will quietly ignore (and, as with other options, the command line takes precedence over the environment).
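
The version gate described above can be pictured with a small OCaml sketch. This is an illustration only, not opam's code: the introduced_in table and the check_flag function are hypothetical, and the table only lists the two 2.1 flags mentioned in this post.

```ocaml
(* CLI versions as (major, minor), e.g. (2, 1). *)
type cli = int * int

(* Hypothetical table: the CLI version in which each flag appeared. *)
let introduced_in : (string * cli) list =
  [ ("--global", (2, 1)); ("--update-invariant", (2, 1)) ]

(* Accept a flag only if the requested CLI version already knows it.
   Flags missing from the table are assumed to have always existed. *)
let check_flag ~(requested : cli) flag =
  match List.assoc_opt flag introduced_in with
  | Some since when compare requested since < 0 ->
      Error
        (Printf.sprintf "%s was added in %d.%d" flag (fst since) (snd since))
  | _ -> Ok ()

let () =
  assert (check_flag ~requested:(2, 1) "--global" = Ok ());
  assert (check_flag ~requested:(2, 0) "--global"
          = Error "--global was added in 2.1")
```

A script that pins its CLI version therefore fails early and explicitly when it uses a newer flag, instead of silently changing behaviour.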

There’s even more detail on this feature in our wiki. We’re still finalising some details on exactly how opam behaves when --cli is not given, but we’re hoping that this feature will make it much easier in future releases for opam to make required changes and improvements to the CLI without breaking existing set-ups and tools.

What’s new since the last beta?

opam now uses CLI versioning (#4385)
opam now exits with code 31 if all failures were during fetch operations (#4214)
opam install now has a --download-only flag (#4036), allowing opam’s caches to be primed
opam init now advises the correct shell-specific command for eval $(opam env) (#4427)
post-install hooks are now allowed to modify or remove installed files (#4388)
New package variable opamfile-loc with the location of the installed package opam file (#4402)
opam update now has --depexts flag (#4355), allowing the system package manager to update too
depext support for NetBSD and DragonFlyBSD added (#4396)
The format-preserving opam file printer has been overhauled (#3993, #4298 and #4302)
pins are now fetched in parallel (#4315)
os-family=ubuntu is now treated as os-family=debian (#4441)
opam lint now checks that strings in filtered package formulae are booleans or variables (#4439)

and many other bug fixes as listed on the release page.

New Plugins

Several features that were formerly plugins have been integrated into opam 2.1.0. We have also developed some new plugins that satisfy emerging workflows from the community and the core OCaml team. They are available for use with the opam 2.1 beta as well, and feedback on them should be directed to the respective GitHub trackers for those plugins.

opam compiler

The opam compiler plugin can be used to create switches from various sources such as the main opam repository, the ocaml-multicore fork, or a local development directory. It can use Git tag names, branch names, or PR numbers to specify what to install.

Once installed, these are normal opam switches, and one can install packages in them. To iterate on a compiler feature and try opam packages at the same time, it supports two ways to reinstall the compiler: either a safe and slow technique that will reinstall all packages, or a quick way that will just overwrite the compiler in place.

opam monorepo

The opam monorepo plugin lets you assemble standalone dune workspaces with your projects and all of their opam dependencies, letting you build it all from scratch using only Dune and OCaml. This satisfies the “monorepo” workflow which is commonly requested by large projects that need all of their dependencies in one place. It is also being used by projects that need global cross-compilation for all aspects of a codebase (including C stubs in packages), such as the MirageOS unikernel framework.

Next Steps

This is anticipated to be the final beta in the 2.1 series, and we will be moving to release candidate status after this. We could really use your help with testing this release in your infrastructure and projects; please let us know if you run into any blockers. If you have feature requests, please also report them on our issue tracker -- we will be planning the next release cycle once we ship opam 2.1.0 shortly.

Memthol: exploring program profiling

2020-12-01T09:05:17Z

Memthol is a visualizer and analyzer for program profiling. It works on memory dumps containing information about the size and (de)allocation date of part of the allocations performed by some execution of a program.

For information regarding building memthol, features, browser compatibility… refer to the memthol github repository. Please note that Memthol, as a side project, is a work in progress that remains in beta status for now.

Memthol's background

The Memthol work was started more than a year ago (we had published a short introductory paper at the JFLA2020). The whole idea was to use the previous work originally achieved on ocp-memprof, and look for some extra funding to achieve a usable and industrial version. Then came the excellent memtrace profiler by Jane Street's team (congrats!). Memthol is a self-funded side project that we think is still worth giving to the OCaml community. Its approach is valuable, and can be complementary. It is released under the free GPL licence v3.

Memthol's versatility: supporting memtrace's dump format

The memtrace format is nicely designed and polished enough to be considered a future standard for other tools. This is why Memthol supports Jane Street's dumper format, instead of our own dumper library's.

Why choose Rust to implement Memthol?

We've been exploring the Rust language for more than a year now. The Memthol work was the opportunity to further explore this state-of-the-art language. We are open to extra funding, to deepen the Memthol work should industrial users be interested.

Memthol's How-to

The following steps are from the Memthol Github howto.

1. Introduction

2. Basics

3. Charts

4. Global Settings

5. Callstack Filters

Introduction

This tutorial deals with the BUI (Browser User Interface) aspect of the profiling. How the dumps are generated is outside of the scope of this document. Currently, memthol accepts memory dumps produced by [Memtrace](https://blog.janestreet.com/finding-memory-leaks-with-memtrace) (github repository here). A memtrace dump for a program execution is a single Common Trace Format (CTF) file.

This tutorial uses CTF files from the memthol repository. All paths mentioned in the examples are from its root.

Memthol is written in Rust and is composed of

a server, written in pure Rust, and
a client, written in Rust and compiled to web assembly.

The server contains the client, which it will serve at some address on some port when launched.

Running Memthol

Memthol must be given a path to a CTF file generated by memtrace.

> ls rsc/dumps/ctf/flamba.ctf
rsc/dumps/ctf/flamba.ctf
> memthol rsc/dumps/ctf/flamba.ctf
|===| Starting
| url: http://localhost:7878
| target: `rsc/dumps/ctf/flamba.ctf`
|===|

Basics

Our running example in this section will be rsc/dumps/ctf/mini_ae.ctf:

❯ memthol --filter_gen none rsc/dumps/ctf/mini_ae.ctf
|===| Starting
| url: http://localhost:7878
| target: `rsc/dumps/ctf/mini_ae.ctf`
|===|

Notice the odd --filter_gen none passed to memthol. Ignore it for now, it will be discussed later in this section.

Once memthol is running, http://localhost:7878/ (here) will lead you to memthol's BUI, which should look something like this:

Click on the orange everything tab at the bottom left of the screen.

Memthol's interface is split in three parts:

the central, main part displays charts. There is only one here, showing the evolution of the program's total memory size over time based on the memory dump.
the header gives statistics about the memory dump and handles general settings. There is currently only one, the time window.
the footer controls your filters (there is only one here), which we are going to discuss right now.

Filters

Filters allow you to split allocations and display them separately. A filter is essentially a set of allocations. Memthol has two built-in filters. The first one is the everything filter. You cannot really do anything with it except for changing its name and color using the filter settings in the footer.

Notice that when a filter is modified, two buttons appear in the top-left part of the footer. The first reverts the changes while the second one saves them. Let's save these changes.

The everything filter always contains all allocations in the memory dump. It cannot be changed besides the cosmetic changes we just did. These changes are reverted in the rest of the section.

Custom Filters

Let's create a new filter using the + add button in the top-right part of the footer.

Notice that, unlike everything, the settings for our new filter have a Catch allocation if … (empty) section with a + add button. Let's click on that.

This adds a criterion to our filter. Let's modify it so that our filter catches everything of size greater than zero machine words, rename the filter, and save these changes.

The tab for our filter now shows (3) next to its name, indicating that this filter catches 3 allocations, which is all the allocations of the (tiny) dump.

Now, create a new filter and modify it so that it catches allocations made in file weak.ml. This requires

creating a filter,
adding a criterion to that filter,
switching it from size to callstack,
removing the trailing ** (anything) by erasing it,
writing weak.ml as the last file that should appear in the callstack.

After saving it, you should get the following.

Sadly, this filter does not match anything, although some allocations fit this filter. This is because a custom filter F "catches" an allocation if

all of the criteria of F are true for this allocation, and
the allocation is not caught by any custom filter at the left of F (note that the everything filter is not a custom filter).

In other words, all allocations go through the list of custom filters from left to right, and are caught by the first filter such that all of its criteria are true for this allocation. As such, it is similar to switch/case and pattern matching.

Let's move our new filter to the left by clicking the left arrow next to it, and save the change.

Nice.

You can remove a filter by selecting it and clicking the - remove button in the top-right part of the footer, next to the + add filter button. This only works for custom filters, you cannot remove built-in filters.

Now, remove the first filter we created (size ≥ 0), which should give you this:

Out of nowhere, we get the second and last built-in filter: catch-all. When some allocations are not caught by any of your filters, they will end up in this filter. Catch-all is not visible when it does not catch any allocation, which is why it was (mostly) not visible until now. The filters we wrote previously were catching all the allocations.

In the switch/case analogy, catch-all is the else/default branch. In pattern matching, it would be a trailing wildcard _.
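
The first-match rule and the catch-all fallback can be sketched in a few lines of OCaml. This is a model for illustration, not Memthol's code (Memthol is written in Rust): the alloc and filter records, dispatch and last_file_is are hypothetical names.

```ocaml
(* Hypothetical allocation record: size in machine words, and the callstack
   as a list of file names from outermost to innermost call. *)
type alloc = { size : int; callstack : string list }

(* A custom filter: a name and a list of criteria that must all hold. *)
type filter = { name : string; criteria : (alloc -> bool) list }

(* Name of the filter catching [a]: the first one, from left to right,
   whose criteria all hold, or "catch-all" if none does. *)
let rec dispatch filters a =
  match filters with
  | [] -> "catch-all"
  | f :: rest ->
      if List.for_all (fun criterion -> criterion a) f.criteria then f.name
      else dispatch rest a

(* Criterion: [file] is the last file of the callstack. *)
let last_file_is file a =
  match List.rev a.callstack with
  | last :: _ -> last = file
  | [] -> false

let size_filter = { name = "size"; criteria = [ (fun a -> a.size >= 0) ] }
let weak_filter = { name = "weak.ml"; criteria = [ last_file_is "weak.ml" ] }

let a = { size = 4; callstack = [ "main.ml"; "weak.ml" ] }

let () =
  (* With the size filter on the left, weak.ml never gets a chance. *)
  assert (dispatch [ size_filter; weak_filter ] a = "size");
  (* Moving weak.ml to the left lets it catch the allocation. *)
  assert (dispatch [ weak_filter; size_filter ] a = "weak.ml");
  (* With no matching custom filter, the allocation falls to catch-all. *)
  assert (dispatch [ weak_filter ] { size = 2; callstack = [ "set.ml" ] }
          = "catch-all")
```

The recursion over the filter list is exactly the "first branch that matches wins" behaviour of a pattern match with a trailing wildcard.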

So, weak.ml only catches one of the three allocations: catch-all appears and indicates it matches the remaining two.

It is also possible to write filter criteria over allocations' callstacks. This is discussed in the Callstack Filters Section.

Filter Generation

When we launched this section's running example, we passed --filter_gen none to memthol. This is because, by default, memthol will run automatic filter generation which scans allocations and generates filters. The default (and currently only) one creates one filter per allocation-site file.

For more details, in particular filter generation customization, run memthol --filter_gen help.
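
As a rough picture of what "one filter per allocation-site file" means, the following OCaml sketch groups allocations by file. It only models the idea; the alloc record and generate_filters are hypothetical and do not reflect Memthol's actual data structures.

```ocaml
(* Hypothetical allocation record: allocation-site file and size. *)
type alloc = { site_file : string; size : int }

module StringMap = Map.Make (String)

(* Group allocations by allocation-site file: each key stands for one
   generated filter, bound to the allocations it would catch. *)
let generate_filters allocs =
  List.fold_left
    (fun acc a ->
      let previous =
        Option.value ~default:[] (StringMap.find_opt a.site_file acc)
      in
      StringMap.add a.site_file (a :: previous) acc)
    StringMap.empty allocs

let allocs =
  [ { site_file = "weak.ml"; size = 3 };
    { site_file = "set.ml"; size = 5 };
    { site_file = "set.ml"; size = 2 } ]

let () =
  (* Prints "set.ml (2)" then "weak.ml (1)", mimicking the filter tabs. *)
  StringMap.iter
    (fun file caught -> Printf.printf "%s (%d)\n" file (List.length caught))
    (generate_filters allocs)
```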

If we relaunch the example without --filter_gen none

❯ memthol rsc/dumps/ctf/mini_ae.ctf
|===| Starting
| url: http://localhost:7878
| target: `rsc/dumps/ctf/mini_ae.ctf`
|===|

we get something like this (actual colors may vary):

Charts

This section uses the same running example as the last section.

❯ memthol rsc/dumps/ctf/mini_ae.ctf
|===| Starting
| url: http://localhost:7878
| target: `rsc/dumps/ctf/mini_ae.ctf`
|===|

Filter Toggling

The first way to interact with a chart is to (de)activate filters. Each chart has its own filter tabs that let you toggle filters on/off.

From the initial settings

click on all filters but everything to toggle them off.

Let's create a new chart. The only kind of chart that can be constructed currently is total size over time, so click on create chart below our current, lone chart.

Deactivate everything in the second chart.

Nice. We now have the overall total size over time in the first chart, and the details for each filter in the second one.

Next, notice that both charts have, on the left of their title, a down (first chart) and up (second chart) arrow. This moves the charts up and down.

On the right of the title, we have a settings ... button which is discussed below. The next button collapses the chart. If we click on the collapse button of the first chart, it collapses and the button turns into an expand button.

The last button in the chart header removes the chart.

Chart Settings

Clicking the settings ... button in the header of any chart displays its settings. (Clicking on the button again hides them.)

Currently, these chart settings only allow you to rename the chart and change its display mode.

Display Mode

In memthol, a chart can be displayed in one of three ways:

normal, the one we used so far,
stacked area, where the values of each filter are displayed on top of each other, and
stacked area percent, same as stacked area but values are displayed as percents of the total.
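
The two stacked modes boil down to simple arithmetic on the per-filter values at each point in time. The OCaml sketch below shows that arithmetic under the assumption that values are plain floats listed in display order; the function names are hypothetical and this is not Memthol's rendering code.

```ocaml
(* Stacked area: each filter is drawn on top of the previous ones, so its
   upper bound is the running sum of the values so far. *)
let stacked values =
  let _, tops =
    List.fold_left
      (fun (sum, acc) v -> (sum +. v, (sum +. v) :: acc))
      (0., []) values
  in
  List.rev tops

(* Stacked area percent: same, after scaling values to percents of the total. *)
let stacked_percent values =
  let total = List.fold_left ( +. ) 0. values in
  if total = 0. then List.map (fun _ -> 0.) values
  else stacked (List.map (fun v -> 100. *. v /. total) values)

let () =
  assert (stacked [ 10.; 30.; 60. ] = [ 10.; 40.; 100. ]);
  assert (stacked_percent [ 1.; 1.; 2. ] = [ 25.; 50.; 100. ])
```

In percent mode the topmost curve always sits at 100, which makes the relative share of each filter easier to read than its absolute size.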

Here is the second chart from our example displayed as stacked area for instance:

Global Settings

This section uses the same running example as the last section.

❯ memthol rsc/dumps/ctf/mini_ae.ctf
|===| Starting
| url: http://localhost:7878
| target: `rsc/dumps/ctf/mini_ae.ctf`
|===|

There is currently only one global setting: the time window.

Time Window

The time window global setting controls the time interval displayed by all the charts.

In our example,

not much is happening before (roughly) 0.065 seconds. Let's have the time window start at that point:

Similar to filter editing, we can apply or cancel this change using the two buttons that appeared in the bottom-left corner of the header.

Saving these changes yields

Here is the same chart but with the time window upper-bound set at 0.074.

Callstack Filters

Callstack filters are filters operating over allocation properties that are sequences of strings (potentially with some other data). Currently, this means allocation callstacks, where the strings are file names with line/column information.

String Filters

A string filter can have three shapes: an actual string value, a regex, or a match anything / wildcard filter represented by the string "...". This wildcard filter is discussed in its own section below.

A string value is simply given as a value. To match precisely the string "file_name", one only needs to write file_name. So, a filter that matches precisely the list of strings [ "file_name_1", "file_name_2" ] will be written


string list	contains	`[ file_name_1 file_name_2 ]`

A regex on the other hand has to be written between #" and "#. If we want the same filter as above, but want to relax the first string description to be file_name_<i> where <i> is a single digit, we write the filter as


string list	contains	`[ #"file_name_[0-9]"# file_name_2 ]`

The Wildcard Filter

The wildcard filter, written ..., lazily (in general, see below) matches a repetition of any string-like element of the list. To break this definition down, let us separate two cases: the first one is when ... is not followed by another string-like filter, and the second one is when it is followed by another filter.

In the first case, ... simply matches everything. Consider for instance the filter


string list	contains	`[ #"file_name_[0-9]"# ... ]`

This filter matches any list of strings that starts with a string accepted by the first regex filter. The following lists of strings are all accepted by the filter above.

[ file_name_0 ]
[ file_name_7 anything at all ]
[ file_name_3 file_name_7 ]

Now, there is one case when ... is not actually lazy: when the n string-filters after it are not .... In this case, all elements of the list but the n last ones will be skipped, leaving them for the n last string filters.

For this reason


string list	contains	`[ ... #"file_name_[0-9]"# ]`

does work as expected. For example, on the string list

[ "some_file_name" "file_name_7" "another_file_name" "file_name_0" ]

a lazy behavior would not match. First, ... would match anything up to and excluding a string recognized by #"file_name_[0-9]"#. So ... would match some_file_name, but that's it since file_name_7 is a match for #"file_name_[0-9]"#. Hence the filter would reject this list of strings, because there should be nothing left after the match for #"file_name_[0-9]"#. But there are still another_file_name and file_name_0 left.

Instead, the filter works as expected. ... discards all elements but the last one file_name_0, which is accepted by #"file_name_[0-9]"#.
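
The accepted lists can be reproduced with a small OCaml matcher. Note that this sketch uses plain backtracking rather than the lazy/skip strategy described above, so it models which lists are accepted, not how Memthol decides it. The string_filter type is hypothetical, and a hand-written predicate stands in for the regex #"file_name_[0-9]"#, assumed to match the whole string.

```ocaml
(* A string filter: an exact value, a predicate standing in for a regex,
   or the wildcard "...". *)
type string_filter =
  | Exact of string
  | Pred of (string -> bool)
  | Wildcard

(* Backtracking matcher: the wildcard may absorb any number of elements. *)
let rec matches filters strings =
  match filters, strings with
  | [], [] -> true
  | [], _ :: _ -> false
  | Wildcard :: rest, [] -> matches rest []
  | Wildcard :: rest, _ :: tail ->
      (* Either the wildcard stops here, or it absorbs one more element. *)
      matches rest strings || matches filters tail
  | _ :: _, [] -> false
  | Exact s :: rest, x :: tail -> s = x && matches rest tail
  | Pred p :: rest, x :: tail -> p x && matches rest tail

(* Stand-in for the regex file_name_[0-9]. *)
let file_name_digit s =
  String.length s = 11
  && String.sub s 0 10 = "file_name_"
  && (match s.[10] with '0' .. '9' -> true | _ -> false)

let () =
  let prefix = [ Pred file_name_digit; Wildcard ] in
  assert (matches prefix [ "file_name_0" ]);
  assert (matches prefix [ "file_name_7"; "anything"; "at"; "all" ]);
  let suffix = [ Wildcard; Pred file_name_digit ] in
  assert (matches suffix
            [ "some_file_name"; "file_name_7"; "another_file_name";
              "file_name_0" ]);
  assert (not (matches suffix [ "file_name_7"; "another_file_name" ]))
```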

Callstack (Location) Filters

Allocation callstack information is a list of tuples containing:

the name of the file,
the line in the file,
a column range.

Currently, the range information is ignored. The line in the file is not, and one can specify a line constraint while writing a callstack filter. The normal syntax is

<string-filter>:<line-filter>

Now, a line filter has two basic shapes

_: anything,
<number>: an actual value.

It can also be a range:

[<basic-line-filter>, <basic-line-filter>]: a potentially open range.

Line Filter Examples


`_`	matches any line at all
`7`	matches line 7
`[50, 102]`	matches any line between `50` and `102`
`[50, _]`	matches any line greater than `50`
`[_, 102]`	matches any line less than `102`
`[_, _]`	same as `_` (matches any line)
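
The line filter grammar maps naturally onto a small variant type. The OCaml sketch below is only a model of the syntax above, with hypothetical type and function names. It treats range bounds as inclusive, which is an assumption: the text does not say whether bounds are included.

```ocaml
(* Basic line filter: "_" (anything) or an actual line number. *)
type basic = Any | Line of int

(* A line filter is either basic, or a potentially open range. *)
type line_filter =
  | Basic of basic
  | Range of basic * basic

(* Assumption: range bounds are inclusive. *)
let matches_line filter line =
  match filter with
  | Basic Any -> true
  | Basic (Line n) -> line = n
  | Range (lo, hi) ->
      let above = match lo with Any -> true | Line n -> line >= n in
      let below = match hi with Any -> true | Line n -> line <= n in
      above && below

let () =
  assert (matches_line (Basic Any) 42);
  assert (matches_line (Basic (Line 7)) 7);
  assert (matches_line (Range (Line 50, Line 102)) 75);
  assert (not (matches_line (Range (Line 50, Any)) 12));
  assert (matches_line (Range (Any, Any)) 1)
```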

Callstack Filter Examples

Whitespace is inserted for readability but is not needed:


`src/main.ml : _`	matches any line of `src/main.ml`
`#".*/main.ml"# : 107`	matches line 107 of any `main.ml` file regardless of its path

Rehabilitating Packs using Functors and Recursivity, part 2.

2020-09-30T09:05:17Z

This blog post and the previous one about functor packs cover two RFCs currently developed by OCamlPro and Jane Street. We previously introduced functor packs, a new feature adding the possibility to compile packs as functors, allowing the user to implement functors as multiple source files or even parameterized libraries.

In this blog post, we will cover the other aspect of the packs rehabilitation: allowing anyone to implement recursive compilation units using packs (as described formally in the RFC#20). Our previous post introduced briefly how packs were compiled and why we needed some bits of closure conversion to effectively implement big functors. Once again, to implement recursive packs we will need to encode modules through this technique, as such we advise the reader to check at least the introduction and the compilation part of functor packs.

Recursive modules through recursive packs

Recursive modules are a feature long available in the compiler, but restricted to modules, not compilation units. As such, it is impossible to write two files that depend on each other, except by using scripts that tie up these modules into a single compilation file. Due to the internal representation of recursive modules, it would be difficult to implement recursive (and mutually recursive) compilation units. However, we could use packs to implement these.

One common example of recursive modules are trees whose nodes are represented by sets. To implement such a data structure with the standard library we need recursive modules: Set is a functor that takes as parameter a module describing the values embedded in the set, but in our case the type needs the already applied functor.

module rec T : sig
  type t =
      Leaf of int
    | Node of TSet.t

  val compare : t -> t -> int
end = struct
  type t =
      Leaf of int
    | Node of TSet.t

  let compare t1 t2 =
    match t1, t2 with
      Leaf v1, Leaf v2 -> Int.compare v1 v2
    | Node s1, Node s2 -> TSet.compare s1 s2
    | Leaf _, Node _ -> -1
    | Node _, Leaf _ -> 1
end

and TSet : Set.S with type elt = T.t = Set.Make(T)
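
As a quick check of how this definition is used, the following self-contained sketch repeats it and builds a small tree. The size function and the sample tree are illustrative additions, not part of the original example.

```ocaml
module rec T : sig
  type t = Leaf of int | Node of TSet.t
  val compare : t -> t -> int
end = struct
  type t = Leaf of int | Node of TSet.t

  let compare t1 t2 =
    match t1, t2 with
    | Leaf v1, Leaf v2 -> Int.compare v1 v2
    | Node s1, Node s2 -> TSet.compare s1 s2
    | Leaf _, Node _ -> -1
    | Node _, Leaf _ -> 1
end

and TSet : Set.S with type elt = T.t = Set.Make (T)

(* Number of nodes and leaves in a tree. *)
let rec size (t : T.t) =
  match t with
  | T.Leaf _ -> 1
  | T.Node s -> TSet.fold (fun child acc -> acc + size child) s 1

(* The duplicate leaf is dropped, since children are stored in a set. *)
let children =
  TSet.of_list
    [ T.Leaf 1; T.Leaf 2; T.Leaf 2; T.Node (TSet.singleton (T.Leaf 3)) ]

let tree = T.Node children

let () =
  assert (TSet.cardinal children = 3);
  assert (size tree = 5)
```

The type T.t mentions TSet.t, and TSet is built by applying Set.Make to T: this mutual dependency is what forces the use of module rec.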

With recursive packs, we can simply put T and TSet into their respective files (t.ml and tSet.ml), and tie them into one module (let's name it P). Signatures of recursive modules cannot be inferred, so we also need to define t.mli and tSet.mli. Both must be compiled simultaneously since they refer to each other. The compilation commands are the following:

ocamlopt -c -for-pack P -recursive t.mli tSet.mli
ocamlopt -c -for-pack P -pack-is-recursive P t.ml
ocamlopt -c -for-pack P -pack-is-recursive P tSet.ml
ocamlopt -o p.cmx -recursive-pack t.cmx tSet.cmx

We have three new compilation options:

-recursive indicates to the compiler to typecheck all the given mlis simultaneously, as recursive modules.
-pack-is-recursive indicates which pack(s) in the hierarchy are meant to be recursive. This is necessary since it determines how the module must be compiled (i.e if we will need to apply closure conversion).
-recursive-pack generates a pack that deals with the initialization of its modules, as for recursive modules.

Recursive modules compilation

One may be wondering why we need packs to compile recursive modules. Let's take a look at how they are encoded. We will craft a naive example that is simple enough once compiled:

module rec Even : sig
  val test: int -> bool
end = struct
  let test i =
    if i-1 <= 0 then false else Odd.test (i-1)
end

and Odd : sig
  val test: int -> bool
end = struct
  let test i =
    if i-1 <= 0 then true else Even.test (i-1)
end

It defines two modules, Even and Odd, that test whether an integer is even or odd respectively; when the base case is not reached, each calls the test function from the other module. Not a really interesting use of recursive modules, obviously. The compilation schema for recursive modules is the following:

First, it allocates empty blocks for each module according to its shape (how many values are bound and what size they need in the block, if the module is a functor and what are its values, etc).
Then these blocks are filled with the implementation.

In our case, in a pseudo-code that is a bit higher order than Lambda (the first intermediate language of OCaml) it would translate as:

module Even = <allocation of the shape of even.cmx>
module Odd = <allocation of the shape of odd.cmx>

Even := <struct let test = .. end>
Odd := <struct let test = .. end>

This ensures that every reference to Even in Odd (and vice versa) is a valid pointer. To respect this schema, we will use packs to tie recursive modules together. Without packs, we would have to generate this code when linking the units into an executable, which can be tricky. The pack can simply do it as initialization code.
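
The allocate-then-fill schema can be imitated in plain OCaml, with mutable records standing in for the pre-allocated blocks. This is only an analogy for illustration, not what the compiler generates; the block type and the placeholder function are hypothetical.

```ocaml
(* A "block" with one slot, matching the shape of Even and Odd:
   a module with a single value. *)
type block = { mutable test : int -> bool }

(* Step 1: allocate blocks of the right shape, with dummy content. *)
let placeholder _ = failwith "module not initialized"
let even = { test = placeholder }
let odd = { test = placeholder }

(* Step 2: fill the blocks. Each implementation refers to the other block,
   which already exists and is therefore a valid pointer. *)
let () =
  even.test <- (fun i -> if i - 1 <= 0 then false else odd.test (i - 1));
  odd.test <- (fun i -> if i - 1 <= 0 then true else even.test (i - 1))

let () =
  assert (even.test 4);
  assert (not (even.test 3));
  assert (odd.test 3)
```

Calling even.test before step 2 would raise the placeholder's exception, which is why initialization order matters.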

Compiling modules for recursive pack

If we tried to compile these modules naively, we would end up in the same situation as for the functor pack: the compilation units would refer to identifiers that do not exist at the time they are generated. Moreover, the initialization part needs to know the shape of the compilation unit to be able to allocate precisely the block that will contain the recursive module. In order to implement recursive compilation units into packs, we extend the compilation units in two ways:

The shape of the unit is computed and stored in the cmo (or cmx).
As for functor packs, we apply closure conversion on the free variables that are modules from the same pack or from packs above in the hierarchy, as long as they are recursive.

As an example, we will reuse our Even / Odd example above, split it into two units even.ml and odd.ml, and compile them into a recursive pack P. Both have the same shape: a module with a single value. Even refers to a free variable Odd, which is in the same recursive pack, and vice versa. The result of the closure conversion is a function that takes the pointer resulting from the initialization. Since the module is also recursive itself, it takes its own pointer resulting from its initialization. The result will look something like:

(* even.cmx *)
module Even_rec (Even: <even.mli>)(Odd: <odd.mli>) = ..

(* odd.cmx *)
module Odd_rec (Odd: <odd.mli>)(Even: <even.mli>) = ..

(* p.cmx *)
module Even = <allocation of the shape of even.cmx>
module Odd = <allocation of the shape of odd.cmx>

Even := Even_rec(Even)(Odd)
Odd := Odd_rec(Odd)(Even)
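The same structure can be written in today's OCaml with ordinary functors and a `module rec` that ties the knot. This is a sketch under assumptions: the signature name `TEST` is ours (it plays the role of both even.mli and odd.mli), and the `module rec` binding plays the role of the initialization code of p.cmx.

```ocaml
(* Shared interface standing in for even.mli and odd.mli. *)
module type TEST = sig
  val test : int -> bool
end

(* even.ml after closure conversion: its free module variables
   (itself and Odd) become functor parameters. *)
module Even_rec (Even : TEST) (Odd : TEST) : TEST = struct
  let test i = if i = 0 then true else Odd.test (i - 1)
end

(* odd.ml after closure conversion. *)
module Odd_rec (Odd : TEST) (Even : TEST) : TEST = struct
  let test i = if i = 0 then false else Even.test (i - 1)
end

(* The pack: allocate both modules, then fill them by applying the functors. *)
module rec Even : TEST = Even_rec (Even) (Odd)
and Odd : TEST = Odd_rec (Odd) (Even)

let () =
  assert (Even.test 4);
  assert (Odd.test 3)
```

Both signatures contain only function values, so the compiler can allocate placeholder blocks for `Even` and `Odd` before either functor application is evaluated.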

Rejuvenating packs

Under the hood, these new features come with some refactoring of the pack implementation, which follows work done for an RFC on the representation of symbols in the middle-end of the compiler. Packs were no longer really used and had been made obsolete by module aliases; this work makes them relevant again. These RFCs improve the OCaml ecosystem in multiple ways:

Compilation units are now on par with modules, since they can be functors.
Functor packs allow developers to implement parameterized libraries, without having to rely on scripts to produce multiple libraries linked with different backends (for example, Cohttp can use Lwt or Async as backend, and provides two libraries, one for each of these).
Recursive packs allow recursive modules to be implemented in separate files.

We hope that such improvements will benefit users and library developers. Having a way to implement parameterized libraries without describing big functors by hand, or to use mutually recursive compilation units without scripts that generate a single ml file, will certainly introduce new workflows.

Rehabilitating Packs using Functors and Recursivity, part 1.

2020-09-24T09:05:17Z

OCamlPro has a long history of dedicated efforts to support the development of the OCaml compiler, through sponsorship or direct contributions from the Flambda team. An important one is the Flambda intermediate representation designed for optimizations, and in the future its next iteration, Flambda 2. This work is funded by Jane Street.

Packs in the OCaml ecosystem are a somewhat outdated concept (see the options -pack and -for-pack in the OCaml manual), and their main utility has been overtaken by the introduction of module aliases in OCaml 4.02. What if we tried to redeem them and give them a new lease of life by adding the possibility to generate functor packs or recursive packs?

This blog post covers functor units and functor packs, while the next one will be centered around recursive packs. Both RFCs are currently developed by Jane Street and OCamlPro. This idea was initially introduced by functor packs (Fabrice Le Fessant) and later generalized by functorized namespaces (Pierrick Couderc et al.).

Packs for the masses

First of all let's take a look at what packs are, and how they fixed some issues that arose when the ecosystem started to grow and the number of libraries got quite large.

One common problem in any programming language is how names are treated and disambiguated. For example, look at this small piece of code:

let x = "something"

let x = "something else"

We declare two variables x, but actually the first one is shadowed by the second, and is now unavailable for the rest of the program. It is perfectly valid in OCaml. Let's try to do the same thing with modules:

module M = struct end

module M = struct end

The compiler rejects it with the following error:

File "m.ml", line 3, characters 0-21:
3 | module M = struct end
    ^^^^^^^^^^^^^^^^^^^^^
Error: Multiple definition of the module name M.
       Names must be unique in a given structure or signature.

This also applies to programs linking two compilation units of the same name. Imagine you are using two libraries (here lib_a and lib_b) that both define a module named Misc.

ocamlopt -o prog.asm -I lib_a -I lib_b lib_a.cmxa lib_b.cmxa prog.ml 
File "prog.ml", line 1:
Error: The files lib_a/a.cmi and lib_b/b.cmi make inconsistent assumptions
over interface Misc

At link time, the compiler will reject your program since you are trying to link two modules with the same name but different implementations. The compiler is unable to differentiate the two compilation units since they define some identical symbols, and as such it cannot link the program. Enforcing unique module names in the same namespace (i.e. a signature) is consistent with the inability to link two modules of the same name in the same program.

However, Misc is a common name for a module in any project. How can we avoid that? As a user of the libraries there is nothing you can do, since you cannot rename the modules (you will eventually need to link two files named misc.cmx). As the developer, you need to ensure that your module names are unique enough to be used alongside any other library. One solution is to prefix each of your compilation units, for example by naming your files mylib_misc.ml, with the drawback that you will need to use those long module names inside your library. Another solution is to pack your units.

A pack is simply a generated module that gathers all your compilation units into one. For example, given two files a.ml and b.ml, you can generate a pack (i.e. a module) mylib.cmx that is equivalent to:

module A = struct <content of a.ml> end
module B = struct <content of b.ml> end

As such, A and B can retain their original module names and be accessed from the outside as Mylib.A and Mylib.B. This relies on the namespacing induced by the module system. Developers can simply generate a pack for their library, assuming the library name is unique enough to be linked with other modules without risk of name clashes. However, packs have one big downside: suppose you use a library with many modules but need only one of them. Without packs, the compiler links only the necessary compilation units from this library; since a pack is one big compilation unit, your program embeds the complete library.
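To make the namespacing concrete, here is a hand-written equivalent of two packed libraries that each contain a module named Misc. The library names and module contents are hypothetical; a real pack is produced by the compiler from separate files rather than written as nested modules.

```ocaml
(* What the pack mylib.cmx is equivalent to, with made-up contents. *)
module Mylib = struct
  module Misc = struct
    let describe () = "Misc from mylib"
  end
  module A = struct
    (* Inside the pack, sibling modules keep their short names. *)
    let hello () = "A uses " ^ Misc.describe ()
  end
end

(* A second library that also has a Misc module. *)
module Otherlib = struct
  module Misc = struct
    let describe () = "Misc from otherlib"
  end
end

(* From the outside, the two Misc modules are told apart by their path. *)
let () =
  assert (Mylib.A.hello () = "A uses Misc from mylib");
  assert (Otherlib.Misc.describe () = "Misc from otherlib")
```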

This problem has been fixed since OCaml 4.02 by module aliases and the compiler option -no-alias-deps. The result for the user is equivalent to a pack, which makes packs more or less deprecated.

Functorizing packs, or how to parameterize a library

Packs being modules representing libraries, a useful feature would be to be able to produce libraries that take modules as parameters, just like functors. Another usage would be to split a huge functor into multiple files. In other words, we want our pack Mylib to be compiled as:

functor (P : sig .. end) -> struct 
  module A = struct <content of a.ml> end
  module B = struct <content of b.ml> end
end

while A and B would use the parameter P as a module, and Mylib instantiated later as

module Mylib = Mylib(Some_module_with_sig_compatible_with_P)
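Written by hand in a single file, the target of this compilation would look like the sketch below. The signature `P_SIG`, the argument modules `English` and `French`, and the contents of A and B are assumptions made for illustration; the point of functor packs is precisely to obtain this shape while keeping A and B in separate files.

```ocaml
(* Hypothetical interface of the parameter P. *)
module type P_SIG = sig
  val name : string
end

(* The whole library as one functor; A and B both see the parameter P. *)
module Mylib (P : P_SIG) = struct
  module A = struct
    let hello = "hello " ^ P.name
  end
  module B = struct
    let hello_twice = A.hello ^ ", " ^ A.hello
  end
end

(* Two possible arguments with a signature compatible with P. *)
module English = struct let name = "world" end
module French = struct let name = "monde" end

(* The same library instantiated twice, once per argument. *)
module Mylib_en = Mylib (English)
module Mylib_fr = Mylib (French)

let () =
  assert (Mylib_en.A.hello = "hello world");
  assert (Mylib_fr.B.hello_twice = "hello monde, hello monde")
```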

One can notice that our pack is indeed a functor, and not simply a module that binds a functor. To be able to do that, we also extend classical compilation units so that they can be compiled as functors. Such functors are not expressed in the language and we do not provide a syntax for them; they are a matter of compile-time options. For example:

ocamlopt -c -parameter P m.ml

will compile m.ml as a functor that has a parameter P whose interface is described in p.cmi in the compilation path. Similarly, our pack Mylib can be produced by the following compilation steps:

ocamlopt -c -parameter-of Mylib p.mli
ocamlopt -c -for-pack "Mylib(P)" a.ml
ocamlopt -c -for-pack "Mylib(P)" b.ml
ocamlopt -pack -o mylib.cmx -parameter P a.cmx b.cmx

In detail:

The parameter is compiled with the flag -parameter-of Mylib, as such it won't be used as the interface of an implementation.
The two packed modules are compiled with the flag -for-pack "Mylib(P)". Expressing the parameter of the pack is mandatory since P must be known as a functor parameter (we will see why in the next section).
The pack is compiled with -parameter P, which will indeed produce a functorized compilation unit.

Functors are not limited to a single parameter: they can be compiled with multiple -parameter options and multiple arguments in -for-pack. Since this implementation lives on the build system side, it does not need to change the syntax of the language. We expect build tools like dune to support this feature, which should make it easier to use. Moreover, it puts compilation units on par with modules, which can have a functor type. One downside, however, is that we cannot express type equalities between two parameters, or with the functor body type, as we would with substitutions in module types.

Functor packs under the hood

In terms of implementation, packs should be seen as a concatenation of the compilation units then a rebinding of each of them in the newly created one. For example, a pack P of two units m.cmx and n.cmx is actually compiled as something like:

module P__M = <code of m.cmx> 
module P__N = <code of n.cmx> 
module M = P__M 
module N = P__N
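As a small illustration of this "concatenate, then rebind" view, the sketch below wraps the rebinding in an explicit module P that plays the role of the pack's compilation unit. The contents of M and N are made up; the example only shows that the aliases preserve type identities, so values built through P.M are accepted by code that was written against P__M.

```ocaml
(* Code of m.cmx, under its prefixed name. *)
module P__M = struct
  type t = Label of string
  let make s = Label s
end

(* Code of n.cmx; it depends on the first unit. *)
module P__N = struct
  let show (P__M.Label s) = "<" ^ s ^ ">"
end

(* The pack itself only rebinds the prefixed modules to their short names. *)
module P = struct
  module M = P__M
  module N = P__N
end

(* P.M.t and P__M.t are the same type thanks to the alias. *)
let () = assert (P.N.show (P.M.make "x") = "<x>")
```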

According to this representation, if we tried to naively implement our previous functor pack Mylib(P) we would end up with a functor looking like this:

module Mylib__A = <code of a.cmx, with references to P>
module Mylib__B = <code of b.cmx, with references to P>

functor (P : <signature of p.cmi>) -> struct
  module A = Mylib__A
  module B = Mylib__B
end

Unfortunately, this encoding of functor packs is wrong: P is free in a.cmx and b.cmx, and its identifier cannot retrospectively correspond to the one generated for the functor. The solution is actually quite simple and relies on a transformation known as closure conversion. In other words, we will transform our modules into functors that take their free variables as parameters, which in our case are the parameters of the functor pack and the dependencies from the same pack. Let's do it on a concrete functor equivalent to Mylib:

module Mylib' (P : P_SIG) = struct
  module A = struct .. <references to P> end
  module B = struct .. <references to P> <references to A> end
end

Our goal here is to move A and B outside of the functor, and as such out of the scope of P. This is done by transforming those two modules into functors that take a parameter P' with the same signature as P:

module A_funct (P' : P_SIG) = struct .. <references to P as P'> end 
module B_funct (P' : P_SIG) = struct 
  module A' = A_funct(P') 
  .. 
  <references to P as P'> 
  <references to A as A'> 
end 

module Mylib' (P : P_SIG) = struct 
  module A = A_funct(P) 
  module B = B_funct(P) 
end

While this code compiles, it is not semantically equivalent. A_funct is instantiated twice, so its side effects are computed twice: the first time when instantiating A in the functor, and the second time when instantiating B. The solution is simply to go further with closure conversion and make the result of applying A_funct to P an argument of B_funct.

module A_funct (P' : P_SIG) = struct .. <references to P as P'> end
module B_funct (P' : P_SIG)(A': module type of A_funct(P'))= struct
  ..
  <references to P as P'>
  <references to A as A'>
end

module Mylib' (P : P_SIG) = struct
  module A = A_funct(P)
  module B = B_funct(P)(A)
end

This represents exactly how our functor pack Mylib is encoded. Since modules need to be compiled in a specific way when they belong to a functor pack, the compiler has to learn from the -for-pack argument that the pack is a functor, and what its parameters are.
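The difference between the two encodings can be observed with a counter. In the sketch below, the signatures `P_SIG` and `A_SIG`, the counter `a_instantiations`, and the names `B_naive` and `Mylib_naive` are ours; an explicit signature `A_SIG` is used for the second parameter instead of `module type of A_funct(P')` to keep the example simple.

```ocaml
module type P_SIG = sig val name : string end
module type A_SIG = sig val hello : string end

(* Counts how many times the body of A_funct is evaluated. *)
let a_instantiations = ref 0

module A_funct (P' : P_SIG) : A_SIG = struct
  let () = incr a_instantiations
  let hello = "hello " ^ P'.name
end

(* Naive conversion: B instantiates A_funct again on its own. *)
module B_naive (P' : P_SIG) = struct
  module A' = A_funct (P')
  let signed = A'.hello ^ " (signed " ^ P'.name ^ ")"
end

module Mylib_naive (P : P_SIG) = struct
  module A = A_funct (P)
  module B = B_naive (P)
end

(* Full closure conversion: B receives the already instantiated A. *)
module B_funct (P' : P_SIG) (A' : A_SIG) = struct
  let signed = A'.hello ^ " (signed " ^ P'.name ^ ")"
end

module Mylib' (P : P_SIG) = struct
  module A = A_funct (P)
  module B = B_funct (P) (A)
end

module World = struct let name = "world" end

module Naive = Mylib_naive (World)
let naive_count = !a_instantiations

let () = a_instantiations := 0

module Fixed = Mylib' (World)
let fixed_count = !a_instantiations

let () =
  (* Same values, but A's side effect runs twice in the naive encoding. *)
  assert (Naive.B.signed = Fixed.B.signed);
  assert (naive_count = 2);
  assert (fixed_count = 1)
```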

Functor packs applied to `ocamlopt`

What we described is a functional prototype of functor packs, implemented on OCaml 4.10, as described in RFC#11. In practice, we already have one use case that could benefit from it in the future: cross-compilation of native code. At the moment, the compiler is configured to target the architecture on which it is compiled. The modules specific to the current architecture are symbolically linked into the backend folder, and the backend is compiled as if it supported only one architecture. One downside of this approach is that a change to the backend interface that requires modifications in every architecture is checked at compile time only for the current one. You need to reconfigure and rebuild the OCaml compiler to check whether another architecture still compiles. One interesting property is that each architecture backend defines the same set of modules with compatible interfaces. In other words, these modules could simply be parameters of a functor that is instantiated for a given architecture.

Following this idea, we implemented a prototype of the native compiler whose backend is packed into a functor and instantiated at the initialization of the compiler. With this approach, we can easily switch the targeted architecture; moreover, every architecture is compiled, which guarantees that the necessary refactoring is always done when the backend interface changes. Implementing such a functor is mainly a matter of adapting the build system to produce a functor pack, writing a few signatures for the functor and its parameters, and instantiating the backend at the right time.
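The shape of such a design can be sketched in ordinary OCaml. Everything below is a toy under assumptions: the signatures `ARCH` and `BACKEND`, the modules `Amd64` and `Arm`, and the function `backend_for` are hypothetical and far simpler than the real compiler backend. The sketch only shows the two properties mentioned above: every architecture is type-checked against the same interface in a single build, and the backend is instantiated at start-up for the chosen target.

```ocaml
(* Hypothetical, heavily simplified architecture interface. *)
module type ARCH = sig
  val name : string
  val word_size : int (* in bytes *)
end

(* Both architectures are checked against ARCH in the same build. *)
module Amd64 : ARCH = struct
  let name = "amd64"
  let word_size = 8
end

module Arm : ARCH = struct
  let name = "arm"
  let word_size = 4
end

module type BACKEND = sig
  val describe : unit -> string
  val frame_size : slots:int -> int
end

(* The backend is a functor over the architecture. *)
module Backend (A : ARCH) : BACKEND = struct
  let describe () = Printf.sprintf "%s (%d-bit words)" A.name (8 * A.word_size)
  let frame_size ~slots = slots * A.word_size
end

let arch_of_string = function
  | "amd64" -> Some (module Amd64 : ARCH)
  | "arm" -> Some (module Arm : ARCH)
  | _ -> None

(* Instantiate the backend at initialization time for a given target. *)
let backend_for target : (module BACKEND) option =
  match arch_of_string target with
  | None -> None
  | Some (module A : ARCH) ->
    let module B = Backend (A) in
    Some (module B : BACKEND)

let () =
  match backend_for "arm" with
  | Some (module B : BACKEND) -> assert (B.frame_size ~slots:3 = 12)
  | None -> assert false
```

Here first-class modules are used to select the argument at run time; with functor packs the same selection would apply to a whole set of backend files rather than to a single hand-written functor.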

This proof of concept shows how functor packs can simplify some complicated build systems and enable new workflows.

Making packs useful again

Packs are an old concept, mostly made obsolete by module aliases. They were not practical because they are a kind of monolithic library shaped into a single module containing submodules. While they make perfect use of the module system for its namespacing properties, using them forces the compiler to link an entire library even if only one module is actually used. The improvement described here allows programmers to define big functors that are split among multiple files, resulting in what we can view as a form of parameterized library.

In the second part, we will cover another aspect of the rehabilitation of packs: using packs to implement mutually recursive compilation units.

Comments

François Bobot (25 September 2020 at 9 h 16 min):

I believe there is a typo

module Mylib’ (P : P_SIG) = struct
module A = A_funct(P)
module B = A_funct(P)
end

The last must be B_funct(P), the next example as also the same typo.

Pierrick Couderc (25 September 2020 at 10 h 31 min):

Indeed, thank you!

Cyrus Omar (8 February 2021 at 3 h 49 min):

This looks very useful! Any updates on this work? I’d like to be able to use it from dune.

A Dune Love story: From Liquidity to Love

2020-06-09T09:05:17Z

By OCamlPro & Origin Labs

Writing smart contracts may often be a burdensome task, as you need to learn a new language for each blockchain you target. In the Dune Network team, we want to provide as many possibilities as possible for developers to thrive in an accessible and secure framework.

There are two kinds of languages on a blockchain: “native” languages that are directly understood by the blockchain, but with some difficulty by the developers, and “compiled” languages that are more transparent to developers, but need to be translated to a native language to run on the blockchain. For example, Solidity is a developer-friendly language, compiled to the native EVM language on the Ethereum blockchain.

Dune Network supports multiple native languages:

Love, a type-safe language with an ML syntax, suited for formal verification
Michelson, inherited from Tezos, also type-safe, much more difficult to read
Solidity, the Ethereum language, of which we are currently implementing the interpreter after releasing its parser in OCaml a few weeks ago

On the side of compiled languages, Dune Network supports:

Liquidity, a type-safe ML language suited for formal verification, that compiles to Michelson (and allows developers to decompile Michelson for auditing)
ReasonML, a JavaScript-like syntax for OCaml designed by Facebook, which compiles down to Michelson through Liquidity
All other Tezos languages that compile to Michelson (for example Ligo, SmartPy, Albert...)

Though Liquidity and Love are both part of the ML family, Liquidity is much more developer-friendly: types are inferred, whereas in Love they have to be explicit, and Liquidity supports the ReasonML JavaScript syntax while Love is bound to its ML syntax.

For all these reasons, we are pleased to announce a wedding: Liquidity now supports the Love language!

Liquidity now supports generating Love smart contracts

This is great news for Love, as Liquidity is easier to use, and comes with an online web editor, Try-Liquidity. Liquidity is also being targeted by the ConCert project, aiming at verifying smart contracts with the formal verification framework Coq.

The Smart Contract Framework on the Dune Network

Compiling contracts from Liquidity to Love has several benefits compared to Michelson. First, Love contracts are about 60% smaller than Michelson contracts, hence they are 60% cheaper to deploy. Also, the compiler outputs a Love contract that can be easily read and audited.

The Love compiler is part of the Liquidity project. It works as follows:

The Liquidity contract is type-checked by the Liquidity compiler. The strong type system of Liquidity enforces structural and semantic properties on data.
The typed Liquidity contract is compiled to a typed Love contract. During this step, the Liquidity contract is scanned to check if it complies with the Love requirements (correct use of operators, no reentrancy, etc.).
The Love contract is type-checked. Once this step is completed, the contract is ready to be deployed on the chain!

Want to try it out? Check the Try-Liquidity website: you can now compile and deploy your Liquidity contracts in Love from the online editor directly to the Mainnet and Testnet using Dune Metal!

These are some of the resources you might find interesting when building your own smart contracts:

The Love Language Documentation: https://dune.network/docs/dune-dev-docs/love-doc/introduction.html
Try-Liquidity: https://www.liquidity-lang.org/edit/
The Liquidity Website: https://www.liquidity-lang.org/
The Dune Network Website: https://dune.network

About Origin Labs

Origin Labs is a company founded in 2019 by the former blockchain team at OCamlPro. At Origin Labs, they have been developing Dune Network, a fork of the Tezos blockchain, its ecosystem, and applications over the Dune Network platform. At OCamlPro, they developed TzScan, the most popular block explorer at the time, Liquidity, a smart contract language, and were involved in the development of the core protocol and node. Feel free to reach out by email: contact@origin-labs.com.

[Interview] Sylvain Conchon joins OCamlPro

2020-06-06T09:05:17Z

In April 2020, Sylvain Conchon joined the OCamlPro team as our Chief Scientific Officer on Formal Methods. Sylvain is a professor at University Paris-Saclay and has been teaching OCaml in universities for about 20 years. He is the co-author, with Jean-Christophe Filliâtre, of Apprendre à programmer avec OCaml, a book for students in French elite preparatory schools. His field of expertise is automated deduction for program verification and model checking of parameterized systems. He is also the co-creator of Alt-Ergo, our SMT solver dedicated to program verification, used by Airbus and qualified for the [DO-178C](https://en.wikipedia.org/wiki/DO-178C) avionic standard, of [Cubicle](http://cubicle.lri.fr/) and of the very useful [OCamlgraph](https://opam.ocaml.org/packages/ocamlgraph/) library.

Research and Industry

Sylvain, you’ve been involved in the industrial world for a long time, what do you think about the interactions between industry and research labs?

I’ve always found interactions with industry professionals to be very rewarding. During my studies, I worked for several years in IT (SSII), and as a university professor, I have supervised students during their internships or apprenticeships in tech companies or at large industrial companies every year. I also take part in research projects that involve industrial partners, and I spent some time at Intel in Portland, which allowed me to discover the computer hardware industry from inside.

How do you establish a fruitful collaboration between academia and industry?

It’s primarily a question of mutual understanding. You can see it clearly during collaborative research projects that involve both academics and industrial partners. Tools resulting from research, no matter what they are, have to be relevant to real industrial problems. Once that’s taken care of, the software also needs to be usable by industry professionals without them needing to understand its inner workings (for instance they shouldn’t have to specify all 50 necessary options for its use, interpret its results, or its absence of results!).

This requires a significant engineering effort geared towards the end user; and this task is not part of usual research activity. So, we first need to really understand the problems and needs of the industrial partner, and then determine whether our technologies and tools can be adapted or used to prototype a relevant solution.

You’ve just joined OCamlPro, what are your first thoughts?

I am very happy to be joining such a dynamic company full of talented, motivated, friendly people, where they do both high-level engineering and top-quality research! Several of my former PhD students are also working at OCamlPro, such as Albin Coquereau, David Declerck and Mattias Roux. With Mohamed Iguernlala and Alain Mebsout at our partner Origin Labs, and with the other OCP team members, it makes our team rock-solid in formal methods tooling development.

“Tools resulting from research, no matter what they are, have to satisfy real industry needs.”

OCaml, a Cutting-Edge Language

You are well known in the OCaml community, and some of your students became fans of OCaml (and of your teaching)… What do you say to your students who are just discovering OCaml?

I tend to summarize it with one phrase: “With OCaml, you’re not learning the computer programming of the last 10 years, you’re learning the programming of the 10 coming years”. This has proven true numerous times, because a good number of OCaml’s features were to be found in mainstream languages years later. That being said, all my years of teaching this language have led me to think that some modifications to its syntax would make the language easier to tackle for some beginners.

How did you personally discover OCaml?

During my master’s thesis (maîtrise) at university: one of my teachers pointed this language to me; they believed it would help me write a compiler for another programming language. So, I discovered OCaml by myself, by reading the manual and going through examples. It wasn’t until my MASt (DEA) that I discovered the theoretical foundations of this fantastic language (semantics, typing, compilation).

Would you say OCaml is an industrial programming language?

The question needs to be clarified: what is an industrial programming language? If by industrial language you mean one that is used by industry professionals, then I’d say that OCaml needs to be used more widely to be classified as such. If the question is whether OCaml is at the same level as languages used in industry, then it absolutely is. But maybe the question is more about the OCaml ecosystem and how developed the available tooling is: certain improvements undoubtedly need to be made in order to reach the level of a widespread industrial programming language. But we’re on the right track, especially thanks to companies like OCamlPro and its projects like Opam and Try-OCaml for example.

Formal Methods as an Industrial Technique, and the Example of the Alt-Ergo Solver

Formal methods being one of OCamlPro’s areas of expertise, in what way do you think OCaml is suited for the SMT domain?

Tools like SMT solvers are mainly symbolic data manipulation software that allow you to analyze, transform, and reason about logical formulas. OCaml is made for that. There is also a more “computational” side to these tools, which requires precise programming of data structures as well as efficient memory management. OCaml, with its extremely efficient garbage collector (GC), is particularly suited for this kind of development. SMT solvers are tools that also need to be very reliable because errors are difficult to find and are potentially very harmful. OCaml’s type system contributes to the reliability of these tools.

“SMT solvers are nowadays essential in software engineering”

Can you describe Alt-Ergo in a few words?

Alt-Ergo is a piece of software for proving logical formulas automatically (without human intervention), meaning proving whether a formula is true or false. Alt-Ergo belongs to a family of automated provers called SMT (Satisfiability Modulo Theories) solvers. It was designed to be integrated into program verification platforms. These platforms (like [Why3](https://why3.lri.fr/), Frama-C, Spark…) generate logical formulas that need to be proven in order to guarantee that a program is safe. Proving these formulas by hand would be very tedious (there are sometimes tens of thousands of formulas to prove). An SMT solver such as Alt-Ergo is there to do that job in a completely automated way. It is what allows these verification platforms to be used at an industrial level.

In what way does developing this software in OCaml benefit Alt-Ergo over its competitors?

It makes it more reliable, since an SMT solver, like any program, can have bugs. Most of Alt-Ergo is written in a purely functional programming style, i.e. only using immutable data structures. One of the advantages of this programming style is that it allowed us to formally prove the main components of Alt-Ergo (for example, its kernel was formalized using the Coq proof assistant, which would have been impossible with a language like C++) without sacrificing efficiency, thanks to a very good garbage collector and OCaml's very powerful persistent data structure library. We made use of OCaml's module system, particularly functors and recursive modules, to design very modular code, making it maintainable and easily extensible. OCaml allowed us to create an SMT solver just as efficient as CVC4 or Z3 for program verification, but with a total number of lines of code divided by three or four. This obviously does not guarantee that Alt-Ergo has zero bugs, but it really helps us fix any that are found.

What is your opinion on SMT solvers and the current state of the art of SMT?

Today, SMT solvers are essential in software engineering. They can be found in various tools for proving, testing, model checking, abstract interpretation, and typing. The main reason for this success is that they are becoming increasingly efficient and the underlying theories are becoming more and more expressive. It is a very competitive area of research among the world’s best universities and research labs, as well as large IT companies. But there is still a lot of room for improvement, particularly in the nonlinear arithmetic domain, where user demand is growing. For now, one of my research objectives is to combine Model Checking tools with program verification. These two types of tools are based on SMT and should complement each other to offer even more automation to verification tools.

What applications can SMT techniques and Alt-Ergo have in industry?

SMT techniques can be used wherever formal methods are useful, including, but not limited to, verifying the safety of critical software in embedded systems, finding security vulnerabilities in computer systems, and solving planning problems. They can also be found in domains of artificial intelligence, where it is crucial to guarantee neural network stability and produce formal explanations of their results.

You ended up working on Model Checking, can you tell us about how Model Checking is connected to SMT and how it is currently used?

Model Checking consists of verifying that all possible states of a system respect certain properties, regardless of the input data. This is a difficult problem because some systems (like microprocessors for example) can have hundreds of millions of states. To reach that scale, model checkers implement extremely sophisticated algorithms to visit these states quickly by storing them in a compact manner. That said, this technique reaches its limits when the input values are unbounded or when the number of system components is unknown. Imagine Internet routing algorithms where you don’t know how many machines are connected. These algorithms must be correct no matter the number of machines. This is where SMT solvers come into play. By using logical formulas, we’re able to represent sets of states of arbitrary sizes. Visiting system states becomes calculating the formulas that represent the states satisfying the desired properties, etc. Therefore, everything in Model Checking is based on logical formulas, and SMT solvers are of course there to reason about these formulas.

[Interview] Sylvain Conchon joins OCamlPro

2020-06-05T09:05:17Z

Sylvain Conchon has just joined OCamlPro as Chief Scientific Officer for Formal Methods. A professor at Université Paris-Saclay, he works on automated deduction for program verification and on model checking of parameterized systems. He is also the co-creator of Alt-Ergo.

Research and industry

Sylvain, you have long been in contact with the industrial world. What do you think of the interactions between industry and research labs?

I have always found interactions with industry professionals very rewarding. During my studies I worked for several years in an IT services company (SSII), and I supervise my students during their internships or apprenticeships in technology companies or at large industrial groups. I also take part in research projects that involve industrial partners, and I spent some time at Intel in Portland, which allowed me to discover the hardware industry.

How can fruitful relationships be established between academia and industry?

It is largely a matter of meeting the right people. You can see it when collaborative research projects bringing together academics and industrial partners are being set up. Tools resulting from research, whatever they are, must above all meet a real industrial need. If they do, the software must also be usable by domain engineers without them needing to understand its inner workings (for example, to set the 50 options required to use it, or to interpret its results or its absence of results!). This obviously requires significant engineering work, geared towards the end user and often far removed from researchers' usual activities. So you have to understand the problems and needs of industry, and then determine whether the technologies and tools you master can be adapted or used to build a prototype that meets some of those needs.

You have just joined OCamlPro. What are your first impressions?

I am happy to have joined a very dynamic company, full of talented, motivated and friendly people, where both high-level engineering and quality research are done!

“Tools resulting from research, whatever they are, must above all meet a real industrial need.”

OCaml, a cutting-edge language

You are well known in the OCaml community, and some of your students have become fans of OCaml (and of your teaching)… What do you tell students who are discovering OCaml?

I tend to sum it up like this: "with OCaml, you are not learning the programming of the last 10 years, but that of the next 10 years". This claim has always held true, because many features of the OCaml language have ended up in mainstream languages, several years later. That said, my years of experience teaching this language lead me to think that a few changes to its syntax would make it easier to approach for some beginners.

And you, how did you discover OCaml?

During my university studies, for my final-year master's project: one of my teachers pointed me to this language to help me write a compiler for a concurrent programming language. So I discovered the language on my own, by reading the manual and the examples. It was only during my DEA (postgraduate research degree) that I discovered the theoretical foundations of this beautiful language (semantics, typing, compilation).

OCaml, an industrial language or not yet?

The question needs to be made precise: what is an industrial language? If it means a language used by industry, then OCaml is unfortunately not yet used widely enough in industry to qualify. If the question is whether it is on par with the languages used in industry, then the answer is yes, without hesitation. But perhaps the question is more about the OCaml ecosystem and the maturity of its tooling: there is certainly progress to be made to reach the level of a language that is very widespread in industry, but it is well under way, in particular thanks to companies such as OCamlPro.

Formal methods as an industrial technique, and the example of the Alt-Ergo solver

Formal methods are one of OCamlPro's areas of expertise. In what way do you think OCaml is suited to the SMT domain?

Tools such as SMT solvers are mainly programs for the symbolic manipulation of data, used to analyse, transform and reason about logical formulas. OCaml is made for this kind of processing. These tools also have a more "computational" part that requires finely tuned data structures and efficient memory management. OCaml is particularly well suited to this kind of development, especially with its extremely efficient garbage collector (GC). Finally, SMT solvers must be highly reliable, because errors in such software are hard to find and can be very harmful. OCaml's type system contributes to the reliability of these tools.

“SMT solvers are now indispensable in the field of software engineering.”

Can you tell us about Alt-Ergo in a few words?

It is a piece of software used to prove logical formulas automatically (without human intervention), that is, to determine whether these formulas are true or false. Alt-Ergo belongs to a family of automated provers called SMT (for Satisfiability Modulo Theories). It was designed to be integrated into program verification platforms. These tools (such as Why3, Frama-C, Spark, …) generate logical formulas that must be proved in order to guarantee that a program is safe. Proving these formulas by hand would be very tedious (there are sometimes several tens of thousands of formulas to prove). An SMT solver like Alt-Ergo is there to do this work, fully automatically. This is what makes these verification platforms usable at an industrial level.

In what way can developing Alt-Ergo in OCaml be an advantage over its competitors?

It gives it greater safety, because an SMT solver, like any program, can also have bugs. Most of Alt-Ergo is written in a purely functional style, that is, using only immutable data structures. One advantage of this programming style is that it allowed us to formally prove its main components (for example, its core was formalised with the Coq proof assistant, which would be impossible to do in a language like C++), without sacrificing efficiency, thanks to OCaml's very good garbage collector and its library of high-performance persistent data structures. Finally, we benefited greatly from OCaml's module system, in particular functors and recursive modules, to design code that is highly modular, maintainable and easy to extend. In the end, OCaml allowed us to design an SMT solver that performs as well as CVC4 or Z3 for program proof, but with three to four times fewer lines of code. Of course, this does not guarantee that Alt-Ergo has zero bugs, but it helps us a great deal in pinning them down when someone finds one.

“OCaml allowed us to design an SMT solver that performs as well as CVC4 or Z3 for program proof, but with three to four times fewer lines of code.”

What is your view on SMT solvers and the current SMT state of the art?

SMT solvers are now indispensable in the field of software engineering. They are found in tools for proof, testing, model checking, abstract interpretation and even typing. The main reason for this success is that they are increasingly efficient and that the underlying theories are very expressive. It is a highly competitive research area among the best universities and laboratories in the world and large computing companies. But there is still a lot of room for these tools to improve, in particular in non-linear arithmetic, where user demand keeps growing. For the moment, one of my research goals is to combine model checking tools with program proof tools. Both families of tools rely on SMT and they should complement each other to provide even more automatic verification tools.

What applications can SMT techniques and Alt-Ergo have in industry?

SMT techniques can be used wherever formal methods can be useful. For example (and this list is far from exhaustive), to verify the safety of critical embedded software, to find security flaws in computer systems or to solve planning problems. They are also found in artificial intelligence, where it is crucial to guarantee the stability of neural networks and also to produce formal explanations of their results.

You have worked on model checking. Can you tell us about the links between model checking and SMT, and about how it is used today?

Model checking consists in verifying that all possible states of a system satisfy certain properties, whatever the input data. It is a hard problem because some systems (microprocessors, for example) can have hundreds of millions of states. To scale, model checkers implement very sophisticated algorithms to visit these states quickly while storing them very compactly. However, this technique reaches its limits when the input values are unbounded or when the number of components in the system is not known. Think of Internet routing algorithms, where the number of machines on the network is unknown: these algorithms must be correct whatever that number is. This is where SMT solvers come in. Using logical formulas, one can represent sets of states of arbitrary size. Visiting the states of a system then amounts to computing the formulas that represent these states. Checking that the states satisfy a property amounts to proving that the formulas representing the states imply the desired property, and so on. Everything in model checking therefore rests on logical formulas, and SMT solvers are naturally there to reason about these formulas.

Format Tutorial

2020-06-01T09:05:17Z

Article written by Mattias.

OCaml's Format module is extremely powerful but unfortunately very often misused. It notably combines two distinct elements:

pretty-printing boxes
semantic tags

This article aims to demystify a large part of this module and to explore everything it makes possible.

If all goes well, you should go from plain printing to something close to the way OCaml displays its error messages.

(In reality we will end up with a slightly different result, because the author of this tutorial does not like all the choices made for displaying error messages in OCaml, but the differences will not matter much.)

I. General introduction: `fprintf fmt "%a" pp_error e`

If you do not understand what the code in the title is supposed to do, I invite you to read what follows carefully. Otherwise you can skip straight to the second part.

I.1. A reminder about `printf`

As a reminder, printf is a variadic function (that is, it can take a variable number of parameters).

The first parameter is a format string made of characters and format specifiers.
- Characters are printed as is. printf "abc" will print abc.
- Format specifiers are characters preceded by the % character (a syntax inherited from C). At run time they are replaced by one of the parameters passed to the function after the format string, and they indicate the type of the value to print (along with other information whose details can be found in the documentation of the Printf module). printf "Test: %d" expects a signed integer and will print Test: <d> with <d> replaced by the given integer.
The following parameters are the values given to printf to replace the format specifiers.
- printf "%d %s %c" 3 s 'a' will print the signed integer 3, a non-breaking space, the contents of the variable s, which must be a string, another non-breaking space and finally the character 'a'.
- Note also that here the number of extra parameters given after the format string matches the number of specifiers, and that they cannot be swapped. printf "%d %c" 'a' 3 cannot be compiled or run, because %d expects a signed integer and the first parameter is a character. Specifiers that expect a single argument are what I call unary specifiers. They are extremely easy to use: you only need to know which character corresponds to which type and to give the values in the right order, as illustrated in the figure below (the chevron representing standard output).
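As a small sketch of the rules above, the same format string can be given to Format.asprintf, which returns the text as a string instead of printing it; this makes the result easy to inspect. The names s and line are introduced only for this example.

```ocaml
(* The string variable used by the %s specifier. *)
let s = "abc"

(* %d expects an int, %s a string and %c a char, in that order.
   asprintf builds the string that printf would have printed. *)
let line = Format.asprintf "%d %s %c" 3 s 'a'

let () = print_endline line

(* Swapping the arguments, as in Format.asprintf "%d %c" 'a' 3,
   is rejected by the type checker at compile time. *)
```

Because format strings are typed, the mismatch is caught by the compiler and never reaches run time.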

I.2. Printing a user-defined type

Then comes the moment when you start defining your own data structures and, unfortunately, there is no way to print your value with the default specifiers (which seems reasonable). Let us define our own type and print it with the techniques seen so far:

type error =
  | Type_Error of string * string
  | Apply_Non_Function of string

let pp_error = function
  | Type_Error (s1, s2) -> printf "Type is %s instead of %s" s1 s2
  | Apply_Non_Function s -> printf "Type is %s, this is not a function" s

Now suppose we have a list of errors and want to print them separated by line breaks. A first solution would be:

let pp_list l =
  List.iter (fun e ->
      pp_error e;
      printf "\n"
    ) l

This approach has several drawbacks (which will be magically fixed by the function in the title).

I.3. Printing to an abstract `formatter`

The first drawback is that printf sends its result to standard output, whereas we may want to send it to a file or to standard error, for example.

The solution is fprintf (feigning surprise here would be in good taste).

fprintf takes an extra parameter before the format string, called an abstract formatter. This parameter has type formatter and represents a pretty-printer, that is, the object to which the result must be sent. The huge advantage is that many things can be turned into a formatter: a file, a buffer, standard output, and so on. In fact, printf is implemented as let printf = fprintf std_formatter.

To use it, we therefore modify pp_error and give it an extra parameter:

let pp_error fmt = function
  | Type_Error (s1, s2) -> fprintf fmt "Type is %s instead of %s" s1 s2
  | Apply_Non_Function s -> fprintf fmt "Type is %s, this is not a function" s

Then we rewrite pp_list to take this into account:

let pp_list fmt l =
  List.iter (fun e ->
      pp_error fmt e;
      fprintf fmt "\n"
    ) l

As can be seen in the figure below, fprintf prints into the formatter it is given as a parameter and no longer on standard output.

If we now want to print the result on standard output, we simply pass std_formatter as the formatter, as in pp_list std_formatter l. This approach really has only advantages, since it gives much more flexibility over which formatter is used when the program runs.
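To make this flexibility concrete, here is a minimal sketch in which the same pp_error prints to standard output, to standard error and into a buffer. formatter_of_buffer, err_formatter and pp_print_flush come from the Format module; error_to_string is a helper name introduced only for this example.

```ocaml
open Format

type error =
  | Type_Error of string * string
  | Apply_Non_Function of string

let pp_error fmt = function
  | Type_Error (s1, s2) -> fprintf fmt "Type is %s instead of %s" s1 s2
  | Apply_Non_Function s -> fprintf fmt "Type is %s, this is not a function" s

(* Hypothetical helper: print an error into a buffer-backed formatter
   and return the text that was written. *)
let error_to_string e =
  let buf = Buffer.create 64 in
  let fmt = formatter_of_buffer buf in
  pp_error fmt e;
  pp_print_flush fmt ();  (* push pending output into the buffer *)
  Buffer.contents buf

let () =
  let e = Type_Error ("int", "bool") in
  pp_error std_formatter e;            (* standard output *)
  pp_print_newline std_formatter ();
  pp_error err_formatter e;            (* standard error *)
  pp_print_newline err_formatter ();
  print_endline (error_to_string e)    (* through a buffer *)
```

pp_error itself never changes; only the formatter handed to it does.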

I.4. Printing complex types with `%a`

The second problem will show up soon enough if we carry on with this method. To understand it, let us go back to pp_error. In the case Type_Error of string * string we want to write Type is s1 instead of s2, so we give fprintf the format string "Type is %s instead of %s" with s1 and s2 as extra parameters. What should we do if s1 and s2 were values of user-defined types, each with its own printing function pp_s1 : formatter -> s1 -> unit and pp_s2 : formatter -> s2 -> unit? Following the logic of our solution so far, we would write the following code:

let pp_error fmt = function
  | Type_Error (s1, s2) ->
    fprintf fmt "Type is ";
    pp_s1 fmt s1;
    fprintf fmt " instead of ";
    pp_s2 fmt s2
  | Apply_Non_Function s -> fprintf fmt "Type is %s, this is not a function" s

It is easy to see that the more complex the types we have to handle, the heavier this syntax becomes. All this because unary format specifiers can only handle OCaml's basic types.

This is where %a comes in. This specifier is binary (ternary in reality, but one of its parameters is already supplied). Its parameters are:

A printing function of type formatter -> 'a -> unit (the first parameter to supply)
The formatter in which it must print its result (which must not be supplied again)
The value we want to print

It then applies the function given as first argument to the formatter and the value, and hands control over to it so that it prints what it has to in the formatter it received. Once it is done, printing resumes. The following example shows how this works (printing on standard output, with fmt replaced by std_formatter).

In our case we had already changed our printing functions so that they take an abstract formatter, so there is almost nothing to modify:

let pp_error fmt = function
  | Type_Error (s1, s2) -> fprintf fmt "Type is %s instead of %s" s1 s2
  | Apply_Non_Function s -> fprintf fmt "Type is %s, this is not a function" s

let pp_list fmt l =
  List.iter (fun e ->
      fprintf fmt "%a\n" pp_error e;
    ) l

And, of course, if s1 and s2 had had their own printing functions:

let pp_error fmt = function
  | Type_Error (s1, s2) -> fprintf fmt "Type is %a instead of %a" pp_s1 s1 pp_s2 s2
  | Apply_Non_Function s -> fprintf fmt "Type is %s, this is not a function" s

At this point you should be comfortable with the notions of abstract formatter and binary format specifier, and you should therefore be able to print any data structure, even a recursive one, without any trouble. I strongly recommend this way of working, so that any later change does not require rewriting the whole code.
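As an illustration of %a on a recursive structure, here is a minimal sketch. The type ty and its printer pp_ty are hypothetical and introduced only for this example; they stand in for the s1 and s2 types mentioned above.

```ocaml
open Format

(* Hypothetical recursive type, used only for this illustration. *)
type ty = Int | Bool | Arrow of ty * ty

type error =
  | Type_Error of ty * ty
  | Apply_Non_Function of ty

(* The printer calls itself through %a to print sub-terms. *)
let rec pp_ty fmt = function
  | Int -> fprintf fmt "int"
  | Bool -> fprintf fmt "bool"
  | Arrow (a, b) -> fprintf fmt "(%a -> %a)" pp_ty a pp_ty b

let pp_error fmt = function
  | Type_Error (t1, t2) ->
    fprintf fmt "Type is %a instead of %a" pp_ty t1 pp_ty t2
  | Apply_Non_Function t ->
    fprintf fmt "Type is %a, this is not a function" pp_ty t

(* asprintf accepts %a as well and returns the result as a string. *)
let message = asprintf "%a" pp_error (Type_Error (Arrow (Int, Bool), Int))

let () = print_endline message
```

Each printer only knows how to print its own type and delegates the rest through %a, which is what keeps the code readable as types grow.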

II. Pretty-printing boxes

And precisely in order to have changes that do not require modifying everything, we need to take at least a minimal look at pretty-printing boxes.

They are also called pretty-print boxes; I will simply call them "boxes" from now on. A tutorial written by the standard library team already exists.

The idea behind boxes is very simple:

At my level, I take proper care of how to print my own elements and I impose nothing above me.

Let us take, for example, the function that prints values of type error:

let pp_error fmt = function
  | Type_Error (s1, s2) -> fprintf fmt "Type is %s instead of %s" s1 s2
  | Apply_Non_Function s -> fprintf fmt "Type is %s, this is not a function" s

If we added a line break, we would impose it on every function that calls us, and that is not our decision to make. As it stands, this function does exactly what it should.

Now let us look at the function that prints a list of errors:

let pp_list fmt l =
  List.iter (fun e ->
      fprintf fmt "%a\n" pp_error e;
    ) l

When it finishes, a line break coming from the last element is forced. On top of that, using \n (or @\n, or even @.) is not recommended, because these are not, strictly speaking, Format directives but system directives, which will therefore disrupt the rest of the printing.

Unfortunately, far too many developers discovered @. at the same time as Format and use it without restraint. At the risk of repeating myself often: do not use @. !

II.1. The `@` specifier

As we saw, a format string is made of characters and of format specifiers starting with %. Specifiers are characters that are not printed and that are replaced before the final output.

Format adds its own specifier: @.

II.1.a. Flushing

The first specification we saw is thus the one that should almost never be used (which raises the question of why it was mentioned in the first place): @. (an @ followed by a period). This specification simply tells the printing engine to break the line at this point and flush the printer. The two other similar specifications are @\n, which only requests the line break, and @?, which only requests the flush. The drawback of these three specifiers is that they are too powerful and therefore disrupt the rest of the printing. I have personally never used @\n (one might as well use a box with a break hint, as we will see right after) and I only use @. when I know nothing is left to print.

II.1.b. Break and space hints

Important:

A break hint breaks the line if needed, otherwise it does nothing
A breakable space hint breaks the line if needed, otherwise it prints a space

Both are therefore hints to break the line if necessary; there is no hint meaning "a space by default, or nothing if there is not enough room" (using ⎵ will always print a space).

There are three hints (and how they work will be much clearer once you have seen boxes):

@, : break hint (nothing by preference, or a line break if needed)
@⎵ : breakable space (a space by preference, or a line break if needed) (the character ⎵ must of course be read as the usual blank space)
@;<n o> : n breakable spaces or a break indented by o (n spaces by preference, or a line break with an extra indentation of o if needed)

From what I have just written, it should now be clear that the character ⎵ on its own is a non-breaking space, which will therefore not cause a line break even if the line limit is exceeded. Unlike the spaces of our usual word processors, which are breakable (and can cause line breaks), with Format you have to say which spaces are breakable.

For example, we will write fprintf fmt "let rec f =@ %a" pp_expr e because we do not want let rec f = to be split over several lines, but we do put @⎵ before %a because the expression will either be on the same line if it is small enough or on the next line if it is too large (we should even write @;<1 2> so that the expression is indented when it moves to the next line but, as we are about to see, this is where boxes let us automate this kind of behaviour).
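The effect of a breakable space can be sketched by rendering the same printer at two margins. The helper to_string_with_margin and the printer pp_binding are names introduced only for this example, and the printer wraps its output in a hov box with an indentation of 2 (boxes are described in the next section) so that the layout does not depend on the enclosing context.

```ocaml
open Format

(* Hypothetical helper: render [pp v] into a string with the given margin. *)
let to_string_with_margin margin pp v =
  let buf = Buffer.create 64 in
  let fmt = formatter_of_buffer buf in
  pp_set_margin fmt margin;
  pp fmt v;
  pp_print_flush fmt ();
  Buffer.contents buf

(* "let rec f =" contains only plain spaces and is never split;
   the "@ " before the body is the only place where a break may occur. *)
let pp_binding fmt body = fprintf fmt "@[<hov 2>let rec f =@ %s@]" body

(* Wide margin: the breakable space is printed as a space. *)
let wide = to_string_with_margin 80 pp_binding "fun x -> x"

(* Narrow margin: the breakable space becomes a line break,
   followed by the box indentation. *)
let narrow = to_string_with_margin 16 pp_binding "fun x -> x + 1"

let () =
  print_endline wide;
  print_endline narrow
```

With the wide margin everything stays on one line; with the narrow one the body moves to the next line, indented by 2.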

II.1.c. Boxes

The second specification is the one used to open and close boxes.

A box starts with @[ and ends with @]. Between these two bounds, you can do whatever you want (except use @., @? or @\n !). Everything that happens inside the box stays (and must stay) inside it. Indentation, breaks, vertical boxes, horizontal boxes, both, one or the other: all these options become available once a box has been opened. Let us go through them quickly (as a reminder, the detailed version is available in the tutorial).

Once a box has been opened, you can specify between angle brackets how it should behave when it meets a break hint. Here is a quick overview:

<v> : every break hint leads to a line break
<h> : every space hint prints a space, and break hints have no effect
<hv> : if the whole box fits on a single line, only space hints are taken into account; otherwise every hint becomes a line break and each element is printed on its own line
<hov> : as long as elements fit on a line, they are printed there with their space hints; a hint only becomes a line break when the line is full

Each of these behaviours can be given an extra value, its indentation, which is the indentation relative to the start of the box that is added at each line break.

Consider the following code, which prints a list of items separated either by a break hint @,, by a space hint @⎵, or by an indented space-or-break hint @;<2 3> (2 spaces, or a break indented by three spaces):

open Format

let l = ["toto"; "tata"; "titi"]

let pp_item fmt s = fprintf fmt "%s" s

let pp_cut fmt () = fprintf fmt "@,"
let pp_spc fmt () = fprintf fmt "@ "
let pp_brk fmt () = fprintf fmt "@;<2 3>"


let pp_list pp_sep fmt l =
  pp_print_list pp_item ~pp_sep fmt l

Here is a summary of the different box behaviours depending on the break and space hints they meet:

(* Vertical box (every hint is a line break) *)
printf "------------@.";
printf "v@.";
printf "------------@.";
printf "@[<v 2>[%a]@]@." (pp_list pp_cut) l;
printf "@[<v 2>[%a]@]@." (pp_list pp_spc) l;
printf "@[<v 2>[%a]@]@." (pp_list pp_brk) l;
(* Expected output:
------------
v
------------
[toto
  tata
  titi]
[toto
  tata
  titi]
[toto
     tata
     titi]
*)


(* Horizontal box (no line break) *)
printf "------------@.";
printf "h@.";
printf "------------@.";
printf "@[<h 2>[%a]@]@." (pp_list pp_cut) l;
printf "@[<h 2>[%a]@]@." (pp_list pp_spc) l;
printf "@[<h 2>[%a]@]@." (pp_list pp_brk) l;
(* Expected output:
------------
h
------------
[tototatatiti]
[toto tata titi]
[toto  tata  titi]
*)


(* Horizontal-vertical box
  (prints everything on one line if possible, otherwise acts as a vertical box) *)
printf "------------@.";
printf "hv@.";
printf "------------@.";
printf "@[<hv 2>[%a]@]@." (pp_list pp_cut) l;
printf "@[<hv 2>[%a]@]@." (pp_list pp_spc) l;
printf "@[<hv 2>[%a]@]@." (pp_list pp_brk) l;
(* Expected output (with a margin too narrow for the whole list to fit on one line):
------------
hv
------------
[toto
  tata
  titi]
[toto
  tata
  titi]
[toto
     tata
     titi]
*)


(* Packing horizontal-or-vertical box
  (prints as much as possible on a line before moving to the
   next line and starting again) *)
printf "------------@.";
printf "hov@.";
printf "------------@.";
printf "@[<hov 2>[%a]@]@." (pp_list pp_cut) l;
printf "@[<hov 2>[%a]@]@." (pp_list pp_spc) l;
printf "@[<hov 2>[%a]@]@." (pp_list pp_brk) l;
(* Expected output (with a margin too narrow for the whole list to fit on one line):
------------
hov
------------
[tototata
  titi]
[toto tata
  titi]
[toto
     tata
     titi]
*)

(*  Structural horizontal-or-vertical box
   (same behaviour as the packing box, except for the last
    line break, which tries to favour an indentation of
    level 0) *)
printf "------------@.";
printf "b@.";
printf "------------@.";
printf "@[<b 2>[%a]@]@." (pp_list pp_cut) l;
printf "@[<b 2>[%a]@]@." (pp_list pp_spc) l;
printf "@[<b 2>[%a]@]@." (pp_list pp_brk) l
(* Expected output (with a margin too narrow for the whole list to fit on one line):
------------
b
------------
[tototata
  titi]
[toto tata
  titi]
[toto
     tata
     titi]
*)

A quick note on the use of @. here, when the advice was never to use it. The real rule is not "never use it" but "only use it when you are sure you are not inside any box". Here, for example, we want to separate the different box printouts clearly, so it is perfectly fine to use @. since we are sure to be at the outermost level of printing (nothing above us) and not to break a pretty-printing pass. It would therefore be much more accurate to say:

Do not use @., @\n and @? in printing code that is, or may later be, nested

But to begin with, it is much simpler never to use them, even if it means adding them afterwards.

The behaviour of the b box (structural box) looks the same as that of the hov box (packing box), but there are cases where the two differ (generally, when a line break reduces the current indentation, the structural box breaks the line even if there is room left on the current line). I invite you to consult the tutorial for more details. I must also admit that how they work is very close to what one might call "opaque", given that the expected behaviour will or will not occur depending on the margin size. The author of this tutorial would like to point out that he tends to use vertical boxes with zero indentation when he wants the behaviour of structural boxes; an example is given with the HTML output at the end of this document.

II.2. Summary

Use boxes
Since flush hints close all boxes, never use them in inner printing functions; stick to break and space hints
Really, use boxes

You are now equipped to use Format in its simplest form, with boxes, indentation, and break and space hints.
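To see why a raw \n is discouraged inside a box, here is a minimal sketch comparing it with a break hint in a vertical box of indentation 2. The two value names are introduced only for this example.

```ocaml
open Format

(* A raw '\n' is copied to the output as an ordinary character:
   Format does not treat it as a line break of the box,
   so the box indentation is not applied after it. *)
let with_raw_newline = asprintf "@[<v 2>[toto\ntata]@]"

(* A break hint is handled by the box, which adds its indentation. *)
let with_break_hint = asprintf "@[<v 2>[toto@,tata]@]"

let () =
  print_endline with_raw_newline;
  print_endline with_break_hint
```

In the first string tata starts in column 0; in the second it is indented by 2, as the box requests.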

Let us go back to our error printing:

let pp_error fmt = function
  | Type_Error (s1, s2) -> fprintf fmt "@[<hov 2>Type is %s@ instead of %s@]" s1 s2
  | Apply_Non_Function s -> fprintf fmt "@[<hov 2>Type is %s,@ this is not a function@]" s

let pp_list fmt l =
  pp_print_list pp_error fmt l

We wrapped the printing of both errors in hov boxes with a breakable space hint in the middle, and used the pp_print_list function from the Format module.
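A detail worth spelling out: when no ~pp_sep is given, pp_print_list separates elements with a break hint (pp_print_cut), so it is the enclosing box that decides how the list is laid out. The sketch below shows this with integers; pp_comma and pp_ints are names introduced only for this example.

```ocaml
open Format

(* Default separator (a break hint) inside a vertical box:
   one element per line. *)
let one_per_line = asprintf "@[<v 0>%a@]" (pp_print_list pp_print_int) [1; 2; 3]

(* Hypothetical custom separator: a comma followed by a breakable space. *)
let pp_comma fmt () = fprintf fmt ",@ "

let pp_ints fmt l =
  fprintf fmt "@[<hov 1>[%a]@]" (pp_print_list ~pp_sep:pp_comma pp_print_int) l

(* Short enough to fit within the default margin, so it stays on one line. *)
let inline = asprintf "%a" pp_ints [1; 2; 3]

let () =
  print_endline one_per_line;
  print_endline inline
```

This is the "impose nothing above me" principle again: pp_list only emits hints, and the caller's box turns them into line breaks or not.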

If I now try to print a list of errors in two environments, one 50 columns wide and the other 25 columns wide, with the following code:

let () =
  let e1 = Type_Error ("int", "bool") in
  let e2 = Apply_Non_Function "int" in
  let e3 = Type_Error ("int", "float") in
  let e4 = Apply_Non_Function "bool" in

  let el = [e1; e2; e3; e4] in
  pp_set_margin std_formatter 50;
  fprintf std_formatter "--------------------------------------------------@.";
  fprintf std_formatter "@[<v 0>%a@]@." pp_list el;
  pp_set_margin std_formatter 25;
  fprintf std_formatter "-------------------------@.";
  fprintf std_formatter "@[<v 0>%a@]@." pp_list el

I get the following result:

--------------------------------------------------
Type is int instead of bool
Type is int, this is not a function
Type is int instead of float
Type is bool, this is not a function
-------------------------
Type is int
  instead of bool
Type is int,
  this is not a function
Type is int
  instead of float
Type is bool,
  this is not a function

What we add in verbosity we gain in elegance. And speaking of elegance, this is lacking in colour.

III. Semantic tags

This part is not covered in the tutorial but in a tutorial article that explains it fairly briefly.

The third specification, then (after break hints and boxes), is the semantic tag specification: @{ to open one and @} to close it.

III.1. Marking up text

But before understanding how they work, let us try to understand what they are for. Whether you want to print to a terminal, to an HTML page or elsewhere, chances are that this output accepts text markup such as italics, colours, and so on. As a user of emacs and of an ANSI terminal, I can change the appearance of my text using ANSI codes:

If I write an OCaml program that prints this string and run it directly in my terminal, I should get the same result:

Naturally, it does not work; if computing were standardised and everyone knew how to communicate, we would have heard about it. It turns out that the escape \033 is read as octal by ANSI terminals but as decimal by OCaml (which seems to be the natural interpretation). OCaml lets you write a character using several different escape sequences:

Sequence	Resulting character
`\DDD`	the character with ASCII code `DDD` in decimal
`\xHH`	the character with ASCII code `HH` in hexadecimal
`\oOOO`	the character with ASCII code `OOO` in octal
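A quick check, as a minimal sketch, that the three notations denote the same character, namely ESC (code 27 in decimal). The value names are introduced only for this example.

```ocaml
(* The ESC character written with each of the three escape sequences. *)
let esc_decimal = "\027"
let esc_hex = "\x1B"
let esc_octal = "\o033"

let () =
  assert (esc_decimal = esc_hex && esc_hex = esc_octal);
  (* All three are the one-character string whose code is 27. *)
  Printf.printf "%d\n" (Char.code esc_decimal.[0])
```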

We can therefore write any of the following:

let () = Format.printf "\027[36mBlue Text \027[0;3;30;47mItalic WhiteBG Black Text"
let () = Format.printf "\x1B[36mBlue Text \x1B[0;3;30;47mItalic WhiteBG Black Text"
let () = Format.printf "\o033[36mBlue Text \o033[0;3;30;47mItalic WhiteBG Black Text"

Dans tous les cas, on obtient le résultat suivant :

Que se passe-t-il, par contre, si j’exécute une de ces lignes dans un terminal non ANSI ? En testant sur TryOCaml:

On ne veut surtout pas que ce genre d’affichage puisse arriver. Il faudrait donc pouvoir s’assurer que le marquage du texte soit actif uniquement quand on le décide. L’idée de créer deux chaînes de formattage en fonction de notre capacité ou non à afficher du texte marqué n’est clairement pas une bonne pratique de programmation (changer une formulation demande de changer deux chaînes de formattage, le code est difficilement maintenable). Il faudrait donc un outil qui puisse faire un pré-traitement de notre chaîne de formattage pour lui ajouter des décorations.

Cet outil est déjà fourni par Format, ce sont les tags sémantiques.

III.2 Semantic tags

Semantic tags are opened with @{ and closed with @}. Like boxes, they are parameterized by the <t> construct, which names the tag t being opened (and closed). Unlike boxes, tags mean nothing to the pretty-printer. An analogy: OCaml's base types (int, bool, float, etc.) come with many functions already attached, whereas a user-defined type (type t = A | B, for example) means nothing until you write the functions that manipulate it. The main advantage of tags is therefore that, having no meaning, they are simply ignored by the printer when our final string is displayed:

By default, the printer does not process semantic tags, which keeps the default printing behavior as simple as possible. Tag processing can be enabled for each formatter independently with the functions val pp_set_tags : formatter -> bool -> unit, val pp_set_print_tags : formatter -> bool -> unit and val pp_set_mark_tags : formatter -> bool -> unit, whose effects we will see shortly. Let us first see what happens with the general function pp_set_tags, which combines the other two:
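A minimal sketch of that experiment, written against a buffer-backed formatter so that the output can be inspected as a string. The helper name render_with_tags is hypothetical; the Format functions it calls are the standard ones.

```ocaml
(* Render one tagged format string into a buffer, with semantic tag
   processing switched on or off. *)
let render_with_tags enabled =
  let buf = Buffer.create 64 in
  let ppf = Format.formatter_of_buffer buf in
  Format.pp_set_tags ppf enabled;
  (* @? flushes the formatter so the buffer holds the full output *)
  Format.fprintf ppf "@{<fg_blue>Blue Text@}@?";
  Buffer.contents buf

let () =
  (* Tags disabled (the default): the tag is ignored. *)
  print_endline (render_with_tags false);
  (* Tags enabled, no custom functions: the default marks are emitted. *)
  print_endline (render_with_tags true)
```

With tags disabled only the text is produced; with tags enabled and no customization, the tag name shows up between angle brackets around the text.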

What happened?

Once semantic tag processing is enabled, four operations are performed when tags are opened and closed:

print_open_stag followed by mark_open_stag for each tag t opened with @{<t>
mark_close_stag followed by print_close_stag for each @} that closes the most recently opened @{<t>

Let us look at the signatures of these four operations:

type formatter_stag_functions = {
    mark_open_stag : stag -> string;
    mark_close_stag : stag -> string;
    print_open_stag : stag -> unit;
    print_close_stag : stag -> unit;
}

The mark_*_stag functions take a semantic tag and return a string, whereas the print_*_stag functions take the same argument but return nothing. The reason is simple:

The string returned by a marking function is written directly to the output target (terminal, file or other), without going through the layout computation
The printing functions write into the formatter, which treats their output like ordinary text that can therefore trigger line breaks, splits, new boxes, and so on

A color code sent to an ANSI terminal does not show up in the output; the text is simply colored. It therefore seems natural that such a code should have no effect on pretty-printing. If, on the other hand, we were writing to a LaTeX or HTML file, the color instruction would appear in the output and should therefore influence pretty-printing.

It is therefore fairly easy to decide when to use print_*_stag and when to use mark_*_stag:

If the tag must have an immediate effect on the appearance of the displayed text (color, size, decorations…) and not on its content, use mark_*_stag
If the tag must affect the content of the displayed text and not its appearance, use print_*_stag
If the tag must affect both the content and the appearance of the displayed text, use both, keeping content handled by print_*_stag clearly separate from appearance handled by mark_*_stag

Each of these four functions has a default behavior, shown here:

let mark_open_stag = function
  | String_tag s -> "<" ^ s ^ ">"
  | _ -> ""
let mark_close_stag = function
  | String_tag s -> "</" ^ s ^ ">"
  | _ -> ""
let print_open_stag = ignore
let print_close_stag = ignore

The type stag is an extensible variant type (a feature introduced in OCaml 4.02.0), meaning that it is defined as follows:

type stag = ..

type stag += String_tag of string
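Because stag is extensible, a program can add its own constructors and open them explicitly with pp_open_stag and pp_close_stag. The sketch below is only an illustration: the Bold constructor and the render_bold helper are hypothetical names, and the SGR codes 1 and 22 (bold on, bold off) are the usual ANSI values.

```ocaml
(* A user-defined semantic tag, added to Format's extensible type. *)
type Format.stag += Bold

(* Print [text] inside a Bold tag and return what reaches the output. *)
let render_bold text =
  let buf = Buffer.create 64 in
  let ppf = Format.formatter_of_buffer buf in
  Format.pp_set_mark_tags ppf true;
  let fs = Format.pp_get_formatter_stag_functions ppf () in
  Format.pp_set_formatter_stag_functions ppf
    { fs with
      (* Handle Bold ourselves, defer every other tag to the old functions. *)
      Format.mark_open_stag =
        (function Bold -> "\x1B[1m" | t -> fs.Format.mark_open_stag t);
      mark_close_stag =
        (function Bold -> "\x1B[22m" | t -> fs.Format.mark_close_stag t) };
  Format.pp_open_stag ppf Bold;
  Format.pp_print_string ppf text;
  Format.pp_close_stag ppf ();
  Format.pp_print_flush ppf ();
  Buffer.contents buf
```

Such constructors cannot be written inside a format string, so they are only reachable through these explicit calls.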

By default, then, only String_tag of string values are recognized as semantic tags (they are also the only ones that the @{<t> ... @} construct can produce, where t is treated as String_tag t), as the default behavior of mark_open_stag and mark_close_stag illustrates. This default behavior also explains what happened here:

Since we had not customized the tag-handling operations, their default behavior ran, which amounts to emitting the tag between angle brackets without going through the formatter. We therefore have to define the behavior we want for our tags. Be careful: since we only manipulate strings, any mistake is hard to spot and fix, so it is better to avoid the infamous | _ -> () catch-all (it should really be avoided everywhere if possible, but that is another story).

Let us start by defining our tags and what we want them to stand for:

open Format

type style =
  | Normal

  | Italic
  | Italic_off

  | FG_Black
  | FG_Blue
  | FG_Default

  | BG_White
  | BG_Default

let close_tag = function
  | Italic -> Italic_off
  | FG_Black | FG_Blue | FG_Default -> FG_Default

  | BG_White | BG_Default -> BG_Default

  | _ -> Normal

let style_of_tag = function
  | String_tag s -> begin match s with
      | "n" -> Normal
      | "italic" -> Italic
      | "/italic" -> Italic_off

      | "fg_black" -> FG_Black
      | "fg_blue" -> FG_Blue
      | "fg_default" -> FG_Default

      | "bg_white" -> BG_White
      | "bg_default" -> BG_Default

      | _ -> raise Not_found
    end
  | _ -> raise Not_found

Now that every possible tag is handled, we need to associate each one with its value (ANSI in this case) and implement our own marking functions (not printing functions, since these tags have in principle no effect on the content of the displayed text):

(* See https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_parameters for some values *)
let to_ansi_value = function
  | Normal -> "0"

  | Italic -> "3"
  | Italic_off -> "23"

  | FG_Black -> "30"
  | FG_Blue -> "34"
  | FG_Default -> "39"

  | BG_White -> "47"
  | BG_Default -> "49"

let ansi_tag = Printf.sprintf "\x1B[%sm"

let start_mark_ansi_stag t = ansi_tag @@ to_ansi_value @@ style_of_tag t

let stop_mark_ansi_stag t = ansi_tag @@ to_ansi_value @@ close_tag @@ style_of_tag t

Recall that an ANSI tag is opened with the escape character \x1B followed by one or more tag values separated by ; and enclosed between [ and m. Here each tag maps to a single value, but we could perfectly well have Error -> "1;4;31", which would produce bold, underlined, red text. As long as the string sent to the terminal is a valid ANSI markup sequence, anything goes.

Next we must make sure these are the functions the formatter uses when it processes tags:
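As a small illustration of combining several values in one sequence, here is a sketch of a helper that builds the escape sequence from a list of SGR codes. The name sgr is hypothetical and not part of the code above.

```ocaml
(* Build an ANSI SGR sequence from a list of numeric codes:
   ESC, then "[", the codes separated by ";", then "m". *)
let sgr codes =
  Printf.sprintf "\x1B[%sm"
    (String.concat ";" (List.map string_of_int codes))

(* Bold (1), underlined (4), red foreground (31), as in the Error example. *)
let error_style = sgr [1; 4; 31]
```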

let add_ansi_marking formatter =
  let open Format in
  pp_set_mark_tags formatter true;
  let old_fs = pp_get_formatter_stag_functions formatter () in
  pp_set_formatter_stag_functions formatter
    { old_fs with
      mark_open_stag = start_mark_ansi_stag;
      mark_close_stag = stop_mark_ansi_stag }

We use pp_set_mark_tags (rather than pp_set_tags) because we do not use the print_*_stag functions, and we bind the mark_*_stag fields to our *_mark_ansi_stag functions.

All that remains is to make sure the semantic tags are processed, with our functions, before we print our string:

let () =
  add_ansi_marking std_formatter;
  Format.printf "@{<fg_blue>Blue Text @}@{<italic>@{<bg_white>@{<fg_black>Italic WhiteBG BlackFG Text@}@}@}"

And the output in the terminal is exactly what we wanted:

If the program's output goes to a non-ANSI terminal, it is enough to remove the line add_ansi_marking std_formatter; the tags are then simply ignored.

We could also arrange for our text to be sent to an HTML document.

First we need to change the values associated with the tags (note the vertical boxes with zero indentation mentioned in the section on structural boxes):
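Rather than deleting the line by hand, the call could be guarded by a runtime check. The sketch below relies on a heuristic that is purely an assumption of this example (an unset, empty or "dumb" TERM variable means no ANSI support); the names ansi_capable and mark_if_capable are hypothetical.

```ocaml
(* Heuristic only: decide from the value of TERM whether ANSI escape
   sequences are likely to be understood. *)
let ansi_capable = function
  | None | Some "" | Some "dumb" -> false
  | Some _ -> true

(* [enable] stands for a function such as add_ansi_marking; it is
   applied to the formatter only when the terminal looks capable. *)
let mark_if_capable ~enable ppf =
  if ansi_capable (Sys.getenv_opt "TERM") then enable ppf
```

It would be used as mark_if_capable ~enable:add_ansi_marking std_formatter in place of the direct call.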

let to_html_value fmt =
  let fg_color c = Format.fprintf fmt {|@[<v 0>@[<v 2><span style="color:%s;">@,|} c in
  let bg_color c = Format.fprintf fmt {|@[<v 0>@[<v 2><span style="background-color:%s;">@,|} c in
  let close_span () = Format.fprintf fmt "@]@,</span>@]" in
  let default = Format.fprintf fmt in
  fun t -> match t with
    | Normal -> ()

    | Italic -> default "<i>"
    | Italic_off -> default "</i>"

    | FG_Black -> fg_color "black"
    | FG_Blue -> fg_color "blue"
    | FG_Default -> close_span ()

    | BG_White -> bg_color "white"
    | BG_Default -> close_span ()

The {| ... |} construct gives quoted strings in which " and \ are not special characters, so you can write {|"This is a nice "|} without escaping them.

Likewise, the construct

let fonction arg1 ... argn =
  let expr1 = ... in
  ...
  let exprn = ... in
fun argn1 ... argnm ->

lets us define expressions local to a function that depend on the arguments supplied so far, so that in a partial application this environment is computed only once. For to_html_value, I can thus create the partial application let to_html_value_std = to_html_value std_formatter, which directly contains the implementations of fg_color, bg_color, close_span and default for std_formatter.

Unlike the ANSI terminal case, what changes here is the content of the text, not its appearance, so we will use the print_*_stag functions. This is why our functions must write directly into the formatter rather than return a string.

The opening and closing functions barely change:
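A toy illustration of this "compute once per partial application" pattern, unrelated to Format; the names greeter and setup_count are hypothetical.

```ocaml
(* Counts how many times the setup part of [greeter] has run. *)
let setup_count = ref 0

let greeter greeting =
  (* This part runs once, when the first argument is supplied. *)
  incr setup_count;
  let prefix = String.uppercase_ascii greeting ^ ", " in
  (* This part runs at every call of the resulting closure. *)
  fun name -> prefix ^ name

(* The partial application computes [prefix] a single time. *)
let hello = greeter "hello"
let greeting_ada = hello "Ada"
let greeting_alan = hello "Alan"
```

Both calls to hello reuse the prefix computed when greeter "hello" was evaluated, so setup_count is incremented only once.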

let start_print_html_stag fmt t =
  to_html_value fmt @@ style_of_tag t

let stop_print_html_stag fmt t =
  to_html_value fmt @@ close_tag @@ style_of_tag t

We then bind these functions to the print_*_stag fields:

let add_html_printings formatter =
  let open Format in
  pp_set_mark_tags formatter false;
  pp_set_print_tags formatter true;
  let old_fs = pp_get_formatter_stag_functions formatter () in
  pp_set_formatter_stag_functions formatter
    { old_fs with
      print_open_stag = start_print_html_stag formatter;
      print_close_stag = stop_print_html_stag formatter}

We take the opportunity to disable marking on the formatter passed as argument. This avoids nasty surprises if it had been enabled earlier (we should have done the same for printing when setting up marking for the ANSI terminal).

Finally, the call:

let () =
  add_html_printings std_formatter;
  Format.printf "@[<v 0>@{<fg_blue>Blue Text @}@,@{<italic>@{<bg_white>@{<fg_black>Italic WhiteBG BlackFG Text@}@}@}@]@."

gives us the expected result:

<span style="color:blue;">
  Blue Text
</span>
<i>
  <span style="background-color:white;">
     <span style="color:black;">
       Italic WhiteBG BlackFG Text
     </span>
   </span>
</i>

Conclusion

This brings us to the end of this tutorial, which I hope will let you approach the Format module with much more peace of mind.

Among the possibilities not covered here but worth keeping in mind:

The ability to completely redefine all the output functions defined in the record:

type formatter_out_functions = {
    out_string : string -> int -> int -> unit;
    out_flush : unit -> unit;
    out_newline : unit -> unit;
    out_spaces : int -> unit;
    out_indent : int -> unit;
}

The ability to turn any output into a formatter and write to it directly, without going through intermediate strings (notably the function val formatter_of_buffer : Buffer.t -> formatter, which lets you write directly into a buffer)
Symbolic pretty-printing, which prints symbolically and so lets you see directly which directives will be sent to the formatter at print time. Very useful for debugging garbled output, and also extremely powerful for running a post-processing pass (for example, to add a symbol at the start of every line)
The useful functions you should not forget to use (I know OCaml devs like to reinvent the wheel, but functions for printing lists, options and the results Ok _ | Error _ already exist):

val pp_print_list : ?pp_sep:(formatter -> unit -> unit) -> (formatter -> 'a -> unit) -> formatter -> 'a list -> unit

(* Prints a list, separating elements with the default separator `@,` or the one provided *)

val pp_print_option : ?none:(formatter -> unit -> unit) -> (formatter -> 'a -> unit) -> formatter -> 'a option -> unit

(* Prints the content of an option for Some content; for None, prints nothing by default, or uses the printer provided *)

val pp_print_result : ok:(formatter -> 'a -> unit) -> error:(formatter -> 'e -> unit) -> formatter -> ('a, 'e) result -> unit

(* Prints the content of a result. The arguments are not optional here; they determine what is printed for Ok _ and for Error _ *)
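A short sketch of these three printers in use, rendering to strings with Format.asprintf. The helper names show_list, show_option and show_result are hypothetical.

```ocaml
(* A separator printer: a comma followed by a plain space. *)
let comma ppf () = Format.pp_print_string ppf ", "

(* Print an int list with a custom separator. *)
let show_list l =
  Format.asprintf "%a"
    (Format.pp_print_list ~pp_sep:comma Format.pp_print_int) l

(* Print an int option, with an explicit printer for None. *)
let show_option o =
  Format.asprintf "%a"
    (Format.pp_print_option
       ~none:(fun ppf () -> Format.pp_print_string ppf "none")
       Format.pp_print_int) o

(* Print an (int, string) result; both printers are mandatory. *)
let show_result r =
  Format.asprintf "%a"
    (Format.pp_print_result ~ok:Format.pp_print_int
       ~error:Format.pp_print_string) r
```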

Finally, a whole host of printf-style functions, such as:
fprintf, which we have already seen
dprintf, which delays the evaluation of the printing and thus avoids computing output that will never be printed
ifprintf, which prints nothing (useful when you want the same signature as fprintf while being sure that nothing will be done)

Sources:

Tutorial on the OCaml website
Richard Bonichon, Pierre Weis. Format Unraveled. 28ièmes Journées Francophones des Langages Applicatifs, Jan 2017, Gourette, France. hal-01503081

Source code:

LaTeX code for the printf figure
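A minimal sketch of the last two functions, again using a buffer so that the result can be inspected. The names delayed, render and silent_output are hypothetical.

```ocaml
(* dprintf builds a delayed printer of type formatter -> unit;
   nothing is printed until it is applied to a formatter. *)
let delayed : Format.formatter -> unit = Format.dprintf "x = %d" 42

(* %t applies a delayed printer to the current formatter. *)
let render msg = Format.asprintf "%t" msg

(* ifprintf type-checks like fprintf but produces no output. *)
let silent_output () =
  let buf = Buffer.create 16 in
  let ppf = Format.formatter_of_buffer buf in
  Format.ifprintf ppf "never printed: %d" 1;
  Format.pp_print_flush ppf ();
  Buffer.contents buf
```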

\documentclass[tikz,border=10pt]{standalone}

\usepackage{tikz}
\usetikzlibrary{math}
\usetikzlibrary{tikzmark}

\usepackage{xcolor}

\pagecolor[rgb]{0,0,0}
\color[rgb]{1,1,1}

\colorlet{color1}{blue!50!white}
\colorlet{color2}{red!50!white}
\colorlet{color3}{green!50!black}

\begin{document}

\begin{tikzpicture}[remember picture]
\node [align=left,font=\ttfamily] at (0,0) {
    let s = "toto" in\\[2em]
    printf "{\color{color1}\tikzmarknode{scd}{\%d}}
        {\color{color2}\tikzmarknode{scc}{\%c}}
        {\color{color3}\tikzmarknode{scs}{\%s}}"
    {\color{color1}\tikzmarknode{d}{3}}
    {\color{color2}\tikzmarknode{c}{'c'}}
    {\color{color3}\tikzmarknode{s}{s}}\\[2em]
    > "3 c toto"
};
\draw[<-, color1] (scd.north) -- ++(0,0.5) -| (d);
\draw[<-, color2] (scc.south) -- ++(0,-0.4) -| (c);
\draw[<-, color3] (scs.north) -- ++(0,0.4) -| (s);
\end{tikzpicture}

\end{document}

LaTeX code for the fprintf figure:

\documentclass[tikz,border=10pt]{standalone}

\usepackage{tikz}
\usetikzlibrary{math}
\usetikzlibrary{decorations.pathreplacing,tikzmark}

\usepackage{xcolor}

\pagecolor[rgb]{0,0,0}
\color[rgb]{1,1,1}

\colorlet{color1}{blue!50!white}
\colorlet{color2}{red!50!white}
\colorlet{color3}{green!50!black}

\begin{document}

\begin{tikzpicture}[remember picture]
\node [align=left,font=\ttfamily] at (0,0) {
    let s = "toto" in\\[2em]
    fprintf \tikzmarknode{fmt}{fmt} \tikzmarknode{str}{"{\color{color1}\tikzmarknode{scd}{\%d}}
        {\color{color2}\tikzmarknode{scc}{\%c}}
        {\color{color3}\tikzmarknode{scs}{\%s}}"}
    {\color{color1}\tikzmarknode{d}{3}}
    {\color{color2}\tikzmarknode{c}{'c'}}
    {\color{color3}\tikzmarknode{s}{s}}\\[2em]
    > \\
    (* fmt <- "3 c toto" *)
};
\draw[<-, color1] (scd.north) -- ++(0,0.5) -| (d);
\draw[<-, color2] (scc.south) -- ++(0,-0.3) -| (c);
\draw[<-, color3] (scs.north) -- ++(0,0.4) -| (s);
\draw[decorate,decoration={brace, amplitude=5pt, raise=10pt},yshift=-2cm] (str.south east) -- (str.south west) node[midway, yshift=-13pt](a){} ;

\draw[->, white] (a.south) -- ++(0,-0.1) -| (fmt);
\end{tikzpicture}

\end{document}

LaTeX code for the fprintf figure using %a

\documentclass[tikz,border=10pt]{standalone}

\usepackage{tikz}
\usetikzlibrary{math}
\usetikzlibrary{decorations.pathreplacing,tikzmark}

\usepackage{xcolor}

\pagecolor[rgb]{0,0,0}
\color[rgb]{1,1,1}

\colorlet{color1}{blue!50!white}
\colorlet{color2}{red!50!white}
\colorlet{color3}{green!50!black}

\begin{document}

\begin{tikzpicture}[remember picture]
\node [align=left,font=\ttfamily] at (0,0) {
    let s = "toto" in\\[2em]
    type expr = \{i: int; j: int\}\\
    let pp\_expr fmt \{i; j\} = fprintf fmt "<\%d, \%d>" i j in\\[2em]
    fprintf \tikzmarknode{fmt}{std\_formatter} \tikzmarknode{str}{"{\color{color1}\tikzmarknode{scd}{\%d}}
        {\color{color2}\tikzmarknode{sca}{\%a}}
        {\color{color3}\tikzmarknode{scs}{\%s}}"}
    {\color{color1}\tikzmarknode{d}{3}}
    {\color{color2}\tikzmarknode{ppe}{pp\_expr}}
    {\color{color2}\tikzmarknode{e}{\{i=1; j=2\}}}
    {\color{color3}\tikzmarknode{s}{s}}\\[2em]
    > "3 <1, 2> toto"
};
\draw[<-, color1] (scd.north) -- ++(0,0.5) -| (d);
\draw[<-, color2] (sca.south) -- ++(0,-0.3) -| (ppe);
\draw[<-, color2] (sca.65) -- ++(0,0.3) -| (e);
\draw[->, color2] (fmt.north) -- ++(0,0.2) -| (sca.115);
\draw[<-, color3] (scs.south) -- ++(0,-0.4) -| (s);
\draw[decorate,decoration={brace, amplitude=5pt, raise=12pt},yshift=-2cm]  (str.south east) -- (str.south west) node[midway, yshift=-13pt](a){} ;

\draw[->, white] (a.south) -- ++(0,-0.1) -| (fmt);
\end{tikzpicture}

\end{document}

A Solidity parser in OCaml with Menhir

2020-05-19T09:05:17Z

This article is cross-posted on Origin Labs’ Dune Network blog

We are happy to announce the first release of our Solidity parser, written in OCaml using Menhir. This is a joint effort with Origin Labs, the company dedicated to blockchain challenges, to implement a full interpreter for the Solidity language directly in a blockchain.

Solidity is probably the most popular language for smart contracts, small pieces of code triggered when accounts receive transactions on a blockchain. Solidity is an object-oriented, strongly-typed language with a JavaScript-like syntax.

Solidity was first implemented for the Ethereum blockchain, with a compiler to the EVM, the Ethereum Virtual Machine.

Dune Network takes a different approach, as Solidity smart contracts will be executed natively, after type-checking. Solidity will be the third native language on Dune Network, alongside Michelson, a low-level strongly-typed language inherited from Tezos, and Love, a higher-level strongly-typed language, also implemented jointly by OCamlPro and Origin Labs.

A first step has been accomplished, with the completion of the Solidity parser and printer, written in OCaml with Menhir.

This parser (and its printer companion) is now available as a standalone library under the LGPLv3 license with Linking Exception, allowing its integration in all projects. The source code is available at https://gitlab.com/o-labs/solidity-parser-ocaml.

Our parser should support all of Solidity 0.6, with the notable exception of inline assembly (may be added in a future release).

Example contract

Here is an example of a very simple contract that stores an integer value and allows the contract’s owner to add an arbitrary value to this value, and any other contract to read this value:

pragma solidity >=0.6.0 <0.7.0;

contract C {
    address owner;
    int x;

    constructor() public {
        owner = msg.sender;
        x = 0;
    }

    function add(int d) public {
        require(msg.sender == owner);
        x += d;
    }

    function read_x() public view returns(int) {
        return x;
    }
}

Parser Usage

Executable

Our parser comes with a small executable that demonstrates the library usage. Simply run:

./solp contract.sol

This will parse the file contract.sol and reprint it on the terminal.

Library

To use our parser as a library, add it to your program’s dependencies and use the following function:

Solidity_parser.parse_contract_file : string -> Solidity_parser.Solidity_types.module_

It takes a filename and returns a Solidity AST.

If you wish to print this AST, you may turn it into its string representation by sending it to the following function:

Solidity_parser.Printer.string_of_code : Solidity_parser.Solidity_types.module_ -> string

Conclusion

Of course, all of this is a work in progress, but we are quite happy to share it with the OCaml community. We think there is a tremendous amount of work to be done around blockchains for experts in formal methods. Do not hesitate to contact us if you want to use this library!

About Origin Labs

Origin Labs is a company founded in 2019 by the former blockchain team at OCamlPro. At Origin Labs, they have been developing Dune Network, a fork of the Tezos blockchain, its ecosystem, and applications over the Dune Network platform. At OCamlPro, they developed TzScan, the most popular block explorer at the time, Liquidity, a smart contract language, and were involved in the development of the core protocol and node. Do not hesitate to reach out by email: contact@origin-labs.com.

opam 2.1.0 alpha is here!

2020-04-22T09:05:17Z

We are happy to announce an alpha for opam 2.1.0, one year and a half in the making after the release of 2.0.0.

Many new features made it in (see the complete changelog or release note for the details), but here are a few highlights of this release.

Release highlights

The following two features have been around for a while as plugins and are now completely integrated in the core of opam. No extra installs are needed anymore, and the experience is smoother.

Seamless integration of System dependencies handling (a.k.a. "depexts")

A number of opam packages depend on tools or libraries installed on the system, which are out of the scope of opam itself. Previous versions of opam added a specification format, and opam 2.0 already handled checking the OS and extracting the required system package names.

However, the workflow generally involved letting opam fail once, then installing the dependencies and retrying, or explicitly using the opam-depext plugin, which was invaluable for CI but still incurred extra steps.

With opam 2.1.0, depexts are seamlessly integrated, and you basically won't have to worry about them ahead of time:

Before applying its course of actions, opam 2.1.0 checks that external dependencies are present, and will prompt you to install them. You are free to let it do it using sudo, or just run the provided commands yourself.
It is resilient to depexts getting removed or out of sync.
Opam 2.1.0 detects packages that depend on stuff that is not available on your OS version, and automatically avoids them.

This is all fully configurable, and can be bypassed without tricky commands when you need it (e.g. when you compiled a dependency yourself).

Dependency locking

To share a project for development, it is often necessary to be able to reproduce the exact same environment and dependencies setting — as opposed to allowing a range of versions as opam encourages you to do for releases.

For some reason, most other package managers call this feature "lock files". Opam can handle those in the form of [foo.]opam.locked files, and the --locked option.

With 2.1.0, you no longer need a plugin to generate these files: just running opam lock will create them for existing opam files, enforcing the exact version of all dependencies (including locally pinned packages).

If you check in these files, new users just have to run opam switch create . --locked on a fresh clone to get a local switch ready to build the project.

Pinning sub-directories

This one is completely new: fans of the Monorepo rejoice, opam is now able to handle projects in subtrees of a repository.

Using opam pin PROJECT_ROOT --subpath SUB_PROJECT, opam will look for PROJECT_ROOT/SUB_PROJECT/foo.opam. This will behave as a pinning to PROJECT_ROOT/SUB_PROJECT, except that the version-control handling is done in PROJECT_ROOT.
Use opam pin PROJECT_ROOT --recursive to automatically look up all sub-trees with opam files and pin them.

Opam switches are now defined by invariants

Previous versions of opam defined switches based on base packages, which typically included a compiler, and were immutable. Opam 2.1.0 instead defines them in terms of an invariant, which is a generic dependency formula.

This removes a lot of the rigidity opam switch commands had, with few changes to the existing commands. For example, opam upgrade ocaml commands are now possible; you could also define the invariant as ocaml-system and have its version change along with the version of the OCaml compiler installed system-wide.

Configuring opam from the command-line

The new opam option command lets you configure several options without manually editing the configuration files.

For example:

opam option jobs=6 --global will set the number of parallel build jobs opam is allowed to run (along with the associated jobs variable)
opam option depext-run-commands=false disables the use of sudo for handling system dependencies; it will be replaced by a prompt to run the installation commands.

The command opam var is extended with the same format, acting on switch and global variables.

Try it!

In case you plan a possible rollback, you may want to first backup your ~/.opam directory.

The upgrade instructions are unchanged:

Either from binaries: run

bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh) --version 2.1.0~alpha"

or download manually from the GitHub "Releases" page to your PATH.

Or from source, manually: see the instructions in the README.

You should then run:

opam init --reinit -ni

This is still an alpha, so a few glitches or regressions are to be expected. Please report them to the bug-tracker. Thanks for trying it out, and we hope you enjoy it!

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

opam 2.0.7 release

2020-04-21T09:05:17Z

We are pleased to announce the minor release of opam 2.0.7.

This new version contains backported small fixes:

Escape Windows paths on manpages [#4129 @AltGr @rjbou]
Fix opam installer opam file [#4058 @rjbou]
Fix various warnings [#4132 @rjbou @AltGr - fix #4100]
Fix dune 2.5.0 promote-install-files duplication [#4132 @rjbou]

Installation instructions (unchanged):

From binaries: run

bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh) --version 2.0.7"

From source, using opam:

opam update; opam install opam-devel

From source, manually: see the instructions in the README.

We hope you enjoy this new minor version, and remain open to bug reports and suggestions.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

OCaml 4.10's new GC: a first look at the best-fit strategy

2020-03-24T09:05:17Z

OCaml's GC quietly works to keep your memory allocations efficient. Like an unsung hero, it remains little known to most OCaml hackers. With the arrival of OCaml 4.10, it gains a new strategy that appeared in the changelog, authored by Damien Doligez.

In this article we start exploring the new strategy, called best-fit, of the new garbage collector in OCaml 4.10.

Read more: article in English.

An in-depth Look at OCaml’s new “Best-fit” Garbage Collector Strategy

2020-03-23T09:05:17Z

The Garbage Collector probably is OCaml’s greatest unsung hero. Its pragmatic approach allows us to allocate without much fear of efficiency loss. In a way, the fact that most OCaml hackers know little about it is a good sign: you want a runtime to gracefully do its job without having to mind it all the time.

But as OCaml 4.10.0 has now hit the shelves, a very exciting feature is in the changelog:

#8809, #9292: Add a best-fit allocator for the major heap; still experimental, it should be much better than current allocation policies (first-fit and next-fit) for programs with large heaps, reducing both GC cost and memory usage. This new best-fit is not (yet) the default; set it explicitly with OCAMLRUNPARAM="a=2" (or Gc.set from the program). You may also want to increase the space_overhead parameter of the GC (a percentage, 80 by default), for example OCAMLRUNPARAM="o=85", for optimal speed. (Damien Doligez, review by Stephen Dolan, Jacques-Henri Jourdan, Xavier Leroy, Leo White)

At OCamlPro, some of the tools that we develop, such as the package manager opam, the Alt-Ergo SMT solver or the Flambda optimizer, can be quite demanding in memory usage, so we were curious to better understand the properties of this new allocator.

Minor heap and Major heap: the GC in a nutshell

Not all values are allocated equal. Some will only be useful for the span of local calculations, some will last as long as the program lives. To handle those two kinds of values, the runtime uses a Generational Garbage Collector with two spaces:

The minor heap uses the Stop-and-copy principle. It is fast but has to stop the computation to perform a full iteration.
The major heap uses the Mark-and-sweep principle. It has the perk of being incremental and behaves better for long-lived data.

Allocation in the minor heap is straightforward and efficient: values are stored sequentially, and when there is no space anymore, space is emptied, surviving values get allocated in the major heap while dead values are just forgotten for free. However, the major heap is a bit more tricky, since we will have random allocations and deallocations that will eventually produce a scattered memory. This is called fragmentation, and this means that you’re using more memory than necessary. Thankfully, the GC has two strategies to counter that problem:

Compaction: a heavyweight reallocation of everything that will remove those holes in our heap. OCaml’s compactor is cleverly written to work in constant space, and would be worth its own specific article!
Free-list Allocation: allocating the newly coming data in the holes (the free-list) in memory, de-scattering it in the process.

Of course, asking the GC to be smarter about how it allocates data makes the GC slower. Coding a good GC is a subtle art: you need to have something smart enough to avoid fragmentation but simple enough to run as fast as possible.

Where and how to allocate: the 3 strategies

OCaml used to propose 2 free-list allocation strategies: next-fit, the default, and first-fit. Version 4.10 of OCaml introduces the new best-fit strategy. Let’s compare them:

Next-fit, the original and remaining champion

OCaml’s original (and default) “next-fit” allocating strategy is pretty simple:

Keep a (circular) list of every hole in memory ordered by increasing addresses;
Have a pointer on an element of that list;
When an allocation is needed, if the currently pointed-at hole is big enough, allocate in it;
Otherwise, try the next hole and so-on.

This strategy is extremely efficient, but a big hole might be fragmented with very small data while small holes stay unused. In some cases, the GC would trigger costly compactions that would have been avoidable.
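The steps above can be modelled in a few lines. This is a toy sketch under strong assumptions, not the runtime's implementation: holes are reduced to their sizes in words, stored in an array, and the names free_list and next_fit are hypothetical.

```ocaml
(* Toy model of a free-list: hole sizes in address order, plus the
   roving pointer that next-fit keeps between allocations. *)
type free_list = { holes : int array; mutable cursor : int }

(* Allocate [size] words: start at the cursor, take the first hole that
   is big enough, and leave the cursor there. Returns the hole index. *)
let next_fit fl size =
  let n = Array.length fl.holes in
  let rec go tried i =
    if tried = n then None (* went all the way around: no hole fits *)
    else if fl.holes.(i) >= size then begin
      fl.holes.(i) <- fl.holes.(i) - size;
      fl.cursor <- i;
      Some i
    end
    else go (tried + 1) ((i + 1) mod n)
  in
  if n = 0 then None else go 0 fl.cursor
```

With holes of 2, 100 and 3 words and the cursor on the big hole, a 2-word request is carved out of the 100-word hole even though the first hole would fit exactly, which is the kind of fragmentation described above.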

First-fit, the unsuccessful contender

To counteract that problem, the “first-fit” strategy was implemented in 2008 (OCaml 3.11.0):

Same idea as next-fit, but with an extra allocation table.
Put the pointer back at the beginning of the list for each allocation.
Use the allocation table to skip some parts of the list.

Unfortunately, that strategy is slower than the previous one. This is an example of how making the GC smarter ends up making it slower. It does, however, reduce fragmentation. It was still useful to have this strategy at hand for the case where compaction would be too costly (on a 100 GB heap, for instance). An application that requires low latency might want to disable compaction and use that strategy.

Best-fit: a new challenger enters!

This leads us to the brand new “best-fit” strategy. This strategy is actually composite and will have different behaviors depending on the size of the data you’re trying to allocate.

On small data (up to 32 words), segregated free lists will allow allocation in (mostly) constant time.
On big data, a general best-fit allocator based on splay trees is used.

This allows for the best of the two worlds, as you can easily allocate your numerous small blocks in the small holes in your memory while you take a bit more time to select a good place for your big arrays.
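To make the "best-fit" idea itself concrete, here is a toy sketch that picks the smallest hole able to hold the request. It is deliberately naive and is an assumption of this example only: it scans an array of hole sizes linearly, whereas the strategy described above uses segregated free lists and splay trees. The name best_fit is hypothetical.

```ocaml
(* Toy best-fit: among the holes large enough for [size], choose the
   smallest one, shrink it, and return its index. *)
let best_fit holes size =
  let best = ref None in
  Array.iteri
    (fun i h ->
      if h >= size then
        match !best with
        | Some j when holes.(j) <= h -> () (* current candidate is tighter *)
        | _ -> best := Some i)
    holes;
  (match !best with
   | Some i -> holes.(i) <- holes.(i) - size
   | None -> ());
  !best
```

On holes of 2, 100 and 3 words, a 2-word request goes to the 2-word hole and a 3-word request to the 3-word hole, leaving the large hole intact for large blocks.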

How will best-fit fare? Let’s find out!

Try it!

First, let us remind you that this is still an experimental feature, which, coming from the OCaml development team, means “We’ve tested it thoroughly on different systems, but only for months and not on a scale as large as the whole OCaml ecosystem”.

That being said, we’d advise against using it in production code for now.

Why you should try it

Benchmarking this new strategy could benefit both you and the language at large: the dev team is hoping for feedback, and the more quality feedback you give, the better the future GC will be tuned to your needs.

In 2008, the first-fit strategy was released with the hope of improving memory usage by reducing fragmentation. However, the lack of feedback meant that the developers were not aware that it didn’t meet the users’ needs. If more feedback had been given, it’s possible that work on improving the strategy or on better strategies would have happened sooner.

Choosing the allocator strategy

Now, there are two ways to control the GC behavior: through the code or through environment variables.

First method: Adding instructions in your code

This method should be used by those of us who have code that already does some GC fine-tuning. As early as possible in your program, you want to execute the following lines:

let () =
  Gc.(set
    { (get ()) with
      allocation_policy = 2; (* Use the best-fit strategy *)
      space_overhead = 100; (* Let the major GC work a bit less since it's more efficient *)
    })

You might also want to add verbose = 0x400; or verbose = 0x404; in order to get some GC debug information. See the documentation of the Gc module for more details on how to use it.

Of course, you’ll need to recompile your code, and this will apply only after the runtime has initialized itself, triggering a compaction in the process. Also, since you might want to easily switch between different allocation policies and overhead specifications, we suggest you use the second method.

Second method: setting `$OCAMLRUNPARAM`

At OCamlPro, we develop and maintain a program that any OCaml developer should want to run smoothly. It’s called Opam, maybe you’ve heard of it? Though most commands take a few seconds, some administrative-heavy commands can be a strain on our computer. In other words: those are perfect for a benchmark.

Here’s what we did to benchmark Opam:

$ opam update
$ opam switch create 4.10.0
$ opam install opam-devel # or build your own code
$ export OCAMLRUNPARAM='b=1,a=2,o=100,v=0x404'
$ cd my/local/opam-repository
$ perf stat ~/.opam/4.10.0/lib/opam-devel/opam admin check --installability # requires right to execute perf, time can do the trick

If you want to compile and run your own benchmarks, here are a few details on OCAMLRUNPARAM:

b=1 means “print the backtrace in case of uncaught exception”
a=2 means “use best-fit” (default is 0, i.e. next-fit; first-fit is 1)
o=100 means “do less work” (default is 80, lower means more work)
v=0x404 means “have the gc be verbose” (0x400 is “print statistics at exit”, 0x4 is “print when changing heap size”)

See the manual for more details on OCAMLRUNPARAM
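To check that the settings passed through OCAMLRUNPARAM were actually picked up, a program can print the parameters the runtime reports. A minimal sketch, assuming an OCaml 4.10 runtime (the helper `describe_policy` is ours):

```ocaml
(* Print the GC parameters currently in use by the runtime. *)
let describe_policy = function
  | 0 -> "next-fit"
  | 1 -> "first-fit"
  | 2 -> "best-fit"
  | n -> Printf.sprintf "unknown (%d)" n

let () =
  let c = Gc.get () in
  Printf.printf
    "allocation_policy = %d (%s)\nspace_overhead = %d\nverbose = 0x%x\n"
    c.Gc.allocation_policy
    (describe_policy c.Gc.allocation_policy)
    c.Gc.space_overhead c.Gc.verbose
```

Running it with and without the exported variable should show the a, o and v values change accordingly.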

You might want to compare how your code fares on all three different GC strategies (and fiddle a bit with the overhead to find your best configuration).

Our results on opam

Our contribution in this article is to benchmark opam with the different allocation strategies:

Strategy	Next-fit	First-fit	Best-fit	Best-fit	Best-fit
Overhead	80	80	80	100	120
Cycles used (Gcycle)	2,040	3,808	3,372	2,851	2,428
Maximum heap size (kb)	793,148	793,148	689,692	689,692	793,148
User time (s)	674	1,350	1,217	1,016	791

A quick word on these results. Most of opam’s calculations are done by dose and rely heavily on small interconnected blocks. We don’t really have big chunks of data to allocate, so best-fit won’t give us the bonus you might get elsewhere: this workload falls squarely into the best-case scenario of the next-fit strategy. As a matter of fact, we didn’t see a single GC compaction with any strategy. However, best-fit still allows for a lower memory footprint!
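To put the trade-off in relative terms, the figures from the table can be compared against the next-fit baseline. The snippet below only redoes that arithmetic; the numbers are copied from the table and the helper `relative` is ours.

```ocaml
(* (label, maximum heap in kb, user time in s), copied from the table. *)
let next_fit = ("next-fit, o=80", 793_148., 674.)

let runs = [
  ("first-fit, o=80", 793_148., 1_350.);
  ("best-fit, o=80", 689_692., 1_217.);
  ("best-fit, o=100", 689_692., 1_016.);
  ("best-fit, o=120", 793_148., 791.);
]

(* Heap saved (in percent) and slowdown factor relative to a baseline run. *)
let relative (_, heap0, time0) (label, heap, time) =
  (label, 100. *. (heap0 -. heap) /. heap0, time /. time0)

let () =
  List.iter
    (fun run ->
       let label, heap_saved, slowdown = relative next_fit run in
       Printf.printf "%-16s heap saved: %5.1f%%  time: x%.2f\n"
         label heap_saved slowdown)
    runs
```

On this workload, best-fit at overhead 80 or 100 saves about 13% of the maximum heap, for roughly 1.8 and 1.5 times the user time of next-fit respectively.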

Conclusions

If your software is highly reliant on memory usage, you should definitely try the new best-fit strategy and stay tuned for its future development. If your software requires good performance, knowing whether it performs better with best-fit (and giving feedback on that) might help you in the long run.

The different strategies are:

Next-fit: generally good and fast, but has very bad worst cases with big heaps.
First-fit: mainly useful for very big heaps that must avoid compaction as much as possible.
Best-fit: almost the best of both worlds, with a small performance hit for programs that fit well with next-fit.

Remember that whatever works best for you, it’s still better than having to malloc and free by hand. Happy allocating!

Comments

gasche (23 March 2020 at 17 h 50 min):

What about higher overhead values than 120, like 140, 160, 180 and 200?

Thomas Blanc (23 March 2020 at 18 h 17 min):

Because 100 was the overhead value Leo advised in the PR discussion I decided to put it in the results. As 120 got the same maximum heap size as next-fit I found it worth putting it in. Higher overhead values lead to faster execution time but a bigger heap.

I don’t have my numbers at hand right now. You’re probably right that they are relevant (to you and Damien at least) but I didn’t want to have a huge table at the end of the post.

nbbb (24 March 2020 at 11 h 18 min):

Higher values would allow us to see if best-fit can reproduce the performance characteristics of next-fit, for some value of the overhead.

nbbb (24 March 2020 at 16 h 51 min):

I just realized that 120 already has a heap as big as next-fit, so best-fit can’t get as good as next-fit in this example, and higher values of the overhead are not quite as informative. Should have read more closely the first time.

Thomas Blanc (24 March 2020 at 16 h 55 min):

Sorry that it wasn’t as clear as it could be.

Note that opam and dose are in the best-case scenario of best-fit. Your own code would probably produce a different result and I encourage you to test it and communicate about it.

New version of TryOCaml in beta!

2020-03-16T09:05:17Z

We are happy to announce that our venerable "TryOCaml" service is being retired and replaced by a new, modern version based upon our work on Learn-OCaml.

→ Try it here ←

The new interface provides an editor panel alongside the familiar toplevel, highlighting of error and warning positions, the latest OCaml release (4.10.0), local storage of your session, and more.

The service is still in beta, so it would be helpful if you could tell us about any hiccups you may encounter on the Discuss thread.

Let's read the testimony of Sylvain Conchon about our new version of TryOCaml:

“TryOCaml saved our lives in Paris Saclay in these times of social distancing. I teach functional programming with OCaml to my Y2 Bachelor’s Degree students. With the quarantine in place, we weren’t able to host the practical assignment in the machine room as usual, so we decided the students would do the exam at home. However, many of our students use Windows on which setting up OCaml is a hassle, or otherwise encountered problems while setting up the OCaml environment. We invited our students to use try-ocaml instead! Many have and the exam went really smoothly.”

Annual meeting of the Alt-Ergo Users' Club

2020-03-03T09:05:17Z

The second annual meeting of the Alt-Ergo Users' Club took place in mid-February! Our annual meeting is the ideal place to review each partner's needs regarding Alt-Ergo. This year, we had the pleasure of welcoming our partners to discuss the roadmap for future developments and improvements of Alt-Ergo.

Alt-Ergo is an automatic prover of mathematical formulas, created at LRI and developed by OCamlPro since 2013. To learn more or to join the Club, visit https://alt-ergo.ocamlpro.com/.

Our Club has several goals, the first being to ensure the sustainability of Alt-Ergo by fostering collaboration among Club members and strengthening collaboration with formal-methods communities such as Why3. One of our priorities is to increase the number of users of our tool by extending it to new domains such as model checking; taking part in international competitions is also a way to gain visibility. Finally, the Club's last goal is to find new projects or contracts for the development of long-term features.

We thank all our members for their support and welcome Mitsubishi Electric R&D Centre Europe, which joins AdaCore and CEA List as a Club member this year. We would also like to highlight the Why3 development team, with whom we work to improve our tools.

Our members are particularly interested in the following points:

– Better generation of models and counterexamples

– Adding the theory of sequences

– Improving support for non-linear arithmetic in Alt-Ergo

These features are now our main priorities. To follow our progress and the latest news, feel free to read our articles on this blog.

2019 at OCamlPro

2020-02-05T09:05:17Z

OCamlPro's ambition is to help industrial users adopt the OCaml language and formal methods. The company has grown from 1 to 21 people and has stayed true to that goal. The year 2019 at OCamlPro was very lively, with an impressive number of achievements, first in the OCaml world (flambda2 & compiler optimisations, opam 2, our Rust-based UI for memprof, tools like tryOCaml, ocp-indent, and support for the OCaml Software Foundation), and in the world of formal methods (new versions of our SMT solver Alt-Ergo, launch of the Alt-Ergo Users' Club, launch of the Love language, etc.)

Read more (in English)

2019 at OCamlPro

2020-02-04T09:05:17Z

OCamlPro was created to help OCaml and formal methods spread into the industry. We grew from 1 to 21 engineers, still strongly sharing this ambitious goal! The year 2019 at OCamlPro was very lively, with fantastic accomplishments all along!

Let's quickly review the past year's work, first in the world of OCaml (flambda2 & compiler optimisations, opam 2, our Rust-based UI for memprof, tools like tryOCaml, ocp-indent, and supporting the OCaml Software Foundation), then in the world of formal methods (new versions of our SMT Solver Alt-Ergo, launch of the Alt-Ergo Users' Club, the Love language, etc.).

In the World of OCaml

Flambda/Compilation Team

Work by Pierre Chambart, Vincent Laviron, Guillaume Bury and Pierrick Couderc

Pierre and Vincent's considerable work on Flambda 2 (the optimizing intermediate representation of the OCaml compiler, on which inlining occurs), in close cooperation with Jane Street (Mark Shinwell, Leo White and their team), aims at overcoming some of flambda's limitations. We have continued our work on making OCaml programs ever faster: internal types are clearer and more concise, and possible control flow transformations are more flexible. Overall, a precious outcome for industrial users. In 2019, the major breakthrough was to go from the initial prototype to a complete compiler, which allowed us to compile simple examples first and then to bootstrap it.

On the OCaml compiler side, we also worked with Leo on two sets of new features: functorized compilation units and functorized packs on the one hand, and recursive packs on the other. The former will allow any developer to implement .ml files as if they were functors and not simply modules, and more importantly to generate packs that are real functors. This makes it possible to split big functors into several files or to parameterize libraries on other modules. The latter allows two distinct usages: recursive interfaces, to implement recursive types in distinct .mli files as long as they do not need any implementation; and recursive packs, whose components are typed and compiled as recursive modules.
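For readers less familiar with these notions, here is what their in-file counterparts look like in OCaml as it exists today: an explicit functor and a pair of recursive modules. This does not use the proposed features themselves, and all names are hypothetical.

```ocaml
(* A parameterized component currently has to be an explicit functor
   written inside a file. *)
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

module MakeSorter (O : ORDERED) = struct
  let sort (l : O.t list) = List.sort O.compare l

  let max_opt l =
    match List.rev (sort l) with [] -> None | x :: _ -> Some x
end

(* Instantiating the functor with the stdlib Int module. *)
module IntSorter = MakeSorter (Int)

(* Recursive modules: each module refers to the other one. *)
module rec Even : sig
  val check : int -> bool
end = struct
  let check n = n = 0 || Odd.check (n - 1)
end

and Odd : sig
  val check : int -> bool
end = struct
  let check n = n <> 0 && Even.check (n - 1)
end
```

Functorized compilation units would let a whole file play the role of `MakeSorter`, and recursive packs would let `Even` and `Odd` live in separate files.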

These new features are described on the new RFC repository for OCaml (a similar idea was suggested and implemented in 2011 by Fabrice Le Fessant).
The implementation is available on GitHub for both functorized packs and recursive packs. Be aware that both are based on an old version of OCaml for now, but should be in sync with the current trunk in the near future.
See also Vincent's OCamlPro’s compiler team work update of August 2019.

This work is made possible by Jane Street's funding.

Work on a formalized type system for OCaml

Work of Pierrick Couderc

At the end of 2018, Pierrick defended his PhD on "Checking type inference results of the OCaml language", leading to a formalized type system and semantics for a large subset of OCaml, or at least its unique typed intermediate language: the Typedtree. This work led us to work on new aspects of the OCaml compiler, such as the recursive and functorized packs described earlier, and we hope it proves useful in the future for the evolution of the language.

The OPAM package manager

Work of Raja Boujbel and Louis Gesbert

OPAM is maintained and developed at OCamlPro by Louis and Raja. Thanks to their thorough efforts, the first release candidate of opam 2.1 will soon be published!

Back in 2018, the long-awaited opam 2.0 version was finally released. It embedded many changes, in opam itself as well as for the community. The opam file format was redefined to simplify it and add new features. With the close collaboration of OCamlLabs and the opam repository maintainers, we were able to manage a smooth transition of the repository and the whole ecosystem from the opam 1.2 format to the new, richer opam 2.0 format. Other emblematic features include the integrated mccs solver (for practicality), sandboxed builds (for security: we care about your filesystem!) and a reworked pin command (for usability).

While the 2.1.0 version is in preparation, the 2.0.0 version is still updated with minor releases to fix issues. The latest release, 2.0.6, is fresh from January.

In the meantime, we continued to improve opam: integration of some opam plugins (opam lock, opam depext), recursive discovery of opam files in the file tree when pinning, a new definition of a switch compiler, the possibility to use the z3 backend instead of mccs, etc.

All these new features, among others, will be integrated in the 2.1.0 release, whose beta is planned for February. The best is yet to come!

More details on https://opam.ocaml.org
Releases on https://github.com/ocaml/opam/releases & our blog

This work is made possible by Jane Street's funding.

Encouraging OCaml adoption

OCaml Expert trainings for professional programmers

In 2019, we offered OCaml expert training specially designed for developers who want to use advanced features and master all the open-source tools and libraries of OCaml.

The "Expert" OCaml course is for already experienced OCaml programmers to better understand advanced type system possibilities (objects, GADTs), discover GC internals, write "compiler-optimizable" code. These sessions are also an opportunity to come discuss with our OPAM & Flambda lead developers and core contributors in Paris.

Next session: 3-4 March 2020, Paris (registration)

Our cheat-sheets on OCaml, the stdlib and opam

Work of Thomas Blanc, Raja Boujbel and Louis Gesbert

Thomas announced the release of our up-to-date cheat-sheets for the OCaml language, standard library and opam. Our original cheat-sheets were dating back to 2011. This was an opportunity to update them after the many changes in the language, library and ecosystem overall.

Cheat-sheets are helpful to refer to, as an overview of the documentation when you are programming, especially when you’re starting in a new language. They are meant to be printed and pinned on your wall, or kept handy on a spare screen. They are especially useful when your rubber duck is rubbish at debugging your code!

More details on Thomas' blog post

Open Source Tooling and Web IDEs

And let's not forget the other tools we develop and maintain! We have tools for education such as our interactive editor OCaml-top and Try-OCaml (from the previous work on the learn-OCaml platform for the OCaml Fun MOOC) which you can use to code in your browser. Developers will appreciate tools like our indentation tool ocp-indent, and ocp-index which gives you easy access to the interface information of installed OCaml libraries for editors like Emacs and Vim.

Supporting the OCaml Software Foundation

OCamlPro was proud to be one of the first supporters of Inria's new OCaml Software Foundation. We remain committed to the adoption of OCaml as an industrial language:

"[…] As a long-standing supporter of the OCaml language, we have always been committed to helping spread OCaml's adoption and increase the accessibility of OCaml to beginners and students. […] We value close and friendly collaboration with the main actors of the OCaml community, and are proud to be contributing to the OCaml language and tooling." (August 2019, Advisory Board of the OCSF, ICFP Berlin)

More information on the OCaml Software Foundation

In the World of Formal Methods

By Mohamed Iguernlala, Albin Coquereau, Guillaume Bury

In 2018, we welcomed five new engineers with a background in formal methods. They strengthen the formal methods department at OCamlPro, and in particular help develop and maintain our SMT solver Alt-Ergo.

Release of Alt-Ergo 2.3.0, and version 2.0.0 (free)

After the release of Alt-Ergo 2.2.0 (with a new front-end that supports the SMT-LIB 2 language, extended prenex polymorphism, implemented as a standalone library) came version 2.3.0 in 2019, with new features: dune support, ADTs (algebraic datatypes), improvement of the if-then-else and let-in support, improvement of the data types.

More information on the Alt-Ergo SMT Solver
Albin Coquereau defended his PhD thesis in December 2019: "Improving performance of the SMT solver Alt-Ergo with a better integration of efficient SAT solver"
We participated in the SMT-COMP 2019 during the 22nd SAT conference. The results of the competition are detailed here.

The launch of the Alt-Ergo Users' Club

Getting closer to our users, gathering industrial and academic supporters, collecting their needs into the Alt-Ergo roadmap is key to Alt-Ergo's development and sustainability.

The Alt-Ergo Users' Club was officially launched at the beginning of 2019. The first yearly meeting took place in February 2019. We were happy to welcome our first members AdaCore, CEA List, Trust-In-Soft, and now Mitsubishi MERCE.

More information on the Alt-Ergo Users' Club

Harnessing our language-design expertise: Love

Work by David Declerck & Steven de Oliveira

Following the launch of the Dune Network, the Love language for smart contracts was born from the collaboration of OCamlPro and Origin Labs. This new language, whose syntax is inspired by OCaml and Liquidity, is an alternative to Michelson, the native smart-contract language of Dune. Love is based on System F, a type system requiring no type inference and allowing polymorphism. The language has successfully been integrated on the network and the first smart contracts are being written.

LOVE: a New Smart Contract Language for the Dune Network
The Love Smart Contract Language: Introduction & Key Features — Part I

The OCaml & Rust combo should be a candidate for any ambitious software project!

A Rust-based UI for memprof: in 2019 we started working in collaboration with the memprof developer team on a Rust-based UI for memprof. See Pierre and Albin's talk at JFLA2020, "Gardez votre mémoire fraiche avec Memthol" (Pierre Chambart, Albin Coquereau and Jacques-Henri Jourdan)
Rust training: Rust borrows heavily from functional programming languages to provide very expressive abstraction mechanisms. Because it is a systems language, these mechanisms are almost always zero-cost. For instance, polymorphic code has no runtime cost compared to a monomorphic version. This concern for efficiency also means that Rust lets developers keep a very high level of control and freedom for optimizations. Rust has no Garbage Collection or any form of runtime memory inspection to decide when to free, allocate or re-use memory. But because manual memory management is almost universally regarded as dangerous, or at least very difficult to maintain, the Rust compiler has a borrow-checker which is responsible for i) proving that the input program is memory-safe (and thread-safe), and ii) generating a safe and “optimal” allocation/deallocation strategy. All of this is done at compile-time.
Next sessions: April 20-24th 2020 (registration)

OCamlPro around the world

OCamlPro's team members attended many events throughout the world:

ICFP 2019 (Berlin)
The JFLA’2019 (Les Rousses, Haut-Jura)
The POSS'2019 (Paris)
MirageOS Retreat (Marrakech)

As committed contributors to the life of the OCaml ecosystem, we've organized OCaml meetups too (see the famous OUPS meetups in Paris!).

Now let's jump into the new year 2020, with a team that keeps growing and new projects ahead: stay tuned!

Many people ask us about what happened in 2018! That was an incredibly active year on blockchain-related achievements, and at that time we were hoping to attract clients that would be interested in our blockchain expertise.

But that is history now! Still interested? Check the Origin Labs team and their partner The Garage on Dune Network!

For the record:

(April 2019) We had started Techelson: a testing framework for Michelson and Liquidity
(Nov 2018) An Introduction to Tezos RPCs: Signing Operations / An Introduction to Tezos RPCs: a Basic Wallet / Liquidity Tutorial: A Game with an Oracle for Random Numbers / First Open-Source Release of TzScan
(Oct 2018) OCamlPro’s TZScan grant proposal accepted by the Tezos Foundation – joint press release
(Jul 2018) OCamlPro’s Tezos block explorer TzScan’s last updates
(Feb 2018) Release of a first version of TzScan.io, a Tezos block explorer / OCamlPro’s Liquidity-lang demo at JFLA2018 – a smart-contract design language. We were developing Liquidity, a high-level smart contract language, human-readable, purely functional and statically typed, whose syntax was very close to the OCaml syntax.
To garner interest and adoption, we also developed the online editor Try Liquidity. Smart-contract developers could design contracts interactively, directly in the browser, compile them to Michelson, run them and deploy them on the alphanet network of Tezos. Future plans included a full-fledged web-based IDE for Liquidity. Worth mentioning was a neat feature: decompiling a Michelson program back to its Liquidity version, whether it was generated from Liquidity code or not.

opam 2.0.6 release

2020-01-16T09:05:17Z

We are pleased to announce the minor release of opam 2.0.6.

This new version contains some small backported fixes and build updates:

Don't remove git cache objects that may be used [#3831 @AltGr]
Don't include .gitattributes in index.tar.gz [#3873 @dra27]
Update FAQ uri [#3941 @dra27]
Lock: add warning in case of missing locked file [#3939 @rjbou]
Directory tracking: fix cached entries retrieving with precise tracking [#4038 @hannesm]
Build:
- Add sanity checks [#3934 @dra27]
- Build man pages using dune [#3902 ]
- Add patch and bunzip check for make cold [#4006 @rjbou - fix #3842]
Shell:
- fish: add colon for fish manpath [#3886 @rjbou - fix #3878]
Sandbox:
- Add dune cache as rw [#4019 @rjbou - fix #4012]
- Do not fail if $HOME/.ccache is missing [#3957 @mseri]
opam-devel file: avoid copying extraneous files in opam-devel example [#3999 @maroneze]

As sandbox scripts have been updated, don't forget to run opam init --reinit -ni to update yours.

Note: To homogenise macOS name on system detection, we decided to keep macos, and convert darwin to macos in opam. For the moment, to not break jobs & CIs, we keep uploading darwin & macos binaries, but from the 2.1.0 release, only macos ones will be kept.

Installation instructions (unchanged):

From binaries: run

bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh) --version 2.0.6"

From source, using opam:

opam update; opam install opam-devel

From source, manually: see the instructions in the README.

We hope you enjoy this new minor version, and remain open to bug reports and suggestions.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

The Opam 2.0 cheatsheet, with a new theme!

2020-01-10T09:05:17Z

Earlier, we dusted-off our Language and Stdlib cheatsheets, for teachers and students. With more time, we managed to design an Opam 2.0 cheat-sheet we are proud of. It is organized into two pages:

The everyday average Opam use:
- Installation, Configuration, Switches, Allowed URL formats, Packages, Exploring, Package pinning, Working with local pins, Sharing a dev setup, Configuring remotes.
Peculiar advanced use cases (opam-managed project, publishing, repository maintenance, etc.):
- Package definition files, Some optional fields, Expressions, External dependencies, Publishing, Repository administration.

Moreover, with the help of listings, we tried the use of colors for better readability. And we left some blank space for your own peculiar commands. Two versions are available (PDF):

The Opam cheatsheet in black & white
The Opam cheatsheet in colour.

In any case, do not hesitate to send us your suggestions on GitHub.

Louis and Raja, the lead Opam developers, designed this cheatsheet so as to shed light on some important features (some of which I only discovered in the process, even though I speak with them daily!). If a command you find useful is not mentioned, let us know and we’ll add it. Feel free to ask for clarification and/or expansion of the manual!

Happy hacking!

Note: If you come to one of our training sessions, you’ll get a free cheatsheet! Isn’t that a bargain?

News from OCamlPro's compiler team

2019-09-30T09:05:17Z

We are happy to present some ongoing work on the OCaml compiler, carried out in close collaboration with our partner and client Jane Street.

A substantial amount of work has gone into a new optimization framework for the compiler, called Flambda2, which we hope will fix some of the shortcomings that surfaced in Flambda. In parallel, the team completed some immediate improvements to Flambda, as well as changes to the compiler that will be useful for Flambda2.

See (in English): OCamlPro’s compiler team work update

OCaml training by OCamlPro: 5-6 and 7-8 November 2019

2019-09-26T09:05:17Z

OCamlPro is launching a series of regular OCaml training sessions, in French, at its Paris offices (Alésia metro station). The first session will take place in early November 2019, with 2 courses:

Beginner course: moving to OCaml (5-6 November)
Expert course: deepening your mastery of the language (7-8 November).

The expert course is an opportunity for OCaml programmers who already have some experience to better understand the advanced possibilities of the type system (objects, GADTs), to discover in detail how the GC works, and to write code that the compiler can optimize.

These courses are also an opportunity to come and talk with the lead developers and contributors of OPAM and Flambda at OCamlPro.

Training in English can also be arranged on request at contact@ocamlpro.com

OCaml expert and beginner training by OCamlPro (in French): Nov. 5-6 & 7-8

2019-09-25T09:05:17Z

In our endeavour to encourage professional programmers to understand and use OCaml, OCamlPro will be giving two training sessions, in French, in our Paris offices:

OCaml Beginner course for professional programmers (5-6 Nov)
OCaml Expertise (7-8 Nov).

The "Expert" OCaml course is for already experienced OCaml programmers to better understand advanced type system possibilities (objects, GADTs), discover GC internals, write "compiler-optimizable" code.

These sessions are also an opportunity to come discuss with OCamlPro's OPAM & Flambda lead developers and core contributors in Paris.

Training in English can also be organized, on-demand.

This complements the excellent OCaml MOOC from Université Paris-Diderot and the learn-OCaml platform of the OCaml Software Foundation.

A look back on OCaml since 2011

2019-09-20T09:05:17Z

As you already know if you’ve read our last blogpost, we have updated our OCaml cheat sheets starting with the language and stdlib ones. We know some of you have students to initiate in September and we wanted these sheets to be ready for the start of the school year! We’re working on more sheets for OCaml tools like opam or Dune and important libraries such as ~~Obj~~ Lwt or Core. Keep an eye on our blog or the repo on GitHub to follow all the updates.

Going through the documentation was a journey to the past: we have looked back on 8 years of evolution of the OCaml language and library. New feature after new feature, OCaml has seen many changes. Needless to say, upgrading our cheat sheets to OCaml 4.08.1 was a trip down memory lane. We wanted to share our throwback experience with you!

2011

Fabrice Le Fessant first published our cheat sheets in 2011, the year OCamlPro was created! At the time, OCaml was in its 3.12 version and just got its current name agreed upon. First-class modules were the new big thing, Camlp4 and Camlp5 were battling for the control of the syntax extension world and Godi and Oasis were the packaging rage.

2012

Right after 3.12 came the switch to OCaml 4.00 which brought a major change: GADTs (generalized algebraic data types). Most of OCaml’s developers don’t use their almighty typing power, but the possibilities they provide are really helpful in some cases, most notably the format overhaul. They’re also a fun way to troll a beginner asking how to circumvent the typing system on Stack Overflow. Since most of us might lose track of their exact syntax, GADTs deserve their place in the updated sheet (if you happen to be OCamlPro’s CTO, of course the writer of this blogpost remembers how to use GADTs at all times).
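As a reminder of the syntax, here is the classic small typed-expression evaluator. It is a standard textbook sketch, not an excerpt from the cheat sheet.

```ocaml
(* The type parameter records what each expression evaluates to. *)
type _ expr =
  | Int : int -> int expr
  | Bool : bool -> bool expr
  | Add : int expr * int expr -> int expr
  | If : bool expr * 'a expr * 'a expr -> 'a expr

(* The locally abstract type [a] lets each branch return its own type. *)
let rec eval : type a. a expr -> a = function
  | Int n -> n
  | Bool b -> b
  | Add (x, y) -> eval x + eval y
  | If (c, t, e) -> if eval c then eval t else eval e

let three = eval (If (Bool true, Add (Int 1, Int 2), Int 0))
```

Ill-typed terms such as `Add (Int 1, Bool true)` are rejected at compile time, which is the “typing power” mentioned above.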

On the standard library side, the big change was the switch of Hashtbl to Murmur 3 and the support for seeded randomization.
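A short illustration of the two entry points this refers to in today's standard library, `Hashtbl.create ~random:true` and `Hashtbl.seeded_hash` (the table contents are made up):

```ocaml
(* A table whose hash function is seeded randomly at creation time. *)
let table : (string, int) Hashtbl.t = Hashtbl.create ~random:true 16

let () =
  Hashtbl.replace table "opam" 2;
  Hashtbl.replace table "dune" 3

(* Hashing with an explicit seed gives reproducible results for that seed. *)
let seeded = Hashtbl.seeded_hash 42 "opam"
```

Randomized tables behave like ordinary ones for lookups; only the iteration order and the bucket layout vary from run to run.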

2013

With OCaml 4.01 came constructor disambiguation, but there isn’t really a way to add this to the sheet. This feature allows you to avoid misguided usage of polymorphic variants, but that’s a matter of personal taste (there’s a well-known rule that if you refresh the comments section enough times, someone —usually called Daniel— will appear to explain polymorphic variants’ superiority to you). -ppx rewriters were introduced in this version as well.

The standard library got a few new functions. Notably, Printexc.get_callstack for stack inspection, the optimized application operators |> and @@ and Format.asprintf.
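A quick sketch of the application operators and `Format.asprintf` in use (the function names are ours):

```ocaml
(* [x |> f] is [f x]: pipelines read left to right. *)
let squares_sum l = l |> List.map (fun x -> x * x) |> List.fold_left ( + ) 0

(* [f @@ x] is [f x]: it saves a pair of parentheses. *)
let shown = string_of_int @@ squares_sum [ 1; 2; 3 ]

(* asprintf formats with Format's %a printers straight into a string. *)
let msg =
  Format.asprintf "%d item(s): %a" 3
    (Format.pp_print_list
       ~pp_sep:(fun ppf () -> Format.fprintf ppf ", ")
       Format.pp_print_int)
    [ 1; 2; 3 ]
```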

2014

Gabriel Scherer, on the Caml-list, end of January:

TL;DR: During the six next months, we will follow pull requests (PR) posted on the github mirror of the OCaml distribution, as an alternative to the mantis bugtracker. This experiment hopes to attract more people to participate in the extremely helpful and surprisingly rewarding activity of patch reviews.

Can you guess which change to the cheat-sheets came with 4.02? It’s a universally-loved language feature added in 2014. Still don’t know? It is exceptional! Got it?

Drum roll… it is the match with exception construction! It made our code simpler, clearer and in some cases more efficient. A message to people who want to improve the language: please aim for that.
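For instance, a lookup that handles both the value and the exception in a single match (the function `lookup` is just an illustration):

```ocaml
(* Value cases and exception cases sit side by side in the same match. *)
let lookup tbl key =
  match Hashtbl.find tbl key with
  | v -> Printf.sprintf "found %d" v
  | exception Not_found -> "missing"
```

Before 4.02, the same logic needed a `try ... with` wrapped around an option or a nested match.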

This version also added the {quoted|foo|quoted} syntax (which broke comments), generative functors, attributes and extension nodes, extensible data types, module aliases and, of course, immutable strings (which was optional at the time). Immutable strings is the one feature that prompted us to remove a line from the cheat sheets. More space is good. Camlp4 and Labltk moved out of the distribution.

As a consequence of immutable strings, Bytes and BytesLabels were added to the library. To the great pleasure of optimization addicts, raise_notrace popped up. Under the hood, the format type was re-implemented using GADTs.
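To make the split between the two types concrete, here is a short sketch (the shout function is our own illustration): in-place mutation happens on a Bytes copy, and the original string is left untouched.

```ocaml
(* Upper-case a string by mutating a private Bytes copy of it. *)
let shout s =
  let b = Bytes.of_string s in          (* copies: [s] stays unchanged *)
  for i = 0 to Bytes.length b - 1 do
    Bytes.set b i (Char.uppercase_ascii (Bytes.get b i))
  done;
  Bytes.to_string b                     (* copies back to a string *)

let () =
  let s = "camel" in
  Printf.printf "%s -> %s\n" s (shout s)
```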

2015

This release was so big that 4.02.2 feels like a release in itself, with the addition of nonrec and #... operators.

The standard library was spared by this bug-fix themed release. Note that this is the last comparatively slow year of OCaml as the transition to GitHub would soon make features multiply, as hindsight teaches us.

2016

Speaking of a major release, we’re up to OCaml 4.03! It introduced inline records, a GADT exhaustiveness check on steroids (with -> . to denote unreachability) and standard attributes like warning, inlined, unboxed or immediate. Colors appeared in the compiler and last but not least, it was the dawn of a new option called Flambda.
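The sketch below shows two of these additions side by side, with type and function names of our own choosing: constructors carrying inline records, and a refutation case (-> .) asking the type checker to confirm that no other case is reachable.

```ocaml
(* Inline records: named fields directly inside a constructor. *)
type shape =
  | Square of { side : float }
  | Rect of { width : float; height : float }

let area = function
  | Square { side } -> side *. side
  | Rect { width; height } -> width *. height

(* Refutation case: [_ -> .] states that the remaining cases are
   impossible, and the type checker verifies the claim. *)
type _ tag = Int_tag : int tag | Bool_tag : bool tag

let name_of_int_tag : int tag -> string = function
  | Int_tag -> "int"
  | _ -> .

let () =
  Printf.printf "%.1f %s\n"
    (area (Rect { width = 2.0; height = 3.0 }))
    (name_of_int_tag Int_tag)
```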

The library saw a lot of useful new functions coming in: lots of new iterators for Array, an equal function in most basic type modules, Uchar, the *_ascii alternatives and, of course, Ephemeron.

4.04 was much more restrained, but it was the second release in a single year. Local opening of module with the M.{} syntax was added along with the let exception ... in construct. String.split_on_char was notably added to the stdlib which means we don’t have to rewrite it anymore.
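Two of these additions in a short sketch, using hypothetical helper names: String.split_on_char for parsing a comma-separated line, and let exception for an early exit whose exception cannot leak out of the function.

```ocaml
(* Parse a comma-separated list of integers. *)
let parse_ints line =
  line
  |> String.split_on_char ','
  |> List.map String.trim
  |> List.map int_of_string

(* [let exception] declares an exception local to this function. *)
let first_negative xs =
  let exception Found of int in
  match List.iter (fun x -> if x < 0 then raise (Found x)) xs with
  | () -> None
  | exception Found x -> Some x

let () =
  match first_negative (parse_ints "1, -2, 3") with
  | Some x -> Printf.printf "first negative: %d\n" x
  | None -> print_endline "no negative value"
```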

2017

We now get to 4.05… which did not change the language. Not that the development team wasn’t busy, OCaml just got better without any change to the syntax.

On the library side, however, much happened, with the addition of *_opt functions pretty much everywhere. If you’re using the OCaml compiler from Debian, this is where you might think the story ends. You’d be wrong…

…because 4.06 added a lot! My own favorite feature from this release has to be user-defined indexing operators. This is also when safe-string became the default, giving worthwhile work to every late maintainer in the community. This release also added one awesome function in the standard library: Map.update.
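As an illustration of both features, here is a sketch of a word counter. The SMap module, the .%{} operator and count_words are names chosen for this example; the operator's "default to 0" behaviour is our own design choice, not something the standard library provides.

```ocaml
module SMap = Map.Make (String)

(* User-defined indexing operator: [m.%{k}] is the count for [k], or 0. *)
let ( .%{} ) m k =
  match SMap.find_opt k m with
  | Some n -> n
  | None -> 0

(* [Map.update] inserts or modifies a binding in a single traversal. *)
let count_words words =
  List.fold_left
    (fun m w ->
      SMap.update w (function None -> Some 1 | Some n -> Some (n + 1)) m)
    SMap.empty words

let () =
  let m = count_words ["a"; "b"; "a"] in
  Printf.printf "a=%d c=%d\n" m.%{"a"} m.%{"c"}
```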

2018

4.07 was aimed towards solidifying the language. It added empty variants and type-based selection of GADT constructors to the mix.

On the library side, one old and two new modules were added, with the integration of Bigarray, Seq and Float.

2019

And here we are with 4.08, in the present day! We can now put exceptions under or-patterns, which is the only language change from this release we propagated to the sheet. Time will tell if we need to add custom binding operators or [@@alert]. Pervasives is now deprecated in favor of Stdlib and new modules are popping up (Int, Bool, Fun, Result… did we miss one?) while Sort issued its final deprecation warning.
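A brief sketch of two of the 4.08 features mentioned here, with function names invented for the example: an exception case sharing a branch with an ordinary pattern, and a let* binding operator defined from Option.bind.

```ocaml
(* An exception pattern under an or-pattern: out-of-range values and
   parse failures are handled by the same branch. *)
let port_of_string s =
  match int_of_string s with
  | n when n > 0 && n < 65536 -> Some n
  | _ | exception Failure _ -> None

(* A custom binding operator for the option type. *)
let ( let* ) = Option.bind

let add_ports a b =
  let* x = port_of_string a in
  let* y = port_of_string b in
  Some (x + y)

let () =
  match add_ports "80" "443" with
  | Some n -> Printf.printf "%d\n" n
  | None -> print_endline "invalid port"
```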

We did not add 4.09 to this journey to the past, as this release is still solidly in the now at the time of this blogpost. Rest assured, we will see many more awesome features in OCaml in the future! In the meantime, we are working on updating more cheat sheets: stay tuned!

Comments

Micheal Bacarella (23 September 2019 at 18 h 17 min):

For a blog-post from a company called OCaml PRO this seems like a rather tone-deaf PR action.

I wanted to read this and get hyped but instead I’m disappointed and I continue to feel like a chump advocating for this language.

Why? Because this is a rather underwhelming summary of 8 years of language activity. Perhaps you guys didn’t intend for this to hit the front of Hacker News, and maybe this stuff is really exciting to programming language PhDs, but I don’t see how the average business OCaml developer would relate to many of these changes at all. It makes OCaml (still!) seem like an out-of-touch academic language where the major complaints about the language are ignored (multicore, Windows support, programming-in-the-large, debugging) while ivory tower people fiddle with really nailing type-based selection in GADTs.

I expect INRIA not to care about the business community but aren’t you guys called OCaml PRO? I thought you liked money.

You clearly just intended this to be an interesting summary of changes to your cheatsheet but it’s turned into a PR release for the language and leaves normals with the continued impression that this language is a joke.

Thomas Blanc (24 September 2019 at 14 h 57 min):

Yes, latency can be frustrating even in the OCaml realm. Thanks for your comment, it is nice to see people caring about it and trying to remedy through contributions or comments.

Note that we only posted on discuss.ocaml.org expecting to get one or two comments. The reason for this post was that while updating the CS we were surprised to see how much the language had changed and decided to write about it.

You do raise some good points though. We did work on a full windows support back in the day. The project was discontinued because nobody was willing to buy it. We also worked on memory profiling for the debugging of memory leaks (before other alternatives existed). We did not maintain it because the project had no money input. I personally worked on compile-time detection of uncaught exception until the public funding of that project ran out. We also had a proposal for namespaces in the language that would have facilitated programming-in-the-large (no funding) and worked on multicore (funding for one man for one year).

Cheat Sheets Update: OCaml Language and OCaml Standard Library

2019-09-14T09:05:17Z

The OCaml lang and OCaml stdlib cheat sheets shared by OCamlPro in 2011 have been updated for OCaml 4.08.

If you would like to contribute improvements: sources on GitHub.

Read more: Updated Cheat Sheets: OCaml Language and OCaml Standard Library

Updated Cheat Sheets: OCaml Language and OCaml Standard Library

2019-09-13T09:05:17Z

In 2011, we shared several cheat sheets for OCaml. Cheat sheets are helpful to refer to, as an overview of the documentation when you are programming, especially when you’re starting in a new language. They are meant to be printed and pinned on your wall, or kept handy on a spare screen. We hope they will help you out when your rubber duck is rubbish at debugging your code!

Since we first shared them, OCaml and its related tools have evolved. We decided to refresh them and started with the two most-used cheat sheets—our own contribution to the start of the school year!

Download the revised version:

OCaml Language (lang) (PDF)
OCaml Standard Library (stdlib) (PDF)

You can also find the sources on GitHub. We welcome contributions, feel free to send patches if you see room for improvement! We’re working on other cheat sheets: keep an eye on our blog to see updates and brand new cheat sheets.

While we were updating them, we realized how much OCaml had evolved in the last eight years. We’ll tell you everything about our trip down memory lane very soon in another blogpost!

OCamlPro’s compiler team work update

2019-08-30T09:05:17Z

The OCaml compiler team at OCamlPro is happy to present some of the work recently done jointly with JaneStreet's team.

A lot of work has been done towards a new framework for optimizations in the compiler, called Flambda2, aiming at solving the shortcomings that became apparent in the Flambda optimization framework (see below for more details). While that work is in progress, the team also worked on some more short-term improvements, notably on the current Flambda optimization framework, as well as some compiler modifications that will benefit Flambda2.

This work is funded by JaneStreet :D

Short-term improvements

Recursive values compilation

OCaml supports quite a large range of recursive definitions. In addition to recursive (and mutually-recursive) functions, one can also define regular values recursively, as for the infinite list let rec l = 0 :: l.

Not all recursive constructions are allowed, of course. For instance, the definition let rec x = x is rejected because there is no way to actually build a value that would behave correctly.
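The accepted case can be observed directly. In this minimal sketch (the take helper is ours), the infinite list is a single cons cell whose tail points back to itself, which physical equality confirms.

```ocaml
(* A cyclic value: one cons cell whose tail is the cell itself. *)
let rec ones = 1 :: ones

(* let rec x = x   is rejected by the compiler, as noted above. *)

(* Take at most [n] elements; safe to use on a cyclic list. *)
let rec take n l =
  match l with
  | x :: tl when n > 0 -> x :: take (n - 1) tl
  | _ -> []

let () =
  assert (List.tl ones == ones);   (* the tail is physically the list *)
  List.iter (Printf.printf "%d ") (take 3 ones);
  print_newline ()
```

Note that structural comparison or List.length on such a value would not terminate, so only bounded traversals like take are safe.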

The basic rule for deciding whether a definition is allowed or not is made under the assumption that recursive values (except for functions, mostly) are compiled by first allocating space in the heap for the recursive values, binding the recursively defined variables to the allocated (but not yet initialized) values. The defining expressions are then evaluated, yielding new values (that can contain references to the non-initialized values). Finally, the fields of these new values are copied one-by-one into the corresponding fields of the initial values.

For this approach to work, some restrictions need to apply:

the compiler needs to be able to compute the size of the values beforehand (these values must be allocated values, in order to avoid defining an integer recursively),
and since during the evaluation of the defining expressions their fields are not valid, one cannot write any code that may read these fields, like pattern-matching on the value, or passing the value to some function (or storing it in a mutable field of some record).

All of those restrictions have recently been reworked and formalized based on work from Alban Reynaud during an internship at Inria, reviewed and completed by Gabriel Scherer and Jeremy Yallop.

Unfortunately, this work only covers checking whether the recursive definitions are allowed or not. Actual compilation is done later in the compiler, in one place for bytecode and another for native code, and these pieces of code have not been linked with the new check. As a result, there have been a few cases where the check allowed code that wasn't actually compiled correctly.

Since we didn't want to deal with it directly in our new version of Flambda, we had started working on a patch to move the compilation of recursive values up in the compilation pipeline, before the split between bytecode and native code. After some amount of hacking (we discovered that compilation of classes creates recursive value bindings that would not pass the earlier recursive check…), we have a patch that is mostly ready for review and will soon start engaging with the rest of the compiler team with the aim of integrating it into the compiler.

Separate compilation of recursive modules, compilation units as functors

Some OCaml developers like to encapsulate each type definition in its own module, with an interface that can expose the needed types and functions, while abstracting away as much of the actual implementation as possible. It is then common to have each of these modules in its own file, to simplify management and avoid unseemly big files.

However, this breaks down when one needs to define several types that depend on each other. The usual solutions are either to use recursive modules, which have the drawback of requiring all the modules to be in the same compilation unit, leading to very big files (we have seen a real case of a more than 10,000-lines file), or make each module parametric in the other modules, translating them into functors, and then instantiate all the functors when building the outwards-facing interface.
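For reference, this is what the first of the two usual solutions looks like today. The Expr and Stmt modules are a made-up toy example; the point is that both modules, with their signatures, must live in the same file, which is what makes such files grow.

```ocaml
(* Two mutually dependent types, each in its own module. With
   [module rec], both must be defined in one compilation unit. *)
module rec Expr : sig
  type t = Lit of int | Block of Stmt.t list
  val eval : t -> int
end = struct
  type t = Lit of int | Block of Stmt.t list
  let eval = function
    | Lit n -> n
    | Block stmts -> List.fold_left (fun _ s -> Stmt.exec s) 0 stmts
end

and Stmt : sig
  type t = Eval of Expr.t | Twice of Expr.t
  val exec : t -> int
end = struct
  type t = Eval of Expr.t | Twice of Expr.t
  let exec = function
    | Eval e -> Expr.eval e
    | Twice e -> 2 * Expr.eval e
end

let () =
  let prog = Expr.Block [Stmt.Eval (Expr.Lit 1); Stmt.Twice (Expr.Lit 21)] in
  Printf.printf "%d\n" (Expr.eval prog)
```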

To address these issues, we have been working on two main patches to improve the life of developers facing these problems.

The first one allows compiling several different files as mutually recursive modules, reusing the approach used to compile regular recursive modules. In practice, this will allow developers using recursive modules extensively to properly separate not only the different modules from each other, but also the implementation and interfaces into .ml and .mli files. This would of course need some additional support from the different build tools, but we're confident we can get at least dune to support the feature.

The second one allows compiling a single compilation unit as a functor instead of a regular module. The arguments of the functor would be specified on the command line, their signature taken from their corresponding interface file. This can be useful not only to break recursive dependencies, like the previous patch (though in a different way), but also to help developers relying on multiple implementations of the same .mli interface functorize their code with minimal effort.

These two improvements will also benefit packs: recursive compilation units could be packed in a single module, and packs could themselves be functorized.

Small improvements to Flambda

We are still committed to maintaining the Flambda part of the compiler. Few bugs have been found, so we concentrate our efforts on small features that either yield overall performance gains or allow naive code patterns to be compiled as efficiently as their equivalent but hand-optimized versions.

As an example, one optimization that we should be able to submit soon looks for cases where an immutable block is allocated but an immutable block with the same exact fields and tag already exists.

This can be demonstrated with the following example:

let result_bind f = function
  | Ok x -> f x
  | Error e -> Error e

The usual way to avoid the extra allocation of Error e is to write the clause as | (Error e) as r -> r. With this new patch, the redundant allocation will be detected and removed automatically! This can be even more interesting with inlining:

let my_f x =
  if (* some condition *)
  then Ok x
  else (* something else *)

let _ =
  (* ... *)
  let r = result_bind my_f (* some argument *) in
  (* ... *)

In this example, inlining result_bind then my_f can match the allocation Ok x in my_f with the pattern matching in result_bind. This removes an allocation that would be very hard to remove otherwise. We expect these patterns to occur quite often with some programming styles relying on a great deal of abstraction and small independent functions.
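The hand-optimized rewrite mentioned earlier can be checked with physical equality. The sketch below is only an illustration (result_bind_shared is our name for the rewritten function); it also shows a cost of the manual version that the post does not mention: since r keeps the type of the input, f is forced to return the same result type it receives.

```ocaml
(* Version from the post: the Error branch rebuilds the block. *)
let result_bind f = function
  | Ok x -> f x
  | Error e -> Error e

(* Hand-optimized version: returns the incoming block itself. Its type
   is less general, because [r] has the type of the input. *)
let result_bind_shared f = function
  | Ok x -> f x
  | Error _ as r -> r

let () =
  let err : (int, string) result = Error "boom" in
  (* The shared version hands back the very same block. *)
  assert (result_bind_shared (fun x -> Ok (x + 1)) err == err);
  (* Both versions agree structurally. *)
  assert (result_bind (fun x -> Ok (x + 1)) err = err)
```

With the optimization described above, the first version keeps its more general type while avoiding the allocation.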

Flambda 2.0

We are building on the work done for Flambda and the experience of its users to develop Flambda 2.0, the next optimization framework.

Our goal is to build a framework for analyzing the costs and benefits of code transformations. The framework focuses on reducing the runtime cost of abstractions and removing as many short-lived allocations as possible.

The aim of Flambda 2.0 is roughly the same as the original Flambda. So why did we decide to write a new framework instead of patching the existing one? Several points led us to this decision.

An invariant on the representation of closures that ensured that every closure had a unique identifier, which was convenient for a number of reasons, turned out to be quite expensive to maintain and prevented some optimizations.
The internal representation of Flambda terms included too many different cases that were either redundant or not relevant to the optimizations we were interested in, making a lot of code more complicated than necessary.
The ANF-like representation we used was not perfect. We wanted an easier way to do control flow optimizations, which led us to choose a CPS-like representation for Flambda 2.0.
Finally, the original Flambda was thought of as an alternative to the closure conversion and inlining algorithms performed by the Closure module of the compiler, translating from the Lambda representation to Clambda. However, a number of optimizations (most importantly unboxing) are done during the next phase of compilation, Cmmgen, which translates to the Cmm representation. The original Flambda had trouble correctly estimating which optimizations would trigger and what their benefit would be. It may be noted that correctly estimating benefit is key in Flambda's algorithms, and we know of a number of cases where Flambda is not as good as it could be because it couldn't predict the unboxing opportunities that inlining would have allowed. Flambda 2.0 will go from Lambda to Cmm, and will handle all transformations done in both Closure and Cmmgen in a single framework.

These improvements are still very much a work in progress. We have not reached the point where other developers can try out the new framework on their codebases yet.

This does not mean there is no news to enjoy before our efforts show up in the mainstream compiler! While working on Flambda 2.0, we did deploy a number of patches on the compiler both before and after the Flambda stage. We proposed all the changes independent enough to be proposed on their own. Some of these fixes have been merged already. Others are still under discussion and some, like the recursive values patch mentioned above, are still waiting for cleanup or documentation before submission.

Comments

Jon Harrop (30 August 2019 at 20 h 11 min):

What is the status of multicore OCaml?

Vincent Laviron (2 September 2019 at 16 h 22 min):

OCamlPro is not working on multicore OCaml. It is still being worked on elsewhere, with efforts concentrated around OCaml Labs, but I don’t have more information than what is publicly available. All of the work we described here is not expected to interfere with multicore.

Lindsay (25 September 2020 at 20 h 20 min):

Thanks for your continued work on the compiler and tooling! Am curious if there is any news regarding the item “Separate compilation of recursive modules”.

Release of opam 2.0.5

2019-07-23T09:05:17Z

We are proud to announce the (minor) release of opam 2.0.5. This new version contains build updates and fixes.

More information

opam 2.0.5 release

2019-07-11T09:05:17Z

We are pleased to announce the minor release of opam 2.0.5.

This new version contains build updates and small fixes:

Bump src_ext Dune to 1.6.3, allows compilation with OCaml 4.08.0. [#3887 @dra27]
Support Dune 1.7.0 and later [#3888 @dra27 - fix #3870]
Bump the ocaml_mccs lib-ext, to include latest changes [#3896 @AltGr]
Fix cppo detection in configure [#3917 @dra27]
Read jobs variable from OpamStateConfig [#3916 @dra27]
Linting:
- add check upstream option [#3758 @rjbou]
- add warning for with-test in run-test field [#3765, #3860 @rjbou]
- fix misleading doc filter warning [#3871 @rjbou]
Fix typos [#3891 @dra27, @mehdid]

Note: To homogenise macOS name on system detection, we decided to keep macos, and convert darwin to macos in opam. For the moment, to not break jobs & CIs, we keep uploading darwin & macos binaries, but from the 2.1.0 release, only macos ones will be kept.

Installation instructions (unchanged):

From binaries: run

sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)

From source, using opam:

opam update; opam install opam-devel

From source, manually: see the instructions in the README.

We hope you enjoy this new minor version, and remain open to bug reports and suggestions.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

Alt-Ergo's Results at SMT-COMP 2019

2019-07-10T09:05:17Z

The results of the SMT-COMP 2019 competition were published at the SMT workshop of the 22nd SAT conference. We were proud to take part for the second year in a row, especially since Alt-Ergo now supports the SMT-LIB 2 standard.

Alt-Ergo is an open-source SMT solver maintained and distributed by OCamlPro, and funded in part by several collaborative R&D projects (BWare, SOPRANO, Vocal, LChip).

If you are an Alt-Ergo user, consider joining the Alt-Ergo Users' Club! The history of this software goes back to 2006, when it was born from joint academic research between Inria and CNRS at the LRI laboratory. Since September 2013, it has been maintained, developed and distributed by OCamlPro (see the history of past releases).

If you are curious about OCamlPro's activities in formal methods, you can read the short testimonial of a happy client.

See /blog/alt-ergo-participation-to-the-smt-comp-2019

The Alt-Ergo SMT Solver’s results in the SMT-COMP 2019

2019-07-09T09:05:17Z

The results of the SMT-COMP 2019 were released a few days ago at the SMT workshop during the 22nd SAT conference. We were glad to participate in this competition for the second year in a row, especially as Alt-Ergo now supports the SMT-LIB 2 standard.

Alt-Ergo is an open-source SMT solver maintained and distributed by OCamlPro and partially funded by R&D projects. If you’re interested, please consider joining the Alt-Ergo User’s Club! Its history goes back to 2006, with early academic research conducted jointly at the Inria & CNRS “LRI” lab; OCamlPro has handled its maintenance and development since September 2013 (see the past releases).

If you’re curious about OCamlPro’s other activities in Formal Methods, see a happy client’s feedback

SMT-COMP 2018

Our goal last year was to challenge ourselves on the community benchmarks. We wanted to compare Alt-Ergo to state-of-the-art SMT solvers. We thus selected categories close to “deductive program verification”, as Alt-Ergo is primarily tuned for formulas coming from this application domain. Specifically, we took part in four main track categories: ALIA, AUFLIA, AUFLIRA, AUFNIRA. These categories combine theories such as Arrays, Uninterpreted Functions, and Linear and Non-linear Arithmetic over Integers and Reals.

Alt-Ergo’s Results at SMT-COMP 2018

For its first participation in SMT-COMP, Alt-Ergo showed that it was a competitive solver compared to state-of-the-art solvers such as CVC4, Vampire, VeriT or Z3.

Main Track Categories (number of participants)	Sequential Perfs	Parallel Perfs
ALIA (4)
AUFLIA (4)
AUFLIRA (4)
AUFNIRA (3)

The global results of the competition are available here.

SMT-COMP 2019

Since last year’s competition, we made some improvements on Alt-Ergo, specifically over our data structures and the support of algebraic datatypes (see post).

A few changes can be noted for this year’s competition:

A distinction between SAT and UNSAT in the scoring scheme allowed us to compete in more categories, as Alt-Ergo doesn’t send back SAT.
The aim of the 24s Scoring is to reward solvers which solve problems quickly.
The number of benchmarks in each category has changed. For each category, only the benchmarks which were not proven by every solver last year are used. For example: in the division AUFLIRA, 20011 benchmarks were used last year, of which 1683 remained this year.

Alt-Ergo only competed in the Single Query Track. We selected the same categories as last year and added UF, UFLIA, UFLRA and UFNIA. We also decided to compete in categories supporting algebraic datatypes to test our new support for this theory. Alt-Ergo’s expertise is in quantified problems, but we also wanted to test the solver's theories on some quantifier-free categories.

Alt-Ergo’s Results at SMT-COMP 2019

We were proud to see Alt-Ergo perform within a reasonable margin of other solvers on the UNSAT problems in quantifier-free categories, even though these problems are not our solver’s primary goal. We were also happy with the performance of our solver in the datatype categories, as support for this theory is new.

For the last categories, Alt-Ergo managed to reproduce last year’s performance, close to CVC4 (2018 and 2019 winner) and Vampire.

Single Query Categories (number of participants)	Sequential	Parallel	Unsat	24s
ALIA (8)
AUFLIA (8)
AUFLIRA (8)
AUFNIRA (5)
UF (8)
UFLIA (8)
UFLRA (8)
UFNIA (8)

This year's results are available here. These results do not include Par4, a portfolio solver.

Alt-Ergo is constantly evolving, as well as our support of the SMT-LIB standard. For next year’s participation, we will try to compete in more categories and hope to cover more tracks, such as the UNSAT-Core track.

Blockchains @ OCamlPro: an Overview

2019-04-29T09:05:17Z

OCamlPro started working on blockchains in 2014, when Arthur Breitman came to us with an initial idea to develop the Tezos ledger. The idea was very challenging with a lot of innovations. So, we collaborated with him to write a specification, and to turn the specification into OCaml code. Since then, we continually improved our skills in this domain, trained more engineers, introduced the technology to students and to professionals, advised a dozen projects, developed tools and libraries, made some improvements and extensions to the official Tezos node, and conducted several private deployments of the Tezos ledger.

For an overview of OCamlPro’s blockchain activities see here

TzScan: A complete Block Explorer for Tezos

TzScan is considered today to be the best block explorer for Tezos. It’s made of three main components:

an indexer that queries the Tezos node and fills a relational database,
an API server that queries the database to retrieve various information,
a web-based user interface (a JavaScript application).

We deployed the indexer and API to freely provide the community with access to all the content of the Tezos blockchain; they are already used by many websites, wallets and apps. In addition, we directly use this API within our TzScan.io instance. Our deployment spans multiple Tezos nodes, multiple API servers and a distributed database to scale and reply to millions of queries per day. We also regularly release open source versions under the GPL license, which can be easily deployed on private Tezos networks. TzScan’s development began in September 2017. Today it represents an enormous investment, which the Tezos Foundation helped partially fund in July 2018.

Contact us for support, advanced features, advertisement, or if you need a private deployment of the TzScan infrastructure.

Liquidity: a Smart Contract Language for Tezos

Liquidity is the first high-level language for Tezos over Michelson. Its development began in April 2017, a few months before the Tezos fundraising in July 2017. It is today the most advanced language for Tezos: it offers OCaml-like and ReasonML-like syntaxes for writing smart contracts, compilation and de-compilation to/from Michelson, multiple entry points, static type-checking à la ML, etc. Its online editor lets you develop smart contracts and deploy them directly to the alphanet or mainnet. Liquidity was used before the mainnet launch to de-compile the Foundation’s vesting smart contracts in order to review them. This smart contract language represents more than two years of work, and is fully funded by OCamlPro. It has been developed with formal verification in mind, formal verification being one of the selling points of Tezos. We have elaborated a detailed roadmap mixing model-checking and deductive program verification to investigate this feature. We are now searching for funding opportunities to keep developing and maintaining Liquidity.

See our online editor to get started! Contact us if you need support, training, or help writing or analyzing your smart contracts in depth.

Techelson: a testing framework for Michelson and Liquidity

Techelson is our newborn in the set of tools for the Tezos blockchain. It is a test execution engine for the functional properties of Michelson and Liquidity contracts. Techelson is still in its early development stage. The user documentation is available here. An example on how to use it with Liquidity is detailed in this post.

Contact us to customize the engine to suit your own needs!

IronTez: an optimized Tezos node by OCamlPro

IronTez is a tailored node for private (and public) deployments of Tezos. Among its additional features, the node adds some useful RPCs, improves storage, enables garbage collection and context pruning, allows easy configuration of the private network, and provides additional Michelson instructions (GET_STORAGE, CATCH…). One of its nice features is the ability to enable adaptive baking in a private / proof-of-authority setting (e.g. baking every 5 seconds in the presence of transactions and every 10 minutes otherwise).

A simplified version of IronTez has already been made public to allow testing its improved storage system, Ironmin, showing a 10x reduction in storage. Some TzScan.io nodes are also using versions of IronTez. We’ve also successfully deployed it along with TzScan for a big foreign company to experiment with private blockchains. We are searching for projects and funding opportunities to keep developing and maintaining this optimized version of the Tezos node.

Don’t hesitate to contact us if you want to deploy a blockchain with IronTez, or for more information!

Comments

Kristen (3 May 2019 at 0 h 30 min):

I really wanted to keep using IronTez but I ran into bugs that have not yet been fixed, the code is out of date with upstream, and there is no real avenue for support/assistance other than email.

opam 2.0.4 release

2019-04-10T09:05:17Z

We are pleased to announce the release of opam 2.0.4.

This new version contains some backported fixes:

Sandboxing on macOS: considering the possibility that TMPDIR is unset [#3597 @herbelin - fix #3576]
display: Fix opam config var display, aligned on opam config list [#3723 @rjbou - rel. #3717]
pin:
- update source of (version) pinned directory [#3726 @rjbou - #3651]
- fix --ignore-pin-depends with autopin [#3736 @AltGr]
- fix pinnings not installing/upgrading already pinned packages (introduced in 2.0.2) [#3800 @AltGr]
opam clean: Ignore errors trying to remove directories [#3732 @kit-ty-kate]
remove wrong "mismatched extra-files" warning [#3744 @rjbou]
urls: fix hg opam 1.2 url parsing [#3754 @rjbou]
lint: update message of warning 47, to avoid confusion because of missing synopsis field internally inferred from descr [#3753 @rjbou - fix #3738]
system:
- lock & signals: don't interrupt at non terminal signals [#3541 @rjbou]
- shell: fix fish manpath setting [#3728 @gregory-nisbet]
- git: use diff.noprefix=false config argument to overwrite user defined configuration [#3788 @rjbou, #3628 @Blaisorblade - fix #3627]
dirtrack: fix precise tracking mode [#3796 @rjbou]
fix some mispellings [#3731 @MisterDA]
CI enhancement & fixes [#3706 @dra27, #3748 @rjbou, #3801 @rjbou]

Note: To homogenise macOS name on system detection, we decided to keep macos, and convert darwin to macos in opam. For the moment, to not break jobs & CIs, we keep uploading darwin & macos binaries, but from the 2.1.0 release, only macos ones will be kept.

Installation instructions (unchanged):

From binaries: run

sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)

From source, using opam:

opam update; opam install opam-devel

From source, manually: see the instructions in the README.

We hope you enjoy this new minor version, and remain open to bug reports and suggestions.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

opam 2.0 tips

2019-03-12T09:05:17Z

This blog post looks back on some of the improvements in opam 2.0, and gives tips on the new workflows available.

Package development environment management

Opam 2.0 has been vastly improved to handle locally defined packages. Assuming you have a project ~/projects/foo, defining two packages foo-lib and foo-bin, you would have:

~/projects/foo
|-- foo-lib.opam
|-- foo-bin.opam
`-- src/ ...

(See also about computed dependency constraints for handling multiple package definitions with mutual constraints)

Automatic pinning

The underlying mechanism is the same, but this is an interface improvement that replaces most of the opam 1.2 workflows based on opam pin.

The usual commands (install, upgrade, remove, etc.) have been extended to support specifying a directory as argument. So when working on project foo, just write:

cd ~/projects/foo
opam install .

and both foo-lib and foo-bin will get automatically pinned to the current directory (using git if your project is versioned), and installed. You may prefer to use:

opam install . --deps-only

to just get the package dependencies ready before you start hacking on it. See below for details on how to reproduce a build environment more precisely. Note that opam depext . will not work at the moment; this will be fixed in the next release, when the external dependency handling is integrated (upon failure, opam will still list the proper packages to install for your OS).

If your project is versioned and you made changes, remember to either commit, or add --working-dir so that your uncommitted changes are taken into account.

Local switches

Opam 2.0 introduced a new feature called "local switches". This section explains what it is about, why, when and how to use them.

Opam switches allow you to maintain several separate development environments, each with its own set of installed packages. This is particularly useful when you need different OCaml versions, or when working on projects with different dependency sets.

It can sometimes become tedious, though, to manage switches, or to remember which switch to use with which project. This is where "local switches" come in handy.

How local switches are handled

A local switch is simply stored inside a _opam/ directory, and will be selected automatically by opam whenever your current directory is below its parent directory.

NOTE: it's highly recommended that you enable the new shell hooks when using local switches. Just run opam init --enable-shell-hook: this will make sure your PATH is always set for the proper switch.

Otherwise, you will need to remember to run eval $(opam env) every time you cd into a directory containing a local switch. See also how to display the current switch in your prompt.

For example, if you have ~/projects/foo/_opam, the switch will be selected whenever in project foo, allowing you to tailor what it has installed for the needs of your project.

If you remove the switch dir, or your whole project, opam will forget about it transparently. Be careful not to move it around, though, as some packages still contain hardcoded paths and don't handle relocation well (we're working on that).
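To make the selection rule concrete, here is a minimal Python sketch of the lookup described above: walk up from the current directory until a directory containing _opam/ is found. The helper name find_local_switch is hypothetical, and this only illustrates the rule; it is not how opam itself is implemented.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_local_switch(start: Path) -> Optional[Path]:
    """Return the closest directory, from `start` upwards, that contains _opam/.

    Hypothetical helper: it mimics the lookup rule described in the text
    and is not opam's actual code.
    """
    start = start.resolve()
    for directory in [start, *start.parents]:
        if (directory / "_opam").is_dir():
            return directory
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "projects" / "foo"
        (project / "_opam").mkdir(parents=True)
        (project / "src" / "lib").mkdir(parents=True)
        # From anywhere below the project root, the same switch is found.
        print(find_local_switch(project / "src" / "lib") == project.resolve())  # True
```

The switch is identified by the project directory itself, which matches the idea that a local switch handle is just a path.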

Creating a local switch

This can generally start with:

cd ~/projects/foo
opam switch create . --deps-only

Local switch handles are just their path, instead of a raw name. Additionally, the above will detect package definitions present in ~/projects/foo, pick a compatible version of OCaml (if you didn't explicitly mention any), and automatically install all the local package dependencies.

Without --deps-only, the packages themselves would also get installed in the local switch.

Using an existing switch

If you just want an already existing switch to be selected automatically, without recompiling one for each project, you can use opam switch link:

cd ~/projects/bar
opam switch link 4.07.1

will make sure that switch 4.07.1 is chosen whenever you are in project bar. You could even link to ../foo here, to share foo's local switch between the two projects.

Reproducing build environments

Pinnings

If your package depends on development versions of some dependencies (e.g. you had to push a fix upstream), add to your opam file:

depends: [ "some-package" ] # Remember that pin-depends are depends too
pin-depends: [
  [ "some-package.version" "git+https://gitfoo.com/blob.git#mybranch" ]
]

This will have no effect when your package is published in a repository, but when it gets pinned to its dev version, opam will first make sure to pin some-package to the given URL.

Lock-files

Dependency constraints are sometimes too wide, and you don't want to explore all the versions of your dependencies while developing. For this reason, you may want to reproduce a known-working set of dependencies. If you use:

opam lock .

opam will check which versions of the dependencies are installed in your current switch, and record them explicitly in *.opam.locked files. opam lock is a plugin at the moment, but will get automatically installed when needed.

Then, assuming you checked these files into version control, any user can do

opam install . --deps-only --locked

to instruct opam to reproduce the same build environment (the --locked option is also available to opam switch create, to make things easier).

The generated lock-files will also contain added constraints to reproduce the presence/absence of optional dependencies, and reproduce the appropriate dependency pins using pin-depends. Add the --direct-only option if you don't want to enforce the versions of all recursive dependencies, but only direct ones.
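The core idea of a lock-file, turning each dependency into a strict equality constraint on the installed version, can be sketched in a few lines of Python. The helper lock_depends and the version numbers are hypothetical and purely illustrative; the real plugin does more (optional dependencies, pin-depends, recursive dependencies).

```python
from typing import Dict, List


def lock_depends(depends: List[str], installed: Dict[str, str]) -> str:
    """Render a depends: field where each dependency is fixed to its installed version.

    Hypothetical helper illustrating the idea behind lock-files only.
    """
    lines = ["depends: ["]
    for name in depends:
        if name not in installed:
            raise KeyError(f"{name} is not installed in the current switch")
        # An equality constraint forces exactly the version that is known to work.
        lines.append(f'  "{name}" {{= "{installed[name]}"}}')
    lines.append("]")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative versions, not taken from a real switch.
    print(lock_depends(["ocaml", "dune"], {"ocaml": "4.07.1", "dune": "1.6.3"}))
```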

Release: Liquidity version 1.0!

2019-03-09T09:05:17Z

We are proud to announce the release of the first major version of Liquidity, the smart-contract language, and its tooling. Among the key features: multiple entry points, a modular contract system, polymorphism and type inference, a ReasonML syntax for wider adoption, etc.

See this article!

Announcing Liquidity version 1.0

2019-03-08T09:05:17Z

Liquidity version 1.0

We are pleased to announce the release of the first major version of the Liquidity smart-contract language and associated tools.

Some of the highlights of this version are detailed below.

Multiple Entry Points

In previous versions of Liquidity, smart contracts were limited to a single entry point (named main). But a smart contract's execution paths traditionally depend strongly on the parameter, and in most cases they are completely distinct.

Having different entry points makes it possible to separate pieces of code that do not overlap and that usually accomplish vastly different tasks. Previously, encoding entry points with complex pattern-matching constructs was tedious and hurt readability. This new feature improves readability and lets you call contracts in a natural way.

Internally, entry points are encoded with sum types and pattern matching so that you keep the strong typing guarantees that come over from Michelson. This means that you cannot call a typed smart contract with the wrong entry point or the wrong parameter (this is enforced statically by both the Liquidity typechecker and the Michelson typechecker).

Modules and Contract System

Organizing, encapsulating and sharing code is not always easy when you need to write files that are thousands of lines long. Liquidity now lets you write modules (which contain types and values/functions) and contracts (which define entry points in addition). Types and non-private values of contracts and modules in scope can be accessed by other modules and contracts.

You can even compile several files at once with the command line compiler, so that you may organize your multiple smart contract projects in libraries and files.

Polymorphism and Type Inference

Thanks to a new and powerful type inference algorithm, you can now get rid of almost all type annotations in the smart contracts.

Instead of writing something like

let%entry main (parameter : bool) (storage : int) =
  let ops = ([] : operation list) in
  let f (c : bool) = if not c then 1 else 2 in
  ops, f parameter

you can now write

let%entry main parameter _ =
  let ops = [] in
  let f c = if not c then 1 else 2 in
  ops, f parameter

And type inference works with polymorphism (also a new feature of this release) so you can now write generic and reusable functions:

type 'a t = { x : 'a set; y : 'a }

let mem_t v = Set.mem v.y v.x

Inference also works with contract types and entry points.

ReasonML Syntax

We originally used a modified version of the OCaml syntax for the Liquidity language. This made the language accessible, almost for free, to all OCaml and functional language developers. The typing discipline one needs is quite similar to other strongly typed functional languages so this was a natural fit.

However, this is not the best fit for everyone. We want to bring the power of Liquidity and Tezos to the masses, so adopting a syntax that looks familiar to most people can help a lot. With this new version of Liquidity, you can now write your smart contracts in either an OCaml-like syntax or a ReasonML-like one. The latter is a lot closer to Javascript on the surface, making it accessible to people who already know that language or who write smart contracts for other platforms like Solidity/Ethereum.

You can see the full changelog as well as download the latest release and binaries at this address.

Don't forget that you can also try all these new cool features and more directly in your browser with our online editor.

Release of Techelson, a test engine for Michelson and Liquidity

2019-03-07T09:05:17Z

We are proud to announce the first release of Techelson, a test execution engine for Michelson. Liquidity programmers can use it too.

See Techelson, a test execution engine for Michelson.

Techelson, a test execution engine for Michelson

2019-03-06T09:05:17Z

We are pleased to announce the first release of Techelson, available here.

Techelson is a Test Execution Engine for Michelson. It aims at testing functional properties of Michelson smart contracts. Make sure to check the user documentation to get a sense of Techelson's workflow and features.

For Liquidity programmers interested in Techelson, take a look at this blog post discussing how to write tests in Liquidity and run them using Techelson.

Techelson is still young: if you have problems, suggestions or feature requests please open an issue on the repository.

Signing Data for Smart Contracts

2019-03-05T09:05:17Z

Smart contract calls already provide a built-in authentication mechanism, as transactions (i.e. call operations) are cryptographically signed by the sender of the transaction. This is a guarantee on which programs can rely.

However, sometimes you may want more involved or flexible authentication schemes. The ones that rely on signature validity checking can be implemented in Michelson, and Liquidity provides a built-in instruction to do so. (You still need to keep in mind that you cannot store unencrypted confidential information on the blockchain.)

This instruction is Crypto.check in Liquidity. Its type can be written as:

Crypto.check: key -> signature -> bytes -> bool

Which means that it takes as arguments a public key, a signature and a sequence of bytes and returns a Boolean. Crypto.check pub_key signature message is true if and only if the signature signature was obtained by signing the Blake2b hash of message using the private key corresponding to the public key pub_key.

A small smart contract snippet which implements a signature check (against a predefined public key kept in the smart contract's storage) can be tested online here.

type storage = key

let%entry main ((message : string), (signature : signature)) key =
  let bytes = Bytes.pack message in
  if not (Crypto.check key signature bytes) then
    failwith "Wrong signature";
  ([] : operation list), key

This smart contract fails if the string message was not signed with the private key corresponding to the public key key stored. Otherwise it does nothing.

This signature scheme is more flexible than the default transaction/sender one; however, it requires that the signature can be built outside of the smart contract (and more generally outside of the toolset provided by Liquidity and Tezos). On the other hand, signing a transaction is something you get for free if you use the tezos client or any tezos wallet (as it is essentially their base function).

The rest of this blog post will focus on various ways to sign data, and on getting signatures that can be used in Tezos and Liquidity directly.

Signing Using the Tezos Client

One (straightforward) way to sign data is to use the Tezos client directly. You will need to be connected to a Tezos node though as the client makes RPCs to serialize data (this operation is protocol dependent). We can only sign sequences of bytes, so the first thing we need to do is to serialize whichever data we want to sign. This can be done with the command hash data of the client.

$ ./tezos-client -A alphanet-node.tzscan.io -P 80 hash data '"message"' of type string
Raw packed data:
  0x0501000000076d657373616765
Hash:
  exprtXaZciTDGatZkoFEjE1GWPqbJ7FtqAWmmH36doxBreKr6ADcYs
Raw Blake2b hash:
  0x01978930fd2d04d0db8c2e4ef8a3f5d63b8e732177c8723135ed0dc7d99ebed3
Raw Sha256 hash:
  0x32569319f6517036949bcead23a761bfbfcbf4277b010355884a86ba09349839
Raw Sha512 hash:
  0xdfa4ea9f77db3a98654f101be1d33d56898df40acf7c2950ca6f742140668a67fefbefb22b592344922e1f66c381fa2bec48aa47970025c7e61e35d939ae3ca0
Gas remaining: 399918 units remaining

This command gives the result of hashing the data using various algorithms but what we're really interested in is the first item Raw packed data which is the serialized version of our data ("message") : 0x0501000000076d657373616765.

We can now sign these bytes using the Tezos client as well. This step can be performed completely offline. To do so, we need to use the -p option of the client to specify the protocol we want to use (the sign bytes command will not be available without first selecting a valid protocol). Here we use protocol 3, designated by its hash PsddFKi3.

$ ./tezos-client -p PsddFKi3 sign bytes 0x0501000000076d657373616765 for my_account
Signature:
  edsigto9QHtXMyxFPyvaffRfFCrifkw2n5ZWqMxhGRzieksTo8AQAFgUjx7WRwqGPh4rXTBGGLpdmhskAaEauMrtM82T3tuxoi8

The account my_account can be any imported account in the Tezos client. In particular, it can be an encrypted key pair (you will need to enter a password to sign) or a hardware Ledger (you will need to confirm the signature on the Ledger). The obtained signature can be used as is with Liquidity or Michelson. This one starts with edsig because it was obtained using an Ed25519 private key, but you can also get signatures starting with spsig1 or p2sig depending on the cryptographic curve that you use.

Signing Manually

In this second section we detail the necessary steps and provide a Python script to sign string messages using an Ed25519 private key. This can be easily adapted for other signing schemes.

These are the steps that will need to be performed in order to sign a string:

Assuming that the value you want to sign is a string, you first need to convert its ASCII version to hexa, for the string "message" that is 6d657373616765.
You need to produce the packed version of the corresponding Michelson expression. The binary representation can vary depending on the types of the values you want to pack but for strings it is:

| 0x | 0501 | [size of the string on 4 bytes] | [ascii string in hexa] |

for "message" (of length 7), it is

| 0x | 0501 | 00000007 | 6d657373616765 |

or 0x0501000000076d657373616765.

Hash this value using Blake2b (01978930fd2d04d0db8c2e4ef8a3f5d63b8e732177c8723135ed0dc7d99ebed3) which is 32 bytes long.
Depending on your public key, you then need to sign it with the corresponding curve (ed25519 for edpk keys), the signature is 64 bytes:

753e013b8515a7d47eaa5424de5efa2f56620ac8be29d08a6952ae414256eac44b8db71f74600275662c8b0c226f3280e9d24e70a5fa83015636b98059b5180c

Optionally convert to base58check. This is not needed because Liquidity and Michelson allow signatures (as well as keys and key hashes) to be given in hex format with a 0x:

0x753e013b8515a7d47eaa5424de5efa2f56620ac8be29d08a6952ae414256eac44b8db71f74600275662c8b0c226f3280e9d24e70a5fa83015636b98059b5180c
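Steps 1 to 3 need nothing beyond the Python standard library. As a minimal sketch, assuming an ASCII message and using hashlib in place of a third-party Blake2b package, the packed bytes and the 32-byte digest can be computed as follows (the function names pack_string and blake2b_32 are ours, chosen for illustration):

```python
import hashlib


def pack_string(message: str) -> bytes:
    """Packed form of a string: 0x0501, the length on 4 bytes (big-endian), then the bytes."""
    data = message.encode("ascii")
    return b"\x05\x01" + len(data).to_bytes(4, byteorder="big") + data


def blake2b_32(data: bytes) -> bytes:
    """32-byte Blake2b digest of the packed data, i.e. the value that gets signed."""
    return hashlib.blake2b(data, digest_size=32).digest()


if __name__ == "__main__":
    packed = pack_string("message")
    print("0x" + packed.hex())  # 0x0501000000076d657373616765
    # Compare this digest with the Blake2b hash quoted in the steps above.
    print("0x" + blake2b_32(packed).hex())
```

Only the signing step itself requires an Ed25519 implementation, which is not part of the standard library.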

The following Python (3) script will do exactly this, entirely offline. Note that this is just a toy example, and should not be used in production. In particular, you need to give your private key on the command line, so this might not be secure if the machine you run this on is not secure.

$ pip3 install base58check pyblake2 ed25519
$ python3 ./sign_string.py "message" edsk2gL9deG8idefWJJWNNtKXeszWR4FrEdNFM5622t1PkzH66oH3r
0x753e013b8515a7d47eaa5424de5efa2f56620ac8be29d08a6952ae414256eac44b8db71f74600275662c8b0c226f3280e9d24e70a5fa83015636b98059b5180c

`sign_string.py`

from pyblake2 import blake2b
import base58check
import ed25519
import sys

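message = sys.argv[1]   # string to sign
seed_b58 = sys.argv[2]  # base58check-encoded Ed25519 secret key (edsk...)

# Packed representation of a string: 0x0501, length on 4 bytes, then the bytes
prefix = b'\x05\x01'
b = message.encode()
len_bytes = len(b).to_bytes(4, byteorder='big')

# Blake2b hash (32 bytes) of the packed data
h = blake2b(digest_size=32)
h.update(prefix + len_bytes + b)
digest = h.digest()

# Drop the 4-byte prefix and the 4-byte checksum to recover the 32-byte seed
seed = base58check.b58decode(seed_b58)[4:-4]
sk = ed25519.SigningKey(seed)
sig = sk.sign(digest)
print("0x" + sig.hex())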

What's new for Alt-Ergo in 2018? Here is a recap!

2019-02-11T09:05:17Z

After the hard work done on the integration of floating-point arithmetic reasoning two years ago, 2018 is the year of polymorphic SMT2 support and efficient SAT solving for Alt-Ergo. In this post, we recap the main novelties last year, and we announce the first Alt-Ergo Users’ Club meeting.

An SMT2 front-end with prenex polymorphism

As you may know, Alt-Ergo’s native input language is not compliant with the SMT-LIB 2 input language standard, and translating formulas from SMT-LIB 2 to Alt-Ergo’s syntax (or vice-versa) is not straightforward. Besides its extension with polymorphism, this native language diverges from SMT-LIB’s by distinguishing terms of type boolean from formulas (that are propositions). This distinction makes it hard, for instance, to efficiently translate let-in and if-then-else constructs that are ubiquitous in SMT-LIB 2 benchmarks.

In order to work closely with the SMT community, we designed a conservative extension of the SMT-LIB 2 standard with prenex polymorphism and implemented it as a new frontend in Alt-Ergo 2.2. This work has been published in the 2018 edition of the SMT-Workshop. An online version of the paper is available here. Experimental results showed that polymorphism is really important for Alt-Ergo, as it improves both the resolution rate and the resolution time (see Figure 5 in the paper for more details).

Improved SAT solvers

We also worked on improving SAT-solving in Alt-Ergo last year. The main direction towards this goal was to extend our CDCL-based SAT solver to mimic some desired behaviors of the native Tableaux-like SAT engine. Generally speaking, this allows a better management of the context during proof search, which avoids overwhelming the theories and instantiation engines with useless facts. A comparison of this solver with Alt-Ergo’s old Tableaux-like solver is also done in our SMT-Workshop paper.

SMT-Comp and SMT-Workshop 2018

As emphasized above, we published our work regarding polymorphic SMT2 and SAT solving in SMT-Workshop 2018. More generally, this was an occasion for us to write the first tool paper about Alt-Ergo, and to highlight the main features that make it different from other state-of-the-art SMT solvers like CVC4, Z3 or Yices.

Thanks to our new SMT2 frontend, we were able to participate in the SMT-Competition last year. Naturally, we selected categories that are close to “deductive program verification”, as Alt-Ergo is primarily tuned for formulas coming from this application domain.

Although Alt-Ergo did not rank first, it was a positive experience and this encourages us to go ahead. Note that Alt-Ergo’s brother, Ctrl-Ergo, was not far from winning the QF-LIA category of the competition. This performance is partly due to the improvements in the CDCL SAT solver that were also integrated in Ctrl-Ergo.

Alt-Ergo for Atelier-B

Atelier-B is a framework for developing formally verified software using the B Method. The framework rests on an automatic reasoner that discharges thousands of mathematical formulas extracted from B models. If a formula is not discharged automatically, it is proved interactively. ClearSy (the company behind the development of Atelier-B) has recently added a new backend to produce verification conditions in Why3’s logic, in order to target more automatic provers and increase the automation rate. For certifiability reasons, we extended Alt-Ergo with a new frontend that is able to directly parse these verification conditions without relying on Why3.

Improved hash-consed data-structures

As said above, Alt-Ergo makes a clear distinction between Boolean terms and Propositions. This distinction prevents us from doing some rewriting and simplifications, in particular on expressions involving let-in and if-then-else constructs. This is why we decided to merge Term, Literal, and Formula in a new Expr data-structure, and remove this distinction. This allowed us to implement some additional simplification steps, and we immediately noticed performance improvements, in particular on SMT2 benchmarks. For instance, Alt-Ergo 2.3 proves 19548 formulas of AUFLIRA category in ~350 minutes, while version 2.2 proves 19535 formulas in ~1450 minutes (time limit was set to 20 minutes per formula).

Towards the integration of algebraic datatypes

Last Autumn, we also started working on the integration of algebraic datatypes reasoning in Alt-Ergo. In this first iteration, we extended Alt-Ergo’s native language to be able to declare (mutually recursive) algebraic datatypes, to write expressions with pattern matching, to handle selectors, … We then extended the typechecker accordingly and implemented a (not that) basic theory reasoner. Of course, we also handle SMT2’s algebraic datatypes. Here is an example in Alt-Ergo’s native syntax:

type ('a, 'b) t = A of {a_1 : 'a} | B of {b_11 : 'a ; b12 : 'b} | C | D | E

logic e : (int, real) t
logic n : int

axiom ax_n : n >= 9

axiom ax_e:
  e = A(n) or e = B(n*n, 0.) or e = E

goal g:
  match e with
   | A(u) -> u >= 8
   | B (u,v) -> u >= 80 and v = 0.
   | E -> true
   | _ -> false
  end
  and 3 <= 2+2

What is planned in 2019 and beyond: the Alt-Ergo Users’ Club is born!

In 2018, we welcomed a lot of new engineers with a background in formal methods: Steven (De Oliveira) holds a PhD in formal verification from the Paris-Saclay University and the French Atomic Energy Commission (CEA). He has a master's degree in cryptography and worked in the Frama-C team, developing open-source tools for verifying C programs. David (Declerck) obtained a PhD from Université Paris-Saclay in 2018, during which he extended the Cubicle model checker to support weak memory models and wrote a compiler from a subset of the x86 assembly language to Cubicle. Guillaume (Bury) holds a PhD from Université Sorbonne Paris Cité. He studied the integration of rewriting techniques inside SMT solvers. Albin (Coquereau) is working as a PhD student between OCamlPro, LRI and ENSTA, focusing on improving the Alt-Ergo SMT solver. Adrien is interested in verification of safety properties over software and embedded systems. He worked on higher-order functional program verification at the University of Tokyo, and on the Kind 2 model checker at the University of Iowa. All these people will strengthen the department of formal methods at OCamlPro, which will be beneficial for Alt-Ergo.

In 2019 we just launched the Alt-Ergo Users’ Club, in order to get closer to our users, collect their needs, and integrate them into the Alt-Ergo roadmap, but also to ensure sustainable funding for the development of the project. We are happy to announce the very first member of the Club is Adacore, very soon to be followed by Trust-In-Soft and CEA List. Thanks for your early support!

Interested in joining? Contact us: contact@ocamlpro.com

Optimizing storage in Tezos: a test branch on Gitlab

2019-02-05T09:05:17Z

This third article on improving storage in Tezos follows the announcement of a docker image made available to beta testers wishing to try our storage system and garbage collector.

See Improving Tezos Storage : Gitlab branch for testers

Improving Tezos Storage : Gitlab branch for testers

2019-02-04T09:05:17Z

This article is the third post of a series of posts on improving Tezos storage. In our previous post, we announced the availability of a docker image for beta testers, wanting to test our storage and garbage collector. Today, we are glad to announce that we rebased our code on the latest version of mainnet-staging, and pushed a branch mainnet-staging-irontez on our public Gitlab repository.

The only difference with the previous post is a change in the name of the RPCs : /storage/context/gc will trigger a garbage collection (and terminate the node afterwards) and /storage/context/revert will migrate the database back to Irmin (and terminate the node afterwards).

Enjoy and send us feedback !!

Comments

AppaDude (10 February 2019 at 15 h 12 min):

I must be missing something. I compiled and issued the required rpc trigger:

/storage/context/gc with the command

~/tezos/tezos-client rpc get /storage/context/gc

But I just got an empty JSON response of {} and the size of the .tezos-node folder is unchanged. Any advice is much appreciated. Thank you!

Fabrice Le Fessant (10 February 2019 at 15 h 47 min):

By default, garbage collection will keep 9 cycles of blocks (~36000 blocks). If you have fewer blocks, or if you are using Irontez on a former Tezos database, and fewer than 9 cycles have been stored in Irontez, nothing will happen. If you want to force a garbage collection, you should tell Irontez to keep fewer blocks (but more than 100, that’s the minimum that we enforce):

~/tezos/tezos-client rpc get '/storage/context/gc?keep=120'

should trigger a GC if the node has been running on Irontez for at least 2 hours.

AppaDude (10 February 2019 at 16 h 04 min):

I think it did work. I was confused because the total disk space for the .tezos-node folder remained unchanged. Upon closer inspection, I see these contents and sizes:

These are the contents of .tezos-node, can I safely delete context.backup?

4.0K config.json
269M context
75G  context.backup
4.0K identity.json
4.0K lock
1.4M peers.json
5.4G store
4.0K version.json

Is it safe to delete context.backup if I do not plan to revert? (/storage/context/revert)

Fabrice Le Fessant (10 February 2019 at 20 h 51 min):

Yes, normally. Don’t forget it is still under beta-testing…

Note that /storage/context/revert works even if you remove context.backup.

Jack (23 February 2019 at 0 h 24 min):

Have there been any issues reported with missing endorsements or missing bakings with this patch? We have been using this gc version (https://gitlab.com/tezos/tezos/merge_requests/720) for the past month and ever since we switched we have been missing endorsements and missing bakings. The disk space savings is amazing, but if we keep missing ends/bakes, it’s going to hurt our reputation as a baking service.

Fabrice Le Fessant (23 February 2019 at 6 h 58 min):

Hi,

I am not sure what you are asking for. Are you using our version (https://gitlab.com/tzscan/tezos/commits/mainnet-staging-irontez), or the one on the Tezos repository ? Our version is very different, so if you are using the other one, you should contact them directly on the merge request. On our version, we got a report last week, and the branch has been fixed immediately (but not yet the docker images, should be done in the next days).

Jack (25 February 2019 at 15 h 53 min):

I was using the 720MR and experiencing issues with baking/endorsing. I understand that 720MR and IronTez are different. I was simply asking if your version has had any reports of baking/endorsing troubles.

Jack (25 February 2019 at 15 h 51 min):

Is there no way to convert a “standard node” to IronTez? I was running the official tezos-node, and my datadir is around 90G. I compiled IronTez and started it up on that same dir, then ran rpc get /storage/context/gc and nothing is happening. I thought this was supposed to convert my datadir to irontez? If not, what is the RPC to do this? Or must I start from scratch to be 100% irontez?

Fabrice Le Fessant (25 February 2019 at 16 h 24 min):

There are two ways to get a full Irontez DB:

Start a node from scratch and wait for one or two days…

Use an existing node, run Irontez on it for 2 hours, and then call rpc get /storage/context/gc?keep=100 . 100 is the number of blocks to be kept. After 2 hours, the last 120 blocks should be stored in the IronTez DB, so the old DB will not be used anymore. Note that Irontez will not delete the old DB, just rename it. You should go there and remove the file to recover the disk space.

Jack (27 February 2019 at 1 h 24 min):

Where do we send feedback/get help? Email? Slack? Reddit?

Banjo E. (3 March 2019 at 2 h 40 min):

There is a major problem for bakers who want to use the irontez branch. After garbage collection, the baker application will not start because the baker makes an RPC call for the genesis block information. That genesis block information is gone after the garbage collection. Please address this issue soon. Thank you!

Fabrice Le Fessant (6 March 2019 at 21 h 44 min):

I pushed a new branch with a tentative fix: https://gitlab.com/tzscan/tezos/tree/mainnet-staging-irontez-fix-genesis . Unfortunately, I could not test it (I am far away from work for two weeks), so feedback is really welcome, before pushing in the irontez branch.

Tezos and OCamlPro

2019-01-31T09:05:17Z

Tezos is today an open source project, an international network developed by teams on more than five continents. In the genesis of the project, the French company OCamlPro, which still develops many Tezos-related projects today (TZscan, Liquidity, etc.), played a particularly important role. It is indeed within OCamlPro that research engineers laid the first stones of the code, in close collaboration with Arthur Breitman, the architect of the project, and DLS, over several years. We are delighted today with the growth of the project.

Arthur and OCamlPro (joint publication)

Improving Tezos Storage : update and beta-testing

2019-01-30T09:05:17Z

In a previous post, we presented some work that we did to reduce the amount of storage used by the Tezos node. Our post generated a lot of comments, in which upcoming features such as garbage collection and pruning were introduced. It also motivated us to keep working on this (hot) topic, and we present here our new results and the current state of the work. Irontez3 is a new version of our storage system, which we tested both on real traces and real nodes. We implemented a garbage-collector for it, which is triggered by an RPC on our node (we want the user to be able to choose when it happens, especially for bakers who might risk losing a baking slot), and automatically every 16 cycles in our traces.

In the following graph, we present the size of the context database during a full trace execution (~278 000 blocks):

There is definitely quite some improvement brought to the current Tezos implementation based on Irmin+LMDB, that we reimplemented as IronTez0. IronTez0 allows an IronTez node to read a database generated by the current Tezos and switch to the IronTez3 database. At the bottom of the graph, IronTez3 increases very slowly (about 7 GB at the end), and the garbage-collector makes it even less expensive (about 2-3 GB at the end). Finally, we executed a trace where we switched from IronTez0 to IronTez3 at block 225 000. The graph shows that, after the switch, the size immediately grows much more slowly, and finally, after a garbage collection, the storage is reduced to what it would have been with IronTez3.

Now, let’s compare the speed of the different storages:

The graph shows that IronTez3 is about 4-5 times faster than Tezos/IronTez0. Garbage-collections have an obvious impact on the speed, but clearly negligible compared to the current performance of Tezos. On our computer used for the traces, a Xeon with an SSD disk, the longest garbage collection takes between 1 and 2 minutes, even when the database was about 40 GB at the beginning.

In the former post, we didn’t check the amount of memory used by our storage system. It might be expected that the performance improvement could be associated with a more costly use of memory… but such is not the case :

At the top of the graph is our IronTez0 implementation of the current storage: it uses a little more memory than the current Tezos implementation (about 6 GB), maybe because it shares data structures with IronTez3, with fields that are only used by IronTez3 and could be removed in a specialized version. IronTez3 and IronTez3 with garbage collection are at the bottom, using about 2 GB of memory. It is actually surprising that the cost of garbage collections is very limited.

On our current running node, we get the following storage:

$ du

1.4G ./context
4.9G ./store
6.3G .

Now, if we use our new RPC to revert the node to Irmin (taking a little less than 8 minutes on our computer), we get :

$ du
14.3G ./context
 4.9G ./store
19.2G .

Beta-Testing with Docker

If you are interested in these results, it is now possible to test our node: we created a docker image, similar to the ones of Tezos. It is available on Docker Hub (one image that works for both Mainnet and Alphanet). Our script mainnet.sh (http://tzscan.io/irontez/mainnet.sh) can be used similarly to the alphanet.sh script of Tezos to manage the container. It can be run on an existing Tezos database, which it will switch to IronTez3. Note that this change is reversible; still, it might be a good idea to back up your Tezos node directory beforehand, as (1) migrating back might take some time, (2) this is a beta-testing phase, meaning the code might still hide nasty bugs, and (3) the official node might introduce a new incompatible format.

New RPCs

Both of these RPCs will make the node TERMINATE once they have completed. You should restart the node afterwards.

The RPC /ocp/storage/gc : it triggers a garbage collection. By default, this RPC will keep only the contexts from the last 9 cycles. It is possible to change this value by using the ?keep argument to specify another number of contexts to keep (beware that if this value is too low, you might end up with a non-working Tezos node, so we have set a minimum value of 100). No garbage collection will happen if the oldest context to keep was stored in the Irmin database.

The RPC /ocp/storage/revert : it triggers a migration of the database from IronTez3 back to Irmin. If you have been using IronTez for a while, and want to go back to the official node, this is the way. After calling this RPC, you should not run IronTez again, otherwise it will restart using the IronTez3 format, and you will need to revert again. This operation can take a lot of time, depending on the quantity of data to move between the two formats.
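As a rough illustration of how the keep argument fits into the garbage-collection RPC, here is a minimal Python sketch that only builds the URL. The node address and port (127.0.0.1:8732) and the helper name gc_url are assumptions made for this example, and the post does not say which HTTP method the RPC expects, so the sketch stops short of sending the request.

```python
from urllib.parse import urlencode

MIN_KEEP = 100  # minimum value accepted by the node, according to the post


def gc_url(base="http://127.0.0.1:8732", keep=None):
    """Build the URL of the /ocp/storage/gc RPC (the base address is an assumption)."""
    url = base.rstrip("/") + "/ocp/storage/gc"
    if keep is None:
        return url  # no argument: the node applies its default
    if keep < MIN_KEEP:
        raise ValueError("keep must be at least %d" % MIN_KEEP)
    return url + "?" + urlencode({"keep": keep})


print(gc_url(keep=200))
```

Rejecting low values on the client side simply mirrors the minimum described above; the node remains the authority on what it accepts.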

Following Steps

We are now working with the team at Nomadic Labs to include our work in the public Tezos code base. We will inform you as soon as our Pull Request is ready, for more testing ! If all testing and review goes well, we hope it can be merged in the next release !

Comments

Jack (30 January 2019 at 15 h 30 min):

Please release this as a MR on gitlab so those of us not using docker can start testing the code.

Fabrice Le Fessant (10 February 2019 at 15 h 39 min):

That was done: here

Tezos and OCamlPro

2019-01-29T09:05:17Z

A reflection on the new year… Today, Tezos is a global network and an open source project with developers spanning over five continents. At the inception of this project, the French company OCamlPro, which to this day still develops numerous projects around Tezos, played a particularly important role. Indeed, they were the first home of the research engineers who laid down the cornerstone of the code base, in tight collaboration with Arthur Breitman and the architect of the project, and DLS. We take some time today to remember those early days and celebrate the flourishing of this once small project.

(cross-post with Arthur Breitman, Founder of the Tezos project)

opam 2.0.3 release

2019-01-28T09:05:17Z

We are pleased to announce the release of opam 2.0.3.

This new version contains some backported fixes:

Fix manpage remaining $ (OPAMBESTEFFORT)
Fix OPAMROOTISOK handling
Regenerate missing environment file

Installation instructions (unchanged):

From binaries: run

sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)

From source, using opam:

opam update; opam install opam-devel

From source, manually: see the instructions in the README.

We hope you enjoy this new minor version, and remain open to bug reports and suggestions.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

Improving Tezos Storage

2019-01-15T09:05:17Z

Running a Tezos node currently costs a lot of disk space, about 59 GB for the context database, the place where the node stores the states corresponding to every block in the blockchain, since the first one. Of course, this is going to decrease once garbage collection is integrated, i.e. removing very old information that is not used and cannot change anymore (PR720 by Thomas Gazagnaire, Tarides; some early tests show a decrease to 14 GB, but with no performance evaluation). As a side note, this is different from pruning, i.e. transmitting only the last cycles for “light” nodes (PR663 by Thomas Blanc, OCamlPro). Anyway, as Tezos will be used more and more, contexts will keep growing, and we need to keep decreasing the space and performance cost of Tezos storage.

As one part of our activity at OCamlPro is to allow companies to deploy their own private Tezos networks, we decided to experiment with new storage layouts. We implemented two branches: our branch IronTez1 is based on a full LMDB database, as Tezos currently, but with optimized storage representation ; our branch IronTez2 is based on a mixed database, with both LMDB and file storage.

To test these branches, we started a node from scratch, and recorded all the accesses to the context database, to be able to replay them with our new experimental nodes. The node took about 12 hours to synchronize with the network, of which about 3 hours were used to write and read in the context database. We then replayed the trace, either only the writes or with both reads and writes.

Here are the results:

The mixed storage is the most interesting: it uses half the storage of a standard Tezos node !

Again, the mixed storage is the most efficient : even with reads and writes, IronTez2 is five times faster than the current Tezos storage.

Finally, here is a graph that shows the impact of the two attacks that happened in November 2018, and how it can be mitigated by storage improvement:

The graph shows that, using mixed storage, it is possible to restore the storage growth of Tezos to what it was before the attack !

Interestingly, although these experiments have been done on full traces, our branches are completely backward-compatible : they could be used on an already existing database, to store the new contexts in our optimized format, while keeping the old data in the ancient format.

Of course, there is still a lot of work to do, before this work is finished. We think that there are still more optimizations that are possible, and we need to test our branches on running nodes for some time to get confidence (TzScan might be the first tester !), but this is a very encouraging work for the future of Tezos !

opam 2.0.2 release

2018-12-12T09:05:17Z

We are pleased to announce the release of opam 2.0.2.

As sandbox scripts have been updated, don't forget to run opam init --reinit -ni to update yours.

This new version contains mainly backported fixes:

Doc:
- update man page
- add message for deprecated options
- reinsert removed ones to print a deprecated message instead of fail (e.g. --alias-of)
- deprecate no-aspcud
Pin:
- on pinning, rebuild updated pin-depends packages reliably
- include descr & url files on pinning 1.2 opam files
Sandbox:
- handle symlinks in bubblewrap for system directories such as /bin or /lib (#3661). Fixes sandboxing on some distributions such as CentOS 7 and Arch Linux.
- allow use of unix domain sockets on macOS (#3659)
- change one-line conditional to if statement which was incompatible with set -e
- make /var readonly instead of empty and rw
Path: resolve default opam root path
System: suffix .out for read_command_output stdout files
Locked: check consistency with opam file when reading lock file to suggest regeneration message
Show: remove pin depends messages
Cudf: Fix closure computation in the presence of cycles to have a complete graph if a cycle is present in the graph (typically ocaml-base-compiler ⇄ ocaml)
List: Fix some cases of listing coinstallable packages
Format upgrade: extract archived source files of version-pinned packages
Core: add is_archive in OpamSystem and OpamFilename
Init: don't fail if empty compiler given
Lint: fix light_uninstall flag for error 52
Build: partial port to dune
Update cold compiler to 4.07.1

Installation instructions (unchanged):

From binaries: run

sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)

From source, using opam:

opam update; opam install opam-devel

From source, manually: see the instructions in the README.

We hope you enjoy this new minor version, and remain open to bug reports and suggestions.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

An Introduction to Tezos RPCs: Signing Operations

2018-11-21T09:05:17Z

In a previous blogpost, we presented the RPCs used by tezos-client to send a transfer operation to a tezos-node. We were left with two remaining questions:

How to forge a binary operation, for signature
How to sign a binary operation

In this post, we will answer these questions. We are still assuming a node running and waiting for RPCs on address 127.0.0.1:9731. Since we will ask this node to forge a request, we really need to trust it, as a malicious node could send back a different binary transaction from the one we sent it.

Let’s take back our first operation:

{
  "branch": "BMHBtAaUv59LipV1czwZ5iQkxEktPJDE7A9sYXPkPeRzbBasNY8",
  "contents": [
    { "kind": "transaction",
      "source": "tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx",
      "fee": "50000",
      "counter": "3",
      "gas_limit": "200",
      "storage_limit": "0",
      "amount": "100000000",
      "destination": "tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN"
   } ]
}

So, we need to translate this operation into a binary format, more amenable for signature. For that, we use a new RPC to forge operations. Under Linux, we can use the tool curl to send the request to the node:

$ curl -v -X POST http://127.0.0.1:9731/chains/main/blocks/head/helpers/forge/operations -H "Content-type: application/json" --data '{
  "branch": "BMHBtAaUv59LipV1czwZ5iQkxEktPJDE7A9sYXPkPeRzbBasNY8",
  "contents": [
    { "kind": "transaction",
      "source": "tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx",
      "fee": "50000",
      "counter": "3",
      "gas_limit": "200",
      "storage_limit": "0",
      "amount": "100000000",
      "destination": "tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN"
  } ]
}'

Note that we use a POST request (request with content), with a Content-type header indicating that the content is in JSON format. We get the following body in the reply :

"ce69c5713dac3537254e7be59759cf59c15abd530d10501ccf9028a5786314cf08000002298c03ed7d454a101eb7022bc95f7e5f41ac78d0860303c8010080c2d72f0000e7670f32038107a59a2b9cfefae36ea21f5aa63c00"

This is the binary representation of our operation, in hexadecimal format, exactly what we were looking for to be able to include operations on the blockchain. However, this representation is not yet complete, since we also need the operation to be signed by the manager.
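For readers who prefer Python to curl, the forge request above could be prepared as follows. This is only a sketch: the helper name forge_request is hypothetical, and the request is built but not sent, so it can be inspected without a running node.

```python
import json
import urllib.request

FORGE_PATH = "/chains/main/blocks/head/helpers/forge/operations"


def forge_request(operation, node="http://127.0.0.1:9731"):
    """Prepare (without sending) the POST request asking the node to forge an operation."""
    body = json.dumps(operation).encode("utf-8")
    return urllib.request.Request(
        node + FORGE_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same operation as in the curl command above
operation = {
    "branch": "BMHBtAaUv59LipV1czwZ5iQkxEktPJDE7A9sYXPkPeRzbBasNY8",
    "contents": [{
        "kind": "transaction",
        "source": "tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx",
        "fee": "50000",
        "counter": "3",
        "gas_limit": "200",
        "storage_limit": "0",
        "amount": "100000000",
        "destination": "tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN",
    }],
}

req = forge_request(operation)
print(req.get_method(), req.full_url)
# With a node running, the reply could be read with:
#   json.load(urllib.request.urlopen(req))
```

The numeric fields stay JSON strings, exactly as in the original request.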

To sign this operation, we will first use tezos-client. That’s something that we can do if we want, for example, to sign an operation offline, for better security. Let’s assume that we have saved the content of the string (ce69...3c00 without the quotes) in a file operation.hex, we can ask tezos-client to sign it with:

$ tezos-client --addr 127.0.0.1 --port 9731 sign bytes 0x03$(cat operation.hex) for bootstrap1

The 0x03$(cat operation.hex) argument is the concatenation of the 0x03 prefix and the hexadecimal content of operation.hex, which is equivalent to 0x03ce69...3c00. The prefix serves two purposes: (1) 0x indicates that the representation is hexadecimal, and (2) 03 is the watermark for operations in Tezos.

We get the following reply in the console:

Signature: edsigtkpiSSschcaCt9pUVrpNPf7TTcgvgDEDD6NCEHMy8NNQJCGnMfLZzYoQj74yLjo9wx6MPVV29CvVzgi7qEcEUok3k7AuMg

Wonderful, we have a signature, in base58check format ! We can use this signature in the run_operation and preapply RPCs… but not in the injection RPC, which requires a binary format. So, to inject the operation, we need to convert to the hexadecimal version of the signature. For that, we will use the base58check package of Python (we could do it in OCaml, but then, we could just use tezos-client all along, no ?):

$ pip3 install base58check
$ python3
>>> import base58check
>>> base58check.b58decode(b'edsigtkpiSSschcaCt9pUVrpNPf7TTcgvgDEDD6NCEHMy8NNQJCGnMfLZzYoQj74yLjo9wx6MPVV29CvVzgi7qEcEUok3k7AuMg').hex()
'09f5cd8612637e08251cae646a42e6eb8bea86ece5256cf777c52bc474b73ec476ee1d70e84c6ba21276d41bc212e4d878615f4a31323d39959e07539bc066b84174a8ff0de436e3a7'

All Ed25519 signatures in Tezos decode to a value starting with 09f5cd8612, the prefix that generates the edsig characters. Also, the last 4 bytes are used as a checksum (e436e3a7). Thus, the signature itself is after this prefix and before the checksum: 637e08251cae64...174a8ff0d.
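The layout described above (5-byte prefix, 64-byte signature, 4-byte checksum) can be sketched with the standard library alone. The helper names are hypothetical, and the checksum rule used here, the first 4 bytes of a double SHA-256 over prefix and signature, is the one given in the comments at the end of this post.

```python
import hashlib

EDSIG_PREFIX = bytes.fromhex("09f5cd8612")


def double_sha256(data):
    """SHA-256 applied twice, as used for the checksum."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def split_decoded(decoded, prefix_len=5):
    """Split base58check-decoded bytes into (prefix, payload, checksum)."""
    return decoded[:prefix_len], decoded[prefix_len:-4], decoded[-4:]


def checksum_ok(decoded):
    """Check that the last 4 bytes are the checksum of everything before them."""
    return double_sha256(decoded[:-4])[:4] == decoded[-4:]


# Value printed by the Python session above, cut at the field boundaries
decoded = bytes.fromhex(
    "09f5cd8612"
    "637e08251cae646a42e6eb8bea86ece5256cf777c52bc474b73ec476ee1d70e8"
    "4c6ba21276d41bc212e4d878615f4a31323d39959e07539bc066b84174a8ff0d"
    "e436e3a7"
)
prefix, signature, checksum = split_decoded(decoded)
print(prefix.hex(), len(signature), checksum.hex())
print(checksum_ok(decoded))
```

Only the middle field, the raw signature, is needed for the injection step.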

Finally, we just need to append the binary operation with the binary signature for the injection, and put them into a string, and send that to the server for injection. If we have stored the hexadecimal representation of the signature in a file signature.hex, then we can use :

$ curl -v -H "Content-type: application/json" 'http://127.0.0.1:9731/injection/operation?chain=main' --data '"'$(cat operation.hex)$(cat signature.hex)'"'

and we receive the hash of this new operation:

"oo1iWZDczV8vw3XLunBPW6A4cjmdekYTVpRxRh77Fd1BVv4HV2R"

Again, we cheated a little, by using tezos-client to generate the signature. Let’s try to do it in Python, too !

First, we will need the secret key of bootstrap1. We can export from tezos-client to use it directly:

$ tezos-client show address bootstrap1 -S
Hash: tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx
Public Key: edpkuBknW28nW72KG6RoHtYW7p12T6GKc7nAbwYX5m8Wd9sDVC9yav
Secret Key: unencrypted:edsk3gUfUPyBSfrS9CCgmCiQsTCHGkviBDusMxDJstFtojtc1zcpsh

The secret key is exported on the last line by using the -S argument, and it usually starts with edsk. Again, it is in base58check, so we can use the same trick to extract its binary value:

$ python3
>>> import base58check
>>> base58check.b58decode(b'edsk3gUfUPyBSfrS9CCgmCiQsTCHGkviBDusMxDJstFtojtc1zcpsh').hex()[8:72]
'8500c86780141917fcd8ac6a54a43a9eeda1aba9d263ce5dec5a1d0e5df1e598'

This time, we directly extracted the key, by removing the first 8 hexa chars, and keeping only 64 hexa chars (using [8:72]), since the key is 32-bytes long. Let’s suppose that we save this value in a file bootstrap1.hex.
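To make the [8:72] slice less magical, here is a small sketch of the same extraction with the sizes spelled out. It assumes the decoded value is laid out as a 4-byte prefix, the 32-byte seed, then a 4-byte checksum; the function name extract_seed is hypothetical.

```python
PREFIX_BYTES = 4   # prefix that produces the "edsk" characters
SEED_BYTES = 32    # the Ed25519 seed itself


def extract_seed(decoded_hex):
    """Return the seed from the hexadecimal form of a decoded edsk key."""
    start = 2 * PREFIX_BYTES            # 2 hexadecimal characters per byte: 8
    end = start + 2 * SEED_BYTES        # 8 + 64 = 72
    seed = decoded_hex[start:end]
    if len(seed) != 2 * SEED_BYTES:
        raise ValueError("decoded key is too short to contain a 32-byte seed")
    return seed


# Made-up value with the assumed layout: prefix, seed, checksum
example = "aabbccdd" + "11" * 32 + "deadbeef"
print(extract_seed(example))
```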

Now, we will use the following script to compute the signature:

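import binascii

# Read the hexadecimal files and convert their content to raw bytes
operation = binascii.unhexlify(open("operation.hex", "rb").readline().strip())
seed = binascii.unhexlify(open("bootstrap1.hex", "rb").readline().strip())

# Hash the watermarked operation (byte 0x03 first) into a 32-byte Blake2b digest
from pyblake2 import blake2b
h = blake2b(digest_size=32)
h.update(b'\x03' + operation)
digest = h.digest()

# Turn the seed into an Ed25519 signing key and sign the digest
import ed25519
sk = ed25519.SigningKey(seed)
sig = sk.sign(digest)
print(sig.hex())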

The binascii module is used to read the files in hexadecimal (after removing the newlines), to get the binary representation of the operation and of the Ed25519 seed. Ed25519 is a signature scheme based on an elliptic curve, used in Tezos to manage tz1 addresses, i.e. to sign data and check signatures.

The blake2b module is used to hash the message, before signature. Again, we add a watermark to the operation, i.e. the byte \x03, before hashing. We also have to specify the size of the hash, i.e. digest_size=32, because the Blake2b hashing function can generate hashes with different sizes.

Finally, we use the ed25519 module to transform the seed (private/secret key) into a signing key, and use it to sign the hash, that we print in hexadecimal. We obtain:

637e08251cae646a42e6eb8bea86ece5256cf777c52bc474b73ec476ee1d70e84c6ba21276d41bc212e4d878615f4a31323d39959e07539bc066b84174a8ff0d

This result is exactly the same as what we got using tezos-client !

We now have a complete wallet, i.e. the ability to create transactions and sign them without tezos-client. Of course, there are several limitations to this work: first, we have exposed the private key in clear, which is usually not a very good idea for security; also, Tezos supports three types of keys, tz1 for Ed25519 keys, tz2 for Secp256k1 keys (same as Bitcoin/Ethereum) and tz3 for P256 keys; finally, a realistic wallet would probably use cryptographic chips, on a mobile phone or an external device (Ledger, etc.).

Comments

Anthony (28 November 2018 at 2 h 01 min):

Fabrice, you talk about signing the operation using tezos-client, which can then be used with the run_operation, however when you talk about doing it in a script, it doesn’t include the edsig or checksum or converted back into a usable form for run_operations. Can you explain how this is done in a script?

Thanks Anthony

Fabrice Le Fessant (29 November 2018 at 15 h 07 min):

You are right, run_operation needs an edsig signature, not the hexadecimal encoding. To generate the edsig, you just need to use the reverse operation of base58check.b58decode, i.e. base58check.b58encode, on the concatenation of 3 byte arrays:

1/ the 5-byte prefix that will generate the initial edsig characters, i.e. 0x09f5cd8612 in hexadecimal
2/ the raw signature s
3/ the 4 initial bytes of a checksum: the checksum is computed as sha256(sha256(s))

Badalona (27 December 2018 at 13 h 13 min):

Hi Fabrice.

I will appreciate if you show the coding of step 3. My checksum is always wrong.

Thanks

Alain (16 January 2019 at 16 h 26 min):

The checksum is on prefix + s. Here is a python3 script to do it:

./hex2edsig.py 637e08251cae646a42e6eb8bea86ece5256cf777c52bc474b73ec476ee1d70e84c6ba21276d41bc212e4d878615f4a31323d39959e07539bc066b84174a8ff0d
edsigtkpiSSschcaCt9pUVrpNPf7TTcgvgDEDD6NCEHMy8NNQJCGnMfLZzYoQj74yLjo9wx6MPVV29CvVzgi7qEcEUok3k7AuMg

from pyblake2 import blake2b
import hashlib
import base58check
import ed25519
import sys

def sha256 (x) :
    return hashlib.sha256(x).digest()

def b58check (prefix, b) :
    x = prefix + b
    checksum = sha256(sha256(x))[0:4]
    return base58check.b58encode(x + checksum)

edsig_prefix = bytes([9, 245, 205, 134, 18])

hexsig = sys.argv[1]
bytessig = bytes.fromhex(hexsig)
b58sig = b58check (edsig_prefix, bytessig)
print(b58sig.decode('ascii'))

Anthony (29 November 2018 at 21 h 49 min):

Fabrice, Thanks for the information would you be able to show the coding as you have done in your blog? Thanks Anthony

Mark Robson (9 February 2020 at 23 h 51 min):

Great information, but can the article be updated to include the things discussed in the comments? As I can’t see the private key of bootstrap1 I can’t replicate locally. Been going around in circles on that point

stacey roberts (7 May 2020 at 13 h 53 min):

Can you help me to clear about how tezos can support to build a fully decentralized supply chain eco system?

leesadaisy (16 September 2020 at 10 h 25 min):

Hi there! Thanks for sharing useful info. Keep up your work.

Alice Jenifferze (17 September 2020 at 10 h 51 min):

Thanks for sharing!

Introduction to RPCs in Tezos: the example of a simple wallet

2018-11-20T09:05:17Z

In this technical article, we briefly introduce RPCs in Tezos through a simple example showing how the Tezos client interacts with the node during a transfer instruction. Tezos RPCs are HTTP requests (GET or POST) to which Tezos nodes reply in JSON format. They are the only way for wallets to interact with…

An Introduction to Tezos RPCs: a Basic Wallet

2018-11-15T09:05:17Z

In this technical blog post, we will briefly introduce Tezos RPCs through a simple example: we will show how the tezos-client program interacts with the tezos-node during a transfer command. Tezos RPCs are HTTP queries (GET or POST) to which tezos-node replies in JSON format. They are the only way for wallets to interact with the node. However, given the large number of RPCs accepted by the node, it is not always easy to understand which ones can be useful if you want to write a wallet. So, here, we use tezos-client as a simple example, that we will complete in another blog post for wallets that do not have access to the Tezos Protocol OCaml code.

As for the basic setup, we run a sandboxed node locally on port 9731, with two known addresses in its wallet, called bootstrap1 and bootstrap2.

Here is the command we are going to trace during this example:

tezos-client --addr 127.0.0.1 --port 9731 -l transfer 100 from bootstrap1 to bootstrap2

With this command, we send just 100 tezzies between the two accounts, paying only for the default fees (0.05 tz).

We use the -l option to request tezos-client to log all the RPC calls it uses on the standard error (the console).

The first query issued by tezos-client is:

>>>>0: http://127.0.0.1:9731/chains/main/blocks/head/context/contracts/tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx/counter
<<<<0: 200 OK
"2"

tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx is the Tezos address corresponding to bootstrap1, the payer of the operation. In Tezos, the payer is the address responsible for paying the fees and burn (storage) of the transaction. In our case, it is also the source of the transfer. Here, tezos-client requests the counter of the payer, because all operations must have a different counter. This is an important feature: here, it will prevent bootstrap2 from sending the same operation over and over, emptying the account of bootstrap1.

Here, the counter is 2, probably because we already issued some former operations, so the next operation should have a counter of 3. The request is done on the block head of the main chain, an alias for the last block baked on the chain.
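In a wallet, this step boils down to parsing the reply and adding one. A minimal sketch, where the function name next_counter is hypothetical:

```python
import json


def next_counter(reply_body):
    """The counter RPC replies with a JSON string such as "2"; the next operation uses 3."""
    return int(json.loads(reply_body)) + 1


print(next_counter('"2"'))
```

Note that the counter travels as a JSON string rather than a number, which is why it is converted explicitly.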

The next query is:

>>>>1: http://127.0.0.1:9731/chains/main/blocks/head/context/contracts/tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx/manager_key
<<<<1: 200 OK
{ "manager": "tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx",
  "key": "edpkuBknW28nW72KG6RoHtYW7p12T6GKc7nAbwYX5m8Wd9sDVC9yav" }

This time, the client requests the key of the account manager. For a keyhash address (tz…), the manager is always itself, but this query is needed to know if the public key of the manager has been revealed. Here, the key field contains a public key, which means a revelation operation has already been published. Otherwise, the client would have had to also create this revelation operation prior to the transfer (or together, actually). The revelation is mandatory, because all the nodes need to know the public key of the manager to validate the signature of the transfer.
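A wallet could make the same decision with a check like the one below. It is a sketch under one assumption not shown in the trace: that the reply simply omits the key field when the public key has not been revealed. The function name needs_reveal is hypothetical.

```python
import json


def needs_reveal(manager_key_reply):
    """Return True when the reply carries no public key (assumed to mean: not revealed yet)."""
    return "key" not in manager_key_reply


reply = json.loads("""
{ "manager": "tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx",
  "key": "edpkuBknW28nW72KG6RoHtYW7p12T6GKc7nAbwYX5m8Wd9sDVC9yav" }
""")
print(needs_reveal(reply))
```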

Let’s see the next query:

>>>>2: http://127.0.0.1:9731/monitor/bootstrapped
<<<<2: 200 OK
{ "block": "BLyypN89WuTQyLtExGP6PEuZiu5WFDxys3GTUf7Vz4KvgKcvo2E",
  "timestamp": "2018-10-13T00:32:47Z" }

This time, the client checks whether the node it is using is well connected to the network. A node is bootstrapped if it has enough connections to other nodes, and its chain is synchronized with them. This step is needed to prevent the operation from being sent on an obsolete fork of the chain.

Now, the next query requests the current configuration of the network.

>>>>3: http://127.0.0.1:9731/chains/main/blocks/head/context/constants
<<<<3: 200 OK
{ "proof_of_work_nonce_size": 8,
  "nonce_length": 32,
  "max_revelations_per_block": 32,
  "max_operation_data_length": 16384,
  "preserved_cycles": 5,
  "blocks_per_cycle": 4096,
  "blocks_per_commitment": 32,
  "blocks_per_roll_snapshot": 512,
  "blocks_per_voting_period": 32768,
  "time_between_blocks": [ "60", "75" ],
  "endorsers_per_block": 32, 
  "hard_gas_limit_per_operation": "400000",
  "hard_gas_limit_per_block": "4000000",
  "proof_of_work_threshold": "-1",
  "tokens_per_roll": "10000000000",
  "michelson_maximum_type_size": 1000,
  "seed_nonce_revelation_tip": "125000",
  "origination_burn": "257000",
  "block_security_deposit": "512000000",
  "endorsement_security_deposit": "64000000", 
  "block_reward": "16000000",
  "endorsement_reward": "2000000",
  "cost_per_byte": "1000",
  "hard_storage_limit_per_operation": "60000"
}

These constants may differ for different protocols, or different networks. They are for example different on mainnet, alphanet and zeronet. Among these constants, some of them are useful when issuing a transaction: mainly hard_gas_limit_per_operation and hard_storage_limit_per_operation . The first one is the maximum gas that can be set for a transaction, and the second one is the maximum storage that can be used. We don’t plan to use them directly, but we will use them to compute an approximation of the gas and storage that we will set for the transaction.
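As an illustration, the two limits can be picked out of the constants reply like this. The function name simulation_limits is hypothetical, and the dictionary below is a trimmed copy of the reply shown above.

```python
def simulation_limits(constants):
    """Return the gas and storage limits to use for a first simulation."""
    return {
        "gas_limit": constants["hard_gas_limit_per_operation"],
        "storage_limit": constants["hard_storage_limit_per_operation"],
    }


# Trimmed copy of the constants returned by the node
constants = {
    "hard_gas_limit_per_operation": "400000",
    "hard_gas_limit_per_block": "4000000",
    "hard_storage_limit_per_operation": "60000",
}
print(simulation_limits(constants))
```

The values are kept as strings because that is how they appear in the operations sent to the node.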

>>>>4: http://127.0.0.1:9731/chains/main/blocks/head/hash
<<<<4: 200 OK
"BLyypN89WuTQyLtExGP6PEuZiu5WFDxys3GTUf7Vz4KvgKcvo2E"

This query is a bit redundant with the /monitor/bootstrapped query, which already returned the last block baked on the chain. Anyway, it is useful if we are not working on the main chain.

The next query requests the chain_id of the main chain, which is typically useful to verify that we know the format of operations for this chain id:

>>>>5: http://127.0.0.1:9731/chains/main/chain_id
<<<<5: 200 OK
"NetXdQprcVkpaWU"

Finally, the client tries to simulate the transaction, using the maximal gas and storage limits requested earlier. Since it is in simulation mode, the transaction is only run locally on the node, and immediately backtracked. It is used to know if the transaction executes successfully, and to know the gas and storage actually used (to avoid paying fees for an erroneous transaction) :

>>>>6: http://127.0.0.1:9731/chains/main/blocks/head/helpers/scripts/run_operation
{ "branch": "BLyypN89WuTQyLtExGP6PEuZiu5WFDxys3GTUf7Vz4KvgKcvo2E",
  "contents": [
    { "kind": "transaction",
      "source": "tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx",
      "fee": "50000",
      "counter": "3",
      "gas_limit": "400000",
      "storage_limit": "60000",
      "amount": "100000000",
      "destination": "tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN" } 
    ],
  "signature":
    "edsigtXomBKi5CTRf5cjATJWSyaRvhfYNHqSUGrn4SdbYRcGwQrUGjzEfQDTuqHhuA8b2d8NarZjz8TRf65WkpQmo423BtomS8Q"
}

The operation is related to a branch, and you can see that the branch field is here set to the hash of the last block head. The branch field is used to prevent an operation from being executed on an alternative head, and also for garbage collection: an operation can be inserted only in one of the 64 blocks after the branch block, or it will be deleted.

The result looks like this:

<<<<6: 200 OK
{ "contents": [ 
    { "kind": "transaction",
      "source": "tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx",
      "fee": "50000",
      "counter": "3",
      "gas_limit": "400000",
      "storage_limit": "60000",
      "amount": "100000000",
      "destination": "tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN",
      "metadata": { 
        "balance_updates": [ 
         { "kind": "contract",
           "contract": "tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx",
           "change": "-50000" },
         { "kind": "freezer", 
           "category": "fees",
           "delegate": "tz1Ke2h7sDdakHJQh8WX4Z372du1KChsksyU",
           "level": 0, 
           "change": "50000" } 
          ],
        "operation_result":
          { "status": "applied",
            "balance_updates": [
             { "kind": "contract",
               "contract": "tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx",
               "change": "-100000000" },
             { "kind": "contract",
               "contract": "tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN",
               "change": "100000000" } 
             ], 
        "consumed_gas": "100" } } } 
   ] 
}

Notice the consumed_gas field in the metadata section, that’s the gas that we can expect the transaction to use on the real chain. Here, there is no storage consumed, otherwise, a storage_size field would be present. The returned status is applied, meaning that the transaction could be successfully simulated by the node.
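Reading these two fields from the reply could look like the sketch below. The function name simulation_summary is hypothetical, and the reply is a trimmed copy of the one above that keeps only the fields being read.

```python
def simulation_summary(result):
    """Return a (status, consumed_gas) pair for each operation of a run_operation reply."""
    summary = []
    for content in result["contents"]:
        op_result = content["metadata"]["operation_result"]
        # consumed_gas is a JSON string; default to "0" if the field is absent
        summary.append((op_result["status"], int(op_result.get("consumed_gas", "0"))))
    return summary


# Trimmed copy of the simulation reply
reply = {
    "contents": [{
        "kind": "transaction",
        "metadata": {
            "operation_result": {"status": "applied", "consumed_gas": "100"},
        },
    }]
}
print(simulation_summary(reply))
```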

However, in the query, there was a field that we cannot easily infer: it is the signature field. Indeed, the tezos-client knows how to generate a signature for the transaction, knowing the public/private key of the manager. How can we do that in our wallet ? We will explain that in a next Tezos blog post.

Again, the tezos-client requests the last block head:

>>>>7: http://127.0.0.1:9731/chains/main/blocks/head/hash
<<<<7: 200 OK
"BLyypN89WuTQyLtExGP6PEuZiu5WFDxys3GTUf7Vz4KvgKcvo2E"

and the current chain id:

>>>>8: http://127.0.0.1:9731/chains/main/chain_id
<<<<8: 200 OK
"NetXdQprcVkpaWU"

The last simulation is a prevalidation of the transaction, with the exact same parameters (gas and storage) with which it will be submitted on the official blockchain:

>>>>9: http://127.0.0.1:9731/chains/main/blocks/head/helpers/preapply/operations
[ { "protocol": "PsYLVpVvgbLhAhoqAkMFUo6gudkJ9weNXhUYCiLDzcUpFpkk8Wt",
    "branch": "BLyypN89WuTQyLtExGP6PEuZiu5WFDxys3GTUf7Vz4KvgKcvo2E",
    "contents": [ 
     { "kind": "transaction", 
       "source": "tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx", 
       "fee": "50000",
       "counter": "3",
       "gas_limit": "200",
       "storage_limit": "0",
       "amount": "100000000",
       "destination": "tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN" 
     } ], 
    "signature": "edsigu5Cb8WEmUZzoeGSL3sbSuswNFZoqRPq5nXA18Pg4RHbhnFqshL2Rw5QJBM94UxdWntQjmY7W5MqBDMhugLgqrRAWHyH5hD" 
} ]

Notice that, in this query, the gas_limit was set to 200. tezos-client is a bit conservative, adding 100 to the gas returned by the first simulation. Indeed, the gas can be different when the transaction is run for inclusion, for example if a baker introduced another transaction before that interferes with this one (for example, a transaction that empties an account has an additional gas cost of 50).

<<<<9: 200 OK
[ { "contents": [
     { "kind": "transaction", 
       "source": "tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx",
       "fee": "50000",
       "counter": "3",
       "gas_limit": "200",
       "storage_limit": "0",
       "amount": "100000000",
       "destination": "tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN",
       "metadata": { 
         "balance_updates": [ 
          { "kind": "contract",
            "contract": "tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx",
            "change": "-50000" },
          { "kind": "freezer",
            "category": "fees",
            "delegate": "tz1Ke2h7sDdakHJQh8WX4Z372du1KChsksyU",
            "level": 0,
            "change": "50000" } ],
         "operation_result": 
          { "status": "applied",
            "balance_updates": [ 
             { "kind": "contract",
               "contract": "tz1KqTpEZ7Yob7QbPE4Hy4Wo8fHG8LhKxZSx",
               "change": "-100000000" },
             { "kind": "contract",
               "contract": "tz1gjaF81ZRRvdzjobyfVNsAeSC6PScjfQwN",
               "change": "100000000" } ],
         "consumed_gas": "100" } 
     } } ], 
    "signature": "edsigu5Cb8WEmUZzoeGSL3sbSuswNFZoqRPq5nXA18Pg4RHbhnFqshL2Rw5QJBM94UxdWntQjmY7W5MqBDMhugLgqrRAWHyH5hD"
 } ]

Again, the tezos-client had to sign the transaction with the manager private key. This will be explained in a next blog post.
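The safety margin that tezos-client added between the simulation (100 gas consumed) and the prevalidation (gas_limit of 200) can be written down in a few lines. This is a sketch of the arithmetic only; the function name and the idea of exposing the margin as a parameter are assumptions.

```python
def gas_limit_with_margin(consumed_gas, margin=100):
    """Turn the gas consumed during simulation into the gas_limit of the real operation."""
    return str(int(consumed_gas) + margin)


print(gas_limit_with_margin("100"))
```

The result is converted back to a string, matching the way gas_limit appears in the JSON operation.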

Since this prevalidation was successful, the client can now inject the transaction on the block chain:

>>>>10: http://127.0.0.1:9731/injection/operation?chain=main
"a75719f568f22f279b42fa3ce595c5d4d0227cc8cf2af351a21e50d2ab71ab3208000002298c03ed7d454a101eb7022bc95f7e5f41ac78d0860303c8010080c2d72f0000e7670f32038107a59a2b9cfefae36ea21f5aa63c00eff5b0ce828237f10bab4042a891d89e951de2c5ad4a8fa72e9514ee63fec9694a772b563bcac8ae0d332d57f24eae7d4a6fad784a8436b6ba03d05bf72e4408"
<<<<10: 200 OK
"ooUo7nUZAbZKhTuX5NC999BuHs9TZBmtoTrCWT3jFnW7vMdN25U"

We can see that this request does not contain the JSON encoding of the transaction, but a binary version (in hexadecimal format). This binary version is what is stored in the blockchain, to decrease the size of the storage. It contains both a binary encoding of the transaction, and the signature of the transaction. tezos-client knows this binary format, but if we want to create our own wallet, we will need a way to compute it by ourselves.
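Since the payload is the binary operation followed by its signature, the two parts can be separated again by length. The sketch below assumes a 64-byte signature, i.e. 128 hexadecimal characters, which is the size of an Ed25519 signature; the function name split_injected is hypothetical.

```python
SIGNATURE_HEX_LEN = 128  # 64-byte signature, 2 hexadecimal characters per byte


def split_injected(payload_hex):
    """Split an injected payload into (operation_hex, signature_hex)."""
    if len(payload_hex) <= SIGNATURE_HEX_LEN:
        raise ValueError("payload is too short to contain an operation and a signature")
    return payload_hex[:-SIGNATURE_HEX_LEN], payload_hex[-SIGNATURE_HEX_LEN:]


# Made-up payload: a 10-byte "operation" followed by a 64-byte "signature"
operation_hex, signature_hex = split_injected("ab" * 10 + "cd" * 64)
print(len(operation_hex) // 2, len(signature_hex) // 2)
```

Applied to the request above, the first part would be the binary encoding of the transaction and the second part its signature.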

The node replies with the operation hash of the injected operation: the operation is now waiting for inclusion in the mempool of the node, and will be forwarded to other nodes so that the next baker can include it in the next block.

I hope you now have a better understanding of how a wallet can use Tezos RPCs to issue a transaction. Two questions remain, for a future blog post:

How to generate the binary format of an operation from its JSON encoding?
How to sign an operation, so that we can include this signature in the run, preapply and injection RPCs?

If we can answer these questions, we will also be able to sign operations offline.

Comments

lizhihohng (5 May 2019 at 6 h 59 min):

Before forge or sign a transaction, how to get a gas or gas limit, not a hard gas limit from contants?

Juliane (16 November 2019 at 15 h 29 min):

Good answer back in return of this difficulty with solid arguments and explaining all on the topic of that.

First Open-Source Release of TzScan

2018-11-08T09:05:17Z

In October 2017, after the Tezos ICO, OCamlPro started to work on a block explorer for Tezos. For us, it was the most important software that we could contribute to the community, after the node itself, of course. We used it internally to monitor the Tezos alphanet, until its official public release in February 2018, as TzScan. One of TzScan's main goals was to make the complex DPOS consensus algorithm of Tezos easier to understand and to follow, especially for the bakers who will contribute to it. Since its creation, we have been improving it every day, rushing for the Betanet in June 2018, and still now, monitoring all the Tezos networks: Mainnet, Alphanet and Zeronet.

So we are pleased today to announce the first release of TzScan OS, the open-source version of TzScan!

The sources are available on Gitlab: https://gitlab.com/tzscan/tzscan
The code, mostly OCaml, is distributed under GNU GPL v3.

The project contains:

The blockchain crawler, used to monitor the blockchain, and fill a PostgreSQL database
The web interface, requesting information using a REST API
The API server, using the PostgreSQL database to reply to API requests

It can be used in two different modes:

Remote Use: if you are not running a Tezos node, you might want to only run the web interface, using the official TzScan API server
Local Use: if you are running a Tezos node, you can use the crawler and the API server to serve information on your node, to a locally running web interface

Contribute

If you are interested in contributing to TzScan OS, a first step could be to translate TzScan into your language: check the file lang-en.json for a list of strings to translate, and lang-fr.json for a partial translation!

OCamlPro’s services around TzScan

TzScan OS can be used to monitor private/enterprise deployments of Tezos. OCamlPro is available to help and support such deployments.

Acknowledgments

We are thankful to the Tezos Foundation and Ryan Jesperson for their support!

All feedback is welcome!

Liquidity Tutorial: A Game with an Oracle for Random Numbers

2018-11-06T09:05:17Z

A Game with an oracle

In this small tutorial, we will see how to write a chance game on the Tezos blockchain with Liquidity and a small external oracle which provides random numbers.

Principle of the game

Rules of the game are handled by a smart contract on the Tezos blockchain.

When a player decides to start a game, she must start by making a transaction (i.e. a call) to the game smart contract with a number parameter (let's call it n) between 0 and 100 (inclusive). The amount that is sent with this transaction constitutes her bet b.

A random number r is then chosen by the oracle and the outcome of the game is decided by the smart contract.

The player loses if her number n is greater than r. In this case, she forfeits her bet amount and the game smart contract is reset (the money stays on the game smart contract).
The player wins if her number n is smaller than or equal to r. In this case, she gets back her initial bet b plus a reward which is proportional to her bet and her chosen number: b * n / 100.

This means that a higher number n, while being a riskier choice (the random number drawn afterwards must be at least as large), yields a greater reward. The edge cases are n = 0, which always wins but yields no reward, and n = 100, which wins only if the random number is also 100, in which case the player doubles her bet.
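
The rules above can be sketched in plain OCaml as follows. This is only an illustration of the arithmetic, not the contract itself: the names outcome, n, bet and r are chosen for this example, and amounts are assumed to be expressed in mutez (1 tz = 1,000,000 mutez) so that integer division is enough.

```ocaml
(* Illustrative model of the game rules; amounts are in mutez. *)
type outcome =
  | Lose
  | Win of int (* total amount paid back: bet + reward *)

let outcome ~(n : int) ~(bet : int) ~(r : int) : outcome =
  if n < 0 || n > 100 then invalid_arg "n must be between 0 and 100";
  if n > r then Lose (* the bet stays on the contract *)
  else Win (bet + (bet * n / 100))

let () =
  let bet = 10_000_000 (* 10 tz *) in
  (* n = 50 loses against r = 49 and wins 15 tz against r = 50 *)
  assert (outcome ~n:50 ~bet ~r:49 = Lose);
  assert (outcome ~n:50 ~bet ~r:50 = Win 15_000_000);
  (* edge cases: n = 0 always wins nothing, n = 100 doubles the bet *)
  assert (outcome ~n:0 ~bet ~r:0 = Win bet);
  assert (outcome ~n:100 ~bet ~r:100 = Win (2 * bet))
```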

Architecture of the DApp

Everything that happens on the blockchain is deterministic and reproducible, which means that smart contracts cannot generate random numbers securely¹.

The following smart contract works in this manner. Once a user starts a game, the smart contract is put in a state where it awaits a random number from a trusted off-chain source. This trusted source is our random generator oracle. The oracle monitors the blockchain and generates and sends a random number to the smart contract once it detects that it is waiting for one.

Because the oracle waits for a play transaction to be included in a block and sends the random number in a subsequent block, a game round lasts at least two blocks².

This technicality forces us to split our smart contract into two distinct entry points:

A first entry point play is called by a player who wants to start a game (it cannot be called twice). The code of this entry point saves the game parameters in the smart contract storage and stops execution (awaiting a random number).
A second entry point finish, which can only be called by the oracle, accepts random numbers as parameter. The code of this entry point computes the outcome of the current game based on the game parameters and the random number, and then proceeds accordingly. At the end of finish the contract is reset and a new game can be started.
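
The two-step protocol can be pictured as a small state machine. The following plain OCaml model is a simplified sketch written for this explanation, not Liquidity code: it ignores balances and the check on the oracle's identity, uses integers for amounts and strings for addresses, and the shape of the return values is an assumption made for the example.

```ocaml
(* Simplified model: the storage only remembers the pending game. *)
type game = { number : int; bet : int; player : string }
type storage = { game : game option }

(* play: only allowed when no game is pending; saves the parameters. *)
let play ~number ~bet ~player (s : storage) : (storage, string) result =
  match s.game with
  | Some _ -> Error "game already started"
  | None -> Ok { game = Some { number; bet; player } }

(* finish: only meaningful when a game is pending; computes the outcome,
   resets the storage and returns the payment to make, if any. *)
let finish ~random_number (s : storage)
    : (storage * (string * int) option, string) result =
  match s.game with
  | None -> Error "no game started"
  | Some g ->
    let payment =
      if random_number < g.number then None (* player loses *)
      else Some (g.player, g.bet + (g.bet * g.number / 100))
    in
    Ok ({ game = None }, payment)
```

Calling play twice in a row fails on the second call, and finish always returns a storage in which a new game can be started.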

The Game Smart Contract

The smart contract game manipulates a storage of the following type:

type game = {
  number : nat;
  bet : tez;
  player : key_hash;
}

type storage = { game : game option; oracle_id : address; }

The storage contains the address of the oracle, oracle_id. It will only accept transactions coming from this address (i.e. that are signed by the corresponding private key). It also contains an optional value game that indicates if a game is being played or not.

A game consists of three values, stored in a record:

number is the number chosen by the player.
bet is the amount that was sent with the first transaction by the player. It constitutes the bet amount.
player is the key hash (tz1...) on which the player who made the bet wishes to be paid in the event of a win.

We also give an initializer function that can be used to deploy the contract with an initial value. It takes as argument the address of the oracle, which cannot be changed later on.

let%init storage (oracle_id : address) =
  { game = (None : game option); oracle_id }

The `play` entry point

The first entry point, play, takes as arguments the current storage of the smart contract and a pair composed of:

a natural number, which is the number chosen by the player
a key hash, which is the address on which the player wishes to be paid

let%entry play ((number : nat), (player : key_hash)) storage = ...

The first thing this contract does is validate the inputs:

Ensure that the number is a valid choice, i.e. is between 0 and 100 (natural numbers are always greater than or equal to 0).

if number > 100p then failwith "number must be <= 100";

Ensure that the contract has enough funds to pay the player in case she wins. The highest paying bet is to play 100, which means that the player gets paid twice her original bet amount. At this point of the execution, the balance of the contract is already credited with the bet amount, so this check amounts to ensuring that the balance is at least twice the bet.

if 2p * Current.amount () > Current.balance () then
  failwith "I don't have enough money for this bet";

Ensure that no other game is currently being played so that a previous game is not erased.

match storage.game with
| Some g ->
  failwith ("Game already started with", g)
| None ->
  (* Actual code of entry point *)

The rest of the code for this entry point simply creates a new game record { number; bet; player } and saves it to the smart contract's storage. This entry point always returns an empty list of operations because it does not make any contract calls or transfers.

let bet = Current.amount () in
let storage = storage.game <- Some { number; bet; player } in
(([] : operation list), storage)

The new storage is returned and the execution stops at this point, waiting for someone (the oracle) to call the finish entry point.

The `finish` entry point

The second entry point, finish, takes as argument a natural number parameter, which is the random number generated by the oracle, as well as the current storage of the smart contract.

let%entry finish (random_number : nat) storage = ...

The random number can be any natural number (these are mathematically unbounded natural numbers) so we must make sure it is between 0 and 100 before proceeding. Instead of rejecting random numbers that are too big, we simply take the Euclidean division by 101 and keep the remainder, which is between 0 and 100. The oracle already generates random numbers between 0 and 100, so this operation changes nothing in practice, but it is worth keeping in case we want to replace the random generator one day.

let random_number = match random_number / 101p with
  | None -> failwith ()
  | Some (_, r) -> r in
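
To see why the None branch can never be taken here, the following plain OCaml sketch models a Euclidean division that returns an optional (quotient, remainder) pair and fails only when dividing by zero. The functions ediv and normalize are illustrative names for this example, and OCaml integers stand in for Liquidity's unbounded naturals.

```ocaml
(* Model of Euclidean division on naturals:
   None when the divisor is zero, Some (quotient, remainder) otherwise. *)
let ediv (a : int) (b : int) : (int * int) option =
  if a < 0 || b < 0 then invalid_arg "ediv: naturals only"
  else if b = 0 then None
  else Some (a / b, a mod b)

(* Bring any natural number into the range 0..100. *)
let normalize (random_number : int) : int =
  match ediv random_number 101 with
  | None -> assert false (* unreachable: the divisor 101 is not zero *)
  | Some (_, r) -> r

let () =
  (* values already in range are unchanged, larger ones wrap around *)
  assert (normalize 42 = 42);
  assert (normalize 100 = 100);
  assert (normalize 101 = 0);
  assert (normalize 250 = 48)
```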

Smart contracts are public objects on the Tezos blockchain so anyone can decide to call them. This means that permissions must be handled by the logic of the smart contract itself. In particular, we don't want finish to be callable by anyone, otherwise the player could choose her own random number. Here we make sure that the call comes from the oracle.

if Current.sender () <> storage.oracle_id then
  failwith ("Random numbers cannot be generated");

We must also make sure that a game is currently being played, otherwise this random number is quite useless.

match storage.game with
| None -> failwith "No game already started"
| Some game -> ...

The rest of the code in the entry point decides if the player won or lost, and generates the corresponding operations accordingly.

if random_number < game.number then
  (* Lose *)
  ([] : operation list)

If the random number is smaller than the chosen number, the player has lost. In this case no operation is generated and the money is kept by the smart contract.

else
  (* Win *)
  let gain = match (game.bet * game.number / 100p) with
    | None -> 0tz
    | Some (g, _) -> g in
  let reimbursed = game.bet + gain in
  [ Account.transfer ~dest:game.player ~amount:reimbursed ]

Otherwise, if the random number is greater than or equal to the previously chosen number, then the player has won. We compute her gain and the reimbursement value (which is her original bet + her gain) and generate a transfer operation with this amount.

let storage = storage.game <- (None : game option) in
(ops, storage)

Finally, the storage of the smart contract is reset, meaning that the current game is erased. The list of generated operations and the reset storage are returned.

A safety entry point: `fund`

At any time, we allow anyone (most likely the manager of the contract) to add funds to the contract's balance. This allows new players to participate in the game even if the contract has been depleted, by simply adding more funds to it.

let%entry fund _ storage =
  ([] : operation list), storage

This code does nothing except accept transfers with amounts.

Full Liquidity Code of the Game Smart Contract

[%%version 0.403]

type game = {
  number : nat;
  bet : tez;
  player : key_hash;
}

type storage = {
  game : game option;
  oracle_id : address;
}

let%init storage (oracle_id : address) =
  { game = (None : game option); oracle_id }

(* Start a new game *)
let%entry play ((number : nat), (player : key_hash)) storage =
  if number > 100p then failwith "number must be <= 100";
  if Current.amount () = 0tz then failwith "bet cannot be 0tz";
  if 2p * Current.amount () > Current.balance () then
    failwith "I don't have enough money for this bet";
  match storage.game with
  | Some g ->
    failwith ("Game already started with", g)
  | None ->
    let bet = Current.amount () in
    let storage = storage.game <- Some { number; bet; player } in
    (([] : operation list), storage)

(* Receive a random number from the oracle and compute outcome of the
   game *)
let%entry finish (random_number : nat) storage =
  let random_number = match random_number / 101p with
    | None -> failwith ()
    | Some (_, r) -> r in
  if Current.sender () <> storage.oracle_id then
    failwith ("Random numbers cannot be generated");
  match storage.game with
  | None -> failwith "No game already started"
  | Some game ->
    let ops =
      if random_number < game.number then
        (* Lose *)
        ([] : operation list)
      else
        (* Win *)
        let gain = match (game.bet * game.number / 100p) with
          | None -> 0tz
          | Some (g, _) -> g in
        let reimbursed = game.bet + gain in
        [ Account.transfer ~dest:game.player ~amount:reimbursed ]
    in
    let storage = storage.game <- (None : game option) in
    (ops, storage)

(* accept funds *)
let%entry fund _ storage =
  ([] : operation list), storage

The Oracle

The oracle can be implemented using Tezos RPCs on a running Tezos node. The principle of the oracle is the following:

Monitor new blocks in the chain.
For each new block, look if it includes successful transactions whose destination is the game smart contract.
Look at the parameters of the transaction to see if it is a call to either play, finish or fund.
If it is a successful call to play, then we know that the smart contract is awaiting a random number.
Generate a random number between 0 and 100 and make a call to the game smart contract with the appropriate private key (the transaction can be signed by a Ledger plugged to the oracle server for instance).
Wait a small amount of time, depending on the interval between blocks, for confirmation.
Loop.

These can be implemented with the following RPCs:

Monitoring blocks: /chains/main/blocks?[length=<int>] (https://tezos.gitlab.io/mainnet/api/rpc.html#get-chains-chain-id-blocks)
Listing operations in blocks: /chains/main/blocks/<block_id>/operations/3 (https://tezos.gitlab.io/mainnet/api/rpc.html#get-block-id-operations-list-offset)
Getting the storage of a contract: /chains/main/blocks/<block_id>/context/contracts/<contract_id>/storage (https://tezos.gitlab.io/mainnet/api/rpc.html#get-block-id-context-contracts-contract-id-storage)
Making transactions or contract calls:
- Either call the tezos-client binary (easiest if running on a server).
- Or call the liquidity binary, as in liquidity file.liq --call ... (the private key must be in plain text, so this is not recommended for production servers).
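
The decision logic of the oracle can be sketched independently of the RPC layer. The OCaml model below is a simplification written for this explanation and is not the code of the crawler linked below: blocks are given as an in-memory list instead of being fetched from a node, the types call, operation and block are made up for the example, and the send function stands in for the actual call to the finish entry point. Note also that the standard Random module is not a cryptographically secure generator.

```ocaml
(* Hypothetical, simplified view of what the oracle reads from a block. *)
type call = Play | Finish | Fund

type operation = {
  destination : string; (* address of the called contract *)
  applied : bool;       (* true when the operation was successful *)
  call : call;          (* entry point, decoded from the parameters *)
}

type block = { level : int; operations : operation list }

(* True when the block contains a successful call to play on [contract],
   i.e. when the contract is now awaiting a random number. *)
let awaits_random_number ~(contract : string) (b : block) : bool =
  List.exists
    (fun op -> op.applied && op.destination = contract && op.call = Play)
    b.operations

(* For each block in which a game was started, draw a number between 0
   and 100 and hand it to [send] (a stand-in for the call to finish). *)
let run_oracle ~(contract : string) ~(send : int -> unit)
    (blocks : block list) : unit =
  List.iter
    (fun b ->
      if awaits_random_number ~contract b then send (Random.int 101))
    blocks
```

A real oracle would replace the list of blocks with a loop over the monitoring RPC and wait for confirmation between iterations.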

An implementation of a random number Oracle in OCaml (which uses the liquidity client to make transactions) can be found in this repository: https://github.com/OCamlPro/liq_game/blob/master/src/crawler.ml.

Try a version on the mainnet

This contract is deployed on the Tezos mainnet at the following address: KT1GgUJwMQoFayRYNwamRAYCvHBLzgorLoGo, with the minor difference that the contract refunds 1 μtz if the player loses, to give some sort of feedback. You can try your luck by sending transactions (with a non-zero amount) with a parameter of the form Left (Pair 99 "tz1LWub69XbTxdatJnBkm7caDQoybSgW4T3s") where 99 is the number you want to play and tz1LWub69XbTxdatJnBkm7caDQoybSgW4T3s is your refund address. You can do so by using either a wallet that supports passing parameters with transactions (like Tezbox) or the command line Tezos client:

tezos-client transfer 10 from my_account to KT1GgUJwMQoFayRYNwamRAYCvHBLzgorLoGo --fee 0 --arg 'Left (Pair 50 "tz1LWub69XbTxdatJnBkm7caDQoybSgW4T3s")'

Remarks

In this game, the oracle must be trusted, which means that it is in a position to cheat. To mitigate this drawback, the oracle can be used as a random number generator for several games, if random values are stored in an intermediate contract.
If the oracle looks for events in the last baked block (head), then it is possible that the current chain will be discarded and that the random number transaction appears in another chain. In this case, a player who sees this happen can play another game with a chosen number if she sees the random number in the mempool. In practice, the oracle operation is created only on the branch where the first player started, so that this operation cannot be put on another branch, removing any risk of attack.

Footnotes

¹ Some contracts on Ethereum use block hashes as sources of randomness but these are easily manipulated by miners so they are not safe to use. There are also ways to have participants contribute parts of a random number with enforceable commitments (https://github.com/randao/randao).
² The random number could technically be sent in the same block by monitoring the mempool, but it is not a good idea because the miner could reorder the transactions, which would make both of them fail, or worse, she could replace her bet accordingly once she sees a random number in her mempool.

Alain Mebsout: Alain is a senior engineer at OCamlPro. Alain was involved in Tezos early in 2017, participating in the design of the ICO infrastructure and in particular, the Bitcoin and Ethereum smart contracts. Since then, Alain has been developing the Liquidity language, compiler and online editor, and has started working on the verification of Liquidity smart contracts. Alain also contributed some code in the Tezos node to improve Michelson. Alain holds a PhD in Computer Science on formal verification of programs.

Comments

Luiz Milfont (14 December 2018 at 17 h 21 min):

Hello Mr. Alain Mebsout. My name is Milfont and I am the author of TezosJ SDK library, that allows to interact with Tezos blockchain through Java programming language.I did’t know this game before and got interested. I wonder if you would like me to create an Android version of your game, that would be an Android APP that would create a wallet automatically for the player and then he would pull a jackpot handle, sending the transaction with the parameters to your smart contract. I would like to know if you agree with this, and allow me to do it, using your already deployed game. Thanks in advance. Milfont. Twitter: @luizmilfont

michsell (1 October 2019 at 15 h 29 min):

Hello Alain,

I just played the game you designed, the problem is I cannot get any feedback even that 1utz for losing the game. Is the game retired? If so, can anyone help to remove it from tzscan dapps page: https://tzscan.io/dapps. Also, by any chance I may get the tezzies back…

Many thanks! Best regards, Michshell

opam 2.0.1 is out!

2018-10-24T09:05:17Z

We are pleased to announce the release of opam 2.0.1.

This new version contains mainly backported fixes, some platform-specific:

Cold boot for MacOS/CentOS/Alpine
Install checksum validation on MacOS
Archive extraction for OpenBSD now defaults to using gtar
Fix compilation of mccs on MacOS and Nix platforms
Do not use GNU-sed specific features in the release Makefile, to fix build on OpenBSD/FreeBSD
Cleaning to enable reproducible builds
Update configure scripts

And some opam specific:

git: fix git fetch by sha1 for git < 2.14
linting: add test variable warning and empty description error
upgrade: convert pinned but not installed opam files
error reporting: more comprehensible error message for tar extraction, and upgrade of git-url compilers
opam show: upgrade given local files
list: since in opam 2.0.0 list doesn't return a non-zero code when the list is empty, add a --silent option that produces no output and returns 1 if the list is empty

Installation instructions (unchanged):

From binaries: run

sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)

or download manually from the Github "Releases" page to your PATH. In this case, don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed.

From source, using opam:

opam update; opam install opam-devel

(then copy the opam binary to your PATH as explained, and don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed)

From source, manually: see the instructions in the README.

We hope you enjoy this new major version, and remain open to bug reports and suggestions.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

OCamlPro’s TzScan grant proposal accepted by the Tezos Foundation – joint press release

2018-10-17T09:05:17Z

Tezos Foundation and OCamlPro joint press release - October 17, 2018

We are pleased to announce that the Tezos Foundation has issued a grant to OCamlPro to support its work on TzScan, a block explorer for the Tezos blockchain that will be made open-source.

OCamlPro is a French company and R&D lab, focused on OCaml and blockchain development. OCamlPro, which is an active community member and contributor to Tezos, has initiated several Tezos-related projects such as TzScan and Liquidity, a high-level programming language for creating smart contracts in Tezos, which comes with an online editor, compiler and debugger, and features a decompiler to audit Michelson contracts.

Open-source block explorers are a key component of a blockchain ecosystem, allowing users to more easily monitor transactions, network validators (“bakers”), and the health of a network. OCamlPro will also provide documentation on Tezos and continue to improve the TzScan API, which may be used by applications such as wallets.

The Tezos Foundation’s core mission is to support the long-term success of the Tezos protocol and ecosystem. By funding projects imagined by scientists, researchers, developers, entrepreneurs, and enthusiasts, the Foundation encourages decentralized development and robust participation.

More information here.

Curious about OCamlPro's blockchain activities?

OCamlPro is a French software company and R&D lab, born in 2011 and located in Paris and Essonne. We are dedicated to improving the quality of software, through the use of formal methods, and we promote the use of OCaml, a fast and expressive, statically typed state-of-the-art programming language, matured for more than 30 years in the French public research lab Inria.

Since 2014, OCamlPro has been involved in the Tezos project, helping with the Tezos protocol design and developing the prototype of Tezos, later to become the official Tezos software. In 2017, OCamlPro developed the ICO infrastructure for Tezos, including Bitcoin and Ethereum smart-contracts. OCamlPro self-funded two big projects around Tezos:

The TzScan block-explorer for Tezos: TzScan provides many features specific to Tezos delegated proof-of-stake protocol, to make life easier for bakers. TzScan API can be used by applications, such as wallets and delegation services to provide additional information to their users.
The Liquidity language for Tezos smart-contracts. Liquidity is a programming language, compiled to Michelson. Its online editor can be used to write, deploy, run and debug smart contracts. It also features a decompiler from Michelson, that can be used to audit contracts written in other languages.

In 2018, OCamlPro worked with the Tezos Foundation and Tezos Core Development team to prepare the launch of the betanet network, and later, the mainnet network.

With a team of 10 PhD-level developers working on Tezos, OCamlPro is one of the largest pools of knowledge on Tezos. OCamlPro can provide many services to the Tezos community: improvement of Tezos software, development of specific software, features and new protocols, training and consulting, and smart contract design, writing, and auditing. With tight connections with Inria and other French research labs and universities, OCamlPro is also involved in several research projects, related to blockchains or formal methods:

Formal methods: OCamlPro is involved in collaborative projects with academic and industrial partners to develop tools for software verification, such as the Alt-Ergo SMT Solver (from LRI).
OCaml tooling: we help optimize OCaml (flambda) and design development tools for OCaml (open-source most of the time). Such tools range from command-line tools (such as OPAM or ocp-build), or GUI tools (the OCaml Memory Profiler), to web-based tools (TryOCaml, the OCaml MOOC with the learn-OCaml platform of the OCaml Foundation of Inria).

opam 2.0.0 release and repository upgrade

2018-09-19T09:05:17Z

We are happy to announce the final release of opam 2.0.0.

A few weeks ago, we released a last release candidate to be later promoted to 2.0.0, synchronised with the opam package repository upgrade.

You are encouraged to update as soon as you see fit, to continue to get package updates: opam 2.0.0 supports the older formats, and 1.2.2 will no longer get regular updates. See the Upgrade Guide for details about the new features and changes.

The website opam.ocaml.org has been updated, with the full 2.0.0 documentation pages. You can still find the documentation for the previous versions in the corresponding menu.

Package maintainers should be aware of the following:

the master branch of the opam package repository is now in the 2.0.0 format
package submissions must accordingly be made in the 2.0.0 format, or using the new version of opam-publish (2.0.0)
anything that was merged into the repository in 1.2 format has been automatically updated to the 2.0.0 format
the 1.2 format repository has been forked to its own branch, and will only be updated for critical fixes

For custom repositories, the advice remains the same.

Installation instructions (unchanged):

From binaries: run

sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)

or download manually from the Github "Releases" page to your PATH. In this case, don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed.

From source, using opam:

opam update; opam install opam-devel

(then copy the opam binary to your PATH as explained, and don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed)

From source, manually: see the instructions in the README.

We hope you enjoy this new major version, and remain open to bug reports and suggestions.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

Last stretch! Repository upgrade and opam 2.0.0 roadmap

2018-08-02T09:05:17Z

A few days ago, we released opam 2.0.0~rc4, and explained that this final release candidate was expected to be promoted to 2.0.0, in sync with an upgrade to the opam package repository. So here are the details about this!

If you are an opam user, and don't maintain opam packages

You are encouraged to upgrade as soon as comfortable, and get used to the changes and new features
All packages installing in opam 1.2.2 should exist and install fine on 2.0.0~rc4 (if you find one that doesn't, please report!)
If you haven't updated by September 17th, the amount of updates and new packages you receive may become limited¹.

So what will happen on September 17th?

Opam 2.0.0~rc4 gets officially released as 2.0.0
On the ocaml/opam-repository Github repository, a 1.2 branch is forked, and the 2.0.0 branch is merged into the master branch
From then on, pull-requests to ocaml/opam-repository need to be in 2.0.0 format. Fixes to the 1.2 repository can be merged if important: pulls need to be requested against the 1.2 branch in that case.
The opam website shows the 2.0.0 repository by default (https://opam.ocaml.org/2.0-preview/ becomes https://opam.ocaml.org/)
The http repositories for 1.2 and 2.0 (as used by opam update) are accordingly moved, with proper redirections put in place

Advice for package maintainers

Until September 17th, pull-requests filed to the master branch of ocaml/opam-repository need to be in 1.2.2 format
The CI checks for all PRs ensure that the package passes on both 1.2.2 and 2.0.0. After the 17th of September, only 2.0.0 will be checked (and 1.2.2 only if relevant fixes are required).
The 2.0.0 branch of the repository will contain the automatically updated 2.0.0 version of your package definitions
You can publish 1.2 packages while using opam 2.0.0 by installing opam-publish.0.3.5 (running opam pin opam-publish 0.3.5 is recommended)
You should only need to keep an opam 1.2 installation for more complex setups (multiple packages, or if you need to be able to test the 1.2 package installations locally). In this case you might want to use an alias, e.g. alias opam.1.2="OPAMROOT=$HOME/.opam.1.2 ~/local/bin/opam.1.2". You should also probably disable opam 2.0.0's automatic environment update in that case (opam init --disable-shell-hook)
opam-publish.2.0.0~beta has a fully revamped interface, and many new features, like filing a single PR for multiple packages. It files pull-requests in 2.0 format only, however. At the moment, it will file PRs only to the 2.0.0 branch of the repository, but pushing 1.2 format packages to master is still preferred until September 17th.
It is also advised to keep in-source opam files in 1.2 format until that date, so as not to break uses of opam pin add --dev-repo by opam 1.2 users. The small opam-package-upgrade plugin can be used to upgrade single 1.2 opam files to 2.0 format.
ocaml-ci-script already switched to opam 2.0.0. To keep testing opam 1.2.2, you can set the variable OPAM_VERSION=1.2.2 in the .travis.yml file.

Advice for custom repository maintainers

The opam admin upgrade command can be used to upgrade your repository to 2.0.0 format. We recommend using it, as otherwise clients using opam 2.0.0 will do the upgrade locally every time. Add the option --mirror to continue serving both versions, with automatic redirects.
It's up to you to decide when/if you want to switch your base repository to 2.0.0 format. You'll benefit from many new possibilities and safety features, but that will exclude users of earlier opam versions, as there is no backwards conversion tool.

¹ Sorry for the inconvenience. We'd be happy if we could keep maintaining the 1.2.2 repository for more time; repository maintainers are doing an awesome job, but just don't have the resources to maintain both versions in parallel.

opam 2.0.0 RC4-final is out!

2018-07-26T09:05:17Z

We are happy to announce the opam 2.0.0 final release candidate! 🍾

This release features a few bugfixes over Release Candidate 3. It will be promoted to 2.0.0 proper within a few weeks, when the official repository format switches from 1.2.0 to 2.0.0. After that date, updates to the 1.2.0 repository may become limited, as new features are getting used in packages.

It is safe to update as soon as you see fit, since opam 2.0.0 supports the older formats. See the Upgrade Guide for details about the new features and changes. If you are a package maintainer, you should keep publishing as before for now: the roadmap for the repository upgrade will be detailed shortly.

The opam.ocaml.org pages have also been refreshed a bit, and the new version showing the 2.0.0 branch of the repository is already online at https://opam.ocaml.org/2.0-preview/ (report any issues here).

Installation instructions:

From binaries: run

sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)

or download manually from the Github "Releases" page to your PATH. In this case, don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed.

From source, using opam:

opam update; opam install opam-devel

(then copy the opam binary to your PATH as explained, and don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed)

From source, manually: see the instructions in the README.

We hope you enjoy this new version, and remain open to bug reports and suggestions.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

OCamlPro’s Tezos block explorer TzScan’s last updates

2018-07-20T09:05:17Z

OCamlPro is pleased to announce the latest update of TzScan (https://tzscan.io), its Tezos block explorer to ease the use of the Tezos network. TzScan is now ready for the protocol update scheduled for tomorrow. In addition to some minor bugfixes, the main novelties are:

Display of obtained and expected rewards
Addition of internal transactions of smart contracts
Addition of delegation services
Aliasing of known accounts and sponsors
UX improvements and faster navigation
Improvements on desktop, tablet and mobile

We continue to maintain the alphanet and zeronet branches in parallel with the betanet.

We keep on working hard to improve and add new features to TzScan. Further enhancements and optimizations are to come. Enjoy and play with our explorer! If you have any suggestions or bugs to report, please notify us at contact@tzscan.io

opam 2.0.0 Release Candidate 3 is out!

2018-06-22T09:05:17Z

We are pleased to announce the release of a third release candidate for opam 2.0.0. This one is expected to be the last before 2.0.0 comes out.

Changes since the 2.0.0~rc2 are, as expected, mostly fixes. We deemed it useful, however, to bring in the following:

a new command opam switch link that allows selecting a switch to be used in a given directory (particularly convenient if you use the shell hook for automatic opam environment update)
a new option opam install --assume-built, that allows installing a package using its normal opam procedure, but for a source repository that has been built by hand. This fills a gap that remained in the local development workflows.

The preview of the opam 2 webpages can be browsed at http://opam.ocaml.org/2.0-preview/ (please report issues here).

Installation instructions (unchanged):

From binaries: run

sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)

or download manually from the Github "Releases" page to your PATH. In this case, don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed.

From source, using opam:

opam update; opam install opam-devel

(then copy the opam binary to your PATH as explained, and don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed)

From source, manually: see the instructions in the README.

Thanks a lot for testing out this new RC and reporting any issues you may find.

opam 2.0.0 Release Candidate 2 is out!

2018-05-22T09:05:17Z

We are pleased to announce the release of a second release candidate for opam 2.0.0.

This new version brings us very close to a final 2.0.0 release, and in addition to many fixes, features big performance enhancements over the RC1.

Among the new features, we have squeezed in full sandboxing of package commands for both Linux and macOS, to protect our users from any misbehaving scripts.

NOTE: if upgrading manually from 2.0.0~rc, you need to run opam init --reinit -ni to enable sandboxing.

The new release candidate also offers the possibility to set up a hook in your shell, so that you won't need to run eval $(opam env) anymore. This is especially useful in combination with local switches, because with it enabled, you are guaranteed that running make from a project directory containing a local switch will use it.

The documentation has also been updated, and a preview of the opam 2 webpages can be browsed at http://opam.ocaml.org/2.0-preview/ (please report issues here). This provides the list of packages available for opam 2 (the 2.0 branch of opam-repository), including the compiler packages.

Installation instructions:

From binaries: run

sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)

or download manually from the Github "Releases" page to your PATH. In this case, don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed.

From source, using opam:

opam update; opam install opam-devel

(then copy the opam binary to your PATH as explained, and don't forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed)

From source, manually: see the instructions in the README.

Thanks a lot for testing out this new RC and reporting any issues you may find.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

Release of Alt-Ergo 2.2.0

2018-04-23T09:05:17Z

A new release of Alt-Ergo (version 2.2.0) is available.

You can get it from Alt-Ergo's website. An OPAM package for it will be published in the next few days.

The major novelty of this release is a new experimental front-end that supports the SMT-LIB 2 language, extended with prenex polymorphism. This extension is implemented as a standalone library, and is available here: https://github.com/Coquera/psmt2-frontend

The full list of CHANGES is available here. As usual, do not hesitate to report bugs, to ask questions, or to give your feedback!

Taskforce on the Tezos Protocol, and TzScan evolution

2018-04-13T09:05:17Z

As we are preparing to work on the Tezos Protocol, we're still actively keeping the pace on the block explorer TZScan.io, adding cool information for baking accounts. We'd like to allow people to see who is contributing to the network and to understand the distribution of rolls, rights, etc.

For starters, we are showing the roll balance used for baking in the current cycle and the rolls history of a baker.

https://tzscan.io/tz1MqVR7hnZwH1FoQ7swjamanNxrXtNVAQ7v?default=baking

Enjoy, more to come in the next weeks!

OCaml JTRT

2018-04-01T09:05:17Z

This time of the year is, just like Christmas time, a time for laughs and magic... although the magic we are talking about, in the OCaml community, is not exactly nice, nor beautiful. Let's say that we are somehow akin to many religions: we know magic does exist, but that it is satanic and shouldn't be introduced to children.

Introducing Just The Right Time (JTRT)

Let me first introduce you to the concept of 'Just The Right Time' [1]. JTRT is somehow a 'Just In Time' compiler, but one that runs at the right time, not at some random moment decided by a contrived heuristic.

How does the compiler know when that specific good moment occurs? Well, it doesn't, and that's the point: you certainly know far better. In the OCaml world, we like good performance, like anyone else, but we prefer predictable performance to performance that may sometimes be awesome, and sometimes really slow. And we are ready to trade off some annotations for better predictability (or is it just me trying to give the impression that my opinion is everyone's opinion...). Don't forget that OCaml is a compiled language; hence the average generated code is good enough. Runtime compilation only matters for some subtle situations where a pattern gets repeated a lot, and you don't know about that pattern before receiving some inputs.

Of course the tradeoff wouldn't be the same in Javascript if you had to write something like that to get your code to perform decently.

function fact(n) {
   "compile this";
   if (n == 0) {
      "compile this too";
      return 1
   } else {
      "Yes, I really want to compile that";
      return (n * fact(n - 1));
   }
 }

The magical `this_is_the_right_time` function

There are already nice tools for doing that in OCaml. In particular, you should look at metaocaml, which is an extension of the language that has been maintained for years. But it requires you to think a bit about what your program is doing and add a few types, here and there.

Fortunately, today is the day you may want to try this ugly weekend hack instead.

To add a bit of context, let's say there are 1/ the Dirty Little Tricks, and 2/ the Other Kind of Ugly Hacks. We are presenting one of the latter; the kind of hacks for which you are both ashamed and a bit proud (but you should really be a lot more ashamed). I've made quite a few of those, and this one would probably rank well among the top 5 (and I'm deeply sorry about the other ones that are still in production somewhere...).

This is composed of two parts: a small compiler patch, and a runtime library. That library only exposes the following single function:

val this_is_the_right_time : 'a -> 'a

Let's take an example:

let f x =
  let y = x + x in
  let g z = z * y in
  g

let multiply_by_six = f 3

You can 'optimize' it by changing it to:

let f x =
  let y = x + x in
  let g z = z * y in
  g

let multiply_by_six = this_is_the_right_time (f 3)

That's all. By stating that this is the right time, you told the compiler to take that function and do its magic on it.
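The snippet above needs the patched compiler to do anything interesting. As a minimal sketch, assuming an identity stand-in for this_is_the_right_time (it is not the real runtime library), the example can still be compiled with a stock OCaml. It only illustrates the contract: whatever the magic does, the result must behave like the original closure, which here means behaving like a hand-specialized fun z -> z * 6. The name multiply_by_six_by_hand is ours.

```ocaml
(* Hypothetical stand-in: the real function requires the patched compiler.
   An identity function has the same type, 'a -> 'a, and the same observable
   behaviour, minus the optimization. *)
let this_is_the_right_time (x : 'a) : 'a = x

let f x =
  let y = x + x in
  let g z = z * y in
  g

let multiply_by_six = this_is_the_right_time (f 3)

(* What we would hope the runtime specialization produces: y is known to be 6. *)
let multiply_by_six_by_hand z = z * 6

let () =
  (* Both versions must agree on every input. *)
  List.iter
    (fun z -> assert (multiply_by_six z = multiply_by_six_by_hand z))
    [ -2; 0; 1; 7 ]
```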

How *the f**k* does that work?!

The compiler patch is quite simple. It adds to every function some annotation to allow the compiler to know enough things about it. (It is annotated with its representation in the Flambda IR.) This is just a partial dump of the compiler memory state when transforming the Flambda IR to clambda. I tried to do it in some more 'disciplined' way (it used some magic to traverse the compiler internal memory representation to create a static version of it in the binary), but 'ld' was not so happy linking a ~500MB binary. So I went the 'marshal' way.

This now means that at runtime the program can find the representation of the closures. To give an example of the kind of code you really shouldn't write, here is the magic invocation to retrieve that representation:

let extract_representation_from_closure (value:'a)
                                 : Flambda.set_of_closures =
   let obj = Obj.repr value in
   let size = Obj.size obj in
   let id = Obj.obj (Obj.field obj (size - 2)) in
   let marshalled : string = Obj.obj (Obj.field obj (size - 1)) in
   (Marshal.from_string marshalled 0).(id)

With that, we now know the layout of the closure and we can extract all the variables that it binds. We can further inspect the value of those bound variables, and build an IR representation for them. That's the nice thing about having an untyped IR, you can produce some even when you lost the types. It will just probably be quite wrong, but who cares...

Now that we know everything about our closure, we can rebuild it, and so will we. As we can't statically build a non-closed function (the flambda IR happens after closure conversion), we will instead build a closed function that allocates the closure for us. For our example, it would look like this:

let build_my_closure previous_version_of_the_closure =
   let closure_field_y = previous_version_of_the_closure.y in
   fun z -> z * 6 (* 6 is the known value of closure_field_y *)

In that case the function that we are building is closed, so we don't need the old closure to extract its field. But this shows the generic pattern. This would be used like that:

let this_is_the_right_time optimize_this =
   let ir_version = extract_representation_from_closure optimize_this in
   let build_my_closure = magic_building_function ir_version in
   build_my_closure optimize_this

I won't go too much into the details of the magic_building_function, because it would be quite tedious. Let's just say that it is using mechanisms provided for the native toplevel of OCaml.

A more sensible example

To finish on something a bit more interesting than multiply_by_six, let's suppose that we designed a super nice language whose AST and evaluator are:

type expr =
 | Add of expr * expr
 | Const of int
 | Var

let rec eval_expr expr x =
  match expr with
  | Add (e1, e2) -> eval_expr e1 x + eval_expr e2 x
  | Const i -> i
  | Var -> x

But we want to optimize it a bit, and hence wrote a super powerful pass:

let rec optimize expr =
   match expr with
   | Add (Const n1, Add (e, Const n2)) -> Add (Const (n1 + n2), optimize e)
   | Add (e1, e2) -> Add (optimize e1, optimize e2)
   | _ -> expr

The user writes some expression, that gets parsed to Add (Const 11, Add (Var, Const 22)), it goes through optimizing and results in Add (Const 33, Var). Then you find that this looks like the right time.
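To check that claim, the two snippets above can be put together in one file. This is only a sketch for experimenting: user_ast is the name the next snippet uses for the parsed expression, and the input value 9 is an arbitrary choice of ours.

```ocaml
type expr =
  | Add of expr * expr
  | Const of int
  | Var

let rec eval_expr expr x =
  match expr with
  | Add (e1, e2) -> eval_expr e1 x + eval_expr e2 x
  | Const i -> i
  | Var -> x

let rec optimize expr =
  match expr with
  | Add (Const n1, Add (e, Const n2)) -> Add (Const (n1 + n2), optimize e)
  | Add (e1, e2) -> Add (optimize e1, optimize e2)
  | _ -> expr

(* The expression from the text, as it would come out of the parser. *)
let user_ast = Add (Const 11, Add (Var, Const 22))

let () =
  (* The pass folds the two constants together... *)
  assert (optimize user_ast = Add (Const 33, Var));
  (* ...without changing what the expression evaluates to. *)
  assert (eval_expr user_ast 9 = eval_expr (optimize user_ast) 9)
```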

let optimized =
  this_is_the_right_time
    (fun x -> (eval_expr (optimize user_ast) x))

Annnnd... nothing happens. The reason being that there is no way to distinguish between mutable and immutable values at runtime, hence the safe assumption is to assume that everything is mutable, which limits optimizations a lot. So let's enable the 'special' mode:

incorrect_mode := true

And MAGIC happens! The code that gets spat out is exactly what we want (that is, fun x -> 33 + x).
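For comparison, a similar effect can be approximated without any magic by staging the evaluator by hand. The sketch below uses a different, well-known technique (compiling the AST to OCaml closures once, then calling the result many times); it is not what the compiler patch does, and compile is a hypothetical name of ours. It does not produce the literal code fun x -> 33 + x, but it does avoid traversing the AST on every call.

```ocaml
type expr =
  | Add of expr * expr
  | Const of int
  | Var

(* Walk the AST once and return a closure; the pattern matching happens at
   "compile" time, not each time the resulting function is applied. *)
let rec compile (expr : expr) : int -> int =
  match expr with
  | Const i -> fun _ -> i
  | Var -> fun x -> x
  | Add (e1, e2) ->
    let f1 = compile e1 in
    let f2 = compile e2 in
    fun x -> f1 x + f2 x

(* The optimized expression from the text. *)
let optimized = compile (Add (Const 33, Var))

let () = assert (optimized 9 = 33 + 9)
```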

Conclusion

Just so that you know, I don't really recommend using it. It's buggy, and many details are left unresolved (I suspect that the names you would come up with for that kind of details would often sound like 'segfault'). Flambda was not designed to be used that way. In particular, there are some invariants that must be maintained, like the uniqueness of variables and functions... that we completely disregarded. That led to some 'funny' behaviors (like power 2 8 returning 512...). It is possible to do that correctly, but that would require far more than a few hours' hacking. This might be a lot easier with the upcoming version of Flambda.

So this is far from ready, and it's not going to be anytime soon (supposing that this is a good idea, which I'm still not convinced it is).

But if you still want to play with it: the sources are available.

[1] Not that it exists in the real world.

Release of Alt-Ergo 2.1.0

2018-03-15T09:05:17Z

A new release of Alt-Ergo (version 2.1.0) is available on Alt-Ergo's website: https://alt-ergo.ocamlpro.com/#releases. An OPAM package for it will be published soon.

In this release, we mainly improved the CDCL-based SAT solver to get performances similar to/better than the old Tableaux-like SAT. The CDCL solver is now the default Boolean reasoner. The full list of CHANGES is available here.

Despite our various tests, you may still encounter some issues with this new solver. Please, don't hesitate to report bugs, ask questions, and give your feedback!

New updates on TzScan

2018-03-14T09:05:17Z

Update - TZScan.io can now work on top of the zeronet (zeronet.tzscan.io), we hope it can help the developers community monitor the network. You can now switch between the alphanet & zeronet networks!

OCamlPro is pleased to announce an update of TzScan (https://tzscan.io), its Tezos block explorer to ease the use of the Tezos network.

In addition to some minor bugfixes, the main novelties are:

Health of the network with stats about the blocks, endorsements, bakers, etc.
Display of future baker’s rights in the current cycle
For each account, a more detailed balance including the bonds, rewards, fees, etc. for the current cycle and its future baking positions
A new feature to inject signed operations in the network
In the detailed block’s view, all blocks are displayed at the same level in alternative chains
UI improvements on desktop, tablet and mobile

We are still working hard trying to improve and add new features to TzScan. Further enhancements and optimizations are to come. Enjoy and play with our explorer.

If you have suggestions or bugs, please send us reports at contact@tzscan.io

Release of a first version of TzScan, a Tezos block explorer

2018-02-14T09:05:17Z

OCamlPro is proud to release a first version of TzScan, its Tezos block explorer to ease the use of the Tezos network.

What TzScan can do for you:

Several charts on blocks, operations, network, volumes, fees, and more,
Marketcap and Futures/IOU prices from coinmarket.com,
Blocks, operations, accounts and contracts detail pages,
Public API to get information about blocks, operations, accounts and more,
Documentation on different concepts of Tezos like Endorsements, Nonces, etc.

With TzScan, we tried to present the Tezos network differently, to give a better understanding of what is really going on by highlighting the main points of Proof of Stake. Further enhancements and optimizations are to come, but in the meantime enjoy and play with our explorer.

If you have suggestions or bugs, please send us reports at contact@tzscan.io !

OCamlPro’s Liquidity-lang demo at JFLA2018 – a smart-contract design language

2018-02-08T09:05:17Z

As a tradition, we took part in this year's Journées Francophones des Langages Applicatifs (JFLA 2018) that was chaired by LRI's Sylvie Boldo and hosted in Banyuls the last week of January. That was a nice opportunity to present a live demo of a multisignature smart-contract entirely written in the Liquidity language designed at OCamlPro, and deployed live on the Tezos alphanet (the slides are now available, see at the end of the post).

Tezos is the only blockchain to use a strongly typed, functional language, with a formal semantics and an interpreter validated by the use of GADTs (generalized algebraic data types). This stack-based language, named Michelson, is somewhat tricky to use as-is: the absence of variables (among other things) forces the programmer to manipulate the stack directly. For this reason, we have developed, starting in June 2017, a higher level language, Liquidity, implementing the type system of Michelson in a subset of OCaml.

In addition to the compiler, which compiles Liquidity programs to Michelson ones, we have developed a decompiler which, from Michelson code, can recover a Liquidity version that is much easier to look at and understand (for humans). This tool is of some significance considering that contracts will be stored on the blockchain in Michelson format: it makes them more approachable and understandable for end users.

To facilitate designing contracts and foster Liquidity adoption we have also developed a web application. This app offers somewhat bare-bone editors for Liquidity and Michelson, allows compilation in the browser directly, deployment of Liquidity contracts and interaction with them (using the Tezos alphanet).

This blog post presents these different tools in more detail.

Michelson

Michelson is a stack-based, functional, statically and strongly typed language. It comes with a set of built-in base types like strings, Booleans, unbounded integers and naturals, lists, pairs, option types, union (of two) types, sets, maps. There are also a number of domain-dependent types like amounts (in tezzies), cryptographic keys and signatures, dates, etc. A Michelson program consists of a structured sequence of instructions, each of which operates on the stack. The program takes as inputs a parameter as well as a storage and returns a result and a new value for the storage. Programs can fail at runtime with the instruction FAIL, or with another error (call of a failing contract, out of gas, etc.), but most instructions that could fail return an option instead (e.g. EDIV returns None when dividing by zero).

The following example is a smart contract which implements a voting system on the blockchain. The storage consists of a map from possible votes (as strings) to integers counting the number of votes. A transaction to this contract must be made with an amount (accessible with the instruction AMOUNT) greater than or equal to 5 tezzies and a parameter which is a valid vote. If one of these conditions is not met, the execution, and thus the transaction, fails. Otherwise the program retrieves the previous number of votes in the storage and increments it. At the end of the execution, the stack contains the pair composed of the value Unit and the updated map (the new storage).

parameter string;
storage (map string int);
return unit;
code
  { # Stack = [ Pair parameter storage ]
    PUSH tez "5.00"; AMOUNT; COMPARE; LT;
    IF # Is AMOUNT < 5 tz ?
      { FAIL }
      {
        DUP; DUP; CAR; DIP { CDR }; GET; # GET parameter storage
        IF_NONE # Is it a valid vote ?
          { FAIL }
          { # Some x, x now in the stack
            PUSH int 1; ADD; SOME; # Some (x + 1)
            DIP { DUP; CAR; DIP { CDR } }; SWAP; UPDATE;
            # UPDATE parameter (Some (x + 1)) storage
            PUSH unit Unit; PAIR; # Pair Unit new_storage
          }
      };
  }

Michelson has several specificities:

Typing a Michelson program is done by type propagation, and not à la Milner. Polymorphic types are forbidden and type annotations are required when a type is ambiguous (e.g. empty list).
Functions (lambdas) are pure and are not closures, i.e. they must have an empty environment. For instance, a function passed to another contract as parameter acts in a purely functional way, only accessing the environment of the new contract.
Method calls are performed with the instruction TRANSFER_TOKENS: it requires an empty stack (not counting its arguments). It takes as argument the current storage, saves it before the call is made, and finally returns it after the call together with the result. This forces developers to save anything worth saving in the current storage, while keeping in mind that a reentrant call can happen (the returned storage might be different).

We won't explain the semantics of Michelson here; a good one in big-step form is available here.

The Liquidity Language

Liquidity is also a functional, statically and strongly typed language that compiles down to the stack-based language Michelson. Its syntax is a subset of OCaml and its semantics is given by its compilation schema (see below). By making the choice of staying close to Michelson in spirit while offering higher level constructs, Liquidity makes it easy to write legible smart contracts with the same safety guarantees offered by Michelson. In particular we decided that it was important to keep the purely functional aspect of the language so that simply reading a contract is not obscured by effects and global state. In addition, the OCaml syntax makes Liquidity an immediately accessible tool to programmers who already know OCaml while its limited span makes the learning curve not too steep.

The following example is a Liquidity version of the vote contract. Its inner workings are rather obvious for anyone who has already programmed in an ML-like language.

[%%version 0.15]

type votes = (string, int) map

let%init storage (myname : string) =
  Map.add myname 0 (Map ["ocaml", 0; "pro", 0])

let%entry main
    (parameter : string)
    (storage : votes)
  : unit * votes =

  let amount = Current.amount() in

  if amount < 5.00tz then
    Current.failwith "Not enough money, at least 5tz to vote"
  else
    match Map.find parameter storage with
    | None -> Current.failwith "Bad vote"
    | Some x ->
        let storage = Map.add parameter (x+1) storage in
        ( (), storage )

A Liquidity contract starts with an optional version meta-information. The compiler can reject the program if it is written in a version of the language that is too old, or if the compiler itself is not recent enough. Then comes a set of type and function definitions. It is also possible to specify an initial storage (constant, or a non-constant storage initializer) with let%init storage. Here we define a type abbreviation votes for a map from strings to integers. It is the structure that we will use to store our vote counts.

The storage initializer creates a map containing two bindings, "ocaml" to 0 and "pro" to 0 to which we add another vote option depending on the argument myname given at deploy time.

The entry point of the program is a function main defined with a special annotation let%entry. It takes as arguments a call parameter (parameter) and a storage (storage) and returns a pair whose first element is the result of the call, and second element is a potentially modified storage.

The above program defines a local variable amount which contains the amount of the transaction which generated the call. It checks that it is at least 5 tezzies. If not, we fail with an explanatory message. Then the program retrieves the number of votes for the chosen option given as parameter. If the vote is not a valid one (i.e., there is no binding in the map), execution fails. Otherwise, the current number of votes is bound to the name x. Storage is updated by incrementing the number of votes for the chosen option. The built-in function Map.add adds a new binding (here, it replaces a previously existing binding) and returns the modified map. The program terminates, in the normal case, on its last expression, which is its returned value: a pair containing () (the contract only modifies the storage) and the storage itself.
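For readers who want to experiment without a Tezos node, the logic of the contract can be modelled in plain OCaml. This is only a sketch under our own assumptions: the amount is passed explicitly as a float, failures are modelled by a hypothetical Contract_failure exception, and Votes, init_storage and main are ordinary OCaml definitions, not the Liquidity primitives.

```ocaml
module Votes = Map.Make (String)

(* Assumption: a contract failure is modelled as an OCaml exception. *)
exception Contract_failure of string

(* Counterpart of the storage initializer: two fixed options plus one
   chosen at deploy time. *)
let init_storage (myname : string) : int Votes.t =
  Votes.empty |> Votes.add "ocaml" 0 |> Votes.add "pro" 0 |> Votes.add myname 0

(* Counterpart of the entry point: returns the call result and the new storage. *)
let main ~(amount : float) (parameter : string) (storage : int Votes.t)
  : unit * int Votes.t =
  if amount < 5.00 then
    raise (Contract_failure "Not enough money, at least 5tz to vote")
  else
    match Votes.find_opt parameter storage with
    | None -> raise (Contract_failure "Bad vote")
    | Some x -> ((), Votes.add parameter (x + 1) storage)

let () =
  let storage = init_storage "liquidity" in
  let (), storage = main ~amount:5.0 "ocaml" storage in
  (* Only the chosen option was incremented. *)
  assert (Votes.find "ocaml" storage = 1);
  assert (Votes.find "pro" storage = 0)
```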

A reference manual for Liquidity is available here. It gives a relatively complete overview of the available types, built-in functions and constructs of the language.

Compilation

Encodings

Because Liquidity is a lot richer than Michelson, some types and constructs must be simplified or encoded. Record types are translated to right-associated pairs with as many components as the record has fields. t1 is encoded as t1' in the following example.

type t1 = { a: int; b: string; c: bool}
type t1' = (int * (string * bool))
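The record encoding can be spelled out in plain OCaml. The sketch below is ours, not the compiler's code: encode, decode and the get_* helpers are hypothetical names, and they show how a field access turns into projections on the nested pairs.

```ocaml
type t1 = { a : int; b : string; c : bool }
type t1' = int * (string * bool)

(* A record becomes right-nested pairs, one component per field. *)
let encode (r : t1) : t1' = (r.a, (r.b, r.c))
let decode ((a, (b, c)) : t1') : t1 = { a; b; c }

(* Field accesses become projections on the encoded value. *)
let get_a (p : t1') : int = fst p
let get_b (p : t1') : string = fst (snd p)
let get_c (p : t1') : bool = snd (snd p)

let () =
  let r = { a = 1; b = "two"; c = true } in
  assert (get_b (encode r) = r.b);
  assert (decode (encode r) = r)
```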

Field accesses in a record are translated to accesses in the corresponding tuples (pairs). Sum (or union) types are translated using the built-in variant type (this is the or type in Michelson). t2 is encoded as t2' in the following example.

type ('a, 'b) variant = Left of 'a | Right of 'b

type t2 = A of int | B of string | C
type t2' = (int, (string, unit) variant) variant

Similarly, pattern matching on expressions of a sum type is translated to nested pattern matchings on variant typed expressions. An example translation is the following:

match x with
| A i -> something1(i)
| B s -> something2(s)
| C -> something3

match x with
| Left i -> something1(i)
| Right r -> match r with
             | Left s -> something2(s)
             | Right () -> something3
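The same translation can be written as ordinary OCaml and checked on a few values. This is a sketch for illustration: encode, describe and describe' are hypothetical names, and the bodies stand in for something1, something2 and something3.

```ocaml
type ('a, 'b) variant = Left of 'a | Right of 'b

type t2 = A of int | B of string | C
type t2' = (int, (string, unit) variant) variant

(* Each constructor becomes a path of Left/Right in the nested variant. *)
let encode : t2 -> t2' = function
  | A i -> Left i
  | B s -> Right (Left s)
  | C -> Right (Right ())

(* Pattern matching on the source type... *)
let describe (x : t2) : string =
  match x with
  | A i -> "A " ^ string_of_int i
  | B s -> "B " ^ s
  | C -> "C"

(* ...and the nested matching obtained after translation. *)
let describe' (x : t2') : string =
  match x with
  | Left i -> "A " ^ string_of_int i
  | Right r ->
    (match r with
     | Left s -> "B " ^ s
     | Right () -> "C")

let () =
  List.iter
    (fun x -> assert (describe x = describe' (encode x)))
    [ A 3; B "vote"; C ]
```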

Liquidity also supports closures while Michelson only allows pure lambdas. Closures are translated by lambda-lifting, i.e. encoded as pairs whose first element is a lambda and second element is the closure environment. The resulting lambda takes as argument a pair composed of the closure's argument and environment. Adequate transformations are also performed for built-in functions that take lambdas as arguments (e.g. in List.map) to allow closures instead.
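The encoding of closures can be mimicked in OCaml itself. The following is a minimal sketch with hypothetical names (closure, apply, add_n, map_closure); it shows the shape of the translation, not the code the Liquidity compiler generates.

```ocaml
(* An encoded closure: a closed lambda taking (argument, environment),
   paired with the environment it needs. *)
type ('a, 'env, 'b) closure = (('a * 'env) -> 'b) * 'env

let apply ((code, env) : ('a, 'env, 'b) closure) (arg : 'a) : 'b =
  code (arg, env)

(* Source program:  let add_n n = fun x -> x + n
   The inner function captures n, so n goes into the environment. *)
let add_n (n : int) : (int, int, int) closure =
  ((fun (x, env_n) -> x + env_n), n)

(* A map that accepts an encoded closure instead of a plain function. *)
let rec map_closure (clos : ('a, 'env, 'b) closure) (l : 'a list) : 'b list =
  match l with
  | [] -> []
  | x :: rest ->
    let y = apply clos x in
    y :: map_closure clos rest

let () = assert (map_closure (add_n 10) [ 1; 2; 3 ] = [ 11; 12; 13 ])
```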

Compilation schema

This little section is a bit more technical, so if you don't care how Liquidity is compiled precisely, you can skip over to the next one.

We denote by Γ, [|x|]_d ⊢ X ↑^t the compilation of the Liquidity instruction x in environment Γ. Γ is a map associating variable names to a position in the stack. The compilation algorithm also maintains the size of the current stack (at compilation of instruction x), denoted by d in the previous expression. Below is a non-deterministic version of the compilation schema, the one implemented in the Liquidity compiler being a determinized version.

The result of compiling x is a Michelson instruction (or sequence of instructions) X together with a Boolean transfer information t. The instruction Contract.call (or TRANSFER_TOKENS in Michelson) needs an empty stack to evaluate, so the compiler empties the stack before translating this call. However, the various branches of a Michelson program must have the same stack type. This is why we need to maintain this information so that the compiler can empty stacks in some parts of the program.

Some of the rules have parts annotated with ?_b. This suffix denotes a potential reset or erasing. In particular:

For sets, Γ?_b is ∅ if b evaluates to false, and Γ otherwise.
For integers, d?_b is 0 if b evaluates to false, and d otherwise.
For instructions, (X)?_b is {} if b evaluates to false, and X otherwise.

For instance, by looking at rule CONST, we can see that compiling a Liquidity constant simply consists in pushing this constant on the stack. To handle variables in a simple manner, the rule VAR tells us to look in the environment Γ for the index associated to the variable we want to compile. Then, instruction D(U)ⁱP puts at the top of the stack a copy of the element present at depth i. Variables are added to Γ with the Liquidity instruction let ... in or with any instruction that binds a new symbol, like fun for instance.
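The CONST and VAR rules can be illustrated on a toy language. The sketch below is a simplification of ours, not the Liquidity compiler: it ignores the transfer information t, the instruction names (Push, Dup, Add_top, Drop_below) are hypothetical, and Γ is a plain association list mapping a variable to the stack size at the time it was bound.

```ocaml
type expr =
  | Const of int
  | Var of string
  | Add of expr * expr
  | Let of string * expr * expr

type instr =
  | Push of int   (* CONST: push a constant *)
  | Dup of int    (* like D(U)^iP: copy the element at depth i (1 = top) *)
  | Add_top       (* replace the two top elements by their sum *)
  | Drop_below    (* drop the element just under the top (end of a let) *)

(* env plays the role of Γ, d is the current stack size. Compiling any
   expression leaves exactly one more element on the stack. *)
let rec compile env d = function
  | Const n -> [ Push n ]
  | Var x -> [ Dup (d - List.assoc x env + 1) ]
  | Add (e1, e2) -> compile env d e1 @ compile env (d + 1) e2 @ [ Add_top ]
  | Let (x, e, body) ->
    compile env d e
    @ compile ((x, d + 1) :: env) (d + 1) body
    @ [ Drop_below ]

(* A tiny stack machine to run the result; the head of the list is the top. *)
let step stack = function
  | Push n -> n :: stack
  | Dup i -> List.nth stack (i - 1) :: stack
  | Add_top ->
    (match stack with
     | a :: b :: rest -> (a + b) :: rest
     | _ -> failwith "stack underflow")
  | Drop_below ->
    (match stack with
     | top :: _ :: rest -> top :: rest
     | _ -> failwith "stack underflow")

let run code = List.fold_left step [] code

(* let x = 20 in let y = 1 in (x + y) + x *)
let program =
  Let ("x", Const 20,
       Let ("y", Const 1, Add (Add (Var "x", Var "y"), Var "x")))

let () = assert (run (compile [] 0 program) = [ 41 ])
```

Note how the same variable x is compiled to Dup 2 the first time and Dup 3 the second time: its position in Γ is fixed, but the stack has grown in between.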

Decompilation from Michelson

While Michelson programs are high level compared to other bytecodes, it remains difficult for a blockchain end-user to understand what a Michelson program does exactly by looking at it. However, following the idea that "code is law", a user should be able to read a contract and understand its precise semantics. Thus, we have developed a decompiler from Michelson to Liquidity, which recovers a much more readable and understandable representation of a program on the blockchain.

The decompilation of Michelson code follows the diagram below where:

Cleaning consists in simplifying Michelson code to accelerate the whole process and simplify the following task. For now it consists in erasing instructions whose continuation is a failure.
Symbolic Execution consists in executing the Michelson program with symbolic inputs, and by replacing every value placed in the stack by a node containing the instruction that generated it. Each node of this graph can be seen as an expression of the target program, which can be bound to a variable name. Edges to this node represent future occurrences of this variable.
Decompilation consists in transforming the graph generated by the previous step into a Liquidity syntax tree. Names for variables are recovered from annotations produced by the Liquidity compiler (in case we decompile a Michelson program that was generated from Liquidity), or are chosen on the fly when no annotation is present (e.g. if the Michelson program was written by hand).

Finally the program is typed (to ensure no mistakes were made), simplified and pretty printed.
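The symbolic execution step can be sketched on a toy instruction set. This is an illustration under our own assumptions, not the decompiler's code: the instructions and the sym type are hypothetical, and the result is built as a tree rather than as a graph with shared nodes.

```ocaml
(* A toy stack language (hypothetical, much smaller than Michelson). *)
type instr = Push of int | Dup | Swap | Drop | Add

(* Symbolic values: each one records the instruction that produced it. *)
type sym =
  | Input of string       (* a named program input *)
  | Lit of int            (* produced by Push *)
  | Plus of sym * sym     (* produced by Add *)

(* Run one instruction on a stack of symbolic values (head = top). *)
let step (stack : sym list) (i : instr) : sym list =
  match i, stack with
  | Push n, s -> Lit n :: s
  | Dup, a :: s -> a :: a :: s
  | Swap, a :: b :: s -> b :: a :: s
  | Drop, _ :: s -> s
  | Add, a :: b :: s -> Plus (a, b) :: s
  | _ -> failwith "stack underflow"

let symbolic_run (inputs : string list) (code : instr list) : sym list =
  List.fold_left step (List.map (fun name -> Input name) inputs) code

(* Print a symbolic value as an expression of the target language. *)
let rec to_string = function
  | Input name -> name
  | Lit n -> string_of_int n
  | Plus (a, b) -> "(" ^ to_string a ^ " + " ^ to_string b ^ ")"

let () =
  match symbolic_run [ "parameter"; "storage" ] [ Dup; Add; Add; Push 1; Add ] with
  | [ result ] ->
    assert (to_string result = "(1 + ((parameter + parameter) + storage))")
  | _ -> assert false
```

The stack program never mentions variables, yet the expression recovered at the end reads like source code over the two inputs.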

Example of decompilation

parameter bool;
return int;
storage int;
code {DUP; CAR;
      DIP { CDR; PUSH int 1 };  # stack is: parameter :: 1 :: storage
      IF # if parameter = true
         { DROP; DUP; }         # stack is storage :: storage
         { }                    # stack is 1 :: storage
         ;
      PAIR;
     }

This example illustrates some of the difficulties of the decompilation process: Liquidity is a purely functional language where each construction is an expression returning a value; Michelson handles the stack directly, which is impossible to concretize in Liquidity (values in the stack don't have the same type, as opposed to values in a list). In this example, depending on the value of parameter the contract returns either the content of the storage, or the integer 1. In the Michelson code, the programmer used the instruction IF, but its branches do not return a value and only operate by modifying (or not) the stack.

[%%version 0.15]
type storage = int
let%entry main (parameter : bool) (storage : storage) : (int * storage) =
    ((if parameter then storage else 1 ), storage)

The above translation to Liquidity also contains an if, but it has to return a value. The graph below is the result of the symbolic execution phase on the Michelson program. The IF instruction is decomposed in several nodes, but does not contain any remaining instruction: the result of this if is in fact the difference between the stack resulting from the execution of the then branch and from the else branch. It is denoted by the node N_IF_END_RESULT 0 (if there were multiple of these nodes with different indexes, the result of the if would have been a tuple, corresponding to the multiple changes in the stack).
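To convince ourselves that the two versions agree, the Michelson code can be transcribed by hand as explicit stack manipulations in OCaml and compared with the Liquidity function. This is a sketch of ours: the value type and the names michelson and liquidity are hypothetical, and each step mirrors one instruction (or group of instructions) of the example.

```ocaml
(* Stack values have different types, so we need a sum type to hold them. *)
type value = Bool of bool | Int of int | Pair of value * value

(* Hand transcription of the Michelson code; the head of the list is the top. *)
let michelson (parameter : bool) (storage : int) : value =
  let stack = [ Pair (Bool parameter, Int storage) ] in
  (* DUP; CAR *)
  let stack =
    match stack with
    | (Pair (a, _) as p) :: rest -> a :: p :: rest
    | _ -> failwith "bad stack"
  in
  (* DIP { CDR; PUSH int 1 }  -- stack is: parameter :: 1 :: storage *)
  let stack =
    match stack with
    | top :: Pair (_, b) :: rest -> top :: Int 1 :: b :: rest
    | _ -> failwith "bad stack"
  in
  (* IF { DROP; DUP } { } *)
  let stack =
    match stack with
    | Bool true :: _ :: s :: rest -> s :: s :: rest
    | Bool false :: rest -> rest
    | _ -> failwith "bad stack"
  in
  (* PAIR *)
  match stack with
  | [ a; b ] -> Pair (a, b)
  | _ -> failwith "bad stack"

(* The decompiled Liquidity version, as a plain OCaml function. *)
let liquidity (parameter : bool) (storage : int) : int * int =
  ((if parameter then storage else 1), storage)

let () =
  List.iter
    (fun p ->
       let result, storage = liquidity p 7 in
       assert (michelson p 7 = Pair (Int result, Int storage)))
    [ true; false ]
```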

Try-Liquidity

You can go to https://liquidity-lang.org/edit to try out Liquidity in your browser.

The first thing to do (if you want to deploy and interact with a contract) is to go into the settings menu. There you can set your Tezos private key (use one that you generated for the alphanet for the moment) or the source (i.e. your public key hash, which is derived from your private key if you set it).

You can also change which Tezos node you want to interact with (the first one should do, but you can also set one of your choosing such as one running locally on your machine). The timestamp shown next to the node name indicates how long ago the last block it knows of was produced. Transactions that you make on a node that is not synchronized will not be included in the main chain.

You should now see your account with its balance in the top bar:

In the main editor window, you can select a few Liquidity example contracts or write your own. For this small tutorial, we will select multisig.liq which is a minimal multi-signature wallet contract. It allows anyone to send money to it, but it requires a certain number of predefined owners to agree before making a withdrawal.

Clicking on the button Compile should make the editor blink green (when there are no errors) and the compiled Michelson will appear on the editor on the right.

Let's now deploy this contract on the Tezos alphanet. By going into the Deploy (or paper airplane icon) tab, we can choose our set of owners for the multisig contract and the minimum number of owners to be in agreement before a withdrawal can proceed. Here I put down two account addresses for which I possess the private keys, and I want the two owners to agree before any transaction is approved (2p is the natural number 2).

Then I can either forge the deployment operation which is then possible to sign offline and inject in the Tezos chain by other means, or I can directly deploy this contract (if the private key is set in settings). If deployment is successful, we can see both the deployment operation and the new contract on a block explorer by clicking on the provided links.

Now we can query the blockchain to examine our newly deployed contract. Head over to the Examine tab. The address field should already be filled with our contract handle. We just have to click on Retrieve balance and storage.

The contract has 3tz on its balance because we chose to initialize it this way. On the right is the current storage of the contract (in Liquidity syntax) which is a record with four fields. Notice that the actions field is an empty map.

Let's make a few calls to this contract. Head over to the Call tab and fill in the parameter and the amount. We can send, for instance, 5.00tz with the parameter Pay. Clicking on the button Call generates a transaction which we can observe on a block explorer. More importantly, if we go back to the Examine tab, we can now retrieve the new information and see that the storage is unchanged but the balance is 8.00tz.

We can also make a call to withdraw money from the contract. This is done by passing a parameter of the form:

Manage (
  Some {
    destination = tz1brR6c9PY3SSfBDu7Qxdhsz3pvNRDwf68a;
    amount = 2tz;
})

This is a proposition of transfer of funds in the amount of 2.00tz from the contract to the destination tz1brR6c9PY3SSfBDu7Qxdhsz3pvNRDwf68a.

The balance of the contract has not changed (it is still 8.00tz) but the storage has been modified. That is because this multisig contract requires two owners to agree before proceeding. The proposition is stored in the map actions and is associated to the owner who made said proposition.

{
  owners =
    (Set
       [tz1XT2pgiSRWQqjHv5cefW7oacdaXmCVTKrU;
       tz1brR6c9PY3SSfBDu7Qxdhsz3pvNRDwf68a]);
  actions =
    (Map
       [(tz1brR6c9PY3SSfBDu7Qxdhsz3pvNRDwf68a,
          {
            destination = tz1brR6c9PY3SSfBDu7Qxdhsz3pvNRDwf68a;
            amount = 2.00tz
          })]);
  owners_length = 2p;
  min_agree = 2p
}

We can now open a new browser tab and point it to https://liquidity-lang.org/edit, but this time we fill in the private key for the second owner tz1XT2pgiSRWQqjHv5cefW7oacdaXmCVTKrU. We choose the multisig contract in the Liquidity editor and, in the Call tab, fill in the same contract address as in the other session, TZ1XvTpoSUeP9zZeCNWvnkc4FzuUighQj918 (you can double check that the icons for the two contracts are identical). For the withdrawal to proceed, this owner has to make the exact same proposition, so let's make a call with the same parameter:

Manage (
  Some {
    destination = tz1brR6c9PY3SSfBDu7Qxdhsz3pvNRDwf68a;
    amount = 2tz;
})

The call should also succeed. When we examine the contract, we can now see that its balance is down to 6.00tz and that the field actions of its storage has been reinitialized to the empty map. In addition, we can update the balance of our first account (by clicking on the circle arrow icon in the top bar) to see that it is now up an extra 2.00tz, since it was the destination of the proposed (and agreed on) withdrawal. All is well!
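The sequence of calls above can be summarised with a small OCaml model. This is a simplified sketch of the behaviour observed in this walkthrough, not the source of multisig.liq: the contract and proposal types and the pay and manage functions are hypothetical, amounts are plain integers, and the transfer itself is reduced to a balance decrement.

```ocaml
module SMap = Map.Make (String)

(* A withdrawal proposal, as passed to Manage (Some { ... }). *)
type proposal = { destination : string; amount : int }

(* Balance and storage bundled together for the purpose of the sketch. *)
type contract = {
  balance : int;
  owners : string list;
  actions : proposal SMap.t; (* pending proposal of each owner *)
  min_agree : int;
}

(* Pay: anyone can send money; the storage is left unchanged. *)
let pay (amount : int) (c : contract) : contract =
  { c with balance = c.balance + amount }

(* Manage: record the sender's proposal; once enough owners have made the
   exact same proposal, perform the withdrawal and reset the map. *)
let manage ~(sender : string) (p : proposal) (c : contract) : contract =
  if not (List.mem sender c.owners) then
    invalid_arg "manage: sender is not an owner";
  let actions = SMap.add sender p c.actions in
  let agreeing =
    SMap.fold (fun _ q n -> if q = p then n + 1 else n) actions 0
  in
  if agreeing >= c.min_agree then
    { c with balance = c.balance - p.amount; actions = SMap.empty }
  else { c with actions }
```

Starting from a balance of 3 with two owners and min_agree = 2, calling pay 5, then manage with the same proposal of amount 2 from each owner in turn, gives balances of 8, 8 and finally 6, with the actions map empty again at the end.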

We have seen how to compile, deploy, call and examine Liquidity contracts on the Tezos alphanet using our online editor. Experiment with your own contracts and let us know how that works for you!

Slides in English and French

Comments

fredcy (9 February 2018 at 3 h 14 min):

It says “Here we define a type abbreviation votes […]” but I don’t see any votes symbol in the nearby code.

[Still working through the document. I’m eager to try Liquidity rather than write in Michelson.]

alain (9 February 2018 at 7 h 18 min):

You are right, thanks for catching this. I’ve updated the contract code to use type votes.

branch (26 February 2019 at 18 h 28 min):

Why the “Deploy” button can be inactive, while liquidity contract is compiled successfully?

alain (6 March 2019 at 15 h 09 min):

For the Deploy button to become active, you need to specify an initial value for the storage directly in the code of the smart contract. This can be done by writing a constant directly or a function.

let%init storage = (* the value of your initial storage*)

let%init storage x y z = (* the value of your initial storage, function of x, y and z *)

opam 2.0.0 Release Candidate 1 is out!

2018-02-02T09:05:17Z

We are pleased to announce a first release candidate for the long-awaited opam 2.0.0.

A lot of polishing has been done since the last beta, including tweaks to the built-in solver, allowing in-source package definitions to be gathered in an opam/ directory, and much more.

With all of the 2.0.0 features getting pretty solid, we are now focusing on bringing all the guides up-to-date¹, updating the tools and infrastructure, making sure there are no usability issues with the new workflows, and being future-proof so that further updates break as little as possible.

You are invited to read the beta5 announcement for details on the 2.0.0 features. Installation instructions haven't changed:

From binaries: run

sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)

or download manually from the Github "Releases" page to your PATH.

From source, using opam:

opam update; opam install opam-devel

(then copy the opam binary to your PATH as explained)

From source, manually: see the instructions in the README.

Thanks a lot for testing out the RC and reporting any issues you may find. See what we need tested for more detail.

¹ You can at the moment rely on the manpages, the Manual, and of course the API, but other pages might be outdated.

2017 at OCamlPro

2018-01-15T09:05:17Z

Since 2017 is just over, now is probably the best time to review what happened during this hectic year at OCamlPro… Here are our big 2017 achievements, in the world of blockchains (the Liquidity smart contract language, Tezos and the Tezos ICO etc.), of OCaml (with OPAM 2, flambda 2 etc.), and of formal methods (Alt-Ergo etc.)

In the World of Blockchains

The Liquidity Language for smart contracts

Work of Alain Mebsout, Fabrice Le Fessant, Çagdas Bozman, Michaël Laporte

OCamlPro develops Liquidity, a high level smart contract language for Tezos. Liquidity is a human-readable language, purely functional, statically-typed, whose syntax is very close to the OCaml syntax. Programs can be compiled to the stack-based language (Michelson) of the Tezos blockchain.

To garner interest and adoption, we developed an online editor called "Try Liquidity". Smart-contract developers can design contracts interactively, directly in the browser, compile them to Michelson, run them and deploy them on the alphanet network of Tezos.

Future plans include a full-fledged web-based IDE for Liquidity. Worth mentioning is a neat feature of Liquidity: decompiling a Michelson program back to its Liquidity version, whether it was generated from Liquidity code or not. In practice, this makes it easy to read somewhat obfuscated contracts already deployed on the blockchain.

Tezos and the Tezos ICO

Work of Grégoire Henry, Benjamin Canou, Çagdas Bozman, Alain Mebsout, Michael Laporte, Mohamed Iguernlala, Guillem Rieu, Vincent Bernardoff (for DLS) and at times all the OCamlPro team in animated and joyful brainstorms.

Since 2014, the OCamlPro team had been co-designing the Tezos prototype with Arthur Breitman based on Arthur's White Paper, and had undertaken the implementation of the Tezos node and client. A technical prowess and design achievement we have been proud of. In 2017, we developed the infrastructure for the Tezos ICO (Initial Coin Offering) from the ground up, encompassing the web app (back-end and front-end), the Ethereum and Bitcoin (p2sh) multi-signature contracts, as well as the hardware Ledger based process for transferring funds. The ICO, conducted in collaboration with Arthur, was a resounding success — the equivalent of 230 million dollars (in ETH and BTC) at the time were raised for the Tezos Foundation!

This work was made possible by funding from Arthur Breitman and DLS.

In the World of OCaml

Towards OPAM 2.0, the OCaml Package manager

OPAM was born at Inria/OCamlPro with Frederic, Thomas and Louis, and is still maintained here at OCamlPro. Now, thanks to Louis Gesbert's thorough efforts and the OCaml Labs contribution, OPAM 2.0 is coming!

opam is now compiled with a built-in solver, improving the portability, ease of access and user experience (aspcud no longer a requirement)
new workflows for developers have been designed, including convenient ways to test and install local sources, more reliable ways to share development setups
the general system has seen a large number of robustness and expressivity improvements, like extended dependencies
it also provides better caching, and many hooks enabling, among others, setups with sandboxed builds, binary artifacts caching, or end-to-end package signature verification.

More details: on https://opam.ocaml.org/blog and releases on https://github.com/ocaml/opam/releases

This work is made possible by JaneStreet's funding.

Flambda Compilation

Work of Pierre Chambart, Vincent Laviron

Pierre and Vincent's considerable work on Flambda 2 (the optimizing intermediate representation of the OCaml compiler – on which inlining occurs), in close cooperation with JaneStreet's team (Mark, Leo and Xavier) aims at overcoming some of flambda's limitations. This huge refactoring will help make OCaml code more maintainable, improving its theoretical grounds. Internal types are clearer, more concise, and possible control flow transformations are more flexible. Overall a precious outcome for industrial users.

This work is made possible by JaneStreet's funding.

OCaml for ia64-HPUX

In 2017, OCamlPro also worked on porting OCaml to HPUX-ia64. This came from a request of CryptoSense, a French startup working on an OCaml tool to secure cryptographic protocols. OCaml used to have a port to Linux-ia64, which was deprecated before 4.00.0, and, even earlier, a port to HPUX, but not on ia64. So we expected that the easiest part would be getting the bytecode version running, and the hardest part getting access to an HPUX-ia64 computer. It was quite the opposite: HPUX is an awkward system where most tools (shell, make, etc.) have uncommon behaviors, which made even compiling a bytecode version difficult, whereas it was actually easy to get access to a low-power virtual machine of HPUX-ia64 on a monthly basis. We also found a few bugs in the former OCaml ia64 backend, mostly caused by the scheduler, since ia64 uses explicit instruction parallelism. Debugging such code was quite a challenge, as instructions were often re-ordered and interleaved. Finally, after a few weeks of work, we got both the bytecode and native code versions running, with only a few limitations.

This work was mandated by CryptoSense.

The style-checker Typerex-lint

Work of Çagdas Bozman, Michael Laporte and Clément Dluzniewski.

In 2017, typerex-lint was improved and extended. Typerex-lint is a style-checker that analyzes the sources of OCaml programs, and can be extended using plugins. It automatically checks the conformance of a code base to a set of coding rules. We added some analyses to look for code that doesn't comply with the recommendations made by the SecurOCaml project members. We also made an interactive web output that provides an easy way to navigate typerex-lint results.

Build systems and tools

Work of Fabrice Le Fessant

Every year in the OCaml world, a new build tool appears. 2017 was no different, with the rise of jbuild/dune. jbuild came with some very nice features. Some were already in our home-made build tool, ocp-build, like the ability to build multiple packages at once in a composable way; others were new, like the ability to build multiple versions of the package in one run or the wrapping of libraries using module aliases. We have started to incorporate some of these features in ocp-build. Nevertheless, from our point of view, the two tools belong to two different families: jbuild/dune belongs to the "implicit" family, like ocamlbuild and oasis, with minimal project description; ocp-build belongs to the "explicit" family, like make and omake. We prefer the explicit family, because the build file acts as a description of the project, an entry point to understand the project and the modules. Also, we have kept working on improving the project description language for ocp-build, something that we think is of utmost importance. Latest release: ocp-build 1.99.20-beta.

Other contributions and software

OCaml bugfixes by Pierre Chambart, Vincent Laviron, and other members of the team.
The ocp-analyzer prototype by Vincent Laviron

In the World of Formal Methods

Alt-Ergo

By Mohamed Iguernlala

For Alt-Ergo, 2017 was the year of floating-point arithmetic reasoning. Indeed, in addition to the publication of our results at the 29th International Conference on Computer Aided Verification (CAV), Jul 2017, we polished the prototype we started in 2016 and integrated it in the main branch. This is a joint work with Sylvain Conchon (Paris-Saclay University) and Guillaume Melquiond (Inria Lab) in the context of the SOPRANO ANR Project. Another big piece of work in 2017 consisted in investigating a better integration of an efficient CDCL-based SAT solver in Alt-Ergo. In fact, although modern CDCL SAT solvers are very fast, their interaction with the decision procedures and quantifiers instantiation should be finely tuned to get good results in the context of Satisfiability Modulo Theories. This new solver should be integrated into Alt-Ergo in the next few weeks. This work has been done in the context of the LCHIP FUI Project.

We also released a new major version of Alt-Ergo (2.0.0) with a modification in the licensing scheme. Alt-Ergo@OCamlPro's development repository is now made public. This will allow users to get updates and bugfixes as soon as possible.

Towards a formalized type system for OCaml

Work of Pierrick Couderc, Grégoire Henry, Fabrice Le Fessant and Michel Mauny (Inria Paris)

OCaml is known for its rich type system and strong type inference. Unfortunately, such a complex type engine is prone to errors, and it can be hard to form a clear idea of how typing works for some features of the language. For 3 years now, OCamlPro has been working on formalizing a subset of this type system and implementing a type checker derived from this formalization. The idea behind this work is to help the compiler developers ensure some form of correctness of the inference. This type checker takes a Typedtree, the intermediate representation resulting from the inference, and checks its consistency. Put differently, this tool checks that each annotated node from the Typedtree can indeed be given such a type according to the context, its form and its sub-expressions. In practice, we could check and catch some known bugs resulting from unsound programs that were accepted by the compiler.

This type checker is only available for OCaml 4.02 for the moment, and the document describing this formalized type system will be available shortly in a PhD thesis, by Pierrick Couderc.

Around the World

OCamlPro's team members attended many events throughout the world:

The ICFP'2017 (Oxford)
The JFLA'2017 (Gourette, Pyrénées)
The CAV'2017 (29th International Conference on Computer Aided Verification, Heidelberg)
The POSS'2017 (Paris)

As a committed and active member of the OCaml ecosystem, we have organized OCaml meetups too (see the famous OUPS meetups in Paris!).

A few hints about what's ahead for OCamlPro

Let's keep up the good work!

opam 2.0 Beta5 is out!

2017-11-27T09:05:17Z

After a few more months brewing, we are pleased to announce a new beta release of opam. With this new milestone, opam is reaching feature-freeze, with an expected 2.0.0 by the beginning of next year.

This version brings many new features, stability fixes, and big improvements to the local development workflows.

What's new

The features presented in past announcements: local switches, in-source package definition handling, extended dependencies are of course all present. But now, all the glue to make them interact nicely together is here to provide new smooth workflows. For example, the following command, if run from the source tree of a given project, creates a local switch where it will restore a precise installation, including explicit versions of all packages and pinnings:

opam switch create ./ --locked

this leverages the presence of opam.locked or <name>.opam.locked files, which are valid package definitions that contain additional details of the build environment, and can be generated with the opam-lock plugin (the lock command may be merged into opam once finalised).

But this new beta also provides a large number of quality-of-life improvements, and other features. A big one, for example, is the integration of a built-in solver (derived from mccs and glpk). This means that the opam binary works out of the box, without requiring the external aspcud solver, and on all platforms. It is also faster.

Another big change is that detection of architecture and OS details is now done in opam, and can be used to select the external dependencies with the new format of the depexts: field, but also to affect dependencies or build flags.

There is much more to it. Please see the changelog, and the updated manual.

How to try it out

Our warm thanks for trying the new beta and reporting any issues you may hit.

There are three main ways to get the update:

The easiest is to use our pre-compiled binaries. This script will also make backups if you migrate from 1.x, and has an option to revert back:

sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)

This uses the binaries from https://github.com/ocaml/opam/releases/tag/2.0.0-beta5

Another option is to compile from source, using an existing opam installation. Simply run:

opam update; opam install opam-devel

and follow the instructions (you will need to copy the compiled binary to your PATH).

Compiling by hand from the inclusive source archive, or from the git repo. Use ./configure && make lib-ext && make if you have OCaml >= 4.02.3 already available; make cold otherwise.

If the build fails after updating a git repo from a previous version, try git clean -fdx src/ to remove any stale artefacts.

Note that the repository format is different from that of opam 1.2. Opam 2 will be automatically redirected from the opam-repository to an automatically rewritten 2.0 mirror, and is otherwise able to do the conversion on the fly (both for package definitions when pinning, and for whole repositories). You may not yet contribute packages in 2.0 format to opam-repository, though.

What we need tested

We are interested in all opinions and reports, but here are a few areas where your feedback would be especially useful to us:

Use 2.0 day-to-day, in particular check any packages you may be maintaining. We would like to ensure there are no regressions due to the rewrite from 1.2 to 2.0.
Check the quality of the solutions provided by the solver (or conflicts, when applicable).
Test the different pinning mechanisms (rsync, git, hg, darcs) with your project version control systems. See the --working-dir option.
Experiment with local switches for your project (and/or opam install DIR). Give us feedback on the workflow. Use opam lock and share development environments.
If you have any custom repositories, please try the conversion to 2.0 format with opam admin upgrade --mirror on them, and use the generated mirror.
Start porting your CI systems for larger projects to use opam 2, and give us feedback on any improvements you need for automated scripting (e.g. the --json output).

new opam features: more expressive dependencies

2017-05-11T09:05:17Z

This blog post will cover yet another aspect of the improvements opam 2.0 has over opam 1.2. It may be a little more technical than previous issues, as it covers a feature directed specifically at packagers and repository maintainers, and regarding the package definition format.

Specifying dependencies in opam 1.2

Opam 1.2 already has an advanced way of specifying package dependencies, using formulas on packages and versions, with the following syntax:

    depends: [
      "foo" {>= "3.0" & < "4.0~"}
      ( "bar" | "baz" {>= "1.0"} )
    ]

meaning that the package being defined depends on both package foo, within the 3.x series, and one of bar or baz, the latter with version at least 1.0. See here for a complete documentation.
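As a rough illustration of how such a formula is read, here is a small OCaml sketch that checks it against a set of installed packages. It is only a model under stated assumptions: the types and the holds function are hypothetical, versions are simplified to lists of integers compared lexicographically (opam's real version ordering is more subtle, in particular for the ~ suffix), and only the >= and < operators are covered.

```ocaml
(* Simplified versions: 3.2 is [3; 2]. Not opam's real version ordering. *)
type version = int list

type constr = Ge of version | Lt of version

type formula =
  | Pkg of string * constr list (* package name with its version constraints *)
  | And of formula * formula
  | Or of formula * formula

let satisfies (v : version) = function
  | Ge w -> compare v w >= 0
  | Lt w -> compare v w < 0

(* True when the installed packages satisfy the dependency formula. *)
let rec holds (installed : (string * version) list) = function
  | Pkg (name, constraints) ->
    (match List.assoc_opt name installed with
     | Some v -> List.for_all (satisfies v) constraints
     | None -> false)
  | And (a, b) -> holds installed a && holds installed b
  | Or (a, b) -> holds installed a || holds installed b

(* "foo" {>= "3.0" & < "4.0~"}  ( "bar" | "baz" {>= "1.0"} ) *)
let example =
  And (Pkg ("foo", [Ge [3; 0]; Lt [4; 0]]),
       Or (Pkg ("bar", []), Pkg ("baz", [Ge [1; 0]])))
```

With this model, foo 3.2 together with any bar satisfies the formula, while foo 4.0, or foo 3.0 with only baz 0.9 installed, does not.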

This only allows, however, dependencies that are static for a given package.

Opam 1.2 introduced build, test and doc "dependency flags" that could provide some specifics for dependencies (e.g. test dependencies would only be needed when tests were requested for the package). These were constrained to appear before the version constraints, e.g. "foo" {build & doc & >= "3.0"}.

Extensions in opam 2.0

Opam 2.0 generalises the dependency flags, and makes the dependency specification more expressive by allowing filters, i.e. formulas based on opam variables, to be mixed with the version constraints. If the filter holds, the dependency is enforced; if not, it is discarded.

This is documented in more detail in the opam 2.0 manual.

Note also that, since the compilers are now packages, the required OCaml version is now expressed using this mechanism as well, through a dependency to the (virtual) package ocaml, e.g. depends: [ "ocaml" {>= "4.03.0"} ]. This replaces uses of the available: field and ocaml-version switch variable.

Conditional dependencies

This makes it trivial to add, for example, a condition on the OS to a given dependency, using the built-in variable os:

depends: [ "foo" {>= "3.0" & < "4.0~" & os = "linux"} ]

here, foo is simply not needed if the OS isn't Linux. We could also be more specific about other OSes using more complex formulas:

    depends: [
      "foo" { = "1.0+linux" & os = "linux" |
              = "1.0+osx" & os = "darwin" }
      "bar" { os != "linux" & os != "darwin" }
    ]

Meaning that Linux and OSX require foo, respectively versions 1.0+linux and 1.0+osx, while other systems require bar, any version.
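One way to picture this is as a partial evaluation: the os conditions are replaced by their truth value, and what remains is either nothing (dependency discarded), no constraint at all (any version), or a plain version formula. The OCaml sketch below models that idea; it is not opam's implementation, and the type names, the reduce function and the restriction to os-only filters are all assumptions made for the example.

```ocaml
type 'a formula =
  | Atom of 'a
  | And of 'a formula * 'a formula
  | Or of 'a formula * 'a formula

(* What can appear between the braces in this simplified model. *)
type atom =
  | Version of string * string (* relational operator, version *)
  | Os_eq of string            (* os = "..." *)
  | Os_neq of string           (* os != "..." *)

type outcome =
  | Discarded                                (* dependency not needed *)
  | Any                                      (* needed, any version *)
  | Constrained of (string * string) formula (* needed, with constraints *)

let rec reduce ~(os : string) (f : atom formula) : outcome =
  match f with
  | Atom (Version (op, v)) -> Constrained (Atom (op, v))
  | Atom (Os_eq name) -> if os = name then Any else Discarded
  | Atom (Os_neq name) -> if os <> name then Any else Discarded
  | And (a, b) ->
    (match reduce ~os a, reduce ~os b with
     | Discarded, _ | _, Discarded -> Discarded
     | Any, x | x, Any -> x
     | Constrained x, Constrained y -> Constrained (And (x, y)))
  | Or (a, b) ->
    (match reduce ~os a, reduce ~os b with
     | Any, _ | _, Any -> Any
     | Discarded, x | x, Discarded -> x
     | Constrained x, Constrained y -> Constrained (Or (x, y)))

(* foo: version 1.0+linux on Linux, 1.0+osx on OSX *)
let foo =
  Or (And (Atom (Version ("=", "1.0+linux")), Atom (Os_eq "linux")),
      And (Atom (Version ("=", "1.0+osx")), Atom (Os_eq "darwin")))

(* bar: any version, on systems that are neither Linux nor OSX *)
let bar = And (Atom (Os_neq "linux"), Atom (Os_neq "darwin"))
```

Reducing foo with os set to "linux" leaves only the constraint = 1.0+linux, while on another system such as "freebsd" foo is discarded and bar is required with no version constraint.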

Dependency flags

Dependency flags, as used in 1.2, are no longer needed, and are replaced by variables that can appear anywhere in the version specification. The following variables are typically useful there:

with-test, with-doc: replace the test and doc dependency flags, and are true when the package's tests or documentation have been requested
likewise, build behaves similarly as before, limiting the dependency to a "build-dependency", implying that the package won't need to be rebuilt if the dependency changes
dev: this boolean variable holds true on "development" packages, that is, packages that are bound to a non-stable source (a version control system, or if the package is pinned to an archive without known checksum). dev sources often happen to need an additional preliminary step (e.g. autoconf), which may have its own dependencies.

Use opam config list for a list of pre-defined variables. Note that the with-test, with-doc and build variables are not available everywhere: the first two are allowed only in the depends:, depopts:, build: and install: fields, and the last one is specific to the depends: and depopts: fields.

For example, the datakit.0.9.0 package has:

depends: [
  ...
  "datakit-server" {>= "0.9.0"}
  "datakit-client" {with-test & >= "0.9.0"}
  "datakit-github" {with-test & >= "0.9.0"}
  "alcotest" {with-test & >= "0.7.0"}
]

When running opam install datakit.0.9.0, the with-test variable is set to false, and the datakit-client, datakit-github and alcotest dependencies are filtered out: they won't be required. With opam install datakit.0.9.0 --with-test, the with-test variable is true (for that package only, tests on packages not listed on the command-line are not enabled!). In this case, the dependencies resolve to:

depends: [
  ...
  "datakit-server" {>= "0.9.0"}
  "datakit-client" {>= "0.9.0"}
  "datakit-github" {>= "0.9.0"}
  "alcotest" {>= "0.7.0"}
]

which is treated normally.
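The filtering step described here can be sketched in a few lines of OCaml. This is an illustration only: the dep record and the resolve function are hypothetical, constraints are kept as opaque strings, and only the four dependencies shown above are listed (the elided ones are left out).

```ocaml
type dep = {
  name : string;
  with_test : bool; (* true when the dependency is guarded by with-test *)
  constr : string;  (* remaining version constraint, kept as text *)
}

let datakit_deps = [
  { name = "datakit-server"; with_test = false; constr = ">= 0.9.0" };
  { name = "datakit-client"; with_test = true; constr = ">= 0.9.0" };
  { name = "datakit-github"; with_test = true; constr = ">= 0.9.0" };
  { name = "alcotest"; with_test = true; constr = ">= 0.7.0" };
]

(* Keep a dependency unless it is test-only and tests were not requested. *)
let resolve ~(tests_requested : bool) (deps : dep list) : (string * string) list =
  List.filter_map
    (fun d ->
       if d.with_test && not tests_requested then None
       else Some (d.name, d.constr))
    deps
```

Calling resolve ~tests_requested:false datakit_deps keeps only datakit-server, while resolve ~tests_requested:true datakit_deps keeps all four entries with their version constraints.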

Computed versions

It is also possible to use variables, not only as conditions, but to compute the version values: "foo" {= var} is allowed and will require the version of package foo corresponding to the value of variable var.

This is useful, for example, to define a family of packages, which are released together with the same version number: instead of having to update the dependencies of each package to match the common version at each release, you can leverage the version package-variable to mean "that other package, at the same version as current package". For example, foo-client could have the following:

depends: [ "foo-core" {= version} ]

It is even possible to use variable interpolations within versions, e.g. specifying an os-specific version differently than above:

depends: [ "foo" {= "1.0+%{os}%"} ]

this will expand the os variable, resolving to 1.0+linux, 1.0+darwin, etc.
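For readers curious about what such an interpolation involves, here is a minimal OCaml sketch of %{var}% expansion. It is not opam's code: the expand function and its association-list environment are hypothetical, and leaving unknown variables untouched is a choice made for this example rather than a statement about opam's behaviour.

```ocaml
(* Replace every %{name}% in [s] by the value bound to [name] in [env]. *)
let expand (env : (string * string) list) (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  (* Index of the next "}%" at or after position j, if any. *)
  let rec find_close j =
    if j + 1 >= n then None
    else if s.[j] = '}' && s.[j + 1] = '%' then Some j
    else find_close (j + 1)
  in
  let rec go i =
    if i >= n then ()
    else if i + 1 < n && s.[i] = '%' && s.[i + 1] = '{' then begin
      match find_close (i + 2) with
      | Some j ->
        let name = String.sub s (i + 2) (j - i - 2) in
        let text =
          match List.assoc_opt name env with
          | Some v -> v
          (* Unknown variable: kept verbatim in this sketch. *)
          | None -> String.sub s i (j + 2 - i)
        in
        Buffer.add_string buf text;
        go (j + 2)
      (* No closing "}%": copy the rest of the string as is. *)
      | None -> Buffer.add_string buf (String.sub s i (n - i))
    end
    else begin
      Buffer.add_char buf s.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf
```

For instance, expand [("os", "linux")] "1.0+%{os}%" evaluates to "1.0+linux".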

Getting back to our datakit example, we could leverage this and rewrite it to the more generic:

depends: [
  ...
  "datakit-server" {>= version}
  "datakit-client" {with-test & >= version}
  "datakit-github" {with-test & >= version}
  "alcotest" {with-test & >= "0.7.0"}
]

Since the datakit-* packages follow the same versioning, this avoids having to rewrite the opam file on every new version, with a risk of error each time.

As a side note, these variables are consistent with what is now used in the build: field, and the build-test: field is now deprecated. So this other part of the same datakit opam file:

build:
  ["ocaml" "pkg/pkg.ml" "build" "--pinned" "%{pinned}%" "--tests" "false"]
build-test: [
  ["ocaml" "pkg/pkg.ml" "build" "--pinned" "%{pinned}%" "--tests" "true"]
  ["ocaml" "pkg/pkg.ml" "test"]
]

would now be preferably written as:

build: ["ocaml" "pkg/pkg.ml" "build" "--pinned" "%{pinned}%" "--tests" "%{with-test}%"]
run-test: ["ocaml" "pkg/pkg.ml" "test"]

which avoids building twice just to change the options.

Conclusion

Hopefully this extension to expressivity in dependencies will make the life of packagers easier; feedback is welcome on your personal use-cases.

Note that the official repository is still in 1.2 format (served as 2.0 at https://opam.ocaml.org/2.0, through automatic conversion), and will only be migrated a little while after opam 2.0 is finally released. You are welcome to experiment on custom repositories or pinned packages already, but will need a little more patience before you can contribute package definitions making use of the above to the official repository.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

new opam features: "opam install DIR"

2017-05-04T09:05:17Z

The announcement of the opam build feature was followed by a lot of discussion, mainly about its interface and its misleading name. The base features it offered, though, were still widely asked for:

a way to work directly with the project in the current directory, assuming it contains definitions for one or more packages
a way to copy the installed files of a package below a specified destdir
an easier way to get started hacking on a project, even without an initialised opam

Status of `opam build`

opam build, as described in a previous post, has been dropped. It will be absent from the next Beta, where the following replaces it.

Handling a local project

Consistently with what was done with local switches, it was decided, where meaningful, to overload the <packages> arguments of the commands, allowing directory names instead, and meaning "all packages defined there", with some side-effects.

For example, the following command is now allowed, and I believe it will be extra convenient to many:

opam install . --deps-only

What this does is find opam (or <pkgname>.opam) files in the current directory (.), resolve their installations, and install all required packages. That should be the single step before running the source build by hand.

The following is a little bit more complex:

opam install .

This also retrieves the packages defined at ., pins them to the current source (using version-control if present), and installs them. Note that subsequent runs actually synchronise the pinnings, so that packages removed or renamed in the source tree are tracked properly (i.e. removed ones are unpinned, new ones pinned, the other ones upgraded as necessary).
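The synchronisation rule in the last sentence boils down to three set operations. Here is a small OCaml sketch of it, as an illustration only: the sync record and the sync_pins function are hypothetical names, and packages are identified by name alone.

```ocaml
module SSet = Set.Make (String)

type sync = {
  to_unpin : SSet.t;   (* pinned before, no longer defined in the source tree *)
  to_pin : SSet.t;     (* newly defined in the source tree *)
  to_upgrade : SSet.t; (* still defined: upgraded as necessary *)
}

(* [pinned]: packages currently pinned to the directory.
   [in_source]: packages defined by the opam files found there now. *)
let sync_pins ~(pinned : SSet.t) ~(in_source : SSet.t) : sync = {
  to_unpin = SSet.diff pinned in_source;
  to_pin = SSet.diff in_source pinned;
  to_upgrade = SSet.inter pinned in_source;
}
```

If a and b were pinned and the source tree now defines b and c, then a is unpinned, c is pinned, and b is a candidate for upgrade.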

opam upgrade, opam reinstall, and opam remove have also been updated to handle directories as arguments, and will work on "all packages pinned to that target", i.e. the packages pinned by the previous call to opam install <dir>. In addition, opam remove <dir> unpins the packages, consistently reverting the converse install operation.

opam show already had a --file option, but has also been extended in the same way, for consistency and convenience.

This all, of course, works well with a local switch at ./, but the two features can be used completely independently. Note also that the directory name must be made unambiguous with a possible package name, so make sure to use ./foo rather than just foo for a local project in subdirectory foo.

Specifying a destdir

This relies on installed files tracking, but was actually independent from the other opam build features. It is now simply a new option to opam install:

opam install foo --destdir ~/local/

will install foo normally (if it isn't installed already) and copy all its installed files, following the same hierarchy, into ~/local. opam remove --destdir is also supported, to remove these files.

Initialising

Automatic initialisation has been dropped for the moment. It was only saving one command (opam init, that opam will kindly print out for you if you forget it), and had two drawbacks:

some important details (like shell setup for opam) were skipped
the initialisation options were much reduced, so you would often have to go back to opam init anyway. The other possibility being to duplicate init options to all commands, adding lots of noise. Keeping things separate has its merits.

Granted, another command, opam switch create ., was made implicit. But using a local switch is a user choice, and worse, in contradiction with the previous de facto opam default, so not creating one automatically seems safer: having to specify --no-autoinit to opam build in order to get the more simple behaviour was inconvenient and error-prone.

One thing is provided to help with initialisation, though: opam switch create <dir> has been improved to handle package definitions at <dir>, and will use them to choose a compatible compiler, as opam build did. This avoids the frustration of creating a switch, then finding out that the package wasn't compatible with the chosen compiler version, and having to start over with an explicit choice of a different compiler.

If you would really like automatic initialisation, and have a better interface to propose, your feedback is welcome!

A few other new options have been added to opam install and related commands, to improve the project-local workflows:

opam install --keep-build-dir is now complemented with --reuse-build-dir, for incremental builds within opam (assuming your build-system supports it correctly). At the moment, you should specify both on every upgrade of the concerned packages, or you could set the OPAMKEEPBUILDDIR and OPAMREUSEBUILDDIR environment variables.
opam install --inplace-build runs the scripts directly within the source dir instead of a dedicated copy. If multiple packages are pinned to the same directory, this disables parallel builds of these packages.
opam install --working-dir uses the working directory state of your project, instead of the state registered in the version control system. Don't worry, opam will warn you if you have uncommitted changes and forgot to specify --working-dir.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

Comments

Hez Carty (4 May 2017 at 21 h 30 min):

Would a command like “opam init $DIR” and “opam init $DIR --deps-only” work for an auto-initialization interface? Ideally creating the equivalent to a bare .opam/ using $DIR as $OPAMROOT + install a local switch + “opam install .” (with --deps-only if specified) under the newly created switch.

Louis Gesbert (5 May 2017 at 7 h 50 min):

opam init DIR is currently used and means “use DIR as your initial, default package repository”. Overloading opam init sounds like a good approach though, esp. since the default of the command is already to create an initial switch. But a new flag, e.g. opam init --here, could be used to mean: do opam init --bare (it’s idempotent), opam switch create . and then opam install ..

The issue that remains is inherent to compound commands: we would have to port e.g. the --deps-only option to opam init, making the interface and doc heavier, and it would only make sense in this specific use-case; either that or limit the expressivity of the compound command, requiring people to fall back to the individual ones when they need some more specific features.

new opam features: local switches

2017-04-27T09:05:17Z

Among the areas we wanted to improve on for opam 2.0 was the handling of switches. In opam 1.2, they are simply accessed by a name (the OCaml version by default), and are always stored into ~/.opam/<name>. This is fine, but can get a bit cumbersome when there are many switches, as there is no way to sort them or associate them with a given project.

A reminder about switches

For those unfamiliar with them, switches, in opam, are independent prefixes with their own compiler and set of installed packages. The opam switch command lets you create and remove switches, as well as select the currently active one, where operations like opam install will operate.

Their uses include easily juggling between versions of OCaml, or of a library, having incompatible packages installed separately but at the same time, running tests without damaging your "main" environment, and, quite often, separation of environment for working on different projects.

You can also select a specific switch for a single command, with
opam install foo --switch other
or even for a single shell session, with
eval $(opam env --switch other)

What opam 2.0 adds to this is the possibility to create so-called local switches, stored below a directory of your choice. This gets users back in control of how switches are organised, and wiping the directory is a safe way to get rid of the switch.

Using within projects

This is the main intended use: the user can define a switch within the source of a project, for use specifically in that project. One nice side-effect to help with this is that, if a "local switch" is detected in the current directory or a parent, opam will select it automatically. Just don't forget to run eval $(opam env) to make the environment up-to-date before running make.

Interface

The interface simply overloads the switch-name arguments, wherever they were present, allowing directory names instead. So for example:

cd ~/src/project
opam switch create ./

will create a local switch in the directory ~/src/project. After that, running opam list from that directory is equivalent to running opam list --switch=~/src/project from anywhere.

Note that you can bypass the automatic local-switch selection if needed by using the --switch argument, by defining the variable OPAMSWITCH or by using eval $(opam env --switch <name>)

Implementation

In practice, the switch contents are placed in a _opam/ subdirectory. So if you create the switch ~/src/project, you can browse its contents at ~/src/project/_opam. This is the direct prefix for the switch, so e.g. binaries can be found directly at _opam/bin/: easier than searching the opam root! The opam metadata is placed below that directory, in a .opam-switch/ subdirectory.

Local switches still share the opam root, and in particular depend on the repositories defined and cached there. It is now possible, however, to select different repositories for different switches, but that is a subject for another post.

Finally, note that removing that _opam directory is handled transparently by opam, and that if you want to share a local switch between projects, symlinking the _opam directory is allowed.

Current status

This feature has been present in our dev builds for a while, and you can already use it in the current beta.

Limitations and future extensions

It is not, at the moment, possible to move a local switch directory around, mainly due to issues related to relocating the OCaml compiler.

Creating a new switch still requires recompiling all the packages, and even the compiler itself (unless you rely on a system installation). The projected solution is to add a build cache, avoiding the need to recompile the same package with the same dependencies. This should actually be possible with the current opam 2.0 code, by leveraging the new hooks that are made available. Note that relocation of OCaml is also an issue for this, though.

Editing tools like ocp-indent or merlin can also become an annoyance with the multiplication of switches, because they are not automatically found if not installed in the current switch. But the user-setup plugin (run opam user-setup install) already handles this well, and will access ocp-indent or tuareg from their initial switch, if not found in the current one. You will still need to install tools that are tightly bound to a compiler version, like merlin and ocp-index, in the switches where you need them, though.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

Comments

Jeremie Dimino (11 May 2017 at 8 h 27 min):

Thanks, that seems like a useful feature. Regarding relocation of the compiler, shouldn’t it be enough to set the environment variable OCAMLLIB? AFAIK the stdlib directory is the only hardcoded path on the compiler.

Louis Gesbert (11 May 2017 at 8 h 56 min):

Last I checked, there were a few more problematic points, in particular generated bytecode executables statically referring to their interpreter; but yes, in any case, it’s worth experimenting in that direction using the new hooks, to see how it works in practice.

Jeremie Dimino (12 May 2017 at 9 h 13 min):

Indeed, I remember that we had a similar problem in the initial setup to test the public release of Jane Street packages: we were using long paths for the opam roots and the generated #! were too long for the OS… What I did back then is write a program that scanned the tree and rewrote the #! to use “#!/usr/bin/env ocamlrun”.

That could be an option here. The rewriting only needs to be done once, since the compiler uses ocamlc -where/camlheader when generating a bytecode executable.

EzSudoku

2017-04-01T09:05:17Z

As you may have noticed, at the beginning of April I have some urge to write something technical about some deeply specific point of OCaml. This time I'd like to tackle that through sudoku.

It appears that Sudoku is of great importance, considering the number of posts explaining how to write a solver. Following that trend I will explain how to write one in OCaml. But with a twist.

We will try to optimize it. I won't show you anything as obvious as how to micro-optimize your code or some smart heuristic. No, we are not aiming for being merely algorithmically good. We will try to make something serious: we want it to be solved even before the program starts.

Yes really. Before. And I will show you how to use a feature of OCaml 4.03 that is sadly not well known.

First of all, as we do like typed and safe programs, we will define what a well-formed sudoku solution looks like. And by defining of course I mean declaring some GADTs with enough constraints to ensure that only correct solutions are valid.

I assume that you know the rules of Sudoku and will refrain from infuriating you by explaining them. But we will still need some vocabulary.

So the aim of sudoku is to fill a 'grid' with 'symbols' satisfying some 'row' 'column' and 'square' constraints.

To make the code examples readable we will stick to 4*4 sudokus. It's the smallest size that behaves the same way as 9*9 ones (I considered going for 1*1 ones, but the article ended up being a bit short). Of course everything would still apply to any n^2*n^2 sized one.
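
For reference, the three families of constraints can first be written as an ordinary run-time check. This is only a sketch and not part of the type-level encoding developed below: the names rows, columns, squares, all_distinct and valid are made up for this illustration, the grid is assumed to be a flat array of 16 symbols in row-major order, and List.init assumes OCaml 4.06 or later.

```ocaml
type symbol = A | B | C | D

(* Cell indices of the 4 rows, 4 columns and 4 squares of a 4*4 grid
   stored as a flat array in row-major order. *)
let rows = List.init 4 (fun r -> List.init 4 (fun c -> (4 * r) + c))
let columns = List.init 4 (fun c -> List.init 4 (fun r -> (4 * r) + c))

let squares =
  List.init 4 (fun s ->
      let r0 = 2 * (s / 2) and c0 = 2 * (s mod 2) in
      [ (4 * r0) + c0; (4 * r0) + c0 + 1;
        (4 * (r0 + 1)) + c0; (4 * (r0 + 1)) + c0 + 1 ])

(* A group is satisfied when its cells hold pairwise distinct symbols. *)
let all_distinct grid group =
  let cells = List.map (fun i -> grid.(i)) group in
  List.length (List.sort_uniq compare cells) = List.length cells

let valid grid =
  Array.length grid = 16
  && List.for_all (all_distinct grid) (rows @ columns @ squares)

let () =
  assert (valid [| A; B; C; D; C; D; A; B; D; A; B; C; B; C; D; A |]);
  (* Swapping the last two cells puts two A in the third column. *)
  assert (not (valid [| A; B; C; D; C; D; A; B; D; A; B; C; B; C; A; D |]))
```

The rest of the article moves exactly these three checks from run time into the types.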

So let's start digging in some types. As we will refine them along the way, I will leave some parts to be filled later. This is represented by '...' .

First there are symbols, just 4 of them because we reduced the size. Nothing special about that right now.

type ... symbol =
  | A : ...
  | B : ...
  | C : ...
  | D : ...

And a grid is 16 symbols. To avoid too much visual clutter in the type I just put them linearly. The comment shows how it is supposed to be seen in the 2d representation of the grid:

(* a b c d
   e f g h
   i j k l
   m n o p *)

type grid =
  Grid :
    ... symbol * (* a *)
    ... symbol * (* b *)
    ... symbol * (* c *)
    ... symbol * (* d *)

    ... symbol * (* e *)
    ... symbol * (* f *)
    ... symbol * (* g *)
    ... symbol * (* h *)

    ... symbol * (* i *)
    ... symbol * (* j *)
    ... symbol * (* k *)
    ... symbol * (* l *)

    ... symbol * (* m *)
    ... symbol * (* n *)
    ... symbol * (* o *)
    ... symbol (* p *)
      -> grid

Right now grid is a simple 16-uple of symbols, but we will soon start filling those '...' to forbid any set of symbols that is not a valid solution.

Each constraint looks like 'among those 4 positions, no two symbols are the same'. To express that (in fact something equivalent but a bit simpler to state with our types), we will need to name positions. So let's introduce some names:

type r1 (* the first position among a row constraint *)
type r2 (* the second position among a row constraint *)
type r3
type r4

type c1 (* the first position among a column constraint *)
type c2
type c3
type c4

type s1
type s2
type s3
type s4

type ('row, 'column, 'square) position

On the 2d grid this is how the various positions will be mapped.

r1 r2 r3 r4
r1 r2 r3 r4
r1 r2 r3 r4
r1 r2 r3 r4

c1 c1 c1 c1
c2 c2 c2 c2
c3 c3 c3 c3
c4 c4 c4 c4

s1 s2 s1 s2
s3 s4 s3 s4
s1 s2 s1 s2
s3 s4 s3 s4

For instance, the position g, in the 2nd row, 3rd column, will be at the 3rd position in its row constraint, 2nd in its column constraint, and 3rd in its square constraint:

type g = (r3, c2, s3) position

We could have declared a single constraint position type, but this is slightly more readable than:

type g = (p3, p2, p3) position

The position type is phantom, we could have provided a representation, but since no value of this type will ever be created, it's less confusing to state it that way.

type a = (r1, c1, s1) position
type b = (r2, c1, s2) position
type c = (r3, c1, s1) position
type d = (r4, c1, s2) position

type e = (r1, c2, s3) position
type f = (r2, c2, s4) position
type g = (r3, c2, s3) position
type h = (r4, c2, s4) position

type i = (r1, c3, s1) position
type j = (r2, c3, s2) position
type k = (r3, c3, s1) position
type l = (r4, c3, s2) position

type m = (r1, c4, s3) position
type n = (r2, c4, s4) position
type o = (r3, c4, s3) position
type p = (r4, c4, s4) position
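
As an aside for readers who have not met phantom types before, here is a minimal sketch of the idea in a different setting. The names meters, feet, length and add are hypothetical and unrelated to the sudoku code: the type parameter never appears in the representation, yet the type checker still uses it to keep values apart.

```ocaml
(* Two abstract types with no values: they only serve as labels. *)
type meters
type feet

(* The parameter is phantom: it does not occur on the right-hand side. *)
type 'u length = Length of float

let meters (x : float) : meters length = Length x
let feet (x : float) : feet length = Length x

(* Both arguments must carry the same label. *)
let add (Length a : 'u length) (Length b : 'u length) : 'u length =
  Length (a +. b)

let to_float (Length x) = x

let () =
  assert (to_float (add (meters 1.5) (meters 2.5)) = 4.0)
  (* add (meters 1.0) (feet 3.0) is rejected by the type checker. *)
```

The position type plays the same role here, except that it is left fully abstract since no value of it is ever built.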

It is now possible to state for each symbol in which position it is, so we will start filling a bit those types.

type ('position, ...) symbol =
  | A : (('r, 'c, 's) position, ...) symbol
  | B : (('r, 'c, 's) position, ...) symbol
  | C : (('r, 'c, 's) position, ...) symbol
  | D : (('r, 'c, 's) position, ...) symbol

This means that a symbol value is then associated to a single position in each constraint. We will need to state that in the grid type too:

type grid =
  Grid :
    (a, ...) symbol * (* a *)
    (b, ...) symbol * (* b *)
    (c, ...) symbol * (* c *)
    (d, ...) symbol * (* d *)

    (e, ...) symbol * (* e *)
    (f, ...) symbol * (* f *)
    (g, ...) symbol * (* g *)
    (h, ...) symbol * (* h *)

    (i, ...) symbol * (* i *)
    (j, ...) symbol * (* j *)
    (k, ...) symbol * (* k *)
    (l, ...) symbol * (* l *)

    (m, ...) symbol * (* m *)
    (n, ...) symbol * (* n *)
    (o, ...) symbol * (* o *)
    (p, ...) symbol (* p *)
    -> grid

We just need to forbid a symbol to appear in two different positions of a given row/column/square to prevent invalid solutions.

type 'fields row constraint 'fields = < a : 'a; b : 'b; c : 'c; d : 'd >
type 'fields column constraint 'fields = < a : 'a; b : 'b; c : 'c; d : 'd >
type 'fields square constraint 'fields = < a : 'a; b : 'b; c : 'c; d : 'd >

Those types represent the statement 'in this row/column/square, the symbol a is at the position 'a, the symbol b is at the position 'b, ...'

For instance, the row 'A D B C' will be represented by

< a : r1; b : r3; c : r4; d : r2 > row

Which reads: 'The symbol A is in first position, B in third position, C in fourth, and D in second'

The object type is used to make things a bit lighter later, and it lets us give a name to each field.

Now the symbols can be a bit more annotated:

type ('position, 'row, 'column, 'square) symbol =
  | A : (('r, 'c, 's) position,
         < a : 'r; .. > row,
         < a : 'c; .. > column,
         < a : 's; .. > square)
        symbol

  | B : (('r, 'c, 's) position,
         < b : 'r; .. > row,
         < b : 'c; .. > column,
         < b : 's; .. > square)
        symbol

  | C : (('r, 'c, 's) position,
         < c : 'r; .. > row,
         < c : 'c; .. > column,
         < c : 's; .. > square)
        symbol

  | D : (('r, 'c, 's) position,
         < d : 'r; .. > row,
         < d : 'c; .. > column,
         < d : 's; .. > square)
        symbol

Notice that '..' is not '...'. Those dots are really part of the OCaml syntax: it means 'put whatever you want here, I don't care'. There is nothing more to add to this type.
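
To see what '..' means outside of the sudoku types, here is a small stand-alone sketch (the function get_a and the two objects are made up for this illustration). An open object type accepts any object that has at least the listed methods:

```ocaml
(* Accepts any object with a method a of type int, whatever else it has. *)
let get_a (o : < a : int; .. >) : int = o#a

let small = object method a = 1 end
let big = object method a = 2 method b = "extra" end

let () =
  assert (get_a small = 1);
  assert (get_a big = 2)
```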

This type declaration reports the position information. Using the same variable name 'r in the position and in the row constraint parameter for instance means that both fields will have the same type.

For instance, a symbol 'B' in position 'g' would be in the 3rd position of its row, 2nd position of its column, and 3rd position of its square:

let v : (g, _, _, _) symbol = B;;
val v :
  (g, < b : r3 > row,
      < b : c2 > column,
      < b : s3 > square)
symbol = B

Those type constraints ensure that this is correctly reported.

The real output of the type checker is a bit more verbose, but I removed the irrelevant parts:

val v :
  (g, < a : 'a; b : r3; c : 'b; d : 'c > row,
      < a : 'd; b : c2; c : 'e; d : 'f > column,
      < a : 'g; b : s3; c : 'h; d : 'i > square)
symbol = B

We are now quite close to a completely constrained type. We just need to say that the various symbols from the same row/column/square constraint have the same type:

type grid =
  Grid :
    (a, 'row1, 'column1, 'square1) symbol *
    (b, 'row1, 'column2, 'square1) symbol *
    (c, 'row1, 'column3, 'square2) symbol *
    (d, 'row1, 'column4, 'square2) symbol *

    (e, 'row2, 'column1, 'square1) symbol *
    (f, 'row2, 'column2, 'square1) symbol *
    (g, 'row2, 'column3, 'square2) symbol *
    (h, 'row2, 'column4, 'square2) symbol *

    (i, 'row3, 'column1, 'square3) symbol *
    (j, 'row3, 'column2, 'square3) symbol *
    (k, 'row3, 'column3, 'square4) symbol *
    (l, 'row3, 'column4, 'square4) symbol *

    (m, 'row4, 'column1, 'square3) symbol *
    (n, 'row4, 'column2, 'square3) symbol *
    (o, 'row4, 'column3, 'square4) symbol *
    (p, 'row4, 'column4, 'square4) symbol
    -> grid

That is, two symbols in the same row/column/square will share the same 'row/'column/'square type. Any pair of symbols in, say, a row must agree on that type, hence on the position of every symbol.

Let's look at the 'A' symbol for the 'a' and 'c' positions for instance. Both share the same 'row1 type variable. There are two cases: either both are 'A's or one is not.

If one symbol is not an 'A', let's say those are 'C' and 'A' symbols. Their row types (pun almost intended) will be respectively < c : r1; .. > and < a : r3; .. >, meaning that 'C' does not care about the position of 'A' and conversely. Those types are compatible. No problem here.
If both are 'A's then something else happens. Their row types will be < a : r1; .. > and < a : r3; .. >, which are certainly not compatible since r1 and r3 are not compatible. This will be rejected. Now we have a grid type that checks the sudoku constraints !
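
The same mechanism can be observed on a reduced model: a single row of two cells and two symbols. This is a sketch under the assumption that the encoding above carries over unchanged to this smaller size; the names p1, p2, line and Line are introduced only for this illustration.

```ocaml
type p1 (* first position of the row *)
type p2 (* second position of the row *)

(* Records at which position each of the two symbols sits in the row. *)
type 'fields row constraint 'fields = < a : 'a; b : 'b >

type ('position, 'row) symbol =
  | A : ('p, < a : 'p; .. > row) symbol
  | B : ('p, < b : 'p; .. > row) symbol

(* Both cells share the same row type, so they must agree on it. *)
type line = Line : (p1, 'row) symbol * (p2, 'row) symbol -> line

(* Accepted: the shared row type is < a : p1; b : p2 > row. *)
let ok = Line (A, B)

(* Expected to be rejected if uncommented, since field a would have
   to be both p1 and p2:
   let not_ok = Line (A, A) *)
```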

Let's try it.

let ok =
  Grid
    (A, B, C, D,
     C, D, A, B,

     D, A, B, C,
     B, C, D, A)

val ok : grid = Grid (A, B, C, D, C, D, A, B, D, A, B, C, B, C, D, A)

let not_ok =
  Grid
    (A, B, C, D,
     C, D, A, B,

     D, A, B, C,
     B, C, A, D)

     B, C, A, D);;
  ^
Error: This expression has type
  (o, < a : r3; b : r1; c : r2; d : 'a > row,
      < a : c4; b : 'b; c : 'c; d : 'd > column,
      < a : s3; b : 'e; c : 'f; d : 'g > square)
    symbol
but an expression was expected of type
  (o, < a : r3; b : r1; c : r2; d : 'a > row,
      < a : c2; b : c3; c : c1; d : 'h > column,
      < a : 'i; b : s1; c : s2; d : 'j > square)
    symbol
Types for method a are incompatible

What it is trying to say is that 'A' is both at position '2' and '4' of its column. Well it seems to work.

Solving it

But we are not only interested in checking that a solution is correct, we want to find them !

But with 'one weird trick' we will magically transform it into a solver, namely the -> . syntax. It was introduced in OCaml 4.03 for some other purpose. But we will now use its hidden power !

This is the right-hand side of a pattern. It explicitly states that a pattern is unreachable. For instance

type _ t =
  | Int : int -> int t
  | Float : float -> float t

let add (type v) (a : v t) (b : v t) : v t =
  match a, b with
  | Int a, Int b -> Int (a + b)
  | Float a, Float b -> Float (a +. b)
  | _ -> .

By writing it here you state that you don't expect any other pattern to verify the type constraints. This is effectively the case here. In general you won't need this as the exhaustivity checker will see it. But in some intricate situations it will need some hints to work a bit more. For more information, see the article by Jacques Garrigue and Jacques Le Normand.

This may be a bit obscure, but this is what we now need. Indeed, we can ask the exhaustivity checker if there exists a value verifying the pattern and the type constraints. For instance to solve a problem, we ask the compiler to check if there is any value verifying a partial solution encoded as a pattern.

 A _ C _
 _ D _ B
 _ A D _
 D _ B _

let test x =
  match x with
  | Grid
    (A, _, C, _,
     _, D, _, B,

     _, A, D, _,
     D, _, B, _) -> .
  | _ -> ()

Error: This match case could not be refuted.
Here is an example of a value that would reach it:
Grid (A, B, C, D, C, D, A, B, B, A, D, C, D, C, B, A)

The checker tells us that there is a solution verifying those constraints, and provides it.

If there were no solution, there would have been no error.

let test x =
  match x with
  | Grid
    (A, B, C, _,
     _, _, _, D,

     _, _, _, _,
     _, _, _, _) -> .
  | _ -> ()

val test : grid -> unit = <fun>

And that's it !

Wrapping it up

Of course that's a bit cheating since the program is not executable, but who cares really ? If you want to use it, I made a small (ugly) script generating those types. You can try it on bigger problems, but in fact it is a bit exponential. So you shouldn't really expect an answer too soon.

Comments

Louis Gesbert (28 April 2017 at 8 h 11 min):

Brilliant!

new opam features: "opam build"

2017-03-16T09:05:17Z

UPDATE: after discussions following this post, this feature was abandoned with the interface presented below. See this post for the details and the new interface!

The new opam 2.0 release, currently in beta, introduces several new features. This post gets into some detail on the new opam build command, its purpose, its use, and some implementation aspects.

opam build is run from the source tree of a project, and does not rely on a pre-existing opam installation. As such, it adds a new option besides the existing workflows based on managing shared OCaml installations in the form of switches.

What does it do ?

Typically, this is used in a fresh git clone of some OCaml project. Like when pinning the package, opam will find and leverage package definitions found in the source, in the form of opam files.

if opam hasn't been initialised (no ~/.opam), this is taken care of.
if no switch is otherwise explicitly selected, a local switch is used, and created if necessary (i.e. in ./_opam/)
the metadata for the current project is registered, and the package installed after its dependencies, as opam usually does

This is particularly useful for distributing projects to people not used to opam and the OCaml ecosystem: the setup steps are automatically taken care of, and a single opam build invocation can take care of resolving the dependency chains for your package.

If building the project directly is preferred, adding --deps-only is a good way to get the dependencies ready for the project:

opam build --deps-only
eval $(opam config env)
./configure; make; etc.

Note that if you just want to handle project-local opam files, opam build can also be used in your existing switches: just specify --no-autoinit, --switch or make sure the OPAMSWITCH variable is set. E.g. opam build --no-autoinit --deps-only is a convenient way to get the dependencies for the local project ready in your current switch.

Additional functions

Installation

The installation of the packages happens as usual to the prefix corresponding to the switch used (<project-root>/_opam/ for a local switch). But it is possible, with --install-prefix, to further install the package to the system:

opam build --install-prefix ~/local

will install the results of the package found in the current directory below ~/local.

The dependencies of the package won't be installed, so this is intended for programs, assuming they are relocatable, and not for libraries.

Choosing custom repositories

The user can pre-select the repositories to use on the creation of the local switch with:

opam build --repositories <repos>

where <repos> is a comma-separated list of repositories, specified either as name=URL, or name if already configured on the system.

Multiple packages

Multiple packages are commonly found to share a single repository. In this case, opam build registers and builds all of them, respecting cross-dependencies. The opam files to use can also be explicitly selected on the command-line.

In this case, specific opam files must be named <package-name>.opam.

Implementation details

The choice of the compiler, on automatic initialisation, is either explicit, using the --compiler option, or automatic. In the latter case, the default selection is used (see opam init --help, section "CONFIGURATION FILE" for details), but a compiler compatible with the local packages found is searched from that. This makes it possible, for example, to choose a system compiler when available and compatible, avoiding a recompilation of OCaml.

When using --install-prefix, the normal installation is done, then the tracking of package-installed files, introduced in opam 2.0, is used to extract the installed files from the switch and copy them to the prefix.

The packages installed through opam build are not registered in any repository, and this is not an implicit use of opam pin: the rationale is that packages installed this way will also be updated by repeating opam build. This means that when using other commands, e.g. opam upgrade, opam won't try to keep the packages to their local, source version, and will either revert them to their repository definition, or remove them, if they need recompilation.

Planned extensions

This is still in beta: there are still rough edges, please experiment and give feedback! It is still possible that the command syntax and semantics change significantly before release.

Another use-case that we are striving to improve is sharing of development setups (share sets of pinned packages, depend on specific remotes or git hashes, etc.). We have many ideas to improve on this, but opam build is not, as of today, a direct solution to this. In particular, installing this way still relies on the default opam repository; a way to define specific options for the switch that is implicitly created on opam build is in the works.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com.

Comments

Louis Gesbert (16 March 2017 at 14 h 31 min):

Some discussion on a better naming and making some parts of this more widely available in the opam CLI is ongoing at https://github.com/ocaml/opam/issues/2882

Hez Carty (16 March 2017 at 17 h 23 min):

Is it possible/planned to support sharing of compilers across local (or global) switches? It would be very useful to have a global 4.04.0+flambda switch including only the compiler itself or the compiler + basic tools like ocp-indent and merlin. Then a number of projects could share this base installation but have their own locally installed dependencies without duplicating the entire build time per-project.

Louis Gesbert (17 March 2017 at 10 h 10 min):

Sharing compilers, or other packages across switches is not supported at the moment. However:

You can still use the global system compiler on any switch, local or not, to avoid its recompilation.
What is planned, as a first step, for after the 2.0 release, is to add a cache of compiled packages. Hooks are already in place to allow this, and opam is able to track the files installed by each package already, so the most difficult part is probably going to be the relocation issues with OCaml itself.

A cache is an easier solution to warrant consistency: with shared switches, the problem of reinstallations and keeping everything consistent gets much more complex — what happens when you change the compiler of your “master” switch ?

Hez Carty (20 March 2017 at 16 h 46 min):

That sounds great, thank you. Should make this kind of local switch more useful when working with large numbers of projects.

opam 2.0 Beta is out!

2017-02-09T09:05:17Z

UPDATE (2017-02-14): A beta2 is online, which fixes issues and performance of the opam build command. Get the new binaries, or recompile the opam-devel package and replace the previous binary.

We are pleased to announce that the beta release of opam 2.0 is now live! You can try it already, bootstrapping from a working 1.2 opam installation, with:

opam update; opam install opam-devel

With about a thousand patches since the last stable release, we took the time to gather feedback after our last announcement and implemented a couple of additional, most-wanted features:

An opam build command that, from the root of a source tree containing one or more package definitions, can automatically handle initialisation and building of the sources in a local switch.
Support for repository signing through the external Conex tool, being developed in parallel.

There are many more features, like the new opam clean and opam admin commands, a new archive caching system, etc., but we'll let you check the full changelog.

We also improved still on the already announced features, including compilers as packages, local switches, per-switch repository configuration, package file tracking, etc.

The updated documentation is at http://opam.ocaml.org/doc/2.0/. If you are developing in opam-related tools, you may also want to browse the new APIs.

Try it out

Please try out the beta, and report any issues or missing features. You can:

Build it from source in opam, as shown above (opam install opam-devel)
Use the pre-built binaries.
Build from the source tarball: download here and build using ./configure && make lib-ext && make if you have OCaml >= 4.01 already available; make cold otherwise
Or directly from the git tree, following the instructions included in the README. Some files have been moved around, so if your build fails after you updated an existing git clone, try to clean it up (git clean -dx).

Some users have been using the alpha for the past months without problems, but you may want to keep your opam 1.2 installation intact until the release is out. An easy way to do this is with an alias:

alias opam2="OPAMROOT=~/.opam2 path/to/opam-2-binary"

Changes to be aware of

Command-line interface

opam switch create is now needed to create new switches, and opam switch is now much more expressive
opam list is also much more expressive, but be aware that the output may have changed if you used it in scripts
new commands:
- opam build: setup and build a local source tree
- opam clean: various cleanup operations (wiping caches, etc.)
- opam admin: manage software repositories, including upgrading them to opam 2.0 format (replaces the opam-admin tool)
- opam env, opam exec, opam var: shortcuts for the opam config subcommands
opam repository add will now setup the new repository for the current switch only, unless you specify --all
Some flags, like --test, now apply to the packages listed on the command-line only. For example, opam install lwt --test will build and install lwt and all its dependencies, but only build/run the tests of the lwt package. Test-dependencies of its dependencies are also ignored
The new opam install --soft-request is useful for batch runs, it will maximise the installed packages among the requested ones, but won't fail if all can't be installed

As before, opam is self-documenting, so be sure to check opam COMMAND --help first when in doubt. The bash completion scripts have also been thoroughly improved, and may help navigating the new options.

Metadata

There are both a few changes (extensions, mostly) to the package description format, and more drastic changes to the repository format, mainly related to translating the old compiler definitions into packages.

opam will automatically update, internally, definitions of pinned packages as well as repositories in the 1.2 format
however, it is faster to use repositories in the 2.0 format directly. To that end, please use the opam admin upgrade command on your repositories. The --mirror option will create a 2.0 mirror and put in place proper redirections, allowing your original repository to retain the old format

The official opam repository at https://opam.ocaml.org remains in 1.2 format for now, but has a live-updated 2.0 mirror to which you should be automatically redirected. It cannot yet accept package definitions in 2.0 format.

Package format

Any available: constraints based on the OCaml compiler version should be rewritten into dependencies to the ocaml package
Separate build: and install: instructions are now required
It is now preferred to include the contents of the old url and descr files (the archive URL and package description) in the opam file itself (see the new synopsis: and description: fields, and the url {} file section)
Building tests and documentation should now be part of the main build: instructions, using the {test} and {doc} filters. The build-test: and build-doc: fields are still supported.
It is now possible to use opam variables within dependencies, for example depends: [ "foo" {= version} ], for a dependency to package foo at the same version as the package being defined, or depends: [ "bar" {os = "linux"} ] for a dependency that only applies on Linux.
The new conflict-class: field allows mutual conflicts among a set of packages to be declared. Useful, for example, when there are many concurrent, incompatible implementations.
The ocaml-version: field has been deprecated for a long time and is no longer accepted. This should now be a dependency on the ocaml package
Three types of checksums are now accepted: you should use md5=<hex-value>, sha256=<hex-value> or sha512=<hex-value>. We'll be gradually deprecating md5 in favour of the more secure algorithms; multiple checksums are allowed
Patches supplied in the patches: field must apply with patch -p1
The new setenv: field allows packages to export updates to environment variables;
Custom fields x-foo: can be used for extensions and external tools
""" delimiters allow unescaped strings
& now has the customary higher precedence than | in formulas
Installed files are now automatically tracked meaning that the remove: field is usually no longer required.
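
To make the checksum syntax above concrete, here is a minimal OCaml sketch that formats an md5=<hex-value> string using the standard library's Digest module (which computes MD5). The helper name md5_checksum_field is hypothetical and not part of opam; sha256 and sha512 values would need an external tool or library.

```ocaml
(* Hypothetical helper: format an MD5 checksum as "md5=<hex-value>".
   Digest is the standard library's MD5 implementation. *)
let md5_checksum_field (contents : string) : string =
  "md5=" ^ Digest.to_hex (Digest.string contents)

let () =
  (* Print the checksum field for an empty archive, as a smoke test. *)
  print_endline (md5_checksum_field "")
```

In a real package you would hash the bytes of the source archive rather than a literal string.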

The full, up-to-date specification of the format can be browsed in the manual.

Repository format

In the official, default repository, and also when migrating repositories from older format versions, there are:

A virtual ocaml package, that depends on any implementation of the OCaml compiler. This is what packages should depend on, and the version is the corresponding base OCaml version (e.g. 4.04.0 for the 4.04.0+fp compiler). It also defines various configuration variables, see opam config list ocaml.
Three mutually-exclusive packages providing actual implementations of the OCaml toolchain:
- ocaml-base-compiler contains the official releases
- ocaml-variants.<base-version>+<variant-name> contains all the other variants
- ocaml-system-compiler maps to a compiler installed on the system outside of opam

The layout is otherwise the same, apart from:

The compilers/ directory is ignored
A repo file should be present, containing at least the line opam-version: "2.0"
The indexes for serving over HTTP have been simplified, and urls.txt is no longer needed. See opam admin index --help
The archives/ directory is no longer used. The cache now uses a different format and is configured through the repo file, defaulting to cache/ on the same server. See opam admin cache --help

Feedback

Thanks for trying out the beta! Please let us have feedback, preferably to the opam tracker; other options include the opam-devel list and #opam IRC channel on Freenode.

Release of Alt-Ergo 1.30 with experimental support for models generation

2016-11-21T09:05:17Z

We have recently released a new (public up-to-date) version of Alt-Ergo. We focus in this article on its main new feature: experimental support for models generation. This work has been done with Frédéric Lang, an intern at OCamlPro from February to July 2016.

The idea behind models generation

The idea behind this feature is the following: when Alt-Ergo fails to prove the validity of a given formula F, it tries to compute and exhibit values for the terms of the problem that make the negation of F satisfiable. For instance, for the following example, written in Alt-Ergo's syntax,

logic f : int -> int
logic a, b : int
goal g:
(a <> b and f(a) <= f(b) + 2*a) ->
false

a possible (counter) model is a = 1, b = 3, f(a) = 0, and f(b) = 0. The solution is called a candidate model because universally quantified formulas are, in general, not taken into account. We talk about counter example or counter model because the solution falsifies (i.e. satisfies the negation of) F.
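
The counter model can be checked by hand. The small OCaml sketch below does so, assuming (this is our choice, not Alt-Ergo's) that f is the constant function 0, which agrees with the model on a and b.

```ocaml
(* Candidate counter-model from the text: a = 1, b = 3, f(a) = f(b) = 0.
   Outside {a, b} the function is unconstrained; here it is constant. *)
let a = 1
let b = 3
let f (_ : int) : int = 0

(* Left-hand side of the implication in goal g. *)
let hypothesis = a <> b && f a <= f b + 2 * a

(* The goal is "hypothesis -> false", i.e. the negation of the hypothesis. *)
let goal_holds = not hypothesis

let () =
  Printf.printf "hypothesis = %b, goal holds = %b\n" hypothesis goal_holds
```

The hypothesis evaluates to true under this model, so the goal itself is false, which is exactly what a counter model should show.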

Basic usage

Models generation in Alt-Ergo is non-intrusive. It is controlled via a new option called -interpretation. This option requires an integer argument. The default value 0 disables the feature, and:

-interpretation 1 triggers a model computation and display at the end of Alt-Ergo's execution (i.e. just before returning I don't know),
-interpretation 2 triggers a model computation before each axioms instantiation round,
-interpretation 3 is the most aggressive. It triggers a model computation before each Boolean decision in the SAT.

For the last two strategies, the model will be displayed at the end of the execution if the given formula is not proved. Note that a negative argument (-1, -2, or -3) will enable model computation as explained above, but the result will not be displayed (useful for automatic testing). In addition, if Alt-Ergo times out, the latest computed model, if any, will be shown.

Advanced usage

If you are not on Windows, you will also be able to use the option -interpretation-timelimit to try to get a candidate model even when Alt-Ergo hits a given time limit (set with the option -timelimit). The idea is simple: if Alt-Ergo fails to prove validity during the time allocated for "proof search", it will activate models generation and try to get a counter example during the time allocated for that.

Form of produced models

Currently, models are printed in a syntax similar to SMT2's. We made this choice because Why3 already parses models in this format. For instance, Alt-Ergo outputs the following model for the example above:

(
(a 1)
(b 3)
((f 3) 0)
((f 1) 0)
)

Some known issues and limitations

For the moment, arrays are interpreted in terms of the accesses that appear in the input formula, or that have been added internally by the decision procedure. In particular, a non-constrained array arr will probably be uninterpreted in the model (which would mean that it can have any well-typed value at any well-typed index).
Model generation may not terminate in presence of non-linear arithmetic. This is actually the case for the example below (Alt-Ergo handles rationals, and there is no rational x such that x * x = 2). We plan to implement a delta-completeness like approach to stop splitting when intervals become really too small. goal g: forall x : real. x * x = 2. -> false.
Currently, we generate a model for the content of the decision procedures part. Since the SAT's model is (in general) partial in Alt-Ergo, some ground terms may be missing. Moreover, no filtering with labels mechanism is done for the moment.

Alt-Ergo 1.30 vs 1.20 vs 1.01 releases

A quick comparison between this new version, the latest private release (1.20), and the latest public release (1.01) on our internal benchmarks is shown below. As you can see, this version is faster and discharges more formulas.

	Alt-Ergo 1.01	Alt-Ergo 1.20	Alt-Ergo 1.30
Why3 benchmarks (9752 VCs)	88.36% 7310 seconds	89.23% 7155 seconds	89.57% 4553 seconds
SPARK benchmarks (14442 VCs)	78.05% 3872 seconds	78.42% 3042 seconds	78.56% 2909 seconds
BWare benchmarks (12828 VCs)	97.38% 6373 seconds	98.02% 6907 seconds	98.31% 4231 seconds

Download, install and bug reports

You can learn more about Alt-Ergo and download the latest version on the solver's website. You can also install it via the OPAM package manager. For bug reports, we recommend Alt-Ergo's issue tracker on GitHub.

Don't hesitate to give us your feedback to help us improve Alt-Ergo. You can also contribute benchmarks to diversify and enrich our internal test-suite.

opam-lib 1.3 available

2016-11-20T09:05:17Z

opam-lib 1.3

The package for opam-lib version 1.3 has just been released in the official opam repository. There is no release of opam with version 1.3, but this is an intermediate version of the library that retains compatibility of the file formats with 1.2.2.

The purpose of this release is twofold:

provide some fixes and enhancements over opam-lib 1.2.2. For example, 1.3 has an enhanced lint function
be a step towards migration to opam-lib 2.0.

This version is compatible with the current stable release of opam (1.2.2), but dependencies have been updated so that you are not (e.g.) stuck on an old version of ocamlgraph.

Therefore, I encourage all maintainers of tools based on opam-lib to migrate to 1.3.

The respective APIs are available in html for 1.2 and 1.3.

A note on plugins: when you write opam-related tools, remember that by setting flags: plugin in their definition and installing a binary named opam-toolname, you will enable the users to install package toolname and run your tool with a single opam toolname command.

Architectural changes

If you need to migrate from 1.2 to 1.3, these tips may help:

there are now 6 different ocamlfind sub-libraries instead of just 4: format, which contains the handlers for opam types and file formats, has been split out from the core library, while state, which handles the state of a given opam root and switch, has been split from the client library.
OpamMisc is gone and moved into the better organised OpamStd, with submodules for String, List, etc.
OpamGlobals is gone too, and its contents have been moved to:
- OpamConsole for the printing, logging, and shell interface handling part
- OpamXxxConfig modules for each of the libraries for handling the global configuration variables. You should call the respective init functions, with the options you want to set, for proper initialisation of the lib options (and handling the OPAMXXX environment variables)
OpamPath.Repository is now OpamRepositoryPath, and part of the repository sub-library.

opam-lib 2.0 ?

The development version of the opam-lib (2.0~alpha5 as of writing) is already available on opam. The name has been changed to provide a finer granularity, so it can actually be installed concurrently -- but be careful not to confuse the ocamlfind package names (opam-lib.format for 1.3 vs opam-format for 2.0).

The provided packages are:

opam-file-format: now separated from the opam source tree, this has no dependencies and can be used to parse and print the raw opam syntax.
opam-core: the basic toolbox used by opam, which actually doesn't include the opam specific part. Includes a tiny extra stdlib, the engine for running a graph of processes in parallel, some system handling functions, etc. Depends on ocamlgraph and re only.
opam-format: defines opam data types and their file i/o functions. Depends just on the two above.
opam-solver: opam's interface with the dose3 library and external solvers.
opam-repository: fetching repositories and package sources from all handled remote types.
opam-state: handling of the opam states, at the global, repository and switch levels.
opam-client: the client library, providing the top-level operations (installing packages...), and CLI.
opam-devel: this packages the development version of the opam tool itself, for bootstrapping. You can install it safely as it doesn't install the new opam in the PATH.

The new API can also be browsed; please get in touch if you have trouble migrating.

opam 2.0 preview release!

2016-09-20T09:05:17Z

We are pleased to announce a preview release for opam 2.0, with over 700 patches since 1.2.2. Version 2.0~alpha4 has just been released, and is ready to be more widely tested.

This version brings many new features and changes, the most notable one being that OCaml compiler packages are no longer special entities, and are replaced by standard package definition files. This in turn means that opam users have more flexibility in how switches are managed, including for managing non-OCaml environments such as Coq using the same familiar tools.

A few highlights

This is just a sample, see the full changelog for more:

Sandboxed builds: Command wrappers can be configured to, for example, restrict permissions of the build and install processes using Linux namespaces, or run the builds within Docker containers.
Compilers as packages: This brings many advantages for opam workflows, such as being able to upgrade the compiler in a given switch, better tooling for local compilers, and the possibility to define coq as a compiler or even use opam as a generic shell scripting engine with dependency tracking.
Local switches: Create switches within your projects for easier management. Simply run opam switch create <directory> <compiler> to get started.
Inplace build: Use opam to build directly from your source directory. Ensure the package is pinned locally then run opam install --inplace-build.
Automatic file tracking: opam now tracks the files installed by packages and is able to cleanly remove them when no existing files were modified. The remove: field is now optional as a result.
Configuration file: This can be used to direct choices at opam init automatically (e.g. specific repositories, wrappers, variables, fetch commands, or the external solver). This can be used to override all of opam's OCaml-related settings.
Simpler library: the OCaml API is completely rewritten and should make it much easier to write external tools and plugins. Existing tools will need to be ported.
Better error mitigation: Through clever ordering of the shell actions and separation of build and install, most build failures can keep your current installation intact, not resulting in removed packages anymore.

Roll out

You are very welcome to try out the alpha, and report any issues. The repository at opam.ocaml.org will remain in 1.2 format (with a 2.0 mirror at opam.ocaml.org/2.0~dev in sync) until after the release is out, which means the extensions can not be used there yet, but you are welcome to test on local or custom repositories, or package pinnings. The reverse translation (2.0 to 1.2) is planned, to keep supporting 1.2 installations after that date.

The documentation for the new version is available at http://opam.ocaml.org/doc/2.0/. This is still work in progress, so please do ask if anything is unclear.

Interface changes

The opam switch and opam list commands have been overhauled for more consistency and flexibility: the former won't implicitly create new switches unless called with the create subcommand, and opam list now allows combining filters and finely specifying the output format. They may not be fully backwards compatible, so please check your scripts.

Most other commands have also seen fixes or improvements. For example, opam doesn't forget about your set of installed packages on the first error, and the new opam install --restore can be used to reinstall your selection after a failed upgrade.

Repository changes

While users of opam 1.2 should feel at home with the changes, the 2.0 repository and package formats are not compatible. Indeed, the move of the compilers to standard packages implies some conversions, and updates to the relationships between packages and their compiler. For example, package constraints like

available: [ ocaml-version >= "4.02" ]

are now written as normal package dependencies:

depends: [ "ocaml" {>= "4.02"} ]

To make the transition easier,

upgrade of a custom repository is simply a matter of running opam-admin upgrade-format at its root;
the official repository at opam.ocaml.org already has a 2.0 mirror, to which you will be automatically redirected;
package definitions are automatically converted when you pin a package.

Note that the ocaml package on the official repository is actually a wrapper that depends on one of ocaml-base-compiler, ocaml-system or ocaml-variants, which contain the different flavours of the actual compiler. It is expected that it may only get picked up when requested by package dependencies.

Package format changes

The opam package definition format is very similar to before, but there are quite a few extensions and some changes:

it is now mandatory to separate the build: and install: steps (this allows tracking of installed files, better error recovery, and some optional security features);
the url and description can now optionally be included in the opam file using the section url {} and fields synopsis: and description:;
it is now possible to have dependencies toggled by globally-defined opam variables (e.g. for a dependency needed on some OS only), or even rely on the package information (e.g. have a dependency at the same version);
the new setenv: field allows packages to export updates to environment variables;
custom fields x-foo: can be used for extensions and external tools;
allow """ delimiters around unescaped strings
& is now parsed with higher priority than |
field ocaml-version: can no longer be used
the remove: field should not be used anymore for simple cases (just removing files)

Let's go then -- how to try it?

First, be aware that you'll be prompted to update your ~/.opam to 2.0 format before anything else, so if you value it, make a backup. Or just export OPAMROOT to test the alpha on a temporary opam root.

Packages for opam 2.0 are already in the opam repository, so if you have a working installation of opam (at least 1.2.1), you can bootstrap as easily as:

opam install opam-devel

This doesn't install the new opam to your PATH within the current opam root for obvious reasons, so you can manually install it as e.g. "opam2" using:

sudo cp $(opam config var "opam-devel:lib")/opam /usr/local/bin/opam2

You can otherwise install as usual:

Using pre-built binaries (available for OSX and Linux x86, x86_64, armhf) and our install script:

wget https://raw.github.com/ocaml/opam/2.0-alpha4-devel/shell/opam_installer.sh -O - | sh -s /usr/local/bin

Equivalently, pick your version and download it to your PATH;
Building from our inclusive source tarball: download here and build using ./configure && make lib-ext && make && make install if you have OCaml >= 4.01 already available, make cold && make install otherwise;
Or from source, following the included instructions from the README. Some files have been moved around, so if your build fails after you updated an existing git clone, try to clean it up (git clean -fdx).

ASM.OCaml

2016-04-01T09:05:17Z

As you may know, there is a subset of JavaScript (asm.js) that compiles efficiently to assembly and is used as a backend by various compilers, including C compilers such as Emscripten. In the same spirit, we'd like to show you how to never allocate in OCaml.

Before starting to write anything, we must know how to tell whether a piece of code allocates. The best way currently is to look at the Cmm intermediate representation. We can see it by calling ocamlopt with the -dcmm option:

ocamlopt -c -dcmm test.ml

let f x = (x,x)

Some excerpt from the output:

(function camlTest__f_4 (x_6/1204: val) (alloc 2048 x_6/1204 x_6/1204))

To improve readability, in this post we will clean a bit the variable names:

(function f (x: val) (alloc 2048 x x))

We see that the function f (named camlTest__f_4) is calling the alloc primitive, which obviously is an allocation. Here, this creates a block of size 2 with tag 0 (2048 = (2 << 10) + 0) containing the value x_6/1204 twice, which was x in the source. So we can detect whether some code is allocating by running ocamlopt -c -dcmm test.ml 2>&1 | grep alloc (obviously any function or variable named alloc will also appear).
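
The arithmetic behind the 2048 constant can be sketched in OCaml as follows. This only models the layout described above (size in the high bits, tag in the low 8 bits); the function names are ours, and the two colour bits used by the garbage collector are assumed to be zero.

```ocaml
(* Sketch of the block header layout described in the text: size in the
   high bits, tag in the low 8 bits. The two GC colour bits (bits 8-9)
   are assumed to be zero here. *)
let make_header ~size ~tag = (size lsl 10) lor tag
let header_size header = header lsr 10
let header_tag header = header land 0xff

let () =
  let h = make_header ~size:2 ~tag:0 in
  Printf.printf "header = %d, size = %d, tag = %d\n"
    h (header_size h) (header_tag h)
```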

It is possible to write code that doesn't allocate (in the heap) at all, but what are the limitations? For instance, the omnipresent Fibonacci function does not allocate:

let rec fib = function
  | 0 -> 0
  | 1 -> 1
  | n -> fib (n-1) + fib (n-2)

(function fib (n: val)
  (if (!= n 1)
  (if (!= n 3)
    (let Paddint_arg (app "fib" (+ n -4) val)
    (+ (+ (app "fib" (+ n -2) val) Paddint_arg) -1))
    3)
  1))

But quite a lot of common patterns do:

Building structured values will allocate (tuple, records, sum types containing an element, ...)
Using floats, int64, ... will allocate
Declaring non-toplevel functions will allocate
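
To make the three patterns above concrete, here is one small function for each. This is only an illustration: whether a particular call really allocates depends on the compiler version and on inlining, so the -dcmm output remains the reference.

```ocaml
(* Structured value: builds a record, i.e. a block on the heap. *)
type point = { x : int; y : int }
let make_point x y = { x; y }

(* Boxed number: the float result has to be boxed to be returned. *)
let average a b = (float_of_int a +. float_of_int b) /. 2.

(* Non-toplevel function: the returned closure captures [n]. *)
let make_adder n = fun m -> n + m

let () =
  let p = make_point 1 2 in
  Printf.printf "%d %d %.1f %d\n" p.x p.y (average 1 2) (make_adder 3 4)
```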

Considering that, it can appear that it is almost impossible to write any non-trivial code without using those. But that's not completely true.

There are some exceptions to those rules, where some normally allocating constructions can be optimised away. We will explain how to exploit them to be able to write some more code.

Local references

Maybe the most important one is the case of local references.

let fact n =
  let result = ref 1 in
  for i = 1 to n do
    result := i * !result
  done;
  !result

To improve readability, this has been cleaned and demangled

(function fact (n: val)
  (let (result 3)
  (seq
    (let (i 3 bound n)
    (catch
      (if (> i bound) (exit 3)
      (loop
        (assign result (+ (* (+ i -1) (>>s result 1)) 1))
        (assign i (+ i 2))
      (if (== i bound) (exit 3) [])
    with(3) []))))
  result)))

You can see that the allocation of the reference has disappeared. The modifications were replaced by assignments (the assign operator) to the result variable. This transformation can happen when a reference is only ever used as an argument of the ! and := operators and does not appear in the closure of any local function. Here is an example where the reference does appear in a closure:

let counter () =
  let count = ref 0 in
  let next () = count := !count + 1; !count in
  next

The transformation won't happen in this case, since count is in the closure of next.
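
As a run-time complement to reading the Cmm output, the standard Gc.minor_words function can give a rough idea of how much a call allocates. The sketch below is an assumption-laden measurement, not a proof: the helper minor_words_during is ours, and the figure it reports includes whatever the measurement itself allocates (for instance boxed floats when running bytecode).

```ocaml
(* Loop with a local reference that never escapes. *)
let sum_to n =
  let total = ref 0 in
  for i = 1 to n do
    total := !total + i
  done;
  !total

(* Words allocated in the minor heap while running [f x]. The measurement
   itself may add a few words, so treat the number as an upper bound. *)
let minor_words_during f x =
  let before = Gc.minor_words () in
  let result = f x in
  let after = Gc.minor_words () in
  (result, after -. before)

let () =
  let result, words = minor_words_during sum_to 1000 in
  Printf.printf "sum_to 1000 = %d, about %.0f minor words allocated\n"
    result words
```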

Unboxing

The float, int32, int64 and nativeint types do not fit in the generic representation of values that can be stored in the OCaml heap, so they are boxed. This means that they are allocated and there is an annotation to tell the garbage collector to skip their content. So using them in general will allocate. But an important optimization is that local uses (some cases that obviously won't go in the heap) are 'unboxed', i.e. not allocated.
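
Here is a hedged sketch of such a local use. The float accumulator and the mean never leave the function, and the result is an int, so with ocamlopt we would expect no float to be boxed; this is our example, so check the Cmm output for your compiler version.

```ocaml
(* The accumulator [acc] is a local float reference and [mean] is a local
   float: both are only used inside the function. The result is an int,
   so no float has to be boxed for the return value either. *)
let count_above_mean (values : float array) : int =
  let n = Array.length values in
  if n = 0 then 0
  else begin
    let acc = ref 0. in
    for i = 0 to n - 1 do
      acc := !acc +. values.(i)
    done;
    let mean = !acc /. float_of_int n in
    let count = ref 0 in
    for i = 0 to n - 1 do
      if values.(i) > mean then count := !count + 1
    done;
    !count
  end
```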

If/match couple

A change in OCaml 4.03 also improves some cases of branches returning tuples:

let positive_difference x y =
  let (min, max) =
    if x < y then
      (x, y)
    else
      (y, x)
  in
  max - min

(function positive_difference (x: val y: val)
  (catch
    (if (< x y) (exit 7 y x)
    (exit 7 x y))
  with(7 max min) (+ (- max min) 1)))
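
The stray constants in the Cmm listings above (such as the final + 1 here, or the 1 and 3 standing for 0 and 1 in fib) come from the tagged representation of integers: the OCaml integer n is stored as the machine word 2n + 1. The following sketch models that encoding; the function names are ours.

```ocaml
(* Tagged representation: the OCaml integer n is stored as 2n + 1. *)
let tag n = (n lsl 1) lor 1
let untag w = w asr 1

(* Tagged counterparts of + and -, as they appear in the Cmm listings:
   (a + b) is emitted as (+ (+ a b) -1) and (a - b) as (+ (- a b) 1). *)
let tagged_add a b = a + b - 1
let tagged_sub a b = a - b + 1

let () =
  Printf.printf "tag 0 = %d, tag 1 = %d\n" (tag 0) (tag 1);
  Printf.printf "7 - 3 = %d\n" (untag (tagged_sub (tag 7) (tag 3)))
```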

Control flow

You can do almost any control flow like that, but this is quite impractical and is still limited in many ways.

If you don't want to write everything as for and while loops, you can write functions for your control flow, but to prevent allocation you will have to refrain from doing a few things. For instance, you should of course not pass records or tuples as arguments to functions. Instead, pass each field separately as a different argument.
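
A small illustration of that advice, with function names of our own choosing:

```ocaml
(* Version taking tuples: each call site has to build two pairs first
   (unless the compiler manages to inline the call or the pairs are
   constants). *)
let manhattan_pairs (x1, y1) (x2, y2) = abs (x1 - x2) + abs (y1 - y2)

(* Version taking each component as a separate argument: nothing has to
   be packed into a block before the call. *)
let manhattan x1 y1 x2 y2 = abs (x1 - x2) + abs (y1 - y2)

let () = Printf.printf "%d\n" (manhattan 0 0 3 4)
```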

But what happens when you want to return multiple values? There is an ongoing project to try to optimise the allocations of some of those cases away, but currently you can't. Really? NO!

Returning multiple values

If you bend your mind a bit, you may see that returning from a function is almost the same thing as calling one... Or you can make it that way. So let's transform our code into 'Continuation Passing Style'.

For instance, let's write a function that finds the minimum and the maximum of a list. It could be written like this:

let rec fold_left f init l =
  match l with
  | [] -> init
  | h :: t ->
    let acc = f init h in
    fold_left f acc t

let keep_min_max (min, max) v =
  let min = if v < min then v else min in
  let max = if v > max then v else max in
  min, max

let find_min_max l =
  match l with
  | [] -> invalid_arg "find_min_max"
  | h :: t ->
    fold_left keep_min_max (h, h) t

Continuation Passing Style

Transforming it to continuation passing style (CPS) replaces every function return by a tail-call to a function representing 'what happens after'. This function is usually called a continuation, and a convention is to use the variable name 'k' for it.

Let's start simply by turning only the keep_min_max function into continuation passing style.

let keep_min_max (min, max) v k =
  let min = if v < min then v else min in
  let max = if v > max then v else max in
  k (min, max)

val keep_min_max : 'a * 'a -> 'a -> ('a * 'a -> 'b) -> 'b

That's all here. But of course we need to modify a bit the function calling it.

let rec fold_left f init l =
  match l with
  | [] -> init
  | h :: t ->
    let k acc =
      fold_left f acc t
    in
    f init h k

val fold_left : ('a -> 'b -> ('a -> 'a) -> 'a) -> 'a -> 'b list -> 'a
val find_min_max : 'a list -> 'a * 'a

Here instead of calling f then recursively calling fold_left, we prepare what we will do after calling f (that is calling fold_left) and then we call f with that continuation. find_min_max is unchanged and still has the same type.

But we can continue turning things in CPS, and a full conversion would result in:

let rec fold_left_k f init l k =
  match l with
  | [] -> k init
  | h :: t ->
    let k acc =
      fold_left_k f acc t k
    in
    f init h k
val fold_left_k : ('a -> 'b -> ('a -> 'c) -> 'c) -> 'a -> 'b list -> ('a -> 'c) -> 'c

let keep_min_max_k (min, max) v k =
  let min = if v < min then v else min in
  let max = if v > max then v else max in
  k (min, max)
val keep_min_max_k : 'a * 'a -> 'a -> ('a * 'a -> 'b) -> 'b

let find_min_max_k l k =
  match l with
  | [] -> invalid_arg "find_min_max"
  | h :: t ->
    fold_left_k keep_min_max_k (h, h) t k
val find_min_max_k : 'a list -> ('a * 'a -> 'b) -> 'b

let find_min_max l =
  find_min_max_k l (fun x -> x)
val find_min_max : 'a list -> 'a * 'a
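
As a usage sketch, the continuation does not have to be the identity: the caller can consume the pair directly. The block below repeats the definitions so that it stands alone (with the local continuation written as an anonymous function); spread is a name we introduce for the example.

```ocaml
let rec fold_left_k f init l k =
  match l with
  | [] -> k init
  | h :: t -> f init h (fun acc -> fold_left_k f acc t k)

let keep_min_max_k (min, max) v k =
  let min = if v < min then v else min in
  let max = if v > max then v else max in
  k (min, max)

let find_min_max_k l k =
  match l with
  | [] -> invalid_arg "find_min_max"
  | h :: t -> fold_left_k keep_min_max_k (h, h) t k

(* The continuation receives the (min, max) pair and decides what to do
   with it; here it computes the spread of the list. *)
let spread l = find_min_max_k l (fun (min, max) -> max - min)

let () = Printf.printf "spread = %d\n" (spread [3; 1; 4; 1; 5])
```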

Where rectypes matter for performance reasons

That's nice, we only have tail calls now, but we are not done removing allocation yet of course. We now need to get rid of the allocation of the closure in fold_left_k and of the pairs in keep_min_max_k. For that, we need to pass everything that would otherwise be allocated as arguments:

let rec fold_left_k2 f init1 init2 l k =
  match l with
  | [] -> k init1 init2
  | h :: t ->
    f init1 init2 h t fold_left_k2 k

val fold_left_k2 :
  ('b -> 'c -> 'd -> 'd list -> 'a -> ('b -> 'c -> 'e) -> 'e) ->
  'b -> 'c -> 'd list -> ('b -> 'c -> 'e) -> 'e as 'a

let rec keep_min_max_k2 = fun min max v k_arg k k2 ->
  let min = if v < min then v else min in
  let max = if v > max then v else max in
  k keep_min_max_k2 min max k_arg k2

val keep_min_max_k2 :
  'b -> 'b -> 'b -> 'c -> ('a -> 'b -> 'b -> 'c -> 'd -> 'e) -> 'd -> 'e as 'a

let find_min_max_k2 l k =
  match l with
  | [] -> invalid_arg "find_min_max"
  | h :: t ->
    fold_left_k2 keep_min_max_k2 h h t k

val find_min_max_k2 : 'a list -> ('a -> 'a -> 'b) -> 'b

For some reason, we now need to activate 'rectypes' to allow functions to have a recursive type (the 'as 'a') but we managed to completely get rid of allocations.

(function fold_left_k2 (f: val init1: val init2: val l: val k: val)
  (if (!= l 1)
  (app "caml_apply6" init1 init2 (load val l) l "fold_left_k2" k f val)
  (app "caml_apply2" init1 init2 k val)))

(function keep_min_max_k2 (min: val max: val v: val k_arg: val k: val k2: val)
  (let
  (min
  (if (!= (extcall "caml_lessthan" v min val) 1)
  v min)
  max
  (if (!= (extcall "caml_greaterthan" v max val) 1)
  v max))
  (app "caml_apply5" "keep_min_max_k2" min max k_arg k2 k val)))

(function find_min_max_k2 (l: val k: val)
  (if (!= l 1)
  (let h (load val l)
  (app "fold_left_k2" "keep_min_max_k2" h h t k val))
  (raise "exception")))
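
If enabling -rectypes is not an option, one possible alternative (a sketch of ours, not the approach taken above) is to fuse the fold and the min/max update into a single first-order function. It gives up the generic fold, but no function has to receive itself as an argument, so no recursive type is needed. The arguments are annotated as int here to get direct integer comparisons.

```ocaml
(* First-order variant: the fold and the min/max update are fused, so
   -rectypes is not needed. Both results are passed to the continuation
   as separate arguments instead of a pair. *)
let rec min_max_k (min : int) (max : int) l k =
  match l with
  | [] -> k min max
  | h :: t ->
    let min = if h < min then h else min in
    let max = if h > max then h else max in
    min_max_k min max t k

let find_min_max_k l k =
  match l with
  | [] -> invalid_arg "find_min_max"
  | h :: t -> min_max_k h h t k

(* A toplevel continuation taking both results as separate arguments. *)
let spread min max = max - min

let () = Printf.printf "spread = %d\n" (find_min_max_k [3; 1; 4; 1; 5] spread)
```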

So we can turn return points into call points and get rid of a lot of potential allocations like that. But of course there is no way to handle functions passing or returning sum types like that! Well, I'm not so sure.

Sum types

Let's try with the option type for instance:

type 'a option =
  | None
  | Some of 'a

let add_an_option_value opt v =
  match opt with
  | None -> v
  | Some n -> n + v

let n1 = add_an_option_value (Some 3) 4
let n2 = add_an_option_value None 4

The constructor of the sum type tells us whether there are more values to get, and what their type is. But there is another way to associate some type information with an actual value: GADTs

type ('a, 'b) option_case =
  | None' : ('a, unit) option_case
  | Some' : ('a, 'a) option_case

let add_an_option_value (type t) (opt: (int, t) option_case) (n:t) v =
  match opt with
  | None' -> v
  | Some' -> n + v

let n1 = add_an_option_value Some' 3 4
let n2 = add_an_option_value None' () 4

And voilà, no allocation anymore!

Combining that with the CPS transformation can get you quite far without allocating!
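
Here is one way the two ideas might be combined, as a sketch under our own assumptions. Instead of returning an option, find_first_k calls its continuation with the case and the payload as two separate arguments. The record type option_k and all function names are hypothetical; the record is only there because the continuation must accept both cases, which requires a polymorphic field. We make no claim about what the compiler allocates for it.

```ocaml
type ('a, 'b) option_case =
  | None' : ('a, unit) option_case
  | Some' : ('a, 'a) option_case

(* A continuation that accepts either case. The polymorphic field gives
   [call] a type that works for every payload type 'b. *)
type ('a, 'r) option_k = { call : 'b. ('a, 'b) option_case -> 'b -> 'r }

(* Instead of returning [Some h] or [None], call the continuation with
   the case and its payload as two separate arguments. *)
let rec find_first_k (p : int -> bool) (l : int list)
    (k : (int, 'r) option_k) : 'r =
  match l with
  | [] -> k.call None' ()
  | h :: t -> if p h then k.call Some' h else find_first_k p t k

(* Continuation returning the found element, or 0 when there is none. *)
let or_zero : (int, int) option_k =
  { call =
      (fun (type b) (case : (int, b) option_case) (payload : b) ->
         (match case with
          | None' -> 0
          | Some' -> payload : int)) }

let is_big x = x > 2

let () = Printf.printf "%d\n" (find_first_k is_big [1; 2; 3; 4] or_zero)
```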

Manipulating Memory

Now that we can manage almost any control flow without allocating, we also need to manipulate some values. That's the point where we simply suggest using the same approach as ASM.js: allocate a single large bigarray (this is some kind of malloc), consider integers as pointers and you can do anything. We won't go into too much detail here, as this topic would require another post.
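
To give a rough idea, here is a minimal sketch of that approach. It assumes OCaml 4.07 or later (where Bigarray is part of the standard library); the heap size, the bump allocator and all names are ours, and there is no way to free memory in this toy version.

```ocaml
(* One large preallocated integer bigarray plays the role of the heap. *)
let heap = Bigarray.Array1.create Bigarray.int Bigarray.c_layout 1024

(* Bump allocator: an integer index into [heap] serves as a pointer. *)
let next_free = ref 0

let malloc size =
  let p = !next_free in
  if p + size > Bigarray.Array1.dim heap then
    failwith "malloc: heap exhausted";
  next_free := p + size;
  p

(* A pair of ints stored in two consecutive cells. *)
let make_pair a b =
  let p = malloc 2 in
  heap.{p} <- a;
  heap.{p + 1} <- b;
  p

let pair_fst p = heap.{p}
let pair_snd p = heap.{p + 1}

let () =
  let p = make_pair 3 4 in
  Printf.printf "pair at %d: (%d, %d)\n" p (pair_fst p) (pair_snd p)
```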

For some low-level packed bitfield manipulation, you can have a look at some more tricks.

Conclusion

So if you want to write non-allocating code in OCaml, turn everything into CPS, add additional arguments everywhere, turn your sum types into unboxed GADTs, and manipulate a single large bigarray. And enjoy!

Comments

Gaetan Dubreil (3 April 2016 at 11 h 16 min):

Thank you for this attractive and informative post. Just to be sure, is it not ‘t’ rather than ‘l’ that must be past to the fold_left function? You said “we only have tail calls now” but I don’t see any none tail calls in the first place, am I wrong?

Pierre Chambart (4 April 2016 at 14 h 48 min):

There where effectively some typos. Thanks for noticing.

There is one non-tail call in fold_left: the call to f. But effectively the recursion is tail.

kantien (25 May 2016 at 13 h 57 min):

Interesting article, but i have one question. Can we say, from the proof theory point of view, that turning the code in CPS style not to allocate is just an application of the Gentzen’s cut-elimination theorem ? I explain in more details this interpretation : if we have a proof P1 of the proposition A and a proof P2 of the proposition A ⇒ B, we can produce a proof P3 of proposition B by applying the cut rule or modus ponens, but the theorem says that we can eliminate the use of cut rule and produce a direct proof P4 of the proposition B. But modus ponens (or cut rule) is just the rule for typing function application : if f has type ‘a -> ‘b and x has type ‘a then f x has type ‘b. And so the cut-elimination theorem says that we can produce an object of type ‘b without allocate an object of type ‘a (this is not necessary to produce the P1 proof, or more exactly this is not necessary to put the P1’s conclusion in the environment in order to use it as a premise of the P2 proof ). Am I right ?

jdxu (4 January 2021 at 11 h 36 min):

Very useful article. BTW, is there any document/tutorial/article about cmm syntax?

Signing the OPAM repository

2015-06-05T09:05:17Z

NOTE (September 2016): updated proposal from OCaml 2016 workshop is available, including links to prototype implementation.

This is an initial proposal on signing the OPAM repository. Comments and discussion are expected on the platform mailing-list.

The purpose of this proposal is to enable a secure distribution of OCaml packages. The package repository does not have to be trusted if package developers sign their releases.

Like Python's pip, Ruby's gems or more recently Haskell's hackage, we are going to implement a flavour of The Update Framework (TUF). This is good because:

it has been designed by people who know the stuff much better than us
it is built upon a threat model including many kinds of attacks, and there are some non-obvious ones (see the specification, and below)
it has been thoroughly reviewed
following it may help us avoid a lot of mistakes

Importantly, it doesn't enforce any specific cryptography, allowing us to go with what we have at the moment in native OCaml, and evolve later, e.g. by allowing ed25519.

There are several differences between the goal of TUF and opam, namely TUF distributes a directory structure containing the code archive, whereas opam distributes metadata about OCaml packages. Opam uses git (and GitHub at the moment) as a first class citizen: new packages are submitted as pull requests by developers who already have a GitHub account.

Note that TUF specifies the signing hierarchy and the format to deliver and check signatures, but allows a lot of flexibility in how the original files are signed: we can have packages automatically signed on the official repository, or individually signed by developers. Or indeed allow both, depending on the package.

Below, we tried to explain the specifics of our implementation, and mostly the user and developer-visible changes. It should be understandable without prior knowledge of TUF.

We are inspired by Haskell's adjustments (and e2e) to TUF using a git repository for packages. A signed repository and signed packages are orthogonal. In this proposal, we aim for both, but will describe them independently.

Threat model

An attacker can compromise at least one of the package distribution system's online trusted keys.
An attacker compromising multiple keys may do so at once or over a period of time.
An attacker can respond to client requests (MITM or server compromise) during downloading of the repository, a package, and also while uploading a new package release.
An attacker knows of vulnerabilities in historical versions of one or more packages, but not in any current version (protecting against zero-day exploits is emphatically out-of-scope).
Offline keys are safe and securely stored.

An attacker is considered successful if they can cause a client to build and install (or leave installed) something other than the most up-to-date version of the software the client is updating. If the attacker is preventing the installation of updates, they want clients to not realize there is anything wrong.

Attacks

Arbitrary package: an attacker should not be able to provide a package they created in place of a package a user wants to install (via MITM during package upload, package download, or server compromise).
Rollback attacks: an attacker should not be able to trick clients into installing software that is older than that which the client previously knew to be available.
Indefinite freeze attacks: an attacker should not be able to respond to client requests with the same, outdated metadata without the client being aware of the problem.
Endless data attacks: an attacker should not be able to respond to client requests with huge amounts of data (extremely large files) that interfere with the client's system.
Slow retrieval attacks: an attacker should not be able to prevent clients from being aware of interference with receiving updates by responding to client requests so slowly that automated updates never complete.
Extraneous dependencies attacks: an attacker should not be able to cause clients to download or install software dependencies that are not the intended dependencies.
Mix-and-match attacks: an attacker should not be able to trick clients into using a combination of metadata that never existed together on the repository at the same time.
Malicious repository mirrors: should not be able to prevent updates from good mirrors.
Wrong developer attack: an attacker should not be able to upload a new version of a package for which they are not the real developer.

Trust

A difficult problem in a cryptosystem is key distribution. In TUF and this proposal, a set of root keys is distributed with opam. A threshold of these root keys needs to sign (transitively) all keys which are used to verify the opam repository and its packages.

Root keys

The root of trust is stored in a set of root keys. In the case of the official opam OCaml repository, the public keys are to be stored in the opam source, allowing it to validate the whole trust chain. The private keys will be held by the opam and repository maintainers, and stored password-encrypted, securely offline, preferably on unplugged storage.

They are used to sign all the top-level keys, using a quorum. The quorum has several benefits:

the compromise of a number of root keys less than the quorum is harmless
it allows a key to be safely revoked and replaced, even if it was lost

The added cost is more maintenance burden, but this remains small since these keys are not often used (only when keys are going to expire, were compromised or in the event new top-level keys need to be added).

The initial root keys could be distributed as such:

Louis Gesbert, opam maintainer, OCamlPro
Anil Madhavapeddy, main repository maintainer, OCaml Labs
Thomas Gazagnaire, main repository maintainer, OCaml Labs
Grégoire Henry, OCamlPro safekeeper
Someone in the OCaml team ?

Keys will be set with an expiry date so that one expires each year in turn, leaving room for smooth rollover.

For other repositories, there will be three options:

no signatures (backwards compatible ?), e.g. for local network repositories. This should be allowed, but with proper warnings.
trust on first use: get the root keys on first access, let the user confirm their fingerprints, then fully trust them.
let the user manually supply the root keys.

End-to-end signing

This requires the end-user to be able to validate a signature made by the original developer. There are two trust paths for the chain of trust (where "→" stands for "signs for"):

(high) root keys → repository maintainer keys → (signs individually) package delegation + developer key → package files
(low) root keys → snapshot key → (signs as part of snapshot) package delegation + developer key → package files

It is intended that packages may initially follow the low trust path, adding as little burden and delay as possible when adding new packages, and may then be promoted to the high path with manual intervention, after verification, from repository maintainers. This way, most well-known and widely used packages will be provided with higher trust, and the scope of an attack on the low trust path would be reduced to new, experimental or little-used packages.

Repository signing

This provides consistent, up-to-date snapshots of the repository, and protects against a whole different class of attacks than end-to-end signing (e.g. rollbacks, mix-and-match, freeze, etc.)

This is done automatically by a snapshot bot (might run on the repository server), using the snapshot key, which is signed directly by the root keys, hence the chain of trust:

root keys → snapshot key → commit-hash

Where "commit-hash" is the head of the repository's git repository (and thus a valid cryptographic hash of the full repository state, as well as its history)

Repository maintainer (RM) keys

Repository maintainers hold the central role in monitoring the repository and warranting its security, with or without signing. Their keys (called targets keys in the TUF framework) are signed directly by the root keys. As they have a high security potential, in order to reduce the consequences of a compromise, we will be requiring a quorum for signing sensitive operations.

These keys are stored password-encrypted on the RM computers.

Snapshot key

This key is held by the snapshot bot and signed directly by the root keys. It is used to guarantee consistency and freshness of repository snapshots, and does so by signing a git commit-hash and a time-stamp.

It is held online and used by the snapshot bot for automatic signing: it has lower security than the RM keys, but also a lower potential: it can not be used directly to inject malicious code or metadata in any existing package.

Delegate developer keys

These keys are used by the package developers for end-to-end signing. They can be generated locally as needed by new packagers (e.g. by the opam-publish tool), and should be stored password-encrypted. They can be added to the repository through pull-requests, waiting to be signed (i) as part of snapshots (which also prevents them from being modified later, but we'll get to it) and (ii) directly by RMs.

Initial bootstrap

We'll need to start somewhere, and the current repository isn't signed. An additional key, initial-bootstrap, will be used for guaranteeing the integrity of existing but as yet unverified packages.

This is a one-go key, signed by the root keys, and that will then be destroyed. It is allowed to sign for packages without delegation.

Trust chain and revocation

In order to build the trust chain, the opam client downloads a keys/root key file initially and before every update operation. This file is signed by the root keys, and can be verified by the client using its built-in keys (or one of the ways mentioned above for unofficial repositories). It must be signed by a quorum of known root keys, and contains the comprehensive set of root, RM, snapshot and initial bootstrap keys: any missing keys are implicitly revoked. The new set of root keys is stored by the opam client and used instead of the built-in ones on subsequent runs.
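
The quorum rule above can be sketched as follows. This is only an illustration under stated assumptions: the verify function is a stand-in for a real signature check, and quorum_reached, trusted and threshold are hypothetical names that do not come from opam.

```ocaml
module S = Set.Make (String)

(* [verify keyid signature payload] is assumed to perform the actual
   cryptographic check; it is passed in so the sketch stays independent
   of any particular library. *)
let quorum_reached ~verify ~trusted ~threshold ~payload signatures =
  let valid =
    List.fold_left
      (fun acc (keyid, signature) ->
        (* Only signatures from currently trusted root keys count. *)
        if List.mem keyid trusted && verify keyid signature payload then
          S.add keyid acc
        else acc)
      S.empty signatures
  in
  (* Distinct key ids are counted, so one key signing twice counts once. *)
  S.cardinal valid >= threshold
```

Counting distinct key ids rather than signatures is what keeps a single compromised root key from reaching the quorum on its own.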

Developer keys are stored in files keys/dev/<id>, self-signed, possibly RM signed (and, obviously, snapshot-signed). The conditions of their verification, removal or replacement are included in our validation of metadata update (see below).

File formats and hierarchy

Signed files and tags

The files follow the opam syntax: a list of fields fieldname: followed by contents. The format is detailed in opam's documentation.

The signature of files in opam is done on the canonical textual representation, following these rules:

any existing signature: field is removed
one field per line, ending with a newline
fields are sorted lexicographically by field name
newlines, backslashes and double-quotes are escaped in string literals
spaces are limited to one, and to these cases: after field leaders fieldname:, between elements in lists, before braced options, between operators and their operands
comments are erased
fields containing an empty list, or a singleton list containing an empty list, are erased
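
As an illustration of these rules, here is a minimal sketch over a deliberately simplified model of opam files, where a value is either a string or a list. The type value and the function canonical are hypothetical names, and braced options, operators and comments are left out.

```ocaml
(* Simplified model of opam values: strings and lists only. *)
type value = Str of string | List of value list

(* Escape backslashes, double quotes and newlines in string literals. *)
let escape_string s =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '\\' -> Buffer.add_string b "\\\\"
      | '"' -> Buffer.add_string b "\\\""
      | '\n' -> Buffer.add_string b "\\n"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* A single space between list elements and nowhere else inside a list
   (one reading of the spacing rule). *)
let rec print_value = function
  | Str s -> "\"" ^ escape_string s ^ "\""
  | List vs -> "[" ^ String.concat " " (List.map print_value vs) ^ "]"

(* Empty lists and singleton lists holding an empty list are erased. *)
let is_erased = function
  | List [] | List [ List [] ] -> true
  | _ -> false

(* Drop the signature field, drop erased fields, sort by field name,
   and print one field per line, each ending with a newline. *)
let canonical (fields : (string * value) list) : string =
  fields
  |> List.filter (fun (name, v) -> name <> "signature" && not (is_erased v))
  |> List.sort (fun (a, _) (b, _) -> String.compare a b)
  |> List.map (fun (name, v) -> name ^ ": " ^ print_value v ^ "\n")
  |> String.concat ""
```

Because the signature field is removed before printing, adding another signature to a file does not change the text that earlier signatures cover.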

The signature: field is a list with elements in the format of string triplets [ "<keyid>" "<algorithm>" "<signature>" ]. For example:

opam-version: "1.2"
name: "opam"
signature: [
  [ "louis.gesbert@ocamlpro.com" "RSASSA-PSS" "048b6fb4394148267df..." ]
]

Signed tags are git annotated tags, and their contents follow the same rules. In this case, the format should contain the field commit:, pointing to the commit-hash that is being signed and tagged.

File hierarchy

The repository format is changed by the addition of:

a directory keys/ at the root
delegation files packages/<pkgname>/delegation and compilers/<patchname>.delegate
signed checksum files at packages/<pkgname>/<pkgname>.<version>/signature

Here is an example:

repository root /
|--packages/
|  |--pkgname/
|  |  |--delegation                    - signed by developer, repo maintainer
|  |  |--pkgname.version1/
|  |  |  |--opam
|  |  |  |--descr
|  |  |  |--url
|  |  |  `--signature                  - signed by developer1
|  |  `--pkgname.version2/ ...
|  `--pkgname2/ ...
|--compilers/
|  |--version/
|  |  |--version+patch/
|  |  |  |--version+patch.comp
|  |  |  |--version+patch.descr
|  |  |  `--version+patch.signature
|  |  `--version+patch2/ ...
|  |--patch.delegate
|  |--patch2.delegate
|  `--version2/ ...
`--keys/
   |--root
   `--dev/
      |--developer1-email              - signed by developer1,
      `--developer2-email ...            and repo maint. once verified

Keys are provided in different files as string triplets [ [ "keyid" "algo" "key" ] ]. keyid must not conflict with any previously-defined keys, and algo may be "rsa" and keys encoded in PEM format, with further options available later.

For example, the keys/root file will have the format:

date: "2015-06-04T13:53:00Z"
root-keys: [ [ "keyid" "{expire-date}" "algo" "key" ] ]
snapshot-keys: [ [ "keyid" "algo" "key" ] ]
repository-maintainer-keys: [ [ "keyid" "algo" "key" ] ]

This file is signed by current and past root keys -- to allow clients to update. The date: field provides further protection against rollback attacks: no clients may accept a file with a date older than what they currently have. Date is in the ISO 8601 standard with 0 UTC offset, as suggested in TUF.
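
The date comparison can be sketched very simply, assuming timestamps always use the fixed form shown above (YYYY-MM-DDTHH:MM:SSZ). The names well_formed and accept_root_file are hypothetical.

```ocaml
(* Timestamps in the fixed UTC form YYYY-MM-DDTHH:MM:SSZ order
   chronologically when compared as plain strings, so no date parsing
   is needed in this sketch. *)
let well_formed d = String.length d = 20 && d.[10] = 'T' && d.[19] = 'Z'

(* A client accepts a keys/root file only if its date is not older than
   the date of the file it currently holds. *)
let accept_root_file ~current_date ~new_date =
  well_formed new_date && String.compare new_date current_date >= 0
```

This check only concerns the date; the signatures on the file would still have to be verified separately.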

Delegation files

/packages/pkgname/delegation delegates ownership on versions of package pkgname. The file contains version constraints associated with keyids, e.g.:

name: pkgname
delegates: [
  "thomas@gazagnaire.org"
  "louis.gesbert@ocamlpro.com" {>= "1.0"}
]

The file is signed:

by the original developer submitting it
or by a developer previously having delegation for all versions, for changes
or directly by repository maintainers, validating the delegation, and increasing the level of trust

Every key a developer delegates trust to must also be signed by the developer.

compilers/patch.delegate files follow a similar format (we are considering changing the hierarchy of compilers to match that of packages, to make things simpler).

The delegates: field may be empty: in this case, no packages by this name are allowed on the repository. This may be useful to mark deletion of obsolete packages, and make sure a new, different package doesn't take the same name by mistake or malice.
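
To make the delegation semantics concrete, here is a small sketch of how a client might decide whether a key may sign a given version. It is an approximation: only the {>= "v"} form of constraint is modelled, versions are compared as dot-separated integers (opam's real version ordering is richer), and the names constr, delegation and may_sign are hypothetical.

```ocaml
(* Only the two forms used in the example file are modelled. *)
type constr = Any | At_least of string

type delegation = { name : string; delegates : (string * constr) list }

(* Simplified version order: dot-separated integers. This raises
   Failure on non-numeric components, which a real implementation
   would have to handle. *)
let parse_version v = List.map int_of_string (String.split_on_char '.' v)

(* A key may sign a version if it is listed and its constraint holds.
   An empty delegates list therefore allows nobody. *)
let may_sign d ~keyid ~version =
  List.exists
    (fun (k, c) ->
      k = keyid
      &&
      match c with
      | Any -> true
      | At_least v -> compare (parse_version version) (parse_version v) >= 0)
    d.delegates
```

With the example file above, the second key could sign version 1.2 but not 0.9, while the first key could sign any version.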

Package signature files

These guarantee the integrity of a package: this includes metadata and the package archive itself (which may, or may not, be mirrored on the opam repository server).

The file, besides the package name and version, has a field package-files: containing a list of files below packages/<pkgname>/<pkgname>.<version> together with their file sizes in bytes and one or more hashes, prefixed by their kind, and a field archive: containing the same details for the upstream archive. For example:

name: pkgname
version: pkgversion
package-files: [
  "opam" {901 [ sha1 "7f9bc3cc8a43bd8047656975bec20b578eb7eed9" md5 "1234567890" ]}
  "descr" {448 [ sha1 "8541f98524d22eeb6dd669f1e9cddef302182333" ]}
  "url" {112 [ sha1 "0a07dd3208baf4726015d656bc916e00cd33732c" ]}
  "files/ocaml.4.02.patch" {17243 [ sha1 "b3995688b9fd6f5ebd0dc4669fc113c631340fde" ]}
]
archive: [ 908460 [ sha1 "ec5642fd2faf3ebd9a28f9de85acce0743e53cc2" ] ]

This file is signed either:

by the initial-bootstrap key, only initially
by a delegate key (i.e. by a delegated-to developer)
by a quorum of repository maintainers

The latter is needed to hot-fix packages on the repository: repository maintainers often need to do so. A quorum is still required, to prevent a single RM key compromise from allowing arbitrary changes to every package. The quorum is not initially required to sign a delegation, but is, consistently, required for any change to an existing, signed delegation.

Compiler signature files <version>+<patch>.signature are similar, with fields compiler-files containing checksums for <version>+<patch>.*, the same field archive: and an additional optional field patches:, containing the sizes and hashes of upstream patches used by this compiler.

If the delegation or signature can't be validated, the package or compiler is ignored. If any file doesn't correspond to its size or hashes, it is ignored as well. Any file not mentioned in the signature file is ignored.
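
The filtering described in the last paragraph could look like the sketch below. It works on in-memory file contents and uses MD5 only because the Digest module ships with OCaml's standard library; the example file above mostly uses sha1, which would need an external library. The names entry and accepted_files are hypothetical.

```ocaml
(* One entry of a signature file: path, size in bytes, hex-encoded MD5. *)
type entry = { path : string; size : int; md5 : string }

(* Keep only the files that are listed in the signature file and whose
   size and hash both match; everything else is ignored. *)
let accepted_files entries files =
  List.filter
    (fun (path, content) ->
      match List.find_opt (fun e -> e.path = path) entries with
      | None -> false (* not mentioned in the signature file *)
      | Some e ->
        String.length content = e.size
        && Digest.to_hex (Digest.string content) = e.md5)
    files
```

Checking the size before the hash is cheap and matches the order in which the two pieces of information become available during a download.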

Snapshots and linearity

Main snapshot role

The snapshot key automatically adds a signed annotated tag to the top of the served branch of the repository. This tag contains the commit-hash and the current timestamp, effectively ensuring freshness and consistency of the full repository. This protects against mix-and-match, rollback and freeze attacks.

The signed annotated tag is deleted and recreated by the snapshot bot, after checking the validity of the update, periodically and after each change.
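
On the client side, the freshness part of this check might be sketched as follows. The maximum accepted age is an assumption, since the proposal does not fix a value, and snapshot_tag, verdict and check_freshness are hypothetical names. Times are plain seconds passed in by the caller, so no clock library is required.

```ocaml
(* What the signed tag carries: the commit hash and the signing time,
   here in seconds since the epoch. *)
type snapshot_tag = { commit : string; timestamp : float }

type verdict = Fresh | Stale of float (* age in seconds *)

(* A tag older than [max_age] suggests a freeze attack or a stuck
   snapshot bot; either way the client should notice. *)
let check_freshness ~now ~max_age tag =
  let age = now -. tag.timestamp in
  if age <= max_age then Fresh else Stale age
```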

Linearity

The repository is served using git: this means that not only the latest version, but the full history of changes is known. This has several benefits, among them incremental downloads "for free" and a very easy way to sign snapshots. Another good point is that we have a working full OCaml implementation.

We mentioned above that we use the snapshot signatures not only for repository signing, but also as an initial guarantee for submitted developer's keys and delegations. One may also have noticed, in the above, that we sign for delegations, keys etc. individually, but without a bundle file that would ensure no signed files have been maliciously removed.

These concerns are all addressed by a linearity condition on the repository's git: the snapshot bot does not only check and sign for a given state of the repository, it checks every individual change to the repository since the last well-known, signed state: patches have to follow from that git commit (descendants on the same branch), and are validated to respect certain conditions: no signed files are removed or altered without signature, etc.

Moreover, this check is also done on clients, every time they update: it is slightly weaker, as clients don't update continuously (an attacker may have rewritten the commits since the last update), but still gives very good guarantees.

A key and delegation that have been submitted by a developer and merged, even without RM signature, are signed as part of a snapshot: git and the linearity conditions allow us to guarantee that this delegation won't be altered or removed afterwards, even without an individual signature. Even if the repository is compromised, an attacker won't be able to roll out malicious updates breaking these conditions to clients.

The linearity invariants are:

no key, delegation, or package version (signed files) may be removed
a new key is signed by itself, and optionally by a RM
a new delegation is signed by the delegate key, optionally by a RM. Signing keys must also sign the delegate keys
a new package or package version is signed by a valid key holding a valid delegation for this package version
keys can only be modified with signature from the previous key or a quorum of RM keys
delegations can only be modified with signature by a quorum of RMs, or possibly by a former delegate key (without version constraints) in case there was previously no RM signature
any package modification is signed by an appropriate delegate key, or by a quorum of RM keys
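
Two of these conditions, that updates must descend from the last signed commit and that no signed file may be removed, can be sketched over a toy model of the repository. Here the commit graph is given as a parents function and the signed files as a set of paths; the signature-related invariants are not modelled, and is_ancestor and linear_update are hypothetical names.

```ocaml
module S = Set.Make (String)

(* [parents c] returns the parent commits of [c]; the walk follows them
   back until it meets [ancestor] or runs out of history. *)
let rec is_ancestor ~parents ~ancestor commit =
  commit = ancestor
  || List.exists (is_ancestor ~parents ~ancestor) (parents commit)

(* An update is acceptable, in this simplified sketch, if the new head
   descends from the last signed commit and every previously signed
   file is still present. *)
let linear_update ~parents ~last_signed ~head ~old_signed ~new_signed =
  is_ancestor ~parents ~ancestor:last_signed head
  && S.subset old_signed new_signed
```

A real check would walk the individual commits between the two states and validate each change against the invariants above, rather than only comparing the end states.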

It is sometimes needed to do operations, like key revocation, that are not allowed by the above rules. These are enabled by the following additional rules, that require the commit including the changes to be signed by a quorum of repository maintainers using an annotated tag:

package or package version removal
removal (revocation) of a developer key
removal of a package delegation (it's in general preferable to leave an empty delegation)

Changes to the keys/root file, which may add, modify or revoke keys for root, RMs and snapshot keys, are verified in the normal way, but need to be handled when checking linearity since they decide the validity of RM signatures. Since this file may be needed before we can check the signed tag, it has its own timestamp to prevent rollback attacks.

In case the linearity invariant check fails:

on the GitHub repository, this is marked and the RMs are advised not to merge (or to complete missing tag signatures)
on the clients, the update is refused, and the user informed of what's going on (the repository has likely been compromised at that point)
on the repository (checks by the snapshot bot), update is stalled and all repository maintainers immediately warned. To recover, the broken commits (between the last signed tag and master) need to be amended.

Work and changes involved

General

Write modules for key handling, and for signing and verification of opam files.

Write the git synchronisation module with linearity checks.

opam

Rewrite the default HTTP repository synchronisation module to use git fetch, verify, and git pull. This should be fully transparent, except:

in the cases of errors, of course
when registering a non-official repository
for some warnings with features that disable signatures, like source pinning (probably only the first time would be good)

Include the public root keys for the default repository, and implement management of updated keys in ~/.opam/repo/name.

Handle the new formats for checksums and non-repackaged archives.

Allow a per-repository security threshold (e.g. allow all, allow only signed packages, allow only packages signed by a verified key, allow only packages signed by their verified developer). It should be easy but explicit to add a local network, unsigned repository. Backends other than git won't be signed anyway (local, rsync...).
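
One possible way to model such a threshold is an ordered list of trust levels, as in this sketch. The ordering of the four levels follows the order in which they are listed above, which is an assumption, and trust, rank and allowed are hypothetical names.

```ocaml
(* Trust levels, from weakest to strongest. *)
type trust =
  | Unsigned
  | Signed
  | Signed_by_verified_key
  | Signed_by_verified_developer

let rank = function
  | Unsigned -> 0
  | Signed -> 1
  | Signed_by_verified_key -> 2
  | Signed_by_verified_developer -> 3

(* A package is allowed when its trust level meets the repository's
   threshold; a threshold of [Unsigned] means "allow all". *)
let allowed ~threshold package_trust = rank package_trust >= rank threshold
```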

opam-publish

Generate keys, handle locally stored keys, generate signature files, handle signing, submit signatures, check delegation, submit new delegation, request delegation change (might require repository maintainer intervention if an RM signed the delegation), delete developer, delete package.

Manage local keys. Probably including re-generating, and asking for revocation.

opam-admin

Most operations on signatures and keys will be implemented as independent modules (so as to be usable from e.g. unikernels working on the repository). We should also make them available from opam-admin, for testing and manual management. Special tooling will also be needed by RMs.

fetch the archives (but don't repackage as pkg+opam.tar.gz anymore)
allow all useful operations for repository maintainers (maybe in a different tool ?):
- manage their keys
- list and sign changed packages directly
- list and sign waiting delegations to developer keys
- validate signatures, print reports
- sign tags, including adding a signature to an existing tag to meet the quorum
- list quorums waiting to be met on a given branch
generate signed snapshots (same as the snapshot bot, for testing)

Signing bots

If we don't want to have this processed on the publicly visible host serving the repository, we'll need a mechanism to fetch the repository, and submit the signed tag back to the repository server.

Doing this through mirage unikernels would be cool, and provide good isolation. We could imagine running this process regularly:

fetch changes from the repository's git (GitHub)
check for consistency (linearity)
generate and sign the signed tag
push tag back to the release repository

Travis

All security information and check results should be available to RMs before they make the decision to merge a commit to the repository. This means including signature and linearity checks in a process running on Travis, or similarly on every pull-request to the repository, and displaying the results in the GitHub tracker.

This should avoid most cases where the snapshot bot fails the validation, leaving it stuck (as well as any repository updates) until the bad commits are rewritten.

Some more detailed scenarios

`opam init` and `update` scenario

On init, the client would clone the repository and get to the signed tag, get and check the associated keys/root file, and validate the signed tag according to the new keyset. If all goes well, the new set of root, RM and snapshot keys is registered.

Then all files' signatures are checked following the trust chains, and copied to the internal repository mirror opam will be using (~/.opam/repo/<name>). When a package archive is needed, the download is done either from the repository, if the file is mirrored, or from its upstream, in both cases with a known file size used as an upper limit: the download is stopped if it goes above the expected size, and the file is removed if it doesn't match both the expected size and hashes.
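
The size-capped download can be sketched independently of any network library by abstracting the transfer as a function that yields chunks. This is only a sketch: next_chunk, download_capped and the failure type are hypothetical names, and the hash check that would follow a successful download is left out.

```ocaml
type failure = Too_large | Size_mismatch

(* [next_chunk ()] is assumed to return [Some data] while the transfer
   is in progress and [None] once it is finished. *)
let download_capped ~expected_size next_chunk =
  let buf = Buffer.create (max 1 expected_size) in
  let rec loop () =
    match next_chunk () with
    | None ->
      (* Transfer finished: the total must match exactly. *)
      if Buffer.length buf = expected_size then Ok (Buffer.contents buf)
      else Error Size_mismatch
    | Some chunk ->
      (* Stop as soon as the data would exceed the expected size. *)
      if Buffer.length buf + String.length chunk > expected_size then
        Error Too_large
      else begin
        Buffer.add_string buf chunk;
        loop ()
      end
  in
  loop ()
```

Stopping before the oversized chunk is stored is what bounds the memory and disk used by a malicious source.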

On subsequent updates, the process is the same except that a fetch operation is done on the existing clone, and that the repository is forwarded to the new signed tag only if linearity checks passed (and the update is aborted otherwise).

`opam-publish` scenario

The first time a developer runs opam-publish submit, a developer key is generated, and stored locally.
Upon opam-publish submit, the package is signed using the key, and the signature is included in the submission.
If the key is known, and delegation for this package matches, all is good
If the key is not already registered, it is added to /keys/dev/ within the pull-request, self-signed.
If there is no delegation for the package, the /packages/pkgname/delegation file is added, delegating to the developer key and signed by it.
If there is an existing delegation that doesn't include the author's key, this will require manual intervention from the repository maintainers. We may yet submit a pull-request adding the new key as delegate for this package, and ask the repository maintainers -- or former developers -- to sign it.
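
The cases above can be summarised as a small decision function. This is a sketch of the logic only, not of opam-publish itself: action and submit_actions are hypothetical names, and the delegation is reduced to an optional list of key ids.

```ocaml
(* What the pull-request needs in addition to the signed package. *)
type action =
  | Add_self_signed_key
  | Add_delegation
  | Request_maintainer_signature

(* [delegates] is [None] when the package has no delegation file yet,
   and [Some ids] with the delegated key ids otherwise. *)
let submit_actions ~key_registered ~delegates ~keyid =
  let key_actions = if key_registered then [] else [ Add_self_signed_key ] in
  let delegation_actions =
    match delegates with
    | None -> [ Add_delegation ]
    | Some ids when List.mem keyid ids -> []
    | Some _ -> [ Request_maintainer_signature ]
  in
  key_actions @ delegation_actions
```

An empty result corresponds to the "all is good" case: the key is known and the delegation already matches.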

Security analysis

We claim that the above measures give protection against:

Arbitrary packages: if an existing package is not signed, it is not installed (or even visible) to the user. Anybody can submit new unclaimed packages (but, in the current setting, they still need GitHub write access to the repository, or to bypass GitHub's security).
Rollback attacks: git updates must follow the currently known signed tag. If the snapshot bot detects deletions of packages, it refuses to sign, and clients double-check this. The keys/root file contains a timestamp.
Indefinite freeze attacks: the snapshot bot periodically signs the signed tag with a timestamp; if a client receives a tag older than the expected age, it will notice.
Endless data attacks: we rely on the git protocol and this does not defend against endless data. Downloading of package archive (of which the origin may be any mirror), though, is protected. The scope of the attack is mitigated in our setting, because there are no unattended updates: the program is run manually, and interactively, so the user is most likely to notice.
Slow retrieval attacks: same as above.
Extraneous dependencies attacks: metadata is signed, and if the signature does not match, it is not accepted.

NOTE: the provides field -- yet unimplemented, see the document in opam/doc/design -- could provide a vector in this case, by advertising a replacement for a popular package. Additional measures will be taken when implementing the feature, like requiring a signature for the provided package.
Mix-and-match attacks: the repository has a linearity condition, and partial repositories are not possible.
Malicious repository mirrors: if the signature does not match, reject.
Wrong developer attack: if the developer is not in the delegation, reject.

GitHub repository

Is the link between GitHub (opam-repository) and the signing bot special? If there is a MITM on this link, they can add arbitrary new packages, but due to missing signatures only custom universes. No existing package can be altered or deleted, otherwise consistency condition above does not hold anymore and the signing bot will not sign.

Certainly, the access can be frozen, thus the signing bot does not receive updates, but continues to sign the old repository version.

Snapshot key

If the snapshot key is compromised, an attacker is able to:

Add arbitrary (non already existing) packages, as above.
Freeze, by forever re-signing the signed tag with an updated timestamp.

Most importantly, the attacker won't be able to tamper with existing packages. This hugely reduces the potential of an attack, even with a compromised snapshot key.

The attacks above would also require either a MITM between the repository and the client, or a compromise of the opam repository: in the latter case, since the linearity check is reproduced even on the clients:

any tamper could be detected very quickly, and measures taken.
a freeze would be detected as soon as a developer checks that their package is really online. That currently happens several times a day.

The repository would then just have to be reset to before the attack, which git makes as easy as it can get, and the holders of the root keys would sign a new keys/root, revoking the compromised snapshot key and introducing a new one.

In the time before the signing bot can be put back online with the new snapshot key -- i.e. the breach has been found and fixed -- a developer could manually sign time-stamped tags before they expire (e.g. once a day) so as not to hold back updates.

Repository Maintainer keys

Repository maintainers are powerful: they can modify existing opam files and sign them (as hotfixes), introduce new delegations for packages, etc.

However, by requiring a quorum for sensitive operations, we limit the scope of a single RM key compromise to the validation of new developer keys or delegations (which should be the most common operation done by RMs): this makes it possible to raise the trust level of new, malicious packages, but otherwise doesn't change much from what can be done with just access to the git repository.

A further compromise of a quorum of RM keys would allow an attacker to remove or tamper with any developer key, delegation or package: any of these amounts to being able to replace any package with a compromised version. Cleaning up would require replacing all but the root keys, and resetting the repository to before any malicious commit.

Difference to TUF

we use git
thus get linearity "for free"
and already have a hash over the entire repository
TUF provides a mechanism for delegation, but it's both heavier and not expressive enough for what we wanted -- delegate to packages directly.
We split in lots more files, and per-package ones, to fit with and nicely extend the git-based workflow that made the success of opam. The original TUF would have big json files signing for a lot of files, and likely to conflict. Both developers and repository maintainers should be able to safely work concurrently without issue. Signing bundles in TUF gives the additional guarantee that no file is removed without proper signature, but this is handled by git and signed tags.
instead of a single file with all signed packages by a specific developer, one file per package

Differences to Haskell:

use TUF keys, not gpg
e2e signing

Reduced Memory Allocations with ocp-memprof

2015-05-18T09:05:17Z

In this blog post, we explain how ocp-memprof helped us identify a piece of code in Alt-Ergo that needed to be improved. Simply put, a function that merges two maps was performing a lot of unnecessary allocations, negatively impacting the garbage collector's activity. A simple patch allowed us to prevent these allocations, and thus speed up Alt-Ergo's execution.

The Story

It all started with a challenging example coming from an industrial user of Alt-Ergo, our SMT solver. It was proven by Alt-Ergo in approximately 70 seconds. This seemed abnormally long and needed to be investigated. Unfortunately, all our tests with different options (number of triggers, case-split analysis, …) and different plugins (satML plugin, profiling plugin, fm-simplex plugin) of Alt-Ergo failed to improve the resolution time. We then profiled an execution using ocp-memprof to understand the memory behavior of this example.

Profiling an Execution with `ocp-memprof`

As usual, profiling an OCaml application with ocp-memprof is very simple (see the user manual for more details). We just compiled Alt-Ergo in the OPAM switch for ocp-memprof (version 4.01.0+ocp1) and executed the following command:

$ ocp-memprof --exec ./ae-4.01.0+ocp1-public-without-patch pb-many-GCs.why

The execution above triggers 612 garbage collections in about 114 seconds. The analysis of the generated dumps produces the evolution graph below. We notice on the graph that:

we have approximately 10 MB of hash-tables allocated since the beginning of the execution, which is expected;
the second most allocated data in the major heap are maps, and they keep growing during the execution of Alt-Ergo.

We are not able to precisely identify the allocation origins of the maps in this graph (maps are generic structures that are intensively used in Alt-Ergo). To investigate further, we wanted to know if some global value was abnormally retaining a lot of memory, or if some (non tail-recursive) iterator was causing some trouble when applied on huge data structures. For that, we extended the analysis with the --per-root option to focus on the memory graph of the last dump. This is done by executing the following command, where 4242 is the PID of the process launched by ocp-memprof --exec in the previous command:

$ ocp-memprof -load 4242 -per-root 611

The per-root graph (above, on the right) gives more interesting information. When expanding the stack node and sorting the sixth column in decreasing order, we notice that:

a bunch of these maps are still in the stack: the item Map_at_192_offset_1 in the first column means that most of the memory is retained by the fold function, at line 192 of the Map module (resolution of stack frames is only available in the commercial version of ocp-memprof);
the "kind" column corresponding to Map_at_192_offset_1 gives better information. It provides the signature of the function passed to fold. This information is already provided by the online version.

Uf.Make(Uf.??Make.X).LX.t ->
Explanation.t ->
Explanation.t Map.Make(Uf.Make(Uf.??Make.X).LX).t ->
Explanation.t Map.Make(Uf.Make(Uf.??Make.X).LX).t

This information allows us to see the precise origin of the allocation: the map of module LX used in uf.ml. Lucky us, there is only one fold function of LX's maps in the Uf module with the same type.

Optimizing the Code

Thanks to the information provided by the --per-root option, we identified the code responsible for this behavior:

(*** function extracted from module uf.ml ***)
module MapL = Map.Make(LX)

let update_neqs r1 r2 dep env =
  let merge_disjoint_maps l1 ex1 mapl =
    try
      let ex2 = MapL.find l1 mapl in
      let ex = Ex.union (Ex.union ex1 ex2) dep in
      raise (Inconsistent (ex, cl_extract env))
    with Not_found ->
      MapL.add l1 (Ex.union ex1 dep) mapl
  in
  let nq_r1 = lookup_for_neqs env r1 in
  let nq_r2 = lookup_for_neqs env r2 in
  let mapl = MapL.fold merge_disjoint_maps nq_r1 nq_r2 in
  MapX.add r2 mapl (MapX.add r1 mapl env.neqs)

Roughly speaking, the function above retrieves two maps nq_r1 and nq_r2 from env, and folds on the first one while providing the second map as an accumulator. The local function merge_disjoint_maps (passed to fold) raises Exception.Inconsistent if the original maps were not disjoint. Otherwise, it adds the current binding (after updating the corresponding value) to the accumulator. Finally, the result mapl of the fold is used to update the values of r1 and r2 in env.neqs.
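The merging pattern described above can be reproduced in isolation. The following is only a minimal sketch: string keys, string lists standing in for explanations, and the names merge_disjoint and add_binding are assumptions made for illustration, not Alt-Ergo's actual types or code.

```ocaml
(* Hypothetical stand-ins: string keys, string lists as "explanations". *)
module MapL = Map.Make (String)

exception Inconsistent of string list

(* Add every binding of [m1] to [m2]; a shared key means the maps overlap. *)
let merge_disjoint dep m1 m2 =
  let add_binding key ex1 acc =
    match MapL.find_opt key acc with
    | Some ex2 -> raise (Inconsistent (ex1 @ ex2 @ dep))
    | None -> MapL.add key (ex1 @ dep) acc
  in
  MapL.fold add_binding m1 m2

let m1 = MapL.(empty |> add "a" [ "e1" ])
let m2 = MapL.(empty |> add "b" [ "e2" ])
let merged = merge_disjoint [ "dep" ] m1 m2
```

Each binding of the folded map triggers a MapL.add on the accumulator, so the number of allocations grows with the size of the first argument. This is why the choice of which map to fold over matters.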

After further debugging, we observed that one of the maps (nq_r1 and nq_r2) is always empty in our situation. A straightforward fix consists in testing whether one of these two maps is empty. If it is the case, we simply return the other map. Here is the corresponding code:

(*** first patch: testing if one of the maps is empty ***)
…
let mapl =
  if MapL.is_empty nq_r1 then nq_r2
  else
    if MapL.is_empty nq_r2 then nq_r1
    else MapL.fold merge_disjoint_maps nq_r1 nq_r2
…

Of course, a more generic solution should not just test for emptiness, but should fold on the smaller map. In the second patch below, we used a slightly modified version of OCaml's maps that exports the height function (implemented in constant time). This way, we always fold on the smaller map while providing the larger one as an accumulator.

(*** second (better) patch : folding on the smallest map ***)
…
let small, big =
  if MapL.height nq_r1 < MapL.height nq_r2 then nq_r1, nq_r2
  else nq_r2, nq_r1
in
let mapl = MapL.fold merge_disjoint_maps small big in
…
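Here is a self-contained sketch of the "fold on the smaller map" idea, under stated assumptions. The standard Map module does not export height, so this sketch uses MapL.cardinal as a stand-in; note that cardinal runs in linear time, unlike the constant-time height of the modified map module mentioned above. The names add_all, merge and Overlap are hypothetical.

```ocaml
module MapL = Map.Make (String)

exception Overlap of string

(* Insert the bindings of [small] into [big], failing on a shared key. *)
let add_all small big =
  MapL.fold
    (fun k v acc ->
      if MapL.mem k acc then raise (Overlap k) else MapL.add k v acc)
    small big

(* [MapL.cardinal] is a linear-time stand-in for a constant-time [height]. *)
let merge m1 m2 =
  if MapL.cardinal m1 < MapL.cardinal m2 then add_all m1 m2
  else add_all m2 m1

let big =
  List.fold_left (fun m k -> MapL.add k (String.length k) m) MapL.empty
    [ "a"; "bb"; "ccc"; "dddd" ]

let small = MapL.singleton "eeeee" 5
```

With this ordering, folding over an empty map returns the accumulator untouched, so the emptiness test of the first patch becomes a special case: merge MapL.empty big returns big itself without allocating.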

Checking the Efficiency of our Patch

Curious to see the result of the patch? We regenerated the evolution and memory graphs of the patched code (fix 1), and we noticed:

a better resolution time: from 69 seconds to 16 seconds;
less garbage collection: from 53,000 minor collections to 19,000;
a smaller memory footprint: from 26 MB to 24 MB.

Conclusion

We show in this post that ocp-memprof can also be used to optimize your code, not only by decreasing memory usage, but by improving the speed of your application. The interactive graphs are online in our gallery of examples if you want to see and explore them (without the patch and with the patch).

	AE	AE + patch	Remarks
4.01.0	69.1 secs	16.4 secs	substantial improvement on the example
4.01.0+ocp1	76.3 secs	17.1 secs	when using the patched version of Alt-Ergo
dumps generation	114.3 secs (+49%)	17.6 secs (+2.8%)	(important) overhead when dumping memory snapshots
# dumps (major collections)	612 GCs	31 GCs	impressive GC activity without the patch
dumps analysis (online ocp-memprof)	759 secs	24.3 secs
dumps analysis (commercial ocp-memprof)	153 secs	3.7 secs	analysis with commercial ocp-memprof is ~ x5 faster than public version (above)
AE memory footprint	26 MB	24 MB	memory consumption is comparable
minor collections	53K	19K	fewer minor GCs thanks to the patch

Do not hesitate to use `ocp-memprof` on your applications. Of course, all feedback and suggestions are welcome, just [email](mailto:contact@ocamlpro.com) us!

More information:

Homepage: https://memprof.typerex.org/
Gallery of examples: https://memprof.typerex.org/gallery.php
Free Version: https://memprof.typerex.org/free-version.php
Commercial Version: https://memprof.typerex.org/commercial-version.php
Report a Bug: https://memprof.typerex.org/report-a-bug.php

OPAM 1.2.2 Released

2015-05-07T09:05:17Z

OPAM 1.2.2 has just been released. This fixes a few issues over 1.2.1 and brings a couple of improvements, in particular better use of the solver to keep the installation as up-to-date as possible even when the latest version of a package cannot be installed.

Upgrade from 1.2.1 (or earlier)

See the normal installation instructions: you should generally pick up the packages from the same origin as you did for the last version -- possibly switching from the official repository packages to the ones we provide for your distribution, in case the former are lagging behind.

There are no changes in repository format, and you can roll back to earlier versions in the 1.2 branch if needed.

Improvements

Conflict messages now report the original version constraints without translation, and they have been made more concise in some cases
Some new opam lint checks, opam lint now numbers its warnings and may provide script-friendly output
Feature to automatically install plugins, e.g. opam depext will prompt to install depext if available and not already installed
Priority to newer versions even when the latest can't be installed (with a recent solver only; before, all non-latest versions were equivalent to the solver)
Added opam list --resolve to list a consistent installation scenario
Be lenient by default on errors in OPAM files: these don't concern end users, and packagers and CI now have opam lint to check them.

Fixes

OSX: state cache got broken in 1.2.1, which could induce longer startup times. This is now fixed
opam config report has been fixed to report the external solver properly
--dry-run --verbose once again properly outputs all commands that would be run
Providing a simple path to an aspcud executable as external solver (through options or environment) works again, for backwards-compatibility
Fixed a fd leak on solver calls (thanks Ivan Gotovchits)
opam list now returns 0 when no packages match but no pattern was supplied, which is more helpful in scripts relying on it to check dependencies.

wxOCaml, camlidl and Class Modules

2015-04-13T09:05:17Z

A few months ago, a memory leak in the Scanf.fscanf function of OCaml’s standard library was reported on the OCaml mailing list. The following “minimal” example reproduces this misbehavior:

for i = 0 to 100_000 do
  let ic = open_in "some_file.txt" in
  Scanf.fscanf ic "%s" (fun _s -> ());
  close_in ic
done;;

read_line ();;

Let us see how to identify the origin of the leak and fix it with our OCaml memory profiler.

Installing the OCaml Memory Profiler

We first install our modified OCaml compiler and the memory profiling tool thanks to the following opam commands:

$ opam remote add memprof http://memprof.typerex.org/opam
$ opam update

$ opam switch 4.01.0+ocp1-20150202
$ opam install ocp-memprof
$ eval `opam config env`

That’s all! Installation is done after only five (opam) commands.

Compiling and Executing the Example

The second step consists in compiling the example above and profiling it. This is simply achieved with the commands:

$ ocamlopt scanf_leak.ml -o scanf.x

$ ocp-memprof --exec scanf.x

You may notice that no instrumentation of the source is needed to enable profiling.

Visualizing the Results

In the last command above, scanf.x dumps a lot of information (related to memory occupation) during its execution. Our “OCaml Memory Profiler” then analyzes these dumps, and generates a “human readable” graph that shows the evolution of memory consumption after each OCaml garbage collection. Concretely, this yields the graph below (the interactive graph generated by ocp-memprof is available here). As you can see, memory consumption grows abnormally and exceeds 240 MB! Note that we stopped scanf.x after 90 seconds.

Playing With (Some of) ocp-memprof Capabilities

ocp-memprof lets you group and display the data contained in the graph according to several criteria. For instance, data are grouped by “Modules” in the capture below. This allows us to deduce that most allocations are performed in the Scanf and Buffer modules.

In addition to aggregation capabilities, the interactive graph generated by ocp-memprof also lets you “zoom” in on particular data. For instance, by looking at Scanf, we obtain the graph below that shows the different functions that are allocating in this module. We notice that the function that allocates the most is Scanf.Scanning.from_ic. Let us have a look at this function.

From Profiling Graphs to Source Code

The code of memo_from_ic, which calls the function from_ic that is responsible for most of the allocation in Scanf, is the following:

let memo_from_ic =
  let memo = ref [] in
  (fun scan_close_ic ic ->
     try
       List.assq ic !memo
     with
     | Not_found ->
       let ib = from_ic scan_close_ic (From_channel ic) ic in
       memo := (ic, ib) :: !memo;
       ib)
;;

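It looks like the leak is caused by the memo list, which associates a lookahead buffer (resulting from the call to from_ic) with each input channel. Entries are never removed from this list, so every channel and its buffer remain reachable even after the channel is closed.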
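The retention pattern can be reproduced outside of Scanf. In this minimal sketch, an int ref plays the role of the input channel and a string plays the role of its lookahead buffer; both, as well as the name memo_lookup, are assumptions made for illustration only.

```ocaml
(* Hypothetical stand-ins: an [int ref] is the "channel", a string its buffer. *)
let memo : (int ref * string) list ref = ref []

(* Same shape as [memo_from_ic]: look the key up by physical equality and
   remember a freshly built value when it is missing. *)
let memo_lookup key =
  try List.assq key !memo
  with Not_found ->
    let buffer = String.make 1024 ' ' in
    memo := (key, buffer) :: !memo;
    buffer

let () =
  for _i = 1 to 1_000 do
    let key = ref 0 in
    ignore (memo_lookup key)
    (* [key] goes out of scope here, yet [memo] still references it *)
  done
```

After the loop, the list holds one entry per key ever seen, and the garbage collector cannot reclaim any of them because the list itself stays reachable.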

Patching the Code

Benoit Vaugon quickly sent a patch based on weak-pointers that seems to solve the problem. He modified the code as follows:

he put the key in a weak set (IcMemo) in order to test whether it is gone;
he created a pair that stores the key and the associated value;
he put this pair in a second weak set (PairMemo), where it would be reclaimed at the next GC because nothing else references it;
he added a finalizer on the pair that adds the pair back to the weak set at each GC, as long as the key is still alive.

let memo_from_ic =
  let module IcMemo = Weak.Make (
    struct
      type t = Pervasives.in_channel
      let equal ic1 ic2 = ic1 = ic2
      let hash ic = Hashtbl.hash ic
    end) 
  in
  let module PairMemo = Weak.Make (
    struct
      type t = Pervasives.in_channel * in_channel
      let equal (ic1, _) (ic2, _) = ic1 = ic2
      let hash (ic, _) = Hashtbl.hash ic
    end) 
  in
  let ic_memo = IcMemo.create 16 in
  let pair_memo = PairMemo.create 16 in
  let rec finaliser ((ic, _) as pair) =
    if IcMemo.mem ic_memo ic then (
      Gc.finalise finaliser pair;
      PairMemo.add pair_memo pair) in
  (fun scan_close_ic ic ->
     try snd (PairMemo.find pair_memo (ic, stdin)) with
     | Not_found ->
       let ib = from_ic scan_close_ic (From_channel ic) ic in
       let pair = (ic, ib) in
       IcMemo.add ic_memo ic;
       Gc.finalise finaliser pair;
       PairMemo.add pair_memo pair;
       ib)
;;
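For readers unfamiliar with weak sets, here is a small sketch of the standard Weak.Make functor that the patch relies on. The key type, the KeySet module and the register function are hypothetical and much simpler than the patch above; the point is only that a weak set does not keep its elements alive by itself.

```ocaml
(* Hypothetical key type standing in for an input channel. *)
type key = { id : int }

module KeySet = Weak.Make (struct
  type t = key
  let equal a b = a.id = b.id
  let hash k = Hashtbl.hash k.id
end)

let set = KeySet.create 16

(* Allocate a key, register it in the weak set and return it. *)
let[@inline never] register id =
  let k = { id } in
  KeySet.add set k;
  k

let live = register 1
let () = ignore (register 2) (* no other reference is kept to this key *)
let () = Gc.full_major ()

(* [live] is still referenced, so the weak set must still contain it;
   the second key may have been reclaimed by the collection above. *)
let live_still_present = KeySet.mem set live
let remaining = KeySet.count set
```

Whether the second key has actually been reclaimed at this point depends on the runtime, which is why the sketch only relies on the key that is still referenced.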

Checking the Fixed Version

Curious to see the memory behavior after applying this patch? The graph below shows the memory consumption of the patched version of the Scanf module. Again, the interactive version is available here. After each iteration of the for-loop, the memory is released as expected, and memory consumption does not exceed 2.1 MB during any iteration.

Conclusion

This example is online in our gallery of examples if you want to see and explore the graphs (with the leak and without the leak).

Do not hesitate to use ocp-memprof on your applications. Of course, all feedback and suggestions on using ocp-memprof are welcome, just send us an email!

More information:

Homepage: http://memprof.typerex.org/
Usage: http://memprof.typerex.org/free-version.php
Support: http://memprof.typerex.org/report-a-bug.php
Gallery of examples: http://memprof.typerex.org/gallery.php
Commercial: http://memprof.typerex.org/commercial-version.php

OPAM 1.2.1 Released

2015-03-18T09:05:17Z

OPAM 1.2.1 has just been released. This patch version brings a number of fixes and improvements over 1.2.0, without breaking compatibility.

Upgrade from 1.2.0 (or earlier)

What's new

No huge new features in this point release -- which means you can roll back to 1.2.0 in case of problems -- but lots going on under the hood, and quite a few visible changes nonetheless:

The engine that processes package builds and other commands in parallel has been rewritten. You'll notice the cool new display but it's also much more reliable and efficient. Make sure to set jobs: to a value greater than 1 in ~/.opam/config in case you updated from an older version.
The install/upgrade/downgrade/remove/reinstall actions are also processed in a better way: the consequences of a failed action are minimised, whereas a failure used to abort the full command.
When using version control to pin a package to a local directory without specifying a branch, only the tracked files are used by OPAM, but their changes don't need to be checked in. This was found to be the most convenient compromise.
Sources used for several OPAM packages may use <name>.opam files for package pinning. URLs of the form git+ssh:// or hg+https:// are now allowed.
opam lint has been vastly improved.

... and much more

There is also a new manual documenting the file and repository formats.

Fixes

See the changelog for a summary or closed issues in the bug-tracker for an overview.

Experimental features

These are mostly improvements to the file formats. You are welcome to use them, but they won't be accepted into the official repository until the next release.

New field features: in opam files, to help with ./configure scripts and documenting the specific features enabled in a given build. See the original proposal and the section in the new manual
The "filter" language in opam files is now well defined, and documented in the manual. In particular, undefined variables are consistently handled, as well as conversions between string and boolean values, with new syntax for converting bools to strings.
New package flag "verbose" in opam files, that outputs the package's build script to stdout
New field libexec: in <name>.install files, to install into the package's lib dir with the execution bit set.
Compilers can now be defined without source nor build instructions, and the base packages defined in the packages: field are now resolved and then locked. In practice, this means that repository maintainers can move the compiler itself to a package, giving a lot more flexibility.

Cumulus and ocp-memprof, a love story

2015-03-04T09:05:17Z

In this blog post, we go hunting for memory leaks in Cumulus using our memory profiler, ocp-memprof. Cumulus is a feed aggregator based on Eliom, a framework for programming web sites and client/server web applications, part of the Ocsigen Project.

First, run and get the memory snapshots

To test and run the server, we use ocp-memprof to start the process:

$ ocp-memprof -exec ocsigenserver.opt -c ocsigenserver.opt.conf -v

There are several ways to obtain snapshots:

automatically after each GC: there is nothing to do, this is the default behavior
manually:
- by sending a SIGUSR1 signal (the default signal can be changed with the --signal SIG option);
- by editing the source code and using the dump function in the Headump module:
```
(* the string argument stands for the name of the dump *)
val dump : string -> unit
```

Here, we use the default behavior and get a snapshot after every GC.

The Memory Evolution Graph

After running the server for a long time, the server process shows an unusually high consumption of memory. ocp-memprof automatically generates some statistics on the application memory usage. Below, we show the graph of memory consumption. On the x-axis, you can see the number of GCs, and on the y-axis, the memory size in bytes used by the most popular types in memory.

Eliom expert users would quickly identify that most of the memory is used by XML nodes and attributes, together with strings and closures.

Unfortunately, it is not that easy to know which parts of the Cumulus source code cause the allocations of these XML trees. These trees are indeed values of abstract types, allocated using functions exported by the Eliom modules. Most of the allocations are therefore located in the Eliom source code.

In general, allocation points alone are not enough to locate values of abstract types. It can be useful to browse the memory graph, which can be completely reconstructed from the snapshot, to identify all paths between the globals and the blocks representing XML nodes.

From roots to leaking nodes

The approach that we chose to identify the leak is to take a look at the pointer graph of our application in order to identify the roots retaining a significant portion of the memory. Above, we can observe the table of the retained size, for all roots of the application. What we can tell quickly is that 92.2% of our memory is retained by values with finalizers.

Below, looking at them more closely, we can state that there is a significant amount of values of type:

'a Eliom_comet_base.channel_data Lwt_stream.t -> unit

Probably, these finalizers are never called in order to free their associated values. The leak is not trivial to track down and fix. However, a quick fix is possible in the case of Cumulus.

Identifying the source code and patching it

After further investigation into the source code of Cumulus, we found the only location where such values are allocated:

(* $ROOT/cumulus/src/base/feeds.ml *)
let (event, call_event) =
  let (private_event, call_event) = React.E.create () in
  let event = Eliom_react.Down.of_react private_event in
  (event, call_event)

The function of_react takes an optional argument ~scope to specify the way that Eliom_comet.Channel.create has to use the communication channel.

By replacing the default value of the scope with another one provided by the Eliom module, we now have a single channel that every client uses to communicate with the server (the default method created one channel per client).

(* $ROOT/cumulus/src/base/feeds.ml *)
let (event, call_event) =
  let (private_event, call_event) = React.E.create () in
  let event = Eliom_react.Down.of_react
      ~scope:Eliom_common.site_scope private_event in
  (event, call_event)

Checking the fix

After patching the source code, we recompile our application and re-execute the process as before. Below, we can observe the new pointer graph. By changing the default value of scope, the size retained by finalizers drops from 92.2% to 0%!

The new evolution graph below shows that the memory usage drops from 45 MB (still growing quickly) for a few hundred connections to 5.2 MB for thousands of connections.

Conclusion

As a reminder, a finalisation function is a function that will be called with the (heap-allocated) value to which it is associated when that value becomes unreachable.

The GC calls finalisation functions in order to deallocate their associated values. You need to pay special attention when writing such finalisation functions, since anything reachable from the closure of a finalisation function is considered reachable. You also need to be careful not to make the value that you want to free reachable again.
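The closure pitfall can be illustrated with the standard Gc.finalise function. This is a hedged sketch unrelated to the Eliom code: the counters and the two registration functions are hypothetical names chosen for the example.

```ocaml
let safe_runs = ref 0
let leaky_runs = ref 0

(* The finaliser only uses its argument: the value stays collectable. *)
let[@inline never] register_safe () =
  let v = ref 0 in
  Gc.finalise (fun _ -> incr safe_runs) v

(* The finaliser's closure captures [v] itself, so [v] remains reachable
   through the finaliser and never becomes unreachable. *)
let[@inline never] register_leaky () =
  let v = ref 0 in
  Gc.finalise (fun _ -> incr v; incr leaky_runs) v

let () =
  register_safe ();
  register_leaky ();
  Gc.full_major ()
```

The second finaliser never runs, whatever the number of collections, because the value it is attached to is kept alive by the finaliser's own closure. The first one is expected to run once its value has been collected, although the exact moment depends on the runtime.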

This example is online in our gallery of examples if you want to see and explore the graphs (with the leak and without the leak).

Do not hesitate to use ocp-memprof on your applications. Of course, all feedback and suggestions on using ocp-memprof are welcome, just send us an email!

More information:

Homepage: https://memprof.typerex.org/
Usage: https://memprof.typerex.org/free-version.php
Support: https://memprof.typerex.org/report-a-bug.php
Gallery of examples: https://memprof.typerex.org/gallery.php
Commercial: https://memprof.typerex.org/commercial-version.php

Private Release of Alt-Ergo 1.00

2015-01-29T09:05:17Z

After the public release of Alt-Ergo 0.99.1 last December, it's time to announce a new major private version (1.00) of our SMT solver. As usual:

we freely provide a JavaScript version on Alt-Ergo's website
we provide private access to our internal repositories for academic users and our clients.

Quick Evaluation

A quick comparison between this new version and the previous releases is given below. Timeout is set to 60 seconds. The benchmark is made of 19044 formulas: (a) some of these formulas are known to be invalid, and (b) some of them are out of scope of current SMT solvers. The results are obtained with Alt-Ergo's native input language.

	public release 0.95.2	public release 0.99.1	private release 1.00
Proved Valid	15980	16334	17638
Proved Valid (%)	84,01 %	85,77 %	92,62 %
Required time (seconds)	10831	10504	9767
Average speed (valid formulas per second)	1,47	1,55	1,81

Main Novelties of Alt-Ergo 1.00

General Improvements

theories data structures: semantic values (internal theories representation of terms) are now hash-consed. This enables the use of hash-based comparison (instead of structural comparison) when possible
theories combination: the dispatcher component, that sends literals assumed by the SAT solver to different theories depending on whether these literals are equalities, disequalities or inequalities, has been re-implemented. The new code is much simpler and enables some optimizations and factorizations that could not be made before
case-split analysis: we made several improvements in the heuristics of the case-split analysis mechanism over finite domains
explanations propagation: we improved explanations propagation in congruence closure and linear arithmetic algorithms. This makes the proofs faster thanks to a better back-jumping in the SAT solver part
linear integer arithmetic: we re-implemented several parts of linear arithmetic and introduced important improvements in the Fourier-Motzkin algorithm to make it run on smaller sub-problems and avoid some useless executions. These optimizations allowed a significant speed up on our internal benchmarks
data structures: we optimized hash-consing and some functions in the "formula" and "literal" modules
SAT solving: we made a lot of improvements in the default SAT-solver and in the SatML plugin. In particular, the solvers now send lists of facts (literals) to "the decision procedure part" instead of sending them one by one. This avoids intermediate calls to some "expensive" algorithms, such as Fourier-Motzkin
Matching: we extended the E-matching algorithm to also perform matching modulo the theory of records. In addition, we simplified matching heuristics and optimized the E-matching process to avoid computing the same instances several times
Memory management: thanks to the ocp-memprof tool (http://memprof.typerex.org/), we identified some parts of Alt-Ergo that needed some improvements in order to avoid useless memory allocations, and thus unburden the OCaml garbage collector
the function that retrieves the used axioms and predicates (when option 'save-used-context' is activated) has been improved
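To make the first item of this list more concrete, here is a minimal hash-consing sketch. The term type, the Node and Table modules and the tag field are all assumptions made for illustration; Alt-Ergo's semantic values are richer and its implementation may differ. Note also that this simple table keeps every term alive, which a production implementation would typically avoid.

```ocaml
(* Hypothetical term type, much simpler than real semantic values. *)
type term = { node : node; tag : int }
and node = Var of string | Add of term * term

module Node = struct
  type t = node

  (* Sub-terms are already hash-consed, so comparing their tags suffices. *)
  let equal a b =
    match a, b with
    | Var x, Var y -> String.equal x y
    | Add (a1, a2), Add (b1, b2) -> a1.tag = b1.tag && a2.tag = b2.tag
    | _ -> false

  let hash = function
    | Var x -> Hashtbl.hash x
    | Add (a, b) -> Hashtbl.hash (a.tag, b.tag)
end

module Table = Hashtbl.Make (Node)

let table : term Table.t = Table.create 97
let next_tag = ref 0

(* Return the unique term built from [node], creating it if needed. *)
let hashcons node =
  match Table.find_opt table node with
  | Some t -> t
  | None ->
    let t = { node; tag = !next_tag } in
    incr next_tag;
    Table.add table node t;
    t

let var x = hashcons (Var x)
let add a b = hashcons (Add (a, b))

(* Equality and ordering become integer operations on tags. *)
let equal a b = a.tag = b.tag
let compare a b = Int.compare a.tag b.tag
```

Because structurally equal terms are built only once, comparing two terms no longer requires traversing them: comparing their tags is enough.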

Bug Fixes

6 in the "inequalities" module of linear arithmetic
4 in the "formula" module
3 in the "ty" module used for types representation and manipulation
2 in the "theories front-end" module that interacts with the SAT solvers
1 in the "congruence closure" algorithm
1 in "existential quantifiers elimination" module
1 in the "type-checker"
1 in the "AC theory" of associative and commutative function symbols
1 in the "union-find" module

New OCamlPro Plugins

profiling plugin: when activated, this plugin records and prints some information about the current execution of Alt-Ergo every 'x' seconds. In particular, one can observe a module being activated, a function being called, the amount of time spent in every module/function, the current decision/instantiation level, the number of decisions/instantiations that have been made so far, the number of case-splits, of boolean/theory conflicts, of assumptions in the decision procedure, of generated instances per axiom, …
fm-simplex plugin: when activated, this plugin is used instead of the Fourier-Motzkin method to infer bounds for linear integer arithmetic affine forms (which are used in the case-split analysis process). This module uses the Simplex algorithm to simulate particular runs of Fourier-Motzkin, which makes it scale better on linear integer arithmetic problems containing a lot of inequalities

New Options

version-info: prints some information about this version of Alt-Ergo (release and compilation dates, release commit ID)
no-theory: deactivate theory reasoning. In this case, only the SAT-solver and the matching parts are working
inequalities-plugin: specify a plugin to use, instead of the "default" Fourier-Motzkin algorithm, to handle inequalities of linear arithmetic
tighten-vars: when this option is set, the Fm-Simplex plugin will try to infer bounds for integer variables as well. Note that this option may be very expensive
profiling-plugin: specify a profiling plugin to use to monitor an execution of Alt-Ergo
profiling: makes the profiling module print its information every 'x' seconds
no-tcp: deactivate constraints propagation modulo theories

Removed Capabilities

the pruning module used in the frontend is now removed
the SMT and SMT2 front-ends are removed. We plan to implement a new front-end for SMT2 in upcoming releases

OPAM 1.2 and Travis CI

2014-12-18T09:05:17Z

The new pinning feature of OPAM 1.2 enables new interesting workflows for your day-to-day development in OCaml projects. I will briefly describe one of them here: simplifying continuous testing with Travis CI and GitHub.

Creating an `opam` file

As explained in the previous post, adding an opam file at the root of your project now lets you pin development versions of your project directly. It's very easy to create a default template with OPAM 1.2:

$ opam pin add <my-project-name> . --edit
[... follow the instructions ...]

That command should create a fresh opam file; if not, you might need to fix the warnings in the file by re-running the command. Once the file is created, you can edit it directly and use opam lint to check that it is well-formed.

If you want to run tests, you can also mark test-only dependencies with the {test} constraint, and add a build-test field. For instance, if you use oasis and ounit, you can use something like:

build: [
  ["./configure" "--prefix=%{prefix}%" "--%{ounit:enable}%-tests"]
  [make]
]
build-test: [make "test"]
depends: [
  ...
  "ounit" {test}
  ...
]

Without the build-test field, the continuous integration scripts will just test the compilation of your project for various OCaml compilers. OPAM doesn't run tests by default, but you can make it do so by using opam install -t or setting the OPAMBUILDTEST environment variable in your local setup.

Installing the Travis CI scripts

Travis CI is a free service that enables continuous testing on your GitHub projects. It uses Ubuntu containers and runs the tests for at most 50 minutes per test run.

To use Travis CI with your OCaml project, you can follow the instructions on https://github.com/ocaml/ocaml-travisci-skeleton. Basically, this involves:

adding .travis.yml at the root of your project. You can tweak this file to test your project with different versions of OCaml. By default, it will use the latest stable version (today: 4.02.1, but it will be updated for each new compiler release). For every OCaml version that you want to test (supported values for <VERSION> are 3.12, 4.00, 4.01 and 4.02) add the line:

env:
 - OCAML_VERSION=<VERSION>

signing in at TravisCI using your GitHub account and enabling the tests for your project (click on the + button on the left pane).

And that's it, your project now has continuous integration, using the OPAM 1.2 pinning feature and Travis CI scripts.

Testing Optional Dependencies

By default, the script will not try to install the optional dependencies specified in your opam file. To do so, you need to manually specify which combination of optional dependencies you want to test using the DEPOPTS environment variable. For instance, to test cohttp first with lwt, then with async and finally with both lwt and async (but only on the 4.01 compiler) you should write:

env:
   - OCAML_VERSION=latest DEPOPTS=lwt
   - OCAML_VERSION=latest DEPOPTS=async
   - OCAML_VERSION=4.01   DEPOPTS="lwt async"

As usual, your contributions and feedback on this new feature are gladly welcome.

OPAM 1.2.0 Released

2014-10-23T09:05:17Z

We are very proud to announce the availability of OPAM 1.2.0.

Upgrade from 1.1

Simply follow the usual instructions, using your preferred method (package from your distribution, binary, source, etc.) as documented on the homepage.

NOTE: There are small changes to the internal repository format (~/.opam). It will be transparently updated on first run, but in case you might want to go back and have anything precious there, you're advised to back it up.

Usability

A lot of work has been put into providing a cleaner interface, with helpful behaviour and messages in case of errors.

The documentation pages also have been largely rewritten for consistency and clarity.

New features

This is just the top of the list:

An extended and versatile opam pin command. See the Simplified packaging workflow
More expressive queries, see for example opam source
New metadata fields, including source repositories, bug-trackers, and finer control of package behaviour
An opam lint command to check the quality of packages

For more detail, see the announcement for the beta, the full changelog, and the bug-tracker.

Package format

The package format has been extended to the benefit of both packagers and users. The repository already accepts packages in the 1.2 format, and this won't affect 1.1 users as a rewrite is done on the server for compatibility with 1.1.

If you are hosting a repository, you may be interested in these administration scripts to quickly take advantage of the new features or retain compatibility.

OPAM 1.2: Repository Pinning

2014-08-19T09:05:17Z

Most package managers support some pin functionality to ensure that a given package remains at a particular version without being upgraded. The stable OPAM 1.1 already supported this by allowing any existing package to be pinned to a target, which could be a specific released version, a local filesystem path, or a remote version-controlled repository.

However, the OPAM 1.1 pinning workflow only lets you pin packages that already exist in your OPAM repositories. To declare a new package, you had to go through creating a local repository, registering it in OPAM, and adding your package definition there. That workflow, while reasonably clear, required the user to know about the repository format and the configuration of an internal repository in OPAM before actually getting to writing a package. Besides, you were on your own for writing the package definition, and the edit-test loop wasn't as friendly as it could have been.

A natural, simpler workflow emerged from allowing users to pin new package names that don't yet exist in an OPAM repository:

choose a name for your new package
opam pin add in the development source tree
the package is created on-the-fly and registered locally.

To make it even easier, OPAM can now interactively help you write the package definition, and you can test your updates with a single command. This blog post explains this new OPAM 1.2 functionality in more detail; you may also want to check out the new Packaging tutorial relying on this workflow.

From source to package

For illustration purposes in this post I'll use a tiny tool that I wrote some time ago and never released: ocp-reloc. It's a simple binary that fixes up the headers of OCaml bytecode files to make them relocatable, which I'd like to release into the public OPAM repository.

"opam pin add"

The command opam pin add <name> <target> pins package <name> to <target>. We're interested in pinning the ocp-reloc package name to the project's source directory.

cd ocp-reloc
opam pin add ocp-reloc .

If ocp-reloc were an existing package, the metadata would be fetched from the package description in the OPAM repositories. Since the package doesn't yet exist, OPAM 1.2 will instead prompt for on-the-fly creation:

Package ocp-reloc does not exist, create as a NEW package ? [Y/n] y
ocp-reloc is now path-pinned to ~/src/ocp-reloc

NOTE: if you are using beta4, you may get a version-control-pin instead, because we added auto-detection of version-controlled repos. This turned out to be confusing (issue #1582), because your changes wouldn't be reflected until you commit, so this has been reverted in favor of a warning. Add the --kind path option to make sure that you get a path-pin.

OPAM Package Template

Now your package still needs some kind of definition for OPAM to acknowledge it. That's where templates kick in: the command above opens an editor with a pre-filled opam file that you just have to complete. This not only saves time looking up the documentation, it also helps keep package definitions consistent, reduces errors, and promotes filling in optional but recommended fields (homepage, etc.).

opam-version: "1.2"
name: "ocp-reloc"
version: "0.1"
maintainer: "Louis Gesbert <louis.gesbert@ocamlpro.com>"
authors: "Louis Gesbert <louis.gesbert@ocamlpro.com>"
homepage: ""
bug-reports: ""
license: ""
build: [
  ["./configure" "--prefix=%{prefix}%"]
  [make]
]
install: [make "install"]
remove: ["ocamlfind" "remove" "ocp-reloc"]
depends: "ocamlfind" {build}

After adding some details (most importantly the dependencies and build instructions), I can just save and exit. Much like other system tools such as visudo, it checks for syntax errors immediately:

[ERROR] File "/home/lg/.opam/4.01.0/overlay/ocp-reloc/opam", line 13, character 35-36: '.' is not a valid token.
Errors in /home/lg/.opam/4.01.0/overlay/ocp-reloc/opam, retry editing ? [Y/n]

Installation

You probably want to try your brand new package right away, so OPAM's default action is to try and install it (unless you specified -n):

ocp-reloc needs to be installed.
The following actions will be performed:
 - install   cmdliner.0.9.5                        [required by ocp-reloc]
 - install   ocp-reloc.0.1*
=== 1 to install ===
Do you want to continue ? [Y/n]

I usually don't get it working the first time around, but opam pin edit ocp-reloc and opam install ocp-reloc -v can be used to edit and retry until it does.

Package Updates

How do you keep working on your project as you edit the source code, now that you are installing through OPAM? This is as simple as:

opam upgrade ocp-reloc

This will pick up changes from your source repository and reinstall any packages that are dependent on ocp-reloc as well, if any.

So far, we've been dealing with the metadata locally used by your OPAM installation, but you'll probably want to share this among developers of your project even if you're not releasing anything yet. OPAM takes care of this by prompting you to save the opam file back to your source tree, where you can commit it directly into your code repository.

cd ocp-reloc
git add opam
git commit -m 'Add OPAM metadata'
git push

Publishing your New Package

The above information is sufficient to use OPAM locally to integrate new code into an OPAM installation. Let's look at how other developers can share this metadata.

Picking up your development package

If another developer wants to pick up ocp-reloc, they can directly use your existing metadata by cloning a copy of your repository and issuing their own pin.

git clone git://github.com/OCamlPro/ocp-reloc.git
opam pin add ocp-reloc/

Even specifying the package name is optional since this is documented in ocp-reloc/opam. They can start hacking, and if needed use opam pin edit to amend the opam file too. No need for a repository, no need to share anything more than a versioned opam file within your project.

Cloning already existing packages

We have been focusing on an unreleased package, but the same functionality is also of great help in handling existing packages, whether you need to quickly hack into them or are just curious. Let's consider how to modify the omd Markdown library.

opam source omd --pin
cd omd.0.9.7
...patch...
opam upgrade omd

The new opam source command will clone the source code of the library you specify, and the --pin option will also pin it locally to ensure it is used in preference to all other versions. This will also take care of recompiling any installed packages that are dependent on omd using your patched version so that you notice any issues right away.

There's a new OPAM field available in 1.2 called dev-repo. If you specify this in your metadata, you can directly pin to the upstream repository via opam source --dev-repo --pin.

If the upstream repository for the package contains an opam file, that file will be picked up in preference to the one from the OPAM repository as soon as you pin the package. The idea is to have:

a development opam file that is versioned along with your source code (and thus accurately tracks the latest dependencies for your package).
a release opam file that is published on the OPAM repository and can be updated independently without making a new release of the source code.

How to get from the former to the latter will be the subject of another post! In the meantime, all users of the beta are welcome to share their experience and thoughts on the new workflow on the bug tracker.

OPAM 1.2.0 public beta released

2014-08-14T09:05:17Z

It has only been 18 months since the first release of OPAM, but it is already difficult to remember a time when we did OCaml development without it. OPAM has helped bring together much of the open-source code in the OCaml community under a single umbrella, making it easier to discover, depend on, and maintain OCaml applications and libraries. We have seen steady growth in the number of new packages, updates to existing code, and a diverse group of contributors.

OPAM has turned out to be more than just another package manager. It is also increasingly central to the demanding workflow of industrial OCaml development, since it supports multiple simultaneous (patched) compiler installations, sophisticated package version constraints that ensure statically-typed code can be recompiled without conflict, and a distributed workflow that integrates seamlessly with Git, Mercurial or Darcs version control. OPAM tracks multiple revisions of a single package, thereby letting packages rely on older interfaces if they need to for long-term support. It also supports multiple package repositories, letting users blend the global stable package set with their internal revisions, or building completely isolated package universes for closed-source products.

Since its initial release, we have been learning from the extensive feedback from our users about how they use these features as part of their day-to-day workflows. Larger projects like XenAPI, the Ocsigen web suite, and the Mirage OS publish OPAM remotes that build their particular software suites. Complex applications such as the Pfff static analysis tool and Hack language from Facebook, the Frenetic SDN language and the Arakoon distributed key store have all appeared alongside these libraries. Jane Street pushes regular releases of their production Core/Async suite every couple of weeks.

One pleasant side-effect of the growing package database has been the contribution of tools from the community that make the day-to-day use of OCaml easier. These include the utop interactive toplevel, the IOCaml browser notebook, and the Merlin IDE extension. While these tools are an essential first step, there's still some distance to go to make the OCaml development experience feel fully integrated and polished.

Today, we are kicking off the next phase of evolution of OPAM and starting the journey towards building an OCaml Platform that combines the OCaml compiler toolchain with a coherent workflow for build, documentation, testing and IDE integration. As always with OPAM, this has been a collaborative effort, coordinated by the OCaml Labs group in Cambridge and OCamlPro in France. The OCaml Platform builds heavily on OPAM, since it forms the substrate that pulls together the tools and facilitates a consistent development workflow. We've therefore created this blog on opam.ocaml.org to chart its progress, announce major milestones, and eventually become a community repository of all significant activity.

Major points:

OPAM 1.2 beta available: Firstly, we're announcing the availability of the OPAM 1.2 beta, which includes a number of new features, hundreds of bug fixes, and pretty new colours in the CLI. We really need your feedback to ensure a polished release, so please do read the release notes below.
In the coming weeks, we will provide an overview of what the OCaml Platform is (and is not), and describe an example workflow that the Platform can enable.
Feedback: If you have questions or comments as you read these posts, then please do join the platform@lists.ocaml.org and make them known to us.

Releasing the OPAM 1.2 beta4

We are proud to announce the latest beta of OPAM 1.2. It comes packed with new features, stability and usability improvements. Here are the highlights.

Binary RPMs and DEBs!

We now have binary packages available for Fedora 19/20, CentOS 6/7, RHEL7, Debian Wheezy and Ubuntu! You can see the full set at the OpenSUSE Builder site and download instructions for your particular platform.

An OPAM binary installation doesn't need OCaml to be installed on the system, so you can initialize a fresh, modern version of OCaml on older systems without needing it to be packaged there. On CentOS 6 for example:

cd /etc/yum.repos.d/
wget http://download.opensuse.org/repositories/home:ocaml/CentOS_6/home:ocaml.repo
yum install opam
opam init --comp=4.01.0

Simpler user workflow

For this version, we focused on improving the user interface and workflow. OPAM is a complex piece of software that needs to handle complex development situations. This implies things might go wrong, which is precisely when good support and error messages are essential. OPAM 1.2 has much improved stability and error handling: fewer errors and more helpful messages plus better state backups when they happen.

In particular, a clear and meaningful explanation is extracted from the solver whenever you are attempting an impossible action (unavailable package, conflicts, etc.):

$ opam install mirage-www=0.3.0
The following dependencies couldn't be met:
  - mirage-www -> cstruct < 0.6.0
  - mirage-www -> mirage-fs >= 0.4.0 -> cstruct >= 0.6.0
Your request can't be satisfied:
  - Conflicting version constraints for cstruct

This sets OPAM ahead of many other package managers in terms of user-friendliness. Since this is made possible using the tools from irill (which are also used for Debian), we hope that this work will find its way into other package managers. The extra analyses in the package solver interface are used to improve the health of the central package repository, via the OPAM Weather service.
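To see why the request above cannot be satisfied, here is a minimal OCaml sketch of the underlying check. It is a toy model, not OPAM's actual solver (which relies on the irill tools): the `constr` type, the `compare_versions` and `candidates` helpers, and the list of available cstruct versions are all assumptions made for illustration.

```ocaml
(* Toy model of the conflict above; not OPAM's real solver. *)

(* Compare dotted versions numerically, so that "0.10.0" > "0.6.0".
   Assumes every component is a plain integer. *)
let compare_versions a b =
  let parse v = List.map int_of_string (String.split_on_char '.' v) in
  compare (parse a) (parse b)

type constr =
  | Lt of string  (* strictly below the given version *)
  | Ge of string  (* at or above the given version *)

let satisfies v = function
  | Lt bound -> compare_versions v bound < 0
  | Ge bound -> compare_versions v bound >= 0

(* Keep only the versions that meet every constraint. *)
let candidates constraints versions =
  List.filter (fun v -> List.for_all (satisfies v) constraints) versions

(* Hypothetical list of available cstruct versions. *)
let cstruct_versions = ["0.5.0"; "0.5.3"; "0.6.0"; "0.7.1"; "1.0.0"]

let () =
  (* mirage-www wants cstruct < 0.6.0, mirage-fs wants cstruct >= 0.6.0. *)
  match candidates [Lt "0.6.0"; Ge "0.6.0"] cstruct_versions with
  | [] -> print_endline "Conflicting version constraints for cstruct"
  | vs -> List.iter print_endline vs
```

No version can be both below 0.6.0 and at or above it, so the candidate list is empty whatever versions are available, which is the situation the solver reports.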

And in case stuff does go wrong, we added the opam upgrade --fixup command that will get you back to the closest clean state.

The command-line interface is also more detailed and convenient, polishing and documenting the rough areas. Just run opam <subcommand> --help to see the manual page for the features below.

More expressive queries based on dependencies.

$ opam list --depends-on cow --rec
# Available packages recursively depending on cow.0.10.0 for 4.01.0:
cowabloga   0.0.7  Simple static blogging support.
iocaml      0.4.4  A webserver for iocaml-kernel and iocamljs-kernel.
mirage-www  1.2.0  Mirage website (written in Mirage)
opam2web    1.3.1 (pinned)  A tool to generate a website from an OPAM repository
opium       0.9.1  Sinatra like web toolkit based on Async + Cohttp
stone       0.3.2  Simple static website generator, useful for a portfolio or documentation pages

Inspect existing opam files to base new packages on.

$ opam show cow --raw
opam-version: "1"
name: "cow"
version: "0.10.0"
[...]

Clone the source code for any OPAM package to modify or browse the interfaces.

$ opam source cow
Downloading archive of cow.0.10.0...
[...]
$ cd cow.0.10.0

We've also improved the general speed of the tool to cope with the much bigger size of the central repository, which will be of importance for people building on low-power ARM machines, and added a mechanism that will let you install newer releases of OPAM directly from OPAM if you choose so.

Yet more control for the packagers

Packaging new libraries has been made as straightforward as possible. Here is a quick overview; you may also want to check the OPAM 1.2 pinning post.

opam pin add <name> <sourcedir>

will generate a new package on the fly by detecting the presence of an opam file within the source repository itself. We'll do a followup post next week with more details of this extended opam pin workflow.

The package description format has also been extended with some new fields:

bug-reports: and dev-repo: add useful URLs
install: allows build and install commands to be split,
flags: is an entry point for several extensions that can affect your package.

Packagers can limit dependencies in scope by adding one of the keywords build, test or doc in front of their constraints:

depends: [
  "ocamlfind" {build & >= "1.4.0"}
  "ounit" {test}
]

Here you don't specifically require ocamlfind at runtime, so changing it won't trigger recompilation of your package. ounit is marked as only required for the package's build-test: target, i.e. when installing with opam install -t. This will reduce the amount of (re)compilation required in day-to-day use.

We've also made optional dependencies more consistent by removing version constraints from the depopts: field: their meaning was unclear and confusing. The conflicts field is used to indicate versions of the optional dependencies that are incompatible with your package to remove all ambiguity:

depopts: [ "async" {>= "109.15.00"} & "async_ssl" {>= "111.06.00"} ]

becomes:

depopts: [ "async" "async_ssl" ]
conflicts: [ "async" {< "109.15.00"}
             "async_ssl" {< "111.06.00"} ]

There is an upcoming features field that will give more flexibility in a clearer and consistent way for such complex cases.
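The depopts rewrite shown above is mechanical, and can be sketched in a few lines of OCaml. This is only an illustration, not code from OPAM: the `old_depopt` record and the `split_depopts` function are hypothetical names, and the sketch only handles lower bounds of the form `>= v`.

```ocaml
(* Sketch of the depopts rewrite; not part of OPAM itself. *)

(* An old-style optional dependency: a name plus an optional lower bound. *)
type old_depopt = { name : string; at_least : string option }

(* Split into bare names (the new depopts) and (name, bound) pairs,
   each pair standing for a conflict with versions "< bound". *)
let split_depopts (deps : old_depopt list) =
  let names = List.map (fun d -> d.name) deps in
  let conflicts =
    List.filter_map
      (fun d -> Option.map (fun v -> (d.name, v)) d.at_least)
      deps
  in
  (names, conflicts)

let () =
  let names, conflicts =
    split_depopts
      [ { name = "async"; at_least = Some "109.15.00" };
        { name = "async_ssl"; at_least = Some "111.06.00" } ]
  in
  Printf.printf "depopts: [ %s ]\n"
    (String.concat " " (List.map (Printf.sprintf "%S") names));
  List.iter
    (fun (n, v) -> Printf.printf "conflicts: %S {< %S}\n" n v)
    conflicts
```

Each `>= v` bound on an optional dependency turns into a conflict with every version `< v`, so the same information is kept while the depopts field stays a plain list of names.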

Easier to package and install

Efforts were made on the build of OPAM itself as well to make it as easy as possible to compile, bootstrap or install. There is no more dependency on camlp4 (which has been moved out of the core distribution in OCaml 4.02.0), and the build process is more conventional (get the source, run ./configure, make lib-ext to get the few internal dependencies, make and make install). Packagers can use make cold to build OPAM with a locally compiled version of OCaml (useful for platforms where it isn't packaged), and also use make download-ext to store all the external archives within the source tree (for automated builds which forbid external net access).

The whole documentation has been rewritten as well, to be better focused and easier to browse. Please leave any feedback or changes on the documentation on the issue tracker.

Try it out !

The public beta of OPAM 1.2 is just out. You're welcome to give it a try and give us feedback before we roll out the release!

We'd be most interested in feedback on how easily you can work with the new pinning features, on how the new metadata works for you... and on any errors you may trigger that aren't followed by informative messages or clean behaviour.

If you are hosting a repository, the administration scripts may help you quickly update all your packages to benefit from the new features.

OCamlPro Highlights: May-June 2014

2014-07-16T09:05:17Z

Here is a short report on some of our public activities in May and June 2014.

Towards OPAM 1.2

After a lot of discussions and work on OPAM itself, we are now getting to a clear workflow for OCaml developers and packagers: the preliminary document for OPAM 1.2 is available here. The idea is that you can now easily create and test the metadata locally, before having to get your package included in any repo: there is less administrative burden and it's much quicker to start, fix it, test it and get it right.

With things getting pretty stable, we are closing the last few bugs and should be releasing 1.2~beta very shortly.

OCaml Hacking Session

We participated in the first OCaml hacking session in Paris organized by Thomas Braibant and supervised by Gabriel Scherer, who had kindly prepared in advance a selection of tasks. In particular, he came with a list of open bugs in Mantis that make for good first descents into the compiler's code.

It was the first event of this kind for the OCaml Users in Paris (OUPS) meetup group. It was a success since everybody enjoyed it and some work has actually been achieved. We'll have to wait for the next one to confirm that !

On our front, Fabrice started working (with others) on a good, consensual Emacs profile; Pierre worked on building cross-compilers using Makefile templates; Benjamin wanted to evaluate the feasibility of handling ppx extension nodes correctly inside Emacs, and it turns out that elisp tools exist for the task! You can see a first experiment running in the following screen capture, or even try the code (just open it in emacs, do a M-x eval-buffer on it and then a M-x tuareg-mode-with-ppx on an OCaml file). But beware, it's not yet very mainstream and can make your Emacs crash.

Alt-Ergo Development

During the last two months, we participated in the supervision of an intern, Albin Coquereau - a graduate student at University Paris-Sud - in the VALS team who worked on a conservative extension of the SMT2 standard input language with prenex polymorphism a la ML and overloading. First results are promising. In the future, we plan to replace Alt-Ergo's input language with our extension of SMT2 in order to take advantage of SMT2's features and the expressiveness of polymorphism.

Recently, we have also published an online Javascript-based version of Alt-Ergo (based on private release 0.99).

OCaml Adventures in Scilab Land

We are currently working on the proper integration of our Scilab tools in the Scilab world, respecting its ways and conventions. For this, we built a Scilab module respecting the standard ATOMS interface. This module can embed an OCaml program inside the run-time environment of Scilab, so that OCaml functions can be called as external primitives. (Dyn)linking Scilab's various components, LLVM's and the OCaml run-time together was not that easy.

Symmetrically, we built an OCaml library to manipulate and build Scilab values from OCaml, so that our tools can introspect the dynamic environment of Scilab's interpreter. We also worked with the Scilab team to define an AST interchange mechanism.

We plan to use this module as an entry point for our JIT oriented type system, as well as to integrate Scilint, our style checker, so that a Scilab user can check their functions without leaving the Scilab toplevel.

Experiment with Bytes and Backward Compatibility

As announced in a long discussion on the caml-list, OCaml 4.02 introduces the first step towards eliminating a long-known OCaml problem: string mutability. The main difficulty is that resolving this problem necessarily breaks backward compatibility.

To achieve this first step, OCaml 4.02 comes with a new bytes type and a corresponding Bytes module, similar to the OCaml 4.01 String module, but using the bytes type. The type of a few functions from the String module changed to use the bytes type (like String.set, String.blit...). By default the string and bytes types are equal, hence ensuring backward compatibility for this release, but a new compiler argument, "-safe-string", can be used to remove this equality, and will probably become the default in some future release.

# let s = "foo";;
val s : string = "foo"
# s.[0] <- 'a';;
Characters 0-1:
s.[0] <- 'a';;
^
Error: This expression has type string but an expression was expected of type bytes

Notice that even when using -safe-string you shouldn't rely on strings being immutable. For instance, even if you compile a file with -safe-string, an assertion about one of its strings (in the function g below) does not necessarily hold:

If the following file a.ml is compiled with -safe-string

let s = "foo"
let f () = s
let g () = assert(s = "foo")

and the following file b.ml is compiled without -safe-string

let s = A.f () in
s.[0] <- 'a';
A.g ()

In b.ml, compiled without -safe-string, the string and bytes types are still equal, so modifying the string is possible, and the assertion in A.g will fail.

So you should consider that for now -safe-string is only a compiler-enforced good practice. But this may (and should) change in future versions. The ocamlc man page says:

-safe-string
Enforce the separation between types string and bytes, thereby
making strings read-only. This will become the default in a
future version of OCaml.

In other words if you want your current code to be forward-compatible, your code should start using bytes instead of string as soon as possible.
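As a minimal sketch of what "using bytes instead of string" looks like in practice, the function below modifies a copy held in a bytes buffer and converts it back to a string, instead of mutating the string in place. The `capitalize_first` name is made up for this example, and `Char.uppercase_ascii` assumes a more recent standard library than the 4.02 release discussed here.

```ocaml
(* Mutate a bytes copy rather than the string itself. *)
let capitalize_first (s : string) : string =
  if s = "" then s
  else begin
    let b = Bytes.of_string s in  (* fresh mutable copy of s *)
    Bytes.set b 0 (Char.uppercase_ascii (Bytes.get b 0));
    Bytes.to_string b             (* fresh string copy of the result *)
  end

let () =
  let s = "foo" in
  let t = capitalize_first s in
  (* s is left untouched; only the copy was modified. *)
  Printf.printf "%s -> %s\n" s t
```

Both `Bytes.of_string` and `Bytes.to_string` copy their argument, so this style costs an allocation but never shares mutable state with a string that other code may hold.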

Maintaining Compatibility between 4.01 and 4.02

In our experiments, we found a convenient solution to start using the bytes type while still providing compatibility with 4.01: we use a small StringCompat module that we open at the beginning of all our files making use of bytes. Depending on the version of OCaml used to build the program, we provide two different implementations of stringCompat.ml.

Before 4.02, our stringCompat.ml file provides a bytes type and a Bytes module, including the String module plus an often used Bytes.to_string equivalent:

type bytes = string
module Bytes = struct
include String
let to_string t = t
end

After 4.02, our stringCompat.ml file is much simpler:

type t = bytes
type bytes = t
module Bytes = Bytes

You might actually even wonder why it is not empty ? In fact, it is also a good practice to compile with a warning for unused open, and an empty stringCompat.ml would always trigger a warning in 4.02 for not being useful. Instead, this simple implementation is always seen as useful, as any use of bytes or Bytes will use the (virtual) indirection through StringCompat.
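To illustrate how a client file might use this module, here is a small sketch. To keep it self-contained, the post-4.02 implementation is inlined as a local module rather than living in its own stringCompat.ml file; the `make_line` helper is a hypothetical example, not part of the module.

```ocaml
(* Stand-in for the post-4.02 stringCompat.ml, inlined so that the sketch
   compiles on its own; in a real project it would be a separate file. *)
module StringCompat = struct
  type t = bytes
  type bytes = t
  module Bytes = Bytes
end

open StringCompat

(* Hypothetical helper: a buffer of length n filled with character c.
   Both the bytes type and the Bytes module resolve through StringCompat. *)
let make_line (n : int) (c : char) : bytes =
  let b = Bytes.create n in
  Bytes.fill b 0 n c;
  b

let () = print_endline (Bytes.to_string (make_line 5 '-'))
```

The client code is identical whichever implementation of the module is selected at build time; only the definition behind `open StringCompat` changes.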

We plan to upload this module as a string-compat package in OPAM, so that everybody can use this trick. If you have a better solution, we'll be pleased to discuss it via the pull request on opam-repository.

Testing whether your project correctly builds with "-safe-string"

Once your code has been adapted to use bytes wherever you need to modify a string, you can check that you didn't miss a case using OCaml 4.02, without changing your build system. To do that, just set the environment variable OCAMLPARAM to "safe-string=1,_". Note that OCAMLPARAM should only be used for testing purposes: never set it in your build system or tools, as this would prevent testing new compiler options on your package (and you will receive complaints when the core developers can't deactivate the "-w A -warn-error" generated by your build system)!

If your project passes this test and you don't use "-warn-error", your package should continue to build without modification in the near and the not-so-near future (unless you link with the compiler internals of course).

Try Alt-Ergo in Your Browser

2014-07-15T09:05:17Z

Recently, we worked on an online Javascript-based serverless version of the Alt-Ergo SMT solver. In what follows, we will explain the principle of this version of Alt-Ergo, show how it can be used on a realistic example and compare its performance with the bytecode and native binaries of Alt-Ergo.

Compilation

"Try Alt-Ergo" is a Javascript-based version of Alt-Ergo that can be run on compatible browsers (eg. Firefox, Chromium) without requiring a server for computations. It is obtained by compiling the bytecode executable of the solver into Javascript thanks to js_of_ocaml. The .js file is generated following the scheme given below. Roughly speaking, it consists of three steps:

A new frontend (main_js.ml) is added to the sources of Alt-Ergo. This file contains some glue code that allows the generated .js file to interact with an HTML file (insertion of buttons, modification of DIV contents, ...)
The sources of Alt-Ergo and main_js.ml are compiled with ocamlc. The compilation makes use of a preprocessor provided by js_of_ocaml.
The js_of_ocaml compiler is used to transform the bytecode generated by ocamlc into a Javascript file.

The .js file is then plugged into an HTML file that fits with the glue code inserted in main_js.ml.

General overview of the HTML interface

The HTML interface is made of four panels:

The left panel is an editable textarea in which you can write/paste/load the formula you want to prove.
The bottom-left panel is used to display the answer of Alt-Ergo.
The middle panel contains a set of buttons that allow you to interact with both the interface and the Javascript version of Alt-Ergo.
The right panel is used to display different views. The default view ("Options") lets you control the options of Alt-Ergo. When a formula is proved valid, you can switch to the "Statistics" view, using the corresponding button in the middle, to see which quantified axioms and predicates were used/instantiated during the proof. The "Debug" view shows the information received by main_js.ml from the HTML interface. The "Examples" view shows some basic examples in Alt-Ergo's native input language that can be loaded in the left panel by a simple click.

A step-by-step example

Let us see how "Try Alt-Ergo" works on a formula translated from Atelier-B in the context of the BWare project:

First, open Try Alt-Ergo in a new tab/window.
Download the formula try-alt-ergo.why. This formula contains 177 quantified axioms and 132 predicates.
Click on the "Load a Local File" button of Try Alt-Ergo's interface and load the example into the left panel.
Go to the "Options" panel and set the maximum number of steps to 1000, the maximum number of triggers to 1, and deactivate E-matching.
Click on the "Ask Alt-Ergo" button and wait approximately 60 seconds (depending on your computer). On my laptop, Alt-Ergo gave the following answer after approximately 40 seconds.

# Alt-Ergo's answer: Valid (37.2260 seconds) (222 steps)

Now, you can navigate into the "Statistics" panel to see the quantified axioms and predicates that are instantiated during the proof, those that are potentially used, and those that have never been instantiated.

Limitations

The Javascript version is slower than the native and bytecode versions. In fact, the bytecode executable is about 4 times faster and the native executable about 42 times faster than "Try Alt-Ergo", as shown below.

./alt-ergo.byte -nb-triggers 1 -no-Ematching -max-split infinity -save-used-context try-alt-ergo.why
File "/home/mi/Bureau/po.why", line 3017, characters 1-2450:Valid (9.3105) (222)

./alt-ergo.opt -nb-triggers 1 -no-Ematching -max-split infinity -save-used-context try-alt-ergo.why
File "/home/mi/Bureau/po.why", line 3017, characters 1-2450:Valid (0.8750) (222)

Since it is not possible to set a time limit in Javascript, the "steps" mechanism should be used instead. This limit controls the number of calls to the decision procedure component of the solver.
Currently, the integration of external plugins (such as our miniSAT-based SAT solver) is not supported.
Compared to AltGr-Ergo, statistics and debug information are only shown at the end of the execution.
"Asking Alt-Ergo" may report "syntax error" on well-formed files for Safari and Midori users. The "Load a Local File" button does not work in the Opera browser.

[ Acknowledgement: this work is financially supported by the BWare project. ]

Comments

Joshua Pratt (10 January 2020 at 5 h 20 min):

Can the compiled alt-ergo.js be uploaded to npm? I’d love to use it in a web page I’m working on.

OCamlPro (6 March 2020 at 16 h 03 min):

Hi Joshua, thanks for passing by 🙂 We have no plans of building a bridge between a version of Alt-Ergo in JS and npm. However, you can tweak Alt-Ergo to suit your needs! We would recommend taking a look at Try Why3 http://why3.lri.fr/try/, where you will find a JavaScript version of Alt-Ergo. You can follow their instructions to build Alt-Ergo in JavaScript https://gitlab.inria.fr/why3/why3/tree/master/src/trywhy3

Bharat Jayaraman (26 February 2020 at 17 h 37 min):

This is VERY USEFUL tool!

I am using it in a course on software verification here in Buffalo. It’s great for checking verification conditions.

Many thanks, Bharat

OCamlPro (6 March 2020 at 16 h 04 min):

Hi Bharat! Thank you for your message, we are always glad to hear from our users! If you feel so inclined, you can drop us an email at alt-ergo@ocamlpro.com to tell us more about your experience with Alt-Ergo and any feedback you may have.

OCamlPro Highlights: April 2014

2014-05-20T09:05:17Z

Here is a short report on some of our activities in April 2014, and a short analysis of OCaml evolution since its first release.

OPAM Improvements

We're still working on release 1.2. It was decided to include quite a few new features in this release, which delayed it a little bit since we want to be sure to get it right. It's now getting stabilized, documented and tested. One of the biggest improvements concerns the development workflow and the use of pinned packages, which is a powerful and complex feature that could also get a bit confusing. We are grateful for the large amount of feedback from the community that helped in its design. The basic idea is to use OPAM metadata from within the source packages, because it's most useful while developing and helps get the packaging right. It was possible before, but a little bit awkward: you now only need to provide an opam file or directory at the root of your project, and when pinned to either a local path or a version-controlled repository, opam will pick it up and use it. It will then be synchronized on any subsequent opam update. You can even do this if there is no corresponding package in the repository: OPAM will create it and store it in its internal repository for you. And in case this metadata is getting in the way, or you just want a quick local fix, you can always do opam pin edit <package> to locally change the metadata used by opam.

During this month, we've also been improving performance by a large amount in several areas, because delays could become noticeable for people using it on e.g. Raspberry Pis. There is an important clarification on the handling of optional dependencies, and we worked hard on making the build of OPAM as painless as possible in every possible setting.

OPAM Weather Service

Last month, we presented an online service for OPAM, to provide advanced CUDF solvers to every OPAM user. The service is provided by IRILL, and based on the tools they implemented to manipulate CUDF files (some of them are also used directly in OPAM).

This month, we are happy to introduce a new service, that we helped them put online: the OPAM Weather Service, an instantiation for OPAM of a service they also provide for Debian. It shows the evolution of the installability of all packages in the official OPAM repository, for three stable versions of OCaml (3.12.1, 4.00.1 and 4.01.0). It should help maintainers track dependency problems with their packages, when old packages are removed or new conflicting dependencies are introduced.

An Internship on OCaml Namespaces

This month, we welcomed Pierrick Couderc for an internship in our lab. He is going to work on adding namespaces to OCaml. His goal is to design a kind of namespaces that extend the current module mechanism in a consistent but powerful way. One challenge of his job will be to make these namespaces also extend our big functors to provide functors at the namespace level.

Pierrick is not a complete newcomer in our team: last year, he already worked for us with David Maison (now working at TrustInSoft) on an online service to edit and compile OCaml code for students.

The Evolution of OCaml Sources

This month, there was also a lot of activity for the Core team, as we are getting close to the feature freeze for OCaml 4.02. We took this opportunity to have a look at the evolution of OCaml sources since the first release of OCaml 1.00, in 1996.

Our first graph plots the size of uncompressed OCaml sources in bytes, from the first release to the current trunk:

The graph shows four interesting events:

in 2002-2003, between 3.02 and 3.06, an increase of 4 MB
in 2007, between 3.09.3 and 3.10.0, an increase of again 4 MB
in 2013, between 4.00.1 and 4.01.0, an increase of 2 MB
in 2014, between 4.01.0 and 4.02.0, a decrease of 6 MB

Our second graph plots the number of files per kind (OCaml sources, OCaml interfaces, C sources and C headers):

We can now check the files that were added and removed at the four events that we noticed on the first graph:

the first event corresponds to the addition of 174 files for camlp4 in 3.04, and then 70 files for ocamldoc in 3.06. Also, labltk increased a lot, with many new examples;
the second event corresponds to the addition of 225 files for ocamlbuild in 3.10.0, and the replacement of camlp4 (renamed into camlp5) by a new implementation;
the third event corresponds to ... a change in the size of boot/myocamlbuild.boot, the bytecode file used by ocamlbuild to bootstrap itself !
finally, the incoming new event corresponds to the removal of camlp4 and labltk from 4.02, i.e. about 300 files for each of them.

Our third graph shows the number of lines per kind of file, again:

This graph does not show us much more than what we have seen by number of files, but what might be interesting is to compute the ratio, i.e. the number of lines per file, for each kind of file:

There is a general trend to increase the number of lines per file, from about 200 lines in an OCaml source file in 1996 to about 330 lines in 2014. This ratio increased considerably for release 3.04, because camlp4 used to generate a huge bootstrap file of its own pre-preprocessed OCaml sources. More interestingly, the ratio didn't decrease in 2014, when camlp4 was removed from the distribution! Interface files also grew bigger, but most of the increase was in 3.06, when ocamldoc was added to the distribution, and an effort was made to document mli files.

The Generic Syntax Extension

2014-04-01T09:05:17Z

OCaml 4.01, with its new feature to disambiguate constructors, enables a nice trick: a simple and generic syntax extension that lets you define your own syntax without having to write complicated parsetree transformers. We propose an implementation in the form of a ppx rewriter.

It does only one simple transformation: it replaces strings prefixed by an operator starting with ! by a series of constructor applications.

for instance:

!! "hello 3"

is rewritten to

!! (Start (H (E (L (L (O (Space (N3 (End))))))))
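
To make the transformation concrete, here is a minimal sketch of the character-to-constructor mapping, working on plain strings rather than on the parsetree. The names constructor_of_char and expand are hypothetical, and the mapping is only inferred from the examples in this post (letters, digits, space, _, % and parentheses); the actual ppx rewriter may well differ.

```ocaml
(* Hypothetical helper: name of the constructor standing for one character.
   The mapping is an assumption inferred from the examples in this post. *)
let constructor_of_char = function
  | 'a' .. 'z' as c -> String.make 1 (Char.uppercase_ascii c)
  | 'A' .. 'Z' as c -> String.make 1 c
  | '0' .. '9' as c -> Printf.sprintf "N%c" c
  | ' ' -> "Space"
  | '_' -> "Underscore"
  | '%' -> "Percent"
  | '(' -> "Lparen"
  | ')' -> "Rparen"
  | c -> invalid_arg (Printf.sprintf "no constructor for %C" c)

(* Build the text of the nested constructor application for a string:
   one constructor per character, closed by End and wrapped in Start. *)
let expand s =
  let rec go i =
    if i = String.length s then "End"
    else Printf.sprintf "%s (%s)" (constructor_of_char s.[i]) (go (i + 1))
  in
  Printf.sprintf "Start (%s)" (go 0)
```

For example, expand "%i" returns the text Start (Percent (I (End))). The type-checker then decides which type each constructor belongs to.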

How is that generic ? We will present you a few examples.

Base 3 Numbers

For instance, if you want to declare base 3 arbitrary big numbers, let's define a syntax for it. We first start by declaring some types.

type start = Start of p

and p =
  | N0 of stop
  | N1 of q
  | N2 of q

and q =
  | N0 of q
  | N1 of q
  | N2 of q
  | Underscore of q
  | End

and stop = End

This type only allows writing strings matching the regexp 0 | (1|2)(0|1|2|_)*. Notice that some constructors, like N0, appear in multiple types. This is not a problem since constructor disambiguation will choose the right one at the right place for us. Let's now define a few functions to use it:

open Num

(* The annotation makes N0, N1 and N2 resolve to the constructors of p, not q. *)
let rec convert_p : p -> num = function
  | N0 (End) -> Int 0
  | N1 t -> convert_q (Int 1) t
  | N2 t -> convert_q (Int 2) t

and convert_q acc = function
  | N0 t -> convert_q (acc */ Int 3) t
  | N1 t -> convert_q (Int 1 +/ acc */ Int 3) t
  | N2 t -> convert_q (Int 2 +/ acc */ Int 3) t
  | Underscore t -> convert_q acc t
  | End -> acc

let convert (Start p) = convert_p p

# val convert : start -> Num.num = <fun>

And we can now try it:

let n1 = convert (Start (N0 End))
# val n1 : Num.num = <num 0>
let n2 = convert (Start (N1 (Underscore (N0 End))))
# val n2 : Num.num = <num 3>
let n3 = convert (Start (N1 (N2 (N0 End))))
# val n3 : Num.num = <num 15>

And the generic syntax extension allows us to write:

let ( !! ) = convert

let n4 = !! "120_121_000"
# val n4 : Num.num = <num 11367>
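
The Num library is no longer bundled with recent OCaml distributions, so here is a self-contained variant of the same example using native int instead of Num.num. It is only a sketch: the int-based convert is an assumption made for illustration, and "120_121_000" is written out by hand as the constructors the rewriter would generate.

```ocaml
(* Some build setups turn the "same constructor in two mutually recursive
   types" warning into an error; it is expected here, so silence it. *)
[@@@warning "-30"]

type start = Start of p
and p = N0 of stop | N1 of q | N2 of q
and q = N0 of q | N1 of q | N2 of q | Underscore of q | End
and stop = End

(* The annotations tell the type-checker which N0/N1/N2/End is meant. *)
let rec convert_p : p -> int = function
  | N0 End -> 0
  | N1 t -> convert_q 1 t
  | N2 t -> convert_q 2 t

and convert_q (acc : int) : q -> int = function
  | N0 t -> convert_q (acc * 3) t
  | N1 t -> convert_q (1 + acc * 3) t
  | N2 t -> convert_q (2 + acc * 3) t
  | Underscore t -> convert_q acc t
  | End -> acc

let convert (Start p) = convert_p p

(* "120_121_000", split into three chunks to keep parentheses readable. *)
let n4 =
  let low : q = N0 (N0 (N0 End)) in               (* "000"  *)
  let mid : q = N1 (N2 (N1 (Underscore low))) in  (* "121_" *)
  convert (Start (N1 (N2 (N0 (Underscore mid)))))  (* "120_" *)
```

Here n4 evaluates to 11367, the same value as the Num-based version above.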

Specialised Format Strings

We can implement specialised format strings for a particular usage. Here, for concision, we will restrict ourselves to a very small subset of the classical format: the characters %, i, c and space.

Let's define the constructors.

type 'a start = Start of 'a a

and 'a a =
  | Percent : 'a f -> 'a a
  | I : 'a a -> 'a a
  | C : 'a a -> 'a a
  | Space : 'a a -> 'a a
  | End : unit a

and 'a f =
  | I : 'a a -> (int -> 'a) f
  | C : 'a a -> (char -> 'a) f
  | Percent : 'a a -> 'a f

Let's look at the inferred type for some examples:

let (!*) x = x

let v = !* "%i %c";;
# val v : (int -> char -> unit) start = Start (Percent (I (Space (Percent (C End)))))
let v = !* "ici";;
# val v : unit start = Start (I (C (I End)))

These are indeed the types we would like for such format strings. To use them we can define a simple printer:

let rec print (Start cons) =
  main cons

and main : type t. t a -> t = function
  | I r ->
    print_string "i";
    main r
  | C r ->
    print_string "c";
    main r
  | Space r ->
    print_string " ";
    main r
  | End -> ()
  | Percent f ->
    format f

and format : type t. t f -> t = function
  | I r ->
    fun i ->
      print_int i;
      main r
  | C r ->
    fun c ->
      print_char c;
      main r
  | Percent r ->
    print_string "%";
    main r

let (!!) cons = print cons

And voila!

let s = !! "%i %c" 1 'c';;
# 1 c

How generic is it really ?

It may not look like it, but we can encode almost any syntax we might want this way. For instance, we can encode any regular language. To explain how we transform a regular language into a type definition, we will use as an example the language a(a|)b, i.e. the two words ab and aab:

type start = Start of a

and a =
  | A of a'

and a' =
  | A of b
  | B of stop

and b = B of stop

and stop = End

We can try a few things on it:

let v = Start (A (A (B End)))
# val v : start = Start (A (A (B End)))

let v = Start (A (B End))
# val v : start = Start (A (B End))

let v = Start (B End)
# Characters 15-16:
#   let v = Start (B End);;
#                  ^
# Error: The variant type a has no constructor B

let v = Start (A (A (A (B End))))
# Characters 21-22:
#  let v = Start (A (A (A (B End))));;
#                        ^
# Error: The variant type b has no constructor A

Assume the language is given as an automaton that:

has 4 states, a, a', b and stop
with initial state a
with final state stop
with transitions:
a - A -> a'
a' - A -> b
a' - B -> stop
b - B -> stop

Let's write {c} for the constructor corresponding to the character c and [q] for the type corresponding to a state q of the automaton.

For each state q we have a type declaration [q]
For each letter a of the alphabet we have a constructor {a}
For each transition p - l -> q we have a constructor {l} with parameter [q] in type [p]:

type [p] = {l} of [q]

The End constructor, without any parameter, must be present in the type of every final state
The initial state e is declared by

type start = Start of [e]
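
The rules above are mechanical enough to be sketched as a small generator that prints the type declarations for a given automaton. This is only an illustration: the automaton record and the type_decls function are hypothetical names, letters are assumed to be lowercase ASCII characters, and a non-final state without outgoing transitions is not handled.

```ocaml
(* A finite automaton: states are named by strings, letters are chars. *)
type automaton = {
  initial : string;
  finals : string list;
  transitions : (string * char * string) list;  (* p - l -> q *)
}

(* Print one type per state, one constructor per transition, plus End
   in every final state, and the Start wrapper for the initial state. *)
let type_decls a =
  let states =
    List.sort_uniq compare
      ((a.initial :: a.finals)
       @ List.concat_map (fun (p, _, q) -> [ p; q ]) a.transitions)
  in
  let decl p =
    let ctors =
      List.filter_map
        (fun (p', l, q) ->
          if p' = p then
            Some (Printf.sprintf "%c of %s" (Char.uppercase_ascii l) q)
          else None)
        a.transitions
      @ (if List.mem p a.finals then [ "End" ] else [])
    in
    Printf.sprintf "and %s = %s" p (String.concat " | " ctors)
  in
  String.concat "\n"
    (Printf.sprintf "type start = Start of %s" a.initial
     :: List.map decl states)

(* The automaton for a(a|)b used in this section. *)
let example =
  { initial = "a";
    finals = [ "stop" ];
    transitions =
      [ ("a", 'a', "a'"); ("a'", 'a', "b");
        ("a'", 'b', "stop"); ("b", 'b', "stop") ] }
```

Calling print_endline (type_decls example) prints the same five declarations as the hand-written definition for a(a|)b, with the states listed in alphabetical order.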

Yet more generic

In fact, we can also encode deterministic context-free languages (DCFL). To do that, we encode pushdown automata. Here we will only give a small example: the language of well-parenthesized words.

type empty
type 'a r = Dummy

type _ q =
  | End : empty q
  | Rparen : 'a q -> 'a r q
  | Lparen : 'a r q -> 'a q

type start = Start of empty q

let ( !! ) x = x

let m = !! ""
let m = !! "()"
let m = !! "((())())()"

To encode the stack, we use the type parameters: Lparen pushes an r onto the stack, Rparen consumes it and End checks that the stack is indeed empty.

There are a few more tricks needed to encode tests on the top value in the stack, and a conversion of a grammar to Greibach normal form to allow this encoding.

We can go even further

a^n b^n c^n

In fact we don't need to restrict ourselves to DCFL: we can, for instance, encode the a^n.b^n.c^n language, which is not context-free:

type zero
type 'a s = Succ

type (_,_) p =
  | End : (zero,zero) p
  | A : ('b s, 'c s) p -> ('b, 'c) p
  | B : ('b, 'c s) q -> ('b s, 'c s) p

and (_,_) q =
  | B : ('b, 'c) q -> ('b s, 'c) q
  | C : 'c r -> (zero, 'c s) q

and _ r =
  | End : zero r
  | C : 'c r -> 'c s r

type start = Start of (zero,zero) p

let v = Start (A (B (C End)))
let v = Start (A (A (B (B (C (C End))))))

Non recursive languages

We can also encode solutions of the Post Correspondence Problem (PCP), which are not recursive languages:

Suppose we have two alphabets A = { X, Y, Z } and O = { a, b } and two morphisms m1 and m2 from A to O* defined as

m1(X) = a, m1(Y) = ab, m1(Z) = bba
m2(X) = baa, m2(Y) = aa, m2(Z) = bb

Solutions of this instance of PCP are words whose images by m1 and m2 are equal. For instance, ZYZX is a solution: both images are bbaabbbaa. The language of solutions can be represented by this type declaration:

type empty
type 'a a = Dummy
type 'a b = Dummy

type (_,_) z =
  | X : ('t1, 't2) s -> ('t1 a, 't2 b a a) z
  | Y : ('t1, 't2) s -> ('t1 a b, 't2 a a) z
  | Z : ('t1, 't2) s -> ('t1 b b a, 't2 b b) z

and (_,_) s =
  | End : (empty,empty) s
  | X : ('t1, 't2) s -> ('t1 a, 't2 b a a) s
  | Y : ('t1, 't2) s -> ('t1 a b, 't2 a a) s
  | Z : ('t1, 't2) s -> ('t1 b b a, 't2 b b) s

type start = Start : ('a, 'a) z -> start

let v = X (Z (Y (Z End)))
let r = Start (X (Z (Y (Z End))))
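
The claim that ZYZX is a solution can be checked independently of the type encoding, directly on strings. The following is a minimal sketch; image and is_solution are hypothetical helper names, while m1 and m2 are the two morphisms defined above.

```ocaml
(* The two morphisms of the PCP instance, given on letters. *)
let m1 = function
  | 'X' -> "a"
  | 'Y' -> "ab"
  | 'Z' -> "bba"
  | c -> invalid_arg (Printf.sprintf "m1: unexpected letter %C" c)

let m2 = function
  | 'X' -> "baa"
  | 'Y' -> "aa"
  | 'Z' -> "bb"
  | c -> invalid_arg (Printf.sprintf "m2: unexpected letter %C" c)

(* Extend a morphism from letters to words by concatenating the images. *)
let image m word =
  String.concat "" (List.map m (List.of_seq (String.to_seq word)))

(* A non-empty word is a solution when both images coincide. *)
let is_solution word = word <> "" && image m1 word = image m2 word
```

With these definitions, is_solution "ZYZX" holds, and image m1 "ZYZX" is "bbaabbbaa" as stated above.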

Open question

Can every context-free language (not only the deterministic ones) be represented like that? Notice that the classical example of palindromes can be represented (proof left to the reader).

Conclusion

So we have a nice extension available that allows you to define a new syntax by merely declaring a type. The code is available on github. We are waiting for the nice syntax you will invent !

PS: There may remain a small problem... If you inadvertently mistype something, you may find some quite complicated type errors attacking you like a piranha instead of a syntax error.

OCamlPro Highlights: Feb 2014

2014-03-05T09:05:17Z

Here is a short report of some of our activities in February 2014 !

Displaying what OPAM is doing

After releasing version 1.1.1, we have been very busy preparing the next big things for OPAM. We have also steadily been improving stability and usability, with a focus on friendly messages: for example, there is a whole new algorithm to give the best explanations on what OPAM is going to do and why:

With OPAM 1.1.1, you currently get this information:

## opam install custom_printf.109.15.00
The following actions will be performed:
– remove pa_bench.109.55.02
– downgrade type_conv.109.60.01 to 109.20.00 [required by comparelib, custom_printf]
– downgrade uri.1.4.0 to 1.3.11
– recompile variantslib.109.15.03 [use type_conv]
– downgrade sexplib.110.01.00 to 109.20.00 [required by custom_printf]
– downgrade pa_ounit.109.53.02 to 109.18.00 [required by custom_printf]
– recompile ocaml-data-notation.0.0.11 [use type_conv]
– recompile fieldslib.109.20.03 [use type_conv]
– recompile dyntype.0.9.0 [use type_conv]
– recompile deriving-ocsigen.0.5 [use type_conv]
– downgrade comparelib.109.60.00 to 109.15.00
– downgrade custom_printf.109.60.00 to 109.15.00
– downgrade cohttp.0.9.16 to 0.9.15
– recompile cow.0.9.1 [use type_conv, uri]
– recompile github.0.7.0 [use type_conv, uri]
0 to install | 7 to reinstall | 0 to upgrade | 7 to downgrade | 1 to remove

With the next trunk version of OPAM, you will get the much more informative output on real dependencies:

## opam install custom_printf.109.15.00
The following actions will be performed:
– remove pa_bench.109.55.02 [conflicts with type_conv, pa_ounit]
– downgrade type_conv from 109.60.01 to 109.20.00 [required by custom_printf]
– downgrade uri from 1.4.0 to 1.3.10 [uses sexplib]
– recompile variantslib.109.15.03 [uses type_conv]
– downgrade sexplib from 110.01.00 to 109.20.00 [required by custom_printf]
– downgrade pa_ounit from 109.53.02 to 109.18.00 [required by custom_printf]
– recompile ocaml-data-notation.0.0.11 [uses type_conv]
– recompile fieldslib.109.20.03 [uses type_conv]
– recompile dyntype.0.9.0 [uses type_conv]
– recompile deriving-ocsigen.0.5 [uses type_conv]
– downgrade comparelib from 109.60.00 to 109.15.00 [uses type_conv]
– downgrade custom_printf from 109.60.00 to 109.15.00
– downgrade cohttp from 0.9.16 to 0.9.14 [uses sexplib]
– recompile cow.0.9.1 [uses type_conv]
– recompile github.0.7.0 [uses uri, cohttp]
0 to install | 7 to reinstall | 0 to upgrade | 7 to downgrade | 1 to remove

Failsafe behaviour is being much improved as well, because things do happen to go wrong when you access the network to download packages and then compile them, and that was the biggest source of problems for our users: errors are now more tightly controlled in each stage of the opam command.

For example, nothing will be changed in case of a failed or interrupted download, and if you press C-c in the middle of an action, you’ll get something like this:

[ERROR] User interruption while waiting for sub-processes

[ERROR] Failure while processing typerex.1.99.6-beta

=-=-= Error report =-=-=
These actions have been completed successfully
install conf-gtksourceview.2
upgrade cmdliner from 0.9.2 to 0.9.4
The following failed
install typerex.1.99.6-beta
Due to the errors, the following have been cancelled
install ocaml-top.1.1.2
install ocp-index.1.0.2
install ocp-build.1.99.6-beta
recompile alcotest.0.2.0
install ocp-indent.1.4.1
install lablgtk.2.16.0

The former state can be restored with opam switch import -f "<xxx>.export"

You also shouldn’t have to dig anymore to find the most meaningful error when something fails.

With the ever-increasing number of packages and versions, resolving requests becomes a real challenge and we’re glad we made the choice to rely on specialized solvers. The built-in heuristics may show their limits when attempting long-delayed upgrades, and everybody is encouraged to install an external solver (aspcud being the one supported at the moment).

Consequently, we have also been working more tightly with the Mancoosi team at IRILL to improve interaction with the solver, and how the user can get the best of it is now well documented, thanks to Roberto Di Cosmo.

Per-project OPAM Switches with `ocp-manager`

At OCamlPro, we often use OPAM with multiple switches, to test whether our tools are working with different versions of OCaml, including the new ones that we are developing. Switching between versions is not always as intuitive as we would like, as we sometimes forget to call

$ eval `opam config env`

in the right location or at the right time, and end up compiling a project with a different version of OCaml than we would have liked.

It was quite surprising to discover that a tool that we had developed a long time ago, ocp-manager, would actually become a solution for us to a problem that appeared just now with OPAM: ocp-manager was a tool we used to switch between different versions of OCaml before OPAM. It would use a directory of wrappers, one for each OCaml tool, and by adding this directory once and for all to the PATH, with:

$ eval `ocp-manager -config`

you would be able to switch to the OPAM switch 3.12.1 (which needs to have been installed first with OPAM) immediately by using:

$ ocp-manager -set opam:3.12.1

Nothing much different from OPAM ? The nice thing with ocp-manager is that wrappers also use environment variables and per-directory information to choose the OCaml version of the tool they are going to run. For example, if some top-directory of your project contains a file .ocp-switch with the line “opam:4.01.0”, your project will always be compiled with this version of OCaml, even if you change the global per-user configuration. You can also override the global and local configuration by setting the OCAML_VERSION environment variable.
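
The lookup order described above (environment variable first, then the closest .ocp-switch file, then the global configuration) can be sketched as follows. This is an illustration under assumptions, not ocp-manager's actual code: find_local_switch and resolve_switch are hypothetical names, the file is assumed to hold the switch name on its first line, and the global setting is simply passed in as a string.

```ocaml
(* Look for a .ocp-switch file in [dir], then in each parent directory.
   Returns the first line of the closest file found, if any. *)
let rec find_local_switch dir =
  let file = Filename.concat dir ".ocp-switch" in
  if Sys.file_exists file then begin
    let ic = open_in file in
    let line =
      try Some (String.trim (input_line ic)) with End_of_file -> None
    in
    close_in ic;
    line
  end
  else
    let parent = Filename.dirname dir in
    (* Filename.dirname is a fixpoint at the root: stop there. *)
    if parent = dir then None else find_local_switch parent

(* OCAML_VERSION overrides the per-directory file, which overrides
   the global per-user setting. *)
let resolve_switch ~global ~dir =
  match Sys.getenv_opt "OCAML_VERSION" with
  | Some v -> v
  | None -> (
      match find_local_switch dir with
      | Some v -> v
      | None -> global)
```

A wrapper for, say, ocamlc would then call something like resolve_switch ~global ~dir:(Sys.getcwd ()) before executing the binary of the selected switch.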

Maybe ocp-manager can also be useful for you. Just install it with opam install ocp-manager, change your shell configuration to add its directory to your PATH, and check if it also works for you (the manpage can be very useful!).

Optimization Patches for `ocamlopt` under Reviewing Process

This month, we also spent a lot of time improving the optimization patches that we submitted for inclusion into OCaml, and that we have described in our previous blog posts. Mark Shinwell from Jane Street and Gabriel Scherer from INRIA kindly agreed to devote some of their time to a thorough review, leading to many improvements in the readability and maintainability of our optimization code. As this first patch is a prerequisite for our next patches, we also spent a lot of time propagating these modifications, so that we will be able to submit them faster once this one has been merged!

Displaying the Distribution of Block Sizes with `ocp-memprof`

In our study to understand the memory behavior of OCaml applications, we have investigated the distribution of block sizes, both in the heap (live blocks) and in the free list (dead blocks). This information should help programmers understand which GC parameters might be the best ones for their application, by showing the fragmentation of the heap and the time spent searching in the free list. It is all the more important since improving the format of the free list with bins has been discussed lately in the Core team.

Here, we display the distribution of blocks at a snapshot during the execution of why3replayer, a tool that we are trying to optimize during the Bware Project. The number of free blocks is displayed darker than live blocks, from size 21 to size 0.

It is interesting to notice that, for this application, almost all allocations have a size smaller than 6. We are planning to use such information to simulate the cost of allocation for this application, and see which data structure for the free list would most benefit its performance.

Whole Program Analysis

The static OCaml analyzer is going quite well. Our set of (working) test samples is growing in size and complexity. Our latest improvement was what is called widening. What’s widening? Well, the main idea is “when I go through a big loop 5000 times, I don’t want the analyzer to do that too”. If we take this sample test:

let () = for i = 0 to 5000 do () done

Without widening, the analysis would loop 5000 times through that loop. That’s quite useless, not to mention that replacing 5000 by Random.int () would make the analysis loop until max_int (2^62 times on a 64-bit computer)! Worse, let’s take this code:

let () =
  let x = ref 0 in
  for i = 1 to 10 do
    x := !x + 1
  done

Here, the analysis would not see that the increments of !x and i are linked (that’s one of the approximations we make to keep the computation tractable). So, the analyzer does not loop ten times, but again 2^62 times: we do not want that to happen.

The good news now: we can say to the analyzer “every time you go through a loop, check what integers you incremented, and suppose you’ll increment them again and again until you can’t”. This way we only go twice through our for-loop: first to discover it, then to propagate the widening approximation.

Of course this is not that simple, and we’ll often lose information by doing only two iterations. But in most cases, we don’t need it or we can get it in a quicker way than iterating billions of times through a small loop.
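
The idea can be sketched on a toy interval domain. This is a minimal illustration, assuming a single integer variable abstracted by an interval [lo, hi] and the classical interval widening; the names itv, widen, incr_body and fixpoint are hypothetical and this is not the analyzer's actual code.

```ocaml
(* An integer variable is abstracted by the interval [lo, hi];
   min_int and max_int play the role of the infinite bounds. *)
type itv = { lo : int; hi : int }

let join a b = { lo = min a.lo b.lo; hi = max a.hi b.hi }

(* Widening: a bound that is still moving is pushed to its extreme. *)
let widen old next =
  { lo = (if next.lo < old.lo then min_int else old.lo);
    hi = (if next.hi > old.hi then max_int else old.hi) }

(* Abstract effect of the loop body  x := !x + 1  (saturating). *)
let incr_body x =
  { lo = (if x.lo = max_int then max_int else x.lo + 1);
    hi = (if x.hi = max_int then max_int else x.hi + 1) }

(* Iterate at the loop head until the interval is stable. Returns the
   interval reached and the number of passes through the body; gives
   up after [max_passes] so the version without widening terminates. *)
let fixpoint ~widening ?(max_passes = 1_000) init body =
  let rec go x passes =
    if passes >= max_passes then (x, passes)
    else
      let seen = join x (body x) in
      let next = if widening then widen x seen else seen in
      if next = x then (x, passes + 1) else go next (passes + 1)
  in
  go init 0
```

Starting from x = 0, fixpoint ~widening:true { lo = 0; hi = 0 } incr_body stabilizes after two passes on the interval [0, max_int], whereas with ~widening:false the upper bound only grows by one per pass and the iteration stops at the max_passes cut-off.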

Hopefully, we’ll soon be able to analyze any simple program that uses only Pervasives and the basic language features, but for and while loops are already a good starting point !

SPARK 2014: a Use-Case of Alt-Ergo

The SPARK toolset, developed by the AdaCore company, targets the verification of programs written in the SPARK language, a subset of Ada used in the design of critical systems. We published this month a use-case of Alt-Ergo that explains the integration of our solver as a back-end of the next generation of SPARK.

Discussions with SPARK 2014 developers were very important for us to understand the strengths of Alt-Ergo for them and what could be improved in the solver. We hope this use-case will be helpful for IT solutions providers that would need an automatic solver in their products.

Scilab 5 or Scilab 6 ?

We are still working at improving the Scilab environment with new tools written in OCaml. We are soon going to release a new version of Scilint, our style-checking tool for Scilab code, with a new parser compatible with Scilab 5 syntax. Changing the parser of Scilint was not an easy job: while our initial parser was partially based on the yacc parser of the future Scilab 6, we had to write the new parser from scratch to accept the more tolerant syntax of Scilab 5. It was also a good opportunity to design a cleaner AST than the one copied from Scilab 6: written in C++, Scilab 6 AST would for example have all AST nodes inherit from the Exp class, even instructions or the list of parameters of a function prototype !

We have also started to work on a type-system for Scilab. We want the result to be a type language expressive enough to express, say, the (dependent) sizes of matrices, yet simple enough for clash messages not to be complete black magic for Scilab programmers. This is not simple. In particular, there is the additional constraint of building a versatile type system that could serve a JIT or give usable information to the programmer, which means that the type environment is a mix of static information, coming from the inference and from annotations, and dynamic information obtained by introspection of the dynamic interpreter.

In the meantime, we are also planning to write a simpler JIT, to mitigate the impatience of Scilab programmers expecting to feel the underlying power of OCaml!

OCamlPro Highlights: Dec 2013 & Jan 2014

2014-02-05T09:05:17Z

Here is a short report of some of our activities in last December and January !

A New Intel Backend for ocamlopt

With the support of LexiFi, we started working on a new Intel backend for the ocamlopt native code compiler. Currently, there are four Intel backends in ocamlopt: amd64/emit.mlp, amd64/emit_nt.mlp, i386/emit.mlp and i386/emit_nt.mlp, i.e. support for two processors (amd64 and i386) and two OS variants (Unices and Windows). These backends directly output assembly source files, on which the platform assembler is called (gas on Unices, and masm on Windows).

The current organisation makes it hard to maintain these backends: code for a given processor has to be written in two almost identical files (Unix and Windows), with subtle differences in the syntax: for example, the destination operand is the second parameter in gas syntax (AT&T), while it is the first one in masm syntax (Intel).

Our current work aims at merging, for each processor, the Unix and Windows backends, by making them generate an abstract representation of the assembly. This representation is shared between the two processors ('amd64' and 'i386'), so that we only have to develop two printers, one for gas syntax and one for masm syntax. As a consequence, maintenance of the backend will be much easier: while writing the assembly code, the developer does not need to care about the exact syntax. Moreover, the type-checker can verify that each assembler instruction is used with the correct number of well-formatted operands.

Finally, our hope is that it will be also possible to write optimization passes directly on the assembly representation, such as peephole optimizations or instruction re-scheduling. This work is available in OCaml SVN, in the "abstract_x86_asm" branch.
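
The shared representation described above can be sketched with a small variant type and two printers. This is only an illustration with a handful of registers and instructions: the types reg, arg and instr and the functions gas and masm are hypothetical and do not reflect the actual code of the abstract_x86_asm branch.

```ocaml
type reg = RAX | RBX | RCX

type arg =
  | Reg of reg
  | Imm of int
  | Mem of reg * int          (* base register + displacement *)

(* Instructions are stored destination first, whatever the output syntax.
   The arity of each instruction is enforced by the type-checker. *)
type instr =
  | Mov of arg * arg
  | Add of arg * arg
  | Ret

let reg_name = function RAX -> "rax" | RBX -> "rbx" | RCX -> "rcx"

(* gas-style operands: %reg, $imm, disp(%base) *)
let gas_arg = function
  | Reg r -> "%" ^ reg_name r
  | Imm n -> Printf.sprintf "$%d" n
  | Mem (r, d) -> Printf.sprintf "%d(%%%s)" d (reg_name r)

(* masm-style operands: reg, imm, QWORD PTR [base+disp] *)
let masm_arg = function
  | Reg r -> reg_name r
  | Imm n -> string_of_int n
  | Mem (r, d) -> Printf.sprintf "QWORD PTR [%s%+d]" (reg_name r) d

(* gas printer: source first, destination second *)
let gas = function
  | Mov (dst, src) -> Printf.sprintf "movq %s, %s" (gas_arg src) (gas_arg dst)
  | Add (dst, src) -> Printf.sprintf "addq %s, %s" (gas_arg src) (gas_arg dst)
  | Ret -> "ret"

(* masm printer: destination first, source second *)
let masm = function
  | Mov (dst, src) -> Printf.sprintf "mov %s, %s" (masm_arg dst) (masm_arg src)
  | Add (dst, src) -> Printf.sprintf "add %s, %s" (masm_arg dst) (masm_arg src)
  | Ret -> "ret"
```

The same value Mov (Reg RAX, Imm 1) is printed as movq $1, %rax by gas and as mov rax, 1 by masm, so the code emitting instructions never has to care about operand order.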

OPAM, new Release 1.1.1

OPAM has been shifted from the 1.1.0-RC to 1.1.1, with large stability and UI improvements. We put a lot of effort on improving the interface, and on helping to build other tools in the emerging ecosystem around OPAM. Louis visited OCamlLabs, which was a great opportunity to discuss the future of OPAM and the platform, and contribute to their effort towards opam-in-a-box, a new way to generate pre-configured VirtualBox instances with all OCaml packages locally installable by OPAM, particularly convenient for computer classrooms.

The many plans and objectives on OPAM can be seen and discussed on the work-in-progress OPAM roadmap. Lots of work is ongoing for the next releases, including Windows support, binary packages, and allowing more flexibility by shifting the compiler descriptions to the packages.

`ocp-index` and its new Brother, `ocp-grep`

As part of our continued efforts to improve the environment and tools for OCaml hackers, we also made some extensions to ocp-index. In addition to completing and documenting the values from your libraries, and using binary annotations to jump to value definitions, it now comes with a tiny ocp-grep tool that can syntactically locate instances of a given identifier around your project, handling open, local opens, module aliases, etc. In emacs, C-c / will get the fully qualified version of the ident under cursor and find all its uses throughout your project. Simple, efficient and very handy for refactorings. The ocp-index query interface has also been made more expressive. Some documentation is online and will be available shortly in upcoming release 1.1.

`ocp-cmicomp`: Compression of Interface Files for Try-OCaml

While developing Try-OCaml, we noticed a problem with big compiled interface files (.cmi). In Try-OCaml, such files are embedded inside the JavaScript file by js_of_ocaml, resulting in huge code files to download on connection (about 12 MB when linking Dom_html from js_of_ocaml, and about 40 MB when linking Core_kernel), and the browser freezing for a few seconds when opening the corresponding modules.

To reduce this problem, we developed a tool, called ocp-cmicomp, to compress compiled interface files. A compiled interface file is just a huge OCaml data structure, marshalled using output_value. This data structure is often created by copying values from other interface files (types, names of values, etc.) during the compilation process. As this is done transitively, the data structure has a lot of redundancy, but has lost most of the sharing. ocp-cmicomp tries to recover this sharing: it iterates on the data structure, hash-consing the immutable parts of it, to create a new data structure with almost optimal sharing.

To increase the opportunities for sharing, ocp-cmicomp also uses some heuristics: for example, it computes the most frequent methods in objects, and sorts the list of methods of each object type in increasing order of frequency. As a consequence, different object types are more likely to share the same tail. Finally, we also had to be very careful: the type-checker internally uses a lot of physical comparisons between types (especially with polymorphic variables and row variables), so we still had to prevent sharing of some immutable parts to avoid typing problems.
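
The hash-consing step can be illustrated on a toy type. This is a minimal sketch, assuming a small immutable tree instead of the real marshalled interface data; the type ty and the function share are hypothetical, and the frequency heuristics and the exceptions needed by the type-checker are left out.

```ocaml
(* A toy stand-in for the immutable parts of an interface file. *)
type ty =
  | Var of string
  | Arrow of ty * ty
  | Tuple of ty list

(* Rebuild [t] bottom-up so that structurally equal subterms become
   physically equal: each rebuilt node is looked up in a table and
   replaced by the first equal node seen so far. *)
let share (t : ty) : ty =
  let table : (ty, ty) Hashtbl.t = Hashtbl.create 64 in
  let rec go t =
    let rebuilt =
      match t with
      | Var _ -> t
      | Arrow (a, b) -> Arrow (go a, go b)
      | Tuple l -> Tuple (List.map go l)
    in
    match Hashtbl.find_opt table rebuilt with
    | Some shared -> shared
    | None ->
      Hashtbl.add table rebuilt rebuilt;
      rebuilt
  in
  go t
```

After share, the two sides of Arrow (Tuple [Var "a"; Var "b"], Tuple [Var "a"; Var "b"]) point to the same block, so marshalling the result with sharing enabled writes that subterm only once.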

The result is quite interesting. For example, dom_html.cmi was reduced from 2.3 MB to 0.7 MB (-71%, with a lot of object types), and the corresponding JavaScript file for Try-OCaml decreased from 12 MB to 5 MB. core_kernel.cmi was reduced from 13.5 MB to 10 MB (-26%, no object types), while the corresponding JavaScript decreased from 40 MB to 30 MB !

OCamlRes: Bundling Auxiliary Files at Compile Time

A common problem when writing portable software is to locate the resources of the program, and its predefined configuration files. The program has to know the system on which it is running, which can be done like in old times by patching the source, generating a set of globals or at run-time. Either way, paths may then vary depending on the system. For instance, paths are often completely static on Unix while they are partially forged on bundled MacOS apps or on Windows. Then, there is always the task of bundling the binary with its auxiliary files which depends on the OS.

For big apps with lots of system interaction, it is something you have to undertake. However, for small apps, it is an unjustified burden. The alternative proposed by OCamlRes is to bundle these auxiliary files at compile time as an OCaml module source. Then, one can just compile the same (partially pre-generated) code for all platforms and distribute all-inclusive, naked binary files. This also has the side advantage of turning run-time exceptions for nonexistent or invalid files into compile-time errors. OCamlRes is made of two parts:

an ocplib-ocamlres library to manipulate resources at run time, to scan input files to build resource trees, and to dump resources in various formats
a command line tool, ocp-ocamlres, that reads the resources and bundles them into OCaml source files.

OCamlRes has several output formats, some more subtle than the default mechanism (which is to transform a directory structure on the disk into an OCaml static tree where each file is replaced by its content), and can (and will) be extended. An example is detailed in the documentation file.
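
To give an idea of the default mechanism, here is a sketch of what a bundled directory could look like once turned into an OCaml value. The node type, the root value, the file names and the find function are all hypothetical and chosen for illustration; the module actually generated by ocp-ocamlres may be shaped differently.

```ocaml
(* A static resource tree: each file carries its content as a string. *)
type node =
  | File of string * string        (* name, content *)
  | Dir of string * node list      (* name, children *)

(* What a generated module might contain for a directory holding
   conf/default.cfg and motd.txt (hypothetical example files). *)
let root =
  [ Dir ("conf", [ File ("default.cfg", "verbose = false\n") ]);
    File ("motd.txt", "Hello from a bundled resource!\n") ]

(* Look up a file by its path, given as a list of components. *)
let rec find path nodes =
  match path with
  | [] -> None
  | name :: rest ->
    List.find_map
      (function
        | File (n, data) when n = name && rest = [] -> Some data
        | Dir (n, children) when n = name -> find rest children
        | _ -> None)
      nodes
```

Since root is an ordinary OCaml value compiled into the binary, find [ "conf"; "default.cfg" ] root needs no file system access at run time.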

Compiler optimisations

The last post mentioned improvements on the prototype compiler optimization allowing recursive functions specialization. Some quite complicated corner cases needed a rethink of some parts of the architecture. The first version of the patch was meant to be as simple as possible. To this end we chose to avoid as much as possible the need to maintain non-trivially checkable invariants on the intermediate language. That decision led us to add some constraints on what we allowed ourselves to do. One such constraint that matters here is that we wanted every piece of crucial information (one that breaks things if it is lost) to be propagated following the scope. For instance, that means that in a case like:

let x = let y = 1 in (y,y) in x

the information that y is an integer can escape its scope but if the information is lost, at worst the generated code is not as good as possible, but is still correct. But sometimes, some information about functions really matters:

let f x =
  let g y = x + y in
  g

let h a =
  let g = f a in
  g a

Let's suppose in this example that f cannot be inlined, but g can. Then, h becomes (with g.x being access to x in the closure of g):

let h a =
  let g = f a in
  a + g.x

Let's suppose that some other transformation elsewhere allowed f to be inlined now, then h becomes:

let h a =
  let x = a in
  let g y = x + y in (* and the code can be eliminated from the closure *)
  a + g.x

Here the closure of g changes: the code is eliminated so only the x field is kept in the block, hence changing its offset. This information about the closure (what is effectively available in the closure) must be propagated to the use point (g.x) to be able to generate the offset access in the block. If this information is lost, there is no way to compile that part. The way to avoid that problem was to limit a bit the kind of cases where inlining is possible so that this kind of information could always be propagated following the scope. But in fact a few cases did not verify that property when dealing with inlining parameters from different compilation units.

So we undertook to rewrite some parts to be able to ensure that this kind of information is indeed correctly propagated, and to add assertions everywhere to avoid forgetting a case. The main problem was to track all the strange corner cases that would almost never happen, or wouldn't matter if they were not optimally compiled, but must not lose any information to satisfy the assertions.

Alt-Ergo: More Confidence and More Features

Formalizing and Proving a Critical Part of the Core

Last month, we considered the formalization and the proof of a critical component of Alt-Ergo's core. This component handles equalities solving to produce equivalent substitutions. However, since Alt-Ergo handles several theories (linear integer and rational arithmetic, enumerated datatypes, records, ...), providing a global routine that combines solvers of these individual theories is needed to be able to solve mixed equalities.

The example below shows one of the difficulties we faced when designing our combination algorithm: the solution of the equality r = {a = r.a + r.b; b = 1; c = C} cannot just be of the form r |-> {a = r.a + r.b; b = 1; c = C} as the pivot r appears in the right-hand side of the solution. To avoid this kind of subtle occur-checks, we have to solve an auxiliary and simpler conjunction of three equalities in our combination framework: r = {a = k1 + k2; b = 1; c = C}, r.a = k1 and r.b = k2 (where k1 and k2 are fresh variables). We will then deduce that k2 |-> 1 and that k1 + k2 = k1, which has no solution.

type enum = A | B | C
type t = { a : int ; b : int ; c : enum }
logic r : t

goal g: r = {a = r.a + r.b; b = 1; c = C} -> false
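The occur-check mentioned above can be sketched in a few lines of OCaml. This is only an illustration under assumptions: the term type and the occurs function are hypothetical and are not Alt-Ergo's internal data structures. The sketch checks that the pivot r appears in the original right-hand side, but no longer appears once r.a and r.b are abstracted by the fresh variables k1 and k2.

```ocaml
(* Hypothetical term representation, for illustration only. *)
type term =
  | Var of string
  | Int of int
  | Const of string                 (* enum constructor, e.g. C *)
  | Add of term * term
  | Field of term * string          (* field access, e.g. r.a *)
  | Record of (string * term) list

(* [occurs x t] is true when variable [x] appears somewhere in [t]. *)
let rec occurs x = function
  | Var y -> String.equal x y
  | Int _ | Const _ -> false
  | Add (t1, t2) -> occurs x t1 || occurs x t2
  | Field (t, _) -> occurs x t
  | Record fields -> List.exists (fun (_, t) -> occurs x t) fields

(* {a = r.a + r.b; b = 1; c = C} *)
let rhs =
  Record
    [ ("a", Add (Field (Var "r", "a"), Field (Var "r", "b")));
      ("b", Int 1);
      ("c", Const "C") ]

(* {a = k1 + k2; b = 1; c = C}, with k1 and k2 fresh *)
let rhs_abstracted =
  Record [ ("a", Add (Var "k1", Var "k2")); ("b", Int 1); ("c", Const "C") ]

let () =
  Printf.printf "r occurs in rhs: %b, in abstracted rhs: %b\n"
    (occurs "r" rhs) (occurs "r" rhs_abstracted)
```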

After having implemented a new combination algorithm in Alt-Ergo a few months ago, we considered its formalization and its proof, as we have done with most of the critical parts of Alt-Ergo. It was really surprising to see how the type information associated with Alt-Ergo terms helped us to prove the termination of the combination algorithm, a crucial property that was hard to prove in our previous combination algorithms, and a challenging research problem as well.

Models Generation

On the development side, we conducted some preliminary experiments in order to extend Alt-Ergo with a model generation feature. This capability is useful to derive concrete test-cases that may either exhibit erroneous behaviors of the program being verified or bugs in its formal specification.

In a first step, we concentrated on model generation for the combination of the theories of uninterpreted functions and linear integer arithmetic. The following example illustrates this problem:

logic f : int -> int
logic x, y : int

goal g: 2*x >= 0 -> y >= 0 -> f(x) <> f(y) -> false

We have a satisfiable (but non-valid) formula, where x and y are in the integer intervals [0,+infinity[ and f(x) <> f(y). We would like to find concrete values for x, y and f that satisfy the formula. A first attempt to answer this question may be the following:

From an arithmetic point of view, x = 0 and y = 0 are possible values for x and y. So, linear arithmetic suggests this partial model to the other theories.
The theory of uninterpreted functions cannot agree with this solution. In fact, x = y = 0 would imply f(x) = f(y), which contradicts f(x) <> f(y). More generally, x should be different from y.
Now, if linear arithmetic suggests x = 0 and y = 1, the theory of uninterpreted functions will agree. The next step is to find integer values for f(0) and f(1) such that f(0) <> f(1).

After having implemented a brute force technique that tries to construct such models, our main concern now is to find an elegant and more efficient "divide and conquer" strategy that allows each theory to compute its own partial model with guarantees that this model will be coherent with the partial models of the other theories. It would be then immediate to automatically merge these partial solutions into a global one.
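As an illustration of the brute force idea on the example above, here is a small OCaml sketch. It is not Alt-Ergo's implementation: the find_model function and its bound parameter are assumptions. It enumerates small values for x, y, f(x) and f(y), and keeps the first candidate that satisfies the hypotheses of the goal while remaining a well-defined function.

```ocaml
(* Search for x, y, f(x), f(y) in [0, bound] such that
   2*x >= 0, y >= 0 and f(x) <> f(y).
   The interpretation of f is only given on the two points x and y. *)
let find_model bound =
  let range = List.init (bound + 1) Fun.id in
  let candidates =
    List.concat_map
      (fun x ->
        List.concat_map
          (fun y ->
            List.concat_map
              (fun fx -> List.map (fun fy -> (x, y, fx, fy)) range)
              range)
          range)
      range
  in
  List.find_opt
    (fun (x, y, fx, fy) ->
      2 * x >= 0 && y >= 0
      (* f must be a function: equal arguments give equal results *)
      && (x <> y || fx = fy)
      && fx <> fy)
    candidates

let () =
  match find_model 2 with
  | Some (x, y, fx, fy) ->
    Printf.printf "x = %d, y = %d, f(%d) = %d, f(%d) = %d\n" x y x fx y fy
  | None -> print_endline "no model within the bound"
```

The search rejects every candidate with x = y, as in the discussion above, and settles on x = 0 and y = 1. Its cost grows quickly with the bound, which is one reason to look for a strategy where each theory computes its own partial model.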

OPAM 1.1.1 released

2014-01-29T09:05:17Z

We are proud to announce that OPAM 1.1.1 has just been released.

This minor release features mostly stability and UI/doc improvements over OPAM 1.1.0, but also focuses on improving the API and tools to be a better base for the platform (functions for opam-doc, interface with tools like opamfu and opam-installer). Lots of bigger changes are in the works, and will be merged progressively after this release.

Installing

Installation instructions are available on the wiki.

Note that some packages may take a few days until they get out of the pipeline. If you're eager to get 1.1.1, either use our binary installer or compile from source.

The 'official' package repository is now hosted at opam.ocaml.org, synchronised with the Git repository at http://github.com/ocaml/opam-repository, where you can contribute new package descriptions. Those are under a CC0 license, a.k.a. public domain, to ensure they will always belong to the community.

Thanks to all of you who have helped build this repository and made OPAM such a success.

Changes

From the changelog:

Fix opam-admin make <packages> -r (#990)
Explicitly prettyprint list of lists, to fix opam-admin depexts (#997)
Tell the user which fields is invalid in a configuration file (#1016)
Add OpamSolver.empty_universe for flexible universe instantiation (#1033)
Add OpamFormula.eval_relop and OpamFormula.check_relop (#1042)
Change OpamCompiler.compare to match Pervasives.compare (#1042)
Add OpamCompiler.eval_relop (#1042)
Add OpamPackage.Name.compare (#1046)
Add types version_constraint and version_formula to OpamFormula (#1046)
Clearer command aliases. Made info an alias for show and added the alias uninstall (#944)
Fixed opam init --root=<relative path> (#1047)
Display OS constraints in opam info (#1052)
Add a new 'opam-installer' script to make .install files usable outside of opam (#1026)
Add a --resolve option to opam-admin make that builds just the archives you need for a specific installation (#1031)
Fixed handling of spaces in filenames in internal files (#1014)
Replace calls to which by a more portable call (#1061)
Fixed generation of the init scripts in some cases (#1011)
Better reports on package patch errors (#987, #988)
More accurate warnings for unknown package dependencies (#1079)
Added opam config report to help with bug reports (#1034)
Do not reinstall dev packages with opam upgrade <pkg> (#1001)
Be more careful with opam init to a non-empty root directory (#974)
Cleanup build-dir after successful compiler installation to save on space (#1006)
Improved OSX compatibility in the external solver tools (#1074)
Fixed messages printed on update that were plain wrong (#1030)
Improved detection of meaningful changes from upstream packages to trigger recompilation

OCamlPro Highlights: November 2013

2013-12-03T09:05:17Z

New Team Members

We are pleased to welcome three new members in our OCamlPro team since the beginning of November:

Benjamin Canou started working at OCamlPro on the Richelieu project, an effort to bring better safety and performance to the Scilab language. He is in charge of a type inference algorithm that will serve both as a developer tool and in coordination with a JIT. He spent his first month understanding the darkest corners of the language, and then writing a versatile AST with a parser to build it. Actually, this is not an easy task, because the language gives different statuses to characters (including spaces) depending on the context, leading to non-trivial lexing. But the real source of problems is the fact that the original lexparser is intermingled with the interpreter inside a big bunch of venerable FORTRAN code. This old fellow makes parsing choices depending on the dynamic typing context and allows its users to catch syntax errors at runtime, among other fun things. The new OCaml lexer and parser is handwritten in around a thousand lines, has performance comparable to a Lex and Yacc generated one, and is resilient to errors, so it could be integrated into an IDE to detect errors on the fly without stopping at the first one. Once again, it’s OCaml to the rescue of the weak and elderly! Here is an example of the kind of code that can be written in Scilab:

if return = while then [ 12..
34.. ... .. ...
56 } ; else '"'"
end

which is parsed into:

-- parsed in 0.000189 --
(script (if (== !return !while) (matrix (row 123456)) "'"))
-- messages
1.10:1.11: use of deprecated operator '='
-- end

Gregoire Henry started working at OCamlPro on the Bware project. He is tackling the optimization of memory performance of automatic provers written in OCaml, in collaboration with Cagdas Bozman. One of his first contributions after joining us was to exhume his internship work of 2004, an implementation of Graphics for Mac OS X that we are going to use for our online OCaml IDE!
Thomas Blanc started a PhD at OCamlPro after his summer internship with us. He is going to continue his work on whole-program analysis, especially as a way to detect uncaught exceptions. We hope his tool will be a good replacement for the ocamlexn tool written by Francois Pessaux.

Compiler Updates

On the compiler optimization front, Pierre Chambart got direct access to the OCaml SVN, so he will soon upload his work directly into an SVN branch, which will make it easier to review and integrate into the official compiler. A current set of optimizations is already scheduled for the new branch, and he is now working on inlining recursive functions, such as List.map, by inlining the function definition at the call site when at least one of its arguments is invariant during recursion.

A function that can benefit a lot from that transformation is:

let f l = List.fold_left (+) 0 l

camlTest__f_1013:
.L102:
movq %rax, %rdi
movq $1, %rbx
jmp camlTest__fold_left_1017@PLT

camlTest__fold_left_1017:
.L101:
cmpq $1, %rdi
je .L100
movq 8(%rdi), %rsi
movq (%rdi), %rdi
leaq -1(%rbx, %rdi), %rbx
movq %rsi, %rdi
jmp .L101
.align 4
.L100:
movq %rbx, %rax
ret
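Read back into OCaml, the assembly above behaves roughly like a copy of fold_left in which the invariant function argument has been replaced by (+). The following hand-written equivalent is only a sketch of the effect of the transformation, not the compiler's output; the names f_specialised and fold_left_plus are made up for the example.

```ocaml
(* Original definition: the closure (+) is passed to the generic fold_left. *)
let f l = List.fold_left ( + ) 0 l

(* Hand-specialised version: the function argument never changes during
   the recursion, so the loop can use the addition directly. *)
let f_specialised l =
  let rec fold_left_plus acc = function
    | [] -> acc
    | x :: rest -> fold_left_plus (acc + x) rest
  in
  fold_left_plus 0 l

let () =
  let l = [ 1; 2; 3; 4 ] in
  Printf.printf "%d %d\n" (f l) (f_specialised l)
```

The loop no longer performs an indirect call per element, which matches the single leaq instruction in the loop body of the assembly.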

Development Tools

Release of OPAM 1.1

After lots of testing and fixing, the official version 1.1.0 of OPAM has finally been released. It features lots of stability improvements, and a reorganized and cleaner repo now hosted at https://opam.ocaml.org. Work continues on OPAM: we’ll release opam-installer soon, a small script that enables using and testing .install files. This is a step toward a better integration of OPAM with existing build tools, and we are experimenting with ways to ease usage for Coq packages, to generate binary packages, and to enhance portability.

Binary Packages for OPAM

We also started to experiment with binary packages. We developed a very small tool, ocp-bin, that monitors the compilation of every OPAM package, takes a snapshot of OPAM files before and after the compilation and installation, and generates a binary archive for the package. The next time the package is re-installed, with the same dependencies, the archive is used instead of compiling the package again.
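The snapshot idea can be sketched as a set difference between two file listings. This is a simplification under assumptions, not ocp-bin's code: the new_files function is hypothetical, the listings are given as plain lists of paths, and files that are modified rather than created are ignored.

```ocaml
module Files = Set.Make (String)

(* Paths present in the [after] snapshot but not in the [before] one,
   in sorted order: candidates for the binary archive. *)
let new_files ~before ~after =
  Files.elements (Files.diff (Files.of_list after) (Files.of_list before))

let () =
  (* Hypothetical snapshots of an OPAM prefix around an installation. *)
  let before = [ "bin/ocaml"; "lib/ocaml/stdlib.cma" ] in
  let after =
    [ "bin/ocaml"; "lib/ocaml/stdlib.cma"; "lib/foo/foo.cma"; "bin/foo" ]
  in
  List.iter print_endline (new_files ~before ~after)
```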

For a typical package, the standard OPAM file:

build: [
  [ "./configure" "--prefix" "%{prefix}%" ]
  [ make ]
  [ make "install" ]
]
remove: [
  [ make "uninstall" ]
]

has to be modified into:

build: [
  [ "ocp-bin" "begin" "%{package}%" "%{version}%" "%{compiler}%" "%{prefix}%"
    "-opam" "-depends" "%{depends}%" "-hash" "%{hash}%"
    "-nodeps" "ocamlfind." ]
  [ "ocp-bin" "--" "./configure" "--prefix" "%{prefix}%" ]
  [ "ocp-bin" "--" make ]
  [ "ocp-bin" "--" make "install" ]
  [ "ocp-bin" "end" ]
]
remove: [
  [ "!" "ocp-bin" "uninstall"
    "%{package}%" "%{version}%" "%{compiler}%" "%{prefix}%" ]
]

Such a transformation would be automated in the future by adding a field ocp-bin: true. Note that, since ocp-bin takes care of the deinstallation of the package, it would ensure a complete and correct deinstallation of all packages.

We also implemented a client-server version of ocp-bin, to be able to share binary packages between users. The current limitation with this approach is that many binary packages are not relocatable: if packages are compiled by Bob to be installed in /home/bob/.opam/4.01.0, the same packages will only be usable on a different computer by a user with the same home path! Although it can still be useful for a user with several computers, we now plan to investigate how to build relocatable packages for OCaml.

Stable Release of ocp-index

Always looking for a way to provide better tools to the OCaml programmer, we are happy to announce the first stable release of ocp-index, which provides quick access to the installed interfaces and documentation as well as some source-browsing features (lookup ident definition, search for uses of an ident, etc).

Profiling Alt-Ergo with ocp-memprof: The Killer App

One of the most exciting events this month is the use of the ocp-memprof tool to profile an execution of Alt-Ergo on a big formula generated by the Cubicle model checker. The story is the following:

The formula was generated from a transition system modeling the FLASH coherence cache protocol, plus additional information computed by Cubicle during the verification of FLASH’s safety. It contains a sub-formula made of nested conjunctions of 999 elements. Its proof requires reasoning in the combination of the free theory of equality, enumerated data types and quantifiers. Alt-Ergo was able to discharge it in only 10 seconds. However, Alain Mebsout, who is doing his PhD thesis on Cubicle, noticed that Alt-Ergo allocates more than 60 MB during its execution.

In order to localize the source of this abnormal memory consumption, we installed the OCaml Memory Profiler runtime, version 4.00.1+memprof (available in the private OPAM repository of OCamlPro) and compiled Alt-Ergo using the -bin-annot option in order to dump .cmt files. We then executed the prover on Alain’s example as shown below, without any instrumentation of Alt-Ergo’s code.

$ OCAMLRUNPARAM=m ./alt-ergo.opt formula.mlw

This execution caused the modified OCaml compiler to dump a snapshot of the typed heap at every major collection of the GC. The names of dumped files are of the form memprof.<PID>.<DUMP-NAME>.<image-number>.dump, where PID is a natural number that identifies the set of dumped files during a particular execution.

Dumped files were then fed to the ocp-memprof tool (available in the TypeRex-Pro toolbox) using the syntax below. The synthesis of this step (a .hp file) was then converted to a .ps file with the hp2ps command. At the end, we obtained the diagram shown in the figure below.

$ ./ocp-memprof -loc -sizes PID

From the figure above, one can extract the following information:

there were 15 major collections of OCaml’s GC during the above execution (the x-axis),
Alt-Ergo allocated more than 60 MB during its execution (the y-axis),
Some function in file src/preprocess/why_typing.ml is allocating a lot of data of type Parsed.pp_desc at line 868 (the first square of the legend).

The third point corresponds to a piece of code used in a recursive function that performs alpha renaming on parsed formulas to avoid variable captures. This code is the following:

let rec alpha_renaming_b s f =
  …

  | PPinfix(f1, op, f2) -> (* 'op' may be the AND operator *)
    let ff1 = alpha_renaming_b s f1 in
    let ff2 = alpha_renaming_b s f2 in
    PPinfix(ff1, op, ff2) (* line 868 *)

  …

Actually, in 99% of cases there are no capture problems and the function just reconstructs a new value PPinfix(ff1, op, ff2) that is structurally equal (=) to its argument f. In case of very big formulas (recall that Alain’s formula contains a nested conjunction of 999 elements), this causes Alt-Ergo to allocate a lot.

Fixing this behavior was straightforward. We only had to check whether recursive calls to alpha renaming function returned modified values using physical equality ==. If not, no renaming is performed and we safely return the formula given in the argument. This way, the function will never allocate for formulas without capture issues. For instance, the piece of code given above is fixed as follows:

let rec alpha_renaming_b s f =
  …

  | PPinfix(f1, op, f2) ->
    let ff1 = alpha_renaming_b s f1 in
    let ff2 = alpha_renaming_b s f2 in
    if ff1 == f1 && ff2 == f2 then f (* no renaming performed by recursive calls ? *)
    else PPinfix(ff1, op, ff2)

  …
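For readers who want to try the technique outside Alt-Ergo, here is a self-contained sketch on a toy formula type. The formula type and the rename function are assumptions made for the example, not Alt-Ergo's Parsed module. The point it demonstrates is the same: when the recursive calls return their arguments unchanged (checked with physical equality ==), the original node is returned and nothing is allocated.

```ocaml
(* Toy formula type, for illustration only. *)
type formula =
  | Var of string
  | Infix of formula * string * formula

(* Rename variables according to [subst], sharing every unchanged subterm. *)
let rec rename subst f =
  match f with
  | Var x ->
    (match List.assoc_opt x subst with
     | Some y -> Var y
     | None -> f)
  | Infix (f1, op, f2) ->
    let ff1 = rename subst f1 in
    let ff2 = rename subst f2 in
    (* Physical equality: both subterms are the very same values. *)
    if ff1 == f1 && ff2 == f2 then f else Infix (ff1, op, ff2)

let () =
  let f = Infix (Var "a", "and", Infix (Var "b", "and", Var "c")) in
  (* No renaming needed: the result is the argument itself. *)
  Printf.printf "unchanged formula shared: %b\n" (rename [] f == f);
  (* Renaming b: only the path down to b is rebuilt. *)
  match f, rename [ ("b", "b1") ] f with
  | Infix (left, _, _), Infix (left', _, _) ->
    Printf.printf "untouched left branch shared: %b\n" (left == left')
  | _ -> ()
```

Note that == is used here only as a cheap check that a value was returned as is; structural equality (=) would traverse both formulas and defeat the purpose.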

Once we applied the patch to the whole function alpha_renaming_b, Alt-Ergo only needed 2 seconds and less than 2.2 MB of memory to prove our formula. Profiling an execution of the patched version of the prover with OCaml 4.00.1+memprof and ocp-memprof produced the diagram below. The difference with the first drawing was really impressive.

Other R&D Projects

Scilint, the Scilab Style-Checker

This month our work on Richelieu was mainly focused on improving Scilint. After some discussions with knowledgeable Scilab users, we chose a new set of warnings to implement. Among other things, those warnings analyze primitive functions and their arguments as well as loop variables. Another important thing was to allow SciNotes, Scilab’s editor, to display our warnings. This has been done by implementing support for Firehose. Finally, some minor bugs were also fixed.

OPAM 1.1.0 released

2013-11-08T09:05:17Z

After a while staged as RC, we are proud to announce the final release of OPAM 1.1.0!

Thanks again to those who have helped testing and fixing the last few issues.

Important note

The repository format has been improved with incompatible new features; to account for this, the new repository is now hosted at opam.ocaml.org, and the legacy repository at opam.ocamlpro.com is kept to support OPAM 1.0 installations, but is unlikely to benefit from many package updates. Migration to opam.ocaml.org will be done automatically as soon as you upgrade your OPAM version.

You're still free, of course, to use any third-party repositories instead or in addition.

Installing

NOTE: When switching from 1.0, the internal state will need to be upgraded. THIS PROCESS CANNOT BE REVERTED. We have tried hard to make it fault-resistant, but failures might happen. In case you have precious data in your ~/.opam folder, it is advised to back up that folder before you upgrade to 1.1.0.

Using the binary installer:

download and run https://github.com/ocaml/opam/blob/master/shell/opam_installer.sh

Using the .deb packages from Anil's PPA (binaries are currently syncing):

add-apt-repository ppa:avsm/ppa
apt-get update
sudo apt-get install opam

For OSX users, the homebrew package will be updated shortly.

or build it from sources at:

https://github.com/ocaml/opam/releases/tag/1.1.0

For those who haven't been paying attention

OPAM is a source-based package manager for OCaml. It supports multiple simultaneous compiler installations, flexible package constraints, and a Git-friendly development workflow. OPAM is edited and maintained by OCamlPro, with continuous support from OCamlLabs and the community at large (including its main industrial users such as Jane-Street and Citrix).

Thanks to all of you who have helped build this repository and made OPAM such a success.

Changes

Too many to list here, see https://raw.github.com/OCamlPro/opam/1.1.0/CHANGES

For packagers, some new fields have appeared in the OPAM description format:

depexts provides facilities for dealing with system (non ocaml) dependencies
messages, post-messages can be used to notify the user, e.g. of licensing information, or help her troubleshoot at package installation.
available supersedes ocaml-version and os constraints, and can contain more expressive formulas

Also, we have integrated the main package repository with Travis, which will help us to improve the quality of contributions (see Anil's post).

OCamlPro Highlights, Sept-Oct 2013

2013-11-01T09:05:17Z

Here is a short report of our activities in September-October 2013.

OCamlPro at OCaml’2013 in Boston

We were very happy to participate in OCaml’2013, in Boston. The event was a great success, with a lot of interesting talks and many participants. It was a nice opportunity for us to present some of our recent work:

Fabrice presented his work on the design of the wxOCaml library. Although the wxOCaml library itself is an interesting project, the goal of his talk was to show that binding thousands of functions from a C++ library can be automated very easily in OCaml, and make the bindings easy to maintain and to improve.
The work of Thomas and Louis on OPAM was presented in a talk by Anil on the OCaml Platform v0.1. The OCaml Platform is a set of tools, including OPAM, to provide an ever increasing set of packages for OCaml developers, including high-quality documentation and broad portability. Some statistics showed how OPAM, in less than a year, grew from 200 packages to more than 1400 packages, and from 2-3 contributors to about 130 contributors in September. Another talk, Ocamlot: OCaml Online Testing presented how sets of packages will now be automatically tested, to give immediate feedback to contributors, and an evaluation of packages quality to users.
Pierre presented his work on Improving OCaml high level optimisations that he also presented in a recent blog post.
Grégoire presented his work with Jacques Garrigue on Runtime types in OCaml. In particular, he showed how abstraction is hard to deal with, as there is a dilemma between the ability to write powerful polytypic functions and the preservation of the abstraction wanted by the developer for code modularity.
Finally, Çagdas presented his work on Profiling the Memory Usage of OCaml Applications without Changing their Behavior. This new profiler will be able to provide precise memory information on production OCaml software, by snapshotting the memory and recovering type information. It is currently being tested on several projects, such as the Why3 verification tool.

Of course, the day was full of interesting talks, and we can only advise you to look at all of them in the complete program that is now online.

The CUFP’2013 program was also very dense. For OCaml users, Dave Thomas, in the first keynote, reminded us how important it is to build two-way bridges between OCaml and other languages: we have the bad habit of only building one-way bridges to use other languages from OCaml, and we forget that new users will have to start by using small OCaml components from their existing software written in another language. Then, Julien Verlaguet presented the use of OCaml at Facebook to type-check and compile a typed version of PHP, HipHop, that is now used for a large part of the code at Facebook.

Software Projects

The period of September-October was also very busy trying to find some funding for our projects. Fortunately, we still managed to make a lot of progress in the development of these projects:

OPAM

Lots has been going on regarding OPAM, as the 1.1 release is being pushed forward, with a beta and a RC available already. This release focuses on stability improvements and bug-fixes, but is nonetheless a large step from 1.0, with an enhanced update mechanism, extended metadata, an enhanced ‘pin’ workflow for developers, and much more.

We are delighted by the success of OPAM, which was mentioned again and again at the OCaml’2013 workshop, where we received a heartwarming amount of positive feedback. To be sure that it belongs to the community, after licensing all metadata of the repository under CC0 (as close to public domain as legally possible), we have worked hand in hand with OCamlLabs to migrate it to opam.ocaml.org. External repositories for Windows, Android and so on are appearing, which is a really good thing, too.

The Alt-Ergo SMT Solver

In September, we officially announced the distribution and the support of Alt-Ergo by OCamlPro and launched its new website. This site lets you download public releases and discover the available support offerings. We have also published a new public release (version 0.95.2) of the prover. The main changes in this minor release are: source code reorganization, simplification of quantifier instantiation heuristics, GUI improvement to reduce latency when opening large files, as well as various bug fixes.

During September, we also re-implemented and simplified other parts of Alt-Ergo. In addition, we started the integration of a new SAT-solver based on miniSAT (implemented as a plug-in) and the development of a new tool, called Ctrl-Alt-Ergo, that automates the most interesting strategies of Alt-Ergo. The experiments we made during October are very encouraging as shown by our previous blog post.

Multi-runtime

Luca Saiu completed his work at Inria and on the multi-runtime branch, fixing the last bugs and leaving the code in a shape not too far removed from permitting its eventual integration into the OCaml mainline.

Now, the code has a clean configuration-time facility for disabling the multi-runtime system, and compatibility is restored with architectures not including the required assembly support, which can at least compile and work using a single runtime. A crucial optimization makes it possible to work in this mode with extremely little overhead with respect to stock OCaml. Testing on an old PowerPC 32-bit machine revealed a few minor portability problems related to word size and endianness.

Compiler optimisations

We have been working on allowing cross-module inlining. We wanted to be able to show a version generating strictly better code than the current compiler. This milestone being reached, we are now preparing a patch series for upstreaming the base parts. We are also working on polishing the remaining problems: the passes were written as simply as possible, so compilation time is still a bit high, and there are a few difficulties remaining with cross-module inlining and packs.

The INRIA-OCamlPro Lab Team

The team is also evolving, and some of us are now leaving the team to join other projects:

After two years with us, Thomas Gazagnaire has left OCamlPro in October to work most of his time on Mirage in Cambridge (UK). Thomas was OCamlPro’s first employee, and OCamlPro probably wouldn’t exist without him. Thomas has also been the main architect of OPAM, and was involved in the design of many of our projects. Louis Gesbert will continue his work on developing and maintaining OPAM.
After one year with us, Luca Saiu has left Inria in October. Luca has done tremendous work on the implementation of a multicore OCaml, where every runtime runs in a different memory space with its own garbage collector. We hope to be able to upstream his work soon to the official OCaml distribution.
After an internship with us, Pierrick Couderc, Souhire Kenawi and David Maison have been back at their master’s studies since September. Souhire worked on testing the development of iOS applications on Linux with OCaml, a very challenging task! Pierrick and David developed an online editor for OCaml that we are going to release very soon.

This blog post was about departures, but stay connected: next month, we are going to announce some newcomers who decided to join the team for the winter!

OPAM 1.1.0 release candidate out

2013-10-14T09:05:17Z

OPAM 1.1.0 is ready, and we are shipping a release candidate for packagers and all interested to try it out.

This version features several bug-fixes over the September beta release, and quite a few stability and usability improvements. Thanks to all beta-testers who have taken the time to file reports, and helped a lot in tackling the remaining issues.

Repository change to opam.ocaml.org

This release is synchronized with the migration of the main repository from ocamlpro.com to ocaml.org. A redirection has been put in place, so that all up-to-date installations of OPAM should be redirected seamlessly. OPAM 1.0 instances will stay on the old repository, so that they won't be broken by incompatible package updates.

We are very happy to see the impressive amount of contributions to the OPAM repository, and this change, together with the licensing of all metadata under CC0 (almost public domain), guarantees that these efforts belong to the community.

If you are upgrading from 1.0

The internal state will need to be upgraded at the first run of OPAM 1.1.0. THIS PROCESS CANNOT BE REVERTED. We have tried hard to make it fault-resistant, but failures might happen. In case you have precious data in your ~/.opam folder, it is advised to back up that folder before you upgrade to 1.1.0.

Installing

Using the binary installer:

download and run https://github.com/ocaml/opam/blob/master/shell/opam_installer.sh

You can also get the new version either from Anil's unstable PPA:

add-apt-repository ppa:avsm/ppa-testing
apt-get update
sudo apt-get install opam

or build it from sources at:

https://github.com/OCamlPro/opam/releases/tag/1.1.0-RC

Changes

Too many to list here, see https://raw.github.com/OCamlPro/opam/1.1.0-RC/CHANGES

For packagers, some new fields have appeared in the OPAM description format:

depexts provides facilities for dealing with system (non ocaml) dependencies
messages, post-messages can be used to notify the user or help her troubleshoot at package installation.
available supersedes ocaml-version and os constraints, and can contain more expressive formulas

Alt-Ergo @ OCamlPro: Two months later

2013-10-02T09:05:17Z

As announced in a previous post, I joined OCamlPro at the beginning of September and I started working on Alt-Ergo. Here is a report presenting the tool and the work we have done during the last two months.

Alt-Ergo at a Glance

Alt-Ergo is an open source automatic theorem prover based on SMT technology. It is developed at Laboratoire de Recherche en Informatique, Inria Saclay Ile-de-France and CNRS since 2006. It is capable of reasoning in a combination of several built-in theories such as uninterpreted equality, integer and rational arithmetic, arrays, records, enumerated data types and AC symbols. It also handles quantified formulas and has a polymorphic first-order native input language. Alt-Ergo is written in OCaml. Its core has been formally proved in the Coq proof assistant.

Alt-Ergo Spider Web

Alt-Ergo is mainly used to prove the validity of mathematical formulas generated by program verification platforms. It was originally designed and tuned to prove formulas generated by the Why tool. Now, it is used by different tools and in various contexts, in particular via the Why3 platform. As shown by the diagram below, Alt-Ergo is used to prove formulas:

generated from Ada code by SPARK 2005 and SPARK 2014,
generated from C programs by Frama-C and CAVEAT,
produced from WhyML programs by Why3,
translated from proof obligations generated by Atelier-B.

Moreover, Alt-Ergo is used in the context of cryptographic protocols verification by EasyCrypt and in SMT-based model checking by Cubicle.

Some "Hello World" Examples

Below are some basic formulas written in the why input syntax. Each example is proved valid by Alt-Ergo. The first formula is very simple and is proved with straightforward arithmetic reasoning. Goal g2 requires reasoning in the combination of functional arrays and linear arithmetic, and so on. The last example contains a quantified sub-formula with a polymorphic variable x. Generating four ground instances of this axiom, where x is replaced by 1, true, 1.4 and a respectively, is necessary to prove goal g5.

** Simple arithmetic operation **

goal g1 : 1 + 2 = 3

** Theories of functional arrays and linear integer arithmetic **

logic a : (int, int) farray
goal g2 : forall i:int. i = 6 -> a[i<-4][5] = a[i-1]

** Theories of records and linear integer arithmetic **

type my_record = { a : int ; b : int }
goal g3 : forall v,w : my_record. 2 * v.a = 10 -> { v with b = 5} = w -> w.a = 5

** Theories of enumerated data types and uninterpreted equality **

type my_sum = A | B | C
logic P : 'a -> prop
goal g4 : forall x : my_sum. P(C) -> x<>A and x<>B -> P(x)

** Formula with quantifiers and polymorphism **

axiom a: forall x : 'a. P(x)
goal g5 : P(1) and P(true) and P(1.4) and P(a)

** Output of Alt-Ergo on these examples **

$ alt-ergo examples.why
File "examples.why", line 2, characters 1-21:Valid (0.0120) (0)
File "examples.why", line 6, characters 1-53:Valid (0.0000) (1)
File "examples.why", line 10, characters 1-81:Valid (0.0000) (3)
File "examples.why", line 15, characters 1-59:Valid (0.0000) (6)
File "examples.why", line 19, characters 1-47:Valid (0.0000) (10)

Alt-Ergo @ OCamlPro

On September 20, we officially announced the distribution and the support of Alt-Ergo by OCamlPro and launched its new website. This site lets you download public releases of the prover and discover the available support offerings. It'll be enriched with additional content progressively. Alt-Ergo's former web page, hosted by LRI, is now devoted to theoretical foundations and academic aspects of the solver.

We have also published a new public release (version 0.95.2) of Alt-Ergo. The main changes in this minor release are: source code reorganization into sub-directories, simplification of quantifiers instantiation heuristics, GUI improvement to reduce latency when opening large files, as well as various bug fixes.

In addition to the re-implementation and the simplification of some parts of the prover (e.g. internal literals representation, theories combination architecture, ...), the main novelties of the current master branch of Alt-Ergo are the following:

The user can now specify an external (plug-in) SAT-solver instead of the default DFS-based engine. We experimentally provide a CDCL solver based on miniSAT that can be plugged in to perform satisfiability reasoning. This solver is more efficient when formulas contain a rich propositional structure.
We started the development of a new tool, called Ctrl-Alt-Ergo, in which we put our expertise to work by implementing the most interesting strategies of Alt-Ergo. The experiments we made with our internal benchmarks are very promising, as shown below.

Experimental Evaluation

We compared the performance of the latest public releases of Alt-Ergo with the current master branch of both Alt-Ergo and Ctrl-Alt-Ergo (commit ce0bba61a1fd234b85715ea2c96078121c913602) on our internal test suite, composed of 16209 formulas. The timeout was set to 60 seconds and memory was limited to 2GB per formula. Benchmark descriptions and the results of our evaluation are given below.

Why3 Benchmark

This benchmark contains 2470 formulas generated from Why3's gallery of WhyML programs. Some of these formulas are out of scope of current SMT solvers. For instance, the proof of some of them requires inductive reasoning.

SPARK Hi-lite Benchmark

This benchmark is composed of 3167 formulas generated from Ada programs used during the Hi-lite project. It is known that some formulas are not valid.

BWare Benchmark

This test-suite contains 10572 formulas translated from proof obligations generated by Atelier-B. These proof obligations come from industrial B projects and are proved valid.

	Alt-Ergo version 0.95.1	Alt-Ergo version 0.95.2	Alt-Ergo master branch*	Ctrl-Alt-Ergo master branch*
Release date	Mar. 05, 2013	Sep. 20, 2013	– – –	– – –
Why3 benchmark	2270 (91.90 %)	2288 (92.63 %)	2308 (93.44 %)	2363 (95.67 %)
SPARK benchmark	2351 (74.23 %)	2360 (74.52 %)	2373 (74.93 %)	2404 (75.91 %)
BWare benchmark	5609 (53.05 %)	9437 (89.26 %)	10072 (95.27 %)	10373 (98.12 %)

(*) commit `ce0bba61a1fd234b85715ea2c96078121c913602`
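The percentages in the table are simply the number of proved formulas divided by the size of the corresponding benchmark. As a minimal sketch (the function names `success_rate` and `show` are ours, not part of Alt-Ergo), a cell can be recomputed like this:

```ocaml
(* Share of proved formulas, as a percentage of the benchmark size. *)
let success_rate ~proved ~total =
  100.0 *. float_of_int proved /. float_of_int total

(* Render a table cell in the same "count (rate %)" shape as above. *)
let show ~proved ~total =
  Printf.sprintf "%d (%.2f %%)" proved (success_rate ~proved ~total)

(* The three benchmark sizes add up to the size of the whole test suite. *)
let suite_size = 2470 + 3167 + 10572

let () = print_endline (show ~proved:2270 ~total:2470)
```

For instance, `show ~proved:2270 ~total:2470` rebuilds the Why3 cell for version 0.95.1.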

OPAM 1.1.0 beta released

2013-09-20T09:05:17Z

We are very happy to announce the beta release of OPAM version 1.1.0!

OPAM is a source-based package manager for OCaml. It supports multiple simultaneous compiler installations, flexible package constraints, and a Git-friendly development workflow. OPAM is edited and maintained by OCamlPro, with continuous support from OCamlLabs and the community at large (including its main industrial users such as Jane Street and Citrix).

Since its first official release last March, we have fixed many bugs and added lots of new features and stability improvements. New features range from more metadata in the package and compiler descriptions, to an improved package pin workflow, to a much faster update algorithm. The full changeset is included below.

We are also delighted to see the growing number of contributions from the community to both OPAM itself (35 contributors) and to its metadata repository (100+ contributors, 500+ unique packages, 1500+ packages). It is really great to also see alternative metadata repositories appearing in the wild (see for instance the repositories for Android, Windows and so on). To be sure that the community efforts will continue to benefit everyone and to underline our commitment to OPAM, we are rehousing it at http://opam.ocaml.org and switching the license to CC0 (see issue #955, where 85 people are commenting on the thread).

The binary installer has been updated for OSX and x86_64:

https://github.com/ocaml/opam/blob/master/shell/opam_installer.sh

You can also get the new version either from Anil's unstable PPA:

add-apt-repository ppa:avsm/ppa-testing
apt-get update
sudo apt-get install opam

or build it from source at:

https://github.com/OCamlPro/opam/releases/tag/1.1.0-beta

NOTE: If you upgrade from OPAM 1.0, the first time you run the new opam binary it will upgrade its internal state in an incompatible way: THIS PROCESS CANNOT BE REVERTED. We have tried hard to make this process fault-resistant, but failures might happen. If you have precious data in your ~/.opam folder, you are advised to back up that folder before you upgrade to 1.1.

Changes

Automatic backup before any operation which might alter the list of installed packages
Support for arbitrary sub-directories for metadata repositories
Lots of colors
New option opam update -u equivalent to opam update && opam upgrade --yes
New opam-admin tool, bundling the features of opam-mk-repo and opam-repo-check + new 'opam-admin stats' tool
New available: field in opam files, superseding ocaml-version and os fields
Package names specified on the command-line are now understood case-insensitively (#705)
Fixed parsing of malformed opam files (#696)
Fixed recompilation of a package when uninstalling its optional dependencies (#692)
Added conditional post-messages support, to help users when a package fails to install for a known reason (#662)
Rewrite the code which updates pin and dev packages to be quicker and more reliable
Add {opam,url,desc,files/} overlay for all packages
opam config env now detects the current shell and outputs a sensible default if no override is provided.
Improve opam pin stability and start displaying information about dev revisions
Add a new man field in .install files
Support hierarchical installation in .install files
Add a new stublibs field in .install files
OPAM works even when the current directory has been deleted
Speed up invocation of opam config var VARIABLE when variable is simple (e.g. prefix, lib, ...)
opam list now displays only the installed packages. Use opam list -a to get the previous behavior.
Invert the depext tag selection (useful for ocamlot)
Add a --sexp option to opam config env to load the configuration under emacs
Purge ~/.opam/log on each invocation of OPAM
System compiler with versions such as version+patches are now handled as if this was simply version
New OpamVCS functor to generate OPAM backends
More efficient opam update
Switch license to LGPL with linking exception
opam search now also searches through the tags
Minor API changes for API.list and API.SWITCH.list
Improve the syntax of filters
Add a messages field
Add a --jobs command line option and add %{jobs}% to be used in OPAM files
Various improvements in the solver heuristics
By default, turn-on checking of certificates for downloaded dependency archives
Check the md5sum of downloaded archives when compiling OPAM
Improved opam info command (more information, non-zero error code when no patterns match)
Display OS and OPAM version on internal errors to ease error reporting
Fix opam reinstall when reinstalling a package which is a dependency of installed packages
Export and read OPAMSWITCH to be able to call OPAM in different switches
opam-client can now be used in a toplevel
-n now means --no-setup and not --no-checksums anymore
Fix support of FreeBSD
Fix installation of local compilers with local paths ending with ../ocaml/
Fix the contents of ~/.opam/opam-init/variable.sh after a switch

OCamlPro Highlights, August 2013

2013-09-04T09:05:17Z

Here is a short report on the different projects we have been working on in August.

News from OCamlPro

Compiler Optimizations

After our reports on better inlining raised big expectations, we have been working hard on fixing the few remaining bugs. An enhanced alias/constant analysis was added to provide the information needed to lift some constraints on the maintained invariants, which also simplified some other passes quite a lot in the process. We are now working on reestablishing cross-module inlining, by exporting the new information between compilation units.

Memory Profiling

On the memory profiling front, now that the compiler patch is well tested and quite stable, we started some cleanup to make it more modular, easier to understand and extend. We also worked on improving the performance of the profiler (the tool that analyzes the heap snapshots), by caching some expensive computations, such as extracting type information from ‘cmt’ files associated with each location, in files that are shared between executions. We have started testing the profiler on the Why3 verification platform, and these optimizations proved very useful to analyze longer traces.

OPAM Package Manager

On OPAM, we are still preparing the release of version 1.1. The release date has shifted a little: it is now planned for mid-September, before the OCaml’2013 meeting, because we are focusing on getting the speed and stability improvements into very good shape. We are now relying on opam-rt, our new regression testing infrastructure, to be sure to get the best level of quality for the release.

Regarding the package and compiler metadata, we are very proud to announce that our community has crossed an important line, with more than 100 contributors and 500 different packages! To ensure that these hours of packaging efforts continue to benefit everyone in the OCaml community in the future, we are (i) clarifying the license for all the metadata in the package repository to use CC0 and (ii) discussing with OCamlLabs and the different stakeholders how to migrate all the metadata to the ocaml.org infrastructure.

Simple Build Manager

We also made progress on the design of our simple build-manager for OCaml, ocp-build. The next branch in the Git repository features a new, much more expressive package description language: ocp-build can now be used to build arbitrary files, for example to generate new source files, or to compile files in languages other than OCaml. We successfully used the new language to build Try-ocaml and wxOCaml, completely avoiding the use of “make”.

It can also automatically generate basic HTML documentation for libraries using ocamldoc with “ocp-build make -doc”. There are still some improvements on our TODO list before an official release, for example improving the support of META files, but we are getting very close! ocp-build is very efficient: compiling Merlin with ocp-build takes only 4s on a quad-core while ocamlbuild needs 13s in similar conditions and with the same parallelisation settings.

Graphics on Try-OCaml

Try-OCaml has been improved with a dedicated implementation of the Graphics module: type “lesson 19”, and you will get some fun examples, including a simple game by Sylvain Conchon.

Alt-Ergo Theorem Prover

We are also happy to welcome Mohamed Iguernelala to the team, starting at the beginning of September. Mohamed is a great OCaml programmer, and he will be working on the Alt-Ergo theorem prover, an SMT-solver in OCaml developed by Sylvain Conchon, and heavily used in the industry for safety-critical software (aircraft, trains, etc.).

News from the INRIA-OCamlPro Lab

Multi-runtime OCaml

After thorough testing, the multi-runtime branch is getting stable enough to be submitted upstream. The build system has been fixed to enable the modified OCaml to run, in single-runtime mode, on architectures for which no multi-runtime port exists yet, while maintaining API compatibility with mainline OCaml. Thanks to some clever preprocessor hacks, the performance impact in single-runtime mode will be negligible.

Whole-Program Analysis

Our work on whole program analysis, while still in the early stages, is moving forward quickly, and we managed to generate well-formed graphs representing a whole OCaml program. The tool can be fed sources and .cmt files and, at each point of the program, will compute all of the plausible values every variable can take, plus the calculations that led to those values. We hope to have it ready for testing the detection of uncaught exceptions soon.

Editing OCaml Online

We also made a lot of progress on our Online IDE for OCaml, with code generation within the browser. The prototype is now quite robust, and some tricky bugs with the representation of integers and floats in Javascript have been fixed, so that the generated code is always the same as the one generated by a standalone compiler. Also, the interface now allows users to have a full hierarchy of files and projects in their workspace. There is still some work to be done on improving the design, but we are very excited about the possibility of developing in OCaml without installing anything on the computer!

Scilab Code Analysis

For the Richelieu project, after testing some type inference analysis on Scilab code in the last months, we have now started to implement a new tool, Scilint, to perform some of this analysis on whole Scilab projects and report warnings on suspect code. We hope this tool will soon be used by every Scilab user, to avoid wasting hours of computation before reaching an easy-to-catch error, such as a misspelled — thus undefined — variable.

Meeting with the Community

Some of us are going to present part of this work at OCaml’2013, the OCaml Users and Developers Workshop in Boston. We expect it to be a good opportunity to get some feedback on these projects from the community!

News from July

2013-08-05T09:05:17Z

Once again, here is the summary of our activities for last month. The highlight this month is the release of ocaml-top, an interactive editor for education which works well under Windows and that we hope professors all around the world will use to teach OCaml to their students. We are also continuing our work on improving the performance of OCaml, with new inlining heuristics in the compiler and multicore support added to the runtime.

Compiler updates

Last month, we started to get very nice results with our compiler performance improvements. First, Pierre Chambart polished the prototype implementation of his new flambda intermediate language and he started to get impressive micro-benchmark results, with around 20% – 30% improvements on code using exceptions or functors. Following a discussion with our industrial users, he is currently focusing on improving the compilation of local recursive functions such as the very typical:

let f x y =
  let rec loop v =
    … x …
    loop z
  in
  loop x

A simple and reasonably efficient solution is to eta-expand the auxiliary function, i.e. add an intermediate function calling the loop with all closure parameters passed as variables. The hard part is then to add the right arguments to all the call sites: luckily enough, the new inlining engine already does that kind of analysis, so it can be re-used here. This means that these constructs will be compiled efficiently by the new inlining heuristics.
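To make the idea concrete, here is a hand-written sketch of what such a rewriting could look like at the source level. It is only an illustration: the compiler works on its intermediate language, and the function names below (`sum_multiples`, `loop_closed`) are ours.

```ocaml
(* Original shape: [loop] captures [x] and [n] in its closure. *)
let sum_multiples x n =
  let rec loop i acc =
    if i > n then acc else loop (i + 1) (acc + (x * i))
  in
  loop 1 0

(* Rewritten by hand: the captured variables become explicit parameters
   of a closed recursive function, every recursive call site passes them
   along, and an intermediate [loop] keeps the original interface. *)
let sum_multiples_expanded x n =
  let rec loop_closed x' n' i acc =
    if i > n' then acc
    else loop_closed x' n' (i + 1) (acc + (x' * i))
  in
  let loop i acc = loop_closed x n i acc in
  loop 1 0
```

Both versions compute the same result; the second one no longer needs to read `x` and `n` from a closure inside the loop, at the price of passing them at every call.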

Second, Luca Saiu has finished debugging the native thread support on top of his multi-runtime variant of OCaml, which has become quite usable and is pretty robust now. He has tentatively started adding support for vmthreads as well, concurrently cleaning up context finalization and solving other minor issues, such as configuration scripts for architectures that do not support the multi-runtime features yet. Then, after writing documentation and running a full pass over the sources to remove debugging stubs and prints which pollute the code after months of low-level experimentation, he is going to prepare patches for discussion and submission to the main OCaml compiler.

Çagdas Bozman continued to improve the implementation of his profiling tools for both native and byte-code programs. A great output of his recent work is that the location information is much more precise: with very different techniques for native and byte code, the program locations are now uniquely identified. The usability was improved as well, as the profiling location tables are now embedded directly into the programs. He also improved the post-mortem profiling tools to re-type dumped heaps, which also leads to much more accurate profiling information. Çagdas is now actively using these tools on why3 and he expects to get feedback results very soon to continue to improve his tools.

Finally, Thomas Blanc is still working on whole program analysis for his internship, in order to find possibly uncaught exceptions. The project is moving quite well and the month was spent on analyzing the lambda intermediate representation of the compilation step. With the help of Pierre Chambart, he is working on a 0-CFA library that should make it possible to compute the “possible” values for each variable at each point of the program. The idea is to build a directed hypergraph with each hyperedge representing an instruction and each vertex being a state of the program, and then to search for a fixpoint of the possible values propagated through the graph. This allows the compiler to know everywhere in the program what possible values may be returned or what possible exceptions may be raised. In order to create a well-designed graph, we need a new intermediate representation that looks like Lambda except (mainly) that every expression gets an identifier. The next step is to specify a hypergraph construction for each primitive and control-flow construct.
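The fixpoint search described above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for the example (the `hyperedge` record, sets of strings as abstract values, the naive iteration strategy); it is not the actual 0-CFA library.

```ocaml
module S = Set.Make (String)

(* A hyperedge stands for one instruction: it reads the abstract values
   of several source vertices and contributes to one target vertex. *)
type hyperedge = {
  sources : int list;
  target : int;
  transfer : S.t list -> S.t;
}

(* Apply every hyperedge repeatedly until no vertex gains a new value.
   This terminates when transfer functions are monotone and only draw
   values from a finite universe, which holds for the example below. *)
let fixpoint n_vertices edges =
  let state = Array.make n_vertices S.empty in
  let changed = ref true in
  while !changed do
    changed := false;
    List.iter
      (fun e ->
        let inputs = List.map (fun v -> state.(v)) e.sources in
        let updated = S.union state.(e.target) (e.transfer inputs) in
        if not (S.equal updated state.(e.target)) then begin
          state.(e.target) <- updated;
          changed := true
        end)
      edges
  done;
  state

let const v = fun _ -> S.singleton v
let join inputs = List.fold_left S.union S.empty inputs

(* Two branches producing "A" and "B" that merge into vertex 2. *)
let result =
  fixpoint 3
    [ { sources = []; target = 0; transfer = const "A" };
      { sources = []; target = 1; transfer = const "B" };
      { sources = [ 0; 1 ]; target = 2; transfer = join } ]
```

After the iteration, vertex 2 holds both "A" and "B": the analysis has over-approximated the values that can reach the merge point.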

Development Tools

Editors

This month, Louis Gesbert has been busy making the first release of ocaml-top, the simple graphical top-level for beginners and students. Together with the web-IDE, this project aims at lowering the entry barrier to OCaml. Ocaml-top features a clean and easy to access interface, with nonetheless advanced features like automatic semantic indentation, error marking, and integrated display of standard library types — using the engines of ocp-indent and ocp-index of course. The biggest challenge was probably to make everything work as expected on Microsoft Windows, which was required for most beginners and classrooms.

The two main issues were:

Setting up the build environment: there are several versions of OCaml for Windows; we generally want to avoid any dependency on cygwin in the generated program, but it’s very hard to avoid any need for it in the build chain. The easiest solution at the moment is to “cross-compile” from cygwin using the mingw32 gcc compiler. The hard part is to get all the needed libraries properly set up: this felt a lot like Linux 15 years ago, you can find some binaries but generally not properly configured, and there is no consistent packaging system (or at least you can’t find what you want in it).
Process management: ocaml-top runs the OCaml toplevel as a sub-process, so as not to be impaired by any problem in the user program. Interacting with that process in a portable way is close to impossible, Windows having no POSIX signals, and read/write operations being very different in terms of blocking, etc. Some obscure C bindings were required to simulate a SIGINT that could tell the ocaml process to stop the current computation and return to the prompt. But at this cost, ocaml-top can be run with any existing external OCaml toplevel.

Not to mention some gtk/lablgtk bugs that were often OS-specific. After having read horror stories about the most commonly used “Windows installer generator”, NSIS, Louis opted for the Microsoft open source solution WiX, which turned out to be quite clean and efficient, although it uses a lot of XML. The only point that might be in favor of NSIS is that it can generate the installer from Linux, which is much more convenient when you cross-compile; that is not the case here. Also worth mentioning: Xen and LVM are really great tools, which save a lot of time when working and testing across two (or more) different OSes.

Still on the editor front, David and Pierrick have been working on a web-IDE for OCaml since the beginning of their internship two months ago. For now, the IDE includes Ace, an editor, extended with some OCaml-specific features, particularly ocp-indent, made possible by js_of_ocaml, which compiles bytecode to Javascript. It also includes a basic project manager that uses a server to store files for each user. Authentication is done using Mozilla’s Persona. One particularly nice feature they are working on is client-side bytecode generation: this means users can ask their browser to produce the byte-code of the project they are working on without any connection to the server! Beware that this is still work-in-progress and the feature is not bug-free for the moment. The project (undocumented for now) is available on Github.

Tools

Meanwhile, most of my time last month has been spent preparing the next release of OPAM, with the help of Louis Gesbert. This new release will contain a lot of bug-fixes and an improved opam update mechanism: it will be much more flexible, faster and more stable than the one present in 1.0. A few months ago, I had already pushed a first batch of patches to the development version, which started to make things look much better. I decided last month to continue improving that feature and make it rock-solid: hence I have started a regression testing platform for OPAM, opam-rt, which is still young but already damn useful to stabilize my new set of patches. opam-rt is also written in OCaml: it generates random repositories with random packages, shuffles everything around and checks that OPAM correctly picks up the changes. In the future this will make it easier to test complex OPAM scenarios and will hopefully lead to a better OPAM.

ocp-index has seen some progress, with lots of rough edges rounded, and much better performance on big cmi files (typically module packs, like Core.Std). While more advanced functionality is being planned, it is already very helpful, and problems seen in earlier development versions have been fixed. The upcoming release also greatly improves the experience from emacs, and might become the first “stable” one. The flow of bugs reported on ocp-indent is drying up, which means the tool is gaining some maturity. There were not many visible changes in the past month except for a few bug-fixes, but the library interface has been completely rewritten to offer much more flexibility while being friendlier. This has allowed it to be plugged into the Web-IDE (see above), which, being executed in JavaScript, has much tighter performance constraints (the indent engine is only re-run where required after changes), and into ocaml-top, where it is also used to detect top-level phrase bounds.

Community

We are proud to be well represented at the OCaml Developer Workshop 2013. This year it happens in Boston, in September, co-located with the Conference of Users of Functional Programming. Both conferences will contain a lot of OCaml-related talks: I am especially excited to hear about PHP type-inference efforts at Facebook using OCaml! If you are in the area around the 22nd, 23rd and 24th of September and you want to chat about OCamlPro and OCaml, we will be around!

Better Inlining: Progress Report

2013-07-11T09:05:17Z

As announced some time ago, I am working on a new intermediate language within the OCaml compiler to improve its inlining strategy. After some time of bug squashing, I prepared a testable version of the patchset, available either on Github (branch flambda_experiments), or through OPAM, in the following repository:

opam repo add inlining https://github.com/OCamlPro/opam-compilers-repository.git
opam switch flambda
opam install inlining-benchs

The series of patches is not ready for benchmarking against real applications, as no cross module information is propagated yet (this is more practical for development because it simplifies debugging a lot), but it already works quite well on single-file code. Some very simple benchmark examples are available in the inlining-benchs package.

The series of patches implements a set of 'reasonable' compilation passes that do not try anything too complicated but, combined, generate quite efficient code.

Current Status

As said in the previous post, I decided to design a new intermediate language to implement better inlining heuristics in the compiler. This intermediate language, called flambda, lies between the lambda code and the Clambda code. It has an explicit representation of closures, making them easier to manipulate, and modules do not appear in it anymore (they have already been compiled to static structures).

I then started to implement new inlining heuristics as functions from the lambda code to the flambda code. The following features are already present:

intra function value analysis
variable rebinding
dead code elimination (which needs purity analysis)
known match / if branch elimination

In more detail, the chosen strategy is divided into two passes, which can be described by the following pseudo-code:

if function is at toplevel
then if applied to at least one constant OR small enough
     then inline
else if applied to at least one constant AND small enough
     then inline

if function is small enough
   AND does not contain local function declarations
then inline
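The pseudo-code above can be read as two boolean predicates. The sketch below transcribes it literally; the `call_info` record and its field names are hypothetical stand-ins for whatever the compiler actually knows about a call site, not the real data structures of the patchset.

```ocaml
(* Hypothetical summary of the facts the heuristics look at. *)
type call_info = {
  at_toplevel : bool;          (* is the function defined at toplevel? *)
  has_constant_arg : bool;     (* applied to at least one constant? *)
  small_enough : bool;         (* is the body below the size threshold? *)
  has_local_functions : bool;  (* does it declare local functions? *)
}

(* First pass: permissive for toplevel functions, stricter otherwise. *)
let first_pass_inlines c =
  if c.at_toplevel then c.has_constant_arg || c.small_enough
  else c.has_constant_arg && c.small_enough

(* Second pass: small functions without local function declarations. *)
let second_pass_inlines c = c.small_enough && not c.has_local_functions
```

For example, a large toplevel function applied to a constant is inlined by the first pass, while the same function defined locally is not.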

The first pass eliminates most functor applications and functions of the kind:

let iter f x =
  let rec aux x = ... f ... in
  aux x

The second pass eliminates the same kind of functions as OCaml 4.01 but, because it runs after the first pass, it can also inline functions revealed by inlining functors.

Benchmarks

I ran a few benchmarks to ensure that there were no obvious miscompilations (and there were, but they are now fixed). On benchmarks that were too carefully written there was not much gain, but I got interesting results on some examples: those illustrate quite well the improvements, and can be seen at $(opam config var lib)/inlining-benchs (binaries at $(opam config var bin)/bench-*).

The Knuth-Bendix Benchmark (single-file)

Performance gains against OCaml 4.01 are around 20%. The main difference is that exceptions are compiled to constants, hence not allocated when raised. In that particular example, this halves the allocations.

In general, constant exceptions can be compiled to constants when they are predefined (Not_found, Failure, ...). This is not yet possible for user-defined exceptions: to improve this, a few things need to be changed in translcore.ml to annotate values created by exceptions.

The Noiz Benchmark

Performance gains are around 30% against OCaml 4.01. This code uses a lot of higher order functions of the kind:

let map_triple f (a,b,c) = (f a, f b, f c)

OCaml 4.01 can inline map_triple itself but then cannot inline the parameter f. Moreover, when writing:

let (x,y,z) = map_triple f (1,2,3)

the tuples are not really used, and after inlining their allocations can be eliminated (thanks to rebinding and dead code elimination).
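As an illustration of the effect described above, here is the kind of code one writes, next to a hand-written approximation of what is left once `map_triple` and its argument are inlined and the intermediate tuples are removed. The second function is our own rendering for the sake of the example, not actual compiler output.

```ocaml
let map_triple f (a, b, c) = (f a, f b, f c)

(* What the programmer writes: two tuples are built, then taken apart. *)
let sum_of_squares () =
  let square n = n * n in
  let x, y, z = map_triple square (1, 2, 3) in
  x + y + z

(* Roughly what remains after inlining, rebinding and dead code
   elimination: no call to [map_triple], no call to [square],
   and no tuple allocation. *)
let sum_of_squares_inlined () =
  let x = 1 * 1 in
  let y = 2 * 2 in
  let z = 3 * 3 in
  x + y + z
```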

The Set Example

Performance gains are around 20% compared to OCaml 4.01. This example shows how inlining can help defunctorization: when inlining the Set functor, the provided comparison function can be inlined in Set.add, allowing direct calls everywhere.
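A minimal example of the pattern in question, using only the standard library (the module name `IntSet` is ours):

```ocaml
(* The comparison function is passed to the functor once... *)
module IntSet = Set.Make (struct
  type t = int

  let compare (a : int) (b : int) = compare a b
end)

(* ...and is then called indirectly by every [IntSet.add] unless the
   functor application is inlined, in which case the calls to [compare]
   become direct and can themselves be inlined. *)
let s = IntSet.add 3 (IntSet.add 1 (IntSet.add 2 IntSet.empty))
```

Without defunctorization, each insertion goes through an unknown function stored in the functor's closure; this is the overhead the Set example measures.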

Known Bugs

Recursive Values

A problem may arise in a rare case of recursive values where a field access can be considered to be a constant. Something that would look like (if it were allowed):

type 'a v = { v : 'a }

let rec a = { v = b }
and b = (a.v, a.v)

I have a few solutions, but I am not sure yet which one is best. This probably won't appear in any normal test. This bug manifests itself as a segmentation fault (cmmgen fails to compile that recursive value reasonably).

Pattern-Matching

The new passes assume that every identifier is declared only once in a given module, but this assumption can be broken in some rare pattern matching cases. I will have to dig through matching.ml to add a substitution in these cases. (The only non hand-built occurrence that I found is in ocamlnet.)

Known Mis-compilations

Since there is no cross-module information at the moment, calls to functions from other modules are always slow.
In some rare cases, there could be functions with more values in their closure, thus resulting in more allocations.

What's next?

I would now like to add back cross-module information, and after a bit of cleanup the first series of patches should be ready to propose upstream.

News from May and June

2013-07-01T09:05:17Z

It is time to give a brief summary of our recent activities. As usual, our contributions were focused on three main objectives:

make the OCaml compiler faster and easier to use;
make the OCaml developers more efficient by releasing new development tools and improving editor supports;
organize and participate in community events around the language.

We are also welcoming four interns who will work with us on these objectives during the summer.

Compiler updates

Following the ideas he announced in his recent blog post, Pierre Chambart has made some progress on his inlining branch. He is currently working on stabilizing and cleaning-up the code for optimization which does not take into account inter-module information.

We also continue to work on our profiling tool and have started to separate the different parts of the project. We have patched the compiler and runtime, for both bytecode and native code, to generate .prof files, which contain the id-loc information and allow us to recover the location from the identifiers in the header of the block, and to dump a program heap to a file on demand or monitor a running program without memory and performance overhead. Çagdas Bozman has presented the work he has done so far regarding his PhD to members of the Bware project and we started to test our prototype on industrial use-cases using the why3 platform.

On the multi-core front, Luca Saiu is continuing his post-doc with Fabrice le Fessant and is modifying the OCaml runtime to support parallel programming on multi-core computers. Their version of the “multi-runtime” OCaml provides a message-passing abstraction in which a running OCaml program is “split” into independent OCaml programs, one per thread (if possible running on its separate core) with a separate instance of the runtime library in order to reduce resource contention both at the software and at the hardware level. Luca is now debugging the support for OCaml multi-threading running on top of a multi-context parallel program. A recent presentation covering this work and its challenges is available online.

A new intern from ENS Cachan, Thomas Blanc, is working on a whole program analysis system. The final goal of his internship is to provide a good hint of the exceptions that may be left uncaught by the program, resulting in a failure. It is quite interesting, as exceptions are pretty much the part of the program that is “hard to foresee”. The main difficulty comes from higher-order functions (like List.iter). Because of them, a simple local analysis becomes impossible. So the first task is to take the whole program in the form of separate .cmt files, merge them, and remove every higher-order function (either by direct inlining if possible or by a very big pattern matching). The merging has already been done through a deep browsing of the compiler’s typedtrees. Thomas is now focusing on reordering the code so that higher-order functions can be safely removed.
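Replacing a higher-order function by “a very big pattern matching” is reminiscent of the classic defunctionalization technique. The sketch below shows that technique on a toy case; whether the analysis tool proceeds exactly this way is an assumption on our part, and all names are invented for the example.

```ocaml
(* Higher-order version: [apply_all] receives an arbitrary function. *)
let apply_all f xs = List.map f xs

(* First-order version: every function that may flow into [f] gets a
   constructor, and a single pattern matching dispatches on it. *)
type func = Incr | Double | Add of int

let apply fn x =
  match fn with
  | Incr -> x + 1
  | Double -> x * 2
  | Add n -> x + n

let rec apply_all_first_order fn = function
  | [] -> []
  | x :: rest ->
      let y = apply fn x in
      y :: apply_all_first_order fn rest
```

With a whole program at hand, the set of constructors is known, so a local analysis of `apply` sees every piece of code that can run at the call site, including every exception it may raise.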

Finally, we are helping to prepare the release 4.01.0 of the OCaml compiler: Fabrice has integrated his frame-pointer patch, which can be used to profile the performance of OCaml applications using the Linux perf tool; he has added to Pervasives two application operators that had been optimized before, but were only available to people who knew about them; he has also added a new environment variable, OCAMLCOMPPARAM, that can be used to change how a program is compiled by ocamlc/ocamlopt, without changing the build system (for example, OCAMLCOMPPARAM='g=1' make can be used to compile a project in debug mode without modifying the makefiles).
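The text above does not name the two application operators; we assume here that they are `|>` (reverse application) and `@@` (application), which are the ones available in the standard library since OCaml 4.01. A short usage sketch:

```ocaml
let double x = 2 * x

(* [x |> f] is [f x]: data flows from left to right. *)
let total = [ 1; 2; 3 ] |> List.map double |> List.fold_left ( + ) 0

(* [f @@ x] is [f x]: it is right-associative with low precedence,
   which saves parentheses around the argument. *)
let answer = string_of_int @@ double @@ 10 + 11
```

Here `total` is the sum of the doubled elements, and `answer` is parsed as `string_of_int (double (10 + 11))`.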

Development Tools

Since the initial release of OPAM in March, we have been kept busy preparing the upcoming 1.1.0 version, which should interface nicely with the forthcoming set of automatic tools which will constitute the first version of the OCaml Platform that we are helping OCamlLabs to deliver. We have constantly been focused on fixing bugs and implementing feature requests (more than 70 issues have been closed on GitHub) and we have recently improved the speed and reliability of opam update. More good news related to OPAM: the number of packages submitted to the official repository is steadily increasing, with around 20 new packages integrated every month (and many more upgrades of existing packages), and the official Debian package should land in testing very soon.

This month, Louis was still busy improving different tools for editing OCaml code. ocp-index and ocp-indent, made for the community to improve the general OCaml experience and kindly funded by Jane Street, have seen some updates:

ocp-index: the library data access tool which was first presented in April has seen some progress, with the ability to locate definitions and resolve type names. It is still not yet considered stable though, expect more from it soon. An early release (0.2.0) is in OPAM.
ocp-indent: the generic OCaml source code indenter has seen its usual bunch of fixes, along with some new customization options. Also, its library interface has been rewritten, offering much better flexibility and opening the gate to uses like restarting from checkpoints to avoid full reparsing, detecting top-expression boundaries, syntax coloration, etc. We will be releasing 1.3.0 in OPAM very soon.

We are also developing in-house projects aiming at providing a better first experience of OCaml to beginners and students:

the new ocaml-top (previous project name ocp-edit-simple) aims to offer a simple, but clean and easy-to-use interface to interact with the ocaml top-level. It is intended mainly for exercises, tutorials and practicals. A release should be coming soon, the Linux version being quite stable while some bugs remain on Windows.
two new interns, David and Pierrick, have started working on a web-IDE for OCaml. As students, they have sometimes seen how difficult it can be to install OCaml on some OSes, or simply to configure editors like emacs or vim. To solve these issues, the idea is to use only a web browser-based editor and provide a way to compile a project without having to install anything on your computer. For the editing part, the idea is to use Ace and improve it for OCaml, using ocp-indent for example, which is possible thanks to js_of_ocaml. The next step will be to glue this editor with both TryOCaml, to execute code, and a cloud computing part, to store projects and files and access them from anywhere.

We are also trying to improve cross-compilation tutorials and tools for developing native iOS applications on a Linux system, using the OCaml language. Souhire, our fourth new intern, is experimenting with that idea and will document how to set up such an environment, from the foundations up to publication on the application store (if it is possible). She is starting to look at how iOS applications (with a native graphical interface) written in C can be cross-compiled on Linux, and how the ones written in OCaml can be cross-compiled on MacOSX.

On the library front, Fabrice has completely rewritten the way his wxOCaml library is generated, compared to what was described in a previous blog post. It does not share any code anymore with other wxWidgets bindings (wxHaskell or wxEiffel), but directly generates the stubs from a DSL (close to C++) describing the wxWidgets classes. It should make binding more widgets (classes) and more methods for each widget much easier, and also help with maintenance, evolution and compatibility with wxWidgets versions. There is now an interesting set of samples in the library, covering many interesting usages.

Community

We have also been pretty active during the last months in promoting the use of OCaml in the free-software and research communities: we are actively participating in the upcoming OCaml 2013 and Commercial Users of Functional Programming conferences, which will be held next September in Boston.

While I was visiting Jane Street with OCamlLabs’s team, I had the pleasure to be invited to give a talk at the NYC OCaml meetup on OPAM (my slides can be found online here). It was a nice meetup, with more than 20 people, hosted in the great Jane-Street New-York offices.

OCamlPro is still organizing OCaml meetups in Paris, hosted by IRILL and sponsored by LexiFi: our last OCaml Users in PariS (OUPS) meetup was in May, and there were more than 50 people! It was a nice collection of talks, where Esther Baruk spoke about the usage of OCaml at LexiFi, Benoit Vaugon about all the secrets that we always wanted to know about the OCaml bytecode, Frédéric Bour presented Merlin, the new IDE for Vim, and Gabriel Scherer told us how to better interact with the OCaml core team.

We are now preparing our next OUPS meeting, which will take place at IRILL on Tuesday, July 2nd. Emphasis will be on programming in OCaml in different contexts. Thus, there will be some js_of_ocaml experiences, GPGPU in OCaml and GADTs in practice. There are still many seats available, so do not hesitate to register for the meetup, but if you cannot, this time, videos of the talks (in French) will be available afterwards.

Not really related to OCaml, we also attend the Teratec 2013 Forum, which brings together a lot of Scilab users. This is part of the Richelieu research project that Michael is working on: his goal is to analyze Scilab code before just-in-time compilation. It requires a basic type-inference algorithm, but for a language that has not been designed for that! He is currently struggling with the dynamic aspects of the Scilab language. After some work on preprocessing the eval and evalstr functions, he is now focusing on how Scilab programmers usually write functions. He is currently using different kinds of analyses on real-world Scilab programs to understand how they are structured.

Finally, we are happy to announce that we finally found the time to release the sources of our OCaml cheat-sheets. Feel free to contribute by sending patches if you are interested in improving them!

Optimisations you shouldn’t do

2013-05-24T09:05:17Z

Doing the compiler's work

Working at OCamlPro may have some drawbacks. I spend a lot of time hacking the OCaml compiler. Hence when I write some code, I have a good glimpse of what the generated assembly will look like. This is nice when I want to write performance-sensitive code, but as I usually write code for which execution time doesn't matter much, this mainly tends to torture me. A small voice in my head is telling me "you shouldn't write it like that, you know you could avoid this allocation". And usually, following that indication would only tend to make the code less readable. But there is a solution to calm that voice: making the compiler smarter than me.

OCaml compilation mechanisms are quite predictable. There is no dark magic to replace your ugly code by a well-behaving one, but it always generates reasonably efficient code. This is a good thing in general, as you won't be surprised by code running more slowly than what you usually expect. But it does not behave very well with dumb code. This may not often seem like a problem with code written by humans, but generated code, for example coming from camlp4/ppx, or compiled from another language to OCaml, may fall into that category. In fact, there is another common source for non-human written code: inlining.

Inlining

Inlining (or inline expansion) is a key optimisation in many compilers and particularly in functional languages. Inlining replaces a function call by the function body. Let's apply inlining to f in this example.

let f x = x + 1
let g y = f (f y)

We replace each call to f by a let binding for its argument, and then copy the body of f.

let g y =
let x1 = y in
let r1 = x1 + 1 in
let x2 = r1 in
let r2 = x2 + 1 in
r2

Inlining eliminates the cost of a call (and the associated spilling), but the main point is elsewhere: it puts the code in its context, allowing its specialisation. When you look at the code generated after inlining, your trained eyes will notice that it looks quite dumb. And you really want to rewrite it as:

let g y = y + 2
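As a quick sanity check, the three versions of g can be compared directly. This is only a sketch: the names g_original, g_inlined and g_simplified are introduced here for illustration and do not come from the compiler.

```ocaml
(* The original definitions. *)
let f x = x + 1
let g_original y = f (f y)

(* What g looks like once both calls to f have been inlined by hand. *)
let g_inlined y =
  let x1 = y in
  let r1 = x1 + 1 in
  let x2 = r1 in
  let r2 = x2 + 1 in
  r2

(* What we would like to end up with after simplification. *)
let g_simplified y = y + 2

(* All three versions agree on a few sample inputs. *)
let () =
  List.iter
    (fun y ->
      assert (g_original y = g_inlined y);
      assert (g_inlined y = g_simplified y))
    [ -1; 0; 1; 42 ]
```

Inlining on its own only takes us to the middle version; getting to the last one is the job of the simplification passes.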

The problem is that OCaml compiles smart code into smart assembly, but after inlining your code is not as smart as it used to be. What is missing in the current compiler is a way to get nice and smart code back after inlining. (To be honest, OCaml is not that bad, and on that example it would generate the right code: put this down to the mandatory blog-post dramatic effect.)

In fact you could consider inlining as two separate things: duplication and call elimination. By duplication you make a new version of the function that is specialisable in its context, and by call elimination you replace the call by specialised code. This distinction is important because there are some cases where you only want to do duplication: recursive functions.

Recursive function inlining

In a recursive function, duplicating and removing a call is similar to loop unrolling. This can be effective in some cases, but this is not what we want to do in general. Let's try it on List.map:

let rec list_map f l = match l with
  | [] -> []
  | a::r -> f a :: list_map f r

let l' =
  let succ = (fun x -> x + 1) in
  list_map succ l

If we simply inline the body of list_map we obtain this

let l' =
  let succ = (fun x -> x + 1) in
  match l with
  | [] -> []
  | a::r -> succ a :: list_map succ r

And with some more inlining we get this which is probably not any faster than the original code.

let l' =
  let succ = (fun x -> x + 1) in
  match l with
  | [] -> []
  | a::r -> a + 1 :: list_map succ r

Instead we want the function to be duplicated.

let l' =
  let succ = (fun x -> x + 1) in
  let rec list_map' f l = match l with
    | [] -> []
    | a::r -> f a :: list_map' f r in
  list_map' succ l

Now we know that list_map' will never escape its context, hence its f parameter will always be succ. We can therefore replace f by succ everywhere in its body.

let l' =
  let succ = (fun x -> x + 1) in
  let rec list_map' f l = match l with
    | [] -> []
    | a::r -> succ a :: list_map' succ r in
  list_map' succ l

And we can now see that the f parameter is not used anymore, so we can eliminate it.

let l' =
  let succ = (fun x -> x + 1) in
  let rec list_map' l = match l with
    | [] -> []
    | a::r -> succ a :: list_map' r in
  list_map' l

With some more inlining and cleaning we finally obtain this nicely specialised function which will be faster than the original.

let l' =
  let rec list_map' l = match l with
    | [] -> []
    | a::r -> a + 1 :: list_map' r in
  list_map' l
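The first and last steps of this derivation can be written as two ordinary functions and checked against each other. This is a hand-written sketch of what the transformation should preserve, not compiler output; incr_all_generic and incr_all_specialised are names made up for the comparison.

```ocaml
(* The generic map from the text. *)
let rec list_map f l = match l with
  | [] -> []
  | a :: r -> f a :: list_map f r

(* Starting point: succ is passed along as a function value on every
   recursive call. *)
let incr_all_generic l =
  let succ = (fun x -> x + 1) in
  list_map succ l

(* End point: f has been replaced by succ, the unused parameter has been
   removed, and succ itself has been inlined. *)
let incr_all_specialised l =
  let rec list_map' l = match l with
    | [] -> []
    | a :: r -> a + 1 :: list_map' r in
  list_map' l

let () =
  let l = [ 1; 2; 3 ] in
  assert (incr_all_generic l = [ 2; 3; 4 ]);
  assert (incr_all_specialised l = incr_all_generic l)
```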

Current state of the OCaml inliner

Inlining can gain a lot, but abusing it will increase code size a lot. This is not only a problem of binary size (who cares?): if your code does not fit in processor cache anymore, its speed will be limited by memory bandwidth.

To avoid that, OCaml has a threshold on the size of functions allowed for inlining. The compiler may also refuse to inline in other cases that are not completely justified, mainly for reasons related to its architecture:

duplication and call elimination are not separated, hence recursive function duplication is not possible.
functions containing structured constants or local functions are not allowed to be duplicated, preventing those functions from being inlined.

let constant x =
let l = [1] in
x::l

let local_function x =
let g x = some closed function in
... g x ...

The assumption is that if a function contains a constant or a function it will be too big to be reasonably inlined. But there is a reasonable transformation that could allow it.

let l = [1]
let constant x =
x::l

let g x = some closed function
let local_function x = ... g x ...

and then we can reasonably inline constant and local_function. Those cases are only technical limitations that could easily be lifted with the new implementation.
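A runnable version of that lifting could look like the sketch below. The body chosen for g (doubling its argument) and the names shared_l, lifted_g and the _lifted suffixes are assumptions made for the example; the text only says "some closed function".

```ocaml
(* Before: the structured constant lives inside the function body. *)
let constant x =
  let l = [ 1 ] in
  x :: l

(* After: the constant is lifted to toplevel, leaving a tiny body. *)
let shared_l = [ 1 ]
let constant_lifted x = x :: shared_l

(* Same idea for a closed local function: g uses no variable of
   local_function, so it can be moved out unchanged. *)
let local_function x =
  let g y = y * 2 in
  g x + 1

let lifted_g y = y * 2
let local_function_lifted x = lifted_g x + 1

(* The lifted versions compute the same results. *)
let () =
  assert (constant 0 = constant_lifted 0);
  assert (local_function 3 = local_function_lifted 3)
```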

But improving the OCaml inliner is not that easy. It is well written, but it is also doing a lot of other things at the same time:

Closure conversion

Closure conversion transforms functions to a data structure containing a code pointer and the free variables of the function. You could imagine it as that transformation:

let a = 1

let f x = x + a (* a is a free variable in f *)

let r = f 42

Here a is a free variable of f. We cannot compile f while it contains references to free variables. To get rid of the free variables, we add a new parameter to the function, the environment, containing all the free variables.

let a = 1

let f x environment =
(* the new environment parameter contains all the free variables of f *)
x + environment.a

let f_closure = { code = f; environment = { a = a } }

let r = f_closure.code 42 f_closure.environment
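The transformation above is written in pseudo-OCaml. For readers who want to run it, here is one possible typed rendering using plain records. The type names f_env and closure and the function name f_code are assumptions of this sketch; the compiler's actual closure representation is a runtime block, not an OCaml record.

```ocaml
(* Environment of f: one field per free variable. *)
type f_env = { a : int }

(* A closure pairs a code pointer with its environment. *)
type ('env, 'arg, 'res) closure = {
  code : 'arg -> 'env -> 'res;
  environment : 'env;
}

let a = 1

(* Closed version of f: the free variable a is read from the environment. *)
let f_code x environment = x + environment.a

let f_closure = { code = f_code; environment = { a = a } }

(* Calling through the closure passes the environment explicitly. *)
let r = f_closure.code 42 f_closure.environment

let () = assert (r = 43)
```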

Value analysis

In functional languages, inlining is not as simple as it is for languages like C, because the function name does not tell you which function is used at a call site:

let f x = (x,(fun y -> y+1))

let g x =
let (a,h) = f x in
h a

To be able to inline h as (fun y -> y+1), the compiler needs to track which values flow to which variables. This is usually called a value analysis. This example can look a bit contrived, but in practice functor application generates quite similar code. This allows, for instance, Set.Make(...).is_empty to be inlined. The result of this value analysis is used by other optimisations:
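Before looking at those optimisations, the functor case mentioned above can be sketched as follows. IntSet and has_no_elements are names chosen for the example, and the comments describe the intent of the analysis rather than what any given compiler version actually does.

```ocaml
(* A functor application: the functions of IntSet only exist once
   Set.Make has been applied to an ordered type. *)
module IntSet = Set.Make (struct
  type t = int
  let compare = compare
end)

(* IntSet.is_empty is a component of the module returned by the functor.
   Tracking which function ends up in that component is the same problem
   as tracking h in the tuple example. *)
let has_no_elements s = IntSet.is_empty s

let () =
  assert (has_no_elements IntSet.empty);
  assert (not (has_no_elements (IntSet.singleton 1)))
```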

Constant folding

When the value analysis can determine that the result of an operation is a constant, it can remove its computation:

let f x =
let a = 1 in
let b = a + 1 in
x + b

Since b always has the value 2 and a + 1 does not have side effects, it is possible to simplify it.

let f x =
x + 2

Direct call specialisation

Sometimes it is impossible to know which function will be used at a call site:

let f g x = g x

There is a common representation (the closure) that makes it possible to call a function without knowing anything about it. Using a function through its closure is called a generic call. This is efficient, but of course not as efficient as a simple assembly call (a direct call). The job of direct call specialisation is to turn as many generic calls as possible into direct ones. In practice, the vast majority of calls can be optimised.
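At the source level the two kinds of call look the same, so the difference is easiest to see side by side. In this sketch, apply, direct and via_apply are illustrative names, and the comments state what the callee is known to be, not what a particular compiler emits.

```ocaml
let succ x = x + 1

(* Generic call: inside apply, g is an unknown function, so it has to be
   called through its closure. *)
let apply g x = g x

(* Direct call: the callee is statically known to be succ. *)
let direct x = succ x

(* If apply is inlined here, g is known to be succ at this call site, so
   the generic call can in principle become a direct one. *)
let via_apply x = apply succ x

let () =
  assert (direct 1 = 2);
  assert (via_apply 1 = direct 1)
```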

Improving OCaml inliner

The current architecture is very fast and works well on a lot of cases, but it is quite difficult to improve the handling of corner cases.

I have started a complete rewrite of those passes, and I am currently working on splitting all those things into their own passes. The first step was to add a new intermediate representation (flambda) better suited to the various analyses. The main difference with the current representation (clambda) is that closures are explicitly represented, making them easier to manipulate. As a nice side effect, this intermediate representation makes it possible to plug passes in or out, or to loop on them, without changing anything in the architecture. But we lose the possibility of enforcing some invariants in the type of the representation, hence we need to be careful to maintain them correctly.

With this new architecture, the closure conversion is done first (going from lambda to flambda). Then a set of simple analyses is provided on flambda:

simple intraprocedural value and alias analysis
purity analysis
constant analysis
dead expression analysis

And there is a set of simple passes using their results:

dead code elimination
constant folding/direct call specialisation/type specialisation: a simple traversal replacing expressions with more efficient ones when the result of the value analysis allows it.
alias rebinding: Use results of alias analysis to know when a field access can be simplified:

let f x =
let tuple = (x,x) in
let (y,z) = tuple in
y + z

let f x =
x + x

Of course nobody would write that, but access to variables bound in a closure can look a lot like that after inlining:

let f x =
let g y = x + y in
g x

After closure conversion we obtain this.

let f x =
  let g_closure =
    { code = (fun y environment -> environment.x + y);
      environment = { x = x } } in
  g_closure.code x g_closure.environment

And after inlining g.

let f x =
  let g_closure =
    { code = (fun y environment -> environment.x + y);
      environment = { x = x } } in
  g_closure.environment.x + x

Inlining g produces code that looks a bit stupid: g_closure.environment.x is always the same value as x, so there is no need to access it through the structure.

let f x =
  let g_closure =
    { code = (fun y environment -> environment.x + y);
      environment = { x = x } } in
  x + x

Now that we have simplified the code, we notice that g_closure is not used anymore, and dead code elimination can simply get rid of it.

let f x =
x + x

a really, really dumb inliner: it inlines almost anything. Its interest is to demonstrate what can be achieved when putting some code in its context.

After the different optimisation passes, we need to send the result to the compiler back-end. This is done by the final conversion from flambda to clambda, which mainly does a lot of bureaucratic transformations and marks constant structured values. Doing this constant marking separately also slightly improves the generated code.

let rec f x =
let g y = f y in
g x

f and g are closed functions, but the current compiler is not able to detect this and allocates a closure for g at each call of f.

Hey ! Where are the nice charts ?

As you noticed, there were no fancy improvement charts, and there won't be any below. Those are demonstration passes: the generated code can (and probably will) be worse than the one generated by the current compiler. This is mainly done to show what can be achieved by combining simple passes and simple analyses and allowing them to be applied multiple times. What is needed to get fast code is to change the inlining heuristic (and re-enable cross-module inlining).

My current work is to write more serious analyses allowing better optimisations. In particular, I expect that a reasonable interprocedural value analysis could help a lot with handling recursive function specialisation.

My future toys

Then I'd like to play a bit with other common things like

unused parameters elimination: when a function does not use one of its parameters, remove it. This is trivial with simple functions, but it can get a bit tricky with multiply recursive functions (that kind of code can appear after constant folding with information from interprocedural analysis).
lambda lifting: turning closure into closed function by adding arguments. This can eliminate some allocations

let f x =
let g y = x + y in
g 4

If we add the x parameter to g we can avoid building its closure each time f is called.

let f x =
let g y x = x + y in
g 4 x

This can get quite tricky if we want to handle cases like

let f x n =
let g i = i + x in
Array.init n g

We need to add a new parameter to init also to be able to pass it to g.

common sub-expression elimination:

let f x =
let a = x + 1 in
let b = x + 1 in
a + b

In f, we clearly don't need to compute x + 1 twice:

let f x =
let a = x + 1 in
a + a

earlier unboxing: Floats are boxed in OCaml, which means that there is an indirection when accessing the contents of a value of type float. To reduce the cost of allocating and accessing floats, unboxing eliminates the indirections between some operations. I'd like to try to do this as a flambda pass to be able to use the results of the value analysis.
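Coming back to the trickier lambda lifting case with Array.init, here is a rough sketch of what "adding a new parameter to init" could mean. The function init_with_env is hypothetical: it is not part of the standard library, and a compiler would derive such a variant from the body of Array.init rather than have it written by hand.

```ocaml
(* Original: g captures x, so a closure for g is built on each call to f. *)
let f x n =
  let g i = i + x in
  Array.init n g

(* Hypothetical variant of Array.init that threads an extra argument
   through to the callback, written as a plain loop. *)
let init_with_env n env g =
  if n = 0 then [||]
  else begin
    let res = Array.make n (g 0 env) in
    for i = 1 to n - 1 do
      res.(i) <- g i env
    done;
    res
  end

(* Lambda-lifted g: x is an ordinary parameter, so g is a closed function. *)
let g_lifted i x = i + x

let f_lifted x n = init_with_env n x g_lifted

let () = assert (f 10 3 = f_lifted 10 3)
```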

If you want to play/hack a bit with the demo look at my github branch (be warned, this branch may sometimes be rebased)

April Monthly Report

2013-04-22T09:05:17Z

This post aims at summarizing the activities of OCamlPro for the past month. As usual, we worked in three main areas: the OCaml toolchain, development tools for OCaml and R&D projects.

The toolchain

Our multi-runtime implementation of OCaml has gained stability. Luca fixed a lot of low-level bugs in the “master” branch of his OCaml repository, which were mainly related to the handling of signals. There are still some issues, which seem to be related to thread-switching (i.e. when using OS-level multi-threading).

We made great progress on an improved inlining strategy. In the current OCaml compiler, inlining, closure conversion and constant propagation are done in a single pass interleaved with analysis. It has served well until now, but to improve it in a way which is easily extensible in the future, it needs a complete rewrite. After a few prototypes, Pierre is now coming up with an intermediate language (IR) better suited to the job, using a dedicated value analysis to guide the simplification and inlining passes. This IR will stand between the lambda code and the C-lambda and is designed such that future specialized optimizations can easily be added. There are two good reasons for this IR: First, it is not as intrusive and reduces the extent of the modifications to the compiler, as it can be plugged between two existing passes and turned on or off using a command-line flag. Second, it can be tweaked to make the abstract interpretation more precise and efficient. For instance, we want the inlining to work with higher-order functions as well as modules and functors, performing basic defunctorization. It is still in an experimentation phase, but we are quickly converging on the API and hope to have something we can demo in the next months.

Our frame-pointer patch has also been accepted. Note that, as this patch changes the calling convention of OCaml programs, you cannot link together libraries compiled with and without the patch. Hence, this option will be provided as a configuration switch (./configure --with-frame-pointer).

Regarding memory profiling, we released a preliminary prototype of the memory profiler for native code. It is available in Çagdas's repository. We are still in the process of testing and evaluating the prototype before making it widely available through OPAM. As with the previous bytecode prototype, you need to compile the libraries and the program you want to profile as usual in order to build a table associating unique identifiers to program locations (.prof file). Then, for each allocated block, the patched OCaml runtime encodes in the header the identifier of the line which allocated it. To dump the heap, you can either instrument your program, send a signal, or set the appropriate environment variable (OCAMLRUNPARAM=m). Finally, you can use the profiler, which will read the .prof and .cmt files in order to generate a PDF file showing the amount of memory used by type. More details on this will come soon; you can already read the README file available on GitHub.

Finally, we organized a new meeting with the core-team to discuss some of the bugs in the OCaml bug tracker. It was the first of the year, but we are now going to have one every month, as it has a very positive impact on the involvement of everybody in fixing bugs and helps focus work on the most important issues.

Development Tools for OCaml

Since the latest release of ocp-indent, Louis continued to improve the tool. We plan to release version 1.2.0 in the next couple of days, with some bug fixes (esp. related to the handling of records) and the following new features: operators are now aligned by default (as well as opened parentheses not finishing a line) and indentation can be bounded using the max_indent parameter. We are also using the great cmdliner which means ocp-indent now has nice manual pages.

We are also preparing a new minor release of OPAM, with a few bug fixes, an improved solver heuristic and improved performance. OPAM statistics seem to converge towards round numbers, as the OCamlPro/opam repository has recently reached 100 “stars” on GitHub, OCamlPro/opam-repository is not very far from being forked 100 times, and the number of unique packages on opam.ocamlpro.com is almost 400. We are also preparing the platform release, with a cleaner and simpler client API to be used by the upcoming “Ocamlot”, the automated infrastructure which will test and improve the quality and consistency of OPAM packages.

Last, we released a very small but already incredibly useful tool: ocp-index. This new tool provides completion based on what is installed on your system, with type and documentation when available. As with ocp-indent, the main goal of this tool is to be easy to integrate into your editor of choice. As a proof of concept, we also distribute a small curses-based tool, called ocp-browser, which lets you interactively browse the libraries installed on your system, as well as an emacs binding for auto-complete.el. Interestingly enough, behind the scenes ocp-index uses a lazy immutable prefix tree with merge operations to efficiently store and load cmi and cmt files.

Other R&D Projects

We continued to work on the Richelieu project. We are currently adding basic type-inference for Scilab programs to our tool scilint, to be able to print warnings on possible programmer mistakes. A first part of the work was to understand how to automatically get rid of some of the eval constructs, especially the deff and evalstr primitives that are often used. After this, Michael manually analyzed some real-world Scilab programs to understand how typing should be done, and he is now implementing the type checker and a table of types for primitive functions.

We are also submitting a new project, called SOCaml, for funding by the French government. In 2010, ANSSI, the French agency for the security of computer systems, commissioned a study, called LAFOSEC, to understand the advantages of using functional languages in the domain of security. Early results of the study were presented at JFLA’2013, with in particular recommendations on how to improve OCaml to use it for security applications. The goal of the SOCaml project would be to implement these recommendations, to improve OCaml, to provide additional tools to detect potential errors and to implement libraries to verify marshaled values and bytecode. We hope the project will be accepted, as it opens a new application domain for OCaml, and would allow us to work on this topic with our partners in the project, such as LexiFi and Michel Mauny’s team at ENSTA Paristech (the project would also contribute to their ocamlcc bytecode-to-c compiler).

wxOCaml, camlidl and Class Modules

2013-04-02T09:05:17Z

Last week, I was bored doing some paperwork, so I decided to hack a little to relieve my mind...

Looking for a GUI Framework for OCaml Beginners

Some time ago, at OCamlPro, we had discussed the fact that OCaml was lacking more GUI frameworks. Lablgtk is powerful, but I don’t like it (and I expect that some other people in the OCaml community share my opinion) for several reasons:

LablGTK makes an extensive use of objects, labels and polymorphic variants. Although using these advanced features of OCaml can help expert OCaml developers, it makes LablGTK hard to use for beginners… and a good reason to have better GUIs is actually to attract beginners!
GTK does not look native under Windows and Mac OS X, giving an outdated feeling about interfaces written with it.

Now, the question was: which GUI framework to support for OCaml? A long time ago, I had heard that wxWidgets (formerly wxWindows) had contributed to the popularity of Python at some point, and I remembered that there was a binding called wxCaml that had been started by SooHyoung Oh a few years ago. I had managed to compile it two years ago, but not to make the examples work, so I decided it was worth another try.

From wxEiffel to wxCaml, through wxHaskell

wxCaml is based on wxHaskell, itself based on wxEiffel, a binding for wxWidgets done for the Eiffel programming language. Since wxWidgets is written in C++, and most high-level programming languages only support bindings to C functions, the wxEiffel developers wrote a first binding from C++ to C, called the ELJ library: for each class wxCLASS of wxWidgets, and for each method Method of that class, they wrote a function wxCLASS_Method that takes the object as first argument, followed by the other arguments of the method, and then calls the method on the first argument with the other arguments. For example, the code for wxWindow looks a lot like this:

EWXWEXPORT(bool,wxWindow_Close)(wxWindow* self,bool _force)
{
    return self->Close(_force);
}

From what I understood, they stopped maintaining this library, so the wxHaskell developers took the code and maintained it for wxHaskell. In wxHaskell, a few include files describe all these C functions. Then, they use a program ‘wxc’ that generates Haskell stubs for all these functions, in a class hierarchy.

In the first version of wxCaml, camlidl was used to generate OCaml stubs from these header files. The header files had to be modified a little, for three reasons:

They are actually not correct: some parts of these header files have not been updated to match the evolution of the wxWidgets API. Some of the classes for which they describe stubs do not exist anymore. The tool used by wxHaskell filters out these classes, because their names are hardcoded in its code, but camlidl cannot.
camlidl needs to know more information than just what is written in C header files. It needs some attributes on types and arguments, like the fact that a char pointer is actually a string, or that a pointer argument to a function is used to return a value. See wxc_types.idl for macros to automate parts of this step.
camlidl was not used a lot, and not maintained for a long time, so there are some bugs in it. For example, the names of the arguments given in IDL header files can conflict with variables generated in C by camlidl (such as “_res”) or with types of the caml C API (such as “value”).

Since the version of wxCaml I downloaded used outdated versions of wxWidgets (wxWindows 2.4.2 when current version is wxWidgets 2.9) and wxHaskell (0.7 when current version is 0.11), I decided to upgrade wxCaml to the current versions. I copied the ELJ library and the header files from the GitHub repository of wxHaskell. Unfortunately, the corresponding wxWidgets version is 2.9.4, which is not yet officially distributed by mainstream Linux distributions, so I had to compile it also.

After the painful work of fixing the new header files for camlidl, I was able to run a few examples of wxCaml. But I was not completely satisfied with it:

To translate the inheritance relation between classes for camlidl, wxCaml makes them equivalent, so that the child can be used where the ancestor can be used. Unfortunately, it also means that the ancestor can be used wherever the child would be, and since most classes are descendants of wxObject, they can all be used in place of each other in the OCaml code!
A typed version of the interface had been started, but it was already making heavy use of objects, which I had decided to ban from the new version, as other advanced features of OCaml.

wxCamlidl, modifying camlidl for wxOCaml

So, I decided to write a new typed interface, where each class would be translated into an abstract type, a module containing its methods as functions, and a few cast functions, from children to ancestors.
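To make that design concrete, here is a pure-OCaml mock of it. Everything in it is an assumption made for illustration: the real bindings go through C stubs, whereas here each object is faked as a string, and the names Classes, make_frame, window_of_frame, window_label and getLabel are invented for the sketch.

```ocaml
(* One abstract type per class. The signature hides the fact that both
   are strings, so a wxWindow cannot be used where a wxFrame is expected. *)
module Classes : sig
  type wxWindow
  type wxFrame
  val make_frame : string -> wxFrame
  val window_of_frame : wxFrame -> wxWindow
  val window_label : wxWindow -> string
end = struct
  type wxWindow = string
  type wxFrame = string
  let make_frame title = title
  let window_of_frame frame = frame
  let window_label window = window
end

(* One module per class, holding its methods as functions. *)
module WxWindow = struct
  let getLabel = Classes.window_label
end

module WxFrame = struct
  let wxnew = Classes.make_frame
  (* Cast function, from child to ancestor only. *)
  let wxWindow = Classes.window_of_frame
  (* Inherited method, propagated statically from the ancestor. *)
  let getLabel frame = WxWindow.getLabel (wxWindow frame)
end

let () =
  let frame = WxFrame.wxnew "hello_world" in
  assert (WxFrame.getLabel frame = "hello_world");
  assert (WxWindow.getLabel (WxFrame.wxWindow frame) = "hello_world")
```

With this layout, WxWindow.getLabel frame is rejected by the type checker: a frame has to be cast explicitly, and there is no cast from an ancestor back to a child.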

I wrote just what was needed to make two simple examples work (hello_world and two_panels, from the wxWidgets tutorials), and I was happy with the result.

But writing the complete interface for all classes and methods by hand would not be possible, so I decided it was time to write a tool for that task.

My first attempt at automating the generation of the typed interface failed because the basic tool I wrote didn’t have enough information to do the task correctly: sometimes, methods would be inherited by a class from an ancestor, without noticing that the descendant had its own implementation of the method. Moreover, I didn’t like the fact that camlidl would write all the stubs into a single file, and my tool into another file, making any small wxOCaml application link with these two huge modules and the complete ELJ library, even if it used only a few of its classes.

As a consequence, I decided that the best spot to generate a modular and typed interface would be camlidl itself. I got a copy of its sources, and created a new module in it, using the symbolic IDL representation to generate the typed version, instead of the untyped version. The module would compute the hierarchy of classes, to be able to propagate statically methods from ancestors to children, and to generate cast functions from children to ancestors.
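The static propagation of methods can be pictured with a small sketch. It is not the code of the modified camlidl: the record type, the helper names and the method lists are all invented for the example. It walks from a class to its ancestors and lets a method declared by a descendant shadow the inherited one.

```ocaml
(* A class as a generator might see it: name, optional parent, and the
   methods it declares itself. The method lists are invented. *)
type cls = { name : string; parent : string option; methods : string list }

let classes = [
  { name = "wxObject"; parent = None; methods = [ "delete" ] };
  { name = "wxWindow"; parent = Some "wxObject";
    methods = [ "setToolTip"; "show" ] };
  { name = "wxFrame"; parent = Some "wxWindow";
    methods = [ "setToolBar"; "show" ] };
]

let find name = List.find (fun c -> c.name = name) classes

(* Ancestors of a class, nearest first. *)
let rec ancestors c =
  match c.parent with
  | None -> []
  | Some p ->
      let pc = find p in
      pc :: ancestors pc

(* Every method callable on [c], paired with the class whose stub
   implements it. Visiting the class before its ancestors means that a
   method redefined in a descendant shadows the inherited version. *)
let all_methods c =
  List.fold_left
    (fun acc k ->
      List.fold_left
        (fun acc m ->
          if List.mem_assoc m acc then acc else acc @ [ (m, k.name) ])
        acc k.methods)
    [] (c :: ancestors c)

let () =
  List.iter
    (fun (m, owner) -> Printf.printf "%s -> %s_%s\n" m owner m)
    (all_methods (find "wxFrame"))
```

The same ancestor list is what a generator would need in order to emit one cast function per ancestor.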

A first generated module, called WxClasses, defines all the wxWidgets classes as abstract types:

type eLJDragDataObject  
and eLJMessageParameters  
…  
and wxDocument  
and wxFrameLayout  
and wxMenu  
and wxMenuBar  
and wxProcess  
and …

Types whose names start with “eLJ…” are classes defined in the ELJ library for wxWidgets classes where methods have to be defined to override basic behaviors.

Classes as modules

For each wxWidgets class, a specific module is created with:

the constructor function, usually called “wxnew”
the methods of the class, and the methods of the ancestors
the cast functions to ancestors

For example, for the WxFrame module, the tool generates this signature:

open WxClasses

external wxnew : (* constructor *)
  wxWindow -> int -> wxString -> int -> int -> int -> int -> int
  -> wxFrame
  = "camlidl_wxc_idl_wxFrame_Create_bytecode"
… (* direct methods *)
external setToolBar : wxFrame -> wxToolBar -> unit
  = "camlidl_wxc_idl_wxFrame_SetToolBar"
… (* inherited methods *)
external setToolTip : wxFrame -> wxString -> unit
  = "camlidl_wxc_idl_wxWindow_SetToolTip"
…
(* string wrappers *)
val wxnew : wxWindow -> int -> string -> int -> int -> int -> int -> int -> wxFrame
val setToolTip : wxFrame -> string -> unit
…
val ptrNULL : wxFrame (* a NULL pointer *)
…
external wxWindow : wxFrame -> wxWindow = "%identity" (* cast function *)
…

In this example, we can see that:

WxFrame first defines the constructor for wxFrame objects. The constructor is later refined, because the stub makes use of wxString arguments, for which the tool creates a wrapper to use OCaml strings instead (using WxString.createUTF8 before the stub and WxString.delete after the stub).
Stubs are then created for direct methods, i.e. functions corresponding to new methods of the class wxFrame. String wrappers are also produced if necessary.
Stubs are also created for inherited methods. Here, “setToolTip” is a method of the class wxWindow (hence its stub name, wxWindow_SetToolTip). Normally, this function is in the WxWindow module, and takes a wxWindow as first argument. But to avoid the need for a cast from wxFrame to wxWindow to use it, we define it again here, allowing a wxFrame directly as first argument.
The module also defines a ptrNULL value that can be used wherever a NULL pointer is expected instead of an object of the class.
Finally, functions like “wxWindow” are cast functions from children to ancestors, allowing a value of type wxFrame to be used wherever a value of type wxWindow is expected.
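The string wrappers described in this list can be sketched as follows. The `WxString` module here is a mock written for the example (the real one allocates a C++ wxString), and `set_tool_tip_stub` is a hypothetical stand-in for a generated stub; only the create-call-delete sequence comes from the description above.

```ocaml
(* Mock of WxString: it only counts live allocations, so that we can
   check that every created string is deleted again. *)
module WxString = struct
  type t = { contents : string }
  let live = ref 0
  let createUTF8 s = incr live; { contents = s }
  let delete (_ : t) = decr live
end

(* Hypothetical stand-in for a generated stub taking a wxString. *)
let set_tool_tip_stub (frame : string) (tip : WxString.t) =
  Printf.printf "%s: tooltip set to %S\n" frame tip.WxString.contents

(* Wrapper exposing an OCaml string: convert before the stub, free the
   temporary wxString after it. *)
let set_tool_tip frame tip =
  let ws = WxString.createUTF8 tip in
  let result = set_tool_tip_stub frame ws in
  WxString.delete ws;
  result

let () = set_tool_tip "main frame" "Click to open"
```

This sketch does not free the string if the stub raises an exception; whether the generated wrappers guard against that is not stated here.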

All functions that could not be put in such files are gathered in a module WxMisc. Finally, the tool also generates a module WxWidgets containing a copy of all constructors with simpler names:

…  
val wxFrame : wxWindow -> int -> string -> int -> int -> int -> int -> int -> wxFrame  
val wxFontMapper : unit -> wxFontMapper  
…

and functions to ignore the results of functions:

…  
external ignore_wxFontMapper : wxFontMapper -> unit = "%ignore"
external ignore_wxFrame : wxFrame -> unit = "%ignore"
…
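One possible benefit of such monomorphic ignore functions can be shown with a short sketch. `%ignore` is the primitive behind the standard `ignore`; the helper `append_item` below is invented for the illustration.

```ocaml
(* Same primitive as Stdlib.ignore, restricted to one argument type. *)
external ignore_int : int -> unit = "%ignore"

(* Invented helper returning a result that callers often discard. *)
let append_item (items : string list ref) (label : string) : int =
  items := label :: !items;
  List.length !items

let () =
  let items = ref [] in
  ignore_int (append_item items "File");
  (* [ignore_int (append_item items)] would not type-check: the partial
     application has type [string -> int], not [int]. The polymorphic
     Stdlib.ignore cannot rule this out by typing alone. *)
  Printf.printf "%d item(s)\n" (List.length !items)
```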

We expect wxOCaml applications to just start with “open WxWidgets” to get access to these constructors, to use functions prefixed by the class module names, and to use constants from the Wxdefs module.

Here is what a minimal application looks like:

open WxWidgets
let _ =
  let onInit event =
    let frame_id = wxID () in
    let quit_id = wxID () in
    let about_id = wxID () in

    (* Create toplevel frame *)
    let frame = wxFrame WxWindow.ptrNULL frame_id "Hello World"
        50 50 450 350 Wxdefs.wxDEFAULT_FRAME_STYLE in
    WxFrame.setStatusText frame "Welcome to wxWidgets!" 0;

    (* Create a menu *)
    let menuFile = wxMenu "" 0 in
    WxMenu.append menuFile about_id "About" "About the application" false;
    WxMenu.appendSeparator menuFile;
    WxMenu.append menuFile quit_id "Exit" "Exit from the application" false;

    (* Add the menu to the frame menubar *)
    let menuBar = wxMenuBar 0 in
    ignore_int (WxMenuBar.append menuBar menuFile "&File");
    WxFrame.setMenuBar frame menuBar;
    ignore_wxStatusBar (WxFrame.createStatusBar frame 1 0);

    (* Handler for QUIT menu *)
    WxFrame.connect frame quit_id Wxdefs.wxEVT_COMMAND_MENU_SELECTED
      (fun _ -> exit 0);

    (* Handler for ABOUT menu *)
    WxFrame.connect frame about_id Wxdefs.wxEVT_COMMAND_MENU_SELECTED
      (fun _ ->
         ignore_int (
           WxMisc.wxcMessageBox "wxWidgets Hello World example."
             "About Hello World"
             (Wxdefs.wxOK lor Wxdefs.wxICON_INFORMATION)
             (WxFrame.wxWindow frame)
             Wxdefs.wxDefaultCoord
             Wxdefs.wxDefaultCoord
         )
      );

    (* Display the frame *)
    ignore_bool ( WxFrame.show frame );
    ELJApp.setTopWindow (WxFrame.wxWindow frame)
  in
  WxMain.main onInit (* Main WxWidget loop starting the app *)

Testers welcome

The current code can be downloaded from our repository on GitHub. It should work with wxWidgets 2.9.4, and the latest version of ocp-build (1.99-beta5).

Of course, as I never wrote an application with wxWidgets before, I could only write a few examples, so I would really appreciate any feedback given by beta testers, especially as there might be errors in the translation to IDL that make important functions impossible to use and that I cannot detect by myself.

I am also particularly interested in feedback on the use of modules for classes, to see if the corresponding style is usable. Our current feeling is that it is more verbose than a purely object-oriented style, but it is simpler for beginners, and improves the readability of code.

Finally, it was a short two-day hack, so it is far from finished. Especially, after hacking wxCamlidl, and looking at the code of the ELJ library, I had the feeling that we could go directly from the C++ header files, or something equivalent, to produce not only the OCaml stubs and the typed interface, but also the C++ to C bindings, and get rid completely of the ELJ library.

An Indentation Engine for OCaml

2013-03-18T09:05:17Z

Since our last activity report we have released the first stable versions of two projects: OPAM, an installation manager for OCaml source packages, and ocp-indent, an indentation tool.

We have already described the basics of OPAM in two previous blog posts, so today we will focus on the release of ocp-indent.

Indentation should be consistent across editors

When you work on a very large code-base, it is crucial to keep a consistent indentation scheme. This is not only good for code review purposes (when the indentation carries semantic properties) but also when your code is starting to evolve and when the one who makes the change is not the one who wrote the initial piece of code. In the latter case, the variety of editors and local configurations usually leads to a lot of small changes carrying no semantic value at all (such as changing tabs to spaces, adding a few spaces at the beginning or end of lines, and so on). This semantic noise considerably decreases the efficiency of any code-review and change process and is usually very good at hiding hard-to-track bugs in your code-base.

A few months ago, the solutions for OCaml to this indentation problem were limited. For instance, you could write coding guidelines and hope that all the developers in your project would follow them. If you wanted to be more systematic, you could create and share a common configuration file for some popular editors (most OCaml developers use Emacs’ tuareg-mode or vim) but it is very hard to get consistent indentation results across multiple tools. Moreover, having to rely on a specific editor mode means that it is harder to fully automate the indentation process, for instance when setting up a VCS hook.

In order to overcome these pitfalls, Jane Street asked us to design a new external tool with the following high-level specification:

it should be easy to use inside and outside any editor;
it should understand the OCaml semantics and reflect it in the indentation;
it should be easy to maintain and to extend.

So we started to look at the OCaml tools’ ecosystem and we found an early prototype of Jun Furuse’s ocaml-indent. The foundation looked great but the result on real-world code sources was not as nice as it could be, so we decided to start from this base to build our new tool, that we called ocp-indent. Today, ocp-indent and ocaml-indent do not have much code in common anymore, but the global architecture of the system remains the same.

Writing an indentation engine for OCaml

An indentation engine may seem like a rather simple problem: given any line in the program, we want to compute its indentation level, based on the code structure.

It turns out to be much more difficult than that, mainly because indentation is only marginally semantic, and, worse, is a matter of taste and “proper layout”. In short, it’s not a problem that can be expressed concisely, because one really does want lots of specific cases handled “nicely”, depending on the on-screen layout — position of line breaks — rather than the semantic structure. Ocp-indent does contain lots of ad-hoc logic for such cases. To make things harder, the OCaml syntax is known to be difficult to handle, with a few ambiguities.

Indent process

Ocp-indent processes code in a simple and efficient way:

We lex the input with a modified version of the OCaml lexer, to guarantee complete consistency with OCaml itself. The lexer had to be modified to be more robust (ocaml fails on errors, the indentation tool should not) and to keep tokens like comments, quotations, and, in the latest version, some ocamldoc block delimiters.
Taking the token stream as input, we maintain a “block” stack that keeps information such as the kinds of blocks we have been through to get to the cursor position, the column and the indentation parameters. For instance, the “block” stack [KBody KFun; KLet; KBody KModule] corresponds to the position of X in the following piece of (pseudo-) code:

…
module Foo = struct
…
let f = fun a -> X

Each token may look up the stack to find its starting counterpart (in will look for KLet, etc.), or disambiguate (= will look for KLet, stopping on opening tokens like KBracket, and will be inserted as an operator if none is found). This is flexible enough to allow for “breaking” the stack when incorrect grammar is found. For example, the unclosed paren in module let x = ( end should not break indent after the end. Great care was taken in deciding which tokens may remove which blocks from the stack, and under which conditions.
The stack can also be used to find a token that we want to align on, typically bars | in a pattern-matching.
On every line break, the stack can be used to compute the indentation of the next line.
In the case of partial file indentation (typically, reindenting one line or a single block), on lines that shouldn’t be reindented the stack is reversely updated to adapt to the current indentation.
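The block stack can be illustrated with a toy model. This is a sketch under simplifying assumptions, not ocp-indent's implementation: the token set is tiny, columns and indentation parameters are left out, and every closing token is allowed to unwind past any unclosed block, whereas the real engine is much more selective.

```ocaml
(* Simplified block kinds, named after the ones quoted above. *)
type kind = KLet | KModule | KParen | KFun | KBody of kind

type token = LET | EQUAL | FUN | ARROW | STRUCT | LPAREN | RPAREN | IN | END

(* Drop blocks from the top of the stack until one satisfies [is_start].
   Returns None when there is no such block. *)
let rec unwind is_start = function
  | [] -> None
  | k :: _ as stack when is_start k -> Some stack
  | _ :: rest -> unwind is_start rest

(* Close the nearest block satisfying [is_start], together with anything
   left unclosed above it. A closer with no counterpart is ignored. *)
let close is_start stack =
  match unwind is_start stack with
  | Some (_ :: rest) -> rest
  | Some [] | None -> stack

let update stack = function
  | STRUCT -> KBody KModule :: stack
  | LET -> KLet :: stack
  | FUN -> KFun :: stack
  | LPAREN -> KParen :: stack
  | EQUAL -> stack (* the toy does not disambiguate [=] *)
  | ARROW -> (
      match unwind (( = ) KFun) stack with
      | Some (_ :: rest) -> KBody KFun :: rest
      | Some [] | None -> stack)
  | RPAREN -> close (( = ) KParen) stack
  | IN -> close (( = ) KLet) stack
  | END -> close (( = ) (KBody KModule)) stack

let run tokens = List.fold_left update [] tokens

let () =
  (* module Foo = struct ... let f = fun a -> X *)
  let at_x = run [ STRUCT; LET; EQUAL; FUN; ARROW ] in
  Printf.printf "open blocks at X: %d\n" (List.length at_x)
```

In this model, `run [ STRUCT; LET; EQUAL; LPAREN; END ]` returns the empty stack: the unclosed parenthesis is discarded when `end` looks for its module.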

Priorities

The part where some abstraction can be put into the engine is the knowledge of the semantics, and more precisely of the scope of the operations. It’s also in that case that the indenter can help you write, and not only read, your code. On that matter, ocp-indent has a knowledge of the precedence of operators and constructs that is used to know how far to unwind the stack, and what to align on. For example, a ; will flush function applications and most operators.

It is that part that gives it the most edge over tuareg, and avoids semantically incorrect indents. All infix operators are defined with a priority, a kind of indentation (indentation increment or alignment over the above concerned expression), and an indentation value (number of spaces to add). So for example most operators have a priority lower than function application, but not ., which yields correct results for:

let f =
  somefun
    record.
      field
    y
  + z

Boolean operators like && and || are set up for alignment instead of indentation:

let r = a
        || b
           && c
        || d

Additionally, some special operators are wanted with a negative alignment in some cases. This is also handled in a generic way by the engine. In particular, this is the case for ; or |:

type t = A
       | B

let r = { f1 = x
        ; f2 = y
        }
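How operator priorities decide what to align on can be pictured with a toy model. Everything in it is an assumption made for illustration (the priority values, the `expr` record, the function names); it only handles alignment, and it is not how ocp-indent is written.

```ocaml
(* An expression that is still open, with the priority of the construct
   that opened it and the column where it starts. *)
type expr = { prio : int; column : int }

(* Invented priorities: a higher value binds tighter. *)
let prio_of_op = function
  | ";" -> 10
  | "||" -> 50
  | "&&" -> 60
  | "+" -> 90
  | _ -> 100

let atom_prio = 1000 (* identifiers and constants bind tightest *)

(* [align_column op stack] closes every open expression (innermost
   first) whose priority is at least that of [op]. The last expression
   closed is the left operand of [op]; its column is where an aligned
   [op] goes. Also returns the expressions that remain open. *)
let align_column op stack =
  let p = prio_of_op op in
  let rec go last = function
    | e :: rest when e.prio >= p -> go (Some e) rest
    | rest -> (last, rest)
  in
  match go None stack with
  | Some operand, rest -> Some (operand.column, rest)
  | None, _ -> None

let () =
  (* "let r = a": the atom [a] starts at column 8 inside the let body *)
  let body = { prio = 0; column = 4 } in
  let a = { prio = atom_prio; column = 8 } in
  match align_column "||" [ a; body ] with
  | Some (col, _) -> Printf.printf "|| aligns at column %d\n" col
  | None -> print_endline "no operand found"
```

In this model a lower-priority operator unwinds further, which is the sense in which a `;` flushes function applications and most operators.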

A note on the integration in editors

ocp-indent can be used on the command-line to reindent whole files (or part of them with --lines), but the most common use of an indenter is from an editor. If you are lucky enough to be able to call OCaml code from your editor, you can use it directly as a library, but otherwise, the preferred way is to use the option --numeric: instead of reprinting the file reindented, it will only output indentation levels, which you can then process from your editor (for instance, using indent-line-to with emacs). That should be cheaper and will help preserve cursor position, etc.
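As a rough sketch of what an editor binding does with such output, the function below applies a list of indentation levels to a list of lines. It assumes one integer per line of input, which is an assumption about the output format, and it does not call ocp-indent itself; the names are invented.

```ocaml
(* Remove leading spaces and tabs from a line. *)
let strip_leading line =
  let n = String.length line in
  let rec first i =
    if i < n && (line.[i] = ' ' || line.[i] = '\t') then first (i + 1)
    else i
  in
  let i = first 0 in
  String.sub line i (n - i)

(* Re-indent [lines] using one indentation level per line. Blank lines
   stay empty; lines without a matching level are left untouched. *)
let rec apply_levels levels lines =
  match levels, lines with
  | level :: ls, line :: rest ->
      let body = strip_leading line in
      let line' = if body = "" then "" else String.make level ' ' ^ body in
      line' :: apply_levels ls rest
  | [], rest -> rest
  | _, [] -> []

let () =
  let source = [ "let f x ="; "x + 1" ] in
  let levels = [ 0; 2 ] in (* as an indenter might report them *)
  List.iter print_endline (apply_levels levels source)
```

An editor would typically set the indentation of each line in place rather than rebuild the text, which is what keeps the cursor position stable.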

Currently, a simple emacs binding working on either the ocaml or the tuareg mode is provided, together with a vim mode contributed by Raphaël Proust and David Powers.

Results

We’ve built ocp-indent based on a growing collection of unit-tests. If you find an indentation bug, feel free to send us a code snippet that we will incorporate into our test suite.

Our tests clearly show that the deep understanding that ocp-indent has of the OCaml syntax makes it shine on specific cases. We are still discussing and evaluating the implementation of a few related corner cases; see for instance the currently failing tests.

We have also run some benchmarks on real code-bases and the result is quite conclusive: ocp-indent is always better than tuareg! This is a very nice result as most of the existing source files are either indented manually or are following tuareg standards. But ocp-indent is also orders of magnitude faster, which means you can integrate it seamlessly into any automatic process.

OPAM 1.0.0 released

2013-03-15T09:05:17Z

I am very happy to announce the first official release of OPAM!

Many of you already know and use OPAM so I won't be long. Please read beta-release-of-opam for a longer description.

1.0.0 fixes many bugs and adds a few new features to the previously announced beta-release.

The most visible new feature, which should be useful for beginners with OCaml and OPAM, is an auto-configuration tool. This tool easily enables all the features of OPAM (auto-completion, fix the loading of scripts for the toplevel, opam-switch-eval alias, etc). This tool runs interactively on each opam init invocation. If you don't like OPAM to change your configuration files, use opam init --no-setup. If you trust the tool blindly, use opam init --auto-setup. You can later review the setup by doing opam config setup --list and call the tool again using opam config setup (and you can of course manually edit your ~/.profile (or ~/.zshrc for zsh users), ~/.ocamlinit and ~/.opam/opam-init/*).

Please report:

Bug reports and feature requests for the OPAM tool: https://github.com/OCamlPro/opam/issues
Packaging issues or requests for new packages: https://github.com/OCamlPro/opam-repository/issues
General queries to: https://lists.ocaml.org/listinfo/platform
More specific queries about the internals of OPAM to: https://lists.ocaml.org/listinfo/opam-devel

Install

Packages for Debian and OSX (at least homebrew) should follow shortly and I'm looking for volunteers to create and maintain rpm packages. The binary installer is up-to-date for Linux and Darwin 64-bit architectures, the 32-bit version for Linux should arrive shortly.

If you want to build from sources, the full archive (including dependencies) is available here:

https://github.com/ocaml/opam/releases/tag/2.1.0

Upgrade

If you are upgrading from 0.9.* you won't have anything special to do apart from installing the new binary. You can then update your package metadata by running opam update. If you want to use the auto-setup feature, remove the "eval `opam config env`" line you have previously added in your ~/.profile and run opam config setup --all.

So everything should be fine. But you never know ... so if something goes horribly wrong in the upgrade process (or if you are upgrading from an old version of OPAM) you can still trash your ~/.opam, manually remove what OPAM added in your ~/.profile (~/.zshrc for zsh users) and ~/.ocamlinit, and start again from scratch.

Random stats

Great success on github. Thanks everybody for the great contributions!

https://github.com/OCamlPro/opam: +2000 commits, 26 contributors
https://github.com/OCamlPro/opam-repository: +1700 commits, 75 contributors, 370+ packages

On http://opam.ocamlpro.com/:
+400 unique visitors per week, 15k 'opam update' per week
+1300 unique visitors per month, 55k 'opam update' per month
3815 unique visitors since the alpha release

Changelog

The full change-log since the beta release in January:

1.0.0 [Mar 2013]

Improve the lexer performance (thx to @oandrieu)
Fix various typos (thx to @chaudhuri)
Fix build issue (thx to @avsm)

0.9.6 [Mar 2013]

Fix installation of pinned packages on BSD (thx to @smondet)
Fix configuration for zsh users (thx to @AltGr)
Fix loading of ~/.profile when using dash (eg. in Debian/Ubuntu)
Fix installation of packages with symbolic links (regression introduced in 0.9.5)

0.9.5 [Mar 2013]

If necessary, apply patches and substitute files before removing a package
Fix opam remove <pkg> --keep-build-dir keeps the folder if a source archive is extracted
Add build and install rules using ocamlbuild to help distro packagers
Support arbitrary level of nested subdirectories in packages repositories
Add opam config exec "CMD ARG1 ... ARGn" --switch=SWITCH to execute a command in a subshell
Improve the behaviour of opam update wrt. pinned packages
Change the default external solver criteria (only useful if you have aspcud installed on your machine)
Add support for global and user configuration for OPAM (opam config setup)
Stop yelling when OPAM is not up-to-date
Update or generate ~/.ocamlinit when running opam init
Fix tests on *BSD (thx Arnaud Degroote)
Fix compilation for the source archive

0.9.4 [Feb 2013]

Disable auto-removal of unused dependencies. This can now be enabled on-demand using -a
Fix compilation and basic usage on Cygwin
Fix BSD support (use type instead of which to detect existing commands)
Add a way to tag external dependencies in OPAM files
Better error messages when trying to upgrade pinned packages
Display depends and depopts fields in opam info
opam info pkg.version shows the metadata for this given package version
Add missing doc fields in .install files
opam list now only shows installable packages

0.9.3 [Feb 2013]

Add system compiler constraints in OPAM files
Better error messages in case of conflicts
Cleaner API to install/uninstall packages
On upgrade, OPAM now performs all the remove actions first
Use a cache for storing the main OPAM metadata: this greatly speeds up OPAM invocations
after an upgrade, propose to reinstall a pinned package only if there were some changes
improvements to the solver heuristics
better error messages on cyclic dependencies

0.9.2 [Jan 2013]

Install all the API files
Fix opam repo remove repo-name
speed-up opam config env
support for opam-foo scripts (which can be called using opam foo)
'opam update pinned-package' works
Fix 'opam-mk-repo -a'
Fix 'opam-mk-repo -i'
clean-up pinned cache dir when a pinned package fails to install

0.9.1 [Jan 2013]

Use ocaml-re 1.2.0

An Overview of our Current Activities

2013-02-18T09:05:17Z

From the early days of OCamlPro, people have been curious about our plans; they were asking how we worked at OCamlPro and what we were doing exactly. Now that we have started releasing projects more regularly, these questions come again. They are very reasonable questions, and we have resolved to be more public and communicate more regularly. This post covers our activities from the beginning of 2013 and updates are scheduled on a monthly basis.

OCamlPro ?

OCamlPro has been created to promote the use of OCaml in the industry. In order to do so, we provide a wide range of services targeted at all stages of typical software projects: we train engineers and we improve the efficiency and usability of the OCaml compiler and tools, we help design new projects, advise on which open-source software components to use and how, we help deliver OCaml software projects and we do custom software development. One extra focus is the increase of the accessibility of OCaml for beginners and students.

Our customers are well-known industrial users such as Jane Street, Citrix and Lexifi, but we also help individual developers lost in the wild of non-OCaml environments make OCaml inter-operate with other components. We also believe that collaborative R&D projects are a great opportunity to make existing companies discover OCaml and its benefits to their products, and we are involved in several of them (see below).

Our engineering team is steadily growing (currently 9 full-time engineers in a joint lab between OCamlPro and INRIA) located in Paris and Nice. We gather a wide range of technical skills and industrial world expertise as we are all coming from major academic and industrial actors such as INRIA, Dassault Systèmes, [MLstate](http://www.mlstate.com/) and [Citrix](http://www.citrix.com/). We also love the OCaml open-source ecosystem: we have been participating in the development of [ocsigen](http://ocsigen.org/), [mirage](http://www.openmirage.org/), [XCP](http://www.xen.org/products/cloudxen.html), [mldonkey](http://mldonkey.sourceforge.net/), [marionnet](http://www.marionnet.org/EN/) and so on. By the way, OCamlPro has some open [positions](/jobs) and we are still looking to hire excellent software engineers!

OCaml Distribution

The first of our technical activities is related to work on the OCaml distribution itself. We are part of the OCaml compiler development team - our INRIA members are part of the Gallium project which develops OCaml at INRIA - and we regularly contribute patches to improve the usability and performance of the compiler itself.

We have recently proposed a series of patches to improve the performance of functions with float arguments and we have started developing a framework to benchmark the efficiency of compiler optimizations.

We are also actively exploring the design-space for concurrency and distribution in OCaml, with an implementation of

a reentrant runtime
a way to instantiate different runtimes in separate system threads in the same process
an efficient multi-scale communication library, between threads and between processes.

We call this multi-runtime OCaml and a prototype is available on github.

Last, we are also making progress with the memory profiling tools. We work on a modified OCaml runtime which can store the location of each allocated block in the heap, with hooks to dump that heap on demand. External tools can then use that dump to produce useful statistics about memory usage of the program. The good news is that we now have a working and usable bytecode runtime and an external tool that produces basic memory information. We are preparing an alpha release in the next month, so stay tuned!

Development Tools

Our efforts to make OCaml more usable go further than looking at the compiler. We are improving the development tools already existing in the community, such as the recently released indentation tool which was initially coming from an experiment from Jun Furuse, and creating new ones when the lack is blatant.

The most recent news on that front concerns OPAM, the package manager for OCaml that we have been developing since mid-2012. For people not familiar with it yet, OPAM is a source-based package manager for OCaml. It supports resolution of complex dependency constraints between packages, multiple simultaneous compiler installations, flexible package constraints, and a Git-friendly development workflow. The beta release was announced in January, and we expect the first official release to happen in the coming weeks. The OCaml community has gratefully welcomed OPAM, and the repository of its package metadata has already become the most forked OCaml project on github! Interestingly, two meetups have gathered more than fifty OPAM users in Paris and Cambridge in January. We really hope this kind of meetup can be generalized: if you want to help us organize one in your area, feel free to contact us!

The other major part of our work around development tools for OCaml is TypeRex. TypeRex is a collection of tools which focus on improving developer efficiency, and lowering the entry barrier for experienced developers who are used to shiny IDEs in other languages. The first version of TypeRex, that was released last year, was a first step in this direction: it provided an enhanced emacs mode for OCaml code, with colorization, indentation, refactoring facilities, completion, code-navigation, documentation tooltips, etc. The next version of TypeRex (simply dubbed typerex2) is underway, with more independent tools (like ocp-indent), less tightly coupled to Emacs, and focused on better integration with various IDEs. If you are interested in following the progress of these tools, you can check the typerex2 OPAM packages with 1.99.X+beta version numbers, which we release on a regular basis.

R&D projects

The idea that OCaml is the right choice to create new innovative products is at the core of OCamlPro. We are very involved in the research community, especially on functional languages, with participation in the program committees of various conferences such as the OCaml Users and Developers (OUD) workshop and the Commercial Users of Functional Programming (CUFP) conference. We also joined two collaborative R&D projects in 2012, the Richelieu FUI and BWare ANR. As part of the Richelieu project, we are developing a JIT compiler for the Scilab language. As part of the BWare project, we improve the efficiency of automatic theorem provers, with a specific focus on Alt-Ergo, an SMT solver particularly suited to program verification. We are always interested in bringing our expertise in compiler technologies and knowledge of complex and distributed systems to new R&D projects: contact us if you are interested!

In the Richelieu project, our combined technical and theoretical expertise proved particularly effective. The research consortium is led by Scilab Entreprises which needed a safer and more efficient execution engine for Scilab in order to compete with Matlab. We joined the consortium to implement the early analysis required by the JIT compiler. The project started last December, and we have since specified the semantics of the language and implemented a working prototype of an interpreter that is already as fast as the current C++ engine of Scilab 6.

Growing the Community

Our last important domain of activity is geared towards the OCaml community. It is important to us that the community grows bigger, and to achieve this goal there are some basic blocks that we need to help build, together with the other main actors of the community.

The first missing block is a good reference documentation. This year will end with (at least) one new important book for the language: Real-World OCaml which targets experienced software engineers who do not know OCaml yet. We collaborate with OCamlLabs to make the technical experience of this book a success. We also work to improve the general experience of using OCaml for complete beginners by creating a stable replacement to the broken ocamlwin, the simple editor distributed with OCaml on Windows.

It is also important to us that OCaml uses the web as a platform to attract new users, as is becoming the norm for modern programming languages. We are members of the ocaml.org building effort and have created tryocaml to let newcomers easily discover the language directly from their browser. TryOcaml has been welcomed as a great tool, already adopted and adapted: see for instance tryrtt or try ReactiveML. We are in the process of simplifying the integration with other compiler variants. Last, but not least, we collaborate very closely with OCamlLabs to create the OCaml Platform: a consistent set of libraries, thoroughly tested and integrated, with a rolling release schedule of 6 months. The platform will be based on OPAM and we are currently designing and prototyping a testing infrastructure to improve and guarantee the quality of packages.

Beta Release of OPAM

2013-01-17T09:05:17Z

OPAM is a source-based package manager for OCaml. It supports multiple simultaneous compiler installations, flexible package constraints, and a Git-friendly development workflow. I have recently announced the beta-release of OPAM on the caml-list, and this blog post introduces the basics to new OPAM users.

Why OPAM

We decided to start writing a brand new package manager for OCaml at the beginning of 2012, after looking at the state of affairs in the OCaml community and not being completely satisfied with the existing solutions, especially regarding the management of dependency constraints between packages. Existing technologies such as GODI, oasis, odb and ocamlbrew did contain lots of good ideas that we shamelessly stole, but the final user experience was not so great, and we disagreed with some of the architectural choices, so it wasn’t so easy to contribute fixes for the existing flaws. Thus we started to discuss the specification of a new package manager with folks from Jane Street, who decided to fund the project, and from the Mancoosi project, to integrate state-of-the-art dependency management technologies. We then hired an engineer to do the initial prototyping work, and this effort finally gave birth to OPAM!

Installing OPAM

OPAM packages are already available for homebrew, macports and arch-linux. Debian and Ubuntu packages should be available quite soon. In any case, you can either use a binary installer or simply install it from sources. To learn more about the installation process, read the installation instructions.

Initializing OPAM

Once you’ve installed OPAM, you have to initialize it. OPAM will store all its state under ~/.opam, so if you want to reset your OPAM configuration, simply remove that directory and restart from scratch. OPAM can either use the compiler installed on your system or it can also install a fresh version of the compiler:

$ opam init # Use the system compiler
$ opam init --comp 4.00.1 # Use OCaml 4.00.1

OPAM will prompt you to add a shell script fragment to your .profile. It is highly recommended to follow these instructions, as it lets OPAM correctly set up the environment variables it needs to compile and configure the packages.

Getting help

The OPAM user manual is built in:

$ opam --help # Get help on OPAM itself
$ opam init --help # Get help on the init sub-command

Basic commands

Once OPAM is initialized, you can ask it to list the available packages, get package information and search for a given pattern in package descriptions:

$ opam list *foo* # list all the packages containing ‘foo’ in their name
$ opam info foo # Give more information on the ‘foo’ package
$ opam search foo # search for the string ‘foo’ in all package descriptions

Once you’ve found a package you would like to install, just run the usual install command.

$ opam install lwt # install lwt and its dependencies
$ opam remove lwt # remove lwt and its dependencies

Later on, you can check whether new packages are available and you can upgrade your package installation.

$ opam update # check if new packages are available
$ opam upgrade # upgrade your packages to the latest version

Casual users of OCaml won’t need to know more about OPAM. Simply remember to run update and upgrade regularly to keep your system up to date.

Use-case 1: Managing Multiple Compilers

A new release of OCaml is available and you want to be able to use it. How do you do this in OPAM? It is as simple as:

$ opam update # pick-up the latest compiler descriptions
$ opam switch 4.00.2 # switch to the new 4.00.2 release
$ opam switch export --switch=system | opam switch import -y

The first line will get the latest package and compiler descriptions, and will tell you if new packages or new compilers are available. Supposing that 4.00.2 is now available, you can then switch to that version using the second command. The last command imports all the packages installed by OPAM for the OCaml compiler installed on your system (if any).

You can also easily use the latest unstable version of OCaml if you want to give it a try:

$ opam switch 4.01.0dev+trunk # install trunk
$ opam switch reinstall 4.01.0dev+trunk # reinstall trunk

Reinstalling trunk means getting the latest changesets and recompiling the packages already installed for that compiler switch.

Use-case 2: Managing Multiple Repositories

Sometimes, you want to let people use a new version of your software early. Or you work in a company and expose internal libraries to your coworkers, but you don’t want them to be available to everybody using OPAM. How can you do that with OPAM? It’s easy! You can set up your own repository (see for instance xen-org’s development packages) and add it to your OPAM configuration:

$ opam repository list # list the repositories available in your config
$ opam repository add xen-org git://github.com/xen-org/opam-repo-dev.git
$ opam repository list # new xen-org repository available

This will add the repository to your OPAM configuration and display the newly available packages. The next time you run opam update, OPAM will scan for any change in the remote git repository.

Repositories can be local (e.g. on your filesystem), remote (available through HTTP), or stored in git or darcs.

Use-case 3: Using Development Packages

You want to try the latest version of a package which has not yet been released, or you have a patched version of a package that you want to try. How can you do it? OPAM has a pin sub-command which lets you do that easily:

$ opam pin lwt /local/path/
$ opam install lwt # install the version of lwt stored in /local/path

You can also use a given branch in a given git repository. For instance, if you want the library re to be compiled with the code in the experimental branch of its development repository you can do:

$ opam pin re git://github.com/ocaml/ocaml-re.git#experimental
$ opam install re

When building the packages, OPAM will use the path set up with the pin command instead of the upstream archives. Also, on the next update, OPAM will automatically check whether some changes happened and whether the packages need to be recompiled:

$ opam update lwt # check for changes in /local/path
$ opam update re # check for change in the remote git branch
$ opam upgrade lwt re # upgrade re and lwt if necessary

Conclusion

I’ve briefly explained some of the main features of OPAM. If you want to go further, I would advise reading the user and packager tutorials. If you really want to understand the internals of OPAM, you can also read the developer manual.

OCamlPro’s Contributions to OCaml 4.00.0

2012-08-20T09:05:17Z

OCaml 4.00.0 was released on July 27, 2012. For the first time, the new OCaml includes some of the work we have been doing during the last year. In this article, I will present our main contributions, mostly funded by Jane Street and Lexifi.

Binary Annotations for Advanced Development Tools

OCaml 4.00.0 has a new option -bin-annot (undocumented, for now, as it is still being tested). This option tells the compiler to dump in binary format a compressed version of the typed tree (an abstract syntax tree with type annotations) to a file (with the .cmt extension for implementation files, and .cmti for interface files). This file can then be used by development tools to provide new features, based on the full knowledge of types in the sources. One of the first tools to use it is the new version of ocamlspotter, by Jun Furuse.

This new option will probably make the old option -annot obsolete (except, maybe, in specific contexts where you don’t want to depend on the internal representation of the typedtree, for example when you are modifying this representation!). Generated files are much smaller than with the -annot option, and much faster to write (during compilation) and to read (for analysis).

New Options for ocamldep

As requested on the bug tracker, we implemented a set of new options for ocamldep:

-all will print all the dependencies, i.e. not only on .cmi, .cmo and .cmx files, but also on source files, and for .o files. In this mode also, no proxying is performed: if there is no interface file, a bytecode dependency will still appear against the .cmi file, and not against the .cmo file as it would before;
-one-line will not break dependencies on several lines;
-sort will print the arguments of ocamldep (filenames) in the order of dependencies, so that the following command should work when all source files are in the same directory:

ocamlopt -o my_program `ocamldep -sort *.ml *.mli`

CFI Directives for Debugging

OCaml tries to make the best use of available registers and stack space, and consequently, its stack layout is quite different from that of C functions. Also, function names are mangled to make them local to their module. As a consequence, debugging native-code OCaml programs has long been a problem with previous versions of OCaml, since the debugger can neither print the stack backtrace correctly nor put breakpoints on OCaml functions.

In OCaml 4.00.0, we worked on a patch submitted on the bug tracker to improve the situation: x86 and amd64 backends now emit more debugging directives, such as the locations in the source corresponding to functions in the assembly (so that you can put breakpoints at function entry), and CFI directives, indicating the correct stack layout, for the debugger to correctly unwind the stack. These directives are part of the DWARF debugging standard.

Unfortunately, line-by-line stepping is not yet available, but here is an example of a session that was not possible with previous versions:

let f x y = List.map ( (+) x ) y
let _ = f 3 [1;2;3;4]

$ ocamlopt -g toto.ml
$ gdb ./a.out
(gdb) b toto.ml:1
Breakpoint 1 at 0x4044f4: file toto.ml, line 1.
(gdb) run
Starting program: /home/lefessan/ocaml-4.00.0-example/a.out

Breakpoint 1, 0x00000000004044f4 in camlToto__f_1008 () at toto.ml:1
1 let f x y = List.map ( (+) x ) y
(gdb) bt
#0 0x00000000004044f4 in camlToto__f_1008 () at toto.ml:1
#1 0x000000000040456c in camlToto__entry () at toto.ml:2
#2 0x000000000040407d in caml_program ()
#3 0x0000000000415fe6 in caml_start_program ()
#4 0x00000000004164b5 in caml_main (argv=0x7fffffffe3f0) at startup.c:189
#5 0x0000000000408cdc in main (argc=<optimized out>, argv=<optimized out>)
    at main.c:56
(gdb)

Optimisation of Partial Function Applications

Few people know that partial applications with multiple arguments are not very efficient. For example, do you know how many closures are dynamically allocated in the following example?

let f x y z = x + y + z
let sum_list_offsets orig list = List.fold_left (f orig) 0 list
let sum = sum_list_offsets 10 [1;2;3]

Most programmers would reply one, f orig, but that’s not all (indeed, f and sum_list_offsets are allocated statically, not dynamically, as they have no free variables). Actually, three more closures are allocated, when List.fold_left is executed on the list, one closure per element of the list.

The reason for this is that OCaml has only two modes to execute functions: either all arguments are present, or just one argument. Prior to 4.00.0, when a function entered the second mode (as f does in the previous example), it would remain in that mode, meaning that the two other arguments would be passed one by one, creating a partial closure between them.

In 4.00.0, we implemented a simple optimization, so that whenever all the remaining expected arguments are passed at once, no partial closure is created and the function is immediately called with all its arguments, leading to only one dynamic closure creation in the example.
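
To observe the effect on a given compiler, one can measure minor-heap allocation around the fold. The following is a minimal sketch and not part of the original patch: the helper words_allocated is a hypothetical name, and the figure it reports depends on the compiler version, on whether the code is compiled to bytecode or native code, and on the small overhead of the measurement itself.

```ocaml
(* The example from the text. *)
let f x y z = x + y + z

(* Partial application: [f orig] is a closure waiting for two more arguments. *)
let sum_list_offsets orig list = List.fold_left (f orig) 0 list

(* Hypothetical helper: words allocated in the minor heap while running
   [thunk]. The value includes a little measurement overhead. *)
let words_allocated thunk =
  let before = Gc.minor_words () in
  let result = thunk () in
  let after = Gc.minor_words () in
  (result, after -. before)

let () =
  let sum, words = words_allocated (fun () -> sum_list_offsets 10 [1; 2; 3]) in
  Printf.printf "sum = %d, minor words allocated = %.0f\n" sum words
```

Running the same program with compilers before and after the optimization, or with a longer list, gives a rough idea of how allocation scales with the number of elements.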

Optimized Pipe Operators

It is sometimes convenient to use the pipe notation in OCaml programs, for example:

let (|>) x f = f x;;
let (@@) f x = f x;;
[1;2;3] |> List.map (fun x -> x + 2) |> List.map print_int;;
List.map print_int @@ List.map (fun x -> x + 1 ) @@ [1;2;3];;

However, such |> and @@ operators are currently not optimized: for example, the last line will be compiled as:

let f1 = List.map print_int;;
let f2 = List.map (fun x -> x + 1);;
let x = f2 [1;2;3;];;
f1 x;;

Which means that partial closures are allocated every time a function is executed with multiple arguments.

In OCaml 4.00.0, we optimized these operators by providing native operators, for which no partial closures are generated:

external (|>) : 'a -> ('a -> 'b) -> 'b = "%revapply";;
external ( @@ ) : ('a -> 'b) -> 'a -> 'b = "%apply"

Now, the previous example is equivalent to:

List.map print_int (List.map ( (+) 1 ) [1;2;3])
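
In current OCaml versions, both operators are predefined in the standard library (since OCaml 4.01), so they can be used without any local definition. The short sketch below is our own illustration, with hypothetical value names, checking that the piped forms and the direct application compute the same result:

```ocaml
(* [|>] and [@@] come from the standard library; nothing to define here. *)
let incremented_forward = [1; 2; 3] |> List.map (fun x -> x + 1)
let incremented_backward = List.map (fun x -> x + 1) @@ [1; 2; 3]

(* The direct application that the piped forms are equivalent to. *)
let incremented_direct = List.map (fun x -> x + 1) [1; 2; 3]

let () =
  assert (incremented_forward = incremented_direct);
  assert (incremented_backward = incremented_direct);
  incremented_direct |> List.iter (Printf.printf "%d ");
  print_newline ()
```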

Bug Fixing

Of course, a lot of our contributions are not as visible as the previous ones. We also spent a lot of time fixing small bugs. Although it doesn’t sound very fun, fixing bugs in OCaml is also fun, because bugs are often challenging to understand, and even more challenging to remove without introducing new ones!

Profiling OCaml amd64 code under Linux

2012-08-08T09:05:17Z

We have recently worked on modifying the OCaml system to be able to profile OCaml code on Linux amd64 systems, using the processor performance counters now supported by stable kernels. This page presents this work, funded by Jane Street.

The patch is provided for OCaml version 4.00.0. If you need it for 3.12.1, some more work is required, as we would need to backport some improvements that were already in the 4.00.0 code generator.

An example: profiling `ocamlopt.opt`

Here is an example of a session of profiling done using both Linux performance tools and a modified OCaml 4.00.0 system (the patch is available at the end of this article).

Linux performance tools are available as part of the Linux kernel (in the linux-tools package on Debian/Ubuntu). Most of the tools are invoked through the perf command, à la git. For example, we are going to check where the time is spent when calling the ocamlopt.opt command:

perf record -g ./ocamlopt.opt -c -I utils -I parsing -I typing typing/*.ml

This command generates a file perf.data in the current directory, containing all the events that were received during the execution of the command. These events contain the values of the performance counters in the amd64 processor, and the call-chain (backtrace) at the event.

We can inspect this file using the command:

perf report -g

The command displays:

Events: 3K cycles
+   9.81%  ocamlopt.opt  ocamlopt.opt           [.] compare_val
+   8.85%  ocamlopt.opt  ocamlopt.opt           [.] mark_slice
+   7.75%  ocamlopt.opt  ocamlopt.opt           [.] caml_page_table_lookup
+   7.40%            as  as                     [.] 0x5812
+   5.60%  ocamlopt.opt  [kernel.kallsyms]      [k] 0xffffffff8103d0ca
+   3.91%  ocamlopt.opt  ocamlopt.opt           [.] sweep_slice
+   3.18%  ocamlopt.opt  ocamlopt.opt           [.] caml_oldify_one
+   3.14%  ocamlopt.opt  ocamlopt.opt           [.] caml_fl_allocate
+   2.84%            as  [kernel.kallsyms]      [k] 0xffffffff81317467
+   1.99%  ocamlopt.opt  ocamlopt.opt           [.] caml_c_call
+   1.99%  ocamlopt.opt  ocamlopt.opt           [.] caml_compare
+   1.75%  ocamlopt.opt  ocamlopt.opt           [.] camlSet__mem_1148
+   1.62%  ocamlopt.opt  ocamlopt.opt           [.] caml_oldify_mopup
+   1.58%  ocamlopt.opt  ocamlopt.opt           [.] camlSet__bal_1053
+   1.46%  ocamlopt.opt  ocamlopt.opt           [.] camlSet__add_1073
+   1.37%  ocamlopt.opt  libc-2.15.so           [.] 0x15cbd0
+   1.37%  ocamlopt.opt  ocamlopt.opt           [.] camlInterf__compare_1009
+   1.33%  ocamlopt.opt  ocamlopt.opt           [.] caml_apply2
+   1.09%  ocamlopt.opt  ocamlopt.opt           [.] caml_modify
+   1.07%            sh  [kernel.kallsyms]      [k] 0xffffffffa07e16fd
+   1.07%            as  libc-2.15.so           [.] 0x97a61
+   0.94%  ocamlopt.opt  ocamlopt.opt           [.] caml_alloc_shr

Using the arrow keys and the Enter key to expand an item, we can get a better idea of where most of the time is spent:

Events: 3K cycles
+ 9.81%  ocamlopt.opt  ocamlopt.opt           [.] compare_val
- compare_val
- 71.68% camlSet__mem_1148
+ 98.01% camlInterf__add_interf_1121
+ 1.99% camlInterf__add_pref_1158
- 21.48% camlSet__add_1073
- camlSet__add_1073
+ 93.41% camlSet__add_1073
+ 6.59% camlInterf__add_interf_1121
+ 1.44% camlReloadgen__fun_1386
+ 1.43% camlClosure__close_approx_var_1373
+ 1.43% camlSwitch__opt_count_1239
+ 1.34% camlTbl__add_1050
+ 1.20% camlEnv__find_1408
+ 8.85%  ocamlopt.opt  ocamlopt.opt           [.] mark_slice
- 7.75%  ocamlopt.opt  ocamlopt.opt           [.] caml_page_table_lookup
- caml_page_table_lookup
+ 50.03% camlBtype__set_commu_1704
+ 49.97% camlCtype__expand_head_1923
+ 7.40%            as  as                     [.] 0x5812
+ 5.60%  ocamlopt.opt  [kernel.kallsyms]      [k] 0xffffffff8103d0ca
+ 3.91%  ocamlopt.opt  ocamlopt.opt           [.] sweep_slice
Press `?` for help on key bindings

We notice that a lot of time is spent in the compare_val primitive, called from the Pervasives.compare function, itself called from the Set module in asmcomp/interf.ml. We can locate the corresponding code at the beginning of the file:

module IntPairSet =
  Set.Make(struct type t = int * int let compare = compare end)

Let's replace the polymorphic function compare by a monomorphic function, optimized for pairs of small ints:

module IntPairSet =
  Set.Make(struct type t = int * int
                  let compare (a1,b1) (a2,b2) =
                  if a1 = a2 then b1 - b2 else a1 - a2
           end)
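
Note that the subtraction trick is only safe when the components are small enough that a1 - a2 cannot overflow. As a hedged variant, and assuming OCaml 4.08 or later for Int.compare, the same idea can be written with explicit integer comparisons, which stay monomorphic without relying on the range of the values. This is an illustrative sketch, not the code used in the compiler:

```ocaml
module IntPairSet = Set.Make (struct
  type t = int * int

  (* Lexicographic order on pairs, using the integer-specialised
     comparison instead of the polymorphic [compare]. *)
  let compare (a1, b1) (a2, b2) =
    let c = Int.compare a1 a2 in
    if c <> 0 then c else Int.compare b1 b2
end)

let () =
  (* The last pair would break a subtraction-based comparison. *)
  let s = IntPairSet.of_list [ (2, 1); (1, 5); (1, 2); (max_int, min_int) ] in
  assert (IntPairSet.mem (1, 5) s);
  assert (IntPairSet.elements s = [ (1, 2); (1, 5); (2, 1); (max_int, min_int) ])
```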

We can now compare the speed of the two versions:

peerocaml:~/ocaml-4.00.0%  time ./ocamlopt.old -c -I utils -I parsing -I typing typing/*.ml
./ocamlopt.old  7.38s user 0.56s system 97% cpu 8.106 total
peerocaml:~/ocaml-4.00.0%  time ./ocamlopt.new -c -I utils -I parsing -I typing typing/*.ml
./ocamlopt.new  6.16s user 0.50s system 97% cpu 6.827 total

And we get an interesting speedup: user time drops from 7.38s to 6.16s. Now, we can iterate the process, check where most of the time is spent in the new version, optimize the critical path and so on.

Installation of the modified OCaml system

A modified OCaml system is required because, for each event, the Linux kernel must attach a backtrace of the stack (call-chain). However, the kernel is not able to use standard DWARF debugging information, and current OCaml stack frames are too complex to be unwound without this DWARF information. Instead, we had to modify the OCaml code generator to follow the same conventions as C for frame pointers, i.e. saving the frame pointer on function entry and restoring it on function exit. This required decreasing the number of available registers from 13 to 12, using %rbp as the frame pointer, leading to an average 3-5% slowdown in execution time.

The patch for OCaml 4.00.0 is available here:

omit-frame-pointer-4.00.0.patch (20 kB, v2, updated 2012/08/13)

To use it, you can use the following recipe, that will compile and install the patched version in ~/ocaml-4.00-with-fp.

~% wget http://caml.inria.fr/pub/distrib/ocaml-4.00.0/ocaml-4.00.0.tar.gz
~% tar zxf ~/ocaml-4.00.0.tar.gz
~% cd ocaml-4.00.0
~/ocaml-4.00.0% wget ocamlpro.com/files/omit-frame-pointer-4.00.0.patch
~/ocaml-4.00.0% patch -p1 < omit-frame-pointer-4.00.0.patch
~/ocaml-4.00.0% ./configure -prefix ~/ocaml-4.00-with-fp
~/ocaml-4.00.0% make world opt opt.opt install
~/ocaml-4.00.0% cd ~
~% export PATH=$HOME/ocaml-4.00-with-fp/bin:$PATH

It is important to know that the patch modifies OCaml calling convention, meaning that ALL THE MODULES AND LIBRARIES in your application must be recompiled with this version.

On our benchmarks, the slowdown induced by the patch is between 3 and 5%. You can still compile your application without frame pointers, for production, using a new option -fomit-frame-pointer that was added by the patch.

This patch has been submitted for inclusion in OCaml. You can follow its status and contribute to the discussion here: http://caml.inria.fr/mantis/view.php?id=5721

Packing and Functors

2011-08-10T09:05:17Z

We have recently worked on modifying the OCaml system to be able to pack a set of modules within a functor, parameterized on some signatures. This page presents this work, funded by Jane Street.

All the patches on this page are provided for OCaml version 3.12.1.

Packing Functors

Installation of the modified OCaml system

The patch for OCaml 3.12.1 is available here:

ocaml+libfunctor-3.12.1.patch.gz (26 kB)

To use it, you can use the following recipe, that will compile and install the patched version in ~/ocaml+libfunctor-3.12.1/bin/.

~% wget http://caml.inria.fr/pub/distrib/ocaml-3.12/ocaml-3.12.1.tar.gz
~% tar zxf ~/ocaml-3.12.1.tar.gz
~% cd ocaml-3.12.1
~/ocaml-3.12.1% wget ocamlpro.com/code/ocaml+libfunctor-3.12.1.patch.gz
~/ocaml-3.12.1% gzip -d ocaml+libfunctor-3.12.1.patch.gz
~/ocaml-3.12.1% patch -p1 < ocaml+libfunctor-3.12.1.patch
~/ocaml-3.12.1% ./configure --prefix ~/ocaml+libfunctor-3.12.1
~/ocaml-3.12.1% make coldstart
~/ocaml-3.12.1% make ocamlc ocamllex ocamltools
~/ocaml-3.12.1% make library-cross
~/ocaml-3.12.1% make bootstrap
~/ocaml-3.12.1% make all opt opt.opt
~/ocaml-3.12.1% make install
~/ocaml-3.12.1% cd ~
~% export PATH=$HOME/ocaml+libfunctor-3.12.1/bin:$PATH

Note that it needs to bootstrap the compiler, as the format of object files is not compatible with the one of ocaml-3.12.1.

Usage of the lib-functor patch.

Now that you are equipped with the new system, you can start using it. The lib-functor patch adds two new options to the compilers ocamlc and ocamlopt:

-functor <interface_file> : this option is used to specify that the current module is compiled with the interface files specifying the argument of the functor. This option should be used together with -for-pack <module>, where <module> is the name of the module in which the current module will be embedded.
-pack-functor <module> : this option is used to pack the modules. It should be used with the option -o <object_file> to specify in which module it should be embedded. The <module> specified with -pack-functor is the name of the functor that will be created in the target object file.

If the interface x.mli contains:

type t
val compare : t -> t -> int

and the files xset.ml and xmap.ml contain, respectively:

module T = Set.Make(X)

module T = Map.Make(X)

Then:

~/test% ocamlopt -c -for-pack Xx -functor x.cmi xset.ml
~/test% ocamlopt -c -for-pack Xx -functor x.cmi xmap.ml
~/test% ocamlopt -pack-functor MakeSetAndMap -o xx.cmx xset.cmx xmap.cmx

will construct a compiled unit whose signature (which you can print with ocamlopt -i xx.cmi, see below) is:

module MakeSetAndMap :
functor (X : sig type t val compare : t -> t -> int end) -> sig
  module Xset : sig
    module T : sig
      type elt = X.t
      type t = Set.Make(X).t
      val empty : t
      val is_empty : t -> bool
      …
    end
  end
  module Xmap : sig
    module T : sig
      type key = X.t
      type 'a t = 'a Map.Make(X).t
      val empty : 'a t
      val is_empty : 'a t -> bool
      …
    end
  end
end
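
For readers without the patched compiler, the shape of this signature can be reproduced in plain OCaml source. The sketch below hand-writes a functor with the same structure as the packed unit; it only illustrates the resulting module layout, not how the patch builds it, and the instantiation with Int (OCaml 4.08 or later) and the name IntSetAndMap are our own choices for the example.

```ocaml
(* Hand-written equivalent of the packed functor's structure. *)
module MakeSetAndMap (X : sig
  type t
  val compare : t -> t -> int
end) =
struct
  module Xset = struct module T = Set.Make (X) end
  module Xmap = struct module T = Map.Make (X) end
end

(* Instantiate with integers; [Int] provides [t] and [compare]. *)
module IntSetAndMap = MakeSetAndMap (Int)

let () =
  let set = IntSetAndMap.Xset.T.(add 2 (add 1 empty)) in
  let map = IntSetAndMap.Xmap.T.(add 1 "one" empty) in
  assert (IntSetAndMap.Xset.T.mem 2 set);
  assert (IntSetAndMap.Xmap.T.find 1 map = "one")
```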

Other extension: printing interfaces

OCaml only allows you to print the interface of a module or interface by compiling its source with the -i option. However, you don’t always have the source of an object interface (in particular, if it was generated by packing), and you might still want to do it.

In such a case, the lib-functor patch allows you to do that, by using the -i option on an interface object file:

~/test% cat > a.mli
val x : int
~/test% ocamlc -c -i a.mli
val x : int
~/test% ocamlc -c -i a.cmi
val x : int

Other extension: packing interfaces

OCaml only allows you to pack object files inside another object file (.cmo or .cmx). When doing so, you can either provide a source interface (.mli) that you need to compile to provide the corresponding object interface (.cmi), or the object interface will be automatically generated by exporting all the sub-modules within the packed module.

However, sometimes, you would want to be able to specify the interfaces of each module separately, so that:

you can reuse most of the interfaces you already specified
you can use a different interface for a module than the one used to compile the other modules. This happens when you want to export more values to the other internal sub-modules than you want to export to the user.

In such a case, the lib-functor patch allows you to do that, by using the -pack option on interface object files:

test% cat > a.mli
val x : int
test% cat > b.mli
val y : string
test% ocamlc -c a.mli b.mli
test% ocamlc -pack -o c.cmi a.cmi b.cmi
test% ocamlc -i c.cmi
module A : sig val x : int end
module B : sig val y : string end
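
The second motivation, exposing more to sibling modules than to the user, can be illustrated in plain OCaml with a signature constraint. This is only a sketch of the idea within a single source file, not of the patch itself; the names A_internal, B_internal, C and secret are hypothetical.

```ocaml
(* Internal view of A: siblings can see [secret]. *)
module A_internal = struct
  let x = 1
  let secret = 42
end

(* B is compiled against the wider, internal view of A. *)
module B_internal = struct
  let y = string_of_int A_internal.secret
end

(* The packed module exports a narrower interface for A to the user. *)
module C : sig
  module A : sig val x : int end
  module B : sig val y : string end
end = struct
  module A = A_internal
  module B = B_internal
end

let () = Printf.printf "x = %d, y = %s\n" C.A.x C.B.y
```

Here C.A.secret is rejected by the type checker, even though B was built using it.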

Using `ocp-pack` to pack source files

Installation of ocp-pack

Download the source file from:

ocp-pack-1.0.1.tar.gz (20 kB, GPL Licence, Copyright OCamlPro SAS)

Then, you just need to compile it with:

~% tar zxf ocp-pack-1.0.1.tar.gz
~% cd ocp-pack-1.0.1
~/ocp-pack-1.0.1% make
~/ocp-pack-1.0.1% make install

Usage of `ocp-pack`

ocp-pack can be used to pack the source files of several modules into a single source file. It allows you to avoid the -pack option, which is not supported by all OCaml tools (for example, ocamldoc). Moreover, ocp-pack tries to provide the correct locations to the compiler, so errors are not reported within the generated source file, but within the original source files.

It supports the following options:

% ocp-pack -help
Usage:
    ocp-pack -o target.ml [options] files.ml*

Options:
  -o <filename.ml>          generate filename filename.ml
  -rec                      use recursive modules
                            all .ml files must have a corresponding .mli file
  -pack-functor <modname>   create functor with name <modname>
  -functor <filename.mli>   use filename as an argument for functor
  -mli                      output the .mli file too
                            .ml files without .mli file will not export any value
  -no-ml                    do not output the .ml file
  -with-ns                  use directory structure to create a hierarchy of modules
  -v                        increment verbosity
  --version                 display version information

ocp-pack automatically detects interface sources and implementation sources. When only the interface source is available, it is assumed that it is a type-only module, i.e. no val items are present inside.

Here is an example of using ocp-pack to build the ocamlgraph package:

test% ocp-pack -o graph.ml \
lib/bitv.ml lib/heap.ml lib/unionfind.ml \
src/sig.mli src/dot_ast.mli src/sig_pack.mli \
src/version.ml src/util.ml src/blocks.ml \
src/persistent.ml src/imperative.ml src/delaunay.ml \
src/builder.ml src/classic.ml src/rand.ml src/oper.ml \
src/path.ml src/traverse.ml src/coloring.ml src/topological.ml \
src/components.ml src/kruskal.ml src/flow.ml src/graphviz.ml \
src/gml.ml src/dot_parser.ml src/dot_lexer.ml src/dot.ml \
src/pack.ml src/gmap.ml src/minsep.ml src/cliquetree.ml \
src/mcs_m.ml src/md.ml src/strat.ml
test% ocamlc -c graph.ml
test% ocamlopt -c graph.ml

The -with-ns option can be used to automatically build a hierarchy of modules. With that option, sub-directories are seen as sub-modules. For example, packing a/x.ml, a/y.ml and b/z.ml will give a result like:

module A = struct
  module X = struct … end
  module Y = struct … end
end
module B = struct
  module Z = struct … end
end

Packing modules as functors

The -pack-functor and -functor options provide the same behavior as the same options with the lib-functor patch. The only difference is that -functor takes the interface source as argument, not the interface object.

Packing recursive modules

When trying to pack modules with ocp-pack, you might discover that your toplevel modules have recursive dependencies. This usually happens when types are declared abstract in the interfaces but depend on each other in the implementations. Such modules cannot simply be packed by ocp-pack.

To handle them, ocp-pack provides a -rec option. With that option, modules are put within a module rec construct, and are all required to be accompanied by an interface source file.
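
As a reminder of what such a construct looks like, here is a small hand-written module rec example in standard OCaml. It is an illustrative sketch, not output from ocp-pack: the module names Even and Odd are hypothetical, and each module carries an explicit signature, which mirrors the requirement that every packed module comes with an interface file.

```ocaml
(* Two mutually recursive modules; each one needs an explicit signature. *)
module rec Even : sig
  val is_even : int -> bool
end = struct
  let is_even n = n = 0 || Odd.is_odd (n - 1)
end

and Odd : sig
  val is_odd : int -> bool
end = struct
  let is_odd n = n <> 0 && Even.is_even (n - 1)
end

let () =
  Printf.printf "10 is even: %b, 7 is odd: %b\n"
    (Even.is_even 10) (Odd.is_odd 7)
```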

Moreover, in many cases, OCaml is not able to compile such recursive modules:

For typing reasons: recursive modules are typed in an environment containing only an approximation of other recursive modules signatures
For code generation reasons: recursive modules can be reordered depending on their shape, and this reordering can generate an order that is actually not safe, leading to an exception at runtime

To solve these two issues in most cases, you can use the following patch (you can apply it using the same recipe as for lib-functor, and even apply both patches on the same sources):

ocaml+rec-3.12.1.patch.gz

With this patch, recursive modules are typed in an environment that is enriched progressively with the final types of the modules as soon as they become available. Also, during code generation, a topological order is computed on the recursive modules, and the subset of modules that can be initialized in that topological order is immediately generated, leaving only the other modules to be reordered.

OCaml and Windows

2011-06-23T09:05:17Z

Recently, I have been experimenting with OCaml / MSVC running on Windows 7 64-bit. I mainly followed the instructions in OCaml’s README.win32 and learned some NSIS tricks. The result of this experiment is the following two (rather big) Windows binaries:

ocaml-trunk-64-installer.exe (92 MB)
ocaml-3.12-64-installer.exe (92 MB)

These binaries are auto-installers for:

the OCaml distribution (either the 3.12.1+rc1 version or trunk);
Emacs (version 23.3) + tuareg mode (version 2.0.4);
OCamlGraph (version 1.7) : this is just a little experiment with packaging external libraries.

Hopefully, all of this might be useful to some people, at least to people looking for an alternative to WinOcaml which seems to be broken. You should need no other dependencies if you just want to use the OCaml top-level (ocaml.exe). If you want to compile your project you will need MSVC installed and correctly set-up. If your project is using Makefiles then you should probably install cygwin as well. I can give more details if some people are interested.

Unfortunately, the current process for creating these binaries involves an awful lot of manual steps (including switching from the Windows Terminal to the cygwin shell) and, further, many OCaml packages won’t install directly on Windows (as most of them use shell tricks to be configured correctly). I hope we will be able to release something cleaner at a later stage.

OCaml Cheat Sheets

2011-06-03T09:05:17Z

When you are beginning in a new programming language, it is sometimes helpful to have an overview of the documentation that you can pin on your wall and glance at while you are programming. Since we couldn’t find such cheat sheets, we decided to start writing our own for OCaml.

Beware, these documents are drafts that we plan to improve in the coming months. In the meantime, feel free to tell us how we could improve them, what is missing, and where the focus should be!

The OCaml Language (June 8, 2011)
OCaml Standard Tools (June 7, 2011)
OCaml Standard Library (June 7, 2011)
OCaml Emacs Mode (Tuareg) (June 27, 2011)

OCaml 32bits longval

2011-05-06T09:05:17Z

You will need OCaml 3.11.2 installed on an i686 Linux computer. The archive contains:

libcamlrun-linux-i686.a
ocamlrun-linux-i686
Makefile
README

The Makefile has two targets:

sudo make install will save /usr/bin/ocamlrun and /usr/lib/ocaml/libcamlrun.a in the current directory and replace them with the longval binaries.
sudo make restore will restore the saved files.

If your install directories are not the default ones, you should modify the Makefile. After installing, you can test it with the standard OCaml top-level:

Objective Caml version 3.11.2


# let s = ref "";;
val s : string ref = {contents = ""}

# s := String.create 20_000_000;;
- : unit = ()

Now you can enjoy big values in all your strings and arrays in bytecode. You will need to relink all your custom binaries. If you are interested in the native version of the longval compiler, you can contact us.
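
To check whether a given runtime can hold such a value, the limits can be read from the Sys module. This is a small sketch using only the standard library; on a stock 32-bit runtime, Sys.max_string_length is around 16 MB, which is why the 20_000_000-byte allocation above is a meaningful test. Bytes.create is used here because String.create is no longer available in recent OCaml versions, and the name requested is our own.

```ocaml
(* Size used in the toplevel session above. *)
let requested = 20_000_000

let () =
  Printf.printf "word size: %d bits\n" Sys.word_size;
  Printf.printf "max string length: %d\n" Sys.max_string_length;
  Printf.printf "max array length: %d\n" Sys.max_array_length;
  if requested <= Sys.max_string_length then begin
    (* Within the limit reported by this runtime, so allocation is allowed. *)
    let buf = Bytes.create requested in
    Printf.printf "allocated %d bytes\n" (Bytes.length buf)
  end else
    print_endline "20_000_000 bytes exceeds this runtime's string limit"
```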
