wxOCaml, camlidl and Class Modules

Authors: Çagdas Bozman
Date: 2015-04-13
Category: Tooling



A few months ago, a memory leak in the Scanf.fscanf function of OCaml’s standard library was reported on the OCaml mailing list. The following “minimal” example reproduces this misbehavior:

for i = 0 to 100_000 do
  let ic = open_in "some_file.txt" in
  Scanf.fscanf ic "%s" (fun _s -> ());
  close_in ic
done;;

read_line ();;

Let us see how to identify the origin of the leak and fix it with our OCaml memory profiler.

Installing the OCaml Memory Profiler

We first install our modified OCaml compiler and the memory profiling tool with the following opam commands:

$ opam remote add memprof http://memprof.typerex.org/opam
$ opam update
$ opam switch 4.01.0+ocp1-20150202
$ opam install ocp-memprof
$ eval `opam config env`

That’s all! Installation takes only five (opam) commands.

Compiling and Executing the Example

The second step consists in compiling the example above and profiling it. This is simply achieved with the commands:

$ ocamlopt scanf_leak.ml -o scanf.x
$ ocp-memprof --exec scanf.x

You may notice that no instrumentation of the source is needed to enable profiling.

Visualizing the Results

In the last command above, scanf.x dumps a lot of information about memory occupation during its execution. Our “OCaml Memory Profiler” then analyzes these dumps and generates a “human readable” graph that shows the evolution of memory consumption after each OCaml garbage collection. Concretely, this yields the graph below (the interactive graph generated by ocp-memprof is available here). As you can see, memory consumption grows abnormally and exceeds 240 MB! Note that we stopped scanf.x after 90 seconds.
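For a rough feel of the kind of measurement such a graph is built from, the standard Gc module can report the size of the live heap after a full collection. The sketch below is not how ocp-memprof works (the profiler analyzes the dumps produced during execution); the names live_words_after_gc and retained are hypothetical and serve only as an illustration.

```ocaml
(* Crude stand-in for a memory graph: sample the live heap after a full GC.
   This is NOT ocp-memprof's mechanism, only an illustration using Gc. *)

(* Number of live words on the major heap once garbage has been collected. *)
let live_words_after_gc () =
  Gc.full_major ();
  (Gc.stat ()).Gc.live_words

(* Hypothetical workload: a global list that keeps every buffer reachable,
   mimicking a structure that grows without bound. *)
let retained : Bytes.t list ref = ref []

let () =
  let before = live_words_after_gc () in
  for _i = 1 to 1_000 do
    retained := Bytes.create 1024 :: !retained
  done;
  let after = live_words_after_gc () in
  Printf.printf "live words: %d -> %d\n" before after
```

Sampling by hand like this only gives a total; it says nothing about which module or function allocated the data, which is what the profiler's graph adds.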

Playing With (Some of) ocp-memprof Capabilities

ocp-memprof lets you group and display the data contained in the graph according to several criteria. For instance, data are grouped by “Modules” in the capture below. This allows us to deduce that most allocations are performed in the Scanf and Buffer modules.

In addition to aggregation capabilities, the interactive graph generated by ocp-memprof also lets you “zoom” in on particular data. For instance, by looking at Scanf, we obtain the graph below, which shows the different functions that allocate in this module. We notice that the function that allocates the most is Scanf.Scanning.from_ic. Let us have a look at this function.

From Profiling Graphs to Source Code

The function from_ic, which is responsible for most of the allocation in Scanf, is called from memo_from_ic, whose code is the following:

let memo_from_ic =
  let memo = ref [] in
  (fun scan_close_ic ic ->
     try
       List.assq ic !memo
     with
     | Not_found ->
       let ib = from_ic scan_close_ic (From_channel ic) ic in
       memo := (ic, ib) :: !memo;
       ib)
;;

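It looks like the leak is caused by the memo list, which associates a lookahead buffer, resulting from the call to from_ic, with each input channel. Entries are added to this list but never removed from it, so every channel and its buffer stay reachable even after the channel has been closed.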
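The growth pattern can be reproduced outside of Scanf with a small stand-alone model. This is only a sketch: make_buffer, memo_lookup and the Bytes.t key type are hypothetical stand-ins for from_ic, memo_from_ic and in_channel; only the shape of the cache is the same.

```ocaml
(* Stand-alone model of the memoisation pattern above.
   [make_buffer] and the key type are hypothetical stand-ins for
   [from_ic] and [in_channel]; only the shape of the cache is the same. *)

let memo : (Bytes.t * int) list ref = ref []

(* Stand-in for building a lookahead buffer. *)
let make_buffer key = Bytes.length key

(* Same structure as memo_from_ic: look the key up by physical equality,
   and on a miss build the value and push it onto the list. *)
let memo_lookup key =
  try List.assq key !memo
  with Not_found ->
    let v = make_buffer key in
    memo := (key, v) :: !memo;
    v

let () =
  (* Each iteration uses a fresh key, just as each open_in returns a
     fresh channel, so every lookup misses and the list grows by one. *)
  for _i = 1 to 1_000 do
    ignore (memo_lookup (Bytes.create 8))
  done;
  Printf.printf "entries kept alive: %d\n" (List.length !memo)
```

After the loop the list holds 1000 entries, one per key, and none of them can be collected because the list itself is reachable from a global reference.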

Patching the Code

Benoit Vaugon quickly sent a patch based on weak pointers that seems to solve the problem. He modified the code as follows:

  • he put the key in a weak set (IcMemo) in order to test whether it is gone;
  • he created a pair that stores the key and the associated value;
  • he put this pair in a second weak set (PairMemo), where it would be reclaimed at the next GC, since a weak set does not keep its elements alive;
  • he added a finalizer on the pair that adds the pair back to the weak set at each GC, as long as the key is still in IcMemo.

let memo_from_ic =
  let module IcMemo = Weak.Make (
    struct
      type t = Pervasives.in_channel
      let equal ic1 ic2 = ic1 = ic2
      let hash ic = Hashtbl.hash ic
    end) 
  in
  let module PairMemo = Weak.Make (
    struct
      type t = Pervasives.in_channel * in_channel
      let equal (ic1, _) (ic2, _) = ic1 = ic2
      let hash (ic, _) = Hashtbl.hash ic
    end) 
  in
  let ic_memo = IcMemo.create 16 in
  let pair_memo = PairMemo.create 16 in
  let rec finaliser ((ic, _) as pair) =
    if IcMemo.mem ic_memo ic then (
      Gc.finalise finaliser pair;
      PairMemo.add pair_memo pair) in
  (fun scan_close_ic ic ->
     try snd (PairMemo.find pair_memo (ic, stdin)) with
     | Not_found ->
       let ib = from_ic scan_close_ic (From_channel ic) ic in
       let pair = (ic, ib) in
       IcMemo.add ic_memo ic;
       Gc.finalise finaliser pair;
       PairMemo.add pair_memo pair;
       ib)
;;
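The patch relies on the fact that a weak set built with Weak.Make does not keep its elements alive. A minimal sketch of that behavior follows; the IntRefSet module and its int ref element type are hypothetical, chosen only because the values are heap-allocated, and are unrelated to the Scanf patch itself.

```ocaml
(* Weak sets hold their elements without keeping them alive.
   The element type (int ref) is a hypothetical stand-in chosen because
   it is heap-allocated; it is unrelated to the Scanf patch itself. *)
module IntRefSet = Weak.Make (struct
  type t = int ref
  let equal a b = !a = !b
  let hash r = Hashtbl.hash !r
end)

let set = IntRefSet.create 16

(* Add [n] fresh elements that nothing else references. *)
let fill n =
  for i = 1 to n do
    IntRefSet.add set (ref i)
  done

(* An element that stays reachable through a global binding. *)
let kept = ref 0

let () =
  fill 1_000;
  IntRefSet.add set kept;
  let before = IntRefSet.count set in
  Gc.full_major ();
  let after = IntRefSet.count set in
  Printf.printf "elements before GC: %d, after GC: %d\n" before after;
  (* [kept] is still referenced here, so it survives in the set. *)
  assert (IntRefSet.mem set kept)
```

Elements that are referenced only from the set disappear after a major collection, while kept survives. This is why the patch needs the finalizer: without it, the (ic, ib) pair would vanish from PairMemo at the first GC even though the channel is still in use.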

Checking the Fixed Version

Curious to see the memory behavior after applying this patch? The graph below shows the memory consumption of the patched version of the Scanf module. Again, the interactive version is available here. After each iteration of the for-loop, the memory is released as expected, and memory consumption never exceeds 2.1 MB.

Conclusion

This example is online in our gallery of examples if you want to see and explore the graphs (with the leak and without the leak).

Do not hesitate to use ocp-memprof on your applications. Of course, all feedback and suggestions on using ocp-memprof are welcome: just send us an email!

About OCamlPro:

OCamlPro has been developing high-value-added applications for more than 10 years, using the most advanced languages, such as OCaml and Rust, aiming at both speed of development and robustness, and targeting the most demanding domains (formal methods, cybersecurity, distributed systems/blockchain, DSL design). We have more than 20 R&D engineers, with unique expertise in programming languages that is both theoretical (more than 80% of our engineers hold a PhD in computer science) and practical (active participation in the development of several open-source compilers, prototyping of the Tezos blockchain, etc.), diversified (OCaml, Rust, Cobol, Python, Scilab, C/C++, etc.) and applied to many domains. We also provide [custom, Qualiopi-certified training courses on OCaml, Rust, and formal methods](https://training.ocamlpro.com/). To contact us: contact@ocamlpro.com.