Sub-single-instruction Peano to machine integer conversion

Authors: Arthur Carcano
Date: 2023-01-23
Category: Rust
Tags: Rust



It is a rainy end of January in Paris, morale is getting soggier by the day, and the bulk of our light exposure needs are now fulfilled by our computer screens as the sun seems to have definitively disappeared behind a continuous stream of low-hanging clouds. But, all is not lost, the warm rays of comradeship pierce through the bleak afternoons, and our joyful party of adventurers once again embarked on an adventure of curiosity and rabbit-hole depth-first-searching.

Last week's quest led us to a treasure coveted by a mere handful of enlightened connoisseurs, but a treasure of tremendous nerdy-beauty, known to the academics as "Sub-single-instruction Peano to machine integer conversion" and to the locals as "How to count how many nested Some there are very very fast by leveraging druidic knowledge about unspecified, undocumented, and unstable behavior of the Rust compiler".

Our quest begins

Our whole quest started when we wanted to learn more about discriminant elision. Discriminant elision in Rust is part of what makes it practical to use Option<&T> in place of *const T. More precisely it is what allows Option<&T> to fit in as much memory as *const T, and not twice as much. To understand why, let's consider an Option<u64>. An u64 is 8 bytes in size. An Option<u64> should have at least one more bit, to indicate whether it is a None, or a Some. But bits are not very practical to deal with for computers, and hence this discriminant value -- indicating which of the two variants (Some or None) the value is -- should take up at least one byte. Because of alignment requirements (and because the size is always a multiple of the alignment) it actually ends up taking 8 bytes as well, so that the whole Option<u64> occupies twice the size of the wrapped u64.

In languages like C, it is very common to pass around pointers, and give them a specific meaning if they are null. Typically, a function like lfind which searches for an element in a array will return a pointer to the matching element, and this pointer will be null if no such element was found. In Rust however fallibility is expected to be encoded in the type system. Hence, functions like find returns a reference, wrapped in a Option. Because this kind of API is so ubiquitous, it would have been a real hindrance to Rust adoption if it took twice as much space as the C version.

This is why discriminant elision exists. In our Option<&T> example Rust can leverage the same logic as C: &T references in Rust are guaranteed to be -- among other things -- non-null. Hence Rust can choose to encode the None variant as the null value of the variable. Transparently to the user, our Option<&T> now fits on 8 bytes, the same size as a simple &T. But Rust discriminant elision mechanism goes beyond Option<&T> and works for any general type if:

  1. The option-like value has one fieldless variant and one single-field variant
  2. The wrapped type has so-called niche values, that is values that are statically known to never be valid for said type.

Discriminant elision remains under-specified, but more information can be found in the FFI guidelines. Note that other unspecified situations seem to benefit from niche optimization (e.g. PR#94075).

Too many options

Out of curiosity, we wanted to investigate how the Rust compiler represents a series of nested Option. It turns out that up to 255 nested options can be stored into a byte, which is also the theoretical limit. Because this mechanism is not limited to Option, we can use it with (value-level) Peano integers. Peano integers are a theoretical encoding of integer in "unary base", but it is enough for this post to consider them a fun little gimmick. If you want to go further, know that Peano integers are more often used at the type-level, to try to emulate type-level arithmetic.

In our case, we are mostly interested in Peano-integers at the value level. We define them as follows:

#![recursion_limit = "512"]
#![allow(dead_code)]

/// An empty enum, a type without inhabitants.
/// Cf: https://en.wikipedia.org/wiki/Bottom_type
enum Null {}

/// PeanoEncoder<Null> is a Peano-type able to represent integers up to 0.
/// If T is a Peano-type able to represent integers up to n
/// PeanoEncoder<T> is a Peano-type able to represent integers up to n+1
#[derive(Debug)]
enum PeanoEncoder<T> {
    Successor(T),
    Zero,
}

macro_rules! times2 {
    ($peano_2x:ident, $peano_x:ident ) => {
        type $peano_2x<T> = $peano_x<$peano_x<T>>;
    };
}
times2!(PeanoEncoder2, PeanoEncoder);
times2!(PeanoEncoder4, PeanoEncoder2);
times2!(PeanoEncoder8, PeanoEncoder4);
times2!(PeanoEncoder16, PeanoEncoder8);
times2!(PeanoEncoder32, PeanoEncoder16);
times2!(PeanoEncoder64, PeanoEncoder32);
times2!(PeanoEncoder128, PeanoEncoder64);
times2!(PeanoEncoder256, PeanoEncoder128);

type Peano0 = PeanoEncoder<Null>;
type Peano255 = PeanoEncoder256<Null>;

Note that we cannot simply go for

enum Peano {
    Succesor(Peano),
    Zero,
}

like in Haskell or OCaml because without indirection the type has infinite size, and adding indirection would break discriminant elision. What we really have is that we are actually using a type-level Peano-encoding of integers to create a type Peano256 that contains value-level Peano-encoding of integers up to 255, as a byte would.

We can define the typical recursive pattern matching based way of converting our Peano integer to a machine integer (a byte).

trait IntoU8 {
    fn into_u8(self) -> u8;
}

impl IntoU8 for Null {
    fn into_u8(self) -> u8 {
        match self {}
    }
}

impl<T: IntoU8> IntoU8 for PeanoEncoder<T> {
    fn into_u8(self) -> u8 {
        match self {
            PeanoEncoder::Successor(x) => 1 + x.into_u8(),
            PeanoEncoder::Zero => 0,
        }
    }
}

Here, according to godbolt, Peano255::into_u8 gets compiled to more than 900 lines of assembly, which resembles a binary decision tree with jump-tables at the leaves.

However, we can inspect a bit how rustc represents a few values:

println!("Size of Peano255: {} byte", std::mem::size_of::<Peano255>());
for x in [
    Peano255::Zero,
    Peano255::Successor(PeanoEncoder::Zero),
    Peano255::Successor(PeanoEncoder::Successor(PeanoEncoder::Zero)),
] {
    println!("Machine representation of {:?}: {}", x, unsafe {
        std::mem::transmute::<_, u8>(x)
    })
}

which gives

Size of Peano255: 1 byte
Machine representation of Zero: 255
Machine representation of Successor(Zero): 254
Machine representation of Successor(Successor(Zero)): 253

A pattern seems to emerge. Rustc chooses to represent Peano255::Zero as 255, and each successor as one less.

As a brief detour, let's see what happens for PeanoN with other values of N.

let x = Peano1::Zero;
println!("Machine representation of Peano1::{:?}: {}", x, unsafe {
    std::mem::transmute::<_, u8>(x)
});
for x in [
    Peano2::Successor(PeanoEncoder::Zero),
    Peano2::Zero,
] {
    println!("Machine representation of Peano2::{:?}: {}", x, unsafe {
        std::mem::transmute::<_, u8>(x)
    })
}

gives

Machine representation of Peano1::Zero: 1
Machine representation of Peano2::Successor(Zero): 1
Machine representation of Peano2::Zero: 2

Notice that the representation of Zero is not the same for each PeanoN. What we actually have -- and what is key here -- is that the representation for x of type PeanoN is the same as the representation of Succesor(x) of type PeanoEncoder<PeanoN>, which implies that the machine representation of an integer k in the type PeanoN is n-k.

That detour being concluded, we refocus on Peano255 for which we can write a very efficient conversion function

impl Peano255 {
    pub fn transmute_u8(x: u8) -> Self {
        unsafe { std::mem::transmute(u8::MAX - x) }
    }
}

Note that this function mere existence is very wrong and a sinful abomination to the eye of anything that is holy and maintainable. But provided you run the same compiler version as me on the very same architecture, you may be ok using it. Please don't use it.

In any case transmute_u8 gets compiled to

movl    %edi, %eax
notb    %al
retq

that is a simple function that applies a binary not to its argument register. And in most use cases, this function would actually be inlined and combined with operations above, making it run in less than one processor operation!

And because 255 is so small, we can exhaustively check that the behavior is correct for all values! Take that formal methods!

for i in 0_u8..=u8::MAX {
    let x = Peano255::transmute_u8(i);
    if i % 8 == 0 {
        print!("{:3} ", i)
    } else if i % 8 == 4 {
        print!(" ")
    }
    let c = if x.into_u8() == i { '✓' } else { '✗' };
    print!("{}", c);
    if i % 8 == 7 {
        println!()
    }
}
  0 ✓✓✓✓ ✓✓✓✓
  8 ✓✓✓✓ ✓✓✓✓
 16 ✓✓✓✓ ✓✓✓✓
 24 ✓✓✓✓ ✓✓✓✓
 32 ✓✓✓✓ ✓✓✓✓
 40 ✓✓✓✓ ✓✓✓✓
 48 ✓✓✓✓ ✓✓✓✓
 56 ✓✓✓✓ ✓✓✓✓
 64 ✓✓✓✓ ✓✓✓✓
 72 ✓✓✓✓ ✓✓✓✓
 80 ✓✓✓✓ ✓✓✓✓
 88 ✓✓✓✓ ✓✓✓✓
 96 ✓✓✓✓ ✓✓✓✓
104 ✓✓✓✓ ✓✓✓✓
112 ✓✓✓✓ ✓✓✓✓
120 ✓✓✓✓ ✓✓✓✓
128 ✓✓✓✓ ✓✓✓✓
136 ✓✓✓✓ ✓✓✓✓
144 ✓✓✓✓ ✓✓✓✓
152 ✓✓✓✓ ✓✓✓✓
160 ✓✓✓✓ ✓✓✓✓
168 ✓✓✓✓ ✓✓✓✓
176 ✓✓✓✓ ✓✓✓✓
184 ✓✓✓✓ ✓✓✓✓
192 ✓✓✓✓ ✓✓✓✓
200 ✓✓✓✓ ✓✓✓✓
208 ✓✓✓✓ ✓✓✓✓
216 ✓✓✓✓ ✓✓✓✓
224 ✓✓✓✓ ✓✓✓✓
232 ✓✓✓✓ ✓✓✓✓
240 ✓✓✓✓ ✓✓✓✓
248 ✓✓✓✓ ✓✓✓✓

Isn't computer science fun?

Note: The code for this blog post is available here.



About OCamlPro:

OCamlPro is a R&D lab founded in 2011, with the mission to help industrial users benefit from experts with a state-of-the-art knowledge of programming languages theory and practice.

  • We provide audit, support, custom developer tools and training for both the most modern languages, such as Rust, Wasm and OCaml, and for legacy languages, such as COBOL or even home-made domain-specific languages;
  • We design, create and implement software with great added-value for our clients. High complexity is not a problem for our PhD-level experts. For example, we helped the French Income Tax Administration re-adapt and improve their internally kept M language, we designed a DSL to model and express revenue streams in the Cinema Industry, codename Niagara, and we also developed the prototype of the Tezos proof-of-stake blockchain from 2014 to 2018.
  • We have a long history of creating open-source projects, such as the Opam package manager, the LearnOCaml web platform, and contributing to other ones, such as the Flambda optimizing compiler, or the GnuCOBOL compiler.
  • We are also experts of Formal Methods, developing tools such as our SMT Solver Alt-Ergo (check our Alt-Ergo Users' Club) and using them to prove safety or security properties of programs.

Please reach out, we'll be delighted to discuss your challenges: contact@ocamlpro.com or book a quick discussion.