# Sub-single-instruction Peano to machine integer conversion

Authors: Arthur Carcano
Date: 2023-01-23
Category: Rust
Tags: Rust

It is a rainy end of January in Paris, morale is getting soggier by the day, and the bulk of our light exposure needs are now fulfilled by our computer screens as the sun seems to have definitively disappeared behind a continuous stream of low-hanging clouds. But, all is not lost, the warm rays of comradeship pierce through the bleak afternoons, and our joyful party of adventurers once again embarked on an adventure of curiosity and rabbit-hole depth-first-searching.

Last week's quest led us to a treasure coveted by a mere handful of enlightened connoisseurs, but a treasure of tremendous nerdy-beauty, known to the academics as "Sub-single-instruction Peano to machine integer conversion" and to the locals as "How to count how many nested `Some` there are very very fast by leveraging druidic knowledge about unspecified, undocumented, and unstable behavior of the Rust compiler".

# Our quest begins

Our whole quest started when we wanted to learn more about discriminant elision. Discriminant elision in Rust is part of what makes it practical to use `Option<&T>` in place of `*const T`. More precisely it is what allows `Option<&T>` to fit in as much memory as `*const T`, and not twice as much. To understand why, let's consider an `Option<u64>`. An `u64` is 8 bytes in size. An `Option<u64>` should have at least one more bit, to indicate whether it is a `None`, or a `Some`. But bits are not very practical to deal with for computers, and hence this discriminant value -- indicating which of the two variants (`Some` or `None`) the value is -- should take up at least one byte. Because of alignment requirements (and because the size is always a multiple of the alignment) it actually ends up taking 8 bytes as well, so that the whole `Option<u64>` occupies twice the size of the wrapped `u64`.

In languages like C, it is very common to pass around pointers, and give them a specific meaning if they are null. Typically, a function like `lfind` which searches for an element in a array will return a pointer to the matching element, and this pointer will be null if no such element was found. In Rust however fallibility is expected to be encoded in the type system. Hence, functions like `find` returns a reference, wrapped in a `Option`. Because this kind of API is so ubiquitous, it would have been a real hindrance to Rust adoption if it took twice as much space as the C version.

This is why discriminant elision exists. In our `Option<&T>` example Rust can leverage the same logic as C: `&T` references in Rust are guaranteed to be -- among other things -- non-null. Hence Rust can choose to encode the `None` variant as the null value of the variable. Transparently to the user, our `Option<&T>` now fits on 8 bytes, the same size as a simple `&T`. But Rust discriminant elision mechanism goes beyond `Option<&T>` and works for any general type if:

1. The option-like value has one fieldless variant and one single-field variant
2. The wrapped type has so-called niche values, that is values that are statically known to never be valid for said type.

Discriminant elision remains under-specified, but more information can be found in the FFI guidelines. Note that other unspecified situations seem to benefit from niche optimization (e.g. PR#94075).

# Too many options

Out of curiosity, we wanted to investigate how the Rust compiler represents a series of nested `Option`. It turns out that up to 255 nested options can be stored into a byte, which is also the theoretical limit. Because this mechanism is not limited to `Option`, we can use it with (value-level) Peano integers. Peano integers are a theoretical encoding of integer in "unary base", but it is enough for this post to consider them a fun little gimmick. If you want to go further, know that Peano integers are more often used at the type-level, to try to emulate type-level arithmetic.

In our case, we are mostly interested in Peano-integers at the value level. We define them as follows:

``````#![recursion_limit = "512"]

/// An empty enum, a type without inhabitants.
/// Cf: https://en.wikipedia.org/wiki/Bottom_type
enum Null {}

/// PeanoEncoder<Null> is a Peano-type able to represent integers up to 0.
/// If T is a Peano-type able to represent integers up to n
/// PeanoEncoder<T> is a Peano-type able to represent integers up to n+1
#[derive(Debug)]
enum PeanoEncoder<T> {
Successor(T),
Zero,
}

macro_rules! times2 {
(\$peano_2x:ident, \$peano_x:ident ) => {
type \$peano_2x<T> = \$peano_x<\$peano_x<T>>;
};
}
times2!(PeanoEncoder2, PeanoEncoder);
times2!(PeanoEncoder4, PeanoEncoder2);
times2!(PeanoEncoder8, PeanoEncoder4);
times2!(PeanoEncoder16, PeanoEncoder8);
times2!(PeanoEncoder32, PeanoEncoder16);
times2!(PeanoEncoder64, PeanoEncoder32);
times2!(PeanoEncoder128, PeanoEncoder64);
times2!(PeanoEncoder256, PeanoEncoder128);

type Peano0 = PeanoEncoder<Null>;
type Peano255 = PeanoEncoder256<Null>;
``````

Note that we cannot simply go for

``````enum Peano {
Succesor(Peano),
Zero,
}
``````

like in Haskell or OCaml because without indirection the type has infinite size, and adding indirection would break discriminant elision. What we really have is that we are actually using a type-level Peano-encoding of integers to create a type `Peano256` that contains value-level Peano-encoding of integers up to 255, as a byte would.

We can define the typical recursive pattern matching based way of converting our Peano integer to a machine integer (a byte).

``````trait IntoU8 {
fn into_u8(self) -> u8;
}

impl IntoU8 for Null {
fn into_u8(self) -> u8 {
match self {}
}
}

impl<T: IntoU8> IntoU8 for PeanoEncoder<T> {
fn into_u8(self) -> u8 {
match self {
PeanoEncoder::Successor(x) => 1 + x.into_u8(),
PeanoEncoder::Zero => 0,
}
}
}
``````

Here, according to godbolt, `Peano255::into_u8` gets compiled to more than 900 lines of assembly, which resembles a binary decision tree with jump-tables at the leaves.

However, we can inspect a bit how rustc represents a few values:

``````println!("Size of Peano255: {} byte", std::mem::size_of::<Peano255>());
for x in [
Peano255::Zero,
Peano255::Successor(PeanoEncoder::Zero),
Peano255::Successor(PeanoEncoder::Successor(PeanoEncoder::Zero)),
] {
println!("Machine representation of {:?}: {}", x, unsafe {
std::mem::transmute::<_, u8>(x)
})
}
``````

which gives

``````Size of Peano255: 1 byte
Machine representation of Zero: 255
Machine representation of Successor(Zero): 254
Machine representation of Successor(Successor(Zero)): 253
``````

A pattern seems to emerge. Rustc chooses to represent `Peano255::Zero` as 255, and each successor as one less.

As a brief detour, let's see what happens for `PeanoN` with other values of N.

``````let x = Peano1::Zero;
println!("Machine representation of Peano1::{:?}: {}", x, unsafe {
std::mem::transmute::<_, u8>(x)
});
for x in [
Peano2::Successor(PeanoEncoder::Zero),
Peano2::Zero,
] {
println!("Machine representation of Peano2::{:?}: {}", x, unsafe {
std::mem::transmute::<_, u8>(x)
})
}
``````

gives

``````Machine representation of Peano1::Zero: 1
Machine representation of Peano2::Successor(Zero): 1
Machine representation of Peano2::Zero: 2
``````

Notice that the representation of Zero is not the same for each `PeanoN`. What we actually have -- and what is key here -- is that the representation for `x` of type `PeanoN` is the same as the representation of `Succesor(x)` of type `PeanoEncoder<PeanoN>`, which implies that the machine representation of an integer `k` in the type `PeanoN` is `n-k`.

That detour being concluded, we refocus on `Peano255` for which we can write a very efficient conversion function

``````impl Peano255 {
pub fn transmute_u8(x: u8) -> Self {
unsafe { std::mem::transmute(u8::MAX - x) }
}
}
``````

Note that this function mere existence is very wrong and a sinful abomination to the eye of anything that is holy and maintainable. But provided you run the same compiler version as me on the very same architecture, you may be ok using it. Please don't use it.

In any case `transmute_u8` gets compiled to

``````movl    %edi, %eax
notb    %al
retq
``````

that is a simple function that applies a binary not to its argument register. And in most use cases, this function would actually be inlined and combined with operations above, making it run in less than one processor operation!

And because 255 is so small, we can exhaustively check that the behavior is correct for all values! Take that formal methods!

``````for i in 0_u8..=u8::MAX {
let x = Peano255::transmute_u8(i);
if i % 8 == 0 {
print!("{:3} ", i)
} else if i % 8 == 4 {
print!(" ")
}
let c = if x.into_u8() == i { '✓' } else { '✗' };
print!("{}", c);
if i % 8 == 7 {
println!()
}
}
``````
``````  0 ✓✓✓✓ ✓✓✓✓
8 ✓✓✓✓ ✓✓✓✓
16 ✓✓✓✓ ✓✓✓✓
24 ✓✓✓✓ ✓✓✓✓
32 ✓✓✓✓ ✓✓✓✓
40 ✓✓✓✓ ✓✓✓✓
48 ✓✓✓✓ ✓✓✓✓
56 ✓✓✓✓ ✓✓✓✓
64 ✓✓✓✓ ✓✓✓✓
72 ✓✓✓✓ ✓✓✓✓
80 ✓✓✓✓ ✓✓✓✓
88 ✓✓✓✓ ✓✓✓✓
96 ✓✓✓✓ ✓✓✓✓
104 ✓✓✓✓ ✓✓✓✓
112 ✓✓✓✓ ✓✓✓✓
120 ✓✓✓✓ ✓✓✓✓
128 ✓✓✓✓ ✓✓✓✓
136 ✓✓✓✓ ✓✓✓✓
144 ✓✓✓✓ ✓✓✓✓
152 ✓✓✓✓ ✓✓✓✓
160 ✓✓✓✓ ✓✓✓✓
168 ✓✓✓✓ ✓✓✓✓
176 ✓✓✓✓ ✓✓✓✓
184 ✓✓✓✓ ✓✓✓✓
192 ✓✓✓✓ ✓✓✓✓
200 ✓✓✓✓ ✓✓✓✓
208 ✓✓✓✓ ✓✓✓✓
216 ✓✓✓✓ ✓✓✓✓
224 ✓✓✓✓ ✓✓✓✓
232 ✓✓✓✓ ✓✓✓✓
240 ✓✓✓✓ ✓✓✓✓
248 ✓✓✓✓ ✓✓✓✓
``````

Isn't computer science fun?

Note: The code for this blog post is available here.