Sub-single-instruction Peano to machine integer conversion
It is a rainy end of January in Paris, morale is getting soggier by the day, and the bulk of our light exposure needs are now fulfilled by our computer screens as the sun seems to have definitively disappeared behind a continuous stream of low-hanging clouds. But, all is not lost, the warm rays of comradeship pierce through the bleak afternoons, and our joyful party of adventurers once again embarked on an adventure of curiosity and rabbit-hole depth-first-searching.
Last week's quest led us to a treasure coveted by a mere handful of enlightened connoisseurs, but a treasure of tremendous nerdy-beauty, known to the academics as "Sub-single-instruction Peano to machine integer conversion" and to the locals as "How to count how many nested Some
there are very very fast by leveraging druidic knowledge about unspecified, undocumented, and unstable behavior of the Rust compiler".
Our quest begins
Our whole quest started when we wanted to learn more about discriminant elision. Discriminant elision in Rust is part of what makes it practical to use Option<&T>
in place of *const T
. More precisely it is what allows Option<&T>
to fit in as much memory as *const T
, and not twice as much. To understand why, let's consider an Option<u64>
. An u64
is 8 bytes in size. An Option<u64>
should have at least one more bit, to indicate whether it is a None
, or a Some
. But bits are not very practical to deal with for computers, and hence this discriminant value -- indicating which of the two variants (Some
or None
) the value is -- should take up at least one byte. Because of alignment requirements (and because the size is always a multiple of the alignment) it actually ends up taking 8 bytes as well, so that the whole Option<u64>
occupies twice the size of the wrapped u64
.
In languages like C, it is very common to pass around pointers, and give them a specific meaning if they are null. Typically, a function like lfind
which searches for an element in a array will return a pointer to the matching element, and this pointer will be null if no such element was found. In Rust however fallibility is expected to be encoded in the type system. Hence, functions like find
returns a reference, wrapped in a Option
. Because this kind of API is so ubiquitous, it would have been a real hindrance to Rust adoption if it took twice as much space as the C version.
This is why discriminant elision exists. In our Option<&T>
example Rust can leverage the same logic as C: &T
references in Rust are guaranteed to be -- among other things -- non-null. Hence Rust can choose to encode the None
variant as the null value of the variable. Transparently to the user, our Option<&T>
now fits on 8 bytes, the same size as a simple &T
. But Rust discriminant elision mechanism goes beyond Option<&T>
and works for any general type if:
- The option-like value has one fieldless variant and one single-field variant
- The wrapped type has so-called niche values, that is values that are statically known to never be valid for said type.
Discriminant elision remains under-specified, but more information can be found in the FFI guidelines. Note that other unspecified situations seem to benefit from niche optimization (e.g. PR#94075).
Too many options
Out of curiosity, we wanted to investigate how the Rust compiler represents a series of nested Option
. It turns out that up to 255 nested options can be stored into a byte, which is also the theoretical limit. Because this mechanism is not limited to Option
, we can use it with (value-level) Peano integers. Peano integers are a theoretical encoding of integer in "unary base", but it is enough for this post to consider them a fun little gimmick. If you want to go further, know that Peano integers are more often used at the type-level, to try to emulate type-level arithmetic.
In our case, we are mostly interested in Peano-integers at the value level. We define them as follows:
#![recursion_limit = "512"]
#![allow(dead_code)]
/// An empty enum, a type without inhabitants.
/// Cf: https://en.wikipedia.org/wiki/Bottom_type
enum Null {}
/// PeanoEncoder<Null> is a Peano-type able to represent integers up to 0.
/// If T is a Peano-type able to represent integers up to n
/// PeanoEncoder<T> is a Peano-type able to represent integers up to n+1
#[derive(Debug)]
enum PeanoEncoder<T> {
Successor(T),
Zero,
}
macro_rules! times2 {
($peano_2x:ident, $peano_x:ident ) => {
type $peano_2x<T> = $peano_x<$peano_x<T>>;
};
}
times2!(PeanoEncoder2, PeanoEncoder);
times2!(PeanoEncoder4, PeanoEncoder2);
times2!(PeanoEncoder8, PeanoEncoder4);
times2!(PeanoEncoder16, PeanoEncoder8);
times2!(PeanoEncoder32, PeanoEncoder16);
times2!(PeanoEncoder64, PeanoEncoder32);
times2!(PeanoEncoder128, PeanoEncoder64);
times2!(PeanoEncoder256, PeanoEncoder128);
type Peano0 = PeanoEncoder<Null>;
type Peano255 = PeanoEncoder256<Null>;
Note that we cannot simply go for
enum Peano {
Succesor(Peano),
Zero,
}
like in Haskell or OCaml because without indirection the type has infinite size, and adding indirection would break discriminant elision. What we really have is that we are actually using a type-level Peano-encoding of integers to create a type Peano256
that contains value-level Peano-encoding of integers up to 255, as a byte would.
We can define the typical recursive pattern matching based way of converting our Peano integer to a machine integer (a byte).
trait IntoU8 {
fn into_u8(self) -> u8;
}
impl IntoU8 for Null {
fn into_u8(self) -> u8 {
match self {}
}
}
impl<T: IntoU8> IntoU8 for PeanoEncoder<T> {
fn into_u8(self) -> u8 {
match self {
PeanoEncoder::Successor(x) => 1 + x.into_u8(),
PeanoEncoder::Zero => 0,
}
}
}
Here, according to godbolt, Peano255::into_u8
gets compiled to more than 900 lines of assembly, which resembles a binary decision tree with jump-tables at the leaves.
However, we can inspect a bit how rustc represents a few values:
println!("Size of Peano255: {} byte", std::mem::size_of::<Peano255>());
for x in [
Peano255::Zero,
Peano255::Successor(PeanoEncoder::Zero),
Peano255::Successor(PeanoEncoder::Successor(PeanoEncoder::Zero)),
] {
println!("Machine representation of {:?}: {}", x, unsafe {
std::mem::transmute::<_, u8>(x)
})
}
which gives
Size of Peano255: 1 byte
Machine representation of Zero: 255
Machine representation of Successor(Zero): 254
Machine representation of Successor(Successor(Zero)): 253
A pattern seems to emerge. Rustc chooses to represent Peano255::Zero
as 255, and each successor as one less.
As a brief detour, let's see what happens for PeanoN
with other values of N.
let x = Peano1::Zero;
println!("Machine representation of Peano1::{:?}: {}", x, unsafe {
std::mem::transmute::<_, u8>(x)
});
for x in [
Peano2::Successor(PeanoEncoder::Zero),
Peano2::Zero,
] {
println!("Machine representation of Peano2::{:?}: {}", x, unsafe {
std::mem::transmute::<_, u8>(x)
})
}
gives
Machine representation of Peano1::Zero: 1
Machine representation of Peano2::Successor(Zero): 1
Machine representation of Peano2::Zero: 2
Notice that the representation of Zero is not the same for each PeanoN
. What we actually have -- and what is key here -- is that the representation for x
of type PeanoN
is the same as the representation of Succesor(x)
of type PeanoEncoder<PeanoN>
, which implies that the machine representation of an integer k
in the type PeanoN
is n-k
.
That detour being concluded, we refocus on Peano255
for which we can write a very efficient conversion function
impl Peano255 {
pub fn transmute_u8(x: u8) -> Self {
unsafe { std::mem::transmute(u8::MAX - x) }
}
}
Note that this function mere existence is very wrong and a sinful abomination to the eye of anything that is holy and maintainable. But provided you run the same compiler version as me on the very same architecture, you may be ok using it. Please don't use it.
In any case transmute_u8
gets compiled to
movl %edi, %eax
notb %al
retq
that is a simple function that applies a binary not to its argument register. And in most use cases, this function would actually be inlined and combined with operations above, making it run in less than one processor operation!
And because 255 is so small, we can exhaustively check that the behavior is correct for all values! Take that formal methods!
for i in 0_u8..=u8::MAX {
let x = Peano255::transmute_u8(i);
if i % 8 == 0 {
print!("{:3} ", i)
} else if i % 8 == 4 {
print!(" ")
}
let c = if x.into_u8() == i { '✓' } else { '✗' };
print!("{}", c);
if i % 8 == 7 {
println!()
}
}
0 ✓✓✓✓ ✓✓✓✓
8 ✓✓✓✓ ✓✓✓✓
16 ✓✓✓✓ ✓✓✓✓
24 ✓✓✓✓ ✓✓✓✓
32 ✓✓✓✓ ✓✓✓✓
40 ✓✓✓✓ ✓✓✓✓
48 ✓✓✓✓ ✓✓✓✓
56 ✓✓✓✓ ✓✓✓✓
64 ✓✓✓✓ ✓✓✓✓
72 ✓✓✓✓ ✓✓✓✓
80 ✓✓✓✓ ✓✓✓✓
88 ✓✓✓✓ ✓✓✓✓
96 ✓✓✓✓ ✓✓✓✓
104 ✓✓✓✓ ✓✓✓✓
112 ✓✓✓✓ ✓✓✓✓
120 ✓✓✓✓ ✓✓✓✓
128 ✓✓✓✓ ✓✓✓✓
136 ✓✓✓✓ ✓✓✓✓
144 ✓✓✓✓ ✓✓✓✓
152 ✓✓✓✓ ✓✓✓✓
160 ✓✓✓✓ ✓✓✓✓
168 ✓✓✓✓ ✓✓✓✓
176 ✓✓✓✓ ✓✓✓✓
184 ✓✓✓✓ ✓✓✓✓
192 ✓✓✓✓ ✓✓✓✓
200 ✓✓✓✓ ✓✓✓✓
208 ✓✓✓✓ ✓✓✓✓
216 ✓✓✓✓ ✓✓✓✓
224 ✓✓✓✓ ✓✓✓✓
232 ✓✓✓✓ ✓✓✓✓
240 ✓✓✓✓ ✓✓✓✓
248 ✓✓✓✓ ✓✓✓✓
Isn't computer science fun?
Note: The code for this blog post is available here.
Au sujet d'OCamlPro :
OCamlPro développe des applications à haute valeur ajoutée depuis plus de 10 ans, en utilisant les langages les plus avancés, tels que OCaml et Rust, visant aussi bien rapidité de développement que robustesse, et en ciblant les domaines les plus exigeants (méthodes formelles, cybersécurité, systèmes distribués/blockchain, conception de DSLs). Fort de plus de 20 ingénieurs R&D, avec une expertise unique sur les langages de programmation, aussi bien théorique (plus de 80% de nos ingénieurs ont une thèse en informatique) que pratique (participation active au développement de plusieurs compilateurs open-source, prototypage de la blockchain Tezos, etc.), diversifiée (OCaml, Rust, Cobol, Python, Scilab, C/C++, etc.) et appliquée à de multiples domaines. Nous dispensons également des [formations sur mesure certifiées Qualiopi sur OCaml, Rust, et les méthodes formelles] (https://training.ocamlpro.com/) Pour nous contacter : contact@ocamlpro.com.
Articles les plus récents
2024
- opam 2.3.0 release!
- Optimisation de Geneweb, 1er logiciel français de Généalogie depuis près de 30 ans
- Alt-Ergo 2.6 is Out!
- Flambda2 Ep. 3: Speculative Inlining
- opam 2.2.0 release!
- Flambda2 Ep. 2: Loopifying Tail-Recursive Functions
- Fixing and Optimizing the GnuCOBOL Preprocessor
- OCaml Backtraces on Uncaught Exceptions
- Opam 102: Pinning Packages
- Flambda2 Ep. 1: Foundational Design Decisions
- Behind the Scenes of the OCaml Optimising Compiler Flambda2: Introduction and Roadmap
- Lean 4: When Sound Programs become a Choice
- Opam 101: The First Steps
2023
- Maturing Learn-OCaml to version 1.0: Gateway to the OCaml World
- The latest release of Alt-Ergo version 2.5.1 is out, with improved SMT-LIB and bitvector support!
- 2022 at OCamlPro
- Autofonce, GNU Autotests Revisited
- Sub-single-instruction Peano to machine integer conversion
- Statically guaranteeing security properties on Java bytecode: Paper presentation at VMCAI 23
- Release of ocplib-simplex, version 0.5