Latest Posts (10 found)
bitonic's blog. 1 month ago

Hidden benefits of undefined behavior

I was reviewing some code at work, and something jumped out at me when reading some arithmetic. If you’ve had the good fortune of dealing with C or C++ for extended periods, you will have recognized the problem: the multiplication is performed with 32-bit precision, which is almost certainly not what the programmer intended, since it will lead to an overflow for values outside [-2, +2]. So far so good: a classic C footgun, we’ve all been there. However, code using this function was running, and working, which was more surprising. Take a moment to consider why before revealing the solution. A direct compilation will look something like this: first, perform the 32-bit signed multiplication, then sign-extend the result to 64 bits. However, signed overflow is undefined behavior, so an optimizing compiler might prefer to do the sign extension first, as part of a conversion it needed to do anyway, and then do the multiplication in 64 bits, thereby saving one instruction and “fixing” our bug in the process. So we have a rare case of two C footguns cancelling each other out, rather than combining into an even bigger footgun, as usual.

bitonic's blog. 2 months ago

Open sourcing TernFS, a distributed filesystem

A few days ago we open sourced TernFS, a distributed filesystem which has been used in production at XTX for a couple of years now. The project took up a big chunk of my waking hours for a year and a half, and I’m extremely happy that we are now sharing it with the world. You can find a blog post describing it on XTX’s tech blog, and the source code on GitHub. The repository contains the full git history, which is somewhat unusual for this sort of project: you can see how the codebase grew rapidly from a prototype into a production system.

bitonic's blog. 5 months ago

How to store Go pointers from assembly

The standard Go toolchain comes with an assembler out of the box. Said assembler is highly idiosyncratic, using syntax inherited from Plan 9 and choosing its own names for platform-specific instructions and registers. But it's great to have it readily available. More mundanely, Go comes with a garbage collector. This post explains how to make these two components play nice, if we want to manipulate Go pointers from our assembly.

bitonic's blog. 5 months ago

`inline-verilog`: Execute Verilog circuits from Haskell

As in most of the past 13 years, early June meant attending ZuriHac. The event was as pleasant and well organized as ever. While talking to Ed Kmett, he half-jokingly mentioned that it would be useful to have a Verilog analogue, especially to test Verilog circuits against Haskell functions. A few hours later I had `inline-verilog`, which lets you do just that. It allows you to embed “nameless” Verilog modules right in your Haskell. Input and output arguments are expected to be present in the Haskell context in which the quote is used, and the Haskell types for the inputs and outputs are automatically inferred from the Verilog code. The output arguments are expected to be pointers in which the outputs will be stored. Verilog code can also be factored out within a Haskell file. See the repository for a more complex example. The way `inline-verilog` works is by invoking Verilator to compile each Verilog block to C++, generating C stubs which execute the verilated circuit, and linking everything up with the Haskell executable. There are some limitations; namely, I haven’t implemented or tested inputs/outputs wider than 64 bits, or multidimensional input/output ports. All of the above should be easy, but it did not fit in the couple of hours it took to cook `inline-verilog` to its current state. Also, it does not make much sense for non-combinatorial circuits (i.e. stuff that does IO). But to test functional units, it should do the job.

bitonic's blog. 1 year ago

Waiting for many things at once with `io_uring`

When doing systems programming we often need to wait for something to happen. Common examples might be waiting for some data to come through a socket or waiting on a lock. We also often want to wait on any of several conditions to become true. A web server might be handling many sockets at once, waiting for any number of them to become readable or writeable. This short blog post is concerned with the latter scenario in Linux. Until recently there was no generic framework which allowed us to wait on many arbitrary events, but now there is, thanks to `io_uring`.

bitonic's blog. 1 year ago

CRC-32C tips and tricks

In a companion blog post I talked about how to effectively combine Reed-Solomon coding and CRCs. To do so, I presented a few functions to manipulate CRCs. In this post I'll explain why they work and how they can be implemented.

bitonic's blog. 1 year ago

CRCs and Reed-Solomon coding: better together

In my latest filesystem-themed post I discussed a technique to perform distributed resource management more safely. This time I'll explain how one might effectively combine _Reed-Solomon coding_ and _cyclic redundancy checks_. The first gives us redundancy (we can lose disks and still recover our data), the second protects us against data corruption.

bitonic's blog. 1 year ago

Message authentication codes for safer distributed transactions

I've been developing and quickly deploying a distributed system, which is a class of software where bugs are expensive. A few hundred petabytes later, we haven't lost a single byte of data, also thanks to a simple trick which catches a large class of bugs when delegating responsibilities to possibly buggy software. It's a neat use of cryptography beyond security, so here's a small description.

bitonic's blog. 1 year ago

How to stop Linux threads cleanly

Stopping a Linux thread is a surprisingly annoying affair. In this post I present common pitfalls and some solutions -- although no truly satisfactory method exists.

bitonic's blog. 3 years ago

The essence of Reed-Solomon coding

Let’s say we want to store some data on multiple drives, so that we can recover from drive failures. One obvious (and valid!) first attempt is to just store everything multiple times, usually called “mirroring”. The most common form of mirroring is to store things twice: if one drive fails, the data is still on the other. [1]

Reed-Solomon coding gives us much more flexibility, allowing us to store our data over n = k + t drives, so that any t drives can fail while still not losing data. [2] For instance, with \mathrm{RS}(10, 4) we can store data over 14 drives, and any 4 drives can fail at once without incurring data loss. The data is split over 10 equal blocks, and then 4 additional blocks (“parity” blocks) are generated in such a way that any 4 blocks can be recovered from the other 10. [3] [4]

The main idea behind Reed-Solomon is easy to visualize. We’ll focus on the task of durably storing k numbers y_1, ..., y_k, plotted as the points (1, y_1), ..., (k, y_k).

For any k distinct points, there’s a unique polynomial of degree d < k passing through them. [5] This fact should not be too surprising, and is apparent when k = 2: given two distinct points, there’s only one line passing through them. Armed with this fact, we can pick an additional t points on the same polynomial: for instance, two extra samples when k = 4 and t = 2.

Since the interpolating polynomial is unique given any k points it passes through, we can drop any t points and still be able to recover the polynomial. Once we’ve done that, we can just resample it to recover the numbers we’re storing. To recap, our procedure to durably store k numbers is as follows:

1. Compute the unique polynomial of degree d < k passing through the numbers, placed at some predefined interval on the XY plane;
2. Sample the polynomial at t points beyond the original ones;
3. Store the k original numbers alongside the t “parity” numbers;
4. If some numbers are lost, recompute them by resampling the polynomial.

At this point, you already understand a key idea powering Reed-Solomon coding. The other key idea allows us to store bits, rather than numbers. I won’t properly explain it here, but the gist is to use finite fields of order 2^{\mathrm{bits}} as our number type. [6] Such finite fields are numeric types which work differently from the usual integers modulo 2^{\mathrm{bits}} that we’re used to programming with, but they are still easy to implement in hardware, and, importantly, the considerations about polynomials in this blog post hold for them. [7] Once we have such a numeric type, all we need to do to durably store a blob of data is to split it into a series of numbers (each \mathrm{bits} wide), group them in sets of k, and then store them durably as described in this post, tuning t based on how redundant we want to be.

The final trick worth mentioning is that this kind of Reed-Solomon encoding can be implemented efficiently given that the x coordinates are fixed, no matter what numbers we want to store. In other words, the definition of the Lagrange basis polynomials \ell_j which we use to generate the unique polynomial depends only on the x coordinates, which allows us to do the heavy lifting once for any k numbers we want to store.

[1] This is often known as RAID1.
[2] This post presents a specific way to perform Reed-Solomon coding, using Lagrange interpolation. See the Wikipedia article for more information.
[3] RAID4 and RAID6 fit in this framework as \mathrm{RS}(3,1) and \mathrm{RS}(4,2) respectively.
[4] The “parity” terminology comes from other error-detecting or error-correcting codes which use parity bits.
[5] A polynomial of degree d is something of the form a_0 + a_1 x + a_2 x^2 + ... + a_d x^d.
[6] Head over to Peter Cawley’s blog for details on how the finite field machinery works on modern CPUs.
[7] As the name suggests, finite fields are fields, which is handy, since generating polynomials passing through a set of points involves divisions. Integers modulo 2^{\mathrm{bits}}, which are what we’re used to programming with, are not closed under division, which jams the maths required to perform the operations we described.
