Latest Posts (20 found)
Made of Bugs 1 month ago

Solving Regex Crosswords with Z3

For a while now, I’ve been fascinated by Z3 and by SMT solving more broadly. While on pat leave recently, I was reminded of the existence of regular-expression crossword puzzles, and allowed myself to get nerdsniped by writing a Z3-backed solver. I expected to spend perhaps an afternoon cranking out a quick solver; I ended up getting sucked into understanding and debugging Z3 performance, and learning far more about Z3 and about SMT than I expected.

Made of Bugs 4 months ago

The ITTAGE indirect branch predictor

While investigating the performance of the new Python 3.14 tail-calling interpreter, I learned (via this very informative comment from Sam Gross) a new (to me) piece of performance trivia: Modern CPUs mostly no longer struggle to predict the bytecode-dispatch indirect jump inside a “conventional” bytecode interpreter loop. In steady state, assuming the bytecode itself is reasonably stable, modern CPUs achieve very high accuracy predicting the dispatch, even for “vanilla” while / switch-style interpreter loops!

Made of Bugs 8 months ago

Performance of the Python 3.14 tail-call interpreter

About a month ago, the CPython project merged a new implementation strategy for their bytecode interpreter. The initial headline results were very impressive, showing a 10-15% performance improvement on average across a wide range of benchmarks across a variety of platforms. Unfortunately, as I will document in this post, these impressive performance gains turned out to be primarily due to inadvertently working around a regression in LLVM 19. When benchmarked against a better baseline (such as GCC, clang-18, or LLVM 19 with certain tuning flags), the performance gain drops to 1-5% or so depending on the exact setup.

Made of Bugs 10 months ago

Building personal software with Claude

Earlier this month, I used Claude to port (parts of) an Emacs package into Rust, shrinking the execution time by a factor of 1000 or more (in one concrete case: from 90s to about 15ms). This is a variety of yak-shave that I do somewhat routinely, both professionally and in service of my personal computing environment. However, this time, Claude was able to execute substantially the entire project under my supervision, with me writing almost no lines of code, speeding up the project substantially compared to doing it by hand.

Made of Bugs 1 year ago

Finding near-duplicates with Jaccard similarity and MinHash

Suppose we have a large collection of documents, and we wish to identify which documents are approximately the same as each other. For instance, we may have crawled the web over some period of time, and expect to have fetched the “same page” several times, but to see slight differences in metadata, or that we have several revisions of a page following small edits. In this post I want to explore the method of approximate deduplication via Jaccard similarity and the MinHash approximation trick.

Made of Bugs 1 year ago

Stripe's monorepo developer environment

I worked at Stripe for about seven years, from 2012 to 2019. Over that time, I used and contributed to many generations of Stripe’s developer environment – the tools that engineers used daily to write and test code. I think Stripe did a pretty good job designing and building that developer experience, and since leaving, I’ve found myself repeatedly describing features of that environment to friends and colleagues. This post is an attempt to record the salient features of that environment as I remember it.

Made of Bugs 1 year ago

Performance engineering, profilers, and seeing the invisible

I was recently introduced to the paper “Seeing the Invisible: Perceptual-Cognitive Aspects of Expertise” by Gary Klein and Robert Hoffman. It’s excellent and I recommend you read it when you have a chance. Klein and Hoffman discuss the ability of experts to “see what is not there”: in addition to observing data and cues that are present in the environment, experts perceive implications of these cues, such as the absence of expected or “typical” information, the typicality or atypicality of observed data, and likely/possible past and future time trajectories of a system based on a point-in-time snapshot or limited duration of observation.

Made of Bugs 1 year ago

Advent of Code in C++ Template Metaprogramming

This December, the imp of the perverse struck me, and I decided to see how many days of Advent of Code I could do purely in compile-time C++ metaprogramming. As of this writing, I’ve done two days, and I’m not sure I’ll make it any further. However, that’s one more day than I planned to do as of yesterday, which is in turn further than I thought I’d make it after my first attempt.

Made of Bugs 2 years ago

What's with ML software and pickles?

I have spent many years as a software engineer who was a total outsider to machine-learning, but with some curiosity and occasional peripheral interactions with it. During this time, a recurring theme for me was horror (and, to be honest, disdain) every time I encountered the widespread usage of Python pickle in the Python ML ecosystem. In addition to their major security issues, the use of pickle for serialization tends to be very brittle, leading to all kinds of nightmares as you evolve your code and upgrade libraries and Python versions.

Made of Bugs 2 years ago

Graceful behavior at capacity

Suppose we’ve got a service. We’ll gloss over the details for now, but let’s stipulate that it accepts requests from the outside world, and takes some action in response. Maybe those requests are HTTP requests, or RPCs, or just incoming packets to be routed at the network layer. We can get more specific later. What can we say about its performance? All we know is that it receives requests, and that it acts on them.

Made of Bugs 2 years ago

Efficiency trades off against resiliency

What’s the “right” level of CPU utilization for a server? If you look at a monitoring dashboard from a well-designed and well-run service, what CPU utilization should we hope to see, averaged over a day or two? It’s a very general question, and it’s not clear it should have a single answer. That said, for a long time, I generally believed that higher is always better: we should aim for as close to 100% utilization as we can.

Made of Bugs 3 years ago

Transformers for software engineers

Ever since its introduction in the 2017 paper, Attention is All You Need, the Transformer model architecture has taken the deep-learning world by storm. Initially introduced for machine translation, it has become the tool of choice for a wide range of domains, including text, audio, video, and others. Transformers have also driven most of the massive increases in model scale and capability in the last few years. OpenAI’s GPT-3 and Codex models are Transformers, as are DeepMind’s Gopher models and many others.

Made of Bugs 3 years ago

A Cursed Bug

In my day job at Anthropic, we run relatively large distributed systems to train large language models. One of the joys of using a lot of computing resources, especially on somewhat niche software stacks, is that you spend a lot of time running into the long-tail of bugs which only happen rarely or in very unusual configurations, which you happen to be the first to encounter. These bugs are frustrating, but I also often enjoy them.

Made of Bugs 4 years ago

Distributed cloud builds for everyone

CPU cycles are cheaper than they have ever been, and cloud computing has never been more ubiquitous. All the major cloud providers offer generous free tiers, and services like GitHub Actions offer free compute resources to open-source repositories. So why do so many developers still build software on their laptops? Despite the embarrassment of riches of cheap or even free cloud compute, most projects I know of, and most developers, still do most of their software development — building and running code — directly on their local machines.

Made of Bugs 4 years ago

Building LLVM in 90 seconds using Amazon Lambda

Last week, Frederic Cambus wrote about building LLVM quickly on some very large machines, culminating in a 2m37s build on a 160-core ARM machine. I don’t have a giant ARM behemoth, but I have been working on a tool I call Llama, which lets you offload computational work – including C and C++ builds – onto Amazon Lambda. I decided to see how well it could do at a similar build.

Made of Bugs 4 years ago

Some opinionated thoughts on SQL databases

People who work with me tend to realize that I have Opinions about databases, and SQL databases in particular. Last week, I wrote about a Postgres debugging story and tweeted about AWS’ policy ban on internal use of SQL databases, and had occasion to discuss and debate some of those feelings on Twitter; this article is an attempt to write up more of them into a single place I can refer to.

Made of Bugs 5 years ago

Towards solving Ultimate Tic Tac Toe

Summary: Read about my efforts to solve the game of Ultimate Tic Tac Toe. It’s been a fun journey into interesting algorithms and high-performance parallel programming in Rust. Backstory: Starting around the beginning of the COVID-19 lockdown, I’ve gotten myself deeply nerdsniped by an attempt to solve the game of Ultimate Tic Tac Toe, a two-level Tic Tac Toe variant which is (unlike Tic Tac Toe) nontrivial and contains some interesting strategic elements.

Made of Bugs 5 years ago

Write testable code by writing generic code

Alex Gaynor recently asked this question in an IRC channel I hang out in (a channel which contains several software engineers nearly as obsessed with software testing as I am): “uhh, so I’m writing some code to handle an econnreset… how do I test this?” This is a good question! Testing ECONNRESET is one of those fiddly problems that exists at the interface between systems — in his case, with S3, not even a system under his control — that can be infuriatingly tricky to reproduce and test.

Made of Bugs 5 years ago

Test suites as classifiers

Suppose we have some codebase we’re considering applying some patch to, and which has a robust and maintained test suite. Considering the patch, we may ask: is this patch acceptable to apply and deploy? By this we mean to ask if the patch breaks any important functionality, violates any key properties or invariants of the codebase, or would otherwise cause some unacceptable risk or harm. In principle, we can divide all patches into “acceptable” or “unacceptable” relative to some project-specific notion of what we’re willing to allow.

Made of Bugs 5 years ago

Systems that defy detailed understanding

Last week, I wrote about the mindset that computer systems can be understood, and behaviors can be explained, if we’re willing to dig deep enough into the stack of abstractions our software is built atop. Some of the ensuing discussion on Twitter and elsewhere led me to write this followup, in which I want to run through a few classes of systems where I’ve found pursuing in-detail understanding of the system wasn’t the right answer.
