sunshowers 1 week ago

Cancelling async Rust

This is an edited, written version of my RustConf 2025 talk about cancellations in async Rust. Like the written version of my RustConf 2023 talk, I’ve tried to retain the feel of a talk while making it readable as a standalone blog entry. (Some links to the talk itself are collected at the end of this post.) Let’s start with a simple example – you decide to read from a channel in a loop and gather a bunch of messages: All good, nothing wrong with this, but you realize sometimes the channel is empty for long periods of time, so you add a timeout and print a message: There’s nothing wrong with this code—it behaves as expected. Now you realize you need to write a bunch of messages out to a channel in a loop: But sometimes the channel gets too full and blocks, so you add a timeout and print a message: It turns out that this code is often incorrect, because not all messages make their way to the channel. Hi, I’m Rain, and this post is about cancelling async Rust. This post is split into three parts: what cancellation is, how to analyze cancellations, and what can be done about them. Before we begin, I want to lay my cards on the table – I really love async Rust! I gave a talk at RustConf a couple years ago talking about how async Rust is a great fit for signal handling in complex applications. I’m also the author of cargo-nextest, a next-generation test runner for Rust, where async Rust is the best way I know of to express some really complex algorithms that I wouldn’t know how to express otherwise. I wrote a blog post about this a few years ago. Now, I work at Oxide Computer Company, where we make cloud-in-a-box computers. We make vertically integrated systems where you provide power and networking on one end, and the software you want to run on the other end, and we take care of everything in between. Of course, we use Rust everywhere, and in particular we use async Rust extensively for our higher-level software, such as storage, networking, and the customer-facing management API. But along the way we’ve encountered a number of issues around async cancellation, and a lot of this post is about what we learned along the way. What does cancellation mean? Logically, a cancellation is exactly what it sounds like: you start some work, and then change your mind and decide to stop doing that work. As you might imagine, this is a useful thing to do: maybe you’ve started a large download or a long network request, or begun reading a file. But then you change your mind: you want to cancel it rather than continue it to completion. Before we talk about async Rust, it’s worth thinking about how you’d do cancellations in synchronous Rust. One option is to have some kind of flag you periodically check, maybe stored in an atomic: This approach is fine for smaller bits of code but doesn’t really scale well to large chunks of code, since you’d have to sprinkle these checks everywhere. A related option, if you’re working with a framework as part of your work, is to panic with a special payload of some kind. A third option is to kill the whole process. This is a very heavyweight approach, but an effective one if you spawn processes to do your work. Rather than kill the whole process, can you kill a single thread? All of these options are suboptimal or of limited use in some way. In general, the way I think about it is that there isn’t a universal protocol for cancellation in synchronous Rust. In contrast, there is such a protocol in async Rust, and in fact cancellations are extraordinarily easy to perform in async Rust. Why is that so? To understand that, let’s look at what a future is. Here’s a simple example of a future: In this future, you first perform a network request which returns some data, and then you process it.
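A minimal sketch of that kind of future, with hypothetical fetch and process functions standing in for the real work:

```rust
// Sketch only: `fetch` and `process` are illustrative stand-ins.
async fn fetch_and_process() {
    // First perform a network request which returns some data...
    let data = fetch().await;
    // ...and then process it.
    process(data).await;
}

async fn fetch() -> Vec<u8> {
    // Stand-in for a real network request.
    vec![1, 2, 3]
}

async fn process(_data: Vec<u8>) {
    // Stand-in for real processing.
}
```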
The Rust compiler looks at this future and generates a state machine, which is just a struct or enum in memory: If you’ve written async Rust code from before the async and await keywords, you’ve probably written code like it by hand. It’s basically just an enum describing all the possible states the future can be in. The compiler also generates an implementation of the Future trait for this future: and when you call .await on the future, it gets translated down to this underlying poll function. It is only when .await or this poll function is called that something actually happens. Note that this is diametrically opposed to how async works in other languages like Go, JavaScript, or C#. In those languages, when you create a future to await on, it starts doing its thing, immediately, in the background: That’s regardless of whether you await it or not. In Rust, this call does nothing until you actually call .await on it: I know I sound a bit like a broken record here, but if you can take away one thing from this post, it would be that futures are passive, and completely inert until awaited or polled. So what does the universal protocol to cancel futures look like? It is simply to drop the future, or to not await it or poll it any more. Since a future is just a state machine, you can throw it away at any time the poll function isn’t actively being called. The upshot of all this is that any Rust future can be cancelled at any await point. Given how hard cancellation tends to be in synchronous environments, the ability to easily cancel futures in async Rust is extraordinarily powerful—in many ways its greatest strength! But there is a flip side, which is that cancelling futures is far, far too easy. This is for two reasons. First, it’s just way too easy to quietly drop a future. As we’re going to see, there are all kinds of code patterns that lead to silently dropping futures. Now this wouldn’t be so bad, if not for the second reason: that cancellation of parent futures propagates down to child futures. Because of Rust’s single ownership model, child futures are owned by parent ones. If a parent future is dropped or cancelled, the same happens to the child. To figure out whether a child future’s cancellation can cause issues, you have to look at its parent, and grandparent, and so on. Reasoning about cancellation becomes a very complicated non-local operation. I’m going to cover some examples in a bit, but before we do that I want to talk about a couple terms, some of which you might have seen references to already. The first term is cancel safety. You might have seen mentions of this in the Tokio documentation. Cancel safety, as generally defined, means the property of a future that can be cancelled (i.e. dropped) without any side effects. For example, a Tokio sleep future is cancel safe: you can just stop waiting on the sleep and it’s completely fine. An example of a future that is not cancel safe is Tokio’s MPSC send, which sends a message over a channel: If this future is dropped, the message is lost forever. The important thing is that cancel safety is a local property of an individual future. But cancel safety is not all that one needs to care about. What actually matters is the context the cancellation happens in, or in other words whether the cancellation actually causes some kind of larger property in the system to be violated. To capture this I tend to use a different term called cancel correctness, which I define as a global property of system correctness in the face of cancellations.
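To tie this back to the opening example, here is a sketch of that write loop with a timeout around a Tokio mpsc send; the channel and message types are illustrative, not the talk's exact code:

```rust
use std::time::Duration;
use tokio::sync::mpsc;

// Sketch of the cancel-unsafe "send with a timeout" loop. When the timeout
// fires, tokio::time::timeout drops the send future, and the message it was
// carrying is dropped along with it.
async fn send_all(tx: mpsc::Sender<String>, messages: Vec<String>) {
    for msg in messages {
        match tokio::time::timeout(Duration::from_secs(1), tx.send(msg)).await {
            // The send completed within the deadline.
            Ok(Ok(())) => {}
            // The receiver was dropped; no point continuing.
            Ok(Err(_closed)) => break,
            // The timeout fired: the cancelled send future (and the message
            // inside it) is silently lost, and the loop moves on.
            Err(_elapsed) => println!("channel full, moving on..."),
        }
    }
}
```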
(Cancel correctness isn’t a standard term, but it’s a framing I’ve found really helpful in understanding cancellations.) When is cancel correctness violated? It requires three things: The system has a cancel-unsafe future somewhere within it. As we’ll see, many APIs that are cancel-unsafe can be reworked to be cancel-safe. If there aren’t any cancel-unsafe futures in the system, then the system is cancel correct. A cancel-unsafe future is actually cancelled. This may sound a bit trivial, but if cancel-unsafe futures are always run to completion, then the system can’t have cancel correctness bugs. Cancelling the future violates some property of a system. This could be data loss, as with the MPSC send above, some kind of invariant violation, or some kind of cleanup that must be performed but isn’t. So a lot of making async Rust robust is about trying to tackle one of these three things. I want to zoom in for a second on invariant violations and talk about an example of a Tokio API that is very prone to cancel correctness issues: Tokio mutexes. The way Tokio mutexes work is: you create a mutex, you lock it which gives you mutable access to the data underneath, and then you unlock it by releasing the mutex. If you look at the lock function’s documentation, in the “cancel safety” section it says: This method uses a queue to fairly distribute locks in the order they were requested. Cancelling a call to lock makes you lose your place in the queue. Okay, so not totally cancel safe, but the only kind of unsafety is fairness, which doesn’t sound too bad. But the problems lie in what you actually do with the mutex. In practice, most uses of mutexes are in order to temporarily violate invariants that are otherwise upheld when a lock isn’t held. I’ll use a real world example of a cancel correctness bug that we found at my job at Oxide: we had code to manage a bunch of data sent over by our computers, which we call sleds. The shared state was guarded by a mutex, and a typical operation was: Here’s a rough sketch of what that looks like: This is all well and good, but the problem is that the action being performed actually had an await point within it: If the code that operated on the mutex got cancelled at that await point, then the data would be stuck in the invalid state. Not great! And keep in mind the non-local reasoning aspect: when doing this analysis, you need to look at the whole chain of callers. Now that we’ve talked about some of the bad things that can happen during cancellations, it’s worth asking what kinds of code patterns lead to futures being cancelled. The most straightforward example, and maybe a bit of a silly one, is that you create a future but simply forget to call .await on it. Now Rust actually warns you if you don’t call .await on the future: But a code pattern I’ve sometimes made mistakes with is that the future returns a Result, and you want to ignore the result, so you assign it to an underscore like so: If I forget to call .await on the future, Rust doesn’t warn me about it at all, and then I’m left scratching my head about why this code didn’t run. I know this sounds really silly and basic, but I’ve made this mistake a bunch of times. (After my talk, it was pointed out to me that Clippy 1.67 and above have a warn-by-default lint for this. Hooray!) Another example of futures being cancelled involves operations such as Tokio’s try_join! macro. For example: If you call try_join! with a bunch of futures, and all of them succeed, it’s all good. But if one of them fails, the rest simply get cancelled.
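A sketch of that early-exit behavior with tokio::try_join!, using made-up operations:

```rust
use std::io;

// If any branch returns an error, try_join! resolves immediately with that
// error and the remaining futures are dropped (cancelled) instead of being
// run to completion.
async fn run_all() -> io::Result<()> {
    let ((), (), ()) = tokio::try_join!(do_step("a"), do_step("b"), do_step("c"))?;
    Ok(())
}

async fn do_step(_name: &str) -> io::Result<()> {
    // Stand-in for the real work.
    Ok(())
}
```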
In fact, at Oxide we had a pretty bad bug around this: we had code to stop a bunch of services, all expressed as futures. We used it like so: If one of these operations failed for whatever reason, we would stop running the code to wait for the other services to exit. Oops! But perhaps the most well-known source of cancellations is Tokio’s select! macro. Select is this incredibly beautiful operation. It is called with a set of futures, and it drives all of them forward concurrently: Each future has a code block associated with it. If one of the futures completes, the corresponding code block is called. But also, all of the other futures are always cancelled! For a variety of reasons, select statements in general, and select loops in particular, are particularly prone to cancel correctness issues. So a lot of the documentation about cancel safety talks about select loops. But I want to emphasize here that select is not the only source of cancellations, just a particularly notable one. So, now that we’ve looked at all of these issues with cancellations, what can be done about it? First, I want to break the bad news to you – there is no general, fully reliable solution for this in Rust today. But in our experience there are a few patterns that have been successful at reducing the likelihood of cancellation bugs. Going back to our definition of cancel correctness, there are three prongs, all of which come together to produce a bug: Most solutions we’ve come up with try and tackle one of these prongs. Let’s look at the first prong: the system has a cancel-unsafe future somewhere in it. Can we use code patterns to make futures be cancel-safe? It turns out we can! I’ll give you two examples here. The first is MPSC sends. Let’s come back to the example from earlier where we would lose messages entirely: Can we find a way to make this cancel safe? In this case, yes, and we do so by breaking up the operation into two parts: (I want to put an asterisk here that reserve is not entirely cancel-safe, since Tokio’s MPSC follows a first-in-first-out pattern and dropping the future means losing your place in line. Keep this in mind for now.) The second is with Tokio’s write_all. If you’ve written synchronous Rust you’re probably familiar with the write_all method, which writes an entire buffer out: In synchronous Rust, this is a great API. But within async Rust, the write_all pattern is absolutely not cancel safe! If the future is dropped before completion, you have no idea how much of this buffer was written out. But there’s an alternative API that is cancel-safe, called write_all_buf. This API is carefully designed to enable the reporting of partial progress, and it doesn’t just accept a buffer, but rather something that looks like a cursor on top of it: When part of the buffer is written out, the cursor is advanced by that number of bytes. So if you call write_all_buf in a loop, you’ll be resuming from this partial progress, which works great. Going back to the three prongs: the second prong is about actually cancelling futures. What code patterns can be used to not cancel futures? Here are a couple of examples. The first one is, in a place like a select loop, resume futures rather than cancelling them each time. You’d typically achieve this by pinning a future, and then polling a mutable reference to that future. For example: Coming back to our example of MPSC sends, the one asterisk with reserve is that cancelling it makes you lose your place in line. Instead, if you pin the future and poll a mutable reference to it, you don’t lose your place in line.
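Here is a sketch of that "resume rather than cancel" pattern, with an illustrative periodic progress message standing in for the other select branch:

```rust
use std::time::Duration;
use tokio::sync::mpsc;

// The send future is pinned once and polled by mutable reference, so a loop
// iteration where the other branch wins does not drop it (and we keep our
// place in the channel's queue).
async fn send_with_progress(tx: mpsc::Sender<String>, msg: String) {
    let send_fut = tx.send(msg);
    tokio::pin!(send_fut);

    let mut interval = tokio::time::interval(Duration::from_secs(1));
    loop {
        tokio::select! {
            // Polling `&mut send_fut` resumes the same pinned future each
            // iteration instead of creating (and later cancelling) a new one.
            res = &mut send_fut => {
                res.expect("receiver closed");
                return;
            }
            _ = interval.tick() => {
                println!("still waiting for channel capacity...");
            }
        }
    }
}
```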
(Does the difference here matter? It depends, but you now have this strategy available to you.) The second example is to use tasks. I mentioned earlier that futures in Rust are diametrically opposed to similar notions in languages like JavaScript. Well, there’s an alternative in async Rust that’s much closer to the JavaScript idea, and that’s tasks. A fun example is that at Oxide, we have an HTTP server called Dropshot. Previously, whenever an HTTP request came in, we’d use a future for it, and drop the future if the TCP connection was closed. This was really bad, because future cancellations could happen due to the behavior of not just the parent future, but of a process that was running across a network! This is a rather extreme form of non-local reasoning. We addressed this by spinning up a task for each HTTP request, and by running the code to completion even if the connection is closed: The last thing I want to say is that this sucks! The promise of Rust is that you don’t need to do this kind of non-local reasoning—that you can analyze small bits of code for local correctness, and scale that up to global correctness. Almost everything in Rust is geared towards making that possible. Future cancellations fly directly in the face of that, and I think they’re probably the least Rusty part of Rust. This is all really unfortunate. Can we come up with something more systematic than this kind of ad-hoc reasoning? There doesn’t exist anything in safe Rust today, but there are a few different ideas people have come up with. I wanted to give a nod to those ideas: All of these options have really significant implementation challenges, though. This blog post from boats covers some of these solutions, and the implementation challenges with them. In this post, we: Some of the recommendations are: There’s a very deep well of complexity here, a lot more than I can cover in one blog post: If you’re curious about any of these, check out this link where I’ve put together a collection of documents and blog posts about these concepts. In particular, I’d recommend reading these two Oxide RFDs: Thank you for reading this post to the end! And thanks to many of my coworkers at Oxide for reviewing the talk and the RFDs linked above, and for suggestions and constructive feedback. Video of the talk on YouTube. Slides on Google Slides. Repository with links and notes on GitHub. Coverage on Linux Weekly News. What is cancellation? It’s an extremely powerful part of async Rust but also one that is very hard to reason thoroughly about. Analyzing cancellations: Going deep into their mechanics and providing some helpful ways to think about them. What can be done? Solutions, including practical guidance, and real bugs we’ve found and fixed in production codebases. The code that wishes to perform the cancellation can set that flag. Then, the code which checks that flag can exit early.
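In synchronous Rust, a sketch of that flag-based approach might look like the following (names and chunk sizes are made up):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Cooperative cancellation with a shared flag: the cancelling side sets the
// flag, and the worker checks it periodically and exits early.
fn do_work(cancel: &AtomicBool) {
    for chunk in 0..1_000_000u64 {
        if cancel.load(Ordering::Relaxed) {
            // Someone asked us to stop; bail out early.
            return;
        }
        process_chunk(chunk);
    }
}

fn process_chunk(_chunk: u64) {
    // Stand-in for the real work.
}

fn main() {
    let cancel = Arc::new(AtomicBool::new(false));
    let worker = {
        let cancel = Arc::clone(&cancel);
        thread::spawn(move || do_work(&cancel))
    };

    // Later, from the cancelling side:
    cancel.store(true, Ordering::Relaxed);
    worker.join().unwrap();
}
```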
If panicking for cancellation feels strange to you, you’re not alone! But the Salsa framework for incremental computation, used by—among other things—rust-analyzer, uses this approach. Something I learned recently was that this only works on build targets which have a notion of panic unwinding, or being able to bubble up the panic. Not all platforms support this, and in particular, Wasm doesn’t. This means that Salsa cancellations don’t work if you build rust-analyzer for Wasm. While some OSes have APIs to perform this action, they tend to warn very strongly against it. That’s because in general, most code is just not ready for a thread disappearing from underneath it. In particular, thread killing is not permitted by safe Rust, since it can cause serious corruption. For example, Rust mutexes would likely stay locked forever. For example, if you drop a future which sends a message, but for whatever reason you don’t care about the message any more, it’s not really a bug! Obtain a lock on the mutex. Obtain the sled-specific data by value, moving it to an invalid state. Perform an action. Set the sled-specific data back to the next valid state. A cancel-unsafe future exists. This cancel-unsafe future is cancelled. The cancellation violates a system property. The first component is the operation to reserve a permit or slot in the channel. This is an initial async operation that’s cancel-safe. The second is to actually send the message, which is an operation that becomes infallible. Unlike futures, which are driven by the caller, tasks are driven by the runtime (such as Tokio). With Tokio, dropping a handle to a task does not cause it to be cancelled, which means they’re a good place to run cancel-unsafe code. Async drop would let you run async code when a future is cancelled. This would handle some, though not all, of the cases we discussed today. There’s also a couple of different proposals for what are called linear types, where you could force some code to be run on drop, or mark a particular future as non-cancellable (once it’s been created it must be driven to completion).
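Going back to the two-step MPSC send described above, here is a sketch of what it looks like with Tokio's reserve (the message type is illustrative):

```rust
use tokio::sync::mpsc;

// Step 1 (reserve) is the only await point, and cancelling it cannot lose a
// message. Once we hold a Permit, step 2 (send) is synchronous and infallible.
async fn send_one(
    tx: &mpsc::Sender<String>,
    msg: String,
) -> Result<(), mpsc::error::SendError<()>> {
    // Wait for a slot in the channel.
    let permit = tx.reserve().await?;
    // Hand over the message; there is no await point left to cancel.
    permit.send(msg);
    Ok(())
}
```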
Saw that futures are passive Introduced cancel safety and cancel correctness as concepts Examined some bugs that can occur with cancellation Looked at some recommendations you can use to mitigate the downsides of cancellation Avoid Tokio mutexes Rewrite APIs to make futures cancel-safe Find ways to ensure that cancel-unsafe futures are driven to completion Why are futures passive, anyway? Cooperative cancellation: cancellation tokens Actor model as an alternative to Tokio mutexes Task aborts Structured concurrency Relationship to panic safety and mutex poisoning RFD 397 Challenges with async/await in the control plane by David Pacheco RFD 400 Dealing with cancel safety in async Rust by myself

sunshowers 5 months ago

I didn't transition for the metaphysics

Content note for anti-trans, scientifically illiterate nonsense. A couple weeks ago, I came across this rather remarkable Twitter post from Benjamin Ryan, a so-called writer about trans issues, commenting on a recent, anonymously-authored HHS report. This post is… stunning in how ignorant it is. The idea that trans people transition due to some kind of grand ontological theory of gender, and not to correct a misalignment between outward characteristics and inherent identity, is astonishingly detached from reality. I know how absurd it is, because: The world is remarkably complicated. To attempt to make sense of the world, we make models, or maps, of it. As the saying goes, all models are wrong, but some are useful . When we are faced by a mismatch between our model and observed reality, we seem to pick one of two options. The first is to continue to insist that the map is accurate, to deny the existence of a mismatch. This is bad . In the limit, one must inevitably use force to reshape the territory to fit the map. In other words, it is vicious . I’d hesitate to say that this first option is the root of all evil, but it certainly lies underneath a large chunk of it. The second option is to acknowledge the model’s limitations, and if necessary, expand the model to fit these observations. This is good ; it is virtuous . It is how I’ve tried to live my life, with a sense of curiosity and a desire to truly understand the world. Philosophy, like all attempts at sense-making, consists of creating models. To the extent that the existence of trans people has metaphysical or ontological implications, the philosophy lies downstream of these observations. It is common for simpler models to fail in the limit. For example, we now know that the traditional understanding of time isn’t quite right. When the measurements that seemed to belie this understanding were conducted, Einstein understood that traditional Newtonian physics models could not explain them. He came up with new models—new kinds of relativity 1 . We now know that this new understanding of time is more correct, and it would be weird to encounter a 21st century physicist who insists that time is absolute. Similarly, it is weird to encounter a 21st century professional in the field who insists on the traditional understanding of gender, and especially one who does so for metaphysical reasons. Just like with time, we now know that the traditional understanding of gender is a decent approximation that fails for a small but non-trivial percentage of the population ( like myself! ). But this time, getting it wrong has serious human rights implications. A bunch of people, likely through conversations in private mailing lists and group chats , have memed themselves into believing that their ontological maps must be correct—that any mismatches between it and the territory must be a mistake in the territory. To be clear, because free will does not exist , it is not their fault. It is a systemic issue, a matter of the environment these people are immersed in: one of ignorance, not knowledge . However, it is a real problem that the tendency to reject observed reality in favor of existing models is ascendant. There have always been people who proudly cling to their models as accurate descriptions of all of reality even as scientific understanding evolves. After a lull of a few decades, this group is now increasingly in charge. This tendency must be overcome, both at the individual and at the societal level. I’ll leave you with one last thought. 
You might have heard about the UK’s Cass review in the mainstream media, but perhaps less so about how it has been derided by working professionals. People far more knowledgeable than I am have said it “violates its own evidentiary standards”, that it is “completely contrary to the interests of adolescents in need of help”, and that it “transgresses medical law, policy, and practice”. I’m nowhere near qualified to critique all of it, but I just want to point to one little thing (p. 14): the claim that “medication is binary”. To the best of my knowledge, no one in the media has pointed out that “medication is binary” is simply false. Yet, supposed papers of record like the Washington Post, when writing approvingly about the HHS report, have nothing to say about this untrammeled display of disinformation. (See also, Gell-Mann amnesia.) I’m unapologetic about my moral values: suffering is bad, knowledge is better than ignorance, and building new models is better than clinging to old ones. I hope you are as well. Physicists were familiar with the idea of relativity centuries before Einstein; it was Galileo who first described relativity. Before Einstein, the understanding was that the laws of mechanics are the same in all inertial (non-accelerating) reference frames. With special relativity, Einstein substituted physics for mechanics, and with general relativity, he crossed out inertial as well.  ↩︎ I am an honest conveyor of my experiences, and I absolutely did not transition for the metaphysics. Rather, I transitioned because I used to have negative feelings about my body and the way others perceived me, and now I have positive feelings about those things. Any doctor who specializes in trans care can tell you that there is a wide variety of methods for a nonbinary medical transition, from different hormone therapies to nonbinary surgeries. Besides that, many nonbinary people like myself go through mostly binary medical transitions, and we do so for a variety of reasons.

sunshowers 7 months ago

Demystifying monads in Rust through property-based testing

In programming pedagogy, monads have a place as a mystical object from the functional programming world that’s hard to understand and even harder to explain. The stereotype about monad explanations is that they fall into two buckets: either comparisons to some kind of food item, or throwing complex mathematical jargon at you (“a monad is just a monoid in the category of endofunctors, what’s the problem?”). But monads aren’t esoteric or magical at all, nor do they only occur in functional programming. In essence, a monad is a design pattern that allows you to chain together operations within a framework. Noticing monadic design can be quite helpful for programmers in any environment, particularly because it’s often undesirable! In many situations, monads have observable tradeoffs, and sometimes (as here) we can even collect concrete data to back this up. I’m going to try and explain monads in a way that is: In other words, I’m going to try and write the monad tutorial that I personally would have appreciated when I was younger. And I’m going to start at a place that’s unconventional: through property-based testing, where monads have profound performance implications. Note: While this article’s primary goal is to explain monads, it also serves as a practical introduction to property-based testing and fault injection techniques. If you’re new to these, you’ll find an introduction to both alongside the monad explanation. This post consists of five sections: Testing is fundamentally about building models for how your code should behave, at just the right level of complexity: they should match the scope of what you’re testing, without going overboard and reinventing the whole system a second time. The best explication of this general idea I’ve seen is in this piece by the great Argentinian writer Jorge Luis Borges: …In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it…” —On Exactitude in Science, Jorge Luis Borges Nothing quite exemplifies testing-as-modeling like property-based testing—an approach where instead of specifying exact examples, you define models in terms of properties, or invariants, that your code should satisfy. Then, you test your models against randomly generated inputs. Let’s take a simple example of a sort function defined over a slice of integers: How should we go about testing it? The most common way to do this is to list out a bunch of inputs and ensure they are correctly sorted, through example-based tests. Example-based tests are quite valuable, because they are easy to write and quite direct about what happens. But even in a simple example like sorting, it’s easy to imagine cases where your examples don’t quite cover every edge case. How can more edge cases be covered? Well, one way to do so is to step back and ask, what is the sort trying to do? The goal of a sort function is to ensure that all the elements are in ascending order. Can we test that directly? The first thing we’d need is to get some inputs to test with. All we care about is a list of numbers here, which seems like it should be easy to generate using a random number generator. So maybe we write something like the sketch below. We now have a model of sorting that we’ve written down in code form: any pair of values must be in ascending order.
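Here is what that might look like as a rough sketch using the rand crate; the standard library's sort stands in for whatever function is under test:

```rust
use rand::Rng;

// Generate random lists, run the sort, and check the "ascending order"
// property on every adjacent pair.
#[test]
fn random_inputs_come_out_sorted() {
    let mut rng = rand::thread_rng();
    for _ in 0..1000 {
        let len = rng.gen_range(0..100);
        let mut input: Vec<i64> = (0..len).map(|_| rng.gen()).collect();
        input.sort();
        for pair in input.windows(2) {
            assert!(pair[0] <= pair[1], "out of order: {:?}", pair);
        }
    }
}
```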
(In this view, example-based tests are also simple models: for input X, the output should be Y.) Now, we run the test, and… whoops, looks like the function has a bug. The failing example is quite unhelpful and hard to understand! It is possible to use this as an input to debug with, but it is quite painful. If we could use automation to turn this test case into a much smaller one that can still reproduce the bug, debugging becomes significantly easier. The process of doing so is called test case shrinking or reduction. To recap—property-based testing consists of two components: What counts as “smaller”? For a list of numbers, ideally we’d be able to minimize both the number of items in the list and the integers themselves. This suggests an algorithm for how to write a shrinker by hand: First, try and minimize the size of the list using a binary search algorithm. Once the list has been shrunk, start shrinking the elements within the list, applying binary search to each element. After you’re done writing this algorithm, you’re well on your way towards creating the original property-based testing library, QuickCheck. This approach has you write two functions: a generator and a shrinker. With this, you can get much more reasonable-looking output: And for relatively simple cases like lists of integers, this kind of shrinking works quite well! But we’re not here to test simple cases. We’re here for the difficult ones. Most real-world sort implementations don’t just work over a list of integers. They’re written to be polymorphic over anything which can be sorted. In Rust parlance, this means anything that implements the Ord trait; and even if not, a custom comparator function can be provided. But Ord implementations can be written by hand, and custom comparators are virtually always written by hand. One immediate consequence is that it’s possible that the comparator function says two elements are equal, but they are actually different. In that case, should the order of elements be preserved? Unstable sorts tend to be faster than stable ones, and there are valid reasons for preferring each at different times. (The Rust standard library has separate implementations for stable and unstable sorts.) Additionally, hand-written implementations mean users can make mistakes! A production-grade sort algorithm must behave reasonably in the face of arbitrary user input, not just in the actual elements being sorted but also in the comparator function (full credit to Lukas Bergdoll for his extensive research here): Ord safety: Users can write a comparator that’s simply incorrect. An easy way is to introduce a difference between ordering and equality, for example by returning an ordering other than Equal for two elements that are actually equal. Users could also return different answers for the same comparison when called at different times 1. Panic safety: The comparator can panic in the middle of execution. Since panics can be caught, the input should be in some kind of valid state afterwards. Observation safety: If any of the inputs are mutated by the comparator, those mutations must be carried through to the final output. (With Rust, mutation through shared references is possible via interior mutability, as seen in Cell or RefCell.) In these cases, completing the sort successfully becomes impossible. But it’s important that we leave the input in a reasonable state. How do we go about testing this? Trying to think of all the different failure modes seems really hard!
But property-based testing can address this need through randomized fault injection. Let’s focus on Ord safety for now, with a comparator that flips around the result 20% of the time: To generate such a comparator: And to test this: Our original approach continues to work well—that is, right until the test finds a bug and we need to shrink a failing input. How does one go about writing a shrinker for this input? Doing so seemed relatively straightforward for a list of integers. But this is a list where the elements are pairs of an integer and another list. Also: We’ve just tested Ord safety so far—once we’ve added fault injection for other kinds of safety, the complexity seems endless. And even more importantly, there isn’t a great way to compose smaller shrinkers together to form larger ones. Writing shrinkers is a lot of work already, and what’s the point if you have to keep doing all of it over and over? The practical result is that most of the time, writing a shrinker for types like this is quite difficult. And writing one is also technically optional, since: All in all, given the choice of writing a shrinker by hand or just moving on with their lives, most developers tend to choose the latter 2. Because of this, most modern property-based testing frameworks, like proptest in Rust, try and take care of shrinking for you through some notion of integrated shrinking. The idea behind integrated shrinking is: When you generate a random input, you don’t just generate the value itself. You also generate some context that is helpful for reducing the size of the input. The proptest library comes with many different kinds of strategies that can be composed together to create more complex ones. To generate these values, we’re going to compose a few different strategies: You might be wondering where all the shrinking code is. It’s actually implemented on the corresponding value trees for each strategy: You can see how the base strategies get turned into successively larger ones through combinators like prop_map. It’s very similar to iterators, where you can keep calling map, filter, and so on over and over 3. In my experience, composability across different scales is where this model shines. You can build bigger strategies out of smaller ones, up to a surprising amount of complexity. This means that your team can invest in a library of ever-more-complex strategies, and continue to derive value out of that library across everything from the smallest of unit tests to large integration tests. But there is one massive wrinkle with integrated shrinking. And that wrinkle is exactly what monads are about. In the previous few sections, we’ve built up all the context we need. We’re now going to look at the fundamental operation that introduces monadic composition to proptest: prop_flat_map. In the example above, there’s a function called prop_map, which we use to turn a tuple of components into a value. What happens when you try and shrink a value through a prop_map? It’s very simple: So prop_map is just a conduit that values pass through: it simply maps a value to another value, and does not change the structure of the value tree in any way. Now let’s say we want to test out pairs of values, where the way the second is generated depends on the first. This is a situation where we don’t just want to map a value to another value—we need to generate a whole new strategy based on a value. This is a fundamental shift: This is monadic composition in action. The result of one operation controls, at runtime, the entire shape of the next operation.
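To make this concrete, here is a minimal sketch of that kind of dependent generation, using illustrative integer strategies rather than the post's exact types:

```rust
use proptest::prelude::*;

// Generate a pair of integers where the second is always greater than the
// first, by flat-mapping the first value into a strategy for the second.
fn increasing_pair() -> impl Strategy<Value = (u32, u32)> {
    (0..1000u32).prop_flat_map(|first| {
        // The value of `first` decides the *shape* of the next strategy:
        // a brand-new range that starts just above it.
        ((first + 1)..=1001u32).prop_map(move |second| (first, second))
    })
}

proptest! {
    #[test]
    fn second_is_greater(pair in increasing_pair()) {
        prop_assert!(pair.1 > pair.0);
    }
}
```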
The sketch above shows one way to go about generating such pairs, where the second value is always greater than the first. Your first reaction might be: wow, this seems really powerful. And you would be right! You can write whatever you like in the body of a prop_flat_map: In a real and quite rigorous sense, prop_flat_map is maximally powerful. Every combinator we’ve talked about, and in fact most of the proptest library, can be written in terms of prop_flat_map. So why do all these combinators exist? Why don’t we just use prop_flat_map everywhere? The prop_flat_map function actually works reasonably well in practice. It generates random values with the right shape, and shrinks them correctly on a failing input. The problem? Shrinking is really, really slow. Exponentially slow. Why is that the case? Consider what happens when we want to shrink a value through a prop_flat_map. As before: Because prop_flat_map generates brand new value trees each time it’s called, shrinking has to be started again, from scratch, each time! This is the essence of monadic composition: powerful, unconstrained, and fundamentally unpredictable. We can measure the impact of monadic composition quite directly, along two related axes: the amount of time it takes to complete iterations, and the number of iterations the shrink completes in. For this post I wrote a small Rust program which collects metrics about shrinking for: With this program, I collected 512 samples on my workstation and analyzed the data. (I ran the program with opt-level set to 1, to mimic typical dev builds in larger Rust projects 4.) First, the amount of time it took to shrink values down, by key percentile: In this table, p50 represents the median completion time, while p75 and p90 show the times that 75% and 90% of the samples completed within. With prop_map, the amount of time scales somewhat linearly as we go from pairs to triples. But with just one additional level of prop_flat_map, the performance degrades dramatically, going from under 20 milliseconds to almost 2 seconds! That’s over 100x slower. The difference in the number of iterations is even more striking: From hundreds of iterations to almost a million! And we’re working with pretty simple structures here, too. Just one more level of prop_flat_map would make shrinking quite noticeably slow, and another one after that would be disastrous. The data here spans several orders of magnitude. A good way to visualize this kind of data is via a CDF plotted on a logarithmic scale. In these graphs, the x-axis shows the time or iterations, and the y-axis shows the cumulative probability. Curves that are further to the right are worse, and the logarithmic scale reveals that the differences are in orders of magnitude. What makes monadic composition so difficult to deal with? It has to do with the fact, mentioned above, that you can write whatever you like inside the prop_flat_map callback. Because a prop_flat_map can contain arbitrary computation inside of it, and that computation returns brand-new strategies, determining how a value will shrink through it is fundamentally unpredictable without actually executing it. In other words, the callback is quite opaque. Why is that? It’s because the callback is written in Rust, which is a powerful, Turing-complete language. It is impossible to fully analyze the semantics of Turing-complete languages 5. (You might know this as the halting problem, or as Rice’s theorem.) But the fact that some analysis requires solving the halting problem is merely the start of the discussion, not the end of it! There is a rich literature on how to find approximate solutions for problems that are otherwise insoluble due to Rice’s theorem.
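As an illustrative sketch of that opacity (the strategies here are made up), nothing stops the callback from choosing a completely different strategy shape depending on the value it receives:

```rust
use proptest::prelude::*;

// The framework cannot know, without running the closure, which of these
// shapes the returned strategy will have.
fn opaque() -> impl Strategy<Value = Vec<u8>> {
    (0usize..100).prop_flat_map(|n| {
        if n % 2 == 0 {
            // One shape: a vector of exactly `n` random bytes...
            prop::collection::vec(any::<u8>(), n).boxed()
        } else {
            // ...or a totally different one, decided at runtime.
            Just(vec![0u8; n * 3]).boxed()
        }
    })
}
```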
For shrinking, here are a few approaches that are known to work. One option is to place limits on how long shrinking is done for. Note that prop_flat_map has no issues while generating values, just while shrinking them 6. The proptest library itself sets limits on shrink iterations, particularly across prop_flat_map instances. This ensures that shrinking operations finish in a reasonable amount of time, even if they don’t produce minimal values. A better option is to rewrite generators to not use monadic composition. For the example above, it’s not hugely difficult 7: But this can be quite challenging as complexity goes up! The proptest library comes with a number of helpers to write non-monadic strategies. But there are real situations, particularly with large and complex data structures (for example, randomly-generated programming language syntax trees), where none of those options suffice and you have to use the full power of prop_flat_map. Last but not least, there’s a set of approaches that I’m going to put into the bucket of rediscovering structure across flat maps. Key to these is understanding that when you generate a random value, you’re turning an RNG, which is a random stream of bits, into concrete, structured values. Can we somehow be clever about looking at the RNG bitstream? One option is to instrument the test, for example by using a fuzzer. Fuzzing is all about generating random data, looking at the branches explored, and tweaking the random data to ensure other branches are taken. It’s a great fit for peering past the black box of monadic composition. Another option is to be clever with the random number generator. A random number generator ultimately generates a sequence of ones and zeroes. Can we poke at this sequence, possibly with the help of some hints from the strategies themselves? This is implemented by the Hypothesis framework for Python; see this excellent paper about it. Both of these approaches are heuristic and quite complex. But that’s what you need to put together some structure again, after it’s been through the blender of monadic composition. In this article, we looked at monads as a design pattern: a way for a user to compose operations within a framework. We looked at both monadic functions like prop_flat_map, and non-monadic ones like prop_map. Finally, we saw how in the context of property-based testing, monadic composition has a performance impact measured in orders of magnitude. With this knowledge in mind, you can now spot monadic composition in all kinds of other places: Futures in Rust, where .await is monadic, but futures are awaited sequentially. To run futures concurrently, you have to use non-monadic operations like join and select. Some build systems, where individual build nodes can generate more nodes, making it impossible to generate a dependency graph without running the build. (Cargo is thankfully non-monadic.) And even with iterators, where map returns an iterator with the same length as before, and filter returns an iterator with the same or a smaller length, but the length of flat_map is unbounded. Option::and_then, Result::and_then, and Iterator::flat_map all happen to be monadic operations as well, though they’re all simple enough that they don’t have any practical downsides. The common thread through all of these examples is that within a framework, monadic composition is not just from value to value. It is from value to a further instance of that framework. The return value of an awaited future can result in more futures being spawned, monadic build nodes can generate more build nodes, and flat_map turns individual values into iterators.
This freedom is what makes monads both the most flexible kind of composition, and the hardest to predict the behavior of. This is part of a general observation throughout programming, whenever there’s an interaction between two parties or two sides. The more restrictions there are on one side, the freer the other side is to do what it likes. Monadic composition is an extreme version of that: the most powerful and least constrained form of composition for the user, but the most difficult to deal with for the framework. Whether you’re a user or a library designer, pay close attention to situations where your operations are monadic. They can provide a great deal of power, perhaps too much in some circumstances. If non-monadic operations are sufficient to help you achieve your goal, prefer them. And when they aren’t enough: well, now you know what you’re getting into! Thanks to Fiona and Cliff Biffle for reviewing drafts of this post. Any mistakes in it are my own. Updated 2025-02-20: Clarified note about Turing completeness to indicate that it is not the composition itself that’s Turing-complete—it’s the language used to write the callback in that’s at issue. This section contains some jargon, but it’s mostly here to satisfy the pedants who will inevitably Ctrl-F for “monad laws”. Please ignore this section if you didn’t get here through a search for that string. Formally speaking, monads are any kind of structure with two operations, return and bind, where these operations obey three monad laws. For strategies, return is Just, and bind is prop_flat_map. Here are the three monad laws expressed in terms of proptest: Left identity: for any value and function f, Just(value).prop_flat_map(f) should be the same as f(value). That is indeed the case, both for generation and for shrinking. Since Just does not do any shrinking, the pathological behavior doesn’t take effect. Right identity: for any strategy s, s.prop_flat_map(Just) should be the same as s. That is also the case. Every time the result is shrunk, prop_flat_map has to discard the state returned by the dependent strategy. But Just is a trivial strategy that doesn’t have any state, so it’s effectively a no-op. Associativity: if you have a chain of two prop_flat_maps, you can apply the second one either “inside” or “outside” the first one. More specifically, for any strategy s and functions f and g, s.prop_flat_map(f).prop_flat_map(g) should be the same as s.prop_flat_map(|value| f(value).prop_flat_map(g)). That’s true as well, both for generation and for shrinking, though the latter is slightly harder to see. One might object by pointing out that in Rust, the above operations result in different types even if they have the same behavior. That is true, but also not particularly relevant, since you can always call .boxed() to erase the type. One might also point out that we haven’t talked about a typeclass or trait here. That’s a distraction! Monadic composition exists everywhere in programming, regardless of whether the language lets you express Monad as a reified type. You can express such a trait in modern versions of Rust via GATs, but there really isn’t much utility to it. Some of these examples might seem far too adversarial to ever be seen in the real world. But an implementation that can handle adversarially bad inputs can also handle inputs that are bad in more benign ways. For production-grade sort functions, it makes sense to design for the worst possible inputs.  ↩︎ This is not a moral failing on the part of those developers, because free will doesn’t exist. It’s completely fine, and worth embracing! Integrated shrinking is altering the environment in action.  ↩︎ Just like with iterators, the type signatures also become quite long!
↩︎ The default value of opt-level, 0, is fine for prop_map, but infeasible for prop_flat_map with triples.  ↩︎ The problem is not specific to Turing-complete languages—it even exists in non-Turing-complete ones. For example, if the callback were written in a language that only had for loops, no while loops, the framework would still not be able to peer past the shrinker.  ↩︎ I think this is the key insight that many monad tutorials miss. When writing about monads, it’s tempting to focus on simple examples like Option, where the monadic behavior doesn’t have significant practical implications. Here, too, if we restrict ourselves to generating values, there’s nothing wrong with prop_flat_map. It’s only during shrinking that monadic composition rears its ugly head. If you’re explaining monads to an audience, consider emphasizing not just what monads are, but when their unconstrained power becomes a liability rather than an asset.  ↩︎ If you’re paying close attention, you might notice that while this function meets the requirement we set out, the random distribution generated is somewhat different. In this case, it’s possible to get a distribution equivalent to the prop_flat_map example with a bit more care.  ↩︎ Geared towards Rust developers, with code samples in Rust, though I hope any sufficiently curious programmer can follow along Free of jargon: no mathematical formalism whatsoever Without analogies of any kind, and grounded in real programming problems Non-trivial: focusing on a production-worthy example with objectively measurable implications Practical: with advice all of us coders can use Property-based testing goes over the basics Drawing the rest of the owl talks about a complex scenario: using property-based testing to inject faults Integrated shrinking shows how to reduce inputs of challenging complexity to smaller sizes Monads, finally is where we introduce monads in this context, and provide data for how costly they can be Rediscovering structure discusses some ways to mitigate the tradeoffs of monads in property-based testing Test case generation using a source of randomness. On failing a test, shrinking it down to a smaller, more understandable size. First, try and minimize the size of the list using a binary search algorithm. For example: Try the first half of the list against the function. If that exhibits an error, attempt to recursively shrink this half. If that doesn’t work, try it with the last half of the list. If neither work or if the list has 1 or fewer elements, move on to the next step. Once the list has been shrunk, start shrinking the elements within the list, applying binary search to each element. A sort implementation which preserves the order is called a stable sort. An implementation which does not is called an unstable sort.
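A rough sketch of that manual shrinking algorithm, assuming a fails check that returns true when the property still fails for a given input; this is the idea, not QuickCheck's actual implementation:

```rust
fn shrink_list(input: &[i64], fails: impl Fn(&[i64]) -> bool) -> Vec<i64> {
    // Step 1: shrink the length by repeatedly trying either half of the list.
    let mut current = input.to_vec();
    while current.len() > 1 {
        let mid = current.len() / 2;
        if fails(&current[..mid]) {
            current.truncate(mid);
        } else if fails(&current[mid..]) {
            current.drain(..mid);
        } else {
            break;
        }
    }

    // Step 2: binary-search each (non-negative) element towards zero, keeping
    // the smallest value for which the property still fails.
    for i in 0..current.len() {
        let (mut lo, mut hi) = (0i64, current[i]);
        while lo < hi {
            let midpoint = lo + (hi - lo) / 2;
            let mut candidate = current.clone();
            candidate[i] = midpoint;
            if fails(&candidate) {
                hi = midpoint;
            } else {
                lo = midpoint + 1;
            }
        }
        current[i] = hi;
    }
    current
}
```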
If the test passes, shrinkers are never invoked. Simply write correct code, and shrinking just isn’t an issue! If the test fails, developers can debug using the original input. It’s painful but possible. In proptest, this combined value and context is called a value tree. Any implementation that accepts a random number generator and turns it into a value tree is called a strategy. The Just strategy, which “just” returns a single value. The prop_oneof strategy, which generates values from one of a possible list of strategies, where each choice has a given probability. (A function that takes one or more strategies as input, and produces a strategy as its output, is called a combinator.) A range strategy, which generates values within the range uniformly at random. The vec combinator, which accepts a strategy for elements and a size parameter. Range strategies use binary search to make values smaller. The Just strategy doesn’t do any shrinking, since it just returns a single value. The prop_oneof combinator shrinks towards the beginning of the choices: in this case, the later choice is shrunk into the earlier one. The vec combinator implements roughly the algorithm in Implementing a manual shrinker above. Attempt to get a smaller value from the underlying value tree. Call the map function on the value. As above, prop_map transforms a value into another, but preserves the original structure. This new method, prop_flat_map, goes well beyond that. Based on the value generated, it creates a brand new strategy with a structure all of its own. You can conditionally return one strategy or another, reimplementing prop_oneof. You can first generate a size and then return a vector with those many elements, reimplementing the vec combinator. You can call prop_flat_map again, as many times as you like. We would attempt to get a smaller value from the underlying value tree. And then, because we would call the callback to generate a new strategy, we would throw away the previously-generated value tree entirely. The prop_flat_map implementation for pairs above, and a non-monadic implementation with prop_map (see below). The same for triples: a non-monadic implementation with prop_map, and a monadic one with two levels of prop_flat_map.
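A small illustration of those base strategies and combinators; the specific values and weights here are made up for the example:

```rust
use proptest::prelude::*;

// Compose Just, a weighted prop_oneof, an integer range, and the vec
// combinator into a strategy for vectors of u32.
fn example_strategy() -> impl Strategy<Value = Vec<u32>> {
    let element = prop_oneof![
        9 => 0..100u32,  // most of the time, a value drawn from a range
        1 => Just(42u32) // occasionally, a fixed value
    ];
    // The vec combinator: a strategy for elements plus a size parameter.
    prop::collection::vec(element, 0..16)
}
```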

sunshowers 8 months ago

Free will quite clearly doesn't exist

There’s absolutely no brand-new insight in this post. A huge amount of credit goes to various philosophers and thinkers, especially folks like Aaron Rabinowitz, for shaping my views. Any errors in it are my own. In the natural world, everything that occurs is the result of a chain of causation all the way to the big bang. The chain of causation is almost entirely deterministic, with some chaotic effects that appear to be random on the outside but are in reality deterministic, and also with a small amount of true (quantum) randomness that very occasionally turns into something big. (We’ll summarize all of these causes into a single word, “deterministic.”) At no point does anything we know from studying the natural world resemble anything like our common understanding of free will. There aren’t even the vaguest hints or suggestions of anything like that. So free will in its common form is inherently a supernatural belief. It’s fine if you believe in supernatural phenomena. But you’re not going to make people who see no need to believe in supernatural phenomena, like myself, agree with you on that basis. This is not unfalsifiable! One of the characteristics of a naturalistic view is invariance: for example, the laws of the universe stay the same when you move around from place to place, and/or over time. Very clear evidence of supernatural interventions would be a different set of rules governing the universe at one particular place or time, in a way that doesn’t generalize. Such evidence has never been presented 1 . A general response to pointing out this basic truth is compatibilism. This term refers to a group of positions which accepts determinism, but tries to preserve something resembling “willpower”, “agency” or “moral capacity”. But that’s just shifting the goalposts. It is true that the ability to make globally optimal decisions is valuable, and it is also true that this varies by person and over time. But that variance is determined by the same processes that are in charge of everything else. Why wouldn’t it be? We’ve all had some days where writing the right code, or more broadly doing the right thing, has been harder than others. That, too, traces its chain of causation back to the big bang. Why do so many humans believe in free will? The widespread belief in free will is also deterministic, of course. Like everything else about us, it’s a result of a chain of causality, i.e. some combination of environmental, genetic, and random effects. It may be a helpful bias to have in some situations. But we have plenty of other forms of biased thinking. Part of becoming Better is understanding how our biased thinking might cloud our understanding of society and lead to worse outcomes. In particular, our belief in free will and the closely-related notion of moral responsibility clouds our ability to see developers writing bad code as a result of bad development tools, and more seriously incarceration and other kinds of torture, for what they are. A belief that fear and terror prevent people from doing things you don’t want them to do isn’t incompatible with determinism, but it’s best to be honest about what you truly believe here. The last stand of the free-will defenders tends to be some variety of “yes, it’s false, but laypeople need to believe in it to cope with reality.” For many of us, our cultures have not memetically prepared us to deal with a lack of free will 2 . This is unfortunate, because recognizing that free will does not exist is not a reason for nihilism or fatalism. It is a tremendous gift of knowledge! If all behavior is some combination of environmental, genetic, and random luck, then it follows that the easiest point of leverage is to make the environment better.
We now have pretty good data that better environments can cause, say, memory safety bugs to decrease from 76% to 24%, or fewer children to develop asthma. Why wouldn’t better environments also help us reason more clearly, make better decisions, and generally live life in a more pro-social manner? Overall, I think it is far more interesting to skip over all this free will stuff and instead do what the French philosophers did: examine how environments exercise control over us, in order to suggest ways to reshape them. So when people exercise catastrophically poor judgment while writing code in memory-unsafe languages, or while making the most important political decisions of their lives, it is important to understand that their behaviors are a result of the environment failing them, not a personal moral failing. This is Reddit atheism 101. It is important to occasionally remind ourselves about why we started believing in these foundational ideas.  ↩︎ Like everything else in existence, this, too, is determined.  ↩︎

sunshowers 9 months ago

Why nextest is process-per-test

I’m often asked why the Rust test runner I maintain, cargo-nextest , runs every test in a separate process. Here’s my best attempt at explaining the rationale behind it. This document is cross-posted from the canonical copy at the nextest site . Benchmarks are available there as well, though a large part of this document talks about non-benchmarkable reliability improvements. A key factor distinguishing nextest from is that nextest runs each test in a separate process . With nextest, the default execution model is now, and will always be, process-per-test. This document explains why. Some terms used in this document: The Rust ecosystem is now staggeringly large. There are millions of Rust developers and billions of lines of Rust code in production today. A key challenge with driving adoption of any new technology, like a new test runner, is coordination : everyone needs to agree on how to run and manage them. Game theory offers a powerful framework to study these kinds of coordination problems. Consider this classic problem: if two people need to meet in London on a particular day but can’t coordinate the time and place, they’re likely to choose noon at Big Ben . This becomes what game theorists call a focal point , also known as a Schelling point —a natural default that everyone can assume without discussion. The process-per-test model is powerful because it serves as the focal point , the Big Ben, of how to run tests. Just as everyone in London knows where the Great Clock of Westminster is, every operating system knows how to create and manage processes. Just as noon is a natural meeting time, processes form natural fault boundaries. The process-per-test model forms a universal protocol that everyone in the ecosystem can rely on without explicit coordination. This game-theoretic lens provides fresh insight into the benefits of process-per-test: Test authors and libraries don’t need to coordinate global state usage. There are a surprising number of real-world use cases that require per-test isolation—particularly integration tests against global state: Even if particular tests don’t require per-test isolation, they may often benefit from it. For example, singletons become separated per-test. (You may argue that singletons aren’t desirable, or should not be used for tests—but instead, think of per-test isolation as a principled, zero-coordination solution to this class of issues.) Internal memory handling doesn’t need to be coordinated across tests. Memory corruption in one test doesn’t cause others to behave erratically. One test segfaulting does not take down a bunch of other tests. Test runners don’t need a custom protocol. With process-per-test, test runners can rely on universal OS primitives, like the ability to kill processes. For more discussion, see The costs of coordination below. Process-per-test is not free, though. It’s worth acknowledging some costs: Tests might want to share in-memory state. They might have an in-memory semaphore for rate-limiting, a shared in-memory immutable cache that is primed on first access, or an in-memory server to which tests communicate via message passing. Process-per-test makes these valuable patterns harder to achieve : semaphores must be managed by the test runner, in-memory state must be stored on disk, and shared servers must live out-of-process. Adapting existing tests to process-per-test may require a significant engineering effort. Another downside that’s noticeable in some situations is process creation performance . 
Creating processes is very fast on Linux and most other Unix-like systems. It is quite slow on Windows, though, and it can be slow on macOS if anti-malware protections are interfering . How much this is an issue in practice depends on how long individual tests take to run. For many projects on Windows, nextest is overall faster than anyway because it handles long-pole tests better. Process-per-test enables rich lifecycle management with zero coordination via universal OS primitives. What kinds of coordination would enable shared-process tests? Here’s a partial list of operations that nextest performs using OS primitives: Compare this to ’s shared-process model. works by embedding a small runner into each test binary which manages rate-limiting and filtering. However, as of early 2025, each runner works in isolation: there is no global orchestrator that can schedule tests across multiple binaries. There is also no way to configure timeouts for tests, or to cancel some (but not all) running tests. For more, see the appendix . If a test takes down the process, all of the other tests in the binary are effectively cancelled as well. In this case, getting a full accounting of all tests and what happened to them is extremely difficult. JUnit reports wouldn’t be nearly as valuable if they were missing many tests! Effective and robust management of tests in a shared-process model requires coordination between three entities: This represents a lot of extra work! Not just the technical kind (though that is certainly quite involved), but also the coordination kind: building support for such a protocol and getting developers to adopt it. And it would still not solve every problem. (For example, segfaults would still take down the whole process.) There are many technical benefits to the process-per-test model, but the biggest benefit is in (the lack of) coordination: the process-per-test model acts as a focal point that all participants can agree on by default. This is the key reason that nextest commits to this model being the default in perpetuity. Nevertheless, we’re excited to see developments in this space. We’ll consider opt-ins for newer patterns that can deliver feature-rich and reliable test running at scale like nextest does today. Many nextest users rely on its ability to cancel running tests due to timeouts. This section talks about why terminating threads is much more difficult than terminating processes. Unlike process termination, though, thread termination is inherently hazardous. The Windows function explicitly warns against ever using it: TerminateThread is a dangerous function that should only be used in the most extreme cases. You should call TerminateThread only if you know exactly what the target thread is doing, and you control all of the code that the target thread could possibly be running at the time of the termination. With and the POSIX equivalent, , the behavior of synchronization primitives like is undefined. Certainly, Rust mutexes will not be marked poisoned . It’s possible (even likely) that mutexes are held forever and never released. Thread-killing operations would be devastating to test reliability. Practically speaking, the only reliable way to cancel synchronous Rust code is with some kind of cooperation from the test itself. This could mean the test checking a flag every so often, or by the test regularly calling into a library that panics with a special payload on cancellation. 
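As a sketch of the flag-checking form of cooperation mentioned above (this is illustrative, not nextest’s actual mechanism), a synchronous test body can poll a shared atomic flag between units of work:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;

// Hypothetical long-running test body that cooperates with cancellation by
// checking a shared flag between chunks of work.
fn run_test_body(cancelled: &AtomicBool) -> Result<(), &'static str> {
    for _chunk in 0..1_000 {
        if cancelled.load(Ordering::Relaxed) {
            return Err("cancelled");
        }
        std::thread::sleep(Duration::from_millis(10)); // stand-in for real work
    }
    Ok(())
}

fn main() {
    let cancelled = Arc::new(AtomicBool::new(false));
    let worker = {
        let cancelled = Arc::clone(&cancelled);
        std::thread::spawn(move || run_test_body(&cancelled))
    };

    // Pretend a timeout fired: request cancellation and wait for the thread
    // to acknowledge it, rather than killing the thread out from under it.
    std::thread::sleep(Duration::from_millis(50));
    cancelled.store(true, Ordering::Relaxed);
    println!("worker result: {:?}", worker.join().unwrap());
}
```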
With async Rust, cancellation is somewhat easier , because yield points are cancellation points. However, in an eerie similarity with , Tokio mutexes are also not marked poisoned on a task or future cancellation 1 . It can be illuminating to contrast process termination with thread termination: These examples make clear how focal points manifest and perpetuate: we’ve all generally agreed that other processes might behave in strange ways, but we assume that other threads within our process are going to behave reliably. Based on our experiences with cancellation in other Rust projects, we strongly recommend Tokio mutexes be treated as a feature of last resort.  ↩︎ Process-per-test: Each test lives in its own process. Shared-process: Several tests share the same process. There are vast numbers of existing tests, representing millions of developer-hours of effort, that must continue to work These tests integrate with all sorts of different libraries, including many not written in Rust, or with other unusual requirements Test runners need to be able to run tests reliably, and rely on certain properties—for example, that killing one test will have no effect on others Test authors and libraries don’t need to coordinate global state usage. There are a surprising number of real-world use cases that require per-test isolation—particularly integration tests against global state: Tests against a graphical API like EGL , which only allows one thread in a process to create a GPU context. uses nextest for this reason Tests that must be run on the main thread of their corresponding process, like tests against certain macOS frameworks Tests that require altering environment variables or other global context Internal memory handling doesn’t need to be coordinated across tests. Memory corruption in one test doesn’t cause others to behave erratically. One test segfaulting does not take down a bunch of other tests. Test runners don’t need a custom protocol. With process-per-test, test runners can rely on universal OS primitives, like the ability to kill processes. For more discussion, see The costs of coordination below. Tests might want to share in-memory state. They might have an in-memory semaphore for rate-limiting, a shared in-memory immutable cache that is primed on first access, or an in-memory server to which tests communicate via message passing. Process-per-test makes these valuable patterns harder to achieve : semaphores must be managed by the test runner, in-memory state must be stored on disk, and shared servers must live out-of-process. Adapting existing tests to process-per-test may require a significant engineering effort. Another downside that’s noticeable in some situations is process creation performance . Creating processes is very fast on Linux and most other Unix-like systems. It is quite slow on Windows, though, and it can be slow on macOS if anti-malware protections are interfering . How much this is an issue in practice depends on how long individual tests take to run. For many projects on Windows, nextest is overall faster than anyway because it handles long-pole tests better. Start a test. In particular, start a test at a specific moment, and not before then; and also, do not start a test if it is filtered out or the run is cancelled. With a process-per-test model, this is natural: start the test process. Know when a test is done, as it’s done. With process-per-test, wait for the process to exit. Measure test times. Not just after a test is done, but while it is running. 
With process-per-test, use wall-clock time. Terminate tests that timed out. In particular, one test timing out should not cause others to be killed. With process-per-test, this is solved via signals on Unix and job objects on Windows. Retry failed tests. This is an extension to point 1: retrying tests, and marking tests as flaky if they succeeded later. Cancel tests on an input. On a test failure or receiving a signal, the test runner may decide to leave tests running but no longer schedule new ones, or to cancel existing tests. Gather test output. With process-per-test, nextest can read standard output and standard error for the process. However, as of early 2025, each runner works in isolation: there is no global orchestrator that can schedule tests across multiple binaries. There is also no way to configure timeouts for tests, or to cancel some (but not all) running tests. For more, see the appendix . If a test takes down the process, all of the other tests in the binary are effectively cancelled as well. In this case, getting a full accounting of all tests and what happened to them is extremely difficult. JUnit reports wouldn’t be nearly as valuable if they were missing many tests! The test runner: There must be an overall orchestrator, and for robustness it must live in a separate process from any of the test binaries. A component within the test binary: These requirements preclude any kind of one-shot model where the list of tests is provided upfront and run to completion. Instead, there must be an in-memory component that’s dynamic and event-driven. There needs to be a rich, bidirectional communication protocol between the test runner and this component. (Simpler protocols like the GNU make jobserver can do rate-limiting, but not cancellation, retries, or time measurement.) The test itself: For cancellation in particular, the test must cooperate. For more on why, see the appendix . With process-per-test, nextest sends a signal to the test process, and if it doesn’t exit within 10 seconds, a . With a shared-process model, assuming each test is in a separate thread, a timeout would require the thread to be killed rather than the process. Threads do not generally expect other threads to just disappear without warning, but processes are expected to sometimes exit abnormally. Synchronization across processes is possible via functions like , but is often advisory and generally uncommon. Code that uses these tools is generally prepared to encounter protected data in an invalid state. Cross-process mutexes are quite rare (message-passing is much more common), and shared-memory code is usually written with great care. Based on our experiences with cancellation in other Rust projects, we strongly recommend Tokio mutexes be treated as a feature of last resort.  ↩︎
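To illustrate the footnote above about Tokio mutexes, here is a minimal sketch assuming the tokio crate (with the rt, macros, sync, and time features). A task is cancelled while holding the lock; the lock is simply released when the task is dropped, with nothing like poisoning to signal that the critical section was cut short:

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex;

#[tokio::main]
async fn main() {
    let data = Arc::new(Mutex::new(0u32));

    let task = tokio::spawn({
        let data = Arc::clone(&data);
        async move {
            let mut guard = data.lock().await;
            *guard += 1;
            // Simulate a long critical section with an await point inside it.
            tokio::time::sleep(Duration::from_secs(60)).await;
        }
    });

    // Give the task a chance to acquire the lock, then cancel it.
    tokio::time::sleep(Duration::from_millis(100)).await;
    task.abort();
    let _ = task.await;

    // Unlike a std::sync::Mutex after a panic, the Tokio mutex is not
    // poisoned: the lock was silently released when the task was dropped,
    // even though the critical section never finished.
    let guard = data.lock().await;
    println!("value = {}, lock acquired with no poisoning", *guard);
}
```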

sunshowers 1 years ago

Beyond Ctrl-C: The dark corners of Unix signal handling

RustConf 2024 is next week, so I thought I’d put up a written version of my RustConf 2023 talk about signals in time for that. I’ve tried to retain the feel of it being a talk, while editing it heavily to make it readable as a blog entry. Some links: Imagine you’re in the middle of a conversation, when suddenly, a tap on the shoulder interrupts your train of thought. You turn to face the interloper, only to find a dear friend with an urgent message. In that moment, you’re faced with a choice: do you ignore the interruption and continue your conversation, or do you pause to address your friend’s needs? In the world of computing, this tap on the shoulder is akin to a signal : a way for the operating system to interject and communicate with a running process. Just as you might choose to ignore or respond to your friend’s interruption, a process must decide how to handle the signals it receives. Let’s start with a couple of questions: If you’ve answered yes to the second question, then there’s a chance your program isn’t handling signals correctly. Much like a conversation that’s interrupted at an inopportune moment, a process that’s interrupted by a signal can be left in a state of disarray. In this post, we’ll explore the world of Unix signals, delving into their history and their surprising complexity. We’ll learn how to handle these interruptions gracefully. And we’ll discover how async Rust can help us tame the chaos of signal handling, making it easier than ever to write robust software. Most of this knowledge comes from my work on cargo-nextest , a next-generation test runner for Rust that is up to thrice as fast as . One of the things that sets nextest apart is how it carefully, rigorously handles any signals that come its way. Be sure to try it out if you haven’t already! If you’re an experienced engineer, you’re probably used to asking questions like why , when , and how much you should care about something. So: Why bother with signal handling at all? When should you care about signals? You should care about signals if you’re developing a service . You’re likely going to be running under a service manager like Kubernetes, and the lifecycle generally involves signals. This screenshot shows Kubernetes documentation explaining that when your service is shutting down, it will receive a . Docker works the same way with . Note how the documentation advises that your code should “listen for this event and start shutting down cleanly at this point”. You should also care about signals if you’re developing a command-line tool . That’s because your users are impatient and if they perceive your operation as being slow for any reason, they will hit Ctrl-C and send a signal to your process. How much do you need to care about signals? If you’re a command like or and all you’re doing is reading data , you probably don’t need to care much about signals. You’re not making any changes to the system so there’s little that can go wrong if your process dies. If you’re writing data , it’s definitely worth thinking about signal handling. There are ways to arrange for your code to be resilient to sudden termination, such as writing files atomically , or using a database like SQLite . But even if your code doesn’t depend on correct signal handling, you can likely provide a better user experience if you do handle them. Where you need to care most about signals is if you’re orchestrating an operation in a distributed system. 
In those cases, if the process receives a signal locally, it may wish to send out cancellation messages over the wire. Again, it’s a good idea to make your system fault-tolerant and resilient to sudden termination—for example, by serializing state in a persistent store—but at the very least, signals give you the opportunity to perform cleanup that can otherwise be difficult to do. Before we move on: In this post, we’re going to focus on Unix, not Windows. We’re also going to focus on portable signal handling, which means mechanisms that work on every Unix. Some Unix platforms offer alternative ways to handle signals 1 , but this post will not be discussing them. Let’s look at a simple example of a signal being sent to a process: What happened here? This example shows the two uses of signals. One is as a standardized, widely understood way for the kernel to interrupt a process. The other is as a basic, limited way to perform interprocess communication , or IPC . Besides Ctrl-C and other shortcuts, the main way you’d be sending signals to processes on the command line is via the command . Within a programmatic context, libc has a function you can call which does the same thing. As mentioned above, each signal has a name and number associated with it. Some of those numbers are standardized across Unix platforms, while others aren’t. On Linux, if you type in , you’ll see a long list of signals. Some of them are: In this table: is also known as , and it’s a special signal used to kill a process. What sets apart is that unlike almost all other signals, its behavior can’t be customized in any way. might be familiar to you if you’ve ever encountered a core dump. Somewhat surprisingly, the behavior of can be customized. For example, the Rust standard library customizes ’s behavior to detect call stack exhaustion 2 . and are used for what is called “ job control ”. If you’ve ever used Ctrl-Z in , or the commands or , then they uses these signals 3 . In general, all signals have a default action. Almost all of them also let you customize the default behavior, using what is called a signal handler . A signal handler is a custom function that is used to intercept specific signals. Once a signal handler is set up, the kernel no longer follows the default action, calling the handler instead. In that sense, it’s a reverse system call , also known as an upcall . For example, if you’re writing or reading data, you use a system call or syscall to perform that action—in that case, you’re calling the kernel. With an upcall, instead, the kernel calls you. (It’s an “upcall” because the call goes “up”, not “down”). Importantly, signal upcalls can happen at almost any time . And this is where we start running into issues. To see the sorts of issues that signal handlers can create, let’s walk through a specific example: that of a download manager. Back in the 2000s, these programs were a lifesaver. I grew up with pretty terrible internet back then, and the download managers had several features that really helped. The most important feature was their ability to resume downloads, something that browsers didn’t support back then. For this post, we’re going to work with a really simple download manager. Let’s say you provide a bunch of URLs to a tool, which then downloads them in parallel and maintains their status in a database. Now, let’s say we want to handle . If the user hits Ctrl-C, we may want to follow a small set of sensible steps: Your first idea might be, “let’s just put all this logic in a signal handler”. 
Can you do that? The answer turns out to be that no, you can’t do that. And why you can’t do that is helpful at illustrating many of the pitfalls of signal handlers. Earlier, I mentioned that signal handlers can be called at any time . It turns out that the ability to call a piece of code at any time is fraught with peril like little else in computing. This property of signal handlers is at the root of so many of the problems with them. For example, consider what happens if a signal handler is called while you’re holding a mutex or other lock. In general, trying to acquire the same lock again will result in a deadlock 4 . So you can’t call functions that try to acquire a lock. Well, which functions acquire locks? Even something as basic as allocating memory via requires a lock, because it pokes at global structures. This means that you cannot allocate memory in a signal handler. This alone shuts off a large percentage of the things you can do in a signal handler. Another joy of signal handling is that while a handler is running, your process can receive a different signal and invoke a second signal handler. As you might imagine, this is just very hard to reason about in practice. These aren’t just theoretical concerns! You might have heard of the CVE database , where security vulnerabilities are filed and listed. The CVE database has a lesser-known cousin called the CWE database , which lists out “common weaknesses” that result in security vulnerabilities. Within this database, there are no fewer than four weaknesses related to incorrect signal handlers: On Linux, the man page on lists functions described by POSIX as okay to call in signal handlers, and it is pretty short. You can write to a file descriptor, but is not allowed. You also can’t open or seek a file. The functions okay to call in signal handlers are called async-signal-safe functions . The term is a bit confusing! “async” here has nothing to do with async Rust. (In many ways it’s the opposite of async Rust, because the defining characteristic of async Rust is that you cannot just be interrupted or preempted at any time. You can only be interrupted at await points.) So how do most modern programs handle signals? To understand that, we briefly need to introduce the concept of the self-pipe trick . The trick uses a Unix feature called self-pipes, which have been described in the same breath as both “wonderful” and “cursed”: a high honor! You might be familiar with pipes from having used them in shells via the namesake pipe ( ) operator. For example, consider a typical command. A self-pipe is just a kind of pipe where the write and read ends are held by the same process. Now, you might ask, “what’s the point of this?” And you’d be right to do so! Most of the time they don’t add any value. But they do add value in the specific context of signal handlers, because they let programs write signal handlers that are as simple as possible. Most C programs do this by hand, but in Rust you don’t have to write this delicate pattern manually. There are several crates which implement this pattern, like , and most people just use one of them. Going back to the download manager example: Let’s say you write to a pipe, indicating that Ctrl-C has been pressed. How do you handle the read side of the pipe? Once you’ve received a signal, how do you handle it? One option is to set some kind of global flag , and check whether you’ve received a signal at every iteration of a loop–or maybe every Nth iteration of a loop. 
This works fine for small programs that are CPU-bound, but isn’t really scalable to large programs because those tend to have lots of loops in them (are you really going to add checks to every loop?) Large programs are I/O bound anyway, and it’s a bit hard to check for signals while you’re waiting for some network operation to complete. Another potential solution: Store all the state behind a mutex, and wrest control of the program by locking out all the workers. This is a solution that some programs use, but it is really difficult to coordinate state between the workers and the signal handler. I really wouldn’t recommend following this approach. The most reasonable approach for I/O-bound programs is to use message passing , where the parts of a program that deal with signals are notified that a signal has occurred. This is possible to do in a synchronous model, but, as I’ve documented in my previous post about how nextest uses Tokio , it adds a great deal of unbounded complexity. What happens in practice is that you inevitably create a nest of threads whose only responsibility is to pass messages around. The good news is that dealing with messages is much, much simpler with async Rust. At this point you might ask: “I’m just a simple downloader tool, why would I need async for this?” Async Rust is presented as being for massively concurrent web or backend servers, but a secret about it is that that is just marketing. The scope of problems async Rust solves is much broader, and it happens to be incredibly well suited to signal handling for most programs. This is because async Rust provides some very expressive ways to perform advanced control flow, making such code readable without sacrificing performance. To see how, we’re going to use Tokio’s signal handling functionality, which under the hood uses the same self-pipe trick mentioned above 5 . Here’s what a very simple example of Ctrl-C looks like under async Rust: In this example: Now, this isn’t very special by itself; you can easily implement this with synchronous code. But this model really shines in more complex code paths, because of async Rust’s ability to perform heterogenous selects via . For a detailed discussion of heterogenous selects, see my earlier post about them. But a quick summary is that is a powerful control flow tool that waits for a set of futures to be ready concurrently, and resolves as soon as one of them completes. What makes really special is it doesn’t just work against specific kinds of asynchronicity, but against any arbitrary, heterogenous source of asynchronicity: signals, timers, any async function–you name it. As a result, is a great fit for signal handling 6 . Going back to our download manager example, let’s try using . There are a few ways to organize this, but here’s one way. We’re going to briefly introduce two constructs that make our life simpler: Here’s how the main function works: Next, the main function needs to wait for results from all the workers. How would we do it if we weren’t handling signals? Well, in that case we would loop and wait for each worker task to finish until they’re all done, handling errors along the way. To handle signals, we use a with two branches from which items are fetched in parallel: Now let’s look at the worker function. We first write our download function within an async block: Then, just like earlier, we write a loop with a over two options: What makes this model tick is how well it scales up to additional complexity. 
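Here is a rough sketch of that overall shape, assuming the tokio crate (with the macros, rt-multi-thread, signal, sync, and time features); the download function and the URLs are placeholders of my own rather than the post’s demo code:

```rust
use tokio::signal::ctrl_c;
use tokio::sync::broadcast;
use tokio::task::JoinSet;

#[tokio::main]
async fn main() {
    let (cancel_tx, _) = broadcast::channel::<()>(1);
    let mut workers = JoinSet::new();

    for url in ["https://example.com/a", "https://example.com/b"] {
        let mut cancel_rx = cancel_tx.subscribe();
        workers.spawn(async move {
            // Each worker races its own work against a cancellation message.
            tokio::select! {
                result = download(url) => result,
                _ = cancel_rx.recv() => Err(format!("{url}: cancelled")),
            }
        });
    }

    loop {
        tokio::select! {
            // First branch: drain worker results as they finish.
            next = workers.join_next() => match next {
                Some(result) => println!("worker finished: {result:?}"),
                None => break, // all workers done
            },
            // Second branch: on Ctrl-C, broadcast a cancellation message.
            _ = ctrl_c() => {
                eprintln!("received Ctrl-C, cancelling downloads");
                let _ = cancel_tx.send(());
            }
        }
    }
}

// Hypothetical download function standing in for real work.
async fn download(url: &str) -> Result<String, String> {
    tokio::time::sleep(std::time::Duration::from_secs(2)).await;
    Ok(format!("{url}: done"))
}
```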
Two specific examples that are handled well by this model: A fully working example of the download manager described above is in the demo repository . These two extensions are marked as exercises for you to complete; try solving them! If all you’re doing is orchestrating an external operation like downloading files, then this is most of what you need to know. But a lot of the really interesting details about signals lie when you’re spawning processes—such as if you’re a shell, or a test runner like nextest. There is far too much to talk about here, but we’re going to go deeper into one particular example. In part 1, I mentioned that if you press Ctrl-C, the shell sends to the process you’re running. That isn’t quite correct; I fibbed a little! The truth is that the shell actually sends the signal to what is called a process group . What is a process group? In Unix, processes can be organized into groups, each with its own unique identifier. Process groups allow a set of processes to be sent a signal at once. The shell creates a new process group for the command and its children, assigning them a unique group ID. This allows the shell to manage the processes as a unit, sending signals to the entire group at once. On Linux, you can print process groups with a command like . (Other Unix platforms have their own flags to display similar output.) For example, if you run this command while is running in another terminal, you might get some output that looks like this: In this case: Recall from earlier that if you want to send to a process via the command, you’d use . And that’s what really happens when you hit Ctrl-C: both the process itself and all the child processes are terminated by this signal sent to the process group. Let’s say that you’re a test runner like nextest. What if you want to join the party? For example, it’s common for tests to spin up servers in another process to communicate with. A test runner that terminates a test likely also wants to kill off any processes created by that test. To set the process group for a process, Unix provides a function called . In Rust, access to this is provided via the extension trait’s method . Most of the time, you’ll want to pass in for the process group, which means that the kernel will create a new process group for you with the same number as the process ID. Let’s say you’re a command-line process that is using process groups to manage its children. What happens if the user hits Ctrl-C? You might assume that process groups form some sort of tree. Just like with the notion of a tree of processes with parent and child processes, you might imagine that process groups follow a similar pattern. In reality: no, process groups don’t form a tree . Each process group is its own island. This means that as a boundary process which manages process groups of your own, it is your responsibility to forward signals to child process groups. (This is quite easy to fit into our async Rust model: the worker tasks, on receiving a broadcast message, send the corresponding signal to the process groups that they’re responsible for.) Most programs would want to behave as similarly as possible to the world where they didn’t set up process groups. To achieve that you’ll want to at least forward these signals: To match behavior, for signals that are typically sent to process groups: (Ctrl-Z), and ( or ). 
These two signals are interesting to handle in nextest: if you have timers running for your processes, for example if you want to timeout and kill process groups, you can combine and with async Rust to pause and resume those timers. (There is also a deep story about an issue in POSIX lurking here, which I’ll cover in a sequel to this post.) Also consider , to meet user expectations, forwarding and other signals. These signals are not always sent to process groups, but it would make sense to forward them. Signals are a fundamental part of the computing landscape, a legacy of design decisions made decades ago. They are a reminder that the systems we build today are shaped by the choices of the past, and that even the most well-intentioned innovations can have unintended consequences. In the words of Doug McIlroy , creator of Unix pipes: was there first and foremost to support ; it did not purport to provide a sound basis for asynchronous IPC. The complexity of is evidence that asynchrony remains untamed 40 years on. Signals, for all their utility, were never meant to be the foundation of interprocess communication that they have become today. Yet we have found ways to adapt and evolve them to our needs, working around their limitations and turning them into opportunities. Indeed, this is the essence of computing, and of technology itself: a story of creativity in the face of constraint, and of building upon the foundations of the past to create something new and beautiful. I hope this post has shed some light on the fascinating world of Unix signals, and perhaps even inspired you to think closely about the systems you build. If you’ll be at RustConf in Montreal, come find me: I’d love to chat more about this and hear your own stories about signals, or of other intricate systems with long-forgotten design decisions. See you there! Thanks to Fiona for reviewing a draft of this post. One example is signalfd on Linux. There is much to be said about signalfd, but it is outside the scope of this post.  ↩︎ Some other language runtimes do much, much worse things with . Here’s what Java does .  ↩︎ If you haven’t, by the way, check it out! It’s one of the cooler parts of Unix.  ↩︎ If you’ve dealt with locks you might have heard of the concept of recursive or re-entrant locks , which are locks that can be acquired by the same thread as many times as desired. Recursive locks often indicate a design flaw , because the goal of a lock is to temporarily violate invariants that are otherwise upheld outside of its context. But more importantly, it isn’t guaranteed that signal handlers are going to be called on the same thread! So even recursive locks might deadlock when it comes to signal handlers.  ↩︎ Under the hood, Tokio uses mio , which uses a self-pipe for its portable implementation. But on some platforms such as Linux, mio uses eventfd . This is an extension to Unix that is quite similar to self-pipes, and from the perspective of signal handlers serves the same purpose.  ↩︎ There’s a case to be made that is too freeform—too powerful. By hiding details about async cancellation, can lead to surprising bugs. See this blog post by boats for more. In this example, we’re handling cancellation explicitly, so the impact of those issues is less direct.  ↩︎ Video of the talk on YouTube. Slides on Google Slides. Demo repository on GitHub. Have you ever hit Ctrl-C while running a command-line program? Have you ever encountered data corruption because you hit Ctrl-C at the wrong time? 
You should care about signals if you’re developing a service . You’re likely going to be running under a service manager like Kubernetes, and the lifecycle generally involves signals. This screenshot shows Kubernetes documentation explaining that when your service is shutting down, it will receive a . Docker works the same way with . Note how the documentation advises that your code should “listen for this event and start shutting down cleanly at this point”. You should also care about signals if you’re developing a command-line tool . That’s because your users are impatient and if they perceive your operation as being slow for any reason, they will hit Ctrl-C and send a signal to your process. If you’re a command like or and all you’re doing is reading data , you probably don’t need to care much about signals. You’re not making any changes to the system so there’s little that can go wrong if your process dies. If you’re writing data , it’s definitely worth thinking about signal handling. There are ways to arrange for your code to be resilient to sudden termination, such as writing files atomically , or using a database like SQLite . But even if your code doesn’t depend on correct signal handling, you can likely provide a better user experience if you do handle them. Where you need to care most about signals is if you’re orchestrating an operation in a distributed system. In those cases, if the process receives a signal locally, it may wish to send out cancellation messages over the wire. Again, it’s a good idea to make your system fault-tolerant and resilient to sudden termination—for example, by serializing state in a persistent store—but at the very least, signals give you the opportunity to perform cleanup that can otherwise be difficult to do. Then, a few seconds after, I hit Ctrl-C in my terminal. The terminal sent a signal called (where means “interrupt”) to the Cargo process. The signal caused the Cargo process, as well as all the Rust compiler processes underneath it, to be interrupted and terminated. To send or Ctrl-C, you can use , where is the numeric process ID. Each signal also has an associated number. For the number is always 2, so another way of saying this is . If you don’t specify a signal and just say , it sends SIGTERM by default. is also known as , and it’s a special signal used to kill a process. What sets apart is that unlike almost all other signals, its behavior can’t be customized in any way. might be familiar to you if you’ve ever encountered a core dump. Somewhat surprisingly, the behavior of can be customized. For example, the Rust standard library customizes ’s behavior to detect call stack exhaustion 2 . and are used for what is called “ job control ”. If you’ve ever used Ctrl-Z in , or the commands or , then they uses these signals 3 . Cancel all running downloads. Flush any data to disk. In the database, mark the state of these downloads as interrupted. For example, consider what happens if a signal handler is called while you’re holding a mutex or other lock. In general, trying to acquire the same lock again will result in a deadlock 4 . So you can’t call functions that try to acquire a lock. Well, which functions acquire locks? Even something as basic as allocating memory via requires a lock, because it pokes at global structures. This means that you cannot allocate memory in a signal handler. This alone shuts off a large percentage of the things you can do in a signal handler. 
Another joy of signal handling is that while a handler is running, your process can receive a different signal and invoke a second signal handler. As you might imagine, this is just very hard to reason about in practice. CWE-364 : Signal Handler Race Condition CWE-432 : Dangerous Signal Handler not Disabled During Sensitive Operations CWE-479 : Signal Handler Use of a Non-reentrant Function CWE-828 : Signal Handler with Functionality that is not Asynchronous-Safe When this command is run, the shell creates a pipe . Each pipe has a write end and a read end . In the case of , the write end is held by , and the read end by . The program starts by creating a self-pipe. It then hands the write end of the pipe to a signal handler, and holds on to the read end. Then, the signal handler writes to that self-pipe. This is safe to do, because a pipe is a file descriptor, and writing to a file descriptor is async-signal-safe. Then, the program reads from the self-pipe elsewhere. One option is to set some kind of global flag , and check whether you’ve received a signal at every iteration of a loop–or maybe every Nth iteration of a loop. This works fine for small programs that are CPU-bound, but isn’t really scalable to large programs because those tend to have lots of loops in them (are you really going to add checks to every loop?) Large programs are I/O bound anyway, and it’s a bit hard to check for signals while you’re waiting for some network operation to complete. Another potential solution: Store all the state behind a mutex, and wrest control of the program by locking out all the workers. This is a solution that some programs use, but it is really difficult to coordinate state between the workers and the signal handler. I really wouldn’t recommend following this approach. The most reasonable approach for I/O-bound programs is to use message passing , where the parts of a program that deal with signals are notified that a signal has occurred. This is possible to do in a synchronous model, but, as I’ve documented in my previous post about how nextest uses Tokio , it adds a great deal of unbounded complexity. What happens in practice is that you inevitably create a nest of threads whose only responsibility is to pass messages around. The code sets up a stream of or Ctrl-C signals, then ing the method. The method resolves each time the process receives Ctrl-C. The type , which stands for a set of worker tasks (not threads!) that are running in parallel. Broadcast channels , which allows a single producer to send messages to multiple consumers. The idea here is that the main task will receive signals, and then broadcast them to workers. Create a and a broadcast channel. Spin up a stream of signals, as before. Spawn a task for each worker on the , and pass in a receiver for broadcast messages. The first branch waits for worker tasks to be done with the same code as above The second branch awaits a Ctrl-C message from the stream. The first branch drives the operation forward. The second branch waits for cancellation messages over the broadcast channel. First, you’ll likely want to handle other signals like , since isn’t the only signal you’d receive. Another common extension is to use what is sometimes called a double Ctrl-C pattern . The first time the user hits Ctrl-C, you attempt to shut down the database cleanly, but the second time you encounter it, you give up and exit immediately. The shell created a process numbered . 
When did that, it also created a corresponding process group with the same number. When ran , those process groups were inherited. If you want to send to a process group , in very typical Unix fashion you have to use a negative number. For example, if you run , the signal gets sent atomically to the whole process group numbered : the process, its children, grandchildren, everything. (Ctrl-Z), and ( or ). These two signals are interesting to handle in nextest: if you have timers running for your processes, for example if you want to timeout and kill process groups, you can combine and with async Rust to pause and resume those timers. (There is also a deep story about an issue in POSIX lurking here, which I’ll cover in a sequel to this post.) One example is signalfd on Linux. There is much to be said about signalfd, but it is outside the scope of this post.  ↩︎ Some other language runtimes do much, much worse things with . Here’s what Java does .  ↩︎ If you haven’t, by the way, check it out! It’s one of the cooler parts of Unix.  ↩︎ If you’ve dealt with locks you might have heard of the concept of recursive or re-entrant locks , which are locks that can be acquired by the same thread as many times as desired. Recursive locks often indicate a design flaw , because the goal of a lock is to temporarily violate invariants that are otherwise upheld outside of its context. But more importantly, it isn’t guaranteed that signal handlers are going to be called on the same thread! So even recursive locks might deadlock when it comes to signal handlers.  ↩︎ Under the hood, Tokio uses mio , which uses a self-pipe for its portable implementation. But on some platforms such as Linux, mio uses eventfd . This is an extension to Unix that is quite similar to self-pipes, and from the perspective of signal handlers serves the same purpose.  ↩︎ There’s a case to be made that is too freeform—too powerful. By hiding details about async cancellation, can lead to surprising bugs. See this blog post by boats for more. In this example, we’re handling cancellation explicitly, so the impact of those issues is less direct.  ↩︎
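To make the process-group mechanics above concrete, here is a minimal sketch assuming a Unix target and the libc crate; the sleep command is just a stand-in for a real child process:

```rust
use std::os::unix::process::CommandExt;
use std::process::Command;

fn main() -> std::io::Result<()> {
    // process_group(0) asks the kernel to create a new process group whose
    // ID matches the child's process ID.
    let mut child = Command::new("sleep").arg("60").process_group(0).spawn()?;
    let pgid = child.id() as i32;

    // Sending a signal to the negative of the group ID delivers it to every
    // process in that group: the child, its children, and so on.
    unsafe {
        libc::kill(-pgid, libc::SIGINT);
    }

    // Reap the child so it doesn't linger as a zombie.
    let _ = child.wait();
    Ok(())
}
```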

sunshowers 1 years ago

Debugging a rustc segfault on illumos

At Oxide , we use Helios as the base OS for the cloud computers we sell. Helios is a distribution of illumos , a Unix-based operating system descended from Solaris. As someone who learned illumos on the job, I’ve been really impressed by the powerful debugging tools it provides. I had a chance to use some of them recently to track down a segmentation fault in the Rust compiler, with the help of several of my colleagues. I learned a lot from the process, and I thought I’d write about it! I’m writing this post for an audience of curious technologists who aren’t necessarily familiar with systems work. If you’re an experienced systems developer, parts of it are likely familiar to you—feel free to skip over them. A couple of weeks ago, I wanted to make a change to the Rust standard library on illumos. I logged into my illumos box and cloned the Rust repository (revision ). Following the setup instructions , I configured the build system with the build profile. When I went to run , I saw an error with the following output: Quite concerning! Like any good technologist I tried running the command again. But the segfault seemed to be completely deterministic: the program would crash while compiling every time. Coincidentally, we had our fortnightly “Rust @ Oxide” virtual meetup at around that time. There wasn’t much to discuss there, so we turned that meeting into a debugging session. (I love how my coworkers get excited about debugging strange issues.) Like the compilers for many other languages, the Rust compiler is written in the language it is intending to compile (in this case, Rust). In other words, the Rust compiler is self-hosting . Any self-hosting compiler needs to answer the question: how in the world do you compile the compiler if you don’t already have a working compiler? This is known as the bootstrapping problem . There are several ways to address the problem, but the two most common are: Use the previous version of the compiler. In other words, use version N-1 of the compiler to compile version N. For example, use Rust 1.75 to compile Rust 1.76. The earliest versions of Rust were written in Ocaml. So if you’re spinning up Rust on a brand new platform and have an Ocaml compiler available, you can actually start from there and effectively create your own lineage of compilers. There are also implementations of Rust in other languages, like in C++, which can be used to build some (typically pretty old) version of the compiler. Interestingly, these other implementations don’t need to be perfect—for example, since they’re only used to compile code that’s known to be valid, they don’t need to handle errors well. That’s a large chunk of the complexity of a real compiler. Cross-compile from another platform. As a shortcut, if you have a way to cross-compile code from another platform, you can use that to set up the initial compiler. This is the most common method for setting up Rust on a new platform. (But note that method 1 must be used on at least one platform.) While bootstrapping from the previous version of Rust, the toolchain follows a series of stages , ranging from stage 0 to stage 2 . In our case, since we’re working with the standard library we’re only concerned with stage 0 : the standard library compiled with the previous version of . That is the build process that crashed. The first thing to find is the version of that’s crashing. There are a few ways to find the compiler, but a simple command works well: This command finds at . 
Let’s ask it for its version: Can the bug be reproduced independently of the Rust toolchain? The toolchain does all sorts of non-standard things, so it’s worth checking. The output says , so let’s try building that separately. Again, there are a few ways to do this, but the easiest is to make a simple Cargo project that depends on the crate. And then run . I didn’t have rustc 1.80.0 beta 1 on the machine, so I tried with the 1.80.0 release: Yep, it crashes in the same spot. This is a minimal-enough example, so let’s work with this. When a program crashes, systems are typically configured to generate a core dump , also known as a core file. The first step while debugging any crash is to ensure that core dumps are generated, and then to find one to examine it. On illumos, many of the system-level administration tools are called . The tool for managing core files is called . Let’s run that: This suggests that core “per-process core dumps” are enabled. The lack of a pattern indicates that the defaults are used. Generally, on Unix systems the default is to generate a file named in the current directory of the crashing process. A simple in our little test project doesn’t show a file, which means that it might be elsewhere. Let’s just do a global for it. This showed a few files on my system, including: . Bingo! That looks like a hit. (Why is it in the registry? Because when compiling a crate, Cargo sets the current working directory of the child process to the crate’s directory.) The next step is to move the file into another directory 1 . After doing that, let’s start examining it. The best way to examine a core file on illumos is with the Modular Debugger, . is a powerful tool that can be used to inspect the state of both live and dead processes, as well as the kernel itself. Using with the core file is simple: just run . The first step is to enable symbol demangling 2 . The command to do that in is , so let’s run that: (The output says “C++”, but illumos’s demangler can handle Rust symbols, too.) Let’s look at the CPU registers now. A register stores a small amount of data that the CPU can access very quickly. Core files typically have the contents of registers at the time of the crash, which can be very useful for debugging. In , the command to print out registers is or . Here’s the output: All right, there’s a lot going on here. A full accounting of the registers on x86-64 is beyond the scope of this post, but if you’re interested here’s a quick summary . The most important registers here are , , and . All three of these are 64-bit addresses. is the instruction pointer , also known as the program counter . is a special register that points to the next instruction to be executed. The CPU uses to keep track of where it is in the program. is the stack pointer . The call stack is a region of memory that is used to store function call information and local variables. The stack pointer points to the head of the stack. Note that on most architectures including x86-64, the stack grows down in memory: when a function is called, a new stack frame is set up and the stack pointer is decremented by however much space the function needs. is the base pointer , more commonly known as the frame pointer . It points to the base of the current stack frame 3 . We can also look at the call stack via the command. The stack turns out to be enormous ( full output ): (The is used to send the output to a shell command, in this case one that counts the number of lines.) It looks like the crash is in the parser. 
(Notably, the crash is while compiling a crate called , which suggests automatic code generation. Generated code often tends to stress the parser in ways that manually written code does not.) Based on the call stack, it looks like the parser is recursive in nature. A quick Google search confirms that the parser is a “simple hand-written recursive descent parser”. This isn’t surprising, since most production parsers are written this way. (For example, is also a recursive descent parser.) Turning our attention to the instruction pointer , we can use the command to disassemble the function at that address. ( Full output ; the flag ensures that addresses are not converted to very long function names.) So it looks like the crash is happening in a instruction to another function, . (Keep in mind that this information could be completely unreliable! The stack might be corrupted, the registers might be wrong, and so on. But it’s what we have for now.) On virtual memory systems , which includes all modern desktop and server systems, each process gets the illusion that it has a very large amount of memory all to itself. This is called the address space of a process. The instructions, the call stack, and the heap all get their own regions of addresses in that space, called memory mappings . The 64-bit addresses that we saw earlier are all part of the address space. has a command called to look up which part of memory an address is at. Let’s look at the stack pointer first: This tells us that the address is in the range to . This is a small 4 KiB range. What about the frame pointer? This appears to be in a different range. In this case, the ending address is (note the , not the !). This address is bytes away from the starting address. That is equal to 1028 KiB , or 1 MiB + 4 KiB page 4 . Something else that’s relevant here is what permissions each range of addresses has. Like files on Unix, a block of virtual memory can have read , write , or execute permissions. (In this case, execute means that it is valid for the instruction pointer to point here 5 .) On illumos, a tool called can show these spaces. works on both live processes and core files. Running shows the permissions for the addresses we’re interested in ( full output ): The 1028 KiB range is read-write, and the 4 KiB range above that doesn’t have any permissions whatsoever. This would explain the segfault . A segfault is an attempt to operate on a part of memory that the program doesn’t have permissions for. Attempting to read from or write to memory which has no permissions is an example of that. At this point, we have enough information to come up with a theory: But there are also other bits of evidence that this theory doesn’t explain, or even cuts against. (This is what makes post-mortem debugging exciting! There are often contradictory-seeming pieces of information that need to be explained.) The memory is marked or . That’s not how call stacks are supposed to be marked! In the output, there’s a line which says: So you’d expect call stacks to be marked with , not . Why is the size of the allocation 1028 KiB? You’d generally expect stack sizes to be a round power of two. Isn’t 1028 KiB kind of small? The thread is a non-main thread, and the default stack size for Rust threads is 2 MiB . Why is our thread ~1 MiB and not 2 MiB? On Unix platforms, for the main thread, the call stack size is determined by (in KiB). On my illumos machine, this printed , indicating a 10 MiB call stack. 
For child threads, the call stack size is determined by whatever created them. For Rust, the default is 2 MiB. Why doesn’t this crash happen on other platforms? If this is a crash in the parser, one would ordinarily expect it to arise everywhere. Yet it doesn’t seem to occur on Linux, macOS, or Windows. What’s special about illumos? Setting doesn’t help. Rust-created thread stack sizes can be configured via the environment variable . If we try to use that: It turns out that crashes at exactly the same spot. That’s really strange! It is possible that the stack size was overridden at thread creation time. The documentation for says: “Note that setting will override this.” But that seems unlikely. Looking towards the bottom of the call stack, there’s something really strange : Notice the jump in addresses from to ? Normally, stack addresses are decremented as new functions are called: the number goes down. In this case the stack address is incremented . The number went up. Strange. Also notice that this coincides with the use of a function called . Now that’s a real lead! What part of memory is in? says: So this address is part of the stack for thread 3. agrees : What is ? Time for some googling! Per the documentation , is: A library to help grow the stack when it runs out of space. This is an implementation of manually instrumented segmented stacks where points in a program’s control flow are annotated with “maybe grow the stack here”. Each point of annotation indicates how far away from the end of the stack it’s allowed to be, plus the amount of stack to allocate if it does reach the end. Because the parser is recursive, it is susceptible to call stack exhaustion. The use of is supposed to prevent, or at least mitigate, that. How does work? The library has a pretty simple API : The developer is expected to intersperse calls to within their recursive function. If less than bytes of stack space remain, will allocate a new segment of bytes, and run with the stack pointer pointing to the new segment. How does rustc use ? The code is in this file . The code requests an additional 1 MiB stack with a red zone of 100 KiB. Why did create a new stack segment? In our case, the call is at the very bottom of the stack, when plenty of space should be available, so ordinarily should not need to allocate a new segment. Why did it do so here? The answer is in ’s source code . There is code to guess the stack size on many platforms. But it isn’t enabled on illumos: always returns . With this information in hand, we can flesh out our call stack exhaustion theory: Some file in was triggering the crash by requiring more than 1 MiB of stack space. Had this bug occurred on other platforms like Linux, this issue would have been a showstopper. However, it wasn’t visible on those platforms because: didn’t call enough! In order for it to work, needs to be interspersed throughout the recursive code. But some recursive parts did not appear to have called it. (It is somewhat ironic that , a library meant to prevent call stack exhaustion, was actively making life worse here.) Where does the 1028 KiB come from? Looking at the source code : It looks like first computes the number of requested pages by dividing the requested stack size by the page size, rounding up. Then it adds 2 to that. In our case: This explains both the 1028 KiB allocation (one guard page after the stack), and the 4 KiB guard page we’re crashing at (one guard page before the stack). 
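To make the stack-growing mechanism above concrete, here is a minimal sketch of how a recursive walk might be instrumented with the stacker crate. This is not rustc's actual code: the Node type and the counting function are invented, and only the two sizes are taken from the post (a 100 KiB red zone and 1 MiB segments). The entry point shown, stacker::maybe_grow, is the crate's documented API; its two size arguments correspond to the "how far away from the end of the stack it's allowed to be" and "the amount of stack to allocate" described above.

```rust
// Cargo.toml: stacker = "0.1" (version assumed)
struct Node {
    children: Vec<Node>,
}

// A recursive walk instrumented with stacker::maybe_grow. If less than
// 100 KiB of stack remains -- or, per the post, if the remaining stack cannot
// be determined at all, as on illumos -- the closure runs on a freshly
// allocated 1 MiB segment; otherwise it runs on the current stack.
fn count(node: &Node) -> usize {
    stacker::maybe_grow(100 * 1024, 1024 * 1024, || {
        1 + node.children.iter().map(count).sum::<usize>()
    })
}

fn main() {
    // Build a deeply nested input, the same kind of stress that a large
    // generated file puts on a recursive descent parser.
    let mut node = Node { children: vec![] };
    for _ in 0..100_000 {
        node = Node { children: vec![node] };
    }
    println!("nodes: {}", count(&node));

    // Leak the chain instead of dropping it: the automatically generated
    // recursive Drop glue is *not* instrumented with maybe_grow, so dropping
    // a chain this deep could itself exhaust the stack.
    std::mem::forget(node);
}
```

Whether a new segment actually gets allocated is decided per call, which is why coverage matters: any recursive path that never reaches one of these calls can still exhaust whatever segment it happens to be running on.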
If the issue is that a 1 MiB stack isn’t enough, it should be possible to reproduce this on other platforms by setting their stack size to something smaller than the 2 MiB default. With a stack size <= 1 MiB, we would expect that: Let’s try to compile on Linux with a reduced stack size. This does crash as expected. The full output is here . Some of the symbols are missing, but the crash does seem to be in parser code. (At this point, we could have gone further and tried to make a debug-assertions build of – but it was already pretty clear why the crash was happening.) Call stack exhaustion in the parser suggests that the crash is happening in some kind of large, automatically generated file. But what file is it? It’s hard to tell by looking at the core file itself, but we have another dimension of debugging at hand: syscall tracers! These tools print out all the syscalls made by a process. Most OSes have some means to trace syscalls: on Linux, on macOS, Process Monitor on Windows, and on illumos 7 . Since we’re interested in file reads, we can try filtering it down to the and syscalls . You need to open a file to read it, after all. (Alternatively, we can also simply not filter out any syscalls, dump the entire trace to a file, and then look at it afterwards.) On illumos, we tell to run , filtering syscalls to and ( ), and following child processes ( ): This prints out every file that the child tries to open ( full output ): It looks like the crash is in a file called in the directory. With Cargo, a file being in an directory is a pretty strong indication that it is generated by a build script. On Linux, a similar command is: This command also blames the same file, . What does this file look like, anyway? Here’s my copy. It’s pretty big and deeply nested! It does look large and complex enough to trigger call stack exhaustion. Syscall traces would definitely be somewhat harder to get if the crash weren’t so easily reproducible. Someone smarter than me should write about how to figure this out using just the core file. The file’s fully loaded into memory so it seems like it should be possible. Going back to the beginning: the reason I went down this adventure was because I wanted to make an unrelated change to the Rust standard library. But the stage 0 compiler being broken meant that it was impossible to get to the point where I could build the standard library as-is, let alone test that change. How can we work around this? Well, going back to basics, where did the stage 0 compiler come from? It came from Rust’s CI, and it wasn’t actually built on illumos! (Partly because there’s no publicly-available CI system running illumos.) Instead, it was cross-compiled from Linux to illumos. Based on this, my coworker Joshua suggested that I try and do whatever Rust’s CI does to build a stage 0 compiler for illumos. Rust’s CI uses a set of Docker images to build distribution artifacts. In theory, building a patched rustc should be as simple as running these commands on my Linux machine: In reality, there were some Docker permissions issues due to which I had to make a couple of changes to the script. Overall, though, it was quite simple. Here’s the patch I built the compiler with, including the changes to the CI scripts. The result of building the compiler was a set of files, just like the ones published by Rust’s CI . After copying the files over to my illumos machine, I wasn’t sure which tarballs to extract. So I made a small change to the bootstrap script to use my patched tarballs. 
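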
With this patch, I was able to successfully build Rust’s standard library on illumos and test my changes. Hooray! ( Here’s what I was trying to test.) Update 2024-08-05: After this post was published, jyn pointed out on Mastodon that is actually optional, and that I could have also worked around the issue by disabling it in the build system’s . Thanks! The bug occurred due to a combination of several factors. It also revealed a few other issues, such as the lack of an environment variable workaround and some missing error reporting. Here are some ways we can make the situation better, and help us have an easier time debugging similar issues in the future. isn’t using enough. The basic problem underneath it all is that the part of the parser that triggered the bug wasn’t calling often enough to make new stack segments. should be calling more than it is today. cannot detect the stack size on illumos. This is something that we should fix in , but this is actually a secondary issue here. On other platforms, ’s ability to detect the stack size was masking the bug. Fixing this requires two changes: -created segments don’t print a nice message on stack exhaustion. This is a bit ironic because is supposed to prevent stack exhaustion. But when it does happen, it would be nice if printed out a message like standard Rust does. On illumos, the Rust runtime doesn’t print a message on stack exhaustion. Separate from the previous point, on illumos the Rust runtime doesn’t print a message on stack exhaustion even when using native stacks. Rust’s CI doesn’t run on illumos. At Oxide, we have an existential dependency on Rust targeting illumos. Even a shadow CI that ran on nightly releases would have caught this issue right away. We’re discussing the possibilities for this internally; stay tuned! segment sizes can’t be controlled via the environment. Being able to control stack sizes with is a great way to work around issues. It doesn’t appear that segment sizes can be controlled in this manner. Maybe that functionality should be added to , or to itself? Maybe a crater run with a smaller stack size? It would be interesting to see if there are other parts of the Rust codebase that need to call more as well. suggests disabling optional components. Since was an optional component that can be disabled, the tooling could notice if a build failed in such a component, and recommend disabling that component. Added 2024-08-05, suggested by jyn . To me, this is the most exciting part of debugging: what kinds of changes can we make, both specific and systemic ones, to make life easier for our future selves? This was a really fun debugging experience because I got to learn about several illumos debugging tools, and also because we could synthesize information from several sources to figure out a complex issue. (Thankfully, the root cause was straightforward, with no memory corruption or other “spooky action at a distance” involved.) Debugging this was a real team effort. I couldn’t have done it without the assistance of several of my exceptional colleagues. In no particular order: Thanks to all of you! I neglected to do this during my own debugging session, which led to some confusion when I re-ran the process and found that the core file had been overwritten.  ↩︎ Name mangling is a big topic of its own, but the short version is that the Rust compiler uses an algorithm to encode function names into the binary. The encoding is designed to be reversible, and the process of doing so is called demangling. 
(Other languages like C++ do name mangling, too.)  ↩︎

You might have heard about "frame pointer omission", which is a technique to infer the base of stack frames rather than storing it in explicitly. In this case, the frame pointer is not omitted.  ↩︎

A page is the smallest amount of physical memory that can be atomically mapped to virtual memory. On x86-64, the page size is virtually always 4 KiB.  ↩︎

Memory being both writable and executable is dangerous, and modern systems do not permit this by default for security reasons. Some platforms like iOS even make it impossible for memory to be writable and executable, unless the platform holder gives you the corresponding permissions.  ↩︎

This is generally known as a "stack overflow", but that term can also mean a stack-based buffer overflow. Throughout this document, we use "call stack exhaustion" to avoid confusion.  ↩︎

There is likely some way to get itself to print out which files it opened, but the beauty of system call tracers is that you don't need to know anything about the program you're tracing.  ↩︎

The two methods for standing up an initial compiler, referenced above: Method 1 is to use the previous version of the compiler. In other words, use version N-1 of the compiler to compile version N. For example, use Rust 1.75 to compile Rust 1.76. From where do you begin, though? The earliest versions of Rust were written in OCaml. So if you're spinning up Rust on a brand new platform and have an OCaml compiler available, you can actually start from there and effectively create your own lineage of compilers. There are also implementations of Rust in other languages, like in C++, which can be used to build some (typically pretty old) version of the compiler. Interestingly, these other implementations don't need to be perfect—for example, since they're only used to compile code that's known to be valid, they don't need to handle errors well. That's a large chunk of the complexity of a real compiler. Method 2 is to cross-compile from another platform. As a shortcut, if you have a way to cross-compile code from another platform, you can use that to set up the initial compiler. This is the most common method for setting up Rust on a new platform. (But note that method 1 must be used on at least one platform.)

The full call stack exhaustion theory: the thread had a call stack of 1028 KiB available to it. The call stack pointer was only 320 bytes away from the bottom of that range, and the thread tried to create a frame of 1312 bytes. This caused the call stack to be exhausted: the thread ran out of space 6 . When it ran out of space, it indexed into a 4 KiB section known as a guard page. The thread did not have any permissions to operate on that page; it is in fact designed to cause a segfault if accessed in any way. The program then (correctly) segfaulted.

Why the bug wasn't visible on other platforms: threads created by Rust use a 2 MiB stack by default. rustc requested that the library create a 1 MiB stack segment, but only if less than 100 KiB of stack space was left. On the other platforms, the library could see that well over 100 KiB of stack space was left, and so it did not allocate a new segment. On illumos, it could not see how much stack was left, and so it allocated a new 1 MiB segment. That 1 MiB stack was simply not enough to parse the generated file: the parser needed more than 1 MiB of stack space, but less than 2 MiB.

The segment arithmetic: the requested stack size is 1 MiB. With 4 KiB pages, that works out to 256 pages; adding 2 gives 258 pages, which is 1032 KiB.

What we'd expect with a stack size <= 1 MiB on other platforms: the code calls into the library as before, and there are two possibilities: either it decides there is enough stack space and doesn't create a new segment, or it decides there isn't enough and does create a new 1 MiB segment. In either case, 1 MiB is simply not enough to parse the file, and the program crashes.

Where the follow-ups stand: the parser calling into the library more often is filed as rust-lang/rust#128422 . Detecting the stack size on illumos needs two changes: a PR to add the stack-size detection function, and a PR to the library to use it on illumos. The missing stack-exhaustion message for library-created segments is rust-lang/stacker#59 , and the missing message from the Rust runtime on illumos is filed as rust-lang/rust#128568 . Controlling segment sizes through the environment is an open discussion on internals.rust-lang.org .

The colleagues thanked above, in no particular order: Joshua M. Clulow, Matt Keeter, Cliff Biffle, Steve Klabnik, and artemis everfree.

sunshowers 1 years ago

Professionals demonstrate empathy

As of this writing, it appears that another open-source project lead has rejected a proposal to replace gendered he language with they , citing vague concerns about “politics.” (Please refrain from contacting the project about this; it’s counterproductive.) At Oxide, one of our core values is empathy: the ability to understand others’ perspectives. I believe empathy is essential in our professional lives. While we are naturally inclined to empathize, applying it effectively is a skill that requires ongoing effort. (I certainly haven’t always been perfect at it!) Furthermore, leaders must especially exemplify empathy, setting standards for their organizations. It should hopefully be evident that rejecting the use of they language lacks empathy, in addition to being historically inaccurate. Shakespeare, for instance, used they as a singular pronoun, even when referring to individuals known to be male. In Act IV, Scene 3 of A Comedy of Errors , he writes: There’s not a man I meet but doth salute me As if I were their well-acquainted friend The dual usage of “he” as both gendered and gender-neutral has historically been used to exclude women and nonbinary individuals. For example : […] although grammarians had insisted for centuries that masculine pronouns could be gender neutral, the masculine pronouns in the various state voting laws were typically interpreted to exclude women. If you’re an open-source project lead and receive a proposal to use they language, please understand it’s not an attack on you or your project. Try to empathize with the person making the suggestion and consider the historical context, including the gender disparity in our industry and how he has been used to discriminate against those who aren’t male (even if you don’t personally do so). I also want to emphasize that brigading a project, even with good intentions, is unlikely to be helpful. As much as possible, try approaching this issue with empathy as well. (In this case, reaching out to the project lead privately might have been more effective.)

sunshowers 2 years ago

ECC RAM on AMD Ryzen 7000 desktop CPUs

One of the coolest features of AMD’s Ryzen desktop CPUs, and historically a great reason to get them over the competition, was the official support for error-corrected memory (ECC RAM) 1 . With most Ryzen 1000 through 5000 series CPUs and the right motherboards, ordinary users could get ECC RAM going without having to spring for more expensive workstation-grade CPUs. For example, here’s the specification page for the ASRock B550 Steel Legend motherboard. This is a mainstream “B” series motherboard which lists detailed compatibility information for ECC RAM by processor generation. (To my knowledge ASRock has had the best support for ECC RAM in Ryzen motherboards, and I’ve been very happy with their motherboards in general.) Unfortunately, when the AMD Ryzen 7000 “Raphael” CPUs were launched along with the brand new Socket AM5 , all mention of ECC support was gone. The specification page for the ASRock X670E Taichi, one of the most expensive AM5 motherboards you can buy, has no mention of ECC support as of the date of writing this. I still decided to upgrade to a Ryzen 7950X, and overall I’ve been happy with the performance of the new processor. But the lack of ECC was a huge bummer at the time of purchasing my system. A couple months ago I came across a topic on the ASRock forums talking about ECC support on AM5 motherboards, in which a user called ApplesOfEpicness said that they’d worked with an AMD engineer to get ECC RAM going within AMD’s AGESA firmware. They’d claimed to have tested it on an ASRock motherboard with an updated UEFI, by shorting ground and data pins, and seeing errors be reported up to the OS. I was intrigued by this! Even though I didn’t have the same motherboard that ApplesOfEpicness did, I had chosen an ASRock board (the B650E PG Riptide )—I had figured that if ECC was possible on any AM5 board at all, it would be supported on ASRock. So based on the forum post, last week I ordered a pair of 32 GB server-grade ECC sticks from v-color . I updated my motherboard’s UEFI to the latest version (version 1.28 with AGESA 1.0.0.7b), and then replaced my existing RAM with the new sticks. I started up the system, and after a very long link training process 2 … it booted up! On the Linux side, all indications were that the ECC memory was functioning correctly. reported: (The “Total Width” field is the important one here. For non-ECC RAM it read 64 bits, but in my case it was 72 bits because 64-bit ECC RAM has an extra 8 bits of parity data.) Also, the Linux kernel reported that its error detection and correction subsystem, EDAC , was enabled: Looking good so far! At this point it’s worth asking about the source of these messages. Where is the data coming from and why should we believe it? Let’s look at first. starts with : dmidecode is a tool for dumping a computer’s DMI (some say SMBIOS) table contents in a human‐readable format. This table contains a description of the system’s hardware components, as well as other useful pieces of information such as serial numbers and BIOS revision. Thanks to this table, you can retrieve this information without having to probe for the actual hardware. While this is a good point in terms of report speed and safeness, this also makes the presented information possibly unreliable. Oh, interesting, “possibly unreliable” is a little concerning! What is this SMBIOS thing anyway? 
Wikipedia says : In computing, the System Management BIOS (SMBIOS) specification defines data structures (and access methods) that can be used to read management information produced by the BIOS of a computer. This eliminates the need for the operating system to probe hardware directly to discover what devices are present in the computer. So the data presented by is coming from the UEFI , not from the processor 3 . What this means is that the memory is ECC- capable , but not necessarily that it is active . Whether ECC is active is ultimately determined by the memory controller on the system. When I mentioned setting up ECC at work , Robert Mustacchi pointed me to the excellent illumos documentation about AMD’s Unified Memory Controller . I did some reading and learned that essentially, AMD processors expose a bus called the System Management Network (SMN). Among other things, this bus can be used to query and configure the AMD Unified Memory Controller (UMC). NOTE: The information in the rest of this section is not part of the public AMD Processor Programming Reference, but can be gleaned from the source code for the open-source Linux and illumos kernels. WARNING: Accessing the SMN directly, and especially sending write commands to it, is dangerous and can severely damage your computer. Do not write to the SMN unless you know what you’re doing. The idea is that we can ask the UMC the question “is ECC enabled” directly, by sending a read request over the SMN to what is called the register. The exact addresses involved are a little bit magical, but on illumos with a Ryzen 7000 processor, here’s how you would query the UMC over the SMN bus (channel 0 and channel 1 are the two memory channels on the system, and each channel has one of the 32GB sticks plugged into it.) ( is the illumos equivalent to .) Also, illumos comes with a really nice way to break up a hex value into bits: The bit we’re interested in here is bit 30. If it’s set, then ECC is enabled in the memory controller. Can we replicate this query on Linux? Turns out we can! There’s a neat little driver called which provides access to the SMN bus. It’s easy to download and install (though on my system I needed to apply a patch ). The driver exposes a file called which can be used to perform a query over the SMN bus. The documentation says that to perform a query, we must write 4 bytes to the file in little-endian format , and then read 4 bytes from the output in little-endian format. This isn’t convenient to do via the command line, so let’s write a small Python script: Running this script, I got: Bit 30 (the first nibble’s ) is set, which means the memory controller is reporting that ECC is enabled. This query should also be possible on Windows, perhaps using this tool , though I can’t vouch for it. The most foolproof way to test whether ECC is working is to introduce an error somehow. I don’t quite have the courage to physically short pins, nor the patience to slowly overclock my RAM, waiting multiple minutes for DDR5 link training each time. So instead, I’m content with knowing that the memory controller is reporting that ECC is enabled. Organically, I haven’t seen any errors so far. If a correctable or uncorrectable error does occur at some point, I’ll update this post with that information. Earlier in this post I’d mentioned that the Linux kernel reported that EDAC was enabled. I was curious what the data source for that was, so I dug into the Linux kernel source code. 
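(A quick aside before the kernel-source dig below: the register check described above is easy to script. The post used a small Python script; here is an equivalent Rust sketch. The register address and the driver file path are placeholders, not the real values elided above, and the exact write-then-read protocol is whatever the driver defines; this just follows the post's description of writing 4 little-endian bytes and reading 4 back, then testing bit 30.)

```rust
use std::fs::OpenOptions;
use std::io::{Read, Write};

fn main() -> std::io::Result<()> {
    // PLACEHOLDERS: substitute the real SMN address of the UMC configuration
    // register for your memory channel, and the real file exposed by the
    // SMN driver.
    const UMC_CONFIG_ADDR: u32 = 0x0000_0000; // hypothetical, not a real address
    let driver_path = "/path/to/smn-driver-interface"; // hypothetical path

    let mut f = OpenOptions::new().read(true).write(true).open(driver_path)?;

    // Ask for the register by writing its 4-byte address in little-endian order...
    f.write_all(&UMC_CONFIG_ADDR.to_le_bytes())?;

    // ...then read the 4-byte register value back, also little-endian.
    // (Whether a seek or reopen is needed in between depends on the driver.)
    let mut buf = [0u8; 4];
    f.read_exact(&mut buf)?;
    let value = u32::from_le_bytes(buf);

    // Bit 30 of the UMC configuration register reports whether ECC is enabled
    // in the memory controller.
    let ecc_enabled = (value >> 30) & 1 == 1;
    println!("register = {value:#010x}, ECC enabled: {ecc_enabled}");
    Ok(())
}
```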
Being generally unfamiliar with the Linux codebase, I used the tried and tested strategy of searching for strings that get logged. In this case: Searching for led me to find this line inside . This function is called here inside . is only called if returns true. What is ? It is set to a function called in this code . And is set to when the processor family is >= 0x17 . Ryzen 7000 (Zen 4) is family 0x19 . Going by just the name, sounds like it would be querying the UMC. So let's look at what it does . It looks like it's checking that 's bit is set. And what is ? It's bit 30 ! So it looks like the messages are only shown if the UMC reports that ECC is enabled. This means that, at least on AMD processors, the Linux kernel message is a reliable indicator that ECC is enabled. ECC RAM is great, and you can easily get it working on Ryzen 7000 desktop CPUs, at least with ASRock motherboards. I learned a ton of low-level processor interface details along the way. Thanks again to Robert for teaching me about a lot of the details here!

While ECC RAM is probably overkill for most desktop users and I don't have it in my gaming PC, I've seen enough random bit flips on servers to know that I would like to have ECC RAM in the computer I depend on for my livelihood.  ↩︎

The internet is replete with complaints about slow Zen 4 boot times —these are in part due to DDR5 link training, which is a very slow process. With my system, link training took almost three minutes for just 64 GB of RAM. Thankfully, at least on Ryzen 7000 desktops, it only needs to be done once after replacing RAM or changing timings. The UEFI caches the results of link training and reuses those values on subsequent boots.  ↩︎

There's some nuance here: some information like the memory speed does come from the memory controller. The ECC information, however, comes from the UEFI.  ↩︎

ApplesOfEpicness did so by shorting a data and ground pin on their motherboard. Another way would be to try and overclock the RAM until it gets to an unstable point.  ↩︎

sunshowers 2 years ago

Dealing with tempfile cleaners

I used to use this as a place to write long-form thoughts, but with the demise of Twitter I want to also write short-form posts here under the #shorts tag. I also post on Mastodon but that’s a little more ephemeral and less searchable than this blog. Let’s say you’re creating a temporary directory with some cached artifacts to share across several tests. It’s a bit more ephemeral than, say, the Cargo directory, so your initial idea might be to just put your artifacts in the temporary directory . Also, let’s say that you want to reuse the directory across a span of several days or more (perhaps by hashing a set of inputs and using that to name the directory). If your artifacts span multiple files, do not do this . The reason is that some operating systems and environments have temporary file cleaners, also known as tempfile reapers. macOS ships one by default , and some other environments also have a cleaner like configured. These reapers may end up deleting some of your temporary files but not others, leading to an inconsistent cache with part of it missing. The easiest alternative is to store artifacts in a single file rather than a directory. That way, either the file exists or it got cleaned up—there’s no in-between, inconsistent state. Another option is to record the list of files in a manifest, and invalidate the cache if any of those files are missing. A third option is to store your cache somewhere other than the system temporary directory. But then you need a way to evict old cache entries. (Also, be aware that if your temporary directory is in a global location like , users can overwrite each other’s artifacts. You can avoid that with a and/or something like ). If you can get away with not extracting files and all you need is random access, a zip file works well. For sequential access or if you need to extract files, a (possibly compressed) tarball is a great option.
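As a sketch of the single-file approach (the names, the hash choice, and the USER-based suffix are illustrative, not a prescription):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::PathBuf;

// Derive a single cache *file* (not a directory) from the inputs. A tempfile
// cleaner then either leaves the cache fully intact or removes it entirely --
// there's no partially-deleted state.
fn cache_file_for(inputs: &[&str]) -> PathBuf {
    // Note: DefaultHasher's algorithm isn't guaranteed to be stable across
    // Rust releases; a mismatch here just means a cache miss. Use a fixed
    // hash (e.g. SHA-256) if the key must be stable across toolchains.
    let mut hasher = DefaultHasher::new();
    inputs.hash(&mut hasher);

    // Include the user name so that users sharing a global temporary
    // directory don't overwrite each other's artifacts. (USER is Unix-centric
    // and shown for illustration only.)
    let user = std::env::var("USER").unwrap_or_else(|_| "unknown".to_owned());
    std::env::temp_dir().join(format!("my-cache-{user}-{:016x}.tar.zst", hasher.finish()))
}

fn main() {
    let path = cache_file_for(&["rustc 1.75.0", "x86_64-unknown-linux-gnu"]);
    println!("cache artifact lives at {}", path.display());
}
```

If you do need multiple files, the same hash can name a manifest that lists them, so that any missing member invalidates the whole entry.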

sunshowers 3 years ago

How (and why) nextest uses tokio

Update 2022-11-03: This was originally the first in a series of two blog posts. I’ve now marked this as a standalone post. Content originally slated for part 2 will now be published as followup posts. I’m the primary author and maintainer of cargo-nextest , a next-generation test runner for Rust released earlier this year. (It reached a thousand stars on GitHub recently; go star it here! ) Nextest is faster than for most Rust projects, and provides a number of extra features, such as test retries , reusing builds , and partitioning (sharding) test runs . To power these features, nextest uses Tokio , the leading Rust async runtime for writing network applications. Wait, what? Nextest isn’t a network app at all 1 . However, nextest has to juggle multiple concurrent tests, each of which can be in the process of starting, printing output, failing, timing out, being cancelled or even forcibly stopped. In this post, I’ll try to convince you that an async runtime is actually the perfect approach to handle a large number of heterogenous events like this. The challenges posed by interprocess communication within a system are really similar to those posed by remote communication across systems . Your computer is a distributed system . Just like most network services, nextest spends almost all of its time waiting rather than computing. In other words, nextest provides the prototypical use-case for asynchronous concurrency , one of the two main paradigms for non-sequential code. Why is an async runtime like Tokio good for I/O-bound applications? The answer has to do with heterogenous selects 2 . You might have heard of the select operation: in computer parlance, it means to observe two or more operations, and then to proceed with the first one that completes. The name of the operation comes from the Unix syscall. With and its successors like , you can typically only wait on one of several I/O read or write operations (represented through file descriptors). But real-world I/O-bound programs need to do much more than that! Here are some other kinds of operations they may need to wait on: I/O-bound programs often need to wait across several of these sources at the same time. For example, it’s common for services to make a network request, then select across: This is an example of a heterogenous select (or, spelled slightly differently, a heterogeneous select ): a select operation that can span several different kinds of sources. We also run into portability considerations: is Unix-specific, and is for the most part Linux-only 3 . Other operating systems have their own -like abstractions, each with their own peculiarities: Some platforms may also support selecting across some kinds of heterogenous sources. For example, Linux has and . To the extent that these are supported, there are significant differences in the APIs available on each platform. And all of them make selecting across channels and other user code quite difficult. The divergence in platform APIs and capabilities may not be a huge deal for certain kinds of network services that only target one platform, but is a real problem for a cross-platform developer tool like nextest. So there are two separate but related sets of problems here: Async runtimes like Tokio solve both sets of problems in one go 4 . In fact, I’d like to make a bold claim here: The ability to do heterogenous selects is the point of async Rust . 
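To make that claim concrete, here's a small, self-contained illustration (not nextest's code; the command, channel, and durations are arbitrary) of a single tokio::select! loop waiting on four different kinds of sources at once: a channel, a child process, a timer, and Ctrl-C.

```rust
use std::time::Duration;
use tokio::{process::Command, signal, sync::mpsc, time};

// Cargo.toml: tokio = { version = "1", features = ["full"] }
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (tx, mut rx) = mpsc::channel::<String>(16);
    tokio::spawn(async move {
        let _ = tx.send("hello from another task".to_owned()).await;
    });

    let mut child = Command::new("sleep").arg("3").spawn()?;

    // A timer is just another future; pin it so it can be polled across
    // loop iterations and reset after it fires.
    let tick = time::sleep(Duration::from_secs(1));
    tokio::pin!(tick);

    loop {
        tokio::select! {
            // A message-passing channel.
            Some(msg) = rx.recv() => println!("channel: {msg}"),
            // A child process exiting.
            status = child.wait() => {
                println!("child exited: {status:?}");
                break;
            }
            // A recurring timer.
            () = &mut tick => {
                println!("tick");
                tick.as_mut().reset(time::Instant::now() + Duration::from_secs(1));
            }
            // A signal (Ctrl-C here; nextest cares about several of these).
            _ = signal::ctrl_c() => {
                println!("got Ctrl-C, cancelling");
                break;
            }
        }
    }
    Ok(())
}
```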
At this point it’s worth mentioning that there are two related, but distinct, things here: It is definitely possible to leverage async runtimes without ever using the and keywords—in fact, I’ve written large chunks of a production service in Rust that used an async runtime before / was ever a thing. However, without syntactic support it is often very difficult to do so, since you may need to hand-write complicated state machines. The and keywords dramatically simplify the experience of writing state machines. David Tolnay talks about this more in his blog post . (The / example in his blog post was part of the aforementioned production service, and it was my code that he rewrote.) With this background, we’re now ready to start looking at how nextest’s current architecture came about. The first versions of nextest all ran tests using “normal” threads. Here’s the general structure they used for their execution model : The run pool: Tests were run via a thread pool. The signal thread: To handle signals such as Ctrl-C cleanly, nextest spun up a separate thread, and if a signal is detected, wrote to another channel: the signal event channel . The main thread: The main thread was dedicated to monitoring the two event channels. This thread would select across the test event and signal event channels : Importantly, only lets you select over crossbeam channels, not heterogenous sources in general. Anything else, including pipe reads or timers, must be translated over to crossbeam channels somehow. One of the first demands placed on nextest was the need to identify slow tests . Note that duct’s method doesn’t have any notion of a timeout: it just blocks until the process is completed. Luckily, has a method . So the solution was straightforward: set up another thread pool, called the wait pool , which would wait on test processes to exit. So now this becomes: Here’s the code : This works, but you can already see the cracks starting to show. To solve this issue we had to introduce a whole new thread pool on top of the existing run pool. A few months after nextest was released, a user requested another feature: the ability to terminate tests that take more than a certain amount of time. Now, nextest could terminate tests immediately with the unblockable signal 7 (aka ). But it’s generally a good idea to give tests a bit of a grace period. So we chose to first send tests a signal, then wait a bit for them to clean up before sending . To do this, we defined two timeouts: The test execution flow is then: And the termination loop: Here’s the code : You can already start seeing the code becoming long and rather janky, with its 100ms polling loop. Some tests spawn server or other subprocesses, and sometimes those subprocesses don’t get cleaned up at the end of the test. In particular, nextest is concerned about processes that inherit standard output and/or standard error. Here’s an example test: Since inherits standard output and standard error from the test, it keeps those file descriptors open until it exits. Previously, nextest would simply wait until those subprocesses exited. In the worst case, the subprocesses wouldn’t exit at all: think of a server process that gets started but never shut down. That can result in nextest simply hanging, which isn’t great. How can nextest detect these sorts of leaky tests ? One solution is to wait for standard output and standard error to be closed until a short duration (say 100ms) passes. While this is conceptually simple, implementing it is surprisingly complicated. 
Waiting on one handle being closed is nontrivial; waiting on two handles being closed (standard output and standard error) adds an extra challenge. I spent some time looking at how to implement it manually, and came away from it with the understanding that I’d have to maintain separate implementations for Windows and Unix. Not only that, I’d just be duplicating a lot of the work Tokio had already done to abstract over platform differences. So I started looking at porting over nextest to Tokio. I spent a weekend porting over nextest to use async Rust with Tokio. After a bit of iteration, I settled on this general architecture: Here’s the bit that selects over the test event and signal event channels : This is really similar to the crossbeam-based implementation above, with just one real change: The code tracks a state to see if the signal event channel has been dropped. This is to ensure that the loop doesn’t repeatedly get values from the signal handler, wasting cycles. (This is more important with futures that, once completed, cannot be waited on further, as we’ll see in the next section.) Where this gets really interesting is with the loop nextest uses to wait for tests to complete. It turns out that all the code we wrote above can be replaced with a set of Rust and Tokio primitives, written in a roughly declarative style. After starting the process, nextest sets up a timer, and futures to read from standard output and standard error: Some notes about the code above: Remember how we talked about heterogenous selects in part 1 of this post? This is where we start using them. Notice how only works on crossbeam channels. On the other hand, Tokio’s select works on arbitrary, heterogenous sources of asynchronicity. Here’s how nextest now waits on tests: Then, select across the futures listed above in a loop, using Tokio’s ability to handle heterogenous selects with . If data is available via the standard output or standard error futures: If the timer fires: If the process exits: Here’s the code : And then, how does nextest detect leaked handles? After the main loop ends, nextest just runs another little loop for that: If data is available via the standard output or standard error futures: If both the standard output and standard error futures have completed: If the timer fires before both futures have completed: Here’s the code : That’s it! We’ve replaced all of that unwieldy code with a couple of declarative loops. One of the promises of Rust’s type system is that it’ll catch most kinds of bugs before you ever run the program. In my experience, this promise is mostly upheld, even when dealing with async code. For a few compile-time issues like pinning, it was easy to find solutions . I ran into two runtime issues: There’s a bit of a meme in the Rust world: that if you can “just use” threads, you should ; and that async code is really only “worth it” for network services that need to be massively concurrent . (I don’t mean to pick on the individual authors here. It’s a general view in the Rust community.) While I agree that async Rust isn’t easy , I believe nextest’s experience using tokio directly challenges that general view. The point of async, even more than the concurrency, is to make it easy to operate across arbitrary sources of asynchronicity. This is exactly what nextest needs to do. Any Rust program that is I/O-bound, or any program that needs to do heterogenous selects, should strongly consider using an async runtime like Tokio. 
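As a closing illustration of the leak-detection idea described earlier, here is a sketch under simplifying assumptions, not nextest's actual code: after the child exits, keep draining its output until the pipe closes or a short grace timer fires. Only standard output is shown (nextest does the same for standard error), and the child is a Unix shell command chosen to deliberately leak a grandchild.

```rust
use std::process::Stdio;
use std::time::Duration;
use tokio::{io::AsyncReadExt, process::Command, time};

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // A test-like child that spawns a grandchild which inherits stdout and
    // outlives it -- the "leaky test" scenario described above.
    let mut child = Command::new("sh")
        .arg("-c")
        .arg("sleep 2 & echo hello")
        .stdout(Stdio::piped())
        .spawn()?;
    let mut stdout = child.stdout.take().expect("stdout was piped");

    // Real code would read output concurrently with waiting (as the main
    // execution loop above does); for a short-output child this is fine.
    let status = child.wait().await?;
    println!("exit status: {status}");

    let mut acc = Vec::new();
    let mut chunk = [0u8; 4096]; // read in 4 KiB chunks, as described above
    let grace = time::sleep(Duration::from_millis(100)); // the post's default
    tokio::pin!(grace);

    let leaked = loop {
        tokio::select! {
            res = stdout.read(&mut chunk) => match res? {
                // 0 bytes means the write side is closed: nothing leaked.
                0 => break false,
                n => acc.extend_from_slice(&chunk[..n]),
            },
            // The pipe is still open after the grace period: some process is
            // still holding the handle.
            () = &mut grace => break true,
        }
    };
    println!("captured {} bytes; leaked handles: {leaked}", acc.len());
    Ok(())
}
```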
Thanks to Fiona , Inanna , Eliza , Brandon , Predrag , and Michael for reviewing drafts of this article. And thank you for reading it to the end. The flowcharts are available in this Google folder , and are licensed under CC BY 4.0 . 2022-10-06: Other than its self-update functionality, which isn’t relevant here.  ↩︎ Full credit for the term and framing goes to Eliza Weisman .  ↩︎ Some other systems have added for compatibility with Linux: for example, illumos .  ↩︎ The platform-specific implementations are usually abstracted away with a low-level library like Mio .  ↩︎ The runner loop needed a scoped thread pool because it relied on borrowed data—it used rayon ’s for that purpose.  ↩︎ MPSC stands for “multi-producer, single consumer”. This kind of channel lets many threads (in this case, the threads running the tests) send data to a single “status” thread.  ↩︎ Or its equivalent on Windows, TerminateProcess .  ↩︎ Why ? Because we’d like tests to be started in lexicographic order, but results to be reported in the order tests finish in. That is exactly the behavior provides. One alternative was to spawn every test as a separate future and use a semaphore to control concurrency, but that might have resulted in tests being started in arbitrary order.  ↩︎ The async concurrency style is best for I/O-bound applications, where most time is spent waiting on other parts of the overall system to respond. An I/O-bound application becomes faster if I/O becomes faster—for example, if your network link becomes faster, or you replace a spinning disk hard drive with a solid-state drive. On the other end of the spectrum, data parallelism with libraries like Rayon is best for compute-bound applications, where you’re running a lot of computations in parallel on a set of input data. These become faster if you replace your CPU with a faster one. Timeouts or recurring timers Channel reads and writes when using message-passing, including bounded channels with backpressure Process exits Unix signals a network response a termination signal, so any pending network operations can be canceled BSD-like platforms have ( FreeBSD , macOS ). Windows has I/O completion ports . The need to select over arbitrary kinds of heterogenous sources, not just I/O operations. The need to do so in a cross-platform manner. An async runtime : a framework that manages an event loop, handling events from heterogenous sources and associating them with individual tasks. The and keywords , which provide an easy way to write tasks that wait on events handled by an async runtime. Nextest first built up a list of tests. It then spun up a thread pool with the number of tests 5 , called the run pool . For each test in the list, nextest used a thread in the run pool. The thread ran the test using the duct library, then blocked until the test exited. (nextest used duct for convenience—we could have used instead, which wouldn’t have changed much.) To send information about test events (the start, end, success/failure, and time taken for individual tests) to the main thread, nextest used a crossbeam MPSC channel 6 : we’ll call this the test event channel . If this loop received a test event, it would bubble that up to the reporter (the component that prints nextest’s output). If a test failed, it would set a cancellation flag which would cause no further tests to be scheduled. If the loop received a signal event, it would set the cancellation flag. Start the test process using a thread in the run pool. Spawn a wait pool thread to wait on the test. 
In the run pool, wait on the thread in the wait pool with a timeout. If the test finishes before the timeout, mark it done. If the timeout is hit, mark the test slow and go back to waiting on the test. A slow timeout , after which tests are marked as slow. Nextest’s default slow timeout is 60 seconds. A termination timeout , after which tests are killed. Nextest doesn’t have a default termination timeout, but a typical one users might set is 5 minutes (300 seconds). Start the test process using a thread in the run pool. Spawn a wait pool thread to wait on the test. In the run pool, wait on the thread in the wait pool with the slow timeout. If the test finishes before the slow timeout, mark it done. If the slow timeout is hit and the termination timeout hasn’t been hit yet, mark the test as slow and go back to waiting on it. If the termination timeout is hit, set up a termination loop. Send the process . In a loop, for up to 10 seconds: Check if the test process exited. If it did, break out of the loop. Otherwise, wait 100ms and check the test’s status again. If the test hasn’t exited by the end of the loop, send it . Create a Tokio in the beginning. The lasts for as long as the runner loop does. Turn the list of tests into a , then for each test, create a future that executes the test. Use to control concurrency 8 . Switch over crossbeam channels to tokio channels as needed, and invocations to . On being polled, the futures repeatedly read standard output and standard error into their corresponding buffers in 4 KiB chunks. They only exit when 0 bytes are read, which corresponds to the handle being closed. The code needs to maintain and flags: this is similar to above, except this time it’s necessary because async blocks cannot be polled once completed: if done so, they panic with the message “ resumed after completion”. The futures borrow data from the stack. This is possible thanks to async Rust’s integration with the borrow checker. The futures are pinned to the stack , which means that they cannot be moved. This is a fundamental part of how zero-overhead async Rust works. An async block must be pinned before it can be polled. Start the test process, which sets up a future that completes when the test exits. Set up a timer for the slow timeout. Set up futures to read from standard output and standard error. Read data into the corresponding buffer. Loop back and select again. If the terminate timeout is hit, terminate the test and exit the loop. If not, mark as slow, loop back and select again. Exit the execution loop. Proceed to the leak detection loop below. Set up a short (by default 100ms) leak timer. Select across the leak timer and the standard output and error futures from before, in a loop. Read data into the corresponding buffer. Loop back, and select again. Note that the test didn’t leak. Conclude test execution. Note that the test leaked handles. Conclude test execution. A panic with the message “ resumed after completion”. The solution to this was easy to find . The main loop that selected across the test event and signal event channels hung during longer test runs; this was caused by a statement . Removing the fixed it, but I mostly found that out by trial and error. I’m not sure how to debug this kind of issue in a principled manner. Add a note that platform-specific APIs like and can provide heterogenous selects to some extent, though not as much as Tokio can. Updated and implementations to match changes in #560 . I’d forgotten to add Michael Gattozzi to the reviewers list. 
Fixed, sorry!

sunshowers 4 years ago

Open and closed universes

Type systems are tools for modeling some aspect of reality. Some types need to represent one of several different choices. How should these different kinds of choices be modeled? If you’re writing an end-user application, none of this matters. You have full control over all aspects of your code, so use whatever is most convenient. However, if you’re writing a library that other developers will use, you may care about API stability and forward compatibility. A simple example is the type in Rust, or its analogs like in Haskell. is defined as : Part of the definition of is that it can only be either or . This is never going to change, so values form a closed universe. A simple enum is appropriate for such cases. The main advantage of this approach is that it minimizes the burden on consumers: they know that an can only be or , and only have to care about such cases. The main disadvantage is that any sort of new addition would constitute API breakage. You’re expanding a universe you formerly assumed to be closed, and everyone who depends on you needs to be made aware of this. As a library author, you may run into two kinds of open universes: If you’re trying to model semi-open universes, consider using a non-exhaustive enum. For example, you may have an error type in a library that bubbles up other errors. One way to write this is: The problem with this approach is that errors typically do not form a closed universe. A future version may depend on some other library, for example , and you’d have to add a new option to the type: Anyone matching on the type would have to handle this new, additional case… unless they’ve specified a wildcard pattern . Some programming languages allow library developers to force consumers to specify a wildcard pattern. In Rust, you can annotate a type with : Anyone matching against the error type would have to add an additional wildcard pattern: This ensures that the library developer can continue to add more options over time without causing breakages. The main downside is that all consumers need to handle the additional wildcard pattern and do something sensible with it 1 . The non-exhaustive enum approach is best suited for semi-open universes. What about situations where consumers should have the ability to add new choices? The best way is often to use a string, or a newtype wrapper around it. For example, one may wish to represent a choice between computer architectures such as , , and . More architectures may be added in the future, too, but often the library just needs to pass the architecture down to some other tool without doing much processing on it. One way to represent this is 2 : Well-known strings can be represented as constants: This allows known choices to not be stringly-typed , and currently-unknown ones to be used without having to change the library. The format doesn’t have to be restricted to strings. For example, an may wish to also carry its bitness . In general, this works best for choices that are: One pattern I’ve seen is to use another variant that represents an unknown value. For example: The problem with this approach is that it becomes very hard to define equality and comparisons on these types. Are and the same, or are they different? In practice, this becomes tricky very quickly. In general, this approach is best avoided. The final, most general approach is to use traits, typeclasses or other interface mechanisms. 
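Here's a compact sketch of the two patterns described so far: a #[non_exhaustive] enum for a semi-open universe, and the many-strings newtype over Cow<'static, str> for a fully open one. The names (FetchError, Arch, the constants) are invented for illustration; only the shapes matter. (For reference, the std Option mentioned at the top is just enum Option<T> { None, Some(T) }, a closed universe.)

```rust
use std::borrow::Cow;

// A semi-open universe: the library may add variants later, so downstream
// crates are forced to keep a wildcard arm in their matches.
#[non_exhaustive]
#[derive(Debug)]
pub enum FetchError {
    Io(std::io::Error),
    InvalidUrl(String),
}

// A fully open universe via the "many strings" pattern: a newtype over
// Cow<'static, str>, with constants for the well-known choices.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Arch(pub Cow<'static, str>);

pub const X86_64: Arch = Arch(Cow::Borrowed("x86_64"));
pub const AARCH64: Arch = Arch(Cow::Borrowed("aarch64"));

impl Arch {
    // Consumers can name architectures this library has never heard of.
    pub fn new(name: impl Into<Cow<'static, str>>) -> Self {
        Arch(name.into())
    }
}

fn describe(err: &FetchError) -> &'static str {
    match err {
        FetchError::Io(_) => "I/O error",
        // Because FetchError is #[non_exhaustive], code outside the defining
        // crate must always include a wildcard arm like this one.
        _ => "some other error",
    }
}

fn main() {
    let custom = Arch::new("riscv64");
    println!("{:?} {:?} {:?}", X86_64, AARCH64, custom);
    println!("{}", describe(&FetchError::InvalidUrl("not a url".to_owned())));
}
```

The trait-based approach discussed next trades this simplicity for more flexibility.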
For example, architectures can be represented as a trait in Rust: Then, other library methods could use: And custom architectures can be defined as: This also works for output types such as returned errors. You can try and downcast the error into a more specific type: but this turns a formerly compile-time check into a runtime one. A common bug that happens with this approach is version skew: for example, let’s say the library had originally been using . In a minor release, it upgrades to and return errors from the new version, while the consumer continues to downcast to the old version. Such cases will fail silently 3 . Traits are similar to the many-strings approach because they both model fully open universes. However, there are some tradeoffs between the two. One benefit of traits is that they offer greater implementation flexibility: trait methods allow for custom behavior on user-defined types. A downside is that traits make each choice in an open universe a type , not a value , so operations like equality become harder to define. It isn’t impossible, though: the trait could require a method which returns a many-strings type. The contents of can then be used for equality checks. Another downside is ergonomics: traits and interfaces are more awkward than values. A sufficiently complex trait may even require some sort of code generation to use effectively. Rust has declarative and procedural macros to help, which work okay but add even more complexity on top. Whether strings are sufficient or full-blown traits are required is a case-by-case decision. In general, the many-strings approach is much simpler and is powerful enough for many kinds of choices; traits provide greater power and are always worth keeping in mind, though. A trait can also model a semi-open universe if you seal it : The library itself may add new implementations of the trait, but downstream consumers cannot because they do not have access to 4 . Type systems are often used to represent one of several choices. The choices can either be fixed (closed universe), or might continue to change over time (open universe). These are different situations that usually need to be handled in different ways. Hope this helps! The idea of writing this post arose from several conversations with members of the Rust community. Thanks to Munin , Jane Lusby , Manish Goregaokar , Michael Gattozzi and Izzy Muerte for reviewing drafts of this post. Updated 2021-08-04: edited some text, and corrected the footnote about to indicate that it is required. Updated 2021-08-13: added a note about more complex versions of the many-strings pattern, expanded on the difference between it and traits, and expanded on what it would mean to unify sealed traits and non-exhaustive enums. Thanks to Manish for the inspiration. I wish that Rust had a way to lint, but not fail to compile, on missing patterns in a non-exhaustive match.  ↩︎ This implementation uses the type, which with the lifetime stands for either a borrowed string that lasts the duration of the entire program—typically one embedded in the binary—or an owned created at runtime. If this were just a , then the implementation below wouldn’t be possible because s can’t allocate memory.  ↩︎ With static types, this sort of version skew error is usually caught at compile time. With dynamic casting it can only be caught at runtime. You might argue that if the error type comes from a public dependency, an incompatible upgrade to the dependency constitutes a semantic version breakage. 
Type systems are often used to represent one of several choices. The choices can either be fixed (closed universe), or might continue to change over time (open universe). These are different situations that usually need to be handled in different ways. Hope this helps! The idea of writing this post arose from several conversations with members of the Rust community. Thanks to Munin , Jane Lusby , Manish Goregaokar , Michael Gattozzi and Izzy Muerte for reviewing drafts of this post. Updated 2021-08-04: edited some text, and corrected the footnote about non-exhaustive matches to indicate that the wildcard pattern is required. Updated 2021-08-13: added a note about more complex versions of the many-strings pattern, expanded on the difference between it and traits, and expanded on what it would mean to unify sealed traits and non-exhaustive enums. Thanks to Manish for the inspiration. I wish that Rust had a way to lint, but not fail to compile, on missing patterns in a non-exhaustive match.  ↩︎ This implementation uses the Cow<'static, str> type, which with the 'static lifetime stands for either a borrowed string that lasts the duration of the entire program—typically one embedded in the binary—or an owned String created at runtime. If this were just a &'static str, then the implementation wouldn’t be possible because &strs can’t allocate memory.  ↩︎ With static types, this sort of version skew error is usually caught at compile time. With dynamic casting it can only be caught at runtime. You might argue that if the error type comes from a public dependency, an incompatible upgrade to the dependency constitutes a semantic version breakage. However, this is a bit of a boundary case; the library authors might disagree with you and say that their stability guarantees don’t extend to downcasting. Semantic versioning is primarily a low-bandwidth way for library developers and consumers to communicate about relative burdens in the event of a library upgrade.  ↩︎ Given that both sealed traits and non-exhaustive enums model the same kinds of semi-open universes in fairly similar fashions, a direction languages can evolve in is to unify the two. Scala has already done something similar . Rust could gain several features that bring non-exhaustive enums and sealed traits incrementally closer to each other: allow match statements to match on sealed traits if a wildcard arm is specified; make enum variants their own types rather than just data constructors ; and automatically convert the return values of trait functions into enums if necessary. (The trait doesn’t need to be sealed in this case, because the enum would be anonymous and matching on its variants wouldn’t be possible.) Inherent methods on an enum would need to be “lifted up” to become trait methods. As of Rust 1.54, there are several kinds of methods that can be specified on enums but not on traits. Features like existential types help narrow that gap.  ↩︎

0 views
sunshowers 4 years ago

What I use, late 2020 edition

I’m going to try and write up a post about my digital tools at the end of every year. Here’s what I’m using in late 2020. The computer I use for work and play is a desktop with an AMD Ryzen 9 3900x processor, a Samsung 970 EVO Plus 2TB NVMe drive, 64GB of RAM, and an RTX 2060 Super graphics card. The case I use is an NZXT H1 : it is small, stays cool and quiet under heavy loads, and looks gorgeous . It also has a lot of future upgradability: if I need another few terabytes of storage in the future, I can easily pop the case open and add an SSD or NVMe drive. My current monitor is an LG 38GN950 , a 4k 38" ultrawide with a high refresh rate and low latency 1 . My main monitor has been some sort of ultrawide since 2015. The biggest advantage of an ultrawide is the ability to have 3-4 panes of code side by side at the same time without having to use multiple monitors. They also work really well with tiling window managers (more about them below). My primary desktop operating system is the Pop!_OS distribution of GNU/Linux, version 20.10. The main feature I like about it is the Pop Shell tiling extension for GNOME 2 . I first started using a tiling window manager around 2017, and I haven’t looked back since. Pop Shell is my favorite by a large margin: it combines a great desktop environment with a high-quality auto-tiling implementation. I like to have many windows open at the same time: a text editor or IDE, a browser, Discord, Slack, Spotify etc. With a traditional overlapping window manager, once I have more than a couple of windows open I start getting lost in them. I’ve tried using virtual desktops but they never really made sense to me: it becomes really hard to keep track of where each window is, and the alt-tab application switcher starts behaving in confusing ways. So instead of having just one or a few desktops with many windows in them, why not have many desktops with a few windows in them? That’s the core insight behind tiling window managers. Most come with a few important features: I’ve been using computers since the late ’90s, right? I’m pretty jaded at this point. Switching to a tiling window manager has made computers fun again. If you haven’t tried one yet and you get a chance, please do! All you need to do is boot up a live USB stick with Pop!_OS 3 . You might be surprised at how much you like it. Some other tools I use: I like to play video games, mostly on PC. Being able to modify my games, whether the developers intended to allow it or not, is very important to me. I do have a Windows dual boot for games but I don’t use it very often. Most games I play work on Linux, either natively or through Proton . As someone who’s used Linux off and on since 2002, it’s quite remarkable how great a gaming platform it is these days. Some games I’ve been playing over the last couple of months: My primary phone is a Google Pixel 5. The phone is the smoothest I’ve ever used, it takes great photos, and the battery never runs out. Here’s a few things I really like about the phone, most of which are only available on Android: What an extraordinary year this has been. I’ve had a very rough time with my mental health but have otherwise been fortunate in so many ways. Here’s hoping the COVID vaccines do their job and things are back to normal by this time next year. I care a lot about high refresh rates: over 90-100Hz a device feels responsive in a way that it simply doesn’t at the usual 60Hz.
Outside of niche devices like e-book readers, I only buy displays that are at least 90Hz, preferably 120.  ↩︎ The Pop Shell extension works on any distribution with a recent GNOME environment. Pop!_OS includes it by default.  ↩︎ Many computers can boot bare metal Linux. It’s a pretty big hassle on recent Macs, though. Honestly, I think macOS is poorly designed and virtual machines are hard to use, so I prioritize Linux compatibility when I’m shopping for hardware.  ↩︎ A single window on a desktop gets maximized to take up the entire screen. Opening up a second window causes each window to be resized to take up half the space by default: there are no overlapping windows. It becomes impossible to have more than four or so windows on a single desktop. Keyboard shortcuts for all the most important operations: resizing windows, navigating between desktops, moving windows between desktops, etc. Howdy combined with a Logitech IR webcam enables face-based authentication for logins and sudo. It’s delightful. My keyboard is a Kinesis Freestyle Edge with Cherry MX Brown switches. There’s so much to love about the keyboard: I work for hours at a stretch and ergonomics is a top priority for me. The split ergonomic design means my shoulders and wrists thank me after a long day’s work. The macro keys are a great way to automate common operations. The keyboard’s configuration is stored in a simple text file, and this file can be checked into source control. My primary web browser is Mozilla Firefox . I’ve been a Firefox user since 2008. It’s really nice and keeps getting new features I appreciate, such as Multi-Account Containers . My preferred shell is zsh . I think it strikes the right balance between Bourne shell compatibility and user-friendliness. Some of my favorite less-well-known shortcuts are the one to repeat the last word of the last command (also available on bash), and the one to add single quotes around the current command. homeshick lets me synchronize my dotfiles across computers. I used to use Emacs a long time ago but gave up on it because of the UI stalls inherent to its single-threaded model. I still use Emacs keybindings everywhere, though, such as Ctrl+A to go to the beginning of a line and Ctrl+E to go to the end. Here’s how to turn them on in GNOME . For Rust development I use a combination of the terminal and the JetBrains Rust plugin . It’s not perfect but it’s pretty close. I’m also keeping an eye on Visual Studio Code and rust-analyzer , but for now the JetBrains plugin is still better. For my terminal needs I use Guake . I bind the pull-down key to ⑧ on my keyboard, just to the left of left-shift. Fullscreen is bound to ⑦. A pull-down terminal and an IDE work very well together. Celeste , a difficult but kind 2D platformer. I’ve mostly been playing custom maps made by the incredible mod community around the game. Monster Train , a deckbuilding roguelike similar to Slay the Spire . The game technically only supports Windows but plays perfectly through Proton. When I’m feeling like a challenge I’ll play Sekiro: Shadows Die Twice for a bit. Sekiro is my favorite From Software game, and it also works really well on Linux through Proton. I played through Yoku’s Island Express , a fun little pinball Metroidvania. What a neat little game. Super Smash Bros. Ultimate , a fun fighting game on the Switch. The 90Hz screen is very nice to use. I use Firefox as my primary browser. I love having access to uBlock Origin and Dark Reader , among other extensions, and the bookmark sync with desktop is great.
KDE Connect is great for syncing notifications and files between my Linux desktop and my phone. On the desktop side I use the GSConnect extension for GNOME Shell. I use the Wavelet equalizer for my Sony wireless headphones . Wavelet ships with a database of presets for popular headphones, and I also appreciate that you can have separate profiles for different output devices. We upgraded to a Netgear Orbi mesh Wi-Fi setup this year. The Orbi is the best wireless setup I’ve ever used: stable, reliable, and very fast. The Valve Index is a pretty great VR headset. Even though I’ve mostly used it to play Beat Saber, the Index’s high refresh rate means I feel less woozy after a long session. As a long-time IRC veteran, I finally got on Discord this year. It’s not quite a perfect substitute for hanging out in real life, but it’s pretty good.

0 views
sunshowers 5 years ago

The social consequences of type systems

Type systems 1 are the wellspring of some of the most interesting work in computer science, and practitioners like myself deal with them everyday. This post collects some thoughts I’ve had about how type systems interact with the communities that use them. Let’s say you write a simple Python function that concatenates two strings: For example, returns the string . But in Python works on numbers as well, so returns the string . You really care that your function is only used to concatenate strings, so you decide to add a comment which explains you shouldn’t do that. In Python, that’s usually done through docstrings : This may be enough if you’re working by yourself, but many people work in teams. Let’s say you come back to this code a few months later and you realize a teammate has accidentally used it to add two integers. So you decide to add a little bit of enforcement, using a Python function called to produce an error if either or aren’t strings: This is a really big jump. With this change, now produces an error in cases where it used to work. This is a form of what is often called dynamic analysis : every time the function is called at runtime , it checks that its inputs look correct. Dynamic analysis works quite well! Many high-quality codebases rely on it, with a combination of tests, in various ways. But soon you start noticing patterns (the one is really common!), so you try and automate those checks a little. You turn those patterns into a system of annotations , for example: These annotations are similar to docstrings: they are communications from the author of the code to the other people working on it. Anyone looking at the definition of can instantly grasp that the function is meant to take in two strings and output another. But these annotations are a special kind of communication: one that is understandable by machines just as well. You can build tools on top of them. Here’s some examples of tools on top of these annotations that I’ve seen in practice 2 : The most immediate effect is that types form a kind of documentation that is kept up-to-date. I can’t overemphasize how important this is—in my experience working on large projects, documentation falls out of date all the time. Types don’t. The options 1–3 are possible in languages like Python, where the runtime representation of data doesn’t need to be known in advance. Other languages like C++ and Rust use these annotations to decide the layout of data in memory, so they must use a static type checker. Calling a type checker an example of a static analysis tool might raise some eyebrows. The term is usually reserved for a kind of aftermarket tool (often proprietary , though some are free ) that tells you about things like missing null checks . I’ve been thinking a lot about labels lately. I identify with the labels queer 3 , trans and nonbinary . They describe who I am and mark the communities I belong to. Another label I sometimes use, but don’t necessarily identify with, is bisexual : the term describes me well, but even though the bisexual community has many lovely people, I don’t personally feel a strong affinity towards it. Labels like static analysis are similar. They are both technical descriptions and community markers. There’s a static analysis community and a separate type systems community , which is sometimes considered part of the formal verification community . All of these communities work on similar kinds of static analysis tools, but historically they’ve had different focuses and haven’t had much overlap. 
When separate communities work on things that technically look similar, there’s often a lot of untapped potential in synthesizing those approaches. Rust’s advances over C++ involve taking many of the ideas pioneered in static analysis tools—null checks, use-after-free and so on—and moving them into the type system 4 . At a technical level this doesn’t sound like that big a deal—why write software in Rust if you can just use C++ with a static analysis tool? But, for the communities using Rust, this move has profound social effects. A typical static analysis tool is run separately, usually as part of continuous integration, and not within a standard developer build. It flags issues that are worth looking at, some of which are real and others are false positives. Usually, you can disable a false positive with a simple flag or comment. These tools have to walk a fine line: too many false positives and their users will end up ignoring them, too many false negatives and they miss bugs. A static type system is terribly annoying, especially when you’re learning it. It’s in your face. You have to deal with it every time you compile your code. It has false positives, but they must be dealt with. They can’t be silenced through a mere comment; you must either rewrite your code to eliminate those false positives, or bypass the compiler’s checks using unsafe code , putting a big 🚨 flashing 🚨 warning 🚨 sign on it 5 . But what makes a good type system have a learning curve is also what makes it useful once you’re over the hump. I get a real sense of visceral, psychological safety while writing Rust that I don’t get with C, C++ or even Python. Once you understand the peculiar ways in which a type system talks, a type check starts feeling like a conversation with a learned colleague—one who’s got your back. A good type system becomes even more pertinent once you start using other people’s code . You may not be sure whether your dependencies use a particular static analysis tool, but you do know that everyone’s had to fight the borrow checker . A high-quality type system is a shared, community-wide investment in a way of doing things, and one that is nearly impossible to retrofit into an existing ecosystem. Rust bet its entire existence on producing a type system that is good enough, and error messages that are clear enough , that developers were willing to accept all the pain of learning it. I think that bet’s paid off. The relationship between type systems and communities isn’t just one-way: after all, developers decide how type systems are used in practice. Nowhere is this more apparent than in languages like Rust and Haskell, whose communities use type-level proofs quite extensively. What do I mean by that? Well, consider Rust’s String type for example. It is a wrapper around a series of bytes (a Vec<u8>), with the added, purely semantic constraint that the bytes can be interpreted as valid UTF-8. The runtime representations of String and Vec<u8> are identical . Every String is a valid Vec<u8> so a “downgrade” is free, but not every Vec<u8> is a valid String . If you’re converting a Vec<u8> to a String , you must go through some sort of validation step. I like to think of this as liminal code : it lies at the edge of the world, guarding access to the realm of Strings. In my experience, liminal code tends to be a very common source of bugs. People like myself spend a lot of our time building tools and systems to ensure that liminal code is correct. The whole point of fuzzing , for example, is to find bugs in such code. What type-level proofs do is usually not eliminate liminal code 6 so much as concentrate it in one place. Without them, as in a language like C with char*, it is impossible to tell at a glance if a char* is supposed to be valid UTF-8. It is common for redundant, dynamic checks to be scattered throughout a codebase as bugs are discovered. In Rust, every path from a [u8] to a str exists in one file , where it can be carefully audited. All other Rust code can trust that a str is valid UTF-8. There’s also an escape hatch that converts a [u8] to a str without validating it, but—as you might expect—it requires unsafe . This method of scaling up a difficult, careful, time-consuming local analysis to the rest of a system is something type systems excel at. And there’s nothing preventing you from using type-level proofs in your own Rust code! For example, you may want to model a language identifier as a wrapper over a String 7 , with the restriction that it is non-empty and doesn’t start with a digit. The code to validate a string would live in one place and be tested and fuzzed. The rest of your system can trust that identifiers are valid.
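As a concrete illustration of the identifier example (the names and error variants here are hypothetical; this is a minimal sketch rather than the original post’s code), such a wrapper might look like this:

```rust
/// A language identifier: guaranteed non-empty and not starting with a digit.
/// The only way to construct one is through `Identifier::new`, so the rest of
/// the program can rely on the invariant holding.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Identifier(String);

#[derive(Debug)]
pub enum ParseIdentifierError {
    Empty,
    StartsWithDigit,
}

impl Identifier {
    pub fn new(s: impl Into<String>) -> Result<Self, ParseIdentifierError> {
        let s = s.into();
        match s.chars().next() {
            None => Err(ParseIdentifierError::Empty),
            Some(c) if c.is_ascii_digit() => Err(ParseIdentifierError::StartsWithDigit),
            Some(_) => Ok(Identifier(s)),
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

Everything outside this module can accept an Identifier and rely on the invariant without re-checking it; the validation logic is the only code that needs to be tested and fuzzed.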
You could also use it for many other sorts of proofs, such as that a set of state transitions is valid or that a cryptographic library is used correctly . You may have noticed that the arguments for type-level proofs look a lot like the ones for static type systems. Yet, in practice they’re more common in some environments—particularly functional programming languages and Rust—than in others. I’m not exactly sure why, but if I may hazard a guess for Rust: its affine typing , clear demarcation of unsafe code, and trait system all contribute to type-level proofs being easy to express and hard to bypass. This attracts the sorts of people who want to use them in their own code, and thus is created a virtuous cycle between technology and culture. The principle of linguistic relativity , sometimes called the Sapir-Whorf hypothesis , claims that the structure of a language affects how we think. Linguists have gone back and forth over the last few decades on its validity, with most now rejecting the original, strong formulation but accepting a more balanced view. There’s a case to be made for its relevance for programming languages as well; I’ve tried to put together some ways in which type systems influence the way developers use them. I think this is a useful lens to look at type systems through, and I hope you agree. The idea of writing this post arose from conversations with Inanna Malick . Thanks to Inanna, kt , Manish Goregaokar , Jane Lusby and Michael Gattozzi for reviewing drafts of this post. Any errors in it are my own. It’s useful to draw a distinction between type theory and type systems. Type theory originated in mathematics, starting with the early 20th century’s foundational crisis and Bertrand Russell’s attempt to solve a paradox that occurs in naive set theories . While type theory successfully addresses Russell’s paradox and other issues, most practicing mathematicians choose to use axiomatic set theories like ZFC instead. Type theory has lived on in formal logic and computer science, and has been applied in various ways through type systems . There’s been some recent foundational mathematics work within the field of homotopy type theory .  ↩︎ There are all sorts of in-between things possible as well, such as using dynamic analysis in some parts of your program and static analysis in others. Gradual typing consists of a rich, widely used series of ideas in this space.  ↩︎ Within the greater LGBTQ+ community, queer is a bit special because it’s a label that signals a rejection of labels, and an opposition to the norms of straight and gay society.
This excerpt from the Xenofeminist Manifesto captures a kind of queerness: When the possibility of transition became real and known, the tomb under Nature’s shrine cracked, and new histories–bristling with futures–escaped the old order of ‘sex’. The disciplinary grid of gender is in no small part an attempt to mend that shattered foundation, and tame the lives that escaped it. The time has now come to tear down this shrine entirely, and not bow down before it in a piteous apology for what little autonomy has been won.  ↩︎ There are also aftermarket dynamic analysis tools that are widely used, such as Valgrind and sanitizers . The Rust compiler moves many of these dynamic analyses into the type system as well. Note that I’m using type system to describe the combined effects of both the type checker and the borrow checker here: in a nutshell, anything that gets in the way of you being able to run your code.  ↩︎ The borrow checker used to reject many valid programs because they weren’t written in precisely the right way. The non-lexical lifetimes project has addressed many of those issues, but others remain, and will continue to do so because of the halting problem . The Rust community’s reaction to gratuitous use of unsafe is, I think, best understood as a defense mechanism against ingroup trust violations. This is not a justification for terrible behavior.   ↩︎ Though sometimes they can, especially if it is possible to make invalid states unrepresentable . These techniques synergize well: you may model, say, a number between 0–127 as a wrapper around a u8 , allowing for the numbers 128–255 to be invalid but internally representable. But at a higher level, a struct that has a field of that wrapper type has made those numbers unrepresentable. The practical advantage is that the wrapper can be tested independently from any struct that uses it. What I’m trying to get at here is the general idea of reducing the surface area of sensitive code. I think type-level proofs are very good at achieving this.  ↩︎ For performance reasons, a production system may use a wrapper over a more efficient string representation instead. The point still holds.  ↩︎ Do nothing at all. The communication would purely be to humans, just like any other documentation. Insert dynamic checks that, in case of a failure, log those errors somewhere and proceed as normal. Insert dynamic checks that cause the program to fail at runtime. This is sometimes called a contract system . Check ahead of time , without running the program, that the function is always called with string inputs. Such tools are called type checkers and are examples of what are called static analysis tools.
While type theory successfully addresses Russell’s paradox and other issues, most practicing mathematicians choose to use axiomatic set theories like ZFC instead. Type theory has lived on in formal logic and computer science, and has been applied in various ways through type systems . There’s been some recent foundational mathematics work within the field of homotopy type theory .  ↩︎ There are all sorts of in-between things possible as well, such as using dynamic analysis in some parts of your program and static analysis in others. Gradual typing consists of a rich, widely used series of ideas in this space.  ↩︎ Within the greater LGBTQ+ community, queer is a bit special because it’s a label that signals a rejection of labels, and an opposition to the norms of straight and gay society. This excerpt from the Xenofeminist Manifesto captures a kind of queerness: When the possibility of transition became real and known, the tomb under Nature’s shrine cracked, and new histories–bristling with futures–escaped the old order of ‘sex’. The disciplinary grid of gender is in no small part an attempt to mend that shattered foundation, and tame the lives that escaped it. The time has now come to tear down this shrine entirely, and not bow down before it in a piteous apology for what little autonomy has been won.   ↩︎ There are also aftermarket dynamic analysis tools that are widely used, such as Valgrind and sanitizers . The Rust compiler moves many of these dynamic analyses into the type system as well. Note that I’m using type system to describe the combined effects of both the type checker and the borrow checker here: in a nutshell, anything that gets in the way of you being able to run your code.  ↩︎ The borrow checker used to reject many valid programs because they weren’t written in precisely the right way. The non-lexical lifetimes project has addressed many of those issues, but others remain, and will continue to do so because of the halting problem . The Rust community’s reaction to gratuitous use of unsafe is, I think, best understood as a defense mechanism against ingroup trust violations. This is not a justification for terrible behavior.   ↩︎ Though sometimes they can, especially if it is possible to make invalid states unrepresentable . These techniques synergize well: you may model, say, a number between 0–127 as a wrapper around a , allowing for the numbers 128–255 to be invalid but internally representable. But at a higher level, a struct that has a field has made those numbers unrepresentable. The practical advantage is that can be tested independently from any struct that uses it. What I’m trying to get at here is the general idea of reducing the surface area of sensitive code. I think type-level proofs are very good at achieving this.  ↩︎ For performance reasons, a production system may use a wrapper over a , or instead. The point still holds.  ↩︎

0 views
sunshowers 5 years ago

Thinking about dependencies

People tend to have lots of opinions about software dependencies. Here are mine. At a high level, virtually all code in existence relies on tools and libraries written by other people. Even projects that don’t use any library dependencies rely on tools written by other people: This is good. Depending on other people is how humans work. But it’s possible to go too far. Dependencies may have: I think there’s a balance to strike here. My heart favors dependencies, but my head recognizes that a lot of the concerns around them have merit. The oldest programming ecosystems, like those of C and C++, generally did not have any sort of dependency management tooling. Dependencies would generally be managed “out of band”, perhaps through a set of custom scripts or build instructions. This is still quite common on Linux with system-wide package managers like apt , and on Windows with DirectX runtimes . However, this model comes at a very steep cost: it is very hard to know exactly what code you are depending on. Version mismatches between libraries can lead to mysterious bugs; this model bakes in the assumption that software always gets better over time, but in the real world this is often not the case . The other option is to vendor them into the repository: copy and paste a particular version of a dependency, effectively forking it and taking control over it. This means you know exactly what code is included; however, it makes it much harder to stay up to date with bugfixes and security patches from upstream developers. There’s been endless debate about the relative merits of each approach, with Linux distributions taking firm stands , and plenty of flamewars over the decades 1 . It’s best to completely skip over it; neither is ideal, and we can do much better. The next generation of ecosystems, such as Python’s pip and Haskell’s Cabal , made dependency management a bit more explicit. They consisted of three things: This was a big improvement from before: instead of relying on custom scripts or forked versions of code checked into the repository, pip and Cabal provided a uniform way to handle them across projects. Unfortunately, these tools did not address the biggest benefit of vendoring: knowing the exact versions of dependencies in use. Requirements files usually specified ranges of versions, and what version ultimately got used depended on what was available in the package repository when the tool was run. Later tools like Ruby’s Bundler made the result of resolution explicit, writing out the results to a lockfile . For the first time, developers could have confidence that their package manager would behave deterministically across computers. One frequently-occuring issue these tools did not address is diamond dependencies . Let’s say that your code depends on two libraries, and . If both and depend on different versions of a library , pip and Cabal would not be able to handle this situation; they would simply blow up. Now there is nothing about this that is inherent to dependencies; it is quite possible for a system to be built such that multiple versions of a dependency can be used at the same time, and it just so happened that Python and Haskell weren’t able to. But the existence of this problem had two effects: The first major attempt to solve this problem was in the node.js package manager, npm. Through a combination of tooling and runtime support , npm could manage multiple versions of a dependency at the same time. Unshackled, the power of dependencies was truly unlocked. 
npm projects quickly started growing thousands of them 2 . The npm model had its own problems, though: Rust’s package manager, Cargo, incorporates most of the learnings of earlier systems while choosing to do a few things its own way 3 . A quick survey of its design decisions: Knowing the exact versions of dependencies in use. Cargo’s lockfiles record the exact versions of every third-party library in use, and go even further by recording hashes of their source code. These lockfiles should be checked into your repository. I disagree with Cargo’s recommendation 4 to not check in its lockfile for libraries— everyone should know exactly what third-party code is included, so everyone should check in lockfiles. (They should also keep those lockfiles up-to-date with a service like dependabot ). Combinatorial explosion of dependencies. Cargo pulls back on npm’s model a bit: it will unify dependencies up to semver compatibility: This reduces the combinatorial explosion of dependencies seen in systems like npm to a linear explosion. Much more reasonable. Incompatible data structures. Rust treats different versions of as completely different, even if the types in them have the same names and structures. The only way to pass around data from 1.1 to 2.0 is to do an explicit conversion. Rust’s static type system allows this detection to be done at compile-time. This model works in conjunction with the dependency unification described above. left-pad. The crates.io package repository guarantees that code that’s been uploaded to it is never deleted (though it may be yanked ). This is a basic guarantee that every internet-hosted package repository should provide. By addressing the technical issues that earlier systems had, Cargo foregrounds some very hard questions about human society. Depending on other people is scary . Across larger society we’ve built good models for being discerning about it, but we still need to refine them a bit for the online world. Here are some arguments that look technical at the surface, but are really about trusting other people. We don’t want our builds to ever hit the internet. This is a reasonable request, especially for entities that host their own source control. Cargo supports vendoring dependencies , storing them in the repository while still treating them as third-party code. Lockfile hashes ensure that third-party code isn’t accidentally changed. But in emergencies, we may have to make urgent bugfixes. Dependencies make that harder. I’ve had to deal with this myself—after all, in emergencies, being autonomous is important .. The maintainers may not be as responsive as your schedule needs 5 . As an alternative, Cargo allows workspaces to patch dependencies . It’s not just emergencies. Dependencies fail to work properly in edge cases, and our situation is an edge case. Rachel Kroll talks more about this. She’s right—lots of code works fine for most people, but has bugs when run in edge case environments like Facebook’s datacenters. If you’re one of the people it doesn’t work for, you have three options: All of these options have tradeoffs. Depending on the facts, any of them, or something in between, may be the right thing to do. For example, you may be able to rewrite some code from scratch while copying over tests from a dependency. Do whatever makes the most sense for the situation. We don’t know what dependencies are actually included in our code, we don’t have the time to review everything, and we can’t keep up with updates. 
Supply chain attacks are real and are becoming more common over time. In general, having fewer dependencies certainly helps manage supply chain attacks. But rewriting code may also introduce its own bugs and vulnerabilities. For large projects, it also makes sense to carve out a subset of your system as high-security, then have a higher bar for adding dependencies to it. This is a healthy way to channel concerns about dependencies into something that I think is quite constructive. I think this concern gets at the biggest unsolved technical problem with dependencies, which is that they aren’t legible enough. I think solving this is worthwhile, and cargo-guppy is my attempt to do so. It provides a query interface over a Cargo dependency graph, telling you precisely what dependencies are included in a particular build target. This can then be used to carve a large codebase into several subsets, for example: My coworker David Wong wrote cargo-dephell , which uses guppy along with several other technical and social signals and presents the information as a web page. My understanding from listening to the more serious arguments against dependencies is that: There’s a lot of reasonable positions to take here, and here’s how I think about some of them. We only trust code that we wrote ourselves. Very few entities are in the position of: But some entities certainly are, and if you’re one of them, go for it. OK, maybe not everything , but we only trust libraries written by ourselves. You’re still going to be relying on a compiler written by other people. Compilers are very complicated. They themselves depend on libraries written by other people, such as lexers, parsers, and code generation backends. Lots of people use the same compilers and source control systems. We’re comfortable making an exception for them. This is a reasonable stance. Compilers and source control form clear abstraction boundaries. Reflections on Trusting Trust is part of computer security folklore, but I’m quite confident that the latest version of the Rust compiler doesn’t have a Trojan horse in it. One way to be more careful is by using diverse double-compiling , or generally maintaining your own lineage of compiler binaries—I know of several entities that do this internally. But compilers also come with a standard library , such as Rust’s . This brings up a set of important governance arguments that are worth looking at. The standard library should include everything, eliminating the need for third-party dependencies. Some languages like Python follow this “batteries included” approach, but in practice it has proven to be quite problematic. Standard libraries are meant to be stable . Once they ship, most interfaces can never be changed, even if real-world usage exposes issues with them. Adding too many features to the standard library also stifles innovation, because why use a third-party dependency that’s only a little bit better when the standard library is right there? Experience has shown that Python’s batteries are leaking . Put another way, adding something to the standard library is an irreversible type 1 decision . Put yet another way, a stable system is a dead system . In contrast, Rust chooses to have a lean standard library, relying on Cargo to make it easy to pull in other bits of code. This lets developers experiment and come up with better APIs before stabilizing them. The most important ones ultimately make it into the standard library. I believe that this is the right overall direction. 
Too much choice is bad! It makes it harder for us to make decisions about which library to pick. This is a pretty strong argument for a batteries-included approach. There is a real tradeoff between innovation and standardization , with no easy answers. My personal opinion is that: The advance Cargo made over earlier systems was to bring both options to a level playing field 6 , so we can grapple with the complex social issues more directly. We’re willing to trust “third-party” dependencies that language maintainers have taken over ownership of. This is great! Bringing more code under unified governance would be wonderful for popular dependencies to the extent that language maintainers have the capacity to do so. But they only have a fixed amount of time, and maintaining dependencies may not always be the best use of their skills. This problem is more acute for languages without large amounts of corporate funding: The Go language team can afford to maintain a large number of not-quite-standard libraries under , but the Rust nursery is less resourced. If you’re willing to trust the maintainers of a language, are you also willing to extend that trust to some other people? Assuming that you are, what sorts of facts are worth looking at and how should one evaluate them? There aren’t easy answers here, but some questions that may help: How complex is the dependency? The harder the problem it solves, the more useful it is—the more likely that it captures the collective wisdom of an open source community. Has the project gotten the technical fundamentals right? How much automation does it use? How well-tested is it? Does it make its quality processes easy to understand? Does it only use superpowers like Rust’s to the extent that it needs to? How well-known is the project? How many contributors has the project had over time? Is the project used by other dependencies you trust? More popular code will have more eyes on it. How well-run is the project? Do the maintainers respond to bug reports promptly? What sorts of code review processes do they have? Do they have a code of conduct governing social interactions? What about other projects by the same people? Are the maintainers’ interests aligned with yours? Are they open to your contributions? How promptly do they accept bugfixes or new features? If you need to make substantial changes to a project that cause it to diverge from the maintainers’ vision, forking or rewriting it might be the right call. For example, David Tolnay 7 is a well-known figure in the Rust community and the author of many widely-used Rust libraries . I’m happy to extend the same level of trust to him that I do to the Rust language team. By addressing most of the technical issues involved in managing third-party dependencies, the Cargo package manager has brought a number of complex social problems to the fore. The implicit trust models present in earlier systems no longer work; evaluating third-party dependencies requires new models that combine technical and social signals. Thanks to David Wong , Brandon Williams , Rex Hoffman , Inanna Malick , and kt for reviewing drafts of this post. Any issues with this post are solely my fault. Some of which I took part in myself back in the day. All those old IRC logs are lost in time, I hope.  ↩︎ The npm community has a norm to split a library up into lots of reusable components. I think a few moderately-sized libraries are better than lots of small ones. 
I have many other thoughts about library size, coupling and API boundaries, but this margin is too narrow to contain them.  ↩︎ Cargo isn’t perfect, however, and its dependency model still has a few technical problems. The biggest ones I’ve noticed are a lack of distributed caching leading to long build times for large dependency chains, and some more spooky action at a distance through feature unification .  ↩︎ Added 2020-08-26: I think there are two justifiable positions here: Check in the lockfile, keep it up to date, and also update the version ranges specified in e.g. files. This effectively adds a “meta-spec” on top of what Cargo.toml specifies, and each release captures what the latest versions of each dependency were at that time. If you care most about developers and CI building and testing the same code as much as possible, and are fine with applying gentle pressure on downstream developers to keep all their dependencies up-to-date, you may prefer this approach. Don’t check in the lockfile, specify the oldest required version instead of the newest one, and rely on the community to “fuzz” versioning because different developers will have different local versions of dependencies. If you care about the fact that external developers are going to use a range of versions against your code in practice, and wish to provide maximum flexibility, you may like this way of doing things more. I personally prefer the first approach, but the other way is fine too. My only request is that if you do, please consider building tooling to test against the oldest versions you actually declared! Thanks to Henry de Valence for engaging with me on this.  ↩︎ Nor would it be reasonable to expect them to be, unless you have a support contract with them.  ↩︎ Well, mostly. The standard library is still special in some ways.  ↩︎ Disclosure: David and I currently work at the same place, and we occassionally collaborate.  ↩︎ compilers, interpreters and runtimes build systems version control bugs, especially in uncommon and untested cases, security vulnerabilities, or malicious code introduced into them somewhere along the way. a global, internet-hosted package repository , such as PyPI a package manager : a client that could automatically download packages from the repository a way for developers to declare what third-party packages they require from the repository, along with a range of versions for each package. it added a natural constraint to the size of a project’s dependency graph: the more dependencies you used, the more likely you were to run into this problem developers feared it, because it could happen as the result of any update to any dependency: there’s nothing that destroys confidence in a system like spooky action at a distance . It went too far in its quest to avoid diamond dependencies: even if and depended on the same version of , npm would store two separate versions of it. This would happen transitively, causing a combinatorial explosion in the number of dependencies used by a project. One could pass around data structures between incompatible versions of , relying on JavaScript’s dynamic typing to hope that it all worked out. Opinions on how well it works are mixed . There were also design flaws in how npm’s package repository was run, the most famous of which resulted in the left-pad incident . A single developer deleted their package from the node.js package repository, bringing many projects’ build systems grinding to a halt. 
This was really, really bad and should have never been possible. If one library depends on version 1.1 of a shared dependency and another depends on 1.2, Cargo will use a single version of that dependency that satisfies the requirements of both. If one depends on version 1.1 and the other on 2.0, Cargo will include two separate versions of it. Contribute improvements to the dependency, provided the maintainers are willing to accept your changes. We did this for many years with Mercurial , for example. Fork the dependency. You don’t have to work with the maintainers, but you do have to spend effort keeping up to date with bugfixes and security updates. Rewrite the code from scratch. You’ll have complete control over your code, at the cost of potentially ending up with a different set of bugs. A high-security subset as mentioned above, requiring the greatest degree of care; the full set of binaries that form the production system, forming a superset of the first; management and administration tools for the system; and developer tools, linters, and CI automation scripts. People generally trust themselves most of all; they trust the processes of programming languages a lot , and random third parties not so much (with possible exceptions for popular projects like parts of C++’s Boost ). Being able to write everything from scratch, including their compiler, build system and source control, and genuinely being able to trust internal talent over the collective wisdom gathered in an open source project. Python’s standard library is too big. Rust’s standard library is a bit too small, and some third-party code would be better off being in the standard library. It is better to err on the side of being too small than too big, because of the stability concerns above.

0 views