Posts in Rust (20 found)
Corrode 3 days ago

Canonical

What does it take to rewrite the foundational components of one of the world’s most popular Linux distributions? Ubuntu serves over 12 million daily desktop users alone, and the systems that power it, from sudo to core utilities, have been running for decades with what Jon Seager, VP of Engineering for Ubuntu at Canonical, calls “shaky underpinnings.” In this episode, we talk to Jon about the bold decision to “oxidize” Ubuntu’s foundation. We explore why they’re rewriting critical components like sudo in Rust, how they’re managing the immense risk of changing software that millions depend on daily, and what it means to modernize a 20-year-old operating system without breaking the internet. CodeCrafters helps you become proficient in Rust by building real-world, production-grade projects. Learn hands-on by creating your own shell, HTTP server, Redis, Kafka, Git, SQLite, or DNS service from scratch. Start for free today and enjoy 40% off any paid plan by using this link . Canonical is the company behind Ubuntu, one of the most widely-used Linux distributions in the world. From personal desktops to cloud infrastructure, Ubuntu powers millions of systems globally. Canonical’s mission is to make open source software available to people everywhere, and they’re now pioneering the adoption of Rust in foundational system components to improve security and reliability for the next generation of computing. Jon Seager is VP Engineering for Ubuntu at Canonical, where he oversees the Ubuntu Desktop, Server, and Foundations teams. Appointed to this role in January 2025, Jon is driving Ubuntu’s modernization strategy with a focus on Communication, Automation, Process, and Modernisation. His vision includes adopting memory-safe languages like Rust for critical infrastructure components. Before this role, Jon spent three years as VP Engineering building Juju and Canonical’s catalog of charms. He’s passionate about making Ubuntu ready for the next 20 years of computing. 
- Juju - Jon’s previous focus, a cloud orchestration tool
- GNU coreutils - The most widely used implementation of commands like ls, rm, cp, and more
- uutils coreutils - coreutils implementation in Rust
- sudo-rs - For your Rust-based sandwich needs
- LTS - Long Term Support, a release model popularized by Ubuntu
- coreutils-from-uutils - List of symbolic links used for coreutils on Ubuntu; some still point to the GNU implementation
- man: sudo -E - Example of a feature that sudo-rs does not support
- SIMD - Single instruction, multiple data
- rust-coreutils - The Ubuntu package, with all its supported CPU platforms listed
- fastcat - Matthias’ blog post about his faster version of cat
- systemd-run0 - Alternative approach to sudo from the systemd project
- AppArmor - The Linux Security Module used in Ubuntu
- PAM - Pluggable Authentication Modules, which handle all system authentication in Linux
- SSSD - Enables LDAP user profiles on Linux machines
- ntpd-rs - Time synchronization daemon written in Rust which may land in Ubuntu 26.04
- Trifecta Tech Foundation - Foundation supporting sudo-rs development
- Sequoia PGP - OpenPGP tools written in Rust
- Mir - Canonical’s Wayland compositor library; uses some Rust
- Anbox Cloud - Canonical’s Android streaming platform; includes Rust components
- Simon Fels - Original creator of Anbox and Anbox Cloud team lead at Canonical
- LXD - Container and VM hypervisor
- dqlite - SQLite with a replication layer for distributed use cases, potentially being rewritten in Rust
- Rust for Linux - Project to add Rust support to the Linux kernel
- Nova GPU Driver - New open-source Linux driver for NVIDIA GPUs, written in Rust
- Ubuntu Asahi - Community project for Ubuntu on Apple Silicon
- debian-devel: Hard Rust requirements from May onward - Parts of apt are being rewritten in Rust (announced a month after the recording of this episode)
- Go Standard Library - Provides things like network protocols, cryptographic algorithms, and even tools to handle image formats
- Python Standard Library - The origin of “batteries included”
- The Rust Standard Library - Basic types, collections, filesystem access, threads, processes, synchronisation, and not much more
- clap - Superstar library for CLI option parsing
- serde - Famous high-level serialization and deserialization interface crate
- Jon Seager’s Website
- Jon’s Blog: Engineering Ubuntu For The Next 20 Years
- Canonical Blog
- Ubuntu Blog
- Canonical Careers: Engineering - Apply your Rust skills in the Linux ecosystem


Notes on the WASM Basic C ABI

The WebAssembly/tool-conventions repository contains "Conventions supporting interoperability between tools working with WebAssembly". Of special interest, it contains the Basic C ABI - an ABI for representing C programs in WASM. This ABI is followed by compilers like Clang with the wasm32 target. Rust is also switching to this ABI for extern "C" code. This post contains some notes on this ABI, with annotated code samples and diagrams to help visualize what the emitted WASM code is doing. Hereafter, "the ABI" refers to this Basic C ABI. In these notes, annotated WASM snippets often contain descriptions of the state of the WASM value stack at a given point in time. Unless otherwise specified, "TOS" refers to "Top Of value Stack", and the notation [ x  y ] means the stack has y on top, with x right under it (and possibly some other stuff that's not relevant to the discussion under x ); in this notation, the stack grows "to the right". The WASM value stack has no linear memory representation and cannot be addressed, so it's meaningless to discuss whether the stack grows towards lower or higher addresses. The value stack is simply an abstract stack, where values can be pushed onto or popped off its "top". Whenever addressing is required, the ABI specifies explicitly managing a separate stack in linear memory. This stack is very similar to how stacks are managed in hardware assembly languages (except that in the ABI this stack pointer is held in a global variable, and is not a special register), and it's called the "linear stack". By "scalar" I mean basic C types like int , double or char . For these, using the WASM value stack is sufficient, since WASM functions can accept an arbitrary number of scalar parameters. This C function: Will be compiled into something like: And can be called by pushing three values onto the stack and invoking call $add_three . 
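The exact C snippet is not reproduced in this excerpt. A rough sketch of such an add_three function, written here in Rust (which, as the post notes, lowers extern "C" functions to the same Basic C ABI on wasm32; the name and body are assumed from the text):

```rust
// Sketch of the post's add_three example in Rust. Three scalar parameters
// fit directly into WASM value-stack parameters; no linear stack is needed.
pub extern "C" fn add_three(a: i32, b: i32, c: i32) -> i32 {
    // On wasm32 the body lowers to roughly:
    //   local.get 0, local.get 1, i32.add, local.get 2, i32.add
    a + b + c
}

fn main() {
    // A call pushes three values onto the value stack, then `call $add_three`.
    println!("{}", add_three(1, 2, 3)); // prints 6
}
```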
The ABI specifies that all integral types 32-bit and smaller will be passed as i32 , with the smaller types appropriately sign or zero extended. For example, consider this C function: It's compiled to almost the same code as add_three : Except for the last i32.extend8_s , which takes the lowest 8 bits of the value on TOS and sign-extends them to the full i32 (effectively ignoring all the higher bits). Similarly, when $add_three_chars is called, each of its parameters goes through i32.extend8_s . There are additional oddities that we won't go deep into, like passing __int128 values via two i64 parameters. C pointers are just scalars, but it's still educational to review how they are handled in the ABI. Pointers to any type are passed in i32 values; the compiler knows they are pointers, though, and emits the appropriate instructions. For example: Is compiled to: Recall that in WASM, there's no difference between an i32 representing an address in linear memory and an i32 representing just a number. i32.store expects [ addr  value ] on TOS, and does *addr = value . Note that the x parameter isn't needed any longer after the sum is computed, so it's reused later on to hold the return value. WASM parameters are treated just like other locals (as in C). According to the ABI, while scalars and single-element structs or unions are passed to a callee via WASM function parameters (as shown above), for larger aggregates the compiler utilizes linear memory. Specifically, each function gets a "frame" in a region of linear memory allocated for the linear stack. This region grows downwards from high to low addresses [1] , and the global $__stack_pointer points at the bottom of the frame: Consider this code: When do_work is compiled to WASM, prior to calling pair_calculate it copies pp into a location in linear memory, and passes the address of this location to pair_calculate . This location is on the linear stack, which is maintained using the $__stack_pointer global. 
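The pointer example above isn't reproduced in this excerpt; a hypothetical Rust analogue of its shape (the function name and signature are my own illustration):

```rust
// Hypothetical analogue of the post's pointer example: the pointer travels
// as a plain i32 address into linear memory, and the write is an i32.store.
pub unsafe extern "C" fn store_sum(x: i32, y: i32, out: *mut i32) {
    // On wasm32 the body lowers to roughly:
    //   local.get 2      ;; [ addr ]
    //   local.get 0
    //   local.get 1
    //   i32.add          ;; [ addr  value ]
    //   i32.store        ;; *addr = value
    *out = x + y;
}

fn main() {
    let mut result = 0;
    unsafe { store_sum(20, 22, &mut result) };
    println!("{result}"); // prints 42
}
```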
Here's the compiled WASM for do_work (I also gave its local variable a meaningful name, for readability): Some notes about this code: Before pair_calculate is called, the linear stack looks like this: Following the ABI, the code emitted for pair_calculate takes Pair* (by reference, instead of by value as the original C code): Each function that needs linear stack space is responsible for adjusting the stack pointer and restoring it to its original place at the end. This naturally enables nested function calls; suppose we have some function a calling function b which, in turn, calls function c , and let's assume all of these need to allocate space on the linear stack. This is how the linear stack looks after c 's prologue: Since each function knows how much stack space it has allocated, it's able to properly restore $__stack_pointer to the bottom of its caller's frame before returning. What about returning values of aggregate types? According to the ABI, these are also handled indirectly; a pointer parameter is prepended to the parameter list of the function. The function writes its return value into this address. The following function: Is compiled to: Here's a function that calls it: And the corresponding WASM: Note that this function only uses 8 bytes of its stack frame, but allocates 16; this is because the ABI dictates 16-byte alignment for the stack pointer. There are some advanced topics mentioned in the ABI that these notes don't cover (at least for now), but I'll mention them here for completeness: This is similar to x86 . For the WASM C ABI, a good reason is provided for the direction: WASM load and store instructions have an unsigned constant called offset that can be used to add a positive offset to the address parameter without extra instructions. Since $__stack_pointer points to the lowest address in the frame, these offsets can be used to efficiently access any value on the stack. 
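The Pair code itself isn't shown in this excerpt; a hypothetical Rust rendering of the aggregate-passing rules it describes (field and function names are my own):

```rust
// A two-field #[repr(C)] struct is not a single-element aggregate, so on
// wasm32 it is passed indirectly: the caller copies it into its linear-stack
// frame and passes the address. A struct return value is written through a
// hidden out-pointer prepended to the parameter list.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct Pair {
    pub a: i32,
    pub b: i32,
}

pub extern "C" fn pair_swap(p: Pair) -> Pair {
    Pair { a: p.b, b: p.a }
}

fn main() {
    let pp = Pair { a: 1, b: 2 };
    // ABI-wise this call becomes roughly pair_swap(&out, &copy_of_pp).
    let q = pair_swap(pp);
    println!("{:?}", q); // prints Pair { a: 2, b: 1 }
}
```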
There are two instances of the pair pp in linear memory prior to the call to pair_calculate : the original one from the initialization statement (at offset 8), and a copy created for passing into pair_calculate (at offset 0). Theoretically, as pp is unused after the call, the compiler could do better here and keep only a single copy. The stack pointer is decremented by 16, and restored at the end of the function. The first few instructions - where the stack pointer is adjusted - are usually called the prologue of the function. In the same vein, the last few instructions where the stack pointer is reset back to where it was at the entry are called the epilogue . "Red zone" - leaf functions have access to 128 bytes of red zone below the stack pointer. I found this difficult to observe in practice [2] . Since we don't issue system calls directly in WASM, it's tricky to conjure a realistic leaf function that requires the linear stack (instead of just using WASM locals). A separate frame pointer (global value) to be used for functions that require dynamic stack allocation (such as using C's VLAs ). A separate base pointer to be used for functions that require alignment > 16 bytes on the stack.

baby steps 1 week ago

Move Expressions

This post explores another proposal in the space of ergonomic ref-counting that I am calling move expressions . To my mind, these are an alternative to explicit capture clauses , one that addresses many (but not all ) of the goals from that design with improved ergonomics and readability. The idea itself is simple: within a closure (or future), we add the option to write . This is a value expression (“rvalue”) that desugars into a temporary value that is moved into the closure. So is roughly equivalent to something like: Let’s go back to one of our running examples, the “Cloudflare example”, which originated in this excellent blog post by the Dioxus folks . As a reminder, this is how the code looks today – note the lines for dealing with captures: Under this proposal it would look something like this: There are times when you would want multiple clones. For example, if you want to move something into a closure that will then give away a copy on each call, it might look like this. This idea is not mine. It’s been floated a number of times. The first time I remember hearing it was at the RustConf Unconf, but I feel like it’s come up before that. Most recently it was proposed by Zachary Harrold on Zulip , who has also created a prototype called soupa . Zachary’s proposal, like earlier proposals I’ve heard, used the keyword. Later on @simulacrum proposed using , which to me is a major improvement, and that’s the version I ran with here. The reason that I love the variant of this proposal is that it makes closures more “continuous” and exposes their underlying model a bit more clearly. With this design, I would start by explaining closures with move expressions and just teach closures at the end, as a convenient default: A Rust closure captures the places you use in the “minimal way that it can” – so will capture a shared reference to the , will capture a mutable reference, and will take ownership of the vector. 
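For reference, here is how the capture rules described above play out in current Rust. The proposed move-expression syntax itself does not compile today, so the last closure shows the clone-then-move workaround it aims to replace:

```rust
fn main() {
    let name = String::from("world");
    let mut count = 0;
    let v = vec![1, 2, 3];

    let print = || println!("{name}"); // captures `name` by shared reference
    print();

    let mut bump = || count += 1; // captures `count` by mutable reference
    bump();

    let consume = move || v.len(); // `move` takes ownership of `v`
    println!("{} {}", count, consume());

    // Today's workaround for "clone just this one capture", which a
    // move expression would express directly inside the closure:
    let name2 = name.clone();
    let task = move || format!("hello, {name2}");
    println!("{}", task()); // prints "hello, world"
    println!("{name}"); // the original is still usable here
}
```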
You can use expressions to control exactly what is captured: so will move the into the closure. A common pattern when you want to be fully explicit is to list all captures at the top of the closure, like so: As a shorthand, you can write at the top of the closure, which will change the default so that closures take ownership of every captured variable. You can still mix-and-match with expressions to get more control. So the previous closure might be written more concisely like so: It’s a bit ironic that I like this, because it’s doubling down on part of Rust’s design that I was recently complaining about. In my earlier post on Explicit Capture Clauses I wrote that: To be honest, I don’t like the choice of because it’s so operational . I think if I could go back, I would try to refashion our closures around two concepts. I think this would help to build up the intuition of “use if you are going to return the closure from the current stack frame and use otherwise”. expressions are, I think, moving in the opposite direction. Rather than talking about attached and detached, they bring us to a more unified notion of closures, one where you don’t have “ref closures” and “move closures” – you just have closures that sometimes capture moves, and a “move” closure is just a shorthand for using expressions everywhere. This is in fact how closures work in the compiler under the hood, and I think it’s quite elegant. One question is whether a expression should be a prefix or a postfix operator. So e.g. instead of . My feeling is that it’s not a good fit for a postfix operator because it doesn’t just take the final value of the expression and do something with it, it actually impacts when the entire expression is evaluated. Consider this example: When does get called? If you think about it, it has to be closure creation time, but it’s not very “obvious”. We reached a similar conclusion when we were considering operators. 
I think there is a rule of thumb that things which delineate a “scope” of code ought to be prefix – though I suspect might actually be nice, and not just . Edit: I added this section after-the-fact in response to questions. I’m going to wrap up this post here. To be honest, what this design really has going for it, above anything else, is its simplicity and the way it generalizes Rust’s existing design . I love that. To me, it joins the set of “yep, we should clearly do that” pieces in this puzzle: These both seem like solid steps forward. I am not yet persuaded that they get us all the way to the goal that I articulated in an earlier post : “low-level enough for a Kernel, usable enough for a GUI” but they are moving in the right direction. Attached closures (what we now call ) would always be tied to the enclosing stack frame. They’d always have a lifetime even if they don’t capture anything. Detached closures (what we now call ) would capture by-value, like today. Add a trait (I’ve gone back to preferring the name 😁) Add expressions

Lambda Land 1 week ago

Typst for Your Code Blocks

I started using Typst about a month ago to write my dissertation proposal. I had seen Typst before and decided to keep an eye on it as it matured. While it still is very much in development, it is mature enough that I was able to rewrite my dissertation proposal from an org-mode → LaTeX pipeline to pure Typst in about an hour with no major hiccups. In fact, most things got simpler as a consequence of using Typst. Typst is a typesetting system written in Rust designed to be a replacement for LaTeX . LaTeX is the de-facto standard for typesetting technical documents thanks to its unsurpassed support for rendering mathematical formulae and its attention to excellent typography . Both LaTeX and Typst operate by transforming a markup language into an output format like PDF. I am working on a presentation to give as part of my oral defense of my dissertation proposal. ( Note: I am not defending my dissertation yet—first I have to justify my plan of research to my PhD committee.) I found a way to use Typst to get gorgeous source code blocks at minimal cost. I like having good syntax highlighting in my technical presentations, but getting properly highlighted code was either shoddy or labor-intensive. The tradeoff is: I will still be using the highlight-each-word technique when I need to show some code and simulate editing it; the “Magic Move” transition in Keynote makes these kinds of code-editing demos easy to build and easy for the audience to follow. However, the majority of the time I’m just displaying code on the screen. I built a Typst template and associated theme file for code blocks. Now, if I have some code I want to put on a slide, I write a Typst file like the following and put it into e.g. : Then I run and I get a PDF file with a transparent background that looks like this: (That’s obviously a PNG file so that it displays nicely here on the web. The real output of that command is a PDF file.) 
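The template file itself isn't shown in this excerpt; a minimal sketch of such a Typst file (file names and the embedded snippet are my own illustrative choices) might look like:

````typst
// Render a single highlighted code block on an auto-sized page with a
// transparent background, ready to drop onto a slide.
#set page(width: auto, height: auto, margin: 8pt, fill: none)

```rust
fn main() {
    println!("Hello from a slide!");
}
```
````

Compiling with something like `typst compile block.typ block.pdf` then yields the transparent-background PDF described above.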
I can take that PDF file with a transparent background and drop it straight into my Keynote presentation. Typst takes care of all the syntax highlighting and it’s been good enough for my needs. Typst is still pretty new software. It has some rough edges and I will not be asking conferences to support Typst for their submissions until all those corners have been smoothed out. However, I am hopeful for Typst’s future, and anywhere where I can get away with just submitting a PDF without the source, I will be using Typst. The things that Typst does better than LaTeX right now:

- Typst has good typography and bibliography support. It can work with BibLaTeX files, so you can start using Typst without having to rewrite your whole bibliography. Citation syntax is simple and easy to figure out. Typst still has a bit of a way to go before it does everything that the venerable LaTeX Microtype package does, but it’s making progress in this area.
- Typst is free and open-source; you can contribute on their GitHub repository. It is written in Rust and the code seems to be well-organized. They have a hosted collaboration platform that is proprietary; you can subscribe to this, and the funds spent here go towards paying a few full-time developers to work on both the closed-source collaboration platform and improving the open-source compiler. I think this is a neat model and I hope it lets Typst get off the ground and get the adoption it will need to survive and (hopefully!) supplant LaTeX as the typesetting system of choice for technical audiences.
- Incredibly friendly syntax and rendering model. I went from not knowing anything about Typst to reproducing my résumé perfectly in an hour. I even made use of fancy things like functions.
- Excellent documentation. Did I mention how quickly I learned how to use Typst? It is easy to find the thing you want to customize.
- Instantaneous build times. Anyone who works with LaTeX will be familiar with 20+ second build times. Typst is so fast that it can live-rerender documents multiple times a second.

Schneems 1 week ago

Disallow code usage with a custom `clippy.toml`

I recently discovered that adding a file to the root of a Rust project gives the ability to disallow a method or a type when running . This has been really useful. I want to share two quick ways that I’ve used it: Enhancing calls via and protecting CWD thread safety in tests. Update: you can also use this technique to disallow unwrap() ! There’s also which you use by adding to your . I use the fs_err crate in my projects, which provides the same filesystem API as but with one crucial difference: error messages it produces have the name of the file you’re trying to modify. Recently, while I was skimming the issues, someone mentioned using clippy.toml to deny usage . I thought the idea was neat, so I tried it in my projects, and it worked like a charm. With this in the file: Someone running will get an error: Running will now automatically update the code. Neat! Why was I skimming issues in the first place? I suggested adding a feature to allow enhancing errors with debugging information , so instead of: The message could contain a lot more info: To implement that functionality, I wrote path_facts , a library that provides facts about your filesystem (for debugging purposes). And since the core value of the library is around producing good-looking output, I wanted snapshot tests that covered all my main branches. This includes content from both relative and absolute paths. A naive implementation might look like this: In the above code, the test changes the current working directory to a temp dir where it is then free to make modifications on disk. But, since Rust uses a multi-threaded test runner and affects the whole process, this approach is not safe ☠️. There are a lot of different ways to approach the fix, like using cargo-nextest , which executes all tests in their own process (where changing the CWD is safe). Though this doesn’t prevent someone from running accidentally. 
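The clippy.toml contents aren't reproduced in this excerpt. A sketch of what configurations like the two described here might look like (the exact paths and reason strings are my guesses, not the post's):

```toml
# clippy.toml at the project root

disallowed-methods = [
    # Nudge everyone toward fs_err, whose errors include the file name:
    { path = "std::fs::read_to_string", reason = "use fs_err::read_to_string instead" },
    { path = "std::fs::write", reason = "use fs_err::write instead" },
    # Changing the CWD affects the whole (multi-threaded) test process:
    { path = "std::env::set_current_dir", reason = "use a guarded test helper instead" },
]
```

With this in place, `cargo clippy` flags each offending call site via the `clippy::disallowed_methods` lint.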
There are other crates that use macros to force non-concurrent test execution, but they require you to remember to tag the appropriate tests . I wanted something lightweight that was hard to mess up, so I turned to to fail if anyone used for any reason: Then I wrote a custom type that used a mutex to guarantee that only one test body was executing at a time: You might call my end solution hacky (this hedge statement brought to you by too many years of being ONLINE), but it prevents anyone (including future-me) from writing an accidentally thread-unsafe test: Those are only two quick examples showing how to use clippy.toml to enhance a common API, and how to safeguard against incorrect usage. There’s plenty more you can do with that file, including: You wouldn’t want to use this technique of annotating your project with if the thing you’re trying to prevent would be actively malicious for the system if it executes, since rules won’t block your . You’ll also need to make sure to run in your CI so some usage doesn’t accidentally slip through. And that clippy lint work has paid off, my latest PR to was merged and deployed in version , and you can use it to speed up your development debugging by turning on the feature: Clip cautiously, my friends.

Evan Hahn 1 week ago

Experiment: making TypeScript immutable-by-default

I like programming languages where variables are immutable by default. For example, in Rust, let declares an immutable variable and let mut declares a mutable one. I’ve long wanted this in other languages, like TypeScript, which is mutable by default—the opposite of what I want! I wondered: is it possible to make TypeScript values immutable by default? My goal was to do this purely with TypeScript, without changing TypeScript itself. That meant no lint rules or other tools. I chose this because I wanted this solution to be as “pure” as possible…and it also sounded more fun. I spent an evening trying to do this. I failed but made progress! I made arrays and s immutable by default, but I couldn’t get it working for regular objects. If you figure out how to do this completely, please contact me —I must know! TypeScript has built-in type definitions for JavaScript APIs like and and . If you’ve ever changed the or options in your TSConfig, you’ve tweaked which of these definitions are included. For example, you might add the “ES2024” library if you’re targeting a newer runtime. My goal was to swap the built-in libraries with an immutable-by-default replacement. The first step was to stop using any of the built-in libraries. I set the flag in my TSConfig, like this: Then I wrote a very simple script and put it in : When I ran , it gave a bunch of errors: Progress! I had successfully obliterated any default TypeScript libraries, which I could tell because it couldn’t find core types like or . Time to write the replacement. This project was a prototype. Therefore, I started with a minimal solution that would type-check. I didn’t need it to be good! I created and put the following inside: Now, when I ran , I got no errors! I’d defined all the built-in types that TypeScript needs, and a dummy object. As you can see, this solution is impractical for production. For one, none of these interfaces have any properties! isn’t defined, for example. 
That’s okay because this is only a prototype. A production-ready version would need to define all of those things—tedious, but should be straightforward. I decided to tackle this with a test-driven development style. I’d write some code that I want to type-check, watch it fail to type-check, then fix it. I updated to contain the following: This tests three things: When I ran , I saw two errors: So I updated the type in with the following: The property accessor—the line—tells TypeScript that you can access array properties by numeric index, but they’re read-only. That should make possible but impossible. The method definition is copied from the TypeScript source code with no changes (other than some auto-formatting). That should make it possible to call . Notice that I did not define . We shouldn’t be calling that on an immutable array! I ran again and…success! No errors! We now have immutable arrays! At this stage, I’ve shown that it’s possible to configure TypeScript to make all arrays immutable with no extra annotations . No need for or ! In other words, we have some immutability by default. This code, like everything in this post, is simplistic. There are lots of other array methods , like and and ! If this were made production-ready, I’d make sure to define all the read-only array methods . But for now, I was ready to move on to mutable arrays. I prefer immutability, but I want to be able to define a mutable array sometimes. So I made another test case: Notice that this requires a little extra work to make the array mutable. In other words, it’s not the default. TypeScript complained that it can’t find , so I defined it: And again, type-checks passed! Now, I had mutable and immutable arrays, with immutability as the default. Again, this is simplistic, but good enough for this proof-of-concept! This was exciting to me. It was possible to configure TypeScript to be immutable by default, for arrays at least. 
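For illustration, here is a declaration-only sketch of what such replacement lib definitions might look like (nothing here is executable, and the exact signatures from the post aren't reproduced; the names are my own):

```typescript
// Immutable-by-default Array: indexed reads are allowed, writes are
// rejected, and no mutating methods (push, pop, splice) are declared.
interface Array<T> {
  readonly length: number;
  readonly [index: number]: T;
  map<U>(callbackfn: (value: T, index: number) => U): U[];
}

// Opt-in mutable variant for the rare cases that need it.
interface MutableArray<T> {
  length: number;
  [index: number]: T;
  push(...items: T[]): number;
}
```

In a noLib-style setup these interfaces replace the built-in definitions entirely, so mutation simply fails to type-check unless you reach for the mutable variant.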
I didn’t have to fork the language or use any other tools. Could I make more things immutable? I wanted to see if I could go beyond arrays. My next target was the type, which is a TypeScript utility type . So I defined another pair of test cases similar to the ones I made for arrays: TypeScript complained that it couldn’t find or . It also complained about an unused , which meant that mutation was allowed. I rolled up my sleeves and fixed those errors like this: Now, we have , which is an immutable key-value pair, and the mutable version too. Just like arrays! You can imagine extending this idea to other built-in types, like and . I think it’d be pretty easy to do this the same way I did arrays and records. I’ll leave that as an exercise to the reader. My final test was to make regular objects (not records or arrays) immutable. Unfortunately for me, I could not figure this out. Here’s the test case I wrote: This stumped me. No matter what I did, I could not write a type that would disallow this mutation. I tried modifying the type every way I could think of, but came up short! There are ways to annotate to make it immutable, but that’s not in the spirit of my goal. I want it to be immutable by default! Alas, this is where I gave up. I wanted to make TypeScript immutable by default. I was able to do this with arrays, s, and other types like and . Unfortunately, I couldn’t make it work for plain object definitions like . There’s probably a way to enforce this with lint rules, either by disallowing mutation operations or by requiring annotations everywhere. I’d like to see what that looks like. If you figure out how to make TypeScript immutable by default with no other tools , I would love to know, and I’ll update my post. I hope my failed attempt will lead someone else to something successful. Again, please contact me if you figure this out, or have any other thoughts. Creating arrays with array literals is possible. Non-mutating operations, like and , are allowed. 
Operations that mutate the array, like , are disallowed. is allowed. There’s an unused there. doesn’t exist.

Pat Shaughnessy 1 week ago

Compiling Ruby To Machine Language

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished. Here’s an excerpt from the completely new content for Chapter 4, about YJIT and ZJIT. I’m still finishing this up… so this content is fresh off the page! It’s been a lot of fun for me to learn about how JIT compilers work and to brush up on my Rust skills as well. And it’s very exciting to see all the impressive work the Ruby team at Shopify and other contributors have done to improve Ruby’s runtime performance. To find hot spots, YJIT counts how many times your program calls each function or block. When this count reaches a certain threshold, YJIT stops your program and converts that section of code into machine language. Later, Ruby will execute the machine language version instead of the original YARV instructions. To keep track of these counts, YJIT saves an internal counter near the YARV instruction sequence for each function or block. Figure 4-5 shows the YARV instruction sequence the main Ruby compiler created for the sum += i block at (3) in Listing 4-1. At the top, above the YARV instructions, Figure 4-5 shows two YJIT-related values: jit_entry and jit_entry_calls . As we’ll see in a moment, jit_entry starts as a null value but will later hold a pointer to the machine language instructions YJIT produces for this Ruby block. Below jit_entry , Figure 4-5 also shows jit_entry_calls , YJIT’s internal counter. Each time the program in Listing 4-1 calls this block, YJIT increments the value of jit_entry_calls . Since the range at (1) in Listing 4-1 spans from 1 through 40, this counter will start at zero and increase by 1 each time Range#each calls the block at (3). When jit_entry_calls reaches a particular threshold, YJIT will compile the YARV instructions into machine language. 
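Listing 4-1 itself isn't reproduced in this excerpt; from the details above (a range from 1 through 40 whose block increments sum), it looks something like:

```ruby
# Sketch reconstructed from the text of Listing 4-1.
sum = 0
(1..40).each do |i|  # (1) the range; Range#each calls the block below
  sum += i           # (3) this block becomes "hot" once it has run enough times
end
puts sum             # prints 820
```

Running it as `ruby --yjit sum.rb` enables YJIT; the point at which compilation kicks in can be tuned with the threshold flag described below.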
By default, for small Ruby programs, YJIT in Ruby 3.5 uses a threshold of 30. Larger programs, like Ruby on Rails web applications, will use a larger threshold value of 120. (You can also change the threshold by passing --yjit-call-threshold when you run your Ruby program.) While compiling your Ruby program, YJIT saves the machine language instructions it creates into YJIT blocks . YJIT blocks, which are distinct from Ruby blocks, each contain a sequence of machine language instructions for a range of corresponding YARV instructions. By grouping YARV instructions and compiling each group into a YJIT block, YJIT can produce more optimized code that is tailored to your program’s behavior and avoid compiling code that your program doesn’t need. As we’ll see next, a single YJIT block doesn’t correspond to a Ruby function or block. YJIT blocks instead represent smaller sections of code: individual YARV instructions or a small range of YARV instructions. Each Ruby function or block typically consists of several YJIT blocks. Let’s see how this works for our example. After the program in Listing 4-1 executes the Ruby block at (3) 29 times, YJIT will increment the jit_entry_calls counter again, just before Ruby runs the block for the 30th time. Since jit_entry_calls reaches the threshold value of 30, YJIT triggers the compilation process. YJIT compiles the first YARV instruction getlocal_WC_1 and saves machine language instructions that perform the same work as getlocal_WC_1 into a new YJIT block: On the left side, Figure 4-6 shows the YARV instructions for the sum += i Ruby block. On the right, Figure 4-6 shows the new YJIT block corresponding to getlocal_WC_1 . Next, the YJIT compiler continues and compiles the second YARV instruction from the left side of Figure 4-7: getlocal_WC_0 at index 2. On the left side, Figure 4-7 shows the same YARV instructions for the sum += i Ruby block that we saw above in Figure 4-6. 
But now the two dotted arrows indicate that the YJIT block on the right contains the machine language instructions equivalent to both getlocal_WC_1 and getlocal_WC_0. Let's take a look inside this new block. YJIT compiles, or translates, the Ruby YARV instructions into machine language instructions. In this example, running on my Mac laptop, YJIT writes the following machine language instructions into this new block: Figure 4-8 shows a closer view of the new YJIT block that appeared on the right side of Figures 4-6 and 4-7. Inside the block, Figure 4-8 shows the assembly language mnemonics corresponding to the ARM64 machine language instructions that YJIT generated for the two YARV instructions shown on the left. The YARV instructions on the left are: getlocal_WC_1, which loads a value from a local variable located in the previous stack frame and saves it on the YARV stack, and getlocal_WC_0, which loads a local variable from the current stack frame and also saves it on the YARV stack. The machine language instructions on the right side of Figure 4-8 perform the same task, loading these values into registers on my M1 microprocessor: x1 and x9. If you're curious and would like to learn more about what the machine language instructions mean and how they work, the section "Adding Two Integers Using Machine Language" discusses the instructions for this example in more detail. Next, YJIT continues down the sequence of YARV instructions and compiles the opt_plus YARV instruction at index 4 in Figures 4-6 and 4-7. But this time, YJIT runs into a problem: it doesn't know the type of the addition arguments. That is, will opt_plus add two integers? Or two strings, floating point numbers, or some other types? Machine language is very specific. To add two 64-bit integers on an M1 microprocessor, YJIT could use the adds assembly language instruction. But adding two floating point numbers would require different instructions.
And, of course, adding or concatenating two strings is an entirely different operation. In order for YJIT to know which machine language instructions to save into the YJIT block for opt_plus, YJIT needs to know exactly what type of values the Ruby program might ever add at (3) in Listing 4-1. You and I can tell by reading Listing 4-1 that the Ruby code is adding integers. We know right away that the sum += i block at (3) is always adding one integer to another. But YJIT doesn't know this. YJIT uses a clever trick to solve this problem. Instead of analyzing the entire program ahead of time to determine all of the possible types of values the opt_plus YARV instruction might ever need to add, YJIT simply waits until the block runs and observes which types the program actually passes in. YJIT uses branch stubs to achieve this wait-and-see compile behavior, as shown in Figure 4-9. Figure 4-9 shows the YARV instructions on the left, and the YJIT block for indexes 0000-0002 on the right. But note the bottom right corner of Figure 4-9, which shows an arrow pointing down from the block to a box labeled stub. This arrow represents a YJIT branch. Since this new branch doesn't point to a block yet, YJIT sets up the branch to point to a branch stub instead.
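The wait-and-see idea can be sketched like this — an illustrative Rust toy, not YJIT's real data structures: a branch starts as a stub, and code specialized to the observed operand types is generated only on first execution:

```rust
// Illustrative toy, not YJIT's real implementation: compilation of `a + b`
// is deferred until the first execution reveals the operand types.

#[derive(Clone, Copy)]
enum Value {
    Int(i64),
    Flt(f64),
}

enum Branch {
    Stub,                                // no code generated yet
    Compiled(fn(Value, Value) -> Value), // specialized for observed types
}

// Pick "machine code" based on the types actually observed at runtime.
fn compile_for(a: Value, b: Value) -> fn(Value, Value) -> Value {
    match (a, b) {
        (Value::Int(_), Value::Int(_)) => |x, y| match (x, y) {
            // On ARM64 this would be something like an `adds` instruction.
            (Value::Int(x), Value::Int(y)) => Value::Int(x + y),
            _ => unreachable!("type guard failed: would side-exit to the interpreter"),
        },
        _ => |x, y| match (x, y) {
            (Value::Flt(x), Value::Flt(y)) => Value::Flt(x + y),
            _ => unreachable!("type guard failed: would side-exit to the interpreter"),
        },
    }
}

fn opt_plus(branch: &mut Branch, a: Value, b: Value) -> Value {
    if let Branch::Stub = branch {
        // First execution: the operand types are now known, so compile.
        *branch = Branch::Compiled(compile_for(a, b));
    }
    match branch {
        Branch::Compiled(f) => f(a, b),
        Branch::Stub => unreachable!(),
    }
}

fn main() {
    let mut branch = Branch::Stub;
    let sum = opt_plus(&mut branch, Value::Int(2), Value::Int(3));
    assert!(matches!(sum, Value::Int(5)));
    assert!(matches!(branch, Branch::Compiled(_))); // stub replaced by code
}
```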

(think) 2 weeks ago

Rust: Embrace Captured Identifier in Format Strings

While playing with Rust recently I've noticed that most Rust tutorials suggest writing code like this: Clippy (Rust's mighty linter), however, doesn't like it and suggests doing this instead: I hope you'll agree that this looks cleaner. It certainly seems that way to me. At this point you might be wondering what's the point of this article. Well, I thought I had come across some recent Rust feature and this was the reason why it wasn't adopted much in the wild. Turns out, however, that this was introduced almost 4 years ago in Rust 1.58. I think there are two reasons why the new style hasn't taken off:

- There are many outdated tutorials out there
- LLMs were trained on large samples of code using the old style and they naturally favor it

If you ask something like ChatGPT how to print variables in Rust you'll get mostly answers like: LLMs are great, but one certainly has to be aware of their limitations. You also have to be aware of the limitations of the updated syntax - most importantly it works only for variable names, as opposed to arbitrary Rust expressions: Perhaps future versions of Rust will address this limitation. Time will tell. So, that's all from me. Let's hope some LLMs will pick up on it and suggest to more people the "new" and improved syntax for captured identifiers in format strings. Keep hacking!
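For concreteness, here is the difference in question, including the expression limitation (variable names are mine):

```rust
fn main() {
    let name = "Ferris";

    // The old style most tutorials (and LLMs) still suggest:
    let old = format!("Hello, {}!", name);

    // The captured-identifier style, available since Rust 1.58,
    // which Clippy's `uninlined_format_args` lint prefers:
    let new = format!("Hello, {name}!");
    assert_eq!(old, new);

    // Limitation: only plain identifiers can be captured, not expressions.
    // `format!("{name.to_uppercase()}")` does not compile; expressions
    // still need a positional or named argument:
    let upper = format!("Hello, {}!", name.to_uppercase());
    assert_eq!(upper, "Hello, FERRIS!");

    println!("{new}");
}
```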

Kaushik Gopal 2 weeks ago

Firefox + uBO is still better than Brave, Edge or any Chromium-based solution

I often find myself replying to claims that Brave, Edge, or other Chromium browsers effectively achieve the same privacy standards as Firefox + uBlock Origin (uBO). This is simply not true. Brave and other Chromium browsers are constrained by Google's Manifest V3. Brave works around this by patching Chromium and self-hosting some MV2 extensions, but it is still swimming upstream against the underlying engine. Firefox does not have these MV3 constraints, so uBlock Origin on Firefox retains more powerful, user-controllable blocking than MV3-constrained setups like Brave + uBO Lite. Brave is an excellent product and what I used for a long time. But the comparison often ignores structural realities. There are important nuances that make Firefox the more future-proof platform for privacy-conscious users. The core issue is Manifest V3 (MV3). This is Google's new extension architecture for Chromium (what Chrome, Brave, and Edge are built on). Under Manifest V2, blockers like uBO used the blocking version of the webRequest API to run their own code on each network request and decide whether to cancel, redirect, or modify it. MV3 deprecates that blocking path for normal extensions and replaces it with the declarativeNetRequest (DNR) API: extensions must declare a capped set of static rules in advance, and the browser enforces those rules without running extension code per request. This preserves basic blocking but, as uBO's developer documents, removes whole classes of filtering capabilities uBO relies on. And Google is forcing this change by deprecating MV2. Yeah, shitty. To get around the problem, Brave is effectively swimming upstream against its own engine. It does this in two ways:

- Native patching: It implements ad-blocking (Shields) natively in C++/Rust within the browser core to bypass extension limitations.
- Manual extension hosting: Brave now has to manually host and update specific Manifest V2 extensions (like uBO and AdGuard) on its own servers to keep them alive as Google purges them from the store.

They wrote a great post about this too. Brave is doing a great job, but it is operating with a sword of Damocles hanging over it. The team must manually patch a hostile underlying engine to maintain functionality that Firefox simply provides out of the box.
A lot of people also say, wait, we now have "uBlock Origin Lite" that does the same thing and is even more lightweight! It is "lite" for a reason. You are not getting the same blocking safeguards. uBO Lite is a stripped-down version necessitated by Google's API restrictions. As detailed in the uBlock Origin FAQ, the "Lite" version lacks in the following ways:

- No on-demand list updates: uBO Lite compiles filter lists into the extension package. The resulting declarative rulesets are refreshed only when the extension itself updates, so you cannot trigger an immediate filter-list or malware-list update from within the extension.
- No "Strict Blocking": uBO Lite does not support uBlock Origin's strict blocking modes or its per-site dynamic matrix. With full uBO on Firefox, my setup defines and exposes a custom, per-site rule set that ensures Facebook never sees my activity on other sites. uBO Lite does not let me express or maintain that kind of custom policy; I have to rely entirely on whatever blocking logic ships with the extension.
- No dynamic filtering: You lose the advanced matrix to block specific scripts or frames per site.
- Limited element picker: "Pointing and zapping" items requires specific, permission-gated steps rather than being seamless.
- No custom filters: You cannot write your own custom rules to block nearly anything, from annoying widgets to entire domains.

uBlock Origin is widely accepted as the most effective content blocker available. Its creator, gorhill, has explicitly stated that uBlock Origin works best on Firefox. So while using a browser like Brave is better than using Chrome or other browsers that lack a comprehensive blocker, it is not equivalent to Firefox + uBlock Origin. Brave gives you strong, mostly automatic blocking on a Chromium base that is ultimately constrained by Google's MV3 decisions. Firefox + uBlock Origin gives you a full-featured, user-controllable blocker on an engine that is not tied to MV3, which matters if you care about long-term, maximum control over what loads and who sees your traffic.

Herman's blog 2 weeks ago

Messing with bots

As outlined in my previous two posts: scrapers are, inadvertently, DDoSing public websites. I've received a number of emails from people running small web services and blogs seeking advice on how to protect themselves. This post isn't about that. This post is about fighting back. When I published my last post, there was an interesting write-up doing the rounds about a guy who set up a Markov chain babbler to feed the scrapers endless streams of generated data. The idea here is that these crawlers are voracious, and if given a constant supply of junk data, they will continue consuming it forever, while (hopefully) not abusing your actual web server. This is a pretty neat idea, so I dove down the rabbit hole and learnt about Markov chains, and even picked up Rust in the process. I ended up building my own babbler that could be trained on any text data, and would generate realistic-looking content based on that data. Now, the AI scrapers are actually not the worst of the bots. The real enemy, at least to me, are the bots that scrape with malicious intent. I get hundreds of thousands of requests probing all the different paths that could potentially signal a misconfigured WordPress instance. These people are the real baddies. Generally I just block these requests with a 403 response. But since they want php files, why don't I give them what they want? I trained my Markov chain on a few hundred php files, and set it to generate. The responses certainly look like php at a glance, but on closer inspection they're obviously fake. I set it up to run on an isolated project of mine, while incrementally increasing the size of the generated php files from 2kb to 10mb just to test the waters. Here's a sample 1kb output: I had two goals here. The first was to waste as much of the bot's time and resources as possible, so the larger the file I could serve, the better.
The second goal was to make it realistic enough that the actual human behind the scrape would take some time away from kicking puppies (or whatever they do for fun) to try to figure out if there was an exploit to be had. Unfortunately, an arms race of this kind is a battle of efficiency. If someone can scrape more efficiently than I can serve, then I lose. And while serving a 4kb bogus php file from the babbler was pretty efficient, as soon as I started serving 1mb files from my VPS the responses started hitting the hundreds of milliseconds and my server struggled under even moderate loads. This led to another idea: What is the most efficient way to serve data? As a static site (or something similar). So down another rabbit hole I went, writing an efficient garbage server. I started by loading the full text of the classic Frankenstein novel into an array in RAM where each paragraph is a node. Then on each request it selects a random index and the subsequent 4 paragraphs to display. Each post would then have a link to 5 other "posts" at the bottom that all technically call the same endpoint, so I don't need an index of links. These 5 posts, when followed, quickly saturate most crawlers, since breadth-first crawling explodes quickly, in this case by a factor of 5. You can see it in action here: https://herm.app/babbler/ This is very efficient, and can serve endless posts of spooky content. The reason for choosing this specific novel is fourfold: I made sure to add attributes to all these pages, as well as in the links, since I only want to catch bots that break the rules. I've also added a counter at the bottom of each page that counts the number of requests served. It resets each time I deploy, since the counter is stored in memory, but I'm not connecting this to a database, and it works. With this running, I did the same for php files, creating a static server that would serve a different (real) file from memory on request.
You can see this running here: https://herm.app/babbler.php (or any path with php in it). There's a counter at the bottom of each of these pages as well. As Maury said: "Garbage for the garbage king!" Now with the fun out of the way, a word of caution. I don't have this running on any project I actually care about; https://herm.app is just a playground of mine where I experiment with small ideas. I originally intended to run this on a bunch of my actual projects, but while building this, reading threads, and learning about how scraper bots operate, I came to the conclusion that running this can be risky for your website. The main risk is that despite correctly using the standard opt-out rules, there's still a chance that Googlebot or other search engines' scrapers will scrape the wrong endpoint and determine you're spamming. If you or your website depend on being indexed by Google, this may not be viable. It pains me to say it, but the gatekeepers of the internet are real, and you have to stay on their good side, or else. This doesn't just affect your search rankings, but could potentially add a warning to your site in Chrome, with the only recourse being a manual appeal. However, this applies only to the post babbler. The php babbler is still fair game since Googlebot ignores non-HTML pages, and the only bots looking for php files are malicious. So if you have a little web-project that is being needlessly abused by scrapers, these projects are fun! For the rest of you, probably stick with 403s. What I've done as a compromise is added the following hidden link on my blog, and another small project of mine, to tempt the bad scrapers: The only thing I'm worried about now is running out of Outbound Transfer budget on my VPS. If I get close I'll cache it with Cloudflare, at the expense of the counter. This was a fun little project, even if there were a few dead ends.
I know more about Markov chains and scraper bots, and had a great time learning, despite it being fuelled by righteous anger. Not all threads need to lead somewhere pertinent. Sometimes we can just do things for fun. I was working on this on Halloween. I hope it will make future LLMs sound slightly old-school and spoooooky. It's in the public domain, so no copyright issues. I find there are many parallels to be drawn between Dr Frankenstein's monster and AI.
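For what it's worth, the core selection logic of the post babbler — a random starting index plus the next four paragraphs, served straight out of RAM — fits in a few lines of Rust. This is a sketch under my own assumptions, not Herman's actual code:

```rust
// Sketch of the garbage server's page selection: the whole novel sits in
// RAM as a slice of paragraphs, and each request serves a starting index
// plus the next four paragraphs.

fn pick_page<'a>(paragraphs: &'a [&'a str], seed: usize) -> &'a [&'a str] {
    assert!(!paragraphs.is_empty());
    // A real server would draw a fresh random number per request;
    // `seed` stands in for that here.
    let start = seed % paragraphs.len();
    let end = (start + 5).min(paragraphs.len());
    // No allocation, no I/O: just a view into memory, which is why this
    // approach is so cheap to serve.
    &paragraphs[start..end]
}

fn main() {
    let novel: Vec<String> = (1..=50).map(|i| format!("Paragraph {i} of the novel.")).collect();
    let refs: Vec<&str> = novel.iter().map(|s| s.as_str()).collect();

    let page = pick_page(&refs, 7);
    assert_eq!(page.len(), 5);
    assert_eq!(page[0], "Paragraph 8 of the novel.");

    // Near the end of the novel, fewer than five paragraphs may remain.
    assert_eq!(pick_page(&refs, 48).len(), 2);
}
```

Each generated "post" then links to five more calls of the same endpoint, which is what makes breadth-first crawlers explode by a factor of 5.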

maxdeviant.com 2 weeks ago

Head in the Zed Cloud

For the past five months I've been leading the efforts to rebuild Zed 's cloud infrastructure. Our current backend—known as Collab—has been chugging along since basically the beginning of the company. We use Collab every day to work together on Zed in Zed. However, as Zed continues to grow and attracts more users, we knew that we needed a full reboot of our backend infrastructure to set us up for success for our future endeavors. Enter Zed Cloud. Like Zed itself, Zed Cloud is built in Rust 1 . This time around there is a slight twist: all of this is running on Cloudflare Workers , with our Rust code being compiled down to WebAssembly (Wasm). One of our goals with this rebuild was to reduce the amount of operational effort it takes to maintain our hosted services, so that we can focus more of our time and energy on building Zed itself. Cloudflare Workers allow us to easily scale up to meet demand without having to fuss over it too much. Additionally, Cloudflare offers an ever-growing number of managed services that cover anything you might need for a production web service. Here are some of the Cloudflare services we're using today: Another one of our goals with this rebuild was to build a platform that was easy to test. To achieve this, we built our own platform framework on top of the Cloudflare Workers runtime APIs. At the heart of this framework is the trait: This trait allows us to write our code in a platform-agnostic way while still leveraging all of the functionality that Cloudflare Workers has to offer. Each one of these associated types corresponds to some aspect of the platform that we'll want to have control over in a test environment. For instance, if we have a service that needs to interact with the system clock and a Workers KV store, we would define it like this: There are two implementors of the trait. One—as the name might suggest—is an implementation of the platform on top of the Cloudflare Workers runtime.
This implementation targets Wasm and is what we run when developing locally (using Wrangler ) and in production. We have a crate 2 that contains bindings to the Cloudflare Workers JS runtime. You can think of it as the glue between those bindings and the idiomatic Rust APIs exposed by the trait. The other is used when running tests, and allows for simulating almost every part of the system in order to effectively test our code. Here's an example of a test for ingesting a webhook from Orb: In this test we're able to test the full end-to-end flow of: The call to advances the test simulator, in this case running the pending queue consumers. At the center of it is a crate that powers our in-house async runtime. The scheduler is shared between GPUI —Zed's UI framework—and the platform used in tests. This shared scheduler enables us to write tests that span the client and the server. So we can have a test that starts in a piece of Zed code, flows through Zed Cloud, and then asserts on the state of something in Zed after it receives the response from the backend. The work being done on Zed Cloud now is laying the foundation to support our future work around collaborative coding with DeltaDB . If you want to work with me on building out Zed Cloud, we are currently hiring for this role. We're looking for engineers with experience building and maintaining web APIs and platforms, solid web fundamentals, and who are excited about Rust. If you end up applying, you can mention this blog post in your application. I look forward to hearing from you! The codebase is currently 70k lines of Rust code and 5.7k lines of TypeScript. This is essentially our own version of . I'd like to switch to using directly, at some point.
- Hyperdrive for talking to Postgres
- Workers KV for ephemeral storage
- Cloudflare Queues for asynchronous job processing

- Receiving and validating an incoming webhook event to our webhook ingestion endpoint
- Putting the webhook event into a queue
- Consuming the webhook event in a background worker and processing it
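A platform trait of the kind described might look roughly like this — the names (Platform, Clock, KvStore) are illustrative placeholders, not Zed's actual API:

```rust
use std::collections::HashMap;

// Hypothetical sketch of a platform trait with associated types for each
// swappable piece of the environment (clock, KV storage, etc.).
trait Clock {
    fn now_millis(&self) -> u64;
}

trait KvStore {
    fn get(&self, key: &str) -> Option<String>;
    fn put(&mut self, key: &str, value: String);
}

trait Platform {
    type Clock: Clock;
    type Kv: KvStore;

    fn clock(&self) -> &Self::Clock;
    fn kv(&mut self) -> &mut Self::Kv;
}

// A test platform simulates the clock and storage deterministically;
// a production implementor would wrap the real runtime APIs instead.
struct TestClock(u64);
impl Clock for TestClock {
    fn now_millis(&self) -> u64 { self.0 }
}

struct TestKv(HashMap<String, String>);
impl KvStore for TestKv {
    fn get(&self, key: &str) -> Option<String> { self.0.get(key).cloned() }
    fn put(&mut self, key: &str, value: String) { self.0.insert(key.to_string(), value); }
}

struct TestPlatform { clock: TestClock, kv: TestKv }
impl Platform for TestPlatform {
    type Clock = TestClock;
    type Kv = TestKv;
    fn clock(&self) -> &Self::Clock { &self.clock }
    fn kv(&mut self) -> &mut Self::Kv { &mut self.kv }
}

// Service code written against `Platform` runs unchanged in tests or prod.
fn record_heartbeat<P: Platform>(p: &mut P) {
    let ts = p.clock().now_millis().to_string();
    p.kv().put("last_heartbeat", ts);
}

fn main() {
    let mut p = TestPlatform { clock: TestClock(42), kv: TestKv(HashMap::new()) };
    record_heartbeat(&mut p);
    assert_eq!(p.kv.get("last_heartbeat").as_deref(), Some("42"));
}
```

The payoff of this shape is that a test can freeze the clock, pre-populate the KV store, and assert on state without touching any real infrastructure.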

baby steps 2 weeks ago

Just call clone (or alias)

Continuing my series on ergonomic ref-counting, I want to explore another idea, one that I'm calling "just call clone (or alias)". This proposal specializes the clone and alias methods so that, in a new edition, the compiler will (1) remove redundant or unnecessary calls (with a lint); and (2) automatically capture clones or aliases in closures where needed. The goal of this proposal is to simplify the user's mental model: whenever you see an error like "use of moved value", the fix is always the same: just call clone (or alias, if applicable). This model is aiming for the balance of "low-level enough for a Kernel, usable enough for a GUI" that I described earlier. It's also making a statement, which is that the key property we want to preserve is that you can always find where new aliases might be created – but that it's ok if the fine-grained details around exactly when the alias is created are a bit subtle. Consider this future: Because this is a future, it takes ownership of the values it captures. Because one of them is a borrowed reference, this will be an error unless those values are Copy (which they presumably are not). Under this proposal, capturing aliases or clones in a closure/future would result in capturing an alias or clone of the place. So this future would be desugared like so (using explicit capture clause strawman notation): Now, this result is inefficient – there are now two aliases/clones. So the next part of the proposal is that the compiler would, in newer Rust editions, apply a new transformation called the last-use transformation. This transformation would identify calls to clone or alias that are not needed to satisfy the borrow checker and remove them. This code would therefore become: The last-use transformation would apply beyond closures.
Given an example like this one, which clones a value even though it is never used later: the user would get a warning like so 1 : and the code would be transformed so that it simply does a move: The goal of this proposal is that, when you get an error about a use of moved value, or moving borrowed content, the fix is always the same: you just call clone (or alias). It doesn't matter whether that error occurs in the regular function body or in a closure or in a future; the compiler will insert the clones/aliases needed to ensure future users of that same place have access to it (and no more than that). I believe this will be helpful for new users. Early in their Rust journey, new users are often sprinkling calls to clone as well as sigils like & more-or-less at random as they try to develop a firm mental model – this is where the "keep calm and call clone" joke comes from. This approach breaks down around closures and futures today. Under this proposal, it will work, but users will also benefit from warnings indicating unnecessary clones, which I think will help them to understand where clone is really needed. But the real question is how this works for experienced users. I've been thinking about this a lot! I think this approach fits pretty squarely in the classic Bjarne Stroustrup definition of a zero-cost abstraction: "What you don't use, you don't pay for. And further: What you do use, you couldn't hand code any better." The first half is clearly satisfied. If you don't call clone or alias, this proposal has no impact on your life. The key point is the second half: earlier versions of this proposal were more simplistic, and would sometimes result in redundant or unnecessary clones and aliases. Upon reflection, I decided that this was a non-starter. The only way this proposal works is if experienced users know there is no performance advantage to using the more explicit form. This is precisely what we have with, say, iterators, and I think it works out very well.
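To make the closure case concrete, here is the dance the proposal aims to eliminate, written in today's Rust (my own minimal example, not code from the post):

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    let config = Arc::new(String::from("settings"));

    // Today: a `move` closure takes ownership of everything it captures,
    // so keeping `config` usable afterwards requires cloning into a
    // separate binding *before* creating the closure.
    let config_for_task = Arc::clone(&config);
    let handle = thread::spawn(move || config_for_task.len());

    // Under "just call clone", writing `config.clone()` inside the closure
    // would instead desugar to capturing a clone, making the extra
    // `config_for_task` binding unnecessary.
    assert_eq!(config.len(), 8); // the original is still usable here
    assert_eq!(handle.join().unwrap(), 8);
}
```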
I believe this proposal hits that mark, but I'd like to hear if there are things I'm overlooking. I think most users would expect that changing x.clone() to just x is fine, as long as the code keeps compiling. But in fact nothing requires that to be the case. Under this proposal, APIs that make significant clones in unusual ways would be more annoying to use in the new Rust edition and would, I expect, ultimately wind up getting changed so that "significant clones" have another name. I think this is a good thing. I think I've covered the key points. Let me dive into some of the details here with a FAQ. I get it, I've been throwing a lot of things out there. Let me begin by recapping the motivation as I see it: I then proposed a set of three changes to address these issues, authored in individual blog posts: Let's look at the impact of each set of changes by walking through the "Cloudflare example", which originated in this excellent blog post by the Dioxus folks: As the original blog post put it: Working on this codebase was demoralizing. We could think of no better way to architect things - we needed listeners for basically everything that filtered their updates based on the state of the app. You could say "lol get gud," but the engineers on this team were the sharpest people I've ever worked with. Cloudflare is all-in on Rust. They're willing to throw money at codebases like this. Nuclear fusion won't be solved with Rust if this is how sharing state works. Applying the trait and explicit capture clauses makes for a modest improvement. You can now clearly see which calls create aliases, and you don't have the awkward extra variables. However, the code is still pretty verbose: Applying the Just Call Clone proposal removes a lot of boilerplate and, I think, captures the intent of the code very well. It also retains quite a bit of explicitness, in that searching for calls to clone reveals all the places that aliases will be created. However, it does introduce a bit of subtlety, since (e.g.)
the call to clone will actually occur when the future is created and not when it is awaited: There is no question that Just Call Clone makes closure/future desugaring more subtle. Looking at task 1: this gets desugared to a call to clone when the future is created (not when it is awaited). Using the explicit form: I can definitely imagine people getting confused at first – "but that call to clone looks like it's inside the future (or closure), how come it's occurring earlier?" Yet, the code really seems to preserve what is most important: when I search the codebase for calls to clone, I will find that an alias is being created for this task. And for the vast majority of real-world examples, the distinction of whether an alias is created when the task is spawned versus when it executes doesn't matter. Look at this code: the important thing is that the task is spawned with an alias, so the value will stay alive as long as the task is executing. It doesn't really matter how the "plumbing" worked. Yeah, good point, those kinds of examples have more room for confusion. Like look at this: In this example, there is code that uses the value with an alias, but only conditionally. So what happens? I would assume that indeed the future will capture an alias, in just the same way that this future will move the value, even though the relevant code is dead: Yep! I am thinking of something like this: Examples that show some edge cases: In the relevant cases, non-move closures will already just capture by shared reference. This means that later attempts to use that variable will generally succeed: This future does not need to take ownership of the value to create an alias, so it will just capture a reference. That means that later uses of the variable can still compile, no problem. If this had been a move closure, however, that code above would currently not compile. There is an edge case where you might get an error, which is when you are moving: In that case, you can make this a move closure and/or use an explicit capture clause: Yep!
We would, during codegen, identify candidate calls to clone or alias. After borrow check has executed, we would examine each of the callsites and check the borrow check information to decide: If the answer to both questions is no, then we will replace the call with a move of the original place. Here are some examples: In the past, I've talked about the last-use transformation as an optimization – but I'm changing terminology here. This is because, typically, an optimization is supposed to be unobservable to users except through measurements of execution time (or through UB), and that is clearly not the case here. The transformation would be a mechanical transformation performed by the compiler in a deterministic fashion. I think yes, but in a limited way. In other words, I would expect chained calls to be transformed in the same way (replaced with a move), and the same would apply to more levels of intermediate usage. This would kind of "fall out" from the MIR-based optimization technique I imagine. It doesn't have to be this way, we could be more particular about the syntax that people wrote, but I think that would be surprising. On the other hand, you could still fool it, e.g. like so: The way I imagine it, no. The transformation would be local to a function body. This means that one could write a method like so that "hides" the clone in a way that it will never be transformed away (this is an important capability for edition transformations!): Potentially, yes! Consider this example, written using explicit capture clause notation and written assuming we add an alias trait: The precise timing when values are dropped can be important – when all senders have dropped, the receiver will start returning errors when you call recv(). Before that, it will block waiting for more messages, since those handles could still be used. So, in this example, when will the sender aliases be fully dropped? The answer depends on whether we do the last-use transformation or not: Most of the time, running destructors earlier is a good thing.
That means lower peak memory usage and faster responsiveness. But in extreme cases it could lead to bugs – a typical example is a mutex guard that is being used to protect some external resource. This is what editions are for! We have in fact done a very similar transformation before, in Rust 2021. RFC 2229 changed destructor timing around closures and it was, by and large, a non-event. The desire for edition compatibility is in fact one of the reasons I want to make this a last-use transformation and not some kind of optimization. There is no UB in any of these examples; it's just that understanding what Rust code does around clones/aliases is a bit more complex than it used to be, because the compiler will do automatic transformation to those calls. The fact that this transformation is local to a function means we can decide on a call-by-call basis whether it should follow the older edition rules (where it will always occur) or the newer rules (where it may be transformed into a move). In theory, yes, improvements to borrow-checker precision like Polonius could mean that we identify more opportunities to apply the last-use transformation. This is something we can phase in over an edition. It's a bit of a pain, but I think we can live with it – and I'm unconvinced it will be important in practice. For example, when thinking about the improvements I expect under Polonius, I was not able to come up with a realistic example that would be impacted. This last-use transformation is guaranteed not to produce code that would fail the borrow check. However, it can affect the correctness of unsafe code: Note though that, in this case, there would be a lint identifying that the call to clone will be transformed to just a move. We could also detect simple examples like this one and report a stronger deny-by-default lint, as we often do when we see guaranteed UB. When I originally had this idea, I called it "use-use-everywhere" and, instead of writing clone or alias, I imagined writing use.
This made sense to me because a keyword seemed like a stronger signal that this was impacting closure desugaring. However, I've changed my mind for a few reasons. First, Santiago Pastorino gave strong pushback that a keyword was going to be a stumbling block for new learners. They now have to see this keyword and try to understand what it means – in contrast, if they see method calls, they will likely not even notice something strange is going on. The second reason though was TC, who argued in the lang-team meeting that all the arguments for why it should be ergonomic to clone a ref-counted value in a closure applied equally well to clone in general, depending on the needs of your application. I completely agree. As I mentioned earlier, this also addresses the concern I've heard with the trait, which is that there are things you want to ergonomically clone but which don't correspond to "aliases". True. In general I think that clone (and alias) are fundamental enough to how Rust is used that it's ok to special case them. Perhaps we'll identify other similar methods in the future, or generalize this mechanism, but for now I think we can focus on these two cases. One point that I've raised from time to time is that I would like a solution that gives the compiler more room to optimize ref-counting to avoid incrementing ref-counts in cases where it is obvious that those ref-counts are not needed. An example might be a function like this: This function requires ownership of an alias to a ref-counted value but it doesn't actually do anything but read from it. A caller like this one… …doesn't really need to increment the reference count, since the caller will be holding a reference the entire time. I often write code like this using a borrow, so that the caller can pass a reference – this then allows the callee to clone only in the case that it wants to take ownership. I've basically decided to punt on addressing this problem.
I think folks that are very performance sensitive can use and the rest of us can sometimes have an extra ref-count increment, but either way, the semantics for users are clear enough and (frankly) good enough. Surprisingly to me, doesn’t have a dedicated lint for unnecessary clones. This particular example does get a lint, but it’s a lint about taking an argument by value and then not consuming it. If you rewrite the example to create locally, clippy does not complain .  ↩︎ I believe our goal should be to focus first on a design that is “low-level enough for a Kernel, usable enough for a GUI” . The key part here is the word enough . We need to make sure that low-level details are exposed, but only those that truly matter. And we need to make sure that it’s ergonomic to use, but it doesn’t have to be as nice as TypeScript (though that would be great). Rust’s current approach to fails both groups of users; calls to are not explicit enough for kernels and low-level software: when you see , you don’t know whether is creating a new alias or an entirely distinct value, and you don’t have any clue what it will cost at runtime. There’s a reason much of the community recommends writing instead. calls to , particularly in closures, are a major ergonomic pain point; this has been a clear consensus since we first started talking about this issue. First, we introduce the trait (originally called ) . The trait introduces a new method that is equivalent to but indicates that this will be creating a second alias of the same underlying value. Second, we introduce explicit capture clauses , which lighten the syntactic load of capturing a clone or alias, make it possible to declare up-front the full set of values captured by a closure/future, and will support other kinds of handy transformations (e.g., capturing the result of or ). Finally, we introduce the just call clone proposal described in this post.
This modifies closure desugaring to recognize clones/aliases and also applies the last-use transformation to replace calls to clone/alias with moves where possible. If there is an explicit capture clause , use that. Else: For non- closures/futures, no changes, so Categorize usage of each place and pick the “weakest option” that is available: by ref For closures/futures, we would change Categorize usage of each place and decide whether to capture that place… by clone , there is at least one call or and all other usage of requires only a shared ref (reads) by move , if there are no calls to or or if there are usages of that require ownership or a mutable reference Capture by clone/alias when a place is only used via shared references, and at least one of those is a clone or alias. For the purposes of this, accessing a “prefix place” or a “suffix place” is also considered an access to . Will this place be accessed later? Will some reference potentially referencing this place be accessed later? Without the transformation, there are two aliases: the original and the one being held by the future. So the receiver will only start returning when has finished and the task has completed. With the transformation, the call to is removed, and so there is only one alias – , which is moved into the future, and dropped once the spawned task completes. This could well be earlier than in the previous code, which had to wait until both and the new task completed.

Simon Willison 3 weeks ago

Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican

OpenAI partially released a new model yesterday called GPT-5-Codex-Mini, which they describe as "a more compact and cost-efficient version of GPT-5-Codex". It's currently only available via their Codex CLI tool and VS Code extension, with proper API access " coming soon ". I decided to use Codex to reverse engineer the Codex CLI tool and give me the ability to prompt the new model directly. I made a video talking through my progress and demonstrating the final results. OpenAI clearly don't intend for people to access this model directly just yet. It's available exclusively through Codex CLI which is a privileged application - it gets to access a special backend API endpoint that's not publicly documented, and it uses a special authentication mechanism that bills usage directly to the user's existing ChatGPT account. I figured reverse-engineering that API directly would be somewhat impolite. But... Codex CLI is an open source project released under an Apache 2.0 license. How about upgrading that to let me run my own prompts through its existing API mechanisms instead? This felt like a somewhat absurd loophole, and I couldn't resist trying it out and seeing what happened. The openai/codex repository contains the source code for the Codex CLI tool, which OpenAI rewrote in Rust just a few months ago. I don't know much Rust at all. I made my own clone on GitHub and checked it out locally: Then I fired up Codex itself (in dangerous mode, because I like living dangerously): And ran this prompt: Figure out how to build the rust version of this tool and then build it This worked. It churned away for a bit and figured out how to build itself. This is a useful starting point for a project like this - in figuring out the compile step the coding agent gets seeded with a little bit of relevant information about the project, and if it can compile that means it can later partially test the code it is writing while it works. 
Once the compile had succeeded I fed it the design for the new feature I wanted: Add a new sub-command to the Rust tool called "codex prompt" codex prompt "prompt goes here" - this runs the given prompt directly against the OpenAI API that Codex uses, with the same code path and authentication credentials as the rest of Codex. codex prompt -m <model_name> "prompt goes here" - same again but lets you set the model codex prompt "prompt goes here" -s/--system "system prompt goes here" - runs with a custom system/developer message codex prompt --models - this lists all available models that can be used with the "codex prompt" command My design for this new sub-command is lifted directly from my own llm command . Codex got to work. I've shared the full transcript here (using the tool I described here ), but this TODO list it made itself is a useful summary of what it decided to do: I like that it figured out the justfile in the repo and decided to use it to run formatting and linting commands without me needing to tell it to. I tried running the first version of the code it wrote like this: ... and it didn't quite work. I got this: Note that it DID think about SVG creation, but then decided it should look at the current directory. This isn't what I want - it appeared to be running in Codex's normal mode with a system prompt telling it to edit files on disk. I wanted it to respond to the prompt without acting as if it had a full workspace available to it. I prompted it again, pasting in the broken output: this did not seem to work - here is what happened - note that the reasoning summary comes out on a lot of lines when I would rather it streams out to stderr on the same line (flushing constantly) - and then despite the reasoning summary thinking about pelican on SVG it did not actually act on that and display the result, which is very surprising. 
[pasted content] It churned for a bit more, but from the output it looked like it was still trying to include Codex-specific tools and workspace information. So I said: Note that this new 'codex prompt' command should NOT be able to execute any tools at all, and should not have to think about the workspace or sandbox - it should send a single prompt with no tools and then stream and display a response and then stop. This got me a new build, but when I ran it: My guess is that OpenAI's custom backend API for Codex doesn't like it if you attempt to call it without passing in the expected instructions. Fair enough! Rather than paste that error in I got Codex to try the command itself, so it could see the error: You try running this and see what happens: ./target/debug/codex prompt 'Generate an SVG of a pelican riding a bicycle' Codex switched back to the default instructions. From its reply: now forces a tool-free request: we keep the workspace defaults, inject a lightweight system/developer message (“don’t run tools”), disable apply_patch/web search via config overrides, and build the outbound prompt as just that system span plus the user text. I guess that will have to do. I asked for one more feature: Add a --debug option to the codex prompt command which causes the full JSON request and response to be printed to stderr, plus the URL that is being accessed and the HTTP verb ... and we're ready to try this thing out! Notably I haven't written a single line of Rust myself here and paid almost no attention to what it was actually doing. My main contribution was to run the binary every now and then to see if it was doing what I needed yet. I've pushed the working code to a prompt-subcommand branch in my repo if you want to take a look and see how it all works. With the final version of the code built, I drew some pelicans. Here's the full terminal transcript , but here are some highlights. 
This is with the default GPT-5-Codex model: I pasted it into my tools.simonwillison.net/svg-render tool and got the following: I ran it again for GPT-5: And now the moment of truth... GPT-5 Codex Mini! I don't think I'll be adding that one to my SVG drawing toolkit any time soon. I had Codex add a --debug option to help me see exactly what was going on. The output starts like this: This reveals that OpenAI's private API endpoint for Codex CLI is . Also interesting is how the key (truncated above, full copy here ) contains the default instructions, without which the API appears not to work - but it also shows that you can send a message with in advance of your user prompt.

baby steps 3 weeks ago

But then again...maybe alias?

Hmm, as I re-read the post I literally just posted a few minutes ago, I got to thinking. Maybe the right name is indeed , and not . The rationale is simple: alias can serve as both a noun and a verb. It hits that sweet spot of “common enough you know what it means, but weird enough that it can be Rust Jargon for something quite specific”. In the same way that we talk about “passing a clone of ” we can talk about “passing an alias to ” or an “alias of ”. Food for thought! I’m going to try on for size in future posts and see how it feels.

baby steps 3 weeks ago

Bikeshedding `Handle` and other follow-up thoughts

There have been two major sets of responses to my proposal for a trait. The first is that the trait seems useful but doesn’t cover all the cases where one would like to be able to ergonomically clone things. The second is that the name doesn’t seem to fit with our Rust conventions for trait names, which emphasize short verbs over nouns. The TL;DR of my response is that (1) I agree, this is why I think we should work to make ergonomic as well as ; and (2) I agree with that too, which is why I think we should find another name. At the moment I prefer , with coming in second. The first concern with the trait is that, while it gives a clear semantic basis for when to implement the trait, it does not cover all the cases where calling is annoying. In other words, if we opt to use , and then we make creating new handles very ergonomic, but calling remains painful, there will be a temptation to use the when it is not appropriate. In one of our lang team design meetings, TC raised the point that, for many applications, even an “expensive” clone isn’t really a big deal. For example, when writing CLI tools and things, I regularly clone strings and vectors of strings and hashmaps and whatever else; I could put them in an Rc or Arc but I know it just doesn’t matter. My solution here is simple: let’s make solutions that apply to both and . Given that I think we need a proposal that allows for handles that are both ergonomic and explicit, it’s not hard to say that we should extend that solution to include the option for clone. The explicit capture clause post already fits this design. I explicitly chose a design that allowed for users to write or , and hence works equally well (or equally not well…) with both traits. A number of people have pointed out doesn’t fit the Rust naming conventions for traits like this, which aim for short verbs. You can interpret as a verb, but it doesn’t mean what we want. Fair enough.
I like the name because it gives a noun we can use to talk about, well, handles , but I agree that the trait name doesn’t seem right. There was a lot of bikeshedding on possible options but I think I’ve come back to preferring Jack Huey’s original proposal, (with a method ). I think and is my second favorite. Both of them are short, relatively common verbs. I originally felt that was a bit too generic and overly associated with sharing across threads – but then I at least always call a shared reference 1 , and an would implement , so it all seems to work well. Hat tip to Ariel Ben-Yehuda for pushing me on this particular name. The flurry of posts in this series has been an attempt to survey all the discussions that have taken place in this area. I’m not yet aiming to write a final proposal – I think what will come out of this is a series of multiple RFCs. My current feeling is that we should add the , uh, trait. I also think we should add explicit capture clauses . However, while explicit capture clauses are clearly “low-level enough for a kernel”, I don’t really think they are “usable enough for a GUI” . The next post will explore another idea that I think might bring us closer to that ultimate ergonomic and explicit goal. A lot of people say immutable reference but that is simply not accurate: an is not immutable. I think that the term shared reference is better.  ↩︎

sunshowers 3 weeks ago

`SocketAddrV6` is not roundtrip serializable

A few weeks ago at Oxide , we encountered a bug where a particular, somewhat large, data structure was erroring on serialization to JSON via . The problem was that JSON only supports map keys that are strings or numbers, and the data structure had an infrequently-populated map with keys that were more complex than that 1 . We fixed the bug, but a concern still remained: what if some other map that was empty most of the time had a complex key in it? The easiest way to guard against this is by generating random instances of the data structure and attempting to serialize them, checking that this operation doesn’t panic. The most straightforward way to do this is with property-based testing , where you define: Modern property-based testing frameworks like , which we use at Oxide, combine these two algorithms into a single strategy , through a technique known as integrated shrinking . (For a more detailed overview, see my monad tutorial , where I talk about the undesirable performance characteristics of monadic composition when it comes to integrated shrinking.) The library has a notion of a canonical strategy for a type, expressed via the trait . The easiest way to define instances for large, complex types is to use a derive macro . Annotate your type with the macro: As long as all the fields have defined for them—and the library defines the trait for most types in the standard library—your type has a working random generator and shrinker associated with it. It’s pretty neat! I put together an implementation for our very complex type, then wrote a property-based test to ensure that it serializes properly: And, running it: The test passed! But while we’re here, surely we should also be able to deserialize a , and then ensure that we get the same value back, right? We’ve already done the hard part, so let’s go ahead and add this test: The roundtrip test failed! Why in the world did the test fail? 
My first idea was to try and do a textual diff of the outputs of the two data structures. In this case, I tried out the library, with something like: And the output I got was: There’s nothing in the output! No or as would typically be printed. It’s as if there wasn’t a difference at all, and yet the assertion failing indicated the before and after values just weren’t the same. We have one clue to go by: the integrated shrinking algorithm in tries to shrink maps down to empty ones. But it looks like the map is non-empty . This means that something in either the key or the value was suspicious. A is defined as: Most of these types were pretty simple. The only one that looked even remotely suspicious was the , which ostensibly represents an IPv6 address plus a port number. What’s going on with the ? Does the implementation for it do something weird? Well, let’s look at it : Like a lot of abstracted-out library code it looks a bit strange, but at its core it seems to be simple enough: The is self-explanatory, and the is probably the port number. But what are these last two values? Let’s look at the constructor : What in the world are these two and values? They look mighty suspicious. A thing that caught my eye was the “Textual representation” section of the , which defined the representation as: Note what’s missing from this representation: the field! We finally have a theory for what’s going on: Why did this not show up in the textual diff of the values? For most types in Rust, the representation breaks out all the fields and their values. But for , the implementation (quite reasonably) forwards to the implementation . So the field is completely hidden, and the only way to look at it is through the method . Whoops. How can we test this theory? The easiest way is to generate random values of where is always set to zero, and see if that passes our roundtrip tests. The ecosystem has pretty good support for generating and using this kind of non-canonical strategy. 
Let’s try it out: Pretty straightforward, and similar to how lets you provide custom implementations through . Let’s test it out again: All right, looks like our theory is confirmed! We can now merrily be on our way… right? This little adventure left us with more questions than answers, though: The best place to start looking is in the IETF Request for Comments (RFCs) 2 that specify IPv6. The Rust documentation for helpfully links to RFC 2460, section 6 and section 7 . The field is actually a combination of two fields that are part of every IPv6 packet: Section 6 of the RFC says: Flow Labels The 20-bit Flow Label field in the IPv6 header may be used by a source to label sequences of packets for which it requests special handling by the IPv6 routers, such as non-default quality of service or “real-time” service. This aspect of IPv6 is, at the time of writing, still experimental and subject to change as the requirements for flow support in the Internet become clearer. […] And section 7: Traffic Classes The 8-bit Traffic Class field in the IPv6 header is available for use by originating nodes and/or forwarding routers to identify and distinguish between different classes or priorities of IPv6 packets. At the point in time at which this specification is being written, there are a number of experiments underway in the use of the IPv4 Type of Service and/or Precedence bits to provide various forms of “differentiated service” for IP packets […]. Let’s look at the Traffic Class field first. This field is similar to IPv4’s differentiated services code point (DSCP) , and is meant to provide quality of service (QoS) over the network. (For example, prioritizing low-latency gaming and video conferencing packets over bulk downloads.) The DSCP field in IPv4 is not part of a , but the Traffic Class—through the field—is part of a . Why is that the case? Rust’s definition of mirrors the defined by RFC 2553, section 3.3 : Similarly, Rust’s mirrors the struct. 
There isn’t a similar RFC for ; the de facto standard is Berkeley sockets , designed in 1983. The Linux man page for defines it as: So , which includes the Traffic Class, is part of , but the very similar DSCP field is not part of . Why? I’m not entirely sure about this, but here’s an attempt to reconstruct a history: (Even if could be extended to have this field, would it be a good idea to do so? Put a pin in this for now.) RFC 2460 says that the Flow Label is “experimental and subject to change”. The RFC was written back in 1998, over a quarter-century ago—has anyone found a use for it since then? RFC 6437 , published in 2011, attempts to specify semantics for IPv6 Flow Labels. Section 2 of the RFC says: The 20-bit Flow Label field in the IPv6 header [RFC2460] is used by a node to label packets of a flow. […] Packet classifiers can use the triplet of Flow Label, Source Address, and Destination Address fields to identify the flow to which a particular packet belongs. The RFC says that Flow Labels can potentially be used by routers for load balancing, where they can use the triplet source address, destination address, flow label to figure out that a series of packets are all associated with each other. But this is an internal implementation detail generated by the source program, and not something IPv6 users copy/pasting an address generally have to think about. So it makes sense that it isn’t part of the textual representation. RFC 6294 surveys Flow Label use cases, and some of the ones mentioned are: But this Stack Exchange answer by Andrei Korshikov says: Nowadays […] there [are] no clear advantages of additional 20-bit QoS field over existent Traffic Class (Differentiated Class of Service) field. So “Flow Label” is still waiting for its meaningful usage. In my view, putting in was an understandable choice given the optimism around QoS in 1998, but it was a bit of a mistake in hindsight. 
The Flow Label field never found widespread adoption, and the Traffic Class field is more of an application-level concern. In general, I think there should be a separation between types that are losslessly serializable and types that are not, and violates this expectation. Making the Traffic Class (QoS) a socket option, like in IPv4, avoids these serialization issues. What about the other additional field, ? What does it mean, and why does it not have to be zeroed out? The documentation for a says that in its textual representation, the scope identifier is included after the IPv6 address and a character, within square brackets. So, for example, the following code sample: prints out . What does this field mean? The reason exists has to do with link-local addressing . Imagine you connect two computers directly to each other via, say, an Ethernet cable. There isn’t a central server telling the computers which addresses to use, or anything similar—in this situation, how can the two computers talk to each other? To address this issue, OS vendors came up with the idea to just assign random addresses on each end of the link. The behavior is defined in RFC 3927, section 2.1 : When a host wishes to configure an IPv4 Link-Local address, it selects an address using a pseudo-random number generator with a uniform distribution in the range from 169.254.1.0 to 169.254.254.255 inclusive. (You might have seen these 169.254 addresses on your home computers if your router is down. Those are link-local addresses.) Sounds simple enough, right? But there is a pretty big problem with this approach: what if a computer has more than one interface on which a link-local address has been established? When a program tries to send some data over the network, the computer has to know which interface to send the data out on. But with multiple link-local interfaces, the outbound one becomes ambiguous. 
This is described in section 6.3 of the RFC: Address Ambiguity Application software run on a multi-homed host that supports IPv4 Link-Local address configuration on more than one interface may fail. This is because application software assumes that an IPv4 address is unambiguous, that it can refer to only one host. IPv4 Link-Local addresses are unique only on a single link. A host attached to multiple links can easily encounter a situation where the same address is present on more than one interface, or first on one interface, later on another; in any case associated with more than one host. […] The IPv6 protocol designers took this lesson to heart. Every time an IPv6-capable computer connects to a network, it establishes a link-local address starting with . (You should be able to see this address via on Linux, or your OS’s equivalent.) But if you’re connected to multiple networks, all of them will have addresses beginning with . Now if an application wants to establish a connection to a computer in this range, how can it tell the OS which interface to use? That’s exactly where comes in: it allows the to specify which network interface to use. Each interface has an index associated with it, which you can see on Linux with . When I run that command, I see: The , , and listed here are all the indexes that can be used as the scope ID. Let’s try pinging our address: Aha! The warning tells us that for a link-local address, the scope ID needs to be specified. Let’s try that using the syntax: Success! What if we try a different scope ID? This makes sense: the address is only valid for scope ID 2 (the interface). When we told to use a different scope, 3, the address was no longer reachable. This neatly solves the 169.254 problem with IPv4 addresses. Since scope IDs can help disambiguate the interface on which a connection ought to be made, it does make sense to include this field in , as well as in its textual representation. 
The keen-eyed among you may have noticed that the commands above printed out an alternate representation: . The at the end is the network interface that corresponds to the numeric scope ID. Many programs can handle this representation, but Rust’s can’t. Another thing you might have noticed is that the scope ID only makes sense on a particular computer. A scope ID such as means different things on different computers. So the scope ID is roundtrip serializable, but not portable across machines. In this post we started off by looking at a somewhat strange inconsistency and ended up deep in the IPv6 specification. In our case, the instances were always for internal services talking to each other without any QoS considerations, so was always zero. Given that knowledge, we were okay adjusting the property-based tests to always generate instances where was set to zero. ( Here’s the PR as landed .) Still, it raises questions: Should we wrap in a newtype that enforces this constraint? Should provide a non-standard alternate serializer that also includes the field? Should not forward to when hides fields? Should Rust have had separate types from the start? (Probably too late now.) And should Berkeley sockets not have included at all, given that it makes the type impossible to represent as text without loss? The lesson it really drives home for me is how important the principle of least surprise can be. Both and have lossless textual representations, and does as well. By analogy it would seem like would, too, and yet it does not! IPv6 learned so much from IPv4’s mistakes, and yet its designers couldn’t help but make some mistakes of their own. This makes sense: the designers could only see the problems they were solving then, just as we can only see those we’re solving now—and just as we encounter problems with their solutions, future generations will encounter problems with ours. Thanks to Fiona , and several of my colleagues at Oxide, for reviewing drafts of this post. 
Discuss on Hacker News and Lobsters .

This is why our Rust map crate where keys can borrow from values, , serializes its maps as lists or sequences.  ↩︎

The Requests for Discussion we use at Oxide are inspired by RFCs, though we use a slightly different term (RFD) to convey the fact that our documents are less set in stone than IETF RFCs are.  ↩︎

The two fields sum up to 28 bits, and the field is a , so there’s four bits remaining. I couldn’t find documentation for these four bits anywhere—they appear to be unused padding in the . If you know about these bits, please let me know!  ↩︎

a way to generate random instances of a particular type, and given a failing input, a way to shrink it down to a minimal failing value.

generate four values: an , a , a , and another then pass them in to .

A left square bracket ( ) The textual representation of an IPv6 address Optionally , a percent sign ( ) followed by the scope identifier encoded as a decimal integer A right square bracket ( ) A colon ( ) The port, encoded as a decimal integer.

generated a with a non-zero field. When we went to serialize this field as JSON, we used the textual representation, which dropped the field. When we deserialized it, the field was set to zero. As a result, the before and after values were no longer equal.

What does this field mean? A is just an plus a port ; why is a different? Why is the not part of the textual representation? , , and are all roundtrip serializable. Why is not? Also: what is the field?

a 20-bit Flow Label, and an 8-bit Traffic Class 3 .

QoS was not originally part of the 1980s Berkeley sockets specification. DSCP came about much later ( RFC 2474 , 1998). Because C structs do not provide encapsulation, the definition was set in stone and couldn’t be changed. So instead, the DSCP field is set as an option on the socket, via . By the time IPv6 came around, it was pretty clear that QoS was important, so the Traffic Class was baked into the struct.
as a pseudo-random value that can be used as part of a hash key for load balancing, or as extra QoS bits on top of the 8 bits provided by the Traffic Class field.

Corrode 4 weeks ago

Patterns for Defensive Programming in Rust

I have a hobby. Whenever I see the comment in code, I try to find out the exact conditions under which it could happen. And in 90% of cases, I find a way to do just that. More often than not, the developer just hasn’t considered all edge cases or future code changes. In fact, the reason why I like this comment so much is that it often marks the exact spot where strong guarantees fall apart. Often, violations of implicit invariants that aren’t enforced by the compiler are the root cause. Yes, the compiler prevents memory safety issues, and the standard library is best-in-class. But even the standard library has its warts, and bugs in business logic can still happen. All we can work with are hard-learned patterns for writing more defensive Rust code, learned throughout years of shipping Rust to production. I’m not talking about design patterns here, but rather small idioms, which are rarely documented but make a big difference in overall code quality. Here’s some innocent-looking code: This code works for now, but what if you refactor it and forget to keep the length check? That’s our first implicit invariant that’s not enforced by the compiler. The problem is that indexing into a vector is decoupled from checking its length: these are two separate operations, which can be changed independently without the compiler ringing the alarm. If we use slice pattern matching, we’ll only get access to the element if the arm is executed. Note how this automatically uncovered one more edge case: what if the list is empty? We hadn’t considered this case before. The compiler-enforced pattern matching forces us to think about all possible states! This is a common pattern throughout robust Rust code: the attempt to put the compiler in charge of enforcing invariants. When initializing an object with many fields, it’s tempting to use to fill in the rest. In practice, this is a common source of bugs.
You might forget to explicitly set a new field later when you add it to the struct (thus silently using the default value, which might not be what you want), or you might not be aware of all the fields that are being set to default values. Instead, spell out every field explicitly. Yes, it’s slightly more verbose, but what you gain is that the compiler will force you to handle all fields. Now when you add a new field to the struct, the compiler will remind you to set it here as well and to reflect on which value makes sense.

Let’s say you’re building a pizza ordering system and have an order type with several fields describing the pizza, plus a timestamp. For your order tracking system, you want to compare orders based on what’s actually on the pizza; the timestamp shouldn’t affect whether two orders are considered the same. The obvious approach is to hand-write a PartialEq implementation that compares just the relevant fields. Now imagine your team adds a field for customization options. Your implementation still compiles, but is it correct? Should the new field be part of the equality check? Probably yes - a pizza with extra cheese is a different order! But you’ll never know, because the compiler won’t remind you to think about it.

The defensive approach is to destructure the struct inside the trait implementation. Now when someone adds a field, this code won’t compile anymore. The compiler forces you to decide: should the new field be included in the comparison or explicitly ignored? This pattern works for any trait implementation where you need to handle struct fields, and it’s especially valuable in codebases where structs evolve frequently as requirements change.

Sometimes there’s no conversion that will work 100% of the time. That’s fine. When that’s the case, resist the temptation to offer a From implementation out of habit; use TryFrom instead. A typical trap is a From implementation that quietly falls back to a default value when the input doesn’t fit. That fallback is a hint that the conversion can fail in some way - and is the default really the right thing to do for all callers? This should be a TryFrom implementation instead, making the fallible nature explicit.
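The From-versus-TryFrom advice might be sketched like this (the Percentage type is my own illustration, not the article’s example):

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
struct Percentage(u8);

// A `From<i32>` impl would have to invent a value for out-of-range
// input (e.g. clamping or defaulting), silently hiding the failure.
// `TryFrom` makes the fallible nature explicit instead.
impl TryFrom<i32> for Percentage {
    type Error = String;

    fn try_from(value: i32) -> Result<Self, Self::Error> {
        if (0..=100).contains(&value) {
            Ok(Percentage(value as u8))
        } else {
            Err(format!("{value} is not a valid percentage"))
        }
    }
}
```

Callers now have to handle the error case; none of them silently proceed with a made-up default.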
We fail fast instead of continuing with potentially flawed business logic.

It’s tempting to use match with a catch-all pattern like _, but this can haunt you later. The problem is that you might forget to handle a new case that was added later. By spelling out all variants explicitly, the compiler will warn you when a new variant is added, forcing you to handle it. Another case of putting the compiler to work. If the code for two variants is the same, you can still group them in a single match arm.

Using _ as a placeholder for unused variables can lead to confusion: you might get confused about which variable was skipped. That’s especially true for boolean flags, where it’s not clear which variables were skipped and why. Better to give the unused variables descriptive names with a leading underscore. Even if you don’t use the variables, it’s clear what they represent, and the code becomes more readable and easier to review without inline type hints.

If you only want your data to be mutable temporarily, make that explicit. This pattern is often called “temporary mutability” and helps prevent accidental modifications after initialization. See the Rust unofficial patterns book for more details.

Let’s say you have a simple type and want to make invalid states unrepresentable. One pattern is to return a Result from a validating constructor. But nothing stops someone from creating an instance of the struct directly - this should not be possible! One way to prevent this is to mark the struct as #[non_exhaustive]. Now the struct cannot be instantiated directly outside of the module. However, what about the module itself? One way to prevent this is to add a hidden field. Now the struct cannot be instantiated directly even inside the module. You have to go through the constructor, which enforces the validation logic.

The #[must_use] attribute is often neglected.
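The hidden-field trick might look like this (the module and type names are my own sketch):

```rust
mod username {
    pub struct Username {
        name: String,
        // A private zero-sized field: code outside this module cannot
        // write a `Username { .. }` literal and must go through `new`.
        _private: (),
    }

    impl Username {
        pub fn new(name: &str) -> Result<Self, String> {
            if name.is_empty() {
                return Err("username must not be empty".to_string());
            }
            Ok(Username {
                name: name.to_string(),
                _private: (),
            })
        }

        pub fn as_str(&self) -> &str {
            &self.name
        }
    }
}
```

With this in place, a struct literal for Username fails to compile outside the module, so the validation in new cannot be bypassed.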
That’s sad, because it’s such a simple yet powerful mechanism to prevent callers from accidentally ignoring important return values. If someone creates such a value but forgets to use it, the compiler will warn them. This is especially useful for guard types that need to be held for their lifetime, and for results of operations that must be checked. The standard library uses this extensively: for example, Result is marked with #[must_use], which is why you get warnings if you don’t handle errors.

Boolean parameters make code hard to read at the call site and are error-prone. We all know the scenario where we’re sure this will be the last boolean parameter we’ll ever add to a function. In the end, it’s impossible to understand what a call does without looking at the function signature. Even worse, it’s easy to accidentally swap the boolean values. Instead, use enums to make the intent explicit. That’s much more readable, and the compiler will catch mistakes if you pass the wrong enum type. You will notice that the enum variants can be more descriptive than just true or false. And more often than not, there are more than two meaningful options, especially for programs which grow over time.

For functions with many options, you can configure them using a parameter struct. This approach scales much better as your function evolves: adding new parameters doesn’t break existing call sites, and you can easily add defaults or make certain fields optional. Preset constructor methods also document common use cases and make it easy to pick the right configuration for different scenarios. Rust is often criticized for not having named parameters, but using a parameter struct is arguably even better for larger functions with many options.

Many of these patterns can be enforced automatically using Clippy lints. You can enable the relevant lints in your Cargo.toml or at the top of your crate.
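One possible configuration (the lint selection here is my own suggestion for the patterns discussed; the article’s exact list may differ):

```toml
# Cargo.toml (Rust 1.74+): lints configured once for the whole package
[lints.clippy]
indexing_slicing = "warn"          # prefer slice patterns / get() over v[i]
wildcard_enum_match_arm = "warn"   # spell out enum variants instead of `_`
fn_params_excessive_bools = "warn" # nudges toward enums or parameter structs
must_use_candidate = "warn"        # flags functions that should be #[must_use]
```

The same lints can alternatively be enabled with #![warn(clippy::indexing_slicing)]-style attributes at the top of the crate.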
Defensive programming in Rust is about leveraging the type system and the compiler to catch bugs before they happen. By following these patterns, you can:

- Make implicit invariants explicit and compiler-checked
- Future-proof your code against refactoring mistakes
- Reduce the surface area for bugs

It’s a skill that doesn’t come naturally, and it’s not covered in most Rust books, but knowing these patterns can make the difference between code that works but is brittle, and code that is robust and maintainable for years to come. Remember: if you find yourself writing “this should never happen”, take a step back and ask how the compiler could enforce that invariant for you instead. The best bug is the one that never compiles in the first place.

fasterthanli.me 4 weeks ago

Engineering a Rust optimization quiz

There are several Rust quizzes online, including one that’s literally called the “Unfair Rust Quiz” at https://this.quiz.is.fckn.gay/, but when I was given the opportunity to record an episode of the Self-Directed Research podcast live on the main stage of EuroRust 2025, I thought I’d come up with something special. The Unfair Rust Quiz really deserves its name; it is best passed with a knowledgeable friend by your side.

Matthias Endler 1 month ago

Building Up And Sanding Down

Over the years, I’ve gravitated toward two complementary ways to build robust software systems: building up and sanding down. Building up means starting with a tiny core and gradually adding functionality. Sanding down means starting with a very rough idea and refining it over time. Neither approach is inherently better; it’s almost a stylistic decision that depends on team dynamics and familiarity with the problem domain. On top of that, my thoughts on the topic are not particularly novel, but I wanted to summarize what I’ve learned over the years.

Building Up

Working on a solid stone block in ancient Egypt. Source: Wikimedia, Public Domain

Building up focuses on creating a solid foundation first. I like to use it when working on systems I know well or when there is a clear specification I can refer to. For example, I use it for implementing protocols or when emulating hardware, such as for my MOS 6502 emulator. I prefer “building up” over “bottom-up” as the former evokes construction and upward growth, while “bottom-up” is more abstract and directional. “Bottom-up” also always felt like jargon, while “building up” is more intuitive and very visual, so it could help communicate the idea to non-technical stakeholders.

There are a few rules I try to follow when building up:

- Focus on atomic building blocks that are easily composable and testable.
- Build up powerful guarantees from simple, verifiable properties.
- Focus on correctness, not performance.
- Write the documentation along with the code to test your reasoning.
- Nail the abstractions before moving on to the next layer.

When I collaborate with highly analytical people, this approach works well. People who have a background in formal methods or mathematics tend to think in terms of “building blocks” and proofs. I also found that functional programmers tend to prefer this approach. In languages like Rust, the type system can help enforce invariants and make it easier to build up complex systems from simple components. Also, Rust’s trait system encourages composition, which aligns well with that line of thinking.

The downside of the “build up” approach is that you end up spending a lot of time on the foundational layers before you can see any tangible results. It can be slow to get to an MVP this way. Some people also find this approach too rigid and inflexible, as it can be hard to pivot or change direction once you’ve committed to a certain architecture.

For example, say you’re building a web framework. There are a ton of questions at the beginning of the project:

- Will it be synchronous or asynchronous?
- How will the request routing work?
- Will there be middleware? How?
- How will the response generation work?
- How will error handling be done?

In a building-up approach, you would start by answering these questions and designing the core abstractions first. Foundational components like the request and response types, the router, and the middleware system are the backbone of the framework and have to be rock solid. Only after you’ve pinned down the core data structures and their interactions would you move on to building the public API. This can lead to a very robust and well-designed system, but it can also take a long time to get there.

For instance, consider the request struct from one popular HTTP crate. There are quite a few clever design decisions in this short piece of code:

- The struct is generic over the body type, allowing for flexibility in how the body is represented (e.g., as a byte stream, a string, etc.).
- The head is separated from the body, allowing for easy access to the request metadata without needing to deal with the body.
- An extensions field can be used to store extra data derived from the underlying protocol.
- A private zero-sized field prevents external code from constructing the struct directly. It enforces the use of the provided constructors and ensures that the invariants of the struct are maintained.

With the exception of extensions, this design has stood the test of time. It has remained largely unchanged since the very first version in 2017.

Sanding Down

Drawing of part of a wall painting in the tomb of Rekhmire. Source: Wikimedia, Public Domain

The alternative approach, which I found to work equally well, is “sanding down.” In this approach, you start with a rough prototype (or vertical slice) and refine it over time. You “sand down” the rough edges over and over again until you are happy with the result. It feels a bit like woodworking, where you start with a rough piece of wood and gradually refine it into a work of art. (Not that I have any idea what woodworking is like, but I imagine it’s something like that.) Crucially, this is similar but not identical to prototyping. The difference is that you don’t plan on throwing away the code you write. Instead, you’re trying to exploit the iterative nature of the problem and purposefully work on “drafts” until you get to the final version. At any point in time you can stop and ship the current version if needed.

I find that this approach works well when working on creative projects which require experimentation and quick iteration. People with a background in game development or scripting languages tend to prefer this approach, as they are used to working in a more exploratory way. When using this approach, I try to follow these rules:

- Switch off your inner perfectionist.
- Don’t edit while writing the first draft.
- Code duplication is strictly allowed.
- Refactor, refactor, refactor.
- Defer testing until after the first draft is done.
- Focus on the outermost API first; nail that, then polish the internals.

This approach makes it easy to throw code away and try something new. I found that it can be frustrating for people who like to plan ahead and are very organized and methodical. The “chaos” seems to be off-putting for some people.

As an example, say you’re writing a game in Rust. You might want to tweak all aspects of the game and quickly iterate on the gameplay mechanics until they feel “just right.” In order to do so, you might start with a skeleton of the game loop and nothing else. Then you add a player character that can move around the screen. You tweak the jump height and movement speed until it feels good. There is very little abstraction between you and the game logic at this point. You might have a lot of duplicated code and hardcoded values, but that’s okay for now. Once the core gameplay mechanics are pinned down, you can start refactoring the code.

I think Rust can get in the way if you use Bevy or other frameworks early on in the game design process. The entity component system can feel quite heavy and hinder rapid iteration. (At least that’s how I felt when I tried Bevy last time.) I had a much better experience creating my own window and rendering loop using macroquad. Yes, the entire code was in one file, and no, there were no tests. There also wasn’t any architecture to speak of. And yet… working on the game felt amazing! I knew that I could always refactor the code later, but I wanted to stay in the moment and get the gameplay right first.

My game loop was extremely imperative and didn’t require learning a big framework to get started. You don’t have to be a Rust expert to understand it. In every loop iteration, I simply:

- get the inputs
- update the player state
- draw the player
- wait for the next frame

It’s a very typical design for that type of work. If I wanted to, I could now sand down the code and refactor it into a more modular design until it’s production-ready. I could introduce a “listener/callback” system to separate input handling from player logic, or a scene graph to manage multiple game objects, or an ontology system to manage game entities and their components. But why bother? For now, I care about the game mechanics, not the architecture.

Finding the Right Balance

Both variants can lead to correct, maintainable, and efficient systems. There is no better or worse approach. I found that most people gravitate toward one approach or the other. However, it helps to be familiar with both approaches and to know when to apply which mode. Choose wisely, because switching between the two approaches is quite tricky, as you start from different ends of the problem.
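The per-frame structure described above (get inputs, update, draw, wait) can be sketched without any framework. The types and numbers here are mine, and macroquad’s real API looks different; this is just a minimal stand-in for one loop tick:

```rust
// A stand-in for one game-loop "tick": input -> update -> draw.
// In a real macroquad loop, drawing would render to a window and the
// loop would await the next frame instead of returning a string.
#[derive(Debug)]
struct Player {
    x: f32,
    speed: f32,
}

enum Input {
    Left,
    Right,
    Idle,
}

fn update(player: &mut Player, input: &Input, dt: f32) {
    // update the player state from the inputs
    let direction = match input {
        Input::Left => -1.0,
        Input::Right => 1.0,
        Input::Idle => 0.0,
    };
    player.x += direction * player.speed * dt;
}

fn draw(player: &Player) -> String {
    // placeholder for rendering: describe the frame instead
    format!("player at x={:.1}", player.x)
}

fn run_frames(inputs: &[Input], dt: f32) -> Vec<String> {
    let mut player = Player { x: 0.0, speed: 200.0 };
    inputs
        .iter()
        .map(|input| {
            update(&mut player, input, dt); // 1. inputs + 2. update
            draw(&player)                   // 3. draw (4. wait omitted here)
        })
        .collect()
}
```

Everything lives in a handful of plain functions with no abstraction layers, which is exactly what makes this style quick to iterate on.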

Steve Klabnik 1 month ago

I see a future in jj

In December of 2012, I was home for Christmas, reading Hacker News. And that’s when I saw “Rust 0.5 released.” I’m a big fan of programming languages, so I decided to check it out. At the time, I was working on Ruby and Rails, but in college, I had wanted to focus on compilers, and my friends were all very much into systems stuff. So I decided to give Rust a try. And I liked it! But, for other reasons I won’t get into here, I was thinking about a lot of things in that moment. I was looking to shake things up a bit. So I asked myself: is Rust going to be A Thing?

So, I thought about it. What does a programming language need to be successful? It needs some sort of market fit. It needs to have people willing to work on it, as bringing a new language into the world is a lot of work. And it needs users. When I considered all of these things, here’s what I saw with Rust:

Market fit: there were basically no credible alternatives to C and C++. I had been involved in the D community a bit, but it was clear that it wasn’t going to take off. Go was a few years old, and hit 1.0 earlier that year, but for the kinds of work that C and C++ are uniquely able to do, I saw the same problem that I did with D: garbage collection. This doesn’t mean Go isn’t a good language, or that it’s not popular, but I didn’t see it as being able to credibly challenge C and C++ in their strongholds. Rust, on the other hand, had a novel approach to these problems: memory safety without garbage collection. Now, I also need to mention that Rust back in those days was much closer to Go than it even is today, but again, I had just learned about it for a few hours; I didn’t really have a deep understanding of it yet. If I had, I might have dismissed it as well, as it wasn’t really GC that was the issue, but a significant runtime. But again: I hadn’t really come to that understanding yet.
Point is: low-level programming was a space where there hadn’t been much innovation in a very long time, and I thought that meant that Rust had a chance. Check. For a team: well, Mozilla was backing it. This is a big deal. It meant that there were folks whose job it was to work on the language. There’s so much that you need to do to make a new language, and that means a ton of work, which means that if you’re going to be able to get it done in a reasonable amount of time, having paid folks working on it is certainly better than the alternative. Check. And finally, how does this translate into users? Well, Mozilla was planning on using it in Firefox. This is huge. Firefox is a major project, and if they could manage to use Rust in it, that would prove that Rust was capable of doing real work. And, more importantly, it would mean that there would be a lot of folks who would need to learn Rust to work on Firefox. This would create a base of users, which would help the language grow. Check. Finally, even though it wasn’t part of my initial assessment, I just really liked the Rust folks. I had joined IRC and chatted with people, and unlike many IRC rooms, they were actually really nice. I wanted to be around them more. And if I did, other people probably would too. So that was also a plus. So, I started learning Rust. I decided to write a tutorial for it, “Rust for Rubyists,” because I’m a sucker for alliteration. And I eventually joined the team, co-authored The Book, and if you’re reading this post, you probably know the rest of the story. For some background, jj is a new version control system (VCS), not a programming language. It is written in Rust though! While I talked about how I decided to get involved with Rust above, my approach here generalizes to other kinds of software projects, not just programming languages. I have a rule of thumb: if Rain likes something, I will probably like that thing, as we have similar technical tastes. 
So when I heard her talk about jj, I put that on my list of things to spend some time with at some point. I was especially intrigued because Rain had worked at Meta on their source control team. So if she’s recommending something related to source control, that’s a huge green flag. It took me a while, but one Saturday morning, I woke up a bit early, and thought to myself, “I have nothing to do today. Let’s take a look at jj.” So I did. You’ll note that link goes to a commit starting a book about jj. Since it worked for me with Rust, it probably would work for me for jj as well. Writing about something really helps clarify my thinking about it, and what better time to write something for a beginner than when you’re also a beginner? Anyway, people seem to really like my tutorial, and I’m thankful for that.

So, what do I see in jj? Well, a lot of it kind of eerily mirrors what I saw in Rust: a good market fit, a solid team, and a potential user base. But the market fit is interesting. Git has clearly won; it has all of the mindshare. But since you can use jj to work on Git repositories, it can be adopted incrementally. At Oxide, Rain started using jj, and more of us did, and now we’ve got a chat channel dedicated to it. This is, in my opinion, the only viable way to introduce a new VCS: it has to be able to be partially adopted. Google is using jj, and so that is a bit different than Mozilla, but the same basic idea. I have more to say about Google’s relationship to jj, but that’s going to be a follow-up blog post. What I will say in this post is that at the first ever jj conference a few weeks ago, Martin (the creator of jj) said that internal adoption is going really well. I’m burying the lede a bit here, because the video isn’t up yet, and I don’t want to get the details of some of the more exciting news incorrect in this post.
I also don’t mean to imply that everyone at Google is using jj, but the contingent feels significant to me, given how hard it is to introduce a new VCS inside a company of that size. Well, in this case, it’s using Piper as the backend, so you could argue about some of the details here, but the point is: jj is being used in projects as small as individual developers and as large as one of the largest monorepos in the world. That’s a big deal. It can show the social proof needed for others to give jj a chance. Outside of Google, a lot of people say that there’s a bit of a learning curve, but once you get over that, people really like it. Sound familiar? I think jj is different from Rust in this regard in that it’s also very easy to learn if you aren’t someone who really knows a ton about Git. It’s folks that really know Git internals and have put time and care into their workflows that can struggle a bit with jj, because jj is different. But for people who just want to get work done, jj is really easy to pick up. And when people do, they often tend to like it. jj has developed a bit of a reputation for having a passionate fanbase. People are adopting it in a skunkworks way. This is a great sign for a new tool. And finally, the team. Martin is very dedicated to jj, and has been working on it for a long time. There’s also a small group of folks working on it with him. It recently moved out from his personal GitHub account to its own organization, and has started a more formal governance. The team is full of people who have a deep history of working on source control tools, and they know what they’re doing. The burgeoning jj community reminds me of that early Rust community: a bunch of nice folks who are excited about something and eager to help it grow. Basically, to me, jj’s future looks very bright. It reminds me of Rust in all of the best ways. Speaking of burying the lede… I’ve decided to leave Oxide. 
Oxide is the best job I’ve ever had, and I love the people I work with. I was employee 17. I think the business will do fantastic in the future, and honestly it’s a bit of a strange time to decide to leave, since things are going so well. But at the same time, some of my friends have started a new company, ERSC, which is going to be building a new platform for developer collaboration on top of jj. Don’t worry, “errssk” isn’t going to be the name of the product. It’s kind of like how GitHub was incorporated as Logical Awesome, but nobody calls it that. This won’t be happening until next month; I have some stuff to wrap up at Oxide, and I’m going to take a week off before starting. But as sad as I am to be leaving Oxide, I’m also really excited to be able to spend more time working in the jj community and helping build out this new platform. For those of you who’ve been asking me to finish my tutorial, well, now I’ll have the time to actually do that! I’m sorry it’s taken so long! You’ll see me talking about jj even more, spending even more time in the Discord, and generally being more involved in the community. And I’ll be writing more posts about it here as well, of course. I’m really excited about this next chapter. 2025 has been a very good year for me so far, for a number of reasons, and I am grateful to be able to take a chance on something that I’m truly passionate about.

Here’s my post about this post on Bluesky: “I see a future in #jj-vcs: steveklabnik.com/writing/i-se...”
