Posts in Rust (20 found)
Steve Klabnik 1 week ago

Getting started with Claude for software development

2025 was an interesting year in many ways. One way in which it was interesting for me is that I went from an AI hater to a pretty big user. And so I’ve had a few requests for a “using Claude” guide, so I figure new year, why not give it a shot? The lack of this kind of content was something that really frustrated me starting out, so feels like a good thing to contribute to the world. This post is going to be for software developers that are interested in learning about using these tools as of early 2026. I’m going to spend this post talking about some background, and then the first steps towards getting your feet wet. If folks like it, I’ll follow up with more. There’s a lot here. I’m going to be speaking about Claude directly, because it’s the tool I use the most, but a lot of this should apply to other platforms as well. The first thing I want to say on this topic is that there’s a reason that this is the first post of possibly many: there’s a lot here, as I just said above. This matters more than you might think at first. Becoming productive with LLMs is not actually easy, no matter what other people tell you. A lot of advice in this space is given by people who don’t have a teaching background, and have forgotten how much work they’ve put in to get to where they are. I liken it to vim: everyone acknowledges that modal editing is a different way of working. We joke about how hard it is to learn how to quit vim if you’ve accidentally started it up. But many people also acknowledge that the learning curve is worth it, due to the power you get. I think of LLMs like vim: they’re not super easy to get real results from, but the time invested can be worth it. It’s also worth saying up front: maybe it’s not worth it, for you. I don’t fault anyone for not wanting to spend time learning a new tool, especially in a space that’s moving as fast as this one. Effectively everything I’m going to talk about in this post has really only come into its own in the last 12-14 months. Maybe in another 12 months this post will be useless. I don’t know. But just like we might not find the time to learn vim to be worth it over just using a more normal editor, that doesn’t mean that deciding that all of this isn’t worth your time isn’t a rational, reasonable decision to make. You’re not going to be “left behind” or whatever some of the boosters say, in the same way that you don’t need to learn vim to do software dev. We aren’t doing vim vs emacs wars here. We’re saying “hey if you want to learn vim this is how I think you can, and if not, that’s fine.” Furthermore, because there’s so much to cover, this post is going to be background and step 1. Because otherwise it would be too dang long. You can’t just read ten thousand words on this stuff and be an expert, you have to go actually use the things. So you should be taking time between each post to go and do that, and so not putting it all in one place should give you natural breakpoints to go and actually try this stuff out. The very first thing I want to say on this entire topic is something that I think about a lot. I have generally had better outcomes with LLMs than a lot of people I know. And that’s bothered me. And I’m not sure exactly why that is. But I do have one idea. I like to approach this space in a … maybe “scientific” way is too strong, but at least a rational one. I try things out, discard what doesn’t seem to work, and keep what seems to work. I try and think critically about this space. 
I do think that the whole “vibe” term, while complicated in this space, is also important. Vibes do matter, actually. I have some more science-y and some more folks-y reasons that I believe this. But I do think that the attitude you bring towards this process partially dictates your success, and I think you should be conscious of that while you go on this journey. Is that too woo-y for you? Okay, let me make it concrete: I un-ironically believe that swearing at Claude makes it perform worse. I think you will get better results working with an LLM if you treat them like you’d treat a respected co-worker, and you will get worse results if you berate, insult, or otherwise mistreat them. This matters because I think that for a lot of LLM-skeptical people who give this a shot, they may not actually go “Hey claude what’s your fucking problem” (though I have literally seen this happen), but they will tend to let their frustrations show a bit more when things don’t work out. Use your emotional regulation skills. It’s very okay to be critical in response to whatever Claude does, but do it in a way that wouldn’t get you reported to HR in a healthy company. Do this: “Why did you do it that way? I would have preferred if we did <this> instead.” Not this: “Stop making such basic mistakes. You know that we do <this> and not <that>, idiot.” I think that being kind to people is good for you, but even if you’re a misanthrope, consider this a skill to get increased output from the tool. I think a bit of anthropomorphization is actually a good thing here. We’ll come back to that later during the more practical steps, but basically, that’s the higher level principle at work: an LLM is not a person. But it is working based off of language that people use. That’s its API. And so interacting with it in the way you’d interact with a co-worker is, in my mind, the right way to do it. Maybe I’ll elaborate on this belief someday. Or maybe not. I do this for personal belief reasons more than anything else. But it is something I want to share. Okay! Now that we’ve got that out of the way, let’s talk about the various ways you can use Claude! There’s a number of them, actually, but I want to focus on two: on the web at https://claude.ai, and with Claude Code. Using Claude in these ways is fundamentally different. Both have pros and cons. For real actual software development, you want to use Claude Code. This is due to the “agentic loop”, which you’ll learn more about in a bit. But for your first steps, using it via the web is okay. It’s mostly just important to know that your experience using the web interface is not going to be the same as using Claude Code. If I only had access to the web interface, I wouldn’t be so bullish on this stuff. But it is and can be useful. Especially when getting your feet wet, as long as you can understand that they’re meaningfully different. This gets into another topic that matters: money. Another reason I do not fault anyone for not spending time with these tools is that vim is free, whereas Claude is very much not. However. There are three major factors in the money equation: Claude Web vs Claude Code, which models you have access to, and the actual cost. Let’s talk about them. You can load up https://claude.ai and talk to Claude right now for free. But you cannot use Claude Code without paying. So if you want to start incredibly small, using the website at first before you fork over money can make sense. Again, that’s fine, just know that the experience is different.
But it may be a good way to start. In 2024 and 2025, there was a good argument that you needed to be on a paid plan because that’s how you got access to the latest models. While this is still true to some degree, models have advanced far enough that the changes are less important over time. I do think that in the first half of 2026, it still does matter to a degree. Basically, the difference between Claude 3, 4, and 4.5 is significant, but for me, Claude 4 was good enough a year ago to get real work done. I’m not 100% sure which one you get for free today, but it’s at least fine. And I think that by the time the next round of models come out, the ones you’ll have access to for free will be basically good enough to make this question moot. But do know that you get what you pay for, and paying for things does get you better performance. (Speaking of models, you’ll hear Claude referred to in three ways: Haiku, Sonnet, and Opus. As the names imply, worst to best there, though also, fastest to slowest. Sonnet, especially the 4.5 version, is pretty good for everything. Opus 4.5 is wonderful. Haiku is great for certain things.) As for actual cost: there’s $20/month, $100/month, and $200/month plans, as well as “you pay per API call.” You might be tempted to think “I’ll just pay per API call and keep my usage down.” This is a reasonable thing to think, and also a terrible mistake to make. You get a lot of bang for your buck with the plans. To give you an idea, I recently hit my weekly limit last night on the $200/month plan, and my estimated usage for that week (which again, I’m paying $50 for) would have been $1440.73 if I were paying by the API call. Now, I am a very heavy user, but the point stands: as someone trying out these tools, it is way easy to spend more than $20 of API tokens. If you want to give these tools a real shot, come up with a spare $20, sign up for the cheap plan, and then cancel after your experiment is over. You get access to Claude Code and you’ve capped your spend. It’s a win/win. There’s some good secondary effects of trying to be frugal here but I think that’s more of an intermediate than an advanced topic, to be honest. I think worrying about the money while you build these skills is a distraction. Cap your spend via a plan so that way you can not stress out about breaking the bank. Okay, with all of that background out of the way: let’s talk about your first steps here. Everyone is interested in the ability of LLMs to generate code. But I think that’s actually step 2, not step 1. The way I want you to start using these tools is purely read-only at first. This is also why the website is okay to get started with too; Claude Code is far better at generating code than the site is, but we’re not going to start by writing code. Find some code you’ve written recently. It can be literally anything. Load up https://claude.ai , and type: Hi Claude! Can we talk about this code? And then paste your code in. You don’t need any fancy prompting techniques. You don’t even need to say what language it is. Just give it some code. It could be ten lines, it could be a hundred. I wouldn’t recommend a thousand to start. Claude will probably respond with some sort of basic analysis of what you’ve done, and then a question. I gave it ~50 lines of code a friend and I were discussing recently, and it gave me this back: Sure! This looks like <description of what it does>. You’ve got <three things that the code does>. What’s on your mind about it? 
Are you thinking through the design, running into a specific issue, or wanting feedback on a particular aspect? From here, you have a ton of options of which way to go, but they really depend on what you’ve pasted in. Here are some fun prompt ideas: Do you think this code is idiomatic? If you could improve one thing about this code, what might it be? If I wanted to modify this code to do <something>, how would you go about doing that? Are there any bugs in this code? Are there any security implications of this code I may not have thought about? And so on. Anyway, the goal here is to just get used to this whole thing. It’s a bit weird! It’s very different than talking to a compiler. If Claude says something you disagree with, push back a little, just like you would a co-worker: I’m not sure I agree with that. The reason why is that in some other part of the system, there’s <behavior> and so that would impact this sort of decision. Why did you suggest that? I’d like to understand more. Claude will absolutely not be right all of the time. And that’s okay! The goal is to work together, not that this is a magic tool that suddenly solves all of your problems. Once you’ve done this a few times, you might want to graduate to Claude Code. The reason for this is that you can start to scale up your questions. Once you’ve installed it and logged in, you’ll be at a terminal prompt. It might bug you about creating a CLAUDE.md; don’t worry about that for now. Continue having conversations with Claude about your codebase. The reason that this is a big step up is that before, you had to paste all of the code in. Now, Claude can go find your code itself. Some prompts for you to try: Please give me a code review of my codebase and suggest five things I could do to improve it. Can you find any bugs in <component>? I’m curious about the performance of <component>, can we talk about it? One thing I like to do here is have Claude double check my intuitions. A few months ago, working on an application in Rust, I was considering a refactoring. I hadn’t done it because I was worried that it would be tedious, take a while, and maybe not improve the codebase. It might! But it might not. And putting in the day or two to do the refactor just to find out, when that time might end up wasted, wasn’t really worth it. So, I asked Claude. This is an example of a bit longer of a prompt: Hi Claude! I am considering refactoring my code. In a function like this: <paste code>, I don’t like how I did things, and I’m considering doing it like this instead: <paste code>. However, I know that changes the signature, which impacts other code in the codebase. A few questions for you: 1. how many function signatures would need to be updated if I made this change? 2. can you show me what the code would look like if I did this refactoring on one of my simpler endpoints? 3. can you show me what the code would look like if I did this refactoring on one of my most complex endpoints? Claude came back and said something like “250 signatures would need to change, here’s the before and after using these two examples from your codebase.” Now, Claude isn’t perfect: maybe it was actually 260 signatures. But the point is, this helped me characterize my intuition here: it would be a non-trivial amount of work. But I also got to see its impact on real code I had written, which helped me decide if this refactoring would actually help me in some of the more hairy parts of the codebase. Note that there’s not really any special “prompt engineering” going on here.
You don’t need to do “as a senior software engineer” or stuff like that. Just talk to it like you’d talk to a person. It’s fine. That doesn’t mean that prompts are useless, but this sort of optimization is an intermediate to advanced topic, and frankly, I’m skeptical that at this point the “as an x” technique even helps. More on that someday. The point is, you can start asking more complex questions as you get more comfortable with the tool. Because Claude works asynchronously, you can just fire off questions like these in the background, and come back to them when it’s done. Well, sorta. Let’s talk about permissions before we wrap this up. By default, Claude will put you in an “ask before edits” mode. This is a good way to start. It’ll check in with you before doing certain things, and you can say yes or no. Please consider what it’s about to do, and give the answer you’re comfortable with. Advanced users basically let Claude do whatever it wants, but you’re not there yet, and there are risks involved that aren’t obvious to you just yet as a new user, so even though it can be a bit annoying to say yes every time it asks, I’d encourage you to start off with minimal permissions. It gives you the option to say “commands like this one are okay for the rest of my session”, and so when it wants to run some harmless command or other, that can be nice to agree to, but I’d encourage you to not use it for writing code just yet, and tell it no if it asks. We’ll do that in a follow-up post. So that’s my intro to getting started with Claude. Spend $20, talk to it like you’d talk to a person, and use it as a means of getting feedback on your code, don’t have it write anything just yet. Graduate to larger and larger questions as you get comfortable with what it can do. Gently push back when you think it gets out of line. But your goal here is a baseline understanding of what the tool is capable of, not to vibe code out an entire app in an afternoon. These skills may seem too basic, but I promise you, it gets harder from here, and so you’ll want a solid foundation in read-only questions before we graduate to having Claude write some code. I hope this was helpful to you. Here’s my post about this post on BlueSky: steveklabnik.com/writing/gett...


Reflecting on 2025, preparing for 2026

As I do every year, it's that time to reflect on the year that's been, and talk about some of my hopes and goals for the next year! I'll be honest, this one is harder to write than last year's. It was an emotionally intense year in a lot of ways. Here's to a good 2026! Where last year I got sick and had time black holes from that, this year I lost time to various planned surgeries. I didn't get nearly as much done, because it was also hard to stay focused with all the attacks on trans rights happening. Without further ado, what'd I get up to? I helped coaching clients land jobs and improve their lives at work and beyond. I started coaching informally in 2024, and in 2025 I took on some clients formally. During the year, I helped clients improve their skills, build their confidence, and land great new jobs. I also helped clients learn how to balance their work and home life, how to be more productive and focused, and how to navigate a changing industry. This was one of the most rewarding things I did all year. I hope to do more of it this coming year! If you want to explore working together, email me or schedule an intro. I solved interesting problems at work. This reflection is mostly private, because it's so intertwined with work that's confidential. I learned a lot, and also got to see team members blossom into their own leadership roles. It is really fun watching people grow over time. I took on some consulting work. I had some small engagements to consult with clients, and those were really fun. Most of the work was focused on performance-sensitive web apps and networked code, using (naturally) Rust. This is something I'll be expanding this year! I've left my day job and am spinning up my consulting business again. More on that soon, but for now, email me if you want help with software engineering (especially web app performance) or need a principal engineer to step in and provide some engineering leadership. I wrote some good blog posts. This year, my writing output dropped to about 1/3 of what it was last year. Despite the reduction, I wrote some pretty good posts that I'm really happy with! I took a break intentionally to spend some time dealing with everything going on around me, and that helped a lot. I didn't get back to consistent weekly posts, but I intend to in 2026. My hernias were fixed. During previous medical adventures, some hernias were found. I got those fixed [1]! Recovering from hernia repair isn't fun, but wasn't too bad in the long run. It resolved some pain I'd had for a while, which I hadn't realized was unusual pain. (Story of my life, honestly.) Long-awaited surgery! In addition to the hernia repair, I had another planned surgery done. The recovery was long, and is still ongoing. My medical leave was 12 weeks, and I'm going to continue recovering for about the first year in various forms. This has brought me so much deep relief, I can't even put it in words. Performed a 30-minute set at West Philly Porchfest. I did a solo set in West Philly Porchfest! All the arrangements were done by me, and I performed all the parts live (well, one part used a pre-sequenced arpeggiator). I played my wind synth as my main instrument, layering parts over top of myself with a looper, and I also played the drum parts. You can watch a few of the pieces in a YouTube playlist. Wrote and recorded two pieces of original music. This was one of my goals from 2024, and I'm very proud that I got it done.
The first piece of music, Anticipation, came from an exercise a music therapist had me do. I took the little vignette and expanded it into a full piece, but more importantly, the exercise gave me an approach to composition. I'd like to rerecord Anticipation sometime, since I've grown as a musician significantly across the year. My second piece I'm even happier with. It's called Little Joys, and I'm just tickled that I was able to write this. I played it on my alto sax (piped through a pedal board) and programmed the other parts using a sequencer. One of my poems was published! I've written a lot more poetry this year. One of my close friends told me that I should get one of them published to have more people read it. They thought it was a good and important poem. That gave me the confidence to submit some poems, and one of them was accepted! (The one they told me to submit was not yet accepted anywhere, but fingers crossed.) You can read my poem, "my voice", in the December issue of Lavender Review. Every year when I write this, I realize I got a lot done. This year was a lot, filled with way more creative output than previous years. How does it stack up against what I wanted to do last year? I am really proud of how much I did on my goals. I might be unhappy with my slipping on some of them if it were a "normal" year where the government isn't trying to strip my rights, but you know what? I'll take it. Especially since I prioritized my health and happiness. So, what would I like to get out of this new year, 2026? These aren't my predictions for what will happen, nor are they concrete goals. They're more of a reflection on what I'd like this coming year to be. This is what I'm dreaming 2026 will be like for me. Keep my rights (and maybe regain ground). A perennial goal, I'd like to be able to stay where I am and have access to, I don't know, doctors and bathrooms. We've held a lot of ground this year. Hopefully some of what was lost can be regained. I'm going to keep doing what I can, and that includes living my best life and being positive representation for all others who are under attack. Maintain relationships with friends and family. I want to keep up with my friends and family and continue having regular chats with those I care about. We're a social species, and we rely on each other for support. I'm going to keep being there for the people I care about when they need me, and keep accepting their help as well when I need them. Spin up my business. I'm going out on my own, and I'm going to be offering my software engineering services again. By the end of the year, this will hopefully be thrumming along to support me and my family. Publish weekly blog posts (sustainably). I'm back in the saddle! This is the first post of 2026, and they're going to hopefully keep coming regularly. To make it sustainable, I'm going to explore if Patreon is a viable option to offset some of the time it takes to make the blog worth reading. Record a short album. I have a track in progress, and I have four more track ideas planned. I accidentally started writing an EP, I think??? This year I would love to actually finish that and release it. Publish more poetry. Writing poetry this year was very meaningful, and it's deeply important to me. I want to get more of it published so that I can share it with people who will also be able to get deep importance from it. That's it! Wow, the year was a lot. I've put a lot of myself in this post. If you've read this far, thank you so much for reading.
If you've not read this far, then how're you reading this sentence anyway? 2025 had a lot in it. There were some very good things I am very grateful for. There were some very scary and bad things that I wish had never happened. All told, it's been a long few years jammed into one calendar year. I hope that 2026 will be a little calmer, with less of the bad. Maybe it can feel like just one year. Regardless, I'm going to hold as much joy in the world this year as I can. Please join me in that. Let's fill 2026 with as much joy as we can, and make the world shine in spite of everything. ❓ Once again, I wanted to keep my rights. It's a perennial goal, and I did keep my rights in the state/community I live in. I'm awarding this one a question mark since my rights were under assault, and there are now many more places I cannot safely travel to. That means it's not a full miss, but not a win either. ✅ No personal-time side projects went into production! Yet another year that I toyed with the idea and again talked myself out of it. I'm taking it off the list for 2026, since the urge wasn't really even there this time. ✅ Maintained relationships with friends and family. I've had regular, scheduled calls with some people close to me. I've visited people, supported them when they needed me, and asked for support when I needed it. ❓ I did a little consulting and coaching, but didn't explore many ways to make this (playful exploration like I do on here) my living. I'm giving this the question mark of dubiousity, since I don't think I got much information from the year toward the questions I wanted to answer. ✅ Kept my mental health strong! There were certainly some challenges. What I'm proud of most is that I recognized those challenges and made space for myself. That's why I stopped blogging regularly: I needed the space to get through things with intact mental health. ❓ Did some ridiculous fun projects with code, but not as much as I wanted. The main project was making it so I can type using my keyboard (you know, like a piano, not the thing with letters on it). I had aspired to do more, and I'm glad I let myself relax on this. ✅ Wrote some original music! ✅ Also recorded that original music! It's on my bandcamp page. The surgeon really meshed me up! ↩

Farid Zakaria 1 week ago

Bespoke software is the future

At Google, some of the engineers would joke, self-deprecatingly, that the software internally was not particularly exceptional but rather that Google’s dominance was an example of the power of network effects: when software is custom-tailored to work well together. Outside of Google, or similar FAANG companies, this is often dismissed as indulgent “NIH” (Not Invented Here) syndrome, since the prevailing practice elsewhere is to pick generalized software solutions, preferably open-source, off-the-shelf. The problem with these generalized solutions is that, well, they are generalized and rarely fit well together. 🙄 Engineers are trained to be DRY (Don’t Repeat Yourself), and love abstractions. As a tool tries to solve more problems, the abstraction becomes leakier and ill-fitting. It becomes a general-purpose tax. If you only need 10% of a software solution, you pay for the remaining 90% via the abstractions they impose. 🫠 Internally to a company, however, we are taught that unused code is a liability. We often celebrate negative pull-requests as valuable clean-up work with the understanding that smaller code-bases are simpler to understand, operate and optimize. Yet for most of our infrastructure tooling, we continue to bloat solutions and tout support despite minuscule user bases. This is probably one of the areas I am most excited about with the ability to leverage LLMs for software creation. I recently spent time investigating linkers in previous posts such as LLVM’s lld. I found LLVM to be a pretty polished codebase with lots of documentation. Despite the high quality, navigating the codebase is challenging, as it’s a mass of interfaces and abstractions needed to support multiple object file formats, 13+ ISAs, a slew of features (e.g. linker scripts) and multiple operating systems. Instead, I leveraged LLMs to help me design and write µld, a tiny opinionated linker in Rust that only targets ELF, x86_64, static linking and a barebones feature set. It shouldn’t be a surprise to anyone that the end result is a codebase that I can audit, educate myself with, and easily grow to support additional improvements and optimizations. The surprising bit, especially to me, was how easy it was to author and write within a very short period of time (1-2 days). That means smaller companies, without the coffers of similar FAANG companies, can also pursue bespoke custom-tailored software for their needs. This future is well-suited for tooling such as Nix. Nix is the perfect vehicle to help build custom tooling as you have a playground that is designed to build the world, similar to a monorepo. We need to begin to cut away legacy in our tooling and build software that solves specific problems. The end result will be smaller, easier to manage and better integrated. Where this might have seemed unattainable for most, LLMs will democratize this possibility. I’m excited for the bespoke future.
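To give a feel for how little a deliberately narrow linker has to understand, here is a hypothetical Rust sketch — not code from µld — that checks whether an input file is the one kind of input such a linker accepts: an ELF64, little-endian, x86_64 relocatable object. The offsets and constants come from the ELF specification; the program name and structure are invented for illustration.

```rust
// Hypothetical sketch, not from µld: validate that an input file is an
// ELF64 little-endian x86_64 relocatable object -- the only kind of input
// a deliberately tiny static linker needs to accept.
use std::{env, fs, process};

const ET_REL: u16 = 1; // relocatable object file
const EM_X86_64: u16 = 62; // AMD x86-64 architecture

fn is_acceptable_object(bytes: &[u8]) -> Result<(), String> {
    if bytes.len() < 64 {
        return Err("file shorter than an ELF64 header".into());
    }
    if &bytes[0..4] != b"\x7fELF" {
        return Err("missing ELF magic".into());
    }
    if bytes[4] != 2 {
        return Err("not ELFCLASS64".into());
    }
    if bytes[5] != 1 {
        return Err("not little-endian".into());
    }
    let e_type = u16::from_le_bytes([bytes[16], bytes[17]]);
    let e_machine = u16::from_le_bytes([bytes[18], bytes[19]]);
    if e_type != ET_REL {
        return Err(format!("e_type {} is not ET_REL", e_type));
    }
    if e_machine != EM_X86_64 {
        return Err(format!("e_machine {} is not x86_64", e_machine));
    }
    Ok(())
}

fn main() {
    let path = env::args().nth(1).expect("usage: check-obj <file.o>");
    let bytes = fs::read(&path).expect("read input");
    if let Err(msg) = is_acceptable_object(&bytes) {
        eprintln!("{path}: rejected: {msg}");
        process::exit(1);
    }
    println!("{path}: looks like an ELF64 x86_64 relocatable object");
}
```

Everything else that lld has to abstract over — other object formats, other ISAs, linker scripts — simply never enters the picture.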

Brain Baking 2 weeks ago

2025 In Video Games

It’s that time of the year—the time to publish the yearly notes summarizing playtime statistics and providing a personal opinion on recent and vintage Game Of The Year (GOTY) contestants. In 2023, Pizza Tower and Tactics Ogre: Reborn were examples of superb recent games that even made it to the Top 100 List, while DUSK and Plants vs. Zombies scored high in the vintage list (both also on the Top 100). In 2024, Skald and the Paper Mario remake were the great ones, but the most memorable experience was no doubt playing Ultima Underworld for the first time together for the DOS Game Club. For 2025, the number of games recorded on my retro gaming site remains the same as the previous year—27—but this year I also started occasionally reviewing board games that I replay at least ten times. Here’s this year’s collage of the games I (re)played this year in chronological order: A collage of the 2025 GOTY contestants. I have yet to write a review for Shotgun King so let’s keep that one out. It’s a small indie roguelike that’s fun but doesn’t have a lot to offer. Also, since this post is called 2025 in Video Games, let’s ignore the board games for now and keep that for a future post where I summarise my Board Game Geek statistics. Some more useless stats, based on user input from How Long To Beat (HLTB): Last year, about 50% of my gaming time took place on the Switch. That’s dropped to 40%. Or has it? Remove the six board games and you’ve got 52% so nope, I’m still primarily a Nintendo (handheld) gamer. I have a bunch of cartridges waiting to be played and I believe even a few cases still in shrink wrap (yeah I know), so for the coming year, that’s not likely to change either. I don’t need a Switch 2 just yet. For more details on those divisions by platform, I again reused last year’s script to generate a graph that summarizes the platforms and calculates an average score (rated on 5, see about the rating system) for each platform: A bar chart of (average) scores per platform. Most mediocre plays came from platforms where I was hunting down card games for my feature write-up on card games back in September. Filtering all games that are scored as either great (4/5) or amazing (5/5), we end up with the following lists, where I further cherry-picked the best of the best: The Recent GOTY list: Cough, “recent”, cough. Yeah, again—I know. What can I say, I’m a retro gamer, and the “new games” I play are usually repurposed old ones, go figure. This seems to be especially apparent this year. Those Nightdive Studios boomer shooter remakes are beyond awesome, you’ve got to try them! The Vintage GOTY list: I found 2024 to be a meagre year for me when it comes to “the great ones”—because I don’t play many of those within the year of release. I have the same feeling for this year, looking back at the play log. There are many great games I highly enjoyed, such as Wonder Boy with the awesome art and music and the ability to switch back and forth between the retro and remastered versions, or Hoyle Card Games, the PC classic that’s hard to beat when it comes to trumping the trump. I love Celeste and Castlevania Dominus Collection but those were replays of games I know by heart, so I’m ruling them out. We’ve got to draw the line somewhere. And then there’s Inscryption. What a game. No, what an experience that was! I was on the edge of my seat almost every single in-game minute. I played it in January (read my thoughts but beware of the spoilers) and didn’t encounter a game that challenged my expectations that much ever since.
There’s no need for a debate or a voting round: Inscryption is my “Game of the Other Year”. It’s in the Top100. As for the GOTY of 2025-ish; that’s got to be one of the Nightdive remakes. Both Blood: Refreshed Supply and the Outlaws remaster have been released recently and I haven’t yet had the chance to touch either of them. If I had, I think Blood might have been the winner as that’s the only Build Engine game I never truly played back in the nineties. Screw it. DOOM + DOOM II is my GOTY. Just the music alone: And that’s from the new Legacy of Rust expansion. I’ll leave the discovery of Andrew Hulshult’s DOOM riffs up to you. Obviously, DOOM + DOOM II (2024) kicked out and replaced DOOM (1993) in the Top100. Cheers to 2026. My hopes are high for opening that shrink wrap. Related topics: / games / goty / lists / yearnote / By Wouter Groeneveld on 30 December 2025. Reply via email.
total #games: 27
total hours: 175.8
average hours: 6.51
average a day: 0.5
longest game: 28.0 hours; ‘Castlevania Dominus Collection’
shortest game: 0.0 hours; Hoyle Card Games 2002
Division by platform:
Platform: pc (5/27)
Platform: ds (3/27)
Platform: boardgames (6/27)
Platform: gameboycolor (1/27)
Platform: switch (11/27)
Platform: snes (1/27)
💖 Guncho (pc; 2024)
💖 Shogun Showdown (switch; 2023)
💖 Rise Of The Triad: Ludicrous Edition (switch; 2023)
💖 Prince of Persia: The Lost Crown (switch; 2024)
💖 DOOM + DOOM II (pc; 2024)
💖 Castlevania Dominus Collection (switch; 2024)
💖 Hoyle Card Games 2002 (pc; 2002)
💖 Wonder Boy: The Dragon’s Trap (switch; 2017)
💖 Tangle Tower (switch; 2019)
💖 Celeste (switch; 2018)
💖 Inscryption (switch; 2021)


CHERIoT RTOS: An OS for Fine-Grained Memory-Safe Compartments on Low-Cost Embedded Devices

CHERIoT RTOS: An OS for Fine-Grained Memory-Safe Compartments on Low-Cost Embedded Devices. Saar Amar, Tony Chen, David Chisnall, Nathaniel Wesley Filardo, Ben Laurie, Hugo Lefeuvre, Kunyan Liu, Simon W. Moore, Robert Norton-Wright, Margo Seltzer, Yucong Tao, Robert N. M. Watson, and Hongyan Xia. SOSP'25. This paper is a companion to a previous paper which described the CHERIoT hardware architecture. This work presents an OS that doesn’t look like the systems you are used to. The primary goal is memory safety (and security more broadly). Why rewrite your embedded code in Rust when you can switch to a fancy new chip and OS instead? Recall that a CHERI capability is a pointer augmented with metadata (bounds, access permissions). CHERI allows a more restrictive capability to be derived from a less restrictive one (e.g., reduce the bounds or remove access permissions), but not the other way around. CHERIoT RTOS doesn’t have the notion of a process; instead it has compartments. A compartment comprises code and compartment-global data. Compartment boundaries are trust boundaries. I think of it like a microkernel operating system. Example compartments in CHERIoT include the boot loader, the context switcher, the heap allocator, and the thread scheduler. The boot loader is fully trusted and is the first code to run. The hardware provides the boot loader with the ultimate capability. The boot loader then derives more restrictive capabilities, which it passes to other compartments. You could imagine a driver compartment which is responsible for managing a particular I/O device. The boot loader would provide that compartment with a capability that enables the compartment to access the MMIO registers associated with the device. There is no user space/kernel space distinction here, only a set of compartments, each with a unique set of capabilities. Fig. 3 illustrates a compartment (source: https://dl.acm.org/doi/10.1145/3731569.3764844).
Sealed Capabilities
The CHERIoT hardware architecture supports sealing of capabilities. Sealing a capability is similar to deriving a more restrictive one, only this time the derived capability is useless until it is unsealed by a compartment which holds a capability with unsealing permissions. I think of this like a client encrypting some data before storing it on a server. The data is useless to everyone except for the client who can decrypt it. Cross-compartment function calls are similar to system calls and are implemented with sealed capabilities. Say compartment A needs to be able to call a function exported by compartment B. At boot, the boot loader derives a “function call” capability which is a pointer into the export table associated with B, seals that capability, and passes it to compartment A at initialization. The boot loader also gives the switcher a capability which allows it to unseal the function call capability. When A wants to call the function exported by B, it passes the sealed capability to the switcher. The switcher then unseals the capability and uses it to read metadata about the exported function from B’s export table. The switcher uses this metadata to safely perform the function call. Capability sealing also simplifies inter-compartment state management. Say compartment A calls into compartment B (for networking) to create a TCP connection. The networking compartment can allocate a complicated tree of objects and then return a sealed capability which points to that tree.
Compartment A can hold on to that capability and pass it as a parameter for future networking function calls (which B will unseal and then use). Compartment B doesn’t need to track per-connection objects in its global state.
Heap Allocator
The heap compartment handles memory allocation for all compartments. There is just one address space shared by all compartments, but capabilities make the whole thing safe. As described in the previous summary, when an allocation is freed, the heap allocator sets associated revocation bits to zero. This prevents use-after-free bugs (in conjunction with the CHERIoT hardware load filter). Similar to garbage collection, freed memory is quarantined (not reused) until a memory sweep completes which ensures that no outstanding valid capabilities are referencing the memory to be reused. The allocator supports allocation capabilities which can enforce per-compartment quotas.
Threads
If you’ve had enough novelty, you can rest your eyes for a moment. The CHERIoT RTOS supports threads, and they mostly behave like you would expect. The only restriction is that threads are statically declared in code. Threads begin execution in the compartment that declares them, but then threads can execute code in other compartments via cross-compartment calls.
Micro-reboots
Each compartment is responsible for managing its own state with proper error handling. If all else fails, the OS supports micro-reboots, where a single compartment can be reset to a fresh state. The cross-compartment call mechanism supported by the switcher enables the necessary bookkeeping for micro-reboots. The steps to reboot a single compartment are: (1) stop new threads from calling into the compartment (these calls fail with an error code); (2) fault all threads which are currently executing in the compartment (this will also result in error codes being returned to other compartments); (3) release all resources (e.g., heap data) which have been allocated by the compartment; (4) reset all global variables to their initial state. I wonder how often a micro-reboot of one compartment results in an error code which causes other compartments to micro-reboot. If a call into a compartment which is in the middle of a micro-reboot can fail, then I could see that triggering a cascade of micro-reboots. The ideas here remind me of Midori, which relied on managed languages rather than hardware support. I wonder which component is better to trust, an SoC or a compiler?
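To make the capability-derivation and sealing rules above concrete, here is a loose analogy sketched in Rust. This is not the CHERIoT API — the types and the integer "seal key" are invented for illustration — but it captures the two properties the summary leans on: derivation can only narrow bounds and permissions, and a sealed capability is opaque until the holder of the matching unsealing authority opens it.

```rust
// Hypothetical illustration of CHERI-style rules; not the CHERIoT API.
#[derive(Debug)]
struct Capability {
    base: usize,
    len: usize,
    can_write: bool,
}

impl Capability {
    /// Derivation is monotonic: the result never exceeds the original
    /// bounds or permissions. Returns None if the request would widen.
    fn restrict(&self, base: usize, len: usize, can_write: bool) -> Option<Capability> {
        let fits = base >= self.base && base + len <= self.base + self.len;
        let no_new_perms = !can_write || self.can_write;
        (fits && no_new_perms).then(|| Capability { base, len, can_write })
    }
}

/// A sealed capability is useless to its holder; only the matching key unseals it.
struct Sealed(Capability, u32);
struct SealKey(u32);

impl SealKey {
    fn seal(&self, c: Capability) -> Sealed {
        Sealed(c, self.0)
    }
    fn unseal(&self, s: Sealed) -> Option<Capability> {
        (s.1 == self.0).then_some(s.0)
    }
}

fn main() {
    // The boot loader starts with an all-powerful capability...
    let root = Capability { base: 0x0, len: 0x1_0000, can_write: true };
    // ...and hands a driver compartment a narrow view of its MMIO registers.
    let mmio = root.restrict(0x4000, 0x100, true).unwrap();
    // Widening back out is impossible.
    assert!(mmio.restrict(0x0, 0x1_0000, true).is_none());

    // The switcher holds the key; the calling compartment only sees the sealed form.
    let key = SealKey(7);
    let sealed_entry = key.seal(root.restrict(0x8000, 0x40, false).unwrap());
    let entry = key.unseal(sealed_entry).unwrap();
    println!("entry point capability: {:?}", entry);
}
```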


Can Bundler Be as Fast as uv?

At RailsWorld earlier this year, I got nerd sniped by someone. They asked “why can’t Bundler be as fast as uv?” Immediately my inner voice said “YA, WHY CAN’T IT BE AS FAST AS UV????” My inner voice likes to shout at me, especially when someone asks a question so obvious I should have thought of it myself. Since then I’ve been thinking about and investigating this problem, going so far as to give a presentation at XO Ruby Portland about Bundler performance. I firmly believe the answer is “Bundler can be as fast as uv” (where “as fast” has a margin of error lol). Fortunately, Andrew Nesbitt recently wrote a post called “How uv got so fast”, and I thought I would take this opportunity to review some of the highlights of the post and how techniques applied in uv can (or can’t) be applied to Bundler / RubyGems. I’d also like to discuss some of the existing bottlenecks in Bundler and what we can do to fix them. If you haven’t read Andrew’s post, I highly recommend giving it a read. I’m going to quote some parts of the post and try to reframe them with RubyGems / Bundler in mind. Andrew opens the post talking about rewriting in Rust: uv installs packages faster than pip by an order of magnitude. The usual explanation is “it’s written in Rust.” That’s true, but it doesn’t explain much. Plenty of tools are written in Rust without being notably fast. The interesting question is what design decisions made the difference. This is such a good quote. I’m going to address “rewrite in Rust” a bit later in the post. But suffice to say, I think if we eliminate bottlenecks in Bundler such that the only viable option for performance improvements is to “rewrite in Rust”, then I’ll call it a success. I think rewrites give developers the freedom to “think outside the box”, and try techniques they might not have tried. In the case of uv, I think it gave the developers a good way to say “if we don’t have to worry about backwards compatibility, what could we achieve?”. I suspect it would be possible to write a uv in Python (PyUv?) that approaches the speeds of uv, and in fact much of the blog post goes on to talk about performance improvements that aren’t related to Rust. pip’s slowness isn’t a failure of implementation. For years, Python packaging required executing code to find out what a package needed. I didn’t know this about Python packages, and it doesn’t really apply to Ruby Gems so I’m mostly going to skip this section. Ruby Gems are tar files, and one of the files in the tar file is a YAML representation of the GemSpec. This YAML file declares all dependencies for the Gem, so RubyGems can know, without evaling anything, what dependencies it needs to install before it can install any particular Gem. Additionally, RubyGems.org provides an API for asking about dependency information, which is actually the normal way of getting dependency info (again, no eval required). There’s only one other thing from this section I’d like to quote: PEP 658 (2022) put package metadata directly in the Simple Repository API, so resolvers could fetch dependency information without downloading wheels at all. Fortunately RubyGems.org already provides the same information about gems. Reading through the number of PEPs required as well as the amount of time it took to get the standards in place was very eye opening for me. I can’t help but applaud folks in the Python community for doing this. It seems like a mountain of work, and they should really be proud of themselves.
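To illustrate the point above that a gem's dependency information is plain data rather than code, here is a hypothetical sketch — written in Rust with the tar and flate2 crates purely for illustration, since the real tooling is of course Ruby — that pulls the gzipped YAML gemspec out of a .gem archive without evaluating anything.

```rust
// Hypothetical illustration: a .gem file is a tar archive whose
// `metadata.gz` entry is a gzipped YAML gemspec, so dependency data can be
// read without executing any code. Assumes the `tar` and `flate2` crates
// are declared in Cargo.toml.
use std::{env, fs::File, io::Read};

use flate2::read::GzDecoder;
use tar::Archive;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let path = env::args().nth(1).expect("usage: gem-metadata <some.gem>");
    let mut archive = Archive::new(File::open(path)?);

    for entry in archive.entries()? {
        let entry = entry?;
        let name = entry.path()?.to_string_lossy().into_owned();
        if name == "metadata.gz" {
            // Gunzip the YAML gemspec and show the dependency-related lines.
            let mut yaml = String::new();
            GzDecoder::new(entry).read_to_string(&mut yaml)?;
            for line in yaml.lines().filter(|l| l.contains("name:") || l.contains("requirement")) {
                println!("{line}");
            }
            break;
        }
    }
    Ok(())
}
```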
I’m mostly going to skip this section except for one point: Ignoring requires-python upper bounds. When a package says it requires python<4.0, uv ignores the upper bound and only checks the lower. This reduces resolver backtracking dramatically since upper bounds are almost always wrong. Packages declare python<4.0 because they haven’t tested on Python 4, not because they’ll actually break. The constraint is defensive, not predictive. I think this is very very interesting. I don’t know how much time Bundler spends on doing “required Ruby version” bounds checking, but it feels like if uv can do it, so can we. I really love that Andrew pointed out optimizations that could be made that don’t involve Rust. There are three points in this section that I want to pull out: Parallel downloads. pip downloads packages one at a time. uv downloads many at once. Any language can do this. This is absolutely true, and is a place where Bundler could improve. Bundler currently has a problem when it comes to parallel downloads, and needs a small architectural change as a fix. The first problem is that Bundler tightly couples installing a gem with downloading the gem. You can read the installation code here, but to summarize the method in question: it inextricably links downloading the gem with installing it. This is a problem because we could be downloading gems while installing other gems, but we’re forced to wait because the installation method couples the two operations. Downloading gems can trivially be done in parallel since the files are just archives that can be fetched independently. The second problem is the queuing system in the installation code. After gem resolution is complete, and Bundler knows what gems need to be installed, it queues them up for installation. You can find the queueing code here. The code takes some effort to understand. Basically it allows gems to be installed in parallel, but only gems that have already had their dependencies installed. So for example, if you have a dependency tree where one gem depends on a second gem, which in turn depends on a third, then no gems will be installed (or downloaded) in parallel. To demonstrate this problem in an easy-to-understand way, I built a slow Gem server. It generates a three-gem chained dependency tree like the one above, then starts a Gem server. The Gem server takes 3 seconds to return any Gem, so if we point Bundler at this Gem server and then profile Bundler, we can see the impact of the queueing system and download scheme. In my test app, the Gemfile depends on the top of that chain. If we profile bundle install with Vernier, the swim lanes in the marker chart show that we get no parallelism during installation. We spend 3 seconds downloading the first gem, then we install it. Then we spend 3 seconds downloading the second gem, then we install it. Finally we spend 3 seconds downloading the third gem, and we install it. Timing the process shows we take over 9 seconds to install (3 seconds per gem). Contrast this with a Gemfile containing three gems that have no dependencies, but still take 3 seconds each to download: timing for that Gemfile shows it takes about 4 seconds. We were able to install the same number of gems in a fraction of the time. This is because Bundler is able to download siblings in the dependency tree in parallel, but unable to handle other relationships. There is actually a good reason that Bundler insists dependencies are installed before the gems themselves: native extensions.
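Before getting to native extensions, here is a minimal sketch of the decoupling idea (in Rust, with the gem names and the 3-second download as stand-ins mirroring the slow-server experiment, not Bundler's actual code): downloads run concurrently and feed a channel, while installation consumes finished archives one at a time, so one slow download no longer serializes everything behind it.

```rust
// Minimal sketch of decoupling download from install. The gem names and the
// 3-second "download" are stand-ins; the point is that downloads overlap
// while installs stay sequential (and could themselves respect dependency
// order before compiling native extensions).
use std::{sync::mpsc, thread, time::{Duration, Instant}};

fn download(name: &str) -> String {
    thread::sleep(Duration::from_secs(3)); // simulate a slow gem server
    format!("{name}.gem")
}

fn install(archive: &str) {
    println!("installed {archive}");
}

fn main() {
    let gems = ["gem_a", "gem_b", "gem_c"];
    let start = Instant::now();
    let (tx, rx) = mpsc::channel();

    // Download every gem concurrently; each finished download is queued.
    for name in gems {
        let tx = tx.clone();
        thread::spawn(move || tx.send(download(name)).unwrap());
    }
    drop(tx); // close the channel once all senders are gone

    // Install sequentially as archives arrive, instead of waiting for each
    // download before starting the next one.
    for archive in rx {
        install(&archive);
    }
    println!("total: {:.1?}", start.elapsed());
}
```

In the chained-dependency case, the consumer loop would additionally hold a gem back until its dependencies are installed, but the downloads themselves could still all overlap.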
When installing native extensions, the installation process must run Ruby code (the gem’s extconf.rb file). Since the extconf.rb could require dependencies be installed in order to run, we must install dependencies first. For example, a native gem may depend on a build-time helper gem that is only used during the installation process, so that helper needs to be installed before the native gem can be compiled and installed. However, if we were to decouple downloading from installation it would be possible for us to maintain the “dependencies are installed first” business requirement but speed up installation. In the chained case above, we could have been downloading the second and third gems at the same time as the first (or even while waiting on it to be installed). Additionally, pure Ruby gems don’t need to execute any code on installation. If we knew that we were installing a pure Ruby gem, it would be possible to relax the “dependencies are installed first” business requirement and get even more performance increases. The case with three independent gems could install all three gems in parallel since none of them execute Ruby code during installation. I would propose we split installation into four discrete steps: download the gem, unpack the gem, compile the gem, and install the gem. Downloading and unpacking can be done trivially in parallel. We should unpack the gem to a temporary folder so that if the process crashes or the machine loses power, the user isn’t stuck with a half-installed gem. After we unpack the gem, we can discover whether the gem is a native extension or not. If it’s not a native extension, we “install” the gem simply by moving the temporary folder to the “correct” location. This step could even be a “hard link” step as discussed in the next point. If we discover that the gem is a native extension, then we can “pause” installation of that gem until its dependencies are installed, then resume (by compiling) at an appropriate time. Side note: there is a Bundler alternative that works mostly in this manner today. Here is a timing of the chained case from above. Let’s move on to the next point: Global cache with hardlinks. pip copies packages into each virtual environment. uv keeps one copy globally and uses hardlinks. I think this is a great idea, but I’d actually like to split the idea in two. First, RubyGems and Bundler should have a combined, global cache, full stop. I think that global cache should live in a single shared directory, and we should store gem files there when they are downloaded. Currently, both Bundler and RubyGems will use a Ruby-version-specific cache folder. In other words, if you install the same Gemfile on two different versions of Ruby, you get two copies of Rails and all its dependencies. Interestingly, there is an open ticket to implement this; it just needs to be done. The second point is hardlinking on installation. The idea here is that rather than unpacking the gem multiple times, once per Ruby version, we simply unpack once and then hard link per Ruby version. I like this idea, but I think it should be implemented after some technical debt is paid: namely implementing a global cache and unifying Bundler / RubyGems code paths. On to the next point: PubGrub resolver. Actually Bundler already uses a Ruby implementation of the PubGrub resolver. You can see it here. Unfortunately, RubyGems still uses the molinillo resolver. In other words, you use a different resolver depending on whether you install via RubyGems or via Bundler. I don’t really think this is a big deal since the vast majority of users will be going through Bundler most of the time.
However, I do think this discrepancy is some technical debt that should be addressed, and I think this should be addressed via unification of the RubyGems and Bundler codebases (today they both live in the same repository, but the code isn’t necessarily combined). Let’s move on to the next section of Andrew’s post: Andrew first mentions “Zero-copy deserialization”. This is of course an important technique, but I’m not 100% sure where we would utilize it in RubyGems / Bundler. I think that today we parse the YAML spec on installation, and that could be a target. But I also think we could install most gems without looking at the YAML gemspec at all. Thread-level parallelism. Python’s GIL forces parallel work into separate processes, with IPC overhead and data copying. This is an interesting point. I’m not sure what work pip needed to do in separate processes. Installing a pure-Ruby gem is mostly an IO-bound task, with some ZLIB mixed in. Both of these things (IO and ZLIB processing) release Ruby’s GVL, so it’s possible for us to do things truly in parallel. I imagine this is similar for Python / pip, but I really have no idea. Given the stated challenges with Python’s GIL, you might wonder whether Ruby’s GVL presents similar parallelism problems for Bundler. I don’t think so, and in fact I think Ruby’s GVL gets kind of a bad rap. It prevents us from running CPU-bound Ruby code in parallel. Ractors address this, and Bundler could possibly leverage them in the future, but since installing gems is mostly an IO-bound task I’m not sure what the advantage would be (possibly the version solver, but I’m not sure what can be parallelized in there). The GVL does allow us to run IO-bound work in parallel with CPU-bound Ruby code. CPU-bound native extensions are allowed to release the GVL, allowing Ruby code to run in parallel with the native extension’s CPU-bound code. In other words, Ruby’s GVL allows us to safely run work in parallel. That said, the GVL can work against us because releasing and acquiring the GVL takes time. If you have a system call that is very fast, releasing and acquiring the GVL could end up being a large percentage of that call. For example, if you do a read with a very small buffer, you could encounter a situation where GVL bookkeeping is the majority of the time. A bummer is that Ruby Gem packages usually contain lots of very small files, so this problem could be impacting us. The good news is that this problem can be solved in Ruby itself, and indeed some work is being done on it today. No interpreter startup. Every time pip spawns a subprocess, it pays Python’s startup cost. Obviously Ruby has this same problem. That said, we only start Ruby subprocesses when installing native extensions. I think native extensions make up the minority of gems installed, and even when installing a native extension, it isn’t Ruby startup that is the bottleneck. Usually the bottleneck is compilation / linking time (as we’ll see in the next post). Compact version representation. uv packs versions into u64 integers where possible, making comparison and hashing fast. This is a cool optimization, but I don’t think it’s actually Rust specific. Comparing integers is much faster than comparing version objects. The idea is that you take a version number and pack each part of it into a single integer, so that, for example, the major, minor, and patch numbers each occupy their own bits of a u64.
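Here is a tiny sketch of the packing trick (in Rust, since that is where uv applies it; the 16/24/24-bit field split is an arbitrary choice for illustration, not uv's actual layout):

```rust
// Illustrative only: pack a major.minor.patch version into a u64 so that
// ordinary integer comparison matches version ordering. The field widths
// here (16/24/24 bits) are an arbitrary choice, not uv's actual layout.
fn pack(major: u64, minor: u64, patch: u64) -> u64 {
    assert!(major < (1 << 16) && minor < (1 << 24) && patch < (1 << 24));
    (major << 48) | (minor << 24) | patch
}

fn main() {
    let a = pack(1, 2, 3);
    let b = pack(1, 10, 0);
    // Integer comparison agrees with semantic ordering: 1.2.3 < 1.10.0.
    assert!(a < b);
    println!("1.2.3 -> {a:#018x}, 1.10.0 -> {b:#018x}");
}
```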
It should be possible to use this trick in Ruby and encode versions to integer immediates, which would unlock performance in the resolver. Rust has an advantage here - compiled native code comparing u64s will always be faster than Ruby, even with immediates. However, I would bet that with the YJIT or ZJIT in play, this gap could be closed enough that no end user would notice the difference between a Rust or Ruby implementation of Bundler.

I started refactoring the object so that we might start doing this, but we ended up reverting it because of backwards compatibility (I am jealous of in that regard). I think the right way to do this is to refactor the solver entry point and ensure all version requirements are encoded as integer immediates before entering the solver. We could keep the API as "user facing" and design a more internal API that the solver uses.

I am very interested in reading the version encoding scheme in uv. My intuition is that minor numbers tend to get larger than major numbers, so would minor numbers have more dedicated bits? Would it even matter with 64 bits?

I'm going to quote Andrew's last 2 paragraphs:

uv is fast because of what it doesn't do, not because of what language it's written in. The standards work of PEP 518, 517, 621, and 658 made fast package management possible. Dropping eggs, pip.conf, and permissive parsing made it achievable. Rust makes it a bit faster still.

pip could implement parallel downloads, global caching, and metadata-only resolution tomorrow. It doesn't, largely because backwards compatibility with fifteen years of edge cases takes precedence. But it means pip will always be slower than a tool that starts fresh with modern assumptions.

I think these are very good points. The difference is that in RubyGems and Bundler, we already have the infrastructure in place for writing a "fast as uv" package manager. The difficult part is dealing with backwards compatibility, and navigating two legacy codebases. I think this is the real advantage the uv developers had. That said, I am very optimistic that we could "repair the plane mid-flight" so to speak, and have the best of both worlds: backwards compatibility and speed.

I mentioned at the top of the post I would address "rewrite it in Rust", and I think Andrew's own quote mostly does that for me. I think we could have 99% of the performance improvements while still maintaining a Ruby codebase. Of course if we rewrote it in Rust, you could squeeze an extra 1% out, but would it be worthwhile? I don't think so.

I have a lot more to say about this topic, and I feel like this post is getting kind of long, so I'm going to end it here. Please look out for part 2, which I'm tentatively calling "What makes Bundler / RubyGems slow?" This post was very "can we make RubyGems / Bundler do what uv does?" (the answer is "yes"). In part 2 I want to get more hands-on by discussing how to profile Bundler and RubyGems, what specifically makes them slow in the real world, and what we can do about it.

I want to end this post by saying "thank you" to Andrew for writing such a great post about how uv got so fast.

0 views
Blog System/5 2 weeks ago

ssh-agent broken in tmux? I've got you!

A little over two years ago, I wrote an article titled SSH agent forwarding and tmux done right . In it, I described how SSH agent forwarding works—a feature that lets a remote machine use the credentials stored in your local ssh-agent instance—and how using a console multiplexer like tmux or screen often breaks it. In that article, I presented the ssh-agent-switcher : a program I put together in a few hours to fix this problem. In short, ssh-agent-switcher exposes an agent socket at a stable location ( by default) and proxies all incoming credential requests to the transient socket that the sshd server creates on a connection basis. In this article, I want to formalize this project by presenting its first actual release, 1.0.0, and explain what has changed to warrant this release number. I put effort into creating this formal release because ssh-agent-switcher has organically gained more interest than I imagined as it is solving a real problem that various people have. When I first wrote ssh-agent-switcher, I did so to fix a problem I was having at work: we were moving from local developer workstations to remote VMs, we required SSH to work on the remote VMs for GitHub access, and I kept hitting problems with the ssh-agent forwarding feature breaking because I’m an avid user of tmux . To explain the problem to my peers, I wrote the aforementioned article and prototyped ssh-agent-switcher after-hours to demonstrate a solution. At the end of the day, the team took a different route for our remote machines but I kept using this little program on my personal machines. Because of work constraints, I had originally written ssh-agent-switcher in Go and I had used Bazel as its build system. I also used my own shtk library to quickly write a bunch of integration tests and, because of the Bazel requirement, I even wrote my first ruleset, rules_shtk , to make it possible. The program worked, but due to the apparent lack of interest, I considered it “done” and what you found in GitHub was a code dump of a little project I wrote in a couple of free evenings. Recently, however, ssh-agent-switcher stopped working on a Debian testing machine I run and I had to fix it. Luckily, someone had sent a bug report describing what the problem was: OpenSSH 10.1 had changed the location where sshd creates the forwarding sockets and even changed their naming scheme, so ssh-agent-switcher had to adapt. Fixing this issue was straightforward, but doing so made me have to “touch” the ssh-agent-switcher codebase again and I got some interest to tweak it further. My energy to work on side-projects like this one and to write about them comes from your support. Subscribe now to motivate future content! As I wanted to modernize this program, one thing kept rubbing me the wrong way: I had originally forced myself to use Go because of potential work constraints. As these requirements never became relevant and I “needed to write some code” to quench some stress, I decided to rewrite the program in Rust. Why, you ask? Just because I wanted to. It’s my code and I wanted to have fun with it, so I did the rewrite. Which took me into a detour. You see: while command line parsing in Rust CLI apps is a solved problem , I had been using the ancient getopts crate in other projects of mine out of inertia. Using either library requires replicating some boilerplate across apps that I don’t like, so… I ended up cleaning up that “common code” as well and putting it into a new crate aptly-but-oddly-named getoptsargs . 
Take a look and see if you find it interesting… I might write a separate article on it. Doing this rewrite also made me question the decision to use Bazel (again imposed by constraints that never materialized) for this simple tool: as much as I like the concepts behind this build system and think it’s the right choice for large codebases, it was just too heavy for a trivial program like ssh-agent-switcher. So… I just dropped Bazel and wrote a Makefile—which you’d think isn’t necessary for a pure Rust project but remember that this codebase includes shell tests too. With the Rust rewrite done, I was now on a path to making ssh-agent-switcher a “real project” so the first thing I wanted to fix were the ugly setup instructions from the original code dump. Here is what the project README used to tell you to write into your shell startup scripts: Yikes. You needed shell-specific logic to detach the program from the controlling session so that it didn’t stop running when you logged out, as that would have made ssh-agent-switcher suffer from the exact same problems as regular sshd socket handling. The solution to this was to make ssh-agent-switcher become a daemon on its own with proper logging and “singleton” checking via PID file locking. So now you can reliably start it like this from any shell: I suppose you could make systemd start and manage ssh-agent-switcher automatically with a per-user socket trigger without needing the daemonization support in the binary per se… but I do care about more than just Linux and so assuming the presence of systemd is not an option. With that done, I felt compelled to fix a zero-day TODO that kept causing trouble for people: a fixed-size buffer used to proxy requests between the SSH client and the forwarded agent. This limitation caused connections to stall if the response from the ssh-agent contained more keys than fit in the buffer. The workaround had been to make the fixed-size buffer “big enough”, but that was still insufficient for some outlier cases and came with the assumption that the messages sent over the socket would fit in the OS internal buffers in one go as well. No bueno. Fixing this properly required one of the following: adding threads to handle reads and writes over two sockets in any order, dealing with the annoying / family of system calls, or using an async runtime and library (tokio) to deal with the event-like nature of proxying data between two network connections. People dislike async Rust for some good reasons, but async is the way to get to the real fearless concurrency promise. I did not fancy managing threads by hand, and I did not want to deal with manual event handling… so async it was. And you know what? Switching to async had two immediate benefits: Handling termination signals with proper cleanup became straightforward. The previous code had to install a signal handler and deal with potential races in the face of blocking system calls by doing manual polling of incoming requests, which isn’t good if you like power efficiency. Using tokio made this trivial and in a way that I more easily trust is correct. I could easily implement the connection proxying by using an event-driven loop and not having to reason about threads and their terminating conditions. 
Funnily enough, after a couple of hours of hacking, I felt proud of the proxying algorithm and the comprehensive unit tests I had written so I asked Gemini for feedback, and… while it told me my code was correct, it also said I could replace it all with a single call to a primitive! Fun times. I still don't trust AI to write much code for me, but I do like it a lot to perform code reviews.

Even with tokio in the picture and all of the recent new features and fixes, the Rust binary of ssh-agent-switcher is still smaller (by 100KB or so) than the equivalent Go one and I trust its implementation more.

Knowing that various people had found this project useful over the last two years, I decided to conclude this sprint by creating an actual "formal release" of ssh-agent-switcher. Formal releases require:

Documentation, which made me write a manual page.
A proper installation process, which made me write a traditional -like script because doesn't support installing supporting documents.
A tag and release number, which many people forget about doing these days but are critical if you want the code to be packaged in upstream OSes.

And with that, ssh-agent-switcher 1.0.0 went live on Christmas day of 2025. pkgsrc already has a package for it; what is your OS waiting for? 😉
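For the curious: the kind of primitive in question is presumably something along the lines of tokio's copy_bidirectional, which shuttles bytes both ways between two streams until both sides close. Here is a rough sketch of proxying one accepted client to a forwarded agent socket (the paths are placeholders and this is not ssh-agent-switcher's actual code; it assumes the tokio crate with its net and io utilities):

```rust
use tokio::io::copy_bidirectional;
use tokio::net::{UnixListener, UnixStream};

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Placeholder path; the real tool exposes a stable, well-known socket.
    let listener = UnixListener::bind("/tmp/ssh-agent.stable.sock")?;
    loop {
        let (mut client, _) = listener.accept().await?;
        tokio::spawn(async move {
            // Placeholder path; the real tool discovers sshd's transient socket.
            match UnixStream::connect("/tmp/ssh-XXXXXX/agent.12345").await {
                Ok(mut agent) => {
                    // Proxy bytes in both directions until either side closes.
                    let _ = copy_bidirectional(&mut client, &mut agent).await;
                }
                Err(err) => eprintln!("could not reach forwarded agent: {err}"),
            }
        });
    }
}
```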

0 views
Corrode 3 weeks ago

2025 Holiday Special

As we close the chapter on 2025 and celebrate our second year of 'Rust in Production', it's time to reflect on the highlights of the 17 episodes since our last holiday special. We looked at Rust from all angles, from cloud infrastructure to embedded systems, and from robotics to satellite technology. One thing that all these stories have in common is the passion and dedication of the Rust community to build faster, safer, and more reliable software. In this special episode, we look back at some of the memorable moments from the past year and celebrate Rust's achievements. This goes beyond the case studies we've covered; it's about the Rust community as a whole and the state of the Rust ecosystem at the end of 2025.

CodeCrafters helps you become proficient in Rust by building real-world, production-grade projects. Learn hands-on by creating your own shell, HTTP server, Redis, Kafka, Git, SQLite, or DNS service from scratch. Start for free today and enjoy 40% off any paid plan by using this link.

Code.Talks Talk - Matthias' presentation on Rust case studies
Stack Overflow Developer Survey 2025 - Rust as most admired language since 1.0 release
Brave with Anton Lazarev (S03E07) - Rust as the go-to language
Volvo with Julius Gustavsson (S03E08) - Empowering engineers
Astral with Charlie Marsh (S04E03) - Welcoming community leads to huge impact
Scythe with Andrew Tinka (S05E02) - Confidence in what you build
Rust4Linux CVE - The first CVE in Rust for Linux
Greg KH post - Context on kernel CVE statistics
curl with Daniel Stenberg (S02E01) - Bug reports every three hours, code constantly changes
curl statistics - How old code gets rewritten all the time
Tembo with Adam Hendel (S04E05) - Software is never done
Redis CVE-2025-49844 - Remote code execution vulnerability from use-after-free
Canonical with John Seager (S05E05) - Ubuntu is optimistic about Rust
Rust in Android - Memory safety vulnerabilities below 20%
Android statistics - 3.9 billion active devices worldwide
Roc with Richard Feldman (S05E04) - Focus on the end user
Svix with Tom Hacohen (S04E02) - Love it, but compile times…
Prime Video with Alexandru Ene (S05E01) - Build times need to improve
crates.io - 200 billion crate downloads and 200k published crates
Cloudflare with Kevin Guthrie and Edward Wang (S05E03) - Ecosystem is fantastic; thanks to all maintainers
Rust Conferences 2026 - Complete list of upcoming Rust conferences
CodeCrafters Course - Build your own HTTP server in Rust
Rust Project Goals - November update on 41 active project goals
cargo-script RFC - Run Rust scripts without full Cargo projects
Better pin ergonomics RFC - Improving async Rust ergonomics
KSAT with Vegard Sandengen (S04E07) - Make async better
1Password with Andrew Burkhart (S04E06) - Make it easier to learn Rust
Rust Book by Brown University - Interactive learning resource for Rust
Clippy lints - All available linter rules for Rust
C++ and Rust interop - Safer language interoperability initiative
Microsoft with Victor Ciura (S04E01) - C++ doesn't have to die for Rust to succeed
BorrowSanitizer initiative - LLVM instrumentation for detecting aliasing violations
Polonius - Next-generation borrow checker
Rust with Niko Matsakis (S04E04) - Be excellent to each other (Bill & Ted reference)

0 views
matklad 3 weeks ago

Static Allocation For Compilers

TigerBeetle famously uses "static allocation". Infamously, the use of the term is idiosyncratic: what is meant is not arrays, as found in embedded development, but rather a weaker "no allocation after startup" form. The amount of memory the TigerBeetle process uses is not hard-coded into the ELF binary. It depends on the runtime command line arguments. However, all allocation happens at startup, and there's no deallocation. The long-lived event loop goes round and round happily without allocating.

I've wondered for years if a similar technique is applicable to compilers. It seemed impossible, but today I've managed to extract something actionable from this idea?

Static allocation depends on the physics of the underlying problem. And distributed databases have surprisingly simple physics, at least in the case of TigerBeetle. The only inputs and outputs of the system are messages. Each message is finite in size (1MiB). The actual data of the system is stored on disk and can be arbitrarily large. But the diff applied by a single message is finite. And, if your input is finite, and your output is finite, it's actually quite hard to need to allocate extra memory!

This is worth emphasizing — it might seem like doing static allocation is tough and requires constant vigilance and manual accounting for resources. In practice, I learned that it is surprisingly compositional. As long as inputs and outputs of a system are finite, non-allocating processing is easy. And you can put two such systems together without much trouble. routing.zig is a good example of such an isolated subsystem.

The only issue here is that there isn't a physical limit on how many messages can arrive at the same time. Obviously, you can't process arbitrarily many messages simultaneously. But in the context of a distributed system over an unreliable network, a safe move is to drop a message on the floor if the required processing resources are not available. Counter-intuitively, not allocating is simpler than allocating, provided that you can pull it off!

Alas, it seems impossible to pull it off for compilers. You could say something like "hey, the largest program will have at most one million functions", but that will lead to both wasted memory and poor user experience. You could also use a single yolo arena of a fixed size, like I did in Hard Mode Rust, but that isn't at all similar to "static allocation". With arenas, the size is fixed explicitly, but you can OOM. With static allocation it is the opposite — no OOM, but you don't know how much memory you'll need until startup finishes!

The "problem size" for a compiler isn't fixed — both the input (source code) and the output (executable) can be arbitrarily large. But that is also the case for TigerBeetle — the size of the database is not fixed, it's just that TigerBeetle gets to cheat and store it on disk, rather than in RAM. And TigerBeetle doesn't do "static allocation" on disk, it can fail with an error at runtime, and it includes a dynamic block allocator to avoid that as long as possible by re-using no longer relevant sectors.

So what we could say is that a compiler consumes arbitrarily large input, and produces arbitrarily large output, but those "do not count" for the purpose of static memory allocation. At the start, we set aside an "output arena" for storing finished, immutable results of the compiler's work. We then say that this output is accumulated after processing a sequence of chunks, where chunk size is strictly finite.
While limiting the total size of the code-base is unreasonable, limiting a single file to, say, 4 MiB (runtime-overridable) is fine. Compiling then essentially becomes a "stream processing" problem, where both inputs and outputs are arbitrarily large, but the filter program itself must execute in O(1) memory. With this setup, it is natural to use indexes rather than pointers for "output data", which then makes it easy to persist it to disk between changes. And it's also natural to think about "chunks of changes" not only spatially (compiler sees a new file), but also temporally (compiler sees a new version of an old file).

Are there any practical benefits here? I don't know! But it seems worth playing around with! I feel that a strict separation between O(N) compiler output and O(1) intermediate processing artifacts can clarify a compiler's architecture, and I won't be too surprised if O(1) processing in compilers would lead to simpler code the same way it does for databases?
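For concreteness, here is a rough sketch of that "O(1) filter" shape (my own illustration, not from TigerBeetle or any existing compiler): one fixed-size chunk buffer allocated up front, a hard per-file limit, and only fixed-form results appended to the output.

```rust
use std::fs::File;
use std::io::{self, Read};

const MAX_FILE_BYTES: usize = 4 * 1024 * 1024; // "runtime-overridable" in spirit

/// All scratch memory is allocated once, up front; processing reuses it.
struct Frontend {
    chunk: Vec<u8>,   // O(1): bounded by MAX_FILE_BYTES
    output: Vec<u64>, // O(N): accumulated results; a plain Vec for brevity,
                      // whereas the post would set this arena aside at startup.
}

impl Frontend {
    fn new() -> Frontend {
        Frontend { chunk: vec![0; MAX_FILE_BYTES], output: Vec::new() }
    }

    /// Process one source file without allocating beyond the output.
    fn process_file(&mut self, path: &str) -> io::Result<()> {
        let mut file = File::open(path)?;
        if file.metadata()?.len() > MAX_FILE_BYTES as u64 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "file exceeds the per-file limit",
            ));
        }
        let mut len = 0;
        loop {
            let n = file.read(&mut self.chunk[len..])?;
            if n == 0 { break; }
            len += n;
        }
        // Stand-in for real work: record the offset of every newline.
        for (i, b) in self.chunk[..len].iter().enumerate() {
            if *b == b'\n' { self.output.push(i as u64); }
        }
        Ok(())
    }
}

fn main() {
    let mut fe = Frontend::new();
    // Hypothetical input path, just to exercise the sketch.
    if let Err(err) = fe.process_file("src/main.zig") {
        eprintln!("skipping file: {err}");
    }
    println!("recorded {} line offsets", fe.output.len());
}
```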

0 views

CHERIoT: Complete Memory Safety for Embedded Devices

CHERIoT: Complete Memory Safety for Embedded Devices
Saar Amar, David Chisnall, Tony Chen, Nathaniel Wesley Filardo, Ben Laurie, Kunyan Liu, Robert Norton, Simon W. Moore, Yucong Tao, Robert N. M. Watson, and Hongyan Xia
Micro'23

If you are like me, you've vaguely heard of CHERI, but never really understood what it is about. Here it is in one sentence: hardware support for memory protection embedded in every single pointer. This particular paper focuses on CHERI implementation details for embedded/IoT devices.

The C in CHERI stands for capability. A capability is a fat pointer which contains an address, bounds, and access permissions. Fig. 1 shows the bit layout (64 bits total) of the capabilities used by CHERIoT.

Source: https://dl.acm.org/doi/10.1145/3613424.3614266

Here is the fundamental concept to understand: the only way to access memory in a CHERI architecture is via a capability. There are no pointers other than capabilities. The hardware uses special tag bits (associated with registers and memory locations) to track which registers or memory addresses contain valid capabilities, and which do not. In the paper's example, a regular old integer has its associated tag bit unset; a pointer generated by reinterpreting the bits of that integer therefore also has an unset tag bit, and memory reads/writes through it will fail.

If a programmer cannot create a capability willy-nilly, where do they come from? At boot, the hardware creates the uber-capability (i.e., one which has full permissions to access all memory) and places this capability into a specific register. The initial OS code that runs on boot can access this capability and can use special instructions to derive new capabilities from it. For example, the OS could derive a capability which has read-only access to the first 1 MiB of memory. The owner of a capability may derive new capabilities from it, but hardware ensures a derived capability cannot have broader permissions than the capability from which it was derived.

CHERIoT is designed for embedded use cases, which have real-time requirements. MMUs/MPUs can add variability because they usually contain caches (e.g., TLBs) which have dramatically different performance characteristics in hit vs. miss cases. CHERIoT does away with this. There is no memory translation, and memory protection is supported on a per-capability basis (as opposed to per-page tracking in the MPU). This is pretty cool: capabilities not only give fine-grained memory protection, but they also make performance more consistent by removing a cache from the system.

Each capability represents a range of memory which can be accessed. Three fields (comprising 22 bits total) in each capability are used to represent the size of memory which is accessible by the capability. The encoding is a bit like floating point, with an exponent field which allows small sizes (i.e., less than 512 bytes) to be represented exactly, while larger sizes require padding.

Astute readers will ask themselves: "how does CHERIoT prevent use-after-free bugs? A call to free() must somehow invalidate all capabilities which point at the freed region, correct?" CHERIoT introduces heap revocation bits. Because the total amount of RAM is often modest in embedded use cases, CHERIoT can get away with a dedicated SRAM to hold heap revocation bits. There is 1 revocation bit per 8 bytes of RAM. Most software does not have direct access to these bits, but the heap allocator does. All revocation bits are initially set to zero.
When the heap allocator frees memory, it sets the corresponding bits to one. The hardware uses these bits to prevent capabilities from accessing freed memory. You may think that CHERIoT checks revocation bits on each memory access, but it does not. Instead, the hardware load filter checks the revocation bits when the special "load capability" instruction is executed. This instruction is used to load a capability from memory into a register. The tag bit associated with the destination register is set to one only if the revocation bit associated with the address the capability points to is zero, and the tag bit of the source address is one.

The final ingredient in this recipe is akin to garbage collection. CHERIoT supports what I like to think of as a simple garbage collection hardware accelerator called the background pipelined revoker. Software can request this hardware to scan a range of memory. Scanning occurs "in the background" (i.e., in clock cycles where the processor was not accessing memory). The background revoker reuses existing hardware to load each potential capability in the specified memory range, and then store it back. The load operation reads the associated tag bit and revocation bit, while the store operation updates the tag bit. This clears the tag bit for any capability that points to revoked memory. Once the background revoker has finished scanning all memory, the heap allocator can safely set the revocation bits associated with recently freed allocations back to zero and reuse the memory to satisfy future heap allocations.

The authors modified two existing processors to support CHERIoT. Flute is a 5-stage processor with a 64-bit memory bus. Ibex is a 2- or 3-stage processor with a 32-bit memory bus. Table 2 shows the area and power cost associated with extending the Ibex processor to support CHERIoT (roughly 2x for both metrics).

Source: https://dl.acm.org/doi/10.1145/3613424.3614266

Table 3 uses CoreMark to measure the performance overhead associated with CHERIoT.

Source: https://dl.acm.org/doi/10.1145/3613424.3614266

Dangling Pointers

I would be interested to know how easily C#/Go/Rust can be modified to use CHERI hardware bounds checking rather than software bounds checking for array accesses. This seems like an area where CHERI could win back some performance.
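To make the load-filter rule concrete, here is a tiny software model of the check as described above (an illustration of the stated semantics only, not real CHERIoT RTL or ISA code): a capability loaded from memory keeps its tag only if the stored word was tagged and the revocation bit for the memory it points to is clear.

```rust
/// Toy model of the CHERIoT load filter: one revocation bit per 8 bytes of
/// RAM, and a tag that survives a "load capability" only if the stored
/// capability was tagged and its target has not been revoked.
struct ToyHeap {
    revocation_bits: Vec<bool>, // index = address / 8
}

#[derive(Clone, Copy)]
struct Capability {
    address: usize,
    tag: bool,
}

impl ToyHeap {
    fn revoked(&self, address: usize) -> bool {
        self.revocation_bits[address / 8]
    }

    /// Models the tag computation of the "load capability" instruction.
    fn load_capability(&self, stored: Capability) -> Capability {
        Capability {
            address: stored.address,
            tag: stored.tag && !self.revoked(stored.address),
        }
    }
}

fn main() {
    let mut heap = ToyHeap { revocation_bits: vec![false; 1024] };
    let cap = Capability { address: 128, tag: true };

    assert!(heap.load_capability(cap).tag); // allocation still live

    // free() would set the revocation bits covering the allocation at 128.
    heap.revocation_bits[128 / 8] = true;
    assert!(!heap.load_capability(cap).tag); // tag cleared: use-after-free stopped
    println!("toy load filter behaves as described");
}
```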

0 views
ENOSUCHBLOG 1 months ago

Dependency cooldowns, redux

Three weeks ago I wrote about how we should all be using dependency cooldowns. This got a lot of attention, which is great! So I figured I owe you, dear reader, an update with (1) some answers to common questions I've received, and (2) some updates on movements in large open source ecosystems since the post's publication.

Question: Aren't cooldowns a self-defeating policy? In other words: if everyone uses cooldowns wouldn't we be in the exact same situation, just shifted back by the cooldown period?

Answer: I think there are two parts to this:

The observation in the original post is that there are parties other than downstreams in the open source ecosystem, namely security partners (vendors) and index maintainers themselves. These parties have strong incentives to proactively monitor, report, and remove malicious packages from the ecosystem. Most importantly, these incentives are timely, even when user installations are not.

Even with a universal exhortation to use cooldowns, I think universal adoption is clearly not realistic: there are always going to be people who live at the edge. If those people want to be the proverbial canaries in the coal mine, that's their prerogative! Or in other words, we certainly all should be using cooldowns, but clearly that's never going to happen.

Question: What about security updates? Wouldn't cooldowns delay important security patches?

Answer: I guess so, but you shouldn't do that! Cooldowns are a policy, and all policies have escape hatches. The original post itself notes an important escape hatch that already exists in how cooldowns are implemented in tools like Dependabot: cooldowns don't apply to security updates. In other words: ecosystems that are considering implementing cooldowns directly in their packaging tools should make sure that users can encode exceptions 1 as necessary.

Question: Doesn't this incentivize attackers to abuse the vulnerability disclosure process? In other words, what stops an attacker from reporting their own malicious release as a vulnerability fix in order to bypass cooldowns?

Answer: Nothing, in principle: this certainly does make vulnerability disclosures themselves an attractive mechanism for bypassing cooldowns. However, in practice, I think an attacker's ability to do this is limited by (at least) three factors:

Creating a public vulnerability disclosure on a project (e.g. via GitHub Security Advisories) generally requires a higher degree of privilege/comprehensive takeover than simply publishing a malicious version. Specifically: most of the malicious publishing activity we've seen thus far involves a compromised long-lived publishing credential (e.g. an npm or PyPI API token), rather than full account takeover. We might see that kind of full ATO in the future, but it's a significantly higher bar (particularly in the presence of in-session MFA challenges on GitHub).

The name of the game continues to be maximizing the window of opportunity, which is often shorter than a single day. At this timescale, hours are significant. But fortunately 2 for us, propagating a public vulnerability disclosure takes a nontrivial amount of time: CVEs take a nontrivial amount of time to assign 3 , and ecosystem-level propagation of vulnerability information (e.g. into RUSTSEC or PYSEC) typically happens on the timeframe of hours (via scheduled batch jobs). Consequently, abusing the vulnerability disclosure process to bypass cooldowns requires the attacker to shorten their window of opportunity, which isn't in their interest.
That doesn't mean they won't do it (especially as the update loop between advisories and ecosystem vulnerability databases gets shorter), but it does stand to reason that it disincentivizes this kind of abuse to some degree.

Stealth. Creating a public vulnerability disclosure is essentially a giant flashing neon sign to security-interested parties to look extremely closely at a given release. More specifically, it's a signal to those parties that they should diff the new release against the old (putatively vulnerable) one, to look for the vulnerable code. This is the exact opposite of what the attacker wants: they're trying to sneak malicious code into the new release, and are trying to avoid drawing attention to it.

Question: What about opportunistic abuse of the vulnerability disclosure process? For example, if is vulnerable and is a legitimate security update, what stops the attacker from publishing with malicious code immediately after ?

Answer: This is a great example of why cooldown policies (and their bypasses) struggle to be universal without human oversight. Specifically, it demonstrates why bypasses should probably be minimal by default: if both and claim to address a vulnerability and both require bypassing the cooldown, then selecting the lower version is probably the more correct choice in an automatic dependency updating context.

This is a somewhat free-form section for ecosystem changes I've noticed since the original post. If there are others I've missed, please let me know!

uv has added support for dependency cooldowns via relative values. Recently released versions of uv already include this feature. For example, a user can do to exclude any dependency updates published within the last seven days. Documentation: uv - dependency cooldowns. References: astral-sh/uv#16814

pip is adding an absolute point-in-time feature via . This is currently slated to be released with pip 26, i.e. early next year. In addition to the "absolute" cooldown feature above, pip is also considering adding a relative cooldown feature similar to uv's. This is being tracked in pypa/pip#13674. References: pypa/pip#13625, pypa/pip#13674

cargo is discussing a design for cooldowns in rust-lang/cargo#15973. This discussion predates my blog post, but appears to have been reinvigorated by it.

There's been some discussion in the Bundler community Slack about adding cooldowns directly to the gem.coop index, e.g. providing index "views" like for index-level cooldowns. I think this is a very cool approach!

The NuGet community is discussing a cooldown design in NuGet/Home#14657.

pnpm has had cooldowns since September with v10.16, with ! They even have a cooldown exclusion feature.

yarn added cooldown support one month after pnpm, via

npm does not have cooldown support yet. There appear to be several discussions about it, some of which date back years. npm/rfcs#646 and npm/cli#8570 appear to have most of the context. npm/cli#8802 is also open adding an implementation of the above RFC.

Go is discussing the feasibility/applicability of cooldowns in golang/go#76485.

pinact has added a flag to support cooldowns for GitHub Actions dependencies. Renovate and Dependabot already do a decent job of providing cooldowns for GitHub Actions updates, including support for hash-pinning and updating the associated version comment.

Security updates are the most obvious exception, but it also seems reasonable to me to allow people to encode exceptions for data-only dependencies, first-party dependencies, trusted dependencies, and so forth. It's clearly a non-trivial and non-generalizable problem!  ↩

For some definition of "fortunate": clearly we want TTLs for vulnerability disclosures to be as short as possible in normal, non-malicious circumstances!  ↩

Unlike GHSAs, which are typically assigned instantly. For better or worse however, CVE IDs continue to be the "reference" identifier for ecosystem-level vulnerability information propagation.  ↩

1 views
Corrode 1 months ago

Rust for Linux

Bringing Rust into the Linux kernel is one of the most ambitious modernization efforts in open source history. The Linux kernel, with its decades of C code and deeply ingrained development practices, is now opening its doors to a memory-safe language. It's the first time in over 30 years that a new programming language has been officially adopted for kernel development. But the journey is far from straightforward. In this episode, we speak with Danilo Krummrich, Linux kernel maintainer and Rust for Linux core team member, about the groundbreaking work of integrating Rust into the Linux kernel. Among other things, we talk about the Nova GPU driver, a Rust-based successor to Nouveau for NVIDIA graphics cards, and discuss the technical challenges and cultural shifts required for large-scale Rust adoption in the kernel as well as the future of the Rust4Linux project.

CodeCrafters helps you become proficient in Rust by building real-world, production-grade projects. Learn hands-on by creating your own shell, HTTP server, Redis, Kafka, Git, SQLite, or DNS service from scratch. Start for free today and enjoy 40% off any paid plan by using this link.

Rust for Linux is a project aimed at bringing the Rust programming language into the Linux kernel. Started to improve memory safety and reduce vulnerabilities in kernel code, the project has been gradually building the infrastructure, abstractions, and tooling necessary for Rust to coexist with the kernel's existing C codebase.

Danilo Krummrich is a software engineer at Red Hat and a core contributor to the Rust for Linux project. In January 2025, he was officially added as a reviewer to the RUST entry in the kernel's MAINTAINERS file, recognizing his expertise in developing Rust abstractions and APIs for kernel development. Danilo maintains the and branches and is the primary developer of the Nova GPU driver, a fully Rust-based driver for modern NVIDIA GPUs. He is also a maintainer of RUST [ALLOC] and several DRM-related kernel subsystems.

AOSP - The Android Open Source Project
Kernel Mailing Lists - Where the Linux development happens
Miguel Ojeda - Rust4Linux maintainer
Wedson Almeida Filho - Retired Rust4Linux maintainer
nouveau driver - The old driver for NVIDIA GPUs
Vulkan - A low level graphics API
Mesa - Vulkan and OpenGL implementation for Linux
vtable - Indirect function call, a source of headaches in nouveau
DRM - Direct Rendering Manager, Linux subsystem for all things graphics
Monolithic Kernel - Linux' kernel architecture
The Typestate Pattern in Rust - A very nice way to model state machines in Rust
pinned-init - The userspace crate for pin-init
rustfmt - Free up space in your brain by not thinking about formatting
kunit - Unit testing framework for the kernel
Rust core crate - The only part of the Rust Standard Library used in the Linux kernel
Alexandre Courbot - NVIDIA employed co-maintainer of nova-core
Greg Kroah-Hartman - Linux Foundation fellow and major Linux contributor
Dave Airlie - Maintainer of the DRM tree
vim - not even neovim
mutt - classic terminal e-mail client
aerc - a pretty good terminal e-mail client
Rust4Linux Zulip - The best entry point for the Rust4Linux community
Rust for Linux GitHub
Danilo Krummrich on GitHub
Danilo Krummrich on LinkedIn

0 views

packnplay: Making it easy to run coding agents in containers

TL;DR: I built a tool to make it easier to run your favorite coding agent in a container without a lot of setup. It's called packnplay. You can find it on GitHub.

A couple months back, the folks at StrongDM open sourced Leash, a tool for Docker and macOS that gives you really granular hooks to control your coding agents. You can allow and deny individual network connections and syscalls. On the frontend, there's a slick webui that gives you a realtime view of what your agents are doing and the ability to toggle access to resources on the fly. The whole thing is built around Amazon Cedar. On the backend, they've done some really impressive work to instrument Docker to make this possible. But that's not half as cool as what they've done for the macOS native version of the sandbox. Since they're an enterprise security company, they managed to talk Apple into giving them the entitlement to build a system extension that provides a syscall and network filter. Think Little Snitch, but with filesystem and syscall control, too. And then they built a wrapper that runs your coding agent with that magic enabled, giving you access to the same dashboards as the Docker implementation.

After spending a little bit of time with Leash, I was a convert to the idea of running my agents in containers. Leash is built to support enterprises running huge swarms of agents who need very fine-grained access control. I usually have a dozen at most. In a lot of ways, it's overkill for what I need. What I wanted was simpler: spin up a dev container preconfigured for a coding agent, with the right credentials and source code mounted, and let the agent run in mode without worrying about it escaping.

So I built packnplay, a wrapper around Docker/Orbstack/etc that gives you an easy way to spin up a relatively safe, ephemeral container to let your agent go wild. is all it takes to set up a new container. Your project's source mounts at the same path you'd see it outside the container. So if I'm working on Superpowers at , that's exactly where the directory will be mounted inside the container. To get another session in the same container, just run something like this from the directory where you launched packnplay:

Containers don't automatically shut down when your agent session disconnects, so it's possible to restart to pick up config changes, etc. The default container and settings runtime support a whole bunch of coding agents today: claude, gemini-cli, codex, copilot, qwen-code, amp, opencode.

By far, the most complicated part of all of this was figuring out how to make sure that Claude Code doesn't log you out and that your Claude Code settings don't get corrupted when you're running agents both inside and outside the container. (The very, very short version is that is constantly rewritten by Claude Code and their file locking doesn't work across container boundaries, so we give your containers a standalone copy. Similarly, if you reuse the same Claude Code subscription token across two different operating systems, it appears that Anthropic's anti-fraud systems kick in and expire the token. So packnplay maintains its own Claude Code subscription token for use in your containers.)

The default container comes prepopulated with common tooling for typescript, go, python, rust, etc. Corey Quinn contributed AWS tooling and configurable AWS credential management. If you're building with AWS, things should just work. There's basic configuration for GCP, as well. You can configure packnplay to proxy your Git and GitHub credentials into containers.
And, if the default container doesn't float your boat, you can use a standard devcontainer config. packnplay will transparently pick up the .devcontainer config from your project's repo and use that in preference to our default container. With automatic worktree support, custom environment variable 'bundles', and port forwarding, packnplay ought to be able to match your existing workflows. Give your agents a safe place to do their thing with packnplay. If packnplay doesn't yet support your favorite agent, I'd love a PR.

0 views
Blog System/5 1 months ago

From Azure Functions to FreeBSD

On Thanksgiving morning, I woke up to one of my web services being unavailable. All HTTP requests failed with a “503 Service unavailable” error. I logged into the console, saw a simplistic “Runtime version: Error” message, and was not able to diagnose the problem. I did not spend a lot of time trying to figure the issue out and I didn’t even want to contact the support black hole. Because… there was something else hidden behind an innocent little yellow warning at the top of the dashboard: Migrate your app to Flex Consumption as Linux Consumption will reach EOL on September 30 2028 and will no longer be supported. I had known for a few weeks now, while trying to set up a new app, that all of my Azure Functions apps were on death row. The free plan I was using was going to be decommissioned and the alternatives I tried didn’t seem to support custom handlers written in Rust. I still had three years to deal with this, but hitting a showstopper error pushed me to take action. All of my web services are now hosted by the FreeBSD server in my garage with just a few tweaks to their codebase. This is their migration story. Blog System/5 and the open source projects described below are all made in my limited free time. Subscribe now to show your support; it goes a long way! Back in 2021, I had been developing my EndBASIC language for over a year and I wanted to create a file sharing service for it. Part of this was to satisfy my users, but another part was to force myself into the web services world as I felt “behind”. At that time, I had also been at Microsoft for a few months already working on Azure Storage. One of the perks of the job was something like $300 of yearly credit to deploy stuff on Azure for learning purposes. It was only “natural” that I’d pick Azure for what I wanted to do with EndBASIC. Now… $300 can be plentiful for a simple app, but it can also be paltry. Running a dedicated VM would eat through this in a couple of months, but the serverless model offered by Azure Functions with its “infinite” free tier would go a long way. I looked at their online documentation, found a very good guide on how to deploy Rust-native functions onto a Linux runtime , and… I was sold. I quickly got a bare bones service up and running on Azure Functions and I built it up from there. Based on these foundations, I later developed a separate service for my own site analytics (poorly named EndTRACKER ), and I recently started working on a new service to provide secure auto-unlock of encrypted ZFS volumes (stay tuned!). And, for the most part, the experience with Azure has been neat. I learned a bunch and I got to a point where I had set up “push on green” via GitHub Actions and dual staging vs. prod deployments. The apps ran completely on their own for the last three years, a testament to the stability of the platform and to the value of designing for testability . Until now that is. Compute-wise, I was set: Azure Functions worked fine as the runtime for my apps’ logic and it cost pennies to run, so the $300 was almost untouched. But web services aren’t made of compute alone: they need to store data, which means they need a database. My initial research in 2021 concluded that the only option for a database instance with a free plan was to go with, no surprise, serverless Microsoft SQL Server (MSSQL). I had never used Microsoft’s offering but it couldn’t be that different from PostgreSQL or MySQL, could it? Maybe so, but I didn’t get very far in that line of research. 
The very first blocker I hit was that the MSSQL connection required TLS and this hadn't been implemented in the connector I chose to use for my Rust-based functions. I wasted two weeks implementing TLS support in (see PR #1200 and PR #1203) and got it to work, but that code was not accepted upstream because it conflicted with their business strategy. Needless to say, this was disappointing because getting that to work was a frigging nightmare. In any case, once I passed that point, I started discovering more missing features and bugs in the MSSQL connector, and then I also found some really weird surprises in MSSQL's dialect of SQL. TL;DR, this turned into a dead end.

On the left, the default instance and cost selected by Azure when choosing to create a managed PostgreSQL server today. On the right, minimum possible cost after dialing down CPU, RAM, disk, and availability requirements.

I had no choice other than to provision a full PostgreSQL server on Azure. Their onboarding wizard tried to push me towards a pretty beefy and redundant instance that would cost over $600 per month when all I needed was the lowest machine you could get for the amount of traffic I expected. Those options were hidden under a "for development only" panel and riddled with warnings about no redundancy, but after I dialed all the settings down and accepted the "serious risks", I was left with an instance that'd cost $15 per month or so. This fit well within the free yearly credit I had access to, so that was it.

About two months ago, I started working on a new service to securely auto-unlock ZFS encrypted volumes (more details coming). For this, I had to create a new Azure Functions deployment… and I started seeing the writing on the wall. I don't remember the exact details, but it was really difficult to get the creation wizard to provision me the same flex plan I had used for my other services, and it was warning me that the selected plan was going to be axed in 2028. At the time of this writing, 2028 is still three years out and this warning was for a new service I was creating. I didn't want to consider migrating either EndBASIC or EndTRACKER to something else just yet. Until Thanksgiving, that was.

On Thanksgiving morning, I noticed that my web analytics had stopped working. All HTTP API requests failed with a "503 Service unavailable." error but, interestingly, the cron-triggered APIs were still running in the background just fine and the staging deployment slot of the same app worked fine end-to-end as well. I tried redeploying the app with a fresh binary, thinking that a refresh would fix the problem, but that was of no use. I also poked through the dashboard trying to figure out what "Runtime version: Error" would be about, making sure the version spec in was up-to-date, and couldn't figure it out either.

Summary state of my problematic Azure Functions deployment. Note the cryptic runtime error along with the subtle warning at the top about upcoming deprecations.

So… I had to get out of Azure Functions, quick. Not accidentally, I had bought a second-hand, over-provisioned ThinkStation (2x36-core Xeon E5-2697, 64 GB of RAM, a 2 TB NVMe drive, and a 4x4 TB HDD array) just two years back. The justification I gave myself was to use it as my development server, but I had this idea in the back of my mind to use it to host my own services at some point. The time to put it to serving real-world traffic with FreeBSD 14.x had come.
The way you run a serverless Rust (or Go) service on Azure Functions is by creating a binary that exposes an HTTP server on the port provided to it via an environment variable. Then, you package the binary along with a set of metadata JSON files that tell the runtime what HTTP routes the binary serves and push the packaged ZIP file to Azure. From there on, the Azure Functions runtime handles TLS termination for those routes, spawns your binary server on a micro VM on demand, and redirects the requests to it. By removing the Azure Functions runtime from the picture, I had to make my server binary stand alone. This was actually pretty simple because the binary was already an HTTP server: it just had to be coerced into playing nicely with FreeBSD’s approach to running services. In particular, I had to: inject configuration variables into the server process at startup time (these used to come from the Azure Functions configuration page, and are necessary to tell the server where the database lives and what credentials to use); make the service run as an unprivileged user, easily; create a PID file to track the execution of the process so that the framework could handle restarts and stop requests; and store the logs that the service emits via stderr to a log file, and rotate the log to prevent local disk overruns. Most daemons implement all of the above as features in their own code, but I did not want to have to retrofit all of these into my existing HTTP service in a rush. Fortunately, FreeBSD provides this little tool, daemon(8), which wraps an existing binary and offers all of the above. A single daemon(8) incantation was enough to get me going. I won’t dive into the details of each flag, but in short, the flags I passed control which PID file to create, where to store the stdout and stderr of the process, the behavior required for log rotation (much more below), which user to drop privileges to, and the “title” of the process to display in process listings. This setup was also sufficient to inject configuration variables upon process startup, simulating the same environment that my server used to see when spawned by the Azure Functions runtime. Hooking that up into a service script was then trivial, and with that: ta-da! I had the service running locally and listening to a local port determined in the configuration file. As part of the migration out of Azure Functions, I switched to self-hosting PostgreSQL as well. This was straightforward but required a couple of extra improvements to my web framework: one to stop using a remote PostgreSQL instance for tests (something I should have done eons ago), and another to support local peer authentication to avoid unnecessary passwords. In the daemon(8) call above, I briefly mentioned the need for a flag to support log rotation. What’s that about? You see, in Unix-like systems, when a process opens a file, the process holds a handle to the open file. If you delete or rename the file, the handle continues to exist exactly as it was. This has two consequences: if you rename the file, all subsequent reads and writes go to the new file location, not the old one; if you delete the file, all subsequent reads and writes continue to go to disk but to a file you cannot reference anymore, meaning you can run out of disk space and, while the full disk will be plainly visible, you will not be able to find which file is actually consuming the space! For a long-running daemon that spits out verbose logs, writing them to a file can become problematic because you can end up running out of disk space.
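If you want to see this behavior for yourself, here is a tiny Rust experiment (illustrative only; the file names are made up, and it assumes a Unix-like system such as FreeBSD, where renaming an open file is allowed):

```rust
use std::fs::{self, File, OpenOptions};
use std::io::Write;

fn main() -> std::io::Result<()> {
    // Open a "log" file and keep the handle around, like a daemon would.
    let mut log = File::create("demo.log")?;
    writeln!(log, "first line")?;

    // Rotate the file behind the writer's back.
    fs::rename("demo.log", "demo.log.0")?;

    // The handle still refers to the same file, so this write lands in
    // demo.log.0, not in a freshly created demo.log.
    writeln!(log, "second line")?;
    log.flush()?;

    // Writes only reach a new demo.log once the path is reopened, which is
    // exactly the cooperation that the signal-triggered reopen provides.
    let mut reopened = OpenOptions::new().create(true).append(true).open("demo.log")?;
    writeln!(reopened, "third line")?;

    println!("demo.log.0 contains: {:?}", fs::read_to_string("demo.log.0")?);
    println!("demo.log contains:   {:?}", fs::read_to_string("demo.log")?);
    Ok(())
}
```

The second line ends up in the renamed file, and nothing reaches the new demo.log until the path is explicitly reopened.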
To solve this problem, daemons typically implement log rotation: a mechanism to keep log sizes in check by moving them aside when a certain period of time passes or when they cross a size threshold, and then only keeping the last N files around. Peeking into one of the many examples in my server, the “live” log is where writes go, but there is a daily archive for up to a week. Having all daemons implement log rotation logic on their own would be suboptimal because you’d have duplicate logic throughout the system and you would not be able to configure policy easily for them all. This is where newsyslog(8) on FreeBSD (or its equivalent on Linux) comes into play. It is a tool that rotates log files based on criteria such as size or time and optionally compresses them. But remember: the semantics of open file handles mean that simply renaming log files is insufficient! Once newsyslog takes action and moves a log file aside, it must ensure that the process that was writing to that file closes the file handle and reopens it so that writes start going to the new place. This is typically done by sending a signal to the daemon, and is why we needed that extra flag in the daemon(8) call. To illustrate the sequence: the system starts a service via daemon(8) and redirects its logs to a file. newsyslog runs and determines that the log needs to be rotated because a day has passed. newsyslog renames the live log to its archived name and creates a new, empty live file; at this point the service is still writing to the renamed file! newsyslog then sends a signal to the process. The process closes its file handle for the log, reopens the live path (which is the fresh new log file), and resumes writing. Finally, newsyslog compresses the rotated file for archival now that it’s quiesced. Configuring newsyslog is easy, but cryptic. We can create a service-specific configuration file that provides entries for our service; I’ll leave you to the manpage to figure out what the magic is (but in short, it controls retention count, rotation schedule, and compression). As I briefly mentioned earlier, the Azure Functions runtime was responsible for TLS termination in my previous setup. Without such a runtime in place, I had to configure TLS on my own in my HTTP server… or did I? I had been meaning to play with Cloudflare Tunnels for a while given that I already use Cloudflare for DNS. Zero Trust Tunnels allow you to expose a service without opening inbound ports in your firewall. The way this works is by installing the tunnel daemon on your machine and configuring the tunnel to redirect certain URL routes to an internal address (typically on localhost). Cloudflare then acts as the frontend for the requests, handles TLS termination and DDoS protection, and then redirects the request to your local service. Interactions between client machines, Cloudflare servers, the cloudflared tunnel agent, and the actual HTTP servers I wrote. The obvious downside of relying on someone else to do TLS termination instead of doing it yourself on your own server is that they can intercept and modify your traffic. For the kinds of services I run this isn’t a big deal for me, and the simplicity of others dealing with certificates is very welcome. Note that I was already offloading TLS termination to Azure Functions anyway, so this isn’t a downgrade in security or privacy. But using Cloudflare as the frontend came with a little annoyance: CORS handling. You see: the services I run require configuring extra allowed origins, and as soon as I tried to connect to them via the Cloudflare tunnel, I’d get the dreaded “405 Method not allowed” error in the requests.
Before, I used to configure CORS origins from the Azure Functions console, but no amount of peeking through the Cloudflare console showed me how to do this for my tunneled routes. At some point during the investigation, I assumed that I had to configure CORS on my own server. I’m not sure how I reached that bogus conclusion, but I ended up wasting a few hours implementing a configuration system for CORS in my web framework. Nice addition… but ultimately useless. I had not accounted for the fact that because Cloudflare acts as the frontend for the services, it is the one responsible for handling the pre-flight HTTP requests necessary for CORS. In turn, this means that Cloudflare is where CORS needs to be configured but there is nothing “obvious” about configuring CORS in the Cloudflare portal. AI to the rescue! As skeptical as I am of these tools, it’s true that they work well to get answers to common problems—and figuring out how to deal with CORS in Cloudflare was no exception. They told me to configure a transformation rule that explicitly sets CORS response headers for specific subdomains, and that did the trick: Sample rule configuration on the Cloudflare portal to rewrite CORS response headers. Even though AI was correct in this case, the whole thing looked fishy to me, so I did spend time reading about the inner workings of CORS to make sure I understood what this proposed solution was about and to gain my own confidence that it was correct. By now, my web services are fully running on my FreeBSD machine. The above may have seemed complicated, but in reality it was all just a few hours of work on Thanksgiving morning. Let’s conclude by analyzing the results of the transition. On the plus side, here is what I’ve gained: Predictability: Running in the cloud puts you at the mercy of the upgrade and product discontinuation treadmill of big cloud providers. It’s no fun to have to be paying attention to deprecation messages and adjust to changes no matter how long the deadlines are. FreeBSD also evolves, of course, but it has remained pretty much the same over the last 30 years and I have no reason to believe it’ll significantly change in the years to come. Performance: My apps are so much faster now it’s ridiculous. The serverless runtime of Azure Functions starts quickly for sure, but it just can’t beat a server that’s continuously running and that has hot caches at all layers. That said, I bet the real difference in performance for my use case comes from colocating the app servers with the database, duh. Ease of management: In the past, having automated deployments via GitHub Actions to Azure Functions was pretty cool, not gonna lie. But… being now able to deploy with a trivial command, perform PostgreSQL administration tasks with just a local invocation, and inspect logs trivially and quickly by looking at a file beats any sort of online UI and distributed system. “Doesn’t scale” you say, but it scales up my time. Cost: My Azure bill has gone from $20/month, the majority of which was going into the managed PostgreSQL instance, to almost zero. Yes, the server I’m running in the garage is probably costing me the same or more in electricity, but I was running it anyway already for other reasons. And here is what I’ve lost (for now): Availability (and redundancy): The cloud gives you the chance of very high availability by providing access to multiple regions. Leveraging these extra availability features is not cheap and often requires extra work, and I wasn’t taking advantage of them in my previous setup.
So, I haven’t really decreased redundancy, but it’s funny that the day right after I finished the migration, I lost power for about 2 hours. Hah, I think I hadn’t suffered any outages with Azure other than the one described in this article. A staging deployment: In my previous setup, I had dual prod and staging deployments (via Azure Functions slots and separate PostgreSQL databases—not servers) and it was cool to deploy first to staging, perform some manual validations, and then promote the deployment to prod. In practice, this was rather annoying because the deployment flow was very slow and not fully automated (see “manual testing”), but it indeed saved me from breaking prod a few times. Auto-deployments: Lastly and also in my previous setup, I had automated the push to staging and prod by simply updating tags in the GitHub repository. Once again, this was convenient, but the biggest benefit of it all was that the prod build process was “containerized” and not subject to environmental interference. I could very well set up a cron job or webhook-triggered local service that rebuilt and deployed my services on push… but it’s now hard to beat the simplicity of the current workflow. None of the above losses are inherent to self-hosting, of course. I could provide alternatives for them all and at some point I will; consider them to-dos!

Kaushik Gopal 1 months ago

Combating AI coding atrophy with Rust

It’s no secret that I’ve fully embraced AI for my coding. A valid concern (and one I’ve been thinking about deeply) is the atrophying of the part of my brain that helps me code. To push back on that, I’ve been learning Rust on the side for the last few months. I am absolutely loving it. Kotlin remains my go-to language. It’s the language I know like the back of my hand. If someone sends me a swath of Kotlin code, whether handwritten or AI generated, I can quickly grok it and form a strong opinion on how to improve it. But Kotlin is a high-level language that runs on a JVM. There are structural limits to the performance you can eke out of it, and for most of my career 1 I’ve worked with garbage-collected languages. For a change, I wanted a systems-level language, one without the training wheels of a garbage collector. I also wanted a language with a different core philosophy, something that would force me to think in new ways. I picked up Go casually but it didn’t feel like a big enough departure from the languages I already knew. It just felt more useful to ask AI to generate Go code than to learn it myself. With Rust, I could get code translated, but then I’d stare at the generated code and realize I was missing some core concepts and fundamentals. I loved that! The first time I hit a lifetime error, I had no mental model for it. That confusion was exactly what I was looking for. Coming from a GC world, memory management is an afterthought — if it requires any thought at all. Rust really pushes you to think through the ownership and lifespan of your data, every step of the way. In a bizarre way, AI made this gap obvious. It showed me where I didn’t understand things and pointed me toward something worth learning. Here’s some software that’s either built entirely in Rust or uses it in fundamental ways: fd (my tool of choice for finding files), ripgrep (my tool of choice for searching files), Fish shell (my shell of choice, recently rewritten in Rust), Zed (my text/code editor of choice), Firefox (my browser of choice), and Android?! That’s right: Rust now powers some of the internals of the OS, including the recent Quick Share feature. Many of the most important tools I use daily are built with Rust. Can’t hurt to know the language they’re written in. Rust is quite similar to Kotlin in many ways. Both use strict static typing with advanced type inference. Both support null safety and provide compile-time guarantees. The compile-time strictness and higher-level constructs made it fairly easy for me to pick up the basics. Syntactically, it feels very familiar. I started by rewriting a couple of small CLI tools I used to keep in Bash or Go. Even in these tiny programs, the borrow checker forced me to be clear about who owns what and when data goes away. It can be quite the mental workout at times, which is perfect for keeping that atrophy from setting in. After that, I started to graduate to slightly larger programs and small services. There are two main resources I keep coming back to: the official Rust book (fondly referred to as “The Book”; there’s also a convenient YouTube series following the book) and Google’s Comprehensive Rust course, presumably created to ramp up their Android team (it even has a dedicated Android chapter). This worked beautifully for me. There are times when the book or course mentions a concept and I want to go deeper. Typically, I’d spend time googling, searching Stack Overflow, finding references, diving into code snippets, and trying to clear up small nuances. But that’s changed dramatically with AI. One of my early aha moments with AI was how easy it made ramping up on code. The same is true for learning a new language like Rust. For example, what’s the difference 2 between two subtly different ways of holding a reference? Another thing I loved doing is asking AI: what are some idiomatic ways people use these concepts? I gave Gemini exactly that kind of prompt while learning, and even the abbreviated version of its response was incredibly useful. It’s easy to be doom and gloom about AI in coding — the “we’ll all forget how to program” anxiety is real. But I hope this offers a more hopeful perspective.
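To give that reference comparison a concrete shape, here is a minimal sketch (my own illustration, not the exact snippet from the chat) of the distinction footnote 2 hints at: a mutable binding to a shared reference versus an immutable binding to an exclusive reference.

```rust
fn main() {
    let a = 1;
    let b = 2;

    // Mutable binding to a *shared* reference: the binding can be re-pointed,
    // but the value it points at cannot be modified through it.
    let mut shared: &i32 = &a;
    println!("shared -> {shared}");
    shared = &b;              // OK: re-point the binding
    println!("shared -> {shared}");
    // *shared += 1;          // ERROR: cannot assign through a `&i32`

    // Immutable binding to an *exclusive* reference: the target can be
    // modified through it, but the binding itself cannot be re-pointed.
    let mut c = 3;
    let exclusive: &mut i32 = &mut c;
    *exclusive += 1;          // OK: mutate through the exclusive reference
    // exclusive = &mut c;    // ERROR: cannot assign twice to immutable binding
    println!("exclusive -> {exclusive}");
}
```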
If you’re an experienced developer worried about skill atrophy, learn a language that forces you to think differently. AI can help you cross that gap faster. Use it as a tutor, not just a code generator. I did a little C/C++ in high school, but nowhere close to proficiency.  ↩︎ Think mutable var to a “shared reference” vs. immutable var to an “exclusive reference”.  ↩︎

Ankur Sethi 1 months ago

Using LLMs for web search

I have two main use-cases for LLMs: writing code and searching the web. There's a lot of discussion online about LLM-assisted programming but very little about LLM-assisted search, so here are some unstructured thoughts about just that. OpenAI calls their web search feature ChatGPT Deep Research, Google has Gemini Deep Research, and Anthropic just uses the word Research. Regardless of the label, all these products work the same way: you enter a prompt, just like in any other LLM workflow; the LLM may ask you a clarifying question (for some reason it’s always a single clarifying question, never more than that); it searches the web for pages matching your prompt using a traditional search engine; and it uses the web pages it found to generate a long report, citing its sources wherever it makes a claim within the text of the report. I like this feature a lot, and I lean on it heavily when I'm looking for high-quality long-form writing by actual human beings on a topic that's new to me. I use Kagi for most of my search queries, but it helps to turn to an LLM when I'm completely unfamiliar with a topic, when I'm not even sure what keywords to search for. Or, sometimes, when I just don't feel like sifting through thousands of words of SEO content to find a few good results. I rarely ask LLMs factual questions because I don’t trust the answers they return. Unless I have a way to verify their output, I consider anything an LLM says to be a hallucination. This is doubly true when I’m learning something new. If I'm new to Rust, how can I be sure that the Rust best practices Claude is telling me about are truly the best of all Rust practices? I find it much easier to trust LLM-generated output when it cites web pages I can read myself. This lets me verify that the information comes from an entity I can trust, an entity that’s (hopefully) a real person or institution with real expertise. Grounding the LLM in web search also means I'm likely to find more up to date information that's not in its training data yet (though, in my experience, this is not always guaranteed). Whenever I get curious about something these days, I write down a detailed question, submit it to Claude, and go off to get some work done while it churns away browsing hundreds of web pages looking for answers. Sometimes I end up with multiple browser tabs with a different search query running in each of them. For most questions, Claude is able to write me a detailed report with citations in five to ten minutes, and it looks at about a hundred pages in the process. Sometimes it decides to spend a lot more time browsing, reading multiple hundreds of web pages. For one question, it churned away for almost seventeen minutes and looked at five hundred and fifty sources to generate its report. I don’t actually care for the report Claude produces at the end of its research process. I almost never read it. It’s a whole lot of LLM slop: unreadable prose, needlessly verbose, often misrepresenting the very sources it quotes. I only care about the links it finds, which are usually entirely different from what I get out of Kagi. My workflow is to skim the report to get an idea of its general structure, open all the links in new tabs, and close the report itself. I wish there was a mode where the "report" could just be a list of useful links, though I suppose I could accomplish that with some clever prompting. The web pages Claude surfaces always surprise me. I can’t tell what search index it uses, what search keywords it uses under the hood, or how it decides what links are worth clicking.
Whatever the secret sauce is, I regularly end up on web pages that I'd never be able to find through a regular search engine: personal websites that were last updated 20 years ago, columns from long-defunct magazines, ancient Blogger and LiveJournal blogs, pages hidden deep inside some megacorporation’s labyrinthian support website, lecture notes hosted on university websites, PDFs from exposed directories, and other unexpected wonders from a web that search engines try their best to hide away. Sometimes Claude links to pages that aren’t even online anymore! How is it able to cite these pages if it can't actually read them? Unclear. I often have to pull up a cached version of such pages using the Internet Archive. For example, one of the reports it produced linked to On Outliners and Outline Processing and Getting Started Blogging with Drummer, both of which no longer exist on the web. I can't tell whether any major LLM providers take their web search products seriously (outside of Perplexity, which is not technically an LLM provider). I certainly haven’t seen any new changes made to them since they were introduced, and nobody seems to talk about them very much. For me, though, web search is one of the main reasons I use LLMs at all. That's partly why I'm giving Anthropic that $20 every month. I have a long wishlist of features I want to see in LLM-powered search products. Stop hiding away web search inside a menu! Let me directly click “New search” in the same way I click “New chat” or "Code" in the Claude sidebar. Let me edit the LLM's research plan before it starts searching (Gemini lets me do this to some degree). Let me edit the keywords the LLM will use to start its research process. Or, let me ask it to automatically refine those keywords before it begins. What if the LLM finds new information online that re-contextualizes my original query? In those cases, allow it to interrupt its research process and ask for clarifications. Let me look at the raw search results for each keyword the LLM searched for. Add a mode where the LLM picks the best search results for me and only returns a list of links, like a traditional search engine. Let me use “lenses” in the same way I use them on Kagi. Allow me to place limits on my sources (e.g. social media, personal blogs, news websites, or academic journals). And let me uprank/downrank/ban certain sources in the same way Kagi allows. Maybe Kagi Assistant will grow into this in the future? Maybe I should try using Perplexity? I've had meh experiences with both these products, and I'm not sure whether they can compete with the quality of results ChatGPT/Claude/Gemini surface. Anyway, yeah. I like LLM-powered search.

Langur Monkey 1 months ago

Google *unkills* JPEG XL?

I’ve written about JPEG XL in the past. First, I noted Google’s move to kill the format in Chromium in favor of the homegrown and inferior AVIF. 1 2 Then, I had a deeper look at the format, and visually compared JPEG XL with AVIF on a handful of images. The latter post started with a quick support test: “If you are browsing this page around 2023, chances are that your browser supports AVIF but does not support JPEG XL.” Well, here we are at the end of 2025, and this very sentence still holds true. Unless you are one of the 17% of users using Safari 3 , or are adventurous enough to use a niche browser like Thorium or LibreWolf, chances are you see the AVIF banner in green and the JPEG XL image in black/red. The good news is, this will change soon. In a dramatic turn of events, the Chromium team has reversed its stance, and has decided to support the format in Blink (the engine behind Chrome/Chromium/Edge). Given Chrome’s position in the browser market share, I predict the format will become a de facto standard for images in the near future. I’ve been following JPEG XL since its experimental support in Blink. What started as a promising feature was quickly axed by the team in a bizarre and ridiculous manner. First, they asked the community for feedback on the format. Then, the community responded very positively. And I don’t only mean a couple of guys in their basement. Meta, Intel, Cloudinary, Adobe, Krita, and many more. After that came the infamous comment (#85, Oct 31, 2022 12:34AM): “Thank you everyone for your comments and feedback regarding JPEG XL. We will be removing the JPEG XL code and flag from Chromium for the following reasons: experimental flags and code should not remain indefinitely; there is not enough interest from the entire ecosystem to continue experimenting with JPEG XL; the new image format does not bring sufficient incremental benefits over existing formats to warrant enabling it by default; by removing the flag and the code in M110, it reduces the maintenance burden and allows us to focus on improving existing formats in Chrome.” Yes, right, “ not enough interest from the entire ecosystem ”. Sure. Anyway, following this comment, a steady stream of messages pointed out how wrong that was, from all the organizations mentioned above and many more. People were noticing in blog posts, videos, and social media interactions. Strangely, the following few years have been pretty calm for JPEG XL. However, a few notable events did take place. First, the Firefox team showed interest in a JPEG XL Rust decoder, after describing their stance on the matter as “neutral”. They were concerned about the increased attack surface resulting from including the current 100K+ lines C++ reference decoder, even though most of those lines are testing code. In any case, they kind of requested a “memory-safe” decoder. This seems to have kick-started the Rust implementation, jxl-rs, from Google Research. To top it off, a couple of weeks ago, the PDF Association announced their intent to adopt JPEG XL as a preferred image format in their PDF specification. The CTO of the PDF Association, Peter Wyatt, expressed their desire to include JPEG XL as the preferred format for HDR content in PDF files. 4 All of this pressure exerted steadily over time made the Chromium team reconsider the format. They tried to kill it in favor of AVIF, but that hasn’t worked out. Rick Byers, on behalf of Chromium, made a comment in the Blink developers Google group about the team welcoming a performant and memory-safe JPEG XL decoder in Chromium. He stated that the change of stance was in light of the positive signs from the community we have exposed above (Safari support, Firefox updating their position, PDF, etc.). Quickly after that, the Chromium issue state was updated accordingly. This is great news for the format, and I believe it will give it the final push for mass adoption.
The format is excellent for all kinds of purposes, and I’ll be adopting it pretty much instantly for this site and the Gaia Sky website when support is shipped. Some of the features that make it superior to the competition are: lossless re-compression of JPEG images, meaning you can re-compress your current JPEG library without losing information and benefit from a ~30% reduction in file size for free (a killer feature that no other format has); support for wide gamut and HDR; support for image sizes of up to 1,073,741,823x1,073,741,824, so you won’t run out of image space anytime soon (AVIF is ridiculous in this aspect, capping at 8,193x4,320; WebP goes up to 16K², while the original 1992 JPEG supports 64K²); a maximum of 32 bits per channel, which no other format (except for the defunct JPEG 2000) offers; a maximum of 4,099 channels, whereas most other formats support 4 or 5, with the exception of JPEG 2000, which supports 16,384; super resilience to generation loss 5 ; progressive decoding, which is essential for web delivery, IMO (WebP and HEIC have no such feature, and progressive decoding in AVIF was only added a few years back); support for animation; support for alpha transparency; and depth map support. For a full codec feature breakdown, see Battle of the Codecs. JPEG XL is the future of image formats. It checks all the right boxes, and it checks them well. Support in the overwhelmingly most popular browser engine is probably going to be a crucial stepping stone in the format’s path to stardom. I’m happy that the Chromium team reconsidered the format’s inclusion, but I am sad that it took so long and so much pressure from the community to achieve it. https://aomediacodec.github.io/av1-avif/   ↩︎ https://jpegxl.info/resources/battle-of-codecs.html   ↩︎ https://radar.cloudflare.com/reports/browser-market-share-2025-q1   ↩︎ https://www.youtube.com/watch?v=DjUPSfirHek&t=2284s   ↩︎ https://youtu.be/qc2DvJpXh-A   ↩︎

Corrode 1 months ago

Canonical

What does it take to rewrite the foundational components of one of the world’s most popular Linux distributions? Ubuntu serves over 12 million daily desktop users alone, and the systems that power it, from sudo to core utilities, have been running for decades with what Jon Seager, VP of Engineering for Ubuntu at Canonical, calls “shaky underpinnings.” In this episode, we talk to Jon about the bold decision to “oxidize” Ubuntu’s foundation. We explore why they’re rewriting critical components like sudo in Rust, how they’re managing the immense risk of changing software that millions depend on daily, and what it means to modernize a 20-year-old operating system without breaking the internet. CodeCrafters helps you become proficient in Rust by building real-world, production-grade projects. Learn hands-on by creating your own shell, HTTP server, Redis, Kafka, Git, SQLite, or DNS service from scratch. Start for free today and enjoy 40% off any paid plan by using this link . Canonical is the company behind Ubuntu, one of the most widely-used Linux distributions in the world. From personal desktops to cloud infrastructure, Ubuntu powers millions of systems globally. Canonical’s mission is to make open source software available to people everywhere, and they’re now pioneering the adoption of Rust in foundational system components to improve security and reliability for the next generation of computing. Jon Seager is VP Engineering for Ubuntu at Canonical, where he oversees the Ubuntu Desktop, Server, and Foundations teams. Appointed to this role in January 2025, Jon is driving Ubuntu’s modernization strategy with a focus on Communication, Automation, Process, and Modernisation. His vision includes adopting memory-safe languages like Rust for critical infrastructure components. Before this role, Jon spent three years as VP Engineering building Juju and Canonical’s catalog of charms. He’s passionate about making Ubuntu ready for the next 20 years of computing. 
Juju - Jon’s previous focus, a cloud orchestration tool GNU coreutils - The most widely used implementation of commands like ls, rm, cp, and more uutils coreutils - coreutils implementation in Rust sudo-rs - For your Rust-based sandwich needs LTS - Long Term Support, a release model popularized by Ubuntu coreutils-from-uutils - List of symbolic links used for coreutils on Ubuntu, some still point to the GNU implementation man: sudo -E - Example of a feature that sudo-rs does not support SIMD - Single instruction, multiple data rust-coreutils - The Ubuntu package with all its supported CPU platforms listed fastcat - Matthias’ blog post about his faster version of cat systemd-run0 - Alternative approach to sudo from the systemd project AppArmor - The Linux Security Module used in Ubuntu PAM - The Pluggable Authentication Modules, which handle all system authentication in Linux SSSD - Enables LDAP user profiles on Linux machines ntpd-rs - Time synchronization daemon written in Rust which may land in Ubuntu 26.04 Trifecta Tech Foundation - Foundation supporting sudo-rs development Sequoia PGP - OpenPGP tools written in Rust Mir - Canonical’s Wayland compositor library, uses some Rust Anbox Cloud - Canonical’s Android streaming platform, includes Rust components Simon Fels - Original creator of Anbox and Anbox Cloud team lead at Canonical LXD - Container and VM hypervisor dqlite - SQLite with a replication layer for distributed use cases, potentially being rewritten in Rust Rust for Linux - Project to add Rust support to the Linux kernel Nova GPU Driver - New Linux OSS driver for NVIDIA GPUs written in Rust Ubuntu Asahi - Community project for Ubuntu on Apple Silicon debian-devel: Hard Rust requirements from May onward - Parts of apt are being rewritten in Rust (announced a month after the recording of this episode) Go Standard Library - Providing things like network protocols, cryptographic algorithms, and even tools to handle image formats Python Standard Library - The origin of “batteries included” The Rust Standard Library - Basic types, collections, filesystem access, threads, processes, synchronisation, and not much more clap - Superstar library for CLI option parsing serde - Famous high-level serialization and deserialization interface crate Jon Seager’s Website Jon’s Blog: Engineering Ubuntu For The Next 20 Years Canonical Blog Ubuntu Blog Canonical Careers: Engineering - Apply your Rust skills in the Linux ecosystem


Notes on the WASM Basic C ABI

The WebAssembly/tool-conventions repository contains "Conventions supporting interoperability between tools working with WebAssembly". Of special interest, it contains the Basic C ABI - an ABI for representing C programs in WASM. This ABI is followed by compilers like Clang with the wasm32 target. Rust is also switching to this ABI for extern "C" code. This post contains some notes on this ABI, with annotated code samples and diagrams to help visualize what the emitted WASM code is doing. Hereafter, "the ABI" refers to this Basic C ABI. In these notes, annotated WASM snippets often contain descriptions of the state of the WASM value stack at a given point in time. Unless otherwise specified, "TOS" refers to "Top Of value Stack", and the notation [ x  y ] means the stack has y on top, with x right under it (and possibly some other stuff that's not relevant to the discussion under x); in this notation, the stack grows "to the right". The WASM value stack has no linear memory representation and cannot be addressed, so it's meaningless to discuss whether the stack grows towards lower or higher addresses. The value stack is simply an abstract stack, where values can be pushed onto or popped off its "top". Whenever addressing is required, the ABI specifies explicitly managing a separate stack in linear memory. This stack is very similar to how stacks are managed in hardware assembly languages (except that in the ABI this stack pointer is held in a global variable, and is not a special register), and it's called the "linear stack". By "scalar" I mean basic C types like int, double or char. For these, using the WASM value stack is sufficient, since WASM functions can accept an arbitrary number of scalar parameters. A C function like add_three, which takes three int parameters, is compiled into a WASM function with three i32 parameters, and can be called by pushing three values onto the stack and invoking call $add_three. The ABI specifies that all integral types 32-bit and smaller will be passed as i32, with the smaller types appropriately sign or zero extended. For example, a variant add_three_chars that takes char parameters is compiled to almost the same code as add_three, except for a final i32.extend8_s, which takes the lowest 8 bits of the value on TOS and sign-extends them to the full i32 (effectively ignoring all the higher bits). Similarly, when $add_three_chars is called, each of its parameters goes through i32.extend8_s. There are additional oddities that we won't get deep into, like passing __int128 values via two i64 parameters. C pointers are just scalars, but it's still educational to review how they are handled in the ABI. Pointers to any type are passed in i32 values; the compiler knows they are pointers, though, and emits the appropriate instructions. For example, consider a small function that computes a sum and stores it through a pointer parameter. Recall that in WASM, there's no difference between an i32 representing an address in linear memory and an i32 representing just a number. i32.store expects [ addr  value ] on TOS, and does *addr = value. Note that the x parameter isn't needed any longer after the sum is computed, so it's reused later on to hold the return value. WASM parameters are treated just like other locals (as in C). According to the ABI, while scalars and single-element structs or unions are passed to a callee via WASM function parameters (as shown above), for larger aggregates the compiler utilizes linear memory. Specifically, each function gets a "frame" in a region of linear memory allocated for the linear stack.
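Since Rust is adopting the same ABI for its extern "C" functions, the shape being described can also be written down from the Rust side. The following is only an illustrative sketch: the Pair fields are invented here, and the comments merely restate what the ABI does with such an aggregate.

```rust
// A two-field aggregate: more than a single scalar, so under the Basic C ABI
// it is not passed directly as WASM value-stack parameters.
#[repr(C)]
pub struct Pair {
    pub a: i32,
    pub b: i32,
}

// On the wasm32 Basic C ABI, the caller copies the Pair into its own frame on
// the linear stack and passes a pointer to that copy; the callee effectively
// receives an i32 address rather than two separate field parameters.
pub extern "C" fn pair_calculate(p: Pair) -> i32 {
    p.a + p.b
}

fn main() {
    println!("{}", pair_calculate(Pair { a: 1, b: 2 }));
}
```

The copy that the caller creates for p lives in the caller's frame on the linear stack.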
This region grows downwards from high to low addresses [1], and the global $__stack_pointer points at the bottom of the frame. Consider the do_work / pair_calculate example: when do_work is compiled to WASM, prior to calling pair_calculate it copies pp into a location in linear memory, and passes the address of this location to pair_calculate. This location is on the linear stack, which is maintained using the $__stack_pointer global. A few things are worth noting about the compiled WASM for do_work (where I also gave its local variable a meaningful name, for readability). There are two instances of the pair pp in linear memory prior to the call to pair_calculate: the original one from the initialization statement (at offset 8), and a copy created for passing into pair_calculate (at offset 0). Theoretically, as pp is unused after the call, the compiler could do better here and keep only a single copy. The stack pointer is decremented by 16, and restored at the end of the function. The first few instructions - where the stack pointer is adjusted - are usually called the prologue of the function. In the same vein, the last few instructions where the stack pointer is reset back to where it was at the entry are called the epilogue. Before pair_calculate is called, the linear stack holds both copies of the pair. Following the ABI, the code emitted for pair_calculate takes Pair* (by reference, instead of by value as in the original C code). Each function that needs linear stack space is responsible for adjusting the stack pointer and restoring it to its original place at the end. This naturally enables nested function calls; suppose we have some function a calling function b which, in turn, calls function c, and let's assume all of these need to allocate space on the linear stack. After c's prologue, each frame sits below its caller's, and since each function knows how much stack space it has allocated, it's able to properly restore $__stack_pointer to the bottom of its caller's frame before returning. What about returning values of aggregate types? According to the ABI, these are also handled indirectly; a pointer parameter is prepended to the parameter list of the function. The function writes its return value into this address. In the corresponding example, the caller of such a function only uses 8 bytes of its stack frame, but allocates 16; this is because the ABI dictates 16-byte alignment for the stack pointer. There are some advanced topics mentioned in the ABI that these notes don't cover (at least for now), but I'll mention them here for completeness: a "red zone" (leaf functions have access to 128 bytes of red zone below the stack pointer; I found this difficult to observe in practice [2]); a separate frame pointer (global value) to be used for functions that require dynamic stack allocation (such as using C's VLAs); and a separate base pointer to be used for functions that require alignment > 16 bytes on the stack. [1] This is similar to x86. For the WASM C ABI, a good reason is provided for the direction: WASM load and store instructions have an unsigned constant called offset that can be used to add a positive offset to the address parameter without extra instructions. Since $__stack_pointer points to the lowest address in the frame, these offsets can be used to efficiently access any value on the stack. [2] Since we don't issue system calls directly in WASM, it's tricky to conjure a realistic leaf function that requires the linear stack (instead of just using WASM locals).

baby steps 1 months ago

Move Expressions

This post explores another proposal in the space of ergonomic ref-counting that I am calling move expressions. To my mind, these are an alternative to explicit capture clauses, one that addresses many (but not all) of the goals from that design with improved ergonomics and readability. The idea itself is simple: within a closure (or future), we add the option to write a move expression. This is a value expression (“rvalue”) that desugars into a temporary value that is moved into the closure; it is roughly equivalent to declaring a fresh temporary before the closure and moving that temporary in. Let’s go back to one of our running examples, the “Cloudflare example”, which originated in this excellent blog post by the Dioxus folks. As a reminder, in today’s code there are dedicated lines before the closure for dealing with captures (the sketch at the end of this post shows the general shape of that pattern); under this proposal, those captures would be written inline as move expressions inside the closure. There are times when you would want multiple clones, for example if you want to move something into a closure that will then give away a copy on each call. This idea is not mine. It’s been floated a number of times. The first time I remember hearing it was at the RustConf Unconf, but I feel like it’s come up before that. Most recently it was proposed by Zachary Harrold on Zulip, who has also created a prototype called soupa. Zachary’s proposal, like earlier proposals I’d heard, used a dedicated keyword. Later on @simulacrum proposed using move, which to me is a major improvement, and that’s the version I ran with here. The reason that I love the move variant of this proposal is that it makes closures more “continuous” and exposes their underlying model a bit more clearly. With this design, I would start by explaining closures with move expressions and just teach move closures at the end, as a convenient default: A Rust closure captures the places you use in the “minimal way that it can” – so reading a vector will capture a shared reference to it, mutating it will capture a mutable reference, and consuming it will take ownership of the vector. You can use move expressions to control exactly what is captured: wrapping a place in one will move that value into the closure. A common pattern when you want to be fully explicit is to list all captures at the top of the closure. As a shorthand, you can write move at the top of the closure, which will change the default so that the closure takes ownership of every captured variable. You can still mix-and-match with move expressions to get more control, so the previous closure might be written more concisely. It’s a bit ironic that I like this, because it’s doubling down on part of Rust’s design that I was recently complaining about. In my earlier post on Explicit Capture Clauses I wrote that, to be honest, I don’t like the choice of move because it’s so operational. I think if I could go back, I would try to refashion our closures around two concepts: attached closures (what we now call plain closures) would always be tied to the enclosing stack frame, and would always have a lifetime even if they don’t capture anything; detached closures (what we now call move closures) would capture by-value, like today. I think this would help to build up the intuition of “use a detached closure if you are going to return the closure from the current stack frame and use an attached one otherwise”. Move expressions are, I think, moving in the opposite direction. Rather than talking about attached and detached, they bring us to a more unified notion of closures, one where you don’t have “ref closures” and “move closures” – you just have closures that sometimes capture moves, and a “move” closure is just a shorthand for using move expressions everywhere. This is in fact how closures work in the compiler under the hood, and I think it’s quite elegant. One question is whether a move expression should be a prefix or a postfix operator, i.e. whether the keyword should come before the expression or after it.
My feeling is that it’s not a good fit for a postfix operator because it doesn’t just take the final value of the expression and do something with it, it actually impacts when the entire expression is evaluated. Consider an example where the moved expression is a function call: when does that function get called? If you think about it, it has to be closure creation time, but it’s not very “obvious”. We reached a similar conclusion when we were considering other postfix operators. I think there is a rule of thumb that things which delineate a “scope” of code ought to be prefix – though I suspect other spellings might actually be nice as well. Edit: I added this section after-the-fact in response to questions. I’m going to wrap up this post here. To be honest, what this design really has going for it, above anything else, is its simplicity and the way it generalizes Rust’s existing design. I love that. To me, it joins the set of “yep, we should clearly do that” pieces in this puzzle: add a trait (I’ve gone back to preferring the name 😁) and add move expressions. These both seem like solid steps forward. I am not yet persuaded that they get us all the way to the goal that I articulated in an earlier post: “low-level enough for a Kernel, usable enough for a GUI”, but they are moving in the right direction.
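For reference, here is the general shape of today’s pattern mentioned above: the clone-then-move dance you write when a closure needs its own handle to shared data. The names are made up for illustration; this is not the Cloudflare example itself.

```rust
use std::sync::Arc;
use std::thread;

struct Config {
    name: String,
}

fn main() {
    let config = Arc::new(Config { name: "demo".to_string() });

    // Today: clone the Arc under a dedicated name so the `move` closure can
    // take ownership of the clone while `config` stays usable out here.
    let config_for_worker = Arc::clone(&config);
    let worker = thread::spawn(move || {
        println!("worker sees {}", config_for_worker.name);
    });

    println!("main still owns {}", config.name);
    worker.join().unwrap();
}
```

A move expression would let that dedicated config_for_worker line live inside the closure itself instead of hovering above it.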
