Posts in Rust (20 found)
Martin Fowler 3 days ago

Fragments: May 27

At the GOTO Conference in Copenhagen in 2025, Kent Beck and I spent some time on stage talking and answering questions from the audience - a format I refer to as “two old geezers on a park bench”. We talk about our experiences with LLM-augmented programming (at that point - October 2025), we show our frustration that things we’ve been saying for thirty years still need to be said, we say how anything like a manifesto reunion needs to be led by a younger generation, and opine on what junior developers should be focusing on in their career. ❄                ❄                ❄                ❄                ❄ Ian Johnson has written a series of posts about restructuring a gnarly codebase The story follows a real Laravel + React codebase over ~3 months and ~258 commits from a legacy monolith with no tests to a well-structured application with automated quality gates, a React SPA migration in progress, and an AI agent that reliably ships production code with minimal supervision. The series covers the steps in decent detail, and his approach follows the kinds of steps I’d use. First get everything under the control of decent characterization tests, add static analysis, introduce the right patterns to make things flow easily. With all of this, is his use of AI, which changed during the exercise: For the first two months of this project, I used Claude Code with auto-approve turned off. Every file edit, every terminal command, every change… I reviewed it before it executed. […] The results were good. The code was clean. But I was doing most of the thinking and half the typing. The agent was a fancy autocomplete with better suggestions. I wasn’t getting the leverage I’d hoped for. I read an article about “on-the-loop” versus “in-the-loop” human-AI collaboration. The framing clicked immediately […] I was micromanaging because I didn’t trust the agent to do the right thing. And I didn’t trust the agent because there was nothing forcing it to do the right thing. 
His early steps put in tests, static analysis, and the right architectural patterns. With those in place, he could let the agent do more work. My role shifted from writer to curator. I don’t write most of the code anymore. I: Define the patterns […] Review the test specs […] Review the output […] Update the harness […] Make strategic decisions […] He finishes the series with conclusions about how he’d generalize his experience to other circumstances. ❄                ❄                ❄                ❄                ❄ Back in the land of my birth, there were some notable groans when the National Health Service decided to close nearly all of their Open Source repositories, supposedly due to the security threat of LLMs. Closing repos like this isn’t an effective counter to LLM-augmented attackers. I suspect it’s no coincidence to see GDS (Government Digital Service), the highly-regarded IT enablers in the UK government, publish their position: Moving code from public to private as a substitute for investment in secure-by-design delivery, ownership and remediation is a warning sign because it reduces sharing and scrutiny, can slow coordinated improvement across government and suppliers, and does not remove the underlying weaknesses in a running service. Terence Eden memorably sums up his view on this: Within the UK’s Civil Service you occasionally hear the expression “being invited to a meeting without biscuits”. It implies a rather frosty discussion without any of the polite niceties of a normal meeting. ❄                ❄                ❄                ❄                ❄ I’ve seen a few cases where those developers who are most involved in working with LLMs find they are running into a problem with cognitive endurance. Adam Tornhill has joined this group: One of the big wins with agents is that they let us stay with the higher-level problem for longer. We get less sidetracked by details, dependency cleanup, and similar secondary tasks that used to break concentration. 
But there is a cost we are still underestimating. Agentic coding is mentally expensive. I can usually sustain the pace for a couple of hours. Then I need a break. The pace is simply too intense. And based on conversations with other engineers, I do not think I am alone in that. He explains that working with The Genie means we are making more decisions in less time; this increase in decision density is hard on the brain. He responds by keeping agent tasks small, automating everything he can, and accepting that he won’t know every line of code as long as he has good verification mechanisms in place. Notably, he has not gone in the direction of doing his work with swarms of agents that he coordinates. Instead, he has one long-running task that he babysits and one focus task. That last point is important given the running-twenty-agents-in-parallel hype. I cannot even think about twenty meaningful things to build, and even less so about the resulting cognitive tax of the likely interruptions. It’s exactly the wrong thing to even consider. At least for humans. (And yes, I understand sub-agents and machine parallelisation. That is not what I’m objecting to. It is the parallelisation of human attention that does not scale). I liked that he included some thoughts about what folks can do in time outside this intense programming time. Not just “have a coffee” (although he includes that) but also about learning about the domain that the software supports. ❄                ❄                ❄                ❄                ❄ A couple of pithy quotes from social media. Lorin Hochstein: “Metaphor debt” is when all of your metaphors involve the concept of “debt” because you can’t think of any other metaphors anymore. ❄                ❄ Daniel Terhorst-North: If a vegan crossfit fan is using Claude to write Rust, which thing do they tell you first? 
❄                ❄                ❄                ❄                ❄ Karl Bode reacts to speakers getting booed when mentioning AI during commencement addresses. He points out that younger folks are increasingly unhappy with the tech oligarchy and their fruits . The thing is the kids aren’t stupid. They see the field clearly. They see the difference between what’s being sold to them by tech companies, the press, and commencement speakers, and what they have repeatedly seen with their own eyes. They’ve watched tech oligarchs spend the last decade mired in scandal after scandal, hype cycle after hype cycle, steadily enshittifying everything they touch along the way. The percentage of Gen Z that think AI’s benefits don’t counterbalance the risks now sits around fifty percent, up 11 percentage points in just the last year. Eight out of every ten believe that using AI makes the process of actual learning more difficult. He sees young people saddled with the perception of entering a worsening world - which leads them to rage against this latest fruit of the tech oligarchy. A rage that is easy for folks like me - with a comfortable retirement off-ramp - to properly appreciate. A rage that could have marked political and social consequences. ❄                ❄                ❄                ❄                ❄ Relevant to these concerns are a couple of items in last week’s Economist newspaper. The newspaper argues that historically major technological advances haven’t led to significant unemployment or drops in wages ( paywalled article ). The closest was the original industrial revolution in 19th Century Britain. There was a stagnation in wages during this period, but there was also a massive increase in population, from 4½ million to 12 million. It also points out that we’ll probably only understand the full consequences of all this when a recession hits, as this is when most unproductive jobs tend to be flushed out of the system. 
A second article ( also paywalled ) indicates that AI is having some effect on graduate hiring. They did an analysis of surveys of recent graduates, looking to see if employment varied depending on a job’s exposure to AI. The least exposed quintile of subjects saw employment rate fall by 1.5% over the last couple of years, while the most exposed quintile’s drop was 6.6%. ❄                ❄                ❄                ❄                ❄ Lawfare isn’t impressed with the latest efforts by the US Government to regulate AI. On [last] Wednesday, the White House invited leaders of OpenAI, Google, Anthropic, Meta, and Microsoft to the Oval Office for a signing ceremony the following afternoon. President Trump was to sign an executive order on AI and cybersecurity—the administration’s most formal effort yet to establish a voluntary process for reviewing frontier models before their release. But roughly three hours before the ceremony, when some company executives were already in the air to Washington, the White House called it off. They see the proposed regulations as mild, and including some valuable measures to harden defenses against cyber threats. But it’s worth underscoring the implications of postponing (if not outright canceling) this order, which, by its own terms, was about as modest a frontier-AI intervention as the federal government could put on paper: voluntary, focused on the government’s own defenses, and explicitly barred from becoming a licensing regime. The objection isn’t so much about government coercion as about the government having any settled role at all. Voluntary, in other words, isn’t the floor of frontier AI policy in this administration; it’s the ceiling. This is a questionable position given that the concerns animating this draft order will likely grow in the near future. It is also self-defeating for those who applauded the order’s delay or demise. 
Far from resolving the risk of government meddling in AI, killing the order just leaves in place what Ball has described as the “opaque and essentially lawless” alternative: government access happening through back channels, on terms set case by case, with no stable rules at all. One of the problems here is a distinct lack of governmental expertise, either in AI or in software in general. Too much is being decided at the whims of the tech oligarchy, there isn’t any attempt to engage in the broader issues at hand. That’s not entirely a bad thing, trying to regulate something that’s still evolving so fast is usually a fool’s errand - but the problem here is the impact of AI is so big that there’s real danger in being too far behind. ❄                ❄ Which leads me to a rare thing, an endorsement of a candidate for political office. If you are voting in congressional district MA-06 (North Shore of Massachusetts), I’d seriously look at Beth Anders-Beck , who is running for congress in that district. Beth has a long background in software development (including developing the notion of Forest and Desert ), so would introduce expertise that Congress desperately needs. I’ve known Beth for decades, and have a high opinion of their intelligence, judgment, and ability to work with others. Congress doesn’t deserve Beth, but it does need her.

Stratechery 4 days ago

The SpaceX IPO and Data Centers in Space

It’s hardly the biggest problem in the world — or perhaps the height of privilege to consider it a problem at all — but one of the most annoying consumer experiences is booking an Uber Black and realizing you got assigned a Tesla Model Y (Uber finally stopped allowing new Model Y’s onto Black last year). Buckle up for an uncomfortable back seat, basic plastic finishes, and, all-too-often, potential car sickness from a driver who hasn’t completely mastered the Tesla’s aggressive regenerative braking. Still, the fact that the Model Y ever made it to the Black level is a testament to the brand Elon Musk built. Back in 2016, when 300,000 people dropped $1,000 each in a matter of hours to reserve an as-yet-unreleased Model 3, I explained that the phenomenon was because It’s a Tesla: The real payoff of Musk’s “Master Plan” is the fact that Tesla means something: yes, it stands for sustainability and caring for the environment, but more important is that Tesla also means amazing performance and Silicon Valley cool. To be sure, Tesla’s focus on the high end has helped them move down the cost curve, but it was Musk’s insistence on making “An electric car without compromises” that ultimately led to 276,000 people reserving a Model 3, many without even seeing the car: after all, it’s a Tesla. This is the same brand halo that landed what is, if we’re honest, a pretty basic car on the Uber Black list. What actually makes these cars compelling is the extent to which they are computers on wheels: I know plenty of very rich people who drive a Tesla not for the finishes but rather the Full Self-Driving (Supervised); there is nothing like it on the market, at least when it comes to cars you can own. 
Tesla appears to be doubling down on this point of differentiation: the company stopped production of the Models S and X earlier this year, focusing production resources on the CyberCab and robots; if you want your car to drive itself, you’ll get the same model as everyone else. It reminds me of Andy Warhol’s famous quote : What’s great about this country is that America started the tradition where the richest consumers buy essentially the same things as the poorest. You can be watching TV and see Coca-Cola, and you know that the President drinks Coke, Liz Taylor drinks Coke, and just think, you can drink Coke, too. A Coke is a Coke and no amount of money can get you a better Coke than the one the bum on the corner is drinking. All the Cokes are the same and all the Cokes are good. Liz Taylor knows it, the President knows it, the bum knows it, and you know it. That “tradition” is scale, and America is indeed better at it than any other country in the world; and, amongst Americans, no one pursues and seeks to leverage scale quite like Musk. From a press release from American Airlines: American Airlines today announced a sweeping modernization of its narrowbody inflight customer experience with the installation of Starlink, the fastest Wi-Fi in the sky, on more than 500 narrowbody aircraft beginning in Q1 2027. Starlink is widely regarded as the world’s most advanced satellite constellation using a low Earth orbit to deliver broadband Internet capable of supporting inflight streaming, online gaming, collaborative meeting tools and more. With thousands of satellites in low Earth orbit, Starlink can deliver multigigabit connectivity to aircraft using its Aero Terminal, which can support up to 1 Gbps per antenna. “As a premium global airline, we are continuously seeking out world-class partners like Starlink to deliver what our customers need and want,” said American Airlines Chief Customer Officer Heather Garboden. 
“The addition of Starlink solidifies American as a leading airline in keeping passengers connected in flight.” As part of American’s commitment to an elevated onboard experience, Starlink will enable seamless streaming, browsing and real-time communication capabilities across American’s domestic and short-haul international routes. I linked to the press release just for the amusement of American Airlines, which has in recent years built its strategy around offering anything-but-premium on routes you need, billing their Starlink deal as a commitment to “an elevated onboard experience.” That may have been the argument for United’s Starlink deal when it was announced in 2024 , but by this point it’s tablestakes , which is surely exactly how Musk wants it. Starlink is the consumer-facing business of SpaceX, generating $8.7 billion in revenue last year and $4.4 billion in profit; while it’s not totally clear exactly how SpaceX accounts for launch costs, obviously Starlink benefits greatly from the fact that it has access to SpaceX’s launch capacity. That launch capacity has resulted in over ten thousand active satellites in low Earth orbit, delivering low latency high speed Internet anywhere in the world — including in the air. That’s the carrot for airlines; the stick is the prospect of everyone else having the same service, and customers making flight decisions based on the quality of Internet access available. There is a similarity to Tesla in this way. Musk companies at their best don’t win the game; they change the rules through scale, such that billionaires buy economy cars because they actually drive themselves (with supervision), and airlines transform the consumer experience on their own dime. Musk makes all-in bets — whether that be in terms of launch capacity or in autonomous driving — not by making rational short-term business decisions, but by starting with the desired end state and working backwards. 
Tech has a long history of silly charts — there is an entire category known as Bezos charts — and the SpaceX S-1 has one that made me laugh. It came in the discussion of SpaceX’s total addressable market: We believe we have identified the largest actionable total addressable market (“TAM”) in human history. We estimate that our quantifiable TAM is $28.5 trillion, consisting of $370 billion in Space from space-enabled solutions; $1.6 trillion in Connectivity across $870 billion in Starlink Broadband and $740 billion in Starlink Mobile as well as additional opportunities in enterprise and government; $26.5 trillion in AI across $2.4 trillion in AI infrastructure, $760 billion in consumer subscriptions, $600 billion in digital advertising, and $22.7 trillion in enterprise applications. For illustrative purposes of sizing our addressable market opportunity, we exclude China and Russia from our global estimates. This image is approximately to scale vertically, but certainly not horizontally: I could use the help in really wrapping my mind around the $26.5 trillion AI opportunity, given it’s more than 13 times the space and connectivity opportunity combined! In all seriousness, the numbers are obviously absurd, but then again, everything about this IPO is absurd. SpaceX is seeking a $2 trillion valuation on a mere $18.67 billion in revenue with $4.9 billion in losses last year, and growth actually slowed from 35% to 33%. That slowdown happened despite the addition of xAI (and thus also X), which tipped the company from a small profit to that massive loss, thanks to $5.1 billion in AI R&D expense. That R&D, keep in mind, went towards building a model that is in 5th place, and whose entire founding team recently left the company. But sure, $26.5 trillion AI opportunity! This is not to say that SpaceX won’t get its desired valuation. 
Tesla’s valuation never made any sense right up until the Models 3 and Y actually worked out, causing Tesla’s share price to soar (and even then it was hard to ever build a financial model that justified the new share price). Musk’s ability to make his own reality starts with investors; from 2021’s Mistakes and Memes and comparing Apple and Tesla: This comparison works as far as it goes, but it doesn’t tell the entire story: after all, Apple’s brand was derived from decades building products, which had made it the most profitable company in the world. Tesla, meanwhile, always seemed to be weeks from going bankrupt, at least until it issued ever more stock, strengthening the conviction of Tesla skeptics and shorts. That, though, was the crazy thing: you would think that issuing stock would lead to Tesla’s stock price slumping; after all, existing shares were being diluted. Time after time, though, Tesla announcements about stock issuances would lead to the stock going up. It didn’t make any sense, at least if you thought about the stock as representing a company. It turned out, though, that TSLA was itself a meme, one about a car company, but also sustainability, and most of all, about Elon Musk himself. Issuing more stock was not diluting existing shareholders; it was extending the opportunity to propagate the TSLA meme to that many more people, and while Musk’s haters multiplied, so did his fans. The Internet, after all, is about abundance, not scarcity. The end result is that instead of infrastructure leading to a movement, a movement, via the stock market, funded the building out of infrastructure. I explained in that Article why I generally did not cover Tesla’s financial results, and the reasoning extends to why I don’t expect to cover SpaceX’s: Musk is the master of memes, and is himself a meme. 
He offers a dream — Mars, fully autonomous vehicles, an addressable market of $28.5 trillion — and positions his companies and their stock as access to that dream, and through the alchemy of capital markets, transforms shared delusion into mass market reality. Musk’s track record matters in this regard. Building an electric car company was possible, as was full self-driving (supervised); at the same time there were ever increasing government mandates and programs around decreasing emissions that acted as the stick to Tesla’s carrot. Similarly, landing rockets was possible, and the new market creation downstream from correspondingly lower launch costs was comprehensible. That Musk succeeded in both instances gives him the benefit of the doubt. The question that matters, then, is not if the numbers make sense right now (they absolutely do not); what matters is if the dream is even possible, and if there are actual reasons to think it might happen. I think that data centers in space meet these conditions. The first question about data centers in space is if they are even possible, and I think the answer is clearly yes. The key thing to consider is that there is no requirement that these data centers look anything like data centers on earth. On earth we build massive buildings full of GPUs with massive infrastructure for cooling those GPUs and massive power plants (or a connection to a grid which connects to massive power plants) to power those GPUs. The idea of transporting these massive structures to space sounds implausible, and it is! However, there is no reason that space data centers would look like data centers on earth. What makes far more sense is to think about an individual satellite as something akin to a rack. 
Right now the largest Starlink satellite in orbit is the V2 Mini Direct-to-Cell, which measures 7.4 meters by 2.7 meters by 0.3 meters (estimated); an NVL72 rack from Nvidia, meanwhile, measures 2.2 meters by 1.1 meters by 0.6 meters, so we’re already in the right size range. The V2 Mini Direct-to-Cell consumes (and dissipates) up to an estimated 25kW of energy; the NVL72 up to 135kW, and it can fit a 1 trillion parameter model quantized to FP4. The big shortcoming for a rack-satellite is power and its dissipation, but going from 25kW to 135kW is certainly within the realm of possibility — and given that you don’t need much of the cooling and power distribution usage on earth, something closer to 100kW might deliver similar performance. There are other issues to address, including the problem of radiation screwing with calculations, reliability, etc., although those two concerns could be addressed in part by using larger chips (which are less efficient, but also use less power); these rack-satellites will also be disposable, like Starlink satellites, ameliorating reliability issues. The key factor, however, is that a fleet of racks, interconnected with lasers (as Starlink’s already are), each with their own solar panels and radiator arrays for cooling (deploying 200+ square meters of radiators per rack will be a huge challenge), is possible . The next question about data centers in space is if there is a use case for them — the carrot — and I already made the argument that there is in The Inference Shift . Specifically, there are three types of workloads developing around LLMs: training, answer inference, and agentic inference. From the section making the case for “agentic inference”: Critically, this articulation of an agentic-specific memory hierarchy implies a necessary trade-off of speed for capacity. Here’s the thing, though: lower speed isn’t nearly as important a consideration if there isn’t a human in the loop. 
If an agent is waiting around for a job that is being run overnight, the agent doesn’t know or care about the user experience impact; what is most important is being able to accomplish a task, and if entirely new approaches to memory make that possible, then delays are fine. If delays are fine, then all of the focus on pure compute power and high-bandwidth memory seems out of place: if latency isn’t the top priority, then slower and cheaper memory — like traditional DRAM, for example — makes a lot more sense. And if the entire system is mostly waiting on memory, then chips don’t need to be as fast as the cutting edge either. This represents a profound shift in future architectures, but it also doesn’t mean that current architectures are going away: At the same time, these categories won’t be equal in size or importance. Specifically, agentic inference will be the largest market by far, because that is the market that won’t be limited by humans or time. Today’s agents are fancy answer inference; in the future true agentic inference will be work done by computers according to dictates given by other computers, and the market size scales not with humans but with compute. It’s agentic inference that makes the most sense for racks in space, and conveniently enough, that is also the market that is likely to be the largest in the long run. The third question about data centers in space is if there is a stick. Specifically, while I think that racks-in-space are both a lot more viable than people think, and a lot more relevant to agentic inference than current modes of compute, it is at the end of the day cheaper and easier to build on earth, all things being equal. All things are not equal, however: right now we are at the very beginning of the AI buildout and already one of the biggest constraints is not just power (expected), but zoning (unexpected). 
I wrote in an Update last week : That leads to an interesting contrast to globalization: when companies were closing down American factories and laying off workers and moving operations to China, none of the affected towns or workers had a say. They just suddenly no longer had a job, and a huge number of cities across the Rust Belt no longer had a reason to exist. People simply had to move, or worse, retreat to things like alcohol or drugs. AI, however, is the opposite: building data centers requires permission, which is to say that people actually have a say. Again, I am not at all saying that these people are well informed about data centers, or about the economic impact on their communities, much less the economic impact of AI generally; what I am noting is that people who didn’t have a say in globalization are suddenly finding they do have a say about AI, and it’s not a surprise they are expressing their disapproval by blocking data centers. In that Update I made the case that data center builders — and by extension the companies that use them — should straight up pay people for permission to build data centers in their communities. At a minimum, however, that increases the costs of terrestrial data centers. What seems very plausible in the long run is that the demand for compute ends up being so large that there eventually is nowhere left to build, making the vast expanses of space not just an alternative but in fact the only choice. If all of this happens — and there are a lot of “if”s here! — then suddenly that $2 trillion valuation starts looking reasonable. SpaceX is already monetizing xAI’s first data center, Colossus 1, to the tune of $15 billion/year for 300MW of capacity; that’s 3,000 racks-in-space. Anthropic, meanwhile, will probably make 3x the revenue on that capacity; it remains to be seen if xAI can get back in the state-of-the-art game, but if so then the amount of revenue it can generate per rack-in-space will be commensurately higher. 
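The jump from 300MW to 3,000 racks-in-space can be checked with a few lines of arithmetic. The sketch below assumes roughly 100kW per rack-satellite, the figure suggested earlier in the article as potentially sufficient, and uses the quoted $15 billion/year for Colossus 1. The function names are hypothetical, and the per-rack revenue is only the naive quotient of the two quoted numbers, not a forecast.

```rust
/// Number of racks needed to supply a given capacity.
/// `capacity_mw` is in megawatts, `rack_kw` in kilowatts per rack.
fn racks_for_capacity(capacity_mw: f64, rack_kw: f64) -> f64 {
    capacity_mw * 1_000.0 / rack_kw
}

/// Naive annual revenue per rack: total revenue spread evenly over all racks.
fn revenue_per_rack(annual_revenue_usd: f64, racks: f64) -> f64 {
    annual_revenue_usd / racks
}

fn main() {
    // Quoted figures: 300MW of capacity monetized at $15 billion/year.
    let capacity_mw = 300.0;
    let annual_revenue_usd = 15e9;

    // Assumption: about 100kW per rack-satellite.
    let racks = racks_for_capacity(capacity_mw, 100.0);
    let per_rack = revenue_per_rack(annual_revenue_usd, racks);
    println!("{racks} racks at 100kW, about ${per_rack} per rack per year");

    // For comparison: the full 135kW quoted for an NVL72 rack on earth.
    let racks_full_power = racks_for_capacity(capacity_mw, 135.0);
    println!("{racks_full_power:.0} racks at 135kW");
}
```

The rack count is sensitive to the per-rack power assumption: at the full 135kW of a terrestrial NVL72, the same 300MW corresponds to noticeably fewer racks.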
Even without xAI, however, SpaceX has the potential to be a monopoly provider of marginal compute capacity. There are, needless to say, a massive number of assumptions baked into this argument, including assuming a huge number of engineering challenges are solved, Starship actually works, SpaceX gets sufficient supply of the right kinds of chips, compute demand is massively larger, agentic inference unbundles current architectures, and data center opponents are successful. The risk attached to all of these assumptions should discount the valuation you put on this business, which is to say I still think this IPO is nuts. At the same time, I’m glad it exists, for multiple reasons. The first one is the most obvious one: Musk, for all of his faults, has already pushed humanity forward on multiple vectors, including electric cars, self-driving, reusable rockets, satellite Internet, etc., and I’m excited to see him try and do more. The second is that I am in fact concerned about our ability to muster enough compute to fully realize the gains from AI, and am very worried about a replay of nuclear power, where our failure to build denied us the opportunity to even imagine what could be invented in a world of unlimited energy; the fact Musk is proposing an alternative path to unlimited compute is a relief. The third is that I appreciate the extent to which this IPO is a return to what an IPO should be: the opportunity for people to contribute capital to actually build the business, and to benefit if it works out. As I noted, I can’t make a financial model that necessarily justifies this valuation, particularly based on current financials, but neither can a VC investing in the Series A of a company. SpaceX has already invented a lot, and its early investors are going to make a lot of money with this IPO; at the same time, there is still so much more to invent that there remains a lot of upside — and, to be very clear, a lot of risk. 
It’s a testament to SpaceX’s ambitions that retail investors get to play VC. And hey, you get Mars upside for free! Training will continue to matter, and Nvidia’s current architecture, including high-speed compute, large amounts of high-bandwidth memory, and high-speed networking, will likely continue to dominate. Answer inference will be a meaningful market, albeit a relatively small one, and speed from chips like Cerebras or Groq (I explained how Nvidia is deploying Groq’s LPUs here ) will be very useful. Agentic inference will gradually unbundle the GPU, which alternates between stranding high-bandwidth memory (during the prefill process) and stranding compute (during the decode process), in favor of increasingly sophisticated memory hierarchies dominated by high capacity and relatively lower cost memory types, with “good enough” compute; indeed, if anything it will be the speed of CPUs for things like tool use that will matter more than the speed of GPUs.

Blargh 5 days ago

RustRadio UI improved

This is just a short followup to the last RustRadio post. If you came for more rants about C, you’ll be disappointed. I’ve never been that interested in writing UI code, including HTML. You can see the “programmer art” in the screenshots linked from www.habets.pp.se. And then there is the slightly different tech section, which doesn’t serve much of a purpose now that we have GitHub. I’ve not been happier with GTK, Qt, and the others either. But RustRadio needs a UI. I feel like the browser is the most stable and portable UI. So I’d already decided on that. So now I have to manually do a bunch of DOM manipulation, to create an interactive UI? Or worse, learn the React/Angular/Whatever flavor of the day, that will be obsolete by next afternoon? Gag me with a spoon. For now I’m just continuing to focus on the SDR and architectural parts of RustRadio, and I’m letting the LLM-written code do the HTML manipulation. Yeah, it’s kinda vibe coding. But doesn’t use , and it demonstrably outputs what I want (I mean, sure, it may require some follow-up prompts), so who cares? The vibe coding is isolated to the files doing the drawing. If I want to artisanally craft better code in the future, that’s the file that needs to be rewritten. Until then, it works. <iframe width="560" height="315" src="https://www.youtube.com/embed/7k0JNT6itaI" frameborder="0" allowfullscreen></iframe> See the quick start instructions in the ruwasm repo for how to run this UI live with an RTL-SDR.

Blog System/5 1 weeks ago

A Markdown-based test suite

This article is not about AI and it is not written with AI, but the work that I’m about to present was definitely motivated by AI. And because I generally like telling stories, I have to give you that background. Do with that whatever you want, but… it’d be a pity if you left just because the AI word showed up in the first paragraph! I think the technical explanation that follows is at the very least entertaining and also interesting independently of AI. Back in December, I started toying with coding agents. One thing I tried, and for which I didn’t expect a lot of success, was to point an AI agent to the EndBASIC public documentation and ask it to write games like Space Invaders or Mario from scratch. And even though the results weren’t perfect and they didn’t work on the first try, they did work with a few tiny tweaks. Combining that with a bunch of hand-written rules, I had an agent producing EndBASIC demos with ease. This experiment was impressive because I did not expect an agent to be able to write EndBASIC code… and because it worked, it fueled my interest to pick EndBASIC’s own development back up. Three thoughts came to mind: Increase EndBASIC’s “self-documenting” aspects so that an AI agent can learn about its idiosyncrasies unsupervised. Speed up EndBASIC so that it can run more elaborate games. Extend EndBASIC with long-desired primitives like sprites and sound, to finally realize the vision behind the project. These thoughts combined sparked the rewrite of EndBASIC’s core that I’ve been pursuing since January and which should see the light of day in the upcoming release. But before that happens, I want to talk to you about just one of the cool pieces behind the new core: namely, its approach to testing. I’ve stopped writing unit tests for the compiler and VM in Rust and I’ve switched to writing them in Markdown. And I believe this has turned out to be a pretty nice approach. 
One of the things I had to do to convince an AI agent to write proper EndBASIC code was to hand-craft a bunch of rules to tell it how EndBASIC differs from other, more traditional BASIC dialects. That worked OK, but writing these rules by hand was error-prone and difficult to make exhaustive. So I wanted to let LLMs extract that information directly from EndBASIC. The idea was simple: if I wrote the integration tests for the new core in Markdown, the lingua franca of AI, the tests would serve as the canonical and correct documentation demonstrating language behaviors. LLMs are great at summarizing information, so if I unleashed them over a large set of these hands-on “examples”, they would probably figure stuff out, right? And they actually do! I gave the following prompt to GPT 5.4: Based on your pre-existing knowledge of BASIC dialects, I want you to read all of the files, analyze how the EndBASIC dialect differs from your knowledge, and come up with a bunch of rules for yourself to know how to write EndBASIC code later on. You can ignore the Disassembly sections. Beware that all functions and commands in these integration tests are test-only: the real functions and commands that you can use are documented in , so read those too to learn what functionality is available. Write your findings to a file. And this produced a very comprehensive file with spot-on rules: here, take a look . But leaving that aside, let’s peek into the internals of this new Markdown-based test suite. It’s a collection of Markdown files: Where each file acts as a container of one or more test cases : Every test case has a section title describing what the test is about and various subsections to define the test scenario: A Source code block that is the input to the compiler. If compilation fails, a Compilation errors section with the error messages and nothing else afterwards. 
If compilation succeeds: A Disassembly section that contains the compiled bytecode. An optional Exit code section showing the program’s exit code, if different from zero. An Output section that contains any messages printed to the console by the executed program. A Runtime errors section that contains any errors from the executed program. Here is a simple example validating the command: There is no section to validate the lexer nor parser internals right now but I’m considering extending the format further to dump the AST too in order to simplify the tests for these components. The driver for this test suite enumerates all Markdown files in the tests directory and processes them one at a time. For each file, the driver extracts all test case titles and their Source subsections to compute all the test cases to execute. Once the driver has this subset of information from the Markdown files, the driver feeds each individual test case to the compiler and, if compilation succeeds, to the VM. All side-effects are captured and the driver emits a new Markdown file from scratch with the results of the test. Once the driver has finished producing a new version of the Markdown file for a test, the driver compares the produced file (actual) against the pre-recorded, checked-in version (golden). If they differ, the test fails and the driver uses the tool to print the differences. And that’s it. Easy peasy, right? This keeps the driver super-simple as the only thing it has to do is parse a minimal subset of Markdown, and the diffs it produces are trivial for a human to understand. There are currently 448 test cases and 13k lines of Markdown in this test suite so maintaining them “by hand” is not an option. You wouldn’t want to implement an optimization to the compiler and then have to rewrite hundreds of disassembly chunks in the golden files to reflect the changes, would you? 
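To make the driver loop described above a bit more concrete, here is a minimal sketch in Rust. It is not the EndBASIC driver: the heading levels (`# ` for a test case title, `## ` for subsections), the `basic` fence tag, the `run` callback standing in for the compiler and VM, and the `REGEN_GOLDEN` variable name are all assumptions made for illustration.

```rust
use std::env;

/// One test case pulled out of a golden file: only the title and the source
/// are read back; every other section is recomputed from scratch.
#[derive(Debug, PartialEq)]
struct TestCase {
    title: String,
    source: String,
}

/// Parses the minimal Markdown subset the sketch cares about (assumed layout).
fn extract_cases(markdown: &str) -> Vec<TestCase> {
    let mut cases: Vec<TestCase> = Vec::new();
    let mut in_fence = false; // Inside any fenced block.
    let mut capture = false; // The current fence belongs to a Source section.
    let mut want_source = false; // Saw "## Source", waiting for its fence.
    for line in markdown.lines() {
        if line.starts_with("```") {
            capture = !in_fence && want_source;
            in_fence = !in_fence;
            want_source = false;
        } else if in_fence {
            if capture {
                if let Some(case) = cases.last_mut() {
                    case.source.push_str(line);
                    case.source.push('\n');
                }
            }
        } else if let Some(title) = line.strip_prefix("# ") {
            cases.push(TestCase { title: title.to_string(), source: String::new() });
        } else if let Some(section) = line.strip_prefix("## ") {
            want_source = section.trim() == "Source";
        }
    }
    cases
}

/// Emits a fresh Markdown file; `run` returns the result sections for a source.
fn render(cases: &[TestCase], run: impl Fn(&str) -> String) -> String {
    let mut out = String::new();
    for case in cases {
        out.push_str(&format!(
            "# {}\n\n## Source\n\n```basic\n{}```\n\n{}\n",
            case.title,
            case.source,
            run(&case.source)
        ));
    }
    out
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Pass,
    Fail { actual: String },
    Regenerated(String),
}

/// Compares the freshly produced file (actual) against the golden text, or
/// hands back the new golden text when regeneration is requested.
fn check(golden: &str, run: impl Fn(&str) -> String, regenerate: bool) -> Outcome {
    let actual = render(&extract_cases(golden), run);
    if regenerate {
        Outcome::Regenerated(actual)
    } else if actual == golden {
        Outcome::Pass
    } else {
        Outcome::Fail { actual }
    }
}

/// Hypothetical switch: the variable name is invented for this sketch.
fn regenerate_requested() -> bool {
    env::var_os("REGEN_GOLDEN").is_some()
}
```

A caller would read each golden file, pass its text to `check` along with `regenerate_requested()`, and either write the regenerated text back or print a diff of `actual` against the golden text on failure. Because the same `render` path serves both modes, regeneration costs nothing extra to implement.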
The thing is that, due to the design described earlier, regenerating the golden files after a core change is easy: the driver is already doing exactly that to execute the tests! The trick is, simply put, to ask the driver to rewrite the golden file instead of producing an actual file by setting the environment variable. And voila: all golden files are regenerated in place. I can then use Git to validate the changes and commit them along with the actual code change. Let’s start with the pros of this Markdown-based test suite framework: It is much easier to work with than what I had before. I used to dread touching the compiler and VM of the previous EndBASIC core implementation because tweaking tens of tests was painful. Changes required me to fiddle with positions and deeply nested types, and now the tests are trivial to tweak and diff against previous state. Pretty much any decent text editor has Markdown support, including formatting fenced code blocks. This makes it easy to skim through the test suite and modify the files and is actually the primary reason I used Markdown instead of a bespoke textual format. LLMs can “learn” with ease. OK, fair, this is just a guess: I did not try the same prompt at the beginning of this article against the old core with its Rust-based tests, and maybe the LLMs would have done a good job at reverse-engineering the rules. But because the Markdown tests are so much easier to read by humans, I have to assume that they also are for LLMs. And now, of course, some cons: Regenerating the output of a test, or all tests, is way too easy . With the older Rust-based tests, I was forced to manually punch in things like line numbers and nested AST trees. This process forced me to think through the changes in detail. With the new approach… regenerating the golden files is trivial, so it’s easy to miss little mistakes in source positions or disassembled code. 
Differences in disassembly are usually noisy and hard to review because every line carries an address and thus any new or deleted instruction will introduce offsets into all other addresses. I could of course choose to not include the instruction addresses in the dump, but they come in handy when manually validating jump targets, so it felt better to keep them around. Rust cannot generate first-class test cases on the fly which means that the various test cases within a Markdown file are “invisible” to the driver: I can run them all or none, but regular test filtering via doesn’t apply. I was able to “expose” the different Markdown files as different Rust-native test cases, but this involves a hardcoded list of test files—which must be kept in sync with the files on disk, and so I mitigated the chances of divergence by adding a test that cross-references the two. This idea does not generalize well. The Markdown-based test suite presented here works well for components where end-to-end testing is favorable and, more importantly, cheap , but I wouldn’t recommend it for other scenarios. Keeping tests fast is a must for quick iteration. And I think that’s about it. If the above feels too abstract, I encourage you to take a look at the driver , its helper code , and the directory with test suites . Now that you have this new trick up your sleeve, what do you think?
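The cross-referencing test mentioned among the cons, which keeps the hardcoded list of test files in sync with the files on disk, could look roughly like the sketch below. The file names in `TEST_FILES` and the function names are hypothetical; the real list and its check live in the EndBASIC repository and may be shaped differently.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical hardcoded list; each entry would back one Rust-native test.
const TEST_FILES: &[&str] = &["arithmetic.md", "strings.md"];

/// Returns the names of all `.md` files directly under `dir`.
fn markdown_files_on_disk(dir: &Path) -> io::Result<BTreeSet<String>> {
    let mut names = BTreeSet::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().map_or(false, |ext| ext == "md") {
            if let Some(name) = path.file_name().and_then(|n| n.to_str()) {
                names.insert(name.to_string());
            }
        }
    }
    Ok(names)
}

/// Returns (files on disk that are not listed, listed files that do not exist).
fn divergence(listed: &[&str], on_disk: &BTreeSet<String>) -> (Vec<String>, Vec<String>) {
    let listed: BTreeSet<String> = listed.iter().map(|s| s.to_string()).collect();
    let unlisted: Vec<String> = on_disk.difference(&listed).cloned().collect();
    let missing: Vec<String> = listed.difference(on_disk).cloned().collect();
    (unlisted, missing)
}
```

A test would call `markdown_files_on_disk` on the tests directory, pass the result to `divergence` together with `TEST_FILES`, and fail if either returned vector is non-empty, so a newly added Markdown file cannot be silently skipped.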

Martin Fowler 2 weeks ago

Fragments: May 14

Last week I spent a day at a retreat that brought together several people working in software development to talk about the profession’s future with the rise of agentic programming. The event was held under the Chatham House Rule , so I can’t attribute the comments and stories I heard. (If anyone recognizes themselves, and would like attribution, let me know.) Here are a few tidbits that caught my notebook. ❄                ❄ One group developed a behavioral clone of the GNU Cobol compiler in Rust. The result is 70K lines of Rust and was built in 3 days. This is yet another sign of the ability of LLMs to do a good job of porting existing code to a new platform. Good regression tests are extremely valuable here (and I don’t know how good GNU Cobol’s are). There’s also the possibility of building a test suite if you have access to an existing implementation. ❄                ❄ Large spec documents can be complex for a human to review. One attendee shared the idea of getting the LLM to interview a human expert, asking the human questions to verify the correctness of the specification, a form of Interrogatory LLM . ❄                ❄ Not specifically about AI - but I liked how one attendee commented that the first thing they do when consulting with an organization is to read the guidelines for their change-control board. This is the scar tissue of what’s gone wrong in the past. I’ve often said that to understand why a thing is the way it is, you need to understand the history of how it got there. This seems like an excellent way to tap into important parts of that history. ❄                ❄ My colleagues who work with modernizing legacy systems have long been rather sniffy about “Lift and Shift” - porting a legacy system to a new platform while retaining Feature Parity . We see this pattern as a huge missed opportunity. 
Often the old systems have bloated over time, with many features unused by users (50% according to a 2014 Standish Group report) and business processes that have evolved over time. Replacing these features is a waste. Instead, try to muster the energy to take a step back and understand what users currently need, and prioritize these needs against business outcomes and metrics. But this point of view was developed before LLMs’ ability to port code appeared. One attendee who does a lot of work in this field said they believed that lifting and shifting to a new platform should now always be the first step in a legacy migration. The cost is no longer as formidable as it used to be, and a better environment makes further evolution much cheaper. Just don’t stop there. ❄                ❄ Several attendees were from the financial industry, and thus were immersed in the problems of complex legacy environments coupled with regulatory controls and significant risk should software do something wrong with money. One of their issues is the complexity we run into when a financial product is offered in multiple jurisdictions, each with their own regulations to satisfy. There’s a lot of software complexity in deciding which jurisdiction applies, and picking the right set of rules at the right point of the workflow. The question here is whether the rapidity of agentic programming means that we can build individual, simpler systems for each jurisdiction. We would then use LLMs to ensure consistency between them, so that as the product rules change, each system reflects that into its own environment. A large part of software design is about identifying what is the same and what differs between various business contexts. Where things are the same, and need to be the same, we are rightly wary of duplicating code, since this increases the cost of updates and the dangers of inconsistency. The interesting question is what role LLMs can play to give us new tools to tackle this. 
❄                ❄ As is usually the case in gatherings like this, folks were concerned about junior developers. When we work with The Genie, our value comes from good judgment - how do we teach that? This group did have one common tool - Pair Programming . One of the key benefits of pairing has always been skills transfer, and here an experienced agentic programmer can pass on their judgment for software design and how to use the genie to get there. And the junior will often have a trick or two to share too - that fresh pair of eyes is particularly valuable in the shift to our agentic future. ❄                ❄ Historically, we use computer systems to bring order to chaotic human processes. Is AI reversing that? ❄                ❄ So much software is involved in data transformation. Those records over there need to be consumed by these APIs over here, but there are differences in how the data is structured, often due to being in different Bounded Contexts , so we have to do some conversion. Agents are particularly adept at writing this kind of transformation code, which is often more tedious than we’d like. ❄                ❄ Chaos Engineering has become a valuable technique to improve resiliency, made famous by Netflix’s Chaos Monkey that randomly breaks live services to see how well the ecosystem reacts and recovers. What would a Chaos Monkey for AI look like? Would it deliberately introduce hallucinations into a pipeline to see if sensors were able to catch them? ❄                ❄                ❄                ❄                ❄ Back at my desk There’s been a bunch of questions about the article on Structured-Prompt-Driven Development (SPDD) that the authors answered in a Q&A section . One in particular caught my eye: Have you considered having an agent do the prompt/spec review itself — not a human reviewing the Canvas, but an agent that reads the REASONS Canvas alongside the code diff and verifies alignment? 
The reply talks about how there is an available command to do this, but there are downsides. In particular one reason not to do this automatically is: Letting humans learn. Review is also where humans learn from the AI’s choices — patterns, trade-offs, options they had not thought of. Cutting humans out speeds things up, but it blocks the long-term skill growth that SPDD is designed to protect. […] Once enough decision rules build up to give us real confidence, we may shift more of the review to the agent step by step — but the part where humans learn from the AI is something we plan to keep. One of the ways we should judge the value of an AI tool is how much it helps us humans learn more about the world we inhabit and build. ❄                ❄                ❄                ❄                ❄ In some strange way I injured my elbow last week. No idea how, there was no event where I said “oh shit”. It just gradually started hurting and swelling. My life-long strategy to avoid sports injuries 1 had defied me. I applied ice and ibuprofen, the swelling went down, but my range of motion got worse. I’m glad I learned to use a knife and fork in an English childhood, so I normally eat with my left hand. I noticed that the loss of range of motion occurred after I got home, when I started spending all day at the computer again. I might not use my elbow directly, but my right hand does a lot of typing and mousing. My desk set up is pretty ergonomic, with a good keyboard , a wrist rest for the mouse, and arm rests on my chair. But even so, did my computer use make my elbow get worse once I got home? I can’t imagine not using the computer, for me writing has become an unstoppable habit. But maybe I should use this opportunity to explore voice input - after all most people can speak faster than they can type. I tried this many years ago, when a colleague told me how good voice recognition was once it was trained to you. 
I tried it, and indeed the voice recognition, even in those pre-AI days, was very good. But it didn’t work for me. When I’m writing I rapidly type words into Emacs, but almost immediately I go back to edit them. Write two sentences, edit them, write another, re-edit the paragraph. The back-and-forth between seeing my words and thinking about them is tight - I can’t just dictate my words. That made me reflect further. I only started using a computer for my writing in my 20s. At school I had to write longhand, and in university to type on a typewriter. But those media don’t support the constant rewriting that I do now. Would I even have become a writer had the text editor not been invented? ❄                ❄                ❄                ❄                ❄ James Pritchard thinks that many developers are over-using agents at run-time in their products, when LLMs are better used as functions . The problem with agents isn’t that they don’t work. It’s that they work unpredictably. You trade a known execution path for “autonomy” that mostly means “I don’t know what it’s going to do.” When an agent-powered feature breaks in production, you’re debugging a conversation transcript, not a stack trace. Most “agent” use cases are actually workflows, a known sequence of steps where one or two of those steps happen to involve an LLM. You don’t need autonomy for that. You need a function call. He points out that functions compose predictably, so if you know the workflow, then composing in a program text is better than agents figuring out how to coordinate themselves. It’s faster, and needs fewer tokens. It’s usually easier to deal with failures, since the scope of the interaction is smaller. ❄                ❄                ❄                ❄                ❄ Pritchard also thinks that people use skills far more than they should . 
He thinks people accumulate folders of markdown skill files but LLMs use them inconsistently, often missing them when they’re needed, or bloating context when they are not. Many things that should go in skills should be other parts of a harness , preferably computational. Skills should only be used with deliberate, infrequent workflows. The skills obsession is a symptom of a deeper pattern: people reaching for configuration when they should be reaching for architecture. “The LLM doesn’t write good tests.” Don’t write a testing skill. Are your existing tests inconsistent? Is the test setup complex? Fix those things and it’ll write good tests without being told how. Point it at a test file you’re proud of. Code is clearer than English. The best setup is one where you barely need to configure the LLM at all. A clean codebase with clear patterns, a short project config for the non-obvious stuff, hooks for automation, and maybe one or two skills for specific workflows you run intentionally. That’s it. ❄                ❄                ❄                ❄                ❄ An oft-stated point about the rise of agentic programming is that we have to start dealing with non-determinism in our work. Of course that’s somewhat of a simplification, because some aspects of software development have long had to face non-determinism. A notable example of this is distributed systems, and a notable figure in helping us probe the truly uncomfortable waters of distributed systems is Kyle Kingsbury (Aphyr). Last month he dropped a long article (the pdf is 32 pages) on how he sees our LLM-enabled future. The title “ The Future of Everything is Lies, I Guess ” betrays his lack of enthusiasm for this future. Some readers are undoubtedly upset that I have not devoted more space to the wonders of machine learning—how amazing LLMs are at code generation, how incredible it is that Suno can turn hummed melodies into polished songs. 
But this is not an article about how fast or convenient it is to drive a car. We all know cars are fast. I am trying to ask what will happen to the shape of cities. It’s worth the long read, even if it isn’t terribly cheerful. Kingsbury brings up many of the worries about AI’s growth from the perspective of someone who is clearly well-informed about their capabilities. His view is that the best response to all this is that we should stop. He wants to avoid using AIs for his writing, software, or personal life. He thinks those working for the AI companies should quit. And yet he also knows that these tools are useful, and wants to use them. I’m both a hoper and a doomer when it comes to our AI future. Fundamentally I see any powerful technology as a big bus: we are either on it, or get run over by it. I’m onboard the bus because I don’t think putting up some barriers would stop me being crushed by its wheels. Maybe if I’m on the bus I can join some people to influence the driver a bit. I’m also very reluctant to speculate on the future outcomes of anything, let alone something as powerful as this. Did the early industrialists in the late eighteenth century have any clue what the industrial revolution they unleashed would do? While it created many harms, it also created a massive rise in the living standards of millions of people, at least those whose countries were on the bus. AI may create benefits that I can’t really dream of, although I can glimpse it when it helps a friend stave off Parkinson’s disease. Those hopes are there, but Kingsbury’s article shines a light on the darker elements of the here-and-now, asking serious questions of responsibility: a part of my work as a moderator of a Mastodon instance is to respond to user reports, and occasionally those reports are for CSAM, and I am legally obligated to review and submit that content to the NCMEC. I do not want to see these images, and I really wish I could unsee them. 
On dark mornings, when I sit down at my computer and find a moderation report for AI-generated images of sexual assault, I sometimes wish that the engineers working at OpenAI etc. had to see these images too. Perhaps it would make them reflect on the technology they are ushering into the world, and how “alignment” is working out in practice. Don’t do sports ↩

matklad 2 weeks ago

Learning Software Architecture

In reply to an email asking about learning software design skills as a researcher physicist: I was attached to a bioinformatics lab early in my career, so I think I understand what you are talking about, the phenomenon of “scientific code”! My thoughts: First meta observation is that “software design” is something best learned by doing. While I had some formal “design” courses at the University, and I was even “an architect” for our course project, that stuff was mostly make-believe, kindergarteners playing fire-fighters. What really taught me how to do stuff was an accident of my career, where my second real project ( IntelliJ Rust ) propelled me to a position of software leadership, and made design my problem. I did make a few mistakes in IJ Rust, but nothing too horrible, and I learned a lot. So that’s good news — software engineering is simple enough that an inquisitive mind can figure it out from first principles (and reading random blog posts). Second meta observation, the bad news: Conway’s law is important. Softwaregenesis repeats the social architecture of the organization producing software. Or, as put eloquently by neugierig , If I were to summarize what I learned in a single sentence, it would be this: we talk about programming like it is about writing code, but the code ends up being less important than the architecture, and the architecture ends up being less important than social issues. I suspect that the difference you perceive between industrial and scientific software is not so much about software-building knowledge, but rather about the field of incentives that compels people to produce the software. Something like “my PhD needs to publish a paper in three months” is perhaps a significant explainer? Two things you can do here. One , at times you get a chance to design or nudge an incentive structure for a project. This happens once in a blue moon, but is very impactful. 
This is the secret sauce behind TIGER_STYLE, not the set of rules per se, but the social context that makes this set of rules a good idea. Two, you can speedrun the four stages of grief to acceptance. Incentive structure is almost never what you want it to be, but, if you can’t change it, you can adapt to it. This is also true about most industrial software projects: there’s never a time to do a thing properly, you must do the best you can, given constraints. Let me use rust-analyzer as an example. The physical reality of the project is that it’s simultaneously very deep (it’s a compiler! Yay!) and very wide (opposite to an LLM, a classical IDE is a lot of purpose-built special features). The social reality is that “deep compiler” can attract a few brilliant dedicated contributors, and that the “breadth features” can be a good fit for an army of weekend warriors, people who learn Rust, who don’t have sustained capacity to participate in the project, but who can sink an hour or two to scratch their own itch. My insistence that doesn’t require building , that it builds on stable, that it doesn’t have any C dependencies, and that the entire test suite takes seconds, was in the service of the goal of attracting high-impact contributors. I was wrangling the build system to make sure people can work on the borrow checker without thinking about anything else. To attract weekend warriors, the internals of rust-analyzer are split into multiple independent features, where each feature is guarded by at runtime. The thinking was that I explicitly don’t want to care too much about quality there, that the bar for getting a feature PR in is “happy path works & tested”. It’s fine if the code crashes, it will only attract further contributors, provided that: the quality is isolated to a feature, and doesn’t spill over; and, at runtime, the crash is invisible to the user (it’s crucial that rust-analyzer features work with an immutable snapshot, and can’t poison the data). In contrast, when working on the core spine which provided support for features, I was relatively more pedantic about quality.
A word of caution about adapting to, rather than fixing, incentive structure: the future is uncertain, and tends to happen in the least convenient manner. The original motivation behind the rust-analyzer experiment was to avoid the need to write a parallel compiler (the one in IntelliJ Rust), and to prototype a better architecture for LSP, so that the learnings could be backported to . So, even in core (especially in core), the code was very experimental. Oh well. Stuck with one more compiler now, I guess? I might hazard a guess that something similar happened to the uutils project, which started as the primary destination for people learning Rust, and ended up as Ubuntu’s coreutils implementation. Third, now to some concrete recommendations. Sadly, I don’t know of a single book I can recommend which contains the truths. I suspect one can only find such a book in an apocryphal short story by Borges: practice seems to be an essential element here. But here are some things worth paying attention to: Boundaries talk by Gary Bernhardt is an all-time favorite. It contains solid object-level advice, and, for me, it triggered the meta inquiry. How to Test is something I wish I had. I immediately understood the importance of testing, but it took me a long time to grow arrogant enough to admit that most widely-cited testing advice is shamanistic snake-oil, and to conceptualize what actually works. ∅MQ guide and, more generally, writings by Pieter Hintjens introduced me to Conway’s Law thinking. That “feature development” architecture of rust-analyzer? It is optimistic merging, applied. Reflections on a decade of coding by Jamii is excellent, goes very meta. It is intentionally the first of my links. Ted Kaminski blog is the closest there is to a coherent theory of software development, appropriately framed as a set of notes to a non-existing book! As for the actual books, Software Engineering at Google and Ousterhout’s The Philosophy of Software Design are often recommended.
They are good. SWE, in particular, helped me with a couple of important names. But they weren’t groundbreaking for me.
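The crash-isolation idea described for rust-analyzer features (a crash stays inside one feature, and the feature only reads an immutable snapshot) can be sketched in a few lines of Rust. This is not rust-analyzer's actual code: the source elides what each feature is "guarded by", so the use of `std::panic::catch_unwind` here is an assumption, and `Snapshot`, `Feature`, `word_count`, `buggy_feature`, and `run_feature` are hypothetical names invented for the illustration.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::Arc;

/// Immutable view of the analysis state. Features only ever get `&Snapshot`,
/// so a crashing feature cannot leave half-written data behind.
struct Snapshot {
    source: String,
}

/// Hypothetical feature signature: read the snapshot, produce some text.
type Feature = fn(&Snapshot) -> String;

/// A feature whose happy path works.
fn word_count(snap: &Snapshot) -> String {
    format!("{} words", snap.source.split_whitespace().count())
}

/// A "happy path only" feature that panics on input it did not expect.
fn buggy_feature(snap: &Snapshot) -> String {
    let word = snap.source.split_whitespace().nth(100).expect("no 101st word");
    word.to_string()
}

/// Runs one feature and converts a panic into `None` (assumed guard:
/// `catch_unwind`; the source does not name the mechanism).
fn run_feature(snap: &Snapshot, feature: Feature) -> Option<String> {
    panic::catch_unwind(AssertUnwindSafe(|| feature(snap))).ok()
}

fn main() {
    let snap = Arc::new(Snapshot { source: "fn main() {}".to_string() });
    for feature in [word_count as Feature, buggy_feature as Feature] {
        match run_feature(&snap, feature) {
            Some(out) => println!("ok: {out}"),
            None => println!("feature crashed; snapshot untouched"),
        }
    }
}
```

The default panic hook still prints the panic message to stderr, but the caller carries on and the shared snapshot is unchanged, which is the property the post relies on.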

Blargh 2 weeks ago

Quantum safe amateur radio secure shell

I’ve previously pointed out that the AX.25 implementation in the kernel is pretty poor. It’s not really being maintained, and even when it gets fixes after I reported it, with people running LTS OSs it can take like 5 years before the fix actually reaches users, if ever. So when writing applications, you still have to work around kernel bugs from a decade ago. This makes it kind of pointless to upstream patches. The exception is security patches, and reading between the lines of why the AX.25 code is now being removed from the kernel, it sounds like maybe some LLM (like the looming “Mythos” and the related Glasswing) may have found some severe problems. But even if there aren’t any known security problems yet, having code is now more of a liability than ever. Code needs to be removed, or taken responsibility for. (tangent about ffmpeg at the bottom of this post) With the kernel code removed, say goodbye to the old walkthrough. Well, not “new”, per se, but “replacement”. With the socket based API about to be gone, we need some other way for applications to send packets and manage connections. For sending raw packets to and from the modem there’s KISS. I have no real complaints about it. Not much to get wrong about sending frames. It’s implemented by most modems, like the software modem Direwolf and by some radios like the Kenwood TH-D75, so it’s not going anywhere. For connected mode (streams of in order data, like with TCP) the biggest contender seems to be AGW. Direwolf implements it, and I’ve made a messy implementation of an AGW client in Rust. The Rust API works, as we’ll see, but the code needs some refactoring and cleanup due to it being written exploratorily while I was deciding what it should even do, and how. The AGW protocol is not super amazing, but it gets the job done. One can build a connection API on top of it, as I have, and never have to think about the AGW protocol ever again.
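To make the "not much to get wrong" remark about KISS concrete, here is a minimal sketch of KISS framing in Rust. It is an illustration written for this passage, not code from the post or from any crate: the function names `kiss_encode` and `kiss_decode` are made up, and the sketch only handles data frames on port 0 (command byte `0x00`). The delimiter and escape values are the ones defined by the KISS protocol.

```rust
// KISS special bytes.
const FEND: u8 = 0xC0; // frame delimiter
const FESC: u8 = 0xDB; // escape marker
const TFEND: u8 = 0xDC; // escaped FEND
const TFESC: u8 = 0xDD; // escaped FESC

/// Wraps a raw AX.25 frame as a KISS data frame on port 0.
fn kiss_encode(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 3);
    out.push(FEND);
    out.push(0x00); // command byte: port 0, "data frame"
    for &b in payload {
        match b {
            FEND => out.extend_from_slice(&[FESC, TFEND]),
            FESC => out.extend_from_slice(&[FESC, TFESC]),
            _ => out.push(b),
        }
    }
    out.push(FEND);
    out
}

/// Unwraps a single port-0 data frame; returns `None` on malformed input.
fn kiss_decode(frame: &[u8]) -> Option<Vec<u8>> {
    let inner = frame.strip_prefix(&[FEND])?.strip_suffix(&[FEND])?;
    let (&cmd, body) = inner.split_first()?;
    if cmd != 0x00 {
        return None; // only port-0 data frames are handled in this sketch
    }
    let mut out = Vec::with_capacity(body.len());
    let mut iter = body.iter();
    while let Some(&b) = iter.next() {
        match b {
            FESC => match *iter.next()? {
                TFEND => out.push(FEND),
                TFESC => out.push(FESC),
                _ => return None, // invalid escape sequence
            },
            FEND => return None, // unescaped delimiter inside a frame
            _ => out.push(b),
        }
    }
    Some(out)
}

fn main() {
    let frame = kiss_encode(&[0x01, 0xC0, 0xDB, 0x02]);
    let hex: Vec<String> = frame.iter().map(|b| format!("{b:02X}")).collect();
    println!("{}", hex.join(" "));
}
```

A real client would also need to find frame boundaries in a byte stream and handle other ports and command types; those are left out to keep the sketch short.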
There’s another protocol called RHP, specified here and here. It came out of the XRouter project. Since XRouter is closed source, I have a strong aversion to it. It seems both counter to how I see amateur radio, and anachronistic, for it to be closed source. It’s bad enough that VARA and Winlink are closed source. And people are definitely working on replacing VARA with various other modes because of it. tl;dr: I’m going with AGW for now. If someone writes a Rust crate for RHP exposing a compatible API, I certainly wouldn’t mind adding that dependency to optionally use. I have not yet implemented AGW (or RHP) in my own AX.25 stack, but I plan to. For now that means I’ll use Direwolf. My previous axsh implementation, since deleted, had some problems: So with everything but terminal management needing a rewrite, this is a reason to rewrite the whole thing. Non-requirement: Encrypt. This would violate the amateur radio license. And then, why not just use SSH? If you have an AGW server, such as Direwolf, then it’s easy to run axsh. Just start a server: Then log in: Then wait like 30-40 seconds for the handshake to complete. The reason for the wait is the large ML-DSA signatures used in the handshake. It can’t be the same direwolf instance, since Direwolf only shuffles packets between the radio and AGW clients, not from one AGW client to another. In my case I had one Direwolf connected to an ICom 9700, and another to a Baofeng UV5R using an AIOC (all in one cable). AIOC is highly recommended for experimentation over the air. So yeah my test is between just about the cheapest VHF/UHF radio that exists, and maybe the most expensive one. In addition to running : With KISS providing packet support (and AGW providing a higher level API on top, if preferred), why not just run TCP/IP, and let the very stable OS TCP implementation take care of everything? TCP is definitely more modern, stable, and maintained, but it doesn’t scale down to slow speeds very well.
A TCP+IPv4 header is at least 40 bytes, and if you don’t want to be some sort of caveman, IPv6 is another 20 bytes. At 1200bps that would be 267-400ms overhead for every packet 1 . Checking a random TCP data packet on my laptop I see that with TCP options TCP/IPv4 is actually 52 bytes, or 350ms. Counting the air time (milliseconds, not just bytes) makes this overhead problem more obvious. And because of amateur radio license reasons TCP would still need to identify the callsign, you probably have to add 17 bytes (113ms) as a surrounding header. That leaves TCP with 69 or 89 bytes overhead per packet, meaning 460ms or 593ms. And since you don’t want to tie up the RF channel for too long (only for the whole packet to be dropped due to interference), you won’t want to send packets that are too large. Of course it’s 4x as slow if you want to do something like Bell 103 on HF. AX.25 connected mode takes that down to 19 bytes (126ms) overhead (if using Mod 128 mode) per data packet. Because of the AX.25 segmenter, for bulk data TCP is not as bad as it may have sounded. For a 1500 byte TCP segment, fitting in just under 8 200 byte AX.25 frames (totalling bytes of overhead), this means 1367ms overhead instead of plain AX.25 (at bytes) 1013ms. A 1500 byte payload takes 10 seconds to send, so that’s an overhead of 13.7% instead of 10.1%. But for interactive use cases, worst case a single payload packet, it’s 467ms vs 133ms. And that’s only counting the data frames, not the acknowledgments. A TCP ACK is at a minimum bytes, or 380ms. An AX.25 RR is 18-19 bytes, or 120-127ms. That makes TCP about three times less efficient, compared to AX.25. A bigger problem with TCP, especially untweaked, is resend timers and window sizes. At 1200bps you don’t actually want too big a window size, since you don’t want to tie up the RF channel for several minutes if the other end has gone away. So a bunch of airtime tweaks are needed. And at best you’ll end up with the numbers above. 
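The millisecond figures above all come from one conversion: bytes times 8 bits, divided by the link speed. A small Rust sketch makes that explicit; `airtime_ms` is a name invented here, and the calculation ignores bit stuffing, preambles, and key-up delays, so real airtime is somewhat higher. The results agree with the numbers quoted in the text to within rounding.

```rust
/// Time on air in milliseconds for `bytes` sent at `bps` bits per second.
/// Ignores bit stuffing and any preamble, so it is a lower bound.
fn airtime_ms(bytes: u32, bps: u32) -> f64 {
    f64::from(bytes) * 8.0 * 1000.0 / f64::from(bps)
}

fn main() {
    // Byte counts taken from the text above; labels are descriptive only.
    let cases = [
        ("TCP + IPv4 header, minimum", 40),
        ("TCP + IPv6 header, minimum", 60),
        ("TCP/IPv4 with TCP options", 52),
        ("callsign wrapper", 17),
        ("TCP/IPv4 with options + wrapper", 69),
        ("TCP/IPv6 with options + wrapper", 89),
        ("AX.25 connected mode, Mod 128", 19),
    ];
    for (label, bytes) in cases {
        println!(
            "{label:<34} {bytes:>3} bytes  {:>7.1} ms at 1200 bps  {:>7.1} ms at 300 bps",
            airtime_ms(bytes, 1200),
            airtime_ms(bytes, 300),
        );
    }
}
```

The 300 bps column corresponds to the "4x as slow" Bell 103 case mentioned for HF.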
Maybe you could tweak TCP to be more friendly to lower speeds, and find the other overhead acceptable. If so, then you’ll be happy to hear that axsh supports running on TCP as well. Well first, it inherits the same problems from TCP/IP. Sure, the UDP header is smaller than the TCP header, but then on top of that there’s the QUIC header. The second problem is that QUIC is meant to be encrypted. Ripping out encryption, while staying secure, seems more dangerous than keeping it simple and just working from the requirements. Probably the whole handshake would have to be redesigned. AX.25 being removed from the Linux kernel reminds me of LLM finding that bug in ffmpeg, causing all that drama. I have no dog in this fight, but in my opinion ffmpeg is in the wrong, here. Their argument seems to be all about how this particular encoder is rarely used, is just a hobby project, etc. Ok, but it’s in your code base. Even if disabled by default, why would you want to ship a security footgun? Maybe some hobbyists out there build ffmpeg with all encoders enabled. Do you want them to be vulnerable to someone’s virus? So Google should either keep quiet, or give a patch? Well, keeping quiet because the codec is rarely used is not really an option. That’s borderline negligent and morally culpable, for when someone eventually gets hacked. So Google “should” always provide a patch in these cases? Perhaps, depending on the meaning of the word “should”. Google is rich, so “should” be morally forced to contribute to your software, just because Google (presumably, via YouTube) is a heavy user of ffmpeg? Well, that just sounds like the (non-)problem with open source software (or free software) in general. The license permits use and profit without contribution. If you wanted a tithe then you should have put that in the license. Sounds like you want everyone to be free only to do what you want. That’s not how that works. This is also why I don’t like the AGPL license.
It’s not free software if it binds me in your serfdom.
Problems with the previous axsh implementation: it was implemented in C++, and not only do I prefer Rust, how could I even call something written in C++ “secure”? (a blog post for another day); it used the kernel API, so that needs rewriting; it used , which proved to be a bit “weird” when interoperating with some other APIs; and it used crypto primitives vulnerable to quantum computers.
Requirements for the rewrite: Don’t use kernel AX.25 sockets: this means use AGW. Also work on TCP (mainly for debugging): this means using an internal framing protocol. Be quantum safe: use ML-DSA+ed25519 dual signed for authentication of server and client. Be efficient: this means don’t use ML-DSA for per packet signatures (they are huge), at the cost of some quantum safety (see the README).
1. Actually, it’s a tiny bit more, because of the occasional bit stuffing.

Blog System/5 3 weeks ago

What if there was no BASIC in EndBASIC?

Six years have passed since I started building EndBASIC : a retro-looking BASIC interpreter that works on the web, on the desktop, and on embedded hardware—and that allows writing cross-platform apps that leverage graphics, a cloud file sharing system, and even access to local hardware via GPIO. But as cool as this sounds, and as exciting as the journey has been, there is something that keeps bugging me about the future of the project: who wants to invest time building something new on an abandoned language? Even Visual Basic, a real platform that evolved over many years and gained “serious language features”, has fallen out of fashion and is, as far as I know, in “maintenance mode” by Microsoft . So I’ve been thinking… “What if there was no BASIC in EndBASIC?” Or, in other words, how could I leverage the many pieces I’ve created underneath this project to build something that people actually want to use, be it for retro-style development or for other purposes? In fact, the BASIC portion of the whole project—that is, the language parser and compiler—is the least interesting one. Yes, I’ve written a BASIC interpreter, but the amount of features my dialect provides today is… limited, and building those up is not the most exciting thing to do. For example: records and files are very much needed features, but adding them is not going to make the project much cooler or useful than it already is. So: let’s look at the building blocks (BBs) behind EndBASIC so that we can imagine how these pieces could be recombined to form something different. At the root of EndBASIC lives a “pure” language core with a relatively simple, non-optimizing (yet) compiler and accompanying VM that can be extended via native Rust bindings and that can be embedded into Rust programs. It’s trivial to leverage this core language to implement imperative DSLs or even dynamic configuration files. What does “pure” mean here, though? 
Simply put that the core language, implemented as a separate crate with minimal dependencies, has absolutely no function nor command definitions in it. The programs executed by this VM can have no side-effects nor escape the VM by default: all they can do is compute values, maybe based on global variables injected by the host. Consumers of this core crate, like EndBASIC’s standard library, are responsible for implementing all functions and commands that make the language useful—but these are all intentionally kept out of the core. Even fundamental primitives that you would expect a BASIC dialect to provide, like or , are not part of the core language. This poses some difficulties in the implementation of the compiler but they are an explicit design choice to keep the core lean. Underneath EndBASIC’s graphics drawing commands, there is a collection of Rust primitives to interact with a possibly-graphical console. Hybrid graphics/text console running on Firefox. Image from my presentation at BSDCan 2025 . Today, this console runs on various environments, including: The text-based Windows and Unix-like terminals via the library. The SDL library for native desktop graphics support across Windows and Unix-like systems (including macOS and Linux, of course). The HTML5 canvas to target the web browser. The NetBSD wscons framebuffer to target “boot-to-EndBASIC” environments. The ST7735s LCD (but more generally, SPI-based LCDs). For all graphical backends, these Rust primitives provide consistent basic rasterization algorithms for common shapes and support different bitmap fonts. Behind EndBASIC’s file operations lives a Rust library that implements a virtual drive-based file system with multiple implementations for these drives. Right now, drives can be backed by: Memory, like a ram disk. A collection of read-only files hardcoded into the binary. A single directory in the host’s file system. The browser’s local storage. A cloud service. 
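The "virtual drive-based file system with multiple implementations" lends itself to a trait-object design. The following Rust sketch shows one way such a layer could look; it is not EndBASIC's actual API. The `Drive` trait, the `MemoryDrive` and `EmbeddedDrive` backends, the `Storage` registry, and the `DRIVE:file` path syntax are all assumptions made for the illustration.

```rust
use std::collections::BTreeMap;
use std::io::{Error, ErrorKind, Result};

/// Hypothetical interface that every drive backend implements.
trait Drive {
    fn get(&self, name: &str) -> Result<String>;
    fn put(&mut self, name: &str, content: &str) -> Result<()>;
}

/// Backend kept entirely in memory, like a RAM disk.
#[derive(Default)]
struct MemoryDrive {
    files: BTreeMap<String, String>,
}

impl Drive for MemoryDrive {
    fn get(&self, name: &str) -> Result<String> {
        self.files.get(name).cloned().ok_or_else(|| Error::from(ErrorKind::NotFound))
    }
    fn put(&mut self, name: &str, content: &str) -> Result<()> {
        self.files.insert(name.to_string(), content.to_string());
        Ok(())
    }
}

/// Read-only backend whose files are compiled into the binary.
struct EmbeddedDrive {
    files: &'static [(&'static str, &'static str)],
}

impl Drive for EmbeddedDrive {
    fn get(&self, name: &str) -> Result<String> {
        self.files
            .iter()
            .find(|(n, _)| *n == name)
            .map(|(_, content)| content.to_string())
            .ok_or_else(|| Error::from(ErrorKind::NotFound))
    }
    fn put(&mut self, _name: &str, _content: &str) -> Result<()> {
        Err(Error::from(ErrorKind::PermissionDenied))
    }
}

/// Maps drive names to backends and resolves "DRIVE:file" paths.
#[derive(Default)]
struct Storage {
    drives: BTreeMap<String, Box<dyn Drive>>,
}

impl Storage {
    fn mount(&mut self, name: &str, drive: Box<dyn Drive>) {
        self.drives.insert(name.to_string(), drive);
    }
    fn split(path: &str) -> Result<(&str, &str)> {
        path.split_once(':').ok_or_else(|| Error::from(ErrorKind::InvalidInput))
    }
    fn get(&self, path: &str) -> Result<String> {
        let (drive, file) = Self::split(path)?;
        self.drives.get(drive).ok_or_else(|| Error::from(ErrorKind::NotFound))?.get(file)
    }
    fn put(&mut self, path: &str, content: &str) -> Result<()> {
        let (drive, file) = Self::split(path)?;
        self.drives.get_mut(drive).ok_or_else(|| Error::from(ErrorKind::NotFound))?.put(file, content)
    }
}

fn main() -> Result<()> {
    let mut storage = Storage::default();
    storage.mount("MEMORY", Box::new(MemoryDrive::default()));
    storage.mount("DEMOS", Box::new(EmbeddedDrive { files: &[("HELLO.BAS", "PRINT \"Hi\"")] }));
    storage.put("MEMORY:A.BAS", "X = 1")?;
    println!("{}", storage.get("MEMORY:A.BAS")?);
    println!("{}", storage.get("DEMOS:HELLO.BAS")?);
    Ok(())
}
```

Under this shape, a directory-backed, browser-storage, or cloud-backed drive would be one more `impl Drive`, and callers would not need to know which backend sits behind a drive name.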
The cloud service drive is backed by the EndBASIC Service, which provides a simple REST API for file access and mutation. Files stored in this drive can be user-private, but they can also be shared with specific users or be made public. Powering the EndBOX , we have a NetBSD base system slimmed down to a minimum set of components with the ability to boot into a graphical application within seconds of powering up the device. The system image has been designed so that it’s possible to access configuration files and user data from Windows and macOS hosts with ease, allowing tuning the system and performing backups without having to access the inner NetBSD OS. In other words: NetBSD is just an implementation detail that is invisible to users of this image. This platform is accompanied by a custom collection of robust build scripts that coordinate compiling NetBSD from scratch with the cross-building of a Rust application that uses the console framework described earlier. These scripts then take care of bundling the cross-built binary into the image and starting it up early in the boot process. You can think of this as a simple version of buildroot but NetBSD-based and explicitly targeted at creating application-specific images for embedded devices. What’s more: in order to support the needs of an application that starts so early in the boot process and to ensure the application can run with the least amount of privileges, the platform provides an RPC mechanism and a privileged daemon that can be used to escalate privileges for specific operations (such as rebooting the machine). And there is also a daemon to optionally collect completely anonymous telemetry (things like version numbers, host architecture, and graphics device in use) from deployed devices. Diagram of the system services provided by the EndBOX. Slide from my presentation at BSDCan 2025 . So, what could we do with these? 
As you were reading through the four building blocks above, I suspect some ideas might have come to mind about what could be done with these pieces without necessarily keeping the BASIC dialect around. Here are some of mine for how this project could evolve: Continue building the BASIC language up. Yes, of course that’s a possibility! EndBASIC itself is a fun project to work on, and adding more features to it and continuing to build up this retro environment is a never-ending source of joy. Even if I question the point later… Replace BASIC with a real language. As I said earlier, further developing the BASIC dialect is not super interesting, particularly because BASIC is full of warts that make it hard to parse. So what if you got the exact same retro, cross-platform experience that EndBASIC currently offers but based on a real language that people like using these days? Lua is a strong contender… but so could be LISP or Scheme. I don’t know much about the latter two, but knowing that LISP machines used to exist… they could be pretty fitting. Focus on the core. I kinda like the idea of having a pure compiler and VM that can be extended to form DSLs, all written in Rust. So maybe building on this specifically is another possible path? The problem is that there already are more languages in the world than you can imagine, so pursuing this feels like a dead end right from the beginning, and I’m not even sure what it’d mean. A BSD-based buildroot alternative. buildroot works, but it’s based on Linux and comes with the problems that you’d expect from Linux: you are essentially assembling a disparate set of components built by different people with little communication. Putting them together gives you “something”, but it is not cohesive. The platform behind EndBASIC, however, carries an opinionated design for how embedded disk images should be, and provides basic services to enforce security principles, to preserve privacy, and to be performant. 
So what if you could write a Rust-native program that leveraged my console and file system abstraction primitives, and could deploy it to an embedded device with ease as “a full OS”? Something something AI. There is no denying that AI is here to stay, so I’ve also been wondering how it might be related to EndBASIC. AI coding agents are great for prototyping ideas, and EndBASIC is a great platform to showcase flashy results with very little code. So maybe EndBASIC could become a good target for coding agents so that you could vibecode embedded projects? I wrote this article sometime last year after running out of steam: shipping the EndBOX was a lot of work and, once that was out, I had to take a break. Honestly, it was a bit disappointing to receive so little interest. By Christmas, I picked up the draft and was almost about to publish it. But then… In January, I started working on a full re-implementation of EndBASIC’s core language, because reasons that are not important now. This means that the “next steps” have been decided already, at least for the next few months: this rewrite is landing imminently as I mentioned in the EndBASIC 0.12 release announcement from last week. Once that happens, I’ll want to spend some time building up some features that I’ve been wishing to have but never got to for years. In particular, I want to provide some extra graphics manipulation primitives (primarily for sprites) and I want to finally add sound support. But all other ideas are still on the table, and I’m curious: do any of them pique your curiosity? If you answered yes to the question right above, subscribe to show your support, and make sure to leave a comment below!


Building With Intent

I'm working on a new application called TinyFeeds, it's a native RSS feed reader. Sure there are thousands of those, but this one is mine and as such I'm being extremely intentional about how it's built. I believe constraints breed innovation, and as such I've outlined a few constraints for myself in this project. First off, the file size has to be 5MB or under for the shipped binary. This is inspired by Matt's Fits on a Floppy manifesto. I'm also inspired by the Palm Pilot apps I use on a daily basis, many of which are under 5MB. Maintaining a small file size makes you second guess the need for features, libraries, graphics, etc. In a world where Google Chrome secretly downloads an extra 4GB for a local LLM, I feel like small apps are sorely needed. Second, the application is to be built in Rust and Iced. This constraint has forced me to finally dig in and learn Rust. The result is a fast, native application that has a high level of stability thanks to the tools used to build it. Finally, no LLM-generated code is to be used. This again forces me to actually learn the language, focus on code structure, and de-scope feature bloat. It also makes me feel proud of what I've built, something I never feel when using LLMs. So how's it going? Great so far! As I mentioned, TinyFeeds is built intentionally for me and how I enjoy consuming RSS. With any feed reader I always filter by unread posts from today. I don't use folders, tags, bookmarks, etc. So that's exactly what TinyFeeds does: it reads your feeds from a simple .txt file, shows new stories from today, only shows a single story at a time, and remembers what stories you've viewed so they aren't shown again. The UI has been designed to facilitate this. It's incredibly simple, but the layout is intentional. TinyFeeds won't be for everyone, heck it might only be something I want, but that's the point! I find it a joy to use even in its early state. While it isn't ready yet, you can early trial it if you so desire by cloning from Codeberg and building it yourself ( ). The app currently clocks in at 4MB when built with the build script!
After TinyFeeds, I plan to build similar apps focused on small size, performance and minimal feature sets. All hand coded. Possibly inspired by Palm OS apps :-P
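The four behaviours TinyFeeds is described as having (feeds from a .txt file, only today's stories, one story at a time, remembering what was viewed) fit in a small amount of Rust. The sketch below is not TinyFeeds' code. The feed-list format (one URL per line, `#` for comments), the `Story` struct, and the function names are assumptions, and dates are plain strings to avoid pulling in a date library.

```rust
use std::collections::HashSet;

/// Hypothetical, already-fetched story; dates are plain "YYYY-MM-DD" strings.
struct Story {
    url: String,
    title: String,
    date: String,
}

/// Parses the feed list: one URL per line, skipping blank lines and
/// `#` comments (this file format is an assumption).
fn parse_feed_list(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}

/// Returns the next story from `today` that has not been viewed yet,
/// and records it in `seen` so it is never shown again.
fn next_story<'a>(
    stories: &'a [Story],
    today: &str,
    seen: &mut HashSet<String>,
) -> Option<&'a Story> {
    let story = stories
        .iter()
        .find(|s| s.date == today && !seen.contains(&s.url))?;
    seen.insert(story.url.clone());
    Some(story)
}

fn main() {
    let feeds = parse_feed_list("# my feeds\nhttps://a.example/rss\n\nhttps://b.example/atom\n");
    println!("{} feeds configured", feeds.len());

    let stories = vec![
        Story { url: "https://a.example/1".into(), title: "Old news".into(), date: "2025-05-26".into() },
        Story { url: "https://a.example/2".into(), title: "Fresh".into(), date: "2025-05-27".into() },
        Story { url: "https://b.example/9".into(), title: "Also fresh".into(), date: "2025-05-27".into() },
    ];
    let mut seen = HashSet::new();
    // Show a single story at a time until today's stories run out.
    while let Some(story) = next_story(&stories, "2025-05-27", &mut seen) {
        println!("{} <{}>", story.title, story.url);
    }
}
```

A real reader would persist `seen` between runs (for example as another text file) and fetch and parse the feeds; both are outside the scope of this sketch.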

Ivan Sagalaev 3 weeks ago

nfp -e

Last Friday I spotted Dave Gauer's post about using a text editor as a UI which hit some of my sweet spots about computers. One of the examples mentioned in it was which opens a cron config file in a text editor. And not only it spares you remembering the location of the config, but it also offers a guiding commented example if the config is missing, and helpfully signals cron to pick up the changes after you finish editing. And almost immediately I thought about my own tool that could use something like this: nfp . Amazingly enough, not only had I actually opened the project and started fiddling around, I continued doing it through the weekend, and by the end of day on Sunday actually finished the feature! And despite it being a rather small one, as they go, I have to add that I was coding all through watching snooker matches, cooking food, chauffeuring my family on errands and dealing with some emergencies. So I went to bed feeling quite happy with myself :-) It still feels exciting to me how any programming task gradually reveals its true complexity after you go from thinking about what you should do to actually doing it. Saying "nfp -e should open the config in a text editor and restart after editing" sounds simple enough, but here's a few of the questions I had to work through. Some of them were quite the head-scratchers. First, open which editor? There's an obvious env var, but there's also , which usually takes precedence. How do you restart? Sending a signal to a working daemon was the first thing that came to my mind, but that might prove cumbersome, as the daemon lives in a loop waiting for file events, and this loop owns the information parsed from the config. Handling would require a separate facility to update that information. I'm not quite comfortable thinking of how to do that in Rust. 
Thankfully, this complication turned out to be a blessing in disguise: since the file watching machinery is already there, just watch the config too and add a special case to handle it differently from regular files! Do you edit the config file directly or do you do it on the side, in a temp file? The temp file feels like a cleaner, safer choice, because it gives you a chance to verify correctness of the new config and prevent the real one from breaking. But there's a downside: how do you open the same temp file for the user to continue editing it the next time they run ? It's going to be a new process, it doesn't know the old temp file. does it by organizing a loop within the same process with a yes/no prompt asking the user if they want to re-edit the same file. But that starts feeling more complicated than the feature deserves. I ended up with a simpler solution where I always open the actual config, which means it can get mangled. I handle it in the running daemon itself, which simply refuses to restart its main loop when it can't parse the config. Providing an example config on the first run proved to be tricky, as confy (the config handling library) actually immediately creates a non-empty file if it doesn't exist on a load attempt. So I had to rewire my brain to think "a config with no useful entries" instead of "a missing config." That worked! All in all, this was quite fun! I (finally) converted the repository from pijul to git and pushed it to CodeBerg. I still think pijul has a superior architecture as a VCS, but the world has apparently settled on git for good. Also, while I'm happy to not deal with the toxic culture of GitHub, having code published in a weird way means most people wouldn't even want to try it. After 4 years I haven't gotten a single peep of feedback :-) And I still believe in sharing. I hope CodeBerg becomes my sweet spot.
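The "which editor?" question from the post can be sketched as a small pure function plus a launcher. This is an illustration, not nfp's code. The source elides the variable names, so reading `VISUAL` first and `EDITOR` second is an assumption based on common Unix convention, as is the fallback to `vi`; `pick_editor` and `edit_config` are hypothetical names.

```rust
use std::env;
use std::path::Path;
use std::process::Command;

/// Chooses the editor command: the first non-empty value of the two inputs,
/// in precedence order, falling back to `vi` (the fallback is an assumption).
fn pick_editor(visual: Option<&str>, editor: Option<&str>) -> String {
    visual
        .into_iter()
        .chain(editor)
        .map(str::trim)
        .find(|value| !value.is_empty())
        .unwrap_or("vi")
        .to_string()
}

/// Opens `path` in the chosen editor and waits for it to exit.
/// Returns whether the editor exited successfully.
/// Note: values with arguments (such as "code -w") are not split here.
fn edit_config(path: &Path) -> std::io::Result<bool> {
    let editor = pick_editor(
        env::var("VISUAL").ok().as_deref(),
        env::var("EDITOR").ok().as_deref(),
    );
    Ok(Command::new(editor).arg(path).status()?.success())
}

fn main() -> std::io::Result<()> {
    match env::args().nth(1) {
        // With a path argument, actually launch the editor on it.
        Some(path) => {
            let ok = edit_config(Path::new(&path))?;
            println!("editor exited {}", if ok { "cleanly" } else { "with an error" });
        }
        // Without one, only report which editor would be used.
        None => println!(
            "would launch: {}",
            pick_editor(
                env::var("VISUAL").ok().as_deref(),
                env::var("EDITOR").ok().as_deref(),
            )
        ),
    }
    Ok(())
}
```

Keeping the choice in a function that takes plain `Option<&str>` values makes the precedence rule testable without touching the process environment. In the design the post settles on, nothing more is needed after the editor exits, because the running daemon watches the config file and reloads it on its own.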

Kaushik Gopal 3 weeks ago

Agents are the new compilers. Specs are the new code.

Linus Torvalds recently said 1 AI will be to code what compilers were to assembly — freeing us from writing it by hand. Around the same time, I talked with Jesse Vincent (creator of one of the most popular agent skills out there — superpowers ). Something he said stuck with me: Specs are going to be the new code . I realize those two ideas snap together a little too neatly. Agents are compilers 2 and specs will become code. Software engineering is moving up another level of abstraction and we’ve seen this play out before. I saw this first-hand with my tiny USB-C cable checker — . It started as a shell command over macOS’s , then became Go when I wanted a proper binary, then Rust because I wanted to practice Rust, and later a version. The code kept changing. The thing I cared about did not: parse the USB tree, identify the attached devices, report the speed, and make bad cables obvious. , my voice track sync program, followed the same pattern. It started in Python because the audio libraries were there. Then I moved it to Rust because I didn’t want to ship a Python runtime or care which Python version happened to be on a machine. Again, the implementation changed. The behavior stayed boringly stable: take a master track and local tracks, find the offset, pad or trim each file, and drop aligned audio into the DAW. Compilers freed us from writing assembly. Agents may free us from writing code because it becomes an artifact the spec produces. The somewhat recent push around detailed exec plans could be an early signal of the looming shift at bigger scale. Push that thought further. We might get comfortable rebuilding whole modules instead of patching and refactoring them. We preserved the old shape of a system because throwing it away cost too much. Even when you know the module is wrong, you sand it down: extract an interface, migrate one caller at a time, add tests around behavior nobody fully understands. 
You keep moving because the alternative is a rewrite, and rewrites have a well-earned reputation for eating companies alive. But agents change that cost curve. If an agent can read the spec, understand the tests, inspect production traces, and rebuild a module in an afternoon, the sensible move may be to replace the entire module altogether. Push that even further and the unit of work changes. You stop asking an agent to patch one function or file. You ask it to rebuild the entire payment module against the tweaked spec. Heck, swap out the auth layer with a new library. Or regenerate the API boundary, now that the domain model is clearer. This is the part I cannot stop thinking about. Each rebuild can start from what we now understand about the whole module, not from what we believed the first time someone shipped it. Tech debt the old code carried (because it grew one patch at a time) can finally come off. The spec can absorb what we learned from the old implementation: the weird edge case in billing, the migration path nobody wrote down, the customer whose workflow depends on a “bug”, the batch job that only fails on the first day of the month. Specs become the place where the system’s memory lives. Once those lessons move into the spec, the implementation becomes replaceable. We are becoming Spec Writers. starts at the 1:48 mark  ↩︎ Yes, agents aren’t deterministic the way compilers are — same prompt tomorrow may give different code. But that may be the wrong bar moving forward. What has to stay stable is behavior under the spec; the code can vary. Also my dude, are you seriously nitpicking with Linus Torvalds?  ↩︎

qouteall notes 1 months ago

Rust Async Traps

In Rust, if you call an async function, it returns a future. But the future is just data by default. If you don't await it or spawn it, its async code won't run. The word "future" has a very different meaning in Java. In Java, when obtaining a , the task should already be running. An async runtime schedules async tasks on threads. When an async task suspends, the thread can run other async tasks. But this requires the async task to cooperatively suspend ( ). An async task can keep running without for a long time, and the async runtime cannot force-suspend it. A scheduler thread is then kept occupied. This is called blocking the scheduler thread . A blocked scheduler thread reduces overall concurrency and performance, and it may cause deadlock. The normal sleep and normal locking block the thread using OS functionality. When a thread is blocked by the OS, the async runtime doesn't know about it. In Tokio, use for mutex and and sleep. They pause cooperatively and avoid that issue. The issue is not limited to locking and sleep. It also involves networking and all kinds of IO. So Tokio provides its own set of IO functionality, and you have to use it when using Tokio for max performance. Heavy computation without a point is also blocking. The async runtime cannot force-suspend heavy computation if it doesn't cooperatively . Tokio also supports an "escape hatch". A task spawned by runs in another thread pool and won't block the normal scheduler threads. Code that does non-async blocking or heavy compute work should be run in . How to deadlock Tokio application in Rust with just a single mutex Why do I get a deadlock when using Tokio with a std::sync::Mutex? In Rust, a future can be dropped. When it's dropped, its async code stops executing at an await point. This is called cancellation. It's an implicit exit mechanism. Its control flow is not obvious in code. Note it cancels the future, not the IO.
Cancelling a future just stops the async code from running (and drops related data). IO operations that are already done won't be cancelled. (The written files won't be magically rolled back. The sent packets won't be magically withdrawn.) Cancellation is not the only implicit exit mechanism. Panic is another one. And in languages that have exceptions (Java, JS, Python, etc.), an exception is another implicit exit mechanism. However, exceptions and panics are often logged, but a future cancel is often not logged . Although a panic is implicit control flow, it's often explicit in logs. It's easy to debug because it's visible in the log. But a future cancel logs nothing by default. Debugging a future cancel issue is much harder than debugging panics. The cancellation "catch": normally when the parent future is cancelled, the inner futures are also cancelled. It propagates from outside to inside. The can stop that propagation. Although is , dropping it won't cancel the spawned task. So if you want to avoid cancellation, wrap it in (and don't call ). In Golang, there is panic, but there is no implicit cancellation. All cancellation needs to be explicit. (However, managing context cancellation in Golang still has traps, just different from async Rust.) Two examples of cancellation issues: Alan tries to cache requests, which doesn't always happen , Barbara gets burned by select See also: Dealing with cancel safety in async Rust , Cancelling async Rust There is another kind of "cancel": doesn't drop the future but does not the future. This is also dangerous. Elaborated below. Tokio documentation about cancellation safety: 1 , 2 Note again that "cancel" just drops the Rust future (and un-tracks it in the async runtime). It doesn't cancel the IO operation. With epoll, the buffer can be put directly inside the future, with no extra allocation. If the Rust future is dropped, it just doesn't do the IO after being notified.
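The laziness and drop-based cancellation described above can be reproduced without any runtime. The following is a minimal sketch using only the standard library, polling a future by hand; `NoopWaker`, `YieldOnce`, `LogOnDrop` and `cancellation_demo` are hypothetical names made up for this illustration and are not part of Tokio or any other library.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that does nothing; enough to poll a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical future: `Pending` on the first poll, `Ready` on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Records a message when dropped, so the implicit exit becomes visible.
struct LogOnDrop(Rc<RefCell<Vec<&'static str>>>);

impl Drop for LogOnDrop {
    fn drop(&mut self) {
        self.0.borrow_mut().push("guard dropped");
    }
}

fn cancellation_demo() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let task_log = Rc::clone(&log);

    let fut = async move {
        let _guard = LogOnDrop(Rc::clone(&task_log));
        task_log.borrow_mut().push("before await");
        YieldOnce(false).await;
        // Never reached in this demo: the future is dropped while suspended.
        task_log.borrow_mut().push("after await");
    };

    // Creating the future ran none of its code: it is just data.
    assert!(log.borrow().is_empty());

    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    // The first poll runs the code up to the await point, then suspends.
    assert!(fut.as_mut().poll(&mut cx).is_pending());

    // Dropping the suspended future is the cancellation. Destructors of live
    // locals run, but nothing is logged unless the code does so itself.
    drop(fut);

    let entries = log.borrow().clone();
    entries
}
```

Calling `cancellation_demo()` yields a log containing "before await" and "guard dropped" but never "after await". A drop guard like this is one way to make an otherwise silent cancellation show up in logs.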
With io_uring, dropping the future doesn't cancel the kernel's IO process. So putting the buffer into the future with io_uring is not memory-safe on cancellation (the kernel will write into freed memory). Two solutions: See also: Notes on io-uring As previously mentioned, dropping a future cancels it. There is another kind of "cancellation": just not the future, without dropping the future. It's also dangerous. It may cause deadlock or weird delays. In you can pass ownership of a future, but you can also pass a future borrow. When a future borrow is passed, one dangerous case can happen. If the select goes into one branch, the futures of the other branches are dropped. If you pass a future borrow to it, the borrow itself is dropped, but the borrowed future is not dropped. However, the borrowed future will not be polled again (you can explicitly await it after the , but it doesn't before finishing). This creates a temporarily un- -ed future. This is dangerous when an async lock is involved. After acquiring the lock, the returned future holds the lock. If the future holding the lock is dropped, it releases the lock. But if the future holds the lock and is neither dropped nor polled, it's likely to deadlock. This is the mechanism behind futurelock . When using a buffered stream, some futures in the buffer may be temporarily un- -ed. This can cause weird delays or deadlock. https://tmandry.gitlab.io/blog/posts/for-await-buffered-streams/ https://without.boats/blog/poll-progress/ Rust currently has no in-place initialization. Heap-allocating one thing requires first creating it on the stack and then moving it to the heap. In release mode, this can be optimized into initializing directly on the heap. But in debug mode it still involves creating it on the stack. Some futures may be very large. Creating a large future on the stack can cause stack overflow. Sometimes the stack overflows in debug mode but not in release mode, because in release mode it writes directly to the heap.
On Windows the default stack size is smaller, so a stack overflow is more likely. There is currently some inefficiency in future size. See Async Future Memory Optimisation How to reduce future size: It will print All of them execute on main thread. There is no parallelism. The parallelism can be enabled by using . But without it has no parallelism by default. This is different in Golang. In Golang, goroutines are parallel. Async-sync-async sandwich: an async function calls a sync function that blocks on another async function. Its async-to-sync call blocks the scheduler thread. It's very prone to deadlock. Tokio does multi-thread work-stealing scheduling. Its purpose is very similar to OS scheduling. And an async task's purpose is very similar to an OS thread's. The duality of the two: As long as the data is owned by a thread, it's data-race free. The correspondence: as long as the data is owned by an async task, it's data-race free. Tokio requires the future to be . This can create some trouble. It requires because Tokio does work stealing. An async task running on one thread could later be scheduled onto another thread. However, if an async task is analogous to a thread, then ensuring that the data is owned by the async task would also achieve data-race freedom, even if the data is not . However, Rust doesn't check the "async task boundary". An async task can pass data out. Then the data is no longer owned by the async task. There is no language mechanism that ensures that the data stays within an async task. So you still have to satisfy even for data that's only used within one async task. The constraint can be avoided for thread-per-core async runtimes. Using multiple async runtimes together is possible but is hard and error-prone. And there are many async-runtime-specific types. So async runtimes are naturally mutually exclusive. That's why Tokio has a monopoly. In Golang you can only use one official goroutine scheduler.
In Rust, although Tokio has a monopoly, you have the choice of using other async runtimes. This trap is not Rust-specific. A thread pool often has a thread count limit, which limits concurrency. But in async, there is no concurrency limit by default. This is good for a high-performance web server. But it has downsides: One solution is to add a semaphore to limit concurrency. Structured concurrency forces all concurrent tasks to be scoped. Then the tasks form a tree-shaped structure. With structured concurrency, tasks can borrow data from the parent. There is no need to make the future . There is no need to wrap things in . The tree shape is free of cycles, so awaiting on child tasks alone cannot deadlock (but it can deadlock if other kinds of waits are involved). But there are cases that structured concurrency cannot handle. One is background tasks. For example, a web server provides a RESTful API that launches a background task. The background task keeps running after the request that launched it finishes. The bane of my existence: Supporting both async and sync code in Rust Why async Rust? Async Rust can be a pleasure to work with (without ) Making Async Rust Reliable - Tyler Mandry FuturesUnordered and the order of futures The "fully owned" here means not just ownership in Rust semantics. The has internal data structures. The "fully owned" applies to these internal data structures. One async task fully owning the means the internal data structure (that contains the reference count) is only accessible from one async task. ↩ . When one branch is selected, the futures of other branches are cancelled. . Explicitly cancel a task. . When the timeout is reached but the future hasn't finished, it's cancelled. In epoll, the OS notifies the app that an IO can be done, then the app does another system call to do the IO. It involves context switching from kernel to app (receive notification), then to kernel (do the IO syscall) then to app (finishing IO).
The app can choose to not do the IO after receiving the notification. This works well with Rust future cancellation. In io_uring, the OS directly finishes the IO (writes to the buffer) and then tells the app. It's just one context switch from kernel to app (it's faster than epoll's kernel-to-app-to-kernel-to-app). The IO is fully done by the kernel. The app cannot choose to "receive notification but not do IO". When the app receives the notification, the IO has already been done. This doesn't work well with Rust async cancellation. Make the future non-cancellable. Rust doesn't yet have linear types (must-move types), so this cannot be guaranteed by the language. Make the buffer heap-allocated. When the future is dropped, the buffer can still exist, and the kernel can write to it without violating memory safety. Avoid creating an in-place buffer like . The buffer will be stored directly in the future. When calling another async function, first box that future and then await it. If not boxed, the sub-future will be put directly inside the parent future. Making async code call sync code is easy, but has the risk of blocking the scheduler thread, as mentioned previously. Making sync code call async code is not easy. It requires using the async runtime's API. But it's less risky. For a scraper, if concurrency is too high, it may use too much memory and then OOM. If it sends too many concurrent requests to a remote server, it may trigger a rate limit, and then most requests fail.
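The two size-reduction tips above (keep large buffers out of the future, and box a sub-future before awaiting it) can be checked by measuring future sizes directly. This is a rough sketch using only the standard library; the function names and the 4 KiB buffer are made up for illustration, and exact sizes depend on the compiler version.

```rust
use std::future::ready;
use std::mem::size_of_val;

/// Hypothetical child task: the 4 KiB buffer is used after the await point,
/// so it has to be stored inside the future itself.
async fn child_with_inline_buffer() -> usize {
    let mut buf = [0u8; 4096];
    buf[0] = 1;
    ready(()).await; // await point; `buf` must survive across it
    buf.iter().map(|&b| usize::from(b)).sum()
}

/// Awaiting the child directly embeds the whole child future in the parent.
async fn parent_inline() -> usize {
    child_with_inline_buffer().await
}

/// Boxing the child first means the parent only stores a pointer to it.
async fn parent_boxed() -> usize {
    Box::pin(child_with_inline_buffer()).await
}

/// Returns (size of the unboxed parent, size of the boxed parent) in bytes.
/// Calling an async fn only builds the future; none of its code runs here.
fn future_sizes() -> (usize, usize) {
    (size_of_val(&parent_inline()), size_of_val(&parent_boxed()))
}
```

With this sketch, the first value returned by `future_sizes()` is at least 4096 bytes, while the second stays far below that. The cost is one heap allocation per call, and, as noted earlier, a debug build may still construct the child future on the stack before moving it to the heap.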

Evan Schwartz 1 months ago

Scour - April Update

Hi friends, In April, Scour scoured 778,059 posts from 25,790 feeds . This month, my focus was on ranking improvements and adding a number of new features: Scour is designed to find hidden gems that interest you, while trying to avoid using popularity signals or pigeonholing you into a narrow slice of content simply because you clicked on one thing (you can read the ranking philosophy here ). Your Scour feed now subtly adjusts based on which content you click on, like, or dislike. Interests whose related content you like will get a small boost, as well as posts from domains that you tend to like. This effect is intentionally subtle. The feed is also much better now at balancing across your different interests. I revamped the way it does the final content selection to have an explicit diversification step that balances the feed based on your interests, the sources, and other criteria. Scour's interface has undergone a number of iterations this month. Now, you click or tap a post to expand it. The expanded view contains a short snippet from the post with a link to read more, as well as buttons to save, react, report it, etc. Want to save an item to read for later? You can now save items , which is separate from liking them. Saved items are private and don't affect your feed's ranking at all. Also, Scour will occasionally resurface a couple of your saved items while you're browsing your feed so you can revisit things you might not have had time to read before. You can read post summaries and some entire posts directly on Scour. Click on Read More, which is shown when you click on a post, to go to the post preview page. That page has better styling now, so it should be nicer to read. Plus, code blocks now get automatic syntax highlighting. You can now browse popular interests by category. Technology is broken out into subcategories, or you can easily skip past it to find other topics like Science & Nature, Food & Cooking, Arts & Design, etc. 
Clicking on a post's domain now brings you to a chronological list of all the posts from that site and, optionally, all the subdomains. You can easily block domains on that page if you don't want any of their content appearing in your feed, or just browse to see what else was published. The default feed view switched from infinite scrolling to paginated. You can click the link at the bottom of the page to use infinite scroll, or toggle this in your settings. Thanks to Gordon McLean for the Scour mention in Why I Still Like the Internet ! And thanks to everyone whose feedback shaped the roadmap this month: Here were some of my favorite posts that I found on Scour in April: For Rust developers, I also wrote up this blog post: Your Clippy Config Should Be Stricter . Have ideas for how to make Scour better? Post them on the feedback board ! Happy Scouring! Thanks to Qiang Huang for requesting an easier way to see the post preview! Thanks to Shane Sveller for lots of UI feedback and requesting the ability to block multiple subdomains ! Thanks to Phil Eaton and Gordon McLean for pointing out that the footer was impossible to reach (it's now hidden completely when infinite scroll is enabled)! Thanks also to Phil for asking to see all posts from a domain! Thanks to u/goma_goma for suggesting adding Saved Posts! Thanks to Adam Benenson and Patrick Wadström for the feedback that led to the categorized interests view! TurboPuffer wrote an interesting blog post about efficiently merging recency and other numeric signals into lexical (BM25) scores for documents. I'm currently working on adding lexical scoring to Scour, so this was very timely for me: Mixing numeric attributes into text search for better first-stage relevance . On the topic of search, Doug Turnbull had a good post discussing Can agents replace the search stack? and Daniel Tunkelang wrote about using multiple documents to represent a search query in Distilling Retrieval Pipelines to a Single Embedding Model . 
I'm not switching Scour's architecture to either of these just yet, but they're interesting food for thought. I uninstalled Ollama, the tool for running local LLMs, after reading: Friends Don't Let Friends Use Ollama . This is a gem of a comment and historical tidbit in the SQLite source code that Avinash Sajjanshetty found while working on the Turso rewrite: SQLite prefixes its temp files with . On the non-software front, this article makes an unfortunately compelling point: Iran didn’t have a nuclear weapon before this war. But you can see why it would develop one now .

Evan Schwartz 1 months ago

Your Clippy Config Should Be Stricter

“If it compiles, it works.” This feeling is one of the main things Rust engineers love most about Rust, and a reason why using it with coding agents is especially nice. After debugging some code that compiled but mysteriously stopped in production, I realized that it’s useful to enable more Clippy lints to catch bugs that the compiler won't prevent by itself. It's especially useful as guardrails for coding agents, but stricter linting can make your code safer, whether or not you’re coding with LLMs. Scour is the personalized content feed that I work on. Every Friday, Scour sends an email digest to each user with the top posts that matched their interests. On a recent Friday, the email sending job mysteriously stopped. This was puzzling because I had already put in place multiple type system-level safeguards and tests to ensure that it would continue with a log on all types of errors. After digging into the logs, I found the culprit to be . A function naively truncated article summaries without checking for UTF-8 character boundaries, which caused a panic and stopped the Tokio worker thread running the email sending loop. The solution for this particular bug was a safer method for truncating article summaries that respects UTF-8 character boundaries. However, this problem was reminiscent enough of the 2025 Cloudflare bug that "broke the internet" that I wanted some more general solution. Rust's compiler prevents many types of bugs but there are still production problems it can't catch. Panics will either crash your program or quietly kill Tokio worker threads. Deadlocks and dropped futures can make work silently stop. And plenty of numeric operations can silently cause incorrect behavior. We can stave off many of these types of bugs by making Clippy even stricter than it already is. This is especially relevant in the age of coding agents. A seasoned Rust engineer might naturally avoid patterns that could cause problems. An agent or a junior colleague might not. 
Stricter Clippy rules make it easier to rely on code you didn't personally write. Also, enabling new lints on an existing codebase is tedious, and exactly the kind of task that is good to hand to a coding agent. Clippy ships with hundreds of lints that are disabled by default. Some are disabled because they might have false positives and some are style choices which you might reasonably not want. Which lints should we enable to help us get back the "if it compiles [and passes Clippy], it works" feeling? Clippy's lints are grouped into categories : Correctness, Suspicious, Complexity, Perf, Style, Pedantic, Restriction, Cargo, Nursery, and Deprecated. Unfortunately, none of these categories cleanly map onto "don't let this panic or do the wrong thing in production". In fact, the Clippy docs say that "The category should, emphatically , not be enabled as a whole." Clippy even includes a dedicated lint, , to discourage you from enabling this category. While the category includes many useful lints, it also includes some that directly contradict one another. For example, it contains lints to enforce both and . The docs say "Lints should be considered on a case-by-case basis before enabling". Of course, you can enable whole categories like and and then specific ones you want to disable, but I'm outlining a selective opt-in here. Even if you don't use a certain pattern in your code base today, it's not bad to enable the lint anyway. Inapplicable lints serve as cheap tripwires in case the given pattern is ever added later, whether by you, a colleague, or a coding agent. Every project is different and you should look through the available lints to see which ones make sense for your project. Also, check when lints landed in stable if your Minimum Supported Rust Version predates 1.95, as some of these may have been added after your MSRV. With those caveats out of the way, here are the lints I enabled, roughly categorized by what kind of behavior they prevent. 
You can skip to the bottom if you just want to copy my config . This group prevents panics from unwraps and unsafe slicing or indexing into arrays and strings. Note that some of these, like and may produce many warnings throughout your code base. That may be annoying to fix. However, using safe methods like and iterators instead of slicing prevents pretty severe footguns, so I would argue that it's worth it. You might or might not want to enable . Calling on an or can result in a panic. However, the message you pass to should already document why that thing shouldn't happen. Enabling the lint and then selectively disabling it throughout your code with may end up duplicating the same rationale for using it in the first place. Another lint that is a real judgement call is . This can prevent overflows and division by zero. However, it will cause Clippy to warn you about every place you use math operators: , , , , , and . I tried enabling it in my code base and would estimate that around 15% of the warnings caught real issues and 85% was just noise. These prevent various concurrency bugs and deadlocks: The lints , , effectively force you to document invariants when doing lossy casts between numeric types. You might or might not find that useful. These two are especially useful if you're using a coding agent. Instead of letting the agent write , it should provide a reason wherever it's disabling a lint. If you're using a Cargo workspace, you'll want to enable these lints in the workspace Cargo.toml. Unfortunately, each workspace crate needs to opt in to inheriting lints with , rather than inheriting the lints by default. On nightly, there's a lint that specifically checks for this. If you're using stable Rust, you can use or a simple shell script run on CI to make sure you don't forget to make a workspace crate inherit the lints. When enabling lints, you can either set Clippy to or them. 
Either works but I personally prefer setting these to and running Clippy with before committing and on CI. This makes local iteration marginally easier because you can compile your code initially without fixing all the lints right away. Ultimately, as Clippy's docs say, "You can choose how much Clippy is supposed to annoy help you." But especially in the age of coding agents, I think it's worth tightening the guardrails so you end up with even fewer mysterious bugs in production and more code where you can say "if it compiles and lints, it should work." Discuss on r/rust , Lobsters , or Hacker News .
- on (UTF-8 boundary panic). This would have caught my initial bug.
/ / - placeholder-panic macros
- inside functions that return a
- panics if the second is larger
- / inside a function that returns a
- drops without awaiting
- swallows errors
- silently drops
- loses source error
- discards the error message (only relevant if you're using an earlier edition than 2024)
- deadlock pattern. The scoping was fixed in the 2024 edition so this is no longer an issue.
- a that is too large can cause a stack overflow
- every needs a comment
- one unsafe op per block (one comment per op)
/ - only document safety where it belongs
- on floats
- stricter, also flags comparisons against constants
- silently-rounded float literals ( )
- wraps to
- always false
- ( is single-threaded)
- differs in debug vs release
- method named returning non-
- manual impl that disagrees with
- impl whose error is should be
- calls should be removed after debugging
- every becomes
- every requires a reason
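As an illustration of the kind of fix described for the email digest bug, here is a minimal sketch of truncation that respects UTF-8 character boundaries. The function name `truncate_to_bytes` is hypothetical and this is not the code Scour uses. The lint names in the attribute are an assumption too: the names in the list above were lost, and `clippy::string_slice` and `clippy::indexing_slicing` are real restriction lints that appear to match the first group.

```rust
/// Returns the longest prefix of `s` that fits in `max_bytes` bytes without
/// cutting a multi-byte character in half.
///
/// The attribute assumes the relevant lints are `string_slice` and
/// `indexing_slicing`; with them denied, Clippy would reject `&s[..max_bytes]`.
#[deny(clippy::string_slice, clippy::indexing_slicing)]
fn truncate_to_bytes(s: &str, max_bytes: usize) -> &str {
    // Find the end offset of the last whole character that still fits.
    let end = s
        .char_indices()
        .map(|(start, ch)| start + ch.len_utf8())
        .take_while(|&end| end <= max_bytes)
        .last()
        .unwrap_or(0);
    // `get` returns `None` instead of panicking on a bad range.
    s.get(..end).unwrap_or("")
}
```

For example, `truncate_to_bytes("héllo", 2)` returns `"h"`, because the two-byte `é` does not fit, whereas slicing with `&s[..2]` would panic in the middle of that character.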

Marc Brooker 1 months ago

It's time to be right.

Outcomes continue to matter. Earlier this week, I spoke at AI Dev 26. This is what I spoke about there. I’ve been making money, in some form, building software for nearly 30 years. The last five months have been the most exciting of that entire time. I’m extremely optimistic about the future of software, and the future of software engineering as a field. But I have a hypothesis about agentic AI for development, and for knowledge work broadly: in future, the size of the opportunity for agentic AI will be more limited by defect rate than capabilities. Let’s break that down a little bit, by thinking about defects along two axes: how serious the defects are, and how frequent they are. These axes intentionally conflate two inputs (how hard the problems are and how capable agents are at solving them) and focus only on the output that matters most: user-experienced defects. We’re also focusing on outputs from an AI agent here. Agents are feedback loops. Feedback loops, just like in electronics and control theory, can have significantly different capabilities from their underlying components. In simpler terms, agents can work around model gaps very effectively 1 . Simplifying further, we’ll arrange these axes into a kind of four-blocker, and think about the kinds of people that would use an agent in each block. Again, I’m conflating the difficulty of the problem with the capabilities of the agents here. Easier problems move towards the top left more quickly. The point, though, is that defect rate is going to be one of the main inputs into how many people can use an agent, irrespective of how well it does on its best days. We can also frame the problem as a distribution of outcomes. The right tail is the positive capabilities that agents have, and is the part of the distribution that gets the most attention and effort.
The left tail is the defects, the bad outcomes, which doesn’t get nearly as much attention but is probably more important as an area to invest in if you care about serving real customers and growing a real business on AI agents. Somewhat amusingly, my favorite knowledge work agent took five tries to draw this Cauchy distribution SVG. The first version was a normal distribution, and the next three were weirdly spiky or discontinuous. At each step it insisted it was me that was wrong about what a Cauchy distribution looks like. Solidly in the bottom left corner here 2 . I want to highlight some of the work we’re doing at AWS on agent correctness. This is just a sample of a large body of work, but shows the direction we’re heading in. This is also something that needs an industry-wide focus and attention, and not something we can do just by building tools. Some changes I’d like to see are: As I said up front, I’m super optimistic about the future of this field. But I think that a lot of the conversation about risks is either silly sci-fi stuff, or straight-up denialism aimed at the right hand side of the distribution. There’s a really smart and important conversation to be had about the left hand side, but too few people are having it today. High defect frequency, high defect seriousness. Basically nobody. Except maybe a small set of true believers and early adopters. If an agent is making highly consequential mistakes often, it’s simply not going to be useful to a lot of folks. High defect frequency, low defect seriousness. People working on problems where slop is OK. This is a larger opportunity, because slop is OK fairly frequently. If I’m using an agent for low consequence stuff, like summarizing an email about this fall’s soccer league, it’s likely to do a better job than I’d do by skimming.
And, as much as it tends to hurt our sense of professional pride, there’s also a huge opportunity for software slop in one-off scripts, little experiments and tools, basic UIs, and so on. Even stability-sim.systems is, in some sense, slop. Low defect frequency, high defect seriousness. This is an odd corner, where the opportunity is mostly constrained to a set of experts. Software built here is going to need to be reviewed, debugged and operated by people who deeply understand how it works. Often, that debugging is going to require significantly more understanding than it took to build the software in the first place. That doesn’t make agentic AI useless, but does severely limit the set of people who can get value out of it. Low defect frequency, low defect seriousness. This is where we want to be, because it means that everybody can play. The defect rate is low enough that it’s not annoying or time wasting, and the defects are low-consequence enough that those which remain don’t matter. Correct-by-construction coding tools and languages, like Hydro for distributed systems, and Cedar for auth. These are tools that agents can use to avoid entire classes of high-consequence defects 3 . Spec-driven development in agents like Kiro , which gives the coding agent additional big picture context that helps it evolve systems over time without regressing on key properties. Property-based testing is another example of the same pattern at a different scale. Code reasoning tools like Strata , powered by Lean , that allow agents to reason formally about properties of code. Autoformalization, turning natural language into precise formal implementations, in Bedrock AR Checks and AgentCore Policy , which remove whole classes of runtime defects (especially in critical places like tool call safety). Deterministic and precise policy for tools, in Trusted Remote Execution and AgentCore Policy , which precisely constrain agents tool call behavior. 
Principled approaches to deterministic agent steering, like Strands Steering , which can keep agents on the right path while still taking advantage of their power and flexibility. Benchmarks which capture failure severity, not just pass/fail. Pass@10 isn’t super meaningful if the other 9 swings were subtly and non-obviously wrong in consequential ways. An end to end view of dev agent success, not just code patching. Operations, cost, availability, durability, security, performance, stability, etc. These are things that customers of software care about deeply, and aren’t captured well in existing benchmarks. A research program to develop a deep understanding of agentic AI failure modes, and a taxonomy of failures. A culture where we take our worst days as seriously as our best ones. In my recent conversation with Ryan Peterman I spoke a lot about AWS’s culture around learning from failures, and I think that’s a pattern we need industry-wide focus on in agentic AI. But also hide model capabilities. Bad agentic harnesses and the wrong feedback can make a great model bad, and great feedback can make a bad model much better. I knew people would accuse me of being insufficiently bitter lesson pilled after this talk. The choice of the Cauchy distribution is a little easter egg for those people. Memory-safe languages like Rust are also in this category.

Corrode 1 month ago

Bugs Rust Won't Catch

In April 2026, Canonical disclosed 44 CVEs in uutils, the Rust reimplementation of GNU coreutils that ships by default since 25.10. Most of them came out of an external audit commissioned ahead of the 26.04 LTS. I read through the list and thought there’s a lot to learn from it. What’s notable is that all of these bugs landed in a production Rust codebase, written by people who knew what they were doing, and none of them were caught by the borrow checker, clippy lints , or cargo audit . I’m not writing this to criticize the uutils team. Quite the contrary; I actually want to thank them for sharing the audit results in such detail so that we can all learn from them. We also had Jon Seager, VP Engineering for Ubuntu, on our ‘Rust in Production’ podcast recently and a lot of listeners appreciated his honesty about the state of Rust at Canonical. If you write systems code in Rust, this is the most concentrated look at where Rust’s safety ends that you’ll likely find anywhere right now. This is the largest cluster of bugs in the audit. It’s also the reason , , and are still GNU in Ubuntu 26.04 LTS. :( The pattern is always the same. You do one syscall to check something about a path, then another syscall to act on the same path. Between those two calls, an attacker with write access to a parent directory can swap the path component for a symbolic link. The kernel re-resolves the path from scratch on the second call, and the privileged action lands on the attacker’s chosen target. Rust’s standard library makes this easy to get wrong. The ergonomic APIs you reach for first ( , , , ) all take a path and re-resolve it every time, rather than taking a file descriptor and operating relative to that. That’s fine for a normal program, but if you’re writing a privileged tool that needs to be secure against local attackers, you have to be careful. Here’s the bug, simplified from . 
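A minimal sketch of this check-then-act shape, assuming a hypothetical helper that replaces a destination file. The function names are illustrative and this is not the uutils code; it only shows two path-based calls on the same name next to an exclusive-create alternative.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Racy (hypothetical example): two path-based calls on the same name.
/// The kernel resolves `dest` from scratch each time.
fn write_fresh_racy(dest: &Path, data: &[u8]) -> io::Result<()> {
    // Step 1: clear whatever currently sits at `dest`.
    if dest.exists() {
        fs::remove_file(dest)?;
    }
    // Step 2: `File::create` re-resolves `dest` and follows a symlink
    // that someone may have planted after step 1.
    let mut file = File::create(dest)?;
    file.write_all(data)
}

/// Exclusive create: `create_new(true)` fails if anything already exists
/// at `dest`, including a dangling symlink, so a successful open means
/// the file is genuinely new.
fn write_fresh_exclusive(dest: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(dest)?;
    file.write_all(data)
}
```

With the exclusive variant, a pre-existing entry surfaces as an `ErrorKind::AlreadyExists` error that the caller has to handle, rather than being silently followed.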
Between step 1 and step 2, anyone with write access to the parent directory can plant as a symlink to, say, . Then follows the symlink and the privileged process happily overwrites with whatever happened to contain. The fix uses : The docs for say (emphasis mine): No file is allowed to exist at the target location, also no (dangling) symlink . In this way, if the call succeeds, the file returned is guaranteed to be new. A in Rust looks like a value, but remember that to the kernel it’s just a name. That name can point to different things from one syscall to the next. Anchor your operations on a file descriptor instead. only helps with that when you’re creating a new file. For everything else, open the parent directory once and work relative to that handle . If you act on the same path twice, assume it’s a TOCTOU (Time Of Check To Time Of Use) bug until you’ve proven otherwise. This is a close relative of TOCTOU. You want a directory with restrictive permissions, so you write something like this. For a brief moment, exists with the default permissions. Any other user on the system can it during that window. Once they have a file descriptor, the later doesn’t take it away from them. Reach for and so the file or directory is born with the permissions you want. The kernel will apply your on top, so set that explicitly too if you really care. The original check in was literally this: That comparison is bypassed by anything that resolves to but isn’t spelled . So , , , or a symlink that points to . Run and see it rip right past your check and lock down the whole system. Here’s the fix : resolves , , and symlinks into a real absolute path. That’s a lot better than string comparison. Oh and if you were wondering about this line: I think that’s just a fancy way of saying In the specific case of , this works because has no parent directory, so there’s nothing for an attacker to swap from underneath you. 
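A rough sketch of the difference between a textual root check and a resolved one, assuming a Unix system. The function names are hypothetical and this is not the uutils code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Naive (hypothetical example): compares the path as written.
/// Spellings such as `/..`, or a symlink that points at `/`, slip through.
fn is_root_naive(path: &Path) -> bool {
    path == Path::new("/")
}

/// Resolved: `fs::canonicalize` follows symlinks and collapses `.` and `..`
/// into a real absolute path before the check. A canonical path with no
/// parent is the filesystem root.
fn is_root_resolved(path: &Path) -> io::Result<bool> {
    let resolved = fs::canonicalize(path)?;
    Ok(resolved.parent().is_none())
}
```

Here `is_root_naive(Path::new("/.."))` is `false` even though the kernel resolves `/..` to `/`, while `is_root_resolved` reports it as the root.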
In the more general case of comparing two arbitrary paths for filesystem identity, however, you’d want to open both and compare their pairs, the way GNU coreutils does. (Think identity, not string equality.) By the way, my favorite bug in this group is CVE-2026-35363: It refused and but happily accepted and , then deleted the current directory while printing . 😅 Rust’s and are always UTF-8. That’s a great choice in 99% of all cases, but Unix paths, environment variables, arguments, and the inputs flowing through tools like , , and live in the messy world of bytes. Every time a Rust program bridges that gap, it has three options. The audit found bugs in both of the first two categories. Here’s an example. This is the original code, from . GNU works on binary files because it just shuffles bytes around. The uutils version replaced anything that wasn’t valid UTF-8 with , which silently corrupted the output. Here’s the fix: stay in bytes. forces a UTF-8 round-trip through . does not. It writes the raw bytes directly to . For Unix-flavored systems code, use and for filesystem paths, for environment variables, and or for stream contents. It’s tempting to round-trip them through for easier formatting, but that’s where the corruption creeps in. UTF-8 is a great default for application strings, but it’s absolutely, positively the wrong default for the raw byte stuff Unix tools work with. In a CLI, every , every , every slice index, every unchecked arithmetic operation, every is a potential denial of service if an attacker can shape the input. That’s because a unwinds the stack and aborts the process. If your tool is running in a cron job, a CI pipeline, or a shell script, that means the whole thing just stops working. Even worse, you could find yourself in a crash loop that paralyzes the entire system. A canonical case from the audit was ( CVE-2026-35348 ). 
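A minimal sketch of how this kind of panic arises, using a hypothetical parser for a NUL-separated list of filenames on Unix. It is not the uutils code, and skipping empty entries is a simplification made here for brevity. The strict version demands UTF-8 and panics; the byte-preserving version keeps each name exactly as the kernel would see it.

```rust
use std::ffi::OsString;
use std::os::unix::ffi::OsStringExt;

/// Strict (hypothetical example): panics on the first name that is not
/// valid UTF-8, taking the whole process down with it.
fn parse_names_strict(input: &[u8]) -> Vec<String> {
    input
        .split(|&b| b == 0)
        .filter(|name| !name.is_empty())
        .map(|name| String::from_utf8(name.to_vec()).expect("filename is not UTF-8"))
        .collect()
}

/// Byte-preserving: every name becomes an `OsString`, whatever its encoding,
/// so there is no conversion that could fail.
fn parse_names_bytes(input: &[u8]) -> Vec<OsString> {
    input
        .split(|&b| b == 0)
        .filter(|name| !name.is_empty())
        .map(|name| OsString::from_vec(name.to_vec()))
        .collect()
}
```

Given the input `b"ok.txt\0bad\xff.txt\0"`, the strict parser panics on the second entry, while the byte-preserving one returns both names untouched.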
The flag reads a NUL-separated list of filenames from a file, but the parser called on a UTF-8 conversion of each name: GNU treats filenames as raw bytes, the way the kernel does. The uutils version required UTF-8 and aborted the whole process on the first non-UTF-8 path: (I reproduced this against on macOS. The Python one-liner is there because most modern shells refuse to create a non-UTF-8 filename for you.) Your nightly cron job is dead and there goes your weekend. In code that processes untrusted input, treat every , , indexing, or cast as a CVE waiting to be filed. Use , , , , and surface a real error. Push back on the boundary of your application and let the caller deal with the fallout. A good lint baseline to catch this in CI: These are noisy in test code where panicking on bad data is exactly what you want. The cleanest way to scope them to non-test code is to put at the top of each crate root, or to gate on the individual modules. Closely related to the previous point, a few CVEs come from ignoring or losing error information. and returned the exit code of the last file processed instead of the worst one. So could fail on half the files and still exit . Your script thinks everything is fine. called on its call to mimic GNU’s behavior on . The intent was reasonable, but that same code ran for regular files too, so a full disk silently produced a half-written destination. The reason was that someone wanted to throw away a and reached for , , or . Here’s a very simple pattern to avoid that: Also, if you write to discard a , leave a comment that explains why this specific failure is safe to ignore. A surprising number of these CVEs aren’t “the code does something unsafe” but “the code does something different from GNU, and a shell script somewhere relied on the GNU behavior.” The clearest example is (CVE-2026-35369). GNU reads as “signal 1” and asks for a PID. 
uutils read it as “send the default signal to PID -1”, which on Linux means every process you can see . Yikes! A typo becomes a system-wide kill switch. If you reimplement a battle-tested tool, bug-for-bug compatibility on exit codes, error messages, edge cases, and option semantics is a security feature. (Hello, Hyrum’s Law – and obligatory XKCD 1172 !) Anywhere your behavior diverges from the original, somebody’s shell script is making a wrong decision. uutils now runs the upstream GNU coreutils test suite against itself in CI. That’s the right scale of defense for this class of bug. CVE-2026-35368 is the worst single bug in the audit. It’s local root code execution in . The bug is visible if you know what to look for (a followed by a function call that loads a dynamic library), but it’s the kind of thing that doesn’t jump out on a first read. Here’s the pattern, simplified from the utility. Huh. Looks innocent. The trap is that ends up loading shared libraries from the new root filesystem to resolve the username. An attacker who can plant a file in the chroot gets to run code as uid 0. GNU resolves the user before calling . Same fix here. Once you’re across, every library call might run the attacker’s code. And no, static compilation doesn’t help here, because goes through NSS, which s modules at runtime regardless of whether your binary is statically linked. You might have made it this far and thought “Wow, that’s a lot of bugs! Maybe Rust isn’t as safe as I thought?” That would be the wrong conclusion. Keep in mind that none of the following bad things happened: That means, even if the tools were (and probably still are) buggy, they never had a bug that could be exploited to read arbitrary memory. GNU coreutils has shipped CVEs in every single one of those categories. Take a peek at the last few years of the GNU file: …the list goes on and on. The Rust rewrite has shipped zero of these, over a comparable window of activity. 
1 That’s most of what historically goes wrong in a C codebase. What’s left is, frankly, a more interesting class of bug. It lives at the boundary between our controlled Rust environment and the messy, chaotic outside world, where paths, bytes, strings, and syscalls are all tangled up in one eternal ball of sadness. That’s the new security boundary of modern systems code. 2 If you write systems code in Rust, treat this CVE list as a checklist. Grep your own codebase for , stray calls, discarded s, , and string comparisons against . I also wrote a companion post, titled Patterns for Defensive Programming in Rust . When I think of “ idiomatic Rust ”, correctness is not the first thing that comes to mind. After all, isn’t that the compiler’s job? Instead, I think of elegant iterator patterns , ergonomic method signatures, immutability , or clever use of expressions . But none of that matters if the code doesn’t do the right thing, and the compiler is far from perfect at enforcing correctness. That’s why we don’t only have idioms for writing more elegant code; we also have idioms for writing correct code. They are the distilled experience of a community that has learned, often painfully, which shapes of code survive contact with reality and which ones do not. Reality is rarely as tidy as the abstractions we would like to impose on it. The mark of robust systems, in any language, is the willingness to reflect that untidiness rather than paper over it. Rust gives us extraordinary tools to do so, and the compiler will hold a great deal for us. But the part it cannot hold, the boundary between our program and everything else, is still ours to get right. The type system can encode many things, but it cannot encode conditions outside of its control, such as the passage of time between two syscalls. Idiomatic Rust, then, is not just code that the borrow checker accepts or that leaves alone. 
It is code whose types, names, and control flow tell the truth about the system they run in. And that truth is sometimes ugly. It could mean using file descriptors instead of paths, instead of , instead of , and bug-for-bug compatibility over clean semantics. None of it is as pretty as the version you would write on a whiteboard. But it is more honest. Need Help Hardening Your Rust Codebase? Is your team shipping Rust into production and want to make sure you’re not falling into the same traps? I offer Rust consulting services, from code reviews and security-focused audits to training your team on the patterns that the compiler won’t enforce for you. Get in touch to learn more. To be fair to GNU: GNU coreutils is 40 years old and has had a very long time to surface and fix this class of bug. And we don’t know there are no memory-safety bugs in the Rust rewrite, only that the audit didn’t find any. Still, the difference is noticeable when comparing the same duration of development activity. ↩ It’s worth noting that the / TOCTOU class of bug is in some ways easier to avoid in C than in Rust. C code naturally reaches for an open file descriptor and the family of syscalls ( , , , ), and most creation syscalls take a argument directly. Rust’s high-level APIs abstract over the file descriptor and operate on values, which makes the path-based, re-resolving call the path of least resistance. The handle-based APIs exist on every Unix platform; Rust just doesn’t put them front and center. ↩ 🫩 Lossy conversion with silently rewrites invalid bytes to U+FFFD. That’s just fancy data corruption. 🫤 Strict conversion with or crashes or refuses to operate. 😚 Staying in bytes with or is what you should usually do. No buffer overflows. No use-after-free. No double-free. No data races on shared mutable state. No null-pointer dereferences. No uninitialized memory reads. 
buffer overflow on deep paths longer than (9.11, 2026) out-of-bounds read on trailing blanks (9.9, 2025) heap buffer overflow (9.9, 2025) writes a NUL byte past a heap buffer (9.8, 2025) 1-byte read before a heap buffer with a key offset (9.8, 2025) and crashes with SELinux but no xattr support (9.7, 2025) heap overwrite ( CVE-2024-0684 , 9.5, 2024) reads unallocated memory on malformed input (9.4, 2023) stack buffer overrun with many files and a high (9.0, 2021)
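The exit-code bugs described earlier, where a tool reported the status of the last file instead of the worst one, can be avoided with a small accumulator. This is a hedged sketch with a hypothetical `run_all` helper, not the pattern from the uutils fix: every failure is reported, and the final exit code reflects the worst outcome seen.

```rust
use std::io;

/// Runs `op` on every input and returns the worst exit code seen,
/// not the exit code of whichever input happened to come last.
/// (`run_all` is a hypothetical helper for illustration.)
fn run_all<I, F>(inputs: I, mut op: F) -> i32
where
    I: IntoIterator,
    F: FnMut(I::Item) -> io::Result<()>,
{
    let mut exit_code = 0;
    for input in inputs {
        if let Err(err) = op(input) {
            // Surface the error instead of discarding the Result.
            eprintln!("error: {err}");
            // Keep the worst code; a later success must not reset it.
            exit_code = exit_code.max(1);
        }
    }
    exit_code
}
```

A caller would pass the result to `std::process::exit`, so a script sees a non-zero status whenever any input failed.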

underlap 1 month ago

Claude Code reflection

I was getting more into the swing of using Claude Code, before I gave it up. One of the main advantages of Claude is that it enables me to step outside my own areas of expertise. For example, it was able to diagnose, and propose a fix to, an intermittent bug in Windows code in ipc-channel. It even wrote a good description of the bug, including how to reproduce the behaviour. I asked it to analyse the git history to determine when the bug was introduced. However, given that ipc-channel is part of Servo and is likely covered by Servo’s AI policy, I added the following to the start of the bug: This bug was diagnosed by Claude Code. A test to reproduce the behaviour was added to draft PR https://github.com/servo/ipc-channel/pull/449 . I reviewed the test and it looks legitimate. The test failed in this run : A putative fix produced a clean run. I cannot vouch for the fix because I lack the necessary Windows expertise (and any desire to obtain it). If Servo’s AI policy applies to this repository, the test and fix should be handled with caution. However, I believe having this bug on the books is beneficial in case someone else encounters it. For another example, I have been trying to diagnose an intermittent crash on Windows CI and Claude has helped me install and use crash dump tooling to determine which testcases are involved. This required some iteration as Claude tried various techniques and then had to correct the CI pipeline to work correctly with the tooling. It seems likely that the underlying issue is again in the Windows code in ipc-channel, but I haven’t managed to get any closer to diagnosing the problem (and neither has Claude). Another example was helping me get an overview of a project ( ) which is developing an operational semantics for Rust. I asked Claude to write a linear walkthrough of the project repository, which it did very competently. When working outside my own areas of expertise, I am quite dependent on Claude. 
I can ask it to explain things and so, if I was suitably motivated, I could expand my expertise. When I am not interested in expanding my expertise, e.g. in using the Windows API, I have to treat Claude with caution and be careful it doesn’t lead me up the garden path with nonsense. Another disadvantage is that using Claude to write code on my own projects is very much like delegating coding tasks to someone else. I can easily lose touch with areas of the code, even if I review Claude’s code carefully. However, unlike delegating to someone else, I have to keep in mind that Claude cannot care about the code it is writing and cannot take any responsibility for it – I have to carry the can. When I need to continue working on code Claude has developed, I often need to involve Claude because I have not lived through the development of the code. When writing code myself, I experience various emotions and mental states: struggling to understand the requirements, wondering how to structure the code, designing tests, making various kinds of trade-offs, puzzling over bugs, etc. Although my memory of this lived experience soon fades, I can get back into a similar mental state when I need to work on the same code again. By reading tests, code, docs, and commit logs, I can revive the relevant memories and then use those to help me continue. But with Claude, there is no history of having lived with the details of the code. I’ve reviewed it all and understood it fairly well, but that’s not the same as having grappled with the code myself. The chances are that I’ll have a much vaguer recollection of having worked (with Claude) on any particular piece of code, especially since I will have spent much less time than if I was writing that code by hand. Anyway, the main purposes of writing code during my retirement were to keep my brain sharp and for enjoyment (yeah, I know!). Using Claude Code detracts from both those purposes. 
So, at least for now, I’ve let my Claude subscription expire. And, in fact, I’ve stopped writing code altogether, but that’s a topic for another post.

Corrode 1 month ago

Helsing

Jon Gjengset is one of the most recognizable names in the Rust community, the author of Rust for Rustaceans , a prolific live-streamer, and a long-time contributor to the Rust ecosystem. Today he works as a Principal Engineer at Helsing, a European defense company that has made Rust a foundational part of its engineering stack. Helsing builds safety-critical software for real-world defense applications, where correctness, performance, and reliability are non-negotiable. In this episode, Jon talks about what it means to build mission-critical systems in Rust, why Helsing bet on Rust from the start, and what lessons from his years of Rust education have shaped the way he writes and thinks about production code. CodeCrafters helps you become proficient in Rust by building real-world, production-grade projects. Learn hands-on by creating your own shell, HTTP server, Redis, Kafka, Git, SQLite, or DNS service from scratch. Start for free today and enjoy 40% off any paid plan by using this link . Founded in 2021, Helsing is a European defence company building AI-enabled software for some of the most demanding environments imaginable. Helsing’s software runs where correctness is non-negotiable. That philosophy led them to Rust early on and they’ve leaned into it fully. From coordinate transforms to CRDT document stores to Protobuf package management, almost everything they build ends up being written in Rust. Jon holds a PhD from MIT’s PDOS group, where he built Noria, a high-performance streaming dataflow database, and later co-founded ReadySet to continue that work commercially. He then spent time building infrastructure at AWS, before joining Helsing as a Principal Engineer. Outside of his day job, he’s been teaching Rust to the world through his livestreams and writing for years, which makes him a rare combination: someone who thinks deeply about both how to use Rust and how to explain it. 
Helsing AI selected for Eurofighter upgrade - Helsing’s Eurofighter Project CA-1 Europa - Helsing’s Autonomous Uncrewed Combat Aerial Vehicle Rust in Python cryptography - Rust being used in a Python library Clippy Documentation: Adding Lints - How to add custom lints to (your own fork of) clippy anyhow’s .context() - Use it everywhere, it’s very very helpful eyre - A fork of with support for customizable, pluggable error report handlers miette - Fancy, diagnostic-rich error reporting for Rust with source snippets and labels buffrs - Helsing’s Cargo-inspired package manager for Protocol Buffers, written in Rust sguaba - Helsing’s Rust crate for type-safe coordinate system math, preventing unit and frame mix-ups at compile time Sguaba: Type-safe spatial math in Rust - Jon’s talk at Rust Amsterdam introducing sguaba and the type-system techniques behind it Apache Avro - A compact binary serialization format for streaming data, with a Rust implementation available via the crate pubgrub - A Rust implementation of the PubGrub version-solving algorithm, as used in Cargo and uv CRDTs - Conflict-free Replicated Data Types: data structures that can be merged across distributed nodes without conflicts ADR (Architecture Decision Record) - A lightweight way to document important architectural decisions and their context DSON: JSON CRDT using delta-mutations for document stores - The 2022 paper that was the basis for Helsing’s CRDT implementation dson - Helsing’s Rust implementation of DSON Jon’s Livestreams on YouTube - Deep-dive Rust coding sessions where Jon implements real-world libraries and systems from scratch WebAssembly with Rust - The official Rust and WebAssembly book, covering a cool technology and useful skills to have as a Rust developer Rust for Rustaceans - Jon’s book for intermediate Rust developers covering ownership, traits, async, and the finer points of the language CVE-2024-24576: Cargo/tar supply chain vulnerability - A security issue in the crate that 
affected Cargo’s package extraction Wikipedia: Defence in Depth - The security principle of using multiple independent layers of protection; Even with Rust you need multiple layers, there is no silver bullet SBOMs (Software Bill of Materials) - A machine-readable inventory of all components in a software artifact; Cargo’s lock files make this tractable for Rust projects Helsing: AI-assisted vetting of software packages - Make it more efficient to review dependencies you take in Bevy - A game engine built entirely in Rust, and a notable example of a large, complex Rust dependency Tauri - A Rust-powered framework for building lightweight desktop and mobile apps from a web frontend, an alternative to Electron Helsing Website Helsing Tech Blog Helsing on GitHub Helsing on LinkedIn Jon Gjengset’s Website Jon Gjengset on GitHub Jon Gjengset on YouTube Jon Gjengset on Bluesky Rust for Rustaceans

baby steps 1 month ago

Symposium: community-oriented agentic development

I’m very excited to announce the first release of the Symposium project as well as its inclusion in the Rust Foundation’s Innovation Lab . Symposium’s goal is to let everyone in the Rust community participate in making agentic development better. The core idea is that crate authors should be able to vend skills, MCP servers, and other extensions, in addition to code. The Symposium tool then installs those extensions automatically based on your dependencies. After all, who knows how to use a crate better than the people who maintain it? If you want to read more details about how Symposium works, I refer you to the announcement post from Jack Huey on the main Symposium blog . This post is my companion post, and it is focused on something more personal – the reasons that I am working on Symposium. The short version is that I believe in extensibility everywhere . Right now, the Rust language does a decent job of being extensible: you can write Rust crates that offer new capabilities that feel built-in, thanks to proc-macros, traits, and ownership. But we’re just getting started at offering extensibility in other tools, and I want us to hurry up! I want crate authors to be able to supply custom diagnostics. I want them to be able to supply custom lints. I want them to be able to supply custom optimizations. I want them to be able to supply custom IDE refactorings. And, as soon as I started messing around with agentic development, I wanted extensibility there too. The goal of Symposium is to give crate authors, and the broader Rust community, the ability to directly influence the experience of people writing Rust code with agents. Rust is a really popular target language for agents because the type system provides strong guardrails and it generates efficient code – and I predict it’s only going to become more popular . 
Despite Rust’s popularity as an agentic coding target, the Rust community right now are basically bystanders when it comes to the experience of people writing Rust with agents; I want us to have a means of influencing it directly. Enter Symposium. With Symposium, crate authors can package up skills etc., and then Symposium will automatically make them available for your agent. Symposium also takes care of bridging the small-but-very-real gaps between agents (e.g., each has their own hook format, and some of them use and some use , etc). Let me give you an example. Consider the assert-struct crate, recently created by Carl Lerche. lets you write convenient assertions that test the values of specific struct fields: This crate is neat, but of course, no models are going to know how to use it – it’s not part of their training set. They can figure it out by reading the docs, but that’s going to burn more tokens (expensive, slow, consumes carbon), so that’s not a great idea. In practice what people do today is to add skills to their project – for example, in his crate, Carl has a testing skill that also shows how to use assert-struct . But it seems silly for everybody who uses the crate to repeat that content. With Symposium, teaching your agent how to use your dependencies should not be necessary. Instead, your crates can publish their own skills or other extensions. The way this works is that the assert-struct crate defines the skill once, centrally, in its own repository 1 . Then there is a separate file in Symposium’s central recommendations repository with a pointer to the assert-struct repository. Any time that the assert-struct repository updates that skill, the updates are automatically synchronized for you. Neat! (You can also embed skills directly in the rr repository, but then updating them requires a PR to that repo.) It’s easy! Check out the docs here: https://symposium.dev/crate-authors/supporting-your-crate.html Skills, hooks, and MCP Servers, for now. 
Currently we allow skill content to be defined in a decentralized fashion but we require that a plugin be added to our central recommendations repository . This is a temporary limitation. We eventually expect to allow crate authors to add skills and plugins in a fully decentralized fashion. We chose to limit ourselves to a centralized repository early on for three reasons: No problem, you can add a custom plugin source. I am, very much so. I feel like a lot of the uses of LLMs we see today are not great (e.g., chat bots hijack conversational and social cues to earn trust that they don’t deserve ) and tend to reconfirm people’s biases instead of challenging their ideas. And I’m worried about the environmental cost of data centers and the way companies have retreated from their climate goals . And I don’t like how centralized models concentrate economic power . 2 So yeah, I see all that. And I also see how LLMs enable people to build things that they couldn’t build before and help to make previously intractable problems soluble – and that includes more and more people who never thought of themselves as programmers 3 . My goal with Symposium and other projects is to be part of the solution, finding ways to leverage LLMs that are net positive: opening doors, not closing them. Fundamentally, the reason I am working on Symposium is that I believe everybody has something unique to offer . I see the appeal of strongly opinionated systems that reflect the brilliant vision of a particular person. But to me, the most beautiful systems are the ones that everybody gets to build together 4 . This is why I love open source. This is why I love emacs 5 . It’s why I love VSCode’s extension system, which has so many great gems 6 . To me, Symposium is a double win in terms of empowerment. First, it makes agents extensible, which is going to give crate authors more power to support their crates. 
But it also helps make agentic programming better, which I believe will ultimately open up programming to a lot more people . And that is what it’s all about. Actually as of this posting, the assert-struct skill is embedded directly in the recommendations repo . But I opened a PR to put it on assert-struct and I’ll port it over once it lands. ↩︎ I’m very curious to do more with open models. ↩︎ Within Amazon, it’s been amazing to watch how many people who never thought of themselves as software developers are starting to build software. Considering the challenges the software industry has with representation, I find this very encouraging. Diverse teams are stronger, better teams! ↩︎ None of this is to say I don’t believe in good defaults; there’s a reason I use Zed and VSCode these days, and not emacs, much as I love it in concept. ↩︎ OMG. One of my friends college wrote this amazing essay some time back on emacs . Next time you’re doomscrolling on the toilet or whatever, pop over to this essay instead. Fair warning, it’s long, so it’ll take you a while to read, but I think it nails what people love about emacs. ↩︎ These days I’m really enjoying Zed, but I have to say, I really miss kahole/edamagit ! Which of course is inspired by the magit emacs package . ↩︎ Even when decentralized support exists, a centralized repository will be useful, since there will always be crates that choose not to provide that support. Having a central list of plugins will make it easy to update people as we evolve Symposium. Having a centralized repository will help protect against malicious skills while we look for other mechanisms, since we can vet the crates that are added and easily scan their content.
