Posts in Rust (20 found)

watgo - a WebAssembly Toolkit for Go

I'm happy to announce the general availability of watgo - the WebAssembly Toolkit for Go. This project is similar to wabt (C++) or wasm-tools (Rust), but in pure, zero-dependency Go.

watgo comes with a CLI and a Go API to parse WAT (WebAssembly Text), validate it, and encode it into WASM binaries; it also supports decoding WASM from its binary format. At the center of it all is wasmir - a semantic representation of a WebAssembly module that users can examine (and manipulate). watgo provides the following functionality:

Parse: a parser from WAT to wasmir
Validate: uses the official WebAssembly validation semantics to check that the module is well formed and safe
Encode: emits wasmir into WASM binary representation
Decode: reads WASM binary representation into wasmir

watgo comes with a CLI, which you can install with a single command. The CLI aims to be compatible with wasm-tools [1], and I've already switched my wasm-wat-samples projects to use it, e.g. to parse a WAT file, validate it and encode it into binary format.

wasmir semantically represents a WASM module with an API that's easy to work with; a typical use is to parse a simple WAT program with watgo and do some analysis on the result.

One important note: the WAT format supports several syntactic niceties that are flattened / canonicalized when lowered to wasmir. For example, all folded instructions are lowered to unfolded ones (linear form), function & type names are resolved to numeric indices, etc. This matches the validation and execution semantics of WASM and its binary representation. These syntactic details are present in watgo in the textformat package (which parses WAT into an AST) and are removed when this is lowered to wasmir. The textformat package is kept internal at this time, but in the future I may consider exposing it publicly - if there's interest.

Even though it's still early days for watgo, I'm reasonably confident in its correctness due to a strategy of very heavy testing right from the start. WebAssembly comes with a large official test suite, which is perfect for end-to-end testing of new implementations. The core test suite includes almost 200K lines of WAT files that carry several modules with expected execution semantics and a variety of error scenarios exercised. These live in specially designed .wast files and leverage a custom spec interpreter.

watgo hijacks this approach by using the official test suite for its own testing. A custom harness parses .wast files and uses watgo to convert the WAT in them to binary WASM, which is then executed by Node.js [2]; this harness is a significant effort in itself, but it's very much worth it - the result is excellent testing coverage. watgo passes the entire WASM spec core test suite.

Similarly, we leverage wabt's interp test suite, which also includes end-to-end tests, using a simpler Node-based harness to test them against watgo. Finally, I maintain a collection of realistic program samples written in WAT in the wasm-wat-samples repository; these are also used by watgo to test itself.
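To make the folded-to-linear lowering mentioned above concrete, here is a minimal sketch in Python. It is a toy, not watgo's implementation or API: the helper names `tokenize`, `parse` and `unfold` are made up for illustration, and it only handles plain folded instructions (structured forms such as block, loop and if need extra rules).

```python
def tokenize(src):
    """Split WAT text into parentheses and atoms."""
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one S-expression, consuming tokens from the front of the list."""
    tok = tokens.pop(0)
    if tok != "(":
        return tok
    node = []
    while tokens[0] != ")":
        node.append(parse(tokens))
    tokens.pop(0)  # drop the closing ")"
    return node


def unfold(expr):
    """Flatten a folded instruction: operands first, then the operator."""
    head, immediates, out = expr[0], [], []
    for arg in expr[1:]:
        if isinstance(arg, list):
            out.extend(unfold(arg))  # nested folded operand
        else:
            immediates.append(arg)   # immediate, e.g. the 2 in "i32.const 2"
    out.append(" ".join([head] + immediates))
    return out


folded = "(i32.add (i32.const 2) (i32.mul (local.get 0) (i32.const 3)))"
linear = unfold(parse(tokenize(folded)))
print("\n".join(linear))
# i32.const 2
# local.get 0
# i32.const 3
# i32.mul
# i32.add
```

The linear output is the order a stack machine executes: each operand is pushed before the instruction that consumes it.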

Corrode 1 week ago

Cloudsmith

Rust adoption can be loud, like when companies such as Microsoft, Meta, and Google announce their use of Rust in high-profile projects. But there are countless smaller teams quietly using Rust to solve real-world problems, sometimes even without noticing. This episode tells one such story. Cian and his team at Cloudsmith have been adopting Rust in their Python monolith not because they wanted to rewrite everything in Rust, but because Rust extensions were simply best-in-class for the specific performance problems they were trying to solve in their Django application. As they had these initial successes, they gained more confidence in Rust and started using it in more and more areas of their codebase.

CodeCrafters helps you become proficient in Rust by building real-world, production-grade projects. Learn hands-on by creating your own shell, HTTP server, Redis, Kafka, Git, SQLite, or DNS service from scratch. Start for free today and enjoy 40% off any paid plan.

Made with love in Belfast and trusted around the world. Cloudsmith is the fully-managed solution for controlling, securing, and distributing software artifacts. They analyze every package, container, and ML model in an organization’s supply chain, allow blocking bad packages before they reach developers, and build an ironclad chain of custody.

Cian is a Service Reliability Engineer located in Dublin, Ireland. He has been working with Rust for 10 years and has a history of helping companies build reliable and efficient software. He has a BA in Computer Programming from Dublin City University.
Lee Skillen’s blog - The blog of Lee Skillen, Cloudsmith’s co-founder and CTO
Django - Python on Rails
Django Mixins - Great for scaling up, not great for long-term maintenance
SBOM - Software Bill of Materials
Microservice vs Monolith - Martin Fowler’s canonical explanation
Jaeger - “Debugger” for microservices
PyO3 - Rust-to-Python and Python-to-Rust FFI crate
orjson - Pretty fast JSON handling in Python using Rust
drf-orjson-renderer - Simple orjson wrapper for Django REST Framework
Rust in Python cryptography - Parsing complex data formats is just safer in Rust!
jsonschema-py - jsonschema in Python with Rust, mentioned in the PyO3 docs
WSGI - Python’s standard for HTTP server interfaces
uWSGI - An application server providing a WSGI interface
rustimport - Simply import Rust files as modules in Python, great for prototyping
granian - WSGI application server written in Rust with tokio and hyper
hyper - HTTP parsing and serialization library for Rust
HAProxy - Feature-rich reverse proxy with good request queue support
nginx - Very common reverse proxy with very nice and readable config
locust - Fantastic load-test tool with configuration in Python
goose - Locust, but in Rust
Podman - Daemonless container engine
Docker - Container platform
buildx - Docker CLI plugin for extended build capabilities with BuildKit
OrbStack - Faster Docker for Desktop alternative
Rust in Production: curl with Daniel Stenberg - Talking about hyper’s strictness being at odds with curl’s permissive design
axum - Ergonomic and modular web framework for Rust
rocket - Web framework for Rust
Cloudsmith Website
Cian Butler’s Website
Cian’s E-Mail

Evan Schwartz 1 week ago

Scour - March Update

Hi friends,

In March, Scour scoured 813,588 posts from 24,029 feeds (7,131 were newly added) and 488 new users signed up. Welcome! Here's what's new in the product:

Scour now does a better job of ensuring that your feed draws from a mix of sources and that no single interest or group of interests dominates. I had made a number of changes along these lines in the past, but they were fiddly and the diversification mechanism wasn't working that well. Under the hood, Scour now does a first pass to score how similar articles are to your interests and then has a separate step for selecting posts for your feed while keeping it diverse on a number of different dimensions.

Content from websites and groups of interests you tend to like and/or click on more is now given slightly more room in your feed. Conversely, websites and groups of interests you tend to dislike or not click on will be given a bit less space. For Scour, I'm always trying to think of how to show you more content you'll find interesting -- without trapping you in a small filter bubble (you can read about my ranking philosophy in the docs). After a number of iterations, I landed on a design that I'm happy with. I hope this strikes a good balance between making sure you see articles from your favorite sources, while still leaving room for the serendipity of finding a great new source that you didn't know existed.

After you click an article, Scour now explicitly asks you for your reaction. These reactions help tune your feed slightly, and they help me improve the ranking algorithm over time. Before, the reaction buttons were below every post, but that made them a bit hard to hit intentionally and easy to touch accidentally. If you want to react to an article without reading it first, you can also find them in the More Options menu. Thanks to Shane Sveller for pointing out that the reaction buttons were too small on mobile!

Scour now supports exact keyword matching, in addition to using vector embeddings for semantic similarity. Articles that are similar to one of your interests but don't use the exact words or phrases from your interest definition will be ranked lower. Right now this applies to interests marked as "Specific" or "Normal" (this is also automatically determined when interests are created). This should cut down on the number of articles you see that are mis-categorized or clearly off-topic. Thanks to Alex Miller and an anonymous user for prompting this, and thanks to Alex, JackJackson, mhsid, snuggles, and anders_no for all the Off-Topic reports!

Sometimes, I see an article on Hacker News or elsewhere and wonder why it didn't show up in my Scour feed. You can now paste links into the Why didn't I see this? page, and it will give you a bit of an explanation. You can also report that so I can look into it more and continue to improve the ranking algorithm over time.

Here were some of my favorite posts that I found on Scour in March:

For anyone building products, this is a good reminder to make sure you're trying out and experiencing the bad parts of your product: Bored of eating your own dogfood? Try smelling your own farts!
This was a brief, interesting history and technical overview of document formats and why Markdown "won": Markdown Ate The World.
A reminder that any user-generated input, including repo branch names, can be malicious: OpenAI Codex: How a Branch Name Stole GitHub Tokens.
This is a very detailed and informative visual essay explaining how quantization (compression) for large language models works: Quantization from the ground up.
I'm not currently using Turso (the Rust rewrite of SQLite), but I think what they're doing is interesting. Including this experimental version that speaks the Postgres SQL dialect: pgmicro.
And because I like making -- and eating -- sour sourdough: How To Make Sourdough Bread More (Or Less) Sour.

Happy Scouring!

P.S. If you use a coding agent like Claude Code, I also wrote up A Rave Review of Superpowers, a plugin that makes me much more productive.
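The two-pass ranking described above (score first, then select for diversity) and the exact-keyword down-ranking can be sketched roughly as follows. This is a toy illustration in Python, not Scour's code: the field names, the `keyword_penalty` value and the one-per-source cap are all assumptions made up for the example.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(article, interest, keyword_penalty=0.5):
    """Pass 1: semantic similarity, ranked lower if the exact keyword is absent."""
    s = cosine(article["embedding"], interest["embedding"])
    if interest["keyword"].lower() not in article["text"].lower():
        s *= keyword_penalty  # hypothetical penalty factor
    return s


def select_diverse(scored, limit, max_per_source=1):
    """Pass 2: take the best-scoring articles, capping each source's share."""
    picked, per_source = [], Counter()
    for _, article in sorted(scored, key=lambda pair: pair[0], reverse=True):
        if per_source[article["source"]] < max_per_source:
            picked.append(article["title"])
            per_source[article["source"]] += 1
        if len(picked) == limit:
            break
    return picked


interest = {"keyword": "rust", "embedding": [1.0, 0.0]}
articles = [
    {"title": "Rust atomics", "source": "a.example",
     "text": "Rust atomics explained", "embedding": [0.9, 0.1]},
    {"title": "Rust locks", "source": "a.example",
     "text": "Rust locks explained", "embedding": [0.8, 0.2]},
    {"title": "Systems languages", "source": "b.example",
     "text": "A survey of systems languages", "embedding": [0.95, 0.05]},
]
scored = [(score(a, interest), a) for a in articles]
feed = select_diverse(scored, limit=2)
print(feed)  # ['Rust atomics', 'Systems languages']
```

Keeping selection separate from scoring means the diversity rule can change (per source, per interest group, and so on) without touching how relevance is computed.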

Martin Fowler 1 week ago

Fragments: April 2

As we see LLMs churn out scads of code, folks have increasingly turned to Cognitive Debt as a metaphor for capturing how a team can lose understanding of what a system does. Margaret-Anne Storey thinks a good way of thinking about these problems is to consider three layers of system health:

Technical debt lives in code. It accumulates when implementation decisions compromise future changeability. It limits how systems can change.

Cognitive debt lives in people. It accumulates when shared understanding of the system erodes faster than it is replenished. It limits how teams can reason about change.

Intent debt lives in artifacts. It accumulates when the goals and constraints that should guide the system are poorly captured or maintained. It limits whether the system continues to reflect what we meant to build and it limits how humans and AI agents can continue to evolve the system effectively.

While I’m getting a bit bemused by debt metaphor proliferation, this way of thinking does make a fair bit of sense. The article includes useful sections to diagnose and mitigate each kind of debt. The three interact with each other, and the article outlines some general activities teams should do to keep it all under control.

❄                ❄                ❄                ❄                ❄

In the article she references a recent paper by Shaw and Nave at the Wharton School that adds LLMs to Kahneman’s two-system model of thinking. Kahneman’s book, “Thinking Fast and Slow”, is one of my favorite books. Its central idea is that humans have two systems of cognition. System 1 (intuition) makes rapid decisions, often barely-consciously. System 2 (deliberation) is when we apply deliberate thinking to a problem. He observed that to save energy we default to intuition, and that sometimes gets us into trouble when we overlook things that we would have spotted had we applied deliberation to the problem.

Shaw and Nave consider AI as System 3:

A consequence of System 3 is the introduction of cognitive surrender, characterized by uncritical reliance on externally generated artificial reasoning, bypassing System 2. Crucially, we distinguish cognitive surrender, marked by passive trust and uncritical evaluation of external information, from cognitive offloading, which involves strategic delegation of cognition during deliberation.

It’s a long paper that goes into detail on this “Tri-System theory of cognition” and reports on several experiments they’ve done to test how well this theory can predict behavior (at least within a lab).

❄                ❄                ❄                ❄                ❄

I’ve seen a few illustrations recently that use the symbols “< >” as part of an icon to illustrate code. That strikes me as rather odd; I can’t think of any programming language that uses “< >” to surround program elements. Why that and not, say, “{ }”? Obviously the reason is that they are thinking of HTML (or maybe XML), which is even more obvious when they use “</>” in their icons. But programmers don’t program in HTML.

❄                ❄                ❄                ❄                ❄

Ajey Gore asks: if coding agents make coding free, what becomes the expensive thing? His answer is verification.

What does “correct” mean for an ETA algorithm in Jakarta traffic versus Ho Chi Minh City? What does a “successful” driver allocation look like when you’re balancing earnings fairness, customer wait time, and fleet utilisation simultaneously? When hundreds of engineers are shipping into ~900 microservices around the clock, “correct” isn’t one definition, it’s thousands of definitions, all shifting, all context-dependent. These aren’t edge cases. They’re the entire job. And they’re precisely the kind of judgment that agents cannot perform for you.

Increasingly I’m seeing a view that agents do really well when they have good, preferably automated, verification for their work. This encourages such things as Test Driven Development. That’s still a lot of verification to do, which suggests we should see more effort to find ways to make it easier for humans to comprehend larger ranges of tests.

While I agree with most of what Ajey writes here, I do have a quibble with his view of legacy migration. He thinks it’s a delusion that “agentic coding will finally crack legacy modernisation”. I agree with him that agentic coding is overrated in a legacy context, but I have seen compelling evidence that LLMs help a great deal in understanding what legacy code is doing.

The big consequence of Ajey’s assessment is that we’ll need to reorganize around verification rather than writing code:

If agents handle execution, the human job becomes designing verification systems, defining quality, and handling the ambiguous cases agents can’t resolve. Your org chart should reflect this. Practically, this means your Monday morning standup changes. Instead of “what did we ship?” the question becomes “what did we validate?” Instead of tracking output, you’re tracking whether the output was right. The team that used to have ten engineers building features now has three engineers and seven people defining acceptance criteria, designing test harnesses, and monitoring outcomes. That’s the reorganisation. It’s uncomfortable because it demotes the act of building and promotes the act of judging. Most engineering cultures resist this. The ones that don’t will win.

❄                ❄                ❄                ❄                ❄

One of the questions that comes up when we think of LLMs-as-programmers is whether there is a future for source code. David Cassel on The New Stack has an article summarizing several views of the future of code. Some folks are experimenting with entirely new languages built with the LLM in mind; others think that existing languages, especially strictly typed languages like TypeScript and Rust, will be the best fit for LLMs. It’s an overview article, one that has lots of quotations but not much analysis in itself, but it’s worth a read as a good overview of the discussion.

I’m interested to see how all this will play out. I do think there’s still a role for humans to work with LLMs to build useful abstractions in which to talk about what the code does - essentially the DDD notion of Ubiquitous Language. Last year Unmesh and I talked about growing a language with LLMs. As Unmesh put it:

Programming isn’t just typing coding syntax that computers can understand and execute; it’s shaping a solution. We slice the problem into focused pieces, bind related data and behaviour together, and, crucially, choose names that expose intent. Good names cut through complexity and turn code into a schematic everyone can follow. The most creative act is this continual weaving of names that reveal the structure of the solution that maps clearly to the problem we are trying to solve.


Summary of reading: January - March 2026

"Intellectuals and Society" by Thomas Sowell - a collection of essays in which Sowell criticizes "intellectuals", by which he mostly means left-leaning thinkers and opinions. Interesting, though certainly very biased. This book is from 2009 and focuses mostly on the early and mid 20th century; yes, history certainly rhymes.

"The Hacker and the State: Cyber Attacks and the New Normal of Geopolitics" by Ben Buchanan - a pretty good overview of some of the major cyber-attacks done by states in the past 15 years. It doesn't go very deep because it's likely just based on the bits and pieces that leaked to the press; for the same reason, the coverage is probably very partial. Still, it's an interesting and well-researched book overall.

"A Primate's Memoir: A Neuroscientist’s Unconventional Life Among the Baboons" by Robert Sapolsky - an account of the author's years spent researching baboons in Kenya. Only about a quarter of the book is really about baboons, though; mostly, it's about the author's adventures in Africa (some of them surely inspired by an intense death wish) and his interaction with the local peoples. I really liked this book overall - it's engaging, educational and funny. Should try more books by this author.

"Seeing Like a State" by James C. Scott - the author attempts to link various events in history to discuss "Why do well-intentioned plans for improving the human condition go tragically awry?", covering large state plans like scientific forest management, building pre-planned cities and monoculture agriculture. Some of the chapters are interesting, but overall I'm not sure I'm sold on the thesis. Specifically, the author mixes in private enterprises (like industrial agriculture in the West) with state-driven initiatives in puzzling ways.

"Karate-Do: My Way of Life" by Gichin Funakoshi - short autobiography from the founder of modern Shotokan Karate. It's really interesting to find out how recent it all is - prior to WWII, Karate was an obscure art practiced mostly in Okinawa and a bit in other parts of Japan. The author played a critical role in popularizing Karate and spreading it out of Okinawa in the first half of the 20th century. The writing is flowing and succinct - I really liked this book.

"A Tale of a Ring" by Ilan Sheinfeld (read in Hebrew) - a multi-generational fictional saga of two families who moved from Danzig (today Gdansk in Poland) to Buenos Aires in the late 19th century, with a touch of magic. Didn't like this one very much.

"The Wide Wide Sea: Imperial Ambition, First Contact and the Fateful Final Voyage of Captain James Cook" by Hampton Sides - a very interesting account of Captain Cook's last voyage (the one tasked with finding a northwest passage around Canada). The book has a strong focus on his interaction with Polynesian peoples along the way, especially on Hawaii (which he was the first European to visit).

"The Suitcase" by Sergei Dovlatov (read in Russian) - a collection of short stories in Dovlatov's typical humorist style. Very nice little book.

"The Second Chance Convenience Store" by Kim Ho-Yeon - a collection of connected stories centered around a convenience store in Seoul, and an unusual new employee who began working night shifts there. Short and sweet fiction, I enjoyed it.

"A History of the Bible: The Story of the World's Most Influential Book" by John Barton - a very detailed history of the Bible, covering both the old and new testaments in many aspects. Some parts of the book are quite tedious; it's not an easy read. Even though the author tries to maintain a very objective and scientific approach, it's apparent (at least for an atheist) that he skirts as close as possible to declaring it all nonsense, given that he's a priest!

"Rust Atomics and Locks: Low-Level Concurrency in Practice" by Mara Bos - an overview of low-level concurrency topics using Rust. It's a decent book for people not too familiar with the subject; I personally didn't find it too captivating, but I do see the possibility of referring to it in the future if I get to do some lower-level Rust hacking. A comment on the code samples: it would be nice if the accompanying repository had test harnesses to observe how the code behaves, and some benchmarks. Without this, many claims made in the book feel empty, with no real data to back them up, and it's challenging to play with the code and see it perform in real life.

"Hot Chocolate on Thursday" by Michiko Aoyama - a bit similar to "What You Are Looking for Is in the Library" by the same author: connected short stories about ordinary people living their life in Japan (with one detour to Australia). Slightly worse than the previous book, but still pretty good.

"The Silmarillion" by J.R.R. Tolkien - even though I'm a big LOTR fan, I've never gotten myself to read this one, due to its reputation for being difficult. What changed things eventually (25 years after my first read through of LOTR) is my kids! They liked LOTR so much that they went straight ahead to Silmarillion and burned through it as well, so I couldn't stay behind. What can I say, this book is pretty amazing. The amazing thing is how a book can be both epic and borderline unreadable at the same time :) Tolkien really let himself go with the names here (3-4 new names introduced per page, on average): names for characters, names for natural features like forests and rivers, names for all kinds of magical paraphernalia; names that change in time, different names given to the same thing by different peoples, and on and on. The edition I was reading has a name index at the end (42 pages long!) which was very helpful, but it still made the task only marginally easier. Names aside though, the book is undoubtedly monumental; the language is outstanding. It's a whole new mythology, Bible-like in scope, all somehow more-or-less consistent (if you remember who is who, of course); it's an injustice to see this just as a prelude to the LOTR books. Compared to the scope of the Silmarillion, LOTR is just a small speck of a quest told in detail; The Silmarillion - among other things - includes brief tellings of at least a dozen stories of similar scope. Many modern book (or TV) series build whole "universes" with their own rules, history and aesthetic. The Silmarillion must be considered the OG of this.

"Travels with Charley in Search of America" by John Steinbeck
"Deep Work" by Cal Newport
"The Philadelphia Chromosome" by Jessica Wapner
"The Price of Privilege" by Madeline Levine

matduggan.com 3 weeks ago

Markdown Ate The World

I have always enjoyed the act of typing words and seeing them come up on screen. While my favorite word processor of all time might be WordPerfect, I've used almost all of them. These programs were what sold me on the entire value proposition of computers. They were like typewriters, which I had used in school, except easier in every single way. You could delete things. You could move paragraphs around. It felt like cheating, and I loved it.

As time has gone on, what makes up a "document" in word processing has increased in complexity. This grew as word processors moved on from being proxies for typewriters and into something closer to a publishing suite. In the beginning, programs like WordPerfect, WordStar, MultiMate, etc. had flat binary files with proprietary formatting codes embedded in them. When word processors were just proxies for typewriters, this made a lot of sense.

But as Microsoft Word took off in popularity and quickly established itself as the dominant word processor, we saw the rise of the .doc file format. This was an exponential increase in complexity from what came before, which made sense because suddenly word processors were becoming "everything tools": not just typing, but layout, images, revision tracking, embedded objects, and whatever else Microsoft could cram in there.

At its base, the format is a Compound File Binary Format, which is effectively just a FAT file system with the file broken into sectors that are chained together with a File Allocation Table. It's an interesting design. A normal file system would end up with sort of a mess of files to try and contain everything that the document has, but if you store all of that inside of a simplified file system contained within one file, then you can optimize for performance and reduce the overhead that comes with storing separate objects in a flat file.
It also optimizes writes, because you don't need to rewrite the entire file when you add an object, and it makes it simple to keep revision history. But from a user perspective, they're "just" dealing with a single file.

The .doc exploded and quickly became the default file format for humanity's written output. School papers, office memos, résumés, the Great American Novel your uncle was definitely going to finish: all .doc files. But there was a problem with these files. They would become corrupted all of the goddamn time. Remember, these were critical documents traveling from spinning rust drives on machines that crashed constantly compared to modern computers, often copied to floppy disks or later to cheap thumb drives you got from random vendor giveaways at conferences, and then carried to other computers in backpacks and coat pockets. The entire workflow had the structural integrity of a sandwich bag full of soup.

So when Word was saving your critical file, it was actually doing a bunch of different operations. These weren't atomic, so in an era when computers constantly crashed or had problems it was super easy to end up in a situation where some structures were updated and others weren't. Compare that to a simple flat file, where you would either get the old version or a truncated new version: you might lose content, but you almost never ended up with an unreadable file. With Word, as someone doing helpdesk IT, you constantly ended up with people who had corrupted, unreadable files.

And here's the part that really twisted the knife: the longer you worked on the same file, the more important that file likely was. But Word didn't clean up after itself. As a .doc accumulated images, tracked changes, and revision history, the internal structure grew more complex and the file got larger. But even when you deleted content from the document, the data wasn't actually removed from the file.
It was marked as free space internally but left sitting there, like furniture you moved to the curb that nobody ever picked up. The file bloated. The internal fragmentation worsened. And the probability of corruption increased in direct proportion to how much you cared about the contents.

Users had to be trained both to save the file often (as AutoRecover wasn't reliable enough) and to periodically "Save As" a new file to force Word to write a clean version from scratch. This was the digital equivalent of being told that your car works fine, you just need to rebuild the engine every 500 miles as routine maintenance.

The end result was that Microsoft Word quickly developed a reputation among technical people as horrible to work with. Not because it was a bad word processor (it was actually quite good at the word processing part) but because when a user showed up at the Help Desk with tears in their eyes, the tools I had to help them were mostly useless. I could scan the raw file for text patterns, which often pulled out the content, but without formatting it wasn't really a recovered file; it was more like finding your belongings scattered across a field after a tornado. Technically your stuff, but not in any useful arrangement. Sometimes you could rebuild the FAT or try alternative directory entries to recover slightly older versions. But in general, if the .doc encountered a structural error, the thing was toast and your work was gone forever.

This led to a never-ending series of helpdesk sessions where I had to explain to people that yes, I understood they had worked on this file for months, but it was gone and nobody could help them. I became a grief counselor who happened to know about filesystems.
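To make the sector chaining concrete, here is a toy model in Python of a FAT-style chain inside a single file. It is a sketch under simplifying assumptions, not a parser for real .doc files: the sectors are four-character strings and the "FAT" is a plain list, although the two marker values are the ones the Compound File Binary format uses for end-of-chain and free sectors.

```python
ENDOFCHAIN = 0xFFFFFFFE  # marks the last sector of a stream
FREESECT = 0xFFFFFFFF    # marks an unallocated sector


def read_chain(fat, sectors, start):
    """Follow the FAT from `start`, concatenating each sector's payload."""
    data, seen, current = [], set(), start
    while current != ENDOFCHAIN:
        # A loop, an out-of-range pointer, or a pointer into free space
        # all mean the allocation table can no longer be trusted.
        if current in seen or current >= len(fat) or fat[current] == FREESECT:
            raise ValueError(f"corrupt chain at sector {current}")
        seen.add(current)
        data.append(sectors[current])
        current = fat[current]
    return "".join(data)


# Toy "file": sectors 0 -> 2 -> 4 hold the live stream; 1 and 3 were freed.
sectors = ["Dear", "XXXX", " Bob", "old!", ", hi"]
fat = [2, FREESECT, 4, FREESECT, ENDOFCHAIN]

text = read_chain(fat, sectors, start=0)
free = [i for i, nxt in enumerate(fat) if nxt == FREESECT]
print(text)  # Dear Bob, hi
print(free)  # [1, 3]
```

Two of the failure modes from the text show up directly: the freed sectors still hold their old bytes ("XXXX", "old!"), and a single bad FAT entry (say `fat[2] = 0`, which creates a loop) makes the whole stream unreadable even though every content sector is intact.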
Thankfully, people quickly learned to obsessively copy their files to multiple locations with different names (thesis_final.doc, thesis_final_v2.doc, thesis_FINAL_FINAL_REAL.doc), but this required getting burned at least once, which is sort of like saying you learned your car's brakes didn't work by driving into a bus.

So around 2007 we see the shift from the binary format to OOXML, which incorporates a lot of hard lessons from the problems of its predecessor. First, it's just a bundle, specifically a ZIP archive. Now in theory, this is great. Your content is human-readable XML. Your images are just image files. If something goes wrong, you can rename the file to .zip, extract it, and at least recover your text by opening document.xml in Notepad. The days of staring at an opaque binary blob and praying were supposed to be over.

However, in practice, something terrible happened. Microsoft somehow managed to produce the worst XML to ever exist in human history. Let me lay down the scope of this complexity, because I have never seen anything like it in my life. The standard is ECMA-376, and you know you are in trouble when a standard comes as a 4-part download. If you download Part 1 and open the PDF inside, get ready for it: it's a 5039-page PDF. I have never conceived of something this complicated. It's also functionally unreadable, and I say this as someone who has, on multiple occasions in his life, read a car repair manual cover to cover because I didn't have anything else to do. I once read the Haynes manual for a 1994 Honda Civic like it was a beach novel. This is not that. This is what happens when a standards committee gets a catering budget and no deadline.
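The "it's just a ZIP with XML inside" recovery trick mentioned above can be sketched in a few lines of Python using only the standard library. The helper names `extract_text` and `make_minimal_docx` are made up for this example, and the generated archive is a stripped-down stand-in containing only word/document.xml, not a file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by paragraph (w:p) and text (w:t) elements.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_text(docx_bytes):
    """Return the plain text of each paragraph, ignoring all formatting."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return ["".join(t.text or "" for t in p.iter(W + "t"))
            for p in root.iter(W + "p")]


def make_minimal_docx(paragraphs):
    """Build a tiny stand-in archive holding only the main document part."""
    body = "".join(f"<w:p><w:r><w:t>{p}</w:t></w:r></w:p>" for p in paragraphs)
    xml = ('<?xml version="1.0" encoding="UTF-8"?>'
           '<w:document xmlns:w="http://schemas.openxmlformats.org/'
           'wordprocessingml/2006/main">'
           f"<w:body>{body}</w:body></w:document>")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buf.getvalue()


sample = make_minimal_docx(["Chapter 1", "It was a dark and stormy night."])
recovered = extract_text(sample)
print(recovered)  # ['Chapter 1', 'It was a dark and stormy night.']
```

On a real file, the same `extract_text` call on the bytes of a .docx should pull out the paragraph text even when other parts of the archive (styles, images, relationships) are damaged, as long as the document.xml entry itself is readable.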
There was an accusation at the time that Microsoft was making OOXML deliberately more complicated than it needed to be: that the goal was to claim it was an "open standard" while making the standard so incomprehensibly vast that it would take a heroic effort for anyone else to implement it. I think this is unquestionably true. LibreOffice has a great blog post on it that includes a striking comparison, showing that the ODF format results in an exponentially less complicated XML file than the OOXML format. Either you could do the incredible amount of work to become compatible with this nightmarish specification or you could effectively find yourself cut out of the entire word processing ecosystem.

Now without question this was done by Microsoft in order to have their cake and eat it too. They would be able to tell regulators and customers that this wasn't a proprietary format and that nobody was locked into the Microsoft Office ecosystem for the production of documents. That lock-in had started to become a concern among non-US countries, which saw that all of their government documents and records were effectively tied to Microsoft.

However, the somewhat ironic thing is that it ended up not mattering that much, because soon the only desktop application that would matter was the browser. The file formats of word processors were their own problems, but more fundamentally the nature of how people consumed content was changing. Desktop-based applications became less and less important post 2010, and users got increasingly frustrated with the incredibly clunky way of working with Microsoft Word and all traditional files: emailing them back and forth endlessly or working with file shares. So while the newer format was superior from the perspective of "opening the file and it becoming corrupted", it was also fundamentally incompatible with the smartphone era.
Even though you could open these files, soon the expectation was that whatever content you wanted people to consume should be viewable through a browser. As "working for a software company" went from being a niche profession to being something that seemingly everyone you met did, the defacto platform for issues, tracking progress, discussions, etc moved to GitHub. This was where I (and many others) first encountered Markdown and started using it on a regular basis. John Gruber, co-creator of Markdown, has a great breakdown of "standard" Markdown and then there are specific flavors that have branched off over time. You can see that here . The important part though is: it lets you very quickly generate webpages that work on every browser on the planet with almost no memorization and (for the most part) the same thing works in GitHub, on Slack, in Confluence, etc. You no longer had to ponder whether the person you were sending to had the right license to see the thing you were writing in the correct format. This combined with the rise of Google Workspace with Google Docs, Slides, etc meant your technical staff were having conversations through Markdown pages and your less technical staff were operating entirely in the cloud. Google was better than Microsoft at the sort of stuff Word had always been used for, which is tracking revisions, handling feedback, sharing securely, etc. It had a small subset of the total features but as we all learned, nobody knew about the more advanced features of Word anyway. By 2015 the writing was on the wall. Companies stopped giving me an Office license by default, switching them to "you can request a license". This, to anyone who has ever worked for a large company, is the kiss of death. If I cannot be certain that you can successfully open the file I'm working on, there is absolutely no point in writing it inside of that platform. 
Combine that with the corporate death of email and its replacement by Slack/Teams, and the entire workflow died without a lot of fanfare. Then with the rise of LLMs and their use (perhaps overuse) of Markdown, we've reached peak Markdown. Markdown is the format of our help docs, and many of our websites are generated exclusively from Markdown. It's now the most common format that I write anything in. This was originally written in Markdown inside of Vim. There are a lot of reasons why I think Markdown ended up winning, in no small part because it solved a real problem in an easy to understand way. Writing HTML is miserable and overkill for most tasks; Markdown removed the need to do that, and your output was consumable in a universal and highly performant way that required nothing of your users except access to a web browser. But I also think it demonstrates an interesting lesson about formats. .doc and .docx, along with ODF, are pretty highly specialized things designed to handle the complexity of what modern word processing can do. LibreOffice lets you do some pretty incredible things that cover a huge range of possible needs. Markdown doesn't do most of what those formats do. You can't set margins. You can't do columns. You can't embed a pivot table or track changes or add a watermark that says DRAFT across every page in 45-degree gray Calibri. Markdown doesn't even have a native way to change the font color. And none of that mattered, because it turns out most writing isn't about any of those things. Most writing is about getting words down in a structure that makes sense, and then getting those words in front of other people. Markdown does that with less friction than anything else ever created. You can learn it in ten minutes, write it in any text editor on any device, read the source file without rendering it, diff it in version control, and convert it to virtually any output format. The files are plain text. They will outlive every application that currently renders them. 
They don't belong to any company. They can't become corrupted in any meaningful way: the worst thing that can happen to a Markdown file is you lose some characters, and even then the rest of the file is fine. After decades of nursing .doc files like they were delicate flowers that you had to transport home strapped to your car roof, the idea of a format that simply cannot structurally fail is not just convenient. It's a kind of liberation. I think about this sometimes when I'm writing in Vim at midnight, just me and a blinking cursor and a plain text file that will still be readable when I'm dead. No filesystem-within-a-filesystem. No sector allocation tables. No 5,039-page specification. Just words, a few hash marks, and never having to think about it again.

* Updating the document stream (your text)
* Updating the formatting tables
* Update the sector allocation tables
* Update the directory entries
* Update summary information
* Flush everything to disk

* Part 1 “Fundamentals And Markup Language Reference”, 5th edition, December 2016
* Part 2 “Open Packaging Conventions”, 5th edition, December 2021
* Part 3 “Markup Compatibility and Extensibility”, 5th edition, December 2015
* Part 4 “Transitional Migration Features”, 5th edition, December 2016

baby steps 3 weeks ago

Maximally minimal view types, a follow-up

A short post to catalog two interesting suggestions that came in from my previous post, and some other related musings. It was suggested to me via email that we could use to eliminate the syntax ambiguity: Conceivably we could do this for the type, like: and in position: I have to sit with it but…I kinda like it? I’ll use it in the next example to try it on for size. In my post I said that if you have a public method whose type references private fields, you would not be able to call it from another scope: The error arises from desugaring to a call that references private fields: I proposed we could lint to avoid this situation. But an alternative was proposed where we would say that, when we introduce an auto-ref, if the callee references local variables not visible from this point in the program, we just borrow the entire struct rather than borrowing specific fields. So then we would desugar to: If we then say that is coercible to a , then the call would be legal. Interestingly, the autoderef loop already considers visibility: if you do , we will deref until we see a field visible to you at the current point . This raises an interesting question I did not discuss. What happens when you write a value of a type like ? For example, what if I do this: What I expect is that this would just swap the selected fields ( , in this case) and leave the other fields untouched. The basic idea is that a type indicates that the messages field is initialized and accessible and the other fields must be completely ignored. This represents another possible future extension. Today if you move out of a field in a struct, then you can no longer work with the value as a whole: But with selective borrowing, we could allow this, and you could even return “partially initialized” values: That’d be neat.
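To make the last point concrete, here is a small sketch of how partial moves behave in today's Rust. The `MessageQueue` struct and the `split` function are hypothetical names chosen for illustration; they are not taken from the post.

```rust
// Hypothetical struct for illustration; not from the original post.
struct MessageQueue {
    messages: Vec<String>,
    pending: usize,
}

// Moving one field out leaves the struct "partially moved": the remaining
// fields stay usable, but the value as a whole does not.
fn split(queue: MessageQueue) -> (Vec<String>, usize) {
    let messages = queue.messages; // moves `messages` out of `queue`

    // Uncommenting the next line is rejected by the compiler with
    // "use of partially moved value: `queue`".
    // let whole = queue;

    let pending = queue.pending; // fine: `pending` was never moved out
    (messages, pending)
}
```

The commented-out line is the situation the post describes: once a field has been moved out, the struct can no longer be used or returned as a whole, which is what a "partially initialized" view type could in principle allow.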

Kaushik Gopal 3 weeks ago

Podsync - I finally built my podcast track syncer

I host and edit a podcast 1 . When recording remotely, we each record our own audio locally (I on my end, my co-host on his). The service we use (Adobe Podcast, Zoom, Skype-RIP) captures everyone together as a master track. But the quality doesn’t match what each person records locally with their own microphone. So we use that master as a reference point and stitch the individual local tracks together. This is what the industry calls a “ double-ender ”. Add a guest and it becomes a “triple-ender”. But this gets hairy during editing. Each person starts their recording at a slightly different moment — everyone hits record at a different time. Before I can edit, I need to line everything up. Drop all the tracks into a DAW, play the master alongside each individual track, nudge by ear until the speech aligns. Add a guest and it gets tedious fast. 10–15 minutes of fiddly, ear-straining alignment before I’ve even started editing. There’s also drift. Each machine’s audio clock runs at a slightly different rate, so two tracks that are perfectly aligned at minute one might be 200ms apart by minute sixty. So I built PodSync 2 . I first heard of a similar technique from Marco Arment — back in ATP episode 25 . He had a new app for aligning double-ender tracks and was already thinking about whether something so niche was even worth releasing publicly. I don’t think he ever released it. Being a Kotlin developer at the time, I figured I’d build my own. Java was mature. Surely there were audio processing libraries that could handle this. There weren’t 😅. At least not in any clean, usable form. Getting the right signal processing pieces together in JVM-land was awkward enough that my interest fizzled, so I kept doing it by hand. When I revamped Fragmented , I finally came back to this. I used Claude to help me build it — in Rust, no less. 3 But before you chalk this up to another vibecoded project, hear me out. The interesting part here wasn’t just that AI made it easier. 
It was thinking through the actual algorithm: Voice activity detection ( VAD ) to find speech regions. MFCC features to fingerprint the audio. Cross-correlation to find where the tracks match. Some real signal processing techniques, not just prompt engineering. Now, could I have prompted my way to a solution? Probably. But I like to think, years of manually aligning tracks — and some sound engineering intuition — helped me steer AI towards a better solution. Working on this felt refreshing. In an era where half the conversation is about AI replacing engineering work, here’s a problem where the hard part is still the problem itself — understanding the domain, picking the right approach, knowing what “correct” sounds like. It gives me confidence that solving real problems well still has its place. I like how Dax put it: thdxr on twitter I really don’t care about using AI to ship more stuff. It’s really hard to come up with stuff worth shipping. The core idea: take a chunk of speech from a participant track, compare it against the master recording, find where they match best. That position is the time offset. The trick is picking which chunk of speech to use. Rather than betting on a single region, Podsync finds a few strong candidates per track (longer contiguous speech blocks preferred) and tries each one against the master. For long candidates, it samples from the start, middle, and end. The highest-confidence match wins; if a second independent region agrees on the same offset, that corroboration factors in as a tie-breaker. After finding the offset, Podsync pads or trims each track to align with the master and match its length (and outputs some info on the offset). Drop the output into my DAW at 0:00. Done. I even wrote an agent skill you can just point your agent harness to and it will take care of all the steps for you : What used to be 10–15 minutes of alignment per episode is now a single command. 
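The core matching step can be sketched roughly as follows. This is a simplified illustration and not Podsync's code: it slides a chunk over the master using raw samples and normalized cross-correlation, whereas the post describes comparing MFCC features of VAD-selected speech regions. The function name `best_offset` is an assumption.

```rust
/// Returns the offset (in samples) at which `chunk` best matches `master`,
/// together with the normalized correlation score in [-1, 1].
/// Brute-force time-domain search: O(master.len() * chunk.len()).
fn best_offset(master: &[f32], chunk: &[f32]) -> Option<(usize, f32)> {
    if chunk.is_empty() || chunk.len() > master.len() {
        return None;
    }
    let chunk_energy: f32 = chunk.iter().map(|x| x * x).sum();
    let mut best: Option<(usize, f32)> = None;

    for offset in 0..=(master.len() - chunk.len()) {
        let window = &master[offset..offset + chunk.len()];
        let dot: f32 = window.iter().zip(chunk).map(|(a, b)| a * b).sum();
        let window_energy: f32 = window.iter().map(|x| x * x).sum();
        let denom = (chunk_energy * window_energy).sqrt();
        if denom == 0.0 {
            continue; // silent window (or silent chunk): no meaningful score
        }
        let score = dot / denom;
        if best.map_or(true, |(_, s)| score > s) {
            best = Some((offset, score));
        }
    }
    best
}
```

Dividing the returned offset by the sample rate gives the time offset. Running this for several candidate chunks and checking whether they agree on the same offset corresponds to the corroboration step described above.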
Marco, if you ever read this, would still love to see your implementation! His solution (as I understand) is aimed more at correcting the drift vs getting the offset right. In practice, I haven’t found drift to be much of a problem. It exists but stays minor, and I’m typically editing every second of the podcast anyway so it’s easy enough to handle by hand. I even had a branch that corrected drift by splicing at silence points, but it complicated things more than it helped. It’s a podcast on AI development but we strive to make it high signal. None of that masturbatory AI discourse .  ↩︎ See also Phone-sync .  ↩︎ I chose Rust (it’s what interests me these days ) and a CLI tool with no runtime dependency is more pleasant to distribute.  ↩︎

Lalit Maganti 4 weeks ago

syntaqlite: high-fidelity devtools that SQLite deserves

Most SQL tools treat SQLite as a “flavor” of a generic SQL parser. They approximate the language, which means they break on SQLite-exclusive features like virtual tables , miss syntax like UPSERT , and ignore the 22 compile-time flags that change the syntax SQLite accepts. So I built syntaqlite : an open-source parser, formatter, validator, and LSP built directly on SQLite’s own Lemon-generated grammar. It sees SQL exactly how SQLite sees it, no matter which version of SQLite you’re using or which feature flags you compiled with. It ships as a CLI , VS Code extension , Claude Code LSP plugin , and C / Rust libraries. There’s also a web playground which you can try now: paste any SQLite SQL and see parsing, formatting, and validation live in the browser, no install needed. Full documentation is available here . Here’s syntaqlite in action: Formatting with the CLI Validation with the CLI

underlap 1 months ago

Moving on to Servo

It’s finally time to move on from implementing to using it in . But how? Since my last post, I’ve been using Claude Code extensively to help me complete : I got into a pattern of having Claude write a spec (in the form of commented out API with docs), write tests for the spec, and then implement the spec. Sometimes Claude failed to do the simplest things. I asked it to extract some common code into a function, but it got part way through and then started reverting what it had done by inlining the function. Pointing this out didn’t help. It’s as if the lack of a measurable goal [1] made it lose “direction”. It was faster and safer to do this kind of change by hand. I published v0.0.7 of the crate. This is functionally complete and ready for consumption by Servo. is perhaps an ideal project for using Claude Code with its: By comparison, Servo is challenging for Claude (and human developers), having: And that’s not to mention Servo’s (anti-) AI Contribution policy . I asked Claude to plan the migration of Servo from to using the Migration Guide it had written previously. It came up with a credible plan including validating by running various tests. However, the plan didn’t include running the tests before making changes to be sure they already passed and the environment was set up correctly. The first hurdle in getting the tests to pass was that Servo doesn’t build on Arch Linux. This is a known problem and a workaround was to use a devcontainer in vscode, a development environment running in a Linux container. A pre-req. was to install Docker, which gave me flashbacks to the latter part of my software development career (when I worked on container runtimes, Docker/OCI image management, and Kubernetes). These aspects of my career were part of my decision to retire when I did. I had little interest in these topics, beyond the conceptual side. After a bit of fiddling to install Docker and get it running, I tried to open the devcontainer in vscodium. 
The first issue with this was that some 48 GB of “build context” needed transferring to the Docker daemon. This was later reduced to 5 GB. The second issue was that vscodium was missing some APIs that were needed to make the devcontainer usable. So I uninstalled vscodium and installed vscode. I was then able to ask Claude to proceed to check that the validation tests ran correctly. The program failed to run [2] , so Claude used Cargo to run various tests. After implementing the first step of the plan, Claude mentioned that there was still one compilation error, but that didn’t matter because it had been present at the start. This was a mistake along the lines of “AI doesn’t care”. Any developer worth their salt would have dug into the error before proceeding to implement a new feature. Anyway, I got Claude to commit its changes in the devcontainer. I then found, when trying to squash two Claude commits outside the container that some of the files in had been given as an owner, because the devcontainer was running under the container . I tried modifying the devcontainer to use a non-root user ( ), but then the build couldn’t update the directory which is owned by . I considered investigating this further to enable a non-root user to build Servo inside a devcontainer [3] , but at this point I started to feel like I was in a hole and should stop digging: So I took a step back and decided to discuss the way forward with the Servo developers: Applying IPC channel multiplexing to Servo . Code complexity metrics seem to have fallen out of favour, but maybe some such metrics would help Claude to keep going in the right direction when refactoring. ↩︎ I later got running, by installing , but not all the tests ran successfully (four hung). ↩︎ The unit tests passed in the devcontainer with after applying (where is the relevant non-root user and group) to , , and . ↩︎ Not according to Servo’s (anti-) AI Contribution policy . ↩︎ Added non-blocking receive: and methods. 
Revised to avoid describing internals. Added a Migration Guide on migrating from to . Added and , identified as missing by the Migration Guide. Improved test speed and reduced its variability. Relatively simple and well-documented API, Unit and integration tests (all of which run in under five seconds), Benchmarks, Strong typing (a given for Rust), Linting (including pedantic lints), Standard formatting, the ipc-channel API for comparison. A complex API, Extremely slow tests, An enormous codebase. Do I really want to continue using Claude Code? Would AI generated code be acceptable to Servo developers? [4] Do I actually want to get back into wrestling with a mammoth project now I’m retired?

Chris Coyier 1 months ago

Claude is an Electron App

Juicy intro from Nikita Prokopov : In  “Why is Claude an Electron App?”  Drew Breunig wonders: Claude spent $20k on an agent swarm implementing (kinda) a C-compiler in Rust, but desktop Claude is an Electron app. If code is free, why aren’t all apps native? And then argues that the answer is that LLMs are not good enough yet. They can do 90% of the work, so there’s still a substantial amount of manual polish, and thus, increased costs. But I think that’s not the real reason. The real reason is: native has nothing to offer.

daniel.haxx.se 1 months ago

Dependency tracking is hard

curl and libcurl are written in C. Rather low level components present in many software systems. They are typically not part of any ecosystem at all. They’re just a tool and a library. In lots of places on the web when you mention an Open Source project, you will also get the option to mention in which ecosystem it belongs. npm, go, rust, python etc. There are easily at least a dozen well-known and large ecosystems. curl is not part of any of those. Recently there’s been a push for PURLs ( Package URLs ), for example when describing your specific package in a CVE. A package URL only works when the component is part of an ecosystem. curl is not. We can’t specify curl or libcurl using a PURL. SBOM generators and related scanners use package managers to generate lists of used components and their dependencies . This makes these tools quite frequently just miss and ignore libcurl. It’s not listed by the package managers. It’s just in there, ready to be used. Like magic. It is similarly hard for these tools to figure out that curl in turn also depends and uses other libraries. At build-time you select which – but as we in the curl project primarily just ships tarballs with source code we cannot tell anyone what dependencies their builds have. The additional libraries libcurl itself uses are all similarly outside of the standard ecosystems. Part of the explanation for this is also that libcurl and curl are often shipped bundled with the operating system many times, or sometimes perceived to be part of the OS. Most graphs, SBOM tools and dependency trackers therefore stop at the binding or system that uses curl or libcurl, but without including curl or libcurl. The layer above so to speak. This makes it hard to figure out exactly how many components and how much software is depending on libcurl. A perfect way to illustrate the problem is to check GitHub and see how many among its vast collection of many millions of repositories that depend on curl. 
After all, curl is installed in some thirty billion installations, so clearly it is used a lot . (Most of them being libcurl of course.) It lists one dependency for curl. Repositories that depend on curl/curl: one. Screenshot taken on March 9, 2026 What makes this even more amusing is that it looks like this single dependent repository ( Pupibent/spire ) lists curl as a dependency by mistake.

Evan Schwartz 1 months ago

Scour - February Update

Hi friends, In February, Scour scoured 647,139 posts from 17,766 feeds (1,211 were newly added). Also, 917 new users signed up, so welcome everyone who just joined! Here's what's new in the product: If you subscribe to specific feeds (as opposed to scouring all of them), Scour can now infer topics you might be interested in from them. You can click the link that says "Suggest from my feeds" on the Interests page . Thank you to the anonymous user who requested this! The onboarding experience is simpler. Instead of typing out three interests, you now can describe yourself and your interests in free-form text. Scour extracts a set of interests from what you write. Thank you to everyone who let me know that they were a little confused by the onboarding process. I made two subtle changes to the ranking algorithm. First, the scoring algorithm ranks posts by how well they match your closest interest and gives a slight boost if the post matches multiple interests. That was the intended design from earlier, but I realized that multiple weaker matches were pulling down the scores rather than boosting them. The second change was that I finally retired the machine learning text quality classifier model that Scour had been using. The final straw was when a blog post I had written (and worked hard on!) wasn't showing up on Scour. The model had classified it as low quality 😤. I knew for a while that what the model was optimizing for was somewhat orthogonal to my idea of text quality, but that was it. For the moment, Scour relies on a large domain blocklist (of just under 1 million domains) to prevent low-quality content and spam from getting into your feed. I'm also investigating other ways of assessing quality without relying on social signals , but more on that to come in the future. I've always been striving to make Scour fast and it got much faster this past month. My feed, which compares about 35,000 posts against 575 interests, now loads in around 50 milliseconds. 
Even comparing all the 600,000+ posts from the last month across all feeds takes only 180 milliseconds. This graph shows the 99th percentile latency (the slowest requests) dropping from the occasional 10 seconds down to under 400 milliseconds (lower is better): For those interested in the technical details, this speed up came from two changes: First, I switched from scanning through post embeddings streamed from SQLite, which was already quite fast because the data is local, to keeping all the relevant details in memory. The in-memory snapshot is rebuilt every 15 minutes when the scraper finishes polling all of the feeds for new content. This change resulted in the very nice combination of much higher performance and lower memory usage, because SQLite connections have independent caches. The second change came from another round of optimization on the library I use to compute the Hamming Distance between each post's embedding and the embeddings of each of your interests. You can read more about this in the upcoming blog post, but I was able to speed up the comparisons by around another 40x, making it so Scour can now do around 1.6 billion comparisons per second. Together, these changes make loading the feed feel instantaneous, even though your whole feed is ranked on the fly when you load the page. Here were some of my favorite posts that I found on Scour in February: Happy Scouring! Scour is built on vector embeddings, so I'm especially excited when someone releases a new and promising-sounding embedding model. I get particularly excited by those that are explicitly trained to support binary quantization like this one from Perplexity: pplx-embed: State-of-the-Art Embedding Models for Web-Scale Retrieval . I also spend a fair amount of time thinking about optimizing Rust code, especially using SIMD, so this was an interesting write up from TurboPuffer: Rust zero-cost abstractions vs. SIMD . 
This was an interesting write up comparing what different coding agents do under the hood: I Intercepted 3,177 API Calls Across 4 AI Coding Tools. Here's What's Actually Filling Your Context Window. . And finally, this one is on a very different topic but has some nice animations that demonstrate why boarding airplanes is slow and shows The Fastest Way to Board an Airplane .
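For readers curious about the Hamming distance comparison mentioned in the technical details above, here is a minimal sketch of the idea, assuming binary-quantized embeddings packed into `u64` words. It is not Scour's library code; the function names are hypothetical, and the real implementation is described as heavily optimized, which this is not.

```rust
/// Hamming distance between two binary-quantized embeddings packed into
/// `u64` words: XOR the words and count the differing bits.
fn hamming_distance(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same width");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Index and distance of the interest closest to `post`, if any.
/// (Hypothetical helper; the post ranks by the closest interest and then
/// applies a small boost for multiple matches, which is not shown here.)
fn closest_interest(post: &[u64], interests: &[Vec<u64>]) -> Option<(usize, u32)> {
    interests
        .iter()
        .enumerate()
        .map(|(i, interest)| (i, hamming_distance(post, interest)))
        .min_by_key(|&(_, distance)| distance)
}
```

Because each comparison is just XOR plus a population count over a few machine words, it maps well onto SIMD instructions, which is one reason binary quantization makes ranking on the fly practical.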

tonsky.me 1 months ago

Claude is an Electron App because we’ve lost native

In “Why is Claude an Electron App?” Drew Breunig wonders: Claude spent $20k on an agent swarm implementing (kinda) a C-compiler in Rust, but desktop Claude is an Electron app. If code is free, why aren’t all apps native? And then argues that the answer is that LLMs are not good enough yet. They can do 90% of the work, so there’s still a substantial amount of manual polish, and thus, increased costs. But I think that’s not the real reason. The real reason is: native has nothing to offer. API-wise, native apps lost to web apps a long time ago. Native APIs are terrible to use, and OS vendors use everything in their power to make you not want to develop native apps for their platform. That explains the rise of Electron before LLM times, but it’s also a problem that LLMs solve now: if that was a real barrier to developing native apps, it doesn’t exist anymore. Then there’re looks and consistency. Some time ago, maybe in the late 90s and 2000s, native was ahead. It used to look good, it was consistent, and it all actually worked: the more apps used native look and feel, the better user experience was across apps (which we used to call programs). These days, though, native is as bad as the web, if not worse. Consistency is basically out the window. Anything can look like anything, buttons have no borders, contrast doesn’t exist, and neither do conventions. Apple, for example, seems to place traffic lights and corner radius by vibes rather than by any measurable guidelines. Looks could be good, but they also can be bad, and then you are stuck with platform-consistent, but generally bad UI (Liquid Glass ahem). It changes too often, too: the app you made today will look out of place next year, when Apple decides to change look and feel yet again. There’s no native look anymore. Theoretically, native apps can integrate with OS on a deeper level. This sounds nice, but what does that mean in practice? 
There are almost no good interoperable file formats; everything is locked inside individual apps, most services moved to the web, and OSes dropped the ball for making a good shared baseline. You can integrate with OS-provided calendar, but you can’t do it with web calendar. Well, you can, of course, but it’s easier on the web; native doesn’t help with it at all. Finally, the last hope of people longing for native is performance. They feel that native apps will be faster. Well, they can, but it doesn’t mean they will. Web apps can be faster, too, but in practice, nobody cares. There’s no technical reason why Slack needs to load 80 MiB just to show 10 channel names and 3 messages on a screen. The web is not the problem here! It’s a choice to be bad. What makes you think it’ll be different once the company decides to move to native? Don’t get me wrong: writing this brings me no joy. I don’t think web is a solution either. I just remember good times when native did a better-than-average job, and we were all better for using it, and it saddens me that these times have passed. I just don’t think it is productive to kid ourselves that the only problem with software is Electron and that it all will be butterflies and unicorns once we rewrite Slack in SwiftUI. The real problem is a lack of care. And the slop; you can build it with any stack.

Jeff Geerling 1 months ago

How to Securely Erase an old Hard Drive on macOS Tahoe

Apparently Apple thinks nobody with a modern Mac uses spinning rust (hard drives with platters) anymore. I plugged in a hard drive from an old iMac into my Mac Studio using my Sabrent USB to SATA Hard Drive enclosure, and opened up Disk Utility, clicked on the top-level disk in the sidebar, and clicked 'Erase'. Lo and behold, there's no 'Security Options' button on there, as there had been since—I believe—the very first version of Disk Utility in Mac OS X!

<antirez> 1 months ago

Implementing a clear room Z80 / ZX Spectrum emulator with Claude Code

Anthropic recently released a blog post with the description of an experiment in which the latest version of Opus, 4.6, was instructed to write a C compiler in Rust, in a “clean room” setup. The experiment methodology left me dubious about the kind of point they wanted to make. Why not provide the agent with the ISA documentation? Why Rust? Writing a C compiler is exactly a giant graph manipulation exercise: the kind of program that is harder to write in Rust. Also, in a clean room experiment, the agent should have access to all the information about well established computer science advances related to optimizing compilers: there are a number of papers that could be easily synthesized in a number of markdown files. SSA, register allocation, instruction selection and scheduling. Those things needed to be researched *first*, as a prerequisite, and the implementation would still be “clean room”. Not allowing the agent to access the Internet, nor any other compiler source code, was certainly the right call. Less understandable is the almost-zero steering principle, but this is coherent with a certain kind of experiment, if the goal was showcasing the completely autonomous writing of a large project. Yet we all know this is not how coding agents are used in practice, most of the time. Anyone who uses coding agents extensively knows very well how, even without ever touching the code, a few hints here and there completely change the quality of the result.

# The Z80 experiment

I thought it was time to try a similar experiment myself, one that would take one or two hours at max, and that was compatible with my Claude Code Max plan: I decided to write a Z80 emulator, and then a ZX Spectrum emulator (and even more, a CP/M emulator, see later) in a condition that I believe makes more sense as a “clean room” setup. The result can be found here: https://github.com/antirez/ZOT.

# The process I used

1. I wrote a markdown file with the specification of what I wanted to do. 
Just English, high level ideas about the scope of the Z80 emulator to implement. I said things like: it should execute a whole instruction at a time, not a single clock step, since this emulator must be runnable on things like an RP2350 or similarly limited hardware. The emulator should correctly track the clock cycles elapsed (and I specified we could use this feature later in order to implement the ZX Spectrum contention with ULA during memory accesses), provide memory access callbacks, and should emulate all the known official and unofficial instructions of the Z80. For the Spectrum implementation, performed as a successive step, I provided much more information in the markdown file, like, the kind of rendering I wanted in the RGB buffer, and how it needed to be optional so that embedded devices could render the scanlines directly as they transferred them to the ST77xx display (or similar), how it should be possible to interact with the I/O port to set the EAR bit to simulate cassette loading in a very authentic way, and many other desiderata I had about the emulator. This file also included the rules that the agent needed to follow, like: * Accessing the internet is prohibited, but you can use the specification and test vectors files I added inside ./z80-specs. * Code should be simple and clean, never over-complicate things. * Each solid progress should be committed in the git repository. * Before committing, you should test that what you produced is high quality and that it works. * Write a detailed test suite as you add more features. The test must be re-executed at every major change. * Code should be very well commented: things must be explained in terms that even people not well versed with certain Z80 or Spectrum internals details should understand. * Never stop for prompting, the user is away from the keyboard. * At the end of this file, create a work in progress log, where you note what you already did, what is missing. Always update this log. 
* Read this file again after each context compaction.

2. Then, I started a Claude Code session, and asked it to fetch all the useful documentation on the internet about the Z80 (later I did this for the Spectrum as well), and to extract only the useful factual information into markdown files. I also provided the binary files for the most ambitious test vectors for the Z80, the ZX Spectrum ROM, and a few other binaries that could be used to test if the emulator actually executed the code correctly. Once all this information was collected (it is part of the repository, so you can inspect what was produced) I completely removed the Claude Code session in order to make sure that no contamination with source code seen during the search was possible.

3. I started a new session, and asked it to check the specification markdown file, and to check all the documentation available, and start implementing the Z80 emulator. The rules were to never access the Internet for any reason (I supervised the agent while it was implementing the code, to make sure this didn’t happen), and to never search the disk for similar source code, as this was a “clean room” implementation.

4. For the Z80 implementation, I did zero steering. For the Spectrum implementation I used extensive steering for implementing the TAP loading. More about my feedback to the agent later in this post.

5. As a final step, I copied the repository to /tmp, removed the “.git” repository files completely, started a new Claude Code (and Codex) session and claimed that the implementation was likely stolen or too strongly inspired by somebody else's work. The task was to check against all the major Z80 implementations whether there was evidence of theft. The agents (both Codex and Claude Code), after extensive search, were not able to find any evidence of copyright issues.
The only similar parts were about well-established emulation patterns and things that are Z80 specific and can’t be done differently; the implementation looked distinct from all the other implementations in a significant way.

# Results

Claude Code worked for 20 or 30 minutes in total, and produced a Z80 emulator that was able to pass ZEXDOC and ZEXALL, in 1200 lines of very readable and well commented C code (1800 lines with comments and blank spaces). The agent was prompted zero times during the implementation: it acted absolutely alone. It never accessed the internet, and the process it used to implement the emulator was one of continuous testing, interacting with the CP/M binaries implementing ZEXDOC and ZEXALL, writing just the CP/M syscalls needed to produce the output on the screen. Multiple times it also used the Spectrum ROM and other binaries that were available, or binaries it created from scratch, to see if the emulator was working correctly. In short: the implementation was performed in a very similar way to how a human programmer would do it, and not by outputting a complete implementation from scratch, “uncompressing” it from the weights. Instead, different classes of instructions were implemented incrementally, and there were bugs that were fixed via integration tests, debugging sessions, dumps, printf calls, and so forth.

# Next step: the ZX Spectrum

I repeated the process again. I instructed the documentation gathering session very accurately about the kind of details I wanted it to search on the internet, especially the ULA interactions with RAM access, the keyboard mapping, the I/O port, how the cassette tape worked and the kind of PWM encoding used, and how it was encoded into TAP or TZX files.
As I said, this time the design notes were extensive, since I wanted this emulator to be specifically designed for embedded systems: only 48k emulation, optional framebuffer rendering, very little additional memory used (no big lookup tables for ULA/Z80 access contention), the ROM not copied into RAM to avoid using an additional 16k of memory, but just referenced during initialization (so we have just a copy in the executable), and so forth.

The agent was able to create very detailed documentation about the ZX Spectrum internals. I provided a few .z80 images of games, so that it could test the emulator in a real setup with real software. Again, I removed the session and started fresh. The agent started working and finished 10 minutes later, following a process that really fascinates me, and that you probably know very well: you see the agent working using a number of diverse skills. It is expert in everything programming related, so as it was implementing the emulator, it could immediately write detailed instrumentation code to “look” at what the Z80 was doing step by step, and how this changed the Spectrum emulation state. In this respect, I believe automatic programming to be already super-human, not in the sense that it is currently capable of producing code that humans can’t produce, but in the concurrent usage of different programming languages, system programming techniques, DSP stuff, operating system tricks, math, and everything needed to reach the result in the most immediate way.

When it was done, I asked it to write a simple SDL based integration example. The emulator was immediately able to run the Jetpac game without issues, with working sound, and very little CPU usage even on my slow Dell Linux machine (8% usage of a single core, including SDL rendering).

Once the basic stuff was working, I wanted to load TAP files directly, simulating cassette loading.
This was the first time the agent missed a few things, specifically about the timing the Spectrum loading routines expected, and here we are in the territory where LLMs start to perform less efficiently: they can’t easily run the SDL emulator and see the border changing as data is received and so forth. I asked Claude Code to do a refactoring so that zx_tick() could be called directly and was not part of zx_frame(), and to make zx_frame() a trivial wrapper. This way it was much simpler to sync EAR with what it expected, without callbacks or the wrong abstractions that it had implemented. After this change, a few minutes later the emulator could load a TAP file emulating the cassette without problems. This is how it works now:

    do {
        zx_set_ear(zx, tzx_update(&tape, zx->cpu.clocks));
    } while (!zx_tick(zx, 0));

I continued prompting Claude Code in order to make the key bindings more useful and to improve a few more things.

# CP/M

One thing that I found really interesting was the ability of the LLM to inspect the COM files of the ZEXDOC / ZEXALL tests for the Z80, easily spot the CP/M syscalls that were used (a total of three), and implement them for the extended Z80 test (executed by make fulltest). So, at this point, why not implement a full CP/M environment? Same process again, same good result in a matter of minutes. This time I interacted with it a bit more for the VT100 / ADM3 terminal escape conversions, reported things not working in WordStar initially, and in a few minutes everything I tested was working well enough (but there are fixes to do, like simulating a 2 MHz clock: right now it runs at full speed, making CP/M games impossible to use).

# What is the lesson here?

The obvious lesson is: always provide your agents with design hints and extensive documentation about what they are going to do. Such documentation can be obtained by the agent itself.
And, also, make sure the agent has a markdown file with the rules of how to perform the coding tasks, and a trace of what it is doing, that is updated and read again quite often. But those tricks, I believe, are quite clear to everybody who has worked extensively with automatic programming in recent months. To think in terms of “what a human would need” is often the best bet, plus a few LLM-specific things, like the forgetting issue after context compaction, the continuous ability to verify it is on the right track, and so forth.

Returning to the Anthropic compiler attempt: one of the steps that the agent failed was the one most strongly related to the idea of memorization of what is in the pretraining set: the assembler. With extensive documentation, I can’t see any way Claude Code (and, even more, GPT5.3-codex, which is in my experience, for complex stuff, more capable) could fail at producing a working assembler, since it is quite a mechanical process. This is, I think, in contradiction with the idea that LLMs are memorizing the whole training set and uncompressing what they have seen. LLMs can memorize certain over-represented documents and code, but while they can extract such verbatim parts of the code if prompted to do so, they don’t have a copy of everything they saw during training, nor do they spontaneously emit copies of already seen code in their normal operation. We mostly ask LLMs to create work that requires assembling different knowledge they possess, and the result is normally something that uses known techniques and patterns, but that is new code, not constituting a copy of some pre-existing code.
It is worth noting, too, that humans often follow a less rigorous process compared to the clean room rules detailed in this blog post, that is: humans often download the code of different implementations related to what they are trying to accomplish, read them carefully, then try to avoid copying stuff verbatim, but oftentimes they take strong inspiration. This is a process that I find perfectly acceptable, but it is important to keep in mind what happens in the reality of code written by humans. After all, information technology evolved so fast partly thanks to this massive cross-pollination effect.

For all the above reasons, when I implement code using automatic programming, I don’t have problems releasing it MIT licensed, like I did with this Z80 project. In turn, this code base will constitute quality input for the training of the next LLMs, including open weights ones.

# Next steps

To make my experiment more compelling, one should try to implement a Z80 and ZX Spectrum emulator without providing any documentation to the agent, and then compare the results of the two implementations. I didn’t find the time to do it, but it could be quite informative.
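The core design constraints listed in the specification step (execute a whole instruction per call, track elapsed clock cycles, reach memory only through callbacks) can be illustrated with a very small sketch. This is not code from the ZOT repository, which is written in C: it is a Rust illustration under stated assumptions, the names `Bus`, `Ram`, `Cpu`, `step` and `run_demo` are hypothetical, flags are not modeled, and only five opcodes are handled, using the standard Z80 T-state counts for them.

```rust
/// Memory access callbacks: the core never touches RAM directly, so a host
/// machine could add contention delays or map a ROM behind these calls.
pub trait Bus {
    fn read(&mut self, addr: u16) -> u8;
    fn write(&mut self, addr: u16, value: u8);
}

/// Flat 64 KiB RAM, used only for this illustration.
pub struct Ram(pub Vec<u8>);

impl Bus for Ram {
    fn read(&mut self, addr: u16) -> u8 {
        self.0[addr as usize]
    }
    fn write(&mut self, addr: u16, value: u8) {
        self.0[addr as usize] = value;
    }
}

/// Hypothetical, heavily reduced CPU state (no flags, one register).
#[derive(Default)]
pub struct Cpu {
    pub a: u8,
    pub pc: u16,
    pub clocks: u64,
    pub halted: bool,
}

impl Cpu {
    /// Reads the byte at PC through the bus and advances PC.
    fn fetch<B: Bus>(&mut self, bus: &mut B) -> u8 {
        let byte = bus.read(self.pc);
        self.pc = self.pc.wrapping_add(1);
        byte
    }

    /// Executes one whole instruction and returns the T-states it took.
    pub fn step<B: Bus>(&mut self, bus: &mut B) -> u32 {
        if self.halted {
            self.clocks += 4;
            return 4;
        }
        let op = self.fetch(bus);
        let t: u32 = match op {
            0x00 => 4, // NOP
            0x3C => {
                // INC A (flag updates omitted in this sketch)
                self.a = self.a.wrapping_add(1);
                4
            }
            0x3E => {
                // LD A,n
                self.a = self.fetch(bus);
                7
            }
            0x32 => {
                // LD (nn),A with a little-endian address operand
                let lo = self.fetch(bus) as u16;
                let hi = self.fetch(bus) as u16;
                bus.write((hi << 8) | lo, self.a);
                13
            }
            0x76 => {
                // HALT
                self.halted = true;
                4
            }
            _ => panic!("opcode {:#04x} is not covered by this sketch", op),
        };
        self.clocks += u64::from(t);
        t
    }
}

/// Runs LD A,0x41; INC A; LD (0x8000),A; HALT and reports
/// (A register, byte stored at 0x8000, total clocks).
pub fn run_demo() -> (u8, u8, u64) {
    let mut ram = Ram(vec![0; 0x10000]);
    ram.0[..7].copy_from_slice(&[0x3E, 0x41, 0x3C, 0x32, 0x00, 0x80, 0x76]);
    let mut cpu = Cpu::default();
    while !cpu.halted {
        cpu.step(&mut ram);
    }
    (cpu.a, ram.0[0x8000], cpu.clocks)
}
```

Because `step` returns only after a complete instruction and accumulates `clocks`, a caller can interleave other work between instructions, which is the same shape as the tape loop shown earlier, where the EAR bit is updated from the clock counter before each tick.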

underlap 1 month ago

ipc-channel-mux router support

The IPC channel multiplexing crate, ipc-channel-mux, now includes a “router”. The router provides a means of automatically forwarding messages from subreceivers to Crossbeam receivers so that users can enjoy Crossbeam receiver features, such as selection (explained below). The absence of a router blocked the adoption of the crate by Servo, so it was an important feature to support. Routing involves running a thread which receives from various subreceivers and forwards the results to Crossbeam channels. Without a separate thread, a receive on one of the Crossbeam receivers would block and when a message became available on the subchannel, it wouldn’t be forwarded to the Crossbeam channel. Before we explain routing further, we need to introduce a concept which may be unfamiliar to some readers. Suppose you have a set of data sources – servers, file descriptors, or, in our case, channels – which may or may not be ready to deliver data. To wait for one or more of these to be ready, one option is to poll the items in the set. But if none of the items are ready, what should you do? If you loop around and repeatedly poll the items, you’ll consume a lot of CPU. If you delay for a period of time before polling again and an item becomes ready before the period has elapsed, you won’t notice. So polling either consumes excessive CPU or reduces responsiveness. How do we balance the requirements of efficiency and responsiveness? The solution is to somehow block until at least one item is ready. That’s just what selection does. In the context of IPC channel, this selection logic applies to a set of receivers, known as an . An holds a set of IPC receivers and, when requested, waits for at least one of the receivers to be ready and then returns a collection of the results from all the receivers which became ready. The purpose of routing is that users, such as Servo, can then select [1] [2] over a heterogeneous collection of IPC receivers and Crossbeam receivers. 
By converting IPC receivers into Crossbeam receivers, it’s possible to use Crossbeam channel’s selection feature on a homogeneous collection of Crossbeam receivers to implement a select on the corresponding heterogeneous collection of IPC receivers and Crossbeam receivers. Routing for has the same requirement: to convert a collection of subreceivers to Crossbeam receivers so that Crossbeam channel’s selection feature can be used on a homogeneous collection of Crossbeam receivers to implement selection on the corresponding heterogeneous collection of subreceivers and Crossbeam receivers. Let’s look at how this is implemented. The most obvious approach was to mirror the design of IPC channel routing and implement subchannel routing in terms of sets of subreceivers known as s. Receiving from a collection of subreceivers could be implemented by attempting a receive (using ) from each subreceiver of the collection in turn and returning any results returned. However there is a difficulty: if none of the subreceivers returns a result, what should happen? If we loop around and repeatedly attempt to receive from each subreceiver in the collection, we’ll consume a lot of CPU. If we delay for a certain period of time, we won’t be responsive if a subreceiver becomes ready to return a result. The solution is to somehow block until at least one of the subreceivers is ready to return a result. A does just that. It holds a set of subreceivers and, when requested, returns a collection of the results from all the receivers which became ready. This is a specific example of the advantages of using selection over polling, discussed above. Remember that the results of a subreceiver are demultiplexed from the results of an IPC receiver (provided by the crate). The following diagram shows how a MultiReceiver sits between an IpcReceiver and the SubReceivers served by that IpcReceiver: IPC channels already implements an . 
So a can be implemented in terms of an containing all the IPC receivers underlying the subreceivers in the set. There are some complications however. When a subreceiver is added to a , there may be other subreceivers with the same underlying IPC receiver which do not belong to the set and yet the will return a message that could demultiplex either to a subreceiver in the set or a subreceiver not in the set. Worse than that, subreceivers with the same underlying IPC receiver may be added to distinct s. So if we use an to implement a , more than one may need to share the same . There is one case where Servo uses directly, rather than via the router, and it’s in the implementation of . So one option would be to avoid adding IpcReceiverSet to the API of . Then there would be at most one instance of and so some of the complications might not arise. But there’s a danger that it would be possible to encounter the same complication using the router, e.g. if some subreceivers were added to the router and other subreceivers with the same underlying IPC channel as those added to the router were used directly. Another complication of routing is that the router thread needs to receive messages from subchannels which originate outside that thread. So subreceivers need to be moved into the thread. In terms of Rust, they need to be . Given that some subreceivers can be moved into the thread and other subreceivers which have not been moved into the thread can share the same underlying IPC channel, subreceivers (or at least substantial parts of their implementation) need to be . To avoid polling, essentially it must be possible for a select operation on a SubReceiverSet to result in a select operation on an IpcReceiverSet comprising the underlying IpcReceiver(s). I experimented with the situation where some subreceivers were added to the router and other subreceivers with the same underlying IPC channel as those added to the router were used directly.
This resulted in liveness and/or fairness issues when the thread using a subreceiver directly competed with the router thread. Both these threads would attempt to issue a select on an . The cleanest solution initially appeared to be to make both these depend on the router to issue the select operation. This came with some restrictions though, such as the stand-alone subreceiver not being able to receive any more messages after the router was shut down. A radical alternative was to restructure the router API so that it would not be possible for some subreceivers to be added to the router and other subreceivers with the same underlying IPC channel as those added to the router to be used directly. This may be a reasonable restriction for Servo because receivers tend to be added to the router soon after the receiver’s channel is created. With this redesigned router API in which subreceivers destined for routing are hidden from the API, the above liveness and fairness problems can be side-stepped. v0.0.5 of the ipc-channel-mux crate includes the redesigned router API. v0.0.6 improves the throughput for both subchannel receives and routing. The next step is to try to improve the code structure since the module has grown considerably and could do with some parts splitting into separate modules. After that, I’ll need to see if some of the missing features relative to ipc-channel need to be added to ipc-channel-mux before it’s ready to be tried out in Servo. [3] Another possibility, if some of the IPC receivers has been disconnected, is that select can return which IPC receivers have been disconnected. ↩︎ Crossbeam selection is a little more general. They allow the user to wait for operations to complete, each of which may be a send or a receive. An arbitrary one of the completed operations is chosen and its resultant value is returned. ↩︎ The main functional gaps in ipc-channel-mux compared to ipc-channel are shared memory transmission and non-blocking subchannel receive. 
↩︎
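The forwarding idea behind routing, described above, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the ipc-channel-mux API: `std::sync::mpsc` receivers stand in for subreceivers, the merged `std` receiver stands in for a Crossbeam receiver, and the hypothetical `route` function uses one blocking forwarder thread per source rather than a single router thread performing a select. What it does show is the property the post cares about: every thread blocks until a message is available, so nothing polls.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical stand-in for a router: forwards every message from each
/// source into one merged channel, tagged with the index of its source.
pub fn route<T: Send + 'static>(sources: Vec<Receiver<T>>) -> Receiver<(usize, T)> {
    let (merged_tx, merged_rx) = channel();
    for (index, source) in sources.into_iter().enumerate() {
        let tx: Sender<(usize, T)> = merged_tx.clone();
        thread::spawn(move || {
            // Iterating a Receiver blocks until a message arrives and ends
            // once every sender for this source has been dropped.
            for message in source {
                if tx.send((index, message)).is_err() {
                    break; // the consumer dropped the merged receiver
                }
            }
        });
    }
    // `merged_tx` is dropped on return, so the merged receiver disconnects
    // as soon as all forwarder threads have finished.
    merged_rx
}

/// Sends one message on each of two sources and collects what the merged
/// receiver delivers, sorted by source index for a deterministic result.
pub fn demo() -> Vec<(usize, String)> {
    let (tx_a, rx_a) = channel();
    let (tx_b, rx_b) = channel();
    let merged = route(vec![rx_a, rx_b]);
    tx_b.send("from b".to_string()).unwrap();
    tx_a.send("from a".to_string()).unwrap();
    drop((tx_a, tx_b)); // disconnect the sources so the forwarders exit
    let mut received: Vec<(usize, String)> = merged.iter().collect();
    received.sort();
    received
}
```

One thread per source is the simplest way to avoid polling without a selection primitive; it does not reproduce the complications discussed in the post, which arise precisely because several subreceivers share one underlying IPC receiver.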

baby steps 1 month ago

What it means that Ubuntu is using Rust

Righty-ho, I’m back from Rust Nation, and busily horrifying my teenage daughter with my (admittedly atrocious) attempts at doing an English accent 1 . It was a great trip with a lot of good conversations and some interesting observations. I am going to try to blog about some of them, starting with some thoughts spurred by Jon Seager’s closing keynote, “Rust Adoption At Scale with Ubuntu”. For some time now I’ve been debating with myself, has Rust “crossed the chasm” ? If you’re not familiar with that term, it comes from a book that gives a kind of “pop-sci” introduction to the Technology Adoption Life Cycle . The answer, of course, is it depends on who you ask . Within Amazon, where I have the closest view, the answer is that we are “most of the way across”: Rust is squarely established as the right way to build at-scale data planes or resource-aware agents and it is increasingly seen as the right choice for low-level code in devices and robotics as well – but there remains a lingering perception that Rust is useful for “those fancy pants developers at S3” (or wherever) but a bit overkill for more average development 3 . On the other hand, within the realm of Safety Critical Software, as Pete LeVasseur wrote in a recent rust-lang blog post , Rust is still scrabbling for a foothold. There are a number of successful products but most of the industry is in a “wait and see” mode, letting the early adopters pave the path. The big idea that I at least took away from reading Crossing the Chasm and other references on the technology adoption life cycle is the need for “reference customers”. When you first start out with something new, you are looking for pioneers and early adopters that are drawn to new things: What an early adopter is buying [..] is some kind of change agent . By being the first to implement this change in the industry, the early adopters expect to get a jump on the competition. 
– from Crossing the Chasm But as your technology matures, you have to convince people with a lower and lower tolerance for risk: The early majority want to buy a productivity improvement for existing operations. They are looking to minimize discontinuity with the old ways. They want evolution, not revolution. – from Crossing the Chasm So what is most convincing to people to try something new? The answer is seeing that others like them have succeeded. You can see this at play in both the Amazon example and the Safety Critical Software example. Clearly seeing Rust used for network services doesn’t mean it’s ready to be used in your car’s steering column 4 . And even within network services, seeing a group like S3 succeed with Rust may convince other groups building at-scale services to try Rust, but doesn’t necessarily persuade a team to use Rust for their next CRUD service. And frankly, it shouldn’t! They are likely to hit obstacles. All of this was on my mind as I watched the keynote by Jon Seager, the VP of Engineering at Canonical, which is the company behind Ubuntu. Similar to Lars Bergstrom’s epic keynote from year’s past on Rust adoption within Google, Jon laid out a pitch for why Canonical is adopting Rust that was at once visionary and yet deeply practical . “Visionary and yet deeply practical” is pretty much the textbook description of what we need to cross from early adopters to early majority . We need folks who care first and foremost about delivering the right results, but are open to new ideas that might help them do that better; folks who can stand on both sides of the chasm at once. Jon described how Canonical focuses their own development on a small set of languages: Python, C/C++, and Go, and how they had recently brought in Rust and were using it as the language of choice for new foundational efforts , replacing C, C++, and (some uses of) Python. 
Jon talked about how he sees it as part of Ubuntu’s job to “pay it forward” by supporting the construction of memory-safe foundational utilities. Jon meant support both in terms of finances – Canonical is sponsoring the Trifecta Tech Foundation’s to develop sudo-rs and ntpd-rs and sponsoring the uutils org’s work on coreutils – and in terms of reputation. Ubuntu can take on the risk of doing something new, prove that it works, and then let others benefit. Remember how the Crossing the Chasm book described early majority people? They are “looking to minimize discontinuity with the old ways”. And what better way to do that than to have drop-in utilities that fit within their existing workflows. With new adoption comes new perspectives. On Thursday night I was at dinner 5 organized by Ernest Kissiedu 6 . Jon Seager was there along with some other Rust adopters from various industries, as were a few others from the Rust Foundation and the open-source project. Ernest asked them to give us their unvarnished takes on Rust. Jon made the provocative comment that we needed to revisit our policy around having a small standard library. He’s not the first to say something like that, it’s something we’ve been hearing for years and years – and I think he’s right! Though I don’t think the answer is just to ship a big standard library. In fact, it’s kind of a perfect lead-in to (what I hope will be) my next blog post, which is about a project I call “battery packs” 7 . The broader point though is that shifting from targeting “pioneers” and “early adopters” to targeting “early majority” sometimes involves some uncomfortable changes: Transition between any two adoption segments is normally excruciatingly awkward because you must adopt new strategies just at the time you have become most comfortable with the old ones. [..] The situation can be further complicated if the high-tech company, fresh from its marketing success with visionaries, neglects to change its sales pitch. [..] 
The company may be saying “state-of-the-art” when the pragmatist wants to hear “industry standard”. – Crossing the Chasm (emphasis mine) Not everybody will remember it, but in 2016 there was a proposal called the Rust Platform . The idea was to bring in some crates and bless them as a kind of “extended standard library”. People hated it. After all, they said, why not just add dependencies to your ? It’s easy enough. And to be honest, they were right – at least at the time. I think the Rust Platform is a good example of something that was a poor fit for early adopters, who want the newest thing and don’t mind finding the best crates, but which could be a great fit for the Early Majority. 8 Anyway, I’m not here to argue for one thing or another in this post, but more for the concept that we have to be open to adapting our learned wisdom to new circumstances. In the past, we were trying to bootstrap Rust into the industry’s consciousness – and we have succeeded. The task before us now is different: we need to make Rust the best option not just in terms of “what it could be ” but in terms of “what it actually is ” – and sometimes those are in tension. Later in the dinner, the talk turned, as it often does, to money. Growing Rust adoption also comes with growing needs placed on the Rust project and its ecosystem. How can we connect the dots? This has been a big item on my mind, and I realize in writing this paragraph how many blog posts I have yet to write on the topic, but let me lay out a few interesting points that came up over this dinner and at other recent points. First, there are more ways to offer support than $$. For Canonical specifically, as they are an open-source organization through-and-through, what I would most want is to build stronger relationships between our organizations. 
With the Rust for Linux developers, early on Rust maintainers were prioritizing and fixing bugs on behalf of RfL devs, but more and more, RfL devs are fixing things themselves, with Rust maintainers serving as mentors. This is awesome! Second, there’s an interesting trend about $$ that I’ve seen crop up in a few places. We often think of companies investing in the open-source dependencies that they rely upon. But there’s an entirely different source of funding, and one that might be even easier to tap, which is to look at companies that are considering Rust but haven’t adopted it yet. For those “would be” adopters, there are often individuals in the org who are trying to make the case for Rust adoption – these individuals are early adopters, people with a vision for how things could be, but they are trying to sell to their early majority company. And to do that, they often have a list of “table stakes” features that need to be supported; what’s more, they often have access to some budget to make these things happen. This came up when I was talking to Alexandru Radovici, the Foundation’s Silver Member Directory, who said that many safety critical companies have money they’d like to spend to close various gaps in Rust, but they don’t know how to spend it. Jon’s investments in Trifecta Tech and the uutils org have the same character: he is looking to close the gaps that block Ubuntu from using Rust more. Well, first of all, you should watch Jon’s talk. “Brilliant”, as the Brits have it. But my other big thought is that this is a crucial time for Rust. We are clearly transitioning in a number of areas from visionaries and early adopters towards that pragmatic majority, and we need to be mindful that doing so may require us to change some of the way that we’ve always done things. I liked this paragraph from Crossing the Chasm : To market successfully to pragmatists, one does not have to be one – just understand their values and work to serve them. 
To look more closely into these values, if the goal of visionaries is to take a quantum leap forward, the goal of pragmatists is to make a percentage improvement–incremental, measurable, predictable progress. [..] To market to pragmatists, you must be patient. You need to be conversant with the issues that dominate their particular business. You need to show up at the industry-specific conferences and trade shows they attend. Re-reading Crossing the Chasm as part of writing this blog post has really helped me square where Rust is – for the most part, I think we are still crossing the chasm, but we are well on our way. I think what we see is a consistent trend now where we have Rust champions who fit the “visionary” profile of early adopters successfully advocating for Rust within companies that fit the pragmatist, early majority profile. It strikes me that open-source is just an amazing platform for doing this kind of marketing. Unlike a company, we don’t have to do everything ourselves. We have to leverage the fact that open source helps those who help themselves – find those visionary folks in industries that could really benefit from Rust, bring them into the Rust orbit, and then (most important!) support and empower them to adapt Rust to their needs. This last part may sound obvious, but it’s harder than it sounds. When you’re embedded in open source, it seems like a friendly place where everyone is welcome. But the reality is that it can be a place full of cliques and “oral traditions” that “everybody knows” 9 . People coming with an idea can get shutdown for using the wrong word. They can readily mistake the, um, “impassioned” comments from a random contributor (or perhaps just a troll…) for the official word from project leadership. It only takes one rude response to turn somebody away. So what will ultimately help Rust the most to succeed? Empathy in Open Source . Let’s get out there, find out where Rust can help people, and make it happen. Exciting times! 
I am famously bad at accents. My best attempt at posh British sounds more like Apu from the Simpsons. I really wish I could pull off a convincing Greek accent, but sadly no. ↩︎ Another of my pearls of wisdom is “there is nothing more permanent than temporary code”. I used to say that back at the startup I worked at after college, but years of experience have only proven it more and more true. ↩︎ Russel Cohen and Jess Izen gave a great talk at last year’s RustConf about what our team is doing to help teams decide if Rust is viable for them. But since then another thing having a big impact is AI, which is bringing previously unthinkable projects, like rewriting older systems, within reach. ↩︎ I have no idea if there is code in a car’s steering column, for the record. I assume so by now? For power steering or some shit? ↩︎ Or am I supposed to call it “tea”? Or maybe “supper”? I can’t get a handle on British mealtimes. ↩︎ Ernest is such a joy to be around. He’s quiet, but he’s got a lot of insights if you can convince him to share them. If you get the chance to meet him, take it! If you live in London, go to the London Rust meetup! Find Ernest and introduce yourself. Tell him Niko sent you and that you are supposed to say how great he is and how you want to learn from the wisdom he’s accrued over the years. Then watch him blush. What a doll. ↩︎ If you can’t wait, you can read some Zulip discussion here. ↩︎ The Battery Packs proposal I want to talk about is similar in some ways to the Rust Platform, but decentralized and generally better in my opinion– but I get ahead of myself! ↩︎ Betteridge’s Law of Headlines has it that “Any headline that ends in a question mark can be answered by the word no ”. Well, Niko’s law of open-source 2 is that “nobody actually knows anything that ’everybody’ knows”. ↩︎

Evan Schwartz 1 month ago

PSA: Your SQLite Connection Pool Might Be Ruining Your Write Performance

Update (Feb 18, 2026): After a productive discussion on Reddit and additional benchmarking, I found that the solutions I originally proposed (batched writes or using a synchronous connection) don't actually help. The real issue is simpler and more fundamental than I described: SQLite is single-writer, so any amount of contention at the SQLite level will severely hurt write performance. The fix is to use a single writer connection with writes queued at the application level, and a separate connection pool for concurrent reads. The original blog post text is preserved below, with retractions and updates marked accordingly. My apologies to the SQLx maintainers for suggesting that this behavior was unique to SQLx.

Write transactions can lead to lock starvation and serious performance degradation when using SQLite with SQLx, the popular async Rust SQL library. In retrospect, I feel like this should have been obvious, but it took a little more staring at suspiciously consistent "slow statement" logs than I'd like to admit, so I'm writing it up in case it helps others avoid this footgun.

SQLite is single-writer. In WAL mode, it can support concurrent reads and writes (or, technically "write" singular), but no matter the mode there is only ever one writer at a time. Before writing, a process needs to obtain an EXCLUSIVE lock on the database.

If you start a read transaction with a SELECT and then perform a write in the same transaction, the transaction will need to be upgraded to a write transaction with an exclusive lock:

A read transaction is used for reading only. A write transaction allows both reading and writing. A read transaction is started by a SELECT statement, and a write transaction is started by statements like CREATE, DELETE, DROP, INSERT, or UPDATE (collectively "write statements"). If a write statement occurs while a read transaction is active, then the read transaction is upgraded to a write transaction if possible.
(source)

Transactions started with BEGIN IMMEDIATE or BEGIN EXCLUSIVE also take the exclusive write lock as soon as they are started.

Transactions in SQLx look like this:

This type of transaction where you read and then write is completely fine. The transaction starts as a read transaction and then is upgraded to a write transaction for the write statement.

Update: This section incorrectly attributes the performance degradation to the interaction between async Rust and SQLite. The problem is actually that any contention for the EXCLUSIVE lock at the SQLite level, whether from single statements or batches, will hurt write performance.

The problem arises when you call .await within a write transaction. For example, this could happen if you call multiple write statements within a transaction:

This code will cause serious performance degradation if you have multiple concurrent tasks that might be trying this operation, or any other write, at the same time.

When the program reaches the first write statement, the transaction is upgraded to a write transaction with an exclusive lock. However, when you call .await, the task yields control back to the async runtime. The runtime may schedule another task before returning to this one. The problem is that this task is now holding an exclusive lock on the database. All other writers must wait for this one to finish. If the newly scheduled task tries to write, it will simply wait until it hits the busy_timeout and returns a busy timeout error. The original task might be able to make progress if no other concurrent writers are scheduled before it, but under higher load you might continuously have new tasks that block the original writer from progressing.

Starting a transaction with BEGIN IMMEDIATE will also cause this problem, because you will immediately take the exclusive lock and then yield control with .await.

In practice, you can spot this issue in your production logs if you see a lot of SQLx "slow statement" warnings where the time is very close to your busy_timeout (which is 5 seconds by default).
This is the result of other tasks being scheduled by the runtime and then trying and failing to obtain the exclusive lock they need to write to the database while being blocked by a parked task.

SQLite's concurrency model (in WAL mode) is many concurrent readers with exactly one writer. Mirroring this architecture at the application level provides the best performance. Instead of a single connection pool, where connections may be upgraded to write at any time, use two separate pools:

With this setup, write transactions serialize within the application. Tasks will queue waiting for the single writer connection, rather than all trying to obtain SQLite's EXCLUSIVE lock. In my benchmarks, this approach was ~20x faster than using a single pool with multiple connections:

An alternative to separate pools is wrapping writes in a Mutex, which achieves similar performance (95ms in the benchmarks). However, separate pools make the intent clearer and, if the reader pool is configured as read-only, prevent accidentally issuing a write on a reader connection.

Having separate pools works when reads and writes are independent, but sometimes you need to atomically read and then write based on what you read:

Sending this transaction to the single write connection is fine if the read is extremely fast, such as a single lookup by primary key. However, if your application requires expensive reads that must precede writes in a single atomic transaction, the shared connection pool with moderate concurrency might outperform a single writer.

Retraction: Benchmarking showed that batched writes perform no better than the naive loop under concurrency, because 50 connections still contend for the write lock regardless of whether each connection issues 100 small INSERTs or one large INSERT. QueryBuilder is still useful for reducing per-statement overhead, but it does not fix the contention problem.
We could safely replace the example code above with this snippet that uses a bulk insert to avoid the lock starvation problem:

Note that if you do this with different numbers of values, you should call .persistent(false). By default, SQLx caches prepared statements. However, each version of the query with a different number of arguments will be cached separately, which may thrash the cache.

Retraction: Benchmarking showed that this did not actually improve performance.

Unfortunately, the fix for atomic writes to multiple tables is uglier and potentially very dangerous. To avoid holding an exclusive lock across an .await, you need to use the raw_sql interface to execute a transaction in one shot:

However, this can lead to catastrophic SQL injection attacks if you use this for user input, because raw_sql does not support binding and sanitizing query parameters.

Note that you can technically run a transaction with multiple statements in a query call but the docs say:

The query string may only contain a single DML statement: SELECT, INSERT, UPDATE, DELETE and variants. The SQLite driver does not currently follow this restriction, but that behavior is deprecated.

If you find yourself needing atomic writes to multiple tables with SQLite and Rust, you might be better off rethinking your schema to combine those tables or switching to a synchronous library with a single writer.

Update: the most useful change would actually be making a distinction between a reader pool and a writer pool. Libraries like SQLx could enforce the distinction at compile time or runtime by inspecting the queries for the presence of write statements, or the reader pool could be configured as read-only.

Maybe, but it probably won't. If SQLx offered both a sync and async API (definitely out of scope) and differentiated between read and write statements, a write could be given a type that cannot be held across an .await point.
However, SQLx is not an ORM and it probably isn't worth it for the library to have different methods for read versus write statements. Without that, there isn't a way to prevent write transaction locks from being held across .await points while allowing safe read transactions to be used across .await points. So, in lieu of type safety to prevent this footgun, I wrote up this blog post and this pull request to include a warning about this in the docs.

Discuss on r/rust and Hacker News.
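To make the fix from the update at the top of this post a bit more concrete, here is a minimal sketch of the single-writer idea using only the Rust standard library. It is a model, not SQLx code: threads stand in for async tasks, a channel stands in for the application-level write queue, and a Vec<String> stands in for the one writer connection. All of the names (WriteJob, spawn_writer, run_concurrent_writers) are made up for illustration, and the separate reader pool is not modeled at all.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// One queued write: the rows to insert plus a channel to acknowledge completion.
/// (Hypothetical type, standing in for "a write transaction".)
pub struct WriteJob {
    pub rows: Vec<String>,
    pub done: Sender<usize>,
}

/// Spawns the single writer. The Vec<String> is a stand-in for the one
/// writer connection; only this thread ever touches it.
pub fn spawn_writer() -> (Sender<WriteJob>, JoinHandle<Vec<String>>) {
    let (tx, rx) = mpsc::channel::<WriteJob>();
    let handle = thread::spawn(move || {
        let mut table: Vec<String> = Vec::new();
        // Jobs are applied one at a time, in arrival order, so two jobs
        // never interleave and nobody contends for a lock.
        for job in rx {
            table.extend(job.rows);
            // Ignore the error if the caller stopped waiting for the ack.
            let _ = job.done.send(table.len());
        }
        table
    });
    (tx, handle)
}

/// Simulates `tasks` concurrent callers that each submit one multi-row write.
pub fn run_concurrent_writers(tasks: usize, rows_per_task: usize) -> Vec<String> {
    let (tx, writer) = spawn_writer();
    let producers: Vec<JoinHandle<usize>> = (0..tasks)
        .map(|t| {
            let tx = tx.clone();
            thread::spawn(move || {
                let (done_tx, done_rx) = mpsc::channel();
                let rows: Vec<String> = (0..rows_per_task)
                    .map(|r| format!("task{t}-row{r}"))
                    .collect();
                tx.send(WriteJob { rows, done: done_tx })
                    .expect("writer is alive");
                // Wait in the application-level queue, not on a database lock.
                done_rx.recv().expect("writer acknowledges")
            })
        })
        .collect();
    for p in producers {
        p.join().expect("producer panicked");
    }
    // Dropping the last sender closes the queue and ends the writer loop.
    drop(tx);
    writer.join().expect("writer panicked")
}
```

Calling run_concurrent_writers(4, 3) returns twelve rows in which each task's three rows sit next to each other, because every job is applied in one piece. In a real application the writer would own the single write connection and run each job as one transaction, while reads go to their own pool.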

Jimmy Miller 1 month ago

Untapped Way to Learn a Codebase: Build a Visualizer

The biggest shock of my early career was just how much code I needed to read that others wrote. I had never dealt with this. I had a hard enough time understanding my own code. The idea of understanding hundreds of thousands or even millions of lines of code written by countless other people scared me. What I quickly learned is that you don't have to understand a codebase in its entirety to be effective in it. But just saying that is not super helpful. So rather than tell, I want to show.

In this post, I'm going to walk you through how I learn an unfamiliar codebase. But I'll admit, this isn't precisely how I would do it today. After years of working on codebases, I've learned quite a lot of shortcuts. Things that come with experience that just don't translate for other people. So what I'm going to present is a reconstruction. I want to show bits and parts of how I go from knowing very little to gaining knowledge and ultimately, asking the right questions. To do this, I will use just a few techniques:

I want to do this on a real codebase, so I've chosen one whose purpose and scope I'm generally familiar with. But one that I've never contributed to or read, Next.js. But I've chosen to be a bit more particular than that. I'm particularly interested in learning more about the Rust bundler setup (turbopack) that Next.js has been building out. So that's where we will concentrate our time.

Trying to learn a codebase is distinctly different from trying to simply fix a bug or add a feature. In this post, we may use bugs, talk about features, make changes, etc. But we are not trying to contribute to the codebase, yet. Instead, we are trying to get our mind around how the codebase generally works. We aren't concerned with things like coding standards, common practices, or the development roadmap. We aren't even concerned with correctness. The changes we make are about seeing how the codebase responds so we can make sense of it.
I find starting at main to be almost always completely unhelpful. From main, yes, we have a single entry point, but now we are asking ourselves to understand the whole. But things actually get worse when dealing with a large codebase like this. There isn't even one main. Which main would we choose? So instead, let's start by figuring out what our library even consists of.

A couple of things to note. We have packages, crates, turbo, and turbopack. Crates are relevant here because we know we are interested in some of the Rust code, but we are also interested in turbopack in particular. A quick look at these shows that turbo, packages, and crates are probably not our target. Why do I say that? Because turbopack has its own crates folder.

So there are 54 crates under turbopack.... This is beginning to feel a bit daunting. So why don't we take a step back and find a better starting point?

One starting point I find particularly useful is a bug report. I found this by simply looking at recently opened issues. When I first found it, it had no comments on it. In fact, I find bug reports with only reproducing instructions to be the most useful. Remember, we are trying to learn, not fix a bug.

So I spent a little time looking at the bug report. It is fairly clear. It does indeed reproduce. But it has a lot of code. So, as is often the case, it is useful to reduce it to the minimal case. So that's what I did. Here is the important code and the problem we are using to learn from.

MyEnum here is dead code. It should not show up in our final bundle. But when we do and look for it, we get:

If we instead do

The code is completely gone from our build.

So now we have our bug. But remember. Our goal here is not to fix the bug. But to understand the code. So our goal is going to be to use this little mini problem to understand what code is involved in this bug. To understand the different ways we could fix this bug. To understand why this bug happened in the first place.
To understand some small slice of the turbopack codebase. So at each junction, we are going to resist the urge to simply find the offending code. We are going to take detours. We are going to ask questions. We hope that from the start of this process to the end, we no longer think of the code involved in this action as a black box. But we will intentionally leave ourselves with open questions. As I write these words, I have no idea where this will take us. I have not prepared this ahead of time. I am not telling you a fake tale from a codebase I already know. Yes, I will simplify and skip parts. But you will come along with me. The first step for understanding any project is getting some part of it running. Well, I say that, but in my day job, I've been at companies where this is a multi-day or week-long effort. Sometimes, because of a lack of access, sometimes from unclear instructions, if you find yourself in that situation, you now have a new task, understand it well enough to get it to build. Well, unfortunately, that is the scenario we find ourselves in. I can't think of a single one of these endeavors I've gone on to learn a codebase that didn't involve a completely undesirable, momentum-stopping side quest. For this one, it was as soon as I tried to make changes to the turbopack Rust code and get it working in my test project. There are instructions on how to do this . In fact, we even get an explanation on why it is a bit weird. Since Turbopack doesn't support symlinks when pointing outside of the workspace directory, it can be difficult to develop against a local Next.js version. Neither nor imports quite cut it. An alternative is to pack the Next.js version you want to test into a tarball and add it to the pnpm overrides of your test application. The following script will do it for you: Okay, straightforward enough. I start by finding somewhere in the turbopack repo that I think will be called more than once, and I add the following: Yes. 
Very scientific, I know. But I've found this to be a rather effective method of making sure my changes are picked up. So I do that, make sure I've built and done the necessary things. I run Then that script tells me to add some overrides and dependencies to my test project. I go to build my project and HERERE!!!!!!! does not show up at all... I will save you the fun details here of looking through this system. But I think it's important to mention a few things. First, being a dependency immediately stood out to me. In my day job, I maintain a fork of swc (WHY???) for some custom stuff. I definitely won't pretend to be an expert on swc, but I know it's written in Rust. I know it's a native dependency. The changes I'm making are native dependencies. But I see no mention at all of turbopack. In fact, if I search in my test project, I get the following: So I have a sneaking suspicion my turbopack code should be in that tar. So let's look at the tar. Ummm. That seems a bit small... Let's look at what's inside. Okay, I think we found our problem. There's really nothing in this at all. Definitely no native code. After lots of searching, the culprit came down to: In our case, the input came from this file and f was . Unfortunately, this little set + regex setup causes to be filtered out. Why? Because it doesn't match the regex. This regex is looking for a with characters after it. We have none. So since we are already in the set (we just added ourselves), we filter ourselves out. How do we solve this problem? There are countless answers, really. I had Claude whip me up one without regex. But my gut says the sorting lets us do this much simpler. But after this fix, let's look at the tar now: Much better. After this change, we can finally see HERERE!!!!!!! a lot. Update : As I wrote this article, someone fixed this in a bit of a different way . Keeping the regex and just changing to . Fairly practical decision. Okay, we now have something we can test. 
But where do we even begin? This is one reason we chose this bug. It gives a few avenues to go down. First, the report says that these enums are not being "tree-shaken." Is that the right term? One thing I've learned from experience is to never assume that the end user is using terms in the same manner as the codebase. So this can be a starting point, but it might be wrong. With some searching around, we can actually see that there is a configuration for turning turbopackTreeShaking on or off. It was actually a bit hard to find exactly where the default for this was. It isn't actually documented. So let's just enable it and see what we get. Well, I think we figured out that the default is off. So one option is that we never "tree shake" anything. But that seems wrong. At this point, I looked into tree shaking a bit in the codebase, and while I started to understand a few things, I've been at this point before. Sometimes it is good to go deep. But how much of this codebase do I really understand? If tree shaking is our culprit (seeming unlikely at this point), it might be good to know how code gets there. Here, we of course found a bug. But it is an experimental feature. Maybe we can come back and fix it? Maybe we can file a bug? Maybe this code just isn't at all ready for primetime. It's hard to know as an outsider. Our "search around the codebase" strategy failed. So now we try a different tactic. We know a couple of things. We now have two points we can use to try to trace what happens. Let's start with parsing. Luckily, here it is straightforward: . When we look at this code, we can see that swc does the heavy lifting. First, it parses it into a TypeScript AST, then applies transforms to turn it into JavaScript. At this point, we don't write to a string, but if you edit the code and use an emitter, you see this: Now, to find where we write the chunks. In most programs, this would be pretty easy. 
Typically, there is a linear flow somewhere that just shows you the steps. Or if you can't piece one together, you can simply breakpoint and follow the flow. But Turbopack is a rather advanced system involving async Rust (more on this later). So, in keeping with the tradition of not trying to do things that rely too heavily on my knowledge, I have done the tried and true, log random things until they look relevant. And what I found made me realize that logging was not going to be enough. It was time to do my tried and true learning technique, visualization.

Ever since my first job, I have been building custom tools to visualize codebases. Perhaps this is due to my aphantasia. I'm not really sure. Some of these visualizers make their way into general use for me. But more often than not, they are a means of understanding. When I applied for a job at Shopify working on YJIT, I built a simple visualizer but never got around to making it more useful than a learning tool. The same thing is true here, but this time, thanks to AI, it looks a bit more professional.

This time, we want to give a bit more structure than what we'd be able to do with a simple print. We are trying to get events out that have a bunch of information. Mostly, we are interested in files and their contents over time. Looking through the codebase, we find that one key abstraction is an ident; this will help us identify files. We will simply find points that seem interesting, make a corresponding event, make sure it has idents associated with it, and send that event over a WebSocket. Then, with that raw information, we can have our visualizer stitch together what exactly happens.

If we take a look, we can see our code step through the process. And ultimately end up in the bundle despite not being used. If you notice, though, between steps 3 and 4, our code changed a bit. We lost this PURE annotation. Why? Luckily, because we tried to capture as much context as we could.
We can see that a boolean "Scope Hoisting" has been enabled. Could that be related? If we turn it off, we instead see: Our pure annotation is kept around, and as a result, our code is eliminated. If we take a step back, this can make sense. Something during the parse step is creating a closure around our enum code, but when it does so, it is marking that as a "pure" closure, meaning it has no side effects. Later, because this annotation is dropped, the minifier doesn't know that it can just get rid of this closure. As I've been trying to find time to write this up, it seems that people on the bug report have found this on their own as well. So we've found the behavior of the bug. Now we need to understand why it is happening. Remember, we are fixing a bug as a means of understanding the software. Not just to fix a bug. So what exactly is going on? Well, we are trying to stitch together two libraries. Software bugs are way more likely to occur on these seams. In this case, after reading the code for a while, the problem becomes apparent. SWC parses our code and turns it into an AST. But if you take a look at an AST , comments are not a part of the AST. So instead, SWC stores comments off in a hashmap where we can look them up by byte pos. So for each node in the AST, it can see if there is a comment attached. But for the PURE comment, it doesn't actually need to look this comment up. It is not a unique comment that was in the source code. It is a pre-known meta comment. So rather than store each instance in the map, it makes a special value. Now, this encoding scheme causes some problems for turbopack. Turbopack does not act on a single file; it acts across many files. In fact, for scope hoisting, we are trying to take files across modules and condense them into a single scope. So now, when we encounter one of these bytepos encodings, how do we know what module it belongs to? 
The obvious answer to many might be to simply make a tuple of module and byte position, and while that certainly works, it does come with tradeoffs. One of these is memory footprint. I didn't find an exact reason. But given the focus on performance on turbopack, I'd imagine this is one of the main motivations. Instead, we get a fairly clever encoding of module and bytepos into a single BytePos, aka a u32. I won't get into the details of the representation here; it involves some condition stuff. But needless to say, now that we are going from something focusing on one file to focusing on multiple and trying to smuggle in this module_id into our BytePos, we ended up missing one detail, PURE. Now our pure value is being interpreted as some module at some very high position instead of the proper bytes.

To fix this bug, I found the minimal fix was simply the following:

With this our enum is properly marked as PURE and disappears from the output!

Now remember, we aren't trying to make a bug fix. We are trying to understand the codebase. Is this the right fix? I'm not sure. I looked around the codebase, and there are a number of other swc sentinel values that I think need to also be handled (PLACEHOLDER and SYNTHESIZED). There is also the decoding path. For dummy, the decoding path panics. Should we do the same? Should we be handling pure at a higher level, where it never even goes through the encoder?

Update: As I was writing this, someone else proposed a fix. As I was writing the article, I did see that others had started to figure out the things I had determined from my investigation. But I was not confident enough that it was the right fix yet. In fact, the PR differs a bit from my local fix. It does handle the other sentinel, but at a different layer. It also chooses to decode with module 0. Which felt a bit wrong to me. But again, these are decisions that people who work on this codebase long-term are better equipped to decide than me.
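To make the encoding problem concrete, here is a small sketch of packing a module id and a byte offset into one u32 while letting reserved sentinel values pass through untouched. The bit layout (8 bits of module id, 24 bits of offset) and the sentinel values are assumptions chosen for illustration; they are not Turbopack's actual representation or SWC's actual constants, and the function names are hypothetical.

```rust
// Hypothetical layout: the top MODULE_BITS bits hold the module id,
// the remaining bits hold the byte offset within that module.
const MODULE_BITS: u32 = 8;
const OFFSET_BITS: u32 = 32 - MODULE_BITS;
const OFFSET_MASK: u32 = (1 << OFFSET_BITS) - 1;
// The highest module id is left unused so that no real position can
// collide with the sentinels at the top of the u32 range.
const MAX_MODULE: u32 = (1 << MODULE_BITS) - 2;

// Stand-in sentinel values (illustrative, not copied from SWC).
pub const PURE: u32 = u32::MAX - 1;
pub const PLACEHOLDER: u32 = u32::MAX - 2;
pub const SYNTHESIZED: u32 = u32::MAX - 3;

#[derive(Debug, PartialEq)]
pub enum Decoded {
    /// A reserved marker that carries no module or offset.
    Sentinel(u32),
    /// A real position inside a specific module.
    Position { module: u32, offset: u32 },
}

fn is_sentinel(pos: u32) -> bool {
    matches!(pos, PURE | PLACEHOLDER | SYNTHESIZED)
}

/// Splits the bits with no regard for sentinels. This is the buggy reading:
/// a sentinel comes out as "some module at some very high position".
pub fn split_naive(encoded: u32) -> (u32, u32) {
    (encoded >> OFFSET_BITS, encoded & OFFSET_MASK)
}

/// Packs (module, pos) into one u32, passing sentinels through unchanged.
pub fn encode(module: u32, pos: u32) -> u32 {
    if is_sentinel(pos) {
        return pos;
    }
    assert!(module <= MAX_MODULE, "module id out of range");
    assert!(pos <= OFFSET_MASK, "offset out of range");
    (module << OFFSET_BITS) | pos
}

/// Reverses `encode`, checking for sentinels before splitting the bits.
pub fn decode(encoded: u32) -> Decoded {
    if is_sentinel(encoded) {
        return Decoded::Sentinel(encoded);
    }
    let (module, offset) = split_naive(encoded);
    Decoded::Position { module, offset }
}
```

With this layout, split_naive(PURE) reads the marker as module 255 at offset 0xFFFFFE, which is the kind of misinterpretation described above. Whether sentinels should pass through the encoder, panic in the decoder, or be handled at a higher layer is exactly the open design question.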
I must admit that simply fixing this bug didn't quite help me understand the codebase. Not just because it is a fairly good size. But because I couldn't see this fundamental unit that everything was composed of. In some of the code snippets above, you will see types that mention Vc. This stands for ValueCell. There are a number of ways to try to understand these; you can check out the docs for turbo engine for some details. Or you can read the high-level overview that skips the implementation details for the most part. You can think of these cells like the cells in a spreadsheet. They provide a level of incremental computation. When the value of some cell updates, we can invalidate stuff. Unlike a spreadsheet, the turbo engine is lazy. I've worked with these kinds of systems before. Some are very explicitly modeled after spreadsheets. Others are based on rete networks or propagators. I am also immediately reminded of salsa from the Rust analyzer team. I've also worked with big, complicated non-computational graphs. But even with that background, I know myself, I've never been able to really understand these things until I can visualize them. And while a general network visualizer can be useful (and might actually be quite useful if I used the aggregate graph), I've found that for my understanding, I vastly prefer starting small and exploring out the edges of the graph. So that's what I did. But before we get to that visualization, I want to highlight something fantastic in the implementation: a central place for controlling a ton of the decisions that go into this system. The backend here lets us decide so many things about how the execution of our tasks will run. Because of this, we have one place we can insert a ton of tooling and begin to understand how this system works. As before, we are going to send things on a WebSocket. But unlike last time, our communication will actually be two-way. We are going to be controlling how the tasks run. 
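The spreadsheet analogy can be sketched in a few lines. This is a toy model, not the turbo engine API: the Graph and Cell types and their methods are hypothetical, and cells just hold integers. Changing an input marks its dependents as stale right away, but nothing is recomputed until someone asks for a value, which is the lazy part.

```rust
/// A cell is either a plain input or a value derived from other cells.
pub enum Cell {
    Input(i64),
    Derived {
        deps: Vec<usize>,
        compute: fn(&[i64]) -> i64,
        /// None means "stale or never computed".
        cached: Option<i64>,
    },
}

#[derive(Default)]
pub struct Graph {
    cells: Vec<Cell>,
    /// Counts how many times a derived cell was actually computed.
    pub recomputes: usize,
}

impl Graph {
    pub fn input(&mut self, value: i64) -> usize {
        self.cells.push(Cell::Input(value));
        self.cells.len() - 1
    }

    pub fn derived(&mut self, deps: Vec<usize>, compute: fn(&[i64]) -> i64) -> usize {
        self.cells.push(Cell::Derived { deps, compute, cached: None });
        self.cells.len() - 1
    }

    /// Updates an input and marks everything downstream as stale.
    pub fn set_input(&mut self, id: usize, value: i64) {
        match &mut self.cells[id] {
            Cell::Input(v) => *v = value,
            Cell::Derived { .. } => panic!("cell {id} is not an input"),
        }
        let mut changed = vec![id];
        while let Some(current) = changed.pop() {
            for idx in 0..self.cells.len() {
                if let Cell::Derived { deps, cached, .. } = &mut self.cells[idx] {
                    // A cell that is already stale has stale dependents too,
                    // so only freshly invalidated cells need to propagate.
                    if deps.contains(&current) && cached.is_some() {
                        *cached = None;
                        changed.push(idx);
                    }
                }
            }
        }
    }

    /// Reads a cell, recomputing it (and its stale inputs) only if needed.
    pub fn get(&mut self, id: usize) -> i64 {
        let (deps, compute) = match &self.cells[id] {
            Cell::Input(v) => return *v,
            Cell::Derived { cached: Some(v), .. } => return *v,
            Cell::Derived { deps, compute, .. } => (deps.clone(), *compute),
        };
        let args: Vec<i64> = deps.iter().map(|&d| self.get(d)).collect();
        let value = compute(&args);
        self.recomputes += 1;
        if let Cell::Derived { cached, .. } = &mut self.cells[id] {
            *cached = Some(value);
        }
        value
    }
}
```

The recomputes counter is only there so the laziness is observable: after set_input it stays unchanged until the next get. A real engine tracks far more (tasks, async execution, persistence), but the invalidate-now, recompute-on-demand shape is the part the analogy is pointing at.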
In my little test project, I edited a file, and my visualizer displayed the following. Admittedly, it is a bit janky, but there are some nice features. First, on the left, we can see all the pending tasks. In this case, something has marked our file read as dirty, so we are trying to read the file. We can see the contents of a cell that this task has. And we can see the dependents of this task. Here is what it looks like once we release that task to run. We can now see 3 parse tasks have kicked off. Why 3? I'll be honest, I haven't looked into it. But a good visualization is about provoking questions, not only answering them. Did I get my visualization wrong because I misunderstood something about the system? Are there three different subsystems that want to parse, and we want to do them in parallel? Have we just accidentally triggered more parses than we should be? This is precisely what we want out of a visualizer. Is it perfect? No, would I ship this as a general visualizer? No. Am I happy with the style? Not in the least. But already it enables a look into the project I couldn't see before. Here we can actually watch the graph unfold as I execute more steps. What a fascinating view of a once opaque project. With this visualizer, I was able to make changes to my project and watch values as they flow through the systems. I made simple views for looking at code. If I extended this, I can imagine it being incredibly useful for debugging general issues, for seeing the ways in which things are scheduled, and for finding redundancies in the graph. Once I was able to visualize this, I really started to understand the codebase better. I was able to see all the values that didn't need to be recomputed when we made changes. The whole thing clicked. This was an exercise in exploring a new codebase that is a bit different of a process than I see others take. It isn't an easy process, it isn't quick. But I've found myself repeating this process over and over again. 
For the turborepo codebase, this is just the beginning. This exploration was done over a few weekends in the limited spare time I could find. But already I can start to put the big picture together. I can start to see how I could shape my tools to help me answer more questions. If you've never used tool building as a way to learn a codebase, I highly recommend it.

One thing I always realize as I go through this process is just how hard it is to work interactively with our current software. Our languages, our tools, our processes are all written without ways to live code, without ways to inspect their inner workings.

It is also incredibly hard to find a productive UI environment for this kind of live exploration. The running state of the visualizer contains all the valuable information. Any system that needs you to retrace your steps to get the UI back to the state it was once in to visualize more is incredibly lacking. So I always find myself in the browser, but immediately, I am having to worry about performance. We have made massive strides in so many aspects of software development. I hope that we will fix this one as well.

Setting a goal
Editing randomly
Fixing things I find that are broken
Reading to answer questions
Making a visualizer

Our utilities.ts file is read and parsed.
It ends up in a file under a "chunks" directory.
