Posts in Programming (20 found)

Code research projects with async coding agents like Claude Code and Codex

I've been experimenting with a pattern for LLM usage recently that's working out really well: asynchronous code research tasks. Pick a research question, spin up an asynchronous coding agent and let it go and run some experiments and report back when it's done.

Software development benefits enormously from something I call code research. The great thing about questions about code is that they can often be definitively answered by writing and executing code. I often see questions on forums which hint at a lack of understanding of this skill. "Could Redis work for powering the notifications feed for my app?" is a great example. The answer is always "it depends", but a better answer is that a good programmer already has everything they need to answer that question for themselves. Build a proof-of-concept, simulate the patterns you expect to see in production, then run experiments to see if it's going to work.

I've been a keen practitioner of code research for a long time. Many of my most interesting projects started out as a few dozen lines of experimental code to prove to myself that something was possible. It turns out coding agents like Claude Code and Codex are a fantastic fit for this kind of work as well. Give them the right goal and a useful environment and they'll churn through a basic research project without any further supervision.

LLMs hallucinate and make mistakes. This is far less important for code research tasks because the code itself doesn't lie: if they write code and execute it and it does the right things then they've demonstrated to both themselves and to you that something really does work. They can't prove something is impossible - just because the coding agent couldn't find a way to do something doesn't mean it can't be done - but they can often demonstrate that something is possible in just a few minutes of crunching.

I've used interactive coding agents like Claude Code and Codex CLI for a bunch of these, but today I'm increasingly turning to their asynchronous coding agent family members instead. An asynchronous coding agent is a coding agent that operates on a fire-and-forget basis. You pose it a task, it churns away on a server somewhere and when it's done it files a pull request against your chosen GitHub repository. OpenAI's Codex Cloud, Anthropic's Claude Code for web, Google Gemini's Jules, and GitHub's Copilot coding agent are four prominent examples of this pattern.

These are fantastic tools for code research projects. Come up with a clear goal, turn it into a few paragraphs of prompt, set them loose and check back ten minutes later to see what they've come up with. I'm firing off 2-3 code research projects a day right now. My own time commitment is minimal and they frequently come back with useful or interesting results.

You can run a code research task against an existing GitHub repository, but I find it's much more liberating to have a separate, dedicated repository for your coding agents to run their projects in. This frees you from being limited to research against just code you've already written, and also means you can be much less cautious about what you let the agents do. I have two repositories that I use for this - one public, one private. I use the public one for research tasks that have no need to be private, and the private one for anything that I'm not yet ready to share with the world. The biggest benefit of a dedicated repository is that you don't need to be cautious about what the agents operating in that repository can do.
Both Codex Cloud and Claude Code for web default to running agents in a locked-down environment, with strict restrictions on how they can access the network. This makes total sense if they are running against sensitive repositories - a prompt injection attack of the lethal trifecta variety could easily be used to steal sensitive code or environment variables. If you're running in a fresh, non-sensitive repository you don't need to worry about this at all! I've configured my research repositories for full network access, which means my coding agents can install any dependencies they need, fetch data from the web and generally do anything I'd be able to do on my own computer.

Let's dive into some examples. My public research repository is at simonw/research on GitHub. It currently contains 13 folders, each of which is a separate research project. I only created it two weeks ago so I'm already averaging nearly one a day! It also includes a GitHub Workflow which uses GitHub Models to automatically update the README file with a summary of every new project, using Cog, LLM, llm-github-models and this snippet of Python. Here are some example research projects from the repo.

node-pyodide shows an example of a Node.js script that runs the Pyodide WebAssembly distribution of Python inside it - yet another of my ongoing attempts to find a great way of running Python in a WebAssembly sandbox on a server.

python-markdown-comparison (transcript) provides a detailed performance benchmark of seven different Python Markdown libraries. I fired this one off because I stumbled across cmarkgfm, a Python binding around GitHub's Markdown implementation in C, and wanted to see how it compared to the other options. This one produced some charts! cmarkgfm came out on top by a significant margin.

Here's the entire prompt I used for that project:

Create a performance benchmark and feature comparison report on PyPI cmarkgfm compared to other popular Python markdown libraries - check all of them out from github and read the source to get an idea for features, then design and run a benchmark including generating some charts, then create a report in a new python-markdown-comparison folder (do not create a _summary.md file or edit anywhere outside of that folder). Make sure the performance chart images are directly displayed in the README.md in the folder.

Note that I didn't specify any Markdown libraries other than cmarkgfm - Claude Code ran a search and found the other six by itself.

cmarkgfm-in-pyodide is a lot more fun. A neat thing about having all of my research projects in the same repository is that new projects can build on previous ones. Here I decided to see how hard it would be to get cmarkgfm - which has a C extension - working inside Pyodide inside Node.js. Claude successfully compiled an 88.4KB file with the necessary C extension and proved it could be loaded into Pyodide in WebAssembly inside of Node.js. I ran this one using Claude Code on my laptop after an initial attempt failed. The starting prompt was:

Figure out how to get the cmarkgfm markdown lover [typo in prompt, this should have been "library" but it figured it out anyway] for Python working in pyodide. This will be hard because it uses C so you will need to compile it to pyodide compatible webassembly somehow. Write a report on your results plus code to a new cmarkgfm-in-pyodide directory.
Test it using pytest to exercise a node.js test script that calls pyodide as seen in the existing node.js and pyodide directory. There is an existing branch that was an initial attempt at this research, but which failed because it did not have Internet access. You do have Internet access. Use that existing branch to accelerate your work, but do not commit any code unless you are certain that you have successfully executed tests that prove that the pyodide module you created works correctly.

This one gave up halfway through, complaining that emscripten would take too long. I told it:

Complete this project, actually run emscripten, I do not care how long it takes, update the report if it works

It churned away for a bit longer and complained that the existing Python library used CFFI which isn't available in Pyodide. I asked it:

Can you figure out how to rewrite cmarkgfm to not use FFI and to use a pyodide-friendly way of integrating that C code instead?

... and it did. You can see the full transcript here.

blog-tags-scikit-learn. Taking a short break from WebAssembly, I thought it would be fun to put scikit-learn through its paces on a text classification task against my blog:

Work in a new folder called blog-tags-scikit-learn

Download - a SQLite database. Take a look at the blog_entry table and the associated tags - a lot of the earlier entries do not have tags associated with them, where the later entries do. Design, implement and execute models to suggests tags for those earlier entries based on textual analysis against later ones

Use Python scikit learn and try several different strategies

Produce JSON of the results for each one, plus scripts for running them and a detailed markdown description

Also include an HTML page with a nice visualization of the results that works by loading those JSON files.

This resulted in seven files, four results files and a detailed report. (It ignored the bit about an HTML page with a nice visualization for some reason.) Not bad for a few moments of idle curiosity typed into my phone!

That's just three of the thirteen projects in the repository so far. The commit history for each one usually links to the prompt and sometimes the transcript if you want to see how they unfolded. More recently I added a short file to the repo with a few extra tips for my research agents. You can read that here.

My preferred definition of AI slop is AI-generated content that is published without human review. I've not been reviewing these reports in great detail myself, and I wouldn't usually publish them online without some serious editing and verification. I want to share the pattern I'm using though, so I decided to keep them quarantined in this one public repository. A tiny feature request for GitHub: I'd love to be able to mark a repository as "exclude from search indexes" such that it gets labelled with tags. I still like to keep AI-generated content out of search, to avoid contributing more to the dead internet.

It's pretty easy to get started trying out this coding agent research pattern. Create a free GitHub repository (public or private) and let some agents loose on it and see what happens. You can run agents locally but I find the asynchronous agents to be more convenient - especially as I can run them (or trigger them from my phone) without any fear of them damaging my own machine or leaking any of my private data. Claude Code for web offers $250 of free credits for their $20/month users for a limited time (until November 18, 2025).
Gemini Jules has a free tier. There are plenty of other coding agents you can try out as well. Let me know if your research agents come back with anything interesting!

0 views
neilzone 2 days ago

Using vimwiki as a personal, portable, knowledge base

A while back, I was looking for a tool to act as basically a semi-organised dumping ground for all sorts of notes and thoughts. I wanted it to be Free software, easy to keep in sync and use across multiple devices, usable offline / without a LAN connection, and able to render Markdown nicely. I looked at logseq, which looked interesting, but decided to give vimwiki a go. I spend a lot of my time in vim already, so this seemed like it would fit into the way I work very easily. And I was right. Since it is "just" a collection of .md files, it appeals to me from a simplicity point of view, and also makes synchronisation and backing up very easy.

There are multiple ways to install vimwiki. I installed the plugin, and then added the recommended lines to my vimrc (although I already had one of them). To add a new wiki with support for Markdown (rather than the default vimwiki syntax), I put the details into g:vimwiki_list in my vimrc. Then, I opened vim, and used <Leader>ww to open the wiki. On the first use, there was a prompt to create the first page.

The basic vimwiki-specific keybindings are indeed the ones I use the most to manage the wiki itself. For me, "<Leader>" is the default "\". Otherwise, I just use vim normally, which is a significant part of the appeal for me.

The wiki is just a collection of markdown files, in the directory specified in the "path" field in the configuration. This makes synchronisation easy. I sync my vimwiki directory with Nextcloud, so that it propagates automatically onto my machines, and I can also push it to git, so that I can grab it on my phone. This works for me, and means that I don't need to configure, secure etc. another sync tool or a dedicated sync system.

There is support for multiple wikis, although I have not experimented much with this. Each wiki gets its own entry in g:vimwiki_list. You can use :VimwikiUISelect in vim to select which wiki you want to use.

I really like vimwiki. It is simple but effective, and because it runs in vim, it does not require me to learn a different tool, or adjust my workflow. I just open vim and open my wiki. Prior to vimwiki, I was just dropping .md or .txt files into a directory which got synchronised, so this is not massively different, other than more convenient. Everything is still file-based, but with an increased ease of organisation. For someone who didn't already use vim, it is probably a more challenging choice.

1 views
The Coder Cafe 3 days ago

Build Your Own Key-Value Storage Engine—Week 1

Curious how leading engineers tackle extreme scale challenges with data-intensive applications? Join Monster Scale Summit (free + virtual). It's hosted by ScyllaDB, the monstrously fast and scalable database.

Agenda: Week 0: Introduction. Week 1: In-Memory Store.

Welcome to week 1 of Build Your Own Key-Value Storage Engine! Let's start by making sure what you're about to build in this series makes complete sense: what's a storage engine? A storage engine is the part of a database that actually stores, indexes, and retrieves data, whether on disk or in memory. Think of the database as the restaurant, and the storage engine as the kitchen that decides how food is prepared and stored. Some databases let you choose the storage engine. For example, MySQL uses InnoDB by default (based on B+-trees). Through plugins, you can switch to RocksDB, which is based on LSM trees. This week, you will build an in-memory storage engine and the first version of the validation client that you will reuse throughout the series.

Your Tasks

💬 If you want to share your progress, discuss solutions, or collaborate with other coders, join the community Discord server: Join the Discord

Assumptions: Keys are lowercase ASCII strings. Values are ASCII strings. NOTE: Assumptions persist for the rest of the series unless explicitly discarded.

PUT: The request body contains the value. If the key exists, update its value and return success. If the key doesn't exist, create it and return success. Keep all data in memory.

GET: If the key exists, return 200 OK with the value in the body. If the key does not exist, return 404 Not Found.

Implement a client to validate your server: Read the testing scenario from this file: put.txt. Run an HTTP request for each line: → Send a PUT request with the value in the body. → Send a GET request for the same key and confirm that the expected value is returned. If not, something is wrong with your implementation. → Send a GET request and confirm that the expected response is returned. If not, something is wrong with your implementation. Requests must be executed sequentially, one line at a time; otherwise, out-of-order responses may fail the client's assertions.

If you want to generate an input file with a different number of lines, you can use this Go generator: one flag sets the format to generate and another sets the number of lines. At this stage, you need a put-type file, so for example you might generate one with one million lines.

Add basic metrics for latency: Record start and end time for each request. Keep a small histogram of latencies in milliseconds. At the end, print the latency summary statistics. This work is optional as there is no latency target in this series. However, it can be an interesting point of comparison across weeks to see how your changes affect latency.

That's it for this week! You have built a simple storage engine that keeps everything in memory. In two weeks, we will level up. You will delve into a data structure widely used in key-value databases: LSM trees.

Missing direction in your tech career? At The Coder Cafe, we serve timeless concepts with your coffee to help you master the fundamentals. Written by a Google SWE and trusted by thousands of readers, we support your growth as an engineer, one coffee at a time. ❤️ If you enjoyed this post, please hit the like button.
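For readers who want a concrete starting point, here is a minimal sketch of such an in-memory store, written here in Java using the JDK's built-in com.sun.net.httpserver server (the series itself is language-agnostic). The /key/{name} URL layout, the port, and the empty response bodies are illustrative assumptions for this sketch, not part of the series' official spec.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class InMemoryKvServer {
    // Week 1: everything lives in memory; durability comes later in the series.
    private static final ConcurrentMap<String, String> STORE = new ConcurrentHashMap<>();

    public static void main(String[] args) throws IOException {
        // Illustrative layout: PUT/GET on /key/{name}; the port is an arbitrary choice.
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/key/", InMemoryKvServer::handle);
        server.start();
    }

    private static void handle(HttpExchange exchange) throws IOException {
        String key = exchange.getRequestURI().getPath().substring("/key/".length());
        switch (exchange.getRequestMethod()) {
            case "PUT" -> {
                // The request body contains the value; create or update the key either way.
                String value = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
                STORE.put(key, value);
                respond(exchange, 200, "");
            }
            case "GET" -> {
                String value = STORE.get(key);
                if (value == null) {
                    respond(exchange, 404, "");    // key does not exist
                } else {
                    respond(exchange, 200, value); // 200 OK with the value in the body
                }
            }
            default -> respond(exchange, 405, "");
        }
    }

    private static void respond(HttpExchange exchange, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        // A length of -1 tells HttpServer that no response body follows.
        exchange.sendResponseHeaders(status, bytes.length == 0 ? -1 : bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }
}
```

A validation client would then replay put.txt against a server like this sequentially, issuing the PUT and GET for each line, asserting on the responses, and optionally recording per-request latencies for the metrics task.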

0 views
Den Odell 3 days ago

Escape Velocity: Break Free from Framework Gravity

Frameworks were supposed to free us from the messy parts of the web. For a while they did, until their gravity started drawing everything else into orbit. Every framework brought with it real progress. React, Vue, Angular, Svelte, and others all gave structure, composability, and predictability to frontend work. But now, after a decade of React dominance, something else has happened. We haven't just built apps with React, we've built an entire ecosystem around it—hiring pipelines, design systems, even companies—all bound to its way of thinking. The problem isn't React itself, nor any other framework for that matter. The problem is the inertia that sets in once any framework becomes infrastructure. By that point, it's "too important to fail," and everything nearby turns out to be just fragile enough to prove it.

React is no longer just a library. It's a full ecosystem that defines how frontend developers are allowed to think. Its success has created its own kind of gravity, and the more we've built within it, the harder it's become to break free. Teams standardize on it because it's safe: it's been proven to work at massive scale, the talent pool is large, and the tooling is mature. That's a rational choice, but it also means React exerts institutional gravity. Moving off it stops being an engineering decision and becomes an organizational risk instead. Solutions to problems tend to be found within its orbit, because stepping outside it feels like drifting into deep space.

We saw this cycle with jQuery in the past, and we're seeing it again now with React. We'll see it with whatever comes next. Success breeds standardization, standardization breeds inertia, and inertia convinces us that progress can wait. It's the pattern itself that's the problem, not any single framework. But right now, React sits at the center of this dynamic, and the stakes are far higher than they ever were with jQuery. Entire product lines, architectural decisions, and career paths now depend on React-shaped assumptions. We've even started defining developers by their framework: many job listings ask for "React developers" instead of frontend engineers. Even AI coding agents default to React when asked to start a new frontend project, unless deliberately steered elsewhere. Perhaps the only thing harder than building on a framework is admitting you might need to build without one.

React's evolution captures this tension perfectly. Recent milestones include the creation of the React Foundation, the React Compiler reaching v1.0, and new additions in React 19.2 such as the `<Activity />` component and Fragment Refs. These updates represent tangible improvements. Especially the compiler, which brings automatic memoization at build time, eliminating the need for manual `useMemo` and `useCallback` optimization. Production deployments show real performance wins using it: apps in the Meta Quest Store saw up to 2.5x faster interactions as a direct result. This kind of automatic optimization is genuinely valuable work that pushes the entire ecosystem forward. But here's the thing: the web platform has been quietly heading in the same direction for years, building many of the same capabilities frameworks have been racing to add. Browsers now ship View Transitions, Container Queries, and smarter scheduling primitives. The platform keeps evolving at a fair pace, but most teams won't touch these capabilities until React officially wraps them in a hook or they show up in Next.js docs.
Innovation keeps happening right across the ecosystem, but for many it only becomes “real” once React validates the approach. Which is fine, assuming you enjoy waiting for permission to use the platform you’re already building on. The React Foundation represents an important milestone for governance and sustainability. This new foundation is a part of the Linux Foundation, and founding members include Meta, Vercel, Microsoft, Amazon, Expo, Callstack, and Software Mansion. This is genuinely good for React’s long-term health, providing better governance and removing the risk of being owned by a single company. It ensures React can outlive any one organization’s priorities. But it doesn’t fundamentally change the development dynamic of the framework. Yet. The engineers who actually build React still work at companies like Meta and Vercel. The research still happens at that scale, driven by those performance needs. The roadmap still reflects the priorities of the companies that fund full-time development. And to be fair, React operates at a scale most frameworks will never encounter. Meta serves billions of users through frontends that run on constrained mobile devices around the world, so it needs performance at a level that justifies dedicated research teams. The innovations they produce, including compiler-driven optimization, concurrent rendering, and increasingly fine-grained performance tooling, solve real problems that exist only at that kind of massive scale. But those priorities aren’t necessarily your priorities, and that’s the tension. React’s innovations are shaped by the problems faced by companies running apps at billions-of-users scale, not necessarily the problems faced by teams building for thousands or millions. React’s internal research reveals the team’s awareness of current architectural limitations. Experimental projects like Forest explore signal-like lazy computation graphs; essentially fine-grained reactivity instead of React’s coarse re-render model. Another project, Fir , investigates incremental rendering techniques. These aren’t roadmap items; they’re just research prototypes happening inside Meta. They may never ship publicly. But they do reveal something important: React’s team knows the virtual DOM model has performance ceilings and they’re actively exploring what comes after it. This is good research, but it also illustrates the same dynamic at play again: that these explorations happen behind the walls of Big Tech, on timelines set by corporate priorities and resource availability. Meanwhile, frameworks like Solid and Qwik have been shipping production-ready fine-grained reactivity for years. Svelte 5 shipped runes in 2024, bringing signals to mainstream adoption. The gap isn’t technical capability, but rather when the industry feels permission to adopt it. For many teams, that permission only comes once React validates the approach. This is true regardless of who governs the project or what else exists in the ecosystem. I don’t want this critique to take away from what React has achieved over the past twelve years. React popularized declarative UIs and made component-based architecture mainstream, which was a huge deal in itself. It proved that developer experience matters as much as runtime performance and introduced the idea that UI could be a pure function of input props and state. That shift made complex interfaces far easier to reason about. 
Later additions like hooks solved the earlier class component mess elegantly, and concurrent rendering opened new possibilities for truly responsive UIs. The React team's research into compiler optimization, server components, and fine-grained rendering pushes the entire ecosystem forward. This is true even when other frameworks ship similar ideas first. There's value in seeing how these patterns work at Meta's scale.

The critique isn't that React is bad, but that treating any single framework as infrastructure creates blind spots in how we think and build. When React becomes the lens through which we see the web, we stop noticing what the platform itself can already do, and we stop reaching for native solutions because we're waiting for the framework-approved version to show up first. And crucially, switching to Solid, Svelte, or Vue wouldn't eliminate this dynamic; it would only shift its center of gravity. Every framework creates its own orbit of tools, patterns, and dependencies. The goal isn't to find the "right" framework, but to build applications resilient enough to survive migration to any framework, including those that haven't been invented yet.

This inertia isn't about laziness; it's about logistics. Switching stacks is expensive and disruptive. Retraining developers, rebuilding component libraries, and retooling CI pipelines all take time and money, and the payoff is rarely immediate. It's high risk, high cost, and hard to justify, so most companies stay put, and honestly, who can blame them? But while we stay put, the platform keeps moving. The browser can stream and hydrate progressively, animate transitions natively, and coordinate rendering work without a framework. Yet most development teams won't touch those capabilities until they're built in or officially blessed by the ecosystem. That isn't an engineering limitation; it's a cultural one. We've somehow made "works in all browsers" feel riskier than "works in our framework." Better governance doesn't solve this. The problem isn't React's organizational structure; it's our relationship to it. Too many teams wait for React to package and approve platform capabilities before adopting them, even when those same features already exist in browsers today.

React 19.2's `<Activity />` component captures this pattern perfectly. It serves as a boundary that hides UI while preserving component state and unmounting effects. When set to `hidden`, it pauses subscriptions, timers, and network requests while keeping form inputs and scroll positions intact. When revealed again by setting it back to `visible`, those effects remount cleanly. It's a genuinely useful feature. Tabbed interfaces, modals, and progressive rendering all benefit from it, and the same idea extends to cases where you want to pre-render content in the background or preserve state as users navigate between views. It integrates smoothly with React's lifecycle and `<Suspense>` boundaries, enabling selective hydration and smarter rendering strategies. But it also draws an important line between formalization and innovation. The core concept isn't new; it's simply about pausing side effects while maintaining state. Similar behavior can already be built with visibility observers, effect cleanup, and careful state management patterns. The web platform even provides the primitives for it through DOM state preservation and manual effect control. What `<Activity />` adds is a formalized, framework-level version of that pattern. Yet it also exposes how dependent our thinking has become on frameworks.
We wait for React to formalize platform behaviors instead of reaching for them directly. This isn't a criticism of `<Activity />` itself; it's a well-designed API that solves a real problem. But it serves as a reminder that we've grown comfortable waiting for framework solutions to problems the platform already lets us solve. After orbiting React for so long, we've forgotten what it feels like to build without its pull.

The answer isn't necessarily to abandon your framework, but to remember that it runs inside the web, not the other way around. I've written before about building the web in islands as one way to rediscover platform capabilities we already have. Even within React's constraints, you can still think platform first:

- Use native forms and form submissions to a server, then enhance with client-side logic
- Prefer semantic HTML and ARIA before reaching for component libraries
- Try View Transitions directly with minimal React wrappers instead of waiting for an official API
- Use Web Components for self-contained widgets that could survive a framework migration
- Keep business logic framework-agnostic: plain TypeScript modules rather than hooks, and aim to keep your hooks short by pulling logic in from outside React
- Profile performance using browser DevTools first and React DevTools second
- Try native CSS features such as scroll snap before adding JavaScript solutions
- Use built-in browser APIs instead of framework-specific alternatives wherever possible
- Experiment with the History API directly before reaching for React Router
- Structure code so routing, data fetching, and state management can be swapped out independently of React
- Test against real browser APIs and behaviors, not just framework abstractions

These aren't anti-React practices, they're portable practices that make your web app more resilient. They let you adopt new browser capabilities as soon as they ship, not months later when they're wrapped in a hook. They make framework migration feasible rather than catastrophic. When you build this way, React becomes a rendering library that happens to be excellent at its job, not the foundation everything else has to depend on. A React app that respects the platform can outlast React itself. When you treat React as an implementation detail instead of an identity, your architecture becomes portable. When you embrace progressive enhancement and web semantics, your ideas survive the next framework wave.

The recent wave of changes, including the React Foundation, React Compiler v1.0, the `<Activity />` component, and internal research into alternative architectures, all represent genuine progress. The React team is doing thoughtful work, but these updates also serve as reminders of how tightly the industry has become coupled to a single ecosystem's timeline. That timeline is still dictated by the engineering priorities of large corporations, and that remains true regardless of who governs the project. If your team's evolution depends on a single framework's roadmap, you are not steering your product; you are waiting for permission to move. That is true whether you are using React, Vue, Angular, or Svelte. The framework does not matter; the dependency does.

It is ironic that we spent years escaping jQuery's gravity, only to end up caught in another orbit. React was once the radical idea that changed how we build for the web. Every successful framework reaches this point eventually, when it shifts from innovation to institution, from tool to assumption. jQuery did it, React did it, and something else will do it next. The React Foundation is a positive step for the project's long-term sustainability, but the next real leap forward will not come from better governance. It will not come from React finally adopting signals either, and it will not come from any single framework "getting it right." Progress will come from developers who remember that frameworks are implementation details, not identities. Build for the platform first. Choose frameworks second. The web isn't React's, it isn't Vue's, and it isn't Svelte's. It belongs to no one. If we remember that, it will stay free to evolve at its own pace, drawing the best ideas from everywhere rather than from whichever framework happens to hold the cultural high ground. Frameworks are scaffolding, not the building. Escaping their gravity does not mean abandoning progress; it means finding enough momentum to keep moving. Reaching escape velocity, one project at a time.

0 views
baby steps 3 days ago

But then again...maybe alias?

Hmm, as I re-read the post I literally just posted a few minutes ago, I got to thinking. Maybe the right name is indeed `Alias`, and not `Share`. The rationale is simple: alias can serve as both a noun and a verb. It hits that sweet spot of "common enough you know what it means, but weird enough that it can be Rust Jargon for something quite specific". In the same way that we talk about "passing a clone of" a value, we can talk about "passing an alias to" it or an "alias of" it. Food for thought! I'm going to try `Alias` on for size in future posts and see how it feels.

0 views
baby steps 3 days ago

Bikeshedding `Handle` and other follow-up thoughts

There have been two major sets of responses to my proposal for a `Handle` trait. The first is that the trait seems useful but doesn't cover all the cases where one would like to be able to ergonomically clone things. The second is that the name `Handle` doesn't seem to fit with our Rust conventions for trait names, which emphasize short verbs over nouns. The TL;DR of my response is that (1) I agree, this is why I think we should work to make `Clone` ergonomic as well as `Handle`; and (2) I agree with that too, which is why I think we should find another name. At the moment I prefer `Share`, with `Alias` coming in second.

The first concern with the `Handle` trait is that, while it gives a clear semantic basis for when to implement the trait, it does not cover all the cases where calling `clone` is annoying. In other words, if we opt to use `Handle`, and then we make creating new handles very ergonomic, but calling `clone` remains painful, there will be a temptation to use the `Handle` trait when it is not appropriate. In one of our lang team design meetings, TC raised the point that, for many applications, even an "expensive" clone isn't really a big deal. For example, when writing CLI tools and things, I regularly clone strings and vectors of strings and hashmaps and whatever else; I could put them in an Rc or Arc but I know it just doesn't matter. My solution here is simple: let's make solutions that apply to both `Clone` and `Handle`. Given that I think we need a proposal that allows for handles that are both ergonomic and explicit, it's not hard to say that we should extend that solution to include the option for clone. The explicit capture clause post already fits this design. I explicitly chose a design that lets users capture either a clone or a handle, and hence works equally well (or equally not well…) with both traits.

A number of people have pointed out that `Handle` doesn't fit the Rust naming conventions for traits like this, which aim for short verbs. You can interpret handle as a verb, but it doesn't mean what we want. Fair enough. I like the name because it gives a noun we can use to talk about, well, handles, but I agree that the trait name doesn't seem right. There was a lot of bikeshedding on possible options but I think I've come back to preferring Jack Huey's original proposal, `Share` (with a method `share`). I think `Alias` and `alias` is my second favorite. Both of them are short, relatively common verbs. I originally felt that `Share` was a bit too generic and overly associated with sharing across threads – but then I at least always call `&T` a shared reference 1 , and an `Arc` would implement `Share`, so it all seems to work well. Hat tip to Ariel Ben-Yehuda for pushing me on this particular name.

The flurry of posts in this series has been an attempt to survey all the discussions that have taken place in this area. I'm not yet aiming to write a final proposal – I think what will come out of this is a series of multiple RFCs. My current feeling is that we should add the, uh, `Share` trait. I also think we should add explicit capture clauses. However, while explicit capture clauses are clearly "low-level enough for a kernel", I don't really think they are "usable enough for a GUI". The next post will explore another idea that I think might bring us closer to that ultimate ergonomic and explicit goal.

A lot of people say immutable reference but that is simply not accurate: an `&T` is not immutable. I think that the term shared reference is better.  ↩︎

0 views
Abhinav Sarkar 3 days ago

A Short Survey of Compiler Targets

As an amateur compiler developer, one of the decisions I struggle with is choosing the right compiler target. Unlike the '80s when people had to target various machine architectures directly, now there are many mature options available. This is a short and very incomplete survey of some of the popular and interesting options.

A compiler can always directly output machine code or assembly targeted for one or more architectures. A well-known example is the Tiny C Compiler. It's known for its speed and small size, and it can compile and run C code on the fly. Another such example is Turbo Pascal. You could do this with your compiler too, but you'll have to figure out the intricacies of the instruction set architecture (ISA) of each target, as well as concepts like register allocation.

Most modern compilers actually don't emit machine code or assembly directly. They lower the source code down to a language-agnostic intermediate representation (IR) first, and then generate machine code for major architectures (x86-64, ARM64, etc.) from it. The most prominent tool in this space is LLVM. It's a large, open-source compiler-as-a-library. Compilers for many languages such as Rust, Swift, C/C++ (via Clang), and Julia use LLVM as an IR to emit machine code. An alternative is GCC, via its GIMPLE IR, though no compilers seem to use it directly. GCC can be used as a library to compile code, much like LLVM, via libgccjit. It is used in Emacs to Just-in-time (JIT) compile Elisp. Cranelift is another new option in this space, though it supports only a few ISAs. For those who find LLVM or GCC too large or slow to compile, minimalist alternatives exist. QBE is a small backend focused on simplicity, targeting "70% of the performance in 10% of the code". It's used by the language Hare that prioritizes fast compile times. Another option is libFIRM, which uses a graph-based SSA representation instead of a linear IR.

Sometimes you are okay with letting other compilers/runtimes take care of the heavy lifting. You can transpile your code to another established high-level language and leverage that language's existing compiler/runtime and toolchain. A common target in such cases is C. Since C compilers exist for nearly all platforms, generating C code makes your language highly portable. This is the strategy used by Chicken Scheme and Vala. Or you could compile to C++ instead, like Jank, if that's your thing. There is also C--, a subset of C targeted by GHC and OCaml. Another ubiquitous target is JavaScript (JS), which is one of the two options (the other being WebAssembly) for running code natively in a web browser or one of the JS runtimes (Node, Deno, Bun). Multiple languages such as TypeScript, PureScript, Reason, ClojureScript, Dart and Elm transpile to JS. Nim, interestingly, can transpile to C, C++ or JS. Another target similar to JS is Lua, a lightweight and embeddable scripting language, which languages such as MoonScript and Fennel transpile to. A more niche approach is to target a Lisp dialect. Compiling to Chez Scheme, for example, allows you to leverage its macro system, runtime, and compiler. Idris 2 and Racket use Chez Scheme as their primary backend targets.

This is a common choice for application languages. You compile to a portable bytecode for a virtual machine (VM). VMs generally come with features like garbage collection, JIT compilation, and security sandboxing. The Java Virtual Machine (JVM) is probably the most popular one.
It's the target for many languages including Java, Kotlin, Scala, Groovy, and Clojure. Its main competitor is the Common Language Runtime, originally developed by Microsoft, which is targeted by languages such as C#, F#, and Visual Basic.NET. Another notable VM is the BEAM, originally built for Erlang. The BEAM VM isn't built for raw computation speed but for high concurrency, fault tolerance, and reliability. Recently, new languages such as Elixir and Gleam have been created to target it. Finally, this category also includes MoarVM—the spiritual successor to the Parrot VM—built for the Raku (formerly Perl 6) language.

WebAssembly (Wasm) is a relatively new target. It's a portable binary instruction format focused on security and efficiency. Wasm is supported by all major browsers, but not limited to them. The WebAssembly System Interface (WASI) standard provides APIs for running Wasm in non-browser and non-JS environments. Wasm is now targeted by many languages such as Rust, C/C++, Go, Kotlin, Scala, Zig, and Haskell.

Meta-tracing frameworks are a more complex category. These are not the targets for your compiler backend; instead, you use them to build a custom JIT compiler for your language by specifying an interpreter for it. The most well-known example is PyPy, an implementation of Python, created using the RPython framework. Another such framework is GraalVM/Truffle, a polyglot VM and meta-tracing framework from Oracle. Its main feature is zero-cost interoperability: code from GraalJS, TruffleRuby, and GraalPy can all run on the same VM, and can call each other directly.

Move past the mainstream, and you'll discover a world of unconventional and esoteric compiler targets. Developers pick them for academic curiosity, artistic expression, or to test the boundaries of viable compilation targets.

- Brainfuck: An esoteric language with only eight commands, Brainfuck is Turing-complete and has been a target for compilers as a challenge. People have written compilers for C, Haskell and Lambda calculus.
- Lambda calculus: Lambda calculus is a minimal programming language that expresses computation solely as functions and their applications. It is often used as the target of educational compilers because of its simplicity, and its link to the fundamental nature of computation. Hell, a subset of Haskell, compiles to Simply typed lambda calculus.
- SKI combinators: The SKI combinator calculus is even more minimal than lambda calculus. All programs in SKI calculus can be composed of only three combinators: S, K and I. MicroHs compiles a subset of Haskell to SKI calculus.
- JSFuck: Did you know that you can write all possible JavaScript programs using only six characters? Well, now you know.
- Postscript: Postscript is also a Turing-complete programming language. Your next compiler could target it!
- Regular Expressions? Lego? Cellular automata?

I'm going to write a compiler from C++ to JSFuck. If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading! This post was originally published on abhinavsarkar.net.

0 views
Neil Madden 3 days ago

Fluent Visitors: revisiting a classic design pattern

It's been a while since I've written a pure programming post. I was recently implementing a specialist collection class that contained items of a number of different types. I needed to be able to iterate over the collection performing different actions depending on the specific type. There are lots of different ways to do this, depending on the school of programming you prefer. In this article, I'm going to take a look at a classic "Gang of Four" design pattern: the Visitor Pattern. I'll describe how it works, provide some modern spins on it, and compare it to other ways of implementing the same functionality. Hopefully even the most die-hard anti-OO/patterns reader will come away thinking that there's something worth knowing here after all. (Design Patterns? In this economy?)

The example I'll use in this post is a simple arithmetic expression language. It's the kind of boring and not very realistic example you see all the time in textbooks, but the more realistic examples I have to hand have too many weird details, so this'll do. I'm going to write everything in Java 25. Java because, after Smalltalk, it's probably the language most associated with design patterns. And Java 25 specifically because it makes this example really nice to write.

OK, our expression language just has floating-point numbers, addition, and multiplication. So we start by defining datatypes to represent these: If you're familiar with a functional programming language, this is effectively the same as a datatype definition like the following: Now we want to define a bunch of different operations over these expressions: evaluation, pretty-printing, maybe type-checking or some other kinds of static analysis. We could just directly expose the Expression sub-classes and let each operation directly traverse the structure using pattern matching. For example, we can add an eval() method directly to the expression class that evaluates the expression: (Incidentally, isn't this great? It's taken a long time, but I really like how clean this is in modern Java). We can then try out an example: Which gives us:

There are some issues with this though. Firstly, there's no encapsulation. If we want to change the way expressions are represented then we have to change eval() and any other function that's been defined in this way. Secondly, although it's straightforward for this small expression language, there can be a lot of duplication in operations over a complex structure dealing with details of traversing that structure. The Visitor Pattern solves both of these issues, as we'll show now.

The basic Visitor Pattern involves creating an interface with callback methods for each type of object you might encounter when traversing a structure. For our example, it looks like the following: A few things to note here:

- We use a generic type parameter <T> to allow operations to return different types of results depending on what they do. We'll see how this works in a bit.
- In keeping with the idea of encapsulating details, we use the more abstract type rather than the concrete type we're using under the hood. (We could also have done this before, but I'm doing it here to illustrate that the Visitor interface doesn't have to exactly represent the underlying data structures).

The next part of the pattern is to add an accept() method to the Expression class, which then traverses the data structure invoking the callbacks as appropriate. In the traditional implementation, this method is implemented on each concrete sub-class using a technique known as "double-dispatch". For example, we could add an implementation of accept() to the Add class that calls the appropriate callback on the visitor. This technique is still sometimes useful, but I find it's often clearer to just inline all that into the top-level Expression implementation (as a default method implementation, because Expression is an interface): What's going on here? Firstly, the accept method is parameterised to accept any type of return value. Again, we'll see why in a moment.
It then inspects the specific type of expression of this object and calls the appropriate callback on the visitor. Note that in the Add/Mul cases we also recursively visit the left-hand-side and right-hand-side expressions first, similarly to how we called .eval() on those in the earlier listing. We can then re-implement our expression evaluator in terms of the visitor: OK, that works. But it's kinda ugly compared to what we had. Can we improve it? Yes, we can.

The Visitor is really just a set of callback functions, one for each type of object in our data structure. Rather than defining these callbacks as an implementation of the interface, we could instead define them as three separate lambda functions. We can then invoke these instead: We can then use this to reimplement our expression evaluator again: That's a lot nicer to look at. We can then call it as before, and we can also use the fluent visitor to define operations on the fly, such as printing a nicer string representation:

There are some potential drawbacks to this approach, but overall I think it's really clean and nice. One drawback is that you lose compile-time checking that all the cases have been handled: if you forget to register one of the callbacks you'll get a runtime NullPointerException instead. There are ways around this, such as using multiple FluentVisitor types that incrementally construct the callbacks, but that's more work: That ensures that every callback has to be provided before you can make the final call, at the cost of needing many more classes. This is the sort of thing where good IDE support would really help (IntelliJ plugin anyone?). Another easy-to-fix nit is that, if you don't care about the result, it is easy to forget to make the final call and thus not actually do anything at all. This can be fixed by changing the method to accept a function rather than returning a FluentVisitor:

The encapsulation that the Visitor provides allows us to quite radically change the underlying representation, while still preserving the same logical view of the data. For example, here is an alternative implementation that works only on positive integers and stores expressions in a compact reverse Polish notation (RPN). The exact same visitors we defined for the previous expression evaluator will also work for this one.

Hopefully this article has shown you that there is still something interesting about old patterns like the Visitor, especially if you adapt them a bit to modern programming idioms. I often hear fans of functional programming stating that the Visitor pattern only exists to make up for the lack of pattern matching in OO languages like Java. In my opinion, this is the wrong way to think about things. Even when you have pattern matching (as Java now does) the Visitor pattern is still useful due to the increased encapsulation it provides, hiding details of the underlying representation. The correct way to think about the Visitor pattern is as a natural generalisation of the reduce/fold operation common in functional programming languages. Consider the following (imperative) implementation of a left-fold operation over a list: We can think of a linked list as a data structure with two constructors: Nil (the empty list), and Cons(head, tail). In this case, the reduce operation is essentially a Visitor pattern where the initial accumulator value corresponds to the Nil case and the combining function corresponds to the Cons case. So, far from being a poor man's pattern matching, the true essence of the Visitor is a generalised fold operation, which is why it's so useful.
Maybe this old dog still has some nice tricks, eh?
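To make the discussion above concrete, here is a compact, self-contained sketch of the pieces described in this post: the sealed expression types, the generic Visitor interface, an accept method implemented with pattern matching on the Expression interface, and a lambda-based fluent visitor. Names and details are illustrative rather than the article's exact listings.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

// The expression language: floating-point numbers, addition and multiplication.
sealed interface Expression permits Num, Add, Mul {

    // One callback per concrete type; <T> lets different operations return different result types.
    interface Visitor<T> {
        T num(double value);
        T add(T lhs, T rhs);
        T mul(T lhs, T rhs);
    }

    // Traverses the structure, recursing into sub-expressions before invoking the callbacks.
    default <T> T accept(Visitor<T> visitor) {
        return switch (this) {
            case Num n -> visitor.num(n.value());
            case Add a -> visitor.add(a.lhs().accept(visitor), a.rhs().accept(visitor));
            case Mul m -> visitor.mul(m.lhs().accept(visitor), m.rhs().accept(visitor));
        };
    }
}

record Num(double value) implements Expression {}
record Add(Expression lhs, Expression rhs) implements Expression {}
record Mul(Expression lhs, Expression rhs) implements Expression {}

// A "fluent" visitor assembled from three lambdas instead of a named implementation class.
final class FluentVisitor<T> implements Expression.Visitor<T> {
    private final Function<Double, T> onNum;
    private final BiFunction<T, T, T> onAdd;
    private final BiFunction<T, T, T> onMul;

    FluentVisitor(Function<Double, T> onNum, BiFunction<T, T, T> onAdd, BiFunction<T, T, T> onMul) {
        this.onNum = onNum;
        this.onAdd = onAdd;
        this.onMul = onMul;
    }

    @Override public T num(double value) { return onNum.apply(value); }
    @Override public T add(T lhs, T rhs) { return onAdd.apply(lhs, rhs); }
    @Override public T mul(T lhs, T rhs) { return onMul.apply(lhs, rhs); }
}

class VisitorDemo {
    public static void main(String[] args) {
        Expression expr = new Add(new Num(1), new Mul(new Num(2), new Num(3))); // 1 + 2 * 3

        // Evaluation and pretty-printing are just two different visitors over the same structure.
        double result = expr.accept(new FluentVisitor<Double>(n -> n, (a, b) -> a + b, (a, b) -> a * b));
        String printed = expr.accept(new FluentVisitor<String>(
                n -> Double.toString(n),
                (a, b) -> "(" + a + " + " + b + ")",
                (a, b) -> "(" + a + " * " + b + ")"));

        System.out.println(printed + " = " + result); // prints (1.0 + (2.0 * 3.0)) = 7.0
    }
}
```

The evaluator and the pretty-printer reuse the same accept traversal, which is the encapsulation benefit the article describes: swapping in a different underlying representation (such as the RPN-based one mentioned above) would only require a new accept implementation, while the visitors stay untouched.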

0 views
daniel.haxx.se 4 days ago

Yes really, curl is still developed

One of the most common reactions or questions I get about curl when I show up at conferences somewhere and do presentations: — is curl still being actively developed? How many more protocols can there be? This of course being asked by people without very close proximity or insight into the curl project, and probably not into the internet protocol world either – which frankly probably is most of the civilized world. Still, these questions keep surprising me. Can projects actually ever get done? (And do people really believe that adding protocols is the only thing that is left to do?)

There are new car models being made every year in spite of the roads being mostly the same for the last decades and there are new browser versions shipped every few weeks even though the web to most casual observers look roughly the same now as it did a few years ago. Etc etc. Even things such as shoes or bicycles are developed and shipped in new versions every year. In spite of how it may appear to casual distant observers, very few things remain the same over time in this world. This certainly is also true for internet, the web and how to do data transfers over them. Just five years ago we did internet transfers differently than how we (want to) do them today. New tweaks and proposals are brought up at least on a monthly basis. Not evolving implies stagnation and eventually… death.

As standards, browsers and users update their expectations, curl does as well. curl needs to adapt and keep up to stay relevant. We want to keep improving it so that it can match and go beyond what people want from it. We want to help drive and push internet transfer technologies to help users to do better, more efficient and more secure operations. We like carrying the world's infrastructure on our shoulders.

One of the things that actually have occurred to me, after having worked on this project for some decades by now – and this is something I did not at all consider in the past, is that there is a chance that the project will remain alive and in use the next few decades as well. Because of exactly this nothing-ever-stops characteristic of the world around us, but also of course because of the existing amount of users and usage. Current development should be done with care, a sense of responsibility and with the anticipation that we will carry everything we merge today with us for several more decades – at least. At the latest curl up meeting, I had a session I called 100 year curl where I brought up thoughts for us as a project that we might need to work on and keep in mind if indeed we believe the curl project will and should be able to celebrate its 100th birthday in the future. It is a slightly overwhelming (terrifying even?) thought but in my opinion not entirely unrealistic. And when you think about it, we have already traveled almost 30% of the way towards that goalpost.

— I used curl the first time decades ago and it still looks the same. This is a common follow-up statement. What have we actually done during all this time that the users can't spot? A related question that to me also is a little amusing is then: — You say you have worked on curl full time since 2019, but what do you actually do all day?

We work hard at maintaining backwards compatibility and not breaking existing use cases. If you cannot spot any changes and your command lines just keep working, it confirms that we do things right. curl is meant to do its job and stay out of the way. To mostly be boring. A dull stack is a good stack.
We have refactored and rearranged the internal architecture of curl and libcurl several times in the past and we keep doing it at regular intervals as we improve and adapt to new concepts, new ideas and the ever-evolving world. But we never let that impact the API or the ABI, or break any previously working curl tool command lines. I personally think that this is curl’s secret super power. The one thing we truly have accomplished and managed to stick to: stability. In several senses of the word. curl offers stability in an unstable world. By commit frequency or any other metric of project activity, the curl project is actually doing more development now, and at a higher pace, than ever before in its entire lifetime. We do this to offer you and everyone else the best, the most reliable, the fastest, the most feature-rich, the best-documented and the most secure internet transfer library on the planet.

0 views
Martin Fowler 4 days ago

Fragments Nov 3

I’m very concerned about the security dangers of LLM-enabled browsers, as it’s just too easy for them to contain the Lethal Trifecta. For up-to-date eyes on these issues, I follow the writings of the coiner of that phrase: Simon Willison. Here he examines a post on how OpenAI is thinking about these issues. My takeaways from all of this? It’s not done much to influence my overall skepticism of the entire category of browser agents, but it does at least demonstrate that OpenAI are keenly aware of the problems and are investing serious effort in finding the right mix of protections.

❄                ❄                ❄                ❄

Unsurprisingly, there are a lot of strong opinions on AI assisted coding. Some engineers swear by it. Others say it’s dangerous. And of course, as is the way with the internet, nuanced positions get flattened into simplistic camps where everyone’s either on one side or the other. A lot of the problem is that people aren’t arguing about the same thing. They’re reporting different experiences from different vantage points. His view is that beginners are very keen on AI-coding but they don’t see the problems they are creating. Experienced folks do see this, but it takes a further level of experience to realize that when used well these tools are still valuable. Interestingly, I’ve regularly seen sceptical experienced engineers change their view once they’ve been shown how you can blend modern/XP practices with AI assisted coding. The upshot is that you have to be aware of the experience level of whoever is writing about this stuff - and that experience is not just in software development generally, but also in how to make use of LLMs. One thing that rings clearly from reading Simon Willison and Birgitta Böckeler is that effective use of LLMs is a skill that takes a while to develop.

❄                ❄                ❄                ❄

Charlie Brown and Garfield, like most comic strip characters, never changed over the decades. But Doonesbury’s cast aged, had children, and some have died (I miss Lacey). Garry Trudeau retired from writing daily strips a few years ago, but his reruns of older strips are one of the best things in the shabby remains of Twitter. A couple of weeks ago, he reran one of the most memorable strips in its whole run. The very first frame of Doonesbury introduced the character “B.D.”, a football jock never seen without his football helmet, or when on duty, his military helmet. This panel was the first time in over thirty years that B.D. was shown without a helmet; readers were so startled that they didn’t immediately notice that the earlier explosion had removed his leg. This set off a remarkable story arc about the travails of a wounded veteran. It’s my view that future generations will find Doonesbury to be a first-class work of literature, and a thoughtful perspective on contemporary America.

0 views
matklad 4 days ago

On Async Mutexes

A short note on a contradiction, or confusion, in my language design beliefs that I noticed today. One of the touted benefits of concurrent programming multiplexed over a single thread is that mutexes become unnecessary. With only one function executing at any given moment in time, data races are impossible. The standard counter to this argument is that mutual exclusion is a property of the logic itself, not of the runtime. If a certain snippet of code must be executed atomically with respect to everything else that is concurrent, then it must be annotated as such in the source code. You can still introduce logical races by accidentally adding an await in the middle of code that should be atomic. And, while programming, you are adding new awaits all the time! This argument makes sense to me, as well as its logical conclusion. Given that you want to annotate atomic segments of code anyway, it makes sense to go all the way to Kotlin-style explicit async, implicit await. The contradiction I realized today is that for the past few years I’ve been working on a system built around implicit exclusion provided by a single thread — TigerBeetle! Consider compaction, the code that is responsible for rewriting data on disk to make it smaller without changing its logical contents. During compaction, TigerBeetle schedules a lot of concurrent disk reads, disk writes, and CPU-side merges. Here’s an average callback: This is the code ( source ) that runs when a disk read finishes, and it mutates shared state across all outstanding IO. It’s imperative that no other IO completion mutates compaction concurrently, especially inside that monster of a function. Applying the “make exclusion explicit” rule to the code would mean that the entire compaction state needs to be wrapped in a mutex, and every callback needs to start with a lock/unlock pair. And there’s much more to TigerBeetle than just compaction! While some pairs of callbacks probably can execute concurrently relative to each other, this changes over time. For example, once we start overlapping compaction and execution, those will be using our GridCache (buffer manager) at the same time. So explicit locking probably gravitates towards having just a single global lock around the entire state, which is acquired for the duration of any callback. At which point, it makes sense to push lock acquisition up to the event loop, and we are back to the implicit locking API! This seems to be another case of two paradigms for structuring concurrent programs. The async/await discussion usually presupposes a CSP programming style, where you define a set of concurrent threads of execution, and the threads are mostly independent, sharing little data. TigerBeetle is written in a state machine/actor style, where the focal point is the large amount of shared state, which evolves in discrete steps in reaction to IO events (there’s only one “actor” in TigerBeetle). Additionally, TigerBeetle uses manual callbacks instead of async/await syntax, so inserting an await in the middle of a critical section doesn’t really happen. Any new concurrency requires introducing an explicit named continuation function, and each continuation (callback) generally starts with a bunch of assertions to pin down the current state and make sure that the ground hasn’t shifted too far since the IO was originally scheduled. Or, sometimes, the callback doesn’t assume anything at all about the state of the world and instead carries out an exhaustive case analysis from scratch.
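To make the logical-race argument concrete, here is a minimal, self-contained sketch (plain Rust using the futures crate's single-threaded executor, not TigerBeetle code) of how an await point inside a read-modify-write section lets another task interleave even though only one thread is ever running:

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::task::{Context, Poll};

use futures::executor::LocalPool;
use futures::task::LocalSpawnExt;

/// A future that suspends exactly once, then completes.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// A read-modify-write with an await point in the middle of the "atomic" section.
async fn double(counter: Rc<RefCell<u64>>) {
    let read = *counter.borrow();     // read shared state
    YieldOnce(false).await;           // suspend: another task may now run
    *counter.borrow_mut() = read * 2; // write back a possibly stale value
}

fn main() {
    let counter = Rc::new(RefCell::new(1u64));
    let mut pool = LocalPool::new();
    let spawner = pool.spawner();
    spawner.spawn_local(double(counter.clone())).unwrap();
    spawner.spawn_local(double(counter.clone())).unwrap();
    pool.run();
    // If each read-modify-write were atomic this would print 4; because the two
    // tasks interleave at the await point, both typically read 1 and it prints 2.
    println!("{}", counter.borrow());
}
```

With explicit callbacks, the suspension point has to be introduced as a named continuation, which is exactly why the accidental version of this bug is harder to write in TigerBeetle's style.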

0 views
sunshowers 4 days ago

`SocketAddrV6` is not roundtrip serializable

A few weeks ago at Oxide , we encountered a bug where a particular, somewhat large, data structure was erroring on serialization to JSON via . The problem was that JSON only supports map keys that are strings or numbers, and the data structure had an infrequently-populated map with keys that were more complex than that 1 . We fixed the bug, but a concern still remained: what if some other map that was empty most of the time had a complex key in it? The easiest way to guard against this is by generating random instances of the data structure and attempting to serialize them, checking that this operation doesn’t panic. The most straightforward way to do this is with property-based testing , where you define: Modern property-based testing frameworks like , which we use at Oxide, combine these two algorithms into a single strategy , through a technique known as integrated shrinking . (For a more detailed overview, see my monad tutorial , where I talk about the undesirable performance characteristics of monadic composition when it comes to integrated shrinking.) The library has a notion of a canonical strategy for a type, expressed via the trait . The easiest way to define instances for large, complex types is to use a derive macro . Annotate your type with the macro: As long as all the fields have defined for them—and the library defines the trait for most types in the standard library—your type has a working random generator and shrinker associated with it. It’s pretty neat! I put together an implementation for our very complex type, then wrote a property-based test to ensure that it serializes properly: And, running it: The test passed! But while we’re here, surely we should also be able to deserialize a , and then ensure that we get the same value back, right? We’ve already done the hard part, so let’s go ahead and add this test: The roundtrip test failed! Why in the world did the test fail? My first idea was to try and do a textual diff of the outputs of the two data structures. In this case, I tried out the library, with something like: And the output I got was: There’s nothing in the output! No or as would typically be printed. It’s as if there wasn’t a difference at all, and yet the assertion failing indicated the before and after values just weren’t the same. We have one clue to go by: the integrated shrinking algorithm in tries to shrink maps down to empty ones. But it looks like the map is non-empty . This means that something in either the key or the value was suspicious. A is defined as: Most of these types were pretty simple. The only one that looked even remotely suspicious was the , which ostensibly represents an IPv6 address plus a port number. What’s going on with the ? Does the implementation for it do something weird? Well, let’s look at it : Like a lot of abstracted-out library code it looks a bit strange, but at its core it seems to be simple enough: The is self-explanatory, and the is probably the port number. But what are these last two values? Let’s look at the constructor : What in the world are these two and values? They look mighty suspicious. A thing that caught my eye was the “Textual representation” section of the , which defined the representation as: Note what’s missing from this representation: the field! We finally have a theory for what’s going on: Why did this not show up in the textual diff of the values? For most types in Rust, the representation breaks out all the fields and their values. 
But for , the implementation (quite reasonably) forwards to the implementation . So the field is completely hidden, and the only way to look at it is through the method . Whoops. How can we test this theory? The easiest way is to generate random values of where is always set to zero, and see if that passes our roundtrip tests. The ecosystem has pretty good support for generating and using this kind of non-canonical strategy. Let’s try it out: Pretty straightforward, and similar to how lets you provide custom implementations through . Let’s test it out again: All right, looks like our theory is confirmed! We can now merrily be on our way… right? This little adventure left us with more questions than answers, though: The best place to start looking is in the IETF Request for Comments (RFCs) 2 that specify IPv6. The Rust documentation for helpfully links to RFC 2460, section 6 and section 7 . The field is actually a combination of two fields that are part of every IPv6 packet: Section 6 of the RFC says: Flow Labels The 20-bit Flow Label field in the IPv6 header may be used by a source to label sequences of packets for which it requests special handling by the IPv6 routers, such as non-default quality of service or “real-time” service. This aspect of IPv6 is, at the time of writing, still experimental and subject to change as the requirements for flow support in the Internet become clearer. […] And section 7: Traffic Classes The 8-bit Traffic Class field in the IPv6 header is available for use by originating nodes and/or forwarding routers to identify and distinguish between different classes or priorities of IPv6 packets. At the point in time at which this specification is being written, there are a number of experiments underway in the use of the IPv4 Type of Service and/or Precedence bits to provide various forms of “differentiated service” for IP packets […]. Let’s look at the Traffic Class field first. This field is similar to IPv4’s differentiated services code point (DSCP) , and is meant to provide quality of service (QoS) over the network. (For example, prioritizing low-latency gaming and video conferencing packets over bulk downloads.) The DSCP field in IPv4 is not part of a , but the Traffic Class—through the field—is part of a . Why is that the case? Rust’s definition of mirrors the defined by RFC 2553, section 3.3 : Similarly, Rust’s mirrors the struct. There isn’t a similar RFC for ; the de facto standard is Berkeley sockets , designed in 1983. The Linux man page for defines it as: So , which includes the Traffic Class, is part of , but the very similar DSCP field is not part of . Why? I’m not entirely sure about this, but here’s an attempt to reconstruct a history: (Even if could be extended to have this field, would it be a good idea to do so? Put a pin in this for now.) RFC 2460 says that the Flow Label is “experimental and subject to change”. The RFC was written back in 1998, over a quarter-century ago—has anyone found a use for it since then? RFC 6437 , published in 2011, attempts to specify semantics for IPv6 Flow Labels. Section 2 of the RFC says: The 20-bit Flow Label field in the IPv6 header [RFC2460] is used by a node to label packets of a flow. […] Packet classifiers can use the triplet of Flow Label, Source Address, and Destination Address fields to identify the flow to which a particular packet belongs. 
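As an aside from the RFC archaeology, the lossy round trip itself is easy to reproduce directly. Here is a minimal sketch (assuming the serde_json crate and serde's standard-library impls; the non-zero flowinfo value is arbitrary):

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

fn main() {
    // flowinfo = 42, scope_id = 0
    let before = SocketAddrV6::new(Ipv6Addr::LOCALHOST, 8080, 42, 0);

    // In human-readable formats, serde serializes socket addresses via their
    // textual representation, which has no place for flowinfo.
    let json = serde_json::to_string(&before).unwrap();
    println!("{json}"); // "[::1]:8080"

    // Parsing the text back gives flowinfo = 0, so equality no longer holds.
    let after: SocketAddrV6 = serde_json::from_str(&json).unwrap();
    assert_eq!(after.flowinfo(), 0);
    assert_ne!(before, after);
}
```

This is the same mismatch the property-based test tripped over, just without the random generation.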
The RFC says that Flow Labels can potentially be used by routers for load balancing, where they can use the triplet source address, destination address, flow label to figure out that a series of packets are all associated with each other. But this is an internal implementation detail generated by the source program, and not something IPv6 users copy/pasting an address generally have to think about. So it makes sense that it isn’t part of the textual representation. RFC 6294 surveys Flow Label use cases, and some of the ones mentioned are: But this Stack Exchange answer by Andrei Korshikov says: Nowadays […] there [are] no clear advantages of additional 20-bit QoS field over existent Traffic Class (Differentiated Class of Service) field. So “Flow Label” is still waiting for its meaningful usage. In my view, putting in was an understandable choice given the optimism around QoS in 1998, but it was a bit of a mistake in hindsight. The Flow Label field never found widespread adoption, and the Traffic Class field is more of an application-level concern. In general, I think there should be a separation between types that are losslessly serializable and types that are not, and violates this expectation. Making the Traffic Class (QoS) a socket option, like in IPv4, avoids these serialization issues. What about the other additional field, ? What does it mean, and why does it not have to be zeroed out? The documentation for a says that in its textual representation, the scope identifier is included after the IPv6 address and a character, within square brackets. So, for example, the following code sample: prints out . What does this field mean? The reason exists has to do with link-local addressing . Imagine you connect two computers directly to each other via, say, an Ethernet cable. There isn’t a central server telling the computers which addresses to use, or anything similar—in this situation, how can the two computers talk to each other? To address this issue, OS vendors came up with the idea to just assign random addresses on each end of the link. The behavior is defined in RFC 3927, section 2.1 : When a host wishes to configure an IPv4 Link-Local address, it selects an address using a pseudo-random number generator with a uniform distribution in the range from 169.254.1.0 to 169.254.254.255 inclusive. (You might have seen these 169.254 addresses on your home computers if your router is down. Those are link-local addresses.) Sounds simple enough, right? But there is a pretty big problem with this approach: what if a computer has more than one interface on which a link-local address has been established? When a program tries to send some data over the network, the computer has to know which interface to send the data out on. But with multiple link-local interfaces, the outbound one becomes ambiguous. This is described in section 6.3 of the RFC: Address Ambiguity Application software run on a multi-homed host that supports IPv4 Link-Local address configuration on more than one interface may fail. This is because application software assumes that an IPv4 address is unambiguous, that it can refer to only one host. IPv4 Link-Local addresses are unique only on a single link. A host attached to multiple links can easily encounter a situation where the same address is present on more than one interface, or first on one interface, later on another; in any case associated with more than one host. […] The IPv6 protocol designers took this lesson to heart. 
Every time an IPv6-capable computer connects to a network, it establishes a link-local address starting with . (You should be able to see this address via on Linux, or your OS’s equivalent.) But if you’re connected to multiple networks, all of them will have addresses beginning with . Now if an application wants to establish a connection to a computer in this range, how can it tell the OS which interface to use? That’s exactly where comes in: it allows the to specify which network interface to use. Each interface has an index associated with it, which you can see on Linux with . When I run that command, I see: The , , and listed here are all the indexes that can be used as the scope ID. Let’s try pinging our address: Aha! The warning tells us that for a link-local address, the scope ID needs to be specified. Let’s try that using the syntax: Success! What if we try a different scope ID? This makes sense: the address is only valid for scope ID 2 (the interface). When we told to use a different scope, 3, the address was no longer reachable. This neatly solves the 169.254 problem with IPv4 addresses. Since scope IDs can help disambiguate the interface on which a connection ought to be made, it does make sense to include this field in , as well as in its textual representation. The keen-eyed among you may have noticed that the commands above printed out an alternate representation: . The at the end is the network interface that corresponds to the numeric scope ID. Many programs can handle this representation, but Rust’s can’t. Another thing you might have noticed is that the scope ID only makes sense on a particular computer. A scope ID such as means different things on different computers. So the scope ID is roundtrip serializable, but not portable across machines. In this post we started off by looking at a somewhat strange inconsistency and ended up deep in the IPv6 specification. In our case, the instances were always for internal services talking to each other without any QoS considerations, so was always zero. Given that knowledge, we were okay adjusting the property-based tests to always generate instances where was set to zero. ( Here’s the PR as landed .) Still, it raises questions: Should we wrap in a newtype that enforces this constraint? Should provide a non-standard alternate serializer that also includes the field? Should not forward to when hides fields? Should Rust have had separate types from the start? (Probably too late now.) And should Berkeley sockets not have included at all, given that it makes the type impossible to represent as text without loss? The lesson it really drives home for me is how important the principle of least surprise can be. Both and have lossless textual representations, and does as well. By analogy it would seem like would, too, and yet it does not! IPv6 learned so much from IPv4’s mistakes, and yet its designers couldn’t help but make some mistakes of their own. This makes sense: the designers could only see the problems they were solving then, just as we can only see those we’re solving now—and just as we encounter problems with their solutions, future generations will encounter problems with ours. Thanks to Fiona , and several of my colleagues at Oxide, for reviewing drafts of this post. Discuss on Hacker News and Lobsters . This is why our Rust map crate where keys can borrow from values, , serializes its maps as lists or sequences.  
↩︎ The Requests for Discussion we use at Oxide are inspired by RFCs, though we use a slightly different term (RFD) to convey the fact that our documents are less set in stone than IETF RFCs are.  ↩︎ The two fields sum up to 28 bits, and the field is a , so there’s four bits remaining. I couldn’t find documentation for these four bits anywhere—they appear to be unused padding in the . If you know about these bits, please let me know!  ↩︎
a way to generate random instances of a particular type, and given a failing input, a way to shrink it down to a minimal failing value.
generate four values: an , a , a , and another then pass them in to .
A left square bracket ( )
The textual representation of an IPv6 address
Optionally , a percent sign ( ) followed by the scope identifier encoded as a decimal integer
A right square bracket ( )
A colon ( )
The port, encoded as a decimal integer.
generated a with a non-zero field. When we went to serialize this field as JSON, we used the textual representation, which dropped the field. When we deserialized it, the field was set to zero. As a result, the before and after values were no longer equal.
What does this field mean? A is just an plus a port ; why is a different? Why is the not part of the textual representation?
, , and are all roundtrip serializable. Why is not? Also: what is the field?
a 20-bit Flow Label, and an 8-bit Traffic Class 3 .
QoS was not originally part of the 1980s Berkeley sockets specification. DSCP came about much later ( RFC 2474 , 1998). Because C structs do not provide encapsulation, the definition was set in stone and couldn’t be changed. So instead, the DSCP field is set as an option on the socket, via . By the time IPv6 came around, it was pretty clear that QoS was important, so the Traffic Class was baked into the struct.
as a pseudo-random value that can be used as part of a hash key for load balancing, or as extra QoS bits on top of the 8 bits provided by the Traffic Class field.

0 views
xenodium 4 days ago

agent-shell 0.17 improvements + MELPA

While it's only been a few weeks since the last agent-shell post , there are plenty of new updates to share. What's agent-shell again? A native Emacs shell to interact with any LLM agent powered by ACP ( Agent Client Protocol ). Before getting to the latest and greatest, I'd like to say thank you to new and existing sponsors backing my projects. While the work going in remains largely unsustainable, your contributions are indeed helping me get closer to sustainability. Thank you! If you benefit from my content and projects, please consider sponsoring to make the work sustainable. Work paying for your LLM tokens and other tools? Why not get your employer to sponsor agent-shell also? Now on to the very first update… Both agent-shell and acp.el are now available on MELPA. As such, installation now boils down to: OpenCode and Qwen Code are two of the latest agents to join agent-shell . Both accessible via and through the agent picker, but also directly from and . Adding files as context has seen quite a few improvements in different shapes. Thank you Ian Davidson for contributing embedded context support. Invoke to take a screenshot and automatically send it over to . A little side-note, did you notice the activity indicator in the header bar? Yep. That's new too. While file completion remains experimental, you can enable via: From any file you can now invoke to send the current file to . If region is selected, region information is sent also. Fancy sending a different file other than current one? Invoke with , or just use . , also operates on files (selection or region), DWIM style ;-) You may have noticed paths in section titles are no longer displayed as absolute paths. We're shortening those relative to project roots. While you can invoke with prefix to create new shells, is now available (and more discoverable than ). Cancelling prompt sessions (via ) is much more reliable now. If you experienced a shell getting stuck after cancelling a session, that's because we were missing part of the protocol implementation. This is now implemented. Use the new to automatically insert shell (ie. bash) command output. Initial work for automatically saving markdown transcripts is now in place. We're still iterating on it, but if keen to try things out, you can enable as follows: Text header Applied changes are now displayed inline. The new and can now be used to change the session mode. You can now find out what capabilities and session modes are supported by your agent. Expand either of the two sections. Tired of pressing and to accept changes from the diff buffer? Now just press from the diff viewer to accept all hunks. Same goes for rejecting. No more and . Now just press from the diff buffer. We get a new basic transient menu. Currently available via . We got lots of awesome pull requests from wonderful folks. Thank you for your contributions! Beyond what's been showcased here, much love and effort's been poured into polishing the experience. Interested in the nitty-gritty? Have a look through the 173 commits since the last blog post. If agent-shell or acp.el are useful to you, please consider sponsoring its development. LLM tokens aren't free, and neither is the time dedicated to building this stuff ;-) Arthur Heymans : Add a Package-Requires header ( PR ). Elle Najt : Execute commands in devcontainer ( PR ). Elle Najt : Fix Write tool diff preview for new files ( PR ). Elle Najt : Inline display of historical changes ( PR ). Elle Najt : Live Markdown transcripts ( PR ). 
Elle Najt : Prompt session mode cycling and modeline display ( PR ). Fritz Grabo : Devcontainer fallback workspace ( PR ). Guilherme Pires : Codex subscription auth ( PR ). Hordur Freyr Yngvason : Make qwen authentication optional ( PR ). Ian Davidson : Embedded context support ( PR ). Julian Hirn : Fix quick-diff window restoration for full-screen ( PR ). Ruslan Kamashev : Hide header line altogether ( PR ). festive-onion : Show Planning mode more reliably ( PR ).

0 views
spf13 5 days ago

Why Engineers Can't Be Rational About Programming Languages

Series Overview This is the first in a series of posts on the true cost of a programming language. A programming language is the single most expensive choice a company makes, yet we treat it like a technical debate. After watching this mistake bankrupt dozens of companies and hurt hundreds more, I’ve learned the uncomfortable truth: these decisions are rarely about technology. They’re about identity, emotion, and ego, and they’re destroying your velocity and budget in ways you can’t see until it’s too late.

1 views
iDiallo 5 days ago

None of us Read the specs

After using Large Language Models extensively, the same questions keep resurfacing. Why didn't the lawyer who used ChatGPT to draft legal briefs verify the case citations before presenting them to a judge? Why are developers raising issues on projects like cURL using LLMs, but not verifying the generated code before pushing a Pull Request? Why are students using AI to write their essays, yet submitting the result without a single read-through? The reason is simple. If you didn't have time to write it, you certainly won't spend time reading it. They are all using LLMs as their time-saving strategy. In reality, the work remains undone because they are merely shifting the burden of verification and debugging to the next person in the chain. AI companies promise that LLMs can transform us all into a 10x developer. You can produce far more output, more lines of code, more draft documents, more specifications, than ever before. The core problem is that this initial time saved is almost always spent by someone else to review and validate your output. At my day job, the developers who use AI to generate large swathes of code are generally lost when we ask questions during PR reviews. They can't explain the logic or the trade-offs because they didn't write it, and they didn't truly read it. Reading and understanding generated code defeats the initial purpose of using AI for speed. Unfortunately, there is a fix for that as well. If PR reviews or verification slow the process down, then the clever reviewer can also use an LLM to review the code at a 10x speed. Now, everyone has saved time. The code gets deployed faster. The metrics for velocity look fantastic. But then, a problem arises. A user experiences a critical issue. At this point, you face a technical catastrophe: The developer is unfamiliar with the code, and the reviewer is also unfamiliar with the code. You are now completely at the mercy of another LLM to diagnose the issue and create a fix, because the essential human domain knowledge required to debug a problem has been bypassed by both parties. This issue isn't restricted to writing code. I've seen the same dangerous pattern when architects use LLMs to write technical specifications for projects. As an architect whose job is to produce a document that developers can use as a blueprint, using an LLM exponentially improves speed. Where it once took a day to go through notes and produce specs, an LLM can generate a draft in minutes. As far as metrics are concerned, the architect is producing more. Maybe they can even generate three or four documents a day now. As an individual contributor, they are more productive. But that output is someone else’s input, and their work depends entirely on the quality of the document. Just because we produce more doesn't mean we are doing a better job. Plus, our tendency is to not thoroughly vet the LLM's output because it always looks good enough, until someone has to scrutinize it. The developer implementing a feature, following that blueprint, will now have to do the extra work of figuring out if the specs even make sense. If the document contains logical flaws, missing context, or outright hallucinations , the developer must spend time reviewing and reconciling the logic. The worst-case scenario? They decide to save time, too. They use an LLM to "read" the flawed specs and build the product, incorporating and inheriting all the mistakes, and simply passing the technical debt along. 
LLMs are powerful tools for augmentation, but we treat them as tools for abdication . They are fantastic at getting us to a first draft, but they cannot replace the critical human function of scrutiny, verification, and ultimate ownership. When everyone is using a tool the wrong way, you can't just say they are holding it wrong . But I don't see how we can make verification a sustainable part of the process when the whole point of using an LLM is to save time. For now at least, we have to deliberately consider all LLM outputs incorrect until vetted. If we fail to do this, we're not just creating more work for others; we're actively eroding our work, making life harder for our future selves.

0 views
Armin Ronacher 5 days ago

Absurd Workflows: Durable Execution With Just Postgres

It’s probably no surprise to you that we’re building agents somewhere. Everybody does it. Building a good agent, however, brings back some of the historic challenges involving durable execution. Entirely unsurprisingly, a lot of people are now building durable execution systems. Many of these, however, are incredibly complex and require you to sign up for another third-party service. I generally try to avoid bringing in extra complexity if I can avoid it, so I wanted to see how far I can go with just Postgres. To this end, I wrote Absurd 1 , a tiny SQL-only library with a very thin SDK to enable durable workflows on top of just Postgres — no extension needed. Durable execution (or durable workflows) is a way to run long-lived, reliable functions that can survive crashes, restarts, and network failures without losing state or duplicating work. Durable execution can be thought of as the combination of a queue system and a state store that remembers the most recently seen execution state. Because Postgres is excellent at queues thanks to , you can use it for the queue (e.g., with pgmq ). And because it’s a database, you can also use it to store the state. The state is important. With durable execution, instead of running your logic in memory, the goal is to decompose a task into smaller pieces (step functions) and record every step and decision. When the process stops (whether it fails, intentionally suspends, or a machine dies) the engine can replay those events to restore the exact state and continue where it left off, as if nothing happened. Absurd at the core is a single file ( ) which needs to be applied to a database of your choice. That SQL file’s goal is to move the complexity of SDKs into the database. SDKs then make the system convenient by abstracting the low-level operations in a way that leverages the ergonomics of the language you are working with. The system is very simple: A task dispatches onto a given queue from where a worker picks it up to work on. Tasks are subdivided into steps , which are executed in sequence by the worker. Tasks can be suspended or fail, and when that happens, they execute again (a run ). The result of a step is stored in the database (a checkpoint ). To avoid repeating work, checkpoints are automatically loaded from the state storage in Postgres again. Additionally, tasks can sleep or suspend for events and wait until they are emitted. Events are cached, which means they are race-free. What is the relationship of agents with workflows? Normally, workflows are DAGs defined by a human ahead of time. AI agents, on the other hand, define their own adventure as they go. That means they are basically a workflow with mostly a single step that iterates over changing state until it determines that it has completed. Absurd enables this by automatically counting up steps if they are repeated: This defines a single task named , and it has just a single step. The return value is the changed state, but the current state is passed in as an argument. Every time the step function is executed, the data is looked up first from the checkpoint store. The first checkpoint will be , the second , , etc. Each state only stores the new messages it generated, not the entire message history. If a step fails, the task fails and will be retried. And because of checkpoint storage, if you crash in step 5, the first 4 steps will be loaded automatically from the store. Steps are never retried, only tasks. How do you kick it off? 
Simply enqueue it: And if you are curious, this is an example implementation of the function used above: And like Temporal and other solutions, you can yield if you want. If you want to come back to a problem in 7 days, you can do so: Or if you want to wait for an event: Which someone else can emit: Really, that’s it. There is really not much to it. It’s just a queue and a state store — that’s all you need. There is no compiler plugin and no separate service or whole runtime integration . Just Postgres. That’s not to throw shade on these other solutions; they are great. But not every problem necessarily needs to scale to that level of complexity, and you can get quite far with much less. Particularly if you want to build software that other people should be able to self-host, that might be quite appealing. It’s named Absurd because durable workflows are absurdly simple, but have been overcomplicated in recent years. ↩
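To make the checkpoint-replay idea concrete, here is a rough sketch in Rust using the postgres crate. This is not Absurd's actual API (which is SQL plus a thin SDK), only the shape of the technique: a step's result is keyed by task and step name, so a task that crashes and retries skips every step that already completed.

```rust
use postgres::{Client, NoTls};

// Run a named step at most once per task: if a checkpoint for (task, step)
// already exists, return the stored result instead of redoing the work.
fn step<F>(db: &mut Client, task: &str, name: &str, run: F) -> Result<String, postgres::Error>
where
    F: FnOnce() -> String,
{
    if let Some(row) = db.query_opt(
        "SELECT result FROM checkpoints WHERE task = $1 AND step = $2",
        &[&task, &name],
    )? {
        return Ok(row.get(0)); // replay from the checkpoint store
    }
    let result = run(); // first execution: do the work...
    db.execute(
        "INSERT INTO checkpoints (task, step, result) VALUES ($1, $2, $3)",
        &[&task, &name, &result],
    )?; // ...then persist it before moving on
    Ok(result)
}

fn main() -> Result<(), postgres::Error> {
    let mut db = Client::connect("host=localhost user=postgres", NoTls)?;
    db.batch_execute(
        "CREATE TABLE IF NOT EXISTS checkpoints (
             task TEXT, step TEXT, result TEXT, PRIMARY KEY (task, step)
         )",
    )?;
    // If the process dies after step-1, a retry of task-42 loads step-1's
    // result from the table and continues with step-2.
    let a = step(&mut db, "task-42", "step-1", || "expensive result".to_string())?;
    let b = step(&mut db, "task-42", "step-2", move || format!("{a}, refined"))?;
    println!("{b}");
    Ok(())
}
```

A real implementation also needs the queue, retries, and the sleeping and event-waiting described above; the point here is only how little machinery the core replay trick requires.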

0 views

How I Use Every Claude Code Feature

I use Claude Code. A lot. As a hobbyist, I run it in a VM several times a week on side projects, often with to vibe code whatever idea is on my mind. Professionally, part of my team builds the AI-IDE rules and tooling for our engineering team that consumes several billion tokens per month just for codegen. The CLI agent space is getting crowded and between Claude Code, Gemini CLI, Cursor, and Codex CLI, it feels like the real race is between Anthropic and OpenAI. But TBH when I talk to other developers, their choice often comes down to what feels like superficials—a “lucky” feature implementation or a system prompt “vibe” they just prefer. At this point these tools are all pretty good. I also feel like folks often also over index on the output style or UI. Like to me the “you’re absolutely right!” sycophancy isn’t a notable bug; it’s a signal that you’re too in-the-loop. Generally my goal is to “shoot and forget”—to delegate, set the context, and let it work. Judging the tool by the final PR and not how it gets there. Having stuck to Claude Code for the last few months, this post is my set of reflections on Claude Code’s entire ecosystem. We’ll cover nearly every feature I use (and, just as importantly, the ones I don’t), from the foundational file and custom slash commands to the powerful world of Subagents, Hooks, and GitHub Actions. This post ended up a bit long and I’d recommend it as more of a reference than something to read in entirety. The single most important file in your codebase for using Claude Code effectively is the root . This file is the agent’s “constitution,” its primary source of truth for how your specific repository works. How you treat this file depends on the context. For my hobby projects, I let Claude dump whatever it wants in there. For my professional work, our monorepo’s is strictly maintained and currently sits at 13KB (I could easily see it growing to 25KB). It only documents tools and APIs used by 30% (arbitrary) or more of our engineers (else tools are documented in product or library specific markdown files) We’ve even started allocating effectively a max token count for each internal tool’s documentation, almost like selling “ad space” to teams. If you can’t explain your tool concisely, it’s not ready for the . Over time, we’ve developed a strong, opinionated philosophy for writing an effective . Start with Guardrails, Not a Manual. Your should start small, documenting based on what Claude is getting wrong. Don’t -File Docs. If you have extensive documentation elsewhere, it’s tempting to -mention those files in your . This bloats the context window by embedding the entire file on every run. But if you just mention the path, Claude will often ignore it. You have to pitch the agent on why and when to read the file. “For complex … usage or if you encounter a , see for advanced troubleshooting steps.” Don’t Just Say “Never.” Avoid negative-only constraints like “Never use the flag.” The agent will get stuck when it thinks it must use that flag. Always provide an alternative. Use as a Forcing Function. If your CLI commands are complex and verbose, don’t write paragraphs of documentation to explain them. That’s patching a human problem. Instead, write a simple bash wrapper with a clear, intuitive API and document that . Keeping your as short as possible is a fantastic forcing function for simplifying your codebase and internal tooling. 
Here’s a simplified snapshot: Finally, we keep this file synced with an file to maintain compatibility with other AI IDEs that our engineers might be using. If you are looking for more tips for writing markdown for coding agents see “AI Can’t Read Your Docs”, “AI-powered Software Engineering”, and “How Cursor (AI IDE) Works”. The Takeaway: Treat your as a high-level, curated set of guardrails and pointers. Use it to guide where you need to invest in more AI (and human) friendly tools, rather than trying to make it a comprehensive manual. Thanks for reading Shrivu’s Substack! Subscribe for free to receive new posts and support my work. I recommend running mid coding session at least once to understand how you are using your 200k token context window (even with Sonnet-1M, I don’t trust that the full context window is actually used effectively). For us a fresh session in our monorepo costs a baseline ~20k tokens (10%) with the remaining 180k for making your change — which can fill up quite fast. A screenshot of /context in one of my recent side projects. You can almost think of this like disk space that fills up as you work on a feature. After a few minutes or hours you’ll need to clear the messages (purple) to make space to continue. I have three main workflows: (Avoid): I avoid this as much as possible. The automatic compaction is opaque, error-prone, and not well-optimized. + (Simple Restart): My default reboot. I the state, then run a custom command to make Claude read all changed files in my git branch. “Document & Clear” (Complex Restart): For large tasks. I have Claude dump its plan and progress into a , the state, then start a new session by telling it to read the and continue. The Takeaway: Don’t trust auto-compaction. Use for simple reboots and the “Document & Clear” method to create durable, external “memory” for complex tasks. I think of slash commands as simple shortcuts for frequently used prompts, nothing more. My setup is minimal: : The command I mentioned earlier. It just prompts Claude to read all changed files in my current git branch. : A simple helper to clean up my code, stage it, and prepare a pull request. IMHO if you have a long list of complex, custom slash commands, you’ve created an anti-pattern. To me the entire point of an agent like Claude is that you can type almost whatever you want and get a useful, mergable result. The moment you force an engineer (or non-engineer) to learn a new, documented-somewhere list of essential magic commands just to get work done, you’ve failed. The Takeaway: Use slash commands as simple, personal shortcuts, not as a replacement for building a more intuitive and better-tooled agent. On paper, custom subagents are Claude Code’s most powerful feature for context management. The pitch is simple: a complex task requires tokens of input context (e.g., how to run tests), accumulates tokens of working context, and produces a token answer. Running tasks means tokens in your main window. The subagent solution is to farm out the work to specialized agents, which only return the final token answers, keeping your main context clean. I find they are a powerful idea that, in practice, custom subagents create two new problems: They Gatekeep Context: If I make a subagent, I’ve now hidden all testing context from my main agent. It can no longer reason holistically about a change. It’s now forced to invoke the subagent just to know how to validate its own code. They Force Human Workflows: Worse, they force Claude into a rigid, human-defined workflow. 
I’m now dictating how it must delegate, which is the very problem I’m trying to get the agent to solve for me. My preferred alternative is to use Claude’s built-in feature to spawn clones of the general agent. I put all my key context in the . Then, I let the main agent decide when and how to delegate work to copies of itself. This gives me all the context-saving benefits of subagents without the drawbacks. The agent manages its own orchestration dynamically. In my “Building Multi-Agent Systems (Part 2)” post, I called this the “Master-Clone” architecture, and I strongly prefer it over the “Lead-Specialist” model that custom subagents encourage. The Takeaway: Custom subagents are a brittle solution. Give your main agent the context (in ) and let it use its own feature to manage delegation. On a simple level, I use and frequently. They’re great for restarting a bugged terminal or quickly rebooting an older session. I’ll often a session from days ago just to ask the agent to summarize how it overcame a specific error, which I then use to improve our and internal tooling. More in the weeds, Claude Code stores all session history in to tap into the raw historical session data. I have scripts that run meta-analysis on these logs, looking for common exceptions, permission requests, and error patterns to help improve agent-facing context. The Takeaway: Use and to restart sessions and uncover buried historical context. Hooks are huge. I don’t use them for hobby projects, but they are critical for steering Claude in a complex enterprise repo. They are the deterministic “must-do” rules that complement the “should-do” suggestions in . We use two types: Block-at-Submit Hooks: This is our primary strategy. We have a hook that wraps any command. It checks for a file, which our test script only creates if all tests pass. If the file is missing, the hook blocks the commit, forcing Claude into a “test-and-fix” loop until the build is green. Hint Hooks: These are simple, non-blocking hooks that provide “fire-and-forget” feedback if the agent is doing something suboptimal. We intentionally do not use “block-at-write” hooks (e.g., on or ). Blocking an agent mid-plan confuses or even “frustrates” it. It’s far more effective to let it finish its work and then check the final, completed result at the commit stage. The Takeaway: Use hooks to enforce state validation at commit time ( ). Avoid blocking at write time—let the agent finish its plan, then check the final result. Planning is essential for any “large” feature change with an AI IDE. For my hobby projects, I exclusively use the built-in planning mode. It’s a way to align with Claude before it starts, defining both how to build something and the “inspection checkpoints” where it needs to stop and show me its work. Using this regularly builds a strong intuition for what minimal context is needed to get a good plan without Claude botching the implementation. In our work monorepo, we’ve started rolling out a custom planning tool built on the Claude Code SDK. Its similar to native plan mode but heavily prompted to align its outputs with our existing technical design format. It also enforces our internal best practices—from code structure to data privacy and security—out of the box. This lets our engineers “vibe plan” a new feature as if they were a senior architect (or at least that’s the pitch). The Takeaway: Always use the built-in planning mode for complex changes to align on a plan before the agent starts working. 
I agree with Simon Willison’s : Skills are (maybe) a bigger deal than MCP. If you’ve been following my posts, you’ll know I’ve drifted away from MCP for most dev workflows, preferring to build simple CLIs instead (as I argued in “AI Can’t Read Your Docs” ). My mental model for agent autonomy has evolved into three stages: Single Prompt: Giving the agent all context in one massive prompt. (Brittle, doesn’t scale). Tool Calling: The “classic” agent model. We hand-craft tools and abstract away reality for the agent. (Better, but creates new abstractions and context bottlenecks). Scripting : We give the agent access to the raw environment—binaries, scripts, and docs—and it writes code on the fly to interact with them. With this model in mind, Agent Skills are the obvious next feature. They are the formal productization of the “Scripting” layer. If, like me, you’ve already been favoring CLIs over MCP, you’ve been implicitly getting the benefit of Skills all along. The file is just a more organized, shareable, and discoverable way to document these CLIs and scripts and expose them to the agent. The Takeaway: Skills are the right abstraction. They formalize the “scripting”-based agent model, which is more robust and flexible than the rigid, API-like model that MCP represents. Skills don’t mean MCP is dead (see also “Everything Wrong with MCP” ). Previously, many built awful, context-heavy MCPs with dozens of tools that just mirrored a REST API ( , , ). The “Scripting” model (now formalized by Skills) is better, but it needs a secure way to access the environment. This to me is the new, more focused role for MCP. Instead of a bloated API, an MCP should be a simple, secure gateway that provides a few powerful, high-level tools: In this model, MCP’s job isn’t to abstract reality for the agent; its job is to manage the auth, networking, and security boundaries and then get out of the way. It provides the entry point for the agent, which then uses its scripting and context to do the actual work. The only MCP I still use is for Playwright , which makes sense—it’s a complex, stateful environment. All my stateless tools (like Jira, AWS, GitHub) have been migrated to simple CLIs. The Takeaway: Use MCPs that act as data gateways. Give the agent one or two high-level tools (like a raw data dump API) that it can then script against. Claude Code isn’t just an interactive CLI; it’s also a powerful SDK for building entirely new agents—for both coding and non-coding tasks. I’ve started using it as my default agent framework over tools like LangChain/CrewAI for most new hobby projects. I use it in three main ways: Massive Parallel Scripting: For large-scale refactors, bug fixes, or migrations, I don’t use the interactive chat. I write simple bash scripts that call in parallel. This is far more scalable and controllable than trying to get the main agent to manage dozens of subagent tasks. Building Internal Chat Tools: The SDK is perfect for wrapping complex processes in a simple chat interface for non-technical users. Like an installer that, on error, falls back to the Claude Code SDK to just fix the problem for the user. Or an in-house “ v0-at-home ” tool that lets our design team vibe-code mock frontends in our in-house UI framework, ensuring their ideas are high-fidelity and the code is more directly usable in frontend production code. Rapid Agent Prototyping: This is my most common use. It’s not just for coding. 
If I have an idea for any agentic task (e.g., a “threat investigation agent” that uses custom CLIs or MCPs), I use the Claude Code SDK to quickly build and test the prototype before committing to a full, deployed scaffolding. The Takeaway: The Claude Code SDK is a powerful, general-purpose agent framework. Use it for batch-processing code, building internal tools, and rapidly prototyping new agents before you reach for more complex frameworks. The Claude Code GitHub Action (GHA) is probably one of my favorite and most slept on features. It’s a simple concept: just run Claude Code in a GHA. But this simplicity is what makes it so powerful. It’s similar to Cursor’s background agents or the Codex managed web UI but is far more customizable. You control the entire container and environment, giving you more access to data and, crucially, much stronger sandboxing and audit controls than any other product provides. Plus, it supports all the advanced features like Hooks and MCP. We’ve used it to build custom “PR-from-anywhere” tooling. Users can trigger a PR from Slack, Jira, or even a CloudWatch alert, and the GHA will fix the bug or add the feature and return a fully tested PR 1 . Since the GHA logs are the full agent logs, we have an ops process to regularly review these logs at a company level for common mistakes, bash errors, or unaligned engineering practices. This creates a data-driven flywheel: Bugs -> Improved CLAUDE.md / CLIs -> Better Agent. The Takeaway: The GHA is the ultimate way to operationalize Claude Code. It turns it from a personal tool into a core, auditable, and self-improving part of your engineering system. Finally, I have a few specific configurations that I’ve found essential for both hobby and professional work. / : This is great for debugging. I’ll use it to inspect the raw traffic to see exactly what prompts Claude is sending. For background agents, it’s also a powerful tool for fine-grained network sandboxing. / : I bump these. I like running long, complex commands, and the default timeouts are often too conservative. I’m honestly not sure if this is still needed now that bash background tasks are a thing, but I keep it just in case. : At work, we use our enterprise API keys ( via apiKeyHelper ). It shifts us from a “per-seat” license to “usage-based” pricing, which is a much better model for how we work. It accounts for the massive variance in developer usage (We’ve seen 1:100x differences between engineers). It lets engineers to tinker with non-Claude-Code LLM scripts, all under our single enterprise account. : I’ll occasionally self-audit the list of commands I’ve allowed Claude to auto-run. The Takeaway: Your is a powerful place for advanced customization. That was a lot, but hopefully, you find it useful. If you’re not already using a CLI-based agent like Claude Code or Codex CLI, you probably should be. There are rarely good guides for these advanced features, so the only way to learn is to dive in. Thanks for reading Shrivu’s Substack! Subscribe for free to receive new posts and support my work. To me, a fairly interesting philosophical question is how many reviewers should a PR get that was generated directly from a customer request (no internal human prompter)? We’ve settled on 2 human approvals for any AI-initiated PR for now, but it is kind of a weird paradigm shift (for me at least) when it’s no longer a human making something for another human to review. 
It only documents tools and APIs used by 30% (arbitrary) or more of our engineers (else tools are documented in product or library specific markdown files) We’ve even started allocating effectively a max token count for each internal tool’s documentation, almost like selling “ad space” to teams. If you can’t explain your tool concisely, it’s not ready for the . Start with Guardrails, Not a Manual. Your should start small, documenting based on what Claude is getting wrong. Don’t -File Docs. If you have extensive documentation elsewhere, it’s tempting to -mention those files in your . This bloats the context window by embedding the entire file on every run. But if you just mention the path, Claude will often ignore it. You have to pitch the agent on why and when to read the file. “For complex … usage or if you encounter a , see for advanced troubleshooting steps.” Don’t Just Say “Never.” Avoid negative-only constraints like “Never use the flag.” The agent will get stuck when it thinks it must use that flag. Always provide an alternative. Use as a Forcing Function. If your CLI commands are complex and verbose, don’t write paragraphs of documentation to explain them. That’s patching a human problem. Instead, write a simple bash wrapper with a clear, intuitive API and document that . Keeping your as short as possible is a fantastic forcing function for simplifying your codebase and internal tooling. A screenshot of /context in one of my recent side projects. You can almost think of this like disk space that fills up as you work on a feature. After a few minutes or hours you’ll need to clear the messages (purple) to make space to continue. I have three main workflows: (Avoid): I avoid this as much as possible. The automatic compaction is opaque, error-prone, and not well-optimized. + (Simple Restart): My default reboot. I the state, then run a custom command to make Claude read all changed files in my git branch. “Document & Clear” (Complex Restart): For large tasks. I have Claude dump its plan and progress into a , the state, then start a new session by telling it to read the and continue. : The command I mentioned earlier. It just prompts Claude to read all changed files in my current git branch. : A simple helper to clean up my code, stage it, and prepare a pull request. They Gatekeep Context: If I make a subagent, I’ve now hidden all testing context from my main agent. It can no longer reason holistically about a change. It’s now forced to invoke the subagent just to know how to validate its own code. They Force Human Workflows: Worse, they force Claude into a rigid, human-defined workflow. I’m now dictating how it must delegate, which is the very problem I’m trying to get the agent to solve for me. Block-at-Submit Hooks: This is our primary strategy. We have a hook that wraps any command. It checks for a file, which our test script only creates if all tests pass. If the file is missing, the hook blocks the commit, forcing Claude into a “test-and-fix” loop until the build is green. Hint Hooks: These are simple, non-blocking hooks that provide “fire-and-forget” feedback if the agent is doing something suboptimal. Single Prompt: Giving the agent all context in one massive prompt. (Brittle, doesn’t scale). Tool Calling: The “classic” agent model. We hand-craft tools and abstract away reality for the agent. (Better, but creates new abstractions and context bottlenecks). 
Scripting: We give the agent access to the raw environment (binaries, scripts, and docs), and it writes code on the fly to interact with them.

Massive Parallel Scripting: For large-scale refactors, bug fixes, or migrations, I don’t use the interactive chat. I write simple bash scripts that call the Claude Code CLI in parallel. This is far more scalable and controllable than trying to get the main agent to manage dozens of subagent tasks.

Building Internal Chat Tools: The SDK is perfect for wrapping complex processes in a simple chat interface for non-technical users. For example, an installer that, on error, falls back to the Claude Code SDK to just fix the problem for the user. Or an in-house “v0-at-home” tool that lets our design team vibe-code mock frontends in our in-house UI framework, ensuring their ideas are high fidelity and the code is more directly usable in production frontend code.

Rapid Agent Prototyping: This is my most common use. It’s not just for coding. If I have an idea for any agentic task (e.g., a “threat investigation agent” that uses custom CLIs or MCPs), I use the Claude Code SDK to quickly build and test the prototype before committing to a full, deployed scaffolding.

0 views
Jeff Quast 6 days ago

State of Terminal Emulators in 2025: The Errant Champions

This is a follow-up to my previous article, Terminal Emulators Battle Royale – Unicode Edition! from 2023, in which I documented Unicode support across terminal emulators. Since then, the ucs-detect tool and its supporting blessed library have been extended to automatically detect support of DEC Private Modes, sixel graphics, pixel size, and software version. The ucs-detect program tests terminal cursor positioning by sending visible text followed by control sequences that request the cursor position.
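The measurement idea is easy to reproduce yourself. Below is a minimal Rust sketch of the same trick (ucs-detect itself is Python; this is just the idea, and it assumes a Unix terminal plus the libc crate): print a test string, send the DSR “report cursor position” sequence, and read back the terminal’s answer to see how many cells it actually used.

```rust
// Sketch of the cursor-position measurement described above. Assumes a Unix
// terminal and the `libc` crate; this is an illustration, not ucs-detect code.
use std::io::{self, Read, Write};

fn cursor_column() -> io::Result<usize> {
    // Put stdin into non-canonical, no-echo mode so the terminal's reply can
    // be read byte-by-byte without waiting for Enter.
    let mut old: libc::termios = unsafe { std::mem::zeroed() };
    unsafe { libc::tcgetattr(0, &mut old) };
    let mut raw = old;
    raw.c_lflag &= !(libc::ICANON | libc::ECHO);
    unsafe { libc::tcsetattr(0, libc::TCSANOW, &raw) };

    // DSR: ESC [ 6 n asks "where is the cursor?"; the terminal answers with
    // ESC [ <row> ; <col> R.
    print!("\x1b[6n");
    io::stdout().flush()?;

    let mut reply = Vec::new();
    let mut byte = [0u8; 1];
    loop {
        io::stdin().read_exact(&mut byte)?;
        if byte[0] == b'R' {
            break;
        }
        reply.push(byte[0]);
    }
    unsafe { libc::tcsetattr(0, libc::TCSANOW, &old) };

    let reply = String::from_utf8_lossy(&reply); // e.g. "\x1b[12;3"
    let col: usize = reply.rsplit(';').next().and_then(|c| c.parse().ok()).unwrap_or(1);
    Ok(col)
}

fn main() -> io::Result<()> {
    // Print a test grapheme at column 1, then ask where the cursor ended up;
    // the difference is how many cells the terminal actually rendered.
    print!("\r\x1b[K\u{1F469}\u{200D}\u{1F52C}"); // woman-scientist ZWJ sequence
    io::stdout().flush()?;
    let width = cursor_column()? - 1;
    println!("\nrendered width: {} cells", width);
    Ok(())
}
```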

0 views
Corrode 6 days ago

Patterns for Defensive Programming in Rust

I have a hobby. Whenever I see a “this should never happen” comment in code, I try to find out the exact conditions under which it could happen. And in 90% of cases, I find a way to do just that. More often than not, the developer just hasn’t considered all edge cases or future code changes. In fact, the reason why I like this comment so much is that it often marks the exact spot where strong guarantees fall apart. The root cause is usually a violated implicit invariant that isn’t enforced by the compiler. Yes, the compiler prevents memory safety issues, and the standard library is best-in-class. But even the standard library has its warts, and bugs in business logic can still happen. All we can work with are hard-learned patterns for writing more defensive Rust code, learned throughout years of shipping Rust to production. I’m not talking about design patterns here, but rather small idioms which are rarely documented but make a big difference in overall code quality.

Take some innocent-looking code that first checks a vector’s length and then indexes into it. It works for now, but what if you refactor it and forget to keep the length check? That’s our first implicit invariant that isn’t enforced by the compiler. The problem is that indexing into a vector is decoupled from checking its length: these are two separate operations which can be changed independently without the compiler ringing the alarm. If we use slice pattern matching instead, we only get access to the element if the matching arm is executed. Note how this automatically uncovers one more edge case: what if the list is empty? We hadn’t considered this case before. Compiler-enforced pattern matching forces us to think about all possible states! This is a common pattern throughout robust Rust code: the attempt to put the compiler in charge of enforcing invariants.

When initializing an object with many fields, it’s tempting to use ..Default::default() to fill in the rest. In practice, this is a common source of bugs. You might forget to explicitly set a new field later when you add it to the struct (thus using the default value instead, which might not be what you want), or you might not be aware of all the fields that are being set to default values. Instead, spell out every field. Yes, it’s slightly more verbose, but what you gain is that the compiler will force you to handle all fields explicitly. Now when you add a new field to the struct, the compiler will remind you to set it here as well and to reflect on which value makes sense.

Let’s say you’re building a pizza ordering system and have an order type that stores what’s on the pizza plus a timestamp. For your order tracking system, you want to compare orders based on what’s actually on the pizza; the timestamp shouldn’t affect whether two orders are considered the same. The obvious approach is an equality check that compares just the current pizza fields, and here’s the problem with it. Imagine your team adds a field for customization options. Your implementation still compiles, but is it correct? Should the new field be part of the equality check? Probably yes - a pizza with extra cheese is a different order! But you’ll never know, because the compiler won’t remind you to think about it. The defensive approach is to destructure the struct inside the implementation (see the sketch below). Now when someone adds a field, this code won’t compile anymore. The compiler forces you to decide: should the field be included in the comparison, or explicitly ignored with an underscore? This pattern works for any trait implementation where you need to handle every struct field, and it’s especially valuable in codebases where structs evolve frequently as requirements change.
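Here is a minimal sketch of that destructuring approach. The type and field names (PizzaOrder, size, toppings, placed_at) are illustrative stand-ins, not the article’s exact example:

```rust
use std::time::SystemTime;

#[derive(Debug, Clone)]
struct PizzaOrder {
    size: u8,
    toppings: Vec<String>,
    placed_at: SystemTime, // deliberately excluded from equality
}

impl PartialEq for PizzaOrder {
    fn eq(&self, other: &Self) -> bool {
        // Destructure both sides completely. If a new field (say,
        // `customizations`) is added to PizzaOrder, these patterns stop
        // compiling until we decide whether the field belongs in the
        // comparison or should be explicitly ignored with `_`.
        let PizzaOrder { size, toppings, placed_at: _ } = self;
        let PizzaOrder { size: other_size, toppings: other_toppings, placed_at: _ } = other;
        size == other_size && toppings == other_toppings
    }
}
```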
Sometimes there’s no conversion that will work 100% of the time. That’s fine. When that’s the case, resist the temptation to offer a From implementation out of habit; use TryFrom instead. A classic case is a fallible conversion in disguise: a From implementation that silently substitutes a default value whenever the input doesn’t fit. The fallback is a hint that this conversion can fail in some way. We set a default value instead, but is that really the right thing to do for all callers? This should be a TryFrom implementation instead, making the fallible nature explicit. We fail fast instead of continuing with potentially flawed business logic.

It’s tempting to use match with a catch-all _ arm, but this can haunt you later. The problem is that you might forget to handle a new case that is added later. By spelling out all variants explicitly, the compiler will warn you when a new variant is added, forcing you to handle it. Another case of putting the compiler to work. If the code for two variants is the same, you can still group them in a single arm with the | pattern.

Using _ as a placeholder for unused variables can lead to confusion: you can lose track of which variable was skipped. That’s especially true for boolean flags, where a row of bare _ patterns gives no clue which flags were ignored and why. Better to use descriptive, underscore-prefixed names for the variables that are not used. Even if you don’t use them, it’s clear what they represent, and the code becomes more readable and easier to review without inline type hints.

If you only want your data to be mutable temporarily, make that explicit. This pattern is often called “temporary mutability” and helps prevent accidental modifications after initialization. See the Rust unofficial patterns book for more details.

Now, say you have a simple type and you want to make invalid states unrepresentable. One pattern is to return a Result from a validating constructor. But nothing stops someone from creating an instance of the struct directly, and that should not be possible! One way to prevent it is to mark the struct as non-exhaustive: now it cannot be instantiated directly outside of its module. However, what about the module itself? For that, you can add a hidden private field. Now the struct cannot be instantiated directly even inside the module; you have to go through the constructor, which enforces the validation logic.

The must_use attribute is often neglected. That’s sad, because it’s such a simple yet powerful mechanism to prevent callers from accidentally ignoring important return values. If someone creates such a value but forgets to use it, the compiler will warn them. This is especially useful for guard types that need to be held for their lifetime, and for results of operations that must be checked. The standard library uses it extensively: Result is marked must_use, which is why you get warnings if you don’t handle errors.

Boolean parameters make code hard to read at the call site and are error-prone. We all know the scenario where we’re sure this will be the last boolean parameter we’ll ever add to a function, and soon a call site carries a string of bare true/false arguments. It’s impossible to understand what such a call does without looking at the function signature. Even worse, it’s easy to accidentally swap the boolean values. Instead, use enums to make the intent explicit. This is much more readable, and the compiler will catch mistakes if you pass the wrong enum type. You’ll notice that the enum variants can be more descriptive than just true or false, and more often than not there are more than two meaningful options, especially for programs which grow over time.
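A small sketch of the enum-over-bool idea; the names here (Compression, Overwrite, copy_file) are invented for illustration:

```rust
// Instead of copy_file("a.txt", "b.txt", true, false) -- which bool is which? --
// give each option its own type.
enum Compression {
    Enabled,
    Disabled,
}

enum Overwrite {
    Allow,
    Refuse,
}

fn copy_file(src: &str, dst: &str, compression: Compression, overwrite: Overwrite) {
    // The call site now documents itself, and swapping the two options
    // is a type error instead of a silent bug.
    let _ = (src, dst, compression, overwrite);
}

fn main() {
    copy_file("notes.txt", "backup/notes.txt", Compression::Enabled, Overwrite::Refuse);
}
```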
For functions with many options, you can configure them using a parameter struct. This approach scales much better as your function evolves. Adding new parameters doesn’t break existing call sites, and you can easily add defaults or make certain fields optional. Preset methods also document common use cases and make it easy to pick the right configuration for different scenarios. Rust is often criticized for not having named parameters, but a parameter struct is arguably even better for larger functions with many options.

Many of these patterns can be enforced automatically using Clippy lints. You can enable the relevant lints in your project by adding them to your Cargo.toml or at the top of your crate (see the sketch at the end of this post).

Defensive programming in Rust is about leveraging the type system and compiler to catch bugs before they happen. By following these patterns, you can:

Make implicit invariants explicit and compiler-checked
Future-proof your code against refactoring mistakes
Reduce the surface area for bugs

It’s a skill that doesn’t come naturally, and it’s not covered in most Rust books, but knowing these patterns can make the difference between code that works but is brittle and code that is robust and maintainable for years to come. Remember: if you find yourself writing “this should never happen,” take a step back and ask how the compiler could enforce that invariant for you instead. The best bug is the one that never compiles in the first place.
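Picking up the Clippy suggestion above, here is one way to opt in at the crate root. The lints below are real Clippy lints that map onto these patterns, though not necessarily the exact set the article recommends:

```rust
// Crate-level lints (top of main.rs or lib.rs). Each one nudges toward a
// pattern discussed above; adjust warn/deny to taste.
#![warn(
    clippy::indexing_slicing,          // prefer pattern matching or .get() over v[i]
    clippy::unwrap_used,               // make "this should never happen" explicit
    clippy::wildcard_enum_match_arm,   // no catch-all `_` arm when matching enums
    clippy::fn_params_excessive_bools  // push toward enums instead of bool flags
)]

fn main() {}
```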

0 views