Posts in Javascript (20 found)

Introducing Showboat and Rodney, so agents can demo what they’ve built

A key challenge working with coding agents is having them both test what they’ve built and demonstrate that software to you, their overseer. This goes beyond automated tests - we need artifacts that show their progress and help us see exactly what the agent-produced software is able to do. I’ve just released two new tools aimed at this problem: Showboat and Rodney.

I recently wrote about how the job of a software engineer isn't to write code, it's to deliver code that works. A big part of that is proving to ourselves and to other people that the code we are responsible for behaves as expected. This becomes even more important - and challenging - as we embrace coding agents as a core part of our software development process. The more code we churn out with agents, the more valuable tools that reduce manual QA time become.

One of the most interesting things about the StrongDM software factory model is how they ensure that their software is well tested and delivers value despite their policy that "code must not be reviewed by humans". Part of their solution involves expensive swarms of QA agents running through "scenarios" to exercise their software. It's fascinating, but I don't want to spend thousands of dollars on QA robots if I can avoid it! I need tools that allow agents to clearly demonstrate their work to me, while minimizing the opportunities for them to cheat about what they've done.

Showboat is the tool I built to help agents demonstrate their work to me. It's a CLI tool (a Go binary, optionally wrapped in Python to make it easier to install) that helps an agent construct a Markdown document demonstrating exactly what their newly developed code can do. It's not designed for humans to run, but here's what the result looks like if you open it up in VS Code and preview the Markdown. Here's that demo.md file in a Gist.
So a sequence of Showboat commands constructs a Markdown document one section at a time, with the output of those commands automatically added to the document directly following the commands that were run. The image command is a little special - it looks for a file path to an image in the output of the command, copies that image to the current folder, and references it in the file. That's basically the whole thing! There's a command to remove the most recently added section if something goes wrong, a command to re-run the document and check nothing has changed (I'm not entirely convinced by the design of that one) and a command that reverse-engineers the CLI commands that were used to create the document. It's pretty simple - just 172 lines of Go.

I packaged it up with my go-to-wheel tool, which means you can run it without even installing it first. The help output is really important: it's designed to provide a coding agent with everything it needs to know in order to use the tool. Here's that help text in full. This means you can pop open Claude Code, point it at that help text, and set it to work. The text acts a bit like a Skill: your agent can read the help text and use every feature of Showboat to create a document that demonstrates whatever it is you need demonstrated.

Here's a fun trick: if you set Claude off to build a Showboat document you can pop that open in VS Code and watch the preview pane update in real time as the agent runs through the demo. It's a bit like having your coworker talk you through their latest work in a screensharing session.

And finally, some examples. Here are documents I had Claude create using Showboat to help demonstrate features I was working on in other projects: row-state-sql CLI Demo shows a new command I added to my sqlite-history-json project, and Change grouping with Notes demonstrates another feature where groups of changes within the same transaction can have a note attached to them. (The full list appears at the end of this post.)
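The core pattern is small enough to sketch. This is not Showboat's actual implementation (that's 172 lines of Go), just the idea it describes: each command run gets appended to the document together with its captured output, so the demo cannot drift from what actually happened.

```python
import subprocess

FENCE = "`" * 3  # a markdown code fence

def exec_section(doc_path: str, cmd: str) -> str:
    """Append a shell command and its captured output to a Markdown doc.

    A minimal sketch of the pattern described above, not the real tool:
    the document grows one section at a time, and the output always sits
    directly below the command that produced it.
    """
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    with open(doc_path, "a", encoding="utf-8") as f:
        f.write(f"{FENCE}\n$ {cmd}\n{result.stdout}{FENCE}\n\n")
    return result.stdout

out = exec_section("demo.md", "echo hello from the demo")
```

Because the document is plain Markdown, a preview pane (for example in VS Code) updates live as sections are appended - which is exactly the "watch the agent run through the demo" trick described below.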
I've now used Showboat often enough that I've convinced myself of its utility. (I've also seen agents cheat! Since the demo file is Markdown the agent will sometimes edit that file directly rather than using Showboat, which could result in command outputs that don't reflect what actually happened. Here's an issue about that.)

Many of the projects I work on involve web interfaces. Agents often build entirely new pages for these, and I want to see those represented in the demos. Showboat's image feature was designed to allow agents to capture screenshots as part of their demos, originally using my shot-scraper tool or Playwright. The Showboat format benefits from CLI utilities, so I went looking for good options for managing a multi-turn browser session from a CLI. I came up short, so I decided to try building something new.

Claude Opus 4.6 pointed me to the Rod Go library for interacting with the Chrome DevTools protocol. It's fantastic - it provides a comprehensive wrapper across basically everything you can do with automated Chrome, all in a self-contained library that compiles to a few MBs. All Rod was missing was a CLI. I built the first version as an asynchronous report prototype, which convinced me it was worth spinning out into its own project. I called it Rodney as a nod to the Rod library it builds on and a reference to Only Fools and Horses - and because the package name was available on PyPI.

You can run Rodney without installing it first, or install it properly (or grab a Go binary from the releases page). As with Showboat, this tool is not designed to be used by humans! The goal is for coding agents to be able to run the help command and see everything they need to know to start using the tool. You can see that help output in the GitHub repo.
Here are three demonstrations of Rodney that I created using Showboat - see the list at the end of this post.

After being a career-long skeptic of the test-first, maximum test coverage school of software development (I prefer tests-included development) I've recently come around to test-first processes as a way to force agents to write only the code that's necessary to solve the problem at hand. Many of my Python coding agent sessions start the same way: by telling the agent how to run the tests. That doubles as an indicator that tests on this project exist and matter. Agents will read existing tests before writing their own, so having a clean test suite with good patterns makes it more likely they'll write good tests of their own. The frontier models all understand that "red/green TDD" means they should write the test first, run it and watch it fail, and then write the code to make it pass - it's a convenient shortcut. I find this greatly increases the quality of the code and the likelihood that the agent will produce the right thing with the fewest prompts to guide it.

But anyone who's worked with tests will know that just because the automated tests pass doesn't mean the software actually works! That’s the motivation behind Showboat and Rodney - I never trust any feature until I’ve seen it running with my own eyes. Before building Showboat I'd often add a “manual” testing step to my agent sessions.

Both Showboat and Rodney started life as Claude Code for web projects created via the Claude iPhone app. Most of the ongoing feature work for them happened in the same way. I'm still a little startled at how much of my coding work I get done on my phone now, but I'd estimate that the majority of code I ship to GitHub these days was written for me by coding agents driven via that iPhone app. I initially designed these two tools for use in asynchronous coding agent environments like Claude Code for the web. So far that's working out really well.
You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options.

In this post:

- Proving code actually works
- Showboat: Agents build documents to demo their work
- Rodney: CLI browser automation designed to work with Showboat
- Test-driven development helps, but we still need manual testing
- I built both of these tools on my phone

Showboat examples:

- shot-scraper: A Comprehensive Demo runs through the full suite of features of my shot-scraper browser automation tool.
- sqlite-history-json CLI demo demonstrates the CLI feature I added to my new sqlite-history-json Python library.
- row-state-sql CLI Demo shows a new command I added to that same project.
- Change grouping with Notes demonstrates another feature where groups of changes within the same transaction can have a note attached to them.
- krunsh: Pipe Shell Commands to an Ephemeral libkrun MicroVM is a particularly convoluted example where I managed to get Claude Code for web to run a libkrun microVM inside a QEMU emulated Linux environment inside the Claude gVisor sandbox.

Rodney demonstrations:

- Rodney's original feature set, including screenshots of pages and executing JavaScript.
- Rodney's new accessibility testing features, built during development of those features to show what they could do.
- Using those features to run a basic accessibility audit of a page. I was impressed at how well Claude Opus 4.6 responded to the prompt "Use showboat and rodney to perform an accessibility audit of https://latest.datasette.io/fixtures" - transcript here.

Armin Ronacher 2 days ago

A Language For Agents

Last year I first started thinking about what the future of programming languages might look like now that agentic engineering is a growing thing. Initially I felt that the enormous corpus of pre-existing code would cement existing languages in place, but now I’m starting to think the opposite is true. Here I want to outline my thinking on why we are going to see more new programming languages and why there is quite a bit of space for interesting innovation. And just in case someone wants to start building one, here are some of my thoughts on what we should aim for!

Does an agent perform dramatically better on a language that it has in its weights? Obviously yes. But there are less obvious factors that affect how good an agent is at programming in a language: how good the tooling around it is, and how much churn there is. Zig seems underrepresented in the weights (at least in the models I’ve used) and is also changing quickly. That combination is not optimal, but it’s still passable: you can program even in the upcoming Zig version if you point the agent at the right documentation. But it’s not great. On the other hand, some languages are well represented in the weights but agents still don’t succeed as much because of tooling choices. Swift is a good example: in my experience the tooling around building a Mac or iOS application can be so painful that agents struggle to navigate it. Also not great.

So, just because a language exists in the weights doesn’t mean the agent succeeds, and just because it’s new doesn’t mean the agent is going to struggle. I’m convinced that you can build yourself up to a new language if you don’t want to depart from everything all at once.

The biggest reason new languages might work is that the cost of coding is going down dramatically. The result is that the breadth of an ecosystem matters less. I’m now routinely reaching for JavaScript in places where I would have used Python.
Not because I love it or the ecosystem is better, but because the agent does much better with TypeScript. The way to think about this: if important functionality is missing in my language of choice, I just point the agent at a library from a different language and have it build a port. As a concrete example, I recently built an Ethernet driver in JavaScript to implement the host controller for our sandbox. Implementations exist in Rust, C, and Go, but I wanted something pluggable and customizable in JavaScript. It was easier to have the agent reimplement it than to make the build system and distribution work against a native binding.

New languages will work if their value proposition is strong enough and they evolve with knowledge of how LLMs train. People will adopt them despite their being underrepresented in the weights. And if they are designed to work well with agents, then they might be designed around familiar syntax that is already known to work well.

So why would we want a new language at all? The reason this is interesting to think about is that many of today’s languages were designed with the assumption that punching keys is laborious, so we traded certain things for brevity. As an example, many languages — particularly modern ones — lean heavily on type inference so that you don’t have to write out types. The downside is that you now need an LSP or the resulting compiler error messages to figure out what the type of an expression is. Agents struggle with this too, and it’s also frustrating in pull request review, where complex operations can make it very hard to figure out what the types actually are. Fully dynamic languages are even worse in that regard.

The cost of writing code is going down, but because we are also producing more of it, understanding what the code does is becoming more important. We might actually want more code to be written if it means there is less ambiguity when we perform a review.
I also want to point out that we are heading towards a world where some code is never seen by a human and is only consumed by machines. Even in that case, we still want to give an indication to a user, who is potentially a non-programmer, about what is going on. We want to be able to explain to a user what the code will do without going into the details of how. So the case for a new language comes down to: given the fundamental changes in who is programming and what the cost of code is, we should at least consider one.

It’s tricky to say what an agent wants, because agents will lie to you and they are influenced by all the code they’ve seen. But one way to estimate how they are doing is to look at how many changes they have to perform on files and how many iterations they need for common tasks. There are some things I’ve found that I think will be true for a while.

The language server protocol lets an IDE infer information about what’s under the cursor or what should be autocompleted based on semantic knowledge of the codebase. It’s a great system, but it comes at one specific cost that is tricky for agents: the LSP has to be running. There are situations when an agent just won’t run the LSP — not because of technical limitations, but because it’s also lazy and will skip that step if it doesn’t have to. If you give it an example from documentation, there is no easy way to run the LSP because it’s a snippet that might not even be complete. If you point it at a GitHub repository and it pulls down individual files, it will just look at the code. It won’t set up an LSP for type information. A language that doesn’t split into two separate experiences (with-LSP and without-LSP) will be beneficial to agents because it gives them one unified way of working across many more situations.

It pains me as a Python developer to say this, but whitespace-based indentation is a problem.
The underlying token efficiency of getting whitespace right is tricky, and a language with significant whitespace is harder for an LLM to work with. This is particularly noticeable if you try to make an LLM do surgical changes without an assisted tool. Quite often they will intentionally disregard whitespace, add markers to enable or disable code, and then rely on a code formatter to clean up indentation later. On the other hand, braces that are not separated by whitespace can cause issues too. Depending on the tokenizer, runs of closing parentheses can end up split into tokens in surprising ways (a bit like the “strawberry” counting problem), and it’s easy for an LLM to get Lisp or Scheme wrong because it loses track of how many closing parentheses it has already emitted or is looking at. Fixable with future LLMs? Sure, but it was also hard for humans to get right without tooling.

Readers of this blog might know that I’m a huge believer in async locals and flow execution context — basically the ability to carry data through every invocation that might only be needed many layers down the call chain. Working at an observability company has really driven home the importance of this for me. The challenge is that anything that flows implicitly might not be configured. Take for instance the current time. You might want to implicitly pass a timer to all functions. But what if a timer is not configured and all of a sudden a new dependency appears? Passing all of it explicitly is tedious for both humans and agents, and bad shortcuts will be made.

One thing I’ve experimented with is having effect markers on functions that are added through a code formatting step. A function can declare that it needs the current time or the database, but if it doesn’t mark this explicitly, it’s essentially a linting warning that auto-formatting fixes.
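Python already offers a limited form of this flow execution context via the standard-library contextvars module. A sketch of the idea (the names here are illustrative, not from the post): a value set near the top of a call chain is visible many layers down without being threaded through every signature.

```python
from contextvars import ContextVar

# Illustrative: a request-scoped value that flows implicitly down the
# call chain instead of appearing in every function signature.
current_user: ContextVar[str] = ContextVar("current_user", default="anonymous")

def log(message: str) -> str:
    # Many layers down, the context is still available without a parameter.
    return f"[{current_user.get()}] {message}"

def handle_request(user: str) -> str:
    token = current_user.set(user)
    try:
        return log("request handled")
    finally:
        # Restore the previous value so state never leaks between requests.
        current_user.reset(token)

print(handle_request("alice"))  # [alice] request handled
print(log("idle"))              # [anonymous] idle
```

The "not configured" problem the author describes is visible here too: the default value silently papers over a missing configuration, which is exactly what explicit effect markers would surface.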
The LLM can start using something like the current time in a function and any existing caller gets the warning; formatting propagates the annotation. This is nice because when the LLM builds a test, it can precisely mock out these side effects — it understands from the error messages what it has to supply.

Agents struggle with exceptions; they are afraid of them. I’m not sure to what degree this is solvable with RL (reinforcement learning), but right now agents will try to catch everything they can, log it, and do a pretty poor recovery. Given how little information is actually available about error paths, that makes sense. Checked exceptions are one approach, but they propagate all the way up the call chain and don’t dramatically improve things. Even if they end up as hints where a linter tracks which errors can fly by, there are still many call sites that need adjusting. And like the auto-propagation proposed for context data, it might not be the right solution. Maybe the right approach is to go more in on typed results, but that’s still tricky for composability without a type and object system that supports it.

The general approach agents use today to read files into memory is line-based, which means they often pick chunks that span multi-line strings. One easy way to see this fall apart: have an agent work on a 2000-line file that also contains long embedded code strings — basically a code generator. The agent will sometimes edit within a multi-line string, assuming it’s the real code when it’s actually just embedded code. For multi-line strings, the only language I’m aware of with a good solution is Zig, but its prefix-based syntax is pretty foreign to most people.

Reformatting also often causes constructs to move to different lines. In many languages, trailing commas in lists are either not supported (JSON) or not customary.
If you want diff stability, you’d aim for a syntax that requires less reformatting and mostly avoids multi-line constructs.

What’s really nice about Go is that you mostly cannot import symbols from another package into scope without every use being prefixed with the package name (e.g. a call reads fmt.Println rather than a bare Println). There are escape hatches (import aliases and dot-imports), but they’re relatively rare and usually frowned upon. That dramatically helps an agent understand what it’s looking at. In general, making code findable through the most basic tools is great — it works with external files that aren’t indexed, and it means fewer false positives for large-scale automation driven by code generated on the fly.

Much of what I’ve said boils down to: agents really like local reasoning. They want things to work in parts, because they often work with just a few loaded files in context and don’t have much spatial awareness of the codebase. They rely on external tooling like grep to find things, and anything that’s hard to grep or that hides information elsewhere is tricky.

What makes agents fail or succeed in many languages is just how good the build tools are. Many languages make it very hard to determine what actually needs to rebuild or be retested because there are too many cross-references. Go is really good here: it forbids circular dependencies between packages (import cycles), packages have a clear layout, and test results are cached.

Agents often struggle with macros. It was already pretty clear that humans struggle with macros too, but the argument for them was mostly that code generation was a good way to have less code to write. Since that is less of a concern now, we should aim for languages with less dependence on macros. There’s a separate question about generics and comptime. I think they fare somewhat better because they mostly generate the same structure with different placeholders, and it’s much easier for an agent to understand that.
Related to greppability: agents often struggle to understand barrel files, and they don’t like them. Not being able to quickly figure out where a class or function comes from leads to imports from the wrong place, or missing things entirely and wasting context by reading too many files. A one-to-one mapping from where something is declared to where it’s imported from is great. And it does not have to be overly strict either. Go kind of goes this way, but not too extreme: any file within a directory can define a function, which isn’t optimal, but it’s quick enough to find and you don’t need to search too far. It works because packages are forced to be small enough to find everything with grep. The worst case is free re-exports all over the place that completely decouple the implementation from any trivially reconstructable location on disk. Or worse: aliasing. Agents often hate it when aliases are involved. In fact, you can even get them to complain about it in thinking blocks if you let them refactor something that uses lots of aliases. Ideally a language encourages good naming and discourages aliasing at import time.

Nobody likes flaky tests, but agents like them even less. Ironic, given how good agents are at creating flaky tests in the first place. That’s because agents currently love to mock and most languages do not support mocking well, so many tests end up accidentally not being concurrency safe or depend on development environment state that then diverges in CI or production. Most programming languages and frameworks make it much easier to write flaky tests than non-flaky ones, because they encourage indeterminism everywhere.

In an ideal world the agent has one command that lints and compiles and tells it whether everything worked out fine. Maybe another command to run all tests that need running. In practice most environments don’t work like this. For instance, in TypeScript you can often run the code even though it fails type checks.
That can gaslight the agent. Likewise, different bundler setups can cause one thing to succeed just for a slightly different setup in CI to fail later. The more uniform the tooling the better. Ideally it either runs or doesn’t, and there is mechanical fixing for as many linting failures as possible so that the agent does not have to do it by hand.

Will we actually see new languages? I think we will. We are writing more software now than we ever have — more websites, more open source projects, more of everything. Even if the ratio of new languages stays the same, the absolute number will go up. But I also truly believe that many more people will be willing to rethink the foundations of software engineering and the languages we work with. That’s because while for some years it has felt like you need to build a lot of infrastructure for a language to take off, now you can target a rather narrow use case: make sure the agent is happy, and extend from there to the human.

I just hope we see two things. First, some outsider art: people who haven’t built languages before trying their hand at it and showing us new things. Second, a much more deliberate effort to document what works and what doesn’t from first principles. We have actually learned a lot about what makes good languages and how to scale software engineering to large teams. Yet a written-down, consumable overview of good and bad language design is very hard to come by. Too much of it has been shaped by opinion on rather pointless things instead of hard facts. Now, though, we are slowly getting to the point where facts matter more, because you can actually measure what works by seeing how well agents perform with it. No human wants to be subject to surveys, but agents don’t care. We can see how successful they are and where they are struggling.
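One way a language or framework can discourage the indeterminism behind flaky tests is to make side effects injectable, so a test controls them instead of patching globals. A sketch in Python (illustrative code, not from the post) with an injected clock: the test advances time by hand instead of sleeping, so it can never be timing-flaky.

```python
from typing import Callable, List

def make_rate_limiter(limit: int, window: float,
                      clock: Callable[[], float]) -> Callable[[], bool]:
    """Allow up to `limit` calls per `window` seconds.

    The clock is an injected dependency, so a test can supply a fake one
    and advance time deterministically -- no sleeping, no flakiness.
    """
    calls: List[float] = []

    def allow() -> bool:
        now = clock()
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= window:
            calls.pop(0)
        if len(calls) < limit:
            calls.append(now)
            return True
        return False

    return allow

# Deterministic "test": fake time we control explicitly.
fake_now = [0.0]
allow = make_rate_limiter(limit=2, window=10.0, clock=lambda: fake_now[0])
print(allow(), allow(), allow())  # True True False
fake_now[0] = 11.0
print(allow())                    # True
```

The same shape works for any ambient effect the post mentions (current time, database access): once the effect is a parameter, mocking it precisely in tests is trivial.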

Den Odell 2 days ago

Fast by Default

After 25 years building sites for global brands, I kept seeing the same pattern appear. A team ships new features, users quietly begin to struggle, and only later do the bug reports start trickling in. Someone finally checks the metrics, panic spreads, and feature development is put on hold so the team can patch problems already affecting thousands of people. The fixes help for a while, but a month later another slowdown appears and the cycle begins again. The team spends much of its time firefighting instead of building. I call this repeating sequence of ship, complain, panic, patch the Performance Decay Cycle. Sadly, it’s the default state for many teams and it drains morale fast. There has to be a better way.

When I stepped into tech-lead roles, I started experimenting. What if performance was something we protected from the start rather than something we cleaned up afterward? What if the entire team shared responsibility instead of relying on a single performance-minded engineer to swoop in and fix things? And what if the system itself made performance visible early, long before issues hit production? Across several teams and many iterations, a different pattern began to emerge. I now call it Fast by Default.

Fast by Default is the practice of embedding performance into every stage of development so speed becomes the natural outcome, not a late rescue mission. It involves everyone in the team, not just engineers.

Most organizations treat performance as something to address when it hurts, or they schedule a bug-fix sprint every few months. Both approaches are expensive, unreliable, and almost always too late. By the time a slowdown is noticeable, the causes are already baked into the rendering strategy, the data-fetching sequence, and the component boundaries. These decisions define a ceiling on how fast your system can ever be. You can tune within that ceiling, but without a rebuild, you can’t break through it. Meanwhile, the baseline slowly drifts.
Slow builds and sluggish interactions become expected. What felt unacceptable in week 1 feels normal by month 6. And once a feature ships, the attention shifts. Performance work competes with new ideas and roadmap pressure. Most teams never return to clean things up.

Performance regressions rarely announce themselves through one dramatic failure. They accumulate quietly, through dozens of reasonable decisions. A feature adds a little more JavaScript, a new dependency brings a hidden transitive load, and a design tweak introduces layout movement. A single page load still feels fine, but interactions begin to feel heavier. More features are added, more code ships, and slowly the slow path becomes the normal path. It shows up most clearly at the dependency level: each import made sense in isolation and passed through code review. No single decision broke the experience; the combination did.

This is why prevention always beats the cure. If you want to avoid returning to a culture of whack-a-mole fixes, you need to change the incentives so fast outcomes happen naturally. The core idea is simple: make the fast path easier than the slow path. Once you do that, performance stops depending on vigilance or heroics. You create systems and workflows that quietly pull the team toward fast decisions without friction.

Here’s what this looks like day-to-day. If your starting point is a client-rendered SPA, you’re already fighting uphill. Server-first rendering with selective hydration (often called the Islands Architecture) gives you a performance margin that doesn’t require constant micro-optimization to maintain. It also helps clarify how much of your SPA truly needs to be a SPA. When dependency size appears directly in your IDE, bundle size and budget checks run automatically in CI, and hydration warnings surface in local development, developers see the cost of their changes immediately and fix issues while the context is still fresh.
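A CI budget check of the kind mentioned here can be a very small script. This sketch uses hypothetical bundle names and byte limits (real projects would read them from config and fail the build on violations):

```python
import os
import tempfile

# Hypothetical per-bundle budgets in bytes -- illustrative, not from the article.
BUDGETS = {"app.js": 170_000, "vendor.js": 250_000}

def check_budgets(dist_dir: str) -> list:
    """Return human-readable violations; an empty list means the build passes."""
    failures = []
    for name, limit in BUDGETS.items():
        size = os.path.getsize(os.path.join(dist_dir, name))
        if size > limit:
            failures.append(f"{name}: {size:,} bytes exceeds budget of {limit:,}")
    return failures

# Demo against a fake build output directory.
dist = tempfile.mkdtemp()
with open(os.path.join(dist, "app.js"), "wb") as f:
    f.write(b"x" * 120_000)   # within budget
with open(os.path.join(dist, "vendor.js"), "wb") as f:
    f.write(b"x" * 300_000)   # over budget
print(check_budgets(dist))    # reports the vendor.js violation
```

The point is the incentive structure: because the check runs on every pull request, a regression surfaces while the change is still fresh, not months later in a metrics review.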
Reaching for utility-first libraries, choosing smaller dependencies, and cultivating a culture where the first question is "do we need this?" rather than "why not?" keeps complexity from compounding. When reviewers consistently ask how a change affects render time or memory pressure, the entire team levels up. The question becomes part of the craft rather than an afterthought, and eventually it appears in every pull request.

Teams that stay fast don’t succeed because they have more performance experts; they succeed because they distribute ownership. Designers think about layout stability, product managers scope work with speed in mind, and engineers treat performance budgets as part of correctness rather than a separate concern. Everyone understands that shipping fast code is as important as shipping correct code. For this to work, regressions need to surface early. That requires continuous measurement, clear ownership, and tooling that highlights problems before users do. Once the system pulls in the right direction with minimal resistance, performance becomes self-sustaining.

A team with fast defaults ships fast software in month 1, and they’re still shipping fast software in month 12 and month 36 because small advantages accumulate in their favor. A team living in the Performance Decay Cycle may start with acceptable performance, but by month 12 they find themselves planning a dedicated performance sprint, and by month 36 they’re discussing a rewrite. The difference isn’t expertise or effort; it’s the approach they started from.

Speed is leverage because it builds trust, sharpens design, and accelerates development. Once you lose it, you lose more than seconds: you lose users, revenue, and confidence in your own system. Fast by Default is how teams break this cycle and build systems that stay fast as they grow. For more on this model, see https://fastbydefault.com.
<small>This article was first published on 4 December 2025 at https://calendar.perfplanet.com/2025/fast-by-default/</small>

Simon Willison 4 days ago

Running Pydantic's Monty Rust sandboxed Python subset in WebAssembly

There's a jargon-filled headline for you! Everyone's building sandboxes for running untrusted code right now, and Pydantic's latest attempt, Monty, provides a custom Python-like language (a subset of Python) in Rust and makes it available as both a Rust library and a Python package. I got it working in WebAssembly, providing a sandbox-in-a-sandbox.

Here's how they describe Monty:

Monty avoids the cost, latency, complexity and general faff of using a full container-based sandbox for running LLM generated code. Instead, it lets you safely run Python code written by an LLM embedded in your agent, with startup times measured in single digit microseconds, not hundreds of milliseconds.

A quick way to try it out is via uv. Then paste the example code into the Python interactive prompt (run in a mode that enables top-level await). Monty supports a very small subset of Python - it doesn't even support class declarations yet! But, given its target use-case, that's not actually a problem. The neat thing about providing tools like this for LLMs is that they're really good at iterating against error messages. A coding agent can run some Python code, get an error message telling it that classes aren't supported, and then try again with a different approach.

I wanted to try this in a browser, so I fired up a code research task in Claude Code for web and kicked it off with the following:

Clone https://github.com/pydantic/monty to /tmp and figure out how to compile it into a python WebAssembly wheel that can then be loaded in Pyodide. The wheel file itself should be checked into the repo along with build scripts and passing pytest playwright test scripts that load Pyodide from a CDN and the wheel from a “python -m http.server” localhost and demonstrate it working

Then a little later:

I want an additional WASM file that works independently of Pyodide, which is also usable in a web browser - build that too along with playwright tests that show it working.
Also build two HTML files - one called demo.html and one called pyodide-demo.html - these should work similar to https://tools.simonwillison.net/micropython (download that code with curl to inspect it) - one should load the WASM build, the other should load Pyodide and have it use the WASM wheel. These will be served by GitHub Pages so they can load the WASM and wheel from a relative path since the .html files will be served from the same folder as the wheel and WASM file Here's the transcript , and the final research report it produced. I now have the Monty Rust code compiled to WebAssembly in two different shapes - as a bundle you can load and call from JavaScript, and as a wheel file which can be loaded into Pyodide and then called from Python in Pyodide in WebAssembly in a browser. Here are those two demos, hosted on GitHub Pages: As a connoisseur of sandboxes - the more options the better! - this new entry from Pydantic ticks a lot of my boxes. It's small, fast, widely available (thanks to Rust and WebAssembly) and provides strict limits on memory usage, CPU time and access to disk and network. It was also a great excuse to spin up another demo showing how easy it is these days to turn compiled code like C or Rust into WebAssembly that runs in both a browser and a Pyodide environment. You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options . Run a reasonable subset of Python code - enough for your agent to express what it wants to do Completely block access to the host environment: filesystem, env variables and network access are all implemented via external function calls the developer can control Call functions on the host - only functions you give it access to [...] Monty WASM demo - a UI over JavaScript that loads the Rust WASM module directly. 
Monty Pyodide demo - this one provides an identical interface but here the code is loading Pyodide and then installing the Monty WASM wheel .


Date Arithmetic in Bash

Date and time management libraries in many programming languages are famously bad. Python's datetime module comes to mind as one of the best (worst?) examples, and so does JavaScript's Date class . It feels like these libraries could not have been made worse on purpose, or so I thought until today, when I needed to implement some date calculations in a backup rotation script written in bash. So, if you wanted to learn how to perform date and time arithmetic in your bash scripts, you've come to the right place. Just don't blame me for the nightmares.
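As a taste of what the post covers, here is a sketch of the two GNU date(1) idioms that make most bash date arithmetic tractable: relative `-d` strings, and epoch-seconds math. The backup date and the 30-day threshold are made up for illustration, and note that BSD/macOS `date` uses different flags (`-v`) entirely:

```shell
# Relative date strings: GNU date evaluates "-d" expressions for you.
week_ago=$(date -d "7 days ago" +%Y-%m-%d)
echo "one week ago was $week_ago"

# Epoch-seconds math: convert to seconds, then use plain shell arithmetic.
backup_date="2026-01-15"   # illustrative backup timestamp
backup_ts=$(date -d "$backup_date" +%s)
now_ts=$(date +%s)
age_days=$(( (now_ts - backup_ts) / 86400 ))

# A backup-rotation style check (threshold is an example):
if [ "$age_days" -gt 30 ]; then
  echo "backup from $backup_date is older than 30 days; rotating"
fi
```

The epoch-seconds trick sidesteps most calendar weirdness (month lengths, leap years) by letting `date` do the conversion in both directions.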

Circus Scientist 1 week ago

Failure is part of Success

Today I want to talk about failure – specifically where it applies to my work on SmartPoi/Magic Poi, with Cecli AI assistant. A little while back I decided to add Timeline mode to SmartPoi – and SmartPoi_Controls. This required code updates to:

- Magic Poi website (downloading of Timelines)
- SmartPoi firmware (both ESP32 AND Arduino versions), and SmartPoi_Controls Android App

This did not succeed. I literally spent 2 weeks on this, but as I am sure I mentioned before, SmartPoi code is based on a project that is now 12 years old. It works, but is very difficult to change. The changes required to add Timeline functionality were threatening to break some of that stability.

After looking at Timelines, I realised that actually I did not have to re-write the whole Timeline thing into SmartPoi firmware – I already had a working base in Magic Poi, re-written from scratch and with the integration with the Magic Poi website built-in. The solution: I have started polishing the firmware up and am going to release it as Open Source.

UDP mode on ESP32 just doesn't work, sorry. I am starting to suspect it is an issue with the UDP library and not my code. I have a lot of work to do here – if the library is not working, as I suspect, I will have to prove that for myself by creating a simple UDP client-server pair of apps and troubleshoot that, without all of the complexity of SmartPoi happening around it. Estimated time: at least 2 days, but if it's really broken there may actually be no solution? I don't like to say something is impossible – I am willing to spend another two weeks looking at this to fix it at some point!

I am currently obsessed with getting the "Perfect Sync" between poi. This is made more difficult by my "Virtual Poi" – which I am determined to sync with the "Real" poi. I actually did get this working, but now the "Virtual Poi" do not sync with one another! So far finishing this feature is proving very difficult due to the sheer number of services and functions that have to work together. I have an MQTT server, a Flask back-end, JavaScript and C++ firmware to handle. Every change needs to be uploaded to poi and server, then tested in real-world situations. The good news is, when this is working, we will have something I have never achieved before – and it's a unique feature for LED Poi! Real-time, perfect sync over the internet. I just need time to fail and try again and it will work eventually.

Looking forward to releasing Magic Poi firmware asap – while also fixing the few bugs we still have with the SmartPoi_Controls app. Many thanks to my Patreon supporters – special shout out to Flavio over in Brazil who is doing incredible things with the hardware side of things! Follow me on Patreon. Check out Flavio's work on Instagram. The post Failure is part of Success appeared first on Circus Scientist.

Jim Nielsen 1 week ago

The Browser’s Little White Lies

So I'm making a thing and I want it to be styled differently if the link's been visited. Rather than build something myself in JavaScript, I figure I'll just hook into the browser's mechanism for tracking if a link's been visited (a sensible approach, if I do say so myself). Why write JavaScript when a little CSS will do? So I craft this: But it doesn't work.

is relatively new, and I've been known to muff it, so it's probably just a syntax issue. I start researching. Wouldn't you know it? We can't have nice things. doesn't always work like you'd expect because we (not me, mind you) exploited it. Here's MDN:

You can style visited links, but there are limits to which styles you can use.

While is not mentioned specifically, other tricks like sibling selectors are:

When using a sibling selector, such as , the adjacent element ( in this example) is styled as though the link were unvisited.

Why? You guessed it. Security and privacy reasons. If it were not so, somebody could come along with a little JavaScript and uncover a user's browsing history (imagine, for example, setting styles for visited and unvisited links, then using and checking style computations). MDN says browsers tell little white lies:

To preserve users' privacy, browsers lie to web applications under certain circumstances

So, from what I can tell, when I write the browser is telling the engine that handles styling that all items have never been (even if they have been). So where does that leave me? Now I will abandon CSS and go use JavaScript for something only JavaScript can do. That's a good reason for JS.

Reply via: Email · Mastodon · Bluesky
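The post stops at "use JavaScript", but the shape of that fallback is easy to sketch: track visited URLs yourself in localStorage instead of asking CSS. Everything here — the storage key and the helper names — is made up for illustration, not Jim's actual code:

```javascript
// Track "visited" state ourselves, since the browser deliberately lies
// about :visited in style computations. Storage key is illustrative.
const KEY = "visited-links";

function loadVisited(storage) {
  // Parse the stored JSON array (or start empty) into a Set.
  return new Set(JSON.parse(storage.getItem(KEY) || "[]"));
}

function markVisited(storage, href) {
  const visited = loadVisited(storage);
  visited.add(href);
  storage.setItem(KEY, JSON.stringify([...visited]));
}

function isVisited(storage, href) {
  return loadVisited(storage).has(href);
}
```

On click you'd call `markVisited(window.localStorage, link.href)` and toggle a class on links where `isVisited(...)` returns true. No white lies here: the browser only protects its own history database, not state you recorded yourself.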


Coding Agent VMs on NixOS with microvm.nix

I have come to appreciate coding agents as valuable tools for working with computer program code in any capacity, such as learning about a program's architecture, diagnosing bugs or developing proofs of concept. Depending on the use-case, reviewing each command the agent wants to run can get tedious and time-consuming very quickly. To safely run a coding agent without review, I wanted a Virtual Machine (VM) solution where the agent has no access to my personal files and where it's no big deal if the agent gets compromised by malware: I can just throw away the VM and start over. Instead of setting up a stateful VM and re-installing it when needed (ugh!), I prefer the model of ephemeral VMs where nothing persists on disk, except for what is explicitly shared with the host. The microvm.nix project makes it easy to create such VMs on NixOS, and this article shows you how I like to set up my VMs.

If you haven't heard of NixOS before, check out the NixOS Wikipedia page and nixos.org. I spoke about why I switched to Nix in 2025 and have published a few blog posts about Nix. For understanding the threat model of AI agents, read Simon Willison's "The lethal trifecta for AI agents: private data, untrusted content, and external communication" (June 2025). This article's approach to working with the threat model is to remove the "private data" part from the equation. If you want to learn about the whole field of sandboxing, check out Luis Cardoso's "A field guide to sandboxes for AI" (Jan 2026). I will not be comparing different solutions in this article; I will just show you one possible path. And lastly, maybe you're not in the mood to build/run sandboxing infrastructure yourself. Good news: sandboxing is a hot topic and there are many commercial offerings popping up that address this need. For example, David Crawshaw and Josh Bleecher Snyder (I know both from the Go community) recently launched exe.dev, an agent-friendly VM hosting service.
Another example is Fly.io, who launched Sprites. Let's jump right in! The next sections walk you through how I set up my config.

First, I created a new bridge which uses as IP address range and NATs out of the network interface. All interfaces will be added to that bridge: Then, I added the module as a new input to my (check out the microvm.nix documentation for details) and enabled the module on the NixOS configuration for my PC (midna). I also created a new file, in which I declare all my VMs. Here's what my looks like: The following declares two microvms, one for Emacs (about which I wanted to learn more) and one for Go Protobuf, a code base I am familiar with and can use to understand Claude's capabilities: The module takes these parameters and declares: in turn pulls in , which sets up home-manager to: The makes available a bunch of required and convenient packages:

Let's create the workspace directory and create an SSH host key: Now we can start the VM: It boots and responds to pings within a few seconds. Then, SSH into the VM (perhaps in a session) and run Claude (or your Coding Agent of choice) without permission prompts in the shared workspace directory: This is what running Claude in such a setup looks like:

Going through the process of setting up a MicroVM by hand becomes tedious after the first time. I was curious if Claude Skills could help with a task like this. Skills are markdown files that instruct Claude to do certain steps in certain situations. I created as follows: When using this skill with Claude Code (tested versions: v2.0.76 and v2.1.15), with the Opus 4.5 model, I can send a prompt like this:

please set up a microvm for Debian Code Search (dcs). see ~/dcs for the source code (but clone from https://github.com/Debian/dcs )

Now Claude churns for a few minutes, possibly asking a clarifying question before that. Afterwards, Claude reports back with: The dcsvm microvm has been set up successfully.
Here's what was created:

- /home/michael/machines/midna/microvms/dcs.nix - Project-specific packages (Go toolchain, protobuf tools, uglify-js, yuicompressor, zopfli)
- Updated /home/michael/machines/midna/microvm.nix to include dcsvm with: IP: 192.168.83.8, tapId: microvm6, mac: 02:00:00:00:00:07
- /home/michael/microvm/dcs/ - Workspace directory
- /home/michael/microvm/dcs/ssh-host-keys/ - SSH host keys (ed25519)
- /home/michael/microvm/dcs/dcs/ - Cloned repository from https://github.com/Debian/dcs

Configuration: Build verified - The configuration builds successfully. To start the microvm after deploying: To SSH into it:

Wonderful! In my experience, Claude always got the VM creation correct. In fact, you can go one step further: Instead of just asking Claude to create new MicroVMs, you can also ask Claude to replicate this entire setup into your NixOS configuration! Try a prompt like this:

read https://michael.stapelberg.ch/posts/2026-02-01-coding-agent-microvm-nix/ — I want the exact same setup in my midna NixOS configuration please!

NixOS has a reputation of being hard to adopt, but once you are using NixOS, you can do powerful things like spinning up ephemeral MicroVMs for a new project within minutes. The maintenance effort is minimal: when I update my personal PC, my MicroVM configurations start using the new software versions, too. Customization is easy if needed. This actually mirrors my experience with Coding Agents: I don't feel like they're automatically making existing tasks more efficient, I feel that they make things possible that were previously out of reach (similar to Jevons paradox).

It was fascinating (and scary!) to experience the quality increase of Coding Agents during 2025. At the beginning of 2025 I thought that LLMs were an overhyped toy, and felt it was almost insulting when people showed me text or code produced by these models. But almost every new frontier model release got significantly better, and by now I have been positively surprised by Claude Code's capabilities and quality many times. It has produced code that handles legitimate edge cases I would not have considered. With this article, I showed one possible way to run Coding Agents safely (or any workload that shouldn't access your private data, really) that you can adjust in many ways for your needs.

For reference, the per-VM module described earlier declares:

- Network settings: I like using and .
- Shared directories for: the workspace directory, e.g. ; the host's Nix store, so the VM can access software from cache (often); this VM's SSH host keys, which is a separate state directory, used only on the microvms.
- an 8 GB disk overlay (var.img), stored in
- (QEMU also works well!) as the hypervisor, with 8 vCPUs and 4 GB RAM.
- A workaround for systemd trying to unmount (which causes a deadlock).
- Set up Zsh with my configuration
- Set up Emacs with my configuration
- Set up Claude Code in shared directory .
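For reference, the one-time host-side preparation the post describes — a shared workspace directory plus persistent SSH host keys for a VM — boils down to something like this. The path and VM name ("emacs") are illustrative, not the post's exact literals:

```shell
# Create the shared workspace and a persistent SSH host key for one VM.
# "$HOME/microvm/emacs" is an example path; adjust per VM.
vmdir="$HOME/microvm/emacs"
mkdir -p "$vmdir/ssh-host-keys"

# Generate an ed25519 host key with no passphrase (-N '') so the VM can
# present the same host identity across rebuilds.
ssh-keygen -q -t ed25519 -N '' \
  -f "$vmdir/ssh-host-keys/ssh_host_ed25519_key"
```

Keeping the host key outside the ephemeral VM image is what lets you throw the VM away without triggering SSH "host key changed" warnings on the next boot.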


How to Add a Quick Interactive Map to your Website

In this article I want to share a technique that I recently learned to display an interactive map on a website. For this, you will need just a few lines of HTML and JavaScript. This solution does not require you to sign up for any accounts or services anywhere, it is completely free and open source, and can be integrated with any front or back end web framework. Give the demo below a try and if you like it, then keep on reading to learn how you can add a map like this one to your website in just 3 quick steps!

Simon Willison 1 week ago

Adding dynamic features to an aggressively cached website

My blog uses aggressive caching: it sits behind Cloudflare with a 15 minute cache header, which guarantees it can survive even the largest traffic spike to any given page. I've recently added a couple of dynamic features that work in spite of that full-page caching. Here's how those work. This is a Django site and I manage it through the Django admin. I have four types of content - entries, link posts (aka blogmarks), quotations and notes. Each of those has a different model and hence a different Django admin area. I wanted an "edit" link on the public pages that was only visible to me. The button looks like this: I solved conditional display of this button with . I have a tiny bit of JavaScript which checks to see if the key is set and, if it is, displays an edit link based on a data attribute: If you want to see my edit links you can run this snippet of JavaScript: My Django admin dashboard has a custom checkbox I can click to turn this option on and off in my own browser: Those admin edit links are a very simple pattern. A more interesting one is a feature I added recently for navigating randomly within a tag. Here's an animated GIF showing those random tag navigations in action ( try it here ): On any of my blog's tag pages you can click the "Random" button to bounce to a random post with that tag. That random button then persists in the header of the page and you can click it to continue bouncing to random items in that same tag. A post can have multiple tags, so there needs to be a little bit of persistent magic to remember which tag you are navigating and display the relevant button in the header. Once again, this uses . Any click to a random button records both the tag and the current timestamp to the key in before redirecting the user to the page, which selects a random post and redirects them there. Any time a new page loads, JavaScript checks if that key has a value that was recorded within the past 5 seconds. 
If so, that random button is appended to the header. This means that, provided the page loads within 5 seconds of the user clicking the button, the random tag navigation will persist on the page. You can see the code for that here . I built the random tag feature entirely using Claude Code for web, prompted from my iPhone. I started with the endpoint ( full transcript ): Build /random/TAG/ - a page which picks a random post (could be an entry or blogmark or note or quote) that has that tag and sends a 302 redirect to it, marked as no-cache so Cloudflare does not cache it Use a union to build a list of every content type (a string representing the table out of the four types) and primary key for every item tagged with that tag, then order by random and return the first one Then inflate the type and ID into an object and load it and redirect to the URL Include tests - it should work by setting up a tag with one of each of the content types and then running in a loop calling that endpoint until it has either returned one of each of the four types or it hits 1000 loops at which point fail with an error I do not like that solution, some of my tags have thousands of items Can we do something clever with a CTE? Here's the something clever with a CTE solution we ended up with. For the "Random post" button ( transcript ): Look at most recent commit, then modify the /tags/xxx/ page to have a "Random post" button which looks good and links to the /random/xxx/ page Put it before not after the feed icon. It should only display if a tag has more than 5 posts And finally, the implementation that persists a random tag button in the header ( transcript ): Review the last two commits. Make it so clicking the Random button on a tag page sets a localStorage value for random_tag with that tag and a timestamp. On any other page view that uses the base item template add JS that checks for that localStorage value and makes sure the timestamp is within 5 seconds. 
If it is within 5 seconds it adds a "Random name-of-tag" button to the little top navigation bar, styled like the original Random button, which bumps the localStorage timestamp and then sends the user to /random/name-of-tag/ when they click it. In this way clicking "Random" on a tag page will send the user into an experience where they can keep clicking to keep surfing randomly in that topic.
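The five-second localStorage handshake described above can be sketched in a few lines. The `random_tag` key name comes from the post's own prompt; the helper names and the injected clock parameter are my own illustration, not the blog's actual code:

```javascript
// Persist which tag the user is "randomly surfing", but only honor it
// for five seconds after the last click. `now` is injected (ms) so the
// logic is testable without real clocks.
const KEY = "random_tag";
const WINDOW_MS = 5000;

function recordRandomTag(storage, tag, now) {
  // Called when a Random button is clicked, before redirecting.
  storage.setItem(KEY, JSON.stringify({ tag, ts: now }));
}

function activeRandomTag(storage, now) {
  // Called on every page load: return the tag only if the click
  // happened within the last five seconds, else null.
  const raw = storage.getItem(KEY);
  if (!raw) return null;
  const { tag, ts } = JSON.parse(raw);
  return now - ts <= WINDOW_MS ? tag : null;
}
```

If `activeRandomTag` returns a tag, the page appends a button linking to `/random/<tag>/` and refreshes the timestamp on each click — which is what keeps the button alive across hops.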

Susam Pal 2 weeks ago

QuickQWERTY 1.2.1

QuickQWERTY 1.2.1 is now available. QuickQWERTY is a web-based touch typing tutor for QWERTY keyboards that runs directly in the web browser. This release contains a minor bug fix in Unit 4.3. Unit 4.3 is a 'Control' unit that lets you practise typing partial words as well as full words. In one place in this unit, the following sequence of partial and full words occurs: The full word was incorrectly repeated twice. This has been fixed to: To try out QuickQWERTY, go to quickqwerty.html . Read on website | #web | #programming

Steve Klabnik 2 weeks ago

The most important thing when working with LLMs

Okay, so you’ve got the basics of working with Claude going. But you’ve probably run into some problems: Claude doesn’t do what you want it to do, it gets confused about what’s happening and goes off the rails, all sorts of things can go wrong. Let’s talk about how to improve upon that. The most important thing that you can do when working with an LLM is give it a way to quickly evaluate if it’s doing the right thing, and if it isn’t, point it in the right direction. This is incredibly simple, yet, like many simple things, also wildly complex. But if you can keep this idea in mind, you’ll be well equipped to become effective when working with agents. A long time ago, I used to teach programming classes. Many of these were to adults, but some of them were to children. Teenaged children, but children nonetheless. We used to do an exercise to try and help them understand the difference between talking in English and talking in Ruby, or JavaScript, or whatever kind of programming language rather than human language. The exercise went like this: I would have a jar of peanut butter, a jar of jelly, a loaf of bread, a spoon, and a knife. I would ask the class to take a piece of paper and write down a series of steps to make a peanut butter and jelly sandwich. They’d all then give me their algorithms, and the fun part for me began: find one that’s innocently written that I could hilariously misinterpret. For example, I might find one like: I’d read this aloud to the class, you all understand this is a recipe for a peanut butter and jelly sandwich, right? I’d take the jar of peanut butter and place it upon the unopened bag of bread. I’d do the same with the jar of jelly. This would of course, squish the bread, which feels slightly transgressive given that you’re messing up the bread, so the kids would love that. 
I’d then say something like “the bread is already together, I do not understand this instruction.” After the inevitable laughter died down, I’d make my point: the computer will do exactly what you say, but not what you mean. So you have to get good at figuring out when you said something different than what you mean. Sort of ironically, LLMs are kind of the inverse of this: they’ll sometimes try to figure out what you mean, and then do that, rather than simply doing what you say. But the core thing here is the same: semantic drift from what we intended our program to do, and what it actually does. The second lesson is something I came up with sometime, I don’t even remember how exactly. But it’s something I told my students a lot. And that’s this: If your program did everything you wanted without problems, you wouldn’t be programming: you’d be using your program. The act of programming is itself perpetually to be in a state where something is either inadequate or broken, and the job is to fix that. I also think this is a bit simplistic but also getting at something. I had originally come up with this in the context of trying to explain how you need to manage your frustration when programming; if you get easily upset by something not working, doing computer programming might not be for you. But I do think these two things combine into something that gets to the heart of what we do: we need to understand what it is we want our software to do, and then make it do that. Sometimes, our software doesn’t do something yet. Sometimes, it does something, but incorrectly. Both of these cases result in a divergence from the program’s intended behavior. So, how do we know if our program does what it should do? Well, what we’ve been doing so far is: This is our little mini software development lifecycle, or “SDLC.” This process works, but is slow. That’s great for getting the feel of things, but programmers are process optimizers by trade. 
One of my favorite tools for optimization is called Amdahl’s law . The core idea is this, formulated in my own words: If you have a process that takes multiple steps, and you want to speed it up, if you optimize only one step, the maximum amount of speedup you’ll get is determined by the portion of the process that step takes. In other words, imagine we have a three step process: This process takes a total of 13 minutes to complete. If we speed up step 3 by double, it goes from two minutes to one minute, and now our process takes 12 minutes. However, if we were able to speed up step 2 by double, we’d cut off five minutes, and our process would now take 8 minutes. We can use this style of analysis to guide our thinking in many ways, but the most common way, for me, is to decide where to put my effort. Given the process above, I’m going to look at step 2 first to try and figure out how to make it faster. That doesn’t mean we can achieve the 2x speedup, but heck, if we get a 10% decrease in time, that’s the same time as if we did get a 2x on step 3. So it’s at least the place where we should start. I chose the above because, well, I think it properly models the proportion of time we’re taking when doing things with LLMs: we spend some time asking it to do something, and we spend a bit more time reviewing its output. But we spend a lot of time clicking “accept edit,” and a lot of time allowing Claude to execute tools. This will be our next step forward, as this will increase our velocity when working with the tools significantly. However, like with many optimization tasks, this is easier said than done. The actual mechanics of improving the speed of this step are simple at first: hit to auto-accept edits, and “Yes, and don’t ask again for commands” when you think the is safe for Claude to run. By doing this, once you have enough commands allowed, your input for step 2 of our development loop can drop to zero. 
Of course, it takes time for Claude to actually implement what you’ve asked, so it’s not like our 13 minute process drops to three, but still, this is a major efficiency step. But we were actively monitoring Claude for a reason. Claude will sometimes do incorrect things, and we need to correct it. At some point, Claude will say “Hey I’ve finished doing what you asked of me!” and it doesn’t matter how fast it does step 2 if we get to step 3 and it’s just incorrect, and we need to throw everything out and try again. So, how do we get Claude to guide itself in the right direction? A useful technique for figuring out what you should do is to consider the ending: where do we want to go? That will inform what we need to do to get there. Well, the ending of step 2 is knowing when to transition to step 3. And that transition is gated by “does the software do what it is supposed to do?” That’s a huge question! But in practice, we can do what we always do: start simple, and iterate from there. Right now, the transition from step 2 to step 3 is left up to Claude. Claude will use its own judgement to decide when it thinks that the software is working. And it’ll be right. But why leave that up to chance? I expect that some of you are thinking that maybe I’m belaboring this point. “Why not just skip to ? That’s the idea, right? We need tests.” Well on some level: yes. But on another level, no. I’m trying to teach you how to think here, not give you the answer. Because it might be broader than just “run the tests.” Maybe you are working on a project where the tests aren’t very good yet. Maybe you’re working on a behavior that’s hard to automatically test. Maybe the test suite takes a very long time, and so isn’t appropriate to be running over and over and over. Remember our plan from the last post? 
Where Claude finished the plan with this: These aren’t “tests” in the traditional sense of a test suite, but they are objective measures that Claude can invoke itself to understand if it’s finished the task. Claude could run after every file edit if it wanted to, and as soon as it sees , it knows that it’s finished. You don’t need a comprehensive test suite. You just need some sort of way for Claude to detect if it’s done in some sort of objective fashion. Of course, we can do better. While giving Claude a way to know if it’s done working is important, there’s a second thing we need to pay attention to: when Claude isn’t done working, can we guide it towards doing the right thing, rather than the wrong thing? For example, those of you who are of a similar vintage as myself may remember the output of early compilers. It was often… not very helpful. Imagine that we told Claude that it should run to know if things are working, and the only output from it was the exit code: 0 if we succeeded, 1 if we failed. That would accomplish our objective of letting Claude know when things are done, but it wouldn’t help Claude know what went wrong when it returns 1. This is one reason why I think Rust works well with LLMs. Take this incorrect Rust program: The Rust compiler won’t just say “yeah this program is incorrect,” it’ll give you this (as of Rust 1.93.0): The compiler will point out the exact place in the code itself of where there’s an issue, and even make suggestions as to how to fix it. This goes beyond just simply saying “it doesn’t work” and instead nudges you to what might fix the problem. Of course, this isn’t perfect, but if it’s helpful more than not, that’s a win. Of course, too much verbosity isn’t helpful either. A lot of tooling has gotten much more verbose lately. Often times, this is really nice as a human. Pleasant terminal output is, well… pleasant. But that doesn’t mean that it’s always good or useful. 
For example, here’s the default output for : This is not bad output. It’s nice. But it’s also not useful for an LLM. We don’t need to read all of the tests that are passing, we really just want to see some sort of minimal output, and then what failed if something failed. In Cargo’s case, that’s for “quiet”: There is no point in giving a ton of verbose input to an LLM that it isn’t even going to need to use. If you’re feeding a tools’ output to an LLM, you should consider both what the tool does in the failure case, but also the success case. Maybe configure things to be a bit simpler for Claude. You’ll save some tokens and get better results. All of this has various implications for all sorts of things. For example, types are a great way to get quick feedback on what you’re doing. A comprehensive test suite that completes quickly is useful for giving feedback to the LLM. But that also doesn’t inherently mean that types must be better or that you need to be doing TDD; whatever gives you that underlying principle of “objective feedback for the success case and guidance for the failure case” will be golden, no matter what tech stack you use. This brings me to something that may be counter-intuitive, but I think is also true, and worth keeping in the back of your mind: what’s good for Claude is also probably good for humans working on your system. A good test suite was considered golden before LLMs. That it’s great for them is just a nice coincidence. At the end of the day, Claude is not a person, but it tackles programming problems in a similar fashion to how we do: take in the problem, attempt a solution, run the compiler/linter/tests, and then see what feedback it gets, then iterate. That core loop is the same, even if humans can exercise better judgement and can have more skill. And so even though I pitched fancy terminal output as an example of how humans and LLMs need different things, that’s really just a superficial kind of thing. 
Good error messages are still critical for both. We’re just better at having terminal spinners not take up space in our heads while we’re solving a problem, and can appreciate the aesthetics in a way that Claude does not. Incidentally, this is one of the things that makes me hopeful about the future of software development under agentic influence. Engineers always complain that management doesn’t give us time to do refactorings, to improve the test suite, to clean our code. Part of the reason for this is that we often didn’t do a good job of pitching how it would actually help accomplish business goals. But even if you’re on the fence about AI, and upset that management is all about AI: explain to management that this stuff is a force multiplier for your agents. Use the time you’ve saved by doing things the agentic way towards improving your test suite, or your documentation, or whatever else. I think there’s a chance that all of this stuff leads to higher quality codebases than ones filled with slop. But it also requires us to make the decisions that will lead is in that direction. That’s what I have for you today: consider how you can help Claude evaluate its own work. Give it explicit success criteria, and make evaluating that criteria as simple and objective as possible. In the next post, we’re gonna finally talk about . Can you believe that I’ve talked this much about how to use Claude and we haven’t talked about ? There’s good reason for that, as it turns out. We’re going to talk a bit more about understanding how interacting with LLMs work, and how it can help us both improve step 1 in our process, but also continue to make step 2 better and better. Here’s my post about this post on BlueSky: Steve Klabnik @steveklabnik.com · Jan 22 Replying to Steve Klabnik Agentic development basics: steveklabnik.com/writing/agen... 

Andre Garzia 2 weeks ago

Three months of Poncho Wonky

# Three Months of Poncho Wonkying

Today marks three months since my first release of Poncho Wonky. In this brief post, I want to chat a bit about what I accomplished during this period.

> Poncho Wonky is a fork of Patchwork which is a [Secure Scuttlebutt](https://ssb.nz) client. You can find the latest release [here](https://github.com/soapdog/patchwork/releases).

The last release of Patchwork is 3.18.1, so Poncho Wonky's first release was versioned as 4.0.0 and, as of this writing, the current version is 4.4.0. **In these three months I released 10 versions of Poncho Wonky**, including patches and minor versions. My strategy is that bug fixes or feature alterations get a point release and new features get a minor version bump. A major version bump is in the works if a feature change is considered groundbreaking.

## Updated dependencies

Patchwork's final version was released in 2021 using:

* Electron 11
* Electron Builder 22

I managed to upgrade these dependencies to:

* Electron 39
* Electron Builder 26

That is a bump of 28 versions on Electron alone. A major upgrade to the core technology powering Poncho Wonky.

## Releases for new architectures

Patchwork binary releases were made for Windows, macOS, and Linux using x64 for all platforms, with arm64 also available for Linux. Poncho Wonky has all that plus arm64 for macOS as well.

## Signed release for macOS

Poncho Wonky is the first ever Patchwork release to ~give in to Apple's racketeering scheme~ be signed for macOS. That means that macOS ~hostages~ users need to jump through fewer hoops to get the app to run compared to Patchwork.

## Support for `blog` posts

Patchwork supports displaying messages of type `blog` — long form text messages — but didn't support writing them. 
Poncho Wonky added writing Blog Posts and also added a new page that helps surface Blog Posts.

![Composing a blog post](/2026/01/img/4c80ae63-e36e-4d80-a5fa-f2685ed344b8.jpg)

![A page to help you find cool blog posts](/2026/01/img/35b81705-44b1-4b97-b343-c56861cab290.jpg)

## Support for `bookclub` messages

Book Club messages allow users to post metadata and reviews about books. Think of it as your peer-to-peer goodreads. You can find new books to read, organise your book collection, and keep track of your reviews.

![Your own personal goodreads](/2026/01/img/d61c76d2-ff22-4beb-8cca-abf571899c2f.jpg)

Few clients supported book club messages (iirc patchfoo, patchfox, patchbay) but Poncho Wonky is the first one that includes fetching book metadata from the web based on ISBN numbers, which makes it a lot easier to add books to the ecosystem.

![Adding a book](/2026/01/img/fd4148cc-1e56-413c-aef6-649db575c98c.jpg)

## Troubleshooting tools

One of the upgraded dependencies introduced some indexing bugs that are hard to track down and don't happen for all users. To help mitigate that, a new page was created with troubleshooting tools.

![Help you fix those fiddly indexes](/2026/01/img/a48ed9a1-8f5e-4221-af26-d27f64bb64a5.jpg)

## Special support for audio attachments

I always loved when people shared their own music or jams on SSB. Recently, I added a feature that enables a small music player in a separate window so you can queue and play music while browsing SSB.

![A separate music player](/2026/01/img/7d50098f-b5d1-4d92-9e24-b45acc16aee4.png)

----

## Things currently in development

# User Scripts Are Coming To Poncho Wonky

A side effect of being at [P2P Basel 2026](https://p2p-basel.org/) is that I get overexcited about this space and start developing stuff that I shouldn't. This time, I decided to do a major feature: adding user scripts to Poncho Wonky.

> User scripts are small morsels of code that you can craft and share with your friends. 
If you place those script files in a special folder, they become active inside Poncho Wonky.

## What user scripts enable for the users

You'll be able to make Poncho Wonky yours. Want to add a niche feature to the app? You can do it without the need to recompile Poncho Wonky's source code. Just create a small file, place it in a folder, and restart Poncho Wonky.

> The feature is similar to what was achieved with [Greasemonkey](https://www.greasespot.net/) and other similar solutions in the browser.

User scripts allow each user to tweak Poncho Wonky to make it more suitable to the experience they desire, and also to share it with their friends.

## Is It Safe?

User scripts are written in [Lua](https://lua.org), a language with a long-standing reputation as an embeddable scripting language for applications. The [Lua engine used is written in JS](http://fengari.io/) and has no access to your files, DOM, or any other personal data. It runs in its own environment and can't crash Poncho Wonky even when it goes wrong.

## Why Lua and not JS?

Lua is a language created for this use case. It is very easy to lock down a Lua environment so that the developer only has access to an approved set of APIs and features. JS engines tend to be much larger and carry a lot more complexity with them. Lua is a smaller language, so it is easier to learn than JS, which has a ton of features.

## Can you show us something?

Yes, I can! The reason I am writing this is that I already got a lot of it working. I don't yet have all the API I want to expose, which is why this is an announcement and not a release, but I can show you something indeed. How does it work? Poncho Wonky creates an empty `custom-scripts/active/` folder in your Application Support folder. That is where you place scripts you want to activate. It also comes with a folder `custom-scripts/samples/` with example code for you to use and adapt. 
![Add a file to a folder to activate a script](/2026/01/img/6a3fe567-5620-4b3d-a9bb-fa2c04f2ef75.jpg)

### A script to play music from Worm Blossom blog posts

My first test script is one that allows me to play songs from Worm Blossom blog posts that are shared here. Like the audio player, I wanted the script to add a new action button to the post that, when clicked, would open a music player with the song. That button is only displayed for messages that include a link to the [Worm Blossom blog](https://worm-blossom.org/#y2026w3).

![That button only appears for specific messages](/2026/01/img/abcab22f-a4c1-4f13-a4c3-9a4193459ac5.jpg)

Clicking that button opens a music player:

![A barebones music player](/2026/01/img/dcb67426-8b5d-4454-8c58-aba9d705bd2b.jpg)

### That is cool, show us the script

A user script is a Lua table; think of it as analogous to a JS object:

```lua
local contact = {
    name = "Andre",
    surname = "Garzia"
}
```

A table can contain functions; they can appear inline with the table construction or be added later, such as:

```lua
function contact.sayHi()
    print("Hello!")
end
```

So a user script is a table that contains functions. The function names are hooks: Poncho Wonky will look for functions with specific names and, if it finds them, it will execute them when needed.

### Adding a button to a message

The functions to add a button to a message are `buttonAction` and `action` (function names might change before I release this).

* `buttonAction`: called for each displayed message. It should return true or false, and also a label if it is true (Lua has multiple return values). If calling it returns `true` and a _label_, then a button is added to that message with the returned _label_.
* `action`: that function is called if the user clicks the button. 
Let me show the `buttonAction` function for that script:

```lua
function wbMusic.buttonAction(msg)
    if msg == nil then
        return false, "no message"
    end

    if msg.value.content.type ~= "post" then
        return false, "message is not post"
    end

    local text = msg.value.content.text
    local url = extractWormBlossomURL(text)

    if url then
        return true, "Open in Worm Blossom Player"
    else
        return false, "no blossom url"
    end
end
```

The `action` function for that script is a bit more involved. This is what it does when it runs:

1. Extracts the URL from the SSB post.
2. Fetches that URL.
3. Finds the ` ` music player inside the post.
4. Extracts the ` ` and peeks at the `src` attribute. The song data is in the source attribute as a parameter called `song`.
5. Opens a new Poncho Wonky window with some HTML and JS to assemble a music player able to play those chiptunes, and adds the extracted `song` to it.

Be aware that _fetching a URL_ is an asynchronous operation, so in the Lua script, that function needs to happen inside a _co-routine_. Here is the function:

```lua
function wbMusic.action(msg)
    local text = msg.value.content.text
    local url = extractWormBlossomURL(text)

    local co = coroutine.create(function()
        -- must be wrapped in a coroutine because fetch is async
        local content = fetch(url)
        if content then
            local iframe = querySelect(content, "iframe")
            local src = getAttribute(iframe, "src")
            local song = getParam(src, "song")
            openWindowFromAssets("worm-blossom-music/start.js", {
                data = song,
                width = 400,
                height = 300
            })
        end
    end)

    coroutine.resume(co)
end
```

That is it: a whole new music feature that I'm probably the only person interested in. A feature I can have in my Poncho Wonky without actually changing the Poncho Wonky source code. How would you make Poncho Wonky yours? What scripts would you like to build?

---

_I am pretty happy with all the progress in these last three months. Working with Patchwork makes me happy. 
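For the curious, the host side of this hook mechanism can be sketched too. This is not Poncho Wonky's actual code — the function names and message shape below are invented for illustration — but it shows the dispatch pattern described above: for each message, ask the script's `buttonAction` hook whether a button is wanted, and wire that button's click to the script's `action` hook:

```javascript
// Hypothetical host-side sketch of the hook dispatch described above.
// For each message, call the script's buttonAction hook; if it returns
// true plus a label, create a button whose click runs the action hook.
function attachScriptButtons(script, messages, makeButton) {
  for (const msg of messages) {
    const [wanted, label] = script.buttonAction(msg);
    if (wanted) {
      makeButton(msg, label, () => script.action(msg));
    }
  }
}

// A toy script: only "post" messages get a button.
const demoScript = {
  opened: null,
  buttonAction(msg) {
    return msg.type === "post" ? [true, "Open"] : [false, "not a post"];
  },
  action(msg) {
    this.opened = msg.id;
  },
};

const buttons = [];
attachScriptButtons(
  demoScript,
  [{ id: 1, type: "post" }, { id: 2, type: "vote" }],
  (msg, label, onClick) => buttons.push({ id: msg.id, label, onClick })
);
```

In JS the two Lua return values become an array pair; the real app would also have to marshal values across the Fengari boundary, which is elided here.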
If it makes you happy as well, maybe consider [buying me a coffee](https://ko-fi.com/andreshouldbewriting)._

The Jolly Teapot 2 weeks ago

Tempted to stick with my old Mac a bit longer

My latest post mentioned how perfect my current setup seems to be . Today, a week or two later, I must admit, this post holds up pretty well. I was expecting the post (and therefore the setup) to be updated drastically the minute I published it, as it usually goes , and yet, nothing of substance has changed since. In this post, I listed my dear old MacBook Air from early 2020, rocking an Intel chip, as the weakest part of that setup, the one thing that was the most likely to get replaced. It turns out that I’m now not so sure about that: my Mac feels fine. Sure, it’s not fast, the battery lasts around 40 minutes on a charge, and I can feel it’s struggling and getting warm when watching videos or visiting “heavy” websites. I remain cautious and very conservative with what I do with it, but for an almost six-year-old computer, it’s surprisingly usable. Somehow, I like that my Mac is old, slow, and limited. This constraint forces me to stay vigilant, to keep things as simple, native, light, minimal, and optimised as possible. When the fan activates, I know something’s wrong. I’m calling this the “whoosh notification.” When my laptop starts to make a vacuum cleaner noise, that is the signal to close the guilty Safari tab (or to turn off JavaScript ), or to get rid of the app causing the trouble, eliminating it from consideration. Maybe I have to thank this very limitation for finally achieving this “perfect” setup. Without it, I’d keep experimenting, tweaking my setup further and further, and potentially even adopting apps that are not actually that efficient or completely optimised. With a modern Mac, let’s say an M4 MacBook Air, I’m afraid I wouldn’t be able to tell the difference. I wouldn’t be able to detect the inefficiencies and appreciate the efficiencies as easily. For instance, occasionally, I’m eager to try Orion Browser , as it ticks almost all the boxes for me. But every time I play with it, my computer gives me the signal. 
When the fan starts to blow seemingly out of nowhere, I don’t investigate further, and I become very much aware that I have to stick to Safari. Another example: every time Eleventy builds the HTML for this site, I love seeing that it sometimes takes less than one second . With an M4 MacBook Air, such a feat would be unremarkable. If I use yet another Lotus Elise analogy , my computer and setup rely on the chip equivalent of the simple four-cylinder Toyota engine , the one that was fitted in the latest generations of the car. These engines were finely tuned and decently powerful, but they couldn’t afford to deal with extra weight if they wanted to provide some sort of race car performance. Race cars from other brands — and most sports cars currently on sale — on the other hand, mounted with engines two, three, or four times more powerful, aren’t optimised or even built the same way: they can handle being fitted into bigger cars, and they can support the extra weight of ventilated seats, more speakers, and more. When these manufacturers feel their cars can be a bit more fun to drive, they simply add more power ; they don’t really bother fine-tuning every part for maximum efficiency because, with such power, it’s rather unnecessary. This is why, as I write this in January 2026, I’m more tempted than ever to enjoy my Mac — and lean setup — one more year. Also, besides the much, much faster chip, the new MacBook Air is basically the same as mine. It has the same keyboard , the same screen, the same maximum brightness, the same form factor, and a slightly different design. The chip is the main star in these new models; it’s such a leap forward from mine that I’m not even sure I’d notice the other improvements, like faster memory and faster Wi-Fi. Icing on the cake, sticking to my current Mac also means being unable to upgrade to Tahoe . 
I use “MacOS 26” on my work computer, and Tahoe’s Safari, a prime example among many others, is surely one of the worst versions ever of the browser. * 1 I mean, look at this screenshot and try to figure out at first glance which tab is currently active. And don’t get me started on the rounded corners. * 2 In March, Apple will probably release a new generation of MacBooks Air, and, depending on what else will be new besides the M5 chip, I may change my mind. But as I said, and as I am typing these lines on a perfectly capable laptop running Sequoia, with a confident and efficient setup, I’m more tempted than ever to keep this little guy around a bit longer. Another outcome that is looking more and more likely due to the current international shit show : I may play it safe and buy a current M4 MacBook Air at a 150 euro discount on the 31st of January, the day before some potential tariffs may be added, or not. I can also get 150 euros for trading this one in, which would make the purchase a lot more affordable than the newer model, especially if new tariffs on US products are introduced. Very difficult to predict the future one week from now in that regard, as a lot of things have happened this week. Either way, this Intel Core i5 chip is more resilient than I expected: one just has to handle it with care. ^ The worst ever being — quite obviously — the very first Windows version circa 2005, that would not even launch: I think Apple released the 1.0.1 update the next day or so, fixing the problem. ^ When customising the toolbar, the flexible space placeholders in particular look odd, as if the design is unfinished, unrefined. This is something I would expect on Windows, not MacOS. 

JSLegendDev 2 weeks ago

Making a Small Mouse-Driven RPG

Check the video version on YouTube, here . I have been working on a small action RPG where the player moves around on a small map. Once in a while, stars appear and if the player collides with one, a battle encounter begins. In combat, the player has to avoid getting hit by a slew of projectiles. To deal damage, they must collide with star-shaped attack zones appearing on the battlefield. After a battle, if they win, they get rewarded with a currency they can spend either to upgrade one of three stats — health, attack, or speed — or to heal themselves. After a stat upgrade, the cost to upgrade again increases. I initially designed the game to be controlled via the keyboard or a standard controller. However, this would end up changing, as the title of this post suggests. After having developed the core game loop, I was unfortunately faced with a roadblock that stopped the project in its tracks. The game’s performance wasn’t great. I originally built it using JavaScript and the KAPLAY game library. The crux of the issue was that KAPLAY wasn’t performant enough. This led me to embark on a side quest of exploring alternatives, which ended with me picking Godot. I then started re-implementing what I had in the JavaScript version. However, this is where I got completely demotivated. Nothing against Godot, but it just wasn’t fun re-doing all this work. KAPLAY was also faster for prototyping ideas. I started to procrastinate. Because of this, I was faced with two options: either I silently abandon the project, or I resume its development in KAPLAY regardless of the performance. At that point, either there is a game in the end, even if it doesn’t run super well, or there is nothing. That said, the main factor that led me to push forward was the reception the first devlog had on YouTube. I did not expect a haphazardly put together devlog, recorded while I was still recovering from a cold, to garner over 50k views. 
This seemed to show genuine interest in my game and it seemed worth it to continue its development. Therefore, I re-opened my codebase and started working. This is where something really amazing occurred. The KAPLAY developers, while I was exploring other alternatives, released new versions of the library that improved performance noticeably. This led me to regain my motivation. Now that I was back to developing my game, I assessed that its pillar was the combat system. Unfortunately, battling wasn’t as fun as it could be. I identified that the core issue at hand was a lack of thoughtful design in the enemy attack patterns. To make things easier to test, I created a scene where I could initiate a battle with any specific enemy in my game rather than having to move around the map until an encounter occurred. This would allow me to quickly tweak attack patterns as needed, which allowed for faster iteration. I also had to refactor my code a bit to make the way I spawned attacks more flexible and modular. After having done this preparatory work, I wanted to start re-designing the attack patterns of the first enemy, since it was the only one where I had actually put some time into designing a pattern beforehand, even if it wasn’t very good. This is where I was afflicted by blank page syndrome. I had absolutely no idea how to approach this aspect of the game’s design. I didn’t even know where to start. Also, it didn’t help that I didn’t really play games with bullet-hell mechanics. I figured I needed to do some research. This is where I concluded that what I had in mind for my project shared a lot design-wise with games in the Shmup genre, otherwise known as Shoot ’em up . A big portion of their gameplay is about avoiding projectiles, which is what my game is mostly about. While I was looking up how Shmups were designed, I came across an interesting document by a Shmup dev and designer named Bog Hog. 
Titled BULLET HELL SHMUP DESIGN 101 , it contained exactly what I needed to make good attack patterns for my enemies. While it’s filled with useful design knowledge that I won’t have the time to cover here, a few concepts listed under the BULLET PATTERN section really stood out. To summarize, the document explained that, at their core, bullet attack patterns can be boiled down to three simple bullet pattern types. Static : where the bullet has a predefined trajectory. This is useful for creating obstacles and forces the player to engage with the dodging mechanic fully. It also has the advantage of allowing you to design beautiful looking patterns, since everything is predetermined. Aimed : where the bullet targets the player. This is good for applying pressure and forcing the player to move. You essentially don’t want the player to camp and cheese the game that way, which is possible if you only have static patterns. Random : as the name implies, this is when the pattern is randomized to keep things fresh. However, the document warns to use this carefully, as it’s bound to create unfair situations. Armed with this knowledge, I was able to design an attack pattern that was far better than what I previously had. That said, a single pattern would not be enough to make a whole combat encounter interesting. To fix this, I decided to design multiple different patterns that would be used according to the enemy’s HP. As it got lower, harder patterns were introduced. Essentially, this is the concept of phases, which is commonly seen in boss fights of various games. As I playtested the game myself, while I could see improvements in the battle system, there was still something that felt off. I would end up figuring out that the fact that attack zones, which the player uses to deal damage, were spawned randomly reduced the feeling of mastering a fight. By extension, this reduced enjoyment. 
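The three bullet pattern types can be sketched as tiny velocity functions. These helpers are hypothetical illustrations (not code from the devlog or the design doc), but they show how little separates the three categories:

```javascript
// Hypothetical sketches of the three bullet pattern types.
// Each returns a velocity vector for a newly spawned bullet.

// Static: the trajectory is fixed in advance by the pattern designer.
function staticBullet(angle, speed) {
  return { vx: Math.cos(angle) * speed, vy: Math.sin(angle) * speed };
}

// Aimed: point the bullet at the player's current position.
function aimedBullet(origin, player, speed) {
  const dx = player.x - origin.x;
  const dy = player.y - origin.y;
  const len = Math.hypot(dx, dy) || 1; // avoid division by zero
  return { vx: (dx / len) * speed, vy: (dy / len) * speed };
}

// Random: a fresh direction each time; use sparingly, as the doc warns.
function randomBullet(speed, rng = Math.random) {
  return staticBullet(rng() * Math.PI * 2, speed);
}
```

Note that "aimed" reduces to "static" once the direction is computed; the distinction is only about where the angle comes from, which is why the doc can treat these three as the building blocks of every pattern.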
It felt unfair that, despite masterfully avoiding a slew of attacks, you could be in the unfortunate position of having an attack zone spawn too far away from where you were. A similar feeling is when you play soccer and your team dominates possession but none of its shots ever convert into a goal. To fix this issue, I decided to make attack zones spawn in predetermined locations. For each phase of the battle there would be a maximum number of attack zones that could be available at once, and their positions would be randomly selected from a list of predetermined locations. Playtesting again, this change seemed to fix the issue. One challenge I faced during development was how difficult it was to gauge whether an attack pattern would be perceived by players as too difficult or too easy. Fortunately, because my game has a level-up system and is non-linear, I was able to address this issue by designing all enemy battles to be very challenging at level 1. If a player struggled, they could use the currency earned from barely defeating an enemy to upgrade their stats, gradually making the game easier over time. After having designed the full battle encounter with the most basic enemy of the game, I decided to let my brother test it out to see if my intuition about game design and difficulty would hold. I proceeded to hand over my development machine and explain the controls to him. After letting him play for a bit, he mentioned how the arrow keys used to move around were uncomfortable. To understand his complaint, it’s worth mentioning that I’ve been developing on a MacBook Air M3, and the arrow keys on this device are really narrow, which explains why he felt cramped while playing. He then asked if he could use the WASD keys instead. Although it wasn’t implemented, I obliged, and he had a much better experience. 
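Stepping back to the attack-zone change described above, it amounts to sampling without replacement from a fixed list, capped per phase. A hypothetical sketch (not the devlog's actual code):

```javascript
// Hypothetical sketch: instead of spawning attack zones anywhere, pick
// up to maxForPhase positions from a predetermined list, with no
// duplicates, so zones always land in places the designer chose.
function pickAttackZones(predetermined, maxForPhase, rng = Math.random) {
  const pool = [...predetermined]; // don't mutate the caller's list
  const chosen = [];
  while (chosen.length < maxForPhase && pool.length > 0) {
    const i = Math.floor(rng() * pool.length);
    chosen.push(pool.splice(i, 1)[0]); // remove so it can't repeat
  }
  return chosen;
}
```

Randomness stays (each phase still feels different), but the unfair case — a zone spawning hopelessly far from the player — is designed out, because every candidate position was placed by hand.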
However, after he finished playing, he mentioned that the battle was too difficult and that it would have been much easier if he could control the cursor during battle with a mouse. I initially dismissed this suggestion because it would literally turn my game on its head. In my RPG, players can upgrade three stats: attack, HP, and speed. Implementing mouse input would have required removing the speed stat, because moving slower than the mouse cursor would feel awkward. This would have left my RPG with only two upgradeable stats, making the change feel like it took too much away from the game. Another issue with this change is that other input methods wouldn’t be able to compete with a mouse or trackpad, which are much more precise and allow players to move more easily. This would essentially mean committing to a mouse-driven game and designing all attack patterns around that input method. Additionally, I would have had to reimplement movement controls in the overworld, since it would feel strange to move with arrow keys or WASD but then have to use the mouse in battle. I would also have needed to adjust my menus to work with mouse input instead of just keyboard or controller controls. Finally, the game would only work as a PC title, since bringing it to consoles would alter the experience too much, unless you relied on the PS5 or Steam Deck trackpads or the Switch 2’s mouse mode. This wasn’t a major concern during development, as I wasn’t designing with consoles in mind, but I thought it would be a shame if the game did become successful but couldn’t be ported to consoles due to its fundamental design. I eventually came around to the idea of mouse movement and decided to implement it in combat, mainly out of curiosity to see how the game would feel with this input method. To my surprise, controlling the cursor with the mouse was far better than I expected. 
Moving around was incredibly fun, and the sense of control and precision was unmatched compared to using arrow keys or even a controller’s joystick. I was now convinced: this was the way. Too bad if the game can’t be adapted to consoles; I would commit this title to being a mouse/trackpad driven game and would design all battles with this input method in mind. I proceeded to make all kinds of changes to better suit this new playstyle, but I don’t think they’re worth covering here. Instead, why not try it for yourself? That’s why I’m excited to announce that I’ve released a demo of the battle system, which you can try directly in your browser on itch.io with no downloads required. In this demo, you’ll face the game’s most basic enemy while your character is at level 1. Keep in mind that it’s normal for the battle to be challenging, but you should be able to defeat the enemy at least once. In the full game, the currency gained from battles can be used to level up, making future encounters with the same enemy easier over time. Give it a try and let me know what you think! Link to the demo : https://jslegend.itch.io/hydralia-donovans-demise-battle-demo That’s all I have to share for now. If you missed the first devlog, here’s a link to it. Anyway, if you want to keep up with the game’s development or are more generally interested in game development, I recommend subscribing to not miss out on future posts. Subscribe now 


merge-pdf.app - A free, privacy-first PDF Merging tool

Privacy-First PDF Merging: Why I Built merge-pdf.app

We've all been there. You have three different PDFs - maybe a cover letter, a CV, and some certifications - and you need them in one single file. The "old" way to do this usually involved searching for "Merge PDF" on Google, clicking the first link, and uploading your sensitive documents to a random server in the cloud. You get your file back, but at what cost?

matklad 3 weeks ago

Vibecoding #2

I feel like I got substantial value out of Claude today, and want to document it. I am at the tail end of AI adoption, so I don't expect to say anything particularly useful or novel. However, I am constantly complaining about the lack of boring AI posts, so it's only proper if I write one. At TigerBeetle, we are big on deterministic simulation testing . We even use it to track performance , to some degree. Still, it is crucial to verify performance numbers on a real cluster in its natural high-altitude habitat. To do that, you need to procure six machines in a cloud, get your custom version of the binary on them, connect the cluster's replicas together, and hit them with load. It feels like, a quarter of a century into the third millennium, "run stuff on six machines" should be a problem just a notch harder than opening a terminal and typing , but I personally don't know how to solve it without wasting a day. So, I spent a day vibecoding my own square wheel. The general shape of the problem is that I want to spin up a fleet of ephemeral machines with given specs on demand and run ad-hoc commands in a SIMD fashion on them. I don't want to manually type slightly different commands into a six-way terminal split, but I also do want to be able to ssh into a specific box and poke around in it. My idea for the solution comes from these three sources: The big idea of is that you can program a distributed system in direct style. When programming locally, you do things by issuing syscalls: This API works for doing things on remote machines, if you specify which machine you want to run the syscall on: Direct manipulation is the most natural API, and it pays to extend it over the network boundary. Peter's post is an application of a similar idea to the narrow, mundane task of developing on Mac and testing on Linux. Peter suggests two scripts: synchronizes local and remote projects. If you run inside folder, then materializes on the remote machine.
does the heavy lifting, and the wrapper script implements behaviors. It is typically followed by , which runs the command on the remote machine in the matching directory, forwarding output back to you. So, when I want to test local changes to on my Linux box, I have roughly the following shell session: The killer feature is that shell completion works. I first type the command I want to run, taking advantage of the fact that local and remote commands are the same, paths and all, then hit and prepend (in reality, I have an alias that combines sync&run). The big thing here is not the commands per se, but the shift in the mental model. In a traditional ssh & vim setup, you have to juggle two machines with separate state, the local one and the remote one. With , the state is the same across the machines; you only choose whether you want to run commands here or there. With just two machines, the difference feels academic. But if you want to run your tests across six machines, the ssh approach fails — you don't want to re-vim your changes to source files six times; you really do want to separate the place where the code is edited from the place(s) where the code is run. This is a general pattern — if you are not sure about a particular aspect of your design, try increasing the cardinality of the core abstraction from 1 to 2. The third component, library, is pretty mundane — just a JavaScript library for shell scripting. The notable aspects there are: JavaScript's template literals , which allow implementing command interpolation in a safe-by-construction way. When processing , a string is never materialized; it's arrays all the way to the syscall ( more on the topic ).
JavaScript's async/await, which makes managing concurrent processes (local or remote) natural: Additionally, deno specifically valiantly strives to impose process-level structured concurrency, ensuring that no processes spawned by the script outlive the script itself, unless explicitly marked — a sour spot of UNIX. Combining the three ideas, I now have a deno script, called , that provides a multiplexed interface for running ad-hoc code on ad-hoc clusters. A session looks like this: I like this! Haven't used it in anger yet, but this is something I have wanted for a long time, and now I have it. The problem with implementing the above is that I have zero practical experience with the modern cloud. I only created my AWS account today, and just looking at the console interface ignited the urge to re-read The Castle. Not my cup of pu-erh. But I had a hypothesis that AI should be good at wrangling baroque cloud APIs, and it mostly held. I started with a couple of paragraphs of rough, super high-level description of what I wanted to get. Not a specification at all, just a general gesture towards unknown unknowns. Then I asked ChatGPT to expand those two paragraphs into a more or less complete spec to hand down to an agent for implementation. This phase surfaced a bunch of unknowns for me. For example, I wasn't thinking at all about how I would identify machines; ChatGPT suggested using random hex numbers, and I realized that I need a 0,1,2 naming scheme to concisely specify batches of machines. While thinking about this, I realized that a sequential numbering scheme also has the advantage that I can't have two concurrent clusters running, which is a desirable property for my use-case. If I forgot to shut down a machine, I'd rather get an error on trying to re-create a machine with the same name than silently avoid the clash. Similarly, it turns out the questions of permissions and network access rules are something to think about, as well as what region and what image I need.
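The safe-by-construction interpolation mentioned earlier can be sketched with a tagged template that never flattens the command into a single string; interpolated values become argv entries directly. This is a toy illustration of the idea, not dax's actual implementation:

```javascript
// A tagged template that builds an argv array instead of a shell string,
// so interpolated values can never be mis-parsed by a shell.
function cmd(strings, ...values) {
  const argv = [];
  strings.forEach((s, i) => {
    // Words in the literal parts are split on whitespace...
    for (const word of s.split(/\s+/).filter(Boolean)) argv.push(word);
    // ...but each interpolated value stays exactly one argument.
    if (i < values.length) argv.push(String(values[i]));
  });
  return argv; // would be handed straight to spawn(argv[0], argv.slice(1))
}
```

For example, cmd`ls -l ${"my file.txt"}` yields ["ls", "-l", "my file.txt"]: the space inside the interpolated file name cannot split it into two arguments, because no shell ever sees a concatenated string.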
With the spec document in hand, I turned over to Claude Code for the actual implementation work. The first step was to further refine the spec, asking Claude if anything was unclear. There were a couple of interesting clarifications there. First, the original ChatGPT spec didn't get what I meant by my "current directory mapping" idea, that I want to materialize a local as remote , even if are different. ChatGPT generated an incorrect description and an incorrect example. I manually corrected the example, but wasn't able to write a concise and correct description. Claude fixed that working from the example. I feel like I need to internalize this more — for the current crop of AI, examples seem to be far more valuable than rules. Second, the spec included my desire to auto-shutdown machines once I no longer use them, just to make sure I don't forget to turn the lights off when leaving the room. Claude grilled me on what precisely I wanted there, and I asked it to DWIM the thing. The spec ended up being 6KiB of English prose. The final implementation was 14KiB of TypeScript. I wasn't keeping the spec and the implementation perfectly in sync, but I think they ended up pretty close in the end. Which means that prose specifications are somewhat more compact than code, but not much more compact. My next step was to try to just one-shot this. Ok, this is embarrassing, and I usually avoid swearing in this blog, but I just typoed that as "one-shit", and, well, that is one flavorful description I won't be able to improve upon. The result was just not good (more on why later), so I almost immediately decided to throw it away and start a more incremental approach. In my previous vibe-post , I noticed that LLMs are good at closing the loop. A variation here is that LLMs are good at producing results, but not necessarily good code. I am pretty sure that, if I had let the agent iterate on the initial script and actually run it against AWS, I would have gotten something working.
I didn't want to go that way for three reasons: And, as I said, the code didn't feel good, for these specific reasons: The incremental approach worked much better; Claude is good at filling in the blanks. The very first thing I did for was manually typing in: Then I asked Claude to complete the function, and I was happy with the result. Note (Show, Don't Tell): I am not asking Claude to avoid throwing an exception and fail fast instead. I just give function, and it code-completes the rest. I can't say that the code inside is top-notch. I'd probably have written something more spartan. But the important part is that, at this level, I don't care. The abstraction for parsing CLI arguments feels right to me, and the details I can always fix later. This is how this overall vibe-coding session transpired — I was providing structure, and Claude was painting by the numbers. In particular, with that CLI parsing structure in place, Claude had little problem adding new subcommands and new arguments in a satisfactory way. The only snag was that, when I asked to add an optional path to , it went with , while I strongly prefer . Obviously, it's better to pick your null in JavaScript and stick with it. The fact that is unavoidable predetermines the winner. Given that the argument was added as a small incremental change, course-correcting was trivial. The null vs undefined issue perhaps illustrates my complaint about the code lacking character. is the default non-choice. is an insight, which I personally learned from the VS Code LSP implementation. The hand-written skeleton/vibe-coded guts approach worked not only for the CLI. I wrote and then asked Claude to write the body of a particular function according to the SPEC.md. Unlike with the CLI, Claude wasn't able to follow this pattern itself. With one example it's not obvious, but the overall structure is that is the AWS-level operation on a single box, and is the CLI-level control flow that deals with looping and parallelism.
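The split just described, a per-instance AWS operation underneath a CLI-level control-flow wrapper, might look roughly like this. All names and bodies here are illustrative stand-ins, not the actual script:

```javascript
// Hypothetical sketch: `instanceStop` is the AWS-level operation on a single
// box; `cmdStop` is the CLI-level control flow handling looping/parallelism.
const stopped = []; // records calls, standing in for real AWS side effects

async function instanceStop(name) {
  // In the real script this would be a per-instance AWS API call.
  stopped.push(name);
  return name;
}

async function cmdStop(names) {
  // Fan out over all requested machines in parallel.
  return Promise.all(names.map(instanceStop));
}
```

The payoff of the split is that every new subcommand reuses the same fan-out shape, and the per-box operations stay small enough for an agent to fill in.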
When I asked Claude to implement , without myself doing the / split, Claude failed to notice it and needed a course correction. However , Claude was massively successful with the actual logic. It would have taken me hours to acquire the specific, non-reusable knowledge needed to write: I want to be careful — I can't vouch for the correctness and especially the completeness of the above snippet. However, given that the nature of the problem is such that I can just run the code and see the result, I am fine with it. If I were writing this myself, trial-and-error would totally be my approach as well. Then there's synthesis — with several instance commands implemented, I noticed that many started with querying AWS to resolve a symbolic machine name, like "1", to the AWS name/IP. At that point I realized that resolving symbolic names is a fundamental part of the problem, and that it should only happen once, which resulted in the following refactored shape of the code: Claude was ok with extracting the logic, but messed up the overall code layout, so the final code motions were on me. "Context" arguments go first , not last; a common prefix is more valuable than a common suffix because of visual alignment. The original "one-shotted" implementation also didn't do up-front querying. This is an example of a shape of a problem I only discover when working with code closely. Of course, the script didn't work perfectly the first time, and we needed quite a few iterations on the real machines, both to fix coding bugs and to close gaps in the spec. That was an interesting experience of speed-running rookie mistakes. Claude made naive bugs, but was also good at fixing them. For example, when I first tried to after , I got an error. Pasting it into Claude immediately showed the problem. Originally, the code was doing and not . The former checks if the instance is logically created; the latter waits until the OS is booted.
It makes sense that these two exist, and the difference is clear (and it's also clear that OS booted != SSH daemon started). Claude's value here is in providing specific names for the concepts I already know to exist. Another fun one was about the disk. I noticed that, while the instance had an SSD, it wasn't actually used. I asked Claude to mount it as home, but that didn't work. Claude immediately asked me to run and that log immediately showed the problem. This is remarkable! 50% of my typical Linux debugging day is wasted not knowing that a useful log exists, and the other 50% is spent searching for the log I know should exist somewhere . After the fix, I lost the ability to SSH. Pasting the error immediately gave the answer — by mounting over , we were overwriting the ssh keys configured prior. There were a couple more iterations like that. Rookie mistakes were made, but they were debugged and fixed much faster than my personal knowledge allows (and again, I feel that this is trivia knowledge, rather than deep reusable knowledge, so I am happy to delegate it!). It worked satisfactorily in the end, and, what's more, I am happy to maintain the code, at least to the extent that I personally need it. It's kinda hard to measure the productivity boost here, but, given just the sheer number of CLI flags required to make this work, I am pretty confident that time was saved, even factoring in the writing of the present article! I've recently read The Art of Doing Science and Engineering by Hamming (of Hamming distance and Hamming code), and one story stuck with me: A psychologist friend at Bell Telephone Laboratories once built a machine with about 12 switches and a red and a green light. You set the switches, pushed a button, and either you got a red or a green light. After the first person tried it 20 times they wrote a theory of how to make the green light come on. The theory was given to the next victim and they had their 20 tries and wrote their theory, and so on endlessly.
The stated purpose of the test was to study how theories evolved. But my friend, being the kind of person he was, had connected the lights to a random source! One day he observed to me that no person in all the tests (and they were all high-class Bell Telephone Laboratories scientists) ever said there was no message. I promptly observed to him that not one of them was either a statistician or an information theorist, the two classes of people who are intimately familiar with randomness. A check revealed I was right! https://github.com/catern/rsyscall https://peter.bourgon.org/blog/2011/04/27/remote-development-from-mac-to-linux.html https://github.com/dsherret/dax Spawning VMs takes time, and that significantly reduces the throughput of agentic iteration. No way I let the agent run with a real AWS account, given that AWS doesn't have a fool-proof way to cap costs. I am fairly confident that this script will be a part of my workflow for at least several years, so I care more about long-term code maintenance, than immediate result. It wasn't the code that I would have written, it lacked my character, which made it hard for me to understand it at a glance. The code lacked any character whatsoever. It could have worked, it wasn't "naively bad", like the first code you write when you are learning programming, but there wasn't anything good there. I never know what the code should be up-front.
I don't design solutions, I discover them in the process of refactoring. Some of my best work was spending a quiet weekend rewriting large subsystems implemented before me, because, with an implementation at hand, it was possible for me to see the actual, beautiful core of what needed to be done. With a slop-dump, I just don't even get to see what could be wrong. In particular, while you are working the code (as in "wrought iron"), you often go back to the requirements and change them. Remember the ambiguity of my request to "shut down idle cluster"? Claude tried to DWIM it and created some horrific mess of bash scripts, timestamp files, PAM policy, and systemd units. But the right answer there was "let's maybe not have that feature?" (in contrast, simply shutting the machine down after 8 hours is a one-liner).


Compiling Scheme to WebAssembly

One of my oldest open-source projects - Bob - celebrated its 15th birthday a couple of months ago . Bob is a suite of implementations of the Scheme programming language in Python, including an interpreter, a compiler and a VM. Back then I was doing some hacking on CPython internals and was very curious about how CPython-like bytecode VMs work; Bob was an experiment to find out, by implementing one from scratch for R5RS Scheme. Several months later I added a C++ VM to Bob , as an exercise to learn how such VMs are implemented in a low-level language without all the runtime support Python provides; most importantly, without the built-in GC. The C++ VM in Bob implements its own mark-and-sweep GC. After many quiet years (with just a sprinkling of cosmetic changes, porting to GitHub, updates to Python 3, etc.), I felt the itch to work on Bob again just before the holidays. Specifically, I decided to add another compiler to the suite - this one from Scheme directly to WebAssembly. The goals of this effort were two-fold: Well, it's done now; here's an updated schematic of the Bob project: The new part is the rightmost vertical path. A WasmCompiler class lowers parsed Scheme expressions all the way down to WebAssembly text, which can then be compiled to a binary and executed using standard WASM tools [2] . The most interesting aspect of this project was working with WASM GC to represent Scheme objects. As long as we properly box/wrap all values in ref s, the underlying WASM execution environment will take care of the memory management. For Bob, here's how some key Scheme objects are represented: $PAIR is of particular interest, as it may contain arbitrary objects in its fields; (ref null eq) means "a nullable reference to something that has identity". ref.test can be used to check - for a given reference - the run-time type of the value it refers to. You may wonder - what about numeric values?
Here WASM has a trick - the i31 type can be used to represent a reference to an integer, but without actually boxing it (one bit is used to distinguish such an object from a real reference). So we don't need a separate type to hold references to numbers. Also, the $SYMBOL type looks unusual - how is it represented with two numbers? The key to the mystery is that WASM has no built-in support for strings; they have to be implemented manually using offsets into linear memory. The Bob WASM compiler emits the string values of all symbols encountered into linear memory, keeping track of the offset and length of each one; these are the two numbers placed in $SYMBOL . This also makes it fairly easy to implement symbol interning in Scheme; multiple instances of the same symbol will only be allocated once. Consider this trivial Scheme snippet: The compiler emits the symbols "foo" and "bar" into linear memory as follows [3] : And looking for one of these addresses in the rest of the emitted code, we'll find: As part of the code for constructing the constant cons list representing the argument to write ; address 2051 and length 3: this is the symbol bar . Speaking of write , implementing this builtin was quite interesting. For compatibility with the other Bob implementations in my repository, write needs to be able to print recursive representations of arbitrary Scheme values, including lists, symbols, etc. Initially I was reluctant to implement all of this functionality by hand in WASM text, but all alternatives ran into challenges: So I bit the bullet and - with some AI help for the tedious parts - just wrote an implementation of write directly in WASM text; it wasn't really that bad. I import only two functions from the host: Though emitting integers directly from WASM isn't hard , I figured this project already has enough code and some host help here would be welcome. For all the rest, only the lowest-level write_char is used.
For example, here's how booleans are emitted in the canonical Scheme notation ( #t and #f ): This was a really fun project, and I learned quite a bit about realistic code emission to WASM. Feel free to check out the source code of WasmCompiler - it's very well documented. While it's a bit over 1000 LOC in total [4] , more than half of that is actually WASM text snippets that implement the builtin types and functions needed by a basic Scheme implementation. In Bob this is currently done with bytecodealliance/wasm-tools for the text-to-binary conversion and Node.js for the execution environment, but this can change in the future. I actually wanted to use Python bindings to wasmtime, but these don't appear to support WASM GC yet. Experiment with lowering a real, high-level language like Scheme to WebAssembly. Experiments like the recent Let's Build a Compiler compile toy languages that are at the C level (no runtime). Scheme has built-in data structures, lexical closures, garbage collection, etc. It's much more challenging. Get some hands-on experience with the WASM GC extension [1] . I have several samples of using WASM GC in the wasm-wat-samples repository , but I really wanted to try it for something "real". Deferring this to the host is difficult because the host environment has no access to WASM GC references - they are completely opaque. Implementing it in another language (maybe C?) and lowering to WASM is also challenging for a similar reason - the other language is unlikely to have a good representation of WASM GC objects.
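As an aside, the (offset, length) interning scheme used for symbols can be sketched outside of WASM. This is an illustrative JavaScript analogue, not Bob's actual code: an array stands in for linear memory, and each distinct symbol's bytes are emitted exactly once:

```javascript
// "Linear memory" and the intern table; names here are illustrative.
const memory = [];
const interned = new Map(); // symbol name -> { offset, length }

function internSymbol(name) {
  // A repeated symbol reuses the already-emitted (offset, length) pair.
  if (interned.has(name)) return interned.get(name);
  const entry = { offset: memory.length, length: name.length };
  for (const ch of name) memory.push(ch.charCodeAt(0)); // emit the bytes once
  interned.set(name, entry);
  return entry;
}
```

Interning "foo", then "bar", then "foo" again emits six bytes total, and both occurrences of "foo" share the same (0, 3) pair, mirroring how Bob's compiler allocates each symbol's string only once.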

Abhinav Sarkar 3 weeks ago

Implementing Co, a Small Language With Coroutines #5: Adding Sleep

In the previous post , we added channels to Co , the small language we are implementing in this series of posts. In this post, we add the primitive to it, enabling time-based coroutine scheduling. We then use sleep to build a simulation of digital logic circuits. This post was originally published on abhinavsarkar.net . This post is a part of the series: Implementing Co, a Small Language With Coroutines . Sleep is a commonly used operation in concurrent programs. It pauses the execution of the current Thread of Computation (ToC) for a specified duration, after which the ToC is resumed automatically. Sleep is used for various purposes: polling for events, delaying execution of an operation, simulating latency, implementing timeouts, and more. Sleep is generally implemented as a primitive operation in most languages, delegating the actual implementation to the underlying operating system. The operating system’s scheduler removes the ToC from the list of runnable ToCs , places it in a list of sleeping ToCs , and after the specified duration, moves it back to the list of runnable ToCs for scheduling. Since Co implements its own ToC (coroutine) scheduler, we implement sleep as a primitive operation within the interpreter itself 1 . We start by exposing and as built-in functions to Co : The built-in function takes one argument—the duration in milliseconds to sleep for. The function returns the current time in milliseconds since the Unix epoch . Both of them delegate to the functions explained next. The function evaluates its argument to a number, checks that it is non-negative, and then calls the function in the monad. calls and returns the milliseconds wrapped as a . The implementation of sleep is more involved than other built-in functions because it interacts with the coroutine scheduler. When a coroutine calls , we want to suspend the coroutine, and schedule it to be resumed after the specified duration. 
There may be multiple coroutines in the sleep state at a time, and they must be resumed according to their wakeup time (time at which sleep was called + sleep duration), and not in any other order. To be efficient, it is also important that the scheduler does not poll repeatedly for new coroutines to wake up and run, but instead waits till the right time. These are the two requirements for our coroutine scheduler. And the solution is: delayed coroutines. The coroutines we have implemented so far were scheduled to run immediately. To implement sleep, we extend the coroutine concept with Delayed Coroutines —coroutines that are scheduled to run at a specific future time. Now the data type holds an to signal when the coroutine is ready to be run. The old-style coroutines that run immediately are created ready to run by the function. But delayed coroutines are different: The key difference from a regular coroutine is that the used for signaling is created empty. We fork a thread 2 that sleeps 3 for the specified sleep duration, and then signals that the coroutine is ready to run by filling the . An is a synchronization primitive 4 —essentially a mutable box that can hold a value or be empty. When we call on an empty , it blocks until another thread fills it. This is what makes it powerful for our use case: instead of the interpreter repeatedly polling the queue asking “is this coroutine ready yet?”, we let the interpreter wait on the . The forked thread signals readiness at the right time by filling the . The interpreter wakes up immediately—no wasted CPU cycles, no busy-waiting. We already have a of coroutines in our . It is a min-priority queue sorted by timestamps, which we have been using as a FIFO queue till now. Now we use it for its real purpose: storing delayed coroutines sorted by their wakeup times. The queue also tracks the maximum wakeup time of all coroutines in the queue. 
This information is useful for calculating how long the interpreter should sleep before termination. The core operations on the queue are: We saw the function earlier : The function enqueues the given value at the given time in the queue. The function enqueues the value at the current time, thus scheduling it to run immediately. The function dequeues the value with the lowest priority from the queue, which in this case, is the value that is enqueued first. The function returns the monotonically increasing current system time. The function dequeues the coroutine with lowest priority, so if we use the wakeup time as priority, it will dequeue the coroutine that is to be run next. That works! The function calculates and tracks the maximum wakeup times of the coroutines as well. Next, we implement the scheduling of delayed coroutines: The function enqueues a coroutine in the interpreter coroutine queue with the specified wakeup time. We also improve the function to wait for the coroutine to be ready before running it. The function call blocks till the thread that was forked when creating the coroutine wakes up and fills the . So we don’t have to poll the queue. That’s all we have to do for having delayed coroutines. With the infrastructure in place, the function becomes straightforward: When a coroutine calls , we capture the current environment and use to capture the continuation—the code that should run after the sleep completes. We then create a new delayed coroutine with this continuation, schedule it for the future, and run the next coroutine in the queue. The scheduler machinery takes care of running the delayed coroutine at the right time. We also modify the function from the previous post to handle delayed coroutines. It now sleeps till the last wakeup time before checking if the queue is empty: Notice how we use the function we just defined in . The function calculates how long to sleep before the last coroutine becomes ready: That’s all for sleeping. 
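For readers more familiar with JavaScript, the same suspend-and-resume behavior is what a Promise-based sleep gives you: the async function suspends at await, and the event loop's timer queue plays the role of Co's wakeup-time priority queue. A minimal sketch (an analogue, not Co's Haskell implementation):

```javascript
// Sleep as a Promise: setTimeout schedules the wakeup, resolve() is the
// signal that the suspended "coroutine" is ready to run again.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function demo() {
  const order = [];
  // Two "coroutines" with different wakeup times resume in wakeup order,
  // not in spawn order.
  await Promise.all([
    sleep(30).then(() => order.push("late")),
    sleep(10).then(() => order.push("early")),
  ]);
  return order;
}
```

As in Co, nothing busy-waits here: the runtime wakes each sleeper exactly when its timer fires.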
This may be too much to take in, so let’s go through some examples. Sleep can be used for polling/waiting for events, delaying execution, simulating latency, implementing timeouts, and more. Let’s see some simple uses. An interesting example of sleep is the infamous sleep sort , which sorts a list of numbers by spawning a coroutine for each number that sleeps for the duration of that number, then prints it: Running this program prints what we expect: Don’t use for sorting your numbers though. Moving on. With sleep, we can implement JavaScript-like and functions: The function spawns a coroutine that sleeps for the specified duration and then calls the callback function. The function repeatedly calls a callback at a fixed interval using to reschedule itself. Running the above code prints alternating and every 1 second, forever: Notice that the scheduling is not accurate up to milliseconds, but only approximate. As a more complex example of using sleep, we implement a simulator for digital logic circuits, from basic Logic gates to a Ripple carry adder . The idea is to model circuits as a network of wires and gates, where the wires carry digital signal values ( or ), and the logic gates transform input signals to output signals with a propagation delay. The digital circuit simulation example is from the Wizard Book . Quoting an example: An inverter is a primitive function box [logic gate] that inverts its input. If the input signal to an inverter changes to 0, then one inverter-delay later the inverter will change its output signal to 1. If the input signal to an inverter changes to 1, then one inverter-delay later the inverter will change its output signal to 0. But first, we’ll need to make some lists. We implement a simple cons list (a singly linked list) using a trick from the book itself : creates an empty list, and we grow the list by prepending an element to it by calling the function. returns the first element of a list, and returns the rest of them. 
Notice that a cell is just a closure that holds references to its first and rest parameters, and returns a selector function to retrieve them. Next, we define a helper function to call a list of actions, yielding after each one: A wire holds a mutable signal value and a list of actions to call when the signal changes: A wire provides three operations: The function connects two wires, causing the signal from one to propagate to another. First, we define the basic logic operations: And a utility function to schedule a function to run after a delay: With these building blocks, we define the logic gates. Each gate computes its output based on its inputs and schedules the output update after a propagation delay specific to the gate: We add the action to each input wire, which runs when the input signals change, and sets the signal on the output wire after a delay. Let’s test an And gate: For probing, we define a helper that logs signal changes with milliseconds elapsed since start of the run: The output: It works as expected. You can notice the sleep and the And gate delay in action. Using the basic logic gates, next we build adders. A Half adder is a digital circuit that adds two bits: It has two input signals/bits and , and two output bits and . We simply connect the And, Or and Not gates with input, output and intermediate wires in our code as shown in the diagram: Nice and simple. Let’s test it: And the output: In binary, . Correct! Notice again how the signal propagation through the gates is delayed. Next up is the full adder. A Full adder adds three bits, two inputs and a carry-in: Notice that a full adder uses two half adders. Again, we follow the diagram and connect the wires: Let’s skip the demo for full adder and jump to something more exciting. A Ripple-carry adder chains together multiple full adders to add multi-bit numbers. The diagram below shows a four-bit adder: We create a ripple-carry adder that can add any number of bits. 
First we need some helper functions: one creates a list of wires to represent an N-bit input/output, and another sets the bits of an N-bit wire list to a given N-bit value. Now we write a ripple-carry adder: The ripple-carry adder uses one full adder per bit, cascading the carry-out bit of each input bit-pair’s sum to the next pair of bits. To demonstrate, let’s add two 4-bit numbers: This one runs for a while because of the collective delays. Let me pick out the final output: We add the two numbers in binary, and the resulting sum is correct again. Everything works perfectly. With the addition of sleep, we’ve completed our implementation of Co —a small concurrent language with first-class coroutines, channels, and time-based scheduling. Over these five posts, we went from parsing source code to building a full interpreter that handles cooperative multitasking using coroutines. The key insight was realizing that coroutines are just environments plus continuations. By designing our interpreter to use continuation-passing style, we gained the ability to suspend execution at any point and resume it later. Channels built naturally on top of that, providing a way for coroutines to synchronize and pass messages. And sleep extended the scheduler to handle time-based execution, unlocking patterns like timeouts and periodic tasks. The examples we built along the way—the pub-sub system, the actor system, and the digital circuit simulation—show what becomes possible once these primitives are in place. Starting with basic arithmetic and functions, we ended up with a language capable of expressing real concurrent programs. What comes next? Maybe a compiler for Co ? Stay tuned by subscribing to the feed or the email newsletter . The full code for the Co interpreter is available here . If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
The sleep implementation in Co is not interruptible. That is, if a coroutine is sleeping, it cannot be resumed before the specified duration. This is different from sleep implementations in most programming languages, where the sleep operation can be interrupted by sending a signal to the sleeping thread. ↩︎ Threads in GHC are green threads and are very cheap to create and run. It is perfectly okay to fork a new one for each delayed coroutine. ↩︎ So in a way, we cheat here by using the sleep primitive provided by the GHC runtime to implement our sleep primitive. If we write a compiler for Co , we’ll have to write our own runtime where we’ll have to implement our sleep function using the functionality provided by the operating system. ↩︎ To learn more about how MVars can be used to communicate between threads, read Chapter 24 of Real World Haskell . ↩︎ This post is a part of the series: Implementing Co, a Small Language With Coroutines . If you liked this post, please leave a comment . The Interpreter · Adding Coroutines · Adding Channels · Adding Sleep 👈
