Posts in JavaScript (20 found)
Chris Coyier Yesterday

AI is my CMS

I mean… it’s not really, of course. I just thought such a thing would start to trickle out to people’s minds as agentic workflows start to take hold. Has someone written "AI is my CMS" yet? Feels inevitable. Like why run a build tool when you can just prompt another page? AI agents are already up in your codebase fingerbanging whole batches of files on command. What’s the difference between a CMS taking some content and smashing it into some templates and an AI doing that same job instead? Isn’t less tooling good?

I had missed that this particular topic already had quite a moment in the sun this past December. Lee Robinson wrote Coding Agents & Complexity Budgets. Without calling it out by name, Lee basically had a vibe-coding weekend where he ripped out Sanity from cursor.com. I don’t think Lee is wrong for this choice. Spend some money to save some money. Remove some complexity. Get the code base more AI-ready. Yadda yadda.

Even though Lee didn’t call out Sanity, they noticed and responded. They also make some good and measured points, I think. Which makes this a pretty great blog back-and-forth, by the way, which you love to see. Some of their argument as to why it can be the right choice to have Sanity is that some abstraction and complexity can be good, actually, because building websites from content can be complicated, especially as time and scale march on. And if you rip out a tool that does some of it, only to re-build many of those features in-house, what have you really gained?

TIME FOR MY TWO CENTS. The language feels a little wrong to me. I think if you’re working with Markdown files as content in a Next.js app… that’s already a CMS. You didn’t rip out a CMS, you ripped out a cloud database. Yes, that cloud database does binary assets also, and handles user management, and has screens for CRUDing the content, but to me it’s more of a cloud data store than a CMS. The advantage Lee got was getting the data and assets out of the cloud data store. I don’t think they were using stuff like the fancy GROQ language to get at their content in fine-grained ways. It’s just that cursor.com happened to not really need a database, and in fact was using it for things it probably shouldn’t have been (like video hosting).

Me, I don’t think there is one right answer. If keeping content in Markdown files and building sites by smashing those into templates is wrong, then every static site generator ever built is wrong (🙄). But keeping content in databases isn’t wrong either. I tend to lean that way by default, since the power you get from being able to query is so obviously and regularly useful. Maybe they are both right in that having LLM tools with the power to wiggleworm their way into the content no matter where it is, is helpful. In the codebase? Fine. In a DB that an MCP can access? Fine.

David Bushell 3 days ago

SvelteKit i18n and FOWL

Perhaps my favourite JavaScript APIs live within the Internationalization namespace. A few neat things the global allows: natural alphanumeric sorting, relative dates and times, and currency formatting. It’s powerful stuff and the browser or runtime provides locale data for free! That means timezones, translations, and local conventions are handled for you. Remember moment.js? That library with locale data is over 600 KB (uncompressed). That’s why JavaScript now has the Internationalization API built in.

SvelteKit and similar JavaScript web frameworks allow you to render a web page server-side and “hydrate” it in the browser. In theory, you get the benefits of an accessible static website with the progressively enhanced delights of a modern “web app”. I’m building attic.social with SvelteKit. It’s an experiment without much direction. I added a bookmarks feature and used `Intl.DateTimeFormat` to format dates. Perfect! Or was it? Disaster strikes! What is happening here? Because I don’t specify any locale argument in the constructor, it uses the runtime’s default. When left unconfigured, many environments will default to `en-US`. I spotted this bug only in production because I’m hosting on a Cloudflare Worker. SvelteKit’s first render is server-side using the worker’s default locale, but subsequent renders use my browser’s locale. My eyes are briefly sullied by the inferior US format! Is there a name for this effect? If not, I’m coining: “Flash of Wrong Locale” (FOWL).

To combat FOWL we must ensure that SvelteKit has the user’s locale before any templates are rendered. Browsers may request a page with the `Accept-Language` HTTP header. The place to read headers is hooks.server.ts. I’ve vendored the @std/http negotiation library to parse the request header. If no locales are provided it returns a wildcard, which I change to a sensible default. SvelteKit’s `locals` is an object to store custom data for the lifetime of a single request. Event `locals` are not directly accessible to SvelteKit templates. That could be dangerous. We must use a page or layout load function to forward the data. Now we can update the original example to use the data.
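The negotiation step can be sketched in plain JavaScript. This is a minimal stand-in written for illustration — not the vendored @std/http negotiator the post uses — and the `en` fallback is an assumption:

```javascript
// FOWL in miniature: the same instant formats differently per locale.
const date = new Date(Date.UTC(2026, 0, 5));
const opts = { dateStyle: "medium", timeZone: "UTC" };
const us = new Intl.DateTimeFormat("en-US", opts).format(date);
const gb = new Intl.DateTimeFormat("en-GB", opts).format(date);
console.log(us, "vs", gb); // e.g. "Jan 5, 2026" vs "5 Jan 2026"

// Minimal Accept-Language sketch (NOT the @std/http parser):
// pick the highest-q language tag the runtime has locale data for.
function pickLocale(header) {
  if (!header || header.trim() === "" || header.trim() === "*") return "en";
  const ranked = header
    .split(",")
    .map((part) => {
      const [tag, q] = part.trim().split(";q=");
      return { tag, q: q ? Number(q) : 1 };
    })
    .sort((a, b) => b.q - a.q);
  for (const { tag } of ranked) {
    if (tag !== "*" && Intl.DateTimeFormat.supportedLocalesOf(tag).length > 0) {
      return tag;
    }
  }
  return "en"; // assumed default when nothing matches
}
```

In a SvelteKit app this logic would live in hooks.server.ts, storing the result on `event.locals` and forwarding it to templates through a load function, as the post describes.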
I don’t think the rune is strictly necessary but it stops a compiler warning. This should eliminate FOWL unless the `Accept-Language` header is missing. Privacy-focused browsers like Mullvad Browser use a generic header to avoid fingerprinting. That means users opt out of internationalisation, but FOWL is still gone. If there is a cache in front of the server, it must vary based on the `Accept-Language` header. Otherwise one visitor defines the locale for everyone who follows, unless something like a session cookie bypasses the cache. You could provide a custom locale preference to override browser settings. I’ve done that before for larger SvelteKit projects. Link that to a session and store it in a cookie, or database. Naturally, someone will complain they don’t like the format they’re given. This blog post is guaranteed to elicit such a comment. You can’t win!

Why can’t you be normal, Safari? Despite using the exact same locale, Safari still commits FOWL by using an “at” word instead of a comma. Whose fault is this? The ECMAScript standard recommends using data from Unicode CLDR. I don’t feel inclined to dig deeper. It’s a JavaScriptCore quirk, because Bun does the same. That is unfortunate because it means the standard is not quite standard across runtimes. By the way, the i18n and l10n abbreviations are kinda lame to be honest. It’s a fault of my design choices that “internationalisation” didn’t fit well in my title. Thanks for reading! Follow me on Mastodon and Bluesky. Subscribe to my Blog and Notes or Combined feeds.

iDiallo 4 days ago

The Server Older than my Kids!

This blog runs on two servers. One is the main PHP blog engine that handles the logic and the database, while the other serves all static files.

Many years ago, an article I wrote reached the top position on both Hacker News and Reddit. My server couldn't handle the traffic. I literally had a terminal window open, monitoring the CPU and restarting the server every couple of minutes. But I learned a lot from it. The page receiving all the traffic had a total of 17 assets. So in addition to the database getting hammered, my server was spending most of its time serving images, CSS, and JavaScript files. So I decided to set up additional servers to act as a sort of CDN to spread the load. I added multiple servers around the world and used MaxMindDB to determine a user's location and serve files from the closest server. But it was overkill for a small blog like mine. I quickly downgraded back to just one server for the application and one for static files.

Ever since I set up this configuration, my server has never failed due to a traffic spike. In fact, in 2018, right after I upgraded the servers to Ubuntu 18.04, one of my articles went viral like nothing I had seen before. Millions of requests hammered my server. The machine handled the traffic just fine.

It's been 7 years now. I've procrastinated long enough. An upgrade was long overdue. What kept me from upgrading to Ubuntu 24.04 LTS was that I had customized the server heavily over the years and never documented any of it. Provisioning a new server means setting up accounts, dealing with permissions, and transferring files. All of this should have been straightforward with a formal process. Instead, uploading blog post assets has been a very manual affair. I only partially completed the upload interface, so I've been using SFTP and SCP from time to time to upload files. It's only now that I've finally created a provisioning script for my asset server.
I mostly used AI to generate it, then used a configuration file to set values such as email, username, SSH keys, and so on. With the click of a button, and 30 minutes of waiting for DNS to update, I now have a brand new server running Ubuntu 24.04, serving my files via Nginx. Yes, in the coming months Ubuntu 26.04 LTS comes out, and I can migrate to it by running the same script. I also built an interface for uploading content without relying on SFTP or SSH, which I'll be publishing on GitHub soon.

It's been 7 years running this server. It's older than my kids. Somehow, I feel a pang of emotion thinking about turning it off. I'll do it tonight... But while I'm at it, I need to do something about the 9-year-old and 11-year-old servers that still run some crucial applications.


I built a site for hosting events

Recently I started the Columbus Vintage Computing Club, a group for nerds like me who enjoy classic computers and software. Our first meeting was a week ago and it was an absolute blast. We had a Tandy and a VIC-20 on display, and lengthy discussions down memory lane!

While I was planning the group, I knew I needed somewhere to post about the club and see who was interested in attending. I've run groups before (previously the Frontend Columbus group), so I went to the platform I knew best, Meetup. The first thing I noticed was that Meetup absolutely chugged on my laptop. A site for posting groups and events should not bring my laptop to its knees, but I guess that's the JavaScript-filled world we live in. The next thing I noticed (after investing the time in creating my group and event) was that as a free-tier peasant, I was only allowed to have 10 people RSVP to my event. Absolutely insane! Everywhere, Meetup was shoving their subscription in front of me, begging for my cash, just so I can run a free event for computer nerds.

So fuck Meetup, I built Kai. Kai is a free and open-source platform for posting groups and organizing events. Open to anyone, no ads, no tracking. You can check it out at KaiMeet.com. It's beta, it's rough around the edges, and I only loaded in support for U.S. cities at the moment (want to use it somewhere else? Email me!). But it works for my club; maybe it'll work for yours? More features will be coming as needed for the club I host, or if people reach out and ask for things. You can also open a PR on GitHub. Kai will be run as donationware (BuyMeACoffee link). If ya like it, help me pay the server cost.

Kai represents something interesting that, to be honest, I'm excited to watch: the fall of SaaS companies that have ruined the internet. I'm not saying Kai will be disruptive, but somebody's version of what I'm doing will be. And people will keep doing this now that creating software is accessible. Those services that prey on being the only game in town, that sell your data to advertising partners and shove subscriptions in your face? They should be freaking the fuck out right now.

Chris Coyier 4 days ago

Claude is an Electron App

Juicy intro from Nikita Prokopov: In “Why is Claude an Electron App?”, Drew Breunig wonders:

Claude spent $20k on an agent swarm implementing (kinda) a C compiler in Rust, but desktop Claude is an Electron app. If code is free, why aren’t all apps native?

And then argues that the answer is that LLMs are not good enough yet. They can do 90% of the work, so there’s still a substantial amount of manual polish, and thus, increased costs. But I think that’s not the real reason. The real reason is: native has nothing to offer.

Ankur Sethi 4 days ago

I built a programming language using Claude Code

Over the course of four weeks in January and February, I built a new programming language using Claude Code. I named it Cutlet after my cat. It’s completely legal to do that. You can find the source code on GitHub, along with build instructions and example programs.

I’ve been using LLM-assisted programming since the original GitHub Copilot release in 2021, but so far I’ve limited my use of LLMs to generating boilerplate and making specific, targeted changes to my projects. While working on Cutlet, though, I allowed Claude to generate every single line of code. I didn’t even read any of the code. Instead, I built guardrails to make sure it worked correctly (more on that later).

I’m surprised by the results of this experiment. Cutlet exists today. It builds and runs on both macOS and Linux. It can execute real programs. There might be bugs hiding deep in its internals, but they’re probably no worse than the ones you’d find in any other four-week-old programming language in the world. I have Feelings™ about all of this and what it means for my profession, but I want to give you a tour of the language before I get up on my soapbox.

If you want to follow along, build the Cutlet interpreter from source and drop into a REPL. Arrays and strings work as you’d expect in any dynamic language. Variables are declared with a dedicated keyword, and variable names can include dashes, following the same syntax rules as Raku. The only type of number (so far) is a double.

Here’s something cool: a meta-operator turns any regular binary operator into a vectorized operation over an array — multiplying every element of an array by 1.8, say, then adding 32 to each element of the resulting array. Another operator is a zip operation: it zips two arrays into a map. Output text using the built-in print function; it returns Cutlet’s version of a null value. The meta-operator also works with comparisons. Here’s another cool bit: you can index into an array using an array of booleans. This is a filter operation: it picks the elements whose indexes correspond to true and discards those that correspond to false. There’s a shorter way of writing that, too.

Let’s print this out with a user-friendly message. One operator concatenates strings and arrays, and a built-in turns things into strings. The meta-operator in the prefix position acts as a reduce operation. Let’s find the average temperature: the reduction adds all the temperatures, and a built-in finds the length of the array. Let’s print this out nicely, too.

Functions are declared with a keyword. Everything in Cutlet is an expression, including functions and conditionals. The last value produced by an expression in a function becomes its return value. Your own functions work with the meta-operators too. Let’s reduce the temperatures with our function to find the hottest temperature.

Cutlet can do a lot more. It has all the usual features you’d expect from a dynamic language: loops, objects, prototypal inheritance, mixins, a mark-and-sweep garbage collector, and a friendly REPL. We don’t have file I/O yet, and some fundamental constructs like error handling are still missing, but we’re getting there! See TUTORIAL.md in the git repository for the full documentation.

I’m a frontend engineer and (occasional) designer. I’ve tried using LLMs for building web applications, but I’ve always run into limitations. In my experience, Claude and friends are scary good at writing complex business logic, but fare poorly on any task that requires visual design skills. Turns out describing responsive layouts and animations in English is not easy. No amount of screenshots and wireframes can communicate fluid layouts and animations to an LLM. I’ve wasted hours fighting with Claude about layout issues it swore it had fixed, but which I could still see plainly with my leaky human eyes. I’ve also found these tools excel at producing cookie-cutter interfaces they’ve seen before in publicly available repositories, but they fall off when I want to do anything novel.
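The post’s inline code samples didn’t survive syndication, so here is a rough JavaScript approximation of the array semantics described in the tour above — vectorized operators, zip, boolean-mask indexing, and prefix-reduce. The function names are stand-ins, not Cutlet syntax:

```javascript
// Celsius readings, as in the post's running temperature example.
const celsius = [0, 10, 20, 30];

// Vectorized meta-operator: apply a binary op elementwise against a scalar.
const vec = (xs, op, y) => xs.map((x) => op(x, y));
const fahrenheit = vec(vec(celsius, (a, b) => a * b, 1.8), (a, b) => a + b, 32);
// → [32, 50, 68, 86]

// Zip operator: pair two arrays into a map.
const zip = (keys, vals) => new Map(keys.map((k, i) => [k, vals[i]]));

// Boolean-mask indexing: keep elements whose mask entry is true.
const mask = fahrenheit.map((f) => f > 60); // comparisons vectorize too
const warm = fahrenheit.filter((_, i) => mask[i]); // → [68, 86]

// Prefix meta-operator as reduce: fold an operator over the array.
const sum = fahrenheit.reduce((a, b) => a + b); // 236
const average = sum / fahrenheit.length; // 59
const hottest = fahrenheit.reduce((a, b) => Math.max(a, b)); // 86
```

In Cutlet these are single operators rather than helper functions, but the data flow is the same.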
I often work with clients building complex data visualizations for niche domains, and LLMs have comprehensively failed to produce useful outputs on these projects. On the other hand, I’d seen people accomplish incredible things using LLMs in the last few months, and I wanted to replicate those experiments myself. But my previous experience with LLMs suggested that I had to pick my project carefully. A small, dynamic programming language met all my requirements.

Finally, this was also an experiment to figure out how far I could push agentic engineering. Could I compress six months of work into a few weeks? Could I build something that was beyond my own ability to build? What would my day-to-day work life look like if I went all-in on LLM-driven programming? I wanted to answer all these questions.

I went into this experiment with some skepticism. My previous attempts at building something entirely using Claude Code hadn’t worked out. But this attempt was not only successful, it produced results beyond what I’d imagined possible. I don’t hold the belief that all software in the future will be written by LLMs. But I do believe there is a large subset that can be partially or mostly outsourced to these new tools.

Building Cutlet taught me something important: using LLMs to produce code does not mean you forget everything you’ve learned about building software. Agentic engineering requires careful planning, skill, craftsmanship, and discipline, just as building any software worth building did before generative AI. The skills required to work with coding agents might look different from typing code line-by-line into an editor, but they’re still very much the same engineering skills we’ve been sharpening all our careers. There is a lot of work involved in getting good output from LLMs. Agentic engineering does not mean dumping vague instructions into a chat box and harvesting the code that comes out.
I believe there are four main skills you have to learn today in order to work effectively with coding agents. Models and harnesses are changing rapidly, so figuring out which problems LLMs are good at solving requires developing your intuition, talking to your peers, and keeping your ear to the ground. However, if you don’t want to stay up-to-date with a rapidly changing field—and I wouldn’t judge you for it, it’s crazy out there—there are two questions you can ask yourself to figure out if your problem is LLM-shaped. If the answer to either of those questions is “no”, throwing AI at the problem is unlikely to yield good results. If the answer to both of them is “yes”, then you might find success with agentic engineering. The good news is that the cost of figuring this out is the price of a Claude Code subscription and one sacrificial lamb on your team willing to spend a month trying it out on your codebase.

LLMs work with natural language, so learning to communicate your ideas using words has become crucial. If you can’t explain your ideas in writing to your co-workers, you can’t work effectively with coding agents. You can get a lot out of Claude Code using simple, vague, overly general prompts. But when you do that, you’re outsourcing a lot of your thinking and decision-making to the robot. This is fine for throwaway projects, but you probably want to be more careful when you’re building something you will put into production and maintain for years. You want to feed coding agents precisely written specifications that capture as much of your problem space as possible.

While working on Cutlet, I spent most of my time writing, generating, reading, and correcting spec documents. For me, this was a new experience. I primarily work with early-stage startups, so for most of my career, I’ve treated my code as the spec. Writing formal specifications was an alien experience. Thankfully, I could rely on Claude to help me write most of these specifications.
I was only comfortable doing this because Cutlet was an experiment. On a project I wanted to stake my reputation on, I might take the agent out of the equation altogether and write the specs myself.

This was my general workflow while making any change to Cutlet. The workflow front-loaded the cognitive effort of making any change to the language. All the thinking happened before a single line of code was written, which is something I almost never do. For me, programming involves organically discovering the shape of a problem as I’m working on it. However, I’ve found that working that way with LLMs is difficult. They’re great at making sweeping changes to your codebase, but terrible at quick, iterative, organic development workflows. Maybe my workflow will evolve as inference gets faster and models become better, but until then, this waterfall-style model works best.

I find this to be the most interesting and fun part of working with coding agents. It’s a whole new class of problem to solve! The core principle is this: coding agents are computer programs, and therefore have a limited view of the world they exist in. Their only window into the problem you’re trying to solve is the directory of code they can access. This doesn’t give them enough agency or information to be able to do a good job. So, to help them thrive, you must give them that agency and information in the form of tools they can use to reach out into the wider world.

What does this mean in practice? It looks different for different projects. For Cutlet, the tools and abilities I set up guaranteed that any update to the code resulted in a project that at least compiled and executed. But more importantly, they increased the information and agency Claude had access to, making it more effective at discovering and debugging problems without my intervention.
If I keep working on this project, my main focus will be to give my agents even more insight into the artifact they are building, even more debugging tools, even more freedom, and even more access to useful information. You will want to come up with your own tooling that works for your specific project. If you’re building a Django app, you might want to give the agent access to a staging database. If you’re building a React app, you might want to give it access to a headless browser. There’s no single answer that works for every project, and I bet people are going to come up with some very interesting tools that allow LLMs to observe the results of their work in the real world.

Coding agents can sometimes be inefficient in how they use the tools you give them. For example, while working on this project, sometimes Claude would run a command, decide its output was too long to fit into the context window, and run it again with the output piped to a file. Other times it would run a command, forget to check the output for errors, and run it a second time to capture the output. This would result in the same expensive checks running multiple times in the course of making a single edit. These mistakes slowed down the agentic loop significantly. I could fix some of these performance bottlenecks by editing instructions or changing the output of a custom script. But there were some issues that required more effort to discover and fix.

I quickly got into the habit of observing the agent at work, noticing sequences of commands that the agent repeated over and over again, and turning them into scripts for the agent to call instead. Many of the scripts in the Cutlet repository came about this way. This was very manual, very not-fun work. I’m hoping this becomes more automated as time goes on. Maybe a future version of Claude Code could review its own tool-calling outputs and suggest scripts you could write for it? Of course, the most fruitful optimization was to run Claude inside Docker with full permissions and unrestricted access.
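As a sketch of that kind of helper — entirely hypothetical, since the post doesn’t show Cutlet’s actual scripts — here is a wrapper that runs an expensive command once, caps how much output reaches the context window, and preserves the exit status so the agent never re-runs a check just to shorten its output:

```shell
# run_capped: hypothetical wrapper. Runs the command once, prints at most
# $1 lines of its combined output, notes any truncation, and preserves
# the command's exit status so the agent still sees pass/fail.
run_capped() {
  max="$1"; shift
  out="$("$@" 2>&1)"
  status=$?
  printf '%s\n' "$out" | head -n "$max"
  total=$(printf '%s\n' "$out" | wc -l | tr -d ' ')
  if [ "$total" -gt "$max" ]; then
    printf '[truncated: %s of %s lines shown]\n' "$max" "$total"
  fi
  return "$status"
}

# Example: an expensive check (stubbed here with `seq`) runs exactly once.
run_capped 3 seq 1 100
```

The same idea applies to any repeated build-and-test sequence: capture it once, cap the noise, and let the agent call the script instead of improvising pipes.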
By doing this, I took myself out of the agentic loop. After a plan file had been produced, I didn’t want to hang around babysitting agents and approving every command they wanted to run.

As Cutlet evolved, the infrastructure I built for Claude also evolved. Eventually, I captured many of the workflows Claude naturally followed as scripts, slash commands, or project instructions. I also learned where the agent stumbled most, and preempted those mistakes by giving it better instructions or scripts to run. The infrastructure I built for Claude was also valuable for me, the human working on the project. The same scripts that helped Claude automate its work also helped me accomplish common tasks quickly. As the project grows, this infrastructure will keep evolving along with it. Models change all the time. So do project requirements and workflows. I look at all this project infrastructure as an organic thing that will keep changing as long as the project is active.

Now that it’s possible for individual developers to accomplish so much in so little time, is software engineering as a career dead? My answer to this question is nope, not at all. Software engineering skills are just as valuable today as they were before language models got good. If I hadn’t taken a compilers course in college and worked through Crafting Interpreters, I wouldn’t have been able to build Cutlet. I still had to make technical decisions that I could only make because I had (some) domain knowledge and experience. Besides, I had to learn a bunch of new skills in order to work effectively on Cutlet. These new skills also required technical knowledge. A strange and new and different kind of technical knowledge, but technical knowledge nonetheless. Before working on this project, I was worried about whether I’d have a job five years from now. But today I’m convinced that the world will continue to need software engineers in the future.
Our jobs will transform—and some people might not enjoy the new jobs anymore—but there will still be plenty of work for us to do. Maybe we’ll have even more work to do than before, since LLMs allow us to build a lot more software a lot faster. And for those of us who never want to touch LLMs, there will be domains where LLMs never make any inroads. My friends who work on low-level multimedia systems have found less success using LLMs compared to those who build web apps. This is likely to be the case for many years to come. Eventually, those jobs will transform, too, but it will be a far slower shift.

Is it fair to say that I built Cutlet? After all, Claude did most of the work. What was my contribution here besides writing the prompts? Moreover, this experiment only worked because Claude had access to multiple language runtimes and computer science books in its training data. Without the work done by hundreds of programmers, academics, and writers who have freely donated their work to the public, this project wouldn’t have been possible. So who really built Cutlet? I don’t have a good answer to that. I’m comfortable taking credit for the care and feeding of the coding agent as it went about generating tokens, but I don’t feel a sense of ownership over the code itself. I don’t consider this “my” work. It doesn’t feel right. Maybe my feelings will change in the future, but I don’t quite see how.

Because of my reservations about who this code really belongs to, I haven’t added a license to Cutlet’s GitHub repository. Cutlet belongs to the collective consciousness of every programming language designer, implementer, and educator to have released their work on the internet. (Also, it’s worth noting that Cutlet almost certainly includes code from the Lua and Python interpreters. It referred to those languages all the time when we talked about language features.
I’ve also seen a ton of code from Crafting Interpreters making its way into the codebase with my own two fleshy eyes.)

I’d be remiss if I didn’t include a note on mental health in this already mammoth blog post. It’s easy to get addicted to agentic engineering tools. While working on this project, I often found myself at my computer at midnight going “just one more prompt”, as if I were playing the world’s most obscure game of Civilization. I’m embarrassed to admit that I often had Claude Code churning away in the background when guests were over at my place, when I stepped into the shower, or when I went off to lunch. There’s a heady feeling that comes from accomplishing so much in so little time. More addictive than that is the unpredictability and randomness inherent to these tools. If you throw a problem at Claude, you can never tell what it will come up with. It could one-shot a difficult problem you’ve been stuck on for weeks, or it could make a huge mess. Just like a slot machine, you can never tell what might happen. That creates a strong urge to try using it for everything, all the time. And just like with slot machines, the house always wins.

These days, I set limits on how long and how often I’m allowed to use Claude. As LLMs become widely available, we as a society will have to figure out the best way to use them without destroying our mental health. This is the part I’m not very optimistic about. We have comprehensively failed to regulate or limit our use of social media, and I’m willing to bet we’ll have a repeat of that scenario with LLMs.

Now that we can produce large volumes of code very quickly, what can we do that we couldn’t do before? This is another question I’m not equipped to answer fully at the moment. That said, one area where I can see LLMs being immediately useful to me personally is the ability to experiment very quickly.
It’s very easy for me to try out ten different features in Cutlet because I just have to spec them out and walk away from the computer. Failed experiments cost almost nothing. Even if I can’t use the code Claude generates, having working prototypes helps me validate ideas quickly and discard bad ones early. I’ve also been able to radically reduce my dependency on third-party libraries in my JavaScript and Python projects. I often use LLMs to generate small utility functions that previously required pulling in dependencies from NPM or PyPI. But honestly, these changes are small beans. I can’t predict the larger societal changes that will come about because of AI agents. All I can say is that programming will look radically different in 2030 than it does in 2026.

This project was a proof of concept to see how far I could push Claude Code. I’m currently looking for a new contract as a frontend engineer, so I probably won’t have the time to keep working on Cutlet. I also have a few more ideas for pushing agentic programming further, so I’m likely to prioritize those over continuing work on Cutlet. When the mood strikes me, I might still add small features to the language now and then. Now that I’ve removed myself from the development loop, it doesn’t take a lot of time and effort. I might even do Advent of Code using Cutlet in December! Of course, if you work at Anthropic and want to give me money so I can keep running this experiment, I’m available for contract work for the next 8 months :)

For now, I’m closing the book on Cutlet and moving on to other projects (and cat). Thanks to Shruti Sunderraman for proofreading this post. Also thanks to Cutlet the cat for walking across the keyboard and deleting all my work three times today.

I didn’t want to solve a particularly novel problem, but I wanted the ability to sometimes steer the LLM in interesting directions. I didn’t want to manually verify LLM-generated code.
I wanted to give the LLM specifications, test cases, documentation, and sample outputs, and make it do all the difficult work of figuring out whether it was doing the right thing. I wanted to give the agent a strong feedback loop so it could run autonomously. I don’t like MCPs. I didn’t want to deal with them. So anything that required connecting to a browser, taking screenshots, or talking to an API over the network was automatically disqualified. I wanted to use a boring language with as few external dependencies as possible.

LLMs know how to build language implementations because their training data contains thousands of existing implementations, papers, and CS books. I was intrigued by the idea of creating a “remix” language by picking and choosing features I enjoy from various existing languages. I could write a bunch of small deterministic programs along with their expected outputs to test the implementation. I could even get Claude to write them for me, giving me a potentially infinite number of test cases to verify that the language was working correctly. Language implementations can be tested from the command line, with purely textual inputs and outputs. No need to take screenshots or videos or set up fragile MCPs. There’s no better feedback loop for an agent than “run the build and the tests until there are no more errors”. C is as boring as it gets, and there are a large number of language implementations built in C.

Understanding which problems can be solved effectively using LLMs, which ones need a human in the loop, and which ones should be handled entirely by humans. Communicating your intent clearly and defining criteria for success. Creating an environment in which the LLM can do its best work. Monitoring and optimizing the agentic loop so the agent can work efficiently.

For the problem you want to solve, is it possible to define and verify success criteria in an automated fashion? Have other people solved this problem—or a similar one—before?
In other words, is your problem likely to be in the training data for an LLM?

First, I’d present the LLM with a new feature (e.g. loops) or refactor (e.g. moving from a tree-walking interpreter to a bytecode VM). Then I’d have a conversation with it about how the change would work in the context of Cutlet, how other languages implemented it, design considerations, ideas we could steal from interesting/niche languages, etc. Just a casual back-and-forth, the same way you might talk to a co-worker. After I had a good handle on what the feature or change would look like, I’d ask the LLM to give me an implementation plan broken down into small steps. I’d review the plan and go back and forth with the LLM to refine it. We’d explore various corner cases, footguns, gotchas, missing pieces, and improvements. When I was happy with the plan, I’d ask the LLM to write it out to a file that would go into a directory. Sometimes we’d end up with 3-4 plan files for a single feature. This was intentional. I needed the plans to be human-readable, and I needed each plan to be an atomic unit I could roll back if things didn’t work out. They also served as a history of the project’s evolution. You can find all the historical plan files in the Cutlet repository. I’d read and review the generated plan file, go back and forth again with the LLM to make changes to it, and commit it when everything looked good. Finally, I’d fire up a Docker container, run Claude with all permissions—including access—and ask it to implement my plan.

Comprehensive test suite. My project instructions told Claude to write tests and make sure they failed before writing any new code. I also asked it to run tests after making significant code changes or merging any branches. Armed with a constantly growing test suite, Claude was able to quickly identify and fix any regressions it introduced into the codebase. The tests also served as documentation and specification.

Sample inputs and outputs. These were my integration tests. I added a number of example programs to the Cutlet repository—most of them written by Claude itself—that not only serve as documentation for humans, but also as an end-to-end test suite. The project instructions told Claude to run all of them and verify their output after every code change.

Linters, formatters, and static analysis tools. Cutlet uses and to ensure a baseline of code quality. Just like with tests, the project instructions asked the LLM to run these tools after every major code change. I noticed that would often produce diagnostics that would force Claude to rewrite parts of the code. If I had access to some of the more expensive static analysis tools (such as Coverity), I would have added them to my development process too.

Memory safety tools. I asked Claude to create a target that rebuilt the entire project and test suite with ASan and UBSan enabled (with LSan riding along via ASan), then ran every test under the instrumented build. The project instructions included running this check at the end of implementing a plan. This caught memory errors—use-after-free, buffer overflows, undefined behavior—that neither the tests nor the linter could find. Running these tests took time and greatly slowed down the agent, but they caught even more issues.

Symbol indexes. The agent had access to and for navigating the source code. I don’t know how useful this was, because I rarely ever saw it use them. Most of the time it would just search the code for symbols. I might remove this in the future.

Runtime introspection tools. Early in the project, I asked Claude to give Cutlet the ability to dump the token stream, AST, and bytecode for any piece of code to the standard output before executing it. This allowed the agent to quickly figure out if it had introduced errors into any part of the execution pipeline without having to navigate the source code or drop into a debugger.

Pipeline tracing. I asked Claude to write a Python script that fed a Cutlet program through the interpreter with debug flags to capture the full compilation pipeline: the token stream, the AST, and the bytecode disassembly. It then mapped each token type, AST node, and opcode back to the exact source locations in the parser, compiler, and VM where they were handled. When an agent needed to add a new language feature, it could run the tracer on an example of a similar existing feature to see precisely which files and functions to touch. I was very proud of this machinery, but I never saw Claude make much use of it either.

Running with every possible permission. I wanted the agent to work autonomously and have access to every debugging tool it might want to use. To do this, I ran it inside a Docker container with enabled and full access. I believe this is the only practical way to use coding agents on large projects. Answering permissions prompts is cognitively taxing when you have five agents working in parallel, and restricting their ability to do whatever they want makes them less effective at their job. We will need to figure out all sorts of safety issues that arise when you give LLMs the ability to take full control of a system, but on this project, I was willing to accept the risks that come with YOLO mode.

Simon Willison 5 days ago

Perhaps not Boring Technology after all

A recurring concern I've seen regarding LLMs for programming is that they will push our technology choices towards the tools that are best represented in their training data, making it harder for new, better tools to break through the noise. This was certainly the case a couple of years ago, when asking models for help with Python or JavaScript appeared to give much better results than questions about less widely used languages. With the latest models running in good coding agent harnesses I'm not sure this continues to hold up. I'm seeing excellent results with my brand new tools where I start by prompting "use uvx showboat --help / rodney --help / chartroom --help to learn about these tools" - the context length of these new models is long enough that they can consume quite a lot of documentation before they start working on a problem. Drop a coding agent into any existing codebase that uses libraries and tools that are too private or too new to feature in the training data and my experience is that it works just fine - the agent will consult enough of the existing examples to understand patterns, then iterate and test its own output to fill in the gaps. This is a surprising result. I thought coding agents would prove to be the ultimate embodiment of the Choose Boring Technology approach, but in practice they don't seem to be affecting my technology choices in that way at all. Update: A few follow-on thoughts. The issue of what technology LLMs recommend is a separate one.
What Claude Code Actually Chooses is an interesting recent study by Edwin Ong and Alex Vikati, who ran Claude Code over 2,000 times and found a strong bias towards build-over-buy, but also identified a preferred technical stack, with GitHub Actions, Stripe, and shadcn/ui seeing a "near monopoly" in their respective categories. For the sake of this post my interest is in what happens when the human makes a technology choice that differs from those preferred by the model harness. The Skills mechanism that is being rapidly embraced by most coding agent tools is super-relevant here. We are already seeing projects release official skills to help agents use them - here are examples from Remotion, Supabase, Vercel, and Prisma.


Superpowers 5

Superpowers 5 is out today. By far, my favorite new feature is the "Visual Brainstorming" companion tool, which grew out of my frustration with the ASCII art that Claude usually generates when you ask it anything about design or UX. I found myself asking Claude, over and over, "Hey, why don't you write that out as HTML so I can see what you're talking about." It took me far too long to remember what might be the most important mantra of agentic coding: "Why am I doing this?" And so now, if your agent has Superpowers and thinks that it has something for you to see, it'll prompt you: Some of what we'll be working on might be easier to explain if I can show it to you in a web browser. I can put together mockups, diagrams, comparisons, and other visuals as we go. This feature is still new and can be token-intensive. Want to try it? (Requires opening a local URL) As an example of how this works, I fired up an instance of Claude with Superpowers 5 to clean up the rendered webpages from Youtube2Webpage. Until quite recently, Youtube2Webpage was, by far, my most popular GitHub repo. Thanks to all of you, that is no longer the case. And then after a little more back and forth in the terminal, Claude told me to go look at the browser. Visual Brainstorming has been in our dev branch for a couple of months now. Since it landed, Claude Code's built-in "AskUserQuestion" tool has grown support for showing ASCII-art diagrams attached to choices: ASCII art is great for communicating basic intent, but when you're working on anything complicated, getting out of the terminal can make it much easier to communicate about what you're doing. As an example, I tried to get Claude to help with some brand/logo ideation for Prime Radiant. You can imagine what kind of a disaster that would have been in ASCII art.
Behind the scenes, we're spinning up a web server that loads chunks of HTML written by your agent off of disk, plus a little bit of client-side JavaScript that returns clicks and feedback from the browser as you interact with it. Visual Brainstorming is mostly tested in Claude Code and Codex, but should work in most agents. One of the most important workflow improvements in 5.0 is a new 'spec review' loop. Starting with Claude Opus 4.5 or so, I've been catching my agents leaving some steps in longer spec and planning docs as "TBD" or "Fill this in later." Using specs and plans with "TBD" sections goes over just about as poorly as you'd imagine it would. The solution to something like this is to run the exact same playbook we're running everywhere else: an adversarial review loop. Now, after Superpowers finishes planning, it kicks off a subagent that reads the planning docs for sanity and completeness. It's not a panacea and not a substitute for actually glancing at at least the "spec" doc yourself, but it does seem to lead to a dramatic improvement in planning document quality. Until now, Superpowers has always offered the user the choice of Subagent Driven Development or having the human partner open up another session and run a plan one chunk at a time. That choice dates from a time when subagents were new and, well, I didn't trust them. Over the past 5 months, it's become crystal clear to me that the Subagent Driven Development workflow is dramatically more capable and effective than the old way. If your coding agent supports subagents, Superpowers will use Subagent Driven Development. If it doesn't support subagents, it will warn you that a harness that supports subagents will do a better job, and then it'll do the best it can to work the full plan in a single session. In harnesses like Claude Code that are capable of choosing which model to use for a subagent, we now instruct the agent to use the cheapest model capable of doing a given task.
With the detailed plans produced through the brainstorming + writing plans process, it's not uncommon to be able to use Claude Haiku for implementation. Along with that, I've tuned Subagent Driven Development a little bit to allow the subagents to better communicate if they're out of their depth and need a hand. As it becomes more realistic to build larger and larger projects with Superpowers, I've found it helpful to add some additional general software engineering guidance to the core skills. I asked Claude what it understood about these changes and this is what it had to say: One thing that Claude neglected to mention is that brainstorming is now on the lookout for projects that it considers "too big" and will interactively work with you to break them down into more manageable pieces. You'll likely see even more aggressive work on task decomposition in a future release. As of 5.0, Superpowers no longer defaults to writing its specs and plans in docs/plans, instead preferring docs/superpowers/specs and docs/superpowers/plans. Superpowers now explicitly instructs your coding agent to prefer direct instructions from you, your CLAUDE.md or your AGENTS.md to Superpowers internal instructions. If you want to customize Superpowers behavior, it should now be as simple as a line in the right document. OpenAI Codex has recently added support for subagents. It's been a little bit of a moving target, but generally I've been finding that Codex rigorously follows all instructions from Superpowers. And that includes when subagents get ahold of the skill and decide to start brainstorming and delegating all their work to subagents, occasionally recursively. Superpowers 5 adds a mitigation against this behavior. The slash commands that have been in Superpowers since the beginning date from a time when Claude Code didn't have native support for skills. The original prompts that became the brainstorm, writing-plans, and executing-plans skills lived in slash commands. 
As skills have evolved, Claude Code treats skills as slash commands and Claude is increasingly confused by the 'old' slash commands. As of 5.0, the three slash commands we ship now announce that they've been deprecated. They'll go away in a future release. In general, you should be able to just describe your intent to your coding agent and Superpowers + the native skills system should start using the right skills. (My default smoke test for Superpowers is to open up a coding agent and type "Let's make a react todo list." If the agent starts coding, I failed. If it kicks off the brainstorming skill, then the right thing is happening.) Superpowers 5 is out now. If you're using a harness with a plugin system like Claude Code or Cursor, it should auto-update sometime over the next day or two. If you're using Superpowers in another tool with a direct installation, you may need to git pull or instruct your agent to do a fresh install.

devansh 1 week ago

Four Vulnerabilities in Parse Server

Parse Server is one of those projects that sits quietly beneath a lot of production infrastructure. It powers the backend of a meaningful number of mobile and web applications, particularly those that started on Parse's original hosted platform before it shut down in 2017 and needed somewhere to migrate. The project currently has over 21,000 stars on GitHub. I recently spent some time auditing its codebase and found four security vulnerabilities. Three of them share a common root: a fundamental gap between what the readOnlyMasterKey is documented to do and what the server actually enforces. The fourth is an independent issue in the social authentication adapters that is arguably more severe: a JWT validation bypass that allows an attacker to authenticate as any user on a target server using a token issued for an entirely different application. The Parse Server team was responsive throughout and coordinated fixes promptly. All four issues have been patched. Parse Server is an open-source Node.js backend framework that provides a complete application backend out of the box: a database abstraction layer (typically over MongoDB or PostgreSQL), a REST and GraphQL API, user authentication, file storage, push notifications, Cloud Code for serverless functions, and a real-time event system. It is primarily used as the backend for mobile applications and is the open-source successor to Parse's original hosted backend-as-a-service platform. Parse Server authenticates API requests using one of several key types. The masterKey grants full administrative access to all data, bypassing all object-level and class-level permission checks. It is intended for trusted server-side operations only. Parse Server also exposes a readOnlyMasterKey option. Per its documentation, this key grants master-level read access: it can query any data, bypass ACLs for reading, and perform administrative reads, but it is explicitly intended to deny all write operations.
It is the kind of credential you might hand to an analytics service, a monitoring agent, or a read-only admin dashboard: enough power to see everything, but no ability to change anything. That contract is what three of these four vulnerabilities break. The implementation checks whether a request carries master-level credentials by testing a single flag — isMaster — on the auth object. The problem is that readOnlyMasterKey authentication sets both isMaster and isReadOnly, and a large number of route handlers only check the former. The isReadOnly flag is set but never consulted, which means the read-only restriction exists in concept but not in enforcement. Cloud Hooks are server-side webhooks that fire when specific Parse Server events occur — object creation, deletion, user signup, and so on. Cloud Jobs are scheduled or manually triggered background tasks that can execute arbitrary Cloud Code functions. Both are powerful primitives: Cloud Hooks can exfiltrate any data passing through the server's event stream, and Cloud Jobs can execute arbitrary logic on demand. The routes that manage Cloud Hooks and Cloud Jobs — creating new hooks, modifying existing ones, deleting them, and triggering job execution — are all guarded by master key access checks. Those checks verify only that the requesting credential has isMaster set. Because the readOnlyMasterKey satisfies that condition, a caller holding only the read-only credential can fully manage the Cloud Hook lifecycle and trigger Cloud Jobs at will. The practical impact is data exfiltration via Cloud Hook. An attacker who knows the readOnlyMasterKey can register a new Cloud Hook pointing to an external endpoint they control, then watch as every matching Parse Server event — user signups, object writes, session creation — is delivered to them in real time. The read-only key, intended to allow passive observation, can be turned into an active wiretap on the entire application's event stream. The fix adds explicit isReadOnly rejection checks to the Cloud Hook and Cloud Job handlers.
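The bug class is easy to see in a condensed sketch. This is illustrative JavaScript, not Parse Server's actual source; the flag names follow the write-up's description of the auth object.

```javascript
// Illustrative sketch of the bug class, not Parse Server's real code.
// Per the write-up, auth objects carry isMaster (true for BOTH master keys)
// and isReadOnly (true only for the read-only master key).
const masterAuth = { isMaster: true, isReadOnly: false };
const readOnlyAuth = { isMaster: true, isReadOnly: true };

// Vulnerable guard: consults only isMaster, so the read-only key passes.
const canMutateVulnerable = (auth) => auth.isMaster;

// Fixed guard: write routes must also reject read-only credentials.
const canMutateFixed = (auth) => auth.isMaster && !auth.isReadOnly;
```

The same two-line difference underlies the Cloud Hooks, Files, and /loginAs issues described in this post.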
Parse Server's Files API exposes endpoints for uploading and deleting files. Both routes are guarded by a middleware that checks whether the incoming request has master-level credentials. Like the Cloud Hooks routes, this check only tests isMaster and never consults isReadOnly. The root cause traces through three locations in the codebase: the read-only auth object is constructed with isReadOnly set (lines 267–278), the delete route applies the master-level check as its only guard (lines 107–113), and the delete handler (lines 586–602 of the same file) calls through to the file adapter without any additional read-only check in the call chain. The consequence is that a caller with only the readOnlyMasterKey can upload arbitrary files to the server's storage backend or permanently delete any existing file by name. The upload vector is primarily an integrity concern — poisoning stored assets. The deletion vector is a high-availability concern — an attacker can destroy application data (user avatars, documents, media) that may not have backups, and depending on how the application is structured, deletion of certain files could cause cascading application failures. The fix adds isReadOnly rejection to both the file upload and file delete handlers. This is the most impactful of the three issues. The /loginAs endpoint is a privileged administrative route intended for master-key workflows — it accepts a user ID parameter and returns a valid, usable session token for that user. The design intent is to allow administrators to impersonate users for debugging or support purposes. It is the digital equivalent of a master key that can open any door. The route's handler is located at lines 339–345 and is mounted at lines 706–708. The guard condition rejects requests where isMaster is false. Because the readOnlyMasterKey produces an auth object where isMaster is true — and because there is no isReadOnly check anywhere in the handler or its middleware chain — the read-only credential passes the gate and the endpoint returns a fully usable session token for any user ID provided. That session token is not a read-only token.
It is a normal user session token, indistinguishable from one obtained by logging in with a password. It grants full read and write access to everything that user's ACL and role memberships permit. An attacker with the readOnlyMasterKey and knowledge of any user's object ID can silently mint a session as that user and then act as them with complete write access — modifying their data, making purchases, changing their email address, deleting their account, or doing anything else the application allows its users to do. There is no workaround other than removing the readOnlyMasterKey from the deployment or upgrading. The fix is a single guard that rejects the request when isReadOnly is true. This vulnerability is independent of the readOnlyMasterKey theme and is the most severe of the four. It sits in Parse Server's social authentication layer — specifically in the adapters that validate identity tokens for Sign in with Google, Sign in with Apple, and Facebook Login. When a user authenticates via one of these providers, the client receives a JSON Web Token signed by the provider. Parse Server's authentication adapters are supposed to verify this token: they check the signature, the expiry, and critically, the audience claim — the field that specifies which application the token was issued for. Audience validation is what prevents a token issued for one application from being used to authenticate against a different application. Without it, a validly signed token from any Google, Apple, or Facebook application in the world can be used to authenticate against any Parse Server that trusts the same provider. The vulnerability arises from how the adapters handle missing configuration. For the Google and Apple adapters, the audience is passed to JWT verification via the clientId configuration option. When clientId is not set, the adapters do not reject the configuration as incomplete — they silently skip audience validation entirely. The JWT is verified for signature and expiry only, and any valid Google or Apple token from any app will be accepted.
For Facebook Limited Login, the situation is worse: the vulnerability exists regardless of configuration. The Facebook adapter validates the configured app ID as the expected audience for the Standard Login (Graph API) flow. However, the Limited Login path — which uses JWTs rather than Graph API tokens — never passes an audience to JWT verification at all. The code path simply does not include the audience parameter in the verification call, meaning no configuration value, however correct, can prevent the bypass on the Limited Login path. The attack is straightforward. An attacker creates or uses any existing Google, Apple, or Facebook application they control, signs in to obtain a legitimately signed JWT, and then presents that token to a vulnerable Parse Server's authentication endpoint. Because audience validation is skipped, the token passes verification. Combined with the ability to specify which Parse Server user account to associate the token with, this becomes full pre-authentication account takeover for any user on the server — with no credentials, no brute force, and no interaction from the victim. The fix enforces clientId (Google/Apple) and appIds (Facebook) as mandatory configuration and passes them correctly to JWT verification for both the Standard Login and Limited Login paths on all three adapters.
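The audience-check failure mode condenses to a small sketch. This is illustrative code, not the adapters' implementation; the function names and claim shapes are invented, and signature/expiry checks are assumed to have already passed.

```javascript
// Self-contained sketch of the audience-check failure mode. NOT the real
// adapter code; names and shapes are invented to show the pattern.
function audienceCheckVulnerable(claims, expectedAudience) {
  // Bug: missing configuration silently disables the check entirely,
  // so a validly signed token minted for ANY app passes.
  if (expectedAudience === undefined) return true;
  return claims.aud === expectedAudience;
}

function audienceCheckFixed(claims, expectedAudience) {
  // Fix: incomplete configuration is an error, never a pass.
  if (expectedAudience === undefined) {
    throw new Error("expected audience must be configured");
  }
  return claims.aud === expectedAudience;
}
```

The general lesson: a missing security-relevant config value should fail closed (refuse to start or reject the request), never degrade to skipping the check.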
Disclosure timeline:

CVE-2026-29182 (Cloud Hooks and Cloud Jobs bypass readOnlyMasterKey): GHSA-vc89-5g3r-cmhh — fixed in 8.6.4 and 9.4.1-alpha.3
CVE-2026-30228 (file creation and deletion bypass readOnlyMasterKey): GHSA-xfh7-phr7-gr2x — fixed in 8.6.5 and 9.5.0-alpha.3
CVE-2026-30229 (/loginAs allows readOnlyMasterKey to gain full access as any user): GHSA-79wj-8rqv-jvp5 — fixed in 8.6.6 and 9.5.0-alpha.4
CVE-2026-30863 (JWT audience validation bypass in Google, Apple, and Facebook adapters): GHSA-x6fw-778m-wr9v — fixed in 8.6.10 and 9.5.0-alpha.11

Parse Server repository: github.com/parse-community/parse-server

Maurycy 1 week ago

Remove annoying banners

This is a small JavaScript snippet that removes most annoying website elements. It's really simple: it removes anything that doesn't scroll with the page, and re-enables scrolling if it's been disabled. This gets rid of cookie popups/banners, recommendation sidebars, those annoying headers that follow you down the page, etc. If you don't want to mess around with the JS console, you can drag this link into your bookmark bar and run it by clicking the bookmark: Cleanup Site. If you need to manually create the bookmark, here's the URL: (On mobile Chrome, you will have to click the bookmark from the address bar instead of the menu.) This is a typical website before the script: ... and after: One click to get all your screen space back. It even works on very complex sites like social media — great for when you want to read a longer post without constant distractions. As a bonus, I made these to fix bad color schemes: Force dark mode ... and ... Force light mode
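The approach described above can be approximated in a few lines. This is a sketch in the spirit of the snippet, not necessarily the exact bookmarklet; it's written as a function taking a document so it can be exercised outside a browser, where in a bookmarklet you would simply call cleanup(document).

```javascript
// Approximation of the described cleanup: remove elements that don't scroll
// with the page (position: fixed or sticky) and re-enable scrolling.
function cleanup(doc, getStyle = (el) => getComputedStyle(el)) {
  // Fixed/sticky elements stay put while you scroll -- cookie banners,
  // follow-you headers, floating sidebars. Remove them all.
  for (const el of [...doc.querySelectorAll("body *")]) {
    const pos = getStyle(el).position;
    if (pos === "fixed" || pos === "sticky") el.remove();
  }
  // Some sites disable scrolling while a popup is open; turn it back on.
  for (const el of [doc.documentElement, doc.body]) {
    el.style.setProperty("overflow", "visible", "important");
  }
}
```

Wrapped in `javascript:(() => { ... })()` this becomes a bookmarklet URL.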

Armin Ronacher 1 week ago

AI And The Ship of Theseus

Because code gets cheaper and cheaper to write, this includes re-implementations. I mentioned recently that I had an AI port one of my libraries to another language, and it ended up choosing a different design for that implementation. In many ways the functionality was the same, but the path it took to get there was different. The way that port worked was by going via the test suite. Something related, but different, happened with chardet. The current maintainer reimplemented it from scratch by only pointing it at the API and the test suite. The motivation: enabling relicensing from LGPL to MIT. I personally have a horse in the race here because I too wanted chardet to be under a non-GPL license for many years. So consider me a very biased person in that regard. Unsurprisingly, that new implementation caused a stir. In particular, Mark Pilgrim, the original author of the library, objects to the new implementation and considers it a derived work. The new maintainer, who has maintained it for the last 12 years, considers it a new work and instructed his coding agent to produce precisely that. According to the author, validation with JPlag shows the new implementation is distinct. If you actually consider how it works, that's not too surprising: it's significantly faster than the original implementation, supports multiple cores, and uses a fundamentally different design. What I think is more interesting about this question is the consequences of where we are. Copyleft code like the GPL heavily depends on copyrights and friction to enforce it. But because it's fundamentally in the open, with or without tests, you can trivially rewrite it these days. I myself have been intending to do this for a little while now with some other GPL libraries. In particular, I started a re-implementation of readline a while ago for similar reasons, because of its GPL license. There is an obvious moral question here, but that isn't necessarily what I'm interested in.
For all the GPL software that might re-emerge as MIT software, so might proprietary abandonware. For me personally, what is more interesting is that we might not even be able to copyright these creations at all. A court still might rule that all AI-generated code is in the public domain, because there was not enough human input in it. That's quite possible, though probably not very likely. But this all causes some interesting new developments we are not necessarily ready for. Vercel, for instance, happily re-implemented bash with Clankers but got visibly upset when someone re-implemented Next.js in the same way. There are huge consequences to this. When the cost of generating code goes down that much, and we can re-implement it from test suites alone, what does that mean for the future of software? Will we see a lot of software re-emerging under more permissive licenses? Will we see a lot of proprietary software re-emerging as open source? Will we see a lot of open software re-emerging as proprietary? It's a new world and we have very little idea of how to navigate it. In the interim we will have some fights about copyrights, but I have the feeling very few of those will go to court, because everyone involved will actually be somewhat scared of setting a precedent. In the GPL case, though, I think it warms up some old fights about copyleft vs permissive licenses that we have not seen in a long time. It probably does not feel great to have one's work rewritten with a Clanker and one's authorship eradicated. Unlike the Ship of Theseus, though, this seems more clear-cut: if you throw away all the code and start from scratch, even if the end result behaves the same, it's a new ship. It only continues to carry the name. Which may be another argument for why authors should hold on to trademarks rather than rely on licenses and contract law. I personally think all of this is exciting.
I’m a strong supporter of putting things in the open with as little license enforcement as possible. I think society is better off when we share, and I consider the GPL to run against that spirit by restricting what can be done with it. This development plays into my worldview. I understand, though, that not everyone shares that view, and I expect more fights over the emergence of slopforks as a result. After all, it combines two very heated topics, licensing and AI, in the worst possible way.

(think) 1 week ago

Learning OCaml: String Interpolation

Most programming languages I've used have some form of string interpolation. Ruby has #{} interpolation, Python has f-strings, JavaScript has template literals, even Haskell has a few popular interpolation libraries. It's one of those small conveniences you don't think about until it's gone. OCaml doesn't have built-in string interpolation. And here's the funny thing – I didn't even notice when I was first learning the language. Looking back at my first impressions article, I complained about the comment syntax, the semicolons in lists, the lack of list comprehensions, and a dozen other things – but never once about string interpolation. I was happily concatenating strings with ^ and using Printf.sprintf without giving it a second thought. I only started thinking about this while working on my PPX article and going through the catalog of popular PPX libraries. That's when I stumbled upon ppx_string and thought "wait, why doesn't OCaml have interpolation?" The short answer: OCaml has no way to generically convert a value to a string. There's no universal method, no typeclass, no runtime reflection that would let the language figure out how to stringify an arbitrary expression inside a string literal. In Ruby, every object responds to to_s. In Python, everything has __str__. These languages can interpolate anything because there's always a fallback conversion available at runtime. OCaml's type information is erased at compile time, so the compiler would need to know at compile time which conversion function to call for each interpolated expression – and the language has no mechanism for that. 1 OCaml does have Printf.sprintf, which is actually quite nice and type-safe: The format string is statically checked by the compiler – if you pass an int where the format expects a string, you get a compile-time error, not a runtime crash. That's genuinely better than what most dynamically typed languages offer. But it's not interpolation – the values aren't inline in the string, and for complex expressions it gets unwieldy fast.
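The runtime-fallback point above is easy to see in JavaScript, one of the languages the post mentions: template literals convert any interpolated value with String(), which falls back to the value's own toString(). This is a small illustrative snippet, with a made-up point object.

```javascript
// In JavaScript, interpolation works on ANY value because a runtime fallback
// conversion always exists: template literals call String(), which falls
// back to the value's toString(). OCaml erases types at compile time and has
// no such universal fallback, so the compiler can't pick a stringifier.
const point = { x: 1, y: 2, toString() { return `(${this.x}, ${this.y})`; } };
const msg = `point is ${point}`; // "point is (1, 2)"
```

This is exactly the mechanism OCaml deliberately lacks, which is why its interpolation has to be resolved at compile time via a PPX.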
There's also plain string concatenation with ^: This works, but it's ugly and error-prone for anything beyond trivial cases. ppx_string is a Jane Street PPX that adds string interpolation to OCaml at compile time. The basic usage is straightforward: For non-string types, you specify the module whose to_string function should be used: The suffix tells the PPX to call on , and calls on . Note that , , etc. are conventions from Jane Street's / libraries – OCaml's uses , and so on, which won't work with the syntax. This is another reason ppx_string really only makes sense within the Jane Street ecosystem. Any module that exposes a to_string function works here – including your own: You can also use arbitrary expressions inside the interpolation braces: Though at that point you might be better off with a binding or for readability. A few practical things worth knowing: Honestly? Probably not as much as you think. I've been writing OCaml for a while now without it, and it rarely bothers me. Here's why: That said, when you do need to build a lot of human-readable strings – error messages, log output, CLI formatting – interpolation is genuinely nicer than . If you're in the Jane Street ecosystem, there's no reason not to use ppx_string. The lack of string interpolation in OCaml is one of those things that sounds worse than it actually is. In practice, and cover the vast majority of use cases, and the code you write with them is arguably clearer about types than magical interpolation would be. It's also a nice example of OCaml's general philosophy: keep the language core small, provide solid primitives ( , ), and let the PPX ecosystem fill in the syntactic sugar for those who want it. The same pattern plays out with for printing, for monadic syntax, and many other conveniences. Will OCaml ever get built-in string interpolation? Maybe. There have been discussions on the forums over the years, and the language did absorb binding operators (let*, let+) from the PPX world.
But I wouldn’t hold my breath – and honestly, I’m not sure I’d even notice if it landed. That’s all I have for you today. Keep hacking!

A few practical notes on ppx_string:

- You need the appropriate `(preprocess (pps ...))` stanza in your dune file.
- String values interpolate directly; everything else needs a conversion suffix. Unlike Ruby, where `to_s` is called implicitly, ppx_string requires you to be explicit about non-string types. This is annoying at first, but it’s consistent with OCaml’s philosophy of being explicit about types.
- It’s a Jane Street library. If you’re already in the Jane Street ecosystem (Base, Core, etc.), adding ppx_string is trivial. If you’re not, pulling in a Jane Street dependency just for string interpolation might feel heavy. In that case, `Printf.sprintf` is honestly fine.
- It doesn’t work with the `Format` module. If you’re building strings for pretty-printing, you’ll still want `Format`-based functions. ppx_string is for building plain strings, not format strings.
- Nested interpolation doesn’t work – you can’t nest one interpolated string inside another. Keep it simple.

And why the lack of interpolation rarely bothers me:

- `Printf.sprintf` is good. It’s type-safe, it’s concise enough for most cases, and it’s available everywhere without extra dependencies.
- Most string building in OCaml happens through `Printf` and `Format`. If you’re writing pretty-printers (which you will be), you’re using `Format`, not string concatenation or interpolation.
- OCaml code tends to be more compute-heavy than string-heavy. Compared to, say, a Rails app or a shell script, the typical OCaml program just doesn’t build that many ad-hoc strings.

1. This is the same fundamental problem that makes printing data structures harder than in dynamically typed languages. ↩︎

Lea Verou 1 week ago

External import maps, today!

A few weeks ago, I posted Web dependencies are broken. Can we fix them? . Today’s post is a little less gloomy: Turns out that the major limitation that would allow centralized set-it-and-forget-it import map management can be lifted today, with excellent browser support! The core idea is that you can use DOM methods to inject an import map dynamically , by literally creating a `<script type="importmap">` element in a classic (blocking) script and appending it after the injector script. 💡 This is a gamechanger. It makes external import maps nice-to-have sugar instead of the only way to have centralized import map management decoupled from HTML generation. All we need to do is build a little injector script, no need for tightly coupled workflows that take over everything. Once you have that, it takes a single line of HTML to include it anywhere. If you’re already using a templating system, great! You could add that line to your template for every page. But you don’t need a templating system: even if you’re rawdogging HTML (e.g. for a simple SPA), it’s no big deal to just include a `<script>` tag in there manually. This is not even new: when the injector is a classic (non-module) script placed before any modules are fetched, it works in every import map implementation , all the way back to Chrome 89, Safari 16.4+, and Firefox 108+ ! Turns out, JSPM made the same discovery: JSPM v4 uses the same technique . It is unclear why it took all of us so long to discover it but I’m glad we got there. First, while there is some progress around making import maps more resilient, your best bet for maximum compatibility is for the injector script to be a good ol’ blocking script that comes before everything else. This means no `async`, no `defer`, no `type="module"` — you want to get it in before any modules start loading or many browsers will ignore it. Then, you literally use DOM methods to create a `<script type="importmap">` element and append it after the script that is injecting the import map (which you can get via `document.currentScript`).
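Here is a minimal sketch of such an injector under the constraints above. The function name and the example mapping are illustrative, not from any real project; the DOM calls (`createElement`, `after`, `document.currentScript`) are standard platform APIs:

```javascript
// Must run as a classic (non-module) script, before any module is fetched.
// Builds an inline <script type="importmap"> and inserts it right after
// the injector script itself.
function injectImportMap(doc, imports) {
  const script = doc.createElement("script");
  script.type = "importmap";
  script.textContent = JSON.stringify({ imports });
  doc.currentScript.after(script); // classic scripts can see currentScript
  return script;
}

if (typeof document !== "undefined") {
  injectImportMap(document, {
    // illustrative mapping, not a real project's map
    "lodash-es": "https://esm.sh/lodash-es@4",
  });
}
```

In a real setup the mapping would be generated (e.g. by a tool like JSPM Generator) and baked into the injector file, so updating dependencies means regenerating one script.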
Remember, this approach literally injects inline import maps in your page. This means that any relative URLs will be interpreted relative to the current page ! If you’re building an SPA or your URLs are all absolute or root-relative, that’s no biggie. But if these are relative URLs, they will not work as expected across pages. You need to compute the absolute URL for each mapped URL and use that instead. This sounds complicated, but it only adds about 5 more lines of code. Note that `document.currentScript` is `null` in module scripts, since the same module can be loaded from different places and different scripts. Once it becomes possible to inject import maps from a module script, you could use `import.meta.url` to get the URL of the current module. Until then, you can use a bit of error handling to catch mistakes. This is the minimum, since the script literally breaks if `document.currentScript` is `null`. You could get more elaborate and warn about `async` / `defer` attributes, or if module scripts are present before the current script. These are left as an exercise for the reader. While this alleviates the immediate need for external import maps , the DX and footguns make it a bit gnarly, so having first-class external import map support would still be a welcome improvement. But even if external import maps shipped today, the unfortunate coupling with HTML is still at the receiving end of all this, and creates certain limitations, such as bare specifiers not working in worker scripts . My position remains that HTML being the only way to include import maps is a hack . I’m not saying this pejoratively. Hacks are often okay — even necessary! — in the short term. This particular hack allowed us to get import maps out the door and shipped quickly, without getting bogged down into architecture-astronaut-style discussions that can be non-terminating. But it also created architectural debt . These types of issues can always be patched ad hoc, but that increases complexity, both for implementers and web developers.
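The URL-computation step described above can be sketched as a small pure function. The name `absolutizeImports` is mine; a real injector would pass it the site root or the injector script’s own URL as the base:

```javascript
// Resolve each mapped target against a fixed base URL, so the injected
// inline map behaves identically on every page regardless of its depth.
function absolutizeImports(imports, baseURL) {
  const resolved = {};
  for (const [specifier, target] of Object.entries(imports)) {
    resolved[specifier] = new URL(target, baseURL).href;
  }
  return resolved;
}
```

Already-absolute targets pass through unchanged, since `new URL(absolute, base)` ignores the base.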
Ultimately, we need deeper integration of specifiers and import maps across the platform . `<script type="importmap">` (with or without a `src` attribute) should become a shortcut , not the only way mappings can be specified. In my earlier post, I outlined a few ideas that could help get us closer to that goal and make import maps ubiquitous and mindless . Since they were well received, I opened issues for them:

- Linking to import maps via an HTTP header
- URLs to bridge the gap between specifiers and URLs
- Since import maps as an import attribute proved to be tricky, I also filed another proposal for a synchronous import map API

I’m linking to issues in standards repos in the interest of transparency. Please don’t spam them, even with supportive comments (that’s what reactions are for). Also keep in mind that the vast majority of import map improvements are meant for tooling authors and infrastructure providers — nobody expects regular web developers to author import maps by hand. The hope is also that better platform-wide integration can pave the way for satisfying the (many!) requests to expand specifiers beyond JS imports . Currently, the platform has no good story for importing non-JS resources from a package, such as styles, images, icons, etc. But even without any further improvement, simply the fact that injector scripts are possible opens up so many possibilities! The moment I found out about this I started working on making the tool I wished had existed to facilitate end-to-end dependency management without a build process (piggybacking on the excellent JSPM Generator for the heavy lifting), which I will announce in a separate post very soon [1] . Stay tuned!

[1] But if you’re particularly curious and driven, you can find it even before then, both the repo and npm package are already public 😉🤫 ↩︎

devansh 1 week ago

Hacking Better-Hub

Better-Hub ( better-hub.com ) is an alternative GitHub frontend — a richer, more opinionated UI layer built on Next.js that sits on top of the GitHub API. It lets developers browse repositories, view issues, pull requests, code blobs, and repository prompts, while authenticating via GitHub OAuth. Because Better-Hub mirrors GitHub content inside its own origin, any unsanitized rendering of user-controlled data becomes significantly more dangerous than it would be on a static page — it has access to session tokens, OAuth credentials, and the authenticated GitHub API. That attack surface is exactly what I set out to explore. Description The repository README is fetched from GitHub, piped through the Markdown renderer with raw HTML enabled — and zero sanitization — then stored in state and rendered directly into the page. Because the README is entirely attacker-controlled, any repository owner can embed arbitrary JavaScript that executes in every viewer's browser on better-hub.com. Impact Session hijacking via cookie theft, credential exfiltration, and full client-side code execution in the context of better-hub.com. Chains powerfully with the GitHub OAuth token leak (see vuln #10). Description Issue descriptions are rendered with the same vulnerable pipeline: raw HTML allowed and no sanitization. The resulting markup is inserted directly into the thread entry component, meaning a malicious issue body executes arbitrary script for every person who views it on Better-Hub. Impact Arbitrary JavaScript execution for anyone viewing the issue through Better-Hub. Can be used for session hijacking, phishing overlays, or CSRF-bypass attacks. Description Pull request bodies are fetched from GitHub, processed with raw HTML enabled and no sanitization pass, then rendered unsafely. An attacker opening a PR with an HTML payload in the body causes XSS to fire for every viewer of that PR on Better-Hub. Impact Stored XSS affecting all viewers of the PR.
Particularly impactful in collaborative projects where multiple team members review PRs. Description The same unsanitized pipeline applies to PR comments. Any GitHub user who can comment on a PR can inject a stored XSS payload that fires for every Better-Hub viewer of that conversation thread. Impact A single malicious commenter can compromise every reviewer's session on the platform. Description The image-proxy endpoint proxies GitHub repository content and determines the Content-Type from the file extension in the query parameter. For SVG files it serves the content inline as an image type, with no attachment disposition. An attacker can upload a JavaScript-bearing SVG to any GitHub repo and share a link to the proxy endpoint — the victim's browser executes the script within better-hub.com's origin. Impact Reflected XSS with a shareable, social-engineered URL. No interaction with a real repository page is needed — just clicking a link is sufficient. Easily chained with the OAuth token leak for account takeover. Description When viewing code files larger than 200 KB, the application hits a fallback render path that outputs raw file content without any escaping. An attacker can host a file exceeding the 200 KB threshold containing an XSS payload — anyone browsing that file on Better-Hub gets the payload executed. Impact Any repository owner can silently weaponize a large file. Because code review is often done on Better-Hub, this creates a highly plausible attack vector against developers reviewing contributions. Description The file-content function reads from a shared Redis cache . Cache entries are keyed by repository path alone — not by requesting user. The cached field is marked as shareable, so once any authorized user views a private file through the handler or the blob page, its contents are written to Redis under a path-only key. Any subsequent request for the same path — from any user, authenticated or not — is served directly from cache, completely bypassing GitHub's permission checks.
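A sketch of the fix for the cache-keying flaw: scope cache keys to the requesting user, so a cache hit can only ever return content that same user was authorized to fetch. All names here are illustrative, not Better-Hub's actual code:

```javascript
// Per-user cache keying: the same path viewed by two users produces two
// cache entries, each populated through that user's own authorization.
function makeCachedFetcher(fetchWithAuth) {
  const cache = new Map();
  return async (userId, repoPath) => {
    const key = `${userId}:${repoPath}`; // user-scoped, not path-only
    if (!cache.has(key)) {
      cache.set(key, await fetchWithAuth(userId, repoPath));
    }
    return cache.get(key);
  };
}
```

With a path-only key, Alice's authorized read would be replayed verbatim to an unauthenticated Bob; with a user-scoped key, Bob's request must pass through his own (failing) authorization.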
Impact Complete confidentiality breach of private repositories. Any file that has ever been viewed by an authorized user is permanently exposed to unauthenticated requests. This includes source code, secrets in config files, private keys, and any other sensitive repository content. Description A similar cache-keying problem affects the issue page. When an authorized user views a private repo issue on Better-Hub, the issue's full content is cached and later embedded in Open Graph meta properties of the page HTML. A user who lacks repository access — and sees the "Unable to load repository" error — can still read the issue content by inspecting the page source, where it leaks in the meta tags served from cache. Impact Private issue contents — potentially including bug reports, credentials in descriptions, or internal discussion — are accessible to any unauthenticated party who knows or guesses the URL. Description Better-Hub exposes a Prompts feature tied to repositories. For private repositories, the prompt data is included in the server-rendered page source even when the requestor does not have repository access. The error UI correctly shows "Unable to load repository," but the prompt content is already serialized into the HTML delivered to the browser. Impact Private AI prompts — which may contain internal instructions, proprietary workflows, or system prompt secrets — leak to unauthenticated users. Description The session helper returns a session object that includes the GitHub access token. This session object is passed as props directly to client components. Next.js serializes component props and embeds them in the page HTML for hydration, meaning the raw GitHub access token is present in the page source and accessible to any JavaScript running on the page — including scripts injected via any of the XSS vulnerabilities above. The fix is straightforward: strip the access token from the session object before passing it as props to client components.
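That fix fits in a few lines. This is a hypothetical helper — the `accessToken` property name is illustrative — that drops the token before the session crosses the server/client boundary:

```javascript
// Strip server-only credentials before a session object is serialized
// into client-component props, so the token never reaches the page HTML.
function clientSafeSession(session) {
  const { accessToken, ...safe } = session; // accessToken stays server-side
  return safe;
}
```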
Token usage should remain server-side only. When chained with any XSS in this report, an attacker can exfiltrate the victim's GitHub OAuth token and make arbitrary GitHub API calls on their behalf — reading private repos, writing code, managing organizations, and more. This elevates every XSS in this report from session hijacking to full GitHub account takeover . Description The home page redirects authenticated users to the destination specified in the query parameter with no validation or allow-listing. An attacker can craft a login link that silently redirects the victim to an attacker-controlled domain immediately after they authenticate. Impact Phishing attacks exploiting the trusted better-hub.com domain. Can be combined with OAuth token flows for session fixation attacks, or used to redirect users to convincing fake login pages post-authentication. All issues were reported directly to the Better-Hub team. The team was responsive and attempted rapid remediation.

The vulnerabilities:

01. Unsanitized README → XSS
02. Issue Description → XSS
03. Stored XSS in PR Bodies
04. Stored XSS in PR Comments
05. Reflected XSS via SVG Image Proxy
06. Large-File XSS (>200 KB)
07. Cache Deception — Private File Access
08. Authz Bypass via Issue Cache
09. Private Repo Prompt Leak
10. GitHub OAuth Token Leaked to Client
11. Open Redirect via Query Parameter

Steps to reproduce, per vulnerability:

01. Create a GitHub repository with an XSS payload in its README. View the repository on Better-Hub and observe the XSS popup.
02. Create a GitHub issue with the payload in the body. Navigate to the issue via Better-Hub to trigger the payload.
03. Open a pull request whose body contains the payload. View the PR through Better-Hub to observe the XSS popup.
04. Post a PR comment containing the payload. View the comment thread via Better-Hub to trigger the XSS.
05. Create an SVG file with a script payload in a public GitHub repo, then direct the victim to the proxy URL for that file.
06. Create a file containing the payload, padded to exceed 200 KB, and browse to it on Better-Hub. The XSS fires immediately.
07. Create a private repository and add a sensitive file. As the repository owner, view the file once to populate the cache. Then open the same URL in an incognito window or as a completely different user — the private file content is served, no authorization required.
08. Create a private repo and an issue with a sensitive body. Open the issue as an authorized user, then open the same URL in a different session (no repo access). While the access-error UI is shown, view the page source — issue details appear in the meta tags.
09. Create a private repository and create a prompt in it. Open the prompt URL as an unauthorized user and view the page source — prompt details are present in the HTML despite the access-denied UI.
11. Log in to Better-Hub with GitHub credentials, then navigate to a crafted redirect URL. You are immediately redirected to the attacker-controlled destination.
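For the open redirect, the standard remedy is an allow-list that only permits same-origin destinations. A minimal sketch — the function name and fallback are illustrative, not Better-Hub's code:

```javascript
// Accept only same-origin destinations; anything else falls back to "/".
// Protocol-relative tricks like "//evil.example" resolve to a foreign
// origin and are rejected by the same check.
function safeRedirect(dest, origin = "https://better-hub.com") {
  try {
    const url = new URL(dest, origin);
    if (url.origin !== origin) return "/";
    return url.pathname + url.search + url.hash;
  } catch {
    return "/"; // unparseable input
  }
}
```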

JSLegendDev 1 week ago

If You Like PICO-8, You'll Love KAPLAY (Probably)

I’ve been checking out PICO-8 recently. For those unaware, it’s a nicely constrained environment for making small games in Lua. It provides a built-in editor allowing you to write code, make sprites, make tile maps and make sounds. This makes it ideal for prototyping game ideas or making small games. You know what tool is also great for prototyping game ideas or making small games? Well… KAPLAY ! It’s a simple free and open source library for making games in JavaScript. I suspect there might be a sizeable overlap between people who like PICO-8 and those who would end up liking or even loving KAPLAY as well if they gave it a try. During my PICO-8 learning journey, I came across a nice tutorial teaching you how to make a coin collecting game in 10 minutes. In this article, I’d like to teach you how to build roughly the same game in KAPLAY. This will better demonstrate in what ways this game library makes game development faster, much like PICO-8. Feel free to follow along if you wish! KAPLAY lacks all of the tools included in PICO-8. There is no all-in-one package you can use to write your code, make your sprites, build your maps or even make sounds. You might be wondering, then, how KAPLAY is in any way similar to PICO-8 if it lacks all of this? My answer: KAPLAY makes up for it by making the coding part really easy by offering you a lot of logic built-in. For example, it handles collisions, physics, scene management, animations, etc., for you. You’ll see some of this in action when we arrive at the part where we write the game’s code. Now, how do we use KAPLAY? Here’s the simplest way I’ve found. You install VSCode (a popular code editor) along with the Live Server extension (found in the extensions marketplace within the editor). You then create a folder that you open within VSCode. Once the folder is opened, we only need to create two files: one called index.html and the other main.js.
Your index.html file is a minimal page that loads our script. Since KAPLAY works on the web, it lives within a web page — index.html is that page. Then, we link our JavaScript file to it. We set the type to “module” so we can use import statements in our JS. We then import KAPLAY in main.js. Voilà! We can now use the KAPLAY library. Since we installed the Live Server extension, you should now have access to a “Go Live” button at the bottom of the editor. To actually run the game, all you have to do is click it. This will open the web page in your default browser. KAPLAY by default creates a canvas with a checkered pattern. One thing pretty cool with this setup is that every time you change something in your code and hit save (Ctrl+S, or Cmd+S on a Mac), the web page reloads and you can see your latest changes instantly. I’ve created the following spritesheet to be used in our game. Note that since the image is transparent, the cloud to the right is not really visible. You can download the image above to follow along. The next step is to place the image in the same folder as our HTML page and JavaScript file. We’re now ready to make our game. Here we set the width and the height of our canvas. The letterbox option is used to make sure the canvas scales according to the browser window but without losing its aspect ratio. We can load our spritesheet by using the loadSprite KAPLAY function. The first param is the name you want to use to refer to it elsewhere in your code. The second param is the path to that asset. Finally, the third param is used to tell KAPLAY how to slice your image into individual frames. Considering that in our spritesheet we have three sprites placed in a row, the sliceX property should be set to 3. Since we have only one sprite per column (because we only have one row), sliceY should be set to 1. To make the coins fall from the top, we’ll use KAPLAY’s physics system. You can set the gravity by calling KAPLAY’s setGravity function.
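To make the slicing just described concrete, here is what sliceX / sliceY imply, as a plain-JS sketch. KAPLAY does this internally — the function below is mine and purely illustrative:

```javascript
// Cut an image of imgWidth x imgHeight into equal frames:
// sliceX columns across, sliceY rows down. Frames are numbered
// left to right, top to bottom, starting at 0.
function sliceFrames(imgWidth, imgHeight, sliceX, sliceY) {
  const w = imgWidth / sliceX;
  const h = imgHeight / sliceY;
  const frames = [];
  for (let row = 0; row < sliceY; row++) {
    for (let col = 0; col < sliceX; col++) {
      frames.push({ x: col * w, y: row * h, w, h });
    }
  }
  return frames;
}
```

For example, a hypothetical 96×32 sheet with sliceX: 3, sliceY: 1 yields three 32×32 frames, with frame 0 being the leftmost sprite.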
KAPLAY’s add function is used to create a game object by providing an array of components. These components are offered by KAPLAY and come with prebuilt functionality. The rect() component sets the graphics of the game object to be a rectangle with a width and height of 1000. On the other hand, the color component sets its color. You should have the following result at this stage. Creating The Basket The basket is also a game object with several different components. Here is what each does: sprite() Sets the sprite used by the game object. The first param is for providing the name of the sprite we want to use. Since we’re using a spritesheet which contains three different sprites in the same image, we need to specify the frame to use. The basket sprite corresponds to frame 0. anchor() By default, game objects are positioned based on their top-left corner. However, I prefer having it centered. The anchor component is for this purpose. pos() This component is used to set the position of the game object on the canvas. Here we also use center(), a KAPLAY-provided function that returns the coordinates of the center of the canvas. area() This component is used to set the hitbox of a game object. This will allow KAPLAY’s physics system to handle collisions for us. There is a debug mode you can access by pressing the f1 (fn+f1 on Mac) key which will make hitboxes visible. Example when debug mode is on. As for setting the shape of the hitbox, you can use the Rect class which takes 3 params. The first expects a vec2 (a data structure offered by KAPLAY used to hold a pair of values) describing where to place the hitbox relative to the game object. If set to 0, the hitbox will have the same position as the game object. The two remaining params are used to set its width and height. body() Finally, the body component is used to make the game object susceptible to physics. If added alone, the game object will be affected by gravity. However, to prevent this, we can set the isStatic property to true.
This is very useful, for example, in a platformer where platforms need to be static so they don’t fall off. Here we can use the move method available on all game objects to make the basket move in the desired direction. The loop function spawns a coin every second. We use the randi function to set a random X position between 10 and 950. The offscreen component is used to destroy the game object once it’s out of view. Finally, a simple string “coin” is added alongside the array of components to tag the game object being created. This will allow us to determine which coin collided with the basket so we can destroy it and increase the score. Text can be displayed by creating a game object with the text component. To know when a coin collides with the basket, we can use its onCollide method (available by default). The first param of that method is the tag of the game object you want to check collisions with. Since we have multiple coins using the “coin” tag, the specific coin currently colliding with the basket will be passed as a param to the collision handler. Now we can destroy the coin, increase the score and display the new score. As mentioned earlier, KAPLAY does not have a map-making tool. However, it does offer the ability to create maps using arrays of strings. For anything more complex, you should check out Tiled, which is also open source and made for that purpose. Where we place the # character in the string array determines where clouds will be placed in the game. Publishing a KAPLAY game is very simple. You compress your folder into a .zip archive and you upload it to itch.io or any other site you wish to. The game will be playable in the browser without players needing to download it. Now, what if you’d like to make it downloadable as well? A very simple tool you can use is GemShell. It allows you to make executables for Windows/Mac/Linux in what amounts to a click. You can use the lite version for free.
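Going back to the string-array maps for a moment: the idea can be sketched in plain JS. This is roughly what a level helper does with each # character — the function and tile size below are illustrative, not KAPLAY API:

```javascript
// Walk an array of strings and emit a position for every matching character.
// Each string is a screen row; each character index is a tile column.
function tilePositions(rows, tileW, tileH, ch = "#") {
  const positions = [];
  rows.forEach((rowText, row) => {
    [...rowText].forEach((c, col) => {
      if (c === ch) positions.push({ x: col * tileW, y: row * tileH });
    });
  });
  return positions;
}
```

So `["#..", "..#"]` with 64-pixel tiles places a cloud at the top-left and another one tile down and two tiles right.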
If you plan on upgrading, you can use my link to get 15% off your purchase. To be transparent, this is an affiliate link. If you end up purchasing the tool using my link, I’ll get a cut of that sale. I just scratched the surface with KAPLAY today. I hope it gave you a good idea of what it’s like to make games with it. If you’re interested in more technical articles like this one, I recommend subscribing to not miss out on future publications. Subscribe now In the meantime, you can check out the following:

Kev Quirk 2 weeks ago

Quick Clarification on Pure Comments

A couple of people have reached out to me asking if I can offer a version of Pure Comments for their site, as they don't run Pure Blog . I obviously didn't make this clear in the announcement , or on the (now updated) Pure Comments site. Pure Comments can be used on ANY website. It's just an embed script, just like Disqus (only with no bloat or tracking). So you just have to upload the files to wherever you want to host Pure Comments, then add the following embed code wherever you want comments to display (replacing the example domain): You can use Pure Comments on WordPress, Bear Blog, Jekyll, 11ty, Hugo, Micro.blog, Kirby, Grav, and even Pure Blog! Anywhere you can inject the little snippet of code above, Pure Comments will work. Thanks for reading this post via RSS. RSS is ace, and so are you. ❤️ You can reply to this post by email , or leave a comment .

(think) 2 weeks ago

Building Emacs Major Modes with TreeSitter: Lessons Learned

Over the past year I’ve been spending a lot of time building TreeSitter-powered major modes for Emacs – clojure-ts-mode (as co-maintainer), neocaml (from scratch), and asciidoc-mode (also from scratch). Between the three projects I’ve accumulated enough battle scars to write about the experience. This post distills the key lessons for anyone thinking about writing a TreeSitter-based major mode, or curious about what it’s actually like. Before TreeSitter, Emacs font-locking was done with regular expressions and indentation was handled by ad-hoc engines (SMIE, custom indent functions, or pure regex heuristics). This works, but it has well-known problems: Regex-based font-locking is fragile. Regexes can’t parse nested structures, so they either under-match (missing valid code) or over-match (highlighting inside strings and comments). Every edge case is another regex, and the patterns become increasingly unreadable over time. Indentation engines are complex. SMIE (the generic indentation engine for non-TreeSitter modes) requires defining operator precedence grammars for the language, which is hard to get right. Custom indentation functions tend to grow into large, brittle state machines. Tuareg’s indentation code, for example, is thousands of lines long. TreeSitter changes the game because you get a full, incremental, error-tolerant syntax tree for free. Font-locking becomes “match this AST pattern, apply this face”, and indentation becomes “if the parent node is X, indent by Y”. The rules are declarative, composable, and much easier to reason about than regex chains. In practice, neocaml’s entire font-lock and indentation logic fits in about 350 lines of Elisp. The equivalent in tuareg is spread across thousands of lines. That’s the real selling point: simpler, more maintainable code that handles more edge cases correctly . That said, TreeSitter in Emacs is not a silver bullet. Here’s what I ran into.
TreeSitter grammars are written by different authors with different philosophies. The tree-sitter-ocaml grammar provides a rich, detailed AST with named fields. The tree-sitter-clojure grammar, by contrast, deliberately keeps things minimal – it only models syntax, not semantics, because Clojure’s macro system makes static semantic analysis unreliable. 1 This means font-locking forms in Clojure requires predicate matching on symbol text, while in OCaml you can directly match nodes with named fields. To illustrate: in OCaml you’d fontify a function definition by matching the grammar’s rich named fields, while in Clojure the grammar only gives you lists of symbols and you need predicate matching. You can’t learn “how to write TreeSitter queries” generically – you need to learn each grammar individually. The best tools for this are treesit-explore-mode (to visualize the full parse tree) and treesit-inspect-mode (to see the node at point). Use them constantly. You’re dependent on someone else providing the grammar, and quality is all over the map. The OCaml grammar is mature and well-maintained – it’s hosted under the official tree-sitter GitHub org. The Clojure grammar is small and stable by design. But not every language is so lucky. asciidoc-mode uses a third-party AsciiDoc grammar that employs a dual-parser architecture – one parser for block-level structure (headings, lists, code blocks) and another for inline formatting (bold, italic, links). This is the same approach used by some built-in Emacs modes, and it makes sense for markup languages where block and inline syntax are largely independent. The problem is that the two parsers run independently on the same text, and they can disagree . The inline parser misinterprets certain characters and list markers as emphasis delimiters, creating spurious bold spans that swallow subsequent inline content.
The workaround is to use `:override t` on all block-level font-lock rules so they win over the incorrect inline faces. This doesn’t fix inline elements consumed by the spurious emphasis – that requires an upstream grammar fix. When you hit grammar-level issues like this, you either fix them yourself (which means diving into the grammar’s JavaScript source and C toolchain) or you live with workarounds. Either way, it’s a reminder that your mode is only as good as the grammar underneath it. Getting the font-locking right in asciidoc-mode was probably the most challenging part of all three projects, precisely because of these grammar quirks. I also ran into a subtle behavior: by default, the font-lock engine skips an entire captured range if any position within it already has a face. So if you capture a parent node whose children were already fontified, the whole thing gets skipped silently. The fix is to capture specific child nodes instead. These issues took a lot of trial and error to diagnose. The lesson: budget extra time for font-locking when working with less mature grammars . Grammars evolve, and breaking changes happen. clojure-ts-mode switched from the stable grammar to the experimental branch because the stable version had metadata nodes as children of other nodes, which caused navigation commands to behave incorrectly. The experimental grammar makes metadata standalone nodes, fixing the navigation issues but requiring all queries to be updated. neocaml pins to v0.24.0 of the OCaml grammar. If you don’t pin versions, a grammar update can silently break your font-locking or indentation. The takeaway: always pin your grammar version , and include a mechanism to detect outdated grammars – for instance, testing a query that changed between versions to detect incompatible grammars at startup. Users shouldn’t have to manually clone repos and compile C code to use your mode. Both clojure-ts-mode and neocaml include grammar recipes: on first use, the mode checks for the grammar and offers to install missing grammars via treesit-install-language-grammar.
This works, but requires a C compiler and Git on the user’s machine, which is not ideal. 2 The TreeSitter support in Emacs has been improving steadily, but each version has its quirks: Emacs 29 introduced TreeSitter support but lacked several APIs – some of the hooks used for structured navigation don’t exist, so you need a fallback. Emacs 30 added sentence navigation and better indentation support. But it also had a bug in offsets ( #77848 ) that broke embedded parsers, and another that required disabling a TreeSitter-aware function. Emacs 31 has an off-by-one bug that causes comment deletion to leave ` *)` behind on multi-line OCaml comments. I had to skip the affected test with a version check. The lesson: test your mode against multiple Emacs versions , and be prepared to write version-specific workarounds. CI that runs against Emacs 29, 30, and snapshot is essential. Most TreeSitter grammars ship with query files for syntax highlighting ( highlights.scm ) and indentation ( indents.scm ). Editors like Neovim and Helix use these directly. Emacs doesn’t – you have to manually translate the patterns into treesit-font-lock-rules and treesit-simple-indent-rules definitions in Elisp. This is tedious and error-prone. For example, a rule from the OCaml grammar’s highlights.scm has to be rewritten as its Elisp equivalent: the query syntax is nearly identical, but you have to wrap everything in Elisp rule definitions, map upstream capture names (like @keyword) to Emacs face names (like font-lock-keyword-face), assign features, and manage override behavior. You end up maintaining a parallel set of queries that can drift from upstream. Emacs 31 will make it possible to use query files for font-locking directly, which should help significantly. But for now, you’re hand-coding everything. When a face isn’t being applied where you expect, go back to the explore/inspect tools and check whether an earlier rule already claimed the range. TreeSitter modes define four levels of font-locking via treesit-font-lock-level , and the default level in Emacs is 3. It’s tempting to pile everything into levels 1–3 so users see maximum highlighting out of the box, but resist the urge.
When every token on the screen has a different color, code starts looking like a Christmas tree and the important things – keywords, definitions, types – stop standing out. Less is more here. Here’s how distributes features across levels: And follows the same philosophy: The pattern is the same: essentials first, progressively more detail at higher levels. This way the default experience (level 3) is clean and readable, and users who want the full rainbow can bump to 4. Better yet, they can use to cherry-pick individual features regardless of level: This gives users fine-grained control without requiring mode authors to anticipate every preference. Indentation issues are harder to diagnose because they depend on tree structure, rule ordering, and anchor resolution: Remember that rule order matters for indentation too – the first matching rule wins. A typical set of rules reads top to bottom from most specific to most general: Watch out for the empty-line problem : when the cursor is on a blank line, TreeSitter has no node at point. The indentation engine falls back to the root node as the parent, which typically matches the top-level rule and gives column 0. In neocaml I solved this with a rule that looks at the previous line’s last token to decide indentation: This is the single most important piece of advice. Font-lock and indentation are easy to break accidentally, and manual testing doesn’t scale. Both projects use Buttercup (a BDD testing framework for Emacs) with custom test macros. Font-lock tests insert code into a buffer, run , and assert that specific character ranges have the expected face: Indentation tests insert code, run , and assert the result matches the expected indentation: Integration tests load real source files and verify that both font-locking and indentation survive on the full file. This catches interactions between rules that unit tests miss. has 200+ automated tests and has even more. 
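A font-lock test in that style might look like this (a hedged sketch: the mode function and the shape of the assertion are illustrative, not the actual neocaml test helpers):

```elisp
(require 'buttercup)

(describe "OCaml font-locking"
  (it "fontifies `let' as a keyword"
    (with-temp-buffer
      (insert "let x = 1")
      (my-ocaml-ts-mode)                  ; hypothetical mode function
      (font-lock-ensure)
      ;; position 1 is the `l' of `let'
      (expect (get-text-property 1 'face)
              :to-equal 'font-lock-keyword-face))))
```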
Investing in test infrastructure early pays off enormously – I can refactor indentation rules with confidence because the suite catches regressions immediately. When I became the maintainer of clojure-mode many years ago, I really struggled with making changes. There were no font-lock or indentation tests, so every change was a leap of faith – you’d fix one thing and break three others without knowing until someone filed a bug report. I spent years working on a testing approach I was happy with, alongside many great contributors, and the return on investment was massive. The same approach – almost the same test macros – carried over directly to when we built the TreeSitter version. And later I reused the pattern again in and . One investment in testing infrastructure, four projects benefiting from it. I know that automated tests, for whatever reason, never gained much traction in the Emacs community. Many popular packages have no tests at all. I hope stories like this convince you that investing in tests is really important and pays off – not just for the project where you write them, but for every project you build after. This one is specific to but applies broadly: compiling TreeSitter queries at runtime is expensive. If you’re building queries dynamically (e.g. with called at mode init time), consider pre-compiling them as values. This made a noticeable difference in ’s startup time. The Emacs community has settled on a suffix convention for TreeSitter-based modes: , , , and so on. This makes sense when both a legacy mode and a TreeSitter mode coexist in Emacs core – users need to choose between them. But I think the convention is being applied too broadly, and I’m afraid the resulting name fragmentation will haunt the community for years. For new packages that don’t have a legacy counterpart, the suffix is unnecessary. I named my packages (not ) and (not ) because there was no prior or to disambiguate from. 
The infix is an implementation detail that shouldn’t leak into the user-facing name. Will we rename everything again when TreeSitter becomes the default and the non-TS variants are removed? Be bolder with naming. If you’re building something new, give it a name that makes sense on its own merits, not one that encodes the parsing technology in the package name. I think the full transition to TreeSitter in the Emacs community will take 3–5 years, optimistically. There are hundreds of major modes out there, many maintained by a single person in their spare time. Converting a mode from regex to TreeSitter isn’t just a mechanical translation – you need to understand the grammar, rewrite font-lock and indentation rules, handle version compatibility, and build a new test suite. That’s a lot of work. Interestingly, this might be one area where agentic coding tools can genuinely help. The structure of TreeSitter-based major modes is fairly uniform: grammar recipes, font-lock rules, indentation rules, navigation settings, imenu. If you give an AI agent a grammar and a reference to a high-quality mode like , it could probably scaffold a reasonable new mode fairly quickly. The hard parts – debugging grammar quirks, handling edge cases, getting indentation just right – would still need human attention, but the boilerplate could be automated. Still, knowing the Emacs community, I wouldn’t be surprised if a full migration never actually completes. Many old-school modes work perfectly fine, their maintainers have no interest in TreeSitter, and “if it ain’t broke, don’t fix it” is a powerful force. And that’s okay – diversity of approaches is part of what makes Emacs Emacs. TreeSitter is genuinely great for building Emacs major modes. The code is simpler, the results are more accurate, and incremental parsing means everything stays fast even on large files. I wouldn’t go back to regex-based font-locking willingly. But it’s not magical. 
Grammars are inconsistent across languages, the Emacs APIs are still maturing, you can’t reuse files (yet), and you’ll hit version-specific bugs that require tedious workarounds. The testing story is better than with regex modes – tree structures are more predictable than regex matches – but you still need a solid test suite to avoid regressions. If you’re thinking about writing a TreeSitter-based major mode, do it. The ecosystem needs more of them, and the experience of working with syntax trees instead of regexes is genuinely enjoyable. Just go in with realistic expectations, pin your grammar versions, test against multiple Emacs releases, and build your test suite early. Anyways, I wish there was an article like this one when I was starting out with and , so there you have it. I hope that the lessons I’ve learned along the way will help build better modes with TreeSitter down the road. That’s all I have for you today. Keep hacking! See the excellent scope discussion in the tree-sitter-clojure repo for the rationale.  ↩︎ There’s ongoing discussion in the Emacs community about distributing pre-compiled grammar binaries, but nothing concrete yet.  ↩︎ Regex-based font-locking is fragile. Regexes can’t parse nested structures, so they either under-match (missing valid code) or over-match (highlighting inside strings and comments). Every edge case is another regex, and the patterns become increasingly unreadable over time. Indentation engines are complex. SMIE (the generic indentation engine for non-TreeSitter modes) requires defining operator precedence grammars for the language, which is hard to get right. Custom indentation functions tend to grow into large, brittle state machines. Tuareg’s indentation code, for example, is thousands of lines long. Use to verify the node type at point matches your query. Set to to see which rules are firing. Check the font-lock feature level – your rule might be in level 4 while the user has the default level 3. 
The features are assigned to levels via . Remember that rule order matters . Without , an earlier rule that already fontified a region will prevent later rules from applying. This can be intentional (e.g. builtin types at level 3 take precedence over generic types) or a source of bugs. Set to – this logs which rule matched for each line, what anchor was computed, and the final column. Use to understand the parent chain. The key question is always: “what is the parent node, and which rule matches it?”
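A typical shape for such an indentation rule set, ordered most specific first (node names, offsets, and the debug variable usage are illustrative assumptions, not the actual neocaml rules):

```elisp
;; The first matcher that fires wins, so specific rules go first.
(setq-local treesit-simple-indent-rules
            '((ocaml
               ((parent-is "string") no-indent 0)        ; leave strings alone
               ((node-is ")") parent-bol 0)              ; closers align with parent
               ((parent-is "let_binding") parent-bol 2)  ; indent bodies one step
               ((parent-is "compilation_unit") column-0 0)))) ; top level

;; Log which rule fired for each line while debugging:
(setq treesit--indent-verbose t)
```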

baby steps 2 weeks ago

How Dada enables internal references

In my previous Dada blog post, I talked about how Dada enables composable sharing. Today I’m going to start diving into Dada’s permission system; permissions are Dada’s equivalent to Rust’s borrow checker. Dada aims to exceed Rust’s capabilities by using place-based permissions. Dada lets you write functions and types that capture both a value and things borrowed from that value . As a fun example, imagine you are writing some Rust code to process a comma-separated list, just looking for entries of length 5 or more: One of the cool things about Rust is how this code looks a lot like some high-level language like Python or JavaScript, but in those languages the call is going to be doing a lot of work, since it will have to allocate tons of small strings, copying out the data. But in Rust the values are just pointers into the original string and so is very cheap. I love this. On the other hand, suppose you want to package up some of those values, along with the backing string, and send them to another thread to be processed. You might think you can just make a struct like so… …and then create the list and items and store them into it: But as experienced Rustaceans know, this will not work. When you have borrowed data like an , that data cannot be moved. If you want to handle a case like this, you need to convert from into sending indices, owned strings, or some other solution. Argh! Dada does things a bit differently. The first thing is that, when you create a reference, the resulting type names the place that the data was borrowed from , not the lifetime of the reference . 
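A minimal sketch of the kind of Rust code described here (the function and variable names are mine, not from the original post):

```rust
// Collect the entries of length 5 or more from a comma-separated list.
// Each returned &str is just a pointer into `input`: no small-string
// allocations, no copying.
fn long_entries(input: &str) -> Vec<&str> {
    input.split(',').filter(|e| e.len() >= 5).collect()
}

fn main() {
    let line = String::from("fig,apple,kiwi,banana");
    let long = long_entries(&line);
    assert_eq!(long, ["apple", "banana"]);

    // The "package it up and send it to another thread" struct the post
    // describes would look like this -- and it is exactly what the borrow
    // checker rejects, because `items` borrows from `list`:
    //
    //     struct ListAndItems<'a> { list: String, items: Vec<&'a str> }
}
```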
So the type annotation for would say 1 (at least, if you wanted to write out the full details rather than leaving it to the type inferencer): I’ve blogged before about how I would like to redefine lifetimes in Rust to be places as I feel that a type like is much easier to teach and explain: instead of having to explain that a lifetime references some part of the code, or what have you, you can say that “this is a that references the variable ”. But what’s also cool is that named places open the door to more flexible borrows. In Dada, if you wanted to package up the list and the items, you could build a type like so: Note that last line – . We can create a new class and move into it along with , which borrows from list. Neat, right? OK, so let’s back up and talk about how this all works. Let’s start with syntax. Before we tackle the example, I want to go back to the example from previous posts, because it’s a bit easier for explanatory purposes. Here is some Rust code that declares a struct , creates an owned copy of it, and then gets a few references into it. The Dada equivalent to this code is as follows: The first thing to note is that, in Dada, the default when you name a variable or a place is to create a reference. So doesn’t move , as it would in Rust, it creates a reference to the stored in . You could also explicitly write , but that is not preferred. Similarly, creates a reference to the value in the field . (If you wanted to move the character, you would write , not as in Rust.) Notice that I said “creates a reference to the stored in ”. In particular, I did not say “creates a reference to ”. That’s a subtle choice of wording, but it has big implications. The reason I wrote that “creates a reference to the stored in ” and not “creates a reference to ” is because, in Dada, references are not pointers . Rather, they are shallow copies of the value, very much like how we saw in the previous post that a acts like an but is represented as a shallow copy. 
So where in Rust the following code… …looks like this in memory… in Dada, code like this would look like so Clearly, the Dada representation takes up more memory on the stack. But note that it doesn’t duplicate the memory in the heap, which tends to be where the vast majority of the data is found. This gets at something important. Rust, like C, makes pointers first-class. So given , refers to the pointer and refers to its referent, the . Dada, like Java, goes another way. is a value – including in memory representation! The difference between a , , and is not in their memory layout, all of them are the same, but they differ in whether they own their contents . 2 So in Dada, there is no operation to go from “pointer” to “referent”. That doesn’t make sense. Your variable always contains a string, but the permissions you have to use that string will change. In fact, the goal is that people don’t have to learn the memory representation as they learn Dada, you are supposed to be able to think of Dada variables as if they were all objects on the heap, just like in Java or Python, even though in fact they are stored on the stack. 3 In Rust, you cannot move values while they are borrowed. So if you have code like this that moves into … …then this code only compiles if is not used again: There are two reasons that Rust forbids moves of borrowed data: Neither of these apply to Dada: OK, let’s revisit that Rust example that was giving us an error. When we convert it to Dada, we find that it type checks just fine: Woah, neat! We can see that when we move from into , the compiler updates the types of the variables around it. So actually the type of changes to . And then when we move from to , that’s totally valid. In PL land, updating the type of a variable from one thing to another is called a “strong update”. Obviously things can get a bit complicated when control-flow is involved, e.g., in a situation like this: OK, let’s take the next step. 
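As a concrete baseline for the comparison, today's Rust allows a move only once the borrow is no longer used (a minimal sketch; the error code in the comment is the real rustc diagnostic):

```rust
// Returns the String by move, after the borrow's last use.
fn move_after_last_use() -> String {
    let s = String::from("hello");
    let r = &s;
    let len = r.len(); // last use of the borrow `r`
    assert_eq!(len, 5);
    s // OK: the borrow is dead here, so moving `s` out is allowed
    // If `r` were used after this point, the move would be rejected:
    //   error[E0505]: cannot move out of `s` because it is borrowed
}

fn main() {
    assert_eq!(move_after_last_use(), "hello");
}
```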
Let’s define a Dada function that takes an owned value and another value borrowed from it, like the name, and then call it: We could call this function like so, as you might expect: So…how does this work? Internally, the type checker type-checks a function call by creating a simpler snippet of code, essentially, and then type-checking that . It’s like desugaring but only at type-check time. In this simpler snippet, there are a series of statements to create temporary variables for each argument. These temporaries always have an explicit type taken from the method signature, and they are initialized with the values of each argument: If this type checks, then the type checker knows you have supplied values of the required types, and so this is a valid call. Of course there are a few more steps, but that’s the basic idea. Notice what happens if you supply data borrowed from the wrong place: This will fail to type check because you get: So now, if we go all the way back to our original example, we can see how the example worked: Basically, when you construct a , that’s “just another function call” from the type system’s perspective, except that in the signature is handled carefully. I should be clear, this system is modeled in the dada-model repository, which implements a kind of “mini Dada” that captures what I believe to be the most interesting bits. I’m working on fleshing out that model a bit more, but it’s got most of what I showed you here. 5 For example, here is a test that you get an error when you give a reference to the wrong value. The “real implementation” is lagging quite a bit, and doesn’t really handle the interesting bits yet. Scaling it up from model to real implementation involves solving type inference and some other thorny challenges, and I haven’t gotten there yet – though I have some pretty interesting experiments going on there too, in terms of the compiler architecture. 6 I believe we could apply most of this system to Rust. 
Obviously we’d have to rework the borrow checker to be based on places, but that’s the straightforward part. The harder bit is the fact that is a pointer in Rust, and that we cannot readily change. However, for many use cases of self-references, this isn’t as important as it sounds. Often, the data you wish to reference is living in the heap, and so the pointer isn’t actually invalidated when the original value is moved. Consider our opening example. You might imagine Rust allowing something like this: In this case, the data is heap-allocated, so moving the string doesn’t actually invalidate the value (it would invalidate an value, interestingly). In Rust today, the compiler doesn’t know all the details of what’s going on. has a impl and so it’s quite opaque whether is heap-allocated or not. But we are working on various changes to this system in the Beyond the goal, most notably the Field Projections work. There is likely some opportunity to address this in that context, though to be honest I’m behind in catching up on the details. I’ll note in passing that Dada unifies and into one type as well. I’ll talk in detail about how that works in a future blog post.  ↩︎ This is kind of like C++ references (e.g., ), which also act “as if” they were a value (i.e., you write , not ), but a C++ reference is truly a pointer, unlike a Dada ref.  ↩︎ This goal was in part inspired by a conversation I had early on within Amazon, where a (quite experienced) developer told me, “It took me months to understand what variables are in Rust”.  ↩︎ I explained this some years back in a talk on Polonius at Rust Belt Rust , if you’d like more detail.  ↩︎ No closures or iterator chains!  ↩︎ As a teaser, I’m building it in async Rust, where each inference variable is a “future” and I use “await” to find out when other parts of the code might have added constraints.  ↩︎ References are pointers, so those pointers may become invalidated.
In the example above, points to the stack slot for , so if were to be moved into , that makes the reference invalid. The type system would lose track of things. Internally, the Rust borrow checker has a kind of “indirection”. It knows that is borrowed for some span of the code (a “lifetime”), and it knows that the lifetime in the type of is related to that lifetime, but it doesn’t really know that is borrowed from in particular. 4 Because references are not pointers into the stack, but rather shallow copies, moving the borrowed value doesn’t invalidate their contents. They remain valid. Because Dada’s types reference actual variable names, we can modify them to reflect moves.
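The earlier point that heap-allocated data is not invalidated when its owner moves can be checked directly in today's Rust (a minimal sketch):

```rust
// Moving a String moves its handle (ptr, len, capacity) between stack
// slots; the heap buffer it points at stays where it is.
fn heap_survives_move() -> bool {
    let s = String::from("hello world");
    let before = s.as_ptr(); // raw address of the bytes in the heap
    let moved = s;           // `s` is moved; only the handle changes home
    moved.as_ptr() == before // same heap allocation, pointer still valid
}

fn main() {
    assert!(heap_survives_move());
}
```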

fLaMEd fury 2 weeks ago

Making fLaMEd fury Glow Everywhere With an Eleventy Transform

What’s going on, Internet? I originally added a CSS class as a fun way to make my name (fLaMEd) and site name (fLaMEd fury) pop on the homepage. shellsharks gave me a shoutout for it, which inspired me to take it further and apply the effect site-wide. Site-wide was the original intent; the problem was that the class was only being applied manually in a handful of places, and I kept forgetting to add it whenever I wrote a new post or created a new page. Classic. Instead of hunting through templates and markdown files, I’ve added an Eleventy HTML transform that automatically applies the glow up. I had Claude Code help me figure out the regex and the transform config. This allowed me to get this done before the kids came home. Don't @ me. The effect itself is a simple utility class using : Swap for whatever gradient custom property you have defined. The repeats the gradient across the text for a more dynamic flame effect. The transform lives in its own plugin file and gets registered in . It runs after Eleventy has rendered each page, tokenises the HTML by splitting on tags, tracks a skip-tag stack, and only replaces text in text nodes. Tags in the set, along with any span already carrying the class, push onto the stack. No replacement happens while the stack is non-empty, so link text, code examples, the page , and already-wrapped instances are all left alone. HTML attributes like and are never touched because they sit inside tag tokens, not text nodes. A single regex handles everything in one pass. The optional group matches " fury" (with space) or “fury” (without), so “flamed fury” and “flamedfury” (as it appears in the domain name) are both wrapped as a unit. The flag covers every capitalisation variant (“fLaMEd fury”, “Flamed Fury”, “FLAMED FURY”) with the original casing preserved in the output. This helps because I can be inconsistent with the styling at times. Export the plugin from wherever you manage your Eleventy plugins: Then register it in .
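A simplified sketch of that tokenise-and-skip idea (the function name, the `glow` class, the skip set, and the exact regex are my assumptions; the real plugin also tracks spans that already carry the class):

```javascript
// Tokenise HTML on tags, skip text inside certain elements, and wrap
// matches of the site name in a styled span. Tag tokens are returned
// untouched, so attributes are never rewritten.
const SKIP_TAGS = new Set(["a", "code", "pre", "script", "style", "title"]);
// Optional group matches " fury" or "fury"; the i flag covers any casing.
const NAME_RE = /\bflamed(\s?fury)?\b/gi;

function glowTransform(html) {
  const tokens = html.split(/(<[^>]+>)/); // keep tags as their own tokens
  const stack = [];
  return tokens
    .map((tok) => {
      if (tok.startsWith("<")) {
        const m = tok.match(/^<\/?([a-zA-Z][\w-]*)/);
        if (m) {
          const tag = m[1].toLowerCase();
          if (tok.startsWith("</")) {
            if (stack[stack.length - 1] === tag) stack.pop();
          } else if (SKIP_TAGS.has(tag) && !tok.endsWith("/>")) {
            stack.push(tag);
          }
        }
        return tok; // tag token: attributes stay untouched
      }
      if (stack.length > 0) return tok; // inside a skipped element
      // Original casing is preserved because we wrap the match itself.
      return tok.replace(NAME_RE, (match) => `<span class="glow">${match}</span>`);
    })
    .join("");
}
```

Registering it would then be a one-liner in the Eleventy config, something like `eleventyConfig.addTransform("glow", (content) => glowTransform(content))` (names assumed again).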
Register it before any HTML prettify transform so the spans are in place before reformatting runs: That’s it. Any mention of the site name (fLaMEd fury) in body text gets the gradient automatically, in posts, templates, data-driven content, wherever. Look out for the easter egg I’ve dropped in. Later. Hey, thanks for reading this post in your feed reader! Want to chat? Reply by email or add me on XMPP , or send a webmention . Check out the posts archive on the website.
