Posts in Typescript (20 found)
Armin Ronacher 1 week ago

Absurd In Production

About five months ago I wrote about Absurd, a durable execution system we built for our own use at Earendil, sitting entirely on top of Postgres and Postgres alone. The pitch was simple: you don't need a separate service, a compiler plugin, or an entire runtime to get durable workflows. You need a SQL file and a thin SDK. Since then we've been running it in production, and I figured it's worth sharing what the experience has been like. The short version: the design held up, the system has been a pleasure to work with, and other people seem to agree.

Absurd is a durable execution system that lives entirely inside Postgres. The core is a single SQL file (absurd.sql) that defines stored procedures for task management, checkpoint storage, event handling, and claim-based scheduling. On top of that sit thin SDKs (currently TypeScript, Python, and an experimental Go one) that make the system ergonomic in your language of choice. The model is straightforward: you register tasks, decompose them into steps, and each step acts as a checkpoint. If anything fails, the task retries from the last completed step. Tasks can sleep, wait for external events, and suspend for days or weeks. All state lives in Postgres. If you want the full introduction, the original blog post covers the fundamentals. What follows here is what we've learned since.

The project got multiple releases over the last five months. Most of the changes are things you'd expect from a system that people actually started depending on: hardened claim handling, watchdogs that terminate broken workers, deadlock prevention, proper lease management, event race conditions, and all the edge cases that only show up when you're running real workloads. A few things are worth calling out specifically.

Decomposed steps. The original design only had a single checkpointed step call, where you pass in a function and get back its checkpointed result. That works well for many cases but not all. Sometimes you need to know whether a step already ran before deciding what to do next. So we added a two-part variant, which gives you a handle you can inspect before committing the result. This turned out to be very useful for modeling intentional failures and conditional logic. It is particularly necessary when working with "before call" and "after call" type hook APIs.

Task results. You can now spawn a task, go do other things, and later come back to fetch or await its result. This sounds obvious in hindsight, but the original system was purely fire-and-forget. Having proper result inspection made it possible to use Absurd for things like spawning child tasks from within a parent workflow and waiting for them to finish. This is particularly useful for debugging with agents too.

absurdctl. We built this out as a proper CLI tool. You can initialize schemas, run migrations, create queues, spawn tasks, emit events, and retry failures from the command line. It's installable via your package manager or as a standalone binary. This has been invaluable for debugging production issues. When something is stuck, being able to just inspect a task and see exactly where it stopped is a very different experience from digging through logs.

Habitat. A small Go application that serves up a web dashboard for monitoring tasks, runs, checkpoints, and events. It connects directly to Postgres and gives you a live view of what's happening. It's simple, but it's the kind of thing that makes the system more enjoyable for humans.

Agent integration.
Since Absurd was originally built for agent workloads, we added a bundled skill that coding agents can discover and use to debug workflow state via the CLI. There's also a documented pattern for making pi agent turns durable by logging each message as a checkpoint.

The thing I'm most pleased about is that the core design didn't need to change all that much. The fundamental model of tasks, steps, checkpoints, events, and suspending is still exactly what it was initially. We added features around it, but nothing forced us to rethink the basic abstractions. Putting the complexity in SQL and keeping the SDKs thin turned out to be a genuinely good call. The TypeScript SDK is about 1,400 lines. The Python SDK is about 1,900, but most of that comes from the complexity of supporting colored functions. Compare that to Temporal's Python SDK at around 170,000 lines. It means the SDKs are easy to understand, easy to debug, and easy to port. When something goes wrong, you can read the entire SDK in an afternoon and understand what it does.

The checkpoint-based replay model also aged well. Unlike systems that require deterministic replay of your entire workflow function, Absurd just loads the cached step results and skips over completed work. That means your code doesn't need to be deterministic outside of steps. You can call non-deterministic functions in between steps and things still work, because only the step boundaries matter. In practice, this makes it much easier to reason about what's safe and what isn't.

Pull-based scheduling was the right choice too. Workers pull tasks from Postgres as they have capacity. There's no coordinator, no push mechanism, no HTTP callbacks. That makes it trivially self-hostable and means you don't have to think about load management at the infrastructure level.

I had some discussions with folks about whether the right abstraction should have been a durable promise. It's a very appealing idea, but it turns out to be much more complex to implement in practice. It is, however, in theory also more powerful. I made some attempts to see what Absurd would look like if it were based on durable promises, but so far I did not get anywhere with it. It's an experiment that I think would be fun to try, though!

The primary use case is still agent workflows. An agent is essentially a loop that calls an LLM, processes tool results, and repeats until it decides it's done. Each iteration becomes a step, and each step's result is checkpointed. If the process dies on iteration 7, it restarts and replays iterations 1 through 6 from the store, then continues from 7. But we've found it useful for a lot of other things too. All our crons just dispatch distributed workflows with a pre-generated deduplication key from the invocation. We can have two cron processes running and they will only trigger one Absurd task invocation. We also use it for background processing that needs to survive deploys. Basically anything where you'd otherwise build your own retry-and-resume logic on top of a queue.

Absurd is deliberately minimal, but there are things I'd like to see. There's no built-in scheduler. If you want cron-like behavior, you run your own scheduler loop and use idempotency keys to deduplicate. That works, and we have a documented pattern for it, but it would be nice to have something more integrated. There's no push model. Everything is pull. If you need an HTTP endpoint to receive webhooks and wake up tasks, you build that yourself.
I think that's the right default, as push systems are harder to operate and easier to overwhelm, but there are cases where it would be convenient. In particular, there are quite a few agentic systems where it would be super nice to have webhooks natively integrated (wake on incoming POST request). I definitely don't want to have this in the core, but it sounds like the kind of problem that could be a nice adjacent library that builds on top of Absurd.

The biggest omission is that it does not support partitioning yet. That's unfortunate because it makes cleaning up data more expensive than it has to be. In theory, supporting partitions would be pretty simple. You could have weekly partitions and then detach and delete them when they expire. The only thing that really stands in the way is that Postgres does not have a convenient way of actually doing that. The hard part is not partitioning itself, it's partition lifecycle management under real workloads. If a worker inserts a row whose timestamp lands in a month without a partition, the insert fails and the workflow crashes. So you need a separate maintenance loop that always creates future partitions far enough ahead for sleeps and retries, and does that for every queue. On the delete side there is a safe way to detach expired partitions, but getting it to run from inside the system doesn't work because it cannot be run within a transaction, and everything else runs in one. I don't think it's an unsolvable problem, but it's one I have not found a good solution for, and I would love to get input on it.

This brings me to a meta point, which is what the point of Open Source libraries in the age of agentic engineering is. Durable execution is now something that plenty of startups sell you. On the other hand, it's also something that an agent would build for you, and people might not even look for solutions any more. It's kind of … weird? I don't think a durable execution library can support a company, I really don't. On the other hand, I think it's just complex enough of a problem that it could be a good Open Source project void of commercial interests. You do need a bit of an ecosystem around it, particularly for UI and good DX for debugging, and that's hard to get from a throwaway implementation. I don't think we have squared this yet, but it's already much better to use than a few months ago.

If you're using Absurd, thinking about it, or building adjacent ideas, I'd love your feedback. Bug reports, rough edges, design critiques, and contributions are all very welcome. This project has gotten better every time someone poked at it from a different angle.
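For a sense of what the task/step/checkpoint model looks like in code, here is a rough TypeScript sketch. It is illustrative only: the package name, registerTask, ctx.step, and ctx.waitForEvent are stand-ins, not the actual Absurd SDK API.

```typescript
// Hypothetical sketch of a task decomposed into checkpointed steps.
import { AbsurdClient } from "absurd-sdk"; // assumed package name

const absurd = new AbsurdClient({ connectionString: process.env.DATABASE_URL! });

// Stand-in helpers for whatever real work the task does.
const fetchMessages = async (threadId: string) => [`message in ${threadId}`];
const summarize = async (messages: string[]) => messages.join(" / ");

absurd.registerTask("summarize-thread", async (params: { threadId: string }, ctx) => {
  // Each step's return value is checkpointed in Postgres. On retry, completed
  // steps are skipped and their cached results are returned, so only the step
  // boundaries need to be stable; code between steps can be non-deterministic.
  const messages = await ctx.step("fetch-messages", () => fetchMessages(params.threadId));
  const summary = await ctx.step("summarize", () => summarize(messages));

  // Tasks can also suspend for days while waiting for an external event.
  await ctx.waitForEvent(`approved:${params.threadId}`);
  return summary;
});
```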

0 views
Martin Fowler 1 week ago

Fragments: April 2

As we see LLMs churn out scads of code, folks have increasingly turned to Cognitive Debt as a metaphor for capturing how a team can lose understanding of what a system does. Margaret-Anne Storey thinks a good way of thinking about these problems is to consider three layers of system health: technical debt, cognitive debt, and intent debt. While I'm getting a bit bemused by debt metaphor proliferation, this way of thinking does make a fair bit of sense. The article includes useful sections to diagnose and mitigate each kind of debt. The three interact with each other, and the article outlines some general activities teams should do to keep it all under control.

❄                ❄

In the article she references a recent paper by Shaw and Nave at the Wharton School that adds LLMs to Kahneman's two-system model of thinking. Kahneman's book, "Thinking Fast and Slow", is one of my favorite books. Its central idea is that humans have two systems of cognition. System 1 (intuition) makes rapid decisions, often barely-consciously. System 2 (deliberation) is when we apply deliberate thinking to a problem. He observed that to save energy we default to intuition, and that sometimes gets us into trouble when we overlook things that we would have spotted had we applied deliberation to the problem. Shaw and Nave consider AI as System 3:

A consequence of System 3 is the introduction of cognitive surrender, characterized by uncritical reliance on externally generated artificial reasoning, bypassing System 2. Crucially, we distinguish cognitive surrender, marked by passive trust and uncritical evaluation of external information, from cognitive offloading, which involves strategic delegation of cognition during deliberation.

It's a long paper that goes into detail on this "Tri-System theory of cognition" and reports on several experiments they've done to test how well this theory can predict behavior (at least within a lab).

❄                ❄                ❄                ❄                ❄

I've seen a few illustrations recently that use the symbols "< >" as part of an icon to illustrate code. That strikes me as rather odd; I can't think of any programming language that uses "< >" to surround program elements. Why that and not, say, "{ }"? Obviously the reason is that they are thinking of HTML (or maybe XML), which is even more obvious when they use "</>" in their icons. But programmers don't program in HTML.

❄                ❄                ❄                ❄                ❄

Ajey Gore thinks about what becomes the expensive thing if coding agents make coding free. His answer is verification.

What does "correct" mean for an ETA algorithm in Jakarta traffic versus Ho Chi Minh City? What does a "successful" driver allocation look like when you're balancing earnings fairness, customer wait time, and fleet utilisation simultaneously? When hundreds of engineers are shipping into ~900 microservices around the clock, "correct" isn't one definition — it's thousands of definitions, all shifting, all context-dependent. These aren't edge cases. They're the entire job. And they're precisely the kind of judgment that agents cannot perform for you.

Increasingly I'm seeing a view that agents do really well when they have good, preferably automated, verification for their work. This encourages such things as Test Driven Development. That's still a lot of verification to do, which suggests we should see more effort to find ways to make it easier for humans to comprehend larger ranges of tests.
While I agree with most of what Ajey writes here, I do have a quibble with his view of legacy migration. He thinks it's a delusion that "agentic coding will finally crack legacy modernisation". I agree with him that agentic coding is overrated in a legacy context, but I have seen compelling evidence that LLMs help a great deal in understanding what legacy code is doing. The big consequence of Ajey's assessment is that we'll need to reorganize around verification rather than writing code:

If agents handle execution, the human job becomes designing verification systems, defining quality, and handling the ambiguous cases agents can't resolve. Your org chart should reflect this. Practically, this means your Monday morning standup changes. Instead of "what did we ship?" the question becomes "what did we validate?" Instead of tracking output, you're tracking whether the output was right. The team that used to have ten engineers building features now has three engineers and seven people defining acceptance criteria, designing test harnesses, and monitoring outcomes. That's the reorganisation. It's uncomfortable because it demotes the act of building and promotes the act of judging. Most engineering cultures resist this. The ones that don't will win.

❄                ❄                ❄                ❄                ❄

One of the questions that comes up when we think of LLMs-as-programmers is whether there is a future for source code. David Cassel on The New Stack has an article summarizing several views of the future of code. Some folks are experimenting with entirely new languages built with the LLM in mind; others think that existing languages, especially strictly typed languages like TypeScript and Rust, will be the best fit for LLMs. It's an overview article, one that has lots of quotations but not much analysis in itself - but it's worth a read as a good overview of the discussion. I'm interested to see how all this will play out. I do think there's still a role for humans to work with LLMs to build useful abstractions in which to talk about what the code does - essentially the DDD notion of Ubiquitous Language. Last year Unmesh and I talked about growing a language with LLMs. As Unmesh put it:

Programming isn't just typing coding syntax that computers can understand and execute; it's shaping a solution. We slice the problem into focused pieces, bind related data and behaviour together, and—crucially—choose names that expose intent. Good names cut through complexity and turn code into a schematic everyone can follow. The most creative act is this continual weaving of names that reveal the structure of the solution that maps clearly to the problem we are trying to solve.

The three layers of system health Storey describes:

Technical debt lives in code. It accumulates when implementation decisions compromise future changeability. It limits how systems can change.

Cognitive debt lives in people. It accumulates when shared understanding of the system erodes faster than it is replenished. It limits how teams can reason about change.

Intent debt lives in artifacts. It accumulates when the goals and constraints that should guide the system are poorly captured or maintained. It limits whether the system continues to reflect what we meant to build and it limits how humans and AI agents can continue to evolve the system effectively.

0 views
David Bushell 3 weeks ago

404 Deno CEO not found

I visited deno.com yesterday. I wanted to know if the hundreds of hours I'd spent mastering Deno were a sunk cost. Do I continue building for the runtime, or go back to Node? Well, I guess that pretty much sums up why a good chunk of Deno employees left the company over the last week.

Layoffs are what American corpo culture calls firing half the staff. Totally normal practice for a sustainable business. Mass layoffs are deemed better for the morale of those who remain than a weekly culling before Friday beers. The Romans loved a good decimation. † If I were a purveyor of slop and tortured metaphors, I'd have adorned this post with a deepfake of Ryan Dahl fiddling as Deno burned. But I'm not, so the solemn screenshot will suffice. † I read Rome, Inc. recently. Not a great book, I'm just explaining the reference.

A year ago I wrote about Deno's decline. The facts, undeterred by my subjective scorn, painted a harsh picture; Deno Land Inc. was failing. Deno incorporated with $4.9M of seed capital five years ago. They raised a further $21M series A a year later. Napkin math suggests a five year runway for an unprofitable company (I have no idea, I just made that up.) Coincidentally, after my blog post topped Hacker News — always a pleasure for my inbox — Ryan Dahl (Deno CEO) clapped back on the official Deno blog:

There's been some criticism lately about Deno - about Deploy, KV, Fresh, and our momentum in general. You may have seen some of the criticism online; it's made the rounds in the usual places, and attracted a fair amount of attention. Some of that criticism is valid. In fact, I think it's fair to say we've had a hand in causing some amount of fear and uncertainty by being too quiet about what we're working on, and the future direction of our company and products. That's on us.

Reports of Deno's Demise Have Been Greatly Exaggerated - Ryan Dahl

Dahl mentioned that adoption had doubled following Deno 2.0:

Since the release of Deno 2 last October - barely over six months ago! - Deno adoption has more than doubled according to our monthly active user metrics.

User base doubling sounds like a flex for a lemonade stand unless you give numbers. I imagine Sequoia Capital expected faster growth regardless. The harsh truth is that Deno's offerings have failed to capture developers' attention. I can't pretend to know why — I was a fanboy myself — but far too few devs care about Deno. On the rare occasions Deno gets attention on the orange site, the comments page reads like in memoriam. I don't even think the problem was that Deno Deploy, the main source of revenue, sucked. Deploy was plagued by highly inconsistent isolate start times. Solicited feedback was ignored. Few cared. It took an issue from Wes Bos, one of the most followed devs in the game, for anyone at Deno to wake up. Was Deploy simply a ghost town? Deno rushed the Deploy relaunch for the end of 2025 and it became "generally available" last month. Anyone using it? Anyone care? The Deno layoffs this week suggest only a miracle would have saved jobs. The writing was on the wall.

Speaking of ghost towns, the JSR YouTube channel is so lonely I feel bad for linking it. I only do because it shows just how little interest some Deno-led projects mustered. JSR floundered partly because Deno was unwilling to (or couldn't afford to) invest in better infrastructure. But like everything else in the Deno ecosystem, users just weren't interested. What makes a comparable project like NPMX flourish so quickly?

Evidently, developers don't want to replace Node and NPM. They just want what they already have but better: a drop-in improvement without friction. To Deno and Dahl's credit, they recognised this with the U-turn on HTTP imports. But the resulting packaging mess made things worse. JSR should have been NPMX. Deno should have gone all-in, but instead we got mixed messaging and confused docs. I could continue but it would just be cruel to dissect further.

I've been heavily critical of Deno in the past but I really wanted it to succeed. There were genuinely good people working at Deno who lost their jobs and that sucks. I hope the Deno runtime survives. It's a breath of fresh air. B*n has far more bugs and compatibility issues than anyone will admit. Node still has too much friction around TypeScript and ECMAScript modules. So where does Deno go from here? Over to you, Ryan.

Where is Deno CEO, Ryan Dahl? Tradition dictates an official PR statement following layoffs. Seems weird not to have one prepared in advance. That said, today is Friday, the day to bury bad news. I may be publishing this mere hours before we hear what happens next… Given Dahl's recent tweets and blog post, a pivot to AI might be Deno's gamble. By the way, it's rather telling that all the ex-employees posted their departures on Bluesky. What that tells you depends on whether you enjoy your social media alongside Grok undressing women upon request. I digress. Idle speculation has led to baseless rumours of an OpenAI acquisition. I'm not convinced that makes sense but neither does the entire AI industry. I'm not trying to hate on Dahl but c'mon bro, you're the CEO. What's next for Deno? Give users, or anyone, a reason to care. Although if you're planning a 10× resurgence with automated Mac Minis, I regret asking.

Thanks for reading! Follow me on Mastodon and Bluesky. Subscribe to my Blog and Notes or Combined feeds.

0 views
David Bushell 1 month ago

SvelteKit i18n and FOWL

Perhaps my favourite JavaScript APIs live within the Internationalization namespace. A few neat things the global Intl object allows: natural alphanumeric sorting, relative dates and times, and currency formatting. It's powerful stuff and the browser or runtime provides locale data for free! That means timezones, translations, and local conventions are handled for you. Remember moment.js? That library with locale data is over 600 KB (uncompressed). That's why JavaScript now has the Internationalization API built-in.

SvelteKit and similar JavaScript web frameworks allow you to render a web page server-side and "hydrate" in the browser. In theory, you get the benefits of an accessible static website with the progressively enhanced delights of a modern "web app". I'm building attic.social with SvelteKit. It's an experiment without much direction. I added a bookmarks feature and used Intl.DateTimeFormat to format dates. Perfect! Or was it? Disaster strikes! See this GIF:

What is happening here? Because I don't specify any locale argument in the constructor, it uses the runtime's default. When left unconfigured, many environments will default to en-US. I spotted this bug only in production because I'm hosting on a Cloudflare worker. SvelteKit's first render is server-side using the worker's default locale, but subsequent renders use my browser's locale. My eyes are briefly sullied by the inferior US format! Is there a name for this effect? If not I'm coining: "Flash of Wrong Locale" (FOWL).

To combat FOWL we must ensure that SvelteKit has the user's locale before any templates are rendered. Browsers may request a page with the Accept-Language HTTP header. The place to read headers is hooks.server.ts. I've vendored the @std/http negotiation library to parse the request header. If no locales are provided it returns nothing, which I change to a sensible default. SvelteKit's event.locals is an object to store custom data for the lifetime of a single request. Event locals are not directly accessible to SvelteKit templates. That could be dangerous. We must use a page or layout load function to forward the data. Now we can update the original example to use the data. I don't think the rune is strictly necessary but it stops a compiler warning. This should eliminate FOWL unless the Accept-Language header is missing.

Privacy focused browsers like Mullvad Browser use a generic header to avoid fingerprinting. That means users opt out of internationalisation but FOWL is still gone. If there is a cache in front of the server, it must vary based on the Accept-Language header. Otherwise one visitor defines the locale for everyone who follows, unless something like a session cookie bypasses the cache. You could provide a custom locale preference to override browser settings. I've done that before for larger SvelteKit projects. Link that to a session and store it in a cookie, or database. Naturally, someone will complain they don't like the format they're given. This blog post is guaranteed to elicit such a comment. You can't win!

Why can't you be normal, Safari? Despite using the exact same locale, Safari still commits FOWL by using an "at" word instead of a comma. Whose fault is this? The ECMAScript standard recommends using data from Unicode CLDR. I don't feel inclined to dig deeper. It's a JavaScriptCore quirk because Bun does the same. That is unfortunate because it means the standard is not quite standard across runtimes.

By the way, the i18n and l10n abbreviations are kinda lame to be honest. It's a fault of my design choices that "internationalisation" didn't fit well in my title. Thanks for reading! Follow me on Mastodon and Bluesky. Subscribe to my Blog and Notes or Combined feeds.
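As an appendix to the pattern described above, here is a minimal sketch of reading Accept-Language in hooks.server.ts and stashing the locale on event.locals. The parsing here is deliberately naive (the post vendors @std/http's negotiation helpers instead) and the en-GB fallback is an assumption.

```typescript
// src/hooks.server.ts
// Assumes an App.Locals declaration with a `locale: string` field in app.d.ts.
import type { Handle } from "@sveltejs/kit";

export const handle: Handle = async ({ event, resolve }) => {
  // Read the visitor's preferred locale from the Accept-Language header.
  const header = event.request.headers.get("accept-language") ?? "";
  const preferred = header.split(",")[0]?.split(";")[0]?.trim();

  // Stash it on event.locals so a layout load function can forward it to
  // templates before the first server-side render.
  event.locals.locale = preferred || "en-GB"; // fallback choice is an assumption
  return resolve(event);
};
```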

0 views
David Bushell 1 month ago

Building on AT Protocol

AT Protocol has got me! I'm morphing into an atmosphere nerd. AT Protocol — atproto for short — is the underlying tech that powers Bluesky and new social web apps. Atproto as I understand it is largely an authorization and data layer. All atproto data is inherently public. In theory it can be encrypted for private use but leaky metadata and de-anonymisation is a whole thing. Atproto users own the keys to their data, which is stored on a Personal Data Server (PDS). You don't need to manage your own. If you don't know where your data is stored, good chance it's on Bluesky's PDS. You can move your data to another PDS like Blacksky or Eurosky. Or, if you're a nerd like me, self-host your own PDS. You own your data and no PDS can stop you moving it.

Atproto provides OAuth; think "Sign in with GitHub". But instead of an account being locked behind the whims of proprietary slopware, user identity is proven via their PDS. Social apps like Bluesky host a PDS allowing users to create a new account. That account can be used to log in to other apps like pckt, Leaflet, or Tangled. You could start a new account on Tangled's PDS and use that for Bluesky. Atproto apps are not required to provide a PDS but it helps to onboard new users.

Of course I did. You can sign in at attic.social. Attic is a cozy space with lofty ambitions. What does Attic do? I'm still deciding… it'll probably become a random assortment of features. Right now it has bookmarks. Bookmarks will have search and tags soon. Technical details: to keep the server stateless I borrowed ideas from my old SvelteKit auth experiment. OAuth and session state is stored in encrypted HTTP-only cookies. I used the atcute TypeScript libraries to do the heavy atproto work. I found @flo-bit's projects which helped me understand implementation details. Attic is on Cloudflare workers for now. When I've free time I'll explore the SvelteKit Bunny adapter. I am busy on client projects so I'll be scheming Attic ideas in my free time.

What's so powerful about atproto is that users can move their account/data. Apps write data to a PDS using a lexicon: a convention to say "this is a Bluesky post", for example. Other apps are free to read that data too. During authorization, apps must ask for permission to write to specific lexicons. The user is in control. You may have heard that Bluesky is or isn't "decentralised". Bluesky was simply the first atproto app. Most users start on Bluesky and may never be aware of the AT Protocol. What's important is that atproto makes it difficult for Bluesky to "pull a Twitter", i.e. kill 3rd party apps, such as the alternate Witchsky. If I ever abandon attic.social your data is still in your hands. Even if the domain expires! You can extract data from your PDS. You can write a new app to consume it anytime. That's the power of AT Protocol.

Thanks for reading! Follow me on Mastodon and Bluesky. Subscribe to my Blog and Notes or Combined feeds.

0 views
Allen Pike 1 month ago

Launch Now

Inside us are two wolves. One wolf wants to craft, polish and refine – make things of exceptional quality. The other wolf wants to move fast and get feedback now. The two wolves don't always get along.

For years, I've balanced this by working toward exceptional products but constantly collecting private feedback along the way. Then, once we've built something excellent, something worthy of attention, we launch it to the world with appropriate fanfare. Videos, marketing campaigns, polished onboarding, and so on. "Here's something worth trying, we think you'll really like it." This totally works. At least, it works as a path to eventually ship high-quality software. Polished, usable, even delightful software. But when it comes to building something people will pay for, it's neither reliable nor fast.

Our first product at Forestwalk was a developer tool – a platform for building and running evaluations of LLM-powered apps. We learned a ton building it, but after a few months – as we approached our first pilot projects – feedback from demos and potential first customers convinced us that this was the wrong path. It was more likely to lead us into a lifestyle business than something big. So we pivoted. We spent a few weeks building a prototype a week, showing demos, doing customer research, and found a second promising product path.

Our second product was a productivity tool – a work assistant that could capture, organize, and rationalize teams' tasks. We learned a ton building it, but after a few months – as we approached a public beta – feedback from private testers and our investors convinced us that this was the wrong path. It was more likely to lead us into a lifestyle business than something big. So we pivoted.

The third time purports to be the charm. But at the same time, doing the same thing over and over typically gets the same results. We need to build something profoundly useful, something people really want. We can't keep hiding away, sending out private demos and prototypes, not fully shipping anything! So, we decided to push harder into the discomfort of showing our work early. Just before Christmas, we decided to commit to something and work towards getting it shipped.

This third product is codenamed Cedarloop.¹ It's a realtime meeting agent. Unlike AIs that passively listen in to meetings and just write up notes after the fact, Cedar joins calls and uses "voice in, visuals out" to screen-share useful observations and perform routine tasks live during a Google Meet or Zoom meeting. The vision is to build a kind of agentic PM assistant. It can respond within a second of you talking,² which – when it works – feels like magic. We've been learning a lot building it.

Recently, we started working with an excellent designer here in Vancouver who was keen to get going. I'd like to do some user testing. What do people say when you let them try it?

Well, obviously it's so early right now. They won't like it. The inference and onboarding need more work. But we've been doing research about problems, needs, willingness to pay, and things like that.

Sure… but we should also let people try it. What if we launched now?

Well, obviously we can't launch now. I mean… obviously. Launching now would be embarrassing. It's not my brand to launch something publicly that's not ready. On the other hand… I keep a printed copy of Y Combinator's list of essential startup advice on my desk. And if you know YC, you'll know that the first point of advice is "Launch now".
Only last month I was interviewing Brett Huneycutt, Wealthsimple's co-founder. He had a lot of great stories, but one that sticks out is that even as a $10B company, they prioritize launching "now", for as close as they can get to that definition. It's not just about speed: a rapid feedback loop is a core ingredient in getting to quality.

So we launched now. As of today, people can check out our research-preview realtime meeting agent at Cedarloop.ai. With luck, they'll report issues, inform what we should prioritize next, and tell us what problems they'd love to have automated away.

We're only a few hours in, and yep – people are reporting issues. Linear integration had an OAuth issue. Login didn't work in social-media webviews. We've been so focused on the desktop experience that we've let the mobile layout get janky. This is embarrassing! But also, there's signal. People are trying the Linear integration. Our desktop-focused app is being discovered on mobile. Folks care enough to click at all. And in a week or so, we'll have a smoother onboarding flow than we would have gotten to with weeks of private user tests.

So it's worth the pain. We're going to take the feedback, follow the signal, learn and re-learn, and do better. We'll use it to forge the best damn live agent ever – or, if the feedback peters out, we'll know we're on the wrong path, and find the right one. In the meantime, there's a lot to do.³ Back to work!

¹ This is not a good name yet. For example, sometimes iOS mishears "Hey Cedar" as "Hey Siri". But part of our move-fast strategy is to worry more about names once we've proven something has traction. At that point, we'll put in the work to give it the right name – and eventually rename the company after it. ↩

² It's fascinating how much you can do to get LLM response times down. Our first prototype often took over 8000ms to respond, which doesn't feel live at all. Once we got it under ~1200ms, voice-in-vision-out suddenly felt alive – a step change. We have a lot of work planned to get Cedarloop even faster and much more reliable, which I'm keen to write about when I can. ↩

³ Speaking of having a lot to do: if you're an experienced product-minded developer in Vancouver who would be excited to iterate and build out realtime agents using LLMs and TypeScript, we're hiring a Founding Engineer. Just sayin'. ↩

0 views
Justin Duke 1 month ago

Unshipping Keystatic

Two years after initially adopting it, we've formally unshipped Keystatic. Our CMS, such as it is, is now a bunch of Markdoc files and a TypeScript schema organizing the front matter — which is to say, it's not really a CMS at all. There were a handful of reasons for this move, in no specific order:

- Our team's use of Keystatic as an actual front-end CMS had dropped to zero. All of the non-coders have grown sufficiently adept with Markdown that the GUI was gathering dust; Keystatic had become a pure schema validation and rendering tool, and offered fairly little beyond what we were already getting from our build step.
- Some of the theoretically nice things — image hosting, better previewing — either didn't work as smoothly as we'd like or were supplanted entirely by Vercel's built-in features.
- The project appears to have atrophied a little bit, commits dwindling into the one-per-quarter frequency despite a healthy number of open issues. This is not to besmirch the lovely maintainers, who have many other things going on. But it's harder to stick around on a library you're not getting much value from when you're also worried there's not a lot of momentum down the road.

That last point is basically what I wrote about Invoke — it's a terrible heuristic, judging a project by its commit frequency, and I know that. Things can and should be finished! And yet. When you're already on the fence, a quiet GitHub graph is the thing that tips you over.

To Keystatic's credit, it was tremendously easy to extricate. The whole migration was maybe two hours of work, most of which was just deleting code. That's the sign of a well-designed library — one that doesn't metastasize into every corner of your codebase. I wish more tools were this easy to leave.

0 views
Simon Willison 2 months ago

How StrongDM's AI team build serious software without even looking at the code

Last week I hinted at a demo I had seen from a team implementing what Dan Shapiro called the Dark Factory level of AI adoption, where no human even looks at the code the coding agents are producing. That team was part of StrongDM, and they've just shared the first public description of how they are working in Software Factories and the Agentic Moment:

We built a Software Factory: non-interactive development where specs + scenarios drive agents that write code, run harnesses, and converge without human review. [...]

In kōan or mantra form: "Why am I doing this?" (implied: the model should be doing this instead)

In rule form: Code must not be written by humans. Code must not be reviewed by humans.

Finally, in practical form: If you haven't spent at least $1,000 on tokens today per human engineer, your software factory has room for improvement.

I think the most interesting of these, without a doubt, is "Code must not be reviewed by humans". How could that possibly be a sensible strategy when we all know how prone LLMs are to making inhuman mistakes? I've seen many developers recently acknowledge the November 2025 inflection point, where Claude Opus 4.5 and GPT 5.2 appeared to turn the corner on how reliably a coding agent could follow instructions and take on complex coding tasks. StrongDM's AI team was founded in July 2025 based on an earlier inflection point relating to Claude Sonnet 3.5:

The catalyst was a transition observed in late 2024: with the second revision of Claude 3.5 (October 2024), long-horizon agentic coding workflows began to compound correctness rather than error. By December of 2024, the model's long-horizon coding performance was unmistakable via Cursor's YOLO mode.

Their new team started with the rule "no hand-coded software" - radical for July 2025, but something I'm seeing significant numbers of experienced developers start to adopt as of January 2026. They quickly ran into the obvious problem: if you're not writing anything by hand, how do you ensure that the code actually works? Having the agents write tests only helps if they don't cheat. This feels like the most consequential question in software development right now: how can you prove that software you are producing works if both the implementation and the tests are being written for you by coding agents?

StrongDM's answer was inspired by Scenario testing (Cem Kaner, 2003). As StrongDM describe it:

We repurposed the word scenario to represent an end-to-end "user story", often stored outside the codebase (similar to a "holdout" set in model training), which could be intuitively understood and flexibly validated by an LLM. Because much of the software we grow itself has an agentic component, we transitioned from boolean definitions of success ("the test suite is green") to a probabilistic and empirical one. We use the term satisfaction to quantify this validation: of all the observed trajectories through all the scenarios, what fraction of them likely satisfy the user?

That idea of treating scenarios as holdout sets - used to evaluate the software but not stored where the coding agents can see them - is fascinating. It imitates aggressive testing by an external QA team - an expensive but highly effective way of ensuring quality in traditional software.

Which leads us to StrongDM's concept of a Digital Twin Universe - the part of the demo I saw that made the strongest impression on me. The software they were building helped manage user permissions across a suite of connected services. This in itself was notable - security software is the last thing you would expect to be built using unreviewed LLM code!

[The Digital Twin Universe is] behavioral clones of the third-party services our software depends on.
We built twins of Okta, Jira, Slack, Google Docs, Google Drive, and Google Sheets, replicating their APIs, edge cases, and observable behaviors. With the DTU, we can validate at volumes and rates far exceeding production limits. We can test failure modes that would be dangerous or impossible against live services. We can run thousands of scenarios per hour without hitting rate limits, triggering abuse detection, or accumulating API costs.

How do you clone the important parts of Okta, Jira, Slack and more? With coding agents! As I understood it, the trick was effectively to dump the full public API documentation of one of those services into their agent harness and have it build an imitation of that API, as a self-contained Go binary. They could then have it build a simplified UI over the top to help complete the simulation. With their own, independent clones of those services - free from rate-limits or usage quotas - their army of simulated testers could go wild. Their scenario tests became scripts for agents to constantly execute against the new systems as they were being built. This screenshot of their Slack twin also helps illustrate how the testing process works, showing a stream of simulated Okta users who are about to need access to different simulated systems.

This ability to quickly spin up a useful clone of a subset of Slack helps demonstrate how disruptive this new generation of coding agent tools can be:

Creating a high fidelity clone of a significant SaaS application was always possible, but never economically feasible. Generations of engineers may have wanted a full in-memory replica of their CRM to test against, but self-censored the proposal to build it.

The techniques page is worth a look too. In addition to the Digital Twin Universe they introduce terms like Gene Transfusion for having agents extract patterns from existing systems and reuse them elsewhere, Semports for directly porting code from one language to another, and Pyramid Summaries for providing multiple levels of summary such that an agent can enumerate the short ones quickly and zoom in on more detailed information as it is needed.

StrongDM AI also released some software - in an appropriately unconventional manner. github.com/strongdm/attractor is Attractor, the non-interactive coding agent at the heart of their software factory. Except the repo itself contains no code at all - just three markdown files describing the spec for the software in meticulous detail, and a note in the README that you should feed those specs into your coding agent of choice! github.com/strongdm/cxdb is a more traditional release, with 16,000 lines of Rust, 9,500 of Go and 6,700 of TypeScript. This is their "AI Context Store" - a system for storing conversation histories and tool outputs in an immutable DAG. It's similar to my LLM tool's SQLite logging mechanism but a whole lot more sophisticated. I may have to gene transfuse some ideas out of this one!

I visited the StrongDM AI team back in October as part of a small group of invited guests. The three-person team of Justin McCarthy, Jay Taylor and Navan Chauhan had formed just three months earlier, and they already had working demos of their coding agent harness, their Digital Twin Universe clones of half a dozen services and a swarm of simulated test agents running through scenarios. And this was prior to the Opus 4.5/GPT 5.2 releases that made agentic coding significantly more reliable a month after those demos.
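To make the satisfaction metric concrete, here is a toy sketch of the calculation as the post describes it. This is not StrongDM's code; the LLM-backed judge is an assumed callback.

```typescript
// Satisfaction: of all observed trajectories through all the scenarios,
// what fraction of them likely satisfy the user?
interface Trajectory {
  scenarioId: string;
  transcript: string; // everything the agent did while executing the scenario
}

async function satisfaction(
  trajectories: Trajectory[],
  judge: (t: Trajectory) => Promise<boolean>, // assumed LLM-backed judge
): Promise<number> {
  let satisfied = 0;
  for (const t of trajectories) {
    if (await judge(t)) satisfied += 1;
  }
  // A probabilistic success criterion instead of a boolean "tests are green".
  return trajectories.length > 0 ? satisfied / trajectories.length : 0;
}
```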
The visit felt like a glimpse of one potential future of software development, where software engineers move from building the code to building and then semi-monitoring the systems that build the code. The Dark Factory.

I glossed over the token spend implied by that "practical form" rule in my first published version of this post, but it deserves some serious attention. If these patterns really do add $20,000/month per engineer to your budget they're far less interesting to me. At that point this becomes more of a business model exercise: can you create a profitable enough line of products that you can afford the enormous overhead of developing software in this way? Building sustainable software businesses also looks very different when any competitor can potentially clone your newest features with a few hours of coding agent work.

I hope these patterns can be put into play with a much lower spend. I've personally found the $200/month Claude Max plan gives me plenty of space to experiment with different agent patterns, but I'm also not running a swarm of QA testers 24/7! I think there's a lot to learn from StrongDM even for teams and individuals who aren't going to burn thousands of dollars on token costs. I'm particularly invested in the question of what it takes to have agents prove that their code works without needing to review every line of code they produce.

You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options.

0 views
devansh 3 months ago

HonoJS JWT/JWKS Algorithm Confusion

After spending some time looking for security issues in JS/TS frameworks, I moved on to Hono - fast, clean, and popular enough that small auth footguns can become "big internet problems". This post is about two issues I found in Hono's JWT/JWKS verification path:

- a default algorithm footgun in the JWT middleware that can lead to forged tokens if an app is misconfigured
- a JWK/JWKS algorithm selection bug where verification could fall back to an untrusted value

Both were fixed in hono 4.11.4, and GitHub Security Advisories were published on January 13, 2026.

JWT / JWK / JWKS Primer

If you already have experience with JWT stuff, you can skip this:

- JWT is a signed token; the header includes alg (the signing algorithm).
- JWK is a JSON representation of a key (e.g. an RSA public key).
- JWKS is a set of JWKs, usually hosted at a well-known URL.

The key point here is that algorithm choice must not be attacker-controlled.

[CVE-2026-22817] - JWT middleware "unsafe default" (HS256)

Hono's JWT helper documents that the alg option is optional - and defaults to HS256. That sounds harmless until you combine it with a very common real-world setup:

- The app expects RS256 (asymmetric)
- The developer passes an RSA public key string
- But they don't explicitly set alg

In that case, the verification path defaults to HS256, treating that public key string as an HMAC secret, and that becomes forgeable because public keys are, well… public.

Why this becomes an auth bypass

If an attacker can generate a token that passes verification, they can mint whatever claims the application trusts (user ID, roles, and so on) and walk straight into protected routes. This is the "algorithm confusion" class of bugs, where you think you're doing asymmetric verification, but you're actually doing symmetric verification with a key the attacker knows.

Who is affected?

This is configuration-dependent. The dangerous case is: you use the JWT middleware with an asymmetric public key and you don't pin alg. The core issue is that Hono defaults to HS256, so a public key string can accidentally be used as an HMAC secret, allowing forged tokens and auth bypass.

Advisory / severity

Advisory: GHSA-f67f-6cw9-8mq4. This was classified as High (CVSS 8.2) and mapped to CWE-347 (Improper Verification of Cryptographic Signature). Affected versions are listed in the advisory; patched version: 4.11.4.

JWK/JWKS middleware fallback

In the JWK/JWKS verification middleware, Hono could pick the verification algorithm like this:

- Use the JWK's alg if present
- Otherwise, fall back to the alg from the JWT header (unverified input)

GitHub's advisory spells it out: when the selected JWK doesn't explicitly define an algorithm, the middleware falls back to using the alg from the unverified JWT header - and since alg in JWK is optional and commonly omitted, this becomes a real-world issue.

Why it matters

If the matching JWKS key lacks alg, verification falls back to the token-controlled alg, enabling algorithm confusion / downgrade attacks. Trusting the token's alg is basically letting the attacker influence how you verify the signature. Depending on surrounding constraints (allowed algorithms, how keys are selected, and how the app uses claims), this can lead to forged tokens being accepted and authz/authn bypass.

Advisory / severity

Advisory: GHSA-3vhc-576x-3qv4. This was classified as High (CVSS 8.2), also CWE-347, with the affected versions listed in the advisory and patched in 4.11.4.

The Fix

Both advisories took the same philosophical stance, i.e. make the algorithm explicit. Don't infer it from attacker-controlled input.

Fix for #1 (JWT middleware): the JWT middleware now requires an explicit alg option — a breaking change that forces callers to pin the algorithm instead of relying on defaults. Before (vulnerable) and after (patched) example configurations are shown in the advisory.

Fix for #2 (JWK/JWKS middleware): the JWK/JWKS middleware now requires an explicit allowlist of asymmetric algorithms, and it no longer derives the algorithm from untrusted JWT header values. It also explicitly rejects symmetric HS* algorithms in this context. Before (vulnerable) and after (patched) example configurations are shown in the advisory.

Disclosure Timeline

- Discovery: 9th Dec, 2025
- First Response: 9th Dec, 2025
- Patched in: hono 4.11.4
- Advisories published: 13 Jan, 2026
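For illustration, here is a minimal sketch of the post-fix posture: pin the verification algorithm explicitly instead of relying on a default. The secret and alg options follow Hono's documented jwt middleware, but confirm the exact configuration against the current docs and the advisory before relying on it.

```typescript
import { Hono } from "hono";
import { jwt } from "hono/jwt";

const app = new Hono();

// Pinning `alg` means the token's own header can never downgrade verification
// to HS256 with the RSA public key silently reused as an HMAC secret.
app.use(
  "/api/*",
  jwt({
    secret: process.env.JWT_PUBLIC_KEY ?? "", // RSA public key used for verification
    alg: "RS256",                             // explicit algorithm, no default
  })
);

// The verified payload is available to downstream handlers.
app.get("/api/me", (c) => c.json(c.get("jwtPayload")));

export default app;
```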

0 views
Rob Zolkos 3 months ago

So where can we use our Claude subscription then?

There's been confusion about where we can actually use a Claude subscription. This comes after Anthropic took action to prevent third-party applications from spoofing the Claude Code harness to use Claude subscriptions. The information in this post is based on my understanding from reading various tweets, official GitHub repos and documentation (some of which may or may not be up to date). I will endeavour to keep it up to date as new information becomes available. I would love to see Anthropic themselves maintain an easily parsable page like this that shows what is and is not permitted with a Claude subscription.

We've taken action to prevent third-party clients from spoofing the Claude Code agent harness to use consumer subscriptions. Consumer subscriptions and their benefits should only be used in the Anthropic experiences they support (Claude Code CLI, Claude Code web, and via sessionKey in the Agent SDK). Third-party apps can use the API.

From what I can gather, consumer subscriptions work with official Anthropic tools, not third-party applications. If you want third-party integrations, you need the API.

The consumer applications (desktop and mobile) are the most straightforward way to use your Claude subscription. Available at claude.com/download, these apps give you direct access to Claude for conversation, file uploads, and Projects.

The official command-line interface for Claude Code is fully supported with Claude subscriptions. This is the tool Anthropic built and maintains specifically for developers who want to use Claude in their development workflow. You get the full power of Claude integrated into your terminal, with access to your entire codebase, the ability to execute commands, read and write files, and use all the specialized agents that come with Claude Code.

The web version of Claude Code (accessible through your browser at claude.ai/code) provides the same capabilities as the CLI but through a browser interface. Upload your project files, or point it at a repository, and you can work with Claude on your codebase directly.

Want to experiment with building custom agents? The Claude Agent SDK lets you develop and test specialized agents powered by your Claude subscription for personal development work. The SDK is available in both Python and TypeScript, with documentation here. This is for personal experiments and development. For production deployments of agents, use the API instead of your subscription.

You can use your Claude subscription to run automated agents in GitHub Actions. The Claude Code Action lets you set up workflows that leverage Claude for code review, documentation generation, or automated testing analysis. Documentation is here.

Any other uses of Claude would require the use of API keys. Your Claude subscription gives you:

- Claude desktop and mobile apps for general use
- Claude Code CLI for terminal-based development
- Claude Code on the web for browser-based work
- The ability to build custom agents through the official SDK (for personal development)
- Claude Code GitHub Action for CI/CD integration

Let me know if you have any corrections.

1 view
devansh 3 months ago

ElysiaJS Cookie Signature Validation Bypass

The recent React CVE(s) made quite a buzz in the industry. It was a pretty powerful vulnerability, which directly leads to Pre-auth RCE (one of the most impactful vuln classes). The React CVE inspired me to investigate vulnerabilities in other JS/TS frameworks. I selected Elysia as my target for several reasons: active maintenance, ~16K GitHub stars, clear documentation, and clean codebase - all factors that make for productive security research. While scrolling through the codebase, one specific codeblock looked interesting: It took me less than a minute to identify the "anti-pattern" here. Can you see what's wrong here? We'll get to it in a bit, but first, a little primer on ElysiaJS Cookie Signing. Elysia treats cookies as reactive signals, meaning they're mutable objects you can read and update directly in your route handlers without getters/setters. Cookie signing adds a cryptographic layer to prevent clients from modifying cookie values (e.g., escalating privileges in a session token). Elysia uses a signature appended to the cookie value, tied to a secret key. This ensures integrity (data wasn't altered) and authenticity (it came from your server). On a higher level, it works something like this: Rotating secrets is essential for security hygiene (e.g., after a potential breach or periodic refresh). Elysia handles this natively with multi-secret support . This code is responsible for handling cookie related logic (signing, unsigning, secrets rotation). Now, going back to the vulnerability, can you spot the vulnerability in the below screenshot? No worries if you couldn't. I will walk you through. The guard check at the end ( ) becomes completely useless because can never be . This is dead code. You see now? Basically if you are using the vulnerable version of Elysia and using secrets array (secrets rotation); Complete auth bypass is possible because error never gets thrown. This seemed like a pretty serious issue, so I dropped a DM to Elysia's creator SaltyAom . SaltyAom quickly confirmed the issue At this point, we know that this is a valid issue, but we still need to create a PoC for it to showcase what it can do, so a security advisory could be created. Given my limited experience with Tyscript. I looked into the docs of Elysia and looked into sample snippets. After getting a decent understanding of syntax Elysia uses, it was time to create the PoC app using Elysia. I had the basic idea in my mind of how my PoC app would look like, It will have a protected resource only admin can access, and by exploiting this vulnerability I should be able to reach the protected resource without authenticating as admin or without even having admin cookies. Eventually, I came up with the following PoC for demonstrating impact: Without signing up as admin, or login, issue the following cURL command: We got access to protected content; without using an signed admin cookie. Pretty slick, no? The developer likely meant to write: Instead, they wrote: The attacker only needs to: That's literally it. This vulnerability was fixed in v1.4.19 With this fix in place, the verification logic now works correctly. Affected Versions : Elysia ≤ v1.4.18 ( confirmed ), potentially earlier versions Fixed Versions : v1.4.19 Elysia and Cookie Signing Secrets Rotation Vulnerability Proof of Concept What It Does Let's Break It Disclosure Timeline cookies.ts#L413-L426 Signing : When you set a cookie (e.g., profile.value = data), Elysia hashes the serialized value + secret, appends sig to the cookie. 
You see now? Basically, if you are using a vulnerable version of Elysia with a secrets array (secrets rotation), a complete auth bypass is possible because the error never gets thrown. This seemed like a pretty serious issue, so I dropped a DM to Elysia's creator SaltyAom, who quickly confirmed the issue. At this point we knew this was a valid issue, but we still needed a PoC to showcase what it can do, so a security advisory could be created. Given my limited experience with TypeScript, I looked into Elysia's docs and sample snippets. After getting a decent understanding of the syntax Elysia uses, it was time to create the PoC app. I had a basic idea in mind of how the PoC app would look: it would have a protected resource only an admin can access, and by exploiting this vulnerability I should be able to reach that protected resource without authenticating as admin or even having admin cookies. The PoC app:

Allows one-time signup of an admin account only.
Allows an existing admin to log in, and issues a signed session cookie once logged in.
Protects a secret route so only a logged-in admin can access it.

Eventually, I came up with a flow for demonstrating impact: without signing up as admin or logging in, issue a cURL request with a forged cookie. We got access to protected content without using a signed admin cookie. Pretty slick, no? The attacker only needs to:

Capture or observe one valid cookie (even their own).
Edit the cookie value to some other user's identity in their browser or with curl, and remove the signature.
Send it back to the server.

That's literally it. The developer likely meant to initialize the validity flag to false; instead, they initialized it to true. This vulnerability was fixed in v1.4.19. With the fix in place, the verification logic now works correctly.

Affected versions: Elysia ≤ v1.4.18 (confirmed), potentially earlier versions.
Fixed versions: v1.4.19.

Disclosure timeline:
Discovery: 9th December 2025
Vendor contact: 9th December 2025
Vendor response: 9th December 2025
Patch release: 13th December 2025
CVE assignment: Pending

References:
Vulnerable code: src/cookies.ts#L413-L426
Elysia documentation: elysiajs.com
Elysia cookie documentation: elysiajs.com/patterns/cookie

0 views
JSLegendDev 3 months ago

The Phaser Game Framework in 5 Minutes

Phaser is the most popular JavaScript/TypeScript framework for making 2D games. It’s performant, and popular games like Vampire Survivors and PokéRogue were made with it. Because it’s a web-native framework, games made with Phaser are lightweight and generally load and run better on the web than the web exports produced by major game engines. For that reason, if you’re looking to make 2D web games, Phaser is a great addition to your toolbelt. In this post, I’ll explain the framework’s core concepts in around 5 minutes.

— SPONSORED SEGMENT — In case you want to bring your web game to desktop platforms, today’s sponsor GemShell allows you to build executables for Windows/Mac/Linux in what amounts to a click. It also makes Steam integration easy. For more info, visit 👉 https://l0om.itch.io/gemshell You have a tool/product you want featured in a sponsored segment? Contact me at [email protected]

The Phaser Config

A Phaser project starts with defining a config to describe how the game’s canvas should be initialized. To make the game scale according to the window’s size, we can set the scale property in our config. The mode property is set to FIT, so the canvas scales while preserving its own aspect ratio. As for keeping the canvas centered on the page, the autoCenter property is used with the CENTER_BOTH value.

Scene Creation

Most games are composed of multiple scenes, and switching between them is expected during the course of gameplay. Since Phaser uses the object-oriented paradigm, a scene is created by defining a class that inherits from the Phaser.Scene class. To be able to reference the scene elsewhere in our code, it’s important to give it a key. For this purpose, and to be able to use the methods and properties of the parent class, we need to call the super constructor and pass it the key we want to use. The two most important methods of a Phaser scene are the create and update methods. The first is used for, among other things, creating game objects like text and sprites and setting things like scores. It runs once, every time the scene becomes active. The latter, which runs once per frame, is used, for example, to handle movement logic.

Hooking Up a Scene to Our Game

Once a scene is created, we still need to add it to our game. This is done in the Phaser config under the scene property, which expects an array. The order of scenes in this array is important: the first element will be used as the default scene of the game.

Switching Scenes

To switch scenes, we can call the start method of the scene manager.

Rendering Sprites

Before we can render sprites, we need to load them. For this purpose, a Phaser scene has access to the preload method, where asset loading logic should be placed. To load an image, we can use the image method of the loader plugin. Then, in the create method, we can render a sprite by calling the corresponding factory method of the Game Object Factory plugin. The first two params specify the X and Y coordinates, while the third param provides the key of the sprite to render. Because we created our sprite game object in the create method, we don’t have access to it in the update method. That’s why you’ll often see the pattern of assigning a game object to an instance field so it becomes accessible to other methods of the scene. Finally, movement logic code is placed in the update method, which runs every frame.

Rendering Text

Rendering text is similar to sprites: rather than the sprite factory method, we use the text method.
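To tie the concepts so far together, here is a minimal sketch of a config plus a scene. It assumes the standard Phaser 3-style API, which Phaser 4 largely keeps; preload, create, and update are the usual scene lifecycle hooks referred to above, and the asset key and positions are made up for illustration.

```typescript
import Phaser from "phaser";

class MainScene extends Phaser.Scene {
  private player!: Phaser.GameObjects.Sprite;

  constructor() {
    super("main"); // the scene key used to reference this scene elsewhere
  }

  preload() {
    // Asset loading logic lives here.
    this.load.image("player", "assets/player.png");
  }

  create() {
    // Runs once every time the scene becomes active.
    this.player = this.add.sprite(100, 200, "player"); // x, y, asset key
    this.add.text(16, 16, "Score: 0");
  }

  update() {
    // Runs once per frame; movement logic goes here.
    this.player.x += 1;
  }
}

new Phaser.Game({
  type: Phaser.AUTO,
  width: 800,
  height: 600,
  scale: {
    mode: Phaser.Scale.FIT,               // scale with the window, keep aspect ratio
    autoCenter: Phaser.Scale.CENTER_BOTH, // keep the canvas centered on the page
  },
  scene: [MainScene], // the first entry is the default scene
});
```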
Entity Creation

If you want to hold data or define custom methods for a sprite game object, a better approach is to define a class that inherits from the Phaser.GameObjects.Sprite class. Once the class is defined, we can use it in our scene’s code.

Asset Loading

While asset loading can be done in any Phaser scene, a better approach is to create a scene dedicated to loading assets, which then switches to the main game scene once loading is complete.

Animation API

Another important aspect of any game is the ability to play animations. Usually, for 2D games, we have spritesheets containing all the frames needed to animate a character in a single image. (An example of a spritesheet.) In Phaser, we first specify the dimensions of a frame in the loading logic of the spritesheet so that the framework knows how to slice the image into individual frames. Then, we can create an animation by defining its starting and ending frames. To provide the needed frames, we call the corresponding method of Phaser’s animation manager. Finally, once the animation is created, it can be played using the play method of the sprite game object. If you want the animation to loop indefinitely, add the repeat property and set it to -1.

Input Handling

A game needs to be interactive to be called a “game”. One way to handle input is by using event listeners provided by Phaser: one listener for keyboard input, and another for handling mouse and touch input.

Sharing Data Between Scenes

At some point, you might need to share data between scenes. For this purpose, you can use Phaser’s registry.

Playing Sound

To play sounds (assuming you have already loaded the sound first), you can use the play method of the sound manager. You can specify the sound’s volume in the second param of that method. If you need to be able to stop, pause, or play the same sound at a later time, you can add it to the sound manager rather than playing it immediately. This comes in handy when you transition from one scene to another and you have a sound that loops indefinitely. In that case, you need to stop the sound before switching over, otherwise the sound will keep playing in the next scene.

Physics, Debug Mode, Physics Bodies and Collision Logic

By default, Phaser offers an Arcade physics system, which is not meant for complex physics simulations. However, it’s well suited for most types of games. To enable it, you add the physics settings to your Phaser config. You can add an existing game object to the physics system the same way you add one to a scene. This will create a physics body for that game object, accessible through the body instance field. You can view this body as a hitbox around your sprite if you turn on debug mode in your project’s config. (Example of a Phaser game with debug set to true.) To create bodies that aren’t affected by gravity, like platforms, you can create a static group and then create and add static bodies to that group. You can also add already existing physics bodies to a group. Now, you might be wondering what groups are useful for. They shine in collision handling logic. Let’s assume you have multiple enemies attacking the player. To determine when a collision occurs between any enemy and the player, you can set up a collision handler, as sketched below.
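Here is a rough sketch of what that physics and collision setup can look like. It assumes the standard Phaser 3-style Arcade physics API; asset keys and positions are made up for illustration.

```typescript
import Phaser from "phaser";

class LevelScene extends Phaser.Scene {
  create() {
    const player = this.physics.add.sprite(100, 450, "player");

    // Static group: bodies unaffected by gravity, e.g. platforms.
    const platforms = this.physics.add.staticGroup();
    platforms.create(400, 568, "ground");

    // Regular group: every enemy shares one collision handler with the player.
    const enemies = this.physics.add.group();
    enemies.create(300, 0, "enemy");

    this.physics.add.collider(player, platforms);
    this.physics.add.collider(player, enemies, () => {
      // React to any enemy touching the player (damage, game over, ...).
    });
  }
}

// Arcade physics (and the debug hitbox overlay) is enabled in the game config:
// physics: { default: "arcade", arcade: { gravity: { y: 300 }, debug: true } }
```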
Project Based Tutorial

There are many concepts I did not have time to cover. If you want to delve further into Phaser, I have a project-based course you can purchase where I guide you through the process of building a Sonic themed infinite runner game. This is a great opportunity to put in practice what you’ve learned here. If you’re interested, here’s the link to the course: https://www.patreon.com/posts/learn-phaser-4-147473030 . That said, you can freely play the game being built in the course as well as have access to the final source code.

Original Phaser game live demo: https://jslegend.itch.io/sonic-ring-run-phaser-4
Demo of the version built in the course: https://jslegend.itch.io/sonic-runner-tutorial-build
Final source code: https://github.com/JSLegendDev/sonic-runner-phaser-tutorial

If you enjoy technical posts like this one, I recommend subscribing to not miss out on future releases.

0 views
JSLegendDev 3 months ago

Learn Phaser 4 by Building a Sonic Themed Infinite Runner Game in JavaScript

Phaser is the most popular JavaScript/TypeScript framework for making 2D games. It is performant, and popular games like Vampire Survivors and PokéRogue were made with it. Because it’s a web-native framework, games built with it are lightweight and generally load and run better on the web than web exports produced by major game engines. For this reason, if you’re a web developer looking to make 2D web games, Phaser is a great addition to your toolbelt. To make the process of learning Phaser easier, I have released a course that takes you through the process of building a Sonic themed infinite runner game with Phaser 4 and JavaScript. You can purchase the course here: https://www.patreon.com/posts/learn-phaser-4-147473030 . The total length of the course is 1h 43min. More details regarding content and prerequisites are included in the link. That said, you can freely play the game being built in the course as well as have access to the final source code. Original Phaser game live demo: https://jslegend.itch.io/sonic-ring-run-phaser-4 Demo of the version built in the course: https://jslegend.itch.io/sonic-runner-tutorial-build Final source code: https://github.com/JSLegendDev/sonic-runner-phaser-tutorial

0 views
DuckTyped 3 months ago

One year of keeping a tada list

A tada list, or to-done list, is where you write out what you accomplished each day. It’s supposed to make you focus on things you’ve completed instead of focusing on how much you still need to do. Here is what my tada lists look like: I have a page for every month. Every day, I write out what I did. At the end of the month, I make a drawing in the header to show what I did that month. Here are a few of the drawings: In January, I started a Substack, made paintings for friends, and wrote up two Substack posts on security. In February, I took a CSS course and created a component library for myself. In March, I read a few books, worked on a writing app, took a trip to New York, and drafted several posts on linear algebra for this Substack. (If you’re wondering where these posts are, there’s a lag time between draft and publish, where I send the posts out for technical review and do a couple of rounds of rewrites). I don’t really spend much time celebrating my accomplishments. Once I accomplish something, I have a small hit of, “Yay, I did it,” before moving on to, “So, what else am I going to do?” For example, when I finished my book (a three-year-long effort), I had a couple of weeks of, “Yay, I wrote a book,” before this became part of my normal life, and it turned into, “Yes, I wrote a book, but what else have I done since then?” I thought the tada list would help reinforce “I did something!” but it also turned into “I was able to do this thing, because I did this other thing earlier”. I’ll explain with an example. For years I have been wanting to create a set of cards with paintings of Minnesota, for family and friends. The problem: I didn’t have many paintings of Minnesota, and didn’t like the ones I had. So I spent 2024 learning a lot about watercolor pigments, and color mixing hundreds of greens, to figure out which greens I wanted to use in my landscapes. Then I spent the early part of 2025 doing a bunch of value studies, because my watercolors always looked faded. (Value studies are where you try to make your paintings look good using black and white only, so you're forced to work using value instead of color. It’s an old exercise to improve your art). Then in the summer, I did about 50 plein air paintings of Minnesota landscapes. (Plein air = painting on location. Please admire the wide variety of greens I mixed for these paintings). Look at how much better these are: Out of those 50, I picked my top four and had cards made. Thanks to the “tada” list, it wasn’t just “I made some cards”, it was: remember when I spent countless hours on color mixing? And value studies? And spent most of my summer painting outside? The payoff for all that work was these lovely cards. (Test prints. The final four.) For a while now, I have wanted a mustache-like templating language, but with static typing. Last year, I created a parser combinator library called `tarsec` for TypeScript, and this year, I used it to write a mustache-like template language called `typestache` for myself that had static typing. I’ve since used both `tarsec` and `typestache` in personal projects, like this one that adds file-based routing to express and autogenerates a client for the frontend. Part of the reason I like learning stuff is it lets me do things I couldn’t do before. I think acknowledging that you CAN do something new is an important part of the learning process, but I usually skip it. The tada list helps.
Maybe the most obvious con: a tada list forces you to have an accomplishment each day so you can write it down, and this added stress to my day. Also, a year is a long time to keep it going, and I ran out of steam by the end. You can see that my handwriting gets worse as time goes on, and for the last couple of months, I stopped doing the pictures. It’s fun to see things on the list that I had forgotten about. For example, I had started this massive watercolor painting of the Holiday Inn in Pacifica in February, and I completely forgot about it. Will I do this next year? Maybe. I need to weigh the accomplishment part against the work it takes to keep it going. It’s neat to have this artifact to look back on either way. A few more of the several color studies I did: including another grid of greens.

0 views
Hugo 4 months ago

Implementing a tracking-free captcha with Altcha and Nuxt

For the past few days, I've noticed several suspicious uses of my contact form. Looking closer, I noticed that each contact form submission was followed by a user signup with the same email and a name that always followed the same pattern: qSfDMiWAiLnpYYzdCeCWd fePXzKXbAmiLAweNZ etc... Let's just say their membership in the human species seems particularly dubious. Anyway, it's probably time to add some controls, and one of the most famous is the captcha.

## Next-generation captchas

Everyone knows captchas – they're annoying, probably on par with cookie consent banners. Nowadays we see captchas where you have to identify traffic lights, solve additions, drag a puzzle piece to the right spot, and so on. But you may have noticed that lately we're also seeing simple forms with a checkbox: "I am not a robot".

![I'm not a robot](https://writizzy.b-cdn.net/blogs/48b77143-02ee-4316-9d68-0e6e4857c5ce/1764749254941-124yicj.jpg)

Sometimes the captcha isn't even visible anymore, with detection happening without asking you anything. So how does it work? And how can I add it to my application?

## Nuxt Turnstile, the default solution with Nuxt

In the Nuxt ecosystem, the most common solution is [Nuxt turnstile](https://nuxt.com/modules/turnstile). The documentation is pretty clear on how to add it. It's a great solution, but it relies on [Cloudflare turnstile](https://nuxt.com/modules/turnstile), and I'm trying to use only European products for Writizzy and Hakanai. Still, the documentation helps understand a bit better how next-generation captchas work. When the page loads, the turnstile widget performs client-side checks:

- **Proof of space:** The script asks the client to generate and store an amount of data according to a predefined algorithm, then asks for the byte at a given position. Not only does this take time, but it's difficult to automate at scale.
- **Trivial browser detections:** The idea is to try to detect a bot (no plugins, webdriver control, etc.). Fingerprinting also helps in this case. It collects all available info about the browser, OS, available APIs, resolution, etc.

Note that fingerprinting can be frowned upon by GDPR, which may consider it as uniquely identifying a person. Personally, I find that debatable, but in the context of anti-spam protection, we're kind of chasing our tail here since it would be necessary to ask bots for their permission to try to detect them. We're at the limits of absurdity here. But let's continue. Based on the previous info, the script sends all this to Cloudflare. Based on this info and relying on a huge database of worldwide traffic, Cloudflare calculates a percentage chance that the user is a bot. The form will vary between:

- nothing to do, Cloudflare is convinced it's a human
- a checkbox "I am not a robot"
- a more elaborate captcha if the suspicion is really strong
- a blocking page when there's no doubt about the suspicion

Now, you might say, the checkbox is a bit light, isn't it? If I've gotten this far, I can easily automate a click on a checkbox. Especially since Cloudflare is everywhere, it's necessarily the same form everywhere. Yes... But... First, the way you check the box will be analyzed. Is the click too fast, does it seem automated, is the mouse path to reach the box natural? All this can trigger additional protection. *EDIT: Turnstile might not do this operation. reCaptcha, Google's solution, is known for doing it. Turnstile is less explicit on the subject.*
But on top of that, the checkbox triggers a challenge, a small calculation requested by Cloudflare that your client must perform. The result is what we call a **proof of work**. This work is slow for a computer. We're talking about 500ms, an eternity for a machine. For a human user, it's totally anecdotal. And the satisfaction of having proven their humanity makes you forget those 500 little milliseconds. On the other hand, for a bot, this time will be a real problem if it needs to automate the creation of hundreds or thousands of accounts. So it's not impossible to check this box, but it's costly. And it's supposed to make the economic equation uninteresting at high volumes. Now, even though all this is nice, I still don't want to use Cloudflare, so how do I replace it?

## Altcha, an open-source alternative

During my research, I came across [altcha](https://altcha.org/). The solution is open source, requires no calls to external servers, and shares no data. The implementation requires requesting the Proof of Work (the famous JavaScript challenge) from your server. Here we'll initiate it from the Nuxt backend, in a handler:

```typescript
// server/api/altcha/challenge.get.ts
import { createChallenge } from 'altcha-lib'

export default defineEventHandler(async () => {
  const hmacKey = useRuntimeConfig().altchaHmacKey as string
  return createChallenge({
    hmacKey,
    maxnumber: 100000,
    expires: new Date(Date.now() + 60000) // 1 minute
  })
})
```

In the contact form page, we'll add a Vue component:

```vue
```

This `altchaPayload` will be added to the post payload, for example:

```typescript
await $fetch('/api/contact', {
  method: 'POST',
  body: {
    email: loggedIn.value ? user.value?.email : event.data.email,
    subject: event.data.subject,
    message: event.data.message,
    altcha: altchaPayload.value
  }
})
```

The calculation result will then be verified in the `/api/contact` endpoint:

```typescript
const hmacKey = useRuntimeConfig().altchaHmacKey as string
const ok = await verifySolution(data.altcha, hmacKey)
if (!ok) {
  throw createError({ statusCode: 400, message: 'Invalid challenge' })
}
```

The Vue component I mentioned earlier is this one:

```vue
```

And there you go, the [contact page](https://pulse.hakanai.io/contact) and the [signup page](https://pulse.hakanai.io/signup) are now protected by this altcha. Now, does it work?

## Altcha's limitations

The implementation was done yesterday. And unfortunately, I'm still seeing very suspicious signups on Pulse. So clearly, Altcha didn't do its job. However, now that we know how it works, it's easier to understand why it doesn't work. Altcha doesn't do any of the checks that Turnstile does:

- no proof of space
- no fingerprinting
- no fingerprint verification with Cloudflare
- no behavioral verification of the mouse click on the checkbox

The only protection is the proof of work, which only costs the attacker time. Now for Pulse, for reasons I don't understand, the person having fun creating accounts makes about 4 per day. The cost of the proof of work is negligible in this case. So Altcha is not suited for this type of "slow attack". Anyway, I'll have to find another workaround... And I'm open to your suggestions.
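As an aside, the proof-of-work idea itself is easy to picture. The toy sketch below is not Altcha's actual algorithm, just an illustration of the principle: the client brute-forces a nonce until a hash meets a difficulty target, which is cheap once per human but adds up quickly for a bot creating accounts in bulk.

```typescript
import { createHash } from 'node:crypto'

// Toy proof of work: find a nonce so that sha256(challenge + nonce) starts
// with `difficulty` zero hex characters. Verification is a single hash.
function solve(challenge: string, difficulty: number): number {
  const prefix = '0'.repeat(difficulty)
  for (let nonce = 0; ; nonce++) {
    const digest = createHash('sha256').update(challenge + nonce).digest('hex')
    if (digest.startsWith(prefix)) return nonce
  }
}

function verify(challenge: string, nonce: number, difficulty: number): boolean {
  const digest = createHash('sha256').update(challenge + nonce).digest('hex')
  return digest.startsWith('0'.repeat(difficulty))
}

const nonce = solve('contact-form-42', 4) // takes a noticeable amount of work
console.log(verify('contact-form-42', nonce, 4)) // true, and cheap to check
```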

0 views
Jimmy Miller 4 months ago

The Easiest Way to Build a Type Checker

Type checkers are pieces of software that feel incredibly simple, yet incredibly complex. Seeing Hindley-Milner written in a logic programming language is almost magical, but it never helped me understand how it was implemented. Nor does actually trying to read anything about Algorithm W or any academic paper explaining a type system. But thanks to David Christiansen, I have discovered a setup for type checking that is so conceptually simple it demystified the whole thing for me. It goes by the name Bidirectional Type Checking. The two directions in this type checker are inferring types and checking types. Unlike Hindley-Milner, we do need some type annotations, but these are typically at function definitions. So code like the sillyExample below is completely valid and fully type checks despite lacking annotations. How far can we take this? I'm not a type theory person. Reading papers in type theory takes me a while, and my comprehension is always lacking, but this paper seems like a good starting point for answering that question. So, how do we actually create a bidirectional type checker? I think the easiest way to understand it is to see a full working implementation. So that's what I have below for a very simple language. To understand it, start by looking at the types to figure out what the language supports, then look at each of the cases. But don't worry, if it doesn't make sense, I will explain in more detail below. Here we have, in ~100 lines, a fully functional type checker for a small language. Is it without flaw? Is it feature complete? Not at all. In a real type checker, you might not want to know only whether something typechecks; you might want to decorate the various parts with their type. We don't do that here. We don't do a lot of things. But I've found that this tiny bit of code is enough to start extending to much larger, more complicated code examples. If you aren't super familiar with the implementation of programming languages, some of this code might strike you as a bit odd, so let me very quickly walk through the implementation. First, we have our data structures for representing our code. Using this data structure, we can write code in a way that is much easier to work with than the actual string that we use to represent code. This kind of structure is called an "abstract syntax tree": a snippet of source code becomes a nested tree of these nodes. This structure makes it easy to walk through our program and check things bit by bit. The environment map is the key to how all variables, all functions, etc., work. When we enter a function or a block, we make a new Map that holds the local variables and their types. We pass this map around, and now we know the types of things that came before. If we wanted to let you define functions out of order, we'd simply need to do two passes over the tree: the first to gather up the top-level functions, and the next to type-check the whole program. (This code gets more complicated with nested function definitions, but we'll ignore that here.) Each individual case may seem a bit trivial. So, to explain it, let's add a new feature: addition. Now we have something just a bit more complicated, so how would we write our inference for this? Well, we are going to do the simple case; we are only allowed to add numbers together. Our code would look something like the sketch below.
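Since the original listing didn't survive extraction here, the snippet below is my own minimal sketch in the same spirit, not the author's code. It assumes a tagged-union AST, an infer function, a check function, and an addition rule that only allows numbers.

```typescript
// A minimal bidirectional-checking sketch (my reconstruction, not the post's code).
type Type = "number" | "string";

type Expr =
  | { kind: "num"; value: number }
  | { kind: "str"; value: string }
  | { kind: "var"; name: string }
  | { kind: "add"; left: Expr; right: Expr }
  | { kind: "annotation"; expr: Expr; type: Type };

type Env = Map<string, Type>;

// Inference: figure out an expression's type from the expression itself.
function infer(expr: Expr, env: Env): Type {
  switch (expr.kind) {
    case "num":
      return "number";
    case "str":
      return "string";
    case "var": {
      const t = env.get(expr.name);
      if (!t) throw new Error(`Unknown variable ${expr.name}`);
      return t;
    }
    case "add":
      // Only numbers can be added: check both sides, then the result is a number.
      check(expr.left, "number", env);
      check(expr.right, "number", env);
      return "number";
    case "annotation":
      // An annotation tells us what to check the inner expression against.
      check(expr.expr, expr.type, env);
      return expr.type;
  }
}

// Checking: verify an expression against an expected type.
function check(expr: Expr, expected: Type, env: Env): void {
  const actual = infer(expr, env);
  if (actual !== expected) {
    throw new Error(`Expected ${expected}, got ${actual}`);
  }
}
```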
This may seem a bit magical. How does it all just work? Imagine that we have a nested addition expression. There is no special handling for that shape in the addition case, so we just recurse. If you trace out the recursion (once you get used to recursion, you don't actually need to do this, but I've found it helps people who aren't used to it), we get an alternating chain of calls: for our first left operand, we recurse back into check, then into infer, and finally bottom out in some simple thing we know how to infer. This is the beauty of our bidirectional checker. We can interleave these infer and check calls at will! How would we change our add to work with strings? Or coerce between number and string? I leave that as an exercise to the reader. It only takes just a little bit more code. I know for a lot of people this might all seem a bit abstract. So here is a very quick, simple proof of concept that uses this same strategy for a subset of TypeScript syntax (it does not try to recreate the TypeScript semantics for types). If you play with this, I'm sure you will find bugs. You will find features that aren't supported. But you will also see the beginnings of a reasonable type checker. (It does a bit more than the one above, because otherwise the demos would be lame. Mainly multiple arguments and adding binary operators.) But the real takeaway here, I hope, is just how straightforward type checking can be. If you see some literal, you can infer its type. If you have a variable, you can look up its type. If you have a type annotation, you can infer the type of the value and check it against that annotation. I have found that following this formula makes it quite easy to add more and more features.
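To make the trace concrete with the sketch above (again, these are my illustrative names, not the post's), inferring a nested addition looks like this:

```typescript
// Infer the type of `1 + (x + 2)` with x: number in scope.
const expr: Expr = {
  kind: "add",
  left: { kind: "num", value: 1 },
  right: {
    kind: "add",
    left: { kind: "var", name: "x" },
    right: { kind: "num", value: 2 },
  },
};

const env: Env = new Map([["x", "number"]]);

// infer(add) -> check(left, "number") -> infer(num) bottoms out,
// then check(right, "number") -> infer(add) recurses the same way.
console.log(infer(expr, env)); // "number"
```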

0 views
Evan Hahn 4 months ago

Experiment: making TypeScript immutable-by-default

I like programming languages where variables are immutable by default. For example, in Rust, let declares an immutable variable and let mut declares a mutable one. I’ve long wanted this in other languages, like TypeScript, which is mutable by default—the opposite of what I want! I wondered: is it possible to make TypeScript values immutable by default? My goal was to do this purely with TypeScript, without changing TypeScript itself. That meant no lint rules or other tools. I chose this because I wanted this solution to be as “pure” as possible…and it also sounded more fun. I spent an evening trying to do this. I failed but made progress! I made arrays and Records immutable by default, but I couldn’t get it working for regular objects. If you figure out how to do this completely, please contact me—I must know! TypeScript has built-in type definitions for core JavaScript APIs. If you’ve ever changed the lib or target options in your TSConfig, you’ve tweaked which of these definitions are included. For example, you might add the “ES2024” library if you’re targeting a newer runtime. My goal was to swap the built-in libraries with an immutable-by-default replacement. The first step was to stop using any of the built-in libraries. I set the noLib flag in my TSConfig. Then I wrote a very simple script, and running the type checker gave a bunch of errors. Progress! I had successfully obliterated the default TypeScript libraries, which I could tell because it couldn’t find core types like Array. Time to write the replacement. This project was a prototype. Therefore, I started with a minimal solution that would type-check. I didn’t need it to be good! I created a replacement lib file with minimal definitions inside. Now, when I ran the type checker, I got no errors! I’d defined all the built-in types that TypeScript needs, and a dummy object. As you can see, this solution is impractical for production. For one, none of these interfaces have any properties! That’s okay because this is only a prototype. A production-ready version would need to define all of those things—tedious, but should be straightforward. I decided to tackle this with a test-driven development style. I’d write some code that I want to type-check, watch it fail to type-check, then fix it. I updated my test script so that it tests three things: creating arrays with array literals is possible; non-mutating operations are allowed; and operations that mutate the array are disallowed. When I ran the type checker, I saw two errors. So I updated the Array type. The property accessor line tells TypeScript that you can access array elements by numeric index, but they’re read-only. That should make reads possible but assignments impossible. The method definition is copied from the TypeScript source code with no changes (other than some auto-formatting). Notice that I did not define any mutating methods. We shouldn’t be calling those on an immutable array! I ran the type checker again and…success! No errors! We now have immutable arrays! At this stage, I’ve shown that it’s possible to configure TypeScript to make all arrays immutable with no extra annotations. No need for readonly or as const! In other words, we have some immutability by default.
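For reference, here is a rough, self-contained sketch of the kind of replacement library described above, assuming the built-in libraries are disabled (for example via the noLib compiler flag). This is my own reconstruction under those assumptions, not the author's exact file.

```typescript
// lib.d.ts: stand-ins for the core types the compiler insists on once the
// default libraries are disabled. (The exact required set varies by compiler
// version and options; the bodies are intentionally empty for a prototype.)
interface Boolean {}
interface Function {}
interface IArguments {}
interface Number {}
interface Object {}
interface RegExp {}
interface String {}

// Immutable-by-default arrays: elements can be read by index but not written,
// and only non-mutating methods are declared. `push` and friends are absent.
interface Array<T> {
  readonly [index: number]: T;
  readonly length: number;
  map<U>(callbackfn: (value: T, index: number) => U): U[];
}
```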
This code, like everything in this post, is simplistic. There are lots of other array methods! If this were made production-ready, I’d make sure to define all the read-only array methods. But for now, I was ready to move on to mutable arrays. I prefer immutability, but I want to be able to define a mutable array sometimes. So I made another test case. Notice that this requires a little extra work to make the array mutable. In other words, it’s not the default. TypeScript complained that it couldn’t find the mutable array type, so I defined it. And again, type-checks passed! Now, I had mutable and immutable arrays, with immutability as the default. Again, this is simplistic, but good enough for this proof-of-concept! This was exciting to me. It was possible to configure TypeScript to be immutable by default, for arrays at least. I didn’t have to fork the language or use any other tools. Could I make more things immutable? I wanted to see if I could go beyond arrays. My next target was the Record type, which is a TypeScript utility type. So I defined another pair of test cases similar to the ones I made for arrays. TypeScript complained that it couldn’t find the two Record types, and it also gave a complaint which meant that mutation was allowed. I rolled up my sleeves and fixed those errors. Now, we have Record, which is an immutable key-value type, and a mutable version too. Just like arrays! You can imagine extending this idea to other built-in types. I think it’d be pretty easy to do this the same way I did arrays and records. I’ll leave that as an exercise to the reader. My final test was to make regular objects (not records or arrays) immutable. Unfortunately for me, I could not figure this out. The test case I wrote simply mutated a property on a plain object, and it stumped me. No matter what I did, I could not write a type that would disallow this mutation. I tried modifying the relevant types every way I could think of, but came up short! There are ways to annotate an object to make it immutable, but that’s not in the spirit of my goal. I want it to be immutable by default! Alas, this is where I gave up. I wanted to make TypeScript immutable by default. I was able to do this with arrays, Records, and similar types. Unfortunately, I couldn’t make it work for plain object definitions. There’s probably a way to enforce this with lint rules, either by disallowing mutation operations or by requiring annotations everywhere. I’d like to see what that looks like. If you figure out how to make TypeScript immutable by default with no other tools, I would love to know, and I’ll update my post. I hope my failed attempt will lead someone else to something successful. Again, please contact me if you figure this out, or have any other thoughts.
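And here is a sketch of what the key-value half might look like under the same custom-library setup. The names and shapes are my own reconstruction, not the post's exact definitions.

```typescript
// Sketch: Record redefined as readonly-by-default, with an explicit mutable
// variant as the opt-in. This assumes the custom noLib setup above, so the
// alias does not collide with the built-in Record utility type.
type Record<K extends string | number | symbol, V> = {
  readonly [P in K]: V;
};

type MutableRecord<K extends string | number | symbol, V> = {
  [P in K]: V;
};

// With these in place, reads compile and writes do not:
declare const scores: Record<"alice" | "bob", number>;
const total = scores.alice + scores.bob; // OK
// scores.alice = 10; // error: 'alice' is a read-only property
```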

0 views
baby steps 5 months ago

Just call clone (or alias)

Continuing my series on ergonomic ref-counting, I want to explore another idea, one that I’m calling “just call clone (or alias)”. This proposal specializes the clone and alias methods so that, in a new edition, the compiler will (1) remove redundant or unnecessary calls (with a lint); and (2) automatically capture clones or aliases in closures where needed. The goal of this proposal is to simplify the user’s mental model: whenever you see an error like “use of moved value”, the fix is always the same: just call clone (or alias, if applicable). This model is aiming for the balance of “low-level enough for a Kernel, usable enough for a GUI” that I described earlier. It’s also making a statement, which is that the key property we want to preserve is that you can always find where new aliases might be created – but that it’s ok if the fine-grained details around exactly when the alias is created are a bit subtle. Consider this future: because it is a future, it takes ownership of the values it captures. Because one of the captured values is a borrowed reference, this will be an error unless those values are Copy (which they presumably are not). Under this proposal, capturing aliases or clones in a closure/future would result in capturing an alias or clone of the place. So this future would be desugared like so (using explicit capture clause strawman notation): Now, this result is inefficient – there are now two aliases/clones. So the next part of the proposal is that the compiler would, in newer Rust editions, apply a new transformation called the last-use transformation. This transformation would identify calls to clone or alias that are not needed to satisfy the borrow checker and remove them. This code would therefore become: The last-use transformation would apply beyond closures. Given an example like this one, which clones a value even though it is never used later: the user would get a warning like so 1 : and the code would be transformed so that it simply does a move: The goal of this proposal is that, when you get an error about a use of moved value, or moving borrowed content, the fix is always the same: you just call clone (or alias). It doesn’t matter whether that error occurs in the regular function body or in a closure or in a future, the compiler will insert the clones/aliases needed to ensure future users of that same place have access to it (and no more than that). I believe this will be helpful for new users. Early in their Rust journey new users are often sprinkling calls to clone as well as sigils more-or-less at random as they try to develop a firm mental model – this is where the “keep calm and call clone” joke comes from. This approach breaks down around closures and futures today. Under this proposal, it will work, but users will also benefit from warnings indicating unnecessary clones, which I think will help them to understand where clone is really needed. But the real question is how this works for experienced users. I’ve been thinking about this a lot! I think this approach fits pretty squarely in the classic Bjarne Stroustrup definition of a zero-cost abstraction: “What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.” The first half is clearly satisfied. If you don’t call clone or alias, this proposal has no impact on your life. The key point is the second half: earlier versions of this proposal were more simplistic, and would sometimes result in redundant or unnecessary clones and aliases. Upon reflection, I decided that this was a non-starter.
The only way this proposal works is if experienced users know there is no performance advantage to using the more explicit form .This is precisely what we have with, say, iterators, and I think it works out very well. I believe this proposal hits that mark, but I’d like to hear if there are things I’m overlooking. I think most users would expect that changing to just is fine, as long as the code keeps compiling. But in fact nothing requires that to be the case. Under this proposal, APIs that make significant in unusual ways would be more annoying to use in the new Rust edition and I expect ultimately wind up getting changed so that “significant clones” have another name. I think this is a good thing. I think I’ve covered the key points. Let me dive into some of the details here with a FAQ. I get it, I’ve been throwing a lot of things out there. Let me begin by recapping the motivation as I see it: I then proposed a set of three changes to address these issues, authored in individual blog posts: Let’s look at the impact of each set of changes by walking through the “Cloudflare example”, which originated in this excellent blog post by the Dioxus folks : As the original blog post put it: Working on this codebase was demoralizing. We could think of no better way to architect things - we needed listeners for basically everything that filtered their updates based on the state of the app. You could say “lol get gud,” but the engineers on this team were the sharpest people I’ve ever worked with. Cloudflare is all-in on Rust. They’re willing to throw money at codebases like this. Nuclear fusion won’t be solved with Rust if this is how sharing state works. Applying the trait and explicit capture clauses makes for a modest improvement. You can now clearly see that the calls to are calls, and you don’t have the awkward and variables. However, the code is still pretty verbose: Applying the Just Call Clone proposal removes a lot of boilerplate and, I think, captures the intent of the code very well. It also retains quite a bit of explicitness, in that searching for calls to reveals all the places that aliases will be created. However, it does introduce a bit of subtlety, since (e.g.) the call to will actually occur when the future is created and not when it is awaited : There is no question that Just Call Clone makes closure/future desugaring more subtle. Looking at task 1: this gets desugared to a call to when the future is created (not when it is awaited ). Using the explicit form: I can definitely imagine people getting confused at first – “but that call to looks like its inside the future (or closure), how come it’s occuring earlier?” Yet, the code really seems to preserve what is most important: when I search the codebase for calls to , I will find that an alias is creating for this task. And for the vast majority of real-world examples, the distinction of whether an alias is creating when the task is spawned versus when it executes doesn’t matter. Look at this code: the important thing is that is called with an alias of , so will stay alive as long as is executing. It doesn’t really matter how the “plumbing” worked. Yeah, good point, those kind of examples have more room for confusion. Like look at this: In this example, there is code that uses with an alias, but only under . So what happens? I would assume that indeed the future will capture an alias of , in just the same way that this future will move , even though the relevant code is dead: Yep! 
I am thinking of something like this: Examples that show some edge cased: In the relevant cases, non-move closures will already just capture by shared reference. This means that later attempts to use that variable will generally succeed: This future does not need to take ownership of to create an alias, so it will just capture a reference to . That means that later uses of can still compile, no problem. If this had been a move closure, however, that code above would currently not compile. There is an edge case where you might get an error, which is when you are moving : In that case, you can make this an closure and/or use an explicit capture clause: Yep! We would during codegen identify candidate calls to or . After borrow check has executed, we would examine each of the callsites and check the borrow check information to decide: If the answer to both questions is no, then we will replace the call with a move of the original place. Here are some examples: In the past, I’ve talked about the last-use transformation as an optimization – but I’m changing terminology here. This is because, typically, an optimization is supposed to be unobservable to users except through measurements of execution time (or though UB), and that is clearly not the case here. The transformation would be a mechanical transformation performed by the compiler in a deterministic fashion. I think yes, but in a limited way. In other words I would expect to be transformed in the same way (replaced with ), and the same would apply to more levels of intermediate usage. This would kind of “fall out” from the MIR-based optimization technique I imagine. It doesn’t have to be this way, we could be more particular about the syntax that people wrote, but I think that would be surprising. On the other hand, you could still fool it e.g. like so The way I imagine it, no. The transformation would be local to a function body. This means that one could write a method like so that “hides” the clone in a way that it will never be transformed away (this is an important capability for edition transformations!): Potentially, yes! Consider this example, written using explicit capture clause notation and written assuming we add an trait: The precise timing when values are dropped can be important – when all senders have dropped, the will start returning when you call . Before that, it will block waiting for more messages, since those handles could still be used. So, in , when will the sender aliases be fully dropped? The answer depends on whether we do the last-use transformation or not: Most of the time, running destructors earlier is a good thing. That means lower peak memory usage, faster responsiveness. But in extreme cases it could lead to bugs – a typical example is a where the guard is being used to protect some external resource. This is what editions are for! We have in fact done a very similar transformation before, in Rust 2021. RFC 2229 changed destructor timing around closures and it was, by and large, a non-event. The desire for edition compatibility is in fact one of the reasons I want to make this a last-use transformation and not some kind of optimization . There is no UB in any of these examples, it’s just that to understand what Rust code does around clones/aliases is a bit more complex than it used to be, because the compiler will do automatic transformation to those calls. 
The fact that this transformation is local to a function means we can decide on a call-by-call basis whether it should follow the older edition rules (where it will always occur) or the newer rules (where it may be transformed into a move). In theory, yes, improvements to borrow-checker precision like Polonius could mean that we identify more opportunities to apply the last-use transformation. This is something we can phase in over an edition. It’s a bit of a pain, but I think we can live with it – and I’m unconvinced it will be important in practice. For example, when thinking about the improvements I expect under Polonius, I was not able to come up with a realistic example that would be impacted. This last-use transformation is guaranteed not to produce code that would fail the borrow check. However, it can affect the correctness of unsafe code: Note though that, in this case, there would be a lint identifying that the call to will be transformed to just . We could also detect simple examples like this one and report a stronger deny-by-default lint, as we often do when we see guaranteed UB. When I originally had this idea, I called it “use-use-everywhere” and, instead of writing or , I imagined writing . This made sense to me because a keyword seemed like a stronger signal that this was impacting closure desugaring. However, I’ve changed my mind for a few reasons. First, Santiago Pastorino gave strong pushback that was going to be a stumbling block for new learners. They now have to see this keyword and try to understand what it means – in contrast, if they see method calls, they will likely not even notice something strange is going on. The second reason though was TC who argued, in the lang-team meeting, that all the arguments for why it should be ergonomic to clone a ref-counted value in a closure applied equally well to , depending on the needs of your application. I completely agree. As I mentioned earlier, this also [addresses the concern I’ve heard with the trait], which is that there are things you want to ergonomically clone but which don’t correspond to “aliases”. True. In general I think that (and ) are fundamental enough to how Rust is used that it’s ok to special case them. Perhaps we’ll identify other similar methods in the future, or generalize this mechanism, but for now I think we can focus on these two cases. One point that I’ve raised from time-to-time is that I would like a solution that gives the compiler more room to optimize ref-counting to avoid incrementing ref-counts in cases where it is obvious that those ref-counts are not needed. An example might be a function like this: This function requires ownership of an alias to a ref-counted value but it doesn’t actually do anything but read from it. A caller like this one… …doesn’t really need to increment the reference count, since the caller will be holding a reference the entire time. I often write code like this using a : so that the caller can do – this then allows the callee to write in the case that it wants to take ownership. I’ve basically decided to punt on adressing this problem. I think folks that are very performance sensitive can use and the rest of us can sometimes have an extra ref-count increment, but either way, the semantics for users are clear enough and (frankly) good enough. Surprisingly to me, doesn’t have a dedicated lint for unnecessary clones. This particular example does get a lint, but it’s a lint about taking an argument by value and then not consuming it. 
If you rewrite the example to create the value locally, clippy does not complain.

I believe our goal should be to focus first on a design that is "low-level enough for a Kernel, usable enough for a GUI". The key part here is the word enough. We need to make sure that low-level details are exposed, but only those that truly matter. And we need to make sure that it's ergonomic to use, but it doesn't have to be as nice as TypeScript (though that would be great).

Rust's current approach to clone fails both groups of users:

- Calls to clone are not explicit enough for kernels and low-level software: when you see one, you don't know whether it is creating a new alias or an entirely distinct value, and you don't have any clue what it will cost at runtime. There's a reason much of the community recommends writing the more explicit form instead.
- Calls to clone, particularly in closures, are a major ergonomic pain point. This has been a clear consensus since we first started talking about this issue.

First, we introduce a trait (originally introduced under a different name). The trait introduces a new alias method that is equivalent to clone but indicates that the call will be creating a second alias of the same underlying value. Second, we introduce explicit capture clauses, which lighten the syntactic load of capturing a clone or alias, make it possible to declare up-front the full set of values captured by a closure/future, and will support other kinds of handy transformations (e.g., capturing the result of other expressions). Finally, we introduce the "just call clone" proposal described in this post. This modifies closure desugaring to recognize clones/aliases and also applies the last-use transformation to replace calls to clone/alias with moves where possible.

The capture rules would work like this:

- If there is an explicit capture clause, use that.
- Else, for non-move closures/futures, no changes: categorize usage of each place and pick the "weakest option" that is available (e.g., by ref).
- For move closures/futures, we would change this: categorize usage of each place and decide whether to capture that place...
  - by clone/alias, if there is at least one call to clone or alias and all other usage of the place requires only a shared ref (reads);
  - by move, if there are no calls to clone or alias, or if there are usages of the place that require ownership or a mutable reference.

In other words, capture by clone/alias when a place is only used via shared references, and at least one of those uses is a clone or alias. For the purposes of this, accessing a "prefix place" or a "suffix place" is also considered an access to the place itself.

The last-use transformation asks two questions: will this place be accessed later, and will some reference potentially referencing this place be accessed later?

As for the channel example: without the transformation, there are two aliases - the original and the one being held by the future. So the receiver will only start reporting a closed channel once both the original sender and the spawned task have completed. With the transformation, the call to alias is removed, and so there is only one alias, which is moved into the future and dropped once the spawned task completes. This could well be earlier than in the previous code, which had to wait until both the original work and the new task completed.


Interview with a new hosting provider founder

Most of us use infrastructure provided by companies like DigitalOcean and AWS. Some of us choose to work on that infrastructure. And some of us are really built different and choose to build all that infrastructure from scratch.

This post is a real treat for me to bring you. I met Diana through a friend of mine, and I've gotten some peeks behind the curtain as she builds a new hosting provider. So I was thrilled that she agreed to an interview to let me share some of that with you all. So, here it is: a peek behind the curtain of a new hosting provider, in a very early stage. This is the interview as transcribed (any errors are mine), with a few edits as noted for clarity.

Nicole: Hi, Diana! Thanks for taking the time to do this. Can you start us off by just telling us a little bit about who you are and what your company does?

Diana: So I'm Diana, I'm trans, gay, AuDHD and I like to create, mainly singing and 3D printing. I also have dreams of being the change I want to see in the world. Since graduating high school, all infrastructure has become a passion for me. Particularly networking and computer infrastructure. From your home internet connection to data centers and everything in between. This has led me to create Andromeda Industries and the dba Gigabit.Host. Gigabit.Host is a hosting service where the focus is affordable and performant hosting for individuals, communities, and small businesses.

Let's start out talking about the business a little bit. What made you decide to start a hosting company?

The lack of performance for a ridiculous price. The margins on hosting are ridiculous; it's why the majority of the big tech companies' revenue comes from their cloud offerings. So my thought has been: why not take that and use it more constructively? Instead of using the margins to crush competition while making the rich even more wealthy, use those margins for good.

What is the ethos of your company?

To use the net profits from the company to support and build third spaces and other low return/high investment cost ventures. From my perspective, these are the types of ideas that can have the biggest impact on making the world a better place. So this is my way of adopting socialist economic ideas into the systems we currently have and implementing the changes.

How big is the company? Do you have anyone else helping out?

It's just me for now, though the plan is to make it into a co-op or unionized business. I have friends and supporters of the project, giving feedback and suggesting improvements.

What does your average day-to-day look like?

I go to my day job during the week, and work on the company in my spare time. I have alerts and monitors that warn me when something needs addressing; overall operations are pretty hands off.

You're a founder, and founders have to wear all the hats. How have you managed your work-life balance while starting this?

At this point it's more about balancing my job, working on the company, and taking care of my cat. It's unfortunately another reason that I started this endeavor: there just aren't spaces I'd rather be than home, outside of a park or hiking. All of my friends are online and most say the same, where would I go?

Hosting businesses can be very capital intensive to start. How do you fund it?

Through my bonuses and stocks currently, also through using more cost effective brands that are still reliable and performant.

What has been the biggest challenge of operating it from a business perspective?

Getting customers.
I'm not a huge fan of marketing and have been using word of mouth as the primary method of growing the business.

Okay, my part here then haha. If people want to sign up, how should they do that?

If people are interested in getting service, they can request an invite through this link: https://portal.gigabit.host/invite/request .

What has been the most fun part of running a hosting company?

Getting to actually be hands on with the hardware and making it as performant as possible. It scratches an itch of eking out every last drop of performance. Also not doing it because it's easy, doing it because I thought it would be easy.

What has been the biggest surprise from starting Gigabit.Host?

How both complex and easy it has been at the same time. Also how much I've been learning and growing through starting the company.

What're some of the things you've learned?

It's been learning that wanting it to be perfect isn't realistic, taking the small wins, building upon them, and continuing to learn as you go. My biggest learning challenge was how to do frontend work with Typescript and styling; the backend code has been easy for me. The frontend used to be my weakness. Now it could be better, and as I add new features I can see it continuing to get better over time.

Now let's talk a little bit about the tech behind the scenes. What does the tech stack look like?

- Next.js and Typescript for the front and backend.
- Temporal for provisioning and task automation.
- Supabase for user management.
- Proxmox for hardware virtualization.

How do you actually manage this fleet of VMs?

For the customer side we only handle the initial provisioning, then the customer is free to use whatever tool they choose. The provisioning of the VMs is handled using Go and Temporal. For our internal services we use Ansible and automation scripts.

[Nicole: the code running the platform is open source, so you can take a look at how it's done in the repository !]

How do your technical choices and your values as a founder and company work together?

They are usually in sync; the biggest struggle has been minimizing the cost of hardware. While I would like to use more advanced networking gear, it's currently cost prohibitive.

Which choices might you have made differently?

[I would have] gathered more capital before getting started. Though that's me trying to be a perfectionist, when the reality is: buy as little as possible and use what you have when able.

This seems like a really hard business to be in since you need reliability out of the gate. How have you approached that?

Since I've been self-funding this endeavor, I've had to forgo high availability for now due to costs. To work around that I've gotten modern hardware for the critical parts of the infrastructure. This so far has enabled us to achieve 90%+ uptime, with the current goal to add redundancy as able to do so.

What have been the biggest technical challenges you've run into?

Power and colocation costs. Colocation is expensive in Seattle. Around 8x the cost of my previous colo in Atlanta, GA. Power has been the second challenge: running modern hardware means higher power requirements. Most data centers outside of hyperscalers are limited to 5 to 10 kW per rack. This limits the hardware and density; thankfully for now it [is] a future struggle.

Huge thanks to Diana for taking time out of her very busy schedule for this interview! And thank you to a few friends who helped me prepare for the interview.

Armin Ronacher 6 months ago

Building an Agent That Leverages Throwaway Code

In August I wrote about my experiments with replacing MCP (Model Context Protocol) with code. In the time since, I utilized that idea for exploring non-coding agents at Earendil. And I'm not alone! In the meantime, multiple people have explored this space and I felt it was worth sharing some updated findings.

The general idea is pretty simple. Agents are very good at writing code, so why don't we let them write throw-away code to solve problems that are not related to code at all? I want to show you how and what I'm doing to give you some ideas of what works and why this is much simpler than you might think.

The first thing you have to realize is that Pyodide is secretly becoming a pretty big deal for a lot of agentic interactions. What is Pyodide? Pyodide is an open source project that makes a standard Python interpreter available via a WebAssembly runtime. What is neat about it is that it has an installer called micropip that allows it to install dependencies from PyPI. It also targets the emscripten runtime environment, which means there is a pretty good standard Unix setup around the interpreter that you can interact with.

Getting Pyodide to run is shockingly simple if you have a Node environment. You can directly install it from npm. What makes this so cool is that you can also interact with the virtual file system, which allows you to create a persistent runtime environment that interacts with the outside world. You can also get hosted Pyodide at this point from a whole bunch of startups, but you can actually get this running on your own machine and infrastructure very easily if you want to. The way I found this to work best is if you banish Pyodide into a web worker. This allows you to interrupt it in case it runs into time limits.

A big reason why Pyodide is such a powerful runtime is that Python has an amazing ecosystem of well-established libraries that the models know about. From manipulating PDFs or Word documents to creating images, it's all there.

Another vital ingredient to a code interpreter is having a file system. Not just any file system though. I like to set up a virtual file system that I intercept so that I can provide it with access to remote resources from specific file system locations. For instance, you can have a folder on the file system that exposes files which are just resources that come from your own backend API. If the agent then chooses to read from those files, you can from outside the sandbox make a safe HTTP request to bring that resource into play. The sandbox itself does not have network access, so it's only the file system that gates access to resources. The reason the file system is so good is that agents just know so much about how it works, and you can provide safe access to resources through some external system outside of the sandbox. You can provide read-only access to some resources and write access to others, then access the created artifacts from the outside again.

Now actually doing that is a tad tricky because the emscripten file system is sync, and most of the interesting things you can do are async. The option that I ended up going with is to move the fetch-like async logic into another web worker and block until the result is ready. If your entire Pyodide runtime is in a web worker, that's not as bad as it looks. That said, I wish the emscripten file system API was changed to support stack switching instead of this.
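The flow in that last paragraph is easier to see in code. Here is a minimal sketch, assuming a browser-style web worker setup with cross-origin isolation; the file names, message shape, buffer size, and the fetchSync helper are all illustrative assumptions, not the actual implementation from the post.

```typescript
// fetch-worker.ts - performs async fetches and reports results through a
// SharedArrayBuffer so the Pyodide worker can block on them. (Sketch only.)
self.onmessage = async (event: MessageEvent) => {
  const { url, sab } = event.data as { url: string; sab: SharedArrayBuffer };
  const signal = new Int32Array(sab, 0, 2); // [0]: done flag, [1]: payload length
  const bytes = new Uint8Array(sab, 8);

  const response = await fetch(url);
  const payload = new Uint8Array(await response.arrayBuffer());
  const length = Math.min(payload.length, bytes.length); // truncate if too large (sketch only)
  bytes.set(payload.subarray(0, length));

  Atomics.store(signal, 1, length);
  Atomics.store(signal, 0, 1);
  Atomics.notify(signal, 0); // wake the blocked Pyodide worker
};
```

```typescript
// pyodide-worker.ts - hosts Pyodide and blocks on the fetch worker whenever
// the sandbox needs a remote resource. Assumes this file itself runs inside a
// web worker and that the page is cross-origin isolated, which is required
// for SharedArrayBuffer + Atomics.wait.
import { loadPyodide } from "pyodide";

const fetchWorker = new Worker(new URL("./fetch-worker.ts", import.meta.url), { type: "module" });

// Synchronously fetch a URL by delegating the async work to the other worker
// and putting this worker to sleep until it signals completion.
function fetchSync(url: string): Uint8Array {
  const sab = new SharedArrayBuffer(8 + 1024 * 1024); // 1 MiB payload budget (sketch-sized)
  const signal = new Int32Array(sab, 0, 2);
  fetchWorker.postMessage({ url, sab });
  Atomics.wait(signal, 0, 0); // block until signal[0] becomes non-zero
  return new Uint8Array(sab, 8).slice(0, Atomics.load(signal, 1));
}

async function main() {
  const pyodide = await loadPyodide();
  // A real setup would hook the emscripten FS so that reads from, say,
  // /remote/... trigger fetchSync; here we just materialize one file up front.
  pyodide.FS.writeFile("/ip.txt", fetchSync("https://api.ipify.org"));
  console.log(pyodide.runPython(`open("/ip.txt").read()`));
}

main();
```

The point is simply that the blocking happens inside the Pyodide worker rather than on the main thread, so Atomics.wait is legal there and the rest of the application stays responsive.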
While it's now possible to hide async promises behind sync abstractions within Pyodide with call_sync, the same approach does not work for the emscripten JavaScript FS API. I have a full example of this at the end; the simplified shape of what I ended up with is roughly what the sketch above shows.

Lastly, now that you have agents running, you really need durable execution. I would describe durable execution as the idea of being able to retry a complex workflow safely without losing progress. The reason for this is that agents can take a very long time, and if they get interrupted, you want to bring them back to the state they were in. This has become a pretty hot topic. There are a lot of startups in that space and you can buy yourself a tool off the shelf if you want to. What is a little bit disappointing is that there is no truly simple durable execution system. By that I mean something that just runs on top of Postgres and/or Redis in the same way as, for instance, there is pgmq.

The easiest way to shoehorn this yourself is to use queues to restart your tasks and to cache away the temporary steps from your execution. Basically, you compose your task from multiple steps and each of the steps just has a very simple cache key. It's really just that simple (a minimal sketch follows below). You can improve on this greatly, but this is the general idea. The state is basically the conversation log and whatever else you need to keep around for the tool execution (e.g., whatever was thrown on the file system).
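To make the step-and-cache-key idea concrete, here is a minimal sketch. The KeyValueStore interface, the key format, and the step helper are illustrative assumptions, not an actual library API.

```typescript
// Minimal durable-execution sketch: each step's result is cached under a
// stable key, so re-running the task after a crash skips completed work.
interface KeyValueStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

// Run a step at most once per task: replay a cached result if it exists,
// otherwise do the work and checkpoint the result under a stable key.
async function step<T>(kv: KeyValueStore, taskId: string, name: string, fn: () => Promise<T>): Promise<T> {
  const key = `task:${taskId}:step:${name}`;
  const cached = await kv.get(key);
  if (cached !== undefined) return JSON.parse(cached) as T;
  const result = await fn();
  await kv.set(key, JSON.stringify(result));
  return result;
}

// In-memory store just to make the sketch runnable; in practice this would
// be Postgres or Redis.
const memoryStore: KeyValueStore = (() => {
  const m = new Map<string, string>();
  return {
    get: async (k) => m.get(k),
    set: async (k, v) => { m.set(k, v); },
  };
})();

// Usage sketch: if the process dies after "unpack", a retry of the same task
// replays "unpack" from the cache and only re-runs "summarize".
async function runTask(kv: KeyValueStore, taskId: string) {
  const files = await step(kv, taskId, "unpack", async () => ["report.xlsx"]);
  await step(kv, taskId, "summarize", async () => `summarized ${files.length} file(s)`);
}

runTask(memoryStore, "task-1");
```

The important property is that the cache key is stable across retries, so a re-queued task skips everything it has already done and only re-executes the step it died in.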
What tools does an agent need that are not code? Well, the code needs to be able to do something interesting, so you need to give it access to something. The most interesting access you can provide is via the file system, as mentioned. But there are also other tools you might want to expose. What Cloudflare proposed is connecting to MCP servers and exposing their tools to the code interpreter. I think this is a quite interesting approach and to some degree it's probably where you want to go.

Some tools that I find interesting:

- An inference tool: a tool that just lets the agent run more inference, mostly with files that the code interpreter generated. For instance, if you have a zip file it's quite fun to see the code interpreter use Python to unpack it. But if that unpacked file is a jpg, you will need to go back to inference to understand it.
- A manual tool: a tool that just … brings up help. Again, this can be backed by inference for basic RAG, or similar. I found it quite interesting to let the AI ask it for help. For example, you want the manual tool to allow a query like "Which Python code should I write to create a chart for the given XLSX file?" On the other hand, you can also just stash away some instructions in .md files on the virtual file system and have the code interpreter read them. It's all an option.

If you want to see what this roughly looks like, I vibe-coded a simple version of this together. It uses a made-up example but it does show how a sandbox with very little tool availability can create surprising results: mitsuhiko/mini-agent. When you run it, it looks up the current IP from a special network drive that triggers an async fetch, and then it (usually) uses pillow or matplotlib to make an image of that IP address. Pretty pointless, but a lot of fun! The same approach has also been leveraged by Anthropic and Cloudflare.

There is some further reading that might give you more ideas:

- Claude Skills is fully leveraging code generation for working with documents or other interesting things. It comes with a (non open source) repository of example skills that the LLM and code executor can use: anthropics/skills
- Cloudflare's Code Mode, which is the idea of creating TypeScript bindings for MCP tools and having the agent write code to use them in a sandbox.
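As a final illustration of the file-system-as-interface idea from earlier in the post, here is a small sketch of seeding the sandbox with input files and collecting whatever artifacts the generated code writes back. The directory layout, function names, and the example Python snippet are made up for illustration; nothing here enforces read-only access, it just separates input and output directories.

```typescript
// Sketch: the host seeds the Pyodide FS with inputs before running
// agent-generated code, then collects artifacts afterwards. The sandbox
// itself never gets network access; the host mediates all resources.
import { loadPyodide } from "pyodide";

async function runGeneratedCode(code: string, inputs: Record<string, Uint8Array>) {
  const pyodide = await loadPyodide();

  // Input files, e.g. resources the host fetched from its own backend API.
  pyodide.FS.mkdir("/inputs");
  for (const [name, bytes] of Object.entries(inputs)) {
    pyodide.FS.writeFile(`/inputs/${name}`, bytes);
  }

  // Output directory for artifacts the generated code produces.
  pyodide.FS.mkdir("/outputs");

  await pyodide.runPythonAsync(code);

  // Collect whatever the code wrote back so the host can store or inspect it.
  const artifacts: Record<string, Uint8Array> = {};
  for (const name of pyodide.FS.readdir("/outputs")) {
    if (name === "." || name === "..") continue;
    artifacts[name] = pyodide.FS.readFile(`/outputs/${name}`);
  }
  return artifacts;
}

// Example: agent-written Python that reads an input file and writes an artifact.
runGeneratedCode(
  `
with open("/inputs/notes.txt") as f:
    text = f.read()
with open("/outputs/summary.txt", "w") as f:
    f.write(text.upper())
`,
  { "notes.txt": new TextEncoder().encode("hello from the host") },
).then((artifacts) => console.log(new TextDecoder().decode(artifacts["summary.txt"])));
```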
