Binary Lambda Calculus is Hard
Binary Lambda Calculus is a really alluring idea. But it’s also hard to grasp and use! Here’s my list of complaints and obstacles to using BLC.
Simon Willison wrote about how he vibe coded his dream presentation app for macOS. I also took a stab at vibe coding my dream app: an RSS reader.

To clarify: Reeder is my dream RSS app and it already exists, so I guess you could say my dreams have already come true? But I’ve kind of always wanted to try an app where my RSS feed is just a list of unread articles and clicking any one opens it in the format in which it was published (e.g. the original website). So I took a stab at it. (Note: the backend portion of this was already solved, as I simply connected to my Feedbin account via the API.)

First I tried a macOS app, because I never would’ve attempted a macOS app before. Xcode, Swift, a Developer Account? All completely outside my wheelhouse. But AI helped me get past that hurdle of going from nothing to something. It was fun to browse articles and see them in situ. A lot of folks have really great personal websites, so it’s fun to see their published articles in that format.

This was pretty much pure vibes. I didn’t really look at the code at all because I knew I wouldn’t understand any of it. I got it working the first night I sat down and tried it. It was pretty crappy, but it worked. From there I iterated. I’d use it for a day, fix things that were off, keep using it, etc. Eventually I got to the point where I thought: I’m picky about software, so the bar for my dreams is high. But I’m also lazy, so my patience is quite low. The intersection of: the LLM failing over and over + my inability to troubleshoot any of it + not wanting to learn = a bad combination for persevering through debugging.

Which made me say: “Screw it, I’ll build it as a website!” But websites don’t really work for this kind of app because of CORS. I can’t just stick an article’s URL in an iframe and preview it, because certain sites have cross-site headers that don’t allow them to display under another domain. But that didn’t stop me. I tried building the idea anyway as just a list view.
I could install this as a web app on my Mac and I'd get a simple list view. Anytime I clicked on a link, it would open in my default browser. Actually not a bad experience.

It worked pretty decently on my phone too. Once I visited my preview deploy, I could "install" it to my home screen, and then when I opened it, I'd have my latest unread articles. Clicking on any of them would open a webview that I could easily dismiss to get back to my list. Not too bad. But not what I wanted, especially on desktop.

It seemed like the only option to 1) get exactly what I wanted, and 2) distribute it — all in a way that I could understand in case something went wrong or I had to overcome an obstacle — was to make a native app. At this point, I was thinking: “I’m too tired to learn Apple development right now, and I’ve worked for a long time on the web, so I may as well leverage the skills that I’ve got.” So I vibed an Electron app, because Electron lets me get around the cross-site request issues of a website. This was my very first Electron app and, again, the LLM helped me go from nothing to something quite quickly (but this time I could understand my something way better).

The idea was the same: unread articles on the left, a preview of any selected article on the right. Here’s a screenshot:

It’s fine. Not really what I want. But it’s a starting point. Is it better than Reeder? Hell no. Is it my wildest dreams realized? Also no. But it’s a prototype of an idea I’ve wanted to explore. I’m not sure I’ll go any further on it. It’s hacky enough that I can grasp a vision for what it could be.

The question is: do I actually want this? Is this experience something I want in the long run? I think it could be. But I have to figure out exactly how I want to build it as a complementary experience to my preferred way of going through my RSS feed. Which won't be your preference. Which is why I'm not sharing it.

So what’s my takeaway from all this? I don’t know.
That’s why I’m typing this all out in a blog post. Vibe coding is kinda cool. It lets you go from “blank slate” to “something” way faster and easier than before. But you have to be mindful of what you make easy. You know what else is easy? Fast food. But I don’t want that all the time.

In fact, vibe coding kinda left me with that feeling I get after indulging in social media, like “What just happened? Two hours have passed and what did I even spend my time doing? Just mindlessly chasing novelty?” It’s fun and easy to mindlessly chase your whims. But part of me thinks the next best step for this is to sit and think about what I actually want, rather than just yeeting the next prompt out.

I’ve quipped before that our new timelines are something like: the making from nothing isn’t as hard anymore, but everything after that still is. Understanding it. Making it good. Distributing it. Supporting it. Maintaining it. All that stuff. When you know absolutely nothing about those — like I did with macOS development — things are still hard.

After all this time vibing, instead of feeling closer to my dream, I actually kinda feel further from it. The LLM helped close the gap in my understanding of what it would actually take for me to realize my dreams. Which made me really appreciate the folks who have poured a lot of time and thought and effort into building the RSS readers I use on a day-to-day basis. Thank you, makers of Feedbin & Reeder & others through the years. I’ll gladly pay you $$$ for your thought and care.

In the meantime, I may or may not be over here slowly iterating on my own supplemental RSS experience. In fact, I might’ve just found the name: RxSSuplement.
I don’t know that I’ll be able to iterate on this much more, because it’s getting more complicated and failing more and more with each ask. (I was just trying to move some stupid buttons around in the UI and the AI was like, “Nah bro, I can’t.”) I have no idea how I’d share this with someone. I don’t think I’d be comfortable sharing this with someone (even though I think I did things like security right by putting credentials in the built-in keychain, etc.). I guess this is where the road stops.

Nothing -> Something? 1 hour. Something -> Something Good? 1 year.
I’ve got a Custom Post Type in WordPress. It’s called because it’s for documentation pages. This is for the CodePen 2.0 Docs. The Classic Docs are just “Pages” in WordPress, and that works fine, but I thought I’d do the correct WordPress thing and make a unique kind of content a Custom Post Type. This works quite nicely, except that those pages don’t turn up at all in Jetpack Search.

I like Jetpack Search. It works well. It’s got a nice UI. You basically turn it on and forget about it. I put it on CSS-Tricks, and they still use it there. I put it on the Frontend Masters blog. It’s here on this blog. It’s a paid product, and I pay for it and use it because it’s good. I don’t begrudge core WordPress for not having better search, because raw MySQL search just isn’t very good. Jetpack Search uses Elasticsearch, a product better suited for full-blown site search. That’s not a server requirement they could reasonably bake into core.

But the fact that it just doesn’t index Custom Post Types is baffling to me. I suspect it’s just something I’m doing wrong. I can tell it doesn’t work with basic tests. For example, I’ve got a page called “Inline Block Processing”, but if you search for “Inline Block Processing” it returns zero results. In the Customizing Jetpack Search area, I’m specifically telling Jetpack Search not to exclude “Docs”. That very much feels like it will include it. I’ve tried manually reindexing a couple of times, both by SSHing into Pressable and using WP-CLI to reindex, and from the “Manage Connections” page on WordPress.com. No dice. I contacted Jetpack Support, and they said: Jetpack Search handles Custom Post Types individually, so it may be that the slug for your post type isn’t yet included in the Jetpack Search index.
We have a list of slugs we index here: https://github.com/Automattic/jetpack/blob/trunk/projects/packages/sync/src/modules/class-search.php#L691 If the slug isn’t on the list, please submit an issue here so that our dev team can add it.

Where they sent me on GitHub was a bit confusing. It’s the end of a variable called , which doesn’t seem quite right, as that seems like, ya know, post metadata that shouldn’t be indexed, which isn’t what’s going on here. But it’s also right before a variable called private static $taxonomies_to_sync, which feels closer, but I know what a taxonomy is, and this isn’t that. A taxonomy is categories, tags, and stuff (you can make your own), but I’m not using any custom taxonomies here; I’m using a Custom Post Type.

They directed me to open a GitHub Issue, so I did that. But it’s sat untouched for a month. I just need to know whether Jetpack Search can handle Custom Post Types. If it can, what am I doing wrong to make it not work? If it can’t, fine, I just wanna know so I can figure out some other way to handle this. Unsearchable docs are not tenable.
For eight years, I’ve wanted a high-quality set of devtools for working with SQLite. Given how important SQLite is to the industry 1, I’ve long been puzzled that no one has invested in building a really good developer experience for it 2. A couple of weeks ago, after ~250 hours of effort over three months 3 on evenings, weekends, and vacation days, I finally released syntaqlite (GitHub), fulfilling this long-held wish. And I believe the main reason this happened was because of AI coding agents 4.

Of course, there’s no shortage of posts claiming that AI one-shot their project, or pushing back and declaring that AI is all slop. I’m going to take a very different approach and, instead, systematically break down my experience building syntaqlite with AI, both where it helped and where it was detrimental. I’ll do this while contextualizing the project and my background so you can independently assess how generalizable this experience was. And whenever I make a claim, I’ll try to back it up with evidence from my project journal, coding transcripts, or commit history 5.
I have these Sunday evenings where I find myself sitting alone at the kitchen table, thinking about my life and how I got here. Usually, these sessions end with an inspiring idea that makes me want to get up and build something. I remember the old days when I couldn't even sleep because I had all these ideas bubbling in my head, and I could just get up and do it because I had no familial responsibilities. I still have that flair in me, but I don't always give in to those ideas. Instead, sometimes I choose to do something much simpler. I read.

Sometimes it's a book, sometimes it's a blog post. But always, it's something that stimulates my mind more than any startup idea or tech disruption. There is a blog I follow; I'm not even sure how I stumbled upon it. I don't think it has a newsletter, and it's not in my RSS feed. But it's in my mind. It's as if I can just feel it when the author posts something new. Right there at the kitchen table, I load it up, and I get a glimpse into someone else's life. I don't know much about this person, but reading her writing is soothing. It's not commercialized; the most I can say about it is, well, it is human.

When I read something that is written, anything that's written, I expect to hear the voice of the person behind it. Whether it is a struggle, a victory, or just a small remark. It only makes sense when there is a person behind it. Sometimes people write, and in order to sound professional, they remove their voice from it. It becomes like reading a Corporate Memphis blog. Devoid of any humanity.

It's weird how I have these names in my head. Keenen Charles. I check his blog on Sunday evenings as well. In fact, here are a few I read recently in no particular order:

This isn't to tell you that you need to read those articles or you will be left behind. It's not that deep. You can find things you like, and enjoy them at your own comfort.
They don't have to be world changing, they don't have to turn you into a millionaire; they just have to make you smile or nod for a moment. The world is constantly trying to remind us that we are at the edge of destruction. But you, the person sitting there, reading a random blog post from this random Guinean guy, yes you. Take it easy for the rest of the day.

- Jerry (I particularly liked this specific article)
- Don't know her name
What’s going on, Internet? Trying something different. All the pages I bookmarked this month, no life updates in between. Want more? Check out all my bookmarks at /bookmarks/ and subscribe to the bookmarks feed. Hey, thanks for reading this post in your feed reader! Want to chat? Reply by email or add me on XMPP, or send a webmention. Check out the posts archive on the website.

- Scroll trīgintā ūnus by Shellsharks - Sharing the latest edition of scrolls, posting online without overthinking.
- I Am Happier Writing Code by Hand by Abhinav Omprakash - Letting AI write his code kills the satisfaction that made programming worth doing.
- You Are The Driver (The AI Is Just Typing) by Keith - AI coding tools are only useful once you already know what you’re doing. They automate typing, not thinking.
- Oceania Web Atlas by Zachary Kai - Collects personal websites from across Oceania into one tidy, human-scaled directory.
- Building the Good Web by Brennan - Building for users instead of against them is what separates the good web from everything else.
- Unpolished human websites by Joel - Keep your website messy and human.
- What is Digital Garage - Digital Tinker’s website is a workshop built for joy, not productivity. Creation without pressure.
- How to feel at home on the Internet by Jatan Mehta - Is having your own domain really the only way to truly “own” your online space?
- Endgame for the Open Web by Anil Dash - Is 2026 the last year we have a chance to put a stop to the dismantling of the open web?
A recent Tildes thread about computer monitor usage made me wonder what kind of setup others are using, so I spun up a survey! I have a theory about how readers of this blog will most likely respond, but I'm very curious to see the reality. You can take the 3-question survey here: surveys.darnfinesoftware.com

The survey will be open for ~7 days before it auto-implodes and all responses are deleted. I'll post a follow-up on what the data looks like (or you can view it yourself at the above link).
Sanjay asked me in a comment on my AMA post : I am a fellow reader of multiple blogs of yours and others. But somehow I have been searching for any article where any one can setup of his entire digital life using subscription free model. I am not talking about to get everything FREE and become a PRODUCT. If you think you can setup everything using opensource then how would you setup all of your essentials. You can write a post anytime when you have a time. For example. And so on.. There may be many more things. I always think what would happen to my subscriptions if I will no more or I will have some issue or financial constraint. Will the subscription be a burden to my family when I will not be there. Or any of my important services will stop working for not paying suddenly? Currently I am not paying any subscription for any of my services as I have reduced as minimum services I can opt. I think the short answer to your question, Sanjay, is mostly yes. But I'd advise against it for some things*. Some of the items on your list are really easy to get without a subscription, for example: Unfortunately, some things on your list are either going to cost you money, privacy, or time somewhere along the line. Domains cost money. I know some don't but they tend to be very spammy and have poor email delivery as a result. Also, any email service worth their salt will require you to pay. If not, they're probably sniffing your mail. You could self-host your email at home, but there's then a cost associated with the hardware to host the mail server, or your time administering the system. Email is notoriously difficult for self-hosters too. As with most things that are free on the web, if it's free, you're probably the product. And that's true with both GitHub and Cloudflare, in my opinion. You can host a site for free on either service, but you would either need to buy a domain, or be happy using one of their free sub-domains. 
There's also the technical debt required to create the static sites that these services support. So there's a time cost. Again, you can host at home, but there are the same hardware or time costs that are associated with self-hosting email.

Like email hosting, any service worth their salt is going to charge. Some may have initial tiers that are free, but I doubt they will be very generous. I personally use Bunny for my CDN needs. They're reasonably priced and have a pay-as-you-go model, so no subscription involved. Obviously you can't host a CDN at home, as that would defeat the object of the whole thing.

For databases, same story as above. You can host at home, but there's a hardware/time cost associated, or you can pay for a reputable host to do it for you. I think this one is easy. Your options are threefold:

I think these decisions ultimately come down to personal preference, and a compromise in one of three things - cost, time, or privacy. There's always a trade off with this stuff. It just boils down to what you're willing to trade off, personally.

Thanks for reading this post via RSS. RSS is ace, and so are you. ❤️ You can reply to this post by email, or leave a comment.

From Sanjay's list of examples:

- Free domain based email via MX Routing
- Hosting on Github or Cloudflare Pages
- OS - most important using Linux
- Document, Spreadsheet, Presentation
- Video Editing

My subscription-free recommendations:

- RSS feed reader - there are many feed readers you can install locally for free. Vivaldi has one built right into their browser, for example. Or you could self-host something like FreshRSS, or Miniflux.
- Notes app - my recommendation here would be Obsidian. I personally sync via WebDAV to my server at home. If you don't have the ability to do that, most operating systems have a note taking app pre-installed.
- Reminders - you can use the calendar app on your device, or on mobile, the built-in reminders/to-do apps.
- Document editing - LibreOffice is great, as is Only Office if you want something more modern looking.
- Operating system - Ubuntu for the win. It's what I use.
- Video editing - Kdenlive is available for all major operating systems, and works really well.

For music, the options are:

- A self-hosted media library that will consist of: ripped music from a physical collection; digital music bought from services like Bandcamp, where you actually own the music (but this can get expensive); or pirated music 🏴☠️.
- A free account on a streaming service like Spotify, but it will be riddled with ads.
- A paid subscription to a streaming service.

And the three-way trade-off:

- A service can be free and private, but it will be time consuming to manage.
- It can be quick to get started (hosted) and private, but it won't be free.
- It can be quick to get started (hosted) and free, but it won't respect your privacy.
In this article, I want to cover the overall design of coding agents and agent harnesses: what they are, how they work, and how the different pieces fit together in practice. Readers of my Build a Large Language Model (From Scratch) and Build a Large Reasoning Model (From Scratch) books often ask about agents, so I thought it would be useful to write a reference I can point to.

More generally, agents have become an important topic because much of the recent progress in practical LLM systems is not just about better models, but about how we use them. In many real-world applications, the surrounding system, such as tool use, context management, and memory, plays as much of a role as the model itself. This also helps explain why systems like Claude Code or Codex can feel significantly more capable than the same models used in a plain chat interface. In this article, I lay out six of the main building blocks of a coding agent.

You are probably familiar with Claude Code or the Codex CLI, but just to set the stage: they are essentially agentic coding tools that wrap an LLM in an application layer, a so-called agentic harness, to be more convenient and better-performing for coding tasks.

Figure 1: Claude Code CLI, Codex CLI, and my Mini Coding Agent.

Coding agents are engineered for software work, where the notable parts are not only the model choice but the surrounding system, including repo context, tool design, prompt-cache stability, memory, and long-session continuity. That distinction matters because, when we talk about the coding capabilities of LLMs, people often collapse the model, the reasoning behavior, and the agent product into one thing. But before getting into the coding-agent specifics, let me briefly provide a bit more context on the difference between the broader concepts: LLMs, reasoning models, and agents.

An LLM is the core next-token model.
A reasoning model is still an LLM, but usually one that was trained and/or prompted to spend more inference-time compute on intermediate reasoning, verification, or search over candidate answers. An agent is a layer on top, which can be understood as a control loop around the model. Typically, given a goal, the agent layer (or harness) decides what to inspect next, which tools to call, how to update its state, when to stop, etc.

Roughly, we can think about the relationship like this: the LLM is the engine, a reasoning model is a beefed-up engine (more powerful, but more expensive to use), and an agent harness helps us get more out of the model. The analogy is not perfect, because we can also use conventional and reasoning LLMs as standalone models (in a chat UI or Python session), but I hope it conveys the main point.

Figure 2: The relationship between a conventional LLM, a reasoning LLM (or reasoning model), and an LLM wrapped in an agent harness.

In other words, the agent is the system that repeatedly calls the model inside an environment. So, in short, we can summarize it like this:

- LLM: the raw model
- Reasoning model: an LLM optimized to output intermediate reasoning traces and to verify itself more
- Agent: a loop that uses a model plus tools, memory, and environment feedback
- Agent harness: the software scaffold around an agent that manages context, tool use, prompts, state, and control flow
- Coding harness: a special case of an agent harness; i.e., a task-specific harness for software engineering that manages code context, tools, execution, and iterative feedback

As listed above, in the context of agents and coding tools, we also have the two popular terms agent harness and (agentic) coding harness. A coding harness is the software scaffold around a model that helps it write and edit code effectively. An agent harness is a bit broader and not specific to coding (e.g., think of OpenClaw). Codex and Claude Code can be considered coding harnesses.
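To make the control-loop idea concrete, here is a deliberately minimal sketch in plain Python. Everything here is illustrative: the `run_agent` function, the dictionary-based action format, and the stop condition are my own stand-ins, not the API of Claude Code, Codex, or the Mini Coding Agent.

```python
# Minimal sketch of an agent control loop (illustrative only).
# `call_model` stands in for any LLM API call; `tools` maps tool
# names to plain Python callables.

def run_agent(goal, call_model, tools, max_steps=10):
    """Repeatedly call the model, execute requested tools, stop on 'done'."""
    transcript = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        action = call_model(transcript)       # model decides the next step
        if action["type"] == "done":          # model signals it is finished
            return action["result"], transcript
        tool = tools[action["tool"]]          # look up the named tool
        observation = tool(**action["args"])  # act in the environment
        transcript.append({"role": "tool", "content": observation})
    return None, transcript                   # gave up after max_steps
```

The point of the sketch is the shape, not the details: the model only proposes actions, while the surrounding loop executes them and feeds observations back in.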
Anyways, a better LLM provides a better foundation for a reasoning model (which involves additional training), and a harness gets more out of this reasoning model. Sure, LLMs and reasoning models are also capable of solving coding tasks by themselves (without a harness), but coding work is only partly about next-token generation. A lot of it is about repo navigation, search, function lookup, diff application, test execution, error inspection, and keeping all the relevant information in context. (Coders may know that this is hard mental work, which is why we don’t like to be disrupted during coding sessions :)).

Figure 3: A coding harness combines three layers: the model family, an agent loop, and runtime supports. The model provides the “engine”, the agent loop drives iterative problem solving, and the runtime supports provide the plumbing. Within the loop, “observe” collects information from the environment, “inspect” analyzes that information, “choose” selects the next step, and “act” executes it.

The takeaway here is that a good coding harness can make both a reasoning and a non-reasoning model feel much stronger than it does in a plain chat box, because it helps with context management and more. As mentioned in the previous section, when we say harness, we typically mean the software layer around the model that assembles prompts, exposes tools, tracks file state, applies edits, runs commands, manages permissions, caches stable prefixes, stores memory, and more. Today, when using LLMs, this layer shapes most of the user experience compared to prompting the model directly or using a web chat UI (which is closer to “chat with uploaded files”). Since, in my view, the vanilla versions of today’s LLMs have very similar capabilities (e.g., the vanilla versions of GPT-5.4, Opus 4.6, and GLM-5 or so), the harness can often be the distinguishing factor that makes one LLM work better than another.
This is speculative, but I suspect that if we dropped one of the latest, most capable open-weight LLMs, such as GLM-5, into a similar harness, it could likely perform on par with GPT-5.4 in Codex or Claude Opus 4.6 in Claude Code. That said, some harness-specific post-training is usually beneficial. For example, OpenAI historically maintained separate GPT-5.3 and GPT-5.3-Codex variants.

In the next section, I want to go more into the specifics and discuss the core components of a coding harness using my Mini Coding Agent: https://github.com/rasbt/mini-coding-agent

Figure 4: Main harness features of a coding agent / coding harness that will be discussed in the following sections.

By the way, in this article, I use the terms “coding agent” and “coding harness” somewhat interchangeably for simplicity. (Strictly speaking, the agent is the model-driven decision-making loop, while the harness is the surrounding software scaffold that provides context, tools, and execution support.)

Figure 5: Minimal but fully working, from-scratch Mini Coding Agent (implemented in pure Python)

Anyways, below are six main components of coding agents. You can check out the source code of my minimal but fully working, from-scratch Mini Coding Agent (implemented in pure Python) for more concrete code examples. The code annotates the six components discussed below via code comments.

This is maybe the most obvious component, but it is also one of the most important ones. When a user says “fix the tests” or “implement xyz,” the model should know whether it is inside a Git repo, what branch it is on, which project documents might contain instructions, and so on. That’s because those details often change or affect what the correct action is. For example, “fix the tests” is not a self-contained instruction. If the agent sees AGENTS.md or a project README, it may learn which test command to run, etc. If it knows the repo root and layout, it can look in the right places instead of guessing.
Also, the git branch, status, and commits can help provide more context about what changes are currently in progress and where to focus.

Figure 6: The agent harness first builds a small workspace summary that gets combined with the user request for additional project context.

The takeaway is that the coding agent collects info (“stable facts” as a workspace summary) upfront before doing any work, so that it is not starting from zero, without context, on every prompt.

Once the agent has a repo view, the next question is how to feed that information to the model. The previous figure showed a simplified view of this (“Combined prompt: prefix + request”), but in practice, it would be relatively wasteful to combine and re-process the workspace summary on every user query. Coding sessions are repetitive: the agent rules usually stay the same, the tool descriptions usually stay the same, and even the workspace summary usually stays (mostly) the same. The main changes are usually the latest user request, the recent transcript, and maybe the short-term memory. “Smart” runtimes don’t rebuild everything as one giant undifferentiated prompt on every turn, as illustrated in the figure below.

Figure 7: The agent harness builds a stable prompt prefix, adds the changing session state, and then feeds that combined prompt to the model.

The main difference from section 1 is that section 1 was about gathering repo facts. Here, we are now interested in packaging and caching those facts efficiently for repeated model calls. The “stable” in “stable prompt prefix” means that the information contained there doesn’t change much. It usually contains the general instructions, tool descriptions, and the workspace summary. We don’t want to waste compute on rebuilding it from scratch in each interaction if nothing important has changed. The other components are updated more frequently (usually each turn).
This includes short-term memory, the recent transcript, and the newest user request. In short, the caching aspect of the “stable prompt prefix” is simply that a smart runtime tries to reuse that part.

Tool access and tool use are where it starts to feel less like chat and more like an agent. A plain model can suggest commands in prose, but an LLM in a coding harness should do something narrower and more useful: actually be able to execute the command and retrieve the results (versus us calling the command manually and pasting the results back into the chat). But instead of letting the model improvise arbitrary syntax, the harness usually provides a pre-defined list of allowed, named tools with clear inputs and clear boundaries. (Of course, something like Python can be part of this so that the agent could also execute a wide range of arbitrary shell commands.) The tool-use flow is illustrated in the figure below.

Figure 8: The model emits a structured action, the harness validates it, optionally asks for approval, executes it, and feeds the bounded result back into the loop.

To illustrate this, below is an example of how this usually looks to the user in my Mini Coding Agent. (This is not as pretty as Claude Code or Codex because it is very minimal and uses plain Python without any external dependencies.)

Figure 9: Illustration of a tool call approval request in the Mini Coding Agent.

Here, the model has to choose an action that the harness recognizes, like list files, read a file, search, run a shell command, write a file, etc. It also has to provide arguments in a shape that the harness can check. So when the model asks to do something, the runtime can stop and run programmatic checks like “Is this a known tool?”, “Are the arguments valid?”, “Does this need user approval?”, “Is the requested path even inside the workspace?” Only after those checks pass does anything actually run.
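Those harness-side checks can be sketched as follows. This is a toy version under assumed names (`validate_action`, the action and tool shapes are my own); a real harness does considerably more, such as schema validation and approval gating.

```python
import os

# Sketch of pre-execution checks (illustrative names and shapes).
# `known_tools` maps tool names to their expected argument names.

def validate_action(action, known_tools, workspace_root):
    """Return (ok, reason) before anything is allowed to execute."""
    if action.get("tool") not in known_tools:
        return False, "unknown tool"
    expected_args = known_tools[action["tool"]]
    args = action.get("args", {})
    if set(args) != set(expected_args):
        return False, "invalid arguments"
    # Keep file access inside the workspace by resolving the path.
    if "path" in args:
        root = os.path.realpath(workspace_root)
        full = os.path.realpath(os.path.join(workspace_root, args["path"]))
        if not full.startswith(root + os.sep):
            return False, "path escapes workspace"
    return True, "ok"
```

Only an action that passes all checks would then be handed to the executor (possibly after a user-approval prompt, as in Figure 9).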
While running coding agents of course carries some risk, the harness checks also improve reliability because the model doesn’t execute totally arbitrary commands. Also, besides rejecting malformed actions and approval gating, file access can be kept inside the repo by checking file paths. In a sense, the harness gives the model less freedom, but it improves usability at the same time.

Context bloat is not a problem unique to coding agents but an issue for LLMs in general. Sure, LLMs are supporting longer and longer contexts these days (and I recently wrote about the attention variants that make this computationally more feasible), but long contexts are still expensive and can also introduce additional noise (if there is a lot of irrelevant info). Coding agents are even more susceptible to context bloat than regular LLMs during multi-turn chats, because of repeated file reads, lengthy tool outputs, logs, etc. If the runtime keeps all of that at full fidelity, it will run out of available context tokens pretty quickly. So, a good coding harness is usually pretty sophisticated about handling context bloat, beyond just cutting or summarizing information like regular chat UIs. Conceptually, context compaction in coding agents might work as summarized in the figure below. Specifically, we are zooming a bit further into the clip (step 6) part of Figure 8 in the previous section.

Figure 10: Large outputs are clipped, older reads are deduplicated, and the transcript is compressed before it goes back into the prompt.

A minimal harness uses at least two compaction strategies to manage this problem. The first is clipping, which shortens long document snippets, large tool outputs, memory notes, and transcript entries. In other words, it prevents any one piece of text from taking over the prompt budget just because it happened to be verbose.
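As an illustration, a minimal clipping helper might look like this. The character budget, the marker string, and the head/tail split are arbitrary choices of mine, not the Mini Coding Agent’s actual values.

```python
# Minimal clipping helper (illustrative; limits are arbitrary).

def clip(text, max_chars=2000, marker="\n...[clipped]...\n"):
    """Shorten long tool output, keeping the head and the tail.

    The head usually contains the command/context, and the tail the
    most recent lines, which often matter most (e.g. a final error).
    """
    if len(text) <= max_chars:
        return text
    keep = (max_chars - len(marker)) // 2
    return text[:keep] + marker + text[-keep:]
```

Keeping both ends rather than truncating at the limit is a small but useful design choice: build logs and test runs tend to put the decisive information (the failing assertion, the exit status) at the very end.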
The second strategy is transcript reduction or summarization, which turns the full session history (more on that in the next section) into a smaller promptable summary. A key trick here is to keep recent events richer because they are more likely to matter for the current step. And we compress older events more aggressively because they are likely less relevant. Additionally, we also deduplicate older file reads so the model does not keep seeing the same file content over and over again just because it was read multiple times earlier in the session. Overall, I think this is one of the underrated, boring parts of good coding-agent design. A lot of apparent “model quality” is really context quality. In practice, all these 6 core concepts covered here are highly intertwined, and the different sections and figures cover them with different focuses or zoom levels. In the previous section, we covered prompt-time use of history and how we build a compact transcript. The question there is: how much of the past should go back into the model on the next turn? So the emphasis is compression, clipping, deduplication, and recency. Now, this section, structured session memory, is about the storage-time structure of history. The question here is: what does the agent keep over time as a permanent record? So the emphasis is that the runtime keeps a fuller transcript as a durable state, alongside a lighter memory layer that is smaller and gets modified and compacted rather than just appended to. To summarize, a coding agent separates state into (at least) two layers: working memory: the small, distilled state the agent keeps explicitly a full transcript: this covers all the user requests, tool outputs, and LLM responses Figure 11: New events get appended to a full transcript and summarized in a working memory. The session files on disk are usually stored as JSON files. 
The figure above illustrates the two main session files, the full transcript and the working memory, that usually get stored as JSON files on disk. As mentioned before, the full transcript stores the whole history, and it’s resumable if we close the agent. The working memory is more of a distilled version with the currently most important info, which is somewhat related to the compact transcript. But the compact transcript and working memory have slightly different jobs. The compact transcript is for prompt reconstruction. Its job is to give the model a compressed view of recent history so it can continue the conversation without seeing the full transcript every turn. The working memory is more meant for task continuity. Its job is to keep a small, explicitly maintained summary of what matters across turns, things like the current task, important files, and recent notes. Following step 4 in the figure above, the latest user request, together with the LLM response and tool output, would then be recorded as a “new event” in both the full transcript and working memory, in the next round, which is not shown to reduce clutter in the figure above. Once an agent has tools and state, one of the next useful capabilities is delegation. The reason is that it allows us to parallelize certain work into subtasks via subagents and speed up the main task. For example, the main agent may be in the middle of one task and still need a side answer, for example, which file defines a symbol, what a config says, or why a test is failing. It is useful to split that off into a bounded subtask instead of forcing one loop to carry every thread of work at once. (In my mini coding agent, the implementation is simpler, and the child still runs synchronously, but the underlying idea is the same.) A subagent is only useful if it inherits enough context to do real work. 
But if we don’t restrict it, we now have multiple agents duplicating work, touching the same files, or spawning more subagents, and so on. So the tricky design problem is not just how to spawn a subagent but also how to bind one :). Figure 12: The subagent inherits enough context to be useful, but it runs inside tighter boundaries than the main agent. The trick here is that the subagent inherits enough context to be useful, but also has it constrained (for example, read-only and restricted in recursion depth) Claude Code has supported subagents for a long time, and Codex added them more recently. Codex does not generally force subagents into read-only mode. Instead, they usually inherit much of the main agent’s sandbox and approval setup. So, the boundary is more about task scoping, context, and depth. The section above tried to cover the main components of coding agents. As mentioned before, they are more or less deeply intertwined in their implementation. However, I hope that covering them one by one helps with the overall mental model of how coding harnesses work, and why they can make the LLM more useful compared to simple multi-turn chats. Figure 13: Six main features of a coding harness discussed in previous sections. If you are interested in seeing these implemented in clean, minimalist Python code, you may like my Mini Coding Agent . OpenClaw may be an interesting comparison, but it is not quite the same kind of system. OpenClaw is more like a local, general agent platform that can also code, rather than being a specialized (terminal) coding assistant. There are still several overlaps with a coding harness: it uses prompt and instruction files in the workspace, such as AGENTS.md, SOUL.md, and TOOLS.md it keeps JSONL session files and includes transcript compaction and session management it can spawn helper sessions and subagents However, as mentioned above, the emphasis is different. 
Coding agents are optimized for a person working in a repository and asking a coding assistant to inspect files, edit code, and run local tools efficiently. OpenClaw is more optimized for running many long-lived local agents across chats, channels, and workspaces, with coding as one important workload among several others. I am excited to share that I finished writing Build A Reasoning Model (From Scratch) and all chapters are in early access yet. The publisher is currently working on the layouts, and it should be available this summer. This is probably my most ambitious book so far. I spent about 1.5 years writing it, and a large number of experiments went into it. It is also probably the book I worked hardest on in terms of time, effort, and polish, and I hope you’ll enjoy it. Build a Reasoning Model (From Scratch) on Manning and Amazon . The main topics are evaluating reasoning models inference-time scaling self-refinement reinforcement learning distillation There is a lot of discussion around “reasoning” in LLMs, and I think the best way to understand what it really means in the context of LLMs is to implement one from scratch! Amazon (pre-order) Manning (complete book in early access , pre-final layout, 528 pages) Figure 1: Claude Code CLI, Codex CLI, and my Mini Coding Agent . Coding agents are engineered for software work where the notable parts are not only the model choice but the surrounding system, including repo context, tool design, prompt-cache stability, memory, and long-session continuity. That distinction matters because when we talk about the coding capabilities of LLMs, people often collapse the model, the reasoning behavior, and the agent product into one thing. But before getting into the coding agent specifics, let me briefly provide a bit more context on the difference between the broader concepts, the LLMs, reasoning models, and agents. On The Relationship Between LLMs, Reasoning Models, and Agents An LLM is the core next-token model. 
A reasoning model is still an LLM, but usually one that was trained and/or prompted to spend more inference-time compute on intermediate reasoning, verification, or search over candidate answers. An agent is a layer on top, which can be understood as a control loop around the model. Typically, given a goal, the agent layer (or harness) decides what to inspect next, which tools to call, how to update its state, when to stop, and so on. Roughly, we can think about the relationship like this: the LLM is the engine, a reasoning model is a beefed-up engine (more powerful, but more expensive to use), and an agent harness helps us steer the model. The analogy is not perfect, because we can also use conventional and reasoning LLMs as standalone models (in a chat UI or Python session), but I hope it conveys the main point.

Figure 2: The relationship between a conventional LLM, a reasoning LLM (or reasoning model), and an LLM wrapped in an agent harness.

In other words, the agent is the system that repeatedly calls the model inside an environment. So, in short, we can summarize it like this:

- LLM: the raw model
- Reasoning model: an LLM optimized to output intermediate reasoning traces and to verify itself more
- Agent: a loop that uses a model plus tools, memory, and environment feedback
- Agent harness: the software scaffold around an agent that manages context, tool use, prompts, state, and control flow
- Coding harness: a special case of an agent harness; i.e., a task-specific harness for software engineering that manages code context, tools, execution, and iterative feedback

Figure 3: A coding harness combines three layers: the model family, an agent loop, and runtime supports. The model provides the “engine”, the agent loop drives iterative problem solving, and the runtime supports provide the plumbing. Within the loop, “observe” collects information from the environment, “inspect” analyzes that information, “choose” selects the next step, and “act” executes it.
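Since the observe/inspect/choose/act loop is the heart of the agent layer, here is a toy sketch of it in plain Python. This is my own illustrative code, not taken from any real harness: the “model” is just a callable that picks the next tool, and all names (`run_agent`, `choose_action`) are made up.

```python
def run_agent(goal, choose_action, tools, max_steps=10):
    """Toy agent loop (illustrative only): observe -> choose -> act until done."""
    history = []
    for _ in range(max_steps):
        # "Observe"/"inspect": the context the model sees is the goal plus past results.
        observation = {"goal": goal, "history": list(history)}
        # "Choose": the model (here just a callable) picks a tool and its arguments.
        action = choose_action(observation)
        if action["tool"] == "done":
            return action.get("result"), history
        # "Act": the harness validates the tool name before executing anything.
        if action["tool"] not in tools:
            history.append({"action": action, "error": "unknown tool"})
            continue
        result = tools[action["tool"]](**action.get("args", {}))
        history.append({"action": action, "result": result})
    return None, history
```

A real harness would replace `choose_action` with an LLM call and add approval gates, but the control flow is the same: the model proposes, the harness validates and executes, and the result feeds back into the next observation.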
The takeaway here is that a good coding harness can make both a reasoning and a non-reasoning model feel much stronger than it does in a plain chat box, because it helps with context management and more.

The Coding Harness

As mentioned in the previous section, when we say harness, we typically mean the software layer around the model that assembles prompts, exposes tools, tracks file state, applies edits, runs commands, manages permissions, caches stable prefixes, stores memory, and more. Today, when using LLMs, this layer shapes most of the user experience compared to prompting the model directly or using a web chat UI (which is closer to “chat with uploaded files”). Since, in my view, the vanilla versions of today’s LLMs have very similar capabilities (e.g., GPT-5.4, Opus 4.6, and GLM-5 or so), the harness can often be the distinguishing factor that makes one LLM work better than another. This is speculative, but I suspect that if we dropped one of the latest, most capable open-weight LLMs, such as GLM-5, into a similar harness, it could likely perform on par with GPT-5.4 in Codex or Claude Opus 4.6 in Claude Code. That said, some harness-specific post-training is usually beneficial. For example, OpenAI historically maintained separate GPT-5.3 and GPT-5.3-Codex variants. In the next section, I want to go into the specifics and discuss the core components of a coding harness using my Mini Coding Agent: https://github.com/rasbt/mini-coding-agent.

Figure 4: Main harness features of a coding agent / coding harness that will be discussed in the following sections.

By the way, in this article, I use the terms “coding agent” and “coding harness” somewhat interchangeably for simplicity. (Strictly speaking, the agent is the model-driven decision-making loop, while the harness is the surrounding software scaffold that provides context, tools, and execution support.)
Figure 5: Minimal but fully working, from-scratch Mini Coding Agent (implemented in pure Python)

Anyways, below are the six main components of coding agents. You can check out the source code of my minimal but fully working, from-scratch Mini Coding Agent (implemented in pure Python) for more concrete code examples. The code annotates the six components discussed below via code comments.

1. Live Repo Context

This is maybe the most obvious component, but it is also one of the most important ones. When a user says “fix the tests” or “implement xyz,” the model should know whether it is inside a Git repo, what branch it is on, which project documents might contain instructions, and so on. That’s because those details often change or affect what the correct action is. For example, “fix the tests” is not a self-contained instruction. If the agent sees AGENTS.md or a project README, it may learn which test command to run. If it knows the repo root and layout, it can look in the right places instead of guessing. Also, the Git branch, status, and commits can help provide more context about what changes are currently in progress and where to focus.

Figure 6: The agent harness first builds a small workspace summary that gets combined with the user request for additional project context.

The takeaway is that the coding agent collects info (“stable facts” as a workspace summary) upfront before doing any work, so that it is not starting from zero, without context, on every prompt.

2. Prompt Shape And Cache Reuse

Once the agent has a repo view, the next question is how to feed that information to the model. The previous figure showed a simplified view of this (“Combined prompt: prefix + request”), but in practice, it would be relatively wasteful to combine and re-process the workspace summary on every user query. That is, coding sessions are repetitive, and the agent rules usually stay the same. The tool descriptions usually stay the same, too.
And even the workspace summary usually stays (mostly) the same. The main changes are usually the latest user request, the recent transcript, and maybe the short-term memory. “Smart” runtimes don’t rebuild everything as one giant undifferentiated prompt on every turn, as illustrated in the figure below.

Figure 7: The agent harness builds a stable prompt prefix, adds the changing session state, and then feeds that combined prompt to the model.

The main difference from section 1 is that section 1 was about gathering repo facts; here, we are interested in packaging and caching those facts efficiently for repeated model calls. The “stable” in “stable prompt prefix” means that the information contained there doesn’t change much. It usually contains the general instructions, tool descriptions, and the workspace summary. We don’t want to waste compute on rebuilding it from scratch in each interaction if nothing important has changed. The other components are updated more frequently (usually each turn). This includes the short-term memory, the recent transcript, and the newest user request. In short, the caching aspect of the “stable prompt prefix” is simply that a smart runtime tries to reuse that part.

3. Tool Access and Use

Tool access and tool use are where it starts to feel less like chat and more like an agent. A plain model can suggest commands in prose, but an LLM in a coding harness should do something narrower and more useful: actually execute the command and retrieve the results (versus us running the command manually and pasting the results back into the chat). But instead of letting the model improvise arbitrary syntax, the harness usually provides a pre-defined list of allowed, named tools with clear inputs and clear boundaries. (Of course, something like Python can be part of this, so that the agent could also execute an arbitrarily wide range of shell commands.) The tool-use flow is illustrated in the figure below.
Figure 8: The model emits a structured action, the harness validates it, optionally asks for approval, executes it, and feeds the bounded result back into the loop.

To illustrate this, below is an example of how this usually looks to the user in my Mini Coding Agent. (This is not as pretty as Claude Code or Codex because it is very minimal and uses plain Python without any external dependencies.)

Figure 9: Illustration of a tool call approval request in the Mini Coding Agent.

Here, the model has to choose an action that the harness recognizes, like list files, read a file, search, run a shell command, write a file, etc. It also has to provide arguments in a shape that the harness can check. So when the model asks to do something, the runtime can stop and run programmatic checks like “Is this a known tool?”, “Are the arguments valid?”, “Does this need user approval?”, “Is the requested path even inside the workspace?” Only after those checks pass does anything actually run. While running coding agents, of course, carries some risk, these harness checks also improve reliability because the model doesn’t execute totally arbitrary commands. Besides rejecting malformed actions and approval gating, file access can be kept inside the repo by checking file paths. In a sense, the harness gives the model less freedom, but it also improves usability at the same time.

4. Context Compaction

Context bloat is not a unique problem of coding agents but an issue for LLMs in general. Sure, LLMs support longer and longer contexts these days (and I recently wrote about the attention variants that make this computationally more feasible), but long contexts are still expensive and can also introduce additional noise (if there is a lot of irrelevant info). Coding agents are even more susceptible to context bloat than regular LLMs during multi-turn chats because of repeated file reads, lengthy tool outputs, logs, etc. If the runtime keeps all of that at full fidelity, it will run out of available context tokens pretty quickly. So, a good coding harness is usually pretty sophisticated about handling context bloat, going beyond just cutting or summarizing information like regular chat UIs. Conceptually, the context compaction in coding agents might work as summarized in the figure below. Specifically, we are zooming a bit further into the clip (step 6) part of Figure 8 in the previous section.

Figure 10: Large outputs are clipped, older reads are deduplicated, and the transcript is compressed before it goes back into the prompt.

A minimal harness uses at least two compaction strategies to manage that problem. The first is clipping, which shortens long document snippets, large tool outputs, memory notes, and transcript entries. In other words, it prevents any one piece of text from taking over the prompt budget just because it happened to be verbose. The second strategy is transcript reduction or summarization, which turns the full session history (more on that in the next section) into a smaller promptable summary. A key trick here is to keep recent events richer because they are more likely to matter for the current step. And we compress older events more aggressively because they are likely less relevant. Additionally, we also deduplicate older file reads so the model does not keep seeing the same file content over and over again just because it was read multiple times earlier in the session.
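As a rough sketch of these two strategies plus read deduplication, consider the toy function below. This is my own illustrative code with made-up field names (`kind`, `path`, `text`), not how Claude Code, Codex, or the Mini Coding Agent actually implement it.

```python
def compact_transcript(events, clip_len=200, keep_recent=5):
    """Toy compaction: dedupe repeated file reads, clip long entries,
    and compress older events harder than recent ones."""
    # Walk newest-to-oldest so duplicate reads of a path keep only the latest copy.
    seen_paths, kept = set(), []
    for event in reversed(events):
        if event.get("kind") == "file_read":
            if event["path"] in seen_paths:
                continue
            seen_paths.add(event["path"])
        kept.append(event)
    kept.reverse()

    compacted = []
    for i, event in enumerate(kept):
        text = event["text"]
        if len(text) > clip_len:  # clipping: no single entry dominates the budget
            text = text[:clip_len] + " …[clipped]"
        if i < len(kept) - keep_recent:  # older events get compressed more aggressively
            text = text[: clip_len // 4] + " …[compressed]"
        compacted.append({**event, "text": text})
    return compacted
```

A production harness would summarize with the model itself rather than truncating, but the budget logic is the same idea: recent stays rich, old gets compressed, duplicates get dropped.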
Overall, I think this is one of the underrated, boring parts of good coding-agent design. A lot of apparent “model quality” is really context quality.

5. Structured Session Memory

In practice, all six core concepts covered here are highly intertwined, and the different sections and figures cover them with different focuses or zoom levels. In the previous section, we covered the prompt-time use of history and how we build a compact transcript. The question there is: how much of the past should go back into the model on the next turn? So the emphasis is compression, clipping, deduplication, and recency. This section, structured session memory, is about the storage-time structure of history. The question here is: what does the agent keep over time as a permanent record? The emphasis is that the runtime keeps a fuller transcript as durable state, alongside a lighter memory layer that is smaller and gets modified and compacted rather than just appended to. To summarize, a coding agent separates state into (at least) two layers:

- working memory: the small, distilled state the agent keeps explicitly
- a full transcript: this covers all the user requests, tool outputs, and LLM responses

Figure 11: New events get appended to a full transcript and summarized in a working memory. The session files on disk are usually stored as JSON files.

The figure above illustrates the two main session files, the full transcript and the working memory, which usually get stored as JSON files on disk. As mentioned before, the full transcript stores the whole history, and it’s resumable if we close the agent. The working memory is more of a distilled version with the currently most important info, which is somewhat related to the compact transcript. But the compact transcript and working memory have slightly different jobs. The compact transcript is for prompt reconstruction.
Its job is to give the model a compressed view of recent history so it can continue the conversation without seeing the full transcript every turn. The working memory is meant more for task continuity. Its job is to keep a small, explicitly maintained summary of what matters across turns: things like the current task, important files, and recent notes. Following step 4 in the figure above, the latest user request, together with the LLM response and tool output, would then be recorded as a “new event” in both the full transcript and the working memory in the next round (not shown, to reduce clutter in the figure).

6. Delegation With (Bounded) Subagents

Once an agent has tools and state, one of the next useful capabilities is delegation. The reason is that it allows us to parallelize certain work into subtasks via subagents and speed up the main task. For example, the main agent may be in the middle of one task and still need a side answer: which file defines a symbol, what a config says, or why a test is failing. It is useful to split that off into a bounded subtask instead of forcing one loop to carry every thread of work at once. (In my mini coding agent, the implementation is simpler, and the child still runs synchronously, but the underlying idea is the same.) A subagent is only useful if it inherits enough context to do real work. But if we don’t restrict it, we now have multiple agents duplicating work, touching the same files, or spawning more subagents, and so on. So the tricky design problem is not just how to spawn a subagent but also how to bind one :).

Figure 12: The subagent inherits enough context to be useful, but it runs inside tighter boundaries than the main agent.

The trick here is that the subagent inherits enough context to be useful but is also constrained (for example, read-only and restricted in recursion depth). Claude Code has supported subagents for a long time, and Codex added them more recently.
Codex does not generally force subagents into read-only mode. Instead, they usually inherit much of the main agent’s sandbox and approval setup. So, the boundary is more about task scoping, context, and depth.

Components Summary

The sections above tried to cover the main components of coding agents. As mentioned before, they are more or less deeply intertwined in their implementation. However, I hope that covering them one by one helps with the overall mental model of how coding harnesses work, and why they can make an LLM more useful compared to simple multi-turn chats.

Figure 13: Six main features of a coding harness discussed in the previous sections.

If you are interested in seeing these implemented in clean, minimalist Python code, you may like my Mini Coding Agent.

How Does This Compare To OpenClaw?

OpenClaw may be an interesting comparison, but it is not quite the same kind of system. OpenClaw is more like a local, general agent platform that can also code, rather than a specialized (terminal) coding assistant. There are still several overlaps with a coding harness:

- it uses prompt and instruction files in the workspace, such as AGENTS.md, SOUL.md, and TOOLS.md
- it keeps JSONL session files and includes transcript compaction and session management
- it can spawn helper sessions and subagents

However, as mentioned above, the emphasis is different. Coding agents are optimized for a person working in a repository and asking a coding assistant to inspect files, edit code, and run local tools efficiently. OpenClaw is more optimized for running many long-lived local agents across chats, channels, and workspaces, with coding as one important workload among several others.
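To circle back to section 3 once more before closing: the programmatic checks (“Is this a known tool?”, “Are the arguments valid?”, “Is the requested path even inside the workspace?”) could be sketched as follows. The tool registry and return shape here are hypothetical, chosen for illustration only.

```python
from pathlib import Path

# Hypothetical tool registry: tool name -> exact set of required argument names.
TOOLS = {
    "read_file": {"path"},
    "run_shell": {"command"},
}

def validate_action(action, workspace):
    """Run the harness-side checks before any tool call executes."""
    # "Is this a known tool?"
    if action.get("tool") not in TOOLS:
        return False, "unknown tool"
    # "Are the arguments valid?" (exact argument names, nothing missing or extra)
    if set(action.get("args", {})) != TOOLS[action["tool"]]:
        return False, "bad arguments"
    # "Is the requested path even inside the workspace?"
    path = action["args"].get("path")
    if path is not None:
        root = Path(workspace).resolve()
        target = (root / path).resolve()
        if not target.is_relative_to(root):  # Python 3.9+
            return False, "path escapes workspace"
    return True, "ok"
```

An approval gate (“Does this need user approval?”) would sit between a passing validation and the actual execution.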
I’ve spent the last few months figuring out how best to use LLMs to build software. In January and February, I used Claude Code to build a little programming language in C. In December, I used a local LLM to analyze all the journal entries I wrote in 2025, and then used Gemini to write scripts that could visualize that data. Besides what I’ve written about publicly, I’ve also used Claude Code to: I won’t lie, I started off skeptical about the ability of LLMs to write code, but I can’t deny the fact that, in 2026, they can produce code that’s as good or better than a junior-to-intermediate developer for most programming domains. If you’re abstaining from learning about or using LLMs in your own work, you’re doing a disservice to yourself and your career. It’s a very real possibility that in five years, most of the code we write will be produced using an LLM. It’s not a certainty, but it’s a strong possibility. However, I’m not going to stop writing code by hand. Not anytime soon. As long as there are computers to program, I will be programming them using my own two fleshy human hands. I started programming computers because I enjoy the act of programming. I enjoy thinking through problems, coming up with solutions, evolving those solutions so that they are as correct and clear as possible, and then putting them out into the world where they can be of use to people. It’s a fun and fulfilling profession. Some people see the need for writing code as an impediment to getting good use out of a computer. In fact, some of the most avid fans of generative AI believe that the act of actually doing the work is a punishment. They see work as unnecessary friction that must be optimized away. Truth is, the friction inherent in doing any kind of work—writing, programming, making music, painting, or any other creative activity generative AI purports to replace—is the whole point. The artifacts you produce as the result of your hard work are not important. They are incidental.
The work itself is the point. When you do the work, you change and grow and become more yourself. Work—especially creative work—is an act of self-love if you choose to see it that way. Besides, when you rely on generative AI to do the work, you miss out on the pleasurable sensations of being in flow state. Your skills atrophy (no, writing good prompts is not a skill, any idiot can do it). Your brain gets saturated with dopamine in the same way as when you gamble, doomscroll, or play a gacha game. Using Claude Code as your main method of producing code is like scrolling TikTok eight hours a day, every day, for work. And the worst part? The code you produce using LLMs is pure cognitive debt. You have no idea what it’s doing, only that it seems to be doing what you want it to do. You don’t have a mental model for how it works, and you can’t fix it if it breaks in production. Such a codebase is not an asset but a liability. I predict that in 1-3 years we’re going to see organizations rewrite their LLM-generated software using actual human programmers. Personally, I’ve stopped using generative AI to write code for my personal projects. I still use Claude Code as a souped-up search engine to look up information, or to help me debug nasty errors. But I’m manually typing every single line of code in my current Django project, with my own fingers, using a real physical keyboard. I’m even thinking up all the code using my own brain. Miraculous! For the commercial projects I work on for my clients, I’m going to follow whatever the norms around LLM use happen to be at my workplace. If a client requires me to use Claude Code to write every single line of code, I’ll be happy to oblige. If they ban LLMs outright, I’m fine with that too. After spending hundreds of hours yelling at Claude, I’m dangerously proficient at getting it to do the right thing. But I haven’t lost my programming skills yet, and I don’t plan to. I’m flexible.
Given the freedom to choose, I’d probably pick a middle path: use LLMs to generate boilerplate code, write tricky test cases, debug nasty issues I can’t figure out, and quickly prototype ideas to test. I’m not an AI vegan. But when it comes to code I write for myself—which includes the code that runs this website—I’m going to continue writing it myself, line by line, like I always did. Somebody has to clean up after the robots when they make a mess, right?

- Write and debug Emacs Lisp for my personal Emacs configuration.
- Write several Alfred workflows (in Bash, AppleScript, and Swift) to automate tasks on my computer.
- Debug CSS issues on this very website.
- Generate React components for a couple of throwaway side projects.
- Generate Django apps for a couple of throwaway side projects.
- Port color themes between text editors.
- A lot more that I’m forgetting now.
In ‘small thoughts’ posts, I’m posting a collection of short thoughts and opinions that don’t warrant their own post. :) It’s been a while! I know self love exists, because I feel it and my body lives it (most of the time). I know it’s easy to pretend that self love doesn’t exist, because a (negative) ego should not exist; but in my view, seeing yourself as a part of a whole instead, in order to cope with depression, is also done out of self love. Your body wants to survive. Even if you hate yourself, there is a part in you that mental illness cannot touch (for now?) that wants to heal and seek ways to live regardless and make it bearable. If you have to pretend that self love isn’t real because you can’t consciously do it yet, that’s fine. But I know it is there, because otherwise I would not care about what I eat or drink, about healing illnesses, about fitness, about community, about higher goals than survival, like education and hobbies. I wanna enrich my body and my mind. I wanna act on my potential. I wanna be the best partner and friend I can be. Loving myself made loving others better, easier, healthier. Loving myself makes me show up for communities better. Loving myself makes me sacrifice for others within my boundaries, without burning out and without resentment. Self love is seldom selfish. It doesn’t have to be. I think there’s a misconception that self love inherently includes self-obsessed navel-gazing, and that in turn makes you constantly nitpick yourself and your life and focus on what you deserve but aren’t getting, and therefore makes you sad, but I disagree. It doesn’t have to be narcissistic and obsessive at all. It doesn’t have to mean putting yourself before others constantly, just in a way where you’ll put the oxygen mask on yourself before you help others put theirs on. Sometimes I am afraid to show my kindness online.
My kindness naturally ebbs and flows - never truly gone, but there are phases where I really go out of my way and go extra hard, and phases where it’s just basic kindness. But like everyone, I can have bad days or a disagreement, I am low energy, my patience runs out, or I need to criticize someone, set boundaries, or call something out. In theory, all of that can be done kindly, or need not detract from kindness. They can even be kindness. But in practice, some people don’t respect things until you say them in a very sharp tone, or until you let your negative feelings show. And realistically, the second you don’t let people do what they want, or you criticize them, no matter how softly, they’ll see you as unkind. Kindness, to them, is you always being unconditionally supportive. I am a little scared of people feeling tricked when I can’t keep up a strong habit of kindness at all times. I’ve seen it in the past, when people who made kindness one of their most prominent features (or were just seen that way) were dragged for suddenly being unkind - like arguing with someone, being rude during a bad day, or whatever. It was unfair. People felt as if they had used kindness to cover up actually being an asshole, like being kind was just an act. It makes me sad and scared that one bad moment can undo months or years of consistent kindness. I don’t know if I can be that perfect. On one hand, I get it - there really are these love & light girlies who preach all that stuff but are really toxic, mean, and gossipy in real life. I acknowledge the stories of past school bullies always posting about ‘positive vibes only’ online. But it also makes it hard to show open kindness without putting yourself in a very limiting box of perfect behavior.
Not to mention that there’s a gender aspect to it too: higher kindness requirements for women, more situations where you’re required to be kind, and normal behaviors read as unkindness because you’re not a servant, doormat, motherly, etc.; looking young, feminine, maybe wearing dresses, or predominantly pink stuff increases the effects, in my experience. People, especially men, expect me to be a lot more motherly, forgiving, patient, kind, and servant-like than I am. I actually just want to act and behave like myself, without someone slapping onto me the expectation that I must be very kind or motherly as a defining trait, only to accidentally violate that invisible role and have people claim that a rude moment is somehow my true self and all the genuine kindness was a mask. Hot take I am willing to change my mind about: I think, looking back, it was a mistake that we saw people liking, commenting on, and following the same individual (as in, the social media account of a private person, not a company or band, etc.) as a “community”. There is no community-building or organizing going on in the comment sections of LA influencers, for example. Your readers (or viewers) all consume it independently from one another. There is barely any interaction between them, and it is often not positive or in-depth. No one bonds over “😂😂😂” or “Agreed.” or “Good post.” or some summary of the post. The views and likes you get are also partly from people who checked you out once, and that’s that. Really, the people who see you online have nothing in common most times. They’re most often not gathering under a shared message, movement, or art style, nor do they really know each other, and pretending it is so has played a role in para-social behaviors. Implying you have a community or fanbase as a simple social media account or blogger is like implying the people who watch the same ads on YouTube are one. Published 04 Apr, 2026
Top Matter: Codeberg for the library, doc for the library. I've forked Lea Anthony's library that eventually made its way into core Wails for two reasons:

- I want it in Wails 3 and it's not there
- I want to shave a meg off the binary size by not providing the embedded installer exe

So here we are.
Remember that failed experiment where I ran Jellyfin off of a LattePanda V1? Do you recall all the parts where I said what this single board computer cannot do? Yeah, I remember. Then I took it and put two of my most critical services on it: the blog you're reading right now, and my Wireguard setup. Trust me, it makes more sense with some context.

The board is incapable of doing anything other than serving content from the eMMC module, and it has a functioning network port. It doesn't seem to crash in these scenarios. When I try anything else with this board, especially things that involve USB connectivity, things break. This makes the board ideal for a light workload that needs to be up 24/7. The biggest threat to my uptime is not internet connectivity or loss of power (although that did happen for the first time in a year recently), it's me getting new ideas to try out on my setup, which results in downtime. This board is so unreliable for trying those ideas out that it removes any and all temptation to do so, resulting in a computer that has the highest chance of actually being up and running for a very long time.

To play things safe, I used an IKEA SJÖSS 20W USB-C power adapter that I got for 3 EUR, with a cheap USB-C to USB-A adapter thrown into the mix. It looks janky, but the adapter outputs 5V 3A, which makes it the beefiest power adapter in my fleet for plain USB-A powered devices. I then hit the board with some stress tests, including maxing out the 2 GB of memory. It ran really well for days, no issues at all.

I also improved the cooling situation. I am now the proud owner of an assortment of M2, M2.5 and M3 screws and bits, and equipped with a Makita cordless drill, I made some mounting holes in an old aluminium server heat sink. The drilling was a complete hack job, everything was misaligned, but it was good enough. Certainly better than holding the board and heat sink together with thin velcro strips.
The cooling performance is completely adequate: the board hits a maximum of 65°C with the heat sink facing down, well below the point at which the board starts to throttle its CPU. The theoretical maximum Wireguard throughput on this board is about 340 Mbit/s, measured using the fantastic wg-bench solution.

Remember the part about the USB ports being flaky? Yeah. That didn't stop me from getting a USB Gigabit Ethernet adapter to remove one of the main limitations of the LattePanda V1. Based on vibe-recommendations by Claude, I got a TP-Link UE300 for its alleged low power usage and its availability at a local computer store in Estonia. It seems to work well enough: you can push gigabit speeds through it measured by , and the actual Wireguard throughput I could push through it with a real workload was about 420 Mbit/s, higher than indicated by the benchmark and plenty fast for most workloads, especially on external networks that are usually slower than that. A few hours after making that change, a HN post put some mild load on the LattePanda V1; what good timing.

As of publishing this post, the blog has been running mostly off of the LattePanda V1 for over a month now, with the gap being caused by contemplating getting that USB Ethernet adapter and temporarily running the blog and Wireguard off of another mini PC during that time. Did you notice?
About five months ago I wrote about Absurd, a durable execution system we built for our own use at Earendil, sitting entirely on top of Postgres and Postgres alone. The pitch was simple: you don't need a separate service, a compiler plugin, or an entire runtime to get durable workflows. You need a SQL file and a thin SDK. Since then we've been running it in production, and I figured it's worth sharing what the experience has been like. The short version: the design held up, the system has been a pleasure to work with, and other people seem to agree.

Absurd is a durable execution system that lives entirely inside Postgres. The core is a single SQL file (absurd.sql) that defines stored procedures for task management, checkpoint storage, event handling, and claim-based scheduling. On top of that sit thin SDKs (currently TypeScript, Python, and an experimental Go one) that make the system ergonomic in your language of choice. The model is straightforward: you register tasks, decompose them into steps, and each step acts as a checkpoint. If anything fails, the task retries from the last completed step. Tasks can sleep, wait for external events, and suspend for days or weeks. All state lives in Postgres. If you want the full introduction, the original blog post covers the fundamentals. What follows here is what we've learned since.

The project got multiple releases over the last five months. Most of the changes are things you'd expect from a system that people actually started depending on: hardened claim handling, watchdogs that terminate broken workers, deadlock prevention, proper lease management, event race conditions, and all the edge cases that only show up when you're running real workloads. A few things worth calling out specifically.

Decomposed steps. The original design only had , where you pass in a function and get back its checkpointed result. That works well for many cases but not all.
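For a concrete picture of the basic model that the decomposed variant extends, here is a minimal sketch of step-as-checkpoint in Python. This is an illustration of the idea only: `TaskContext`, `step`, and `send_invoice` are hypothetical names, not Absurd's actual SDK API, and an in-memory dict stands in for the Postgres-backed checkpoint storage.

```python
# Sketch of the checkpoint-based model: each step's result is saved under a
# key, and on retry the saved result is returned instead of re-running the
# step. (Hypothetical names; Absurd persists this state in Postgres.)

class TaskContext:
    def __init__(self, store):
        self.store = store  # checkpoint storage (a dict here for illustration)

    def step(self, key, fn):
        if key in self.store:   # step already completed: reuse its result
            return self.store[key]
        result = fn()           # first run: execute and checkpoint
        self.store[key] = result
        return result

def send_invoice(ctx):
    total = ctx.step("compute-total", lambda: 40 + 2)
    receipt = ctx.step("format-receipt", lambda: f"total={total}")
    return receipt

# Simulate a crash and retry: the same store is passed to a fresh context,
# so completed steps are skipped on the second run.
store = {}
first = send_invoice(TaskContext(store))
second = send_invoice(TaskContext(store))
assert first == second == "total=42"
```

The point is only that a step boundary is a durable cache entry; everything between boundaries re-executes freely on retry.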
Sometimes you need to know whether a step already ran before deciding what to do next. So we added / , which give you a handle you can inspect before committing the result. This turned out to be very useful for modeling intentional failures and conditional logic. This in particular is necessary when working with "before call" and "after call" type hook APIs.

Task results. You can now spawn a task, go do other things, and later come back to fetch or await its result. This sounds obvious in hindsight, but the original system was purely fire-and-forget. Having proper result inspection made it possible to use Absurd for things like spawning child tasks from within a parent workflow and waiting for them to finish. This is particularly useful for debugging with agents too.

absurdctl. We built this out as a proper CLI tool. You can initialize schemas, run migrations, create queues, spawn tasks, emit events, retry failures from the command line. It's installable via or as a standalone binary. This has been invaluable for debugging production issues. When something is stuck, being able to just and see exactly where it stopped is a very different experience from digging through logs.

Habitat. A small Go application that serves up a web dashboard for monitoring tasks, runs, checkpoints, and events. It connects directly to Postgres and gives you a live view of what's happening. It's simple, but it's the kind of thing that makes the system more enjoyable for humans.

Agent integration. Since Absurd was originally built for agent workloads, we added a bundled skill that coding agents can discover and use to debug workflow state via . There's also a documented pattern for making pi agent turns durable by logging each message as a checkpoint.

The thing I'm most pleased about is that the core design didn't need to change all that much. The fundamental model of tasks, steps, checkpoints, events, and suspending is still exactly what it was initially.
We added features around it, but nothing forced us to rethink the basic abstractions. Putting the complexity in SQL and keeping the SDKs thin turned out to be a genuinely good call. The TypeScript SDK is about 1,400 lines. The Python SDK is about 1,900 lines, but most of that comes from the complexity of supporting colored functions. Compare that to Temporal's Python SDK at around 170,000 lines. It means the SDKs are easy to understand, easy to debug, and easy to port. When something goes wrong, you can read the entire SDK in an afternoon and understand what it does.

The checkpoint-based replay model also aged well. Unlike systems that require deterministic replay of your entire workflow function, Absurd just loads the cached step results and skips over completed work. That means your code doesn't need to be deterministic outside of steps. You can call or in between steps and things still work, because only the step boundaries matter. In practice, this makes it much easier to reason about what's safe and what isn't.

Pull-based scheduling was the right choice too. Workers pull tasks from Postgres as they have capacity. There's no coordinator, no push mechanism, no HTTP callbacks. That makes it trivially self-hostable and means you don't have to think about load management at the infrastructure level.

I had some discussions with folks about whether the right abstraction should have been a durable promise. It's a very appealing idea, but it turns out to be much more complex to implement in practice; in theory, however, it's also more powerful. I did make some attempts to see what Absurd would look like if it were based on durable promises, but so far I haven't gotten anywhere with it. It's an experiment that I think would be fun to try, though!

The primary use case is still agent workflows. An agent is essentially a loop that calls an LLM, processes tool results, and repeats until it decides it's done. Each iteration becomes a step, and each step's result is checkpointed.
If the process dies on iteration 7, it restarts and replays iterations 1 through 6 from the store, then continues from 7. But we've found it useful for a lot of other things too. All our crons just dispatch distributed workflows with a pre-generated deduplication key from the invocation. We can have two cron processes running and they will only trigger one Absurd task invocation. We also use it for background processing that needs to survive deploys. Basically anything where you'd otherwise build your own retry-and-resume logic on top of a queue.

Absurd is deliberately minimal, but there are things I'd like to see. There's no built-in scheduler. If you want cron-like behavior, you run your own scheduler loop and use idempotency keys to deduplicate. That works, and we have a documented pattern for it, but it would be nice to have something more integrated.

There's no push model. Everything is pull. If you need an HTTP endpoint to receive webhooks and wake up tasks, you build that yourself. I think that's the right default, as push systems are harder to operate and easier to overwhelm, but there are cases where it would be convenient. In particular, there are quite a few agentic systems where it would be super nice to have webhooks natively integrated (wake on incoming POST request). I definitely don't want to have this in the core, but it sounds like the kind of problem that could be a nice adjacent library that builds on top of Absurd.

The biggest omission is that it does not support partitioning yet. That's unfortunate because it makes cleaning up data more expensive than it has to be. In theory, supporting partitions would be pretty simple: you could have weekly partitions and then detach and delete them when they expire. The only thing that really stands in the way is that Postgres does not have a convenient way of actually doing that. The hard part is not partitioning itself, it's partition lifecycle management under real workloads.
If a worker inserts a row whose lands in a month without a partition, the insert fails and the workflow crashes. So you need a separate maintenance loop that always creates future partitions far enough ahead for sleeps and retries, and does that for every queue. On the delete side, the safe approach is , but getting that to run from doesn't work because it cannot be run within a transaction, but runs everything in one. I don't think it's an unsolvable problem, but it's one I have not found a good solution for, and I would love to get input on .

This brings me to a meta point: what is the point of open source libraries in the age of agentic engineering? Durable execution is now something that plenty of startups sell you. On the other hand, it's also something that an agent would build for you, and people might not even look for solutions any more. It's kind of … weird? I don't think a durable execution library can support a company, I really don't. On the other hand, I think it's just complex enough of a problem that it could be a good open source project void of commercial interests. You do need a bit of an ecosystem around it, particularly for UI and good DX for debugging, and that's hard to get from a throwaway implementation. I don't think we have squared this yet, but it's already much better to use than a few months ago.

If you're using Absurd, thinking about it, or building adjacent ideas, I'd love your feedback. Bug reports, rough edges, design critiques, and contributions are all very welcome. This project has gotten better every time someone poked at it from a different angle.
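As an aside on the maintenance loop described above: the "create future partitions far enough ahead" bookkeeping reduces to a pure function. This is a sketch under assumptions; the `tasks_YYYY_MM_DD` naming and Monday-aligned weekly partitions are made up for illustration and are not Absurd's schema.

```python
# Sketch of maintenance-loop bookkeeping: given today's date and a horizon,
# compute which weekly partitions must exist so a future insert never lands
# in a missing partition. (Hypothetical partition naming, not Absurd's.)
from datetime import date, timedelta

def weekly_partitions_needed(today, weeks_ahead):
    # Align to the Monday of the current week, then enumerate week starts
    # up to the horizon.
    monday = today - timedelta(days=today.weekday())
    starts = [monday + timedelta(weeks=i) for i in range(weeks_ahead + 1)]
    # Each entry: (partition name, inclusive start, exclusive end)
    return [(f"tasks_{s:%Y_%m_%d}", s, s + timedelta(weeks=1)) for s in starts]

parts = weekly_partitions_needed(date(2026, 4, 8), weeks_ahead=2)  # a Wednesday
assert parts[0] == ("tasks_2026_04_06", date(2026, 4, 6), date(2026, 4, 13))
assert len(parts) == 3
```

The hard part the post describes is everything around this: actually issuing the DDL for every queue on a schedule, and detaching and dropping expired partitions outside a transaction.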
Welcome back to compiler land. Today we're going to talk about value numbering, which is like SSA, but more.

Static single assignment (SSA) gives names to values: every expression has a name, and each name corresponds to exactly one expression. It transforms programs like this: where the variable is assigned more than once in the program text, into programs like this: where each assignment to has been replaced with an assignment to a new fresh name. It's great because it makes clear the differences between the two expressions. Though they textually look similar, they compute different values. The first computes 1 and the second computes 2. In this example, it is not possible to substitute in a variable and re-use the value of , because the s are different.

But what if we see two "textually" identical instructions in SSA? That sounds much more promising than non-SSA because the transformation into SSA form has removed (much of) the statefulness of it all. When can we re-use the result? Identifying instructions that are known at compile-time to always produce the same value at run-time is called value numbering.

To understand value numbering, let's extend the above IR snippet with two more instructions, v3 and v4. In this new snippet, v3 looks the same as v1: adding v0 and 1. Assuming our addition operation is some ideal mathematical addition, we can absolutely re-use v1; no need to compute the addition again. We can rewrite the IR to something like: This is kind of similar to the destructive union-find representation that JavaScriptCore and a couple other compilers use, where the optimizer doesn't eagerly re-write all uses but instead leaves a little breadcrumb / instruction 1 . We could then run our copy propagation pass ("union-find cleanup"?) and get: Great.

But how does this happen? How does an optimizer identify reusable instruction candidates that are "textually identical"? Generally, there is no actual text in the IR .
One popular solution is to compute a hash of each instruction. Then any instructions with the same hash (that also compare equal, in case of collisions) are considered equivalent. This is called hash-consing. When trying to figure all this out, I read through a couple of different implementations. I particularly like the Maxine VM implementation. For example, here are the (hashing) and functions for most binary operations, slightly modified for clarity: The rest of the value numbering implementation assumes that if a function returns 0, it does not wish to be considered for value numbering.

Why might an instruction opt out of value numbering? Usually because it is not "pure". Purity is in the eye of the beholder, but in general it means that an instruction does not interact with the state of the outside world, except for trivial computation on its operands. (What does it mean to de-duplicate/cache/reuse ?) A load from an array object is also not a pure operation 2 . The load operation implicitly relies on the state of the memory. Also, even if the array were known-constant, in some runtime systems the load might raise an exception. Changing the source location where an exception is raised is generally frowned upon. Languages such as Java often have requirements about where exceptions are raised codified in their specifications. We'll work only on pure operations for now, but we'll come back to this later. We do often want to optimize impure operations as well!

We'll start off with the simplest form of value numbering, which operates only on linear sequences of instructions, like basic blocks or traces. Let's build a small implementation of local value numbering (LVN). We'll start with straight-line code: no branches or anything tricky. Most compiler optimizations on control-flow graphs (CFGs) iterate over the instructions "top to bottom" 3 and it seems like we can do the same thing here too.
From what we've seen so far optimizing our made-up IR snippet, we can do something like this: The find-and-replace, remember, is not a literal find-and-replace, but instead something like: (if you have been following along with the toy optimizer series) This several-line function (as long as you already have a hash map and a union-find available to you) is enough to build local value numbering! And real compilers are built this way, too. If you don't believe me, take a look at this slightly edited snippet from Maxine's value numbering implementation. It has all of the components we just talked about: iterating over instructions, map lookup, and some substitution. This alone will get you pretty far. Code generators of all shapes tend to leave messy repeated computations all over their generated code, and this will make short work of them.

Sometimes, though, your computations are spread across control flow—over multiple basic blocks. What do you do then? Computing value numbers for an entire function is called global value numbering (GVN) and it requires dealing with control flow (ifs, loops, etc.). I don't just mean that, for an entire function, we run local value numbering block-by-block. Global value numbering implies that expressions can be de-duplicated and shared across blocks.

Let's tackle control flow case by case. First is the simple case from above: one block. In this case, we can go top to bottom with our value numbering and do alright. The second case is also reasonable to handle: one block flowing into another. In this case, we can still go top to bottom. We just have to find a way to iterate over the blocks. If we're not going to share value maps between blocks, the order doesn't matter. But since the point of global value numbering is to share values, we have to iterate them in topological order (reverse post-order, RPO). This ensures that predecessors get visited before successors. If you have , we have to visit first and then .
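The local pass described above (iterate over instructions, hash-map lookup, union-find-style forwarding) can be sketched in a few lines of Python. This is a toy over a made-up tuple-free IR in the spirit of the toy optimizer series, not Maxine's code; which opcodes count as impure is an assumption for illustration.

```python
# A toy local value numbering pass over a straight-line block of SSA
# instructions: hash each pure instruction by (opcode, canonical operands);
# on a hit, leave a forwarding breadcrumb to the earlier instruction.

IMPURE = {"call", "store"}  # example opcodes that opt out of value numbering

class Instr:
    def __init__(self, opcode, *operands):
        self.opcode = opcode
        self.operands = operands
        self.forwarded = None  # union-find style breadcrumb

    def find(self):
        # Follow forwarding pointers to the canonical instruction.
        return self if self.forwarded is None else self.forwarded.find()

def canonical(op):
    # Operands may be instructions or plain constants.
    return op.find() if isinstance(op, Instr) else op

def local_value_number(block):
    seen = {}
    for instr in block:
        if instr.opcode in IMPURE:
            continue  # impure instructions do not participate
        key = (instr.opcode, tuple(canonical(op) for op in instr.operands))
        if key in seen:
            instr.forwarded = seen[key]  # reuse the earlier computation
        else:
            seen[key] = instr

v0 = Instr("param")
v1 = Instr("add", v0, 1)
v2 = Instr("add", v0, 2)
v3 = Instr("add", v0, 1)  # textually identical to v1
local_value_number([v0, v1, v2, v3])
assert v3.find() is v1  # v3 now forwards to v1
assert v2.find() is v2  # v2 is still its own canonical instruction
```

A subsequent copy propagation pass would then rewrite every use of v3 to v1, which is exactly the "breadcrumb plus cleanup" scheme described earlier.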
Because of how SSA works and how CFGs work, the second block can "look up" into the first block and use the values from it. To get global value numbering working, we have to copy 's value map before we start processing so we can re-use the instructions. Maybe something like: Then the expressions can accrue across blocks. can re-use the already-computed from because it is still in the map.

…but this breaks as soon as you have control-flow splits. Consider the following shape graph: We're going to iterate over that graph in one of two orders: A B C or A C B. In either case, we're going to be adding all this stuff into the value map from one block (say, B) that is not actually available to its sibling block (say, C). When I say "not available", I mean "would not have been computed before". This is because we execute either A then B or A then C. There's no world in which we execute B then C.

But alright, look at a third case where there is such a world: a control-flow join. In this diagram, we have two predecessor blocks B and C each flowing into D. B always flows into D, and C always flows into D. So the iteration order is fine, right? Well, still no. We have the same sibling problem as before. B and C still can't share value maps. We also have a weird question when we enter D: where did we come from? If we came from B, we can re-use expressions from B. If we came from C, we can re-use expressions from C. But we cannot in general know which predecessor block we came from.

The only block we know for sure that we executed before D is A. This means we can re-use A's value map in D, because we can guarantee that all execution paths that enter D have previously gone through A. This relationship is called a dominator relationship, and it is the key to one style of global value numbering that we're going to talk about in this post. A block can always use the value map from any other block that dominates it.
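The dominator rule can be sketched as a thin driver around the local pass: visit blocks in RPO, seeding each block's value map with a copy of its immediate dominator's map. The CFG representation (blocks as strings, `idom` as a dict) and the `run_lvn` callback here are hypothetical stand-ins, not Maxine's structures.

```python
# Sketch of dominator-threaded global value numbering: each block may only
# reuse expressions guaranteed to have been computed on every path reaching
# it, i.e. expressions from its dominators.

def gvn(blocks_in_rpo, idom, run_lvn):
    maps = {}
    for block in blocks_in_rpo:
        dom = idom.get(block)
        # Seed with a copy of the immediate dominator's value map (empty
        # for the entry block), then run the local pass on this block.
        value_map = dict(maps[dom]) if dom is not None else {}
        run_lvn(block, value_map)
        maps[block] = value_map
    return maps

# Diamond CFG: A -> {B, C} -> D; A immediately dominates B, C, and D.
def run_lvn(block, value_map):
    value_map[("expr", block)] = block  # pretend each block computes one expr

maps = gvn(["A", "B", "C", "D"], {"B": "A", "C": "A", "D": "A"}, run_lvn)
assert ("expr", "A") in maps["D"]      # A dominates D: reusable in D
assert ("expr", "B") not in maps["D"]  # B is a sibling, not a dominator of D
```

Note how D's map is seeded from A, not B or C, which is exactly the sibling problem from the join case: neither B's nor C's expressions are guaranteed to have executed.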
For completeness' sake, in the diamond diagram, A dominates each of B and C, too. We can compute dominators a couple of ways 4 , but that's a little bit out of scope for this blog post. If we assume that we have dominator information available in our CFG, we can use that for global value numbering. And that's just what—you guessed it—Maxine VM does. It iterates over all blocks in reverse post-order, doing local value numbering, threading through value maps from dominator blocks. In this case, their method gets the immediate dominator: the "closest" dominator of all the blocks that dominate the current one. And that's it! That's the core of Maxine's GVN implementation. I love how short it is. For not very much code, you can remove a lot of duplicate pure SSA instructions.

This does still work with loops, but with some caveats. From p7 of Briggs GVN:

The φ-functions require special treatment. Before the compiler can analyze the φ-functions in a block, it must previously have assigned value numbers to all of the inputs. This is not possible in all cases; specifically, any φ-function input whose value flows along a back edge (with respect to the dominator tree) cannot have a value number. If any of the parameters of a φ-function have not been assigned a value number, then the compiler cannot analyze the φ-function, and it must assign a unique, new value number to the result.

It also talks about eliminating useless phis, which is optional, but would strengthen the global value numbering pass: it makes more information transparent.

But what if we want to handle impure instructions? Languages such as Java allow for reading fields from the / object within methods as if the field were a variable name. This makes code like the following common: Each of these references to and is an implicit reference to or , which is semantically a field load off an object.
You can see it in the bytecode (thanks, Matt Godbolt): When straightforwardly building an SSA IR from the JVM bytecode for this method, you will end up with a bunch of IR that looks like this: Pretty much the same as the bytecode. Even though no code in the middle could modify the field (which would require a re-load), we still have a duplicate load. Bummer.

I don't want to re-hash this too much, but it's possible to fold load and store forwarding into your GVN implementation by either: See, there's nothing fundamentally stopping you from tracking the state of your heap at compile-time across blocks. You just have to do a little more bookkeeping. In our dominator-based GVN implementation, for example, you can: Not so bad. Maxine doesn't do global memory tracking, but they do a limited form of load-store forwarding while building their HIR from bytecode: see GraphBuilder, which uses the MemoryMap to help track this stuff. At least they would not have the same duplicate instructions in the example above!

We've now looked at one kind of value numbering and one implementation of it. What else is out there? Apparently, you can get better results by having a unified hash table (p9 of Briggs GVN) of expressions, not limiting the value map to dominator-available expressions. Not 100% on how this works yet. They note:

Using a unified hash-table has one important algorithmic consequence. Replacements cannot be performed on-line because the table no longer reflects availability.

Which is the first time that it occurred to me that hash-based value numbering with dominators was an approximation of available expression analysis. There's also a totally different kind of value numbering called value partitioning (p12 of Briggs GVN). See also a nice blog post about this by Allen Wang from the Cornell compiler course. I think this mostly replaces the hashing bit, and you still need some other thing for the available expressions bit.
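Returning to the load-store forwarding idea: a minimal local-only version (forwarding within one block and discarding memory knowledge at block boundaries) might look like this. This is a toy tuple IR for illustration, not Maxine's MemoryMap; the kill-everything-on-call rule is a deliberately conservative assumption.

```python
# Sketch of local load forwarding: stores record what a field holds, loads
# reuse the recorded value, and anything that might write memory (a call,
# an unknown instruction) conservatively clears the tracked state.

def forward_loads(block):
    heap = {}   # (object, field) -> value known to be in that field
    out = []
    for instr in block:
        op = instr[0]
        if op == "store":              # ("store", obj, field, value)
            _, obj, field, value = instr
            heap[(obj, field)] = value
            out.append(instr)
        elif op == "load":             # ("load", dst, obj, field)
            _, dst, obj, field = instr
            if (obj, field) in heap:   # duplicate load: forward the value
                out.append(("copy", dst, heap[(obj, field)]))
            else:                      # first load: remember its result
                heap[(obj, field)] = dst
                out.append(instr)
        else:                          # may write memory: kill everything
            heap.clear()
            out.append(instr)
    return out

block = [
    ("load", "v1", "this", "x"),
    ("load", "v2", "this", "y"),
    ("load", "v3", "this", "x"),   # duplicate: becomes a copy of v1
]
assert forward_loads(block)[2] == ("copy", "v3", "v1")
```

This mirrors the Java field example above: the second load of the same field becomes a copy. Extending it across blocks is the extra bookkeeping the post describes, with kill sets unioned back to the immediate dominator.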
Ben Titzer and Seth Goldstein have some good slides from CMU, where they talk about the worklist dataflow approach. Apparently this is slower but gets you more available expressions than just looking to dominator blocks. I wonder how much it differs from dominator + unified hash table. While Maxine uses hash table cloning to copy value maps from dominator blocks, there are also compilers such as Cranelift that use scoped hash maps to track this information more efficiently. (Though Amanieu notes that you may not need a scoped hash map and instead can tag values in your value map with the block they came from, ignoring non-dominating values with a quick check. The dominance check makes sense but I haven't internalized how this affects the set of available expressions yet.)

You may be wondering if this kind of algorithm even helps at all in a dynamic language JIT context. Surely everything is too dynamic, right? Actually, no! The JIT hopes to eliminate a lot of method calls and dynamic behaviors, replacing them with guards, assumptions, and simpler operations. These strength reductions often leave behind a lot of repeated instructions. Just the other day, Kokubun filed a value-numbering-like PR to clean up some of the waste. ART has a recent blog post about speeding up GVN. Go forth and give your values more numbers.

There's been an ongoing discussion with Phil Zucker on SSI, GVN, acyclic egraphs, and scoped union-find. TODO summarize:

- Commutativity; canonicalization
- Seeding alternative representations into the GVN
- Aegraphs and union-find during GVN
- https://github.com/bytecodealliance/rfcs/blob/main/accepted/cranelift-egraph.md
- https://github.com/bytecodealliance/wasmtime/issues/9049
- https://github.com/bytecodealliance/wasmtime/issues/4371

Local value numbering, in pseudocode:

- initialize a map from instruction numbers to instruction pointers
- for each instruction that wants to participate in value numbering: if its value number is already in the map, replace all pointers to it in the rest of the program with the corresponding value from the map; otherwise, add it to the map

Options for folding load-store forwarding into value numbering:

- doing load-store forwarding as part of local value numbering and clearing memory information from the value map at the end of each block, or
- keeping track of effects across blocks: track heap write effects for each block; at the start of each block B, union all of the "kill" sets for every block back to its immediate dominator; finally, remove the stuff that got killed from the dominator's value map

See also: V8 Hydrogen.

1. Writing this post is roughly the time when I realized that the whole time I was wondering why Cinder did not use union-find for rewriting, it actually did! Optimizing instruction by replacing with followed by copy propagation is equivalent to union-find. ↩
2. In some forms of SSA, like heap-array SSA or sea of nodes, it's possible to more easily de-duplicate loads because the memory representation has been folded into (modeled in) the IR. ↩
3. The order is a little more complicated than that: reverse post-order (RPO). And there's a paper called "A Simple Algorithm for Global Data Flow Analysis Problems" that I don't yet have a PDF for that claims that RPO is optimal for solving dataflow problems. ↩
4. There's the iterative dataflow way (described in the Cooper paper (PDF)), Lengauer-Tarjan (PDF), the Engineered Algorithm (PDF), hybrid/Semi-NCA approach (PDF), … ↩
Wander Console 0.4.0 is the fourth release of Wander, a small, decentralised, self-hosted web console that lets visitors to your website explore interesting websites and pages recommended by a community of independent website owners. To try it, go to susam.net/wander/ . This release brings a few small additions as well as a few minor fixes. You can find the previous release pages here: /code/news/wander/ . The sections below discuss the current release.

Wander Console now supports wildcard patterns in ignore lists. An asterisk ( * ) anywhere in an ignore pattern matches zero or more characters in URLs. For example, an ignore pattern like can be used to ignore URLs such as this: These ignore patterns are specified in a console's wander.js file. They are very important for providing a good wandering experience to visitors. The owner of a console decides what links they want to ignore in their ignore patterns. The ignore list typically contains commercial websites that do not fit the spirit of the small web, as well as defunct or incompatible websites that do not load in the console. A console with a well-maintained ignore list ensures that a visitor to that console has a lower likelihood of encountering commercial or broken websites. For a complete description of the ignore patterns, see Customise Ignore List.

By popular demand, Wander now adds a query parameter while loading a recommended web page in the console. The value of this parameter is the console that loaded the recommended page. For example, if you encounter midnight.pub/ while using the console at susam.net/wander/ , the console loads the page using the following URL: This allows the owner of the recommended website to see, via their access logs, that the visit originated from a Wander Console. While this is the default behaviour now, it can be customised in two ways.
The value can be changed from the full URL of the Wander Console to a small identifier that identifies the version of Wander Console used (e.g. ). The query parameter can be disabled as well. For more details, see Customise 'via' Parameter.

In earlier versions of the console, when a visitor came to your console to explore the Wander network, it picked the first recommendation from the list of recommended pages in it (i.e. your file). But subsequent recommendations came from your neighbours' consoles, then their neighbours' consoles, and so on recursively. Your console (the starting console) was not considered again unless some other console in the network linked back to it. A common way to ensure that your console was also considered in subsequent recommendations was to add a link to your console in your own console (i.e. in your ). Yes, this created self-loops in the network, but this wasn't considered a problem. In fact, it was considered desirable, so that when the console picked a console from the pool of discovered consoles to find the next recommendation, it considered itself to be part of the pool. This workaround is no longer necessary. Since version 0.4.0 of Wander, each console will always consider itself to be part of the pool from which it picks consoles. This means that the web pages recommended by the starting console have a fair chance of being picked for the next web page recommendation.

The Wander Console loads the recommended web pages in an element that has sandbox restrictions enabled. The sandbox properties restrict the side effects the loaded web page can have on the parent Wander Console window. For example, with the sandbox restrictions enabled, a loaded web page cannot redirect the parent window to another website. In fact, these days most modern browsers block this and show a warning anyway, but we also block it at the sandbox level in the console implementation.
It turned out that our aggressive sandbox restrictions also blocked legitimate websites from opening a link in a new tab. We decided that opening a link in a new tab is harmless behaviour and we have relaxed the sandbox restrictions a little bit to allow it. Of course, when you click such a link within a Wander Console, the link will open in a new tab of your web browser (not within the Wander Console, as the console does not have any notion of tabs). Although I developed this project on a whim, one early morning while taking a short break from my ongoing studies of algebraic graph theory, the subsequent warm reception on Hacker News and Lobsters has led to a growing community of Wander Console owners. There are two places where the community hangs out at the moment. New consoles are announced in this thread on Codeberg: Share Your Wander Console. We also have an Internet Relay Chat (IRC) channel named #wander on the Libera IRC network. This is a channel for people who enjoy building personal websites and want to talk to each other. You are welcome to join this channel, share your console URL, link to your website or recent articles, and share links to other non-commercial personal websites. If you own a personal website but you have not set up a Wander Console yet, I suggest that you consider setting one up for yourself. You can see what it looks like by visiting mine at /wander/ . To set up your own, follow these instructions: Install . It just involves copying two files to your web server. It is about as simple as it gets.
Well, here we are—tech layoffs are exploding. According to RationalFX, the total number of departures is expected to reach 273,000 by the end of the year. And while this figure alone doesn't mean much, know this: it represents roughly 10 times the annual volume of pre-COVID layoffs. So, can we really say that humans are being progressively replaced by AI as so many claim? In France, INSEE speaks of a contraction in the job market directly linked to the rise of AI. But correlation doesn't imply causation, so we're entitled to wonder if there's something else hiding behind all the hype. So I wanted to dig deeper and explore the root causes to understand this wave. And it turns out AI might not be our biggest concern. If you read the latest news, there's plenty to worry about: Oracle just laid off 30,000 people (20% of its workforce), and Block cut 40% of its headcount. And I could've cited Meta, Amazon, Klarna, ASML, Ericsson, Salesforce—the list goes on. In most cases, AI is cited as one of the reasons. And this narrative has a major advantage because on paper, these companies say: we're automating, we're gaining productivity, and we're cutting fixed costs. Which tends to reassure shareholders. Block's stock price, for example, recovered a bit in February following the announcements. Same with Oracle's stock price (announcement made March 30th). Now doubts linger, and as one article put it: "Isn't this just layoffs with better marketing—AI washing?" Block is the new name for Square, a payments company you might know from its little payment terminal, which is now fairly ubiquitous. But Block isn't just a payment terminal—it also spans crypto ventures, because its founder, Jack Dorsey, is a big believer in cryptocurrencies. Jack Dorsey also co-founded Twitter, which was sold to Elon Musk a few years ago. And Jack tends to think big. Twitter had 8,000 employees when he sold it—a company that now runs with 2,800 people. At Block, the company tripled its headcount post-COVID. We're talking about a 12,000-employee company that had just 4,000 pre-2020.
Sure, you can understand it by looking at the COVID effect on Block's stock. Except the return to reality in 2022 hit hard. The company stagnated, and with an explosion in the payroll, things couldn't end well. So we started seeing performance improvement plans emerge. Because if you look at the economic fundamentals, as this article does , you realize that Block is far less profitable than its competitors, with gross margins that are half theirs. Today, AI is mostly a "pretty" way to hide management mistakes and reassure investors. Oracle's case is a bit different. Officially, it's not about cuts driven by productivity gains, but a reorientation of investments toward infrastructure to support AI. In their case too, the stock price is rather concerning, but it's not the main driver of changes. As one article puts it, it's primarily about investment : The job cuts at Oracle come as it has invested heavily in AI, spending both on its own infrastructure and on partnerships with other companies like OpenAI. It plans to spend at least $50bn on infrastructure this year, and it has also raised $50bn in debt in order to "meet demand" for even more AI infrastructure. Oracle is also part of the Stargate initiative, alongside OpenAI, SoftBank and MGX, an AI investment fund backed by US President Donald Trump. Here, it's really about reorienting capital from a traditional activity that's flagging to one that's supposed to replace it in a few years. In reality, I won't criticize it. It's a strategy, a bet. A huge bet, but one that falls into the same category as what Kodak should've done when digital arrived. And Oracle doesn't want to be the next Kodak. And that's really the issue—nobody wants to be the next Kodak. When a leader (like Block, Google, or Meta) lays off 10% of its workforce and its stock goes up the next day, every other company is tempted to do the same. Laying people off because you mismanaged your company would be an admission of failure. 
But laying people off because you're "transforming through AI" is a vision of the future. And this FOMO—fear of missing out—explains a lot of the current departure plans. Gartner calls it "RIFs before reality", the anticipation of unrealized gains: The employment deal is being rewritten in real time. CEOs are making bold moves based on AI's promise rather than its proven impact. Layoffs linked to AI dominated headlines last year, but Gartner data shows fewer than 1% were due to actual productivity gains. This anticipation drives investment reorientations. Oracle's case is representative here. Not everyone is investing in infrastructure, but many are reinvesting in engineering to automate other business functions and, most importantly, to be ready for the future. AI is no longer just a growth story; it's a cost-reduction tool, and firms are restructuring accordingly. What we're witnessing is a shift from headcount-driven expansion to automation-led productivity, a transition that will define the tech sector in the coming years. —Alan Cohen, analyst at RationalFX Now, this isn't new, and I notice a certain hypocrisy among some developers who are discovering today that their profession has always been about automating others' jobs. It's a shame to discover it when it touches us personally. Anyway, what's certain is that companies are anticipating cuts without yet having proof of the gains to come. It's not just a few layoffs—we're seeing signs, notably raised by levels.fyi's founder: we're witnessing a simplification of career paths. A layoff plan is temporary. But when you start eliminating rungs in career ladders, it signals you're anticipating a durable, global reduction in headcount. And yet, once again, the gains aren't that obvious so far. We all have our opinion on this. I consider myself more productive with AI. Not everyone agrees. In any case, these are just opinions. There are studies on the topic of productivity, but there's no consensus.
You can find studies showing we're less productive, but you can also find others saying the opposite. The causes are multiple. The first is what's called the productivity paradox: You can see the computer age everywhere but in the productivity statistics. Yes, back then we wondered if computers really made us more productive. It was far from certain. This paradox is explained in two ways. First, companies spend more time configuring tools, training people, and reorganizing workflows than actually producing more. Second, a new technology requires a learning period that can be quite long to master. And that's what we're seeing today—AI usage is totally new. Many are just faster at doing what they did badly before. And it's not like we know how to measure developer productivity anyway. I'll remind you that this question still hasn't found a universal answer since we started asking it. Now, I've also heard plenty of CTOs and IT directors privately say they have the means to prove it. But they don't want to. Because proving it would mean making decisions they don't want to make. And I can tell you that in this period, I'm glad I'm no longer a CTO. Still, as we've seen, productivity gains or not, can we really say all current layoffs are AI-related? Probably not. A recently cited study shows that 59% of HR leaders admit that AI was used as a "cover" to justify budget cuts that were actually driven by other factors: over-hiring post-COVID, investor pressure to increase margins, and internal strategy mistakes. But I think that would be overly reductive. It's mostly the nth demonstration that we've entered a new era post-COVID. Between rising inflation, ongoing trade wars, endless debates about tariffs, skyrocketing energy costs, various conflicts that paralyze parts of international commerce—we're really in a recession. AI is a facade to hide the rest.
When Trump gets excited about his Stargate project (building datacenters), it's storytelling to hide the mess, even if it's true that AI is probably one of the drivers of the military sector in the coming years, and the prospect of the US losing ground on it is probably making them nervous. Yes, because the worst part is that even on AI, it's not certain the people calling the shots will be American. Recent Chinese models like Ernie, DeepSeek, Qwen, and Kimi are largely on par with Gemini or ChatGPT, without necessarily costing the same. Kimi and DeepSeek reportedly cost 10% of their American counterparts during their training phases. Which, incidentally, is encouraging but mostly logical—technology improves, and we've never seen tech stay this inefficient over time. The computer that sent a rocket to the moon was less powerful than our smartphones despite consuming far more energy. And for all these reasons, US companies are in full downsizing mode. Players in AI need to become more competitive. They're investing heavily while cutting payroll at the same time. Other tech companies are following suit, further constrained by hyper-unfavorable economic conditions and in a context where saying you're laying off to increase productivity is more sellable than admitting reality. And us in the middle of all this? Well... I'll be honest—I really wondered how to conclude this piece. I always try to end on a positive note, but the exercise is difficult here. I'll try anyway. Is this the end of an era? Probably the era of unreasonable hyper-growth, which isn't so bad. This forced downsizing might help us get back to basics instead of just chasing vanity metrics (like headcount). It's also a global economic shift, with a US bloc that seems to be faltering. I want to see some positivity in thinking that Europe has cards to play. We're less affected than the US by the recent massive waves of layoffs. Probably because we have less bloated payrolls than the US and more solid social models.
While American giants painfully refocus, it's our moment in Europe to catch up. These new technologies, more accessible and efficient, let us move faster with fewer resources. Maybe it's finally time to create real European tech alternatives—more sober and pragmatic. On that note, you can go back to normal activities.
This is the last of the interventions I'm trying out to see if I can improve the test loss for a from-scratch GPT-2 small base model, trained on code based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)". Back when I did my first training run for a base model, on my local RTX 3090, I used two optimisations: The first of those boosted training speed from 12,599 tokens per second to 15,402 in my test harness, while AMP on its own boosted it to 19,921 tps (and also allowed me to increase the batch size from 5 to 6). Doing both appeared to hit some kind of diminishing returns -- it maxed out at 19,997 tps, only a little better than AMP on its own. But intuitively, you'd expect that might come at a cost. While I'm sure the PyTorch developers have a solid understanding of where switching to 16-bit will have a minimal impact on training quality, it seems too good to be true that it would have no impact at all. Let's see what happens if we switch both of these optimisations off! I added a new flag to the config file for the training harness, with a default of 1. The core implementation was pretty simple; where we had the call to , we needed to guard it: ...and where we did the forward pass and the loss calculation, we had to not wrap it in a : We also had to avoid unscaling when clipping gradients; I did that by just not creating a scaler when in non-AMP mode, and then: ...and likewise, instead of using the scaler to step the optimiser, we step it directly if we don't have one: However, there was an issue: non-finite gradients. As I discovered when looking into gradient clipping, the scaler was actually doing something quite useful for us. Somewhat buried in the AMP recipes page is a comment: Now, from the gradient clipping run, I'd come to the conclusion that we were occasionally getting non-finite gradients, and the scaler was saving us from applying junk updates when that happened.
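In outline, the change looks something like this. This is a minimal sketch under my own naming (a `use_amp` flag, a `training_step` helper), not the post's actual harness code, but it shows the shape of the guard: autocast and the scaler only when AMP is on, a plain backward pass and direct optimiser step otherwise:

```python
import contextlib
import torch
import torch.nn.functional as F

def configure_precision(use_amp):
    # The matmul-precision call likewise gets guarded by the same flag:
    # lower-precision TF32 matmuls only in the fast mode.
    if use_amp:
        torch.set_float32_matmul_precision("high")

def training_step(model, optimizer, scaler, inputs, targets,
                  use_amp, device_type="cuda"):
    """One training step that works with AMP either on or off (sketch)."""
    # Wrap the forward pass in autocast only when AMP is enabled;
    # otherwise fall back to a do-nothing context manager.
    autocast_ctx = (
        torch.autocast(device_type=device_type, dtype=torch.float16)
        if use_amp else contextlib.nullcontext()
    )
    optimizer.zero_grad()
    with autocast_ctx:
        logits = model(inputs)
        loss = F.cross_entropy(logits, targets)
    if scaler is not None:
        # AMP mode: scale the loss, then step via the scaler, which also
        # skips the step for us if it detects non-finite gradients.
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    else:
        # Non-AMP mode: no scaler exists, so step the optimiser directly.
        loss.backward()
        optimizer.step()
    return loss.detach()
```

The `scaler is not None` branch is the "just not creating a scaler in non-AMP mode" trick: the same code path then naturally avoids unscaling during gradient clipping too.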
If our new code was stepping the optimiser directly, we'd not have that safety net. We'd need something to save us from that. My first cut at this was to use the one other API feature I'd seen that handled non-finite gradients for you: has a parameter, so if we were using gradient clipping, we could set that to and use the exception to skip stepping the optimiser if it was raised. To avoid actually doing any gradient clipping when that happened, if we did not have gradient clipping explicitly enabled, we could set the to infinity. Here's the code for that version . I wasn't very happy with it, though. The use of a gradient clipping API just for its side-effect of telling us about non-finite gradients felt a bit ugly, and even worse, the exception it raised was just a generic , not a custom exception type, which meant that I had to distinguish between it and other by looking at the exception message -- not terribly safe, as that's something that could easily change in the future. So I switched to a more explicit, simpler version: scan through the parameters looking for non-finite gradients, and skip the optimiser step if any are found: I did have some concerns about the performance impact of that; on my local machine it took about 0.13 seconds to scan all of the parameters like that for one step. However, it's better than failing to train the model at all due to garbage updates! So with that, it was time to do the training run. It was pretty clear that I would not be able to run this with my normal microbatch size of 12 on the 8x A100 40 GiB machines that I'd been using so far for these intervention tests -- AMP and the lower-precision matrix multiplications save a bit of VRAM, and I was already pretty much at the limit of what would fit in there. 
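The explicit scan is only a few lines. Something along these lines (the function name is mine, and the real harness code may differ in details):

```python
import torch

def grads_are_finite(model):
    """Return False if any parameter gradient contains a NaN or Inf.

    A sketch of the explicit scan described above, replacing the scaler's
    built-in non-finite check when running without AMP.
    """
    for p in model.parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            return False
    return True
```

The optimiser step then becomes conditional: step only when `grads_are_finite(model)` returns True, and otherwise skip the update for that batch rather than apply garbage.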
Changing the batch size would make this a poor test of the effects of removing the FP precision stuff in isolation, so I decided that the safest minimal change was to use a machine with more VRAM -- specifically an 8x A100 80 GiB, as that was the closest to what I was using (switching to e.g. H100s would add all kinds of confounding changes). The next problem was getting any kind of machine at all! Lambda (they appear to have rebranded away from "Lambda Labs") very rarely seemed to have any available instances, never mind the specific type that I wanted. Eventually, I put together a system to poll their API and launch an instance when one was available. At 3:25am today 2 , I got a Telegram message from the script saying that it had managed to find and start one. I kicked off the training run, and watched as it got started. I could see it was using 43.8 GiB/GPU, so it definitely did need the larger instance type. And it quickly became clear that this was going to be a long one -- it was estimating 8 hours to do the complete run! In a way that was good news, though, as I could just set an alarm and go to bed. When I woke up, it was done: That's 8h7m. For comparison, the baseline train took 3h24m, so we're taking more than double the time. Cost-wise, things were even worse -- more than US$135 in server costs, because as well as needing the server for much longer, being a larger machine it cost US$16.48/hour rather than $11.84. So that's more than three times as expensive as the US$42 that a typical recent train has cost me (Lambda raised their prices, so it went up from about US$35 in February). Still, at least it looked like a solid run: Very similar to the others we've seen in this series. Time to upload it to Hugging Face Hub, and on to the evals to see if all of this extra cost was worthwhile. Firstly, the smoke test -- how did it complete? Not bad at all! But the important metric is the loss on the test set, and for that I got 3.679.
Let's add it to the table to see how that compares to the other training runs: So, a tiny improvement over our baseline. Taking more than twice as long on the training run, and spending three times as much, gained us a loss improvement that's smaller than any other successful intervention. The first question is, did removing AMP and lower-precision matrix multiplications lead to a better model? The answer appears to be "yes" -- but it's a tiny enough difference that it could well be in the noise. But the follow-up has to be, was it worth the extra cost in time and money? And for that I'm certain that the answer is "no". If we'd spent twice the time training with AMP -- on an extra 3B-odd tokens, or on a second epoch with the same 3B -- it seems implausible that the resulting loss would not have been better. And anyway, given that my goal with these interventions is to train the best model I can in two days locally (or 3h30m or so on an 8x A100 40 GiB), it's pretty clear that if we'd cut this run off about halfway through it would have been worse -- and that's not even accounting for it being more memory-hungry. So, I think the takeaway from this is that AMP appears to be a huge win, at least for this model. It has a tiny cost (if any) in model quality, and a huge benefit in training speed, plus a smallish but still useful benefit in training VRAM requirements. 3 And with that, I've reached the end of the interventions that I wanted to try ! Next, I'll need to think through what we need to do to try to stack them up. In particular, is there any easy way to work out whether any of the improvements I've seen might be due to random noise? After all, even though I've been carefully using explicit seeds, each intervention will have changed the way the training run uses the random number stream, and that could easily have an effect. Stay tuned! 
Setting the 32-bit floating point matrix multiplication precision to "high" rather than to "highest", which means that it uses lower-precision (but still technically 32-bit) TF32 for those operations rather than normal float32. Using PyTorch's Automatic Mixed Precision (AMP), which allows it to use 16-bit calculations rather than 32-bit in places where it makes sense to do so. The name of the flag is not quite right, as of course we're switching off not just AMP but the matrix multiplication precision, but it's a decent shorthand. ↩ I'm a night owl, so luckily I was still awake. ↩ I have to admit that I'm very tempted to see what effect even bigger moves in the low-precision direction might have. What if I moved to some kind of 16-bit training, like ? After all, most of the open weights models like Qwen are at least released at that kind of bittedness. But that's one to look into later, I think. ↩
Lately I've realised that even though I'm barely on social media, my life still feels 95% digital. I don't post on LinkedIn. My Instagram account mostly exists so I can open links people send me when I absolutely have to. I only keep a fake Facebook account for Marketplace and I use my real account (I've had it since the beginning of Facebook and all my friends live there so it stays) for Messenger only. But there is more than social media to occupy our time now. My days are still full of feeds, links, apps, messages (WhatsApp groups and such), digital projects, and little things I feel like I should be keeping up with. And they are easy to keep up with; my phone is always in my hand anyway. RELATED: I Choose Living Over Documenting · On the Compulsion to Record · The Journal Project I Can’t Quit · The Art of Organizing (Things That Don’t Need to Be Organized) At work we showcase our AI agents and I wonder (from my anecdotal experience) if we are creating more busy work for ourselves and replacing reflection and, with it, actual productivity, output, and good old “getting the job done.” Most of our work meetings now have extensive transcripts that turn into minutes, notes, action points and insights. I remember when the output of such a meeting would be 2-3 points that we actually remembered. AI-generated workslop certainly is a thing now. I need a break from it all. And from all the self-imposed shoulds such as scanning my old journals into Day One. Backing up Day One, which hasn't been backed up in a while. An external hard drive backup that's probably a year overdue. A Trello board full of things I want to do but don't really want to or have to do, or maybe I want to do them but can't justify the time when I already feel so busy. After a full day of work and virtual meetings, I feel completely depleted. Those self-imposed obligations, things that used to be fun because they were few and far between, are no longer acceptable.
I used to sneak in 15 minutes of personal things at work. Now when I have a break, I'd rather grab a coffee with someone or go for a walk. I crave analog. I crave nature. I crave quiet thinking time (not with a meditation app). I have made some changes already and they seem to be sticking. We have dinner at the table now, which has been good, at least we get some family time before everyone retreats to their own corners. We used to eat while watching a show together as a family, which is fine every now and then, but it was too much of it all. But still my phone is somewhere nearby, and I'm half-watching TV and half-checking a message or voice journaling into an app. None of it is thoughtful. It's just me blabbering. My brain feels like it's all over the place. I used to be able to sit with my own thoughts. I haven't been able to do that in a long time. My daughter broke her arm two weeks ago. She has a purple cast all her friends signed, and she was wondering whether to keep it when it comes off. I told her how I broke my arm as a kid, and she asked if I kept my cast. I said I would have liked to, but what we have now is better. I can take a clear photo of hers and she'll have that memory without keeping the physical thing. Then she asked if I had a photo of mine. I didn't. It never even occurred to me. Back then we took maybe 20 photos a year, if that, and they were all the more precious for it. Now I'm struggling to keep my monthly saves under 150 photos and screenshots, most of which I probably don't need. RELATED: My Photo Management and Memory Keeping Workflow I love my Day One journals , I really do. I just exported all of 2025 to PDF and JSON. But reading back through it, it's every tiny minutia of my life. I like to think it'll be interesting to me one day. Probably not to anyone else. And I wonder whether the time I spent on it was worth it. Yes, there are some insights there , but nothing that I didn’t already know. 
What if I had allowed myself that thinking time instead of outsourcing it to AI? RELATED: Committing to the Thinking Life If my house burned down and I lost everything, the memories that matter are still in my head. I'm a cumulative experience of all of it. Do I need the artifact to know who I am? I still have journals from my 20s and 30s sitting back home in Bosnia. Thick ones, full of pasted tickets and stubs and mementos. I haven't looked at them in years but I can't let them go. My plan is to eventually scan them, maybe pay one of my kids to do it since they won't be able to read my handwriting anyway. RELATED: Letting Go of Old Journals and Mementos But anyway. The point is, I just need a break. From reading things online, from note-keeping, from digital journaling, blogging, saving notes and highlights (even my Readwise subscription feels intrusive now), from all of it. I've decided to do a 30-day digital detox. Within reason, because I still have to work. But I'm off until Tuesday, so I have a few days to ease into it. I'm lucky and privileged that I can do this. That I can shut down for a while and stop following things I can't influence and let go of expectations I put on myself. So that's what I'm doing. Simplifying my phone, deleting apps, putting the phone away when I get home. If we're watching something as a family, fine. One episode. But otherwise, even if I'm bored and restless, I'll go for a walk or play a board game, read a book. Journal (on paper). I'll do nothing, like I used to. Go to bed early. Meet a friend for coffee (and be more proactive about that). It's all become too hard because easy distractions that scratch the itch of everything are too easy. Calm my mind. Slow down. It's been too much. Time to reclaim myself. And if you've gotten this far, the world is reminding me once again of E.M. Forster's The Machine Stops, which I wrote about in 2020. It feels eerily even more relevant now.
Soundtrack — Soundgarden — Blow Up The Outside World A lot of people try to rationalize the AI bubble by digging up the past. Billions of dollars of waste are justified by saying “OpenAI is just like Uber” (it isn’t) and “the data center buildout is just like Amazon Web Services” (it isn’t, Amazon Web Services was profitable in a decade and cost about $52 billion between 2003 and 2017, and that’s normalized for inflation) and, most egregiously, that AI is “too big to fail.” I think that these statements are acts of cowardice if they are not backed up by direct and obvious comparisons based on historical data and actual research. They are lazy intellectual tropes borne of at best ignorance, or at worst an intellectual weakness that makes somebody willing to take flimsy information and repeat it as if it were gospel. Nobody has any proof that AI is profitable on inference, nor is there any explanation of how it will become profitable at some point, just a cult-like drone of “they’ll work it out” and “look at the growth!” And the last argument, that AI is “too big to fail,” is the most cowardly of them all, given that said statement seldom precedes the word “because,” and then an explanation of why generative AI is so economically important, and why any market correction would be so catastrophic that the bubble must continue to inflate. Over the last few months I have worked diligently to unwind these myths. I discussed earlier in the year how the AI bubble is much worse than the dot com bubble, and ended last year with a mythbusters (AI edition) that paired well with my free opus, How To Argue With An AI Booster. I don’t see my detractors putting in anything approaching a comparable effort. Or any effort, really. This isn’t a game I’m playing or some sort of competitive situation, nor do I feel compelled to “prove my detractors wrong” with any specificity. I believe time will do that for me.
My work is about actually finding out what’s going on, and I believe that explaining it is key to helping people understand the world. None of the people who supposedly believe that AI is the biggest, most hugest and most special boy of all time have done anything to counter my core points around AI economics other than glance-grade misreads of years-old pieces and repeating things like “they’re profitable on inference!” Failing to do thorough analysis deprives the general public of the truth, and misleads investors into making bad decisions. Cynicism and skepticism are often framed as some sort of negative process — “hating” on something for the sake of being negative, or to gain some sort of cultural prestige, or as a way of performatively exhibiting one’s personal morality — when both, done properly, require the courage to actually understand things in depth. I also realize many major media outlets are outright against skepticism. While they frame their coverage as “taking on big tech,” their questions are safe, their pieces are safer, their criticisms rarely attack the actual soft parts of the industries (the funding of the companies or infrastructure developments, or the functionality of the technology itself), and almost never seek to directly interrogate the actual statements made by AI leaders and investors, or the various hangers-on and boosters. This is why I’ve been so laser-focused on the mythologies that have emerged over the past couple of years, such as when people say “it’s just like the dot com bubble" — it’s not, it’s much worse! — because if these mythologies actually withstood scrutiny, my work wouldn’t have much weight. The Dot Com Bubble in particular grinds my gears because it’s a lazy trope used to rationalise rotten economics, all while disregarding the actual harms that took place.
Unemployment spiked to 6%, venture capital funds lost 90% of their value, and hundreds of thousands of people in the tech industry lost their jobs, some of them for good. It is utterly grotesque how many people minimize and rationalize the dot com bubble, reframing it as a positive, by saying that “things worked out afterwards,” all so that they can use that as proof that we need to keep giving startups as much money as they ask for forever and that AI is the biggest thing in the world. Yet AI is, in reality, much smaller than people think. As I wrote up (and Bloomberg clearly were inspired by!) last week, only 5GW of AI data centers are actually under construction worldwide out of the 12GW that are supposedly meant to be delivered this year, with many of them slowed by the necessity of foreign imports of electrical equipment and, you know, the fact that construction is hard, and the power isn’t available. Meanwhile, back in October 2025, The Wall Street Journal claimed that a “giant new AI data center is coming to the epicenter of America’s fracking boom” in a deal between Poolside AI (a company that does not appear to have released a product) and CoreWeave (an unprofitable AI data center company that I’ve written about a great deal). This was an “exclusive” report that included the following quote: Turns out Mr. Kant was correct, as it was just reported that CoreWeave and Poolside’s deal fell apart, along with Poolside’s $2 billion funding round, as Poolside was “unable to stand up the first cluster of chips to CoreWeave’s timeline,” probably because it couldn’t afford them and wasn’t building anything. The FT added that “...Poolside was unable to convince investors that it could train AI models to the same level of established competitors.” It was also unable to get Google to take over the site. Elsewhere, troubling signs are coming from the secondary markets — the place where people sell stock in private companies like OpenAI.
Those signs being that, well, nobody’s buying. Per Bloomberg, over $600 million of OpenAI shares are sitting for sale with no interest from buyers at its current $850 billion post-money valuation, though apparently $2 billion is “ready to deploy” for private Anthropic shares at a $380 billion valuation, according to Ken Smythe of Next Round Capital, a secondary share sale site. Though people will try to frame this as a case of OpenAI’s shares “being too close to what they might go public at,” one has to wonder why shares of what is supposed to be the literal most valuable company of all time aren’t selling at what, theoretically, is a massive discount. One might argue that it’s because people think the stock might drop on IPO and then grow, but… that doesn’t show a great degree of faith in the company. Investors likely think that Anthropic would go public at a higher price than $380 billion, though I do need to note that the full quote was that “buyers have indicated that they have $2 billion of cash ready to deploy into Anthropic,” which is not the same thing as saying they “will actually buy it.” In any case, the market is no longer treating OpenAI like it’s the golden child. Poolside’s CoreWeave deal is dead. Data centers aren’t getting built. Oracle is laying off tens of thousands of people to fund AI data centers for OpenAI, a company that cannot afford to pay for them. AI demand, despite how fucking annoying everybody is being about it, does not seem to exist at the scale that makes any part of this industry make sense. Yet people still squeal that “The Trump Administration Will Bail Out The AI Industry,” and that OpenAI is “too big to fail” — two statements that are not founded in history or analysis, but are the kinds of things you say only when you’re either so beaten down by bad news that you’ve effectively given up, or so willfully ignorant that you’ll say stuff without knowing what it means because it makes you feel better.
As I discussed in this week’s free newsletter, there is a subprime AI crisis going on. When the subprime mortgage crisis happened towards the end of the 2000s, millions of people built their lives around the idea that easy money would always be available, and that housing would only ever increase in value. These assumptions led to the creation of inherently dangerous mortgage products that never should have existed, and that inevitably screwed the buyers. I talked about these in my last free newsletter. Negative amortization mortgages, for example, were a thing in the US. With these, the monthly payments didn’t actually cover the cost of the interest, let alone the principal, so the balance grew even as the borrower paid. Similarly, in the UK, my country of birth, many homebuyers used endowment mortgages — an interest-only mortgage where, instead of paying down the principal, buyers made monthly payments into an investment savings account that (theoretically) would cover the cost of the property (and perhaps provide some extra cash) at the end of the term. If the investments did extremely well, the buyer could potentially pay off the mortgage early. Far too often, those investments underperformed, meaning buyers were left staring at a shortfall at the end of their term. Across the globe, the value of housing was massively overinflated by the lax standards of a mortgage industry incentivized to sign up as many people as possible thanks to a lack of regulation and easily-available funding. The value of housing — and indeed the larger housing and construction boom — was a mirage. In reality, housing wasn’t worth anywhere near what it was being sold for, and the massive demand for housing was only possible with unlimited resources, and under ideal conditions (namely, normal levels of inflation and relatively low interest rates).
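To see why negative amortization is so dangerous, here’s a minimal sketch of the mechanic. All of the figures (loan size, rate, payment) are hypothetical, chosen purely for illustration:

```python
# Hypothetical figures: a $300,000 loan at 6% APR where the borrower's
# fixed payment of $1,000/month is less than the ~$1,500 of monthly interest.
principal = 300_000.0
annual_rate = 0.06
payment = 1_000.0

balance = principal
for month in range(12):
    interest = balance * annual_rate / 12  # interest accrued this month
    balance += interest - payment          # the shortfall compounds onto the balance

# After a year of never missing a payment, the borrower owes MORE than they borrowed.
print(f"Balance after 12 months: ${balance:,.2f}")
```

In this toy scenario, the buyer pays $12,000 over the year yet ends up roughly $6,000 deeper in debt, because the gap between payment and interest is capitalized onto the balance every month.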
Those buying houses they couldn’t afford with adjustable-rate mortgages either didn’t understand the terms, or believed the members of the media and government officials who suggested housing prices would never decrease and that one could easily refinance the mortgage in question. Similarly, AI startups’ products are all subsidized by venture capital, and must, in literally every case, allow users to burn tokens at a cost far in excess of their subscription fees — a business that only “works” (and I put that in quotation marks) as long as venture capital continues to fund it. While from the outside these may seem like functional businesses with paying users, without the hype cycle justifying endless capital, these businesses wouldn’t be possible, let alone viable, in any way, shape, or form. For example, Harvey is an AI tool for lawyers that just raised $200 million at an $11 billion valuation, all while having an astonishingly small $190 million in ARR, or $15.8 million a month. It raised another $160 million in December 2025, after raising $300 million in June 2025, after raising $300 million in February 2025. Remove even one of those venture capital rounds and Harvey dies. Much like subprime loans allowed borrowers to get mortgages they had no hope of repaying, hype cycles create the illusion of viable businesses that cannot and will never survive without the subsidies. The same goes for companies like OpenAI and Anthropic, both of whom created priority processing tiers for their enterprise customers last year, and the latter of which just added peak rate limits between 5am and 11am Pacific Time. Their customers are the subprime borrowers too — they built workflows around these products that may or may not be possible under the new rate limits, and in the case of enterprise customers using priority processing, their costs massively spiked, which is why Cursor and Replit suddenly made their products worse in the middle of 2025.
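The subsidy dynamic can be sketched with a toy model. The subscription price, per-token cost, and usage figures below are all invented for illustration, not any company’s real numbers:

```python
# Toy model of a flat-rate AI subscription with metered underlying costs.
# All numbers are hypothetical; real pricing and inference costs vary widely.
SUBSCRIPTION = 20.0              # dollars per user per month
COST_PER_MILLION_TOKENS = 5.0    # assumed blended inference cost

def monthly_margin(tokens_millions: float) -> float:
    """Revenue minus inference cost for one user in one month."""
    return SUBSCRIPTION - tokens_millions * COST_PER_MILLION_TOKENS

# Fifty light users are individually profitable, but five heavy users who
# burn tokens far in excess of their fee drag the whole cohort underwater.
cohort = [1.0] * 50 + [40.0] * 5
total = sum(monthly_margin(t) for t in cohort)
print(f"Cohort margin: ${total:,.2f}")  # 50 * $15 - 5 * $180 = -$150
```

The point isn’t the specific numbers but the shape: under a flat fee with metered costs, the most engaged users are the least profitable, and only outside capital covers the gap.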
The reason that the Subprime Mortgage Crisis led to the Great Financial Crisis was that trillions of dollars were used to speculate upon its outcome, across $1.1 trillion of mortgage-backed securities. In mid-2008, per the IMF, more than 60% of all US mortgages had been securitized (as in, turned into something you could trade, speculate on the outcome of, and thus buy credit default swaps against). Collateralized debt obligations — big packages of different mortgages and other kinds of debt that masked the true quality of the underlying assets — expanded to over $2 trillion by 2006, though the final writedowns amounted to around $218 billion of losses. By comparison, AI is pathetically small. While there were $178.5 billion in data center credit deals done in America last year, speculation and securitization remain low, and in many cases the actual cash available is released in tranches based on construction milestones, with most data center projects (like Aligned’s recent $2.58 billion raise) funded by “facilities” specifically designed to minimize risk. As I’ve written about previously, building a data center is hard — especially when you’re building at scale. Finding land, obtaining permits (something which can be frustrated by opposition from neighbors or local governments), obtaining electricity, and then obtaining the labor, machinery, and raw materials all take time. Some components — like electrical transformers — have lead times in excess of a year. And so, you can understand why there’s such a disparity between the dollar amount in data center credit deals and the actual capital deployed to build said data centers. There also isn’t quite as much willful ignorance on the part of the ratings agencies, though that isn’t to say they’re actually doing their jobs.
CoreWeave is one of many data center companies that’s been able to raise billions of dollars using its counterparties’ credit ratings, with Moody’s giving an “A3” investment grade rating to the debt of an unprofitable data center company that is insufficiently capitalized to pay it off — and that would die without endless borrowing — because CoreWeave was able to use Meta’s credit rating and the GPUs in question as collateral. Nevertheless, none of this comes close to the apocalypse that the global economy faced as a result of the catastrophically dangerous bets made by the entire finance industry during the late 2000s, because those bets weren’t made on housing so much as they were made on financial instruments that were given power because of housing. Juiced by a mortgage industry that allowed basically anybody to buy a house regardless of whether they could pay for it, by the middle of 2008, nearly $9 trillion of mortgages were outstanding in America (with around $1.1 trillion of home equity loans on top). Trillions more (it’s hard to estimate due to the amount of off-balance-sheet trades that happened) were gambled on top of them as they were packaged into CDOs (collateralized debt obligations) and synthetic CDOs, where somebody would buy a credit default swap (CDS — effectively a bet on the default) against the underlying assets, assuming (incorrectly) that the company issuing the CDS would have the funds to pay out. As I’ll get into more deeply later in the piece, no such comparison exists for AI, and the asset-backed securitization of data centers and GPUs remains very small. Despite many deceptive studies that attempt to claim otherwise, the economy is relatively unaffected by AI, and while software companies might have debt, AI companies, for the most part, do not appear to, and those that do (OpenAI and Anthropic) have credit facilities rather than lump-sum loans. In totality, the AI industry seems to have made about $65 billion in revenue (not profit!)
in 2025, with about a third of that, I estimate, being the result of OpenAI or Anthropic feeding money to hyperscalers or neoclouds like CoreWeave, and billions more being AI startups (funded entirely by VC) feeding money to Anthropic and OpenAI to rent their models. Even the venture capital scale of AI startups is drastically overestimated. While (as reported by The New York Times) “AI startups” raised $297 billion in the first quarter of 2026, $188 billion of that was taken by OpenAI (which has yet to fully receive the funds!), Anthropic, xAI, and Waymo. In 2025, $425 billion was invested in startups globally, with half of that (about $212.5 billion) going to AI startups, but about half of that ($102 billion) going to Anthropic, OpenAI, xAI, Scale AI’s not-quite-acquisition by Meta, and Bezos’ Project Prometheus. The Great Financial Crisis was, as I’ll get into, a literal collapse of how banks, financial institutions, and property businesses operated, with their reckless speculation on a housing market that was only made possible by a craven mortgage industry incentivized to get people to sign at any cost. When people speculated that there was a bubble, articles ran saying that housing was actually cheap, that subprime lending had actually “made the mortgage market more perfect,” that the sky was not falling in the credit markets because unemployment wasn’t going to rise, that subprime mortgages wouldn’t hurt the economy, and that there was no recession coming. In any case, OpenAI, Anthropic, and AI startups in general are far from “systemic risks.” They are not load-bearing. TARP and the associated bailouts did not bail out the markets themselves — the S&P 500 lost around half of its value during the bear market that followed, and home prices only returned to growth in 2012.
I imagine the “systemic risk” argument is that NVIDIA makes up 7% to 8% of the value of the S&P 500, and that makes sense as long as you ignore that Exxon Mobil was around 5% of the value of the S&P 500 in 2008 and saw its value tank for years following the crisis without any bailout to stop it. Microsoft, Meta, Amazon, Google, NVIDIA, Tesla, and Apple are not going bankrupt if AI dies, and anybody suggesting they will is wrong. NVIDIA’s revenue collapsing by 50% or 80% or more would not cause a “financial crisis,” nor would said collapse be considered a “systemic risk” to the stability of the broader economy, though I admit it would be very bad for the markets writ large. Conversely, a similar blow at TSMC — the company that owns the literal foundries that make many of the leading-edge semiconductors used today, including those used for data center GPUs — would be, because so much of the world’s technology depends on its output, and its foundries require billions of dollars of upfront investment to build. GPUs are not critical to the global economy, nor are Large Language Models, nor is OpenAI, nor is Anthropic. Their collapse would end a hype cycle, which would make the markets drop much like they did in the dot com bust, but that is not the same as being too big to fail. Today’s premium is one of the most comprehensive analyses I’ve ever written — a rundown of what makes something “Too Big To Fail,” an explanation of the actual fundamentals of the Great Financial Crisis, and a true systemic analysis of the AI bubble writ large. None of this is too big to fail, and in many ways its failure is necessary for us to move forward as a society.