Posts in Programming (20 found)

A Note on File History in Emacs

Once you start digging beyond the surface, you discover that an ancient piece of text editing software called Emacs was light years ahead of its time. It already contained a clipboard history (the kill ring) and automatic saves/backups decades before contemporary editors took a half-baked stab at mimicking these features. Granted, I don’t make use of the kill ring because Alfred manages that for me across different applications, but it’s still pretty damn impressive. If you manage to stumble past the initial setup, that is. Many default settings in Emacs are… weird? The first thing to configure on the way to a somewhat sane default system is moving all those backup and auto-save files to a central location to stop the editor from littering all over the place. That’s pretty easy to do, but it raises the question: why don’t they change these defaults? Nobody wants random backup files popping up in their Git change set! Do you even need those files? The system feels archaic at first, but the more you think about the possibilities, the more brilliant the idea becomes. Let’s ignore the auto-save system for now—that doesn’t auto-save but auto-saves an auto-save backup that’s not a backup. Got all that? On every manual save, a backup file is created or replaced, depending on your configuration. These files can act as your local file history in case you’re not rocking a version control system. If you do, Emacs notices this and stops producing backups. I do recommend setting vc-make-backup-files to t, as you might lose interesting historical data before doing a commit. That is one of the more useful features of IntelliJ-based IDEs: being able to go back in time a few minutes to half an hour. Why would you need that? Emacs has a built-in undo history system! Very true, and perhaps better, as that doesn’t require a save, but it isn’t persistent. I can hear you say it. You’re right: there’s a package for that. It’s called undo-fu-session and it serialises the undo information without changing any inner logic. 
This is even more brilliant if coupled with vundo, which helps you step through this. If you increase the three related settings, you will have a powerful way to go back in time. Perhaps a bit too powerful? What is a good limit? Contrary to IntelliJ, Emacs does not persist timestamps: it only works with bytes and limits those, so you’ll have to write a function that periodically cleans up those persisted backups. But are you going to remove the entire tree or just prune a bit? Because if you don’t, this is what your session will look like: The vundo tree: a visualised undo tree with a lot of nodes to diff... And that’s just a clean tree with no branching reapplied undo paths. Good luck trying to hop between different nodes, selecting the right ones to diff and revert to. Without timestamp info, a big undo tree is useless. So I removed undo-fu-session: too much power, too much responsibility. Let’s keep that history local and non-persistent (even with a daemon you’ll end up with more than enough). I started fine-tuning the built-in backup settings, which translate to the list at the end of this post. There’s a bit of a catch here: Emacs only saves a backup once per editing session and then assumes you’re safe. To force it to create a backup every time you save, you’ll have to reset the buffer’s backed-up flag from a save hook. Or, as I learned from Alex, call save-buffer with a prefix argument. Ridiculous. GNU Emacs already featured this snapshot backup system in 1985, when I was born! Fine, we now have a bunch of backup files. Then what? This is where things can get interesting. Since they’re just files, you can obviously run a diff tool against them. But which backup file should you choose, and how do you easily select the right file from the UI and go from there? Consult to the rescue. Consult is a completing-read on steroids that plugs seamlessly into Vertico, my minibuffer completion framework. It’s basically a fuzzy search tool you can throw anything at—including a list of backup files to choose from. Which is exactly what I did. 
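Spelled out as elisp, the fine-tuning described above might look roughly like this. This is a sketch using the standard built-in backup variables; the exact values here are my assumptions of sane defaults, not necessarily the author's configuration:

```elisp
;; Sketch: centralise backups and keep a numbered history.
;; Values are illustrative, not the author's exact configuration.
(setq backup-directory-alist          ; one central backup location
      `(("." . ,(expand-file-name "backups/" user-emacs-directory)))
      backup-by-copying t             ; copy files, don't rename them
      version-control t               ; numbered backups: file.~1~, file.~2~, …
      vc-make-backup-files t          ; also back up version-controlled files
      delete-old-versions t           ; prune excess versions silently
      kept-old-versions 2             ; keep the oldest 2 backups
      kept-new-versions 10)           ; and the newest 10

;; Emacs normally backs a file up only once per session; resetting the
;; flag before each save forces a fresh backup every time.
(add-hook 'before-save-hook (lambda () (setq buffer-backed-up nil)))
```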
You can change the label (parse the timestamps), choose a lovely icon if you’re using nerd-icons et al., and tell Consult what to do when (1) you preview a candidate and (2) you select it. So the plan is this (the full list is at the end of this post). The result looks like this: selecting different backups automatically changes the opened diff on the right. I have no idea if I butchered the Consult API; I tried a few things until it sort-of worked and had some help with the rest. You can find the source somewhere in the Bakemacs config files. It could very well be that something like this already exists, but I haven’t found it so far; the packages I did find either do something else or require you to navigate to the backup file yourself. The added advantage of diff mode is that you can revert the diff and re-apply specific hunks. The idea that I’ll never lose anything stupid I wrote will make me sleep better later tonight. Sublime Text’s persistent but unsaved changed file system and IntelliJ’s local history saved my ass more than once. The fact that I cobbled together a working thing using Consult makes this even more satisfying. Isn’t fooling around in Emacs the best thing ever? I hope these nerdy posts are not alienating too many faithful Brain Baking readers… Because, you know, the Lisp Alien mascot? No? Took it too far? Related topics: / emacs / By Wouter Groeneveld on 10 February 2026. Reply via email.

The backup settings, translated:

- Keep multiple backup files.
- Also back up files that are under version control.
- Clean up older files: keep the oldest 2 and the last 10.
- Copy the file; don’t turn the existing one into a backup and save the buffer as the new file.

The plan for the Consult picker:

- For the current buffer, find all backup files. Easy: take the file name, substitute a few weird chars into !, read the matches from the backup directory, done. (This very file has a backup there, too.)
- Sort and properly format a timestamp to show in the Consult minibuffer.
- When previewed, diff the backup with the current buffer in a new window on the right.
- When selected, make that diff window permanent.
- When cancelled, clean up the mess.
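A minimal sketch of such a picker, with hypothetical names: the author's version uses Consult (timestamp labels, icons, live diff preview), while this plain completing-read variant only shows the shape of the idea.

```elisp
;; Hypothetical sketch: pick one of the current file's backups and
;; diff it against the file on disk.
(defun my/diff-with-backup ()
  "Choose a backup of the current file and diff it against the file."
  (interactive)
  (let* ((file (or buffer-file-name
                   (user-error "Buffer is not visiting a file")))
         ;; `make-backup-file-name' honours `backup-directory-alist',
         ;; mangling directory separators into `!'.
         (base (file-name-sans-versions (make-backup-file-name file)))
         ;; Numbered backups look like base.~1~, base.~2~, …
         (backups (file-expand-wildcards (concat base "*~")))
         (choice (completing-read "Diff against backup: " backups nil t)))
    (diff choice file)))
```

Porting the selection onto Consult to get preview-as-you-move is the fun part.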

0 views
xenodium Yesterday

Introducing winpulse

Hard to say officially, but I've been primarily using Emacs for roughly a couple of decades. Maybe my eyesight isn't what it used to be, or maybe I've just been wanting a stronger visual signal as I navigate through Emacs windows. Either way, today's the day I finally did something about it… I asked around to see if a package already existed for this purpose. Folks shared a handful of great options (listed below). I wanted my windows to temporarily flash when switching between them. Of these options, pulsar came closest, though it highlights the current line only. This is Emacs, so I should be able to get the behavior I want by throwing some elisp at the problem. With that, I give you winpulse, a package to temporarily highlight focused Emacs windows. This package is fresh out of the oven and likely has some edge cases I haven't yet considered. If you're still keen to check it out, it's available on GitHub. Enjoying this package or my content? I'm an indie dev. Consider sponsoring to help make it sustainable.

The options folks shared:

- pulsar: Emacs package to pulse the current line after running select functions.
- dimmer.el: Interactively highlight the active buffer by dimming the others.
- window-dim.el: A window dimmer package for Emacs.
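For the curious, the core idea can be sketched in a few lines on top of the built-in pulse.el. This is a hypothetical sketch of the behaviour, not how winpulse is actually implemented:

```elisp
;; Hypothetical sketch (not winpulse itself): flash the newly selected
;; window's visible region using the built-in pulse.el.
(require 'pulse)

(defun my/pulse-on-window-change (frame)
  "Briefly highlight the visible region of FRAME's selected window."
  (let ((win (frame-selected-window frame)))
    (with-current-buffer (window-buffer win)
      (pulse-momentary-highlight-region (window-start win)
                                        (window-end win)))))

(add-hook 'window-selection-change-functions #'my/pulse-on-window-change)
```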

0 views

Leaning on AI

It’s been five months since my last dedicated Lean post, and as usual I have started to lose steam on Lean projects. After the thrill of discovering the world of formalized mathematics started to wear off, I did not find the motivation to push as hard as before. The SF Math with Lean work group kept me vaguely connected (at least in the one hour a week we meet; see the retro), but other than that I wasn’t putting in more than a few hours a week on math and Lean.

0 views
baby steps 2 days ago

Hello, Dada!

Following on my Fun with Dada post, this post is going to start teaching Dada. I’m going to keep each post short – basically just what I can write while having my morning coffee. 1 Here is a very first Dada program. I think all of you will be able to guess what it does. Still, there is something worth noting even in this simple program: “You have the right to write code. If you don’t write a function explicitly, one will be provided for you.” Early on I made the change to let users omit the function and I was surprised by what a difference it made in how light the language felt. Easy change, easy win. Here is another Dada program. Unsurprisingly, this program does the same thing as the last one. “Convenient is the default.” Strings support interpolation by default. In fact, that’s not all they support: you can also break them across lines very conveniently. This program does the same thing as the others we’ve seen: when the opening quote is immediately followed by a newline, the leading and trailing newline are stripped, along with the “whitespace prefix” from the subsequent lines. Internal newlines are kept. Of course you could also annotate the type of the variable explicitly. You will find that it is simply a string. This in and of itself is not notable, unless you are accustomed to Rust, where the type of a string literal differs from that of a constructed string. This is of course a perennial stumbling block for new Rust users, but more than that, I find it to be a big annoyance – I hate the conversions I need everywhere that I mix constant strings with strings that are constructed. Similar to most modern languages, strings in Dada are immutable. So you can create them and copy them around. OK, we really just scratched the surface here! This is just the “friendly veneer” of Dada, which looks and feels like a million other languages. Next time I’ll start getting into the permission system and mutation, where things get a bit more interesting. 
My habit is to wake around 5am and spend the first hour of the day doing “fun side projects”. But for the last N months I’ve actually been doing Rust stuff, like symposium.dev and preparing the 2026 Rust Project Goals. Both of these are super engaging, but all Rust and no play makes Niko a dull boy. Also a grouchy boy. ↩︎

0 views
Armin Ronacher 2 days ago

A Language For Agents

Last year I first started thinking about what the future of programming languages might look like now that agentic engineering is a growing thing. Initially I felt that the enormous corpus of pre-existing code would cement existing languages in place but now I’m starting to think the opposite is true. Here I want to outline my thinking on why we are going to see more new programming languages and why there is quite a bit of space for interesting innovation. And just in case someone wants to start building one, here are some of my thoughts on what we should aim for! Does an agent perform dramatically better on a language that it has in its weights? Obviously yes. But there are less obvious factors that affect how good an agent is at programming in a language: how good the tooling around it is and how much churn there is. Zig seems underrepresented in the weights (at least in the models I’ve used) and also changing quickly. That combination is not optimal, but it’s still passable: you can program even in the upcoming Zig version if you point the agent at the right documentation. But it’s not great. On the other hand, some languages are well represented in the weights but agents still don’t succeed as much because of tooling choices. Swift is a good example: in my experience the tooling around building a Mac or iOS application can be so painful that agents struggle to navigate it. Also not great. So, just because it exists doesn’t mean the agent succeeds and just because it’s new also doesn’t mean that the agent is going to struggle. I’m convinced that you can build yourself up to a new language if you don’t want to depart everywhere all at once. The biggest reason new languages might work is that the cost of coding is going down dramatically. The result is the breadth of an ecosystem matters less. I’m now routinely reaching for JavaScript in places where I would have used Python. 
Not because I love it or the ecosystem is better, but because the agent does much better with TypeScript. The way to think about this: if important functionality is missing in my language of choice, I just point the agent at a library from a different language and have it build a port. As a concrete example, I recently built an Ethernet driver in JavaScript to implement the host controller for our sandbox. Implementations exist in Rust, C, and Go, but I wanted something pluggable and customizable in JavaScript. It was easier to have the agent reimplement it than to make the build system and distribution work against a native binding. New languages will work if their value proposition is strong enough and they evolve with knowledge of how LLMs train. People will adopt them despite their being underrepresented in the weights. And if they are designed to work well with agents, then they might be designed around familiar syntax that is already known to work well. So why would we want a new language at all? The reason this is interesting to think about is that many of today’s languages were designed with the assumption that punching keys is laborious, so we traded certain things for brevity. As an example, many languages — particularly modern ones — lean heavily on type inference so that you don’t have to write out types. The downside is that you now need an LSP or the resulting compiler error messages to figure out what the type of an expression is. Agents struggle with this too, and it’s also frustrating in pull request review, where complex operations can make it very hard to figure out what the types actually are. Fully dynamic languages are even worse in that regard. The cost of writing code is going down, but because we are also producing more of it, understanding what the code does is becoming more important. We might actually want more code to be written if it means there is less ambiguity when we perform a review. 
I also want to point out that we are heading towards a world where some code is never seen by a human and is only consumed by machines. Even in that case, we still want to give an indication to a user, who is potentially a non-programmer, about what is going on. We want to be able to explain to a user what the code will do without going into the details of how. So the case for a new language comes down to: given the fundamental changes in who is programming and what the cost of code is, we should at least consider one. It’s tricky to say what an agent wants because agents will lie to you and they are influenced by all the code they’ve seen. But one way to estimate how they are doing is to look at how many changes they have to perform on files and how many iterations they need for common tasks. There are some things I’ve found that I think will be true for a while. The language server protocol lets an IDE infer information about what’s under the cursor or what should be autocompleted based on semantic knowledge of the codebase. It’s a great system, but it comes at one specific cost that is tricky for agents: the LSP has to be running. There are situations when an agent just won’t run the LSP — not because of technical limitations, but because it’s also lazy and will skip that step if it doesn’t have to. If you give it an example from documentation, there is no easy way to run the LSP because it’s a snippet that might not even be complete. If you point it at a GitHub repository and it pulls down individual files, it will just look at the code. It won’t set up an LSP for type information. A language that doesn’t split into two separate experiences (with-LSP and without-LSP) will be beneficial to agents because it gives them one unified way of working across many more situations. It pains me as a Python developer to say this, but whitespace-based indentation is a problem. 
The underlying token efficiency of getting whitespace right is tricky, and a language with significant whitespace is harder for an LLM to work with. This is particularly noticeable if you try to make an LLM do surgical changes without an assisted tool. Quite often they will intentionally disregard whitespace, add markers to enable or disable code and then rely on a code formatter to clean up indentation later. On the other hand, braces that are not separated by whitespace can cause issues too. Depending on the tokenizer, runs of closing parentheses can end up split into tokens in surprising ways (a bit like the “strawberry” counting problem), and it’s easy for an LLM to get Lisp or Scheme wrong because it loses track of how many closing parentheses it has already emitted or is looking at. Fixable with future LLMs? Sure, but also something that was hard for humans to get right too without tooling. Readers of this blog might know that I’m a huge believer in async locals and flow execution context — basically the ability to carry data through every invocation that might only be needed many layers down the call chain. Working at an observability company has really driven home the importance of this for me. The challenge is that anything that flows implicitly might not be configured. Take for instance the current time. You might want to implicitly pass a timer to all functions. But what if a timer is not configured and all of a sudden a new dependency appears? Passing all of it explicitly is tedious for both humans and agents and bad shortcuts will be made. One thing I’ve experimented with is having effect markers on functions that are added through a code formatting step. A function can declare that it needs the current time or the database, but if it doesn’t mark this explicitly, it’s essentially a linting warning that auto-formatting fixes. 
The LLM can start using something like the current time in a function and any existing caller gets the warning; formatting propagates the annotation. This is nice because when the LLM builds a test, it can precisely mock out these side effects — it understands from the error messages what it has to supply. Agents struggle with exceptions; they are afraid of them. I’m not sure to what degree this is solvable with RL (Reinforcement Learning), but right now agents will try to catch everything they can, log it, and do a pretty poor recovery. Given how little information is actually available about error paths, that makes sense. Checked exceptions are one approach, but they propagate all the way up the call chain and don’t dramatically improve things. Even if they end up as hints where a linter tracks which errors can fly by, there are still many call sites that need adjusting. And like the auto-propagation proposed for context data, it might not be the right solution. Maybe the right approach is to lean more heavily on typed results, but that’s still tricky for composability without a type and object system that supports it. The general approach agents use today to read files into memory is line-based, which means they often pick chunks that span multi-line strings. One easy way to see this fall apart: have an agent work on a 2000-line file that also contains long embedded code strings — basically a code generator. The agent will sometimes edit within a multi-line string, assuming it’s the real code when it’s actually just an embedded string. For multi-line strings, the only language I’m aware of with a good solution is Zig, but its prefix-based syntax is pretty foreign to most people. Reformatting also often causes constructs to move to different lines. In many languages, trailing commas in lists are either not supported (JSON) or not customary. 
If you want diff stability, you’d aim for a syntax that requires less reformatting and mostly avoids multi-line constructs. What’s really nice about Go is that you mostly cannot import symbols from another package into scope: every use is prefixed with the package name. There are escape hatches (import aliases and dot-imports), but they’re relatively rare and usually frowned upon. That dramatically helps an agent understand what it’s looking at. In general, making code findable through the most basic tools is great — it works with external files that aren’t indexed, and it means fewer false positives for large-scale automation driven by code generated on the fly. Much of what I’ve said boils down to: agents really like local reasoning. They want it to work in parts because they often work with just a few loaded files in context and don’t have much spatial awareness of the codebase. They rely on external tooling like grep to find things, and anything that’s hard to grep or that hides information elsewhere is tricky. What makes agents fail or succeed in many languages is just how good the build tools are. Many languages make it very hard to determine what actually needs to rebuild or be retested because there are too many cross-references. Go is really good here: it forbids circular dependencies between packages (import cycles), packages have a clear layout, and test results are cached. Agents often struggle with macros. It was already pretty clear that humans struggle with macros too, but the argument for them was mostly that code generation was a good way to have less code to write. Since that is less of a concern now, we should aim for languages with less dependence on macros. There’s a separate question about generics and comptime. I think they fare somewhat better because they mostly generate the same structure with different placeholders, and it’s much easier for an agent to understand that. 
Related to greppability: agents often struggle to understand barrel files and they don’t like them. Not being able to quickly figure out where a class or function comes from leads to imports from the wrong place, or missing things entirely and wasting context by reading too many files. A one-to-one mapping from where something is declared to where it’s imported from is great. And it does not have to be overly strict either. Go kind of goes this way, but not too extreme. Any file within a directory can define a function, which isn’t optimal, but it’s quick enough to find and you don’t need to search too far. It works because packages are forced to be small enough to find everything with grep. The worst case is free re-exports all over the place that completely decouple the implementation from any trivially reconstructable location on disk. Or worse: aliasing. Agents often hate it when aliases are involved. In fact, you can even get them to complain about it in thinking blocks if you let them refactor something that uses lots of aliases. Ideally a language encourages good naming and discourages aliasing at import time as a result. Nobody likes flaky tests, but agents even less so. Ironic, given how particularly good agents are at creating flaky tests in the first place. That’s because agents currently love to mock and most languages do not support mocking well. So many tests end up accidentally not being concurrency safe or depend on development environment state that then diverges in CI or production. Most programming languages and frameworks make it much easier to write flaky tests than non-flaky ones. That’s because they encourage indeterminism everywhere. In an ideal world the agent has one command that lints and compiles and tells it whether everything worked out fine. Maybe another command to run all tests that need running. In practice most environments don’t work like this. For instance, in TypeScript you can often run the code even though it fails type checks. 
That can gaslight the agent. Likewise, different bundler setups can cause one thing to succeed just for a slightly different setup in CI to fail later. The more uniform the tooling the better. Ideally it either runs or doesn’t, and there is mechanical fixing for as many linting failures as possible so that the agent does not have to do it by hand. I think we will. We are writing more software now than we ever have — more websites, more open source projects, more of everything. Even if the ratio of new languages stays the same, the absolute number will go up. But I also truly believe that many more people will be willing to rethink the foundations of software engineering and the languages we work with. That’s because while for some years it has felt like you need to build a lot of infrastructure for a language to take off, now you can target a rather narrow use case: make sure the agent is happy and extend from there to the human. I just hope we see two things. First, some outsider art: people who haven’t built languages before trying their hand at it and showing us new things. Second, a much more deliberate effort to document what works and what doesn’t from first principles. We have actually learned a lot about what makes good languages and how to scale software engineering to large teams. Yet finding it written down, as a consumable overview of good and bad language design, is very hard. Too much of it has been shaped by opinion on rather pointless things instead of hard facts. Now, though, we are slowly getting to the point where facts matter more, because you can actually measure what works by seeing how well agents perform with it. No human wants to be subject to surveys, but agents don’t care. We can see how successful they are and where they are struggling.

0 views
Kev Quirk 2 days ago

Step Aside, Phone!

I read this post on Manu's blog and it immediately resonated. I've been spending more time than I'd like to admit staring at my phone recently, and most of that consists of a stupid game, or YouTube shorts. If you also want to cut down on some of your phone usage, feel free to join in; I’ll be happy to include links to your posts. As a benchmark, my screen time this week averaged around 2.5 hours per day on my phone and 1.5 hours per day on my tablet. That's bloody embarrassing - 28 hours in one week sat staring at (mostly) pointless shite on a fucking screen. I think my phone usage is more harmful as it's stupid stuff, whereas my tablet is more reading posts in my RSS reader, and "proper" YouTube (whatever that is). I think reducing both and picking up my Kindle more - or just being bored - will be far more healthy though. So count me in, Manu. Thanks for reading this post via RSS. RSS is great, and you're great for using it. ❤️ You can reply to this post by email , or leave a comment .

1 views
Simon Willison 3 days ago

How StrongDM's AI team build serious software without even looking at the code

Last week I hinted at a demo I had seen from a team implementing what Dan Shapiro called the Dark Factory level of AI adoption, where no human even looks at the code the coding agents are producing. That team was part of StrongDM, and they've just shared the first public description of how they are working in Software Factories and the Agentic Moment: We built a Software Factory: non-interactive development where specs + scenarios drive agents that write code, run harnesses, and converge without human review. [...] Their philosophy comes in kōan, rule, and practical forms, quoted at the end of this post. I think the most interesting of these, without a doubt, is "Code must not be reviewed by humans". How could that possibly be a sensible strategy when we all know how prone LLMs are to making inhuman mistakes? I've seen many developers recently acknowledge the November 2025 inflection point, where Claude Opus 4.5 and GPT 5.2 appeared to turn the corner on how reliably a coding agent could follow instructions and take on complex coding tasks. StrongDM's AI team was founded in July 2025 based on an earlier inflection point relating to Claude Sonnet 3.5: The catalyst was a transition observed in late 2024: with the second revision of Claude 3.5 (October 2024), long-horizon agentic coding workflows began to compound correctness rather than error. By December of 2024, the model's long-horizon coding performance was unmistakable via Cursor's YOLO mode. Their new team started with the rule "no hand-coded software" - radical for July 2025, but something I'm seeing significant numbers of experienced developers start to adopt as of January 2026. They quickly ran into the obvious problem: if you're not writing anything by hand, how do you ensure that the code actually works? Having the agents write tests only helps if they don't cheat. 
This feels like the most consequential question in software development right now: how can you prove that software you are producing works if both the implementation and the tests are being written for you by coding agents? StrongDM's answer was inspired by Scenario testing (Cem Kaner, 2003). As StrongDM describe it: We repurposed the word scenario to represent an end-to-end "user story", often stored outside the codebase (similar to a "holdout" set in model training), which could be intuitively understood and flexibly validated by an LLM. Because much of the software we grow itself has an agentic component, we transitioned from boolean definitions of success ("the test suite is green") to a probabilistic and empirical one. We use the term satisfaction to quantify this validation: of all the observed trajectories through all the scenarios, what fraction of them likely satisfy the user? That idea of treating scenarios as holdout sets - used to evaluate the software but not stored where the coding agents can see them - is fascinating. It imitates aggressive testing by an external QA team - an expensive but highly effective way of ensuring quality in traditional software. Which leads us to StrongDM's concept of a Digital Twin Universe - the part of the demo I saw that made the strongest impression on me. The software they were building helped manage user permissions across a suite of connected services. This in itself was notable - security software is the last thing you would expect to be built using unreviewed LLM code! [The Digital Twin Universe is] behavioral clones of the third-party services our software depends on. We built twins of Okta, Jira, Slack, Google Docs, Google Drive, and Google Sheets, replicating their APIs, edge cases, and observable behaviors. With the DTU, we can validate at volumes and rates far exceeding production limits. We can test failure modes that would be dangerous or impossible against live services. 
We can run thousands of scenarios per hour without hitting rate limits, triggering abuse detection, or accumulating API costs. How do you clone the important parts of Okta, Jira, Slack and more? With coding agents! As I understood it, the trick was effectively to dump the full public API documentation of one of those services into their agent harness and have it build an imitation of that API, as a self-contained Go binary. They could then have it build a simplified UI over the top to help complete the simulation. With their own, independent clones of those services - free from rate-limits or usage quotas - their army of simulated testers could go wild. Their scenario tests became scripts for agents to constantly execute against the new systems as they were being built. This screenshot of their Slack twin also helps illustrate how the testing process works, showing a stream of simulated Okta users who are about to need access to different simulated systems. This ability to quickly spin up a useful clone of a subset of Slack helps demonstrate how disruptive this new generation of coding agent tools can be: Creating a high fidelity clone of a significant SaaS application was always possible, but never economically feasible. Generations of engineers may have wanted a full in-memory replica of their CRM to test against, but self-censored the proposal to build it. The techniques page is worth a look too. In addition to the Digital Twin Universe they introduce terms like Gene Transfusion for having agents extract patterns from existing systems and reuse them elsewhere, Semports for directly porting code from one language to another and Pyramid Summaries for providing multiple levels of summary such that an agent can enumerate the short ones quickly and zoom in on more detailed information as it is needed. StrongDM AI also released some software - in an appropriately unconventional manner. 
github.com/strongdm/attractor is Attractor, the non-interactive coding agent at the heart of their software factory. Except the repo itself contains no code at all - just three markdown files describing the spec for the software in meticulous detail, and a note in the README that you should feed those specs into your coding agent of choice!

github.com/strongdm/cxdb is a more traditional release, with 16,000 lines of Rust, 9,500 of Go and 6,700 of TypeScript. This is their "AI Context Store" - a system for storing conversation histories and tool outputs in an immutable DAG. It's similar to my LLM tool's SQLite logging mechanism but a whole lot more sophisticated. I may have to gene transfuse some ideas out of this one!

I visited the StrongDM AI team back in October as part of a small group of invited guests. The three-person team of Justin McCarthy, Jay Taylor and Navan Chauhan had formed just three months earlier, and they already had working demos of their coding agent harness, their Digital Twin Universe clones of half a dozen services and a swarm of simulated test agents running through scenarios. And this was prior to the Opus 4.5/GPT 5.2 releases, a month after those demos, that made agentic coding significantly more reliable.

It felt like a glimpse of one potential future of software development, where software engineers move from building the code to building and then semi-monitoring the systems that build the code. The Dark Factory.

I glossed over this detail in my first published version of this post, but it deserves some serious attention. If these patterns really do add $20,000/month per engineer to your budget they're far less interesting to me. At that point this becomes more of a business model exercise: can you create a profitable enough line of products that you can afford the enormous overhead of developing software in this way?
Building sustainable software businesses also looks very different when any competitor can potentially clone your newest features with a few hours of coding agent work. I hope these patterns can be put into play with a much lower spend. I've personally found the $200/month Claude Max plan gives me plenty of space to experiment with different agent patterns, but I'm also not running a swarm of QA testers 24/7!

I think there's a lot to learn from StrongDM even for teams and individuals who aren't going to burn thousands of dollars on token costs. I'm particularly invested in the question of what it takes to have agents prove that their code works without needing to review every line of code they produce.

Why am I doing this? (implied: the model should be doing this instead)

- Code must not be written by humans
- Code must not be reviewed by humans
- If you haven't spent at least $1,000 on tokens today per human engineer, your software factory has room for improvement
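To make the scenario/satisfaction idea concrete, here is a toy sketch; every name in it is my own invention, not StrongDM's code. An in-memory "twin" of a permissions service stands in for a real API, a few simulated trajectories run against it, and satisfaction is the fraction of trajectories a judge accepts. In a real harness the judge would be an LLM validating the scenario against a held-out user story; here it is a stub:

```python
# Toy sketch: run simulated user trajectories against an in-memory
# "digital twin" of a permissions service, then score the fraction a
# judge deems satisfying. All names invented for illustration.

class PermissionsTwin:
    """In-memory stand-in for a permissions API: no network, no rate limits."""
    def __init__(self):
        self.grants = set()

    def grant(self, user: str, system: str) -> dict:
        self.grants.add((user, system))
        return {"ok": True}

    def check(self, user: str, system: str) -> bool:
        return (user, system) in self.grants

def run_scenario(twin: PermissionsTwin, user: str, system: str,
                 flaky: bool) -> str:
    """Simulate one trajectory; 'flaky' runs drop the grant step."""
    if not flaky:
        twin.grant(user, system)
    return "granted" if twin.check(user, system) else "denied"

# Judge: in a real harness this would be an LLM validating the scenario;
# here, the simulated user is satisfied if they ended up with access.
judge = lambda outcome: outcome == "granted"

trajectories = [run_scenario(PermissionsTwin(), "alice", "jira", flaky)
                for flaky in (False, False, True, False)]
satisfaction = sum(judge(t) for t in trajectories) / len(trajectories)
print(satisfaction)  # 0.75
```

The appeal of the twin is visible even at this scale: the simulated testers can hammer it as hard as they like, and failure modes (the "flaky" branch) can be injected deliberately rather than waited for.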

Karboosx 4 days ago

Tech documentation is pointless (mostly)

Do you really trust documentation for your evolving codebase? Probably not fully! So why do we even write documentation, or constantly complain about the lack of it? Let's talk about that :D


Premium: The Hater's Guide To Microsoft

Have you ever looked at something too long and felt like you were sort of seeing through it? Has anybody actually looked at a company this much in a way that wasn’t some sort of obsequious profile of a person who worked there? I don’t mean this as a way to fish for compliments — this experience is just so peculiar, because when you look at them hard enough, you begin to wonder why everybody isn’t just screaming all the time.

Yet I really do enjoy it. When you push aside all the marketing and the interviews and all that and stare at what a company actually does and what its users and employees say, you really get a feel for the guts of a company. The Hater’s Guides are a lot of fun, and I’m learning all sorts of things about the ways in which companies try to hide their nasty little accidents and proclivities.

Today, I focus on one of the largest.

In the last year I’ve spoken to over a hundred different tech workers, and the ones I hear most consistently from are the current and former victims of Microsoft, a company with a culture in decline, in large part thanks to its obsession with AI. Every single person I talk to about this company has venom on their tongue, whether they’re a regular user of Microsoft Teams or somebody who was unfortunate enough to work at the company any time in the last decade.

Microsoft exists as a kind of dark presence over business software and digital infrastructure. You inevitably have to interact with one of its products — maybe it’s because somebody you work with uses Teams, maybe it’s because you’re forced to use SharePoint, or perhaps you’re suffering at the hands of PowerBI — because Microsoft is the king of software sales. It exists entirely to seep into the veins of an organization and force every computer to use Microsoft 365, or sit on effectively every PC you use, forcing you to interact with some sort of branded content every time you open your start menu.
This is a direct result of the aggressive monopolies that Microsoft built over effectively every aspect of using the computer, starting by throwing its weight around in the 80s to crowd out potential competitors to MS-DOS and eventually moving into everything including cloud compute, cloud storage, business analytics, video editing, and console gaming, and I’m barely a third through the list of products.

Microsoft uses its money to move into new markets, uses aggressive sales to build long-term contracts with organizations, and then lets its products fester until it’s forced to make them better before everybody leaves, with the best example being the recent performance-focused move to “rebuild trust in Windows” in response to the upcoming launch of Valve’s competitor to the Xbox (and Windows gaming in general), the Steam Machine.

Microsoft is a company known for two things: scale and mediocrity. It’s everywhere, its products range from “okay” to “annoying,” and virtually every one of its products is a clone of something else. And nowhere is that mediocrity more obvious than in its CEO.

Since taking over in 2014, CEO Satya Nadella has steered this company out of the darkness caused by aggressive possible chair-thrower Steve Ballmer, transforming it from the evils of stack ranking to encouraging a “growth mindset” where you “believe your most basic abilities can be developed through dedication and hard work.” Workers are encouraged to be “learn-it-alls” rather than “know-it-alls,” all part of a weird cult-like pseudo-psychology that doesn’t really ring true if you actually work at the company.

Nadella sells himself as a calm, thoughtful and peaceful man, yet in reality he’s one of the most merciless layoff hogs in known history.
He laid off 18,000 people in 2014 months after becoming CEO, 7,800 people in 2015 , 4,700 people in 2016 , 3,000 people in 2017 , “hundreds” of people in 2018 , took a break in 2019, every single one of the workers in its physical stores in 2020 along with everybody who worked at MSN , took a break in 2021, 1,000 people in 2022 , 16,000 people in 2023 , 15,000 people in 2024 and 15,000 people in 2025 .  Despite calling for a “ referendum on capitalism ” in 2020 and suggesting companies “grade themselves” on the wider economic benefits they bring to society, Nadella has overseen an historic surge in Microsoft’s revenues — from around $83 billion a year when he joined in 2014 to around $300 billion on a trailing 12-month basis — while acting in a way that’s callously indifferent to both employees and customers alike.  At the same time, Nadella has overseen Microsoft’s transformation from an asset-light software monopolist that most customers barely tolerate to an asset-heavy behemoth that feeds its own margins into GPUs that only lose it money. And it’s that transformation that is starting to concern investors , and raises the question of whether Microsoft is heading towards a painful crash.  You see, Microsoft is currently trying to pull a fast one on everybody, claiming that its investments in AI are somehow paying off despite the fact that it stopped reporting AI revenue in the first quarter of 2025 . In reality, the one segment where it would matter — Microsoft Azure, Microsoft’s cloud platform where the actual AI services are sold — is stagnant, all while Redmond funnels virtually every dollar of revenue directly into more GPUs.  Intelligent Cloud also represents around 40% of Microsoft’s total revenue, and has done so consistently since FY2022. Azure sits within Microsoft's Intelligent Cloud segment, along with server products and enterprise support. 
For the sake of clarity, here’s how Microsoft describes Intelligent Cloud in its latest end-of-year 10-K filing: Our Intelligent Cloud segment consists of our public, private, and hybrid server products and cloud services that power modern business and developers. This segment primarily comprises: It’s a big, diverse thing — and Microsoft doesn’t really break things down further from here — but Microsoft makes it clear in several places that Azure is the main revenue driver in this fairly diverse business segment.

Some bright spark is going to tell me that Microsoft said it has 15 million paid 365 Copilot subscribers (which, I add, sits under its Productivity and Business Processes segment), with reporters specifically saying these were corporate seats, a fact I dispute, because this is the quote from Microsoft’s latest conference call around earnings: At no point does Microsoft say “corporate seat” or “business seat.” “Enterprise Copilot Chat” is a free addition to multiple different Microsoft 365 products, and Microsoft 365 Copilot could also refer to Microsoft’s $18 to $21-a-month addition to Copilot Business, as well as Microsoft’s enterprise $30-a-month plans. And remember: Microsoft regularly does discounts through its resellers to bulk up these numbers.

When Nadella took over, Microsoft had around $11.7 billion in PP&E (property, plant, and equipment). A little over a decade later, that number has ballooned to $261 billion, with the vast majority added since 2020 (when Microsoft’s PP&E sat around $41 billion).

Also, as a reminder: Jensen Huang has made it clear that GPUs are going to be upgraded on a yearly cycle, guaranteeing that Microsoft’s armies of GPUs regularly hurtle toward obsolescence. Microsoft, like every big tech company, has played silly games with how it depreciates assets, extending the “useful life” of all GPUs so that they depreciate over six years, rather than four.
And while someone less acquainted with corporate accounting might assume that this move is a prudent, fiscally-conscious tactic to reduce spending by using assets for longer, and stretching the intervals between their replacements, in reality it’s a handy tactic to disguise the cost of Microsoft’s profligate spending on the balance sheet.

You might be forgiven for thinking that all of this investment was necessary to grow Azure, which is clearly the most important part of Microsoft’s Intelligent Cloud segment. In Q2 FY2020, Intelligent Cloud revenue sat at $11.9 billion on PP&E of around $40 billion, and as of Microsoft’s last quarter, Intelligent Cloud revenue sat at around $32.9 billion on PP&E that has increased by over 650%.

Good, right? Well, not really. Let’s compare Microsoft’s Intelligent Cloud revenue from the last five years: In the last five years, Microsoft has gone from spending 38% of its Intelligent Cloud revenue on capex to nearly every penny (over 94%) of it in the last six quarters, over the same two and a half years in which Intelligent Cloud has failed to show any growth.

Things, I’m afraid, get worse. Microsoft announced in July 2025 — the end of its 2025 fiscal year — that Azure made $75 billion in revenue in FY2025. This was, as the previous link notes, the first time that Microsoft actually broke down how much Azure actually made, having previously simply lumped it in with the rest of the Intelligent Cloud segment. I’m not sure what to read from that, but it’s still not good: Microsoft spent every single penny of its Azure revenue from that fiscal year on capital expenditures of $88 billion and then some, about 117% of all Azure revenue, to be precise. If we assume Azure regularly represents 71% of Intelligent Cloud revenue, Microsoft has been spending anywhere from half to three-quarters of Azure’s revenue on capex.
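The capex-to-revenue ratio follows directly from the two figures quoted ($88 billion of FY2025 capital expenditures against $75 billion of FY2025 Azure revenue); a quick check:

```python
# Verify the capex-vs-Azure-revenue claim using the two figures quoted
# in the post (dollar amounts in billions).
capex_bn = 88.0           # FY2025 capital expenditures, per the post
azure_revenue_bn = 75.0   # FY2025 Azure revenue, per the post

ratio = capex_bn / azure_revenue_bn
print(f"{ratio:.1%}")  # 117.3%
```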
To simplify: Microsoft is spending lots of money to build out capacity on Microsoft Azure (as part of Intelligent Cloud), and growth of capex is massively outpacing the meager growth that it’s meant to be creating.

You know what’s also been growing? Microsoft’s depreciation charges, which grew from $2.7 billion at the beginning of 2023 to $9.1 billion in Q2 FY2026, though I will add that they dropped from $13 billion in Q1 FY2026, and if I’m honest, I have no idea why! Nevertheless, depreciation continues to erode Microsoft’s on-paper profits, growing (much like capex, as the two are connected!) at a much faster rate than any investment in Azure or Intelligent Cloud.

But worry not, traveler! Microsoft “beat” on earnings last quarter, making a whopping $38.46 billion in net income… with $9.97 billion of that coming from recapitalizing its stake in OpenAI.

Similarly, Microsoft has started bulking up its Remaining Performance Obligations. See if you can spot the difference between Q1 and Q2 FY26, emphasis mine: So, let’s just lay it out: Microsoft’s upcoming revenue dropped between quarters as every single expenditure increased, despite adding over $200 billion in future revenue commitments from OpenAI. A “weighted average duration” of 2.5 years somehow reduced Microsoft’s RPOs.

But let’s be fair and jump back to Q4 FY2025: 40% of $375 billion is $150 billion. Q3 FY25? 40% of $321 billion, or $128.4 billion. Q2 FY25? $304 billion, 40%, or $121.6 billion.

It appears that Microsoft’s revenue is stagnating, even with the supposed additions of $250 billion in spend from OpenAI and $30 billion from Anthropic, the latter of which was announced in November but doesn’t appear to have manifested in these RPOs at all. In simpler terms, OpenAI and Anthropic do not appear to be spending more as a result of any recent deals, and if they are, that money isn’t arriving for over a year.
Much like the rest of AI, every deal with these companies appears to be entirely on paper, likely because OpenAI will burn at least $115 billion by 2029 , and Anthropic upwards of $30 billion by 2028, when it mysteriously becomes profitable two years before OpenAI “does so” in 2030 .  These numbers are, of course, total bullshit. Neither company can afford even $20 billion of annual cloud spend, let alone multiple tens of billions a year, and that’s before you get to OpenAI’s $300 billion deal with Oracle that everybody has realized ( as I did in September ) requires Oracle to serve non-existent compute to OpenAI and be paid hundreds of billions of dollars that, helpfully, also don’t exist. Yet for Microsoft, the problems are a little more existential.  Last year, I calculated that big tech needed $2 trillion in new revenue by 2030 or investments in AI were a loss , and if anything, I think I slightly underestimated the scale of the problem. As of the end of its most recent fiscal quarter, Microsoft has spent $277 billion or so in capital expenditures since the beginning of FY2022, with the majority of them ($216 billion) happening since the beginning of FY2024. Capex has ballooned to the size of 45.5% of Microsoft’s FY26 revenue so far — and over 109% of its net income.  This is a fucking disaster. While net income is continuing to grow, it (much like every other financial metric) is being vastly outpaced by capital expenditures, none of which can be remotely tied to profits , as every sign suggests that generative AI only loses money. While AI boosters will try and come up with complex explanations as to why this is somehow alright, Microsoft’s problem is fairly simple: it’s now spending 45% of its revenues to build out data centers filled with painfully expensive GPUs that do not appear to be significantly contributing to overall revenue, and appear to have negative margins. 
Those same AI boosters will point at the growth of Intelligent Cloud as proof, so let’s do a thought experiment (even though they are wrong): if Intelligent Cloud’s segment growth is a result of AI compute, then the cost of revenue has vastly increased, and the only reason we’re not seeing it is that the increased costs are hitting depreciation first.

You see, Intelligent Cloud is stalling, and while it might be up by 8.8% on an annualized basis (if we assume each quarter of the year will be around $30 billion, that makes $120 billion, so about an 8.8% year-over-year increase from $106 billion), that’s come at the cost of a massive increase in capex (already $72 billion in just the first two quarters of FY2026, against $88 billion for all of FY2025), and gross margins that have deteriorated from 69.89% in Q3 FY2024 to 68.59% in Q2 FY2026, and while operating margins are up, that’s likely due to Microsoft’s increasing use of contract workers and increased recruitment in cheaper labor markets.

And as I’ll reveal later, Microsoft has used OpenAI’s billions in inference spend to cover up the collapse of the growth of the Intelligent Cloud segment. OpenAI’s inference spend now represents around 10% of Azure’s revenue.

Microsoft, as I discussed a few weeks ago, is in a bind. It keeps buying GPUs, all while waiting for the GPUs it already has to start generating revenue, and every time a new GPU comes online, its depreciation balloons. Capex for GPUs began in earnest in Q1 FY2023 following October’s shipments of NVIDIA’s H100 GPUs, with reports saying that Microsoft bought 150,000 H100s in 2023 (around $4 billion at $27,000 each) and 485,000 H100s in 2024 ($13 billion).
These GPUs are yet to provide much meaningful revenue, let alone any kind of profit, with reports suggesting (based on Oracle leaks) that the gross margins of H100s are around 26% and A100s (an older generation launched in 2020) are 9%, for which the technical term is “dogshit.” Somewhere within that pile of capex also lie orders for H200 GPUs, and as of 2024, likely NVIDIA’s B100 (and maybe B200) Blackwell GPUs too.

You may also notice that those GPU expenses are only some portion of Microsoft’s capex, and the reason is that Microsoft spends billions on finance leases and construction costs. What this means in practical terms is that some of this money is going to GPUs that are obsolete in six years, some of it’s going to paying somebody else to lease physical space, and some of it is going into building a bunch of data centers that are only useful for putting GPUs in. And none of this bullshit is really helping the bottom line!

Microsoft’s More Personal Computing segment — including Windows, Xbox, Microsoft 365 Consumer, and Bing — has become an increasingly smaller part of revenue, representing a mere 17.64% of Microsoft’s revenue in FY26 so far, down from 30.25% a mere four years ago.

We are witnessing the consequences of hubris — those of a monopolist that chased out any real value creators from the organization, replacing them with an increasingly annoying cadre of Business Idiots like career loser Jay Parikh and scummy, abusive timewaster Mustafa Suleyman.

Satya Nadella took over Microsoft with the intention of fixing its culture, only to replace the aggressive, loudmouthed Ballmer brand with a poisonous, passive-aggressive business mantra of “you’ve always got to do more with less.” Today, I’m going to walk you through the rotting halls of Redmond’s largest son, a bumbling conga line of different businesses that all work exactly as well as Microsoft can get away with.
Welcome to The Hater’s Guide To Microsoft, or Instilling The Oaf Mindset.

- Server products and cloud services, including Azure and other cloud services, comprising cloud and AI consumption-based services, GitHub cloud services, Nuance Healthcare cloud services, virtual desktop offerings, and other cloud services; and
- Server products, comprising SQL Server, Windows Server, Visual Studio, System Center, related Client Access Licenses (“CALs”), and other on-premises offerings.
- Enterprise and partner services, including Enterprise Support Services, Industry Solutions, Nuance professional services, Microsoft Partner Network, and Learning Experience.

- Q1: $398 billion of RPOs, 40% within 12 months, $159.2 billion in upcoming revenue.
- Q2: $625 billion of RPOs, 25% within 12 months, $156.25 billion in upcoming revenue.
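The 12-month RPO figures quoted above can be recomputed directly from the headline totals:

```python
# Recompute the portion of Remaining Performance Obligations (RPOs)
# expected to convert to revenue within 12 months, using the quarterly
# figures quoted in the post (amounts in billions of dollars).
def rpo_due_within_year(total_bn: float, fraction: float) -> float:
    return round(total_bn * fraction, 2)

q1_fy26 = rpo_due_within_year(398, 0.40)
q2_fy26 = rpo_due_within_year(625, 0.25)

print(q1_fy26)  # 159.2
print(q2_fy26)  # 156.25

# Total RPOs grew between quarters, but the slice landing within
# 12 months shrank.
print(q2_fy26 < q1_fy26)  # True
```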

Susam Pal 5 days ago

Stories From 25 Years of Computing

Last year, I completed 20 years in professional software development. I wanted to write a post to mark the occasion back then, but couldn't find the time. This post is my attempt to make up for that omission. In fact, I have been involved in software development for a little longer than 20 years. Although I had my first taste of computer programming as a child, it was only when I entered university about 25 years ago that I seriously got into software development. So I'll start my stories from there. These stories are less about software and more about people. Unlike many posts of this kind, this one offers no wisdom or lessons. It only offers a collection of stories. I hope you'll like at least a few of them. The first story takes place in 2001, shortly after I joined university. One evening, I went to the university computer laboratory to browse the Web. Out of curiosity, I typed into the address bar to see what kind of website existed there. I ended up on this home page: susam.com . I remember that the text and the banner looked much larger back then. Since display resolutions were lower, the text and banner covered almost half the screen. I knew very little about the Internet then and I was just trying to make sense of it. I remember wondering what it would take to create my own website, perhaps at . That's when an older student who had been watching me browse over my shoulder approached and asked if I had created the website. I told him I hadn't and that I had no idea how websites were made. He asked me to move aside, took my seat and clicked View > Source in Internet Explorer. He then explained how websites are made of HTML pages and how those pages are simply text instructions. Next, he opened Notepad and wrote a simple HTML page that looked something like this: He then opened the page in a web browser and showed how it rendered. 
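A minimal page of that era, consistent with the features he went on to demonstrate (font face, centred text, background colour), might have looked something like this; this is a reconstruction, not the original snippet:

```html
<!-- A plausible Notepad-era first page; a reconstruction, not the
     original snippet from the story. -->
<html>
  <head>
    <title>My First Page</title>
  </head>
  <body bgcolor="yellow">
    <center>
      <font face="Arial" size="6">Hello, World Wide Web!</font>
    </center>
  </body>
</html>
```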
After that, he demonstrated a few more features such as changing the font face and size, centring the text and altering the page's background colour. Although the tutorial lasted only about ten minutes, it made the World Wide Web feel far less mysterious and much more fascinating.

That person had an ulterior motive though. After the tutorial, he never returned the seat to me. He just continued browsing the Web and waited for me to leave. I was too timid to ask for my seat back. Seats were limited, so I returned to my dorm room both disappointed that I couldn't continue browsing that day and excited about all the websites I might create with this newfound knowledge.

I could never register for myself though. That domain was always used by some business selling Turkish cuisine. Eventually, I managed to get the next best thing: a domain of my own. That brief encounter in the university laboratory set me on a lifelong path of creating and maintaining personal websites.

The second story also comes from my university days. I was hanging out with my mates in the computer laboratory, in front of an MS-DOS machine powered by an Intel 8086 microprocessor. I was writing a lift control program in assembly. In those days, it was considered important to deliberately practise solving made-up problems as a way of honing our programming skills. As I worked on my program, my mind drifted to a small detail about the 8086 microprocessor that we had recently learned in a lecture. Our professor had explained that, when the 8086 microprocessor is reset, execution begins with CS:IP set to FFFF:0000. So I murmured to anyone who cared to listen, 'I wonder if the system will reboot if I jump to FFFF:0000.' I then opened and jumped to that address. The machine rebooted instantly.

One of my friends, who topped the class every semester, had been watching over my shoulder. As soon as the machine restarted, he exclaimed, 'How did you do that?'
I explained that the reset vector is located at physical address FFFF0 and that the CS:IP value FFFF:0000 maps to that address in real mode. After that, I went back to working on my lift control program and didn't think much more about the incident. About a week later, the same friend came to my dorm room. He sat down with a grave look on his face and asked, 'How did you know to do that? How did it occur to you to jump to the reset vector?' I must have said something like, 'It just occurred to me. I remembered that detail from the lecture and wanted to try it out.' He then said, 'I want to be able to think like that. I come top of the class every year, but I don't think the way you do. I would never have thought of taking a small detail like that and testing it myself.' I replied that I was just curious to see whether what we had learnt actually worked in practice. He responded, 'And that's exactly it. It would never occur to me to try something like that. I feel disappointed that I keep coming top of the class, yet I am not curious in the same way you are. I've decided I don't want to top the class anymore. I just want to explore and experiment with what we learn, the way you do.' That was all he said before getting up and heading back to his dorm room. I didn't take it very seriously at the time. I couldn't imagine why someone would willingly give up the accomplishment of coming first every year. But he kept his word. He never topped the class again. He still ranked highly, often within the top ten, but he kept his promise of never finishing first again. To this day, I feel a mix of embarrassment and pride whenever I recall that incident. With a single jump to the processor's reset entry point, I had somehow inspired someone to step back from academic competition in order to have more fun with learning. Of course, there is no reason one cannot do both. But in the end, that was his decision, not mine. 
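The segment arithmetic behind that trick is easy to check: in real mode a CS:IP pair maps to the physical address CS × 16 + IP, so FFFF:0000 lands at FFFF0, just 16 bytes below the top of the 1 MB address space:

```python
# Real-mode 8086 address translation: physical = segment * 16 + offset.
def physical_address(segment: int, offset: int) -> int:
    return (segment << 4) + offset

# The reset vector: CS:IP = FFFF:0000 maps to physical address FFFF0.
addr = physical_address(0xFFFF, 0x0000)
print(hex(addr))      # 0xffff0
print(2**20 - addr)   # 16 bytes below the top of the 1 MB address space
```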
In my first job after university, I was assigned to work on the installer for a specific component of an e-banking product. The installer was written in Python and was quite fragile. During my first week on the project, I spent much of my time stabilising the installer and writing a user guide with step-by-step instructions on how to use it. The result was well received and appreciated by both my seniors and management. To my surprise, my user guide was praised more than my improvements to the installer. While the first few weeks were enjoyable, I soon realised I would not find the work fulfilling for very long. I wrote to management a few times to ask whether I could transfer to a team where I could work on something more substantial. My emails were initially met with resistance. After several rounds of discussion, however, someone who had heard about my situation reached out and suggested a team whose manager might be interested in interviewing me. The team was based in a different city. I was young and willing to relocate wherever I could find good work, so I immediately agreed to the interview. This was in 2006, when video conferencing was not yet common. On the day of the interview, the hiring manager called me on my desk phone. He began by introducing the team, which called itself Archie , short for architecture . The team developed and maintained the web framework and core architectural components on which the entire e-banking product was built. The product had existed long before open source frameworks such as Spring or Django became popular, so features such as API routing, authentication and authorisation layers, cookie management and similar capabilities were all implemented in-house by this specialised team. Because the software was used in banking environments, it also had to pass strict security testing and audits to minimise the risk of serious flaws. The interview began well. 
He asked several questions related to software security, such as what SQL injection is and how it can be prevented or how one might design a web framework that mitigates cross-site scripting attacks. He also asked programming questions, most of which I answered pretty well. Towards the end, however, he asked how we could prevent MITM attacks. I had never heard the term, so I admitted that I did not know what MITM meant. He then asked, 'Man in the middle?' but I still had no idea what that meant or whether it was even a software engineering concept. He replied, 'Learn everything you can about PKI and MITM. We need to build a digital signatures feature for one of our corporate banking products. That's the first thing we'll work on.' Over the next few weeks, I studied RFCs and documentation related to public key infrastructure, public key cryptography standards and related topics. At first, the material felt intimidating, but after spending time each evening reading whatever relevant literature I could find, things gradually began to make sense. Concepts that initially seemed complex and overwhelming eventually felt intuitive and elegant. I relocated to the new city a few weeks later and delivered the digital signatures feature about a month after joining the team. We used the open source Bouncy Castle library to implement digital signatures. After that project, I worked on other parts of the product too. The most rewarding part was knowing that the code I was writing became part of a mature product used by hundreds of banks and millions of users. It was especially satisfying to see the work pass security testing and audits and be considered ready for release. That was my first real engineering job. My manager also turned out to be an excellent mentor. Working with him helped me develop new skills and his encouragement gave me confidence that stayed with me for years. Nearly two decades have passed since then, yet the product is still in use. 
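The SQL injection question from that interview has a canonical answer that is easy to demonstrate: input spliced into SQL text becomes SQL, while input passed as a bound parameter stays data. A small sketch, using sqlite3 purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

attack = "' OR '1'='1"  # classic injection payload

# Vulnerable: string formatting splices the payload into the SQL,
# turning the WHERE clause into a tautology that matches every row.
vulnerable = conn.execute(
    f"SELECT name FROM users WHERE name = '{attack}'").fetchall()

# Safe: a bound parameter is treated as data, never as SQL, so the
# payload matches nothing.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (attack,)).fetchall()

print(vulnerable)  # [('alice',)] -- the injection leaked a row
print(safe)        # []
```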
In fact, in my current phase of life I sometimes encounter it as a customer. Occasionally, I open the browser's developer tools to view the page source where I can still see traces of the HTML generated by code I wrote almost twenty years ago. Around 2007 or 2008, I began working on a proof of concept for developing widgets for an OpenTV set-top box. The work involved writing code in a heavily trimmed-down version of C. One afternoon, while making good progress on a few widgets, I noticed that they would occasionally crash at random. I tried tracking down the bugs, but I was finding it surprisingly difficult to understand my own code. I had managed to produce some truly spaghetti code full of dubious pointer operations that were almost certainly responsible for the crashes, yet I could not pinpoint where exactly things were going wrong. Ours was a small team of four people, each working on an independent proof of concept. The most senior person on the team acted as our lead and architect. Later that afternoon, I showed him my progress and explained that I was still trying to hunt down the bugs causing the widgets to crash. He asked whether he could look at the code. After going through it briefly and probably realising that it was a bit of a mess, he asked me to send him the code as a tarball, which I promptly did. He then went back to his desk to study the code. I remember thinking that there was no way he was going to find the problem anytime soon. I had been debugging it for hours and barely understood what I had written myself; it was the worst spaghetti code I had ever produced. With little hope of a quick solution, I went back to debugging on my own. Barely five minutes later, he came back to my desk and asked me to open a specific file. He then showed me exactly where the pointer bug was. It had taken him only a few minutes not only to read my tangled code but also to understand it well enough to identify the fault and point it out. 
As soon as I fixed that line, the crashes disappeared. I was genuinely in awe of his skill. I have always loved computing and programming, so I had assumed I was already fairly good at it. That incident, however, made me realise how much further I still had to go before I could consider myself a good software developer. I did improve significantly in the years that followed and today I am far better at managing software complexity than I was back then. In another project from that period, we worked on another set-top box platform that supported Java Micro Edition (Java ME) for widget development. One day, the same architect from the previous story asked whether I could add animations to the widgets. I told him that I believed it should be possible, though I'd need to test it to be sure. Before continuing with the story, I need to explain how the different stakeholders in the project were organised. Our small team effectively played the role of the software vendor. The final product going to market would carry the brand of a major telecom carrier, offering direct-to-home (DTH) television services, with the set-top box being one of the products sold to customers. The set-top box was manufactured by another company. So the project was a partnership between three parties: our company as the software vendor, the telecom carrier and the set-top box manufacturer. The telecom carrier wanted to know whether widgets could be animated on screen with smooth slide-in and slide-out effects. That was why the architect approached me to ask whether it could be done. I began working on animating the widgets. Meanwhile, the architect and a few senior colleagues attended a business meeting with all the partners present. During the meeting, he explained that we were evaluating whether widget animations could be supported. The set-top box manufacturer immediately dismissed the idea, saying, 'That's impossible. Our set-top box does not support animation.'
When the architect returned and shared this with us, I replied, 'I do not understand. If I can draw a widget, I can animate it too. All it takes is clearing the widget and redrawing it at slightly different positions repeatedly. In fact, I already have a working version.' I then showed a demo of the animated widgets running on the emulator. The following week, the architect attended another partners' meeting where he shared updates about our animated widgets. I was not personally present, so what follows is second-hand information passed on by those who were there. I learnt that the set-top box company reacted angrily. For some reason, they were unhappy that we had managed to achieve results using their set-top box and APIs that they had officially described as impossible. They demanded that we stop work on animation immediately, arguing that our work could not be allowed to contradict their official position. At that point, the telecom carrier's representative intervened and bluntly told the set-top box representative to just shut up. If the set-top box guy was furious, the telecom guy was even more so: 'You guys told us animation was not possible and these people are showing that it is! You manufacture the set-top box. How can you not know what it is capable of?' Meanwhile, I continued working and completed my proof-of-concept implementation. It worked very well in the emulator, but I did not yet have access to the actual hardware. The device was still in the process of being shipped to us, so all my early proof-of-concepts ran on the emulator. The following week, the architect planned to travel to the set-top box company's office to test my widgets on the real hardware. At the time, I was quite proud of demonstrating results that even the hardware maker believed were impossible. When the architect eventually travelled to test the widgets on the actual device, a problem emerged.
What looked like buttery smooth animation on the emulator appeared noticeably choppy on a real television. Over the next few weeks, I experimented with frame rates, buffering strategies and optimising the computation done in the rendering loop. Each week, the architect travelled for testing and returned with the same report: the animation had improved somewhat, but it still remained choppy. The modest embedded hardware simply could not keep up with the required computation and rendering. In the end, the telecom carrier decided that no animation was better than poor animation and dropped the idea altogether. So in the end, the set-top box developers turned out to be correct after all. Back in 2009, after completing about a year at RSA Security, I began looking for work that felt more intellectually stimulating, especially projects involving mathematics and algorithms. I spoke with a few senior leaders about this, but nothing materialised for some time. Then one day, Dr Burt Kaliski, Chief Scientist at RSA Laboratories, asked to meet me to discuss my career aspirations. I have written about this in more detail in another post here: Good Blessings. I will summarise what followed. Dr Kaliski met me and offered a few suggestions about the kinds of teams I might approach to find more interesting work. I followed his advice and eventually joined a team that turned out to be an excellent fit. I remained with that team for the next six years. During that time, I worked on parser generators, formal language specification and implementation, as well as indexing and querying components of a petabyte-scale database. I learnt something new almost every day during those six years. It remains one of the most enjoyable periods of my career. I have especially fond memories of working on parser generators alongside remarkably skilled engineers from whom I learnt a lot. Years later, I reflected on how that brief meeting with Dr Kaliski had altered the trajectory of my career.
I realised I was not sure whether I had properly expressed my gratitude to him for the role he had played in shaping my path. So I wrote to thank him and explain how much that single conversation had influenced my life. A few days later, Dr Kaliski replied, saying he was glad to know that the steps I took afterwards had worked out well. Before ending his message, he wrote this heart-warming note: This story comes from 2019. By then, I was no longer a twenty-something engineer just starting out. I was now a middle-aged staff engineer with years of experience building both low-level networking systems and database systems. Most of my work up to that point had been in C and C++. I was now entering a new phase where I would be developing microservices professionally in languages such as Go and Python. None of this was unfamiliar territory. Like many people in this profession, computing has long been one of my favourite hobbies. So although my professional work for the previous decade had focused on C and C++, I had plenty of hobby projects in other languages, including Python and Go. As a result, switching gears from systems programming to application development was a smooth transition for me. I cannot even say that I missed working in C and C++. After all, who wants to spend their days occasionally chasing memory bugs in core dumps when you could be building features and delivering real value to customers? In October 2019, during Cybersecurity Awareness Month, a Capture the Flag (CTF) event was organised at our office. The contest featured all kinds of puzzles, ranging from SQL injection challenges to insecure cryptography problems. Some challenges also involved reversing binaries and exploiting stack overflow issues. I am usually rather intimidated by such contests. The whole idea of competitive problem-solving under time pressure tends to make me nervous. But one of my colleagues persuaded me to participate in the CTF. 
And, somewhat to my surprise, I turned out to be rather good at it. Within about eight hours, I had solved roughly 90% of the puzzles. I finished at the top of the scoreboard. In my younger days, I was generally known to be a good problem solver. I was often consulted when thorny problems needed solving and I usually managed to deliver results. I also enjoyed solving puzzles. I had a knack for them and happily spent hours, sometimes days, working through obscure mathematical or technical puzzles and sharing detailed write-ups with friends of the nerd variety. Seen in that light, my performance at the CTF probably should not have surprised me. Still, I was very pleased. It was reassuring to know that I could still rely on my systems programming experience to solve obscure challenges. During the course of the contest, my performance became something of a talking point in the office. Colleagues occasionally stopped by my desk to appreciate my progress in the CTF. Two much younger colleagues, both engineers I admired for their skill and professionalism, were discussing the results nearby. They were speaking softly, but I could still overhear parts of their conversation. Curious, I leaned slightly and listened a bit more carefully. I wanted to know what these two people, whom I admired a lot, thought about my performance. One of them remarked on how well I was doing in the contest. The other replied, 'Of course he is doing well. He has more than ten years of experience in C.' At that moment, I realised that no matter how well I solved those puzzles, the result would naturally be credited to experience. In my younger days, when I solved tricky problems, people would sometimes call me smart. Now it was expected. Not that I particularly care for such labels anyway, but it did make me realise how things had changed. I was now simply the person with many years of experience. 
Solving technical puzzles that involved disassembling binaries, tracing execution paths and reconstructing program logic was expected rather than remarkable. I continue to sharpen my technical skills to this day. While my technical results may now simply be attributed to experience, I hope I can continue to make a good impression through my professionalism, ethics and kindness towards the people I work with. If those leave a lasting impression, that is good enough for me.

#technology | #programming

Stories in this post: My First Lesson in HTML · The Reset Vector · My First Job · Spaghetti Code · Animated Television Widgets · Good Blessings · The CTF Scoreboard

Hugo 5 days ago

AI's Impact on the State of the Art in Software Engineering in 2026

2025 marked a major turning point in AI usage, going far beyond individual use. Since 2020, we've moved from autocomplete to industrialization: as teams go from a few autocompleted lines to applications where over 90% of the code is produced by AI assistants, they must industrialize the practice or risk major disappointment. More than that, as the developer's job changes, the entire development team must evolve with it. It's no longer a simple tooling issue but an industrialization issue at team scale, much as automated testing frameworks changed how software was built in the early 2000s. (We obviously tested software before the 2000s, but the way we think about automating those tests through xUnit frameworks, software factories, CI/CD and so on is more recent.) In this article, we'll explore how dev teams have adapted, through testimonials from several tech companies that contributed to the writing, by addressing: While the term vibe coding became popular in early 2025, we now more readily speak of Context Driven Engineering or agentic engineering . The idea is no longer to give a prompt, but to provide complete context including the intention AND the constraints (coding guidelines, etc.). Context Driven Engineering aims to reduce the non-deterministic part of the process and to ensure the quality of what is produced. With Context Driven Engineering, specs, which haven't always been well regarded, become a first-class citizen again and become mandatory before code. Separate your process into two PRs: Source: Charles-Axel Dein (ex CTO Octopize and ex VP Engineering at Gens de confiance) We find this same logic at Clever Cloud: Here is the paradox: when code becomes cheap, design becomes more valuable. Not less. You can now afford to spend time on architecture, discuss tradeoffs, commit to an approach before writing a single line of code.
Specs are coming back, and the judgment to write good ones still requires years of building systems. Source: Pierre Zemb (Staff Engineer at Clever Cloud) And the same at Google: One common mistake is diving straight into code generation with a vague prompt. In my workflow, and in many others', the first step is brainstorming a detailed specification with the AI, then outlining a step-by-step plan, before writing any actual code. Source: Addy Osmani (Director on Google Cloud AI) In short, we now find this method everywhere: Spec: The specification brings together use cases: the intentions expressed by the development team. It can be called an RFC (request for change), an ADR (architecture decision record) or a PRD (product requirement document) depending on the context and the company. This is the basic document to start development with an AI. The spec is usually reviewed by product experts, devs or not. AI use is not uncommon at this stage either (see later in the article). But context is not limited to that. To limit unfortunate AI initiatives, you also need to provide it with constraints, development standards, tools to use and docs to follow. We'll see this point later. Plan: The implementation plan lists all the steps needed to implement the specification. The list must be exhaustive, and each step must be achievable autonomously by an agent with the necessary and sufficient context. It is usually reviewed by seniors (architect, staff, tech lead, etc., depending on the company). Act: This is the implementation step and can be distributed to agentic sessions. In many teams, this session can be carried out in one of two ways: We of course find variations, such as at Ilek, which breaks down the Act phase further: We are in the first phase of industrialization, which is adoption. The goal is that by the end of the quarter all devs rely on this framework and that the use of prompts/agents is a reflex. So we're aiming for 100% adoption by the end of March.
Our workflow starts from the need and breaks down into several steps that aim to challenge devs in the thinking phases, up to validation of the produced code. Here is the list of steps we follow:
1. elaborate (challenges the need and questions edge cases, technical choices, architecture, etc.)
2. plan (proposes a technical breakdown; this plan is provided as output in a Markdown file)
3. implement (agents carry out the plan steps)
4. assert (an agent validates that the final result meets expectations: lint, tests, guidelines)
5. review (agents do a technical and functional review)
6. learn (context update)
7. push (MR creation on GitLab)
This whole process is done locally and piloted by a developer. Cédric Gérard (Ilek) While this three-phase method seems to be the consensus, we see quite a few experiments to frame and strengthen these practices, particularly with two tools that come up regularly in discussions: BMAD and Spec Kit . Having tested both, it's quite easy to end up with somewhat verbose over-documentation and a slower dev cycle. My intuition is that we need to avoid digitally reproducing human processes that were already shaky. Do we really need all the roles proposed by BMAD, for example? I felt like I was doing SAFe in solo mode and it wasn't a good experience :) What is certain is that if the spec becomes queen again, the spec an AI needs must be simple and unambiguous. Verbosity can harm the effectiveness of code assistants. While agentic mode seems to be taking over from copilot mode, this comes with additional constraints to ensure quality. We absolutely want to ensure: To ensure the quality produced, teams provide the necessary context to inform the code assistant of the constraints to respect. Paradoxically, despite vibe coding's bad reputation and its previous association with prototypes, Context Driven Engineering puts the usual good engineering practices (test harness, linters, etc.) back in the spotlight.
Without them, it becomes impossible to ensure code and architecture quality. In addition to all the classic good practices, most agent systems come with their own concepts: the general context file ( agents.md ), skills, MCP servers and agents. A code assistant will read several files in addition to the spec you provide it. Each code assistant has its own file: CLAUDE.md for Claude, and equivalents for Cursor, Windsurf, etc. There is an attempt at harmonization via agents.md , but the idea is always broadly the same: a sort of README for AIs. This README can be used hierarchically: we can have a file at the root, then a file per directory where it's relevant. This file contains instructions to follow systematically and can reference other files. Having multiple files allows each agent to work with a reduced context, which improves the efficiency of the agent in question (not to mention savings on costs). Depending on the tools used, we find several notions that each have different uses. A skill explains to an AI agent how to perform a type of operation. For example, we can give it the commands to use to call certain code generation or static verification tools. A dedicated agent can take charge of a specific task. We can for example have an agent dedicated to external documentation, with instructions on the tone to adopt, the desired organization, etc. MCP servers enrich the AI agent's toolbox. This can be direct access to documentation (for example the Nuxt doc ), or tools to consult test account info like Stripe's MCP . It's still too early to say, but we may see a notion of technical debt emerge, linked to the stacking of these tools, and it's likely that refactoring and testing techniques for them will appear in the future. With the appearance of these new tools comes a question: how do we standardize the practice and benefit from everyone's good practices?
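As a purely hypothetical illustration of such a root-level file, an agents.md might look like this (the project name, commands and paths are invented for the example):

```markdown
# agents.md — instructions for AI coding agents

## Project overview
Acme Shop is a TypeScript monorepo. The API lives in `services/api`,
the web front end in `apps/web`.

## Always
- Run `npm test` before considering a change done.
- Follow the lint configuration; never disable a rule inline.
- New endpoints require an entry in `docs/api.md`.

## Never
- Commit secrets or `.env` files.
- Introduce a new dependency without flagging it in the PR description.

See `services/api/agents.md` for service-specific instructions.
```

The per-directory files follow the same shape, scoped to their subtree, which is what keeps each agent's context small.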
As Benjamin Levêque (Brevo) says: The idea is: instead of everyone struggling with their own prompts in their corner, we pool our discoveries so everyone benefits. One of the first answers for pooling relies on the notion of a corporate marketplace: At Brevo, we just launched an internal marketplace with skills and agents. It allows us to standardize code generated via AI (with Claude Code), while respecting standards defined by "experts" in each domain (language, tech, etc.). The 3 components in Claude Code: We transform our successes into Skills (reusable instructions), Subagents (specialized AIs) and Patterns (our best architectures). Don't reinvent the wheel: we move from "feeling-based" use to a systematic method. Benjamin Levêque and Maxence Bourquin (Brevo) At Manomano we also initiated a repository to transpose our guidelines and ADRs into a machine-friendly format. We then create agents and skills that we install in Claude Code / opencode. We have an internal machine bootstrap tool; we added this repo to it, which means all the company's tech people are equipped. It's then up to each person to reference the rules or skills that are relevant depending on the services. We have integration-type skills (using our internal IaC to add X or Y), others that are practices (doing code review; how to do React at Manomano) and commands that cover broader orchestrations (tech refinement, feature implementation with review). We also observe that it's difficult to standardize MCP installations for everyone, which is a shame when we see the impact some of them have on the quality of what we can produce (Serena was mentioned, and I'll add sequential-thinking). We're at the point where we're wondering how to guarantee an identical environment for all devs, or how to make it consistent for everyone. Vincent Aubrun (Manomano) At Malt, we also started pooling commands / skills / AGENTS.md / CLAUDE.md.
Classically, the goal of the initial versions is to share a certain amount of knowledge so the agent doesn't start from scratch. Proposals (via MR, typically) are reviewed within guilds (backend / frontend / AI). Note that at the engineering scale we're still searching a lot. It's particularly hard to know whether a shared element is really useful to most people. Guillaume Darmont (Malt) Note that there are public marketplaces; we can mention: Be careful, however: it's mandatory to review everything you install… Among deployment methods, many have favored custom tools, but François Descamps from Axa points us to another solution: For sharing primitives, we're exploring APM ( agent package manager ) by Daniel Meppiel. I really like how it works; it's quite easy to use and handles the dependency-management part like NPM. Despite all the instructions provided, it regularly happens that some are ignored. It also happens that instructions are ambiguous and misinterpreted. This is where teams necessarily implement tools to frame AIs: While the human eye remains mandatory for all the participants interviewed, these tools can themselves partially rely on AIs. AIs can indeed write tests; the human then verifies the relevance of the proposed tests. Several teams have also created agents specialized in review with very specific scopes: security, performance, etc. Others use automated tools, some directly connected to CI (or to GitHub). (I'm not citing them, but you can easily find them.) Related to this notion of CI/CD, a question often comes up: It's also very difficult to know whether an "improvement", i.e. a modification in the CLAUDE.md file for example, really is one. Will the quality of responses really be better after the modification? Guillaume Darmont (Malt) Can I evaluate a model? If I change my guidelines, does the AI still generate code that passes my security and performance criteria? Can we treat prompt/context like code (unit testing of prompts)?
To this, Julien Tanay (Doctolib) tells us: About the question "does this change on the skill make it better or worse", we're going to start looking at and (used in prod for product AI with us) to do evals in CI. (...) For example with promptfoo, you'll verify, in a PR, that for the 10 variants of a prompt "(...) setup my env" the env-setup skill is indeed triggered, and that the output is correct. You can verify the skill call programmatically, and the output either via "human as a judge" or, more likely, "LLM as a judge" in the context of a CI. All discussions seem to indicate that the subject is still being researched, but that there are already avenues of work. We had a main KPI, which was to obtain 100% adoption of these tools in one quarter (...) At the beginning our main KPI was adoption, not cost. Julien Tanay (Staff engineer at Doctolib) Cost indeed often comes second; the classic pattern is adoption, then optimization. To control costs, there is on one hand session optimization, which involves: For example, see the tips proposed by Alexandre Balmes on LinkedIn . This cost control can be centralized with enterprise licenses. The switch between individual key and enterprise key is sometimes part of the adoption procedure: We have a progressive strategy on costs. We provide an API key for newcomers, to track their usage and pay as close to consumption as possible. Beyond a threshold we switch them to Anthropic enterprise licenses, as we estimate it's more interesting for daily usage. Vincent Aubrun (ManoMano) On the monthly cost per developer, the various discussions allow us to identify 3 categories: The vast majority oscillates between categories 1 and 2. When we talk about governance: documentation, having become the new programming language, is a first-class citizen again. We find it in markdown specs present in the project, ADRs/RFCs, etc. These docs are now maintained at the same time as the code is produced. So we declared that markdown was the source of truth.
Confluence in shambles :) Julien Tanay (Doctolib) Documentation is no longer a minor event in the product dev cycle, handled because it must be and then put away in a closet. The most mature teams now evolve the doc in order to evolve the code, which avoids the famous syndrome of piles of obsolete company documents lying around on a shared drive. This has many advantages: it can be used by specialized agents for writing end-user documentation, or fed into a RAG to serve as a knowledge base for customer support, onboarding newcomers, etc. The integration of this framework impacts the way we manage incidents. It offers the possibility to debug our services with specialized agents that can rely on logs, for example. It's possible to query the code and the memory bank, which acts as living documentation. Cédric Gérard (Ilek) One of the major subjects that comes up is obviously intellectual property. It's no longer about making simple copy-pastes in a browser with chosen context, but about giving access to the entire codebase. This is one of the great motivations for switching to enterprise licenses, which contain contractual clauses like "zero data training" or even " zero data retention ". In 2026 we should also see the arrival of the AI Act and ISO 42001 certification to audit how data is collected and processed. In enterprise usage we also note setups via partnerships, like the one between Google and Anthropic: On our side, we don't need to allocate an amount in advance, nor buy licenses, because we use Anthropic models deployed on Vertex AI from one of our GCP projects. Then you just need to point Claude Code to Vertex AI. This configuration also addresses intellectual property issues. On all these points, another track seems to be using local models. We can mention Mistral (via Pixtral or Codestral), which can run these models on private servers to guarantee that no data crosses the company firewall. I imagine this would also be possible with Ollama.
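Returning to the Vertex AI setup quoted above: as I recall from Claude Code's documentation, pointing it at Vertex AI is a matter of a few environment variables. The names below are from memory and the values are placeholders, so treat this as an illustrative sketch and verify against the current docs:

```shell
# Route Claude Code through Anthropic models deployed on Vertex AI.
# Variable names as I recall them from Claude Code's docs -- verify
# against the current documentation before relying on them.
export CLAUDE_CODE_USE_VERTEX=1
export CLOUD_ML_REGION=us-east5                    # region hosting the model
export ANTHROPIC_VERTEX_PROJECT_ID=my-gcp-project  # your GCP project ID
```

Authentication then goes through the usual GCP credentials of the project, which is what makes the intellectual-property story simpler: traffic stays within your cloud contract.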
However, I only met one company working on this track during my discussions, so we can anticipate that the rise of local models will rather be a 2026 or 2027 topic. While AI is now solidly established in many teams, its impacts go beyond development alone. We notably find reflections around recruitment at Alan: Picture this: You're hiring a software engineer in 2025, and during the technical interview, you ask them to solve a coding problem without using any AI tools. It's like asking a carpenter to build a house without power tools, or a designer to create graphics without Photoshop. You're essentially testing them on skills they'll never use in their actual job. This realization hit us hard at Alan. As we watched our engineering teams increasingly rely on AI tools for daily tasks — with over 90% of engineers using AI-powered coding assistants — we faced an uncomfortable truth: our technical interview was completely disconnected from how modern engineers actually work. Emma Goldblum (Engineering at Alan) One of the big subjects concerns the training of juniors, who can quickly be put at risk by AI use. They are indeed less productive now, and don't always have the experience needed to properly challenge the produced code or properly write specifications. A large part of the tasks previously assigned to juniors is now monopolized by AIs (boilerplate code, form validation, repetitive tasks, etc.). However, all teams recognize the necessity of onboarding juniors to avoid creating an experience gap in the future. Despite this awareness, I haven't seen specific initiatives that would aim to adapt junior training. Finally, welcoming newcomers is also disrupted by AI, particularly because it's now possible to accompany them in discovering the product: Some teams have an onboarding skill that helps to set up the env, takes a tour of the codebase, makes an example PR...
People are creative. Julien Tanay (Doctolib) As a side effect, this point is considered easier thanks to the changes induced by AI, particularly the fact that documentation is updated more regularly and that all guidelines are very explicit. One of the little-discussed elements remains supporting developers through the transformation of their profession. We're moving the value of developers from code production to business mastery. This requires taking a lot of perspective. Writing code and practices like TDD are part of the pleasure we take in the work. AI disrupts that, and some may not be able to thrive in this evolution of our profession. Cédric Gérard (Ilek) The question is not whether the developer profession is coming to an end, but rather to what extent it's evolving and what new skills need to be acquired. We can compare these evolutions to past transitions: from punch cards to interactive programming, or the arrival of higher-level languages. With AI, development teams gain a level of abstraction but keep the same challenges: identifying the right problems to solve, finding the adequate technological solutions, and thinking in terms of security, performance, reliability and the tradeoffs between all of that. Despite everything, this evolution is not necessarily experienced well by everyone, and it becomes necessary for teams to support people in approaching development from a different angle, to rediscover the interest of the profession. Cédric Gérard also warns us against other risks: There's a risk that the quality of what is produced decreases. AI not being perfect, you have to be very attentive to the generated code. However, reviewing code is not like producing code: review is tedious and we can very quickly let things slide. To this is added a risk of skill loss.
Reading is not writing, and while we can expect to develop our evaluation abilities, we may lose creativity little by little. 2025 saw the rise of agentic programming; 2026 will undoubtedly be a year of learning in companies around the industrialization of these tools. There are points I'm pleased about, such as the strong comeback of systems thinking . "Context Driven Engineering" forces us to become good architects and good product designers again. If you don't know how to explain what you want to do (the spec) and how you plan to do it (the plan), AI won't save you; it will just produce technical debt at industrial speed. Another unexpected side effect could be the end of ego coding , the progressive disappearance of the emotional attachment to produced code that sometimes created complicated discussions, for example during code reviews. Hoping this makes us more critical and less reluctant to throw away unused code and features. In any case, the difference between an average team and an elite team has never depended so much on the "old" skills. Knowing how to challenge an architecture, set good development constraints, have good CI/CD, anticipate security flaws and maintain living documentation will be even more critical than before. And from experience, this is not yet a given everywhere. Questions remain: we'll have to learn to pilot a new ecosystem of agents while keeping control. Between sovereignty issues, questions around local models, the ability to test reproducibility and prompt quality, exploding costs and the mutation of the junior role, we're still in a full learning phase. A brief timeline of the shift:
- 2021, with GitHub Copilot: individual use, essentially focused on advanced autocomplete
- then browser-based use for more complex tasks, requiring multiple back-and-forths and copy-pasting
- 2025, with Claude Code, Windsurf and Cursor: use on the developer's workstation through code assistants

Topics addressed in this article:
- Context Driven Engineering, the new paradigm
- Spec/Plan/Act: the reference workflow
- The AI Rules ecosystem
- Governance and industrialization
- Human challenges

The two PRs (Charles-Axel Dein):
- The PR with the plan.
- The PR with the implementation.
The main reason is that it mimics the classical research-design-implement loop. The first part (the plan) is the RFC. Your reviewers know where they can focus their attention at this stage: the architecture, the technical choices, and naturally their tradeoffs. It's easier to use an eraser on the drawing board than a sledgehammer at the construction site.

The two implementation methods:
- copilot/pair-programming mode, with validation of each modification one by one
- agent mode, where the developer gives the intention then verifies the result (we'll see how later)

What we want to ensure:
- that the implementation respects the spec
- that the produced code respects the team's standards
- that the code uses the right versions of the project's libraries

Public marketplaces:
- the Claude marketplace
- a marketplace by Vercel

Tools to frame AIs:
- test harness
- code reviews

Session optimization:
- keeping session windows short, having broken the work down into small independent steps
- using the /compact command to keep only the necessary context (or flushing this context into a file to start a new session)

3 views
Martin Fowler 5 days ago

Context Engineering for Coding Agents

The number of options we have to configure and enrich a coding agent’s context has exploded over the past few months. Claude Code is leading the charge with innovations in this space, but other coding assistants are quickly following suit. Powerful context engineering is becoming a huge part of the developer experience of these tools. Birgitta Böckeler explains the current state of context configuration features, using Claude Code as an example.

0 views
Anton Zhiyanov 5 days ago

(Un)portable defer in C

Modern system programming languages, from Hare to Zig, seem to agree that defer is a must-have feature. It's hard to argue with that, because defer makes it much easier to free memory and other resources correctly, which is crucial in languages without garbage collection. The situation in C is different. There was an N2895 proposal by Jens Gustedt and Robert Seacord in 2021, but it was not accepted for C23. Now, there's another N3734 proposal by JeanHeyd Meneide, which will probably be accepted in the next standard version. Since defer isn't part of the standard, people have created lots of different implementations. Let's take a quick look at them and see if we can find the best one. C23/GCC  • C11/GCC  • GCC/Clang  • MSVC  • Long jump  • STC  • Stack  • Simplified GCC/Clang  • Final thoughts Jens Gustedt offers this brief version: Usage example: This approach combines C23 attribute syntax with GCC-specific features: nested functions and the cleanup attribute. It also uses the non-standard __COUNTER__ macro (supported by GCC, Clang, and MSVC), which expands to an automatically increasing integer value. Nested functions and cleanup in GCC A nested function (also known as a local function) is a function defined inside another function: Nested functions can access variables from the enclosing scope, similar to closures in other languages, but they are not first-class citizens and cannot be passed around like function pointers. The cleanup attribute runs a function when the variable goes out of scope: The function should take one parameter, which is a pointer to a type that's compatible with the variable. If the function returns a value, it will be ignored. On the plus side, this version works just like you'd expect defer to work. On the downside, it's only available in C23+ and only works with GCC (not even Clang supports it, because of the nested function). We can easily adapt the above version to use C11: Usage example: The main downside remains: it's GCC-only. 
Clang fully supports the cleanup attribute, but it doesn't support nested functions. Instead, it offers the blocks extension, which works somewhat similarly: We can use Clang blocks to make a version that works with both GCC and Clang: Usage example: Now it works with Clang, but there are several things to be aware of: On the plus side, this implementation works with both GCC and Clang. The downside is that it's still not standard C, and won't work with other compilers like MSVC. MSVC, of course, doesn't support the cleanup attribute. But it provides "structured exception handling" with the __try and __finally keywords: The code in the __finally block will always run, no matter how the __try block exits — whether it finishes normally, returns early, or crashes (for example, from a null pointer dereference). This isn't the defer we're looking for, but it's a decent alternative if you're only programming for Windows. There are well-known implementations by Jens Gustedt and moon-chilled that use setjmp and longjmp. I'm mentioning them for completeness, but honestly, I would never use them in production. The first one is extremely large, and the second one is extremely hacky. Also, I'd rather not use long jumps unless it's absolutely necessary. Still, here's a usage example from Gustedt's library: Here, all deferred statements run at the end of the guarded block, no matter how we exit the block (normally or through a jump). The stc library probably has the simplest implementation ever: Usage example: Here, the deferred statement is passed as a macro argument and is used as the loop increment. The "defer-aware" block of code is the loop body. Since the increment runs after the body, the deferred statement executes after the main code. This approach works with all mainstream compilers, but it falls apart if you try to exit early with break or return: Dmitriy Kubyshkin provides an implementation that adds a "stack frame" of deferred calls to any function that needs them. Here's a simplified version: Usage example: This version works with all mainstream compilers. 
Also, unlike the STC version, defers run correctly in case of early exit: Unfortunately, there are some drawbacks: The Stack version above doesn't support deferring code blocks. In my opinion, that's not a problem, since most defers are just "free this resource" actions, which only need a single function call with one argument. If we accept this limitation, we can simplify the GCC/Clang version by dropping GCC's nested functions and Clang's blocks: Works like a charm: Personally, I like the simpler GCC/Clang version better. Not having MSVC support isn't a big deal, since we can run GCC on Windows or use the Zig compiler, which works just fine. But if I really need to support GCC, Clang, and MSVC — I'd probably go with the Stack version. Anyway, I don't think we need to wait for defer to be added to the C standard. We already have defer at home! Notes on the caveats mentioned above. For the Clang blocks version: we must compile with -fblocks; we must put a semicolon after the closing brace in the deferred block; and if we need to modify a variable inside the block, the variable must be declared with __block. For the simplified version: defer only supports single function calls, not code blocks. For the Stack version: we always have to set up the defer frame at the start of the function and exit through the single point that runs the deferred calls. In the original implementation, Dmitriy overrides the return keyword, but this won't compile with strict compile flags (which I think we should always use). Also, the deferred function runs before the return value is evaluated, not after.

0 views

Rewriting pycparser with the help of an LLM

pycparser is my most widely used open source project (with ~20M daily downloads from PyPI [1]). It's a pure-Python parser for the C programming language, producing ASTs inspired by Python's own ast module. Until very recently, it's been using PLY: Python Lex-Yacc for the core parsing. In this post, I'll describe how I collaborated with an LLM coding agent (Codex) to help me rewrite pycparser to use a hand-written recursive-descent parser and remove the dependency on PLY. This has been an interesting experience; the post contains lots of information and is therefore quite long. If you're just interested in the final result, check out the latest code of pycparser - the main branch already has the new implementation. While pycparser has been working well overall, there were a number of nagging issues that persisted over the years. I began working on pycparser in 2008, and back then using a YACC-based approach for parsing a whole language like C seemed like a no-brainer to me. Isn't this what everyone does when writing a serious parser? Besides, the K&R2 book famously carries the entire grammar of the C99 language in an appendix - so it seemed like a simple matter of translating that to PLY-yacc syntax. And indeed, it wasn't too hard, though there definitely were some complications in building the ASTs for declarations (C's gnarliest part). Shortly after completing pycparser, I got more and more interested in compilation and started learning about the different kinds of parsers more seriously. Over time, I grew convinced that recursive descent is the way to go - producing parsers that are easier to understand and maintain (and are often faster!). It all ties into the benefits of dependencies in software projects as a function of effort. Using parser generators is a heavy conceptual dependency: it's really nice when you have to churn out many parsers for small languages. 
But when you have to maintain a single, very complex parser, as part of a large project - the benefits quickly dissipate and you're left with a substantial dependency that you constantly grapple with. And then there are the usual problems with dependencies; dependencies get abandoned, and they may also develop security issues. Sometimes, both of these become true. Many years ago, pycparser forked and started vendoring its own version of PLY. This was part of transitioning pycparser to a dual Python 2/3 code base when PLY was slower to adapt. I believe this was the right decision, since PLY "just worked" and I didn't have to deal with active (and very tedious in the Python ecosystem, where packaging tools are replaced faster than dirty socks) dependency management. A couple of weeks ago this issue was opened for pycparser. It turns out that some old PLY code triggers security checks used by some Linux distributions; while this code was fixed in a later commit of PLY, PLY itself was apparently abandoned and archived in late 2025. And guess what? That happened in the middle of a large rewrite of the package, so re-vendoring the pre-archiving commit seemed like a risky proposition. On the issue it was suggested that "hopefully the dependent packages move on to a non-abandoned parser or implement their own"; I originally laughed this idea off, but then it got me thinking... which is what this post is all about. The original K&R2 grammar for C99 had - famously - a single shift-reduce conflict, having to do with a dangling else belonging to the most recent if statement. And indeed, other than the famous lexer hack used to deal with C's type name / ID ambiguity, pycparser only had this single shift-reduce conflict. But things got more complicated. Over the years, features were added that weren't strictly in the standard but were supported by all the industrial compilers. 
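The dangling-else case is a nice illustration of why recursive descent appealed: what shows up as a shift-reduce conflict in YACC becomes a one-line greedy choice in a hand-written parser. A minimal sketch in Python (my own illustration, not pycparser's actual code):

```python
from dataclasses import dataclass

# Illustrative sketch: parse_if greedily consumes a following "else",
# so each else naturally binds to the nearest unmatched if.

@dataclass
class If:
    cond: str
    then: object
    orelse: object = None

def parse_stmt(toks):
    if toks and toks[0] == "if":
        return parse_if(toks)
    return toks.pop(0)  # a bare statement token

def parse_if(toks):
    toks.pop(0)              # consume "if"
    cond = toks.pop(0)       # condition token
    then = parse_stmt(toks)
    orelse = None
    if toks and toks[0] == "else":
        toks.pop(0)          # else binds to this innermost if
        orelse = parse_stmt(toks)
    return If(cond, then, orelse)

# "if a if b s1 else s2": the else attaches to the inner if.
tree = parse_stmt(["if", "a", "if", "b", "s1", "else", "s2"])
assert tree.orelse is None
assert tree.then.orelse == "s2"
```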
The more advanced C11 and C23 standards weren't beholden to the promises of conflict-free YACC parsing (since almost no industrial-strength compilers use YACC at this point), so all caution went out of the window. The latest (PLY-based) release of pycparser has many reduce-reduce conflicts [2]; these are a severe maintenance hazard because they mean the parsing rules essentially have to be tie-broken by order of appearance in the code. This is very brittle; pycparser has only managed to maintain its stability and quality through its comprehensive test suite. Over time, it became harder and harder to extend, because YACC parsing rules have all kinds of spooky-action-at-a-distance effects. The straw that broke the camel's back was this PR, which again proposed to increase the number of reduce-reduce conflicts [3]. This - again - prompted me to think "what if I just dump YACC and switch to a hand-written recursive descent parser", and here we are. None of the challenges described above are new; I've been pondering them for many years now, and yet biting the bullet and rewriting the parser didn't feel like something I'd like to get into. By my private estimate, it'd take at least a week of deep heads-down work to port the gritty 2000 lines of YACC grammar rules to a recursive descent parser [4]. Moreover, it wouldn't be a particularly fun project either - I didn't feel like I'd learn much new, and my interests have shifted away from this project. In short, the potential well was just too deep. I've definitely noticed the improvement in capabilities of LLM coding agents in the past few months, and many reputable people online rave about using them for increasingly larger projects. That said, would an LLM agent really be able to accomplish such a complex project on its own? This isn't just a toy; it's thousands of lines of dense parsing code. What gave me hope is the concept of conformance suites mentioned by Simon Willison. 
Agents seem to do well when there's a very clear and rigid goal function - such as a large, high-coverage conformance test suite. And pycparser has a very extensive one. Over 2500 lines of test code parsing various C snippets to ASTs with expected results, grown over a decade and a half of real issues and bugs reported by users. I figured the LLM could either succeed or fail and throw its hands up in despair, but it was quite unlikely to produce a wrong port that would still pass all the tests. So I set it to run. I fired up Codex in pycparser's repository, and wrote this prompt just to make sure it understands me and can run the tests: Codex figured it out (I gave it the exact command, after all!); my next prompt was the real thing [5]: Here Codex went to work and churned for over an hour. Having never observed an agent work for nearly this long, I kind of assumed it had gone off the rails and would fail sooner or later. So I was rather surprised and skeptical when it eventually came back with: It took me a while to poke around the code and run it until I was convinced - it had actually done it! It wrote a new recursive descent parser with only ancillary dependencies on PLY, and that parser passed the test suite. After a few more prompts, we've removed the ancillary dependencies and made the structure clearer. I hadn't looked too deeply into code quality at this point, but at least on the functional level - it succeeded. This was very impressive! A change like the one described above is impossible to code-review as one PR in any meaningful way; so I used a different strategy. Before embarking on this path, I created a new branch, and once Codex finished the initial rewrite, I committed this change, knowing that I would review it in detail, piece-by-piece, later on. Even though coding agents have their own notion of history and can "revert" certain changes, I felt much safer relying on Git. 
In the worst case, if all of this went south, I could nuke the branch and it would be as if nothing ever happened. I was determined to only merge this branch onto main once I was fully satisfied with the code. In what follows, I had to git reset several times when I didn't like the direction in which Codex was going. In hindsight, doing this work in a branch was absolutely the right choice. Once I'd sufficiently convinced myself that the new parser was actually working, I used Codex to similarly rewrite the lexer and get rid of the PLY dependency entirely, deleting it from the repository. Then, I started looking more deeply into code quality - reading the code created by Codex and trying to wrap my head around it. And - oh my - this was quite the journey. Much has been written about the code produced by agents, and much of it seems to be true. Maybe it's a setting I'm missing (I'm not using my own custom AGENTS.md yet, for instance), but Codex seems to be that eager programmer that wants to get from A to B whatever the cost. Readability, minimalism and code clarity are very much secondary goals. Using raise...except for control flow? Yep. Abusing Python's weak typing (like having None, False, and other values all mean different things for a given variable)? For sure. Spreading the logic of a complex function all over the place instead of putting all the key parts in a single switch statement? You bet. Moreover, the agent is hilariously lazy. More than once I had to convince it to do something it initially said was impossible, and insisted was impossible again in follow-up messages. The anthropomorphization here is mildly concerning, to be honest. I could never imagine I would be writing something like the following to a computer, and yet - here we are: "Remember how we moved X to Y before? You can do it again for Z, definitely. Just try". My process was to see how I could instruct Codex to fix things, and intervene myself (by rewriting code) as little as possible. 
I've mostly succeeded in this, and did maybe 20% of the work myself. My branch grew dozens of commits, falling into roughly three categories (see the example prompts at the end of this post). Interestingly, after doing (3), the agent was often more effective in giving the code a "fresh look" and succeeding in either (1) or (2). Eventually, after many hours spent in this process, I was reasonably pleased with the code. It's far from perfect, of course, but taking the essential complexities into account, it's something I could see myself maintaining (with or without the help of an agent). I'm sure I'll find more ways to improve it in the future, but I have a reasonable degree of confidence that this will be doable. It passes all the tests, so I've been able to release a new version (3.00) without major issues so far. The only issue I've discovered is that some of CFFI's tests are overly precise about the phrasing of errors reported by pycparser; this was an easy fix. The new parser is also faster, by about 30% based on my benchmarks! This is typical of recursive descent when compared with YACC-generated parsers, in my experience. After reviewing the initial rewrite of the lexer, I spent a while instructing Codex on how to make it faster, and it worked reasonably well. While working on this, it became quite obvious that static typing would make the process easier. LLM coding agents really benefit from closed loops with strict guardrails (e.g. a test suite to pass), and type annotations act as such. For example, had pycparser already been type annotated, Codex would probably not have overloaded values to multiple types (like None vs. False vs. others). In a followup, I asked Codex to type-annotate pycparser (running checks using ty), and this was also a back-and-forth, because the process exposed some issues that needed to be refactored. Time will tell, but hopefully it will make further changes in the project simpler for the agent. 
Based on this experience, I'd bet that coding agents will be somewhat more effective in strongly typed languages like Go, TypeScript and especially Rust. Overall, this project has been a really good experience, and I'm impressed with what modern LLM coding agents can do! There's no reason to expect that progress in this domain will stop, but even if it does - these are already very useful tools that can significantly improve programmer productivity. Could I have done this myself, without an agent's help? Sure. But it would have taken me much longer, assuming that I could even muster the will and concentration to engage in this project. I estimate it would take me at least a week of full-time work (so 30-40 hours) spread over who knows how long to accomplish. With Codex, I put an order of magnitude less work into this (around 4-5 hours, I'd estimate) and I'm happy with the result. It was also fun. At least in one sense, my professional life can be described as the pursuit of focus, deep work and flow. It's not easy for me to get into this state, but when I do I'm highly productive and find it very enjoyable. Agents really help me here. When I know I need to write some code and it's hard to get started, asking an agent to write a prototype is a great catalyst for my motivation. Hence the meme at the beginning of the post. One can't avoid a nagging question - does the quality of the code produced by agents even matter? Clearly, the agents themselves can understand it (if not today's agent, then at least next year's). Why worry about future maintainability if the agent can maintain it? In other words, does it make sense to just go full vibe-coding? This is a fair question, and one I don't have an answer to. Right now, for projects I maintain and stand behind, it seems obvious to me that the code should be fully understandable and accepted by me, and the agent is just a tool helping me get to that state more efficiently. 
It's hard to say what the future holds here; it's going to be interesting, for sure. There was also the lexer to consider, but this seemed like a much simpler job. My impression is that in the early days of computing, lex gained prominence because of its strong regexp support, which wasn't very common yet. These days, with excellent regexp libraries existing for pretty much every language, the added value of lex over a custom regexp-based lexer isn't very high. That said, it wouldn't make much sense to embark on a journey to rewrite just the lexer; the dependency on PLY would still remain, and besides, PLY's lexer and parser are designed to work well together. So it wouldn't help me much without tackling the parser beast. Example prompts from my review sessions:
- "The code in X is too complex; why can't we do Y instead?"
- "The use of X is needlessly convoluted; change Y to Z, and T to V in all instances."
- "The code in X is unclear; please add a detailed comment - with examples - to explain what it does."

0 views
Martin Fowler 6 days ago

Fragments: February 4

I’ve spent a couple of days at a Thoughtworks-organized event in Deer Valley, Utah. It was my favorite kind of event: a really great set of attendees in an Open Space format. These kinds of events are full of ideas, which I do want to share, but I can’t truthfully form them into a coherent narrative for an article about the event. However, this fragment format suits them perfectly, so I’ll post a bunch of fragmentary thoughts from the event, both in this post and in posts in the next few days. ❄                ❄                ❄                ❄                ❄ We talked about the worry that using AI can cause humans to have less understanding of the systems they are creating. In this discussion one person pointed out that one of the values of Pair Programming is that you have to regularly explain things to your pair. This is an important part of learning - for the person doing the explaining. After all, one of the best ways to learn something is to try to teach it. ❄                ❄                ❄                ❄                ❄ One attendee is an SRE for a Very (Very) Large Code Base. He was less worried about people not understanding the code an LLM writes because he already can’t understand the VVLCB he’s responsible for. What he values is that the LLM helps him understand what the code is doing, and he regularly uses it to navigate to the crucial parts of the code. There’s a general point here: Fully trusting the answer an LLM gives you is foolishness, but it’s wise to use an LLM to help navigate the way to the answer. ❄                ❄                ❄                ❄                ❄ Elsewhere on the internet, Drew Breunig wonders if software libraries of the future might be only specs and no code. To explore this idea he built a simple library to convert timestamps into phrases like “3 hours ago”. He used the spec to build implementations in seven languages. The spec is a markdown document of 500 lines and a set of tests in 500 lines of YAML. 
“What does software engineering look like when coding is free?” I’ve chewed on this question a bit, but this “software library without code” is a tangible thought experiment that helped firm up a few questions and thoughts. ❄                ❄                ❄                ❄                ❄ Bruce Schneier on the role advertising may play while chatting with LLMs: Imagine you’re conversing with your AI agent about an upcoming vacation. Did it recommend a particular airline or hotel chain because they really are best for you, or does the company get a kickback for every mention? Recently I heard an ex-Googler explain that advertising was a gilded cage for Google, and they tried very hard to find another business model. The trouble is that it’s very lucrative but also ties you to the advertisers, who are likely to pull out whenever there is an economic downturn. Furthermore, they also gain power to influence content - many controversies over “censorship” start with demands from advertisers. ❄                ❄                ❄                ❄                ❄ The news from Minnesota continues to be depressing. The brutality from the masked paramilitaries is getting worse, and their political masters are not just accepting this, but seem eager to let things escalate. Those people with the power to prevent this escalation are either encouraging it, or doing nothing. One hopeful sign from all this is the actions of the people of Minnesota. They have resisted peacefully so far, their principal weapons being blowing whistles and filming videos. They demonstrate the neighborliness and support of freedom and law that made America great. I can only hope their spirit inspires others to turn away from the path that we’re currently on. I enjoyed this portrayal of them from Adam Serwer (gift link): In Minnesota, all of the ideological cornerstones of MAGA have been proved false at once. Minnesotans, not the armed thugs of ICE and the Border Patrol, are brave. 
Minnesotans have shown that their community is socially cohesive—because of its diversity and not in spite of it. Minnesotans have found and loved one another in a world atomized by social media, where empty men have tried to fill their lonely soul with lies about their own inherent superiority. Minnesotans have preserved everything worthwhile about “Western civilization,” while armed brutes try to tear it down by force.

0 views
Brain Baking 6 days ago

Favourites of January 2026

The end of the start of another year has ended. So now all there is left to do is to look forward to the end of the next month, starting effective immediately, and of course ending after the end of the end we are going to look forward to. Quite the end-eavour. I guess I’ll end these ramblings by ending this paragraph. But not before this message of general interest: children can be very end-earing, but sometimes you also want to end their endless whining! Fin. Previous month: January 2026. Is Emacs a game? I think it is. I spent every precious free minute of my time tinkering with my configuration, exploring and discovering all the weird and cool stuff the editor and the thousands of community-provided packages offer. You can tell when you’ve joined the cult when you’re exchanging emails with random internet strangers about obscure Elisp functions and even joining the sporadic “let’s share Emacs learnings!” video calls (thanks Seb). Does receiving pre-ordered games count as played? I removed the shrink wrap from Ruffy and my calendar tells me I should start ordering UFO 50 very very soon via . Now if only that stupid Emacs config would stabilise; perhaps then I could pick up the Switch again… The intention was to start learning Clojure but I somehow got distracted after learning the Emacs CIDER REPL is the one you want. A zoomed-out top-down view of the project, centered on Brain Baking (left) and Jefklak's Codex (right). Related topics: metapost. By Wouter Groeneveld on 4 February 2026. Reply via email. Nathan Rooy created a very cool One million (small web) screenshots project and explains the technicalities behind it. Browsing to find your blog (mine are in there!) is really cool. It’s also funny to discover the GenAI purple-slop-blob. Brain Baking is located just north of a small dark green lake of expired domain name screenshots. 
Jefklak’s Codex, being much more colourful, is located at the far edge, to the right of a small Spaceship-domain-shark lake. Shom Bandopadhaya helped me regain my sanity with the Emacs undo philosophy. Install vundo. Done. Related: Sacha Chua was writing and thinking about time travel with Emacs, Org mode, and backups. I promise there’ll be non-Emacs related links in here, somewhere! Keep on digging! Michael Klamerus reminded me the BioMenace remaster is already out there. I loved that game as a kid but couldn’t get past level 3 or 4. It’s known to be extremely difficult. Or I am known to be a noob. Lars Ingebrigtsen combats link rot by taking screenshots of external links. I wrote about link rot a while ago and I must say that’s a genius addition. On hover, a small screenshot appears to permanently frame the thing you’re pointing to. I need to think about implementing this myself. Seb pointed me towards Karthinks’ Emacs window management almanac, a wall of text I will have to re-read a couple of times. I did manage to write a few simple window management helper functions that primarily do stuff with only a 2-split, which is good enough. Mikko shared his Board Gaming Year recap of 2025. Forest Shuffle reaching 500 plays is simply insane, even if you take out the BoardGameArena numbers. Alex Harri spent a lot of time building an image-to-ASCII renderer and explains how the project was approached. This Precondition Guide to Home Row Mods is really cool, and with Karabiner Elements on macOS it’s totally possible. It will get messy once you start fiddling with the timing. Elsa Gonsiorowski wrote about Emacs Delete vs. Kill, which again helped me build a proper mental model of what the hell is going on in this Alien editor. Matt Might shared shell scripts to improve your academic writing by simply scanning the text for so-called “weasel words”. Bad: “We used various methods to isolate four samples.” Better: “We isolated four samples.” 
I must say, academic prose sure could use this script. Robert Lützner discovered and prefers it over Git. I’m interested in its interoperability with Git. Charles Choi tuned Emacs to write prose by modifying quite a few settings I have yet to dig into. A friend installed PiVPN recently. I hadn’t heard of that one yet, so perhaps it’s worth a mention here. KeepassXC is getting on my nerves. Perhaps I should simply use pass, the standard unix password manager. But it should also be usable by my wife so… Nah. Input is a cool flexible font system designed for code that also offers proportional fonts. I tried it for a while but now prefer… Iosevka for my variable pitch font. Here’s a random Orgdown cheat sheet that might be of use. With RepoSense it’s easy to visualise programmer activities across Git repositories. We’re using it to track student activities and make sure everyone participates. Tired of configuring tab vs space indent stuff for every programming language? Use EditorConfig, something that works across editors and IDEs.
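A weasel-word scan like the one Matt Might describes can be sketched in a few lines (this is my own illustration, not his actual shell script; the word list is a small sample):

```python
import re

# Illustrative weasel-word scanner: flag hedging words with line numbers.
WEASELS = re.compile(
    r"\b(many|various|very|fairly|several|extremely|relatively|clearly|quite|mostly)\b",
    re.IGNORECASE,
)

def find_weasels(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_word) pairs for every weasel word found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in WEASELS.finditer(line):
            hits.append((lineno, m.group(0).lower()))
    return hits

sample = "We used various methods to isolate four samples.\nWe isolated four samples."
assert find_weasels(sample) == [(1, "various")]
```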

0 views
Rob Zolkos 1 week ago

Deep Dive: How Claude Code’s /insights Command Works

The /insights command in Claude Code generates a comprehensive HTML report analyzing your usage patterns across all your Claude Code sessions. It’s designed to help you understand how you interact with Claude, what’s working well, where friction occurs, and how to improve your workflows. Its output is really cool and I encourage you to try it and read it through! Description: “Generate a report analyzing your Claude Code sessions”. Output: an interactive HTML report. But what’s really happening under the hood? Let’s trace through the entire pipeline. The insights generation is a multi-stage process, and the facets are cached so subsequent runs are faster. Before any LLM calls, Claude Code processes your session logs to extract structured metadata. If a session transcript exceeds 30,000 characters, it’s chunked into 25,000-character segments and each chunk is summarized before facet extraction. This is the core qualitative analysis: for each session (up to 50 new sessions per run), Claude analyzes the transcript to extract structured “facets” - qualitative assessments of what happened. Model: Haiku (fast, cost-effective). Max output tokens: 4096. Once all session data and facets are collected, they’re aggregated and processed through multiple specialized analysis prompts. Model: Haiku. Max output tokens: 8192 per prompt. Each analysis prompt receives aggregated statistics, plus text summaries. The longest prompt provides actionable recommendations. The final LLM call generates an executive summary that ties everything together; this prompt receives all the previously generated insights as context. All the collected data and LLM-generated insights are rendered into an interactive HTML report. Facets are cached per-session, so running multiple times only analyzes new sessions. All analysis happens locally using the Anthropic API. 
Your session data stays on your machine - the HTML report is generated locally and can be shared at your discretion. The facet extraction focuses on patterns in your interactions, not the content of your code: Collect all your session logs from Filter out agent sub-sessions and internal operations Extract metadata from each session (tokens, tools used, duration, etc.) Run LLM analysis to extract “facets” (qualitative assessments) from session transcripts Aggregate all the data across sessions Generate insights using multiple specialized prompts Render an interactive HTML report Agent sub-sessions (files starting with ) Internal facet-extraction sessions Sessions with fewer than 2 user messages Sessions shorter than 1 minute - Unique identifier - When the session began - How long the session lasted - Number of user messages / - Token usage - Which tools were used and how often - Programming languages detected from file extensions / - Git activity - How often you interrupted Claude - Tool failures and their categories / / - Code changes / / / - Feature usage - Your initial message - Brief session summary SESSION SUMMARIES: Up to 50 brief summaries FRICTION DETAILS: Up to 20 friction details from facets USER INSTRUCTIONS TO CLAUDE: Up to 15 repeated instructions users gave Claude Total sessions, messages, duration, tokens Git commits and pushes Active days and streaks Peak activity hours Daily activity charts Tool usage distribution Language breakdown Satisfaction distribution Outcome tracking Project areas with descriptions Interaction style analysis What’s working well (impressive workflows) Friction analysis with specific examples CLAUDE.md additions to try Features to explore On the horizon opportunities Fun memorable moment What types of tasks you ask for How you respond to Claude’s output Where friction occurs in the workflow Which tools and features you use Use Claude Code regularly - More sessions = richer analysis Give feedback - Say “thanks” or “that’s not 
right” so satisfaction can be tracked Don’t filter yourself - Natural usage patterns reveal the most useful insights Run periodically - Check in monthly to see how your patterns evolve

Justin Duke, 1 week ago

LLM as advance team

My first foray into using git worktree-style development — spinning up multiple workspaces and having LLM agents attack different problems in parallel — was a failure. I found myself simultaneously exhausted and unproductive, the equivalent of doing a circuit course in Bean Boots. The entire thing felt good in the pernicious way increasingly familiar to developers using these sorts of tools, where you can delude yourself into believing that noise and diffs indicate forward progress, even though at the end of the day you've run a marathon and out of the twenty PRs you shipped, five of them are buggy and three of them are relevant to the actual important thing you should be doing.

I am back using worktrees and having a very good time doing so — with a slightly different mindset, which is to treat them not as parallel agents but as an advance team. If I've got five or six things I know I want to do in a given focus block, the first thing I'll do is open up five or six different worktrees within Conductor and spend two or three minutes dictating via Aqua high-level goals and linking in any relevant Linear, Sentry, or Slack data. Then Opus goes off and whirs for thirty minutes, and I don't pay any attention until I decide it's time to pick up that task and that task only.

What I find is never complete, but always useful. Bug fixes might have correctly diagnosed the root cause but implemented the fix or regression test in a way that I find clumsy. Feature branches might have done an awful job wiring together the architecture but given me a couple Storybooks that I can click around and better refine my thoughts with. Refactors give me a sense of the scope and a few regression tests that were missing. More often than not, the branch spun up by Conductor never even makes it to GitHub, let alone the main branch. It's purely a first draft from an overstimulated and undercompensated robotic junior colleague — but that is value additive.
I couldn't tell you what the dollar amount of that value is, but right now it's certainly greater than zero. And the tax on my workflow is minimal at best.


Local Lean Workgroup Retro

I started learning formalized mathematics with Lean last year 1, 2, 3, 4. As I progressed beyond the basics, it dawned on me that I needed to start interacting with the broader community: Lean is rapidly evolving, and beyond the basic getting-started guides there aren't many high-quality resources for learning solo. It could also be a great way to meet folks with similar interests and build more social connections. To that end, I tried to start a local workgroup around Oct '25 (details). I have never done anything like this before, so this is a small retro of how it has gone so far (as of Jan '26) and what's next.
