Claude Code
My son has been using Claude Code at work and telling me how well it seems to perform. Eventually my inquisitiveness got the better of me and I took Claude for a spin. I was surprised how well it did, but I also discovered some limitations and how to use it better.

I was working on a deadlock in a particular test, so I asked Claude why the deadlock happened. After quite a lot of debugging and analysis, Claude came up with a plausible theory for the deadlock. It managed to code a very reasonable fix too. I got Claude to commit the fix and write a suitable commit log ( commit ). At that stage I asked my friends “Does that make me a bad person?”. It certainly felt uncomfortable to be committing code written by an LLM. The deadlock has not recurred since.

A friend asked:

That feels like quite a significant moment! And suppose instead of Claude it had been a new human collaborator joining the project for the first time, and you showed them the issue and pointed them at the repo… how impressed or otherwise would you have been if they produced the same patch and analysis in the same amount of time as Claude did?

To which I replied “extremely”. Another pointed out that the fix was coded by Sonnet 4.5 and wondered if Opus would have produced a different outcome. Another friend wrote:

Wow. I’m very impressed, but also at the same time glad I’ve retired already.

I tend to agree. It’s a relief to delegate grunt work to an LLM, but I think some of my skills would start to atrophy. Also, I think it’s good for me to have to do unpleasant or laborious tasks occasionally.

Next I asked Claude to improve the throughput of one specific benchmark. It spotted a FIXME of mine which would give an improvement and implemented it (simply, but correctly). It also spotted an issue I knew about, but had temporarily forgotten (needing to remove stale values from a hash map) and fixed that simply and correctly.
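The stale-value cleanup could be sketched like this. The post doesn't show the actual fix, so the entry type and field names below are invented for illustration; the idiomatic Rust tool for this job is `HashMap::retain`, which drops entries in place.

```rust
use std::collections::HashMap;

// Hypothetical entry type; the real map's contents aren't shown in the post.
struct Entry {
    last_seen: u64, // e.g. a logical clock or tick count
}

// Drop entries whose last_seen is older than `max_age` ticks.
// HashMap::retain removes entries in place, so no rebuild of the
// map (and no second allocation) is needed.
fn evict_stale(map: &mut HashMap<u64, Entry>, now: u64, max_age: u64) {
    map.retain(|_, e| now.saturating_sub(e.last_seen) <= max_age);
}
```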
This broke some tests, as expected, but Claude struggled to rework them until its quota ran out (to be renewed several hours later), so I fixed the tests by hand. If I wanted to anthropomorphise the behaviour, I’d say Claude acted like a capable junior developer who went home with a headache before getting the tests running.

On this occasion, I asked Claude to change a public API to achieve a specific change to the implementation (which I detailed). It came up with a plausible plan, although one of the steps was to make an unrelated improvement, so I told it not to do that step. It went through and made changes according to the plan. At least once, it acted like a human in that it ran a build and then fixed up the errors (rather than applying the fixes ahead of time). One of the steps worried me a little because it seemed rather ambitious. It didn’t add any tests to increase confidence in the change. Anyway, I let it proceed.

Finally, it ran the tests, fixed some which were failing, and declared success. I ran the tests and one (the most crucial one, given the changes) hung. Claude produced a rambling discourse on what might be causing the test to hang intermittently. It seemed to say that the omitted step had been implemented and that this was somehow essential! Maybe the rest of the changes depended on that step in some subtle way? Anyway, Claude hit its quota limit again, so I came back another day and got Claude to debug the intermittent test and tidy up the code in smaller, safer steps.

In anthropomorphic terms, this second case was like a mid-level developer who didn’t exercise enough caution when implementing a more complex piece of logic.

I guided it through some design changes I’d been mulling over and it spotted problems in the details and found good solutions.
The icing on the cake was when I asked it to suggest a new naming scheme for some types and it came up with a good option, described some alternatives it had considered, and explained why the main option was preferable. Another nice touch was when I asked it to update the README to reflect the design goals of the project and how they are implemented - its changes, including fixing a typo of mine in a URL, were really good.

This feels like the situation where an experienced colleague joins the team, gets up to speed really fast, and then produces code as good as, or perhaps better than, my own. I need to keep a careful eye on what they are doing in case they start to diverge from my design goals [1] , but the additional capability is well worth the cost of supervision.

In this experiment I asked Claude to propose and extract some submodules, only to find that it had made most of the struct fields , which destroys any pretence of encapsulation. So it seems that the better defined the task, the better the result. Also, it’s clear that, as someone else wrote, “AI doesn’t care”. I asked Claude to suggest possible solutions to the use of on fields. Some of its suggestions were interesting, but I ultimately did most of the work by hand, using Claude for specific, laborious tasks. ( Modularisation pull request )

I thought I’d use it to take a crack at a feature I’ve been putting off for ages: shared memory support. I suspected this wouldn’t be too bad, but it would probably be a little laborious making sure the API was functionally equivalent to the shared memory API of . I asked Claude to plan the change. It wanted to submit a PR on to access the thread locals used to (de)serialise shared memory, so I told it to re-plan without changing . Next it planned to (de)serialise shared memory as a byte vector, which would miss the main advantage of using shared memory. So I told it to create a thin wrapper round ’s shared memory type. It re-planned and I let it execute the plan without interruption.
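The “thin wrapper” idea can be illustrated with Rust’s newtype pattern. This is only a sketch: the underlying crate and its shared memory type aren’t named above, so a plain byte buffer stands in for the wrapped type, and the method names are invented for illustration.

```rust
// Stand-in for the underlying crate's shared memory type (hypothetical).
struct RawSharedMem {
    bytes: Vec<u8>,
}

// The thin wrapper: a newtype that owns the underlying handle and
// re-exposes only the operations the wrapping API needs (in real
// code these would be `pub` methods), so callers get shared-memory
// semantics without round-tripping through a serialised byte vector.
struct SharedMem(RawSharedMem);

impl SharedMem {
    fn from_bytes(data: &[u8]) -> Self {
        SharedMem(RawSharedMem { bytes: data.to_vec() })
    }

    fn len(&self) -> usize {
        self.0.bytes.len()
    }

    fn as_slice(&self) -> &[u8] {
        &self.0.bytes
    }
}
```

The appeal of the newtype is that the wrapper adds no runtime cost while keeping the wrapped type’s internals out of the public API.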
It produced code and tests quite quickly and these looked in good shape. When I compared the API with that of , it was missing a couple of methods on the shared memory type, so I asked it to implement these. I had to ask it to write tests, which it did perfectly well even though at least one corresponding method was untested in (so it wasn’t simple copying/massaging of that code). I had to manually change the year of the copyright header of the new file, fix one formatting error, and delete the line in the README which mentioned that shared memory needed to be implemented. Finally, I got it to describe shared memory in the README API overview and implementation overview.

This all took about 1.5 hours, whereas doing it by hand would probably have taken me 2-4 times longer. In anthropomorphic terms, it was like directing a capable, energetic, thorough underling who simply had never written a thin wrapper before. With my reduced appetite for simply grinding out code these days, this has been a very pleasant experience. ( Shared memory support pull request )

I had in mind an improvement to the interprocess protocol for . The improvement is to change polling to use on a response channel rather than sending “probe” messages on a forward channel (which could fill up the channel and block). Claude planned this out and generated the code. CI detected an intermittent hang on macOS in an integration test. I asked Claude to debug the hang and, on Linux, it went into “analysis paralysis”. So I switched to macOS so Claude had the option of debugging the test. However, the change of context allowed Claude to come up with a plausible theory of why the test sometimes hung on macOS. This was to do with the behaviour of ( returning instead of in this scenario), so I asked Claude to generate a test to demonstrate the behaviour so I could raise a bug against . It generated a test, but the test did not reproduce the behaviour, so Claude tried re-analysing the behaviour.
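The polling change described above can be sketched with `std::sync::mpsc`, standing in for the project’s actual channel type (which is not named in the post): the caller blocks with a timeout on the response channel rather than pushing probe messages onto the forward channel.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

// Block on the response channel with a timeout, instead of sending
// "probe" messages on the forward channel (which could fill it up
// and block the sender). Returns Some(msg) if a response arrives,
// None on timeout or if the sender has disconnected.
fn poll_response<T>(rx: &Receiver<T>, timeout: Duration) -> Option<T> {
    match rx.recv_timeout(timeout) {
        Ok(msg) => Some(msg),
        Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => None,
    }
}
```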
After running out of quota once or twice and encountering “compaction” (where Claude tries to shed unnecessary context), and after quite a bit of waiting on my part, Claude came up with another plausible theory about the hang: a bug in the code. It proposed a solution to the bug which would have increased file descriptor consumption on Linux [2] , so I suggested a simpler and more efficient fix, which Claude then implemented. CI passed, but since the overall change to the protocol was significant, I first of all reviewed the changes as a pull request and, having found no issues, asked Claude to review the PR carefully. It found a bug in my simpler fix and its implementation. After it had fixed the bug, and improved the code structure in that area, CI still passed, so I ran the benchmarks to make sure performance hadn’t regressed and then merged the branch.

In anthropomorphic terms, Claude feels like a really helpful colleague who is prone to go off on tangents and lose track of the bigger picture (e.g. design goals). With appropriate supervision, Claude is saving me a lot of effort and producing excellent code.

Some friends who have also been using Claude discussed putting instructions in the repository to guide Claude in future. AGENTS.md seems to be the agent-neutral spot, whereas CLAUDE.md can include Claude-specific instructions (and you can use to “include” the other instructions). My AGENTS.md currently has the following contents: and CLAUDE.md simply contains: The “Design goals” section of may not help, but I felt it was necessary after Claude proposed a solution to the hang which cut across one of the design goals (minimising file descriptor consumption on Linux).

The first couple of commits, including the deadlock diagnosis/fix, used Sonnet 4.5, and the later ones seemed to have used Opus 4.6, apart from the protocol deadlock investigation, which used Sonnet again (I deliberately switched to Opus for coding the fix and reviewing the PR).
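For illustration only (the actual file contents mentioned above are not reproduced here): Claude Code supports an `@path` import syntax in CLAUDE.md, so the Claude-specific file can pull in the shared, agent-neutral instructions rather than duplicating them.

```markdown
<!-- CLAUDE.md: Claude-specific notes, plus the shared instructions -->
@AGENTS.md
```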
Both these models seem capable, but Sonnet seems to consume less quota and is more suited to lighter tasks. I guess my project, like a compiler, particularly lends itself to AI assistance because of the strong typing and the decent collection of docs, tests, and benchmarks. Claude Code is very impressive nevertheless.

I continue to be concerned about the ethics of using AI assistance, especially the environmental impact. Tim Bray helpfully observed :

At the end of the day, the business goal of GenAI is to boost monopolist profits by eliminating decent jobs, and damn the consequences. This is a horrifying prospect (although I’m somewhat comforted by my belief that it basically won’t work and most of the investment capital is heading straight down the toilet). But. All that granted, there’s a plausible case, specifically in software development, for exempting LLMs from this loathing.

He then goes on to explain his reasoning, with which I agree, but I’ll let you read the post rather than reproduce the argument here.

This reflection turned out to be prescient given what happened later (see the “Protocol improvement” section). ↩︎

The proposed fix was to associate a sender id. with each SubSender, which would mean that the IpcSender would be transmitted every time another SubSender was transmitted from the same source to the same destination. ↩︎