Latest Posts (20 found)
iDiallo Today

Where did you think the training data was coming from?

When the news broke that Meta's smart glasses were feeding data directly into their Facebook servers, I wondered what all the fuss was about. Who thought AI glasses used to secretly record people would be private? Then again, I've grown cynical over the years. The camera on your laptop is pointed at you right now. When activated, it can record everything you do. When Zuckerberg posted a selfie with his laptop visible in the background, people were quick to notice that both the webcam and the microphone had black tape over them. If the CEO of one of the largest tech companies in the world doesn't trust his own device, what are the rest of us supposed to do?

On my Windows 7 machine, I could at least assume the default behavior wasn't to secretly spy on me. With good security hygiene, my computer would stay safe. For Windows 10 and beyond, that assumption may no longer hold. Microsoft's incentives have shifted. They now require users to create an online account, which comes with pages of terms to agree to, and they are in the business of collecting data:

"As part of our efforts to improve and develop our products, we may use your data to develop and train our AI models."

That's your local data being uploaded to their servers for their benefit. Under their licensing agreement (because you don't buy Windows, you only license it) you are contractually required to allow certain information to be sent back to Microsoft:

"By accepting this agreement or using the software, you agree to all of these terms, and consent to the transmission of certain information during activation and during your use of the software as per the privacy statement described in Section 3. If you do not accept and comply with these terms, you may not use the software or its features."

The data transmitted includes telemetry, personalization, AI improvement, and advertising features. On a Chromebook, there was never an option to use the device without a Google account. Google is in the advertising business, and reading their terms of service, even partially, it all revolves around data collection. Your data is used to build a profile both for advertising and AI training. None of this is a secret. It's public information, buried in those terms of service agreements we blindly click through.

Even Apple, which touts itself as privacy-first in every ad, was caught using user data without consent. Tesla employees were found sharing videos recorded inside customers' private homes. While some treat the Ray-Ban glasses story as an isolated incident, here is Yann LeCun, Meta's former chief AI scientist, describing transfer learning using billions of user images:

"We do this at Facebook in production, right? We train large convolutional nets to predict hashtags that people type on Instagram, and we train on literally billions of images. Then we chop off the last layer and fine-tune on whatever task we want. That works really well."

That was seven years ago, and he was talking about pictures and videos people upload to Instagram. When you put your data on someone else's server, all you can do is trust that they use it as intended. Privacy policies are kept deliberately vague for exactly this reason. Today, Meta calls itself AI-first, meaning it's collecting even more to train its models. Meta's incentive to collect data exceeds even that of Google or Microsoft. Advertising is their primary revenue source. Last year, it accounted for 98% of their forecasted $189 billion in revenue.
Yes, Meta glasses record you in moments you expect to be private, and their workers process those videos at their discretion. We shouldn't expect privacy from a camera or a microphone, or any internet-connected device, that we don't control. That's the reality we have to accept. AI is not a magical technology that simply happens to know a great deal about us. It is trained on a pipeline of people's information: video, audio, text. That's how it works. If you buy the device, it will monitor you.


25 Years Of ADSL Speed

Twenty-five years ago, I captured a screenshot of my FTP client showcasing the download of a SuSE Linux gcc compilation package at the dazzling rate of : Downloading the gcc cross-compiler for s390x through the ftp.belnet.be mirror. Note the then very new Windows XP Olive theme. For some reason, that screenshot must have been relevant, as I found it uploaded as part of my UnionVault.NET museum from 2002. Nowadays, such a download speed can officially be scoffed at as being slower than a snarky snail. Yet in 2000-2002, that was lightning-fast. Perspectives change. In Belgium, telecom company Belgacom introduced ADSL in 1999, significantly boosting our digital lives. No longer did I have to hang up the ISDN line when chatting over ICQ when mom wanted to do a quick phone call to grandma to ask about next week’s party. No longer did we have to listen to squeaky sounds and wait and wait and wait… for an image or file to appear. The future was here! For our family, the future was here a smidge earlier than for the average Flemish family, as my dad worked very close to the source. He was one of the Belgacom employees responsible for testing out various early ADSL modems at home, so our dialup method changed frequently. I do remember that we too were blessed with “The Frog”: the Alcatel ‘Stingray’ ADSL SpeedTouch USB Modem that looked like a frog or ray, depending on who you’d ask: The first iteration of the Alcatel SpeedTouch modem. That lovely shape was capable of handling at most downstream, but our cables/ISP were not ready to handle that just yet. In September 2002, Belgacom announced they would further increase the ADSL bandwidth: “Speed increase: all Belgacom ADSL subscriptions. Since launch, the maximum downstream speed has been 750 Kbit/s (ADSL GO) and 1 Mbit/s (ADSL Plus-Pro-Office-Premium). Thanks to Belgacom’s additional investments and network adjustments, the majority of customers will be able to reach peak speeds of up to . This work is expected to be completed in the first quarter of 2003.” Three whoppin’ megabits (not bytes) per second! Can you imagine that? I guess you can, given the current average download speeds of… Wait, let me check speedtest.net… or, in other words, 93 times faster than the bleeding-edge 2003 speeds 1 . Try streaming your favourite YouTube video with a few megabits per second. YouTube didn’t exist until two years later (2005). Perspectives change. In that statement they mention they have 400k customers. Given the widespread adoption of internet in Belgium, that number can be safely multiplied by ten nowadays. The Skynet ISP that was bought up by Belgacom and hosted our very first personal homes under provided a monthly limit of . According to Belgacom in that same announcement, only a tiny portion of their users effectively hit that limit. Nowadays, everyone is accustomed to “stream whatever, whenever! YOLO!”. Back then, speeds were “high”, but we still had to be mindful of the stuff we downloaded each month, especially when wading through newsgroups looking for shady new releases. Perspectives change. I wonder if my dad kept a list of the routing hardware we burned through in those late nineties/early noughties. All I can recall is that it was a lot. Since he was employed by the national telecom company that only really was (and still is) rivalled by a single other company—Telenet—we never tried the alternative. Nowadays, multiple “shadow” ISPs exist like Orange, Mobile Vikings, and Scarlet that hire the Proximus cable network.
Proximus is the rebranding and full privatisation of Belgacom, which was itself the rebranding of the institute RTT ( Regie voor Telegraaf en Telefoon —or, as my dad would call it, Rap Terug Thuis , “quickly back home”). Unfortunately, the Web Archive never crawled all homes and I neglected to back up whatever my dad uploaded on there, so our stuff is forever gone. I regret taking only a single screenshot of my download speed, so I cannot repeat this enough: archive your stuff! That’s also the oldest screenshot of my machine/OS I have; the other desktop screenshots are from 2004+. This blog post is just an excuse to get that image under the moniker. According to meter.net historical speed test results, only five years ago, for Belgium, that average was . Does this mean that in five years it’ll be on average ? That’s more than a CD-ROM in less than a second. Perspectives change. In twenty more years, nobody will remember what a CD-ROM even is.  ↩︎ Related topics: / adsl / screenshots / By Wouter Groeneveld on 11 March 2026.  Reply via email .


curl 8.19.0

Release presentation: Welcome to the curlhacker stream at 10:00 CET (09:00 UTC) today, March 11, 2026, for a live-streamed presentation of curl 8.19.0: the changes, the security fixes and some bugfixes.

Numbers: the 273rd release, 8 changes, 63 days (total: 10,712), 264 bugfixes (total: 13,640), 538 commits (total: 38,024), 0 new public libcurl function (total: 100), 0 new curl_easy_setopt() option (total: 308), 0 new curl command line option (total: 273), 77 contributors, 48 new (total: 3,619), 37 authors, 21 new (total: 1,451), 4 security fixes (total: 180).

Security: We stopped the bug-bounty but it has not stopped people from finding vulnerabilities in curl.
CVE-2026-1965: bad reuse of HTTP Negotiate connection
CVE-2026-3783: token leak with redirect and netrc
CVE-2026-3784: wrong proxy connection reuse with credentials
CVE-2026-3805: use after free in SMB connection reuse

Changes: We stopped the bug-bounty. It's worth repeating, even if it was no code change.
The cmake build got a option
Initial support for MQTTS was merged
curl now supports fractions for --limit-rate and --max-filesize
curl's -J option now uses the redirect name as a backup
we no longer support OpenSSL-QUIC on Windows
curl can now get built to use the native CA store by default
the minimum Windows version curl supports is now Vista (up from XP)

Future: The following upcoming changes might be worth noticing. See the deprecate documentation for details.
NTLM support becomes opt-in
RTMP support is getting dropped
SMB support becomes opt-in
Support for c-ares versions before 1.16 goes away
Support for CMake 3.17 and earlier gets dropped
TLS-SRP support will be removed

We plan to ship the next curl release on April 29. See you then!

iDiallo Today

The Server Older than my Kids!

This blog runs on two servers. One is the main PHP blog engine that handles the logic and the database, while the other serves all static files. Many years ago, an article I wrote reached the top position on both Hacker News and Reddit. My server couldn't handle the traffic. I literally had a terminal window open, monitoring the CPU and restarting the server every couple of minutes. But I learned a lot from it. The page receiving all the traffic had a total of 17 assets. So in addition to the database getting hammered, my server was spending most of its time serving images, CSS and JavaScript files. So I decided to set up additional servers to act as a sort of CDN to spread the load. I added multiple servers around the world and used MaxMindDB to determine a user's location to serve files from the closest server. But it was overkill for a small blog like mine. I quickly downgraded back to just one server for the application and one for static files. Ever since I set up this configuration, my server has never failed due to a traffic spike. In fact, in 2018, right after I upgraded the servers to Ubuntu 18.04, one of my articles went viral like nothing I had seen before. Millions of requests hammered my server. The machine handled the traffic just fine. It's been 7 years now. I've procrastinated long enough. An upgrade was long overdue. What kept me from upgrading to Ubuntu 24.04 LTS was that I had customized the server heavily over the years, and never documented any of it. Provisioning a new server means setting up accounts, dealing with permissions, and transferring files. All of this should have been straightforward with a formal process. Instead, uploading blog post assets has been a very manual affair. I only partially completed the upload interface, so I've been using SFTP and SCP from time to time to upload files. It's only now that I've finally created a provisioning script for my asset server. I mostly used AI to generate it, then used a configuration file to set values such as email, username, SSH keys, and so on. With the click of a button, and 30 minutes of waiting for DNS to update, I now have a brand new server running Ubuntu 24.04, serving my files via Nginx. Yes, next month Ubuntu 26.04 LTS comes out, and I can migrate to it by running the same script. I also built an interface for uploading content without relying on SFTP or SSH, which I'll be publishing on GitHub soon. It's been 7 years running this server. It's older than my kids. Somehow, I feel a pang of emotion thinking about turning it off. I'll do it tonight... But while I'm at it, I need to do something about the 9-year-old and 11-year-old servers that still run some crucial applications.
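For illustration, here is a rough sketch of the kind of nearest-server lookup described above, using a MaxMind GeoLite2 City database via the geoip2 reader. The server names, coordinates, database path and helper names are all made up for the example; this is not the blog's actual code.

```python
# Rough sketch: pick the closest static-file server for a visitor's IP using a
# MaxMind GeoLite2 City database. Server names and coordinates are invented.
import math
import geoip2.database

SERVERS = {
    "us-west": (37.77, -122.42),
    "eu-west": (48.86, 2.35),
    "ap-east": (1.35, 103.82),
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 6371 * 2 * math.asin(math.sqrt(a))

def closest_server(ip, mmdb_path="GeoLite2-City.mmdb"):
    """Look up the visitor's location and return the nearest server name."""
    with geoip2.database.Reader(mmdb_path) as reader:
        loc = reader.city(ip).location
    return min(SERVERS, key=lambda name: haversine_km(
        loc.latitude, loc.longitude, *SERVERS[name]))

print(closest_server("203.0.113.5"))   # e.g. "us-west", depending on the IP
```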


Microsoft Patch Tuesday, March 2026 Edition

Microsoft Corp. today pushed security updates to fix at least 77 vulnerabilities in its Windows operating systems and other software. There are no pressing “zero-day” flaws this month (compared to February’s five zero-day treat), but as usual some patches may deserve more rapid attention from organizations using Windows. Here are a few highlights from this month’s Patch Tuesday. Image: Shutterstock, @nwz. Two of the bugs Microsoft patched today were publicly disclosed previously. CVE-2026-21262 is a weakness that allows an attacker to elevate their privileges on SQL Server 2016 and later editions. “This isn’t just any elevation of privilege vulnerability, either; the advisory notes that an authorized attacker can elevate privileges to sysadmin over a network,” Rapid7’s Adam Barnett said. “The CVSS v3 base score of 8.8 is just below the threshold for critical severity, since low-level privileges are required. It would be a courageous defender who shrugged and deferred the patches for this one.” The other publicly disclosed flaw is CVE-2026-26127 , a vulnerability in applications running on .NET . Barnett said the immediate impact of exploitation is likely limited to denial of service by triggering a crash, with the potential for other types of attacks during a service reboot. It would hardly be a proper Patch Tuesday without at least one critical Microsoft Office exploit, and this month doesn’t disappoint. CVE-2026-26113 and CVE-2026-26110 are both remote code execution flaws that can be triggered just by viewing a booby-trapped message in the Preview Pane. Satnam Narang at Tenable notes that just over half (55%) of all Patch Tuesday CVEs this month are privilege escalation bugs, and of those, a half dozen were rated “exploitation more likely” — across Windows Graphics Component, Windows Accessibility Infrastructure, Windows Kernel, Windows SMB Server and Winlogon. These include: – CVE-2026-24291 : Incorrect permission assignments within the Windows Accessibility Infrastructure to reach SYSTEM (CVSS 7.8) – CVE-2026-24294 : Improper authentication in the core SMB component (CVSS 7.8) – CVE-2026-24289 : High-severity memory corruption and race condition flaw (CVSS 7.8) – CVE-2026-25187 : Winlogon process weakness discovered by Google Project Zero (CVSS 7.8). Ben McCarthy , lead cyber security engineer at Immersive , called attention to CVE-2026-21536 , a critical remote code execution bug in a component called the Microsoft Devices Pricing Program. Microsoft has already resolved the issue on their end, and fixing it requires no action on the part of Windows users. But McCarthy says it’s notable as one of the first vulnerabilities identified by an AI agent and officially recognized with a CVE attributed to the Windows operating system. It was discovered by XBOW , a fully autonomous AI penetration testing agent. XBOW has consistently ranked at or near the top of the Hacker One bug bounty leaderboard for the past year. McCarthy said CVE-2026-21536 demonstrates how AI agents can identify critical 9.8-rated vulnerabilities without access to source code. “Although Microsoft has already patched and mitigated the vulnerability, it highlights a shift toward AI-driven discovery of complex vulnerabilities at increasing speed,” McCarthy said. “This development suggests AI-assisted vulnerability research will play a growing role in the security landscape.” Microsoft earlier provided patches to address nine browser vulnerabilities, which are not included in the Patch Tuesday count above. 
In addition, Microsoft issued a crucial out-of-band (emergency) update on March 2 for Windows Server 2022 to address a certificate renewal issue with the passwordless authentication technology Windows Hello for Business. Separately, Adobe shipped updates to fix 80 vulnerabilities — some of them critical in severity — in a variety of products, including Acrobat and Adobe Commerce. Mozilla Firefox v. 148.0.2 resolves three high-severity CVEs. For a complete breakdown of all the patches Microsoft released today, check out the SANS Internet Storm Center's Patch Tuesday post. For Windows enterprise admins who wish to stay abreast of any news about problematic updates, AskWoody.com is always worth a visit. Please feel free to drop a comment below if you experience any issues applying this month's patches.


Writing an LLM from scratch, part 32e -- Interventions: the learning rate

I'm still working on improving the test loss for a from-scratch GPT-2 small base model, trained on code based on Sebastian Raschka 's book " Build a Large Language Model (from Scratch) ". In my training code, I have this code to create the optimiser: The values in there -- for the learning rate, and for the weight decay -- were just copied from the tiny training run that we do in section 5.2 of the book. What do those values actually mean, and are those really the right values for them? I felt I had a good handle on the learning rate, at least -- it's one of the first things you learn when you start looking at machine learning of any kind -- but how would you go about working out what the correct value for it was? On top of that, when I was reading the Chinchilla paper a while back, I noticed they repeatedly referred to a "cosine cycle" for the learning rate, which didn't fit into anything I'd learned about before. The weight decay was pretty much an unknown for me -- I know it is a parameter controlling the behaviour of the optimiser, but I don't know how it does that. In this post I want to look into the learning rate, and these mysterious cosines; I'll write a follow-up about the weight decay later. If you're reading this blog, you almost certainly know what the learning rate is, but let's go over it briefly to build a solid foundation. The way it's normally explained, using simple gradient descent, goes something like this. Let's assume that we're training a model with just one parameter, and it starts off set to − 5 . We run some training data through, and get a loss, let's say 44.44: We don't know what shape our loss curve is (if we did, we might be able to find the lowest loss algebraically), but we do know the differential of the parameter versus the loss at the point we've measured; it happens to be -13. That is reasonably large and negative: We use that information to say that we want to move in the direction of a larger value for our parameter -- that is, in our case where the gradient is negative, so we have a downhill slope towards the right, we want to increase the parameter to move rightwards on that chart, whereas if it were positive (an uphill slope) we'd want to decrease the parameter to move leftwards. Simply subtracting the gradient from the parameter would lead to an update in the right direction, but it would be a very large one in this case -- we'd move 13 units to the right -- so we multiply the gradient by a small positive number, the learning rate (often written as a lower-case eta, like this: η ), to move a small distance in that direction. Let's say η = 0.3 . That means we want to update our parameter: So now we run that through and get a new loss -- let's say it's 9.06 -- and a new gradient, which happens to be -5.2. Now we can do another update, and our parameter will become 0.46, so we use that and work out another loss and gradient, which come to 3.3816 and -2.08. Let's plot that one, but this time we'll draw back the veil and show the actual loss curve. Now, it's worth reiterating that while we're training this model we don't know what that curve looks like -- we're just finding points on it, along with its gradient at those points, and using that information to work out which parameter value to explore next. 
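To make that update rule concrete, here is a minimal sketch of the walk just described. The starting value, the gradients and the learning rate are the ones quoted above; the function and variable names are mine.

```python
# One-parameter gradient descent, using the numbers from the walkthrough above.
# We never see the whole loss curve during training; per step we only get the
# current loss and the gradient of the loss with respect to the parameter.

def sgd_step(param, grad, lr):
    """Plain gradient descent: move a small distance against the gradient."""
    return param - lr * grad

lr = 0.3            # the learning rate, eta
param = -5.0        # starting parameter value from the example
for grad in (-13.0, -5.2, -2.08):   # gradients quoted in the post
    param = sgd_step(param, grad, lr)
    print(param)    # -1.1, then 0.46, then ~1.08
```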
But it's pretty clear that as we continue, if the learning rate is set correctly, we'll get to the minimum eventually if the learning rate is the right kind of size, because -- due to the nice smooth U-shape of the curve, the gradient gets smaller the closer we get to the minimum 1 . It's also pretty clear that if the learning rate is smaller than an optimal value, in this simple case we will still find the right point, but it will take more steps because each one is smaller: And, of course, if the learning rate is too high, we might never converge -- we'd "bounce out of" the dip, and wind up with a parameter value that endlessly cycles between increasingly smaller and increasingly larger values, zooming off to infinity: OK, that's the basics. Why might we want to change from something that seems so logical and simple? A few paragraphs back I said: due to the nice smooth U-shape of the curve, the gradient gets smaller the closer we get to the minimum What if it doesn't? Imagine if we had something more like a V-shaped curve, like this: The gradient does not decrease as we get closer to the minimum, and so while we're in the downward-sloping part, each update is exactly the same distance: Now, eventually we'll jump over the minimum: In this example, I've used a gradient of − 8.33 on the downward-sloping part of the curve, and + 8.33 on the upward-sloping part, so that means that our next update just bounces us back to where we were before! Because the gradient isn't decreasing the closer we get to the minimum, we wind up just oscillating around it. That's not very helpful. That's a slightly contrived example (though not entirely -- intuitively, with functions like ReLU or GELU in our real LLMs, it's easy to imagine crazy loss landscapes). But it does show that perhaps we might want to add in our own "artificial" way to decrease the size of the steps we take over the course of training our model rather than just relying on the gradients naturally flattening out for us. Another way of looking at things is that as the model gets trained, we don't want batches of very new-looking data to cause big updates, taking us away from what was a good part of the loss landscape in terms of what we've seen so far. For example, imagine you've been training an LLM on a bunch of documents, which have so far been in English. Halfway through, it encounters a document in Byzantine Greek, the loss skyrockets, and you do a big update. That would be a problem! You might want it to learn a bit from it to push it slightly in a "the world is multi-lingual" direction, but you don't want it to lose a big chunk of the value from its previous training. You might also see a kind of connection to the way that people learn over the course of their lives -- for babies, everything is new and they "update their parameters" constantly as they try to understand the world. Children are still pretty flexible, but as we get older we tend to update our beliefs less and less. That's not always optimal, but as a heuristic it's pretty adaptive. Anyway, in general: for most training runs, we're going to want the learning rate to adjust over time. Most of the time this will be by reducing it, though there can be cases for increasing it again for periods. The general case of doing this is called "learning rate scheduling". There are a bunch of ways that people adjust the learning rate over the course of a train; here are a few that cropped up a lot while I was researching this. 
If we want the learning rate to go down over time, and we know how many steps we're training for, we can just set it to (say) 0.0004 for the first quarter of our train, then 0.0002 for the next, then 0.0001, then finish off with 0.00005, like this: That can work pretty well! But there is one obvious oddity -- the big step changes in learning rate mean that the exact placement of the drops and the training data before and after can matter. Why are we treating the data and the state of the model immediately before and immediately after so differently? It would make more sense to have a smoother schedule. What functions decay smoothly like that? An exponential curve does: let's say we just multiply the learning rate by a number that is a little smaller than one every step, so that it drops smoothly like this: But there are lots of other curves like that, and one is particularly interesting: As you change θ from 0 to π , the value of cos θ goes smoothly from 1 to − 1 , so it's easy enough to rescale that so that our learning rate follows the same curve: This is called a "cosine annealing" or "cosine decay" schedule, and was apparently inspired by the algorithms used for simulated annealing (an optimisation algorithm that was in turn inspired by how the atomic structures form in metals as they cool -- another one for the list of things to look into in the future...) That solves the mystery from earlier: the cosine that the Chinchilla paper was talking about was exactly this. As it turns out, the cosine decay scheduling curve is quite popular in deep learning, because it has what amounts to two well-defined phases -- an initial high learning rate where lots of exploration of the loss landscape can happen, followed by a smooth transition to something more like fine-tuning to optimise the location in whatever part of the loss landscape we've wound up in. Now, all of the above are assuming that we want the learning rate to start high and finish low, so that we can mimic the textbook gradient descent that we had at the start of this post. Intuitively that feels nice, but on further thought, the important thing is really that we have a low learning rate at the end of the train, so that we can find as close a point as possible for the minimum at the part of the loss landscape we've found ourselves in. But perhaps there's a case for having both high and low periods during the train, so that we don't get stuck in a local minimum -- something to jolt us out of where we were every now and then? 2 With a step function, that's easy: you could, for example, do this: With an exponential, you could do something like this: With cosine decay, of course, things are even easier, because the cosine function is inherently cyclical, so we can just do this: However, at least for our purposes, training an LLM using a Chinchilla-optimal number of training tokens, it makes sense to be guided by what the authors of the Chinchilla paper did. Appendix B says: We find that setting the cosine cycle length too much longer than the target number of training steps results in sub-optimally trained models, as shown in Figure A1. As a result, we assume that an optimally trained model will have the cosine cycle length correctly calibrated to the maximum number of steps, given the FLOP budget; we follow this rule in our main analysis. 
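To make that rescaling concrete, here is a minimal sketch of a single-cycle cosine decay schedule. The function name, parameters and example numbers are mine, not the post's.

```python
import math

def cosine_decay_lr(step, total_steps, peak_lr, min_lr):
    """Cosine decay from peak_lr down to min_lr over total_steps.
    cos(theta) runs smoothly from 1 to -1 as theta goes from 0 to pi,
    so we rescale that range onto [min_lr, peak_lr]."""
    progress = step / total_steps             # 0 at the start, 1 at the end
    cosine = math.cos(math.pi * progress)     # goes from 1 down to -1
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + cosine)

# Illustrative numbers only: a 10,000-step decay from 1e-3 down to 1e-4.
for step in (0, 2_500, 5_000, 7_500, 10_000):
    print(step, round(cosine_decay_lr(step, 10_000, 1e-3, 1e-4), 6))
```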
So, at this point, I think we have one important part of the intervention we want to make: we want to use a cosine learning rate scheduler, going from high near the start of the training run, down to low at the end over one cycle. Additionally, and also from appendix B in the paper: we use a 10x learning rate decay in line with Rae et al. (2021) ...which means that if our learning rate starts at η , then we want it to decay down to η / 10 by the end. So, we just need to work out an initial value for η , and let it rip, right? Well, not so fast... When our model is uninitialised, right at the start of the train, gradients are going to be pretty wild. It's going to be making random errors all of the time, and we'll be making huge jumps across the loss landscape. That sounds bad. Additionally those kind of wild jumps can get the optimiser into a -- well, sub-optimal -- state. I haven't read enough about optimisers yet to have a solid handle on that, but that can wait -- intuitively it makes some kind of sense that erratic gradient updates might confuse it. So, it makes a certain amount of sense to start off with a low learning rate so that we don't do that, and then to increase it gradually to the peak, and only then to schedule the gradual cosine decay. According to this (rather nice looking) masterclass on LLM training , it's typical to do this over "a few thousand steps or a small percentage (e.g., 1-10%) of the total training steps, depending on the dataset size and batch size", and we would just use a linear increase over that period: I think we should do that; a simple linear warmup at the start -- let's relatively arbitrarily say 5% of our training steps going up to our desired peak learning rate. So our learning rate schedule should look something like this: So far I've written a lot about how we vary the learning rate over time, and that's all been very useful. But we still need to know what the value should be initially! In smaller-scale experiments you might just try a bunch of different numbers to see what worked well, but at more than US$30 per train, that's not practical here. Unfortunately it's really quite hard to find good suggestions published anywhere. The GPT-2 paper is (as usual) reticent: The learning rate of each model was manually tuned for the best perplexity on a 5% held-out sample of WebText ...and if you search for "learning rate training llm", you'll see lots of results for when people are fine-tuning existing LLMs ( 2 × 10 − 4 comes up a lot), but almost nothing about when you're training one from scratch. I eventually came across this (long!) post from Hugging Face , which I definitely need to spend time going through in the future, because it covers a lot of the ground I've been going over in this post series. But for this post, I think the most relevant part is in the section " Scaling Laws for Hyperparameters ", where they include a figure from this DeepSeek paper . Here it is, with some of the (also relevant) surrounding text: In our trains we're using something like 5 × 10 18 total FLOPs. Now, they are specifically charting things in terms of non-embedding FLOPs, but I'm going to play a little fast and loose here and ignore that, so reading off their chart, that looks like we should be using about 1.4 × 10 − 3 as our learning rate. We can double-check that against their formula, where C is the compute budget: Nice, a close match! 
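For what it's worth, here is a rough version of that double-check. The power-law coefficients below are my reading of the DeepSeek LLM hyperparameter scaling fit, not something quoted in the post, so treat them as an assumption; plugging in the stated FLOP budget does land close to the number read off the chart.

```python
# Rough double-check of the scaling-law learning rate for our compute budget.
# Coefficients are an assumption (my reading of the DeepSeek LLM fit,
# eta_opt ~= 0.3118 * C**-0.125), so verify against the paper before relying on them.

C = 5e18                          # approximate total training FLOPs
eta_opt = 0.3118 * C ** -0.125
print(f"{eta_opt:.2e}")           # ~1.4e-03, matching the chart read-off
```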
However, it's definitely worth noting that we're using a simple GPT-2 architecture, and they are using something quite different -- RMSNorm instead of LayerNorm, SwiGLU as the activation function on the feed-forward networks, Rotary Position Embedding rather than the fixed ones we're using, and so on. As a sanity check: you can see that they also give a formula for the optimal batch size in terms of tokens. For our FLOP budget, that comes in at 381,782, which is about 373 of our 1,024-token sequences. That is quite a lot higher than the 97-or-so sequences that appeared to be optimal in our earlier experiments. That is a little concerning, though of course the 97 number came out of a very ad-hoc bit of curve-fitting. For now, I'm going to hope that that doesn't matter too much for the learning rate. This may come back to bite me; if the results of a train with 1.4 × 10⁻³ are radically worse than the existing rate of 4 × 10⁻⁴, I'll have to do a bit more investigation. So, now I think we have all of the theoretical pieces in place to do a train. Let's move on to the practicalities. We started by looking at this: What should we change -- disregarding the weight decay until the next post? Based on the above, we want to do a linear warmup of about 5% of our steps, going up to a learning rate of 1.4 × 10⁻³, followed by a cosine decay down to one tenth of that, 1.4 × 10⁻⁴. What does that look like in code? The relevant API for scheduling the learning rate in PyTorch is, logically enough, in the torch.optim.lr_scheduler module, and there are a bunch of different scheduling classes. You create your optimiser, then create a scheduler for the shape you want, and then you can call step() on the scheduler (after the step() on the optimiser) to adjust the optimiser's learning rate over time. Let's make that more concrete; one of the schedulers is LinearLR, which is what we'll need for our linear warmup period. It takes as its parameters:
- optimizer, which is the optimiser we're applying it to.
- start_factor, which the optimiser's learning rate is multiplied by to work out where we want to start up.
- end_factor, which is likewise applied to the optimiser's learning rate to work out the value we're heading for.
- total_iters, which is the number of steps over which it should go from the initial learning rate to the final one.
- last_epoch, which lets the scheduler know how many steps into its schedule it currently is -- this defaults to -1, meaning it hasn't started yet. This can be useful if you're resuming from a checkpoint, but for our purposes we can ignore it.
Let's say that we want to go from almost-zero to our optimiser's learning rate over 1,600 steps -- we'd create our scheduler like this: ...then in our training loop, after we've done the scaled step of the optimiser, we'd also step the scheduler: This confused me a little bit the first time I saw it; after all, if the scheduler hasn't been "triggered" when we step the optimiser, how does the optimiser know what learning rate to use? Surely it would just use whatever it was initialised with? The answer is that when you create the optimiser, it stores away the learning rate that you give it in two places -- an "initial learning rate" and a "current learning rate". Next, when you create your scheduler, it uses the initial learning rate to work out the start and end values, and then sets the current one to the start value immediately. Just by creating a scheduler, you're changing the optimiser's current learning rate -- but not the initial one, which is important, as we'll see in a moment. So, we have a scheduler that handles our warmup period nicely. Another scheduler that's relevant to our interests is the CosineAnnealingLR. This takes:
- optimizer, which is the same as LinearLR's.
- T_max, which is the number of steps before it reaches its minimum.
- eta_min, the minimum learning rate we want to get to.
- last_epoch, again the same as LinearLR's.
On creation, this scheduler will read in the optimiser's initial learning rate -- note, not the current one -- and then the first time it's stepped, it will set the current learning rate to that value, and then for steps after that it will reduce it so that it follows a nice cosine decay, reaching eta_min after T_max steps. So those two cover the two regimes that we want -- the warmup and then the cosine decay. But now we need to put them together; we want to do one and then the other.
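Since the original snippets aren't reproduced above, here is a minimal sketch of what the warmup half might look like in isolation; the toy model, the dummy loop and the start_factor value are mine, and this is not the post's exact code.

```python
# A minimal sketch (not the post's exact code) of the warmup scheduler described
# above. LinearLR scales the optimiser's learning rate from a tiny fraction up
# to its full value over total_iters steps.
import torch

model = torch.nn.Linear(10, 10)                      # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1.4e-3)

warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-4, end_factor=1.0, total_iters=1_600)

# The decay half would be the other scheduler described above, e.g.:
#   decay = torch.optim.lr_scheduler.CosineAnnealingLR(
#       optimizer, T_max=32_000, eta_min=1.4e-4)
# Chaining the two together is covered in the next paragraph.

for step in range(3):                                # stand-in training loop
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).sum()           # dummy forward pass / "loss"
    loss.backward()
    optimizer.step()                                 # optimiser step first...
    warmup.step()                                    # ...then the scheduler step
    print(step, optimizer.param_groups[0]["lr"])     # watch the LR ramp up
```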
There's a very useful class, , which allows you to chain schedulers and tell it when each one takes over from the previous one. Let's sketch out some code to use that to do a train with our new peak learning rate of 1.4 × 10 − 3 , a warmup of 1,600 steps, followed by a cosine decay for the next 32,000 steps to one tenth of the peak learning rate: That actually works quite nicely! I wrote a dummy training loop to plot the current learning rate over a fake train using code like the above , and got this: ...with the output confirming that the values were good at the "milestone" point, the start and the end: I was initially a bit surprised by that, as at the time I ran it, I didn't realise that there was that split between the initial and the current learning rates on the optimiser, so I thought that the cosine scheduler would pick up whatever tiny starting value the warmup scheduler had overwritten the optimiser's learning rate with -- but that split saves the day. That means that now we have the outline of how to schedule our learning rate. But before we can put that into the code, we need to think about how it affects our checkpoints. Just like the scheduler and the optimiser, the learning rate scheduler -- or, indeed, our two schedulers here -- contain information about the state of the train. That means that if we recover from a checkpoint, we need to provide them with the information they need. If we just created them afresh, they'd start from the beginning -- for example, if we restarted from step 20,000 in a train like the one above, we'd start a new warmup from pretty much zero, and then start a fresh cosine decay. That would be bad: (Dummy test code here .) Now, we could use the parameter to initialize them with the correct current global step. But they have a state dict, like most other PyTorch objects, so the simplest thing to do is just to write that to another checkpoint file: ...and then load it likewise: (Dummy test code here .) Conveniently, if you save the state dict of a , it will also include the state of all of its component schedulers, and likewise if you reload it, it will load the components' states back in too. The one thing you have to be careful about is what they warn about in the PyTorch docs: Initializing a scheduler overwrites its optimizer’s s. When restoring a checkpoint, initialize the scheduler before calling your optimizer's to avoid overwriting the loaded learning rates. Luckily enough, in our code as it stands, we create all of the things that are checkpointed -- the optimiser and the scaler so far, but shortly the scheduler as well -- before we load in the state dicts, so that drops out quite nicely. So, we have some sketched-out code -- it's time to put it in place for the real training run. I won't go through the details of the changes to my existing DDP training code, though you can see the diff here if you're interested. Much of the complexity was due to keeping backward compatibility so that we don't have to always use a learning rate scheduler; remember that in this mini-series, I'm trying making various changes ("interventions") to the training loop in isolation, seeing whether each one improves things. So it's important to be able to easily train with or without learning rate scheduling; I did that with a flag in the Implementation-wise, initially I was thinking that it would be easiest to always have a scheduler, and in the "non-scheduled" case to just set it to a linear one that didn't change the value over the course of the train. 
But in the end it turned out to be easier to use as being the switch to tell the training loop which "mode" it was in. The placement of the code to create the schedulers was also a little tricky; the "natural" place was just after the optimiser is created, like it is in the example code above. However, at that point, we don't know how many global steps we're going to have in the train, because we don't have the dataset -- which means that working out the numbers to pass in to the schedulers for the warmup and decay steps would be impossible. It turned out to be easiest to put it in the function , just after the datasets are loaded, as at that point we have all of the information we need. Anyway, that's the code done, so let's see what happens! I wanted to do two trains; one with the learning rate scheduling, and one with just the new value for the learning rate, instead of . I was expecting the updated learning rate alone to be too high and to cause a very choppy train, but had high hopes for the train with the scheduling. Here's how it did; the scheduled learning rate train first: Here's what the training loss looked like over that: Quite a few loss spikes early on in the train when the learning rate is at its peak, but nothing unmanageable -- and, as you'd expect, things calmed down quite a lot later on. I also charted the learning rate, to make sure it really was doing what I thought it was doing: So, a pretty smooth train, and we definitely did the right learning rate scheduling. Time to upload it to Hugging Face , and see what the evals look like. Firstly, the smoke test: Reasonably coherent, at least, though it's not super-impressive. On to the loss on our test set: That's our best loss so far! Let's put it into the table: So, it definitely looked like it was worth it. But was it the scheduling of the learning rate that helped, or just the change from 0.0004 to 0.0014? I kicked off a second run with no scheduling, just a learning rate of 0.0014, to see what would happen. After about an hour, I noticed that the loss chart had stopped updating. The last point had a maximum and minimum loss but no average -- but after that, nothing: However, the learning rate was still being charted, so the train was definitely running: Looking at the checkpoint metadata showed what had happened. At global step 1851, we had this 3 : ...and at the next checkpoint at step 2468, we had this: ...and the same for all checkpoints thereafter. Clearly the parameters had gone off the rails -- exactly what we'd expect with an excessive learning rate: There was no point in continuing the train, as it was pretty much certainly unrecoverable, so I stopped it. Out of interest, I downloaded the model, but I couldn't even run the smoke test on it: So it was pretty clear that just updating the learning rate to 0.0014 was actively harmful. No need to upload that one to HF! And time to wrap up this experiment. While this has been quite a long post, I've really only scratched the surface of how learning rates are set. If I were doing things in more detail, the best would probably be to do a "sweep" over multiple values to try to at least approximate the best possible rate for this model. That would be pretty expensive for me, though, so I decided to stick with the DeepSeek number. It might not be ideal for the specific architecture that I'm using, given how different that is to theirs, but given the results, it's a decent one compared to what I was using. 
Something that I found interesting is that exactly how to schedule your learning rate is still an area being actively researched. Even in my relatively minimal research, I came across three alternatives to the mainstream warmup-cosine decay pattern:
- Per the Hugging Face post, some people do a warmup, then pause at a set level for a while, then start the cosine decay (warmup-stable-decay).
- DeepSeek use a relatively simple stepped function after a warmup. 5
- A 2025 paper, "Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs", says that a linear decay (after a warmup) outperforms cosine.
I'm sure there are many more. But for this train, I decided to stick to the mainstream, and the results were pretty good! To reiterate, this has been the most positive intervention so far: So I'll stick with that, and move on to the next thing: what is the weight_decay parameter that we're passing in to the AdamW optimiser? Tune in next time :-)

1. Yes, I am foreshadowing here. ↩
2. To make my earlier analogy about learning rate decaying over time in people as they age even more dubious, we can imagine this as being rather like someone middle-aged going on an ayahuasca retreat ;-) ↩
3. If you're wondering how we had a valid maximum and minimum in that first checkpoint when the average was NaN, here's why: ↩
4. You might wonder how large labs work out the right learning rate given their training runs run to millions of dollars. The answer is there in that DeepSeek paper, as that's one of the things they were doing. They scaled their model down from the billions of parameters that they wanted to train to various smaller models, and worked out the optimal learning rate for each of the smaller models by doing full trains on them. Once they had a mapping from model size to the ideal learning rate for their architecture, they could extrapolate that to the large ones that they wanted to train. The problem is that those "smaller" models are actually quite a lot larger than the one we're training here! And while we could potentially scale it down even further, I suspect that such truly tiny models (say, 1M parameters) wouldn't train well enough to give any meaningful results. ↩
5. From the paper: "Specifically, the learning rate of the model reaches its maximum value after 2000 warmup steps, and then decreases to 31.6% of the maximum value after processing 80% of the training tokens. It further reduces to 10% of the maximum value after 90% of the tokens." ↩


Radshield: Software Radiation Protection for Commodity Hardware in Space

Radshield: Software Radiation Protection for Commodity Hardware in Space. Haoda Wang, Steven Myint, Vandi Verma, Yonatan Winetraub, Junfeng Yang, and Asaf Cidon. ASPLOS'25. If you read no further, here are two interesting factoids about outer space from this paper: Launch costs have fallen 60x, with the current cost to launch 1kg to space clocking in at $1,400 (see Fig. 1 below). Many satellites orbiting the Earth and devices sent to Mars use Snapdragon CPUs! I assumed that all chips leaving planet Earth would be specialized for space; apparently not. Source: https://dl.acm.org/doi/10.1145/3760250.3762218 This paper describes software solutions to deal with two common problems that occur in outer space: Single-Event Latchups and Single-Event Upsets, both of which are caused by radiation interfering with the normal operation of a circuit. A single-event latchup (SEL) causes one portion of the chip to heat up. If left unmitigated, this can damage the chip. The solution to this is to detect the problem and reboot. The trick is in the detection. The classic detection method monitors chip current draw. However, this technique fails with a modern off-the-shelf CPU which is designed to have a wide variability in current draw. When compute load increases, clock frequencies and voltages change, cores come out of sleep states, and power consumption naturally increases. The point of this design is to save power during idle periods, which is especially important for satellites which must get their power from the sun. The solution proposed by this paper is called ILD. The idea is to predict the expected current draw based on a simple model that uses CPU performance counters (e.g., cache hit rate, instruction execution rate) as input. If the measured current draw is much larger than predicted, then the system is rebooted. The model is not perfect, and the authors noticed that this scheme only works well when the CPU load is not too high. This “predict, check, reboot if necessary” cycle only occurs during relatively calm periods of time. The system is modified to force 3-second idle periods every 3 minutes to ensure that reliable measurements can be taken. An SEL takes about 5 minutes to damage the chip; the 3-minute period is chosen to be below that threshold. A single-event upset causes the value of a bit to flip (in memory, cache, the register file, etc). There are two common solutions to SEUs: use ECC on stored data, or perform computations with triple modular redundancy (3-MR), which requires computing each result 3 times and choosing the most popular result if there is disagreement about the correct result. This paper deals with mitigating SEUs that affect user “space” code. The authors define the term reliability frontier to represent the interface between hardware components that support ECC and those that do not. For example, if flash storage has ECC but DRAM does not, then flash is considered part of the reliability frontier. A typical smartphone CPU (or rather, advanced satellite chip) has multiple CPU cores. One way to alleviate the compute cost of 3-MR is to compute all 3 results on 3 separate cores in parallel. A problem with this approach is that the CPU cores may share unreliable hardware. For example, the last level cache could be shared by all cores but not support ECC. If a bit flips in the LLC, then all cores will see the corrupted value, and parallel 3-MR will not detect a problem. The paper proposes an algorithm called EMR.
The idea is to break a computation into multiple tasks and associate metadata with each task that describes the subset of input data accessed by the task. Fig. 6 shows a motivating example. The task of analyzing an image may be decomposed into many tasks, where each task processes a subset of the input image. Source: https://dl.acm.org/doi/10.1145/3760250.3762218 In EMR, there is an API to explicitly create tasks and specify the set of input data that each task reads from. EMR then runs tasks in multiple epochs. Within an epoch, no two tasks read the same input data. EMR invalidates caches up to the reliability frontier between epochs. If there are many tasks, and few epochs, then this system works great (i.e., it has high CPU utilization and does not spend too much time invalidating caches). Table 2 compares ILD performance in detecting SELs against a random forest model and a model that simply compares current draw against a fixed value: Source: https://dl.acm.org/doi/10.1145/3760250.3762218 Fig. 11 shows the performance impact of EMR. Each result is normalized against a parallel version of 3-MR which ignores the problems associated with shared hardware. The red bars represent 3-MR run on a single core; the blue bars represent EMR. Source: https://dl.acm.org/doi/10.1145/3760250.3762218 Dangling Pointers: EMR would benefit from a system that detects when a programmer misspecifies the set of inputs that will be read. Maybe hardware or software support could be added to detect this kind of bug.
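As a rough illustration of the EMR-style scheduling described above, here is a minimal sketch of greedily packing tasks into epochs so that no two tasks in the same epoch read the same input data. The Task class, field names and tile example are invented for illustration and are not the paper's actual API; between epochs, the real system would also invalidate caches up to the reliability frontier.

```python
# Sketch of EMR-style epoch planning: tasks carry metadata naming the input
# blocks they read, and tasks that share an input block must land in
# different epochs (illustrative names, not the paper's API).
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    inputs: frozenset   # identifiers of the input blocks this task reads

def plan_epochs(tasks):
    epochs = []         # each epoch is a list of tasks with pairwise-disjoint inputs
    for task in tasks:
        for epoch in epochs:
            if all(task.inputs.isdisjoint(other.inputs) for other in epoch):
                epoch.append(task)
                break
        else:
            epochs.append([task])   # no existing epoch fits; start a new one
    return epochs

# e.g. image analysis split into per-tile tasks, plus redundant copies of each:
tiles = [Task(f"tile-{i}", frozenset({i})) for i in range(8)]
copies = [Task(f"tile-{i}-copy", frozenset({i})) for i in range(8)]
for n, epoch in enumerate(plan_epochs(tiles + copies)):
    print(n, [t.name for t in epoch])   # copies end up in a later epoch than originals
```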

iDiallo Today

I'm Not Lying, I'm Hallucinating

Andrej Karpathy has a gift for coining terms that quickly go mainstream. When I heard "vibe coding," it just made sense. It perfectly captured the experience of programming without really engaging with the code. You just vibe until the application does what you want. Then there's "hallucination." He didn't exactly invent it. The term has existed since the 1970s. In one early instance, it was used to describe a text summarization program's failure to accurately summarize its source material. But Karpathy's revival of the term brought it back into the mainstream, and subtly shifted its meaning, from "prediction error" to something closer to a dream or a vision. Now, large language models don't throw errors. They hallucinate. When they invent facts or bend the truth, they're not lying. They're hallucinating. And with every new model that comes out and promises to stay clean, it still hallucinates. An LLM can do no wrong when all its failures are framed as a neurological disorder. For my part, I hope there's a real effort to teach these models to simply say "I don't know." But in the meantime, I'll adopt the term for myself. If you ever suspect I'm lying, or catch me red-handed, just know that it's not my fault. I'm just hallucinating.


Note #727

just posted about dot files and attractors over at 2389.ai 2389.ai/posts/the… Thank you for using RSS. I appreciate you. Email me


The Beginning Of History

Hi! If you like this piece and want to support my work, please subscribe to my premium newsletter. It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 5000 to 185,000 words, including vast, extremely detailed analyses of NVIDIA, Anthropic and OpenAI’s finances, and the AI bubble writ large.  I just put out a massive Hater’s Guide To Private Equity and one about both Oracle and Microsoft in the last month. I am regularly several steps ahead in my coverage, and you get an absolute ton of value, several books’ worth of content a year in fact! In the bottom right hand corner of your screen you’ll see a red circle — click that and select either monthly or annual.  Next year I expect to expand to other areas too. It’ll be great. You’re gonna love it.  Before we go any further: no, this is not going to turn into a geopolitics blog. That being said, it’s important to understand the effect of the war in Iran on everything I’ve been discussing. So, let’s start simple. Open Google Maps. Scroll to the Middle East. Look at the bit of water separating the Gulf Arab countries from Iran. That’s the Persian Gulf.  Scroll down a bit. Do you see the narrow channel between the United Arab Emirates and Iran? That’s the Strait of Hormuz. At its narrowest point, it measures 24 miles across. Around 20% of the world’s oil and a similar percentage of the world’s liquified natural gas (LNG) flows through it each year.  Yes, that natural gas, the natural gas being used to power data centers like OpenAI and Oracle’s “Stargate” Abilene (which I’ll get to in a bit) and Musk’s Colossus data center. But really, size is misleading. Oil and gas tankers are massive, and they’re full to the brim with incredibly toxic material. Spills are, obviously, bad. Also, because of their size, these tankers need to stick to where the water is a specific depth, lest they find themselves stuck.  As a result, there are two lanes that tankers use when navigating through the Strait of Hormuz — one going in, one going out. This is a sensible idea meant to reduce the risk of collisions, but it also means that the potential chokepoint is even smaller.   Anyway, at the end of last month, Iran’s Revolutionary Guard Corps unilaterally closed off the strait, warning merchant shipping that any attempt to travel through the strait was “not allowed.” This closure, for what it’s worth, is not legally binding. Iran can’t unilaterally close a stretch of international waters. And yes, while some of those shipping lanes cross through Iran’s territorial waters (and Oman’s, for that matter), they’re still governed by the UN Convention on the Law of the Sea (UNCLOS), which gives ships the right to cross through narrow geographical chokepoints where part of the waters belong to another state, and which says that nations “shall not hamper transit passage.” That requirement, I add, cannot be suspended.  Still, merchant captains don’t want to risk getting themselves and their crews blown up, or arrested and thrown in Evin Prison. Insurers don’t want to pay for any ship that gets blown up, or indeed, for the ensuing environmental catastrophe. And the UAE doesn’t want its pristine beaches covered in crude oil.  And so, the tankers are staying put. And they’ll stay there until one of four things happens: Of the first three, none feels particularly likely, at least in the short-to-medium term. Maybe I’m wrong. 
Maybe everything reverses and everyone suddenly works it out — Trump realizes that he’s touching the stove and pulls out after claiming a “successful operation.” The world is chaotic and predicting it is difficult. Nevertheless, before that happens, closing the Strait of Hormuz means that Iran can inflict pain on American consumers at the pump, and we’ve already seen a 30% overnight spike in oil prices, with the price of a barrel jumping over $100 for the first time since 2022 (though as of writing this sentence it’s around $95). With midterms on the horizon, Iran hopes that it can translate this consumer pain to political pain for Donald Trump at the ballot box.

This is all especially nasty when you consider that the price of oil is directly tied to inflation. It influences shipping costs, and a lot of medicines, construction materials, and consumer goods have petrochemical inputs. In very simple terms, if oil is used to make your stuff (or get it to you), that stuff goes up in price. While this obviously hurts countries with which Iran has previously had cordial relations (particularly Qatar, which is a major exporter of LNG), I genuinely don’t think it cares anymore. I mean, Iran has launched drones and missiles at targets located within Qatar’s territory, resulting in (at the latest count) 16 civilian injuries. Qatar shot down a couple of Iranian jets last week. I’m not sure what pressure any of the Gulf countries could exert on Iran to make it back down.

I don’t see the security situation improving, either. Iran’s Shahed drones are cheap and fairly easy to manufacture, and were developed under some of the most punishing sanctions, when the country was cut off from the global supply chain. It then licensed the design to Russia, another heavily-sanctioned country, which has employed them to devastating effect in Ukraine. Iran can produce these in bulk, and then — for a fraction of the cost of an American Tomahawk missile — send them out as a swarm to hit passing ships. Even without the ability to produce new ones, Iran is believed to have possessed a pre-war stockpile of tens of thousands of Shahed drones.

Shaheds aren’t complicated, or expensive, or flashy, or even remotely sophisticated, and that’s what makes them such a threat. It took Ukraine a long time to effectively figure out how to counter them, and it’s done so by using a whole bunch of different tactics — from land-based defenses like the German-made Gepard anti-aircraft gun, to interceptor drones, to repurposed 1960s agricultural planes, to (quite literally) people shooting them down with assault rifles from the passenger seat of a propeller-powered plane. Ukraine has experience in combating these drones, and even still, some manage to slip through its defences, often hitting civilian infrastructure. Airstrikes can probably reduce the threat to shipping (though not without exacting an inevitable and horrible civilian cost), but they can’t eliminate it.

Hell, even the Houthis — despite only controlling a small portion of Yemen, and despite efforts by a coalition of nations to degrade their offensive capabilities — still pose a risk to maritime traffic heading towards the Suez Canal. Given the cargo these ships carry, any risk is probably too much risk for the insurers, for the carriers, and for the neighbouring countries. While I could imagine the US, at some point, saying “great news!
It’s fine to go through the Strait of Hormuz now,” and though it has started offering US government-backed reinsurance for vessels, I don’t know if any shippers will actually believe it or take advantage of it.

And so, we get to the last point on my list. Regime change. Do I believe that the Iranian government is deeply unpopular with its own people? Yes. Do I believe that said government can be overthrown by airstrikes alone? No. Do I believe that Iran’s government will do anything within its power to remain in control, even if that means slaughtering tens of thousands of its own people? Yes. Even if there was an uprising, who would lead it? Iran’s virtually cut off from the Internet, and movement within the country is restricted, making it hard for any opposition figures to organize. The two most high-profile outside opposition figures — Reza Pahlavi, the son of the former Shah, and Maryam Rajavi, leader of the MEK and NCRI — both have their own baggage, and they’re living in the US and France respectively.

As I said previously, this isn’t me wading into geopolitics, but more of a statement that there’s no way of knowing when things will eventually return to normal. This conflict might wrap up in a couple of weeks, or it might be months, or even longer than that. All this amounts to a huge amount of global oil production being bottled up, made worse by the slight problem that Iran produces a lot of oil itself, sending most of it (over 80%) to China. With Iran unable to export crude, and its production facilities now under attack, China’s going to have to look elsewhere. Which will result in even higher oil prices. Which, in turn, will make everything else more expensive. That is what brings us back to the AI bubble.

Now, given that most of the high-profile data center projects you’ve heard about are based in the US, which is (as mentioned) largely self-sufficient when it comes to hydrocarbons, you’d assume that it would be business as usual. And you would be wrong. You see, this is a global market. Prices can (and will!) go up in the US, even if the US doesn’t import oil or natural gas from abroad, because that’s just how this shit works. Sure, there are variations in cost where geography or politics play a role, but everyone will be on the same price trajectory. While we won’t see the same kind of shortages that we witnessed during the last oil shock (the one which ended up taking down the Carter presidency), it will still hurt. While the US managed to decouple itself from oil imports, it hasn’t (and probably can’t) decouple itself from global pricing dynamics.

The US has faced a few major oil shocks — the first in 1973, after OPEC issued an embargo against the US following the Yom Kippur War, which ended the following year after Saudi Arabia broke ranks, and the second in 1979, following the Iranian Revolution — and both hurt…a lot. This won’t be much different. First, inflation. As the cost of living spikes, people will start demanding higher wages, which will, in turn, be passed on through higher prices. At least, that’s what would normally happen. Paul Krugman, the Nobel-winning economist, wrote in his latest substack that US workers in the 1970s were often unionized, and they benefited from contractual cost-of-living increases. Sadly, we live in 2026.
Union membership hasn’t recovered from the dismal Reagan years, and with layoffs and offshoring, combined with an already tough jobs market, workers have little leverage to demand raises. We’re in an economy oriented around do-nothing bosses that loathe their workers, one where workers will get squeezed even further by the consequences of any economic panic, even if it’s one caused by multiple events completely out of their control. So, it’s unlikely that we’ll see a wage-based amplification of any inflation that comes from the current situation.

That said, depending on how bad things get, we will see inflation spike, and increases in inflation are usually met with changes in monetary policy, with central banks raising the cost of borrowing in an attempt to “cool” the economy (i.e., reduce consumer spending so that companies are forced to bring down prices). And we’d just started to bring down interest rates, with the Fed announcing in December that it projected rates of 3.4% by the end of 2026. Iran changes that in the most obvious way possible — if prices soar, interest rates may follow, and if rates go up, even by a percentage point or two, financing the tens and hundreds of billions of dollars in borrowing that the AI bubble demands will become significantly more expensive. For some context, the International Monetary Fund’s Kristalina Georgieva recently said “...a 10% increase in energy prices that persists for a year would push up global inflation by 40 basis points and slow global economic growth by 0.1-0.2%,” per The Guardian, who also added…

And remember: the AI bubble, along with the massive private equity and credit funds backing it, is fueled almost entirely by debt. All this chaos and potential for jumps in inflation will also affect the affordability calculations that lenders will make before loaning the likes of Oracle and Meta the money they need, at a time when lenders are already turning their noses up at Blue Owl-backed data center debt deals. The alternative is, of course, not raising interest rates — which, if the Fed loses its independence, is a possibility — which would be equally catastrophic, as we saw in the case of Turkey, whose president, Recep Tayyip Erdogan, has a somewhat… ahem… “unorthodox” approach to monetary policy. Erdogan believes that high interest rates cause inflation — a theory which he tested to the detriment of his own people. In simpler terms, Turkey has faced some of the worst hyperinflation in the developed world, and has a currency that lost nearly 90% of its value in five years.

It’s not just the data centers, either. As interest rates go up, VC funds tend to shrink, because the investors that back said funds can get better returns elsewhere, and with much less risk. As I discussed in the Hater’s Guide to Private Equity, 14% of large banks’ total loan commitments go to private equity, private credit and other non-banking institutions, at a time when (to quote Forbes) PE firms are taking an average of 23 months to raise funds (up from 16 months in 2021), after private credit’s corporate borrowers’ default rates (as in the loans written off as unpaid by the borrower) hit 9.2% in 2025. Put really simply, private equity, private credit, venture capital and basically everything to do with technology currently depends on the near-perpetual availability of debt. The growth of private credit is so recent that we truly don’t know what happens if the debt spigot gets turned off, but I do not think it will be pretty.
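To get a rough sense of why even “a point or two” matters at these scales, here’s a back-of-envelope sketch (the principal amounts are illustrative round numbers of my own, not figures from any filing):

```python
# Back-of-envelope: extra annual interest from a rate increase on large debt loads.
# The principal figures below are illustrative, not taken from any company's filings.

def extra_annual_interest(principal: float, rate_increase: float) -> float:
    """Additional interest owed per year if the borrowing rate rises by rate_increase."""
    return principal * rate_increase

for principal in (50e9, 100e9, 300e9):      # $50B, $100B, $300B of debt
    for bump in (0.01, 0.02):               # +1 and +2 percentage points
        extra = extra_annual_interest(principal, bump)
        print(f"${principal / 1e9:.0f}B of debt, +{bump * 100:.0f}pt: "
              f"${extra / 1e9:.1f}B more interest per year")
```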
Things get a little worse when you remember that famed business dipshits SoftBank are currently trying to raise a $40 billion loan to fund the three $10 billion Klarna-esque payments that make up its $30 billion investment in OpenAI’s not-actually-$110-billion-yet funding round. How SoftBank — a company that raised a $15 billion bridge loan due to be paid off in around four months, and that has about $41.5 billion in existing debt maturing in the next nine months or so that needs to be refinanced, per JustDario — intends to take on another $40 billion is beyond me. And that’s a sentence I would’ve written before the war in Iran began.

There’s also evidence that links lower IPO numbers to rising inflation rates, which means that achieving the exit that investors want will become so much harder — and so, they might as well not bother. Need proof? SoftBank-owned mobile payments company PayPay delayed its IPO last week, and I quote Reuters, because “...markets were rattled by [the attack] on Iran, according to two people familiar with the matter.” Inflation also negatively affects company valuations — which, again, will influence whether investors open their purse strings.

This is all a long-winded way of saying that the AI industry is about to enter a world of hurt. Every AI startup is unprofitable, which means they need to raise money from venture capitalists, who raise money from investors that aren’t paying them (pension funds and insurers) and private equity and credit firms that raise money from banks, both of which will struggle should central bank rates spike. The infrastructural layer — AI data centers — also requires endless debt (due to the massive upfront costs for NVIDIA chips and construction), and that debt was already becoming difficult to raise.

Then there are the practical opex and capex costs. Higher interest rates mean that any contractors building the facilities will insist on higher fees, because their costs — labor costs, the price of filling up a van or a truck with gas, or paying for building materials — have gone up. And they’ll probably pad the increase a bit to account for any future rises in inflation. Those gas turbines you’re running to power your facility? Yeah, feeding those is going to get much more expensive. Natural gas is up as much as 50%, and a lot of US capacity is going to serve markets in Asia and Europe to take advantage of the spike in prices, which will mean an increase in prices for US consumers.

In fact, you don’t even need interest rates to spike for things to get nasty. As the price of oil continues to skyrocket, flying a Boeing 747 filled with GB200 racks from Taiwan to Texas, or mobilizing the thousands of people that work (to quote Bloomberg) day and night to build Stargate Abilene, will become significantly more expensive. And even in the very, very unlikely event that things somehow quickly return to whatever level of “normal” you’d call the world before the conflict started, even brief shocks to the financial plumbing are enough to destabilize an already-fractured hype cycle.

Last week, Bloomberg reported something I’d already confirmed three weeks ago — that OpenAI was no longer part of the planned expansion (past the initial two of eight buildings) of Stargate Abilene, a project that’s already massively delayed from its supposed “full energization” by mid-2026.
Oracle disputes the report (and if it’s wrong, I imagine investors will rightly sue), claiming that Crusoe [the developer] and Oracle are “operating in lockstep,” which doesn’t make sense considering the delays or, well, reality. My sources in Abilene also tell me that the expansion fell apart due to Oracle’s dissatisfaction with the revenue it was making on buildings one and two, and that a bidding war was taking place between Meta and Google for the future capacity. Bloomberg’s Ed Ludlow also reports that NVIDIA put down a $150 million deposit as Crusoe attempts to lock down Meta as a tenant — a very strange thing to do considering Meta is flush with cash, suggesting a desperation in the hearts of everybody involved. It’s also very, very strange to have a supplier get involved in a discussion between a vendor and a customer, almost as if there’s some sort of circular financing going on. As I reported back in October, Stargate currently only has around 200MW of power, and The Information reports that further power won’t be available for a year or more, something I also said in October.

As self-serving as it sounds, I really do recommend you read my premium piece about the AI Bubble’s Impossible Promises, because I laid out there how stupid and impossible gigawatt data centers were before the war in Iran. We’ve already got a shortage of the electrical-grade steel and transformers required to expand America’s (and the world’s) power grid, we’ve already got a shortage of the skilled labor required to build that power (and data centers in general), and we’re moving massive amounts of heavy shit around a large patch of land using thousands of people, which will cost a lot of gas.

I don’t know why, but the media and the markets seem incapable of imagining a world where none of this stuff happens, clinging to previous epochs where “things worked out” and where “things were okay” without a second thought. In The Black Swan, Nassim Taleb makes the point that “…the process of having [journalists] report in lockstep [causes] the dimensionality of the opinion set to shrink considerably,” saying that they tend to “[converge] on opinions and [use] the same items as causes.” In simpler terms, everybody reporting the same thing in the same way naturally makes everybody converge on the same kinds of ideas — that AI is going to be a success because previous eras have “worked out,” even if they can’t really express what “worked out” means. The logic is almost childlike — in the past, lots of money was invested in stuff that didn’t work out, but because some things worked out after spending lots of money, spending lots of money will work out here.

The natural result is that reporters (and bloggers) seek endless positive confirmation, and build narratives to match. They report that Anthropic hit $19 billion in annualized revenue and OpenAI hit $25 billion in annualized revenue — which has been confirmed to refer to a 4-week-long period of revenue multiplied by 12 — as proof that the AI bubble is real, ignoring the fact that both companies lose billions of dollars and that my own reporting says that OpenAI made billions less and spent billions more in 2025. They assume that a company would not tell everybody something untrue or impossible, because accepting that companies do this undermines the structure of how reporting takes place, and means that reporters have to accept that they, in some cases, are used by companies to peddle information with the intent of deception.
And thanks to an affidavit from Anthropic Chief Financial Officer Krishna Rao filed as part of Anthropic’s suit against the Department of Defense’s supply chain risk designation, it’s clear that the deception was intentional, as the affidavit confirmed that Anthropic’s lifetime revenue “to date” (referring to March 9th 2026) is $5 billion, and that it has spent $10 billion on inference and training. To be abundantly clear, this means that Anthropic’s previous statement that it made $14 billion in annualized revenue (stated by Anthropic on February 12 2026, and referring, I’ve confirmed, to a month-long period multiplied by 12) — referring to a period of 30 days where it made $1.16 billion — accounts for more than 23% of its lifetime revenue.

This comes down to which Anthropic you believe, because these two statements do not match up. I am not stating that it is lying, but I do believe annualized revenue is a deliberate attempt to obfuscate things and give the vibe that the business is healthier than it is. I also do not think it’s likely that Anthropic made 23% of its lifetime revenue in the space of a month. What this almost certainly means is that the sources that told media outlets that Anthropic made $4.5 billion in 2025 were misleading them. The exact quote from the affidavit is that “...[Anthropic] has generated substantial revenue since entering the commercial market—exceeding $5 billion to date,” and while boosters will say “uhm, it says ‘exceeding’,” if it were anything higher than $5.5 billion Anthropic would’ve absolutely said so.

We can also do some very simple maths that suggests that Anthropic’s “annualized” figures are…questionable. On February 12 2026, annualized revenue hit $14 billion. Five days before the lawsuit was filed, it was $19 billion, “with $6 billion added in February” (per Dario Amodei at a Morgan Stanley conference), suggesting that annualized revenue at the end of January was $13 billion, or $1.083 billion a month. Even if we assume a flat billion, that means that Anthropic made $2.16 billion between January and the end of February 2026. And that’s not including the revenue made in March so far.

But I’m a curious little critter and went ahead and added up all of the times that Anthropic had talked about its annualized revenue from 2025 onward — the results, with links, are here! — and based on my calculations, just using published annualized revenues gets us to $4.837 billion. We are, however, missing several periods of time, for which I’ve used “safe” (as in lower, so that I am trying to give Anthropic the benefit of the doubt) numbers calculated from the periods themselves:

- April 1 to 30, 2025, which I estimate as $166 million based on reports of Anthropic’s annualized revenue being $2 billion at the end of March 2025.
- August 1 to August 20, 2025, which I estimate as $271 million based on July 2025’s revenues ($4 billion).
- November 1 to November 29, 2025, which I estimate as $556 million, based on October’s $7 billion in annualized revenues.
- January 1 to January 11, 2026, which I estimate as $219.1 million, assuming $9 billion in annualized revenue (based on reported December revenues).

With these estimates, we get a grand total of $6.66 billion (ominous!), which is a great deal higher than $5 billion. When you remove the estimates and annualized revenues for 2026, you get $3.642 billion, which heavily suggests that Anthropic did not, in fact, make $4.5 billion in 2025. There isn’t a chance in Hell this company made $4.5 billion in 2025 based on its own CFO’s affidavit. I also think it’s reasonable to doubt the veracity of these annualized revenues, or, in my kindest estimation, that Anthropic is using any kind of standard “annualized” formula.

Here are the ways in which people will try and claim I’m wrong:

- “Ed, it’s commercial revenue!” — this is all revenue. Anthropic doesn’t have “non-commercial revenue,” unless you are going to use a very, very broad version of what “non-commercial” means, at which point you have to tell me why you trust Anthropic.
- “This doesn’t include all the revenue up until March 2026! Maybe this suit was written weeks ago!” — even if it doesn’t, based on Anthropic’s own numbers, things don’t line up. Also, this was written specifically as part of the lawsuit with the DoD. It’s recent.
- “It says ‘exceeding’!” — it also says “over $10 billion in inference and training costs.” Can I just say whatever number I want here? Because if this is your argument, that’s what you’re doing.
- “That $5 billion number is accurate!” — the only way this makes sense is if some or all of these annualized revenues are incorrect.

I think it’s reasonable to doubt whether Anthropic made anywhere near $4.5 billion in 2025, whether Anthropic has annualized revenues even approaching those reported, and whether anything it says can be trusted going forward.
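If you want to check that arithmetic yourself, here’s the same cross-check as a few lines of Python. The inputs are the figures cited above; the only assumption is the one already described, that “annualized” means a roughly month-long period multiplied by 12:

```python
# Cross-checking the annualized-revenue figures cited above.
# Assumption (as described in the piece): "annualized" = ~30 days of revenue x 12.

def monthly_from_annualized(annualized: float) -> float:
    return annualized / 12

feb_12_annualized = 14e9        # stated February 12, 2026
early_march_annualized = 19e9   # stated roughly five days before the suit was filed
added_in_february = 6e9         # "with $6 billion added in February"
lifetime_revenue = 5e9          # CFO affidavit: "exceeding $5 billion to date"

feb_monthly = monthly_from_annualized(feb_12_annualized)     # ~$1.17B for the month
jan_annualized = early_march_annualized - added_in_february  # $13B at the end of January
jan_monthly = monthly_from_annualized(jan_annualized)        # ~$1.083B a month

# January and February 2026 alone, rounding January down to a flat $1B
# (the piece quotes this as roughly $2.16 billion):
jan_feb_total = 1e9 + feb_monthly

# February's implied month as a share of the ~$5B lifetime figure:
feb_share = feb_monthly / lifetime_revenue

print(f"February monthly revenue: ${feb_monthly / 1e9:.2f}B")
print(f"End-of-January monthly revenue: ${jan_monthly / 1e9:.3f}B")
print(f"Jan + Feb 2026: ${jan_feb_total / 1e9:.2f}B")
print(f"February as a share of lifetime revenue: {feb_share:.0%}")
```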
It appears one of the most prominent startups in the valley has misled everybody about how much it makes, or, if it has not, that somebody else is perpetuating a misinformation campaign. Add together the annualized revenues. Look at the links. Do the maths. I got the links for annualized revenues from Epoch AI, though I have seen all of these before in my own research. People are going to try and justify why this isn’t a problem in all manner of ways. They’ll say that actually Anthropic made less money in 2025 but that’s fine because they all could see what annualized revenues really meant. So far, nobody has a cogent response, likely because there isn’t one.

I haven’t even addressed the $10 billion in training and inference costs, because good lord, those costs are stinky, and based on my own reporting — which did not come from Anthropic, which is why I trust it! — Anthropic spent $2.66 billion on Amazon Web Services from January through September 2025, or around 26% of its lifetime compute spend. That’s remarkable, and suggests this company’s compute spend is absolutely out of control. This leads me to one more quote from Anthropic’s CFO: Without attempting to influence their decision making, if I were a counterparty to a company like this, my biggest concern would now be that this filing appears to suggest that Anthropic’s revenues are materially smaller than I believed.

It might seem dangerous to be like me, pointing at stuff and saying “that doesn’t make sense!” or questioning a narrative held by the entire stock market and most of modern journalism, but I’d argue the real danger is that narrow, narrative-led, establishment-driven thinking makes it impossible for reporters to report. While you might be able to say “a source told me that something went wrong,” the natural drive to report on what everybody else is saying means that this information is often reported with careful weasel words like “still going as planned” or “still growing incredibly fast.” It’s a kind of post-factual decorum — a need to keep the peace that frames bad signs as bumps in the road and good signs as cast-iron affirmations of future success. This is a catastrophic failure of journalism that deprives retail investors and the general public of useful information.

It also — though it feels as if reporters are “getting scoops” or “breaking news” — naturally magnetizes journalists toward information that confirms the narrative, or “leaks” that are actually the company intentionally getting something in front of a reporter so that they (the reporter) can appear as if this was “investigative news” versus “marketing in a different hat.” It also means that modern journalism is ill-equipped, and no, this is not a “new” phenomenon. It is the same thing that led to the dot com bubble, the NFT bubble, the crypto bubble, the Clubhouse bubble, the AR and VR bubble, and many more bubbles to come.

To avoid being “wrong,” reporters are pursuing stories that prove somebody else right, which almost invariably ends with the reporter being wrong. “Pursuing stories to prove somebody else right” means that a great many reporters (and newsletter writers) that claim to be objective and fact-focused end up writing the narrative that companies use to raise money, using evidence manufactured by the company in question. In some cases, this is an act of cowardice. Following the narrative because it’s easy and because everybody’s doing it adds a layer of reputation laundering.
If everybody failed, then everybody was conned, and thus nobody has to be held accountable; and because there really has never been any accountability for the media being wrong about any previous bubbles, the assumption is that there never will be. However you may feel about my work or what I’m saying, I need you to understand something: journalism, both historically and currently, is unprepared for the consequences of being wrong.

The current media consensus around the AI bubble is that even if it pops it will be fine, with some even saying that “even if OpenAI folds, everything will work out, because of the dot com bubble.” This is a natural attempt to rationalize and normalize the chaotic and destructive — an attempt to map how this bubble would burst onto previous bubbles, because new things are difficult and scary to imagine. There has never been a time when the entire market crystallised around a few specific companies — not even the dot com bubble! — and then built an entire infrastructural layer mostly in service of two of them, with a price tag now nearing the $1tn mark.

Let’s get specific. The scoffing and jeering I get from people when I say that AI demand doesn’t exist, or that AI companies don’t have revenues, or that OpenAI or Anthropic are unsustainable, is never met with a good faith response, just quotes about how “Amazon Web Services lost lots of money” or “Uber lost lots of money” or that “these are the fastest growing companies of all time” or something about “all code being written by AI,” a subject I discussed at length two weeks ago.

The Large Language Model era is uniquely built to exploit human beings’ belief that we can infer the future based on the past, both in how it processes data and in how people report on its abilities. It exploits media outlets that do not give people the time (or hold them to a standard that requires them) to actually learn the subjects in question, and it sells itself based on the statement that “this is the worst it’ll ever be” and that “previous eras of investment worked out.” LLMs also naturally cater to those who are willing to accept substandard explanations and puddle-deep domain expertise. The slightest sign that Claude Code can build an app — whether it’s capable of actually doing so or not — is enough for people that are on television every day to say that it will build all software, because it confirms the bias that the cycle of innovation and incumbent disruption still exists, even if it hasn’t for quite some time. A glossy report about job displacement — even one that literally says that Anthropic found “no systematic increase in job displacement in unemployment” from AI — gets reported as proof that jobs are being displaced by AI because it says “AI is far from reaching its theoretical capability: actual coverage remains a fraction of what’s feasible.”

This is an aggressive exploitation of how willing people with the responsibility to tell the truth are to accept half-assed explanations, and how willing people are to operate based on principles garnered from the lightest intellectual lifts in the world. The assumption is always the same: that what has happened before will happen again, even if the actuality of history doesn’t really reflect that at all.
Society — the media, politicians, chief executives, shit, everyone on some level — is incapable of thinking of new stuff that would happen, especially if that new stuff would be economically destructive, such as a massive scar across all private credit, private equity and venture capital, one so severe that it may potentially destroy the way that businesses (and startups, for that matter) raise capital for the foreseeable future. People are more willing to come up with societally-destructive theories — such as all software engineering and all journalism and all content being created by LLMs, even if it doesn’t actually make sense — because it fits their biases. Perhaps they’re beaten down by decades of the muting of labor’s power and the destruction of our environment. Perhaps they’re beaten down by the rise of the right and the destruction of the rights of minorities and people of colour. Or, more noxiously, perhaps they’re excited to be the one that called it first for the new overlords they perceive will own this (fictional) future, so much so that they’ll ignore the underlying ridiculousness of the economics, refuse to do any further reading that might invalidate their beliefs, or simply say whatever they’re told because it gets clicks and makes their advertisers, bosses or friends happy.

People are willing to fall in line behind mythology because conceiving of an entirely different future is an intellectually challenging and emotionally draining act. It requires learning about a multitude of systems and interconnecting disciplines and being willing to admit, again and again, that you do not understand something and must learn more. There are plenty of people that are willing to do this, and plenty more that are not, and the latter are the people with TV shows and columns in the newspaper.

I believe we’re in a new era. It’s entirely different. Stop trying to say “but in the past,” because the past is only useful if you’re capable of evaluating it critically and skeptically, and making sure that it’s actually the same rather than just feeling like it is. I keep calling this era “The Beginning of History,” not because it directly reflects Francis Fukuyama’s theory (which relates to democracies), but because I believe that those who succeed in this world are not those who are desperate to neatly fit it into the historical failures or successes of the past, but those who are willing to stare at it with the cold, hard fury of the present.

There are many signs that the past no longer makes sense: the collapse of SaaS (which I’ll cover in this week’s premium), the collapse of the business models of both venture capital and private equity, the collapse of democracies under the weight of fascism because the opposition parties never seem to give enough of a fuck about the experiences of regular people. That’s because using the past to dictate what will happen in the future is masturbatory. It allows you to feel smart and say “I know the most about anything, which means I know what’s going on.” It assumes, much like an LLM, that simply reading enough is what makes somebody smart, that shoving a bunch of text in your head (whether or not you understand it is immaterial) is what makes somebody know something or be good at something. It’s an intellectually bankrupt position that I believe will lead those unable to adapt to the reality of the future to destruction.
It leads to lazy thinking that grasps at confirmations rather than any fundamental understanding, depriving the general public of good information in favor of that which confirms the biases and wants and needs of the malignant and ignorant. It takes courage to be willing to be wrong with deliberation, but only if you admit that you were wrong. This hasn’t happened in previous bubbles, and it has to happen for us to stop bubbles forming. I have made a great deal of effort to learn more as time goes on. I do not see boosters doing the same to prove their points. I will be pointing to this sentence in the future, one way or another.

So much more effort is put into humouring the ideas of the bubbles, of proving the marketing spiel of the bubbles, framed as a noxious “both-sides” that deprives the reader, listener or viewer of their connection with reality. It might be tempting to say this happens with cynicism too, except the majority of attention paid to bubbles is positive, and saying otherwise is a fucking lie. Need to justify unprofitable, unsustainable AI companies? Uber lost money before. Need to explain why AI data centers being built for demand that doesn’t exist isn’t a problem? Well, the internet exists, and people eventually used that fiber. You can ignore actual proof while pretending to provide your own, all just by pointing vaguely to things in the past. It takes actual courage to form an opinion, something boosters fundamentally lack.

I’m not saying it’s impossible to make predictions, but that the majority of people make them with flimsy information, such as “this thing happened before” or “everyone’s saying this will happen.” I’m not saying you can’t try and understand what will happen next, but doing so requires you to use information that is not, on its face, generated by wishcasting or events that took place decades ago. In the end, the greatest lesson we can learn is that, historically speaking, people tend to fuck around and then find out. The assumption boosters make is that one can fuck around forever. History tends to disagree.

Chris Coyier Yesterday

Claude is an Electron App

Juicy intro from Nikita Prokopov: In “Why is Claude an Electron App?” Drew Breunig wonders: Claude spent $20k on an agent swarm implementing (kinda) a C-compiler in Rust, but desktop Claude is an Electron app. If code is free, why aren’t all apps native? And then argues that the answer is that LLMs are not good enough yet. They can do 90% of the work, so there’s still a substantial amount of manual polish, and thus, increased costs. But I think that’s not the real reason. The real reason is: native has nothing to offer.

Martin Fowler Yesterday

Fragments: March 10

Tech firm fined $1.1m by California for selling high-school students’ data. I agree with Brian Marick’s response: No such story should be published without a comparison of the fine to the company’s previous year revenue and profits, or valuation of last funding round. (I could only find a valuation of $11.0M in 2017.) We desperately need corporations’ attitudes to shift from “lawbreaking is a low-risk cost of doing business; we get a net profit anyway” to “this could be a death sentence.”

❄                ❄                ❄                ❄                ❄

Charity Majors gave the closing keynote at SRECon last year, encouraging people to engage with generative AI. If I was giving the keynote at SRECon 2026, I would ditch the begrudging stance. I would start by acknowledging that AI is radically changing the way we build software. It’s here, it’s happening, and it is coming for us all. Her agenda this year would be to tell everyone that they mustn’t wait for the wave to crash on them, but to swim out to meet it. In particular, I appreciated her call to resist our confirmation bias: The best advice I can give anyone is: know your nature, and lean against it. If you are a reflexive naysayer or a pessimist, know that, and force yourself to find a way in to wonder, surprise and delight. If you are an optimist who gets very excited and tends to assume that everything will improve: know that, and force yourself to mind real cautionary tales.

❄                ❄                ❄                ❄                ❄

In a LinkedIn comment on Kief Morris’s recent article on Humans and Agents in Software Loops, Renaud Wilsius may have coined another bit of terminology for the agent+programmer age: This completes the story of productivity, but it opens a new chapter on talent: The Apprentice Gap. If we move humans ‘on the loop’ too early in their careers, we risk a future where no one understands the ‘How’ deeply enough to build a robust harness. To manage the flywheel effectively, you still need the intuition that comes from having once been ‘in the loop.’ The next great challenge for CTOs isn’t just Harness Engineering, it’s ‘Experience Engineering’ for our junior developers in an agentic world.

❄                ❄                ❄                ❄                ❄

In conversations about “the ralph loop”, I often hear it described in the sense of just letting the agents loose to run on their own. So it’s interesting to read the originator of the ralph loop point out: It’s important to watch the loop as that is where your personal development and learning will come from. When you see a failure domain – put on your engineering hat and resolve the problem so it never happens again. In practice this means doing the loop manually via prompting or via automation with a pause that involves having to press CTRL+C to progress onto the next task. This is still ralphing as ralph is about getting the most out of how the underlying models work through context engineering and that pattern is GENERIC and can be used for ALL TASKS. At the Thoughtworks Future of Software Development Retreat we were very concerned about cognitive debt. Watching the loop during ralphing is a way to learn about what the agent is building, so that it can be directed effectively in the future.

❄                ❄                ❄                ❄                ❄

Anthropic recently published a page on how AI helps break the cost barrier to COBOL modernization. Using AI to help migrate COBOL systems isn’t a new idea to my colleagues, who shared their experiences using AI for this task over a year ago. While Anthropic’s article is correct about the value of AI, there’s more to the process than throwing some COBOL at an LLM.
The assumption that AI can simply translate COBOL into Java treats modernization as a syntactic exercise, as though a system is nothing more than its source code. That premise is flawed. A direct translation would, in the best case scenario, faithfully reproduce existing architectural constraints, accumulated technical debt and outdated design decisions. It wouldn’t address weaknesses; it would restate them in a different language. In practice, modernization is rarely about preserving the past in a new syntax. It’s about aligning systems with current market demands, infrastructure paradigms, software supply chains and operating models. Even if AI were eventually capable of highly reliable code translation, blind conversion would risk recreating the same system with the same limitations, in another language, without a deliberate strategy for replacing or retiring its legacy ecosystem.

❄                ❄                ❄                ❄                ❄

Anders Hoff (inconvergent): an LLM is a compiler in the same way that a slot machine is an ATM

❄                ❄                ❄                ❄                ❄

One of the more interesting aspects of the network of people around Jeffrey Epstein is how many people from academia were connected. It’s understandable why: he had a lot of money to offer, and most academics are always looking for funding for their work. Most of the attention on Epstein’s network focused on those that got involved with him, but I’m interested in those who kept their distance and why, so I enjoyed Jeffrey Mervis’s article in Science: Many of the scientists Epstein courted were already well-established and well-funded. So why didn’t they all just say no? Science talked with three who did just that. Here’s how Epstein approached them, and why they refused to have anything to do with him. I believe that keeping away from bad people makes life much more pleasant; if nothing else, it reduces a lot of stress. So it’s good to understand how people make decisions on who to avoid.

Ankur Sethi Yesterday

I built a programming language using Claude Code

Over the course of four weeks in January and February, I built a new programming language using Claude Code. I named it Cutlet after my cat. It’s completely legal to do that. You can find the source code on GitHub, along with build instructions and example programs. I’ve been using LLM-assisted programming since the original GitHub Copilot release in 2021, but so far I’ve limited my use of LLMs to generating boilerplate and making specific, targeted changes to my projects. While working on Cutlet, though, I allowed Claude to generate every single line of code. I didn’t even read any of the code. Instead, I built guardrails to make sure it worked correctly (more on that later).

I’m surprised by the results of this experiment. Cutlet exists today. It builds and runs on both macOS and Linux. It can execute real programs. There might be bugs hiding deep in its internals, but they’re probably no worse than ones you’d find in any other four-week-old programming language in the world. I have Feelings™ about all of this and what it means for my profession, but I want to give you a tour of the language before I get up on my soapbox.

If you want to follow along, build the Cutlet interpreter from source and drop into a REPL using . Arrays and strings work as you’d expect in any dynamic language. Variables are declared with the keyword. Variable names can include dashes. Same syntax rules as Raku. The only type of number (so far) is a double. Here’s something cool: the meta-operator turns any regular binary operator into a vectorized operation over an array. In the next line, we’re multiplying every element of by 1.8, then adding 32 to each element of the resulting array. The operator is a zip operation. It zips two arrays into a map. Output text using the built-in function. This function returns , which is Cutlet’s version of . The meta operator also works with comparisons. Here’s another cool bit: you can index into an array using an array of booleans. This is a filter operation. It picks the element indexes corresponding to and discards those that correspond to . Here’s a shorter way of writing that. Let’s print this out with a user-friendly message. The operator concatenates strings and arrays. The built-in turns things into strings. The meta-operator in the prefix position acts as a reduce operation. Let’s find the average temperature. adds all the temperatures, and the built-in finds the length of the array. Let’s print this out nicely, too. Functions are declared with . Everything in Cutlet is an expression, including functions and conditionals. The last value produced by an expression in a function becomes its return value. Your own functions can work with too. Let’s reduce the temperatures with our function to find the hottest temperature.

Cutlet can do a lot more. It has all the usual features you’d expect from a dynamic language: loops, objects, prototypal inheritance, mixins, a mark-and-sweep garbage collector, and a friendly REPL. We don’t have file I/O yet, and some fundamental constructs like error handling are still missing, but we’re getting there! See TUTORIAL.md in the git repository for the full documentation.
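If you’d like a feel for what that tour is doing without learning Cutlet’s syntax first, here’s a rough Python translation of the same sequence of operations (the values are made up for illustration, and this is ordinary Python, not Cutlet):

```python
# A plain-Python sketch of the operations described in the tour above.
# Illustrative values only; Cutlet's actual syntax lives in the repository.
from functools import reduce

celsius = [18.5, 21.0, 25.3, 31.2, 16.8]

# Vectorized arithmetic: multiply every element by 1.8, then add 32 to each.
fahrenheit = [c * 1.8 + 32 for c in celsius]

# Zipping two arrays into a map.
days = ["mon", "tue", "wed", "thu", "fri"]
by_day = dict(zip(days, fahrenheit))

# Vectorized comparison, producing an array of booleans...
is_warm = [f > 75 for f in fahrenheit]

# ...and indexing with that boolean array acts as a filter.
warm_days = [f for f, keep in zip(fahrenheit, is_warm) if keep]

# Reduce: sum all the temperatures, then compute the average.
total = sum(fahrenheit)
average = total / len(fahrenheit)

# A user-defined function used with reduce to find the hottest temperature.
def hotter(a, b):
    return a if a > b else b

hottest = reduce(hotter, fahrenheit)

print("Temperatures by day: " + str(by_day))
print("Warm days: " + str(warm_days))
print("Average: " + str(round(average, 1)) + ", hottest: " + str(hottest))
```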
I’m a frontend engineer and (occasional) designer. I’ve tried using LLMs for building web applications, but I’ve always run into limitations. In my experience, Claude and friends are scary good at writing complex business logic, but fare poorly on any task that requires visual design skills. Turns out describing responsive layouts and animations in English is not easy. No amount of screenshots and wireframes can communicate fluid layouts and animations to an LLM. I’ve wasted hours fighting with Claude about layout issues it swore it had fixed, but which I could still see plainly with my leaky human eyes. I’ve also found these tools to excel at producing cookie-cutter interfaces they’ve seen before in publicly available repositories, but they fall off when I want to do anything novel. I often work with clients building complex data visualizations for niche domains, and LLMs have comprehensively failed to produce useful outputs on these projects.

On the other hand, I’d seen people accomplish incredible things using LLMs in the last few months, and I wanted to replicate those experiments myself. But my previous experience with LLMs suggested that I had to pick my project carefully:

- I didn’t want to solve a particularly novel problem, but I wanted the ability to sometimes steer the LLM into interesting directions.
- I didn’t want to manually verify LLM-generated code. I wanted to give the LLM specifications, test cases, documentation, and sample outputs, and make it do all the difficult work of figuring out if it was doing the right thing.
- I wanted to give the agent a strong feedback loop so it could run autonomously.
- I don’t like MCPs. I didn’t want to deal with them. So anything that required connecting to a browser, taking screenshots, or talking to an API over the network was automatically disqualified.
- I wanted to use a boring language with as few external dependencies as possible.

A small, dynamic programming language met all my requirements:

- LLMs know how to build language implementations because their training data contains thousands of existing implementations, papers, and CS books.
- I was intrigued by the idea of creating a “remix” language by picking and choosing features I enjoy from various existing languages.
- I could write a bunch of small deterministic programs along with their expected outputs to test the implementation. I could even get Claude to write them for me, giving me a potentially infinite number of test cases to verify that the language was working correctly.
- Language implementations can be tested from the command line, with purely textual inputs and outputs. No need to take screenshots or videos or set up fragile MCPs.
- There’s no better feedback loop for an agent than “run and until there are no more errors”.
- C is as boring as it gets, and there are a large number of language implementations built in C.

Finally, this was also an experiment to figure out how far I could push agentic engineering. Could I compress six months of work into a few weeks? Could I build something that was beyond my own ability to build? What would my day-to-day work life look like if I went all-in on LLM-driven programming? I wanted to answer all these questions.

I went into this experiment with some skepticism. My previous attempts at building something entirely using Claude Code hadn’t worked out. But this attempt was not only successful, it produced results beyond what I’d imagined possible. I don’t hold the belief that all software in the future will be written by LLMs. But I do believe there is a large subset that can be partially or mostly outsourced to these new tools.

Building Cutlet taught me something important: using LLMs to produce code does not mean you forget everything you’ve learned about building software. Agentic engineering requires careful planning, skill, craftsmanship, and discipline, just like any software worth building before generative AI. The skills required to work with coding agents might look different from typing code line-by-line into an editor, but they’re still very much the same engineering skills we’ve been sharpening all our careers. There is a lot of work involved in getting good output from LLMs. Agentic engineering does not mean dumping vague instructions into a chat box and harvesting the code that comes out. I believe there are four main skills you have to learn today in order to work effectively with coding agents:

- Understanding which problems can be solved effectively using LLMs, which ones need a human in the loop, and which ones should be handled entirely by humans.
- Communicating your intent clearly and defining criteria for success.
- Creating an environment in which the LLM can do its best work.
- Monitoring and optimizing the agentic loop so the agent can work efficiently.

Models and harnesses are changing rapidly, so figuring out which problems LLMs are good at solving requires developing your intuition, talking to your peers, and keeping your ear to the ground. However, if you don’t want to stay up-to-date with a rapidly-changing field—and I wouldn’t judge you for it, it’s crazy out there—here are two questions you can ask yourself to figure out if your problem is LLM-shaped:

- For the problem you want to solve, is it possible to define and verify success criteria in an automated fashion?
- Have other people solved this problem—or a similar one—before? In other words, is your problem likely to be in the training data for an LLM?

If the answer to either of those questions is “no”, throwing AI at the problem is unlikely to yield good results. If the answer to both of them is “yes”, then you might find success with agentic engineering. The good news is that the cost of figuring this out is the price of a Claude Code subscription and one sacrificial lamb on your team willing to spend a month trying it out on your codebase.

LLMs work with natural language, so learning to communicate your ideas using words has become crucial. If you can’t explain your ideas in writing to your co-workers, you can’t work effectively with coding agents.
You can get a lot out of Claude Code using simple, vague, overly general prompts. But when you do that, you’re outsourcing a lot of your thinking and decision-making to the robot. This is fine for throwaway projects, but you probably want to be more careful when you’re building something you will put into production and maintain for years. You want to feed coding agents precisely written specifications that capture as much of your problem space as possible. While working on Cutlet, I spent most of my time writing, generating, reading, and correcting spec documents. For me, this was a new experience. I primarily work with early-stage startups, so for most of my career, I’ve treated my code as the spec. Writing formal specifications was an alien experience. Thankfully, I could rely on Claude to help me write most of these specifications. I was only comfortable doing this because Cutlet was an experiment. On a project I wanted to stake my reputation on, I might take the agent out of the equation altogether and write the specs myself.

This was my general workflow while making any change to Cutlet:

- First, I’d present the LLM with a new feature (e.g. loops) or refactor (e.g. moving from a tree-walking interpreter to a bytecode VM).
- Then I’d have a conversation with it about how the change would work in the context of Cutlet, how other languages implemented it, design considerations, ideas we could steal from interesting/niche languages, etc. Just a casual back-and-forth, the same way you might talk to a co-worker.
- After I had a good handle on what the feature or change would look like, I’d ask the LLM to give me an implementation plan broken down into small steps. I’d review the plan and go back and forth with the LLM to refine it. We’d explore various corner cases, footguns, gotchas, missing pieces, and improvements.
- When I was happy with the plan, I’d ask the LLM to write it out to a file that would go into a directory. Sometimes we’d end up with 3-4 plan files for a single feature. This was intentional. I needed the plans to be human-readable, and I needed each plan to be an atomic unit I could roll back if things didn’t work out. They also served as a history of the project’s evolution. You can find all the historical plan files in the Cutlet repository.
- I’d read and review the generated plan file, go back and forth again with the LLM to make changes to it, and commit it when everything looked good.
- Finally, I’d fire up a Docker container, run Claude with all permissions—including access—and ask it to implement my plan.

This workflow front-loaded the cognitive effort of making any change to the language. All the thinking happened before a single line of code was written, which is something I almost never do. For me, programming involves organically discovering the shape of a problem as I’m working on it. However, I’ve found that working that way with LLMs is difficult. They’re great at making sweeping changes to your codebase, but terrible at quick, iterative, organic development workflows. Maybe my workflow will evolve as inference gets faster and models become better, but until then, this waterfall-style model works best.

I find this to be the most interesting and fun part of working with coding agents. It’s a whole new class of problem to solve! The core principle is this: coding agents are computer programs, and therefore have a limited view of the world they exist in. Their only window into the problem you’re trying to solve is the directory of code they can access. This doesn’t give them enough agency or information to be able to do a good job. So, to help them thrive, you must give them that agency and information in the form of tools they can use to reach out into the wider world. What does this mean in practice? It looks different for different projects, but this is what I did for Cutlet:

- Comprehensive test suite. My project instructions told Claude to write tests and make sure they failed before writing any new code. Alongside, I asked it to run tests after making significant code changes or merging any branches. Armed with a constantly growing test suite, Claude was able to quickly identify and fix any regressions it introduced into the codebase. The tests also served as documentation and specification.
- Sample inputs and outputs. These were my integration tests. I added a number of example programs to the Cutlet repository—most of them written by Claude itself—that not only serve as documentation for humans, but also as an end-to-end test suite. The project instructions told Claude to run all of them and verify their output after every code change.
- Linters, formatters, and static analysis tools. Cutlet uses and to ensure a baseline of code quality. Just like with tests, the project instructions asked the LLM to run these tools after every major code change. I noticed that would often produce diagnostics that would force Claude to rewrite parts of the code. If I had access to some of the more expensive static analysis tools (such as Coverity), I would have added them to my development process too.
- Memory safety tools. I asked Claude to create a target that rebuilt the entire project and test suite with ASan and UBSan enabled (with LSan riding along via ASan), then ran every test under the instrumented build. The project instructions included running this check at the end of implementing a plan.

All these tools and abilities guaranteed that any updates to the code resulted in a project that at least compiled and executed. But more importantly, they increased the information and agency Claude had access to, making it more effective at discovering and debugging problems without my intervention. If I keep working on this project, my main focus will be to give my agents even more insight into the artifact they are building, even more debugging tools, even more freedom, and even more access to useful information.

You will want to come up with your own tooling that works for your specific project. If you’re building a Django app, you might want to give the agent access to a staging database. If you’re building a React app, you might want to give it access to a headless browser. There’s no single answer that works for every project, and I bet people are going to come up with some very interesting tools that allow LLMs to observe the results of their work in the real world. Coding agents can sometimes be inefficient in how they use the tools you give them.
For example, while working on this project, sometimes Claude would run a command, decide its output was too long to fit into the context window, and run it again with the output piped to . Other times it would run , forget to the output for errors, and run it a second time to capture the output. This would result in the same expensive checks running multiple times in the course of making a single edit. These mistakes slowed down the agentic loop significantly. I could fix some of these performance bottlenecks by editing or changing the output of a custom script. But there were some issues that required more effort to discover and fix. I quickly got into the habit of observing the agent at work, noticing sequences of commands that the agent repeated over and over again, and turning them into scripts for the agent to call instead. Many of the scripts in Cutlet’s directory came about this way. This was very manual, very not-fun work. I’m hoping this becomes more automated as time goes on. Maybe a future version of Claude Code could review its own tool calling outputs and suggest scripts you could write for it? Of course, the most fruitful optimization was to run Claude inside Docker with and access. By doing this, I took myself out of the agentic loop. After a plan file had been produced, I didn’t want to hang around babysitting agents and saying every time they wanted to run . As Cutlet evolved, the infrastructure I built for Claude also evolved. Eventually, I captured many of the workflows Claude naturally followed as scripts, slash commands, or instructions in . I also learned where the agent stumbled most, and preempted those mistakes by giving it better instructions or scripts to run. The infrastructure I built for Claude was also valuable for me, the human working on the project. The same scripts that helped Claude automate its work also helped me accomplish common tasks quickly. As the project grows, this infrastructure will keep evolving along with it. Models change all the time. So do project requirements and workflows. I look at all this project infrastructure as an organic thing that will keep changing as long as the project is active. Now that it’s possible for individual developers to accomplish so much in such little time, is software engineering as a career dead? My answer to this question is nope, not at all . Software engineering skills are just as valuable today as they were before language models got good. If I hadn’t taken a compilers course in college and worked through Crafting Interpreters , I wouldn’t have been able to build Cutlet. I still had to make technical decisions that I could only make because I had (some) domain knowledge and experience. Besides, I had to learn a bunch of new skills in order to effectively work on Cutlet. These new skills also required technical knowledge. A strange and new and different kind of technical knowledge, but technical knowledge nonetheless. Before working on this project, I was worried about whether I’d have a job five years from now. But today I’m convinced that the world will continue to have a need for software engineers in the future. Our jobs will transform—and some people might not enjoy the new jobs anymore—but there will still be plenty of work for us to do. Maybe we’ll have even more work to do than before, since LLMs allow us to build a lot more software a lot faster. And for those of us who never want to touch LLMs, there will be domains where LLMs never make any inroads. 
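To make that concrete, here’s the sort of thing one of those consolidated helper scripts can look like. This is a hypothetical sketch in Python (the command names, paths, and script layout are illustrative, not the actual files in the Cutlet repository): it runs the build, the tests, and the example programs in one call, and trims failure output so the agent doesn’t flood its own context window.

```python
#!/usr/bin/env python3
"""Hypothetical consolidated check script for a coding agent to call.

Illustrative only: the commands and paths below are invented for this sketch,
not taken from the Cutlet repository. The idea is to collapse the sequence of
commands the agent kept re-running into a single call with short, predictable
output.
"""
import subprocess
import sys

# Each step is (label, command). Steps run in order and stop at the first failure.
STEPS = [
    ("build", ["make", "-s"]),
    ("unit tests", ["make", "-s", "test"]),
    ("example programs", ["./scripts/run_examples.sh"]),  # diffs actual vs expected output
]

def run_step(label, command):
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        print(f"OK   {label}")
        return True
    # Only show the tail of the output so the agent's context stays small.
    tail = (result.stdout + result.stderr).splitlines()[-20:]
    print(f"FAIL {label}")
    print("\n".join(tail))
    return False

if __name__ == "__main__":
    success = all(run_step(label, command) for label, command in STEPS)
    sys.exit(0 if success else 1)
```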
My friends who work on low-level multimedia systems have found less success using LLMs compared to those who build webapps. This is likely to be the case for many years to come. Eventually, those jobs will transform, too, but it will be a far slower shift. Is it fair to say that I built Cutlet? After all, Claude did most of the work. What was my contribution here besides writing the prompts? Moreover, this experiment only worked because Claude had access to multiple language runtimes and computer science books in its training data. Without the work done by hundreds of programmers, academics, and writers who have freely donated their work to the public, this project wouldn’t have been possible. So who really built Cutlet? I don’t have a good answer to that. I’m comfortable taking credit for the care and feeding of the coding agent as it went about generating tokens, but I don’t feel a sense of ownership over the code itself. I don’t consider this “my” work. It doesn’t feel right. Maybe my feelings will change in the future, but I don’t quite see how. Because of my reservations about who this code really belongs to, I haven’t added a license to Cutlet’s GitHub repository. Cutlet belongs to the collective consciousness of every programming language designer, implementer, and educator to have released their work on the internet. (Also, it’s worth noting that Cutlet almost certainly includes code from the Lua and Python interpreters. It referred to those languages all the time when we talked about language features. I’ve also seen a ton of code from Crafting Interpreters making its way into the codebase with my own two fleshy eyes.) I’d be remiss if I didn’t include a note on mental health in this already mammoth blog post. It’s easy to get addicted to agentic engineering tools. While working on this project, I often found myself at my computer at midnight going “just one more prompt”, as if I was playing the world’s most obscure game of Civilization . I’m embarrassed to admit that I often had Claude Code churning away in the background when guests were over at my place, when I stepped into the shower, or when I went off to lunch. There’s a heady feeling that comes from accomplishing so much in such little time. More addictive than that is the unpredictability and randomness inherent to these tools. If you throw a problem at Claude, you can never tell what it will come up with. It could one-shot a difficult problem you’ve been stuck on for weeks, or it could make a huge mess. Just like a slot machine, you can never tell what might happen. That creates a strong urge to try using it for everything all the time. And just like with slot machines, the house always wins. These days, I set limits for how long and how often I’m allowed to use Claude. As LLMs become widely available, we as a society will have to figure out the best way to use them without destroying our mental health. This is the part I’m not very optimistic about. We have comprehensively failed to regulate or limit our use of social media, and I’m willing to bet we’ll have a repeat of that scenario with LLMs. Now that we can produce large volumes of code very quickly, what can we do that we couldn’t do before? This is another question I’m not equipped to answer fully at the moment. That said, one area where I can see LLMs being immediately of use to me personally is the ability to experiment very quickly. 
It's very easy for me to try out ten different features in Cutlet because I just have to spec them out and walk away from the computer. Failed experiments cost almost nothing. Even if I can't use the code Claude generates, having working prototypes helps me validate ideas quickly and discard bad ones early. I've also been able to radically reduce my dependency on third-party libraries in my JavaScript and Python projects. I often use LLMs to generate small utility functions that previously required pulling in dependencies from NPM or PyPI. But honestly, these changes are small beans. I can't predict the larger societal changes that will come about because of AI agents. All I can say is that programming will look radically different in 2030 than it does in 2026.

This project was a proof of concept to see how far I could push Claude Code. I'm currently looking for a new contract as a frontend engineer, so I probably won't have the time to keep working on Cutlet. I also have a few more ideas for pushing agentic programming further, so I'm likely to prioritize those over continuing work on Cutlet. When the mood strikes me, I might still add small features to the language now and then. Now that I've removed myself from the development loop, it doesn't take a lot of time and effort. I might even do Advent of Code using Cutlet in December! Of course, if you work at Anthropic and want to give me money so I can keep running this experiment, I'm available for contract work for the next 8 months :)

For now, I'm closing the book on Cutlet and moving on to other projects (and cat). Thanks to Shruti Sunderraman for proofreading this post. Also thanks to Cutlet the cat for walking across the keyboard and deleting all my work three times today.

I didn't want to solve a particularly novel problem, but I wanted the ability to sometimes steer the LLM into interesting directions. I didn't want to manually verify LLM-generated code. I wanted to give the LLM specifications, test cases, documentation, and sample outputs, and make it do all the difficult work of figuring out if it was doing the right thing. I wanted to give the agent a strong feedback loop so it could run autonomously. I don't like MCPs. I didn't want to deal with them. So anything that required connecting to a browser, taking screenshots, or talking to an API over the network was automatically disqualified. I wanted to use a boring language with as few external dependencies as possible.

LLMs know how to build language implementations because their training data contains thousands of existing implementations, papers, and CS books. I was intrigued by the idea of creating a "remix" language by picking and choosing features I enjoy from various existing languages. I could write a bunch of small deterministic programs along with their expected outputs to test the implementation. I could even get Claude to write them for me, giving me a potentially infinite number of test cases to verify that the language was working correctly. Language implementations can be tested from the command line, with purely textual inputs and outputs. No need to take screenshots or videos or set up fragile MCPs. There's no better feedback loop for an agent than "run the build and the tests until there are no more errors". C is as boring as it gets, and there are a large number of language implementations built in C.

Understanding which problems can be solved effectively using LLMs, which ones need a human in the loop, and which ones should be handled entirely by humans.
Communicating your intent clearly and defining criteria for success. Creating an environment in which the LLM can do its best work. Monitoring and optimizing the agentic loop so the agent can work efficiently.

For the problem you want to solve, is it possible to define and verify success criteria in an automated fashion? Have other people solved this problem, or a similar one, before? In other words, is your problem likely to be in the training data for an LLM?

First, I'd present the LLM with a new feature (e.g. loops) or refactor (e.g. moving from a tree-walking interpreter to a bytecode VM). Then I'd have a conversation with it about how the change would work in the context of Cutlet, how other languages implemented it, design considerations, ideas we could steal from interesting/niche languages, etc. Just a casual back-and-forth, the same way you might talk to a co-worker. After I had a good handle on what the feature or change would look like, I'd ask the LLM to give me an implementation plan broken down into small steps. I'd review the plan and go back and forth with the LLM to refine it. We'd explore various corner cases, footguns, gotchas, missing pieces, and improvements. When I was happy with the plan, I'd ask the LLM to write it out to a file in a dedicated plans directory. Sometimes we'd end up with 3-4 plan files for a single feature. This was intentional. I needed the plans to be human-readable, and I needed each plan to be an atomic unit I could roll back if things didn't work out. They also served as a history of the project's evolution. You can find all the historical plan files in the Cutlet repository. I'd read and review the generated plan file, go back and forth again with the LLM to make changes to it, and commit it when everything looked good. Finally, I'd fire up a Docker container, run Claude with all permissions, including network access, and ask it to implement my plan.

Comprehensive test suite. My project instructions told Claude to write tests and make sure they failed before writing any new code. I also asked it to run the tests after making significant code changes or merging any branches. Armed with a constantly growing test suite, Claude was able to quickly identify and fix any regressions it introduced into the codebase. The tests also served as documentation and specification.

Sample inputs and outputs. These were my integration tests. I added a number of example programs to the Cutlet repository, most of them written by Claude itself, that not only serve as documentation for humans, but also as an end-to-end test suite. The project instructions told Claude to run all of them and verify their output after every code change. (A sketch of this kind of harness appears at the end of this post.)

Linters, formatters, and static analysis tools. Cutlet uses a formatter and a static analyzer to ensure a baseline of code quality. Just like with tests, the project instructions asked the LLM to run these tools after every major code change. I noticed that the static analyzer would often produce diagnostics that forced Claude to rewrite parts of the code. If I had access to some of the more expensive static analysis tools (such as Coverity), I would have added them to my development process too.

Memory safety tools. I asked Claude to create a build target that rebuilt the entire project and test suite with ASan and UBSan enabled (with LSan riding along via ASan), then ran every test under the instrumented build. The project instructions included running this check at the end of implementing a plan.
This caught memory errors (use-after-free, buffer overflows, undefined behavior) that neither the tests nor the linter could find. Running these tests took time and greatly slowed down the agent, but they caught even more issues than the other tools.

Symbol indexes. The agent had access to symbol indexing tools for navigating the source code. I don't know how useful this was, because I rarely ever saw it use them. Most of the time it would just grep the code for symbols. I might remove this in the future.

Runtime introspection tools. Early in the project, I asked Claude to give Cutlet the ability to dump the token stream, AST, and bytecode for any piece of code to standard output before executing it. This allowed the agent to quickly figure out whether it had introduced errors into any part of the execution pipeline without having to navigate the source code or drop into a debugger.

Pipeline tracing. I asked Claude to write a Python script that fed a Cutlet program through the interpreter with debug flags to capture the full compilation pipeline: the token stream, the AST, and the bytecode disassembly. It then mapped each token type, AST node, and opcode back to the exact source locations in the parser, compiler, and VM where they were handled. When an agent needed to add a new language feature, it could run the tracer on an example of a similar existing feature to see precisely which files and functions to touch. I was very proud of this machinery, but I never saw Claude make much use of it either. (A rough sketch of such a tracer appears below.)

Running with every possible permission. I wanted the agent to work autonomously and have access to every debugging tool it might want to use. To do this, I ran it inside a Docker container with every permission enabled and full network access. I believe this is the only practical way to use coding agents on large projects. Answering permission prompts is cognitively taxing when you have five agents working in parallel, and restricting their ability to do whatever they want makes them less effective at their job. We will need to figure out all sorts of safety issues that arise when you give LLMs the ability to take full control of a system, but on this project, I was willing to accept the risks that come with YOLO mode.
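To make the example-program checks concrete, here is a minimal sketch of that kind of end-to-end harness. The binary name, directory layout, and .expected convention are illustrative assumptions, not the actual layout of the Cutlet repository.

    #!/usr/bin/env python3
    # Minimal sketch of an end-to-end harness: run every example program
    # through the interpreter and diff its output against a checked-in
    # .expected file. Binary name, directory, and extension are assumptions.
    import subprocess
    import sys
    from pathlib import Path

    INTERPRETER = "./cutlet"      # hypothetical binary name
    EXAMPLES = Path("examples")   # hypothetical directory of sample programs

    def main() -> int:
        failures = 0
        for program in sorted(EXAMPLES.glob("*.cut")):  # hypothetical extension
            expected = program.with_suffix(".expected").read_text()
            result = subprocess.run(
                [INTERPRETER, str(program)],
                capture_output=True, text=True, timeout=30,
            )
            if result.returncode != 0 or result.stdout != expected:
                failures += 1
                print(f"FAIL {program}")
            else:
                print(f"ok   {program}")
        return 1 if failures else 0

    if __name__ == "__main__":
        sys.exit(main())

The nice property for an agent is the crisp signal: exit code zero means every example still behaves, anything else names the exact program that regressed.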
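And here is the pipeline-tracing idea reduced to its bare bones. The debug flags are hypothetical stand-ins for whatever the interpreter actually exposes, and the real script also maps each stage back to source locations, which is omitted here.

    #!/usr/bin/env python3
    # Bare-bones tracer sketch: run a program once per (hypothetical) debug
    # flag and print each stage of the pipeline under a header, so an agent
    # can see tokens, AST, and bytecode in a single tool call.
    import subprocess
    import sys

    INTERPRETER = "./cutlet"  # hypothetical binary name
    STAGES = [
        ("TOKENS", "--dump-tokens"),      # hypothetical flag
        ("AST", "--dump-ast"),            # hypothetical flag
        ("BYTECODE", "--dump-bytecode"),  # hypothetical flag
    ]

    def trace(source_file: str) -> None:
        for label, flag in STAGES:
            result = subprocess.run(
                [INTERPRETER, flag, source_file],
                capture_output=True, text=True,
            )
            print(f"=== {label} ===")
            print(result.stdout.strip())

    if __name__ == "__main__":
        trace(sys.argv[1])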

0 views
David Bushell Yesterday

Building on AT Protocol

AT Protocol has got me! I'm morphing into an atmosphere nerd. AT Protocol — atproto for short — is the underlying tech that powers Bluesky and new social web apps. Atproto, as I understand it, is largely an authorization and data layer. All atproto data is inherently public. In theory it can be encrypted for private use, but leaky metadata and de-anonymisation are a whole thing.

Atproto users own the keys to their data, which is stored on a Personal Data Server (PDS). You don't need to manage your own. If you don't know where your data is stored, good chance it's on Bluesky's PDS. You can move your data to another PDS like Blacksky or Eurosky. Or, if you're a nerd like me, self-host your own PDS. You own your data and no PDS can stop you moving it.

Atproto provides OAuth; think "Sign in with GitHub". But instead of an account being locked behind the whims of proprietary slopware, user identity is proven via their PDS. Social apps like Bluesky host a PDS allowing users to create a new account. That account can be used to log in to other apps like pckt, Leaflet, or Tangled. You could start a new account on Tangled's PDS and use that for Bluesky. Atproto apps are not required to provide a PDS, but it helps to onboard new users.

Did I build my own app? Of course I did. You can sign in at attic.social. Attic is a cozy space with lofty ambitions. What does Attic do? I'm still deciding… it'll probably become a random assortment of features. Right now it has bookmarks. Bookmarks will have search and tags soon.

Technical details: to keep the server stateless I borrowed ideas from my old SvelteKit auth experiment. OAuth and session state is stored in encrypted HTTP-only cookies. I used the atcute TypeScript libraries to do the heavy atproto work. I found @flo-bit's projects which helped me understand implementation details. Attic is on Cloudflare Workers for now. When I have free time I'll explore the SvelteKit Bunny adapter. I am busy on client projects so I'll be scheming Attic ideas in my free time.

What's so powerful about atproto is that users can move their account/data. Apps write data to a PDS using a lexicon: a convention to say, for example, "this is a Bluesky post". Other apps are free to read that data too. During authorization, apps must ask for permission to write to specific lexicons. The user is in control.

You may have heard that Bluesky is or isn't "decentralised". Bluesky was simply the first atproto app. Most users start on Bluesky and may never be aware of the AT Protocol. What's important is that atproto makes it difficult for Bluesky to "pull a Twitter", i.e. kill 3rd party apps, such as the alternate Witchsky. If I ever abandon attic.social your data is still in your hands. Even if the domain expires! You can extract data from your PDS. You can write a new app to consume it anytime. That's the power of AT Protocol.

Thanks for reading! Follow me on Mastodon and Bluesky. Subscribe to my Blog and Notes or Combined feeds.
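P.S. If the lexicon idea feels abstract, here's a rough sketch of what writing a record to a PDS looks like at the XRPC level. The record shape follows the app.bsky.feed.post lexicon, but the PDS URL, token, and DID are placeholders, and a real app (including Attic) would go through the OAuth flow and a client library like atcute rather than raw HTTP.

    # Rough sketch: writing a record to a PDS over XRPC (placeholders throughout).
    import datetime
    import requests

    PDS_URL = "https://example-pds.example"  # wherever the user's PDS lives
    ACCESS_TOKEN = "..."                     # obtained during sign-in
    MY_DID = "did:plc:example"               # the user's identity

    record = {
        "$type": "app.bsky.feed.post",       # the lexicon: "this is a Bluesky post"
        "text": "Hello from my own app!",
        "createdAt": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

    resp = requests.post(
        f"{PDS_URL}/xrpc/com.atproto.repo.createRecord",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        json={"repo": MY_DID, "collection": "app.bsky.feed.post", "record": record},
        timeout=10,
    )
    print(resp.json())  # the PDS returns the new record's uri and cid

Any other app can read that record back from the PDS, which is exactly what makes the data portable.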

0 views

Cacheman: A Comprehensive Last-Level Cache Management System for Multi-tenant Clouds

I learned a lot about the LLC configuration and monitoring capabilities of modern CPUs from this paper, and I bet you will too. The problem this paper addresses is: how do you avoid performance variability in cloud applications due to cross-VM contention for the last-level cache (e.g., the L3 cache on a Xeon)? In a typical CPU, the L1 and L2 caches are private to a core, but the L3 is shared. In a cloud environment, the L3 is shared by multiple tenants, and is an avenue for a "noisy neighbor" to annoy its neighbors.

The work described in this paper builds upon Intel CMT and CAT. Cache Monitoring Technology allows the hypervisor to track how much of the L3 cache is occupied by each VM. Cache Allocation Technology allows the hypervisor to restrict a VM to only use a subset of the L3. CAT allows a VM to be assigned to a class of service (CLOS), which defines the set of L3 ways accessible to the VM (this page defines the term "ways" if you are unfamiliar). A typical CPU used by a cloud service provider has more CPU cores than L3 ways. If a cloud server hosts many small VMs, then L3 ways must be shared amongst VMs. The key problem solved by this paper is how to reduce performance variability given this constraint.

Fig. 1 illustrates the assignment of CLOS levels to LLC ways advocated by this paper. Each row is a level of service, and each column is a way of the LLC cache. CLOS[0] can access all ways, CLOS[1] can access all LLC ways except for one, and CLOS[7] can only access a single way of the LLC. Source: https://dl.acm.org/doi/10.1145/3774934.3786415

The hypervisor uses Intel CMT to monitor how much of the LLC is occupied by each VM. Every 6 seconds the hypervisor uses this information to change the CLOS that each VM is assigned to. The hypervisor computes a target LLC occupancy for each VM based on the number of cores assigned to the VM. This target is compared against the measured LLC occupancy to classify each VM into one of three categories:
Poor (the VM is starved for space)
Adequate (the VM is using just the right amount of cache)
Excess (the VM is hogging too much)

VMs in the poor category are de-suppressed (i.e., assigned to a CLOS with access to more LLC ways). Additionally, VMs in the excess category are suppressed (i.e., assigned to a CLOS with access to fewer ways), but this suppression only occurs when there are VMs in the poor category. This policy means that cache-hungry VMs can use more than their fair share of the L3 during periods of low server utilization. This can lead to higher mean performance, at the cost of a wider standard deviation. The paper describes a 4th state (overflow), which is only applied to VMs that should be held back even when there is plenty of L3 space available. These VMs are suppressed when they are found to be using too much L3, even if all other VMs on the system are getting enough cache space.

Fig. 5 shows a case where this strategy works well compared to static allocation. The server in question is running 5 VMs, each running a different application:
VM1 - 32 cores
VM2 - 16 cores (but doesn't fully utilize those cores)
VM3 - 8 cores
VM4 - 4 cores
VM5 - 4 cores

The top of figure 5 shows a simple static partitioning of LLC ways: VM1 is assigned 6 ways, VM2 is assigned 3 ways, VM3 is assigned 2 ways, and VMs 4 and 5 must share 1 way. They have to share because partitioning by LLC ways is inherently coarse-grained.
The two charts show measured LLC utilization over 10 minutes. Notice the Y-axis. The technique described in this paper (Cacheman) allows VM4 and VM5 to use far more aggregate LLC capacity than the static partitioning. Also notice that in the static partitioning, VM5 always uses more LLC than VM4 (because they are running different applications), whereas Cacheman allows for a more even balance between them. Source: https://dl.acm.org/doi/10.1145/3774934.3786415

Dangling Pointers
While the L3 cache is logically a monolithic shared resource, it is physically partitioned across the chip (with a separate slice near each core). It seems like it could be more efficient if VMs could be assigned to nearby L3 slices rather than L3 ways.
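The paper's mechanism lives in the hypervisor, but you can experiment with the same hardware knobs on a bare-metal Linux machine through the resctrl filesystem. The sketch below is not Cacheman; it's a minimal illustration, assuming a CPU with CMT/CAT support and a mounted /sys/fs/resctrl, of how occupancy monitoring and way masks are exposed to software.

    # Minimal sketch of the Linux resctrl interface (not the paper's system).
    # Assumes /sys/fs/resctrl is mounted and the CPU supports CMT/CAT.
    from pathlib import Path

    RESCTRL = Path("/sys/fs/resctrl")

    def make_group(name: str, pids: list[int], l3_mask: str) -> None:
        """Create a control group, move tasks into it, and restrict its L3 ways."""
        group = RESCTRL / name
        group.mkdir(exist_ok=True)
        for pid in pids:
            (group / "tasks").write_text(str(pid))   # one pid per write
        # One hex mask per cache domain, e.g. "00f" = the lowest four ways.
        (group / "schemata").write_text(f"L3:0={l3_mask}\n")

    def llc_occupancy(name: str) -> int:
        """Read the CMT occupancy counter (bytes of L3 in use) for a group."""
        path = RESCTRL / name / "mon_data" / "mon_L3_00" / "llc_occupancy"
        return int(path.read_text())

    # A Cacheman-style policy loop would compare llc_occupancy() against a
    # per-group target every few seconds, shrinking the mask for groups in
    # "excess" and widening it for groups in "poor".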

0 views
Kev Quirk Yesterday

Pure Blog Is Now Feature Complete...ish

I've just released v1.8.0 of Pure Blog, which was the final big feature I wanted to add 1. At this point, Pure Blog does all the things I would want a useful CMS to do, such as:
Storing content in plain markdown, just like an SSG.
Easy theme customisations.
Hooks for doing clever things when something happens.
Data files so I can loop through data to produce pages where I don't have to duplicate effort, like on my blogroll.
A couple of simple shortcodes to make my life easier.
Layout partials so I can customise certain parts of the site.
Custom routes so I can add little extra features, like a discover page, or the ability to visit a random post.
Caching because no-one wants a slow site 2.
Custom layouts and functions so I can go even deeper with my customisations without touching the core code base.

The result is a tool that works exactly how I want it to work. It's very simple to customise through the admin GUI, but there are also lots of advanced options available to more tech-savvy folk. Someone reached out to me recently and told me that their non-technical grandfather is running Pure Blog with no issues. Equally, I've had developers reach out to say that they're enjoying the flexibility of Pure Blog too. This is exactly why I created Pure Blog - to create a tool that can be used by anyone. My original plan was to just make a simple blogging platform, but I've ended up creating a performant platform that can be used for all kinds of sites, not just a blog.

At this point I'm considering Pure Blog to be feature complete*. But there is an asterisk there, because you never know what the future holds. Right now it supports everything I want it to support, but my needs may change in the future. If they do, I'll develop more features. In the meantime I'm going to enjoy what I've built by continuing to produce content in this lovely little CMS (even if I do say so myself). I know there's a few people using Pure Blog out there, so I hope you're enjoying it as much as I am. If you want to try Pure Blog yourself, you can download the source code from here, and this post should get you up and running in just a few minutes.

1. One could argue that previous versions were just development releases, and this is really v1.0, but I've gone with the versioning I went with, and I can't be bothered changing that now. :-)
2. This site scores a 96 on Google's Pagespeed Insights. Pretty impressive for a dynamic PHP-based site.

Thanks for reading this post via RSS. RSS is ace, and so are you. ❤️ You can reply to this post by email, or leave a comment.

0 views
Stratechery Yesterday

Copilot Cowork, Anthropic’s Integration, Microsoft’s New Bundle

Microsoft is seeking to commoditize its complements, but Anthropic has a point of integration of its own; it's good enough that Microsoft is making a new bundle on top of it.

0 views
matduggan.com Yesterday

Update to the Ghost theme that powers this site

I added a few modifications to the OSS Ghost theme that powers this site. You can get it here: https://gitlab.com/matdevdug/minimal-ghost-theme
Added better image caption support.
Added the cool Mastodon feature outlined here to attribute posts from your site back to your Mastodon username by following the instructions here.
I tried to make it pretty easy to customize, but if you need something changed feel free to open an issue on the repo. Thanks for all the feedback!

0 views

Dependency tracking is hard

curl and libcurl are written in C. They are rather low-level components present in many software systems, and they are typically not part of any ecosystem at all. They're just a tool and a library. In lots of places on the web, when you mention an Open Source project you also get the option to mention which ecosystem it belongs to: npm, go, rust, python, etc. There are easily at least a dozen well-known and large ecosystems. curl is not part of any of those.

Recently there's been a push for PURLs (Package URLs), for example when describing your specific package in a CVE. A package URL only works when the component is part of an ecosystem. curl is not. We can't specify curl or libcurl using a PURL.

SBOM generators and related scanners use package managers to generate lists of used components and their dependencies. This means these tools quite frequently just miss and ignore libcurl. It's not listed by the package managers. It's just in there, ready to be used. Like magic. It is similarly hard for these tools to figure out that curl in turn also depends on and uses other libraries. At build time you select which ones, but since we in the curl project primarily just ship tarballs with source code, we cannot tell anyone what dependencies their builds have. The additional libraries libcurl itself uses are all similarly outside of the standard ecosystems. Part of the explanation is that curl and libcurl are often shipped bundled with the operating system, or are perceived to be part of the OS. Most graphs, SBOM tools, and dependency trackers therefore stop at the binding or system that uses curl or libcurl, without including curl or libcurl themselves. The layer above, so to speak.

This makes it hard to figure out exactly how many components and how much software depend on libcurl. A perfect way to illustrate the problem is to check GitHub and see how many repositories, among its vast collection of many millions, depend on curl. After all, curl is installed in some thirty billion places, so clearly it is used a lot. (Most of them being libcurl, of course.) GitHub lists one. Repositories that depend on curl/curl: one. Screenshot taken on March 9, 2026. What makes this even more amusing is that it looks like this single dependent repository (Pupibent/spire) lists curl as a dependency by mistake.
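For anyone who hasn't met PURLs before: a package URL names the ecosystem ("type"), the package, and the version, and that first part is exactly what curl lacks. A small illustration, using the packageurl-python library and arbitrary version numbers:

    # Illustration only: what a PURL identifies, using packageurl-python
    # (pip install packageurl-python). Version numbers are arbitrary.
    from packageurl import PackageURL

    purl = PackageURL.from_string("pkg:npm/left-pad@1.3.0")
    print(purl.type, purl.name, purl.version)  # -> npm left-pad 1.3.0

    # The "type" points a scanner at a registry (npm, PyPI, crates.io, ...)
    # where the package and its dependency metadata live. A curl built from
    # a release tarball sits behind no such registry, so there is no
    # ecosystem type for a scanner to resolve.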

0 views