Latest Posts (20 found)

curl 8.19.0

Release presentation: welcome to the curlhacker stream at 10:00 CET (09:00 UTC) today, March 11, 2026, for a live-streamed presentation of curl 8.19.0: the changes, the security fixes and some bugfixes.

Numbers: the 273rd release, with 8 changes and 264 bugfixes (total: 13,640) over 63 days (total: 10,712), in 538 commits (total: 38,024). 0 new public libcurl functions (total: 100), 0 new curl_easy_setopt() options (total: 308), 0 new curl command line options (total: 273). 77 contributors, 48 new (total: 3,619); 37 authors, 21 new (total: 1,451); 4 security fixes (total: 180).

Security: we stopped the bug-bounty, but that has not stopped people from finding vulnerabilities in curl.
- CVE-2026-1965: bad reuse of HTTP Negotiate connection
- CVE-2026-3783: token leak with redirect and netrc
- CVE-2026-3784: wrong proxy connection reuse with credentials
- CVE-2026-3805: use after free in SMB connection reuse

Changes:
- We stopped the bug-bounty. It's worth repeating, even if it was no code change.
- The cmake build got a new option
- Initial support for MQTTS was merged
- curl now supports fractions for --limit-rate and --max-filesize
- curl's -J option now uses the redirect name as a backup
- We no longer support OpenSSL-QUIC on Windows
- curl can now get built to use the native CA store by default
- The minimum Windows version curl supports is now Vista (up from XP)

Deprecations: the following upcoming changes might be worth noticing; see the deprecate documentation for details.
- NTLM support becomes opt-in
- RTMP support is getting dropped
- SMB support becomes opt-in
- Support for c-ares versions before 1.16 goes away
- Support for CMake 3.17 and earlier gets dropped
- TLS-SRP support will be removed

We plan to ship the next curl release on April 29. See you then!

iDiallo Today

The Server Older than my Kids!

This blog runs on two servers. One is the main PHP blog engine that handles the logic and the database, while the other serves all static files. Many years ago, an article I wrote reached the top position on both Hacker News and Reddit. My server couldn't handle the traffic. I literally had a terminal window open, monitoring the CPU and restarting the server every couple of minutes. But I learned a lot from it. The page receiving all the traffic had a total of 17 assets. So in addition to the database getting hammered, my server was spending most of its time serving images, CSS and JavaScript files. I decided to set up additional servers to act as a sort of CDN to spread the load. I added multiple servers around the world and used MaxMindDB to determine a user's location so I could serve files from the closest server. But it was overkill for a small blog like mine, and I quickly downgraded back to just one server for the application and one for static files. Ever since I set up this configuration, my server has never failed due to a traffic spike. In fact, in 2018, right after I upgraded the servers to Ubuntu 18.04, one of my articles went viral like nothing I had seen before. Millions of requests hammered my server. The machine handled the traffic just fine. It's been 7 years now. I've procrastinated long enough; an upgrade was long overdue. What kept me from upgrading to Ubuntu 24.04 LTS was that I had customized the server heavily over the years and never documented any of it. Provisioning a new server means setting up accounts, dealing with permissions, and transferring files. All of this should have been straightforward with a formal process. Instead, uploading blog post assets has been a very manual affair. I only partially completed the upload interface, so I've been using SFTP and SCP from time to time to upload files. It's only now that I've finally created a provisioning script for my asset server.
I mostly used AI to generate it, then used a configuration file to set values such as email, username, SSH keys, and so on. With the click of a button, and 30 minutes of waiting for DNS to update, I now have a brand new server running Ubuntu 24.04, serving my files via Nginx. Yes, next month Ubuntu 26.04 LTS comes out, and I can migrate to it by running the same script. I also built an interface for uploading content without relying on SFTP or SSH, which I'll be publishing on GitHub soon. It's been 7 years running this server. It's older than my kids. Somehow, I feel a pang of emotion thinking about turning it off. I'll do it tonight... But while I'm at it, I need to do something about the 9-year-old and 11-year-old servers that still run some crucial applications.


Writing an LLM from scratch, part 32e -- Interventions: the learning rate

I'm still working on improving the test loss for a from-scratch GPT-2 small base model, trained with code based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)". In my training code, the optimiser is created with two hard-coded values -- one for the learning rate, and one for the weight decay -- that were just copied from the tiny training run that we do in section 5.2 of the book. What do those values actually mean, and are they really the right values? I felt I had a good handle on the learning rate, at least -- it's one of the first things you learn when you start looking at machine learning of any kind -- but how would you go about working out what the correct value for it was? On top of that, when I was reading the Chinchilla paper a while back, I noticed they repeatedly referred to a "cosine cycle" for the learning rate, which didn't fit into anything I'd learned about before. The weight decay was pretty much an unknown for me -- I know it is a parameter controlling the behaviour of the optimiser, but I don't know how it does that. In this post I want to look into the learning rate, and these mysterious cosines; I'll write a follow-up about the weight decay later. If you're reading this blog, you almost certainly know what the learning rate is, but let's go over it briefly to build a solid foundation. The way it's normally explained, using simple gradient descent, goes something like this. Let's assume that we're training a model with just one parameter, and it starts off set to −5. We run some training data through, and get a loss, let's say 44.44. We don't know what shape our loss curve is (if we did, we might be able to find the lowest loss algebraically), but we do know the gradient of the loss with respect to the parameter at the point we've measured; it happens to be −13.
That is reasonably large and negative. We use that information to say that we want to move in the direction of a larger value for our parameter -- that is, in our case where the gradient is negative, we have a downhill slope towards the right, so we want to increase the parameter to move rightwards on that chart, whereas if it were positive (an uphill slope) we'd want to decrease the parameter to move leftwards. Simply subtracting the gradient from the parameter would lead to an update in the right direction, but it would be a very large one in this case -- we'd move 13 units to the right -- so we multiply the gradient by a small positive number, the learning rate (often written as a lower-case eta, like this: η), to move a small distance in that direction. Let's say η = 0.3. That means we update our parameter to −5 − 0.3 × (−13) = −1.1. So now we run that through and get a new loss -- let's say it's 9.06 -- and a new gradient, which happens to be −5.2. Now we can do another update, and our parameter will become −1.1 − 0.3 × (−5.2) = 0.46, so we use that and work out another loss and gradient, which come to 3.3816 and −2.08. Let's plot that one, but this time we'll draw back the veil and show the actual loss curve. Now, it's worth reiterating that while we're training this model we don't know what that curve looks like -- we're just finding points on it, along with its gradient at those points, and using that information to work out which parameter value to explore next. But it's pretty clear that as we continue, if the learning rate is the right kind of size, we'll get to the minimum eventually, because -- due to the nice smooth U-shape of the curve -- the gradient gets smaller the closer we get to the minimum 1 .
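The update rule being described can be sketched in a few lines of Python. This is a toy example with a made-up quadratic loss curve, not the exact curve from the charts:

```python
# One-parameter gradient descent on a toy loss curve.
# The curve and starting point are illustrative (assumed), not the
# article's exact numbers.

def loss(p):
    return (p - 1.5) ** 2 + 2.0  # smooth U-shape with its minimum at p = 1.5

def grad(p):
    return 2.0 * (p - 1.5)  # d(loss)/dp

eta = 0.3  # the learning rate
p = -5.0   # initial parameter value

for step in range(20):
    p -= eta * grad(p)  # move a small distance against the gradient

# p has now converged very close to the minimum at 1.5
```

With η = 0.3 each step closes most of the remaining gap; make η much larger and this same loop overshoots and diverges, which is exactly the "bouncing out of the dip" failure mode described here.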
It's also pretty clear that if the learning rate is smaller than the optimal value, in this simple case we will still find the right point, but it will take more steps because each one is smaller. And, of course, if the learning rate is too high, we might never converge -- we'd "bounce out of" the dip, and wind up with a parameter value that endlessly cycles between ever smaller and ever larger values, zooming off to infinity. OK, that's the basics. Why might we want to change from something that seems so logical and simple? A few paragraphs back I said: due to the nice smooth U-shape of the curve, the gradient gets smaller the closer we get to the minimum. What if it doesn't? Imagine if we had something more like a V-shaped curve. The gradient does not decrease as we get closer to the minimum, and so while we're in the downward-sloping part, each update is exactly the same distance. Eventually, though, we'll jump over the minimum. In this example, I've used a gradient of −8.33 on the downward-sloping part of the curve, and +8.33 on the upward-sloping part, which means that our next update just bounces us back to where we were before! Because the gradient isn't decreasing the closer we get to the minimum, we wind up just oscillating around it. That's not very helpful. That's a slightly contrived example (though not entirely -- intuitively, with functions like ReLU or GELU in our real LLMs, it's easy to imagine crazy loss landscapes). But it does show that perhaps we might want to add in our own "artificial" way to decrease the size of the steps we take over the course of training our model, rather than just relying on the gradients naturally flattening out for us. Another way of looking at things is that as the model gets trained, we don't want batches of very new-looking data to cause big updates, taking us away from what was a good part of the loss landscape in terms of what we've seen so far.
For example, imagine you've been training an LLM on a bunch of documents, which have so far been in English. Halfway through, it encounters a document in Byzantine Greek, the loss skyrockets, and you do a big update. That would be a problem! You might want it to learn a bit from the document, to push it slightly in a "the world is multi-lingual" direction, but you don't want it to lose a big chunk of the value from its previous training. You might also see a kind of connection to the way that people learn over the course of their lives -- for babies, everything is new and they "update their parameters" constantly as they try to understand the world. Children are still pretty flexible, but as we get older we tend to update our beliefs less and less. That's not always optimal, but as a heuristic it's pretty adaptive. Anyway, in general: for most training runs, we're going to want the learning rate to adjust over time. Most of the time this will mean reducing it, though there can be cases for increasing it again for periods. The general practice of doing this is called "learning rate scheduling". There are a bunch of ways that people adjust the learning rate over the course of a train; here are a few that cropped up a lot while I was researching this. If we want the learning rate to go down over time, and we know how many steps we're training for, we can just set it to (say) 0.0004 for the first quarter of our train, then 0.0002 for the next, then 0.0001, then finish off with 0.00005. That can work pretty well! But there is one obvious oddity: the big step changes in learning rate mean that the exact placement of the drops matters, and the training data immediately before and immediately after a drop get treated quite differently. Why should the state of the model just before and just after a drop be treated so differently? It would make more sense to have a smoother schedule. What functions decay smoothly like that?
An exponential curve does: let's say we just multiply the learning rate by a number that is a little smaller than one on every step, so that it drops smoothly. But there are lots of other curves like that, and one is particularly interesting. As you change θ from 0 to π, the value of cos θ goes smoothly from 1 to −1, so it's easy enough to rescale that so that our learning rate follows the same curve -- something like η_t = η_min + (η_max − η_min) × (1 + cos(π t / T)) / 2 for step t of a T-step train. This is called a "cosine annealing" or "cosine decay" schedule, and was apparently inspired by the algorithms used for simulated annealing (an optimisation algorithm that was in turn inspired by how atomic structures form in metals as they cool -- another one for the list of things to look into in the future...). That solves the mystery from earlier: the cosine that the Chinchilla paper was talking about was exactly this. As it turns out, the cosine decay scheduling curve is quite popular in deep learning, because it has what amounts to two well-defined phases -- an initial high learning rate where lots of exploration of the loss landscape can happen, followed by a smooth transition to something more like fine-tuning to optimise the location in whatever part of the loss landscape we've wound up in. Now, all of the above assumes that we want the learning rate to start high and finish low, mimicking the textbook gradient descent that we had at the start of this post. Intuitively that feels nice, but on further thought, the important thing is really that we have a low learning rate at the end of the train, so that we can get as close as possible to the minimum in the part of the loss landscape we've found ourselves in. But perhaps there's a case for having both high and low periods during the train, so that we don't get stuck in a local minimum -- something to jolt us out of where we were every now and then? 2
With a step function, that's easy: you could, for example, drop the rate for a while, raise it again, and repeat. With an exponential, you could do something similar. With cosine decay, of course, things are even easier, because the cosine function is inherently cyclical, so we can just keep going past π. However, at least for our purposes -- training an LLM using a Chinchilla-optimal number of training tokens -- it makes sense to be guided by what the authors of the Chinchilla paper did. Appendix B says: We find that setting the cosine cycle length too much longer than the target number of training steps results in sub-optimally trained models, as shown in Figure A1. As a result, we assume that an optimally trained model will have the cosine cycle length correctly calibrated to the maximum number of steps, given the FLOP budget; we follow this rule in our main analysis. So, at this point, I think we have one important part of the intervention we want to make: we want to use a cosine learning rate scheduler, going from high near the start of the training run down to low at the end, over one cycle. Additionally, and also from appendix B in the paper: we use a 10x learning rate decay in line with Rae et al. (2021) ...which means that if our learning rate starts at η, then we want it to decay down to η/10 by the end. So, we just need to work out an initial value for η, and let it rip, right? Well, not so fast... When our model is uninitialised, right at the start of the train, gradients are going to be pretty wild. It's going to be making random errors all of the time, and we'll be making huge jumps across the loss landscape. That sounds bad. Additionally, those kinds of wild jumps can get the optimiser into a -- well, sub-optimal -- state. I haven't read enough about optimisers yet to have a solid handle on that, but that can wait -- intuitively it makes some kind of sense that erratic gradient updates might confuse it.
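For concreteness, the decay shapes discussed above (step, exponential, cosine) can be written as plain functions of the global step. This is an illustrative sketch with made-up values, not the post's actual training configuration:

```python
import math

# Illustrative values only (assumed, not from the real training run).
TOTAL_STEPS = 10_000
PEAK_LR = 4e-4

def step_schedule(step):
    # Piecewise-constant: halve the rate each quarter of the run.
    quarter = step * 4 // TOTAL_STEPS
    return PEAK_LR / (2 ** quarter)

def exponential_schedule(step, decay=0.9995):
    # Multiply by a number slightly below one on every step.
    return PEAK_LR * decay ** step

def cosine_schedule(step, min_lr=PEAK_LR / 10):
    # Rescale cos from [1, -1] down to [PEAK_LR, min_lr] over the run.
    progress = step / TOTAL_STEPS       # goes 0 -> 1
    cosine = math.cos(math.pi * progress)  # goes 1 -> -1
    return min_lr + (PEAK_LR - min_lr) * (1 + cosine) / 2
```

The cosine version is the "one full half-cycle from peak to a tenth of the peak" shape that the Chinchilla discussion above converges on.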
So, it makes a certain amount of sense to start off with a low learning rate so that we don't do that, then to increase it gradually to the peak, and only then to start the gradual cosine decay. According to this (rather nice looking) masterclass on LLM training, it's typical to do this over "a few thousand steps or a small percentage (e.g., 1-10%) of the total training steps, depending on the dataset size and batch size", using a simple linear increase over that period. I think we should do that: a simple linear warmup at the start -- let's somewhat arbitrarily say 5% of our training steps -- going up to our desired peak learning rate. So our learning rate schedule should look like a linear ramp up to the peak, followed by a single cosine decay down to a tenth of it. So far I've written a lot about how we vary the learning rate over time, and that's all been very useful. But we still need to know what the initial value should be! In smaller-scale experiments you might just try a bunch of different numbers to see what worked well, but at more than US$30 per train, that's not practical here. Unfortunately, it's really quite hard to find good suggestions published anywhere. The GPT-2 paper is (as usual) reticent: The learning rate of each model was manually tuned for the best perplexity on a 5% held-out sample of WebText ...and if you search for "learning rate training llm", you'll see lots of results for people fine-tuning existing LLMs (2 × 10^−4 comes up a lot), but almost nothing about training one from scratch. I eventually came across this (long!) post from Hugging Face, which I definitely need to spend time going through in the future, because it covers a lot of the ground I've been going over in this post series. But for this post, I think the most relevant part is in the section "Scaling Laws for Hyperparameters", where they include a figure from this DeepSeek paper.
Here it is, with some of the (also relevant) surrounding text. In our trains we're using something like 5 × 10^18 total FLOPs. Now, they are specifically charting things in terms of non-embedding FLOPs, but I'm going to play a little fast and loose here and ignore that, so reading off their chart, it looks like we should be using about 1.4 × 10^−3 as our learning rate. We can double-check that against their formula for the optimal learning rate as a power law of the compute budget C: plugging in C = 5 × 10^18 gives almost exactly the same number. Nice, a close match! However, it's definitely worth noting that we're using a simple GPT-2 architecture, and they are using something quite different -- RMSNorm instead of LayerNorm, SwiGLU as the activation function on the feed-forward networks, Rotary Position Embedding rather than the fixed ones we're using, and so on. As a sanity check: they also give a formula for the optimal batch size in terms of tokens. For our FLOP budget, that comes in at 381,782 tokens, which is about 373 of our 1,024-token sequences. That is quite a lot higher than the 97-or-so sequences that appeared to be optimal in our earlier experiments. That is a little concerning, though of course the 97 number came out of a very ad-hoc bit of curve-fitting. For now, I'm going to hope that it doesn't matter too much for the learning rate. This may come back to bite me; if the results of a train with 1.4 × 10^−3 are radically worse than the existing rate of 4 × 10^−4, I'll have to do a bit more investigation. So, now I think we have all of the theoretical pieces in place to do a train. Let's move on to the practicalities. We started by looking at the optimiser creation code; what should we change in it, disregarding the weight decay until the next post? Based on the above, we want to do a linear warmup over about 5% of our steps, going up to a learning rate of 1.4 × 10^−3, followed by a cosine decay down to one tenth of that, 1.4 × 10^−4. What does that look like in code?
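Before getting to schedulers: the two scaling-law readings above are easy to sanity-check numerically. The power-law coefficients below are my transcription of the DeepSeek fits, so treat them as assumptions to verify against the paper itself:

```python
# Optimal learning rate and batch size (in tokens) as power laws of the
# compute budget C. Coefficients transcribed from the DeepSeek
# scaling-law paper (an assumption -- check against the paper).
C = 5e18  # our approximate total FLOP budget

opt_lr = 0.3118 * C ** -0.1250           # comes out around 1.4e-3
opt_batch_tokens = 0.2920 * C ** 0.3271  # comes out around 382,000 tokens

# In terms of our 1,024-token sequences:
opt_batch_seqs = opt_batch_tokens / 1024  # roughly 373 sequences
```

Both numbers line up with the chart reading and the 381,782-token figure quoted above.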
The relevant API for scheduling the learning rate in PyTorch is, logically enough, in the torch.optim.lr_scheduler module, and there are a bunch of different scheduling classes. You create your optimiser, then create a scheduler for the shape you want, and then you can call step() on the scheduler (after the step() on the optimiser) to adjust the optimiser's learning rate over time. Let's make that more concrete; one of the schedulers is LinearLR, which is what we'll need for our linear warmup period. It takes as its parameters: the optimizer we're applying it to; start_factor, which the optimiser's learning rate is multiplied by to work out the value we want to start at; end_factor, which is likewise applied to the optimiser's learning rate to work out the value we're heading for; total_iters, the number of steps over which it should go from the initial learning rate to the final one; and last_epoch, which lets the scheduler know how many steps into its schedule it currently is -- this defaults to -1, meaning it hasn't started yet, which can be useful if you're resuming from a checkpoint, but for our purposes we can ignore it. Let's say that we want to go from almost-zero to our optimiser's learning rate over 1,600 steps -- we'd create a LinearLR with those parameters, and then in our training loop, after we've done the scaled step of the optimiser, we'd also step the scheduler. This confused me a little bit the first time I saw it; after all, if the scheduler hasn't been "triggered" when we step the optimiser, how does the optimiser know what learning rate to use? Surely it would just use whatever it was initialised with? The answer is that when you create the optimiser, it stores away the learning rate that you give it in two places -- an "initial learning rate" and a "current learning rate". Then, when you create your scheduler, it uses the initial learning rate to work out the start and end values, and sets the current one to the start value immediately. Just by creating a scheduler, you're changing the optimiser's current learning rate -- but not the initial one, which is important, as we'll see in a moment. So, we have a scheduler that handles our warmup period nicely. Another scheduler that's relevant to our interests is CosineAnnealingLR. This takes: the optimizer, as before; T_max, the number of steps over which to run the decay; eta_min, the minimum learning rate we want to get to; and, again, last_epoch. On creation, this scheduler will read in the optimiser's initial learning rate -- note, not the current one -- and then the first time it's stepped, it will set the current learning rate to that value; for steps after that it will reduce it so that it follows a nice cosine decay, reaching eta_min after T_max steps.
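A minimal sketch of that warmup wiring, assuming PyTorch (the model here is a placeholder, and since LinearLR's start_factor must be greater than zero, "almost-zero" is a tiny factor):

```python
import torch

# Throwaway model and optimiser, just to show the scheduler wiring.
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1.4e-3)

# Warm up linearly from almost-zero to the optimiser's learning rate
# over 1,600 steps. Creating the scheduler immediately sets the
# optimiser's *current* lr to lr * start_factor.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-4, end_factor=1.0, total_iters=1_600
)

for step in range(1_600):
    # ... forward pass and loss.backward() would go here ...
    optimizer.step()   # uses the current learning rate
    warmup.step()      # then move the learning rate along the schedule
    optimizer.zero_grad()

# By now the current learning rate has ramped up to the full 1.4e-3.
```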
So those two cover the two regimes that we want -- the warmup and then the cosine decay. But now we need to put them together; we want to do one and then the other. There's a very useful class, SequentialLR, which allows you to chain schedulers and tell it when each one takes over from the previous one. Let's sketch out some code that uses that to do a train with our new peak learning rate of 1.4 × 10^−3: a warmup of 1,600 steps, followed by a cosine decay over the next 32,000 steps down to one tenth of the peak learning rate. That actually works quite nicely! I wrote a dummy training loop to plot the current learning rate over a fake train using code like the above, with the output confirming that the values were good at the "milestone" point, the start and the end. I was initially a bit surprised by that: at the time I ran it, I didn't realise that there was that split between the initial and the current learning rates on the optimiser, so I thought that the cosine scheduler would pick up whatever tiny starting value the warmup scheduler had overwritten the optimiser's learning rate with -- but that split saves the day. That means that we now have the outline of how to schedule our learning rate. But before we can put that into the code, we need to think about how it affects our checkpoints. Just like the scaler and the optimiser, the learning rate scheduler -- or, indeed, our two schedulers here -- contains information about the state of the train. That means that if we recover from a checkpoint, we need to provide them with the information they need. If we just created them afresh, they'd start from the beginning -- for example, if we restarted from step 20,000 in a train like the one above, we'd start a new warmup from pretty much zero, and then start a fresh cosine decay. That would be bad. (Dummy test code here.) Now, we could use the last_epoch parameter to initialise them with the correct current global step.
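A sketch of chaining the two with SequentialLR, using the step counts from the text (the model is again a placeholder, and the loop does no real training):

```python
import torch

model = torch.nn.Linear(4, 4)  # placeholder model

PEAK_LR = 1.4e-3
WARMUP_STEPS = 1_600
DECAY_STEPS = 32_000

optimizer = torch.optim.AdamW(model.parameters(), lr=PEAK_LR)

warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-4, end_factor=1.0, total_iters=WARMUP_STEPS
)
# Decays from the optimiser's *initial* lr (the peak) down to a tenth of it.
decay = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=DECAY_STEPS, eta_min=PEAK_LR / 10
)
# Run the warmup first, handing over to the decay at WARMUP_STEPS.
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, decay], milestones=[WARMUP_STEPS]
)

lrs = []
for step in range(WARMUP_STEPS + DECAY_STEPS):
    optimizer.step()
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])

# The peak should sit at the milestone, and the end at one tenth of it.
```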
But they have a state dict, like most other PyTorch objects, so the simplest thing to do is just to write that to another checkpoint file, and then load it likewise. (Dummy test code here.) Conveniently, if you save the state dict of a SequentialLR, it will also include the state of all of its component schedulers, and likewise if you reload it, it will load the components' states back in too. The one thing you have to be careful about is what they warn about in the PyTorch docs: initialising a scheduler overwrites its optimiser's learning rates, so when restoring a checkpoint, initialise the scheduler before calling the optimiser's load_state_dict to avoid overwriting the loaded learning rates. Luckily enough, in our code as it stands, we create all of the things that are checkpointed -- the optimiser and the scaler so far, but shortly the scheduler as well -- before we load in the state dicts, so that drops out quite nicely. So, we have some sketched-out code -- it's time to put it in place for the real training run. I won't go through the details of the changes to my existing DDP training code, though you can see the diff here if you're interested. Much of the complexity was due to keeping backward compatibility so that we don't have to always use a learning rate scheduler; remember that in this mini-series, I'm trying out various changes ("interventions") to the training loop in isolation, to see whether each one improves things. So it's important to be able to easily train with or without learning rate scheduling; I did that with a flag in the training config. Implementation-wise, initially I was thinking that it would be easiest to always have a scheduler, and in the "non-scheduled" case to just set it to a linear one that didn't change the value over the course of the train. But in the end it turned out to be easier to use the presence or absence of a scheduler as the switch to tell the training loop which "mode" it was in.
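A sketch of round-tripping a scheduler's state dict, using an in-memory buffer in place of a real checkpoint file:

```python
import io

import torch

model = torch.nn.Linear(4, 4)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1.4e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=32_000, eta_min=1.4e-4
)

# Pretend we've trained for a while, so the schedule has advanced.
for _ in range(100):
    optimizer.step()
    scheduler.step()

# Save the scheduler's state, as we would alongside the other checkpoints.
buffer = io.BytesIO()
torch.save(scheduler.state_dict(), buffer)

# Resuming: create the objects afresh first, then load the state dicts.
resumed_optimizer = torch.optim.AdamW(model.parameters(), lr=1.4e-3)
resumed_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    resumed_optimizer, T_max=32_000, eta_min=1.4e-4
)
buffer.seek(0)
resumed_scheduler.load_state_dict(torch.load(buffer))

# The reloaded scheduler picks up exactly where the old one left off.
```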
The placement of the code to create the schedulers was also a little tricky; the "natural" place was just after the optimiser is created, as in the example code above. However, at that point, we don't know how many global steps we're going to have in the train, because we don't have the dataset -- which means that working out the numbers to pass in to the schedulers for the warmup and decay steps would be impossible. It turned out to be easiest to create them just after the datasets are loaded, as at that point we have all of the information we need. Anyway, that's the code done, so let's see what happens! I wanted to do two trains: one with the learning rate scheduling, and one with just the new value for the learning rate, 0.0014, in place of the old 0.0004. I was expecting the updated learning rate alone to be too high and to cause a very choppy train, but had high hopes for the train with the scheduling. Here's how it did, the scheduled learning rate train first. Looking at the training loss: quite a few loss spikes early on in the train when the learning rate was at its peak, but nothing unmanageable -- and, as you'd expect, things calmed down quite a lot later on. I also charted the learning rate, to make sure it really was doing what I thought it was doing. So, a pretty smooth train, and we definitely did the right learning rate scheduling. Time to upload it to Hugging Face, and see what the evals look like. Firstly, the smoke test: reasonably coherent, at least, though it's not super-impressive. On to the loss on our test set: that's our best loss so far! Let's put it into the table. So, it definitely looked like it was worth it. But was it the scheduling of the learning rate that helped, or just the change from 0.0004 to 0.0014? I kicked off a second run with no scheduling, just a learning rate of 0.0014, to see what would happen. After about an hour, I noticed that the loss chart had stopped updating.
The last point had a maximum and minimum loss but no average -- and after that, nothing. However, the learning rate was still being charted, so the train was definitely running. Looking at the checkpoint metadata showed what had happened: at global step 1851, the average loss was already NaN 3 , and the checkpoint at step 2468 -- and every checkpoint thereafter -- was in the same state. Clearly the parameters had gone off the rails -- exactly what we'd expect with an excessive learning rate. There was no point in continuing the train, as it was almost certainly unrecoverable, so I stopped it. Out of interest, I downloaded the model, but I couldn't even run the smoke test on it. So it was pretty clear that just updating the learning rate to 0.0014 was actively harmful. No need to upload that one to HF! And time to wrap up this experiment. While this has been quite a long post, I've really only scratched the surface of how learning rates are set. If I were doing things in more detail, the best approach would probably be a "sweep" over multiple values, to at least approximate the best possible rate for this model. That would be pretty expensive for me, though, so I decided to stick with the DeepSeek number. It might not be ideal for the specific architecture that I'm using, given how different that is to theirs, but given the results, it's a decent one compared to what I was using. 4 Something that I found interesting is that exactly how to schedule your learning rate is still an area of active research. Even in my relatively minimal reading, I came across three alternatives to the mainstream warmup-cosine decay pattern: per the Hugging Face post, some people do a warmup, then pause at a set level for a while, then start the cosine decay (warmup-stable-decay); DeepSeek use a relatively simple stepped function after a warmup 5 ; and a 2025 paper, "Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs", says that a linear decay (after a warmup) outperforms cosine. I'm sure there are many more. But for this train, I decided to stick to the mainstream, and the results were pretty good! To reiterate, this has been the most positive intervention so far. So I'll stick with that, and move on to the next thing: what is the weight_decay parameter that we're passing in to the AdamW optimiser?
…, which is the optimiser we're applying it to.
…, which the optimiser's learning rate is multiplied by to work out where we want to start.
…, which is likewise applied to the optimiser's learning rate to work out the value we're heading for.
…, which is the number of steps over which it should go from the initial learning rate to the final one.
…, which lets the scheduler know how many steps into its schedule it currently is; this defaults to …, meaning it hasn't started yet. This can be useful if you're resuming from a checkpoint, but for our purposes we can ignore it.

…, which is the same as the …'s.
…, which is the number of steps before it reaches its minimum.
…, the minimum learning rate we want to get to.
…, again the same as the …'s.

Per the Hugging Face paper, some people do warmup, then pause at a set level for a while, then start the cosine decay (warmup-stable-decay). DeepSeek use a relatively simple stepped function after a warmup.5 I came across a 2025 paper, "Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs", which says that a linear decay (after a warmup) outperforms cosine.

Tune in next time :-)

1. Yes, I am foreshadowing here.  ↩
2. To make my earlier analogy about learning rate decaying over time in people as they age even more dubious, we can imagine this as being rather like someone middle-aged going on an ayahuasca retreat ;-)  ↩
3. If you're wondering how we had a valid maximum and minimum in that first checkpoint when the average was NaN, here's why: …  ↩
4. You might wonder how large labs work out the right learning rate, given that their training runs cost millions of dollars. The answer is there in that DeepSeek paper, as that's one of the things they were doing. They scaled their model down from the billions of parameters that they wanted to train to various smaller models, and worked out the optimal learning rate for each of the smaller models by doing full trains on them. Once they had a mapping from model size to the ideal learning rate for their architecture, they could extrapolate that to the large ones that they wanted to train. The problem is that those "smaller" models are actually quite a lot larger than the one we're training here! And while we could potentially scale it down even further, I suspect that such truly tiny models (say, 1M parameters) wouldn't train well enough to give any meaningful results.  ↩
5. From the paper: "Specifically, the learning rate of the model reaches its maximum value after 2000 warmup steps, and then decreases to 31.6% of the maximum value after processing 80% of the training tokens. It further reduces to 10% of the maximum value after 90% of the tokens."  ↩
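The stepped schedule DeepSeek describe (linear warmup to the maximum over 2,000 steps, then drops to 31.6% and 10% of the maximum at 80% and 90% of training) is simple enough to sketch in plain Python. The warmup length and the decay percentages come from the quoted paper; `max_lr` and `total_steps` here are illustrative assumptions, not values from it:

```python
def deepseek_lr(step, max_lr=1.0, warmup_steps=2000, total_steps=100_000):
    """Learning rate at a given step under the stepped schedule
    quoted from the DeepSeek paper: warmup, then two fixed drops."""
    if step < warmup_steps:
        # Linear warmup from 0 to max_lr over the first warmup_steps steps.
        return max_lr * step / warmup_steps
    frac = step / total_steps
    if frac < 0.8:
        return max_lr            # full learning rate for 80% of training
    if frac < 0.9:
        return max_lr * 0.316    # 31.6% of maximum after 80% of tokens
    return max_lr * 0.1          # 10% of maximum after 90% of tokens
```

In real training code something like this would typically be wrapped in a scheduler (PyTorch's `LambdaLR` takes exactly this kind of multiplier function), but the schedule itself is just a handful of comparisons.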

iDiallo Today

I'm Not Lying, I'm Hallucinating

Andrej Karpathy has a gift for coining terms that quickly go mainstream. When I heard "vibe coding," it just made sense. It perfectly captured the experience of programming without really engaging with the code. You just vibe until the application does what you want.

Then there's "hallucination." He didn't exactly invent it; the term has existed since the 1970s. In one early instance, it was used to describe a text summarization program's failure to accurately summarize its source material. But Karpathy's revival of the term brought it back into the mainstream, and subtly shifted its meaning, from "prediction error" to something closer to a dream or a vision.

Now, large language models don't throw errors. They hallucinate. When they invent facts or bend the truth, they're not lying. They're hallucinating. And every new model that comes out promising to stay off the drugs still hallucinates. An LLM can do no wrong when all its failures are framed as a neurological disorder.

For my part, I hope there's a real effort to teach these models to simply say "I don't know." But in the meantime, I'll adopt the term for myself. If you ever suspect I'm lying, or catch me red-handed, just know that it's not my fault. I'm just hallucinating .


Note #727

just posted about dot files and attractors over at 2389.ai 2389.ai/posts/the… Thank you for using RSS. I appreciate you. Email me


The Beginning Of History

Hi! If you like this piece and want to support my work, please subscribe to my premium newsletter. It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 5000 to 185,000 words, including vast, extremely detailed analyses of NVIDIA , Anthropic and OpenAI’s finances , and the AI bubble writ large. I just put out a massive Hater’s Guide To Private Equity and one about both Oracle and Microsoft in the last month. I am regularly several steps ahead in my coverage, and you get an absolute ton of value, several books’ worth of content a year in fact! In the bottom right hand corner of your screen you’ll see a red circle — click that and select either monthly or annual. Next year I expect to expand to other areas too. It’ll be great. You’re gonna love it.

Before we go any further: no, this is not going to turn into a geopolitics blog. That being said, it’s important to understand the effect of the war in Iran on everything I’ve been discussing.

So, let’s start simple. Open Google Maps. Scroll to the Middle East. Look at the bit of water separating the Gulf Arab countries from Iran. That’s the Persian Gulf. Scroll down a bit. Do you see the narrow channel between the United Arab Emirates and Iran? That’s the Strait of Hormuz. At its narrowest point, it measures 24 miles across. Around 20% of the world’s oil and a similar percentage of the world’s liquefied natural gas (LNG) flows through it each year. Yes, that natural gas, the natural gas being used to power data centers like OpenAI and Oracle’s “Stargate” Abilene (which I’ll get to in a bit) and Musk’s Colossus data center.

But really, size is misleading. Oil and gas tankers are massive, and they’re full to the brim with incredibly toxic material. Spills are, obviously, bad. Also, because of their size, these tankers need to stick to where the water is a specific depth, lest they find themselves stuck.
As a result, there are two lanes that tankers use when navigating through the Strait of Hormuz — one going in, one going out. This is a sensible idea, meant to reduce the risk of collisions, but it also means that the potential chokepoint is even smaller.

Anyway, at the end of last month, Iran’s Revolutionary Guard Corps unilaterally closed off the strait, warning merchant shipping that any attempt to travel through the strait was “not allowed.” This closure, for what it’s worth, is not legally binding. Iran can’t unilaterally close a stretch of international waters. And yes, while some of those shipping lanes cross through Iran’s territorial waters (and Oman’s, for that matter), they’re still governed by the UN Convention on the Law of the Sea (UNCLOS), which gives ships the right to cross through narrow geographical chokepoints where part of the waters belong to another state, and which says that nations “shall not hamper transit passage.” That requirement, I add, cannot be suspended.

Still, merchant captains don’t want to risk getting themselves and their crews blown up, or arrested and thrown in Evin Prison. Insurers don’t want to pay for any ship that gets blown up, or indeed, for the ensuing environmental catastrophe. And the UAE doesn’t want its pristine beaches covered in crude oil.

And so, the tankers are staying put. And they’ll stay there until one of four things happens:

Of the first three, none feels particularly likely, at least in the short-to-medium term. Maybe I’m wrong. Maybe everything reverses and everyone suddenly works it out — Trump realizes that he’s touching the stove and pulls out after claiming a “successful operation.” The world is chaotic and predicting it is difficult.
Nevertheless, before that happens, closing the Strait of Hormuz means that Iran can inflict pain on American consumers at the pump, and we’ve already seen a 30% overnight spike in oil prices, with the price of a barrel jumping over $100 for the first time since 2022 (though as of writing this sentence it’s around $95). With midterms on the horizon, Iran hopes that it can translate this consumer pain to political pain for Donald Trump at the ballot box.

This is all especially nasty when you consider that the price of oil is directly tied to inflation. It influences shipping costs, and a lot of medicines, construction materials, and consumer objects have petrochemical inputs. In very simple terms, if oil is used to make your stuff (or get it to you), that stuff goes up in price. While this obviously hurts countries with which Iran has previously had cordial relations (particularly Qatar, which is a major exporter of LNG), I genuinely don’t think it cares any more.

I mean, Iran has launched drones and missiles at targets located within Qatar’s territory, resulting in (at the latest count) 16 civilian injuries. Qatar shot down a couple of Iranian jets last week. I’m not sure what pressure any of the Gulf countries could exert on Iran to make it back down.

I don’t see the security situation improving, either. Iran’s Shahed drones are cheap and fairly easy to manufacture, and were developed under some of the most punishing sanctions, when the country was cut off from the global supply chain. It then licensed the design to Russia, another heavily-sanctioned country, which has employed them to devastating effect in Ukraine.

Iran can produce these in bulk, and then — for a fraction of the cost of an American Tomahawk missile — send them out as a swarm to hit passing ships. Even without the ability to produce new ones, Iran is believed to have possessed a pre-war stockpile of tens of thousands of Shahed drones.
Shaheds aren’t complicated, or expensive, or flashy, or even remotely sophisticated, and that’s what makes them such a threat. It took Ukraine a long time to effectively figure out how to counter them, and it’s done so by using a whole bunch of different tactics — from land-based defenses like the German-made Gepard anti-aircraft gun, to interceptor drones, to repurposed 1960s agricultural planes, to (quite literally) people shooting them down with assault rifles from the passenger seat of propeller-powered planes.

Ukraine has the experience in combating these drones, and even still some manage to slip through its defences, often hitting civilian infrastructure. Airstrikes can probably reduce the threat to shipping (though not without exacting an inevitable and horrible civilian cost), but they can’t eliminate it.

Hell, even the Houthis — despite only controlling a small portion of Yemen, and despite efforts by a coalition of nations to degrade their offensive capabilities — still pose a risk to maritime traffic heading towards the Suez Canal.

Given the cargo these ships carry, any risk is probably too much risk for the insurers, for the carriers, and for the neighbouring countries. While I could imagine the US, at some point, saying “great news! It’s fine to go through the Strait of Hormuz now,” and though it has started offering US government-backed reinsurance for vessels, I don’t know if any shippers will actually believe it or take advantage of it.

And so, we get to the last point on my list. Regime change.

Do I believe that the Iranian government is deeply unpopular with its own people? Yes. Do I believe that said government can be overthrown by airstrikes alone? No. Do I believe that Iran’s government will do anything within its power to remain in control, even if that means slaughtering tens of thousands of their own people? Yes.

Even if there was an uprising, who would lead it?
Iran’s virtually cut off from the Internet, and movement within the country is restricted, making it hard for any opposition figures to organize. The two most high-profile outside opposition figures — Reza Pahlavi, the son of the former Shah, and Maryam Rajavi, leader of the MEK and NCRI — both have their own baggage, and they’re living in the US and France respectively.

As I said previously, this isn’t me wading into geopolitics, but more of a statement that there’s no way of knowing when things will eventually return to normal. This conflict might wrap up in a couple of weeks, or it might be months, or even longer than that.

All this amounts to a huge amount of global oil production being bottled up, made worse by the slight problem that Iran itself produces a lot of oil, sending most of it (over 80%) to China. With Iran unable to export crude, and its production facilities now under attack, China’s going to have to look elsewhere. Which will result in even higher oil prices.

Which, in turn, will make everything else more expensive.

That is what brings us back to the AI bubble.

Now, given that most of the high-profile data center projects you’ve heard about are based in the US, which is (as mentioned) largely self-sufficient when it comes to hydrocarbons, you’d assume that it would be business as usual.

And you would be wrong.

You see, this is a global market. Prices can (and will!) go up in the US, even if the US doesn’t import oil or natural gas from abroad, because that’s just how this shit works. Sure, there are variations in cost where geography or politics play a role, but everyone will be on the same price trajectory. While we won’t see the same kind of shortages that we witnessed during the last oil shock (the one which ended up taking down the Carter presidency), it will still hurt.
While the US managed to decouple itself from oil imports, it hasn’t (and probably can’t) decouple itself from global pricing dynamics. The US has faced a few major oil shocks — the first in 1973, after OPEC issued an embargo against the US following the Yom Kippur War, which ended the following year after Saudi Arabia broke ranks, and the second in 1979, following the Iranian Revolution — and both hurt…a lot. This won’t be much different.

First, inflation.

As the cost of living spikes, people will start demanding higher wages, which will, in turn, be passed down through higher prices. At least, that’s what would normally happen. Paul Krugman, the Nobel-winning economist, wrote in his latest Substack that US workers in the 1970s were often unionized, and they benefited from contractual cost-of-living increases in their work contracts.

Sadly, we live in 2026. Union membership hasn’t recovered from the dismal Reagan years, and with layoffs and offshoring, combined with an already tough jobs market, workers have little leverage to demand raises. We’re in an economy oriented around do-nothing bosses that loathe their workers, one where workers will get squeezed even further by the consequences of any economic panic, even if it’s one caused by multiple events completely out of their control. So, it’s unlikely that we’ll see a wage-based amplification of any inflation that comes from the current situation.

That said, depending on how bad things get, we will see inflation spike, and increases in inflation are usually met with changes in monetary policy, with central banks raising the cost of borrowing in an attempt to “cool” the economy (i.e., reduce consumer spending so that companies are forced to bring down prices).

And we’d just started to bring down interest rates, with the Fed announcing in December that it projected rates of 3.4% by the end of 2026.
Iran changes that in the most obvious way possible — if prices soar, interest rates may follow, and if rates go up, even by a percentage point or two, financing the tens and hundreds of billions of dollars in borrowing that the AI bubble demands will become significantly more expensive.

For some context, the International Monetary Fund’s Kristalina Georgieva recently said “...a 10% increase in energy prices that persists for a year would push up global inflation by 40 basis points and slow global economic growth by 0.1-0.2%,” per The Guardian, who also added…

And remember: the AI bubble, along with the massive private equity and credit funds backing it, is fueled almost entirely by debt. All this chaos and potential for jumps in inflation will also affect the affordability calculations that lenders will make before loaning the likes of Oracle and Meta the money they need, at a time when lenders are already turning their nose up at Blue Owl-backed data center debt deals.

The alternative is, of course, not raising interest rates — which, if the Fed loses its independence, is a possibility — which would be equally catastrophic, as we saw in the case of Turkey, whose president, Recep Tayyip Erdogan, has a somewhat… ahem… “unorthodox” approach to monetary policy. Erdogan believes that high interest rates cause inflation — a theory which he tested to the detriment of his own people. In simpler terms, Turkey has faced some of the worst hyperinflation in the developed world, and has a currency that lost nearly 90% of its value in five years.

It’s not just the data centers, either. As interest rates go up, VC funds tend to shrink, because the investors that back said funds can get better returns elsewhere, and with much less risk.
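To put the rate sensitivity described above in rough numbers, here is a hypothetical sketch; the principal and the rates are illustrative, not figures from any actual deal:

```python
def annual_interest(principal_billions, rate_pct):
    """Simple annual interest cost, in billions of dollars."""
    return principal_billions * rate_pct / 100

# A hypothetical $100B of data-center debt: moving from 6% to 8%
# adds $2B a year in interest before a single GPU is racked.
at_6 = annual_interest(100, 6.0)
at_8 = annual_interest(100, 8.0)
extra_per_year = at_8 - at_6
```

The point of the arithmetic: a two-point rate move on debt at this scale is billions of dollars a year, which is exactly the kind of cost that reshapes an affordability calculation.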
As I discussed in the Hater’s Guide to Private Equity, 14% of large banks’ total loan commitments go to private equity, private credit and other non-banking institutions, at a time when (to quote Forbes) PE firms are taking an average of 23 months fundraising (up from 16 months in 2021), after private credit’s corporate borrowers’ default rates (as in the loans written off as unpaid by the borrower) hit 9.2% in 2025.

Put really simply, private equity, private credit, venture capital and basically everything to do with technology currently depends on the near-perpetual availability of debt. The growth of private credit is so recent that we truly don’t know what happens if the debt spigot gets turned off, but I do not think it will be pretty.

Things get a little worse when you remember that famed business dipshits SoftBank are currently trying to raise a $40 billion loan to fund its three $10 billion Klarna-esque payments as part of its $30 billion investment in OpenAI’s not-actually-$110-billion-yet funding round. How SoftBank — a company that raised a $15 billion bridge loan due to be paid off in around four months, and that has about $41.5 billion in existing debt maturing and needing to be refinanced in the next nine months or so, per JustDario — intends to take on another $40 billion is beyond me. And that’s a sentence I would’ve written before the war in Iran began.

There’s also evidence that links lower IPO numbers to rising inflation rates, which means that achieving the exit that investors want will become so much harder — and so, they might as well not bother. Need proof? SoftBank-owned mobile payments company PayPay delayed its IPO last week, and I quote Reuters, because “...markets were rattled by [the attack] on Iran, according to two people familiar with the matter.” Inflation also negatively affects company valuations — which, again, will influence whether investors open their purse strings.
This is all a long-winded way of saying that the AI industry is about to enter a world of hurt. Every AI startup is unprofitable, which means they need to raise money from venture capitalists, who raise money from investors that aren’t paying them, pension funds and insurers, and private equity and credit firms that raise money from banks, both of which will struggle should central bank rates spike.

The infrastructural layer — AI data centers — also requires endless debt (due to the massive upfront costs for NVIDIA chips and construction), and that debt was already becoming difficult to raise.

Then there are the practical opex and capex costs. Higher interest rates mean that any contractors building the facilities will insist on higher fees, because their costs — labor costs, the price of filling up a van or a truck with gas, or paying for building materials — have gone up. And they’ll probably pad the increase a bit to account for any future rises in inflation.

Those gas turbines you’re running to power your facility? Yeah, feeding those is going to get much more expensive. Natural gas is up as much as 50%, and a lot of US capacity is going to serve markets in Asia and Europe to take advantage of the spike in prices, which will mean an increase in prices for US consumers.

In fact, you don’t even need interest rates to spike for things to get nasty. As the price of oil continues to skyrocket, flying a Boeing 747 filled with GB200 racks from Taiwan to Texas, or mobilizing the thousands of people that work (to quote Bloomberg) day and night to build Stargate Abilene, will become extraordinarily more expensive. And even in the very, very unlikely event that things somehow quickly return to whatever level of “normal” you’d call the world before the conflict started, even brief shocks to the financial plumbing are enough to destabilize an already-fractured hype cycle.
Last week, Bloomberg reported something I’d already confirmed three weeks ago — that OpenAI was no longer part of the planned expansion (past the initial two (of eight) buildings) of Stargate Abilene, a project that’s already massively delayed from its supposed “full energization” by mid-2026.

Oracle disputes the report (and if it’s wrong, I imagine investors will rightly sue), claiming that Crusoe [the developer] and Oracle are “operating in lockstep,” which doesn’t make sense considering the delays or, well, reality. My sources in Abilene also tell me that the expansion fell apart due to Oracle’s dissatisfaction with the revenue it was making on buildings one and two, and that a bidding war was taking place between Meta and Google for the future capacity.

Bloomberg’s Ed Ludlow also reports that NVIDIA put down a $150 million deposit as Crusoe attempts to lock down Meta as a tenant — a very strange thing to do considering Meta is flush with cash, suggesting a desperation in the hearts of everybody involved. It’s also very, very strange to have a supplier get involved in a discussion between a vendor and a customer, almost as if there’s some sort of circular financing going on.

As I reported back in October, Stargate currently only has around 200MW of power, and The Information reports that power won’t be available for a year or more, something I also said in October.

As self-serving as it sounds, I really do recommend you read my premium piece about the AI Bubble’s Impossible Promises, because I laid out there how stupid and impossible gigawatt data centers were before the war in Iran. We’ve already got a shortage of the electrical-grade steel and transformers required to expand America’s (and the world’s) power grid, we’ve already got a shortage of the skilled labor required to build that power (and data centers in general), and we’re moving massive amounts of heavy shit around a large patch of land using thousands of people, which will cost a lot of gas.
I don’t know why, but the media and the markets seem incapable of imagining a world where none of this stuff happens, clinging to previous epochs where “things worked out” and where “things were okay” without a second thought. In The Black Swan, Nassim Taleb makes the point that “…the process of having [journalists] report in lockstep [causes] the dimensionality of the opinion set to shrink considerably,” saying that they tend to “[converge] on opinions and [use] the same items as causes.”

In simpler terms, everybody reporting the same thing in the same way naturally makes everybody converge on the same kinds of ideas — that AI is going to be a success because previous eras have “worked out,” even if they can’t really express what “worked out” means. The logic is almost childlike — in the past, lots of money was invested in stuff that didn’t work out, but because some things worked out after spending lots of money, spending lots of money will work out here.

The natural result is that reporters (and bloggers) seek endless positive confirmation, and build narratives to match. They report that Anthropic hit $19 billion in annualized revenue and OpenAI hit $25 billion in annualized revenue — which has been confirmed to refer to a 4-week-long period of revenue multiplied by 12 — as proof that the AI bubble is real, ignoring the fact that both companies lose billions of dollars and that my own reporting says that OpenAI made billions less and spent billions more in 2025. They assume that a company would not tell everybody something untrue or impossible, because accepting that companies do this undermines the structure of how reporting takes place, and means that reporters have to accept that they, in some cases, are used by companies to peddle information with the intent of deception.
And thanks to an affidavit from Anthropic Chief Financial Officer Krishna Rao, filed as part of Anthropic’s suit against the Department of Defense’s supply chain risk designation, it’s clear that the deception was intentional, as the affidavit confirmed that Anthropic’s lifetime revenue “to date” (referring to March 9th 2026) is $5 billion, and that it has spent $10 billion on inference and training.

To be abundantly clear, this means that the month underlying Anthropic’s previous statement that it made $14 billion in annualized revenue (stated by Anthropic on February 12 2026, and referring, I’ve confirmed, to a month-long period multiplied by 12) — a period of 30 days in which it made $1.16 billion — accounts for more than 23% of its lifetime revenue.

This comes down to which Anthropic you believe, because these two statements do not match up. I am not stating that it is lying, but I do believe annualized revenue is a deliberate attempt to obfuscate things and give the vibe that the business is healthier than it is. I also do not think it’s likely that Anthropic made 23% of its lifetime revenue in the space of a month. What this almost certainly means is that the sources that told media outlets that Anthropic made $4.5 billion in 2025 were misleading them.

The exact quote from the affidavit is that “...[Anthropic] has generated substantial revenue since entering the commercial market—exceeding $5 billion to date,” and while boosters will say “uhm, it says ‘exceeding’,” if it were anything higher than $5.5 billion, Anthropic would’ve absolutely said so.

We can also do some very simple maths that suggests that Anthropic’s “annualized” figures are…questionable. On February 12 2026, annualized revenue hit $14 billion. Five days before the lawsuit was filed, it was $19 billion, “with $6 billion added in February” (per Dario Amodei at a Morgan Stanley conference), suggesting that annualized revenue had been $13 billion, or $1.083 billion a month.
Even if we assume a flat billion, that means that Anthropic made $2.16 billion between January and the end of February 2026. And that’s not including the revenue made in March so far.

But I’m a curious little critter, and went ahead and added up all of the times that Anthropic had talked about its annualized revenue from 2025 onward. The results (which you can find with links here!) show that, based on my calculations, just using published annualized revenues gets us to $4.837 billion.

We are, however, missing several periods of time, for which I’ve used “safe” (as in lower, so that I am trying to give Anthropic the benefit of the doubt) numbers, calculated based on the periods themselves. With these estimates, we get a grand total of $6.66 billion (ominous!), which is a great deal higher than $5 billion. When you remove the estimates and the annualized revenues for 2026, you get $3.642 billion, which heavily suggests that Anthropic did not, in fact, make $4.5 billion in 2025. There isn’t a chance in Hell this company made $4.5 billion in 2025 based on its own CFO’s affidavit. I also think it’s reasonable to doubt the veracity of these annualized revenues, or, in my kindest estimation, that Anthropic is using any kind of standard “annualized” formula.

Here are the ways in which people will try and claim I’m wrong: I think it’s reasonable to doubt whether Anthropic made anywhere near $4.5 billion in 2025, whether Anthropic has annualized revenues even approaching those reported, and whether anything it says can be trusted going forward. It appears one of the most prominent startups in the valley has misled everybody about how much it makes, or, if it has not, that somebody else is perpetuating a misinformation campaign.

Add together the annualized revenues. Look at the links. Do the maths. I got the links for annualized revenues from Epoch AI, though I have seen all of these before in my own research.
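That maths is simple enough to check in a few lines. The figures are the ones quoted above, in billions of dollars, and `monthly_from_annualized` just inverts the "one month multiplied by 12" definition of annualized revenue described earlier:

```python
def monthly_from_annualized(arr_billions):
    """Recover one month's revenue from an 'annualized' figure,
    i.e. a single month's revenue multiplied by 12."""
    return arr_billions / 12

feb_2026 = monthly_from_annualized(14)      # roughly $1.16B for the month
jan_2026 = monthly_from_annualized(19 - 6)  # $19B ARR minus the $6B "added in February"

# One month at the February run rate against ~$5B of lifetime revenue:
share_of_lifetime = feb_2026 / 5.0          # just over 23%
```

Nothing here is new information; it is just the division behind the "more than 23% of lifetime revenue in a single month" claim.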
People are going to try and justify why this isn’t a problem in all manner of ways. They’ll say that, actually, Anthropic made less money in 2025, but that’s fine because everybody could see what annualized revenues really meant. So far, nobody has a cogent response, likely because there isn’t one.

I haven’t even addressed the $10 billion in training and inference costs, because good lord, those costs are stinky, and based on my own reporting — which did not come from Anthropic, which is why I trust it! — Anthropic spent $2.66 billion on Amazon Web Services from January through September 2025, or around 26% of its lifetime compute spend. That’s remarkable, and suggests this company’s compute spend is absolutely out of control.

This leads me to one more quote from Anthropic’s CFO: Without attempting to influence their decision making, if I were a counterparty to a company like this, my biggest concern would now be that this filing appears to suggest that Anthropic’s revenues are materially smaller than I believed.

It might seem dangerous to be like me, pointing at stuff and saying “that doesn’t make sense!”, or questioning a narrative held by the entire stock market and most of modern journalism, but I’d argue the danger is that narrow, narrative-led, establishment-driven thinking makes it impossible for reporters to report. While you might be able to say “a source told me that something went wrong,” the natural drive to report on what everybody else is saying means that this information is often reported with careful weasel words like “still going as planned” or “still growing incredibly fast.” It’s a kind of post-factual decorum — a need to keep the peace that frames bad signs as bumps in the road and good signs as cast-iron affirmations of future success. This is a catastrophic failure of journalism that deprives retail investors and the general public of useful information.
It also — though it feels as if reporters are “getting scoops” or “breaking news” — naturally magnetizes journalists toward information that confirms the narrative, or “leaks” that are actually the company intentionally getting something in front of a reporter so that they (the reporter) can appear as if this was “investigative news” versus “marketing in a different hat.” It also means that modern journalism is ill-equipped for what’s coming, and no, this is not a “new” phenomenon. It is the same thing that led to the dot com bubble, the NFT bubble, the crypto bubble, the Clubhouse bubble, the AR and VR bubble, and many more bubbles to come.

To avoid being “wrong,” reporters are pursuing stories that prove somebody else right, which almost invariably ends with the reporter being wrong. “Pursuing stories to prove somebody else right” means that a great many reporters (and newsletter writers) that claim to be objective and fact-focused end up writing the narrative that companies use to raise money, using evidence manufactured by the company in question.

In some cases, this is an act of cowardice. Following the narrative because it’s easy and because everybody’s doing it adds a layer of reputation laundering. If everybody failed, everybody was conned, and thus nobody has to be held accountable, and because there really has never been any accountability for the media being wrong about any previous bubbles, the assumption is that it’ll never happen.

However you may feel about my work or what I’m saying, I need you to understand something: journalism, both historically and currently, is unprepared for the consequences of being wrong.
The current media consensus around the AI bubble is that even if it pops it will be fine, with some even saying that “even if OpenAI folds, everything will work out, because of the dot com bubble.” This is a natural attempt to rationalize and normalize the chaotic and destructive — an attempt to map how this bubble would burst onto previous bubbles, because new things are difficult and scary to imagine.

There has never been a time when the entire market crystallised around a few specific companies — not even the dot com bubble! — and then built an entire infrastructural layer mostly in service of two of them, with a price tag now veering close to the $1tn mark.

Let’s get specific. The scoffing and jeering I get from people when I say that AI demand doesn’t exist, or that AI companies don’t have revenues, or that OpenAI or Anthropic are unsustainable, is never met with a good faith response, just quotes about how “Amazon Web Services lost lots of money” or “Uber lost lots of money,” or that “these are the fastest growing companies of all time,” or something about “all code being written by AI,” a subject I discussed at length two weeks ago.

The Large Language Model era is uniquely built to exploit human beings’ belief that we can infer the future based on the past, both in how it processes data and in how people report on its abilities. It exploits media outlets that do not have people that are given the time (or held to a standard where they have) to actually learn the subjects in question, and sells itself based on the statement that “this is the worst it’ll ever be” and “previous eras of investment worked out.” LLMs also naturally cater to those who are willing to accept substandard explanations and puddle-deep domain expertise.
The slightest sign that Claude Code can build an app — whether it’s capable of actually doing so or not — is enough for people that are on television every day to say that it will build all software, because it confirms the bias that the cycle of innovation and incumbent disruption still exists, even if it hasn’t for quite some time. A glossy report about job displacement — even one that literally says that Anthropic found “no systematic increase in job displacement in unemployment” from AI — gets reported as proof that jobs are being displaced by AI, because it says “AI is far from reaching its theoretical capability: actual coverage remains a fraction of what’s feasible.”

This is an aggressive exploitation of how willing people with the responsibility to tell the truth are to accept half-assed explanations, and how willing people are to operate based on principles garnered from the lightest intellectual lifts in the world. The assumption is always the same: that what has happened before will happen again, even if the actuality of history doesn’t really reflect that at all.

Society — the media, politicians, chief executives, shit, everyone on some level — is incapable of thinking of new stuff that would happen, especially if that new stuff would be economically destructive, such as a massive scar across all private credit, private equity and venture capital, one so severe that it may potentially destroy the way that businesses (and startups, for that matter) raise capital for the foreseeable future.

People are more willing to come up with societally-destructive theories — such as all software engineering and all journalism and all content being created by LLMs, even if it doesn’t actually make sense — because it fits their biases. Perhaps they’re beaten down by decades of muting the power of labor, or the destruction of our environment.
Perhaps they’re beaten down by the rise of the right and the destruction of the rights of minorities and people of colour. Or, more noxiously, perhaps they’re excited to be the one that called it first for the benefit of the new overlords they perceive will own this (fictional) future, so much so that they’ll ignore the underlying ridiculousness of the economics, refuse to do any further reading that might invalidate their beliefs, or simply say whatever they’re told because it gets clicks and makes their advertisers, bosses or friends happy.

People are willing to fall in line behind mythology because conceiving an entirely different future is an intellectually challenging and emotionally draining act. It requires learning about a multitude of systems and interconnecting disciplines, and being willing to admit, again and again, that you do not understand something and must learn more. There are plenty of people that are willing to do this, and plenty more that are not, and those are the people with TV shows and columns in the newspaper.

I believe we’re in a new era. It’s entirely different. Stop trying to say “but in the past,” because the past isn’t that useful, and it’s only useful if you’re capable of evaluating it critically and skeptically, making sure that it’s actually the same rather than merely feeling like it is. I keep calling this era “The Beginning of History,” not because it directly reflects Francis Fukuyama’s theory (which relates to democracies), but because I believe that those who succeed in this world are not those who are desperate to neatly fit it into the historical failures or successes of the past, but those willing to stare at it with the cold, hard fury of the present.

There are many signs that the past no longer makes sense.
The collapse of SaaS (which I’ll cover in this week’s premium), the collapse of the business models of both venture capital and private equity, the collapse of democracies under the weight of fascism because the opposition parties never seem to give enough of a fuck about the experiences of regular people.

Using the past to dictate what will happen in the future is masturbatory. It allows you to feel smart and say “I know the most about everything, which means I know what’s going on.” It assumes, much like an LLM does, that simply reading enough is what makes somebody smart, that shoving a bunch of text in your head — whether or not you understand it is immaterial — is what makes somebody know something or be good at something.

It’s an intellectually bankrupt position that I believe will lead those unable to adapt to the reality of the future to destruction. It leads to lazy thinking that grasps at confirmations rather than any fundamental understanding, depriving the general public of good information in favor of that which confirms the biases and wants and needs of the malignant and ignorant.

It takes courage to be willing to be wrong deliberately, but only if you admit when you were wrong. That hasn’t happened in previous bubbles, and it has to happen for us to stop bubbles forming. I have made a great deal of effort to learn more as time goes on. I do not see boosters doing the same to prove their points. I will be pointing to this sentence in the future, one way or another.

So much more effort is put into humouring the ideas of the bubbles, into proving the marketing spiel of the bubbles, framed as a noxious “both-sides” that deprives the reader, listener or viewer of their connection with reality. It might be tempting to say this happens with cynicism too, except the majority of attention paid to bubbles is positive, and saying otherwise is a fucking lie. Need to justify unprofitable, unsustainable AI companies? Uber lost money before.
Need to explain why AI data centers being built ahead of demand isn’t a problem? Well, the internet exists, and people eventually used that fiber. You can ignore actual proof while pretending to provide your own, all just by pointing vaguely to things in the past. It takes actual courage to form an opinion, something boosters fundamentally lack.

I’m not saying it’s impossible to make predictions, but that the majority of people make them with flimsy information, such as “this thing happened before” or “everyone’s saying this will happen.” I’m not saying you can’t try to understand what will happen next, but doing so requires you to use information that is not, on its face, generated by wishcasting or events that took place decades ago.

In the end, the greatest lesson we can learn is that, historically speaking, people tend to fuck around and then find out. The assumption boosters make is that one can fuck around forever. History tends to disagree.

Iran rescinds its ban on travel through the strait.
The security situation improves (either because Iran’s ability to attack shipping becomes sufficiently degraded, or because the Gulf countries, or perhaps their Western allies, feel sufficiently confident that they can safely escort ships through the strait).
The current Iranian government is overthrown and the conflict ends.
Both sides reach an agreement and we return to the status quo.

April 1 to 30, 2025, which I estimate as $166 million, based on reports of Anthropic’s annualized revenue being $2 billion at the end of March 2025.
August 1 to August 20, 2025, which I estimate as $271 million, based on July 2025’s revenues ($4 billion).
November 1 to November 29, 2025, which I estimate as $556 million, based on October’s $7 billion in annualized revenues.
January 1 to January 11, 2026, which I estimate as $219.1 million, assuming $9 billion in annualized revenue (based on reported December revenues).

“Ed, it’s commercial revenue!” — this is all revenue.
Anthropic doesn’t have “non-commercial revenue,” unless you are going to use a very, very broad version of what “non-commercial” means, at which point you have to tell me why you trust Anthropic.

“This doesn’t include all the revenue up until March 2026! Maybe this suit was written weeks ago!” — even if it doesn’t, based on Anthropic’s own numbers, things don’t line up. Also, this was written specifically as part of the lawsuit with the DoD. It’s recent.

“It says ‘exceeding’!” — it also says “over $10 billion in inference and training costs.” Can I just say whatever number I want here? Because if this is your argument, that’s what you’re doing.

“That $5 billion number is accurate!” — the only way this makes sense is if some or all of these annualized revenues are incorrect.
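The period estimates above appear to come from prorating an annualized run rate over calendar days. A quick sketch of that method (the November window matches it almost exactly; some of the other windows drift from this simple formula):

```python
def period_revenue(annualized_usd: float, days: int) -> float:
    """Prorate an annualized revenue run rate over a number of calendar days."""
    return annualized_usd / 365 * days

# November 1 to 29 at a $7 billion annualized run rate
november = period_revenue(7_000_000_000, 29)
print(round(november / 1e6))  # ~556, matching the $556 million estimate
```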


Fragments: March 10

Tech firm fined $1.1m by California for selling high-school students’ data

I agree with Brian Marick’s response: No such story should be published without a comparison of the fine to the company’s previous year revenue and profits, or valuation of last funding round. (I could only find a valuation of $11.0M in 2017.) We desperately need corporations’ attitudes to shift from “lawbreaking is a low-risk cost of doing business; we get a net profit anyway” to “this could be a death sentence.”

❄                ❄                ❄                ❄                ❄

Charity Majors gave the closing keynote at SRECon last year, encouraging people to engage with generative AI: If I was giving the keynote at SRECon 2026, I would ditch the begrudging stance. I would start by acknowledging that AI is radically changing the way we build software. It’s here, it’s happening, and it is coming for us all. Her agenda this year would be to tell everyone that they mustn’t wait for the wave to crash on them, but to swim out to meet it. In particular, I appreciated her call to resist our confirmation bias: The best advice I can give anyone is: know your nature, and lean against it.

❄                ❄                ❄                ❄                ❄

In a LinkedIn comment on Kief Morris’s recent article on Humans and Agents in Software Loops, Renaud Wilsius may have coined another bit of terminology for the agent+programmer age: This completes the story of productivity, but it opens a new chapter on talent: The Apprentice Gap. If we move humans ‘on the loop’ too early in their careers, we risk a future where no one understands the ‘How’ deeply enough to build a robust harness. To manage the flywheel effectively, you still need the intuition that comes from having once been ‘in the loop.’ The next great challenge for CTOs isn’t just Harness Engineering, it’s ‘Experience Engineering’ for our junior developers in an agentic world.
❄                ❄                ❄                ❄                ❄

In hearing conversations about “the ralph loop”, I often hear it in the sense of just letting the agents loose to run on their own. So it’s interesting to read the originator of the ralph loop point out: It’s important to watch the loop as that is where your personal development and learning will come from. When you see a failure domain – put on your engineering hat and resolve the problem so it never happens again. In practice this means doing the loop manually via prompting or via automation with a pause that involves having to press CTRL+C to progress onto the next task. This is still ralphing as ralph is about getting the most out of how the underlying models work through context engineering and that pattern is GENERIC and can be used for ALL TASKS. At the Thoughtworks Future of Software Development Retreat we were very concerned about cognitive debt. Watching the loop during ralphing is a way to learn about what the agent is building, so that it can be directed effectively in the future.

❄                ❄                ❄                ❄                ❄

Anthropic recently published a page on how AI helps break the cost barrier to COBOL modernization. Using AI to help migrate COBOL systems isn’t a new idea to my colleagues, who shared their experiences using AI for this task over a year ago. While Anthropic’s article is correct about the value of AI, there’s more to the process than throwing some COBOL at an LLM. The assumption that AI can simply translate COBOL into Java treats modernization as a syntactic exercise, as though a system is nothing more than its source code. That premise is flawed. A direct translation would, in the best case scenario, faithfully reproduce existing architectural constraints, accumulated technical debt and outdated design decisions. It wouldn’t address weaknesses; it would restate them in a different language.
In practice, modernization is rarely about preserving the past in a new syntax. It’s about aligning systems with current market demands, infrastructure paradigms, software supply chains and operating models. Even if AI were eventually capable of highly reliable code translation, blind conversion would risk recreating the same system with the same limitations, in another language, without a deliberate strategy for replacing or retiring its legacy ecosystem.

❄                ❄                ❄                ❄                ❄

Anders Hoff (inconvergent): an LLM is a compiler in the same way that a slot machine is an ATM

❄                ❄                ❄                ❄                ❄

One of the more interesting aspects of the network of people around Jeffrey Epstein is how many people from academia were connected. It’s understandable why: he had a lot of money to offer, and most academics are always looking for funding for their work. Most of the attention on Epstein’s network has focused on those who got involved with him, but I’m interested in those who kept their distance and why - so I enjoyed Jeffrey Mervis’s article in Science: Many of the scientists Epstein courted were already well-established and well-funded. So why didn’t they all just say no? Science talked with three who did just that. Here’s how Epstein approached them, and why they refused to have anything to do with him. I believe that keeping away from bad people makes life much more pleasant; if nothing else it reduces a lot of stress. So it’s good to understand how people make decisions on who to avoid. If you are a reflexive naysayer or a pessimist, know that, and force yourself to find a way in to wonder, surprise and delight. If you are an optimist who gets very excited and tends to assume that everything will improve: know that, and force yourself to mind real cautionary tales.


Building on AT Protocol

AT Protocol has got me! I’m morphing into an atmosphere nerd. AT Protocol — atproto for short — is the underlying tech that powers Bluesky and new social web apps. Atproto as I understand it is largely an authorization and data layer. All atproto data is inherently public. In theory it can be encrypted for private use, but leaky metadata and de-anonymisation is a whole thing.

Atproto users own the keys to their data, which is stored on a Personal Data Server (PDS). You don’t need to manage your own. If you don’t know where your data is stored, good chance it’s on Bluesky’s PDS. You can move your data to another PDS like Blacksky or Eurosky. Or, if you’re a nerd like me, self-host your own PDS. You own your data and no PDS can stop you moving it.

Atproto provides OAuth; think “Sign in with GitHub”. But instead of an account being locked behind the whims of proprietary slopware, user identity is proven via their PDS. Social apps like Bluesky host a PDS allowing users to create a new account. That account can be used to log in to other apps like pckt, Leaflet, or Tangled. You could start a new account on Tangled’s PDS and use that for Bluesky. Atproto apps are not required to provide a PDS, but it helps to onboard new users.

Of course I did. You can sign in at attic.social. Attic is a cozy space with lofty ambitions. What does Attic do? I’m still deciding… it’ll probably become a random assortment of features. Right now it has bookmarks. Bookmarks will have search and tags soon.

Technical details: to keep the server stateless I borrowed ideas from my old SvelteKit auth experiment. OAuth and session state are stored in encrypted HTTP-only cookies. I used the atcute TypeScript libraries to do the heavy atproto work. I found @flo-bit’s projects, which helped me understand implementation details. Attic is on Cloudflare Workers for now. When I’ve free time I’ll explore the SvelteKit Bunny adapter. I am busy on client projects so I’ll be scheming Attic ideas in my free time.
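The stateless-session idea (all session state lives in a self-contained cookie, so the server stores nothing) can be sketched roughly like this. A minimal Python stand-in: the post encrypts the cookie, while this sketch only HMAC-signs it, and the secret key and field names are hypothetical:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"assumed-server-secret"  # hypothetical key; a real one comes from config

def seal_session(data: dict) -> str:
    """Serialize session state into a self-contained cookie value.
    Signing keeps the server stateless, but unlike encryption it
    leaves the payload readable by the client."""
    payload = base64.urlsafe_b64encode(json.dumps(data).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{sig}"

def open_session(cookie: str):
    """Verify the signature and recover the session state, or None if tampered."""
    payload, sig = cookie.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # reject a forged or modified cookie
    return json.loads(base64.urlsafe_b64decode(payload))
```

Served as an HTTP-only cookie, this lets any stateless worker validate a request without a session store; the trade-off is that the server cannot revoke a single session without rotating the key or adding an expiry inside the payload.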
What’s so powerful about atproto is that users can move their account and data. Apps write data to a PDS using a lexicon: a convention to say “this is a Bluesky post”, for example. Other apps are free to read that data too. During authorization, apps must ask for permission to write to specific lexicons. The user is in control.

You may have heard that Bluesky is or isn’t “decentralised”. Bluesky was simply the first atproto app. Most users start on Bluesky and may never be aware of the AT Protocol. What’s important is that atproto makes it difficult for Bluesky to “pull a Twitter”, i.e. kill third-party apps, such as the alternate Witchsky.

If I ever abandon attic.social your data is still in your hands. Even if the domain expires! You can extract data from your PDS. You can write a new app to consume it anytime. That’s the power of AT Protocol.

Thanks for reading! Follow me on Mastodon and Bluesky. Subscribe to my Blog and Notes or Combined feeds.
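The lexicon convention described above is easiest to see in the shape of a record. A minimal sketch using Bluesky's post lexicon (`app.bsky.feed.post` is the real NSID; the fields shown are the minimal ones for a post record):

```python
# A record as an app writes it to a user's PDS. The "$type" field holds
# the lexicon NSID, which tells any other app what schema the record uses.
post = {
    "$type": "app.bsky.feed.post",
    "text": "Hello from my own PDS",
    "createdAt": "2026-03-10T12:00:00Z",
}

def is_record_of(record: dict, nsid: str) -> bool:
    """Any app reading a repo can identify a record's lexicon by its $type."""
    return record.get("$type") == nsid

print(is_record_of(post, "app.bsky.feed.post"))  # True
```

Because the NSID travels with the data, a bookmarks app, a feed reader, or any future app can consume the same records without coordinating with the app that wrote them.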

Stratechery Yesterday

Copilot Cowork, Anthropic’s Integration, Microsoft’s New Bundle

Microsoft is seeking to commoditize its complements, but Anthropic has a point of integration of their own; it's good enough that Microsoft is making a new bundle on top of it.


Dependency tracking is hard

curl and libcurl are written in C: rather low level components present in many software systems. They are typically not part of any ecosystem at all; they’re just a tool and a library. In lots of places on the web, when you mention an Open Source project you also get the option to mention which ecosystem it belongs to: npm, go, rust, python etc. There are easily at least a dozen well-known and large ecosystems. curl is not part of any of those.

Recently there’s been a push for PURLs (Package URLs), for example when describing your specific package in a CVE. A package URL only works when the component is part of an ecosystem. curl is not. We can’t specify curl or libcurl using a PURL.

SBOM generators and related scanners use package managers to generate lists of used components and their dependencies. This makes these tools quite frequently just miss and ignore libcurl. It’s not listed by the package managers. It’s just in there, ready to be used. Like magic.

It is similarly hard for these tools to figure out that curl in turn also depends on and uses other libraries. At build-time you select which – but as we in the curl project primarily just ship tarballs with source code, we cannot tell anyone what dependencies their builds have. The additional libraries libcurl itself uses are all similarly outside of the standard ecosystems.

Part of the explanation is also that libcurl and curl are often shipped bundled with the operating system, or sometimes perceived to be part of the OS. Most graphs, SBOM tools and dependency trackers therefore stop at the binding or system that uses curl or libcurl, but without including curl or libcurl: the layer above, so to speak. This makes it hard to figure out exactly how many components and how much software is depending on libcurl.

A perfect way to illustrate the problem is to check GitHub and see how many among its vast collection of many millions of repositories depend on curl.
After all, curl is installed in some thirty billion installations, so clearly it is used a lot. (Most of them being libcurl of course.)

GitHub lists one dependency for curl. Repositories that depend on curl/curl: one. Screenshot taken on March 9, 2026.

What makes this even more amusing is that it looks like this single dependent repository (Pupibent/spire) lists curl as a dependency by mistake.
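The Package URL format mentioned earlier encodes the ecosystem directly into the identifier, which is exactly the piece curl lacks. A minimal parsing sketch for the simple form (real purls also allow a namespace, qualifiers and a subpath):

```python
def parse_purl(purl: str) -> dict:
    """Parse the simple pkg:type/name@version form of a Package URL."""
    assert purl.startswith("pkg:")
    ecosystem, _, name_version = purl[4:].partition("/")
    name, _, version = name_version.partition("@")
    return {"type": ecosystem, "name": name, "version": version}

print(parse_purl("pkg:npm/left-pad@1.3.0"))
# {'type': 'npm', 'name': 'left-pad', 'version': '1.3.0'}
```

The mandatory "type" component names a package ecosystem, so a component distributed only as source tarballs has no natural value to put there, which is the problem the post describes.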


10K curl downloads per year

The Linux Foundation, the organization that we want to love but that so often makes that a hard bargain, has created something they call “Insights” where they gather lots of metrics on Open Source projects. I held back and never blogged taunting OpenSSF for their scorecard attempts, which were always lame and misguided. This Insights thing looks like their next attempt to “grade” and “rate” Open Source.

It is so flawed and full of questionable details that I decided there is no point in me listing them all in a blog post – it would just be too long and boring. Instead I will just focus on a single metric: the one that made me laugh out loud when I saw it. They claim curl was downloaded 10,467 times the last year. (source)

Number of curl downloads the last 365 days according to Linux Foundation

What does “a download” mean? They refer to statistics from ecosyste.ms, which is an awesome site and service, but it has absolutely no idea about curl downloads. How often is curl “downloaded”?

curl release tarballs are downloaded from curl.se at a rate of roughly 250,000 / month.
curl images are currently pulled from docker at a rate of around 400,000 – 700,000 / day.
curl is pulled from quay.io at roughly the same rate.
curl’s git repository is cloned roughly 32,000 times / day.
curl is installed from Linux and BSD distributions at an unknown rate.
curl, in the form of libcurl, is bundled in countless applications, games, devices, cars, TVs, printers and services, and we cannot even guess how often it is downloaded as such an embedded component.
curl is installed by default on every Windows and macOS system since many years back.

But no, 10,467 they say.
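Taking only the rates above that are easy to annualize, even the low ends dwarf the Insights figure:

```python
tarballs_per_year = 250_000 * 12    # curl.se tarball downloads
docker_pulls_low = 400_000 * 365    # low end of the daily docker pull rate
git_clones_per_year = 32_000 * 365  # daily git clones

yearly_low_bound = tarballs_per_year + docker_pulls_low + git_clones_per_year
print(yearly_low_bound)             # 160,680,000 -- not counting quay.io,
                                    # distro installs or embedded copies
print(yearly_low_bound // 10_467)   # roughly 15,000x the Insights number
```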


What My 2025 Journal Taught Me

Last week I exported my entire Day One journal for 2025 (just the text file) and ran it through ChatGPT, mostly out of curiosity. None of the conclusions were surprising and I could have (and probably did) come to most of them myself. However, it was nice seeing them written out clearly. It was also nice being able to ask follow-up questions, dig a bit deeper into the patterns, and even read some of the insights out loud to my husband. But seeing it all laid out clearly made a few lessons impossible to ignore.

When I looked at the entries where I sounded the most content, they all had the same ingredients. Just simple things:

swimming in the sea
sitting on the beach
walking outside
quiet mornings with coffee
time with my children
small family adventures or road trips

Those entries have a noticeably calmer tone (according to ChatGPT). It’s a good reminder that the things that regulate me best are actually low-stimulus and simple. Just being outside and present.

Another pattern that appeared again and again in my journal: frustration with complexity, and small, everyday complexity at that:

organising systems
digital tools
social expectations
managing other people’s behaviour

I often catch myself mid-entry realising I’m spending time optimising systems instead of actually living (which I discover over and over, in my journaling and on this blog). It’s funny because my brain loves building systems (just look at these entries). But my journal makes it clear they don’t actually make me happier.

RELATED: The Cost of Organizing Ideas – But I Keep Doing It Anyway

One theme runs through almost the entire year: awareness of time passing.
I write about:

my son growing up
noticing my daughter becoming her own person
reflections on aging
wanting to live more fully
frustration about wasting time

More than anything, 2025 feels like a year where I started asking myself (although probably not just 2025): Am I actually living the life I want, or just organising it? And what am I going to do about it?

RELATED: The Art of Organizing (Things That Don’t Need to Be Organized)

If there is one clear emotional anchor in my journal, it’s my relationship with my children. Many of the most meaningful entries revolve around them:

teaching them how to swim in open water
lunch dates with one or both of them
watching them grow more independent
their humour and imagination
small family moments

Even tiny everyday experiences become meaningful when I write about them. Reading the year back made me realise that parenthood isn’t just part of my life - it’s the emotional core of it.

Another thing that stood out: my tone changes when I travel. Camping trips. Road trips. Travelling back to my home country. Visiting other countries. During those entries I sound more reflective, more observant, and more alive. Again, duh!

My journal also shows a constant push and pull between two sides of myself. One side is the project manager (at home and at work):

organisation
productivity
digital structure

The other side is the observer and writer:

noticing small moments
reflecting on life

When the organisational side takes over too much, I start to feel off balance. My happiest entries happen when structure supports reflection, not when structure replaces it.

One of the clearest patterns in the entire journal actually surprised me. The strongest predictor of whether my day felt good or bad wasn’t work, productivity, or even journaling. It was movement. Especially walking.
On days where I walk, swim, or do yoga, the tone of the entry is noticeably calmer and clearer (again, according to ChatGPT). On days where I stay indoors on the computer (especially if I end up working from home), I’m far more likely to spiral into overthinking. Even better is when three things happen together:

movement (walking/yoga)
being outside
low pressure (no digital tasks)

When those align, everything seems to reset.

Looking across all the entries, one theme keeps appearing: the life I seem to want most is actually very simple. It looks something like this:

quiet mornings with coffee and reading
daily movement outside
meaningful work, but not obsessive productivity
small adventures with the kids
travel (and this includes locally) when possible
writing as a natural outlet

I didn’t need ChatGPT to tell me this, though. I already know it, and yet I keep creating complexity (wanting to control) where my life clearly works better with simplicity (letting go of control).

So, summed up, the lesson of 2025 is this: not how to improve my systems, but how to protect the breathing room that makes life feel like living. I only have to make sure I do.
The Journal Project I Can’t Quit
The Art of Organizing (Things That Don’t Need to Be Organized)
Do Fewer Things, Do Them Well
The Cost of Organizing Ideas – But I Keep Doing It Anyway
A Journey Through Journaling, Tracking and Memories with Day One
Committing to the Thinking Life


Step aside, phone: closing thoughts

Four full weeks of paying more attention to phone screen time are behind us, and it’s time for some closing thoughts on this experiment. But first, a quick recap of how the final week went.

The average was slightly higher than the previous 3 weeks, and that was mainly due to what happened on Tuesday and Friday, which, as you can see from the weekly recap, saw higher-than-usual phone usage. On Tuesday, I passed 1 hour of screen time for the first time since the start of this experiment, and that was because of a…phone call? I’m not entirely sure why screen time registers a phone call as screen time, but that's why I passed the 1-hour mark on Tuesday. I had a 30-minute phone call for something work-related, and that apparently is picked up as screen time. Go figure. Aside from that, as you can see, usage was business as usual: about half an hour of messaging and a minute here and there for a few extra things.

Friday, I passed the 1-hour mark again, and this time it was actual usage, and it was just Telegram. As you can see from the time distribution, I spent almost 40 minutes chatting with a few people late in the day and aside from Telegram, I barely picked up my phone. The rest of the week was very uneventful.

Looking back at these past 4 weeks, I feel like, for me, the way my life is structured at this moment, 4 hours of weekly phone usage is the sweet spot, and I intend to keep it that way. I’m happy I managed not to consume content on my phone. Podcasts, music and RSS are gone from the site, and I feel like my relationship with this stupid object is in a much better place. I have deeper thoughts I want to share, but those will get their own dedicated post, likely tomorrow.

How about the others, though? I started this thing to help Kevin get off his phone, and I succeeded so well that he jumped off iOS entirely and moved to Android. Not exactly the outcome we wanted, but hey, at least it's a change.
He'll be back using his phone 5 hours a day now that nobody is paying attention. Kev instead is too busy vibe-coding blog platforms to pay attention to his phone, and he abandoned us after one week. As for John, Thomas, and Alex, they all did great, I'd say, and I love that Thomas tracked time spent in front of his computer and not just the phone.

Thank you for keeping RSS alive. You're awesome.

Email me :: Sign my guestbook :: Support for 1$/month :: See my generous supporters :: Subscribe to People and Blogs

Read Kevin's week four recap
Read Thomas' week four recap
Read John's week four recap
Read Alex's week three recaps


Perhaps not Boring Technology after all

A recurring concern I've seen regarding LLMs for programming is that they will push our technology choices towards the tools that are best represented in their training data, making it harder for new, better tools to break through the noise. This was certainly the case a couple of years ago, when asking models for help with Python or JavaScript appeared to give much better results than questions about less widely used languages.

With the latest models running in good coding agent harnesses I'm not sure this continues to hold up. I'm seeing excellent results with my brand new tools where I start by prompting "use uvx showboat --help / rodney --help / chartroom --help to learn about these tools" - the context length of these new models is long enough that they can consume quite a lot of documentation before they start working on a problem.

Drop a coding agent into any existing codebase that uses libraries and tools that are too private or too new to feature in the training data and my experience is that it works just fine - the agent will consult enough of the existing examples to understand patterns, then iterate and test its own output to fill in the gaps.

This is a surprising result. I thought coding agents would prove to be the ultimate embodiment of the Choose Boring Technology approach, but in practice they don't seem to be affecting my technology choices in that way at all.

Update: A few follow-on thoughts. The issue of what technology LLMs recommend is a separate one.
What Claude Code Actually Chooses is an interesting recent study by Edwin Ong and Alex Vikati, who prompted Claude Code over 2,000 times and found a strong bias towards build-over-buy, but also identified a preferred technical stack, with GitHub Actions, Stripe, and shadcn/ui seeing a "near monopoly" in their respective categories. For the sake of this post my interest is in what happens when the human makes a technology choice that differs from those preferred by the model harness.

The Skills mechanism that is being rapidly embraced by most coding agent tools is super-relevant here. We are already seeing projects release official skills to help agents use them - here are examples from Remotion, Supabase, Vercel, and Prisma.

iDiallo Yesterday

Why Am I Paranoid, You Say?

Technology has advanced to a point I could only have dreamed of as a child. Have you seen the graphics in video games lately? Zero to 60 miles per hour in under two seconds? Communicating with anyone around the world at the touch of a button? It's incredible, to say the least. But every time I grab the TV remote and decline the terms of service, my family watches in confusion. I don't usually have the words to explain my paranoia to them, but let me try.

I would love to have all the features enabled on all my devices. I would love to have Siri on my phone. I would love to have Alexa control the lighting in my house and play music on command. I would love to own an electric car with over-the-air updates. I would love to log in with my Google account everywhere. I would love to sign up for your newsletter. I would love to try the free trial. I would love to load all my credit cards onto my phone. I would love all of that. But I can't. I don't get to do these things because I have control over none of them.

When I was a kid, I imagined that behind the wild technologies of the future there would be software and hardware, pure and simple. Now that we have the tech, I can say that what I failed to see was that behind every product, there is a company. And these companies are salivating for data.

If you're like me, you have dozens of apps on your phone. You can't fit them all on the home screen, so you use a launcher to find the ones you don't open every day. Sometimes, because I have so many, I scroll up and down and still can't find what I'm looking for. Luckily, on most Android phones, there's a search bar at the top to help. But the moment I tap it, a notification pops up asking me to agree to terms and conditions just to use the search. Of course I won't do that.

Most people have Siri enabled on their iPhone and never think twice about it. Apple has run several ads touting its privacy-first approach.
Yet Apple settled a class action lawsuit last year alleging that Siri had violated users' privacy, to the tune of $95 million . I can't trust any of these companies with my information. They will lose it, or they will sell it. Using Alexa or Google Assistant is no different from using Siri. It's having a microphone in your home that's controlled by a third party. As enthusiastic as I am about electric cars, I didn't see the always-connected aspect coming. I've always assumed that when I pay for something, it belongs to me. But when an automaker can make decisions about your car while it sits in your garage, I'd rather have a dumb car. Unfortunately, it's no longer limited to electric vehicles. Nearly all modern cars now push some form of subscription service on their customers. Have you ever been locked out of your Google account? One day I picked up my phone and, for some reason, my location was set to Vietnam. A few minutes later, I lost access to my Google account. It's one thing to lose access to your email or files in Drive. But when you've used Google to log in to other websites, you're suddenly locked out of those too. Effectively, you're locked out of the internet. I was lucky my account was restored the same day; apparently there had been several login attempts from Vietnam. But my account was back in service just in time for me to mark another Stack Overflow question as a duplicate. I don't sign up for services with my real email just to try a free trial, because even when I decide not to continue, the emails keep coming. When my sons were just a few months old, I received a letter in the mail addressed to the baby. It stated that his personal information (name, address, and Social Security number) had been breached. He was still an infant. I had never heard of the company responsible or done any business with them, yet somehow they had managed to lose my child's information. I would love to not worry about any of this, but it's a constant inconvenience. 
Whenever I grab the TV remote, I accidentally hit the voice button, and the terms of service remind me that my voice may be shared with third parties . Technology is amazing when you have some control over it. But when the terms of service can change out from under you without warning, I'll politely decline and keep my tin hat close by. I have so much to hide .

0 views
Stratechery 2 days ago

MacBook Neo, The (Not-So) Thin MacBook, Apple and Memory

The MacBook Neo was built to be cheap; that it is still good is not only a testament to Apple Silicon, but also to the fact that the most important software runs in the cloud.

0 views
ava's blog 2 days ago

yesterday, in my body

If you are a generally healthy person, it can be hard to conceptualize how quickly someone's illnesses can suddenly turn within a few hours, so here is an example from yesterday to illustrate it. My wife and I had made plans to go to a tabletop/board game flea market at noon and then head over to a restaurant afterwards. I had slept well, I had a bit of breakfast, I put effort into my looks, I had no pain or other issues, everything was generally fine. My Crohn's disease had acted up here and there in the days prior, but no signs of that yesterday. On the way there, everything was fine. I had forgotten my noise-cancelling headphones at home, but the tram was surprisingly pleasant and manageable without them. I noticed I wasn't able to comfortably stand as long as I had now gotten used to (lower back pain from Bechterew's disease etc.), but I blamed it on being more sedentary recently. The flea market was so full, I only quickly walked through and then waited at the emptier entrance the rest of the time. At the restaurant, I tried a Vietnamese Iced Coffee for the first time, and oh boy... the restaurant really put some extra effort into that! It was very bitter and the coconut cream they included was extra, sickeningly sweet to make up for it. Since my month without caffeine, I had gotten extra sensitive to caffeine again, and I tend to react badly to lots of sugar, so I expected some negative consequences, but it was tasty. On the way home, I start feeling extremely anxious due to the caffeine. I'm overwhelmed and extra sensitive, every noise and smell is too strong, I feel deeply uncomfortable in my body and just want to run away. I can't even take away the sound element, because again, I had left the noise-cancelling headphones at home; deep regret at that point. When we make it home, I immediately free myself from everything that isn't necessary or comfortable and lie down in bed. 
I don't wanna be touched, and I don't want to talk, and if I have to talk, I whisper. Every sound feels like nails on a chalkboard, and every touch burns like lava. After some hours, I recover. I make some dinner with leftovers, and afterwards, decide I should work out at least a bit, as I feel okay again. A few minutes on my indoor cycle, and my body just feels off. I feel weak, but not the kind of weak you feel when you just need to eat or drink something. I start to feel really fatigued from the simplest and easiest movement, and I check my pulse on my watch. There it is, my best indicator that inflammation is currently high in my body: unusually high bpm for what I do. I was pedaling rather slowly without much resistance, and I was already at 122bpm when usually, I'd be at 104-110 max for this warmup/difficulty. Damn. I try to at least finish with very light, easy cycling, but I have to stop entirely. This kind of fatigue feels like you're forced to walk in slow motion, like a dream, or like being underwater; everything feels like it has a weird, invisible resistance, and your limbs are so heavy. I see if I can at least do some stretching and crunches on my yoga mat, and that's easier. I still feel weird and fragile, but it's manageable. When I stop, the fatigue hits me like a brick wall. I only have energy to change clothes and collapse onto the sofa. That's where my usual "my autoimmune disorders are acting up" routine starts; I can barely manage anything. I don't really want to move, especially not my arms. I can barely find the words or express myself due to massive brain fog. I feel like I am a tiny ball living in my chest cavity, stuck in a huge meat mech. When it gets bad, I can no longer even handle looking at my phone; I can just lie there and focus on my breathing. 
That usually goes hand in hand with some general pain and discomfort that's hard to localize and feels like a huge cloud surrounding me, and I ask my wife for my pain/anti-inflammation meds, because otherwise I just start writhing around groaning all the time. I also fall asleep on the sofa, only going to bed some unknown time later (probably close to midnight?). It's the next morning now, and I still feel a little off, but mostly fine, and I'll be taking it slow with my body today; no exercise, no going outside, and lots of rest, though I am working from home, and I have to study a bit for my exam tomorrow! Wish me luck. Unfortunately, it's no coincidence this stuff mostly happens around stress points like exams, and I'm sure the sugar and caffeine didn't help... 😐 Reply via email Published 09 Mar, 2026

0 views
(think) 2 days ago

Emacs and Vim in the Age of AI

It’s tough to make predictions, especially about the future. – Yogi Berra

I’ve been an Emacs fanatic for over 20 years. I’ve built and maintained some of the most popular Emacs packages, contributed to Emacs itself, and spent countless hours tweaking my configuration. Emacs isn’t just my editor – it’s my passion, and my happy place. Over the past year I’ve also been spending a lot of time with Vim and Neovim, relearning them from scratch and having a blast contrasting how the two communities approach similar problems. It’s been a fun and refreshing experience. 1 And lately, like everyone else in our industry, I’ve been playing with AI tools – Claude Code in particular – watching the impact of AI on the broader programming landscape, and pondering what it all means for the future of programming. Naturally, I keep coming back to the same question: what happens to my beloved Emacs and its “arch nemesis” Vim in this brave new world? I think the answer is more nuanced than either “they’re doomed” or “nothing changes”. Predicting the future is obviously hard work, but it’s so fun to speculate on it. My reasoning is that every major industry shift presents plenty of risks and opportunities for those involved in it, so I want to spend a bit of time ruminating over the risks and opportunities for Emacs and Vim. VS Code is already the dominant editor by a wide margin, and it’s going to get first-class integrations with every major AI tool – Copilot (obviously), Codex, Claude, Gemini, you name it. Microsoft has every incentive to make VS Code the best possible host for AI-assisted development, and the resources to do it. On top of that, purpose-built AI editors like Cursor , Windsurf , and others are attracting serious investment and talent. These aren’t adding AI to an existing editor as an afterthought – they’re building the entire experience around AI workflows. 
They offer integrated context management, inline diffs, multi-file editing, and agent loops that feel native rather than bolted on. Every developer who switches to one of these tools is a developer who isn’t learning Emacs or Vim keybindings, isn’t writing Elisp, and isn’t contributing to our ecosystems. The gravity well is real. I never tried Cursor and Windsurf simply because they are essentially forks of VS Code and I can’t stand VS Code. I tried it several times over the years and I never felt productive in it for a variety of reasons. Part of the case for Emacs and Vim has always been that they make you faster at writing and editing code. The keybindings, the macros, the extensibility – all of it is in service of making the human more efficient at the mechanical act of coding. But if AI is writing most of your code, how much does mechanical editing speed matter? When you’re reviewing and steering AI-generated diffs rather than typing code character by character, the bottleneck shifts from “how fast can I edit” to “how well can I specify intent and evaluate output.” That’s a fundamentally different skill, and it’s not clear that Emacs or Vim have an inherent advantage there. The learning curve argument gets harder to justify too. “Spend six months learning Emacs and you’ll be 10x faster” is a tough sell when a junior developer with Cursor can scaffold an entire application in an afternoon. 2 VS Code has Microsoft. Cursor has venture capital. Emacs has… a small group of volunteers and the FSF. Vim had Bram, and now has a community of maintainers. Neovim has a small but dedicated core team. This has always been the case, of course, but AI amplifies the gap. Building deep AI integrations requires keeping up with fast-moving APIs, models, and paradigms. Well-funded teams can dedicate engineers to this full-time. Volunteer-driven projects move at the pace of people’s spare time and enthusiasm. 
Let’s go all the way: what if programming as we know it is fully automated within the next decade? If AI agents can take a specification and produce working, tested, deployed software without human intervention, we won’t need coding editors at all. Not Emacs, not Vim, not VS Code, not Cursor. The entire category becomes irrelevant. I don’t think this is likely in the near term, but it’s worth acknowledging as a possibility. The trajectory of AI capabilities has surprised even the optimists (and I was initially an AI skeptic, but the rapid advancements last year eventually changed my mind). Here’s the thing almost nobody is talking about: Emacs and Vim have always suffered from the obscurity of their extension languages. Emacs Lisp is a 1980s Lisp dialect that most programmers have never seen before. VimScript is… VimScript. Even Lua, which Neovim adopted specifically because it’s more approachable, is niche enough that most developers haven’t written a line of it. This has been the single biggest bottleneck for both ecosystems. Not the editors themselves – they’re incredibly powerful – but the fact that customizing them requires learning an unfamiliar language, and most people never make it past copying snippets from blog posts and READMEs. I felt incredibly overwhelmed by Elisp and VimScript when I was learning Emacs and Vim for the first time, and I imagine I wasn’t the only one. I started to feel very productive in Emacs only after putting in quite a lot of time to actually learn Elisp properly. (never bothered to do the same for VimScript, though, and admittedly I’m not too eager to master Lua either) AI changes this overnight. You can now describe what you want in plain English and get working Elisp, VimScript, or Lua. “Write me an Emacs function that reformats the current paragraph to 72 columns and adds a prefix” – done. “Configure lazy.nvim to set up LSP with these keybindings” – done. 
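To make that concrete, here is the kind of thing a single plain-English prompt can now produce – a minimal Elisp sketch of the "72 columns with a prefix" example above. The function name is made up for illustration; `fill-column`, `fill-prefix`, and `fill-paragraph` are standard Emacs facilities.

```elisp
;; Hypothetical one-off helper of the sort described above: refill the
;; paragraph at point to 72 columns, prefixing continuation lines.
;; `fill-paragraph' consults the dynamically bound `fill-column' and
;; `fill-prefix' variables, so let-binding them scopes the change.
(defun my/refill-prefixed-72 (prefix)
  "Refill the current paragraph to 72 columns, using PREFIX as the fill prefix."
  (interactive "sPrefix: ")
  (let ((fill-column 72)
        (fill-prefix prefix))
    (fill-paragraph)))
```

Whether or not a generated draft like this is perfect on the first try, getting working code in seconds rather than after an evening with the Elisp manual is exactly what lowers the barrier.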
The extension language barrier, which has been the biggest obstacle to adoption for decades, is suddenly much lower. After 20+ years in the Emacs community, I often have the feeling that a relatively small group – maybe 50 to 100 people – is driving most of the meaningful progress. The same names show up in MELPA, on the mailing lists, and in bug reports. This isn’t a criticism of those people (I’m proud to be among them), but it’s a structural weakness. A community that depends on so few contributors is fragile. And it’s not just Elisp and VimScript. The C internals of both Emacs and Vim (and Neovim’s C core) are maintained by an even smaller group. Finding people who are both willing and able to hack on decades-old C codebases is genuinely hard, and it’s only getting harder as fewer developers learn C at all. AI tools can help here in two ways. First, they lower the barrier for new contributors – someone who understands the concept of what they want to build can now get AI assistance with the implementation in an unfamiliar language. Second, they help existing maintainers move faster. I’ve personally found that AI is excellent at generating test scaffolding, writing documentation, and handling the tedious parts of package maintenance that slow everything down. The Emacs and Neovim communities aren’t sitting idle. There are already impressive AI integrations – gptel, ellama, and agent-shell on the Emacs side; avante.nvim and codecompanion.nvim for Neovim – and this is just a sample. Building these integrations isn’t as hard as it might seem – the APIs are straightforward, and the extensibility of both editors means you can wire up AI tools in ways that feel native. With AI assistance, creating new integrations becomes even easier. I wouldn’t be surprised if the pace of plugin development accelerates significantly. Here’s an irony that deserves more attention: many of the most powerful AI coding tools are terminal-native. Claude Code, Aider, and various Copilot CLI tools all run in the terminal. And what lives in the terminal? Emacs and Vim. 
3 Running Claude Code in an Emacs buffer or a Neovim terminal split is a perfectly natural workflow. You get the AI agent in one pane and your editor in another, with all your keybindings and tools intact. There’s no context switching to a different application – it’s all in the same environment. This is actually an advantage over GUI-based AI editors, where the AI integration is tightly coupled to the editor’s own interface. With terminal-native tools, you get to choose your own editor and your own AI tool, and they compose naturally. Emacs’s “editor as operating system” philosophy is uniquely well-suited to AI integration. It’s not just a code editor – it’s a mail client (Gnus, mu4e), a note-taking system (Org mode), a Git interface (Magit), a terminal emulator, a file manager, an RSS reader, and much more. AI can be integrated at every one of these layers. Imagine an AI assistant that can read your org-mode agenda, draft email replies in mu4e, help you write commit messages in Magit, and refactor code in your source buffers – all within the same environment, sharing context. No other editor architecture makes this kind of deep, cross-domain integration as natural as Emacs does. Admittedly, I’ve stopped using Emacs as my OS a long time ago, and these days I use it mostly for programming and blogging. (I’m writing this article in Emacs with the help of ) Still, I’m only one Emacs user and many are probably using it in a more holistic manner. One of the most underappreciated benefits of AI for Emacs and Vim users is mundane: troubleshooting. Both editors have notoriously steep learning curves and opaque error messages. “Wrong type argument: stringp, nil” has driven more people away from Emacs than any competitor ever did. AI tools are remarkably good at explaining cryptic error messages, diagnosing configuration issues, and suggesting fixes. They can read your init file and spot the problem. They can explain what a piece of Elisp does. 
They can help you understand why your keybinding isn’t working. This dramatically flattens the learning curve – not by making the editor simpler, but by giving every user access to a patient, knowledgeable guide. I don’t really need any AI assistance to troubleshoot anything in my Emacs setup, but it’s been handy occasionally in Neovim-land, where my knowledge is relatively modest by comparison. There’s at least one documented case of someone returning to Emacs after years away , specifically because Claude Code made it painless to fix configuration issues. They’d left for IntelliJ because the configuration burden got too annoying – and came back once AI removed that barrier. “Happy f*cking days I’m home again,” as they put it. If AI can bring back lapsed Emacs users, that’s a good thing in my book. Let’s revisit the doomsday scenario. Say programming is fully automated and nobody writes code anymore. Does Emacs die? Not necessarily. Emacs is already used for far more than programming. People use Org mode to manage their entire lives – tasks, notes, calendars, journals, time tracking, even academic papers. Emacs is a capable writing environment for prose, with excellent support for LaTeX, Markdown, AsciiDoc, and plain text. You can read email, browse the web, manage files, and yes, play Tetris. Vim, similarly, is a text editing paradigm as much as a program. Vim keybindings have colonized every text input in the computing world – VS Code, IntelliJ, browsers, shells, even Emacs (via Evil mode). Even if the Vim program fades, the Vim idea is immortal. 4 And who knows – maybe there’ll be a market for artisanal, hand-crafted software one day. “Locally sourced, free-range code, written by a human in Emacs.” I’d buy that t-shirt. And I’m fairly certain those artisan programmers won’t be using VS Code. So even in the most extreme scenario, both editors have a life beyond code. A diminished one, perhaps, but a life nonetheless. 
I think what’s actually happening is more interesting than “editors die” or “editors are fine.” The role of the editor is shifting. For decades, the editor was where you wrote code. Increasingly, it’s becoming where you review, steer, and refine code that AI writes. The skills that matter are shifting from typing speed and editing gymnastics to specification clarity, code reading, and architectural judgment. In this world, the editor that wins isn’t the one with the best code completion – it’s the one that gives you the most control over your workflow. And that has always been Emacs and Vim’s core value proposition. The question is whether the communities can adapt fast enough. The tools are there. The architecture is there. The philosophy is right. What’s needed is people – more contributors, more plugin authors, more documentation writers, more voices in the conversation. AI can help bridge the gap, but it can’t replace genuine community engagement. Not everyone in the Emacs and Vim communities is enthusiastic about AI, and the objections go beyond mere technophobia. There are legitimate ethical concerns that are going to be debated for a long time:

Energy consumption. Training and running large language models requires enormous amounts of compute and electricity. For communities that have long valued efficiency and minimalism – Emacs users who pride themselves on running a 40-year-old editor, Vim users who boast about their sub-second startup times – the environmental cost of AI is hard to ignore.

Copyright and training data. LLMs are trained on vast corpora of code and text, and the legality and ethics of that training remain contested. Some developers are uncomfortable using tools that may have learned from copyrighted code without explicit consent. This concern hits close to home for open-source communities that care deeply about licensing.

Job displacement. If AI makes developers significantly more productive, fewer developers might be needed. This is an uncomfortable thought for any programming community, and it’s especially pointed for editors whose identity is built around empowering human programmers.

These concerns are already producing concrete action. The Vim community recently saw the creation of EVi , a fork of Vim whose entire raison d’être is to provide a text editor free from AI integration. Whether you agree with the premise or not, the fact that people are forking established editors over this tells you how strongly some community members feel. I don’t think these concerns should stop anyone from exploring AI tools, but they’re real and worth taking seriously. I expect to see plenty of spirited debate about this on emacs-devel and the Neovim issue tracker in the years ahead.

The future ain’t what it used to be. – Yogi Berra

I won’t pretend I’m not worried. The AI wave is moving fast, the incumbents have massive advantages in funding and mindshare, and the very nature of programming is shifting under our feet. It’s entirely possible that Emacs and Vim will gradually fade into niche obscurity, used only by a handful of diehards who refuse to move on. But I’ve been hearing that Emacs is dying for 20 years, and it’s still here. The community is small but passionate, the editor is more capable than ever, and the architecture is genuinely well-suited to the AI era. Vim’s situation is similar – the core idea is so powerful that it keeps finding new expression (Neovim being the latest and most vigorous incarnation). The editors that survive won’t be the ones with the flashiest AI features. They’ll be the ones whose users care enough to keep building, adapting, and sharing. That’s always been the real engine of open-source software, and no amount of AI changes that. So if you’re an Emacs or Vim user: don’t panic, but don’t be complacent either. Learn the new AI tools (if you’re not fundamentally opposed to them, that is). Pimp your setup and make it awesome. Write about your workflows. 
Help newcomers. The best way to ensure your editor survives the AI age is to make it thrive in it. Maybe the future ain’t what it used to be – but that’s not necessarily a bad thing. That’s all I have for you today. Keep hacking!

If you’re curious about my Vim adventures, I wrote about them in Learning Vim in 3 Steps . ↩︎
Not to mention you’ll probably have to put in several years in Emacs before you’re actually more productive than you were with your old editor/IDE of choice. ↩︎
At least some of the time. Admittedly I usually use Emacs in GUI mode, but I always use (Neo)vim in the terminal. ↩︎
Even Claude Code has vim mode. ↩︎

gptel – a versatile LLM client that supports multiple backends (Claude, GPT, Gemini, local models)
ellama – an Emacs interface for interacting with LLMs via llama.cpp and Ollama
aider.el – Emacs integration for Aider , the popular AI pair programming tool
copilot.el – GitHub Copilot integration (I happen to be the current maintainer of the project)
elysium – an AI-powered coding assistant with inline diff application
agent-shell – a native Emacs buffer for interacting with LLM agents (Claude Code, Gemini CLI, etc.) via the Agent Client Protocol
avante.nvim – a Cursor-like AI coding experience inside Neovim
codecompanion.nvim – a Copilot Chat replacement supporting multiple LLM providers
copilot.lua – native Copilot integration for Neovim
gp.nvim – ChatGPT-like sessions in Neovim with support for multiple providers

0 views

Human Intuition, AI Formalization: A Real Analysis Case Study

Disclaimer - I wrote the core ideas; Claude helped flesh out and polish the article. See appendix for more on this. This is a follow-up to my previous post on leaning on Claude for Lean. I’ve now worked up to chapter 8.3 in Tao’s companion. The speed is great and Claude’s capabilities continue to impress (autoformalization is possible, but not my goal). I haven’t been stuck on anything so far. I’ve also upstreamed many typos to the companion repo .

0 views