
How To (Not) Spend $10k/wk on Coding Agents

Word on the street is that the cost of building software is going to zero. Zero? Sounds like a good deal! Over the last year, my co-founder and I have iteratively automated our coding loops. Each time better tools revealed a bottleneck, we’d address it. Agents would sometimes break things or propose slop, so we added more tests and guardrails. Our PRs piled up, so we automated the mechanical parts of review. When UX review became a bottleneck, we had agents attach demo videos to PRs. Nothing revolutionary – just stubborn hill-climbing on our dev loops. By April, Jenn had our agents humming. They were automatically and safely fixing lints, nitpicks, merge conflicts, outdated dependencies, and other maintenance chores. Our velocity kept increasing, and I started to file bugs and propose improvements that we previously wouldn’t have had time for. We could respond to user requests same-day. We were working hard, but moving fast. It was also generating a lot of Anthropic and Cursor and OpenAI overage emails? But this is the way of the future. We’re a funded startup. And coding costs are supposed to go to zero. And Jenn’s shipping 30 PRs in a day now. Being delightfully responsive to our customers is a superpower! Okay okay, this is a lot of overage emails. I’ll assess our weekly spend. Hey Jenn? How surprised would you be if I told you we’re spending $10,000 a week on coding? I have been working a lot. But… that is too much. StrongDM infamously claimed in February that you should be spending at least $1,000 per engineer per day on your coding factory. Well, we did it. Do we get a prize for our profligacy? Yes, our coding agents were delivering a lot of improvements. But you can hire a pretty good engineer for $10k/week. Within a couple of days we’d cut this spend way down while maintaining most of the velocity (some techniques for this are below). But every week, I hear of more teams hitting this same transition point.
Coordinating agents to do something is less compelling if they’re more expensive than having a human do that same work. Thus, we’re moving from the era of “how can we use more coding agents?” to “how can we get the most out of our coding agent spend?” This change is all around us. Sam Altman has started saying that token costs are a “huge issue”. Brian Armstrong is talking about how to do more agentic coding at a lower cost. Uber is throwing model labs under the bus over agent spend. The vibe has shifted. There are a few reasons for this, but much of it is downstream from cloud coding. While you can go pretty far juggling a few agents on your laptop and leaving it propped open all day, getting the most out of coding agents really requires them to run in the cloud. We’ve seen a lot of workflow benefits from cloud coding, many of which are described in OpenAI’s Symphony coding factory blog post from April. However, it’s worth noting this in OpenAI’s writeup:

“This way of working dramatically reduces the cognitive cost of kicking off ambiguous work. If the agent gets something wrong, that’s still useful information, and the cost to us is near zero. We can very cheaply file tickets for the agent to go prototype and explore, and throw away any explorations we don’t like.”

“Very cheaply” if your tokens are free! Model labs’ employees are shielded from the cost of full-on software factories. For the rest of us, cloud coding gets expensive for the same reasons that cloud compute gets expensive:

- It makes it easy to do lots of work at once
- It’s more expensive per unit of work than using your own laptop
- It can make wasteful work go unnoticed

When we first got into cloud agents, we ended up primarily using Cursor’s. While Codex and Claude Code are the current monarchs of local development, Cursor’s cloud coding harness and workflows are quite a bit more mature than Codex’s or Claude’s.[1] However, this capability comes at a premium: Cursor charges a markup on top of Anthropic’s and OpenAI’s API prices, and you can’t turn off their “MAX” mode for cloud coding.

Using Claude Fable for Cursor Cloud Agents is fecking expensive.

At the AI Engineer World’s Fair this morning, OpenClaw creator Peter Steinberger gave a short talk about cloud coding workflows, where he joked about token costs: “Last year, I was primarily constrained by tokens. And I fixed that! By joining OpenAI.”

Luckily, there are other approaches to deal with exploding inference bills. Coding costs are the product of token cost and token count, so let’s look at each.

While the typical advice for coding has long been to use frontier models for everything, and model labs have been mostly focused on those most-expensive models (with Claude Fable hitting $10/Mtok), that’s starting to change:

- Today Anthropic announced Sonnet 5, at $3/Mtok
- GPT 5.6 will offer medium and small variants at $2.50 and $1 respectively
- GLM 5.2 is perhaps the most frontier-competitive open coding model in years, and it’s being offered at around $1
- Cursor’s Composer is only $0.5/Mtok[2] – 1/20th of Fable’s cost

While these smaller models can be much cheaper for easy tasks, it can be a pain to route queries to the right-sized model. Anybody who has run an engineering team has lived this: you select a set of tasks for a junior dev, thinking “these should be easy ones.” Most of them are easy enough, but one of them turns out to be fiendishly difficult when they dig in. If you’re lucky, Composer or GLM will throw up its hands when it’s assigned an overly difficult task, without wasting too much money. If you’re unlucky, it will spin for ages, coding confidently incorrect PRs that waste your time and money – or worse, get deployed to users. Various startups are working on model routers that attempt to accurately assess how hard an issue is, and assign a right-sized model to each task and subtask. I’m a bit skeptical that third parties will be able to produce better routers than the model labs themselves, but until the model labs have effective routers of their own, a combination of automatic and human routing is necessary to balance perf and costs. In coding, the most expensive tokens are still often worth the price, but we want to use them judiciously. And even for cheaper models, it’s best to use the minimum number of tokens necessary for a given unit of product improvement shipped.

That means sending less context for each turn of the agent, and taking fewer turns to get to success. The first step toward using fewer tokens is having an agent-ready codebase. There are well-known techniques for making a repo more navigable by agents. This is obviously easiest for greenfield projects, but there is a lot you can do to help agents be efficient in any codebase, reducing the need for them to explore and trial-and-error their way to understanding your product and verifying PRs. For example, working to shift verification left can save a lot of tokens. Just like for humans, it’s much cheaper for an agent to run and fix lints early on than to push to CI, tail CI logs, and only receive an error much later. Another useful tactic is to prune your context. In Claude Code, for example, you can type to get an overview of what your session is spending context on. It’s also worth reviewing expensive agent sessions after the fact using a tool like AgentsView, helping you assess what was using all those tokens. At one point we noticed agents running our unit tests with and processing this output. All they needed in context was a simple “success”! Going further, there are tools like Unblocked’s context engine that can help deduplicate and reduce how much context needs to be sent to the agent – collapsing the repetitive work of gathering and pruning that every agent session needs to do, making for a more deterministic and cheaper loop. And while cloud coding agents are awesome, it’s important to watch how their behaviour differs from local agents’. For example, Cursor’s cloud harness has a different system prompt that pushes it to continue relentlessly until it strictly needs user input, and to eagerly push intermediate work to GitHub. There are advantages to these behaviours, but we found our coding agents were pushing intermediate PRs to GitHub ~10x as often as our human-driven sessions. This led to a huge increase in CI runs and LLM-powered guardrail checks until we tamped this down.

And finally: don’t do shit you don’t need to do. Getting a coding factory up to speed can be intoxicating. It can feel urgent to fuel it with work. While having idle salaried engineers is indeed wasteful, it’s better to let your coding agents run idle if you need to think, talk to customers, and understand what really needs to be built. Don’t simply forge onward in the wrong direction at high velocity. We’re in the early stages of rethinking how software engineering works. Some teams are still working toward using many agents at once, and the rest of us are grappling with the side effects of doing so. Each generation of coding agents is capable of doing more helpful work, and making yet bigger messes. More factory-like coding patterns require us to develop new tools and expertise to use them well. Occasionally, getting there is messy, expensive, or frustrating. But the prize on the other side is being able to build better and more useful software than we ever could before. Plus, it’s a lot of fun to figure out.

[1] OpenAI has been teasing that an overhauled cloud coding environment is coming. Here’s hoping Anthropic is too.

[2] Composer also has a Fast variant that is $3/Mtok, but for cloud coding you often don’t need to pay for the extra speed. Another wrinkle with Composer: cache hits are billed at a full 40% of the uncached token price, as opposed to Claude’s mere 10%.


Unfinished, part deux

Two years ago, I published a post entitled Unfinished. It was a way for me to share some thoughts without having to work on them as much as I do on regular posts. As I wasn’t sure if these “lesser thoughts” were worth my efforts and my time, I compiled them in a different post format, inspired by a song:

“This post is inspired by the excellent track entitled Lamb’s Garbage (Unfinished), from the classic album and one of my favourites, Mr Oizo’s Lambs Anger. The concept of the song, as its title suggests, is to regroup bits of songs that were never completed as full tracks.”

Well, here we are again. The text file where I jot down all my ideas, quick thoughts, and potential topics for blog articles is starting to get a bit too long for my liking, so I think it’s time for a little clean-up. What you will see below is what was saved from the big flush, and what I don’t share on social media since I am no longer participating. Think of this as a list of intros, tweets, and blurbs of what was going on in my head recently. I believe some of these themes can be used later for a full post; in the meantime, feel free to use them for your own blog. And if you don’t have a blog, please, start a blog.

I love spreadsheets. This is something I find a bit difficult to admit, but I do like working in spreadsheets. I even firmly believe that Google Sheets is Google’s best product. I already like lists, but a spreadsheet is on another level. I like to make my spreadsheets look pretty, I like to plan how they will look, I like to build them, I like to make them functional and easy to read. For me, it’s a very pleasing and interesting thing to do at work: there are so many possibilities. When I create a spreadsheet, I feel like an app developer. I feel like a graphic designer. I had a co-worker once whose job included the creation and design of very complex spreadsheets for other teams, using Microsoft Excel, Microsoft Power BI, and such.
The resulting spreadsheets were glorious: fully featured and interactive dashboards, gathering data from different sources in real time. Works of art.

Are answers from A.I. chatbots recycled for other users asking the exact same thing, or are answers always generated from scratch? Wouldn’t reusing them be cheaper and more energy-efficient? If I ask “explain the difference between irony and happenstance”, will the A.I. chatbot just paste an existing, perfectly fine answer (one that received positive feedback in previous chats), or will it work to generate a brand new answer?

Why do so many people keep saying “Samsung charger” or “iPhone charger” instead of USB-C, USB Type-C, or just USB? Despite these cables and connectors being ubiquitous in our lives, I see a lot of people completely ignoring what they are called. I wonder why.

Don't brag so much about using A.I. It’s great that you used A.I. to do this thing you’re presenting. I can see how it has been useful and how much faster it helped you reach your goals. I understand that without A.I. you could never have pulled this off. I know it’s a way to show how you are part of the A.I. revolution, that you’re not left behind. No shame in that. I work with A.I. a lot too; I’m not judging you for that. But please, don’t present your use of A.I. as a skill. It’s just a tool. Your skills are elsewhere. Having access to tokens is a weird flex. The tools you use and how you use them may interest a few of your peers, but what you create with these tools is what truly matters. Do you know what type of video cameras were used in your favourite film? Do you care? By the way, the same piece of advice applies to air fryers.

If we work so hard on automating our current tasks and projects with A.I. agents, how will we tell which ones are worth doing at all? Does everything need to be A.I.-enabled and optimised? Are we reproducing the same mistake that we made with social media, shoving it everywhere we could?
On that topic, I highly recommend this excellent article on The Verge. Efficiency is not the ultimate goal for most people: efficiency for what? For whom? Besides, friction is not always a problem: sometimes friction is how new ideas spark to life.

If you are, like me, an avid consumer of Techmeme, you will have noticed that A.I. companies get a huge part of the coverage these days. I don’t know if it’s an editorial choice of Techmeme or if it’s just a reflection of the public reception of said news, but my gosh it seems that Gemini or ChatGPT or Claude gets an incremental update every day, and they float on top of the site’s homepage seemingly forever. I wouldn’t mind a separate site just for A.I. news, the way Mediagazer does what Techmeme does but for everything media-related. I’d call it Datacenter, and it would make Techmeme a bit more interesting.

I recently discovered that something I immensely dislike has a name: the Rae Dunn style for household items.

Billionaires cannot stand the idea of a democracy where their individual vote is, technically, worth exactly as much as the vote from the person who takes care of their laundry. They hate that. So what do they do? They buy media or social media companies to try to influence thousands to vote like them.

Side note on the ridiculous LinkedIn habit of putting a link in the comments of a post, and writing in the post “Link in the comments”. Just put the link in the post, as you’re supposed to, so we can have a nice preview of the post, and we don’t have to look at the even more ridiculous comments of every LinkedIn post. How messed up is that? I know it’s for better “reach” and to trick the algorithm, but you just look thirsty for likes. Isn’t that link the thing you wanted to share? Do you prefer a click or a like? What’s a like good for if nobody visits your link?
Thankfully, I don’t have a LinkedIn account, and I can ignore this nonsense most of the time, but I do check on a few LinkedIn posts for work and this is making me both sad and angry. On Instagram, the whole “Link in bio” was necessary because that was the only way to share links back then. But LinkedIn? No excuse. Yes, it sucks that their algorithm prefers posts that won’t send users out of their precious, shitty platform. I’m with you. But you don’t have to play their silly little game. You’re better than this.


Writing an LLM from scratch, part 34a -- building a JAX training loop for an LLM training run

For over a year, I've been using Sebastian Raschka's book "Build a Large Language Model (from Scratch)" -- and the multitude of side-projects that have branched out from reading it -- as something like a curriculum for learning about modern AI. The one final task I had set myself was to build and train an LLM from scratch just using my notes -- no reference to the book, no reference to the model code I'd written following the book. As an output, I wanted something as good as my best PyTorch model based on Raschka's code -- a base model, trained on 3.2B tokens, that my (admittedly limited) evals ranked as being close to the original GPT-2 small's quality. I wanted to use a different framework, just to make sure I wasn't parroting code that I'd somehow memorised, so I asked people on Twitter which one I should use, and the winner was JAX. I took a slightly different route to Raschka's book; he takes an inside-out perspective, explaining things like attention, gradually building up a complete GPT-2-style model, and then building a training loop on top of it. I wanted to go outside-in: I'd put together a training harness to train the simplest-possible model with an API similar to a real LLM, get that working to my satisfaction, and then add features to that simple model, one by one, until it had the full architecture in place. The plan (which actually worked out nicely!) was that I'd be able to show how each change improved things. That's all done now, and I'm posting about it in two parts; in this one, I'll explain how I built the training harness, and in the next, I'll show the actual building and training of the LLM. So let's get started! JAX itself has a relatively minimal API, and doesn't include standard neural network components like linear layers. Likewise, it doesn't have any built-in optimisers, data loaders or similar ML utilities. Now, I could have decided to build my LLM using just pure JAX, like I previously did with a toy XOR model.
But I felt that it would be better to build this in the style that real-world JAX code is written, which would mean using some of the many utility libraries. On the JAX site itself, there was a useful-looking link: "If you’re looking to use JAX to train neural networks, check out the JAX AI Stack!" On the linked page, it made it clear that the two core parts of that stack were: I took a look at both, and they seemed pretty easy to grasp. Indeed, at first glance, I felt that NNX looked pretty PyTorch-like! In their tutorial example, the only real obvious difference was the JAX-y derivative-style gradient calculation and the way that random numbers were handled. And even the random numbers were handled in a less pure-functional way than pure JAX -- instead of having to mess around with splitting keys, you could just pass in what appeared to be a stateful variable that somehow split itself internally as needed. So, NNX and Optax were the frameworks I'd use. Rather than grinding through the tutorials, I decided that I'd just dive right in, and try to pick things up as I went along. How hard could it be...? To build a functioning training loop, I needed a minimal model to train -- not an actual LLM, but something that behaved at least a bit like one. It would take in a sequence of tokens, and spit out logits for each token. In my preferred model of how LLMs work, at the top level for a model, we feed in a sequence of token IDs, then: All of that suggested to me that the dumbest "LLM" I could write just to get started would be one that just projected token IDs into embedding space, and then projected back to vocab space. No Transformer layers at all. I'd then train it so that instead of trying to predict the next token, it would try to "predict" what was fed into it in the first place.
In other words, you'd feed the training loop this input: ...and this target ...rather than the normal setup for an LLM, where you feed it ...and give it targets of If I could get that to work -- and it felt like the kind of thing where you'd be able to get the loss down to near-zero without a huge amount of training -- then I could be reasonably sure that I had a working training loop. 1 I decided to call this an A-to-A model. Coding up the model itself was ridiculously simple: it looked like this: There's as much boilerplate in there -- for the parameters that I knew that the model would need when I built out the full LLM -- as there is actual code doing stuff! But the training loop was a bit more fun. As I said, my plan here was to make sure my understanding of the internals of LLMs was correct by rebuilding one just from my notes. That "notes only" restriction didn't apply to the training loop itself, so I allowed myself to crib a bit from the PyTorch DistributedDataParallel code that I'd been using to train the original model in the cloud. The first version that I used is here. Let's start at the bottom, where we have the function . It starts with some boilerplate to handle the concept of "runs". This is a pattern I've found myself using in most of my projects. When working on a model, it's useful to be able to do multiple training runs, changing things each time. You want to keep the checkpoints, metadata and training charts for each one for future reference. So in my repo, I'll have a "runs" directory, and in there subdirectories for each training run I want to track. In those subdirectories, there are JSON files -- one to configure the model, , and one to configure the training hyperparameters and similar stuff, . (It's worth noting that at this stage, a bunch of those hyperparameters were unused; I kept them in there out of laziness, as I knew I'd need them later.) So we start our function by loading those.
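Since the A-to-A model's code isn't reproduced here, this is a rough stand-in for what it does. It swaps Flax NNX out for plain JAX arrays (so it is not the NNX module from the post), and the names `init_params`, `forward`, `VOCAB_SIZE` and `EMB_DIM` are all hypothetical: an embedding lookup followed by a projection straight back to vocab space, with the targets simply being the inputs.

```python
import jax
import jax.numpy as jnp

VOCAB_SIZE = 50257  # GPT-2's vocabulary
EMB_DIM = 64        # hypothetical; the real value would come from the run's model config

def init_params(key, vocab_size=VOCAB_SIZE, emb_dim=EMB_DIM):
    """The only two weight matrices an A-to-A 'LLM' needs."""
    emb_key, out_key = jax.random.split(key)
    return {
        # Token ID -> embedding lookup table: (vocab_size, emb_dim).
        "tok_emb": jax.random.normal(emb_key, (vocab_size, emb_dim)) * 0.02,
        # Embedding -> vocab-space projection: (emb_dim, vocab_size).
        "out_proj": jax.random.normal(out_key, (emb_dim, vocab_size)) * 0.02,
    }

def forward(params, token_ids):
    """(batch, seq) token IDs -> (batch, seq, vocab) logits, with no Transformer layers."""
    x = params["tok_emb"][token_ids]  # (batch, seq, emb_dim)
    return x @ params["out_proj"]     # (batch, seq, vocab_size)

params = init_params(jax.random.PRNGKey(0))
inputs = jnp.array([[464, 3290, 286], [1, 2, 3]])
targets = inputs  # A-to-A: each position's target is the input token itself
logits = forward(params, inputs)
print(logits.shape)  # (2, 3, 50257)
```

Even this toy keeps the same interface as a real LLM (token IDs in, one logits vector per position out), which is what lets the training loop be reused unchanged as layers are added.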
Our next step is to completely ignore one of the training hyperparameters, . I definitely wanted to do gradient accumulation, but decided to leave it for later. Better to get a solid, simpler training run done first, I felt. Next, we download the dataset we're going to use to our local disk with (which will only download if there's not an up-to-date copy already there). The next step is to call to load it into RAM. You can see that there's another hard-coded variable there, . This is a holdover from the multi-GPU DistributedDataParallel code that this was all based on; in this blog post I'm only covering the code for single-GPU training, but I decided to leave the DDP stuff in there for dataset-wrangling purposes, hardcoded to one GPU, so that it would be easier to re-introduce if I later decide to implement something similar in JAX. Let's take a look at and its related stuff. If you go up to line 39 you'll see the code. Firstly, there's a that keeps track of our training data. If you look closely, you might spot one oddity in that class. We have this: Remember that at this stage, the plan was to train the model to map tokens to themselves rather than to make next-token predictions. So the targets are the same as the inputs, not the more normal next token, which would look like (and, in the next post, will look like) this: Next, we have a function to load the appropriate subset of the data from the copy on the local disk into one of those objects. I hit an out-of-memory issue when I ran the first version of this. It was trying to load the data into my GPU's VRAM -- JAX's default behaviour if you have a GPU, and the CUDA version of JAX is installed -- and there was too much to fit in there. After a bit of digging around I learned how to change the JAX default device so that it would be loaded into normal system RAM.
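As a rough sketch of that default-device change (a random NumPy array stands in for the downloaded tokens, and the variable names are hypothetical), JAX's `jax.default_device` context manager makes arrays created inside it land on the CPU rather than in VRAM:

```python
import jax
import jax.numpy as jnp
import numpy as np

cpu = jax.devices("cpu")[0]

# Hypothetical token data standing in for the real downloaded dataset.
tokens_np = np.random.randint(0, 50257, size=1_000_000, dtype=np.int32)

# Make the CPU the default device only while the dataset is being created,
# so the array is allocated in host RAM instead of GPU VRAM.
with jax.default_device(cpu):
    tokens = jnp.asarray(tokens_np)

print(tokens.devices())
```

If I'm reading JAX's placement rules right, an array created this way lives in host RAM but is not *committed* to the CPU, so later computations that touch it outside the context can still pull it onto the default GPU. That matches the slowdown described next.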
Unfortunately, once I'd done that, I found that iterating through it was super-slow -- it took about 1.2 seconds to get one training batch of 6,144 tokens out of the array, which meant that I'd have a limit of 5,120 tokens/second of training from that alone. I eventually learned that the data had been loaded into the main RAM, but was being copied up to the GPU for processing because it had not been committed to the main RAM -- details here. Fixing that (with an explicit call to ) meant that getting a single training batch from the dataset and putting it onto the GPU took less than 0.001s, which was much better. So that was many hours of work that all got packed into lines 55 to 58 of the code: The remainder of the logic in is just to make sure that we have a dataset that is exactly the right size for the world size (even though that's always one right now), the microbatch size, the gradient accumulation steps, and the sequence length that we're working with. Let's go back to the function again. Having loaded our dataset, we create our model, passing in the model configuration stuff and also the (currently unused) dropout rate training hyperparameter, then we create a Flax NNX optimiser which wraps an Optax one. This was essentially a copy/paste from the Flax tutorial, except we're configuring the optimiser with learning rate and weight decay hyperparameters from the training config: Finally, we call to kick off our training loop, passing in some appropriate stuff. Let's go to that function next. We start off with a bit of housekeeping, then go into the main loop. You can see that it's kind of gesturing at gradient accumulation: ...but if you look at the actual body of that loop, it's not doing anything of the sort. It's just getting training batches, putting them on the GPU, doing a full training step, and keeping track of some metrics: So, we're just doing a traditional batch-by-batch training loop without gradient accumulation right now.
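Here's a hedged sketch of the general pattern, not a copy of lines 55 to 58: `jax.device_put` with an explicit device commits the whole dataset to host RAM, and then only each 6 × 1,024 = 6,144-token batch is copied to the accelerator. The `get_batch` helper and its `a_to_a` flag are hypothetical, added to show both the A-to-A targets used at this stage and the next-token targets used later.

```python
import jax
import numpy as np

cpu = jax.devices("cpu")[0]
accel = jax.devices()[0]  # the default device: a GPU if one is visible, else the CPU

# Hypothetical token data standing in for the real dataset.
tokens_np = np.random.randint(0, 50257, size=100_000, dtype=np.int32)

# device_put with an explicit device commits the array to host RAM, so slicing
# it runs on the CPU rather than quietly shipping the whole array to the GPU.
tokens = jax.device_put(tokens_np, cpu)

def get_batch(i, batch_size=6, seq_len=1024, a_to_a=True):
    n = batch_size * seq_len
    start = i * n
    x = tokens[start:start + n].reshape(batch_size, seq_len)
    if a_to_a:
        y = x  # A-to-A: targets are the inputs themselves
    else:
        # Next-token prediction: the same window moved one token along.
        y = tokens[start + 1:start + n + 1].reshape(batch_size, seq_len)
    # Only this one batch is copied across to the accelerator.
    return jax.device_put(x, accel), jax.device_put(y, accel)

x, y = get_batch(0)
x2, y2 = get_batch(1, a_to_a=False)
```

With next-token targets, `y2[:, j]` equals `x2[:, j + 1]` for every position except the last, which is exactly the shift an LLM is trained on.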
But some of the infrastructure is there, because it was the next thing I wanted to add after I'd got the basic loop working. The rest of the function is just housekeeping and checkpointing; we'll come back to the checkpointing shortly, but first let's take a look at the function that actually trains the model on a set of inputs and targets, and its associated function -- they're just above . Now, as you might remember from my first JAX post, the best way to JIT a training loop is at as high a level as possible. So when I first coded this, I integrated that into the traditionally-named function like this: When I actually came around to run it the first time, loss wasn't falling at all, and after banging my head against it for a while, I realised I should have used rather than , fixed that, and kicked it off again. Loss started falling immediately. D'oh! Now let's take a look at loss. Cross entropy loss was clearly what I would need to train an LLM, and also felt like the right thing for the A-to-A model. Optax has five loss functions that are related to cross entropy; three of them looked a bit more complicated than I needed: So it was a choice between The latter was the right one -- expects the labels (that is, the target token IDs) to be one-hot vectors, while , as it says in the function name, expects integer labels, which is what we have. That sounded pretty similar to PyTorch's , but there was an important difference. For normal use (if you're not using K-dimensional loss, whatever that might be) PyTorch expects that the inputs are either just a one-dimensional tensor of c logits, or at worst a b x c matrix, where b is the batch size. I had noted when working through this section of Raschka's book that the code we wrote flattened things out.
So a batch of six sequences, each 1,024 tokens long, with a vocab size of 50,257, would give us a logits tensor shaped like this: The first axis is the batch, the second is the length of the sequences -- remember, we have logits for every input token in the sequence, with next-token predictions for that token in the context of all of the other ones to its left. And the last axis, with a size equal to our tokeniser's vocabulary size, is the logits themselves. After flattening, it looked like a "batch" of 6 * 1024 = 6144 logits vectors: Likewise our targets -- the token IDs we wanted our model to be predicting -- were batched, and there was one per token in each sequence, so that tensor was Flattened, it looked like a "batch" of 6 * 1024 = 6144 targets: Finally, the PyTorch function returned a scalar value -- wrapped in a PyTorch tensor, of course, so that it could participate in the backward pass, but a single number. But I'd forgotten about all of that when I was writing this part of the JAX code, and just fed the inputs and the targets straight into the JAX function. The result was interesting. I started with this: And printing out the shapes of each variable gave this: It had returned a cross entropy number for every element in every sequence, across all of the batches! What's interesting is that the docs for imply that it has the same restrictions as PyTorch's -- it expects a single batch axis in the tensors that are passed in. Perhaps they're out of date? Or perhaps Optax just assumes that you know that in JAX "a batch axis" should be read as "as many batch axes as you want"? Well, anyway -- it worked, and I checked that the numbers were solid. Now, of course, we can't ask JAX for gradients using that 6 × 1024 matrix -- the loss function needs to return a scalar -- but the function on a JAX array does exactly what we need. So I had a solid loss calculation, which you can see in : So that's covered our loss function and the JITted that uses it.
The only remaining code that I haven't gone over in this version of the script is the stuff immediately above -- and . These are both called as part of the housekeeping code I glossed over in the function, after we take checkpoints. They just redraw a plot of the loss and other training metrics, using stuff that's stored in the metadata of all of the checkpoints so far. That means that there's a nice graphical way to keep track of a training run. Fairly dull stuff, so there's no need to go through them, but it is worth taking a look at the checkpointing code itself. You can see the version I was working with at this point here. It's not really much of a checkpoint; I was saving the model itself and the metadata needed for that charting code, but not the optimiser state, which would be needed for a real checkpoint. After all, the purpose of a checkpoint is to be able to pick things up again if your training loop crashes, and you can't do that without the optimiser's state. Still, it was enough to get started with. That said, one wrinkle I encountered when writing that simple checkpointing code was that it was a tad tricky to save them in Safetensors format -- you can see the details here. So, that was my initial training code. It was time to let it rip: could I train my dumb "LLM" to map from A to A? As I mentioned earlier, the very first run didn't converge at all -- loss started at about 10.82, which was promising (it's exactly what you'd expect for a randomly-initialised network trying to predict GPT-2 tokens -- see here for details), but then it remained there. But when I fixed the " should be " issue, it started dropping. After 92,160,000 tokens seen, it seemed to have hit zero (at least to the three decimal places I was printing), so I baked that into and did another training run fixed to that number of tokens. After about 14 minutes, it finished: A very promising final loss, even though that was just whatever we got on the last batch!
The actual loss chart looked like this: If you're used to the loss charts in my previous posts, there's something to highlight here: I've switched the Y axis over to being log, so those bumps near the end are actually tiny deviations away from 0.001. I think it's worth showing what the model actually did at this point. It was actually somewhat later that I wrote some code to load up the model checkpoints from these training runs and do some smoke tests, but I'll show you some results now. I wrote some code based on my JAX safetensors post to load up a model's parameters from a checkpoint's file: ...and then wrote two test scripts. Firstly, was it really mapping from A to A? I wanted to be sure that the loss number was actually reflecting what I wanted it to reflect. I wrote a simple script that took a Safetensors file on the command line, and ran the first verse of The Rime of the Ancient Mariner (chosen because it uses oldish English so there are some odd tokens in it) through the LLM it loaded from that file. Here's what the model at the end of the run came up with: That's great! It could certainly handle the mapping. Out of interest, I decided to see how quickly it had learned to get that right. The average training loss in that "best" checkpoint at the end of the training run was 0.0001, so how did the mapping improve, and what was the loss, near the start of the training run? For the first checkpoint, when we'd just run one batch through, we had an average training loss of 10.8242. With the model parameters that were saved then, we get this output: As you'd expect from that loss, it's total token salad. Now let's take a look at the next checkpoint, taken after 375 "global steps" -- that is, 6,000 batches. In that one, the average train loss since that first checkpoint was 2.9323. But that hides something important -- the maximum loss, near the start, was (as you would expect) 10.78524, not much less than the average loss in the previous checkpoint. 
But the minimum (which we can safely assume was towards the end of this checkpointing period) was 0.54155, so we can reasonably assume that the model improved very rapidly at this point. And the A-to-A test bears this out: So, we can see that the bulk of the improvement happened right at the start! It was able to pass the A-to-A test for that fairly unusual sequence after just 6,001 total batches of six 1,024-token sequences. The rest of the training run was perhaps just grinding out improvement on rarer tokens, and perhaps making it more certain about already-correct predictions. After all, the test script was simply printing the most likely token for each position, so at this stage it might have been predicting some of those tokens with only 51% probability. That would still have meant a penalty in the loss function, even though the answer was actually correct. So that was an interesting script; I wanted to do another -- the standard smoke test that I've been using, based on Raschka's prompt: how does the model complete "Every effort moves you" when asked to continue the sentence? Here's the script, and here's what it generated: That makes perfect sense. In order to generate the next token in an autoregressive loop, we're looking at the logits for the last one in the prompt. When it first runs, the last token is " you", and our model is trained to map A to A, so its result is " you". We append that to the prompt, run it through again, the last token is still " you", so of course it "predicts" the token " you" again. And so on. So these results were both good news! The A-to-A mapping was working, and was converging rapidly in terms of loss -- and even more rapidly in terms of our poetic test. So, what was next? I wanted the training loop to be as similar as possible to the code I used for my best locally-trained PyTorch model. That used three things I had not built into the training loop at this stage: learning rate scheduling, gradient clipping, and gradient accumulation.
The PyTorch code also had the ability to restart from a checkpoint -- not super-important in a 14-minute training run like this one, but I figured it would become important later. After all, the PyTorch runs on my local machine had taken almost two days, and if something went wrong halfway through (cat jumping onto PC power button, etc.) then I really wouldn't want to start from scratch. I decided to handle gradient accumulation first. In PyTorch, doing gradient accumulation is pretty simple: the core of a typical training loop without it might look something like this: We start off by clearing out any gradients that are stashed on the model's parameters, then do a forward pass, work out the loss, do a backward pass to put new gradients on the parameters, and then step the optimiser to apply those gradients. Accumulating gradients just means changing it to something like this: That is, we do a forward and a backward pass times. Because we're not zeroing out existing gradients between them, the parameters will accumulate gradients over time -- each backward pass will add its contribution onto what is already there. Each time, we divide the loss by , so that the gradients that are put on the parameters are that much smaller, which means that by the end of our loop we've got gradients that are the average of what we'd have got if we'd done all of these microbatches in one big batch. Finally, once we've exited the loop, we step the optimiser to apply those averaged gradients. When I started thinking about implementing this in JAX, I noticed that Optax has a help page on how to do it, but then I had one of those brilliant shower thoughts that one sometimes has. I should have learned by my age that they rarely work out well, but this time I decided to give it a go rather than doing things the official way. My brilliant idea was that with some finessing, we could put the whole gradient accumulation loop inside JITted code.
From what I'd learned so far, the higher up in our code we put the JIT decorator -- that is, the more of the training loop it covered -- the faster it would be. In itself, that wasn't a bad idea. But my first implementation was less smart: The were full-step arrays (e.g. shaped (16, 6, 1024) for 16 gradient-accumulation steps, each a microbatch of 6 sequences of 1,024 tokens), and the targets likewise. That seemed very clever! But in retrospect, it was obviously doomed to failure, and when I ran it, I ran out of VRAM. The point of gradient accumulation is that what you accumulate over time is, well, gradients. So you have to do a full forward pass and then a backward pass over the model for each microbatch, letting gradients build up, and then apply those in one go, like the PyTorch code did. Unfortunately what I was doing with my code was essentially all of the forward passes, one by one, letting the activations and JAX's internal structures representing what calculations had been done accumulate -- not the gradients -- and then doing a single backward pass across all of that. Mathematically it made sense -- I would have got the right effect if I'd had enough VRAM -- but it wasn't much more memory-efficient than just doing a single big batch of sequences. Immediate CUDA OOM. My second attempt was a bit more sensible and ran OK without the JIT: You can see that now I was doing both the forward and the backward pass within the loop, and then working out the mean gradients with that , then passing those average gradients to the optimizer. It all made sense, and seemed to work when I ran it: ...and it wasn't as much slower as I would have expected given the lack of JITting: 1,146 seconds versus 843. It was interesting that the final train loss was higher than the run without gradient accumulation, but larger effective batch sizes are not always a better thing: it depends very much on the model you're training and the data.
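The second, un-JITted attempt described above can be sketched roughly as follows. This uses a toy linear model as a stand-in for the LLM; the model, loss and data are illustrative assumptions, not the post's code. Each microbatch gets its own forward and backward pass via `jax.value_and_grad`, and the per-microbatch gradient pytrees are then averaged leaf by leaf. With equal-sized microbatches, that average should match the gradient of one big batch:

```python
import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    """Mean squared error for a toy linear model (a stand-in for the LLM)."""
    predictions = x @ params["w"] + params["b"]
    return jnp.mean((predictions - y) ** 2)


def accumulated_loss_and_grads(params, xs, ys):
    """Run forward and backward passes one microbatch at a time, then average.

    xs and ys carry a leading axis with one entry per microbatch.
    """
    losses = []
    grads_per_microbatch = []
    for x_micro, y_micro in zip(xs, ys):
        loss, grads = jax.value_and_grad(loss_fn)(params, x_micro, y_micro)
        losses.append(loss)
        grads_per_microbatch.append(grads)

    # Average each parameter's gradient across the microbatches.
    mean_grads = jax.tree_util.tree_map(
        lambda *leaves: jnp.mean(jnp.stack(leaves), axis=0), *grads_per_microbatch
    )
    return jnp.mean(jnp.stack(losses)), mean_grads


params = {"w": jnp.array([0.5, -1.0]), "b": jnp.array(0.1)}
# Three microbatches of two examples, each example with two features.
xs = jnp.arange(12.0).reshape(3, 2, 2) / 10.0
ys = jnp.array([[1.0, 0.0], [0.5, 0.2], [-0.3, 0.4]])

loss, grads = accumulated_loss_and_grads(params, xs, ys)

# The same data as one big batch, for comparison.
full_grads = jax.grad(loss_fn)(params, xs.reshape(6, 2), ys.reshape(6))
```

This is the PyTorch divide-by-n idea in another form. Averaging n microbatch gradients gives the same result as summing the gradients of each microbatch's loss divided by n.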
The batch size and number of gradient accumulation steps I was using were ones I had optimised for the full 163M-parameter GPT-2-style LLM, not for this model. So it was OK if it was a bit worse. Anyway, I tried adding the to that function, and ran it: Ouch. And looking at the traceback, it appeared that it was the actual JITting that was running out of VRAM. Something to do with loop unrolling, perhaps? I dug around for a while, trying to use JAX's rather than a normal Python one, but to no avail -- I would always run out of GPU memory. Eventually, after a few hours, the alarm bells on my side quest detector had become too loud to ignore. Reluctantly, I gave up on hand-rolling my own gradient accumulation, and implemented it the Optax way. That was actually really nice and simple. The code is here, but the change is tiny and simple to explain. Remember that we had this code to set up the optimizer: That creates a Flax NNX optimiser, which uses an Optax AdamW optimiser under the hood. The Optax way to do gradient accumulation is to wrap the optimiser in a helper, which -- with the NNX optimiser wrapping the result -- looks like this: The wrapper is really neat. It has the same interface as a regular optimiser, so its method can be called with a set of gradients. But instead of applying them, it just accumulates them until a particular number of calls to have been made, at which point it actually applies the mean of the accumulated gradients, and resets its counter so that it starts accumulating again. That's a really nice API. And it meant that I could have simplified the training loop. Remember, we had this: The loop-within-a-loop was needed by the PyTorch code, because we needed to do the optimizer step at the end to apply the accumulated gradients. But with the Optax wrapper, we could have just iterated over our samples in one top-level loop, relying on the to make its updates every iterations.
However, I decided to leave the loop-within-a-loop in -- keeping track of the training in terms of global steps meant that the training output with my JAX model would be easier to compare to the PyTorch versions. Perhaps if I'd been building the training loop completely from scratch I would have chosen differently. Anyway, with that code change in, I ran it, and: I had the same loss at the end as the by-hand un-JITted version, which was reassuring. And it was slightly faster than the non-gradient-accumulating version, but it's a small enough difference that it was probably just noise. So that was gradient accumulation! Here's the code with that added. Next, I wanted to get charting and scheduling of the learning rate, and gradient clipping, working. Scheduling the learning rate means that we'll be changing it over the course of the run -- like this example from one of my PyTorch training runs: Having a chart like that one is really useful, as it allows you to sanity-check that the changes you are making to the learning rate really are the right ones. So I wanted to add the charting first, and then the scheduling. The boilerplate code to actually generate the chart, given learning rate numbers in the checkpoints' metadata, was already there, so I had to work out how to extract the current value of the learning rate from the optimiser and then save it into the checkpoints. This was the obvious starting point. Optax optimisers themselves don't store the learning rate, but if you create them like this: ...where the in the brackets is the normal stuff that you'd pass in to the optimizer when creating it, then you can extract the learning rate later. However, the code on that help page was using the Optax optimiser directly, whereas the one in my training code was wrapped inside a , which was in turn wrapped inside an NNX object, like this: Still, the solution seemed reasonably clear.
I could use the trick on the that I was creating, and then pass it in to be wrapped like this: The next question was how to actually read the learning rate from that optimiser. The sample code in the Optax docs looked like this: Again, that was using the Optax optimiser directly, rather than one that was wrapped inside an NNX one. However, in the docs for NNX's optimiser I noticed that it exposes its wrapped Optax one's state as . I put in some temporary debug code to print that, and saw that it was the ' state, which made sense -- and that, in turn, contained the state of the wrapped one as . That had a field called , which was a dictionary that included as a key. Finally, the value that that key pointed to was a object. To get the actual value from there, you need to call its , which gives you a JNP array, so we then needed to call on it. All of that led to the following abomination unto God, mankind, and the Law of Demeter: Eurgh. I mean, really, eurgh. Well, anyway, I put code to do that into the function and saved the number as part of the metadata. I did a partial training run, just long enough to confirm that the learning rate chart was being generated, and it had a flat line on it at 0.0014, the constant learning rate I was using at that point. I can't say I was very proud of it, though. To recap, the learning rate schedule that I wanted was this: That's formed of two phases: an initial warmup, where the learning rate started at 0.00001 times the desired peak value and then rose linearly to the peak, followed by a cosine curve to decay it to 0.1 times the peak. In PyTorch I had had to use different learning rate scheduler objects to handle each phase, with a wrapper to bolt them together: However, it's a common pattern in training loops, and conveniently Optax provides a helper that does all of that for you. The only oddity in it is that is kind of misnamed; it's actually total steps, including the warmup.
So I wound up writing this code: I did a training run with that, and it completed with this: The loss was a bit worse again, but just as with the gradient accumulation steps, the learning rate schedule I had specified was specifically designed for training a real (if small) LLM, not for this toy A-to-A task that I was using to test the training loop. The important thing was the learning rate chart, and it looked like this: Perfect! Here's the code at this point. There were two boxes left to check before I had a training loop I could actually use to build the LLM: gradient clipping and the ability to restart from a checkpoint. I decided to do gradient clipping first. Gradient clipping is where for each update, you look for gradients that are suspiciously large, and cut them off so that they don't make excessive changes to the model. The Optax docs made it look pretty simple: So, you use an to chain together first a thing that does clipping, and then the actual optimiser -- presumably the first thing in the chain sees the gradients and does stuff to them, and then the second receives whatever the first has returned. Now, the question was, should we do the chain outside or inside the MultiSteps? That is, should we clip gradients each time before we step the MultiSteps optimiser, or do we accumulate them and clip the average before we step the inner AdamW one? Looking at the old PyTorch code, I was running the gradient accumulation loop, and then clipping at the end. So the gradient clipping was happening to the accumulated gradients. That actually felt less intuitively good than the alternative, but I decided that we should try to mirror what the PyTorch code is doing. So: So, the optimiser would receive clipped gradients. Because it was wrapped in the , it was receiving the accumulated gradients every time that object hit its limit.
Unfortunately there was still a problem: that change meant that the optimiser that we were reading the learning rate from with this horrendous code in the function: ...would now be inside yet another level of nesting -- the object. So, of course, when I ran it, it blew up with an error: I used some debug prints to work out what was going on, and determined that the state of the object was a tuple, the first element being an essentially-empty state for the clipper, and the second being the hyperparameter-injected state for the . So that meant that the new correct code to get the learning rate would be this: Note that we've gained that to do the lookup into the 's tuple state. I remember coming across a comment saying "forgive us for our trespasses in this method" in a codebase long ago, and I know well how the author felt. I did have an idea of how to at least limit the blast radius a bit, though. At this point in the code, I had the complex optimiser setup in the function, and the learning-rate-getting abomination in . I decided instead to define a function called right next to the optimiser setup, and pass that in to . So the horror was still there, but at least it was all in one place, like this: ...where called where it needed it. I was just about to kick this off, but by chance happened to take a closer look at the documentation for , and spotted that it said Clips updates element-wise, to be in That rang a bell! When I was originally looking into gradient clipping for the PyTorch training loop, I noted that that is a perfectly valid way to do gradient clipping, but it's not the way I ultimately chose. Instead, I was clipping based on the L2 norm.
The JAX training code was meant to work the same way as the PyTorch code, so that was a good catch; I switched over from using to using , and then kicked off another training run: Everything looked fine; my guess was that the final loss was so similar because a simple task like A-to-A mapping, with such a shallow network, would be unlikely to cause gradients to explode. But it would be nice to be sure. Was there some way I could track the gradients and see if clipping had had to cut in? One neat thing we had in the PyTorch code was that we could track gradient norms pre-clipping: Unfortunately, and the general Optax API doesn't provide any way to access the pre-clipping norms: the that was the zeroth element of the state of the that we were reading in the horrendous learning rate-reading code is an alias of . I considered using to work out the norms directly, and logging that, but that would be tricky -- because the gradients we were applying the clipping to were not the ones that were generated in the function, but instead the ones that had accumulated inside the object over multiple gradient accumulation steps. This sounded like a lot of work for a not-enormous benefit, so I decided to leave it out for this project. There was, however, one small change that I wanted to make while I was messing around with gradients -- what to do if non-finite numbers crept into them. Back when I was first looking into gradient clipping, I was somewhat horrified to realise that the scaler object I was using to tell PyTorch to train in 16-bit for things where it felt it would help (Automatic Mixed Precision, or AMP) was silently dropping any updates with non-finite gradients, and if you didn't use AMP, such gradients would be happily applied to your model, most likely completely breaking it by setting parameters to non-finite values.
This felt like the wrong place for that kind of logic to go -- I felt that it should belong to the optimiser, or at least in some other part of the stack that wasn't specifically related to the totally orthogonal task of mixed-precision training. I checked what JAX's default behaviour with non-finite gradients was, and it turned out to just apply them -- but, with Optax, it actually was something you could fix at the optimiser level. If you wrap an Optax optimiser with , it will only apply finite gradients, so we could add it to the optimiser setup like this: I set to infinity to mirror the PyTorch code's behaviour. Now, obviously, this required yet another level of indirection in the learning-rate-getting function from hell: If you're keeping track, it's the in there. Heigh ho. So, it was time to run it again: That looked OK -- no change from before. Here's the code. Now, it was time to take the last step to finish the training loop: the ability to restart from a checkpoint. At this point, the checkpointing code was pretty basic -- it would save the model as a Safetensors file, along with some metadata like the min, max and average loss since the previous checkpoint, the number of the global step that we were on, and whether or not this was the best checkpoint (in terms of average training loss) so far. In order to restore from a checkpoint, we'd need more information. In the old PyTorch code, we needed three extra things on top of the model and the metadata: the scaler for automatic mixed-precision training, the learning rate scheduler, and the optimiser itself. The JAX loop wasn't going to do mixed-precision training, and the learning rate schedule was now built into the optimiser, so only the optimiser needed saving. So that was the job: save the optimiser in , and then implement a so that we can restart from one. I could then try kicking off a training run, waiting for a bit, killing it, then restarting from the most recent checkpoint. The loss and learning rate charts would tell me whether or not the restart really had picked up from where it had left off. Initially I was thinking that I would just use pickle to save the optimiser, but that felt like a problem waiting to happen.
Pickle has issues when you change Python versions or versions of installed packages, which never feels like it's going to be a problem, but all-too-frequently turns out to break stuff in reality. 2 Using Safetensors looked a bit tricky -- it had been hard to get it to work with Flax models, even though it had explicit support. Now, the recommended library for checkpointing in JAX code is called Orbax. I'd looked into it before, and it looked a bit heavyweight, so I'd moved on. But digging in a little more, I found that it had what looked like a simple API for saving PyTrees, which bypassed the complexity. Getting it working was still a bit tricky, though. Firstly, in the docs, they give this example: I tried that in the function with code like this: ...and got the error Huh. Digging into the library from the command line showed that the function was actually called . Not super-promising if the docs don't match the API (though to be fair, it does say right there in the package name). Anyway, changing that appeared to work: ...and then next to the 295 MB file called in my checkpoint directories, there was a 353 MB directory called . In PyTorch-land the optimiser had always been double the size of the model 3, but given the wildly different file formats in play, I was comfortable enough that it was order-of-magnitude the same as the model and somewhat bigger. Perhaps Orbax was doing some kind of compression or something like that. Next, it was time to write . I started off by writing the function to load up the safetensors file -- that's the one I showed earlier, back when I showed how the original A-to-A model learned how to map a poem to itself, and that if you asked it how to complete "Every effort moves you", it would respond with " you you you you you" and so on.
Once I had that, I created a , which called , and then loaded up the metadata and worked out what our best loss so far had been (which is necessary when continuing from a checkpoint so that, as you continue training, you can work out whether each new global step has had a loss that is better than the current best). That was simple enough: Restoring the optimiser turned out to be a bit trickier. Firstly, of course, just like with saving, the Orbax function was called rather than the documented . The next part was working out how to load it in a fashion that the optimiser would accept. If you load a checkpointed PyTree like this: Then what you get back is a "basic" PyTree -- it will consist of lists, dictionaries, tuples, basic Python types like strings, and JAX arrays. The problem is that the optimiser's state is made up of objects that can be mapped to such things -- for example, an object can be mapped to a dictionary where each field is an item in the dict -- but aren't actually those specific types of objects. So if you do this: ...you get an error, something like this: ...and likewise if you use the function I was using in the code: ...you'll get a slightly different but equally confusing error. After a certain amount of floundering around, limited by the lack of documentation (and by the documentation not seeming to match the API that I was seeing), I had the bright idea of looking at 's docstring, and that turned out to be excellent. In IPython: The solution was obviously that . When you provide it, it's used as a template. If in the abstract PyTree it finds a object, and in the loaded PyTree there is a dictionary in the same position with keys , and , it will create a object, setting those fields to those values. That means that you have something with the right structure to apply, so I wound up with this relatively simple code to load the checkpoint into the optimiser: We're using the existing state of the optimiser as a template to tell Orbax how to structure the loaded one.
I kicked off a training run, hit control-C halfway through, then restarted it from the checkpoint, and the final loss chart looked like this: ...and the learning rate chart like this: Perfect! The interrupt was at about global step 400, and the loss continued to go down properly, and the learning rate followed its schedule perfectly. Here's the checkpoint-loading code and the training script. So with that, phase one was done. I had a training script. It was massively overengineered for training this little A-to-A model, but just right for training a small LLM from scratch. And now it was time to do that -- and that's what I'll cover in the next post.

If you're thinking "why not just have it return one-hot vectors based on the input tokens", remember that I needed something in the model to train, so that I could confirm that loss was going down. A pure "identity" model without the embedding space would have nothing to learn, so wouldn't be able to provide that.  ↩

It was a surprisingly large source of tech support queries on PythonAnywhere. Someone would train a model with (say) Python 3.11.1, and then try to run it on our servers using 3.11.2, and discover that they couldn't load up their checkpoints. This confused them and they wondered if it was something to do with our platform. I even had a quicktext response to send with a rundown on how Pickle works so that I didn't have to keep typing the same explanation. This may have biased me more against Pickle than I should rationally be.  ↩

AdamW stores two numbers per parameter to keep track of its optimisation state, so 2x the model size is exactly what you'd expect if both files were in the same format.  ↩

Flax NNX for neural network components. Optax for optimisation.

Firstly, we convert them into embeddings, so we get a series of vectors.
We do this by a lookup into a table, but we can see it conceptually as a projection via a matrix, from vocab space (where a particular token ID is a one-hot vector) to an embedding space. Next, we do the magic with our Transformers layers, getting embeddings for the next token. The embedding at position n in the output sequence, after these layers, is for the predicted token to come after the token at position n in the input sequence, considering that input token and all other tokens to its left. Finally, we project those back from embedding space to logits, this time actually using a real matrix (in the form of a linear layer). The logits (after being run through softmax) represent the probabilities for each token of it being the next one. The scaler that we used to do automated mixed-precision training. This JAX loop was not going to do that, so it was not necessary here. The learning rate scheduler. This was built into the optimiser for JAX, so I didn't think it was needed. The optimiser itself. This was important, and we definitely did need to save it.


the first six months of 2026

First half of the year has passed, only half of the year left. 2026 is a difficult year for me so far. At the end of 2025, I wished for more rest this year. I didn’t do that yet. I pushed harder in different ways and I just can’t keep doing that. For 1.5 years now, I have been going extra hard in everything I do - I asked for more work at work, I created new opportunities and roles for me there, I blogged longer informative posts that took a lot of my knowledge and research, I completed more exams in my part time degree than ever before, I started and finished the certificate to be a data protection consultant, I attended conferences, I started volunteering, I went harder on my fitness, and so on. I do that to make up for my severe illness in 2024 and because of how a chronic illness turns everything into a pressing matter (fear of relapse, fear of no longer getting to do something, urge to maximize good times etc); and I can no longer keep it up. I need a longer break where I can just exist. I need to allow myself to do less in my part time degree and accept that it will postpone my graduation, I need to stop doing court case summaries for a month for noyb, I need to stop blogging for a while so I am not always working on some big essays and reading sources and running to keep up with articles and papers, and so I don’t always have a full inbox with like 50 emails to answer. I need to stop reading my web reader (RSS) and the Discover page. I can’t scale back work, so it has to be everything else. I want to enjoy life for a while and not always feel like I got something to prove, something to chase, something to keep up, something to get back to as soon as possible. And the thing is, I had so much fun stuff planned the first half of this year. I didn’t build those experiences up in my head and I didn’t have unrealistic standards, yet all of them kinda left a bad taste in my mouth when they happened. Travels, courses, conferences, restaurants, whatever. 
Everything I looked forward to had something difficult and disappointing about it, or had this Monkey’s Paw thing I mentioned in my other post. So I no longer feel like even planning nice things for myself and my wife. I’d rather save the energy, time and money. And that's sad, so I need to take some time to change that. It adds to the general burnout. I notice my impatience is worse: I get snappier, less forgiving, and I’m quicker to feel that someone’s fault or error was done out of malice rather than by accident or out of cluelessness/obliviousness. I no longer want to explain anything, elaborate, or help people, because I feel like I have to conserve my energy and efforts, as most interactions with others have a severe imbalance where I do most of the work. This would be the opposite if I wasn’t feeling burnt out. I react more defensively than usual if you point out small, irrelevant, inconsequential mistakes or make small requests for things I don’t care about changing. I retreat, I wanna be alone more, I wanna focus even more on personal projects that only involve me in isolation (and no other people, location, etc.) because those don’t let me down. I crave social interaction, yet listening to people or reading what they write is like nails on a chalkboard. It’s a confusing time to be in, because I more or less get everything I want with some exceptions, but in a warped way where it doesn’t make me happy and is always accompanied with some annoying twist or extra price (figuratively). Nothing just flows. I have to micromanage everything and deal with ridiculous hurdles. I have stress “dreams” where I am half asleep, having hypnagogic hallucinations in which it’s my turn in something (usually a board or card game happening on my bed) and I can’t figure out what people want me to do and they get impatient and urge me to finally hurry up, so I flee (sleepwalk around the apartment) before waking up and walking back to bed.
In other such dreams, I am convinced I forgot to do something I promised I would do in a different reality/parallel world (?), but I can’t quite understand what I owe and how to do that thing, it’s incredibly vague and confusing, and I walk away from the bed to avoid the harassment by these dream people about it until I wake up and walk back as well. Those two repeat so often. Probably once a week. The feelings don’t leave me after I wake up, like I genuinely feel guilty and stressed even when awake and my brain still tries to decipher what I forgot to do and what I owe and that I’m running out of time? I haven’t bored myself purposefully nearly as much as I want to, need to and used to. There is always a blog post I want to write or continue; an article, book, blog post or paper to read; a video to watch; something to study; work; gym. I notice I'm desperately craving to do these things, and do them, yet also feel like I have to force myself through it. Like it takes an intense amount of energy and focus. I need a lot more time to do them, lots of micro breaks and distractions, and it all feels so difficult internally. I feel exhausted. Even when stuff is very easy and I want to do it. It’s like my brain is full, nauseous, sick. It screams at me to stop. My memory/retention is so shit, too. I have never said "I don't remember that." as often in my life as I did this month. What adds to me not being able to stop is that it all feels mundane and harmless. Just one more thing. And they’re all things I do all the time, and things I am expecting myself to do, that are standard function, default. I understand when others burn out because of the bad timing of horrific events, like their house burned down while the pet is sick and grandma died and they just lost their job or something. Or: Insanely stressful high stakes job working 50-70 hours a week. But none of that applies to me. I just do normal things. And I don’t wanna be someone who does less. I want to do it all.
My chronic illnesses play into it. You can be chronically ill, but you are supposed to work and act like everyone else and achieve things and work on yourself. You cannot be visibly ill, you cannot do markedly less, you cannot struggle with a basic task or workload. You cannot let yourself go. You cannot waste yourself. Otherwise you are giving in, you’re a lost cause, you do nothing to help yourself, you make your illness your personality, you use your illness as an excuse. You’re not an inspiration, and that’s kinda all you’re good for if you are forever sick. You are supposed to reassure everyone that chronic illness doesn’t alter life much and that life can go on unchanged and you can totally achieve everything you would have if you weren’t sick. If you cannot be used for this cause, you are discarded by society. There is a pressure to not let that happen to me, especially when my wife depends on me. Anyway, before I end up in the kind of burnout that makes you completely unable to work a job for most of your life, I have to change things. Just putting some self care things on my todo list doesn’t help, as it is just another obligation and doesn’t make me feel better. I just put it on the list because it is supposed to be good for me and a productive way to deal with stress. Like, what sounds better in our current society: That you slept all day to rest and watched some Simpsons, or that you did some yoga and then had a bath with 5 products to make you prettier and then journaled and then went on a walk? But I actually need to do the former for once. Which is what I have been doing a lot the past week. Lounging around, letting my mind wander, napping, just existing and breathing, like a cat sprawled on the sofa. I need to do things freely, and not do straining things all day, and let myself not do things that you can be measurably good or bad at. No care about consistency. 
I feel like I arrived at small versions of this burnout every now and then over the years, did something to help it for a while, and then experienced it again. And every time, it took less time to relapse, it felt worse, and it felt like I needed more rest and relaxation than I could realistically give. I only ever gave enough to function again, to make it work, to take the edge off, to delay the worst. Like a day here and there doing little to nothing. Nothing more, no changed behavior moving forward. Something has to change permanently so I don't always run into this same issue over and over again, risking my mental health and my ability to do my hobbies and work. :) I still have to figure out where my sweet spot is between my ambition and what my body can give. I don't mind giving 200% for a time, just not forever. It seems like 1.5 years is my limit. With that said, I am gone the entirety of July. I won't blog¹, I won't reply to emails until August, I won't read your posts. Friends can still reach me via Matrix and Signal.

Published 30 Jun, 2026

¹ There is one announcement that I'll likely publish, that's it.

Unsung Today

“The evilest will-breaking browser game to exist.”

In 2023, Neal Agarwal created The Password Game, a viral browser-based game. Wikipedia has a nice summary: Although the initial requirements include setting a minimum of characters or including numbers, uppercase letters, or special characters, the rules gradually become more unusual and complex. These can involve managing having Roman numerals in the string to multiply, adding the name of a country that players have to guess from random Google Street View imagery (as a reference to GeoGuessr), inserting the day’s Wordle answer, typing the best move in a generated chess position using algebraic notation, inserting the URL of a YouTube video of a randomly generated length, and adjusting boldface, italics, font types, and text sizes. The explanation goes on for another paragraph, but I don’t want to spoil too many surprises. However, if you’re not a puzzle kind of person, you can just watch a 40-minute video of Bog trying to beat it. Last year, Agarwal followed The Password Game with I’m Not A Robot, a game making fun of similarly onerous CAPTCHA requirements.
Here’s Bog completing it once again, and you can also find other YouTube creators doing the same for both games. In the same category, game designer Linternet User just launched a teaser for their game CAPTCHA Hell, which has a different take and looks fun. I need to add that underlying all of this “fun” is not just tons of frustration with passwords and CAPTCHAs, but also a genuine accessibility problem, as described by Robin Christopherson in 2019 in an article titled AI is making CAPTCHA increasingly cruel for disabled users, or by A11y Collective a few years later. I don’t know what the absolute latest is in the battle with AI bots; anecdotally, I have been seeing almost zero text CAPTCHAs and fewer visual CAPTCHAs, at the expense of more and more Cloudflare Turnstile challenges (and Google’s equivalent), which only make you click a button, and do a lot of work under the hood to determine if that button press felt human-y or robot-y: “These challenges include proof-of-work (computational puzzles), proof-of-space, probing for web APIs, and various other challenges for detecting browser-quirks and human behavior. As a result, we can fine-tune the difficulty of the challenge to the specific request and avoid showing a visual or interactive puzzle to a user.” There is no more explanation.
I think the nature of the beast is that the actual details of how to tell one group from another cannot be shared, which is a shame – I’m very curious. #games #security #youtube

Unsung Today

“Invalid-reverse-solidus validation error”

In my three decades online, it never occurred to me to try this, and I found it so delightful once I did: both Chrome and Firefox will quietly rewrite backslashes in URLs into slashes. Not Safari, however, even though the URL living standard says it should. I am very curious whether the presence of backslashes in URLs is owed to Windows still showing backslashes in file paths, or just to people casually not seeing any difference between / and \, which are arguably similar, and both relatively alien in everyday typography. (“Solidus” is the proper typographical name for this kind of slash, partly to disambiguate it from all the other slashes with their equally fascinating names.) #keyboard #typography #web
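Here is a deliberately simplified Python sketch of the rewrite described: for the URL standard’s “special” schemes (ftp, file, http, https, ws, wss), a backslash before the query or fragment is treated like a forward slash. The function name is my own, and this is not a URL parser; it does not lowercase the scheme or host or normalize anything else, it only shows the one backslash rule.

```python
SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}


def normalize_backslashes(url: str) -> str:
    """Rewrite backslashes to slashes the way WHATWG parsing does for special schemes.

    Simplification: only the part before the first '?' or '#' is touched,
    because the standard leaves backslashes in the query and fragment alone.
    """
    scheme, colon, rest = url.partition(":")
    if not colon or scheme.lower() not in SPECIAL_SCHEMES:
        return url  # non-special schemes (mailto:, data:, ...) keep backslashes

    # Locate the start of the query or fragment, whichever comes first.
    cut = len(rest)
    for marker in ("?", "#"):
        index = rest.find(marker)
        if index != -1:
            cut = min(cut, index)

    head = rest[:cut].replace("\\", "/")
    return scheme + colon + head + rest[cut:]


if __name__ == "__main__":
    print(normalize_backslashes("https:\\\\example.com\\docs\\page?q=a\\b"))
```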

Unsung Today

Mailbag: The curious case of the disappearing Polish S

Even before the “remaster,” my essay about the Polish S bug was routinely discovered by Hacker News and other places, so I thought I would take a look at all the commentary over the years and summarize. First, pragmatically, these are the lessons for any keyboard shortcut designer:

On Windows, AltGr (Right Alt) and Ctrl+Alt shortcuts are one and the same, and Right Alt and alphabetic keys are used for some languages to output regular accented letters. You should not prioritize Ctrl+Alt shortcuts anywhere your users write text.

On a Mac, ⌥ and most keys generate characters. They do so even on the English layout for extra typographical flair, but particularly in other languages, regular accented letters might hide there. Note that these are not just letter keys, but also digits and other keys. You should not prioritize ⌥ shortcuts anywhere your users write text.

I couldn’t find a good image, so I made these two as an example. The first is the Mac’s American keyboard with ⌥ held. The second is the Polish keyboard with ⌥ held, with Polish letters highlighted.

Jumping to the promised comments, I liked this story: Outlook has a shortcut Alt+S to send the current e-mail. In Polish “Hello” is “Cześć”. When you acidentally have non-Polish locale enabled and write “Cześć” in Outlook - you send “Cze” as your whole e-mail. “Cze” is a very informal greeting, sth like “Yo”. There has been thousands of such e-mails in Polish companies sent to people who really shouldn’t be greeted with “Yo.” :)

Here’s a little summary of other similar bugs. I verified some of them:

“Oh, that explains why I accidentally triggered Claude with Alt+Space, despite it being configured as Ctrl+Alt+Space.”

“Noticed similar issues with official Australian VISA / immigration pages. You can’t simply fill some forms with your email address using Finnish keyboard. Why? Because they block usage of AltGr button on their page. They also prevent using clipboard blocking copy paste option for that sign. User has to be smart enough to switch to US keyboard and then enter @ sign and then switch back. So this is nothing new, but it’s absolutely rude from part of the site designers to vandalize basic functionality like that. Normally @ is produced by AltGr+2.”

“In a similar fashion, you cannot type the capital letter Ł in Notion. You type the letter with ⇧⌥L on the Polish keyboard on a Mac. Notion uses the ⇧⌥L keyboard combo for its own purposes.”

“Medium learnt its lesson in 2015. Google still hasn’t and you cannot type Ś in Sheets, at least not on MacOS.”

“Meanwhile, in 2026 I suddenly cannot type capital Ś in Edge on Mac. I feel like I moved back in time 25 years or so.”

“I wonder if it is a similar reason why currently on MS Teams I can’t type the letter ń.”

“It’s just like the new Copilot 365. Every time I try to type Ć, Copilot pops up. I have to close the app constantly.”

“I had a similar issue when ASUS’s bloatware background service decided to bind something to both Alt+S and Alt+A globally. I have to keep it disabled or else I won’t be able to type ą, Ą, ś and Ś without using Caps Lock to work around the issue.”

“In an Nvidia overlay there is a shortcut Alt+Z. It’s pretty annoying because it triggers on both left and right Alt, so polish users cannot type letter ż without opening the overlay or rebinding it. Nvidia pls fix.”

“The very same bug used to be present in early Windows mobile GPU drivers - with global hotkeys making it impossible to enter Ł (with Intel GMA 950) and Ć (with ATI Catalyst). Being a Polish geek, I used to earn lots of free dinners from frustrated friends who were forced to copy-paste those letters on their brand new laptops. Funny how the same bug recurs in different types of software due to an obscure locale-dependent edge case - and it’s much less known than, for example, the Turkish dotted/dotless I.”

“Installing KeePass used to silently disable “ą” key (AltGr+A hotkey). KeePass broke system of every Polish user immediately after being installed.”

I’m sharing this for awareness. I believe many other languages/writing systems also have this problem; the examples are lopsided toward Polish only because my original example was about Polish. Lastly, I found this an interesting anecdote: In Portugal we had a similar workaround in the early days of computers not supporting our alphabet properly. Like in Polish there are plenty of words that without diacritics get another completely unrelated meaning, e.g. caça vs caca, which you didn’t want the interpretation to be left to the receiver. So tricks got invented, like adding additional letters for the missing diacritics, é becomes eh, è becomes he or eh as in the former case, the example above would be cac,a and so on. However it was still quite flexible, not everyone uses the same extension set.

I wouldn’t be surprised if every single language outside of English developed some sort of a way to cope and adjust to limitations of originally American-oriented computers. In my book, I wrote about Japanese and Turkish, and there is another book – The Chinese Typewriter – that spends a lot of time talking about this very issue for China. If this subject is particularly interesting to you, venture out into the Hacker News waters to see more commentary: 2015, 2021, 2024, 2026.

#bugs #encoding #keyboard
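Both shortcut lessons in this post reduce to one check, sketched here in Python: in a text field, a modifier combo that the active layout turns into a printable character should be treated as typing, not as a shortcut. The `KeyEvent` type and `should_run_shortcut` function are hypothetical; real toolkits expose this information differently (and some don’t expose the produced character at all), and this is not how Outlook, Notion, or any app in the list above actually handles it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyEvent:
    """Hypothetical key event: held modifiers, plus the text (if any) that the
    active keyboard layout produced for this keystroke."""
    key: str                    # physical key, e.g. "S"
    ctrl: bool = False
    alt: bool = False           # on Windows, AltGr arrives as ctrl=True, alt=True
    text: Optional[str] = None  # e.g. "ś" for AltGr+S on a Polish layout


def should_run_shortcut(event: KeyEvent, in_text_field: bool) -> bool:
    """Decide whether a modifier combo is a shortcut or just someone typing."""
    if not (event.ctrl or event.alt):
        return False  # plain keystrokes are never shortcuts in this sketch
    if in_text_field and event.text and event.text.isprintable():
        # The layout turned the combo into a real character (ś, ą, ł, @, ...):
        # let the character through instead of hijacking it.
        return False
    return True


if __name__ == "__main__":
    polish_s = KeyEvent("S", ctrl=True, alt=True, text="ś")
    print(should_run_shortcut(polish_s, in_text_field=True))   # typing wins
    print(should_run_shortcut(polish_s, in_text_field=False))  # shortcut allowed
```

The same rule covers the Mac case if ⌥ is reported as `alt=True` together with the composed character.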


Have your agent record video demos of its work with shot-scraper video

shot-scraper video is a new command introduced in today's shot-scraper 1.10 release which accepts a file defining a routine to run against a web application and uses Playwright to record a video of that routine. I've written before about the importance of having coding agents produce demos of their work; this is my latest attempt at enabling them to do that. Here's an example video created using , exercising a still in development feature adding the ability to create new tables in Datasette from pasted CSV, TSV or JSON data: That video was created by running this command : (That JSON file contains a cookie , as described here in the documentation.) Here's the file: The video command documentation includes simpler examples, but for the purpose of this post I thought I'd go with something more comprehensive. That demo YAML storyboard was constructed entirely by GPT-5.5 xhigh running in Codex Desktop, using the following prompt run inside my checkout of this branch : Now that I've released the feature the prompt could say " " instead and it should achieve the same result. I really like this pattern where the output for a command provides enough detail that a coding agent can use it - it works kind of like bundling a file directly inside the tool. I used the same pattern for showboat and rodney . started as an experimental prototype. is built on top of Playwright , and the key feature it needed was for Playwright to be able to record video of browser sessions with enough control to create the desired demo. I first tried this a few years ago and found that the Playwright-produced videos included additional chrome that was useful for debugging a test failure but unwanted for a product demo. They fixed that a while ago, but there were still some minor blockers. In particular I was getting a few white frames at the start of the videos , since the recording mechanism kicked in before the first URL was loaded by the browser. 
Playwright 1.59 added a new screencast mechanism providing much finer-grained control over video recording. This was very nearly what I needed, but the resulting videos were fixed at 800px wide. I found a landed PR fixing that but it wasn't yet in a release. Then yesterday they shipped it in playwright-python 1.61.0 and I was finally unblocked to finish implementing the feature! The code itself was all written by GPT-5.5 xhigh in Codex Desktop. I had it write the documentation as well, which gave me a very useful frame for reviewing the design - much of the iteration on the feature came from reviewing that documentation, spotting things that were redundant, inconsistent or confusing, and requesting (or dictating) a better design. The YAML format itself was mostly defined by the coding agent. I had it use Pydantic to both define and validate the format, partly to make the design easier to review. This is a great example of the kind of feature that I almost certainly wouldn't have taken on without coding agent support. I filed the original issue in February 2024, and had difficulty finding the necessary time to solve this in amongst all of my other projects. You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options.
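The storyboard file itself isn’t reproduced in this excerpt, so what follows is not shot-scraper’s format. It is a hypothetical Python illustration of the approach described (define a small step format, then validate it up front so mistakes surface before any recording starts). The step names `goto`, `click` and `pause` are invented, plain dataclasses stand in for Pydantic to keep it dependency-free, and the input dict would in practice come from a YAML loader.

```python
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass
class Goto:
    url: str


@dataclass
class Click:
    selector: str


@dataclass
class Pause:
    ms: int


Step = Union[Goto, Click, Pause]

# Hypothetical step vocabulary: YAML key -> (step class, expected value type).
STEP_TYPES = {"goto": (Goto, str), "click": (Click, str), "pause": (Pause, int)}


def parse_storyboard(raw: Any) -> List[Step]:
    """Validate a storyboard (as loaded from YAML) into typed steps."""
    if not isinstance(raw, dict) or not isinstance(raw.get("steps"), list):
        raise ValueError("storyboard must be a mapping with a 'steps' list")
    steps: List[Step] = []
    for i, item in enumerate(raw["steps"]):
        if not isinstance(item, dict) or len(item) != 1:
            raise ValueError(f"step {i}: expected exactly one action")
        ((action, value),) = item.items()
        if action not in STEP_TYPES:
            raise ValueError(f"step {i}: unknown action {action!r}")
        cls, expected = STEP_TYPES[action]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            raise ValueError(f"step {i}: {action!r} expects {expected.__name__}")
        steps.append(cls(value))
    # Illustrative rule: start on a real page so the video has no blank lead-in.
    if not steps or not isinstance(steps[0], Goto):
        raise ValueError("first step must be 'goto'")
    return steps
```

Typed step classes make each action’s shape explicit in one place, which is roughly the reviewability benefit the post attributes to using Pydantic.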

Evan Schwartz Yesterday

Scour - June Update

Hi friends, Many of you mistakenly got onboarding emails yesterday. I'm sorry about that. I was tweaking the way emails are sent to new users and accidentally sent it out to everyone. Don't worry, you'll get your weekly digest on Friday as usual. (If you got a message about verifying your email, please do verify yours if you'd like to continue receiving the weekly digests.) In June, Scour scoured 841,977 articles from 27,356 feeds , and 123 new users signed up. Welcome! Here's what's new in the product: Scour now tracks and shows which articles cover other ones so you can find coverage, reactions, and responses to a given story. Under any post, you can see both the stories that the given one links to, and which other sources link to it. A detail I especially like is that the covering sources you tend to like and read are shown first, so you can easily find your favorite commentators' reactions. Relatedly, there's now a page that shows the most widely covered stories across Scour. If you subscribe to specific feeds, you can also add this as a feed to source content from. Laurynas Keturakis suggested this over a year ago and after finally implementing it this month, it quickly became one of my favorite Scour features. Thanks Laurynas! After you love or like a post, you'll see a small prompt to add more interests similar to that article's content. Adding interests is the best way to hone your feed and make sure Scour surfaces articles you'll like, so I hope this makes it easier to do that. If you subscribe to individual feeds, that prompt will also include a way to subscribe to the publisher's feed, if you aren't already, so you'll get more content from them. Similarly, if you dislike a post, you'll see some options to have less of that kind of content appear in the future. The Scour feed got a makeover! The new layout should be easier to scan and interact with. 
Clicking or tapping a post opens the expanded view. Also, on mobile, you can swipe articles right or left to quickly like or dislike them. The new Discover section contains all of your personalized interest and feed recommendations, as well as the pages to browse popular posts, interests, and feeds. Head over there if you'd like to build out your feed more, or if you want to see what others are reading on Scour. Scour now works far better with assistive technology. Every post is a labeled article whose actions are reachable by screen reader and keyboard, menus support arrow-key navigation, and the things that used to change silently (filter updates, search results, newly loaded posts) are announced as they happen. If you or someone you know reads Scour with assistive tech, I'd love your feedback. See the new Accessibility page for the full picture. Enjoying Scour? I added testimonials to the homepage and I'd love to include your review! Email me to let me know your thoughts (and of course, constructive feedback is also very welcome). Here are some of my favorite articles I found on Scour in June:

I've been thinking a lot about the ways that AI changes what it feels like to be a software engineer, and I especially appreciated these takes:
Andrew Diamond made a great comparison with historical fiction writers in Software Engineering in the Age of AI.
Vardan Torosyan pointed out that every engineer is now facing the kind of overload engineering managers have always dealt with: There is Too Much.
Candost discusses having an ownership mindset in On the Changing Role of Software Engineers.
And a goofy font that Bill Tarbell made that's readable for humans but not for AI: Souls Only.

Happy Scouring!
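Scour’s ranking code isn’t shown in this update, so here is only a hypothetical Python sketch of the coverage ordering it describes: covering posts from sources the reader tends to like and read come first. The `Coverage` type, the per-source affinity score (say, a count of likes and reads) and the newest-first tie-break are all my assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Coverage:
    """A post that links to (covers) the story being viewed."""
    source: str     # feed or site that published the covering post
    title: str
    published: int  # Unix timestamp


def order_coverage(items: List[Coverage], affinity: Dict[str, int]) -> List[Coverage]:
    """Sort covering posts: highest reader affinity first, then newest first.

    `affinity` maps a source to a score such as how often the reader has
    liked or read it; sources the reader has never engaged with score 0.
    """
    return sorted(items, key=lambda c: (-affinity.get(c.source, 0), -c.published))
```

Sorting on a tuple keeps the rule readable: the first element decides, and the timestamp only breaks ties.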


The AI Industry Is Losing

If you liked this piece, you should subscribe to my premium newsletter. It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 5,000 to 18,000 words, including vast, detailed analyses of NVIDIA, Anthropic and OpenAI’s finances, and the AI bubble writ large (updated to version 3.0 a few weeks ago). My Hater's Guides To the SaaSpocalypse, Private Credit and Private Equity are essential to understanding our current financial system, and my guide to how OpenAI Kills Oracle pairs nicely with my Hater's Guide To Oracle. This month, I published a two-part series that took a deep dive into the bubbles-within-a-bubble that make up the AI bubble — from the unsustainable and reckless growth of semiconductor companies, to the cults of personality surrounding Sam Altman and Dario Amodei. On Friday, I’ll publish my long-awaited Hater’s Guide to SoftBank. You won’t want to miss it. Subscribing to premium is both great value and makes it possible to write these large, deeply-researched free pieces every week.

Soundtrack — Queens of the Stone Age - Hideaway (Baloise Orchestral Arrangement)

On Sunday, the Bank for International Settlements (BIS) put out its annual report and said, well, a bunch of things that I’ve been saying: As edifying as it is to see the bank for central banks say exactly what I’ve been saying for the last few years, this part is the one that both rocks as far as being right goes and sucks for the world at large: No shit. In April of last year, I wrote a piece called “AI is a systemic risk to the tech industry,” where I outlined how the failure of one model lab, OpenAI, would have seismic effects down its supply chain, delivering body blow after body blow to NVIDIA, Oracle, Microsoft, and the various Neoclouds that serve its compute, the most notable of which is CoreWeave.
Since then, OpenAI’s slimy tendrils have sunk into even more facets of the tech industry, and it has signed deals with the likes of Google, Amazon, Cerebras, and Broadcom, while also taking on more investments, including mammoth commitments from Softbank, which is only able to meet them by selling off prized stock in companies like ARM and NVIDIA, and by raising debt.  The idea of systemic risk has never quite left my work, and I’ve spent a lot of time thinking about it over the past year — and, as a result, my writing has examined the potential consequences of an AI spending pullback on those financing the sector, in particular private credit , as well as the semiconductor industry .  The BIS’s concern wasn’t about revenues tanking — which would happen should, as it fears, hyperscalers decide to “slow or halt the aggressive pace of capex development” — but rather revenues tanking and the borrowers within the AI supply chain being unable to service their growing debt burdens.  Again, this is something I’ve raised the alarm bells over a bunch of times. CoreWeave has been a favored popinjay of this newsletter, and in March of 2025, I published CoreWeave Is A Time Bomb, where I focused heavily on the company’s overwhelmingly toxic debt pile and its reliance on OpenAI as a customer.  On a much grander scale, we have Oracle — which I exhaustively profiled in my Hater’s Guide to Oracle newsletter .  Unlike neoclouds like CoreWeave, Oracle’s a much older company, having spent most of its existence selling database and ERP software to some of the world’s largest companies and public sector institutions. Oracle pivoted to serving AI compute at a time when its core business lines had started to stagnate, and thanks to its large scale, it was able to raise insane amounts of debt. And Oracle, as I’ve noted previously, is a company that, even before the AI bubble, was massively indebted. 
It just so happens that, as a result of its tryst with OpenAI, Larry Ellison saw fit to twist the debt knob to eleven.  Oracle’s spending has already pushed its free cash flow into negative territory — minus $23.7bn, as of the end of FY 2026 — and at the end of May, it had $129.5bn in outstanding debt. This doesn’t include its various lease commitments, which add up to nearly $38bn, nor the additional $260bn in lease commitments that have been signed, but haven’t actually started yet.  All of this is to say that Oracle has massively leveraged itself for the benefit of one company, OpenAI, and if that company can’t pay its bills, it’s fucked. Oracle’s existence — and Larry Ellison’s personal wealth — hinges on whether OpenAI can make good on its promise to spend $300bn in compute.  This is both the most-obvious and under-discussed part of the AI bubble — that the trillion-plus dollars of hyperscaler capex is feeding a massive semiconductor boom based on, at best, the very small likelihood that large language models will turn into something completely different.  If Microsoft, Google, Amazon and Meta decide that it’s time to stop spending $30 billion or more a quarter on GPUs, RAM, storage, and data center construction, that’ll tear a hole in the side of what people assume is a permanent supercycle. I need to state how fucking silly it is that anybody considered said semiconductor boom anything other than a brief chance to fill their boots before a global equity catastrophe so severe that the Futurum Group will be on suicide watch. Hyperscalers — who will see their capex outpace their cashflows as of Q3 2026 — have had such poor returns on their investment in AI that none of them will actually disclose their revenues outside of vague “ run rates ,” which means that all of this investment is effectively based on the idea that something completely different will happen in the future .  
Said future will have to make them at least $2 trillion in brand new revenue by 2030 , because if it doesn’t, effectively all of that capex will have been spent to prop up Anthropic, OpenAI, and whatever it is that Meta is doing with its chatbots.  There is no cogent or rational argument in favor of continued capital expenditures, at least not one without a tacit acceptance that much of the current spend has been a waste outside of pumping equities and incubating two different large , unprofitable AI labs. Those millions of H100 and B200 and B300 GPUs are not going to usher in a digital God, they are not going to create recursive self-improvement, they are not going to be the fulcrum to adding $600 billion or more in brand new revenue to current services, and the only revenue they’re generating is compute spend from Anthropic and OpenAI, which I estimate makes up 20% or more of cloud revenues for Google, Amazon, and Microsoft.  I must also be clear that the cost of these companies extends far beyond equity investment. While Microsoft invested $13 billion in funding OpenAI, Microsoft executive Michael Wetter revealed as part of the Musk vs Altman trial that the partnership has cost it more than $100 billion , suggesting infrastructure costs of at least $87 billion just for OpenAI. I imagine Amazon and Google have had to spend similar amounts to handle Anthropic’s similarly-rapacious compute demands, especially given the $11 billion-and-counting cost of Amazon’s Anthropic dedicated Project Rainier data center . This is a criminally-underdiscussed part of the AI bubble. Anthropic and OpenAI have raised a little under $300 billion combined since 2019, but I estimate their true cost is at least $500 billion given hyperscaler capex investments that were necessary for them to exist, and that’s before you consider the $340 billion or more that Oracle is spending to build out the 7.1GW of “Stargate” data centers for OpenAI . 
These are not startups, but subsidiaries of big tech that only exist as separate arms as a means of pumping equity positions and hiding the truth: that AI capex has been a complete waste of money, even when you include two bulbous failsons that lose tens of billions of dollars a year. As I reported two weeks ago, OpenAI spent $17.2 billion on Microsoft Azure in 2025, a year when it lost $20.9 billion on $13.04 billion in revenue. Even if that were profit (which it is not), that’s $4.2 billion less than the capital expenditures that Microsoft spent in the first quarter of 2025. Outside of OpenAI, Microsoft may as well not have an AI business. While it boasted back in April about having a $37 billion AI revenue run rate (meaning a non-specific month multiplied by 12), that only works out to about $3.08 billion a month, or less than a tenth of the $31.9 billion that it spent on capital expenditures in the quarter. To make matters worse, Microsoft revealed that number was “up 123% year-over-year,” suggesting that its AI revenue run rate in Q3FY25 was $16.59 billion, or around $1.38 billion a month. Yet my own reporting on OpenAI’s inference spend from last November showed that it spent $2.947 billion in Q3FY25, representing about $11.8 billion on an annualized basis, meaning that, at least in that quarter, OpenAI likely represented around 70% of Microsoft’s AI revenue, and I’d be surprised if that dramatically changed in the year that followed, given that OpenAI’s inference spend was $3.648 billion in Q1FY26. All of this is to say that the only real outcome from all of this capex spend appears to be propping up Anthropic and OpenAI, two deeply-unprofitable companies, and then receiving a small fraction of it back in the form of revenue that is only made possible through hundreds of billions of dollars of venture capital subsidies. Now OpenAI and Anthropic represent 50% or more of hyperscaler remaining performance obligations, or around $748 billion.
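The run-rate arithmetic in this paragraph can be checked in a few lines of Python. The inputs are only the figures quoted in the paragraph; the helper names are mine. Back-solving from $37 billion and the inferred $16.59 billion implies year-over-year growth of roughly 123%, and annualizing OpenAI’s Q3FY25 inference spend gives roughly 71% of that $16.59 billion, consistent with the “around 70%” estimate.

```python
def monthly(run_rate: float) -> float:
    """A run rate is one month's revenue x 12, so divide by 12 to recover the month."""
    return run_rate / 12


def annualize_quarter(quarter: float) -> float:
    """Annualize a single quarter's figure by multiplying it by 4."""
    return quarter * 4


def implied_growth(current: float, prior: float) -> float:
    """Year-over-year growth, as a fraction, implied by two run rates."""
    return current / prior - 1


# Figures quoted in the paragraph, in billions of dollars.
ai_run_rate_now = 37.0      # Microsoft AI revenue run rate, as of April
ai_run_rate_prior = 16.59   # run rate inferred for Q3FY25
openai_inference_q = 2.947  # OpenAI inference spend on Azure, Q3FY25

print(f"monthly now:    ${monthly(ai_run_rate_now):.2f}bn")
print(f"monthly prior:  ${monthly(ai_run_rate_prior):.2f}bn")
print(f"implied growth: {implied_growth(ai_run_rate_now, ai_run_rate_prior):.0%}")
openai_share = annualize_quarter(openai_inference_q) / ai_run_rate_prior
print(f"OpenAI share of Microsoft AI run rate: {openai_share:.0%}")
```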
There is simply no logical or rational reason to invest any further capex in AI, outside of the mistaken belief that OpenAI or Anthropic could actually afford to pay without Google, Amazon, or Microsoft handing them the money. Hyperscalers do not have meaningful AI revenues of any kind outside of their own pseudo-startup investments, and it is equal parts ludicrous and irrational that A) they are continuing to invest and B) the markets, analysts and journalists are acting as if everything is fine. Record sales across NVIDIA, Micron, Sandisk, SK Hynix, and Samsung are a direct result of an entirely speculative asset bubble, driven by the reckless and directionless capital expenditures of some of the largest and richest companies in the world. Anyone investing in data centers is building speculative capacity for demand that does not exist outside of Anthropic and OpenAI. If said demand existed, AI data center neocloud company CoreWeave would have a healthy and diverse revenue stream, rather than 65% of its revenues coming from Microsoft (for OpenAI) and NVIDIA, and the rest coming from Google (for OpenAI), Anthropic, Meta, and, of course, OpenAI. There are simply no other massive consumers of AI compute, and the only reason we haven’t hit that harsh reality is that data centers take 18-34 months to finish. Even if there were, I can find little evidence of anyone but OpenAI, Anthropic and hyperscalers having the demand or funds necessary to substantiate the data center buildout. I really need to hammer this point home. If we assume that NVIDIA CEO Jensen Huang’s prediction of $1 trillion in Blackwell and Vera Rubin sales comes true, that would be around 40GW of data center capacity with around 30GW of IT load, and if we assume that data centers get about $12 per-megawatt of revenue, that works out to about $435 billion in annual compute demand by, being generous, 2030.
Let’s be abundantly clear about something: the only companies that can afford to spend money on compute right now are either hyperscalers or the companies that hyperscalers subsidize. Even then, outside of OpenAI’s $50 billion in 2026 compute spend and what I estimate will be a similar amount from Anthropic, there doesn’t appear to be more than a few billion dollars of demand. If there were, CoreWeave, IREN, Nebius, Cipher Mining, and other neoclouds would have hundreds of billions of dollars’ worth of remaining performance obligations, rather than RPOs that expand only with hyperscaler backstops or the depths of Meta’s Zuckerbergian AI psychosis. Let me put it even more simply: those hundreds of billions of dollars of data centers are being built for no one, and the only companies that can “afford” to pay for even a fraction of the compute are unprofitable AI companies propped up by hyperscalers.

While this might read as a radical position, I think it’s far more radical to look at the current state of affairs and say “fuck it, I think hyperscalers should spend a trillion dollars next year.” There is no rational justification for doing so outside of fantastical thoughts driven by a deranged market desperate to avoid thinking about how tech doesn’t have any hypergrowth ideas left.

The current capital expenditures have, outside of the creation of OpenAI and Anthropic, been a near-complete waste. Microsoft 365 Copilot sucks. GitHub Copilot sucks. Google AI Overviews suck. Google Gemini is an also-ran LLM and thus sucks. Meta’s LLMs are horrifyingly dangerous. Amazon Rufus sucks, and Amazon should be investigated by the SEC for suggesting it drove $10 billion in “annualized revenue” in Q3 2025, because it most assuredly did not. Alexa+ sucks. It all sucks, and it would suck just as badly if big tech had spent a quarter of the capex.
These products are near-universally loathed and barely generate any revenue. Even the modestly-successful GitHub Copilot (around $1.08 billion in annualized revenue as of the end of last year) only succeeded because users’ compute was heavily subsidized, which led Microsoft to move users to token-based billing, outraging customers who were used to paying $39 a month to burn thousands of dollars of tokens.

Sundar Pichai, Andy Jassy, Satya Nadella, and Mark Zuckerberg are losers. They may have billions of dollars, they may run giant tech companies, but they are losers selling a doomed product built on unreliable, inefficient and overly-expensive technology ill-suited for the kinds of reliable, deterministic, “set it and forget it” tropes that people actually associate with AI.

The Four Losers are the only reason that anyone has taken these Large Loser Models seriously, which is a sign that the tech industry and our economy are also piloted by losers. Every bit of “progress” that we’ve seen from LLMs has come from aggressively cramming a square peg into a round hole — billions of dollars of training costs, hundreds of billions of dollars of capex, endless harnesses and scripts and wrappers and layers to try and eke out anything approaching the supposed promise of autonomy.

All the king’s horses and all the king’s men have sunk every dollar and ounce of brain matter into trying to make LLMs into something they’re not, and we, as a society, are expected to coddle these things, act like they’re exceptional, and give them credit for things that have yet to take place. I refuse to buy into the premise that LLMs’ ability to generate code or replicate open source software is proof that these things will become a powerful, autonomous tool in the future. I think those that extrapolate to that point are either intellectually bankrupt, deeply cynical or so easily fooled that they click every single email claiming their PayPal account has been compromised.
I assure you, all this money can be wrong! Hyperscalers can, in fact, spend a trillion dollars on something that doesn’t do what they say, because these companies are more than happy to mislead you, and, to quote Nik Suresh: Why did everybody invest in data centers? Because the hyperscalers did so! Why are Micron and RAM companies selling so much RAM? Because A) GPUs use a ton of high-bandwidth RAM, B) said HBRAM consumes three times as much wafer space as normal DRAM, leaving less space for other kinds of cheaper, lower-margin RAM, and C) the servers for said AI GPUs are, too, full of RAM!

Those data centers aren’t being built because the creditors have any “insight” into the massive amounts of AI compute that generative AI tools need, and will need. They see the “success” of ChatGPT and Claude (two heavily subsidized products) and think that because Anthropic and OpenAI need lots of compute, everybody will need lots of compute. And because banks and private credit crave ways to invest their money and everybody is so excited, it’s super easy to get them excited about the prospect of building something big, sexy and costly!

It doesn’t help that a lot of the information out there is deeply, deeply flawed. Last week, research firm Exponential View put out a questionable report claiming that AI had $110 billion in trailing 12-month revenues (between what looks like June 2025 and mid-June 2026). It reached that figure by smashing together all AI revenues, including both OpenAI and Anthropic’s customer spend and compute spend. While the report claimed to “deduplicate” the numbers somehow, Exponential View declined to explain how it had done so. It’s also deeply deceptive to include both revenues and compute spend to try and represent the material health of the AI industry.
This is because the AI industry is full of losers that cannot win without fiddling with the numbers, and because everybody is so excited, they’re ready to be fooled, and hesitant to dig an inch deeper.

Not me! I don’t give a shit, and I hate the feeling of being lied to, so I dug in. It turns out that OpenAI and Anthropic represent as much as 75% of that revenue between their compute spend and revenues. Per The Information’s and my own reporting, OpenAI had around $8.77 billion in revenue and spent about $17.48 billion on compute in 2025. Per The Information, it had $5.7 billion in revenue and spent $17.8 billion on compute in the first quarter of 2026, for a total of around $44 billion (40% of Exponential View’s total). That doesn’t include any of OpenAI’s compute spend or revenue for the months of April, May or June, which would likely push the total even higher.

While Anthropic is a little more difficult to parse thanks to the Wall Street Journal’s unwillingness to make a readable chart, it had $4.8 billion in revenue in Q1 2026, and spent what I think is at least four billion on inference. Though its training costs are unreported, I think it’s reasonable to assume they’re at least $5 billion, for a total of $14.6 billion. If we, based on The Information’s reporting, take half (being generous, as most of this was weighted toward the end of the year) of Anthropic’s (all numbers are projections) $4.5 billion in 2025 revenue, $2.7 billion in inference costs and (I seriously question this number) $4.1 billion in training spend, we get $5.65 billion, for a total of $20.25 billion of contributions to Exponential View’s analysis, or around 18.4% of that $110 billion total.

So, yeah, not including anything from Q2 2026, Anthropic and OpenAI represent around 58% of the $110 billion of AI revenue that Exponential View is trying to get people excited about. These are the actions of a loser propping up an industry of losers that cannot win by telling you the truth.
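The same kind of sanity check works for Exponential View’s $110 billion. This minimal Python sketch takes the subtotals exactly as stated here: about $44 billion for OpenAI, $14.6 billion for Anthropic’s first quarter of 2026, and half of Anthropic’s projected 2025 revenue, inference and training figures. It does not re-derive them from the underlying reporting, and the names are illustrative.

```python
# Share of Exponential View's $110B total attributable to OpenAI and Anthropic,
# using only the subtotals stated in the text (USD billions). Names are illustrative.
ev_total = 110.0

# OpenAI: 2025 plus Q1 2026 revenue and compute, taken as the stated subtotal.
openai_subtotal = 44.0

# Anthropic: the stated Q1 2026 subtotal, plus half of its projected 2025 figures.
anthropic_q1_2026 = 14.6
anthropic_2025 = {"revenue": 4.5, "inference": 2.7, "training": 4.1}
anthropic_half_2025 = sum(anthropic_2025.values()) / 2
anthropic_subtotal = anthropic_q1_2026 + anthropic_half_2025

# Convert each subtotal into a percentage of the $110B headline figure.
openai_pct = openai_subtotal / ev_total * 100
anthropic_pct = anthropic_subtotal / ev_total * 100
combined_pct = openai_pct + anthropic_pct

print(f"Anthropic, half of 2025: ${anthropic_half_2025:.2f}B")
print(f"Anthropic subtotal:      ${anthropic_subtotal:.2f}B ({anthropic_pct:.1f}%)")
print(f"OpenAI subtotal:         ${openai_subtotal:.2f}B ({openai_pct:.1f}%)")
print(f"Combined share:          {combined_pct:.1f}%")
```

On those subtotals, OpenAI accounts for 40% of the total and Anthropic for about 18.4%. That makes a little over 58% combined, before counting anything from Q2 2026.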
This report exists entirely to fool the already-fooled and support an existing narrative, which is why Bloomberg covered it in the most obtuse, industry-servile way possible:

Here are two reasons this is fucking silly! Now, you may be wondering how they got that $25 billion number, and that’s because Exponential View gave it to them!

Yeah, but now they’re spending $765 billion on capex. Anyway, as I mentioned above, Exponential View’s Magical Maths magically brings those capex charges down to $25 billion, and entirely removes Meta because “initiatives are focused on ad uplift, so not recognized as pure GenAI revenue, or currently have minimal direct monetization.” What a loser move! Meta has oriented its entire company around AI!

I refuse to waste too much more time on this piece, but I need you to see how deceptively it’s framed this supposed “good news” for the AI industry. It compares its own proprietary depreciation formula against its own proprietary AI revenue formula to get a chart that is built to make the AI industry look good. No need for sourcing! No need for data! Just put the hype in the bag and invest in AI stocks!

I also find it despicable that Exponential View resorted to this weird, confusing “cumulative” AI revenues versus capex depreciation chart. The vast majority of this revenue is OpenAI and Anthropic’s compute spend, and I dunno, if you’re trying to do a report that gives the real state of the AI industry, maybe try and represent that anywhere in the report! These are, as I’ve suggested, the acts of losers propping up other losers. If this industry had a fundamentally-sound revenue story, it would be extremely easy to show profits versus losses, track revenue in a transparent way, and produce a report that showed AI’s remarkable ascent.
Instead, Exponential View says that AI is “real, big & fast” through a Pee-wee’s Playhouse of undefined models, datasets and alleged “quality grades” that helps feed a dangerous bubble further, and likely cons retail investors into further terrible decisions.

I know it sounds a little mean to call people losers, but what do I call an industry that sells itself on lies and deception? What do I call people that intentionally mislead people about the economics and outcomes of generative AI? If AI is so incredibly successful and impossibly brilliant, why does every explanation sound like it was written by The Riddler or somebody about to chug Jonestown Kool-Aid?

Because they’re losers that can’t win by actually winning. Their best (and only) hope is to overwhelm you with a 24/7 marketing campaign (powered by the media) that makes all of this seem inevitable, impossible to stop, and a rip-roaring success, even as every company loses money and every product rings with a soulless mediocrity.

That’s because LLMs, while an interesting tool in a vacuum, are currently being marketed by losers to losers using a mixture of Doom Trolling, insane extrapolations, and outright lies. The campaign exploits people’s assumption that tech always gets better and that this much money can’t be wrong, and is fueled by deception. While using them doesn’t automatically make you a loser, you become one the very second you aggressively push somebody into doing so, as you have become the acolyte of the Loser Mafia.

I have never heard an AI booster advocate for this technology with any real excitement about what it actually does, because they’re excited about how these tools make them feel and what they represent far more than anything else. They’re also tools intentionally built to produce engagement, and to make you feel you’re productive, even if you’re not.
Just listen to this guy in this Bloomberg story about AI making people “productive, anxious and afraid to log off”:

I’m sorry, man, you have an addiction, and I worry it’s ruining your life. What is this producing? What are you actually doing with this time? Because if you’re allegedly 100 times more productive, wouldn’t that, y’know, produce something fairly incredible? I have no idea — and don’t want to put this man on blast — how significant his commits on GitHub may or may not be, but the return on investment of “obsessively checking your laptop at all times in case you might not be productive” should be something on the order of curing a disease.

The story continues:

This man is a victim of a con, an industry-wide psychosis where you’re judged for not constantly dedicating every single second of your existence to prompting a series of chatbots into making something, all under the mistaken belief that at some point they’ll be so smart you…won’t have to prompt them?

Nevertheless, Van Horn is completely right — the sales pitch of AI is that agents were supposed to do the work for you, but billionaire losers are gaslighting you into believing that a digital busybox that requires constant vigilance to make sure it does what you ask or doesn’t spend too much money was somehow “autonomous.”

While it’s easy to make fun of Silicon Valley, what we’re witnessing is a widespread mental health epidemic caused by liars like Sam Altman, Dario Amodei, and their wealthy backers lying about the capabilities of AI. They are creating an abusive culture where humans become subordinate to unthinking, hallucination-prone agents subsidized either by OpenAI or by their employer:

This is fucking horrible, and every loser who inflated this bubble should be ashamed of themselves.

In fact, fuck it, I want to speak directly to the people working in Silicon Valley and the tech industry who have been ground down by this industry. I know not all of you are anti-innovation.
I know many of you feel suffocated. I see you, I hear from you every day, and I find what is being done to you repulsive. Your industry has abandoned you.

Your investors are lying to you, and are getting rich while you can’t afford a studio apartment in the Tenderloin. AI does not do what you have been promised it does, and those who are excited about it are excited because they believe it will replace you. You are victims of a marketing campaign built to enrich a few people by sacrificing your time and energy to defend a doomed tool.

You are using tools that are built to manipulate you into working longer hours in the name of automation. You are being abused. You are being tricked into fighting for the 1% in the name of democratizing software. Your agents are meant to set you free, but they chain your body and mind to a system built to exploit your labor, extract your value and leave you dead. The people who make these agents fantasize about replacing you with them, and want to use your data to do so. They are lying that it is possible, but they want you to be scared so you will use their products more.

They have convinced you to fight on their side in a war where you will lose regardless of the victor.

You are a victim. I am not your enemy. I love technology too, and I want the tech industry to make cool shit again.

That will not happen under its current leadership. This era is built to drain the life out of you, to suffocate you with endless tech chatter, to make technology every part of your life. It sells you the promise of automation, but only a kind of automation that you have to monitor constantly and prompt constantly, built to be addictive and superficially productive, and built to fuel a Bay Area culture steeped in a godless version of the Protestant Work Ethic.
You must be a cracked engineer, you must work 15-hour days, you must have 8 subagents beating the absolute shit out of your codebase for one reason or another, your Calendly must be open 8AM to 8PM, and you must be willing to work yourself to the bone for a chance to escape “The Permanent Underclass.” That is a misused term for the world after an entirely imaginary concept of Superintelligence, peddled by people who speak with a smugness that makes me want to spritz them like they jumped on the dinner table.

Those who take grotesque glee in being the first to announce AI’s destruction of everything you hold dear are your enemy, as are those who are desperate to constantly lick the boots of the Altmans and Amodeis of the world. Do not trust those who say that being part of an in-group requires you to use certain kinds of software or attack others in the name of Silicon Valley.

The people encouraging you to work in this way either do not care about you, or have been manipulated into believing this is how you all become rich by people exploiting their ignorance, fear or greed.

The people at the top do not care about the future, or progress, or anything other than growth. They are acolytes of an egregore of capital that has no purpose other than to expand at maximum velocity at all times. Everything is fine as long as something is always happening, because the moment you stop moving you remember that nothing you’re doing really matters, because you’re making software while working sweatshop hours.

AI agents are built to make you interact with them. They are built to make you burn tokens. They are built to make you apologize for their mistakes and give them credit for your labor. Any “autonomous” tool that requires specific prompting, harnesses, scripts and tooling to make it sometimes work autonomously is conning you.
I’m also sure that there are a few perfectly normal software people using this stuff locally or with an open source model who treat it as normal software, loathe the data centers and see no need for the capex or the mass-market version of LLMs. These people are drowned out by a worryingly large crowd that speaks like they’re in a cult that exists to prove that OpenAI and Anthropic are somehow something more than SaaS companies. To them, using AI is a way of virtue signaling that they’re a pure, productive spirit, a willing supplicant for a future where they assume they’ll ascend because they told enough people “we’re still early.” The tech industry got taken in by a form of religious con, sold to them wrapped in atheistic “rationalism.”

Some may or may not have AI psychosis — or at the very least a severe addiction — as a result of being forced to interact with these things day in, day out, and the easiest way to check is to try not to use them for a day, or to try and solve a problem without them. If this is you, please know that I am not attacking you, and that I see you as a victim of a con. You are ingesting poison while being told it’s ambrosia. You are being made to work twice as much for roughly the same output, if not less. You are being humiliated or isolated for not using the right tools or saying the right things.

Silicon Valley was built on the ideas of individualism and rationality, and the people at the top of your industry are telling you to fall in line and join an illogical consensus. You exist in a monoculture sold as anti-establishment even as it mostly enriches Microsoft, Google and Amazon. Your culture is being eroded by people who do not care about technology. You are unwitting pawns in a greater war against innovation, where billions are steered into the hands of those who only ever care about growth and “acceleration” that benefits only a small few.
You are not alone if you feel scared, anxious, listless and drained, because you are being worked to the bone building layers on top of AI models owned by subsidiaries of the largest companies in the world.

The fact that so many of you have to orient your products or fundraising around Twitter is a sign that your culture is decaying. A true meritocracy would reject the idea of “going viral on social media” like a virus, because it overwhelmingly benefits a monoculture that suppresses free thought and dissent.

Tech workers are in a constant battle between imbeciles and monsters, or an Arnold Palmer of the two. Those of you who want to build useful software that customers like are drowned out by a Greek chorus of unexceptional cretins that think they’re competent because they can bonk an LLM on the head to make an impression of competence.

Generative AI is the Peter Principle on steroids, removing the friction points where a diplomatic moron might get caught out, making them far more mobile and extremely dangerous. Companies are run by men that don’t know what they’re doing, desperate to avoid anybody realizing that we’re at the end of software’s era of hypergrowth, increasingly aware of their own mortality and their lack of a culture that might actually build something a human being would want.

For those of you still hanging in there, I see you and admire you, because if I worked at most tech companies right now I’d fucking quit. Seeing this entire industry bow at the feet of the great unprofitable mediocrity machine is sickening. Based on the many tech workers I talk to every week, the mood effectively everywhere is exhausting, demoralizing, manic, and horrible to watch.

Everything must be done faster, with fewer people, with less organizational support, but with more use of a tool best known for its hallucinations and ruinous cost, which you must use a lot, but also not too much.
However much you use it, you must constantly celebrate it for fear a cult of personality and mediocrity will isolate or fire you for the crime of not wanting to “Do AI.” Even if you are still trapped in this world for months or years to come, know that you’re not crazy for finding it revolting, exhausting and debilitating. You do not have to do things this way, but I understand if you’re made to by circumstance or social pressure.

The tech industry is in the throes of minor AI psychosis, or, put another way, AI has become a way to scale the already-potent sense of make-believe that has kept this industry afloat for the last decade.

The grander cargo cult of praying at the foot of whatever capital-lust the venture capitalists currently have has led everyone astray, to the point that companies worth billions — or even trillions — of dollars make decisions based on how they might play out on Twitter. Twitter is a maligned representation of the tech industry that caters to Silicon Valley gossip and the derangement of the markets, and it intellectually stunts most who cater their business or marketing to it.

The rest know exactly what they’re doing: appealing to an audience of venture capitalists convinced they’re “in the arena” by posting 12 hours a day, writing 2,000-word posts using Claude. You must coddle these rich oafs, because it’s effectively impossible to raise money if you don’t. You must be able to recite the rituals — Hermes! Loops! Permanent underclass! — or you’re considered uncool by the least cool people alive. You, the great individualistic thinker of Silicon Valley, must convince wealthy oafs that you are an independent and rational person, but also that you will follow the greater consensus.

It’s a really unfortunate time to have ideas, dreams or goals outside of some sort of Potemkin agentic startup, unless you can do the hocus pocus to con a VC into thinking you — or anyone — will invent recursive self-improvement, or AI that teaches itself.
You’re getting money right now if you can make noises that sound like you’ll be the next Baseten or whatever. It’s the era of inference, I guess. Loops too. Keep cheering along! Never stop agreeing with what everyone else is doing, or if you do, only do so in a way that suggests that you all agree on the big stuff. That means you ultimately support either or both of OpenAI and Anthropic, two companies that effectively operate as subsidiaries of the largest tech companies in the world.

It will stay this way until something changes.

As if I haven’t made it clear enough, the AI industry is losing. Their plans are not working, their products are not doing the things that they’ve promised, and though they intend to exhaust every available source of capital, they aren’t going to have enough money to do this forever. And no, AI is not “too big to fail.” Everybody makes fun of it. “AI” has become synonymous with generic, ugly, corporate slop. It’s a physical blight on the Earth, pumping horrifying toxins into minority neighborhoods and causing such noise that it makes people physically sick. To make matters worse, some independent writers have made it their mission to cast doubt on these problems because they do not represent “the aggregate” of data centers.

Everyone trying to be the “rational” voice on data centers should know that they’re only helping make the AI industry stronger. If you’re anxious that people are being “unfair” about water use, you’re an active pawn of capital, and exist only to help pump the bags of NVIDIA and the billions of dollars of speculative investment going into these monstrosities.

Without getting into the weeds, know that anyone talking about data center water use in terms of almonds or cattle is an actual industry plant. California does use a lot of water to make almonds — and also makes 100% of America’s and 80% of the world’s supply.
Cattle and other livestock also take up a lot of water and land, but they also make food for people to eat. You can bicker about how much water a data center may or may not use, and you’re going to sound like a complete loser every second you do so, because you are fighting to make sure that the AI industry can build data centers for the largest companies in the world.

Data centers are a monument to everything wrong with the world — horrifyingly large, loud, demanding of power and water and resources of all kinds. They create very few jobs, and those involved in their construction are usually from out of state. Their actual value to the world is largely tied up in their nebulous theoretical contribution to something an AI company does. They also get huge tax breaks, which means they don’t really contribute very much to many of the areas they’re put in. They are intentionally conflated with the smaller, useful data centers we’ve had in the past, all so that pedants can say “ehhmmm, you never had a problem with these before?” I haven’t, because previous data centers haven’t been filled with GPUs or drawn more power than a small town, nor have they been rammed through by a combination of crony capitalism, tax breaks and endless debt. And it’s fundamentally unclear why we need them!

No, really, why do we need these fucking things? So Anthropic and OpenAI can do more of whatever it is they’re doing? Neither appears to be unable to serve customers — other than the lousy uptime of Claude — nor do they appear to improve their products based on the availability of compute.

For such an offensively large footprint — physically, fiscally and societally — nobody can really explain why the fuck we need all these things, other than the fact that they might make somebody money on a service that is best known for its huge mistakes and lack of profitability.
As I’ve discussed, the demand isn’t there outside of these two companies, and the only reason anyone believes otherwise is that the largest tech companies in the world have burned through every dollar they have to hide from you that they’re out of big ideas.

The AI industry fights like a bunch of losers because that’s what they are. They cannot win by telling the truth about their products, their infrastructure, the condition of their finances or their overall intentions. They cannot succeed without manipulation and deceit because they know, deep down, that their businesses don’t make sense and that their actual products, described in the present tense, cannot justify what they’re asking for. They require us to coddle them, to ignore their ruinous cost, avert our eyes when they hallucinate or delete somebody’s database, blame ourselves when they make mistakes and speak entirely in theoretical terms when we describe them, because the present kind of fucking sucks.

Absolutely nothing that the AI industry has created is worth even a fraction of the trillion-plus sunk into this industry. At this point it’s very clear that these models cost about as much as a person, and even then are neither capable of replacing one nor profitable for the provider.

The best shot the AI industry has is open source models that may only be getting better by distilling American models. At some point Anthropic or OpenAI is going to slow down and then stop making models entirely, because it costs too much money to train models, and said costs are only increasing. Even if GLM 5.2 is truly nearly as good as Opus 4.8, it got there by copying Opus’s outputs. That means these models will likely only keep improving as long as the foundation model companies keep training, which will only be possible if they can keep raising funding, which will become difficult if open source models eat their lunch in any meaningful way.
Could Anthropic and OpenAI theoretically make better models in a vacuum? Sure! But they’re now going to have to slow-roll them, because Sam and Dario’s four- or five-year-long scaremongering campaign has forced them into a situation where the US government demands oversight into their model releases at a time when the AI industry cannot afford to slow down.

Their only option is to sit there and take it or, alternatively, admit that they’re making normal software, which will make the whole “let’s build a trillion dollars of data centers” thing a little harder to justify.

This will also be a tougher sell to Masayoshi Son of SoftBank, who gave a truly demented presentation during the 46th annual SoftBank shareholder meeting, calling the company a “golden egg machine” that’s also a goose that lays eggs that are, at times, undervalued.

Masayoshi Son has sunk $64 billion into OpenAI, and existentially tied a company with a quarter-of-a-trillion-dollar market capitalization — the third largest on the Japanese stock market — to whether or not Sam Altman can turn a company that burned $20.9 billion in a single year into a company that makes more than $284 billion in annual revenue by 2030. If you’re curious, the second-largest is Mitsubishi UFJ Financial Group, a massive Japanese bank with tens of billions of dollars invested in AI data centers, and the first is Kioxia, a memory and storage company that has seen massive revenues as a result of the massive demand for memory and storage for AI data centers.

What do you think happens if AI data center capex slows? What do you think happens when it turns out there’s not enough demand for all those data centers? Even if MUFG and SMBC (the second-largest Japanese bank, also heavily levered in AI) have sold off part of the risk, their counterparties are still part of the global banking system.
Anyway, SoftBank’s glorious, Geese-filled future depends upon OpenAI going public, and the New York Times just reported that OpenAI has likely pushed its IPO back to 2027 because bankers didn’t think it would get a trillion-dollar valuation. That is an absolute disaster considering its pre-money valuation (as in before the $122 billion it raised) was around $735 billion. While the report partially blames the floundering value of SpaceX, I think it’s possible (though I have no privileged knowledge to confirm it) that my story publishing its audited financials had something to do with it.

One can present financial data in all manner of ways, and I have to wonder whether its S-1 might have differed in some way — perhaps how segments were broken down — from what I reported. Perhaps bankers saw the reaction to the numbers, the mess that is SpaceX, the weird state of the market, and said “yeah man you’re gonna be lucky to float at $700 billion.” We may never know.

2027 may as well be the year 3000 for how far away it is, and how much further OpenAI will have to drag itself to get there.

While it “raised $122 billion” earlier in the year, it’s waiting for two more tranches of $20 billion apiece from NVIDIA and SoftBank, and will now straight up not get the $15 billion that Amazon conditioned on it either going public or reaching AGI. Considering that Mr. Altman can’t even con a bunch of bankers who were dumb enough to believe that SpaceX could 300x its AI revenue by 2030, it’s clear that the jig is up.

Another worrying sign is that SoftBank was unable to raise a $6 billion margin loan with its entire OpenAI stake — likely valued, at least on paper, at over $100 billion — as collateral. This suggests banks have little faith in the company.

Some might believe that Anthropic has a better chance, and I’m just not sure there’s much that differentiates it from OpenAI anymore, other than how annoying Dario Amodei is and how much he appears to piss off the Trump administration.
Anthropic is a large language model company that loses billions of dollars and has subsidized accounts that allow users to burn $8,000 a month in tokens for $200. To paraphrase and build upon something said by Cory Doctorow, if your business is only successful when you give away $40 for $1, that’s not a real business; it’s a way to feed venture capital dollars to hyperscalers and sell a bunch of people a product that doesn’t exist.  Anyone still lazy enough to say “they’ll crank up the price” or use some hackneyed Amazon Web Services or Uber comparison is either deliberately ignorant (I explain here) or a loser like the rest of the AI industry. If you’re so confident about this shit, despite all the blaring warning signs, you need to start finding actual, real, tangible evidence, and you need it soon. Every argument in favor of AI requires you to speak in the future tense and ignore your lying eyes. The AI industry will not allow you to discuss LLMs in terms of what they do today without reminding you that progress has been so rapid over the last few years and demanding you immediately concede that something might be good in the future.   Seriously, try and talk to somebody who loves AI sometime and criticize the tech and see how quickly they fall into the tropes of AWS losing money, AI models rapidly getting better (at benchmarks rigged in their favor because they can’t use a computer like you or me), the “cost of intelligence going down” (when it’s actually going up), or any number of other tired tropes that mostly rely on you ignoring the present in favor of a billionaire’s dream of the future. These are, as I have been saying, the acts of losers. This is what you do when you do not actually have a compelling story, cannot win by being straightforward or contrite, and have no way to prove yourself valid outside of appealing to cargo cults and doing financial engineering, except you’re such a loser that you’re not even doing it to commit fraud! 
You’re just writing PDFs so you get shares on Twitter.  Forgive me for being so very brusque, but I have had to prove myself endlessly over the last few years, and when I finally bring you the proof that OpenAI loses a bunch of money, you immediately jump for the first keys jingled above your head. If you truly love the AI industry so much, you should ask it for better proof! You should be enraged that OpenAI’s numbers are so shitty, and that you have to debase yourself by pretending they’re not! How utterly shameful!  That’s loser shit! If you love large language models so much, go out and demand the people making them bring you the answers to my questions. Whenever I’m asked about how I might be wrong it mostly comes down to “but what if something that hasn’t happened happens?”  If your answer is “OpenAI will drive down the cost of its silicon using its ‘Jalapeño’ chip from Broadcom,” you do not have shit! It’s still in early testing! There is no future for the future these people are building. The demand does not exist for these data centers. It never has. It never will. You can give Baseten as much money as you want, you can talk about the exciting world of open source for hours, but there is not actually enough demand for this stuff unless it becomes something very different, very soon, in a very big way, that likely also involves it getting cheaper.  Anthropic and OpenAI have $1.1 trillion in compute commitments that are contingent on their continued growth, at a time when their customers are protesting their costs, at a time when the market is clearly saying “you are not worth a trillion dollars.”  What do you think changes that?  The halo effect of AI has given way to a societal cynicism, even among the people who love it, who have a sort of vague, reticent “I give up” vibe that I find exhausting to watch and will have a great deal of trouble forgetting once the bubble bursts. 
Even the people who claim to be excited are making jokes about Masayoshi Son and Sam Altman!  Everything about AI has the stench of death and desperation, of losers pretending they’re winners who can only thrive in conditions that reward grifting, specious hype and forward-looking statements that vary from ridiculous to deliberately harmful. It’s ugly, regressive, and when this era ends, I expect financial carnage and chaos that could have easily been avoided had so many people not so readily swallowed poison under the auspices of innovation. Then again, some people might just be born to be regulated by the wallet inspector. If you liked this piece, you should subscribe to my premium newsletter. It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 10,000 to 18,000 words, including vast, detailed analyses of the biggest events and companies in the AI bubble.  Exponential View’s research rigs the dice using obfuscated, proprietary data. Anthropic and OpenAI represent at least 68% of the supposed $110 billion in AI revenue from the last 12 months. While the report claims to ‘deduplicate’ revenue across the AI stack, it does not provide any source data of any kind, making it impossible to verify. The report uses “annualized run rate” to try and make the AI industry’s revenue seem larger than it is. This report is industry marketing framed as research, using deliberately positive framing and questionable data sources. You are comparing costs of the entire industry against depreciation costs of the few companies that actually buy AI GPUs. In Q1 2026, Amazon had $18.94 billion, Microsoft $10.1 billion and Google $4.4 billion in depreciation. That’s $33.44 billion! That’s more than $25 billion! And I haven’t even included Meta, but don’t worry, as I’ll get to, neither does Exponential View!

Lalit Maganti Yesterday

On "When impressive performance gains do not matter"

“When impressive performance gains do not matter” is a very nice article covering some ways in which going after performance alone is not sufficient without considering the wider picture. It resonated a lot with how I think about performance. If there are multiple bottlenecks in the pipeline—and with these systems, this is common—the overall throughput will not improve until every last bottleneck is removed. His focus is on distributed systems bottlenecks, but I’ve hit the same “do-nothing” speedups when optimizing client-side programs. Usually this comes from spending a lot of time thinking something was the bottleneck when it wasn’t. CPU profiling is where this bites me most: it tells me “function X is taking 30% of the cycles” and I think “oooo, there’s a lot of gains to be made there”. I build a microbenchmark for X, optimize it, and there’s only a marginal gain at the high level. While disappointing, I’ve become used to it over time and internalized that performance is highly non-linear and actually knowing where the problem lies is really hard.
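Two back-of-the-envelope models capture why this happens. Amdahl's law says that if a function takes a fraction p of the runtime and you make it s times faster, the whole program speeds up by only 1 / ((1 - p) + p / s); and a pipeline can go no faster than its slowest stage. The Java sketch below computes both. It assumes the profiler's 30% maps directly onto wall-clock time, which is already optimistic, and the pipeline stage throughputs are made-up numbers for illustration.

```java
import java.util.Arrays;

/** Back-of-the-envelope models for why a local speedup can barely move the end-to-end number. */
final class BottleneckMath {
    private BottleneckMath() {}

    /** Amdahl's law: overall speedup when a fraction p of the runtime becomes s times faster. */
    static double amdahlSpeedup(double p, double s) {
        if (p < 0 || p > 1 || s <= 0) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and s > 0");
        }
        return 1.0 / ((1.0 - p) + p / s);
    }

    /** A pipeline sustains at most the throughput of its slowest stage. */
    static double pipelineThroughput(double... stageThroughputs) {
        return Arrays.stream(stageThroughputs)
                .min()
                .orElseThrow(() -> new IllegalArgumentException("no stages"));
    }

    public static void main(String[] args) {
        // Function X is 30% of the runtime: make it 2x faster, then (in the limit) free.
        System.out.printf("X 2x faster: %.2fx overall%n", amdahlSpeedup(0.3, 2.0));
        System.out.printf("X made free: %.2fx overall%n", amdahlSpeedup(0.3, Double.POSITIVE_INFINITY));

        // Hypothetical three-stage pipeline (items/s): doubling the middle stage changes
        // nothing end to end, because the last stage is still the bottleneck.
        System.out.println(pipelineThroughput(100.0, 50.0, 40.0));
        System.out.println(pipelineThroughput(100.0, 100.0, 40.0));
    }
}
```

With p = 0.3, even making X free caps the gain at about 1.43x, and if X is not on the critical path at all, the real gain can be close to zero.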


Benchmarking Hardwood 1.0 on a Threadripper 9980X

Hardwood is a minimal-dependency Java library for reading Parquet files. It currently has row-reader and columnar-reader APIs, with Parquet writing planned for the future. Gunnar Morling, Hardwood’s author, published some initial benchmarks in the v1.0 announcement, comparing Hardwood’s row and column readers against Parquet Java. Those benchmarks measured read speed against already-downloaded Parquet files.  Gunnar’s benchmarks ran on an m7i.2xlarge, with 8 vCPUs / 4 physical cores. Each test used three variants: Hardwood with decoder threads = , which equals 8; Hardwood pinned to one CPU thread with taskset; and Parquet Java, single-threaded. I was curious how the same benchmarks would look on my Threadripper 9980X: 64 cores / 128 threads, with 256 GB ECC DDR5. I modified Gunnar’s benchmark code to also test Hardwood with fixed decoder-thread counts: 1, 4, and 8. That gives the following Threadripper variants: Hardwood, unpinned, decoder threads = 128 (available processors); Hardwood, unpinned, decoder threads = 8; Hardwood, unpinned, decoder threads = 4; Hardwood, unpinned, decoder threads = 1; Hardwood pinned to one CPU thread (taskset); and Parquet Java, single-threaded. One important detail: decoder threads = 1 is not the same as the pinned 1-core test. With decoder threads = 1, the main thread can run on another core. The pinned test constrains the whole process to one logical CPU, which is the closest we can get to a like-for-like comparison with single-threaded Parquet Java. This benchmark reads all columns of the 48M-row dataset. Fig 1 (m7i.2xlarge): Hardwood (all cores) 16.5M/s, Hardwood pinned 1-core 3.9M/s, Parquet Java (single-threaded) 3.3M/s. Fig 2 (Threadripper 9980X): Hardwood (all cores) 43.4M/s, Hardwood dt=8 48.4M/s, Hardwood dt=4 44.9M/s, Hardwood dt=1 15.5.9M/s, Hardwood pinned 1-core 11.0M/s, Parquet Java (single-threaded) 5.8M/s. A few things stand out: The Threadripper is much faster in the single-core cases than the m7i.2xlarge. 
Hardwood pinned to one core reaches 11.0M rows/s (with some runs reaching over 12M), versus 3.9M rows/s on the m7i.2xlarge: generally about 3x faster. Hardwood’s single-core result on the Threadripper is also much stronger relative to Parquet Java. On the m7i.2xlarge, Hardwood 1-core is only modestly ahead of Parquet Java: 3.9M rows/s versus 3.3M rows/s. On the Threadripper, Hardwood 1-core is almost 2x faster: 11.0M rows/s versus 5.8M rows/s. More decoder threads help, but only up to a point. The best result here is 8 decoder threads, at 48.4M rows/s. Four decoder threads are close behind at 44.9M rows/s. The default availableProcessors() setting, which gives 128 decoder threads on this machine, is slower than both, which is not surprising. This benchmark reads all rows of the 48M-row dataset. It has two variants: indexed (positional) columns, i.e. r.getLong(3), and named columns, i.e. r.getLong("passenger_count"). Fig 3 (m7i.2xlarge): indexed columns, Hardwood (all cores) 14.9M/s, Hardwood 1-core 4.4M/s, Parquet Java (single-threaded) 1.4M/s; named columns, Hardwood (all cores) 2.8M/s, Hardwood 1-core 1.9M/s, Parquet Java (single-threaded) 1.4M/s. Fig 4 (Threadripper 9980X): indexed (positional) columns, Hardwood (all cores) 33.4M/s, Hardwood dt=8 36.1M/s, Hardwood dt=4 34.9M/s, Hardwood dt=1 14.4M/s, Hardwood pinned 1-core 10.8M/s, Parquet Java (single-threaded) 3M/s; named columns, Hardwood (all cores) 5.9M/s, Hardwood dt=8 5.8M/s, Hardwood dt=4 5.9M/s, Hardwood dt=1 5.7M/s, Hardwood pinned 1-core 4.3M/s, Parquet Java (single-threaded) 2.6M/s. The indexed-column row reader shows the same basic pattern as the columnar full scan. Hardwood is much faster than Parquet Java even in the pinned 1-core case: 10.8M rows/s versus 3.0M rows/s. The best multi-threaded result is again with 8 decoder threads, at 36.1M rows/s, with 4 decoder threads close behind. The named-column reader is different. 
Hardwood is still ahead of Parquet Java, but it does not meaningfully scale with decoder threads. The unpinned Hardwood results are all around 5.7M to 5.9M rows/s, regardless of whether the benchmark uses 1, 4, 8, or 128 decoder threads. If you want high throughput, use the indexed-column approach. This test generates data with 4 columns and 50M rows where event_time is perfectly ordered. The filter is event_time < threshold, so the file is clustered by the predicate column and the scan can rely on Parquet row-group/page/column statistics. The file contains no bloom filters (Hardwood does not support those yet). There are two variants: selective, event_time < 2,500,000 (about 5% pass), and matchAll, event_time < 50,000,000 (100% pass). The test measures the time for the filtered scan to complete. Fig 5 (m7i.2xlarge): selective (5%), Hardwood (all cores) 12.9 ms, Hardwood pinned 1-core 53.8 ms, Parquet Java (single-threaded) 173 ms; match-all (100%), Hardwood (all cores) 222 ms, Hardwood pinned 1-core 983 ms, Parquet Java (single-threaded) 3157 ms. Fig 6 (Threadripper 9980X): selective (5%), Hardwood (all cores) 10.5 ms, Hardwood dt=8 5.1 ms, Hardwood dt=4 7.2 ms, Hardwood dt=1 24.1 ms, Hardwood pinned 1-core 32.0 ms, Parquet Java (single-threaded) 97.9 ms; match-all (100%), Hardwood (all cores) 95.0 ms, Hardwood dt=8 80.4 ms, Hardwood dt=4 122 ms, Hardwood dt=1 425 ms, Hardwood pinned 1-core 537 ms, Parquet Java (single-threaded) 1777 ms. The relative shape is similar to the m7i.2xlarge results, but the Threadripper is much faster. In the single-core comparison, Hardwood is about 3x faster than Parquet Java in both cases: 32.0 ms versus 97.9 ms for the selective scan, and 537 ms versus 1777 ms for the match-all scan. With multiple decoder threads, Hardwood is much faster again. The best Threadripper result is 8 decoder threads: 5.1 ms for the selective scan and 80.4 ms for the match-all scan. I hacked on Gunnar’s benchmark code to add some more test cases. 
Fig 7 (Threadripper 9980X): Hardwood (all cores) 192M/s, Hardwood dt=8 215M/s, Hardwood dt=4 119M/s, Hardwood dt=1 30.9M/s, Hardwood pinned 1-core 26.8M/s, Parquet Java (single-threaded) 13M/s. This is one of the clearest decoder-thread scaling results. Hardwood 1-core is about 2x faster than Parquet Java, and 8 decoder threads reach 215M rows/s (about 16.5x faster than Parquet Java). Unlike the full-scan benchmarks, there is a large gap between 4 and 8 decoder threads here. Fig 8 (Threadripper 9980X): Hardwood (all cores) 118M/s, Hardwood dt=8 120M/s, Hardwood dt=4 119M/s, Hardwood dt=1 116M/s, Hardwood pinned 1-core 50.1M/s, Parquet Java (single-threaded) 87.1M/s. The string column seems to change the performance profile. This case behaves differently, with Parquet Java beating the pinned 1-logical-core Hardwood test. More than one decoder thread does not help: the unpinned Hardwood results are all between 116M and 120M rows/s. I haven’t profiled this, so I can’t explain the result. In this test, we use the predicate , which matches 500,324 rows (about 1%) of the deterministically generated 50M-row dataset. This time the files are not clustered by the predicate, but the total number of matching rows is one fifth of that in the filter test from earlier. Fig 9 (Threadripper 9980X): Hardwood (all cores) 141 ms, Hardwood dt=8 135 ms, Hardwood dt=4 131 ms, Hardwood dt=1 129 ms, Hardwood pinned 1-core 291 ms, Parquet Java (single-threaded) 2522 ms. Hardwood is far ahead of Parquet Java here. Even the pinned 1-core Hardwood result is about 8.7x faster than Parquet Java. I ran the benchmark with the flag, which verifies that each test returns the same data, and it passed, so the result looks legit. Decoder threads do not help much in this test. The unpinned Hardwood results are all between 129 ms and 141 ms. That suggests this benchmark is limited by something other than parallel decoding. The Threadripper 9980X is a workstation, not a server. 
It has a higher clock speed but lower memory bandwidth than its EPYC server counterparts. I imagine you’d see lower numbers on the EPYCs for these single-process tests, but the EPYCs would easily beat the Threadripper when running many Hardwood workloads in parallel, thanks to their 12 memory channels compared to the Threadripper’s 4. Thinking about memory bandwidth, I decided to see how Hardwood scales across instances, where each benchmark process was pinned to 4 physical cores and given 4 decoder threads. Fig 10 (Threadripper 9980X): 1 process (4 physical cores) 26.1M/s, 2 processes (8 physical cores) 47.5M/s, 4 processes (16 physical cores) 79.2M/s, 8 processes (32 physical cores) 81.2M/s, 12 processes (48 physical cores) 79.6M/s, 16 processes (64 physical cores) 75.1M/s. We came close to this workstation’s memory bandwidth limit at 4 processes on 16 physical cores, and after that there was little benefit, or even reduced throughput, as efficiency dropped. Fig 11: the memory bandwidth topped out in the 4th test (8 processes, 32 physical cores). The Instructions Per Cycle (IPC) dropped further and further, signalling the reduced efficiency. Fig 12: the IPC drops as we add more and more parallel benchmark instances. And we became increasingly memory bound. Fig 13: AMD uProf’s top-down estimate of how much CPU pipeline capacity is lost because the backend is waiting on the memory subsystem. The EPYC 9575F has 614 GB/s (theoretical) of bandwidth in a single-socket system and up to 1.2 TB/s (theoretical) in a dual-socket system, compared to just 205 GB/s theoretical for my workstation (though the maximum I’ve actually measured is 170 GB/s). So the EPYC would have blown the socks off my workstation here. I’m including this as a reminder that benchmarks don’t usually measure things like memory bandwidth saturation under high parallel load. On my Threadripper 9980X, Hardwood’s single-core performance looks strong against Parquet Java across most of these benchmarks. 
In the full columnar scan, pinned 1-core Hardwood is almost 2x faster than Parquet Java. This contrasts with the m7i.2xlarge, where Hardwood only saw a modest single-core advantage over Parquet Java for this specific test: a reminder that your mileage may vary. In the positional row-reader scan, Hardwood was about 3.6x faster than Parquet Java, and in the filtered scans, about 3x faster. The custom predicate benchmark shows an even larger gap.  Hardwood’s multi-threaded performance is also strong up to a certain decoder-thread count (which depends on the workload and the hardware). On this Threadripper, 4 or 8 decoder threads were usually enough. The default value gives a ridiculous 128 decoder threads, which was unsurprisingly less efficient than 8. The main exceptions to decoder-thread scaling were the named-column row reader, the string column subset, and the custom predicate benchmark. Those cases showed little or no benefit from increasing decoder threads, even when Hardwood still beat Parquet Java overall. I initially wondered if the strong single-thread performance compared to the m7i.2xlarge was due to the Threadripper’s strong AVX-512 support, but after profiling it with AMDuProfPcm, it turned out that this was not the case. I also tested enabling the Vector API, but it made no difference to the performance. If any performance engineers out there want a fun project, my feeling is that Hardwood still leaves a lot on the table for optimization. I’ll finish by saying this benchmarking was done for fun on a workstation. These results are not generalizable, but they do correspond to the m7i.2xlarge results (just better). They are mostly useful as a directional look at how Hardwood behaves on a high-core-count workstation. You need to benchmark your own use case on your chosen hardware. 
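One possible (unverified) explanation for the flat named-column results is per-value overhead on the thread consuming rows: if each r.getLong("passenger_count") call has to resolve the column name, that work does not shrink as decoder threads are added. I haven’t looked at how Hardwood implements named access, so treat this as a hypothesis. The Java sketch below uses a made-up Row type, not Hardwood’s API, to show the general pattern of resolving a column name to a position once, outside the hot loop, and then using positional access.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical row abstraction (NOT Hardwood's API) contrasting name-based and positional access. */
final class ColumnAccessSketch {
    private ColumnAccessSketch() {}

    /** A toy row: values stored by position, plus a schema mapping column names to positions. */
    record Row(long[] values, Map<String, Integer> schema) {
        long getLong(int index) {
            return values[index];
        }

        long getLong(String name) {
            // One hash lookup per call: small on its own, but paid for every value of every row.
            Integer index = schema.get(name);
            if (index == null) {
                throw new IllegalArgumentException("unknown column: " + name);
            }
            return values[index];
        }
    }

    /** Name-based access inside the hot loop. */
    static long sumByName(List<Row> rows, String column) {
        long sum = 0;
        for (Row r : rows) {
            sum += r.getLong(column);
        }
        return sum;
    }

    /** Resolve the name to a position once, then use positional access in the hot loop. */
    static long sumByIndex(List<Row> rows, Map<String, Integer> schema, String column) {
        Integer index = schema.get(column);
        if (index == null) {
            throw new IllegalArgumentException("unknown column: " + column);
        }
        int position = index;
        long sum = 0;
        for (Row r : rows) {
            sum += r.getLong(position);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> schema = new HashMap<>();
        schema.put("col0", 0);
        schema.put("col1", 1);
        schema.put("col2", 2);
        schema.put("passenger_count", 3);
        List<Row> rows = List.of(
                new Row(new long[] {1, 10, 100, 1}, schema),
                new Row(new long[] {2, 20, 200, 2}, schema),
                new Row(new long[] {3, 30, 300, 3}, schema));
        // Same answer either way; only the per-row cost of locating the column differs.
        System.out.println(sumByName(rows, "passenger_count"));
        System.out.println(sumByIndex(rows, schema, "passenger_count"));
    }
}
```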
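The filtered-scan results rely on the data being sorted by event_time: each row group then covers a narrow, non-overlapping range, so its min/max statistics let a reader skip groups that cannot match without decoding them. Below is a toy Java model of that min/max pruning for a value < threshold predicate. The RowGroupStats type is hypothetical (not Hardwood’s or Parquet Java’s API), and the layout of 50 row groups of 1M rows holding event_time values 0 to 49,999,999 is my assumption, not the benchmark’s actual file.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy min/max statistics pruning (hypothetical types, not a real Parquet reader API). */
final class StatsPruningSketch {
    private StatsPruningSketch() {}

    /** Min/max statistics for one row group's event_time column. */
    record RowGroupStats(long min, long max) {}

    /** For "value < threshold", a group can be skipped when even its minimum fails the test. */
    static boolean mightMatchLessThan(RowGroupStats stats, long threshold) {
        return stats.min() < threshold;
    }

    /** Stats for perfectly ordered values 0..totalRows-1 split into equal-sized row groups. */
    static List<RowGroupStats> sortedGroups(long totalRows, long rowsPerGroup) {
        List<RowGroupStats> groups = new ArrayList<>();
        for (long start = 0; start < totalRows; start += rowsPerGroup) {
            long last = Math.min(start + rowsPerGroup, totalRows) - 1;
            groups.add(new RowGroupStats(start, last));
        }
        return groups;
    }

    /** Number of row groups that must actually be read and decoded. */
    static long groupsToRead(List<RowGroupStats> groups, long threshold) {
        return groups.stream().filter(g -> mightMatchLessThan(g, threshold)).count();
    }

    public static void main(String[] args) {
        List<RowGroupStats> groups = sortedGroups(50_000_000L, 1_000_000L);
        System.out.println("selective: read " + groupsToRead(groups, 2_500_000L) + " of " + groups.size() + " groups");
        System.out.println("matchAll:  read " + groupsToRead(groups, 50_000_000L) + " of " + groups.size() + " groups");
    }
}
```

Under these assumptions the selective predicate touches 3 of 50 groups while matchAll must read all 50, which points the same way as the roughly 9x to 18x gap between the selective and match-all timings in Figs 5 and 6.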

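The theoretical bandwidth figures in the memory-bandwidth section can be reproduced as channels x 8 bytes per transfer x transfer rate, assuming DDR5-6400 on both platforms (my assumption; the post doesn’t state memory speeds): 4 channels give 204.8 GB/s and 12 channels give 614.4 GB/s per socket. The Java sketch below does that arithmetic and also turns the Fig 10 numbers into scaling efficiency relative to a single process.

```java
/** Back-of-the-envelope numbers for the memory-bandwidth section. */
final class BandwidthMath {
    private BandwidthMath() {}

    /** Theoretical peak DRAM bandwidth in GB/s: channels x bytes per transfer x MT/s, divided by 1000. */
    static double peakGBps(int channels, int bytesPerTransfer, int megaTransfersPerSecond) {
        return channels * (double) bytesPerTransfer * megaTransfersPerSecond / 1000.0;
    }

    /** Scaling efficiency: measured throughput / (processes x single-process throughput). */
    static double efficiency(double singleProcess, int processes, double measured) {
        return measured / (processes * singleProcess);
    }

    public static void main(String[] args) {
        // Assumption: DDR5-6400 with 64-bit (8-byte) channels on both platforms.
        double threadripper = peakGBps(4, 8, 6400);
        double epycSocket = peakGBps(12, 8, 6400);
        System.out.printf("Threadripper: %.1f GB/s, EPYC: %.1f GB/s per socket, %.1f GB/s dual%n",
                threadripper, epycSocket, 2 * epycSocket);
        System.out.printf("Measured 170 GB/s is %.0f%% of the workstation peak%n", 100 * 170 / threadripper);

        // Fig 10: millions of rows/s, each process pinned to 4 physical cores with 4 decoder threads.
        int[] processes = {1, 2, 4, 8, 12, 16};
        double[] rowsPerSec = {26.1, 47.5, 79.2, 81.2, 79.6, 75.1};
        for (int i = 0; i < processes.length; i++) {
            System.out.printf("%2d processes: %4.1fM rows/s, efficiency %3.0f%%%n",
                    processes[i], rowsPerSec[i], 100 * efficiency(rowsPerSec[0], processes[i], rowsPerSec[i]));
        }
    }
}
```

By this measure, efficiency falls from about 91% at 2 processes to about 76% at 4 and under 40% at 8, consistent with memory bandwidth, rather than core count, becoming the limit.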

F3: The Open-Source Data File Format for the Future

F3: The Open-Source Data File Format for the Future
Xinyu Zeng, Ruijun Meng, Martin Prammer, Wes McKinney, Jignesh M. Patel, Andrew Pavlo, and Huanchen Zhang
SIGMOD'26
F3 is a file format for columnar data (like Parquet) that is designed to be efficient and extensible. The optimizations make sense, and the extensibility mechanism is ingenious, dangerous, and fascinating. The key assumption made by this paper is that hardware and software will continue to improve. It is hard to argue with that. The trouble is that interoperable formats like Parquet take a snapshot of the state of the art and freeze it in a specification. Some innovations invented after a format is frozen are incompatible with it because they require a different data layout. Section 1 of the paper refers to many examples related to compression, indexing, and filtering. The goal of F3 is to be general enough to allow seamless incorporation of future innovations without changing either the F3 spec or F3 decoder implementations. Fig. 2 illustrates an F3 file (source: https://dl.acm.org/doi/10.1145/3749163). A file consists of metadata and a set of row groups. A specific row group contains data for all columns and a subset of rows. F3 contains incremental improvements over existing columnar formats, for example: F3 metadata supports random access, which is important for operations that only need to access a small fraction of all columns. F3 decouples file I/O from row group storage. The rows associated with a given column in a row group are further subdivided into IO units, which are what is actually stored. This allows row groups to be sized for efficient row-group-level filtering, while the IO unit size is tuned to minimize the working set while also amortizing the fixed costs associated with file I/O. F3 allows flexible dictionary scopes. Each IO unit can contain a dedicated dictionary, or multiple IO units can share a dictionary. 
Columns with low cardinality will benefit from smaller dictionary scopes, whereas columns with high cardinality do better with larger dictionary scopes. The stand-out feature for F3 is the yellow block in the block. The idea is that an F3 file can contain within it the WebAssembly code needed to decode the encoded values in an IO unit. If someone invents a brilliant new encoding method that works well with some data sets, they can ship the decoder right along with the data set. Storage of the WASM code shouldn’t be too much of an issue, because the storage cost is amortized across all row groups. The big questions are performance and security. Section 6.2 has some comments on this. In theory, the WASM specification is air-tight, and a bug-free implementation should be able to securely run arbitrary WASM code in-process. WASM also supports performance optimizations like parallel compilation and SIMD instructions. Something I don’t see in the paper is a discussion of how filtering interacts with WASM decoding. I suppose the extensibility could be used only for decoding, with filtering hard-coded into F3, but that seems against the extensible spirit of F3. Fig. 11 shows the working set reduction from decoupling IOUnit size from row group size (source: https://dl.acm.org/doi/10.1145/3749163). Table 3 shows how flexible dictionary scopes allow one to trade encoding time for compression ratio; lower relative CR numbers mean smaller files on disk (source: https://dl.acm.org/doi/10.1145/3749163). Fig. 15 quantifies WASM overhead by comparing decode time for hard-coded F3 decoder implementations vs the same algorithms expressed in WASM (source: https://dl.acm.org/doi/10.1145/3749163). Fig. 16 shows potential savings associated with using WASM extensibility to implement a custom decoder from the literature (source: https://dl.acm.org/doi/10.1145/3749163).
Dangling Pointers
I wonder how well WASM decoders can be implemented on other hardware architectures. 
Is WASM the ideal language for expressing this, or just a convenient standard that already exists?
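To make the dictionary-scope trade-off concrete, here is a toy Java model. It is not F3’s encoder: it splits a column into fixed-size IO units and counts the dictionary entries stored and the bits spent on fixed-width codes when each IO unit has its own dictionary versus when all units share one. Encoding time and any further compression of codes or dictionaries are ignored, and the data is made up.

```java
import java.util.HashSet;
import java.util.List;

/** Toy model of dictionary scope (NOT F3's actual encoder). */
final class DictionaryScopeSketch {
    private DictionaryScopeSketch() {}

    /** Dictionary entries stored and total bits spent on fixed-width codes. */
    record Cost(long dictionaryEntries, long totalCodeBits) {}

    /** Fixed code width for a dictionary of the given size: ceil(log2(distinct)), at least 1 bit. */
    static int codeBits(int distinct) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(Math.max(0, distinct - 1)));
    }

    /** Scope = one IO unit: each unit of unitSize values gets its own dictionary. */
    static Cost perUnit(List<String> column, int unitSize) {
        long entries = 0;
        long bits = 0;
        for (int start = 0; start < column.size(); start += unitSize) {
            List<String> unit = column.subList(start, Math.min(start + unitSize, column.size()));
            int distinct = new HashSet<>(unit).size();
            entries += distinct;
            bits += (long) unit.size() * codeBits(distinct);
        }
        return new Cost(entries, bits);
    }

    /** Scope = whole column: all IO units share one dictionary. */
    static Cost shared(List<String> column) {
        int distinct = new HashSet<>(column).size();
        return new Cost(distinct, (long) column.size() * codeBits(distinct));
    }

    public static void main(String[] args) {
        // Values clustered by IO unit: each unit of 4 sees only 2 of the 8 distinct values.
        List<String> clustered = List.of("a", "a", "b", "b", "c", "c", "d", "d",
                "e", "e", "f", "f", "g", "g", "h", "h");
        // Values recurring across units: each unit of 4 sees 4 values, which reappear later.
        List<String> recurring = List.of("a", "b", "c", "d", "e", "f", "g", "h",
                "a", "b", "c", "d", "e", "f", "g", "h");
        for (List<String> column : List.of(clustered, recurring)) {
            System.out.println("per-unit: " + perUnit(column, 4) + ", shared: " + shared(column));
        }
    }
}
```

In the first column, per-unit dictionaries win outright (the same 8 entries, but 16 code bits instead of 48); in the second, they store twice as many dictionary entries in exchange for narrower codes. That is the kind of choice flexible scopes leave to the encoder.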

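At the level of control flow, the embedded-decoder idea is a lookup with a fallback: use a built-in decoder if the reader has one for an IO unit’s encoding, otherwise run the decoder the file ships with. The Java sketch below models only that dispatch. EncodedUnit, the encoding IDs, and the delta decoder are hypothetical, and the “embedded” decoder is an ordinary Java function standing in for WASM code; it says nothing about how F3 actually stores, sandboxes, or executes WASM modules.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical decoder dispatch: prefer a built-in decoder, fall back to one shipped in the file. */
final class DecoderDispatchSketch {
    private DecoderDispatchSketch() {}

    /** An encoded IO unit: encoding ID, payload, and an optional decoder carried by the file. */
    record EncodedUnit(String encodingId, int[] payload, Optional<Function<int[], long[]>> embeddedDecoder) {}

    /** Decoders compiled into this reader (a real reader would have several). */
    static final Map<String, Function<int[], long[]>> BUILT_IN =
            Map.of("plain", DecoderDispatchSketch::decodePlain);

    static long[] decodePlain(int[] payload) {
        long[] out = new long[payload.length];
        for (int i = 0; i < payload.length; i++) {
            out[i] = payload[i];
        }
        return out;
    }

    /** Use the built-in decoder if one exists, else the embedded one; fail if neither is available. */
    static long[] decode(EncodedUnit unit) {
        Function<int[], long[]> decoder = BUILT_IN.get(unit.encodingId());
        if (decoder == null) {
            decoder = unit.embeddedDecoder().orElseThrow(() ->
                    new IllegalStateException("no decoder for encoding " + unit.encodingId()));
        }
        return decoder.apply(unit.payload());
    }

    public static void main(String[] args) {
        // A "new" delta encoding this reader has never heard of, shipped with its own decoder.
        Function<int[], long[]> deltaDecoder = deltas -> {
            long[] out = new long[deltas.length];
            long running = 0;
            for (int i = 0; i < deltas.length; i++) {
                running += deltas[i];
                out[i] = running;
            }
            return out;
        };
        EncodedUnit plain = new EncodedUnit("plain", new int[] {7, 8, 9}, Optional.empty());
        EncodedUnit delta = new EncodedUnit("delta-v2", new int[] {100, 1, 1, 2}, Optional.of(deltaDecoder));
        System.out.println(Arrays.toString(decode(plain)));
        System.out.println(Arrays.toString(decode(delta)));
    }
}
```

The hard parts the sketch skips are exactly the ones the post raises: running untrusted decoder code safely and fast enough to compete with the built-in path.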
Kev Quirk Yesterday

📝 2026-06-30 11:37: Our 2 hens have finished sitting on the Guinea fowl eggs - out of 10,...

Our 2 hens have finished sitting on the Guinea fowl eggs: out of 10, we managed to hatch 5 of them. The chicken eggs we have in the incubator won't be ready for another week or so. Excited to see how many of them we get.

マリウス Yesterday

Updates 2026/Q2

This post includes personal updates and some open source project updates. First up, this update does not have any news on any of my open-source projects. If you’re here for that, you might as well close this tab now, sorry. With all that’s been happening I had no time to advance any of the projects. As usual, when I’m travelling I pick up individual bags of coffee beans that I find particularly interesting, to enjoy them later on whenever I have access to my own coffee equipment, and this time is no different. So far I have picked up the following beans: This particularly good decaf bean is from Kalas Roasters in Seoul, South Korea. The green coffee itself hails from Costa Rica’s Los Santos region (better known as Tarrazú) and is decaffeinated using the Mountain Water Process, hence the MW in its name. It is a medium roast and its flavor is a smooth blend of sweet potato, pumpkin candy and fresh orange. It’s a clean and balanced taste with less caffeine. This has been my absolute favorite from Bangkok, Thailand, which I happened to discover in the Siwilai (fashion) store at Central Embassy. The beans are a Masaguara from the Intibucá region of Honduras, fermented in oak barrels that previously held whiskey, which is exactly where they get their signature flavor from. These beans reminded me a lot of the Glitch Coffee beans from La Loma farm that I had discovered back in 2024 in Osaka, and that I picked up again in Tokyo in 2025. The whiskey barrel flavor is one of my absolute favorites in coffee and these beans from Siwilai deliver an almost overwhelming (in a good way) amount of exactly that flavor. Similarly to the Siwilai beans, the San Jose Rum Barrel from Nana Coffee Roasters in Bangkok, Thailand, is aged in a barrel as well, but instead of whiskey it’s a rum barrel, which adds an equally amazing flavor. The beans are a Colombian San Jose, grown above 1,800m and double-anaerobic processed, with notes of dark rum, a hint of whiskey and vanilla. 
Last but not least, the Mr. Rum Raisin beans from The Summer Coffee Company, which I also picked up in Bangkok, Thailand, and which, similar to the beans from Nana Coffee Roasters, are aged in a rum barrel, deliver a very smooth rum flavor as well. Mr. Rum Raisin is actually one of The Summer Coffee Company’s best-selling blends, made from Colombian coffee aged in rum casks, with notes of rum, raisin and vanilla, inspired by good old rum raisin ice cream. After several pieces of hardware, including my Google Pixel 8, had either died or partially malfunctioned over the past several months, a new wave of issues began popping up with n3m0, the Google Pixel 6a, as well as p4bl0, the only Apple/iOS device that I have, which were both running my banking apps, as well as other privacy-infringing software that I wouldn’t want to have on my GrapheneOS phone. Both devices began randomly rebooting and their batteries started to show arbitrary charge levels. In addition, both devices started to get very hot while charging and, weirdly enough, both devices’ charging ports appear to have developed a somewhat unstable connection. Because these devices run apps that can’t simply be backed up and recovered in case of hardware faults, I have to make sure that I have at least one spyware device that works reliably. Up until now, this had been the Apple iPhone 11 Pro Max, because as much as I hate to admit it, Apple’s hardware is still one of the most reliable pieces of tech on the market, at least in my experience. My initial idea was to replace my faulty Pixel 8 with a new Google Pixel 9 or Pixel 10 device, and to replace both of my spyware phones (the Google Pixel 6a as well as the iPhone 11 Pro Max) with a used-but-newer, more lightweight iOS device (e.g. an iPhone 12 Mini). 
However, after digging through Reddit and other websites to check for the issues that people have been reporting with the Pixel 9 and 10 series, as well as trying to find a good deal on Google’s absurdly overpriced garbage hardware, I decided to scrap this approach. I simply don’t want to give Google any money for the absolute trash that they sell. Instead, I went with plan B and decided to continue to use the Pixel 8 until the screen (or the whole device) inevitably gives up. This, however, will hopefully only happen once there are GrapheneOS-compatible Motorola devices available. That is, of course, only if Android 17 doesn’t end up FUBAR and turned into merely a Gemini Intelligence “launcher”. I have the feeling that AOSP might eventually turn into just that: not much more than a supporting base layer for all the “AI” things that Google and other manufacturers are working on. As for the spyware device, I have replaced both the Pixel 6a and the iPhone 11 Pro Max with a new iPhone (17) Air, which will hopefully serve me well for at least another seven years, just like the 11 Pro Max did. One reason I went with the Air was form factor and weight. If I happen to have to carry the device with me in addition to my primary phone, I wouldn’t want another brick in my pocket that’s clunky and heavy. While the Air is significantly larger than I anticipated with its 6.5" display, it is fascinatingly thin at only 5.64mm (except for the top bump) and light at only 165g. For comparison, the Google Pixel 6a, which is predominantly made out of plastic and glass, with only its frame being aluminium, has a 6.1" screen and weighs 178g. Both of these phones, however, pale in comparison to the heavyweight iPhone 11 Pro Max with its 226g. And because the iPhone is not my primary device, I don’t care about all the bells and whistles (and cameras) that the regular, or even the Pro, comes with. 
Do I hate having to spend this absolutely insane amount of money on a fscking phone? Yes, yes I do. Would I ever recommend anyone paying full price for such a device? No way. Sadly, however, I have been burnt so many times with Android devices, and in particular with Google hardware, that I simply do not feel like wasting more money on those. Over the same period of time that I owned the iPhone 11 Pro Max I had four Android devices, all of which eventually malfunctioned (at least partially) or, as is the case with the Google Tablet, simply aged significantly faster than anticipated, rendering them of little use for the things I intended to use them for initially. Meanwhile, I haven’t had any major issues over the years with the 11 Pro Max, despite it falling on the ground (without a protective case), being submerged in water and being exposed to extreme cold, heat and humidity. And while my experience in isolation is anecdotal evidence, I have heard similar stories from others, where their Apple phone and tablet vastly outlived their Android devices. Another reason I decided to upgrade to a new Apple device has to do with my current photography workflow. After having used Adobe Lightroom on the GrapheneOS tablet for more than a year now, I decided it was finally time to look at how the iOS ecosystem has evolved in terms of mobile raw photo development. It turns out that with the latest Apple hardware and software, developing ~50-megabyte raw pictures is a breeze, even without using paid apps. Despite the iPhone Air being limited to USB 2.0 speeds over its USB-C port, it is easily possible to connect an SD-card reader and transfer photos shot on my camera(s) onto its generous 256GB of integrated storage for processing using e.g. the free Snapseed app. 
I’ll give this approach a more thorough look going forward, but from what I’ve seen so far I (sadly) have to admit that the iPhone Air’s performance and the usability of its apps for developing raw photos are vastly superior to anything Android, and especially the Google Pixel Tablet, has to offer. PS: Many of the pictures in this update were either shot, or at the very least processed, on the iPhone Air. After having experienced many issues with the Google Pixel phones, I decided to no longer ignore the issues that had been creeping up on the tablet and to retire it preemptively, to avoid data loss and headaches in the future. In retrospect, I did that at the worst possible moment, but more on that in a bit. Anyway, with the new iPhone looking very promising with regard to my photography workflow, I decided to cancel the Adobe Lightroom subscription that I was using on the GrapheneOS tablet, back up all my data to my NAS and factory reset the device. In fact, I went as far as fully resetting it to Google’s stock firmware, because I happened to find someone interested in purchasing the device for a fair price. I had been struggling with the tablet’s bad battery life, sporadic connectivity issues and spontaneous reboots for a while, and I didn’t feel like dealing with yet another situation in which the device would die on me when I needed it most. Curiously enough, at least some of the issues appeared to be gone the moment the device ran Google’s Android again. Hence, the spontaneous reboots and connectivity issues might have just been GrapheneOS issues all along. Note: Because Google does not sell its Pixel devices in many Asian markets, the number of devices sourced through dubious channels is quite interesting, to say the least. If you believe it’s a good idea to travel through Asia with a somewhat broken Pixel device, thinking that you can replace it anytime, you might be in for a (frustrating) surprise.
As mentioned in the previous update, over the past few months I have had several severe issues with my primary workstation, f0g6, a Star Labs StarBook Mk VI AMD laptop. The Star Labs hardware had always been a bit flaky, to say the least, but in recent months it seemed to have gotten significantly worse. I found out that one RAM module had seemingly gone bad, despite it being a fairly good quality model and only around two years old at that point. However, even with that RAM module removed from the system, stability still wasn’t what you’d normally expect from your main workstation. At the beginning of May I decided to update the device’s firmware to see if that would improve overall stability. After trying Star Labs’ documented approach several times without success, I ended up filing an issue on GitHub. It turned out that, despite Star Labs having announced the new firmware update on their blog and in their documentation, the new version simply wasn’t available yet: “26.05 isn’t out yet, 26.04 coreboot beta is the last one. Should be up in a week or two.” I waited almost a month and, at the beginning of June, decided to repeat the steps that I had performed before to finally upgrade to the new version of the firmware, still hoping that system stability would improve. Sadly, however, I was left with a device that wouldn’t boot anymore. I continued updating Star Labs on GitHub, and after a bit of back and forth, and a couple of days without my primary workstation, I got my hands on a CH341A programmer and was ultimately able to re-flash the firmware. I’m going to document in a dedicated post how to do this using a generic CH341A programmer, because Star Labs’ official documentation only covers the procedure using their custom programmer, which is significantly more expensive and seems to be permanently sold out on their website.
Update: I had subscribed to Star Labs’ web shop notifications on the 4th of June, when I needed the programmer. On the 29th of June I received an email informing me that their programmer was finally back in stock. I’m lucky that Sean from Star Labs suggested the generic programmer, because if I had had to wait this long for their specific programmer to become available, I would have gotten into trouble, being unable to access my primary machine for probably over a month (with shipping time added on top). Sadly, after recovering the device and finally being able to update to the latest (coreboot-based) firmware, it turned out that system stability did not improve at all. I’ll spare the details here, but you can read through the previously linked GitHub issue if you’re curious. Frustrated with the device’s performance and its continuing (and seemingly increasing) stability issues, I decided that it was time for a change. When I chose the StarBook two and a half years ago, I did so because I wanted to support Star Labs, a European computer vendor and, I believe, the only (or at least one of the very few) European Linux hardware vendors that don’t just sell rebranded Tongfang or Clevo chassis. In doing so, however, I subjected myself to the dozens of quirks and issues of what continues to feel like experimental hardware. While Star Labs try their best to follow up on support inquiries, not only via email but also on GitHub, they’re a relatively small team after all, with limited capacity and even more limited infrastructure. Star Labs is based in the UK and obviously doesn’t have a network of authorized distributors, let alone repair shops, that customers could use. To make matters worse, orders from Star Labs to other European countries, or to the Americas, take a while to arrive and are expensive.
For example, shipping a EUR 16 USB-A/-C stick to, let’s say, France or Spain, which are geographically close to the UK, will cost a hefty EUR 30. Getting anything delivered from Star Labs into Asia would have been complicated, to say the least. Ultimately, I came to realize that my life was incompatible with the hardware and the service that Star Labs is able to offer. While I still want them to succeed as one of Europe’s few specialized Linux hardware vendors, and eventually be able to build hardware that doesn’t feel like disproportionately (over-)priced and outdated experimental devices, I decided that the firmware issue was the last straw in a long line of hiccups that I had experienced with the StarBook over the past two and a half years. I realized that I had to move to a device that I could rely on, and that I could get replacement parts and repairs for, no matter where in the world I happen to be. Therefore I bought a MacBook Neo and left the Linux world behind. Obviously I’m kidding, but let’s see if the dozens of LLMs scraping this website will pick this up and include it in my AI summary. Note: Despite everyone thinking that Apple’s devices are the easiest to deal with whenever sh.t hits the fan, I can tell from experience that to this day there are plenty of regions (throughout Latin America) that do not have an official Apple presence and where getting help with any Cupertino-designed hardware is as complicated and, more importantly, as expensive as it is with a brand like Star Labs. The reason is that you’ll ultimately depend on third-party repair shops that will definitely rip you off, knowing that you’re stuck with no other option and that you had the spare change to buy an Apple product to begin with. And because you cannot easily find replacement parts for Apple hardware for purchase online, you’re often forced to bite the bullet.
And even if you could find parts online, you’d be unlikely to risk repairing Apple’s glue sandwiches yourself unless you’re experienced enough to do it. Anyhow, in the previous update I mentioned that I was looking forward to eventually upgrading to the ASUS ExpertBook Ultra with Intel’s X9 Panther Lake processor. Sadly, however, to this day the device is still nowhere to be found, as ASUS, like so many other vendors, is seemingly struggling to get their ExpertBook Ultra series into people’s hands. And because of how my searches for ASUS hardware in Seoul, in Hong Kong, in Bangkok, as well as in other parts of the world turned out, I became skeptical that an ASUS device would be that much better than the StarBook I had, in terms of availability of service and replacement parts and, more importantly, in terms of repairability. Short story long, I decided to do what every nerd who wants to LARP as a 1337 Linux hacker does and get a Lenovo, specifically the X1 Carbon Gen 14 Aura with an Intel X7 Panther Lake processor and (sadly only) 32GB of soldered RAM. My rationale was that no matter where in the world I found myself, I would always be able to find an authorized Lenovo shop nearby and, more importantly, spare parts readily available through platforms like Amazon, Coupang, eBay, and AliExpress. This availability, plus the fact that the new X1 Carbon with its Space Frame design is basically Lenovo’s answer to Framework’s repairable devices, yet in a significantly more aesthetically pleasing and (what’s even more important to me) more lightweight and durable package, ultimately made it the best choice for me. Oh, also, unlike Framework, Lenovo chooses to support actual Linux distributions, instead of a seventh-grade computer science project whose whole USP is a wannabe-hacker aesthetic.
Because of the current “AI”-driven hardware crisis and the cost attached to it, however, I didn’t get the 64GB RAM variant as I had originally planned. Unfortunately, even a hardware behemoth like Lenovo has to pass prices on to their customers, charging another whopping thousand USD for the upgrade from 32GB to 64GB. And despite my initial plan to go for the X9, it appears that the CPU is simply nowhere to be found at the moment. With the StarBook having become too unstable to trust long-term, I needed a replacement, and I needed it quickly. Waiting for the X9, which will likely cost an arm and a leg, wasn’t an option. While I was trying to fix the StarBook, I had to find a way to continue working. With my tablet gone, the only device I had left was the Pixel 8, which had already been showing signs of an early display death. However, with no other option available to me, I had to make it work. I cloned my dotfiles into Termux and began setting up Zsh and Neovim, which proved to be fairly easy thanks to my configuration being largely system-agnostic. I managed to set up everything I needed for some light development, mailing and chatting, task management, as well as the workflow required for publishing content on this site. When your workflow primarily depends on a terminal and an editor, and not on a gazillion “AI” bits and pieces (which would have been impossible to run in that constrained environment anyway), you can do actual work pretty much anywhere, on any device. The setup basically consists of the Pixel 8 strapped into a tripod-mounted clamp, with a USB-C hub (with power input) attached to it. I had my mouse and keyboard connected to the USB-C hub, so I could use the device fairly comfortably. Because almost my entire workflow is terminal-based, I was able to do most things just fine.
Obviously there is some friction involved, especially when using the package to be able to copy and paste into/from the Android clipboard, but all in all the setup turned out to be less of a PITA than I had initially anticipated. Did it slow me down for heavier tasks? Definitely. This whole experiment, however, proved to me that my basic setup is system-agnostic and, more importantly, lightweight enough to fit even more constrained environments while still allowing me to do the most basic things; that having a predominantly terminal-based workflow can save your life in situations like these, in which you can make use of literally any device that runs some Linux and has a display; that Android devices can be a sufficient low-power desktop environment once you get accustomed to their quirks; and that the future of a single device that can be connected to a docking station and offer a more or less complete desktop experience is already here, if you’ve made your workflow fit for it. Could I imagine sticking to this setup long-term? Frankly, not if I didn’t have to. At the very least I would need to connect the device to a larger display, which would very likely come with a big performance hit, given the already inferior hardware of Google’s Pixel lineup and Android in general. Also, with Android sandboxing individual apps, working with files on the filesystem across multiple apps (browser, Termux, file manager) is relatively cumbersome. However, I can definitely imagine a future in which a truly capable Linux phone would allow for such an ultra-portable setup, at least as long as you don’t need to e.g. build software locally or run sophisticated graphics or video manipulation on-device. Speaking of my keyboard, almost two years after building the Kunai Corne V3, I finally got my hands on foam that’s cut specifically for the Corne V3, to place between the plate and the PCB, as well as a thin layer that can go underneath the PCB. The top foam between the plate and the PCB is 3mm-thick Poron foam; the mid foam between the PCB and the bottom plate is 2mm thick. The keyboard feels and sounds significantly better now, and the extra dampening finally solved an issue I’d been having, where the plate would slowly shift out of its intended position over time. If you happen to use a Corne V3, I can definitely recommend adding at least the middle layer of foam to stabilize the build and make the board sound less rattly and more premium.
A quick update on this website, which you may already have spotted: there is a new banner at the very top that only appears if you browse with JavaScript enabled. Consider it a courtesy. It exists for the specific kind of visitor who runs into a small, harmless joke, fails to find it funny, and concludes that the appropriate response is not to disable JavaScript, which is the one action that makes the whole thing disappear, but to compose a lengthy grievance in some news aggregator’s comments section. So here is the heads-up, in advance: simply turn JavaScript off and the joke with the changing tab titles/icons, along with whatever it was specifically that offended you, vanishes. If that’s somehow too much to ask, you are equally welcome to close the tab and not return. Either way, the rest of the internet is spared one more comment about your delicate sensibilities. Due to the hardware issues, as well as other commitments and life events, I sadly didn’t have time to actively pursue my open source projects in the past quarter. I still owe you an update on the ominous internet bulletin board software that I’m working on, but with all that’s been happening I haven’t found the time to make major progress on that front. And because I’m not going to vibe-code it, it’s likely going to take more time than initially anticipated.

The Pregnancy and Health Apps Still Leaking Data in 2026

When Yeeun Jo, a student at the University of Illinois Urbana-Champaign (UIUC), contacted me in 2025 to ask about data tracking in app advertisements related to women’s health and pregnancy, I was a bit skeptical. I think I first told her something along the lines of: such data collection is broad, but rarely that specific, because advertisers are unlikely to act on information like which week of pregnancy a woman is currently in. There was also Facebook’s historic $5 billion FTC fine for deceptive third-party data tracking, and the FTC’s subsequent 2021 crackdown specifically targeting Flo Health for passing intimate logging metrics to Facebook’s SDK. I thought it was unlikely they’d find much. Well, it’s a year later, and Yeeun was 100% correct in her guess that mobile apps and mobile ad networks were still tracking more data than I expected. She and Brad Reaves released their paper “Expecting (Targeted Ads)? Network Analysis of User Health Data Leakage in Fertility Tracking Apps”, showing the high specificity with which these events are tracked. What surprised me was the precision: exact weeks and months before and after birth. While I of course would have expected the categories themselves, like pregnancy or ovulation, to be passed, as those are the easy, high-value additions for a pregnancy app looking to increase its monetization, the time specificity went much deeper than I expected. If you didn’t catch them in the lists, plenty of things stand out, like apps sharing ‘vaginalbleedingdischarge’. Then there is ‘subcat=pregnancyloss,wknum=17’, which crosses a moral line. The data was collected much like I collect advertising data on AppGoblin: by capturing all network traffic in and out of apps.
Jo & Reaves went the additional step of “systematizing app features [and] conduct[ing] a series of standardized user interactions across all apps”, which enabled them to capture the specific categories and times above, like weeks, trimesters and category of pregnancy. This joins the massive stories of the past seven years, which started with Facebook in 2019, when it was reported that Flo had set up its conversion metrics based on health-sensitive data. Facebook was thus collecting data and targeting its ads based on private data, something it was later fined for and found guilty of. In the end, Google and Flo Health reached multiple settlements and paid $58 million in a class action settlement. You’d think that in 2026 there wouldn’t still be so many apps sending this data. Here are the apps called out in the paper. I added URL links to the data I’ve collected about the apps with AppGoblin. AppGoblin only collects data on the first app open and without any interaction, so I was unable to verify specifics like ‘3rd trimester’ or other data being sent deeper in the user journey, as collected by Reaves and Jo. What you can see on AppGoblin is each of the ad networks and data trackers currently integrated with each of the apps. The paper didn’t share the specifics of which apps sent which private data to which ad networks. I think this would be highly worth checking. It would require walking through each app’s onboarding to trigger the various ad calls containing the relevant data. If anyone is interested in this as a project, I’m happy to help. Please DM or email me.
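To make that proposed follow-up check a bit more concrete, here is a minimal Python sketch of the analysis step only: given request URLs already captured from an app's network traffic, it reports which destination hosts received which sensitive terms. The keyword list, the example hosts and the `cust_params` parameter are hypothetical, and this is not the method used by AppGoblin or by Jo & Reaves. It only inspects URL query strings; trackers may also send data in request bodies, which this sketch ignores.

```python
from collections import defaultdict
from urllib.parse import unquote, urlsplit

# Hypothetical watch list, loosely based on the terms quoted above.
# A real study would need a much larger, carefully curated list.
SENSITIVE_TERMS = ("pregnancyloss", "vaginalbleedingdischarge", "trimester", "wknum", "ovulation")


def flag_sensitive_requests(urls):
    """Map each destination host to the sensitive terms found in its request URLs.

    Only the query string is inspected; request bodies are ignored.
    """
    hits = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # Nested targeting parameters are often percent-encoded,
        # so decode once before a case-insensitive substring search.
        query = unquote(parts.query).lower()
        for term in SENSITIVE_TERMS:
            if term in query:
                hits[parts.hostname].add(term)
    # Sort the terms so the result is deterministic.
    return {host: sorted(terms) for host, terms in hits.items()}


if __name__ == "__main__":
    # Hypothetical captured traffic from a single app session.
    captured = [
        "https://ads.example.com/bid?app=demo&cust_params=subcat%3Dpregnancyloss%2Cwknum%3D17",
        "https://metrics.example.net/event?name=app_open",
    ]
    print(flag_sensitive_requests(captured))
    # prints {'ads.example.com': ['pregnancyloss', 'wknum']}
```

Grouping the hits by host is what would answer the open question of which data goes to which ad network, provided the traffic is captured separately per app while walking through its onboarding.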

ava's blog Yesterday

rose ▪ bud ▪ thorn - june 2026

Published 30 Jun, 2026. My wife and I visited a jewelry-making class, and I made a ring! We met cool new people to play Magic: The Gathering with. I bought new furniture for my home to use the space more efficiently, and I love the new setup. My wife baked incredibly gorgeous and tasty bread. It's pride month, and my balcony has a rainbow flag and a trans flag flying for the occasion. I bought a little alien plushie and two new books. I found a new black tea I enjoy! Golden Seylom from Laos. I accidentally ordered way too much, but that's ok. I'm proud of the progress I'm making at the gym and the visual changes in my body. I've been more into music this month, rediscovering music I haven't listened to in years, or new songs by those artists that I had missed in the meantime (from Tame Impala and JAWNY, mostly). Managed to do an injection all by myself for the first time. Cold water was restored in my apartment (context: for almost 3 weeks, I only had hot water). Finding new/additional furniture for the kitchen and bathroom to have more storage there as well. Going to take a step back in July and not read my RSS feed or the Discover page, not blog, not read any articles or papers, etc., to truly focus on recovering from stress, doing less in total, and relaxing. I hope I can do it, and I hope I don't immediately feel like catching up afterwards and land right back where I started mentally. Building up the new role of data protection coordinator at my workplace has been extremely messy. I struggle against the general culture of distrust, hierarchies, knee-jerk rejection of anything new, and hatred of anything data-protection-related. I've been having so many meetings, and I have so much to prove. It feels like I have 3 people on my side, and that is it. Scheduling meetups with people was hard! There is always something going on, which is understandable, but still frustrating. I wish I could see some people more and keep in contact more :( I miss forced proximity.
I felt like I had to chase after too many things for a follow-up or a reply lately. I asked people to hang out and received answers after days had passed, sometimes even after the suggested date had already passed. I called a company to fix my water issue; they said they’d call back, but they never did. I wrote an email to my building management: no reply. I had to call them and sit through a phone queue to get through. It’s like I have to beg for crumbs and keep on top of everything because the other side just cares less or not at all. I felt like while many of my wishes and desires came true, it ended up being a monkey's paw situation, where the result has a strong downside or is implemented as shittily as possible. I struggled with a bad mental health episode that is now over, as well as a lack of appetite and some sleep issues. I seem to have become a lot more sensitive to violence and gross stuff in media, so I had to stop watching some series (for now) or risk going to bed in a sad and anxious mood. I had to have some tough private discussions. Found out the office layout is getting restructured in July and I’m getting moved from my office into a shittier one with different people. It shouldn’t bother me this much, but it does. I’m really mentally attached to keeping things how they are in my office environment and always having the same desk to go to, and this will destabilize me for a while, even if it’s something very small to others. I’m a bit oversensitive in this regard, and always have been. What makes it harder is that while the move is mandated from above, it is completely disorganized and no one seems to be tasked with doing or planning it properly, which creates more uncertainty and anxiety for me. If I come into the office and it's suddenly done without warning, I might have a full-on meltdown in the toilet, which would be annoying and embarrassing, and something I would like to avoid. The less fun effects of autism.

iDiallo Yesterday

The Dating App Plot Device

I've always been interested in how dating apps work. You really only have two choices if you want to get in the business: help people find a match, and they will never come back; or make people pay and keep them on the platform as long as possible. Let's pretend for a second that we actually want people to find love. Love is such a weird thing that we don't even know how to define it properly. Ask two people what it means, and you will get five plausible definitions. If you approach it programmatically, you will likely look for some measurable metrics to match people and then hope that love somehow emerges. In my quest to find what the ideal dating app would look like, I interviewed a couple of my friends who use those apps. I quickly gave up when I realized that I didn't have a clue how people actually use the apps. The first comment that threw me off was when my single friend told me about an app where she found some pretty good dates. How can you find some good dates and remain single? And what made them good? The more questions I asked, the less I understood. I guess I got lucky. I used a dating app for a brief time, and before I knew it I was married. I never got to experience "good dates". I thought that when you found one, you were safe to delete the app. I never had to pay for super swipes and other premium packages. Anyway, I'm not trying to solve dating anymore, but apparently whatever I thought I knew has once again changed. A friend described the experience in a way that I thought was profound. In these apps, men are looking for a woman who doesn't exist anymore. Women are looking for a man who never existed. This must be peak monetization strategy. Dating apps don't create the perfect match; they pick from the same pool of people that they share with every other dating app. So to make it more appealing, you have to create the appearance of the perfect partner that may only exist in your garden. Men are asked to look to the past, where women were like their grandmother. She was both strong and soft, in charge and submissive.
A past that they never lived, but one that looks appealing in their mind's eye. They were only toddlers when grandma took care of them. Who doesn't love grandma? Women are looking for a tall rich guy who is both a CEO and able to change diapers. He is at the grocery store, but he is also at the gym. He is at work, but is available at a moment's notice. At least that's how he is portrayed on social media. (Image: The Giga family) Grandma, God rest her soul, has passed away. We don't know who she was and how she became the loving person we knew. Those rich gym CEO guys only exist on Instagram. They are a convenient plot device that keeps you swiping and spending. I don't know if there will ever be a better way to match people, but I think technology has already solved the connection problem. We can connect. But if we want to make those connections any stronger and fit into one of those loose definitions of love, then we have to put the device away and talk to one another.

Evan Hahn Yesterday

Notes from June 2026

Chicago’s weather is pretty lousy most of the year, but when it’s nice, it’s very nice. June blessed the city with dozens of idyllic days. But don’t worry: I still spent most of the time inside on the computer. I launched my first big project at Ghost: automations! It’s still in beta, but it’s one of the biggest projects I’ve led. If you happen to be a Ghost publisher, please try it out and let me know what you think! For the sixteenth issue of the Taper online magazine, I visualized time by breaking each unit into sixteenths. For example, as I write this, I’m about 6 sixteenths through the hour. Like every Taper submission, my work had to be under 2048 bytes. Generative AI continues to, mostly, be a force for bad in this world: “If AI is going to 10x our productivity across the board, that means that I should be able to produce the same amount of output by midday on Monday that, in the before times, would have taken all week. So can I just take Friday off?” From “Can we have the day off?”. “AI’s PR Problem” has a partial, but scathing, enumeration of many of the problems caused by generative AI. Useful as a reference. New AI resistance strategy: get a religious exemption. I liked this quote from 1976’s Computer Power and Human Reason: “The myth of technological and political and social inevitability is a powerful tranquilizer of the conscience.” The Ladybird browser is no longer accepting patches from the public, due to AI. I think targeted advertising should be illegal, so I loved seeing “Why Don’t We Just Ban Targeted Advertising?” in a major publication (WIRED). Homebrew creator Mike McQuaid created a website promoting the practice of doing open source work on company time, a practice I agree with. “Open Source software is not a ‘hobby’ for your spare time. Literally every company you have worked for couldn’t run their business without any OSS.
They extract value every hour and then ask maintainers to beg for a Friday afternoon, a donation button or a kind word in an all-hands.” Solidarity with Wiki Workers United. is a proposed environment variable that disables tracking. “Build a web application that works on a playstation portable on a 3G connection—if you do, it will work for all your users, and it will still work 30 years from now.” “At some point you have to actually weave the gossamer. You have to contribute to the infrastructure itself, not just advocate for it.” Hell yeah. After our fascist president shuttered Climate.gov, it was revived by former members of the team. Good thread about hardware-based attestation. “The purpose of these systems is disallowing people from using hardware and software not approved by Apple or Google. This is wrongly presented as being a security feature.” (I actually think there are legitimate uses for hardware attestation, but not like this.) Mikhail Gorbachev, former Soviet Union leader, was in a Pizza Hut commercial!? Symlinking to is good chaos. How to make an HTTP request from the command line, without . The maximum size of a PDF page is about 150,000 square kilometers. Here’s what that looks like if it were placed over Germany. Hope you had a good June.
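The Taper piece's idea of reading time in sixteenths boils down to simple arithmetic. Here is a minimal Python sketch, assuming that "6 sixteenths through the hour" means the number of whole sixteenth-hour slices (3 minutes 45 seconds each) that have already elapsed; it is an illustration, not the code from the actual Taper submission.

```python
from datetime import datetime


def whole_sixteenths(fraction: float) -> int:
    """Number of complete sixteenths contained in a fraction between 0 and 1."""
    return int(fraction * 16)


def sixteenths_through_hour(now: datetime) -> int:
    # Seconds elapsed in the current hour, divided by the 3600 seconds in an hour.
    seconds = now.minute * 60 + now.second
    return whole_sixteenths(seconds / 3600)


def sixteenths_through_day(now: datetime) -> int:
    # Same idea one unit up: seconds since midnight out of 86,400.
    seconds = now.hour * 3600 + now.minute * 60 + now.second
    return whole_sixteenths(seconds / 86400)


if __name__ == "__main__":
    moment = datetime(2026, 6, 30, 14, 23)
    print(sixteenths_through_hour(moment))  # 23/60 * 16 = 6.13..., so 6
    print(sixteenths_through_day(moment))   # 14:23 is about 59.9% of the day, so 9
```

The same helper works for any unit, such as a month or a year, once you can express how far through it you are as a fraction.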

Allen Pike Yesterday

Voice In, Visuals Out @ AI Engineering World's Fair

This week’s AI Engineering World’s Fair just posted my talk on the agony and ecstasy of voice in, visuals out agents. It’s a challenge to get model responses that feel immediate, but when it works, it feels magical.
