Posts in Ai (20 found)
Giles's blog Yesterday

JAX backends and devices

There's nothing like writing your own code with a framework to clarify how things fit together! Continuing with my port of my PyTorch LLM code to JAX , I wanted to load up a large dataset: the 10,248,871,837 16-bit unsigned integers in the split of . That's just over 19GiB of data. When I ran that, I got a CUDA out-of-memory error: That makes sense! The allocation it was trying to do is exactly the size of the data I was trying to load. I have an RTX 3090 with 24 GiB, but some is already used up by the OS, various apps, and a model that the code creates earlier on. But in PyTorch land, I was used to things being loaded into RAM by default, and only moved over to the GPU when I asked it to do that. JAX was clearly loading to the GPU by default. How could I stop it from doing that for this case? The load into the GPU was happening inside Safetensors, in code I couldn't directly control. Understanding how to do it helped me understand a little bit more about JAX. JAX has a function that looks relevant: . Without reading the docs, let's try running it. In my virtualenv, with the package installed, I get this: That seems a bit weird! I do indeed have a CUDA device, but I also have a CPU, obviously. Why isn't it showing up? Running the same code in another virtualenv, with just installed -- no CUDA -- gets this: OK, so it did recognise it this time. Feels like it might be time to RTFM. The docs explain things a bit: Returns a list of all devices for a given backend. If is , returns all the devices from the default backend. The default backend is generally or if available, otherwise . OK. So JAX has multiple backends -- named that because they're classes of backend hardware that XLA (the compiler behind the JIT) targets. There is a default one, which is essentially going to be the "best" one available given the hardware configuration and the parts of JAX that are installed. 
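The "just over 19GiB" figure quoted at the start of this post is easy to sanity-check: each 16-bit unsigned integer takes two bytes, so the size is the element count times two, divided by 2^30. A minimal sketch in plain Python (the variable names are only illustrative):

```python
# Number of uint16 values in the dataset, as quoted in the post.
n_items = 10_248_871_837
bytes_per_item = 2  # a 16-bit unsigned integer occupies two bytes

total_bytes = n_items * bytes_per_item
total_gib = total_bytes / 2**30  # 1 GiB = 2**30 bytes

print(f"{total_bytes:,} bytes = {total_gib:.2f} GiB")
# prints "20,497,743,674 bytes = 19.09 GiB"
```

On a 24 GiB card that leaves under 5 GiB for everything else, which is consistent with a single allocation of this size failing once the OS, other apps, and a model are already resident.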
When I had the CUDA version installed, it made the backend default, but when I didn't, it defaulted to (and warned me). And because it only shows the devices on the default backend, when that was , I didn't see the CPU. However, you can specify which backend you want to use with that parameter, so let's go back to the virtualenv with CUDA: Great! So is there some way to list which backends are available? Apparently not -- the recommended way appears to be to try loading devices for the different possibilities, and catch to see which ones aren't available. Yuck. But maybe that's not such a big deal. In PyTorch-land I was very much used to putting code like this near the start of my code: ...then moving models to the device: ...and then moving data to the model's device as needed: What I actually wanted was essentially what JAX does -- have everything on the fastest device available at all times -- but with specific exceptions. In particular, the one that started off this investigation: how would I put this huge array of training data on the CPU's RAM rather than the GPU's VRAM? I had a bit of a false start when I spotted that the function in the Safetensors FLAX API has a parameter, but that appears to be more to do with how it loads up the file -- a backend in a different sense. And anyway, backend is not the right concept in JAX-land, as the backend means just something generic like -- for what we're trying to do, we want to load it onto a specific device . After some digging around, I discovered that JAX has a concept of a default device , which is the one used when it doesn't have any indication of where to put something. It makes sense that this will be on the default backend -- indeed, it looks like it's essentially "the first device in the list that returns for the default backend". There is a config option which you can use to set it; you'd normally use or an environment variable to change it. But what if you only want to change it temporarily? 
I found this documentation for . The docs are more than a little confusing: Context manager for config option. Configure the default device for JAX operations. Set to a Device object (e.g. ) to use that Device as the default device for JAX operations and jit’d function calls (there is no effect on multi-device computations, e.g. pmapped function calls). Set to None to use the system default device. That near the start tripped me up, as I missed the words "Context manager" just below, and the odd type, and tried this: I still got the CUDA OOM, though, so I reread the docs, spotted the "context manager" bit, swore violently, and tried this: ...which works. It looks like the equals sign in the docs is being used to mean something very different to what you'd normally use it for, and they decided not to actually document the signature of the context manager. Heigh ho. I guess documentation is hard . Still, at least now I have a solution. And as I said earlier, doc grumbles aside, the shape of the code might wind up being a little less fiddly than PyTorch. The default location of things I create is the fastest hardware I have, which is what I want. And for the rare exceptions when I don't want to use that, there is a reasonably simple (now that I know it) way to say where I want things to go. I'll call that a win :-) The only thing I'll need to remember is that when, in my training loop, I want to use subsets of that in-RAM tensor, I'll need to move them to the GPU. looks like the right tool for that.
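The overall pattern described above can be sketched as follows. This is a minimal illustration rather than the code from the project: the small `jnp.arange` array is a stand-in for the Safetensors load, the slice size of 32 is arbitrary, and it assumes the CPU backend has not been disabled (for example through the `JAX_PLATFORMS` environment variable).

```python
import jax
import jax.numpy as jnp

# Ask for the CPU backend explicitly; jax.devices() with no argument
# only lists devices on the default backend.
cpu = jax.devices("cpu")[0]

# Inside the context manager, newly created arrays are placed on the CPU
# rather than on the default (fastest) device.
with jax.default_device(cpu):
    data = jnp.arange(1_000, dtype=jnp.uint16)  # stand-in for the big load
    # Slice under the same context so the slicing itself runs on the CPU.
    chunk = data[:32]

# Move only the small slice to the default device for training.
fast = jax.devices()[0]
batch = jax.device_put(chunk, fast)

print(data.devices(), batch.devices())
```

On a machine with only the CPU backend, `cpu` and `fast` are the same device, so the final `device_put` is effectively a no-op; with CUDA installed, only the 32-element slice should end up in VRAM.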


Premium: The Hater's Guide To The AI Bubble 3.0

Last year I wrote one of my favourite pieces ever — The Hater’s Guide To The AI Bubble — and followed it up with The Hater’s Guide To The AI Bubble Volume 2 several months later. Sadly, I’ve realized “volume” is a terrible way to structure something like this, because each volume is more of an update , which is why today’s newsletter will move to a versioning system. The AI bubble is a psyop, a melodrama, a financial crisis, and a mask-off moment for the Business Idiots that run the vast majority of our economy. It is the largest-scale exploitation of ignorance in history, gnawing at the intellectual weaknesses of society by presenting just enough information or just enough proof to substantiate a trillion-plus dollars of investment and manufactured consent for a technology that, based on how many discuss it, doesn’t actually exist. And it’s revealed how many rich and powerful people are either (or both) credulous and woefully ignorant. To be clear, LLMs are real and do some things, but they don’t do any of the things that Dario Amodei is talking about when he says that AI will wipe out 50% of white collar jobs . We’re four years into this joyless slog and people are still talking about AI’s “potential” and what it “will” do and that we’re in the early innings of a technology that, for the most part, is still doing exactly what it was doing at the beginning with refinements that never come close to reaching the vacuous heights of boosters’ promises.  Markets are moved by poorly-written fan fiction by outright scam artists and deceptive hedge fund gargoyles because those selling AI services have entirely disconnected the minds of the markets and the media from reality. This is because con artists like Amodei and Altman constantly discuss what AI will or might or theoretically could do rather than what it actually does , because if they had to do that they’d have to say it constantly loses money and doesn’t have a measurable return on investment . 
As I said on Bloomberg this week, the markets and the media have conflated capital expenditures for data centers with a thriving AI industry. In reality, 89%+ of all AI revenues and 90%+ of all compute demand come from two companies, OpenAI and Anthropic, largely based on money-losing subsidized AI subscriptions and unrestrained token burn at organizations run by imbeciles that will go away now that executives are having trouble justifying it because there’s no ROI, in part because AI is too inconsistent and unreliable, and in part because you can’t really measure how much a task will cost. Now enterprises are already capping their AI spend, with many more going to follow after multiple companies blew through their annual token budgets in a few months. The sheer volume of the “AI ROI” conversation is remarkable considering that Anthropic and OpenAI only moved enterprises to token-based billing (paying the actual costs of AI) in Q1 of this year. Remember: the total, actual revenue of the entire AI industry, including OpenAI, Google, Microsoft, Amazon, and Anthropic, has barely reached $100 billion in 2026. That includes every ounce of compute spend, every penny of the $500 million that a single customer accidentally spent on Anthropic’s API, and every cent of NVIDIA’s backstop deal with CoreWeave. More importantly, absolutely nobody is making a profit outside of those selling the bits that go inside a data center. Both OpenAI and Anthropic lose billions of dollars a year, with no end in sight, though Anthropic did a great job swindling the media by having a single “profitable” quarter thanks to Elon Musk discounting two months of compute. Anthropic has already filed to go public, with OpenAI allegedly not far behind. Neither of these companies is fit for public investors. 
Their products are inconsistent, unreliable and only ever seem to get “better” in a kind of wobbly way that can only be measured by increasingly-less-useful benchmarks that they specifically train to beat. Despite many people (and some companies like Spotify ) claiming that AI is writing “most” code, nobody can seem to explain what that means. It isn’t saving money, it isn’t saving time, it isn’t making companies ship better or more-functional products, and the only tangible examples of its effects are that it broke AWS several times and deleted a company’s database . It’s unclear where AI exists outside of coding and the various places companies have shoved it.  I’ve spent years trying to catalogue other, non-coding use cases, and most of what I’ve found are vague descriptions of companies like Goldman Sachs maybe launching agents “soon” at some point to do something maybe and this weird story with Novo Nordisk claiming that it was “integrating ChatGPT’s models to analyze complex data sets” despite them claiming to have done this for years . That’s because generative AI is, no matter how many hats or harnesses or deterministic processes you add, limited by its mathematically-certain hallucinations . These models are probabilistic, guessing at what the ideal output may be, which means that every bit of information they produce is suspicious and every decision they make is brainless, thoughtless and arbitrary. They do not “know” things, they do not have “thoughts,” and no amount of API connections will fix that problem. As a result, nobody has really got a clear answer as to what everybody is doing with AI. Code? Image generation? Using it as a shitty search engine? Using it as a companion? You can’t really rely on it to do anything. When a model hallucinates an incorrect answer to something you know is true it’s a problem that can be fixed — when it hallucinates an incorrect decision with your codebase, that’s fucked everything up to a near-permanent end. 
This is the ultimate problem with AI. You can try and dress it up with billions of investment and supposed ways to mitigate hallucinations, but it still makes — and will continue to make — mistakes that it has no idea are mistakes.  Well, okay, the other problem is that generative AI just isn’t built to do most jobs. It can generate stuff and summarize stuff at varying degrees of complexity, but the more complex the generation, the more likely it is to hallucinate. The only way to reduce hallucinations is pre-training (shoving stuff into the model at the beginning) and post-training (training it on what “good” looks like), and neither of these actually solve the problem. It is clumsy, inaccurate, unreliable, expensive and cumbersome.  AI cannot do the vast majority of jobs, and the only reason that anybody thinks that it can is that the vast majority of CEOs have no actual connection to the work that enriches them , and because AI can do an impression of something that looks like work, they choose to believe it can do anything . It can burp out a half-functional prototype with the company’s name on it or legitimate-looking legal or financial document, and that’s all it takes for a fuckwit with a high salary and a low IQ to think it’s capable of replacing everybody. If I were wrong, it would actually be replacing people. You’d be able to point to both the data and the proof. You’d have single-person software companies making billions of dollars, hyperscalers would have their companies destroyed by people copying and bettering their software, accountants and lawyers and writers and every other knowledge work career would be dead , not threatened with constant layoffs that are mostly connected to improving profits, but actually dead, untenable, impossible to work in thanks to the “power of AI.” In reality, AI is dramatic only in its mediocrity and the ferocity with which it’s proven how ignorant most authority figures and executives have become. 
Every boss demands you use it, every app screams at you to try its integration, every news story tells you it will replace you imminently, but in the end it doesn’t appear to do very much beyond generating and summarizing at varying levels of complexity.  The media categorically failed to scrutinize an industry built to exploit it, as I said earlier in the week : The consent has been manufactured and the markets are engorged with semiconductor stocks running because people keep mistaking the availability of debt for actual, real demand for AI compute. The geniuses in private credit and the greater markets saw the amounts that hyperscalers were spending on data centers and the ascent of OpenAI and thought “fuck me up, grandpa,” leading to $178.5 billion in data center debt deals in the US in 2025 and $50 billion in data center construction in April 2026 alone .  Yet it turns out that data centers take anywhere from 18 to 36 months to build , with Microsoft finishing a grand total of zero of the data centers it broke ground on in 2023 , and JP Morgan saying a month ago that 60% of capacity planned for completion in 2027 hasn’t even started construction, with another 7% delayed, per the Wall Street Journal .  And despite the supposed 100GW+ of data center capacity being planned, AI compute demand doesn’t really exist outside of Anthropic and OpenAI, two companies that rely on perpetual flows of venture capital and debt to survive. Between them, they’ve raised over $200 billion in the last six months , and their revenue streams are inherently based on either unprofitable AI startups subsidizing their subscriptions , their own unprofitable subsidized subscriptions, or experimental token spend borne of companies allowing their employees to burn as much as they’d like , which is already coming to an end. 
At the top of the pile lies NVIDIA, the largest company on the stock market, which sells GPUs that are so expensive that once cash-rich hyperscalers are now having to take on mountains of debt or, in Google and Oracle’s case, dump tens of billions of dollars of new stock into the markets. NVIDIA’s continued growth relies on a dwindling subset of clients, with 54% of its last quarter’s revenue and 64% of its accounts receivable coming from three customers in its last quarterly earnings.  Demand is somehow both incredibly high for data center components but so low for AI compute that NVIDIA has agreed to spend $30 billion over the next six years to rent GPU capacity.  That’s because the AI buildout is being driven by people who haven’t bothered to check whether the demand is real, much like AI is being adopted by people that don’t bother to do any real work, much like AI is sold based on things that it can’t actually do.  Midwits and the incurious will say this is just like the Dot Com Bubble ( it isn’t and won’t leave behind any useful infrastructure ), or Uber ( it isn’t ) or Amazon Web Services ( it isn’t ) because they want to rationalize the waste. In reality, the people running the tech industry are listless Business Idiots throwing as much cash at the problem as possible rather than facing the fact that they’ve backed a dead-end technology because they’ve run out of hypergrowth ideas . Today’s piece is an attempt at a little fun — a raucous, aggressive rundown of the major players and stories of the AI Bubble, both as a refresher for those who already know and a guide for those that don’t. Welcome to the Hater’s Guide To The AI Bubble 3.0. 
The Rot-Com Bubble — A Guide To How The AI Bubble Got Inflated Why You Keep Being Told AI Is Powerful How The AI Industry Is Almost Entirely Wrappers For OpenAI and Anthropic’s Models How NVIDIA’s Findom Operation Conned Every Hyperscaler How Microsoft’s AI Strategy Has Fallen Off The Rails How Google Is Using AI To Destroy Its Legacy How Amazon Lost The Plot And Became Anthropic’s Paypig  How Mark Zuckerberg Burned $158 Billion To Buy GPUs For Effectively No Reason How SpaceX Became Musk’s Last Gasp Attempt For Exit Liquidity How Anthropic Is The Greatest Exploitation of the Media and Economy In Tech History To Prop Up An Unsustainable Company Run By The Most Annoying People Imaginable How OpenAI Became A Miserable Failson With Too Many Ideas, Unsustainable Economics, and No Plan For The Future How The ROI Conversation Could Burst The AI Bubble

Giles's blog Yesterday

Using Safetensors with Flax

I'm porting my PyTorch LLM code to JAX , using Flax as the neural network layer. For various reasons I wanted to use Safetensors to store checkpoints of the model. It took a little while to get it working; here's the trick I learned. If you look at the Safetensors docs, you'll see that it doesn't mention a JAX implementation -- indeed, searching for "safetensors jax" at the time I'm writing this gives you a link to this GitHub repo by Alvaro Bartolome -- which was last updated in 2023. However, if you look more closely at the docs, they do have a link to the Flax API . I feel this is somewhat misnamed, as it is actually a JAX API. There's no reference (again, as of the time of writing) to Flax in the source -- it's all just JAX code. And in fact Bartolome's library uses it under the hood. There is one problem, though. The API works with simple single-level dictionaries, with strings mapping directly to JAX arrays. For example, the function has this signature: This can cause problems if you're not careful. If you look at the Flax documentation on checkpointing , it suggests that you use Orbax 1 , which has its own API and file format, but then goes on to say: When interacting with checkpoint libraries (like Orbax), you may prefer to work with Python built-in container types. In this case, you can use the and API to convert an to and from pure nested dictionaries. I initially put two and two together -- that and the dictionary-based API for Safetensors -- and got five, and tried feeding one of those "pure" dicts into Safetensors. I got a very confusing error: It's worth digging in to why that happens. The problem is that although Safetensors is expecting a dict of strings mapping to tensors, it doesn't check that that is what it actually gets. And while the dictionaries from are "pure", they are also nested (as the docs say!). 
Even for the simple model I was working with, I got a structure like this: So, we had strings mapping to dicts, and those dicts mapped from strings to the JAX arrays. More complex models would have had deeper dict structures. Now, internally inside Safetensors, the Flax/JAX API is a simple wrapper. It iterates over the keys in the dictionary it's been provided with, and tries to convert their respective values into NumPy arrays. It does that by passing them into NumPy's function, which accepts things like lists, tuples, and NumPy arrays, and converts them into arrays. JAX's own class exposes an interface that it recognises, so they're converted without trouble. Once it's done that, it passes the result to a lower-level Rust implementation that actually converts everything to Safetensors format. But because Safetensors didn't check types, in my case it was iterating over the top level of the dict, trying to convert the values to NumPy arrays, and got something like this: That is -- because it assumed that the values in the top-level dict were JAX s, it blindly tried to convert them to NumPy arrays. But they were dicts (that happened to map from strings to arrays) -- and if you ask to create an array based on a random object, it happily does so and wraps that object in a NumPy array, with a of . When that is then fed into the lower-level Rust code that is trying to write the file, it encounters NumPy arrays that have a it can't handle, -- hence that error: It all makes sense when you read through the code, but I was a bit perplexed for a while! I think all this might be the reason why Bartolome created his GitHub repo. In the README, he says that: There are no plans from HuggingFace to extend safetensors to support anything more than tensors e.g. , see their response at huggingface/safetensors/discussions/138 . 
So the motivation to create is to easily provide a way to serialize using safetensors as the tensor storage format However, you don't need to use that library to serialise simple Flax models. Consider how PyTorch models get serialised to Safetensors; my LLMs have keys with names like , , and . They're "flat" dictionaries mapping strings to PyTorch Tensors, similar to what Safetensors wants for these Flax ones, but they use dots to separate different levels, with integers for list items and strings for field names. Looking at the pure-dict structure I had for my model: ...you can see that you could walk the dictionary structure to generate keys like and . That would be easy enough to code up. But -- as Adithya Dsilva points out on GitHub -- you can get there even faster by using . That returns a (non-dict) structure like this: If you iterate over that , you get tuples where the first element is that tuple of strings, like , and the second is a object wrapping the JAX . The tuples mirror the dot-separated string format in the PyTorch-style Safetensors files. objects also implement an interface that can understand, so you can quickly and easily convert the to a regular dict for Safetensors: (You need to wrap in a because if you have a in your model, the item in the tuple will get an integer index rather than a string). You can go the other way pretty easily too; given a model, you can load the saved checkpoint into it like this (because accepts raw JAX s in place of explicit s): A little more work than I'd ideally like, but given that it can be tucked away in general / functions, not too big a deal. Hope that's of use for other people coming across this problem!

I'm beginning to feel a bit swamped with all of these libraries with names ending in -ax. It reminds me of the names of the characters in Asterix's village ...  ↩
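The "walk the dictionary structure" approach mentioned in this post can be sketched in plain Python. This is an illustration of the idea only, not the Flax-based route the post settles on: the helper names `flatten` and `unflatten` and the two-layer `params` structure are hypothetical, and plain lists stand in for JAX arrays.

```python
def flatten(tree, prefix=()):
    """Turn a nested dict into {"a.b.c": leaf} with dot-separated keys."""
    flat = {}
    for key, value in tree.items():
        path = prefix + (str(key),)  # str() because list indices may be ints
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[".".join(path)] = value
    return flat


def unflatten(flat):
    """Inverse of flatten: rebuild the nested dict from dotted keys."""
    tree = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


# Hypothetical two-layer structure; plain lists stand in for JAX arrays.
params = {"linear1": {"kernel": [1.0, 2.0], "bias": [0.0]},
          "linear2": {"kernel": [3.0], "bias": [0.5]}}

flat = flatten(params)
print(sorted(flat))
# prints ['linear1.bias', 'linear1.kernel', 'linear2.bias', 'linear2.kernel']
```

Two caveats with this sketch: it assumes no key contains a dot itself, and `unflatten` returns every key as a string, so integer list indices do not round-trip back to integers.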

Chris Coyier 2 days ago

The New Van

I got something I’ve wanted for years and years! A camper van! I’m a camper van guy now! It’s a Mercedes-Benz Sprinter, and even more technically a Winnebago Revel. I scoured Craigslist, Facebook Marketplace, and RV-specific sites for a long time drooling over these things. I’ve rented a half dozen of them on Outdoorsy over the years. I’ve borrowed friends’. So I feel like I knew what I wanted and I knew the specific price range I could go to, so it took a little while to find it. Ultimately one that came up on Craigslist led back to one sitting on the lot at a local spot called Just Used Cars. I liked the size of it. Just a normal length, not “extended”. Plenty of height to stand up in. I like that it’s an actual Sprinter base because of its nice poise/stance compared to other van bases. Also it’s 4WD and has good ground clearance. That wasn’t a requirement for me, but this will make me trust it driving in the winter and up to Mt. Bachelor and such, which will be really nice. Although funnily enough, I’ve already gotten it stuck once out in Pacific City at the beach, even in 4WD Low and using the traction boards on the roof. Sand is rough. I like the tan color as well. Maybe I’ll get some cool decal or have an artist paint the side or something someday. I wasn’t specifically looking for a Winnebago Revel, but that’s just how those roll, and honestly, the Winnebago name sounds nice to me. Long history making campers, obviously. The interest rate on this thing was horrible. It’s 8 or 9% or something. It doesn’t really matter, as my plan is to pay it off in the next few months. My thinking is that it will do great things for my credit score this way. We’ll see. I co-signed for a “normal” new car just recently, and that rate was 3%, which seems fine/good. I started writing a bunch more little stories about the van. I’ve been using it and thinking about it and working on it a ton, so there is a bunch to say. 
But I think I’ll break those out into smaller blog posts as I go! One quick one: after I bought it, the dealership called and told me the previous owner wanted to talk to me. I approved them giving him my phone number, we chatted, and he came over to see the van. I was able to return to him some things that belonged to him tucked away into crannies all over the van. He was a nice guy who just really really wanted the new owner to understand it . All the little details about how it worked and where you can put things and quirks and whatnot. We spent a few hours going over things. I really appreciated that, and it shows how attached some people can get to these homes-on-wheels.

Kev Quirk 2 days ago

It's Just Broken: Oh WordPress

by Pup On Tech

In a recent post, Pup On Tech perfectly captures the absolute nightmare that is building a self-hosted WordPress site. What starts as a simple VPS setup quickly devolves into a bloated mess of heavy themes, dozens of conflicting plugins, and rigid page builders. By the time you’ve fought with broken caching layers and terrible performance, you realise that fixing the bloat defeats the entire purpose of using WordPress.

WordPress really is a nightmare, and this post by Pup On Tech really encapsulated that! Should have just used a flat-file system or an SSG from the start. 🙃

Stratechery 2 days ago

An Interview with Microsoft CEO Satya Nadella About Finding Core Competencies

Listen to this post: Good morning, This week’s Stratechery Interview is with Microsoft CEO Satya Nadella . I have previously interviewed Nadella in May 2024 , October 2022 , April 2020 , and May 2019 . As I noted yesterday , I spoke to Nadella shortly after the conclusion of his keynote at Build , Microsoft’s annual developer conference . One notable thing about the keynote was the fact that Nadella was — outside of product demos — the sole presenter; one gets the sense he has shifted into a much more hands-on role at Microsoft over the last year. The reasons why are clear: my first question to Nadella was if he was happy about where Microsoft was currently positioned as a company. We talk about the reasons for that question, the status of the company’s partnership with OpenAI, and whether Microsoft has invested sufficiently in AI infrastructure. Then we talk about the future of software, Microsoft’s business model in the age of AI, and if they can operate independently from the leading edge models. At the end we talk about Project Solara and whether Microsoft will ever pay residents to build data centers. One note, with regards to a misunderstanding towards the end of the interview: there is no documentation I could find about being able to use Copilot Cowork with non-Anthropic models; Microsoft’s own documentation fits my understanding. As a reminder, all Stratechery content, including interviews, is available as a podcast; click the link at the top of this email to add Stratechery to your podcast player. On to the Interview: This interview is lightly edited for clarity. Satya Nadella, welcome back to Stratechery. SN: It’s great to be with you, Ben. So first off, I don’t know if you realize this, but at least according to my daughter, the defining word for the real grinders in Gen Z — first off, LinkedIn is like the social network. SN: That’s great! 
Number two, the word they all use is “build”, “I’m building, I’m building”, so who knew when I was at the first Build, I think, in 2010? Or was it 2011? Who knew you were such a trendsetter? SN: (laughing) There you go, I’m thrilled that your daughter is building and is on LinkedIn. Yeah, well, I’m not sure if she’s on there, she’s more making fun of people, so we’ll see how it works. We last talked the summer of 2024 after Build, this was up in Seattle. To say a lot has changed since then is an understatement. I had a bunch of questions I wanted to ask you about the business as a whole, things going on, I’m going to start with those, then I have questions about the presentation at the end. But relative to that, I want to ask you one simple question: Are you happy with Microsoft’s current competitive position? SN: You know, always this is the trickiest thing, you can sit here and say, “I’m happy” — that means you’re not ambitious enough and when you say, “If you’re not competitive, what the heck are you doing?”. And plus you have like 57 different product lines. SN: I’d say the thing in these platform shifts in particular is to, one, get the conceptual model of, “Where is the opportunity for us as a company?” — most people measure competitive position as if it’s a complete zero-sum game, and it’s never been the case. Which is, it is not the case with the cloud, it is not the case in client-server, and so to me, “What is Microsoft uniquely capable of doing in this new world” — that’s the key thing that we have to answer before we even get to the competitive position. In that context, “What is it that we really have a shot at?”, which is we can be a trusted purveyor of a platform, which is what we’ve always done, that allows people to create more value on top of a platform, which is again the DNA we have. Even in a world where these frontier models seem to have no limit— A very large appetite. SN: They have large appetite. 
That is what I feel even this Build, this conference, we are at that state where we can now really turn this from any one frontier model to saying, “Hey, there is actually a way for a frontier ecosystem to emerge where there are many stakeholders who all actually are operating with their own frontier intelligence”, that is a place where I think we have a unique shot, a unique competitive angle, and most importantly, brand permission. This is the other thing I’ve learned, Ben, which is every company thinks they can do everything, and then they realize that the world doesn’t need them to, the world wants them to do the one thing. Is that a lesson that you had to learn? SN: Yeah, absolutely. I’ve always said this, at Microsoft we are at our best when we do what the world expects us to do, we are at our worst when we do things out of envy, which is just because somebody else had some cool hit, somewhere, doesn’t mean we should go do that. But enough about the Zune, right? SN: (laughing) Yeah, Zune was a great device, but the world didn’t need Zune from us, and so that was the end of it. This identification of your unique capabilities, is that one of the changes over the last two years where that has emerged? SN: Yeah, in fact, it has emerged and also the world’s kind of gotten to it. Has it been forced on you to an extent? SN: Yeah, even my own conceptual understanding, I started by thinking of, “What are models?”, models are kind of like some stateless APIs, then I adjusted and said, “Oh, maybe there’ll be like databases” — they’re really more than that. I don’t remember talking about this with you, but last time I talked to [Microsoft CTO] Kevin [Scott], we analogized it to processors at some point, and you actually did make a comparison in terms of the partnership to your partnership with Intel. SN: Exactly.
So the question now is, it’s a better conceptual model to think of what we’re doing is you have to really build a learning machine, and any company has to build a learning machine, so what I want to build is essentially a multi-tenant learning system that allows everybody to have their own hill-climbing machine. So that conceptual idea, now I’ve turned what is essentially frontier is not about any frontier model — I want to build whatever you did with M365 or with Azure into a platform which allows everybody to basically build their own hill-climbing machine, right, because the future of a firm, at a foundational level: they’ll have human capital, they’ll have token capital, and for the token capital they need their own hill-climbing machine. All right, so I’ll jump to the end, you released seven new models, you emphasized the work you’ve done to build these models from scratch, not with distilling, not with using other models as teachers — so did you just articulate what the ambitions are with these models? SN: Yeah, there are two sets of things. One is we wanted to build from ground up with clean lineage, the models that we will have that we can license and allow enterprises to continuously hill-climb, so that’s why we want that model. By the way you talked about distillation — the point is to not use distillation during any of our own hill-climbing but at the very end, in fact some of the things that we are doing is, after all, we have all the OpenAI IP, in fact some of the performance gains we get is by doing RKLD, which is reverse knowledge distillation, and RL on top of it. So we have effectively two frontiers, we have our own, we have the OpenAI, and we’re going to use these things to eval match. And the clock is ticking to get to the right state you need to be while you still have that access. SN: Yeah, and there’s five years of it.
But the bottom line is at any given point in time, I want to make sure that I’m using the best, most efficient model for whether it’s in coding, whether it’s in security, making sure also in our case, we’ll have a harness that’s independent of these models, we have the GitHub Copilot harness that’s used everywhere across Microsoft. Our goal is to make sure we have a model lineage, which we control end-to-end, we then use OpenAI IP, even with all of the capability it has — ultimately, the tests are going to be the evals for us and our customers. In the long run, the way it was framed today, and I thought it was very compelling, and it speaks to what you just said, was this idea of enterprises being able to take these models and in their own RL environments incorporate their data at a much deeper level than sort of a slap-on RAG implementation or basic post-training. Is that the end goal, though? SN: Yeah, the end goal for me is the following, which is I go back and say, let’s say that they’re a generalist model — if you go back even, Windows could have a release, then another release, and Adobe and Autodesk could keep building and keep going up, what’s the moral equivalent of that? That is the thing. And then in the first time, we said fine-tuning, it kind of didn’t work because we didn’t have the tools, we didn’t have the data collection regime, none of that. But now we have it. So let’s say the generalist models keep getting better, MAI models, let’s say, or OpenAI models, then you have this RLE. Right, but this deep customization of the models you’re talking about is only possible with MAI models. SN: That’s correct, but the thing that we want to start getting everyone on is this multi-tenant hill-climbing system — so if you think about it, we literally turned your use of M365, which already is a multi-tenant system, into a hill-climbing system for you. 
Okay, I’m gonna have to stop you, I’m going to give you an ELI5 opportunity, explain hill-climbing to the audience. SN: Hill-climbing is basically when you think about, “What does AI do?” — AI is all about taking an objective and continuously learning how to go predict and create that output that is the representation of that objective, and do so continuously. So that’s why a metaphor of hill-climbing is the best way to describe learning. And you want everybody to do this individually on their own hill. SN: Individually on their own. As opposed to like, hitching along. SN: What is your moat as a company? Your moat as a company is your tacit knowledge. In a world where AI exists, and network effects of AI exist, you need your own hill-climbing machine in which the models are learning. So the first thing we want you to do is, people don’t talk enough about this, but the private outputs, the evals, as I think about as, maybe the most important IP a firm creates are these private benchmarks and the private evals where you are tastefully recognizing what’s the output, the quality. And by the way, today’s failure cases are informing you to change the benchmark continuously, it’s not a static thing, that’s kind of how the evals work. And so if you have your private evals, then you have your own reinforcement learning environment that you’ve created, then you invite all the models to show up, and then you say, “Model A, generate the output that is maxing this eval using my environment and my trajectories and model B…”, and I can switch. In that context, the MAI models is one more lineage that you can put into, and what we proved today was even a very efficiently trained reasoning model or a coding model can hill-climb using your traces and that will be more token-efficient and it will be fundamentally a great advantage. Exclusive to you the customer. SN: Yeah, that’s right. But is that just for now?
If you fast-forward, is your vision that actually MAI models are fully competitive on the frontier with the other general models? SN: They are. Even today, when you start saying that — the world will keep getting better in general. Well, I guess this goes back to, is this about how you need to do what you’re good at? SN: Correct. One, what we’re good at and also what’s the equilibrium of the world? Which is, if you believe there are only going to be two firms in the world, then of course, they only need two frontier models, but if you fundamentally believe that there are going to be as many firms as there are today and more, then what is the firm in the age of AI? It’s going to have human capital and token capital, how did that token capital get created? It’s not a bunch of API calls, it’s actually some set of weights even they have. Right. And so do you want to accrue that advantage or do you want to give it to OpenAI and Anthropic? Well, speaking of the OpenAI partnership, I mentioned you referred to it like the Microsoft-Intel partnership, and sometimes partnerships are the only way to get ahead. How do you think about that partnership now? SN: I still think that it’s — I’m very proud of the fact that we came together, you remember the circumstances in which we came together were very different and the fact that there is a company now that may go public and be a trillion-dollar company— This is my question — how long were the knockdown, drag out fights between in this corner, there’s Satya Nadella, the operator, and in this corner, there’s Satya Nadella, the investor, tussling over what to do? SN: (laughing) At the end of the day, we are an operating company, investment is just more of an accident. Yeah, but the shareholders are ultimately those investors! SN: I’m glad and it’s a fantastic outcome for our shareholders too and what have you.
But I think the way I came at this, Ben, is to say genuinely I’ve always approached it as, if there’s a partner that we can partner with and ourselves innovate, and they’re also successful, that’s fantastic. I always go back to the story of having built SQL Server with SAP. SAP was successful, we were successful, we also then went on to do other things. And so therefore, I think OpenAI, I’m glad we worked with them, we’re working with them, they continue to be a premier partner. As I said, until 2032, we still have a lot as a customer of theirs, them as a customer of ours, as an IP partner. So every day OpenAI does well, Microsoft does well. Is there a bit where everyone thought you were so far ahead because of your partnership with OpenAI, and now when we talk about things like your MAI models, it’s like actually “We got a little bit lulled to sleep because we offloaded too much to them, and now we’re having to recalibrate”? SN: Lots of things, one is, like all things, there’s a lot more competition, there is OpenAI, there is Anthropic, there’s Google, there is tons of folks who are in there. And so I think for us, the beginning, it was great that we got started with OpenAI. Think about where we were in 2018 to where we are in 2026, here we are competing with Google and a bunch of people whose names I wouldn’t have known in 2018, and so that itself proves that to your very first question, “How competitive is Microsoft?” — I’m glad Microsoft took that shot. Here we are competing with a bunch of new people, a bunch of old people, and we have our own game. So we already talked about Satya Nadella, the operator, and Satya Nadella, the investor. What about Satya Nadella, the capital allocator? There were a lot of reports around early 2025 about Microsoft pausing and reconsidering some data center investments, you guys have sort of spun that as, “Lots of speculative stuff”, “We’re streamlining”, etc.
— but at the same time, your percentage of free cash flow committed to CapEx lags fairly significantly behind your peers. Four months ago, that was a compliment. Now, is it a diss? How are you feeling about that? SN: The last time I checked, my free cash flow is getting allocated pretty well to capital return that makes sense. Is there a case that you’ve underinvested? SN: Not really. I think the key thing that at least we wanted to make sure is we were not upside down on building — we have a hyperscale business, we have our own application business, and we have our own research compute to allocate, there are three buckets, we wanted to allocate with great discipline on all three. So take the hyperscale business. Hyperscale businesses are about having a few big customers, but also having a massive long tail, so you can’t have a book of business that is just a few model companies — in fact, one model company — that was the fundamental decision. And you wanted to get out of that business. SN: Not just get out. They’re still there, they’re a major tenant. SN: They’re a major tenant. But, let’s face it, Anthropic over time or OpenAI over time will build their own, it makes sense. They would use — I’m not saying that they won’t use other cloud providers. So to me, it was clear as day that, what I wanted to do was not allocate all my compute only to one player and so that was the adjustment. And once you make that adjustment, you can’t build 10 gigawatts in Texas and say, “That’s it”, you’ve got to build a plant that is spread around the world, around the United States, and that adjustment is what we want to do on hyperscale. The other thing that I have to do is make sure we’re doing also the long-term thing for our investors, which is, “Let’s invest in ourselves”, which is inference compute has exploded, whether it’s in GitHub or whether it’s in M365 and we needed to make sure we fund our own applications. And then our own research compute, these MAI models.
So I just took the approach of putting these three, we will definitely want to allocate as we see progress on all this and we’ll see how it all shakes out. But to me, I’m not literally matching quarter-to-quarter. By the way, the other interesting thing is the catch-up, we started early. You were early, and you got a lot of the good spots, a lot of the good power generation. SN: Yeah, and also two years of cash flow. Yeah, for sure. Well, speaking of the balance between the three, in January 2026, you missed Azure earnings by like 0.1%, so it was very small, and you said on the call, you allocated more compute to internal R&D and applications. Setting aside the earlier question about whether or not you erred by the total amount of capacity, you talked in that call about having a portfolio approach in terms of investment, balancing Azure, and those two other businesses. That’s all well and good, but if there is a constraint, you do have to choose, do you think you made the right choice then? And is that the choice you’ll make going forward? Where you are at the end of the day, you have a higher lifetime value, higher margin on your own businesses, and that’s going to be number one. SN: Yeah, and also research compute. Ben, I think that for all of us, quite frankly, we have to really, at the end of the day, that’s why I think quarterly earnings are interesting, which is, of course, The Street should hold every one of us very accountable for “What did you do for me lately?”. But was that a very particular, annoying, being held accountable for the wrong thing? SN: It’s their job, everyone’s got to do their job, and so I can’t accuse them of them asking, “Hey, what did you do for me this quarter?”, that’s the question they rightfully should ask.
And the right answer for me is, “I’ve done enough for you this quarter, and we’re also making sure that 10 quarters from now, Microsoft’s continuing to thrive”, and that’s the job, and sometimes there’s a little bit of disconnect on it. But when I look at the three things, you just have to be disciplined that you’re doing what you can add value, it can’t be, “Oh, I’m misallocated”. To your point, you get punished if you do things where you’re not producing. So that’s why research compute, here is now an MAI model output. Today, it’s just not a model output as an academic thing, that’s now in differentiating our Foundry where we now are able to license it, it’s going to grow Foundry revenue. And so as long as I’ve felt that as long as Microsoft can continue to invest in ways that show results, then we will have the ability to do the right thing in the long run and in the short run deliver results. For the last quarter, was there a bit of, “Let’s give a little bit more compute to Azure?” SN: Last quarter, no. In fact, that one was just a little more of the compute — we are supply-constrained. I know, but that’s what makes it so interesting. SN: We are not at all, like at this point, if anything, the thing that we do not want to do is to disappoint especially our enterprise customers on Azure. That was the question, right? Because if they look at that quarter and they’re like, “Hmm, Microsoft’s saying we’re supply-constrained and also we’re prioritizing our higher margin, higher lifetime value businesses, where does that leave me? I’m competing against my supplier”. SN: That’s one of the reasons why we had to make some very hard choices around, for example, raw GPUs. We’re not selling raw GPUs to a bunch of Neolabs, for example. I wish I could add more Neolabs on Azure, we just cannot. And so therefore, we are being very disciplined on some business that we turn away. Were those some of the conversations you had to have? 
SN: Yeah, and so to me, in a world where you have constraints, you want to basically make sure you’re building for both what the world expects and the customers who have trusted you in the longest and so we will definitely make sure that Azure has capacity, it’s just that we are not going to go for what I’ll call in this context, “easy money”. Which is, you can always, in today’s day and age, if you want to have short term Azure revenue, it’s pretty easy. Oh yeah, we’ve seen that, to say the least. SN: Yeah, all you gotta do is turn up, you know, and go sell it to a Neolab. So when it comes to AI infrastructure specifically, as you look out in the long run, you mentioned it may very well be rational for the frontier labs to build their own hardware, for example. You have all these Neolabs, you have whatever controls [Nvidia CEO] Jensen [Huang]’s allocation of GPUs, you have different ASICs, what is your true differentiation as a hyperscaler? Is it just lower cost of capital? SN: First of all, think of our hyperscale business as this portfolio, everything from what we are trying to get done is build a system which we have to be competitive in when it comes to tokens-per-dollar-per-watt, that’s one side of it. We can unpack that and what our thesis is there. Well, I just noticed when you were talking about some of your chips, sometimes it was tokens-per-watt, sometimes it was tokens-per-dollar. SN: Yeah, I think of all three, right? It’s like tokens as a function of both power and dollars and so that’s a systems thing that we have to be world class at and be competitive at. And I would be able to claim, and that’s where I think [Microsoft AI CEO] Mustafa [Suleyman] talked about it, like unless and until you build your own model, you can’t, there’s no point. I believe that you don’t want to build accelerators without building a model, you kind of have to co-design.
In the long run, the only way to be super efficient on that is to think about, the network is a great example, which is you want the network, the model, all to come together in ways that make sense, so therefore that’s one side. Then the other side for us is the differentiation has to come from, “If I’m building agents on top of this infrastructure, what agents does Microsoft produce?”. I have three domains in which we are going to try and major on: coding, security, and knowledge work. Luckily these are three massive domains where tokens make sense — I’m not saying there won’t be others, science is another one we will enable but I think there will be others who will do great work in there. But to me, these are the three primary domains in which all this is going to be exercised. So when I think about the portfolio of building a system plus model plus these three domains, then I feel like that’s where our differentiation will come from. But is that just a re-articulation of circling back to, in the long run, our true differentiation is from our higher margin, our own businesses, higher LTV? Where does that leave just customers who— SN: I think it’s not higher margin. The overall margin dollars from our infrastructure business may be higher. In fact, they already are getting close to being higher than our total margin dollars from our high margin businesses. So I think that Microsoft has always benefited from having a portfolio of businesses, and we’ve been comfortable managing through it, where it’s not one margin profile. But in aggregate, we will have high ROIC, and we will make sure that we have an infrastructure business that’s got ROIC that’s commensurate with an infrastructure business, and we have a business that builds on top of it, which I’d like call it like the new apps are agents. So we’ll have agent businesses in security, in coding, in knowledge work, as the three big domains.
We’ll get to agents in a little bit, but I didn’t expect to ask this question, big news this week, will you ever issue equity to fund this build out? SN: Yeah, I just saw the news, I think Google just did it. Were you as surprised as everyone else? SN: I’m not sure, exactly, I’ve not studied it, it came last night, I think, so I’ve got to go understand what’s happening. But, it’s like maybe it’s the thing to do is everybody is going public or reissuing equity, maybe that’s the season. Gobble up some of the money. Is software dead? SN: I think software is alive, but the way I think this entire meme has come about is, like, if you take the SaaS question in particular, right? We built in a particular way where I had a data model, and then I had a business logic tier, and then I had a UI tier, I coupled the three, then had a business model. Integration is a beautiful thing. SN: Look at this, Ben, right now, we took what is the database that no one knows about underneath Microsoft 365 and said, “Oh, WorkIQ is available, it’s just a skill/MCP, and it’s out there”, and suddenly people are falling in love with, “I can now interrogate and have an agent continuously hit this database to reason over and plan over, act over from any place”. By the way, it requires a new business model. So, for example, when Cowork is using WorkIQ, that’s going to be a usage-based business model, so I think what needs to happen is we now need to take what we built, rebuild it for the agent era and change the levers of the business model such that you have a per-user business model and you have a consumption business model. So the hybrid business model, you do think that is going to be the future? SN: 100%.
And once you have that then I think what happened between servers — even I had not understood it when we moved to the cloud, even I was a little worried about, “Oh man, we move to the cloud, we’ll sell the same servers”, and it turned out we sold a lot more subscriptions because people who never bought servers from us were buying subscriptions. I think that’s what’s happening already with agents, I see that on GitHub, I see that on M365, I see that on security, because everyone is building these agent systems that are continuously “working” and so what we built and thought of as the end-user compute is completely getting rebuilt. Is there a bit where, if you have to zoom out a hybrid system where a combination of per-seat but also usage, where does E7 fit in this idea, it’s like double the price, it seems it’s an attempt to respond to maybe a secular decrease in seats by increasing ARPU? Is that the right way to think about it? SN: The way you think about this is, see per-seat is a very important element still because what is per-seat? Per-seat is basically a set of usage entitlements, so anyone who is budgeting really will push you. That’s right, people don’t like usage, we’re seeing that right now, it could explode. SN: Exactly, so therefore you just want to take packaging or bundling of usage into per-seat so that there’s some way for people to budget. So I kind of think about the E7, E5, these things will continue and then you’ll always have the outcall consumption. People also talk about, “Hey, maybe people want outcome-based pricing”. Outcome-based pricing, we’ll be thrilled about some of that, but remember, outcome-based pricing is also called royalty. When a customer has a great outcome, they necessarily don’t want to share their outcome so I think what is really being thought about is, ultimately, there is real marginal cost to software, that’s kind of what it is, and that’s going to be priced through.
When did that really click for you, the implications of that? SN: I think that I would say agents. Before agents, if it is still human interaction— Right, you can imagine a world where just like basic inference got super cheap and easy. SN: Exactly, the Moore’s Law itself. Like, if you think about it, if I just used Moore’s Law, get software efficiency, I used software for efficiency and drive that home for customers to have more functionality. In fact, I used to always think about, “Hey, how much more value did we add in M365 and not raise price?” — we didn’t raise prices for a decade plus. That’s all thanks to the software efficiencies on top of hardware. But now where you are, and if you have a thousand autonomous agents that are all working continuously 24/7 hitting Work IQ, then that is a lot and so that is where I think, and so the real test for me Ben is, that’s why evals, outcomes — no customer will use consumption or their seats if it’s not creating value for them. Therefore, they now are going to be a lot more disciplined on, “What exactly did this stuff do for me?”, “How do I measure it?”, “How do I get into the efficient?”. And if you think back to going back to the 80s or 90s, where back then it’s like, “Don’t waste time on optimization, the next processor will come out and solve all your problems”, is that now totally the wrong paradigm? SN: In some sense, you want that to happen, but you can’t just count on that. It will happen, but your prices will explode. SN: Exactly, and more importantly, you will be found out if you don’t optimize. Take that example we showed with Land O’Lakes today, which is, here’s an agent, and there is an outcome you care about, I was able to use a model that is using 500B, I was able to use a 5B, and have it really deliver the same outcome, why would I not use that? That does seem to be a very different thing about this period. 
It seems clear that’s going to be a huge thing in enterprise going forward, using the right model, optimizing, it’s like we didn’t get to the optimization stage of the PC era. SN: That’s right. I don’t think we ever did get there. SN: We never got there. Stuff’s still bloated as ever, because everyone just assumes it’s going to get faster, it’s going to be fine. SN: Exactly, because things were not priced for it. Once you have consumption, everyone will optimize. For E7, it does seem like the real lure there is Cowork. It’s like this new capability, it’s super powerful, it’s taking Anthropic’s Cowork, which is on your PC, now it’s in the cloud, has all the niceties around that, permissions, controls, all those sorts of things. Is that why it’s there? Is that the hook? SN: Yeah, there’s also the Agent 365, so there’s a whole lot. Like always, these things, we’re going to take everything from what I’ll talk about as what is an end-user thing and an IT thing, bring it all together. You guys know bundling. SN: And security. Yeah, definitely, and they’re all about, ultimately, how do we get the value equation right such that the customer can cover, because right now, it’s kind of fascinating. You have an agent, you immediately say, “Oh, I’ve got to secure it, I’ve got to have observability on it, I need a sandbox for it”. So it’s just that if you don’t bundle, you kind of are sending the customer down the chase of five different things. With that, though, the reason I find that striking is you’ve talked a lot about — to what extent do you think the point of integration that really matters is it does seem to be increasingly between the models and the harness themselves?
You’ve talked about things like your CoreAI initiative and GitHub Copilot, a lot of which is, “We’re going to build the harness and you can slip the models in and out”, and that works right now for Copilot and you can choose your model and even then, from what I’ve heard, not quite as easy as you might think it might be, but it’s still there, the selector’s there. Cowork seems like, “Yeah, that’s right, it has to be the whole package and it’s important for us to have a selling point on E7” — that this feels like maybe it’s not easily substitutable. SN: No, it is. The same thing on Cowork. In fact, right now, the Cowork that I’m using is already mostly defaulted GPT. Okay, so it is going to be fully interchangeable? SN: We’re using the same harness that we use in GitHub and the same thing in security, too. So we have the same harness that’s a multi-model harness in which we will rotate through — obviously MAI by default gets trained in our harness, but we will have GPT, we will have Anthropic in there and any open weight model. We will allow anyone to take any of the models they fine-tune or build. In fact, they can take an open weight model from Fireworks, tune it, put it into Copilot, no problem. All right, so I am misinformed, so I will take the L on that. Explain what is Cowork then and what is the connection with Anthropic as far as that product goes? SN: Cowork, to me, it’s kind of like Copilot. I took the term Cowork, it’s part of there and it’s definitely got the Anthropic models in there. Cowork is — think of it as a form factor, the best way to describe it is we built a chat interface first for Copilot, then we now have built Cowork for Copilot, and now we’re building autopilots, as I described it there, think of it as the enterprise-grade OpenClaws. So basically, I think of these as different form factors of agents — chat was the first thing, Cowork is the next thing and in fact, you can even go back to the developer thing. Developers, how did we start? 
We started with code completions first, then we went to— I get all this, but I’m genuinely confused here, because I go back to the blog post. It says, “Working closely with Anthropic, we took what they’ve done with Cowork…”. SN: Yeah, that’s what we launched first. All I’m saying is it’s evolved. It’s kind of like, Copilot today. Got it, which started out with ChatGPT. SN: ChatGPT, now it has both Opus and GPT models. Got it, okay. SN: So, they’re going to be all over. All right. So, I wasn’t completely off the reservation. SN: That’s right. I failed to catch up, I will accept that. [Editor’s Note: the FAQ for Cowork still says it uses Anthropic models, just like the original blog post] SN: Every product of ours, you’ll have both Anthropic and OpenAI models, and MAI models, and your ability to put your own models, and that, I think, is the fundamental promise. Oh, by the way, I should mention this. The amount of auto — I don’t know how much you’re doing selection, I’m mostly auto — and so then one of the biggest pieces of work at Microsoft is all the training models to do auto-routing. That, by the way, is perhaps one of the biggest continuous learning things. It’s interesting because I probably approach it more from a consumer perspective, so I just literally choose the app that I want to do something in or call from the CLI. What happened to GitHub Copilot? You’re talking about it very positively, but I think a negative spin would be two or three years ago, you were first to market with autocomplete, everyone assumed you got there, you won, and now it’s like, “We’re going to catch up with GitHub Copilot”. SN: I think what happened is this is one of those classic cases — remember, it was a tools business before, and now it is the business, who would have thought that coding is everything? Right, it should have been everything, but it seems like for some period of time, it wasn’t?
SN: For us, I think what has happened is we have continued — there are two things that are happening in GitHub, before I even talk about Copilot, I should talk about GitHub. All these coding agents have shown up to work, and where have they shown up? In GitHub. And so the first thing that, quite frankly, I wish we had anticipated better, was the amount of agenting. The whole GitHub reliability thing is like one thing, but for Copilot specifically. SN: I’ll say the first thing, that’s kind of, at some level I take that job seriously, because job number one before you want to get to Copilot is go make sure that we are scaling, so let’s leave that alone. There’s a lot of people very unhappy about that. SN: Yeah, and we’re going to work it and they should have higher expectations of us and we need to deliver for them. Then the next thing is on the Copilot side, you’re absolutely right, we started by saying, “This must be just a code completions thing in the IDE”, we added chat, we added tasks, and guess what? Let’s give credit where it needs to be given. Anthropic showed up with a model. Well, this is like Cursor’s story, they ate your lunch even before Anthropic did. Or you’re saying that that was also an Anthropic story? SN: Not really, I mean it’s kind of like Cursor/Microsoft, it’s like Borland v. us, it’s not like that was not the end all be all. It was really the Anthropic coming in with a completely different approach, a more agentic approach. SN: That’s right, with a different approach. With a model and what they’ve done there, and essentially the agent loop is what the change was. In fact, if you look at it, Cursor never, total volume-wise— They got eaten by the same thing, they’re facing the same challenges. SN: Also even the market share and so on — Cursor did fantastic, they forked VS Code, did a good job, lots of credit to them.
But the real thing was agentic coding became real and now the good news is the agentic coding really drives — people want choice, we will be there, we will have our own models. GitHub itself and Copilot itself will have both the Anthropic and Claude. In fact, the rubber duck feature is my most favorite feature , which is I can use it to check the others. The headline announcement from this week, I guess is these new Nvidia-based PCs running Windows . However, the announcement I found much more interesting — or not an announcement, preview — Project Solara , viewing these devices as ways to access agents in the cloud, totally different center of gravity. I don’t know if it was you that said it or the presenter, something which I thought was really compelling, which is a limitation of wearables is if you have to interact with them continuously, they get very tiring, so their utility is fundamentally limited. But if you can ask an agent to do something, then you can go do something else and meanwhile, it’s running in the background. Super compelling. I guess the question is, this feels totally different than Windows — it was weird to start this keynote talking about Windows and the AI PC, and that’s nice, and local inference, but this is like, “Actually, what if everything was in the cloud?”. SN: Yeah, I always find this frame back from 2014 of ubiquitous computing and ambient intelligence and it’s becoming more and more real each day. First of all, the first part of it was, “I’m so thrilled to have these Windows machines”, and the fact that Jensen had that beautiful slide, the picture of him with all the desktops, I was like “God, yes, I’ve been waiting for it”, which is it’s great, so I think because it makes sense, it makes logical sense to have powerful silicon systems with power that really have it with unmetered intelligence. 
When I worked at Windows, I had to like furtively hide my iPhone and then it was okay to show up on campus with an iPhone, now I’m here with a MacBook Air — next time I interview you do I have to feel bad that I don’t have an Nvidia AI PC? SN: You will always have choice, Ben, and I hope you choose the right thing. I’m excited about that stuff because I think there’s unmetered intelligence, even there was one little feature that we showed, which is that ability to have eight agents running continuously, analyzing logs and so on, but all of them were unmetered. Right, but that feels like it’s a side project, side quest. SN: It’s kind of like a billion users all having that, that’s not a side quest. To me, it’s as fundamental as like I think the people are going to want for their knowledge work, for their security work, for their coding work, machines— They’ll want for themselves. Is this actually the new consumer/enterprise separation? SN: The enterprise — the business model, we had this long conversation about enterprises continuously optimizing — in fact, I think the biggest value prop of a Windows machine in the enterprise will be unmetered intelligence. So people are going to say, “Oh wow, instead of having my cloud bill keep going up, I’m going to have Windows machine and amortize it that way”, so I think that there is going to be a real value to — because in a world where you have infinite amount of tokens you want to consume, you want to optimize, and why would I not optimize using everything? I don’t know, I just feel like — as you know, I’ve been very impressed with the job you’ve done with Microsoft, ending the stranglehold Windows had on the company, I still remember I was actually in the Bay Area, I was sitting at the bar at The Westin by the airport typing The End of Windows , recounting all these things you did to not kill Windows, but not make it the center of gravity for the company. SN: And that I think is what goes to Solara. 
I don’t think Windows, we are trying to make Windows— SN: Solara, to your point, I thought it was a great question, because the thing that I want us to take a shot at is the following which is, “Can you think of a platform and platform rules, by the way, which are built for the agent era?” — because right now, what is everyone else who are “platform owners” who will try to move from the phone to this wearables will try to bring their apps to the same game, right? I want to open that up, so I would like, for example, like what we were able to do with Teams devices, and that’s where we built some of this sort of distribution capability, so I want to use that connected to this agent world so I’m excited I’m in MediaTek, Qualcomm. Well I have a great analogy for you, I think. So there’s a bit where I think you just circle back to the great job you’ve done as CEO — this is the butter-up portion of the interview — there is a bit where I think you benefited from following the follower as it were. Steve Ballmer’s one that had to go after Bill Gates and he for better or worse created the conditions for you to succeed, I think is one way to put it, is it possible that for this, your opportunity device space — like can Apple ever really make an agent that works everywhere as long as they’re stuck on the phone? SN: That’s a great question. That is the question for all of us which is you know the reality is it’s easy to say for someone who’s been so successful with something that in fact continues to have a lot of success and say, “I’m going to burn it all down and build something else”. But to the point, the way they’re architectured, everyone’s vertical. SN: Exactly, it’s not natural.
Like you think about it, we’re saying, “Building agents is easy”, the SOCs are jumping out everywhere, they’re there, the silicon is easy, the system is easy, the operating system is built, and now you’re telling me that I have only one choice for an ambient thing in a hotel, in a restaurant, in a healthcare setting? It makes no sense. So therefore, I imagine that building these ambient devices using Project Solara will be as easy — if you’re successful a year from now, everybody, even in the enterprise, is going to say, “Oh, I’m just going to order a bunch of these things from a no-name ODM who just built it for me”. I think it’s super smart to start at the enterprise only. Do you have dreams that maybe this will eventually spill over? SN: Right now, I want us to again do what I think is natural, like where am I seeing people— Well, that’s where you have the Microsoft 365 environment, you have all the context there. SN: And also the agents, where would people build agents? The thing is, the consumer one will be like, “I need the one agent I want”, so it’s not like I’m not building a Copilot device, I’m building an agentic platform where the healthcare provider can have their own agent, so that’s the right place for Microsoft to start, let’s see how it goes. One last question. You had a data center segment appropriately focused on communities, you talked about things like paying your way for electricity, not using water, building up the tax base, education, etc. Why not just pay the residents ? Just pay them a dividend? SN: I’m open to all ideas here, I’m not close-minded at all because at the end of the day, I think the fundamental thing you’re asking about is, “How does this industry, including Microsoft, have permission to do what we’re doing in terms of infrastructure build out?”. My theory is we get to everything backwards in the US, this is how we back into UBI [Universal Basic Income], is we’re just paying people to build data centers. SN: Yeah. 
And I mean, one thing that I have an issue with things like UBI and so on are the— I’m anti-UBI. That’s how you get there while being anti-UBI. SN: I want people and communities to have control, have agency, humans to have real dignity in their work and you’re 100% right in saying, “Look, we have to do what it takes to get that permission”. And so right now, there’s so much about our industry that’s so glorious, so good, so great. What about the you’re going to lose your job part? SN: Yeah, that’s the problem. Self-obsession about our own glory and our own — if you’re not creating opportunity, why would anybody want you to succeed? That’s the fundamental memo that needs to be re-sent to everyone across our industry, and then we have to live up to it. Satya Nadella, great to talk to you again. SN: Thank you so much, Ben, as always. This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery.


Is datacentre sovereignty really that important?

In the UK (and I'm sure elsewhere) politicians and commentators are falling over themselves to suggest that without huge fleets of datacentres built in the UK we are going to be hopelessly left behind. I'm not convinced this is the case, and it risks falling into the same (mostly misguided) obsession many politicians have for heavy industry revival. This is going to be a rare UK-centric post on my blog. Apologies to my mostly global readership; the argument may be different where you live. One of the first (and most easily dismissed) arguments I've heard is that without datacentres close to the users, the latency will be too high, making AI services too slow to use. Clearly, this isn't the case. Nearly all AI use cases are not hugely latency sensitive. To put this in context, the time to first token (how quickly the AI responds) on Opus models is between 1.6s and 3.6s. The round trip latency introduced from the UK to the East Coast of the US is around 80ms, to Europe 10-20ms, and to Asia around 200ms. So the latency on the provider's side is roughly one to two orders of magnitude higher than the network latency for a UK-based user to reach an overseas datacentre. It is fair to say that real time voice or video applications benefit from lower latency than these typically text based use cases. But these are a tiny fraction of AI usage (at the moment) and even in that case European datacentres can provide reasonable latency for these - it doesn't have to be in the UK itself. And my personal belief is that real time audio based agents are likely to work best when they can run on device entirely (so there is 0 network latency) - so without a data centre requirement at all.
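To make the latency comparison concrete, here is a rough Python sketch using only the figures quoted above. The function names and the choice of 20ms (the upper end of the 10-20ms range) for Europe are my own assumptions for illustration, not measurements.

```python
# Figures quoted in the post; names and structure are illustrative assumptions.
TTFT_SECONDS = (1.6, 3.6)  # time-to-first-token range quoted for Opus models

# Round-trip times from the UK in milliseconds (Europe uses the upper end of 10-20ms).
ROUND_TRIP_MS = {"Europe": 20, "US East Coast": 80, "Asia": 200}


def ttft_multiple(rtt_ms: float, ttft_s: float) -> float:
    """How many times longer the first-token delay is than the network round trip."""
    return ttft_s / (rtt_ms / 1000)


def network_share(rtt_ms: float, ttft_s: float) -> float:
    """Fraction of the total wait (round trip + first token) spent on the network."""
    rtt_s = rtt_ms / 1000
    return rtt_s / (rtt_s + ttft_s)


fastest, slowest = TTFT_SECONDS
for region, rtt in ROUND_TRIP_MS.items():
    print(
        f"{region}: first-token delay is {ttft_multiple(rtt, fastest):.0f}x to "
        f"{ttft_multiple(rtt, slowest):.0f}x the round trip; the network is at most "
        f"{network_share(rtt, fastest):.0%} of the wait"
    )
```

Even in the least favourable pairing here (the 200ms round trip to Asia against the fastest 1.6s first token), the network accounts for only about a ninth of the wait.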
Regardless, many of these same commentators also suggest locating datacentres in the very north of Scotland (to take advantage of the excess wind power), but ironically these would have significantly worse latency for users from the densely populated south of England - Paris, Amsterdam, etc. are all closer, and thus faster to respond. The next argument that is often floated is that it becomes a tax base - in the UK business rates are applied to commercial buildings and are paid to the local authority in question. The formula for calculating this is, in true UK tax law style, overly complicated, but in essence it works on the rateable value of the building in question - what the estimated annual rent would be to rent the property - including relevant fit out. This is then multiplied by 0.508 to arrive at the annual business rate value. To take a very rough example, my research found that buildings 5-8 in the Virtus London campus can support 100MW [1] of load. These are valued as far as I can tell at around £12m/yr of rateable value. So the local authority (London Borough of Hillingdon) gets approx £6m/yr from this in business rates. If we scale that up to 1GW, a straight tenfold increase gives about £60m/yr, so it's fair (and generous) to say that the local authority might get somewhere close to £100m/yr of business rates. While this is not nothing - and certainly gives local authorities a valuable source of revenue - it really is a rounding error under the current system. If we moved every single datacentre under construction globally (30GW) to the UK instead, it would bring in approximately £3bn/yr, or around 0.2% of government spending. Detractors may say that this is the current system and the tax base could be changed. But by doing that you massively reduce the attractiveness of the UK as a place to build the aforementioned datacentres. And the potential tax rates to be at all material would have to be punishingly high.
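The business rates arithmetic above can be checked with a short Python sketch. Every input is one of the post's own rough figures; the variable names are mine, and the implied government spending total is simply backed out of the post's 0.2% figure rather than taken from official statistics.

```python
# All inputs are the post's rough figures; names are illustrative assumptions.
MULTIPLIER = 0.508               # business rates multiplier quoted in the post
RATEABLE_VALUE_GBP = 12_000_000  # estimated rateable value of Virtus buildings 5-8
CAMPUS_MW = 100                  # load those buildings can support

# Annual business rates for the 100MW example.
campus_rates = RATEABLE_VALUE_GBP * MULTIPLIER

# Straight linear scale-up from 100MW to 1GW (1000MW).
linear_per_gw = campus_rates * (1000 / CAMPUS_MW)

# The post rounds this up generously to about £100m per GW.
ROUNDED_PER_GW_GBP = 100_000_000
GLOBAL_BUILD_GW = 30             # datacentre capacity under construction globally, per the post
global_rates = GLOBAL_BUILD_GW * ROUNDED_PER_GW_GBP

# Government spending total implied by "around 0.2%" (derived, not an official figure).
implied_spending = global_rates / 0.002

print(f"100MW campus: £{campus_rates / 1e6:.1f}m/yr")
print(f"1GW, linear scale-up: £{linear_per_gw / 1e6:.0f}m/yr")
print(f"30GW at £100m per GW: £{global_rates / 1e9:.0f}bn/yr")
print(f"Implied government spending: £{implied_spending / 1e12:.1f}tn/yr")
```

Linear scaling gives roughly £61m per GW, so the £100m figure is a generous round-up, and the 30GW total still only reaches about £3bn/yr.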
Tax rates that high, combined with the extremely high price of electricity in the UK, would make it completely unfeasible to operate datacentres here. It's a similar story with jobs. Datacentres are famously light on permanent staff - the whole point is that they're highly automated, so even a large 100MW site might employ only a few dozen people once it's running. The construction phase is more labour intensive, but temporary, and much of the capex (the chips especially) is spent overseas rather than in the UK. Even on generous assumptions the direct contribution to a ~£2.8tn economy is a rounding error. The final and perhaps most plausible-sounding argument is that in the event of political instability it would give us control over AI usage - which is and will be a growing national priority. There are really two versions of this argument. The cruder one is outright seizure, which I'll come to. The more serious one is that in a global compute crunch, having the datacentres physically here means we won't be left at the back of the queue. But this doesn't survive contact either. If a hyperscaler or a frontier lab owns the racks, a datacentre in Slough serves their global demand - not ours. You can't compel a private operator to give UK users preferential access just because the building sits on UK soil. Location buys you almost nothing. The real leverage here is to contractually lock in the compute - which is something the UK government could do, regardless of where the datacentre is. Onto the cruder version, then. I've even heard certain people suggest that in the event of major turbulence in the world the state could seize control of them. The issue with this is multifaceted - but I think has three main failings. Firstly, this is not a steelworks or power plant. The underlying value is not from the datacentre, it's from the models running on the datacentre. If we assume AI model development continues, the value of a 'seized' datacentre decays rapidly.
Imagine the UK government had seized control of a frontier lab's datacentre at the start of 2025. They'd have access to GPT-4o, or Sonnet 3.7. These models are now outclassed by open weight models that you can run on a relatively powerful laptop. They have virtually no value. Secondly, it completely underestimates the supply chain that modern software runs on. It's highly likely that if the geopolitics had got so bad that HM Government was nationalising frontier lab datacentres, the frontier labs would remotely wipe the servers before they could be "seized". And that's not to mention that models have loads of supporting software and operational infrastructure that is not colocated with the models themselves. The concept of the SAS seizing servers running frontier models before they can be wiped in the dead of night is probably best kept to Tom Clancy novels - not government policy. Finally, if we are in some alternate reality where the UK/Europe has been cut off from frontier models, we are almost certainly also cut off from most/all cloud services from big tech, which means no (or much reduced) email, video conferencing, card payments etc. Not being able to run Claude is probably the least of society's worries. By no means am I suggesting that AI datacentres shouldn't be built in the UK - they should - and we should reform the planning system to make it easier to build them. But it's important to get this in perspective. Modern information societies are a huge tangled web of globally interconnected pieces of software. Every day you browse the internet you are connecting to thousands of servers located in dozens of countries. Each one of those servers is sending your requests to various other providers - to store and process data. There are genuine requirements for data sovereignty. It may be preferable to host sensitive health data only in the UK, for example.
But that's a simple regulation problem (if desired) - require UK-based datacentres for this type of data, including AI usage. But this is a tiny sliver of total AI demand. And the world is too complicated to dream in this "Blitz spirit" self-sufficiency era, especially when it comes to digital services. The UK in my opinion has many structural advantages for harnessing the economic power of AI. All of the major frontier labs have significant - and growing - labs and offices in London. We have world class researchers and institutions on the cutting edge of AI. And the UK takes the majority of European tech funding. In my opinion, we need to lean into those strengths and ensure we continue to attract and grow these companies and talent. Not worrying about where exactly we should put huge sheds.
[1] Datacentres are measured by how much server load they can power, in watts (or megawatts (MW), millions of watts, and gigawatts (GW), billions of watts).

Sean Goedecke 2 days ago

Anti-AI nostalgia and the cult of the past

Programmers were better back in the day, weren’t they? Back when we had real programmers. Not just people who got paid to write code, but people who lived it, who were obsessed with their craft, and whose code was a lively expression of themselves. Hackers were hackers in those days before money took over the industry.

Don’t even get me started on LLMs. Could there be a better example of today’s degenerate spirit? A machine to mass-produce software (not good software, just barely good enough), so that the weak minds that dominate the industry can indulge their obsession with quantity: of slop code, of features, and ultimately of money, which is the only way they can understand value. If they weren’t destroying our way of life, they would be pitiable. All of them together don’t have a fraction of the spiritual integrity of someone like Mel. But as it is, we must band together to crush them and drive them from our industry like the parasites they are.

Okay, that’s not actually what I believe. But there sure are a lot of posts 1 and comments on the internet that sound a bit like the paragraph above. Here are some older quotes that might sound similar:

…the third collapse, in which power tends to pass into the hands of the lowest of the traditional castes, the caste of the beasts of burden and the standardized individuals. The result of this transfer of power was a reduction of horizon and value to the plane of matter, the machine, and the reign of quantity. 2

Usura rusteth the chisel \ It rusteth the craft and the craftsman \ It gnaweth the thread in the loom 3

The actual accomplishments of the past will nevertheless remain accomplishments, while the artistic stammerings of the painting, music, sculpture, and architecture produced by these types of charlatans will one day be nothing but proof of the magnitude of a nation’s downfall. 4

These are all from the writings (or speeches) of famous fascists: Julius Evola, Ezra Pound, and Hitler himself.
Mussolini’s Doctrine of Fascism begins by defining fascism as a “spiritual attitude”, which the fascist man adopts in order to regain the mysterious qualities that were lost by the transition to modern life. In his classic Ur-Fascism , Umberto Eco’s first two defining features of fascism are the “cult of tradition” and the “rejection of modernism”. So when someone tells me that the industry has lost its way and we must deny the corrupting influence of modern technology in order to retvrn to the time of virile real programmers (who understood and appreciated the spiritual dimension of programming), I get suspicious. It’s strange to describe anti-AI sentiment as potentially fascist, since a very popular argument is that LLMs themselves are an inherently fascist tool. Surely both sides of the debate can’t be fascist? I do think that the structure of fascist arguments is generally persuasive , and that many avowedly anti-fascist groups do sometimes fall into this trap: describing the world as a struggle between the spiritual power of the macho, traditional man and the corrupting influence of degenerate (often foreign) capital. For instance, I am a big fan of Lord of the Rings. I’ve read the series and watched the films multiple times, and even made a failed attempt to learn Elvish as a kid. But it’s hard to deny that fascists absolutely love Lord of the Rings. “Marble statue of a Roman emperor” might be the most popular avatar for fascists on the internet, but Aragorn is the second most popular. Neo-fascist movements in Italy explicitly take up Lord of the Rings as a foundational text. Why? Because the core conflict in the text is between the traditional, nostalgic heroism of the Shire and Gondor, and the corrupting modern industrial (partly foreign ) influence of Saruman and Sauron 5 . I don’t think Lord of the Rings (or anti-AI rhetoric) is intrinsically fascist. 
In fact, the surface-level reading of the text is anti-fascist: the plucky people of the West banding together to fight Sauron’s command-and-control totalitarian society. But I can see why fascists love it.

One common historical touch-point for anti-AI folks is the Luddites, who were a violent conservative labor movement in early 1800s England. Anti-AI blogs adopt Luddite language like “smashing frames”, and positively cite the Luddites as “the go-to enemies of fascism since its inception”. I’ve written at length about what we can learn from the Luddites in Luddites and burning down AI datacenters, but one point I think is under-emphasized by the (generally pro-Luddite) books is that the Luddites were a little bit fascist themselves.

Brian Merchant’s Blood in the Machine is the most popular recent book on the Luddites. I enjoyed it, but Merchant’s attempts to paint the Luddites as a friendly, left-wing, proto-feminist movement 6 seemed really unconvincing to me. From the writings of the Luddites, it’s clear that they were interested in protecting the rights of their all-male elite guild fraternity. Here’s one Luddite threat to a workshop that explicitly includes a threat against the female workers 7:

We think it quite inconsistent with our duty as men, as husbands and as fathers to suffer ourselves to be ruined any longer by a set of vagabond strumpets and those gibbet-deserving rascals that are looking over them. We will lead them to their satisfaction. We sincerely hope, gentlemen, that you will discharge the bitches and take men into your employ again, or they must take what they get.

These were fundamentally conservative people who felt (correctly) that modernity had deprived them of their elite status, handing it instead to lower-paid inferiors: women, vagabonds, and foreigners. The Luddites were obviously not fascists 8.
However, the basic ingredients were there: wounded pride, a masculine elite identity, hatred of modern economics, and violence aimed at restoring their previous position in society. The currents that produced Luddism are the same currents that guided so many unhappy people towards fascism. When things are looking grim for an elite group, they often turn towards any movement that promises a return to an idealized past. If my blog has themes, one of them is surely that many software engineers labor under a delusion that their job is to be excellent at their craft. Of course, wanting to be an excellent programmer is not a delusion; it is a completely legitimate value to hold, and a legitimate purpose to pursue. It’s just not what you’re paid to do at work. Your job , unfortunately, is producing shareholder value . This delusion has been punctured by the end of ZIRP , and again more recently by the rise of AI coding. In this environment, I worry that some software engineers will form exactly the kind of disillusioned elite that was the audience for Ezra Pound’s poems about “usury” or the Luddites’ campaign against unapprenticed (often female) textile workers. I worry that AI, and the companies that build AI, are becoming an enemy against which anything is permitted: an enemy which in Umberto Eco’s words is “at the same time too strong and too weak”, unable to reason and yet powerful enough to drastically reshape the global labor market for the worse. The enemy of fascism is nuance. Fascism presents a good, clean, rousing story about a spiritual conflict between right and wrong. It is anathema to fascism to stop and muddy the waters a bit: in this case, to explore the ways in which LLMs, like any transformative technology, can both support and endanger traditional values. 
In The left-wing case for AI I wrote about how AI is being used right now as a disability aid, and many disabled readers wrote in to share their positive experiences with LLMs, and often how alienated they feel by the anti-AI mainstream on the left. I recently got an email describing how there’s a sudden flood of accessibility software for blind people 9 that’s actually built by blind people , who can now iterate with a LLM to get a product that meets their needs. Framing AI as an ontological evil erases experiences like these. Being anti-AI is not inherently fascist. Many of the anti-AI posts I’ve quoted are thoughtful, sensitive pieces exploring how the author thinks about one of the biggest changes to our industry. I still think the world needs more articles like that, not less, but the more of them I read, the more I recognize the tropes: spiritually pure lovers of the craft, degenerate peddlers of corrupt modernism, a need to return to the traditional ways of the hacker, and a lament for the (potentially) waning power of an elite fraternity of programmers. I know I’m tiptoeing around the worst argument in the world . It isn’t a refutation of anti-LLM arguments to say that they are structurally similar in some ways to fascist arguments, any more than it’s a devastating critique to say the same thing about Lord of the Rings. Sometimes it is good to try and halt the march of progress! Some of our past traditions really were purer and more spiritually robust! It just bothers me, that’s all. I used to read The Story of Mel with unalloyed pleasure. Now it makes me nervous. If you believe you’re fighting the embodiment of fascism , or for the idea of value itself , what tactics are off-limits? What positions might you eventually come to accept? It feels wrong to directly associate my caricature with any actual posts, but it also feels wrong to make a blanket assertion without examples. 
Just so you know what I’m talking about, here are some posts that have elements of this attitude. I like some of these posts and dislike others. Page 329 of my copy of Julius Evola’s Revolt Against the Modern World. Ezra Pound, Canto XLV. “Usura” should be read as “usury”, or today we could gloss it as “capitalism”: all Pound’s examples of great art were from the pre-capitalist patronage era of art. Adolf Hitler, from his speech at the 1933 Party Congress in Nuremberg. Of course, there’s also historically been a strong pro-technology current in fascist thinking (even specifically Italian fascist thinking). Page 134 of Blood in the Machine has a brief argument that Luddism was feminist because the (exclusively male) artisans’ wives would provide food for their meetings. No, really. From Kevin Binfield’s Writings of the Luddites, page 40. I’ve taken the liberty of re-rendering it in modern spelling and grammar. Aside from being too early, they didn’t have any connection to the state apparatus of power (in fact, they were ultimately crushed by it) and they famously lacked a singular leader. The example cited was BlindRSS.

iDiallo 2 days ago

Now that your newsletter is AI-generated, I've Unsubscribed

I've remained subscribed to some newsletters for over 20 years. The authors managed to keep my attention all that time. But then, one day, they decided to switch to an AI-generated newsletter without making any announcement. After a couple of weeks of blue high-tech image thumbnails, I simply hit unsubscribe. Here's what happened: a person earned my trust. He maintained that trust for all those years. But then he thought the best way to improve was to take himself out of the equation. If you're just going to present me with prompt-generated content, I hate to break it to you, but I have access to ChatGPT, and I can do that myself. The reason the human voice matters to me is that there's real experience behind the words. The oldest newsletter in my inbox is from when I was just 12 years old. It was from a French writer I used to read. After a decade of following him, the emails stopped coming. I was only reminded a few years later, when the emails started coming back. I didn't jump on it immediately. I didn't even remember who it was. But when I read one at random, the words were different, the tone was nostalgic, and the name was unfamiliar. I dug deeper and found that the author's son had taken over the newsletter. That was my cue to unsubscribe. But he hadn't used AI to replace his father's voice. He didn't use any tricks to garner clicks. Instead, he announced that his father had passed away and that he would share some stories. I remained subscribed until the last story was released. I rarely sign up for any newsletter. If I do, it's intentional because I'm interested in what the author has to say. It's not much deeper than that. There is a big difference between a newsletter written by a person, one that breathes and wanders and sometimes takes its time, and the rapid-fire, mechanical hum of AI-generated content. One feels like someone is thinking with you. The other feels like a monetization strategy.

Kaushik Gopal 3 days ago

OpenCode power user tips

In this post, I’d like to talk about some power user tips for OpenCode - an open source, model agnostic harness that more people should be using. Hopefully some of the advanced use cases convince you to give OpenCode (and OpenChamber) a shot. Intermediate to advanced tips only: I am specifically choosing to talk about some advanced tips in this post. If you’ve never used an agent harness or are looking to learn how to use OpenCode, this post can be useful but reader beware. While (Ctrl + P) will list out all the possible commands (and is helpful), OpenCode has the concept of a “leader” key (which defaults to ). The leader key allows you to execute targeted useful commands more quickly and there’s a slew of useful ones pre-defined 1. People reach for whole terminals and extra tooling to juggle between agent sessions. I too had an overly customized tmux setup that looked like this: OpenCode simplifies this. Just hit and you view current sessions and can instantly switch to that session by just selecting it from the list. The ability to quickly rename a session from this view is a godsend for me and what lets me be organized. Session directory filtering: you can pass a flag to when launching it, which filters the session list to just this workspace/directory by default. You can alternatively not pass that flag, and the session list will show all sessions. Forking takes the session you’re in and spawns a new one. You branch off into a separate conversation while the main agent keeps grinding on whatever you left it doing. I love this feature and even cobbled my own version with tmux long before most harnesses shipped it. Claude Code, Codex and other harnesses have caught up and support this feature. But OpenCode’s UX is the smoothest. You simply type in your chat. It gives you the option to fork the current chat or from a previous point in the message. You can then rename the forked session right from the list UI, and jump back and forth.
The easy session switching again comes in handy here. Need to rewind to an earlier point in the same conversation? In OpenCode, there’s no escape-escape dance. leader g shows you a timeline and you can revert the conversation instantly, fork a new session from there, or just copy the message text. Probably one of the main reasons I find it hard switching away from OpenCode. I can bounce between GPT-5.5, Kimi K2.6, and Opus by just hitting 2 . change model & reasoning + switches the model on the fly. changes the reasoning type. I see a future where we will have smaller models we can run locally. OpenCode can point to that ollama model you have running on your own machine too. Click here if you’re curious about my model choices. Not everyone realizes this but OpenCode ships with LSP servers built-in . This means the coding agents inside OpenCode understand how to navigate different programming languages better. You’ll find less file search and grepping. Anthropic even recommends LSP server integration as an advanced move for making harnesses behave in large codebases. OpenCode gives you much of that for free. The other reason I swear by OpenCode: hit to cycle through custom agents. Here’s a few I use a lot: view the subagent work When an agent fans work out to subagents, + pulls up the subagent view so you can watch them work. Like others, you can use OpenCode for scripting and one-shot reviews: So up until now, I’ve mostly talked about features in the context of the TUI. My good friend YY recently introduced me to OpenChamber and it’s changed a lot of things for me. OpenChamber is an OpenCode GUI wrapper. OpenCode already has a web client btw. But OpenChamber has a lot of nice bells and whistles. But here’s the kicker, it’s using your same OpenCode server. In a previous post I dug into OpenCode’s server-client architecture: you run OpenCode as a server and connect multiple clients to it. 
A client can be a terminal tab, your phone, a desktop, a browser — each an isolated session pointed at the same server, fully synced. OpenChamber is just another client, but a super powered GUI one. This feature has taken the world by storm; especially since Codex introduced their implementation. OpenChamber gives you this feature for free with a super nice UX. One button click and either using or internally, it opens a secure 3 tunnel that you can connect your phone or another client to. So now, your phone controls OpenChamber and by proxy OpenCode exactly as you would from your computer. This was possible with OpenCode and tailscale too (as I mentioned in my previous post) but OpenChamber’s UX and secure tunnel approach makes this fluid. I almost never take my work laptop with me, when I’m getting out of the house now. Just speaking to my phone and a browser tab that has OpenChamber open. The other OpenChamber feature I lean on: multi-run. You have a prompt and want to try it across several models at once. I think Cursor was the first to introduce this feature. OpenChamber provides a super nice UI for this. This is how I’ve been kicking the tires on Opus 4.8 and updating my model choices . There’s just one caveat to be aware of. OpenChamber by default probes for a running OpenCode server. If it doesn’t find an OpenCode server there, it will silently spawn its own. So if you truly want all your sessions in sync, you should start your OpenCode server on port first, then open OpenChamber regularly and it’ll attach to the one you already have. I have a handy shell alias to just start a background OpenCode server now like so: If you didn’t read this tip in time, and need to kill previous OpenCode server instances, I suggest the handy procs cli command. There’s a lot more to both OpenCode and OpenChamber, but this is the stuff I reach for daily. 
The bit that’s stuck with me most is the one-server, many-clients setup: run a single OpenCode server and point everything at it (the TUI, OpenChamber, your phone). Steal whatever helps here, and if there’s a tip I’m sleeping on, send it my way.

OpenChamber v1.12.0 tunnel bug. Heads up: OpenChamber v1.12.0 added a headless web app mode, and remote instance switching now changes the OpenChamber API endpoint without loading the full remote UI. This seems to have busted the remote mobile tunnel setup I describe above. :/ The developer is responsive and working on a fix 🤞. Until then, I recommend sticking to v1.11.7, which you can download manually.

Custom agents I use a lot:
red-team: think differently from the implementer with an independent adversarial lens and hunt for failure modes.
ghostwriter: drafts messages, posts with a less AI tropey voice.
brainstormer: custom agent that’s explicitly tuned to help me brainstorm ideas, plans etc.
pr-reviewer: strict reviewer that ignores past conversation and reviews with fresh eyes.
kimi-coder: a coding agent guardrailed to Kimi: fast, cheap implementation.
agent-kombat: see my agent-kombat post. I have it wired into a custom agent for quick use.

Footnotes:
1. You can also bind commands that don’t have a predefined key. As an example, I bind the “Exit the app” command to so I can quit OpenCode quickly.
2. yes yes, you’re probably nuking your prompt/KV cache, but you shouldn’t have long running conversations anyway.
3. one-time + TTL + revocable connect link

Stratechery 3 days ago

The Nvidia AI PC, Project Solara, Microsoft AI

Good morning, I don’t normally give away my interview subjects ahead of time, but I’m going to make an exception this week given the subject and the below Update. I am writing this in San Francisco where I interviewed Microsoft CEO Satya Nadella after his Build developer conference keynote; normally I would want to publish that immediately so that you have the full context of my analysis. In this case, however, I came to the opinions below during the keynote, and before the interview, so for that reason (and a few logistical ones) I wanted to articulate them first (before you see my questions), and follow up with Nadella’s view on them (and a number of other topics) afterwards. So with that noted, on to the Update:

From CNBC:

Nvidia has emerged as the world’s most valuable company by dominating the market for artificial intelligence chips in the data center. Now the company is expanding its prowess to chips that will serve as the main processor for personal computers, entering an arena that’s long been ruled by Intel, Advanced Micro Devices, Qualcomm and Apple. During a keynote address at Taiwan’s Computex conference on Monday, Nvidia CEO Jensen Huang unveiled a new PC processor made alongside Microsoft. The RTX Spark superchip, which Huang also referred to as the N1X, debuts in the fall on a fresh line of Windows PCs from Microsoft, Dell, HP, ASUS, Lenovo and MSI.

I’m actually starting in Taipei on Sunday, where Huang introduced the long-rumored Nvidia PC chip; from Tom’s Hardware:

At full strength, this chip offers up to 20 Arm CPU cores, a Blackwell GPU with 6,144 CUDA cores, 128GB of LPDDR5X RAM, and up to 300 GB/s of memory bandwidth. That powerful CPU and GPU, connected over NVLink C2C, and the large memory pool give AI agents and 120-billion-parameter models plenty of power and space for long-running tasks with context lengths stretching to a million tokens, according to Nvidia.
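A rough back-of-the-envelope on those quoted specs, offered as an illustration rather than a benchmark: decode speed on a memory-bound system is commonly estimated as memory bandwidth divided by the bytes of weights read per generated token. The sketch below assumes a dense 120-billion-parameter model quantized to 4 bits (0.5 bytes per parameter) and ignores KV cache traffic. Those are assumptions, not figures from Nvidia or Tom’s Hardware, and a mixture-of-experts model that reads only part of its weights per token would land higher.

```python
def decode_tokens_per_second(bandwidth_gb_s: float,
                             params_billions: float,
                             bytes_per_param: float) -> float:
    """Upper-bound decode rate for a memory-bandwidth-bound model.

    Assumes every weight is read once per generated token (dense model)
    and ignores KV cache reads, so real throughput would be lower.
    """
    weights_gb = params_billions * bytes_per_param  # billions of params * bytes each = GB
    return bandwidth_gb_s / weights_gb


# Quoted specs: up to 300 GB/s of bandwidth, 120-billion-parameter models.
# The 4-bit quantization (0.5 bytes per parameter) is an assumption.
rate = decode_tokens_per_second(bandwidth_gb_s=300,
                                params_billions=120,
                                bytes_per_param=0.5)
print(f"~{rate:.1f} tokens/s upper bound")
```

Under these assumptions the ceiling scales linearly with bandwidth, which is why memory bandwidth matters so much for decode.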
We don’t have any benchmarks yet, but the RTX Spark appears to be broadly similar to the DGX Spark; that’s a decent chip that excels at prefill, but is slower than an M5 Max at decode (thanks to lower memory bandwidth), and significantly slower at CPU tasks. Huang appeared during the keynote via live video to discuss the chip. Satya Nadella: Suddenly, this concept of unmetered intelligence right at the edge is so hot again. So maybe you want to talk a little bit about this: you have thought about this, talked about this, and now, of course, with RTX Spark really delivered, I think, what’s a breakthrough system for AI to be much more ubiquitous. But maybe, Jensen, you can just share a little bit your vision around where you see this going. Jensen Huang: Well, this all started about three years ago between a conversation between you and I. And we were talking about how we could build a new class of PCs that’s incredible for designers and creators. And it would be incredible for artificial intelligence. And it would be one of these systems that has the processing capability, but also the software stack that’s integrated into the world’s design packages and creator packages. And, of course, all the things that we’re doing with AI. And here we are, three years later, we built an incredible new chip. And this system is supported by all of this new software that you created for Windows. And we now have the ability to have essentially an autonomous agent running on the PC. This clip explains why I find this chip specifically, and AI PCs generally, pretty underwhelming. Three years ago we were still in the ChatGPT era of AI, and I was very excited about the possibility of local inference. Then came the reasoning era, blowing up KV cache (which increases the need for more memory) and emphasizing the importance of decode (to generate that many more tokens). Now we’re in the agentic era, where CPU performance is incredibly important. 
To that end, the ideal setup for a local agent is strong local CPU performance and calling out to the cloud for inference. The RTX Spark, however, spends tons of die space on GPU cores that are inferior to the cloud (because of memory size and bandwidth if nothing else) at the expense of CPU. It’s a suitable chip if you just want a chatbot circa 2023; it’s hard to see it being worth the price — or the software compromises that are the reality of Windows on ARM — in 2026. Jump ahead to the Build keynote, which I found very underwhelming to start. Nadella opened with a brief overview of the AI stack, then started talking about Windows, and I was honestly pretty surprised at the lack of vision and enthusiasm. That’s when it occurred to me: I think that Nadella agrees with me! Sure, some local inference is nice, but that’s not where the AI that matters is going to be located. Nadella, keep in mind, has no real loyalty to Windows; indeed, I credit him with The End of Windows . Specifically, Nadella didn’t end Windows as a product, but he ended its run as the organizing principle around which the entire company operated, focusing on software that ran everywhere and a cloud that ran everything. That leads to a surprising takeaway, and the most interesting part of the Build keynote: what if Microsoft is actually well positioned to get back into AI devices? From GeekWire : A team inside Microsoft has been quietly building a platform for devices that run AI agents instead of apps, based on Android instead of Windows, with two working hardware designs so far, and an initial set of big-name companies lined up to run pilots. The platform, dubbed “Project Solara,” is Microsoft’s bet that AI will open up entirely new scenarios for computing — using agents to avoid the constraints of traditional software, and off‑the‑shelf components to develop new devices quickly and inexpensively. 
Project Solara is, to be clear, vaporware at this point, although the company did show real devices and has signed up Qualcomm and MediaTek as chip partners. It is also extremely compelling. Here’s how Nadella introduced it: So far, we’ve talked about the edge and the cloud. The current form factors, right? I mean, when I saw that Jensen picture from the weekend where he had all the desktops, I felt like, man, I’m back in the 90s, right? Because it was so cool to see the lineup of all the machines that I loved and I grew up with back yet again with new functionality, right? It’s the same form factor, but unbelievable new functionality because of the onboard AI capability, right? So that’s sort of what we’ve seen with the laptop, the desktop, and of course with the cloud. But it also, you know, sets up that next question: if you have that capability, which is new function, and you can put it into existing form factors, can you even purpose-build new form factors for the new function? Can you build a new platform even for the agent era? And that is the motivation behind Project Solara, which we’re introducing today. First off, note the framing: the PC is old tech with agents; what about new tech uniquely enabled by agents? And note the classic Microsoft hook: could that new tech sit on top of a new platform? Corporate Vice President Steve Bathiche, the head of Microsoft’s Applied Sciences Group, explained the vision: Before I talk about those awesome new devices you just saw, let me start with the why. Back at Build 2023, I talked about the outside AI application structure, where AI moves from operating within the application frame to operating globally, working across multiple apps and services to connect, coordinate, and maintain context across entire workflows, devices, and time scales. 
What if there were an ecosystem of devices specifically designed for that new type of application structure, for those types of agents, for that transformational interaction technology? That is the impetus behind Project Solara. But with so many possible forms, which one do you pick? What is the next device? You see, the big aha for us is that it’s not about choosing one specific form factor. It is about creating a system that extends your agent across a constellation of devices. The next computer is not one device. It is all these devices working together as one system, with agents showing up closer to where and when you need them. There was one brief moment in the promotional video that preceded Bathiche’s appearance that made the concept click for me: The problem with wearable devices is the interaction model: they are only useful when you are interacting with them, when the human is in the loop, but being in the loop with a wearable is annoying and inefficient. What is being demonstrated here, however, is a brief interaction, and then an agent doing work in the background. In other words, the usefulness happens in the cloud without the human needing to be involved, because an agent is doing the work. That’s what I find compelling. On one hand, you can make the case that of course Microsoft would be interested in a device model that uses the cloud as a platform, given that Microsoft doesn’t control a mobile device like an iPhone. What occurs to me, however, is that even if Microsoft doesn’t succeed with Project Solara, this model — where the cloud is the hub and multiple devices are the spoke, instead of the phone being in the center — is clearly a better one for agents. Agents work best in the cloud, and across apps and devices; yes, the phone might be one of those devices, but when it comes to agents it shouldn’t be the hub. Again, this is vaporware, and very much in Microsoft’s interest, so take Project Solara with the appropriate grain of salt. 
It’s a vision of the future, however, that does make a lot of sense, particularly in an enterprise scenario where all of the context and compute is already in the cloud (and Project Solara is focused on enterprise, not consumer). It’s also something completely different from the past, and fits my thesis that, in the age of AI, thin is in . From GeekWire : Microsoft has based much of its AI business on models from OpenAI, before expanding more recently to Anthropic. On Tuesday, the company showed how it plans to rely less on both. At the Build developer conference, the Microsoft AI Superintelligence Team unveiled a family of seven models built from scratch. It’s part of an ongoing effort by the company to build credible in-house alternatives to models from partners and rivals with competing allegiances… The flagship of the seven newly announced MAI models is MAI-Thinking-1, a reasoning model that Microsoft says draws even with Anthropic’s Claude Sonnet 4.6 in blind human testing, and matches the more capable Claude Opus 4.6 on a widely used coding benchmark. [CEO of Microsoft AI Mustafa] Suleyman stressed that MAI-Thinking-1 was trained from the ground up with no distillation from other companies’ models, looking to appeal to enterprises that care about clean data lineage. These models seem pretty decent, all things considered, but what was interesting to me was the framing: Microsoft emphasized that enterprises could take these models and make them their own. Suleyman said: This is what owning the full stack end-to-end looks like. It’s the foundation of Microsoft Frontier Tuning, it lets you customize the MAI models using our full stack hill climbing machine right where you want it. And it means that the disciplined and very relentless engineering that has gone into building our models is now available to all of you on a platform that you can trust, working on your behalf to create custom agents that you will control. 
So the really big thing, of course, that’s happened in the last year is these RLEs, reinforcement learning environments, these unique training gyms for your AIs. They create company and task-specific agents adapted only to you, built on MAI models. So for example, within Microsoft, we use our RLEs combined with our MAI models to climb towards the best agentic use cases on Excel. Our MAI-tuned model is now on par with GPT 5.4 on public and private benchmarks, whilst at the same time being 10 times more efficient on cost, and many other early adopters are seeing similar results. When we’ve tuned our models on McKinsey’s tasks, MAI delivered the highest win rate, even outperforming GPT 5.5, and again delivering 10x greater efficiency on cost. So to us, this is the advantage of very carefully calibrated frontier tuning. And importantly, unlike with some of the other companies, with MAI, you don’t rent intelligence from a shared model that learns from everybody. Only you keep the benefits of your hard-earned workflows, know-how, knowledge, and your own institutional data. Only you get to control the resulting model. And so with us, the RLEs and the models that you build inside of them, they become your moat. I really think this is distinct. It marks a new era in AI that we’re all very, very excited about. This has shades of AWS’s Nova Forge offering , which lets enterprises add their data at a checkpoint in pre-training; it’s a little different in that it’s more focused on reinforcement learning, but those lines are getting blurred. The concept is that enterprises get to have their own model for their own data, without sharing it with the frontier labs that want to eat their lunch, and it’s a concept that is certainly appealing in theory; the real test will be to see if enterprises that choose this route aren’t penalized by not being on the cutting edge of functionality. 
Then again, helping cautious enterprises embrace the future on their terms, without necessarily having to win on pure performance, is exactly how Microsoft has long maintained its position.

Evan Schwartz 3 days ago

Scour - May Update

Hi friends, In May, Scour scoured 865,266 posts from 28,671 feeds (1,766 of which were newly added), and 260 new users signed up to bring it across the 3,000 user mark! Here's what's new in the product: Scour is now better at finding posts that match your interests. You should see more relevant content and far fewer off-topic articles in your feed. (This sounds simple, but it represents at least a full month's effort 😅.) The way this works under the hood was one of the single biggest changes I've made to Scour's core ranking system since I started working on it. At a high level, scoring now combines Scour's original fuzzy concept matching (embedding vector distance) with how much the article uses relevant vocabulary (lexical search). While these ingredients are well-established, I think the exact way Scour implements them might be a somewhat novel system design. The reason this was so complex to build was that existing approaches to lexical search did not work for Scour. For example, every Scour user has between a handful and hundreds of interests (I have 642), each of which might have 3-10+ relevant keywords. This means that every "search" is actually a search for thousands of terms (for my feed, it's around 5,000). Most search systems are built for individual queries with a handful of terms. The even more tricky issue is that lexical search algorithms like BM25 do not produce scores that are comparable across queries, because they are designed for ranking (ordering results for a specific query), not scoring . Scour, however, needs to know which of your interests a given post is most related to and it sorts the posts in your feed by how relevant they are for any of your interests. I believe that the custom scoring and indexing system Scour now uses provides both cross-query score comparability and efficient lookup for thousands of parallel queries. Stay tuned for more details! 🙏 Help me out! Please like, dislike, and report posts as off-topic as you're browsing. 
These signals help me tune the system and figure out the edge cases where it could be improved.

Scour bolds keywords in the post titles to make the feed easier to skim. The new lexical scoring layer discussed above makes it easier to bold exactly the words related to your interest.

Two other small changes let you peek under the hood of the new scoring system. On desktop, hovering over a post's title will show you the score breakdown between semantic and lexical. Separately, if you click on an interest tag and go to the single-interest page, there is now an Advanced link that will show you the terms the lexical scoring system is using to find and rank posts.

Here were some of my favorite posts that I found on Scour in May (you can tell from the topic concentration where my mind has been!):

Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies
Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?
Re-autoresearching MSMARCO BM25, on Vespa
How we made a SQL query optimization agent 59% more accurate using autoresearch and LLM Observability
Your Vector Database Doesn't Know What Similar Means
My Plan with RSS
Agentic Coding is a Trap

Happy Scouring!
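A footnote to the scoring discussion above. The post does not describe Scour's actual implementation, so the following is not it. It is a minimal Python sketch of why raw BM25 scores are hard to compare across queries, plus one simple way to rescale them. The normalization (dividing by the query's theoretical maximum score), the `hybrid` blend, its `weight` parameter, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # common BM25 defaults, chosen here for illustration


def idf(term, docs):
    """Inverse document frequency with the usual BM25 smoothing."""
    n = len(docs)
    df = sum(1 for d in docs if term in d)
    return math.log(1 + (n - df + 0.5) / (df + 0.5))


def bm25(query, doc, docs):
    """Raw BM25 score of one tokenized document for one tokenized query."""
    avg_len = sum(len(d) for d in docs) / len(docs)
    tf = Counter(doc)
    score = 0.0
    for term in query:
        if tf[term]:
            length_norm = K1 * (1 - B + B * len(doc) / avg_len)
            score += idf(term, docs) * tf[term] * (K1 + 1) / (tf[term] + length_norm)
    return score


def bm25_normalized(query, doc, docs):
    """Raw score divided by the query's ceiling, giving a value in [0, 1)."""
    # Each term's contribution approaches idf * (K1 + 1) as its frequency grows.
    ceiling = sum(idf(term, docs) * (K1 + 1) for term in query)
    return bm25(query, doc, docs) / ceiling if ceiling else 0.0


def hybrid(semantic_similarity, lexical_normalized, weight=0.5):
    """Hypothetical blend of an embedding similarity and a normalized lexical score."""
    return weight * semantic_similarity + (1 - weight) * lexical_normalized


# Toy corpus of tokenized post titles (made up for this example).
docs = [
    "rust async runtime tokio internals".split(),
    "rust borrow checker explained".split(),
    "sourdough bread baking tips".split(),
    "async python event loop".split(),
]
short_query = ["rust"]
long_query = ["rust", "async", "tokio"]
post = docs[0]

for query in (short_query, long_query):
    print(query, round(bm25(query, post, docs), 3),
          round(bm25_normalized(query, post, docs), 3))
```

The same post matches every term of both queries, yet its raw score is larger for the longer query simply because more terms contribute. Dividing by the per-query ceiling puts both on the same bounded scale, which is the kind of property a system comparing many interests against one post would need.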

Jim Nielsen 4 days ago

An Ode to the Exacting Pedantry of Computers

The very first computer programming class I ever took introduced me to the idea of there being different kinds of numbers, like integers, floats, and doubles (it was a C++ course). “You mean, when I assign a variable, I have to say up front what kind of number this is?” It was such an odd concept to me. A number is a number. Why do I have to say it’s this kind of number or that kind of number? I dropped out of that class. A few years later, I decided I wanted to try programming again. So I took another intro class. This time they were teaching with Python instead of C++, so you can imagine my excitement to learn that I didn’t have to think of numbers in this way anymore! It felt like the computer was meeting me partway. Over time, I came to learn how pedantic computers are. They require a kind of exacting precision in saying what you want them to do. And they’ll only ever do exactly what you tell them to do, nothing more, nothing less. If there was a bug in your program, that wasn’t because the computer was doing something you told it not to. The computer was only ever doing exactly what you told it to do. A “bug” was very likely a flaw in your conception of how the program should execute, not the actual execution. It was a failure on your part to be more precise, to imagine a scenario where something happened that you didn’t anticipate — and therefore didn’t tell the program how to handle. “Do what I mean, not what I say!” But now, with LLMs, that kind of exacting precision in language and thought is disappearing. You can have a thought, ask the LLM to build it, and it will fill in all the details you didn’t specify or anticipate. All those pesky details which previously would’ve made you reflect, “Oh, I didn’t think of that. 
Maybe I should design this differently…” Or, “Oh, well now that I have to think about this some more, I can see that it might not actually be a very good idea…” The pedantic friction, which seemed like such a nuisance, was actually acting as a kind of tool for sharpening and improving your thinking and output. The exacting nature of the computer required you to think more. LLMs, however, have significantly lessened that friction. You can think less and move faster. And yet, that feels like our job as software makers: to think, to anticipate, to explicitly articulate intent. As a software user, I’d rather folks spend more time thinking so that I, in turn, have a better experience. This is preferable to giving me more stuff faster that’s only partly conceived. As an industry it feels like we’re headed in a direction where we think it’s better to ship more faster and fix the effects of half-conceived intent later, than to spend more time upfront discovering, sculpting, and specifying intent. That’s one thing writing code by hand has taught me: intent (what you want to build and how you want it to work) is shaped through the act of articulating it. That hard work is not required of us anymore. The LLM will fill in the details. The exacting pedantry of the computer is going away, and in its place are assumptions about intent, many of which we don’t even know about until our users run into their effects.


AI Doesn't Have ROI

Something changed in the last week. Shortly after Uber COO Andrew Macdonald said that it was “getting harder to justify” spending money on AI as it was “very hard to draw a line” from that spend to useful consumer features (after its CTO said Uber burned its entire annual token budget in four months), Axios’ Madison Mills reported that one company had accidentally spent $500 million in the space of a month on Anthropic’s models after failing to set spend limits. A few days later, Mills would report that other companies were now looking for ways to reduce their AI spend. That’s because, as I’ve said before, nobody can actually measure the ROI of AI, or even create a standard measurement of the cost of a task thanks to the inevitable hallucination-prone nature of LLMs and the ever-growing list of different harnesses and “agentic” (sigh) interfaces.
Every different prompt and project and interaction can go wrong in a way that is hard to predict or plan for other than having an eternal vigilance that the supposed “intelligence” doesn’t do something catastrophically stupid, because LLMs have no thoughts, consciousness or ability to learn outside of pre and post-training.  If you can’t measure how good something is, how much it might cost, or what your return on investment might be, it’s fair to ask why you’re even paying for it in the first place. People are (reasonably!) harping on about the ROI problem, but I think the “can’t really measure the cost” part is an even bigger problem.  Yesterday, Microsoft’s GitHub Copilot moved all customers to token-based billing from a premium request model ( as I reported a week before everyone ) as users had been allowed to burn thousands of dollars of tokens on a $39-a-month subscription .  Customers are irate. One burned through 50% of their monthly credits in a single prompt , another burned 60% in the space of a few hours , another 31% in a single prompt , another estimated that they’d burn their monthly credits in the space of a single five hour session , another burned nearly half of their credits in eight prompts , another around 14% of their credits in two prompts , and another lamented that GitHub Copilot had gone from their favorite subscription to their most-stressful overnight after burning 33% of their monthly balance in a few hours . And, to be clear, this is during a promotional period where you get $11 or $21 in free monthly credits: These users — much like the users of effectively every subsidized AI subscription — never really knew how much anything they did cost, because Microsoft intentionally hid the actual cost of prompts and allowed users to spend obscene amounts as a way of boosting growth for GitHub Copilot.  This problem is industry-wide. 
Every single user of every single AI subscription service is having their tokens subsidized and the actual cost of AI obfuscated. As a result, every frothy, fluffy hype-piece about Claude Code or AI in general is a kalopsia — the belief that something is more beautiful than it really is.  Think of it like this: if you’re using an AI subscription with rate limits but no actual costs , any mistakes a model makes — such as getting stuck in a loop or just doing the wrong thing — can be dismissed as the troubled nature of early-stage technology, because the “cost” was $20, $100, or $200 for the entire month. Anthropic, OpenAI and every other AI company deliberately obfuscated these costs because they knew that the second a user actually had to pay for the fuckups of an AI model they’d scream like they were being stung to death by bees. This issue bubbled to the surface in the last few months because Anthropic and OpenAI both quietly moved all of their enterprise customers to token-based billing in Q1 2026 , and because these enterprise customers are run by Business Idiots with no connection to actual work , CEOs encouraged (or actively incentivized ) their workers to use AI as much as possible, in some cases even making one’s AI use a KPI that could cost them their job.  These same workers were conditioned — through their use of AI subscription products that hide the true costs — to use them as if they cost nothing , all while being screamed at by useless middle managers to “make sure to adopt AI at scale,” all while never, ever having any awareness of what a particular unit of work cost. This was always a recipe for destruction. The overwhelming majority of AI users are completely divorced from and actively trained to ignore the true cost of AI tokens, which means they naturally use these services in a way that’s actively uneconomical. 
Every frothy hype-piece you’ve read has been written by somebody who has been conned into ignoring the true cost of AI, all in service of spreading a technology that’s unreliable, inconsistent and expensive at its core, and never, ever seems to get cheaper.  OpenAI, Anthropic and other AI companies have actively conspired to mislead the world about the true costs of AI, and it was working great right up until they decided to try charging what it actually cost. Less than a quarter into the shift to token-based billing, enterprises are freaking the fuck out, with Walmart setting token limits on its internal “Code Puppy” AI coding tool , with a spokesperson saying that it “wanted employees to apply AI in ways that create value” mere days after Amazon SVP Dave Treadwell told employees to “ not use AI just for the sake of using AI .” The last few years of AI hype have been built on lies. Every company has conspired to make you think that AI is affordable and sustainable, that profitability was possible, that hallucinations were fixable, and that any problems you faced today were a result of being in “ the early innings .” In reality, the AI industry has absorbed over a trillion dollars, effectively all tech talent, the majority of startup funding, the majority of media coverage, the art and work of millions of people, and been given chance after chance after chance to fix the obvious, glaring issues.  Every time a skeptic dared to stand out and say that none of this made sense, they were told that it was just like Uber ( it’s not ) or that Amazon Web Services cost a lot of money ( it cost $52 billion over the course of 14 years and was cash-flow positive in nine ), that “costs always come down,” and that everything would magically be alright as long as they were patient for an indeterminate amount of time. 
Four years and a trillion dollars in, AI is more expensive, its companies more cash-intensive, its products just as unreliable, and its boosters more desperate than ever to make you ignore reality as a means of empowering one of a few ultra-rich oafs. Products from OpenAI and Anthropic are built to ingratiate and coddle losers while creating work-shaped outputs that are good enough to impress braindead executives, imbeciles and middle management hall monitors that don’t do any real work, and the reason it’s worked this long is that both companies intentionally misled everybody about how much the real costs were. I must repeat myself: AI is more expensive today than it was three years ago, and it is not getting cheaper. Sam Altman’s comments about “ intelligence too cheap to meter ” were lies. NVIDIA’s Blackwell GPUs didn’t make it cheaper, and its Vera Rubin GPUs won’t either. Google’s TPUs won’t do it, Amazon’s Trainium or Inferentia chips won’t do it, Vera Rubin CPUs won’t do it, OpenAI’s chips won’t do it, and no, DeepSeek won’t do it either.  People chose — and still choose — to believe that AI would get cheaper because they think things got cheaper over time in the past, which is sort of true but not remotely similar in any way, because the cost of running and training AI models comes from using the hardware as well as its upfront cost. Large Language Models require expensive GPUs thanks to their reliance on power-intensive parallel processing, and larger, more-complex models in turn require more GPUs to both train and run inference with. And three generations in, NVIDIA GPUs don’t appear to be bringing the cost down at all, which heavily-suggests that the inherent business model of generative AI is broken. People love to compare AI to the Dot Com Bubble ( AI is far, far worse ) because it’s much easier to rationalize bad behavior than accept that we’re facing the largest misallocation of capital of all time. 
The Dot Com Bubble was really two bubbles — one around eCommerce and internet startups, and one around telecommunications infrastructure. Per Justin Kollar, the telecommunications bubble grew because of a fundamental misunderstanding of demand: As a result, infrastructure was built far in excess of existing demand, because most people weren’t online, and those who were had very slow internet connections. Per me: Here’s a critical difference between AI and the Dot Com Bubble: when people actually lit up the dark fiber, the underlying internet service was faster, better and cheaper than a dial-up connection. Services like TheGlobe, WebVan, and Pets Dot Com ran businesses that lost incredible sums of money, and did so not because of the costs associated with accessing their services, but because of the unrealistic and unsustainable business models themselves. Their eventual functional forms — Facebook, Instacart, and Chewy — didn’t require fundamental scientific breakthroughs in how goods were delivered or internet services were accessed. Their failures were a result of poorly run businesses that lost money by expanding too rapidly or spending $400 to acquire each customer. Dell and CoreWeave just turned on the first Vera Rubin GPUs, and you’ll notice nobody is saying the words “profitable” or “sustainable,” because NVIDIA is not interested in making stuff more efficient, only more expensive. According to CEO Jensen Huang, AI data centers — which currently cost somewhere in the region of $50 billion per gigawatt — will cost between $80 billion and $100 billion per gigawatt in the future. Does this sound like it’s getting cheaper to you? Even if said data center packs theoretically more “power,” what does that “power” do for the customer running compute on it? Is it cheaper? More efficient? How do we not have these answers?
All of this is to say that the Dot Com Bubble happened due to irrational exuberance and growth lust, and what was recovered at the end came not from scientific breakthroughs but the fact that the useful infrastructure existed and could be adapted and used to make things cheaper and more efficient. That isn’t the case with AI data centers, AI startups or anything else to do with the AI Bubble. Every few days somebody makes a post like this suggesting that “the internet didn’t go away” and “railways didn’t go away” when their bubbles popped, but I think this is a fundamental misunderstanding of what AI is . An AI data center full of AI GPUs is useful for AI and very little else. There are GPU-powered analytics tools, GPU-powered modeling and scientific applications, but the nature of GPUs — good at doing the same thing across big data sets in parallel, but bad at handling many little independent tasks — makes them impractical for most of what modern computing demands. The entire Dot Com Redemption storyline comes from the idea that it “left behind useful infrastructure,” by which they mean “cabling that allowed hundreds of millions of people to use the internet.” While there was some amount of further construction and capex to handle, the end result was useful fiber that connected people with a faster connection at a lower cost. No such story exists for AI. AI data centers are ruinously expensive , requiring billions in upfront funding with operating costs so high that they, at best, run at a loss for the first five or six years of service, if they ever recover their original costs at all. A rack of Vera Rubin or Blackwell GPUs will cost as much to run in five years as they do today, as will an incomplete data center cost just as much to finish construction, connect to the grid or acquire behind-the-meter (IE: generators) power for.  
In the aftermath of the Dot Com Bubble, dead startups flooded the market with cheap server and office gear, which allowed plucky founders to cobble together their own services. A single Sun Microsystems Ultra Enterprise 3000 cost $43,000 ($89,000 in today’s money) and had a power draw of between 1,200W and 1,500W, but could run an entire company’s infrastructure. A single B200 Blackwell GPU uses 1,200W, and more-complex AI coding tasks can take up four to twelve of them for a single user’s output. Put simply, you can’t really do very much with a few of these GPUs, and what you can do isn’t profitable, scalable or valuable. Similarly, dark fiber could be lit up with the right transceivers and networking gear to create internet access. AI data centers are effectively large boxes with custom cooling built for a very limited subset of chips. Adapting them to other uses would require gutting the data center, which would mean that the vast majority of the capital expenditures were wasted. Even if you were able to buy a hundred Blackwell GPUs from a dead neocloud, you, as a regular person, couldn’t do anything with them. In fact, nobody really could, because you’d still need a physical data center and bespoke cooling, which means that even if the chips were free, the associated construction capex or, at the very least, physical colocation space would still cost a great deal of money. The internet and railways didn’t go away because their upfront costs were the only real costs that mattered. Even if somebody were able to pick up a cheap AI data center full of the latest generations of GPUs, the underlying operating expenses are awful, and the only way to make them even close to generating a profit is to have consistent use of all your GPUs.
There’s a cost to having them sit idle — both in electricity and personnel — and unless the plan is to have them sit in a data center turned off until you can find somebody else to sell them to, you’ll have to come up with a business model for your AI services that actually makes a profit…which nobody appears to have done, even with unlimited capital and the entire focus of the tech industry. Then there’s the issue of training , which is entirely made up of opex. If you want to train a new model, you’ll likely need thousands — or even tens of thousands — of H100 or H200 GPUs, and they’ll cost just as much electricity whether or not you make anything useful. A failed or unhelpful training run could cost tens of millions or hundreds of millions of dollars , and that will require financial backing that won’t exist. While there could be a theoretical future of LLMs run at their true cost (IE: unaffordable for most) as I covered in last week’s premium newsletter , that would require demand, and as I’ve discussed above, the demand for AI services is a mirage built on subsidized subscriptions, and companies paying the actual costs are already screaming for mercy.  Once the bubble bursts, any excitement for AI — and by extension excitement to spend money on AI — goes out the window. AI startups won’t get funded . AI token budgets won’t get greenlit . AI data centers won’t be able to raise debt .  Every part of this bubble relies upon the momentum of hype to substantiate every link in the chain. Hype must exist around the nebulous concept of an “ AI factory ” to raise debt to buy NVIDIA GPUs and build data centers, hype must exist around AI software to convince enterprises to keep buying services from OpenAI and Anthropic, hype must exist around theoretical demand and outcomes from AI services to fund AI startups, and hype must exist perpetually in the media to make everybody ignore AI’s ruinous costs.  
This hype was unsustainable without buckets of lies, misinformation and a captured tech and business media. The value of AI has been inflated by the vagueness of how it’s discussed. For example, major media outlets will gladly write that “AI can build software,” but said sentence suggests that you can just type “build me Slack 2” into Claude and have it fart out a fully-functional, production-ready piece of software, rather than a quasi-functional mound of code-slop that can do enough to trick a business idiot or lazy journalist, but little else. Said vagueness created a society-wide gravitational pull of consensus that you needed to be behind AI now, because it’s just like the new internet, except bigger, and if you say it’s not you’re going to be really embarrassed. Creating this pressure was necessary, because without a society-wide aggression against those who didn’t adopt these tools, AI might have actually had to stand on its own merits. The fact that AI companies, backed by the full manufactured consent of the markets and most of the economy, still had to subsidize their products shows exactly how flimsy their value truly is. The only way to inflate the AI bubble both on a hardware and software level was to mislead the general public and investors on the costs and efficacy of AI models. Now that organizations are having to pay the actual cost of AI, suddenly they’re concerned about its outcomes, and everybody has become a little hysterical. Late last week, SemiAnalysis wrote one of the most insane articles I’ve ever read — AI Dark Output: The Visible Cost of Invisible Output — saying that “AI output will be real before it is measurable,” and, well, whatever the fuck this is: SemiAnalysis is a semiconductor analyst firm with an obvious reason to keep the AI bubble inflated, and if they’re writing a piece that amounts to “AI has a return on investment, you just can’t see it,” things are getting desperate.
Here’s how they define “Dark Output”: That “substitution dark output” is explained using a theoretical example of “...a simple legal document which in theoretical GDP should have the same inflation adjusted value to a user whether a lawyer drafts it or AI drafts it,” which is nonsense.   When you pay a lawyer, you don’t pay them to “create an output,” you buy their experience and time and ability to find and adapt case law to reach an outcome, such as in the process of filing stuff, avoiding or actively participating in litigation. Just because AI can fart out an approximation of what a human output may look like — likely riddled with hallucinations — doesn’t mean that said output was created with any “experience.” Models don’t think , they have no experiences , and even if a lawyer is prompting them , that doesn’t mean that the lawyer’s discernment or taste is reflected in the final output. Then there’s this bit: We’re four fucking years into it but we’re still using hypotheticals. Are “...the simplest documents now completed by AI and not lawyers”? You don’t get a lawyer to write a document because they’re the only ones who can write it — you get it to mitigate the risk using the experience of the law firm, both in the associate drafting the document and the partner overseeing it. This flimsy, half-assed logic is how the AI bubble got inflated in the first place. Supposedly smart people continually show a total lack of awareness of how jobs work at basically every level, and in this case — where it should be theoretically possible to find and talk to a lawyer doing this — the supposed “dark output” includes “the research done to complete this article.”  You may be wondering what that “new work done by AI that wasn’t previously being done by humans because AI made it cheap” is, and the answer is “literature reviews” and “summarizing the last six months of email,” and I wish I was kidding. 
But don’t worry, “...there are anecdotal signs that a large fraction of current token spend is for new work that wasn’t previously paid for rather than replacing existing work.” Have you ever noticed that every story about AI job loss reads like it was written by The Riddler? For example, last year a ton of outlets reported that “Oxford Economics had proven that entry-level workers were being replaced with AI,” but in reality, the study said that “... there are signs that entry-level positions are being displaced by artificial intelligence at higher rates ” with no actual data beyond post-2022 employment declines in some fields that AI might be able to do.  Similarly, CNBC’s brainless headline that an MIT study found that AI “could already replace 11.7% of the US workforce” was entirely based on a labor simulation tool rather than any economic analysis of the actual shit AI can do and what it’s doing in the real world. That’s because AI job loss is a fucking myth. Every company laying off people because of “the power of AI” is doing so because their shareholders are mad and because they know they’ll get headlines.  And if it were actually happening there’d be fucking riots in the streets! Unemployment would be spiking! Things would be burning!  The thing that everybody wants you to avoid thinking about is that if AI worked as advertised, there would be obvious, impossible-to-ignore economic signs: For all of these things to happen, AI would have to be both flawless , hallucination free, a completely different product capable of autonomous intelligence and having unique ideas.  The reason that we can’t measure “AI job loss” is because AI can’t do jobs. It can be used to replace some specific contract positions with extremely shitty versions that don’t scale , but it does not replace jobs because it is incapable of human work. 
It cannot speak to colleagues, it cannot accrue experience, it does not have instincts or culture or taste or anything other than whatever training data has been crammed up its ass or through endless post-training. Nevertheless, the threat of AI job loss has been enough to allow both Sam Altman and Dario Amodei to raise hundreds of billions of dollars lying about it, and now that both of them have walked back their job loss scare-propaganda, every oaf and moron that believed them without actually checking should be booted out of their respective industries. It’s fucking embarrassing! You should all be ashamed of yourselves! As I said above, the ROI of AI should be really easy to measure if it actually existed. If AI was magically able to build and maintain software, we’d have small companies that could build and deploy at the scale of a hyperscaler, and hyperscalers would, in theory, be expanding their margins so aggressively that it would create a new golden age of software revenues…or they’d become entirely infrastructure providers, as anybody else could compete on software. But on a far-simpler level, it would be extremely obvious. Anybody can access ChatGPT, Claude or Gemini, effectively anywhere in the world. The theoretical “power” of AI is that it “just does stuff,” and the proliferation of LLMs would mean that somebody would’ve “done” some “stuff” that we could point at with exceptional ease. Random guys in the midwest would be pumping out profitable, functional, and feature-rich software. Lawsuits would be won by pro se plaintiffs with incredible counsel from a theoretical “country of geniuses in a data center.” Four years in, we’d have one major AI-powered company demolishing the competition in any industry, or (powerful) AI would become so prevalent in every industry that it would effectively reduce the cost of the service to nothing. We’d be able to point to companies that adopted AI and then completely fucking exploded.
We’d be able to point to useless coworkers who were now doing impressive, meaningful work. There would be widespread economic upheaval, as the concept of a “large company” would lose meaning, because those theoretical “geniuses in the data center” would be automating all the work. There also wouldn’t be so many pieces insisting that AI is super powerful and so many quotes from Business Idiots saying it’s “real.” We wouldn’t talk about what AI could do at all. We wouldn’t need Anthropic to lie that Mythos was too powerful to release only to release it several months later. We wouldn’t have to talk about the fucking potential at all because we’d be able to point to what was going on because it would be obvious! Last week, Bain & Co. released a study of 951 executives from companies with more than $100 million in revenue, and unsurprisingly, the data did not declaratively explain what the ROI of AI was: 10% of…what? What’s the cost you saved on? 10% of $10 million is a lot for a company with $100 million in revenue, but 10% of $1,000 isn’t, much like 20% or 30% isn’t either! Yet there are two punchlines to come: This also assumes that those savings are enough to warrant future spending, which…this data does not actually prove. Thankfully, Bain did manage to publish one of the single-funniest quotes of the AI bubble: Put another way, the technology “worked (?),” but did not provide value in doing so. Sounds like it didn’t fuckin’ work to me! Bain had one other crucial bit of advice: Just so we’re clear, Bain & Co., a management consultancy with billions in annual revenue, is advising its clients that they should make sure that they’re getting some sort of return on their investment? And that reinvesting in something that doesn’t have a return on investment would be bad? If AI was real, these fucknuts would be replaced first! They’d replace everybody who wrote this report! You don’t need somebody to tell you this, and if you do you’re a fucking moron!
Thankfully, the AI industry is saved, as Sam Altman had the following to say about AI’s remarkable costs: Motherfucker you are the industry! You are the one that has to work this out! OpenAI is the AI industry! You are OpenAI’s CEO! You lazy, ignorant, dog-brained loser! This was an opportunity for “journalist” David Faber to push back, and here’s how that went: This is how the AI bubble inflated! This is how it happened! It happened every time a journalist asked a meaningful question and then immediately diverted to a totally different imaginary topic that made the subject feel good! David Faber, resign and give your job to somebody who has an iota of courage or pride in their work! Unbelievable! Sam Altman is worth billions of dollars, and OpenAI is allegedly worth $852 billion too, and the best he can give us is “teehee, someone else will work it out,” because Sam Altman is a loser that ingratiates other losers empowered by losers to sell loser technology to other losers, and the only way that he’s been able to do this is because the people that should know better are sitting around with their thumbs up their asses asking him whether there will be data centers in space. If AI had ROI, we wouldn’t be debating whether it had ROI. We wouldn’t discuss its potential, or whether it could, theoretically, under different circumstances, in the future, in a way that nobody can describe be super powerful and do all of the stuff it can’t do today. If AI had ROI, we’d be able to point with specificity to inarguable examples of economic impacts. AI boosters can jerk their binguses all they like about how Spotify’s CEO said its best engineers don’t write any code anymore. What does that mean? Is Spotify shipping better features, and are those features launching at a rapid clip? Is the software more secure, or stable? Spotify’s design still looks like absolute dogshit! Most software is worse! Things keep breaking everywhere, and in many cases it’s because of AI coding tools!
In fact, I’d be willing to believe that AI had a negative economic impact, increasing operating expenses across the board and giving some software engineers prompt-based concussions by automating some coding in a way that makes them lazy and bad at writing software, and by producing so much code so quickly that it’s impossible to review it all (see Mo Bitar’s video). LLMs appear to be able to write some code sometimes and do so at high speed, and they ingratiate software engineers that don’t really care about writing software by making them feel like they wrote it. While it might allow some things to go theoretically faster, the overall economic impact of AI-generated code appears to be worse code, worse software, and massive, multi-million dollar bills from Anthropic and Cursor. I will concede that some software engineers seem to like these things, and that many software engineers appear to be using them, but I have yet to see a single one who obsessively posts about their token spend create anything of note or worth, and none of these people appear to be able to point to the actual ROI of all that AI they’re using. I realize I’m painting with a broad brush, so let me get a broader one: I believe anyone who relies on LLMs for anything is a mark. I don’t give a shit if you use them to spit out a script or do some simple sideline part of your job, or transcribe or dictate into them, or if you’ve used them as a search engine (and even then, you best check every source!), but the moment you rely on and run your entire process on these things, I immediately doubt your ability to do anything, or at the very least wonder how gullible you truly are when somebody ingratiates you enough. Why?
Because every single “AI setup” I’ve seen anyone ever use involves a Rube Goldberg machine of bullshit deterministic scripts to try and bring the hallucination-guaranteed nature of LLMs to heel, usually to the point that you’re doing more work making the LLM work than you did before they existed, and you’re only proud of it because you feel like you’re special. There are, of course, exceptions. I’ve talked to a few people who describe LLMs normally, without hype, who tell very specific stories of very specific outcomes that save indeterminate amounts of time. There are some that have used LLMs to create Python scripts to search and organize data, to which I say “you’re impressed with Python, not LLMs.” If all we’re left with from this era is the ability for some people to write Python scripts without learning Python, this is still an egregious and horrifying waste of capital. Remember: what you are using is the end result of over a trillion dollars of investment. It is only made possible through manufactured consent that actively misinforms people about the current and future capabilities of LLMs. They didn’t raise hundreds of billions of dollars by talking about any product currently on the market, and that’s because the current products are not very good products. You are all the victims of a con. No matter how “well” your Breakfast Machine of different API calls and if-this-then-that automations may or may not function, you have been sold a bill of goods for “artificial intelligence” that is impossibly stupid. When some of you are pushed to prove the ROI of AI, you immediately return to boring talking points about Uber, or the Dot Com Bubble, or some other slop fed to you by people actively conning you at this very moment. I mean this with as much empathy as I can muster: if you’re a huge AI booster, why do you defend this so vociferously? What is it about my criticism that hurts? Is it that I’m yucking your yum?
Is it that I don’t immediately ingest and regurgitate the theoretical idea that the thing you’re using all the time is or may become sentient? Is it because I’m not impressed? I think it’s far more likely that people are angry that I’m asking simple questions that should have satisfying answers, and don’t. I’m also fundamentally unimpressed with anything I’ve seen an LLM do, because my requirement for software or hardware is that it works as advertised, and the very fundament of the AI con is that LLMs are sold based on their theoretical capabilities. The reason nobody can show you the ROI from AI is that AI does not have a return on investment. Large Language Models can speed up some things in a way that becomes increasingly less valuable and less accurate with the complexity of the task, and more investment in AI data centers does not appear to do anything other than expand the number of tasks that an LLM can attempt. While some people have been able to get something out of generative AI, that something never seems to be a tangible or impressive achievement. Every “successful” AI story is a result of either ignoring the obvious problems with LLMs or mitigating them at a great cost for an aggressively expensive and mediocre result. LLMs are sold as “AI,” a technology best-known for automating things, yet they can’t be trusted to run anything on their own. Instead, they manipulate the user into covering up their errors, explaining away their failures, coddling their meager returns and crediting them with the actual labor that LLMs are meant to automate away. They do so with their investors and executives conning the media and the markets with outright lies and half-truths that exploit society’s weak points.
The media and markets are informed by people that understand neither technology nor history, and Business Idiots that have reached the heights of their careers through diplomacy and ratfucking that care only about attention and adulation for things that other people do. LLMs coddle the easily-led and narcissistic into believing that the model is doing the work as the human being has to constantly cater to the model’s inefficiencies and inabilities, using more energy and resources than any technology ever made. And yet with all the money, all the attention, all the resources, all the land, all the power, all the affordances and excuses and endless fucking applause for mediocrity, nobody can actually point to the ROI of AI, because it doesn’t exist outside of it burping out stolen content and enriching and ingratiating billionaire dullards. Even at a hundredth of the price I’d be dismissive, because everything I’ve seen is so decidedly unexceptional. I realize that some will say I’m dismissive of LLMs’ capabilities, and I’m sorry — I’m just not impressed. You spent a trillion dollars to make it somewhat easier to code some things sometimes but not in such a way that it actually results in anything, research reports that nobody will read, shitty PowerPoint decks and Excel spreadsheets, and art that looks like stock images because that’s exactly what it was trained on. This shit needs to work every time without fail and be absolutely flawless and autonomous. You are paying for a tool. You are paying for software. You are a customer. Your job is not to explain to others why this is exciting, nor is it your job to cover up for its mistakes. If you truly love this stuff you should be secure enough in doing so that you don’t feel compelled to defend it or to be demeaning to those that disagree. The fact that I have to write that sentence is proof that something is very, very wrong with the AI industry, and that LLMs are about far more than software.
If you liked this piece, you should subscribe to my premium newsletter. It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 10,000 to 18,000 words, including vast, detailed analyses of the biggest events and companies in the AI bubble.

The foundation of software would be destroyed, as literally anyone could create and maintain any software they desired. Literally nobody would buy any software because they’d just type “computer make me a Slack clone for my organization” and it would magically appear on AWS.
The SaaSpocalypse (see my premium here) is a media and market-based hallucination where the collapsing growth of software companies is being explained as “AI taking their business” versus “private equity and venture capital overvalued software companies between 2018 and 2022 to the point that Apollo’s John Zito said ‘all the marks are wrong,’” which is very bad, but nothing to do with AI.
Accountancy would completely collapse, as nobody would need anyone but ChatGPT to do their taxes.
Law schools would collapse, because legal internships would become useless and law firms would no longer have need for the thousands of new associates, because ChatGPT could just draft it all. Legal salaries would also dramatically collapse.
Research in effectively every discipline would collapse, because you could ask for a detailed report and said report would be better than any human being creates.
The entirety of scientific research would change, because you could now automate many different disciplines out of existence.

Martin Fowler 4 days ago

Fragments: June 2

Greg Wilson has noticed that lots of folks are using dodgy metrics to figure out if AI tools are worth their costs.

Would you measure lines of code generated, or tickets closed? Or would you send out a survey asking whether developers feel more productive? Each of those approaches is flawed in a different way;

He lists lots of common metrics, and why they are flawed. Sadly he doesn’t give any suggestions on what would be better. In my view, since we cannot measure productivity, any metrics are weak evidence at the best of times. I do somewhat use one of his flawed measures: “Asking Developers If They Feel More Productive”. While I acknowledge the problems he gives with this measure, I find that in an environment where decent measures are hard to find, even such a dim light is the best we have. In this situation these kinds of qualitative metrics may not be conclusive, but they are useful.

❄                ❄                ❄                ❄                ❄

Benedict Evans observes that extensive automation didn’t mean the demise of professions in the past.

we spent a century automating accounting: we built calculating machines, punch cards, mainframes, data processing, databases, PCs, spreadsheets, ERPs, cloud… in fact, we built half of the tech industry around automating this. Yet the number of accountants kept going up.

He goes into the myriad of problems that exist when we’re trying to forecast the impact of a technology on jobs. There’s the much-talked-about Jevons paradox - once something becomes cheaper, people do it more, which can increase demand. Often this leads to the nature of jobs changing, even if it’s called the same thing.

Accountants today aren’t doing exactly the same work that they did in 1970 or 1980 ‘but more’ - they’re still called ‘accountants’ but the job is different. New technology often starts out being used for ‘the old thing but more’, but it rarely ends up like that.
Technologies often affect whole businesses - consider the impact of the internet on news publishing. Did anyone observing the rise of smartphones in the early 2000s realize that one consequence would be a change in the economics of taxis, thanks to ride-sharing apps? The conclusion is that it is, at the very least, almost impossible to forecast the impact of AI on our work.
❄ ❄ ❄ ❄ ❄
Stephen O’Grady looks at how closed and open models have performed on benchmarks over time. Closed models are setting the pace of innovation, and constantly breaking new ground from a capabilities standpoint. Open models are chasing them, and the cycle times seem to be getting shorter. There are no clear capability moats, and what is frontier today is table stakes tomorrow. It took 13-18 months for open models to catch up to GPT-4 on these benchmarks, but only 2-7 months to catch up to GPT-4o. He lists a bunch of caveats to this analysis, but it’s a worthwhile survey of how various kinds of models perform against the various measures we are trying to assess them with.
❄ ❄ ❄ ❄ ❄
One of the starkest examples of sloppy AI use is hallucinated citations - a giveaway of both the use of LLMs and carelessness in driving them. GPTZero is a company that makes tools to detect AI writing. I’ve no insight as to whether their tool is effective or not, but they do publish investigations of AI usage, and have published several articles highlighting hallucinated citations. One post focuses on Ernst & Young Canada’s report on cyber threats to loyalty systems and found that more than half its references were hallucinations. The post uses a lot of extremely annoying animations in how it presents its information (breaking Safari’s reader mode in the process).
But the harm that these kinds of AI-generated reports can do goes further than just some misled humans: Publishing a report online is essentially a form of data injection into the pool of knowledge that is the internet. When the report includes fake information (either vibed citations or false claims) it can “poison the well” by misleading future researchers, especially if the report is published by a well-known consulting firm and hosted on a high-traffic website.
❄ ❄ ❄ ❄ ❄
As LLMs get more capable in programming, we are rightly worried that people will use them to attack software systems. But these models can also be used for defense, allowing teams to find bugs before attackers do. Some folks from Mozilla posted an article on how they’ve used AI models to identify and fix an unprecedented number of latent security bugs in Firefox. Just a few months ago, AI-generated security bug reports to open source projects were mostly known for being unwanted slop. Dealing with reports that look plausibly correct but are wrong imposes an asymmetric cost on project maintainers: it’s cheap and easy to prompt an LLM to find a “problem” in code, but slow and expensive to respond to it. It is difficult to overstate how much this dynamic changed for us over a few short months. This was due to a combination of two main factors. First, the models got a lot more capable. Second, we dramatically improved our techniques for harnessing these models — steering them, scaling them, and stacking them to generate large amounts of signal and filter out the noise. During 2025, there were 17-31 security bugs fixed each month. In April 2026, they fixed 423.
❄ ❄ ❄ ❄ ❄
Pavel Voronin riffs on Unmesh Joshi’s post on What is Code. He observes that cruft in a codebase (technical debt) has always added friction to software development.
But the consequences of this cruft are compounded when LLMs are using existing code as context for future work. In a degraded codebase, the model does not see “technical debt” as debt. It sees examples. It sees precedent. It sees a style to continue. LLMs multiply what’s currently happening. I hear reports that good code might take the place of much of what’s put in markdown, because LLMs will imitate what’s already in the code base. But bad code multiplies too. Inevitably he introduces another variation of rampant debt metaphors: Cognitive debt accumulates when a team uses abstractions it no longer understands. Generative debt accumulates when a codebase contains confused concepts that models are likely to continue. Cognitive debt is about what the team no longer understands. Generative debt is about what the model is now likely to reproduce.
❄ ❄ ❄ ❄ ❄
Jason Koebler, from the very worthwhile 404 Media, has written a plaintive essay on how AI-generated slop is driving us crazy. Not just because it’s filling the web with this slop, but also because of how it’s making us humans react to slop and the threat of slop. We review our own writing and notice: it’s not just reading AI slop that hurts us, it’s the risk that we write something that looks like AI slop. If I use phrasing that AI copied from me, does it seem like I’m copying AI? This has led to the appearance of “humanizers” - AI tools that make our writing look less like AI. Humanizers add typos, randomly replace words, remove “AI tells,” and sometimes insert random characters. It’s another step on the way to the Zombie internet: I called it the Zombie Internet because the truth is that large parts of the internet are not just bots talking to bots or bots talking to people. It’s people talking to bots, people talking to people, people creating “AI agents” and then instructing them to interact with people.
[…] It’s my email inbox, in which I used to occasionally get poorly-formatted, poorly written, extremely long emails from delusional people who were positive the CIA had imprisoned them in a virtual torture chamber using undisclosed secret technology but where I now get well-formatted, passably written, extremely long emails from delusional people who are positive they have proven AI sentience and have the AI transcripts to prove it.
❄ ❄ ❄ ❄ ❄
Addy Osmani points out that spawning lots of agents is like launching a bunch of parallel processes that all rely on a single orchestrating thread - yourself. Python has the Global Interpreter Lock (GIL). You can spawn as many threads as you want but only one executes python bytecode at a time because they must acquire the lock. You are the GIL of your AI agents. They all can run at once. But when any of their work needs genuine understanding of the architecture or resolving merge conflicts, that work has to acquire the lock. There is one lock. You hold it. This means you must design the workflow with the agents with that GIL in mind. You shouldn’t launch more agents than you can properly review. It’s handy to separate background tasks that can be offloaded to an agent from complex tasks that require applied attention. Don’t use that precious brain for things that the machine can verify itself. [And I’d add - do get the machine to build tools that ease human verification. For example, it’s better to surface test case data in tables rather than buried in assert statements.] Spawning agents is not the skill. Anyone can run 20. The real skill is designing the system around the one serial resource that cannot be cloned or parallelized. That resource is your attention.
❄ ❄ ❄ ❄ ❄
Jamie Hurst is a Principal Engineer at booking.com, where he works in developer experience with a focus on AI tooling.
He’s written realistically about the gains and losses of using LLMs in this work. The cost of building has collapsed, but the cost of aligning organisationally has not. If anything, it’s gone up. When three different teams can each produce a working solution to the same problem in the time it used to take to write a proposal, the bottleneck moves from engineering to coordination. He thinks he’s able to do more as a senior engineer, but is concerned about how sustainable it is, both for him personally and for the organization he works for. He’s able to shape directions for multiple workstreams at once, in a way that he couldn’t three years ago. But one loss is that he doesn’t have enough time for mentoring, which will exact a toll on his employer in the longer term. He also finds he doesn’t have enough time to think. The productivity gains from AI got captured by output volume rather than output quality. The org’s expectations rose to absorb the speed-up, and the slack that used to exist between tasks, the unstructured time where strategic thinking actually happens, got eaten first because it’s invisible on a dashboard. I’m at a point in my career where thinking is supposed to be most of the job, and most of it now happens on holiday because the working week doesn’t accommodate it.

ava's blog 4 days ago

be a good cook when you use AI to edit your writing

Whenever someone talks about how they let AI improve their writing, I realize we are still taking the wrong things away from what good writing supposedly is. Not that I am the arbiter of good writing, but we can agree that at its core, good writing is a pleasure to read, connects to the reader, respects the context and chooses the correct tone for the audience. It’s also about correctly identifying when “good” writing is needed at all. I think the literacy crisis we are going through extends in this way, where people aren’t just lacking in media literacy, but lacking in the skills above. It’s easy to think that good language is always full of jargon, that being an expert on something means long, drawn out explanations, and that you should use a supposedly intelligent, professorial tone all the time. It’s only with education and reading a lot that you learn that good writing is a spectrum, and all these things depend on the author’s style, occasion and intent, and are used in the right moments. That’s why people who generate responses to casual chat messages aren’t being met with excitement; cases like when you get an AI-generated Happy Birthday text, or when your friend replies to your vent with an AI-generated professional therapist response. These people want to do it right, but don’t respect context-switching and why some interactions need to decidedly be personal and “imperfect” to others. They have only taken away that good writing is big words, many words, and they are willing to shoehorn it into everything. I cannot blame anyone for reading over an AI-generated improvement of their text and thinking “Wow, that’s so much better!”. On the first read, it does seem impressive. And I don’t wanna sit here and pretend humans don’t manage to choose a completely wrong tone for the occasion or audience without AI as well, but it seems like many don’t actually tell AI the context or audience, and AI guesses incorrectly.
People know what you sound like and how you usually write. Of course you are allowed to improve and change your writing style, but people will know when it is very sudden, completely out of character, and not something you’d manage on your own. And if you overdo it, AI will turn a concise, engaging and personal read with your own endearing quirks into either SEO marketing language, or an extremely dry scientific journal style read. You should be able to detect when that happens and take a step back. Otherwise you will sit there, proud of yourself that you wrote that, when it is so markedly different to your usual style and draft that you essentially employed a ghostwriter and patted yourself on the back for its output. And weirdly enough, I get the feeling many of you were never interested in “improving” your writing when it didn’t mean just copying a machine’s work. That’s having an editor, not you improving on your skills. You can liken it to skills in the kitchen: People who are just learning how to cook are learning about spices and think: The more spice, the better, so throw all of it in! Until a dish doesn’t taste good at all; too salty, too intense, everything is clashing. There is a point when it doesn’t elevate the dish, but ruins it. Some occasions don’t call for a curry, but instead a salad. A good cook will know the right dish and how to use ingredients and spices to make it pop. Don’t come with the fine dining if the people want your rustic potato bake. Employing AI to improve your text into oblivion is a slippery slope to sounding uneducated and phony. Please get away from the notion that longer and more complicated is better just for the sake of it.
Published 02 Jun, 2026


I went on the Built for Turbulence podcast

I joined Radical's "Built for Turbulence" podcast recently for a wide-ranging chat on what AI agents are doing to the economics of software. We got into whether 5 people with agents can really out-build a 500-person company, the "Figma Trap" where you end up paying your competitor through your own customer usage, and why I think running human-written code that hasn't been AI-audited is going to start looking reckless. We also got onto open weights quietly closing up, why "safe" enterprise AI tools may be handicapping the organisations using them, and whether the small-team-with-agents advantage is temporary or here to stay. You can find the episode here on Spotify, Apple Podcasts and everywhere else.


Is the Monaco Grand Prix decided at qualifying?

A Formula One driver triggered my fact-checkitis. They claimed that winning the Monaco Grand Prix in Monte Carlo is determined nine out of ten times by which position one starts in. That makes intuitive sense, because the Monte Carlo track is a narrow street track with few opportunities for overtakes. But … really? Is that an off-the-cuff remark or an accurate statistical prediction of the race? (Continue reading the full article on the web.)

iDiallo 5 days ago

The web is changing, and we are not going back

Whenever I saw someone type a natural language query into Google, it made me cringe. "It's not a person," I would say. "Type like you're talking to a machine." This was especially true for programmers and it was before AI took over everything. Instead of "how do I write a function that reads a file?", I would suggest they use specific keywords, something that sounded more like machine language than conversation. "js function to read csv file" or "css gradient background property example." This got you better results. Even though Google was a sophisticated search engine, it was still doing a kind of keyword matching under the hood. But not anymore. You don't get any advantage from writing in "machine language." Google understands natural language just as well. In fact, even better. How is it that in 2026, I Google things less than ever? It's not that I know everything now. It's more that I don't want to call the friend who always talks too much. If the height of the Eiffel Tower ever comes up in conversation, I'll type "eiffel tower wiki" and click through to Wikipedia. I don't want to have a conversation about it. Googling something these days feels like Google is trying to join my private conversation. Where it used to be a tool for finding answers elsewhere, now it's a buddy who gives you an answer. And just as you're about to leave, it says, "hey, did you also know that..." There used to be a machine between me and the information I was looking for. It was good at its job. It sorted, ranked, then presented information. But now, the machine is constantly pushing information at me, watching my reaction, learning from it, and feeding me more, unsolicited. Before, information lived on the web and was hard to find. Today, information still exists, but it's buried under noise. Google no longer helps you find it, it just gives you an answer. That answer might be right or wrong, and right below it, in small print: "AI responses may include mistakes." 
You rarely get to verify whether the answer is correct, because almost no one clicks through to the source. I know this firsthand. More than three-quarters of my Google referral traffic has disappeared, while my search impressions keep climbing. So what's left to do? I could mourn the old Google, the simpler web. But as the title says, we aren't going back. This is the new reality, and we have to adapt. Rather than blindly embracing change, I think it's smarter to pick and choose. Just last week, I wrote about the small web still being alive. And it did exactly what its name suggests. It stayed small. There are other search engines built for people who want more control. DuckDuckGo. Kagi (my personal favorite). The habit of Googling everything is learned behavior, and learned behaviors can be unlearned. What's harder to convey is that Google never presented us with facts, only sources and citations. The way the Google answer is presented, we have the impression they are giving us indisputable truths. When everyone is sharing screenshots of the answer they got, all you can do is share a screenshot of the opposite answer you got. The source gets lost. That's where we are now. Skimming the average sentiment of a Reddit thread, or confirming something we already believed. This is the new reality. We're not going back to keyword matching. But I also don't have to accept the new way as the only way. Google has made its search box AI-first and that's their right, it's their product. But it's also my opportunity to try something different. We are not going back. So I might as well choose where I go next.


Hackers Used Meta’s AI Support Bot to Seize Instagram Accounts

The Instagram accounts for the Obama White House and the Chief Master Sergeant of the U.S. Space Force were briefly defaced with pro-Iranian images and messages over the weekend, after instructions began circulating on Telegram showing how to trick Meta’s “AI support assistant” bot into resetting account passwords.
[Image: A screenshot from a video released on Telegram claiming to show how Meta’s AI customer support bot could be tricked into resetting a target’s password.]
On May 31, word began to spread on several Telegram instant message channels that Meta’s AI bot would happily add an email address to an existing account as part of the bot’s standard password reset flow. A video released on Telegram by pro-Iran hackers claimed to document a remarkably simple exploit that appears to have involved using a VPN connection with an IP address that is in or near the target’s usual hometown, requesting a password reset for the account, and then choosing to chat with Meta’s AI support assistant. From there, the video shows the attacker told the bot to link the account in question to a new email address, after which the bot dutifully sent that address a one-time code that allowed a password reset. The Telegram account that posted the video also linked to screenshots of pro-Iran images, videos and messages that defaced the hacked Instagram accounts, saying hackers had used the exploit to hijack a number of valuable (read: short) Instagram account names that allegedly have a resale value of more than a half million dollars. Meta has not responded to requests for comment on the video’s claims, but the company reportedly did acknowledge the dormant Instagram account for the Obama White House was briefly compromised. The security blog thecybersecguru.com reports that Meta pushed an emergency patch over the weekend, and clarified that no back end database was breached. “Instagram has notoriously poor human support infrastructure,” Cybersecguru wrote.
“Recovering a locked account – especially a high-value one can take weeks of back-and-forth with an automated ticketing system. Meta’s solution was to deploy a conversational AI layer to handle common recovery workflows: relinking a lost email address, triggering a password reset, verifying account ownership. The assistant, presumably, was supposed to reduce friction for legitimate users stuck in account-access hell.” Ian Goldin, a threat researcher at Lumen’s Black Lotus Labs, said we’re entering uncharted security territory as more large online platforms start allowing AI chatbots to handle sensitive account recovery requests. Just like human customer support employees can be social engineered into providing unauthorized access to someone’s account, AI bots are equally eager to help and vulnerable to persuasion and trickery, he said. “AI chatbots create interesting new attack surface, and we’re likely going to see a lot more of these kinds of attacks,” Goldin said. Securing your various online accounts means taking full advantage of the most secure form of multi-factor authentication (MFA) offered (such as a passkey or security key). In this case, even using the least robust form of MFA that Instagram offers — a one-time code sent via SMS — likely would have blocked the exploit: The hackers who released the video on Telegram said their exploit failed to work against any accounts that had MFA enabled.
